Available Web Services

The Carp product range supports several SOAP web services. All web services are based on the SOAP XML interface (http://en.wikipedia.org/wiki/SOAP).
Each Carp web service has a dedicated URI. If you want to use the web services, need new specialized services, or want to use specialized applications, please contact us.
With the services you can pass plain text or binary documents (HTML, XML, Microsoft Office, PDF, etc.).

Classifier

classification of documents

Duplicatefinder

finding and comparing similar text files

Identifier

detect specific entities in text

KeywordFinder / Tagging

tag a document without training.

Wordcloud

get sentiment information given a document.

Summarizer

automatic dynamic summarization of any text


Technical Documentation

Classifier

Introduction

With the Classifier API you can categorize your own text.
You need to define your own categories and train the Classifier with the ClassifierAdmin service. The number of training documents required depends on the length of each document and on how distinct the categories are.

After training, you can use the Classifier service.

Each request must include the HTTP header SOAPAction: "urn:classify".

Example:

<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope"
   xmlns:carp="http://carp.tm7.nl">
   <soap:Header/>
   <soap:Body>
      <carp:classify>
         <carp:language>en</carp:language>
         <carp:categoryset>demo</carp:categoryset>
         <carp:documenttype>text/plain</carp:documenttype>
         <carp:document>
If you're good enough at what you do, it is possible to live forever.
That's a lesson to be drawn from the news out of Amsterdam last week.
A painting by the Dutch artist Vincent van Gogh, which was previously
believed to be a forgery, has been authenticated. "Sunset at Montmajour,"
a landscape painted by van Gogh in 1888, has been painstakingly studied
by experts at the Van Gogh Museum, using sophisticated
chemical-and-technological analysis. Their conclusion: It's the real thing.
It is said to be the first full-sized canvas by van Gogh to be found
in 85 years. In the past, paintings by van Gogh have sold for
tens of millions of dollars apiece.
He has been dead since 1890, when, in one of many moments of despair,
he took his own life. He was only 37. Even before he entered the world,
there were omens that his might not be a conventional existence.
On March 30, 1852, Vincent van Gogh was stillborn in the Netherlands.
That was his older brother. A year later -- to the day -- a second child
was born. This child, too, was given the name Vincent. He would be the
boy who grew into an artist.
         </carp:document>
      </carp:classify>
   </soap:Body>
</soap:Envelope>
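A minimal Java sketch (Java 11+, java.net.http) of building and sending this request with the required SOAPAction header. The service URI is not given in this documentation, so it is passed as a command-line argument; the namespace URIs are assumed to be the same as in the Identifier example further down. The class and method names (ClassifyRequest, envelope, build) are hypothetical. Also setting the action parameter on the SOAP 1.2 Content-Type is a precaution, not something this documentation requires.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class ClassifyRequest {

    // Escape the characters that may not appear literally in XML text content.
    static String xmlEscape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // SOAP 1.2 envelope with the same element structure as the classify example.
    static String envelope(String language, String categorySet, String text) {
        return "<soap:Envelope xmlns:soap=\"http://www.w3.org/2003/05/soap-envelope\""
            + " xmlns:carp=\"http://carp.tm7.nl\">"
            + "<soap:Header/><soap:Body><carp:classify>"
            + "<carp:language>" + xmlEscape(language) + "</carp:language>"
            + "<carp:categoryset>" + xmlEscape(categorySet) + "</carp:categoryset>"
            + "<carp:documenttype>text/plain</carp:documenttype>"
            + "<carp:document>" + xmlEscape(text) + "</carp:document>"
            + "</carp:classify></soap:Body></soap:Envelope>";
    }

    // POST request carrying the SOAPAction header; the action parameter on the
    // Content-Type is the SOAP 1.2 way of passing the same value.
    static HttpRequest build(URI endpoint, String language, String categorySet, String text) {
        return HttpRequest.newBuilder(endpoint)
            .header("Content-Type", "application/soap+xml; charset=UTF-8; action=\"urn:classify\"")
            .header("SOAPAction", "\"urn:classify\"")
            .POST(HttpRequest.BodyPublishers.ofString(
                envelope(language, categorySet, text), StandardCharsets.UTF_8))
            .build();
    }

    public static void main(String[] args) throws Exception {
        String text = "A painting by the Dutch artist Vincent van Gogh has been authenticated.";
        if (args.length == 0) {
            // No endpoint given: only show the envelope that would be sent.
            System.out.println(envelope("en", "demo", text));
            return;
        }
        HttpResponse<String> response = HttpClient.newHttpClient().send(
            build(URI.create(args[0]), "en", "demo", text),
            HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + "\n" + response.body());
    }
}
```

Without arguments the program only prints the envelope; with the service URI as the first argument it posts the request and prints the raw SOAP response.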

Result:

<soapenv:Envelope xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope">
   <soapenv:Body>
      <ns:classifyResponse xmlns:ns="http://carp.tm7.nl">
         <ns:return xsi:type="ax29:CarpClassifier"
            xmlns:ax29="http://carp.tm7.nl/xsd"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <ax29:error xsi:type="ax29:CarpError">
               <ax29:code>0</ax29:code>
               <ax29:description xsi:nil="true"/>
               <ax29:severity xsi:nil="true"/>
               <ax29:stacktrace xsi:nil="true"/>
            </ax29:error>
            <ax29:language>en</ax29:language>
            <ax29:classificationList xsi:type="ax29:CarpClassifier_Classification">
               <ax29:confidence>0.6459968046554693</ax29:confidence>
               <ax29:label>Entertainment</ax29:label>
               <ax29:probability>0.9457582256937944</ax29:probability>
               <ax29:score>0.6109567917748173</ax29:score>
            </ax29:classificationList>
            <ax29:classificationList xsi:type="ax29:CarpClassifier_Classification">
               <ax29:confidence>0.3018046687373605</ax29:confidence>
               <ax29:label>Soccer</ax29:label>
               <ax29:probability>0.7115595907569598</ax29:probability>
               <ax29:score>0.21475200657529608</ax29:score>
            </ax29:classificationList>
         </ns:return>
      </ns:classifyResponse>
   </soapenv:Body>
</soapenv:Envelope>
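The response can be read with the JDK's built-in DOM parser. This sketch assumes the xsd namespace and element names shown in the example response. It interprets the error code using an assumed reading of the CarpError lists in the Classifier Admin documentation: 0 is success, codes below 100 are fatal, codes 101 to 107 are errors, and 200 is a warning that still comes with a result. ClassifyResult, parse and isFailure are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ClassifyResult {

    static final String XSD_NS = "http://carp.tm7.nl/xsd";

    // Text of the first descendant element with this local name in the xsd namespace, or null.
    static String text(Element parent, String name) {
        NodeList nodes = parent.getElementsByTagNameNS(XSD_NS, name);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    // Assumed reading of the CarpError codes: 0 = success, below 100 = fatal,
    // 100-199 = error, 200 and up = warning (the result is still usable).
    static boolean isFailure(int code) {
        return code != 0 && code < 200;
    }

    // Parses a classifyResponse into label -> confidence, in the order returned by the service.
    static Map<String, Double> parse(String soapXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new ByteArrayInputStream(soapXml.getBytes(StandardCharsets.UTF_8)));
        Element root = doc.getDocumentElement();

        String code = text(root, "code");
        if (code != null && isFailure(Integer.parseInt(code))) {
            throw new IllegalStateException("Carp error " + code + ": " + text(root, "description"));
        }

        Map<String, Double> labels = new LinkedHashMap<>();
        NodeList items = root.getElementsByTagNameNS(XSD_NS, "classificationList");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            labels.put(text(item, "label"), Double.parseDouble(text(item, "confidence")));
        }
        return labels;
    }
}
```

The same map also gives access to probability or score if those element names are used instead of confidence.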

Detailed Classifier Admin documentation

nl.tm7.carp

Class CarpClassifierAdminService

  • java.lang.Object
    • nl.tm7.carp.CarpClassifierAdminService

  • public class CarpClassifierAdminService
    extends java.lang.Object
    Carp classifier admin class.
    The Carp classifier admin class is used to manipulate the classifier configuration.
    Several web services are available for this. All web services return a CarpClassifierAdmin object, which contains the web service return information.
    The CarpClassifierAdmin object contains the CarpError object which gives the method result:
    Possible values:
    Fatal:
    1, "Properties file: /" + filename + " not found."
    2, "No document types defined in carp.properties file."
    2, "No languages defined in properties file."
    2, "No default language defined in properties file."
    2, "Invalid default language defined in properties file."
    2, "CatagorySet directory does not exist."
    3, "Internal server error."
    Error:
    101, "Language " + reqlanguage + " value not allowed for this method."
    101, "Documenttype missing."
    101, "Documenttype " + reqdoctype + " not supported."
    101, "Document content missing."
    102, "Language could not be determined."
    103, "Decoding rtf document failed:"
    105, "CatagorySet does already exist."
    107, "CatagoryLabel does already exist."
    Warning:
    200, "License will expire in " + result + " days."
    Version:
    1.0
    Author:
    (c) Carp Technologies 2013

    • Method Summary

      Methods 
      Modifier and Type Method and Description
      CarpClassifierAdmin addExample(java.lang.String language,
      java.lang.String categoryset,
      java.lang.String categorylabel,
      java.lang.String documenttype,
      java.lang.String document,
      byte[] documentBytes)
      Add an example document to the classifier configuration.

      CarpClassifierAdmin copyCategorySet(java.lang.String language,
      java.lang.String categoryset,
      java.lang.String categorysetcopy)
      Copy a CategorySet for the classifier configuration.

      CarpClassifierAdmin createCategory(java.lang.String language,
      java.lang.String categoryset,
      java.lang.String categorysetlabel)
      Create a CategorySet label for the classifier configuration.

      CarpClassifierAdmin createCategorySet(java.lang.String language,
      java.lang.String categoryset)
      Create a CategorySet for the classifier configuration.

      CarpClassifierAdmin deleteCategory(java.lang.String language,
      java.lang.String categoryset,
      java.lang.String categorysetlabel)
      Remove a CategorySet label from the classifier configuration.

      CarpClassifierAdmin deleteCategorySet(java.lang.String language,
      java.lang.String categoryset)
      Remove a CategorySet from the classifier configuration.

      java.lang.String encodedHex(java.lang.String coded)
      Return a string in hexadecimal format, to verify various stream encoding formats.

      CarpClassifierAdmin getCategoryLabels(java.lang.String language,
      java.lang.String categoryset)
      Get all Category labels for a given CategorySet from the classifier configuration.

      CarpClassifierAdmin getCategorySetNames(java.lang.String language)
      Get all CategorySet objects for the classifier configuration.

      CarpClassifierAdmin isCategory(java.lang.String language,
      java.lang.String categoryset,
      java.lang.String categorysetlabel)
      Validate a CategorySet label for the classifier configuration.

      CarpClassifierAdmin isCategorySet(java.lang.String language,
      java.lang.String categoryset)
      Validate a CategorySet name for the classifier configuration.

      CarpClassifierAdmin train(java.lang.String language,
      java.lang.String categoryset)
      Process the example documents for the classifier configuration.

      java.lang.String version()
      Return the version number of all components.

      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait


    • Constructor Detail

      • CarpClassifierAdminService

        public CarpClassifierAdminService()
        Internal use only

    • Method Detail

      • getCategorySetNames

        public CarpClassifierAdmin getCategorySetNames(java.lang.String language)
        Get all CategorySet objects for the classifier configuration.
        Parameters:
        language – Language for the CategorySet. Default: 'nl'. Supported values: nl (Dutch), en (English), * (all).
        Returns:
        ClassifierAdminImpl object with CategorySetNameList: Array of CategorySet objects for the classifier configuration.

      • getCategoryLabels

        public CarpClassifierAdmin getCategoryLabels(java.lang.String language,
                                            java.lang.String categoryset)
        Get all Category labels for a given CategorySet from the classifier configuration.
        Parameters:
        language – Language for the CategorySet. Default: 'nl'. Supported values: nl (Dutch), en (English).
        categoryset – Context to search for classification labels.
        Returns:
        ClassifierAdminImpl object with CategorySetLableList: Array of Category labels for the classifier configuration context.

      • createCategorySet

        public CarpClassifierAdmin createCategorySet(java.lang.String language,
                                            java.lang.String categoryset)
        Create a CategorySet for the classifier configuration.
        Use of this method is not recommended; use the classify trainer to manipulate the classifier configuration instead.
        Parameters:
        language – Language for the CategorySet. Default: 'nl'. Supported values: nl (Dutch), en (English).
        categoryset – Name for the CategorySet.
        Returns:
        ClassifierAdminImpl object with result code.

      • deleteCategorySet

        public CarpClassifierAdmin deleteCategorySet(java.lang.String language,
                                            java.lang.String categoryset)
        Remove a CategorySet from the classifier configuration.
        Parameters:
        language – Language for the CategorySet. Default: 'nl'. Supported values: nl (Dutch), en (English).
        categoryset – Name for the CategorySet.
        Returns:
        ClassifierAdminImpl object with result code.

      • copyCategorySet

        public CarpClassifierAdmin copyCategorySet(java.lang.String language,
                                          java.lang.String categoryset,
                                          java.lang.String categorysetcopy)
        Copy a CategorySet for the classifier configuration.
        Parameters:
        language – Language for the CategorySet. Default: 'nl'. Supported values: nl (Dutch), en (English).
        categoryset – Name for the source CategorySet.
        categorysetcopy – Name for the target CategorySet.
        Returns:
        ClassifierAdminImpl object with result code.

      • createCategory

        public CarpClassifierAdmin createCategory(java.lang.String language,
                                         java.lang.String categoryset,
                                         java.lang.String categorysetlabel)
        Create a CategorySet label for the classifier configuration.
        Use of this method is not recommended; use the classify trainer to manipulate the classifier configuration instead.
        Parameters:
        language – Language for the CategorySet. Default: 'nl'. Supported values: nl (Dutch), en (English).
        categoryset – Name for the CategorySet.
        categorysetlabel – Name for the CategorySet label.
        Returns:
        ClassifierAdminImpl object with result code.

      • deleteCategory

        public CarpClassifierAdmin deleteCategory(java.lang.String language,
                                         java.lang.String categoryset,
                                         java.lang.String categorysetlabel)
        Remove a CategorySet label from the classifier configuration.
        Parameters:
        language – Language for the CategorySet. Default: 'nl'. Supported values: nl (Dutch), en (English).
        categoryset – Name for the CategorySet.
        categorysetlabel – Name for the CategorySet label.
        Returns:
        ClassifierAdminImpl object with result code.

      • isCategorySet

        public CarpClassifierAdmin isCategorySet(java.lang.String language,
                                        java.lang.String categoryset)
        Validate a CategorySet name for the classifier configuration.
        Parameters:
        language – Language for the CategorySet. Default: 'nl'. Supported values: nl (Dutch), en (English).
        categoryset – Name for the CategorySet.
        Returns:
        ClassifierAdminImpl object with result code.

      • isCategory

        public CarpClassifierAdmin isCategory(java.lang.String language,
                                     java.lang.String categoryset,
                                     java.lang.String categorysetlabel)
        Validate a CategorySet label for the classifier configuration.
        Parameters:
        language – Language for the CategorySet. Default: 'nl'. Supported values: nl (Dutch), en (English).
        categoryset – Name for the CategorySet.
        categorysetlabel – Name for the CategorySet label.
        Returns:
        ClassifierAdminImpl object with result code.

      • addExample

        public CarpClassifierAdmin addExample(java.lang.String language,
                                     java.lang.String categoryset,
                                     java.lang.String categorylabel,
                                     java.lang.String documenttype,
                                     java.lang.String document,
                                     byte[] documentBytes)
        Add an example document to the classifier configuration.
        Use of this method is not recommended; use the classify trainer to manipulate the classifier configuration instead.
        Parameters:
        language – Language for the CategorySet. Default: 'nl'. Supported values: nl (Dutch), en (English).
        categoryset – Name for the CategorySet.
        categorylabel – Name for the CategorySet label.
        documenttype – Document type. Supported value: text/plain.
        document – Document content.
        documentBytes – Document content in binary form. Cannot be combined with document; documenttype is ignored.
        Returns:
        ClassifierAdminImpl object with result code.

      • train

        public CarpClassifierAdmin train(java.lang.String language,
                                java.lang.String categoryset)
        Process the example documents for the classifier configuration.
        Use of this method is not recommended; use the classify trainer to manipulate the classifier configuration instead.
        Parameters:
        language – Language for the CategorySet. Default: 'nl'. Supported values: nl (Dutch), en (English).
        categoryset – Name for the CategorySet.
        Returns:
        ClassifierAdminImpl object with result code.

      • version

        public java.lang.String version()
        Return the version number of all components.
        Returns:
        Version info

      • encodedHex

        public java.lang.String encodedHex(java.lang.String coded)
        Return a string in hexadecimal format, to verify various stream encoding formats.
        Parameters:
        coded – String to be encoded
        Returns:
        String in hexadecimal
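The method list does not prescribe a call order. A plausible sequence for building and training a new CategorySet from scratch is sketched below as a plain list of calls, without any network code. It assumes that a label has to be created before examples are added to it and that train is called once after all examples are in; the Call record and plan method are hypothetical, and records need Java 16 or later. Note that the documentation above recommends the classify trainer over calling createCategorySet, createCategory, addExample and train directly.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TrainingPlan {

    // One ClassifierAdmin call: the operation name and its arguments in declaration order.
    record Call(String operation, List<String> args) {}

    // Builds the call sequence for a new CategorySet; examples maps label -> plain-text documents.
    static List<Call> plan(String language, String categorySet, Map<String, List<String>> examples) {
        List<Call> calls = new ArrayList<>();
        calls.add(new Call("createCategorySet", List.of(language, categorySet)));
        // Assumption: every label must exist before examples are added to it.
        for (String label : examples.keySet()) {
            calls.add(new Call("createCategory", List.of(language, categorySet, label)));
        }
        for (Map.Entry<String, List<String>> entry : examples.entrySet()) {
            for (String document : entry.getValue()) {
                // documentBytes is left unset because the content is passed as a string.
                calls.add(new Call("addExample",
                    List.of(language, categorySet, entry.getKey(), "text/plain", document)));
            }
        }
        // train processes the examples added above, so it comes last.
        calls.add(new Call("train", List.of(language, categorySet)));
        return calls;
    }

    public static void main(String[] args) {
        Map<String, List<String>> examples = new LinkedHashMap<>();
        examples.put("Soccer", List.of("Ajax won the match 3-1 on Sunday."));
        examples.put("Entertainment", List.of("The film premiered at the festival."));
        for (Call call : plan("en", "demo", examples)) {
            System.out.println(call.operation() + " " + call.args());
        }
    }
}
```

Each Call would become one SOAP request to the ClassifierAdmin service. Checking the CarpError code of each response before sending the next call keeps a failed createCategorySet from cascading into confusing addExample errors.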

Class CarpIdentifierService

  • java.lang.Object
    • nl.tm7.carp.CarpIdentifierService

  • public class CarpIdentifierService
    extends java.lang.Object

    Carp Identifier class.
    The Identifier class is used to identify a document.
    The identify web service is available for this.

    The web service returns a CarpIdentifier object, which contains the document identifier result.
    The CarpIdentifier object contains the CarpError object which gives the method result:
    Possible values:
    Fatal:
    1, "Properties file: /" + filename + " not found."
    1, "Properties file: /" + filename + " error. " + e.getLocalizedMessage()
    2, "No document types defined in carp.properties file."
    2, "No languages defined in properties file."
    2, "No model found for language " + locale + "."
    2, "Language model" + locale + ", internal error: " + e.getLocalizedMessage()
    3, "Resource directory: " + resourceRoot + " not found."
    3, "Internal server error."
    4, "Invalid license."
    Error:
    101, "Language " + reqlanguage + " value not allowed for this method."
    101, "Documenttype missing."
    101, "Documenttype " + reqdoctype + " not supported."
    101, "Document content missing."
    102, "Language could not be determined."
    103, "Decoding rtf document failed:"
    Warning:
    200, "License will expire in " + result + " days."

    Version:
    1.0
    Author:
    (c) Carp Technologies 2013

    • Method Summary

      Methods 
      Modifier and Type Method and Description
      java.lang.String encodedHex(java.lang.String coded)
      Return a string in hexadecimal format, to verify various stream encoding formats.

      CarpIdentifier identify(java.lang.String context,
      java.lang.String language,
      java.lang.String documenttype,
      java.lang.String document,
      byte[] documentBytes,
      java.lang.Boolean doHtmlOutput)
      Identify a document.

      java.lang.String version()
      Return the version number of all components.

      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait


    • Constructor Detail

      • CarpIdentifierService

        public CarpIdentifierService()

    • Method Detail

      • identify

        public CarpIdentifier identify(java.lang.String context,
                              java.lang.String language,
                              java.lang.String documenttype,
                              java.lang.String document,
                              byte[] documentBytes,
                              java.lang.Boolean doHtmlOutput)
        Identify a document
        Parameters:
        context – Optional context for identifier configuration
        language – Language for the document content. Supported values: nl (Dutch), en (English), ? (detect the language).
        documenttype – Document type. Supported values: text/plain, text/html, text/rtf.
        document – Document content as a string. Requires documenttype; cannot be combined with documentBytes.
        documentBytes – Document content in binary form. Cannot be combined with document; documenttype is ignored.
        doHtmlOutput – If true, the result also contains the text as colored HTML output.
        Returns:
        CarpIdentifier object with the result.

      • version

        public java.lang.String version()
        Return the version number of all components.
        Returns:
        Version info

      • encodedHex

        public java.lang.String encodedHex(java.lang.String coded)
        Return a string in hexadecimal format, to verify various stream encoding formats.
        Parameters:
        coded – String to be encoded
        Returns:
        String in hexadecimal
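The identify parameter rules above (document requires documenttype, document and documentBytes are mutually exclusive, documenttype is ignored for binary content) can be checked on the client before a request is sent. This is a minimal sketch; IdentifyArgs and check are hypothetical names, and treating a missing language as invalid is an assumption, since no default is documented for identify.

```java
import java.util.Set;

public class IdentifyArgs {

    static final Set<String> LANGUAGES = Set.of("nl", "en", "?");
    static final Set<String> DOCUMENT_TYPES = Set.of("text/plain", "text/html", "text/rtf");

    // Checks the identify parameter rules client-side and throws on the first violation.
    // context and doHtmlOutput are optional and not checked here.
    static void check(String language, String documenttype, String document, byte[] documentBytes) {
        // Set.of(...).contains(null) throws, so test for null explicitly.
        if (language == null || !LANGUAGES.contains(language)) {
            throw new IllegalArgumentException("unsupported language: " + language);
        }
        boolean hasText = document != null;
        boolean hasBytes = documentBytes != null;
        if (hasText && hasBytes) {
            throw new IllegalArgumentException("document and documentBytes cannot both be set");
        }
        if (!hasText && !hasBytes) {
            // Presumably what the service reports as error 101, "Document content missing."
            throw new IllegalArgumentException("document content missing");
        }
        // documenttype only matters for string content; it is ignored for documentBytes.
        if (hasText && (documenttype == null || !DOCUMENT_TYPES.contains(documenttype))) {
            throw new IllegalArgumentException("document requires documenttype text/plain, text/html or text/rtf");
        }
    }

    public static void main(String[] args) {
        check("nl", "text/plain", "Curriculum Vitae Jeroen Dijsselbloem", null);
        check("?", null, null, new byte[] {0x25, 0x50, 0x44, 0x46}); // binary content, type ignored
        System.out.println("arguments accepted");
    }
}
```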

Identifier SOAP Service documentation

Introduction

The Identifier API analyzes text and returns the entities it contains. Based on domain-specific definitions (the context), it returns specific information about each entity.

Each request must include the HTTP header SOAPAction: "urn:identify".

Example:

<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope"
   xmlns:carp="http://carp.tm7.nl">
   <soap:Header/>
   <soap:Body>
      <carp:identify>
         <carp:context>nl.zip</carp:context>
         <carp:language>nl</carp:language>
         <carp:documenttype>text/plain</carp:documenttype>
         <!--Optional:-->
         <carp:document>
Curriculum Vitae
Jeroen Dijsselbloem
Minister van Financiën
Personalia
Voornamen (roepnaam): Jeroen René Victor Anton (Jeroen)
Geboortedatum en -plaats: 29 maart 1966, Eindhoven
Woonplaats: Wageningen.
Burgerlijke staat: ongehuwd samenwonend, twee kinderen.
Opleiding
VWO, RK Eckartcollege, Eindhoven (1985)
Agrarische economie, richting bedrijfseconomie, landbouwpolitiek
en sociaal-economische geschiedenis aan Wageningen University (1985-1991)
Doctoraal onderzoek Bedrijfseconomie, University College Cork, Ierland (1991)
</carp:document>
      </carp:identify>
   </soap:Body>
</soap:Envelope>

Result:

<soapenv:Envelope xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope">
   <soapenv:Body>
      <ns:identifyResponse xmlns:ns="http://carp.tm7.nl">
         <ns:return xsi:type="ax215:CarpIdentifier" xmlns:ax215="http://carp.tm7.nl/xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <ax215:error xsi:type="ax215:CarpError">
               <ax215:code>0</ax215:code>
               <ax215:description xsi:nil="true"/>
               <ax215:severity xsi:nil="true"/>
               <ax215:stacktrace xsi:nil="true"/>
            </ax215:error>
            <ax215:language>nl</ax215:language>
            <ax215:entityList xsi:type="ax215:CarpIdentifier_Entity">
               <ax215:attributeList xsi:type="ax215:CarpIdentifier_Attribute">
                  <ax215:attributeName>subtype</ax215:attributeName>
                  <ax215:attributeValue>person</ax215:attributeValue>
                  <ax215:attributeWeight>1.0</ax215:attributeWeight>
               </ax215:attributeList>
               <ax215:attributeList xsi:type="ax215:CarpIdentifier_Attribute">
                  <ax215:attributeName>normalized-form</ax215:attributeName>
                  <ax215:attributeValue>person:Jeroen Dijsselbloem</ax215:attributeValue>
                  <ax215:attributeWeight>1.0</ax215:attributeWeight>
               </ax215:attributeList>
               <ax215:attributeList xsi:type="ax215:CarpIdentifier_Attribute">
                  <ax215:attributeName>type</ax215:attributeName>
                  <ax215:attributeValue>named-entity</ax215:attributeValue>
                  <ax215:attributeWeight>-1.0</ax215:attributeWeight>
               </ax215:attributeList>
               <ax215:entityValue>Jeroen Dijsselbloem</ax215:entityValue>
               <ax215:index>1</ax215:index>
               <ax215:occurences>1</ax215:occurences>
            </ax215:entityList>
            <ax215:entityList xsi:type="ax215:CarpIdentifier_Entity">
               <ax215:attributeList xsi:type="ax215:CarpIdentifier_Attribute">
                  <ax215:attributeName>subtype</ax215:attributeName>
                  <ax215:attributeValue>organization</ax215:attributeValue>
                  <ax215:attributeWeight>1.0</ax215:attributeWeight>
               </ax215:attributeList>
               <ax215:attributeList xsi:type="ax215:CarpIdentifier_Attribute">
                  <ax215:attributeName>role</ax215:attributeName>
                  <ax215:attributeValue>ministerie</ax215:attributeValue>
                  <ax215:attributeWeight>1.0</ax215:attributeWeight>
               </ax215:attributeList>
               <ax215:attributeList xsi:type="ax215:CarpIdentifier_Attribute">
                  <ax215:attributeName>type</ax215:attributeName>
                  <ax215:attributeValue>named-entity</ax215:attributeValue>
                  <ax215:attributeWeight>-1.0</ax215:attributeWeight>
               </ax215:attributeList>
               <ax215:entityValue>Financiën</ax215:entityValue>
               <ax215:index>2</ax215:index>
               <ax215:occurences>1</ax215:occurences>
            </ax215:entityList>
            <ax215:entityList xsi:type="ax215:CarpIdentifier_Entity">
               <ax215:attributeList xsi:type="ax215:CarpIdentifier_Attribute">
                  <ax215:attributeName>subtype</ax215:attributeName>
                  <ax215:attributeValue>person</ax215:attributeValue>
                  <ax215:attributeWeight>2.0</ax215:attributeWeight>
               </ax215:attributeList>
               <ax215:attributeList xsi:type="ax215:CarpIdentifier_Attribute">
                  <ax215:attributeName>normalized-form</ax215:attributeName>
                  <ax215:attributeValue>person:Jeroen René Victor Anton</ax215:attributeValue>
                  <ax215:attributeWeight>1.0</ax215:attributeWeight>
               </ax215:attributeList>
               <ax215:attributeList xsi:type="ax215:CarpIdentifier_Attribute">
                  <ax215:attributeName>type</ax215:attributeName>
                  <ax215:attributeValue>named-entity</ax215:attributeValue>
                  <ax215:attributeWeight>-1.0</ax215:attributeWeight>
               </ax215:attributeList>
               <ax215:entityValue>Jeroen René Victor Anton</ax215:entityValue>
               <ax215:index>3</ax215:index>
               <ax215:occurences>2</ax215:occurences>
            </ax215:entityList>
            <ax215:entityList xsi:type="ax215:CarpIdentifier_Entity">
               <ax215:attributeList xsi:type="ax215:CarpIdentifier_Attribute">
                  <ax215:attributeName>subtype</ax215:attributeName>
                  <ax215:attributeValue>city</ax215:attributeValue>
                  <ax215:attributeWeight>2.0</ax215:attributeWeight>
               </ax215:attributeList>
               <ax215:attributeList xsi:type="ax215:CarpIdentifier_Attribute">
                  <ax215:attributeName>type</ax215:attributeName>
                  <ax215:attributeValue>named-entity</ax215:attributeValue>
                  <ax215:attributeWeight>-1.0</ax215:attributeWeight>
               </ax215:attributeList>
               <ax215:entityValue>Eindhoven</ax215:entityValue>
               <ax215:index>4</ax215:index>
               <ax215:occurences>2</ax215:occurences>
            </ax215:entityList>
            <ax215:entityList xsi:type="ax215:CarpIdentifier_Entity">
               <ax215:attributeList xsi:type="ax215:CarpIdentifier_Attribute">
                  <ax215:attributeName>subtype</ax215:attributeName>
                  <ax215:attributeValue>city</ax215:attributeValue>
                  <ax215:attributeWeight>1.0</ax215:attributeWeight>
               </ax215:attributeList>
               <ax215:attributeList xsi:type="ax215:CarpIdentifier_Attribute">
                  <ax215:attributeName>type</ax215:attributeName>
                  <ax215:attributeValue>named-entity</ax215:attributeValue>
                  <ax215:attributeWeight>-1.0</ax215:attributeWeight>
               </ax215:attributeList>
               <ax215:entityValue>Wageningen University</ax215:entityValue>
               <ax215:index>5</ax215:index>
               <ax215:occurences>3</ax215:occurences>
            </ax215:entityList>
            <ax215:entityList xsi:type="ax215:CarpIdentifier_Entity">
               <ax215:attributeList xsi:type="ax215:CarpIdentifier_Attribute">
                  <ax215:attributeName>type</ax215:attributeName>
                  <ax215:attributeValue>named-entity</ax215:attributeValue>
                  <ax215:attributeWeight>-1.0</ax215:attributeWeight>
               </ax215:attributeList>
               <ax215:entityValue>VWO</ax215:entityValue>
               <ax215:index>6</ax215:index>
               <ax215:occurences>1</ax215:occurences>
            </ax215:entityList>
            <ax215:entityList xsi:type="ax215:CarpIdentifier_Entity">
               <ax215:attributeList xsi:type="ax215:CarpIdentifier_Attribute">
                  <ax215:attributeName>type</ax215:attributeName>
                  <ax215:attributeValue>named-entity</ax215:attributeValue>
                  <ax215:attributeWeight>-1.0</ax215:attributeWeight>
               </ax215:attributeList>
               <ax215:entityValue>RK Eckartcollege</ax215:entityValue>
               <ax215:index>7</ax215:index>
               <ax215:occurences>1</ax215:occurences>
            </ax215:entityList>
            <ax215:entityList xsi:type="ax215:CarpIdentifier_Entity">
               <ax215:attributeList xsi:type="ax215:CarpIdentifier_Attribute">
                  <ax215:attributeName>type</ax215:attributeName>
                  <ax215:attributeValue>named-entity</ax215:attributeValue>
                  <ax215:attributeWeight>-1.0</ax215:attributeWeight>
               </ax215:attributeList>
               <ax215:entityValue>College Cork</ax215:entityValue>
               <ax215:index>8</ax215:index>
               <ax215:occurences>1</ax215:occurences>
            </ax215:entityList>
            <ax215:entityList xsi:type="ax215:CarpIdentifier_Entity">
               <ax215:attributeList xsi:type="ax215:CarpIdentifier_Attribute">
                  <ax215:attributeName>subtype</ax215:attributeName>
                  <ax215:attributeValue>country</ax215:attributeValue>
                  <ax215:attributeWeight>0.8</ax215:attributeWeight>
               </ax215:attributeList>
               <ax215:attributeList xsi:type="ax215:CarpIdentifier_Attribute">
                  <ax215:attributeName>role</ax215:attributeName>
                  <ax215:attributeValue>europa</ax215:attributeValue>
                  <ax215:attributeWeight>1.0</ax215:attributeWeight>
               </ax215:attributeList>
               <ax215:attributeList xsi:type="ax215:CarpIdentifier_Attribute">
                  <ax215:attributeName>type</ax215:attributeName>
                  <ax215:attributeValue>named-entity</ax215:attributeValue>
                  <ax215:attributeWeight>-1.0</ax215:attributeWeight>
               </ax215:attributeList>
               <ax215:entityValue>Ierland</ax215:entityValue>
               <ax215:index>9</ax215:index>
               <ax215:occurences>1</ax215:occurences>
            </ax215:entityList>
            <ax215:resultHtmlOutput xsi:nil="true"/>
            <ax215:resultTableHtml xsi:nil="true"/>
         </ns:return>
      </ns:identifyResponse>
   </soapenv:Body>
</soapenv:Envelope>
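To use the result, each entityList element can be turned into a value, an occurrence count and a map of attribute names to values. This sketch (Java 16+, JDK DOM parser) assumes the xsd namespace and element names shown in the example response, including the service's spelling occurences; IdentifyResult and Entity are hypothetical names. Attribute weights are ignored here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class IdentifyResult {

    static final String XSD_NS = "http://carp.tm7.nl/xsd";

    // One entity: its text, how often it occurs, and its attributes (subtype, role, ...).
    record Entity(String value, int occurrences, Map<String, String> attributes) {}

    // Text of a direct child element with this local name in the xsd namespace, or null.
    static String childText(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element e && XSD_NS.equals(e.getNamespaceURI()) && name.equals(e.getLocalName())) {
                return e.getTextContent().trim();
            }
        }
        return null;
    }

    static List<Entity> parse(String soapXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new ByteArrayInputStream(soapXml.getBytes(StandardCharsets.UTF_8)));
        List<Entity> entities = new ArrayList<>();
        NodeList list = doc.getElementsByTagNameNS(XSD_NS, "entityList");
        for (int i = 0; i < list.getLength(); i++) {
            Element entity = (Element) list.item(i);
            Map<String, String> attributes = new LinkedHashMap<>();
            NodeList attrNodes = entity.getElementsByTagNameNS(XSD_NS, "attributeList");
            for (int j = 0; j < attrNodes.getLength(); j++) {
                Element a = (Element) attrNodes.item(j);
                attributes.put(childText(a, "attributeName"), childText(a, "attributeValue"));
            }
            // The element is spelled "occurences" in the service response.
            entities.add(new Entity(childText(entity, "entityValue"),
                Integer.parseInt(childText(entity, "occurences")), attributes));
        }
        return entities;
    }
}
```

For the example response above, this would yield entities such as Eindhoven with an occurrence count of 2 and subtype city.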

Carp Identifier detailed documentation

nl.tm7.carp

Class CarpIdentifierService

  • java.lang.Object
    • nl.tm7.carp.CarpIdentifierService

  • public class CarpIdentifierService
    extends java.lang.Object
    Carp Identifier class.
    The Identifier class is used to identify a document.
    The identify web service is available for this. The web service returns a CarpIdentifier object, which contains the document identifier result.
    The CarpIdentifier object contains the CarpError object which gives the method result:
    Possible values:
    Fatal:
    1, "Properties file: /" + filename + " not found."
    1, "Properties file: /" + filename + " error. " + e.getLocalizedMessage()
    2, "No document types defined in carp.properties file."
    2, "No languages defined in properties file."
    2, "No model found for language " + locale + "."
    2, "Language model" + locale + ", internal error: " + e.getLocalizedMessage()
    3, "Resource directory: " + resourceRoot + " not found."
    3, "Internal server error."
    4, "Invalid license."
    Error:
    101, "Language " + reqlanguage + " value not allowed for this method."
    101, "Documenttype missing."
    101, "Documenttype " + reqdoctype + " not supported."
    101, "Document content missing."
    102, "Language could not be determined."
    103, "Decoding rtf document failed:"
    Warning:
    200, "License will expire in " + result + " days."
    Version:
    1.0
    Author:
    (c) Carp Technologies 2017
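The codes fall into clear groups: 1 to 4 are fatal, 101 to 103 are request errors and 200 is a warning, while the example responses in this documentation report code 0 with no description on success. A client can therefore branch on the numeric code alone. The enum below is a hypothetical client-side helper, not part of the service; treating a result that carries a warning as usable is an assumption.

```java
/** Hypothetical client-side grouping of CarpError codes, following the documented list (0 = success). */
enum CarpSeverity {
    OK, FATAL, ERROR, WARNING, UNKNOWN;

    /** Maps a CarpError code to its documented group. */
    static CarpSeverity ofCode(int code) {
        if (code == 0) {
            return OK;                  // value seen in the example responses
        } else if (code >= 1 && code <= 4) {
            return FATAL;               // properties, model, server or license problems
        } else if (code >= 101 && code <= 103) {
            return ERROR;               // problems with the request itself
        } else if (code == 200) {
            return WARNING;             // license about to expire
        }
        return UNKNOWN;                 // code not in the documented list
    }

    /** Assumption: a result is worth reading on success or on a warning only. */
    boolean resultUsable() {
        return this == OK || this == WARNING;
    }
}
```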

    • Method Summary

      Modifier and Type   Method and Description
      java.lang.String    encodedHex(java.lang.String coded)
                          Return a string in hexadecimal format, to verify various stream encoding formats.
      CarpIdentifier      identify(java.lang.String context, java.lang.String language,
                                   java.lang.String documenttype, java.lang.String document,
                                   byte[] documentBytes, java.lang.Boolean doHtmlOutput)
                          Identify a document.
      java.lang.String    version()
                          Return the version number of all components.

      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait


    • Constructor Detail

      • CarpIdentifierService

        public CarpIdentifierService()

    • Method Detail

      • identify

        public CarpIdentifier identify(java.lang.String context,
                              java.lang.String language,
                              java.lang.String documenttype,
                              java.lang.String document,
                              byte[] documentBytes,
                              java.lang.Boolean doHtmlOutput)
        Identify a document
        Parameters:
        context – Optional context for identifier configuration
        language – Language of the document content. Supported values are nl (Dutch), en (English), and ? (detect the language automatically).
        documenttype – Document type. Supported values are text/plain, text/html, and text/rtf.
        document – Document content as a string. Requires documenttype to be set; cannot be combined with documentBytes.
        documentBytes – Document content in binary form. Cannot be combined with document; documenttype is ignored.
        doHtmlOutput – If true, also return the text as colored HTML output.
        Returns:
        CarpIdentifier object with the result.
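The parameter rules above can be checked before a request is sent. The sketch below builds the body of a carp:identify element. It assumes that the request elements are named after the parameters, as in the other examples in this documentation, and that documentBytes is sent as base64 text (the usual XML form of a byte[] / xs:base64Binary value); the optional context parameter is omitted. IdentifyRequest is a hypothetical helper name.

```java
import java.util.Base64;
import java.util.Set;

/** Hypothetical client-side builder for the carp:identify request element. */
class IdentifyRequest {

    private static final Set<String> LANGUAGES = Set.of("nl", "en", "?");
    private static final Set<String> DOCUMENT_TYPES = Set.of("text/plain", "text/html", "text/rtf");

    /**
     * Exactly one of document / documentBytes must be given; documenttype is
     * required for a string document and left out for binary content.
     */
    static String build(String language, String documenttype, String document,
                        byte[] documentBytes, boolean doHtmlOutput) {
        if (!LANGUAGES.contains(language)) {
            throw new IllegalArgumentException("language must be nl, en or ?");
        }
        if ((document == null) == (documentBytes == null)) {
            throw new IllegalArgumentException("set either document or documentBytes, not both");
        }
        StringBuilder xml = new StringBuilder("<carp:identify>");
        xml.append(element("language", language));
        if (document != null) {
            if (!DOCUMENT_TYPES.contains(documenttype)) {
                throw new IllegalArgumentException("unsupported documenttype: " + documenttype);
            }
            xml.append(element("documenttype", documenttype));
            xml.append(element("document", escape(document)));
        } else {
            // Assumption: byte[] parameters travel as base64 text
            xml.append(element("documentBytes", Base64.getEncoder().encodeToString(documentBytes)));
        }
        xml.append(element("doHtmlOutput", Boolean.toString(doHtmlOutput)));
        return xml.append("</carp:identify>").toString();
    }

    private static String element(String name, String content) {
        return "<carp:" + name + ">" + content + "</carp:" + name + ">";
    }

    /** Escapes the characters that may not appear as-is in XML text content. */
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

The result still has to be placed inside a soapenv:Body that declares the carp namespace, as in the example requests.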

      • version

        public java.lang.String version()
        Return the version number of all components.
        Returns:
        Version info

      • encodedHex

        public java.lang.String encodedHex(java.lang.String coded)
        Return a string in hexadecimal format, to verify various stream encoding formats.
        Parameters:
        coded – String to be encoded
        Returns:
        String in hexadecimal
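One way to use encodedHex is to compare its output with a hex dump made on the client. The sketch below assumes that encodedHex returns the hex of the bytes the service received; its exact layout is not documented, so the comparison ignores case and whitespace. EncodingCheck is a hypothetical name, and HexFormat requires Java 17 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HexFormat;

class EncodingCheck {

    /** Lowercase hex of the bytes that text produces in the given charset. */
    static String hex(String text, Charset charset) {
        return HexFormat.of().formatHex(text.getBytes(charset));
    }

    /**
     * Compares a hex string returned by the service with the local encoding of the same text,
     * ignoring case and whitespace (the server's exact hex layout is an assumption).
     */
    static boolean sameBytes(String serverHex, String text, Charset charset) {
        return serverHex.replaceAll("\\s+", "").equalsIgnoreCase(hex(text, charset));
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9";
        System.out.println(hex(eAcute, StandardCharsets.UTF_8));      // c3a9
        System.out.println(hex(eAcute, StandardCharsets.ISO_8859_1)); // e9
    }
}
```

For é, UTF-8 gives c3a9 and ISO-8859-1 gives e9. A value such as c383c2a9 would suggest that the UTF-8 bytes were decoded as ISO-8859-1 and encoded again somewhere along the way.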

Carp Keyword Finder SOAP Service documentation

Introduction

Carp Technologies has developed a unique, improved technology for keyword extraction. Based on natural language technology and techniques developed by Carp, the keyword extractor identifies the important subjects in a text and creates tags for it. Finding the right tags is automatic: no training is needed.

This technology is available through the findKeywords SOAP service.

Each request must include the HTTP header SOAPAction: "urn:findKeywords".
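The sketch below shows one way to send such a request with the JDK's java.net.http client (Java 11 or later). Because each Carp service has its own URI, which is not given here, the endpoint is passed on the command line. FindKeywordsCall is a hypothetical name, and the Content-Type value is the standard SOAP 1.1 one rather than something this documentation specifies.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class FindKeywordsCall {

    /** Builds a findKeywords envelope with the same elements as the example request. */
    static String envelope(String language, String text) {
        // Escape XML special characters so arbitrary text keeps the envelope well-formed
        String escaped = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        return "<soapenv:Envelope xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\""
                + " xmlns:carp=\"http://carp.tm7.nl\">"
                + "<soapenv:Header/><soapenv:Body><carp:findKeywords>"
                + "<carp:language>" + language + "</carp:language>"
                + "<carp:documenttype>text/plain</carp:documenttype>"
                + "<carp:document>" + escaped + "</carp:document>"
                + "<carp:doHtmlOutput>false</carp:doHtmlOutput>"
                + "<carp:doHtmlCloud>false</carp:doHtmlCloud>"
                + "</carp:findKeywords></soapenv:Body></soapenv:Envelope>";
    }

    /** POST request carrying the envelope; SOAP 1.1 sends the action as a quoted header value. */
    static HttpRequest request(URI endpoint, String envelope) {
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "text/xml; charset=utf-8")
                .header("SOAPAction", "\"urn:findKeywords\"")
                .POST(HttpRequest.BodyPublishers.ofString(envelope, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: FindKeywordsCall <service-uri>");
            return;
        }
        HttpRequest request = request(URI.create(args[0]),
                envelope("en", "Vincent van Gogh painted Sunset at Montmajour in 1888."));
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```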

Example:

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:carp="http://carp.tm7.nl">
   <soapenv:Header/>
   <soapenv:Body>
      <carp:findKeywords>
         <carp:language>en</carp:language>
         <carp:documenttype>text/plain</carp:documenttype>
         <carp:document>
If you're good enough at what you do, it is possible to live forever.
That's a lesson to be drawn from the news out of Amsterdam last week.
A painting by the Dutch artist Vincent van Gogh, which was previously
believed to be a forgery, has been authenticated. "Sunset at Montmajour,"
a landscape painted by van Gogh in 1888, has been painstakingly studied by
experts at the Van Gogh Museum, using sophisticated
chemical-and-technological analysis. Their conclusion: It's the real thing.
It is said to be the first full-sized canvas by van Gogh to be found in
85 years. In the past, paintings by van Gogh have sold for
tens of millions of dollars apiece.
He has been dead since 1890, when, in one of many moments of despair,
he took his own life. He was only 37. Even before he entered the world,
there were omens that his might not be a conventional existence.
On March 30, 1852, Vincent van Gogh was stillborn in the Netherlands.
That was his older brother. A year later -- to the day -- a second
child was born. This child, too, was given the name Vincent. He would
be the boy who grew into an artist.
         
         </carp:document>
         <carp:doHtmlOutput>false</carp:doHtmlOutput>
         <carp:doHtmlCloud>false</carp:doHtmlCloud>
      </carp:findKeywords>
   </soapenv:Body>
</soapenv:Envelope>

Result (abbreviated):

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
   <soapenv:Body>
      <ns:findKeywordsResponse xmlns:ns="http://carp.tm7.nl">
         <ns:return xsi:type="ax211:CarpKeywordfinder" xmlns:ax211="http://carp.tm7.nl/xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <ax211:error xsi:type="ax211:CarpError">
               <ax211:code>0</ax211:code>
               <ax211:description xsi:nil="true"/>
               <ax211:severity xsi:nil="true"/>
               <ax211:stacktrace xsi:nil="true"/>
            </ax211:error>
            <ax211:language>en</ax211:language>
            <ax211:keywordList xsi:type="ax211:CarpKeywordfinder_Keyword">
               <ax211:keywordCaption>care</ax211:keywordCaption>
               <ax211:keywordOccurences>79.0</ax211:keywordOccurences>
               <ax211:keywordWeigth>0.8185521292482019</ax211:keywordWeigth>
            </ax211:keywordList>
            .....
         </ns:return>
      </ns:findKeywordsResponse>
   </soapenv:Body>
</soapenv:Envelope>
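To read the keywords from a response like the one above, a client can collect every keywordList element. The sketch below is hypothetical (Keyword and KeywordResponseParser are not part of the Carp API); it keeps the element names exactly as the service spells them, keywordOccurences and keywordWeigth, and sorting by weight is just one possible choice.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** One keyword entry from a findKeywordsResponse (hypothetical type). */
record Keyword(String caption, double occurrences, double weight) {}

class KeywordResponseParser {

    /** Reads all keywordList entries and orders them by weight, highest first. */
    static List<Keyword> parse(String soapXml) {
        Document doc = parseXml(soapXml);
        List<Keyword> keywords = new ArrayList<>();
        NodeList entries = doc.getElementsByTagNameNS("*", "keywordList");
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            keywords.add(new Keyword(
                    text(entry, "keywordCaption"),
                    // element names are spelled as in the service response
                    Double.parseDouble(text(entry, "keywordOccurences")),
                    Double.parseDouble(text(entry, "keywordWeigth"))));
        }
        keywords.sort((a, b) -> Double.compare(b.weight(), a.weight()));
        return keywords;
    }

    /** Namespace-aware parse, so elements can be matched by local name whatever their prefix. */
    private static Document parseXml(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Response is not parsable XML", e);
        }
    }

    /** Trimmed text of the first descendant with the given local name, or null if absent. */
    private static String text(Element parent, String localName) {
        NodeList nodes = parent.getElementsByTagNameNS("*", localName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }
}
```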

Carp Keyword Finder detailed documentation

nl.tm7.carp

Class CarpKeywordfinderService

  • java.lang.Object
    • nl.tm7.carp.CarpKeywordfinderService

  • public class CarpKeywordfinderService
    extends java.lang.Object
    Carp Keyword finder class.
    The Keyword finder class is used to find topics in a document.
    The web service findKeywords is available for this.

    The web service returns a CarpKeywordfinder object, which contains the document topics.
    The CarpKeywordfinder object contains the CarpError object which gives the method result:
    Possible values:
    Fatal:
    1, "Properties file: /" + filename + " not found."
    1, "Properties file: /" + filename + " error. " + e.getLocalizedMessage()
    2, "No document types defined in carp.properties file."
    2, "No languages defined in properties file."
    2, "No model found for language " + locale + "."
    2, "Language model" + locale + ", internal error: " + e.getLocalizedMessage()
    3, "Resource directory: " + resourceRoot + " not found."
    3, "Internal server error."
    4, "Invalid license."
    Error:
    101, "Language " + reqlanguage + " value not allowed for this method."
    101, "Documenttype missing."
    101, "Documenttype " + reqdoctype + " not supported."
    101, "Document content missing."
    102, "Language could not be determined."
    103, "Decoding rtf document failed:"
    Warning:
    200, "License will expire in " + result + " days."

    Version:
    1.0
    Author:
    (c) Carp Technologies 2017

Carp Wordcloud detailed documentation

nl.tm7.carp

Class CarpWordcloudService

  • java.lang.Object
    • nl.tm7.carp.CarpWordcloudService

  • public class CarpWordcloudService
    extends java.lang.Object
    Carp wordcloud class.
    The Carp wordcloud class is used to return a wordcloud for a given word.
    The web service wordcloud is available for this. The web service returns a CarpWordcloud object, which contains the wordcloud.
    The CarpWordcloud object contains the CarpError object which gives the method result:
    Possible values:
    Fatal:
    1, "Properties file: /" + filename + " not found."
    2, "No document types defined in carp.properties file."
    2, "No languages defined in properties file."
    2, "No default language defined in properties file."
    2, "Invalid default language defined in properties file."
    3, "Internal server error."
    Error:
    101, "Language " + reqlanguage + " value not allowed for this method."
    101, "Documenttype missing."
    101, "Documenttype " + reqdoctype + " not supported."
    101, "Document content missing."
    101, "Summary percentage or word count missing."
    101, "Summary percentage illegal value."
    102, "Language could not be determined."
    103, "Decoding rtf document failed:"
    Warning:
    200, "License will expire in " + result + " days."
    Version:
    1.0
    Author:
    (c) Carp Technologies 2017

Carp Summarizer

Introduction

The Summarizer automatically creates summaries of English and Dutch texts. The length of the generated summary is fully adjustable: it can be specified as a maximum number of words or as a percentage of the length of the original. A summary of a text several pages long is generated within seconds.

Each request must include the HTTP header SOAPAction: "urn:summarize".
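The two length options correspond to the maximumSummaryPercentage and maximumSummaryWordCount elements of the request. In the example request the percentage is 0 while the word count is 100, which suggests that 0 leaves an option unused. The helper below relies on that assumption and also assumes that valid percentages lie between 1 and 100; SummaryLength is a hypothetical name.

```java
/** Hypothetical value type for the two summary length options of summarize. */
record SummaryLength(int maximumSummaryPercentage, int maximumSummaryWordCount) {

    /** Limit by number of words; the percentage is left at 0, as in the example request. */
    static SummaryLength words(int count) {
        if (count <= 0) {
            throw new IllegalArgumentException("word count must be positive");
        }
        return new SummaryLength(0, count);
    }

    /** Limit by percentage of the original length (assumed valid range 1..100). */
    static SummaryLength percent(int percentage) {
        if (percentage < 1 || percentage > 100) {
            throw new IllegalArgumentException("percentage must be between 1 and 100");
        }
        return new SummaryLength(percentage, 0);
    }

    /** The two elements as they appear inside carp:summarize. */
    String toXml() {
        return "<carp:maximumSummaryPercentage>" + maximumSummaryPercentage + "</carp:maximumSummaryPercentage>"
                + "<carp:maximumSummaryWordCount>" + maximumSummaryWordCount + "</carp:maximumSummaryWordCount>";
    }
}
```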

Example:

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:carp="http://carp.tm7.nl">
   <soapenv:Header/>
   <soapenv:Body>
      <carp:summarize>
         <carp:language>en</carp:language>
         <carp:documenttype>text/plain</carp:documenttype>
         <carp:document>
Winners of 2013 National Book Awards announced
James McBride won the National Book Award for fiction Wednesday night for
"The Good Lord Bird." Each year the National Book Foundation presents awards
to winners in four categories: fiction, nonfiction, poetry and young people's
literature. The four winners were announced in a ceremony in New York hosted
by Mika Brzezinski, co-host of MSNBC's "Morning Joe." George Packer won the
nonfiction award for "The Unwinding: An Inner History of the New America."
Mary Szybist won the poetry award for "Incarnadine: Poems," while Cynthia Kadohata
won the young people's literature award for "The Thing About Luck."
Established in 1950, the National Book Award is one of the most prestigious
literary awards in the United States. Past recipients include William Faulkner,
Alice Walker, Philip Roth and Adrienne Rich. The winners were narrowed down from
a pool of 1,432 submissions. A five-judge panel of writers, literary critics and
booksellers in each category came up with a list of 10 titles announced in
September and narrowed it down to five finalists in October.
Among this year's finalists were journalists, historians, Pulitzer Prize
winners and past National Book Award winners and finalists, including
Thomas Pynchon, who won the National Book Award in 1974 for "Gravity's Rainbow,"
and Rachel Kushner, whose debut novel "Telex From Cuba" was a 2008 National Book
Award finalist.
         </carp:document>
         <carp:maximumSummaryPercentage>0</carp:maximumSummaryPercentage>
         <carp:maximumSummaryWordCount>100</carp:maximumSummaryWordCount>
         <carp:addEllipses>true</carp:addEllipses>
         <carp:addHeadings>true</carp:addHeadings>
         <carp:addTitle>true</carp:addTitle>
      </carp:summarize>
   </soapenv:Body>
</soapenv:Envelope>

Result:

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
   <soapenv:Body>
      <ns:summarizeResponse xmlns:ns="http://carp.tm7.nl">
         <ns:return xsi:type="ax21:CarpSummarizer"
             xmlns:ax21="http://carp.tm7.nl/xsd"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <ax21:error xsi:type="ax21:CarpError">
               <ax21:code>0</ax21:code>
               <ax21:description xsi:nil="true"/>
               <ax21:severity xsi:nil="true"/>
               <ax21:stacktrace xsi:nil="true"/>
            </ax21:error>
            <ax21:language>en</ax21:language>
            <ax21:summary>Winners of 2013 National Book Awards announced
James McBride won the National Book Award for fiction Wednesday night for
Each year the National Book Foundation presents awards
to winners in four categories: fiction, nonfiction, poetry and young people's
literature. … Established in 1950, the National Book Award is one of the most prestigious
literary awards in the United States. Past recipients include William Faulkner,
Alice Walker, Philip Roth and Adrienne Rich. The winners were narrowed down from
a pool of 1,432 submissions.</ax21:summary>
         </ns:return>
      </ns:summarizeResponse>
   </soapenv:Body>
</soapenv:Envelope>
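Before using the summary, a client should check the CarpError code. The sketch below is a hypothetical helper, not part of the Carp API: it treats codes 1 to 199 (the fatal and error groups in the class documentation) as failures, returns the summary for 0 and for the 200 license warning, and assumes that a missing code means success.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class SummarizeResponseParser {

    /** Returns the summary text, or throws when the CarpError code signals a fatal error or an error. */
    static String summary(String soapXml) {
        Document doc = parseXml(soapXml);
        String code = firstText(doc, "code");
        // Assumption: an absent or empty code is treated as success (0)
        int value = (code == null || code.isEmpty()) ? 0 : Integer.parseInt(code);
        if (value > 0 && value < 200) {
            throw new IllegalStateException("Carp error " + value + ": " + firstText(doc, "description"));
        }
        return firstText(doc, "summary");
    }

    /** Namespace-aware parse, so elements can be matched by local name whatever their prefix. */
    private static Document parseXml(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Response is not parsable XML", e);
        }
    }

    /** Trimmed text of the first element with the given local name, or null if absent. */
    private static String firstText(Document doc, String localName) {
        NodeList nodes = doc.getElementsByTagNameNS("*", localName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }
}
```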

Carp Summarizer detailed documentation

nl.tm7.carp

Class CarpSummarizerService

  • java.lang.Object
    • nl.tm7.carp.CarpSummarizerService

  • public class CarpSummarizerService
    extends java.lang.Object
    Carp summarizer class.
    The Carp summarizer class is used to summarize a document.
    The web service summarize is available for this. The web service returns a CarpSummarizer object, which contains the document summary.
    The CarpSummarizer object contains the CarpError object which gives the method result:
    Possible values:
    Fatal:
    1, "Properties file: /" + filename + " not found."
    2, "No document types defined in carp.properties file."
    2, "No languages defined in properties file."
    2, "No default language defined in properties file."
    2, "Invalid default language defined in properties file."
    3, "Internal server error."
    Error:
    101, "Language " + reqlanguage + " value not allowed for this method."
    101, "Documenttype missing."
    101, "Documenttype " + reqdoctype + " not supported."
    101, "Document content missing."
    101, "Summary percentage or word count missing."
    101, "Summary percentage illegal value."
    102, "Language could not be determined."
    103, "Decoding rtf document failed:"
    Warning:
    200, "License will expire in " + result + " days."
    Version:
    1.0
    Author:
    (c) Carp Technologies 2017

TM7 offers tools for, among other things:

  • Classifying Documents

    The TM7 classifier can handle small and large documents in various formats.

  • Summarizing and highlighting texts

    For more productive and objective analysis, TM7 provides a summarizer and a highlighter that focus on the essential parts of texts.

  • Entity extraction

    To check large files for entities, whether for research or for anonymization, TM7 provides tools for entity extraction.

  • Infocloud

    To detect relations within various texts, TM7 offers the Infocloud, which lets researchers see the relations between topics within texts.

For the full stack of tools, please contact TM7 or one of its consultants.

A multi-channel natural language processing tool for enterprises.