Detect Entities Using Amazon Comprehend Service

Amazon Comprehend is a natural language processing service that analyzes free text and detects the entities in it, such as person names, locations, organizations and dates.

The Comprehend service currently supports English and Spanish.

The Comprehend service can be called from Lambda functions, and therefore through API Gateway as well. For example, an application could process news articles and identify whether certain organizations appear in them.
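The news-monitoring idea can be sketched as a small filter over the entities that Comprehend returns. This is a minimal sketch, not a full application: the helper name findWatchedOrganizations and the watch list are hypothetical, and the sample entities are copied from the first three entries of the output shown later in this article.

```javascript
// Hypothetical helper: returns the watched organization names that appear
// among the entities reported by detectEntities for one piece of text.
function findWatchedOrganizations(entities, watchList) {
    // Lower-case the watch list once so the comparison is case-insensitive.
    const watched = new Set(watchList.map(name => name.toLowerCase()));
    const found = new Set();
    for (const entity of entities) {
        if (entity.Type === 'ORGANIZATION' && watched.has(entity.Text.toLowerCase())) {
            found.add(entity.Text); // a Set keeps each name once
        }
    }
    return Array.from(found);
}

// Entities in the shape of the Entities array returned by detectEntities.
const sampleEntities = [
    { Score: 0.8724896311759949, Type: 'ORGANIZATION', Text: 'AWS', BeginOffset: 4, EndOffset: 7 },
    { Score: 0.9992467164993286, Type: 'DATE', Text: 'July 2002', BeginOffset: 33, EndOffset: 42 },
    { Score: 0.7297202348709106, Type: 'ORGANIZATION', Text: 'Amazon', BeginOffset: 86, EndOffset: 92 }
];

// 'Acme' is an assumed watch-list entry that does not occur in the sample.
console.log(findWatchedOrganizations(sampleEntities, ['amazon', 'Acme'])); // prints [ 'Amazon' ]
```

In a Lambda function, a helper like this would typically be called on data.Entities inside the detectEntities callback.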

Below is an example of how the Comprehend service can be called to detect entities in text. The text is obtained from the test event.

Note that the sample code was run in the Ireland region, where the new aws-sdk was not yet available. The aws-sdk was therefore downloaded from npm and supplied as part of the code.

The sample event carries the text, taken from Wikipedia, in its text field. The Lambda function reads it as follows:


        var AWS = require('aws-sdk'); // supplied with the code, see the note above
        var comprehend = new AWS.Comprehend({apiVersion: '2017-11-27'});

        exports.handler = (event, context, callback) => {

            var params = {
                LanguageCode: 'en', /* required - the other option is 'es' */
                Text: event.text /* required - the string to analyze for entities */
            };

            comprehend.detectEntities(params, function(err, data) {
                if (err) {
                    console.log(err, err.stack); // an error occurred
                    return callback(err);        // report the failure to Lambda
                }
                console.log(data);               // successful response
                callback(null, data);            // return the entities to the caller
            });
        };
                    
    

Output when the above code is run:


        { Entities: 
            [ { Score: 0.8724896311759949,
                Type: 'ORGANIZATION',
                Text: 'AWS',
                BeginOffset: 4,
                EndOffset: 7 },
                { Score: 0.9992467164993286,
                Type: 'DATE',
                Text: 'July 2002',
                BeginOffset: 33,
                EndOffset: 42 },
                { Score: 0.7297202348709106,
                Type: 'ORGANIZATION',
                Text: 'Amazon',
                BeginOffset: 86,
                EndOffset: 92 },
                { Score: 0.8973404169082642,
                Type: 'DATE',
                Text: 'late 2003',
                BeginOffset: 296,
                EndOffset: 305 },
                { Score: 0.9107503294944763,
                Type: 'ORGANIZATION',
                Text: 'AWS',
                BeginOffset: 311,
                EndOffset: 314 },
                { Score: 0.9996297359466553,
                Type: 'PERSON',
                Text: 'Chris Pinkham',
                BeginOffset: 354,
                EndOffset: 367 },
                { Score: 0.9968699812889099,
                Type: 'PERSON',
                Text: 'Benjamin Black',
                BeginOffset: 372,
                EndOffset: 386 },
                { Score: 0.8937766551971436,
                Type: 'ORGANIZATION',
                Text: 'Amazon',
                BeginOffset: 429,
                EndOffset: 435 },
                { Score: 0.9992678761482239,
                Type: 'DATE',
                Text: 'November 2004',
                BeginOffset: 850,
                EndOffset: 863 },
                { Score: 0.8968200087547302,
                Type: 'QUANTITY',
                Text: 'first',
                BeginOffset: 869,
                EndOffset: 874 },
                { Score: 0.7346844673156738,
                Type: 'ORGANIZATION',
                Text: 'AWS',
                BeginOffset: 875,
                EndOffset: 878 },
                { Score: 0.44130393862724304,
                Type: 'ORGANIZATION',
                Text: 'Pinkham',
                BeginOffset: 956,
                EndOffset: 963 },
                { Score: 0.9781473278999329,
                Type: 'PERSON',
                Text: 'Christoper Brown',
                BeginOffset: 983,
                EndOffset: 999 },
                { Score: 0.9949941039085388,
                Type: 'ORGANIZATION',
                Text: 'Amazon',
                BeginOffset: 1014,
                EndOffset: 1020 },
                { Score: 0.79152512550354,
                Type: 'TITLE',
                Text: 'EC2',
                BeginOffset: 1021,
                EndOffset: 1024 },
                { Score: 0.9789855480194092,
                Type: 'LOCATION',
                Text: 'Cape Town, South Africa',
                BeginOffset: 1049,
                EndOffset: 1072 } ] }
    

Each entity in the above output contains the following attributes:

  • Score: a value between 0 and 1 that shows how confident the service is that the entity was detected correctly
  • Type: the type of entity - PERSON | LOCATION | ORGANIZATION | COMMERCIAL_ITEM | EVENT | DATE | QUANTITY | TITLE | OTHER
  • Text: string representing the entity
  • BeginOffset: zero-based character position in the input text where the entity starts
  • EndOffset: character position just past the end of the entity (in the output above, 'AWS' spans 4 to 7)
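The attributes above can be combined to keep only the confident detections and to cut each entity back out of the input text. The following is a minimal sketch under stated assumptions: the helper name confidentEntities and the 0.9 threshold are hypothetical, EndOffset is treated as exclusive (as the sample output suggests), and the one-sentence input is an illustrative string chosen so that its offsets line up with the first two entities in the output. It is not the original test event.

```javascript
// Hypothetical helper: keeps entities whose Score is at least minScore and
// recovers each one from the source text by its offsets.
function confidentEntities(text, entities, minScore) {
    return entities
        .filter(entity => entity.Score >= minScore)
        .map(entity => ({
            type: entity.Type,
            text: entity.Text,
            // slice() excludes its end index, which matches an exclusive EndOffset
            fromOffsets: text.slice(entity.BeginOffset, entity.EndOffset)
        }));
}

// Illustrative input (an assumption, not the article's actual event text).
const inputText = 'The AWS platform was launched in July 2002.';

// The first two entities from the output above.
const detected = [
    { Score: 0.8724896311759949, Type: 'ORGANIZATION', Text: 'AWS', BeginOffset: 4, EndOffset: 7 },
    { Score: 0.9992467164993286, Type: 'DATE', Text: 'July 2002', BeginOffset: 33, EndOffset: 42 }
];

// With a 0.9 threshold only the DATE entity remains; 'AWS' (about 0.87) is dropped.
const kept = confidentEntities(inputText, detected, 0.9);
console.log(kept);
```

The right threshold depends on the application. In the output above, for example, 'Pinkham' is tagged as an ORGANIZATION with a score of only about 0.44, so a score filter would discard that likely misclassification.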