The following demo is a simple showcase demonstrating the use of the S4 Text Analytics services in a complete application. The application monitors and analyzes social media content (recent posts on Twitter) for certain topics (in this example: '#earthquake') and the relevant entities co-occurring in the messages. The descriptions below briefly cover all phases of the application workflow, detailing the use of the S4 services.
The demo consists of three steps:
The first step is to access data from Twitter using the Twitter Search API. To access the API we use the Twitter4j library, which requires a valid Twitter API key, API secret, Access token, and Access token secret. The following section explains the procedure for obtaining such keys and tokens (skip this section if you already have valid credentials). For convenience, these artifacts are stored in and read from a configuration file 'app.properties' within the application.
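A minimal sketch of reading the credentials from 'app.properties' with the standard `java.util.Properties` class. The property names used here (`twitter.apiKey`, etc.) are illustrative assumptions; use whatever keys your own 'app.properties' defines.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public class TwitterCredentials {

    // Loads the configuration file from the given path.
    public static Properties load(String path) throws IOException {
        Properties props = new Properties();
        try (FileInputStream in = new FileInputStream(path)) {
            props.load(in);
        }
        return props;
    }

    // Usage (property names are hypothetical):
    //   Properties props = TwitterCredentials.load("app.properties");
    //   String apiKey            = props.getProperty("twitter.apiKey");
    //   String apiSecret         = props.getProperty("twitter.apiSecret");
    //   String accessToken       = props.getProperty("twitter.accessToken");
    //   String accessTokenSecret = props.getProperty("twitter.accessTokenSecret");
}
```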
Follow the steps:
Now you have a valid Twitter API key, API secret, Access token, and Access token secret, which are used to make calls to the Twitter APIs. Note that "read only" access level is sufficient for the demo.
The following code snippets use the Twitter4j library to access the Twitter Search API:
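A minimal sketch of such a retrieval loop with Twitter4j: it builds an OAuth-configured client, pages through the search results for '#earthquake', and hands each tweet to `saveTweetIntoFile(...)`. Credentials are placeholders, and running it requires the Twitter4j library and valid keys.

```java
import twitter4j.Query;
import twitter4j.QueryResult;
import twitter4j.Status;
import twitter4j.Twitter;
import twitter4j.TwitterException;
import twitter4j.TwitterFactory;
import twitter4j.conf.ConfigurationBuilder;

public class TweetCollector {

    public static void main(String[] args) throws TwitterException {
        // Credentials would normally come from app.properties (placeholders here).
        ConfigurationBuilder cb = new ConfigurationBuilder();
        cb.setOAuthConsumerKey("API_KEY")
          .setOAuthConsumerSecret("API_SECRET")
          .setOAuthAccessToken("ACCESS_TOKEN")
          .setOAuthAccessTokenSecret("ACCESS_TOKEN_SECRET")
          .setJSONStoreEnabled(true); // needed to recover the raw JSON later
        Twitter twitter = new TwitterFactory(cb.build()).getInstance();

        // Search for the demo hashtag, 100 tweets per page.
        Query query = new Query("#earthquake");
        query.setCount(100);

        // Page through the results until no further page is available.
        QueryResult result;
        do {
            result = twitter.search(query);
            for (Status tweet : result.getTweets()) {
                saveTweetIntoFile(tweet);
            }
        } while ((query = result.nextQuery()) != null);
    }

    static void saveTweetIntoFile(Status tweet) {
        // See the next step for this method's implementation.
    }
}
```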
This is the code for the saveTweetIntoFile(Status tweet) method from the previous step:
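A sketch of how the tweet-saving step might be implemented. The file-writing helper below is self-contained; the directory name is an assumption. With Twitter4j, `saveTweetIntoFile(Status tweet)` would delegate to it using the tweet ID and the raw JSON payload (note that `TwitterObjectFactory.getRawJSON(...)` only works when `jsonStoreEnabled` is set on the configuration).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TweetStore {
    // Folder for the raw tweets (name is illustrative).
    private static final Path RAW_DIR = Paths.get("tweets-raw");

    // Writes one tweet's JSON payload to <RAW_DIR>/<id>.json and returns the path.
    public static Path saveJsonIntoFile(long id, String json) throws IOException {
        Files.createDirectories(RAW_DIR);
        Path file = RAW_DIR.resolve(id + ".json");
        return Files.write(file, json.getBytes(StandardCharsets.UTF_8));
    }

    // With Twitter4j on the classpath, the method from the previous step
    // would simply delegate:
    //   static void saveTweetIntoFile(twitter4j.Status tweet) throws IOException {
    //       saveJsonIntoFile(tweet.getId(),
    //                        twitter4j.TwitterObjectFactory.getRawJSON(tweet));
    //   }
}
```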
In this phase the raw tweet content is sent to the text analysis service (Twitter IE) to extract mentions of people, organizations, locations, etc. The results of the text processing are enriched Twitter documents in JSON format, stored in a separate folder. The interaction with the S4 APIs is facilitated by the S4 Java client API library.
To execute API calls we use the S4 Java API client.
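To illustrate the shape of such a call, here is a sketch using only the JDK's `HttpURLConnection` rather than the S4 Java client itself. The endpoint URL and the request-body fields are assumptions for illustration; consult the S4 documentation (or use the S4 Java client, as the demo does) for the exact contract.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Scanner;

public class S4Call {
    // Illustrative endpoint for the Twitter IE service (check the S4 docs).
    private static final String TWITIE_URL = "https://text.s4.ontotext.com/v1/twitie";

    // Builds a minimal JSON request body for a single tweet's text
    // (field names are assumptions for illustration).
    public static String buildRequestBody(String tweetText) {
        String escaped = tweetText.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"document\":\"" + escaped + "\",\"documentType\":\"text/plain\"}";
    }

    // POSTs the body with HTTP Basic auth (S4 API key/secret) and
    // returns the JSON response with the extracted annotations.
    public static String annotate(String apiKey, String apiSecret, String body)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(TWITIE_URL).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        String auth = Base64.getEncoder()
                .encodeToString((apiKey + ":" + apiSecret).getBytes(StandardCharsets.UTF_8));
        conn.setRequestProperty("Authorization", "Basic " + auth);
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setRequestProperty("Accept", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        try (Scanner sc = new Scanner(conn.getInputStream(), "UTF-8").useDelimiter("\\A")) {
            return sc.hasNext() ? sc.next() : "";
        }
    }
}
```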
Follow the steps:
To make real use of the results from the text analysis, in this phase we aggregate the information and export it in various formats for visualization. For simplicity we make use of the rich variety of widgets and charts from the Google Charts tools.
We use a simple data model for representing the annotated results. The workflow comprises three steps: parsing the source data, aggregating the data, and storing the result.
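The aggregation and export steps can be sketched as follows, assuming a hypothetical `Entity` record holding the type and surface form of each annotation parsed from the enriched JSON. The output format mimics the row arrays that Google Charts' `arrayToDataTable` accepts; field and class names are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class EntityAggregator {

    // Minimal data model for one annotation parsed from the enriched JSON.
    public static class Entity {
        public final String type;  // e.g. "Person", "Location", "Hashtag"
        public final String text;  // surface form as it appeared in the tweet
        public Entity(String type, String text) { this.type = type; this.text = text; }
    }

    // Aggregation step: count how often each surface form occurs,
    // preserving first-encounter order.
    public static Map<String, Long> countByText(List<Entity> entities) {
        return entities.stream().collect(Collectors.groupingBy(
                e -> e.text, LinkedHashMap::new, Collectors.counting()));
    }

    // Export step: render the counts as Google Charts DataTable rows,
    // e.g. [["Entity","Count"],["#earthquake",2],...]
    public static String toChartRows(Map<String, Long> counts) {
        StringBuilder sb = new StringBuilder("[[\"Entity\",\"Count\"]");
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            sb.append(",[\"").append(e.getKey()).append("\",").append(e.getValue()).append("]");
        }
        return sb.append("]").toString();
    }
}
```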
Here follows an example of the aggregated result used by the pie chart below:
The sample data in this demo is based on a Twitter query for the #earthquake hashtag, executed on 30-Jul-2014. This interactive widget plots the number of mentions of 'earthquake' over time. It allows zooming in and out of the diagram to show different levels of detail.
The following geo chart shows the popularity of geo locations detected in the tweets. The figure below indicates the places where earthquakes are a hot topic.
Here we do not take into consideration the type of the recognized entities, and naturally the hashtags dominate.
A simple table widget summarizes the occurrences of entities of any type, allowing sorting by representation (not normalized), type, and number of occurrences.