Twitter Analytics Demo

Source code
The source code for the demo is available from the S4 GitHub repository

Introduction

The following demo is based on a simple showcase that demonstrates the use of the S4 Text Analytics services in a complete application. The application monitors and analyzes social media content (recent posts on Twitter) for certain topics (in this example, '#earthquake') and the relevant entities that co-occur in the messages. The descriptions below briefly cover all phases of the application workflow and detail the use of the S4 services.

The demo consists of three steps:

  1. data provisioning by accessing the Twitter Search API
  2. analyzing the social media content via S4 Services
  3. aggregating and visualizing the results using the Google Charts API

The demo source code is also available from the S4 GitHub repository

Prerequisites

  • Registered Twitter account (twitter.com)
  • Valid S4 credentials (account, API key and secret)
  • Java runtime environment 1.6+

Accessing Twitter Data

The first step is to access data from Twitter using the Twitter Search API. To access the API we use the Twitter4j library, which requires a valid Twitter API key, API secret, Access token and Access token secret. The following section explains how to obtain these keys and tokens (skip it if you already have valid credentials). For convenience, the credentials are stored in, and read from, a configuration file 'app.properties' within the application.
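As an illustration of reading the four credentials from a properties file, here is a minimal sketch that uses only the JDK. The property names (twitter.apiKey and so on) and the TwitterCredentials class are hypothetical; the actual keys used in the demo's 'app.properties' may differ.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

/** Holds the four Twitter credentials read from a properties source. */
final class TwitterCredentials {
    final String apiKey;
    final String apiSecret;
    final String accessToken;
    final String accessTokenSecret;

    private TwitterCredentials(String apiKey, String apiSecret,
                               String accessToken, String accessTokenSecret) {
        this.apiKey = apiKey;
        this.apiSecret = apiSecret;
        this.accessToken = accessToken;
        this.accessTokenSecret = accessTokenSecret;
    }

    /** Loads the credentials; the property names are assumptions, not the demo's. */
    static TwitterCredentials load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new IllegalStateException("Cannot read credentials", e);
        }
        return new TwitterCredentials(
                require(props, "twitter.apiKey"),
                require(props, "twitter.apiSecret"),
                require(props, "twitter.accessToken"),
                require(props, "twitter.accessTokenSecret"));
    }

    /** Fails early with a clear message when a value is missing or blank. */
    private static String require(Properties props, String name) {
        String value = props.getProperty(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing property: " + name);
        }
        return value.trim();
    }
}
```

A call such as TwitterCredentials.load(new java.io.FileReader("app.properties")) would then give the values to pass on to the Twitter client configuration. Failing early on a missing value avoids a less obvious authentication error later.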

Obtaining Twitter Credentials

Follow these steps:

  1. Sign in with your Twitter account at https://dev.twitter.com/
  2. Select My Applications (from the top-right corner)
  3. Create a new application for this demo
  4. After you create your new application you will be redirected to its page (Details tab).
  5. Switch to the API Keys tab and you will see your application API Key and API Secret.
  6. Create an Access token and Access token secret by clicking on the Create my access token button.

Now you have a valid Twitter API key, API secret, Access token and Access token secret, which are used to make calls to the Twitter APIs. Note that the "read only" access level is sufficient for the demo.

Search and Acquisition of Twitter Data

The following code snippets use the Twitter4j library to access the Twitter Search API:

  • Create and configure a Twitter client instance with the proper credentials:
  • Prepare a query
  • Execute the query

This is the code for the saveTweetIntoFile(Status tweet) method from the previous step:
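The original listing is not reproduced on this page. As a rough stand-in, the sketch below writes one tweet per file using only the JDK. It is not the demo's actual method: the signature is hypothetical and takes the tweet id and text directly instead of a Twitter4j Status object (whose getId() and getText() accessors could supply them), and the TweetStore class name, the folder argument and the '<id>.txt' file naming are assumptions.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

final class TweetStore {

    /**
     * Writes the tweet content to "<folder>/<tweetId>.txt" as UTF-8
     * and returns the file that was written.
     */
    static File saveTweetIntoFile(File folder, long tweetId, String content) {
        // Create the target folder on first use.
        if (!folder.isDirectory() && !folder.mkdirs()) {
            throw new IllegalStateException("Cannot create folder: " + folder);
        }
        File target = new File(folder, tweetId + ".txt");
        try {
            Writer out = new OutputStreamWriter(new FileOutputStream(target), "UTF-8");
            try {
                out.write(content);
            } finally {
                out.close();
            }
        } catch (IOException e) {
            throw new IllegalStateException("Cannot write tweet " + tweetId, e);
        }
        return target;
    }
}
```

Using the tweet id as the file name keeps the files unique and makes a repeated run overwrite a tweet rather than duplicate it.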

S4 Processing

In this phase the raw tweet content is sent to the text analysis service (Twitter IE) to extract mentions of people, organizations, locations, etc. The results of the text processing are enriched Twitter documents in JSON format, stored in a separate folder. The interaction with the S4 APIs is facilitated by the S4 Java client API library.

S4 Credentials

For this step you will need valid S4 credentials (account, API key and secret). Refer to the S4 Management Console guide for details on how to register with S4 and obtain API access keys.

Processing Tweets

To execute API calls we use the S4 Java API client.

Follow these steps:

  • Select the TwitIE S4 service
  • Provide your personal S4 API Key and API Secret
  • Configure the input call parameters
  • Read the tweet from the folder
  • Execute the S4 processing request
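The listings for these steps are not reproduced on this page, and the S4 Java client calls are not shown here, so the sketch below does not use that library. Instead it illustrates, with the JDK only (Java 8 or later for java.util.Base64), how the credentials and input parameters of such a request might be prepared. It assumes that the service accepts HTTP Basic authentication with the API key as user name and the secret as password, and a JSON body with 'document' and 'documentType' fields. These details, and the S4Request class itself, are assumptions to verify against the S4 documentation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class S4Request {

    /** Builds an HTTP Basic "Authorization" header value from key and secret. */
    static String basicAuthHeader(String apiKey, String apiSecret) {
        String pair = apiKey + ":" + apiSecret;
        return "Basic " + Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    /** Builds the JSON request body; the field names are assumptions. */
    static String requestBody(String tweetText, String documentType) {
        return "{\"document\":\"" + escapeJson(tweetText)
                + "\",\"documentType\":\"" + escapeJson(documentType) + "\"}";
    }

    /** Escapes the characters that may not appear raw in a JSON string. */
    static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }
}
```

In the demo itself the S4 Java client takes care of these details; the sketch only shows what a processing request needs to carry: the credentials, the tweet content and its document type.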

Visualization

To make real use of the results from the text analysis, in this phase we aggregate the information and export it in various formats for visualization. For simplicity we use the rich variety of widgets and charts from the Google Charts tools.

Content Aggregation

We use a simple data model for representing the annotated results. The workflow comprises three steps: parsing the source data, aggregating the data, and storing the result.

Each visualization (see the next sections) represents a different aspect of the data, so a different aggregation function is applied. The result of each function is a JavaScript object or code fragment stored in a file, in the form required by the corresponding visualization chart. Technically, each visualization is a pair of files: a static HTML page and a .js file. The latter is generated by the aggregation processing module.

Here follows an example of the aggregated result used by the pie chart below:
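The original example output is not reproduced on this page. To show the kind of aggregation involved, here is a minimal sketch that counts entities per type and renders the counts as a JavaScript array literal. The PieChartAggregator class, the variable name and the column labels are hypothetical, and the demo's actual output format may differ.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PieChartAggregator {

    /** Counts how often each entity type occurs; keys are sorted by name. */
    static Map<String, Integer> countByType(List<String> entityTypes) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String type : entityTypes) {
            Integer old = counts.get(type);
            counts.put(type, old == null ? 1 : old + 1);
        }
        return counts;
    }

    /** Renders the counts as "var <name> = [[header], [row], ...];". */
    static String toJavaScript(String variableName, Map<String, Integer> counts) {
        StringBuilder js = new StringBuilder();
        js.append("var ").append(variableName).append(" = [\n");
        js.append("  ['Type', 'Count']");
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            // Escape backslashes and single quotes so the label stays a valid JS string.
            String label = e.getKey().replace("\\", "\\\\").replace("'", "\\'");
            js.append(",\n  ['").append(label).append("', ")
              .append(e.getValue()).append("]");
        }
        js.append("\n];\n");
        return js.toString();
    }
}
```

An array of this shape, with a header row followed by data rows, is what google.visualization.arrayToDataTable accepts, so the static HTML page can load the generated .js file and pass the variable straight to the chart.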

Occurrence of a Topic Over Time

The sample data in this demo is based on a Twitter query for the #earthquake hashtag, executed on 30-Jul-2014. This interactive widget plots the number of mentions of 'earthquake' over time. It allows zooming in and out of the diagram to show different levels of detail.
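Counting mentions over time amounts to grouping tweet timestamps into fixed-size buckets. The sketch below shows one way to do this, assuming the creation time of each tweet is available as epoch milliseconds. The TimeSeriesAggregator class and the one-hour bucket size are illustrative choices, not taken from the demo.

```java
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

final class TimeSeriesAggregator {

    static final long HOUR_MILLIS = 60L * 60L * 1000L;

    /**
     * Maps the start of each bucket (epoch millis) to the number of
     * timestamps that fall into it. Timestamps are assumed to be non-negative.
     */
    static SortedMap<Long, Integer> countPerBucket(List<Long> timestampsMillis,
                                                   long bucketMillis) {
        SortedMap<Long, Integer> buckets = new TreeMap<Long, Integer>();
        for (long ts : timestampsMillis) {
            // Round down to the start of the bucket containing this timestamp.
            long start = ts - (ts % bucketMillis);
            Integer old = buckets.get(start);
            buckets.put(start, old == null ? 1 : old + 1);
        }
        return buckets;
    }
}
```

Because the result is a sorted map, its entries can be written out in chronological order as rows for a time-based chart.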

Distribution of Locations

The following geo chart shows the popularity of geo locations detected in the tweets. The figure below indicates the places where earthquakes are a hot topic.

Top 10 Entities

Here we don't take into consideration the type of the entities being recognized, so naturally the hashtags dominate.
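Selecting the most frequent entities regardless of type is a sort-and-truncate over the occurrence counts. A minimal sketch, assuming the counts are already available as a map from entity text to number of occurrences (the TopEntities class and the alphabetical tie-breaking rule are illustrative, not taken from the demo):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class TopEntities {

    /** Returns up to 'limit' entries, most frequent first; ties sorted by name. */
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int limit) {
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<Map.Entry<String, Integer>>(counts.entrySet());
        Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
            @Override
            public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                int byCount = b.getValue().compareTo(a.getValue()); // descending
                return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
            }
        });
        int size = Math.max(0, Math.min(limit, entries.size()));
        return new ArrayList<Map.Entry<String, Integer>>(entries.subList(0, size));
    }
}
```

Calling TopEntities.top(counts, 10) yields the rows for a top-10 chart. The explicit tie-breaker keeps the output stable between runs.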

Summary of All Entities

A simple table widget summarizing the occurrences of entities of any type, allowing sorting by representation (not normalized), type, and number of occurrences.

Next Steps

Register for an S4 developer account (if you don't already have one), download the demo source code from the S4 GitHub repository and start annotating Tweets and visualising trends!
