Saturday, June 28, 2014

My first tweet feed in Hadoop

When I started working in Hadoop, one of the things I always wanted to accomplish was to load the live Twitter feed into HDFS. Why load the Twitter feed? There are many use cases for Twitter data, like analyzing brand sentiment, the performance of a movie, or sports events like the FIFA World Cup, NBA, NFL, etc. The soccer World Cup is becoming one of the biggest social media events ever, and social media use keeps increasing for almost everything, so having Twitter data to analyze is very important for any organization, whether for marketing or for other purposes.

When I started working in Hadoop I didn't have any working experience in Java, so along with Hadoop I am also learning a little bit of Java. To load the Twitter feed I used Flume. These are the steps I followed to load the Twitter feed into HDFS via Flume:
  1. First I created a Twitter application from my own Twitter account at apps.twitter.com and generated the API Key/Secret and Access Token/Secret.
  2. A custom Twitter Flume source is used for streaming Twitter data into HDFS. The Flume source uses the Twitter streaming API, which returns a JSON structure for every tweet via the Twitter4J library, and the tweets then get stored in HDFS. We got the custom source jar file from one of our support engineers. Initially I was getting the error "java.lang.UnsupportedClassVersionError: poc/hortonworks/flume/source/twitter/TwitterSource : Unsupported major.minor version 51.0" (major version 51 means the jar was compiled for Java 7); after switching to the correct version of the JDK/JRE it worked as expected.
  3. Modified the flume-env.sh file to put the custom jar file on the classpath.
  4. I created the Flume config file and added my key/secret along with the other Flume configuration details (see the sample config sketch after this list).
  5. For execution I used the command below -
             flume-ng agent -n TwitterAgent -c /etc/flume/conf -f flume.conf
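
For reference, here is a minimal sketch of what my flume.conf looked like. The source type class is taken from the error message in step 2; the property names for the source (consumerKey, accessToken, the keywords list) follow the commonly used Twitter Flume example and may differ for your custom source jar, so treat those as assumptions. The HDFS sink and memory channel properties are standard Flume settings.

# Agent components
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

# Custom Twitter source (class name from the UnsupportedClassVersionError in step 2)
TwitterAgent.sources.Twitter.type = poc.hortonworks.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <your API key>
TwitterAgent.sources.Twitter.consumerSecret = <your API secret>
TwitterAgent.sources.Twitter.accessToken = <your access token>
TwitterAgent.sources.Twitter.accessTokenSecret = <your access token secret>
TwitterAgent.sources.Twitter.keywords = worldcup, fifa, soccer

# HDFS sink writing the raw JSON tweets under /user/twitter
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.path = /user/twitter
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

# Simple in-memory channel between the source and the sink
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100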

In this POC, after fixing the custom jar issue I didn't really have any issues; I was able to stream the tweet feed into HDFS. I created an external table in Hive to view and analyze the Twitter data. I used the command below to create the Hive table -

add jar json-serde-1.1.9.3-SNAPSHOT-jar-with-dependencies.jar;

CREATE EXTERNAL TABLE tweets (
  tweetmessage STRING,
  createddate STRING,
  geolocation STRING,
  user struct<
        userlocation:STRING,
        id:STRING,
        name:STRING,
        screenname:STRING,
        geoenabled:STRING>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/user/twitter'
;
 

For the Hive external table I downloaded the JSON SerDe jar file and added it as shown above; otherwise the CREATE TABLE statement fails because Hive cannot find the SerDe class.
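
Once the table exists you can query the streamed tweets directly (after adding the SerDe jar in the new Hive session). Here is a small example query I use as a sanity check; it assumes the column names from the CREATE TABLE above (the user struct and its screenname field) match what the custom source actually writes.

-- top 10 most active users in the feed so far
SELECT user.screenname, COUNT(*) AS tweet_count
FROM tweets
GROUP BY user.screenname
ORDER BY tweet_count DESC
LIMIT 10;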
 
I would love to see your feedback/comments on this.
