When I started working with Hadoop, one of the things I always wanted to do was load the live Twitter feed into HDFS. Why load the Twitter feed? Twitter data has many use cases, such as analyzing brand sentiment, the performance of a movie, or sports events and leagues like FIFA, NBA, NFL, etc. The soccer World Cup is becoming the biggest social media event ever, and social media use keeps growing day by day for almost everything, so it is very important for any organization to analyze Twitter data, whether for marketing or for any other purpose.
When I started working with Hadoop I had no working experience in Java, so along with Hadoop I am also learning a little Java. To load the Twitter feed I used Flume. These are the steps I followed to load the Twitter feed into HDFS via Flume:
- First, I created a Twitter application from my own Twitter account at apps.twitter.com and then generated the API Key/Secret and Access Token/Secret.
- A custom Twitter Flume source streams the Twitter data into HDFS. The source uses the Twitter Streaming API through the Twitter4J library, which returns a JSON structure for every tweet, and each tweet is then stored in HDFS. We got the custom source jar file from one of our support engineers. Initially I got this error: "java.lang.UnsupportedClassVersionError: poc/hortonworks/flume/source/twitter/TwitterSource : Unsupported major.minor version 51.0". After switching to the correct version of the JDK/JRE, it worked as expected.
- I modified the flume-env.sh file so that the custom jar file is on the classpath.
- I created the Flume config file and added my key/secret along with the other Flume configuration details.
- To start streaming, I ran the Flume agent from the command line.
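The "Unsupported major.minor version 51.0" error in the steps above means the jar was compiled for a newer Java release than the one running Flume. Every .class file starts with the magic number 0xCAFEBABE followed by a minor and a major version number, and major version 51 corresponds to Java 7. The following is a minimal sketch of how to read that header. The function names and the sample bytes are hypothetical and only for illustration; they are not part of the custom source.

```python
import struct

# Class-file major versions and the Java release that introduced them.
MAJOR_TO_JAVA = {49: "Java 5", 50: "Java 6", 51: "Java 7", 52: "Java 8"}


def class_file_version(header: bytes):
    """Return (major, minor) from the first 8 bytes of a .class file."""
    # Layout: 4-byte magic, 2-byte minor, 2-byte major, all big-endian.
    magic, minor, major = struct.unpack(">IHH", header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major, minor


def required_java(header: bytes) -> str:
    """Name the oldest Java release able to load this class."""
    major, _minor = class_file_version(header)
    return MAJOR_TO_JAVA.get(major, f"unknown (major version {major})")


# Hypothetical header of a class compiled for Java 7 (major 51, minor 0).
sample_header = bytes.fromhex("CAFEBABE00000033")
print(required_java(sample_header))  # Java 7
```

To check a real class, extract it from the jar and pass in the first 8 bytes of the file, for example `open(path, "rb").read(8)`. Running `javap -verbose` on the class also prints its major version. If the reported release is newer than the JRE that Flume runs on, the class will fail to load with this error.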
In this POC, once the custom jar issue was fixed I didn't run into any other problems, and I was able to stream the tweet feed into HDFS. I then created an external table in Hive to view and analyze the Twitter data. I used the commands below to create the Hive table -
add jar json-serde-1.1.9.3-SNAPSHOT-jar-with-dependencies.jar;
CREATE EXTERNAL TABLE tweets (
  tweetmessage STRING,
  createddate STRING,
  geolocation STRING,
  user struct<
    userlocation:STRING,
    id:STRING,
    name:STRING,
    screenname:STRING,
    geoenabled:STRING>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/user/twitter';
For the Hive external table I downloaded the JSON SerDe jar file and added it as shown above; otherwise the table is not created and Hive gives an error.
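The JSON SerDe reads one JSON object per line and maps its keys to the column names of the table, so the files under /user/twitter need keys that match the table definition. The post does not show the actual output of the custom source, so the record below is a hypothetical example inferred only from the column names, and the helper function is an illustration rather than part of the setup. A minimal sketch for checking one line against the table:

```python
import json

# Column names taken from the CREATE EXTERNAL TABLE statement.
EXPECTED_COLUMNS = {"tweetmessage", "createddate", "geolocation", "user"}
EXPECTED_USER_FIELDS = {"userlocation", "id", "name", "screenname", "geoenabled"}

# Hypothetical record: all values are made up for illustration.
record = {
    "tweetmessage": "Watching the World Cup final!",
    "createddate": "2014-07-13 19:00:00",
    "geolocation": "",
    "user": {
        "userlocation": "Chicago",
        "id": "12345",
        "name": "Example User",
        "screenname": "example_user",
        "geoenabled": "false",
    },
}


def matches_table(line: str) -> bool:
    """Check that one JSON line carries every key the tweets table declares."""
    obj = json.loads(line)
    if not isinstance(obj, dict) or not isinstance(obj.get("user"), dict):
        return False
    return (EXPECTED_COLUMNS.issubset(obj)
            and EXPECTED_USER_FIELDS.issubset(obj["user"]))


line = json.dumps(record)  # serialized as a single line, one object per line
print(matches_table(line))
```

As far as I know, a key that is missing from a record does not make the query fail; the column simply comes back as NULL. A check like this is mainly useful when a query on the table returns unexpected NULL values.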
I would love to see your feedback and comments on this.