tests.system.apache.hive.example_twitter_dag

This is an example DAG for managing Twitter data.

Functions

`fetch_tweets`()	This task should call the Twitter API and retrieve yesterday's tweets from and to the four Twitter users.
`clean_tweets`()	This is a placeholder to clean the eight files.
`analyze_tweets`()	This is a placeholder to analyze the Twitter data.
`transfer_to_db`()	This is a placeholder to extract a summary from the Hive data and store it in MySQL.

tests.system.apache.hive.example_twitter_dag.DAG_ID = 'example_twitter_dag'

tests.system.apache.hive.example_twitter_dag.fetch_tweets(): This task should call the Twitter API and retrieve yesterday's tweets from and to the four Twitter users (Twitter_A, ..., Twitter_D). It should generate eight CSV output files, named according to the convention direction(from or to)_twitterHandle_date.csv.
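
The naming convention above can be illustrated with a minimal sketch. It assumes the four handles are Twitter_A through Twitter_D and that the date is written as YYYYMMDD; the date format and the helper name `expected_file_names` are hypothetical, since the source does not specify them.

```python
from datetime import date, timedelta

# Assumption: the four users are Twitter_A through Twitter_D.
HANDLES = ["Twitter_A", "Twitter_B", "Twitter_C", "Twitter_D"]
DIRECTIONS = ["from", "to"]


def expected_file_names(run_date: date) -> list[str]:
    """Build the eight file names for the day before run_date."""
    yesterday = run_date - timedelta(days=1)
    # Assumption: the date part uses YYYYMMDD; the source gives no format.
    stamp = yesterday.strftime("%Y%m%d")
    # Two directions times four handles gives eight names.
    return [f"{d}_{h}_{stamp}.csv" for d in DIRECTIONS for h in HANDLES]


names = expected_file_names(date(2024, 1, 2))
```

Two directions combined with four handles give the eight files mentioned in the description.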

tests.system.apache.hive.example_twitter_dag.clean_tweets(): This is a placeholder to clean the eight files. In this step you can drop or cherry-pick columns and parts of the text.
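
As a rough sketch of what such a cleaning step might look like, the snippet below keeps a chosen subset of columns and collapses repeated whitespace in each value. The column names (`id`, `text`, `lang`) and the helper `clean_rows` are hypothetical; the actual task is only a placeholder and does not define a file schema.

```python
import csv
import io


def clean_rows(raw_csv: str, keep: list[str]) -> list[dict[str, str]]:
    """Keep only the columns in `keep` and normalize whitespace in each value."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    cleaned = []
    for row in reader:
        # split() followed by join() trims the ends and collapses inner runs of spaces.
        cleaned.append({col: " ".join(row[col].split()) for col in keep})
    return cleaned


# Hypothetical sample data; a real file would come from fetch_tweets.
sample = "id,text,lang\n1,  hello   world ,en\n2,good  morning,en\n"
rows = clean_rows(sample, keep=["id", "text"])
```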

tests.system.apache.hive.example_twitter_dag.analyze_tweets(): This is a placeholder to analyze the Twitter data. The analysis could be a simple sentiment analysis using an algorithm such as bag of words, or something more complicated. You can also use web services for such tasks.
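
One very simple form of bag-of-words sentiment analysis counts the words of a tweet and compares them against small positive and negative word lists. The sketch below is illustrative only: the word lists and the function `sentiment_score` are assumptions, not part of the example DAG.

```python
import re
from collections import Counter

# Hypothetical word lists; a real analysis would use a proper lexicon or model.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def sentiment_score(text: str) -> int:
    """Return positive word count minus negative word count for the text."""
    # The bag of words: each lowercase token mapped to its number of occurrences.
    words = Counter(re.findall(r"[a-z']+", text.lower()))
    pos = sum(words[w] for w in POSITIVE)
    neg = sum(words[w] for w in NEGATIVE)
    return pos - neg


score_positive = sentiment_score("I love this, great great day")
score_negative = sentiment_score("awful, I hate it")
```

A score above zero suggests a positive tweet and a score below zero a negative one; word order and negation are ignored, which is the main limitation of this approach.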

tests.system.apache.hive.example_twitter_dag.transfer_to_db(): This is a placeholder to extract a summary from the Hive data and store it in MySQL.