tests.system.apache.hive.example_twitter_dag

This is an example DAG for managing Twitter data.

Module Contents

Functions

fetch_tweets()

This task should call the Twitter API and retrieve yesterday's tweets sent from and to the four Twitter users.

clean_tweets()

This is a placeholder to clean the eight files. In this step you can drop or cherry-pick columns and parts of the text.

analyze_tweets()

This is a placeholder to analyze the Twitter data, for example with a simple sentiment analysis.

transfer_to_db()

This is a placeholder to extract a summary from the Hive data and store it in MySQL.

Attributes

ENV_ID

DAG_ID

fetch

test_run

tests.system.apache.hive.example_twitter_dag.ENV_ID
tests.system.apache.hive.example_twitter_dag.DAG_ID = 'example_twitter_dag'
tests.system.apache.hive.example_twitter_dag.fetch_tweets()

This task should call the Twitter API and retrieve yesterday's tweets sent from and to the four Twitter users (Twitter_A, ..., Twitter_D). It should generate eight CSV output files, named according to the convention direction(from or to)_twitterHandle_date.csv.
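The naming convention above can be illustrated with a minimal sketch. It only builds the eight expected file names and does not call any API. The helper name `expected_output_files` and the ISO date format (YYYY-MM-DD) are assumptions made for illustration; the docstring does not specify how the date part is formatted.

```python
from datetime import date, timedelta

# The four users named in the docstring (Twitter_A .. Twitter_D).
TWITTER_HANDLES = ["Twitter_A", "Twitter_B", "Twitter_C", "Twitter_D"]
DIRECTIONS = ["from", "to"]


def expected_output_files(run_date: date) -> list[str]:
    """Return the eight CSV file names for the day before run_date.

    Hypothetical helper: it is not part of the example DAG module.
    """
    yesterday = run_date - timedelta(days=1)
    # Assumption: the date part uses ISO format, e.g. 2024-01-01.
    stamp = yesterday.isoformat()
    # Two directions times four handles gives eight files.
    return [
        f"{direction}_{handle}_{stamp}.csv"
        for direction in DIRECTIONS
        for handle in TWITTER_HANDLES
    ]


if __name__ == "__main__":
    for name in expected_output_files(date.today()):
        print(name)
```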

tests.system.apache.hive.example_twitter_dag.clean_tweets()

This is a placeholder to clean the eight files. In this step you can drop or cherry-pick columns and parts of the text.
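Since the task itself is a placeholder, the following is only one possible shape for such a cleaning step, written with the standard-library `csv` module. The function name `keep_columns`, the column names in the usage example, and the whitespace normalization are illustrative assumptions, not part of the example DAG. The sketch also assumes every row contains every requested column.

```python
import csv
import io


def keep_columns(csv_text: str, columns: list[str]) -> str:
    """Return csv_text reduced to the given columns, in the given order.

    Hypothetical helper: it is not part of the example DAG module.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Keep only the requested columns and collapse runs of whitespace
        # (including leading and trailing spaces) in each kept field.
        cleaned = {name: " ".join(row[name].split()) for name in columns}
        writer.writerow(cleaned)
    return out.getvalue()


if __name__ == "__main__":
    raw = "id,text,lang\n1,  hello   world ,en\n2,bye,fr\n"
    print(keep_columns(raw, ["id", "text"]))
```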

tests.system.apache.hive.example_twitter_dag.analyze_tweets()

This is a placeholder to analyze the Twitter data. The analysis could be as simple as a sentiment analysis using an algorithm such as bag of words, or something more complicated. You can also look at web services that perform such tasks.
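As a rough illustration of the bag-of-words idea mentioned above, the sketch below counts the words in a tweet and scores it against two small word lists. The word lists, the tokenizer, and the function names are all hypothetical and chosen only for this example; a real analysis would need a proper lexicon or a trained model.

```python
import re
from collections import Counter

# Hypothetical word lists, for illustration only.
POSITIVE_WORDS = {"good", "great", "love", "happy"}
NEGATIVE_WORDS = {"bad", "awful", "hate", "sad"}


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count each word, ignoring word order."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def sentiment_score(text: str) -> int:
    """Return positive word count minus negative word count."""
    bag = bag_of_words(text)
    positive = sum(n for word, n in bag.items() if word in POSITIVE_WORDS)
    negative = sum(n for word, n in bag.items() if word in NEGATIVE_WORDS)
    return positive - negative


if __name__ == "__main__":
    print(sentiment_score("I love this, great great day"))
```

A score above zero suggests a positive tweet and a score below zero a negative one. Words outside both lists do not affect the result.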

tests.system.apache.hive.example_twitter_dag.transfer_to_db()

This is a placeholder to extract a summary from the Hive data and store it in MySQL.

tests.system.apache.hive.example_twitter_dag.fetch
tests.system.apache.hive.example_twitter_dag.test_run
