How to build a PostgreSQL database to store tweets
Related GitHub files - "Analysis of Twitter" by Euge Inzaugarat
Internal notes: tweetdata/python/tweepy/
Add a project in an approved Twitter developer account to get four keys:
Consumer API key, API secret key, Access token, Access token secret
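The four keys are easier to handle if they stay out of the notebook itself. A minimal sketch, assuming they are exported as environment variables; the variable names and the load_twitter_keys helper are hypothetical, not something Twitter or tweepy prescribes:

```python
import os

# Hypothetical environment variable names; use whatever names suit your setup.
KEY_NAMES = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_twitter_keys(env=None):
    """Return the four credentials as a dict, or raise if any is missing."""
    env = os.environ if env is None else env
    # Treat unset and empty values the same way: both count as missing.
    missing = [name for name in KEY_NAMES if not env.get(name)]
    if missing:
        raise KeyError("Missing Twitter credentials: " + ", ".join(missing))
    return {name: env[name] for name in KEY_NAMES}
```

Calling load_twitter_keys() with no argument reads from the real environment; passing a plain dict makes it easy to check the behavior without touching real secrets.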
Use the free Community Edition of Databricks - create a notebook set to Scala (ex. Twitter Southeast)
Click: "Clusters > Create Cluster" and name it cluster1
Wait a couple of minutes for the cluster to appear in the Clusters list. Click the 3-dot menu and choose "Libraries > Install New"
Downloaded version 2.4.3 of spark-streaming-kafka-0-10_2.11 (get latest): https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-10_2.11/
Downloaded version 1.6.3 of spark-streaming-twitter_2.11 (get latest): https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-twitter_2.11/
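If the libraries are installed by Maven coordinate instead of an uploaded JAR, each one is written as groupId:artifactId:version. A small sketch that builds the coordinates for the two artifacts above and checks that they share a Scala suffix (the _2.11 part), which presumably needs to match the Scala version of the cluster; the helper names are made up for illustration:

```python
def maven_coordinate(group_id, artifact_id, version):
    """Join the three parts into the usual groupId:artifactId:version form."""
    return f"{group_id}:{artifact_id}:{version}"


def scala_suffix(artifact_id):
    """Return the Scala binary version after the last underscore, e.g. '2.11'."""
    return artifact_id.rsplit("_", 1)[1]


# Group ID, artifact ID, and version as they appear in the notes and URLs above.
LIBRARIES = [
    ("org.apache.spark", "spark-streaming-kafka-0-10_2.11", "2.4.3"),
    ("org.apache.spark", "spark-streaming-twitter_2.11", "1.6.3"),
]

coordinates = [maven_coordinate(*lib) for lib in LIBRARIES]
suffixes = {scala_suffix(artifact) for _, artifact, _ in LIBRARIES}

for coordinate in coordinates:
    print(coordinate)

# More than one suffix would mean the JARs were built for different Scala versions.
print("Scala suffixes:", sorted(suffixes))
```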
Amazon Managed Streaming for Apache Kafka (MSK) – Generally Available as of May 2019 (includes Apache ZooKeeper) https://aws.amazon.com/blogs/aws/amazon-managed-streaming-for-apache-kafka-msk-now-generally-available/
"Apache Kafka (Kafka) is an open-source platform that enables customers to capture streaming data like click stream events, transactions, IoT events, application and machine logs, and have applications that perform real-time analytics, run continuous transformations, and distribute this data to data lakes and databases in real time."
Next, create a cluster in Amazon Managed Streaming for Apache Kafka (MSK)
Yikes, MSK pricing says $0.21 per hour! It's not clear whether a small dataset would cost less.
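To put the $0.21 hourly figure in perspective, here is a rough back-of-the-envelope estimate. It assumes the rate is charged per broker for every hour the cluster runs, regardless of how much data flows through it, and it ignores storage and data transfer; those are assumptions to check against the current MSK pricing page, and the broker counts are only illustrative:

```python
HOURLY_RATE_USD = 0.21  # figure quoted in the note above


def monthly_cost(hourly_rate, brokers=1, hours_per_day=24, days=30):
    """Estimate the compute cost of leaving the cluster running all month."""
    return round(hourly_rate * brokers * hours_per_day * days, 2)


# One broker running around the clock for 30 days.
print(monthly_cost(HOURLY_RATE_USD))

# Illustrative only: the same estimate for three brokers.
print(monthly_cost(HOURLY_RATE_USD, brokers=3))
```

Under these assumptions a single broker left running comes to about $151 for a 30-day month, so the size of the dataset would matter less than how long the cluster stays up.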