Real-Time Financial Market Data Processing and Prediction application
Astronomy Broker based on Apache Spark
Structured Streaming applied to robot data from a ROS-Gazebo simulation environment using Apache Spark. Data is collected in Kafka, analyzed by Apache Spark, and stored in Cassandra.
Ingests real-time tweets from the Twitter API using Tweepy into Kafka, processes them with Apache Spark Structured Streaming and TextBlob sentiment analysis, then loads the results into the time-series database InfluxDB for monitoring in a Grafana dashboard.
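A minimal sketch of this kind of Kafka-to-Cassandra pipeline is shown below. The topic, keyspace, table, and telemetry schema are hypothetical, and writing to Cassandra assumes the Spark Cassandra connector is on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("robot-telemetry").getOrCreate()

# Hypothetical schema for robot telemetry messages published to Kafka.
schema = StructType([
    StructField("robot_id", StringType()),
    StructField("x", DoubleType()),
    StructField("y", DoubleType()),
    StructField("ts", TimestampType()),
])

# Read the raw Kafka stream and parse the JSON payload.
telemetry = (spark.readStream
             .format("kafka")
             .option("kafka.bootstrap.servers", "localhost:9092")
             .option("subscribe", "robot_telemetry")
             .load()
             .select(from_json(col("value").cast("string"), schema).alias("m"))
             .select("m.*"))

# Write each micro-batch to Cassandra via the Spark Cassandra connector.
def write_to_cassandra(batch_df, batch_id):
    (batch_df.write
     .format("org.apache.spark.sql.cassandra")
     .mode("append")
     .options(table="telemetry", keyspace="robotics")
     .save())

query = (telemetry.writeStream
         .foreachBatch(write_to_cassandra)
         .option("checkpointLocation", "/tmp/checkpoints/telemetry")
         .start())
query.awaitTermination()
```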
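As a rough sketch of the scoring step, the snippet below applies a TextBlob polarity UDF to tweet text read from Kafka; the broker address and topic name are placeholders, and the InfluxDB/Grafana sink is omitted (it would typically live in a foreachBatch writer instead of the console sink used here).

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import DoubleType
from textblob import TextBlob

spark = SparkSession.builder.appName("tweet-sentiment").getOrCreate()

# UDF computing TextBlob polarity in [-1, 1] for each tweet's text.
@udf(returnType=DoubleType())
def polarity(text):
    return TextBlob(text).sentiment.polarity if text else None

# Tweets are assumed to arrive on a Kafka topic as plain-text values.
tweets = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "tweets")
          .load()
          .select(col("value").cast("string").alias("text")))

scored = tweets.withColumn("sentiment", polarity(col("text")))

# For a quick check, print scores to the console; an InfluxDB writer would
# typically replace this sink via foreachBatch.
query = (scored.writeStream
         .format("console")
         .option("truncate", "false")
         .start())
query.awaitTermination()
```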
Databricks Real-Time Fintech Monitoring Pipeline: Hands-on lab to build a streaming fraud detection system using Auto Loader, watermarked deduplication, stream-static joins, and windowed rules engines in Databricks. Covers dual-SLA architecture for real-time alerts and batch compliance reporting.
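A compact sketch of the core streaming techniques named above (watermarked deduplication, a stream-static join, and a windowed rule) follows. Paths, table names, schema, and thresholds are illustrative; on Databricks the source would normally be Auto Loader (the cloudFiles format) rather than the plain file source used here.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window, sum as _sum

spark = SparkSession.builder.appName("fraud-rules").getOrCreate()

# Hypothetical stream of card transactions landing as JSON files.
txns = (spark.readStream
        .format("json")
        .schema("txn_id STRING, card_id STRING, amount DOUBLE, event_time TIMESTAMP")
        .load("/data/txns"))

# Watermarked deduplication: drop replays of the same txn_id arriving
# within the 10-minute lateness bound.
deduped = (txns
           .withWatermark("event_time", "10 minutes")
           .dropDuplicates(["txn_id", "event_time"]))

# Stream-static join against a static lookup table of risky cards (assumed).
risky = spark.read.table("risky_cards")
flagged = deduped.join(risky, "card_id", "left_semi")

# Windowed rule: alert when a risky card spends more than 10,000 in 5 minutes.
alerts = (flagged
          .groupBy(window(col("event_time"), "5 minutes"), col("card_id"))
          .agg(_sum("amount").alias("spend"))
          .where(col("spend") > 10000))

query = (alerts.writeStream
         .outputMode("update")
         .format("console")
         .option("checkpointLocation", "/tmp/checkpoints/fraud")
         .start())
query.awaitTermination()
```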
Real-time ETL pipeline for financial data (Kafka, PySpark).
Databricks PySpark Certification Prep Lab: Build an e-commerce analytics pipeline covering Spark DataFrame API, Structured Streaming, data skew handling with salting, broadcast joins, and Pandas UDFs. Designed for the Databricks Certified Associate Developer for Apache Spark exam.
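The skew-handling and join techniques mentioned in that lab can be sketched as below: salting spreads a hot join key across several partitions, while a broadcast join avoids the shuffle entirely when the dimension table is small. Table paths, column names, and the bucket count are assumptions for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, rand, floor, broadcast, explode, array, lit

spark = SparkSession.builder.appName("skew-salting").getOrCreate()

# Hypothetical tables: a large fact table skewed on customer_id and a
# smaller customer dimension.
orders = spark.read.parquet("/data/orders")
customers = spark.read.parquet("/data/customers")

SALT_BUCKETS = 8

# Salt the skewed side: append a random bucket 0..7 to the join key.
salted_orders = orders.withColumn("salt", floor(rand() * SALT_BUCKETS).cast("int"))

# Explode the small side so every (customer_id, salt) pair exists.
salted_customers = customers.withColumn(
    "salt", explode(array([lit(i) for i in range(SALT_BUCKETS)])))

# Join on the composite key; hot customer_ids are now spread over 8 partitions.
joined = salted_orders.join(salted_customers, ["customer_id", "salt"]).drop("salt")

# When the dimension fits in memory, a broadcast join sidesteps the problem.
joined_bcast = orders.join(broadcast(customers), "customer_id")
```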
Development scaffold for test-driven PySpark Structured Streaming with fast local testing.
Efficiently tackle large datasets and perform big data analysis with Spark and Python
Structured Spark Streaming with Apache Kafka and Twitter
A framework for incremental streaming joins and incremental streaming aggregations over change data feeds from Databricks Delta
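The starting point for such a framework is an incremental read of a Delta table's change data feed. The sketch below assumes the source table was created with delta.enableChangeDataFeed = true; the table names and checkpoint path are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("cdf-incremental").getOrCreate()

# Stream the change data feed of a Delta table; each row carries
# _change_type / _commit_version metadata columns.
changes = (spark.readStream
           .format("delta")
           .option("readChangeFeed", "true")
           .option("startingVersion", 0)
           .table("sales.orders"))

# Keep only inserts and post-update images for incremental processing downstream.
upserts = changes.where(col("_change_type").isin("insert", "update_postimage"))

query = (upserts.writeStream
         .format("delta")
         .option("checkpointLocation", "/tmp/checkpoints/orders_cdf")
         .toTable("sales.orders_incremental"))
query.awaitTermination()
```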
Demo Spark Structured Streaming + Apache Kafka + Apache Cassandra
An end-to-end real-time analytics and anomaly detection pipeline with PySpark Structured Streaming on user activity logs from Kafka.
End-to-end big data pipeline for Amazon product reviews: Spark batch ETL + feature engineering, MLflow-tracked sentiment (TF-IDF + LogReg) and fraud detection (GBT + behavioral features) models, Spark Structured Streaming scorer, and a FastAPI dashboard.
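A possible shape for the streaming scorer in such a pipeline is to load an MLflow-tracked model as a Spark UDF and apply it to the incoming stream, as sketched below. The model URI, schema, and input path are placeholders, not the repository's actual configuration.

```python
import mlflow.pyfunc
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, struct

spark = SparkSession.builder.appName("review-scorer").getOrCreate()

# Load an MLflow-tracked sentiment model as a Spark UDF (URI is a placeholder).
sentiment_udf = mlflow.pyfunc.spark_udf(
    spark, model_uri="models:/review_sentiment/Production", result_type="double")

# Incoming reviews, e.g. landed as JSON files or parsed from a Kafka topic.
reviews = (spark.readStream
           .format("json")
           .schema("review_id STRING, text STRING")
           .load("/data/reviews"))

# Score each micro-batch with the registered model.
scored = reviews.withColumn("sentiment", sentiment_udf(struct(col("text"))))

query = scored.writeStream.format("console").start()
query.awaitTermination()
```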
This project links a MongoDB cluster and a Kafka cluster with a standalone PySpark cluster, all running locally.
Streaming and analyzing data from smart systems using Apache Kafka and Apache Spark.
Structured Streaming app that reads files from a local folder as they are added, treats them as streaming data, applies transformations to the new data, and writes the results to an output directory.
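The file-source pattern that app describes can be sketched as follows; the watched folder, output path, and schema are illustrative.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("folder-stream").getOrCreate()

# Watch a local folder for newly added CSV files and treat them as a stream.
incoming = (spark.readStream
            .format("csv")
            .schema("id INT, value DOUBLE, ts TIMESTAMP")
            .option("header", "true")
            .load("file:///tmp/incoming"))

# Any DataFrame transformation applies to each newly discovered file.
transformed = incoming.where("value > 0")

# Append the results as Parquet files in the output directory.
query = (transformed.writeStream
         .format("parquet")
         .option("path", "file:///tmp/output")
         .option("checkpointLocation", "file:///tmp/checkpoints/folder")
         .outputMode("append")
         .start())
query.awaitTermination()
```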