CASE STUDY

High-Volume Real-Time Data Processing and Analytics System

·

Objective

To design and implement a robust system capable of handling high-frequency (250,000 QPS) and high-cardinality data in the context of real-time bidding (RTB) for digital advertising. The system needed to ingest and process various RTB event types (bid requests, bid responses, wins, losses, impressions, clicks, postbacks, demand ad requests/responses, and arbitrary events), deduplicate records, batch the data, and store it for analytics. The stored data was then used to inform machine learning (ML) models for real-time pricing adjustments based on historical performance.

Outcome

The system successfully handled 250,000 QPS with events having high cardinality dimensions, ensuring accurate data processing and real-time decision-making. By integrating ML models and leveraging BI tools, the system allowed for continuous optimization of RTB bid prices based on past performance, improving overall profitability and efficiency in the ad exchange environment. The scalable architecture ensured that the system could handle future traffic growth and evolving business needs.

Technologies Used

  • Nginx and OpenResty/Lua for handling the RTB event ingestion endpoints.
  • Redis for caching and parameter storage.
  • Docker for containerizing microservices.
  • AWS ECS for compute management.
  • Kinesis for real-time data streaming.
  • S3 for batch storage of raw data.
  • Redshift, Snowflake, and Databricks Spark for data warehousing and processing.
  • Postgres for storing historical parameters and feeding them back to Redis.
  • Tableau for Business Intelligence (BI) and visualization.
  • AWS SageMaker for machine learning pipeline management and model training and hosting.

Problem Statement

The system required the ability to record, process, analyze, and act on high-frequency event data generated by an RTB system, with a focus on scalability, low-latency, and data quality (through deduplication and batching). The goal was to make sure the system:

  1. Could handle massive traffic at scale.
  2. Delivered deduplicated, high-quality data to an analytics pipeline.
  3. Enabled real-time decisions to be informed by ML models based on historical data.
  4. Provided insights via BI tools like Tableau for monitoring and optimization.

System Architecture

  1. Ingestion Layer:
    • Nginx with OpenResty/Lua was used to handle all incoming RTB events (e.g., bid requests, bid responses, wins, losses, impressions, clicks, postbacks). Each event was routed to its respective endpoint through OpenResty scripts to ensure low-latency processing and efficient handling of high-traffic volumes.
  2. Data Streaming and Deduplication:
    • Kinesis Firehose was used to capture and buffer incoming events in real time. Kinesis provided the throughput needed to handle up to 250,000 QPS, ensuring smooth ingestion and buffering of data to downstream systems.
    • Custom Lua scripts embedded in the ingestion layer were responsible for checking each incoming event for duplication before sending it to Kinesis. This ensured only unique events were passed down the pipeline.
  3. Batching and Storage:
    • Amazon S3 was used as the primary storage solution for the raw, deduplicated data. Event records were batched and written to S3 in micro-batches to minimize write overhead while ensuring data consistency.
    • Events were periodically aggregated into larger files, reducing the overhead associated with high-frequency small writes.
  4. Data Warehousing:
    • Redshift and Snowflake were used for structured data analysis. Data was periodically loaded from S3 into these data warehouses, where it could be queried and analyzed.
    • Databricks Spark was employed to process large datasets, perform ETL operations, and prepare the data for ML modeling and advanced analytics. Spark’s distributed processing allowed for efficient handling of terabytes of data.
  5. Analytics and Visualization:
    • Tableau was integrated as the BI tool for real-time monitoring and historical analysis. Key performance indicators (KPIs) related to RTB events, such as win rates, bid response times, and click-through rates, were visualized in customizable dashboards. This allowed stakeholders to continuously monitor system performance and adjust their bidding strategies.
  6. Machine Learning Integration:
    • The historical performance data from Redshift and Snowflake was fed into an Sagemaker pipeline that calculated optimal bid prices. The model used parameters like win rates, click-through rates, and bid response times to inform real-time pricing decisions.
    • Parameters calculated by the ML model were pushed to Postgres, which acted as the main data store for the current model parameters.
    • Redis was used as a low-latency cache to quickly retrieve these parameters and feed them back into the bid response pipeline, ensuring pricing decisions were based on the most up-to-date information.
  7. Feedback Loop for Optimization:
    • A continuous feedback loop was implemented whereby real-time events, after being processed and analyzed, fed back into the ML model. Redis was regularly updated with new bid price parameters, which adjusted future bid responses, optimizing bid prices based on past performance.
    • This feedback loop allowed for dynamic adjustments to bidding strategies, improving the efficiency and profitability of the RTB system.

Key Features and Challenges

  1. Scalability:
    • Handling 250,000 QPS was a significant challenge. To address this, Nginx/OpenResty with Lua was optimized for low-latency, high-throughput processing, and Kinesis was chosen for its ability to scale elastically with traffic.
  2. Data Deduplication:
    • Deduplication was performed at the ingestion layer using Lua scripts and within the streaming and batch processes. This ensured that the dataset was accurate and that data analysis was based on clean, high-quality data.
  3. Batching and Storage Optimization:
    • Storing large amounts of high-frequency data in S3 required efficient batching strategies. Data was written to S3 in compressed batches, with each event tagged for retrieval and analysis downstream.
  4. Real-Time vs Batch Processing:
    • The system balanced real-time processing needs with batch analytics. Openresty handled real-time data streams for immediate processing, while S3 and the data warehouses managed historical data analysis through periodic batch uploads.
  5. Integration with Machine Learning:
    • The continuous feedback loop between the Sagemaker models, Redis, and Postgres ensured that bidding strategies were dynamically updated based on the latest data. The low-latency architecture of Redis enabled rapid access to real-time pricing parameters, ensuring quick bid responses.
  6. Visualization and Insights:
    • Tableau provided stakeholders with an intuitive, visual interface for analyzing the system’s performance. Custom dashboards tracked everything from bid request volumes to conversion rates, offering deep insights into the effectiveness of the bidding strategy.