ACTIVE
ID: ml-feature-store
ETL SPEED RACE CHALLENGE
Process 10 TB of e-commerce data with the lowest latency
REWARDS
KUDOS + SWAG
STATUS
ONGOING
PARTICIPANTS
850
STATUS
STARTED: FEB 1, 2026
ONGOING: This challenge does not expire.
PROBLEM DESCRIPTION
You must build a pipeline that processes 10 TB of e-commerce transaction data. The pipeline should:
1. Extract data from S3 (Parquet format)
2. Transform: deduplicate, enrich, and aggregate
3. Load into the target data warehouse
4. Maintain data quality (>99.9% accuracy)
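The transform step (deduplicate, enrich, aggregate) and the accuracy requirement can be illustrated with a minimal plain-Python sketch. It is not a reference solution and does not use Spark. The challenge does not publish the Parquet schemas, so the `Order` fields (`order_id`, `customer_id`, `product_id`, `amount`, `updated_at`), the region/category lookups, and the definition of "accuracy" (the fraction of reference aggregates reproduced within a tolerance) are all hypothetical assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    # Hypothetical schema: the real orders.parquet columns are not published.
    order_id: str
    customer_id: str
    product_id: str
    amount: float
    updated_at: int  # e.g. epoch seconds; a later version supersedes an earlier one


def deduplicate(orders):
    """Keep only the most recent version of each order_id (first seen wins ties)."""
    latest = {}
    for o in orders:
        current = latest.get(o.order_id)
        if current is None or o.updated_at > current.updated_at:
            latest[o.order_id] = o
    return list(latest.values())


def enrich(orders, customers, products):
    """Attach region (customer lookup) and category (product lookup).

    Unknown keys are tagged "unknown" rather than dropped, so they remain
    visible to downstream data-quality checks.
    """
    return [
        {
            "order_id": o.order_id,
            "amount": o.amount,
            "region": customers.get(o.customer_id, "unknown"),
            "category": products.get(o.product_id, "unknown"),
        }
        for o in orders
    ]


def aggregate(rows):
    """Sum revenue per (region, category)."""
    totals = defaultdict(float)
    for r in rows:
        totals[(r["region"], r["category"])] += r["amount"]
    return dict(totals)


def accuracy(result, reference, tol=1e-6):
    """Assumed metric: fraction of reference keys reproduced within tol."""
    if not reference:
        return 1.0
    matched = sum(
        1 for key, value in reference.items()
        if key in result and abs(result[key] - value) <= tol
    )
    return matched / len(reference)


# Tiny illustrative input: o1 appears twice, and only its newer version survives.
orders = [
    Order("o1", "c1", "p1", 10.0, 1),
    Order("o1", "c1", "p1", 12.0, 2),
    Order("o2", "c2", "p2", 5.5, 1),
    Order("o3", "c1", "p2", 2.5, 1),
]
customers = {"c1": "EU", "c2": "US"}
products = {"p1": "books", "p2": "toys"}

result = aggregate(enrich(deduplicate(orders), customers, products))
```

At 10 TB the same logic would typically run as distributed operations (for example, a window ranked by `updated_at` for deduplication, a join for enrichment, and a `groupBy` aggregation in Spark); this version only pins down the semantics a submission would have to preserve to stay above the 99.9% threshold.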
**METRICS:**
- Throughput (records/sec) - 30% weight
- p99 latency - 30% weight
- Resource efficiency - 20% weight
- Code quality - 20% weight
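The weights above imply that the final score is a weighted sum of four sub-scores. The page does not say how raw throughput, p99 latency, resource usage, or code quality are turned into sub-scores, so the sketch below assumes each has already been normalized to a 0 to 100 scale where higher is better. The helper `p99_nearest_rank` uses the nearest-rank percentile definition; that choice, like the function names, is an assumption rather than the organizers' documented method.

```python
# Weights taken from the METRICS list; everything else here is illustrative.
WEIGHTS = {
    "throughput": 0.30,
    "latency_p99": 0.30,
    "resource_efficiency": 0.20,
    "code_quality": 0.20,
}


def p99_nearest_rank(latencies_ms):
    """p99 via nearest rank: the ceil(0.99 * n)-th smallest sample (1-based)."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    n = len(ordered)
    rank = (99 * n + 99) // 100  # integer ceil(99n / 100), avoids float rounding
    return ordered[rank - 1]


def throughput(records, seconds):
    """Records processed per second of wall-clock time."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return records / seconds


def weighted_score(subscores):
    """Combine normalized 0-100 sub-scores using WEIGHTS."""
    missing = WEIGHTS.keys() - subscores.keys()
    if missing:
        raise KeyError(f"missing sub-scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * subscores[name] for name in WEIGHTS)
```

For example, sub-scores of 90 (throughput), 80 (latency), 100 (resources), and 100 (code quality) combine to 0.3 × 90 + 0.3 × 80 + 0.2 × 100 + 0.2 × 100 = 91. Because throughput and latency together carry 60% of the weight, a pipeline that trades some resource efficiency for speed can still come out ahead.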
DATASETS
- orders.parquet (2.3 GB)
- customers.parquet (450 MB)
- products.parquet (120 MB)
- inventory.parquet (85 MB)
REWARDS
Awarded to the top three participants:
- 1st Place: EXCLUSIVE SWAG + KUDOS
- 2nd Place: SWAG + KUDOS
- 3rd Place: KUDOS
Kudos are displayed on your profile and on the leaderboard. Swag consists of Databolt merchandise shipped to the top performers.
STATS
SUBMISSIONS: 2341
BEST SCORE: 96/100
TECH STACK
SPARK, AIRFLOW, KAFKA (OPTIONAL)