ACTIVE
ID: budget-challenge

ETL SPEED RACE CHALLENGE

Process 10 TB of e-commerce data with the lowest latency

REWARDS: KUDOS + SWAG
STATUS: ONGOING
PARTICIPANTS: 850

STATUS

STARTED: FEB 1, 2026
ONGOING: This challenge does not expire.

PROBLEM DESCRIPTION

You must build a pipeline that processes 10 TB of e-commerce transaction data. The pipeline should:

1. Extract data from S3 (Parquet format).
2. Transform: deduplicate, enrich, and aggregate.
3. Load the results into the target data warehouse.
4. Maintain data quality (>99.9% accuracy).

**METRICS:**

- Throughput (records/sec): 30% weight
- Latency (p99): 30% weight
- Resource efficiency: 20% weight
- Code quality: 20% weight
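The deduplicate and aggregate stages of the transform step, plus the >99.9% accuracy gate, can be sketched in plain Python as below. This is a minimal, in-memory illustration under stated assumptions, not a reference solution: the field names `order_id`, `customer_id`, `amount`, and `updated_at` are hypothetical (the schema of `orders.parquet` is not given on this page), the "keep the latest `updated_at`" deduplication policy is an assumed choice, and "accuracy" is assumed to mean the share of rows that pass validation. In Spark the same logic would more likely use a window function (`row_number` over `order_id`, ordered by `updated_at`) followed by a `groupBy`.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List

# Hypothetical record shape; the real orders.parquet schema is not specified.
Order = Dict[str, Any]


def deduplicate(orders: Iterable[Order]) -> List[Order]:
    """Keep only the latest version of each order_id, judged by updated_at (assumed policy)."""
    latest: Dict[Any, Order] = {}
    for order in orders:
        key = order["order_id"]
        current = latest.get(key)
        if current is None or order["updated_at"] > current["updated_at"]:
            latest[key] = order
    return list(latest.values())


def revenue_by_customer(orders: Iterable[Order]) -> Dict[Any, float]:
    """Aggregate: total order amount per customer_id."""
    totals: Dict[Any, float] = defaultdict(float)
    for order in orders:
        totals[order["customer_id"]] += float(order["amount"])
    return dict(totals)


def meets_quality_bar(valid_rows: int, total_rows: int, threshold: float = 0.999) -> bool:
    """Assumed accuracy check: the share of valid rows must strictly exceed 99.9%."""
    return total_rows > 0 and valid_rows / total_rows > threshold


if __name__ == "__main__":
    raw = [
        {"order_id": 1, "customer_id": "c1", "amount": 10.0, "updated_at": 1},
        {"order_id": 1, "customer_id": "c1", "amount": 12.0, "updated_at": 2},  # later correction
        {"order_id": 2, "customer_id": "c2", "amount": 5.0, "updated_at": 1},
        {"order_id": 3, "customer_id": "c1", "amount": 3.0, "updated_at": 1},
    ]
    deduped = deduplicate(raw)
    print(len(deduped))                      # 3
    print(revenue_by_customer(deduped))      # {'c1': 15.0, 'c2': 5.0}
    print(meets_quality_bar(9_991, 10_000))  # True
```

Keeping the most recent record is only one deduplication policy; if the source emits exact duplicate rows instead of corrections, keying on the whole row would be simpler.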

DATASETS

- orders.parquet (2.3 GB)
- customers.parquet (450 MB)
- products.parquet (120 MB)
- inventory.parquet (85 MB)
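The enrich stage presumably joins orders to the smaller customer and product tables. Because `customers.parquet` (450 MB) and `products.parquet` (120 MB) are much smaller than `orders.parquet` (2.3 GB), one option worth considering is a broadcast hash join: index each small table in memory, then stream the large table past it. In Spark, this is what wrapping the small DataFrame in `pyspark.sql.functions.broadcast` requests. The plain-Python sketch below shows only the idea. The join keys `customer_id` and `product_id` and the attributes `segment` and `category` are hypothetical, not taken from the dataset schemas.

```python
from typing import Any, Dict, Iterable, Iterator

# Hypothetical row shape; column names are illustrative assumptions.
Row = Dict[str, Any]


def build_lookup(rows: Iterable[Row], key: str) -> Dict[Any, Row]:
    """Build side of a hash join: index the small dimension table by its key.

    If a key repeats, the last row wins.
    """
    return {row[key]: row for row in rows}


def enrich_orders(
    orders: Iterable[Row],
    customers: Dict[Any, Row],
    products: Dict[Any, Row],
) -> Iterator[Row]:
    """Probe side: stream orders and attach customer and product attributes.

    Left-join semantics: an unmatched key yields None for the attribute instead of
    dropping the order, so missing dimension rows can be counted as quality failures.
    """
    for order in orders:
        customer = customers.get(order["customer_id"], {})
        product = products.get(order["product_id"], {})
        yield {
            **order,
            "segment": customer.get("segment"),
            "category": product.get("category"),
        }


if __name__ == "__main__":
    customers = build_lookup([{"customer_id": "c1", "segment": "retail"}], "customer_id")
    products = build_lookup([{"product_id": "p1", "category": "books"}], "product_id")
    orders = [
        {"order_id": 1, "customer_id": "c1", "product_id": "p1"},
        {"order_id": 2, "customer_id": "c9", "product_id": "p1"},  # unknown customer
    ]
    for row in enrich_orders(orders, customers, products):
        print(row)
```

A left join is chosen here so that enrichment never silently shrinks the order count. Whether broadcasting a 450 MB table is practical depends on executor memory, so this should be measured rather than assumed.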

REWARDS

KUDOS + SWAG

For top participants

1st Place: EXCLUSIVE SWAG + KUDOS
2nd Place: SWAG + KUDOS
3rd Place: KUDOS

Kudos are displayed on your profile and leaderboard. Swag includes Databolt merchandise shipped to top performers.

STATS

PARTICIPANTS: 850
SUBMISSIONS: 2,341
BEST SCORE: 96/100
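This page does not document how the 100-point score is computed. One plausible reading of the METRICS weights is to normalize each metric to a sub-score from 0 to 100 and take their weighted sum. The sketch below assumes that reading. The metric keys are hypothetical names, and the normalization is left to the caller. It also shows a nearest-rank p99, which is one common definition of the 99th-percentile latency, computed over per-record latency samples.

```python
from typing import Dict, Sequence

# Weights from the METRICS list; keys and the 0-100 sub-score scale are assumptions.
WEIGHTS: Dict[str, float] = {
    "throughput": 0.30,
    "latency_p99": 0.30,
    "resource_efficiency": 0.20,
    "code_quality": 0.20,
}


def p99(samples: Sequence[float]) -> float:
    """Nearest-rank 99th percentile: smallest sample with at least 99% of samples <= it."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    n = len(ordered)
    rank = (99 * n + 99) // 100  # integer ceil(0.99 * n), a 1-based rank
    return ordered[rank - 1]


def weighted_score(sub_scores: Dict[str, float]) -> float:
    """Combine per-metric sub-scores (each assumed to lie in 0-100) using WEIGHTS."""
    missing = WEIGHTS.keys() - sub_scores.keys()
    if missing:
        raise KeyError(f"missing sub-scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    latencies_ms = list(range(1, 101))  # illustrative samples, not challenge data
    print(p99(latencies_ms))            # 99
    print(weighted_score({
        "throughput": 90,
        "latency_p99": 80,
        "resource_efficiency": 100,
        "code_quality": 100,
    }))
```

Lower p99 latency is better, so a real normalization would have to invert it before weighting. The page does not say how this is done.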

TECH STACK

SPARK, AIRFLOW, KAFKA (OPTIONAL)