Crash Course on Celery Primitives

Feb. 10, 2026 by John Rei Enriquez

🥬 Celery Primitives Explained

Understanding Groups, Chains, Chords & Chunks
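
All snippets below assume a configured Celery app and registered tasks. Here is a minimal setup sketch; the broker/backend URLs and task bodies are placeholders rather than anything from a real project, and note that calling .get() (and chords in general) requires a result backend:

from celery import Celery

# Placeholder broker and result backend; any supported broker works.
app = Celery('demo',
             broker='redis://localhost:6379/0',
             backend='redis://localhost:6379/0')

@app.task
def process_image(img):
    # Stand-in body so the examples are runnable
    return f'processed:{img}'

@app.task
def send_email(recipient):
    return f'sent:{recipient}'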

📦 Group

Execute multiple tasks in parallel and collect all their results. Think of it as "run all these tasks at the same time."

Visual Flow:
Start → Task 1 / Task 2 / Task 3 (in parallel) → [Result1, Result2, Result3]

from celery import group

# Execute 3 tasks in parallel
job = group(
    process_image.s(img1),
    process_image.s(img2),
    process_image.s(img3)
)
result = job.apply_async()

# Get all results as a list
results = result.get() # [res1, res2, res3]

🎯 When to Use:

  • Processing multiple independent items (e.g., batch image processing)
  • Making multiple API calls that don't depend on each other
  • Running the same task with different inputs simultaneously
  • When you need all results but order doesn't matter

Key Point: All tasks run in parallel. You get a list of results in the same order the tasks were submitted.
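
group() also accepts any iterable of signatures, so fanning out over a list is a one-liner. A sketch reusing the process_image task assumed in the setup above:

images = ['a.png', 'b.png', 'c.png']
job = group(process_image.s(img) for img in images)
results = job.apply_async().get()  # result order matches the input list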

⛓️ Chain

Execute tasks sequentially where each task's output becomes the input for the next task. Think of it as "do this, then do that with the result."

Visual Flow:
Task 1 → Task 2 → Task 3 → Final Result
(each task waits for the previous one to complete)

from celery import chain

# Task 1 output → Task 2 input → Task 3 input
job = chain(
    fetch_data.s(url), # Returns raw data
    parse_data.s(), # Gets raw data, returns parsed
    save_to_db.s() # Gets parsed data, returns ID
)
result = job.apply_async()

# Get only the final result
final_result = result.get() # Just the DB ID

🎯 When to Use:

  • Data processing pipelines (fetch → transform → store)
  • When each step depends on the previous step's output
  • Multi-stage transformations (resize image → add watermark → compress)
  • Workflows where order matters

Key Point: Tasks run sequentially. Each task automatically receives the previous task's result as its first argument. You only get the final result.
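
Two handy details, sketched below: the | operator is shorthand for chain(), and any arguments you pre-fill in a signature are kept, with the parent's result injected in front of them (the 'json' argument here is hypothetical):

# | is shorthand for chain()
job = fetch_data.s(url) | parse_data.s() | save_to_db.s()

# Partial arguments: the parent's return value is prepended, so this
# calls parse_data(raw_data, 'json')
job = chain(fetch_data.s(url), parse_data.s('json'), save_to_db.s())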

🎵 Chord

A Group followed by a callback task. Run multiple tasks in parallel, wait for ALL to complete, then run one final task with all results. Think of it as "do all these in parallel, then do this with all the results."

Visual Flow:
Start → Task 1 / Task 2 / Task 3 (in parallel) → Callback([r1, r2, r3]) → Final Result
(the callback waits for ALL parallel tasks to finish)

from celery import chord

# Run tasks in parallel, then aggregate results
job = chord(
    [
        analyze_log.s(log1),
        analyze_log.s(log2),
        analyze_log.s(log3)
    ]
)(create_report.s()) # Callback gets [res1, res2, res3]

result = job.get() # Returns the report

🎯 When to Use:

  • Process multiple items then aggregate (analyze logs → create summary report)
  • Parallel computation followed by consolidation (map-reduce pattern)
  • Fetch data from multiple sources then combine
  • When you need to do something with ALL results together

Key Point: Chord = Group + Callback. The callback receives a list of ALL results from the group. Perfect for map-reduce patterns.
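
Two notes: chords require a result backend to synchronize on the header, and the header and callback can also be passed together as chord(header, body). A sketch of the two-argument form, with a hypothetical logs list:

from celery import chord

logs = ['app.log', 'db.log', 'web.log']  # hypothetical inputs

# Equivalent two-argument form: chord(header, body)
result = chord(
    (analyze_log.s(log) for log in logs),  # header: runs in parallel
    create_report.s(),                     # body: receives the list of results
).apply_async()

report = result.get()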

🍰 Chunk

Split a large batch of work into smaller chunks that run in parallel, where each chunk is processed by a single task call. Think of it as "I have 1,000 items, process them in batches of 100."

Visual Flow:
1000 items → split into chunks of 100
Chunk 1 (items 1-100) / Chunk 2 (items 101-200) / ... / Chunk 10 (items 901-1000) → All Results
(each chunk processes independently)

from celery import chunks  # optional: chunks are normally created via the task

# Process 1000 emails in batches of 100
items = list(range(1000)) # Your data

# chunks() expects an iterable of argument tuples, hence zip()
job = send_email.chunks(zip(items), 100) # 10 chunks of 100 calls each
result = job.apply_async()

# Returns a GroupResult with one sublist of results per chunk
results = result.get() # [[res1..res100], [res101..res200], ...]

🎯 When to Use:

  • Processing large datasets that would overwhelm the broker or workers if sent as individual tasks
  • Rate-limiting: control how many tasks run simultaneously
  • Memory management: process data in manageable batches
  • Bulk operations (send 10,000 emails in chunks of 100)

Key Point: chunks() automatically splits your work into parallel batches. It's essentially a convenient way to create a Group over batched data, and great for controlling resource usage.
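
For tasks that take more than one argument, zip the argument sequences so each tuple becomes one call. A sketch with a hypothetical two-argument add task registered on the app assumed above:

@app.task
def add(x, y):
    return x + y

# Ten chunks of ten calls each: add(0, 0), add(1, 1), ..., add(99, 99)
job = add.chunks(zip(range(100), range(100)), 10)
results = job.apply_async().get()  # [[0, 2, 4, ...], ...]

# A chunks signature can also be converted into a plain group
g = add.chunks(zip(range(100), range(100)), 10).group()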

🔍 Quick Comparison

Understanding when to use each primitive

Primitive | Execution                      | Result                 | Use When...
----------|--------------------------------|------------------------|-----------------------------------------
Group     | All tasks in parallel          | List of all results    | Tasks are independent, need all results
Chain     | Sequential (one after another) | Final task result only | Each step depends on previous output
Chord     | Parallel + callback            | Callback result        | Need to aggregate parallel results
Chunk     | Batched parallel               | List of batch results  | Large dataset, need resource control

Real-World Decision Tree:
❓ Do tasks depend on each other's output?
    ✅ YES → Use Chain
    ❌ NO → Continue...

❓ Do you have MANY items (100s/1000s)?
    ✅ YES → Use Chunk
    ❌ NO → Continue...

❓ Need to process all results together?
    ✅ YES → Use Chord
    ❌ NO → Use Group

🎯 Common Patterns Combined (sketched below):

  • Chain of Groups: Multiple parallel stages in sequence (fetch all data → process all data → save all data)
  • Group of Chains: Multiple independent pipelines running in parallel
  • Chord with Chain callback: Parallel work → sequential processing of results
  • Chunks in Chord: Process huge dataset in batches, then aggregate
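
Sketches of these combinations, reusing the tasks from the earlier sections (urls, logs, and the summarize task are hypothetical):

from celery import chain, chord, group

urls = ['https://a.example', 'https://b.example']
logs = ['app.log', 'db.log']

# Group of chains: independent pipelines running in parallel
workflow = group(
    chain(fetch_data.s(u), parse_data.s(), save_to_db.s())
    for u in urls
)

# Chain of groups: chaining a group into a task automatically
# upgrades the pair into a chord
workflow = group(fetch_data.s(u) for u in urls) | summarize.s()

# Chord with a chain callback: aggregate in parallel, then
# post-process the combined result sequentially
workflow = chord(
    (analyze_log.s(log) for log in logs),
    chain(create_report.s(), save_to_db.s()),
)

# Chunks in a chord: batch a huge fan-out, then aggregate
workflow = chord(
    send_email.chunks(zip(range(1000)), 100).group(),
    summarize.s(),
)

workflow.apply_async()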