Skip to content

E.B.3 Concurrent Programming (Including asyncio)

asyncio concurrency control flowchart

async task timeout cancellation and rate limiting diagram

Concurrency helps when the program is mostly waiting: HTTP calls, database calls, file I/O, scraping, RAG retrieval, or Agent tool calls. It is not a magic speed button for CPU-heavy work.

  • Python 3.10+
  • No external packages
  • A terminal that can run python
  • I/O-bound: most time is spent waiting for another system.
  • CPU-bound: most time is spent computing.
  • Coroutine: an async function that can pause with await.
  • asyncio.gather: run multiple awaitables and collect results.
  • Semaphore: limit how many tasks can run at the same time.
  • Timeout: stop waiting after a fixed limit.

Create async_batch.py:

import asyncio
async def call_tool(name, delay):
await asyncio.sleep(delay)
return f"{name}:ok"
async def guarded_call(semaphore, name, delay, timeout):
async with semaphore:
try:
return await asyncio.wait_for(call_tool(name, delay), timeout=timeout)
except asyncio.TimeoutError:
return f"{name}:timeout"
async def main():
semaphore = asyncio.Semaphore(2)
results = await asyncio.gather(
guarded_call(semaphore, "search", 0.1, 0.5),
guarded_call(semaphore, "database", 0.2, 0.5),
guarded_call(semaphore, "slow_tool", 1.0, 0.3),
)
print(results)
asyncio.run(main())

Run it:

Terminal window
python async_batch.py

Expected output:

Terminal window
['search:ok', 'database:ok', 'slow_tool:timeout']

The important pattern is not just gather. It is gather plus a concurrency limit plus timeout handling.

Review async code by asking what is allowed to wait, how many tasks may run at once, and what happens when one task is too slow. If those three answers are not visible, the code may work in a demo but fail under real traffic.

For AI apps, this applies to retrieval, reranking, tool calls, file uploads, and remote model APIs. Keep a trace that names each task and result. A list like ['search:ok', 'slow_tool:timeout'] is small, but it proves timeout behavior better than a paragraph of explanation.

Run this tiny check to see the two possible limits:

import asyncio
for limit in [2, 1]:
semaphore = asyncio.Semaphore(limit)
print("limit:", limit, "semaphore:", type(semaphore).__name__)

Expected output:

Terminal window
limit: 2 semaphore: Semaphore
limit: 1 semaphore: Semaphore

The final result stays the same, but tasks run more conservatively. In real services, this protects upstream APIs from sudden request bursts.

Good fit:

  1. Many network requests
  2. Multiple tool calls
  3. RAG retrieval from several sources
  4. Waiting on databases or queues

Poor first choice:

  1. Heavy numerical computation
  2. Large image transformations
  3. Code that must stay very simple and has no waiting bottleneck

Keep this page’s proof of learning as a small evidence card:

Python Pattern
decorator, iterator, generator, concurrency primitive, or metaprogramming hook
Code Artifact
minimal runnable example plus printed output
Use Case
where this pattern improves an AI app, pipeline, tool, or server
Failure Check
hidden side effects, unreadable abstraction, race condition, or overengineering
Expected Output
small advanced-Python example with a practical AI-system use note
  • Adding async everywhere without checking whether the task is I/O-bound.
  • Using gather without a concurrency limit.
  • Forgetting timeouts, so one slow upstream blocks the whole workflow.
  • Swallowing exceptions without logging which task failed.

Add five more tool calls and set Semaphore(3). Then count how many return :timeout when you lower the timeout to 0.15.

Reference implementation and walkthrough

The exact timeout count depends on the delays you assign, so the answer should report the observed count instead of inventing a fixed number. A solid solution prints both the full result list and a count such as:

timeouts = sum(result.endswith(":timeout") for result in results)
print("timeouts:", timeouts)

The explanation should mention that Semaphore(3) limits pressure on upstream tools, while the lower timeout exposes slow calls. In production, you would log which tool timed out, not only the total number.