Five Python Libraries Every Business Automation Engineer Should Know
Python's ecosystem is enormous. For business automation specifically, the same handful of libraries show up project after project. If you're building (or hiring someone to build) an automation, these five are the load-bearing dependencies that make everything else easier.
1. Pydantic — schema validation for LLMs and APIs
Pydantic v2 is the unsung hero of every production LLM workflow we ship. Declare a data shape as a Python class, validate untrusted input against it (LLM output, API response, user-submitted JSON), and get clear error messages on mismatch. Combined with instructor or OpenAI's structured-output mode, it turns “the LLM returned plausible nonsense” from a silent failure into an error you can catch, log, and retry. It checks shape and constraints, not truth, so a well-formed but wrong value still needs its own check.
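As a minimal sketch of that validation step, assuming Pydantic v2 is installed: the Invoice model, its field names, and the parse_invoice helper are hypothetical and only there to illustrate the pattern.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class Invoice(BaseModel):
    # Hypothetical shape for an invoice extracted by an LLM or returned by an API.
    invoice_number: str
    total: float = Field(gt=0)  # must be strictly positive
    currency: str = Field(min_length=3, max_length=3)  # e.g. "NZD"


def parse_invoice(raw_json: str) -> Optional[Invoice]:
    """Return a validated Invoice, or None if the payload does not match."""
    try:
        return Invoice.model_validate_json(raw_json)
    except ValidationError as exc:
        # Each entry names the failing field and the reason it failed.
        for err in exc.errors():
            print(err["loc"], err["msg"])
        return None


good = parse_invoice('{"invoice_number": "INV-1", "total": 120.5, "currency": "NZD"}')
bad = parse_invoice('{"invoice_number": "INV-2", "total": -5, "currency": "dollars"}')
```

In a real workflow the None branch is where you would re-prompt the model or route the record to a human, rather than just printing the errors.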
2. httpx — modern HTTP client
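Requests is fine, but httpx is better: native async, optional HTTP/2 support (installed as an extra and switched on per client), a near-identical API for sync and async usage, and stricter defaults such as timeouts that are on unless you disable them. For workflows that hit multiple APIs (Xero + HubSpot + Stripe + your own backend), the async story alone justifies the dependency. Bonus: it is easy to mock in tests through its pluggable transports.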
3. tenacity — production-grade retries
External APIs fail. Networks blip. Rate limits hit. Tenacity is the retry library that handles all the cases you didn't know you needed: exponential backoff, jitter, retry on specific exceptions only, max attempts, max duration, callbacks on retry. Decorate one function, get robust behaviour. Without this, every automation eventually fails in production at the worst time.
4. SQLAlchemy 2.x — the database layer
For anything beyond throwaway scripts, SQLAlchemy 2.x with declarative models is the right choice. You get a type-hinted query builder, sync and async support, migrations via the companion Alembic package, and one codebase that runs against Postgres, SQLite, or MySQL with few changes (dialect-specific types and features still need care). The learning curve is real but pays back the moment you need to change schema in production. If you prefer writing SQL yourself with parameter binding, skip the ORM tier and use Core, which is equally valid.
5. structlog — structured logging that grows with you
Python's stdlib logging module is fine until you need to search, filter, or correlate. structlog gives you structured JSON logs from day one, with context binding (so every log line for a given request carries the request ID automatically) and easy redirection to services like Axiom, Datadog, or CloudWatch. Setting it up takes 20 lines; the payoff in production debugging is enormous.
Honourable mentions
- FastAPI — when you need a web API, this is the default.
- typer — turn any function into a CLI in three lines.
- polars — if you're moving real data volume, polars often beats pandas in both ergonomics and speed.
- ruff: linting and formatting that often runs 10 to 100× faster than the tools it replaces. Use it.
- uv — package installation and venv management that's actually fast.
See the “how we build” page for the broader engineering stack and how these libraries fit into our reference architecture.