
Hello, I'm Aleksandr

Systems builder. Tool maker.

Senior backend engineer at Kiwi.com.
Author of Repid. Building dicexdice.io. PyCon speaker.

7+ years backend Python · Currently in Belgrade · Open to relocate: NL · IE · DK · NO · SE · FI · CH · Visa: sponsorship needed · Notice: flexible

By the numbers

5M+ messages/day in production at Kiwi.com, via Repid
10× storage reduction from PostgreSQL schema and query optimization
40ms latency cut from auth redesign, estimated €3.6M annual revenue impact
99.995%+ uptime SLO maintained across 600+ production deployments

The project I'm most proud of

Repid · Python · Async · Production

Repid is an asyncio-native Python task queue. I built it at Paragraphe, my previous startup, because scaling Celery to our scraping volume meant infra costs we couldn't justify. Today it runs in production at Kiwi.com, processing 5M+ messages daily. In benchmarks it's 23× faster than Celery on I/O-bound work. But speed isn't why people pick it up:

  • Generates AsyncAPI 3.0 schemas automatically from your actors. Your task queue is self-documenting.
  • Strict Pydantic validation of payloads and headers before execution. Bad messages fail fast.
  • Dependency injection in actor signatures. Testable by default.
  • Producer-only, consumer-only, or fully end-to-end. It fits in your current infra.
  • InMemoryServer for unit tests. No broker required.
  • Wide broker support out of the box: RabbitMQ, NATS, Redis, Google Pub/Sub, SQS, Kafka.

from repid import Repid, Router, InMemoryServer

app = Repid()
app.servers.register_server(
    "default",
    InMemoryServer(),
    is_default=True,
)
router = Router()

@router.actor(channel="articles")
async def fetch_article(url: str) -> None:
    # scraper and db stand in for application code defined elsewhere
    article = await scraper.fetch(url)
    await db.save(article)

app.include_router(router)

# Enqueue from anywhere inside async code:
# payload validated by Pydantic
await app.send_message_json(
    channel="articles",
    payload={"url": "https://example.com"},
    headers={"topic": "fetch_article"},
)
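
A rough illustration of why an asyncio-native worker helps on I/O-bound work. This is not Repid's implementation and not the benchmark quoted above; `fake_fetch`, `run_sequential`, `run_concurrent`, and `timed` are hypothetical names, and `asyncio.sleep` stands in for a real network call. The sketch only shows that tasks waiting on I/O can overlap on a single event loop.

```python
import asyncio
import time


async def fake_fetch(url: str, delay: float = 0.05) -> str:
    # Stand-in for a network call; asyncio.sleep yields control like real I/O would.
    await asyncio.sleep(delay)
    return f"fetched {url}"


async def run_sequential(urls: list[str]) -> list[str]:
    # One task at a time: total time is roughly len(urls) * delay.
    return [await fake_fetch(u) for u in urls]


async def run_concurrent(urls: list[str]) -> list[str]:
    # All tasks wait on the same event loop: total time is roughly one delay.
    return list(await asyncio.gather(*(fake_fetch(u) for u in urls)))


def timed(coro_fn, urls: list[str]) -> tuple[list[str], float]:
    # Run one of the coroutines above and return its results with the elapsed seconds.
    start = time.perf_counter()
    results = asyncio.run(coro_fn(urls))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    urls = [f"https://example.com/{i}" for i in range(20)]
    _, seq = timed(run_sequential, urls)
    _, conc = timed(run_concurrent, urls)
    print(f"sequential: {seq:.2f}s, concurrent: {conc:.2f}s")
```

Both functions return the same results in the same order; only the wall-clock time differs, which is the effect a task queue built on asyncio can exploit for scraping-style workloads.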

Why me, specifically

I build infrastructure other teams run on, and I know what it costs

Repid replaced Celery at a pre-revenue startup once Celery's infrastructure costs became unsustainable, and it now runs production workloads at Kiwi.com. At dicexdice.io I maintain 99.9% uptime on Hetzner with a minimal budget. I don't separate engineering decisions from economic constraints: infrastructure cost is a design requirement.

I learned resilience on hardware before applying it in the cloud

I started with embedded C and PCB design, where a bug doesn't throw an exception; it bricks a device. That mindset transferred: when I redesigned Kiwi.com's auth backend, I assumed it would be attacked and planned for how it could fail. It has passed multiple external security audits and maintains 99.999% uptime. I design for failure modes first, not as an afterthought.
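
For a sense of scale, an uptime percentage can be converted into a yearly downtime budget. The sketch below is plain arithmetic, assuming a 365-day year; `downtime_budget_minutes` is a hypothetical helper name.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(uptime_percent: float) -> float:
    """Minutes of allowed downtime per year for a given uptime percentage."""
    return (1 - uptime_percent / 100) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for target in (99.9, 99.995, 99.999):
        print(f"{target}% -> {downtime_budget_minutes(target):.1f} min/year")
```

Under that assumption, 99.9% allows roughly 526 minutes per year, 99.995% roughly 26 minutes, and 99.999% roughly 5 minutes.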

I make teams ship faster and sleep better

Rebuilt CI/CD, cutting deployment time from 15 minutes to 90 seconds. Built the observability stack for 10 microservices: distributed tracing, structured logging, and dashboards, giving the team visibility it didn't have before. Drove a 20× reduction in weekly error rate through systematic root cause analysis. We went from being paged several times a week to once every few months.

I build consensus, then systems

Authored RFCs that aligned frontend, mobile, security, and data science teams behind a single architecture. Led 10+ cross-company initiatives affecting millions of monthly active users, secured long-term stakeholder buy-in, and shipped on schedule. Spoke at PyCon Poland and PyCon Lithuania. Writing the code is rarely the hard part; getting five teams to agree on what to build and why is. I do both.

Where I've worked

Kiwi.com · one of the biggest flight aggregators in Europe · kiwi.com

Senior Software Engineer, Customer-Core Team (formerly Account Team) – present · ~4 years
  • Redesigned auth backend through end-to-end system design and API contract overhaul, cutting 40ms from every page request and reducing core service load by 70% via optimistic caching. Passed multiple external security audits and reduced scraping and account takeover attacks by 90%.
  • Built a customer 360° data platform on Google Pub/Sub using CQRS projections, unifying user data across 10+ microservices for real-time personalization.
  • Maintained a 99.99999% error-free request SLO and a 99.995%+ uptime SLO across 600+ production deployments. Drove a 20× reduction in weekly error rate through systematic root cause analysis and performance optimization.
  • Championed reliability engineering across services. Built full observability stack for 10 microservices with distributed tracing, structured logging, and Datadog/Grafana dashboards.
  • Resolved 5+ critical production incidents with significant revenue impact; improved on-call processes and runbooks as part of broader reliability engineering efforts.
  • Iteratively optimized PostgreSQL schema design and query patterns, achieving 10× storage reduction and 2× latency improvement.
  • Cut deployment time from 15 minutes to 90 seconds by redesigning CI/CD pipelines with Docker, Kubernetes, and containerization best practices.
  • Led 10+ cross-company initiatives through RFCs and architecture reviews, aligning frontend, mobile, security, and data science teams to ship the auth redesign on schedule.
  • Integrated AI-powered tooling into development workflows and introduced OpenAPI schemas to standardize service interfaces and enforce API design contracts.
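
A minimal sketch of the CQRS projection idea mentioned in the list above: events are folded into a denormalized read model, one record per customer. This is an illustration, not the Kiwi.com implementation. The event shapes, field names, and the `CustomerView` and `CustomerProjection` classes are hypothetical, and the sketch assumes every event carries a per-customer, monotonically increasing `id` so that redelivered messages can be skipped.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CustomerView:
    """Read model: one denormalized record per customer."""
    user_id: str
    email: Optional[str] = None
    bookings: int = 0
    last_event_id: int = 0  # highest event id applied so far


@dataclass
class CustomerProjection:
    views: dict[str, CustomerView] = field(default_factory=dict)

    def apply(self, event: dict) -> None:
        view = self.views.setdefault(event["user_id"], CustomerView(event["user_id"]))
        # Skip duplicates and replays so a redelivered message changes nothing.
        if event["id"] <= view.last_event_id:
            return
        if event["type"] == "email_changed":
            view.email = event["email"]
        elif event["type"] == "booking_created":
            view.bookings += 1
        view.last_event_id = event["id"]


if __name__ == "__main__":
    projection = CustomerProjection()
    events = [
        {"id": 1, "user_id": "u1", "type": "email_changed", "email": "a@example.com"},
        {"id": 2, "user_id": "u1", "type": "booking_created"},
        {"id": 2, "user_id": "u1", "type": "booking_created"},  # simulated redelivery
    ]
    for event in events:
        projection.apply(event)
    print(projection.views["u1"])
```

Readers query the view directly instead of calling each owning service, which is the point of keeping the write side (events) and the read side (projection) separate.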

Dice x Dice · LLM-assisted D&D-style game master · dicexdice.io

Founder & Backend Engineer (side project) – present · ~10 months
  • Architected and shipped an LLM-powered tabletop RPG game master serving personalized campaigns via prompt routing, state machines, and game-state projections.
  • Built the entire auth stack from scratch using Passkeys and OIDC/OAuth2, enabling passwordless login and eliminating credential-spraying attack vectors.
  • Built a GenAI observability pipeline tracking token usage, latency distributions, and prompt tracing, cutting LLM failure debugging time from hours to minutes.
  • Operate production infrastructure on Hetzner (Kubernetes, PostgreSQL, MongoDB, Redis, RabbitMQ, Grafana stack), maintaining 99.9% uptime on a minimal budget.
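
A minimal sketch of the kind of bookkeeping a GenAI observability pipeline performs: per-call token counts and latency, aggregated into cost, percentiles, and per-prompt totals. It is not the dicexdice.io pipeline. `LLMCall` and `UsageTracker` are hypothetical names, and the per-token prices are placeholder values, since real prices depend on the provider and model.

```python
import statistics
from dataclasses import dataclass, field


@dataclass
class LLMCall:
    prompt_id: str
    input_tokens: int
    output_tokens: int
    latency_s: float


@dataclass
class UsageTracker:
    # Placeholder prices in USD per 1,000 tokens (assumed, not real provider prices).
    input_price_per_1k: float = 0.001
    output_price_per_1k: float = 0.002
    calls: list[LLMCall] = field(default_factory=list)

    def record(self, call: LLMCall) -> None:
        self.calls.append(call)

    def total_cost(self) -> float:
        return sum(
            c.input_tokens / 1000 * self.input_price_per_1k
            + c.output_tokens / 1000 * self.output_price_per_1k
            for c in self.calls
        )

    def latency_percentile(self, pct: int) -> float:
        # quantiles(n=100) returns the 1st..99th percentile cut points,
        # so pct must be in 1..99 and at least two calls must be recorded.
        cuts = statistics.quantiles([c.latency_s for c in self.calls], n=100)
        return cuts[pct - 1]

    def tokens_by_prompt(self) -> dict[str, int]:
        totals: dict[str, int] = {}
        for c in self.calls:
            totals[c.prompt_id] = totals.get(c.prompt_id, 0) + c.input_tokens + c.output_tokens
        return totals


if __name__ == "__main__":
    tracker = UsageTracker()
    tracker.record(LLMCall("narrate_scene", 800, 300, 1.2))
    tracker.record(LLMCall("narrate_scene", 900, 350, 1.5))
    tracker.record(LLMCall("resolve_combat", 400, 120, 0.6))
    print(f"cost: ${tracker.total_cost():.4f}")
    print(f"p50 latency: {tracker.latency_percentile(50):.2f}s")
    print(tracker.tokens_by_prompt())
```

Grouping by a prompt identifier is what makes it possible to see which prompt is slow or expensive rather than only the aggregate.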

Paragraphe · news aggregation startup · TechnoSpark-backed

Founder & Backend Engineer · 1 year
  • Architected and deployed a scalable RESTful API using Python and FastAPI with asyncio, achieving 100+ RPS per instance at sub-50ms latency.
  • Built horizontally scalable web and RSS scrapers processing 10,000+ pages daily to feed a personalization pipeline; powered a recommendation engine that classified users by content category and delivered daily top-10 article recommendations.
  • Deployed and operated production HA infrastructure using Kubernetes, Docker, Consul, and Vault, provisioned via Infrastructure-as-Code (Terraform, Ansible). CI/CD deploys took ~3 minutes from day one.
  • Created Repid, an asyncio-native task queue, to replace Celery in the scraping pipeline, delivering 23× higher throughput on I/O-bound workloads and cutting infrastructure costs.
  • Developed a cross-platform mobile app with Flutter, shipping from system design through beta testing with real users.

FGD · indie hypercasual game studio

Lead Developer · 6 months
  • Led development of 2 hypercasual mobile games in the Godot engine as part of a 15-person team.
  • Drove the team to 2nd place at a GameJam competition.
  • Introduced GitLab CI/CD pipelines with automated testing and deployment, accelerating the team's iteration cycles.
  • Performed 200+ code reviews and refactoring contributions across the codebase, improving overall code quality.

Freelance · various clients · Python + embedded

Python developer · embedded software & hardware · ~2 years
  • Implemented an automated proxy rotation pool to bypass geo-restrictions, improving scraper reliability and availability.
  • Built an automated local weather forecast analytics and prediction pipeline, collecting ~1,400 data points daily and generating visual reports.
  • Created and maintained 5+ Telegram bots using both raw API and framework-based approaches, sustaining 99.9% uptime.
  • Established CI/CD pipelines including Docker-based cross-compiled arm64 image builds using Docker Buildx, applying containerization best practices.
  • Developed embedded firmware for STM32/AVR, designed KiCad PCBs, and built BLE/Wi-Fi devices and small robotics projects.
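
A minimal sketch of a proxy rotation pool like the one mentioned in the list above, assuming simple round-robin selection and retirement after repeated failures. It is not the original freelance code; `ProxyPool`, its methods, and the proxy addresses are hypothetical.

```python
from collections import deque


class ProxyPool:
    """Round-robin proxy rotation that retires proxies after repeated failures."""

    def __init__(self, proxies: list[str], max_failures: int = 3) -> None:
        self._queue = deque(proxies)
        self._failures = {p: 0 for p in proxies}
        self._max_failures = max_failures

    def next_proxy(self) -> str:
        if not self._queue:
            raise RuntimeError("no healthy proxies left")
        proxy = self._queue[0]
        self._queue.rotate(-1)  # move it to the back so the next call gets another
        return proxy

    def report_failure(self, proxy: str) -> None:
        self._failures[proxy] += 1
        # Retire the proxy once it has failed max_failures times in a row.
        if self._failures[proxy] >= self._max_failures and proxy in self._queue:
            self._queue.remove(proxy)

    def report_success(self, proxy: str) -> None:
        # A success resets the consecutive-failure counter.
        self._failures[proxy] = 0


if __name__ == "__main__":
    pool = ProxyPool(
        ["http://proxy-a.example:8080", "http://proxy-b.example:8080"],
        max_failures=2,
    )
    first = pool.next_proxy()
    pool.report_failure(first)
    pool.report_failure(first)  # second consecutive failure retires it
    print(pool.next_proxy())
```

The caller reports the outcome of each request, so an unhealthy proxy drops out of rotation without stopping the scraper.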

What I work with, and what I want to work with

Daily drivers

Python 7y · AsyncIO 7y · FastAPI 5y · Pydantic 5y · PostgreSQL 6y · Redis 6y · RabbitMQ 5y · Docker 7y · Kubernetes 5y · Datadog 4y · Grafana stack 5y

Also comfortable with

LLMs · Passkeys · WebAuthn · OIDC / OAuth2 · SQL · OpenTelemetry · SQLAlchemy · GCP Pub/Sub · GenAI observability · C / C++ · Dart · Flutter · TypeScript · Go · Django · Kafka · MongoDB · Terraform · Ansible · AWS · GCP · Godot

Languages I'm exploring besides Python

Go · Zig · Elixir · Erlang · Gleam

Where I'd like to be based

Netherlands · Ireland · Denmark · Norway · Sweden · Finland · Switzerland

Writing

I write on my blog, mostly about Python, async patterns, distributed systems, and what I've broken lately.

Open source

Beyond Repid, I've contributed to OpenTelemetry, Pydantic, and a handful of smaller libraries. My other projects include bots, scrapers, some custom PCBs, and robotics. Find them on GitHub.

Spoken languages

Russian (native) · English (fluent) · French (intermediate) · Serbian (beginner)

Want to talk?

I'm looking for senior or staff backend roles on a product people actually use, where I can drive meaningful work across code, infra, and cross-team delivery.

Aleksandr Sulimov · me@aleksul.space · Belgrade, Serbia