Available · Q3 2026 · Dubai, UAE (remote) · 15+ yrs in production

Guilherme Jaccoud

Platform Engineer / SRE

Building reliable platforms with an emphasis on performance and developer experience.

Distributed systems, Kubernetes at scale, multi-cloud infrastructure, and the operational discipline that makes them quietly boring.


01 · About

Systems engineer and platform architect. I build the reliability substrate that other engineers build on top of.

15+ years across cloud hosting, fintech, hyperscale e-commerce and crypto. I lead reliability programs end-to-end: capacity and risk modelling, SLO design, incident command, and the platform engineering that makes safety the default path.

With a background in design and product, I bring a product mindset to engineering. I consider simplicity a reliability property, and I design platforms the same way I would design products: with the end user in mind.

I write Rust for the parts that matter, Python for the parts that don't, and shell for the parts I'd rather not admit :)

99.85% · Sustained availability

1.2k nodes · Peak fleet operated

$20M/yr · Infra cost removed

40 min · Median MTTR

02 · Experience

2021 – 2026

Senior Site Reliability Engineer

Kraken · Remote

Joined the Rust Core Backend team to advance the backend migration to Rust. Worked on high-performance trading and settlement systems, enhancing reliability, performance, and developer experience.

Maintained and evolved infrastructure specifications for 60+ backend services on HashiCorp Nomad, Consul, and Vault; designed and built a GitOps pull-request automation system for Nomad — filling the same role Atlantis plays for Terraform — and implemented in-service policy hot-reloading via Consul KV.
Instrumented services with Prometheus metrics and Grafana dashboards and alerts; held a seat on the Layer 1 on-call rotation covering the entire backend.
Owned the API gateway — source code, configuration, release pipeline, and weekly scheduled deployments; designed and shipped the ingress infrastructure supporting the Travel Rule compliance flow between Kraken and external exchanges.
Wrote custom GitLab Runner executors enabling secure, auditable interaction with production systems from pipeline jobs.
Embedded with backend engineers across trading and settlement services — contributing to service codebases, improving latency and observability, and participating in incident response and post-mortems.
Participated in the Core SRE workgroup, developing internal tooling and establishing engineering standards adopted across the organisation.

Rust · Nomad · Consul · Vault · Prometheus · Grafana · GitLab · AWS

2020 – 2021

Staff Platform Engineer / SRE

Delivery Hero · Dubai

Shifted focus to platform adoption and developer experience — working with product teams to surface and eliminate infrastructure friction, and setting engineering standards across the organisation.

Engineered an active-active multi-region Kubernetes strategy with a 50/50 traffic split via Cloudflare, automatic cluster failover, and ArgoCD as the only authorised deployment agent, preserving availability even during cluster-wide incidents.
Wrote a Rust deployment tool, used in CI/CD pipelines and ad hoc operations, to render and push Helm and Kustomize manifests to the GitOps source of truth; implemented Teleport JIT access as a break-glass mechanism.
Designed a daily load-testing pipeline that used Terraform and K3s to spin up ephemeral infrastructure, run developer-authored k6 scripts, and push results to Grafana, surfacing scalability headroom and potential bottlenecks.
Wrote a Rust / React web app to provision credentials across internal and third-party services, eliminating manual steps during engineering onboarding.
Introduced chaos engineering practices, running regular Chaos Monkey exercises to validate disaster recovery scenarios.

Kubernetes · Terraform · AWS · ArgoCD · Cloudflare · Rust · React · Helm · Kustomize · K3s · k6 · Grafana

2019 – 2020

Senior Platform Engineer / SRE

Delivery Hero · Dubai

Led the SRE team through Talabat's post-acquisition migration from an on-prem monolith to a cloud-native distributed system on AWS.

Designed the on-prem to AWS migration strategy and drove it from planning through execution; trained team members in Terraform and IaC best practices; implemented Atlantis for self-service infrastructure changes.
Architected the complete Kubernetes platform, including CNI, CSI and Traefik ingress; implemented cert-manager with Vault for automatic mTLS; established an AWS Direct Connect link between the on-prem data centre and AWS to enable gradual, zero-disruption service migration.
Partnered with development teams on the migration from legacy .NET Framework to .NET Core on Kubernetes; introduced the Serverless Framework for workloads where dedicated Kubernetes deployments were overkill.
Enforced platform guardrails with OPA Gatekeeper policies covering security baselines, resource quotas, and naming conventions; introduced Vector as a unified logs and metrics collection pipeline routing telemetry to New Relic.

Kubernetes · Terraform · AWS · Vault · OPA · Atlantis · Serverless · New Relic · Traefik · cert-manager · Vector

2016 – 2019

Platform Engineer

Symphony · São Paulo

Joined as an infrastructure hire following Symphony's acquisition by Google; led infrastructure provisioning for the migration to microservices, with per-tenant deployments across hybrid cloud for each enterprise client.

Led the Infrastructure as Code migration from ad hoc Python scripts to multi-cloud Terraform (GCP/AWS); authored reusable modules covering networking, persistence, and orchestration layers, including GKE on GCP and EKS on AWS.
Built a GitOps CLI that resolves module dependency order, applies Terraform, and commits the resulting state to Git — making every environment change auditable and reproducible.
Integrated the CLI into Jenkins, giving development teams a self-service UI to spin up, update, and tear down development and production environments without SRE intervention.
Operated a heterogeneous persistence tier — HBase, Hadoop, MongoDB, Solr, Elasticsearch, Hazelcast — on auto-scaled instances across all tenant environments.

Kubernetes · Terraform · GCP · AWS · Jenkins · MongoDB · Elasticsearch · HBase · Hadoop · Solr · Hazelcast

2010 – 2016

Co-Founder / DevOps Engineer

Tropicloud · São Paulo

Founded Latin America's first managed WordPress hosting company; grew it to hundreds of customers across three countries, serving some of the highest-traffic websites in the region.

Architected the application stack: NGINX / PHP-FPM / MariaDB, with Redis as a full-page cache and static assets served from S3 through the CloudFront CDN, achieving 10x faster loading times than traditional shared hosting.
Designed per-tenant isolation on AWS — each customer site provisioned with its own VPC, ALB, Auto Scaling Group and RDS cluster, behind Cloudflare WAF + DDoS protection — guaranteeing resource isolation and scalability under heavy traffic.
Rebuilt the entire platform on Kubernetes in 2014 — among the earliest production deployments in Latin America — reducing environment provisioning from hours to minutes and enabling zero-downtime rolling deploys.

Kubernetes · AWS · Cloudflare · NGINX · PHP-FPM · MariaDB · RDS · Redis · S3

03 · Stack

Tools are commodities. The discipline of how you operate them is the real artifact.


Orchestration & Platform

Infrastructure & Cloud

Observability & Incident

Languages & Data

04 · Focus areas

01 · Distributed systems reliability

Failure modelling, consensus, partition behaviour, dependency contracts. The boring discipline that keeps a fleet calm under load.

02 · Kubernetes platform engineering

Multi-tenant clusters, operator design, golden paths, and the platform abstractions that turn raw infrastructure into a product.

03 · Multi-cloud infrastructure

AWS, GCP, Azure — and the architectural discipline to stay portable where it matters and proprietary where it pays.

04 · High-availability architecture

Cell-based designs, active-active topologies, traffic shifting, and the failover drills that prove they actually work.

05 · Observability engineering

SLI/SLO design, OpenTelemetry pipelines, sampling economics, and the dashboards engineers actually use at 3am.

06 · Resilience & disaster recovery

RPO/RTO architecture, chaos engineering programs, regional failover rehearsals, and the runbook discipline behind them.

07 · Infrastructure automation

IaC at scale, drift detection, policy-as-code, and the platform tooling that makes change cheap and reversible.

08 · Platform scalability

Capacity modelling, performance work in Go and Rust, autoscaling control loops, and the cost mechanics behind them.

05 · Get in touch

For staff & principal reliability work, write to hello@guigo2k.com.

GitHub · LinkedIn · SoundCloud · PGP key
