- Level: Senior
- Type: B2B
- English Level: Upper-Intermediate
- Location: Poland, Romania, Portugal
- Skills: AWS, CI/CD, Docker, Kubernetes, SQL
Responsibilities
Service Reliability & Architecture
- Lead the design and implementation of reliable, scalable service architectures
- Ensure high availability, performance, and durability of production environments
- Drive reliability standards, patterns, and best practices across teams
- Lead major incident response efforts and guide long-term remediation
Database & Data Platform Engineering
- Architect, maintain, and optimize database systems (SQL)
- Design and implement replication, backup, failover, and disaster recovery strategies
- Improve data pipelines, storage systems, and high-availability configurations
- Conduct deep performance analyses and resolve complex database issues
Automation & Tooling Leadership
- Build and improve internal tools for deployment, orchestration, observability, and maintenance
- Automate operational workflows to eliminate manual processes and reduce error rates
- Mentor engineers on automation strategies and reliability engineering techniques
DevOps & Infrastructure
- Enhance CI/CD pipelines and deployment automation across services
- Define and enforce infrastructure-as-code standards
- Collaborate with engineering teams to design systems for operability and maintainability
- Drive improvements in monitoring, logging, metrics, and alerting systems
Operational Excellence
- Participate in and help lead the on-call rotation
- Develop runbooks, guides, and processes for consistent operational practices
- Lead post-incident reviews and reliability-focused retrospectives
- Identify systemic issues and deliver strategic, long-term solutions
Required Skills & Experience
Technical Expertise
- Strong proficiency with database systems, both relational (PostgreSQL, MySQL) and document-oriented (MongoDB)
- Extensive experience with database administration, replication, query optimization, and performance tuning
- Advanced knowledge of Linux systems, networking, and distributed system fundamentals
- Hands-on experience with cloud environments (AWS, GCP, or Azure)
- Experience with containerization and orchestration technologies (Docker, Kubernetes)
- Strong scripting or programming experience (Python, Java, Bash) and automation tooling such as Ansible
- Experience designing and maintaining observability systems (Prometheus, Grafana, ELK)
DevOps & Reliability Skills
- Expertise in CI/CD pipelines, configuration management, and deployment automation
- Strong understanding of infrastructure-as-code tools (Terraform, CloudFormation, Pulumi)
- Ability to design highly available, fault-tolerant architectures
- Experience guiding teams through incident response and improving on-call processes
Leadership & Collaboration
- Ability to mentor and support engineers across teams
- Strong communication skills, especially during incidents and cross-team discussions
- Track record of leading reliability projects from concept through execution
- Demonstrated ability to influence engineering roadmap decisions based on operational needs
Preferred Qualifications
- Experience with distributed data systems or event-driven architectures
- Familiarity with SRE practices such as SLOs, SLIs, and error budgets
- Experience designing or managing multi-region cloud deployments
- Prior experience in a Senior, Staff, or Lead SRE role

