Site Reliability Engineering

Atlassian
Full-time
Remote
Working at Atlassian
Atlassians can choose where they work – whether in an office, from home, or a combination of the two. That way, Atlassians have more control over supporting their family, personal goals, and other priorities. We can hire people in any country where we have a legal entity. Interviews and onboarding are conducted virtually, as part of being a distributed-first company.


Design, architect, and implement monitoring and observability solutions for complex software applications and infrastructure.

Evaluate and select appropriate monitoring tools and technologies based on project requirements and industry trends.

Conduct performance analysis, capacity planning, and troubleshooting to identify and address performance bottlenecks and reliability issues.

Collaborate with cross-functional teams to gather requirements and define monitoring strategies.

Develop monitoring frameworks, dashboards, and alerting systems to ensure critical systems' reliability, performance, and availability.

Implement best practices for log management, metrics collection, and distributed tracing to gain deep insights into system behavior and performance.

Mentor and guide junior team members on monitoring best practices and methodologies.

Stay up to date with emerging technologies and industry trends in observability, monitoring, and DevOps practices.




Bachelor's or Master's degree in Computer Science, Information Technology, or related field.

10+ years of software development experience.

3+ years of experience in a technical lead role, specializing in designing and developing high-scale distributed systems.

Strong communication and collaboration skills, with the ability to work effectively in a fast-paced environment.

Excellent analytical and problem-solving skills, with keen attention to detail.

Proven experience designing and implementing monitoring solutions for large-scale, distributed systems.

Solid understanding of cloud computing platforms (e.g., AWS, Azure, GCP) and containerization technologies (e.g., Docker, Kubernetes).