Senior Database Reliability Engineer
Company: ClickUp
Location: San Diego
Posted on: November 14, 2024
Job Description:
We are looking for driven and innovative software engineers with
strong site reliability engineering (SRE) discipline or interest in
this area to help us make ClickUp the "one app to rule them all".
As an SRE at ClickUp, your primary role will be improving the
stability, availability and reliability of our globally distributed
and cloud-based infrastructure that powers our app for thousands of
users daily. If you are a rockstar engineer with an entrepreneurial
and fast-paced mindset who is ready to own, drive and tackle some
of the most complex problems out there, we would love to hear from
you!
What you'll do:
- Build a deep understanding of how ClickUp's systems behave,
scale, interact and fail, and use that insight to identify risks
and opportunities for remediation
- Own, drive and improve the incident management process across
the engineering org and participate in the team's follow-the-sun
model
- Define SLOs and SLIs for all of our services and introduce
error budgeting
- Own and improve our observability on all of our services
- Build software solutions to enable reliability and operability
of large scale distributed systems handling petabytes of data and
serving
- Build tools and automation to eliminate toil and reduce
operational overhead. Create frameworks, processes and best
practices to be used across ClickUp Engineering
- Automate critical portions of ClickUp engineering processes, to
minimize risk and maximize the speed of innovation
- Manage capacity and performance to help scale our
infrastructure on both public and private clouds around the
world
What we're looking for:
- Software engineering: At the very core, we are looking for
strong software engineers with an operational, infrastructure or
SRE mindset who can design and build systems for platform and
infrastructure layers
- Cloud experience: Production experience in a major
cloud environment, including CI/CD deployments, managed
services, bootstrapping and provisioning services via
infrastructure-as-code (IaC) systems, automation and
operations
- Infrastructure Management: You have worked with and managed
production-grade infrastructure with IaC tools or configuration
management tools
- Operating systems: Strong knowledge of *nix-based operating
systems, their internals and advanced troubleshooting commands
- Compute: Experience of working with VMs, containers and
container orchestration systems
- Database: Experience working with RDBMS and NoSQL storage
solutions in production, and you know your way around
running and inspecting queries. A good understanding of indexing,
locking, replication and sharding is a bonus!
- Observability: You have worked with logging, monitoring and
alerting tools before and you know how logs are collected,
aggregated and ingested. You have set up monitors and alerts for
production services and know your way around concepts such as SLOs
and SLIs
- Bonus points: We believe strong engineers can pick up any
technologies and tools fast and hit the ground running. Therefore,
we avoid listing specific technologies. However, if you have worked
with at least one of the technologies we have in our stack, that
would definitely be a bonus point.
- CloudFormation/CDK, ECS, Elastic Beanstalk
- PostgreSQL, DynamoDB, AuroraDB
- TypeScript or any JavaScript-based framework
Salary Range:
$200,000 - $242,500 (base pay). This position is eligible for the
following benefits:
- Equity
- 401(k) with up to 2% match
- ClickUp swag
- Teammate recognition awards
- Professional development program
- Health insurance
- Dental insurance
- Paid parental leave
- Flexible paid time off
- Sabbatical program
- Wellness stipend
Keywords: ClickUp, Moreno Valley, Senior Database Reliability Engineer, IT / Software / Systems, San Diego, California