Sr Site Reliability Engineer
Company: Parks, Experiences and Products
Location: Orlando
Posted on: August 6, 2022
Job Description:
Do you want to be part of a team that creates magic for millions
of guests? Behind the scenes, the Retail Technology Operations team
helps provide magical digital and physical experiences applying the
latest technology; and our Site Reliability Engineers provide
expert engineering services in the cloud, automation, and
reliability to support the innovation and operation of The Walt
Disney Company. We are passionate about ensuring our systems
provide the best guest experience! You will protect and improve the
automation and systems that run Disney's experiences and services
with a focus on availability, latency, and automation while
embracing a DevOps culture.
Responsibilities:
- Ownership of our current cloud and SaaS services - install,
upgrade, and maintain all necessary middleware components, and work
with cloud vendors to integrate APIs and automation tools
- Incident Response - compile a runbook that identifies all known
and potential risks and incidents, with well-defined procedures to
mitigate or eliminate each risk if it occurs. This role carries
on-call and incident resolution responsibilities
- Deploy and manage innovative modern cloud technologies that are
instrumented and monitored, using infrastructure-as-code,
self-healing, and security automation patterns
- Participate in implementation of complex engineering solutions
across Retail Technologies
- Challenge the status quo through intellectual curiosity and
natural inquisitiveness to look beyond the obvious for continuous
improvement opportunities
- Manage and appropriately escalate delivery impediments, risks,
issues, and changes tied to the engineering initiatives to the
stakeholders
- Work closely with our development teams to ensure smooth
operational transition of solutions, and to improve existing
solutions post turnover
Basic Qualifications:
- Experience programming in one or more of: Python, Perl, Ruby,
Java, Go, Rust, C/C++ (3 years)
- Skilled in Cloud/PaaS/SaaS Environments (e.g. AWS, Azure,
Google Cloud Compute) (3 years)
- Proficient, collaborative, & experienced in building reliable,
scalable, enterprise systems (5 years)
- Ability to identify root-cause sources of instability in
high-traffic, large-scale distributed systems
- UNIX/Linux administration, troubleshooting, performance tuning,
& security (3 years)
- Leading technical projects, working with project managers to
ensure smooth delivery (2 years)
- Experience working with Security Operations teams to design
security into solutions as well as mitigate existing issues (2
years)
- Understanding of observability principles (monitoring, logging,
tracing, alerting), tools and practices that promote observability
(3 years)
- Experience with continuous integration tools (e.g. GitLab, AWS
CodeBuild, CodeDeploy, CodePipeline, Azure DevOps) (3 years)
- Troubleshooting skills that span systems, network, and code
- Configuration management and orchestration (e.g. Terraform,
CloudFormation, Ansible, Chef) (3 years)
- Excellent written and verbal communications; ability to develop
and deliver presentations
Required Education:
- Bachelor's degree in applicable field
Keywords: Parks, Experiences and Products, Sr Site Reliability Engineer, Professions, Orlando, Florida