Reliability Engineer
Company: OrangePeople
Location: Orlando
Posted on: May 28, 2023
Job Description:
The Systems Engineer is a critical member of the Technical
Operations Team. They are responsible for end-to-end technical
support of complex enterprise-scale applications which use a
variety of technologies both on-premises and in the cloud. Their
work includes day-to-day operations working with business units to
plan, design, and implement systems as well as monitoring ongoing
maintenance, enhancements, and automation. This role will apply
systems reliability engineering principles, DevOps practices, and
ITSM service operation disciplines that facilitate a highly
efficient, highly available production environment.
Responsibilities:
Utilizes skills and experience to provision
enterprise-scale software and services both on-premises and in the
cloud. This includes selecting the right combination of cloud and
on-prem resources for any given product/solution. Utilizes skills
and experience to provide technical leadership with OS performance
monitoring, tuning, and troubleshooting. Utilizes skills and
experience to provide technical leadership and assistance with
web/application server configuration, performance monitoring,
tuning, clustering, and debugging. Utilizes skills and experience
to act as a liaison to the DevOps process with the project delivery
team. Evaluates new applications/systems for both operational best
practices and technical feasibility against current operational
standards. Participates in major incidents by providing technical
leadership to interpret data from OS, applications, middleware
stacks, and performance management tools. When engaged, takes
responsibility for identifying the point of failure and restoring
normal service operation. Works in a team responsible for Incident
Management, Request Fulfillment, Problem Management, IT Operations
Control, Change Evaluation, and Change Fulfillment. Assists in
creating concise and accurate documentation for Level 1 and Level 2
teams so they can resolve simple to moderate
incidents/issues without escalation. Part of a 24x7 on-call
rotation.
Basic Qualifications:
Bachelor's degree in Computer
Science, Information Technology, or a similar field or related work
experience. 3+ years of progressively more in-depth experience in
a role supporting and/or deploying enterprise-scale solutions that
demonstrates strong analytical and problem-solving skills. 2+
years' experience deploying and/or supporting systems and
applications in a cloud environment, with Amazon AWS and Microsoft
Azure strongly preferred. Proven expertise in setting up,
operating, and tuning a variety of performance management and
monitoring tools such as AppDynamics, SiteScope, Splunk, New Relic,
Grafana, etc. Proven experience working with multiple operating
systems, including a variety of Linux distros, as well as
containerized application deployment strategies such as Docker,
ECS, AKS/Kubernetes, etc. Demonstrated understanding of how to
configure and use code management, configuration, and deployment
tools, including Chef, Rundeck, Jenkins, git, GitHub, Terraform,
CloudFormation, Azure Resource Manager, etc. Demonstrated
understanding of certificate management for a variety of solutions
and use cases, including SSL/TLS and client certificates, for both
on-prem and cloud solutions. Demonstrated understanding of
full-stack application operational concepts such as Java
applications & middleware, NodeJS, Angular, React, etc.
Demonstrated understanding of computer networks and network
infrastructure, including HTTP, TCP/IP, SNMP, DNS, routing,
switching, and load balancing. Familiarity with current software
development lifecycle (SDLC) concepts and best practices and CI/CD
pipelines. Familiarity with IT Service Management (ITSM) processes,
especially incident management, problem management, and knowledge
management. ITIL Certification is desired. Familiarity with problem
analysis best practices, especially Kepner-Tregoe. Strong
interpersonal and communication skills with a track record that
demonstrates the ability to work effectively across a wide range of
constituencies in a diverse corporate environment. Excellent
organizational and time management skills that enable working in a
fast-paced team that is self-motivated to independently complete
tasks on multiple projects simultaneously.
Required Education:
Bachelor's Degree.
Additional Responsibilities:
Participate in
OrangePeople monthly team meetings, and participate in
team-building efforts. Contribute to OrangePeople technical
discussions, peer reviews, etc. Contribute content and collaborate
via the OP-Wiki/Knowledge Base. Provide status reports to OP
Account Management as requested.
About us:
OrangePeople is an
Enterprise Architecture and Project Management solutions company.
Our most valuable asset is our people: dynamic, creative thinkers,
who are passionate about doing quality work. As a member of the
OrangePeople team, you will have access to industry-leading
consulting practices, strategies & technologies, and innovative
training & education. An ideal Orange Person is a technology leader
with a proven track record of technical achievements and a strong
process/methodology orientation.