
Lead Site Reliability Engineer

Company: Kolter Solutions
Location: Orlando
Posted on: March 17, 2023

Job Description:

Kolter Solutions is seeking a Lead Site Reliability Engineer.

Location: Remote.

Responsibilities:


  • Scale up and mature a team of Site Reliability Engineers
  • Build and execute a vision for implementing tools for monitoring, efficient deployments, and remediation SOPs
  • Strategize and prioritize roadmaps to drive reliability of platform(s)
  • Collaborate with key stakeholders across Products, IT, and Security on initiatives to drive operational excellence, instrumentation, security, growth, reliability, and scalability
  • Promote the collaboration of all engineering teams in the sustainability of platform and help promote a culture of quantifiable continuous improvements
  • Drive service reliability by developing and enabling metric visibility using KPIs and system/component level SLAs
  • Serve as a change agent for driving service prioritization and help promote a culture of continuous improvement measured by operational metrics and KPIs. Provide a single pane of glass across all critical operational components
  • Drive end-to-end resolution of production incidents including root cause analysis, and prevention and correction plans
  • Support business infrastructure to ensure service availability, including outside of business hours as needed
  • Optimize services across the company to manage costs, including right-sizing and deprecating systems
  • Contribute to the technology strategy by guiding the production and development technical architecture; maintain high quality standards, especially with technology, and foster a culture of long-term thinking and innovation
  • Support and unblock the SRE team in delivering on its goals; you will oversee the technical scoping and planning for the team, and help guide and empower the development approach within the team
  • Ensure the SRE team is high performing, with a healthy, inclusive and collaborative culture; coach engineers on the team and guide them through a fulfilling career
  • Research new technologies to solve tomorrow's deployment, monitoring, and scaling needs
  • Run the production environment by monitoring availability and taking a holistic view of system health
  • Build software and systems to manage platform infrastructure and applications
  • Improve reliability, quality, and time-to-market of our suite of software solutions
  • Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement
  • Provide primary operational support and engineering for multiple large-scale distributed software applications
  • Manage and participate in 24x7 on-call rotations to ensure site reliability and performance
  • Define best practices for monitoring, alerting, and incident management
  • Lead and participate in root cause analysis and document procedures

Required:


    • 10+ years of relevant professional experience in highly available, public-facing SaaS / EMR environments
    • Exemplary written and oral communication skills
    • Experience leading highly dynamic on-call teams, coaching, mentoring, and promoting cross team collaboration
    • Experience managing multiple projects and priorities simultaneously
    • Proven track record of improving reliability, availability, incident/crisis management and performance of cloud services
    • Experience troubleshooting and developing highly available systems that utilize load balancing, horizontal scalability, and high availability
    • Strong technical expertise in troubleshooting, cloud stacks, operating systems, networking, virtualization, and containers
    • Working knowledge of CI/CD, DevOps, and sophisticated software deployment techniques
    • Experience defining reliability metrics, operations processes including problem management and automation
    • Experience implementing chaos engineering

Knowledge:


      • 4+ years' experience managing a team working with cloud infrastructure (in particular AWS) in a secure environment (ISO 27001, SOC 2 Type 2, GDPR, etc.)
      • 3+ years of technical operations experience, with a background in SaaS and cloud-based platforms
      • Experience with environments that leverage container orchestration tools, e.g., Kubernetes
      • Experience building scalable and fault tolerant systems
      • Experience migrating from Data Centers to Cloud based solutions, and migrating solutions from other cloud providers
      • Past experience successfully leading one or more DevOps projects (CI/CD, pipeline tools, operations management, etc.) to completion through tools like Jenkins, Helm, Terraform, etc.
      • Experience with system health monitoring tools such as New Relic, OpsGenie, Uptime Robot, or StackDriver
      • Have actively managed hosting at scale at multiple companies, including costs and 24x7 uptime
      • Experience with distributed storage technologies such as NFS, HDFS, Ceph, and Amazon S3, as well as dynamic resource management frameworks (Apache Mesos, Kubernetes, YARN)
      • Proficiency with scripting and/or programming languages; Python, Java, C/C++, Ruby, and JavaScript preferred
      • Familiarity with Intranet tools and processes including Confluence, Jira, and Microsoft Teams


Kolter Solutions is a leading professional staffing company based in Central Florida. We place highly skilled individuals in contract, contract-to-hire, and direct-hire positions at clients nationwide.
Kolter Solutions has proudly been recognized among the "Best Places to Work" by the Orlando Business Journal and Staffing Industry Analysts (SIA). We are also in the Fast 50 2020 list of fastest-growing companies in Central Florida!
We offer:


        • Full Health Benefits
        • Vision
        • Dental
        • 401(k)
        • Pet Insurance
        • Life Insurance
        • Supplemental Benefits such as short-term disability, accident insurance, and supplemental dental and vision
        • Employee Discounts
        • Referral Program


Kolter Solutions is an Equal Opportunity Employer. We believe in hiring a diverse workforce and sustaining an inclusive, people-first culture. We are committed to non-discrimination on any protected basis, such as disability and veteran status, or any other basis covered under federal, state or local applicable law.

