Site Reliability Engineer, Cloud
San Francisco, CA, USA
Posted on Thursday, November 9, 2023
This is a San Francisco-based position that is currently remote and will have a hybrid schedule once we return to the office. We are open to candidates willing to relocate to the San Francisco Bay Area.
Geli (Growing Energy Labs, Inc.) provides software and business solutions to design, connect, and operate energy storage systems ranging in size from residential to utility-scale, as well as grid-tied, microgrid, and off-grid systems. Geli’s suite of products creates an ecosystem where project developers, OEMs, financiers, and project operators can deploy advanced energy projects using a seamless hardware-agnostic software platform.
Geli is a subsidiary of Hanwha Q CELLS, one of the world’s largest photovoltaic manufacturers, best known for its high-performance, high-quality solar cells and modules.
Geli is committed to helping make the planet a cleaner, better place to live, both with our software products and through our everyday actions.
Imagine a world where there is less reliance on non-renewable power, where you source your electricity from your neighbors rather than from power stations hundreds of miles away and software makes the best possible use of the solar, wind, and battery storage available. This is our vision.
We are looking for enthusiastic colleagues who are not only fluent in technology but also share our vision of a world running on 100% renewable energy.
Geli is looking for an experienced Site Reliability Engineer (SRE) who will have a range of responsibilities and qualifications related to ensuring the reliability, performance, and availability of Geli’s software and infrastructure.
- System Reliability: Ensure the reliability of our systems and services, minimizing downtime and outages through proactive monitoring, alerting, and troubleshooting.
- Incident Management: Respond to and manage incidents, conduct root-cause analysis, and implement preventive measures to reduce the impact of future incidents.
- Capacity Planning: Collaborate with teams to plan for capacity and scalability, making data-driven decisions about resource allocation and performance optimization.
- Automation: Develop and maintain automation tools for production environments to enhance system reliability and efficiency.
- Change Management: Oversee changes and updates to production systems, prioritizing risk mitigation and minimizing service disruptions during deployments.
- Performance Optimization: Work on performance tuning, profiling, and optimization of systems, making them faster and more efficient.
- On-Call Duty: Participate in an on-call rotation to respond to incidents and issues outside of regular working hours.
- Collaboration: Collaborate closely with Software Engineering, Product, DevOps, and other teams to ensure that reliability is built into the design and development process.
- Documentation: Contribute to documentation of production processes, systems, and incident response playbooks.
DESIRED EXPERIENCE & SKILLS
- Bachelor's degree in computer science, information technology, or a related field
- At least 5 years as an SRE and at least 2 years managing large, production-scale systems.
- Deep understanding of distributed systems, cloud computing, and containerization technologies.
- Strong programming and scripting skills (e.g., Python, Shell, Java) for automation and tools development.
- AWS cloud platform experience preferred.
- Knowledge of security best practices, including network and application security.
- Experience with APM (Application Performance Monitoring) tools.
- Experience with configuration management, infrastructure as code, and orchestration tools (we use Ansible, Terraform, and Kubernetes).
- Proficiency in monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack).
- Knowledge of incident management and root cause analysis.
- Strong problem-solving skills and the ability to work under pressure.
- Excellent communication skills.
- Energy domain experience preferred.
BENEFITS OF WORKING AT GELI
Competitive salary commensurate with experience
Competitive benefits offerings
Conveniently accessible location in downtown San Francisco
Flexible work-from-home-office opportunities, as determined by the position and job duties
Cigna and Kaiser options - available by region
Cigna Dental (including orthodontic coverage) and vision plans, both with options that have a $0 paycheck contribution
Healthcare and Dependent Care Flexible Spending Accounts (FSA)
Company-paid Health Savings Account (HSA) contribution when enrolled in the high-deductible Cigna medical plan
Company-paid Basic Life, AD&D, short-term, and long-term disability insurance
Voluntary benefits include: critical illness, hospital indemnity, accident insurance
401(k) with a 4% employer match
3 weeks of paid Parental Leave
Sick time: 72 hours front-loaded per calendar year
Vacation time (Flex time), and 13 Paid Holidays
Health Advocate wellness and concierge services
Wellness programs with our benefits providers
Bereavement leave: 5 paid days
Make a difference: join a group of people who are passionate about renewable energy
Have an impact: the company is still small enough that everyone’s contribution has a significant impact on the success of the company
Many opportunities to lead teams and projects and to contribute to development
Casual professional working environment: there’s no need to dress up, just present your best self
Work collaboratively in a diverse environment: we commit to reaching better decisions by respecting opinions and working through disagreements
We value the insights that a diverse team can bring. We encourage applications from members of groups that have been traditionally underrepresented in tech.
Growing Energy Labs, Inc. provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, national origin, age, disability, or genetics.