Senior Engineer – Application Support, IT & Site Reliability
Job title:
Senior Engineer – Application Support, IT & Site Reliability
Job type:
Emp type:
Location:
Colombo
Job published:
July 23, 2024
Job Description
Your role:
- Design, implement, and maintain highly available and scalable systems that meet our service level objectives (SLOs).
- Monitor system performance and proactively identify bottlenecks and potential issues.
- Automate routine tasks and processes to improve efficiency and reduce the risk of human error.
- Liaise with clients to understand their needs, ensuring their sites run optimally and securely, while providing expert technical support and guidance.
- Develop and maintain comprehensive monitoring, alerting, and logging systems.
- Configure and manage firewalls to ensure the security of our infrastructure.
- Administer and tune Nginx web servers for performance and resilience.
- Work closely with development, operations, and security teams to ensure smooth deployments and efficient incident response.
- Provide technical guidance and support on IT-related matters.
- Devise IT-related policies and procedures.
- Conduct regular security audits and vulnerability assessments.
- Implement and maintain secure network configurations and protocols.
- Procure and maintain IT and engineering equipment, and keep an inventory of devices, software licenses, and subscriptions.
- Document and share knowledge with the team to promote continuous improvement.
Required experience:
- 3+ years of hands-on experience as a Site Reliability Engineer or Systems Administrator, with strong networking skills across Windows, macOS, and Linux systems.
- Expert knowledge of Linux systems administration (Ubuntu, CentOS, or similar).
- Extensive experience with networking technologies (TCP/IP, DNS, routing, firewalls).
- Proficiency in configuring and managing bare metal servers.
- Knowledge of web servers, reverse proxies, load balancers, and load balancing strategies.
- Strong scripting/automation skills (Python, Bash, or similar).
- Familiarity with cloud infrastructure (AWS, Azure, GCP) is a plus.
- Experience maintaining WordPress websites, including but not limited to core and plugin updates, performance optimization, security and backup management, troubleshooting, and maintenance of contracts.
- Excellent problem-solving and troubleshooting abilities.
- Strong written and verbal communication skills.
- Collaborative mindset and ability to work effectively in a team environment.
- Ability to work independently and take ownership of projects.
Added advantages:
- Experience with containerization technologies (Docker, Kubernetes).
- Knowledge of infrastructure as code (Terraform, Ansible).
- Experience with monitoring and logging tools (Prometheus, Grafana, ELK stack).
- Familiarity with ITIL or similar IT service management frameworks.