Site Reliability Engineer (Reliability & Quality) - Server Architecture at ByteDance (Singapore)
Type: Full Time
Created: 2021-06-18 05:00:12
This position is with TikTok's Site Reliability & Engineering Team. The team is responsible for ensuring that the services provided by TikTok are highly reliable and low latency. Reliability assurance is complex and systematic for any massive application system. The team focuses on optimizing the application architecture from end to end, driven by data analysis and supported by automatic, intelligent failure recovery.
- Ensure the online stability of core systems such as TikTok/Live, respond quickly to online incidents, and build mechanisms and platforms to improve incident-handling efficiency.
- Participate in building operations and maintenance tools and platforms, and promote the automation of operations and maintenance.
- Identify system weaknesses and drive improvement projects through to implementation using continuous and comprehensive data operations (including availability indicators, historical incidents, resource utilization, etc.).
- Accumulate best practices in operations and maintenance, provide guidance for business architecture design and component selection, and produce operations and maintenance technical documentation.
- Drive improvements in service reliability, scalability, and performance to meet system SLAs.
- Bachelor's degree or above, with a major in Computer Science.
- Solid fundamentals in computer software, with an understanding of the principles of the Linux operating system, storage, network I/O, etc.
- Familiarity with one or more programming languages, such as Python/Go/Java/PHP/C/C++.
- Ability to solve problems systematically, good communication skills, and a strong sense of responsibility.