The following are views on operation and maintenance from several senior operation and maintenance engineers in China's Internet industry (for privacy, their names are abbreviated):
CXY:
Operation and maintenance is a very broad term, and its responsibilities and positioning differ from company to company and from stage to stage. Taking the term literally and assuming the job is just typing a few commands is a mistake. At a start-up, an operation and maintenance engineer may have to do everything: registering a domain name, buying or renting servers, racking them, configuring network devices, deploying operating systems and runtime environments, deploying code, designing and deploying monitoring, defending against vulnerabilities and attacks, and so on. At large companies, the demands on operation and maintenance keep rising, which has produced a finer division of labor. Broadly, it can be divided into website operation and maintenance, system operation and maintenance, network operation and maintenance, database operation and maintenance, IT operation and maintenance, operations development, operations security, and other directions.
Many laypeople think of operation and maintenance as one tiny part of IT operation and maintenance: installing operating systems. Some R&D engineers see it as limited to a few tasks: deployment, changes, monitoring, and incident response.
Whatever the direction, the most basic duty of operation and maintenance is to keep the business running stably, so it must own business stability. Many people think of operation and maintenance engineers as firefighters who respond to anomalies 24/7 and put out fires. But an engineer responsible for stability is closer to a doctor. Doctors are divided into departments, there are also emergency rooms, and a doctor must first diagnose the patient's problem and then prescribe the right medicine.
A business has many kinds of needs. If operation and maintenance engineers can meet them, or proactively identify the business's pain points and ways to improve, they can deliver more value to the business.
When meeting business needs, we should prioritize those that matter most for rapid business growth, such as stability, deployment and change efficiency, and capacity management. Stability goes without saying: if users cannot use your service reliably, no product feature is worth anything. A fast-growing Internet company like Baidu ships many upgrades to users every day. Our goal is to roll out product upgrades across large clusters in multiple regions as quickly as possible while keeping users unaware of the upgrade process. When users open Baidu to check whether their network connection works, that is a compliment to the quality of its operation and maintenance.
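Rolling out an upgrade quickly while users notice nothing is commonly done as a batched rolling upgrade: update a small batch, verify it, and only then move on. The following is a minimal Python sketch of that idea under stated assumptions, not a description of Baidu's actual system. `upgrade` and `healthy` are hypothetical callbacks standing in for a real deploy step (including traffic draining) and a real health probe.

```python
from typing import Callable, List


def rolling_upgrade(
    hosts: List[str],
    batch_size: int,
    upgrade: Callable[[str], None],
    healthy: Callable[[str], bool],
) -> List[str]:
    """Upgrade hosts batch by batch and return the hosts that passed verification.

    Stops at the first batch that fails its health check, so a bad release
    never reaches the remaining hosts.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    verified: List[str] = []
    for start in range(0, len(hosts), batch_size):
        batch = hosts[start:start + batch_size]
        # At most one batch is being upgraded at a time; hosts outside it
        # keep serving traffic (the hypothetical `upgrade` is assumed to drain it).
        for host in batch:
            upgrade(host)
        # Check the whole batch before touching the next one.
        if not all(healthy(host) for host in batch):
            break  # halt the rollout; the caller decides whether to roll back
        verified.extend(batch)
    return verified
```

Small batches trade rollout speed for a smaller blast radius. In practice the batch size often grows as confidence in a release increases.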
Second, we can look horizontally across the needs of different businesses. If we can abstract the requirements shared by multiple services and turn work of general value into platforms (such as databases, CDN, monitoring, traffic access and scheduling, and big-data storage and computation), that is another direction for growth. At Baidu's scale of traffic and servers, you not only have enormous room and challenges, but also sufficient resources and support to develop and apply the industry's most cutting-edge technologies.
With enough experience, you can work at both the macro and micro levels, considering intelligent deployment and scheduling of services across the whole company (covering networks, hardware, systems, application development practices, and more) to further improve efficiency and reduce costs.
If you understand the business and its business model, and optimize and innovate closely with it, that is another way for an operation and maintenance engineer to demonstrate value. Many product innovations, patent applications, published papers, and improvements in business metrics have been contributed, directly or jointly, by operation and maintenance engineers.
YBX:
Compared with R&D engineers, operation and maintenance engineers, especially senior ones, can observe the systems they maintain globally, without module boundaries. This unique position brings a lot of value. They know the real bottleneck of the system, and therefore its real capacity, and they know how to add capacity quickly before the bottleneck is reached. Knowing the system's risk points, they can coordinate the modules upstream and downstream of each risk point and build redundancy strategies, which is more effective than focusing only on the stability of a single module. Long-term work in this area builds up experience in architecture design that can guide the design and review of new architectures. Looking across the company's different businesses, operation and maintenance can abstract common modules, manage them uniformly, and build effective platforms and automated management methods. Likewise, resources can be allocated uniformly across businesses, which saves resources.
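The link between knowing the bottleneck, knowing the capacity, and adding capacity in time can be made concrete with a small calculation. The following is a minimal Python sketch under simplifying assumptions. It assumes every request passes through each module in series, that each module's maximum throughput (in QPS) has been measured independently, for example by load testing, and that traffic grows at a fixed compound daily rate. Real systems are rarely this tidy. The function names are hypothetical.

```python
import math
from typing import Dict, Tuple


def system_capacity(module_qps: Dict[str, float]) -> Tuple[str, float]:
    """Return (bottleneck module, system capacity in QPS).

    Assumes a serial chain: every request passes through every module,
    so the module with the lowest maximum QPS caps the whole system.
    """
    bottleneck = min(module_qps, key=lambda name: module_qps[name])
    return bottleneck, module_qps[bottleneck]


def days_until_exhausted(current_qps: float, daily_growth: float, capacity_qps: float) -> float:
    """Days until traffic growing at a compound daily rate reaches capacity.

    Solves current_qps * (1 + daily_growth) ** d = capacity_qps for d.
    """
    if current_qps >= capacity_qps:
        return 0.0  # already at or over capacity
    if daily_growth <= 0:
        return math.inf  # flat or shrinking traffic never hits the limit
    return math.log(capacity_qps / current_qps) / math.log(1 + daily_growth)
```

As an illustrative example only, suppose a database tier tops out at 8,000 QPS while traffic is 4,000 QPS and growing 1% a day. `days_until_exhausted` then returns about 70 days, which is the window for adding capacity or removing the bottleneck.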
KZ: Design and implement software that improves the availability, scalability, latency, and efficiency of company services. Handle daily emergencies, fix and replace faulty components, and design ways to keep the same problems from recurring. Design and implement new architectures and standards for very large-scale distributed systems. Take part in service expansion planning, forecast service growth trends, and optimize software and system performance. Provide online consulting and on-site troubleshooting. Build an automated operation and maintenance platform to solve daily problems. Build a knowledge base and anticipate possible problems.
XX:
Operation and maintenance is the whole process of maintaining the production environment and the resources and services related to it, including the associated technologies and processes, to keep the production environment running stably, efficiently, and at low cost.
On the one hand, operation and maintenance is ultimately responsible for business functionality. Its value lies in maximizing the value of the product, usually by pushing the performance of product features to the limit. For example, operation and maintenance of a search engine should focus on giving users the best possible search experience: stability, speed, accuracy, freshness, and completeness. Operation and maintenance of an online chat system should keep users' chats real-time and smooth. On the other hand, it is ultimately responsible for the cost of online services, and its value is reflected in reducing the cost of running them.
Generally, how operation and maintenance work is organized depends on the characteristics and requirements of the business being maintained, which gives rise to a number of specialized topics. Common ones include event management, configuration management, change management, capacity management, and so on.
The requirements for operation and maintenance engineers are especially demanding, because they must constantly supplement their knowledge and broaden their research scope to handle different problems.
In the initial stage, excellent operation and maintenance engineers show outstanding initiative and a strong sense of responsibility. Faced with an unfamiliar business, they actively learn about it and the related knowledge until they can maintain it independently.
In the development stage, engineers who make a habit of summarizing and reflecting gradually grow into senior operation and maintenance engineers, and they usually gain a more systematic understanding of service operation and maintenance. Some engineers with excellent project management and planning skills gradually become project managers and develop further along that path. Senior operation and maintenance engineers understand products thoroughly, so they can even become product managers or consultants to product R&D, playing a vital role in the design and development of product features.
SJY:
The technical skills an operation and maintenance engineer needs vary with their specialization, but mastering basic computer architecture, operating systems, and networking is a baseline requirement. For example, you may need to be proficient with the Linux operating system, use various scripting tools to handle daily tasks, and understand the TCP/IP protocol stack well enough to troubleshoot abnormal traffic in a large-scale network. Beyond that, you need to build up a body of experience in software maintainability to guide later work.
In the initial stage, an operation and maintenance engineer aims to master all the software and hardware knowledge and experience needed to maintain a system. In the advanced stage, the engineer designs and develops basic system software that supports the stable and reliable operation of business systems. In other words, the engineer builds software that serves other software, so as to support larger business systems and raise operation and maintenance productivity. The highest stage is taking part in building and operating the software system itself, so that the system is operable by design from the start, maximizing its productivity and minimizing its dependence on external support resources.
zm:
An operation and maintenance engineer should first be a software engineer, but with different responsibilities and emphasis.
An operation and maintenance engineer is not a system administrator. The biggest difference is that an operation and maintenance engineer's job is not only to configure and manage systems, but also to use software development methods to extend system functionality or analyze data.
An operation and maintenance engineer combines the roles of software engineer and system engineer and should have a broader knowledge background than an ordinary software engineer.
The duties of operation and maintenance are to ensure the stable operation of services; to consider service scalability; to raise development requirements from the standpoint of system stability and operability; to locate system problems and even fix bugs directly; and to respond to and handle sudden incidents quickly.

Daily operation and maintenance work includes analyzing a system's requirements and design, thinking about what can be strengthened to ensure stability, and communicating effectively with the system's R&D engineers. It also includes using tools or writing programs to analyze operational data, and writing programs to build tools or platforms that strengthen system stability.

The most important thing for an operation and maintenance engineer is to solve problems with programming and software methods. Their career path should not differ much from a software engineer's; only the focus and the domain differ.
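Using tools or writing programs to analyze operational data can start as small as summarizing an access log. The following is a minimal Python sketch that computes the request count, the 5xx error rate, and the nearest-rank p99 latency. The log format `<path> <status> <latency_ms>` is a hypothetical assumption made for illustration. Real access logs need a parser for their actual format.

```python
from typing import Iterable, List, Tuple


def summarize(lines: Iterable[str]) -> Tuple[int, float, float]:
    """Return (request count, 5xx error rate, p99 latency in ms).

    Expects hypothetical log lines of the form '<path> <status> <latency_ms>'.
    """
    latencies: List[float] = []
    errors = 0
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip malformed lines rather than abort the analysis
        _path, status, latency = parts
        if status.startswith("5"):
            errors += 1  # count server-side errors
        latencies.append(float(latency))
    n = len(latencies)
    if n == 0:
        return 0, 0.0, 0.0
    latencies.sort()
    # Nearest-rank percentile: rank = ceil(0.99 * n), computed in integers
    # to avoid floating-point rounding at exact boundaries.
    rank = max(1, (99 * n + 99) // 100)
    return n, errors / n, latencies[rank - 1]
```

Tail percentiles such as p99 are used here instead of averages because a few very slow requests can hurt users while barely moving the mean.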