Between the ideal and reality of big data

I have worked with data for 25 years. I have lived through several reorganizations, from Telecom to Netcom to Unicom, and took part personally in the whole process by which the data discipline grew, step by step, out of its position as a marginal specialty. I have long wanted to write about this experience but never made up my mind to start. Recently, inspired by Fan Zong's original piece "Thinking about Reading and Understanding Hadoop", I decided to share my thoughts from the perspective of day-to-day data work. Consider it a brick thrown out in the hope of attracting jade.

1. The positioning of the data center

Let me start in the usual way, with a definition: what is data? A common way to put it: if an enterprise is a "production line", then every activity on that line generates information, which is stored in various forms in various systems or other carriers. Data is what results when this information is classified according to certain attributes and rules. It reflects how the business is developing and records how customers use the company's services and how each participant in the industry chain is doing.

Because modern enterprises are organized into departments and managed along professional lines, the complete "production line" is carved up among departments, and the data ends up scattered across each department's management systems. This is the status quo of business and data management in large enterprises: separated responsibilities and scattered data.

How, then, can the overall development of the enterprise be seen? Usually the company-wide picture is presented at the monthly business analysis meeting and in the finance department's analysis report, while the marketing, group customer and other departments each report on their own professional line. It once happened that the finance department reported a decline in overall profit while every business department reported that it had met its targets. The contrast was stark, and the boss wondered: "You all completed your tasks, so how come I didn't?"

Since its reorganization, China Unicom has been pushing data centralization, under pressure from the other professional lines in the provinces. The information technology department stored the detailed data of hundreds of millions of users, drawn from every province and system, centrally at the group level and processed it under unified rules.
Combined with the analysis and applications built later, this centralization not only made the monthly user development statistics more accurate but also exposed irregular operations and falsified performance at the city level. The group chairman convened a national working conference for prefecture- and city-level branches, criticized the heads of several city branches by name and replaced them. The local bosses were left baffled: "I don't even have data this detailed. How did the chairman know?" This is the vital role of data: it breaks down the barriers between departments and provinces so that leaders can see the real state of the enterprise in full, knowing not only what is happening but also why.

In the past two years, through cooperation with external companies, China Unicom has created real value from desensitized user tag data. The application value of data has become increasingly prominent, and data has truly become another valuable resource of the enterprise. Previously no department in the enterprise acted as the manager of "data resources" from a company-wide perspective. Filling that role was the original intention behind the data center, and it is the center's positioning and its unshirkable responsibility.

The establishment of Unicom's data center marked the first time that the "data line", long a weak specialty, shed its dependence on others and became an independent second-level department. It was also a strong endorsement of what Unicom's information department had achieved in centralizing, integrating and applying data in practice and in supporting company management. Anyone who works on the data line will feel this deeply; the difficulties along the way were too many to count.

2. Active or passive? It's all the fault of "support"

The data center was split off from the information department, whose basic orientation is "support", that is, responding to whatever is asked.
We all know the scene. For several days each month, people in the business departments need data beyond the standard reports to write their analysis, so they call the data department and then wait anxiously for the results. In the mirror-image scene, the data department's staff are run ragged by the assorted data requests of assorted departments and work overtime until dawn to deliver. There used to be an office responsible for data services whose annual work summary cited "providing tens of thousands of reports" as its achievement. At users' insistence, a large number of reports with similar content and different formats were developed in the system. On the one hand, users' endless demands could not be met; on the other, a large number of reports in the system went unvisited. Because users cannot get data by themselves, day-to-day data service work is highly passive.

In fact, in terms of content an enterprise has only one set of data. Departments ask for different presentations because they look at it from different angles, but at the data level their requests overlap heavily. Reducing the volume of requests while improving user satisfaction therefore demands well-rounded people in the data management department. Such a person must be familiar with the company's business, processes and division of responsibilities, and must also communicate well: able to understand, consolidate and guide users' needs correctly, and then to solidify the integrated and verified requirements in the system within the overall framework. Someone very capable can still accomplish a certain amount on personal initiative alone.
However, to change the passive situation of data work, the data management department needs to take the perspective of a "data resource manager" and actively build a systematic data management framework covering four aspects (data, applications, management and control, and systems) to guide daily work and system construction.

The figure above shows the results of the data management system study that China Unicom carried out in 2009, a good summary and advancement of data work. It depicts the L0 architecture of the data management system and reveals the components of data management and the relationships among them. Unlike other professional lines, the data specialty has "data" itself as the core of management: data quality, life cycle and security management are the core objects of control, while organization (personnel, regulations) and systems are the basic guarantee that data generates value. Data, applications, management and control, and systems are all indispensable, and they are linked by interrelated, continuously optimized processes. This is by no means as simple as building a few systems, which is exactly what makes managing the data specialty difficult.

The figure above explains the flow of data work well. If the data department wants to reverse its passive situation, it must first have a complete architecture of its own (data, applications, systems, processes and management regulations). Forming this architecture takes four steps:

(1) Assess the current state of its own capabilities correctly.
(2) Identify what the company's business strategy and goals expect of the data specialty, and where the gaps are.
(3) Selectively set the strategic objectives and phased plans for data work, and organize their implementation.
(4) After each phased plan is completed, evaluate the resulting improvement in capability, so as to form a new assessment of the current state and accumulate gradually and effectively.
Colleagues in IT readily understand data, applications and systems, but regulations and processes are harder to grasp. Regulations are the rules of the game: they stipulate who does what, how, and to what standard. Processes make clear how the steps of a job, and the departments involved in it, relate to one another. The current lack of processes has caused far too many problems. Commonly, a business has already gone offline yet is still displayed in our system. A new business has created value for the company, but its income is not shown separately in the financial statements, so its development cannot be reflected in time. The data and report functions are already available in the system, yet the business department still asks the data department to supply data by hand, and users' needs are not passed on to the construction stage in time.

To solve these problems we must form a closed-loop data workflow. At the same time, we must join front-end activities such as company operations, network and management, and take part from the very start in preliminary work such as product planning, infrastructure planning and accounting-subject adjustment, so that data work accumulates effectively and runs normally.
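Two of the mismatches described earlier (something that has gone offline but is still displayed in the system, and a new business that does not show up in the reports) are the kind of check a closed loop could run automatically. The following is only a minimal sketch under assumed names: the `Business` record, its `status` values and the `reconcile` function are hypothetical illustrations, not part of any actual Unicom system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Business:
    """One entry in a hypothetical front-end business catalogue."""
    code: str
    status: str  # assumed values: "online" or "offline"


def reconcile(catalogue, reported_codes):
    """Compare the front-end catalogue with the codes shown in reports.

    Returns three sorted lists of codes:
    - stale_in_reports: offline in the catalogue but still reported
    - missing_from_reports: online in the catalogue but not reported
    - unknown_to_catalogue: reported but absent from the catalogue
    """
    online = {b.code for b in catalogue if b.status == "online"}
    offline = {b.code for b in catalogue if b.status == "offline"}
    reported = set(reported_codes)
    return {
        "stale_in_reports": sorted(reported & offline),
        "missing_from_reports": sorted(online - reported),
        "unknown_to_catalogue": sorted(reported - online - offline),
    }


# Illustrative data only.
catalogue = [
    Business("B001", "online"),
    Business("B002", "offline"),  # taken offline by the business side
    Business("B003", "online"),   # new business, not yet in any report
]
reported_codes = ["B001", "B002"]

result = reconcile(catalogue, reported_codes)
for issue, codes in result.items():
    print(issue, codes)
```

Run regularly, a check of this kind would turn the "still displayed" and "not reflected" cases into explicit work items for the data team instead of surprises found by users. In practice the catalogue would have to come from the front-end planning process, which is exactly why the data team needs to be part of it.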

3. Who is using the data? What are their core needs?

Once the positioning and the scope of work are defined, the first thing to make clear is the goal of the work, and setting that goal requires knowing who the data users are and what they most need. So who demands data? From the perspective of enterprise management, data requirements are usually divided into internal and external requirements.

(1) Internal demand reflects data's duty to serve enterprise management. By management level, internal users include the group, its branches and subsidiaries, and the provincial branches. By management responsibility, they are the company's management, the functional departments and grassroots operating staff.

Management's wish is to grasp the company's overall operation through data and to know: "What happened? What is the main cause? Whom should I go to?" Give me more than ten indicators and, because those indicators may move in opposite directions, I still have to judge for myself which one is the core indicator. What management needs is "concise but not simplistic", which is also the highest requirement. "How can the leadership's desktop be made simple?" If you have never thought about this question, leaders will hardly be satisfied with your work. The best way to meet their needs is to provide one comprehensive index, like a thermometer or the Shanghai Composite Index, a single indicator that sums up the overall situation. Behind such an index stands a comprehensive evaluation system, which takes dedicated research and a great deal of practical testing.

(The figure above is the UI specification of the leadership home page designed for DW 1.0. It is a workbench with three functions: problem discovery, task assignment and problem feedback. In the middle is the evaluation of the company's overall situation for the month. The radar chart shows the gap from the target value for four indicators in the comprehensive evaluation index pool, such as business development, financial situation, enterprise operation and innovation capability, and supports early warning and drill-down exploration. At the bottom of the page are links to hot information and information feedback, which support task assignment by leaders and problem feedback.)

Functional departments are the ones we deal with most. What they want is the data of their own professional line to support daily management, and the most common use is monthly business analysis. Some departments use their own reporting systems, some rely on the data department for support, some have very little usable data, and some have simply built systems of their own to accumulate their models. In terms of application depth, what we can offer functional departments is still only rough-processed data. "What is our most profitable product? Who are our most valuable users? What strategy should we adopt? How effective were the measures we took?" Too many questions need to be answered with data. At present the data still has to be pulled from the back end by the data department; people in functional departments cannot get the data themselves and can do nothing without it. Only when they can analyze their own data will the satisfaction of functional-department users improve.

Grassroots operating staff are the link closest to customers, yet there is very little data they can use. In recent years the drive to stimulate grassroots vitality has placed more demands on data to serve the grassroots. But how can income data at user-level granularity organized by product line, together with cost data organized by management accounting subject, support resource allocation and performance management for front-line staff? At the data level we have done little for grassroots staff.

Finally, a few words on behalf of our customers.
Suppose I work for a Fortune 500 company and walk into a Unicom service hall for the first time. Can Unicom offer me key-account service on that first visit, instead of waiting until I have generated a certain amount of revenue for Unicom and proven my value myself? If I have used Unicom's broadband service for more than 10 years, can Unicom recognize my value and give me comprehensive VIP-level service? Can China Unicom recommend thoughtful services to me in the way I like, at a convenient time and through a more convenient channel, instead of concentrating on digging into my privacy? Meeting these demands takes a large amount of data. For many years the company's business strategy has been "customer-centric", yet the data index system is still "product-centric". The data layer really ought to do something for our customers.

The figure above helps us think about our work goals from another angle. We should consider, from the data users' point of view, what we ought to do, what we are able to do and what we have already done, and form our goals from that. We can no longer cling to the traditional way of working; data work needs to be reviewed and renewed.

(2) External demand is where data serves society and creates value for the enterprise. In recent years, thanks to the advantage of its centralized data, China Unicom has cooperated with many enterprises such as China Merchants and Ant Financial and has developed applications in areas such as mobile phone terminals and user credit indexes, creating new sources of income. I am not familiar with this work, so I am in no position to comment on it.

"In September last year, the State Council released the Action Plan for Promoting Big Data Development, which says that a unified open platform for national government data will be completed by the end of 2018, and that public data resources in more than 20 important fields such as meteorology, environment, credit, transportation, and medical care and health will be opened to society first." "Under pressure from both the government and the market, the old data once sealed away in servers has become a rich 'gold mine'. Excited companies and researchers hunt for data while filtering out and reorganizing the valuable data they need. But not many enterprises can really dig deep. The field is waiting for 'killer' applications to emerge, which will drive fundamental change in industries such as finance, health care, retail and manufacturing." (from "The Pain Point of Big Data")

Like other parts of society, Unicom is going through a process of exploration. First it completed the centralized integration of its own data; then it began to consider integrating and applying external data. For Unicom, external users include government departments, capital-market regulators and audit institutions, and interested partners. External service cooperation, especially paid services, demands a higher degree of productization. In addition, state-owned enterprises carry social responsibilities, and China Unicom's big data applications may be able to help ease traffic congestion, address social problems such as the difficulty of seeing a doctor, and raise residents' happiness index.

People working on big data in every industry share the same feeling: "killer" applications are currently lacking. In my personal view, a "killer" application should first of all combine big-data-based analysis and prediction with personalized needs. For example, AutoNavi provides a peak-congestion forecast for each road.
When users enter a travel plan, they can see predicted congestion a day or even a week ahead and choose when to travel, instead of agonizing over which road to take once they are already on the way. As another example, WeChat's recently released electronic invoice function solves a practical problem and improves efficiency for users while also prying open enterprise-level applications, extending from individuals to enterprises, which leaves the banks feeling even more powerless. I was also very pleased to receive the morning-peak warning messages recently launched by Gaode Map. Whatever the application, I think you should ask yourself, from personal experience, what you really need, and then settle down and solve the problem as well as it can be solved, with a responsible and down-to-earth attitude, so that the result is not judged "crude and simplistic".

4. What should a system be used for? Replication, process and accumulation; in the final analysis, accumulation

Recently friends in other departments have complained to me several times: "It takes a long time to get a report now. I have talked to your new colleagues many times and they still don't know what we want." I had nothing to say. This has become the norm after personnel changes. On a small scale, whether it is an analyst in the marketing department, a requirements manager in the information technology department or a vendor's developer, things "reset to zero" for a while. On a large scale, people change and the earlier work has not been accumulated; those who come later have no idea how far the earlier work had got, and it too seems to have "reset to zero".

In another scene, every month analysts in the marketing department obtain data and write analysis reports, and once the business analysis meeting is over the dust settles. Every year the data department spends a great deal on overtime to provide data and materials for the marketing staff. What value has this spending created?
Is it just so that leaders can listen to a report and enjoy the view? We really need to sit down and think about it. I once met a colleague in the marketing department who had built a very complicated Excel template. All it did was roll the data up into monthly figures, calculate year-on-year and month-on-month changes and composition ratios, and draw trend charts, all of which is easy to do by technical means. Why not turn the template into a system capability and let the system do the work for him? A colleague on the technical side once told me that the people who understand the business are now the most valuable. There is little that technology cannot do; the point is knowing what to do with it. However good the technology, we still have to think about what to do and why.

In my view, what a system does is replicate, enforce process and accumulate; it cannot solve a problem that no person has solved. If you already have a mature template, the system can replicate it every month and across the country, improving efficiency and avoiding human error. If you have designed a sound closed-loop process, the system can help you enforce it strictly. The most valuable thing, however, is accumulation: not only the accumulation of data, applications and processes, but also the accumulation of "knowledge" solidified in the system. It helps those who come later get familiar with the data, so that work does not reset to zero because of personnel changes. "Accumulation" is something to keep in mind at all times.

The figure above shows the levels of capability accumulation in terms of big data application value and system capability. It helps us quickly locate the level we can reach at present and clarify the goal of our efforts. Are we content to supply rough-processed data as raw material, or will we embed ourselves in the enterprise's production process and form a model of business cooperation?

5. What is the key to the development of the data specialty? People, and again people

It took four years, from the reorganization of China Unicom in 2008 to the establishment of the data center in 2012, and the hardships need no retelling. (301 words omitted here.) Nothing gets done without people, and that is especially true of the data specialty. You need a group of people who understand data, can use data, and can endure hardship. The team is the most valuable resource. The necessary conditions for building a talent team include:

(1) a cadre appointment and dismissal system that supports survival of the fittest;
(2) a compensation system that helps retain the best employees;
(3) effective mechanisms for training, exchange and knowledge accumulation that help employees grow quickly;
(4) a salary system that supports a competitive in-house development team;
(5) a bidding process that lets us select the best-quality partners;
(6) partners who recognize their own shortcomings, concentrate on accumulation, work conscientiously and grow with us.

6. Summary

Finally, following the classification of "one's own affairs, other people's affairs, and Heaven's affairs", here is what we can do about "our own affairs":

(1) First, we must have a stable data management framework covering data, applications, systems and regulations. Combined with the company's strategic objectives, this framework yields an evolution roadmap and annual work objectives, and is realized step by step as the annual objectives are met. The data management architecture needs to be understood within the data center (group and provincial branches) and by company management, the information technology department and the other business departments, and implemented unswervingly.

(2) Clarify job responsibilities and the division-of-labor interfaces (between the group and the provincial branches), keep them relatively stable, and avoid "creating posts ad hoc as events arise".
Organize regular employee training and exchange, do a good job of knowledge transfer and information sharing, carry the annual work objectives through to the employee level, and let new employees settle into their roles as quickly as possible. Invite provincial companies to take part in data capability building through thematic research groups to arouse their enthusiasm. Cultivate in everyone at the data center the good habits of "reading and using data" and "finding and solving problems", so that people keep improving themselves, accumulate effectively and form a "growing" data professional team.

(3) Establish a regular communication mechanism with users (the recipients of data services). Actively present our data architecture, improvements in system capability, division of responsibilities and annual work objectives, and reach a consensus at the user level. Guide users to use more of the system's capabilities and benefit from them, so that they truly feel the gain in efficiency and are willing to accumulate together with us.

(4) Integrate the resources around us in multiple ways. Reach a consensus with partners on improving their capabilities, methodologies and product level, so that we progress together. Bring in consulting firms and university experts to take part in special studies, such as the comprehensive index and the customer index system, to strengthen data productization and innovation.

(5) Establish a closed-loop workflow so that the relatively back-end data processes take part in the front-end processes of enterprise operation. This lets us reflect changes in operations in time, update the index system, report structure and related applications regularly, avoid inconsistencies, and carry out life-cycle management of data and applications effectively.
Having said so much, on the one hand I wanted to get off my chest what has built up over the years; on the other, I hope this specialty can seize the opportunity and develop even better. I am reminded of something a leader said many years ago: "Only by achieving something do you earn your place." Riding the east wind of big data, our team has grown again. However, "the ideal is plump, but the reality is skinny." We should be all the more aware of the responsibilities on our shoulders and of the gaps that remain, guard against impetuousness and stay down-to-earth. I hope new colleagues will adapt and settle into their roles as soon as possible.