1. In practice, data can come from many sources and dimensions: data generated by the enterprise's own management activities, industry data published by governments or institutions, data purchased from data consulting companies or data trading platforms, and data collected from the web with crawler tools.
2. Every post and every employee in an enterprise takes part in its business and management activities, holds resources related to the enterprise, and keeps information and records about those resources. These resources, and the activities that transform them, are where enterprise big data originates. As long as employees in each position take part in collecting and recording data, or work with the relevant equipment to collect it, an enterprise can easily accumulate its own big data.
3. Industry data published by governments and institutions such as the National Bureau of Statistics, China National Statistical Institute, and China Input-Output Society is actually easier to obtain. On their websites you can readily find data such as the basic situation of agriculture, the ex-factory price index of industrial producers, total energy production and its composition, and foreign trade and the utilization of foreign capital. The data is also broken down into monthly, quarterly, and annual reports. Collecting and analyzing it consistently over time gives strong guidance on the industry's development trends.
4. If the data you need is not available on the market, or you don't want to buy it, you can hire a crawler engineer (or become one) and crawl the data yourself. It is fair to say that as long as you can see the data online, you can crawl it. In a web crawler's system framework, the main process consists of three parts: the controller, the parser, and the resource library. The controller's main job is to assign tasks to the crawler threads in a multithreaded setup. The parser does the crawler's basic work, and the resource library stores the downloaded web pages.
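The three-part structure described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a production crawler: the names `LinkParser`, `ResourceStore`, `fetch_http`, and `crawl` are hypothetical, the "parser" here only extracts links, and the "resource library" is an in-memory dictionary rather than a real database.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from threading import Lock
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Parser: collects the absolute URL of every <a href=...> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


class ResourceStore:
    """Resource library: keeps downloaded pages in memory, keyed by URL."""

    def __init__(self):
        self._pages = {}
        self._lock = Lock()

    def save(self, url, html):
        with self._lock:  # worker threads write concurrently
            self._pages[url] = html

    def urls(self):
        with self._lock:
            return sorted(self._pages)


def fetch_http(url):
    """Default downloader (assumed); pass another callable to crawl() to replace it."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(seed, fetch=fetch_http, max_pages=10, workers=4):
    """Controller: assigns URLs to worker threads, one batch of links at a time."""
    store = ResourceStore()
    seen = {seed}        # every URL that has been scheduled so far
    frontier = [seed]    # URLs to download in the current batch

    def work(url):
        try:
            html = fetch(url)
        except Exception:
            return []    # skip pages that fail to download
        store.save(url, html)
        parser = LinkParser(url)
        parser.feed(html)
        return parser.links

    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            next_frontier = []
            for links in pool.map(work, frontier):
                for link in links:
                    if link not in seen and len(seen) < max_pages:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return store
```

Calling `crawl("https://example.com/", max_pages=5)` would download at most five pages and return the store holding them. The downloader is passed in as a parameter so that the controller and parser can be exercised without network access; a real crawler would also need politeness delays and a check of each site's crawling rules, which this sketch leaves out.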
When the sources of enterprise big data are sound, big data engineers can analyze that data more accurately. Big data engineers should therefore keep improving their own skills so that they can analyze data better.