How big is China's domestic data labeling market at present?

Data labeling is the process of classifying, boxing, annotating and tagging images, audio, text and other data, marking the features of each object so that the result can serve as basic training material for machine learning. By mode of participation, enterprises in China's data labeling industry fall mainly into two groups: crowdsourcing platforms and self-built factories. Crowdsourcing platforms mainly include Baidu crowdsourcing, JD.COM Zhongzhi and Totoro Data. Factory-model companies mainly include Beisai BasicFinder, Yunce, Love Number Wisdom, Haitian Ruisheng, Ali Data Labeling, Yuan Kun Intelligent Data and Dianwo Technology.

Leading enterprises build their own data teams, while small and medium-sized data suppliers account for the largest share.

At present, the first echelon of China's data labeling market consists of leading companies that have set up their own data labeling departments. JD.COM (JD.COM Zhongzhi), Baidu (Baidu Zhongce), Tencent and Ali (Ali Data Labeling) all have their own labeling platforms and tools. Beyond these leaders, many dedicated data labeling companies have emerged in China in recent years, such as Totoro Data, Testin Cloud Measurement, Beisai BasicFinder and Data Hall. These companies have reached considerable scale, rank just behind the leaders, and form the second echelon.

By scale, the participants in China's data labeling industry are brand data service providers, small and medium-sized data suppliers, and the self-built basic data teams of demand-side companies. These three groups compete with one another and are the main suppliers of the AI data labeling market, accounting for 30.4%, 47.0% and 22.6% of it respectively in 2019. Small and medium-sized data suppliers are therefore currently the largest supplier group in the market.

By model, participants divide into data labeling companies and crowdsourcing platforms, and they offer a wide range of services.

By mode of participation, enterprises in China's data labeling industry are mainly divided into crowdsourcing platforms and self-built factories (professional data labeling companies). In the 2020 ranking of data labeling companies, Testin Cloud Measurement, Data Hall and Totoro Data took the top three places; in the ranking of data labeling crowdsourcing platforms, JD.COM Zhongzhi, Baidu Zhongce and Data Hall took the top three.

Looking at the business layout of representative data labeling enterprises, most service providers offer several types of labeling, including text, voice, image and video. Their services cover security, intelligent driving, medical care, education, finance and other fields, and their main customers include technology companies, artificial intelligence enterprises, traditional enterprises, government departments and scientific research institutions.

Most of the enterprises that focus on visual business build their own labeling bases, which are mostly distributed in Shanxi, Henan and other places.

AI data shows that, by business direction and time of market entry, players in the industry can be roughly divided into early players, mid-to-late players, players focusing on visual services and players focusing on voice services. Players focused on voice data usually hold more datasets with their own intellectual property rights, while those with self-built labeling bases or full-time labeling teams are mostly visual players.

AI data labeling is an indispensable part of the artificial intelligence industry chain, and developing it has become one of the main ways in which regions promote their local AI build-out. Guizhou, Shanxi, Chongqing and other places have successively issued guidelines, brought in technology companies, built data bases and data trading centers, and set up artificial intelligence industrial parks with local characteristics.

At present, many data labeling companies build their own labeling bases or teams. Examples include the Baidu AI data labeling base in Shanxi, the Baidu Big Data Hundred Birds River base, and Data Hall's Baoding data processing base, Hefei data base and Beijing TTS recording center. Most such bases are located in Shanxi, Henan and other places.

Beijing, Shanghai and Chengdu are the top three cities by number of demand-side enterprises, while the number in Hangzhou has declined.

From the perspective of demand-side enterprises, according to AI data labeling statistics, the number of domestic data labeling enterprises was 565 in April 2020 and had increased to 705 by December 2020. By regional distribution, as of the end of December 2020, Beijing, Shanghai, Chengdu, Shenzhen and Hangzhou were the top five cities, with 185, 84, 68, 63 and 46 enterprises respectively. Compared with April 2020, the number of enterprises increased in Beijing, Shanghai, Chengdu and Shenzhen and decreased in Hangzhou.
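The figures quoted above imply a growth rate and a degree of geographic concentration that the text does not state outright. The short Python sketch below derives both from the quoted numbers only; it assumes the 705 total and the five city counts refer to the same end-of-2020 snapshot, and the variable names are illustrative.

```python
# Figures quoted in the paragraph above (number of enterprises).
total_april_2020 = 565
total_end_2020 = 705  # assumed to be the same snapshot as the city counts
top5_end_2020 = {
    "Beijing": 185,
    "Shanghai": 84,
    "Chengdu": 68,
    "Shenzhen": 63,
    "Hangzhou": 46,
}

# Growth in the total number of enterprises between the two dates.
growth_pct = (total_end_2020 - total_april_2020) / total_april_2020 * 100

# Share of all enterprises located in the five leading cities.
top5_total = sum(top5_end_2020.values())
top5_share_pct = top5_total / total_end_2020 * 100

print(f"Growth from April 2020 to end of 2020: {growth_pct:.1f}%")
print(f"Top-five cities: {top5_total} of {total_end_2020} ({top5_share_pct:.1f}%)")
```

On these numbers the enterprise count grew by roughly a quarter, and the five leading cities host a little under two thirds of all enterprises.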

Market concentration is low but will rise in the future, and industry mergers and acquisitions will become a trend.

In 2019, the CR5 (the combined market share of the top five enterprises) of the AI data labeling industry was 26.2%, which places it in a stage of low-concentration competition: the industry is full of vitality and has ample room to develop. Among the top five enterprises, Haitian Ruisheng and Baidu data crowdsourcing stand out. Among domestic suppliers, most companies provide image data collection services, covering portrait data, OCR data and autonomous driving data. Demand in this segment is fragmented, and Baidu data crowdsourcing holds the largest share of revenue.
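As a minimal sketch of how a concentration ratio such as CR5 is computed, the Python snippet below sums the market shares of the n largest firms. The individual shares are hypothetical: the source reports only the 26.2% total, so the per-firm numbers are made up purely so that the top five add up to that figure. The function name `concentration_ratio` is also an illustrative choice.

```python
def concentration_ratio(shares_pct, n=5):
    """Return CR_n: the summed market share (in %) of the n largest firms."""
    return sum(sorted(shares_pct, reverse=True)[:n])


# HYPOTHETICAL per-firm market shares in percent, in no particular order.
# Only the 26.2% total for the top five comes from the text above.
hypothetical_shares_pct = [3.0, 8.0, 2.5, 6.5, 1.8, 5.0, 3.7, 1.2]

cr5 = concentration_ratio(hypothetical_shares_pct)
print(f"CR5 = {cr5:.1f}%")
```

The same function gives CR3 or CR10 by changing `n`, which makes it easy to compare concentration at different cut-offs once real share data is available.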

By comparison, demand for voice data is more concentrated, and the barrier to supplying it is higher than for image data. This segment includes speech recognition data, speech synthesis data and so on, and Haitian Ruisheng holds the largest share of revenue.

At present, the concentration of the AI data labeling industry is moderate: it is neither an oligopoly nor a fully competitive market. On the one hand, Baidu data crowdsourcing, Haitian Ruisheng, Data Hall and other enterprises entered the market early and have accumulated more customer resources. On the other hand, downstream enterprises previously trained their models on public datasets, so their demand for high-accuracy data is still recent, and because this effect is slow to pass through the ecosystem, the barrier to entry is not yet pronounced. Small and medium-sized enterprises with limited capital and R&D strength therefore still have fertile ground in which to grow.

However, as downstream enterprises develop, using their own in-house teams directly will offer low cost and strong data security and controllability. Downstream enterprises will meet some basic needs themselves, and the existing market of external data service providers will shrink. These providers will have to take on distinctive, difficult and cutting-edge tasks, which requires them to invest in R&D on high-precision, professional data processing tools and in basic research on AI algorithms in order to understand customer needs and open up new markets. Capital and R&D strength will thus become a higher barrier to entry. At the same time, because the capital market has cooled in recent years, a number of small and medium-sized vendors are facing business contraction, and some vendors have begun to make acquisitions within the industry. Judging by the development of the overseas data service market, where the industry giant Appen has acquired other companies many times, M&A will also become a trend in China. Under the influence of these factors, industry concentration will increase.

To sum up, mergers and restructuring will be the general trend in the data labeling industry. A typical M&A case in China so far is Beisai BasicFinder, which has acquired a number of professional manual annotation service providers to strengthen its own data collection system and handle more diverse tasks. Globally, the pace of mergers and restructuring in the data labeling industry has also accelerated.

As leading enterprises gradually acquire small, medium-sized and micro data platforms, their bargaining power will rise to a new level. Against this background, the market concentration of the global data labeling industry has risen further. The scale of M&A in the industry will continue to grow, and competition will become increasingly fierce.

For more data, see the Analysis Report on Market Foresight and Investment Strategic Planning of China Data Label Industry by the Forward-looking Industry Research Institute.