Characteristics of Big Data and Massive Data

Big data refers to a collection of data that cannot be captured, managed, and processed with conventional software tools within an acceptable time frame. It is a massive, fast-growing, and diverse information asset that requires new processing models to deliver stronger decision-making, insight discovery, and process optimization.

Rubik's cube (big data model platform)

The big data model platform is a tool platform for data analysis and mining built on a service bus and distributed cloud computing. It stores data in a distributed file system and supports the processing of massive data. It uses a variety of data acquisition technologies to collect both structured and unstructured data. A graphical model-building tool supports the configuration of process models, and third-party plug-in technology makes it easy to integrate other tools and services into the platform. As a data analysis and judgment platform, it covers the whole process of collecting massive information, building data models, mining and analyzing the data, and finally turning the results into knowledge services for operational use and decision-making. The platform consists of four main parts: data acquisition, model configuration, model execution, and results display.
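A process model configured in a graphical tool ultimately amounts to an ordered chain of steps, each of which can be supplied by a plug-in. The minimal sketch below illustrates that idea under stated assumptions: the plug-in registry, the step names `acquire`, `aggregate`, and `display`, and the sample records are all hypothetical and do not describe the platform's actual interfaces.

```python
from typing import Any, Callable, Dict, List

# Hypothetical plug-in registry mapping a step name to a callable.
# This is an illustration of plug-in integration, not the platform's real API.
PLUGINS: Dict[str, Callable[[Any], Any]] = {}


def register(name: str) -> Callable[[Callable[[Any], Any]], Callable[[Any], Any]]:
    """Decorator that adds a function to the plug-in registry under `name`."""
    def wrap(fn: Callable[[Any], Any]) -> Callable[[Any], Any]:
        PLUGINS[name] = fn
        return fn
    return wrap


@register("acquire")
def acquire(_: Any) -> List[dict]:
    # Stand-in for data acquisition: returns a few structured records.
    return [{"city": "A", "amount": 3}, {"city": "B", "amount": 5}, {"city": "A", "amount": 4}]


@register("aggregate")
def aggregate(rows: List[dict]) -> Dict[str, int]:
    # Stand-in for model execution: total amount per city.
    totals: Dict[str, int] = {}
    for row in rows:
        totals[row["city"]] = totals.get(row["city"], 0) + row["amount"]
    return totals


@register("display")
def display(totals: Dict[str, int]) -> List[str]:
    # Stand-in for results display: one formatted line per key, sorted by key.
    return [f"{k}: {v}" for k, v in sorted(totals.items())]


def run_process_model(steps: List[str]) -> Any:
    """Execute a process model, i.e. an ordered list of plug-in names."""
    data: Any = None
    for step in steps:
        data = PLUGINS[step](data)
    return data


print(run_process_model(["acquire", "aggregate", "display"]))  # ['A: 7', 'B: 5']
```

Because each step only depends on the output of the previous one, a new tool can be added by registering one more function, without touching the execution engine.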

Data Extraction Tool for Big Data Platform

The data extraction tool of the big data platform imports data from databases into HDFS (the Hadoop Distributed File System). Using Hadoop's efficient distributed parallel processing on a cluster, it extracts database data into HDFS in parallel batches, splitting the work by database partition, by field, and by paging. This addresses the heavy workload and long extraction times of traditional big data extraction and provides a transfer pipeline for big data warehouses.

The data processing server allocates each job its own worker thread and task execution queue, so jobs do not interfere with one another. Job processing is flexible: jobs can be executed incrementally, and the scheduling policy that decides when tasks run can be configured to suit different needs. An asynchronous, event-driven model is used to dispatch job instructions and collect job status data. From the management and monitoring console, operators can watch the running status of jobs on each data processing node in real time and view each job's execution history, which makes it easy to submit new jobs, re-run jobs, or stop running jobs.
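Splitting an extraction by field and by paging can be sketched as generating one independent query per slice, so that each slice can be handed to a separate parallel worker. The sketch below assumes a numeric key column with known minimum and maximum values; the table and column names are hypothetical, and the tool's actual splitting logic is not described in the source.

```python
from typing import List, Tuple


def field_range_splits(min_id: int, max_id: int, num_splits: int) -> List[Tuple[int, int]]:
    """Split the inclusive key range [min_id, max_id] into contiguous, non-overlapping ranges."""
    if max_id < min_id:
        return []
    total = max_id - min_id + 1
    num_splits = max(1, min(num_splits, total))
    base, extra = divmod(total, num_splits)
    ranges = []
    start = min_id
    for i in range(num_splits):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        ranges.append((start, start + size - 1))
        start += size
    return ranges


# Identifiers are interpolated directly for brevity; production code should
# validate table/column names and bind values as query parameters.
def split_queries(table: str, column: str, min_id: int, max_id: int, num_splits: int) -> List[str]:
    """Field partitioning: one SELECT per key range, each runnable by a separate worker."""
    return [
        f"SELECT * FROM {table} WHERE {column} BETWEEN {lo} AND {hi}"
        for lo, hi in field_range_splits(min_id, max_id, num_splits)
    ]


def page_queries(table: str, order_by: str, row_count: int, page_size: int) -> List[str]:
    """Paging: LIMIT/OFFSET queries; a stable ORDER BY keeps the pages disjoint."""
    return [
        f"SELECT * FROM {table} ORDER BY {order_by} LIMIT {page_size} OFFSET {offset}"
        for offset in range(0, row_count, page_size)
    ]


for q in split_queries("orders", "id", 1, 10, 3):
    print(q)
```

Range splits on an indexed key usually scale better than deep OFFSET paging, because each range query can seek directly into the index; this is similar in spirit to the split-by approach used by Apache Sqoop.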

Internet data acquisition tools

Network Information Radar is a targeted web information collection product. It collects and updates data from the websites that users specify, supports flexible web data collection goals, and provides a basis for Internet data analysis.
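Keeping a user-defined set of sites up to date means noticing which pages changed between crawls. One common approach, assumed here rather than taken from the product, is to fingerprint each page's content and compare it with the fingerprint from the previous crawl. Fetching is left out so the sketch stays self-contained; the `changed_pages` helper and the example URLs are hypothetical.

```python
import hashlib
from typing import Dict, List


def content_hash(html: str) -> str:
    """Fingerprint a page's content so unchanged pages can be skipped."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def changed_pages(fetched: Dict[str, str], seen: Dict[str, str]) -> List[str]:
    """Return URLs whose content is new or different since the last crawl.

    `fetched` maps URL -> page content from this crawl; `seen` maps
    URL -> content hash from earlier crawls and is updated in place.
    """
    changed = []
    for url, html in fetched.items():
        digest = content_hash(html)
        if seen.get(url) != digest:
            changed.append(url)
            seen[url] = digest
    return changed


seen: Dict[str, str] = {}
print(changed_pages({"https://example.com/a": "v1", "https://example.com/b": "v1"}, seen))
print(changed_pages({"https://example.com/a": "v1", "https://example.com/b": "v2"}, seen))
```

The first call reports both pages as new; the second reports only `https://example.com/b`, whose content changed. Real pages often contain volatile fragments such as timestamps or ads, so a practical collector would typically hash only the extracted main content.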

Weizhiyun (Internet Push Service Platform)

The cloud computing data center is built on advanced Chinese-language data processing and support for massive data, supplemented by manual services at every stage, which keeps the data center running safely and efficiently. Each stage of the data center's operation has dedicated staff: system management and maintenance personnel, data processing and editing personnel, data collection and maintenance personnel, platform system administrators, institutional administrators, and public opinion monitoring analysts. For users, we provide solutions for both government and enterprise customers.

Microscope (big data text mining tool)

Text mining refers to computational techniques for extracting valuable information and knowledge from text data, including text classification, text clustering, information extraction, entity recognition, keyword indexing, automatic summarization, and so on. Text mining software built on Hadoop MapReduce can mine and analyze massive volumes of text. An important application of CKM is intelligent comparison, which is widely used in patent novelty searches, scientific and technological novelty searches, document duplicate checking, copyright protection, manuscript provenance tracing, and other fields.
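Intelligent comparison, as used in duplicate checking, comes down to scoring how similar two documents are. The sketch below uses character shingles and Jaccard similarity, a standard near-duplicate technique; it is an assumption for illustration, not CKM's actual algorithm, and it runs on a single machine rather than under MapReduce. Character k-grams are used so that the same code works for Chinese text, which has no spaces between words.

```python
from typing import Set


def shingles(text: str, k: int = 3) -> Set[str]:
    """Character k-grams of the text with all whitespace removed."""
    cleaned = "".join(text.split())
    if len(cleaned) < k:
        return {cleaned} if cleaned else set()
    return {cleaned[i:i + k] for i in range(len(cleaned) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| of the two documents' shingle sets."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty documents are treated as identical
    return len(sa & sb) / len(sa | sb)


print(jaccard("abcde", "abcdf"))  # 2 shared shingles out of 4 distinct -> 0.5
```

To scale this to massive corpora, a MapReduce job could emit `(shingle, document_id)` pairs in the map step and group documents that share shingles in the reduce step, so that only candidate pairs need a full comparison.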

Data cube (visual relationship mining)

Big data visual relationship mining presents its results in several forms, including relationship diagrams, timelines, analysis charts, and lists, giving users a comprehensive view of the information.
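A relationship diagram and a timeline can both be derived from the same underlying records. The sketch below assumes each record carries a date and a list of mentioned entities, and links entities that co-occur in a record; the record format and sample data are hypothetical, and the product's actual relationship model is not described in the source.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Record = Tuple[str, List[str]]  # (ISO date, entities mentioned in the record)


def relationship_graph(records: List[Record]) -> Dict[str, Set[str]]:
    """Link every pair of entities that appear in the same record (undirected edges)."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for _, entities in records:
        for a, b in combinations(sorted(set(entities)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def timeline(records: List[Record], entity: str) -> List[str]:
    """Dates of the records that mention `entity`, oldest first (ISO dates sort as strings)."""
    return sorted(date for date, entities in records if entity in entities)


records: List[Record] = [
    ("2021-03-02", ["Alice", "Acme"]),
    ("2021-01-15", ["Alice", "Bob"]),
    ("2021-02-20", ["Bob", "Acme", "Alice"]),
]
print(sorted(relationship_graph(records)["Alice"]))  # ['Acme', 'Bob']
print(timeline(records, "Alice"))  # ['2021-01-15', '2021-02-20', '2021-03-02']
```

The adjacency sets feed a relationship diagram directly, while the sorted dates give the time axis view for a single entity.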