Characteristics of the data warehouse

Infobright is a column-oriented database built on its patented Knowledge Grid technology. It is an open-source, MySQL-based data warehouse solution that combines column storage, aggressive data compression, and optimized statistical calculations (aggregations such as SUM, AVG, and GROUP BY). Although Infobright is based on MySQL, it does not need a separate MySQL installation because it ships with its own copy. MySQL can be roughly divided into a logical layer and a physical storage engine. Infobright mainly implements a storage engine, but because its storage logic differs fundamentally from that of a conventional relational database, it cannot simply be plugged into MySQL the way InnoDB can. Its logical layer is MySQL's logical layer plus Infobright's own optimizer.

Infobright features

Advantages:

Strong and stable query performance on large data sets: with millions, tens of millions, or billions of records, the same SELECT statement runs 5 to 60 times faster than on ordinary MySQL storage engines such as MyISAM and InnoDB. This efficiency comes mainly from the specially designed storage structure, but the actual gain also depends on the database schema design and on how the query is written.

Huge storage capacity: data sizes in the terabyte range and billions of records. This capacity relies mainly on the high-speed data loading tool (about 100 GB/h) and the high data compression ratio (> 10:1).

High data compression ratio: the average compression ratio is reported to reach 10:1 or higher, and in some cases as much as 40:1, which greatly reduces storage space. The high ratio comes mainly from column storage and the patent-pending flexible compression algorithm.
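To make the quoted figures concrete, here is a back-of-the-envelope sketch in Python. The 1 TB raw data size is a hypothetical value chosen for illustration, and the load rate is read as roughly 100 GB per hour; the helper names are made up for this example and are not part of Infobright.

```python
def compressed_size_gb(raw_gb: float, ratio: float) -> float:
    """Size on disk after compressing raw_gb at ratio:1."""
    return raw_gb / ratio


def load_hours(raw_gb: float, rate_gb_per_hour: float = 100.0) -> float:
    """Time to load raw_gb at the given rate (assumed ~100 GB/h)."""
    return raw_gb / rate_gb_per_hour


raw_gb = 1000.0  # hypothetical: 1 TB of raw data

size_at_10 = compressed_size_gb(raw_gb, 10)  # average ratio quoted in the text
size_at_40 = compressed_size_gb(raw_gb, 40)  # best-case ratio quoted in the text
hours = load_hours(raw_gb)

print(f"10:1 -> {size_at_10:.0f} GB on disk")
print(f"40:1 -> {size_at_40:.0f} GB on disk")
print(f"load time -> {hours:.0f} h")
```

Under these assumptions, 1 TB of raw data would occupy about 100 GB at 10:1 and about 25 GB at 40:1, and loading it would take on the order of 10 hours.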

Column-based storage: no indexes and no partitions are needed. Queries stay fast even on very large data volumes, which makes it suitable for processing massive data in a data warehouse without building indexes. Because no index has to be created, the problems of maintaining indexes and of indexes growing with the data are avoided. Each column is compressed and stored in blocks, and each Knowledge Grid node records statistical information about a block; these statistics take the place of indexes in speeding up searches.
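The idea of using per-block statistics instead of an index can be illustrated with a minimal Python sketch. It is not Infobright's implementation: the block size, the `BlockStats` fields, and the function names are assumptions made for this example. Each block of a column keeps its minimum, maximum, and row count, and a filter such as `value > threshold` consults those statistics first, so that only blocks that might contain both matching and non-matching rows are actually read.

```python
from dataclasses import dataclass
from typing import List, Tuple

BLOCK_SIZE = 4  # assumption: tiny blocks for illustration only


@dataclass
class BlockStats:
    """Per-block summary, standing in for a knowledge grid node."""
    minimum: int
    maximum: int
    count: int


def build_blocks(column: List[int], block_size: int = BLOCK_SIZE) -> List[List[int]]:
    """Split one column into fixed-size blocks."""
    return [column[i:i + block_size] for i in range(0, len(column), block_size)]


def summarize(blocks: List[List[int]]) -> List[BlockStats]:
    """Record min, max, and row count for every block."""
    return [BlockStats(min(b), max(b), len(b)) for b in blocks]


def count_greater_than(
    blocks: List[List[int]], stats: List[BlockStats], threshold: int
) -> Tuple[int, int]:
    """Count values > threshold; return (matches, blocks actually scanned)."""
    matched = 0
    scanned = 0
    for block, s in zip(blocks, stats):
        if s.maximum <= threshold:
            continue  # no row can match: skip the block entirely
        if s.minimum > threshold:
            matched += s.count  # every row matches: answer from statistics
            continue
        scanned += 1  # mixed block: the data itself must be read
        matched += sum(1 for v in block if v > threshold)
    return matched, scanned


prices = [1, 2, 3, 4, 10, 12, 11, 13, 5, 9, 20, 7]
blocks = build_blocks(prices)
stats = summarize(blocks)
result, scanned = count_greater_than(blocks, stats, 8)
print(result, scanned)  # prints: 6 1
```

In this toy run, only one of the three blocks has to be read to answer the query; the other two are resolved from their statistics alone, which is the effect the text attributes to the knowledge grid.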

Quick response to complex aggregate queries: well suited to complex analytical SQL queries that use SUM, COUNT, AVG, and GROUP BY.