Four big data tools worth trying!

Compiled by Harris | Source: Computer Room 360

Big data is becoming more and more important, because enterprises must deal with ever-growing volumes of stored data arriving from multiple sources.

The conditions for adopting big data amount to a perfect storm. Cheap storage and an influx of structured and unstructured data have driven the development of many big data tools that help enterprises "unlock" their accumulated data, from customer records to product performance results.

Like traditional business intelligence (BI), these new big data tools can analyze past trends and help enterprises identify important patterns, such as specific sales trends. Many big data tools now go further, providing a new generation of predictive and prescriptive insights drawn from all the data buried in enterprise data centers.

Regarding the challenges enterprises face, Doug Laney, an analyst at research firm Gartner, said that the real problem is not scaling out infrastructure to process all this data, but handling the sheer variety of the data itself.

"For the real challenge, enterprises process and integrate their own and customers' transaction data, and * * * construct and understand the input, plus data from partners and suppliers, as well as some exogenous data, such as open data and aggregated data of social media, etc., which only scratch the surface." Lanny said in an email.


Although Gartner's clients indicate, by roughly a two-to-one margin, that the variety of data is a bigger problem for them than its volume, the pace of data growth keeps accelerating, and data processing vendors will continue to pour money into ever-faster solutions.

Doug Henschen, an analyst at Constellation Research, said that big data solutions are definitely evolving.

"In my book, 20 14 is the year when SQLHadoop was published, but this year, enterprises and vendors began to realize that the opportunity of big data is not only to expand traditional BI and databases." Hensend said, "Therefore, ApacheSpark open source framework and other analysis schemes have surpassed SQL in 20 15. In 20 15, hundreds of suppliers and large companies began to adopt the ApacheSpark open source framework. IIBM has accepted the most obvious suppliers who advocate other analysis options, and many other companies committed to data integration and big data platforms have joined the ranks. "

In fact, a wave of big data solutions is arriving: vendors launch new offerings almost daily, some of them quite comprehensive. A complete list would be difficult to compile, but these four tools deserve a place on any user's shortlist.

(1) H2O.ai: for data scientists

H2O.ai is an independent open source machine learning platform launched by startup 0xdata at the end of 2014. It mainly serves data scientists and developers, providing a fast machine learning engine for their applications. 0xdata says it can process and analyze data from any source (such as Hadoop and SQL) on commodity hardware, and can even run across thousands of network nodes or on Amazon's AWS cloud. Individuals can try and continue to use H2O.ai for free; 0xdata charges business users.

"Many companies use Spark instead of Hadoop's short-term memory, just like the memory of big data." Oleg Rogesco, vice president of marketing and growth of H2O Company, said, "h20.ai has a function beyond Spark in reading your short-term memory, which basically provides ultra-fast analysis ability."

Rogesco said that H2O.ai is a new class of data tool offering predictive analytics. He pointed out that SQL helped drive the early, descriptive stage of data analysis ("tell me what happened"), which was followed by the predictive stage, which looks at what has happened and tries to help customers anticipate what will happen next: for example, inventory running out or a product taking off.

"In the next few years, we will see that the third stage is a compulsory stage. The system said,' This is my lesson. I think what will happen in the future, you should maximize your goal. Roger said that he also pointed out that Google Maps' ability to actively suggest alternative routes is an example of a normative solution.

H2O.ai positions itself as a predictive tool in the toolbox of data scientists across industries. For example, networking giant Cisco has 60,000 models that predict purchase decisions, and the company uses H2O.ai to score them. Cisco's chief data scientist said: "The results are great. We are seeing H2O.ai perform three to seven times better than our comparable offerings, and on individual model scoring the H2O.ai environment is 10 to 15 times faster."
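At the scale Cisco describes (tens of thousands of models), the core task is simple to state: score every candidate model against holdout data and rank the results. A minimal pure-Python sketch of that loop follows; the "models" and data here are invented toy threshold rules, standing in for the real statistical models a platform like H2O.ai would score in parallel.

```python
# Hypothetical sketch: rank many candidate models by holdout accuracy.
# Each "model" is just a threshold rule on a single numeric feature.

holdout = [  # (feature_value, actually_purchased)
    (0.9, True), (0.8, True), (0.4, False),
    (0.7, True), (0.2, False), (0.6, False),
]

def make_threshold_model(t):
    """Predict 'will purchase' when the feature exceeds threshold t."""
    return lambda x: x > t

models = {f"model_{i}": make_threshold_model(i / 10) for i in range(1, 10)}

def accuracy(model):
    hits = sum(1 for x, label in holdout if model(x) == label)
    return hits / len(holdout)

# Score every model and rank, best first -- the loop a scoring
# environment parallelizes across a cluster.
ranked = sorted(models, key=lambda name: accuracy(models[name]), reverse=True)
print(ranked[0], accuracy(models[ranked[0]]))
```

The speedups Cisco reports come from running exactly this kind of embarrassingly parallel scoring loop in memory across many nodes.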

(2) ThoughtSpot 3: a big data application

With a search engine like Google, the social and web data users need is easy to find online, while enterprise data is generally hard to find and even harder to use. To address this, seven engineers co-founded ThoughtSpot with the goal of building a Google-like search engine for business data.

In its early days, the company delivered its Google-style search as a hardware appliance providing ultra-fast search behind the enterprise firewall. ThoughtSpot combines a new search engine with a fast in-memory database for searching massive amounts of information, and the company also plans to offer a cloud-based service.

Starting at $90,000, ThoughtSpot 3 is a tool that lets enterprise users quickly search big data. "We are seeing growing use of this product by data professionals inside enterprises," said Scott Holden, vice president of marketing at ThoughtSpot. "Two billion people search every day, but at work we still rely on data experts."

Holden gave a demonstration in Palo Alto, California, showing how the system works through a familiar search-bar interface. The just-released ThoughtSpot 3.0 has several new features, including "DataRank," which works much like a combination of Google's PageRank and typeahead: the software uses machine learning algorithms to suggest keywords to searching customers, speeding up the process.

Popcharts is undoubtedly the coolest new feature. As you type "East Coast sales..." into the search box, ThoughtSpot builds charts in real time based on the query, using machine learning to offer more than ten relevant chart types to choose from.

Another "real-time" function is AutoJoins, which is designed to navigate for enterprises with hundreds of data sources. AutoJoins uses ThoughtSpot's data index to find out whether tables are related or not through index mode and machine learning, and presents the research results within one second.

Holden said that ThoughtSpot remains focused on traditional BI analysis of historical data (made ultra-fast and easy to use); predictive and prescriptive analysis features will come in future software.

(3) Connotate

Connotate classifies and analyzes unstructured data from thousands of websites around the world in real time for the Associated Press, Reuters, Dow Jones, and other large companies. Connotate bills its software as the simplest and most cost-effective web data extraction and monitoring solution available, able to put massive data to work, mine valuable information for enterprise growth, and perform highly scalable data monitoring and collection.

Gartner analyst Doug Laney said that Connotate and BrightPlanet are on his list of big data tools because they help harvest and structure a rich range of content, both from enterprises' own databases and from the Internet.

"With digitalization and economic growth, enterprises realize that focusing only on their own data is no longer a foolproof innovative prescription, and they are increasingly turning to external data (that is, data outside the company)." Lenny said.

Connotate says its patented technology for extracting content from web pages goes far beyond web crawling or custom scripts. Instead, it takes an intuitive, visual, machine-learning approach that understands how websites are put together. Connotate describes its content extraction as "accurate, reliable, and scalable."

According to the company, the Connotate platform easily handles hundreds of websites and millions of megabytes of data, delivering targeted, business-relevant information. Connotate says this approach cuts the average cost of content collection by 55% compared with traditional methods.

In one use case, Connotate helped a sales intelligence provider extract contact information (name, title, phone number, e-mail, and affiliation) from thousands of hospital websites to build a national physician database.
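Connotate's actual extraction is visual and machine-learned, but the flavor of the hospital-directory use case can be shown with a plain regular-expression sketch. The HTML and patterns below are invented for illustration; real pages are far messier and less consistent, which is exactly why simple script-based scraping breaks down at scale.

```python
import re

# Invented sample page; a real hospital directory would be messier.
html = """
<div class="staff"><b>Dr. Jane Smith</b>, Cardiology,
phone: 555-0142, email: jsmith@examplehospital.org</div>
"""

# Naive patterns for e-mail addresses and short phone numbers.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\b\d{3}-\d{4}\b")

record = {
    "email": EMAIL.findall(html),
    "phone": PHONE.findall(html),
}
print(record)
```

Multiply this by thousands of differently structured sites, each changing layout without notice, and the appeal of a learned, layout-aware extractor over hand-maintained scripts becomes clear.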

Connotate says the resulting big data solution was sold to several large pharmaceutical companies without requiring additional hardware or IT resources, and that the extraction scaled to cover data on 500,000 doctors.

(4) BrightPlanet

BrightPlanet also extracts data from the Internet, claiming that its search reaches into the so-called "deep web." Its deep-web harvesting can mine data from password-protected websites and other pages that traditional search engines do not normally index.

BrightPlanet says it harvests millions of data items, including data from Twitter, news databases, and medical journals, and can filter them according to an enterprise's specific needs and criteria.

The company offers prospective users a free consultation with a data acquisition engineer, pitching its data-as-a-service (DaaS) offering as a good fit. The consultation is meant to help the enterprise identify the right data to harvest and get it in the right format, so that customers understand both the process and the results.

End users or customers choose the websites from which content is harvested, and BrightPlanet then enriches that content. For example, unstructured data (such as comments on social media sites) is delivered in a customized format for use in a more user-friendly client application.
