Data is mainly collected through computers and networks. Anything a computer processes is easy to collect, such as browser searches, clicks, and online purchases. Other data (such as temperature, seawater salinity, or seismic waves) can be converted into digital signals by sensors and then fed into a computer.
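The sensor-to-digital step can be made concrete with a minimal sketch of what an analog-to-digital converter does: it maps a continuous reading onto one of a fixed number of integer codes. The sensor range (-40 °C to 85 °C), the 10-bit resolution, and the function names `quantize` and `dequantize` are illustrative assumptions, not taken from any particular device.

```python
def quantize(value: float, v_min: float, v_max: float, bits: int = 10) -> int:
    """Map an analog reading in [v_min, v_max] to an integer code in [0, 2**bits - 1]."""
    if v_max <= v_min:
        raise ValueError("v_max must be greater than v_min")
    levels = 2 ** bits - 1
    # Clamp out-of-range readings, the way a real converter saturates at its limits.
    clamped = min(max(value, v_min), v_max)
    return round((clamped - v_min) / (v_max - v_min) * levels)


def dequantize(code: int, v_min: float, v_max: float, bits: int = 10) -> float:
    """Recover the approximate analog value that a digital code stands for."""
    levels = 2 ** bits - 1
    return v_min + code / levels * (v_max - v_min)


# Hypothetical temperature sensor spanning -40 to 85 degrees C, 10-bit converter.
readings = [21.7, 22.1, -50.0, 90.0]
codes = [quantize(t, -40.0, 85.0) for t in readings]
print(codes)  # [505, 508, 0, 1023]
```

The round trip is lossy: `dequantize(505, -40.0, 85.0)` gives about 21.706, so the digital record is only as precise as one step of the converter (here 125 / 1023, roughly 0.12 degrees).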
Generally speaking, collected data should be cleaned and organized first. Commonly used software: Tableau and Impure are all-in-one tools, Refine and DataWrangler are dedicated data-cleaning tools, and Weka is used for data mining.
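Dedicated cleaning tools like Refine automate chores such as trimming stray whitespace, normalizing inconsistent spellings, and dropping incomplete or duplicate rows. The hand-rolled Python sketch below only illustrates those typical steps; it is not how any of the tools above work internally, and the records, field names, and helper names (`clean_record`, `clean`) are made up for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical raw records: inconsistent case, stray whitespace,
# a missing value, and a duplicate that only differs in formatting.
RAW = [
    {"city": " Beijing ", "temp": "21.5"},
    {"city": "beijing", "temp": "21.5"},
    {"city": "Shanghai", "temp": ""},
    {"city": "SHENZHEN", "temp": "28.0 "},
]


def clean_record(rec: Dict[str, str]) -> Optional[dict]:
    """Normalize one record; return None if a required field is missing or invalid."""
    city = rec.get("city", "").strip().title()
    temp_text = rec.get("temp", "").strip()
    if not city or not temp_text:
        return None  # drop incomplete rows
    try:
        temp = float(temp_text)
    except ValueError:
        return None  # drop rows whose temperature is not a number
    return {"city": city, "temp": temp}


def clean(records: List[Dict[str, str]]) -> List[dict]:
    """Clean every record and drop duplicates, keeping the first occurrence."""
    seen = set()
    result = []
    for rec in records:
        cleaned = clean_record(rec)
        if cleaned is None:
            continue
        key = (cleaned["city"], cleaned["temp"])
        if key not in seen:
            seen.add(key)
            result.append(cleaned)
    return result


print(clean(RAW))
```

Here the two Beijing rows collapse into one after normalization, and the Shanghai row is dropped because its temperature is empty.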
Hadoop is a software framework for the distributed processing of large amounts of data. R, the statistical analysis language, has an extension, R+Hadoop, that can run R code on a Hadoop cluster. For the details, search for it yourself.
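Hadoop's processing model, MapReduce, splits a job into a map phase and a reduce phase, with a shuffle/sort step in between that groups all values by key. The single-machine Python sketch below only simulates that flow for a word count; it does not use any Hadoop API. On a real cluster, similar mapper and reducer logic would typically be packaged for Hadoop Streaming or written against Hadoop's Java API. The function names are hypothetical.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, Tuple


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (word, 1) for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reducer(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Reduce phase: sum the counts for each word.

    Assumes the pairs arrive sorted by key, which is what the framework's
    shuffle/sort step guarantees between the two phases.
    """
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Simulate map -> shuffle/sort -> reduce on one machine."""
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(shuffled))


docs = ["big data needs big tools", "data tools for data"]
print(run_job(docs))  # {'big': 2, 'data': 3, 'for': 1, 'needs': 1, 'tools': 2}
```

The point of the split is that mappers can run on many machines at once over different chunks of input, and each reducer only needs the values for its own keys.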
There are many tools for visualizing results; Wikipedia's "Data visualization" entry is a good place to start.
Tableau and Impure have built-in visualization features, and R can also draw charts.
There are also many frameworks and controls for displaying visualizations on web pages.
They are roughly based on four technologies: Flash (Flex), JS (HTML5), Java, and ASP.NET (Silverlight).
For Flash, there are Degrafa, BirdEye, Axiis, and Open Flash Chart.
For JS, there are Ajax.org, Sencha Ext JS, Filament, jQchart, Flot, Minigraph, gRaphael, TufteGraph, Exhibit, PlotKit, ExplorerCanvas, MilkChart, Google Chart API, and Protovis.
For Java, there are Choosel, google-visualization-java, GWT Chronoscope, and JFreeChart.
For ASP.NET, there are Telerik chart, Visifire chart, and Dundas chart.
At present I prefer D3 (Data-Driven Documents), which offers a rich variety of graphics and interactivity. Visit d3js.org to see demos of many kinds of charts.
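D3's central idea is binding data to document elements and deriving each element's attributes from its datum, usually through a scale such as `d3.scaleLinear`. The Python sketch below only imitates that idea by generating an SVG bar chart as a string; it is not D3 and does not use D3's API. The function names `scale_linear` and `bar_chart_svg` are hypothetical, and the sample data are arbitrary.

```python
from typing import Callable, Sequence


def scale_linear(domain: Sequence[float], range_: Sequence[float]) -> Callable[[float], float]:
    """Return a function that maps the domain interval linearly onto the range."""
    d0, d1 = domain
    r0, r1 = range_
    return lambda x: r0 + (x - d0) / (d1 - d0) * (r1 - r0)


def bar_chart_svg(data: Sequence[float], width: int = 200, bar_height: int = 20) -> str:
    """Render one horizontal bar per datum, sized from the data."""
    if not data or max(data) <= 0:
        raise ValueError("need at least one positive value")
    x = scale_linear((0, max(data)), (0, width))
    bars = []
    for i, value in enumerate(data):
        # Each data value drives the attributes of exactly one <rect>.
        bars.append(
            f'<rect x="0" y="{i * bar_height}" '
            f'width="{x(value):.1f}" height="{bar_height - 2}"/>'
        )
    height = len(data) * bar_height
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        + "".join(bars)
        + "</svg>"
    )


svg = bar_chart_svg([4, 8, 15, 16, 23, 42])
print(svg)
```

Writing `svg` to a `.svg` file and opening it in a browser shows six bars, the longest spanning the full 200-pixel width. D3 itself adds what this sketch lacks: updating the chart when data change, transitions, and event handling for interactivity.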