NumPy is the foundational package for scientific computing in Python. It provides a fast and efficient multidimensional array object (ndarray), functions for element-wise computation and direct mathematical operations on arrays, tools for reading and writing array-based data sets on disk, linear algebra operations, Fourier transforms, and random number generation. In data analysis, NumPy also serves as a container for passing data between algorithms and libraries.
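As a minimal sketch of the features listed above (the array values and the seed are arbitrary illustration choices, not taken from the text), the following shows an ndarray, element-wise arithmetic, a linear algebra operation, and random number generation:

```python
import numpy as np

# A 2 x 3 ndarray; the values are arbitrary illustration data.
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# Element-wise arithmetic applies to every entry without an explicit loop.
doubled = a * 2

# Reductions along an axis: the mean of each column.
col_means = a.mean(axis=0)

# Linear algebra: a (2 x 3) times its transpose (3 x 2) gives a 2 x 2 matrix.
gram = a @ a.T

# Random number generation with a seeded generator for reproducibility.
rng = np.random.default_rng(0)
samples = rng.standard_normal(5)

print(doubled)
print(col_means)
print(gram)
print(samples.shape)
```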
2. pandas
pandas provides a large number of data structures and functions for working with structured data quickly and conveniently. Since its emergence in 2010, it has helped make Python a powerful and efficient data analysis environment. The most commonly used pandas objects are the DataFrame, a column-oriented two-dimensional table structure, and the Series, a one-dimensional labeled array. pandas combines the high-performance array computing of NumPy with the flexible data manipulation of spreadsheets and relational databases. It also provides sophisticated indexing, which makes it easy to reshape, slice and dice, aggregate, and select subsets of data.
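The operations mentioned above can be sketched on a small table. The column names and values here are hypothetical and serve only as an illustration:

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Lima", "Lima"],
    "year": [2020, 2021, 2020, 2021],
    "sales": [10.0, 12.0, 7.0, 9.0],
})

# A single column is a Series: a one-dimensional labeled array.
sales = df["sales"]

# Select a subset of rows with a boolean mask.
recent = df[df["year"] == 2021]

# Aggregate: total sales per city.
totals = df.groupby("city")["sales"].sum()

# Reshape: one row per city, one column per year.
wide = df.pivot(index="city", columns="year", values="sales")

print(recent)
print(totals)
print(wide)
```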
3. matplotlib
matplotlib is the most popular Python library for producing plots and other two-dimensional data visualizations. It was originally created by John D. Hunter (JDH) and is now maintained by a large team of developers. It is well suited to creating plots for publication. Although other visualization libraries are available to Python programmers, matplotlib is the most widely used.
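A minimal sketch of a labeled two-dimensional plot follows. The plotted functions are arbitrary, and the figure is rendered into an in-memory buffer so that the script runs without a display; passing a file name such as "figure.png" (a hypothetical name) to savefig would write it to disk instead:

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.set_title("Sine and cosine")
ax.legend()

# Render the figure as PNG at a resolution suitable for print.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=150)
```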
4. SciPy
SciPy is a collection of packages addressing a number of standard problem domains in scientific computing. Combined with NumPy, it forms a reasonably complete and mature computing platform that can handle many traditional scientific computing problems.
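Two such standard problems, numerical integration and optimization, can be sketched as follows. The integrand and the objective function are arbitrary examples chosen because their exact answers are known:

```python
import numpy as np
from scipy import integrate, optimize

# Numerical integration: the integral of sin(x) over [0, pi] is exactly 2.
area, abs_error = integrate.quad(np.sin, 0.0, np.pi)

# Optimization: minimize a simple quadratic whose minimum is at x = 3.
result = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2)

print(area)
print(result.x)
```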
5. scikit-learn
Since its inception in 2010, scikit-learn has become the general-purpose machine learning toolkit for Python. Its submodules cover classification, regression, clustering, dimensionality reduction, model selection, preprocessing, and more. Together with pandas, statsmodels, and IPython, scikit-learn has played a key role in making Python a productive programming language for data science.
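The sketch below touches three of the submodules named above: preprocessing, model selection, and classification. The data set (the iris data bundled with scikit-learn), the choice of logistic regression, and the split parameters are illustrative assumptions rather than anything prescribed by the text:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small bundled data set as feature matrix X and label vector y.
X, y = load_iris(return_X_y=True)

# Model selection: hold out 25% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing and classification chained into one estimator.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Fraction of held-out samples classified correctly.
accuracy = model.score(X_test, y_test)
print(accuracy)
```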
6. statsmodels
statsmodels is a statistical analysis package that grew out of work by a statistics professor at Stanford University, who implemented a number of regression analysis models popular in the R language. Skipper Seabold and Josef Perktold formally created the statsmodels project in 2010, and it has since attracted a large number of users and contributors. Compared with scikit-learn, statsmodels contains algorithms for classical statistics and econometrics.