P2P loan data

How to check your online loan records?

In general, only online loans that have been reported to the credit reporting system can be looked up; for loans that have not been reported, there is currently no suitable query channel.

An online loan record can be checked through a credit report inquiry in either of the following ways:

1. Inquiry at a credit reporting window: Individuals can take their ID card to a nearby branch of the People's Bank of China, find the credit reporting window, and submit an inquiry application to the staff.

2. Inquiry on the official website of the Credit Reference Center: Individuals can visit the official website of the Credit Reference Center of the People's Bank of China, click "Internet Personal Credit Information Service Platform" on the homepage, and then click "Start Now". Unregistered users can click "New User" to register. After registering successfully, they can log in to their personal account and make an online inquiry.

What are the top ten P2P online lending platforms in China?

According to statistics published on the Aegis Risk Control official website in June 2021, the top ten P2P online lending platforms in China are ranked as follows: top 1: Pleasant Loan, top 2: Love Money, top 3: Jin Lu Service, top 4: Building Block, top 5: Auction Loan, top 6: Sinkaijinfu, top 7: Renren Loan. The platforms on this list are legitimate, well-regarded online lending platforms in China. They hold a large share of the online lending market and have a major influence on the domestic online lending industry.

A place on the list of the top ten P2P online lending platforms in China is a strong endorsement, because the ranking considers not only a platform's usage and number of users but also user satisfaction and whether the platform operates legitimately.

A platform that offers products such as "714 anti-aircraft guns" (slang for 7- or 14-day loans with extremely high interest and fees), or that engages in other violations such as illegal charges, cannot appear on the list however many users it has. This is also why many well-known online lending platforms are missing: they may have too few users, or they may operate illegally and so have a poor reputation among users.

There are currently a large number of online lending platforms. Many are legitimate and professional, but there are also many illegitimate lenders that try to lure borrowers in. Borrowers therefore need to stay alert, assess the quality of a lending company, and choose a legitimate platform.

Analysis of Prosper Loan Data from an American Online Lending Platform

This article describes how to use Python to assess, tidy and clean the dataset.

After that, Tableau is used to explore, analyze and visualize the question "What are the characteristics of Prosper's defaulting customers?"

Finally, a random forest model is trained on the data from after July 2009 and used to predict whether loans that are still in progress will default.

Prosper was the first P2P lending platform in the United States. The dataset, provided through Udacity, contains Prosper loan data from 2005 to 2014. By analyzing loans that have finished, this article aims to identify which kinds of customers are more likely to default and to predict whether unfinished loans will default.

The original dataset contains 81 variables and 113,937 records in total. Some important variables are explained below; for the meaning of the others, refer to the variable dictionary.

First, load the libraries and the data.

Then use df.describe() and df.info() to inspect the data.
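The loading and inspection steps might look like the sketch below. The helper `load_loans`, the file name `prosperLoanData.csv` and the three column names are assumptions for illustration; a small in-memory CSV stands in for the real file so that the snippet runs on its own.

```python
import io

import pandas as pd


def load_loans(source):
    """Read loan data from a CSV path or file-like object (hypothetical helper)."""
    return pd.read_csv(source)


# Stand-in for the real file, e.g. load_loans("prosperLoanData.csv") (assumed name).
sample_csv = io.StringIO(
    "ListingKey,LoanStatus,StatedMonthlyIncome\n"
    "A1,Completed,3500.0\n"
    "B2,Chargedoff,1800.0\n"
    "C3,Current,5200.0\n"
)
df = load_loans(sample_csv)

summary = df.describe()  # count, mean, std and quartiles of the numeric columns
df.info()                # column dtypes and non-null counts, printed to stdout
```

`describe()` returns a DataFrame that can be inspected further, while `info()` only prints its report.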

This analysis focuses on two questions: 1. What kind of borrowers are more likely to default? 2. Will the outstanding loans default? Columns irrelevant to these questions are therefore dropped.

Prosper changed how it evaluates customers in July 2009, so this analysis covers only loans made after 2009-07-01.
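One way to apply the date cutoff is sketched here on toy rows. The column name `ListingCreationDate` is an assumption about which date field is used, and treating the cutoff day itself as included is also a choice made for this sketch.

```python
import pandas as pd

# Toy rows; "ListingCreationDate" is an assumed column name.
df = pd.DataFrame(
    {
        "ListingCreationDate": ["2008-03-15", "2009-06-30", "2009-07-01", "2012-11-20"],
        "LoanStatus": ["Completed", "Defaulted", "Completed", "Current"],
    }
)

# Parse the text dates, then keep loans listed on or after the cutoff.
df["ListingCreationDate"] = pd.to_datetime(df["ListingCreationDate"])
cutoff = pd.Timestamp("2009-07-01")
df_recent = df[df["ListingCreationDate"] >= cutoff].reset_index(drop=True)
```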

Columns that duplicate the meaning of other columns are also dropped.

Prosper rates new customers differently from returning customers. This analysis covers only new customers.

First, check how much data is missing for each variable.
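The text does not list which columns are removed, so the sketch below uses illustrative picks only: two identifier columns as "irrelevant" and one of two columns that carry the same rating as "duplicate". All column names here are assumptions.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ListingKey": ["A1", "B2"],
        "ListingNumber": [101, 102],
        "ProsperRating (numeric)": [4.0, 6.0],
        "ProsperRating (Alpha)": ["C", "A"],
        "StatedMonthlyIncome": [3500.0, 5200.0],
    }
)

# Illustrative choices, not the author's actual list.
irrelevant_cols = ["ListingKey", "ListingNumber"]
duplicate_cols = ["ProsperRating (numeric)"]

# errors="ignore" skips names that are absent instead of raising KeyError.
df_slim = df.drop(columns=irrelevant_cols + duplicate_cols, errors="ignore")
```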
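A common way to run this check in pandas is to count null values per column, as in this minimal sketch on toy data (the column names are placeholders):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "CreditScore": [700.0, np.nan, 680.0, 720.0],
        "StatedMonthlyIncome": [3500.0, 1800.0, np.nan, np.nan],
        "LoanStatus": ["Completed", "Current", "Defaulted", "Completed"],
    }
)

missing_count = df.isnull().sum()   # number of missing values per column
missing_share = df.isnull().mean()  # fraction of rows missing per column

# Keep only the columns that have gaps, worst first.
report = missing_count[missing_count > 0].sort_values(ascending=False)
print(report)
```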

The platform divides loan status into 12 types: Cancelled, Chargedoff (written off; investors take a loss), Completed (repaid normally; investors take no loss), Current (being repaid), Defaulted (bad debt; investors take a loss), FinalPaymentInProgress (final repayment under way; investors take no loss), and six Past Due states grouped by the number of days overdue.

This article divides the data into the following three groups according to whether the loan is still in progress or has finished and, for finished loans, whether investors lost money:

Current (including Current and Past Due),

Defaulted (including Defaulted and Chargedoff),

Completed (including Completed and FinalPaymentInProgress).

To simplify later analysis and calculation, "Completed" is encoded as 1 and "Defaulted" as 0.
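The grouping and encoding could be sketched as follows. The function `group_status` is hypothetical, the raw status strings are assumed spellings, and leaving Cancelled unmapped is a choice made here because the text does not assign it to any group.

```python
import pandas as pd

status = pd.Series(
    [
        "Completed",
        "Chargedoff",
        "Current",
        "Defaulted",
        "FinalPaymentInProgress",
        "Past Due (1-15 days)",
        "Cancelled",
    ],
    name="LoanStatus",
)


def group_status(value):
    """Collapse a raw status into one of the three groups (None if unmapped)."""
    if value == "Current" or value.startswith("Past Due"):
        return "Current"
    if value in ("Defaulted", "Chargedoff"):
        return "Defaulted"
    if value in ("Completed", "FinalPaymentInProgress"):
        return "Completed"
    return None  # e.g. Cancelled


grouped = status.map(group_status)

# Encode finished loans only; Current and unmapped rows become missing values.
encoded = grouped.map({"Completed": 1, "Defaulted": 0})
```

Rows that are still in progress stay missing after encoding, which makes it easy to separate them later with `encoded.notna()`.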

The default rate among finished loans is defaulted_ratio_finished = 26.07%.
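With the 1/0 encoding, this ratio is simply the share of zeros among finished loans. The sketch below uses made-up values, so its result is unrelated to the 26.07% reported for the real data.

```python
import pandas as pd

# 1 = finished without loss, 0 = defaulted; toy values only.
finished = pd.Series([1, 0, 1, 1, 0, 1, 1, 1])

# Share of finished loans that ended in default.
defaulted_ratio_finished = (finished == 0).mean()
print(f"defaulted_ratio_finished = {defaulted_ratio_finished:.2%}")
```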

The dataset contains many features that reflect a borrower's credit standing. Among them, ProsperRating is produced by Prosper's own model and is the main basis for setting the loan interest rate, while CreditScore is provided by an official credit rating agency.

As Figure 5-1 shows, the default rate falls markedly as ProsperRating rises.

For CreditScore, the default rate in the low range (640-700) is relatively high and changes little. Above 720, the default rate falls markedly as the credit score improves.

Overall, the higher the borrower's credit rating, the lower the likelihood of default.
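A per-group default rate of this kind can be computed with a group-by, sketched here on toy data. The rating labels and the encoded `LoanStatus` column (1 for finished without loss, 0 for defaulted) are assumptions for the example.

```python
import pandas as pd

# Toy finished loans; 1 = finished without loss, 0 = defaulted.
df = pd.DataFrame(
    {
        "ProsperRating": ["HR", "HR", "C", "C", "C", "AA", "AA", "AA", "AA"],
        "LoanStatus": [0, 0, 0, 1, 1, 1, 1, 1, 0],
    }
)

# The mean of the 1/0 column is the completion rate, so 1 - mean is the default rate.
default_rate_by_rating = 1 - df.groupby("ProsperRating")["LoanStatus"].mean()
```

The same pattern applies to any other grouping column, such as an income bracket.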

Across income levels, unemployed borrowers have the highest default rate, and the default rate falls steadily as income rises.

Comparing loan outcomes, the monthly income of defaulting users is clearly lower than that of non-defaulting users.

According to the left panel of Figure 5-4, the overall debt-to-income ratio differs little between defaulting and non-defaulting users.

All the data are divided by the quartiles of the debt-to-income ratio into four groups of similar size. As the right panel of Figure 5-4 shows, the default rates of the low group (debt-to-income ratio 0-0.12) and the medium group (0.12-0.19) are both low. The default rate of the medium-high group (0.19-0.29) is slightly higher than that of the first two, while the default rate of the high group (above 0.29) rises significantly.
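Quartile grouping of this kind is what `pandas.qcut` does. The sketch below uses made-up ratios, so its cut points differ from the 0.12 / 0.19 / 0.29 boundaries in the text. The column names and group labels are assumptions.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "DebtToIncomeRatio": [0.05, 0.10, 0.14, 0.18, 0.22, 0.27, 0.35, 0.60],
        "LoanStatus": [1, 1, 1, 1, 1, 0, 0, 0],  # 1 = finished without loss, 0 = defaulted
    }
)

# qcut places the cut points at the quartiles, so each bin gets a similar row count.
df["dti_group"] = pd.qcut(
    df["DebtToIncomeRatio"], q=4, labels=["low", "medium", "medium-high", "high"]
)

default_rate = 1 - df.groupby("dti_group", observed=True)["LoanStatus"].mean()
```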

By bank card utilization rate, the data are divided into "unused" (0), "low overdraft" (0, 0.3], "moderate overdraft" (0.3, 0.7], "high overdraft" (0.7, 1] and "serious overdraft" (above 1).

Borrowers with serious overdrafts have the highest default rate.

Next come borrowers with no card usage, which is why financial institutions pay special attention to "white households" (borrowers with no credit history).
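Fixed utilization bands like these can be built with `pandas.cut`. The sketch assumes right-closed intervals and treats a rate of exactly 0 as "unused"; the series name `BankcardUtilization` is likewise an assumption.

```python
import numpy as np
import pandas as pd

utilization = pd.Series(
    [0.0, 0.15, 0.30, 0.55, 0.95, 1.0, 1.4], name="BankcardUtilization"
)

# Right-closed bins; (-inf, 0] catches exactly-zero usage because rates are non-negative.
bins = [-np.inf, 0, 0.3, 0.7, 1, np.inf]
labels = [
    "unused",
    "low overdraft",
    "moderate overdraft",
    "high overdraft",
    "serious overdraft",
]
group = pd.cut(utilization, bins=bins, labels=labels)
```

Unlike quartile binning, these bands are fixed in advance, so the groups can differ widely in size.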

InquiriesLast6Months reflects how often the borrower has recently applied for loans from financial institutions, and indirectly reflects the borrower's recent financial situation.

In Figure 5-6, the green line shows the number of loans for each inquiry count. Most loans have fewer than 7 inquiries.

Within the range of 0-7 inquiries, the default rate rises as the number of inquiries increases.

The number of current delinquencies is a good indicator of the borrower's credit status.

As Figure 5-7 shows, most borrowers currently have fewer than 2 delinquencies. Within the range of 0-6, the default rate rises as the number of current delinquencies increases.

To keep categories with very few loans from distorting the ranking of default rates, only categories with more than 30 loans are kept.
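The filtering step might be sketched as below, using toy counts and assumed column names. Without the filter, the three-loan category would top the ranking with a 100% default rate.

```python
import pandas as pd

# Toy data: category 1 has 40 loans, category 15 has 35, category 9 has only 3.
df = pd.DataFrame(
    {
        "ListingCategory": [1] * 40 + [15] * 35 + [9] * 3,
        # 1 = finished without loss, 0 = defaulted
        "LoanStatus": [1] * 30 + [0] * 10 + [1] * 20 + [0] * 15 + [0] * 3,
    }
)

MIN_LOANS = 30  # threshold stated in the text

counts = df["ListingCategory"].value_counts()
kept = counts[counts > MIN_LOANS].index
df_kept = df[df["ListingCategory"].isin(kept)]

# Default rate per remaining category, highest first.
default_rate = (
    1 - df_kept.groupby("ListingCategory")["LoanStatus"].mean()
).sort_values(ascending=False)
```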

As Figure 5-8 shows, the most common category is 1-DebtConsolidation.

The categories with the highest default rates are 15-Medical/Dental, 13-Household Expenses and 3-Business, all above 30%.

The data are divided by the quartiles of the loan amount into four groups of similar size. Interestingly, medium-sized loans (3,100-4,750) have the highest default rate, while large loans (over 8,500) have the lowest.

This is probably because borrowers who qualify for large loans are in good standing in all respects, which lowers the default rate.

As Figure 5-11 shows, within the range of 0-30, which covers about half of the data, the default rate gradually falls as the duration increases.

Beyond that range, the default rate shows no clear pattern.

Default rates differ noticeably between regions. In states such as LA and SD, the default rate is high; in states such as UT and CO, it is low.

On the whole, the default rate of borrowers with real estate is significantly lower than that of borrowers without real estate.

Import the relevant libraries.

Convert the string variables in the data to numbers.
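The text does not say which encoding is used. One simple option, shown here as an assumption, is integer coding with `pandas.factorize`, applied to every non-numeric column of a toy table.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "EmploymentStatus": ["Employed", "Self-employed", "Employed", "Not employed"],
        "StatedMonthlyIncome": [3500.0, 4100.0, 2900.0, 0.0],
    }
)

encoded = df.copy()
mappings = {}  # column name -> list of original labels, indexed by integer code
for col in encoded.columns:
    if not pd.api.types.is_numeric_dtype(encoded[col]):
        codes, uniques = pd.factorize(encoded[col])
        encoded[col] = codes
        mappings[col] = list(uniques)
```

Integer codes impose an arbitrary order on the categories. Tree-based models such as random forests usually tolerate this, while linear models generally need one-hot encoding instead.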

The dataset is split into a 70% training set and a 30% test set, and a model is built with the random forest algorithm.

The model's prediction accuracy on the test set is accuracy = 73.99%.
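A minimal sketch of the split-and-fit step with scikit-learn follows. The data is synthetic, the hyperparameters are arbitrary, and the label rule is invented for the example, so the printed accuracy has no relation to the 73.99% reported above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned, fully numeric loan table.
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame(
    {
        "StatedMonthlyIncome": rng.uniform(0, 10000, n),
        "EmploymentStatusDuration": rng.integers(0, 240, n),
    }
)
# Invented label rule: 1 = finished without loss, 0 = defaulted.
y = (X["StatedMonthlyIncome"] + rng.normal(0, 2000, n) > 3000).astype(int)

# 70% training, 30% test, as in the text; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"accuracy = {accuracy:.2%}")
```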

For the random forest algorithm, we can check the importance of each feature in the model.

As shown in Figure 6-2, StatedMonthlyIncome and EmploymentStatusDuration are the most important features.
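In scikit-learn, a fitted random forest exposes these scores through its `feature_importances_` attribute. The sketch below uses synthetic data in which only income drives the label; the `noise` column is a hypothetical uninformative feature added for contrast.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n = 500
X = pd.DataFrame(
    {
        "StatedMonthlyIncome": rng.uniform(0, 10000, n),
        "noise": rng.normal(size=n),  # hypothetical uninformative column
    }
)
y = (X["StatedMonthlyIncome"] > 4000).astype(int)  # label depends on income only

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Impurity-based importances; they sum to 1 across the features.
importances = pd.Series(model.feature_importances_, index=X.columns).sort_values(
    ascending=False
)
print(importances)
```

Impurity-based importances can favor features with many distinct values, so they are best read as a rough ranking.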

This model is then used to predict whether loans that are still in progress will default.

The predicted default rate of loans still in progress is default_ratio_prediction = 3.64%.
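The prediction step could look like the following sketch: train on finished loans, predict for loans in progress, and take the share predicted as 0. The helper `make_features`, the data and the label rule are all synthetic assumptions, so the printed ratio is unrelated to the 3.64% above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)


def make_features(n):
    """Synthetic numeric features standing in for the cleaned loan columns."""
    return pd.DataFrame(
        {
            "StatedMonthlyIncome": rng.uniform(0, 10000, n),
            "EmploymentStatusDuration": rng.integers(0, 240, n),
        }
    )


# Finished loans carry a known outcome: 1 = finished without loss, 0 = defaulted.
X_finished = make_features(300)
y_finished = (X_finished["StatedMonthlyIncome"] > 2500).astype(int)

# Loans still in progress have no outcome yet.
X_current = make_features(100)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_finished, y_finished)

predicted = model.predict(X_current)
default_ratio_prediction = (predicted == 0).mean()
print(f"default_ratio_prediction = {default_ratio_prediction:.2%}")
```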

This article has walked through the complete process for the Prosper loan data, from data exploration to model building and prediction.

Monthly income and length of employment turn out to have the greatest influence on whether a borrower defaults, mainly because both are important indicators of a borrower's stability.

On the modeling side, accuracy could be improved by tuning the parameters of this model, or a new model could be built with another algorithm, such as logistic regression, for comparison.

That concludes this introduction to P2P loan data.