Introduction

This project is about a consumer finance company that specializes in lending various types of loans to urban customers. When the company receives a loan application, it has to decide whether to approve the loan based on the applicant’s profile. Two types of risk are associated with the company’s decision:

  • If the applicant is likely to repay the loan, then not approving the loan results in a loss of business to the company.
  • If the applicant is not likely to repay the loan, i.e. he/she is likely to default, then approving the loan may lead to a financial loss for the company.

What is Exploratory Data Analysis?

Exploratory Data Analysis (EDA) is the process of exploring, investigating, and gathering meaningful insights and nuggets from data using different kinds of statistical measures and visualizations. The objective of EDA is to develop an understanding of the data by uncovering trends, relationships, and patterns.

When EDA calls for statistical knowledge, visualization techniques, and data analysis tools like NumPy, Pandas and Matplotlib, we categorize it as an art. When it calls for asking interesting questions to guide the investigation toward meaningful insights, we call it a science. So EDA is a mixture of both art and science.

Project Outline

  • Download and read the dataset.
  • Data Processing & Cleaning with Pandas
  • Exploratory Analysis and Visualization
  • Asking and Answering Questions

In the previous post we concentrated on the first two steps. Now we will use the cleaned data to perform Exploratory Data Analysis and answer a few questions about the data. Let’s start…

Exploratory Analysis and Visualization

This section of the project is all about exploring different columns of the data frame to understand the trends in the data and to draw nuggets and insights out of it using graphical representations.

In this dataset, individuals borrowed money from the company, but some of them still owe money to the lender. In this project we will try to understand what drives loan defaults.

Loan Defaulter Analysis based on the following columns

  • Home Ownerships
  • Loan Purpose
  • Grade
  • Home Ownerships & Loan Purpose
  • Grade & Loan Purpose

Home Ownerships

As per the above, MORTGAGE takes the major chunk; let’s draw a pie chart to visualize it better.

Observation: It shows there are more defaulters in RENT and MORTGAGE.
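A minimal pandas sketch of the numbers behind the pie chart and the observation above. It assumes the cleaned data sits in a DataFrame called `loans` with `home_ownership` and `loan_status` columns, and that a `loan_status` of "Charged Off" marks a defaulter; the six rows are made up for illustration and do not come from the real dataset.

```python
import pandas as pd

# Made-up rows for illustration; column names are assumptions.
loans = pd.DataFrame({
    "home_ownership": ["MORTGAGE", "RENT", "MORTGAGE", "OWN", "RENT", "MORTGAGE"],
    "loan_status": ["Fully Paid", "Charged Off", "Charged Off",
                    "Fully Paid", "Fully Paid", "Fully Paid"],
})

# Fraction of all loans in each home-ownership category (sums to 1),
# which is what a pie chart displays.
shares = loans["home_ownership"].value_counts(normalize=True)

# Number of defaulted loans per home-ownership category.
defaulters = loans[loans["loan_status"] == "Charged Off"]
defaults_by_home = defaulters["home_ownership"].value_counts()

print(shares.round(3))
print(defaults_by_home)
```

If matplotlib is installed, `shares.plot.pie(autopct="%1.1f%%")` draws the pie chart directly from the `shares` Series.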

Loan Purpose

Observation: There are more defaulters from ‘debt_consolidation’, ‘other’, ‘credit_card’ and ‘small_business’.
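Counting defaulters per purpose amounts to filtering the defaulted loans and counting them by purpose. A minimal sketch, assuming a `purpose` column and "Charged Off" as the default marker in `loan_status`; the rows are made up.

```python
import pandas as pd

# Made-up rows; "Charged Off" is assumed to mark a defaulted loan.
loans = pd.DataFrame({
    "purpose": ["debt_consolidation", "credit_card", "debt_consolidation",
                "small_business", "other", "car", "debt_consolidation", "credit_card"],
    "loan_status": ["Charged Off", "Charged Off", "Charged Off", "Charged Off",
                    "Charged Off", "Fully Paid", "Fully Paid", "Fully Paid"],
})

# Keep only the defaulted loans.
defaulters = loans[loans["loan_status"] == "Charged Off"]

# Defaulters per purpose, largest count first; keep the top four.
top_purposes = defaulters["purpose"].value_counts().head(4)
print(top_purposes)
```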

Grade

Observation:

  1. It shows there are more defaulters in grades B, C and D.
  2. Grades F and G (the higher-interest-rate grades) have fewer defaulters, which is a good indicator.
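A cross-tabulation shows the defaulter and fully-paid counts for each grade side by side, so each grade's defaulter count can be read against its total number of loans. A sketch assuming a `grade` column holding the letters A to G and "Charged Off" as the default marker, with made-up rows:

```python
import pandas as pd

# Made-up rows; column names are assumptions.
loans = pd.DataFrame({
    "grade": ["A", "B", "B", "C", "C", "D", "F", "G", "B", "A"],
    "loan_status": ["Fully Paid", "Charged Off", "Fully Paid", "Charged Off",
                    "Charged Off", "Charged Off", "Fully Paid", "Charged Off",
                    "Charged Off", "Fully Paid"],
})

# Rows: grade; columns: loan status; cells: number of loans (0 if none).
status_by_grade = pd.crosstab(loans["grade"], loans["loan_status"])

# Grades ordered by how many defaulted loans they contain.
print(status_by_grade.sort_values("Charged Off", ascending=False))
```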

Home Ownerships & Loan Purpose

Observation:

  1. In the RENT category, there are more defaulters from ‘debt_consolidation’, ‘other’, ‘credit_card’ and ‘small_business’.
  2. In the MORTGAGE category, there are more defaulters from ‘debt_consolidation’, ‘home_improvement’, ‘credit_card’ and ‘small_business’.
  3. Overall, one should be careful with ‘debt_consolidation’, ‘credit_card’ and ‘small_business’ loans when the borrowers don’t own their house.
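The per-category breakdown above can be sketched by grouping the defaulters by home ownership and keeping the most common purposes inside each group. Column names and rows are assumptions for illustration, and `head(2)` keeps only two purposes per group to keep the output short.

```python
import pandas as pd

# Made-up rows; "Charged Off" is assumed to mark a defaulted loan.
loans = pd.DataFrame({
    "home_ownership": ["RENT", "RENT", "RENT", "RENT",
                       "MORTGAGE", "MORTGAGE", "MORTGAGE", "OWN"],
    "purpose": ["debt_consolidation", "credit_card", "debt_consolidation", "other",
                "home_improvement", "debt_consolidation", "home_improvement", "car"],
    "loan_status": ["Charged Off"] * 7 + ["Fully Paid"],
})

defaulters = loans[loans["loan_status"] == "Charged Off"]

# Count defaulters for each (home_ownership, purpose) pair; value_counts
# sorts the counts in descending order within each home-ownership group.
pair_counts = defaulters.groupby("home_ownership")["purpose"].value_counts()

# Keep the two most common purposes per home-ownership group.
top_per_home = pair_counts.groupby(level=0).head(2)
print(top_per_home)
```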

Grade & Loan Purpose

Observation: Across all grades, there are more defaulters from ‘debt_consolidation’, ‘other’, ‘credit_card’ and ‘small_business’ purpose loans.
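To find the purpose with the most defaulters within each grade, one option is a grade-by-purpose cross-tabulation followed by `idxmax` along each row. A sketch with made-up rows and assumed column names:

```python
import pandas as pd

# Made-up rows; "Charged Off" is assumed to mark a defaulted loan.
loans = pd.DataFrame({
    "grade": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "purpose": ["debt_consolidation", "debt_consolidation", "credit_card",
                "small_business", "debt_consolidation", "debt_consolidation",
                "other", "other", "car"],
    "loan_status": ["Charged Off"] * 8 + ["Fully Paid"],
})

defaulters = loans[loans["loan_status"] == "Charged Off"]

# Rows: grade; columns: purpose; cells: number of defaulted loans.
grade_purpose = pd.crosstab(defaulters["grade"], defaulters["purpose"])

# Column label (purpose) holding the largest count in each row (grade).
top_purpose_per_grade = grade_purpose.idxmax(axis=1)
print(top_purpose_per_grade)
```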

Overall Loan Defaulters

Observation: We see that small business loans default the most, followed by renewable energy and education.

Observation: A good chunk of the loans, around 14%, is considered Bad/Defaulter.
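The two observations above may rely on different measures: a raw count of defaulters per purpose favors common purposes such as debt consolidation, while a default rate (defaulters divided by all loans with that purpose) can rank a rarer purpose such as small business first. A minimal sketch of both the overall share and the per-purpose rate, assuming "Charged Off" marks a default and using made-up rows:

```python
import pandas as pd

# Made-up rows; "Charged Off" is assumed to mark a defaulted loan.
loans = pd.DataFrame({
    "purpose": ["small_business"] * 3 + ["education"] * 2 + ["credit_card"] * 4,
    "loan_status": ["Charged Off", "Charged Off", "Fully Paid",
                    "Charged Off", "Fully Paid",
                    "Fully Paid", "Fully Paid", "Fully Paid", "Charged Off"],
})

# 1 for a defaulted loan, 0 otherwise; the mean of this flag is a default rate.
loans["is_default"] = (loans["loan_status"] == "Charged Off").astype(int)

# Share of all loans that defaulted.
overall_rate = loans["is_default"].mean()

# Default rate within each purpose, highest first.
rate_by_purpose = (
    loans.groupby("purpose")["is_default"].mean().sort_values(ascending=False)
)

print(f"Overall default rate: {overall_rate:.1%}")
print(rate_by_purpose.round(3))
```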

Interest Rate Distribution analysis

Observation: The interest rate distribution shows that most rates are concentrated between 10% and 14%.
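In the raw loan export, `int_rate` is assumed to be stored as text such as "10.65%", so it needs converting before it can be binned or plotted. A minimal sketch under that assumption, with made-up rates, that converts the column, counts loans in 2-point bins and measures the share of rates between 10% and 14%:

```python
import pandas as pd

# Made-up rates in the assumed "NN.NN%" text format.
loans = pd.DataFrame({"int_rate": ["10.65%", "15.27%", "13.49%", "12.69%", "7.90%", "11.71%"]})

# Strip the trailing "%" and convert to float (percent units).
loans["int_rate"] = loans["int_rate"].str.rstrip("%").astype(float)

# Count loans in 2-percentage-point bins (right edges inclusive by default).
bins = list(range(4, 26, 2))
binned = pd.cut(loans["int_rate"], bins=bins).value_counts().sort_index()

# Fraction of loans with a rate between 10% and 14% (both ends inclusive).
share_10_to_14 = loans["int_rate"].between(10, 14).mean()

print(binned)
print(f"Share between 10% and 14%: {share_10_to_14:.0%}")
```

With matplotlib installed, `loans["int_rate"].plot.hist(bins=20)` gives the histogram view.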

Employment length analysis

Observation

  1. The total number of defaulters with 1 year or no work experience is around 1k.
  2. The total number of defaulters with 10 and 10+ years of experience is 1331.
  3. The number of defaulters decreases across 1–9 years of work experience and then suddenly increases for members with 10 and 10+ years of work experience.
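Employment length is a text category, so sorting it alphabetically would put "10+ years" before "2 years". A minimal sketch, assuming Lending Club style labels in an `emp_length` column and made-up rows, that fixes the order with an ordered categorical before counting defaulters:

```python
import pandas as pd

# Assumed Lending Club style labels, in career order.
order = ["< 1 year", "1 year"] + [f"{n} years" for n in range(2, 10)] + ["10+ years"]

# Made-up rows for illustration.
loans = pd.DataFrame({
    "emp_length": ["10+ years", "1 year", "< 1 year", "3 years",
                   "10+ years", "9 years", "1 year"],
    "loan_status": ["Charged Off", "Charged Off", "Charged Off", "Charged Off",
                    "Charged Off", "Fully Paid", "Charged Off"],
})

# An ordered categorical keeps career order; plain strings would sort
# alphabetically and place "10+ years" before "2 years".
loans["emp_length"] = pd.Categorical(loans["emp_length"], categories=order, ordered=True)

# Defaulters per employment length; categories with no defaulters show 0.
defaulters = loans[loans["loan_status"] == "Charged Off"]
defaults_by_tenure = defaulters["emp_length"].value_counts().sort_index()
print(defaults_by_tenure)
```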

Grade analysis

Observation:

  1. It shows there are more defaulters in grades B, C and D.
  2. Grades F and G (the higher-interest-rate grades) have fewer defaulters, which is a good indicator.

Frequency distributions of

  • DTI (Debt-To-Income ratio)
  • Revolving credit utilization
  • Ratio of open_acc to total_acc
  • ROI (return on investment)

Observation:

  1. The average debt-to-income ratio is 13%. There does not seem to be much skew in the distribution.
  2. The mean revolving credit utilization is 49%, which means the average borrower is using about half of their available revolving credit at the time they seek the loan. The data is also widely spread and not very skewed.
  3. The ratio of open accounts to total accounts appears left-skewed.
  4. Negative ROI indicates defaulted loans, while almost all loans with positive ROI were fully paid.
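The two derived quantities in this list can be sketched as new columns. The post does not define ROI, so the formula here, `(total_pymnt - funded_amnt) / funded_amnt`, is a hypothetical choice, and the column names and rows are assumptions. A negative sample skewness corresponds to the left skew noted for the open-accounts ratio.

```python
import pandas as pd

# Made-up rows; column names follow the Lending Club export (assumption).
loans = pd.DataFrame({
    "open_acc": [3, 9, 10, 8, 12],
    "total_acc": [10, 10, 11, 9, 13],
    "funded_amnt": [10000, 5000, 8000, 12000, 6000],
    "total_pymnt": [11500.0, 5600.0, 3000.0, 13100.0, 6400.0],
    "loan_status": ["Fully Paid", "Fully Paid", "Charged Off",
                    "Fully Paid", "Fully Paid"],
})

# Fraction of a borrower's credit lines that are currently open.
loans["open_ratio"] = loans["open_acc"] / loans["total_acc"]

# Hypothetical ROI: net amount received per dollar funded.
loans["roi"] = (loans["total_pymnt"] - loans["funded_amnt"]) / loans["funded_amnt"]

# Sample skewness; a negative value means a longer left tail.
open_ratio_skew = loans["open_ratio"].skew()

print(loans[["open_ratio", "roi", "loan_status"]].round(3))
print(f"Skew of open_ratio: {open_ratio_skew:.2f}")
```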

Descriptive Statistics

Observation:

  1. As expected, the total payment amount of defaulters is far less than that of fully paid loans.
  2. Loan terms are 36 or 60 months for both fully paid loans and defaulters.
  3. Most loan amounts are below 200k, and the instalment amount is around 500 for both fully paid loans and defaulters.
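Per-status statistics like these can be produced with `groupby` plus `describe`. A minimal sketch, assuming `total_pymnt`, `installment` and a text `term` column like " 36 months", with made-up rows:

```python
import pandas as pd

# Made-up rows; "term" is assumed to be stored as text like " 36 months".
loans = pd.DataFrame({
    "loan_status": ["Fully Paid", "Fully Paid", "Charged Off", "Charged Off"],
    "total_pymnt": [11800.0, 6300.0, 2100.0, 4000.0],
    "installment": [320.5, 210.0, 480.0, 150.0],
    "term": [" 36 months", " 36 months", " 60 months", " 36 months"],
})

# Pull the number of months out of the text so the column is numeric.
loans["term"] = loans["term"].str.extract(r"(\d+)", expand=False).astype(int)

# count, mean, std, min, quartiles and max of each column, per loan status.
summary = loans.groupby("loan_status")[["total_pymnt", "installment", "term"]].describe()
mean_payment = summary[("total_pymnt", "mean")]
print(summary.T)
```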

Asking and Answering Questions

Question 1: What is the distribution of Loan Defaulters by Loan Purpose?

Answer: As per the above graph, there are more defaulters from ‘debt_consolidation’, ‘other’, ‘credit_card’ and ‘small_business’.

Question 2: How has the number of loans issued changed over the years?

Answer: The number of loans issued increased steadily every year, with a slight decrease in 2008.
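Counting loans per year needs the issue date parsed first. A sketch, assuming an `issue_d` column formatted like "Dec-11" (three-letter month, two-digit year) and made-up dates:

```python
import pandas as pd

# Made-up issue dates in the assumed "Mon-YY" format.
loans = pd.DataFrame({"issue_d": ["Dec-11", "Jan-10", "Mar-10", "Jun-08", "Nov-11", "Feb-11"]})

# Parse "Dec-11" as December 2011 (%b = month abbreviation, %y = two-digit year).
loans["issue_date"] = pd.to_datetime(loans["issue_d"], format="%b-%y")

# Number of loans issued in each calendar year, in chronological order.
loans_per_year = loans["issue_date"].dt.year.value_counts().sort_index()
print(loans_per_year)
```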

Question 3: What can we conclude from the distribution of Loan Defaulters by Home Ownerships?

Answer: Borrowers who own their house are not at much risk, even for ‘debt_consolidation’, ‘credit_card’ and ‘small_business’ loans, whereas borrowers who rent or hold a mortgage are high-risk applicants.

Question 4: How does grade impact the loan status?

Answer: It shows there are more defaulters in grades B, C and D. Grades F and G (the higher-interest-rate grades) have fewer defaulters, which is a good indicator.

Question 5: What is the relationship between interest rate and loan grade?

Answer: There is an inverse relationship between interest rate and loan grade: lower grades (E, F, G) have higher interest rates.
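A quick check of this relationship is the mean interest rate per grade. A sketch with made-up rows, assuming `int_rate` has already been converted to a number:

```python
import pandas as pd

# Made-up rows; int_rate is assumed to be numeric (percent).
loans = pd.DataFrame({
    "grade": ["A", "A", "B", "C", "E", "F", "G", "B"],
    "int_rate": [7.5, 8.0, 11.0, 13.5, 18.0, 21.0, 23.5, 11.5],
})

# Mean interest rate per grade; groupby sorts the grades A..G.
mean_rate_by_grade = loans.groupby("grade")["int_rate"].mean()

# True when every grade's mean rate is at least the previous grade's.
rates_rise_with_grade = mean_rate_by_grade.is_monotonic_increasing
print(mean_rate_by_grade)
print("Rates rise from A to G:", rates_rise_with_grade)
```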

Question 6: Which loan purposes account for the most defaulters across all grades?

Answer: Overall, there are more defaulters from ‘debt_consolidation’, ‘other’, ‘credit_card’ and ‘small_business’ purpose loans across all grades.

Question 7: How is the loan amount distributed based on the work experience of the borrower?

Answer: We observe that the loan trend goes downward with work experience from 1 to 9 years. However, it is highest for people with 10 or more years of experience.

Question 8: How are loans distributed across the various categorical columns?

  • In this dataset, most observations (85%) have “Fully Paid” status.
  • Most loan applicants borrow for debt consolidation.
  • Most loan applicants are from California.
  • Most loan applicants rent or have a mortgage.
  • Most loans have a 36-month term.
  • Most loans were issued without the annual income being verified.
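Each bullet above boils down to the most common value of one categorical column. A minimal sketch that reports the top value and its share for every column, with made-up rows and assumed Lending Club style column names:

```python
import pandas as pd

# Made-up rows with assumed Lending Club style column names.
loans = pd.DataFrame({
    "loan_status": ["Fully Paid", "Fully Paid", "Charged Off", "Fully Paid"],
    "purpose": ["debt_consolidation", "credit_card", "debt_consolidation", "debt_consolidation"],
    "addr_state": ["CA", "NY", "CA", "TX"],
    "home_ownership": ["RENT", "MORTGAGE", "RENT", "OWN"],
    "term": ["36 months", "36 months", "60 months", "36 months"],
    "verification_status": ["Not Verified", "Verified", "Not Verified", "Source Verified"],
})

# For every column: the most common value and the share of rows it covers.
top_values = {}
for column in loans.columns:
    shares = loans[column].value_counts(normalize=True)
    top_values[column] = (shares.index[0], round(float(shares.iloc[0]), 2))

for column, (value, share) in top_values.items():
    print(f"{column}: {value} ({share:.0%})")
```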

You can check the Jupyter notebook with the complete source code, the CSV file used, and an option to run it on Jovian.

Summary and Conclusion

Here are the conclusions we could draw about Bank Loan Investment from our analysis:

  • Most defaulters come from loan purposes like Debt consolidation, Credit card and Small business, and from grades B, C and D. The good thing is that as the interest rate increases (grades F and G), there are fewer defaulters.
  • In the MORTGAGE category, there are more defaulters from Home improvement along with Debt consolidation, Credit card and Small business. Overall, one should be careful with Debt consolidation, Credit card and Small business loans when the borrowers don’t own their house.
  • From the dataset we were able to conclude that most loan applicants borrow for debt consolidation, California has the most applicants, most applicants rent or have a mortgage, and most loans have a 36-month term.
  • The number of defaulters decreases across 1–9 years of work experience and suddenly increases for members with 10 and 10+ years of work experience. The total number of defaulters with 1 year or no work experience is around 1k, which is comparable to the 1331 defaulters with 10 and 10+ years of experience.
  • Grade and interest rate are closely linked: as grade goes from A to G, the interest rate increases. The majority of loans are in grades A, B and C, which carry lower interest rates.
  • The mean revolving credit utilization is 49%, which means the average borrower is using about half of their available revolving credit at the time they seek the loan.
  • The interest rate distribution shows that most rates are concentrated between 10% and 14%.
  • The number of loans issued increased steadily every year, with a slight decrease in 2008.

If you’ve made it this far, thank you for reading! If you enjoyed this post, consider dropping a clap and following me. I post articles on interesting Data Analysis topics and write beginner-friendly tutorials.

Until next time… Happy coding!!


Ravi Chandra

Full Stack Developer | Java | Python | Machine Learning | Data Science | Self-Development