
Group III

FRA Group Assignment
Logistic regression model

Priyank, Sakshi, Varun, Vinay


Finance and Risk Analytics – Group Assignment

Problem Statement

Create an India credit risk (default) model using the data provided in the spreadsheet raw-data.xlsx,
and validate it on validation_data.xlsx. Please use the logistic regression framework to develop the
credit default model.

Data Insights

The data provided in raw-data.xlsx consists of company financial data.

The major variables are:

Net worth next year, Total assets, Net worth, Total income, Total expenses, Profit after tax, PBDITA,
PBT (Profit Before Tax), Cash profit, PBDITA as % of total income, PBT as % of total income, Cash
profit as % of total income, PAT as % of net worth, Sales, Total capital, Reserves and funds,
Borrowings, Current liabilities & provisions, Capital employed, Net fixed assets, Investments, Net
working capital, Debt to equity ratio (times), Cash to current liabilities (times), Total liabilities.

In addition to the above variables, there are other financial parameters that describe the financial
strength of the organization, taking the total number of variables to 51.

Data Preparation

Step 1 (Identifying Defaulters)

Companies were classified as probable Defaulters or Non Defaulters based on Net worth next year.
Companies with a negative net worth next year were classified as Defaulters (marked as 1) and the
rest as Non Defaulters (marked as 0).
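
The labelling rule in Step 1 can be sketched as follows. This is a minimal illustration, not the spreadsheet formula actually used; the function name and the sample values are hypothetical and are not taken from raw-data.xlsx.

```python
def label_default(net_worth_next_year):
    """Return 1 (Defaulter) if next year's net worth is negative, else 0."""
    return 1 if net_worth_next_year < 0 else 0


# Hypothetical sample values, used only to illustrate the rule.
sample = [-12.5, 0.0, 48.3]
labels = [label_default(v) for v in sample]
print(labels)
```

Note that a net worth of exactly zero falls under "the rest" and is therefore labelled as a Non Defaulter.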

Step 2 (Remove very small companies)

Companies with Total Net asset between 0 and 3 were removed.
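
A minimal sketch of this filter is shown below. It assumes the size measure is stored under the column name "Total assets" and that both ends of the 0 to 3 range are removed; neither detail is stated in the report, and the rows are hypothetical.

```python
def drop_small_companies(rows, key="Total assets", low=0.0, high=3.0):
    """Keep only rows whose size measure lies outside [low, high].

    The column name and the inclusive bounds are assumptions.
    """
    return [r for r in rows if not (low <= r[key] <= high)]


# Hypothetical rows, used only to illustrate the filter.
rows = [
    {"Total assets": 0.5},
    {"Total assets": 3.0},
    {"Total assets": 3.1},
    {"Total assets": 250.0},
]
kept = drop_small_companies(rows)
print(kept)
```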

Step 3 (Basic Statistics)

Calculated Min, Max, Mean, Standard Deviation, Median and Percentiles (1st to 4th and 97th to 99th).
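
The statistics listed in Step 3 could be computed for one variable roughly as below. This is a sketch using Python's standard library rather than the spreadsheet used for the assignment; the use of the sample standard deviation and of the library's default percentile interpolation are assumptions, so exact percentile values may differ slightly from a spreadsheet's.

```python
import statistics


def summarize(values):
    """Return min, max, mean, std, median and the tail percentiles of a column."""
    # quantiles(n=100) returns 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = statistics.quantiles(values, n=100)
    summary = {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (assumption)
        "median": statistics.median(values),
    }
    for k in (1, 2, 3, 4, 97, 98, 99):
        summary[f"p{k}"] = cuts[k - 1]
    return summary


# Hypothetical column: the integers 1..100.
summary = summarize(list(range(1, 101)))
print(summary)
```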

Step 4 (Identified Floor and Cap)

Based on an understanding of the data, a floor and a cap were identified for each variable (either the Min / Max or a percentile).

Step 4, continued (Outlier and missing value treatment)

Outlier treatment: values below the floor were replaced with the floor, and values above the cap were replaced with the cap.

Missing values were replaced with the median.
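
The outlier and missing value treatment can be sketched as below. The floor and cap values, the use of None for a missing value, and the choice to compute the median after capping are all assumptions made for illustration.

```python
import statistics


def treat_column(values, floor, cap):
    """Clip values to [floor, cap], then fill missing values (None) with the median."""
    clipped = [None if v is None else min(max(v, floor), cap) for v in values]
    observed = [v for v in clipped if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in clipped]


# Hypothetical column with two outliers and one missing value.
treated = treat_column([-500.0, 10.0, None, 20.0, 9000.0], floor=-100.0, cap=1000.0)
print(treated)
```

In this example the two extreme values are pulled to the floor and cap, and the missing entry takes the median of the four capped observations.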

Step 5 (Creation of new variables)

New variables were created for model building by dividing critical variables by Total assets.

New variables created:

Net worth / Total assets, Total income / Total assets, Total expenses / Total assets, Profit after tax /
Total assets, PBT / Total assets, Sales / Total assets, Current liabilities & provisions / Total assets,
Capital employed / Total assets, Net fixed assets / Total assets, Investments / Total assets, Total
liabilities / Total assets.
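
The ratio variables can be derived as sketched below. The list of numerators follows the report; the function name, the dictionary-per-company layout and the sample figures are hypothetical. The sketch assumes Total assets is non-zero, which is plausible after the removal of very small companies in Step 2.

```python
RATIO_NUMERATORS = [
    "Net worth", "Total income", "Total expenses", "Profit after tax", "PBT",
    "Sales", "Current liabilities & provisions", "Capital employed",
    "Net fixed assets", "Investments", "Total liabilities",
]


def add_asset_ratios(row):
    """Return a copy of the row with '<variable> / Total assets' columns added."""
    total_assets = row["Total assets"]  # assumed non-zero
    out = dict(row)
    for name in RATIO_NUMERATORS:
        if name in row:
            out[f"{name} / Total assets"] = row[name] / total_assets
    return out


# Hypothetical company record.
company = {"Total assets": 200.0, "Net worth": 50.0, "Sales": 100.0}
enriched = add_asset_ratios(company)
print(enriched)
```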

Step 6 (Variable shortlisting)

The mean of each variable was calculated separately for Non Default and Default companies, and the
ratio of the two means was derived. Variables with a ratio > 3 or a ratio < 1/3 were shortlisted for the model.
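
The shortlisting rule can be sketched as follows. The report does not say which group's mean is the numerator; because the thresholds 3 and 1/3 are reciprocal, the choice does not change which positive ratios pass. Skipping variables whose Default mean is zero is an assumption, and the data are hypothetical.

```python
import statistics


def shortlist(rows, target="Default", high=3.0, low=1 / 3):
    """Return variables whose Non Default / Default mean ratio is > high or < low."""
    features = [k for k in rows[0] if k != target]
    selected = []
    for f in features:
        mean_nd = statistics.mean([r[f] for r in rows if r[target] == 0])
        mean_d = statistics.mean([r[f] for r in rows if r[target] == 1])
        if mean_d == 0:
            continue  # ratio undefined; skipping is an assumption
        ratio = mean_nd / mean_d
        if ratio > high or ratio < low:
            selected.append(f)
    return selected


# Hypothetical data: Net worth separates the groups, Sales does not.
rows = [
    {"Net worth": 100.0, "Sales": 50.0, "Default": 0},
    {"Net worth": 140.0, "Sales": 70.0, "Default": 0},
    {"Net worth": 20.0, "Sales": 50.0, "Default": 1},
    {"Net worth": 40.0, "Sales": 60.0, "Default": 1},
]
selected = shortlist(rows)
print(selected)
```

Here the Net worth means are 120 and 30 (ratio 4), so it is shortlisted, while the Sales means are 60 and 55 (ratio about 1.09), so it is not.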

The Microsoft Excel workbook in which the calculations were made and the new transformed variables
were created is attached below.

The following 14 variables were shortlisted (13 predictors plus the target, Default):

1-Net worth (Transfrmd)

2-Profit after tax (Transfrmd)

3-PBDITA (Transfrmd)

4-PBT (Transfrmd)

5-Cash profit (Transfrmd)

6-PBT as % of total income (Transfrmd)

7-Cash profit as % of total income (Transfrmd)

8-PAT as % of net worth (Transfrmd)

9-Sales (Transfrmd)

10-Reserves and funds (Transfrmd)

11-Net working capital (Transfrmd)

12-Profit after tax / Total assets

13-PBT / Total assets

14-Default

The logistic regression model was built in KNIME.
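
For readers without KNIME, the idea of a logistic regression fit can be sketched in plain Python as below. This is not KNIME's solver and does not reproduce the model in this report: it is a minimal batch gradient descent on the log loss, with a hypothetical learning rate, epoch count and toy data set.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Probability of class 1 (Default) for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and intercept by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy data: one hypothetical feature where low values go with Default = 1.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [1, 1, 0, 0]
w, b = fit_logistic(X, y)
print(w, b, predict_proba(w, b, [-2.0]))
```

A negative fitted weight on the toy feature means that higher values lower the estimated probability of default.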

The data was partitioned to test the model's accuracy before validating it on validation_data.
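
A simple random partition can be sketched as below. The 70/30 split, the fixed seed and the absence of stratification are assumptions; the report does not state the settings used in the KNIME partitioning step.

```python
import random


def partition(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and split it into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical data set of ten row identifiers.
train, test = partition(list(range(10)))
print(len(train), len(test))
```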
Model building flow

Data Partitioning
Model Coefficients and Statistics

The following variables, highlighted in yellow, were selected for the model:

1-Net worth (Transfrmd)

2-Profit after tax (Transfrmd)

3-PBDITA (Transfrmd)

4-PBT (Transfrmd)

5-Cash profit (Transfrmd)

6-PBT as % of total income (Transfrmd)

7-Cash profit as % of total income (Transfrmd)

8-PAT as % of net worth (Transfrmd)

9-Sales (Transfrmd)

10-Reserves and funds (Transfrmd)

11-Net working capital (Transfrmd)

12-Profit after tax / Total assets

13-PBT / Total assets

14-Default

Confusion Matrix
Model Stats

Measure                      Value
Sensitivity                  0.956
Specificity                  0.691
Precision                    0.986
Negative Predictive Value    0.397
False Positive Rate          0.31
False Discovery Rate         0.014
False Negative Rate          0.045
Accuracy                     0.945
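
Each measure in the table follows from the four cells of a confusion matrix using the standard definitions. The sketch below shows those definitions; the counts passed in are hypothetical and are not the counts behind the figures reported here.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the reported measures from confusion matrix counts."""
    return {
        "Sensitivity": tp / (tp + fn),
        "Specificity": tn / (tn + fp),
        "Precision": tp / (tp + fp),
        "Negative Predictive Value": tn / (tn + fn),
        "False Positive Rate": fp / (fp + tn),
        "False Discovery Rate": fp / (fp + tp),
        "False Negative Rate": fn / (fn + tp),
        "Accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts, used only to illustrate the formulas.
metrics = classification_metrics(tp=90, fp=5, tn=15, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Several measures are complements of each other: Sensitivity and False Negative Rate sum to 1, as do Specificity and False Positive Rate, and Precision and False Discovery Rate.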

The model is now tested against validation_data.

Measure                      Value
Sensitivity                  0.9649
Specificity                  0.6667
Precision                    0.9693
Negative Predictive Value    0.6349
False Positive Rate          0.3333
False Discovery Rate         0.0307
False Negative Rate          0.0351
Accuracy                     0.9399
