Finance and Risk Analytics GRP Assgn

Group III
FRA Group
Assignment
Logistic regression model
Priyank, Sakshi, Varun, Vinay

Finance and Risk Analytics – Group Assignment
Problem Statement
Create an India credit risk (default) model using the data provided in the spreadsheet raw-data.xlsx, and
validate it on validation_data.xlsx. Use the logistic regression framework to develop the credit
default model.
Data Insights
The data provided in raw-data.xlsx consists of company-level financial data.
The major data points or variables are:
Net worth next year, Total assets, Net worth, Total income, Total expenses, Profit after tax, PBDITA,
PBT (Profit Before Tax), Cash profit, PBDITA as % of total income, PBT as % of total income, Cash
profit as % of total income, PAT as % of net worth, Sales, Total capital, Reserves and funds,
Borrowings, Current liabilities & provisions, Capital employed, Net fixed assets, Investments, Net
working capital, Debt to equity ratio (times), Cash to current liabilities (times), Total liabilities.
In addition to the above variables, there are other financial parameters that describe the financial
strength of the organization, taking the total number of variables to 51.
Data Preparation
Step 1 (Identifying Defaulters)
Companies were classified as probable Defaulters or Non Defaulters based on Net worth next
year. Those with a negative Net worth next year were classified as Defaulters (marked as 1) and the
rest as Non Defaulters (marked as 0).
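The rule in Step 1 can be sketched in a few lines of Python. This is only an illustration: the function name default_flag and the sample values are hypothetical and are not taken from raw-data.xlsx.

```python
def default_flag(net_worth_next_year: float) -> int:
    """Return 1 (Defaulter) if next year's net worth is negative, else 0."""
    return 1 if net_worth_next_year < 0 else 0


# Hypothetical sample values, not taken from raw-data.xlsx.
sample_net_worth_next_year = [-12.5, 0.0, 48.3]
flags = [default_flag(v) for v in sample_net_worth_next_year]
print(flags)  # [1, 0, 0]
```

A net worth of exactly zero is treated as Non Defaulter here, because only negative values are described as defaults.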
Step 2 (Remove very small companies)
Removed companies with total net assets between 0 and 3.
Step 3 (Basic Statistics)
Calculated Min, Max, Mean, Standard Deviation, Median and Percentiles (1st to 4th and 97th to 99th)
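As a rough sketch of how these statistics could be reproduced outside the spreadsheet, the snippet below uses only the Python standard library. The helper name basic_stats, the choice of the "inclusive" percentile method, and the sample data are assumptions; Excel and KNIME may interpolate percentiles slightly differently.

```python
import statistics


def basic_stats(values):
    """Summary statistics for one variable, including the tail percentiles."""
    # quantiles(n=100) returns 99 cut points; index p - 1 is the p-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
        "median": statistics.median(values),
        "percentiles": {p: cuts[p - 1] for p in (1, 2, 3, 4, 97, 98, 99)},
    }


# Hypothetical data: the integers 1..101.
stats = basic_stats(list(range(1, 102)))
print(stats["percentiles"][1], stats["percentiles"][99])
```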
Step 4 (Identified Floor and Cap)
Based on an understanding of the data, identified a floor and a cap for each variable (either the Min / Max or a percentile value)
Step 4, continued (Outlier and missing value Treatment)
Outlier treatment - Values below the floor or above the cap were replaced with the floor or cap value.
Missing values were replaced with the median.
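The floor/cap and median treatment could look roughly like the sketch below. The function name treat_column, the use of None for missing values, and the floor and cap in the example are hypothetical; the report chose its floors and caps per variable in the spreadsheet.

```python
import statistics


def treat_column(values, floor, cap):
    """Clip values to [floor, cap], then fill missing values with the median."""
    # None marks a missing value (an assumption about how gaps are encoded).
    capped = [None if v is None else min(max(v, floor), cap) for v in values]
    observed = [v for v in capped if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in capped]


# Hypothetical column with two outliers and one missing value.
sample = [-500.0, 10.0, None, 20.0, 30.0, 9000.0]
treated = treat_column(sample, floor=0.0, cap=100.0)
print(treated)  # [0.0, 10.0, 20.0, 20.0, 30.0, 100.0]
```

Capping is applied before the median is taken, following the order in which the two treatments are described.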
Step 5 (Creation of new variables)
Created new variables to be used for model building. Each new variable is a critical variable divided by
Total assets.
New variables created:
Net worth / Total assets, Total income / Total assets, Total expenses / Total assets, Profit after tax /
Total assets, PBT / Total assets, Sales / Total assets, Current liabilities & provisions / Total assets,
Capital employed / Total assets, Net fixed assets / Total assets, Investments / Total assets, Total
liabilities / Total assets
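The ratio variables listed above can be derived mechanically. The sketch below assumes each company is a dict keyed by column name; the exact column labels in raw-data.xlsx may differ, and add_asset_ratios is a hypothetical helper.

```python
# Numerators taken from the list of new variables; labels are assumed.
NUMERATORS = [
    "Net worth", "Total income", "Total expenses", "Profit after tax",
    "PBT", "Sales", "Current liabilities & provisions", "Capital employed",
    "Net fixed assets", "Investments", "Total liabilities",
]


def add_asset_ratios(row):
    """Return a copy of row with '<name> / Total assets' columns added."""
    total_assets = row["Total assets"]
    out = dict(row)
    for name in NUMERATORS:
        if name in row:
            # Guard against zero, although Step 2 removes very small companies.
            ratio = row[name] / total_assets if total_assets else None
            out[f"{name} / Total assets"] = ratio
    return out


# Hypothetical company record.
company = {"Total assets": 200.0, "Net worth": 50.0, "Sales": 300.0}
enriched = add_asset_ratios(company)
print(enriched["Net worth / Total assets"], enriched["Sales / Total assets"])
```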
Step 6 (Variable shortlisting)
Calculated the Mean of each variable for Non Default and Default companies and then derived the ratio of
the two means. Variables with ratio > 3 or ratio < 1/3 were shortlisted for building the model.
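A minimal sketch of this shortlisting rule follows. The report does not say which mean is the numerator; with thresholds of 3 and 1/3 the choice makes no difference for nonzero means, so the sketch divides the Non Default mean by the Default mean. The function name shortlist and the toy rows are hypothetical.

```python
import statistics


def shortlist(rows, target="Default", hi=3.0, lo=1 / 3):
    """Keep variables whose ratio of class means is above hi or below lo."""
    selected = []
    for name in (k for k in rows[0] if k != target):
        mean_nd = statistics.mean(r[name] for r in rows if r[target] == 0)
        mean_d = statistics.mean(r[name] for r in rows if r[target] == 1)
        if mean_d == 0:
            continue  # ratio undefined; skipped in this sketch
        ratio = mean_nd / mean_d
        if ratio > hi or ratio < lo:
            selected.append(name)
    return selected


# Hypothetical rows: "A" separates the classes strongly, "B" does not.
rows = [
    {"A": 10.0, "B": 5.0, "Default": 0},
    {"A": 14.0, "B": 7.0, "Default": 0},
    {"A": 2.0, "B": 5.0, "Default": 1},
    {"A": 4.0, "B": 4.0, "Default": 1},
]
print(shortlist(rows))  # ['A']
```

Note that a negative ratio (class means of opposite sign) also falls below 1/3 and is therefore shortlisted under this rule.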
The Microsoft Excel workbook in which the calculations were made and the new transformed variables
were created is attached below.
The following 14 variables were shortlisted:
1-Net worth (Transfrmd)
2-Profit after tax (Transfrmd)
3-PBDITA (Transfrmd)
4-PBT (Transfrmd)
5-Cash profit (Transfrmd)
6-PBT as % of total income (Transfrmd)
7-Cash profit as % of total income (Transfrmd)
8-PAT as % of net worth (Transfrmd)
9-Sales (Transfrmd)
10-Reserves and funds (Transfrmd)
11-Net working capital (Transfrmd)
12-Profit after tax / Total assets
13-PBT / Total assets
14-Default (target variable)
The logistic regression model was built in KNIME.
The data was partitioned so that accuracy could be tested on a holdout sample before validating the
model against validation_data.
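For readers without KNIME, the partition-then-fit idea can be illustrated in plain Python. This is not the KNIME learner: it swaps in simple batch gradient descent, and the 70/30 split, seed, learning rate, epoch count and toy data are all assumptions made for the illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def partition(rows, train_fraction=0.7, seed=42):
    """Shuffle with a fixed seed and split into train and test lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and intercept by batch gradient descent on the log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, xi, threshold=0.5):
    """Return 1 (Defaulter) when the predicted probability reaches threshold."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold else 0


# Toy data: one ratio-like feature; negative values default (hypothetical).
data = [([x / 10], 1 if x < 0 else 0) for x in (-4, -3, -2, -1, 1, 2, 3, 4)]
train, test = partition(data)
w, b = fit_logistic([x for x, _ in train], [y for _, y in train])
holdout_accuracy = sum(predict(w, b, x) == y for x, y in test) / len(test)
print(len(train), len(test), holdout_accuracy)
```

The holdout accuracy on such a tiny toy set is not meaningful; the point is only the order of operations: split, fit on the training part, score on the held-out part.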
Model building flow
Data Partitioning
Model Coefficients and Statistics
The following variables, highlighted in yellow, were selected for the model:
1-Net worth (Transfrmd)
2-Profit after tax (Transfrmd)
3-PBDITA (Transfrmd)
4-PBT (Transfrmd)
5-Cash profit (Transfrmd)
6-PBT as % of total income (Transfrmd)
7-Cash profit as % of total income (Transfrmd)
8-PAT as % of net worth (Transfrmd)
9-Sales (Transfrmd)
10-Reserves and funds (Transfrmd)
11-Net working capital (Transfrmd)
12-Profit after tax / Total assets
13-PBT / Total assets
14-Default (target variable)
Confusion Matrix
Model Stats
Measure                      Value
Sensitivity                  0.956
Specificity                  0.691
Precision                    0.986
Negative Predictive Value    0.397
False Positive Rate          0.31
False Discovery Rate         0.014
False Negative Rate          0.045
Accuracy                     0.945
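The measures in the table above all follow from the four cells of a confusion matrix. The sketch below shows the standard definitions; the counts used are hypothetical and are not the counts behind the reported figures.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard measures derived from confusion matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "Sensitivity": sensitivity,
        "Specificity": specificity,
        "Precision": precision,
        "Negative Predictive Value": tn / (tn + fn),
        "False Positive Rate": 1 - specificity,
        "False Discovery Rate": 1 - precision,
        "False Negative Rate": 1 - sensitivity,
        "Accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=90, fp=5, tn=15, fn=10)
for name, value in metrics.items():
    print(f"{name:<28}{value:.4f}")
```

These identities explain why the False Negative Rate and Sensitivity in the table sum to roughly 1, as do the False Positive Rate and Specificity, and the False Discovery Rate and Precision.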
The model was then tested against validation_data.
Measure                      Value
Sensitivity                  0.9649
Specificity                  0.6667
Precision                    0.9693
Negative Predictive Value    0.6349
False Positive Rate          0.3333
False Discovery Rate         0.0307
False Negative Rate          0.0351
Accuracy                     0.9399
