
Binary Classification Models for Credit Card Default Risk and Customer Profitability


Introduction

This is a small project that I worked on while taking Duke University's Excel to MySQL: Analytic Techniques for Business Specialization. The dataset is available there if you want to take a look at it.

When issuing credit cards to potential consumers, a bank is typically interested in two things, which I will discuss here: default risk and customer profitability. In this project, I take on the role of a business data analyst and develop two predictive models to determine which credit card applicants should be accepted and which should be rejected. The first model minimizes default risk; the second maximizes customer profitability.

Before I get into the models, I would like to explain what a binary model is and why I chose one. A binary model produces a "yes or no" decision once the given data passes through it. This makes it intuitively simple for both the bank and the consumer: applying for a credit card yields a clear "yes or no" answer. For example, whether an applicant is overqualified or barely meets the model's requirements, the answer is still "yes" or "no." This may seem too simple, but the model is more complex under the hood. It relies on the four binary classification outcomes, true positives, true negatives, false positives, and false negatives, to determine how to minimize risk or maximize profit.

In a perfect model, there would be only true positives and true negatives. In practice, a perfect model is extremely rare, because there are so many variables that could be measured but are not. For example, time of application, type of internet browser, and many other specific variables that might carry real predictive value go uncollected, simply because conventional models have worked well enough without them (and some variables, such as an applicant's religion, lenders are not even permitted to consider). With that, let's get into the details of each classification outcome.

A true positive is when our prediction detects a condition that is actually present. Suppose I classify an applicant as "minimal default risk," which allows them to get a credit card. If the applicant pays their bills and does not default, my classification was a true positive. A true negative is when our prediction detects no condition and the condition is indeed absent. Suppose I classify a patient as "no cancer." If the patient truly has no tumors, my classification was a true negative. These two outcomes save institutions money, since consumers pay good money for accurate classifications.

However, false positives and false negatives are still very possible. A false positive is when our prediction detects a condition that is actually absent. For example, saying a patient has cancer when they actually do not is a false positive. False positives can cost institutions a great deal, depending on what is being classified. If a restaurant chain had a predictive model that classified its steaks as "done" or "not done," a few false positives would not cost much. But suppose a radar company had to distinguish seagulls from bombers during wartime. A false positive, classifying a seagull as a bomber, could be enormously expensive, since forces would be scrambled to intercept a seagull. A false negative is when a model fails to detect a condition that is actually present. For example, if a model predicts that a patient does not have cancer when they actually do, the result could be devastating to the patient's family.
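To make these four outcomes concrete, here is a minimal sketch in Python that tallies each of them; the label and prediction lists are invented for illustration, not data from the project.

```python
# Minimal sketch: tallying the four binary-classification outcomes.
# The labels are invented; 1 = "will default", 0 = "will not default".
actual    = [1, 0, 0, 1, 0, 1, 0, 0]   # what really happened
predicted = [1, 0, 1, 0, 0, 1, 0, 1]   # what the model said

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # true positives
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)  # true negatives
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # false positives
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # false negatives

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")  # TP=2 TN=3 FP=2 FN=1
```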

Type I and Type II Errors

The final models for minimizing default risk and maximizing customer profitability will tolerate an acceptable number of type I and type II errors, chosen using a receiver operating characteristic curve, or ROC curve. The threshold I use will let a few false positives and false negatives through, just enough to achieve the lowest default risk and the maximum customer profit possible. For example, if I had 10 applicants ranked from most to least profitable, it is not certain that the top 3 would yield the most combined profit. If I accepted 3 applicants and rejected 7, I would be more precise but would not make much profit, having only 3 customers. If I accepted 5, I could make more profit while taking on a small risk that one or two customers turn out unprofitable. If I accepted 7, the lower-ranked applicants might cost me more than the higher-ranked applicants earn. My threshold tells my model the minimum score at which to accept an applicant.
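To make the 10-applicant example concrete, here is a small sketch with invented profit figures, showing how total profit changes as the cutoff moves down the ranking.

```python
# Hypothetical profits for 10 applicants, already ranked from most to least
# profitable by the model; negative values are money-losing customers.
ranked_profits = [900, 700, 500, 400, 200, 100, -50, -300, -600, -800]

# Total profit if we accept the top k applicants, for every possible cutoff k.
for k in range(1, len(ranked_profits) + 1):
    total = sum(ranked_profits[:k])
    print(f"accept top {k:2d}: total profit = {total:5d}")
# The total peaks at k = 6 (2800) and falls once the loss-makers are let in,
# which is exactly the trade-off the threshold has to balance.
```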

 

Minimizing Default Risk of Credit Card Applications

To build the best possible model, previous models must be updated or even completely overhauled. Suppose a bank is willing to risk $750,000 on an experiment to build a new model. In this project, the bank launches a new credit card and accepts the first 400 applicants, regardless of whether it thinks an applicant will default or be profitable. On the application, the bank collects the metrics it is permitted to gather under CFPB rules and regulations, including age, years at employer, years at address, income, and current debt. From these we can also derive metrics such as the debt-to-income ratio (current debt divided by income) and feed many other mini-models that predict default or profitability. For example, years at address could serve as a measure of stability, a positive signal for a low likelihood of default. But this metric cannot be the tell-all for accepting or rejecting applicants, since there are cases like an unemployed person who has lived in one place for years, or a CEO who relocates every year to launch new businesses. Thus, we must combine all the metrics we are allowed to use to build the best possible model. To do that, we wait until the experiment concludes. At the end, two outcome metrics are recorded. The first, default ("yes or no"), records whether the applicant defaulted. The second, profit/loss, will be discussed in the next section.

After the experiment, we are left with the metrics gathered before and after it. As business data analysts, we first tabulate all 400 data points in an Excel sheet.

Raw Data

Next, we standardize the data (think bell curve if you are a student). We do this because even though 35 years old may seem "average" to us, that intuition is meaningless if it differs from the average actually present in the data. A quick refresher on standardizing data: suppose I record 5 test scores from my students: 45, 50, 55, 60, and 65. Even though a 65 may seem like a very low grade, it is actually the highest score in the class, and a 60, which also seems low, is well above the average. Standardizing makes this explicit: the mean is 55 and the standard deviation is about 7, so 45, 50, 55, 60, and 65 become roughly -1.4, -0.7, 0, 0.7, and 1.4. Standardizing is also useful for comparing across categories. A high score for age and a high score for income become directly comparable, e.g. Age 2.2 / Income 2.5, which says the applicant's age is 2.2 standard deviations above the mean and their income is 2.5 standard deviations above the mean. That is far more meaningful than saying the age was 56 and the income was $88,968.

Standardized Data
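Here is a minimal sketch of that standardization step in Python, using the test-score example above; each column of the credit card data would be treated the same way.

```python
import statistics

# Standardize a column of raw values into z-scores:
# (value - mean) / standard deviation.
def standardize(values):
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / sd for v in values]

scores = [45, 50, 55, 60, 65]
print([round(z, 2) for z in standardize(scores)])
# [-1.41, -0.71, 0.0, 0.71, 1.41]
```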

After standardization, we split the applicants into two groups: those who defaulted and those who did not. We then ask ourselves: what outcome are we looking for? The answer is easy, of course; we are looking for applicants who will not default. So we average the metrics over the group that did not default.

After filtering out the defaulted applicants, we calculate the average of each metric we are allowed to use. Looking at the resulting model, the highest average is "Age": the non-defaulting group averages 0.11 standard deviations above the overall mean age, so higher age is associated with a lower default rate. Since "Years at Address" averages 0.00, we can say it has little to no association with likelihood to default. Pay special attention to the standard deviation of each metric. Although they are all close to 1, a higher standard deviation means less precision and a lower one means more precision; "Credit Card Debt" has the lowest standard deviation, making it the most precise metric available to us. We could technically stop here. To use this model, we would take a new applicant's standardized metrics, multiply each by the corresponding group average (treating the averages as weights), sum them into a single score, sort the applicant pool by that score, and accept the top several applicants.
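Here is a minimal sketch of that scoring step, assuming the weighted-sum reading described above; apart from the 0.11 average for "Age" mentioned in the text, the weights and the applicant's values are invented for illustration.

```python
# Hypothetical weights: the average standardized metrics of the non-defaulting
# group (the 0.11 "Age" average comes from the text; the rest are invented).
weights = {"age": 0.11, "income": 0.08, "years_at_address": 0.00,
           "credit_card_debt": -0.09}

# A new applicant's metrics, already standardized against the experiment's
# means and standard deviations.
applicant = {"age": 0.5, "income": 1.2, "years_at_address": -0.3,
             "credit_card_debt": -0.8}

# Score = sum of standardized metrics times weights; a higher score means
# the applicant looks more like the non-defaulting group.
score = sum(weights[m] * applicant[m] for m in weights)
print(round(score, 3))  # 0.223
```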

But wait. What do we mean by the top several applicants? If we received 100 applications in a day, do we accept the top 5? The top 10, 25, 50? If we accept too few applicants, we get too many false positives (predicting that an applicant will default when they won't), and we lose out on good customers because our model is too strict. If we accept too many, we get too many false negatives (predicting that an applicant will not default when they will).

But which error costs us more? Luckily, our project manager has some intel on what each false negative and each false positive costs. From past experiments, research concludes that each false negative costs $5,000 and each false positive costs $2,500. Great! Now we have what we need to decide how much of each error our model can tolerate while keeping the default rate at a minimum.
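With those costs, choosing the cutoff becomes a simple expected-cost calculation. Here is a minimal sketch; the score lists are invented, whereas in practice they would come from the 400-applicant experiment.

```python
# Invented model scores; in practice these come from the 400-applicant
# experiment. A higher score means the applicant looks less likely to default.
defaulter_scores     = [0.10, 0.25, 0.30, 0.45, 0.60]
non_defaulter_scores = [0.35, 0.50, 0.55, 0.70, 0.85, 0.90]

FN_COST = 5000   # cost of accepting an applicant who then defaults
FP_COST = 2500   # cost of rejecting an applicant who would not have defaulted

def total_error_cost(threshold):
    # Accept anyone scoring at or above the threshold.
    fn = sum(1 for s in defaulter_scores if s >= threshold)     # defaulters let in
    fp = sum(1 for s in non_defaulter_scores if s < threshold)  # good applicants turned away
    return fn * FN_COST + fp * FP_COST

best = min(sorted(defaulter_scores + non_defaulter_scores), key=total_error_cost)
print(best, total_error_cost(best))  # 0.5 7500
```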

To decide how many applicants we can accept, we need a threshold score that we can apply to current and future applicants. We use an area under the curve (AUC) calculator to check how well the model's scores separate defaulters from non-defaulters. A score of 0.50 is equivalent to using no model at all. A score above 0.70 is good. A score of 0.95 or above is exceptional, maybe even too good to be true. Generally, for an application like this where we are limited to 400 data points, a score of 0.75 or above is good; as we get funding for bigger experiments, we can aim for AUC scores of 0.85 and above. Our model scored 0.82.

AUC Calculator
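For completeness, here is one standard way to compute AUC by hand, using its pairwise interpretation: the probability that a randomly chosen non-defaulter scores higher than a randomly chosen defaulter. The scores reuse the invented values from the cost sketch above.

```python
# AUC via its pairwise definition: the fraction of (good, defaulter) score
# pairs where the good applicant scores higher; ties count as half a win.
def auc(positive_scores, negative_scores):
    pairs = [(p, n) for p in positive_scores for n in negative_scores]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
    return wins / len(pairs)

defaulter_scores     = [0.10, 0.25, 0.30, 0.45, 0.60]
non_defaulter_scores = [0.35, 0.50, 0.55, 0.70, 0.85, 0.90]
print(round(auc(non_defaulter_scores, defaulter_scores), 3))  # 0.867
```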
 

Maximizing Customer Profitability of Credit Card Applications

Now it is time to build a model that maximizes the profitability of credit card applications. By now, we understand that we need to standardize and sort the metrics of interest; this time, we are looking for the metrics associated with profit.

The rest of this analysis is essentially the same as the method used for the default risk model above, so I won't go over it again. However, maximizing customer profitability is, in my opinion, the more interesting problem, and its model holds more value than the one minimizing default rate. Why? Consider this: the overall experiment cost about $750,000. The first model, minimizing default risk, is fine in theory, but what do banks care about most? Profit. The second model both produces a model to use in the future and pays back the $750,000 the bank invested. According to my model, with 1,000 applicants per day, the profit model makes over $500,000 a day. If we wanted the attention of our bank's directors and investors, wouldn't that sound more attractive than minimizing default risk? I bet so.

Furthermore, the customer profitability model somewhat contradicts the default risk model. Take an applicant with the best qualities of a non-defaulter: they have never missed a bill in their life, pay their balance in full, and come out ahead collecting travel rewards on their credit card. They might earn stellar scores on our first model, but they would probably bomb our second model on maximizing profits. For the bank to make a sizable profit, consumers have to pay interest; that means carrying a balance, holding debt, and maybe missing a bill or two so that late fees kick in. Thus, it is difficult to merge the two models, because they might simply cancel each other out.

Well, that's all for my project on binary classification models for credit card default risk and customer profitability! Thank you for reading!
