Richard Pace

The Road to Fairer AI Credit Models: Are We Heading in the Right Direction?

Updated: May 24

My 2023 AI credit model research suggests current LDA credit models may pose certain safety-and-soundness and compliance risks to early adopters. Given these challenges, what should 2024 research priorities be to put us on a more viable path?


The Road to Fairer AI Credit Models

2023 was another eventful year for AI/ML-based ("AI") credit models - punctuated by a growing wave of optimism for how these technologies can be a real game-changer by significantly improving consumer credit risk assessment and consumer lending fairness and inclusivity. Indeed, backed by the full force of fintech and regtech marketing campaigns, troubling "fairness reports" based on Home Mortgage Disclosure Act ("HMDA") data, and an active conference circuit in which the adoption of these algorithmically-optimized models is presented as a fait accompli, the call to action couldn't be clearer. Traditional consumer credit models - long used and rigorously tested for fair lending compliance - are now persona non grata and need to be retired post haste to allow modern-day, less discriminatory versions ("LDA credit models") to lead the way toward a new era of equitable credit access.

And, based solely on the influential power of this messaging, who wouldn't support this?

But, there's one small catch.

These claims may not be justified.

You see, despite the immense economic and social attractiveness of these claimed benefits, they are built upon a shaky foundation - weakened by the absence of well-designed, publicly available, rigorous, and objective empirical vetting typically required for new complex technologies deployed in such high-stakes uses. As we know, consumer lenders operate within a heavily-regulated environment in which the pre-deployment testing and mitigation of potential safety-and-soundness and consumer compliance risks are of paramount importance.[1] And, yet, there is still - in my opinion - insufficient due diligence on whether (and how) these technologies actually deliver these benefits, whether the high-level marketing claims and PowerPoint-level "research" results are demonstrably complete and accurate, and - most importantly - whether these LDA credit models may inadvertently create critical safety-and-soundness and/or compliance risks that may contraindicate their adoption.

Making early adoption even more treacherous is the significant regulatory uncertainty surrounding these models and technologies. To be fair, federal bank regulators did issue warnings in 2022 and 2023 about the need for specificity and accuracy in AI-based credit decision explanations and the need for sufficient bank risk and compliance oversight of fintech partner lending platforms. President Biden also weighed in with his Administration's Blueprint for an AI Bill of Rights addressing, among other things, algorithmic discrimination protections. However, many key questions remain unanswered regarding the measurement and mitigation of AI credit model disparate impact risks and whether current AI explainability approaches meet regulatory requirements for adverse action notifications - thus creating significant legal and regulatory risk exposure for many early LDA credit model adopters. More importantly, the bank regulatory agenda in this area remains vague - with no clear roadmap, timeline, or public communication of what industry participants should (or should not) expect in terms of formal guidance or examination procedures.

Given this context, my current opinion is that LDA credit model deployment appears to be racing ahead faster than the underlying risk and compliance guardrails required to ensure safety-and-soundness and regulatory compliance. In fact, this concern motivated my three 2023 "Fool's Gold?" articles where I employed empirical analysis to answer some of the critical LDA credit model due diligence questions discussed above. And, as I summarize in the section below, the key findings from this research suggest an even broader question that - unfortunately - appears to be receiving little notice:

Are our LDA credit model development efforts even on the right path?

My concern here is that many fintechs, regtechs, and early adopters have embraced - either implicitly or explicitly - a set of LDA credit model methodologies, practices, and embedded assumptions that are largely taken as settled practice even though there currently exists significant uncertainty as to their appropriateness within the highly-regulated consumer lending area. For example:

  • Measuring model fairness (or disparate impact) using the Adverse Impact Ratio ("AIR") or the Standardized Mean Difference (an illustrative sketch of these metrics follows this list).


  • Using dual-objective model training methodologies - such as adversarial debiasing or fairness regularization - to create LDA credit models.


  • Using actual or proxied demographic data - either directly or indirectly - in the LDA credit model development process.


  • Relying on SHAP or Integrated Gradients methodologies to identify the ECOA Adverse Action denial reasons.
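To make the first bullet above concrete, below is a minimal sketch (in Python) of how these two fairness metrics are typically computed from a scored application file. The column names and group labels are hypothetical placeholders rather than a reference to any particular lender's data.

```python
import numpy as np
import pandas as pd

def adverse_impact_ratio(df: pd.DataFrame, group_col: str, approved_col: str,
                         protected: str, control: str) -> float:
    """AIR = protected-class approval rate / control-group approval rate."""
    p_rate = df.loc[df[group_col] == protected, approved_col].mean()
    c_rate = df.loc[df[group_col] == control, approved_col].mean()
    return p_rate / c_rate

def standardized_mean_difference(df: pd.DataFrame, group_col: str, score_col: str,
                                 protected: str, control: str) -> float:
    """SMD = (protected mean score - control mean score) / pooled standard deviation."""
    p = df.loc[df[group_col] == protected, score_col]
    c = df.loc[df[group_col] == control, score_col]
    pooled_sd = np.sqrt((p.var(ddof=1) + c.var(ddof=1)) / 2.0)
    return (p.mean() - c.mean()) / pooled_sd

# Hypothetical usage on a scored application file with columns
# "race", "approved" (0/1), and "score":
# air = adverse_impact_ratio(apps, "race", "approved", "Black", "White")
# smd = standardized_mean_difference(apps, "race", "score", "Black", "White")
```

An AIR near 1.0 (or an SMD near zero) is read as demographic alignment; note that both are unconditional metrics - they make no adjustment for legitimate credit risk differences across groups, which is precisely the concern raised later in this article.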


As I have written elsewhere, far from being settled science, each of these elements carries risks and limitations impacting a lender's safety-and-soundness and/or regulatory compliance. Yet I see very little public research or writings discussing these risks or exploring alternative model design elements whose corresponding risks and limitations may be much lower.[2] And this creates a precarious position for the industry should regulators, enforcement agencies, or private litigants take issue with one or more of these features, or should a new presidential administration adopt a different position - for example - on equity-based fairness metrics. One wonders what type of potential liabilities early adopters may be accumulating, or to what alternatives they can pivot for business continuity should one or more of these model design elements come under fire. Some important food for thought.

So what avenues should the industry explore in 2024 to mitigate these risks and guide its path to more viable LDA credit models?

I present my thoughts on this important question in the final section of this article, after first summarizing the challenges facing current LDA credit model adopters as I see them and as detailed in my three 2023 Fool's Gold? articles. For those who have read these articles and are familiar with these challenges, you can skip ahead to my 2024 research recommendations.

Let's dive in.

The Risks and Limitations of Current AI-Based LDA Credit Models

Fool's Gold Part 1: A Closer Look Under the Hood

In January 2023, I released Fool's Gold? Assessing the Case For Algorithmic Debiasing containing the results of my empirical research into how certain fintech-promoted LDA credit model technologies - such as fairness regularization and adversarial debiasing - actually "remove" illegal disparate impact from consumer credit models. In particular, I wanted to know how these algorithms actually change the original credit model to produce more demographically-aligned loan approval rates with relatively little sacrifice of predictive accuracy. Using a simple credit scoring model built from HMDA data for these analyses, I found some surprising and troubling behavior under the hood - with my key findings as follows:


""

Algorithmic debiasing improves credit model fairness simply by lowering the estimated riskiness of certain credit profiles that are disproportionately associated with protected class applicants, and vice versa. The only magic here is the algorithm's ability: (1) to sift efficiently through the various applicant credit profiles to find those whose estimated riskiness is above the lender's approval threshold and that are comprised disproportionately of protected class applicants, and (2) to determine precisely the set of model weight distortions needed to lower the estimated riskiness of these credit profiles into the lender's approval region with the least amount of impact on overall model predictive accuracy.

While this is certainly an effective mathematical solution to the fairness issue, my analysis showed that it also comes with some troubling side-effects. Specifically, the resulting distortions to the model's estimated credit risk relationships to improve outcome fairness may be so severe that they become counterintuitive, conceptually unsound, and expose the lender to safety-and-soundness risk.
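For readers who want to see the mechanics, below is a minimal sketch of one simple form such dual-objective training can take: a logistic credit model whose loss blends standard log-loss with a penalty on the gap in mean predicted PD between demographic groups. The penalty form, the lambda weight, and the variable names are my own illustrative assumptions - actual vendor implementations (e.g., adversarial debiasing) differ in their details - but the sketch shows why the optimizer is free to distort any weight that helps close the demographic gap.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_fairness_regularized(X, y, group, lam=5.0, lr=0.1, epochs=2000):
    """X: (n, k) credit attributes; y: (n,) observed defaults; group: (n,) 1 = protected class."""
    n, k = X.shape
    w, b = np.zeros(k), 0.0
    for _ in range(epochs):
        pd_hat = sigmoid(X @ w + b)

        # Accuracy objective: gradient of the standard logistic log-loss.
        err = pd_hat - y
        grad_w = X.T @ err / n
        grad_b = err.mean()

        # Fairness objective: penalize the squared gap in mean predicted PD between
        # groups, pushing the two score distributions toward each other.
        gap = pd_hat[group == 1].mean() - pd_hat[group == 0].mean()
        d_pd = pd_hat * (1.0 - pd_hat)                       # derivative of the sigmoid
        sign = np.where(group == 1, 1.0 / (group == 1).sum(),
                        -1.0 / (group == 0).sum())
        grad_w += lam * 2.0 * gap * (X.T @ (sign * d_pd))
        grad_b += lam * 2.0 * gap * (sign * d_pd).sum()

        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

Nothing in this loss "knows" what CLTV or DTI means - the fairness gradient simply adjusts whichever weights most cheaply shrink the demographic gap, which is how the distortions described next can arise.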

As I wrote,

In my example, ... the fairness regularizer solves the mathematical fairness problem primarily by distorting the core CLTV credit risk relationship - i.e., all else equal, CLTVs above 85% are now associated with lower estimated PDs [i.e., probabilities of default] than those for CLTVs between 80% and 85%. This produces an LDA ... Model that - while superior according to the ... fairness metric - is no longer conceptually sound, appearing to come from some Stranger Things "upside-down" universe in which credit risk behaviors operate very differently.

The figure below - reproduced from my original Fool's Gold? article - illustrates this distortion. The green line represents the original credit model's estimated CLTV credit risk relationship in which the likelihood of borrower default increases monotonically with the level of his/her loan-to-value ratio - a relationship that is consistent with numerous academic studies, business experience, and is embedded within virtually all credit underwriting policies and procedures.

[Figure: Impact of fairness regularization on the AI credit model's estimated CLTV credit risk relationship]

The LDA credit model, however, distorts this relationship - as reflected by the red line - by adjusting the original weights of the CLTV variable such that borrowers with CLTVs between 80% and 85% (who are disproportionately members of the control group) are now considered riskier than those whose CLTVs are above 85% (who are disproportionately members of the protected class). While such adjustments do improve measured fairness, they also produce a model whose estimated CLTV relationship is not conceptually sound given the discussion above. It also leads to a significant underestimation of credit risk for high-defaulting borrowers - thereby exposing the lender to greater safety-and-soundness risks.
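One simple way to operationalize this conceptual-soundness concern is sketched below: bin a scored sample by CLTV and confirm that the model's average estimated PD never decreases as CLTV rises. The bin edges and column names are illustrative assumptions, and a partial-dependence version that holds other attributes fixed would be a more rigorous variant of the same check.

```python
import numpy as np
import pandas as pd

def cltv_monotonicity_check(scored: pd.DataFrame) -> bool:
    """Expects columns `cltv` and `pd_hat` (model-estimated PD) on a scored sample."""
    bins = [0.0, 0.60, 0.70, 0.80, 0.85, 0.90, 0.95, np.inf]
    mean_pd = (scored.assign(cltv_bin=pd.cut(scored["cltv"], bins))
                     .groupby("cltv_bin", observed=True)["pd_hat"].mean())
    # Conceptually sound: estimated risk never falls as CLTV rises.
    return bool(np.all(np.diff(mean_pd.to_numpy()) >= 0.0))
```

Applied to the example above, the original model would pass this check while the LDA model - with its inverted 80-85% versus >85% CLTV relationship - would fail it.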


""

Debiasing algorithms typically operate unfettered - that is, they ruthlessly seek an optimal mathematical solution to the fairness problem by adjusting the weights on whatever predictive variables are necessary. It matters not to the algorithm what each variable represents conceptually from a credit risk perspective, how critical that variable is to the lender's credit underwriting policy, whether that variable has a strong causal connection to borrower default behavior, whether the variable has previously been criticized (or not) by regulators for use in credit underwriting, or what true form the estimated credit risk relationship of that variable should inherently take (e.g., positively or negatively related to default risk, monotonically increasing or decreasing, etc.).

As I wrote,

... there are certain credit risk attributes such as CLTV, DTI, PTI, recent bankruptcy, recent credit derogatories, and more that are standard and direct measures of applicant credit risk used ubiquitously throughout the banking industry, that form the foundations of prudent consumer credit risk management policies, and that have not been criticized by federal bank regulators for disparate impact risk. In fact, federal regulation actually requires some of these credit risk attributes to be considered by lenders during loan underwriting. ... Accordingly, using a completely unconditional outcomes-based fairness metric - such as AIR [adverse impact ratio], or the correlation of estimated PDs with applicant demographics - to measure credit model fairness would not seem to align with certain bank regulatory requirements ... , nor with long-standing bank regulatory supervision and enforcement activities that have not cited standard credit risk attributes for illegal disparate impact.

To be fair, some may disagree with this finding - saying that lenders can always exclude certain variables from the algorithm's adjustments. However, even if this were so, then - for logical consistency - one would also need to convert the Adverse Impact Ratio from an unconditional metric to a conditional one - purging this metric of any loan approval rate disparity driven by the excluded variables. Not only does this fundamentally change the fairness metric, but it also sows further confusion into the underlying disparate impact theory on which algorithmic debiasing relies. My article in Part 3 below addresses this point more fully.
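To illustrate the point, here is a minimal sketch of one way such a conditional metric could be constructed - comparing group approval rates within strata of the excluded variable(s) and then aggregating across strata, so that any disparity driven by those variables is purged. The stratification scheme and column names are illustrative assumptions only, and other conditioning approaches (e.g., regression-based) could be used instead.

```python
import pandas as pd

def conditional_air(df: pd.DataFrame, strata_cols: list, group_col: str,
                    approved_col: str, protected: str, control: str) -> float:
    """AIR computed from stratum-weighted approval rates rather than raw rates."""
    # Approval rate for each (stratum, group) cell.
    rates = (df.groupby(strata_cols + [group_col])[approved_col].mean()
               .unstack(group_col))
    # Weight each stratum by its share of all applicants.
    weights = df.groupby(strata_cols).size()
    weights = weights / weights.sum()
    # Strata with no applicants from a group simply drop out of that group's average.
    p_rate = (rates[protected] * weights).sum()
    c_rate = (rates[control] * weights).sum()
    return p_rate / c_rate

# Hypothetical usage: condition the AIR on CLTV bands excluded from debiasing.
# conditional_air(apps, ["cltv_band"], "race", "approved", "Black", "White")
```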

Fool's Gold Part 2: Assessing LDA Credit Model Safety-and-Soundness Risks

In October 2023, I released Fool's Gold 2: Is There Really a Low-Cost Accuracy-Fairness Trade-off? where I evaluated one of the central safety-and-soundness claims of debiasing proponents - the existence of a low-cost fairness-accuracy trade-off. This claim - one of the proponents' most amplified - assures lenders that improved fairness is virtually a "free lunch" - involving only minor, sometimes trivial, sacrifices in model predictive accuracy.  However, when assessed empirically within the framework of my simple credit scoring model, I found this claim to be untrue as it excludes consideration of a broader set of accuracy metrics necessary for effective credit risk management. In fact, once I incorporated the complete set of model accuracy measures into the analysis, I found that a given improvement in fairness can actually lead to a material reduction in the model's ability to predict accurately the actual default rate levels of a portfolio or certain portfolio sub-segments (its "calibration accuracy") - even though the model's rank-ordering accuracy (the accuracy metric typically used by proponents) may be little affected.

""

In my simple credit scoring model example, the LDA credit model suffered a 20% reduction in model calibration accuracy on approved loan applications relative to the original credit model. Not only is this materially different from the relatively minor 3.4% reduction in rank-ordering accuracy, it is also very troublesome for effective credit risk management.


As I wrote,

While the ... rank-ordering accuracy metric is certainly important for operationalizing a scorecard-based credit decision strategy, it represents only a partial measure of credit model performance relevant to lenders. What it doesn't capture is the model's [calibration] accuracy [- i.e., its accuracy] in predicting the lender's expected credit risk exposure levels associated with its credit decisions - such as expected default rates and expected losses on approved loans. Accurate estimates of these metrics are critically necessary for the lender to remain within established credit risk limits, and for evaluating whether subsequent observed loan losses are consistent with original LDA Model estimates.
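The sketch below illustrates the two accuracy lenses at issue - rank-ordering accuracy (here, AUC) versus calibration accuracy on the approved book (predicted versus observed default rate). The approval cutoff and inputs are illustrative assumptions; the point is simply that the two metrics can diverge sharply when the same scores are fed through both.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def accuracy_report(pd_hat: np.ndarray, defaulted: np.ndarray, cutoff: float) -> dict:
    """pd_hat: model-estimated PDs; defaulted: observed 0/1 outcomes; cutoff: max approvable PD."""
    approved = pd_hat <= cutoff
    predicted_dr = pd_hat[approved].mean()      # expected default rate on the approved book
    actual_dr = defaulted[approved].mean()      # observed default rate on the approved book
    return {
        "rank_ordering_auc": roc_auc_score(defaulted, pd_hat),
        "approved_predicted_default_rate": predicted_dr,
        "approved_actual_default_rate": actual_dr,
        "calibration_error": (predicted_dr - actual_dr) / actual_dr,
    }

# Hypothetical usage: score the same holdout sample with both models and compare.
# base_report = accuracy_report(pd_base, defaults, cutoff=0.10)
# lda_report  = accuracy_report(pd_lda, defaults, cutoff=0.10)
```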

Fool's Gold Part 3: Assessing LDA Credit Model Legal and Compliance Risks

Finally, in November 2023, I released Fool's Gold 3: Do LDA Credit Models Really Improve Fairness? where I evaluated whether the LDA credit model's improved fairness was actually driven by a removal of the type of inappropriate credit barriers most compliance professionals and regulatory writings attribute to disparate impact.

As I wrote,

... traditional credit model disparate impact analysis focused, in part, on specific model attributes that were highly correlated with applicant demographics and whose causal connections to an applicant's credit performance were considered questionable. From a fair lending perspective, the primary concern was that such attributes would improperly penalize protected class applicants' credit access and borrowing costs by overestimating their credit risk relative to: (1) their historical default behavior, or (2) credit risk estimates based on more "legitimate" credit risk attributes linked more directly and causally to their repayment behavior.

What I found in my research was that the removal of this traditional form of disparate impact was NOT what algorithmic debiasing was doing - nor what it was specifically designed to do.

""

While algorithmic debiasing may, in fact, improve the LDA credit model's relative approval rates, this fairness improvement may NOT be due to the remediation of improper credit access barriers typically associated with disparate impact. Instead, the LDA credit model may simply act as a tool to implement affirmative, policy-driven credit access expansion targeted to certain protected class groups via a latently-encoded reverse disparate impact - thereby exposing the lender and its consumers to certain unintended legal and compliance risks.

As I wrote,

In the swap set analysis performed on my LDA Model, we learned that the AD process achieved its ... fairness improvements NOT by mitigating the impact of an improper predictive attribute on the estimated PDs of otherwise qualified applicants, but by artificially making certain unqualified applicants appear qualified, and vice versa. For example, high risk applicants with CLTVs > 95% (which skew demographically to the protected class and were denied under the [original credit model]) are now approved by the LDA Model by assigning them artificially and significantly lower PDs (i.e., 9% on average vs. 61.4% under the [original credit model] - and an actual average default rate of 62.1%). And to keep the overall approval rate at the lender's desired 90% level and to alter the relative approval rates in the desired demographic direction, the algorithm then offset these additional LDA approvals with denials of an approximately similar number of lower risk, qualified applicants with CLTVs between 80% and 85% (which skewed demographically to the control group). These denials were "justified" by assigning them artificially and significantly higher estimated PDs (i.e., 20.2% under the LDA Model vs. 5.0% under the [original credit model] and an actual average default rate of 5.4%)

In none of these swap set segments was the LDA Model mitigating the effects of a model failure or a questionable, non-causal attribute that was distorting the applicants' true credit qualifications based on observed repayment performance. Rather, the LDA Model implemented credit decision overrides for applicants with credit profiles exhibiting certain demographic correlations in order to achieve the [algorithm's] objective of improved approval rate equity. In essence, the LDA Model implemented a latent ECOA-like Special Purpose Credit Program - but without the explicit identification of the target group and without the formal underlying legal structure required by law and regulation (among other things).

""

LDA credit models may expose lenders to UDAAP claims, predatory lending allegations, and/or safety and soundness risk. And these risks can be difficult to detect in highly-complex LDA credit models containing hundreds or thousands of variables.

As I wrote,

[In my example], ... the LDA Model may be approving higher-defaulting (i.e., unqualified) applicant segments to improve the AIR-based fairness metric. However, without expanded fairness transparency, the lender may inadvertently expose itself to heightened legal, compliance, safety and soundness, and reputational risks for targeting such applicants with loans that many may not be able to repay. For example, my LDA Model swapped in 5,477 applicants with CLTVs > 95% for approval despite such applicants having a 62.1% historic default rate. While, in the real world, a lender's risk management team would likely (and sensibly) identify and potentially prevent such blatantly improper approvals, this outcome becomes much more difficult in models with hundreds or thousands of predictive variables and where no swap set transparency analysis of this type is performed.

""

LDA credit models may counterintuitively perpetuate diminished credit access to underserved populations.



As I wrote,

While improved approval rate equity may be a strong driver for a lender's ... LDA Model adoption, this analysis suggests that lenders should also consider the longer-term potential for certain LDA Models to perpetuate the very societal problem that they are trying to address. ... [W]hile LDA Model swap-ins help to improve approval rate equity, if these swap-ins ... are not otherwise creditworthy, then a high percentage of such approvals may likely default on such loans - leading to impaired credit reports for such borrowers and an extended future period of likely diminished and higher-priced credit access.

""

Because LDA credit models are developed using a dual-objective model training process (i.e., optimizing for both fairness and accuracy), standard approaches for determining Adverse Action reasons may yield denial reasons that are not compliant with ECOA.


As I wrote,

... [T]he LDA Model produces some lower-risk denials and some higher-risk approvals solely to improve the model's fairness performance. Accordingly, standard methods to determine ECOA denial reasons may attribute certain credit model attributes as the reason(s) for these lower-risk denials when, in fact, the only role played by such attributes in this decision was their correlation with the applicant pool's demographics. ... For example, in my LDA Model, 23,719 applicants - approved under the [original credit model] with an average PD of 5% - are denied under the LDA Model with an average PD of 20.2% .... Standard explainability processes would attribute this denial to the applicant's CLTV attribute since the LDA Model significantly increased the risk of CLTV 80-85% applicants. ... However, the higher estimated PDs for these applicants have nothing to do with their repayment behavior; in fact, their repayment behavior (i.e., average default rate) is 90% better than the repayment behavior of applicants with CLTVs > 95% which the LDA Model now approves. Accordingly, attributing these applicants' credit denial decisions to their CLTVs may not be considered accurate as they were, in reality, denied due to the racial composition of their credit profile as part of a rough justice swap set to improve the lender's approval rate equity.

""

LDA credit models may expose lenders to the risk of reverse discrimination claims in light of SCOTUS's recent Students for Fair Admissions ("SFFA") decision.


As I wrote,

... [T]o the extent that an LDA Model achieves improved ... fairness via the affirmative, policy-driven swapping in (i.e., approval) of less qualified applicants and the swapping out (i.e., denial) of more qualified applicants, a lender may be at risk of reverse discrimination claims if the SFFA [Students For Fair Admissions] decision is deemed to apply. Additionally, even if a lender is not debiasing its credit models, there could still be risk since, as I discussed previously, many credit models may have an inherent predictive bias in favor of certain protected class groups. I would therefore recommend that lenders take prudent action to evaluate such risks with legal counsel and mitigate them as warranted.

""

It is insufficient due diligence for a lender in a high-stakes, highly-regulated area like consumer lending to adopt a complex, algorithmically-altered credit model simply based on surface-level improvements in high-level credit outcome comparisons across demographic groups - such as loan approval rates. Instead, robust and accurate fairness-based transparency and explainability information needs to be provided to key stakeholders to promote a deeper and more meaningful understanding of how the LDA credit model specifically achieves its improved fairness performance both globally (i.e., at the demographic group level) and locally (i.e., for individual swap-ins and swap-outs).


As I wrote,

Such transparency helps these stakeholders evaluate whether the bases for expanded protected class approvals (as well as the likely reduction in control group approvals) are consistent with their fair lending compliance objectives as well as with applicable laws, regulations, company policies, and company values. Not only is this prudent, but it is also consistent with the model risk management principles embedded in long-standing bank regulatory guidance.

Fairer AI Credit Models: Where Do We Go From Here?

As my prior research indicates, algorithmically-derived LDA credit models - for all of their purported benefits - may actually operate very differently than is commonly understood. Not only does this suggest that early adopters may be exposed to important safety-and-soundness and compliance risks, it also raises important questions as to whether these models are currently fit for purpose due to insufficient model validation and compliance testing.

Moreover, as I discussed in the introductory section, the current breed of LDA credit models has several design features that many early adopters implicitly or explicitly accept as settled practice even though there currently exists significant uncertainty as to their appropriateness within the highly-regulated consumer lending area. This creates added risk for early adopters should regulators and enforcement agencies take issue with one or more of these features, or should a new presidential administration adopt a different position - for example - on equity-based fairness metrics.

Clearly, further publicly-available research into these risks and benefits is sorely needed - as well as research on alternative LDA credit model designs. To this end, I would suggest the following topics as priorities for lenders, fintechs, regtechs, and regulators to provide a clearer path forward to safer and more legally-defensible consumer credit fairness improvements.

1. Further Public Research Into Current LDA Model Risks and Benefits

In my own research, I have clearly noted that my empirical results are based on analyses of a simple credit scoring model developed using publicly available HMDA data. While further research based on more typical consumer credit scoring models and datasets would be beneficial to understand whether my results generalize more broadly - thereby reinforcing the need to explore alternative LDA approaches - the underlying data and models necessary for such research are not publicly available. Accordingly, fintech lenders and regtech vendors should consider releasing their own similar research results or, to ensure more independence of the research, provide the underlying data to academic or other research organizations for these purposes. This would also be a worthy research area for the federal bank regulatory agencies who have much broader access to industry LDA credit models and data, and who are staffing up on technical professionals to better understand and regulate AI-driven financial technologies.

While I am fairly confident that my results should generalize to more typical models based on similar debiasing approaches, we do require the cooperation of those creating and promoting these models to see if this is so. And if those promoting these models are unwilling to share such results, then lenders should take that lack of transparency into account when evaluating the overall risks of third-party LDA credit model adoption.

2. Specific Research on LDA Model Swap Sets: Profiles, Defaults, and Losses

Proponents claim that LDA credit models improve fairness by removing illegal disparate impact on protected class applicants. However, as I point out in my third Fool's Gold article, how these proponents define "disparate impact" may be very different than how it has been defined traditionally in regulatory writings, or how it is defined in any given company's fair lending compliance policy. Accordingly, a lender adopting an LDA credit model should ensure that the "disparate impact" definition deployed within the model is consistent with its policy, and one way to do this is to analyze the LDA credit model's "swap sets" - that is, as described in the above article, those borrowers in the training/test data who would be denied under the original credit model but approved under the LDA version (the "swap-ins") and vice versa (the "swap-outs").

Among other things, this analysis should identify and evaluate the characteristics of the applicants for whom the effects of disparate impact - as operationalized by the debiasing algorithm - have purportedly been removed (i.e., the swap-ins).[4] That is, as disparate impact is an adverse effect - e.g., denying a loan to an applicant who is otherwise credit qualified - one would expect that such applicants who are newly-approved by the LDA credit model (i.e., the swap-ins) would have certain characteristics consistent with this inappropriate treatment. For example (a simple swap-set tabulation sketch follows this list),

  • Are their actual default rates and/or losses statistically equivalent to (or even lower than) those of applicants who were approved under the original credit model? Materially higher default rates and/or losses on swap-in applicants would appear to be inconsistent with disparate impact's premise of otherwise creditworthy applicants.


  • Does the evidence indicate that the swap-ins' credit risk was originally over-estimated by the original credit model? If the original credit model was not penalizing these otherwise creditworthy applicants by overestimating their credit risk (either absolutely, or relative to other relevant applicant groups), the presence of traditional disparate impact would appear to be contraindicated.


  • Do the credit profiles of the swap-ins reflect model attributes that are highly correlated with these applicants' demographics and whose causal connections to their credit performance are considered questionable? Are these attributes improperly penalizing these applicants' credit access and borrowing costs by overestimating their credit risk relative to: (1) their historical default behavior, or (2) credit risk estimates based on more "legitimate" credit risk attributes linked more directly and causally to their repayment behavior?


  • One may also wish to perform similar testing for the LDA credit model's swap-outs - investigating whether they perform statistically the same as or better than approved applicants and the swap-ins, and whether they received a benefit from the original model in terms of an underestimation of their credit risk. Such analyses would indicate whether there may be a potential risk for reverse discrimination claims - although you should consult with legal counsel for a more authoritative opinion on this.
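A minimal sketch of the swap-set tabulation underlying these questions is shown below - flagging swap-ins and swap-outs from the two models' approval decisions and comparing their observed default rates. The column names are hypothetical placeholders; a full analysis would layer on losses, credit profile attributes, demographics, and appropriate statistical tests.

```python
import pandas as pd

def swap_set_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Expects 0/1 columns `approved_base`, `approved_lda`, and `defaulted` on the test sample."""
    df = df.copy()
    df["segment"] = "unchanged"
    df.loc[(df["approved_base"] == 0) & (df["approved_lda"] == 1), "segment"] = "swap_in"
    df.loc[(df["approved_base"] == 1) & (df["approved_lda"] == 0), "segment"] = "swap_out"
    return (df.groupby("segment")
              .agg(n=("defaulted", "size"),
                   observed_default_rate=("defaulted", "mean"))
              .reset_index())
```

A swap-in default rate materially above that of the originally approved population - as in my example's >95% CLTV segment - would be hard to reconcile with the "otherwise creditworthy" premise of traditional disparate impact.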

From a more general research perspective - and perhaps jointly performed with law school faculty or other legal professionals - it would be beneficial for the industry to know precisely how disparate impact is operationalized in the most popular debiasing algorithms and whether the corresponding effects on applicant credit decisions create legal and compliance risks to LDA credit model adopters based on sound legal analyses. Should such analyses indicate inconsistencies with applicable fair lending and civil rights laws and regulations, it would be wise for the industry to consider alternative LDA credit model approaches to remedy such an important weakness for this high-stakes use case.

3. Exploration of Alternatives to Current Debiasing Algorithms

Given the potential safety-and-soundness and compliance risks associated with current debiasing algorithm design features, it may be prudent for more technical researchers to begin exploring alternative methodologies and/or features that would mitigate these risks - for example, developing debiasing algorithms (or improving initial model development processes) that identify and mitigate systemic model prediction errors (i.e., overestimation of credit risk for protected class groups and underestimation of credit risk for control groups). This, of course, assumes a traditional disparate impact definition in which applicants are harmed by systemically inaccurate credit risk scores rather than simple approval rate inequity. Additionally, researchers would need to be careful that any such methodologies do not inadvertently introduce disparate treatment into the model's credit risk estimates in order to address potential systemic predictive biases disproportionately impacting certain demographic groups.
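As a concrete illustration of this alternative diagnostic, the sketch below measures systemic over- or under-estimation of credit risk by demographic group - comparing each group's mean model-estimated PD to its observed default rate - rather than comparing raw approval rates. The column names are illustrative assumptions.

```python
import pandas as pd

def group_prediction_bias(df: pd.DataFrame, group_col: str = "group") -> pd.DataFrame:
    """Expects columns `pd_hat` (model-estimated PD) and `defaulted` (observed 0/1 outcome)."""
    out = (df.groupby(group_col)
             .agg(n=("defaulted", "size"),
                  mean_predicted_pd=("pd_hat", "mean"),
                  actual_default_rate=("defaulted", "mean"))
             .reset_index())
    # Positive values indicate the model systematically overestimates the group's credit risk.
    out["prediction_bias"] = out["mean_predicted_pd"] - out["actual_default_rate"]
    return out
```

A debiasing approach aimed at this diagnostic would target the `prediction_bias` column toward zero for each group - correcting systemic scoring errors - rather than equalizing approval rates directly.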


4. Legal Research Into the Intersection of AI Credit Model Explainability and Consumer Protection Laws and Regulations

Currently, most adopters of AI-based credit models employ some variation of the SHAP or Integrated Gradients methodologies to identify ECOA Adverse Action Reasons for denied credit applicants. However, as LDA credit models are estimated using dual training objectives (i.e., optimizing predictive accuracy and fairness), the model's estimated variable weights will necessarily reflect the influence of both objectives. What this means practically is that while one variable weight (e.g., DTI) may primarily reflect its strong correlation with borrower default behavior - increasing the likelihood of credit denial for applicants with higher DTIs to promote predictive accuracy - another variable weight (e.g., CLTV) may primarily reflect its correlation with control group demographic membership - thereby increasing the likelihood of credit denial for applicants with lower CLTVs to promote increased fairness (see the first key finding summarized in the previous section for a specific discussion of this).

When a credit applicant is denied using these models, the key question under ECOA is: what are the specific reasons? In the former case, where the applicant is denied for a high DTI, the denial would be due to the applicant's high credit risk caused by excess debt obligations. In the latter case, however, where SHAP or Integrated Gradients would identify CLTV as the primary denial reason, the denial may not be due to unacceptable credit risk, but rather to the lender's fairness objectives. Yet simply reporting CLTV as the denial reason on the applicant's Adverse Action Notice implies that the applicant's CLTV level made them insufficiently creditworthy - even though the applicant's inherent creditworthiness may have had nothing to do with their adverse action. In reality, the applicant could have been denied credit because their low CLTV is disproportionately associated with control group applicants whose approvals would compromise the lender's desired fairness targets (e.g., AIR levels). The critical question here is whether simply stating that the application was denied due to its CLTV is consistent with the requirements of ECOA and Regulation B, or whether the applicable laws and regulations would require the "specific" denial reason to reference the applicant's likely demographic group membership that compromises the lender's ability to meet its fairness targets.
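For reference, below is a minimal sketch of the standard SHAP-based reason-code approach; the model object, feature names, and number of reasons are assumptions. The limitation discussed above is visible in the output: the ranking reflects each attribute's contribution to the estimated PD, with no way to distinguish an accuracy-driven contribution from a fairness-driven one in a dual-objective LDA model.

```python
import numpy as np
import shap

def top_denial_reasons(model, X_denied, feature_names, n_reasons=4):
    """Rank features by their positive (risk-increasing) contribution to each denied applicant's PD."""
    explainer = shap.TreeExplainer(model)            # assumes a tree-based credit model
    shap_values = explainer.shap_values(X_denied)    # per-feature contributions to the score
    if isinstance(shap_values, list):                # some classifiers return one array per class
        shap_values = shap_values[1]                 # keep the contributions toward default
    reasons = []
    for row in np.atleast_2d(shap_values):
        order = np.argsort(row)[::-1]                # largest risk-increasing contributions first
        reasons.append([feature_names[i] for i in order[:n_reasons] if row[i] > 0])
    return reasons
```

In the CLTV example above, this procedure would dutifully report "CLTV" as a principal denial reason for the 80-85% CLTV swap-outs - without revealing that the attribute's weight was inflated to serve the fairness objective rather than to reflect those applicants' repayment behavior.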


* * *

ENDNOTES:

[1] In fact, the rigorous testing requirements for AI models were recently highlighted by senior leaders of the four primary federal bank regulatory agencies at the January 18-19, 2024 National Fair Housing Alliance's Responsible AI Symposium.

[2] To be fair, the use of inherently interpretable AI credit model architectures has been discussed relatively broadly as an alternative model design element to improve credit decision explainability and thereby lower the risk of ECOA Adverse Action Notification violations.

[3] Consumer compliance risks are also a significant concern here as the LDA credit model is effectively targeting high-risk protected class applicants for loans whose historical default rates are very high.

[4] Technically, the disparate impact analysis should focus on those swap-ins who are members of a protected class. However, because debiasing algorithms implicitly search for swap-in credit profiles that are disproportionately populated by protected class applicants, I discuss the research testing here more generically by reference to "swap-ins".


© 2024, Pace Analytics Consulting, LLC.
