The Hidden Challenge of Reject Inference in Credit Scoring

When developing credit risk models, one of the hardest challenges is how to deal with rejected applications.  These are the customers who applied for credit but were declined, meaning we never get to observe their true repayment behaviour. 

For application (origination) scorecards or models, this creates a fundamental problem:

If we build only on accepted customers, we may end up with a model that works well on previous accepts but poorly for the wider applicant population. If we try to infer what the rejects might have done, we risk building on assumptions rather than evidence. This delicate balance between realism and inference is the reject inference challenge, and it is why the task remains both an art and a science.

Why Is Reject Inference Hard?

At its core, reject inference tries to solve an impossible problem: 

“What would have happened if we had lent to people we didn’t lend to?”

Since we cannot observe those outcomes directly, we have to estimate them, and the accuracy of those estimates depends on the quality of our assumptions.

This is inherently hard because of:

Selection bias: 
The data we have (accepted customers) are not representative of all applicants. The act of accepting or rejecting changes what the known data shows. For example, UK customers with CCJs (county court judgments) may be disproportionately rejected and only the very ‘good’ ones accepted, skewing the observed risk profile for the historic accepts.

Feedback loops: 
Lending policies might evolve based on past models, meaning today’s accept/reject decisions are influenced by yesterday’s biases.  

Limited validation: 
We can’t directly check whether our inferred outcomes are correct. Some modellers use bureau data as surrogate performance, looking at whether a rejected applicant opened an account elsewhere soon after being declined and how that account then performed. There is a danger in assuming performance would have been the same: a different lender, limit, and other terms and conditions can produce very different payment behaviour.

Common Pitfalls 

Reject inference has many traps for the unwary or inexperienced modeller.  Here are a few we see most often: 

Assuming rejects behave like bads: 
A crude approach sometimes treats all rejects as bad customers. This is a guaranteed way to bias the model toward excessive conservatism and create a self-fulfilling feedback loop where the profiles of previous rejects look ever riskier. 

Ignoring Accept/Reject patterns: 
Not paying enough attention to how Accept/Reject patterns differ across profiles, and to how this can influence the observed Known Good/Bad patterns and therefore the Known Good/Bad model.

Using inconsistent data windows: 
When the acceptance criteria or product offer changes during the development period, inferred outcomes may reflect outdated or irrelevant conditions. 

Overlooking policy and operational realities: 
Sometimes applicants are declined (or simply walk away) for non-risk reasons (e.g. incomplete applications or missing documents). Ignoring this nuance leads to poor inference.

How to Avoid the Pitfalls 

Experienced credit risk modellers treat reject inference as an iterative, sensitivity-testing exercise rather than a search for perfect truth. Here are a few practical ways to approach it:

Start simple, and compare: 
Always benchmark against a model built on accepts only. Does inference actually add predictive power, or just artificial lift? (A simple sketch of this comparison follows this list.)

Segment your rejects: 
Not all rejects are equal. Differentiate between: 

  • Rejects where Accept/Reject and Known Good/Bad patterns indicate cherry picking,

  • Borderline declines,

  • Very high-risk declines, and

  • Non-risk declines (e.g. incomplete or withdrawn applications).

Inference is often most reliable on the borderline group.

Be transparent: 
Document the assumptions, the methodology, and the limitations. A regulator or validator will expect this, and it builds trust in your modelling process. 
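
As a rough illustration of the “Start simple, and compare” check above, here is a minimal sketch that benchmarks a model built on accepts only against one built on accepts plus inferred rejects, scoring both on a hold-out of accepts (the only applicants with observed outcomes). It assumes a hypothetical pandas DataFrame `apps` with illustrative columns: an `accepted` flag, a `bad` flag observed for accepts, an `inferred_bad` flag assigned to rejects by whatever inference step you used, and placeholder feature names. None of these names come from the article.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Illustrative feature names only; replace with your own candidate characteristics.
FEATURES = ["age", "income", "bureau_score"]

# `apps` is assumed to be a pandas DataFrame with one row per applicant.
accepts = apps[apps["accepted"] == 1]
full_ttd = apps.assign(
    target=apps["bad"].where(apps["accepted"] == 1, apps["inferred_bad"])
)

# Hold out a slice of accepts: the only population with observed outcomes.
holdout = accepts.sample(frac=0.3, random_state=42)
train_accepts = accepts.drop(holdout.index)
train_full = full_ttd.drop(holdout.index)

model_accepts = LogisticRegression(max_iter=1000).fit(
    train_accepts[FEATURES], train_accepts["bad"]
)
model_inferred = LogisticRegression(max_iter=1000).fit(
    train_full[FEATURES], train_full["target"]
)

# Gini = 2 * AUC - 1, measured on the accepts hold-out for both models.
for name, model in [("accepts only", model_accepts), ("with inference", model_inferred)]:
    auc = roc_auc_score(holdout["bad"], model.predict_proba(holdout[FEATURES])[:, 1])
    print(f"{name}: Gini = {2 * auc - 1:.3f}")
```

If the inferred model does not at least match the accepts-only benchmark on observed outcomes, treat any apparent lift on the inferred population with suspicion.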

How to Evaluate Success 

Evaluation is the hardest part because we can’t measure what didn’t happen. 

The next time you perform reject inference, here are some checks that can give you better insight into how your inference results have affected your ‘through the door’ population and therefore your final model.

Univariate Information Value (IV) reports for potential scorecard characteristics  

[Figure: IV by variable and outcome predicted]

The purpose of this report is to see whether the IV for most characteristics increases from Known Good/Bad to All Good/Bad. If the Known Good/Bad pattern is counter-intuitive, the IV might go down, so this report should be read in combination with the WoE patterns report for each characteristic.
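
A minimal sketch of how such an IV comparison might be produced, assuming a hypothetical pandas DataFrame `apps` with pre-binned characteristic columns, an `accepted` flag, a Known Good/Bad flag `kgb_bad` observed on accepts, and an All Good/Bad flag `agb_bad` combining observed and inferred outcomes (all names are illustrative, not from the article):

```python
import numpy as np

def information_value(df, bin_col, bad_col):
    """IV = sum over bins of (%goods - %bads) * WoE, where WoE = ln(%goods / %bads)."""
    grp = df.groupby(bin_col)[bad_col].agg(bads="sum", total="count")
    goods = grp["total"] - grp["bads"]
    pct_good = goods / goods.sum()
    pct_bad = grp["bads"] / grp["bads"].sum()
    woe = np.log(pct_good / pct_bad)          # assumes every bin has both goods and bads
    return ((pct_good - pct_bad) * woe).sum()

# Illustrative pre-binned characteristics.
for col in ["age_band", "income_band", "residential_status"]:
    iv_kgb = information_value(apps[apps["accepted"] == 1], col, "kgb_bad")
    iv_agb = information_value(apps, col, "agb_bad")
    print(f"{col}: IV Known G/B = {iv_kgb:.3f} | IV All G/B = {iv_agb:.3f}")
```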

WoE patterns for potential scorecard characteristics 

[Figure: WoE patterns for scorecard characteristic Age]

Are the WoE patterns consistent across the Known Good/Bad, Accept/Reject, All Good/Bad and Assigned Good/Bad?  If not, can they be explained?

If the Known Good/Bad pattern is counter-intuitive (for example, cherry picking and WoE patterns reversing compared to Accept/Reject pattern) then this report helps understand if the Rejects have been inferred in an understandable and acceptable way.
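
A minimal sketch of a per-bin WoE comparison, reusing the assumed `apps` frame above. Accept/Reject is treated here as its own good/bad definition (reject counted as the “bad” outcome), and Assigned Good/Bad is taken to mean the inferred outcomes assigned to the previous rejects; both are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def woe_by_bin(df, bin_col, bad_col):
    """Per-bin WoE = ln(% of goods in bin / % of bads in bin)."""
    grp = df.groupby(bin_col)[bad_col].agg(bads="sum", total="count")
    goods = grp["total"] - grp["bads"]
    return np.log((goods / goods.sum()) / (grp["bads"] / grp["bads"].sum()))

patterns = pd.DataFrame({
    # Known Good/Bad: observed outcomes on previous accepts only.
    "Known G/B": woe_by_bin(apps[apps["accepted"] == 1], "age_band", "kgb_bad"),
    # Accept/Reject: reject treated as the "bad" outcome for the whole population.
    "Accept/Reject": woe_by_bin(apps.assign(rejected=1 - apps["accepted"]), "age_band", "rejected"),
    # All Good/Bad: observed plus inferred outcomes on the full population.
    "All G/B": woe_by_bin(apps, "age_band", "agb_bad"),
    # Assigned Good/Bad: the inferred outcomes assigned to previous rejects.
    "Assigned G/B": woe_by_bin(apps[apps["accepted"] == 0], "age_band", "agb_bad"),
})
print(patterns.round(3))  # do the four patterns tell a consistent, explainable story?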

Accept Reject Swapset (maintaining previous accept rate)  

[Figure: Accept Reject swapset table]

Are you comfortable with the size of the swapsets if the Accept rate were to be maintained? In the example above, 20% of all decisions would change (previous accepts being rejected, or vice versa). You can drill down into the profiles of the swap-in and swap-out groups to understand them better and determine whether they are palatable for the business.
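
A minimal sketch of a swapset check at a maintained accept rate, again on the assumed `apps` frame, with a hypothetical `new_score` column from the new model (higher score meaning lower risk):

```python
import pandas as pd

# Maintain the historic accept rate: accept the same share of applicants,
# ranked by the new model's score.
accept_rate = apps["accepted"].mean()
cutoff = apps["new_score"].quantile(1 - accept_rate)
apps["new_accept"] = (apps["new_score"] >= cutoff).astype(int)

swapset = pd.crosstab(apps["accepted"], apps["new_accept"],
                      rownames=["previous decision"], colnames=["new decision"],
                      normalize=True)
print(swapset.round(3))
# The off-diagonal cells are the swap-out (1 -> 0) and swap-in (0 -> 1) groups;
# their combined share is the proportion of decisions that would change.
```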

Accept Rates for potential scorecard characteristics

[Figure: Accept Rates for potential scorecard characteristic Age]

Do the projected accept rates for your characteristic groups make sense when compared to the historic accept rates? Which groups will you accept more of, and which fewer? Have they moved in a sensible direction? Is the size of the change acceptable, or is it too large or too small?
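
A minimal sketch comparing historic and projected accept rates by characteristic group, reusing the hypothetical `new_accept` flag from the swapset sketch above:

```python
rates = apps.groupby("age_band").agg(
    historic_accept_rate=("accepted", "mean"),
    projected_accept_rate=("new_accept", "mean"),
    volume=("accepted", "size"),
)
rates["change"] = rates["projected_accept_rate"] - rates["historic_accept_rate"]
print(rates.round(3))  # are the shifts in a sensible direction and of a sensible size?
```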

Log(Odds) vs. Score fit for the previous Accepts and Rejects

[Figure: Log(Odds) vs. Score fit for the previous Accepts and Rejects]

Is the relationship similar for the previously Accepted and Rejected populations? Typically, we like to see that the new model, developed on the total through-the-door population, is not unduly impacted by the previous accept/reject decision, so that these two sub-populations have aligned score-to-odds relationships.
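
A minimal sketch of the log(odds)-to-score check on the assumed `apps` frame, using the hypothetical `new_score` column and the `agb_bad` flag (observed for previous accepts, inferred for previous rejects):

```python
import numpy as np
import pandas as pd

# Fit a straight line to log(odds) against mean score within score deciles,
# separately for previous accepts and previous rejects.
for label, subset in [("previous accepts", apps[apps["accepted"] == 1]),
                      ("previous rejects", apps[apps["accepted"] == 0])]:
    bands = pd.qcut(subset["new_score"], 10, duplicates="drop")  # decile score bands
    grp = subset.groupby(bands)
    bads = grp["agb_bad"].sum()
    goods = grp["agb_bad"].count() - bads
    log_odds = np.log((goods + 0.5) / (bads + 0.5))              # 0.5 smoothing
    slope, intercept = np.polyfit(grp["new_score"].mean(), log_odds, 1)
    print(f"{label}: log(odds) ~ {slope:.4f} * score + {intercept:.2f}")
# Broadly similar slopes and intercepts suggest the fit is not unduly shaped by
# the previous accept/reject decision.
```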

I have kept these example evaluation reports relatively simple to explain the core concepts.  It is certainly possible to delve deeper into these evaluations. Further checks can explore:

  • Model stability:
    If inference has improved your model, it should also perform more consistently across time and population segments.

  • Reaccept tests:
    If possible, re-score some previously rejected applications and observe subsequent outcomes. Do inferred “goods” and “bads” behave as expected?

  • Validation overlays:
    Use hold-out or shadow portfolios (e.g. partial approvals) to see if the inferred behaviour matches observed performance.

  • Economic reasonableness:
    Does the inferred model make intuitive sense to credit risk experts? Models that only look good statistically rarely survive real-world scrutiny.

Final Thought

Reject inference will never be perfect, but done thoughtfully it helps make credit models more representative, fairer, and more robust. The real skill lies not in finding the cleverest algorithm, but in engineering the process carefully, testing assumptions, and interpreting results with a balance of science and credit judgment.

At Paragon, we’ve built these principles into our Modeller platform. Its dedicated Reject Inference module provides flexibility, transparency, and full tracking of iterations - helping teams turn uncertainty into structured insight. If you’d like to learn more about how Modeller supports reject inference and end-to-end credit model development, we’d love to talk. Please get in touch.
