The first real argument about scores in a fast-growing lender rarely happens in a model-validation meeting.
It usually happens in a quarterly portfolio review that was meant to be routine.
It’s late in the evening at an NBFC that has grown aggressively through digital channels.
On the screen is a slide titled, in neutral language:
“Scorecard Performance – New-to-Credit PL, Q1–Q3”.
The Head of Analytics walks through the highlights:
· Gini still above 0.55 on the main bureau-plus-income model
· Approval rates stable at ~32%
· Early loss numbers “well within plan”
A product head leans back and says what many are thinking:
“This confirms what we’ve been saying – our scores are solid.
If there was a real risk problem, it would already be in the numbers.”
A few pages later, a less comfortable slide appears.
Vintage curves for a specific pocket:
· Customers from a set of Tier-3 districts
· Sourced through one digital partner
· Ticket size between ₹75,000 and ₹1,25,000
· Score range 690–720, all “green” by policy
The 6–9 month loss curve is visibly higher.
Collections has added a short comment box:
“Score good, early behaviour clean.
Spike in skips and roll-forwards from month 7 onwards.
Heavy top-up usage through other lenders based on our early performance.”
The room goes quiet for a few seconds.
The CRO asks:
“Remind me – this pocket is approved fully on system score? No manual touch?”
The product head replies:
“Yes, that’s the whole point of the digital flow.
Instant approval in under 30 seconds.
If we slow that down, we lose the funnel.”
Everyone knows this argument.
The number that nobody quite says aloud is the proportion of the book now being decided by those “solid” scores:
· In this portfolio, 78% of decisions in the last six months were fully automated.
· Human eyes see only edge cases and exceptions.
· Everything else is whatever the score and the rule engine say.
The belief in the room is still simple:
“Our scores are well-built and monitored.
If something fundamental was wrong, it would show up across the portfolio, not just in a few pockets.”
That belief is what keeps growth moving.
It’s also what blinds people to what scores can’t see.
If you strip away the jargon, the working assumption in many NBFCs and digital lenders sounds like a line from that room:
“Our scores are solid.
They’ve been back-tested.
Ginis are good.
Reject rates look reasonable.
If there was a structural risk issue, it would already show up in the numbers.”
You hear versions of it in different places.
In a digital-lending product review:
“We are not just lending on gut.
We have a proper bureau-plus-income score, clear cut-offs by segment, challenger models in the background.
Compared to older NBFC practice, this is a huge step up.”
In a funding discussion with investors:
“Our book is driven by data-led underwriting.
Scores, not branches.
We can tune approval rates and average ticket size quickly if we see any stress.”
In a collections review:
“We’re already using the same scores to prioritise buckets.
Higher-risk bands get earlier attention.
So score-driven decisions are not just at the front; we use them end-to-end.”
Underneath sits a simple comfort:
· If scores are calibrated and monitored at portfolio level, they will signal trouble early enough.
· If trouble hasn’t appeared in headline metrics yet, the use of scores is not the problem.
That feels reasonable, especially for NBFCs and digital lenders that have grown away from manual branch-led underwriting and want to show they are “more scientific” than the last cycle.
The uncomfortable part is this:
Scores are not just predicting risk anymore.
They are trading visibility for speed at scale.
And the places where that trade goes wrong are rarely where model Ginis and portfolio averages are looking.
If you follow how a typical NBFC or digital lender really uses scores, you don’t see a clean underwriting tool.
You see a decision engine that has slowly expanded its reach.
A few patterns repeat.
Originally, scores sat at the back:
· Application arrives.
· Data is collected.
· Score is calculated.
· Decision is made.
In many digital flows now, scores sit at the front:
· Marketing uses bureau-cut lists to build “likely to qualify” audiences.
· Pre-login screens and app banners are shown only to customers above certain score ranges.
· Some NBFCs pre-generate “pre-approved” lines in the background, long before an explicit application.
The effect:
· Scores decide who even feels like they have access.
· For a large part of the target universe, the decision is over before it formally begins.
No policy document says:
“We will use scores to decide who feels invited to participate in formal credit.”
Practically, that’s what happens.
When loss numbers look fine, nobody questions this front-end filtration.
But it shapes the book in ways that standard risk reports don’t describe.
For NBFCs and digital lenders, especially in personal loans and cards, a borrower’s life now looks like a series of score episodes:
· Initial approval: bureau-plus-internal score > cut-off → approve.
· 6 months later: refreshed score + performance → offer top-up.
· 12 months later: new score + usage pattern → increase line, cross-sell, or freeze.
· Under stress: updated score → decide early restructuring vs firmer collection.
At each point, the score is combined with a few rules:
· Minimum income.
· Maximum exposure.
· Basic KYC checks.
· Sometimes device or app-level signals.
But the main comfort is still:
“The customer is in this score band.
This is our policy for that band.”
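To make that comfort concrete, here is a minimal sketch of the band-plus-rules pattern as it tends to live inside a rule engine. Every threshold, band boundary and field name below is invented for illustration, not any specific lender's policy:

```python
# Hypothetical band-plus-rules decision step. All numbers and field
# names are illustrative, not a real policy.

POLICY_BY_BAND = {
    "A": {"min_score": 740, "max_exposure": 500_000, "action": "approve"},
    "B": {"min_score": 700, "max_exposure": 300_000, "action": "approve"},
    "C": {"min_score": 660, "max_exposure": 150_000, "action": "refer"},
}

def decide(app: dict) -> str:
    """Return approve / refer / decline for one application."""
    # Hard rules first: income floor, KYC check.
    if app["monthly_income"] < 25_000 or not app["kyc_passed"]:
        return "decline"
    # Then the score band does almost all of the remaining work.
    for band in ("A", "B", "C"):
        policy = POLICY_BY_BAND[band]
        if (app["score"] >= policy["min_score"]
                and app["exposure"] <= policy["max_exposure"]):
            return policy["action"]
    return "decline"

print(decide({"score": 715, "monthly_income": 60_000,
              "kyc_passed": True, "exposure": 100_000}))  # approve
```

Notice what the sketch makes explicit: once the hard rules pass, the band alone decides, and nothing in the function knows about channel, geography or file thickness.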
The risk that creeps in is not that the score suddenly becomes wrong.
It’s that:
· The context around the score changes faster than the monitoring is set up to see.
· The same score value is treated as equivalent across segments, geographies, and acquisition channels with very different behaviour.
So “720, strong bureau history, salaried in a metro, sourced direct” and “720, thin file, aggressive top-ups, sourced via an affiliate link in a Tier-3 town” get treated with the same confidence – until you plot their 9-month roll-rate curves side by side.
The score is the same.
The story is not.
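Seeing that difference takes only a few lines once loan-level data carries a segment tag. A sketch in pandas, with toy data and hypothetical column names:

```python
import pandas as pd

# Toy loan-level data: rolled_9m = 1 if the loan rolled into a worse
# bucket within nine months on book. Column names are illustrative.
loans = pd.DataFrame({
    "score":     [715, 712, 718, 716, 711, 719],
    "segment":   ["metro_direct"] * 3 + ["tier3_affiliate"] * 3,
    "rolled_9m": [0, 0, 1, 1, 1, 0],
})

# Same score band, split by origin story.
band = loans[loans["score"].between(690, 720)]
print(band.groupby("segment")["rolled_9m"].mean())
```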
As NBFCs and digital lenders mature, scores quietly become commercial knobs:
· Marketing likes them because they can show investors and partners a neat grid of approval rates by score band.
· Risk likes them because they can show loss rates by decile and comfort themselves that the curve is monotonic.
· Business likes them because they can run pricing experiments by score bucket and show incremental ROA gains.
Soon, you see slides like:
· “If we lower the cut-off from 710 to 700 in this channel, we gain X bps of approval for Y bps of loss.”
· “If we offer +200 bps pricing to this band, we can afford a slightly higher bad rate.”
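The arithmetic behind those slides is deliberately simple, which is part of their appeal. A toy version, with every number invented:

```python
# Toy cut-off economics; all inputs invented for illustration.
applicants        = 100_000
marginal_share    = 0.02     # share scoring between 700 and 710
marginal_bad_rate = 0.06     # assumed bad rate in that slice
avg_ticket        = 100_000  # rupees

extra_approvals = applicants * marginal_share
expected_loss   = extra_approvals * marginal_bad_rate * avg_ticket

print(f"Extra approvals: {extra_approvals:,.0f}")     # 2,000
print(f"Expected extra loss: ₹{expected_loss:,.0f}")  # ₹12,000,000
```

What the slide rarely shows is whether that assumed bad rate still holds for the channels and geographies that dominate the marginal slice.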
On paper, this is rational.
In practice:
· The score becomes a single proxy for risk in more and more commercial decisions.
· Other signals that don’t show up cleanly in the score – channel quality, documentation corners being cut, early customer-service interactions – are treated as secondary, even when they are early warnings.
The model deck may have ten pages of caveats:
· “Not validated beyond X ticket size.”
· “Not designed for self-employed thin-file in Y states.”
· “Performance for leads from certain digital partners not fully stable.”
Those caveats rarely make it to the actual pricing and campaign decks.
Collections and service teams also increasingly use scores as triage tools:
· High scores with temporary dips in behaviour get gentler treatment and more options.
· Low scores with similar behaviour get firmer scripts and earlier escalation.
· Some digital lenders decide who to call manually versus who to leave on automated reminders based partly on scores.
Again, there is nothing inherently wrong with this.
The subtle shift is that:
· Score becomes a moral comfort as well as a risk signal.
· Teams feel justified in leaning harder on someone with a low score – “they were always riskier” – even when the triggering behaviour is the same as a high-score customer’s.
You can see this in collection dashboards that show:
· “Cure rates by score band”
· “Call intensity by band”
· “Settlement rates by band”
but almost never show:
· “Differences in hardship options communicated by band”
· “Differences in complaints and escalation by band”
The same number is doing more work than anybody admits.
If scores are being stretched this way, why don’t more NBFC and digital leaders see it sooner?
Because at the start, scores are very good at one job – and that hides how they are being pulled into other jobs.
Early on, headline metrics cooperate:
· Overall loss rates stay within planned ranges.
· Approval rates look healthy.
· Vintage curves behave as expected for the first few cohorts.
Standard monitoring shows comforting charts:
· Risk by score decile: neatly increasing.
· Approvals by band: sensible tapering.
· Bad rate per decile: straight out of a textbook.
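Charts like these take only a few lines over the scored book, which is part of why they dominate monitoring packs. A sketch on simulated data, with hypothetical names, just to show the shape of the comfort:

```python
import numpy as np
import pandas as pd

# Simulated scored book: bad rate falls smoothly with score by construction.
rng = np.random.default_rng(0)
book = pd.DataFrame({"score": rng.normal(700, 40, 10_000)})
book["bad"] = rng.random(10_000) < 1 / (1 + np.exp((book["score"] - 650) / 25))

book["decile"] = pd.qcut(book["score"], 10, labels=False)
print(book.groupby("decile")["bad"].mean())  # neatly decreasing: the textbook chart
```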
What those charts don’t show is where scores are genuinely weak:
· Thin-file borrowers whose bureau score is driven by very few trades.
· Segments where reported credit behaviour lags real leverage because of fast-moving BNPL or informal borrowing.
· Micro-markets where documentation and KYC are technically compliant but social reality is different from training data.
Because those pockets are small at first, their pain is averaged out.
By the time their impact is visible in portfolio-level numbers, they are already baked into several vintages.
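The averaging is worth making concrete with a toy calculation (all numbers invented):

```python
# Invented numbers, for illustration only.
pocket_share    = 0.05  # the weak pocket is 5% of the book
pocket_bad_rate = 0.09  # three times the rest of the book
rest_bad_rate   = 0.03

blended = pocket_share * pocket_bad_rate + (1 - pocket_share) * rest_bad_rate
print(f"Portfolio bad rate: {blended:.2%}")  # 3.30% vs a 3.00% baseline
# A pocket running at 3x losses moves the headline by 30 bps,
# which is easy to read as noise for several quarters.
```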
Model-validation documents and monitoring packs are full of technical detail:
· Population stability.
· Score distribution shifts.
· Override rates.
· Roll-rate performance.
Very few include delivery artefacts like:
· Screenshots of the actual loan-journey UI where the score drives instant approval or rejection.
· Excerpts from channel training decks that show what sales teams are being told about who is “easy approval”.
· Snippets from collections scripts that show how agents are encouraged to think about high-score vs low-score customers.
So committees sign off on a model in abstraction.
They do not see that, in the app, the same score is:
· A green “Approved in seconds” badge.
· A basis for more aggressive upsell.
· A justification for different treatment under stress.
The conversation stays about “the model”, not about the lives built around the model.
When NBFCs and digital lenders are in high-growth mode:
· New disbursements are rising month-on-month.
· Acquisition cost per account is falling through better digital and partner funnels.
· Fresh vintages haven’t had time to show their full loss curve.
In that environment, it is easy to believe:
“Scores are doing their job; look at our scaled book and controlled losses.”
What is harder to see:
· Where growth itself is changing the input population in ways the score was not built for.
· Where the score is now being used to rationalise moves that are more about sales pressure than risk.
By the time growth slows and loss curves fatten, the score will be blamed.
The underlying misuse rarely gets a line of its own.
The NBFCs and digital lenders that seem less surprised by score-related trouble are not necessarily better modellers.
They are more deliberate in how they let scores into the room.
Instead of letting each team pull scores into their own local decisions, they force a single view of where score is allowed to play.
In one lender, the CRO asked for a simple document:
· “List every place where any score – bureau-based or internal – directly changes a customer outcome.”
The list included:
· Underwriting cut-offs and pricing grids.
· Pre-approval list generation for campaigns.
· Rules for line increases and temporary credit boosts.
· Collections segmentation and script assignment.
· Eligibility criteria for hardship and restructuring routes.
Seeing all of these on one page did two things:
· It made it obvious that the same customer might be touched by a score in four or five different contexts with no joined-up thinking.
· It raised the question:
“Are we comfortable with score being this central to so many different, unrelated decisions?”
They didn’t rip anything out.
They did mark a few uses as “to be reviewed” and removed score from a couple of places where it had slipped in by habit.
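That inventory works best as a living register rather than a one-off slide. A minimal sketch of what it might look like in code, with all entries and statuses hypothetical:

```python
# Hypothetical register of every place a score changes a customer outcome.
SCORE_USES = [
    {"area": "underwriting", "use": "cut-offs and pricing grids",           "status": "approved"},
    {"area": "marketing",    "use": "pre-approval list generation",         "status": "approved"},
    {"area": "portfolio",    "use": "line increases and credit boosts",     "status": "to_review"},
    {"area": "collections",  "use": "segmentation and script assignment",   "status": "to_review"},
    {"area": "hardship",     "use": "eligibility for restructuring routes", "status": "approved"},
]

for entry in SCORE_USES:
    if entry["status"] == "to_review":
        print(f"REVIEW: {entry['area']} -> {entry['use']}")
```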
Rather than only tracking Gini and bad rates by decile, more cautious teams:
· Break performance down by channel, geography and customer type for the same score ranges.
· Look explicitly at thin-file, new-to-credit, self-employed and certain PIN codes inside each band.
Their question is not:
“Does the model work overall?”
It is:
“Where does the model give us most comfort and least comfort, and are we letting that shape decisions properly?”
This shows up in small operational rules:
· Manual review is reintroduced for specific combinations, even if the score is high.
· Instant-approval flows are restricted for certain thin-file pockets; a 30–60 second delay with an extra check is accepted as a cost.
· For some channels, the nominal cut-off is lifted by 10–20 points until more experience accumulates.
On paper, these moves look like friction.
On the ground, they are just an admission that score is not a god, especially at the edges.
Another signal they track is not a model metric at all:
· “What percentage of disbursals in the last quarter were done with no human eyes on the case?”
And then:
· “What percentage of those fully automated decisions were in segments we understand well, vs segments where our data is thin?”
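Both numbers fall out of the decision log in a few lines, provided each decision carries a human-touch flag and some marker of segment familiarity. A sketch with toy data and hypothetical field names:

```python
import pandas as pd

# Hypothetical decision log for one quarter.
decisions = pd.DataFrame({
    "disbursed":     [True, True, True, True, False],
    "human_touched": [False, False, True, False, False],
    "segment_known": [True, False, True, False, True],  # data-rich segment?
})

auto = decisions[decisions["disbursed"] & ~decisions["human_touched"]]
print(f"Fully automated share of disbursals: {len(auto) / decisions['disbursed'].sum():.0%}")
print(f"...of which in thin-data segments:   {(~auto['segment_known']).mean():.0%}")
```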
When that second number starts to creep up, they see it as a risk signal, not as a badge of modernity.
It leads to decisions like:
· Capping growth in certain digital funnels until manual samples catch up.
· Forcing a periodic “borderline case review” – 50 auto-approved and 50 auto-declined cases around the cut-off, read manually by senior credit staff each month.
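The monthly draw itself is easy to operationalise. A sketch, assuming a decision log with hypothetical column names:

```python
import pandas as pd

def borderline_sample(decisions: pd.DataFrame, cutoff: int,
                      width: int = 10, n: int = 50, seed: int = 0) -> pd.DataFrame:
    """Draw up to n auto-approved and n auto-declined cases near the
    cut-off for manual reading by senior credit staff."""
    near = decisions[decisions["score"].between(cutoff - width, cutoff + width)]
    parts = []
    for outcome in ("approve", "decline"):
        pool = near[near["decision"] == outcome]
        parts.append(pool.sample(min(n, len(pool)), random_state=seed))
    return pd.concat(parts)

# Usage (hypothetical table): review_pack = borderline_sample(q3_decisions, cutoff=710)
```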
The point of those reviews is not to second-guess the model.
It is to keep a human sense of whether the patterns still feel right.
More experienced teams think about scores in the full customer lifetime curve, not just at the point of decision.
They look at:
· “How often does a high-score customer go on to over-leverage through top-ups, both with us and others?”
· “Do some bands generate good first-year performance but poor behaviour in years two and three?”
· “Are we using score to justify too many top-up and refinance offers in the first 12–18 months?”
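Each of those questions maps to a simple cohort cut once top-up counts and later-vintage outcomes are tracked per account. One sketch, with toy data and hypothetical column names:

```python
import pandas as pd

# Toy account-level view: early top-up behaviour vs later outcomes.
accounts = pd.DataFrame({
    "score_band": ["high", "high", "high", "mid", "mid", "mid"],
    "topups_12m": [0, 2, 3, 1, 0, 2],
    "bad_by_m24": [0, 1, 1, 0, 0, 1],
})

early_leveraged = (accounts["topups_12m"] >= 2).rename("early_leveraged")
print(accounts.groupby(["score_band", early_leveraged])["bad_by_m24"].mean())
```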
Where they see issues, they change how much they lean on scores for repeat lending.
They accept that:
· A score that was very good at picking safe entry points may be less good at deciding how much more to offer, how soon, and to whom.
And they reflect that in:
· Cooling-off periods.
· Limits on early top-ups for some segments.
· Extra checks before cross-selling into already leveraged pockets.
Not because the score is wrong.
Because the score is not the whole story of resilience.
It is tempting, especially in NBFC and digital environments, to keep the story simple:
“We moved from gut to scores.
Our models are validated.
We monitor Gini and bad rates.
If there was a fundamental risk issue, the numbers would already show it.”
If you stay with that, scores will increasingly become:
· The main comfort you show investors.
· The main tool you give product teams.
· The main filter collections uses to triage.
And the next time something cracks, the model will be the villain of the month.
A different way to hold scores is to accept that:
· They are now woven into who you invite, how you serve, when you push, and when you help.
· They are very good at some things and much weaker at others.
· Their weak spots often sit exactly where NBFCs and digital lenders are most tempted to grow fastest.
From that angle, the useful question before the next proud slide that says “our scores are solid” is not:
“Are our models calibrated and monitored?”
It is:
“If we take one customer and trace every decision where a score changed their path –
from the first app notification they saw, through approvals, top-ups, collections calls and hardship options –
would we still feel that we used that score as a tool, or did we let it quietly run more of their financial life than we ever intended?”
Most lenders haven’t traced that story yet.
The ones who have still use scores.
They’ve just stopped pretending that a good Gini means the hard thinking is done.