  • February 13, 2026
  • Arth Data Solutions

Data Quality Challenges (The Risk We Only See Sideways)

The first sign that data quality is not what you think it is rarely appears with a big red flag.

It usually shows up as a side comment in a meeting that was meant to be about something else.

A collections review.

An NPA drill-down.

A credit bureau inspection response.

On the projector: a familiar table.

“Vintage performance by product – bureau-based view vs internal DPD.”

The numbers don’t quite line up.

Someone from analytics says:

“We should note that there are minor mismatches between internal and bureau DPD for these buckets. Nothing material – mostly due to reporting timing and field mapping.”

A business head replies:

“As long as it doesn’t change the story, let’s not get stuck in technicalities. We have RBI checks and bureau validations anyway, right?”

The room nods, and moves on.

No one asks the harder version of that question:

“If our own systems and the bureau disagree on when a customer was actually late, how sure are we that the rest of our data is as clean as our decks assume?”

You see that version later.

It appears when:

·         An RBI inspection notes inconsistencies between internal records and bureau reporting.

·         A model validation quietly flags unstable behaviour in key variables.

·         A partner bank challenges pool-level data in a co-lending structure.

By then, the language in the room is different:

“We need to understand these data quality issues properly.”

The truth is simpler and less comfortable:

By the time you are discussing “data quality issues”, you’ve been living with them for years.

 

The belief: “Data quality is mostly solved – we have checks and reconciliations”

If you strip away the careful phrasing, the working belief in many institutions sounds like this:

“We’ve invested in core systems, bureau integrations and regulatory reporting.

Files pass bureau checks.

RBI hasn’t raised major exceptions on our credit information.

So data quality is not perfect, but it’s basically under control.”

You hear versions of it in different rooms:

·         In a Board Risk Committee:

“We have data quality controls around reporting and model inputs. Nothing material has surfaced.”

·         In a technology steering discussion:

“Once migration is complete and reconciliations are signed off, data quality will not be a constraint.”

·         In a credit policy review:

“Let’s assume the bureau and internal data are reliable; we need to move on to cut-offs and strategy.”

Underneath is a quiet assumption:

·         If files are accepted by bureaus and we are compliant with RBI timelines, data quality is “good enough” for most purposes.

It feels practical.

It lets leaders focus on:

·         GNPA and write-offs.

·         Growth, pricing, capital.

·         Collections and recovery.

The idea that “data quality” is an IT problem with regulatory guardrails becomes convenient.

If you sit with the teams that actually push data around (operations, reporting, analytics, bureau desks), the picture is more blunt:

Data quality is rarely one big failure.

It is a long list of small, tolerated compromises.

 

How data quality challenges really show up (without anyone naming them)

If you walk through a normal quarter in a mid-sized lender, you’ll see the same patterns.

Not as headlines.

As background behaviour.

1. “Temporary” fixes that become permanent wiring

In one NBFC, a core system upgrade had a known limitation:

·         Certain legacy products did not map cleanly to the new product-code structure.

·         During migration, a decision was taken to club them under a generic code, with a note that “this will be refined later”.

To keep the bureau reporting going:

·         The data team created a mapping file:

– old product codes → temporary generic code → bureau product type.

It lived in a shared folder and was maintained by one senior analyst.
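
In practice, a mapping file like that is just a lookup table chained twice. A minimal sketch, assuming hypothetical codes and product names, of how legacy codes flow through the temporary generic code into a bureau product type, and why the generic code blurs the signal:

```python
# Hypothetical illustration of the "temporary" mapping chain described above.
# All codes, products and bureau types are invented for this sketch.

# Step 1: legacy product codes clubbed under one generic code during migration.
legacy_to_generic = {
    "LN-OLD-01": "GEN-999",   # legacy personal loan
    "LN-OLD-02": "GEN-999",   # legacy two-wheeler loan
    "LN-OLD-03": "GEN-999",   # legacy gold loan
    "LN-NEW-10": "PL-STD",    # post-migration personal loan, mapped cleanly
}

# Step 2: generic / new codes mapped to a bureau product type.
generic_to_bureau = {
    "GEN-999": "Other",            # three different products collapse into "Other"
    "PL-STD": "Personal Loan",
}

def bureau_product_type(legacy_code: str) -> str:
    """Resolve a legacy product code to its bureau product type via the chain."""
    return generic_to_bureau[legacy_to_generic[legacy_code]]

# Three genuinely different products now report the same bureau type,
# so any product-level view built on bureau data is blurred from the start.
for code in ("LN-OLD-01", "LN-OLD-02", "LN-OLD-03", "LN-NEW-10"):
    print(code, "->", bureau_product_type(code))
```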

Over time:

·         New products were added.

·         Some legacy accounts were closed, some restructured.

·         The mapping file was tweaked repeatedly.

Three years later:

·         No one could fully explain why certain accounts showed up under odd product types in bureau reports.

·         Internal analytics and bureau-based analytics gave different splits for the same segment.

·         A junior analyst described the mapping logic as “this is how it’s always been”.

Formally, data quality was “fine”:

·         Files were accepted by bureaus.

·         RBI reporting was up to date.

Practically, one “temporary” compromise had become structural wiring:

·         Product-level risk signals were blurred.

·         Portfolio views by product didn’t match across internal and external sources.

·         Any model using “product type” as an input was learning from a distorted picture.

None of this showed up as “data quality” in a dashboard.

It showed up as quiet confusion whenever someone tried to reconcile numbers across systems.

2. Field-by-field compromises that nobody joins up

In a bank’s retail business, a model-validation exercise flagged an odd pattern:

·         The variable “time since oldest trade line” in bureau data behaved inconsistently across applications.

·         For some segments, it looked surprisingly young, even when internal relationships were long.

When the team traced it back, they found:

·         During integration, the “opened date” from one bureau had been mapped incorrectly for some account types.

·         For older accounts migrated from a legacy system, a “system start date” had been used instead of actual origination, because historical dates were messy.

Individually, each choice had seemed harmless:

·         “We’ll use the date we are sure of, even if it’s not perfect.”

·         “Older dates are messy; let’s start from the migration date.”

Across thousands of customers, those choices meant:

·         The system quietly understated the length of customer credit history for certain segments.

·         Models and rules treated long-standing customers like newer borrowers.

·         Risk views on “seasoned” vs “new-to-credit” were blurred.

Nobody wrote a document that said:

“We are comfortable compromising on the accuracy of customer credit history.”

They wrote:

·         “Given data constraints, we will use the most reliable available field.”

Data quality was degraded by a series of reasonable decisions.
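
To make the distortion concrete, here is a rough, hypothetical illustration of what substituting a migration date for the real origination date does to "time since oldest trade line"; the dates are invented:

```python
from datetime import date

# Hypothetical customer: the relationship actually began in 2012,
# but the account was migrated to the new system in 2019.
actual_opened_date = date(2012, 6, 1)    # true origination of the oldest trade line
system_start_date = date(2019, 4, 1)     # "safe" date used during migration
as_of = date(2025, 12, 31)

def years_between(start: date, end: date) -> float:
    return (end - start).days / 365.25

print(f"True credit history:    {years_between(actual_opened_date, as_of):.1f} years")
print(f"History the model sees: {years_between(system_start_date, as_of):.1f} years")
# A roughly 13-year customer is scored as if they had under 7 years of history.
```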

3. Reconciliations that focus on totals, not truth

In regulatory reporting and bureau submissions, reconciliation meetings usually sound like this:

·         “Total live accounts match.”

·         “Total outstanding matches within tolerance.”

·         “Number of write-offs and NPA accounts are consistent with our books.”

This is necessary.

It is also incomplete.

What doesn’t get the same attention:

·         Are DPD buckets aligned for the same account across systems?

·         Are closure dates and settlement flags consistent?

·         Are there systematic differences between what we report to RBI and what we send to bureaus?

In one institution, when an internal audit team reconciled a small sample of accounts end-to-end, they found:

·         A customer marked 0 DPD internally but showing 30 days overdue at a bureau, due to a reporting delay.

·         An account closed in core but still “open” in bureau data for two months.

·         A structured settlement posted differently in two systems, leading to different “highest ever DPD” histories.

At portfolio level:

·         Totals matched.

·         Ratios looked correct.

At customer level:

·         Stories diverged.

When regulators or partners look closely, they do not only care that totals add up.

They care whose version of the truth you are living with.
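
The gap the audit team found is easy to reproduce on synthetic data: totals can reconcile perfectly while individual accounts tell different stories. A minimal sketch, with invented records, of a reconciliation that checks rows as well as totals:

```python
# Hypothetical end-to-end reconciliation on a tiny synthetic sample.
# Account numbers, fields and values are invented for illustration.

internal = {
    "A001": {"outstanding": 100_000, "dpd": 0,  "status": "OPEN"},
    "A002": {"outstanding": 250_000, "dpd": 30, "status": "OPEN"},
    "A003": {"outstanding": 0,       "dpd": 0,  "status": "CLOSED"},
}
bureau = {
    "A001": {"outstanding": 100_000, "dpd": 30, "status": "OPEN"},   # reporting delay
    "A002": {"outstanding": 250_000, "dpd": 30, "status": "OPEN"},
    "A003": {"outstanding": 0,       "dpd": 0,  "status": "OPEN"},   # closure not yet reflected
}

# Portfolio-level reconciliation: totals match, so the usual meeting moves on.
total_internal = sum(a["outstanding"] for a in internal.values())
total_bureau = sum(a["outstanding"] for a in bureau.values())
print("Totals match:", total_internal == total_bureau)   # True

# Account-level reconciliation: the stories diverge.
for acct in internal:
    for field in ("dpd", "status"):
        if internal[acct][field] != bureau[acct][field]:
            print(f"Mismatch on {acct}.{field}: "
                  f"internal={internal[acct][field]!r}, bureau={bureau[acct][field]!r}")
```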

 

Why senior dashboards rarely show “data quality” as a problem

If these issues are so common, why doesn’t data quality show up as a top risk?

Very simply: because most metrics are not designed to expose it.

Dashboards measure volume and timeliness, not correctness

Typical reporting around data looks like:

·         “Bureau reporting files sent on time in X of Y cycles.”

·         “RBI returns filed with 0 major rejections.”

·         “No critical data incidents in the last quarter.”

These are important.

But they measure:

·         Did we send something on time?

·         Did it pass basic format and range checks?

They do not measure:

·         Was what we sent actually correct at customer level?

·         Are our internal and external views of the same customer consistent?

A portfolio chart is happy if:

·         All accounts have a DPD bucket.

It does not know whether:

·         Those buckets are anchored to the right events.
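
That gap is structural: format and range checks can be answered from the file alone, while correctness needs an independent reference the dashboard never sees. A small sketch, on one invented record, of why a file-level check passes even when the bucket is anchored to the wrong event:

```python
# Hypothetical file-level check vs truth-level check on one invented record.

VALID_DPD_BUCKETS = {"0", "1-30", "31-60", "61-90", "90+"}
reported_record = {"account": "A007", "dpd_bucket": "1-30"}

# File-level check: is a bucket present, and is it a valid value?
# This can be answered from the file alone, so dashboards can report it.
print("File-level check passed:",
      reported_record["dpd_bucket"] in VALID_DPD_BUCKETS)   # True

# Truth-level check: is the bucket anchored to the right event?
# Answering this needs an independent reference (here, a hypothetical ledger),
# which the reporting pipeline never sees.
ledger_days_past_due = 45   # invented: the customer is actually 45 days late

def bucket_for(days: int) -> str:
    if days == 0:
        return "0"
    if days <= 30:
        return "1-30"
    if days <= 60:
        return "31-60"
    if days <= 90:
        return "61-90"
    return "90+"

print("Truth-level check passed:",
      bucket_for(ledger_days_past_due) == reported_record["dpd_bucket"])   # False
```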

Ownership of “data quality” is fragmented

Ask “who owns data quality?” and you get different answers:

·         IT will say they own systems availability and integrity of pipelines.

·         Operations will say they own correct posting and account maintenance.

·         Risk will say they own model input quality and portfolio views.

·         Compliance will say they own regulatory reporting accuracy.

All are correct in slices.

None fully owns:

“For this customer, on this date, across our systems and the bureaus, do we tell the same story?”

So data quality issues live in:

·         Jira tickets,

·         Email threads,

·         Local Excel reconciliations,

·         Conversations between two teams trying to make a file pass.

They rarely surface as a single, coherent concern.

Regulatory comfort is mistaken for data comfort

When RBI returns are accepted without major exceptions, and bureaus don’t raise large-scale rejection flags, the internal translation becomes:

“Our data is clean enough. If there was a serious problem, someone would have told us.”

That is a narrow reading.

Regulators and bureaus will absolutely point out glaring issues.

They will not do the slow work of:

·         Telling you how many of your closure dates are off by a month.

·         Explaining that your “time since oldest trade line” is systematically mis-stated for one product.

·         Showing that your internal and bureau DPD diverge in ways that hurt specific segments.

That is your job.

 

How more experienced teams treat data quality (without turning it into a buzzword)

The institutions that handle data quality better rarely have grand slogans.

They do a few specific, repeatable things.

They talk about “data truth” at customer level, not just validation at file level

In one bank, the CRO asked for a simple exercise:

“Give me 50 random customers from our main retail product.

For each one, put on a single page:

– what our core says,

– what our collections system says,

– what we report to RBI,

– what each bureau shows.”

The team treated it as a nuisance at first.

The output was uncomfortable:

·         Some customers showed different closure dates in core and bureau.

·         A few had post-moratorium DPD handled differently internally vs externally.

·         One had been written off internally but still showed as standard in one bureau for weeks.

These were not system failures.

They were normal consequences of the way batches, postings, fixes and files interacted.

The impact of the exercise wasn’t the document.

It was the change in language afterwards:

·         “File passed checks” quietly gave way to

·         “For this category, we are still not aligned on account status between systems.”

Once you start talking about customer-level truth instead of file-level validity, data quality stops being abstract.
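
A minimal sketch of what assembling that single page can look like, assuming hypothetical extracts from each system; the source names, fields and values are all invented:

```python
# Hypothetical per-customer "truth page": what each system says, side by side.

def truth_page(customer_id: str, sources: dict[str, dict]) -> None:
    """Print each system's view of the same customer, field by field."""
    fields = sorted({f for record in sources.values() for f in record})
    print(f"Customer {customer_id}")
    for field in fields:
        values = {name: record.get(field, "-") for name, record in sources.items()}
        flag = "  <-- disagreement" if len(set(values.values())) > 1 else ""
        print(f"  {field:13} {values}{flag}")

truth_page("CUST-0042", {
    "core":        {"status": "CLOSED", "closure_date": "2025-03-31", "dpd": 0},
    "collections": {"status": "CLOSED", "closure_date": "2025-03-31", "dpd": 0},
    "rbi_return":  {"status": "CLOSED", "closure_date": "2025-03-31", "dpd": 0},
    "bureau_1":    {"status": "OPEN",   "closure_date": "-",          "dpd": 0},
})
```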

They pick one or two critical fields and get serious about them

Rather than try to “improve data quality” everywhere, they choose a small set of fields where truth really matters:

·         DPD and status codes.

·         Origination and closure dates.

·         Write-off / settlement flags.

·         Product type.

Then they ask practical questions:

·         “For these fields, how many systems are we using them in?”

·         “Where is the single source of truth?”

·         “If two systems disagree, which one wins, and how do we correct the other?”

In one NBFC, this led to a small but important change:

·         The bureau reporting team, the RBI reporting team, and the collections MIS team agreed on one shared definition of DPD and write-off for their main unsecured book.

·         They created a basic cross-check: a monthly sample where internal DPD, bureau DPD, and reported DPD were compared for the same accounts.

No new platform.

Just a consistent expectation and a recurring sampling exercise.
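
A minimal sketch of that recurring sampling exercise, assuming hypothetical extracts keyed by account number; the field values and sample size are invented:

```python
import random

# Hypothetical monthly cross-check: compare internal, bureau and reported DPD
# for a random sample of accounts present in all three extracts.

def monthly_dpd_crosscheck(internal: dict, bureau: dict, reported: dict,
                           sample_size: int = 200, seed: int = 0) -> float:
    """Return the share of sampled accounts where the three DPD values disagree."""
    rng = random.Random(seed)
    common = sorted(set(internal) & set(bureau) & set(reported))
    sample = rng.sample(common, min(sample_size, len(common)))
    mismatched = 0
    for acct in sample:
        if len({internal[acct], bureau[acct], reported[acct]}) > 1:
            mismatched += 1
            print(f"{acct}: internal={internal[acct]}, "
                  f"bureau={bureau[acct]}, reported={reported[acct]}")
    return mismatched / len(sample)

# Tiny invented extracts, keyed by account number; values are DPD in days.
internal_dpd = {"A001": 0,  "A002": 30, "A003": 0, "A004": 60}
bureau_dpd   = {"A001": 30, "A002": 30, "A003": 0, "A004": 60}
reported_dpd = {"A001": 0,  "A002": 30, "A003": 0, "A004": 90}

rate = monthly_dpd_crosscheck(internal_dpd, bureau_dpd, reported_dpd, sample_size=4)
print(f"Mismatch rate in sample: {rate:.0%}")
```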

Over a year, noisy arguments about “whose number is right” became less frequent.

Not to zero, but enough that people could focus on decisions.

They treat “data quality” as a behaviour, not a project

You sometimes see big programmes with names that sound impressive.

They often die in their own ambition.

The quieter teams do something else:

·         They put data fields and their definitions into credit committee annexures when new products are launched.

·         They include, in model-validation reports, a short section on input data limitations, written in clear language.

·         They make sure that when an RBI observation or a partner challenge relates to data consistency, it is not filed away as a pure “compliance point”.

In one institution, a simple practice made a difference:

·         Whenever a significant data issue was discovered (e.g., mis-mapped product codes), the post-mortem included:

– “What did we have to believe for this to stay hidden so long?”

– “Who assumed data quality was somebody else’s problem here?”

It wasn’t about blame.

It was about making the assumptions visible.

 

A quieter way to hold “data quality challenges”

It’s tempting to keep the comfortable story:

“We have decent systems.

We pass bureau and RBI checks.

Data quality is not perfect, but it’s not a top risk.

When issues arise, we fix them.”

If you stay with that, data quality will continue to appear in your world as:

·         Occasional “incidents” and reconciliation headaches.

·         A footnote in audit reports.

·         A line item when a regulator comments on specific gaps.

If you accept a more uncomfortable view:

·         That most data quality challenges are not about massive errors, but about small distortions repeated at scale,

·         That “file accepted” and “customer truth aligned” are not the same thing,

·         And that your portfolio and models are learning from whatever mix of truth and compromise you send them,

then the question changes.

It stops being:

“Do we have data quality checks, and are we broadly compliant?”

and becomes:

“For the credit decisions and risk views that matter most to us,

how far is the data we actually use from the customer’s real story –

and how much of that distance are we consciously choosing, versus quietly inheriting?”

You won’t like the first honest answer.

But once you’ve seen it, “data quality challenges” stops being a technical label.

It becomes what it really is:

A description of how much of your risk is self-inflicted, because the stories your systems tell about your customers are not quite the stories you would stand by if you had to read them aloud.