Fairness & RAI · 7 min read

Hidden Redlining: How Zip Codes Become Proxies for Caste in Credit Models

Nodex8 AI Research

AI Research Team

March 5, 2026

AI Snapshot

3 things to know before you read

1. Proxy Bias occurs when a feature that appears neutral (zip code, device type) is statistically correlated with a protected attribute (caste, religion, gender), encoding discrimination without explicit intent.

2. Geographic features are the most common proxy for caste and religion in Indian credit models, with some urban postcodes acting as near-perfect proxies for community composition.

3. Feature Shielding is the technical solution: identify proxy features through correlation analysis, then either remove them or apply statistical correction to neutralise the encoded protected attribute.

What Is Proxy Bias and How Does It Enter Credit Models?

Direct Answer

Proxy Bias occurs when a model learns to use a seemingly neutral feature as a stand-in for a protected attribute. The model never "sees" caste or religion — but it learns that certain zip codes, device types, or shopping patterns correlate with these attributes in the training data, and uses this correlation to make decisions. The result is discrimination that is statistically indistinguishable from direct discrimination — but invisible to standard fairness tests.
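The mechanism can be illustrated with a minimal sketch on synthetic data. Everything here is an assumption made for illustration: the postal code labels `PIN_A` and `PIN_B`, the groups `G1` and `G2`, the counts, and the toy "model" that simply learns the historical repayment rate per postal code. The point is only that a model which never receives the group label can still produce very different approval rates per group.

```python
from collections import Counter

def make_records():
    """Synthetic records: (postal_code, group, repaid_historically).

    'group' stands in for a protected attribute. It is stored only so the
    outcome can be audited afterwards; the model below never reads it.
    """
    records = []
    records += [("PIN_A", "G1", True)] * 72 + [("PIN_A", "G1", False)] * 18
    records += [("PIN_A", "G2", True)] * 8 + [("PIN_A", "G2", False)] * 2
    records += [("PIN_B", "G1", True)] * 3 + [("PIN_B", "G1", False)] * 7
    records += [("PIN_B", "G2", True)] * 27 + [("PIN_B", "G2", False)] * 63
    return records

def fit_postcode_model(records):
    """Learn the historical positive rate per postal code (group is ignored)."""
    totals, positives = Counter(), Counter()
    for postcode, _group, outcome in records:
        totals[postcode] += 1
        positives[postcode] += outcome  # True counts as 1, False as 0
    return {pc: positives[pc] / totals[pc] for pc in totals}

def approval_rate_by_group(records, model, threshold=0.5):
    """Audit step: approval rate per group under the postcode-only model."""
    totals, approved = Counter(), Counter()
    for postcode, group, _outcome in records:
        totals[group] += 1
        approved[group] += model[postcode] >= threshold
    return {g: approved[g] / totals[g] for g in totals}

records = make_records()
model = fit_postcode_model(records)
print(model)                                   # {'PIN_A': 0.8, 'PIN_B': 0.3}
print(approval_rate_by_group(records, model))  # {'G1': 0.9, 'G2': 0.1}
```

In this toy data both groups repay at the same rate within each postal code, yet the postcode-only rule approves 90% of one group and 10% of the other, because group membership and postal code are strongly associated.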

In India, historical patterns of residential segregation mean that geography encodes protected attributes with surprising precision.

**Example: Mumbai zip codes and community composition**

A study of credit applications in Mumbai found that in 23 of the city's postal codes a single religious group accounts for more than 65% of the community. A credit model trained on historical data from these areas will learn that the postal code predicts creditworthiness, but the creditworthiness signal is confounded with the protected attribute signal.

**How proxy features enter models:**

1. **Data inheritance**: Training data reflects historical lending patterns that were themselves discriminatory. The model learns the historical pattern.
2. **Correlation mining**: Modern ML models (gradient boosting, neural networks) are extremely good at finding subtle correlations. A feature that humans would not consider a proxy can still be picked up by the model as predictive.
3. **Interaction effects**: Two individually innocuous features (city + employment type) can interact to create a powerful proxy for a protected attribute.

**The Feature Correlation Matrix:**

Claris's RAI Module computes pairwise associations between all features and all protected attributes (including self-reported and proxy-inferred attributes) as part of the model validation process. Features whose Cramér's V with any protected attribute exceeds 0.15 are flagged for human review.
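A Cramér's V screen with a 0.15 flagging threshold can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Claris's implementation: the function names `cramers_v` and `flag_proxies`, the synthetic columns, and the choice to screen a hand-built combined feature are all hypothetical. It uses the plain (uncorrected) statistic, the square root of chi-squared divided by n times (min(rows, columns) - 1), and it also shows the interaction effect from the list above: neither `city` nor `employment` is associated with the group on its own, but their combination identifies it exactly.

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two categorical sequences."""
    n = len(xs)
    joint, row, col = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    k = min(len(row), len(col)) - 1
    if n == 0 or k == 0:
        return 0.0  # undefined when either variable is constant; treat as 0
    chi2 = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            chi2 += (joint[(x, y)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))

def flag_proxies(features, protected, threshold=0.15):
    """Return {feature_name: V} for features whose V exceeds the threshold."""
    scores = {name: cramers_v(vals, protected) for name, vals in features.items()}
    return {name: v for name, v in scores.items() if v > threshold}

# Synthetic data: 25 applicants for each (city, employment) combination.
# The group depends only on the combination, not on either feature alone.
city, employment, group = [], [], []
for c in ("C1", "C2"):
    for e in ("salaried", "self_employed"):
        g = "G1" if (c == "C1") == (e == "salaried") else "G2"
        city += [c] * 25
        employment += [e] * 25
        group += [g] * 25

features = {
    "city": city,
    "employment": employment,
    # Hypothetical interaction feature: the pair treated as one category.
    "city_x_employment": list(zip(city, employment)),
}
print(flag_proxies(features, group))  # {'city_x_employment': 1.0}
```

Screening single features alone would pass both `city` and `employment` here, which is why a pairwise matrix may need to be extended to feature combinations. Which combinations to test is a design choice this sketch does not address.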
Written by

Nodex8 AI Research

AI Research Team

The Nodex8 AI research team focuses on the global AI governance agenda, policy-to-code maturity worldwide, theoretical and empirical explainable AI research, and technology advancement in the domain.
