Access to credit
The provision of sustainable credit is a fundamental lever to solve some of the most important issues facing the real economy and, through it, society. By channelling capital effectively from investors to borrowers, lending creates jobs, fosters innovation, creates fair opportunities and contributes to reducing inequalities.
But extending small loans to individuals or small businesses, one of the core historical missions of the banking system, is a costly exercise. In their current form, the lending verticals at most retail and commercial banks mobilize many credit officers and back office employees to ensure that credit files are analyzed and priced according to the bank’s standards.
Faced with a more stringent set of constraints imposed by regulators and shareholders since the 2008 Global Financial Crisis (GFC), banks have realized that high Operating Expenses (OpEx) and high capital consumption have made small loans a structurally loss-making business proposition. As a result, the traditional banking system has turned off the credit taps for all but larger corporates and wealthier individuals.
Although large banks remain the first port of call for US small businesses seeking a loan, their approval rates are far lower than those of alternative small business lenders.
In 2015, Goldman Sachs calculated that rate to be 21% for banks vs. 62% for new fintech lenders, characterized by their extensive use of data and advanced analytics throughout the loan underwriting process.
So, can Machine Learning (ML) substantially improve both small borrowers’ access to credit and lenders’ profitability metrics? Our answer is an unequivocal yes.
Equipped with the right underwriting technology, lenders can lend more to the real economy while cutting delinquency rates and reducing OpEx. Equipped with similar data analytics, investors can independently assess credit risk and allocate substantially more capital to the asset class. In short, getting this ML use case right is of major importance to the evolution of the global financial system.
In this post, we lay out key theoretical arguments making ML particularly well suited to the risk scoring of granular credit.
How can ML help with credit scoring?
Credit scoring aims at answering a simple question: what is the probability that this credit will be paid back? This is a typical classification problem.
Banks and credit institutions have traditionally addressed the credit scoring problem using a scorecard approach, feeding credit variables into a static heuristic function to produce one aggregate number (the credit score). In the US, the FICO score has become the overwhelming driver of credit underwriting decisions by traditional banks when it comes to granular credit (consumer loans, student loans, small business loans).
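To make the idea of a static heuristic function concrete, here is a minimal sketch of a scorecard. The variables, cut-offs and point values are invented for illustration; they are not those of FICO or of any real lender. What matters is the structure: each variable contributes a fixed number of points, independently of the others, and the rule never changes unless someone rewrites it.

```python
# Hypothetical scorecard: the variables, cut-offs and points below are
# invented for illustration and do not reproduce FICO or any real lender.
BASE_SCORE = 500


def scorecard_score(on_time_payment_ratio, credit_utilization, years_of_history):
    """Return one aggregate score from fixed, hand-set point rules."""
    score = BASE_SCORE

    # Payment history: reward a high share of on-time payments.
    if on_time_payment_ratio >= 0.98:
        score += 150
    elif on_time_payment_ratio >= 0.90:
        score += 75

    # Credit utilization: penalize heavy use of available credit lines.
    if credit_utilization > 0.80:
        score -= 100
    elif credit_utilization > 0.30:
        score -= 40

    # Length of credit history: fixed bonus per year, capped at 10 years.
    score += 10 * min(years_of_history, 10)
    return score


applicant_score = scorecard_score(
    on_time_payment_ratio=0.95, credit_utilization=0.45, years_of_history=6
)
print(applicant_score)  # 500 + 75 - 40 + 60 = 595
```

Because each rule looks at a single variable in isolation, a scorecard of this kind cannot express that, say, high utilization matters more for a borrower with a short credit history than for one with a long one.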
The FICO score:
FICO (Fair Isaac and COmpany) is a data analytics company founded in 1956, focused on credit scoring services. Its FICO score is the most widely used measure of consumer credit risk in the US.
FICO scores are available through all major consumer reporting agencies in the US, including Equifax, Experian and TransUnion. FICO scores range from 300 (worst) to 850 (best).
FICO scores incorporate numerous variables reflecting a consumer’s credit health across 5 categories:
- Payment history
- Credit utilization
- Length of credit history
- New credit applications
- Credit mix
According to TransUnion, over 170m US consumers had a FICO score at the end of 2017.
But the credit classification problem has 3 important characteristics:
- It is a high-dimensional problem,
- It is a non-linear problem, and
- It should be an evolutive process.
High-dimensional. A borrower’s credit health cannot be regressed on a small number of credit-related variables. Empirical evidence shows that excessively reducing the number of these variables massively oversimplifies the problem.
Non-linear. Within this high-dimensional space, the impact of a given variable on a borrower’s credit strength can be very different depending on the values of the other variables (the coordinates of that borrower in space). Taking a 3-dimensional space as an analogy (for illustration purposes) and classifying loans as ‘good’ (class 0) or ‘bad’ (class 1), one should not expect a flat 2-dimensional plane to separate the two classes, but rather a much higher-order surface.
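The non-linearity argument can be illustrated with a deliberately tiny toy example, reduced to two dimensions so that the separating "plane" becomes a straight line. The four loans and their two binary risk flags are invented for illustration. The loan is 'bad' only when exactly one flag is set, so the effect of each flag depends on the other. The sketch searches a small grid of weights and shows that no linear rule classifies all four loans correctly, while adding a single interaction term does.

```python
from itertools import product

# Toy data (hypothetical): two binary risk flags per loan; label 1 = 'bad',
# 0 = 'good'. A loan is bad only when exactly one flag is set (XOR pattern).
loans = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def accuracy(weights, bias, featurize):
    """Share of loans correctly classified by the rule: bias + w.x > 0 => bad."""
    correct = 0
    for x, label in loans:
        z = bias + sum(w * f for w, f in zip(weights, featurize(x)))
        correct += int((z > 0) == bool(label))
    return correct / len(loans)


def best_accuracy(n_features, featurize):
    """Brute-force search over integer weights and bias in [-3, 3]."""
    grid = range(-3, 4)
    return max(
        accuracy(params[:-1], params[-1], featurize)
        for params in product(grid, repeat=n_features + 1)
    )


# Purely linear rule on the two raw flags.
linear = best_accuracy(2, lambda x: x)

# Same rule with one extra feature: the product of the two flags.
with_interaction = best_accuracy(3, lambda x: (x[0], x[1], x[0] * x[1]))

print(linear, with_interaction)  # 0.75 1.0
```

The grid search is only a stand-in for model fitting. The result does not depend on the grid: no straight line separates an XOR pattern, whatever the weights.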
Evolutive. ML models are forward-looking models capable of adapting their predictions to new information (the learning process). This makes them very relevant to real-life applications. As the pool of originated loans matures, new information becomes available for an ML model to refine its predictions on new loan applications. In general, ML models have far more structural flexibility to adjust to subtle changes in the data than traditional statistical models.
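As a minimal sketch of what learning from maturing loans can look like, the snippet below updates a one-feature logistic model each time a loan outcome becomes known. This is plain online logistic regression with stochastic gradient steps, chosen here for brevity; the class name, the single feature and the learning rate are all hypothetical, and production models would be far richer.

```python
import math


class OnlineLogisticScorer:
    """Hypothetical scorer that is updated one matured loan at a time."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_default_probability(self, features):
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, defaulted):
        # One stochastic gradient step on the log-loss for a single outcome.
        error = self.predict_default_probability(features) - defaulted
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
        self.bias -= self.learning_rate * error


scorer = OnlineLogisticScorer(n_features=1)
new_applicant = [1.0]  # hypothetical applicant with the risk flag set

before = scorer.predict_default_probability(new_applicant)  # untrained: 0.5

# Outcomes arrive as loans mature: flagged loans default, unflagged repay.
for _ in range(50):
    scorer.update([1.0], 1)
    scorer.update([0.0], 0)

after = scorer.predict_default_probability(new_applicant)
print(before, after)  # the estimate for flagged applicants has moved above 0.5
```

A fixed-weight scorecard would return the same score for this applicant before and after the new outcomes were observed.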
The above characteristics, combined with the vast volume of historical loan-level data available within banks and credit institutions, make the granular credit scoring problem a perfect use case for ML to outperform static, fixed-weight heuristics used by most traditional lenders.
In the aftermath of the GFC, a new breed of technology-centered, non-bank lenders started offering an ambitious alternative to the traditional scorecard-based approach. Using technology to automate the collection and pre-processing of vast amounts of borrower data, they placed advanced modeling techniques at the heart of the loan underwriting and pricing processes.
In a detailed research article dated April 2018, the Federal Reserve Bank of Philadelphia (the Regulator) looked into the benefits of alternative data and ML in the provision of credit to US consumers, based on a thorough quantitative analysis of data sets provided by US consumer lending platform Lending Club and Y-14M regulatory filings by large US bank holding companies. The broad conclusions are clear:
- The use of ML and alternative data provides fintech lenders with an edge over traditional FICO-based lenders to price credit risk accurately. At qbridge, we estimate that the use of alternative data approximately doubles (at least) the predictive power of a FICO-based ML classifier.
- This edge is particularly pronounced for higher risk cohorts, allowing fintech lenders to widen access to credit for cohorts of borrowers underserved by traditional banks.
What is alternative data?
Traditional credit analysis is based on historical, borrower-level data directly related to credit. But in the digital age, the value of data not directly related to credit performance is becoming impossible to ignore. Patterns of consumption and payment schemes, now recorded and stored in massive volumes, are very powerful variables to predict credit performance.
A few concrete examples of alternative data in consumer credit (by no means an exhaustive list) are bank account data, utility consumption and payment patterns, mobile phone consumption and behavior, internet usage, social network connections, census and economic data, and general retail spending. The value of alternative data in credit scoring is now widely acknowledged in highly developed credit markets like the US. In emerging markets, where reliable credit agencies are absent and a massive consumer middle class is emerging, alternative data is the single most important source of predictive signal for real economy lenders.
ML makes it possible to use alternative data fully, in a way that isn’t possible in a scorecard framework because of the less structured nature of that data.
ML and alternative data are changing the credit scoring paradigm, allowing non-bank players to contemplate extending their range of products and services into financial services. For banks, it is both a wake-up call and a massive opportunity. Indeed, through bank account information, banks are sitting on a trove of credit-relevant data. By putting that data at the core of the credit creation process, banks can transform and turbo-charge their lending verticals.
Getting the job done
Turning vast amounts of credit and alternative data into valuable predictors is a substantial task which must be left to a multi-skilled team of data scientists and financial market experts.
At qbridge, we spent time and effort building a dedicated technology infrastructure to absorb structured and unstructured data, pre-process it and transform it into consumable matter for our advanced ML algorithms and advanced credit analytics.
Our generic data infrastructure enables us to expedite and systematize the crucial data processing tasks which otherwise could derail a credit scoring automation project. Those tasks include:
- Data infrastructure definition,
- Structuring of optimal historical data samples,
- Data validation and cleaning,
- Data enrichment,
- Feature engineering,
- Dimensionality reduction,
- Tackling of imbalanced classes,
- Model choices and parameters,
- Performance metric choices,
- Run time,
- Model recalibration, learning process.
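Two items in the list above, imbalanced classes and performance metric choices, are closely linked, and a short sketch shows why. The 5% default rate below is a made-up figure for illustration. With so few 'bad' loans, a model that never predicts a default looks excellent on accuracy, while a ranking metric such as ROC AUC exposes that it has no discriminating power. The sketch also computes inverse-frequency class weights, one common way (among several) of rebalancing the training signal.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_classes = len(counts)
    return {c: len(labels) / (n_classes * n) for c, n in counts.items()}


def roc_auc(labels, scores):
    """Probability that a random bad loan is scored above a random good one.

    Ties count for one half. Pairwise version, fine for small samples.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical portfolio: 5 defaults (label 1) out of 100 loans.
labels = [1] * 5 + [0] * 95
weights = balanced_class_weights(labels)

# A useless model that gives every loan the same (zero) default score.
constant_scores = [0.0] * len(labels)
accuracy = sum(
    int((s > 0.5) == bool(y)) for y, s in zip(labels, constant_scores)
) / len(labels)
auc = roc_auc(labels, constant_scores)

print(weights[1], round(weights[0], 3))  # 10.0 0.526
print(accuracy, auc)                     # 0.95 0.5
```

The weights mean that each default counts roughly nineteen times as much as each repaid loan during training, so the rare class is not drowned out.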
Credit scoring has historically been one of the main applications of ML in the financial sector. There is enough empirical evidence to show that ML and alternative data are a complete game changer in the field of granular credit.
For lenders. Banks are particularly well positioned to monetize their competitive advantage when it comes to their small-to-medium sized business and retail customers. The economic benefits of automating the scoring, pricing and underwriting of credit using ML and alternative data are phenomenal for lending banks:
- Reduce delinquency rates,
- Increase credit production,
- Adjust credit pricing in real time based on weak signal detection,
- Reduce OpEx tied to low-value credit assessment functions.
For universal banks with global markets operations, the readability brought by ML to granular credit risk enables the creation of a new private credit asset class for their institutional investors.
For investors. Institutional investors and asset owners equipped with the right ML technology and access to rich data sets are ideally positioned to allocate significant capital to a new asset class, namely granular credit to the real economy, without relying on high-level, scorecard-based ratings which, we believe, are a thing of the past.