In the last video,
you got familiar with tools for classification tasks that
include logistic regression and feedforward neural networks for classification.
Now, I want to show you how we can
apply these tools to a problem of great practical interest.
The problem is predicting bank failures using quarterly reports submitted
by all U.S.-based commercial banks to the Federal Deposit Insurance Corporation,
also known as the FDIC.
The data that we will be using for this analysis is made publicly available by the FDIC,
which has a lot of relevant data of this sort.
This certainly helps, as normally
the most difficult part of any project is getting the data.
I recently read a blog that offered
a very succinct description of most machine learning startups:
take an open-source machine learning algorithm and
use it on a proprietary data set to monetize the information in it.
Other companies build predictive models using open data.
In particular, many companies have commercialized various models built on
the freely available FDIC data that we will be using in this use case.
Now, coming back to the problem itself,
let's see why it's so important in practice.
The problem of predicting bank failures,
that is, the events when a bank is closed by the FDIC,
is of interest to a number of counterparties in financial markets.
First, it's important for trading in bank stocks.
Second, it matters for inter-bank lending and trading.
Third, it helps regulatory models that monitor the financial system.
Now, let's talk a little bit about the FDIC itself.
The FDIC, or Federal Deposit Insurance Corporation,
regulates all commercial banks in the U.S.
The FDIC provides deposit insurance for commercial banks,
and charges them a premium for this insurance.
If you live in the United States,
you have probably seen this sign many times when entering a bank.
This sign means that your money is guaranteed by the FDIC
for an amount of up to 250,000 dollars.
The premium that the bank pays to the FDIC is
decided based on the rating that the FDIC assigns to the bank.
The methodology used by the FDIC to decide ratings is called CAMELS,
which stands for Capital Adequacy, Asset Quality,
Management Quality, Earnings, Liquidity,
and Sensitivity to Market Risk.
Rating 1 is the best and rating 5 is the worst.
If a bank is assigned rating 4 or 5 as a result of a review by the FDIC,
it's likely to be closed soon.
Capital inadequacy, that is, an insufficient level of capital held by
the bank as a cushion against
potential financial distress, is the most common reason for regulators to close a bank.
However, there might be other reasons as well, for example,
violations of financial rules and management failures.
If the FDIC decides to close the bank,
it takes over both its assets and its liabilities and then
tries to sell the assets at the best price possible to pay off the liabilities.
Now, if the CAMELS ratings
given by the FDIC to all banks were publicly known,
then the problem of predicting bank failures would be relatively easy.
But the problem is that the CAMELS ratings are not publicly known.
They are secrets tightly guarded by the banks and the FDIC.
The problem of predicting bank failures is
therefore addressed using statistical and machine learning approaches.
Let's first talk about the data needed for such modeling.
Fortunately, relevant data is made freely
available by the FDIC itself, as we said before.
The FDIC collects quarterly reports called Consolidated Reports of Condition and Income,
also known in the industry as Call Reports, from all banks.
It makes them freely available to the public, and these reports are
actually widely used by financial analysts who cover banks.
Call Reports consist of twenty-eight parts called schedules.
They provide a very detailed view of banks' balance sheets and income statements.
While this table gives you a full list of all schedules in Call Reports,
I highlighted in bold those schedules that are most relevant to
the type of analysis that I'm going to present next.
Now, let's talk a bit about similarities and differences between
the current problem of modeling bank failures and the problem of corporate defaults,
which is another binary classification problem in
finance and an active area of research
for those who analyze
companies' fundamentals for investment or trading decisions.
Corporate defaults, such as bankruptcies and non-payments on
debt obligations, are binary events, just like bank failures.
Moreover, both kinds of events are statistically rare:
neither bank failures nor corporate defaults happen too often.
In both cases, we typically talk about probabilities of the order of 1 percent.
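
To see why an event rate of about 1 percent matters for classification, here is a minimal Python sketch. The sample of 1,000 bank-quarters with 10 failures is made up purely for illustration and is not FDIC data. It shows that a "model" that never predicts a failure still looks very accurate while catching no failures at all.

```python
# Hypothetical sample (assumed for illustration, not FDIC data):
# 1,000 bank-quarters, of which 1 percent end in failure.
n_total = 1000
n_failed = 10
labels = [1] * n_failed + [0] * (n_total - n_failed)

# A trivial "model" that never predicts a failure.
predictions = [0] * n_total

# Accuracy: share of observations where the prediction matches the label.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / n_total

# Recall: share of actual failures that the model flags.
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / n_failed

print(f"accuracy = {accuracy:.2%}, recall on failures = {recall:.2%}")
# prints: accuracy = 99.00%, recall on failures = 0.00%
```

So with rare events, accuracy alone says very little, and it is worth looking at how many of the actual failures a model catches.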
Instead of using raw features from Call Reports,
modeling of both types of events uses
certain financial ratios derived from these raw features.
Both look at sequences of financial ratios
to probabilistically predict the future binary event,
that is a corporate default or a bank failure.
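
As a rough illustration of what "ratios instead of raw features" can look like, here is a small Python sketch. All field names (`total_equity`, `net_income`, and so on) are hypothetical placeholders, not actual Call Report item codes, and the coefficients are invented for the example rather than fitted to any data. The sketch only shows the mechanics: raw items go in, scale-free ratios come out, and a logistic function maps them to a probability.

```python
import math

def financial_ratios(report):
    """Turn raw balance-sheet items into scale-free ratios.

    The keys of `report` are hypothetical names chosen for this example.
    """
    return {
        "equity_to_assets": report["total_equity"] / report["total_assets"],
        "return_on_assets": report["net_income"] / report["total_assets"],
        "npl_to_loans": report["nonperforming_loans"] / report["total_loans"],
    }

# Made-up coefficients (assumption): in practice they would be estimated
# from labeled historical data, for example by logistic regression.
INTERCEPT = -3.0
COEFFS = {
    "equity_to_assets": -40.0,   # more capital -> lower failure probability
    "return_on_assets": -60.0,   # higher earnings -> lower failure probability
    "npl_to_loans": 25.0,        # more bad loans -> higher failure probability
}

def failure_probability(report):
    """Logistic model: probability = sigmoid(intercept + sum(coef * ratio))."""
    ratios = financial_ratios(report)
    z = INTERCEPT + sum(COEFFS[name] * value for name, value in ratios.items())
    return 1.0 / (1.0 + math.exp(-z))

# Two invented banks of the same size but in very different condition.
healthy = {"total_assets": 1000.0, "total_equity": 110.0, "net_income": 12.0,
           "nonperforming_loans": 5.0, "total_loans": 600.0}
troubled = {"total_assets": 1000.0, "total_equity": 25.0, "net_income": -30.0,
            "nonperforming_loans": 90.0, "total_loans": 600.0}

print(f"healthy:  {failure_probability(healthy):.4f}")
print(f"troubled: {failure_probability(troubled):.4f}")
```

Using ratios rather than raw dollar amounts makes banks of very different sizes comparable, which is one reason such models tend to be built on ratios.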
So these are similarities but there are also some differences.
First, the banking business is different from the business of
corporations that produce physical or digital goods or services.
Accordingly, Call Reports do not contain some items that are present in corporate filings,
such as, for example,
the cost of goods sold, because there are no goods to sell in the first place.
On the other hand,
banks have a substantially richer debt structure than
corporations because their specific business is related to lending and deposit accounts.
In addition, banks have
more specialized financial assets in comparison to
corporations whose main assets are normally non-financial.
The other difference is that the mechanics of
actual decision making are totally different in these two cases.
While corporations decide for themselves to declare bankruptcy,
a decision to close a troubled bank is taken not by the bank but rather by the FDIC,
that is, by an external body.
With all these differences in mind,
it's still useful to start our discussion of modeling bank failures with
a short digression on how corporate defaults are modeled in classical finance,
or maybe in this case,
I should rather say super-classical finance, because the model
I want to describe to you was developed by Robert Merton
of MIT in 1974,
a year after he developed, jointly with Fischer Black and Myron Scholes,
the option pricing theory that brought
the Nobel Prize in Economics in 1997 to Merton and Scholes,
as Black, a legend of Wall Street, had died of cancer in 1995.
By the way, it's a relatively little-known fact that Fischer Black had studied
artificial intelligence at MIT with Marvin Minsky before turning to finance.
So, let's talk a bit about the Merton Corporate Default Model.
Let's look at it in our next video.