What is the difference between dv01_id and loan_id_provider?

Every data provider has their own loan ID convention and oftentimes they are long alphanumeric strings. For both performance reasons and to avoid any ID collisions across datasets, we give every loan a unique dv01 loan ID called dv01_id. As the name implies, if you need to join any tables at the loan level you should use dv01_id (and account_name). Should you need the originator-assigned loan ID, you can get that from loan_id_provider in the Static Data view.

Why are some loan_id_provider values not alphanumeric?

Some data providers require that we obfuscate the original loan ID. As such, you may see some odd loan IDs with a lot of non-standard characters. This is not bad data or a failed load, this is a masked version of the original loan ID.

How can I identify platform datasets, securitizations, or my own portfolios? 

Another important concept is that of a portfolio type. In this context, a portfolio type is either a bond (securitization), a whole loan portfolio, or a platform-wide dataset. It is important to note that a loan may appear in multiple data sources. For instance, a loan could initially be in a whole loan portfolio before being securitized. It is possible that this loan will show up in a whole-loan account, a securitization account, as well as a platform-wide account. It is important to note that Loan IDs are sometimes but not always maintained across accounts depending on the requirements of our data providers. In this scenario just described, the database could have up to three distinct loan IDs for a single loan.

Can a single loan appear multiple times across portfolio types? 

Yes. It is common for a loan to appear in either a securitization or a portfolio dataset and again in a platform dataset.

If a loan has multiple records, how should I join static_data to the aggregated or history tables?

When joining tables at the loan level, the join should be done on dv01_id and account_name.

How can I get the last record for every loan regardless of whether it is still outstanding?

When a loan pays off (either voluntarily or involuntarily), not every data provider continues to report on the loan. As such, not every as-of date will necessarily contain a record for every loan ever in the portfolio. Certain analyses, primarily cumulative analyses like a cumulative loss calculation need to have knowledge of every loan in the portfolio, not just what is outstanding on a given date. To help assist with these types of queries, we provide a field called carry-through date. The carry-through date is the date for which the given record is valid until. Another way to think of it is as the last day in a reporting period. For a loan that has a subsequent record, the carry-through date will be the day before the next as-of date. The more important observation is that for loans without a subsequent record, usually either because they've paid off or simply because it's the most recent available date, we set the carry through date to 3008-01-01.

Are grade and subgrade standardized across originators?

Loan grades (and sub-grades) are unique to each originator. Originators may share the same nomenclature, but similarly named grades are not comparable across platforms. For example, Lending Club's B-grade loans are not meant to be comparable to Prosper's B-grade loans. Oftentimes, analysts will use a field(s) that is universally defined, the most common being FICO, to do cross-platform analytics within a credit bucket.

Why aren’t the as-of dates aligned with Lending Club’s raw data?

Lending Club’s monthly data files tend to have one record per loan per month-on-book (MOB) with each MOB corresponding to a single as-of-date. It is reasonable to assume that the MOB and as-of month would increment by one with every record, but that is not always the case; there are loans for which an as-of month is seemingly skipped. The table below is an example from their platform dataset.

In reviewing borrower agreements and the data, we believe this happens because Lending Club allows borrowers to make a due-date adjustment to get their loan payments in-sync with their 13 paychecks. For example, suppose a borrower’s due date is the 25th of every month, but he or she gets paid on the 1st and 15th of every month. Let’s assume this borrower made his or her payment on 5/25 and then asked Lending Club for an adjustment such that his or her payment can be made on the first of the month to coincide with their paycheck. As such, rather than the next payment being due on 6/25, it is due on 7/1 resulting in no payments for the month of June. To smooth these gaps, we anchor the dates to the most recently reported data point. From there, we would decrement the as-of date by one month for each prior MOB. In the above example, the data would now be represented as in the table below.

In the periodic tables, do all loans have a month 0 record and why do some show a payment in month 0?

Unfortunately, some of the datasets we receive do not contain an initial record. Because of this, there is no guarantee of a month 0 record for every loan. This especially applies to securitizations since we rarely have information before a loan is securitized. Further complicating matters is that not all datasets are consistent in their inclusion/exclusion of an origination record. We see a lot of datasets that do not include an origination record by default but have a handful of loans with a loan_age = 0 record, and these records usually show a corresponding payment. You may be wondering, “How can a loan have a payment in month 0 since month 1 is ‘on-the-clock’ once the loan is originated?” In at least some of the cases, this happens because a borrower made a payment between the origination date and the time the loan started to accrue interest, so technically month 1 was not yet “on-the-clock”.

In their platform dataset, Lending Club has one big 31-120 dpd bucket; how do you have more granular dq statuses?

For any loan that is 31 to 120 days delinquent, Lending Club will simply label it as “Late (31-120 days)”. Recognizing that analysts might want more granular delinquency statuses, we use balance and payment information to infer whether the loan is one, two, or three payments behind and label it as such. The method to make that determination is straightforward; given the reported month-on-book we know how many payments should have been made to date. Using the historical data, it is easy to determine how many payments have been made, and from those two data points we can determine the number of payments the borrower is behind. Conceptually the logic is clear, but reality is always a little grayer. We try to adhere to industry standards whenever we make a determination of delinquency. A perfect example of this is a partial payment - does a borrower get partial, full, or no credit for a partial payment? In this case the general industry consensus is that a borrower continues to roll forward until the full amount of a payment is made, whether it comes as a single payment or multiple partial payments.