TABLE OF CONTENTS
- Field Matching
- Data Wrangling
- Saving A Tape
After importing a tape, the tape will automatically open into Edit Mode. There are two main functions in Edit Mode: Field Matching and Data Wrangling. When importing new tapes, you will arrive on the Field Matching page before progressing to Data Wrangling; when editing an existing tape, you will arrive on the Data Wrangling page, with the option to revisit Field Matching if necessary.
The first step of cracking a tape is to match the raw column field headers in the uploaded tape to standard field names that exist in your Data Dictionary (the organization of this is called the "schema").
There are two tables on the Field Matching page. The table on the left provides a list of unmatched fields, while the table on the right shows the fields that have already been matched to the schema in your data dictionary.
When you first land on the Field Matching page, you will notice that some of the fields on your tape will already be in the matched table. dv01's matching algorithm uses machine learning to help automate matching between the raw columns in the tape and the fields in your data dictionary based on similarities between field names and prior matches. The algorithm gets smarter as more tapes are imported, meaning that manual match/unmatch actions will be remembered for future tapes.
In the table on the left, you will see a list of unmatched columns in your tape. Use the input next to each field to match the raw column to a field in your schema. If your desired match does not exist in your schema, you can create a new custom field. Some inputs may have a field already selected; these suggested fields are likely match candidates, but our confidence in the match is not strong enough to automatically match the field for you. When opening the input, you will also see a list of suggested matches at the top. After finalizing your matches, simply hit the Match button to apply your selections.
In the table on the right, you will see the list of columns that have already been matched. Here, you can unmatch fields that are incorrect, and/or match the same raw column header to multiple fields in your Data Dictionary. The latter is particularly useful when the raw column in a tape contains information that you would like to split into multiple fields.
Creating A New Custom Field
By default, the Data Dictionary contains dv01's standard mortgage schema, created by leveraging our expertise in working with large mortgage datasets. However, the Data Dictionary is fully customizable, allowing you to create unlimited custom fields for ultimate flexibility. These custom fields will automatically be available for usage across all your tapes, including in automatic & suggested matches by dv01's field matching algorithm.
To create a custom field, open the relevant input in the unmatched field table and click "Create Custom Field". Then enter a field name, field type, description, and optionally select a raw column that you would like to match the newly created field to.
When field matching is complete, toggle to Data Wrangling to fix poorly formatted data, standardize defined value fields, and create/apply calculations. The table contains the list of matched fields and reflects what that will end up in your cracked tape (if you don't see a desired field, toggle back to Field Matching to match additional raw columns in the tape). For each matched field, the table displays the raw column header, the field type, and the Data Health bar, providing at-a-glance feedback on the cleanliness of the data within a particular field.
The Data Health bar support four statuses for this field data:
- Valid: indicates that a given value is properly formatted.
- Invalid: indicates that a given value is improperly formatted. While you do not need to fix invalid data to save your tape, invalid data must be fixed in order to use that field in a calculation.
- Missing: there is no value.
- Nonstandard: indicates that a given value does not match something in your schema. This status only applies to defined value fields.
Applying Field Format Actions
To fix and standardize your data, click on a field in the table to open up the Field Detail drawer, which displays the Preview of individual values within the field alongside the field Editor. Here, you will be able to see the source values in your tape, along with a preview of what the values will look like after you apply format actions.
We support the following format actions:
- Trim: Removes extra white space from values.
- Replace: Replaces any instance of the first input (STR1) with the second input (STR2), including if the first input appears as a substring in the value.
- ExactReplace: Replaces instances of the first input (STR1) with the second input (STR2) only if the entire value matches the first input exactly.
- Remove: Removes all instances of the input (STR1) .
- Left: Includes only the specified number (END) of left-most characters of the value.
- Right: Includes only the specified number (END) of right most characters of the value.
- Substring: Takes the substring of the value between the number of characters specified (BEGIN and END).
- Round: Rounds all values to the specified number of digits.
- StandardizePercentage: Automatically converts percentages into decimals.
- StandardizeDate: Automatically formats different date formats into the dv01 date format.
StandardizePercentage and StandardizeDate format actions will be automatically applied to fields that dv01 identifies as date or percentage fields.
Defined Value Fields
Defined value fields, also known as enum fields, are slightly different than other fields. These fields can contain any value contained within a specific set/list of values (these are viewable in your Data Dictionary). If a given value in the tape does not match one of the set's defined values in your Data Dictionary, the value will be marked as nonstandard.
Clicking on a defined value field in the Data Wrangling table will open the Field Detail drawer, where you will be able to map the existing unique values in the tape to the list of defined values that exist in your Data Dictionary. You can create new custom values within a set by clicking Create New Custom Enum Value button. Any new enum values will appear in your Data Dictionary and be remembered as Valid for future tapes.
Creating And Applying Calculations
We also support the ability to create and apply calculations to your data using our bespoke dv01 syntax language.
To create a calculation, click the Create New Calculation button to open the drawer. Select an existing field name or create a new custom field as the calculation output, then enter the calculation. Our calculation editor makes autocomplete suggestions for available fields and formulas as you type, making it easier to write your calculations. Click Generate Preview to verify the results of your calculation, then click Save Calculation to finish.
If necessary, use the Formula Browser for help in writing a calculation. You can browse all available formulas, allowing you to explore our dv01 syntax, view definitions, and see examples of calculations.
You can then apply calculations to your tape by clicking the Bulk Apply Calculations button. This opens a drawer containing a list of all available calculations in your environment, along with some dv01-provided calculations. Select and reorder them, then hit Apply.
Note that you can apply calculations that use existing columns from the tape in order to create new aggregate columns, or you can use calculations to overwrite the raw data in a field.
Fixing Calculation Errors
After applying your calculations, they will appear in the table on the Data Wrangling page bearing a success message (indicating they were applied) or a failure message (indicating there is an issue). Expand the rows for calculated fields to see the applied calculation or unapply the calculation (in the event the application was successful), and/or to understand the reason the calculation failed.
There are two kinds of errors you are likely to encounter:
- Unformatted Data: This error occurs when you try to use a field containing formatting errors in a calculation. Click the Fix It button to apply format actions to the offending field.
- Field Not Matched: This error occurs when your calculation uses a field that has not been matched to the raw column headers. Click Fix It to match the fields. This error may also occur if the order of your calculations is incorrect (i.e., if one of your calculations uses a field that is calculated by another calculation that appears lower in the list). Resolve this error by reordering the calculations in the Bulk Apply Calculations drawer.
You can save your progress in a tape at any time, simply by hitting the Save Tape button in top right corner of your table. You must enter a tape name, select a source, and enter the as of date in the Tape Summary header before saving a tape.