Teradata Fastload is a command driven utility which can be invoked either in batch or interactive mode and used to load large volume of data into Teradata empty table.
Teradata Fastload utility establishes multiple session i.e. one session per AMP by default; transfer data using block rather than row to load data into the table and does not use any transient journal during operation. This is the reason it is so fast.
It loads only one table at a time. If you want to load multiple tables simultaneously simply invoke multiple Fastload jobs.
Fastload can load data from
- Input files on a network attached workstation.
- Disk or tape files from channel attached workstation.
- Special input module (INMOD) routines written to select, validate, and preprocess input data.
- Any other device providing properly formatted source data.
Pre-requisite for operating Fastload
In order to run any Fastload, three key components are needed. They are a log table, one empty target table and two error tables-
Log table: One log table is needed to keep track the status of every Fastload session running on the system. Teradata maintains one table called fastlog under database SYSADMIN. In order to use this table, you need INSERT, DELETE and UPDATE privilege on his table.
Empty Target Table: Fastload needs one empty target table before inserting data into it. Fastload does not care how this has been accomplished. You can achieve this by deleting the data from target table or drop-recreate the target table prior to start the loading.
Two ERROR tables: Two error tables are required by Fastload to capture if any exception occurs during the Fastload process. These tables will be created automatically.
The first error table is to capture any translation errors or constrains violation. For example, a row with column wrong data type would be reported to the first error table.
The second table is for error caused by duplicates values for Unique Primary Index (UPI). Fasload will load only one instance for every UPI and stores the duplicate occurrence in the second table. However, if the entire row is a duplicate, Fastload counts it but does not store the row.
Note: Teradata will allow maximum 15 Fastloads, Multiloads or Fastexports at the same time. This protects the system from giving all the resources to only load utilities.
Phases of Fastload
Fastload divides the whole process into two phases as below-
Phase 1: Acquisition –
- The main objective of this phase is to transfer the data from the Host computer to the Teradata environment i.e. to Access Module Processor (AMP) as quickly as possible. To achieve this Teradata Parsing Engine (PE) does not hashes each row one by one.
- As soon as Teradata PE receives any insert statement from Fastload utility, it parses the statement just once.
- Then PE opens a sessions from the Fastlaod client to the AMPs directly. By default, it will create one session per AMP. So, if we have a system with 10 AMPs, it will create 10 sessions.
- One of the client sessions will pack the raw data into 64K blocks.
- Then this large block of data will be sent to the AMP randomly without any concern for which AMPs gets the data.
- After receipt of each data block, every AMP hashes its rows based on the primary index and redistributes them to the proper AMP.
- After redistribution of rows, data will be written to an internal Worktable on the AMPs but will be remain unsorted.
Phase 2: Application–
- The main objective of this phase is to write the data to the actual table space.
- Each AMP will start sorting of rows in its work table which has been prepared during phase 1.
- After sorting is done, these rows will be written to the actual data block in the disk permanently.
- Any error in this phase will be stored in the second error table.