Monday, 9 April 2018


Most useful Teradata interview questions -- PART - VI


LOCKS

199. What does "Pseudo Table" lock mean in an EXPLAIN plan in Teradata?
  • It is a false lock applied on a table to prevent two users from getting conflicting locks with all-AMP requests.
  • The PE determines a particular AMP to manage all-AMP lock requests for a given table and places the pseudo lock on that table.
  • Pseudo tables act as gatekeepers: whenever a request involves a full table scan and needs data from all AMPs, the Parsing Engine identifies a gatekeeper AMP that issues the lock for that request on behalf of all other AMPs, so that multiple users requesting conflicting all-AMP locks on the same table cannot cause a global deadlock.
200. How do you check whether a table is locked or not?
Just run the following query on the table.
LOCKING TABLE DBNAME.TABLENAME FOR WRITE NOWAIT
SELECT * FROM DBNAME.TABLENAME;
If the table is currently locked, the above statement will fail immediately, because NOWAIT tells Teradata not to wait for the locks on that table to be released.
201.How can you apply access lock on table rather than read lock?
We can override the default READ lock on a table by explicitly specifying an ACCESS lock before using the table.
LOCK TABLE EMPLOYEES FOR ACCESS SELECT * FROM EMPLOYEES;
202.What are different type of locks in teradata ?
In Teradata, LOCKS are used to maintain the integrity of data present in tables. Teradata itself decides the kind of lock to apply; however, in some cases users can also decide what kind of lock the query should use.
In TERADATA, locks can be implemented at three levels:
DATABASE , TABLE and ROW HASH.
And there are 4 types of LOCKS available in TERADATA:
1) Exclusive: Such locks are applied on a table when it is undergoing any STRUCTURAL change. All DDL commands apply this lock. If this lock is applied, no other lock can be applied on that table.
2) Write: Such locks are applied when a table's data demography is changed by a DELETE, INSERT or UPDATE operation, so whenever a DML operation is performed on the table. Only ACCESS locks are permissible while a WRITE lock is held, and ACCESS locks may lead to a STALE DATA READ, since the data demography is being changed by the DML operation holding the WRITE lock.
3) Read: Such locks are applied when a user wants to READ data from the table without fetching STALE DATA. READ locks maintain data integrity, as data in the table cannot be modified while a READ lock is applied. A READ lock only allows further READ or ACCESS locks on the table. READ is the default lock used by Teradata while fetching data from any table.
4) Access: Such locks are applied when the table needs to be accessed and the data demography is not that important; even stale data is acceptable for the process. An ACCESS lock allows READ, WRITE or ACCESS locks to be applied on the table; however, EXCLUSIVE locks are not allowed.
Tip: It can be considered good practice to apply ACCESS locks on the source table when multiple SQL queries are using the same source table. Even if the TARGET tables are different, performance can suffer because Teradata by default applies a READ lock, so other queries may have to wait to change the demography of the SOURCE table. Understanding locks and locking tables manually can be very useful in an environment where several tables are loaded in parallel. Teradata automatically blocks a query trying to apply an incompatible lock on a table already locked by another query. You can use the command below before your actual DML commands:
LOCKING SOURCE_TABLE FOR ACCESS INSERT INTO TARGET_TABLE SELECT * FROM SOURCE_TABLE; Another good practice is to use the PRIMARY INDEX column while fetching data from the SOURCE table. In such cases, the query becomes a single-AMP operation and Teradata applies the lock at ROW HASH level rather than at table level.
203. At which levels can a LOCK be applied in Teradata?
  • Table Level – All the rows that are present inside a table will certainly be locked.
  • Database Level Lock – All the objects that are present inside the database will be locked.
  • Row Hash Level Lock – Only the rows that correspond to a particular row hash will be locked.
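As a hedged illustration of these three levels (the database, table and column names below are hypothetical), the LOCKING request modifier can name each level explicitly:

LOCKING DATABASE SALES_DB FOR ACCESS
SELECT COUNT(*) FROM SALES_DB.ORDERS;            /* database-level lock */

LOCKING TABLE SALES_DB.ORDERS FOR ACCESS
SELECT * FROM SALES_DB.ORDERS;                   /* table-level lock */

LOCKING ROW FOR ACCESS
SELECT * FROM SALES_DB.ORDERS WHERE ORDER_NO = 1001;  /* row-hash-level lock via the PI */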

Performance Tuning

204. Highlight the need for Performance Tuning.
Performance tuning in Teradata is basically done to identify all the bottlenecks and then resolve them.
205. Comment whether bottleneck is an error or not.
Technically, a bottleneck is not a form of error, but it certainly causes a certain amount of delay in the system.
 206.How can bottlenecks be identified?
There are basically four ways of identifying a bottleneck. These are: –
  • Teradata Visual Explain
  • Explain Request Modifier
  • Teradata Manager
  • Performance Monitor
207.What are the Performance improvement techniques available in Teradata?
Performance tuning and optimization of a query involves
  • First of all use EXPLAIN plan to see how the query is performing.
  • Keywords like "product join" and "low confidence" are indicators of poor performance.
  • Make sure STATS are collected on the columns used in the WHERE clause and on the JOIN columns.
  • If STATS are collected, the EXPLAIN plan will show HIGH CONFIDENCE. Statistics tell the optimizer the number of rows in the table, which helps it choose between redistribution and duplication of the smaller table.
  • Check the joining columns & WHERE Clause whether PI, SI or PPI are used.
  • Check whether proper alias names are used in the joining conditions.
  • Split the queries into smaller subsets in case of poor performance.
  • collecting statistics on join columns
  • avoiding cross product join
  • selection of appropriate primary index (to avoid skewness in storage)
  • and using secondary index.
  • Avoiding NUSI is advisable
  •  Check for the recommendations of stats to be collected for the columns.
  •  Try to avoid casting in join conditions
  •  Use secondary index appropriately.
  • Use join index if necessary
  • choose the Primary Index to avoid skew,
  • avoid using CAST, SUBSTR, CASE and POSITION where possible, as these functions consume a lot of resources.
  • If two huge tables are joined almost regularly, create Joined Index.
  • Avoid Redistribution when possible
  • Use sub-selects instead of big “IN” lists
  • Use derived tables
  • Use GROUP BY instead of DISTINCT ( GROUP BY sorts the data locally on the VPROC. DISTINCT sorts the data after it is redistributed)
  • Use Compression on large tables
Optimization of queries improves Teradata performance and helps the PE generate the most efficient execution plan.
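A minimal sketch of the stats-plus-EXPLAIN workflow described above (the EMPLOYEE and DEPARTMENT tables and their columns are hypothetical):

COLLECT STATISTICS ON EMPLOYEE COLUMN (DEPT_NO);
COLLECT STATISTICS ON EMPLOYEE COLUMN (HIRE_DATE);
COLLECT STATISTICS ON DEPARTMENT COLUMN (DEPT_NO);

EXPLAIN
SELECT E.EMP_NO, D.DEPT_NAME
FROM EMPLOYEE E
INNER JOIN DEPARTMENT D
ON E.DEPT_NO = D.DEPT_NO
WHERE E.HIRE_DATE >= DATE '2018-01-01';
-- Look for "high confidence", the join strategy and any redistribution steps in the output.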
208.Below Steps are useful for performance tuning
  • A) Explain the EXPLAIN: Check for EXPLAIN plan to see how exactly Teradata will be executing the query. Try to understand basic keywords in Explain Plan like confidence level, join strategy used, re-distribution happening or not.
  • B) Collect STATS: The stats of the columns used in join conditions should be kept up to date. Secondary Indexes without proper STATS can be of little or no help. Check the STATS status of the table.
  • C) Use a Proper PI: Check whether the Primary Index is properly defined on each table in the query, and check whether the PI of the target table being loaded is unique.
  • D) Use PPI: If a Partitioned Primary Index is created on a table, try to use it. If you do not use it in the filter condition, performance will degrade.
  • E) No FUNCTIONS in Conditions: Try to avoid using functions in join conditions. For example, applying COALESCE or TRIM causes high CPU consumption.
  • F) Use PPI: If Partition Primary Index is defined in tables try to use it. If you are not using it in filter condition, it will degrade the performance.
  • G) Same column DATA TYPES: Define same data type for the joining columns.
  • H) Avoid IN clause in filter conditions: When there can be huge number of values in where conditions, better option can be to insert such values in a volatile table and use volatile table with INNER JOIN in the main query.
  • I) Use Same PI in Source & Target: PI columns also help when saving data to disk. If the source and target have the same PI, data can be copied very efficiently from source to target.
  • J) Collect STATS on VOLATILE tables: Collecting stats on volatile tables where required can save AMPCPU. Remove stats that are already present but not being used. If the volatile table has a unique PI, then go for sample stats rather than full stats.
  • K) DROP volatile tables explicitly: Once a volatile table is no longer required, drop it. Don't wait for the complete procedure to be over. This frees some spool space immediately and can be very helpful in avoiding the "No More Spool Space" error. (A combined sketch of points J, K and L follows this list.)
  • L) NO LOG for volatile tables: Create volatile tables with NO LOG option.
  • M) Check DBQL Stats: Keep your performance stats accessible. Target the most AMPCPU consuming query first.
  • N) UPDATE clause: Do not write UPDATE clause with just SET condition and no WHERE condition. Even if the Target/Source has just one row, add WHERE clause for PI column.
  • O) DELETE & INSERT: Sometimes replacing UPDATE with DELETE & INSERT can save good number of AMPCPU. Check if this holds good for your query.
  • P) Query SPLITS: Split queries into several smaller queries logically and use volatile tables with proper PI.
  • Q) Try MSR: If same target table is loaded multiple times, try MSR for several sections. This will speed the final MERGE step into target table and you may see good CPU gain.
  • R) Try OLAP Functions: Check if replacing co-related sub query with OLAP function may result in AMPCPU saving.
  • S) Avoid DUPLICATE data: If the join columns of the tables involved in the query have duplicates, use DISTINCT or GROUP BY, load the result into a volatile table, collect stats and use the volatile table.
  • T) Use Proper JOINS: If joins are used, don't use right, left or full outer joins where an inner join is sufficient.
  • U) Use proper ALIASes: Check the aliases in the joins. A small mistake can lead to a product join.
  • V) Avoid CAST: Avoid unnecessary casting for DATE columns. Once defined as DATE, you can compare date columns against each other even when they are in different format. Internally, DATE is stored as INTEGER. CAST is required mainly when you have to compare VARCHAR value as DATE.
  • W) Avoid UDF: Most of the functions are available in Teradata for data manipulations. So avoid User Defined Functions
  • X) Avoid FULL TABLE SCANS: Try to avoid FTS scenarios; for example, an SI should be defined on the columns that are used as part of joins or as an alternate access path. Collect stats on SI columns, otherwise the optimizer might still go for an FTS even when an SI is defined on that particular column.
  • Y) Avoid using IN/NOT IN: For large list of values, avoid using IN /NOT IN in SQLs. Write large list values to a temporary table and use this table in the query
  • Z) Use CONSTANTS: Use constants to specify index column contents whenever possible, instead of specifying the constant once, and joining the tables. This may provide a small savings on performance.
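A minimal sketch of points J, K and L above, assuming hypothetical table and column names and a Teradata release that supports statistics on volatile tables:

CREATE VOLATILE TABLE VT_ORDERS, NO LOG
(
ORDER_NO INTEGER NOT NULL,
CUST_NO INTEGER
)
PRIMARY INDEX (ORDER_NO)
ON COMMIT PRESERVE ROWS;

COLLECT STATISTICS USING SAMPLE ON VT_ORDERS COLUMN (ORDER_NO);  -- sample stats are enough for a unique PI
/* ... use VT_ORDERS in the main query ... */
DROP TABLE VT_ORDERS;  -- free spool space as soon as the table is no longer needed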
Sql Tuning:
  • Use the TOP N option and explicit column names to see sample data in a table. If the query is just a SELECT *, the optimizer has to replace * with all the columns of that table. Example: Lock Row For Access Select Top 5 Empno, Ename From Employee;
  • Use "CREATE TABLE ... AS" if there is a need to create a table from an existing table. It operates on an efficient block-by-block basis that bypasses journaling. You can copy only the structure, the structure and data, or the structure, data and stats. Example: CREATE TABLE NEW_Table AS OLD_TABLE WITH DATA AND STATS;
  • Use Insert/Select if there is a need to copy data from one table to another empty table. INSERT/SELECT operates on an efficient block-by-block basis that bypasses journaling. Insert/Select will handle conversions also. Example: INSERT into Summary_Table SELECT store, region, sum (sales), count (sale_item) FROM Region_1 GROUP BY 1,2
  • Use Nowait option if you don’t want your request to wait in the queue. Example: Locking Emp For Read Nowait SELECT * FROM Emp
  • Nested views can add substantial time to the parser. Try to reduce the creation of nested views.
  • If values are nearly unique values then “Distinct” clause may outperform “Group By”. When there are many duplicate value then “Group By” performs better than “Distinct”.
  •  Make sure the join columns always have the same data type. Otherwise the rows of one of the tables have to undergo translation, and a full table scan is done even though there are stats on the join column.
  •  If you are doing a lot of row deletes from a table, consider using MultiLoad instead of BTEQ / SQL Assistant to do the deletes. MultiLoad completely avoids use of the transient journal (TJ) and is restartable. Another approach is to Insert/Select the rows to keep into an empty table, drop the original table, and then rename the new table to the original name.
  •  Check the data distribution of the primary index. Example: Select HashAmp (HashBucket (HashRow (Last_Name, First_Name))), Count (*) From Emp Group By 1 Order By 2 Desc;
  •  If possible, avoid using any functions on join columns or on WHERE-clause columns. For example, if SUBSTR or COALESCE is used on a WHERE column, the optimizer will not use stats even if stats exist on that column.
  • If a SQL statement has an outer join, make sure the filter condition for the inner table is present in the ON condition, so that no extra data from the inner table is loaded into the spool file for further processing. The filter condition for the outer table should be present in the WHERE condition.
Example: Select A.Empno , A.Ename , B.Dname From Emp A left outer join Dept B On (A.Deptno = B.Deptno And B.Deptno=10 ) Here Emp table called as Outer Table and Dept  table called as Inner table.
  •  You can re-write correlated or minus queries with left outer join. Correlated query has to be executed for every row returned by the outer query. Correlated queries are expensive for the optimizer.
Example:  Get all departments that have no employees.
Select A.Deptno, A.Dname, A.Location From Dept A Where Not Exists (Select '1' From Emp B Where B.DeptNo = A.Deptno)
Above query can be re-written like below and gives better performance than above.
Select A.Deptno , A.Dname , A.Location From Dept A Left Outer Join Emp B On (A.DeptNo =B.Deptno ) Where B.Deptno Is Null
  •  Avoid use of large list of values in IN/NOT IN clauses. Store them in some temporary  table and use that table in the query.
  • If intermediate tables are used for processing then make sure that it has same PI of source and target tables. Collect the stats on intermediate table NUPI after loading.
  •   If "LIKE" is used in a WHERE clause, try to use one or more leading characters in the pattern if at all possible; otherwise the optimizer will do a full table scan even if there is an index on the column.
  •   Use "NOT NULL" for columns that are declared nullable in the table definition where possible, so that table skew can be avoided in joins.
  •  Always check the plan, and run the command below to see whether the optimizer has recommended any stats for better execution. Collect the stats only if the optimizer recommends them with high confidence. Avoid stats on columns with low cardinality and multi-column stats over more than three columns: Diagnostic Helpstats On For Session; (see the sketch after this list).
  • Create join index when the table is consistently joined with other tables other than primary index and retrieval benefits are greater than setting up and maintenance.
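A minimal sketch of the stats-recommendation check mentioned above (the query itself is just a hypothetical placeholder):

DIAGNOSTIC HELPSTATS ON FOR SESSION;

EXPLAIN
SELECT E.Empno, D.Dname
FROM Emp E
INNER JOIN Dept D ON E.Deptno = D.Deptno;
-- The end of the EXPLAIN output now lists the statistics the optimizer would like to have collected.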

JOIN STRATEGY

209.What are the types of Join Strategies available in Teradata?
Join Strategies are used by the optimizer to choose the best plan to join tables based on the given join condition.
  • Merge (Exclusion)
  • Nested
  • Row Hash
  • Product (including Cartesian Product joins)

Merge (Exclusion)
  • It is adopted when the join conditions are based on equality (=).
  • There is a prerequisite though: the two tables must be sorted based on the join column in advance (actually it’s sorted based on the join column row hash sequence).
  • That brings a great advantage for this type of join: both tables only need to be scanned once, in an interleaved manner.
  • Merge join is not necessarily always better than product join, due to the fact that sorting is required.
  • If both tables are huge, sorting can be a tremendous effort.
Requirements:
  • The rows to be joined have to be located on a common AMP
  • Both spools have to be sorted by the ROWID calculated over the join column(s)
Process:
The ROWHASH of each qualifying row in the left spool is used to look up matching rows with identical ROWHASH in the right spool (by means of a binary search as both spools are sorted by ROWID)
Possible Join Preparations required:
  • Re-Distribution of one or both spools by ROWHASH or
  • Duplication of the smaller spool to all AMPs
  • Sorting of one or both spools by the ROWID
[Figure: Teradata merge join]
While joining two tables, the data is redistributed or duplicated across all AMPs to make sure the joining rows end up on the same AMP.
Relocation of rows to the common AMP can be done by redistributing the rows by the row hash of the join column(s), or by copying the smaller table as a whole to all AMPs.
If the PI of one table is used in the join and the PI of the other table is not, redistribution or duplication of the other table will happen depending on the table sizes. In these cases Secondary Indexes can be helpful.
The common AMP of rows from two spools being joined is defined by the join columns.

Case 1 – P.I = P.I joins
  • The Primary Indexes (or any other suitable index) of both tables equals the join columns:
  • There is no redistribution of data across AMPs: AMP-local joins can happen because the matching rows are already present on the same AMP and need not be redistributed.
  • These types of joins on unique primary index are very fast
  • No join preparation is needed as the rows to be joined are already on the common AMP
Case 2 – P.I = non Index joins
  • Only the Primary Index (or any other suitable index)  of one table matches the join columns: The rows of the second table have to be relocated to the common AMP
  • data from second table will be re-distributed on all amps since joins are happening on PI vs. non Index column.
  • The ideal scenario is when the smaller table is relocated so it can be joined with the larger table's rows on the same AMP. This can happen in one of two ways:
    • 1- duplicate all rows of one table onto every AMP (duplication is chosen when the table accessed on the non-PI column is small),
    • 2- redistribute the rows of one table by hashing the non-PI join column and sending them to the AMP containing the matching PI row.
Case 3 – non Index = non Index joins 
  • Neither the Primary Index of the first table (or any other suitable index) nor the Primary Index (or any other suitable index)  of the second table matches the join columns:
  • data from both the tables are redistributed on all AMPs.
  • This is one of the longest processing queries; Care should be taken to see that stats are collected on these columns.
  • redistribute both tables by hashed join column value
Nested Join
  • Nested Join is the most efficient join method in Teradata.
  • It is also the only join method that does not always use all the AMPs.
In order to make Nested Join picked, the following conditions must be satisfied:
1) The join condition is based on equality;
2) The join column is a unique index on one table;
3) The join column is any index on another table.
  • First only one single row will be retrieved from one table with the help of the unique index, and then based on the row hash of that row, another table is accessed by some index.
  • Nested Join is one of the most precise join plans suggested by the Optimizer.
  • A Nested Join relies on a UPI/USI used in the join statement, which is used to retrieve a single row from the first table.
  • It then looks for matching rows in the second table based on the join column, using an index (primary or secondary), and returns the matching results.

Requirements:
  • Spool 1 allows a unique ROWHASH access (a unique index is defined)
  • Spool 2 allows any kind of ROWHASH access (a unique or non-unique index is defined)
Process:
  • The qualifying row of spool 1 is accessed by usage of any unique index.
  • The row is relocated to the AMP owning the rows of spool 2
  • Spool 2 is full table scanned and each row is combined with the one row from Spool 1
Possible Join Preparations required:
  • None
Example:
Select EMP.Ename , DEP.Deptno, EMP.salary
from
EMPLOYEE EMP ,
DEPARTMENT DEP
Where EMP.Enum = DEP.Enum
and EMP.Enum = 2345; -- this results in a nested join
 Hash join
  • Hash Join is also based on equality condition (=).
  • Hash Join gets its name from the fact that the smaller table is built as a "hash table", and potential matching rows from the second table are searched for by hashing against the smaller table.
  • Usually optimizer will first identify a smaller table, and then sort it by the join column row hash sequence.
  • If the smaller table is really small and can fit in the memory, the performance will be best. Otherwise, the sorted smaller table will be duplicated to all the AMPs.
  • Then the larger table is processed one row at a time by doing a binary search of the smaller table for a match.
  • The Hash Join can be seen as a close relative of the Merge Join in terms of functionality.
  • In the case of a merge join, the joining happens on the same AMP.
  • In a Hash Join, one or both tables that are on the same AMP fit completely inside the AMP's memory.
  • The AMP chooses to hold the small table in its memory for joins happening on the row hash.
  • The Hash Join can be the fastest join ("the sprinter"), but only if it is executed in FSG cache.

Requirements:
  • The rows to be joined have to be located on a common AMP
  • The smaller spool is sorted by the ROWHASH calculated over the join column(s) and kept in the FSG cache
  • The bigger spool stays unsorted
Process:
  • The bigger spool is full table scanned row by row
  • The row hash of each row from the bigger spool is searched for in the smaller spool (by means of a binary search)
Possible Join Preparations required:
  • Re-Distribution of the smaller spool by ROWHASH or
  • Duplication of the smaller spool to all AMPs
  • Sorting of the smaller spools
[Figure: Teradata hash join]
Advantages of Hash joins are
  1. They are faster than Merge joins since the large table doesn’t need to be sorted.
  2. Since the join happens between a table held in AMP memory and an unsorted spool, it is very fast.
 Exclusion Join

  • Exclusion Join: this join strategy is used to find non-matching rows.
  • If the query contains “NOT IN” or “EXCEPT”, exclusion join will be picked.
  • As a matter of fact, this kind of join can be done as either Merge Join or Product Join.
  • One thing worth noticing: the exclusion merge join is based on a set-subtraction operation, and three-valued logic (TRUE, FALSE, UNKNOWN) is used when comparisons are done on nullable columns (or a temporary result set).
These type of joins are suggested by optimizer when following are used in the queries
  • – NOT IN
  • – EXCEPT
  • – MINUS
  • – SET subtraction operations
Select EMP.Ename, DEP.Deptno, EMP.salary
from EMPLOYEE EMP
WHERE EMP.Enum NOT IN
(Select Enum from DEPARTMENT DEP
where Enum is NOT NULL);
Please make sure to add an additional WHERE filter "<column> IS NOT NULL" inside the subquery, since the presence of a NULL in a NOT IN <column> list will return no results.
Exclusion join for following NOT In query has 3 scenarios
Case 1: matched data in “NOT IN” sub Query will disqualify that row
Case 2: Non-matched data in “NOT IN” sub Query will qualify that row
Case 3: Any Unknown result in “NOT IN” will disqualify that row – (‘NULL’ is a typical example of this scenario).
Product join
  • A Product Join is used to find matches between two tables when the join condition is not based on equality (>, <, <>), or when join conditions are ORed together.
  • The reason why we call it “Product” join is that, the number of comparisons required is the “product” of the number of rows of both tables. For example, table t1 has 10 rows, and table t2 has 25 rows, then it would require 10×25=250 comparisons to find the matching rows.
  • When the WHERE clause (join condition) is missing, it causes a special product join, called a Cartesian Join or Cross Join.
Requirements:
  • The rows to be joined have to be located on a common AMP
  • No spool needs to be sorted!
Process:
  • A full table scan is done on the smaller spool, and
  • Each qualifying row of spool 1 is compared against each row of spool 2
Possible Join Preparations required:
  • Re-Distribution of one or both spools by ROWHASH or
  • Duplication of the smaller spool
[Figure: Teradata product join]
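A hedged illustration of a condition that typically forces a product join (the EMPLOYEE and SALGRADE tables and their columns are hypothetical):

SELECT E.Ename, S.Grade
FROM EMPLOYEE E
INNER JOIN SALGRADE S
ON E.Salary BETWEEN S.LowSal AND S.HighSal;
-- The BETWEEN (non-equality) join condition means every qualifying row of one spool
-- is compared against every row of the other, i.e. a product join.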


Sliding Window Merge Join

The Teradata Traditional Merge Join

Requirements:
  • The rows to be joined have to be located on a common AMP
  • Both spools have to be sorted by the ROWID calculated over the join column(s)
Possible Join Preparations required:
  • Re-Distribution of one or both spools by ROWHASH or
  • Duplication of the smaller spool to all AMPs
  • Sorting of one or both spools by the ROWID
Two different algorithms can be used for the Merge Join:

1. The Fast Path Algorithm

The comparison takes place alternating from both sides, starting with the left table. The algorithm tries to join rows with matching row hash. If there is no match, the pointer is positioned on the row with the next highest row hash value and the comparison continues until all rows have been compared.
This method is used if both the left and the right table are full-table scanned:

[Figure: Teradata fast path merge join]
2. The Slow Path Algorithm

This algorithm reads each row from the left table and tries to match it against rows with the same rowhash from the right table.
This method is used if the left table is accessed via an index:

[Figure: Teradata slow path merge join]
Traditional Merge Joins have a big advantage over join types that do not require both tables to be sorted by the row hash of the join columns:
Each data block of both tables is accessed exactly once (the algorithm basically slides down on each of the rowhash sorted tables) and it is therefore less sensitive to the size of available FSG cache.

The Teradata Sliding Window Merge Join

The sliding window merge join can be considered as an advancement of the traditional merge join. After introducing the feature of row partitioning, it was required to find an algorithm which allows to join a row partitioned table (PPI table) with
  • Another PPI table having different partition characteristics
  • A Non-partitioned table
The optimizer has the possibility to change a PPI table into a NPPI table and vice versa. This step can be followed by a traditional merge join. Nevertheless, in order to avoid this join preparation step, a sliding window merge join can be executed without the requirement of the restructuring of tables.
As opposed to non-partitioned tables (NPPI tables), which are only sorted by rowhash, PPI tables are sorted on each AMP by two levels:
1. Each row is placed into its assigned row partition.
2. Within each row partition the rows are sorted by rowhash (the same way the rows of a NPPI table are stored).
The Sliding Window Merge Join was designed to be able to directly join
  • A NPPI table with a PPI table or
  • Two PPI tables with different partitioning
Directly means without changing a PPI table into a NPPI table or the other way around.
In order to understand the sliding window join process, one has to remember how PPI table rows are stored:
 Data rows are sorted by rowhash within the data blocks, but several row partitions can hold the same rowhash value! 
The easiest way of merge joining an NPPI table against a PPI table would seem to be joining the NPPI table against each PPI table partition in turn. This would be a reasonable solution, because the NPPI table, and each partition of the PPI table, are sorted by row hash, allowing a binary search within the data blocks of each partition.
Nevertheless, for performance reasons, Teradata implements a slightly different, but faster, algorithm:
The AMP reads the first data block from the NPPI table and one data block per partition from the PPI table. The rows are joined (binary search) and the algorithm moves down both tables by reading data block after data block from both tables.
The data block of the NPPI tables stays in FSG cache as long as data blocks from any partition of the PPI tables can be matched. If no more rows can be matched, the next data block of the NPPI tables is moved into the FSG cache and above described process is repeated, until the last data blocks of each table and partition is reached.
This process requires each data block of each table to be touched  exactly once (similar to a traditional merge join).
This process could theoretically result in a similar join performance, as we can reach with a traditional merge join, but there is one restriction degrading performance: The available FSG cache memory.
If the FSG cache is not big enough to hold at least one data block from each PPI table partition, the optimizer has to split the join process into so-called windows.
Assume for example, that the PPI tables consists of 4 partitions but there is only space in FSG cache for the first data block of  2 partitions. In this case the process would define 2 windows, each one consisting of 2 partitions:
  1. Join the NPPI table against the first 2 partitions
  2. Join  the NPPI table against the remaining 2 partitions

[Figure: Sliding window merge join]
As a result, the NPPI table has to be read twice and the join becomes more costly. The more windows are needed, the more costly the join becomes. If the NPPI table is small enough, caching effects could decrease the negative performance impact of having several windows.
The sliding window merge join has a performance pattern similar to a traditional merge join if most of the partitions can be eliminated before the join takes place. The best-case scenario is when there is sufficient FSG cache available to join all partitions at once. This kind of setup is called a single window merge join.
The join of two PPI tables with different partitioning is implemented similar to the join between a NPPI and a PPI table:
The left table and the right table rows are split into windows (each window containing a part of the row partitions) and each window from the left table is joined against each window from the right table.
Here is an example: If the join needs to create 5 windows from the left PPI table and 2 windows from the right PPI table, this would result in a product join of 5*2 windows:
A sliding window merge join is product joining each window of the left table with each window of the right table. The join process between 2 windows is similar to a traditional merge join process with a binary search on both tables.

210.What is the default join strategy in Teradata?
  • There is no “default” join strategy.
  • Optimizer decides the type of strategy based on the best retrieval path and other parameters to execute the query.
  • Each join strategy has its own pros and cons, and it is hard to say which one is best; it depends on the circumstances.
  • The optimizer will choose the best join strategy based on data demographics, statistics and indexes if any of them are available.
  • Using EXPLAIN can help find out what join strategies are to be adopted. No matter which join strategy, it is always applied between two tables. The more tables, the more join steps.
  • Rows must be on the same AMP to be joined.
  • So row distribution or duplication is unavoidable for some join strategies.
  • types of join
    • Product Join
    • Merge Join
    • Exclusion Join
    • Hash Join
    • Nested Join

211.What are the types of JOINs available in Teradata?
Types of JOINs are  :
  • Inner Join,
  • Outer Join (Left, Right, Full),
  • Self Join,
  • Cross Join and
  • Cartesian Join.
The key things to know about Teradata and Joins
  • Each AMP holds a portion of a table.
  • Teradata uses the Primary Index to distribute the rows among the AMPs.
  • Each AMP keeps their tables separated from other tables like someone might keep clothes in a dresser drawer.
  • Each AMP sorts their tables by Row ID.
  • For a JOIN to take place the two rows being joined must find a way to get to the same AMP.
  • If the rows to be joined are not on the same AMP, Teradata will either redistribute the data or duplicate the data in spool to make that happen.

TERADATA INDEX

213. PPI (Partitioned Primary Index).
  • PPI is used to improve performance for large tables when you submit queries that specify a range constraint.
  • PPI allows you to reduce the number of rows to be processed by using partition elimination.
  • PPI will increase performance for incremental data loads, deletes, and data access when working with large tables with range constraints
Lets take Order_Table, where we have both January and February dates in column Order_Date.
[Figure: Order_Table rows spread across the AMPs]
The Order_Table is spread across the AMPs. Notice that January and February dates are mixed on every AMP in what looks like random order. This is because the Primary Index is Order_Number.
When we apply Range Query, that means it uses the keyword BETWEEN.
The BETWEEN keyword in Teradata means find everything in the range BETWEEN this date and this other date. We had no indexes on Order_Date so it is obvious the PE will command the AMPs to do a Full Table Scan. To avoid full table scan, we will Partition the table.
[Figure: Order_Table before and after partitioning]
After partitioning:
The AMPs at the top of the figure show the table before it is partitioned.
The AMPs at the bottom of the figure show the table after it is partitioned.
Each AMP always sorts its rows by the Row-ID in order to do a Binary Search on Primary Index queries.
Notice that the rows on an AMP don‘t change AMPs because the table is partitioned. Remember it is the Primary Index alone that will determine which AMP gets a row. If the table is partitioned then the AMP will sort its rows by the partition.
[Figure: Range query on the partitioned Order_Table]
Now, when we run our range query on the partitioned table, each AMP only reads from one partition. The Parsing Engine does not need to do a full table scan; it instructs the AMPs to each read from their January partition. You partition a table when you CREATE the table.
A Partitioned Table is designed to eliminate a Full Table Scan, especially on Range Queries.
Types of partitioning:
RANGE_N Partitioning
Below is the example for RANGE_N Partition by day.
CREATE TABLE ORDER_TABLE
(
ORDER_NO INTEGER NOT NULL,
CUST_NO INTEGER,
ORDER_DATE DATE,
ORDER_TOTAL DECIMAL(10,2)
)
PRIMARY INDEX(ORDER_NO)
PARTITION BY RANGE_N
(ORDER_DATE BETWEEN
DATE ‘2012-01-01’ AND DATE ‘2012-12-31’
EACH INTERVAL ‘7’ DAY);
Case_N Partitioning
CREATE TABLE ORDER_TABLE
(
ORDER_NO INTEGER NOT NULL,
CUST_NO INTEGER,
ORDER_DATE DATE,
ORDER_TOTAL DECIMAL(10,2)
)
PRIMARY INDEX(ORDER_NO)
PARTITION BY CASE_N
(ORDER_TOTAL < 1000,
 ORDER_TOTAL < 2000,
 ORDER_TOTAL < 5000,
 ORDER_TOTAL < 10000,
 ORDER_TOTAL < 20000,
 NO CASE, UNKNOWN);
The UNKNOWN Partition is for an Order_Total with a NULL value. The NO CASE Partition is for partitions that did not meet the CASE criteria.
For example, if an Order_Total is greater than 20,000 it wouldn‘t fall into any of the partitions so it goes to the NO CASE partition.
Multi-Level Partitioning:
You can have up to 15 levels of partitions within partitions.
CREATE TABLE ORDER_TABLE
(
ORDER_NO INTEGER NOT NULL,
CUST_NO INTEGER,
ORDER_DATE DATE,
ORDER_TOTAL DECIMAL(10,2)
)
PRIMARY INDEX(ORDER_NO)
PARTITION BY (RANGE_N
(ORDER_DATE BETWEEN
DATE ‘2012-01-01’ AND DATE ‘2012-12-31’
EACH INTERVAL ‘1’ DAY)
CASE_N (ORDER_TOTAL < 5000,
 ORDER_TOTAL < 10000,
 ORDER_TOTAL < 15000,
 ORDER_TOTAL < 20000,
 NO CASE, UNKNOWN));
 Character Based Partitioning(New Feature V13.10) :
There are four new data types available for Character Based PPI. They are CHAR, VARCHAR, GRAPHIC, and VARGRAPHIC.
Example for RANGE Based Character PPI
 CREATE TABLE EMP_TBL
 (
 EMP_NO INTEGER NOT NULL,
 DEPT_NO INTEGER,
 FIRST_NAME CHAR(20),
 LAST_NAME VARCHAR(20),
SALARY DECIMAL(10,2)
 ) PRIMARY INDEX(EMP_NO)
 PARTITION BY RANGE_N
(LAST_NAME BETWEEN 'A','B','C','D','E','F','G','H',
'I','J','K','L','M','N','O','P','Q','R','S','T',
'U','V','W','X','Y','Z' AND 'ZZ', UNKNOWN));
Example for CASE Based Character PPI
CREATE TABLE PRODUCT_TBL
(PRODUCT_ID INTEGER NOT NULL
,PRODUCT_NAME CHAR(30)
,PRODUCT_COST DECIMAL(10,2)
,PRODUCT_DESCRIPTION VARCHAR(100)
)PRIMARY INDEX(PRODUCT_ID)
PARTITION BY CASE_N
(PRODUCT_NAME < 'Apples',
PRODUCT_NAME < 'Bananas',
PRODUCT_NAME < 'Cantaloupe',
PRODUCT_NAME < 'Grapes',
PRODUCT_NAME < 'Lettuce',
PRODUCT_NAME < 'Mangos',
PRODUCT_NAME >= 'Mangos' AND PRODUCT_NAME <= 'Tomatoes');
Ex-Query: Find all Products between Apples and Grapes?
Ans: SELECT * FROM PRODUCT_TBL WHERE PRODUCT_NAME BETWEEN ‘Apples’ and ‘Grapes’;
Partitioning Rules:
  • A table can have up to 65,535 partitions.
  • Partitioning never determines which AMP gets a row.
  • Partitioning determines how an AMP sorts its rows locally.
  • A table can have up to 15 levels of partitioning.
  • A table cannot have a UPI as the Primary Index if the partitioning columns are not part of the Primary Index.
  • There are three forms of partitioning: simple, RANGE_N and CASE_N.
Advantages of partitioned tables:
  • They provide efficient searches by using partition elimination at the various levels or combination of levels.
  • They reduce the I/O for range constraint queries
  • They take advantage of dynamic partition elimination
  • They provide multiple access paths to the data, and an MLPPI provides even more partition elimination and more partitioning expression choices, (i.e., you can use last name or some other value that is more readily available to query on.)
  • The Primary Index may be either a UPI or a NUPI; a NUPI allows local joins to other similar entities
  • Row hash locks are used for SELECT with equality conditions on the PI columns.
  • Partitioned tables allow for fast deletes of data in a partition.
  • They allow for range queries without having to use a secondary index.
  • Specific partitions may be archived or deleted.
  • May be created on volatile tables, global temp tables, base tables, and non-compressed join indexes.
  • May replace a Value Ordered NUSI for access.
  • Tables that hold yearly information don’t have to be split into 12 smaller tables to avoid Full Table Scans (FTS). This can make modeling and querying easier.
  • Fastload and Multiload work with PPI tables, but not with all Secondary Indexes.
  • Range queries don’t have to utilize a Full Table Scan.
  • Deletions of entire partitions are lightning fast.
  •  PPI provides an excellent solution instead of using Secondary Indexes
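A hedged sketch of the partition-elimination and fast-delete benefits listed above, using the ORDER_TABLE defined earlier:

/* Range query: only the partitions covering January 2012 are scanned */
SELECT *
FROM ORDER_TABLE
WHERE ORDER_DATE BETWEEN DATE '2012-01-01' AND DATE '2012-01-31';

/* Deleting a whole range of partitions is very fast compared with a non-partitioned delete */
DELETE FROM ORDER_TABLE
WHERE ORDER_DATE BETWEEN DATE '2012-01-01' AND DATE '2012-01-31';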
Disadvantage of partitioned tables:
  • A two-byte partition number is added to the Row ID, which is then called a Row Key. The two bytes per row add more perm space to the table.
  • Joins to Non-Partitioned Tables can take longer and become more complicated for Teradata to perform.
  • Basic select queries utilizing the Primary Index can take longer if the Partition number is not also mentioned in the WHERE clause of the query.
  • You can’t have a Unique Primary Index (UPI) if the Partition Number is not at least part of the Primary Index. You must therefore create a Unique Secondary Index to maintain uniqueness.
  • Access via the Primary Index may take longer.
  • Full-table joins to an NPPI table with the same PI may take longer.
 214.Highlight the differences between Primary Key and Primary Index.
  • Primary index is quite mandatory, whereas Primary Key is optional.
  • A Primary Index has a limit of 64 columns, whereas a Primary Key has no such limit.
  • Primary Index allows duplicates and nulls, whereas Primary Key doesn’t.
  • Primary Index is a physical mechanism, whereas Primary Key is purely logical mechanism.
  • Primary Key is a concept that uniquely identifies a particular row of a table.
  • A Primary INDEX is used for finding the best access path for data retrieval and insertion, whereas a Primary KEY is used to identify each row uniquely, just like in other RDBMSs.
215.What are the types of PI (Primary Index) in Teradata?
There are two types of Primary Index.
Unique Primary Index ( UPI) and
Non Unique Primary Index (NUPI).
  • By default, NUPI is created when the table is created.
  • Unique keyword has to be explicitly given when UPI has to be created.
  • A UPI can sometimes slow performance, because the uniqueness of the column value has to be checked for each and every row, which is additional overhead for the system; but the distribution of data will be even.
  • Care should be taken while choosing a NUPI so that the distribution of data is almost even .
  • UPI/NUPI decision should be taken based on the data and its usage.
  • One Primary Index per table.
  • A Primary Index value can be unique or non-unique.
  • The Primary Index value can be NULL.
  • The Primary Index value can be modified.
  • The Primary Index of a populated table cannot be modified.
  • A Primary Index has a limit of 64 columns.
  • By default, if no PI is specified, the first column is taken as the (non-unique) Primary Index.
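A hedged sketch of declaring a UPI versus a NUPI explicitly (the table and column names are hypothetical):

CREATE TABLE EMP_UPI
(
EMP_NO INTEGER NOT NULL,
DEPT_NO INTEGER
)
UNIQUE PRIMARY INDEX (EMP_NO);   -- UPI: uniqueness enforced, even distribution

CREATE TABLE EMP_NUPI
(
EMP_NO INTEGER NOT NULL,
DEPT_NO INTEGER
)
PRIMARY INDEX (DEPT_NO);         -- NUPI: duplicates allowed, distribution depends on the data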
216. In a Primary Index access, how many AMPs are actively involved?
Only one AMP is actively involved in a Primary Index.
 217.Can we have several Primary Index on a table?
No, we cannot have multiple Primary Indexes on a table. However, we can create up to 32 Secondary Indexes on a table.
218.Can we define PARTITION PRIMARY INDEX (PPI) on a Primary Index column in Table? Explain Why?
PPI cannot be defined on the PI column of a table. Since the PI is used to locate an AMP and store the data based on the hash value (Row ID), it cannot be used as the PPI column. In the case of PPI, the data stored on the AMPs is partitioned on the PPI column after the rows are hashed (ROW KEY = ROW ID + PPI value). P.S.: If you want to create a UPI on the table, the PPI column can be added as part of the PI.
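A minimal sketch of the P.S. above: to keep a unique PI on a partitioned table, the partitioning column can be included in the PI (names are hypothetical):

CREATE TABLE SALES_PPI
(
SALE_ID INTEGER NOT NULL,
SALE_DATE DATE NOT NULL,
AMOUNT DECIMAL(10,2)
)
UNIQUE PRIMARY INDEX (SALE_ID, SALE_DATE)
PARTITION BY RANGE_N
(SALE_DATE BETWEEN DATE '2012-01-01' AND DATE '2012-12-31'
EACH INTERVAL '1' MONTH);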
219. How many types of Index are present in teradata?
 There are 5 different indices present in Teradata
  • 1. Primary Index a.Unique primary index b. non Unique primary index
  • 2. Secondary Index a. Unique Secondary index b. non Unique Secondary index
  • 3. Partitioned Primary Index a. Case partition (ex. age, salary…) b. range partition ( ex. date)
  • 4. Join index
    •  Single table join index
    • multiple table join index
    •  Sparse Join index ( constraint applied on join index in where clause)
    • aggregate index
  • 5. Hash index
220.What are Secondary Indexes (SI) , types of SI and disadvantages of Secondary Indexes in Teradata?
  • Secondary Indexes provide another path to access data.
  • Teradata allows up to 32 secondary indexes per table.
  • Keep in mind; row distribution of records does not occur when secondary indexes are defined.
  • The value of secondary indexes is that they reside in a subtable and are stored on all AMPs, which is very different from how the primary indexes (part of base table) are stored.
  • Keep in mind that Secondary Indexes (when defined) do take up additional space. Secondary Indexes are frequently used in a WHERE clause.
  • The Secondary Index can be changed or dropped at any time.
  • However, because of the overhead for index maintenance, it is recommended that index values should not be frequently changed.
  • There are two different types of Secondary Indexes, Unique Secondary Index (USI), and Non-Unique Secondary Index (NUSI).
  • Unique Secondary Indexes are extremely efficient. A USI is considered a two-AMP operation. One AMP is utilized to access the USI subtable row (in the Secondary Index subtable) that references the actual data row, which resides on the second AMP.
  • A Non-Unique Secondary Index is an All-AMP operation and will usually require a spool file. Although a NUSI is an All-AMP operation, it is faster than a full table scan.
Secondary indexes can be useful for:
  • Satisfying complex conditions
  • Processing aggregates
  • Value comparisons
  • Matching character combinations
  • Joining tables
Advantages:
  1. A secondary index can be created and dropped dynamically.
  2. A table may have up to 32 secondary indexes.
  3. A secondary index can be created on any column, either unique or non-unique.
  4. It is used as an alternate access path for less frequently used cases; for example, defining an SI on a non-indexed column can improve performance if that column is used in the join or filter condition of a given query.
  5. Collecting statistics on SI columns makes sure the optimizer chooses the SI when it is better than doing a full table scan.
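A hedged sketch of creating a USI and a NUSI and collecting stats on the NUSI (the table and column names are hypothetical):

/* Unique Secondary Index: typically a two-AMP access */
CREATE UNIQUE INDEX (SSN) ON EMPLOYEE;

/* Non-Unique Secondary Index: an all-AMP access, still cheaper than a full table scan */
CREATE INDEX (DEPT_NO) ON EMPLOYEE;
COLLECT STATISTICS ON EMPLOYEE INDEX (DEPT_NO);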
Disadvantages
  1. Since Sub tables are to be created, there is always an overhead for additional spaces.
  2. They require additional I/Os to maintain their sub tables.
  3. The Optimizer may, or may not, use a NUSI, depending on its selectivity.
  4. If the base table is Fallback, the secondary index sub table is Fallback as well.
  5. If statistics are not collected accordingly, then the optimizer would go for Full Table Scan.
  6. Secondary Indexes need more I/O.
  7. Secondary Index slows down the process because of Sub-table maintenance.
  8. Secondary Index requires more Physical Storage.
221.How are the data distributed in Secondary Index Subtables in Teradata?
  • When a user creates a Secondary Index, Teradata automatically creates a Secondary Index Subtable.
    • The subtable will contain the:
    • Secondary Index Value
    • Secondary Index Row ID
    • Primary Index Row ID
  • When a user writes an SQL query that has an SI in the WHERE clause, the Parsing Engine will Hash the Secondary Index Value. The output is the Row Hash, which points to a bucket in the Hash Map.
  • That bucket contains an AMP number and the Parsing Engine then knows which AMP contains the Secondary Index Subtable pertaining to the requested USI information. The PE will direct the chosen AMP to look-up the Row Hash in the Subtable.
  • The AMP will check to see if the Row Hash exists in the Subtable and double check the subtable row with the actual secondary index value.
  • Then, the AMP will pass the Primary Index Row ID back up the BYNET network. This request is directed to the AMP with the base table row, which is then easily retrieved.
222.  What are the things to considered while creating secondary index?
  •   Creating a secondary index causes Teradata to build a sub-table to contain its index rows, thus adding another set of rows that requires updating each time a table row is inserted, deleted, or updated.
  • Secondary index sub-tables are also duplicated whenever a table is defined with FALLBACK, so the maintenance overhead is effectively doubled.
223. What is a primary index and a secondary index?
  • Primary index is the mechanism for assigning a data row to an AMP and a location on the AMP’s disks.
  • Indexes also used to access rows from a table without having to search the entire table.
  • Secondary indexes enhance set selection by specifying access paths less frequently used than the primary index path.
  • Secondary indexes are also used to facilitate aggregate operations. If a secondary index covers a query, the Optimizer determines that it would be less costly to access its rows directly rather than using it to access the base table rows it points to.
224. What is the purpose of indexes?
  • An index is a mechanism that can be used by the SQL query optimizer to make table access more performant.
  • Indexes enhance data access by providing a more-or-less direct path to stored data and avoiding the necessity to perform full table scans to locate the small number of rows you typically want to retrieve or update. 
225.How does indexing improve query performance?
  • By creating index on single or multiple columns, query takes index access path instead of full table scan.
  • A full table scan is a costlier operation because it has to visit every row in the table, whereas an index access just probes the index subtable with the required hash value and joins to the base table using the row ID. This improves response time and query performance.
  • Teradata automatically creates the Primary Index if one is not specified in the table DDL. The index types are:
    • PI
    • Secondary Index
    • PPI
    • Join Index
    • HASH Index
  • Indexing is a way to physically reorganise the records to enable some frequently used queries to run faster. The index can be used as a pointer to the large table.
  • It helps to locate the required row quickly and then return it back to the user.
  • or The frequently used queries need not hit a large table for data. they can get what they want from the index itself. – cover queries.
  • An index comes with the overhead of maintenance.
  • Teradata maintains its index by itself. Each time an insert/update/delete is done on the table the indexes will also need to be updated and maintained.
  • Indexes cannot be accessed directly by users. Only the optimizer has access to the index.
226.What are the differences between the followings? – Vertical & Horizontal Partitioning vs – Join & Hash Indexes vs – PPI
  • Horizontal partitioning is used in large tables where no of rows are very high.
  • Rows are broken and placed in partitions.
  • Loading becomes faster and query performance  would be good.
  • Vertical partitioning is used for tables with huge no of columns.
  • Column wise data  are broken and placed in partitions with a common key.
  • A multi/aggregate join index basically stores the results of the join in perm space.
  • A hash index basically stores the results of the table based on the hash value.
  • PPI basically partitions the data based on the column that is specified in the partition clause.
227. What are Value-Ordered NUSIs?
  • When we create a NUSI on a table, a subtable is built on all AMPs.
  • Each Subtable contains a row for the NUSI and corresponding row ids to the base rows located on the same AMP.
  • Rows in the subtable are sequenced by the row hash of the NUSI value.
  • This way of storing rows is convenient in storing rows with particular NUSI, but not good for range queries.
  • This is where a Value-Ordered NUSI becomes useful: a value-ordered NUSI allows rows in the subtable to be stored in order of data value rather than by the hash of the value, which is useful when performing range queries.
228.Creating value ordered NUSI when creating table.
Example 1:
CREATE TABLE EMPLOYEE1
(
Employeeid integer,
DepartmentNo integer,
Salary decimal(8,2),
Hiredate date
) INDEX (Departmentno) ORDER BY VALUES;

Doing a show table will show below:

SHOW TABLE EMPLOYEE1

CREATE SET TABLE EDW_RESTORE_TABLES.EMPLOYEE1 ,NO FALLBACK ,
   NO BEFORE JOURNAL,
   NO AFTER JOURNAL,
   CHECKSUM = DEFAULT,
   DEFAULT MERGEBLOCKRATIO
   (
   Employeeid INTEGER,
   DepartmentNo INTEGER,
   Salary DECIMAL(8,2),
   Hiredate DATE FORMAT 'YYYY/MM/DD')
PRIMARY INDEX ( Employeeid )
INDEX ( DepartmentNo ) ORDER BY VALUES ( DepartmentNo ); 
--> note that DEPARTMENTNO is used to order. 
Here since we had only one column in the index, it is used for ordering.

However, when the index is made up of multiple columns, we can choose which column to use for ordering.

Example 2:

CREATE TABLE EMPLOYEE1
(
Employeeid integer,
DepartmentNo integer,
Salary decimal(8,2),
Hiredate date
) INDEX (Departmentno,Employeeid) ORDER BY VALUES;
/* Multiple columns as part of INDEX*/



SHOW TABLE EMPLOYEE1

CREATE SET TABLE EDW_RESTORE_TABLES.EMPLOYEE1 ,NO FALLBACK ,
   NO BEFORE JOURNAL,
   NO AFTER JOURNAL,
   CHECKSUM = DEFAULT,
   DEFAULT MERGEBLOCKRATIO
   (
   Employeeid INTEGER,
   DepartmentNo INTEGER,
   Salary DECIMAL(8,2),
   Hiredate DATE FORMAT 'YYYY/MM/DD')
PRIMARY INDEX ( Employeeid )
INDEX ( Employeeid ,DepartmentNo ) ORDER BY VALUES ( DepartmentNo ); 
--> by default the system chooses the first column, DepartmentNo, for ordering.


In case of multiple columns being part of index 
we can specify the column which we would want to be used 
for ordering as shown in below example:

Example 3:

CREATE TABLE EMPLOYEE1
(
Employeeid integer,
DepartmentNo integer,
Salary decimal(8,2),
Hiredate date
) INDEX (Departmentno,Employeeid) ORDER BY VALUES(employeeid);


Creating Value ordered NUSI using CREATE INDEX syntax

CREATE INDEX(DepartmentNO) ORDER BY VALUES (DepartmentNO) on EMPLOYEE ;


Rules for using value ordered NUSI

The ordering can be done only on 1 single column.

In below we try to create a value ordered NUSI with two columns

CREATE TABLE EMPLOYEE1
(
Employeeid integer,
DepartmentNo integer,
Salary decimal(8,2),
Hiredate date
) INDEX (Departmentno,Employeeid) ORDER BY VALUES(Departmentno,Employeeid) ;

Following is the error message we get:

CREATE TABLE Failed. 5466: Error in Secondary Index DDL, Order by clause can have only one column. 

The column used for ordering must be part of the index definition.

In the following query we are ordering by a column that is not a part of index definition

CREATE TABLE EMPLOYEE1
(
Employeeid integer,
DepartmentNo integer,
Salary decimal(8,2),
Hiredate date
) INDEX (Departmentno,Employeeid) ORDER BY VALUES(salary) ;

Following is the error message we get:

CREATE TABLE Failed. 5466: Error in Secondary Index DDL,
 Order by field does not belong to the index. 


The column used for ordering must be numeric; non-numeric columns cannot be used.

The column must not be wider than 4 bytes, so only INTEGER, SMALLINT, BYTEINT, DATE and DECIMAL are valid. A DECIMAL is allowed provided its storage length does not exceed 4 bytes and it has no precision (fractional) digits.
229. What happens if I have not defined a PRIMARY INDEX or PRIMARY KEY on a table?
a) Neither a PRIMARY INDEX nor a PRIMARY KEY is defined: Teradata will check whether any column is defined as UNIQUE; if so, it will make that column the UNIQUE PRIMARY INDEX, otherwise the first column will be created as the (non-unique) PRIMARY INDEX.
b) No PRIMARY INDEX is defined, but a column is defined as PRIMARY KEY: in this case, Teradata will make the PRIMARY KEY column the UNIQUE PRIMARY INDEX of the table.
c) Both PRIMARY KEY and PRIMARY INDEX are defined, on different columns: in this case, Teradata will make the PRIMARY KEY column a UNIQUE SECONDARY INDEX, i.e. a UNIQUE INDEX on the table. So one must understand the importance of the PRIMARY INDEX in Teradata. Generally, the PRIMARY KEY concept is taken care of by the UNIQUE PRIMARY INDEX in a Teradata environment.
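An illustrative sketch of case (b), assuming the default behaviour described above (the table name is hypothetical):

CREATE TABLE CUSTOMER
(
CUST_ID INTEGER NOT NULL PRIMARY KEY,
CUST_NAME VARCHAR(50)
);
-- SHOW TABLE CUSTOMER; would reveal that the PRIMARY KEY column has been
-- implemented as UNIQUE PRIMARY INDEX ( CUST_ID ).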
230.How Teradata can accesses data ?
  • Primary Index (fastest)
  • Secondary Index (second fastest way)
  • Full Table Scan (slowest way)
Primary Index (fastest) – Whenever a Primary Index is utilized in the SQL WHERE clause, the PE can use the Primary Index to get the data with a one-AMP operation.
Secondary Index (next fastest) – If the Primary Index is not utilized, Teradata can sometimes utilize a secondary index. It is not as fast as the Primary Index, but it is much faster than a full table scan.
Full Table Scan (FTS) (slowest) – Teradata handles full table scans well because, thanks to parallel processing, it accesses each data row only once. Full table scans are a way to access Teradata without using an index; each data block of the table is read only once.
231.What are the reasons for product joins ?
  • 1.Stale or no stats causing optimizer to use product join
  • 2. Improper usage of aliases in the query.
  • 3. missing where clause ( or Cartesian product join  1=1 )
  • 4. non equality conditions like > ,< , between   example ( date)
  • 5. few join conditions
  • 6. When OR conditions are used in the join. Last but not least: product joins are not always bad; sometimes a product join performs better than other join types.
233.Create Table: When you are designing/creating the table, check the below steps.
  • Always create MULTISET tables. If you have a requirement to create a SET table, make sure you have a unique primary index or unique secondary index on it.
  • If there is no UPI or USI on a SET table, every new row is checked byte-by-byte against every existing row with the same row hash value to make sure a duplicate is not inserted into the SET table.
  • Choose the primary index that has even distribution and high join access. If the column has unique values and is not a primary index then create Unique Secondary Index. But Fastload and Multiload will not work if the table has Unique Secondary  index.
  • Do not create any UNICODE columns unless absolutely required. UNICODE requires double the storage space and stats often become unusable for Teradata.
  • If the table has no updates then you can change default Free Space Percent Parameter of a table to 0 so that the data block can hold many rows and then all rows can be retrieved in a single I/O.
  • Try to create vertical partition if the table has too many columns and some of the columns are used very rare.Ifyou reduce the table size then the data block can hold many rows so I/O can be more efficient.
  • If the 90% of users queries data for current year then horizontal partition gives best performance.
  • You can split the table by creating two tables history and current.
  • Another approach is by creating partition. Teradata partition reduces the overhead of scanning the complete table thus improving performance. Choose the partition table when is accessed by date ranges or character columns. Make sure that number of partitions is less.
  • Compression reduces the storage capacity and the number of I/O’s required for an operation. Wherever possible, do the compression.
  • If there is a join column in a table and  has more null values then replace   the null values with some random number using random function and then multiply with -1 then those records can be easily distinguished from other records. Better to give random range like 1 to 10000 so it will not impact distribution if there is an upgrade in near future. Business users don’t want to see negative or unknown members in the reports so replace negative values with null in an application layer.
  • Try to define all join columns with not null at table level even though source is providing null values. ETL logic should populate it with some random values if they are null so there will be no  data skew issues or spool out error in applications.
  • Wherever possible try to avoid outer joins. We can avoid outer joins by inserting some unknown (dummy) records into the dimension (master) table. Same range values will be inserted into the FACT (Transaction) table wherever there is a null in corresponding column.
  • Note down that there is a difference between create table (copy) and create table (select). Create table (copy) retains all definitions and indexes where as create table (select) not.
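A short hedged sketch of the two forms (database and table names are illustrative):
/* CREATE TABLE (copy): retains the column definitions, the primary index and the secondary indexes */
CREATE TABLE SQL_Class.Employee_Copy AS SQL_Class.Employee_Table WITH NO DATA;

/* CREATE TABLE (select): only the selected columns survive and indexes are not copied,
   so the primary index should be stated explicitly */
CREATE TABLE SQL_Class.Employee_Subset AS
 (SELECT Employee_No, Dept_No, Salary FROM SQL_Class.Employee_Table)
WITH DATA
PRIMARY INDEX (Employee_No);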

TERADATA JOIN INDEX

234. What is a join index? What are the benefits of using a join index?
  • Join Indexes are created so data doesn’t have to move to satisfy the join.
  • The Join Index essentially pre-joins the table and keeps it hidden for the Parsing Engine to utilize.
  • Remember, the user cannot select directly from the Join Index.
  • The PE decides when it is to be used whenever the user queries the base table or the views of the base table.
  • Join Indexes are physically stored exactly like normal Teradata tables.
  • Users can't query the Join Index directly; the PE decides when to use it.
  • Join Indexes are automatically updated as base tables change.
  • Join Indexes can have Non-Unique Primary Indexes (NUPI) only.
  • Join Indexes can have Non-Unique Secondary (NUSI) Indexes.
  • Max 64 Columns per Table per Join Index
  • BLOB and CLOB types cannot be defined.
  • Triggers with Join Indexes allowed V2R6.2
  • After restoring a table, drop and recreate the Join Index.
  • FastLoad/MultiLoad won't load to tables with a Join Index defined.
  • It is an index that is maintained by the system.
  • It maintains rows joined on two or more tables.
  • Join index is useful for queries where the index structure contains all the columns referenced by one or more joins, thereby allowing the index to cover all or part of the query
235. Benefits of using a join index:
  • To eliminate base table access.
  • Aggregate processing is eliminated by creating aggregate join index.
  • It reduces redistribution of data since data is materialized by JI.
  • Reduces complex join conditions by using covering queries. 
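A hedged example (database name assumed) of a multi-table join index that pre-joins the Employee and Department tables so the PE can avoid redistributing rows for this join:
CREATE JOIN INDEX SQL01.Emp_Dept_JI AS
SELECT E.Employee_No, E.Last_Name, D.Dept_No, D.Dept_Name
FROM   SQL01.Employee_Table E
INNER JOIN SQL01.Department_Table D
       ON E.Dept_No = D.Dept_No
PRIMARY INDEX (Dept_No);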
236.What are Teradata Join Indexes best suited for?
  • Frequent joins of middle-to-large tables where a significant number of rows from both tables is being joined.
  • Frequent joins of large tables where a relatively small set of columns is repeatedly requested.
  • For frequently run queries with complex expression in its predicate.
  • For join indexes assistance in de-normalization.
  • Perfectly suited for alternative partitioning.
  • Very useful when alternate keys are used a lot for joining.
  • Move large and time-consuming joins or aggregations into the batch windows.
  • Join indexes even allow direct access via the primary index of the JI.
237.The Teradata Join Index and Partitioning
  • The Join Index can be partitioned
  • Partitioned JI for a non-partitioned base table is ok, but non-partitioned JI on partitioned base table not recommended from a performance point of view
  • You cannot specify a partition for a row compressed Join Index

238. Full or partial coverage of the Join Index
If the join index is not fully covering, it can only be used to locate the ROWID of the base table data row. A covering join index is the special case where the join index contains all required columns and can satisfy the query directly without accessing the base table.
First of all, it is important to know:
There is no full coverage for SELECT * FROM table_name queries.
Avoid writing SQL queries without a column list if you want to achieve full join index coverage.
The join index can be used with partial coverage in the following situations:
  • One of the columns in the join index definition is the keyword ROWID. In this case, the base table ROWID is stored with each index row and a lookup of the base table row can be done.
  • The column set of the UPI of the underlying table is part of the definition. In this case the column combination can be hashed and the ROWID of the base table row can be derived.
  • The column set of the NUPI of the underlying table is part of the definition plus either
    (a) one of the columns in the definition of that index is the keyword ROWID, or
    (b) the column set defining a USI on the underlying base table.
For (a), as in the case of USI coverage, the base table rows can be accessed via the ROWID.
For (b), similar to the UPI case described above, the base table rows are accessed indirectly by hashing the USI columns, extracting the base table ROWIDs from the USI into spool and retrieving the base table rows by scanning this spool.
Coverage does not guarantee use of a Join Index: the cost of using it must be estimated to be less than the cost of not using it!
239.Restrictions on SQL Operations
  • FULL OUTER JOIN not allowed
  • LEFT or RIGHT JOINS: on the inner side at least 1 non-nullable column
  • OUTER JOIN preferable for Join Index usage likelihood, but not allowed for Aggregate JI
  • HAVING and QUALIFY not allowed
  • No Set Operations: UNION, INTERSECT, MINUS
  • No Sub queries
  • No inequality conditions for ON clauses in join index definitions. Supported only if ANDed to at least one equality join condition
  • Only <, <=, >=, > as comparison operators in join allowed
  • TOP n and SAMPLE not allowed
You can list the join indexes defined in the system with:
SELECT * FROM dbc.indices WHERE indextype = 'J';
240. TYPES OF JOIN INDEXES
Multi-Table Join Index
A Multi-Table Outer Join Index has some very specific rules to remember: the Outer Join Index also keeps the rows that did NOT match.
Compressed Multi-Table Join Index
  • A Compressed Multi-Table Join Index won't keep repeating the same Customer_Number and Customer_Name, but lists each only once.
  • Such a compressed Join Index is an example of row compression because it saves space by not duplicating the Customer_Number or Customer_Name; some rows are essentially smaller, and this is a form of compression.
 Single-Table Join Index
  • If a USER queries with the Dept_No in the WHERE clause this will be a Single-AMP retrieve.
  • If the USER joins the Employee and Department Tables together then Teradata won’t need to Redistribute or Duplicate to get the data AMP local. 
Single Table Join Index Great For LIKE Clause
  • Build a STJI that contains three columns: the LIKE column being queried, the primary index column of the base table, and the keyword ROWID.
  • The PE will choose to scan the narrow table (the Join Index) and qualify all rows that match, for example, a car license LIKE 'TeraT%'.
  • Then the PE uses the ROWID to get the data from the BMV_Table; the base table row is on the same AMP as the join index row because both have the same Primary Index.
  • This can save enormous time for queries using the LIKE command.
  • A LIKE command on a base table will never use a Non-Unique Secondary Index (NUSI).
  • The above technique should be tested and only used if a lot of users are utilizing the LIKE command on a large table. If that is the case a lot of time can be saved.
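A hedged sketch of the three-column single-table join index described above (the license column and the base table's primary index column are assumed names):
CREATE JOIN INDEX SQL01.BMV_License_JI AS
SELECT License_Plate        /* the column queried with LIKE           */
     , Owner_ID             /* the primary index column of BMV_Table  */
     , ROWID                /* lets the PE jump back to the base row  */
FROM   SQL01.BMV_Table
PRIMARY INDEX (Owner_ID);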
Single Table Join Index with Value Ordered NUSI
  • A Value Ordered NUSI can only be done on columns that are 4-byte integers.
  • Dates qualify because they are stored internally in Teradata as 4-byte integers.
  • A value ordered index has been expanded from 16 to 64 columns.
  • Indexes are always sorted by their hash, but a Value Ordered index is sorted on each AMP by values and not hash.
Aggregate Join Indexes
  • Aggregate Join Indexes may be defined on:
    • Single Tables – A columnar subset of a base table with aggregates automatically maintained by Teradata.
    • Multiple Tables -A columnar subset of as many as 64 base tables with aggregate columns automatically maintained by Teradata.
  • Sparse Join Indexes are defined with a WHERE clause that limits the number of base table rows included and the space required to store them.
  • Aggregate Join Indexes can only include SUM and COUNT values.
  • You can calculate Averages from these two columns though.
Sparse Join Index
A Sparse Join Index has a WHERE clause so it doesn’t take all the rows in the table, but only a portion. This is a very effective way to save space and focus on the latest data.
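A hedged sketch combining the two ideas above: a sparse aggregate join index that keeps only rows for the current year and maintains SUM and COUNT (the table, column names and date range are assumptions):
CREATE JOIN INDEX SQL01.Sales_Agg_JI AS
SELECT   Product_ID
       , SUM(Sale_Amount) AS Total_Sales
       , COUNT(*)         AS Sale_Count
FROM     SQL01.Sales_Table
WHERE    Sale_Date >= DATE '2018-01-01'
GROUP BY Product_ID
PRIMARY INDEX (Product_ID);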
Global Multi-Table Join Index
With the ROWID inside the Join Index, the PE can get columns in the User’s SQL NOT specified in the Join Index directly from the Base Table by using the Row-ID.
 Hash Index
A Hash Index can be ordered by values or by hash. A hash index provides:
  • The storage of a vertical subset (columns) of the table in the hash index structure.
  • The selection of a better Primary Index in order to avoid costly redistribution activities for join preparation.
  • The ordering of rows by value, to support range scans.
  • Hash indexes are similar to single-table simple join indexes in that they are used for denormalizing a single table by storing rows in a separate physical object from the base table.
  • Hash indexes are limited to a SINGLE table and are beneficial in improving efficiency. Like join indexes, Hash index rows are stored in a different object than the base table rows.
  • This allows for faster access without needing access to the underlying rows. Also like their relative, they function as an alternative to the base rows instead of providing a different access path.
  • Like join indexes, hash indexes are known as “covered queries” because they are utilized when they contain all the columns requested by the query. If all of the columns are covered, the optimizer will usually choose to access the rows via the hash index instead of the base table rows.
  • In a situation where the hash index partially covers the query, the optimizer may use the Row ID included with the Hash Index to access the other columns in the data row. Join indexes and hash indexes are both transparently maintained by Teradata. When the base table changes, the Hash Index table is automatically updated.
  • This automatic task is referred to as an update burden.
  • Being that Hash Indexes are strikingly similar in functionality to secondary indexes, they should be carefully considered because they carry this update burden.
  • Hash indexes however, can offer marginal performance gains over a secondary index when certain columns are frequently needed from one table to join with many others. In these situations, the Hash index partially covers the query, but fully covers the rows from the base table, effectively eliminating any need to access the base table.
  • Hash Indexes:
    • Can be ordered by hash or by values to facilitate range queries.
    • Automatically include the Row ID of the base table row, which the RDBMS software can use to access columns not included in a "covered query".
Maybe the most outstanding limitations compared with a Single Table Join Index are:
  • – A Hash Index cannot have a Partitioned Primary Index
    – A Hash Index cannot have a Non-Unique Secondary Index.
    – Hash Indexes cannot be specified for NOPI or column‑partitioned base tables as they are designed around the Teradata hashing algorithm.
    – A hash index cannot be column partitioned
    –  A hash index must have a Primary Index, a Single Table Join Index can be created with or without a primary index if a table is column-partitioned (as column stores on Teradata never have a Primary Index)
Hash indexes are stored like base tables but cannot be accessed directly by a query. The differences between hash indexes and single-table join indexes are: 1) hash indexes cannot have a secondary index, but single-table join indexes can; 2) a hash index column list cannot contain aggregate functions, but a single-table join index supports aggregate functions in its column list.
The optimizer includes a hash index in a query plan when the index covers all or part of a join query, thus eliminating the need to redistribute rows. However, a hash index carries more maintenance burden than a secondary index. The storage of a hash index is similar to base table storage, except that hash indexes can be compressed.
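A minimal CREATE HASH INDEX sketch (table and column names assumed), ordered by values on the date column to support range scans:
CREATE HASH INDEX SQL01.Order_HI
 (Order_Date, Customer_Number)
ON SQL01.Order_Table
BY (Customer_Number)
ORDER BY VALUES (Order_Date);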



TERADATA UTILITIES

BTEQ

  • BTEQ is a report writer, unlike SQL Assistant which is more of a spreadsheet
  • Queries can be run interactively or in a Batch Script!
  •  It is a cool report writer and can do things SQL Assistant cannot do
  • Batch TEradata Query (BTEQ) tool was the original way that SQL was submitted to Teradata as a means of getting an answer set in a desired format
  • Remember that BTEQ commands begin with a period (.) and do not require a semi-colon (;) to end the statement. SQL commands never start with a period and must always be terminated with a semi-colon.
 SELECT * FROM SQL_Class.Employee_Table WITH SUM(Salary) ;
 If you just use the WITH statement, you can get aggregate grand totals.
 WITH and WITH BY Together for Subtotals and Grand Totals
SELECT * FROM SQL_Class.Employee_Table
WITH SUM(Salary) BY Dept_No
WITH SUM(Salary) ;
BTEQ Export – Four types of Export Variations
 
  • EXPORT DATA–Exports data in record mode, but completely unreadable to viewers.
  • EXPORT Report—Exports in report mode, which includes the headers and is readable.
  • EXPORT INDICDATA–Exports in INDICDATA mode warning mainframes of NULLs.
  • EXPORT DIF –Exports and converts data into Data Interchange Format for PC’s.
  • BTEQ allows for multiple techniques to export data. We usually think of an export as moving data off of Teradata to a normal flat file. That is example number one and that is called RECORD Mode.
  • BTEQ can actually take your SQL output report and include the Headers and export all together. It looks like an electronic report. That is EXPORT REPORT mode. This is also called Field Mode
  • When there are NULLs in your data and you export them to a mainframe, the mainframe application could run into problems, so INDICDATA mode flags the NULL values.
  • Use DIF so the output can be read by PC applications that understand Data Interchange Format.
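A short hedged sketch of a BTEQ export in REPORT mode (logon and file path follow the style of the scripts later in this section):
.LOGON localtd/dbc, apple123
.EXPORT REPORT FILE = C:\Temp\EmpReport.txt
SELECT   Last_Name, First_Name, Salary
FROM     SQL_Class.Employee_Table
ORDER BY Last_Name ;
.EXPORT RESET
.LOGOFF
.QUIT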

 
 
 We are taking data from our flat file called C:\Temp\CustData.txt and importing the records into the Customer_Table.
.QUIET ON
Limits the output to reporting only errors and request processing statistics
.REPEAT
Causes BTEQ to read records until End of File.
USING
Defines the input data from the host.

Creating a BTEQ IMPORT for a Comma Separated Value File
Each input column is defined as VARCHAR in the USING clause; this is necessary when using CSV (VARTEXT) files.
.LOGON localtd/dbc, apple123
.IMPORT VARTEXT ',' FILE = C:\Temp\var1.txt
.REPEAT *
USING (IN_CustNo      VARCHAR(11)
      ,IN_Cust_Name   VARCHAR(20)
      ,IN_Cust_Phone  VARCHAR(8))
INSERT INTO SQL_Class.Customer_Table
VALUES (:IN_CustNo
       ,:IN_Cust_Name
       ,:IN_Cust_Phone) ;
.LOGOFF
.QUIT
To run a Teradata BTEQ script, go to your command prompt and redirect the script file into BTEQ:
C:\> BTEQ < C:\Temp\Script1.txt
Or enter BTEQ, log on, and run the file from the BTEQ command prompt:
.LOGON localtd/dbc, apple123
.RUN FILE = C:\Temp\Script1.txt


 
 
Multiple sessions are useless unless you are using the Primary Index or a Unique Secondary Index in the utility. Typically, multiple sessions are used in BTEQ Imports but not in BTEQ Exports.
All three tables must have the same Primary Index for the Fast Path to be taken.
The most important thing to notice is where the semi-colon is located: at the beginning of the next line. That is how BTEQ makes two separate statements become one transaction, as in the sketch below.
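A hedged sketch (the staging table names Emp_Stage1 and Emp_Stage2 are made up for illustration): because the semi-colon that ends the first INSERT/SELECT is placed at the beginning of the line holding the second statement, BTEQ treats both statements as a single transaction.
INSERT INTO SQL_Class.Employee_Table
SELECT * FROM SQL_Class.Emp_Stage1
;INSERT INTO SQL_Class.Employee_Table
SELECT * FROM SQL_Class.Emp_Stage2 ;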

FASTEXPORT


  • Moves data off Teradata into flat files in 64 K blocks to mainframe or computer.
  • Use FastExport instead of BTEQ Export if there are more than 500,000 rows.
  • A Block Level Utility so no more than 60 FastExport jobs can run simultaneously.
  • V13 enhanced so it won’t use spool before exporting rows.

  • Before Teradata V2R6.0 the DBS control parameter MaxLoadTasks had a max of 15 block level utilities that include FastLoad, MultiLoad and FastExport.
  • This meant that no more than 15 Block Utilities combined could run simultaneously.
  • After V2R6.0 a new DBS control parameter named MaxLoadAWT was introduced.
  • MaxLoadAWT allows for up to 30 FastLoad and MultiLoad jobs to run concurrently.
  • MaxLoadAWT also allows for up to 60 FastExport jobs to run concurrently (minus FastLoad or MultiLoad jobs).
  • If MaxLoadAWT is set to 0 then things run like the old days and no more than 15 block utilities can run simultaneously.
  • If MaxLoadAWT is set > 0 then new rules take effect, but remember this can impact queries.


FastExport supports four output record formats:
  • 1) FASTLOAD – a two-byte integer (n) followed by n bytes of data, followed by an end-of-record marker.
  • 2) BINARY – a two-byte integer (n) followed by n bytes of data.
  • 3) TEXT – an arbitrary number of bytes of data followed by an end-of-record marker.
  • 4) UNFORMAT – exported as it is received from the Call Level Interface (CLI).

.LOGTABLE SQL01.Stu_Export_Log ;
.LOGON Teradata V14/dbc, abc123 ;
  Create the log.
Logon using the TDP ID, User and Password.
  .BEGIN EXPORT
SESSIONS 12;
  Begin FastExport and set the sessions to 12.
  .EXPORT OUTFILE 'C:\temp\Student.txt'
FORMAT BINARY ;
  Define the output file name and location and set the format statement to binary.
  SELECT Student_ID (CHAR(11)),
          Last_Name (CHAR(20)),
          First_Name (CHAR(14)),
          Class_Code (CHAR(2)),
          Grade_Pt (CHAR(8))
FROM SQL_Class.Student_Table ;
  The SQL to export from Teradata, including the columns and the data types to export them to.
  .END EXPORT ;
.LOGOFF ;
  End the FastExport and Logoff.

 FastExport by Default places Null Indicators in Output

.LOGTABLE SQL01.Student_Table_Lg ;
   .LOGON Teradata V14/dbc, abc123 ;
  Create the log.
Logon using the TDP ID, User and Password.
  .BEGIN EXPORT ;
  Begin FastExport.
   .EXPORT OUTFILE ‘C:\temp\Student2.txt’
FORMAT FASTLOAD MODE INDICATOR ;
  Define the output file name and location and set the format to FASTLOAD. Place Indicators for NULL Values (That is the default)
  SELECT Student_ID (INTEGER),
          Last_Name (CHAR(20)),
          First_Name (VARCHAR(12)),
          Class_Code (CHAR(2)),
          Grade_Pt (DECIMAL(5,2))
FROM SQL_Class.Student_Table ;
  The SQL to export from Teradata, including the columns and the data types to export them to.
  .END EXPORT ;
.LOGOFF ;
.QUIT
  End the FastExport, Logoff and Quit.

FORMAT FASTLOAD MODE INDICATOR ;
set the format to FASTLOAD. Place Indicators for NULL Values (That is the default)
No Spool Options with FastExport
.BEGIN EXPORT WITH NO SPOOL ;
New in Teradata V13.10 there is the option for no spool in a FastExport job: the job runs without the exported answer set first going to spool and then being exported.
  •  BEGIN EXPORT [WITH SPOOL]
  • BEGIN EXPORT [WITH NO SPOOL]
  •  BEGIN EXPORT [WITH NO SPOOL ONLY]
The NO SPOOL ONLY will error out if the job isn’t eligible for NO SPOOL.
FastExport jobs eligible for NO SPOOL cannot have an ORDER BY clause, Join Tables, have Aggregates, be Multi-Statement requests or use reserved words such as DATE or USER.

FastExport with No Spool

  .LOGTABLE SQL01.Student_Table_Lg ;
.LOGON Teradata V14/dbc,abc123 ;
  Create the log.
Logon using the TDP ID, User and Password.
  .BEGIN EXPORT WITH NO SPOOL ;
  Begin FastExport but use no spool.
  .EXPORT OUTFILE ‘C:\temp\Student3.txt’
FORMAT TEXT MODE INDICATOR;
  Define the output file name and location and set the format to TEXT. Place indicators for NULL values (that is the default).
  SELECT Student_ID (CHAR(11)),
          Last_Name (CHAR(20)),
          First_Name (CHAR(14)),
          Class_Code (CHAR(2)),
          Grade_Pt (CHAR(8))
FROM “SQL_CLASS”.“Student_Table” ;
  The SQL to export from Teradata, including the columns and the data types to export them to. These columns have been converted to CHAR so it will be easier to import into another database
  .END EXPORT;
.LOGOFF;
.QUIT;
  End the FastExport, Logoff and Quit.
This job is joining two tables so it is not eligible for the NO SPOOL Options.
FastExport that Joins Two Tables
  .LOGTABLE SQL01.Emp_Dept_Lg ;
.LOGON Teradata V14/dbc, abc123 ;
  Create the log.
Logon using the TDP ID, User and Password.
  .BEGIN EXPORT Sessions 12 ;
  Begin FastExport with 12 sessions.
  .EXPORT OUTFILE 'C:\temp\Emp_Dept.txt'
MODE RECORD FORMAT TEXT ;
  Define the output file name and location and set the format.
  SELECT E.Employee_No (CHAR(11)),
          E.First_Name (CHAR(14)),
          E.Last_Name (CHAR(20)),
          D.Dept_No (CHAR(6)),
          D.Dept_Name (CHAR(20))
FROM SQL_CLASS.Employee_Table as E INNER JOIN
          SQL_Class.Department_Table as D ON E.Dept_No = D.Dept_No ;
  The SQL to export from Teradata, including the columns and the data types to export them to. Notice that we are joining two tables together here.
This job is joining two tables so it is not eligible for the NO SPOOL Options.
  .END EXPORT;
.LOGOFF;
.QUIT;
  End the FastExport, Logoff and Quit.
FastExport can join multiple tables together and then export the result set off of Teradata. This example shows exactly that, but this job is not eligible for NO SPOOL because it joins multiple tables together, and thus will need spool to complete the task.
How to Eliminate Indicators in your FastExport Script
.LOGTABLE SQL01.Student_Table_Lg ;
.LOGON Teradata V14/dbc, abc123 ;
Create the log.
Logon using the TDP ID, User and Password.
.BEGIN EXPORT ;
Begin FastExport.
.EXPORT OUTFILE ‘C:\temp\Student4.txt’
FORMAT FASTLOAD MODE RECORD ;
Record Mode is not the default mode, but it will NOT include indicators. This is a great way to export if you expect to use FastLoad to load back into Teradata.
SELECT Student_ID (INTEGER),
          Last_Name (CHAR(20)),
          First_Name (VARCHAR(12)),
          Class_Code (CHAR(2)),
          Grade_Pt (DECIMAL(5,2))
FROM SQL_Class.Student_Table ;
The SQL to export from Teradata, including the columns and the data types to export them to.
.END EXPORT;
.LOGOFF;
.QUIT;
End the FastExport, Logoff and Quit.




Executing a FastExport Script
.LOGTABLE SQL01.Student_Table_Lg ;
.LOGON Teradata V14/dbc, abc123;
.BEGIN EXPORT ;
.EXPORT OUTFILE ‘C:\temp\Student4.txt’
FORMAT FASTLOAD MODE RECORD ;
SELECT Student_ID (INTEGER),
          Last_Name (CHAR(20)),
          First_Name (VARCHAR(12)),
          Class_Code (CHAR(2)),
          Grade_Pt (DECIMAL(5,2))
FROM SQL_Class.Student_Table ;
.END EXPORT;
.LOGOFF;
.QUIT;
Go to the DOS or UNIX prompt and run the FastExport command against the script file.
You can also create, schedule and execute your FastExport job from the Nexus.

FastLoad


241.Explain Fast Load in Teradata?
  • Loads a large amount of data from an external file into an empty table at high speed. Only one empty table can be loaded per FastLoad job.
  • Tables defined with referential integrity, secondary indexes, join indexes, hash indexes or triggers cannot be loaded with FastLoad.
  • Duplicate rows cannot be loaded into a MULTISET table with FastLoad.
  • FastLoad has two phases – Data Acquisition and Application.
  • It requires separate error table for each phase.
  • Use the BEGIN LOADING statement to specify (and create) the two error tables for the operation.
  • We can specify the error table in the same database as the data table or different database.
  • We must remove the error tables before we re-run the same load job or it will terminate in an error condition.
  • Loads data in 64K blocks and
  • only 30 FastLoad and MultiLoad combinations can run simultaneously.
  • No Secondary Indexes (FastLoad)
  • No Unique Secondary Indexes (FastLoad and MultiLoad)
  • No Triggers (FastLoad and MultiLoad)
  • No Join Indexes (FastLoad and MultiLoad)
  • No Referential Integrity (FastLoad and MultiLoad)
  • Only 30 FastLoad and MultiLoad combinations can run at one time.    
  • Only 60 FastExport jobs can run simultaneously!
242. FastLoad has some limitations:
  • The target table must be empty.
  • FastLoad performs INSERTs only (no updates).
  • Only one target table can be loaded at a time.
  • Secondary indexes must be dropped before loading and recreated afterward. (MultiLoad allows non-unique secondary indexes and automatically rebuilds them after loading.)
  • It does not load duplicate data.
  • It does not run in different modes (MultiLoad has a) Import mode and b) Delete mode).
  • It does not support tables with a NUSI.
  • It applies an ACCESS lock on the target table during loading.
Phase 1 – Acquisition Phase
• Transfer the 64K blocks from the host to the AMPs.
• Have each AMP hash the rows by the table’s Primary Index and send them to a work table on the correct AMP.


Phase 2 – Application Phase
• Sort the data in the work tables by their Row-ID.
• Write the data to the table.


243. Why doesn't FastLoad load duplicate rows into a MULTISET table in Teradata?
  • FastLoad does not load duplicate rows because of its restart capability.
  • Say the FastLoad job fails. Up to the point of failure, some number of rows has already been sent to the AMPs.
  • If you restart FLOAD, it starts loading records from the last checkpoint, so some of the consecutive rows are sent a second time.
  • These rows are caught as duplicates when the data is sorted.
  • This restart logic is the reason FastLoad will not load duplicate rows into a MULTISET table.
  • FastLoad supports MULTISET tables but does not load the duplicate rows.
  • MULTISET tables are tables that allow duplicate rows.
  • When FastLoad finds duplicate rows it discards them.
  • FastLoad can load data into a MULTISET table but will not load the duplicate rows.
244. How many error tables are there in FLOAD and MLOAD, and what is their significance/use? Can we see the data of the error tables?
FLOAD uses 2 error tables:
  Error table 1: rows where the format of the data is not correct.
  Error table 2: violations of the UPI (unique primary index).
MLOAD uses 2 error tables (ET and UV), 1 work table and 1 log table:
1. ET TABLE – data errors. MultiLoad uses the ET table, also called the Acquisition Phase error table, to store data errors found during the acquisition phase of a MultiLoad import task.
2. UV TABLE – UPI violations. MultiLoad uses the UV table, also called the Application Phase error table, to store data errors found during the application phase of a MultiLoad import or delete task.
3. WORK TABLE (WT) – MLOAD loads the selected records into the work table.
4. LOG TABLE – the log table maintains a record of all checkpoints related to the load job; it is essential/mandatory to specify a log table in an MLOAD job. This table is useful in case the job aborts or has to restart for any reason.
Yes, the error tables can be queried like normal tables.
245. How do you skip the header row in a FastLoad script?
RECORD 2; /* this skips first record in the source file */ DEFINE …
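A hedged fragment (field and file names assumed) showing RECORD 2 in context:
RECORD 2;                             /* start at record 2, skipping the header row */
DEFINE  Employee_No (VARCHAR(11))
      , Last_Name   (VARCHAR(20))
FILE = C:\Temp\EmpWithHeader.txt ;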
246. Can I use a DROP statement in the FLOAD utility?
  • YES, but you have to declare it outside the FLOAD block, i.e. it must not come between BEGIN LOADING and END LOADING.
  • FLOAD also supports DELETE, CREATE and DROP statements, which must be declared outside the FLOAD block; inside the block only INSERT is allowed (see the skeleton below).
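A hedged skeleton (logon, table and file names are illustrative) showing where the DDL sits relative to the load block:
.LOGON Teradata V14/dbc, abc123 ;
DROP TABLE SQL_Class.Emp_Stage_ERR1 ;   /* DDL is allowed here, outside the load block   */
DROP TABLE SQL_Class.Emp_Stage_ERR2 ;
DELETE FROM SQL_Class.Emp_Stage ;       /* DELETE is also allowed before BEGIN LOADING   */
DEFINE Employee_No (INTEGER)
     , Last_Name   (CHAR(20))
FILE = C:\Temp\EmpStage.txt ;
BEGIN LOADING SQL_Class.Emp_Stage
      ERRORFILES SQL_Class.Emp_Stage_ERR1
               , SQL_Class.Emp_Stage_ERR2 ;
INSERT INTO SQL_Class.Emp_Stage VALUES ( :Employee_No, :Last_Name ) ;  /* only INSERT between BEGIN and END LOADING */
END LOADING ;
LOGOFF ;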
247. FastLoad to a NoPI Table
A NoPI table is useful for a Staging Table.
Loading data into a NoPI staging table will be faster when compared to the same table with a Primary Index.
• Data can be loaded into a staging table quicker using FastLoad if it is a NoPI table thus freeing up client resources earlier.
• The data-redistribution processing in the acquisition phase is done more efficiently by using bigger blocks to distribute the rows between AMPs (4KB versus 64KB).
• The End Loading (sort phase) is completely eliminated with a NoPI table. 
  • While a NoPI target table is being loaded with FastLoad, users can view the table content with an ACCESS lock.
  • This is allowed because rows are always appended at the end of a NoPI table. This is not allowed on a PI target table until the data has been sorted which does not happen until the end of Phase 2.
248.Can you load multiple data files for same target table using Fastload?
Yes, we can load a table using multiple data files in FastLoad. Before giving the END LOADING statement, the user can define the next file path and an INSERT for each source file, and give the END LOADING statement only at the end.
249. Loading Multiple Input Files with FastLoad
  • No END LOADING statement causes a “FastLoad Paused” state.
  • Then, we run the job again with a different flat file.
  • When the second FastLoad uses the END LOADING statement, this indicates no more data and Phase 2 starts.
  • This is the only way to trick FastLoad into using multiple input files.

 250.If Fast Load Script fails and only the error tables are made available to you, then how will you restart?
There are basically two ways of restarting in this case.
  • Making the old file to run – Make sure that you do not completely drop the error tables. Instead, try to rectify the errors that are present in the script or the file and then execute again.
  • Running a new file – In this process, the script is executed simply using end loading and beginning statements. This will help in removing the lock that has been put up on the target table and might also remove the given record from the log table. Once this is done, you are free to run the whole script once again.
  • Execute the FastLoad script after removing the DML block, i.e. keep only the BEGIN LOADING and END LOADING statements in the script.
  • Another method is to drop the target table and create it again.
The block utilities have limits that prevent too many of them running at the same time. This feature is actually controlled by a new DBS Control parameter named MaxLoadAWT which controls AMP Worker Tasks (AWT). When MaxLoadAWT is set to zero, then it is like going back in time to pre-V2R6.0 where only 15 FastLoad, MultiLoad and FastExport jobs can run max.
FastLoad permits one conversion per column from one data type to another. Notice that our Accounts_Table has different data types than the flat file, but FastLoad will convert them automatically within the load.
A FastLoad that Uses the NULLIF Statement
NULLIF allows you to specify that if an input field contains a specific value, it should be treated as NULL. A common example is dates entered as zeroes, which would normally cause a failure; a hedged fragment is shown below.
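A hedged DEFINE fragment (field and file names assumed) where a zero date loads as NULL:
DEFINE  Account_No  (INTEGER)
      , Open_Date   (INTEGER, NULLIF = 0)   /* a zero in the input is stored as NULL */
      , Balance     (DECIMAL(10,2))
FILE = C:\Temp\AcctFlat.txt ;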
251. FastLoad and Referential Integrity Solutions
Approach Number One
Approach Number Two
FastLoad and CHECKPOINT
.LOGON Teradata V14/dbc, abc123 ;
CREATE TABLE SQL_Class.Employee_Table
(Employee_No    INTEGER
,Dept_No              INTEGER
,Last_Name          CHAR(20)
,First_Name         VARCHAR(12)
,Salary                    DECIMAL(10,2)
) Unique Primary Index (Employee_No) ;
DEFINE    Employee_No (INTEGER)
                  , Dept_No (SMALLINT)
                  , Last_name (CHAR(20))
                  , First_name (VARCHAR(12))
                  , Salary (DECIMAL(8,2))
FILE=C:\Temp\EmpFlat.txt;
BEGIN LOADING SQL_CLASS.Employee_table
         ERRORFILES SQL_CLASS.Employee_table_ERR1
                        , SQL_CLASS.Employee_table_ERR2
         CHECKPOINT 1000000 ;
INSERT INTO
SQL_CLASS.Employee_table VALUES
   ( :Employee_No, :Dept_No, :Last_name, :First_name, :Salary ) ;
END LOADING;
LOGOFF;
CHECKPOINT is used to verify that rows have been transmitted and processed and specifies the number of rows transmitted before pausing to take a checkpoint to verify receipt by the AMPs.
If the CHECKPOINT parameter is NOT specified, FastLoad takes a default checkpoint as follows:
• Beginning of Phase 1
• Every 100,000 input records
• End of Phase 1
FastLoad can be restarted and will continue from the previous checkpoint.
Checkpoints slow down processing so set the CHECKPOINT large enough that checkpoints are taken about every 10-15 minutes. This usually requires a CHECKPOINT value around 1000000(one million).
Loading Multiple Input Files with FastLoad
.LOGON Teradata V14/dbc, abc123 ;
CREATE TABLE SQL_Class.Employee_Table
(Employee_No    INTEGER
,Dept_No             INTEGER
,Last_Name          CHAR(20)
,First_Name         VARCHAR(12)
,Salary                   DECIMAL(10,2)
) Unique Primary Index (Employee_No) ;
BEGIN LOADING SQL_CLASS.Employee_table
         ERRORFILES SQL_CLASS.Employee_table_ERR1
                  , SQL_CLASS.Employee_table_ERR2 ;
DEFINE     Employee_No (INTEGER)
                   , Dept_No (SMALLINT)
                   , Last_name(CHAR(20))
                   , First_name(VARCHAR(12))
                   , Salary (DECIMAL(8,2))
FILE=C:\Temp\EmpUS.txt ;
.LOGON Teradata V14/dbc, abc123 ;
BEGIN LOADING SQL_CLASS.Employee_table
ERRORFILES SQL_CLASS.Employee_table_ERR1
               , SQL_CLASS.Employee_table_ERR2 ;
DEFINE    Employee_No (INTEGER)
            , Dept_No (SMALLINT)
            , Last_name(CHAR(20))
            , First_name(VARCHAR(12))
            , Salary (DECIMAL(8,2))
FILE=C:\Temp\EmpIntl.txt ;
INSERT INTO
SQL_CLASS.Employee_table VALUES
  ( :Employee_No, :Dept_No, :Last_name,
    :First_name, :Salary ) ;
END LOADING;
LOGOFF;
INSERT INTO
SQL_CLASS.Employee_table VALUES
  ( :Employee_No, :Dept_No, :Last_name,
    :First_name, :Salary ) ;
LOGOFF;
No END LOADING statement causes a “FastLoad Paused” state. Then, we run the job again with a different flat file. When the second FastLoad uses the END LOADING statement, this indicates no more data and Phase 2 starts. This is the only way to trick FastLoad into using multiple input files.
A BTEQ Export and then a FastLoad
.LOGON Teradata V14/dbc,**PASSWORD**;
.EXPORT DATA FILE = 'C:\Windows\Temp\BteqSC.txt'
SELECT Student_ID, Course_ID
FROM SQL_CLASS.Student_Course_table ;
.EXPORT RESET
.LOGOFF;
.QUIT;
A simple BTEQ Export that will export the BteqSC.txt file off of Teradata.
.LOGON Teradata V14/dbc,**PASSWORD**;
SET RECORD FORMATTED;
DEFINE Student_ID (INTEGER)
   , Course_ID (SMALLINT)
FILE=’C:\Windows\Temp\BteqSC.txt’;
BEGIN LOADING SQL_CLASS.Student_Course_table
                     ERRORFILES SQL_CLASS.Student_Course_table_ERR1
                              ,SQL_CLASS.Student_Course_table_ERR2 ;
INSERT INTO SQL_CLASS.Student_Course_table
VALUES ( :Student_ID
                ,:Course_ID );
END LOADING;
LOGOFF;
A simple FastLoad that will load the BteqSC.txt file to the Student_Course_Table on Teradata.
Assume the table started completely empty!
A FastExport and then a FastLoad Needs Indicators

MULTI LOAD

252.Explain Multi Load in Teradata?
  • Used for loading, updating or deleting data to and from populated tables, typically with batch inputs from a host file.
  • Cannot process tables defined with USI’s, Referential Integrity, Join Indexes, Hash Indexes, or Triggers.
  • No data retrieval capability.
  • Import tasks require use of Primary Index
  • The Multiload supports five target tables per script.
  • Tables may contain pre-existing data.
  • Ability to do INSERTs UPDATEs, DELETEs and UPSERTs.
  • Error tables should be dropped manually, as RELEASE MLOAD will not drop them automatically.
  • Loads up to 5 tables at a time.
  •  Does not mind loading duplicate data.
  •   Loads the data into the Table whether it can be either empty or full.
  • It needs 2 error tables, 1 restart log table and 1 work table.
  • It supports NUSI table loading.
  • It applies a write lock on the target table during loading.

253..What are the MultiLoad Utility limitations?
  • MultiLoad is a very powerful utility; it has following limitations:
  • MultiLoad Utility doesn’t support SELECT statement.
  • Concatenation of multiple input data files is not allowed.
  • MultiLoad doesn’t support Arithmetic Functions i.e. ABS, LOG etc. in Mload Script.
  • MultiLoad doesn’t support Exponentiation and Aggregator Operators i.e. AVG, SUM etc. in Mload Script.
  • Import task require use of PI (Primary Index).
  • MultiLoad loads to populated Teradata tables in 64K blocks via Inserts, Updates, Deletes and Upserts.
  • No Unique Secondary Indexes (USI), but Non-Unique Secondary Indexes (NUSI) are allowed. Supports up to five populated tables.
  • No concatenating of input files. Loads data in 64K blocks, and only 30 FastLoad and MultiLoad combinations can run simultaneously.
  • No Secondary Indexes (FastLoad)
  • No Unique Secondary Indexes (FastLoad and MultiLoad)
  • No Triggers (FastLoad and MultiLoad)
  • No Join Indexes (FastLoad and MultiLoad)
  • No Referential Integrity (FastLoad and MultiLoad)
  • Only 30 FastLoad and MultiLoad combinations can run at one time.    
  • Only 60 FastExport jobs can run simultaneously!
Referential Integrity and Load Utility Solutions

MultiLoad has IMPORT and DELETE Tasks


DO INSERT FOR MISSING UPDATE ROWS ;
The keywords you can't forget for an UPSERT MultiLoad are: DO INSERT FOR MISSING UPDATE ROWS. They tell Teradata to first try the UPDATE and, if it is successful, move on; if the UPDATE fails, then do the INSERT.
A Sample MultiLoad Script Created by Nexus SmartScript
  .LOGTABLE SQL01.Emp_Salary_Raise_Log ;
.LOGON Teradata V14/dbc, abc123 ;
  Create the log
Logon to Teradata using the TDP ID, User and Password.
  .BEGIN IMPORT MLOAD
TABLES SQL01.Employee_Table
WORKTABLES SQL01.EMP_WT
ERRORTABLES SQL01.EMP_ET
SQL01.EMP_UV ;
  Begin MLOAD and set up
one Worktable and
two Error Tables.
   .LAYOUT FileColDesc1 ;
.FIELD Employee_No  *  INTEGER ;
.FIELD Salary               *  DECIMAL (8,2) ;
   Describe the layout and positions of the columns in the Flat file. The * means the next physical position.
   .DML Label EMP_UPD ;
UPDATE SQL01.Employee_Table
SET Salary = :Salary
WHERE Employee_No = :Employee_No ;
   The DML label has the DML statement to be executed. There can be many DML statements in a MultiLoad.
  .IMPORT INFILE C:\temp\mload_Flat_File.txt
LAYOUT FileColDesc1
APPLY EMP_UPD ;
   Import this file name
Use this Layout for the flat file and the columns
Run the DML statements in the label named EMP_UPD
   .END MLOAD ;
.LOGOFF ;
   End the loading and Logoff.
254. Name the five phases of the MultiLoad utility.
MultiLoad has five phases:
Preliminary Phase
• Basic setup.
DML Transaction Phase
• Get the DML steps down onto the AMPs.
Acquisition Phase
• Send the input data to the AMPs and sort it.
Application Phase
• Apply the input data to the appropriate target tables.
Cleanup Phase
• Basic cleanup.





  • The MultiLoad DELETE mode is used to perform a global (all AMP) delete on just one table. 
  • The advantage of .BEGIN DELETE MLOAD is that it bypasses the Transient Journal (TJ) and can be RESTARTED if an error causes it to terminate prior to finishing.
MultiLoad DELETE Rules
  • DELETE tasks operate differently than IMPORT tasks.
  • Deleting a Non-Unique Primary Index on equality is allowed.
255. A MultiLoad DELETE is often better than an SQL DELETE
  • MultiLoad DELETE is faster.
  • MultiLoad DELETE is restartable.
  • The MultiLoad DELETE mode is used to perform a global (all-AMP) delete on just one table.
  • The advantage of .BEGIN DELETE MLOAD is that it bypasses the Transient Journal (TJ) and can be RESTARTED if an error causes it to terminate prior to finishing.
  • When running in DELETE mode, the DELETE SQL statement cannot reference a Unique Primary Index in the WHERE clause with equality.
  • This is because a unique primary index access goes to a specific AMP, whereas this is a global (all-AMP) operation.
256.
257. What is the process to restart the multiload if it fails?
  • MultiLoad creates 2 error tables and 1 work table.
  • When MultiLoad fails, we have to unlock the main (target) table; the error tables and work tables are not locked the way they are in FastLoad.
  • To unlock the main table after an acquisition-phase failure: RELEASE MLOAD <TABLE NAME>;
  • To release the lock after an application-phase failure: RELEASE MLOAD <TABLE NAME> IN APPLY;
  • If the MultiLoad job fails, you can restart the job and it will continue where it left off.
  • If Teradata goes down, the MultiLoad will continue where it left off, without user intervention, when Teradata data comes back.
If all else fails, you can use the RELEASE MLOAD statement below and start over
.RELEASE MLOAD SQL01.Employee_Table ;
258. Five formats of MultiLoad input files
  • Binary – Each record is a 2-byte integer, n, followed by n bytes of data. A byte is the smallest unit of storage for Teradata.
  • FastLoad – The same as Binary, plus a marker (X'0A' or X'0D') that specifies the end of the record.
  • Text – Each record has an arbitrary number of bytes and is followed by an end-of-record marker.
  • Unformat – The format of these input records is defined in the LAYOUT statement of the MultiLoad script using the components FIELD, FILLER and TABLE.
  • Vartext – Variable-length text RECORD format separated by delimiters such as a comma. For this format you may only use VARCHAR, LONG VARCHAR (IBM) or VARBYTE data formats in your MultiLoad LAYOUT. Note that two delimiter characters in a row will result in a NULL value between them.
259.A NoPI Table Does Not Work with MultiLoad
  • A NoPI Table has NO PRIMARY INDEX so it does Not work with MultiLoad!
  • A NoPI Table has no primary index.
  • A NoPI table guarantees even distribution. A MultiLoad IMPORT works by utilizing the table’s Primary Index.
  • A NOPI table stands for NO PRIMARY INDEX and therefore MultiLoad won’t work with this type of table.
Error tables are dropped automatically if there are no errors; otherwise you can query the error tables.
260.Host Utility Locks (HUT Locks)
There are two types of Host Utility Locks (also referred to as HUT locks):
  • Acquisition locks – prevent all DDL statements (except DROP) from running against the table(s) being MultiLoaded, but allow all DML statements such as SELECT.
  • Application locks – prevent all DDL statements (except DROP) from running against the table(s) being MultiLoaded, and allow users to run SELECT statements only if they use an ACCESS lock.
Although MultiLoad seems pretty intense, it never places an Exclusive lock inside the job. If you decide you want to trouble-shoot a MultiLoad job so you want to look at the logtable, the work tables, or any of the error tables, you must use an ACCESS Lock in your SQL to avoid abnormally terminating the MultiLoad job due to locking problems.
Locking Row for ACCESS
LOCKING ROW FOR ACCESS
SELECT DISTINCT ErrorCode, ErrorFieldName FROM SQL01.Emp_ET;

261. Give a justifiable reason why Multiload supports NUSI instead of USI.
The index sub-table row happens to be on the same Amp in the same way as the data row in NUSI. Thus, each Amp is operated separately and in a parallel manner.
Whenever we define a secondary index, a secondary index subtable is created. In the case of a USI, because of hash distribution the subtable row may land on one AMP while the actual data row it points to is on another AMP, so the AMPs would need to communicate, which is not supported by MultiLoad.
  • But in case of NUSI, the subtable and the references of the actual data will store in the same AMP hence AMPs no need to communicate here. So in case of NUSI, AMPs work in parallel and hence Mload supports that.
 262.How is MLOAD Client System restarted after execution?
The script has to be resubmitted manually; it then continues loading the data from the last checkpoint.
 263.How is MLOAD Teradata Server restarted after execution?
The process resumes from the last known checkpoint automatically: once Teradata comes back up, the MLOAD script continues executing without user intervention.
264.How do you transfer large amount of data in Teradata?
Transferring of large amount of data can be done by using the various Teradata Utilities i.e. BTEQ, FASTLOAD, MULTILOAD, TPUMP and FASTEXPORT.
  • BTEQ (Basic Teradata Query) supports all 4 DMLs: SELECT, INSERT, UPDATE and DELETE. BTEQ also supports IMPORT/EXPORT protocols.
  • Fastload, MultiLoad and Tpump transfer the data from Host to Teradata.
  • FastExport is used to export data from Teradata to the Host.
265.Difference between MultiLoad and TPump in Teradata?
  • Tpump provides an alternative to MultiLoad for low volume batch maintenance of large databases under control of a Teradata system.
  • Tpump updates information in real time, acquiring every bit of a data from the client system with low processor utilization.
  • It does this through a continuous feed of data into the data warehouse, rather than the traditional batch updates.
  • Continuous updates results in more accurate, timely data.
  • Tpump uses row hash locks than table level locks.
  • This allows you to run queries while Tpump is running.
266.Which is faster – MultiLoad delete or Delete command in Teradata?
  • MultiLoad delete is faster than normal Delete command, since the deletion happens in data blocks of 64Kbytes, whereas delete command deletes data row by row.
  • The transient journal maintains entries only for the DELETE command, since the Teradata load utilities bypass the transient journal.
  • For smaller table deletes, simple DELETE command is enough.
  • Multiload delete is useful when the delete has to be performed on a large table in teradata.
267.How to Skip or Get first and Last Record from Flat File through MultiLoad in Teradata?
  • The .IMPORT command in MLOAD has an option to give the record number from which processing should begin, i.e. FROM m, where m is the logical record number (an integer) of the record in the identified data source where processing is to begin. You can specify m as 2 and processing will start from the second record, skipping the header.
  • THRU k and FOR n are options of the same .IMPORT command that control where processing stops. If FROM m (start record) and FOR n (stop record) are not mentioned, MLOAD processes records from the start to the end of the file. A hedged fragment is shown below.
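A hedged fragment of the MultiLoad .IMPORT command using these options (the file, layout and label names follow the sample script earlier in this section): start at record 2 to skip a header and stop after 1000 records.
.IMPORT INFILE C:\temp\mload_Flat_File.txt
        FROM 2  FOR 1000
        LAYOUT FileColDesc1
        APPLY EMP_UPD ;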



268.  What is the difference between FastLoad and MultiLoad?
  • FastLoad uses multiple sessions to quickly load large amounts of data into an empty table. MultiLoad is used for high-volume maintenance on tables and views.
  • It works with non-empty tables also.
  • Maximum 5 tables can be used in MultiLoad.
269. Can we load a MULTISET table using MLOAD?
  • YES, we can load both SET and MULTISET tables using MLOAD. When loading into a MULTISET table with MLOAD, duplicate rows are not rejected, so we have to take care of them before loading.
  • In the case of FastLoad, when loading into a MULTISET table, duplicate rows are automatically rejected; FastLoad will not load duplicate rows whether the table is SET or MULTISET.
270.What is the difference between Multiload & Fastload in terms of Performance?
  • FastLoad is used to load empty tables and is very fast; it can load only one table at a time. MultiLoad can load up to 5 tables at a time and can also update and delete data.
  • FastLoad can be used only for inserting data, not for updating or deleting.
  • MultiLoad can load at most 5 tables, which may have non-unique secondary indexes on them, whereas in FastLoad you cannot have secondary indexes on the table.
  • In MultiLoad you can insert, update or delete data in already populated tables, whereas in FastLoad the target table must be empty.
  • MultiLoad works in 5 phases whereas FastLoad works in 2 phases.
  • If we want to load data into an empty table then FastLoad is the better option compared to MultiLoad.
  • A MULTISET table allows duplicate rows, but if you load it with FastLoad, duplicates are not loaded.
  • For an empty table FastLoad is the most useful choice because it loads the data in only 2 phases, so it is faster.
271. What are all the utilities of Teradata?



  • BTEQ – Basic Teradata Query (for export/import), which supports SELECT, INSERT, UPDATE and DELETE.
  • Teradata FastLoad (to load empty tables at high speed).
  • Teradata MultiLoad (to insert, update and delete records in populated tables).
  • Teradata FastExport (to export data from the Teradata database).
  • Teradata TPump (for real-time data loads).
272. Why don't MultiLoad and FastLoad support SI, JI, RI and triggers?
SI
  • USI – the index subtable row is generally stored on a different AMP than the AMP holding the data row, so neither utility supports it.
  • NUSI – the subtable row is stored on the same AMP as the data row; MLOAD supports it but FastLoad does not.
JI
  • Teradata generally stores the rows of a join index across the AMPs, so neither utility supports it.
RI and triggers
  • These are multi-AMP operations, and any operation involving communication between two AMPs is not supported by these two utilities.



276. How can you specify the charset to be used for MultiLoad?
MultiLoad That Inserts and Updates from Two Different Files
.LOGTABLE SQL01.Emp_Dept_Log ;
.LOGON Teradata V14/dbc, abc123 ;
.BEGIN IMPORT MLOAD
   TABLES SQL01.Employee_Table
                 ,SQL01.Department_Table
         WORKTABLES SQL01.EMP_WT
                                   ,SQL01.DEPT_WT
            ERRORTABLES SQL01.EMP_ET
                                    SQL01.EMP_UV,
                                      SQL01.DEPT_ET
                                      SQL01.DEPT_UV ;
.LAYOUT FileIn1 ;
.FIELD Emp_No  *  INTEGER ;
.FIELD Dept_No  *  SMALLINT ;
.FIELD LName     *  CHAR(20) ;
.FIELD FName     *  VARCHAR(12) ;
.FIELD Salary       *  DECIMAL (10,2) ;
.LAYOUT FILEIn2 ;
.FIELD DeptNo       *  INTEGER ;
.FIELD DeptName  *  CHAR(20) ;
.DML Label EMP_INS ;
  INSERT INTO SQL01.Employee_Table
    VALUES(:Emp_No, :Dept_No, :LName,
                  :FName, :Salary ) ;
.DML Label DEPT_UPD ;
   UPDATE SQL01.Department_Table
      SET Department_Name = :DeptName
          WHERE Dept_No = :DeptNo ;
.IMPORT INFILE C:\temp\Emp_Data.txt
     LAYOUT FileIn1
     APPLY EMP_INS;
.IMPORT INFILE C:\temp\Dept_Data.txt
   LAYOUT FileIn2
   APPLY DEPT_UPD;
.END MLOAD ;
.LOGOFF ;
A MultiLoad Example That UPSERTs
.LOGTABLE SQL01.CDW_Log ;
.LOGON Teradata V14/dbc, abc123;
.BEGIN IMPORT MLOAD
   TABLES SQL01.Student_Profile
      WORKTABLES SQL01.Prof_WT
         ERRORTABLES SQL01.Stud_Prof_ET
                        SQL01.Stud_Prof_UV ;
.LAYOUT FileIn ;
.FIELD Student_ID  *  INTEGER ;
.FIELD Last_Name  *  CHAR(20) ;
.FIELD First_Name  *  VARCHAR(12) ;
.FIELD Class_Code  *  CHAR(2) ;
.FIELD Grade_Pt   *  DECIMAL (5,2) ;
.DML Label UPSERTER ;
DO INSERT FOR MISSING UPDATE ROWS ;
UPDATE SQL01.Student_Profile
     SET Last_Name = :Last_Name
            ,First_Name = :First_Name
            ,Class_Code = :Class_Code
            ,Grade_Pt    = :Grade_Pt
     WHERE Student_ID = :Student_ID ;
INSERT INTO SQL01.Student_Profile
   VALUES(:Student_ID, :Last_Name,
               :First_Name, :Class_Code
               ,:Grade_Pt ) ;
.IMPORT INFILE C:\temp\Upsert.txt
   LAYOUT FileIn
   APPLY UPSERTER;
.END MLOAD ;
.LOGOFF ;

TPUMP

277.A certain load is being imposed on the table and that too, every hour. The traffic in the morning is relatively low, and that of the night is very high. As per this situation, which is the most advisable utility and how is that utility supposed to be loaded?
The most suitable utility here is TPump. By increasing or decreasing the pack factor (packet size), the varying traffic can be handled easily.

278.Explain TPUMP (Teradata Parallel Data Pump) Utility in Teradata?
  • TPUMP allows near real-time updates from transactional systems into the data warehouse. It can perform INSERT, UPDATE and DELETE operations, or a combination, from the same source.
  • It can be used as an alternative to MLOAD for low volume batch maintenance of large databases.
  • TPUMP allows target tables to have Secondary Indexes, Join Indexes, Hash Indexes, Referential Integrity, Populated or Empty Table, Multiset or Set Table or Triggers defined on the Tables.
  • TPUMP can have many sessions as it doesn’t have session limit.
  • TPUMP uses row hash locks thus allowing concurrent updates on the same table
  • It is called TPump because you can adjust the rate for which you want the data loaded, from trickle to maximum speed.
  • TPump is used when you want to load during the day while simultaneously querying a table.
  • TPump does it all and is used for maintenance on populated tables one row at a time.
  • TPump is NOT a Block Level Utility and has No session  Limits
  • so it is perfect when you don’t need to load massive amounts of data or you want to simultaneously load and query a table.
  • It is often used if the tables you’re loading to have secondary indexes, join indexes, triggers, or referential integrity.
  • It is also great when you need hundreds of load jobs to run simultaneously.
  • It is not as fast as FastLoad or MultiLoad, but it is pretty darn fast.

280. What are TPUMP Utility Limitations?
  • Use of SELECT statement is not allowed.
  • Concatenation of Data Files is not supported.
  • Exponential & Aggregate Operators are not allowed.
  • Arithmetic functions are not supported.
Rule #6: On some network attached systems, the maximum file size when using TPump is 2GB. This is true for a computer running under a 32-bit operating system.
Rule #7: TPump performance will be diminished if Access Logging is used. TPump uses normal SQL to accomplish its tasks.
281. These .BEGIN parameters are only used in TPump:
  • SERIALIZE If ON, this options guarantees that operations on a row occur serially.
  • PACK Statements to pack into a multiple-statement request. (default is 20, max is 600)
  • PACKMAXIMUM Use the max pack factor.
  • RATE Max rate statements are sent to the Teradata RDBMS per minute. (Default unlimited)
  • LATENCY Number (range is from 10-600 seconds) Allows TPump to commit to Teradata any data sitting in the buffer longer than the LATENCY value.
  • NOMONITOR Prevents TPump from checking for statement rate changes from, or updating status information for, the TPump Monitor application. By default monitoring is ON.
  • ROBUST ON|OFF (default is ON). The OFF parameter signals TPump to use “simple” restart logic. In this case, restarts cause TPump to begin where the last checkpoint occurred.
  • MACRODB dbname (default is logtable database). Database if a macro is used in the TPump job.

Five Formats of TPump Files




TPump Script with Error Treatment Options
.LOGTABLE SQL01.Log_Pump ;
.LOGON Teradata V14/dbc, abc123;
DATABASE SQL01 ;
.BEGIN LOAD ERRLIMIT 5
          CHECKPOINT 1
          SESSIONS 1
          TENACITY 2
          PACK 40
          RATE 1000
ERRORTABLE SQL01.Err_Pump ;
.LAYOUT FileLayOut ;
.FIELD Student_ID * VARCHAR(11) ;
.FIELD Last_Name * VARCHAR(20) ;
.FIELD First_Name * VARCHAR(14) ;
.FIELD Class_Code * VARCHAR(2) ;
.FIELD Grade_Pt * VARCHAR(8) ;
     .DML Label INSREC
     IGNORE DUPLICATE ROWS
     IGNORE MISSING ROWS
     IGNORE EXTRA ROWS ;
     INSERT INTO Student_Table
     (Student_ID, Last_Name, First_Name, Class_Code, Grade_Pt)
       VALUES(:Student_ID, :Last_Name, :First_Name, :Class_Code, :Grade_Pt ) ;
     .IMPORT INFILE C:\temp\Cdw_Import.txt
        FORMAT VARTEXT ','
       LAYOUT FileLayOut
       APPLY INSREC;
     .END LOAD ;
     .LOGOFF ;
TPump UPSERT Script
.LOGTABLE SQL01.CDW_Log ;
.LOGON TeradataV14/dbc,abc123;
.BEGIN LOAD ERRLIMIT 5
CHECKPOINT 10
SESSIONS 10
TENACITY 2
PACK 10
RATE 10
ERRORTABLE SQL01.SWA_ET ;
.LAYOUT INREC INDICATORS ;
.FIELD Student_ID * INTEGER ;
.FIELD Last_Name * CHAR(20) ;
.FIELD First_Name * VARCHAR(14) ;
.FIELD Class_Code * CHAR(2) ;
.FIELD Grade_Pt * DECIMAL(8,2) ;
     .DML Label UPSERTER
     DO INSERT FOR MISSING UPDATE ROWS ;
     UPDATE SQL01.Student_Table
     SET Last_Name = :Last_Name
,First_Name = :First_Name
,Class_Code = :Class_Code
,Grade_Pt = :Grade_Pt
     WHERE Student_ID = :Student_ID ;
INSERT INTO SQL01.Student_Table
(Student_ID, Last_Name, First_Name, Class_Code, Grade_Pt)
VALUES(:Student_ID, :Last_Name, :First_Name, :Class_Code, :Grade_Pt ) ;
.IMPORT INFILE C:\temp\Upsert_File.txt
FORMAT FASTLOAD
LAYOUT INREC
APPLY UPSERTER;
.END LOAD ;
.LOGOFF ;

TPT


282. What is TPT (Teradata Parallel Transporter)?
  • The Teradata Parallel Transporter (TPT) utility combines the BTEQ, FastLoad, MultiLoad, TPump, and FastExport utilities into one comprehensive language utility.
  • This allows TPT to insert data to tables, export data from tables, and update tables.
  • TPT works around the concept of Operators and Data Streams.
  • One operator reads the source data and passes its contents to a data stream; another operator is then responsible for taking that data stream and loading the data into the target.
  • A Producer Operator, designed to read input, moves the data to a data stream.
  • A Consumer Operator, designed to write data to a Teradata table, then loads the data.
  • TPT Producers create data streams and Consumers write them to the target.
283. What are the four major operators of TPT?
Producer – Reads data and writes it to a stream.
Consumer – Takes data from streams and writes it to disk.
Filter – Reads streams, filters data, and then sends it down stream.
Standalone – Used for operations that do not involve data streams, such as executing DDL (for example, creating table structures).
  • TPT is designed to read data from disk and then place that data in a memory data stream, thus eliminating unnecessary I/O.
  • The Operators above work together to Create Table structures, read data from flat files and send it down the stream, filter data from the stream and then send it further down the stream, and to finally read the stream and write it to its final disk destination.
One of the clever concepts behind TPT is that you can take multiple sources (multiple flat files for example), and utilize multiple Producer Operators to reach each source and then move multiple data streams to multiple Consumer Operators. These operators will then write the multiple sources to one Teradata table. This couldn’t be done before with FastLoad or MultiLoad.
TPT can have more Producers than Consumers.
Each source can be handled by a different instance of the producer operator, and those instances merge the files into a single data stream (see the sketch below).
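A minimal sketch of how instance counts are requested in a TPT APPLY statement; the operator names Load_Student and File_Reader and the target table are placeholders assumed to be defined elsewhere in the job:
APPLY ( 'INSERT INTO SQL01.Student_Table VALUES (:Student_ID, :Last_Name, :First_Name, :Class_Code, :Grade_Pt);' )
TO OPERATOR ( Load_Student[2] )             /* run two consumer (load) instances     */
SELECT * FROM OPERATOR ( File_Reader[3] );  /* run three producer (reader) instances */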



284.
285. What are the two major sections of a TPT script?
Teradata divides TPT scripts into two major sections:
•   Declarative Section
•   Executable Section

The Declarative Section uses DEFINE statements to define the TPT objects needed for the job. These objects identify the schemas of data sources and data targets as well as each operator that will be used to extract, filter, or load the data.
The Executable Section specifies all processing statements that will initiate actions to read, extract, filter, insert, update, and delete data. This is accomplished using APPLY statements similar to MultiLoad and TPump jobs.
286. What are the three required DEFINE statements in the Declarative Section?
DEFINE JOB – Names the TPT job and can be anything you want to call it. It can even identify the character set being used. It will contain the definitions of all objects, as well as one or more processing statements.
DEFINE SCHEMA – Defines the data structure for the data that a particular operator will process. There can be many DEFINE SCHEMA statements because each unique data structure used in the TPT job requires a separate DEFINE SCHEMA object.
DEFINE OPERATOR – Defines an operator and specifies all operator attributes to which values can be assigned.

Partial TPT Script (just snippets for example's sake)
DEFINE JOB Load_Dept_Table
DESCRIPTION 'Loads the Dept_Table'
(
  DEFINE SCHEMA Input_Flat_File_Schema
  DESCRIPTION 'Layout of the incoming flat file'
  ( Dept_No   INTEGER,
    Dept_Name CHAR(20),
    Budget    DECIMAL(10,2) );
  DEFINE OPERATOR Load_Operator
  DESCRIPTION 'TPT Load Operator like FastLoad'
  TYPE LOAD …
Declarative Section:
DEFINE JOB
Defines the overall job and coordinates all the DEFINE and APPLY statements.
DEFINE SCHEMA
Defines the columns and their data types. Multiple schemas can define multiple files, or a particular schema can be used to describe multiple data objects.
DEFINE OPERATOR
Defines a specific TPT operator to be used in this job
Executable Section:
APPLY
A processing statement used to initiate a TPT load, update, or delete operation.
TPT is an improvement over the past because a single script can now define multiple operators and schemas to create complex extracting and loading jobs. There are only a few statements needed to build a TPT script, and they are listed above.
Schemas
A Schema is a set of metadata definitions about the columns and rows of a data source or destination object. These include column names, data types, column sizes, any precision, scale and null value indicators.
TPT needs to know about what the source and destination look like, and the schema’s job is to tell TPT what that is.
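Putting the three DEFINE statements and an APPLY statement together, a complete load-style TPT script might look like the sketch below. This is a minimal sketch only: the TDPID, credentials, file names, and table names are illustrative, and the schema is all VARCHAR because the input file is delimited.
DEFINE JOB Load_Student_Table
DESCRIPTION 'Sketch: load a delimited file into SQL01.Student_Table'
(
  DEFINE SCHEMA Student_Schema
  DESCRIPTION 'Layout of the incoming delimited file'
  ( Student_ID  VARCHAR(11),
    Last_Name   VARCHAR(20),
    First_Name  VARCHAR(14),
    Class_Code  VARCHAR(2),
    Grade_Pt    VARCHAR(8) );

  DEFINE OPERATOR File_Reader
  DESCRIPTION 'Producer: reads the flat file into the data stream'
  TYPE DATACONNECTOR PRODUCER
  SCHEMA Student_Schema
  ATTRIBUTES
  ( VARCHAR DirectoryPath = 'C:\temp\',
    VARCHAR FileName      = 'Cdw_Import.txt',
    VARCHAR Format        = 'Delimited',
    VARCHAR TextDelimiter = ',',
    VARCHAR OpenMode      = 'Read' );

  DEFINE OPERATOR Load_Student
  DESCRIPTION 'Consumer: FastLoad-style Load operator'
  TYPE LOAD
  SCHEMA *
  ATTRIBUTES
  ( VARCHAR TdpId        = 'TDPROD',
    VARCHAR UserName     = 'dbc',
    VARCHAR UserPassword = 'abc123',
    VARCHAR TargetTable  = 'SQL01.Student_Table',
    VARCHAR LogTable     = 'SQL01.LG_Student',
    VARCHAR ErrorTable1  = 'SQL01.ET_Student',
    VARCHAR ErrorTable2  = 'SQL01.UV_Student' );

  APPLY
  ( 'INSERT INTO SQL01.Student_Table
       (Student_ID, Last_Name, First_Name, Class_Code, Grade_Pt)
     VALUES (:Student_ID, :Last_Name, :First_Name, :Class_Code, :Grade_Pt);' )
  TO OPERATOR ( Load_Student )
  SELECT * FROM OPERATOR ( File_Reader );
);
Note that the Load_Student operator uses SCHEMA * (a deferred schema, discussed later), so it simply accepts whatever structure the data stream delivers.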
287. What is the DDL Operator in TPT?

The DDL Operator

The DDL Operator is a standalone operator that allows DDL and other SQL statements to perform activities such as CREATE, ALTER, DROP, GIVE, GRANT, REVOKE, INSERT, UPDATE, DELETE and INSERT/SELECT.
Statements not supported are SELECT, HELP and SHOW.
All SQL or DDL commands must be hard-coded into the SQL statements and are submitted to TPT as part of the APPLY clause.
As a standalone operator, the DDL operator does not send data to or retrieve data from a data stream.
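A hedged sketch of a DDL operator definition and its APPLY; these would sit inside a DEFINE JOB like the one shown earlier, and the object names and the ErrorList value are illustrative:
  DEFINE OPERATOR DDL_Op
  DESCRIPTION 'Standalone operator that runs SQL/DDL statements'
  TYPE DDL
  ATTRIBUTES
  ( VARCHAR TdpId        = 'TDPROD',
    VARCHAR UserName     = 'dbc',
    VARCHAR UserPassword = 'abc123',
    VARCHAR ErrorList    = '3807' );  /* ignore "object does not exist" on the DROP */

  APPLY
  ( 'DROP TABLE SQL01.Stage_Student;' ),
  ( 'CREATE TABLE SQL01.Stage_Student AS SQL01.Student_Table WITH NO DATA;' )
  TO OPERATOR ( DDL_Op );
Because the DDL operator is standalone, the APPLY has no SELECT ... FROM OPERATOR clause; there is no data stream to consume.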
288. What is the SQL Selector Operator in TPT?
The SQL Selector Operator
  • SQL SELECTOR is a producer operator similar to the old BTEQ Export.
  • Exports smaller volumes of data from Teradata.
  • Should be used for any export with less than 500,000 rows.
  • Submits a single SQL SELECT statement to Teradata, and supports a single session and single operator instance only.
  • Can export from multiple tables using a Join Statement.
  • Does NOT use a "Loader" slot.
  • Supports Report Mode to create character-based files, just like the old BTEQ Export did.
  • In the past, users had to decide whether to export using BTEQ or FastExport.
  • BTEQ was the choice when there were fewer than 500,000 rows, and FastExport was preferred for larger volumes of data.
  • This still holds true, but with TPT the BTEQ Export is now the SQL Selector Operator and the FastExport is the Export Operator.
SQL Selector Operator Export to a Delimited File
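A sketch of what such an export job could look like, using the SQL Selector as the producer and a DataConnector consumer to write a comma-delimited file. Names, credentials, and paths are illustrative; the columns are CAST to VARCHAR because delimited output expects character data:
DEFINE JOB Export_Student_Table
DESCRIPTION 'Sketch: export Student_Table to a delimited file'
(
  DEFINE SCHEMA Student_Out_Schema
  ( Student_ID  VARCHAR(11),
    Last_Name   VARCHAR(20),
    Grade_Pt    VARCHAR(8) );

  DEFINE OPERATOR SQL_Selector
  DESCRIPTION 'Producer: single-session export, like the old BTEQ Export'
  TYPE SELECTOR
  SCHEMA Student_Out_Schema
  ATTRIBUTES
  ( VARCHAR TdpId        = 'TDPROD',
    VARCHAR UserName     = 'dbc',
    VARCHAR UserPassword = 'abc123',
    VARCHAR SelectStmt   = 'SELECT CAST(Student_ID AS VARCHAR(11)), CAST(Last_Name AS VARCHAR(20)), CAST(Grade_Pt AS VARCHAR(8)) FROM SQL01.Student_Table;' );

  DEFINE OPERATOR File_Writer
  DESCRIPTION 'Consumer: writes the data stream to a delimited file'
  TYPE DATACONNECTOR CONSUMER
  SCHEMA *
  ATTRIBUTES
  ( VARCHAR DirectoryPath = 'C:\temp\',
    VARCHAR FileName      = 'Student_Export.txt',
    VARCHAR Format        = 'Delimited',
    VARCHAR TextDelimiter = ',',
    VARCHAR OpenMode      = 'Write' );

  APPLY TO OPERATOR ( File_Writer )
  SELECT * FROM OPERATOR ( SQL_Selector );
);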
289. What is the Export Operator in TPT?

The Export Operator

  • EXPORT is a producer operator that uses the old FastExport protocol to produce a data stream.
  • Exports large volumes of formatted data from Teradata.
  • Should be used for any export with more than 500,000 rows.
  • Takes advantage of multiple sessions and multiple instances.
  • Can export from multiple tables using a Join Statement.
  • Uses one of the "Loader" slots.
  • NOSPOOL mode improves performance because the export begins immediately: the table is read in one pass and exported while data blocks are being read into memory buffers.
  • The EXPORT operator means you are running a FastExport-style job, but through TPT (see the sketch after this list).
  • If the NoSpool option is set, the SELECT statement cannot have ORDER BY, HAVING, WITH, joins, SUM, or the Teradata SQL keywords DATE or USER.
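In the export sketch shown earlier, switching from the SQL Selector to the Export operator only means swapping the producer definition. A hedged sketch follows; MaxSessions and the SpoolMode value used for NOSPOOL behavior are assumptions included to illustrate the attributes, and Student_Out_Schema is the schema from the earlier sketch:
  DEFINE OPERATOR Export_Student
  DESCRIPTION 'Producer: FastExport-style Export operator'
  TYPE EXPORT
  SCHEMA Student_Out_Schema
  ATTRIBUTES
  ( VARCHAR TdpId        = 'TDPROD',
    VARCHAR UserName     = 'dbc',
    VARCHAR UserPassword = 'abc123',
    INTEGER MaxSessions  = 8,
    VARCHAR SpoolMode    = 'NoSpool',
    VARCHAR SelectStmt   = 'SELECT CAST(Student_ID AS VARCHAR(11)), CAST(Last_Name AS VARCHAR(20)), CAST(Grade_Pt AS VARCHAR(8)) FROM SQL01.Student_Table;' );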




290. What is a Deferred Schema?
When the structure of an incoming data stream may change, it is advisable to use a technique called deferring the schema, written as SCHEMA * in the operator definition.
This defers the schema definition until that phase of the job is ready to execute.
What operators use a Deferred Schema?
A Deferred Schema is only used with consumer operators such as Load, not with producers such as Export.
It is a good practice to defer the schema when defining consumer operators in case definitions change or a filter alters the stream.
If you are certain that the schema definition will not change (i.e., the consumed data and the loaded data are the same), then an explicit schema definition is OK.

291. What is the Load Operator in TPT?

LOAD OPERATOR 

  • The Load Operator is a consumer operator that uses the FastLoad protocol to insert data into an empty Teradata table.
  • Takes advantage of multiple sessions and multiple instances.
  • Uses one of the "loader" slots.
  • Used to initially load an empty Teradata table at the block level.
  • Can also load to a staging table.
  • Has two error tables, supports checkpointing, and data may be loaded from multiple separate batches.
  • All SQL or DDL commands must be hard-coded into the SQL statements and are submitted to TPT as part of the APPLY clause.
  • The Load operator is the new FastLoad.
  • The same rules that were in FastLoad still apply to the TPT load operator.
  • You use TPT load to load to an empty Teradata table.
  • The table must be empty and cannot have secondary indexes, referential integrity, triggers, or join indexes (see the tbuild sketch after this list for how such a job is submitted).
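TPT scripts such as the load sketch shown earlier are submitted with the tbuild command-line program rather than with a utility-specific client. A minimal, hypothetical invocation (script file name and job identifier are illustrative):
tbuild -f load_student.tpt Load_Student_Job
The trailing job identifier is used for the job's checkpoint information, so a failed job can be restarted under the same name.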

292.
