1. How do you define Teradata? Give some of the primary characteristics of the same.
Teradata is an RDBMS (Relational Database Management System) used to drive a company's data marts, data warehouses, OLAP, OLTP, and DSS appliances. Some of its primary characteristics are given below.
- Capable of running on single-node as well as multi-node systems.
- Parallelism is built into the system.
- Highly compatible with ANSI standards.
- Acts as a database server.
- It is an open system that runs on UNIX MP-RAS, SUSE Linux, Windows 2000, etc.
2. What are the newly developed features of Teradata?
Some of the newly developed features of Teradata are:
- Hash join enhancements
These enhancements extend the application of hash joins to include:
- Classical and dynamic hash outer joins
- Inclusion hash semi joins and exclusion hash semi joins
- Dynamic, inclusion, and exclusion hash semi joins with dynamic partition elimination
- Hash joins with cross terms
Benefits
- More efficient hash joins.
- Enhanced performance of outer joins, inclusion and exclusion semi joins, joins with partition elimination, and joins with cross terms.
Considerations:
Hash join enhancements with dynamic partition elimination apply only to dynamic hash joins, inclusion hash joins, and exclusion hash joins. They do not apply to classical or direct hash joins.
- JavaScript Object Notation (JSON): New functionality to support the storage and processing of JSON data in the Teradata database.
- Teradata QueryGrid: The Teradata database is now able to connect to Hadoop through QueryGrid, so this is called Teradata Database-to-Hadoop, also referred to as the Teradata-to-Hadoop connector.
- XML data type: Teradata also provides a new data type that stores and processes XML data.
- DBQL Show Parameters: A parameterized query uses placeholders for its parameters, and the parameter values are provided in a separate statement at execution time. The main purpose is to distinguish between code and data; it also prevents attackers from changing the query by injecting SQL commands. Two new Data Dictionary tables support this:
- DBC.DBQLParamTbl
- DBC.DBQLParamTbl logs
- Light-Weight Redistribution (LWR), also referred to as Single Sender Redistribution (SSR): With this feature, the Teradata Database optimizer can switch from an all-AMP row redistribution to a few-AMP row redistribution.
Other new features, grouped by area:
- Big Data & Analytics: JSON Integration, Scripting and Language Support, Table Operator Enhancements, 3D Geospatial Data Type, Geospatial Performance Enhancements, SQL-H Enhancements
- Performance: Light-Weight Redistribution, Software Efficiency Improvements
- Ecosystem: SQL Interface for ShowBlocks, TASM/TIWM – Classification by Datablock Selectivity, Utility Statistics Logging, TASM – Global Weights, Unity Callback Support, TASM – Virtual Partitions Enhancement, TIWM – Operating Periods for Appliance
- Quality: Smart Retry During Parsing, HSN Health, New PI on Access Rights Table, Unresponsive Node Isolation, Onsite System and Dump Analysis, DBQL – Show Parameters
- Industry Compatibility: Sequenced Aggregate Enhancements and Aggregate JI for Temporal, 1MB Phase 2, ANSI Temporal Syntax Support, Teradata Directory Manager
3. Highlight a few of the important components of Teradata.
- Bynet
- Access Module Processor (AMP)
- Parsing Engine (PE)
- Virtual Disk (vDisk)
- Virtual Storage System (VSS)
4. What is meant by a Virtual Disk?
- A Virtual Disk is a collection of cylinders on physical disks. It is sometimes referred to as a disk array.
- A vdisk is the logical disk space that is managed by an AMP. Depending on the configuration, a vdisk may not be contained on the node; however, it is managed by an AMP, which is always a part of the node.
- The vdisk is made up of 1 to 64 pdisks (user slices in UNIX or partitions in Windows NT, whose size and configuration vary based on RAID level). The pdisks logically combine to comprise the AMP’s vdisk. Although an AMP can manage up to 64 pdisks, it controls only one vdisk. An AMP manages only its own vdisk, not the vdisk of any other AMP.
5. Why are AMP and PE called vprocs?
AMPs and PEs are implemented as virtual processors (vprocs). They run under the control of the PDE, and their number is software-configurable. AMPs are associated with virtual disks (vdisks), which are associated with logical units (LUNs) within a disk array.
A request passes from the PE to the AMPs via the message passing layer (BYNET). The processed data is stored on physical disks (pdisks); each AMP manages several pdisks, which are presented to it as its vdisk, and the vdisk is responsible for data placement and migration. Hence these processes are called virtual processors (vprocs).
Both AMP and PE are implemented in software, hence the word "virtual" processor. No special hardware (such as a dedicated microprocessor) is required for an AMP or PE to function.
The versatility of the Teradata Database is based on virtual processors (vprocs) that eliminate dependency on specialized physical processors. Vprocs are a set of software processes that run on a node under the Teradata Parallel Database Extensions (PDE) within the multitasking environment of the operating system.
6. Explain the meaning of AMP.
AMP stands for Access Module Processor. It is a virtual processor used for managing a single portion of the database, and that portion cannot be shared by any other AMP. This form of architecture is commonly referred to as shared-nothing architecture.
7. What does an AMP contain, and what operations does it perform?
An AMP consists of a Database Manager Subsystem and is capable of:
- Performing DML
- Performing DDL
- Implementing Aggregations and Joins.
- Releasing and applying locks, etc.
8. What is meant by a Parsing Engine?
- PE is a kind of vproc.
- Its primary function is to take SQL requests and deliver responses.
- It consists of a wide array of software components that break SQL into various steps and then send those steps to the AMPs.
- To put it simply, a virtual processor is a simulated processor in a processing software system, or a software version of a dedicated physical processor. Each vproc uses a portion of the physical processor resource, and runs independently of other vprocs. The vprocs co-existing within a node share a single memory pool – a collection of free memory in the node. The portion of memory allocated from the pool to one vproc will be returned to the pool after usage for use by other vprocs.
- The virtual processor (vproc) is the basic unit of parallelism. One symmetric multi-processing (SMP) node consists of several vprocs, as many as 8 or 12.
9. What do you mean by parsing?
Parsing is the process of analyzing a string of symbols, either in a computer language or in a natural language.
10. What are the functions of a Parser?
- Checks for semantic errors
- Checks for syntax errors
- Checks object existence
11. What is meant by a dispatcher?
The dispatcher takes a collection of requests and stores them in a queue. The same queue is maintained throughout the process in order to deliver multiple sets of responses.
12. What is the maximum number of sessions a PE can handle at a time?
A PE can handle a maximum of 120 sessions at a particular point in time.
13. Explain BYNET.
BYNET basically serves as a medium of communication between the components. It is primarily responsible for sending messages and also responsible for performing merging, as well as sorting operations.
14. Explain PDE.
PDE stands for Parallel Database Extensions. It is an interface layer of software above the operating system that gives the database the ability to operate in a parallel environment.
15. What is TPD?
TPD stands for Trusted Parallel Database, and it works under PDE. Because Teradata is a database that works under PDE, it is usually referred to as a Trusted Parallel or Pure Parallel database.
16. What is meant by a Channel Driver?
A channel driver is software that acts as a medium of communication between the PEs and the applications running on channel-attached clients.
17. What is meant by Teradata Gateway?
Just like the channel driver, the Teradata Gateway acts as a medium of communication between the Parsing Engine and applications attached to network clients.
Only one Gateway is assigned per node.
18. Highlight a few of the advantages that ETL tools have over TD.
- Multiple heterogeneous sources and destinations can be operated on.
- Debugging process is much easier with the help of ETL tools owing to full-fledged GUI support.
- Components of ETL tools can be easily reused, and as a result, if there is an update to the main server, then all the corresponding applications connected to the server are updated automatically.
- De-pivoting and pivoting can be easily done using ETL tools.
19. Mention a few of the ETL tools that come under Teradata.
Some of the ETL tools which are commonly used in Teradata are DataStage, Informatica, SSIS, etc.
20. What is the meaning of Caching in Teradata?
Caching is considered an added advantage of using Teradata, as it primarily works with source data that stays in the same order, i.e., does not change frequently. At times, the cache is shared among applications.
21. What is meant by a node?
A node is a collection of hardware and software components. Usually a server is referred to as a node.
22. What is meant by a Clique?
A clique is a collection of nodes that share common disk drives. The presence of cliques is immensely important, since it helps the system survive node failures.
23. What happens when a node suffers a downfall?
Whenever a node fails, all of its vprocs immediately migrate from the failed node to a new node in order to regain access to all the data on the common drives.
24. Highlight the points of differences between the database and user in Teradata.
- A database is basically passive, whereas a user is active.
- A database primarily stores database objects, whereas a user can also own objects such as macros, tables, and views.
- A database does not have a password, while a user has to log on with a password.
25. Explain how spool space is used.
Spool space in Teradata is basically used for running queries. Out of the total space that is available in Teradata, 20% of the space is basically allocated to spool space.
26. How is data distributed among AMPs based on the PI in Teradata?
- Assume a row is to be inserted into a Teradata table
- The Primary Index Value for the Row is put into the Hash Algorithm
- The output is a 32-bit Row Hash
- The Row Hash points to a bucket in the Hash Map: the first 16 bits of the Row Hash are used to locate the bucket
- The bucket points to a specific AMP
- The row, along with its Row Hash, is delivered to that AMP
When the AMP receives a row, it places the row into the proper table and checks whether it already holds other rows in that table with the same row hash. If this is the first row with that particular row hash, the AMP assigns it a 32-bit uniqueness value of 1; if it is the second, a uniqueness value of 2, and so on. The 32-bit row hash and the 32-bit uniqueness value together make up the 64-bit Row ID, and the Row ID is how tables are sorted on an AMP. The uniqueness value is useful for NUPIs to distinguish rows with the same NUPI value. Both UPI and NUPI access are always one-AMP operations, since equal values are always stored on the same AMP.
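Teradata exposes this hashing chain through built-in functions, so the steps above can be checked directly with SQL (the `employee` table and `empid` column are illustrative names):

```sql
SELECT empid,
       HASHROW(empid)                      AS row_hash,    -- 32-bit row hash
       HASHBUCKET(HASHROW(empid))          AS hash_bucket, -- bucket in the hash map
       HASHAMP(HASHBUCKET(HASHROW(empid))) AS target_amp   -- AMP that owns the bucket
FROM employee;
```

Rows whose PI values produce the same row hash land on the same AMP, which is why PI access is a one-AMP operation.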
27. How does Teradata retrieve a row?
For example, a user runs a query looking for information on Employee ID 100. The PE sees that the Primary Index column EMP is used in the SQL WHERE clause. Because this is a Primary Index access operation, the PE knows it is a one-AMP operation. The PE hashes 100, and the Row Hash points to a bucket in the Hash Map that represents AMP X. AMP X is sent a message to locate that Row Hash and verify that the row is for EMP 100.
28. NoPI tables in Teradata?
CREATE TABLE <TABLE> (
PK INTEGER NOT NULL
) NO PRIMARY INDEX ;
NOPI tables come with some further restrictions, find below the most important ones:
– Only MULTISET tables can be created
– No identity columns can be used
– NoPI tables cannot be partitioned with a PPI
– No statements with an update character are allowed (UPDATE, MERGE INTO, UPSERT); you can still use INSERT, DELETE, and SELECT
– No Permanent Journal is possible
– They cannot be defined as Queue Tables
– Update triggers cannot update a NoPI table (probably introduced in a later release)
– No hash indexes are allowed (use join indexes instead)
The following features can be used as usual together with NOPI tables:
– Fallback protection of the table
– Secondary Indexes (USI, NUSI)
– Join Indexes
– CHECK and UNIQUE constraints
– Triggers
– Collection of statistics
- The information about NoPI tables, as for any table type, is stored in DBC.Tables; the table kind is 'O'.
- Bulk loading (FastLoad, TPump array inserts) is faster than loading tables with a Primary Index.
- After distributing the rows randomly across the AMPs, loading is already finished: no hashing and redistribution is needed, and no sorting is needed. Furthermore, because rows are assigned randomly to the AMPs, the data is distributed evenly across all AMPs and no skewing occurs; only the acquisition phase of the loading utilities is executed.
- Rows are always appended to the end of the table's data blocks, so any overhead usually caused by sorting rows by row hash into the data blocks is avoided.
- Bulk-loaded small tables will always be skewed: round-robin distribution is done at the block level, not the row level, so some AMPs will receive data blocks while the rest receive none.
- If you execute an INSERT…SELECT statement from a primary index table into a NoPI table, AMP-local copying of the rows is applied. All rows stay on their current AMP, so if the primary index table is skewed, you will end up with a skewed NoPI table.
- Skewing can also happen when an archive job runs on a source system and the backup is restored on a different target system.
- NoPI tables are useful in certain situations, but without a Primary Index, row access is limited to all-AMP full-table scans. To offset this impact, you can add a unique secondary index (USI) to the NoPI table. Mainly consider using NoPI tables during the ETL process, in situations where a full-table scan is required anyway.
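As a sketch (table and column names are hypothetical), a typical NoPI staging pattern looks like this:

```sql
-- MULTISET is required for NoPI tables
CREATE MULTISET TABLE stg_sales (
    sale_id INTEGER,
    amount  DECIMAL(12,2)
) NO PRIMARY INDEX;

-- Bulk-friendly load: rows are appended without hashing or sorting
INSERT INTO stg_sales
SELECT sale_id, amount
FROM sales_src;
```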
29. Subquery and correlated subquery in Teradata?
A subquery retrieves a list of values that are used for comparison against one or more columns in the main query. The subquery is executed first, and the main query is then executed based on its result set.
Select empname,deptname from employee where empid IN ( select empid from salarytable where salary>10000).
In the above query, the empids are selected first based on salary in the subquery, and the main query is executed against that result set.
A correlated subquery is an excellent technique to use when you need to determine which rows to SELECT based on one or more values from another table. It combines subquery processing and join processing into a single request.
It first reads a row from the main query and then goes into the subquery to find the rows that match the specified column value. Then it moves to the next row of the main query. This process continues until all qualifying rows of the main query have been processed.
select empname,deptno, salary from employeetable as emp
where salary=(select max(salary) from employeetable as emt where emt.deptno=emp.deptno)
The above query returns the highest-paid employee from each department. This is also one of the scenario-based questions in Teradata interviews.
30. Which is more efficient to find duplicates in Teradata: GROUP BY or DISTINCT?
- With many duplicates, GROUP BY is more efficient, since each AMP eliminates its duplicates locally before the rows are redistributed.
- With few duplicates, DISTINCT is more efficient.
- It also depends on the table size: if there are more than about half a million records, GROUP BY is much better than DISTINCT.
- However, if there are very few records, DISTINCT performs better than GROUP BY.
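Both forms can be sketched against the employee table used in question 29 (column names are illustrative):

```sql
-- DISTINCT: rows are redistributed first, then duplicates are removed
SELECT DISTINCT deptno FROM employee;

-- GROUP BY: each AMP removes its local duplicates before redistribution
SELECT deptno FROM employee GROUP BY deptno;

-- To list only the values that actually occur more than once:
SELECT deptno, COUNT(*) AS cnt
FROM employee
GROUP BY deptno
HAVING COUNT(*) > 1;
```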
31. What is spool space, and if a running job reaches the maximum spool space, how do you solve the problem in Teradata?
- Spool space is the space required by a query for processing and for holding the rows of the answer set.
- Spool space reaches the maximum when the query is not properly optimized; for example, a missing WHERE clause causes a special product join, called a Cartesian or cross join, which can exhaust spool.
- Use appropriate conditions in the WHERE clause and JOIN on the correct columns to optimize the query.
- Also make sure unnecessary volatile tables are dropped, as they occupy spool space.
32. What is a vdisk, and how does it communicate with physical data storage when data is retrieved through an AMP?
- The total disk space associated with an AMP is called a vdisk. A vdisk may have up to three ranks; hence the vdisk communicates with physical storage through array controllers.
- Each AMP vproc must have access to an array controller, which in turn accesses the physical disks.
- AMP vprocs are associated with one or more ranks (or mirrored pairs) of data.
33. After creating tables dynamically in Teradata, where is the GRANT usually done? When tables are newly created, what are the default role and the default privileges that get assigned?
- The GRANT option for any particular table depends on the privileges of the user. An admin user can grant privileges at any point in time.
- The default roles associated with newly created tables depend on the schema in which they are created.
34. What is Basic Teradata Query language?
- BTEQ (Basic Teradata Query) allows us to write SQL statements along with BTEQ commands.
- We can use BTEQ for importing, exporting, and reporting purposes.
- BTEQ commands start with a dot (.) and may be terminated with a semicolon (;), although the semicolon is not mandatory.
- SQL statements do not start with a dot, but a semicolon is compulsory to terminate them.
- BTEQ assumes anything written without a dot is a SQL statement and requires a semicolon to terminate it.
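A minimal BTEQ script illustrating the rules above (the logon string and file name are placeholders):

```sql
.LOGON tdpid/username,password
.SET WIDTH 200

SELECT COUNT(*) FROM dbc.TablesV;   /* SQL: no dot, semicolon required */

.EXPORT FILE = report.txt
SELECT DatabaseName, TableName FROM dbc.TablesV;
.EXPORT RESET
.LOGOFF
.QUIT
```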
35. Difference between BTEQ and SQL Assistant (Queryman)?
BTEQ : Basic Teradata Query utility
- SQL front-end : Report writing and formatting features
- Interactive and batch queries
- Import/Export across all platforms
- The default number of sessions, upon login, is 1.
Teradata Query Manager / Queryman / Teradata SQL Assistant
- SQL front-end for ODBC compliant databases
- Historical record of queries including:
- Timings
- Status
- Row counts
- Random sampling feature
- Limit amount of data returned
- Import/Export between database and PC
- Export to EXCEL or ACCESS
36. What is EXPLAIN in Teradata?
- The EXPLAIN modifier preceding an SQL statement generates an English translation of the optimizer’s plan. It is fully parsed and optimized, but not executed.
- EXPLAIN returns: Text showing how a statement will be processed (a plan).
- An estimate of how many rows will be involved.
- A relative cost of the request (in units of time).
- This information is useful for:
- Predicting row counts.
- Predicting performance.
- Testing queries before production.
- Analyzing various approaches to a problem
- EXPLAIN may also be accessed using Teradata Visual Explain.
- The execution strategy shows what the optimizer will do, but not why it chose that plan.
- The EXPLAIN facility is used to analyze all joins and complex queries.
37. What is a log table? What is the use of a log table?
A log (error) table captures rows that fail during load processing, such as:
- Data conversion errors
- Constraint violations
- Other error conditions
- It contains rows that failed to be manipulated due to constraint violations or translation errors.
- It logs errors and exceptions that occur during the acquisition phase and the apply phase.
- The log table in MultiLoad stores the results of each phase of the operation. MultiLoad uses this log table to restart the process from the appropriate checkpoint, i.e., after the last successful phase.
- It captures rows that contain duplicate values for UPIs.
38. How does Teradata make sure no duplicate rows are inserted into a SET table?
Teradata redirects the newly inserted row to the target AMP according to its PI (on the basis of its row hash value). If it finds the same row hash value on that AMP (a hash synonym), it compares the whole row to determine whether it is a duplicate. If it is a duplicate, the row is silently skipped without any error being thrown.
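A small sketch (hypothetical table names); note that the silent discard described above applies to INSERT … SELECT into a SET table, whereas a single-row INSERT VALUES of an exact duplicate reports a duplicate-row error:

```sql
CREATE SET TABLE cust_set (
    cust_id INTEGER,
    name    CHAR(10)
) PRIMARY INDEX (cust_id);

-- Exact duplicate rows coming from the SELECT are silently discarded:
INSERT INTO cust_set
SELECT cust_id, name FROM cust_stage;
```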
39. How does Teradata row distribution happen?
- Each AMP holds a portion of every table.
- Each AMP keeps their tables in separate drawers.
- Each table is sorted by Row ID.
40. Why should we choose Teradata over Oracle? What are the main differences?
Teradata has the following features over the Oracle architecture:
- Teradata's architecture is shared-nothing. It supports unconditional parallelism.
- Data : supports terabytes of data.
- Load : we can load bulk data into tables with less time.
- Scale : Teradata is highly scalable
41. Explain the parallel distribution and subtable concepts in Teradata.
Parallel distribution:
Teradata stores the data uniformly. Ideally, every AMP should have an equal number of rows from each table in its vdisk. Based on the primary index, the distribution of data across the AMPs is uniform.
Subtable: This table is created when a secondary index is created on a table; it holds the row ID, the index data, and the hash value for the data. It is used to fetch data using the secondary index.
42. What is activity count? What is skew factor?
- The activity count indicates the number of rows affected/processed by the last request (the number of rows returned to BTEQ from Teradata).
- Skewness refers to the distribution of rows across the AMPs. If some AMPs hold many rows and some very few, skewness is high, and this hurts parallelism.
43. How many types of skew exist? What is the difference between AMP skew, data skew, CPU skew, IO skew, and delta AMP CPU?
Skew exists when Teradata resources (CPU, AMP, IO, disk, etc.) are utilized unequally. There are three major types of skew:
- Data/AMP skew: when data is not distributed equally across all AMPs.
- CPU skew: when some sessions or AMPs consume disproportionately more CPU.
- IO skew: when some sessions or AMPs perform disproportionately more I/O operations.
44. Will increasing the number of AMPs increase performance?
Yes; increasing the number of AMPs with the same number of users increases performance proportionately.
Increase the number of PEs to increase the number of sessions.
Increase the number of disks to increase the data storage.
45. How do you copy 1 million records from one DB to another, and which approach is effective in terms of performance?
1) For data migration from one DB to another across two different environments, the nparc utility can be used.
2) TPT utilities can be used to transfer data from one environment to the other.
46. The following table types exist in Teradata:
- Permanent Table – Once created, remains in the database until it is dropped. It can be shared with different sessions and users.
- 1. SET table – Strictly NO duplicate values [By-Default]
- 2. MULTISET table – Allows Duplicate Values.
- Temporary Tables:
- Volatile Table – Only the session in which it is created can access the table structure and data. Session ends, table gets dropped.
- Global Temporary Table – Once created, the structure can be shared with different sessions and users but the data remains private to the session who loaded it. Session ends, its private data gets deleted.
- Derived Table – A type of temporary table derived from one or more other tables as the result of a sub-query.
- Queue Table – Permanent tables with timestamp column which indicates when each row was inserted.
- Error Logging Table – Stores errors of an associated permanent table.
- NoPI Table – Permanent tables that do not have any primary index defined on them
47. What is the difference between error code and error level in BTEQ?
ERRORCODE is used in BTEQ conditional logic. It holds the return code of every SQL statement in BTEQ; 0 indicates successful execution. The code returned by the previous SQL statement corresponds to the error message information (visible in the DBC.ErrorMsgs table). Return code: the code returned by the script after execution; possible values are 0, 2, 4, 8, 12, and 16.
ERRORLEVEL assigns a severity to error numbers. To see the error codes, run SELECT * FROM dbc.errormsgs in Queryman. The error level is the severity set for the corresponding error code: 0 - Success, 4 - Warning, 8 - User Error, 12 - Fatal Error, 16 - Typical Error.
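A sketch of BTEQ conditional logic built on ERRORCODE (the logon string and table names are placeholders):

```sql
.LOGON tdpid/user,password
INSERT INTO target_tbl SELECT * FROM stage_tbl;

.IF ERRORCODE <> 0 THEN .GOTO LoadFailed
.LOGOFF
.QUIT 0

.LABEL LoadFailed
.LOGOFF
.QUIT 8
```

The `.IF` tests the return code of the previous SQL statement; the script exits with a nonzero return code so a calling scheduler can detect the failure.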
48. What is the difference between Access Logging and Query Logging in Teradata?
- Access Logging is concerned with security (i.e., who is doing what). In access logging you ask the database to log who is doing what on a given object. The information stored is based on the object, not on the SQL fired or the user who fired it.
- Query Logging (DBQL) is used for debugging (i.e., what is happening around?). In the case of DBQL, the database keeps tracking various parameters, i.e., the SQL text, resources, spool usage, steps, and other details that help you understand the workload.
49. Commands used within the Ferret utility?
- SHOWSPACE – reports the amount of disk cylinder space in use and the amount of disk cylinder space available in the system. It gives information about:
- Permanent space cylinders,
- Spool space cylinders,
- Temporary space cylinders,
- Journaling cylinders,
- Bad cylinders and
- Free cylinders.
- For each of these six categories it presents three parameters:
- Average utilization per cylinder,
- % of total available cylinders and
- Number of cylinders.
- SHOWBLOCKS
50. Explain the Ferret utility in Teradata.
- Ferret (File Reconfiguration Tool) is a utility used to display and set disk space utilization parameters within the Teradata RDBMS. When you select Ferret utility parameters, it dynamically reconfigures the data on the disks.
- We can run this utility through Teradata Manager; to start the Ferret utility, type START FERRET in the database window.
51. What does the SLEEP command do in FastLoad?
- The SLEEP command specifies the number of minutes to wait before retrying to log on and establish all sessions. It can be used with all load utilities, not only FastLoad. The retry situation can occur if all loader slots are used or if the number of requested sessions is not available.
- The default value is 6 minutes. If TENACITY is set to 2 hours and SLEEP to 10 minutes, the utility will retry the logon every 10 minutes for up to 2 hours.
52. What is a multi-statement insert?
- Inserting data records into a table using multiple INSERT statements sent as one request. It is achieved by putting the semicolon in front of the keyword INSERT of the next statement, rather than terminating the first statement with a semicolon.
- INSERT INTO Sales SELECT * FROM customer
;INSERT INTO Loan SELECT * FROM customer;
53. Which one is better, IN or BETWEEN?
If we have to search for a range of values, it is always advisable to use BETWEEN rather than a list of values in an IN clause. BETWEEN tells the optimizer that we are handling a range of values rather than individual values; hence BETWEEN is generally faster than IN.
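As a sketch against the salarytable from question 29:

```sql
-- Range predicate: BETWEEN lets the optimizer treat this as a range
SELECT empid, salary
FROM salarytable
WHERE salary BETWEEN 10000 AND 20000;

-- An IN list is better reserved for genuinely discrete values
SELECT empid, salary
FROM salarytable
WHERE empid IN (101, 102, 103);
```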
54. What is the default session transaction mode in Teradata?
- Teradata has two session transaction modes: BTET (the default) and ANSI.
- To change the session mode, use the following command before establishing the session: .SET SESSION TRANSACTION (ANSI/BTET)
55. What are the options that are not available for global temporary tables?
GTT definitions are created in the Data Dictionary. These tables provide a separate instance to each user who refers to them. The following options are not available for global temporary tables:
- Any kind of constraint, such as check or referential constraints, cannot be applied to the table
- Identity columns, since data in a GTT is materialized only for the session
- Permanent journaling cannot be done, as the data in these tables exists only as per-session instances
- PPI cannot be applied, as the data is not stored in PERM space; only TEMP space is used here
56. What are the options not available for volatile tables in Teradata?
The following options are not available for volatile tables, because the table definition is not stored in the Data Dictionary:
- Default values for columns
- TITLE clause for columns
- Named indexes for the table
- Compression at the column/table level, since the table data is spooled
- Statistics cannot be collected, since data is materialized only for the session
- Identity columns, as these would again need entries in the IdCol tables
- PPI cannot be applied to the tables
- Any kind of constraint, such as check or referential constraints, cannot be applied to the table
57. Explain skew factor.
The distribution of a table's data among the AMPs is called the skew factor. With a non-unique PI, duplicate values all produce the same row hash, so all rows with the same value land on the same AMP. This makes the data distribution unequal: one AMP stores more data while the others store less. When the full table is accessed, the AMP holding more data takes longer and keeps the other AMPs waiting, which wastes processing capacity. In this situation of unequal distribution, the skew factor is high; for such tables, full-table scans should be avoided.
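As a sketch, skew is commonly measured from the per-AMP CurrentPerm figures in DBC.TableSizeV as how far the average AMP's share falls below the most loaded AMP's share (the database name is a placeholder):

```sql
SELECT DatabaseName,
       TableName,
       100 * (1 - AVG(CurrentPerm) / MAX(CurrentPerm)) AS SkewFactor
FROM DBC.TableSizeV
WHERE DatabaseName = 'mydb'
GROUP BY DatabaseName, TableName
ORDER BY SkewFactor DESC;
```

A value of 0 means a perfectly even distribution; values approaching 100 mean one AMP holds almost all of the table's data.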
58. What is the acceptable range for the skew factor in a table?
There is no particular range for the skew factor. For production systems, it is suggested to keep the skew factor between 5 and 10.
There are various considerations for skew factor –
- Number of AMPS
- Size of row in a table
- number of records in a table
- PI of a table
- Frequent access of table (Performance consideration)
- Whether the table is loaded daily/monthly, or how frequently the data is refreshed
59. How do you determine the number of sessions?
- Teradata performance and workload
- Client platform type
- Channel performance for channel attached systems
- Network topology and performance for network attached systems
- Volume of data to be processed by the application
60. What is multi-valued compression in Teradata?
Multi-valued compression, or MVC, is a compression technique applied to columns in Teradata. MVC can compress up to 255 distinct values per column in a given table. The advantages of compression are:
- Reduced storage cost, since more logical data is stored per unit of physical data.
- Greatly improved performance, due to reduced retrieval of physical data for that column.
- Tables with compression always have an advantage in query planning, since the optimizer considers reduced I/O as one of the factors when generating the EXPLAIN plan.
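As a sketch, multi-valued compression is declared in the column definition at CREATE TABLE time; the table and value lists below are hypothetical:

```sql
-- Each listed value is stored once in the table header instead of
-- physically in every matching row
CREATE TABLE employee_addr
(
    emp_id   INTEGER,
    city     CHAR(20) COMPRESS ('Chicago', 'Seattle', 'Portland'),
    state_cd CHAR(2)  COMPRESS ('WA', 'OR', 'IL')
)
PRIMARY INDEX (emp_id);
```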
61. What is the use of COMPRESS in Teradata? Explain.
- Compression is used to minimize table size; for example, when a table is growing very large, we can use COMPRESS to reduce its size. Conditions:
- Compression must be declared at the time of table creation.
- Up to 255 distinct values per column can be compressed (not 255 columns).
- Variable-length fields (e.g., VARCHAR) cannot be compressed.
- The compressed value is stored in the table header and is used as the default for that column unless another value is present. For example, the Dept of Taxes in Washington has a database that tracks all people working in Washington. Around 99.9% of the taxpayers would have Washington as the state code in their address. Instead of storing "Washington" in millions of records, COMPRESS stores the value "Washington" once in the table header. A row then assumes that value as the default unless another value exists in the column.
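The Washington example above can be sketched as follows (the table and column names are hypothetical):

```sql
-- The compressed value lives in the table header, so rows with
-- state_cd = 'Washington' store no physical data for that column
CREATE TABLE taxpayer
(
    tax_id   INTEGER,
    name     CHAR(30),
    state_cd CHAR(10) COMPRESS ('Washington')
)
PRIMARY INDEX (tax_id);
```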
62. Can you compress a column which is already present in a table using ALTER in Teradata?
- No, we cannot use the ALTER command to compress existing columns in a table.
- A new table structure has to be created that includes the compression values, and the data should be inserted into the compressed table.
- Please note: ALTER can be used only to add new columns with compression values to a table.
63. How do you implement multi-valued compression in an existing table?
- A new column with multi-valued compression can be added to an existing table, but an existing column cannot be modified to add compression.
- Create a new table with the MVC column and do an INSERT ... SELECT from the original table, or
- use CREATE TABLE ... AS with the column designated for MVC.
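The INSERT ... SELECT route can be sketched as follows (all names are hypothetical):

```sql
-- Build a compressed copy of the table, then load it from the original
CREATE TABLE employee_new
(
    emp_id INTEGER,
    dept   CHAR(12) COMPRESS ('Sales', 'Finance', 'HR')
)
PRIMARY INDEX (emp_id);

INSERT INTO employee_new
SELECT emp_id, dept
FROM   employee_old;
```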
64. What are the advantages of compression on tables?
- Compressed columns take less physical space than uncompressed columns, reducing storage cost.
- They improve system performance, as less data is retrieved per row fetched.
- More data fits per data block, which increases data loading speed.
- They reduce overall I/O.
65. What are the ways by which we can use zero to replace a NULL value for a given column?
- ZEROIFNULL: SELECT Col1, ZEROIFNULL(Col2) FROM Table_name;
- COALESCE: SELECT Col1, COALESCE(Col2, 0) FROM Table_name;
- CASE operator: SELECT CASE WHEN Col2 IS NOT NULL THEN Col2 ELSE 0 END FROM Table_name;
66. Can a macro be called inside a macro?
The main purpose of a macro is to run a set of repeated SQL queries. Macros support only DML queries, so we cannot call another macro, or even a procedure, from inside a macro. The closest workaround is to copy all the SQL queries from macro2 into macro1 and add parameters where necessary.
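For reference, a simple parameterized macro looks like the following sketch (table and macro names are hypothetical):

```sql
-- Define a macro that wraps a repeated query; parameters are
-- referenced with a leading colon inside the macro body
CREATE MACRO get_emps_by_dept (dept_no INTEGER) AS
(
    SELECT emp_id, emp_name
    FROM   employee
    WHERE  dept_id = :dept_no;
);

-- Run it
EXEC get_emps_by_dept (100);
```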
67. How do you find the list of employees named "John" in an employee table without using the LIKE operator?
This question seems tricky, but there is another way to find names/patterns without using the LIKE operator: BETWEEN. We can find the list of employees named John with: SELECT * FROM employee WHERE name BETWEEN 'J' AND 'K'; Note that the use of BETWEEN can be tricky: if there are other employees whose names start with J, those employees will also be listed by this query.
68. What are the different types of spaces available in Teradata?
There are 3 types of spaces available in Teradata:
Perm space
- This is disk space used for storing user data rows in any tables located in the database.
- Both users and databases can be given perm space.
- This space is not pre-allocated; it is used up as data rows are stored on disk.
Spool space
- It is a temporary workspace used for processing rows for given SQL statements.
- Spool space is assigned only to users.
- Once the SQL processing is complete, the spool is freed and given to some other query.
- Unused perm space is automatically available for spool.
Temp space
- It is allocated to any databases/users where global temporary tables are created and data is stored in them.
- Unused perm space is available for temp space.
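All three spaces are set when a user is created or modified, as in this sketch (the user name, owner, and sizes are hypothetical):

```sql
-- Perm, spool, and temp allocations are given in bytes
CREATE USER etl_user
FROM   sysdba
AS     PERMANENT = 10e9,   -- perm space: user data rows
       SPOOL     = 20e9,   -- spool space: intermediate query results
       TEMPORARY = 5e9,    -- temp space: global temporary tables
       PASSWORD  = etl_pwd;
```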
69. What is a hash collision?
A hash collision occurs when the same hash value is generated for two different primary index values. It is a rare occurrence and has been addressed in later versions of Teradata.
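Teradata's HASHROW function exposes the row hash for a value, so hash synonyms can be inspected directly; a sketch (table and column names are hypothetical):

```sql
-- Row hashes shared by more than one distinct PI value are synonyms
SELECT HASHROW(emp_id)        AS row_hash,
       COUNT(DISTINCT emp_id) AS distinct_values
FROM   employee
GROUP  BY 1
HAVING COUNT(DISTINCT emp_id) > 1;
```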
70. What is RAID? What are the types of RAID?
Redundant Array of Inexpensive Disks (RAID) is a type of protection available in Teradata. RAID provides data protection at the disk drive level, ensuring data is available even when a disk drive has failed. There are 6 levels of RAID (RAID 0 to RAID 5). Teradata supports two levels of RAID protection: RAID 1, a mirrored copy of the data on another disk; and RAID 5, parity-based (XOR) data protection on each disk array. One of the major overheads of RAID is space consumption.
71. What are the restrictions on views in Teradata?
- An index cannot be created on a view.
- It cannot contain an ORDER BY clause.
- All derived columns and aggregate columns used in the view must have an AS clause (alias defined).
- A view cannot be used to UPDATE if it contains any of the following:
- Data from more than one table (join view)
- The same column twice
- Derived columns
- A DISTINCT clause
- A GROUP BY clause
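The aliasing rule can be illustrated with a sketch (view, table, and column names are hypothetical):

```sql
-- Aggregate and derived columns in a view must carry an AS alias
CREATE VIEW dept_salary_v AS
SELECT dept_id,
       SUM(salary)      AS total_salary,    -- aggregate: alias required
       SUM(salary) / 12 AS monthly_salary   -- derived: alias required
FROM   employee
GROUP  BY dept_id;
```

This view also contains a GROUP BY clause, so per the restrictions above it cannot be used to UPDATE the underlying table.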
72. List the built-in functions used in Teradata.
- SESSION: returns the session number for the current user.
- TIME: provides the current time based on a 24-hour day.
- USER: gives the user name of the current user.
- ACCOUNT: displays your Teradata account information.
- CURRENT_DATE: returns the current system date.
- CURRENT_TIME: returns the current system time and the current session 'Time Zone' displacement.
- CURRENT_TIMESTAMP: returns the current system timestamp with time zone.
- DATABASE: returns the name of the default database for the current user.
- DATE: same as CURRENT_DATE; a Teradata built-in.
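These built-ins can be selected directly, with no FROM clause, as in this sketch:

```sql
-- Each built-in evaluates in the context of the current session
SELECT USER,
       DATABASE,
       CURRENT_DATE,
       CURRENT_TIMESTAMP;
```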
73. What are the different types of temporary tables in Teradata?
- Global Temporary Tables (GTT):
- When they are created, the definition goes into the Data Dictionary.
- When materialized, the data goes into temp space.
- Hence, the data is active only until the session ends, while the definition remains until it is dropped with a DROP TABLE statement. If dropped from some other session, it should be DROP TABLE ALL;
- You can collect stats on a GTT.
- Defined with the CREATE GLOBAL TEMPORARY TABLE SQL statement.
- Volatile Temporary Tables (VTT):
- Local to a session (deleted automatically when the session terminates).
- The table definition is stored in the system cache.
- Data is stored in spool space.
- Hence, both the data and the table definition are active only until the session ends.
- No collect stats for a VTT.
- If you are using a volatile table, you cannot put default values at the column level (while creating the table).
- Created by the CREATE VOLATILE TABLE SQL statement.
- Derived tables:
- Derived tables are local to an SQL query.
- Not included in the DBC data dictionary database; the definition is kept in cache.
- They are specified at the query level with an AS keyword in an SQL statement.
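The three kinds can be sketched side by side (table and column names are hypothetical):

```sql
-- GTT: definition persists in the dictionary, data lives in temp space
CREATE GLOBAL TEMPORARY TABLE gtt_sales
(
    sale_id INTEGER,
    amount  DECIMAL(10,2)
)
ON COMMIT PRESERVE ROWS;

-- VTT: definition in cache, data in spool; gone when the session ends
CREATE VOLATILE TABLE vtt_sales
(
    sale_id INTEGER,
    amount  DECIMAL(10,2)
)
ON COMMIT PRESERVE ROWS;

-- Derived table: exists only for the duration of the query
SELECT t.sale_id
FROM   (SELECT sale_id FROM sales WHERE amount > 100) AS t;
```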
74. Why does BTET transaction processing add overhead in Teradata?
- BTET makes all queries running between BT and ET a single transaction. If any query fails, the entire set of queries is not committed.
- BTET also has a locking overhead, since it holds locks on tables until the ET occurs or all queries have executed successfully.
- DDL statements cannot be used everywhere in BTET processing; they can only be given towards the end of a BTET transaction.
- Using a large number of BTETs causes the transient journal to grow and has a tendency to cause a system restart.
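A minimal BTET transaction, assuming a Teradata-mode session and hypothetical table names, looks like this:

```sql
BT;  -- BEGIN TRANSACTION

-- Both updates succeed or fail together; locks are held until ET
UPDATE accounts SET balance = balance - 100 WHERE acct_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE acct_id = 2;

ET;  -- END TRANSACTION: commits the work and releases the locks
```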
75. What are the different date formats available in the Teradata system?
- The Teradata default format is: YY/MM/DD
- The ANSI display format is: YYYY-MM-DD
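Either display style can be requested explicitly with a FORMAT phrase, as in this sketch:

```sql
-- The FORMAT phrase controls how a date is rendered on output
SELECT CAST(CURRENT_DATE AS FORMAT 'YYYY-MM-DD');  -- ANSI style
SELECT CAST(CURRENT_DATE AS FORMAT 'YY/MM/DD');    -- Teradata default style
```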