Sunday, September 18, 2016

Fact and Dimension Relationship

Following are the four different types of relationships between a Cube Dimension and a Measure Group:

In a Regular relationship, primary key column of a dimension is directly connected to the fact table. This type of relationship is similar to the relationship between a dimension and a fact in a Star Schema, and it can be based on either the physical primary key-foreign key relationship in the underlying relational database or the logical primary key-foreign key relationship defined in the Data Source View.
In a Referenced relationship, primary key columns of a dimension is indirectly connected to the fact table through a key column in the intermediate dimension table. This type of relationship is similar to the indirect relationship between a dimension and a fact, through an intermediate dimension, in a Snowflake Schema.
In a Fact relationship, the dimension table and the fact table are one and the same. Basically a Fact Dimension or Degenerate Dimensionis created using one or more columns from the fact table and this degenerate dimension is used while defining/establishing the relationship in case of a fact relationship.
In a Many-to-Many relationship, a dimension is indirectly connected to a Measure Group through an intermediate fact table which joins with the dimension table. It is analogous to a scenario, where one project can have multiple project managers and one project manager can manage multiple projects.

Saturday, November 7, 2015


 What do we mean by dataflow in SSIS?
Data flow is nothing but the flow of data from the corresponding sources to the referred destinations. In this process, the data transformations make changes to the data to make it ready for the data warehouse.
• What is a breakpoint in SSIS? How is it setup? How do you disable it?
A breakpoint is a stopping point in the code. The breakpoint can give the Developer\DBA an
opportunity to review the status of the data, variables and the overall status of the SSIS package.
10 unique conditions exist for each breakpoint.
Breakpoints are setup in BIDS. In BIDS, navigate to the control flow interface. Right click on the
object where you want to set the breakpoint and select the ‘Edit Breakpoints…’ option.
• Can you name 5 or more of the native SSIS connection managers?
1) OLEDB connection – Used to connect to any data source requiring an OLEDB connection (i.e.,
SQL Server 2000)
2) Flat file connection – Used to make a connection to a single file in the File System. Required for reading information from a File System flat file
3) ADO.Net connection – Uses the .Net Provider to make a connection to SQL Server 2005 or other
connection exposed through managed code (like C#) in a custom task
4) Analysis Services connection – Used to make a connection to an Analysis Services database or project. Required for the Analysis Services DDL Task and Analysis Services Processing Task
5) File connection – Used to reference a file or folder. The options are to either use or create a file or folder
6) Excel
• What is the use of Bulk Insert Task in SSIS?
Bulk Insert Task is used to upload large amount of data from flat files into Sql Server. It supports only OLE DB connections for destination database.
• What is Conditional Split transformation in SSIS?
This is just like IF condition which checks for the given condition and based on the condition evaluation, the output will be sent to the appropriate OUTPUT path. It has ONE input and MANY outputs. Conditional Split transformation is used to send paths to different outputs based on some conditions. For example, we can organize the transform for the students in a class who have marks greater than 40 to one path and the students who score less than 40 to another path.
• How do you eliminate quotes from being uploaded from a flat file to SQL Server? 
This can be done using TEXT QUALIFIER property. In the SSIS package on the Flat File Connection Manager Editor, enter quotes into the Text qualifier field then preview the data to ensure the quotes are not included.
• Can you explain how to setup a checkpoint file in SSIS?
The following items need to be configured on the properties tab for SSIS package:
CheckpointFileName – Specify the full path to the Checkpoint file that the package uses to save the value of package variables and log completed tasks. Rather than using a hard-coded path as shown above, it’s a good idea to use an expression that concatenates a path defined in a package variable and the package name.
CheckpointUsage – Determines if/how checkpoints are used. Choose from these options: Never(default), IfExists, or Always. Never indicates that you are not using Checkpoints. IfExists is the typical setting and implements the restart at the point of failure behavior. If a Checkpoint file is found it is used to restore package variable values and restart at the point of failure. If a Checkpoint file is not found the package starts execution with the first task. The Always choice raises an error if the Checkpoint file does not exist.
SaveCheckpoints – Choose from these options: True or False (default). You must select True to implement the Checkpoint behavior.
• What are the different values you can set for CheckpointUsage property ?
There are three values, which describe how a checkpoint file is used during package execution:
1) Never: The package will not use a checkpoint file and therefore will never restart.
2) If Exists: If a checkpoint file exists in the place you specified for the CheckpointFilename property, then it will be used, and the package will restart according to the checkpoints written.
3) Always: The package will always use a checkpoint file to restart, and if one does not exist, the package will fail.
• What is the ONLY Property you need to set on TASKS in order to configure CHECKPOINTS to RESTART package from failure?
The one property you have to set on the task is FailPackageOnFailure. This must be set for each task or container that you want to be the point for a checkpoint and restart. If you do not set this property to true and the task fails, no file will be written, and the next time you invoke the package, it will start from the beginning again.
• Where can we set the CHECKPOINTS, in DataFlow or ControlFlow ?
Checkpoints only happen at the Control Flow; it is not possible to checkpoint transformations or restart inside a Data Flow. The Data Flow Task can be a checkpoint, but it is treated as any other task.
• Can you explain different options for dynamic configurations in SSIS?
1) XML file
2) custom variables
3) Database per environment with the variables
4) Use a centralized database with all variables
• What is the use of Percentage Sampling transformation in SSIS?
Percentage Sampling transformation is generally used for data mining. This transformation builds a random sample of set of output rows by choosing specified percentage of input rows. For example if the input has 1000 rows and if I specify 10 as percentage sample then the transformation returns 10% of the RANDOM records from the input data.
• What is the use of Term Extraction transformation in SSIS?
Term Extraction transformation is used to extract nouns or noun phrases or both noun and noun phrases only from English text. It extracts terms from text in a transformation input column and then writes the terms to a transformation output column. It can be also used to find out the content of a dataset.
• What is Data Viewer and what are the different types of Data Viewers in SSIS?
A Data Viewer allows viewing data at a point of time at runtime. If data viewer is placed before and after the Aggregate transform, we can see data flowing to the transformation at the runtime and how it looks like after the transformation occurred. The different types of data viewers are:
1. Grid
2. Histogram
3. Scatter Plot
4. Column Chart.
• What is Ignore Failure option in SSIS?
In Ignore Failure option, the error will be ignored and the data row will be directed to continue on the next transformation. Let’s say you have some JUNK data(wrong type of data or JUNK data) flowing from source, then using this option in SSIS we can REDIRECT the junk data records to another transformation instead of FAILING the package. This helps to MOVE only valid data to destination and JUNK can be captured into separate file.
• Which are the different types of Control Flow components in SSIS?
The different types of Control Flow components are: Data Flow Tasks, SQL Server Tasks, Data Preparation Tasks, Work flow Tasks, Scripting Tasks, Analysis Services Tasks, Maintenance Tasks, Containers.
• What are containers? What are the different types of containers in SSIS?
Containers are objects that provide structures to packages and extra functionality to tasks. There are four types of containers in SSIS, they are: Foreach Loop Container, For Loop Container, Sequence Container and Task Host Container.
• What are the different types of Data flow components in SSIS?
There are 3 data flow components in SSIS.
1. Sources
2. Transformations
3. Destinations
• What are the different types of data sources available in SSIS?
There are 7 types of data sources provided by SSIS: a.) Data Reader source b.) Excel source c.) Flat file source d.) OLEDB source e.) Raw file source f.) XML source g.) Script component
• What is SSIS Designer?
It is a graphical tool for creating packages. It has 4 tabs: Control Flow, Data Flow, Event Handlers and Package Explorer.
• What is Control Flow tab?
It is the tab in SSIS designer where various Tasks can be arranged and configured. This is the tab where we provide and control the program flow of the project.
• What is Data Flow tab?
This is the tab where we do all the work related to ETL job. It is the tab in SSIS Designer where we can extract data from sources, transform the data and then load them into destinations.
• What is the function of control flow tab in SSIS?
On the control flow tab, the tasks including dataflow task, containers and precedence constraints that connect containers and tasks can be arranged and configured.
• What is the function of Event handlers tab in SSIS?
On the Event handlers tab, workflows can be configured to respond to package events.
For example, we can configure Work Flow when ANY task Failes or Stops or Starts ..
• What is the function of Package explorer tab in SSIS?
This tab provides an explorer view of the package. You can see what is happening in the package. The Package is a container at the top of the hierarchy.
• What is Solution Explorer?
It is a place in SSIS Designer where all the projects, Data Sources, Data Source Views and other miscellaneous files can be viewed and accessed for modification.
• How do we convert data type in SSIS?
The Data Conversion Transformation in SSIS converts the data type of an input column to a different data type.
• How are variables useful in ssis package?
Variables can provide communication among objects in the package. Variables can provide communication between parent and child packages. Variables can also be used in expressions and scripts. This helps in providing dynamic values to tasks.
• Explain Aggregate Transformation in SSIS?
It aggregates data, similar you do in applying TSQL functions like Group By, Min, Max, Avg, and Count. For example you get total quantity and Total line item for each product in Aggregate Transformation Editor. First you determine input columns, then output column name in Output Alias table in datagrid, and also operations for each Output Alias in Operation columns of the same datagrid. Some of operation functions listed below :
• Group By
• Average
• Count
• Count Distinct : count distinct and non null column value
• Min, Max, Sum
In Advanced tab, you can do some optimization here, such as setting up Key Scale option (low, medium, high), Count Distinct scale option (low, medium, high), Auto Extend factor and Warn On Division By Zero. If you check Warn On Division By Zero, the component will give warning instead of error. Key Scale option will optimize transformation cache to certain number of key threshold. If you set it low, optimization will target to 500,000 keys written to cache, medium can handle up to 5 million keys, and high can handle up to 25 million keys, or you can specify particular number of keys here. Default value is unspecified. Similar to number of keys for Count Distinct scale option. It is used to optimize number of distinct value written to memory, default value is unspecified. Auto Extend Factor is used when you want some portion of memory is used for this component. Default value is 25% of memory.
• Explain Audit Transformation ?
It allows you to add auditing information as required in auditing world specified by HIPPA and Sarbanes-Oxley (SOX). Auditing options that you can add to transformed data through this transformation are :
1. Execution of Instance GUID : ID of execution instance of the package
2. PackageID : ID of the package
3. PackageName
4. VersionID : GUID version of the package
5. Execution StartTime
6. MachineName
7. UserName
8. TaskName
9. TaskID : uniqueidentifier type of the data flow task that contains audit transformation.
• Explain Character Map Transformation?
It transforms some character. It gives options whether output result will override the existing column or add to new column. If you define it as new column, specify new column name. Operations available here are:
1. Uppercase
2. Lowercase
3. Byte reversal : such as from 0×1234 to 0×4321
4. Full width
5. Half width
6. Hiragana/katakana/traditional Chinese/simplified Chinese
7. Linguistic casing
• Explain Conditional split Transformation ?
It functions as if…then…else construct. It enables send input data to a satisfied conditional branch. For example you want to split product quantity between less than 500 and greater or equal to 500. You can give the conditional a name that easily identifies its purpose. Else section will be covered in Default Output Column name.
After you configure the component, it connect to subsequent transformation/destination, when connected, it pops up dialog box to let you choose which conditional options will apply to the destination transformation/destination.
• Explain Copy column Transformation?
This component simply copies a column to another new column. Just like ALIAS Column in T-Sql.
• Explain Data conversion Transformation?
This component does conversion data type, similar to TSQL function CAST or CONVERT. If you wish to convery the data from one type to another then this is the best bet. But please make sure that you have COMPATABLE data in the column.
• Explain Data Mining query Transformation?
This component does prediction on the data or fills gap on it. Some good scenarios uses this component is:
1. Take some input columns as number of children, domestic income, and marital income to predict whether someone owns a house or not.
2. Take prediction what a customer would buy based analysis buying pattern on their shopping cart.
3. Filling blank data or default values when customer doesn’t fill some items in the questionnaire.
• Explain Derived column Transformation?
Derived column creates new column or put manipulation of several columns into new column. You can directly copy existing or create a new column using more than one column also.
• Explain Merge Transformation?
Merge transformation merges two paths into single path. It is useful when you want to break out data into path that handles errors after the errors are handled, the data are merge back into downstream or you want to merge 2 data sources. It is similar with Union All transformation, but Merge has some restrictions :
1. Data should be in sorted order
2. Data type , data length and other meta data attribute must be similar before merged.
• Explain Merge Join Transformation?
Merge Join transformation will merge output from 2 inputs and doing INNER or OUTER join on the data. But if you the data come from 1 OLEDB data source, it is better you join through SQL query rather than using Merge Join transformation. Merge Join is intended to join 2 different data source.
• Explain Multicast Transformation?
This transformation sends output to multiple output paths with no conditional as Conditional Split does. Takes ONE Input and makes the COPY of data and passes the same data through many outputs. In simple Give one input and take many outputs of the same data.
• Explain Percentage and row sampling Transformations?
This transformation will take data from source and randomly sampling data. It gives you 2 outputs. First is selected data and second one is unselected data. It is used in situation where you train data mining model. These two are used to take the SAMPLE of data from the input data.
• Explain Sort Transformation?
This component will sort data, similar in TSQL command ORDER BY. Some transformations need sorted data.
• Explain Union all Transformation?
It works in opposite way to Merge transformation. It can take output from more than 2 input paths and combines into single output path.
• What r the possible locations to save SSIS package?
You can save a package wherever you want.
SQL Server
Package Store
File System
• What is a package?
A discrete executable unit of work composed of a collection of control flow and other objects, including data sources, transformations, process sequence, and rules, errors and event handling, and data destinations.
• What is a workflow in SSIS?
A workflow is a set of instructions on how to execute tasks.
(It is a set of instructions on how to execute tasks such as sessions, emails and shell commands. a workflow is created form work flow mgr.
• What is the diff between control flow Items and data flow Items?
The control flow is the highest level control process. It allows you to manage the run-time process activities of data flow and other processes within a package.
When we want to extract, transform and load data within a package. You add an SSIS dataflow task to the package control flow.
• What are the main component of SSIS(project-architecture)?
SSIS archItecture has 4 main components
1.ssis service
2.ssis runtime engine & runtime executables
3.ssis dataflow engine & dataflow components
4.ssis clients
• Different components in SSIS package?
Control flow
Data flow
Event handler
Package explorer
• What are Connection Managers?
It is a bridge b/w package object and physical data. It provides logical representation of a connection at design time the properties of the connection mgr describes the physical connection that integration services creates when the package is run.
• What is environment variable in SSIS?
An environment variable configuration sets a package property equal to the value in an environment variable.
Environmental configurations are useful for configuring properties that are dependent on the computer that is executing the package.
• How to provide securIty to packages?
We can provide security in two ways
1. Package encryption
2. Password protection.
• What are Precedence constraints?
Constraints that link executable, container, and tasks wIthin the package control flow and specify condItion that determine the sequence and condItions for determine whether executable run.
• What is Design time Deployment in SSIS ?
When you run a package from with in BIDS,it is built and temporarily deployed to the folder. By default the package will be deployed to the BIN folder in the Package’s Project folder and you can configure for custom folder for deployment. When the Package’s execution is completed and stopped in BIDS,the deployed package will be deleted and this is called as Design Time Deployment.

Saturday, November 29, 2014

What are types of storage modes and Explain

There are three standard storage modes in OLAP applications
1.       MOLAP
2.       ROLAP
3.       HOLAP

MOLAP (Multi dimensional Online Analytical Processing) : MOLAP is the most used storage type. Its designed to offer maximum query performance to the users. the data and aggregations are stored in a multidimensional format, compressed and optimized for performance. This is both good and bad. When a cube with MOLAP storage is processed, the data is pulled from the relational database, the aggregations are performed, and the data is stored in the AS database. The data inside the cube will refresh only when the cube is processed, so latency is high.
·         Since the data is stored on the OLAP server in optimized format, queries (even complex calculations) are faster than ROLAP.
·         The data is compressed so it takes up less space.
·         And because the data is stored on the OLAP server, you don’t need to keep the connection to the relational database.
·         Cube browsing is fastest using MOLAP.
ROLAP (Relational Online Analytical Processing) : ROLAP does not have the high latency disadvantage of MOLAP. With ROLAP, the data and aggregations are stored in relational format. This means that there will be zero latency between the relational source database and the cube.

Disadvantage of this mode is the performance, this type gives the poorest query performance because no objects benefit from multi dimensional storage.

·         Since the data is kept in the relational database instead of on the OLAP server, you can view the data in almost real time.
·         Also, since the data is kept in the relational database, it allows for much larger amounts of data, which can mean better scalability.
·         Low latency.


Hybrid Online Analytical Processing (HOLAP): HOLAP is a combination of MOLAP and ROLAP. HOLAP stores the detail data in the relational database but stores the aggregations in multidimensional format. Because of this, the aggregations will need to be processed when changes are occur. With HOLAP you kind of have medium query performance: not as slow as ROLAP, but not as fast as MOLAP. If, however, you were only querying aggregated data or using a cached query, query performance would be similar to MOLAP. But when you need to get that detail data, performance is closer to ROLAP.


·         HOLAP is best used when large amounts of aggregations are queried often with little detail data, offering high performance and lower storage requirements.
·         Cubes are smaller than MOLAP since the detail data is kept in the relational database.
·         Processing time is less than MOLAP since only aggregations are stored in multidimensional format.
·         Low latency since processing takes place when changes occur and detail data is kept in the relational database.

Monday, September 1, 2014

What is use of AttributeHierarchyOrdered ?

Determines whether the associated attribute hierarchy is ordered. The default value is True

if an attribute hierarchy will not be used for querying, you can save processing time by changing the value of this property to False.