ColumnStore Indexes are great! They are a real performance booster for Data Warehousing and Reporting workloads. In combination with Clustered Column Store Indexes you get a huge compression benefit over regular Row Store Indexes (Clustered Indexes, Non-Clustered Indexes).
In other words, it’s a technology for storing, retrieving and managing data by using a columnar data format, called a Column Store.
Creating a Clustered Column Store Index is quite easy:
But is that all that you have to know about Clustered Column Store Indexes?
Not really, let’s explore this blog to understand more of this.
In discussions about column store indexes, we use the terms row store and column store to emphasize the format for the data storage. Column store indexes use both types of storage.
- A column store is data that is logically organized as a table with rows and columns, and physically stored in a column-wise data format.
A column store index physically stores most of the data in column store format. In column store format, the data is compressed and uncompressed as columns. There is no need to uncompressed other values in each row that are not requested by the query. This makes it fast to scan an entire column of a large table.
- A row store is data that is logically organized as a table with rows and columns, and then physically stored in a row-wise data format. This has been the traditional way to store relational table data such as a heap or clustered “b tree” index.
A column store index also physically stores some rows in a row store format called a delta store. The delta store, also called delta row groups, is a holding place for rows that are too few in number to qualify for compression into the column store. Each delta row group is implemented as a clustered b tree index.
- A delta store is a holding place for rows that are too few in number to be compressed into the column store. The delta store is a row store.
There are so many advantages with Clustered Column Store Indexes, which lead to massive performance improvements:
- Better Compression
- Batch Mode Execution
- Less IO & better Memory Management
- Segment Elimination
Creating a Clustered Column Store Index in SQL Server is quite easy as you see from the following example:
You only need to specify the table name, nothing more. You even don’t need to worry about Clustered Key Columns, because this concept doesn’t apply to Column Store Indexes.
Easy, isn’t it?
Let’s run a simple query with that Clustered Column Store Index in place:
— Segment Elimination doesn’t work quite well, because
— we have a lot of overlapping Segments.
The query is quite fast, because SQL Server can use the Clustered Column Store Index for query execution. And the output from STATISTICS IO also shows you that not many LOB Logical Reads were needed for accessing the Clustered Column Store Index:
But what about these Segment Read and Segment Skipped metrics?
As you might know a Column Store Index is internally subdivided into so-called Column Store Segments. A Column Store Segment is always specific to a specific column and a Row Group. A Row Group contains about 1 million of rows. The following picture illustrates this very important concept:
Figure 1: Illustrating how a column store index is created and stored. The set of rows is divided into row groups that are converted to column segments and dictionaries that are then stored using SQL Server blob storage.
What is Column Store Segment Elimination?
The most important thing here is that SQL Server stores a Minimum and Maximum Value internally for each Column Store Segment. And based on these values SQL Server can perform so-called Segment Elimination. Segment Elimination means that SQL Server only reads those Segments (while accessing a Column Store Index) that contain requested data. You can think about it in the same way as Partition Elimination when you work with Partitioned Tables. But the elimination happens here at the Column Store Segment Level.
As you have seen in the previous picture, SQL Server wasn’t able to eliminate any segments during the Column Store Index access, because by default you have no sorting order in the Column Store Index. The sorting order of your data depends on how SQL Server reads the data in the Execution Plan when you create the Column Store Index:
As you can see the Clustered Column Store Index was created by reading from the Heap Table that initially contained the data. And therefore, you have no sorting order in the Clustered Column Store Index, and therefore the Segment Elimination can’t work perfectly for you.
How can you improve that situation?
Enforce a sorting order in your data by first creating a traditional Row Store Clustered Index, and change it to a Clustered Column Store Index! Ouch that hurts…
— Now we create a traditional Row Store Clustered Index to sort our table data by the column “DateKey”.
— “Swap” the Clustered Index through a Clustered Column Store Index
You should have now sorted data in the Clustered Column Store Index, and Segment Elimination should work quite well:
— Segment Elimination works better than previously, but still not perfectly.
But when you look again at the output of STATISTICS IO, SQL Server still has to read a lot of segments, and only skips a few of them:
But why can’t SQL Server skip all segments other than the one we are interested in? The problem lies in the creation of the Clustered Column Store Index. Execution Plan, the Column Store Index Insert (Clustered) operator was running in parallel – across multiple worker threads. And these worker threads are again destroying the order of your data in the Clustered Column Store Index! You read your data pre-sorted from the Clustered Row Store Index, and then the parallel creation of the Clustered Column Store Index reshuffles your data… That hurts – again!
You can only solve that problem by creating the Clustered ColumnStore Index with a MAXDOP of 1:
This sounds terrible, and it is terrible!!! But it’s the only solution to let you preserve the ordering of your data in the Column Store Index. When you later read from the Clustered Column Store Index, you will then see that SQL Server was finally able to skip all Segments other than the one you are interested in:
Clustered Column Store Indexes are great. But by default, the Segment Elimination can’t be performed very well, because you have no predefined sorting order in your Clustered Column Store. Therefore, you should always make sure that the Segment Elimination works well, when you tune your Column Store queries. And sometimes you even have to work against SQL Server by using a MAXDOP 1 to preserve the ordering of your data.