Here are my summary notes for what I learn about Column Store Index
- Columnar storage: Reduce IO and Memory usage
- Batch mode processing
- Segment elimination : segment can be bypassed base on the predicates
- Updateable cluster index
- Further reduce the storage need
- ROW GROUP: Each table would be split into row group. currently 1 million record per row group.
- Segment : Each Column in the row group will be split into segment with MAX/MIN value.
- Delta Store: New Add data would be add as Delta store before merge into the ROW GROUP
- Delete Bitmap: Delete record would be mark as delete in the delete bitmap.Only when the table get reorg, the delete bitmap would be apply to the column segment.
Below test, I am going to create 4 tables:
- t1: normal cluster index
- t2: Non cluster columnstore index
- t3: Cluster clumnstore index
- t4: compress columnstore index
Here are the test query and its result query plan.
As you can see, the Columnstore index dramatically increase the performance and CCI also use less space then the regular table and NCCI.
Non-Cluster Column Store index: NCCI
- Introduce in SQL 2012
- Read only, non updateable .
- require separate heap and Cluster index before create it.
We can use IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX to ignore the NCCI. There is no correspond query hint exist in the CCI.
If we try to insert the data into the NCCI table, we will get below error.
Msg 35330, Level 15, State 1, Line 56
INSERT statement failed because data cannot be updated in a table that has a nonclustered columnstore index. Consider disabling the columnstore index before issuing the INSERT statement, and then rebuilding the columnstore index after INSERT has completed.
Updateable Cluster Column Store Index: CCI
New in 2014, Cluster Column Store index ( CCI ) is updateable. This is achieve by adding the row group into the data. New insert row will be place in the new row group until reorg .
sys.column_store_row_group is new system table which we can use to view the status of the row group.
The entire table is going to converting to one column store index and we don’t need to specify the column name when creating the index.
After insert the data, we can see new row group being added with sate OPEN.
REORGANTIZE WITH ( COMPRESS_ALL_ROW_GROUPS = ON)
REBUILD PARTITION = ALL
Batch mode & Row Mode
- New in SQL 2012, Query optimizer can use ‘Batch’ mode to process the data instead of processing the data in ROW mode.
- Optimizer would not use always use Batch mode in the conlumnstore index . If the query does not need to be Parallelism ( MAXDOP >= 2) , it wont use.
- Only conlumnstore index can have batch mode processing.
- operator that support the Batch mode
I am going to use the same query to see what performance difference we can get from Batch mode.
To force the ROW mode, I will use MAXDOP 1 in the query hint.
Another performance gain we can get from column store index is through segment elimination.
Data is distribute through segment and each segment has MIN and MAX value. Query optimizer can skip the non need segment to reduce the IO. By using DBCC TRACEON(646, -1) we can see if segment elimination is happing during the query processing.
This is all the row group for the column 1
Base on our query, it will skip the row group 0