Here are my summary notes for what I learn about Column Store Index
Benefit :
- Columnar storage: Reduce IO and Memory usage
- Batch mode processing
- Segment elimination : segment can be bypassed base on the predicates
New 2014
- Updateable cluster index
- Further reduce the storage need
Terminology
- ROW GROUP: Each table would be split into row group. currently 1 million record per row group.
- Segment : Each Column in the row group will be split into segment with MAX/MIN value.
- Delta Store: New Add data would be add as Delta store before merge into the ROW GROUP
- Delete Bitmap: Delete record would be mark as delete in the delete bitmap.Only when the table get reorg, the delete bitmap would be apply to the column segment.
Testing
Below test, I am going to create 4 tables:
- t1: normal cluster index
- t2: Non cluster columnstore index
- t3: Cluster clumnstore index
- t4: compress columnstore index
Here are the test query and its result query plan.
As you can see, the Columnstore index dramatically increase the performance and CCI also use less space then the regular table and NCCI.

Create database ColumnStoreDB;
go
use ColumnStoreDB;
Go
--- Create 4 tables
/**
t1: normal cluster index
t2: Non cluster columnstore index
t3: Cluster clumnstore index
t4: compress columnstore index
**/
drop table t1;
go
drop table t2;
go
drop table t3;
go
drop table t4;
--go
SELECT [ProductKey] ,[CustomerKey] ,[CurrencyKey], [TotalProductCost] ,[OrderDate] ,[DueDate] INTO t1
FROM [AdventureWorksDW2014].[dbo].[FactInternetSales]
GO
INSERT INTO [dbo].t1 ( [ProductKey] ,[CustomerKey] ,[CurrencyKey], [TotalProductCost] ,[OrderDate],[DueDate] )
SELECT [ProductKey] ,[CustomerKey] ,[CurrencyKey], [TotalProductCost] ,[OrderDate] ,[DueDate]
FROM [AdventureWorksDW2014].[dbo].[FactInternetSales]
GO 100
SELECT * INTO t2
FROM t1
GO
SELECT * INTO t3
FROM t1
GO
SELECT * INTO t4
FROM t1
GO
-- Create cluster index t1 and t2
create clustered index IX_C_ProductKey on t1 ( [ProductKey] )
go
create clustered index IX_C_ProductKey on t2 ( [ProductKey] )
go
-- Create nonclustered ColumnStore index t2
CREATE NONCLUSTERED COLUMNSTORE INDEX IX_NC_ProductKey
ON t2 ([ProductKey] ,[CustomerKey] ,[CurrencyKey], [TotalProductCost] ,[OrderDate] ,[DueDate] )
GO
-- ERROR
-- Msg 35330, Level 15, State 1, Line 25
-- INSERT statement failed because data cannot be updated in a table that has a nonclustered columnstore index. Consider disabling the columnstore index before issuing the INSERT statement, and then rebuilding the columnstore index after INSERT has completed.
-- Create clustered index t3
CREATE CLUSTERED COLUMNSTORE INDEX IX_C_ProductKey
ON t3
GO
-- Create clustered index with compression t4
CREATE CLUSTERED COLUMNSTORE INDEX IX_C_ProductKey_compress
ON t4
WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE)
GO
-- Check Index
SELECT ind.name, obj.name, ind.type_desc
FROM sys.indexes ind
INNER JOIN sys.objects obj
ON ind.object_id = obj.object_id
WHERE obj.name like 't%'
ORDER BY 2
GO
--- check the table size
sp_spaceused '[dbo].[t1]';
go
sp_spaceused '[dbo].[t2]';
go
sp_spaceused '[dbo].[t3]';
go
sp_spaceused '[dbo].[t4]';
go
-- check performance
DBCC DROPCLEANBUFFERS
SET STATISTICS IO ON
SET STATISTICS TIME ON
--select count(*) from t1
--go
--select count(*) from t2
--go
--select count(*) from t3
--go
--select count(*) from t4
--go
select ProductKey,count(CustomerKey),SUM([TotalProductCost]) from t1
group by ProductKey
go
select ProductKey,count(CustomerKey),SUM([TotalProductCost]) from t2
group by ProductKey
go
select ProductKey,count(CustomerKey),SUM([TotalProductCost]) from t3
group by ProductKey
go
select ProductKey,count(CustomerKey),SUM([TotalProductCost]) from t4
group by ProductKey
go
Non-Cluster Column Store index: NCCI
- Introduce in SQL 2012
- Read only, non updateable .
- require separate heap and Cluster index before create it.
We can use IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX to ignore the NCCI. There is no correspond query hint exist in the CCI.
select ProductKey,count(CustomerKey),SUM([TotalProductCost]) from t2
group by ProductKey
option(IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX ) ;
go
If we try to insert the data into the NCCI table, we will get below error.
Msg 35330, Level 15, State 1, Line 56
INSERT statement failed because data cannot be updated in a table that has a nonclustered columnstore index. Consider disabling the columnstore index before issuing the INSERT statement, and then rebuilding the columnstore index after INSERT has completed.
Updateable Cluster Column Store Index: CCI
New in 2014, Cluster Column Store index ( CCI ) is updateable. This is achieve by adding the row group into the data. New insert row will be place in the new row group until reorg .
sys.column_store_row_group is new system table which we can use to view the status of the row group.
The entire table is going to converting to one column store index and we don’t need to specify the column name when creating the index.
--- CCI update
Insert into t3 values ( 1,1,1,1,getdate(),getdate())
select * from sys.column_store_row_groups where object_id= object_id('t3');
--- regular reorg would not reorg the CCI
ALTER INDEX IX_C_ProductKey ON t3 REORGANIZE;
--- Change the Sate from OPEN to COMPRESSED
ALTER INDEX IX_C_ProductKey ON t3 REORGANIZE
WITH (COMPRESS_ALL_ROW_GROUPS = ON);
--- Merge the rowgroup
ALTER INDEX IX_C_ProductKey ON t3 REBUILD PARTITION = ALL;
After insert the data, we can see new row group being added with sate OPEN.
REORGANTIZE WITH ( COMPRESS_ALL_ROW_GROUPS = ON)
REBUILD PARTITION = ALL
Batch mode & Row Mode
- New in SQL 2012, Query optimizer can use ‘Batch’ mode to process the data instead of processing the data in ROW mode.
- Optimizer would not use always use Batch mode in the conlumnstore index . If the query does not need to be Parallelism ( MAXDOP >= 2) , it wont use.
- Only conlumnstore index can have batch mode processing.
- operator that support the Batch mode
I am going to use the same query to see what performance difference we can get from Batch mode.
To force the ROW mode, I will use MAXDOP 1 in the query hint.
-- Limit MADOP in t3 for ROW Mode
DBCC DROPCLEANBUFFERS
--- Batch Mode
select ProductKey,count(CustomerKey),SUM([TotalProductCost]) from t3
group by ProductKey
go
--- Row Mode
select ProductKey,count(CustomerKey),SUM([TotalProductCost]) from t3
group by ProductKey
option(maxdop 1)
Segment elimination
Another performance gain we can get from column store index is through segment elimination.
Data is distribute through segment and each segment has MIN and MAX value. Query optimizer can skip the non need segment to reduce the IO. By using DBCC TRACEON(646, -1) we can see if segment elimination is happing during the query processing.
--- Segment elimitation
SELECT a.partition_id,a.hobt_id,column_id,segment_id,row_count,on_disk_size,a.min_data_id,
a.max_data_id
FROM sys.column_store_segments a
INNER JOIN sys.partitions b
ON a.partition_id = b.partition_id
INNER JOIN sys.objects c
ON b.object_id = c.object_id
WHERE name like 't2'
order by column_id
GO
DBCC TRACEON(3605, -1);
GO
DBCC TRACEON(646, -1);
GO
select ProductKey,count(CustomerKey),SUM([TotalProductCost]) from t2
group by ProductKey
HAVING ProductKey BETWEEN 500 AND 600
OPTION (RECOMPILE);
go
xp_readerrorlog
go
This is all the row group for the column 1
Base on our query, it will skip the row group 0
Columnstore Metadata
select * from sys.column_store_dictionaries
Select * from sys.column_store_dictionaries;
select * from sys.column_store_segments;
Reference
- http://www.sqlskills.com/blogs/joe/row-and-batch-execution-modes-and-columnstore-indexes/
- http://social.technet.microsoft.com/wiki/contents/articles/4995.sql-server-columnstore-performance-tuning.aspx
- http://sqlbits.com/Downloads/222/Parallel%20Query%20Processing%20in%20SQL%20Server.pptx
- http://phoebix.com/2014/05/01/sql-server-2014-columnstore-indexes-partitioning/
0 comments:
Post a Comment