Wednesday, October 8, 2014

0 SQL Server 2012/2014 Column Store Index

Here are my summary notes for what I learn about Column Store Index

Benefit :

  • Columnar storage: Reduce IO and Memory usage
  • Batch mode processing
  • Segment elimination  : segment can be bypassed base on the predicates

New 2014

  • Updateable cluster index
  • Further reduce the storage need

Terminology

  • ROW GROUP: Each table would be split into row group. currently 1 million record per row group.
  • Segment : Each Column in the row group will be split into segment with MAX/MIN value.
  • Delta Store: New Add data would be add as Delta store before merge into the ROW GROUP
  • Delete Bitmap: Delete record would be mark as delete in the delete bitmap.Only when the table get reorg, the delete bitmap would be apply to the column segment.

Testing 

Below test, I am going to create 4 tables:

  • t1: normal cluster index
  • t2: Non cluster columnstore index
  • t3: Cluster clumnstore index
  • t4: compress columnstore index

Here are the test query and its result query plan.

image

As you can see, the Columnstore index dramatically increase the performance and CCI also use less space then the regular table and NCCI.

image

image
Create database ColumnStoreDB;
go
use ColumnStoreDB;
Go
--- Create 4 tables 
/**
   t1: normal cluster index
   t2: Non cluster columnstore index
   t3: Cluster clumnstore index
   t4: compress columnstore index
**/
drop table t1;
go
drop table t2;
go
drop table t3;
go                        
drop table t4;
--go
SELECT [ProductKey] ,[CustomerKey] ,[CurrencyKey], [TotalProductCost] ,[OrderDate] ,[DueDate] INTO t1
FROM [AdventureWorksDW2014].[dbo].[FactInternetSales]
GO                                                          
INSERT INTO [dbo].t1 ( [ProductKey] ,[CustomerKey] ,[CurrencyKey], [TotalProductCost] ,[OrderDate],[DueDate] )
SELECT      [ProductKey] ,[CustomerKey] ,[CurrencyKey], [TotalProductCost] ,[OrderDate]   ,[DueDate]
FROM [AdventureWorksDW2014].[dbo].[FactInternetSales]
GO 100
 
 
SELECT * INTO t2
FROM t1
GO
SELECT *  INTO t3
FROM t1
GO
SELECT *   INTO t4
FROM t1
GO
-- Create cluster index    t1 and t2
create clustered index  IX_C_ProductKey on t1 ( [ProductKey] )
go
create clustered index  IX_C_ProductKey on t2 ( [ProductKey] )
go
 
-- Create nonclustered ColumnStore index  t2
CREATE NONCLUSTERED COLUMNSTORE INDEX IX_NC_ProductKey 
ON t2 ([ProductKey] ,[CustomerKey] ,[CurrencyKey], [TotalProductCost] ,[OrderDate] ,[DueDate] )
GO
-- ERROR
-- Msg 35330, Level 15, State 1, Line 25
-- INSERT statement failed because data cannot be updated in a table that has a nonclustered columnstore index. Consider disabling the columnstore index before issuing the INSERT statement, and then rebuilding the columnstore index after INSERT has completed.
 
 
-- Create clustered index  t3
CREATE CLUSTERED COLUMNSTORE INDEX IX_C_ProductKey
ON t3
GO
-- Create clustered index with compression t4 
CREATE CLUSTERED COLUMNSTORE INDEX IX_C_ProductKey_compress
ON t4
WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE) 
GO
-- Check Index
SELECT ind.name, obj.name, ind.type_desc
FROM sys.indexes ind
INNER JOIN sys.objects obj
ON ind.object_id = obj.object_id
WHERE obj.name like 't%'
ORDER BY 2
GO
---    check the table size
sp_spaceused '[dbo].[t1]';
go
sp_spaceused '[dbo].[t2]';
go
sp_spaceused '[dbo].[t3]';
go
sp_spaceused '[dbo].[t4]';
go
-- check performance
DBCC DROPCLEANBUFFERS 
SET STATISTICS IO ON
SET STATISTICS TIME ON
--select count(*) from t1
--go
--select count(*) from t2
--go
--select count(*) from t3
--go
--select count(*) from t4
--go
select ProductKey,count(CustomerKey),SUM([TotalProductCost]) from t1
group by ProductKey
go
select ProductKey,count(CustomerKey),SUM([TotalProductCost]) from t2
group by ProductKey
go
select ProductKey,count(CustomerKey),SUM([TotalProductCost]) from t3
group by ProductKey
go
select ProductKey,count(CustomerKey),SUM([TotalProductCost]) from t4
group by ProductKey
go

Non-Cluster Column Store index: NCCI

  • Introduce in SQL 2012
  • Read only, non updateable .
  • require separate heap and Cluster index before create it. 

We can use IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX  to ignore the NCCI. There is no correspond query hint exist in the CCI.

select ProductKey,count(CustomerKey),SUM([TotalProductCost]) from t2
group by ProductKey
option(IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX ) ;
go

image

If we try to insert the data into the NCCI table, we will get below error.

Msg 35330, Level 15, State 1, Line 56
INSERT statement failed because data cannot be updated in a table that has a nonclustered columnstore index. Consider disabling the columnstore index before issuing the INSERT statement, and then rebuilding the columnstore index after INSERT has completed.

image

Updateable Cluster Column Store Index: CCI

New in 2014, Cluster Column Store index ( CCI ) is updateable. This is achieve by adding the row group into the data. New insert row will be place in the new row group until reorg .

sys.column_store_row_group is new system table which we can use to view the status of the row group.

The entire table is going to converting to one column store index and we don’t need to specify the column name when creating the index.

--- CCI   update 
Insert into t3 values ( 1,1,1,1,getdate(),getdate())
select * from sys.column_store_row_groups where object_id= object_id('t3');
 
--- regular reorg would not reorg the CCI 
ALTER INDEX IX_C_ProductKey ON t3 REORGANIZE;
--- Change the Sate from OPEN to COMPRESSED
ALTER INDEX IX_C_ProductKey ON t3 REORGANIZE
WITH (COMPRESS_ALL_ROW_GROUPS = ON);
--- Merge the rowgroup  
ALTER INDEX IX_C_ProductKey ON t3  REBUILD PARTITION = ALL;

After insert the data, we can see new row group being added with sate OPEN.

image

REORGANTIZE WITH ( COMPRESS_ALL_ROW_GROUPS = ON)

image

REBUILD PARTITION = ALL

image

Batch mode & Row Mode

  • New in SQL 2012, Query optimizer can use ‘Batch’ mode to process the data instead of processing the data in ROW mode.
  • Optimizer  would not use always use Batch mode in the conlumnstore index . If the query does not need to be Parallelism ( MAXDOP >= 2) , it wont use.
  • Only conlumnstore index can have batch mode processing.
  • operator that support the Batch mode
    • Columnstore index Scanimage
    • Filter image
    • Compute Scalar image
    • Hash operations (aggregate, join, batch hash table buildimage

I am going to use the same query to see what performance difference we can get from Batch mode.

To force the ROW mode, I will use MAXDOP 1 in the query hint. 

-- Limit MADOP in t3 for ROW Mode
DBCC DROPCLEANBUFFERS 
--- Batch Mode
select ProductKey,count(CustomerKey),SUM([TotalProductCost]) from t3
group by ProductKey
go
--- Row Mode   
select ProductKey,count(CustomerKey),SUM([TotalProductCost]) from t3
group by ProductKey
option(maxdop 1)

image

image

image

image

Segment elimination

Another performance gain we can get from column store index is through segment elimination.

Data is distribute through segment and each segment has MIN and MAX value. Query optimizer can skip the non need segment to reduce the IO. By using  DBCC TRACEON(646, -1) we can see if segment elimination is happing during the query processing.

--- Segment elimitation 
SELECT a.partition_id,a.hobt_id,column_id,segment_id,row_count,on_disk_size,a.min_data_id,
       a.max_data_id
FROM sys.column_store_segments a
INNER JOIN sys.partitions b
ON a.partition_id = b.partition_id
INNER JOIN sys.objects c
ON b.object_id = c.object_id
WHERE name like 't2'
order by column_id 
GO
DBCC TRACEON(3605, -1); 
GO 
DBCC TRACEON(646, -1); 
GO 
select ProductKey,count(CustomerKey),SUM([TotalProductCost]) from t2
group by ProductKey
HAVING  ProductKey BETWEEN 500 AND 600
OPTION (RECOMPILE); 
go
xp_readerrorlog
go

This is all the row group for the column 1

image

Base on our query, it will skip the row group 0

image

Columnstore Metadata

select * from sys.column_store_dictionaries  
Select * from sys.column_store_dictionaries;
select * from sys.column_store_segments;

Reference

0 comments:

Post a Comment

 

SQL Panda Copyright © 2011 - |- Template created by O Pregador - |- Powered by Blogger Templates