Monday, October 13, 2014

0 SQL2014: In Memory OLTP (2) Index

All Index are Non-Clustered . Must be NOT NULL. NONCLUSTERED keyword is optional .

Alter table is not allowed. Index needs to be defined at the table creation time.  There are two type of Index in the memory optimized table.

Hash Index:

  • single item lookup
  • Static size determined at creation time.
  • slow performance in the LIKE operation
  • Bucket count : recommend value to 1.5 ~ 2X of the distant value.  

Range Index

  • Range Index –> similar to the B-tree index
  • Good for range queries ( <, >, IN)
  • Size will grow.
  • slow performance on single item look up

Statistic is not being auto created on the memory optimized table. We have to manually create it.

PREREQUISITES

  • Below test is base on the TPC-H database.I user HammerDB to create the TPCH database and test dataset with scalar factor = 1 . Refer here
  • Setup the test tables:
CREATE TABLE [dbo].[orders_disk](
    [o_orderdate] [date] NULL,
    [o_orderkey] [int] NOT NULL,
    [o_custkey] [int] NOT NULL,
    [o_orderpriority] [char](15) NULL,
    [o_shippriority] [int] NULL,
    [o_clerk] [char](15) NULL,
    [o_orderstatus] [char](1) NULL,
    [o_totalprice] [money] NULL,
    [o_comment] [varchar](79) NULL,
    --INDEX ix_orderkey_disk NONCLUSTERED (o_orderkey ASC),
    INDEX ix_orderkey_disk CLUSTERED (o_orderkey ASC),
    INDEX ix_orderdate_disk NONCLUSTERED ([o_orderdate] ASC)
) ON [PRIMARY]
 
 
CREATE TABLE [dbo].[orders_mem](
    [o_orderdate] [date] NOT NULL,
    [o_orderkey] [int] NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT=3000000),
    [o_custkey] [int] NOT NULL,
    [o_orderpriority] [char](15) NULL,
    [o_shippriority] [int] NULL,
    [o_clerk] [char](15) NULL,
    [o_orderstatus] [char](1) NULL,
    [o_totalprice] [money] NULL,
    [o_comment] [varchar](79) NULL
    INDEX ix_orderdate (o_orderdate ASC)  -- all index are NONCLUSTERED
) WITH (MEMORY_OPTIMIZED=ON, DURABILITY = SCHEMA_AND_DATA)  -- SCHEMA_ONLY
GO
INSERT INTO orders_mem SELECT * FROM orders
GO
INSERT INTO orders_disk SELECT * FROM orders
GO

Hash Index

For single value lookup, memory optimized table perform better than disk table.

CHECKPOINT
GO
DBCC DROPCLEANBUFFERS
GO
SET STATISTICS IO ON
SET STATISTICS TIME ON
GO
select * from orders_disk where o_orderkey = 3953057
go
select * from orders_mem o where o_orderkey=3953057

image

image

In the storage type, it indicates, it is memoryOptimized

image

Statistic IO only has IO information for the disk base table.

image

However, hash index is not good for the range type of query. It is slower than the disk base table.

Disk base table: 22secs

Memory base table: 24 secs

image

Even we force the index on the memory optimized table. It is still take 22 secs  and it is Index Scan not the fast Index Seek.

image

Range Index

Another Index is range index.

Below test shows the range index is covering index therefore it does not need to do the key lookup like non cluster index in the row base table.

select * from orders_disk where o_orderDate = '1997-01-01'
go
select * from orders_mem where o_orderDate = '1997-01-01'
go

image

For range query, it perform better than row base table

select * from orders_disk where o_orderDate between '1997-01-01' and '1997-12-01' 
go
select * from orders_mem where o_orderDate between '1997-01-01' and '1997-12-01' 
go
image

Oder does matter more in the index in memory optimized table. If we want to retrieve the order that is different than index . SQL server needs to do the expensive index scan instead of index seek. 

Because the index on the orderDate column is created base on ASC order. Therefore when we want to get the DESC order result, it needs to sort again.

select * from orders_mem where o_orderDate between '1997-01-01' and '1997-12-01'  order by o_orderDate 
go
select * from orders_mem where o_orderDate between '1997-01-01' and '1997-12-01'  order by o_orderDate DESC
go 

image

If we run the same query on the row bases table, it does not matter .

image

Index on the character column

When create the index on the character column, we need to specify the collation otherwise, we will get below error:

Msg 12328, Level 16, State 102, Line 119
Indexes on character columns that do not use a *_BIN2 collation are not supported with indexes on memory optimized tables.
Msg 1750, Level 16, State 0, Line 119
Could not create constraint or index. See previous errors.

image

CREATE TABLE [dbo].[customer_mem2](
    [c_custkey] [int] NOT NULL,
    [c_firstname] [varchar](25) COLLATE Latin1_General_100_BIN2 NOT NULL,
    [c_lastname] [varchar](25) COLLATE Latin1_General_100_BIN2 NOT NULL,
    [c_age] [int] NOT NULL,
    [c_interest] [varchar](40) COLLATE Latin1_General_100_BIN2  NOT NULL
    ,PRIMARY KEY NONCLUSTERED HASH ([c_firstname],[c_lastname]) WITH ( BUCKET_COUNT=100)
    ,INDEX ix_customer_age NONCLUSTERED ([c_age],[c_interest] )
) WITH (MEMORY_OPTIMIZED=ON) 
 
GO
Insert into [customer_mem2] values (1,'summer','chen',5,'Elsa')
Insert into [customer_mem2] values (2,'sunny','chen',2,'Elmo')
Insert into [customer_mem2] values (3,'skye','chen',0,'UNKNOWN')

The column will become case sensitive and the order in the WHERE clause does become matter.

Hash Index: For composite key, the hash is created on the combination of the both value. Therefore, we for the single value lookup, it will result in the index scan .

--- index scan becasue hash key is base on c_firstname and c_lastname
select * from [customer_mem2] where [c_firstname] = 'skye'
 
--- index seek becasue hash key is base on c_firstname and c_lastname
select * from [customer_mem2] where [c_firstname] = 'skye' and c_lastname = 'chen'
select * from [customer_mem2] where  c_lastname = 'chen' and [c_firstname] = 'skye'

image

For the range index, it does not matter.

image

0 comments:

Post a Comment

 

SQL Panda Copyright © 2011 - |- Template created by O Pregador - |- Powered by Blogger Templates