Create the COLUMN store database
db2 terminate
db2set db2_workload=ANALYTICS
db2 force applications all
db2stop force
# sleep 5
db2start
db2 create database coldb
Check the DB/DBM configuration
#!/bin/bash
DMPARMS=INTRA_PARALLEL
DBPARAMS="Database page size|dft_table_org|dft_degree|dft_extent_sz|catalogcache_sz|sortheap|util_heap_sz|auto_reorg|sheapthres_shr"
echo "========================================================================="
echo "DBM CFG parameters affected by DB2_WORKLOAD=ANALYTICS for COLDB"
echo "========================================================================="
db2 get dbm cfg |grep -Ei --color "$DMPARMS"
echo "========================================================================="
echo "DB CFG parameters affected by DB2_WORKLOAD=ANALYTICS for COLDB"
echo "========================================================================="
db2 get db cfg for COLDB | grep -Ei --color "$DBPARAMS"
echo "========================================================================="
db2pd –d coldb –workclasssets
Create the column store table and row store table
The table structure is base on TPC-H’s customer table.
CREATE TABLE CUSTOMER_C ( C_CUSTKEY INTEGER NOT NULL,
C_NAME VARCHAR(25) NOT NULL,
C_ADDRESS VARCHAR(40) NOT NULL,
C_NATIONKEY INTEGER NOT NULL,
C_PHONE CHAR(15) NOT NULL,
C_ACCTBAL DECIMAL(15,2) NOT NULL,
C_MKTSEGMENT CHAR(10) NOT NULL,
C_COMMENT VARCHAR(117) NOT NULL);
CREATE TABLE CUSTOMER_R ( C_CUSTKEY INTEGER NOT NULL,
C_NAME VARCHAR(25) NOT NULL,
C_ADDRESS VARCHAR(40) NOT NULL,
C_NATIONKEY INTEGER NOT NULL,
C_PHONE CHAR(15) NOT NULL,
C_ACCTBAL DECIMAL(15,2) NOT NULL,
C_MKTSEGMENT CHAR(10) NOT NULL,
C_COMMENT VARCHAR(117) NOT NULL) organize by row
tableorg column in the SYSCAT.TABLES can show whether the table is column store or row store.
DB2 would create the synopsis table for each Column store base table. Since there is no data in the column store table, so there is no data in the synopsis table.
create 2GB test data for customer table
dbgen.exe -T c -s 40 –vf
Above command is for generating the customer table data only.
-s 40 –> means the overall db sizes is 40GB for all tables. However since the customer table is small ( about 25MB per 1GB database), therefore –s 40 would create about 1GB customer table.
Load the data to the table
Load to the column store would take longer than the row base table because the load would need to analysis the table and update the synopsis table.
Load the data to the column store table
LOAD FROM "customer.tbl" OF DEL MODIFIED BY ANYORDER COLDEL| METHOD P (1, 2, 3, 4, 5, 6, 7, 8) MESSAGES "/tmp/load_customer_C.log" REPLACE INTO CUSTOMER_C (C_CUSTKEY, C_NAME, C_ADDRESS, C_NATIONKEY, C_PHONE, C_ACCTBAL, C_MKTSEGMENT, C_COMMENT) STATISTICS USE PROFILE NONRECOVERABLE INDEXING MODE REBUILD SET INTEGRITY PENDING CASCADE IMMEDIATE LOCK with FORCE
Load the data to the row store table
LOAD FROM "customer.tbl" OF DEL MODIFIED BY ANYORDER COLDEL| METHOD P (1, 2, 3, 4, 5, 6, 7, 8) MESSAGES "/tmp/load_customer.log" REPLACE INTO CUSTOMER_R (C_CUSTKEY, C_NAME, C_ADDRESS, C_NATIONKEY, C_PHONE, C_ACCTBAL, C_MKTSEGMENT, C_COMMENT) STATISTICS USE PROFILE NONRECOVERABLE INDEXING MODE REBUILD SET INTEGRITY PENDING CASCADE IMMEDIATE LOCK with FORCE
Below screenshot show the analysis phase of the load to the column store table
Compare the storage between row base table and column base table
From below screenshot, we can see the column base store use less page ( 26777 ) than the row base table ( 32789)
Compare the store
select VARCHAR(tabschema, 10) SCHEMA, VARCHAR(tabname, 10) TABLE,
data_object_p_size DATA_SIZE,
index_object_p_size INDEX_SIZE,
data_object_p_size+index_object_p_size TABLE_SIZE,
sum(data_object_p_size+index_object_p_size) over () ROLL_SUM
from sysibmadm.admintabinfo
where tabschema = 'DB2INST1'

Page compression ratio
SELECT VARCHAR(TABNAME,20) TABNAME, TABLEORG,CARD,
PCTPAGESSAVED,
1.0 / (1.0 - PCTPAGESSAVED/100.0) COMPRESS_RATIO
FROM SYSCAT.TABLES
WHERE TABNAME = 'CUSTOMER_C'
AND TABLEORG='C';
Encoding ratio
SELECT VARCHAR(TABNAME,20) TABNAME,
VARCHAR(COLNAME,20) COLNAME,
PCTENCODED,
CASE WHEN PCTENCODED < 30 THEN 'POOR'
WHEN PCTENCODED BETWEEN 31 AND 60 THEN 'GOOD'
WHEN PCTENCODED BETWEEN 61 AND 90 THEN 'V GOOD'
WHEN PCTENCODED > 90 THEN 'EXCELLENT'
END
FROM SYSCAT.COLUMNS
WHERE TABSCHEMA='DB2INST1'
ORDER BY TABNAME,COLNAME
Convert row base table to column store table
Due to the naturel of the data, the column base table does not save much space compare to the row base table. In the real world, we should get better compression rate.
db2convert -d coldb -z db2inst1 -t CUSTOMER_R
Great post.Thanks for one marvelous posting! I enjoyed reading it;The information was very useful.Keep the good work going on!!
ReplyDeleteETL Testing training in chennai| SAP MM training in chennai | Informatica training in chennai
There is fake rolex watches no name in the watch industry replica Rolex Submariner watches that can compare with the status of Rolex. Literally, Rolex is synonymous with quality, and the well-known name has surpassed the watchmaking industry. It is not only an impeccable leading manufacturer of subdivided products, but also a model of international popular culture.
ReplyDelete