Thursday, September 10, 2015

Compress LOB data in the database

SQL Server stores the data in regular B-Tree indexes in three different sets of the data pages called allocation units. The main data row structure and fixed-length data are stored in IN-ROW data pages. Variable-length data greater than 8,000 bytes in size is stored in LOB (large object) pages. Such data includes (max) columns, XML, CLR UDT and a few other data types. Finally, variable-length data, which does not exceed 8,000 bytes, is stored either in IN-ROW data pages when it fits into the page, or in ROW-OVERFLOW data pages.

Enterprise Edition of SQL Server allows you to reduce the size of the data by implementing data compression. However, data compression is applied to IN-ROW data only and it does not compress ROW-OVERFLOW and LOB data. Any large objects that do not fit into IN-ROW data pages remain uncompressed.

The approach to address such an overhead is manually compress LOB data in the code. You can create the methods to compress and decompress data utilizing one of the classes from System.IO.Compression namespace, for example using GZipStream or DeflateStream classes. Moreover, that method could be implemented in CLR stored procedures and used directly in T-SQL code. The drawback to this approach, compression is CPU intensive. It is better to run such code on the client whenever it is possible. The second important consideration is performance. Obviously, decompression adds an overhead, which you would like to avoid on the large scope.

Compressing LOB data in the database could help you to significantly reduce the database size in the large number of cases. However, it adds an overhead of compressing and decompressing data. In some cases, such overhead would be easily offset by the smaller data size, less I/O and buffer pool usage.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.