MSSQL – Lets Compress those large text with loads of spaces in it via UDF

Hi,

When the data grows large, i mean really large one has to come up with ways to bring in techniques where even the slightest can make a difference.

In this post I am going to share a technique where by when one is faced storing the large text data set, say receipts, articles which are long text and have loads of spaces in them.
We can write 2 custom CLR UDF, which can GZip compress them and then store in a table. And when required by the user decompress the string to get the original text back.

And all this task with a goal of saving some space. In my experience leveraging the above technique we were able to save around 35~40 % space.

So not taking much time the code for the same is below:

 [return: SqlFacet(MaxSize = -1)]
        [SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic = true)]
        public static SqlChars Compress(SqlChars cmptext)
        {
            if (cmptext.IsNull)
                return null;

            byte[] buffer = Encoding.UTF8.GetBytes(cmptext.ToSqlString().ToString());
            MemoryStream ms = new MemoryStream();
            using (GZipStream zip = new GZipStream(ms, CompressionMode.Compress, true))
            {
                zip.Write(buffer, 0, buffer.Length);
            }

            ms.Position = 0;
            MemoryStream outStream = new MemoryStream();

            byte[] compressed = new byte[ms.Length];
            ms.Read(compressed, 0, compressed.Length);

            byte[] gzBuffer = new byte[compressed.Length + 4];
            System.Buffer.BlockCopy(compressed, 0, gzBuffer, 4, compressed.Length);
            System.Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, gzBuffer, 0, 4);
            return  new SqlChars(Convert.ToBase64String(gzBuffer).ToCharArray());
        }

        //------------------------------------------------------------------------------------------

        [return: SqlFacet(MaxSize = -1)]
        [SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic = true)]
        public static SqlChars Decompress(SqlChars dcmptext)
        {
            if (dcmptext.IsNull)
                return null;

            byte[] gzBuffer = Convert.FromBase64String(dcmptext.ToSqlString().ToString());
            using (MemoryStream ms = new MemoryStream())
            {
                int msgLength = BitConverter.ToInt32(gzBuffer, 0);
                ms.Write(gzBuffer, 4, gzBuffer.Length - 4);

                byte[] buffer = new byte[msgLength];

                ms.Position = 0;
                using (GZipStream zip = new GZipStream(ms, CompressionMode.Decompress))
                {
                    zip.Read(buffer, 0, buffer.Length);
                }
                return new SqlChars(Encoding.UTF8.GetString(buffer).ToCharArray());
            }

The resultant can be seen as below:

And from the above its visible that the compress has indeed reduced the string length from 138 to 80, so on a large data set it makes a substantial difference.

The compression algorithm has been referenced from the following location:

Gzip-string-and-back

Hope it helps.

Leave a Reply

Your email address will not be published. Required fields are marked *

*