Hi,
When data grows large, really large, one has to look for techniques where even the slightest saving makes a difference.
In this post I am going to share a technique for storing large text data sets, such as receipts or articles, which are long strings with plenty of whitespace in them.
We can write two custom CLR UDFs: one that GZip-compresses the text before storing it in a table, and one that decompresses the stored string to give the user the original text back.
All of this with the goal of saving some space. In my experience, this technique saved around 35-40% of storage.
So, without taking much more time, here is the code:
using System;
using System.Data.SqlTypes;
using System.IO;
using System.IO.Compression;
using System.Text;
using Microsoft.SqlServer.Server;

[return: SqlFacet(MaxSize = -1)]
[SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic = true)]
public static SqlChars Compress(SqlChars cmptext)
{
    if (cmptext.IsNull)
        return SqlChars.Null;

    byte[] buffer = Encoding.UTF8.GetBytes(cmptext.ToSqlString().ToString());

    MemoryStream ms = new MemoryStream();
    // leaveOpen: true so ms survives disposal of the GZipStream
    using (GZipStream zip = new GZipStream(ms, CompressionMode.Compress, true))
    {
        zip.Write(buffer, 0, buffer.Length);
    }
    byte[] compressed = ms.ToArray();

    // Prefix the compressed bytes with the original length (4 bytes),
    // then Base64-encode so the result can be stored as text.
    byte[] gzBuffer = new byte[compressed.Length + 4];
    Buffer.BlockCopy(compressed, 0, gzBuffer, 4, compressed.Length);
    Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, gzBuffer, 0, 4);

    return new SqlChars(Convert.ToBase64String(gzBuffer).ToCharArray());
}

//------------------------------------------------------------------------------------------

[return: SqlFacet(MaxSize = -1)]
[SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic = true)]
public static SqlChars Decompress(SqlChars dcmptext)
{
    if (dcmptext.IsNull)
        return SqlChars.Null;

    byte[] gzBuffer = Convert.FromBase64String(dcmptext.ToSqlString().ToString());

    using (MemoryStream ms = new MemoryStream())
    {
        // The first 4 bytes hold the original (uncompressed) length.
        int msgLength = BitConverter.ToInt32(gzBuffer, 0);
        ms.Write(gzBuffer, 4, gzBuffer.Length - 4);
        ms.Position = 0;

        byte[] buffer = new byte[msgLength];
        using (GZipStream zip = new GZipStream(ms, CompressionMode.Decompress))
        {
            // GZipStream.Read may return fewer bytes than requested, so loop.
            int total = 0;
            while (total < buffer.Length)
            {
                int read = zip.Read(buffer, total, buffer.Length - total);
                if (read == 0) break;
                total += read;
            }
        }
        return new SqlChars(Encoding.UTF8.GetString(buffer).ToCharArray());
    }
}
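Once the assembly is built and deployed to the server, the two functions can be registered and used from T-SQL roughly as below. Note that the assembly name, class name, table, and column here are placeholders of my own choosing; adjust them to match your project:

```sql
-- Register the assembly and bind the two UDFs (names are illustrative)
CREATE ASSEMBLY CompressionUdfs FROM 'C:\clr\CompressionUdfs.dll'
    WITH PERMISSION_SET = SAFE;
GO
CREATE FUNCTION dbo.Compress (@input NVARCHAR(MAX)) RETURNS NVARCHAR(MAX)
    AS EXTERNAL NAME CompressionUdfs.UserDefinedFunctions.Compress;
GO
CREATE FUNCTION dbo.Decompress (@input NVARCHAR(MAX)) RETURNS NVARCHAR(MAX)
    AS EXTERNAL NAME CompressionUdfs.UserDefinedFunctions.Decompress;
GO

-- Store compressed, read back decompressed
INSERT INTO dbo.Receipts (ReceiptText) VALUES (dbo.Compress(@LongText));
SELECT dbo.Decompress(ReceiptText) AS OriginalText FROM dbo.Receipts;
```

Remember that CLR integration must be enabled on the instance (`sp_configure 'clr enabled', 1`) before the functions can run.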
The result can be seen below:
From the above it is visible that compression has indeed reduced the string length from 138 to 80 characters, so on a large data set it makes a substantial difference.
The compression algorithm was adapted from the following location:
Hope it helps.