Shredded Storage

Supported Formats

The Cobalt aspects of Shredded Storage still work only with Office XML documents. The storage-to-SQL shredding works with every document type we’ve tested: Office XML documents (2010/2013 format), Office binary documents (pre-2007 format), PDF, JPEG, etc. As discussed later, though, Office XML documents have some benefits over the others. All non-Office XML documents are actually shredded on the SQL server, because it receives the entire binary BLOB file, so for non-Office XML documents Shredded Storage is purely a storage optimization.

For each version of a BLOB saved in SharePoint, only the differential shreds are stored. SharePoint does not touch the existing shreds, but “magically” works out which differential shreds to store. Interestingly, it shreds all documents over the defined size regardless of whether versioning is turned on in the library; if versioning is turned on, it stores deltas for each cumulative version of that library item. This is a huge saving for companies that have lots of document versions in their SharePoint libraries, but there is obviously no benefit in shredding a document if versioning is not enabled. What we have found already is that the deltas save significantly less storage than de-duplication does when RBS is enabled.
One thing to point out, though: if I have 50 copies of the same document across multiple SharePoint sites, SharePoint does not work out differentials across those copies; it only does so at the document (item) scope. So there is no saving in this scenario, either.
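To make the item-scope point concrete, here is a toy model in Python. It is only a sketch under loud assumptions: the fixed 64 KB shred size, the content-hash matching, and the ItemShredStore class are all hypothetical stand-ins, not SharePoint's actual algorithm or API. It shows a small edit writing only one new shred for the same item, while an identical copy stored as a different item saves nothing.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # assumed fixed shred size for this toy model


def split_into_shreds(blob: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut a BLOB into fixed-size pieces (the last piece may be shorter)."""
    return [blob[i:i + chunk_size] for i in range(0, len(blob), chunk_size)]


class ItemShredStore:
    """Hypothetical store for ONE document (item); nothing is shared across items."""

    def __init__(self) -> None:
        self.shreds: dict[str, bytes] = {}   # digest -> shred bytes
        self.versions: list[list[str]] = []  # per version: ordered shred digests

    def save_version(self, blob: bytes) -> int:
        """Store a new version; return how many new shreds had to be written."""
        written = 0
        manifest = []
        for shred in split_into_shreds(blob):
            digest = hashlib.sha256(shred).hexdigest()
            if digest not in self.shreds:  # only shreds this item has not seen
                self.shreds[digest] = shred
                written += 1
            manifest.append(digest)
        self.versions.append(manifest)
        return written

    def fetch_version(self, index: int) -> bytes:
        """Rebuild a version by joining its shreds in order."""
        return b"".join(self.shreds[d] for d in self.versions[index])


# 256 KB of non-repeating content, so all four 64 KB shreds are distinct.
v1 = b"".join(i.to_bytes(4, "big") for i in range(65536))
# Same length, four bytes overwritten inside the second shred.
v2 = v1[:70000] + b"EDIT" + v1[70004:]

doc = ItemShredStore()
print(doc.save_version(v1))   # 4 shreds written for the first version
print(doc.save_version(v2))   # 1 shred written: only the edited one

copy = ItemShredStore()       # the same file saved as a different item
print(copy.save_version(v1))  # 4 again: no saving across items
```

Note that this toy only handles in-place edits well; inserting bytes would shift every later fixed-size shred, which is one reason a real implementation is likely to be more involved.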
One nice feature that was not included in prior versions of SharePoint: if I edit only the metadata of the list item within SharePoint, without editing the attached Office XML document, it doesn’t create a new version of the BLOB in the SQL table. Note that this isn’t the case with non-Office XML documents. This will result in tremendous storage capacity savings for some customers.
Shred Size

It appears to shred any BLOB and, to date, our research has shown that the shred size is inconsistent and varies depending on the file format. For example, a 156K JPEG file had 6 shreds in version 0.1, while a 1Mb .docx had 12 shreds, as shown in the screenshot below. Please note that in this example the sum total of the shreds is in fact LARGER than the original 1Mb document, so shredding here is inefficient as a storage optimization.
[Screenshot: Shredded Storage.png]

There are some variables in the API that can be set at content database level; the default is 64320Kb for the maximum size of the shred. If the file is less than the maximum size set, then it simply won’t shred the file at all.
Existing Data
A key issue to point out is that if you upgrade your existing SharePoint 2010 Content Databases to SharePoint 2013, they will not benefit from Shredded Storage until a new document version is created.
Turning Off Shredded Storage

Shredded Storage can be turned off at the web application, site collection, and site (web) level; the default setting is AlwaysDirectToShredded. If you turn off Shredded Storage, SharePoint goes back to acting like it did in SharePoint 2010… Cobalt v1 style. This means that you have potentially higher file I/O between the WFE and SQL Server, and no storage savings on deltas of versioned files.
What happens when you enable RBS?
When you turn on RBS with a content database that has Shredded Storage enabled, the real-time RBS provider receives each shredded BLOB individually. These shreds are extremely small and, as our RBS research in 2010 proved in our white paper, storing BLOBs smaller than 1Mb outside of the SQL database is, in general, inefficient. This is why we recommend setting up RBS rules that leave files less than 1Mb in the content database.
By adding the RBS Provider into the mix, when I’m fetching the 69th version of a document, it’s going to get REAL chatty with the RBS provider fetching all the individual shreds. The shred size can potentially be changed up to 1Mb to be more efficient from an RBS perspective, but until we get more data from our labs we have no concrete guidance here yet. Some preliminary performance stats are available below.
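A rough back-of-the-envelope sketch of that chattiness, in Python. It assumes fixed-size shreds (our research above found the real sizes vary), one RBS provider call per shred, and that a full fetch needs every shred of the version; shreds_for is a made-up helper, not a SharePoint API.

```python
import math

KB = 1024
MB = 1024 * KB


def shreds_for(file_bytes: int, chunk_bytes: int) -> int:
    """Number of fixed-size shreds needed to hold a file of the given size."""
    return max(1, math.ceil(file_bytes / chunk_bytes))


document = 10 * MB  # an example 10Mb document

for label, chunk in (("64Kb", 64 * KB), ("1Mb", 1 * MB)):
    # Under the one-call-per-shred assumption this is also the number of
    # provider round trips needed to fetch one full version.
    print(label, shreds_for(document, chunk))
# 64Kb 160
# 1Mb 10
```

Under these assumptions, moving from 64Kb to 1Mb shreds cuts the provider calls for this file from 160 to 10, which is the intuition behind raising the shred size when RBS is in play.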

Fetch Performance

From a performance perspective, suppose I save a 10Mb document 100 times and store each version, randomly changing a few paragraphs all over the document. To fetch version #69, or even the latest version, SharePoint must merge all of the relevant shreds, and do so entirely in the SQL software layer. This concerns me A LOT as it will be a huge performance overhead compared with simply fetching the entire BLOB version like in SharePoint 2010!
The table below illustrates the time it took to perform a full SP-Export on the entire site collection, based on the different configurations with exactly the same content data set:
Shredded Storage        DB size (Mb)    RBS size (Gb)    Export time (secs)
Default – 64Kb chunk    6000.31         Off              2471
Default – 64Kb chunk    103.25          6.35             3502
1Mb chunk               6749.30         Off              2005
1Mb chunk               95.19           6.25             3309
1Gb chunk               13349.81        Off              1745
1Gb chunk               74.00           12.40            2096
From this you can see that there is a 40% increase in the time it takes to perform an export with Shredded Storage switched on and, in this content sample set, a 75% saving in storage size. This will differ a lot depending on the type of content you are versioning. You’ll note that with both Shredded Storage and Remote BLOB Storage on, there is a 58% increase in time taken. More notably, there is only a 22% increase if only Remote BLOB Storage is enabled with de-duplication switched on, which dramatically reduces the externalized BLOBs.
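For anyone who wants to redo the export-time arithmetic, here is a small Python sketch using the times from the table above. Which rows are compared for each percentage is my assumption: it treats the 1Gb chunk rows as "Shredded Storage effectively off" (files smaller than the chunk size are not shredded) and pairs the rows as commented in the code.

```python
# Export times in seconds, copied from the table: (chunk size, RBS) -> secs
EXPORT_SECS = {
    ("64Kb", "rbs_off"): 2471,
    ("64Kb", "rbs_on"): 3502,
    ("1Mb", "rbs_off"): 2005,
    ("1Mb", "rbs_on"): 3309,
    ("1Gb", "rbs_off"): 1745,
    ("1Gb", "rbs_on"): 2096,
}


def pct_increase(baseline: float, measured: float) -> float:
    """Percentage by which `measured` exceeds `baseline`."""
    return (measured - baseline) / baseline * 100.0


# Assumed pairings (not stated in the text):
comparisons = {
    # default shredding vs. effectively no shredding, RBS off in both
    "shredding on": (("1Gb", "rbs_off"), ("64Kb", "rbs_off")),
    # 1Mb shreds with RBS vs. effectively no shredding with RBS
    "shredding + RBS": (("1Gb", "rbs_on"), ("1Mb", "rbs_on")),
    # RBS alone vs. neither feature
    "RBS only": (("1Gb", "rbs_off"), ("1Gb", "rbs_on")),
}

for name, (base_key, test_key) in comparisons.items():
    change = pct_increase(EXPORT_SECS[base_key], EXPORT_SECS[test_key])
    print(f"{name}: {change:.1f}% slower")
# shredding on: 41.6% slower
# shredding + RBS: 57.9% slower
# RBS only: 20.1% slower
```

With these assumed pairings the results land close to, though not exactly on, the rounded percentages quoted above; a different choice of baseline rows would give different figures.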
These initial performance tests were done on virtualized hardware based on recommendations on TechNet and NetApp infrastructure for the externalized content.

