One of the best features of Content Manager OnDemand (CMOD) is one of the ways it compresses data: resource deduplication. During the indexing process for AFP, PDF, and XML data formats, OnDemand separates data from ‘resources’, things like fonts, graphics, lines, gradient fills, and the fine print that is the same on every page. These resources are written into a ‘bundle’, compressed, and then compared to CMOD’s library of existing resources. If the bundle is identical to a previously stored bundle of resources, CMOD discards it and simply links the new load to the previously stored copy. The number of times Content Manager OnDemand tries to find a duplicate set of resources is controlled by the “Resource Compares” parameter in the Application Definition, under the Load Information tab.
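To make the effect of that parameter concrete, here is a minimal Python sketch of a bounded duplicate search. It is a toy model, not CMOD's actual implementation: the function name `store_resources`, the use of SHA-256 digests, and the assumption that each new bundle is compared against only the most recently stored bundles are all illustrative choices.

```python
import hashlib


def store_resources(bundles, resource_compares):
    """Toy model of bounded resource deduplication (NOT CMOD's real code).

    Each incoming bundle (bytes) is compared against at most
    `resource_compares` of the most recently stored bundles. On a match,
    the load is linked to the existing copy; otherwise a new copy is stored.
    Returns (stored_digests, links), where links[i] is the index of the
    stored bundle that load i points to.
    """
    stored = []  # digests of stored bundles, oldest first
    links = []   # for each load, the index of the bundle it uses
    for bundle in bundles:
        digest = hashlib.sha256(bundle).hexdigest()
        # Only look back over the last `resource_compares` stored bundles.
        oldest_checked = max(0, len(stored) - resource_compares)
        match = None
        for idx in range(len(stored) - 1, oldest_checked - 1, -1):
            if stored[idx] == digest:
                match = idx
                break
        if match is None:
            stored.append(digest)      # no duplicate found in the window
            match = len(stored) - 1
        links.append(match)
    return stored, links


# Three distinct resource sets arriving in rotation, twelve loads in total.
loads = [b"form-A", b"form-B", b"form-C"] * 4
low, _ = store_resources(loads, resource_compares=1)
high, _ = store_resources(loads, resource_compares=3)
print(len(low), len(high))
```

In this model, a comparison window smaller than the number of distinct resource sets in rotation never finds a match, so every load stores its own copy. Widening the window to cover the rotation collapses the twelve loads to three stored bundles.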
At one customer site, a primary use for OnDemand was as a glorified print spool. A customer service representative would request a form, and the mainframe application would create it with most of the fields pre-filled, so that all a customer had to do was enter a few pieces of information and sign on the dotted line. This mainframe application streamlined the process of filling out these forms, but the variety of documents it produced and an artificially low ‘Resource Compares’ value conspired to create several months’ worth of problems.
Several years and billions of documents later, the IBM CMOD server hardware was finally scheduled to be decommissioned. It was easily one of the oldest servers I’d ever worked on. The job was to extract the data from the old system and reload it onto a new system using RAPTOR Extractor, a tool purpose-built for extracting huge volumes of data from CMOD quickly and easily.
A few weeks into the migration, the process slowed dramatically. An investigation showed that the Application Group storing these mainframe-produced forms was the culprit. A closer inspection showed that there were several million individual resources in this ONE OnDemand Application Group. It was something I’d never seen before! Of course, the customer wanted to keep to their project timeline, but between the old server hardware, the terrible configuration error, and the fact that the old server was still in use by customers, there was little to be done but exercise extraordinary patience.
As I investigated further, I discovered that there were approximately 80,000 unique resource bundles, but that they had been duplicated tens of thousands of times because of the Application configuration. With ‘Resource Compares’ set to an artificially low number, CMOD was never really given the chance to work its de-duplication magic. The project dragged on for months longer than scheduled, and the pressure to complete it grew dramatically.
When the day came to reload the data on the new server, two tricks saved the day. First, I re-sorted the data files into an order that was primarily chronological, to preserve table segmentation, but that also grouped files with identical resources together. Second, I set a higher Resource Compares value. Together, these reduced the number of resources from several million to approximately 140,000, a tremendous savings in storage and database overhead.
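The reload ordering can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual migration tooling: the `LoadFile` record, its `report_date` and `resource_id` fields (the latter standing in for a digest of the file's resource bundle), and the `bundles_stored` counter that models a bounded look-back are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LoadFile:
    name: str            # hypothetical extracted data file name
    report_date: date    # date used for chronological ordering
    resource_id: str     # stands in for a digest of the resource bundle


def reload_order(files):
    """Sort chronologically first, then group identical resources together."""
    return sorted(files, key=lambda f: (f.report_date, f.resource_id))


def bundles_stored(files, resource_compares):
    """Count bundles a bounded look-back would store (toy model only)."""
    stored = []
    for f in files:
        # Slice guard: stored[-0:] would be the whole list, so handle 0 explicitly.
        recent = stored[-resource_compares:] if resource_compares > 0 else []
        if f.resource_id not in recent:
            stored.append(f.resource_id)
    return len(stored)


# Two days of loads, with two resource sets interleaved on each day.
files = [
    LoadFile(f"d{day}-{i}", date(2020, 1, day), rid)
    for day in (1, 2)
    for i, rid in enumerate(["A", "B", "A", "B"])
]

print(bundles_stored(files, 1))                 # original order, low setting
print(bundles_stored(reload_order(files), 1))   # re-sorted, low setting
print(bundles_stored(reload_order(files), 2))   # re-sorted, higher setting
```

In this toy data set, sorting alone halves the stored bundles, and raising the look-back as well brings the count down to the number of unique resource sets. Keeping the date as the leading sort key means the files still load in chronological order.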