Why Resize U2 Files?

Paradigm Systems, Inc
Mar 3, 2019

U2 hashed files are well suited to data storage: they can grow rapidly, store variable-length data, and support a wide range of key and data structures. Like any finely tuned machine, though, they need regular maintenance to retain that efficiency. Just like your car, your database requires periodic preventive maintenance.

U2 hashed files are divided into groups, and each group can store only a finite amount of data. As a group reaches its maximum capacity, additional overflow groups are linked to it to hold new data. Eventually data becomes distributed across multiple groups located in different areas of the file. These additional groups are referred to as overflow space. Overflow space is similar to a champagne fountain: as the first glass reaches its capacity, champagne overflows to the next glass, and then the next, and when it reaches the last glass there is nothing to catch the overflow and you have a big mess to clean up. Just like those champagne glasses, your data will eventually overflow to the point that your database becomes slow and unstable, and then it’s just a matter of time before you experience a database failure, which could result in lost data.

As data is distributed across more overflow groups, the file becomes inefficient: U2 has to keep track of every overflow group and how it is linked to the others, and it takes more time to access and update the data. A properly sized file is like a neat bundle of dry pasta, with every strand cut to the same length and packed in a uniform structure. A poorly maintained file resembles a bowl of cooked spaghetti: it consumes more space, and it becomes difficult to find where your data starts and ends.
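To make the effect of group count on overflow concrete, here is a minimal sketch in Python. It is a toy model, not how U2 works internally: the `simulate_overflow` function, the CRC32 hash, and the record keys are all assumptions made for illustration, and record headers and group pointers are ignored.

```python
import random
import zlib


def simulate_overflow(num_records, modulo, block_bytes, seed=0):
    """Hash records into `modulo` groups and count the overflow blocks needed.

    Assumptions (illustrative only): CRC32 stands in for the real hashing
    algorithm, records are 1k to 3k of payload, and every byte beyond a
    group's primary block spills into whole overflow blocks.
    """
    rng = random.Random(seed)
    group_bytes = [0] * modulo
    for i in range(num_records):
        key = f"REC{i}".encode()
        size = rng.randint(1024, 3072)          # random 1k to 3k record
        group = zlib.crc32(key) % modulo        # hash the key to a group
        group_bytes[group] += size

    overflow_blocks = 0
    for used in group_bytes:
        if used > block_bytes:
            extra = used - block_bytes
            overflow_blocks += -(-extra // block_bytes)  # ceiling division
    return overflow_blocks


# Same data, same block size, two very different group counts.
undersized = simulate_overflow(10_000, modulo=101, block_bytes=4096)
roomier = simulate_overflow(10_000, modulo=7919, block_bytes=4096)
print(f"modulo 101:  {undersized} overflow blocks")
print(f"modulo 7919: {roomier} overflow blocks")
```

In this toy model the undersized file needs far more overflow blocks for the same data, which is the spaghetti effect described above.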

In addition to being poorly organized and requiring multiple reads to access the data, poorly sized files consume more space. For example, Figure 1 below shows information for a dynamic file containing 1 million randomly sized records ranging from 1k to 3k in size, before and after resizing.

Modulo / Block Size (k)    Level 1 Overflow    File Size (bytes)
168330 / 16                23,382              3,662,790,656
180731 / 16                0                   2,961,162,240

Figure 1

As this example shows, when the file is properly sized the overflow drops to zero (0), with the added benefit of reducing the disk space consumed by about 19%, a savings of roughly 701 MB in this case. The benefits are clear: a properly sized file provides faster data access, more efficient use of disk space, and a much more stable file.
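The savings can be checked directly from the two file sizes in Figure 1. The short Python calculation below is only a worked example; the `size_savings` helper is a name made up for illustration, and MB here means decimal megabytes (1,000,000 bytes).

```python
def size_savings(before_bytes, after_bytes):
    """Return (bytes saved, percentage saved) when a file shrinks."""
    saved = before_bytes - after_bytes
    return saved, saved / before_bytes * 100


# File sizes taken from Figure 1.
saved, pct = size_savings(3_662_790_656, 2_961_162_240)
print(f"{saved:,} bytes saved ({saved / 1_000_000:.1f} MB, {pct:.1f}%)")
# prints: 701,628,416 bytes saved (701.6 MB, 19.2%)
```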

It should, however, be noted that resizing won’t always save space, and in many cases you may not be able to eliminate overflow entirely. The goal of resizing is to reorganize the stored data so that U2 can read and write the file with the least amount of effort.

Figure 2 below shows three 64-bit static files, each containing one million records ranging in size from 1k to 3k. The first file was created with a minimal size of a single 1k group, the second with a more realistic size of 133,723 groups of 4k blocks, and the third with 180,719 groups of 16k blocks.

I chose the minimal size of a single 1k group to amplify and demonstrate the excessive amount of resources consumed when updating a poorly sized file.

Line    Modulo / Block Size (k)    Level 1 Overflow    File Size (bytes)    Time to Build (hh:mm:ss)
1       1 / 1                      3,000,000           3,096,849,408        03:38:36
2       133723 / 4                 866,486             4,096,860,160        00:00:29
3       180719 / 16                35,677              3,545,448,448        00:02:00

Figure 2: 64-Bit Static Files

In the above example, line 1 is the ridiculously small file mentioned earlier. As you can see, this poor sizing has resulted in a massive number of groups in overflow space, which adds considerable overhead to every update to the file. In this case it took more than 3.5 hours to create the 1 million records.

Line 2 has a much more realistic size, which results in a larger file than line 1 but considerably less overflow and much faster file access. Line 3 uses less disk space than line 2 and has the least overflow of the three, yet it still does not provide the fastest file access: its build took two minutes against 29 seconds for line 2. In cases like this we must decide whether we are more interested in file access speed or efficient use of disk space.
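One way to compare the three builds is to convert the build times in Figure 2 into records written per second. The Python sketch below does only that arithmetic; the `to_seconds` helper and the `builds` dictionary are illustrative names, and the figures come straight from the table.

```python
def to_seconds(hhmmss):
    """Convert an 'hh:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


RECORDS = 1_000_000

# Modulo / block size -> time to build, as listed in Figure 2.
builds = {
    "1 / 1": "03:38:36",
    "133723 / 4": "00:00:29",
    "180719 / 16": "00:02:00",
}

for label, build_time in builds.items():
    secs = to_seconds(build_time)
    print(f"{label:>12}: {secs:>6} s, about {RECORDS / secs:,.0f} records/s")
```

That works out to roughly 76 records per second for the single-group file, against about 34,483 for line 2 and 8,333 for line 3.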

What about Dynamic Files?

A common misconception is that dynamic files automatically resize themselves. In theory, dynamic file groups are supposed to split as data is added and merge when data is deleted. In practice, this process is not as quick or accurate as users are led to believe, and dynamic files often end up with excessive overflow or with too much unusable disk space locked away inside each group.

Why is Mercury the Solution?

File analysis and database tuning are paramount in keeping a system operating at peak performance. If you have not been resizing your files on a regular basis, your system is wasting valuable resources by performing unnecessary disk I/O, while large portions of disk space become unusable.

Rocket Software recommends that all files be routinely analyzed to calculate the proper file size by determining the correct modulo and separation. Once determined, these values must be implemented by resizing each file on the system.
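As a rough illustration of what calculating a modulo can involve, the Python sketch below estimates a starting point from the record count, the average record size, and the block size. This is a generic rule of thumb, not Rocket's or Mercury's actual method: the `estimate_modulo` function, the 70% target fill, and the choice of rounding up to a prime are all assumptions made for the example, and a real analysis would also weigh key distribution and record size spread.

```python
import math


def is_prime(n):
    """Trial-division primality test; adequate for modulo-sized numbers."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for divisor in range(3, math.isqrt(n) + 1, 2):
        if n % divisor == 0:
            return False
    return True


def estimate_modulo(record_count, avg_record_bytes, block_bytes, target_fill=0.7):
    """Estimate a group count so each block is about `target_fill` full.

    Assumption: data spreads evenly across groups. The result is rounded
    up to the next prime, a common convention for hashed-file modulos.
    """
    total_bytes = record_count * avg_record_bytes
    candidate = max(math.ceil(total_bytes / (block_bytes * target_fill)), 2)
    while not is_prime(candidate):
        candidate += 1
    return candidate


# One million records averaging 2k, stored in 16k blocks.
print(estimate_modulo(1_000_000, 2048, 16 * 1024))
```

Treat the result as a first guess to verify against the file's actual overflow statistics, not as a final answer.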

Gain Control

Mercury allows you to take complete control of your database without the need for a full-time DBA, saving you piles of cash. It’s like having the premier DBA working for you at a fraction of the cost.

Call your Mercury reseller today to arrange a demonstration.
