This is a blogbench comparison between UFS and HAMMER. Each test was run on identical 2TB disks with the filesystem formatted on an 800G partition with the same geometry: UFS on one disk, HAMMER on the other. Tests were run through a Silicon Image SATA driver with NCQ. Due to the random-access nature of the test, bulk bandwidth is not a concern and never approaches the SATA phy ceiling. Tests were run on DragonFly BSD with the latest master as of 20Aug2009.
Blogbench is a benchmark which uses an ever-growing dataset. It essentially writes the dataset in parallel with reading any part of the already-written dataset, using many threads. The first column, "Nb blogs", roughly scales with the size of the dataset. Read performance is therefore expected to be quite high until the dataset exceeds available system caches. Once the system caches get blown out, read and/or write performance has no choice but to drop off dramatically. The results are quite interesting.
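For readers who have not seen blogbench, here is a minimal Python sketch of the access pattern just described (an illustration only, not the actual blogbench source; the directory name, article size, and thread count are arbitrary assumptions):

    import os
    import random
    import threading

    DATADIR = "blogtest"         # hypothetical scratch directory
    ARTICLE_SIZE = 8 * 1024      # hypothetical article size
    NUM_READERS = 100            # blogbench uses many reader threads
    RUNTIME_SECS = 10

    written = []                 # indices of articles written so far
    lock = threading.Lock()
    stop = threading.Event()

    def writer():
        i = 0
        while not stop.is_set():
            # The dataset only ever grows; nothing is overwritten or deleted.
            with open(os.path.join(DATADIR, f"article.{i}"), "wb") as f:
                f.write(os.urandom(ARTICLE_SIZE))
            with lock:
                written.append(i)
            i += 1

    def reader():
        while not stop.is_set():
            with lock:
                i = random.choice(written) if written else None
            if i is None:
                continue
            # Random access over the whole already-written dataset: once it
            # outgrows the system caches, some fraction of these hit the disk.
            with open(os.path.join(DATADIR, f"article.{i}"), "rb") as f:
                f.read()

    os.makedirs(DATADIR, exist_ok=True)
    threads = [threading.Thread(target=writer)]
    threads += [threading.Thread(target=reader) for _ in range(NUM_READERS)]
    threading.Timer(RUNTIME_SECS, stop.set).start()
    for t in threads:
        t.start()
    for t in threads:
        t.join()

Because readers pick uniformly over everything written so far, the cache hit rate inevitably falls as the dataset grows, which is what drives the blow-out behavior described below.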
UFS has fairly consistent results, but this is mainly due to a severe lack of write bandwidth even when operating within the system caches. The reduced write bandwidth means it takes far longer for the test to blow out the system caches, and until that point the benchmark winds up basically just testing cache accesses rather than disk accesses on the read side. Why write performance is so low when reads are not touching the disk at all is unknown. On the plus side, UFS clearly has a smaller memory footprint and the system caches do not get blown out until around blog 580. Once the system caches do get blown out UFS comes to a grinding halt... write activity basically stops entirely and read activity winds up reflecting what has essentially become a static dataset only slightly larger than the system caches.
HAMMER appears to be able to push much higher write bandwidth in the first phase of the test, while reads are operating within the system cache. Because of the much higher bandwidth HAMMER blows out the system caches much more quickly, at around blog 370. The write activity itself skews the blow-out point a bit, but in repeated tests the system caches blow out prior to blog 450. So HAMMER has a larger memory footprint, for sure. However, once the system caches are blown out HAMMER continues to be able to write at a fairly high clip while read performance drops to the levels expected when a growing dataset is well past system cache limits. UFS, in contrast, was able to maintain higher read throughput, but only because its write throughput essentially went to zero, so it was operating on a dataset only slightly larger than the system cache. HAMMER exhibits less determinism; the results are not smooth. This is mainly due to HAMMER's background meta-data flusher. Despite this, write performance stays good throughout the entire test and read performance is also quite reasonable as the dataset grows to the point where most reads have to hit the disk.
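To illustrate why a background flusher produces bursts rather than a smooth line, here is a conceptual Python sketch (HAMMER's actual flusher is considerably more involved; the high-water mark, batch size, and commit delay are made-up numbers): frontend writes accumulate dirty buffers in memory, a flusher thread drains them in batches, and writers stall whenever the queue hits its limit.

    import collections
    import threading
    import time

    HIWATER = 1000     # hypothetical dirty-buffer high-water mark
    BATCH = 100        # hypothetical buffers committed per flusher pass

    dirty = collections.deque()
    cond = threading.Condition()

    def commit_to_disk(batch):
        time.sleep(0.05)           # stand-in for slow meta-data I/O

    def write_buffer(buf):
        """Frontend write path: bursts along until the flusher falls behind."""
        with cond:
            while len(dirty) >= HIWATER:
                cond.wait()        # writer stalls: the dips in the results
            dirty.append(buf)
            cond.notify_all()

    def flusher():
        """Background thread that sequentializes meta-data commits."""
        while True:
            with cond:
                while not dirty:
                    cond.wait()
                batch = [dirty.popleft() for _ in range(min(len(dirty), BATCH))]
                cond.notify_all()  # let any stalled writers resume
            commit_to_disk(batch)  # done outside the lock

    threading.Thread(target=flusher, daemon=True).start()

    # Demo: push writes until the high-water mark forces stalls.
    for i in range(5000):
        write_buffer(i)

The batching is what makes the write numbers bursty rather than smooth: throughput looks great while the queue fills, then dips while writers wait for a flush cycle to complete.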
Because the dataset continues to grow in the HAMMER test one can't really compare the output beyond blog 600 to UFS, other than to note that despite UFS being able to maintain read performance, it would be a complete failure if a production system had similar characteristics, because the web site wouldn't be able to write anything at all.
What conclusions can be drawn from this comparison?
Additional UFS notes: I included a lot of trailing output from the blogbench test so people could see what happens after writes get stalled into oblivion. What you are seeing here is a dataset which has blown out the system caches and then continues to grow very slowly. This relatively small growth directly relates to the continued read degradation, as a larger and larger percentage of the read requests cannot be found in the system caches. In any random-access system it only takes 1% of the requests missing the cache to seriously degrade benchmark performance, owing to the difference between completing a read from the system caches in a few microseconds versus completing it from disk in a few milliseconds. The absolute times may be small, but the performance ratio is 1:1000 cache-vs-disk. With 100 reader threads, having 1% of the requests miss will drop read performance by approximately 1%, and the degradation mounts rapidly from there as the miss ratio grows. The fewer reader threads there are, the more pronounced the degradation.
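To put rough numbers on that ratio, here is a back-of-the-envelope Python calculation (the 5us and 5ms service times are assumptions chosen to match the 1:1000 ratio, not measurements from this test):

    T_CACHE_US = 5.0        # assumed cached-read service time
    T_DISK_US = 5000.0      # assumed disk-read service time (1:1000 ratio)

    def avg_read_us(miss_ratio):
        # Blended per-request cost at a given cache-miss ratio.
        return (1.0 - miss_ratio) * T_CACHE_US + miss_ratio * T_DISK_US

    for miss in (0.001, 0.01, 0.05, 0.20):
        slowdown = avg_read_us(miss) / T_CACHE_US
        print(f"miss={miss:6.1%}  avg={avg_read_us(miss):8.1f}us  "
              f"per-thread slowdown={slowdown:6.1f}x")

Under these assumptions a single thread sees its average read latency inflate roughly eleven-fold at just a 1% miss ratio. With 100 overlapping readers each in-flight miss only idles one thread at a time, so the aggregate drop starts small; with fewer threads each miss idles a larger fraction of the reader pool, which is why the degradation is more pronounced.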
Additional HAMMER notes: I included a lot of trailing output for HAMMER as well. HAMMER blows out the system caches at blog 370 or so (UFS blows them out at blog 580). HAMMER's read performance is, like UFS's, coming from the cache up to that point. That HAMMER has somewhat lower cached read numbers is clearly due to the balance against the higher write performance, and in particular the fact that write performance is not seriously degraded at this stage of the test. Even though write performance is much higher, there are still some hiccups due to the background flusher, which can stall writes for a while as the flusher catches up. The non-deterministic performance is something that we need to work on. It gets really interesting when the dataset size gets past blog 420. Now the system caches have been blown out, but as you can see from the output the write performance, while clearly lower and more bursty, does not become permanently stalled like it does in UFS. HAMMER is able to continue to write regardless of the size of the dataset.
HAMMER has gone through numerous engineering cycles since it was first released. Performance is now a lot better than it was on the initial release, but much of the improved performance requires a few days of settling time for the background maintenance tasks to clean the tree up. Initial B-Tree layout still needs a lot of work. Due to its nature HAMMER will always have somewhat lower read and write performance than something like UFS, at least for single-threaded tests. HAMMER is, after all, retaining all operational history during the test, and manipulating B-Trees is definitely a heavier-weight task versus blockmaps.
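To make that last point concrete, here is a simplified Python sketch of the difference (both structures are heavily simplified; real UFS blockmaps and HAMMER B-Trees are far more elaborate): a blockmap lookup is essentially a direct array index, while a B-Tree lookup has to descend through several nodes, each of which may itself require disk I/O, and on writes, locking and possible splits.

    import bisect

    def blockmap_lookup(blockmap, logical_block):
        # UFS-style translation: essentially one array index (indirect
        # blocks elided here), cheap both to read and to update.
        return blockmap[logical_block]

    def btree_lookup(node, key):
        # HAMMER-style translation: descend internal nodes until a leaf
        # is reached; every level touched is a node that must be read,
        # and on writes, locked and possibly split or rebalanced.
        while not node["leaf"]:
            i = bisect.bisect_right(node["keys"], key)
            node = node["children"][i]
        i = bisect.bisect_left(node["keys"], key)
        if i < len(node["keys"]) and node["keys"][i] == key:
            return node["values"][i]
        return None

The blockmap lookup is O(1) per block, while each B-Tree operation costs O(log n) node visits, and retaining full operational history means those nodes churn constantly; that is the inherent overhead referred to above.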