From mboxrd@z Thu Jan 1 00:00:00 1970
From: Phillip Lougher
Subject: Re: Q. cache in squashfs?
Date: Fri, 09 Jul 2010 11:32:17 +0100
Message-ID: <4C36FAB1.6010506@lougher.demon.co.uk>
References: <19486.1277347066@jrobl> <4C354CBE.70401@lougher.demon.co.uk>
 <6356.1278569327@jrobl> <15323.1278662033@jrobl>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: linux-fsdevel@vger.kernel.org
To: "J. R. Okajima"
Return-path:
Received: from anchor-post-2.mail.demon.net ([195.173.77.133]:39677 "EHLO
 anchor-post-2.mail.demon.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1753365Ab0GIKhX (ORCPT ); Fri, 9 Jul 2010 06:37:23 -0400
In-Reply-To: <15323.1278662033@jrobl>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

J. R. Okajima wrote:
>> Phillip Lougher:
>>> What I think you're seeing here is the negative effect of fragment
>>> blocks (tail-end packing) in the native squashfs example and the
>>> positive effect of vfs/loop block caching in the ext3 on squashfs example.
>> Thank you very much for your explanation.
>> I think the number of cached decompressed fragment blocks is related
>> too. I thought it is much larger, but I found it is 3 by default. I will
>> try larger values with/without -no-fragments, which you pointed out.
>
> The -no-fragments shows better performance, but the difference is very
> small.
> It doesn't seem that the number of fragment blocks is large on my test
> environment.

That is *very* surprising.  How many fragments do you have?

>
> Next, I tried increasing the number of cache entries in squashfs.
> squashfs_fill_super()
> 	/* Allocate read_page block */
> -	msblk->read_page = squashfs_cache_init("data", 1, msblk->block_size);
> +	msblk->read_page = squashfs_cache_init("data", 100, msblk->block_size);

That is the *wrong* cache.  Read_page isn't really a cache (it is merely
allocated as a cache to re-use code).  It is used to store the data block
in the read_page() routine, and the entire contents are explicitly pushed
into the page cache.  Because the contents are already in the page cache,
it is *very* unlikely the VFS is calling Squashfs to re-read *this* data.
If it is, then something fundamental is broken, or you're seeing page cache
shrinkage.

> and
> 	CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE=100 (it was 3)
> which is for msblk->fragment_cache.

and which should make *no* difference if you've used the -no-fragments
option to build an image without fragments.

Squashfs has three types of compressed block, each with different caching
behaviour:

1. Data blocks.  Once read, the entire contents are pushed into the page
cache.  They are not cached by Squashfs.  If you've got repeated reads of
*these* blocks then you're seeing page cache shrinkage or flushing.

2. Fragment blocks.  These are large data blocks which have multiple small
files packed together.  In read_page() the file for which the fragment has
been read is pushed into the page cache.  The other contents of the fragment
block (the other files) are not, so they're temporarily cached in the
squashfs fragment cache in the belief they'll be requested soon (locality of
reference and all that stuff).

3. Metadata blocks (always 8K).  These store inode and directory metadata,
and are (unsurprisingly) read when inodes are looked-up/instantiated and
when directory look-up takes place.  These blocks tend to store multiple
inodes and directories packed together (for greater compression).  As such
they're temporarily cached in the squashfs metadata_cache in the belief
they'll be re-used soon (again, locality of reference).

It is fragments and metadata blocks which show the potential for repeated
re-reading on random access patterns.  As you've presumably eliminated
fragments from your image, that leaves metadata blocks as the *only* cause
of repeated re-reading/decompression.  You should have modified the size of
the metadata cache instead, from 8 to something larger, i.e.

msblk->block_cache = squashfs_cache_init("metadata", SQUASHFS_CACHED_BLKS,
				SQUASHFS_METADATA_SIZE);
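If you want to experiment with that, the easiest way is to bump the define
rather than hard-code a number at the call site.  From memory the define
lives in squashfs_fs.h (check your tree), and the 64 below is just a number
plucked out of the air for illustration:

-#define SQUASHFS_CACHED_BLKS		8
+#define SQUASHFS_CACHED_BLKS		64

Each entry holds SQUASHFS_METADATA_SIZE (8K) of uncompressed metadata, so
even 64 entries is only 512K of buffers.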
As a rough guide, to see how much to increase the cache so that it caches
the entire amount of metadata in your image, you can add up the uncompressed
sizes of the inode and directory tables reported by mksquashfs.

But there's a mystery here: I'll be very surprised if your test image has
more than 64K of metadata, which would fit into the existing 8 entry
metadata cache.

> Of course, these numbers are not a generic solution, but they are large
> enough to keep all blocks for my test.
>
> It shows much better performance.

If you've done as you said, it should have made no difference whatsoever,
unless the pushing of pages into the page cache is broken.  So there's a
big mystery here.

> All blocks are cached and the number
> of decompression for native squashfs (a.img) is almost equivalent to the
> case of nested ext3 (b.img).  But a.img consumes CPU much more than
> b.img.
> My guess for CPU is the cost to search in cache.
> squashfs_cache_get()
> 	for (i = 0; i < cache->entries; i++)
> 		if (cache->entry[i].block == block)
> 			break;
> The value of cache->entries grows, and its search cost grows too.

Are you seriously suggesting that a scan of a 100 entry table on a modern
CPU makes any noticeable difference?

>
> Before I am going to introduce a hash table or something to reduce the
> search cost, I think it is better to convert the squashfs cache into a
> generic system cache.  The hash index will be based on the block number.
> I don't know whether it will be able to combine with the page cache.  But
> at least, it will be able to kmem_cache_create() and register_shrinker().
>
> Phillip, what do you think about converting the cache system?
>

That was discussed on this list back in 2008, and there are pros and cons
to doing this.  You can look at the list archives for the discussion, so I
won't repeat it here.

At the moment I see this as a red herring, because your results suggest
something more fundamental is wrong.  Doing what you did above with the
size of the read_page cache should not have made any difference, and if it
did, it suggests pages which *should* be in the page cache (explicitly
pushed there by the read_page() routine) are not there.

In short, it's not a question of whether Squashfs should be using the page
cache; for the pages in question it already is.

I'll try and reproduce your results, as they're, to be frank, significantly
at variance with my previous experience.  Maybe there's a bug, or VFS
changes mean the pushing of pages into the page cache isn't working, but I
cannot see where your repeated block reading/decompression results are
coming from.

Phillip

>
> J. R. Okajima
>
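P.S. To make the cache sizing arithmetic concrete (the figures here are
made up purely for illustration): if mksquashfs reported, say, 200K of
uncompressed inode plus directory tables, then at 8K per metadata block
you'd need

	200 * 1024 / 8192 = 25

cache entries, so rounding SQUASHFS_CACHED_BLKS up to 32 would hold the lot.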