From mboxrd@z Thu Jan  1 00:00:00 1970
From: "J. R. Okajima" <hooanon05@yahoo.co.jp>
Subject: Re: Q. cache in squashfs?
Date: Fri, 09 Jul 2010 16:53:53 +0900
Message-ID: <15323.1278662033@jrobl>
References: <19486.1277347066@jrobl> <4C354CBE.70401@lougher.demon.co.uk> <6356.1278569327@jrobl>
To: Phillip Lougher <phillip@lougher.demon.co.uk>,
	linux-fsdevel@vger.kernel.org
Return-path: <linux-fsdevel-owner@vger.kernel.org>
Received: from mtoichi14.ns.itscom.net ([219.110.2.184]:62318 "EHLO
	mtoichi14.ns.itscom.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751411Ab0GIHyT (ORCPT
	<rfc822;linux-fsdevel@vger.kernel.org>);
	Fri, 9 Jul 2010 03:54:19 -0400
In-Reply-To: <6356.1278569327@jrobl>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>


> Phillip Lougher:
> > What I think you're seeing here is the negative effect of fragment
> > blocks (tail-end packing) in the native squashfs example and the
> > positive effect of vfs/loop block caching in the ext3 on squashfs example.
>
> Thank you very much for your explanation.
> I think the number of cached decompressed fragment blocks is related
> too. I thought it is much larger, but I found it is 3 by default. I will
> try larger value with/without -no-fragments which you pointed.

Using -no-fragments gives better performance, but the improvement is
very small. It seems that the number of fragment blocks is not large in
my test environment.

Next, I tried increasing the number of cache entries in squashfs.
squashfs_fill_super()
        /* Allocate read_page block */
-       msblk->read_page = squashfs_cache_init("data", 1, msblk->block_size);
+       msblk->read_page = squashfs_cache_init("data", 100, msblk->block_size);
and
CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE=100 (it was 3)
which is for msblk->fragment_cache.
Of course, these numbers are not a generic solution, but they are large
enough to keep all blocks cached for my test.

It shows much better performance. All blocks are cached, and the number
of decompressions for native squashfs (a.img) is almost the same as in
the nested ext3 case (b.img). But a.img consumes much more CPU than
b.img.
My guess is that the extra CPU time is the cost of searching the cache.
squashfs_cache_get()
		for (i = 0; i < cache->entries; i++)
			if (cache->entry[i].block == block)
				break;
As the value of cache->entries grows, the search cost grows too.

Before introducing a hash table or something similar to reduce the
search cost, I think it is better to convert the squashfs cache into a
generic system cache. The hash index would be based on the block number.
I don't know whether it can be combined with the page cache, but at
least it would be able to use kmem_cache_create() and register_shrinker().
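
To illustrate the idea of a hash index based on the block number, here
is a minimal userspace sketch. It is not squashfs code: struct cache,
struct cache_entry, hash_block(), cache_insert() and cache_lookup() are
hypothetical names, the bucket count is arbitrary, and locking, entry
reuse and eviction are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HASH_BITS 7
#define HASH_SIZE (1u << HASH_BITS)	/* 128 buckets, arbitrary choice */

/* Hypothetical stand-in for a cache entry; only the key is modelled. */
struct cache_entry {
	uint64_t block;			/* block number used as the key */
	struct cache_entry *next;	/* next entry in the same bucket */
};

struct cache {
	struct cache_entry *bucket[HASH_SIZE];
};

/* Multiplicative hash: keep the top HASH_BITS bits of the product. */
static unsigned int hash_block(uint64_t block)
{
	return (unsigned int)((block * 0x9E3779B97F4A7C15ull) >> (64 - HASH_BITS));
}

/* Link an entry at the head of its bucket chain. */
static void cache_insert(struct cache *c, struct cache_entry *e)
{
	unsigned int h;

	assert(c != NULL && e != NULL);
	h = hash_block(e->block);
	e->next = c->bucket[h];
	c->bucket[h] = e;
}

/* Walk only one bucket chain instead of scanning every entry. */
static struct cache_entry *cache_lookup(const struct cache *c, uint64_t block)
{
	struct cache_entry *e;

	for (e = c->bucket[hash_block(block)]; e != NULL; e = e->next)
		if (e->block == block)
			return e;
	return NULL;
}
```

With 100 entries spread over 128 buckets, a lookup compares against a
few entries on one chain rather than up to 100 entries in the array, as
long as the block numbers hash reasonably evenly.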

Phillip, what do you think about converting the cache system?


J. R. Okajima