From: "J. R. Okajima"
Subject: Q. cache in squashfs?
Date: Thu, 24 Jun 2010 11:37:46 +0900
Message-ID: <19486.1277347066@jrobl>
Cc: linux-fsdevel@vger.kernel.org
To: phillip@lougher.demon.co.uk

Hello Phillip,

I've found an interesting issue with squashfs. Please give me some
guidance or advice.

In short:
Why does squashfs read and decompress the same block several times?
Is a nested fs-image always better for squashfs?

Long:
I created two squashfs images.
- from /bin directly, by mksquashfs
  $ mksquashfs /bin /tmp/a.img
- from a single ext3 fs image which contains /bin
  $ dd if=/dev/zero of=/tmp/ext3/img bs=... count=...
  $ mkfs -t ext3 -F -m 0 -T small -O dir_index /tmp/ext3/img
  $ sudo mount -o loop /tmp/ext3/img /mnt
  $ tar -C /bin -cf - . | tar -C /mnt -xpf -
  $ sudo umount /mnt
  $ mksquashfs /tmp/ext3/img /tmp/b.img

Of course, /tmp/b.img is bigger than /tmp/a.img. That is fine.

For these squashfs images, I profiled random file reads over the whole fs.
  $ find /squashfs -type f > /tmp/l
  $ seq 10 | time sh -c "while read i; do rl /tmp/l | xargs -r cat & done > /dev/null; wait"
("rl" is a command to randomize lines)

For b.img, I have to loopback-mount twice.
  $ mount -o ro,loop /tmp/b.img /tmp/sq
  $ mount -o ro,loop /tmp/sq/img /mnt

Honestly speaking, I guessed b.img would be slower due to the nested-fs
overhead. But the results show that b.img (ext3 within squashfs) consumes
fewer CPU cycles and is faster.

- a.img (plain squashfs)
0.00user 0.14system 0:00.09elapsed 151%CPU (0avgtext+0avgdata 2192maxresident)k
0inputs+0outputs (0major+6184minor)pagefaults 0swaps

(oprofile report)
samples  %        image name       app name        symbol name
710      53.9514  zlib_inflate.ko  zlib_inflate    inflate_fast
123       9.3465  libc-2.7.so      libc-2.7.so     (no symbols)
119       9.0426  zlib_inflate.ko  zlib_inflate    zlib_adler32
106       8.0547  zlib_inflate.ko  zlib_inflate    zlib_inflate
 95       7.2188  ld-2.7.so        ld-2.7.so       (no symbols)
 64       4.8632  oprofiled        oprofiled       (no symbols)
 36       2.7356  dash             dash            (no symbols)

- b.img (ext3 + squashfs)
0.00user 0.01system 0:00.06elapsed 22%CPU (0avgtext+0avgdata 2192maxresident)k
0inputs+0outputs (0major+6134minor)pagefaults 0swaps

samples  %        image name       app name        symbol name
268      37.0678  zlib_inflate.ko  zlib_inflate    inflate_fast
126      17.4274  libc-2.7.so      libc-2.7.so     (no symbols)
106      14.6611  ld-2.7.so        ld-2.7.so       (no symbols)
 57       7.8838  zlib_inflate.ko  zlib_inflate    zlib_adler32
 45       6.2241  oprofiled        oprofiled       (no symbols)
 40       5.5325  dash             dash            (no symbols)
 33       4.5643  zlib_inflate.ko  zlib_inflate    zlib_inflate

The biggest difference is the time spent decompressing blocks. (Since /bin
is used for this sample, the difference is not so big. But if I use another
dir which has many more files than /bin, the difference grows too.)
I don't think the difference in fs layout or metadata is the problem.
Actually, after inserting debug prints to show the block index in
squashfs_read_data(), I can see that squashfs reads the same block
multiple times from a.img.

int squashfs_read_data(struct super_block *sb, void **buffer, u64 index,
        int length, u64 *next_index, int srclength, int pages)
{
        :::
        // for a datablock
        for (b = 0; bytes < length; b++, cur_index++) {
                bh[b] = sb_getblk(sb, cur_index);
+               pr_info("%llu\n", cur_index);
                if (bh[b] == NULL)
                        goto block_release;
                bytes += msblk->devblksize;
        }
        ll_rw_block(READ, b, bh);
        :::
        // for metadata
        for (; bytes < length; b++) {
                bh[b] = sb_getblk(sb, ++cur_index);
+               pr_info("%llu\n", cur_index);
                if (bh[b] == NULL)
                        goto block_release;
                bytes += msblk->devblksize;
        }
        ll_rw_block(READ, b - 1, bh + 1);
        :::
}
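As a rough way to count the duplicates from that output (just a sketch; it
assumes the pr_info() format string above is changed to a greppable tag such
as "sqsh_blk %llu\n" so the lines can be picked out of the kernel log):

  $ dmesg | grep sqsh_blk | awk '{ n[$NF]++ } END { for (b in n) print n[b], b }' | sort -rn | head

Each output line is "<count> <block index>", so the block indices that are
requested over and over float to the top.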
In the case of b.img, the same block is read several times too, but the
number of times is much smaller than with a.img.

I am interested in where the difference comes from. Do you think the
loopback block device in the middle caches the decompressed blocks
effectively?

- a.img
  squashfs
  + loop0
    + disk

- b.img
  ext3
  + loop1        <-- so effective?
    + squashfs
      + loop0
        + disk

In other words, is inserting a loopback mount always effective for all
squashfs images?

Thanks for reading this long mail.

J. R. Okajima