* Q. cache in squashfs?
@ 2010-06-24  2:37 J. R. Okajima  (16+ messages in thread)

From: J. R. Okajima @ 2010-06-24  2:37 UTC
To: phillip; +Cc: linux-fsdevel

Hello Phillip,

I've found an interesting issue with squashfs.
Please give me some guidance or advice.

In short:
Why does squashfs read and decompress the same block several times?
Is a nested fs image always better for squashfs?

Long:
I created two squashfs images.

- from /bin directly by mksquashfs
  $ mksquashfs /bin /tmp/a.img

- from a single ext3 fs image which contains /bin
  $ dd if=/dev/zero of=/tmp/ext3/img bs=... count=...
  $ mkfs -t ext3 -F -m 0 -T small -O dir_index /tmp/ext3/img
  $ sudo mount -o loop /tmp/ext3/img /mnt
  $ tar -C /bin -cf - . | tar -C /mnt -xpf -
  $ sudo umount /mnt
  $ mksquashfs /tmp/ext3/img /tmp/b.img

Of course, /tmp/b.img is bigger than /tmp/a.img. That is expected.

For these squashfs images, I profiled random file reads across the whole fs.

  $ find /squashfs -type f > /tmp/l
  $ seq 10 | time sh -c "while read i; do rl /tmp/l | xargs -r cat & done > /dev/null; wait"

("rl" is a command to randomize lines.)

For b.img, I have to loopback-mount twice.

  $ mount -o ro,loop /tmp/b.img /tmp/sq
  $ mount -o ro,loop /tmp/sq/img /mnt

Honestly speaking, I guessed b.img would be slower due to the nested fs
overhead. But the results show that b.img (ext3 within squashfs) consumes
fewer CPU cycles and is faster.
- a.img (plain squashfs)
  0.00user 0.14system 0:00.09elapsed 151%CPU (0avgtext+0avgdata 2192maxresident)k
  0inputs+0outputs (0major+6184minor)pagefaults 0swaps

  (oprofile report)
  samples  %        image name       app name         symbol name
  710      53.9514  zlib_inflate.ko  zlib_inflate     inflate_fast
  123       9.3465  libc-2.7.so      libc-2.7.so      (no symbols)
  119       9.0426  zlib_inflate.ko  zlib_inflate     zlib_adler32
  106       8.0547  zlib_inflate.ko  zlib_inflate     zlib_inflate
   95       7.2188  ld-2.7.so        ld-2.7.so        (no symbols)
   64       4.8632  oprofiled        oprofiled        (no symbols)
   36       2.7356  dash             dash             (no symbols)

- b.img (ext3 + squashfs)
  0.00user 0.01system 0:00.06elapsed 22%CPU (0avgtext+0avgdata 2192maxresident)k
  0inputs+0outputs (0major+6134minor)pagefaults 0swaps

  samples  %        image name       app name         symbol name
  268      37.0678  zlib_inflate.ko  zlib_inflate     inflate_fast
  126      17.4274  libc-2.7.so      libc-2.7.so      (no symbols)
  106      14.6611  ld-2.7.so        ld-2.7.so        (no symbols)
   57       7.8838  zlib_inflate.ko  zlib_inflate     zlib_adler32
   45       6.2241  oprofiled        oprofiled        (no symbols)
   40       5.5325  dash             dash             (no symbols)
   33       4.5643  zlib_inflate.ko  zlib_inflate     zlib_inflate

The biggest difference is in decompressing the blocks. (Since /bin is
used for this sample, the difference is not so big. But when I used
another dir which has many more files than /bin, the difference grew
accordingly.) I don't think a difference in fs layout or metadata is the
problem. Actually, after inserting debug prints to show the block index
in squashfs_read_data(), it shows squashfs reads the same block multiple
times from a.img.
int squashfs_read_data(struct super_block *sb, void **buffer, u64 index,
			int length, u64 *next_index, int srclength, int pages)
{
	:::
	/* for datablock */
	for (b = 0; bytes < length; b++, cur_index++) {
		bh[b] = sb_getblk(sb, cur_index);
+		pr_info("%llu\n", cur_index);
		if (bh[b] == NULL)
			goto block_release;
		bytes += msblk->devblksize;
	}
	ll_rw_block(READ, b, bh);
	:::
	/* for metadata */
	for (; bytes < length; b++) {
		bh[b] = sb_getblk(sb, ++cur_index);
+		pr_info("%llu\n", cur_index);
		if (bh[b] == NULL)
			goto block_release;
		bytes += msblk->devblksize;
	}
	ll_rw_block(READ, b - 1, bh + 1);
	:::
}

In the case of b.img, the same block is read several times too, but the
number of times is much smaller than for a.img. I am interested in where
the difference comes from. Do you think the loopback block device in the
middle caches the decompressed blocks effectively?

- a.img
  squashfs
  + loop0
    + disk

- b.img
  ext3
  + loop1  <-- so effective?
    + squashfs
      + loop0
        + disk

In other words, is inserting a loopback mount always effective for all
squashfs?

Thanks for reading this long mail.

J. R. Okajima
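[Editorial aside: one quick way to quantify the repeats that the
pr_info() lines above reveal is to count duplicate block indices in the
log. A minimal sketch; the log excerpt is made up for illustration, and
real input would come from dmesg after applying the debug patch.]

```python
# Count repeated block indices in the debug output.  The log excerpt
# below is invented; real output would come from dmesg.
import collections

log = """\
squashfs block 4424
squashfs block 9051
squashfs block 4424
squashfs block 4424
squashfs block 9051
"""

counts = collections.Counter(line.split()[-1] for line in log.splitlines())
repeats = {blk: n for blk, n in counts.items() if n > 1}
print(repeats)  # blocks read (and hence decompressed) more than once
```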
* Re: Q. cache in squashfs?
@ 2010-07-08  3:57 ` Phillip Lougher

From: Phillip Lougher @ 2010-07-08  3:57 UTC
To: J. R. Okajima; +Cc: linux-fsdevel

J. R. Okajima wrote:
> Hello Phillip,
>
> I've found an interesting issue with squashfs.
> Please give me some guidance or advice.
> In short:
> Why does squashfs read and decompress the same block several times?
> Is a nested fs image always better for squashfs?
>

Hi Junjiro,

What I think you're seeing here is the negative effect of fragment
blocks (tail-end packing) in the native squashfs example and the
positive effect of vfs/loop block caching in the ext3-on-squashfs
example.

In the native Squashfs example (a.img) multiple files are packed into
fragment blocks and then compressed together to get better compression.
Fragment blocks are efficient at getting better compression, but are
highly CPU-inefficient to decompress under very random file access
patterns. It is easy to see why - access of file X (of say 5K) packed
into a 128K fragment containing files W, X, Y and Z requires the
decompression of the entire 128K fragment. If the file access pattern is
extremely random (i.e. it doesn't exhibit any locality of reference),
none of the other files are accessed before the decompressed fragment is
evicted from the fragment cache. So, when the other files are read, the
fragment needs to be read and decompressed a second (and third etc.)
time.

As I said, fragments are good at getting better compression, and are CPU
efficient under normal file access patterns which exhibit locality of
reference - files in the same directory (and hence in the same fragment)
tend to be accessed at the same time.
In the above example it means that once the fragment has been
decompressed for file X, the other files W, Y and Z will likely be read
soon afterwards and will find the already decompressed fragment in the
fragment cache. Indeed, for this scenario fragments actually improve I/O
performance and reduce CPU overhead.

In the ext3-on-squashfs example you're seeing the effect of VFS/loopback
caching of the decompressed data from Squashfs. In this example the ext3
filesystem is stored as a single file inside Squashfs, compressed in
blocks of 128K. In normal operation the mounted ext3 filesystem will
issue 4K block reads to the loopback file, which will cause the
underlying 128K compressed block to be read from the Squashfs file, and
this decompressed data will go into the VFS page cache. With random file
access of the ext3 filesystem only a small part of that 128K block may
be required at this time; however, much later ext3 filesystem accesses
requiring that 128K block will be satisfied from the page cache still
holding the decompressed block, and so this later access *won't* go to
Squashfs and there will not be a second decompress of the block.

The overall effect is worse performance for native Squashfs. But this is
rather unsurprising given the negative effect of fragments and the
positive effect of VFS caching in the case of ext3 on Squashfs.

As I said, I suspect the major cause of worse performance for Squashfs
is fragments coupled with your very random file access pattern.
Fragments simply do not work well with atypical random access patterns.
If you expect random access you should *not* use fragments, and should
specify the -no-fragments option to Mksquashfs.

The default Mksquashfs options (duplicate detection, fragments, 128K
blocks) are a compromise which gives high compression coupled with fast
I/O for typical file access patterns. However, people should *always*
play around with the defaults, as the different compression and I/O
performance achieved may suit their needs better.
Unfortunately very few people do so, which is a shame, as I often see
people complaining about various aspects of Squashfs which I know would
be solved if they'd only use different Mksquashfs settings.

Regards

Phillip
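[Editorial aside: Phillip's fragment-cache argument can be illustrated
with a toy simulation. This is not code from the thread; all constants
are invented, with the 3-entry FIFO cache mirroring the default
CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE, and files packed 4 per fragment.]

```python
# Toy model (not kernel code): count fragment decompressions under
# sequential vs. locality-free access with a tiny fragment cache.
N_FILES = 64
FILES_PER_FRAG = 4
CACHE_SLOTS = 3        # default CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE

def count_decompressions(order):
    cache = []                  # FIFO of decompressed fragment numbers
    decompressions = 0
    for f in order:
        frag = f // FILES_PER_FRAG
        if frag not in cache:
            decompressions += 1         # miss: decompress whole fragment
            cache.append(frag)
            if len(cache) > CACHE_SLOTS:
                cache.pop(0)            # evict oldest entry
    return decompressions

sequential = count_decompressions(range(N_FILES))
scattered = count_decompressions((i * 13) % N_FILES for i in range(N_FILES))
print(sequential, scattered)
```

With sequential access each fragment is decompressed exactly once (one
miss per fragment); with the scattered pattern nearly every access
misses the 3-entry cache, so the same fragments are decompressed over
and over - exactly the repeated-decompression effect described above.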
* Re: Q. cache in squashfs?
@ 2010-07-08  6:08 ` J. R. Okajima

From: J. R. Okajima @ 2010-07-08  6:08 UTC
To: Phillip Lougher; +Cc: linux-fsdevel

Phillip Lougher:
> What I think you're seeing here is the negative effect of fragment
> blocks (tail-end packing) in the native squashfs example and the
> positive effect of vfs/loop block caching in the ext3 on squashfs
> example.

Thank you very much for your explanation.
I think the number of cached decompressed fragment blocks is related
too. I thought it was much larger, but I found it is 3 by default. I
will try a larger value with/without the -no-fragments option you
pointed out.

Also I am afraid the nested loopback mount will cause double (or more)
caching - caching by the ext3 loopback and by the native squashfs
loopback - and some people won't want this. But if a user has plenty of
memory and doesn't care about nested caching (because it will be
reclaimed when necessary), then I expect the nested loopback mount will
be a good option. For instance:

- CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE = 1
- inner single ext2 image
- mksquashfs without -no-fragments
- ram 1GB
- the squashfs image size 250MB

Do you think it will be better for a very random access pattern?

J. R. Okajima
* Re: Q. cache in squashfs?
@ 2010-07-09  7:53 ` J. R. Okajima

From: J. R. Okajima @ 2010-07-09  7:53 UTC
To: Phillip Lougher, linux-fsdevel

> Phillip Lougher:
> > What I think you're seeing here is the negative effect of fragment
> > blocks (tail-end packing) in the native squashfs example and the
> > positive effect of vfs/loop block caching in the ext3 on squashfs
> > example.
>
> Thank you very much for your explanation.
> I think the number of cached decompressed fragment blocks is related
> too. I thought it was much larger, but I found it is 3 by default. I
> will try a larger value with/without the -no-fragments option you
> pointed out.

The -no-fragments option shows better performance, but the gain is very
small. It doesn't seem that the number of fragment blocks is large in my
test environment.

Next, I tried increasing the number of cache entries in squashfs:

squashfs_fill_super()
	/* Allocate read_page block */
-	msblk->read_page = squashfs_cache_init("data", 1, msblk->block_size);
+	msblk->read_page = squashfs_cache_init("data", 100, msblk->block_size);

and
	CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE=100 (it was 3)
which is for msblk->fragment_cache.

Of course, these numbers are not a generic solution, but they are large
enough to keep all blocks for my test.

It shows much better performance. All blocks are cached and the number
of decompressions for native squashfs (a.img) is almost equivalent to
the nested ext3 case (b.img). But a.img consumes much more CPU than
b.img.

My guess is that the CPU cost is the search in the cache:

squashfs_cache_get()
	for (i = 0; i < cache->entries; i++)
		if (cache->entry[i].block == block)
			break;

As the value of cache->entries grows, the search cost grows too.

Before introducing a hash table or something to reduce the search cost,
I think it is better to convert the squashfs cache into a generic system
cache.
The hash index will be based on the block number. I don't know whether
it will be possible to combine it with the page cache, but at least it
will be able to use kmem_cache_create() and register_shrinker().

Phillip, what do you think about converting the cache system?

J. R. Okajima
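[Editorial aside: the linear-scan vs. hash-lookup trade-off being
discussed can be sketched in userspace. This is illustrative only, not
kernel code; the block numbers are invented.]

```python
# squashfs_cache_get() scans a fixed-size array, an O(entries) cost per
# lookup; indexing the same entries by block number makes it O(1).
blocks = list(range(0, 12800, 128))    # invented block start offsets

def linear_lookup(entries, block):
    # mirrors the for-loop quoted in the mail above
    for i, b in enumerate(entries):
        if b == block:
            return i
    return -1                          # not cached

# hash-based alternative, as proposed
index = {b: i for i, b in enumerate(blocks)}

def hash_lookup(block):
    return index.get(block, -1)

print(linear_lookup(blocks, 1280), hash_lookup(1280))
```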
* Re: Q. cache in squashfs?
@ 2010-07-09 10:32 ` Phillip Lougher

From: Phillip Lougher @ 2010-07-09 10:32 UTC
To: J. R. Okajima; +Cc: linux-fsdevel

J. R. Okajima wrote:
>> Phillip Lougher:
>>> What I think you're seeing here is the negative effect of fragment
>>> blocks (tail-end packing) in the native squashfs example and the
>>> positive effect of vfs/loop block caching in the ext3 on squashfs
>>> example.
>> Thank you very much for your explanation.
>> I think the number of cached decompressed fragment blocks is related
>> too. I thought it was much larger, but I found it is 3 by default. I
>> will try a larger value with/without the -no-fragments option you
>> pointed out.
>
> The -no-fragments option shows better performance, but the gain is
> very small. It doesn't seem that the number of fragment blocks is
> large in my test environment.

That is *very* surprising. How many fragments do you have?

>
> Next, I tried increasing the number of cache entries in squashfs:
> squashfs_fill_super()
> 	/* Allocate read_page block */
> -	msblk->read_page = squashfs_cache_init("data", 1, msblk->block_size);
> +	msblk->read_page = squashfs_cache_init("data", 100, msblk->block_size);

That is the *wrong* cache. Read_page isn't really a cache (it is merely
allocated as a cache to re-use code). It is used to store the data block
in the read_page() routine, and the entire contents are explicitly
pushed into the page cache. As the entire contents are pushed into the
page cache, it is *very* unlikely the VFS is calling Squashfs to re-read
*this* data. If it is, then something fundamental is broken, or you're
seeing page cache shrinkage.

> and
> CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE=100 (it was 3)
> which is for msblk->fragment_cache.
and which should make *no* difference if you've used the -no-fragments
option to build an image without fragments.

Squashfs has three types of compressed block, each with different
caching behaviour:

1. Data blocks. Once read, the entire contents are pushed into the page
   cache. They are not cached by Squashfs. If you've got repeated reads
   of *these* blocks then you're seeing page cache shrinkage or
   flushing.

2. Fragment blocks. These are large data blocks which have multiple
   small files packed together. In read_page() the file for which the
   fragment has been read is pushed into the page cache. The other
   contents of the fragment block (the other files) are not, so they're
   temporarily cached in the squashfs fragment cache in the belief
   they'll be requested soon (locality of reference and all that stuff).

3. Metadata blocks (always 8K). These store inode and directory
   metadata, and are (unsurprisingly) read when inodes are
   looked-up/instantiated and when directory look-up takes place. These
   blocks tend to store multiple inodes and directories packed together
   (for greater compression). As such they're temporarily cached in the
   squashfs metadata cache in the belief they'll be re-used soon (again,
   locality of reference).

It is fragment and metadata blocks which show the potential for repeated
re-reading on random access patterns. As you've presumably eliminated
fragments from your image, that leaves metadata blocks as the *only*
cause of repeated re-reading/decompression. You should have modified the
size of the metadata cache, from 8 to something larger, i.e.

	msblk->block_cache = squashfs_cache_init("metadata",
		SQUASHFS_CACHED_BLKS, SQUASHFS_METADATA_SIZE);

As a rough guide to how much to increase the cache so that it holds the
entire amount of metadata in your image, you can add up the uncompressed
sizes of the inode and directory tables reported by Mksquashfs.
But there's a mystery here: I'll be very surprised if your test image
has more than 64K of metadata, which would fit into the existing 8-entry
metadata cache.

> Of course, these numbers are not a generic solution, but they are
> large enough to keep all blocks for my test.
>
> It shows much better performance.

If you've done as you said, it should have made no difference
whatsoever, unless the pushing of pages into the page cache is broken.
So there's a big mystery here.

> All blocks are cached and the number of decompressions for native
> squashfs (a.img) is almost equivalent to the nested ext3 case (b.img).
> But a.img consumes much more CPU than b.img.
> My guess is that the CPU cost is the search in the cache:
> squashfs_cache_get()
> 	for (i = 0; i < cache->entries; i++)
> 		if (cache->entry[i].block == block)
> 			break;
> As the value of cache->entries grows, the search cost grows too.

Are you seriously suggesting that a scan of a 100-entry table on a
modern CPU makes any noticeable difference?

>
> Before introducing a hash table or something to reduce the search
> cost, I think it is better to convert the squashfs cache into a
> generic system cache. The hash index will be based on the block
> number. I don't know whether it will be possible to combine it with
> the page cache, but at least it will be able to use
> kmem_cache_create() and register_shrinker().
>
> Phillip, what do you think about converting the cache system?
>

That was discussed on this list back in 2008, and there are pros and
cons to doing it. You can look at the list archives for the discussion,
so I won't repeat it here. At the moment I see this as a red herring,
because your results suggest something more fundamental is wrong. Doing
what you did above with the size of the read_page cache should not have
made any difference, and if it did, it suggests pages which *should* be
in the page cache (explicitly pushed there by the read_page() routine)
are not there.
In short, it's not a question of whether Squashfs should be using the
page cache; for the pages in question it already is.

I'll try and reproduce your results, as they're, to be frank,
significantly at variance with my previous experience. Maybe there's a
bug, or VFS changes mean the pushing of pages into the page cache isn't
working, but I cannot see where your repeated block
reading/decompression results are coming from.

Phillip
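[Editorial aside: Phillip's rough sizing guide can be turned into a
quick calculation. The table sizes below are invented; the real ones
come from the Mksquashfs output.]

```python
# How many metadata cache entries would hold all metadata uncompressed?
SQUASHFS_METADATA_SIZE = 8192          # metadata blocks are always 8K

# Invented example numbers; substitute the uncompressed inode and
# directory table sizes reported by mksquashfs.
inode_table_bytes = 180_000
directory_table_bytes = 95_000

total = inode_table_bytes + directory_table_bytes
entries_needed = -(-total // SQUASHFS_METADATA_SIZE)   # ceiling division
print(entries_needed)  # cache entries to hold all metadata uncompressed
```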
* Re: Q. cache in squashfs?
@ 2010-07-09 10:55 ` Phillip Lougher

From: Phillip Lougher @ 2010-07-09 10:55 UTC
To: J. R. Okajima; +Cc: Phillip Lougher, linux-fsdevel

Phillip Lougher wrote:
>
> [...] At the moment I see this as a red herring, because your results
> suggest something more fundamental is wrong. [...] I'll try and
> reproduce your results, as they're, to be frank, significantly at
> variance with my previous experience. Maybe there's a bug, or VFS
> changes mean the pushing of pages into the page cache isn't working,
> but I cannot see where your repeated block reading/decompression
> results are coming from.
>

You can determine which blocks are being repeatedly decompressed by
printing out the value of cache->name in squashfs_cache_get().

You should get one of "data", "fragment" and "metadata" for data blocks,
fragment blocks and metadata respectively.

This information will go a long way towards showing where the problem
lies.

Phillip
* Re: Q. cache in squashfs?
@ 2010-07-10  5:07 ` J. R. Okajima

From: J. R. Okajima @ 2010-07-10  5:07 UTC
To: Phillip Lougher; +Cc: linux-fsdevel

Phillip Lougher:
> You can determine which blocks are being repeatedly decompressed by
> printing out the value of cache->name in squashfs_cache_get().
>
> You should get one of "data", "fragment" and "metadata" for data
> blocks, fragment blocks and metadata respectively.
>
> This information will go a long way towards showing where the problem
> lies.

Here is a patch to count them, and the result.

----------------------------------------------------------------------
frag(3, 100) x -no-fragments(with, without)

O: no-fragments x inner ext3
A: frag=3   x without -no-fragments
B: frag=3   x with -no-fragments
C: frag=100 x without -no-fragments
-: frag=100 x with -no-fragments

     cat10        cache_get         read         zlib
     (sec,cpu)    (meta,frag,data)  (meta,data)  (meta,data)
----------------------------------------------------------------------
O    .06,  35%       92,  -,  41     3,  44      2, 3557
A    .09, 113%    12359, 81,  22     4,  90      6, 6474
B    .07, 104%    12369,  -, 109     3, 100      5, 3484
C    .06, 112%    12381, 80,  35     4,  53      6, 3650

- case O is b.img from my first mail, and case A is a.img.
- the "cat10" column is the result of the time command as described in
  my first mail.
- all these numbers just show the trend; the small differences don't
  mean much.
- with the -no-fragments option (case B),
  + the number of zlib calls is reduced.
  + the CPU usage is not reduced much.
  + the number of cache_get calls for data increases.
  + the number of reads for data may increase too.
- even with compressed fragments, increasing
  CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE shows similar performance
  (case C):
  + the number of zlib calls is reduced.
  + the CPU usage is not reduced much.
  + the number of cache_get calls for data may increase.
  + the number of reads for data may decrease.

I am not sure the difference in cache_get/read for data between the
cases is so meaningful. But it surely shows high CPU usage in squashfs,
and I guess it is caused by cache_get for metadata. The number of zlib
decompressions may not be related to this CPU usage much.

J. R. Okajima
* Re: Q. cache in squashfs?
@ 2010-07-10  5:08 ` J. R. Okajima

From: J. R. Okajima @ 2010-07-10  5:08 UTC
To: Phillip Lougher, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 111 bytes --]

"J. R. Okajima":
> Here is a patch to count them, and the result.

I forgot to attach the patch, sorry.

J. R. Okajima

[-- Attachment #2: a.patch.bz2 --]
[-- Type: application/x-bzip2, Size: 2075 bytes --]
* Re: Q. cache in squashfs?
@ 2010-07-11  2:48 ` Phillip Lougher

From: Phillip Lougher @ 2010-07-11  2:48 UTC
To: J. R. Okajima; +Cc: linux-fsdevel

J. R. Okajima wrote:
> O: no-fragments x inner ext3
> A: frag=3   x without -no-fragments
> B: frag=3   x with -no-fragments
> C: frag=100 x without -no-fragments
> -: frag=100 x with -no-fragments
>
>      cat10        cache_get         read         zlib
>      (sec,cpu)    (meta,frag,data)  (meta,data)  (meta,data)
> ----------------------------------------------------------------------
> O    .06,  35%       92,  -,  41     3,  44      2, 3557
> A    .09, 113%    12359, 81,  22     4,  90      6, 6474
> B    .07, 104%    12369,  -, 109     3, 100      5, 3484
> C    .06, 112%    12381, 80,  35     4,  53      6, 3650
>

OK, I've done some tests of my own, and I can report that there is no
issue with Squashfs. Squashfs on its own is performing better than ext3
on Squashfs. The reason why your tests suggest otherwise is that your
testing methodology is *broken*.

In your first case (ext3 on squashfs), only a small amount of the
overall cost is being accounted to the 'cat10' command; the bulk of the
work is being accounted to the kernel 'loop1' thread, and this isn't
showing up. In the other cases (Squashfs only) the entire cost is being
accounted to the 'cat10' command. The resulting numbers are therefore
completely bogus, and incorrectly show higher CPU usage for Squashfs.

The following should illustrate this (all tests done under kvm):

1. Squashfs native

The following sqsh.sh shell script was used:

#!/bin/sh
for i in `seq 2`; do
	mount -t squashfs /data/comp/bin.sqsh /mnt -o loop
	find /mnt -type f | xargs wc > /dev/null 2>&1
	umount /mnt
done

bin.sqsh is a copy of /usr/bin, without any fragments.
# /usr/bin/time ./sqsh.sh
5.51user 12.70system 0:18.72elapsed 97%CPU (0avgtext+0avgdata 5712maxresident)k

High CPU usage; however, this should not be surprising - in an otherwise
idle system there is no reason not to use all the CPU. A snapshot from
top while running confirms this:

top - 01:59:30 up 1:13, 2 users, load average: 0.49, 0.23, 0.10
Tasks:  58 total,  2 running,  56 sleeping,  0 stopped,  0 zombie
Cpu(s): 36.0%us, 64.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem:  2023364k total, 1342200k used, 681164k free, 127316k buffers
Swap: 0k total, 0k used, 0k free, 1134448k cached

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
 3214 root  20   0  4696 1124  552 R 97.1  0.1  0:05.33 wc

The system is running fully occupied, with 0% idle.

Note the overall elapsed time (from the time command) running Squashfs
native: 18.72 s.

2. ext3 on squashfs

The following ext3.sh shell script was used:

#!/bin/sh
for i in `seq 2`; do
	mount -t squashfs /data/comp/ext3.sqsh /mnt2 -o loop
	mount -t ext3 /mnt2/ext3.img /mnt -o loop
	find /mnt -type f | xargs wc > /dev/null 2>&1
	umount /mnt
	umount /mnt2
done

ext3.img is an ext3 fs containing /usr/bin.

# /usr/bin/time ./ext3.sh
5.70user 5.11system 0:20.28elapsed 53%CPU (0avgtext+0avgdata 5712maxresident)k
0inputs+0outputs (0major+5346minor)pagefaults 0swaps

Much lower CPU, but this is bogus. A snapshot from top shows:

top - 02:04:29 up 1:18, 2 users, load average: 0.44, 0.18, 0.10
Tasks:  61 total,  2 running,  59 sleeping,  0 stopped,  0 zombie
Cpu(s): 33.0%us, 67.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem:  2023364k total, 1637056k used, 386308k free, 143416k buffers
Swap: 0k total, 0k used, 0k free, 1410636k cached

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
 3241 root   0 -20     0    0    0 S 52.8  0.0  0:03.19 loop1
 3248 root  20   0  4696 1148  576 R 44.5  0.1  0:02.38 wc

Again the system is running fully occupied, with 0% idle.
The major difference is that 52.8% of the CPU is being accounted to the
'loop1' kernel thread, and this does not show up in the time command.

To make that clear: all of the cost of reading the loop1 file (ext3.img)
is being accounted to the loop1 kernel thread, and therefore no
decompression overhead shows up in the time command. As decompression is
the majority of the overhead of reading compressed data, it is little
wonder the CPU usage reported by time is only 53%. In fact, as the 53%
CPU figure only includes time spent in user-space and ext3 (and excludes
decompression cost), it is surprising it is so *high*. On this basis
Squashfs is using only 47% of the CPU or less to decompress the data.
Which is *good*, and a complete reversal of your bogus results.

Note the overall elapsed time (from the time command) running ext3 on
Squashfs: 20.28 s.

3. Overall conclusion

In my tests both Squashfs native and ext3 on Squashfs use 100% CPU.
However, Squashfs native is faster: 18.72 seconds versus 20.28 seconds.

Cheers

Phillip
* Re: Q. cache in squashfs?
@ 2010-07-11  5:55 ` J. R. Okajima

From: J. R. Okajima @ 2010-07-11  5:55 UTC
To: Phillip Lougher; +Cc: linux-fsdevel

Phillip Lougher:
> In your first case (ext3 on squashfs), only a small amount of the
> overall cost is being accounted to the 'cat10' command; the bulk of
> the work is being accounted to the kernel 'loop1' thread, and this
> isn't showing up. In the other cases (Squashfs only) the entire cost
> is being accounted to the 'cat10' command. The resulting numbers are
> therefore completely bogus, and incorrectly show higher CPU usage for
> Squashfs.

Ah, I forgot about the kthread. My question about CPU usage must be due
to the kthread. Also I could confirm that a sequential access pattern
like yours shows good performance. While very random access shows worse
performance for native squashfs, that reflects the positive effect of
loopback caching, as you wrote in your first reply.

Thank you very much.

J. R. Okajima
* [RFC 0/2] squashfs parallel decompression
@ 2010-07-11  9:38 ` J. R. Okajima

From: J. R. Okajima @ 2010-07-11  9:38 UTC
To: phillip; +Cc: linux-fsdevel, J. R. Okajima

While discussing the performance of squashfs, I have tried enabling
parallel decompression. On my test system, the elapsed time to read
59171 files randomly 10 times drops from 33.25 sec to 20.54 sec (of
course, CPU usage increases).

The base version is v2.6.33.

J. R. Okajima (2):
  squashfs parallel decompression, early wait_on_buffer
  squashfs parallel decompression, z_stream per cpu

 fs/squashfs/block.c          |   81 +++++++++++++++++-------------------
 fs/squashfs/cache.c          |    1 +
 fs/squashfs/dir.c            |    1 +
 fs/squashfs/export.c         |    1 +
 fs/squashfs/file.c           |    1 +
 fs/squashfs/fragment.c       |    1 +
 fs/squashfs/id.c             |    1 +
 fs/squashfs/inode.c          |    1 +
 fs/squashfs/namei.c          |    1 +
 fs/squashfs/squashfs.h       |    3 ++
 fs/squashfs/squashfs_fs_sb.h |    2 -
 fs/squashfs/super.c          |   48 +++++++++++++++++++------
 fs/squashfs/symlink.c        |    1 +
 13 files changed, 83 insertions(+), 60 deletions(-)
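[Editorial aside: the idea of the series can be sketched in userspace -
give each worker its own decompression context and let independent
blocks decompress concurrently. This is an illustration, not the kernel
patch; CPython's zlib releases the GIL while (de)compressing, so plain
threads suffice for the sketch.]

```python
# Userspace sketch: independent compressed "128K blocks", one
# decompression context per call (the kernel patch instead keeps one
# z_stream per CPU and reuses it via zlib_inflateReset).
import zlib
from concurrent.futures import ThreadPoolExecutor

BLOCK = 128 * 1024
blocks = [zlib.compress(bytes([i]) * BLOCK) for i in range(8)]

def decompress(blob):
    return zlib.decompress(blob)

with ThreadPoolExecutor(max_workers=4) as pool:
    out = list(pool.map(decompress, blocks))
```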
* Re: [RFC 0/2] squashfs parallel decompression
@ 2011-02-22 19:41 ` Phillip Susi

From: Phillip Susi @ 2011-02-22 19:41 UTC
To: J. R. Okajima; +Cc: phillip, linux-fsdevel

Did anything ever come of this? Nobody ever responded, and I don't see
that it was ever merged into Linus' tree.

On 7/11/2010 5:38 AM, J. R. Okajima wrote:
> While discussing the performance of squashfs, I have tried enabling
> parallel decompression. On my test system, the elapsed time to read
> 59171 files randomly 10 times drops from 33.25 sec to 20.54 sec (of
> course, CPU usage increases).
>
> The base version is v2.6.33.
>
> J. R. Okajima (2):
>   squashfs parallel decompression, early wait_on_buffer
>   squashfs parallel decompression, z_stream per cpu
>
> [diffstat snipped]
* Re: [RFC 0/2] squashfs parallel decompression
@ 2011-02-23  3:23 ` Phillip Lougher

From: Phillip Lougher @ 2011-02-23  3:23 UTC
To: Phillip Susi; +Cc: J. R. Okajima, linux-fsdevel

Phillip Susi wrote:
> Did anything ever come of this? Nobody ever responded, and I don't see
> that it was ever merged into Linus' tree.
>

They're on my TODO list. Unfortunately the patches are against 2.6.33
and are based on Squashfs with only gzip compression support. At the
time of posting, mainline was 2.6.36 or so, with a new compression
framework backend in Squashfs supporting gzip, lzo and (now) xz. So the
patches need a rework to fit in with this new framework.

When I get time I'll look into refactoring the patches so they can be
applied to mainline. I've been tied up with xz compression support for
the last couple of months.

Phillip
* [RFC 1/2] squashfs parallel decompression, early wait_on_buffer
  2010-07-11  5:55 ` J. R. Okajima
  2010-07-11  9:38   ` [RFC 0/2] squashfs parallel decompression J. R. Okajima
@ 2010-07-11  9:38   ` J. R. Okajima
  2010-07-11  9:38   ` [RFC 2/2] squashfs parallel decompression, z_stream per cpu J. R. Okajima
  2 siblings, 0 replies; 16+ messages in thread
From: J. R. Okajima @ 2010-07-11  9:38 UTC (permalink / raw)
  To: phillip; +Cc: linux-fsdevel, J. R. Okajima

Preparing parallel decompression.
In squashfs_read_data(), move wait_on_buffer() forward; it is the
common part of the 'if (compressed) - else' blocks.

Signed-off-by: J. R. Okajima <hooanon05@yahoo.co.jp>
---
 fs/squashfs/block.c |   17 +++++++----------
 1 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/fs/squashfs/block.c b/fs/squashfs/block.c
index 2a79603..1017b94 100644
--- a/fs/squashfs/block.c
+++ b/fs/squashfs/block.c
@@ -151,6 +151,12 @@ int squashfs_read_data(struct super_block *sb, void **buffer, u64 index,
 		}
 		ll_rw_block(READ, b - 1, bh + 1);
 	}
+	for (k = 0; k < b; k++) {
+		wait_on_buffer(bh[k]);
+		/* possible? */
+		WARN_ON(!buffer_uptodate(bh[k]));
+	}
+
 	k = 0;
 	if (compressed) {
 		int zlib_err = 0, zlib_init = 0;
@@ -169,9 +175,6 @@ int squashfs_read_data(struct super_block *sb, void **buffer, u64 index,
 			if (msblk->stream.avail_in == 0 && k < b) {
 				avail = min(bytes, msblk->devblksize - offset);
 				bytes -= avail;
-				wait_on_buffer(bh[k]);
-				if (!buffer_uptodate(bh[k]))
-					goto release_mutex;
 
 				if (avail == 0) {
 					offset = 0;
@@ -223,13 +226,7 @@ int squashfs_read_data(struct super_block *sb, void **buffer, u64 index,
 		/*
 		 * Block is uncompressed.
 		 */
-		int i, in, pg_offset = 0;
-
-		for (i = 0; i < b; i++) {
-			wait_on_buffer(bh[i]);
-			if (!buffer_uptodate(bh[i]))
-				goto block_release;
-		}
+		int in, pg_offset = 0;
 
 		for (bytes = length; k < b; k++) {
 			in = min(bytes, msblk->devblksize - offset);
-- 
1.6.6.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread
* [RFC 2/2] squashfs parallel decompression, z_stream per cpu
  2010-07-11  5:55 ` J. R. Okajima
  2010-07-11  9:38   ` [RFC 0/2] squashfs parallel decompression J. R. Okajima
  2010-07-11  9:38   ` [RFC 1/2] squashfs parallel decompression, early wait_on_buffer J. R. Okajima
@ 2010-07-11  9:38   ` J. R. Okajima
  2 siblings, 0 replies; 16+ messages in thread
From: J. R. Okajima @ 2010-07-11  9:38 UTC (permalink / raw)
  To: phillip; +Cc: linux-fsdevel, J. R. Okajima

Convert the z_stream in squashfs_sb_info into a module-global per-cpu
variable, and remove read_data_mutex from squashfs_sb_info.
Also convert the repeated zlib_inflateInit - zlib_inflate -
zlib_inflateEnd sequence into zlib_inflateReset - zlib_inflate.
zlib_inflateEnd() is not called since its current implementation is
trivial.

Signed-off-by: J. R. Okajima <hooanon05@yahoo.co.jp>
---
 fs/squashfs/block.c          |   64 +++++++++++++-----------------------
 fs/squashfs/cache.c          |    1 +
 fs/squashfs/dir.c            |    1 +
 fs/squashfs/export.c         |    1 +
 fs/squashfs/file.c           |    1 +
 fs/squashfs/fragment.c       |    1 +
 fs/squashfs/id.c             |    1 +
 fs/squashfs/inode.c          |    1 +
 fs/squashfs/namei.c          |    1 +
 fs/squashfs/squashfs.h       |    3 ++
 fs/squashfs/squashfs_fs_sb.h |    2 -
 fs/squashfs/super.c          |   48 ++++++++++++++++++-------
 fs/squashfs/symlink.c        |    1 +
 13 files changed, 76 insertions(+), 50 deletions(-)

diff --git a/fs/squashfs/block.c b/fs/squashfs/block.c
index 1017b94..ed34979 100644
--- a/fs/squashfs/block.c
+++ b/fs/squashfs/block.c
@@ -33,6 +33,7 @@
 #include <linux/string.h>
 #include <linux/buffer_head.h>
 #include <linux/zlib.h>
+#include <linux/percpu.h>

 #include "squashfs_fs.h"
 #include "squashfs_fs_sb.h"
@@ -159,20 +160,28 @@ int squashfs_read_data(struct super_block *sb, void **buffer, u64 index,

 	k = 0;
 	if (compressed) {
-		int zlib_err = 0, zlib_init = 0;
+		int zlib_err = 0;
+		z_stream *z;

 		/*
 		 * Uncompress block.
 		 */
+		/* it disables preemption */
+		z = &get_cpu_var(squashfs_zstream);
+		zlib_err = zlib_inflateReset(z);
+		if (zlib_err != Z_OK) {
+			put_cpu_var(squashfs_zstream);
+			ERROR("zlib_inflateReset returned"
+			      " unexpected result 0x%x, srclength %d\n",
+			      zlib_err, srclength);
+			goto block_release;
+		}

-		mutex_lock(&msblk->read_data_mutex);
-
-		msblk->stream.avail_out = 0;
-		msblk->stream.avail_in = 0;
-
+		z->avail_out = 0;
+		z->avail_in = 0;
 		bytes = length;
 		do {
-			if (msblk->stream.avail_in == 0 && k < b) {
+			if (z->avail_in == 0 && k < b) {
 				avail = min(bytes, msblk->devblksize - offset);
 				bytes -= avail;
@@ -182,46 +191,30 @@ int squashfs_read_data(struct super_block *sb, void **buffer, u64 index,
 					continue;
 				}

-				msblk->stream.next_in = bh[k]->b_data + offset;
-				msblk->stream.avail_in = avail;
+				z->next_in = bh[k]->b_data + offset;
+				z->avail_in = avail;
 				offset = 0;
 			}

-			if (msblk->stream.avail_out == 0 && page < pages) {
-				msblk->stream.next_out = buffer[page++];
-				msblk->stream.avail_out = PAGE_CACHE_SIZE;
+			if (z->avail_out == 0 && page < pages) {
+				z->next_out = buffer[page++];
+				z->avail_out = PAGE_CACHE_SIZE;
 			}

-			if (!zlib_init) {
-				zlib_err = zlib_inflateInit(&msblk->stream);
-				if (zlib_err != Z_OK) {
-					ERROR("zlib_inflateInit returned"
-						" unexpected result 0x%x,"
-						" srclength %d\n", zlib_err,
-						srclength);
-					goto release_mutex;
-				}
-				zlib_init = 1;
-			}
+			zlib_err = zlib_inflate(z, Z_SYNC_FLUSH);

-			zlib_err = zlib_inflate(&msblk->stream, Z_SYNC_FLUSH);
-
-			if (msblk->stream.avail_in == 0 && k < b)
+			if (z->avail_in == 0 && k < b)
 				put_bh(bh[k++]);
 		} while (zlib_err == Z_OK);

 		if (zlib_err != Z_STREAM_END) {
+			put_cpu_var(squashfs_zstream);
 			ERROR("zlib_inflate error, data probably corrupt\n");
-			goto release_mutex;
+			goto block_release;
 		}

-		zlib_err = zlib_inflateEnd(&msblk->stream);
-		if (zlib_err != Z_OK) {
-			ERROR("zlib_inflate error, data probably corrupt\n");
-			goto release_mutex;
-		}
-		length = msblk->stream.total_out;
-		mutex_unlock(&msblk->read_data_mutex);
+		length = z->total_out;
+		put_cpu_var(squashfs_zstream);
 	} else {
 		/*
 		 * Block is uncompressed.
@@ -252,9 +245,6 @@ int squashfs_read_data(struct super_block *sb, void **buffer, u64 index,
 	kfree(bh);
 	return length;

-release_mutex:
-	mutex_unlock(&msblk->read_data_mutex);
-
 block_release:
 	for (; k < b; k++)
 		put_bh(bh[k]);
diff --git a/fs/squashfs/cache.c b/fs/squashfs/cache.c
index 40c98fa..d90814f 100644
--- a/fs/squashfs/cache.c
+++ b/fs/squashfs/cache.c
@@ -53,6 +53,7 @@
 #include <linux/wait.h>
 #include <linux/zlib.h>
 #include <linux/pagemap.h>
+#include <linux/percpu.h>

 #include "squashfs_fs.h"
 #include "squashfs_fs_sb.h"
diff --git a/fs/squashfs/dir.c b/fs/squashfs/dir.c
index 566b0ea..3d2a632 100644
--- a/fs/squashfs/dir.c
+++ b/fs/squashfs/dir.c
@@ -31,6 +31,7 @@
 #include <linux/vfs.h>
 #include <linux/slab.h>
 #include <linux/zlib.h>
+#include <linux/percpu.h>

 #include "squashfs_fs.h"
 #include "squashfs_fs_sb.h"
diff --git a/fs/squashfs/export.c b/fs/squashfs/export.c
index 2b1b8fe..9b33473 100644
--- a/fs/squashfs/export.c
+++ b/fs/squashfs/export.c
@@ -41,6 +41,7 @@
 #include <linux/exportfs.h>
 #include <linux/zlib.h>
 #include <linux/slab.h>
+#include <linux/percpu.h>

 #include "squashfs_fs.h"
 #include "squashfs_fs_sb.h"
diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
index 717767d..f9728f3 100644
--- a/fs/squashfs/file.c
+++ b/fs/squashfs/file.c
@@ -48,6 +48,7 @@
 #include <linux/pagemap.h>
 #include <linux/mutex.h>
 #include <linux/zlib.h>
+#include <linux/percpu.h>

 #include "squashfs_fs.h"
 #include "squashfs_fs_sb.h"
diff --git a/fs/squashfs/fragment.c b/fs/squashfs/fragment.c
index b5a2c15..fd6f776 100644
--- a/fs/squashfs/fragment.c
+++ b/fs/squashfs/fragment.c
@@ -37,6 +37,7 @@
 #include <linux/vfs.h>
 #include <linux/slab.h>
 #include <linux/zlib.h>
+#include <linux/percpu.h>

 #include "squashfs_fs.h"
 #include "squashfs_fs_sb.h"
diff --git a/fs/squashfs/id.c b/fs/squashfs/id.c
index 3795b83..cea22c1 100644
--- a/fs/squashfs/id.c
+++ b/fs/squashfs/id.c
@@ -35,6 +35,7 @@
 #include <linux/vfs.h>
 #include <linux/slab.h>
 #include <linux/zlib.h>
+#include <linux/percpu.h>

 #include "squashfs_fs.h"
 #include "squashfs_fs_sb.h"
diff --git a/fs/squashfs/inode.c b/fs/squashfs/inode.c
index 9101dbd..6977d34 100644
--- a/fs/squashfs/inode.c
+++ b/fs/squashfs/inode.c
@@ -41,6 +41,7 @@
 #include <linux/fs.h>
 #include <linux/vfs.h>
 #include <linux/zlib.h>
+#include <linux/percpu.h>

 #include "squashfs_fs.h"
 #include "squashfs_fs_sb.h"
diff --git a/fs/squashfs/namei.c b/fs/squashfs/namei.c
index 9e39865..3a6e51f 100644
--- a/fs/squashfs/namei.c
+++ b/fs/squashfs/namei.c
@@ -58,6 +58,7 @@
 #include <linux/string.h>
 #include <linux/dcache.h>
 #include <linux/zlib.h>
+#include <linux/percpu.h>

 #include "squashfs_fs.h"
 #include "squashfs_fs_sb.h"
diff --git a/fs/squashfs/squashfs.h b/fs/squashfs/squashfs.h
index 0e9feb6..0ff530e 100644
--- a/fs/squashfs/squashfs.h
+++ b/fs/squashfs/squashfs.h
@@ -70,6 +70,9 @@ extern struct inode *squashfs_iget(struct super_block *, long long,
 				unsigned int);
 extern int squashfs_read_inode(struct inode *, long long);

+/* super.c */
+DECLARE_PER_CPU(z_stream, squashfs_zstream);
+
 /*
  * Inodes and files operations
  */
diff --git a/fs/squashfs/squashfs_fs_sb.h b/fs/squashfs/squashfs_fs_sb.h
index c8c6561..95489cc 100644
--- a/fs/squashfs/squashfs_fs_sb.h
+++ b/fs/squashfs/squashfs_fs_sb.h
@@ -61,10 +61,8 @@ struct squashfs_sb_info {
 	__le64			*id_table;
 	__le64			*fragment_index;
 	unsigned int		*fragment_index_2;
-	struct mutex		read_data_mutex;
 	struct mutex		meta_index_mutex;
 	struct meta_index	*meta_index;
-	z_stream		stream;
 	__le64			*inode_lookup_table;
 	u64			inode_table;
 	u64			directory_table;
diff --git a/fs/squashfs/super.c b/fs/squashfs/super.c
index 6c197ef..e78ffbf 100644
--- a/fs/squashfs/super.c
+++ b/fs/squashfs/super.c
@@ -37,6 +37,7 @@
 #include <linux/module.h>
 #include <linux/zlib.h>
 #include <linux/magic.h>
+#include <linux/percpu.h>

 #include "squashfs_fs.h"
 #include "squashfs_fs_sb.h"
@@ -87,13 +88,6 @@ static int squashfs_fill_super(struct super_block *sb, void *data, int silent)
 	}
 	msblk = sb->s_fs_info;

-	msblk->stream.workspace = kmalloc(zlib_inflate_workspacesize(),
-		GFP_KERNEL);
-	if (msblk->stream.workspace == NULL) {
-		ERROR("Failed to allocate zlib workspace\n");
-		goto failure;
-	}
-
 	sblk = kzalloc(sizeof(*sblk), GFP_KERNEL);
 	if (sblk == NULL) {
 		ERROR("Failed to allocate squashfs_super_block\n");
@@ -103,7 +97,6 @@ static int squashfs_fill_super(struct super_block *sb, void *data, int silent)
 	msblk->devblksize = sb_min_blocksize(sb, BLOCK_SIZE);
 	msblk->devblksize_log2 = ffz(~msblk->devblksize);

-	mutex_init(&msblk->read_data_mutex);
 	mutex_init(&msblk->meta_index_mutex);

 	/*
@@ -295,14 +288,12 @@ failed_mount:
 	kfree(msblk->inode_lookup_table);
 	kfree(msblk->fragment_index);
 	kfree(msblk->id_table);
-	kfree(msblk->stream.workspace);
 	kfree(sb->s_fs_info);
 	sb->s_fs_info = NULL;
 	kfree(sblk);
 	return err;

 failure:
-	kfree(msblk->stream.workspace);
 	kfree(sb->s_fs_info);
 	sb->s_fs_info = NULL;
 	return -ENOMEM;
@@ -349,7 +340,6 @@ static void squashfs_put_super(struct super_block *sb)
 		kfree(sbi->id_table);
 		kfree(sbi->fragment_index);
 		kfree(sbi->meta_index);
-		kfree(sbi->stream.workspace);
 		kfree(sb->s_fs_info);
 		sb->s_fs_info = NULL;
 	}
@@ -394,14 +384,43 @@ static void destroy_inodecache(void)
 }


+DEFINE_PER_CPU(z_stream, squashfs_zstream);
 static int __init init_squashfs_fs(void)
 {
 	int err = init_inodecache();
+	int cpu, sz;
+	z_stream *z;

 	if (err)
 		return err;

+	err = -ENOMEM;
+	z = NULL; /* suppress gcc warning */
+	sz = zlib_inflate_workspacesize();
+	for_each_online_cpu(cpu) {
+		z = &per_cpu(squashfs_zstream, cpu);
+		z->workspace = kmalloc(sz, GFP_KERNEL);
+		if (!z->workspace) {
+			ERROR("Failed to allocate zlib workspace\n");
+			break;
+		}
+		err = zlib_inflateInit(z);
+		if (err == Z_MEM_ERROR) {
+			err = -ENOMEM;
+			ERROR("Failed to initialize zlib\n");
+			break;
+		}
+	}
+	if (err) {
+		for_each_online_cpu(cpu) {
+			z = &per_cpu(squashfs_zstream, cpu);
+			kfree(z->workspace);
+		}
+		goto out_err;
+	}
+
 	err = register_filesystem(&squashfs_fs_type);
+out_err:
 	if (err) {
 		destroy_inodecache();
 		return err;
@@ -416,7 +435,14 @@ static int __init init_squashfs_fs(void)

 static void __exit exit_squashfs_fs(void)
 {
+	int cpu;
+	z_stream *z;
+
 	unregister_filesystem(&squashfs_fs_type);
+	for_each_online_cpu(cpu) {
+		z = &per_cpu(squashfs_zstream, cpu);
+		kfree(z->workspace);
+	}
 	destroy_inodecache();
 }
diff --git a/fs/squashfs/symlink.c b/fs/squashfs/symlink.c
index 83d8788..133776b 100644
--- a/fs/squashfs/symlink.c
+++ b/fs/squashfs/symlink.c
@@ -37,6 +37,7 @@
 #include <linux/string.h>
 #include <linux/pagemap.h>
 #include <linux/zlib.h>
+#include <linux/percpu.h>

 #include "squashfs_fs.h"
 #include "squashfs_fs_sb.h"
-- 
1.6.6.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread
* Re: Q. cache in squashfs?
  2010-07-09 10:32 ` Phillip Lougher
  2010-07-09 10:55   ` Phillip Lougher
@ 2010-07-09 12:24   ` J. R. Okajima
  1 sibling, 0 replies; 16+ messages in thread
From: J. R. Okajima @ 2010-07-09 12:24 UTC (permalink / raw)
  To: Phillip Lougher; +Cc: linux-fsdevel

Phillip Lougher:
> > The -no-fragments shows better performance, but it is very small.
> > It doesn't seem that the number of fragment blocks is large on my test
> > environment.
>
> That is *very* surprising.  How many fragments do you have?

Actually -no-fragments reduced the number of zlib_inflate() calls, as
expected.  But the performance didn't improve much, particularly CPU
usage.  So I removed the -no-fragments option again.  This is what I
forgot to write in my mail.  I hope that solves one of your big
mysteries.

$ sq4.0.wcvs/squashfs/squashfs-tools/mksquashfs /bin /tmp/a.img -no-progress -noappend -keep-as-directory -comp gzip
Parallel mksquashfs: Using 2 processors
Creating 4.0 filesystem on /tmp/a.img, block size 131072.

Exportable Squashfs 4.0 filesystem, gzip compressed, data block size 131072
	compressed data, compressed metadata, compressed fragments
	duplicates are removed
Filesystem size 2236.52 Kbytes (2.18 Mbytes)
	47.19% of uncompressed filesystem size (4739.02 Kbytes)
Inode table size 1210 bytes (1.18 Kbytes)
	36.87% of uncompressed inode table size (3282 bytes)
Directory table size 851 bytes (0.83 Kbytes)
	63.70% of uncompressed directory table size (1336 bytes)
Number of duplicate files found 1
Number of inodes 98
Number of files 84
Number of fragments 28
Number of symbolic links 12
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 2
Number of ids (unique uids + gids) 2
Number of uids 2
	root (0)
	jro (1000)
Number of gids 2
	root (0)
	jro (1000)

> It is fragments and metadata blocks which show the potential for
> repeated re-reading on random access patterns.

Ok, then I'd focus on metadata.
Increasing SQUASHFS_CACHED_BLKS to (8<<10) didn't help the performance
in my case.

Here is my thought.
squashfs_read_metadata() is called very many times, from (every?)
lookup or file read.  In squashfs_cache_get(), the search loop runs
every time with a spinlock held.  That is why I think the search is the
CPU eater.  "100" is not a problem.

J. R. Okajima

^ permalink raw reply	[flat|nested] 16+ messages in thread
end of thread, other threads:[~2011-02-23  3:42 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-06-24  2:37 Q. cache in squashfs? J. R. Okajima
2010-07-08  3:57 ` Phillip Lougher
2010-07-08  6:08   ` J. R. Okajima
2010-07-09  7:53     ` J. R. Okajima
2010-07-09 10:32       ` Phillip Lougher
2010-07-09 10:55         ` Phillip Lougher
2010-07-10  5:07           ` J. R. Okajima
2010-07-10  5:08             ` J. R. Okajima
2010-07-11  2:48               ` Phillip Lougher
2010-07-11  5:55                 ` J. R. Okajima
2010-07-11  9:38                   ` [RFC 0/2] squashfs parallel decompression J. R. Okajima
2011-02-22 19:41                     ` Phillip Susi
2011-02-23  3:23                       ` Phillip Lougher
2010-07-11  9:38                   ` [RFC 1/2] squashfs parallel decompression, early wait_on_buffer J. R. Okajima
2010-07-11  9:38                   ` [RFC 2/2] squashfs parallel decompression, z_stream per cpu J. R. Okajima
2010-07-09 12:24   ` Q. cache in squashfs? J. R. Okajima