Subject: Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
From: Qu Wenruo
To: John Ettedgui, Austin S Hemmelgarn
CC: btrfs
Date: Mon, 18 Jul 2016 09:13:57 +0800
Message-ID: <5cc93522-1bd2-bdc1-d5da-a11d5e4816a7@cn.fujitsu.com>
References: <55BB0A39.1050208@cn.fujitsu.com> <55C017E7.40704@cn.fujitsu.com>
 <55C02AF9.3070600@cn.fujitsu.com> <55C0A1ED.6020407@gmail.com>
 <55C1F3DD.7020603@gmail.com> <126f9f09-4e28-3c12-5384-63032e17942f@cn.fujitsu.com>

At 07/16/2016 07:17 PM, John Ettedgui wrote:
> On Thu, Jul 14, 2016 at 10:54 PM John Ettedgui wrote:
> > On Thu, Jul 14, 2016 at 10:26 PM Qu Wenruo wrote:
> > > > Would increasing the leaf size help as well?
> > > > nodatacow seems unsafe
> > >
> > > Nodatacow is not that unsafe, as btrfs will still do data cow if
> > > it's needed, like rewriting data of another subvolume/snapshot.
> >
> > Alright.
> >
> > > That would be one of the most obvious methods if you do a lot of
> > > rewrites.
> >
> > > > as for defrag, all my partitions are already on autodefrag, so
> > > > I assume that should be good. Or is a manual one once in a
> > > > while a good idea as well?
> > >
> > > AFAIK autodefrag will only help if you're doing appending writes.
> > >
> > > A manual one will help more, but since btrfs has problems
> > > defragging extents shared by different subvolumes, I doubt the
> > > effect if you have a lot of subvolumes/snapshots.
> >
> > I don't have any subvolume/snapshot for the big partitions, my
> > usage there is fairly simple. I'll have to add a regular defrag
> > job then.
> >
> > > Another method is to disable compression.
> > > For compression, the file extent size upper limit is 128K, while
> > > for the non-compressed case it's 128M.
> > >
> > > So the same 1G file would take 8K extents using compression, but
> > > only 8 extents without compression.
> >
> > Now that might be something important, I do use LZO compression on
> > all of them.
> > Does this limit apply only to compressed files, or to any file if
> > the fs is mounted with the compression option?
> > Would mounting these partitions without the compression option and
> > then defragmenting them reverse the compression?
>
> I've tried this for the slowest-to-mount partition.
> I changed its mount option to compression=no, then ran defrag and
> balance. Not sure if the latter was needed, but I thought I'd try...
> in the past it worked fine up to dusage=99, but with 100% I get a
> crash, oh well.
> The result of defrag + nocompress (I don't know how much it actually
> decompressed, and if it changed the limit Qu mentioned before) is
> about 26% less time spent mounting the partition, and it's no longer
> my slowest partition to mount!

Well, compression=no only affects writes done after mounting with that
option. And balance won't help to convert compressed extents to
non-compressed ones.

But maybe the defrag converted them to normal extents.

The best method to de-compress them is to read the files out and
rewrite them while mounted with compression=no (a rough sketch of that
follows below).

> I'll try just defragmenting another partition but keeping the
> compression on, and see what difference the same changes make there.
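If you want to make sure the old extents really get de-compressed (a
defrag may not rewrite everything), the read-and-rewrite method could
look like the script below. This is only an untested sketch, the mount
point is made up, so please try it on unimportant data first:
------
#!/bin/bash
# Rough sketch only, not tested on real data -- back up first.
# Rewrites every regular file under $mnt so its data is written as
# new, non-compressed extents. Assumes $mnt is already mounted with
# -o compress=no and nothing else is writing to these files.
# (Files carrying a per-file compression property may still be
# compressed when rewritten.)
mnt=/mnt/btrfs          # example mount point, adjust to yours

find "$mnt" -xdev -type f -print0 |
while IFS= read -r -d '' f; do
        tmp="$f.rewrite.tmp"
        cp -a -- "$f" "$tmp" &&     # clone metadata (owner, perms)
        cat -- "$f" > "$tmp" &&     # really read and rewrite the data
        mv -- "$tmp" "$f"
done
------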
> I've tried the patch, which applied fine to my kernel (4.6.4), but I
> don't see any difference in mounting time. Maybe I made a mistake,
> or my issue is not really the same?

It's quite possible that another problem is causing the slow mount.

The best method to verify it is to run ftrace on the btrfs mount.
Here is the script I used to test my patch:
------
#!/bin/bash

trace_dir=/sys/kernel/debug/tracing

init_trace () {
        echo 0 > $trace_dir/tracing_on
        echo > $trace_dir/trace
        echo function_graph > $trace_dir/current_tracer
        echo > $trace_dir/set_ftrace_filter
        echo open_ctree >> $trace_dir/set_ftrace_filter
        echo btrfs_read_chunk_tree >> $trace_dir/set_ftrace_filter
        echo btrfs_read_block_groups >> $trace_dir/set_ftrace_filter

        # This will generate tons of trace, better to comment it out
        echo find_block_group >> $trace_dir/set_ftrace_filter

        echo 1 > $trace_dir/tracing_on
}

end_trace () {
        cp $trace_dir/trace $(dirname $0)
        echo 0 > $trace_dir/tracing_on
        echo > $trace_dir/set_ftrace_filter
        echo > $trace_dir/trace
}

init_trace
echo start mounting
time mount /dev/sdb /mnt/test
echo mount done
end_trace
------

After executing the script, you will get a file named "trace" in the
same directory as the script. Its content will look like:
------
# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 1) $ 7670856 us  |  open_ctree [btrfs]();
 2) * 13533.45 us |  btrfs_read_chunk_tree [btrfs]();
 2) # 1320.981 us |  btrfs_init_space_info [btrfs]();
 2)               |  btrfs_read_block_groups [btrfs]() {
 2) * 10127.35 us |    find_block_group [btrfs]();
 2)   4.951 us    |    find_block_group [btrfs]();
 2) * 26225.17 us |    find_block_group [btrfs]();
......
 3) * 26450.28 us |    find_block_group [btrfs]();
 3) * 11590.29 us |    find_block_group [btrfs]();
 3) $ 7557210 us  |  } /* btrfs_read_block_groups [btrfs] */   <<<
------

Here you can see that open_ctree(), the main part of the btrfs mount,
takes about 7.67 seconds to execute, while btrfs_read_block_groups()
takes 7.56 seconds, about 98.5% of the open_ctree() execution time.

If your result is much the same as mine, then it's the same problem.
And after applying my patch, please compare the execution time of
btrfs_read_block_groups() before and after, to see if there is any
obvious (>5%) change. (A small helper to pull the two numbers out of
the trace file is sketched below, after my signature.)

Thanks,
Qu

> Thank you,
> John
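As mentioned above, here is a rough, untested sketch of such a helper.
It assumes the function_graph output format shown in my example and
that the trace was saved as "trace" (or pass another path as $1):
------
#!/bin/bash
# Rough sketch: print how much of open_ctree()'s run time is spent in
# btrfs_read_block_groups(), from a function_graph trace file.
trace=${1:-trace}

awk '
# The duration is the numeric field right before "us"; the extra
# markers ("$", "*", "#", ...) may or may not be present, so search
# for it instead of using a fixed column.
function dur(    i) {
        for (i = 2; i <= NF; i++)
                if ($i == "us")
                        return $(i - 1)
        return 0
}
/open_ctree \[btrfs\]/ && !/read_block_groups/ { oc = dur() }
/btrfs_read_block_groups \[btrfs\]/            { bg = dur() }
END {
        if (oc > 0 && bg > 0)
                printf "open_ctree: %s us, btrfs_read_block_groups: %s us (%.1f%%)\n",
                       oc, bg, 100 * bg / oc
}' "$trace"
------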