All of lore.kernel.org
 help / color / mirror / Atom feed
* btrfs mount takes too long time
@ 2021-08-28 11:58 Jingyun He
  2021-08-28 23:47 ` Anand Jain
  0 siblings, 1 reply; 16+ messages in thread
From: Jingyun He @ 2021-08-28 11:58 UTC (permalink / raw)
  To: linux-btrfs

Hello, all
I'm new to btrfs, I have a HM-SMR 14TB disk, I have formatted it to
btrfs to store the files.

When the device is almost full, it needs about 5 mins to mount the device.

Is it normal? is there any mount option that I can use to reduce the mount time?

Thank you.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: btrfs mount takes too long time
  2021-08-28 11:58 btrfs mount takes too long time Jingyun He
@ 2021-08-28 23:47 ` Anand Jain
       [not found]   ` <CAHQ7scVkHp8Lcfxx2QZXv2ghkW-nYpiFGntyZa0Toz2hU3S-tQ@mail.gmail.com>
  0 siblings, 1 reply; 16+ messages in thread
From: Anand Jain @ 2021-08-28 23:47 UTC (permalink / raw)
  To: Jingyun He, linux-btrfs

On 28/08/2021 19:58, Jingyun He wrote:
> Hello, all
> I'm new to btrfs, I have a HM-SMR 14TB disk, I have formatted it to
> btrfs to store the files.
> 
> When the device is almost full, it needs about 5 mins to mount the device.
> 
> Is it normal? is there any mount option that I can use to reduce the mount time?
> 


  We need to figure out the function taking a longer time (maybe it is
  read-block-groups). I have similar reports on the non zoned device
  as well (with a few TB full of data). But there is no good data yet
  to analyse.

  Could you please collect the trace data from the ftracegraph
  from here [1] (It needs trace-cmd).

  [1] https://github.com/asj/btrfstrace.git

  Run it as in the example below:

  umount /btrfs;

  ./ftracegraph open_ctree 3 "*:mod:btrfs" "mount /dev/vg/scratch0 /btrfs"

  cat /tmp/ftracegraph.out


Thanks, Anand



> Thank you.
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: btrfs mount takes too long time
       [not found]   ` <CAHQ7scVkHp8Lcfxx2QZXv2ghkW-nYpiFGntyZa0Toz2hU3S-tQ@mail.gmail.com>
@ 2021-08-30  1:27     ` Anand Jain
       [not found]       ` <CAHQ7scUiNLVD_4---ArBet-0DqzfmmH5Y9JgQY0grYrUv8yhiQ@mail.gmail.com>
  0 siblings, 1 reply; 16+ messages in thread
From: Anand Jain @ 2021-08-30  1:27 UTC (permalink / raw)
  To: Jingyun He; +Cc: linux-btrfs


Our open_ctree() took around ~223secs (~3.7mins)

  1) $ 223375750 us |  } /* open_ctree [btrfs] */

Unfortunately, the default trace buffer per CPU (4K) wasn't sufficient
and, the trace-buffer rolled over.
So we still don't know how long we spent in btrfs_read_block_groups().
Sorry for my mistake we should go 1 step at a time and, we have to do
this until we narrow it down to a specific function.

Could you please run with the depth = 2 (instead of 3) and use the time
command prefix. Also, pull a new ftracegraph as I have updated it to
display a proper time output.

$ ftracegraph open_ctree 2 "*:mod:btrfs" "time mount /dev/vg/scratch0 
/btrfs"

Thanks, Anand

On 29/08/2021 17:42, Jingyun He wrote:
> Hi, Anand
> 
> I have attached the file.
> Could you kindly check this?
> 
> Thank you.
> 
> On Sun, Aug 29, 2021 at 7:47 AM Anand Jain <anand.jain@oracle.com> wrote:
>>
>> On 28/08/2021 19:58, Jingyun He wrote:
>>> Hello, all
>>> I'm new to btrfs, I have a HM-SMR 14TB disk, I have formatted it to
>>> btrfs to store the files.
>>>
>>> When the device is almost full, it needs about 5 mins to mount the device.
>>>
>>> Is it normal? is there any mount option that I can use to reduce the mount time?
>>>
>>
>>
>>    We need to figure out the function taking a longer time (maybe it is
>>    read-block-groups). I have similar reports on the non zoned device
>>    as well (with a few TB full of data). But there is no good data yet
>>    to analyse.
>>
>>    Could you please collect the trace data from the ftracegraph
>>    from here [1] (It needs trace-cmd).
>>
>>    [1] https://github.com/asj/btrfstrace.git
>>
>>    Run it as in the example below:
>>
>>    umount /btrfs;
>>
>>    ./ftracegraph open_ctree 3 "*:mod:btrfs" "mount /dev/vg/scratch0 /btrfs"
>>
>>    cat /tmp/ftracegraph.out
>>
>>
>> Thanks, Anand
>>
>>
>>
>>> Thank you.
>>>
>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: btrfs mount takes too long time
       [not found]       ` <CAHQ7scUiNLVD_4---ArBet-0DqzfmmH5Y9JgQY0grYrUv8yhiQ@mail.gmail.com>
@ 2021-08-30 13:04         ` Anand Jain
  2021-08-30 13:04         ` Anand Jain
  1 sibling, 0 replies; 16+ messages in thread
From: Anand Jain @ 2021-08-30 13:04 UTC (permalink / raw)
  To: Jingyun He; +Cc: linux-btrfs

open_ctree() took 228254398 us. And 98% of it that is 225418272 us
was taken by btrfs_read_block_groups().

-------------------
  1) $ 225418272 us | } /* btrfs_read_block_groups [btrfs] */
  1) * 16934.96 us | btrfs_check_rw_degradable [btrfs]();
  0) 0.967 us | btrfs_apply_pending_changes [btrfs]();
  0) 0.239 us | btrfs_read_qgroup_config [btrfs]();
  0) * 21017.34 us | btrfs_get_root_ref [btrfs]();
  0) + 15.717 us | btrfs_start_pre_rw_mount [btrfs]();
  0) 0.865 us | btrfs_discard_resume [btrfs]();
  0) $ 228254398 us | } /* open_ctree [btrfs] */
-------------------

Now we need to run the same thing on btrfs_read_block_groups(),
could you please run.. [1] (no need of the time).

[1]
   $ umount /btrfs;
   $./ftracegraph btrfs_read_block_groups 3 "*:mod:btrfs" "mount 
/dev/vg/scratch0 /btrfs"

Thanks, Anand


On 30/08/2021 14:44, Jingyun He wrote:
> Hi, Anand,
> I have attached the new result.
> Kindly check.
> 
> Thank you.
> 
> On Mon, Aug 30, 2021 at 9:27 AM Anand Jain <anand.jain@oracle.com> wrote:
>>
>>
>> Our open_ctree() took around ~223secs (~3.7mins)
>>
>>    1) $ 223375750 us |  } /* open_ctree [btrfs] */
>>
>> Unfortunately, the default trace buffer per CPU (4K) wasn't sufficient
>> and, the trace-buffer rolled over.
>> So we still don't know how long we spent in btrfs_read_block_groups().
>> Sorry for my mistake we should go 1 step at a time and, we have to do
>> this until we narrow it down to a specific function.
>>
>> Could you please run with the depth = 2 (instead of 3) and use the time
>> command prefix. Also, pull a new ftracegraph as I have updated it to
>> display a proper time output.
>>
>> $ ftracegraph open_ctree 2 "*:mod:btrfs" "time mount /dev/vg/scratch0
>> /btrfs"
>>
>> Thanks, Anand
>>
>> On 29/08/2021 17:42, Jingyun He wrote:
>>> Hi, Anand
>>>
>>> I have attached the file.
>>> Could you kindly check this?
>>>
>>> Thank you.
>>>
>>> On Sun, Aug 29, 2021 at 7:47 AM Anand Jain <anand.jain@oracle.com> wrote:
>>>>
>>>> On 28/08/2021 19:58, Jingyun He wrote:
>>>>> Hello, all
>>>>> I'm new to btrfs, I have a HM-SMR 14TB disk, I have formatted it to
>>>>> btrfs to store the files.
>>>>>
>>>>> When the device is almost full, it needs about 5 mins to mount the device.
>>>>>
>>>>> Is it normal? is there any mount option that I can use to reduce the mount time?
>>>>>
>>>>
>>>>
>>>>     We need to figure out the function taking a longer time (maybe it is
>>>>     read-block-groups). I have similar reports on the non zoned device
>>>>     as well (with a few TB full of data). But there is no good data yet
>>>>     to analyse.
>>>>
>>>>     Could you please collect the trace data from the ftracegraph
>>>>     from here [1] (It needs trace-cmd).
>>>>
>>>>     [1] https://github.com/asj/btrfstrace.git
>>>>
>>>>     Run it as in the example below:
>>>>
>>>>     umount /btrfs;
>>>>
>>>>     ./ftracegraph open_ctree 3 "*:mod:btrfs" "mount /dev/vg/scratch0 /btrfs"
>>>>
>>>>     cat /tmp/ftracegraph.out
>>>>
>>>>
>>>> Thanks, Anand
>>>>
>>>>
>>>>
>>>>> Thank you.
>>>>>
>>>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: btrfs mount takes too long time
       [not found]       ` <CAHQ7scUiNLVD_4---ArBet-0DqzfmmH5Y9JgQY0grYrUv8yhiQ@mail.gmail.com>
  2021-08-30 13:04         ` Anand Jain
@ 2021-08-30 13:04         ` Anand Jain
       [not found]           ` <CAHQ7scVOYuzz58KcO_fo2rph44CCC46ef=LFVZF8XzoKYq27ig@mail.gmail.com>
  2021-09-01 16:11           ` Dāvis Mosāns
  1 sibling, 2 replies; 16+ messages in thread
From: Anand Jain @ 2021-08-30 13:04 UTC (permalink / raw)
  To: Jingyun He; +Cc: linux-btrfs

open_ctree() took 228254398 us. And 98% of it that is 225418272 us
was taken by btrfs_read_block_groups().

-------------------
  1) $ 225418272 us | } /* btrfs_read_block_groups [btrfs] */
  1) * 16934.96 us | btrfs_check_rw_degradable [btrfs]();
  0) 0.967 us | btrfs_apply_pending_changes [btrfs]();
  0) 0.239 us | btrfs_read_qgroup_config [btrfs]();
  0) * 21017.34 us | btrfs_get_root_ref [btrfs]();
  0) + 15.717 us | btrfs_start_pre_rw_mount [btrfs]();
  0) 0.865 us | btrfs_discard_resume [btrfs]();
  0) $ 228254398 us | } /* open_ctree [btrfs] */
-------------------

Now we need to run the same thing on btrfs_read_block_groups(),
could you please run.. [1] (no need of the time).

[1]
   $ umount /btrfs;
   $./ftracegraph btrfs_read_block_groups 2 "*:mod:btrfs" "mount 
/dev/vg/scratch0 /btrfs"

Thanks, Anand


On 30/08/2021 14:44, Jingyun He wrote:
> Hi, Anand,
> I have attached the new result.
> Kindly check.
> 
> Thank you.
> 
> On Mon, Aug 30, 2021 at 9:27 AM Anand Jain <anand.jain@oracle.com> wrote:
>>
>>
>> Our open_ctree() took around ~223secs (~3.7mins)
>>
>>    1) $ 223375750 us |  } /* open_ctree [btrfs] */
>>
>> Unfortunately, the default trace buffer per CPU (4K) wasn't sufficient
>> and, the trace-buffer rolled over.
>> So we still don't know how long we spent in btrfs_read_block_groups().
>> Sorry for my mistake we should go 1 step at a time and, we have to do
>> this until we narrow it down to a specific function.
>>
>> Could you please run with the depth = 2 (instead of 3) and use the time
>> command prefix. Also, pull a new ftracegraph as I have updated it to
>> display a proper time output.
>>
>> $ ftracegraph open_ctree 2 "*:mod:btrfs" "time mount /dev/vg/scratch0
>> /btrfs"
>>
>> Thanks, Anand
>>
>> On 29/08/2021 17:42, Jingyun He wrote:
>>> Hi, Anand
>>>
>>> I have attached the file.
>>> Could you kindly check this?
>>>
>>> Thank you.
>>>
>>> On Sun, Aug 29, 2021 at 7:47 AM Anand Jain <anand.jain@oracle.com> wrote:
>>>>
>>>> On 28/08/2021 19:58, Jingyun He wrote:
>>>>> Hello, all
>>>>> I'm new to btrfs, I have a HM-SMR 14TB disk, I have formatted it to
>>>>> btrfs to store the files.
>>>>>
>>>>> When the device is almost full, it needs about 5 mins to mount the device.
>>>>>
>>>>> Is it normal? is there any mount option that I can use to reduce the mount time?
>>>>>
>>>>
>>>>
>>>>     We need to figure out the function taking a longer time (maybe it is
>>>>     read-block-groups). I have similar reports on the non zoned device
>>>>     as well (with a few TB full of data). But there is no good data yet
>>>>     to analyse.
>>>>
>>>>     Could you please collect the trace data from the ftracegraph
>>>>     from here [1] (It needs trace-cmd).
>>>>
>>>>     [1] https://github.com/asj/btrfstrace.git
>>>>
>>>>     Run it as in the example below:
>>>>
>>>>     umount /btrfs;
>>>>
>>>>     ./ftracegraph open_ctree 3 "*:mod:btrfs" "mount /dev/vg/scratch0 /btrfs"
>>>>
>>>>     cat /tmp/ftracegraph.out
>>>>
>>>>
>>>> Thanks, Anand
>>>>
>>>>
>>>>
>>>>> Thank you.
>>>>>
>>>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: btrfs mount takes too long time
       [not found]           ` <CAHQ7scVOYuzz58KcO_fo2rph44CCC46ef=LFVZF8XzoKYq27ig@mail.gmail.com>
@ 2021-08-31 11:09             ` Anand Jain
  2021-08-31 13:10               ` Johannes Thumshirn
       [not found]               ` <CAHQ7scXp9gTANB9gWHqQgXUsnF-YvTOKrK1waUScmqvHU_0ifw@mail.gmail.com>
  0 siblings, 2 replies; 16+ messages in thread
From: Anand Jain @ 2021-08-31 11:09 UTC (permalink / raw)
  To: Jingyun He; +Cc: linux-btrfs


btrfs_load_block_group_zone_info () and btrfs_search_slot() were the
slowest and, they are called more than 2127 times each (trace buffer
has rolled over there isn't all the data).

Sorted in the ascending order of time spent.. and bottom few...
<snip>
19472.64 us | btrfs_search_slot [btrfs]();
23294.26 us | btrfs_load_block_group_zone_info [btrfs]();
23315.21 us | btrfs_load_block_group_zone_info [btrfs]();
224534963 us | } / btrfs_read_block_groups [btrfs] /

Most likely because the zoned devices are slow, and we have to read
all the block groups during mount, so there is not much that can be
done about it.

But let's see what part in btrfs_load_block_group_zone_info() is taking
most of the time. Could you please help get this output?

$./ftracegraph btrfs_load_block_group_zone_info 2 "*:mod:btrfs" "mount
 > /dev/vg/scratch0 /btrfs"

Thx.


On 31/08/2021 01:17, Jingyun He wrote:
> Hi,
> I have attached the result of
> $./ftracegraph btrfs_read_block_groups 2 "*:mod:btrfs" "mount
> /dev/vg/scratch0 /btrfs"
> 
> Please kindly check.
> Thank you very much.
> 
> On Mon, Aug 30, 2021 at 9:05 PM Anand Jain <anand.jain@oracle.com> wrote:
>>
>> open_ctree() took 228254398 us. And 98% of it that is 225418272 us
>> was taken by btrfs_read_block_groups().
>>
>> -------------------
>>    1) $ 225418272 us | } /* btrfs_read_block_groups [btrfs] */
>>    1) * 16934.96 us | btrfs_check_rw_degradable [btrfs]();
>>    0) 0.967 us | btrfs_apply_pending_changes [btrfs]();
>>    0) 0.239 us | btrfs_read_qgroup_config [btrfs]();
>>    0) * 21017.34 us | btrfs_get_root_ref [btrfs]();
>>    0) + 15.717 us | btrfs_start_pre_rw_mount [btrfs]();
>>    0) 0.865 us | btrfs_discard_resume [btrfs]();
>>    0) $ 228254398 us | } /* open_ctree [btrfs] */
>> -------------------
>>
>> Now we need to run the same thing on btrfs_read_block_groups(),
>> could you please run.. [1] (no need of the time).
>>
>> [1]
>>     $ umount /btrfs;
>>     $./ftracegraph btrfs_read_block_groups 2 "*:mod:btrfs" "mount
>> /dev/vg/scratch0 /btrfs"
>>
>> Thanks, Anand
>>
>>
>> On 30/08/2021 14:44, Jingyun He wrote:
>>> Hi, Anand,
>>> I have attached the new result.
>>> Kindly check.
>>>
>>> Thank you.
>>>
>>> On Mon, Aug 30, 2021 at 9:27 AM Anand Jain <anand.jain@oracle.com> wrote:
>>>>
>>>>
>>>> Our open_ctree() took around ~223secs (~3.7mins)
>>>>
>>>>     1) $ 223375750 us |  } /* open_ctree [btrfs] */
>>>>
>>>> Unfortunately, the default trace buffer per CPU (4K) wasn't sufficient
>>>> and, the trace-buffer rolled over.
>>>> So we still don't know how long we spent in btrfs_read_block_groups().
>>>> Sorry for my mistake we should go 1 step at a time and, we have to do
>>>> this until we narrow it down to a specific function.
>>>>
>>>> Could you please run with the depth = 2 (instead of 3) and use the time
>>>> command prefix. Also, pull a new ftracegraph as I have updated it to
>>>> display a proper time output.
>>>>
>>>> $ ftracegraph open_ctree 2 "*:mod:btrfs" "time mount /dev/vg/scratch0
>>>> /btrfs"
>>>>
>>>> Thanks, Anand
>>>>
>>>> On 29/08/2021 17:42, Jingyun He wrote:
>>>>> Hi, Anand
>>>>>
>>>>> I have attached the file.
>>>>> Could you kindly check this?
>>>>>
>>>>> Thank you.
>>>>>
>>>>> On Sun, Aug 29, 2021 at 7:47 AM Anand Jain <anand.jain@oracle.com> wrote:
>>>>>>
>>>>>> On 28/08/2021 19:58, Jingyun He wrote:
>>>>>>> Hello, all
>>>>>>> I'm new to btrfs, I have a HM-SMR 14TB disk, I have formatted it to
>>>>>>> btrfs to store the files.
>>>>>>>
>>>>>>> When the device is almost full, it needs about 5 mins to mount the device.
>>>>>>>
>>>>>>> Is it normal? is there any mount option that I can use to reduce the mount time?
>>>>>>>
>>>>>>
>>>>>>
>>>>>>      We need to figure out the function taking a longer time (maybe it is
>>>>>>      read-block-groups). I have similar reports on the non zoned device
>>>>>>      as well (with a few TB full of data). But there is no good data yet
>>>>>>      to analyse.
>>>>>>
>>>>>>      Could you please collect the trace data from the ftracegraph
>>>>>>      from here [1] (It needs trace-cmd).
>>>>>>
>>>>>>      [1] https://github.com/asj/btrfstrace.git
>>>>>>
>>>>>>      Run it as in the example below:
>>>>>>
>>>>>>      umount /btrfs;
>>>>>>
>>>>>>      ./ftracegraph open_ctree 3 "*:mod:btrfs" "mount /dev/vg/scratch0 /btrfs"
>>>>>>
>>>>>>      cat /tmp/ftracegraph.out
>>>>>>
>>>>>>
>>>>>> Thanks, Anand
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Thank you.
>>>>>>>
>>>>>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: btrfs mount takes too long time
  2021-08-31 11:09             ` Anand Jain
@ 2021-08-31 13:10               ` Johannes Thumshirn
  2021-08-31 16:47                 ` David Sterba
       [not found]               ` <CAHQ7scXp9gTANB9gWHqQgXUsnF-YvTOKrK1waUScmqvHU_0ifw@mail.gmail.com>
  1 sibling, 1 reply; 16+ messages in thread
From: Johannes Thumshirn @ 2021-08-31 13:10 UTC (permalink / raw)
  To: Anand Jain, Jingyun He; +Cc: linux-btrfs, Naohiro Aota

[ +Cc Naohiro ]

On 31/08/2021 14:11, Anand Jain wrote:
> But let's see what part in btrfs_load_block_group_zone_info() is taking
> most of the time. Could you please help get this output?

I'd suspect it's the btrfs_get_dev_zone() call that we need to do for 
each block group in order to determine the current write pointer to
set block_group->alloc_offset.

We could speed up the mount process by caching the device's REPORT ZONES
response, as we're doing a REPORT ZONES once to get all zones and then 
again per block group load. On a 14TB SMR drive this results in 
(14  * 1024 * 1024) / 256 + 1 = 57345 REPORT ZONES calls. OTOH 
struct blk_zone is 64 bytes per zone resulting in 64 * 57344 = 3584kB
data to be cached.

So the solution should probably somewhere in between, aka caching X 
REPORT ZONES responses before we need to do a new one.

Byte,
	Johannes

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: btrfs mount takes too long time
  2021-08-31 13:10               ` Johannes Thumshirn
@ 2021-08-31 16:47                 ` David Sterba
  2021-09-07  5:29                   ` Naohiro Aota
  0 siblings, 1 reply; 16+ messages in thread
From: David Sterba @ 2021-08-31 16:47 UTC (permalink / raw)
  To: Johannes Thumshirn; +Cc: Anand Jain, Jingyun He, linux-btrfs, Naohiro Aota

On Tue, Aug 31, 2021 at 01:10:38PM +0000, Johannes Thumshirn wrote:
> [ +Cc Naohiro ]
> 
> On 31/08/2021 14:11, Anand Jain wrote:
> > But let's see what part in btrfs_load_block_group_zone_info() is taking
> > most of the time. Could you please help get this output?
> 
> I'd suspect it's the btrfs_get_dev_zone() call that we need to do for 
> each block group in order to determine the current write pointer to
> set block_group->alloc_offset.
> 
> We could speed up the mount process by caching the device's REPORT ZONES
> response, as we're doing a REPORT ZONES once to get all zones and then 
> again per block group load. On a 14TB SMR drive this results in 
> (14  * 1024 * 1024) / 256 + 1 = 57345 REPORT ZONES calls. OTOH 
> struct blk_zone is 64 bytes per zone resulting in 64 * 57344 = 3584kB
> data to be cached.
> 
> So the solution should probably somewhere in between, aka caching X 
> REPORT ZONES responses before we need to do a new one.

That sounds like a good option. The number of new members in block group
struct has grown due to zoned support, this means higher memory
requirements, scaling up with device size. We'll have to do something
about that eventually.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: btrfs mount takes too long time
  2021-08-30 13:04         ` Anand Jain
       [not found]           ` <CAHQ7scVOYuzz58KcO_fo2rph44CCC46ef=LFVZF8XzoKYq27ig@mail.gmail.com>
@ 2021-09-01 16:11           ` Dāvis Mosāns
  2021-09-01 21:30             ` Anand Jain
  2021-09-01 23:42             ` Qu Wenruo
  1 sibling, 2 replies; 16+ messages in thread
From: Dāvis Mosāns @ 2021-09-01 16:11 UTC (permalink / raw)
  To: Anand Jain; +Cc: Jingyun He, Btrfs BTRFS, Johannes Thumshirn, David Sterba

pirmd., 2021. g. 30. aug., plkst. 16:08 — lietotājs Anand Jain
(<anand.jain@oracle.com>) rakstīja:
>
> open_ctree() took 228254398 us. And 98% of it that is 225418272 us
> was taken by btrfs_read_block_groups().
>
> -------------------
>   1) $ 225418272 us | } /* btrfs_read_block_groups [btrfs] */
>   1) * 16934.96 us | btrfs_check_rw_degradable [btrfs]();
>   0) 0.967 us | btrfs_apply_pending_changes [btrfs]();
>   0) 0.239 us | btrfs_read_qgroup_config [btrfs]();
>   0) * 21017.34 us | btrfs_get_root_ref [btrfs]();
>   0) + 15.717 us | btrfs_start_pre_rw_mount [btrfs]();
>   0) 0.865 us | btrfs_discard_resume [btrfs]();
>   0) $ 228254398 us | } /* open_ctree [btrfs] */
> -------------------
>
> Now we need to run the same thing on btrfs_read_block_groups(),
> could you please run.. [1] (no need of the time).
>
> [1]
>    $ umount /btrfs;
>    $./ftracegraph btrfs_read_block_groups 2 "*:mod:btrfs" "mount
> /dev/vg/scratch0 /btrfs"
>
> Thanks, Anand
>
>

Hi,

I also have a btrfs filesystem that takes a while to mount.
So I'm interested if this could be improved.

$ ./ftracegraph open_ctree 2 "*:mod:btrfs" "time mount /dev/md127 -o
space_cache=v2,compress=zstd,acl,subvol=Data /mnt/Data/"
kernel.ftrace_enabled = 1

real    1m33,638s
user    0m0,000s
sys     0m1,130s

Here's the trace output https://dāvis.lv/files/ftracegraph.out.gz

The filesystem is on top of RAID6 mdadm array which is from 9x 3TB HDDs.

Best regards,
Dāvis

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: btrfs mount takes too long time
  2021-09-01 16:11           ` Dāvis Mosāns
@ 2021-09-01 21:30             ` Anand Jain
  2021-09-02 16:56               ` Dāvis Mosāns
  2021-09-02 17:34               ` Joshua
  2021-09-01 23:42             ` Qu Wenruo
  1 sibling, 2 replies; 16+ messages in thread
From: Anand Jain @ 2021-09-01 21:30 UTC (permalink / raw)
  To: Dāvis Mosāns
  Cc: Jingyun He, Btrfs BTRFS, Johannes Thumshirn, David Sterba

On 02/09/2021 00:11, Dāvis Mosāns wrote:
> pirmd., 2021. g. 30. aug., plkst. 16:08 — lietotājs Anand Jain
> (<anand.jain@oracle.com>) rakstīja:
>>
>> open_ctree() took 228254398 us. And 98% of it that is 225418272 us
>> was taken by btrfs_read_block_groups().
>>
>> -------------------
>>    1) $ 225418272 us | } /* btrfs_read_block_groups [btrfs] */
>>    1) * 16934.96 us | btrfs_check_rw_degradable [btrfs]();
>>    0) 0.967 us | btrfs_apply_pending_changes [btrfs]();
>>    0) 0.239 us | btrfs_read_qgroup_config [btrfs]();
>>    0) * 21017.34 us | btrfs_get_root_ref [btrfs]();
>>    0) + 15.717 us | btrfs_start_pre_rw_mount [btrfs]();
>>    0) 0.865 us | btrfs_discard_resume [btrfs]();
>>    0) $ 228254398 us | } /* open_ctree [btrfs] */
>> -------------------
>>
>> Now we need to run the same thing on btrfs_read_block_groups(),
>> could you please run.. [1] (no need of the time).
>>
>> [1]
>>     $ umount /btrfs;
>>     $./ftracegraph btrfs_read_block_groups 2 "*:mod:btrfs" "mount
>> /dev/vg/scratch0 /btrfs"
>>
>> Thanks, Anand
>>
>>
> 
> Hi,
> 
> I also have a btrfs filesystem that takes a while to mount.
> So I'm interested if this could be improved.
> 
> $ ./ftracegraph open_ctree 2 "*:mod:btrfs" "time mount /dev/md127 -o
> space_cache=v2,compress=zstd,acl,subvol=Data /mnt/Data/"

  It is better if we don't use the time prefix for the mount command
  here. The ftrace, traces time syscall as well, which is unessential.
  And we lose a lot of trace-buffer to it.

> kernel.ftrace_enabled = 1
> 
> real    1m33,638s
> user    0m0,000s
> sys     0m1,130s
> 
> Here's the trace output https://dāvis.lv/files/ftracegraph.out.gz
> 
> The filesystem is on top of RAID6 mdadm array which is from 9x 3TB HDDs.

  So here is a case of a non-zoned device.

  Again it is btrfs_read_block_groups() which is taking ~98% of the time.

    3) $ 91607669 us |    } /* btrfs_read_block_groups [btrfs] */
    3) # 9399.566 us |    btrfs_check_rw_degradable [btrfs]();
    3)   0.922 us    |    btrfs_apply_pending_changes [btrfs]();
    3) ! 186.540 us  |    btrfs_read_qgroup_config [btrfs]();
    3) * 26109.92 us |    btrfs_get_root_ref [btrfs]();
    3) + 23.965 us   |    btrfs_start_pre_rw_mount [btrfs]();
    3)   1.192 us    |    btrfs_discard_resume [btrfs]();
    3) $ 93501136 us |  } /* open_ctree [btrfs] */

  Could we pls get this?

  $ ./ftracegraph btrfs_read_block_groups 2 "*:mod:btrfs" "mount ..."

  Hopefully, there won't be a trace-buffer rollover here, as we saw in
  the other case so that we could account for all the time spent.

  Also, let's understand how many block groups are there.

  $ btrfs in dump-tree <dev> | grep BLOCK_GROUP_ITEM | wc -l


Thx, Anand

> Best regards,
> Dāvis
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: btrfs mount takes too long time
  2021-09-01 16:11           ` Dāvis Mosāns
  2021-09-01 21:30             ` Anand Jain
@ 2021-09-01 23:42             ` Qu Wenruo
  1 sibling, 0 replies; 16+ messages in thread
From: Qu Wenruo @ 2021-09-01 23:42 UTC (permalink / raw)
  To: Dāvis Mosāns, Anand Jain
  Cc: Jingyun He, Btrfs BTRFS, Johannes Thumshirn, David Sterba



On 2021/9/2 上午12:11, Dāvis Mosāns wrote:
> pirmd., 2021. g. 30. aug., plkst. 16:08 — lietotājs Anand Jain
> (<anand.jain@oracle.com>) rakstīja:
>>
>> open_ctree() took 228254398 us. And 98% of it that is 225418272 us
>> was taken by btrfs_read_block_groups().
>>
>> -------------------
>>    1) $ 225418272 us | } /* btrfs_read_block_groups [btrfs] */

As long as it's not zoned device, this is a known problem.

Block group items by its original design is scattered across the huge
extent tree.

Grabbing them all needs quite a lot of random IO, for large fs it's very
time consuming.

I have purposed a skinny bg tree design to put all block group items
into one tree, so that would greatly speed up the whole mount process.
And with user-space tool to do the convert.

But recently Josef is developing a completely new extent tree design,
which will include a similar design for it.

For know I'm not sure what's the way to go, either we can wait for
extent-tree-v2, or should we push for the middle ground skinny bg tree?

David, Any idea on this?

Thanks,
Qu
>>    1) * 16934.96 us | btrfs_check_rw_degradable [btrfs]();
>>    0) 0.967 us | btrfs_apply_pending_changes [btrfs]();
>>    0) 0.239 us | btrfs_read_qgroup_config [btrfs]();
>>    0) * 21017.34 us | btrfs_get_root_ref [btrfs]();
>>    0) + 15.717 us | btrfs_start_pre_rw_mount [btrfs]();
>>    0) 0.865 us | btrfs_discard_resume [btrfs]();
>>    0) $ 228254398 us | } /* open_ctree [btrfs] */
>> -------------------
>>
>> Now we need to run the same thing on btrfs_read_block_groups(),
>> could you please run.. [1] (no need of the time).
>>
>> [1]
>>     $ umount /btrfs;
>>     $./ftracegraph btrfs_read_block_groups 2 "*:mod:btrfs" "mount
>> /dev/vg/scratch0 /btrfs"
>>
>> Thanks, Anand
>>
>>
>
> Hi,
>
> I also have a btrfs filesystem that takes a while to mount.
> So I'm interested if this could be improved.
>
> $ ./ftracegraph open_ctree 2 "*:mod:btrfs" "time mount /dev/md127 -o
> space_cache=v2,compress=zstd,acl,subvol=Data /mnt/Data/"
> kernel.ftrace_enabled = 1
>
> real    1m33,638s
> user    0m0,000s
> sys     0m1,130s
>
> Here's the trace output https://dāvis.lv/files/ftracegraph.out.gz
>
> The filesystem is on top of RAID6 mdadm array which is from 9x 3TB HDDs.
>
> Best regards,
> Dāvis
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: btrfs mount takes too long time
  2021-09-01 21:30             ` Anand Jain
@ 2021-09-02 16:56               ` Dāvis Mosāns
  2021-09-03  9:56                 ` Anand Jain
  2021-09-02 17:34               ` Joshua
  1 sibling, 1 reply; 16+ messages in thread
From: Dāvis Mosāns @ 2021-09-02 16:56 UTC (permalink / raw)
  To: Anand Jain; +Cc: Jingyun He, Btrfs BTRFS, Johannes Thumshirn, David Sterba

ceturtd., 2021. g. 2. sept., plkst. 00:31 — lietotājs Anand Jain
(<anand.jain@oracle.com>) rakstīja:
>
> On 02/09/2021 00:11, Dāvis Mosāns wrote:
> > pirmd., 2021. g. 30. aug., plkst. 16:08 — lietotājs Anand Jain
> > (<anand.jain@oracle.com>) rakstīja:
> >>
> >> open_ctree() took 228254398 us. And 98% of it that is 225418272 us
> >> was taken by btrfs_read_block_groups().
> >>
> >> -------------------
> >>    1) $ 225418272 us | } /* btrfs_read_block_groups [btrfs] */
> >>    1) * 16934.96 us | btrfs_check_rw_degradable [btrfs]();
> >>    0) 0.967 us | btrfs_apply_pending_changes [btrfs]();
> >>    0) 0.239 us | btrfs_read_qgroup_config [btrfs]();
> >>    0) * 21017.34 us | btrfs_get_root_ref [btrfs]();
> >>    0) + 15.717 us | btrfs_start_pre_rw_mount [btrfs]();
> >>    0) 0.865 us | btrfs_discard_resume [btrfs]();
> >>    0) $ 228254398 us | } /* open_ctree [btrfs] */
> >> -------------------
> >>
> >> Now we need to run the same thing on btrfs_read_block_groups(),
> >> could you please run.. [1] (no need of the time).
> >>
> >> [1]
> >>     $ umount /btrfs;
> >>     $./ftracegraph btrfs_read_block_groups 2 "*:mod:btrfs" "mount
> >> /dev/vg/scratch0 /btrfs"
> >>
> >> Thanks, Anand
> >>
> >>
> >
> > Hi,
> >
> > I also have a btrfs filesystem that takes a while to mount.
> > So I'm interested if this could be improved.
> >
> > $ ./ftracegraph open_ctree 2 "*:mod:btrfs" "time mount /dev/md127 -o
> > space_cache=v2,compress=zstd,acl,subvol=Data /mnt/Data/"
>
>   It is better if we don't use the time prefix for the mount command
>   here. The ftrace, traces time syscall as well, which is unessential.
>   And we lose a lot of trace-buffer to it.
>
> > kernel.ftrace_enabled = 1
> >
> > real    1m33,638s
> > user    0m0,000s
> > sys     0m1,130s
> >
> > Here's the trace output https://dāvis.lv/files/ftracegraph.out.gz
> >
> > The filesystem is on top of RAID6 mdadm array which is from 9x 3TB HDDs.
>
>   So here is a case of a non-zoned device.
>
>   Again it is btrfs_read_block_groups() which is taking ~98% of the time.
>
>     3) $ 91607669 us |    } /* btrfs_read_block_groups [btrfs] */
>     3) # 9399.566 us |    btrfs_check_rw_degradable [btrfs]();
>     3)   0.922 us    |    btrfs_apply_pending_changes [btrfs]();
>     3) ! 186.540 us  |    btrfs_read_qgroup_config [btrfs]();
>     3) * 26109.92 us |    btrfs_get_root_ref [btrfs]();
>     3) + 23.965 us   |    btrfs_start_pre_rw_mount [btrfs]();
>     3)   1.192 us    |    btrfs_discard_resume [btrfs]();
>     3) $ 93501136 us |  } /* open_ctree [btrfs] */
>
>   Could we pls get this?
>
>   $ ./ftracegraph btrfs_read_block_groups 2 "*:mod:btrfs" "mount ..."
>
>   Hopefully, there won't be a trace-buffer rollover here, as we saw in
>   the other case so that we could account for all the time spent.

Sure, here https://dāvis.lv/files/ftracegraph_v2.out.gz

>   Also, let's understand how many block groups are there.
>
>   $ btrfs in dump-tree <dev> | grep BLOCK_GROUP_ITEM | wc -l

It's 22660
Also by the way `-t EXTENT_TREE` should be faster

Best regards,
Dāvis

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: btrfs mount takes too long time
  2021-09-01 21:30             ` Anand Jain
  2021-09-02 16:56               ` Dāvis Mosāns
@ 2021-09-02 17:34               ` Joshua
  1 sibling, 0 replies; 16+ messages in thread
From: Joshua @ 2021-09-02 17:34 UTC (permalink / raw)
  To: Dāvis Mosāns, Anand Jain, Btrfs BTRFS
  Cc: Jingyun He, Johannes Thumshirn, David Sterba

September 2, 2021 9:56 AM, "Dāvis Mosāns" <davispuh@gmail.com> wrote:

> ceturtd., 2021. g. 2. sept., plkst. 00:31 — lietotājs Anand Jain
> (<anand.jain@oracle.com>) rakstīja:
> 
>> On 02/09/2021 00:11, Dāvis Mosāns wrote:
>> pirmd., 2021. g. 30. aug., plkst. 16:08 — lietotājs Anand Jain
>> (<anand.jain@oracle.com>) rakstīja:
>> 
>> open_ctree() took 228254398 us. And 98% of it that is 225418272 us
>> was taken by btrfs_read_block_groups().
>> 
>> -------------------
>> 1) $ 225418272 us | } /* btrfs_read_block_groups [btrfs] */
>> 1) * 16934.96 us | btrfs_check_rw_degradable [btrfs]();
>> 0) 0.967 us | btrfs_apply_pending_changes [btrfs]();
>> 0) 0.239 us | btrfs_read_qgroup_config [btrfs]();
>> 0) * 21017.34 us | btrfs_get_root_ref [btrfs]();
>> 0) + 15.717 us | btrfs_start_pre_rw_mount [btrfs]();
>> 0) 0.865 us | btrfs_discard_resume [btrfs]();
>> 0) $ 228254398 us | } /* open_ctree [btrfs] */
>> -------------------
>> 
>> Now we need to run the same thing on btrfs_read_block_groups(),
>> could you please run.. [1] (no need of the time).
>> 
>> [1]
>> $ umount /btrfs;
>> $./ftracegraph btrfs_read_block_groups 2 "*:mod:btrfs" "mount
>> /dev/vg/scratch0 /btrfs"
>> 
>> Thanks, Anand
>> 
>> Hi,
>> 
>> I also have a btrfs filesystem that takes a while to mount.
>> So I'm interested if this could be improved.
>> 
>> $ ./ftracegraph open_ctree 2 "*:mod:btrfs" "time mount /dev/md127 -o
>> space_cache=v2,compress=zstd,acl,subvol=Data /mnt/Data/"
>> 
>> It is better if we don't use the time prefix for the mount command
>> here. The ftrace, traces time syscall as well, which is unessential.
>> And we lose a lot of trace-buffer to it.
>> 
>> kernel.ftrace_enabled = 1
>> 
>> real 1m33,638s
>> user 0m0,000s
>> sys 0m1,130s
>> 
>> Here's the trace output https://dāvis.lv/files/ftracegraph.out.gz
>> 
>> The filesystem is on top of RAID6 mdadm array which is from 9x 3TB HDDs.
>> 
>> So here is a case of a non-zoned device.
>> 
>> Again it is btrfs_read_block_groups() which is taking ~98% of the time.
>> 
>> 3) $ 91607669 us | } /* btrfs_read_block_groups [btrfs] */
>> 3) # 9399.566 us | btrfs_check_rw_degradable [btrfs]();
>> 3) 0.922 us | btrfs_apply_pending_changes [btrfs]();
>> 3) ! 186.540 us | btrfs_read_qgroup_config [btrfs]();
>> 3) * 26109.92 us | btrfs_get_root_ref [btrfs]();
>> 3) + 23.965 us | btrfs_start_pre_rw_mount [btrfs]();
>> 3) 1.192 us | btrfs_discard_resume [btrfs]();
>> 3) $ 93501136 us | } /* open_ctree [btrfs] */
>> 
>> Could we pls get this?
>> 
>> $ ./ftracegraph btrfs_read_block_groups 2 "*:mod:btrfs" "mount ..."
>> 
>> Hopefully, there won't be a trace-buffer rollover here, as we saw in
>> the other case so that we could account for all the time spent.
> 
> Sure, here https://dāvis.lv/files/ftracegraph_v2.out.gz
> 
>> Also, let's understand how many block groups are there.
>> 
>> $ btrfs in dump-tree <dev> | grep BLOCK_GROUP_ITEM | wc -l
> 
> It's 22660
> Also by the way `-t EXTENT_TREE` should be faster

Just to add more data, I have a 10-device raid1 array with ~86TB used.

My results:
root@SERVER:~# btrfs in dump-tree -t EXTENT_TREE /dev/sdf | grep BLOCK_GROUP_ITEM | wc -l
44986

I estimate it takes ~83 seconds to mount. (by looking at dmesg) It sometimes times out on boot and drops to recovery mode, as systemd will only wait 90 seconds, and for some reason 'x-systemd.mount-timeout=500' doesn't work

Note that I defragment the subvolume and extent tree for each subvolume weekly, as it seems to reduce my mount times, and make it *usually* boot without dropping to recovery mode.

--Joshua

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: btrfs mount takes too long time
  2021-09-02 16:56               ` Dāvis Mosāns
@ 2021-09-03  9:56                 ` Anand Jain
  0 siblings, 0 replies; 16+ messages in thread
From: Anand Jain @ 2021-09-03  9:56 UTC (permalink / raw)
  To: Dāvis Mosāns
  Cc: Jingyun He, Btrfs BTRFS, Johannes Thumshirn, David Sterba

On 03/09/2021 00:56, Dāvis Mosāns wrote:
> ceturtd., 2021. g. 2. sept., plkst. 00:31 — lietotājs Anand Jain
> (<anand.jain@oracle.com>) rakstīja:
>>
>> On 02/09/2021 00:11, Dāvis Mosāns wrote:
>>> pirmd., 2021. g. 30. aug., plkst. 16:08 — lietotājs Anand Jain
>>> (<anand.jain@oracle.com>) rakstīja:
>>>>
>>>> open_ctree() took 228254398 us. And 98% of it that is 225418272 us
>>>> was taken by btrfs_read_block_groups().
>>>>
>>>> -------------------
>>>>     1) $ 225418272 us | } /* btrfs_read_block_groups [btrfs] */
>>>>     1) * 16934.96 us | btrfs_check_rw_degradable [btrfs]();
>>>>     0) 0.967 us | btrfs_apply_pending_changes [btrfs]();
>>>>     0) 0.239 us | btrfs_read_qgroup_config [btrfs]();
>>>>     0) * 21017.34 us | btrfs_get_root_ref [btrfs]();
>>>>     0) + 15.717 us | btrfs_start_pre_rw_mount [btrfs]();
>>>>     0) 0.865 us | btrfs_discard_resume [btrfs]();
>>>>     0) $ 228254398 us | } /* open_ctree [btrfs] */
>>>> -------------------
>>>>
>>>> Now we need to run the same thing on btrfs_read_block_groups(),
>>>> could you please run.. [1] (no need of the time).
>>>>
>>>> [1]
>>>>      $ umount /btrfs;
>>>>      $./ftracegraph btrfs_read_block_groups 2 "*:mod:btrfs" "mount
>>>> /dev/vg/scratch0 /btrfs"
>>>>
>>>> Thanks, Anand
>>>>
>>>>
>>>
>>> Hi,
>>>
>>> I also have a btrfs filesystem that takes a while to mount.
>>> So I'm interested if this could be improved.
>>>
>>> $ ./ftracegraph open_ctree 2 "*:mod:btrfs" "time mount /dev/md127 -o
>>> space_cache=v2,compress=zstd,acl,subvol=Data /mnt/Data/"
>>
>>    It is better if we don't use the time prefix for the mount command
>>    here. The ftrace, traces time syscall as well, which is unessential.
>>    And we lose a lot of trace-buffer to it.
>>
>>> kernel.ftrace_enabled = 1
>>>
>>> real    1m33,638s
>>> user    0m0,000s
>>> sys     0m1,130s
>>>
>>> Here's the trace output https://dāvis.lv/files/ftracegraph.out.gz
>>>
>>> The filesystem is on top of RAID6 mdadm array which is from 9x 3TB HDDs.
>>
>>    So here is a case of a non-zoned device.
>>
>>    Again it is btrfs_read_block_groups() which is taking ~98% of the time.
>>
>>      3) $ 91607669 us |    } /* btrfs_read_block_groups [btrfs] */
>>      3) # 9399.566 us |    btrfs_check_rw_degradable [btrfs]();
>>      3)   0.922 us    |    btrfs_apply_pending_changes [btrfs]();
>>      3) ! 186.540 us  |    btrfs_read_qgroup_config [btrfs]();
>>      3) * 26109.92 us |    btrfs_get_root_ref [btrfs]();
>>      3) + 23.965 us   |    btrfs_start_pre_rw_mount [btrfs]();
>>      3)   1.192 us    |    btrfs_discard_resume [btrfs]();
>>      3) $ 93501136 us |  } /* open_ctree [btrfs] */
>>
>>    Could we pls get this?
>>
>>    $ ./ftracegraph btrfs_read_block_groups 2 "*:mod:btrfs" "mount ..."
>>
>>    Hopefully, there won't be a trace-buffer rollover here, as we saw in
>>    the other case so that we could account for all the time spent.
> 
> Sure, here https://dāvis.lv/files/ftracegraph_v2.out.gz

  Ok. For conventional devices, it's only btrfs_search_slot() 
contributing to the mount time significantly.

> 
>>    Also, let's understand how many block groups are there.
>>
>>    $ btrfs in dump-tree <dev> | grep BLOCK_GROUP_ITEM | wc -l
> 
> It's 22660
> Also by the way `-t EXTENT_TREE` should be faster
> 
> Best regards,
> Dāvis
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: btrfs mount takes too long time
  2021-08-31 16:47                 ` David Sterba
@ 2021-09-07  5:29                   ` Naohiro Aota
  0 siblings, 0 replies; 16+ messages in thread
From: Naohiro Aota @ 2021-09-07  5:29 UTC (permalink / raw)
  To: dsterba, Johannes Thumshirn, Anand Jain, Jingyun He, linux-btrfs,
	Naohiro Aota

On Tue, Aug 31, 2021 at 06:47:44PM +0200, David Sterba wrote:
> On Tue, Aug 31, 2021 at 01:10:38PM +0000, Johannes Thumshirn wrote:
> > [ +Cc Naohiro ]
> > 
> > On 31/08/2021 14:11, Anand Jain wrote:
> > > But let's see what part in btrfs_load_block_group_zone_info() is taking
> > > most of the time. Could you please help get this output?
> > 
> > I'd suspect it's the btrfs_get_dev_zone() call that we need to do for 
> > each block group in order to determine the current write pointer to
> > set block_group->alloc_offset.
> > 
> > We could speed up the mount process by caching the device's REPORT ZONES
> > response, as we're doing a REPORT ZONES once to get all zones and then 
> > again per block group load. On a 14TB SMR drive this results in 
> > (14  * 1024 * 1024) / 256 + 1 = 57345 REPORT ZONES calls. OTOH 
> > struct blk_zone is 64 bytes per zone resulting in 64 * 57344 = 3584kB
> > data to be cached.
> > 
> > So the solution should probably somewhere in between, aka caching X 
> > REPORT ZONES responses before we need to do a new one.
> 
> That sounds like a good option. The number of new members in block group
> struct has grown due to zoned support, this means higher memory
> requirements, scaling up with device size. We'll have to do something
> about that eventually.

As a reminder, I'm working on to implement the REPORT ZONES caching
solution on mount time. But, I'm struggling to fill my btrfs to
reproduce the issue due to some other bug (it hang while I'm filling
it). So, it will take some more time implementing it.

Thanks,

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: btrfs mount takes too long time
       [not found]               ` <CAHQ7scXp9gTANB9gWHqQgXUsnF-YvTOKrK1waUScmqvHU_0ifw@mail.gmail.com>
@ 2021-09-07  9:52                 ` Anand Jain
  0 siblings, 0 replies; 16+ messages in thread
From: Anand Jain @ 2021-09-07  9:52 UTC (permalink / raw)
  To: Jingyun He; +Cc: linux-btrfs





On 04/09/2021 20:45, Jingyun He wrote:
> Hello, Anand
> Sorry about the delay,
> I have attached the result of
> $./ftracegraph btrfs_load_block_group_zone_info 2 "*:mod:btrfs" "mount
> /dev/vg/scratch0 /btrfs"

The last trace has 8514 calls to btrfs_get_dev_zones() and, each takes
around ~1.2ms to ~3ms.  Naohiro is planning to cache the report zones.
It would help.

Thanks, Anand

> Please kindly check.
> 
> Thank you
> 
> On Tue, Aug 31, 2021 at 7:09 PM Anand Jain <anand.jain@oracle.com> wrote:
>>
>>
>> btrfs_load_block_group_zone_info () and btrfs_search_slot() were the
>> slowest and, they are called more than 2127 times each (trace buffer
>> has rolled over there isn't all the data).
>>
>> Sorted in the ascending order of time spent.. and bottom few...
>> <snip>
>> 19472.64 us | btrfs_search_slot [btrfs]();
>> 23294.26 us | btrfs_load_block_group_zone_info [btrfs]();
>> 23315.21 us | btrfs_load_block_group_zone_info [btrfs]();
>> 224534963 us | } / btrfs_read_block_groups [btrfs] /
>>
>> Most likely because the zoned devices are slow, and we have to read
>> all the block groups during mount, so there is not much that can be
>> done about it.
>>
>> But let's see what part in btrfs_load_block_group_zone_info() is taking
>> most of the time. Could you please help get this output?
>>
>> $./ftracegraph btrfs_load_block_group_zone_info 2 "*:mod:btrfs" "mount
>>   > /dev/vg/scratch0 /btrfs"
>>
>> Thx.
>>
>>
>> On 31/08/2021 01:17, Jingyun He wrote:
>>> Hi,
>>> I have attached the result of
>>> $./ftracegraph btrfs_read_block_groups 2 "*:mod:btrfs" "mount
>>> /dev/vg/scratch0 /btrfs"
>>>
>>> Please kindly check.
>>> Thank you very much.
>>>
>>> On Mon, Aug 30, 2021 at 9:05 PM Anand Jain <anand.jain@oracle.com> wrote:
>>>>
>>>> open_ctree() took 228254398 us. And 98% of it that is 225418272 us
>>>> was taken by btrfs_read_block_groups().
>>>>
>>>> -------------------
>>>>     1) $ 225418272 us | } /* btrfs_read_block_groups [btrfs] */
>>>>     1) * 16934.96 us | btrfs_check_rw_degradable [btrfs]();
>>>>     0) 0.967 us | btrfs_apply_pending_changes [btrfs]();
>>>>     0) 0.239 us | btrfs_read_qgroup_config [btrfs]();
>>>>     0) * 21017.34 us | btrfs_get_root_ref [btrfs]();
>>>>     0) + 15.717 us | btrfs_start_pre_rw_mount [btrfs]();
>>>>     0) 0.865 us | btrfs_discard_resume [btrfs]();
>>>>     0) $ 228254398 us | } /* open_ctree [btrfs] */
>>>> -------------------
>>>>
>>>> Now we need to run the same thing on btrfs_read_block_groups(),
>>>> could you please run.. [1] (no need of the time).
>>>>
>>>> [1]
>>>>      $ umount /btrfs;
>>>>      $./ftracegraph btrfs_read_block_groups 2 "*:mod:btrfs" "mount
>>>> /dev/vg/scratch0 /btrfs"
>>>>
>>>> Thanks, Anand
>>>>
>>>>
>>>> On 30/08/2021 14:44, Jingyun He wrote:
>>>>> Hi, Anand,
>>>>> I have attached the new result.
>>>>> Kindly check.
>>>>>
>>>>> Thank you.
>>>>>
>>>>> On Mon, Aug 30, 2021 at 9:27 AM Anand Jain <anand.jain@oracle.com> wrote:
>>>>>>
>>>>>>
>>>>>> Our open_ctree() took around ~223secs (~3.7mins)
>>>>>>
>>>>>>      1) $ 223375750 us |  } /* open_ctree [btrfs] */
>>>>>>
>>>>>> Unfortunately, the default trace buffer per CPU (4K) wasn't sufficient
>>>>>> and, the trace-buffer rolled over.
>>>>>> So we still don't know how long we spent in btrfs_read_block_groups().
>>>>>> Sorry for my mistake we should go 1 step at a time and, we have to do
>>>>>> this until we narrow it down to a specific function.
>>>>>>
>>>>>> Could you please run with the depth = 2 (instead of 3) and use the time
>>>>>> command prefix. Also, pull a new ftracegraph as I have updated it to
>>>>>> display a proper time output.
>>>>>>
>>>>>> $ ftracegraph open_ctree 2 "*:mod:btrfs" "time mount /dev/vg/scratch0
>>>>>> /btrfs"
>>>>>>
>>>>>> Thanks, Anand
>>>>>>
>>>>>> On 29/08/2021 17:42, Jingyun He wrote:
>>>>>>> Hi, Anand
>>>>>>>
>>>>>>> I have attached the file.
>>>>>>> Could you kindly check this?
>>>>>>>
>>>>>>> Thank you.
>>>>>>>
>>>>>>> On Sun, Aug 29, 2021 at 7:47 AM Anand Jain <anand.jain@oracle.com> wrote:
>>>>>>>>
>>>>>>>> On 28/08/2021 19:58, Jingyun He wrote:
>>>>>>>>> Hello, all
>>>>>>>>> I'm new to btrfs, I have a HM-SMR 14TB disk, I have formatted it to
>>>>>>>>> btrfs to store the files.
>>>>>>>>>
>>>>>>>>> When the device is almost full, it needs about 5 mins to mount the device.
>>>>>>>>>
>>>>>>>>> Is it normal? is there any mount option that I can use to reduce the mount time?
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>       We need to figure out the function taking a longer time (maybe it is
>>>>>>>>       read-block-groups). I have similar reports on the non zoned device
>>>>>>>>       as well (with a few TB full of data). But there is no good data yet
>>>>>>>>       to analyse.
>>>>>>>>
>>>>>>>>       Could you please collect the trace data from the ftracegraph
>>>>>>>>       from here [1] (It needs trace-cmd).
>>>>>>>>
>>>>>>>>       [1] https://github.com/asj/btrfstrace.git
>>>>>>>>
>>>>>>>>       Run it as in the example below:
>>>>>>>>
>>>>>>>>       umount /btrfs;
>>>>>>>>
>>>>>>>>       ./ftracegraph open_ctree 3 "*:mod:btrfs" "mount /dev/vg/scratch0 /btrfs"
>>>>>>>>
>>>>>>>>       cat /tmp/ftracegraph.out
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks, Anand
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Thank you.
>>>>>>>>>
>>>>>>>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-09-07  9:53 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-28 11:58 btrfs mount takes too long time Jingyun He
2021-08-28 23:47 ` Anand Jain
     [not found]   ` <CAHQ7scVkHp8Lcfxx2QZXv2ghkW-nYpiFGntyZa0Toz2hU3S-tQ@mail.gmail.com>
2021-08-30  1:27     ` Anand Jain
     [not found]       ` <CAHQ7scUiNLVD_4---ArBet-0DqzfmmH5Y9JgQY0grYrUv8yhiQ@mail.gmail.com>
2021-08-30 13:04         ` Anand Jain
2021-08-30 13:04         ` Anand Jain
     [not found]           ` <CAHQ7scVOYuzz58KcO_fo2rph44CCC46ef=LFVZF8XzoKYq27ig@mail.gmail.com>
2021-08-31 11:09             ` Anand Jain
2021-08-31 13:10               ` Johannes Thumshirn
2021-08-31 16:47                 ` David Sterba
2021-09-07  5:29                   ` Naohiro Aota
     [not found]               ` <CAHQ7scXp9gTANB9gWHqQgXUsnF-YvTOKrK1waUScmqvHU_0ifw@mail.gmail.com>
2021-09-07  9:52                 ` Anand Jain
2021-09-01 16:11           ` Dāvis Mosāns
2021-09-01 21:30             ` Anand Jain
2021-09-02 16:56               ` Dāvis Mosāns
2021-09-03  9:56                 ` Anand Jain
2021-09-02 17:34               ` Joshua
2021-09-01 23:42             ` Qu Wenruo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.