From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: John Ettedgui <john.ettedgui@gmail.com>,
	Austin S Hemmelgarn <ahferroin7@gmail.com>
Cc: btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
Date: Fri, 15 Jul 2016 13:24:45 +0800
Message-ID: <fd718727-9b54-ba58-5f39-63066181d13e@cn.fujitsu.com>
In-Reply-To: <CAJ3TwYRXwDVVfT0TRRiM9dEw-7TvY8qG=WvMYKczZOv6wkFWAQ@mail.gmail.com>



At 07/15/2016 12:39 PM, John Ettedgui wrote:
> On Thu, Jul 14, 2016 at 8:56 PM Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>
>     Sorry for the late reply.
>
> Oh it's all good, it's only been a few days.
>
>     [Slow mount]
>     In fact we have also reproduced the same problem, and found the cause.
>
> Awesome!
>
>     It's related to the size of extent tree.
>
>     If the extent tree is large enough, mount needs to do quite a lot of IO
>     to read out all block group items.
>     Such reads are small random reads (the default leaf size is just 16K),
>     and considering the per-GB cost, spinning rust is the normal choice for
>     such a large fs, which makes small random reads even slower.
>
>
>     The good news is, we have a patch to slightly speed up the mount, by
>     avoiding reading out unrelated tree blocks.
>
>     In our test environment, it takes 15% less time to mount a fs filled
>     with 16K files (2T of used space).
>
>     https://patchwork.kernel.org/patch/9021421/
>
>
> Great, I will try this and report on it.
>
>     And since only the extent tree size is related to the problem, any
>     method that reduces the extent tree size will help, including defrag
>     and nodatacow.
>
> Would increasing the leaf size help as well?
It may help.
But I haven't tested it, and since the leafsize can only be set at mkfs
time, it's not an easy thing to try.
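For reference, a sketch of picking a larger leaf size at creation time. In current btrfs-progs the leaf size is the same value as the nodesize and is set with -n; 32768 is only an example value and /dev/sdX is a placeholder. This reformats the device, which is why it can't simply be tried on an existing fs.

```shell
# WARNING: mkfs.btrfs destroys everything on the target device.
# /dev/sdX is a placeholder; 32768 (32K) is an example nodesize,
# which must be a power of two and at most 65536 (64K).
mkfs.btrfs -n 32768 /dev/sdX
```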

> nodatacow seems unsafe
Nodatacow is not that unsafe, as btrfs will still do data CoW when it's
needed, e.g. when rewriting data shared with another subvolume/snapshot.

That would be one of the most obvious methods if you do a lot of rewrites.
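A minimal sketch of applying nodatacow only where the rewrites happen, instead of to the whole fs. The paths are hypothetical. Files created in a directory after it gets the No_COW (C) attribute inherit it; setting it on a file that already has data is not guaranteed to take effect. Keep in mind that nodatacow data also loses data checksums and compression.

```shell
# Hypothetical paths; adjust to your own layout.
# New files created inside this directory inherit the No_COW attribute.
mkdir /mnt/data/vm-images
chattr +C /mnt/data/vm-images
lsattr -d /mnt/data/vm-images   # the 'C' flag should be listed

# Filesystem-wide alternative (affects newly created files):
# mount -o nodatacow /dev/sdX /mnt/data
```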

> as for defrag, all my partitions are already on autodefrag, so I
> assume that should be good. Or is a manual defrag once in a while a
> good idea as well?
AFAIK autodefrag will only help if you're doing appending writes.

A manual defrag will help more, but since btrfs has problems
defragmenting extents shared by different subvolumes, I doubt how much
it will help if you have a lot of subvolumes/snapshots.
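A minimal sketch of a manual, recursive defrag; /mnt/data is a hypothetical mount point. Defragmenting extents that are shared with snapshots can unshare them and increase space usage, which is related to the caveat above.

```shell
# /mnt/data is a placeholder for the mounted btrfs path.
# -r: recurse into directories and defragment the files found there.
btrfs filesystem defragment -r /mnt/data
```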


Another method is to disable compression.
With compression, the maximum file extent size is 128K, while without
compression it's 128M.

So the same 1G file would need 8K extents with compression, but only 8
extents without compression.
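A sketch of that arithmetic, using a made-up helper named extents_needed and assuming every extent is filled up to the maximum size; real files written in small pieces will usually end up with more extents.

```shell
#!/bin/sh
# Hypothetical helper: minimum number of data extents for one file,
# assuming each extent is filled to the maximum extent size.
extents_needed() {
    file_size=$1    # bytes
    max_extent=$2   # bytes
    # Round up: a partial last extent still counts as one extent.
    echo $(( (file_size + max_extent - 1) / max_extent ))
}

one_gib=$((1024 * 1024 * 1024))
echo "compressed:   $(extents_needed "$one_gib" $((128 * 1024)))"
echo "uncompressed: $(extents_needed "$one_gib" $((128 * 1024 * 1024)))"
```

For the 1G file this prints 8192 (8K) for the compressed case and 8 for the uncompressed one.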

>
> Is there a way to display the tree size? That would help in knowing
> what worked and what didn't.

You can dump the whole extent tree to get the accurate size:

# btrfs-debug-tree -t 2 <your dev> > some_file

It may be quite long, so redirecting the output to a file is highly
recommended.
You can do it online (mounted), but if the fs is very large, it's
recommended to do it offline (unmounted), or at least make sure there
are not many writes while it's mounted.

Check the first few lines and you can already get a rough idea of the
overall size:

------
btrfs-progs v4.6.1
extent tree key (EXTENT_TREE ROOT_ITEM 0)
node 30441472 level *1* items 41 free 452 generation 7 owner 2
------

If the level is high (7 is the highest possible value), that is almost
certainly the problem.
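If you'd rather not eyeball the output, a small helper can pull the root level out of the dump. This is a sketch with a made-up name, extent_root_level, and it assumes the format of the sample above: the root is the first line starting with "node" or "leaf", node lines carry "level N", and leaf lines (level 0) don't. Other btrfs-progs versions may format this differently.

```shell
#!/bin/sh
# Hypothetical helper: print the level of the first tree block in a
# btrfs-debug-tree dump (the extent tree root in the sample above).
# Exits non-zero if no node/leaf line is found.
extent_root_level() {
    awk '
        /^leaf / { print 0; found = 1; exit }
        /^node / {
            for (i = 1; i < NF; i++)
                if ($i == "level") {
                    lvl = $(i + 1)
                    gsub(/\*/, "", lvl)   # tolerate emphasis such as *1*
                    print lvl
                    found = 1
                    exit
                }
        }
        END { if (!found) exit 1 }
    ' "$1"
}
```

Usage: extent_root_level some_file, then compare the result against the maximum of 7.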

For the accurate size, use the following script to count the extent tree
blocks:

------
$ egrep -e "^node" -e "^leaf" some_file | wc -l
------

Then multiply it by the nodesize to get the accurate size of the extent tree.
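The count-and-multiply step can be wrapped into one hypothetical helper, extent_tree_bytes. It assumes every tree block in the dump starts with a "node " or "leaf " line, and it defaults the nodesize to 16384 (the 16K default leaf size mentioned above); pass the real nodesize as the second argument if the fs was created with a different one.

```shell
#!/bin/sh
# Hypothetical helper: estimated extent tree size in bytes from a
# btrfs-debug-tree dump = (number of node/leaf lines) * nodesize.
extent_tree_bytes() {
    dump=$1
    nodesize=${2:-16384}   # assumed default; override if needed
    # grep -c prints 0 (and returns 1) when nothing matches.
    blocks=$(grep -c -E '^(node|leaf) ' "$dump")
    echo $(( blocks * nodesize ))
}
```

Usage: extent_tree_bytes some_file, or extent_tree_bytes some_file 32768 for a 32K nodesize.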

Thanks,
Qu
>
>
>     [Btrfsck OOM]
>     Lu Fengqi is developing a low memory usage mode for btrfsck.
>     It's not merged into mainline btrfs-progs yet and not complete, but
>     it shows quite positive results for large filesystems.
>
>     It may need some time to become stable, but IMHO it's going in the
>     right direction.
>
> Well that is great news as well, thank you for sharing it!
>
>     Thanks,
>     Qu
>
>
> Thank you!
> John


