* btrfs check lowmem, take 2 @ 2018-07-10 18:09 Marc MERLIN 2018-07-11 0:53 ` Su Yue 2018-07-11 17:09 ` Chris Murphy 0 siblings, 2 replies; 14+ messages in thread From: Marc MERLIN @ 2018-07-10 18:09 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs; +Cc: Su Yue, Su Yue Thanks to Su and Qu, I was able to get my filesystem to a point that it's mountable. I then deleted loads of snapshots and I'm down to 26. IT now looks like this: gargamel:~# btrfs fi show /mnt/mnt Label: 'dshelf2' uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d Total devices 1 FS bytes used 12.30TiB devid 1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2 gargamel:~# btrfs fi df /mnt/mnt Data, single: total=13.57TiB, used=12.19TiB System, DUP: total=32.00MiB, used=1.55MiB Metadata, DUP: total=124.50GiB, used=115.62GiB Metadata, single: total=216.00MiB, used=0.00B GlobalReserve, single: total=512.00MiB, used=0.00B Problems 1) btrfs check --repair _still_ takes all 32GB of RAM and crashes the server, despite my deleting lots of snapshots. Is it because I have too many files then? 2) I tried Su's master git branch for btrfs-progs to try and see how a normal check would go, and I'm stuck on this: gargamel:/var/local/src/btrfs-progs.sy# time ./btrfsck --mode=lowmem --repair /dev/mapper/dshelf2 enabling repair mode WARNING: low-memory mode repair support is only partial Checking filesystem on /dev/mapper/dshelf2 UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d root 18446744073709551607 has a root item with a more recent gen (143376) compared to the found root node (139061) ERROR: failed to repair root items: Invalid argument real 75m8.046s user 0m14.591s sys 0m52.431s I understand what the message means, I just need to switch to the newer root but honestly I'm not quite sure how to do this from the btrfs-check man page. 
This didn't work:
time ./btrfsck --mode=lowmem --repair --chunk-root=18446744073709551607 /dev/mapper/dshelf2
enabling repair mode
WARNING: low-memory mode repair support is only partial
WARNING: chunk_root_bytenr 18446744073709551607 is unaligned to 4096, ignore it

How do I address the error above?

Thanks
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08

^ permalink raw reply [flat|nested] 14+ messages in thread
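A note on the number passed to --chunk-root above: 18446744073709551607 is what -9 looks like when stored as an unsigned 64-bit value, so it reads as a tree object ID, not as a byte offset on disk. As far as I recall, -9 is BTRFS_DATA_RELOC_TREE_OBJECTID in the btrfs headers, but treat that name as an assumption. --chunk-root expects a byte number, which would explain the 4096-alignment warning. A minimal Python sketch of the arithmetic (the helper names are hypothetical, made up for this illustration):

```python
U64_MODULUS = 1 << 64
SECTORSIZE = 4096  # alignment quoted in the WARNING line


def as_signed64(value):
    """Reinterpret an unsigned 64-bit integer as a signed one."""
    return value - U64_MODULUS if value >= (1 << 63) else value


def is_aligned(bytenr, alignment=SECTORSIZE):
    """True if bytenr is a multiple of alignment."""
    return bytenr % alignment == 0


root_id = 18446744073709551607  # the "root" number from the check output

signed_root_id = as_signed64(root_id)
remainder = root_id % SECTORSIZE

print("as signed 64-bit:", signed_root_id)
print("aligned to 4096: ", is_aligned(root_id))
print("remainder:       ", remainder)
```

The value is not a multiple of 4096, so it cannot be a valid tree block address; the tool rejects it and carries on with the chunk root it finds itself.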
* Re: btrfs check lowmem, take 2
  2018-07-10 18:09 btrfs check lowmem, take 2 Marc MERLIN
@ 2018-07-11  0:53 ` Su Yue
  2018-07-11  0:58   ` Marc MERLIN
  2018-07-11 17:09 ` Chris Murphy
  1 sibling, 1 reply; 14+ messages in thread
From: Su Yue @ 2018-07-11  0:53 UTC (permalink / raw)
  To: Marc MERLIN, Qu Wenruo, linux-btrfs; +Cc: Su Yue

On 07/11/2018 02:09 AM, Marc MERLIN wrote:
> Thanks to Su and Qu, I was able to get my filesystem to a point that
> it's mountable.
> I then deleted loads of snapshots and I'm down to 26.
>
> IT now looks like this:
> gargamel:~# btrfs fi show /mnt/mnt
> Label: 'dshelf2' uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
> Total devices 1 FS bytes used 12.30TiB
> devid 1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2
>
> gargamel:~# btrfs fi df /mnt/mnt
> Data, single: total=13.57TiB, used=12.19TiB
> System, DUP: total=32.00MiB, used=1.55MiB
> Metadata, DUP: total=124.50GiB, used=115.62GiB
> Metadata, single: total=216.00MiB, used=0.00B
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
>
> Problems
> 1) btrfs check --repair _still_ takes all 32GB of RAM and crashes the
> server, despite my deleting lots of snapshots.
> Is it because I have too many files then?
>
Yes. The original check first gathers all information about the extent
tree and your files in RAM, then processes the items one by one.
But deleting snapshots still counts: it does speed up the lowmem check.

> 2) I tried Su's master git branch for btrfs-progs to try and see how

Oh no... My master branch is still at 4.14. The true master branch is
David's, here:
https://github.com/kdave/btrfs-progs
But the master branch has a known bug, which I fixed yesterday; please
see the mail.
Thanks,
Su

> normal check would go, and I'm stuck on this:
> gargamel:/var/local/src/btrfs-progs.sy# time ./btrfsck --mode=lowmem --repair /dev/mapper/dshelf2
> enabling repair mode
> WARNING: low-memory mode repair support is only partial
> Checking filesystem on /dev/mapper/dshelf2
> UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
> root 18446744073709551607 has a root item with a more recent gen (143376) compared to the found root node (139061)
> ERROR: failed to repair root items: Invalid argument

Thanks to that failure, the old version did not do anything wrong.

>
> real 75m8.046s
> user 0m14.591s
> sys 0m52.431s
>
> I understand what the message means, I just need to switch to the newer root
> but honestly I'm not quite sure how to do this from the btrfs-check man page.
>
> This didn't work:
> time ./btrfsck --mode=lowmem --repair --chunk-root=18446744073709551607 /dev/mapper/dshelf2
> enabling repair mode
> WARNING: low-memory mode repair support is only partial
> WARNING: chunk_root_bytenr 18446744073709551607 is unaligned to 4096, ignore it
>
> How do I address the error above?
> Thanks
> Marc
>
* Re: btrfs check lowmem, take 2 2018-07-11 0:53 ` Su Yue @ 2018-07-11 0:58 ` Marc MERLIN 2018-07-11 1:08 ` Su Yue 0 siblings, 1 reply; 14+ messages in thread From: Marc MERLIN @ 2018-07-11 0:58 UTC (permalink / raw) To: Su Yue; +Cc: Qu Wenruo, linux-btrfs, Su Yue On Wed, Jul 11, 2018 at 08:53:58AM +0800, Su Yue wrote: > > Problems > > 1) btrfs check --repair _still_ takes all 32GB of RAM and crashes the > > server, despite my deleting lots of snapshots. > > Is it because I have too many files then? > > > Yes. Original check first gather all infomation about extent tree and > your files in RAM, then process one by one. > But deleting still counts, it does speed lowmem check up. Understood. > > 2) I tried Su's master git branch for btrfs-progs to try and see how > Oh..No... My master branch is still 4.14. The true mater branch is > David's here: > https://github.com/kdave/btrfs-progs > But the master branch has a known bug which I fixed yesterday, please see > the mail. So, if I git sync it now, it should have your fix, and I can run it, correct? Thanks, Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems .... .... what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: btrfs check lowmem, take 2 2018-07-11 0:58 ` Marc MERLIN @ 2018-07-11 1:08 ` Su Yue 2018-07-11 1:44 ` Marc MERLIN 0 siblings, 1 reply; 14+ messages in thread From: Su Yue @ 2018-07-11 1:08 UTC (permalink / raw) To: Marc MERLIN; +Cc: Qu Wenruo, linux-btrfs, Su Yue On 07/11/2018 08:58 AM, Marc MERLIN wrote: > On Wed, Jul 11, 2018 at 08:53:58AM +0800, Su Yue wrote: >>> Problems >>> 1) btrfs check --repair _still_ takes all 32GB of RAM and crashes the >>> server, despite my deleting lots of snapshots. >>> Is it because I have too many files then? >>> >> Yes. Original check first gather all infomation about extent tree and >> your files in RAM, then process one by one. >> But deleting still counts, it does speed lowmem check up. > > Understood. > >>> 2) I tried Su's master git branch for btrfs-progs to try and see how >> Oh..No... My master branch is still 4.14. The true mater branch is >> David's here: >> https://github.com/kdave/btrfs-progs >> But the master branch has a known bug which I fixed yesterday, please see >> the mail. > > So, if I git sync it now, it should have your fix, and I can run it, > correct? > Yes, please. Thanks, Su > Thanks, > Marc > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: btrfs check lowmem, take 2 2018-07-11 1:08 ` Su Yue @ 2018-07-11 1:44 ` Marc MERLIN 2018-07-11 1:58 ` Su Yue 0 siblings, 1 reply; 14+ messages in thread From: Marc MERLIN @ 2018-07-11 1:44 UTC (permalink / raw) To: Su Yue; +Cc: Qu Wenruo, linux-btrfs, Su Yue On Wed, Jul 11, 2018 at 09:08:40AM +0800, Su Yue wrote: > > > On 07/11/2018 08:58 AM, Marc MERLIN wrote: > > On Wed, Jul 11, 2018 at 08:53:58AM +0800, Su Yue wrote: > > > > Problems > > > > 1) btrfs check --repair _still_ takes all 32GB of RAM and crashes the > > > > server, despite my deleting lots of snapshots. > > > > Is it because I have too many files then? > > > > > > > Yes. Original check first gather all infomation about extent tree and > > > your files in RAM, then process one by one. > > > But deleting still counts, it does speed lowmem check up. > > > > Understood. > > > > > > 2) I tried Su's master git branch for btrfs-progs to try and see how > > > Oh..No... My master branch is still 4.14. The true mater branch is > > > David's here: > > > https://github.com/kdave/btrfs-progs > > > But the master branch has a known bug which I fixed yesterday, please see > > > the mail. > > > > So, if I git sync it now, it should have your fix, and I can run it, > > correct? > > > Yes, please. Ok, I am now running gargamel:~# time btrfs check --mode=lowmem --repair /dev/mapper/dshelf2 using git master from https://github.com/kdave/btrfs-progs I will report back how long it takes with extent tree check and whether it returns clean, or not. Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems .... .... what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: btrfs check lowmem, take 2
  2018-07-11  1:44 ` Marc MERLIN
@ 2018-07-11  1:58   ` Su Yue
  2018-07-11  3:36     ` Marc MERLIN
  0 siblings, 1 reply; 14+ messages in thread
From: Su Yue @ 2018-07-11  1:58 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: Qu Wenruo, linux-btrfs, Su Yue

On 07/11/2018 09:44 AM, Marc MERLIN wrote:
> On Wed, Jul 11, 2018 at 09:08:40AM +0800, Su Yue wrote:
>>
>>
>> On 07/11/2018 08:58 AM, Marc MERLIN wrote:
>>> On Wed, Jul 11, 2018 at 08:53:58AM +0800, Su Yue wrote:
>>>>> Problems
>>>>> 1) btrfs check --repair _still_ takes all 32GB of RAM and crashes the
>>>>> server, despite my deleting lots of snapshots.
>>>>> Is it because I have too many files then?
>>>>>
>>>> Yes. Original check first gather all information about extent tree and
>>>> your files in RAM, then process one by one.
>>>> But deleting still counts, it does speed lowmem check up.
>>>
>>> Understood.
>>>
>>>>> 2) I tried Su's master git branch for btrfs-progs to try and see how
>>>> Oh..No... My master branch is still 4.14. The true master branch is
>>>> David's here:
>>>> https://github.com/kdave/btrfs-progs
>>>> But the master branch has a known bug which I fixed yesterday, please see
>>>> the mail.
>>>
>>> So, if I git sync it now, it should have your fix, and I can run it,
>>> correct?
>>>
>> Yes, please.
>
> Ok, I am now running
> gargamel:~# time btrfs check --mode=lowmem --repair /dev/mapper/dshelf2
> using git master from https://github.com/kdave/btrfs-progs
>
Please stop the check, please.

The branch I meant by 'it' is
https://github.com/Damenly/btrfs-progs/tree/tmp1

> I will report back how long it takes with extent tree check and whether
> it returns clean, or not.
>
> Marc
>
* Re: btrfs check lowmem, take 2 2018-07-11 1:58 ` Su Yue @ 2018-07-11 3:36 ` Marc MERLIN 2018-07-11 4:07 ` Su Yue 0 siblings, 1 reply; 14+ messages in thread From: Marc MERLIN @ 2018-07-11 3:36 UTC (permalink / raw) To: Su Yue; +Cc: Qu Wenruo, linux-btrfs, Su Yue On Wed, Jul 11, 2018 at 09:58:36AM +0800, Su Yue wrote: > > > On 07/11/2018 09:44 AM, Marc MERLIN wrote: > > On Wed, Jul 11, 2018 at 09:08:40AM +0800, Su Yue wrote: > > > > > > > > > On 07/11/2018 08:58 AM, Marc MERLIN wrote: > > > > On Wed, Jul 11, 2018 at 08:53:58AM +0800, Su Yue wrote: > > > > > > Problems > > > > > > 1) btrfs check --repair _still_ takes all 32GB of RAM and crashes the > > > > > > server, despite my deleting lots of snapshots. > > > > > > Is it because I have too many files then? > > > > > > > > > > > Yes. Original check first gather all infomation about extent tree and > > > > > your files in RAM, then process one by one. > > > > > But deleting still counts, it does speed lowmem check up. > > > > > > > > Understood. > > > > > > > > > > 2) I tried Su's master git branch for btrfs-progs to try and see how > > > > > Oh..No... My master branch is still 4.14. The true mater branch is > > > > > David's here: > > > > > https://github.com/kdave/btrfs-progs > > > > > But the master branch has a known bug which I fixed yesterday, please see > > > > > the mail. > > > > > > > > So, if I git sync it now, it should have your fix, and I can run it, > > > > correct? > > > > > > > Yes, please. > > > > Ok, I am now running > > gargamel:~# time btrfs check --mode=lowmem --repair /dev/mapper/dshelf2 > > using git master from https://github.com/kdave/btrfs-progs > > > Please stop check, plese. > > The branch 'it' which I mean is > https://github.com/Damenly/btrfs-progs/tree/tmp1 Ok, sorry I thought you said you had pushed your changes to https://github.com/kdave/btrfs-progs yesterday. 
So, I went back to https://github.com/Damenly/btrfs-progs.git/tmp1 and
I'm running it without the extra options you added with hardcoded stuff:
gargamel:/var/local/src/btrfs-progs.sy-test# ./btrfsck --mode=lowmem --repair /dev/mapper/dshelf2

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08
* Re: btrfs check lowmem, take 2 2018-07-11 3:36 ` Marc MERLIN @ 2018-07-11 4:07 ` Su Yue 2018-07-11 4:39 ` Marc MERLIN 0 siblings, 1 reply; 14+ messages in thread From: Su Yue @ 2018-07-11 4:07 UTC (permalink / raw) To: Marc MERLIN; +Cc: Qu Wenruo, linux-btrfs, Su Yue On 07/11/2018 11:36 AM, Marc MERLIN wrote: > On Wed, Jul 11, 2018 at 09:58:36AM +0800, Su Yue wrote: >> >> >> On 07/11/2018 09:44 AM, Marc MERLIN wrote: >>> On Wed, Jul 11, 2018 at 09:08:40AM +0800, Su Yue wrote: >>>> >>>> >>>> On 07/11/2018 08:58 AM, Marc MERLIN wrote: >>>>> On Wed, Jul 11, 2018 at 08:53:58AM +0800, Su Yue wrote: >>>>>>> Problems >>>>>>> 1) btrfs check --repair _still_ takes all 32GB of RAM and crashes the >>>>>>> server, despite my deleting lots of snapshots. >>>>>>> Is it because I have too many files then? >>>>>>> >>>>>> Yes. Original check first gather all infomation about extent tree and >>>>>> your files in RAM, then process one by one. >>>>>> But deleting still counts, it does speed lowmem check up. >>>>> >>>>> Understood. >>>>> >>>>>>> 2) I tried Su's master git branch for btrfs-progs to try and see how >>>>>> Oh..No... My master branch is still 4.14. The true mater branch is >>>>>> David's here: >>>>>> https://github.com/kdave/btrfs-progs >>>>>> But the master branch has a known bug which I fixed yesterday, please see >>>>>> the mail. >>>>> >>>>> So, if I git sync it now, it should have your fix, and I can run it, >>>>> correct? >>>>> >>>> Yes, please. >>> >>> Ok, I am now running >>> gargamel:~# time btrfs check --mode=lowmem --repair /dev/mapper/dshelf2 >>> using git master from https://github.com/kdave/btrfs-progs >>> >> Please stop check, plese. >> >> The branch 'it' which I mean is >> https://github.com/Damenly/btrfs-progs/tree/tmp1 > > Ok, sorry I thought you said you had pushed your changes to https://github.com/kdave/btrfs-progs > yesterday. 
>
> So, I went back to https://github.com/Damenly/btrfs-progs.git/tmp1 and
> I'm running it without the extra options you added with hardcoded stuff:
> gargamel:/var/local/src/btrfs-progs.sy-test# ./btrfsck --mode=lowmem --repair /dev/mapper/dshelf2
>
This is okay. Let's wait to see the result.

Thanks
Su

> Marc
>
* Re: btrfs check lowmem, take 2
  2018-07-11  4:07 ` Su Yue
@ 2018-07-11  4:39   ` Marc MERLIN
  0 siblings, 0 replies; 14+ messages in thread
From: Marc MERLIN @ 2018-07-11  4:39 UTC (permalink / raw)
  To: Su Yue; +Cc: Qu Wenruo, linux-btrfs, Su Yue

On Wed, Jul 11, 2018 at 12:07:05PM +0800, Su Yue wrote:
> > So, I went back to https://github.com/Damenly/btrfs-progs.git/tmp1 and
> > I'm running it without the extra options you added with hardcoded stuff:
> > gargamel:/var/local/src/btrfs-progs.sy-test# ./btrfsck --mode=lowmem --repair /dev/mapper/dshelf2
> >
> This is okay. Let's wait to see the result.

Sadly, it crashes quickly:

Starting program: /var/local/src/btrfs-progs.sy-test/btrfs check --mode=lowmem --repair /dev/mapper/dshelf2
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
enabling repair mode
WARNING: low-memory mode repair support is only partial
Checking filesystem on /dev/mapper/dshelf2
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
checking extents

Program received signal SIGSEGV, Segmentation fault.
check_tree_block_backref (fs_info=fs_info@entry=0x555555825e10, root_id=root_id@entry=18446744073709551607, bytenr=bytenr@entry=655589376, level=level@entry=1) at check/mode-lowmem.c:3744
3744            if (btrfs_header_bytenr(node) != bytenr) {
(gdb) bt
#0  check_tree_block_backref (fs_info=fs_info@entry=0x555555825e10, root_id=root_id@entry=18446744073709551607, bytenr=bytenr@entry=655589376, level=level@entry=1) at check/mode-lowmem.c:3744
#1  0x00005555555cb1f9 in check_extent_item (fs_info=fs_info@entry=0x555555825e10, path=path@entry=0x7fffffffdc60) at check/mode-lowmem.c:4194
#2  0x00005555555d06e9 in check_leaf_items (account_bytes=1, nrefs=0x7fffffffdb80, path=0x7fffffffdc60, root=0x5555558262f0) at check/mode-lowmem.c:4654
#3  walk_down_tree (check_all=1, nrefs=0x7fffffffdb80, level=<synthetic pointer>, path=0x7fffffffdc60, root=0x5555558262f0) at check/mode-lowmem.c:4790
#4  check_btrfs_root (root=root@entry=0x5555558262f0, check_all=check_all@entry=1) at check/mode-lowmem.c:5114
#5  0x00005555555d144f in check_chunks_and_extents_lowmem (fs_info=fs_info@entry=0x555555825e10) at check/mode-lowmem.c:5475
#6  0x00005555555b44b1 in do_check_chunks_and_extents (fs_info=0x555555825e10) at check/main.c:8369
#7  cmd_check (argc=<optimized out>, argv=<optimized out>) at check/main.c:9899
#8  0x0000555555567510 in main (argc=4, argv=0x7fffffffe390) at btrfs.c:302

Would you like anything off gdb? (feel free to Email me directly or
point me to an online chat platform you have access to)

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08
* Re: btrfs check lowmem, take 2
  2018-07-10 18:09 btrfs check lowmem, take 2 Marc MERLIN
  2018-07-11  0:53 ` Su Yue
@ 2018-07-11 17:09 ` Chris Murphy
  2018-07-11 17:14   ` btrfs check mode normal still hard crash-hanging systems Marc MERLIN
  2018-07-12  5:26   ` Why original mode doesn't use swap? (Original: Re: btrfs check lowmem, take 2) Qu Wenruo
  1 sibling, 2 replies; 14+ messages in thread
From: Chris Murphy @ 2018-07-11 17:09 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: Qu Wenruo, Btrfs BTRFS, Su Yue, Su Yue

On Tue, Jul 10, 2018 at 12:09 PM, Marc MERLIN <marc@merlins.org> wrote:
> Thanks to Su and Qu, I was able to get my filesystem to a point that
> it's mountable.
> I then deleted loads of snapshots and I'm down to 26.
>
> IT now looks like this:
> gargamel:~# btrfs fi show /mnt/mnt
> Label: 'dshelf2' uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
> Total devices 1 FS bytes used 12.30TiB
> devid 1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2
>
> gargamel:~# btrfs fi df /mnt/mnt
> Data, single: total=13.57TiB, used=12.19TiB
> System, DUP: total=32.00MiB, used=1.55MiB
> Metadata, DUP: total=124.50GiB, used=115.62GiB
> Metadata, single: total=216.00MiB, used=0.00B
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
>
> Problems
> 1) btrfs check --repair _still_ takes all 32GB of RAM and crashes the
> server, despite my deleting lots of snapshots.
> Is it because I have too many files then?

I think original mode needs most of the metadata in memory.

I don't understand why btrfs check won't use swap the way xfs_repair
does, and I'm pretty sure e2fsck does as well.

Using 128G of swap on NVMe with original check is still gonna be faster
than lowmem mode.

-- 
Chris Murphy
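For readers trying to reconcile the two outputs quoted in this message: if I have the btrfs accounting right, "FS bytes used" in `btrfs fi show` is the sum of the `used=` figures from `btrfs fi df`, while the per-device "used" is the sum of the `total=` figures with DUP block groups counted twice. The sketch below is only a cross-check of the quoted numbers under that assumption; GlobalReserve is left out on the assumption that it is carved out of metadata space instead of being a separate allocation.

```python
MIB = 1
GIB = 1024 * MIB
TIB = 1024 * GIB

# (label, copies stored per chunk, total, used), all sizes in MiB,
# copied from the "btrfs fi df /mnt/mnt" output quoted above.
block_groups = [
    ("Data, single",     1, 13.57 * TIB,  12.19 * TIB),
    ("System, DUP",      2, 32.00 * MIB,  1.55 * MIB),
    ("Metadata, DUP",    2, 124.50 * GIB, 115.62 * GIB),
    ("Metadata, single", 1, 216.00 * MIB, 0.0),
]

# Logical bytes in use: every block group counted once.
logical_used_tib = sum(used for _, _, _, used in block_groups) / TIB

# Raw device space allocated: DUP chunks occupy two copies on disk.
raw_allocated_tib = sum(
    copies * total for _, copies, total, _ in block_groups
) / TIB

print(f"logical used:  {logical_used_tib:.2f} TiB")
print(f"raw allocated: {raw_allocated_tib:.2f} TiB")
```

Both results land on the figures that `btrfs fi show` reports (12.30TiB used, 13.81TiB allocated on devid 1), which also shows that metadata is about 115.62GiB of logical data on a machine with 32GB of RAM.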
* Re: btrfs check mode normal still hard crash-hanging systems 2018-07-11 17:09 ` Chris Murphy @ 2018-07-11 17:14 ` Marc MERLIN 2018-07-12 5:26 ` Why original mode doesn't use swap? (Original: Re: btrfs check lowmem, take 2) Qu Wenruo 1 sibling, 0 replies; 14+ messages in thread From: Marc MERLIN @ 2018-07-11 17:14 UTC (permalink / raw) To: Chris Murphy; +Cc: Qu Wenruo, dsterba, Btrfs BTRFS, Su Yue, Su Yue On Wed, Jul 11, 2018 at 11:09:56AM -0600, Chris Murphy wrote: > On Tue, Jul 10, 2018 at 12:09 PM, Marc MERLIN <marc@merlins.org> wrote: > > Thanks to Su and Qu, I was able to get my filesystem to a point that > > it's mountable. > > I then deleted loads of snapshots and I'm down to 26. > > > > IT now looks like this: > > gargamel:~# btrfs fi show /mnt/mnt > > Label: 'dshelf2' uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d > > Total devices 1 FS bytes used 12.30TiB > > devid 1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2 > > > > gargamel:~# btrfs fi df /mnt/mnt > > Data, single: total=13.57TiB, used=12.19TiB > > System, DUP: total=32.00MiB, used=1.55MiB > > Metadata, DUP: total=124.50GiB, used=115.62GiB > > Metadata, single: total=216.00MiB, used=0.00B > > GlobalReserve, single: total=512.00MiB, used=0.00B > > > > > > Problems > > 1) btrfs check --repair _still_ takes all 32GB of RAM and crashes the > > server, despite my deleting lots of snapshots. > > Is it because I have too many files then? > > I think originally needs most of metdata in memory. > > I'm not understanding why btrfs check won't use swap like at least > xfs_repair and pretty sure e2fsck will as well. > > Using 128G swap on nvme with original check is still gonna be faster > than lowmem mode. Yeah, that's been also a concern/question of mine all these years, even if Su isn't working on that code, and likely is the wrong person to ask. Personally, my take is that if btrfs wants to be taken seriously, at the very least its fsck tool should not hard crash a system you run it on. 
(And it really does the worst kind of hard crash I've ever seen: OOM
can't trigger fast enough and Linux doesn't panic, so it can't self
reboot either; it just hard dies and hangs.)

Maybe David knows?

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
* Why original mode doesn't use swap? (Original: Re: btrfs check lowmem, take 2) 2018-07-11 17:09 ` Chris Murphy 2018-07-11 17:14 ` btrfs check mode normal still hard crash-hanging systems Marc MERLIN @ 2018-07-12 5:26 ` Qu Wenruo 2018-07-12 23:14 ` Marc MERLIN 1 sibling, 1 reply; 14+ messages in thread From: Qu Wenruo @ 2018-07-12 5:26 UTC (permalink / raw) To: Chris Murphy, Marc MERLIN; +Cc: Btrfs BTRFS, Su Yue, Su Yue On 2018年07月12日 01:09, Chris Murphy wrote: > On Tue, Jul 10, 2018 at 12:09 PM, Marc MERLIN <marc@merlins.org> wrote: >> Thanks to Su and Qu, I was able to get my filesystem to a point that >> it's mountable. >> I then deleted loads of snapshots and I'm down to 26. >> >> IT now looks like this: >> gargamel:~# btrfs fi show /mnt/mnt >> Label: 'dshelf2' uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d >> Total devices 1 FS bytes used 12.30TiB >> devid 1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2 >> >> gargamel:~# btrfs fi df /mnt/mnt >> Data, single: total=13.57TiB, used=12.19TiB >> System, DUP: total=32.00MiB, used=1.55MiB >> Metadata, DUP: total=124.50GiB, used=115.62GiB >> Metadata, single: total=216.00MiB, used=0.00B >> GlobalReserve, single: total=512.00MiB, used=0.00B >> >> >> Problems >> 1) btrfs check --repair _still_ takes all 32GB of RAM and crashes the >> server, despite my deleting lots of snapshots. >> Is it because I have too many files then? > > I think originally needs most of metdata in memory. > > I'm not understanding why btrfs check won't use swap like at least > xfs_repair and pretty sure e2fsck will as well. I don't understand either. Isn't memory from malloc() swappable? Thanks, Qu > > Using 128G swap on nvme with original check is still gonna be faster > than lowmem mode. > > > > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Why original mode doesn't use swap? (Original: Re: btrfs check lowmem, take 2)
  2018-07-12  5:26 ` Why original mode doesn't use swap? (Original: Re: btrfs check lowmem, take 2) Qu Wenruo
@ 2018-07-12 23:14   ` Marc MERLIN
  2018-07-13  0:22     ` Qu Wenruo
  0 siblings, 1 reply; 14+ messages in thread
From: Marc MERLIN @ 2018-07-12 23:14 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Chris Murphy, Btrfs BTRFS, Su Yue, Su Yue

On Thu, Jul 12, 2018 at 01:26:41PM +0800, Qu Wenruo wrote:
>
>
> On 2018-07-12 01:09, Chris Murphy wrote:
> > On Tue, Jul 10, 2018 at 12:09 PM, Marc MERLIN <marc@merlins.org> wrote:
> >> Thanks to Su and Qu, I was able to get my filesystem to a point that
> >> it's mountable.
> >> I then deleted loads of snapshots and I'm down to 26.
> >>
> >> IT now looks like this:
> >> gargamel:~# btrfs fi show /mnt/mnt
> >> Label: 'dshelf2' uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
> >> Total devices 1 FS bytes used 12.30TiB
> >> devid 1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2
> >>
> >> gargamel:~# btrfs fi df /mnt/mnt
> >> Data, single: total=13.57TiB, used=12.19TiB
> >> System, DUP: total=32.00MiB, used=1.55MiB
> >> Metadata, DUP: total=124.50GiB, used=115.62GiB
> >> Metadata, single: total=216.00MiB, used=0.00B
> >> GlobalReserve, single: total=512.00MiB, used=0.00B
> >>
> >>
> >> Problems
> >> 1) btrfs check --repair _still_ takes all 32GB of RAM and crashes the
> >> server, despite my deleting lots of snapshots.
> >> Is it because I have too many files then?
> >
> > I think originally needs most of metdata in memory.
> >
> > I'm not understanding why btrfs check won't use swap like at least
> > xfs_repair and pretty sure e2fsck will as well.
>
> I don't understand either.
>
> Isn't memory from malloc() swappable?

I never looked at the code and why/how it crashes, but my guess was
that it somehow causes the kernel to grab a lot of memory in the btrfs
driver and that is what is crashing the system.

If it were just malloc() in the btrfs user space tool, it should be both
swappable like you said, and should also get OOM'ed.

I suppose I can still be completely wrong, but I can't find another
logical explanation.

I just tried running it again to trigger the problem, but because I
freed a lot of snapshots, btrfs check --repair goes back to only using
10GB instead of 32GB, so I wasn't able to replicate OOM for you.

Incidentally, it died with:
gargamel:~# btrfs check --repair /dev/mapper/dshelf2
enabling repair mode
Checking filesystem on /dev/mapper/dshelf2
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
root 18446744073709551607 has a root item with a more recent gen (143376) compared to the found root node (139061)
ERROR: failed to repair root items: Invalid argument

That said, when it was using a fair amount of RAM, I captured this:
USER       PID %CPU %MEM     VSZ     RSS TTY      STAT START   TIME COMMAND
root      1376  1.4 25.2 8256368 8240392 pts/18   R+   14:52   1:07 btrfs check --repair /dev/mapper/dshelf2

I don't know how to read /proc/meminfo, but that's what it said:
MemTotal:       32643792 kB
MemFree:         1367516 kB
MemAvailable:   15554836 kB
Buffers:         3491672 kB
Cached:         15900320 kB
SwapCached:         2092 kB
Active:         14577228 kB
Inactive:       15028608 kB
Active(anon):   12122180 kB
Inactive(anon):  2643176 kB
Active(file):    2455048 kB
Inactive(file): 12385432 kB
Unevictable:        8068 kB
Mlocked:            8068 kB
SwapTotal:      15616764 kB < swap was totally unused and stays unused when I get the system to crash
SwapFree:       15578020 kB
Dirty:             71956 kB
Writeback:            64 kB
AnonPages:      10219976 kB
Mapped:          4033568 kB
Shmem:           4545552 kB
Slab:             713300 kB
SReclaimable:     395508 kB
SUnreclaim:       317792 kB
KernelStack:       11788 kB
PageTables:        52592 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    31938660 kB
Committed_AS:   20070736 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
CmaTotal:          16384 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:     1207572 kB
DirectMap2M:    32045056 kB

Does it help figure out where the memory was going and whether kernel
memory was being used?

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08
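On reading the /proc/meminfo dump above: each line is a key, a value, and usually a "kB" unit, so a few lines of Python are enough to pull out the numbers that matter for the swap question. This is a minimal sketch that works on an excerpt pasted from the message, not on a live system; `parse_meminfo` is a hypothetical helper written for this illustration, and the RSS value comes from the `ps` line quoted above.

```python
MEMINFO_EXCERPT = """\
MemTotal: 32643792 kB
MemFree: 1367516 kB
MemAvailable: 15554836 kB
SwapTotal: 15616764 kB < swap was totally unused
SwapFree: 15578020 kB
AnonPages: 10219976 kB
Slab: 713300 kB
SUnreclaim: 317792 kB
HugePages_Total: 0
"""


def parse_meminfo(text):
    """Return {key: integer value} for every 'Key: value [kB]' line."""
    values = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        # Keep only the leading number; ignore the unit and any trailing notes.
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


info = parse_meminfo(MEMINFO_EXCERPT)

swap_used_kb = info["SwapTotal"] - info["SwapFree"]
btrfs_check_rss_kb = 8240392  # RSS column of the ps output above
rss_pct = 100.0 * btrfs_check_rss_kb / info["MemTotal"]

print(f"swap in use:          {swap_used_kb} kB")
print(f"btrfs check RSS:      {rss_pct:.1f}% of MemTotal")
print(f"unreclaimable slab:   {info['SUnreclaim']} kB")
```

On this snapshot swap use is a few tens of MB, the process RSS matches the 25.2 %MEM that `ps` reports, and unreclaimable kernel slab is small. Keep in mind that the snapshot was taken while the check was using about 8GB, not at the moment of a crash.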
* Re: Why original mode doesn't use swap? (Original: Re: btrfs check lowmem, take 2) 2018-07-12 23:14 ` Marc MERLIN @ 2018-07-13 0:22 ` Qu Wenruo 0 siblings, 0 replies; 14+ messages in thread From: Qu Wenruo @ 2018-07-13 0:22 UTC (permalink / raw) To: Marc MERLIN; +Cc: Chris Murphy, Btrfs BTRFS, Su Yue, Su Yue On 2018年07月13日 07:14, Marc MERLIN wrote: > On Thu, Jul 12, 2018 at 01:26:41PM +0800, Qu Wenruo wrote: >> >> >> On 2018年07月12日 01:09, Chris Murphy wrote: >>> On Tue, Jul 10, 2018 at 12:09 PM, Marc MERLIN <marc@merlins.org> wrote: >>>> Thanks to Su and Qu, I was able to get my filesystem to a point that >>>> it's mountable. >>>> I then deleted loads of snapshots and I'm down to 26. >>>> >>>> IT now looks like this: >>>> gargamel:~# btrfs fi show /mnt/mnt >>>> Label: 'dshelf2' uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d >>>> Total devices 1 FS bytes used 12.30TiB >>>> devid 1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2 >>>> >>>> gargamel:~# btrfs fi df /mnt/mnt >>>> Data, single: total=13.57TiB, used=12.19TiB >>>> System, DUP: total=32.00MiB, used=1.55MiB >>>> Metadata, DUP: total=124.50GiB, used=115.62GiB >>>> Metadata, single: total=216.00MiB, used=0.00B >>>> GlobalReserve, single: total=512.00MiB, used=0.00B >>>> >>>> >>>> Problems >>>> 1) btrfs check --repair _still_ takes all 32GB of RAM and crashes the >>>> server, despite my deleting lots of snapshots. >>>> Is it because I have too many files then? >>> >>> I think originally needs most of metdata in memory. >>> >>> I'm not understanding why btrfs check won't use swap like at least >>> xfs_repair and pretty sure e2fsck will as well. >> >> I don't understand either. >> >> Isn't memory from malloc() swappable? > > I never looked at the code and why/how it crashes, but my guess was > that it somehow causes the kernel to grab a lot of memory in the btrfs > driver and that is what is what is crashing the system. 
Btrfs check runs entirely in user space, so it should not be related to
the kernel btrfs module.

> If it were just malloc() the btrfs user space tool, it should be both
> swappable like you said, and should also get OOM'ed.

That's the case, but then why can the xfs/ext check tools take up tons
of swap without getting killed by OOM?

>
> I suppose I can still be completely wrong, but I can't find another
> logical explanation.
>
> I just tried running it again to trigger the problem, but because I
> freed a lot of snapshots, btrfs check --repair goes back to only using
> 10GB instead of 32GB, so I wasn't able to replicate OOM for you.

At least that's good news for you.

>
> Incidentally, it died with:
> gargamel:~# btrfs check --repair /dev/mapper/dshelf2
> enabling repair mode
> Checking filesystem on /dev/mapper/dshelf2
> UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
> root 18446744073709551607 has a root item with a more recent gen (143376) compared to the found
> root node (139061)
> ERROR: failed to repair root items: Invalid argument
>
> That said, when it was using a fair amount of RAM, I captured this:
> USER       PID %CPU %MEM     VSZ     RSS TTY      STAT START   TIME COMMAND
> root      1376  1.4 25.2 8256368 8240392 pts/18   R+   14:52   1:07 btrfs check --repair /dev/mapper/dshelf2
>
> I don't know how to read /proc/meminfo, but that's what it said:
> MemTotal:       32643792 kB
> MemFree:         1367516 kB
> MemAvailable:   15554836 kB
> Buffers:         3491672 kB
> Cached:         15900320 kB
> SwapCached:         2092 kB
> Active:         14577228 kB
> Inactive:       15028608 kB
> Active(anon):   12122180 kB
> Inactive(anon):  2643176 kB
> Active(file):    2455048 kB
> Inactive(file): 12385432 kB
> Unevictable:        8068 kB
> Mlocked:            8068 kB
> SwapTotal:      15616764 kB < swap was totally unused and stays unused when I get the system to crash
> SwapFree:       15578020 kB
> Dirty:             71956 kB
> Writeback:            64 kB
> AnonPages:      10219976 kB
> Mapped:          4033568 kB
> Shmem:           4545552 kB
> Slab:             713300 kB
> SReclaimable:     395508 kB
> SUnreclaim:       317792 kB
> KernelStack:       11788 kB
> PageTables:        52592 kB
> NFS_Unstable:          0 kB
> Bounce:                0 kB
> WritebackTmp:          0 kB
> CommitLimit:    31938660 kB
> Committed_AS:   20070736 kB
> VmallocTotal:   34359738367 kB
> VmallocUsed:           0 kB
> VmallocChunk:          0 kB
> HardwareCorrupted:     0 kB
> AnonHugePages:         0 kB
> ShmemHugePages:        0 kB
> ShmemPmdMapped:        0 kB
> CmaTotal:          16384 kB
> CmaFree:               0 kB
> HugePages_Total:       0
> HugePages_Free:        0
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:       2048 kB
> Hugetlb:               0 kB
> DirectMap4k:     1207572 kB
> DirectMap2M:    32045056 kB
>
> Does it help figure out where the memory was going and whether kernel
> memory was being used?

Not really; it is much like what I observed.
I also tried to over-commit memory on my system, but it just froze for
several seconds and then the process got killed by OOM, so I failed to
capture any useful info during that freeze.

Thanks,
Qu

>
> Marc
>