* btrfs-progs reports nonsense scrub status
From: Andrew Pam @ 2020-05-05  5:46 UTC
To: linux-btrfs

$ sudo btrfs scrub status /home
UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
	no stats available
Time left:        16964119:40:20
ETA:              Mon Aug  8 23:23:14 3955
Total to scrub:   7.30TiB
Bytes scrubbed:   10.20TiB
Rate:             288.06MiB/s
Error summary:    no errors found

Cheers,
	Andrew

* Re: btrfs-progs reports nonsense scrub status
From: Graham Cobb @ 2020-05-05  9:51 UTC
To: Andrew Pam, linux-btrfs

On 05/05/2020 06:46, Andrew Pam wrote:
> $ sudo btrfs scrub status /home
> UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
> 	no stats available
> Time left:        16964119:40:20
> ETA:              Mon Aug  8 23:23:14 3955
> Total to scrub:   7.30TiB
> Bytes scrubbed:   10.20TiB
> Rate:             288.06MiB/s
> Error summary:    no errors found

btrfs-progs version? (output from 'btrfs version')

Is there actually a scrub in progress?

Presumably there is no stats file? Try
'cat /var/lib/btrfs/scrub.status.85069ce9-be06-4c92-b8c1-8a0f685e43c6'

Is this a multi-device filesystem? See what 'btrfs scrub status -d /home' says.

* Re: btrfs-progs reports nonsense scrub status
From: Andrew Pam @ 2020-05-05 10:10 UTC
To: Graham Cobb, linux-btrfs

On 5/5/20 7:51 pm, Graham Cobb wrote:
> On 05/05/2020 06:46, Andrew Pam wrote:
>> $ sudo btrfs scrub status /home
>> UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
>> 	no stats available
>> Time left:        16964119:40:20
>> ETA:              Mon Aug  8 23:23:14 3955
>> Total to scrub:   7.30TiB
>> Bytes scrubbed:   10.20TiB
>> Rate:             288.06MiB/s
>> Error summary:    no errors found
>
> btrfs-progs version? (output from 'btrfs version')

btrfs-progs v5.4.1

> Is there actually a scrub in progress?

Yes.

> Presumably there is no stats file? Try
> 'cat /var/lib/btrfs/scrub.status.85069ce9-be06-4c92-b8c1-8a0f685e43c6'

scrub status:1
85069ce9-be06-4c92-b8c1-8a0f685e43c6:1|data_extents_scrubbed:116366415|tree_extents_scrubbed:1154955|data_bytes_scrubbed:7193709023232|tree_bytes_scrubbed:18922782720|read_errors:0|csum_errors:0|verify_errors:0|no_csum:363202|csum_discards:0|super_errors:0|malloc_errors:0|uncorrectable_errors:0|corrected_errors:0|last_physical:0|t_start:1588531014|t_resumed:0|duration:43505|canceled:0|finished:0
85069ce9-be06-4c92-b8c1-8a0f685e43c6:2|data_extents_scrubbed:99457032|tree_extents_scrubbed:899136|data_bytes_scrubbed:6169469337600|tree_bytes_scrubbed:14731444224|read_errors:0|csum_errors:0|verify_errors:0|no_csum:306304|csum_discards:0|super_errors:0|malloc_errors:0|uncorrectable_errors:0|corrected_errors:0|last_physical:0|t_start:0|t_resumed:0|duration:37079|canceled:0|finished:0

> Is this a multi-device filesystem?

Yes.

> See what 'btrfs scrub status -d /home' says.

UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
scrub device /dev/sda (id 1) status
	Scrub resumed:    Tue May  5 20:08:45 2020
	Status:           running
	Duration:         12:05:35
	Time left:        30907070:08:51
	ETA:              Sun Mar 17 11:18:10 5546
	Total to scrub:   3.66TiB
	Bytes scrubbed:   6.56TiB
	Rate:             158.11MiB/s
	Error summary:    no errors found
scrub device /dev/sdb (id 2) status
	no stats available
	Time left:        30720615:37:35
	ETA:              Mon Dec  8 12:46:54 5524
	Total to scrub:   3.66TiB
	Bytes scrubbed:   5.63TiB
	Rate:             159.07MiB/s
	Error summary:    no errors found

Cheers,
	Andrew

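[Editor's note: the /var/lib/btrfs scrub.status file pasted above is a series of pipe-delimited per-device records and can be picked apart mechanically. A minimal sketch follows; the field names come from the dump above, but the parser itself is illustrative and not part of btrfs-progs.]

```python
# Sketch: parse the pipe-delimited per-device records of a btrfs
# scrub.status file. Record layout assumed from the dump in this
# thread: "<UUID>:<devid>|key:value|key:value|...".

def parse_scrub_status(text):
    """Return {devid: {field: int_value}} from a scrub.status file body."""
    devices = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # skip the "scrub status:1" header line
        head, *fields = line.split("|")
        devid = int(head.rsplit(":", 1)[1])  # head is "<UUID>:<devid>"
        stats = {}
        for field in fields:
            key, _, value = field.partition(":")
            stats[key] = int(value)
        devices[devid] = stats
    return devices

sample = (
    "scrub status:1\n"
    "85069ce9-be06-4c92-b8c1-8a0f685e43c6:1"
    "|data_bytes_scrubbed:7193709023232|last_physical:0|finished:0"
)
print(parse_scrub_status(sample)[1]["data_bytes_scrubbed"])  # -> 7193709023232
```

Note that in the dump above both devices show finished:0 and last_physical:0 even mid-scrub, which is itself suspicious.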
* Re: btrfs-progs reports nonsense scrub status
From: Andrew Pam @ 2020-05-05 20:39 UTC
To: Graham Cobb, linux-btrfs

On 5/5/20 7:51 pm, Graham Cobb wrote:
> Is there actually a scrub in progress?

The scrub has been going for a couple of days now, and has scrubbed
considerably more data than exists on the disks. Will it ever finish?

Thanks,
	Andrew

* Re: btrfs-progs reports nonsense scrub status
From: Chris Murphy @ 2020-05-06 23:42 UTC
To: Andrew Pam
Cc: Graham Cobb, Btrfs BTRFS

On Tue, May 5, 2020 at 2:39 PM Andrew Pam <andrew@sericyb.com.au> wrote:
>
> On 5/5/20 7:51 pm, Graham Cobb wrote:
> > Is there actually a scrub in progress?
>
> The scrub has been going for a couple of days now, and has scrubbed
> considerably more data than exists on the disks. Will it ever finish?

A raid1 volume has twice as many bytes to scrub as the data reported by df.

Can you tell us what kernel version? And also what you get for:

$ sudo btrfs fi us /mp/
$ df -h

I'm using progs 5.6 and kernel 5.6.8 for this:

$ sudo btrfs scrub status /mnt/third
UUID:
Scrub resumed:    Tue May  5 08:45:41 2020
Status:           finished
Duration:         2:41:12
Total to scrub:   759.43GiB
Rate:             79.57MiB/s
Error summary:    no errors found

$ sudo btrfs fi us /mnt/third
Overall:
    Device size:		 931.49GiB
    Device allocated:		 762.02GiB
    Device unallocated:		 169.48GiB
    Device missing:		     0.00B
    Used:			 759.43GiB
    Free (estimated):		  85.17GiB	(min: 85.17GiB)
    Data ratio:			      2.00
    Metadata ratio:		      2.00
    Global reserve:		 512.00MiB	(used: 0.00B)

Data,RAID1: Size:379.00GiB, Used:378.56GiB (99.89%)
   /dev/mapper/sdd	 379.00GiB
   /dev/mapper/sdc	 379.00GiB

Metadata,RAID1: Size:2.00GiB, Used:1.15GiB (57.57%)
   /dev/mapper/sdd	   2.00GiB
   /dev/mapper/sdc	   2.00GiB

System,RAID1: Size:8.00MiB, Used:80.00KiB (0.98%)
   /dev/mapper/sdd	   8.00MiB
   /dev/mapper/sdc	   8.00MiB

Unallocated:
   /dev/mapper/sdd	  84.74GiB
   /dev/mapper/sdc	  84.74GiB

$ df -h
...
/dev/mapper/sdd  466G  381G   86G  82% /mnt/third

I think what you're seeing is a bug. Most of the size reporting in
btrfs commands is in btrfs-progs; whereas the scrub is initiated by
user space, most of the work is done by the kernel. But I don't know
where the tracking code is.

Some of the sizes you have to infer from perspective, and there is no
one correct perspective, so it's normal to get a bit confused about
the convention that applies. On mdadm/lvm raid1, the mirror isn't
included in any of the space reporting, so it seems like only half the
storage is being reported. Meanwhile Btrfs reports all of the storage,
and variably shows data taking up twice as much space (as literally
behind the scenes each block group of extents has a mirror).

--
Chris Murphy

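[Editor's note: Chris's raid1 accounting point reduces to a one-line relationship, sketched below with the numbers from his example. The function name is illustrative, not a btrfs interface.]

```python
# With btrfs raid1 every block group has a mirror, so a scrub reads
# both copies: "Total to scrub" is roughly the df-reported usage
# multiplied by the data ratio (2.0 for raid1).

def total_to_scrub_gib(df_used_gib, data_ratio=2.0):
    return df_used_gib * data_ratio

# Chris's volume: df shows ~381G used per copy, 'btrfs fi us' shows
# Used 759.43GiB, and scrub reports "Total to scrub: 759.43GiB".
print(total_to_scrub_gib(379.7))  # -> 759.4
```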
* Re: btrfs-progs reports nonsense scrub status
From: Andrew Pam @ 2020-05-07  1:09 UTC
To: Chris Murphy
Cc: Graham Cobb, Btrfs BTRFS

On 7/5/20 9:42 am, Chris Murphy wrote:
> A raid1 volume has twice as many bytes to scrub as the data reported
> by df.

It's scrubbed more than twice as many bytes, though.

> Can you tell us what kernel version?

5.4.0

> And also what you get for:
> $ sudo btrfs fi us /mp/

Overall:
    Device size:		  10.92TiB
    Device allocated:		   7.32TiB
    Device unallocated:		   3.59TiB
    Device missing:		     0.00B
    Used:			   7.31TiB
    Free (estimated):		   1.80TiB	(min: 1.80TiB)
    Data ratio:			      2.00
    Metadata ratio:		      2.00
    Global reserve:		 512.00MiB	(used: 0.00B)

Data,RAID1: Size:3.65TiB, Used:3.65TiB (99.89%)
   /dev/sda	   3.65TiB
   /dev/sdb	   3.65TiB

Metadata,RAID1: Size:8.00GiB, Used:6.54GiB (81.74%)
   /dev/sda	   8.00GiB
   /dev/sdb	   8.00GiB

System,RAID1: Size:64.00MiB, Used:544.00KiB (0.83%)
   /dev/sda	  64.00MiB
   /dev/sdb	  64.00MiB

Unallocated:
   /dev/sda	   1.80TiB
   /dev/sdb	   1.80TiB

> $ df -h

/dev/sda  5.5T  3.7T  1.9T  67% /home

> I think what you're seeing is a bug. Most of the size reporting in
> btrfs commands is in btrfs-progs; whereas the scrub is initiated by
> user space, most of the work is done by the kernel. But I don't know
> where the tracking code is.

No kidding. What concerns me now is that the scrub shows no signs of
ever stopping:

$ sudo btrfs scrub status -d /home
UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
scrub device /dev/sda (id 1) status
	Scrub started:    Mon May  4 04:36:54 2020
	Status:           running
	Duration:         18:06:28
	Time left:        31009959:50:08
	ETA:              Fri Dec 13 03:58:24 5557
	Total to scrub:   3.66TiB
	Bytes scrubbed:   9.80TiB
	Rate:             157.58MiB/s
	Error summary:    no errors found
scrub device /dev/sdb (id 2) status
	no stats available
	Time left:        30892482:15:09
	ETA:              Wed Jul 19 05:23:25 5544
	Total to scrub:   3.66TiB
	Bytes scrubbed:   8.86TiB
	Rate:             158.18MiB/s
	Error summary:    no errors found

Cheers,
	Andrew

* Re: btrfs-progs reports nonsense scrub status
From: Chris Murphy @ 2020-05-07  5:10 UTC
To: Andrew Pam
Cc: Chris Murphy, Graham Cobb, Btrfs BTRFS

On Wed, May 6, 2020 at 7:11 PM Andrew Pam <andrew@sericyb.com.au> wrote:
>
> > $ sudo btrfs fi us /mp/
>
> Overall:
>     Device size:		  10.92TiB
>     Device allocated:		   7.32TiB
>     Device unallocated:		   3.59TiB
>     Device missing:		     0.00B
>     Used:			   7.31TiB

Bytes to scrub should be 7.31TiB...

> $ sudo btrfs scrub status -d /home
> UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
> scrub device /dev/sda (id 1) status
> 	Scrub started:    Mon May  4 04:36:54 2020
> 	Status:           running
> 	Duration:         18:06:28
> 	Time left:        31009959:50:08
> 	ETA:              Fri Dec 13 03:58:24 5557
> 	Total to scrub:   3.66TiB
> 	Bytes scrubbed:   9.80TiB

So two bugs. Total to scrub is wrong. And scrubbed bytes is bigger
than both the reported total to scrub and the correct total that
should be scrubbed.

Three bugs: the time is goofy. This sounds familiar. Maybe just
upgrade your btrfs-progs.

--
Chris Murphy

* Re: btrfs-progs reports nonsense scrub status
From: Chris Murphy @ 2020-05-07  5:36 UTC
To: Chris Murphy
Cc: Andrew Pam, Graham Cobb, Btrfs BTRFS

On Wed, May 6, 2020 at 11:10 PM Chris Murphy <lists@colorremedies.com> wrote:
> So two bugs. Total to scrub is wrong. And scrubbed bytes is bigger
> than both the reported total to scrub and the correct total that
> should be scrubbed.
>
> Three bugs, the time is goofy. This sounds familiar. Maybe just
> upgrade your btrfs-progs.

This was fixed in 5.2.1. I'm not sure why you're seeing this.

commit 96ed8e801fa2fc2d8a99e757566293c05572ebe1
Author: Grzegorz Kowal <grzegorz@amuncode.org>
Date:   Sun Jul 7 14:58:56 2019 -0300

    btrfs-progs: scrub: fix ETA calculation

What I would do is cancel the scrub, and then delete the applicable
file in /var/lib/btrfs, which is the file that keeps track of the
scrub. Then do 'btrfs scrub status' on that file system; it should say
there are no stats, but it'd be interesting to know if Total to scrub
is sane. You can also start another scrub, and then again check status
and see if it's still sane or not. If not, I'd cancel it and keep
troubleshooting.

I was recently on btrfs-progs 5.4.1 and didn't see this behavior
myself on a raid1 volume.

--
Chris Murphy

* Re: btrfs-progs reports nonsense scrub status
From: Andrew Pam @ 2020-05-07  5:56 UTC
To: Chris Murphy
Cc: Graham Cobb, Btrfs BTRFS

On 7/5/20 3:36 pm, Chris Murphy wrote:
> This was fixed in 5.2.1. I'm not sure why you're seeing this.
>
> commit 96ed8e801fa2fc2d8a99e757566293c05572ebe1
> Author: Grzegorz Kowal <grzegorz@amuncode.org>
> Date:   Sun Jul 7 14:58:56 2019 -0300
>
>     btrfs-progs: scrub: fix ETA calculation

Maybe not fixed under all conditions! :)

> What I would do is cancel the scrub. And then delete the applicable
> file in /var/lib/btrfs, which is the file that keeps track of the
> scrub. Then do 'btrfs scrub status' on that file system and it should
> say there are no stats, but it'd be interesting to know if Total to
> Scrub is sane.

$ sudo btrfs scrub status /home
UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
	no stats available
Total to scrub:   7.31TiB
Rate:             0.00B/s
Error summary:    no errors found

> You can also start another scrub, and then again check
> status and see if it's still sane or not. If not I'd cancel it and
> keep troubleshooting.

$ sudo btrfs scrub status -d /home
UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
scrub device /dev/sda (id 1) status
	Scrub started:    Thu May  7 15:44:21 2020
	Status:           running
	Duration:         0:06:53
	Time left:        9:23:26
	ETA:              Fri May  8 01:14:40 2020
	Total to scrub:   3.66TiB
	Bytes scrubbed:   45.24GiB
	Rate:             112.16MiB/s
	Error summary:    no errors found
scrub device /dev/sdb (id 2) status
	Scrub started:    Thu May  7 15:44:21 2020
	Status:           running
	Duration:         0:06:53
	Time left:        9:24:50
	ETA:              Fri May  8 01:16:04 2020
	Total to scrub:   3.66TiB
	Bytes scrubbed:   45.12GiB
	Rate:             111.88MiB/s
	Error summary:    no errors found

Still sane after cancelling and resuming.

One thing that might be relevant: I started the original scrub on the
mountpoint, but at one point cancelled it and then resumed it on the
device /dev/sda rather than on the mountpoint. Could that trigger a bug?

Cheers,
	Andrew

* Re: btrfs-progs reports nonsense scrub status
From: Chris Murphy @ 2020-05-08  2:26 UTC
To: Andrew Pam
Cc: Chris Murphy, Graham Cobb, Btrfs BTRFS

On Wed, May 6, 2020 at 11:56 PM Andrew Pam <andrew@sericyb.com.au> wrote:
>
> $ sudo btrfs scrub status /home
> UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
> 	no stats available
> Total to scrub:   7.31TiB
> Rate:             0.00B/s
> Error summary:    no errors found
>
> $ sudo btrfs scrub status -d /home
> UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
> scrub device /dev/sda (id 1) status
> 	Scrub started:    Thu May  7 15:44:21 2020
> 	Status:           running
> 	Duration:         0:06:53
> 	Time left:        9:23:26
> 	ETA:              Fri May  8 01:14:40 2020
> 	Total to scrub:   3.66TiB
> 	Bytes scrubbed:   45.24GiB
> 	Rate:             112.16MiB/s
> 	Error summary:    no errors found
> scrub device /dev/sdb (id 2) status
> 	Scrub started:    Thu May  7 15:44:21 2020
> 	Status:           running
> 	Duration:         0:06:53
> 	Time left:        9:24:50
> 	ETA:              Fri May  8 01:16:04 2020
> 	Total to scrub:   3.66TiB
> 	Bytes scrubbed:   45.12GiB
> 	Rate:             111.88MiB/s
> 	Error summary:    no errors found
>
> Still sane after cancelling and resuming.
>
> One thing that might be relevant: On the original scrub, I started it
> on the mountpoint but initially cancelled and resumed it on the device
> /dev/sda rather than the mountpoint. Could that trigger a bug?

Maybe. Try it and see.

--
Chris Murphy

* Re: btrfs-progs reports nonsense scrub status
From: Andrew Pam @ 2020-05-08  2:31 UTC
To: Chris Murphy
Cc: Graham Cobb, Btrfs BTRFS

On 8/5/20 12:26 pm, Chris Murphy wrote:
>> One thing that might be relevant: On the original scrub, I started it
>> on the mountpoint but initially cancelled and resumed it on the device
>> /dev/sda rather than the mountpoint. Could that trigger a bug?
>
> Maybe. Try it and see.

I just want to get a scrub to complete normally first. So far it's not
making a lot of progress. More visibility into the process might be
helpful.

Cheers,
	Andrew

* Re: btrfs-progs reports nonsense scrub status
From: Chris Murphy @ 2020-05-08  2:48 UTC
To: Andrew Pam
Cc: Chris Murphy, Graham Cobb, Btrfs BTRFS

On Thu, May 7, 2020 at 8:32 PM Andrew Pam <andrew@sericyb.com.au> wrote:
>
> I just want to get a scrub to complete normally first. So far it's not
> making a lot of progress. More visibility into the process might be
> helpful.

> Rate:             112.16MiB/s
> Rate:             111.88MiB/s

These are hard drives? Those are reasonable rates. It's typical for
drives to differ ~50% from the outermost to the innermost tracks.
Depending on the chunk layout and progress, the rate may vary,
including between drives.

--
Chris Murphy

* Re: btrfs-progs reports nonsense scrub status
From: Andrew Pam @ 2020-05-08  2:54 UTC
To: Chris Murphy
Cc: Graham Cobb, Btrfs BTRFS

On 8/5/20 12:48 pm, Chris Murphy wrote:
>> Rate:             112.16MiB/s
>> Rate:             111.88MiB/s
>
> These are hard drives? Those are reasonable rates. It's typical for
> drives to differ ~50% from outer most to inner most tracks. Depending
> on the chunk layout and progress, the rate may vary, including between
> drives.

It's not the rate I'm concerned about - it's that the "Bytes scrubbed"
barely increases, the "Time left" doesn't decrease, and the "ETA"
recedes ever further into the future.

Cheers,
	Andrew

* Re: btrfs-progs reports nonsense scrub status
From: Chris Murphy @ 2020-05-08  3:55 UTC
To: Andrew Pam
Cc: Chris Murphy, Graham Cobb, Btrfs BTRFS

On Thu, May 7, 2020 at 8:54 PM Andrew Pam <andrew@sericyb.com.au> wrote:
>
> It's not the rate I'm concerned about - it's that the "Bytes scrubbed"
> barely increases, the "Time left" doesn't decrease, and the "ETA"
> recedes ever further into the future.

OK, what's the current output from

$ sudo btrfs scrub status -d /home

21 hours ago, the report was that it'd take 9 hours to scrub.

--
Chris Murphy

* Re: btrfs-progs reports nonsense scrub status
From: Andrew Pam @ 2020-05-08  4:04 UTC
To: Chris Murphy
Cc: Graham Cobb, Btrfs BTRFS

On 8/5/20 1:55 pm, Chris Murphy wrote:
> OK what's the current output from
> $ sudo btrfs scrub status -d /home
>
> 21 hours ago, the report was that it'd take 9 hours to scrub.

UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
scrub device /dev/sda (id 1) status
	Scrub started:    Thu May  7 15:44:21 2020
	Status:           running
	Duration:         5:40:13
	Time left:        1:22:58
	ETA:              Fri May  8 15:25:44 2020
	Total to scrub:   3.66TiB
	Bytes scrubbed:   2.94TiB
	Rate:             151.16MiB/s
	Error summary:    no errors found
scrub device /dev/sdb (id 2) status
	Scrub started:    Thu May  7 15:44:21 2020
	Status:           running
	Duration:         5:40:16
	Time left:        1:18:02
	ETA:              Fri May  8 15:20:48 2020
	Total to scrub:   3.66TiB
	Bytes scrubbed:   2.98TiB
	Rate:             152.92MiB/s
	Error summary:    no errors found

The "Duration", "Time left" and "Bytes scrubbed" values have not
changed for a few hours, despite considerable disk I/O.

Cheers,
	Andrew

* Re: btrfs-progs reports nonsense scrub status
From: Chris Murphy @ 2020-05-08  4:21 UTC
To: Andrew Pam
Cc: Chris Murphy, Graham Cobb, Btrfs BTRFS

On Thu, May 7, 2020 at 10:04 PM Andrew Pam <andrew@sericyb.com.au> wrote:
>
> UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
> scrub device /dev/sda (id 1) status
> 	Scrub started:    Thu May  7 15:44:21 2020
> 	Status:           running
> 	Duration:         5:40:13
> 	Time left:        1:22:58
> 	ETA:              Fri May  8 15:25:44 2020
> 	Total to scrub:   3.66TiB
> 	Bytes scrubbed:   2.94TiB
> 	Rate:             151.16MiB/s

What does 'iotop -d 10 -o' report? I'm expecting around 300MB/s reads.

The ETA is +14 hours from what you posted 21 hours ago. So yeah,
that's fakaked, but at least it's not saying it'll be done in the year
5544! I've always seen the ETAs be pretty accurate, so I don't know
what's going on.

3082813.44MB to go divided by 300MB/s is 171 minutes, or just under 3
hours. So the time left / ETA is wrong based on this rate - if it's a
stable rate, which it might not be. The gotcha: a changing rate, due
to concurrent load or a decent amount of free space fragmentation,
could be to blame. Hence iotop.

What are the current mount options for this file system?

--
Chris Murphy

* Re: btrfs-progs reports nonsense scrub status
From: Andrew Pam @ 2020-05-08  4:42 UTC
To: Chris Murphy
Cc: Graham Cobb, Btrfs BTRFS

On 8/5/20 2:21 pm, Chris Murphy wrote:
> What does 'iotop -d 10 -o' report? I'm expecting around 300MB/s reads.

91872 idle root      149.25 M/s    0.00 B/s  0.00 %  0.00 % btrfs scrub resume -c3 /home
91873 idle root      157.77 M/s    0.00 B/s  0.00 %  0.00 % btrfs scrub resume -c3 /home

> The ETA is +14 hours what you posted 21 hours ago. So yeah that's
> fakaked, but at least it's not saying it'll be done in year 5544!

I did cancel when I went to bed and resumed when I got up three hours
later on two occasions, so subtract six hours.

> I've always seen the ETAs be pretty accurate so I don't know what's going on.

Maybe it screws up once you cancel and resume too many times?

> 3082813.44MB to go divided by 300MB/s is 171 minutes. Or just under 3
> hours. So the time left / ETA is wrong based on this rate, if it's a
> stable rate, which it might not be.

I have an I/O load graph on my screen and it shows 100% read load on
both drives at all times, with an I/O rate of around 130-160 M/s per
drive.

> What are the current mount options for this file system?

/dev/sda on /home type btrfs
(rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/)

Thanks,
	Andrew

* Re: btrfs-progs reports nonsense scrub status
From: Chris Murphy @ 2020-05-08  4:45 UTC
To: Andrew Pam
Cc: Chris Murphy, Graham Cobb, Btrfs BTRFS

On Thu, May 7, 2020 at 10:42 PM Andrew Pam <andrew@sericyb.com.au> wrote:
>
> I have an I/O load graph on my screen and it shows 100% read load on
> both drives at all times, with an I/O rate of around 130-160 M/s per
> drive.
>
> > What are the current mount options for this file system?
>
> /dev/sda on /home type btrfs
> (rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/)

I don't know if space_cache v1 can negatively impact scrubs, but it
does negatively impact other things, especially on larger file systems
with a lot of free space remaining. The v1 cache exists as data blocks;
the v2 cache is a dedicated 'free space tree' and resides in fs
metadata. It's the inverse of the extent tree.

A conservative approach that might speed things up:

# mount -o remount,clear_cache,nospace_cache /mnt

Once it's done:

# mount -o remount,clear_cache,space_cache=v2 /mnt

This sets a feature flag, and the next time you mount normally, it'll
use v2. You could do this now during the scrub, but it might slow it
down a bit while the new cache is being created. If the file system
isn't busy it might take a minute to build.

Anyway, I'm pretty confident this scrub will finish in about 2.5 hours
if you just leave it as is.

--
Chris Murphy

* Re: btrfs-progs reports nonsense scrub status
From: Andrew Pam @ 2020-05-08  4:52 UTC
To: Chris Murphy
Cc: Graham Cobb, Btrfs BTRFS

On 8/5/20 2:45 pm, Chris Murphy wrote:
> I don't know if space_cache v1 can negatively impact scrubs but it
> does negatively impact other things especially on larger file systems
> with a lot of free space remaining. v1 cache exists as data blocks, v2
> cache is a dedicated 'free space tree' and resides in fs metadata.
> It's the inverse of the extent tree.
>
> A conservative approach that might speed things up:
>
> # mount -o remount,clear_cache,nospace_cache /mnt
>
> Once it's done:
>
> # mount -o remount,clear_cache,space_cache=v2 /mnt
>
> This sets a feature flag, and the next time you mount normally, it'll
> use v2. You could do this now during the scrub, but it might slow it
> down a bit while the new cache is being created. If the file system
> isn't busy it might take a minute to build.

Thanks, I didn't know that - I just used the defaults in /etc/fstab as
follows:

UUID=85069ce9-be06-4c92-b8c1-8a0f685e43c6 /home btrfs defaults,autodefrag,noatime

I'll put space_cache=v2 in there as well.

> Anyway, I'm pretty confident this scrub will finish in about 2.5 hours
> if you just leave it as is.

I'm not. It's still not reporting any forward progress, just moving the
ETA ahead:

$ sudo btrfs scrub status -d /home
UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
scrub device /dev/sda (id 1) status
	Scrub started:    Thu May  7 15:44:21 2020
	Status:           running
	Duration:         5:40:13
	Time left:        1:23:04
	ETA:              Fri May  8 16:11:31 2020
	Total to scrub:   3.66TiB
	Bytes scrubbed:   2.94TiB
	Rate:             151.16MiB/s
	Error summary:    no errors found
scrub device /dev/sdb (id 2) status
	Scrub started:    Thu May  7 15:44:21 2020
	Status:           running
	Duration:         5:40:16
	Time left:        1:18:09
	ETA:              Fri May  8 16:06:36 2020
	Total to scrub:   3.66TiB
	Bytes scrubbed:   2.98TiB
	Rate:             152.92MiB/s
	Error summary:    no errors found

Thanks,
	Andrew

* Re: btrfs-progs reports nonsense scrub status
From: Chris Murphy @ 2020-05-08  7:37 UTC
To: Andrew Pam
Cc: Chris Murphy, Graham Cobb, Btrfs BTRFS

On Thu, May 7, 2020 at 10:52 PM Andrew Pam <andrew@sericyb.com.au> wrote:
>
> Thanks, I didn't know that - I just used the defaults in /etc/fstab as
> follows:
>
> UUID=85069ce9-be06-4c92-b8c1-8a0f685e43c6 /home btrfs defaults,autodefrag,noatime
>
> I'll put space_cache=v2 in there as well.

Is there anything else going on? What do you get for 'top -d 30'?

Autodefrag can sometimes be expensive, when it's doing copies in the
background. This isn't inhibited during scrub.

You do not need to put space_cache=v2 in fstab. It's a one time use,
and it needs to be used with clear_cache. When you use it, a feature
flag will be set that will cause it to be used automatically each time
the file system mounts. Really the only thing that needs to be in
there is 'noatime'.

> > Anyway, I'm pretty confident this scrub will finish in about 2.5 hours
> > if you just leave it as is.
>
> I'm not. It's still not reporting any forward progress, just moving the
> ETA ahead:
>
> $ sudo btrfs scrub status -d /home
> UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
> scrub device /dev/sda (id 1) status
> 	Scrub started:    Thu May  7 15:44:21 2020
> 	Status:           running
> 	Duration:         5:40:13
> 	Time left:        1:23:04
> 	ETA:              Fri May  8 16:11:31 2020
> 	Total to scrub:   3.66TiB
> 	Bytes scrubbed:   2.94TiB
> 	Rate:             151.16MiB/s

OK, I screwed up the previous math. I love it when I do that. Since
this is -d, the numbers are per device: 0.72TiB remaining, and
0.72 × 1024 × 1024 ÷ 151 ÷ 60 ÷ 60 = 1h23.

Are there any messages in dmesg?

All of these numbers look sane, compared to your first post. But the
fact that the ETA keeps increasing suggests the scrub isn't proceeding
at the reported rate. As if it's busy doing something else.

--
Chris Murphy

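[Editor's note: the per-device arithmetic Chris does by hand above can be sketched as a small helper. The function name is illustrative; it just reproduces the remaining-bytes-over-rate estimate.]

```python
# Sketch of the per-device "Time left" arithmetic: remaining bytes
# divided by the observed rate. Mirrors 0.72 * 1024 * 1024 / 151 / 3600.

def time_left(total_tib, scrubbed_tib, rate_mib_s):
    """Return (hours, minutes) left at the current per-device rate."""
    remaining_mib = (total_tib - scrubbed_tib) * 1024 * 1024
    seconds = remaining_mib / rate_mib_s
    return int(seconds // 3600), int(seconds % 3600 // 60)

# Numbers from the status above: 3.66TiB total, 2.94TiB done, 151.16MiB/s.
h, m = time_left(3.66, 2.94, 151.16)
print(f"{h}h{m:02d}m")  # -> 1h23m, matching the reported "Time left"
```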
* Re: btrfs-progs reports nonsense scrub status 2020-05-08 7:37 ` Chris Murphy @ 2020-05-08 8:19 ` Andrew Pam 2020-05-08 11:21 ` Graham Cobb 0 siblings, 1 reply; 48+ messages in thread From: Andrew Pam @ 2020-05-08 8:19 UTC (permalink / raw) To: Chris Murphy; +Cc: Graham Cobb, Btrfs BTRFS On 8/5/20 5:37 pm, Chris Murphy wrote: > Are there any messages in dmesg? Well this is interesting: [129682.760759] BTRFS info (device sda): scrub: finished on devid 2 with status: 0 [129683.173404] BTRFS info (device sda): scrub: finished on devid 1 with status: 0 But then: $ sudo btrfs scrub status -d /home UUID: 85069ce9-be06-4c92-b8c1-8a0f685e43c6 scrub device /dev/sda (id 1) status Scrub started: Thu May 7 15:44:21 2020 Status: interrupted Duration: 5:40:13 Total to scrub: 3.66TiB Rate: 151.16MiB/s Error summary: no errors found scrub device /dev/sdb (id 2) status Scrub started: Thu May 7 15:44:21 2020 Status: interrupted Duration: 5:40:16 Total to scrub: 3.66TiB Rate: 152.92MiB/s Error summary: no errors found So was it really "interrupted", or did it finish normally with no errors but btrfs-progs is reporting wrongly? Thanks, Andrew ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: btrfs-progs reports nonsense scrub status 2020-05-08 8:19 ` Andrew Pam @ 2020-05-08 11:21 ` Graham Cobb 2020-05-08 12:54 ` Andrew Pam 2020-05-08 17:06 ` Andrew Pam 0 siblings, 2 replies; 48+ messages in thread From: Graham Cobb @ 2020-05-08 11:21 UTC (permalink / raw) To: Andrew Pam, Chris Murphy; +Cc: Btrfs BTRFS On 08/05/2020 09:19, Andrew Pam wrote: > On 8/5/20 5:37 pm, Chris Murphy wrote: >> Are there any messages in dmesg? > > Well this is interesting: > > [129682.760759] BTRFS info (device sda): scrub: finished on devid 2 with > status: 0 > [129683.173404] BTRFS info (device sda): scrub: finished on devid 1 with > status: 0 > > But then: > > $ sudo btrfs scrub status -d /home > UUID: 85069ce9-be06-4c92-b8c1-8a0f685e43c6 > scrub device /dev/sda (id 1) status > Scrub started: Thu May 7 15:44:21 2020 > Status: interrupted > Duration: 5:40:13 > Total to scrub: 3.66TiB > Rate: 151.16MiB/s > Error summary: no errors found > scrub device /dev/sdb (id 2) status > Scrub started: Thu May 7 15:44:21 2020 > Status: interrupted > Duration: 5:40:16 > Total to scrub: 3.66TiB > Rate: 152.92MiB/s > Error summary: no errors found > > So was it really "interrupted", or did it finish normally with no errors > but btrfs-progs is reporting wrongly? I also don't know whether it has really finished successfully. If you are worried that it is somehow looping (bytes scrubbed going up but not really making progress), use: btrfs scrub status -dR /home and look at last_physical (for each disk) - it should be always increasing. Also, there have been bugs in cancel/resume in the past. There could be more bugs lurking there, particularly for multi-device filesystems. If you are going to cancel and resume, check last_physical for each device before the cancel (using 'status -dR') and after the resume and make sure they seem sensible (not gone backwards, or skipped massively forward, or started again on a device which had already finished). 
^ permalink raw reply [flat|nested] 48+ messages in thread
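Graham's check — that last_physical should only ever increase per device — can be automated. A hypothetical sketch (the parsing assumes the 'btrfs scrub status -dR' output layout shown elsewhere in this thread; the function names are mine, not btrfs-progs APIs):

```python
import re

def last_physical_by_dev(status_text):
    """Map device path -> last_physical from 'btrfs scrub status -dR' output."""
    physical, dev = {}, None
    for line in status_text.splitlines():
        m = re.match(r"\s*scrub device (\S+)", line)
        if m:
            dev = m.group(1)
            continue
        m = re.match(r"\s*last_physical:\s*(\d+)", line)
        if m and dev is not None:
            physical[dev] = int(m.group(1))
    return physical

def made_progress(before, after):
    """True if every device's last_physical is non-decreasing between samples."""
    prev, cur = last_physical_by_dev(before), last_physical_by_dev(after)
    return all(cur.get(dev, 0) >= pos for dev, pos in prev.items())

sample_1 = "scrub device /dev/sda (id 1) status\n    last_physical: 100\n"
sample_2 = "scrub device /dev/sda (id 1) status\n    last_physical: 250\n"
print(made_progress(sample_1, sample_2))  # → True
```

Capturing two '-dR' snapshots a few minutes apart and feeding them through made_progress() gives a yes/no answer to "is the scrub looping or genuinely advancing".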
* Re: btrfs-progs reports nonsense scrub status 2020-05-08 11:21 ` Graham Cobb @ 2020-05-08 12:54 ` Andrew Pam 2020-05-08 17:06 ` Andrew Pam 1 sibling, 0 replies; 48+ messages in thread From: Andrew Pam @ 2020-05-08 12:54 UTC (permalink / raw) To: Graham Cobb, Chris Murphy; +Cc: Btrfs BTRFS On 8/5/20 9:21 pm, Graham Cobb wrote: > If you are worried that it is somehow looping (bytes scrubbed going up > but not really making progress), use: > > btrfs scrub status -dR /home Aha! That's what I was looking for. > and look at last_physical (for each disk) - it should be always increasing. > > Also, there have been bugs in cancel/resume in the past. There could be > more bugs lurking there, particularly for multi-device filesystems. Apparently! Unfortunately since scrub blocks suspend, I have to use cancel/resume in my suspend pre/post scripts. > If you are going to cancel and resume, check last_physical for each > device before the cancel (using 'status -dR') and after the resume and > make sure they seem sensible (not gone backwards, or skipped massively > forward, or started again on a device which had already finished). 
$ sudo btrfs scrub status -dR /home UUID: 85069ce9-be06-4c92-b8c1-8a0f685e43c6 scrub device /dev/sda (id 1) status Scrub started: Thu May 7 15:44:21 2020 Status: running Duration: 5:40:13 data_extents_scrubbed: 51856478 tree_extents_scrubbed: 431748 data_bytes_scrubbed: 3228367126528 tree_bytes_scrubbed: 7073759232 read_errors: 0 csum_errors: 0 verify_errors: 0 no_csum: 179858 csum_discards: 0 super_errors: 0 malloc_errors: 0 uncorrectable_errors: 0 unverified_errors: 0 corrected_errors: 0 last_physical: 0 scrub device /dev/sdb (id 2) status Scrub started: Thu May 7 15:44:21 2020 Status: running Duration: 5:40:16 data_extents_scrubbed: 52452792 tree_extents_scrubbed: 431756 data_bytes_scrubbed: 3266540351488 tree_bytes_scrubbed: 7073890304 read_errors: 0 csum_errors: 0 verify_errors: 0 no_csum: 182034 csum_discards: 0 super_errors: 0 malloc_errors: 0 uncorrectable_errors: 0 unverified_errors: 0 corrected_errors: 0 last_physical: 0 last_physical is zero. That doesn't seem right. Cheers, Andrew ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: btrfs-progs reports nonsense scrub status 2020-05-08 11:21 ` Graham Cobb 2020-05-08 12:54 ` Andrew Pam @ 2020-05-08 17:06 ` Andrew Pam 2020-05-08 21:20 ` Chris Murphy ` (2 more replies) 1 sibling, 3 replies; 48+ messages in thread From: Andrew Pam @ 2020-05-08 17:06 UTC (permalink / raw) To: Graham Cobb, Chris Murphy; +Cc: Btrfs BTRFS And here we are again: $ sudo btrfs scrub status -d /home UUID: 85069ce9-be06-4c92-b8c1-8a0f685e43c6 scrub device /dev/sda (id 1) status Scrub resumed: Sat May 9 02:52:12 2020 Status: running Duration: 7:02:55 Time left: 32261372:31:39 ETA: Fri Sep 17 23:35:41 5700 Total to scrub: 3.66TiB Bytes scrubbed: 3.67TiB Rate: 151.47MiB/s Error summary: no errors found scrub device /dev/sdb (id 2) status Scrub resumed: Sat May 9 02:52:12 2020 Status: running Duration: 7:02:59 Time left: 31973655:40:34 ETA: Mon Nov 21 19:44:36 5667 Total to scrub: 3.66TiB Bytes scrubbed: 3.70TiB Rate: 152.83MiB/s Error summary: no errors found I tried building btrfs-progs v5.6.1 from source, but it gives exactly the same results. Cheers, Andrew ^ permalink raw reply [flat|nested] 48+ messages in thread
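One plausible, unconfirmed explanation for these astronomical ETAs: in both reports "Bytes scrubbed" has overtaken "Total to scrub", and if the remaining-byte count is computed as an unsigned 64-bit difference it wraps around to nearly 2**64. Dividing that wrapped value by the reported rate reproduces the reported "Time left" to within a fraction of a percent. A sketch of the arithmetic (this is a hypothesis about the failure mode, not the actual btrfs-progs code):

```python
# Hypothetical reconstruction of the bogus "Time left" for /dev/sda:
# Total to scrub 3.66 TiB, Bytes scrubbed 3.67 TiB, Rate 151.47 MiB/s.
TIB = 1024 ** 4
MIB = 1024 ** 2

remaining = int(3.66 * TIB - 3.67 * TIB) % 2 ** 64  # unsigned 64-bit wraparound
hours = remaining / (151.47 * MIB) / 3600
print(f"{hours:,.0f} hours")  # ~32 million hours, close to the reported 32261372:31:39
```

The same calculation with the /dev/sdb figures (3.70 TiB scrubbed at 152.83 MiB/s) lands near its reported 31973655:40:34, and the 7.30 vs 10.20 TiB figures in the very first post match that report's 16964119:40:20 as well.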
* Re: btrfs-progs reports nonsense scrub status 2020-05-08 17:06 ` Andrew Pam @ 2020-05-08 21:20 ` Chris Murphy 2020-05-08 21:31 ` Chris Murphy 2020-05-09 8:02 ` Su Yue 2 siblings, 0 replies; 48+ messages in thread From: Chris Murphy @ 2020-05-08 21:20 UTC (permalink / raw) To: Andrew Pam; +Cc: Graham Cobb, Chris Murphy, Btrfs BTRFS On Fri, May 8, 2020 at 11:06 AM Andrew Pam <andrew@sericyb.com.au> wrote: > > And here we are again: > > $ sudo btrfs scrub status -d /home > UUID: 85069ce9-be06-4c92-b8c1-8a0f685e43c6 > scrub device /dev/sda (id 1) status > Scrub resumed: Sat May 9 02:52:12 2020 > Status: running > Duration: 7:02:55 > Time left: 32261372:31:39 > ETA: Fri Sep 17 23:35:41 5700 > Total to scrub: 3.66TiB > Bytes scrubbed: 3.67TiB It's obviously a bug. But I don't know why. [ 1683.525602] BTRFS warning (device sda4): qgroup rescan init failed, qgroup is not enabled [chris@fmac ~] > Rate: 151.47MiB/s > Error summary: no errors found > scrub device /dev/sdb (id 2) status > Scrub resumed: Sat May 9 02:52:12 2020 > Status: running > Duration: 7:02:59 > Time left: 31973655:40:34 > ETA: Mon Nov 21 19:44:36 5667 > Total to scrub: 3.66TiB > Bytes scrubbed: 3.70TiB > Rate: 152.83MiB/s > Error summary: no errors found > > I tried building btrfs-progs v5.6.1 from source, but it gives exactly > the same results. -- Chris Murphy ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: btrfs-progs reports nonsense scrub status 2020-05-08 17:06 ` Andrew Pam 2020-05-08 21:20 ` Chris Murphy @ 2020-05-08 21:31 ` Chris Murphy 2020-05-09 10:15 ` Andrew Pam 2020-05-09 8:02 ` Su Yue 2 siblings, 1 reply; 48+ messages in thread From: Chris Murphy @ 2020-05-08 21:31 UTC (permalink / raw) To: Andrew Pam; +Cc: Graham Cobb, Chris Murphy, Btrfs BTRFS (gmail is super annoying sometimes, keyboard shortcuts that cause email to be sent) On Fri, May 8, 2020 at 11:06 AM Andrew Pam <andrew@sericyb.com.au> wrote: > > And here we are again: > > $ sudo btrfs scrub status -d /home > UUID: 85069ce9-be06-4c92-b8c1-8a0f685e43c6 > scrub device /dev/sda (id 1) status > Scrub resumed: Sat May 9 02:52:12 2020 > Status: running > Duration: 7:02:55 > Time left: 32261372:31:39 > ETA: Fri Sep 17 23:35:41 5700 > Total to scrub: 3.66TiB > Bytes scrubbed: 3.67TiB > Rate: 151.47MiB/s > Error summary: no errors found > scrub device /dev/sdb (id 2) status > Scrub resumed: Sat May 9 02:52:12 2020 > Status: running > Duration: 7:02:59 > Time left: 31973655:40:34 > ETA: Mon Nov 21 19:44:36 5667 > Total to scrub: 3.66TiB > Bytes scrubbed: 3.70TiB > Rate: 152.83MiB/s > Error summary: no errors found > > I tried building btrfs-progs v5.6.1 from source, but it gives exactly > the same results. Do you have qgroups enabled? $ sudo btrfs qgroup show /home ERROR: can't list qgroups: quotas not enabled What do you get for: $ sudo btrfs insp dump-s /dev/ This dumps out the super block. There are scrub related fixes in linux git since 5.4.0 but I can't tell if any of them are related to this problem. My suggestion is scrub with kernel 5.7.rc4. If the problem happens, it's a current and unfixed bug. If it's been fixed, then it's a question of what commit fixed it and whether it's been backported, or can be, to 5.4 series. -- Chris Murphy ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: btrfs-progs reports nonsense scrub status 2020-05-08 21:31 ` Chris Murphy @ 2020-05-09 10:15 ` Andrew Pam 2020-05-09 20:33 ` Chris Murphy 0 siblings, 1 reply; 48+ messages in thread From: Andrew Pam @ 2020-05-09 10:15 UTC (permalink / raw) To: Chris Murphy; +Cc: Graham Cobb, Btrfs BTRFS On 9/5/20 7:31 am, Chris Murphy wrote: > Do you have qgroups enabled? > > $ sudo btrfs qgroup show /home > ERROR: can't list qgroups: quotas not enabled No: $ sudo btrfs qgroup show /home ERROR: can't list qgroups: quotas not enabled > What do you get for: > > $ sudo btrfs insp dump-s /dev/ > > This dumps out the super block. $ sudo btrfs insp dump-s /dev/sda superblock: bytenr=65536, device=/dev/sda --------------------------------------------------------- csum_type 0 (crc32c) csum_size 4 csum 0x36e3f416 [match] bytenr 65536 flags 0x1 ( WRITTEN ) magic _BHRfS_M [match] fsid 85069ce9-be06-4c92-b8c1-8a0f685e43c6 metadata_uuid 85069ce9-be06-4c92-b8c1-8a0f685e43c6 label home generation 59159 root 4881572642816 sys_array_size 258 chunk_root_generation 58225 root_level 1 chunk_root 4925720363008 chunk_root_level 1 log_root 4881575936000 log_root_transid 0 log_root_level 0 total_bytes 12002350252032 bytes_used 4022988615680 sectorsize 4096 nodesize 16384 leafsize (deprecated) 16384 stripesize 4096 root_dir 6 num_devices 2 compat_flags 0x0 compat_ro_flags 0x0 incompat_flags 0x161 ( MIXED_BACKREF | BIG_METADATA | EXTENDED_IREF | SKINNY_METADATA ) cache_generation 59159 uuid_tree_generation 59159 dev_item.uuid b347d8aa-4563-45db-bcf1-c722e5b3abe8 dev_item.fsid 85069ce9-be06-4c92-b8c1-8a0f685e43c6 [match] dev_item.type 0 dev_item.total_bytes 6001175126016 dev_item.bytes_used 4055589978112 dev_item.io_align 4096 dev_item.io_width 4096 dev_item.sector_size 4096 dev_item.devid 1 dev_item.dev_group 0 dev_item.seek_speed 0 dev_item.bandwidth 0 dev_item.generation 0 > There are scrub related fixes in linux git since 5.4.0 but I can't > tell if any of them are related to this problem. 
My suggestion is > scrub with kernel 5.7.rc4. If the problem happens, it's a current and > unfixed bug. If it's been fixed, then it's a question of what commit > fixed it and whether it's been backported, or can be, to 5.4 series. OK, I installed 5.7rc4 and rebooted, then started a fresh scrub: $ uname -a Linux RainbowDash 5.7.0-050700rc4-generic #202005051752 SMP Tue May 5 21:56:32 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux After a couple of cancel/resumes here we are again: $ sudo btrfs scrub status -dR /home UUID: 85069ce9-be06-4c92-b8c1-8a0f685e43c6 scrub device /dev/sda (id 1) status Scrub started: Sat May 9 15:52:21 2020 Status: running Duration: 2:25:03 data_extents_scrubbed: 22903068 tree_extents_scrubbed: 194348 data_bytes_scrubbed: 1418082578432 tree_bytes_scrubbed: 3184197632 read_errors: 0 csum_errors: 0 verify_errors: 0 no_csum: 60164 csum_discards: 0 super_errors: 0 malloc_errors: 0 uncorrectable_errors: 0 unverified_errors: 0 corrected_errors: 0 last_physical: 0 scrub device /dev/sdb (id 2) status Scrub started: Sat May 9 15:52:21 2020 Status: running Duration: 2:25:03 data_extents_scrubbed: 22903070 tree_extents_scrubbed: 194348 data_bytes_scrubbed: 1418082648064 tree_bytes_scrubbed: 3184197632 read_errors: 0 csum_errors: 0 verify_errors: 0 no_csum: 60164 csum_discards: 0 super_errors: 0 malloc_errors: 0 uncorrectable_errors: 0 unverified_errors: 0 corrected_errors: 0 last_physical: 0 Duration, time left and bytes scrubbed are not advancing, and last_physical is zero. Thanks, Andrew ^ permalink raw reply [flat|nested] 48+ messages in thread
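As an aside, the dump-super output above shows compat_ro_flags 0x0, meaning the free-space-tree (space_cache=v2) feature discussed earlier in the thread had not been enabled on this filesystem at this point. A sketch of checking that bit from dump-super text — the bit values are the FREE_SPACE_TREE compat_ro flags from the kernel's btrfs headers, while the helper name is mine:

```python
import re

# compat_ro feature bits, as defined in the kernel's btrfs headers
FREE_SPACE_TREE       = 1 << 0
FREE_SPACE_TREE_VALID = 1 << 1

def space_cache_v2_enabled(dump_super_text):
    """Check the compat_ro_flags line from 'btrfs inspect-internal dump-super'."""
    m = re.search(r"compat_ro_flags\s+(0x[0-9a-fA-F]+)", dump_super_text)
    if not m:
        raise ValueError("no compat_ro_flags line found")
    return bool(int(m.group(1), 16) & FREE_SPACE_TREE)

print(space_cache_v2_enabled("compat_ro_flags 0x0"))  # → False, as on this filesystem
```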
* Re: btrfs-progs reports nonsense scrub status 2020-05-09 10:15 ` Andrew Pam @ 2020-05-09 20:33 ` Chris Murphy 2020-05-10 1:14 ` Chris Murphy ` (3 more replies) 0 siblings, 4 replies; 48+ messages in thread From: Chris Murphy @ 2020-05-09 20:33 UTC (permalink / raw) To: Andrew Pam; +Cc: Chris Murphy, Graham Cobb, Btrfs BTRFS On Sat, May 9, 2020 at 4:15 AM Andrew Pam <andrew@sericyb.com.au> wrote: > > $ uname -a > Linux RainbowDash 5.7.0-050700rc4-generic #202005051752 SMP Tue May 5 > 21:56:32 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux > > After a couple of cancel/resumes here we are again: OK I didn't previously pick up on the fact that you're cancelling a scrub. Suspending. Then resuming. Things to verify: 1. That a scrub started, and allowed to finish without any interruption, does finish. This is the only experience I have, but I'm wondering whether you've tried this just to eliminate the possibility of some weird multiple device scrub bug; or possibly something unique about your file system state. 2. That a scrub started, then cancelled, then resumed, also does finish (or not). 3. That a scrub started, then cancelled, then suspend-to-RAM, then resumed, also does finish (or not). The distinction between (2) and (3) is system sleep. Is this strictly a resume scrub bug. Or is there something about suspend that causing state to be lost? And a low priority (3b) variation might be suspend-to-disk; but this is not very well supported anyway, in particular UEFI Secure Boot enabled systems due to hibernation lockdown policy being common these days. -- Chris Murphy ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: btrfs-progs reports nonsense scrub status 2020-05-09 20:33 ` Chris Murphy @ 2020-05-10 1:14 ` Chris Murphy 2020-05-10 1:28 ` Andrew Pam ` (2 more replies) 2020-05-10 1:30 ` Andrew Pam ` (2 subsequent siblings) 3 siblings, 3 replies; 48+ messages in thread From: Chris Murphy @ 2020-05-10 1:14 UTC (permalink / raw) To: Chris Murphy; +Cc: Andrew Pam, Graham Cobb, Btrfs BTRFS On Sat, May 9, 2020 at 2:33 PM Chris Murphy <lists@colorremedies.com> wrote: > > On Sat, May 9, 2020 at 4:15 AM Andrew Pam <andrew@sericyb.com.au> wrote: > > > > $ uname -a > > Linux RainbowDash 5.7.0-050700rc4-generic #202005051752 SMP Tue May 5 > > 21:56:32 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux > > > > After a couple of cancel/resumes here we are again: > > > OK I didn't previously pick up on the fact that you're cancelling a > scrub. Suspending. Then resuming. > > Things to verify: > > 1. That a scrub started, and allowed to finish without any > interruption, does finish. This is the only experience I have, but I'm > wondering whether you've tried this just to eliminate the possibility > of some weird multiple device scrub bug; or possibly something unique > about your file system state. > > 2. That a scrub started, then cancelled, then resumed, also does > finish (or not). > > 3. That a scrub started, then cancelled, then suspend-to-RAM, then > resumed, also does finish (or not). I just tried all three of these on my laptop, which has NVMe and 5.7.0-0.rc4.1.fc33.x86_64+debug. And they all complete, with no odd reporting in user space for ETA, and progress seems normal. I do see this in dmesg, which I think coincides with the cancel/abort [98580.937332] BTRFS info (device nvme0n1p7): scrub: not finished on devid 1 with status: -125 I don't know how the kernel and user space communicate scrub progress. I don't see anything in sysfs. When you see this problem manifesting (no progress being made) the /home file system is otherwise working normally? You can read and write files OK? 
-- Chris Murphy ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: btrfs-progs reports nonsense scrub status 2020-05-10 1:14 ` Chris Murphy @ 2020-05-10 1:28 ` Andrew Pam 2020-05-10 1:32 ` Andrew Pam 2020-05-10 11:52 ` Graham Cobb 2 siblings, 0 replies; 48+ messages in thread From: Andrew Pam @ 2020-05-10 1:28 UTC (permalink / raw) To: Chris Murphy; +Cc: Graham Cobb, Btrfs BTRFS On 10/5/20 11:14 am, Chris Murphy wrote: > When you see this problem manifesting (no progress being made) the > /home file system is otherwise working normally? You can read and > write files OK? Correct. It works fine, although the "btrfs resume" process is generating heavy read I/O. Thanks, Andrew ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: btrfs-progs reports nonsense scrub status 2020-05-10 1:14 ` Chris Murphy 2020-05-10 1:28 ` Andrew Pam @ 2020-05-10 1:32 ` Andrew Pam 2020-05-10 11:52 ` Graham Cobb 2 siblings, 0 replies; 48+ messages in thread From: Andrew Pam @ 2020-05-10 1:32 UTC (permalink / raw) To: Chris Murphy; +Cc: Graham Cobb, Btrfs BTRFS On 10/5/20 11:14 am, Chris Murphy wrote: > I just tried all three of these on my laptop, which has NVMe and > 5.7.0-0.rc4.1.fc33.x86_64+debug. And they all complete, with no odd > reporting in user space for ETA, and progress seems normal. Have you tried multiple cancel/suspend/resume cycles? It doesn't always go bad right away. Thanks, Andrew ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: btrfs-progs reports nonsense scrub status 2020-05-10 1:14 ` Chris Murphy 2020-05-10 1:28 ` Andrew Pam 2020-05-10 1:32 ` Andrew Pam @ 2020-05-10 11:52 ` Graham Cobb 2020-05-10 17:21 ` Graham Cobb 2 siblings, 1 reply; 48+ messages in thread From: Graham Cobb @ 2020-05-10 11:52 UTC (permalink / raw) To: Chris Murphy; +Cc: Andrew Pam, Btrfs BTRFS On 10/05/2020 02:14, Chris Murphy wrote: > I don't know how the kernel and user space communicate scrub progress. > I don't see anything in sysfs. I did some work on btrfs-scrub a while ago to fix a bug in the way last_physical was being saved between cancel and resume. If I remember correctly, the way this works is that the scrub process receives stats about the scrub progress directly from the kernel on a socket it creates in /var/lib/btrfs, then writes updated stats into a file in /var/lib/btrfs/scrub.status.<UUID> to share with scrub status and keep a record. At the end of the scrub, the final stats are provided by the kernel into a buffer provided in the ioctl, which are also then used to update the scrub.status file. I don't remember how often the stats are received from the kernel, but it is frequently (not just at the end). Unless I am confused, if the bytes scrubbed numbers are going up, then progress updates are being received from the kernel. Each update includes last_physical, so there should be no way the scrub process can receive updated bytes scrubbed numbers without an updated last_physical from the kernel. HOWEVER, I think (I may be wrong) btrfs scrub status normally reads the stats from the file, not directly from the kernel. So it is actually reporting the data in the file written by the scrub process. The status command does some processing on those numbers (that processing was where the bug I found was). It is possible there is still a bug in that processing -- if so, last_physical in the file could be zero even though it was received as non-zero from the kernel.
So, we can't be sure, without adding logging to the scrub command, whether the kernel is really reporting zero. This is from memory. Sorry, I don't have time, at the moment, to go back to the scrub usermode code to check. ^ permalink raw reply [flat|nested] 48+ messages in thread
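The scrub.status file Graham describes has a simple line-oriented format — a version header, then one pipe-delimited record per device — as visible in the raw dump Andrew posted earlier in the thread. A sketch of a parser for that layout (assuming the format stays exactly as shown there; this is not a btrfs-progs API):

```python
def parse_scrub_status(text):
    """Parse /var/lib/btrfs/scrub.status.<UUID>: a 'scrub status:<version>'
    header, then one 'UUID:devid|key:value|...' line per device."""
    stats = {}
    for line in text.splitlines()[1:]:        # skip the version header
        if not line.strip():
            continue
        head, *fields = line.split("|")
        uuid, devid = head.rsplit(":", 1)
        stats[int(devid)] = {k: int(v) for k, v in
                             (f.split(":", 1) for f in fields)}
    return stats

sample = ("scrub status:1\n"
          "85069ce9-be06-4c92-b8c1-8a0f685e43c6:1|data_bytes_scrubbed:7193709023232|"
          "last_physical:0|finished:0\n")
print(parse_scrub_status(sample)[1]["last_physical"])  # → 0
```

Dumping last_physical per device from this file between suspend/resume cycles is another way to see whether the saved state, rather than the kernel, is where the zero comes from.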
* Re: btrfs-progs reports nonsense scrub status 2020-05-10 11:52 ` Graham Cobb @ 2020-05-10 17:21 ` Graham Cobb 2020-05-10 17:27 ` Andrew Pam 0 siblings, 1 reply; 48+ messages in thread From: Graham Cobb @ 2020-05-10 17:21 UTC (permalink / raw) To: Chris Murphy; +Cc: Andrew Pam, Btrfs BTRFS [-- Attachment #1: Type: text/plain, Size: 1486 bytes --] On 10/05/2020 12:52, Graham Cobb wrote: > On 10/05/2020 02:14, Chris Murphy wrote: >> I don't know how the kernel and user space communicate scrub progress. >> I don't see anything in sysfs. > > I did some work on btrfs-scrub a while ago to fix a bug in the way > last_physical was being saved between cancel and resume. > > If I remember correctly, the way this works is that the scrub process > receives stats about the scrub progress directly from the kernel on a > socket it creates in /var/lib/btrfs, then writes updated stats into a > file in /var/lib/btrfs/scrub.status.<UUID> to share with scrub status > and keep a record. At the end of the scrub, the final stats are provided > by the kernel into a buffer provided in the ioctl which are also then > used to update the scrub.status file. Unfortunately my memory is rubbish! I have just looked at the code to remind myself. The scrub process gets stats from the kernel using a BTRFS_IOC_SCRUB_PROGRESS ioctl (one for each device). It uses this to update the status file and also sends the stats to any listeners on the socket. btrfs scrub status reads the data from the socket if it exists (i.e. the scrub process is still running), otherwise it reads it from the data file (for a scrub which has finished or been cancelled). It might be worth you making a small logging change to progress_one_dev to watch the value of last_physical exactly as received from the kernel during the scrub. I have attached a small patch.
[-- Attachment #2: scrub.c.patch --]
[-- Type: text/x-patch, Size: 785 bytes --]

diff --git a/cmds/scrub.c b/cmds/scrub.c
index 9fe59822..48d76852 100644
--- a/cmds/scrub.c
+++ b/cmds/scrub.c
@@ -902,6 +902,8 @@ static void *progress_one_dev(void *ctx)
 	struct scrub_progress *sp = ctx;
 
 	sp->ret = ioctl(sp->fd, BTRFS_IOC_SCRUB_PROGRESS, &sp->scrub_args);
+	fprintf(stderr, "NOTE: device %lld last_physical=%lld\n", sp->scrub_args.devid, sp->scrub_args.progress.last_physical);
+
 	sp->ioctl_errno = errno;
 
 	return NULL;
@@ -1784,7 +1786,8 @@ static int cmd_scrub_status(const struct cmd_struct *cmd, int argc, char **argv)
 			err = 1;
 			goto out;
 		}
-	}
+		fputs("NOTE: Reading progress from status file \n", stderr);
+	} else { fputs("NOTE: Reading progress from socket \n", stderr); }
 
 	if (fdres >= 0) {
 		past_scrubs = scrub_read_file(fdres, 1);

^ permalink raw reply related	[flat|nested] 48+ messages in thread
* Re: btrfs-progs reports nonsense scrub status 2020-05-10 17:21 ` Graham Cobb @ 2020-05-10 17:27 ` Andrew Pam 0 siblings, 0 replies; 48+ messages in thread From: Andrew Pam @ 2020-05-10 17:27 UTC (permalink / raw) To: Graham Cobb, Chris Murphy; +Cc: Btrfs BTRFS On 11/5/20 3:21 am, Graham Cobb wrote: > It might be worth you making a small logging change to progress_one_dev > to watch the value of last_physical exactly as received from the kernel > during the scrub. I have attached a small patch. Thanks, I'll try that! Patched. Cheers, Andrew ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: btrfs-progs reports nonsense scrub status 2020-05-09 20:33 ` Chris Murphy 2020-05-10 1:14 ` Chris Murphy @ 2020-05-10 1:30 ` Andrew Pam 2020-05-10 6:17 ` Andrew Pam 2020-05-11 1:39 ` Andrew Pam 3 siblings, 0 replies; 48+ messages in thread From: Andrew Pam @ 2020-05-10 1:30 UTC (permalink / raw) To: Chris Murphy; +Cc: Graham Cobb, Btrfs BTRFS On 10/5/20 6:33 am, Chris Murphy wrote: > 1. That a scrub started, and allowed to finish without any > interruption, does finish. This is the only experience I have, but I'm > wondering whether you've tried this just to eliminate the possibility > of some weird multiple device scrub bug; or possibly something unique > about your file system state. I am testing this now. Completion estimated in around 3 hours (7 hours total running time). Thanks, Andrew ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: btrfs-progs reports nonsense scrub status 2020-05-09 20:33 ` Chris Murphy 2020-05-10 1:14 ` Chris Murphy 2020-05-10 1:30 ` Andrew Pam @ 2020-05-10 6:17 ` Andrew Pam 2020-05-11 1:39 ` Andrew Pam 3 siblings, 0 replies; 48+ messages in thread From: Andrew Pam @ 2020-05-10 6:17 UTC (permalink / raw) To: Chris Murphy; +Cc: Graham Cobb, Btrfs BTRFS On 10/5/20 6:33 am, Chris Murphy wrote: > Things to verify: > > 1. That a scrub started, and allowed to finish without any > interruption, does finish. This is the only experience I have, but I'm > wondering whether you've tried this just to eliminate the possibility > of some weird multiple device scrub bug; or possibly something unique > about your file system state. Completed successfully with no issues, and I was running Thunderbird and Firefox plus playing "Chimera Squad" using Steam Proton while the scrub was running to exercise the system. :) $ sudo btrfs scrub status -d /home UUID: 85069ce9-be06-4c92-b8c1-8a0f685e43c6 scrub device /dev/sda (id 1) history Scrub started: Sun May 10 07:48:43 2020 Status: finished Duration: 7:35:14 Total to scrub: 3.69TiB Rate: 140.47MiB/s Error summary: no errors found scrub device /dev/sdb (id 2) history Scrub started: Sun May 10 07:48:43 2020 Status: finished Duration: 7:35:14 Total to scrub: 3.69TiB Rate: 140.47MiB/s Error summary: no errors found [17537.701685] BTRFS info (device sda): scrub: started on devid 2 [17537.701686] BTRFS info (device sda): scrub: started on devid 1 [44851.445348] BTRFS info (device sda): scrub: finished on devid 2 with status: 0 [44851.476413] BTRFS info (device sda): scrub: finished on devid 1 with status: 0 > 2. That a scrub started, then cancelled, then resumed, also does > finish (or not). I will test this next. Thanks, Andrew ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: btrfs-progs reports nonsense scrub status 2020-05-09 20:33 ` Chris Murphy ` (2 preceding siblings ...) 2020-05-10 6:17 ` Andrew Pam @ 2020-05-11 1:39 ` Andrew Pam 2020-05-11 4:46 ` Chris Murphy 3 siblings, 1 reply; 48+ messages in thread From: Andrew Pam @ 2020-05-11 1:39 UTC (permalink / raw) To: Chris Murphy; +Cc: Graham Cobb, Btrfs BTRFS On 10/5/20 6:33 am, Chris Murphy wrote: > 2. That a scrub started, then cancelled, then resumed, also does > finish (or not). OK, I have now run a scrub with multiple cancel and resumes and that also proceeded and finished normally as expected: $ sudo ./btrfs scrub status -d /home NOTE: Reading progress from status file UUID: 85069ce9-be06-4c92-b8c1-8a0f685e43c6 scrub device /dev/sda (id 1) history Scrub resumed: Mon May 11 06:06:37 2020 Status: finished Duration: 7:27:31 Total to scrub: 3.67TiB Rate: 142.96MiB/s Error summary: no errors found scrub device /dev/sdb (id 2) history Scrub resumed: Mon May 11 06:06:37 2020 Status: finished Duration: 7:27:15 Total to scrub: 3.67TiB Rate: 143.04MiB/s Error summary: no errors found [54472.936094] BTRFS info (device sda): scrub: started on devid 2 [54472.936095] BTRFS info (device sda): scrub: started on devid 1 [55224.956293] BTRFS info (device sda): scrub: not finished on devid 1 with status: -125 [55226.356563] BTRFS info (device sda): scrub: not finished on devid 2 with status: -125 [58775.602370] BTRFS info (device sda): scrub: started on devid 1 [58775.602372] BTRFS info (device sda): scrub: started on devid 2 [72393.296199] BTRFS info (device sda): scrub: not finished on devid 1 with status: -125 [72393.296215] BTRFS info (device sda): scrub: not finished on devid 2 with status: -125 [77731.999603] BTRFS info (device sda): scrub: started on devid 1 [77731.999604] BTRFS info (device sda): scrub: started on devid 2 [87727.510382] BTRFS info (device sda): scrub: not finished on devid 1 with status: -125 [87727.582401] BTRFS info (device sda): scrub: not finished on devid 2 
with status: -125 [89358.196384] BTRFS info (device sda): scrub: started on devid 1 [89358.196386] BTRFS info (device sda): scrub: started on devid 2 [89830.639654] BTRFS info (device sda): scrub: not finished on devid 2 with status: -125 [89830.856232] BTRFS info (device sda): scrub: not finished on devid 1 with status: -125 [94486.300097] BTRFS info (device sda): scrub: started on devid 2 [94486.300098] BTRFS info (device sda): scrub: started on devid 1 [96223.185459] BTRFS info (device sda): scrub: not finished on devid 1 with status: -125 [96223.227246] BTRFS info (device sda): scrub: not finished on devid 2 with status: -125 [97810.489388] BTRFS info (device sda): scrub: started on devid 1 [97810.540625] BTRFS info (device sda): scrub: started on devid 2 [98068.987932] BTRFS info (device sda): scrub: finished on devid 2 with status: 0 [98085.771626] BTRFS info (device sda): scrub: finished on devid 1 with status: 0 So by elimination it's starting to look like suspend-to-RAM might be part of the problem. That's what I'll test next. Cheers, Andrew ^ permalink raw reply [flat|nested] 48+ messages in thread
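For reference, the "status: -125" in those dmesg lines is -ECANCELED: the kernel reporting that each scrub run stopped because it was cancelled, which matches the cancel/resume cycles Andrew describes, while the final pair correctly ends with status 0. A quick check of the errno value (on Linux):

```python
import errno
import os

# -125 in the "scrub: not finished ... with status: -125" messages is -ECANCELED
print(errno.ECANCELED, os.strerror(errno.ECANCELED))  # 125 and "Operation canceled" on Linux
```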
* Re: btrfs-progs reports nonsense scrub status 2020-05-11 1:39 ` Andrew Pam @ 2020-05-11 4:46 ` Chris Murphy 2020-05-13 6:47 ` Andrew Pam 0 siblings, 1 reply; 48+ messages in thread From: Chris Murphy @ 2020-05-11 4:46 UTC (permalink / raw) To: Andrew Pam; +Cc: Chris Murphy, Graham Cobb, Btrfs BTRFS On Sun, May 10, 2020 at 7:39 PM Andrew Pam <andrew@sericyb.com.au> wrote: > > On 10/5/20 6:33 am, Chris Murphy wrote: > > 2. That a scrub started, then cancelled, then resumed, also does > > finish (or not). > > OK, I have now run a scrub with multiple cancel and resumes and that > also proceeded and finished normally as expected: > > $ sudo ./btrfs scrub status -d /home > NOTE: Reading progress from status file > UUID: 85069ce9-be06-4c92-b8c1-8a0f685e43c6 > scrub device /dev/sda (id 1) history > Scrub resumed: Mon May 11 06:06:37 2020 > Status: finished > Duration: 7:27:31 > Total to scrub: 3.67TiB > Rate: 142.96MiB/s > Error summary: no errors found > scrub device /dev/sdb (id 2) history > Scrub resumed: Mon May 11 06:06:37 2020 > Status: finished > Duration: 7:27:15 > Total to scrub: 3.67TiB > Rate: 143.04MiB/s > Error summary: no errors found > > [54472.936094] BTRFS info (device sda): scrub: started on devid 2 > [54472.936095] BTRFS info (device sda): scrub: started on devid 1 > [55224.956293] BTRFS info (device sda): scrub: not finished on devid 1 > with status: -125 > [55226.356563] BTRFS info (device sda): scrub: not finished on devid 2 > with status: -125 > [58775.602370] BTRFS info (device sda): scrub: started on devid 1 > [58775.602372] BTRFS info (device sda): scrub: started on devid 2 > [72393.296199] BTRFS info (device sda): scrub: not finished on devid 1 > with status: -125 > [72393.296215] BTRFS info (device sda): scrub: not finished on devid 2 > with status: -125 > [77731.999603] BTRFS info (device sda): scrub: started on devid 1 > [77731.999604] BTRFS info (device sda): scrub: started on devid 2 > [87727.510382] BTRFS info (device sda): scrub: not 
finished on devid 1 > with status: -125 > [87727.582401] BTRFS info (device sda): scrub: not finished on devid 2 > with status: -125 > [89358.196384] BTRFS info (device sda): scrub: started on devid 1 > [89358.196386] BTRFS info (device sda): scrub: started on devid 2 > [89830.639654] BTRFS info (device sda): scrub: not finished on devid 2 > with status: -125 > [89830.856232] BTRFS info (device sda): scrub: not finished on devid 1 > with status: -125 > [94486.300097] BTRFS info (device sda): scrub: started on devid 2 > [94486.300098] BTRFS info (device sda): scrub: started on devid 1 > [96223.185459] BTRFS info (device sda): scrub: not finished on devid 1 > with status: -125 > [96223.227246] BTRFS info (device sda): scrub: not finished on devid 2 > with status: -125 > [97810.489388] BTRFS info (device sda): scrub: started on devid 1 > [97810.540625] BTRFS info (device sda): scrub: started on devid 2 > [98068.987932] BTRFS info (device sda): scrub: finished on devid 2 with > status: 0 > [98085.771626] BTRFS info (device sda): scrub: finished on devid 1 with > status: 0 > > So by elimination it's starting to look like suspend-to-RAM might be > part of the problem. That's what I'll test next. > Power management is difficult. (I'm actually working on a git bisect right now, older laptop won't wake from suspend, 5.7 regression.) Do all the devices wake up correctly isn't always easy to get an answer to. They might all have power but did they really come up in the correct state? Thing is, you're reporting that iotop independently shows a transfer rate consistent with getting data off the drives. I also wonder whether the socket that Graham mentions, could get in some kind of stuck or confused state due to sleep/wake cycle? My case, NVMe, is maybe not the best example because that's just PCIe. In your case it's real drives, so it's SCSI, block, and maybe libata and other things. -- Chris Murphy ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: btrfs-progs reports nonsense scrub status 2020-05-11 4:46 ` Chris Murphy @ 2020-05-13 6:47 ` Andrew Pam 2020-05-14 0:13 ` Chris Murphy 0 siblings, 1 reply; 48+ messages in thread From: Andrew Pam @ 2020-05-13 6:47 UTC (permalink / raw) To: Chris Murphy; +Cc: Graham Cobb, Btrfs BTRFS

On 11/5/20 2:46 pm, Chris Murphy wrote:
> I also wonder whether the socket that Graham mentions, could get in
> some kind of stuck or confused state due to sleep/wake cycle? My case,
> NVMe, is maybe not the best example because that's just PCIe. In your
> case it's real drives, so it's SCSI, block, and maybe libata and other
> things.

Could be.  When I start a new scrub and suspend the system, after a
resume further attempts to run "btrfs scrub status -dR /home" result in
the following:

NOTE: Reading progress from status file
UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
scrub device /dev/sda (id 1) status
	Scrub started:    Wed May 13 16:10:12 2020
	Status:           running
	Duration:         0:00:22
	data_extents_scrubbed: 0
	tree_extents_scrubbed: 29238
	data_bytes_scrubbed: 0
	tree_bytes_scrubbed: 479035392
	read_errors: 0
	csum_errors: 0
	verify_errors: 0
	no_csum: 0
	csum_discards: 0
	super_errors: 0
	malloc_errors: 0
	uncorrectable_errors: 0
	unverified_errors: 0
	corrected_errors: 0
	last_physical: 0
scrub device /dev/sdb (id 2) status
	Scrub started:    Wed May 13 16:10:12 2020
	Status:           running
	Duration:         0:00:23
	data_extents_scrubbed: 0
	tree_extents_scrubbed: 27936
	data_bytes_scrubbed: 0
	tree_bytes_scrubbed: 457703424
	read_errors: 0
	csum_errors: 0
	verify_errors: 0
	no_csum: 0
	csum_discards: 0
	super_errors: 0
	malloc_errors: 0
	uncorrectable_errors: 0
	unverified_errors: 0
	corrected_errors: 0
	last_physical: 0

So it appears that the socket connection to the kernel is not able to be
reestablished after the resume from suspend-to-RAM.  Interestingly, if I
then manually run "btrfs scrub cancel /home" and "btrfs scrub resume -c3
/home" then the status reports start working again.
It fails only when "btrfs scrub resume -c3 /home" is run from the script
"/usr/lib/systemd/system-sleep/btrfs-scrub" as follows:

#!/bin/sh

case $1/$2 in
pre/*)
	btrfs scrub cancel /home
	;;
post/*)
	sleep 1
	btrfs scrub resume -c3 /home
	;;
esac

Without the sleep, it does not resume the scrub.  A longer sleep (5
seconds) does not resolve the issue with the status reports.  Maybe this
is some kind of systemd problem... :(

Thanks,
Andrew
^ permalink raw reply	[flat|nested] 48+ messages in thread
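[A note on the hook above: it relies on fixed sleeps. A more defensive variant would poll for the usermode scrub process to actually exit after "scrub cancel" before the machine sleeps, in case the cancel is still completing when suspend happens. This is only a sketch — the pgrep-based check, the ten-second timeout, and the /home mount point are illustrative assumptions, not a verified fix.]

```shell
#!/bin/sh
# Sketch of a /usr/lib/systemd/system-sleep/btrfs-scrub hook that waits
# for the scrub process to exit after "scrub cancel" instead of racing
# the suspend.  Mount point, timeout, and process check are assumptions.

# Run the command "$1" up to "$2" times, one second apart, returning
# once the command stops succeeding or the tries run out.
wait_for_exit() {
    cmd=$1; tries=$2
    while [ "$tries" -gt 0 ] && $cmd >/dev/null 2>&1; do
        tries=$((tries - 1))
        sleep 1
    done
}

case "$1/$2" in
pre/*)
    btrfs scrub cancel /home
    # Give the usermode scrub process time to write out its final
    # status before the system sleeps.
    wait_for_exit "pgrep -x btrfs" 10
    ;;
post/*)
    sleep 1
    btrfs scrub resume -c3 /home
    ;;
esac
```

[Whether this avoids the stuck-status symptom is untested; it only removes the guesswork about how long the cancel takes to finish.]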
* Re: btrfs-progs reports nonsense scrub status 2020-05-13 6:47 ` Andrew Pam @ 2020-05-14 0:13 ` Chris Murphy 2020-05-14 16:18 ` Graham Cobb 0 siblings, 1 reply; 48+ messages in thread From: Chris Murphy @ 2020-05-14 0:13 UTC (permalink / raw) To: Andrew Pam; +Cc: Chris Murphy, Graham Cobb, Btrfs BTRFS On Wed, May 13, 2020 at 12:49 AM Andrew Pam <andrew@sericyb.com.au> wrote: > > On 11/5/20 2:46 pm, Chris Murphy wrote: > > I also wonder whether the socket that Graham mentions, could get in > > some kind of stuck or confused state due to sleep/wake cycle? My case, > > NVMe, is maybe not the best example because that's just PCIe. In your > > case it's real drives, so it's SCSI, block, and maybe libata and other > > things. > > Could be. When I start a new scrub and suspend the system, after a > resume further attempts to run "btrfs scrub status -dR /home" result in > the following: > > NOTE: Reading progress from status file > UUID: 85069ce9-be06-4c92-b8c1-8a0f685e43c6 > scrub device /dev/sda (id 1) status > Scrub started: Wed May 13 16:10:12 2020 > Status: running > Duration: 0:00:22 > data_extents_scrubbed: 0 > tree_extents_scrubbed: 29238 > data_bytes_scrubbed: 0 > tree_bytes_scrubbed: 479035392 > read_errors: 0 > csum_errors: 0 > verify_errors: 0 > no_csum: 0 > csum_discards: 0 > super_errors: 0 > malloc_errors: 0 > uncorrectable_errors: 0 > unverified_errors: 0 > corrected_errors: 0 > last_physical: 0 > scrub device /dev/sdb (id 2) status > Scrub started: Wed May 13 16:10:12 2020 > Status: running > Duration: 0:00:23 > data_extents_scrubbed: 0 > tree_extents_scrubbed: 27936 > data_bytes_scrubbed: 0 > tree_bytes_scrubbed: 457703424 > read_errors: 0 > csum_errors: 0 > verify_errors: 0 > no_csum: 0 > csum_discards: 0 > super_errors: 0 > malloc_errors: 0 > uncorrectable_errors: 0 > unverified_errors: 0 > corrected_errors: 0 > last_physical: 0 > > So it appears that the socket connection to the kernel is not able to be > reestablished after the resume from 
suspend-to-RAM. Interestingly, if I > then manually run "btrfs scrub cancel /home" and "btrfs scrub resume -c3 > /home" then the status reports start working again. It fails only when > "btrfs scrub resume -c3 /home" is run from the script > "/usr/lib/systemd/system-sleep/btrfs-scrub" as follows: > > #!/bin/sh > > case $1/$2 in > pre/*) > btrfs scrub cancel /home > ;; > post/*) > sleep 1 > btrfs scrub resume -c3 /home > ;; > esac > > Without the sleep, it does not resume the scrub. A longer sleep (5 > seconds) does not resolve the issue with the status reports. > > Maybe this is some kind of systemd problem... :( Oof. So possibly two bugs. Well, you could post an inquiry about it to systemd-devel@. And then maybe try to reproduce with something that has systemd-245 to see if it makes a difference. I ran into something that looked like races in 244 with a rust generator I was testing, that didn't happen in 245. *shrug* But I'm not sure testing wise how to isolate the systemd from the socket questions. -- Chris Murphy ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: btrfs-progs reports nonsense scrub status 2020-05-14 0:13 ` Chris Murphy @ 2020-05-14 16:18 ` Graham Cobb 2020-05-15 2:37 ` Andrew Pam 0 siblings, 1 reply; 48+ messages in thread From: Graham Cobb @ 2020-05-14 16:18 UTC (permalink / raw) To: Chris Murphy, Andrew Pam; +Cc: Btrfs BTRFS On 14/05/2020 01:13, Chris Murphy wrote: > On Wed, May 13, 2020 at 12:49 AM Andrew Pam <andrew@sericyb.com.au> wrote: >> >> On 11/5/20 2:46 pm, Chris Murphy wrote: >>> I also wonder whether the socket that Graham mentions, could get in >>> some kind of stuck or confused state due to sleep/wake cycle? Clarification: I originally said that the socket was for communication between the scrub process and the kernel, but that was wrong. I tried to correct in a later message but, in any case, the socket is between the scrub usermode process and the "status" command. I suppose it is possible that suspend/resume causes some problem with sockets or with the thread which processes the socket but I would be surprised. In any case, that should only cause strangeness with "scrub status". My case, >>> NVMe, is maybe not the best example because that's just PCIe. In your >>> case it's real drives, so it's SCSI, block, and maybe libata and other >>> things. >> >> Could be. When I start a new scrub and suspend the system, after a >> resume further attempts to run "btrfs scrub status -dR /home" result in >> the following: >> >> NOTE: Reading progress from status file This means that scrub status could not connect to the socket and has fallen back to reading the file (which may be out of date if the scrub is actually still running). Either there is some strangeness about the socket or the scrub usermode process itself is not running or the thread which handles the socket is stuck (it normally sleeps in a "poll" waiting for connections to the socket). 
>> UUID: 85069ce9-be06-4c92-b8c1-8a0f685e43c6 >> scrub device /dev/sda (id 1) status >> Scrub started: Wed May 13 16:10:12 2020 >> Status: running >> Duration: 0:00:22 >> data_extents_scrubbed: 0 >> tree_extents_scrubbed: 29238 >> data_bytes_scrubbed: 0 >> tree_bytes_scrubbed: 479035392 >> read_errors: 0 >> csum_errors: 0 >> verify_errors: 0 >> no_csum: 0 >> csum_discards: 0 >> super_errors: 0 >> malloc_errors: 0 >> uncorrectable_errors: 0 >> unverified_errors: 0 >> corrected_errors: 0 >> last_physical: 0 >> scrub device /dev/sdb (id 2) status >> Scrub started: Wed May 13 16:10:12 2020 >> Status: running >> Duration: 0:00:23 >> data_extents_scrubbed: 0 >> tree_extents_scrubbed: 27936 >> data_bytes_scrubbed: 0 >> tree_bytes_scrubbed: 457703424 >> read_errors: 0 >> csum_errors: 0 >> verify_errors: 0 >> no_csum: 0 >> csum_discards: 0 >> super_errors: 0 >> malloc_errors: 0 >> uncorrectable_errors: 0 >> unverified_errors: 0 >> corrected_errors: 0 >> last_physical: 0 >> >> So it appears that the socket connection to the kernel is not able to be >> reestablished after the resume from suspend-to-RAM. Interestingly, if I >> then manually run "btrfs scrub cancel /home" and "btrfs scrub resume -c3 >> /home" then the status reports start working again. It fails only when >> "btrfs scrub resume -c3 /home" is run from the script >> "/usr/lib/systemd/system-sleep/btrfs-scrub" as follows: >> >> #!/bin/sh >> >> case $1/$2 in >> pre/*) >> btrfs scrub cancel /home >> ;; >> post/*) >> sleep 1 >> btrfs scrub resume -c3 /home >> ;; >> esac Try scrub resume restarts the scrub and backgrounds it. Maybe there is something strange about background processes created in the restore script. What happens if you specify "-B" on the resume? Another potential problem is that the "scrub cancel" cancels the scrub but the scrub process doesn't immediately exit (it waits to receive the "cancelled" status and get the final stats, then it writes them out and does some tidying up). 
So maybe there is a race: maybe the scrub is still cancelling (either in the kernel, or the scrub usermode process itself is still thinking about exiting) when the suspend happens. On resume, maybe the cancel then completes -- that might even be why the resume doesn't work without the delay. It is even possible that the resume could succeed in resuming but using bad last_physical data if the cancel process is still writing it out. I don't suppose anyone has really tested races between cancel and resume. Try adding a sleep after the cancel (or logic to see if the scrub process has actually exited). >> >> Without the sleep, it does not resume the scrub. A longer sleep (5 >> seconds) does not resolve the issue with the status reports. >> >> Maybe this is some kind of systemd problem... :( > > Oof. So possibly two bugs. > > Well, you could post an inquiry about it to systemd-devel@. And then > maybe try to reproduce with something that has systemd-245 to see if > it makes a difference. I ran into something that looked like races in > 244 with a rust generator I was testing, that didn't happen in 245. > *shrug* > > But I'm not sure testing wise how to isolate the systemd from the > socket questions. > > ^ permalink raw reply [flat|nested] 48+ messages in thread
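[For reference, the on-disk file that the status command falls back to — its contents were posted near the top of the thread — is plain text: a "scrub status:1" header, then one line per device of the form "<UUID>:<devid>|key:value|key:value|...". A small awk-based extractor makes it easy to check a counter such as last_physical (the resume point) directly; the function name and sample file path below are illustrative, not part of btrfs-progs.]

```shell
# Pull one counter per device out of a btrfs-progs scrub.status file.
# Prints "<devid> <value>" for each device line.
scrub_field() {
    # $1 = status file, $2 = field name
    awk -F'|' -v want="$2" 'NR > 1 {
        n = split($1, id, ":")          # first field is "<UUID>:<devid>"
        for (i = 2; i <= NF; i++) {
            split($i, kv, ":")
            if (kv[1] == want)
                print id[n], kv[2]
        }
    }' "$1"
}

# Example using the values posted earlier in the thread (abridged):
cat > /tmp/scrub.status.sample <<'EOF'
scrub status:1
85069ce9-be06-4c92-b8c1-8a0f685e43c6:1|data_bytes_scrubbed:7193709023232|last_physical:0|finished:0
85069ce9-be06-4c92-b8c1-8a0f685e43c6:2|data_bytes_scrubbed:6169469337600|last_physical:0|finished:0
EOF
scrub_field /tmp/scrub.status.sample last_physical
# prints:
# 1 0
# 2 0
```

[A last_physical of 0 on both devices after a cancel, as above, is exactly the "bad resume point" case discussed here: a resume reading this file would restart from the beginning.]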
* Re: btrfs-progs reports nonsense scrub status 2020-05-14 16:18 ` Graham Cobb @ 2020-05-15 2:37 ` Andrew Pam 2020-05-15 11:27 ` Graham Cobb 2020-07-04 14:51 ` Andrew Pam 0 siblings, 2 replies; 48+ messages in thread From: Andrew Pam @ 2020-05-15 2:37 UTC (permalink / raw) To: Graham Cobb, Chris Murphy; +Cc: Btrfs BTRFS

On 15/5/20 2:18 am, Graham Cobb wrote:
> Try scrub resume restarts the scrub and backgrounds it. Maybe there is
> something strange about background processes created in the restore
> script. What happens if you specify "-B" on the resume?

Then the resume does not occur.  After the resume from suspend-to-RAM,
no "btrfs" process is running and the status still shows as "aborted".
This suggests that "btrfs scrub resume -B" fails when run from within
the systemd post-resume script.

> Try adding a sleep after the cancel (or logic to see if the scrub
> process has actually exited).

I tried adding both "killall -s0 -w btrfs" and "sleep 5" - even with
these changes, "btrfs scrub resume" (without the -B option) once again
shows "last_physical: 0".

Thanks,
Andrew
^ permalink raw reply	[flat|nested] 48+ messages in thread
* Re: btrfs-progs reports nonsense scrub status 2020-05-15 2:37 ` Andrew Pam @ 2020-05-15 11:27 ` Graham Cobb 2020-07-04 14:51 ` Andrew Pam 1 sibling, 0 replies; 48+ messages in thread From: Graham Cobb @ 2020-05-15 11:27 UTC (permalink / raw) To: Andrew Pam, Chris Murphy; +Cc: Btrfs BTRFS On 15/05/2020 03:37, Andrew Pam wrote: > On 15/5/20 2:18 am, Graham Cobb wrote: >> Try scrub resume restarts the scrub and backgrounds it. Maybe there is >> something strange about background processes created in the restore >> script. What happens if you specify "-B" on the resume? > > Then the resume does not occur. After the resume from suspend-to-RAM, > no "btrfs" process is running and the status still shows as "aborted". > This suggests that the "btrfs resume -B" fails when run from within the > systemd post-resume script. Maybe something special about the post-resume script. Have you tried not resuming the scrub in the systemd script but doing it by hand after everything has settled down? > >> Try adding a sleep after the cancel (or logic to see if the scrub >> process has actually exited). > > I tried adding both "killall -s0 -w btrfs" and "sleep 5" - even with > these changes, the "btrfs resume" (without -B option) once again shows > "last_physical: 0" Oh well, worth a try. Did you try running the resume using my logging patch? Was the 0 for last_physical definitely being returned from the kernel in the ioctl response? It may mean running it manually (or redirecting stderr to a file). Graham ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: btrfs-progs reports nonsense scrub status 2020-05-15 2:37 ` Andrew Pam 2020-05-15 11:27 ` Graham Cobb @ 2020-07-04 14:51 ` Andrew Pam 1 sibling, 0 replies; 48+ messages in thread From: Andrew Pam @ 2020-07-04 14:51 UTC (permalink / raw) To: Graham Cobb, Chris Murphy; +Cc: Btrfs BTRFS On 15/5/20 12:37 pm, Andrew Pam wrote: > After the resume from suspend-to-RAM, no "btrfs" process is running > and the status still shows as "aborted". This suggests that the > "btrfs resume -B" fails when run from within the systemd post-resume > script. Here's what I ultimately ended up with: http://www.sericyb.com.au/HOWTO-btrfs-scrub-suspend.html Hope that helps, Andrew ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: btrfs-progs reports nonsense scrub status 2020-05-08 17:06 ` Andrew Pam 2020-05-08 21:20 ` Chris Murphy 2020-05-08 21:31 ` Chris Murphy @ 2020-05-09 8:02 ` Su Yue 2020-05-09 8:23 ` Andrew Pam 2 siblings, 1 reply; 48+ messages in thread From: Su Yue @ 2020-05-09 8:02 UTC (permalink / raw) To: Andrew Pam; +Cc: Graham Cobb, Chris Murphy, Btrfs BTRFS

On Sat 09 May 2020 at 01:06, Andrew Pam <andrew@sericyb.com.au> wrote:

> And here we are again:
>
> $ sudo btrfs scrub status -d /home
> UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
> scrub device /dev/sda (id 1) status
>   Scrub resumed:    Sat May  9 02:52:12 2020
>   Status:           running
>   Duration:         7:02:55
>   Time left:        32261372:31:39
>   ETA:              Fri Sep 17 23:35:41 5700
>   Total to scrub:   3.66TiB
>   Bytes scrubbed:   3.67TiB
>   Rate:             151.47MiB/s
>   Error summary:    no errors found
> scrub device /dev/sdb (id 2) status
>   Scrub resumed:    Sat May  9 02:52:12 2020
>   Status:           running
>   Duration:         7:02:59
>   Time left:        31973655:40:34
>   ETA:              Mon Nov 21 19:44:36 5667
>   Total to scrub:   3.66TiB
>   Bytes scrubbed:   3.70TiB
>   Rate:             152.83MiB/s
>   Error summary:    no errors found
>

Weird. Any operation was done since the last "interrupted" status?
Does iotop show any actual io of the two devices?

--
Su

> I tried building btrfs-progs v5.6.1 from source, but it gives exactly
> the same results.
>
> Cheers,
> Andrew
^ permalink raw reply	[flat|nested] 48+ messages in thread
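[The impossible "Time left" figures quoted above are consistent with a simple arithmetic failure: once "Bytes scrubbed" overshoots "Total to scrub" (plausible after a resume that restarted from last_physical 0 while the cumulative counters persisted), a remaining-bytes subtraction done in unsigned 64-bit arithmetic wraps to a value near 2^64 and the ETA explodes. Whether btrfs-progs computes the ETA exactly this way is an assumption, but the magnitude matches; a sketch:]

```shell
# Sketch (not btrfs-progs' actual code) of an ETA computed from the
# reported counters, with the remaining-bytes subtraction wrapping as
# an unsigned 64-bit value would when scrubbed > total.
eta_hours() {
    # $1 = total (TiB), $2 = scrubbed (TiB), $3 = rate (MiB/s)
    awk -v total="$1" -v scrubbed="$2" -v rate="$3" 'BEGIN {
        remaining = (total - scrubbed) * 2^40    # bytes
        if (remaining < 0)
            remaining += 2^64                    # u64 wraparound
        printf "%d\n", remaining / (rate * 2^20) / 3600
    }'
}

# Figures quoted above for /dev/sda: total 3.66TiB, scrubbed 3.67TiB,
# rate 151.47MiB/s -- yields tens of millions of hours, on the order
# of the reported "Time left: 32261372:31:39".
eta_hours 3.66 3.67 151.47
```

[The same arithmetic applied to the very first report in the thread (total 7.30TiB, scrubbed 10.20TiB, rate 288.06MiB/s) likewise lands near the reported 16964119 hours, which supports the underflow reading.]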
* Re: btrfs-progs reports nonsense scrub status 2020-05-09 8:02 ` Su Yue @ 2020-05-09 8:23 ` Andrew Pam 2020-05-09 20:28 ` Chris Murphy 0 siblings, 1 reply; 48+ messages in thread From: Andrew Pam @ 2020-05-09 8:23 UTC (permalink / raw) To: Su Yue; +Cc: Graham Cobb, Chris Murphy, Btrfs BTRFS On 9/5/20 6:02 pm, Su Yue wrote: > Weird. Any operation was done since the last "interrupted" status? Just suspending and resuming the system. > Does iotop show any actual io of the two devices? Yes, iotop shows >100M/s on each drive from the "btrfs resume" processes at all times when the scrub is running, even when it doesn't appear to be making any progress. Maybe the reporting from the kernel is wrong. Cheers, Andrew ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: btrfs-progs reports nonsense scrub status 2020-05-09 8:23 ` Andrew Pam @ 2020-05-09 20:28 ` Chris Murphy 2020-05-09 20:39 ` Andrew Pam 0 siblings, 1 reply; 48+ messages in thread From: Chris Murphy @ 2020-05-09 20:28 UTC (permalink / raw) To: Andrew Pam; +Cc: Su Yue, Graham Cobb, Chris Murphy, Btrfs BTRFS On Sat, May 9, 2020 at 2:23 AM Andrew Pam <andrew@sericyb.com.au> wrote: > > On 9/5/20 6:02 pm, Su Yue wrote: > > Weird. Any operation was done since the last "interrupted" status? > > Just suspending and resuming the system. OK so you haven't done a scrub from start to finish without a cancel+suspend+resume? I just want to be clear exactly when this bug is and isn't reproducible. -- Chris Murphy ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: btrfs-progs reports nonsense scrub status 2020-05-09 20:28 ` Chris Murphy @ 2020-05-09 20:39 ` Andrew Pam 0 siblings, 0 replies; 48+ messages in thread From: Andrew Pam @ 2020-05-09 20:39 UTC (permalink / raw) To: Chris Murphy; +Cc: Su Yue, Graham Cobb, Btrfs BTRFS On 10/5/20 6:28 am, Chris Murphy wrote: > OK so you haven't done a scrub from start to finish without a > cancel+suspend+resume? I have not, because my system is set to suspend after 30 minutes to save power. I will disable that and test a scrub with no suspend next. Cheers, Andrew ^ permalink raw reply [flat|nested] 48+ messages in thread
end of thread, other threads:[~2020-07-04 14:56 UTC | newest] Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-05-05 5:46 btrfs-progs reports nonsense scrub status Andrew Pam 2020-05-05 9:51 ` Graham Cobb 2020-05-05 10:10 ` Andrew Pam 2020-05-05 20:39 ` Andrew Pam 2020-05-06 23:42 ` Chris Murphy 2020-05-07 1:09 ` Andrew Pam 2020-05-07 5:10 ` Chris Murphy 2020-05-07 5:36 ` Chris Murphy 2020-05-07 5:56 ` Andrew Pam 2020-05-08 2:26 ` Chris Murphy 2020-05-08 2:31 ` Andrew Pam 2020-05-08 2:48 ` Chris Murphy 2020-05-08 2:54 ` Andrew Pam 2020-05-08 3:55 ` Chris Murphy 2020-05-08 4:04 ` Andrew Pam 2020-05-08 4:21 ` Chris Murphy 2020-05-08 4:42 ` Andrew Pam 2020-05-08 4:45 ` Chris Murphy 2020-05-08 4:52 ` Andrew Pam 2020-05-08 7:37 ` Chris Murphy 2020-05-08 8:19 ` Andrew Pam 2020-05-08 11:21 ` Graham Cobb 2020-05-08 12:54 ` Andrew Pam 2020-05-08 17:06 ` Andrew Pam 2020-05-08 21:20 ` Chris Murphy 2020-05-08 21:31 ` Chris Murphy 2020-05-09 10:15 ` Andrew Pam 2020-05-09 20:33 ` Chris Murphy 2020-05-10 1:14 ` Chris Murphy 2020-05-10 1:28 ` Andrew Pam 2020-05-10 1:32 ` Andrew Pam 2020-05-10 11:52 ` Graham Cobb 2020-05-10 17:21 ` Graham Cobb 2020-05-10 17:27 ` Andrew Pam 2020-05-10 1:30 ` Andrew Pam 2020-05-10 6:17 ` Andrew Pam 2020-05-11 1:39 ` Andrew Pam 2020-05-11 4:46 ` Chris Murphy 2020-05-13 6:47 ` Andrew Pam 2020-05-14 0:13 ` Chris Murphy 2020-05-14 16:18 ` Graham Cobb 2020-05-15 2:37 ` Andrew Pam 2020-05-15 11:27 ` Graham Cobb 2020-07-04 14:51 ` Andrew Pam 2020-05-09 8:02 ` Su Yue 2020-05-09 8:23 ` Andrew Pam 2020-05-09 20:28 ` Chris Murphy 2020-05-09 20:39 ` Andrew Pam