* btrfs-progs reports nonsense scrub status
@ 2020-05-05  5:46 Andrew Pam
  2020-05-05  9:51 ` Graham Cobb
  0 siblings, 1 reply; 48+ messages in thread
From: Andrew Pam @ 2020-05-05  5:46 UTC (permalink / raw)
  To: linux-btrfs

$ sudo btrfs scrub status /home
UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
	no stats available
Time left:        16964119:40:20
ETA:              Mon Aug  8 23:23:14 3955
Total to scrub:   7.30TiB
Bytes scrubbed:   10.20TiB
Rate:             288.06MiB/s
Error summary:    no errors found

Cheers,
	Andrew

* Re: btrfs-progs reports nonsense scrub status
  2020-05-05  5:46 btrfs-progs reports nonsense scrub status Andrew Pam
@ 2020-05-05  9:51 ` Graham Cobb
  2020-05-05 10:10   ` Andrew Pam
  2020-05-05 20:39   ` Andrew Pam
  0 siblings, 2 replies; 48+ messages in thread
From: Graham Cobb @ 2020-05-05  9:51 UTC (permalink / raw)
  To: Andrew Pam, linux-btrfs

On 05/05/2020 06:46, Andrew Pam wrote:
> $ sudo btrfs scrub status /home
> UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
> 	no stats available
> Time left:        16964119:40:20
> ETA:              Mon Aug  8 23:23:14 3955
> Total to scrub:   7.30TiB
> Bytes scrubbed:   10.20TiB
> Rate:             288.06MiB/s
> Error summary:    no errors found

btrfs-progs version? (output from 'btrfs version')

Is there actually a scrub in progress?

Presumably there is no stats file? Try 'cat
/var/lib/btrfs/scrub.status.85069ce9-be06-4c92-b8c1-8a0f685e43c6'

Is this a multi-device filesystem? See what 'btrfs scrub status -d
/home' says.


* Re: btrfs-progs reports nonsense scrub status
  2020-05-05  9:51 ` Graham Cobb
@ 2020-05-05 10:10   ` Andrew Pam
  2020-05-05 20:39   ` Andrew Pam
  1 sibling, 0 replies; 48+ messages in thread
From: Andrew Pam @ 2020-05-05 10:10 UTC (permalink / raw)
  To: Graham Cobb, linux-btrfs

On 5/5/20 7:51 pm, Graham Cobb wrote:
> On 05/05/2020 06:46, Andrew Pam wrote:
>> $ sudo btrfs scrub status /home
>> UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
>> 	no stats available
>> Time left:        16964119:40:20
>> ETA:              Mon Aug  8 23:23:14 3955
>> Total to scrub:   7.30TiB
>> Bytes scrubbed:   10.20TiB
>> Rate:             288.06MiB/s
>> Error summary:    no errors found
> 
> btrfs-progs version? (output from 'btrfs version')

btrfs-progs v5.4.1

> Is there actually a scrub in progress?

Yes.

> Presumably there is no stats file? Try 'cat
> /var/lib/btrfs/scrub.status.85069ce9-be06-4c92-b8c1-8a0f685e43c6'

scrub status:1
85069ce9-be06-4c92-b8c1-8a0f685e43c6:1|data_extents_scrubbed:116366415|tree_extents_scrubbed:1154955|data_bytes_scrubbed:7193709023232|tree_bytes_scrubbed:18922782720|read_errors:0|csum_errors:0|verify_errors:0|no_csum:363202|csum_discards:0|super_errors:0|malloc_errors:0|uncorrectable_errors:0|corrected_errors:0|last_physical:0|t_start:1588531014|t_resumed:0|duration:43505|canceled:0|finished:0
85069ce9-be06-4c92-b8c1-8a0f685e43c6:2|data_extents_scrubbed:99457032|tree_extents_scrubbed:899136|data_bytes_scrubbed:6169469337600|tree_bytes_scrubbed:14731444224|read_errors:0|csum_errors:0|verify_errors:0|no_csum:306304|csum_discards:0|super_errors:0|malloc_errors:0|uncorrectable_errors:0|corrected_errors:0|last_physical:0|t_start:0|t_resumed:0|duration:37079|canceled:0|finished:0

> Is this a multi-device filesystem?

Yes.

> See what 'btrfs scrub status -d /home' says.

UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
scrub device /dev/sda (id 1) status
Scrub resumed:    Tue May  5 20:08:45 2020
Status:           running
Duration:         12:05:35
Time left:        30907070:08:51
ETA:              Sun Mar 17 11:18:10 5546
Total to scrub:   3.66TiB
Bytes scrubbed:   6.56TiB
Rate:             158.11MiB/s
Error summary:    no errors found
scrub device /dev/sdb (id 2) status
	no stats available
Time left:        30720615:37:35
ETA:              Mon Dec  8 12:46:54 5524
Total to scrub:   3.66TiB
Bytes scrubbed:   5.63TiB
Rate:             159.07MiB/s
Error summary:    no errors found

Cheers,
	Andrew

* Re: btrfs-progs reports nonsense scrub status
  2020-05-05  9:51 ` Graham Cobb
  2020-05-05 10:10   ` Andrew Pam
@ 2020-05-05 20:39   ` Andrew Pam
  2020-05-06 23:42     ` Chris Murphy
  1 sibling, 1 reply; 48+ messages in thread
From: Andrew Pam @ 2020-05-05 20:39 UTC (permalink / raw)
  To: Graham Cobb, linux-btrfs

On 5/5/20 7:51 pm, Graham Cobb wrote:
> Is there actually a scrub in progress?

The scrub has been going for a couple of days now, and has scrubbed
considerably more data than exists on the disks.  Will it ever finish?

Thanks,
	Andrew

* Re: btrfs-progs reports nonsense scrub status
  2020-05-05 20:39   ` Andrew Pam
@ 2020-05-06 23:42     ` Chris Murphy
  2020-05-07  1:09       ` Andrew Pam
  0 siblings, 1 reply; 48+ messages in thread
From: Chris Murphy @ 2020-05-06 23:42 UTC (permalink / raw)
  To: Andrew Pam; +Cc: Graham Cobb, Btrfs BTRFS

On Tue, May 5, 2020 at 2:39 PM Andrew Pam <andrew@sericyb.com.au> wrote:
>
> On 5/5/20 7:51 pm, Graham Cobb wrote:
> > Is there actually a scrub in progress?
>
> The scrub has been going for a couple of days now, and has scrubbed
> considerably more data than exists on the disks.  Will it ever finish?

A raid1 volume has twice as many bytes to scrub as the data reported
by df. Can you tell us what kernel version you're running? And also
what you get for:
$ sudo btrfs fi us /mp/
$ df -h

I'm using progs 5.6 and kernel 5.6.8 for this:

$ sudo btrfs scrub status /mnt/third
UUID:
Scrub resumed:    Tue May  5 08:45:41 2020
Status:           finished
Duration:         2:41:12
Total to scrub:   759.43GiB
Rate:             79.57MiB/s
Error summary:    no errors found
$ sudo btrfs fi us /mnt/third
Overall:
    Device size:         931.49GiB
    Device allocated:         762.02GiB
    Device unallocated:         169.48GiB
    Device missing:             0.00B
    Used:             759.43GiB
    Free (estimated):          85.17GiB    (min: 85.17GiB)
    Data ratio:                  2.00
    Metadata ratio:              2.00
    Global reserve:         512.00MiB    (used: 0.00B)

Data,RAID1: Size:379.00GiB, Used:378.56GiB (99.89%)
   /dev/mapper/sdd     379.00GiB
   /dev/mapper/sdc     379.00GiB

Metadata,RAID1: Size:2.00GiB, Used:1.15GiB (57.57%)
   /dev/mapper/sdd       2.00GiB
   /dev/mapper/sdc       2.00GiB

System,RAID1: Size:8.00MiB, Used:80.00KiB (0.98%)
   /dev/mapper/sdd       8.00MiB
   /dev/mapper/sdc       8.00MiB

Unallocated:
   /dev/mapper/sdd      84.74GiB
   /dev/mapper/sdc      84.74GiB

$ df -h
...
/dev/mapper/sdd    466G  381G   86G  82% /mnt/third

I think what you're seeing is a bug. Most of the size reporting in
btrfs commands happens in btrfs-progs; the scrub itself is initiated
from user space, but most of the work is done by the kernel. I don't
know offhand where the progress-tracking code lives.

For some of these sizes you have to infer which perspective is being
used. There is no single correct perspective, so it's normal to get a
bit confused about the convention that applies. On mdadm/lvm raid1 the
mirror isn't included in any of the space reporting, so it looks like
only half the storage is reported. Btrfs, by contrast, reports all of
the storage, and in some places shows data taking up twice as much
space (because behind the scenes each block group of extents really
does have a mirror).
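
For concreteness, here's the arithmetic for the example above (a rough
sketch using only the 'btrfs fi us' and 'df -h' numbers already shown):

one copy:    378.56GiB data + 1.15GiB metadata ≈ 379.7GiB
             (roughly the 381G that df reports as used)
both copies: 2 x 379.7GiB ≈ 759.4GiB
             (the overall "Used: 759.43GiB", which is what a full scrub reads)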


--
Chris Murphy

* Re: btrfs-progs reports nonsense scrub status
  2020-05-06 23:42     ` Chris Murphy
@ 2020-05-07  1:09       ` Andrew Pam
  2020-05-07  5:10         ` Chris Murphy
  0 siblings, 1 reply; 48+ messages in thread
From: Andrew Pam @ 2020-05-07  1:09 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Graham Cobb, Btrfs BTRFS

On 7/5/20 9:42 am, Chris Murphy wrote:
> A raid1 volume has twice as many bytes to scrub as data reported by
> df.

It's scrubbed more than twice as many bytes, though.

> Can you tell us what kernel version?

5.4.0

> And also what you get for:
> $ sudo btrfs fi us /mp/

Overall:
    Device size:		  10.92TiB
    Device allocated:		   7.32TiB
    Device unallocated:		   3.59TiB
    Device missing:		     0.00B
    Used:			   7.31TiB
    Free (estimated):		   1.80TiB	(min: 1.80TiB)
    Data ratio:			      2.00
    Metadata ratio:		      2.00
    Global reserve:		 512.00MiB	(used: 0.00B)

Data,RAID1: Size:3.65TiB, Used:3.65TiB (99.89%)
   /dev/sda	   3.65TiB
   /dev/sdb	   3.65TiB

Metadata,RAID1: Size:8.00GiB, Used:6.54GiB (81.74%)
   /dev/sda	   8.00GiB
   /dev/sdb	   8.00GiB

System,RAID1: Size:64.00MiB, Used:544.00KiB (0.83%)
   /dev/sda	  64.00MiB
   /dev/sdb	  64.00MiB

Unallocated:
   /dev/sda	   1.80TiB
   /dev/sdb	   1.80TiB

> $ df -h

/dev/sda        5.5T  3.7T  1.9T  67% /home

> I think what you're seeing is a bug. Most of the size reporting in
> btrfs commands is in btrfs-progs; whereas the scrub is initiated by
> user space, most of the work is done by the kernel. But I don't know
> where the tracking code is.

No kidding.  What concerns me now is that the scrub shows no signs of
ever stopping:

$ sudo btrfs scrub status -d /home
UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
scrub device /dev/sda (id 1) status
Scrub started:    Mon May  4 04:36:54 2020
Status:           running
Duration:         18:06:28
Time left:        31009959:50:08
ETA:              Fri Dec 13 03:58:24 5557
Total to scrub:   3.66TiB
Bytes scrubbed:   9.80TiB
Rate:             157.58MiB/s
Error summary:    no errors found
scrub device /dev/sdb (id 2) status
	no stats available
Time left:        30892482:15:09
ETA:              Wed Jul 19 05:23:25 5544
Total to scrub:   3.66TiB
Bytes scrubbed:   8.86TiB
Rate:             158.18MiB/s
Error summary:    no errors found

Cheers,
	Andrew

* Re: btrfs-progs reports nonsense scrub status
  2020-05-07  1:09       ` Andrew Pam
@ 2020-05-07  5:10         ` Chris Murphy
  2020-05-07  5:36           ` Chris Murphy
  0 siblings, 1 reply; 48+ messages in thread
From: Chris Murphy @ 2020-05-07  5:10 UTC (permalink / raw)
  To: Andrew Pam; +Cc: Chris Murphy, Graham Cobb, Btrfs BTRFS

On Wed, May 6, 2020 at 7:11 PM Andrew Pam <andrew@sericyb.com.au> wrote:
>
> > $ sudo btrfs fi us /mp/
>
> Overall:
>     Device size:                  10.92TiB
>     Device allocated:              7.32TiB
>     Device unallocated:            3.59TiB
>     Device missing:                  0.00B
>     Used:                          7.31TiB


Bytes to scrub should be 7.31TiB...


> $ sudo btrfs scrub status -d /home
> UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
> scrub device /dev/sda (id 1) status
> Scrub started:    Mon May  4 04:36:54 2020
> Status:           running
> Duration:         18:06:28
> Time left:        31009959:50:08
> ETA:              Fri Dec 13 03:58:24 5557
> Total to scrub:   3.66TiB
> Bytes scrubbed:   9.80TiB


So, two bugs: "Total to scrub" is wrong, and the scrubbed bytes figure
is bigger than both the reported total and the correct total that
should be scrubbed.

Make that three bugs: the time estimates are nonsense too. This sounds
familiar. Maybe just upgrade your btrfs-progs.



-- 
Chris Murphy

* Re: btrfs-progs reports nonsense scrub status
  2020-05-07  5:10         ` Chris Murphy
@ 2020-05-07  5:36           ` Chris Murphy
  2020-05-07  5:56             ` Andrew Pam
  0 siblings, 1 reply; 48+ messages in thread
From: Chris Murphy @ 2020-05-07  5:36 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Andrew Pam, Graham Cobb, Btrfs BTRFS

On Wed, May 6, 2020 at 11:10 PM Chris Murphy <lists@colorremedies.com> wrote:
>
> On Wed, May 6, 2020 at 7:11 PM Andrew Pam <andrew@sericyb.com.au> wrote:
> >
> > > $ sudo btrfs fi us /mp/
> >
> > Overall:
> >     Device size:                  10.92TiB
> >     Device allocated:              7.32TiB
> >     Device unallocated:            3.59TiB
> >     Device missing:                  0.00B
> >     Used:                          7.31TiB
>
>
> Bytes to scrub should be 7.31TB...
>
>
> > $ sudo btrfs scrub status -d /home
> > UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
> > scrub device /dev/sda (id 1) status
> > Scrub started:    Mon May  4 04:36:54 2020
> > Status:           running
> > Duration:         18:06:28
> > Time left:        31009959:50:08
> > ETA:              Fri Dec 13 03:58:24 5557
> > Total to scrub:   3.66TiB
> > Bytes scrubbed:   9.80TiB
>
>
> So two bugs. Total to scrub is wrong. And scrubbed bytes is bigger
> than both the reported total to scrub and the correct total that
> should be scrubbed.
>
> Three bugs, the time is goofy. This sounds familiar. Maybe just
> upgrade your btrfs-progs.

This was fixed in 5.2.1. I'm not sure why you're seeing this.

commit 96ed8e801fa2fc2d8a99e757566293c05572ebe1
Author: Grzegorz Kowal <grzegorz@amuncode.org>
Date:   Sun Jul 7 14:58:56 2019 -0300

    btrfs-progs: scrub: fix ETA calculation


What I would do is cancel the scrub, and then delete the applicable
file in /var/lib/btrfs, which is the file that keeps track of the
scrub. Then run 'btrfs scrub status' on that file system; it should
say there are no stats, but it'd be interesting to know whether Total
to Scrub is sane. You can also start another scrub, then check the
status again and see whether it's still sane. If not, I'd cancel it
and keep troubleshooting.
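
Concretely, something like this (a sketch; the UUID in the status file
name is the one from your earlier output):

$ sudo btrfs scrub cancel /home
$ sudo rm /var/lib/btrfs/scrub.status.85069ce9-be06-4c92-b8c1-8a0f685e43c6
$ sudo btrfs scrub status /home      # should now say "no stats available"
$ sudo btrfs scrub start /home       # optionally start a fresh scrub
$ sudo btrfs scrub status -d /home   # check the numbers still look sane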

I was recently on btrfs-progs 5.4.1 and didn't see this behavior
myself on a raid1 volume.



-- 
Chris Murphy

* Re: btrfs-progs reports nonsense scrub status
  2020-05-07  5:36           ` Chris Murphy
@ 2020-05-07  5:56             ` Andrew Pam
  2020-05-08  2:26               ` Chris Murphy
  0 siblings, 1 reply; 48+ messages in thread
From: Andrew Pam @ 2020-05-07  5:56 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Graham Cobb, Btrfs BTRFS

On 7/5/20 3:36 pm, Chris Murphy wrote:
> This was fixed in 5.2.1. I'm not sure why you're seeing this.
> 
> commit 96ed8e801fa2fc2d8a99e757566293c05572ebe1
> Author: Grzegorz Kowal <grzegorz@amuncode.org>
> Date:   Sun Jul 7 14:58:56 2019 -0300
> 
>     btrfs-progs: scrub: fix ETA calculation

Maybe not fixed under all conditions!  :)

> What I would do is cancel the scrub. And then delete the applicable
> file in /var/lib/btrfs, which is the file that keeps track of the
> scrub. Then do 'btrfs scrub status' on that file system and it should
> say there are no stats, but it'd be interesting to know if Total to
> Scrub is sane.

$ sudo btrfs scrub status /home
UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
	no stats available
Total to scrub:   7.31TiB
Rate:             0.00B/s
Error summary:    no errors found

> You can also start another scrub, and then again check
> status and see if it's still sane or not. If not I'd cancel it and
> keep troubleshooting.

$ sudo btrfs scrub status -d /home
UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
scrub device /dev/sda (id 1) status
Scrub started:    Thu May  7 15:44:21 2020
Status:           running
Duration:         0:06:53
Time left:        9:23:26
ETA:              Fri May  8 01:14:40 2020
Total to scrub:   3.66TiB
Bytes scrubbed:   45.24GiB
Rate:             112.16MiB/s
Error summary:    no errors found
scrub device /dev/sdb (id 2) status
Scrub started:    Thu May  7 15:44:21 2020
Status:           running
Duration:         0:06:53
Time left:        9:24:50
ETA:              Fri May  8 01:16:04 2020
Total to scrub:   3.66TiB
Bytes scrubbed:   45.12GiB
Rate:             111.88MiB/s
Error summary:    no errors found

Still sane after cancelling and resuming.

One thing that might be relevant:  On the original scrub, I started it
on the mountpoint but initially cancelled and resumed it on the device
/dev/sda rather than the mountpoint.  Could that trigger a bug?

Cheers,
	Andrew

* Re: btrfs-progs reports nonsense scrub status
  2020-05-07  5:56             ` Andrew Pam
@ 2020-05-08  2:26               ` Chris Murphy
  2020-05-08  2:31                 ` Andrew Pam
  0 siblings, 1 reply; 48+ messages in thread
From: Chris Murphy @ 2020-05-08  2:26 UTC (permalink / raw)
  To: Andrew Pam; +Cc: Chris Murphy, Graham Cobb, Btrfs BTRFS

On Wed, May 6, 2020 at 11:56 PM Andrew Pam <andrew@sericyb.com.au> wrote:
>
> On 7/5/20 3:36 pm, Chris Murphy wrote:
> > This was fixed in 5.2.1. I'm not sure why you're seeing this.
> >
> > commit 96ed8e801fa2fc2d8a99e757566293c05572ebe1
> > Author: Grzegorz Kowal <grzegorz@amuncode.org>
> > Date:   Sun Jul 7 14:58:56 2019 -0300
> >
> >     btrfs-progs: scrub: fix ETA calculation
>
> Maybe not fixed under all conditions!  :)
>
> > What I would do is cancel the scrub. And then delete the applicable
> > file in /var/lib/btrfs, which is the file that keeps track of the
> > scrub. Then do 'btrfs scrub status' on that file system and it should
> > say there are no stats, but it'd be interesting to know if Total to
> > Scrub is sane.
>
> $ sudo btrfs scrub status /home
> UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
>         no stats available
> Total to scrub:   7.31TiB
> Rate:             0.00B/s
> Error summary:    no errors found
>
> > You can also start another scrub, and then again check
> > status and see if it's still sane or not. If not I'd cancel it and
> > keep troubleshooting.
>
> $ sudo btrfs scrub status -d /home
> UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
> scrub device /dev/sda (id 1) status
> Scrub started:    Thu May  7 15:44:21 2020
> Status:           running
> Duration:         0:06:53
> Time left:        9:23:26
> ETA:              Fri May  8 01:14:40 2020
> Total to scrub:   3.66TiB
> Bytes scrubbed:   45.24GiB
> Rate:             112.16MiB/s
> Error summary:    no errors found
> scrub device /dev/sdb (id 2) status
> Scrub started:    Thu May  7 15:44:21 2020
> Status:           running
> Duration:         0:06:53
> Time left:        9:24:50
> ETA:              Fri May  8 01:16:04 2020
> Total to scrub:   3.66TiB
> Bytes scrubbed:   45.12GiB
> Rate:             111.88MiB/s
> Error summary:    no errors found
>
> Still sane after cancelling and resuming.
>
> One thing that might be relevant:  On the original scrub, I started it
> on the mountpoint but initially cancelled and resumed it on the device
> /dev/sda rather than the mountpoint.  Could that trigger a bug?

Maybe. Try it and see.



-- 
Chris Murphy

* Re: btrfs-progs reports nonsense scrub status
  2020-05-08  2:26               ` Chris Murphy
@ 2020-05-08  2:31                 ` Andrew Pam
  2020-05-08  2:48                   ` Chris Murphy
  0 siblings, 1 reply; 48+ messages in thread
From: Andrew Pam @ 2020-05-08  2:31 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Graham Cobb, Btrfs BTRFS

On 8/5/20 12:26 pm, Chris Murphy wrote:
>> One thing that might be relevant:  On the original scrub, I started it
>> on the mountpoint but initially cancelled and resumed it on the device
>> /dev/sda rather than the mountpoint.  Could that trigger a bug?
> 
> Maybe. Try it and see.

I just want to get a scrub to complete normally first.  So far it's not
making a lot of progress.  More visibility into the process might be
helpful.

Cheers,
	Andrew

* Re: btrfs-progs reports nonsense scrub status
  2020-05-08  2:31                 ` Andrew Pam
@ 2020-05-08  2:48                   ` Chris Murphy
  2020-05-08  2:54                     ` Andrew Pam
  0 siblings, 1 reply; 48+ messages in thread
From: Chris Murphy @ 2020-05-08  2:48 UTC (permalink / raw)
  To: Andrew Pam; +Cc: Chris Murphy, Graham Cobb, Btrfs BTRFS

On Thu, May 7, 2020 at 8:32 PM Andrew Pam <andrew@sericyb.com.au> wrote:
>
> On 8/5/20 12:26 pm, Chris Murphy wrote:
> >> One thing that might be relevant:  On the original scrub, I started it
> >> on the mountpoint but initially cancelled and resumed it on the device
> >> /dev/sda rather than the mountpoint.  Could that trigger a bug?
> >
> > Maybe. Try it and see.
>
> I just want to get a scrub to complete normally first.  So far it's not
> making a lot of progress.  More visibility into the process might be
> helpful.

>Rate:             112.16MiB/s
>Rate:             111.88MiB/s

These are hard drives? Those are reasonable rates. It's typical for
drives to differ by ~50% from the outermost to the innermost tracks.
Depending on the chunk layout and progress, the rate may vary,
including between drives.

-- 
Chris Murphy

* Re: btrfs-progs reports nonsense scrub status
  2020-05-08  2:48                   ` Chris Murphy
@ 2020-05-08  2:54                     ` Andrew Pam
  2020-05-08  3:55                       ` Chris Murphy
  0 siblings, 1 reply; 48+ messages in thread
From: Andrew Pam @ 2020-05-08  2:54 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Graham Cobb, Btrfs BTRFS

On 8/5/20 12:48 pm, Chris Murphy wrote:
>> Rate:             112.16MiB/s
>> Rate:             111.88MiB/s
> 
> These are hard drives? Those are reasonable rates. It's typical for
> drives to differ ~50% from outer most to inner most tracks. Depending
> on the chunk layout and progress, the rate may vary, including between
> drives.

It's not the rate I'm concerned about - it's that the "Bytes scrubbed"
barely increases, the "Time left" doesn't decrease, and the "ETA"
recedes ever further into the future.

Cheers,
	Andrew

* Re: btrfs-progs reports nonsense scrub status
  2020-05-08  2:54                     ` Andrew Pam
@ 2020-05-08  3:55                       ` Chris Murphy
  2020-05-08  4:04                         ` Andrew Pam
  0 siblings, 1 reply; 48+ messages in thread
From: Chris Murphy @ 2020-05-08  3:55 UTC (permalink / raw)
  To: Andrew Pam; +Cc: Chris Murphy, Graham Cobb, Btrfs BTRFS

On Thu, May 7, 2020 at 8:54 PM Andrew Pam <andrew@sericyb.com.au> wrote:
>
> On 8/5/20 12:48 pm, Chris Murphy wrote:
> >> Rate:             112.16MiB/s
> >> Rate:             111.88MiB/s
> >
> > These are hard drives? Those are reasonable rates. It's typical for
> > drives to differ ~50% from outer most to inner most tracks. Depending
> > on the chunk layout and progress, the rate may vary, including between
> > drives.
>
> It's not the rate I'm concerned about - it's that the "Bytes scrubbed"
> barely increases, the "Time left" doesn't decrease, and the "ETA"
> recedes ever further into the future.

OK what's the current output from
$ sudo btrfs scrub status -d /home

21 hours ago, the report was that it'd take 9 hours to scrub.


-- 
Chris Murphy

* Re: btrfs-progs reports nonsense scrub status
  2020-05-08  3:55                       ` Chris Murphy
@ 2020-05-08  4:04                         ` Andrew Pam
  2020-05-08  4:21                           ` Chris Murphy
  0 siblings, 1 reply; 48+ messages in thread
From: Andrew Pam @ 2020-05-08  4:04 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Graham Cobb, Btrfs BTRFS

On 8/5/20 1:55 pm, Chris Murphy wrote:
> OK what's the current output from
> $ sudo btrfs scrub status -d /home
> 
> 21 hours ago, the report was that it'd take 9 hours to scrub.

UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
scrub device /dev/sda (id 1) status
Scrub started:    Thu May  7 15:44:21 2020
Status:           running
Duration:         5:40:13
Time left:        1:22:58
ETA:              Fri May  8 15:25:44 2020
Total to scrub:   3.66TiB
Bytes scrubbed:   2.94TiB
Rate:             151.16MiB/s
Error summary:    no errors found
scrub device /dev/sdb (id 2) status
Scrub started:    Thu May  7 15:44:21 2020
Status:           running
Duration:         5:40:16
Time left:        1:18:02
ETA:              Fri May  8 15:20:48 2020
Total to scrub:   3.66TiB
Bytes scrubbed:   2.98TiB
Rate:             152.92MiB/s
Error summary:    no errors found

The "Duration" and "Time left" and "Bytes scrubbed" values have not
changed for a few hours, despite considerable disk I/O.

Cheers,
	Andrew

* Re: btrfs-progs reports nonsense scrub status
  2020-05-08  4:04                         ` Andrew Pam
@ 2020-05-08  4:21                           ` Chris Murphy
  2020-05-08  4:42                             ` Andrew Pam
  0 siblings, 1 reply; 48+ messages in thread
From: Chris Murphy @ 2020-05-08  4:21 UTC (permalink / raw)
  To: Andrew Pam; +Cc: Chris Murphy, Graham Cobb, Btrfs BTRFS

On Thu, May 7, 2020 at 10:04 PM Andrew Pam <andrew@sericyb.com.au> wrote:
>
> On 8/5/20 1:55 pm, Chris Murphy wrote:
> > OK what's the current output from
> > $ sudo btrfs scrub status -d /home
> >
> > 21 hours ago, the report was that it'd take 9 hours to scrub.
>
> UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
> scrub device /dev/sda (id 1) status
> Scrub started:    Thu May  7 15:44:21 2020
> Status:           running
> Duration:         5:40:13
> Time left:        1:22:58
> ETA:              Fri May  8 15:25:44 2020
> Total to scrub:   3.66TiB
> Bytes scrubbed:   2.94TiB
> Rate:             151.16MiB/s

What does 'iotop -d 10 -o' report? I'm expecting around 300MB/s reads.

The ETA is +14 hours from what you posted 21 hours ago. So yeah,
that's fakaked, but at least it's not saying it'll be done in the year
5544!

I've always seen the ETAs be pretty accurate so I don't know what's going on.

3082813.44MB to go divided by 300MB/s is 171 minutes. Or just under 3
hours. So the time left / ETA is wrong based on this rate, if it's a
stable rate, which it might not be.

The gotcha: a changing rate, whether from concurrent load or a decent
amount of free space fragmentation, could be to blame. Hence iotop.

What are the current mount options for this file system?
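
For example (a sketch; iostat needs the sysstat package, and findmnt is
just a convenient way to print the mount options):

$ sudo iotop -d 10 -o          # per-process I/O, 10s interval, active only
$ iostat -dmx sda sdb 10       # per-device throughput and utilization
$ findmnt -no OPTIONS /home    # current mount options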



-- 
Chris Murphy

* Re: btrfs-progs reports nonsense scrub status
  2020-05-08  4:21                           ` Chris Murphy
@ 2020-05-08  4:42                             ` Andrew Pam
  2020-05-08  4:45                               ` Chris Murphy
  0 siblings, 1 reply; 48+ messages in thread
From: Andrew Pam @ 2020-05-08  4:42 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Graham Cobb, Btrfs BTRFS

On 8/5/20 2:21 pm, Chris Murphy wrote:
> What does 'iotop -d 10 -o' report? I'm expecting around 300MB/s reads.

91872 idle root      149.25 M/s    0.00 B/s  0.00 %  0.00 % btrfs scrub
resume -c3 /home
91873 idle root      157.77 M/s    0.00 B/s  0.00 %  0.00 % btrfs scrub
resume -c3 /home

> The ETA is +14 hours what you posted 21 hours ago. So yeah that's
> fakaked, but at least it's not saying it'll be done in year 5544!

I did cancel when I went to bed and resumed when I got up three hours
later on two occasions, so subtract six hours.

> I've always seen the ETAs be pretty accurate so I don't know what's going on.

Maybe it screws up once you cancel and resume too many times?

> 3082813.44MB to go divided by 300MB/s is 171 minutes. Or just under 3
> hours. So the time left / ETA is wrong based on this rate, if it's a
> stable rate, which it might not be.

I have an I/O load graph on my screen and it shows 100% read load on
both drives at all times, with an I/O rate of around 130-160 M/s per drive.

> What are the current mount options for this file system?

/dev/sda on /home type btrfs
(rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/)

Thanks,
	Andrew

* Re: btrfs-progs reports nonsense scrub status
  2020-05-08  4:42                             ` Andrew Pam
@ 2020-05-08  4:45                               ` Chris Murphy
  2020-05-08  4:52                                 ` Andrew Pam
  0 siblings, 1 reply; 48+ messages in thread
From: Chris Murphy @ 2020-05-08  4:45 UTC (permalink / raw)
  To: Andrew Pam; +Cc: Chris Murphy, Graham Cobb, Btrfs BTRFS

On Thu, May 7, 2020 at 10:42 PM Andrew Pam <andrew@sericyb.com.au> wrote:
>
> On 8/5/20 2:21 pm, Chris Murphy wrote:
> > What does 'iotop -d 10 -o' report? I'm expecting around 300MB/s reads.
>
> 91872 idle root      149.25 M/s    0.00 B/s  0.00 %  0.00 % btrfs scrub
> resume -c3 /home
> 91873 idle root      157.77 M/s    0.00 B/s  0.00 %  0.00 % btrfs scrub
> resume -c3 /home
>
> > The ETA is +14 hours what you posted 21 hours ago. So yeah that's
> > fakaked, but at least it's not saying it'll be done in year 5544!
>
> I did cancel when I went to bed and resumed when I got up three hours
> later on two occasions, so subtract six hours.
>
> > I've always seen the ETAs be pretty accurate so I don't know what's going on.
>
> Maybe it screws up once you cancel and resume too many times?
>
> > 3082813.44MB to go divided by 300MB/s is 171 minutes. Or just under 3
> > hours. So the time left / ETA is wrong based on this rate, if it's a
> > stable rate, which it might not be.
>
> I have an I/O load graph on my screen and it shows 100% read load on
> both drives at all times, with an I/O rate of around 130-160 M/s per drive.
>
> > What are the current mount options for this file system?
>
> /dev/sda on /home type btrfs
> (rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
>
> Thanks,
>         Andrew

I don't know whether space_cache v1 can negatively impact scrubs, but
it does negatively impact other things, especially on larger file
systems with a lot of free space remaining. The v1 cache is stored as
data blocks; the v2 cache is a dedicated 'free space tree' that lives
in fs metadata. It's essentially the inverse of the extent tree.

A conservative approach that might speed things up:

# mount -o remount,clear_cache,nospace_cache /mnt

Once it's done:

# mount -o remount,clear_cache,space_cache=v2 /mnt

This sets a feature flag, and the next time you mount normally, it'll
use v2. You could do this now during the scrub, but it might slow it
down a bit while the new cache is being created. If the file system
isn't busy it might take a minute to build.
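
Applied to this filesystem it would look roughly like this (a sketch;
run the second remount only after the first completes, and the
dump-super at the end is just one way to confirm the free-space-tree
flag got set):

# mount -o remount,clear_cache,nospace_cache /home
# mount -o remount,clear_cache,space_cache=v2 /home
# btrfs inspect-internal dump-super /dev/sda | grep -i free_space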

Anyway, I'm pretty confident this scrub will finish in about 2.5 hours
if you just leave it as is.

-- 
Chris Murphy

* Re: btrfs-progs reports nonsense scrub status
  2020-05-08  4:45                               ` Chris Murphy
@ 2020-05-08  4:52                                 ` Andrew Pam
  2020-05-08  7:37                                   ` Chris Murphy
  0 siblings, 1 reply; 48+ messages in thread
From: Andrew Pam @ 2020-05-08  4:52 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Graham Cobb, Btrfs BTRFS

On 8/5/20 2:45 pm, Chris Murphy wrote:
> I don't know if space_cache v1 can negatively impact scrubs but it
> does negatively impact other things especially on larger file systems
> with a lot of free space remaining. v1 cache exists as data blocks, v2
> cache is a dedicated 'free space tree' and resides in fs metadata.
> It's the inverse of the extent tree.
>
> A conservative approach that might speed things up:
>
> # mount -o remount,clear_cache,nospace_cache /mnt
>
> Once it's done:
>
> # mount -o remount,clear_cache,space_cache=v2 /mnt
>
> This sets a feature flag, and the next time you mount normally, it'll
> use v2. You could do this now during the scrub, but it might slow it
> down a bit while the new cache is being created. If the file system
> isn't busy it might take a minute to build.

Thanks, I didn't know that - I just used the defaults in /etc/fstab as
follows:

UUID=85069ce9-be06-4c92-b8c1-8a0f685e43c6 /home		btrfs
defaults,autodefrag,noatime

I'll put space_cache=v2 in there as well.

> Anyway, I'm pretty confident this scrub will finish in about 2.5 hours
> if you just leave it as is.

I'm not.  It's still not reporting any forward progress, just moving the
ETA ahead:

$ sudo btrfs scrub status -d /home
UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
scrub device /dev/sda (id 1) status
Scrub started:    Thu May  7 15:44:21 2020
Status:           running
Duration:         5:40:13
Time left:        1:23:04
ETA:              Fri May  8 16:11:31 2020
Total to scrub:   3.66TiB
Bytes scrubbed:   2.94TiB
Rate:             151.16MiB/s
Error summary:    no errors found
scrub device /dev/sdb (id 2) status
Scrub started:    Thu May  7 15:44:21 2020
Status:           running
Duration:         5:40:16
Time left:        1:18:09
ETA:              Fri May  8 16:06:36 2020
Total to scrub:   3.66TiB
Bytes scrubbed:   2.98TiB
Rate:             152.92MiB/s
Error summary:    no errors found

Thanks,
	Andrew

* Re: btrfs-progs reports nonsense scrub status
  2020-05-08  4:52                                 ` Andrew Pam
@ 2020-05-08  7:37                                   ` Chris Murphy
  2020-05-08  8:19                                     ` Andrew Pam
  0 siblings, 1 reply; 48+ messages in thread
From: Chris Murphy @ 2020-05-08  7:37 UTC (permalink / raw)
  To: Andrew Pam; +Cc: Chris Murphy, Graham Cobb, Btrfs BTRFS

On Thu, May 7, 2020 at 10:52 PM Andrew Pam <andrew@sericyb.com.au> wrote:
>
> On 8/5/20 2:45 pm, Chris Murphy wrote:
> > I don't know if space_cache v1 can negatively impact scrubs but it
> > does negatively impact other things especially on larger file systems
> > with a lot of free space remaining. v1 cache exists as data blocks, v2
> > cache is a dedicated 'free space tree' and resides in fs metadata.
> > It's the inverse of the extent tree.
> >
> > A conservative approach that might speed things up:
> >
> > # mount -o remount,clear_cache,nospace_cache /mnt
> >
> > Once it's done:
> >
> > # mount -o remount,clear_cache,space_cache=v2 /mnt
> >
> > This sets a feature flag, and the next time you mount normally, it'll
> > use v2. You could do this now during the scrub, but it might slow it
> > down a bit while the new cache is being created. If the file system
> > isn't busy it might take a minute to build.
>
> Thanks, I didn't know that - I just used the defaults in /etc/fstab as
> follows:
>
> UUID=85069ce9-be06-4c92-b8c1-8a0f685e43c6 /home         btrfs
> defaults,autodefrag,noatime
>
> I'll put space_cache=v2 in there as well.

Is there anything else going on? What do you get for 'top -d 30'?

Autodefrag can sometimes be expensive when it's doing copies in the
background, and it isn't inhibited during a scrub.

You do not need to put space_cache=v2 in fstab. It's a one-time option
and it needs to be used together with clear_cache. When you use it, a
feature flag is set that causes v2 to be used automatically each time
the file system mounts. Really the only thing that needs to be in
there is 'noatime'.
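
So the fstab entry could be trimmed to something like this (a sketch;
keep autodefrag only if you actually want it):

UUID=85069ce9-be06-4c92-b8c1-8a0f685e43c6  /home  btrfs  noatime,autodefrag  0  0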


>
> > Anyway, I'm pretty confident this scrub will finish in about 2.5 hours
> > if you just leave it as is.
>
> I'm not.  It's still not reporting any forward progress, just moving the
> ETA ahead:
>
> $ sudo btrfs scrub status -d /home
> UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
> scrub device /dev/sda (id 1) status
> Scrub started:    Thu May  7 15:44:21 2020
> Status:           running
> Duration:         5:40:13
> Time left:        1:23:04
> ETA:              Fri May  8 16:11:31 2020
> Total to scrub:   3.66TiB
> Bytes scrubbed:   2.94TiB
> Rate:             151.16MiB/s

OK, I screwed up the previous math. I love it when I do that. Since
this is -d, the numbers are per device.

0.72T remaining
0.72×1024×1024÷151÷60÷60=1h23

Are there any messages in dmesg?

All of these numbers look sane compared to your first post. But the
fact that the ETA keeps increasing suggests the scrub isn't proceeding
at the reported rate, as if it's busy doing something else.


-- 
Chris Murphy

* Re: btrfs-progs reports nonsense scrub status
  2020-05-08  7:37                                   ` Chris Murphy
@ 2020-05-08  8:19                                     ` Andrew Pam
  2020-05-08 11:21                                       ` Graham Cobb
  0 siblings, 1 reply; 48+ messages in thread
From: Andrew Pam @ 2020-05-08  8:19 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Graham Cobb, Btrfs BTRFS

On 8/5/20 5:37 pm, Chris Murphy wrote:
> Are there any messages in dmesg?

Well this is interesting:

[129682.760759] BTRFS info (device sda): scrub: finished on devid 2 with
status: 0
[129683.173404] BTRFS info (device sda): scrub: finished on devid 1 with
status: 0

But then:

$ sudo btrfs scrub status -d /home
UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
scrub device /dev/sda (id 1) status
Scrub started:    Thu May  7 15:44:21 2020
Status:           interrupted
Duration:         5:40:13
Total to scrub:   3.66TiB
Rate:             151.16MiB/s
Error summary:    no errors found
scrub device /dev/sdb (id 2) status
Scrub started:    Thu May  7 15:44:21 2020
Status:           interrupted
Duration:         5:40:16
Total to scrub:   3.66TiB
Rate:             152.92MiB/s
Error summary:    no errors found

So was it really "interrupted", or did it finish normally with no errors
but btrfs-progs is reporting wrongly?

Thanks,
	Andrew

* Re: btrfs-progs reports nonsense scrub status
  2020-05-08  8:19                                     ` Andrew Pam
@ 2020-05-08 11:21                                       ` Graham Cobb
  2020-05-08 12:54                                         ` Andrew Pam
  2020-05-08 17:06                                         ` Andrew Pam
  0 siblings, 2 replies; 48+ messages in thread
From: Graham Cobb @ 2020-05-08 11:21 UTC (permalink / raw)
  To: Andrew Pam, Chris Murphy; +Cc: Btrfs BTRFS

On 08/05/2020 09:19, Andrew Pam wrote:
> On 8/5/20 5:37 pm, Chris Murphy wrote:
>> Are there any messages in dmesg?
> 
> Well this is interesting:
> 
> [129682.760759] BTRFS info (device sda): scrub: finished on devid 2 with
> status: 0
> [129683.173404] BTRFS info (device sda): scrub: finished on devid 1 with
> status: 0
> 
> But then:
> 
> $ sudo btrfs scrub status -d /home
> UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
> scrub device /dev/sda (id 1) status
> Scrub started:    Thu May  7 15:44:21 2020
> Status:           interrupted
> Duration:         5:40:13
> Total to scrub:   3.66TiB
> Rate:             151.16MiB/s
> Error summary:    no errors found
> scrub device /dev/sdb (id 2) status
> Scrub started:    Thu May  7 15:44:21 2020
> Status:           interrupted
> Duration:         5:40:16
> Total to scrub:   3.66TiB
> Rate:             152.92MiB/s
> Error summary:    no errors found
> 
> So was it really "interrupted", or did it finish normally with no errors
> but btrfs-progs is reporting wrongly?

I also don't know whether it has really finished successfully.

If you are worried that it is somehow looping (bytes scrubbed going up
but not really making progress), use:

btrfs scrub status -dR /home

and look at last_physical (for each disk) - it should always be increasing.

Also, there have been bugs in cancel/resume in the past. There could be
more bugs lurking there, particularly for multi-device filesystems.

If you are going to cancel and resume, check last_physical for each
device before the cancel (using 'status -dR') and after the resume and
make sure they seem sensible (not gone backwards, or skipped massively
forward, or started again on a device which had already finished).
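
One quick way to watch that (a sketch; it just filters the -dR output
shown elsewhere in this thread):

$ sudo watch -n 60 "btrfs scrub status -dR /home | grep -E 'device|last_physical'"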

* Re: btrfs-progs reports nonsense scrub status
  2020-05-08 11:21                                       ` Graham Cobb
@ 2020-05-08 12:54                                         ` Andrew Pam
  2020-05-08 17:06                                         ` Andrew Pam
  1 sibling, 0 replies; 48+ messages in thread
From: Andrew Pam @ 2020-05-08 12:54 UTC (permalink / raw)
  To: Graham Cobb, Chris Murphy; +Cc: Btrfs BTRFS

On 8/5/20 9:21 pm, Graham Cobb wrote:
> If you are worried that it is somehow looping (bytes scrubbed going up
> but not really making progress), use:
> 
> btrfs scrub status -dR /home

Aha!  That's what I was looking for.

> and look at last_physical (for each disk) - it should be always increasing.
> 
> Also, there have been bugs in cancel/resume in the past. There could be
> more bugs lurking there, particularly for multi-device filesystems.

Apparently!  Unfortunately, since scrub blocks suspend, I have to use
cancel/resume in my suspend pre/post scripts.
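
A minimal sketch of what such a systemd system-sleep hook can look like
(the path, the -c3 option and the unconditional cancel are assumptions,
not the exact script):

#!/bin/sh
# e.g. /usr/lib/systemd/system-sleep/btrfs-scrub
# systemd calls this with $1 = "pre" before sleep and "post" after wakeup
case "$1" in
    pre)
        btrfs scrub cancel /home || true
        ;;
    post)
        btrfs scrub resume -c3 /home
        ;;
esac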

> If you are going to cancel and resume, check last_physical for each
> device before the cancel (using 'status -dR') and after the resume and
> make sure they seem sensible (not gone backwards, or skipped massively
> forward, or started again on a device which had already finished).

$ sudo btrfs scrub status -dR /home
UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
scrub device /dev/sda (id 1) status
Scrub started:    Thu May  7 15:44:21 2020
Status:           running
Duration:         5:40:13
	data_extents_scrubbed: 51856478
	tree_extents_scrubbed: 431748
	data_bytes_scrubbed: 3228367126528
	tree_bytes_scrubbed: 7073759232
	read_errors: 0
	csum_errors: 0
	verify_errors: 0
	no_csum: 179858
	csum_discards: 0
	super_errors: 0
	malloc_errors: 0
	uncorrectable_errors: 0
	unverified_errors: 0
	corrected_errors: 0
	last_physical: 0
scrub device /dev/sdb (id 2) status
Scrub started:    Thu May  7 15:44:21 2020
Status:           running
Duration:         5:40:16
	data_extents_scrubbed: 52452792
	tree_extents_scrubbed: 431756
	data_bytes_scrubbed: 3266540351488
	tree_bytes_scrubbed: 7073890304
	read_errors: 0
	csum_errors: 0
	verify_errors: 0
	no_csum: 182034
	csum_discards: 0
	super_errors: 0
	malloc_errors: 0
	uncorrectable_errors: 0
	unverified_errors: 0
	corrected_errors: 0
	last_physical: 0

last_physical is zero.  That doesn't seem right.

Cheers,
	Andrew

* Re: btrfs-progs reports nonsense scrub status
  2020-05-08 11:21                                       ` Graham Cobb
  2020-05-08 12:54                                         ` Andrew Pam
@ 2020-05-08 17:06                                         ` Andrew Pam
  2020-05-08 21:20                                           ` Chris Murphy
                                                             ` (2 more replies)
  1 sibling, 3 replies; 48+ messages in thread
From: Andrew Pam @ 2020-05-08 17:06 UTC (permalink / raw)
  To: Graham Cobb, Chris Murphy; +Cc: Btrfs BTRFS

And here we are again:

$ sudo btrfs scrub status -d /home
UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
scrub device /dev/sda (id 1) status
Scrub resumed:    Sat May  9 02:52:12 2020
Status:           running
Duration:         7:02:55
Time left:        32261372:31:39
ETA:              Fri Sep 17 23:35:41 5700
Total to scrub:   3.66TiB
Bytes scrubbed:   3.67TiB
Rate:             151.47MiB/s
Error summary:    no errors found
scrub device /dev/sdb (id 2) status
Scrub resumed:    Sat May  9 02:52:12 2020
Status:           running
Duration:         7:02:59
Time left:        31973655:40:34
ETA:              Mon Nov 21 19:44:36 5667
Total to scrub:   3.66TiB
Bytes scrubbed:   3.70TiB
Rate:             152.83MiB/s
Error summary:    no errors found

I tried building btrfs-progs v5.6.1 from source, but it gives exactly
the same results.

Cheers,
	Andrew

* Re: btrfs-progs reports nonsense scrub status
  2020-05-08 17:06                                         ` Andrew Pam
@ 2020-05-08 21:20                                           ` Chris Murphy
  2020-05-08 21:31                                           ` Chris Murphy
  2020-05-09  8:02                                           ` Su Yue
  2 siblings, 0 replies; 48+ messages in thread
From: Chris Murphy @ 2020-05-08 21:20 UTC (permalink / raw)
  To: Andrew Pam; +Cc: Graham Cobb, Chris Murphy, Btrfs BTRFS

On Fri, May 8, 2020 at 11:06 AM Andrew Pam <andrew@sericyb.com.au> wrote:
>
> And here we are again:
>
> $ sudo btrfs scrub status -d /home
> UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
> scrub device /dev/sda (id 1) status
> Scrub resumed:    Sat May  9 02:52:12 2020
> Status:           running
> Duration:         7:02:55
> Time left:        32261372:31:39
> ETA:              Fri Sep 17 23:35:41 5700
> Total to scrub:   3.66TiB
> Bytes scrubbed:   3.67TiB


It's obviously a bug. But I don't know why.


[ 1683.525602] BTRFS warning (device sda4): qgroup rescan init failed,
qgroup is not enabled
[chris@fmac ~]




> Rate:             151.47MiB/s
> Error summary:    no errors found
> scrub device /dev/sdb (id 2) status
> Scrub resumed:    Sat May  9 02:52:12 2020
> Status:           running
> Duration:         7:02:59
> Time left:        31973655:40:34
> ETA:              Mon Nov 21 19:44:36 5667
> Total to scrub:   3.66TiB
> Bytes scrubbed:   3.70TiB
> Rate:             152.83MiB/s
> Error summary:    no errors found
>
> I tried building btrfs-progs v5.6.1 from source, but it gives exactly
> the same results.




--
Chris Murphy

* Re: btrfs-progs reports nonsense scrub status
  2020-05-08 17:06                                         ` Andrew Pam
  2020-05-08 21:20                                           ` Chris Murphy
@ 2020-05-08 21:31                                           ` Chris Murphy
  2020-05-09 10:15                                             ` Andrew Pam
  2020-05-09  8:02                                           ` Su Yue
  2 siblings, 1 reply; 48+ messages in thread
From: Chris Murphy @ 2020-05-08 21:31 UTC (permalink / raw)
  To: Andrew Pam; +Cc: Graham Cobb, Chris Murphy, Btrfs BTRFS

(Gmail is super annoying sometimes, with keyboard shortcuts that cause
email to be sent prematurely)

On Fri, May 8, 2020 at 11:06 AM Andrew Pam <andrew@sericyb.com.au> wrote:
>
> And here we are again:
>
> $ sudo btrfs scrub status -d /home
> UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
> scrub device /dev/sda (id 1) status
> Scrub resumed:    Sat May  9 02:52:12 2020
> Status:           running
> Duration:         7:02:55
> Time left:        32261372:31:39
> ETA:              Fri Sep 17 23:35:41 5700
> Total to scrub:   3.66TiB
> Bytes scrubbed:   3.67TiB
> Rate:             151.47MiB/s
> Error summary:    no errors found
> scrub device /dev/sdb (id 2) status
> Scrub resumed:    Sat May  9 02:52:12 2020
> Status:           running
> Duration:         7:02:59
> Time left:        31973655:40:34
> ETA:              Mon Nov 21 19:44:36 5667
> Total to scrub:   3.66TiB
> Bytes scrubbed:   3.70TiB
> Rate:             152.83MiB/s
> Error summary:    no errors found
>
> I tried building btrfs-progs v5.6.1 from source, but it gives exactly
> the same results.

Do you have qgroups enabled?

$ sudo btrfs qgroup show /home
ERROR: can't list qgroups: quotas not enabled

What do you get for:

$ sudo btrfs insp dump-s /dev/

This dumps out the super block.

There are scrub-related fixes in linux git since 5.4.0, but I can't
tell whether any of them are related to this problem. My suggestion is
to scrub with kernel 5.7-rc4. If the problem still happens, it's a
current and unfixed bug. If it's been fixed, then it's a question of
which commit fixed it and whether it has been, or can be, backported
to the 5.4 series.
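
One way to eyeball the candidates (a sketch; run from a clone of the
mainline linux.git tree):

$ git log --oneline v5.4..v5.7-rc4 -- fs/btrfs/scrub.c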



-- 
Chris Murphy

* Re: btrfs-progs reports nonsense scrub status
  2020-05-08 17:06                                         ` Andrew Pam
  2020-05-08 21:20                                           ` Chris Murphy
  2020-05-08 21:31                                           ` Chris Murphy
@ 2020-05-09  8:02                                           ` Su Yue
  2020-05-09  8:23                                             ` Andrew Pam
  2 siblings, 1 reply; 48+ messages in thread
From: Su Yue @ 2020-05-09  8:02 UTC (permalink / raw)
  To: Andrew Pam; +Cc: Graham Cobb, Chris Murphy, Btrfs BTRFS


On Sat 09 May 2020 at 01:06, Andrew Pam <andrew@sericyb.com.au>
wrote:

> And here we are again:
>
> $ sudo btrfs scrub status -d /home
> UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
> scrub device /dev/sda (id 1) status
> Scrub resumed:    Sat May  9 02:52:12 2020
> Status:           running
> Duration:         7:02:55
> Time left:        32261372:31:39
> ETA:              Fri Sep 17 23:35:41 5700
> Total to scrub:   3.66TiB
> Bytes scrubbed:   3.67TiB
> Rate:             151.47MiB/s
> Error summary:    no errors found
> scrub device /dev/sdb (id 2) status
> Scrub resumed:    Sat May  9 02:52:12 2020
> Status:           running
> Duration:         7:02:59
> Time left:        31973655:40:34
> ETA:              Mon Nov 21 19:44:36 5667
> Total to scrub:   3.66TiB
> Bytes scrubbed:   3.70TiB
> Rate:             152.83MiB/s
> Error summary:    no errors found
>

Weird. Was any operation done since the last "interrupted" status?
Does iotop show any actual I/O on the two devices?

--
Su
> I tried building btrfs-progs v5.6.1 from source, but it gives exactly
> the same results.
>
> Cheers,
> 	Andrew


* Re: btrfs-progs reports nonsense scrub status
  2020-05-09  8:02                                           ` Su Yue
@ 2020-05-09  8:23                                             ` Andrew Pam
  2020-05-09 20:28                                               ` Chris Murphy
  0 siblings, 1 reply; 48+ messages in thread
From: Andrew Pam @ 2020-05-09  8:23 UTC (permalink / raw)
  To: Su Yue; +Cc: Graham Cobb, Chris Murphy, Btrfs BTRFS

On 9/5/20 6:02 pm, Su Yue wrote:
> Weird. Any operation was done since the last "interrupted" status?

Just suspending and resuming the system.

> Does iotop show any actual io of the two devices?

Yes, iotop shows >100M/s on each drive from the "btrfs scrub resume"
processes at all times when the scrub is running, even when it doesn't
appear to be making any progress.  Maybe the reporting from the kernel
is wrong.

Cheers,
	Andrew

* Re: btrfs-progs reports nonsense scrub status
  2020-05-08 21:31                                           ` Chris Murphy
@ 2020-05-09 10:15                                             ` Andrew Pam
  2020-05-09 20:33                                               ` Chris Murphy
  0 siblings, 1 reply; 48+ messages in thread
From: Andrew Pam @ 2020-05-09 10:15 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Graham Cobb, Btrfs BTRFS

On 9/5/20 7:31 am, Chris Murphy wrote:
> Do you have qgroups enabled?
> 
> $ sudo btrfs qgroup show /home
> ERROR: can't list qgroups: quotas not enabled

No:

$ sudo btrfs qgroup show /home
ERROR: can't list qgroups: quotas not enabled

> What do you get for:
> 
> $ sudo btrfs insp dump-s /dev/
> 
> This dumps out the super block.

$ sudo btrfs insp dump-s /dev/sda
superblock: bytenr=65536, device=/dev/sda
---------------------------------------------------------
csum_type		0 (crc32c)
csum_size		4
csum			0x36e3f416 [match]
bytenr			65536
flags			0x1
			( WRITTEN )
magic			_BHRfS_M [match]
fsid			85069ce9-be06-4c92-b8c1-8a0f685e43c6
metadata_uuid		85069ce9-be06-4c92-b8c1-8a0f685e43c6
label			home
generation		59159
root			4881572642816
sys_array_size		258
chunk_root_generation	58225
root_level		1
chunk_root		4925720363008
chunk_root_level	1
log_root		4881575936000
log_root_transid	0
log_root_level		0
total_bytes		12002350252032
bytes_used		4022988615680
sectorsize		4096
nodesize		16384
leafsize (deprecated)	16384
stripesize		4096
root_dir		6
num_devices		2
compat_flags		0x0
compat_ro_flags		0x0
incompat_flags		0x161
			( MIXED_BACKREF |
			  BIG_METADATA |
			  EXTENDED_IREF |
			  SKINNY_METADATA )
cache_generation	59159
uuid_tree_generation	59159
dev_item.uuid		b347d8aa-4563-45db-bcf1-c722e5b3abe8
dev_item.fsid		85069ce9-be06-4c92-b8c1-8a0f685e43c6 [match]
dev_item.type		0
dev_item.total_bytes	6001175126016
dev_item.bytes_used	4055589978112
dev_item.io_align	4096
dev_item.io_width	4096
dev_item.sector_size	4096
dev_item.devid		1
dev_item.dev_group	0
dev_item.seek_speed	0
dev_item.bandwidth	0
dev_item.generation	0

> There are scrub related fixes in linux git since 5.4.0 but I can't
> tell if any of them are related to this problem. My suggestion is
> scrub with kernel 5.7.rc4. If the problem happens, it's a current and
> unfixed bug. If it's been fixed, then it's a question of what commit
> fixed it and whether it's been backported, or can be, to 5.4 series.

OK, I installed 5.7rc4 and rebooted, then started a fresh scrub:

$ uname -a
Linux RainbowDash 5.7.0-050700rc4-generic #202005051752 SMP Tue May 5
21:56:32 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

After a couple of cancel/resumes here we are again:

$ sudo btrfs scrub status -dR /home
UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
scrub device /dev/sda (id 1) status
Scrub started:    Sat May  9 15:52:21 2020
Status:           running
Duration:         2:25:03
	data_extents_scrubbed: 22903068
	tree_extents_scrubbed: 194348
	data_bytes_scrubbed: 1418082578432
	tree_bytes_scrubbed: 3184197632
	read_errors: 0
	csum_errors: 0
	verify_errors: 0
	no_csum: 60164
	csum_discards: 0
	super_errors: 0
	malloc_errors: 0
	uncorrectable_errors: 0
	unverified_errors: 0
	corrected_errors: 0
	last_physical: 0
scrub device /dev/sdb (id 2) status
Scrub started:    Sat May  9 15:52:21 2020
Status:           running
Duration:         2:25:03
	data_extents_scrubbed: 22903070
	tree_extents_scrubbed: 194348
	data_bytes_scrubbed: 1418082648064
	tree_bytes_scrubbed: 3184197632
	read_errors: 0
	csum_errors: 0
	verify_errors: 0
	no_csum: 60164
	csum_discards: 0
	super_errors: 0
	malloc_errors: 0
	uncorrectable_errors: 0
	unverified_errors: 0
	corrected_errors: 0
	last_physical: 0

Duration, time left and bytes scrubbed are not advancing, and
last_physical is zero.

Thanks,
	Andrew

* Re: btrfs-progs reports nonsense scrub status
  2020-05-09  8:23                                             ` Andrew Pam
@ 2020-05-09 20:28                                               ` Chris Murphy
  2020-05-09 20:39                                                 ` Andrew Pam
  0 siblings, 1 reply; 48+ messages in thread
From: Chris Murphy @ 2020-05-09 20:28 UTC (permalink / raw)
  To: Andrew Pam; +Cc: Su Yue, Graham Cobb, Chris Murphy, Btrfs BTRFS

On Sat, May 9, 2020 at 2:23 AM Andrew Pam <andrew@sericyb.com.au> wrote:
>
> On 9/5/20 6:02 pm, Su Yue wrote:
> > Weird. Any operation was done since the last "interrupted" status?
>
> Just suspending and resuming the system.

OK so you haven't done a scrub from start to finish without a
cancel+suspend+resume?

I just want to be clear exactly when this bug is and isn't reproducible.


-- 
Chris Murphy

* Re: btrfs-progs reports nonsense scrub status
  2020-05-09 10:15                                             ` Andrew Pam
@ 2020-05-09 20:33                                               ` Chris Murphy
  2020-05-10  1:14                                                 ` Chris Murphy
                                                                   ` (3 more replies)
  0 siblings, 4 replies; 48+ messages in thread
From: Chris Murphy @ 2020-05-09 20:33 UTC (permalink / raw)
  To: Andrew Pam; +Cc: Chris Murphy, Graham Cobb, Btrfs BTRFS

On Sat, May 9, 2020 at 4:15 AM Andrew Pam <andrew@sericyb.com.au> wrote:
>
> $ uname -a
> Linux RainbowDash 5.7.0-050700rc4-generic #202005051752 SMP Tue May 5
> 21:56:32 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
>
> After a couple of cancel/resumes here we are again:


OK I didn't previously pick up on the fact that you're cancelling a
scrub. Suspending. Then resuming.

Things to verify:

1. That a scrub started and allowed to finish without any
interruption does finish. This is the only case I have experience
with, but I'm wondering whether you've tried it, just to eliminate the
possibility of some weird multiple-device scrub bug, or possibly
something unique about your file system's state.

2. That a scrub started, then cancelled, then resumed, also does
finish (or not).

3. That a scrub started, then cancelled, then suspend-to-RAM, then
resumed, also does finish (or not).

The distinction between (2) and (3) is system sleep: is this strictly
a scrub-resume bug, or is there something about suspend that causes
state to be lost?

And a low-priority (3b) variation might be suspend-to-disk, but that
is not very well supported anyway, particularly on UEFI Secure Boot
systems, where hibernation lockdown policy is common these days.
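
For repeatability, case (3) could be scripted roughly like this (a
sketch; rtcwake -m mem does a timed suspend-to-RAM, and whether that
works depends on the hardware and distro):

$ sudo btrfs scrub start /home
$ sleep 600                           # let it run for a while
$ sudo btrfs scrub cancel /home
$ sudo rtcwake -m mem -s 120          # suspend to RAM, auto-wake after 2 minutes
$ sudo btrfs scrub resume /home
$ sudo btrfs scrub status -dR /home   # last_physical should keep increasing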

-- 
Chris Murphy

* Re: btrfs-progs reports nonsense scrub status
  2020-05-09 20:28                                               ` Chris Murphy
@ 2020-05-09 20:39                                                 ` Andrew Pam
  0 siblings, 0 replies; 48+ messages in thread
From: Andrew Pam @ 2020-05-09 20:39 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Su Yue, Graham Cobb, Btrfs BTRFS

On 10/5/20 6:28 am, Chris Murphy wrote:
> OK so you haven't done a scrub from start to finish without a
> cancel+suspend+resume?

I have not, because my system is set to suspend after 30 minutes to save
power.  I will disable that and test a scrub with no suspend next.

Cheers,
	Andrew

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: btrfs-progs reports nonsense scrub status
  2020-05-09 20:33                                               ` Chris Murphy
@ 2020-05-10  1:14                                                 ` Chris Murphy
  2020-05-10  1:28                                                   ` Andrew Pam
                                                                     ` (2 more replies)
  2020-05-10  1:30                                                 ` Andrew Pam
                                                                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 48+ messages in thread
From: Chris Murphy @ 2020-05-10  1:14 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Andrew Pam, Graham Cobb, Btrfs BTRFS

On Sat, May 9, 2020 at 2:33 PM Chris Murphy <lists@colorremedies.com> wrote:
>
> On Sat, May 9, 2020 at 4:15 AM Andrew Pam <andrew@sericyb.com.au> wrote:
> >
> > $ uname -a
> > Linux RainbowDash 5.7.0-050700rc4-generic #202005051752 SMP Tue May 5
> > 21:56:32 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
> >
> > After a couple of cancel/resumes here we are again:
>
>
> OK I didn't previously pick up on the fact that you're cancelling a
> scrub. Suspending. Then resuming.
>
> Things to verify:
>
> 1. That a scrub started, and allowed to finish without any
> interruption, does finish. This is the only experience I have, but I'm
> wondering whether you've tried this just to eliminate the possibility
> of some weird multiple device scrub bug; or possibly something unique
> about your file system state.
>
> 2. That a scrub started, then cancelled, then resumed, also does
> finish (or not).
>
> 3. That a scrub started, then cancelled, then suspend-to-RAM, then
> resumed, also does finish (or not).


I just tried all three of these on my laptop, which has NVMe and
5.7.0-0.rc4.1.fc33.x86_64+debug. And they all complete, with no odd
reporting in user space for ETA, and progress seems normal.

I do see this in dmesg, which I think coincides with the cancel/abort:
[98580.937332] BTRFS info (device nvme0n1p7): scrub: not finished on
devid 1 with status: -125

I don't know how the kernel and user space communicate scrub progress.
I don't see anything in sysfs.

When you see this problem manifesting (no progress being made) the
/home file system is otherwise working normally? You can read and
write files OK?


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: btrfs-progs reports nonsense scrub status
  2020-05-10  1:14                                                 ` Chris Murphy
@ 2020-05-10  1:28                                                   ` Andrew Pam
  2020-05-10  1:32                                                   ` Andrew Pam
  2020-05-10 11:52                                                   ` Graham Cobb
  2 siblings, 0 replies; 48+ messages in thread
From: Andrew Pam @ 2020-05-10  1:28 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Graham Cobb, Btrfs BTRFS

On 10/5/20 11:14 am, Chris Murphy wrote:
> When you see this problem manifesting (no progress being made) the
> /home file system is otherwise working normally? You can read and
> write files OK?

Correct.  It works fine, although the "btrfs resume" process is
generating heavy read I/O.

Thanks,
	Andrew

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: btrfs-progs reports nonsense scrub status
  2020-05-09 20:33                                               ` Chris Murphy
  2020-05-10  1:14                                                 ` Chris Murphy
@ 2020-05-10  1:30                                                 ` Andrew Pam
  2020-05-10  6:17                                                 ` Andrew Pam
  2020-05-11  1:39                                                 ` Andrew Pam
  3 siblings, 0 replies; 48+ messages in thread
From: Andrew Pam @ 2020-05-10  1:30 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Graham Cobb, Btrfs BTRFS

On 10/5/20 6:33 am, Chris Murphy wrote:
> 1. That a scrub started, and allowed to finish without any
> interruption, does finish. This is the only experience I have, but I'm
> wondering whether you've tried this just to eliminate the possibility
> of some weird multiple device scrub bug; or possibly something unique
> about your file system state.

I am testing this now.  Completion estimated in around 3 hours (7 hours
total running time).

Thanks,
	Andrew

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: btrfs-progs reports nonsense scrub status
  2020-05-10  1:14                                                 ` Chris Murphy
  2020-05-10  1:28                                                   ` Andrew Pam
@ 2020-05-10  1:32                                                   ` Andrew Pam
  2020-05-10 11:52                                                   ` Graham Cobb
  2 siblings, 0 replies; 48+ messages in thread
From: Andrew Pam @ 2020-05-10  1:32 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Graham Cobb, Btrfs BTRFS

On 10/5/20 11:14 am, Chris Murphy wrote:
> I just tried all three of these on my laptop, which has NVMe and
> 5.7.0-0.rc4.1.fc33.x86_64+debug. And they all complete, with no odd
> reporting in user space for ETA, and progress seems normal.

Have you tried multiple cancel/suspend/resume cycles?  It doesn't always
go bad right away.

Thanks,
	Andrew

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: btrfs-progs reports nonsense scrub status
  2020-05-09 20:33                                               ` Chris Murphy
  2020-05-10  1:14                                                 ` Chris Murphy
  2020-05-10  1:30                                                 ` Andrew Pam
@ 2020-05-10  6:17                                                 ` Andrew Pam
  2020-05-11  1:39                                                 ` Andrew Pam
  3 siblings, 0 replies; 48+ messages in thread
From: Andrew Pam @ 2020-05-10  6:17 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Graham Cobb, Btrfs BTRFS

On 10/5/20 6:33 am, Chris Murphy wrote:
> Things to verify:
> 
> 1. That a scrub started, and allowed to finish without any
> interruption, does finish. This is the only experience I have, but I'm
> wondering whether you've tried this just to eliminate the possibility
> of some weird multiple device scrub bug; or possibly something unique
> about your file system state.

Completed successfully with no issues, and I was running Thunderbird and
Firefox plus playing "Chimera Squad" using Steam Proton while the scrub
was running to exercise the system.  :)

$ sudo btrfs scrub status -d /home
UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
scrub device /dev/sda (id 1) history
Scrub started:    Sun May 10 07:48:43 2020
Status:           finished
Duration:         7:35:14
Total to scrub:   3.69TiB
Rate:             140.47MiB/s
Error summary:    no errors found
scrub device /dev/sdb (id 2) history
Scrub started:    Sun May 10 07:48:43 2020
Status:           finished
Duration:         7:35:14
Total to scrub:   3.69TiB
Rate:             140.47MiB/s
Error summary:    no errors found

[17537.701685] BTRFS info (device sda): scrub: started on devid 2
[17537.701686] BTRFS info (device sda): scrub: started on devid 1
[44851.445348] BTRFS info (device sda): scrub: finished on devid 2 with
status: 0
[44851.476413] BTRFS info (device sda): scrub: finished on devid 1 with
status: 0

> 2. That a scrub started, then cancelled, then resumed, also does
> finish (or not).

I will test this next.

Thanks,
	Andrew

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: btrfs-progs reports nonsense scrub status
  2020-05-10  1:14                                                 ` Chris Murphy
  2020-05-10  1:28                                                   ` Andrew Pam
  2020-05-10  1:32                                                   ` Andrew Pam
@ 2020-05-10 11:52                                                   ` Graham Cobb
  2020-05-10 17:21                                                     ` Graham Cobb
  2 siblings, 1 reply; 48+ messages in thread
From: Graham Cobb @ 2020-05-10 11:52 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Andrew Pam, Btrfs BTRFS

On 10/05/2020 02:14, Chris Murphy wrote:
> I don't know how the kernel and user space communicate scrub progress.
> I don't see anything in sysfs.

I did some work on btrfs-scrub a while ago to fix a bug in the way
last_physical was being saved between cancel and restore.

If I remember correctly, the way this works is that the scrub process
receives stats about the scrub progress directly from the kernel on a
socket it creates in /var/lib/btrfs, then writes updated stats into a
file in /var/lib/btrfs/scrub.status.<UUID> to share with scrub status
and keep a record. At the end of the scrub, the final stats are provided
by the kernel into a buffer provided in the ioctl which are also then
used to update the scrub.status file.

I don't remember how often the stats are received from the kernel, but
it is frequent (not just at the end).

Unless I am confused, if the bytes scrubbed numbers are going up, then
the progress updates are being received from the kernel. The update
includes last_physical, so there should be no way the scrub process
can receive updated bytes scrubbed numbers without an updated
last_physical from the kernel.

HOWEVER, I think (I may be wrong) btrfs scrub status normally reads the
stats from the file, not directly from the kernel. So it is actually
reporting the data in the file written by the scrub process. That does
some processing on the numbers (that was where the bug I found was). It
is possible there is still a bug in that processing -- if so, it is
possible that last_physical in the file could be zero even though it
was received as non-zero from the kernel. So, we can't be sure, without
adding logging to the scrub command, whether the kernel is really
reporting zero.

This is from memory. Sorry, I don't have time, at the moment, to go back
to the scrub usermode code to check.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: btrfs-progs reports nonsense scrub status
  2020-05-10 11:52                                                   ` Graham Cobb
@ 2020-05-10 17:21                                                     ` Graham Cobb
  2020-05-10 17:27                                                       ` Andrew Pam
  0 siblings, 1 reply; 48+ messages in thread
From: Graham Cobb @ 2020-05-10 17:21 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Andrew Pam, Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 1486 bytes --]

On 10/05/2020 12:52, Graham Cobb wrote:
> On 10/05/2020 02:14, Chris Murphy wrote:
>> I don't know how the kernel and user space communicate scrub progress.
>> I don't see anything in sysfs.
> 
> I did some work on btrfs-scrub a while ago to fix a bug in the way
> last_physical was being saved between cancel and restore.
> 
> If I remember correctly, the way this works is that the scrub process
> receives stats about the scrub progress directly from the kernel on a
> socket it creates in /var/lib/btrfs, then writes updated stats into a
> file in /var/lib/btrfs/scrub.status.<UUID> to share with scrub status
> and keep a record. At the end of the scrub, the final stats are provided
> by the kernel into a buffer provided in the ioctl which are also then
> used to update the scrub.status file.

Unfortunately my memory is rubbish! I have just looked at the code to
remind myself.

The scrub process gets stats from the kernel using a
BTRFS_IOC_SCRUB_PROGRESS ioctl (one for each device).

It uses this to update the status file and also sends the stats to any
listeners on the socket.

btrfs scrub status reads the data from the socket if it exists (i.e. the
scrub process is still running), otherwise it reads it from the data
file (for a scrub which has finished or been cancelled).

It might be worth you making a small logging change to progress_one_dev
to watch the value of last_physical exactly as received from the kernel
during the scrub. I have attached a small patch.
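
As a rough way to watch which of those two paths goes stale, here is an
untested sketch (the mount point and UUID are the ones from this
thread; run it as root and adjust to taste):

#!/bin/sh
# Poll both reporting paths every 10 seconds while a scrub runs, so it
# is visible whether the live output or the on-disk record stops
# advancing.
UUID=85069ce9-be06-4c92-b8c1-8a0f685e43c6
while sleep 10; do
  date
  # Live view: uses the socket while the scrub process is running,
  # otherwise falls back to the file (with the attached logging patch
  # applied, a NOTE line says which).
  btrfs scrub status -dR /home | grep -E 'Status|last_physical'
  # Raw on-disk record written by the scrub process.
  cat "/var/lib/btrfs/scrub.status.$UUID"
  echo
done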

[-- Attachment #2: scrub.c.patch --]
[-- Type: text/x-patch, Size: 785 bytes --]

diff --git a/cmds/scrub.c b/cmds/scrub.c
index 9fe59822..48d76852 100644
--- a/cmds/scrub.c
+++ b/cmds/scrub.c
@@ -902,6 +902,8 @@ static void *progress_one_dev(void *ctx)
 	struct scrub_progress *sp = ctx;
 
 	sp->ret = ioctl(sp->fd, BTRFS_IOC_SCRUB_PROGRESS, &sp->scrub_args);
+	fprintf(stderr, "NOTE: device %lld last_physical=%lld\n", sp->scrub_args.devid, sp->scrub_args.progress.last_physical);
+
 	sp->ioctl_errno = errno;
 
 	return NULL;
@@ -1784,7 +1786,8 @@ static int cmd_scrub_status(const struct cmd_struct *cmd, int argc, char **argv)
 			err = 1;
 			goto out;
 		}
-	}
+		fputs("NOTE: Reading progress from status file \n", stderr);
+	} else { fputs("NOTE: Reading progress from socket \n", stderr); }
 
 	if (fdres >= 0) {
 		past_scrubs = scrub_read_file(fdres, 1);

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: btrfs-progs reports nonsense scrub status
  2020-05-10 17:21                                                     ` Graham Cobb
@ 2020-05-10 17:27                                                       ` Andrew Pam
  0 siblings, 0 replies; 48+ messages in thread
From: Andrew Pam @ 2020-05-10 17:27 UTC (permalink / raw)
  To: Graham Cobb, Chris Murphy; +Cc: Btrfs BTRFS

On 11/5/20 3:21 am, Graham Cobb wrote:
> It might be worth you making a small logging change to progress_one_dev
> to watch the value of last_physical exactly as received from the kernel
> during the scrub. I have attached a small patch.

Thanks, I'll try that!  Patched.

Cheers,
	Andrew

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: btrfs-progs reports nonsense scrub status
  2020-05-09 20:33                                               ` Chris Murphy
                                                                   ` (2 preceding siblings ...)
  2020-05-10  6:17                                                 ` Andrew Pam
@ 2020-05-11  1:39                                                 ` Andrew Pam
  2020-05-11  4:46                                                   ` Chris Murphy
  3 siblings, 1 reply; 48+ messages in thread
From: Andrew Pam @ 2020-05-11  1:39 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Graham Cobb, Btrfs BTRFS

On 10/5/20 6:33 am, Chris Murphy wrote:
> 2. That a scrub started, then cancelled, then resumed, also does
> finish (or not).

OK, I have now run a scrub with multiple cancel and resumes and that
also proceeded and finished normally as expected:

$ sudo ./btrfs scrub status -d /home
NOTE: Reading progress from status file
UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
scrub device /dev/sda (id 1) history
Scrub resumed:    Mon May 11 06:06:37 2020
Status:           finished
Duration:         7:27:31
Total to scrub:   3.67TiB
Rate:             142.96MiB/s
Error summary:    no errors found
scrub device /dev/sdb (id 2) history
Scrub resumed:    Mon May 11 06:06:37 2020
Status:           finished
Duration:         7:27:15
Total to scrub:   3.67TiB
Rate:             143.04MiB/s
Error summary:    no errors found

[54472.936094] BTRFS info (device sda): scrub: started on devid 2
[54472.936095] BTRFS info (device sda): scrub: started on devid 1
[55224.956293] BTRFS info (device sda): scrub: not finished on devid 1
with status: -125
[55226.356563] BTRFS info (device sda): scrub: not finished on devid 2
with status: -125
[58775.602370] BTRFS info (device sda): scrub: started on devid 1
[58775.602372] BTRFS info (device sda): scrub: started on devid 2
[72393.296199] BTRFS info (device sda): scrub: not finished on devid 1
with status: -125
[72393.296215] BTRFS info (device sda): scrub: not finished on devid 2
with status: -125
[77731.999603] BTRFS info (device sda): scrub: started on devid 1
[77731.999604] BTRFS info (device sda): scrub: started on devid 2
[87727.510382] BTRFS info (device sda): scrub: not finished on devid 1
with status: -125
[87727.582401] BTRFS info (device sda): scrub: not finished on devid 2
with status: -125
[89358.196384] BTRFS info (device sda): scrub: started on devid 1
[89358.196386] BTRFS info (device sda): scrub: started on devid 2
[89830.639654] BTRFS info (device sda): scrub: not finished on devid 2
with status: -125
[89830.856232] BTRFS info (device sda): scrub: not finished on devid 1
with status: -125
[94486.300097] BTRFS info (device sda): scrub: started on devid 2
[94486.300098] BTRFS info (device sda): scrub: started on devid 1
[96223.185459] BTRFS info (device sda): scrub: not finished on devid 1
with status: -125
[96223.227246] BTRFS info (device sda): scrub: not finished on devid 2
with status: -125
[97810.489388] BTRFS info (device sda): scrub: started on devid 1
[97810.540625] BTRFS info (device sda): scrub: started on devid 2
[98068.987932] BTRFS info (device sda): scrub: finished on devid 2 with
status: 0
[98085.771626] BTRFS info (device sda): scrub: finished on devid 1 with
status: 0

So by elimination it's starting to look like suspend-to-RAM might be
part of the problem.  That's what I'll test next.

Cheers,
	Andrew

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: btrfs-progs reports nonsense scrub status
  2020-05-11  1:39                                                 ` Andrew Pam
@ 2020-05-11  4:46                                                   ` Chris Murphy
  2020-05-13  6:47                                                     ` Andrew Pam
  0 siblings, 1 reply; 48+ messages in thread
From: Chris Murphy @ 2020-05-11  4:46 UTC (permalink / raw)
  To: Andrew Pam; +Cc: Chris Murphy, Graham Cobb, Btrfs BTRFS

On Sun, May 10, 2020 at 7:39 PM Andrew Pam <andrew@sericyb.com.au> wrote:
>
> On 10/5/20 6:33 am, Chris Murphy wrote:
> > 2. That a scrub started, then cancelled, then resumed, also does
> > finish (or not).
>
> OK, I have now run a scrub with multiple cancel and resumes and that
> also proceeded and finished normally as expected:
>
> $ sudo ./btrfs scrub status -d /home
> NOTE: Reading progress from status file
> UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
> scrub device /dev/sda (id 1) history
> Scrub resumed:    Mon May 11 06:06:37 2020
> Status:           finished
> Duration:         7:27:31
> Total to scrub:   3.67TiB
> Rate:             142.96MiB/s
> Error summary:    no errors found
> scrub device /dev/sdb (id 2) history
> Scrub resumed:    Mon May 11 06:06:37 2020
> Status:           finished
> Duration:         7:27:15
> Total to scrub:   3.67TiB
> Rate:             143.04MiB/s
> Error summary:    no errors found
>
> [54472.936094] BTRFS info (device sda): scrub: started on devid 2
> [54472.936095] BTRFS info (device sda): scrub: started on devid 1
> [55224.956293] BTRFS info (device sda): scrub: not finished on devid 1
> with status: -125
> [55226.356563] BTRFS info (device sda): scrub: not finished on devid 2
> with status: -125
> [58775.602370] BTRFS info (device sda): scrub: started on devid 1
> [58775.602372] BTRFS info (device sda): scrub: started on devid 2
> [72393.296199] BTRFS info (device sda): scrub: not finished on devid 1
> with status: -125
> [72393.296215] BTRFS info (device sda): scrub: not finished on devid 2
> with status: -125
> [77731.999603] BTRFS info (device sda): scrub: started on devid 1
> [77731.999604] BTRFS info (device sda): scrub: started on devid 2
> [87727.510382] BTRFS info (device sda): scrub: not finished on devid 1
> with status: -125
> [87727.582401] BTRFS info (device sda): scrub: not finished on devid 2
> with status: -125
> [89358.196384] BTRFS info (device sda): scrub: started on devid 1
> [89358.196386] BTRFS info (device sda): scrub: started on devid 2
> [89830.639654] BTRFS info (device sda): scrub: not finished on devid 2
> with status: -125
> [89830.856232] BTRFS info (device sda): scrub: not finished on devid 1
> with status: -125
> [94486.300097] BTRFS info (device sda): scrub: started on devid 2
> [94486.300098] BTRFS info (device sda): scrub: started on devid 1
> [96223.185459] BTRFS info (device sda): scrub: not finished on devid 1
> with status: -125
> [96223.227246] BTRFS info (device sda): scrub: not finished on devid 2
> with status: -125
> [97810.489388] BTRFS info (device sda): scrub: started on devid 1
> [97810.540625] BTRFS info (device sda): scrub: started on devid 2
> [98068.987932] BTRFS info (device sda): scrub: finished on devid 2 with
> status: 0
> [98085.771626] BTRFS info (device sda): scrub: finished on devid 1 with
> status: 0
>
> So by elimination it's starting to look like suspend-to-RAM might be
> part of the problem.  That's what I'll test next.
>

Power management is difficult. (I'm actually working on a git bisect
right now, older laptop won't wake from suspend, 5.7 regression.)

Whether all the devices wake up correctly isn't always easy to get an
answer to. They might all have power, but did they really come up in
the correct state? The thing is, you're reporting that iotop
independently shows a transfer rate consistent with getting data off
the drives.

I also wonder whether the socket that Graham mentions, could get in
some kind of stuck or confused state due to sleep/wake cycle? My case,
NVMe, is maybe not the best example because that's just PCIe. In your
case it's real drives, so it's SCSI, block, and maybe libata and other
things.



--
Chris Murphy

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: btrfs-progs reports nonsense scrub status
  2020-05-11  4:46                                                   ` Chris Murphy
@ 2020-05-13  6:47                                                     ` Andrew Pam
  2020-05-14  0:13                                                       ` Chris Murphy
  0 siblings, 1 reply; 48+ messages in thread
From: Andrew Pam @ 2020-05-13  6:47 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Graham Cobb, Btrfs BTRFS

On 11/5/20 2:46 pm, Chris Murphy wrote:
> I also wonder whether the socket that Graham mentions, could get in
> some kind of stuck or confused state due to sleep/wake cycle? My case,
> NVMe, is maybe not the best example because that's just PCIe. In your
> case it's real drives, so it's SCSI, block, and maybe libata and other
> things.

Could be.  When I start a new scrub and suspend the system, after a
resume further attempts to run "btrfs scrub status -dR /home" result in
the following:

NOTE: Reading progress from status file
UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
scrub device /dev/sda (id 1) status
Scrub started:    Wed May 13 16:10:12 2020
Status:           running
Duration:         0:00:22
	data_extents_scrubbed: 0
	tree_extents_scrubbed: 29238
	data_bytes_scrubbed: 0
	tree_bytes_scrubbed: 479035392
	read_errors: 0
	csum_errors: 0
	verify_errors: 0
	no_csum: 0
	csum_discards: 0
	super_errors: 0
	malloc_errors: 0
	uncorrectable_errors: 0
	unverified_errors: 0
	corrected_errors: 0
	last_physical: 0
scrub device /dev/sdb (id 2) status
Scrub started:    Wed May 13 16:10:12 2020
Status:           running
Duration:         0:00:23
	data_extents_scrubbed: 0
	tree_extents_scrubbed: 27936
	data_bytes_scrubbed: 0
	tree_bytes_scrubbed: 457703424
	read_errors: 0
	csum_errors: 0
	verify_errors: 0
	no_csum: 0
	csum_discards: 0
	super_errors: 0
	malloc_errors: 0
	uncorrectable_errors: 0
	unverified_errors: 0
	corrected_errors: 0
	last_physical: 0

So it appears that the socket connection to the kernel is not able to be
reestablished after the resume from suspend-to-RAM.  Interestingly, if I
then manually run "btrfs scrub cancel /home" and "btrfs scrub resume -c3
/home" then the status reports start working again.  It fails only when
"btrfs scrub resume -c3 /home" is run from the script
"/usr/lib/systemd/system-sleep/btrfs-scrub" as follows:

#!/bin/sh

case $1/$2 in
  pre/*)
    btrfs scrub cancel /home
    ;;
  post/*)
    sleep 1
    btrfs scrub resume -c3 /home
    ;;
esac

Without the sleep, it does not resume the scrub.  A longer sleep (5
seconds) does not resolve the issue with the status reports.

Maybe this is some kind of systemd problem...  :(

Thanks,
	Andrew

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: btrfs-progs reports nonsense scrub status
  2020-05-13  6:47                                                     ` Andrew Pam
@ 2020-05-14  0:13                                                       ` Chris Murphy
  2020-05-14 16:18                                                         ` Graham Cobb
  0 siblings, 1 reply; 48+ messages in thread
From: Chris Murphy @ 2020-05-14  0:13 UTC (permalink / raw)
  To: Andrew Pam; +Cc: Chris Murphy, Graham Cobb, Btrfs BTRFS

On Wed, May 13, 2020 at 12:49 AM Andrew Pam <andrew@sericyb.com.au> wrote:
>
> On 11/5/20 2:46 pm, Chris Murphy wrote:
> > I also wonder whether the socket that Graham mentions, could get in
> > some kind of stuck or confused state due to sleep/wake cycle? My case,
> > NVMe, is maybe not the best example because that's just PCIe. In your
> > case it's real drives, so it's SCSI, block, and maybe libata and other
> > things.
>
> Could be.  When I start a new scrub and suspend the system, after a
> resume further attempts to run "btrfs scrub status -dR /home" result in
> the following:
>
> NOTE: Reading progress from status file
> UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
> scrub device /dev/sda (id 1) status
> Scrub started:    Wed May 13 16:10:12 2020
> Status:           running
> Duration:         0:00:22
>         data_extents_scrubbed: 0
>         tree_extents_scrubbed: 29238
>         data_bytes_scrubbed: 0
>         tree_bytes_scrubbed: 479035392
>         read_errors: 0
>         csum_errors: 0
>         verify_errors: 0
>         no_csum: 0
>         csum_discards: 0
>         super_errors: 0
>         malloc_errors: 0
>         uncorrectable_errors: 0
>         unverified_errors: 0
>         corrected_errors: 0
>         last_physical: 0
> scrub device /dev/sdb (id 2) status
> Scrub started:    Wed May 13 16:10:12 2020
> Status:           running
> Duration:         0:00:23
>         data_extents_scrubbed: 0
>         tree_extents_scrubbed: 27936
>         data_bytes_scrubbed: 0
>         tree_bytes_scrubbed: 457703424
>         read_errors: 0
>         csum_errors: 0
>         verify_errors: 0
>         no_csum: 0
>         csum_discards: 0
>         super_errors: 0
>         malloc_errors: 0
>         uncorrectable_errors: 0
>         unverified_errors: 0
>         corrected_errors: 0
>         last_physical: 0
>
> So it appears that the socket connection to the kernel is not able to be
> reestablished after the resume from suspend-to-RAM.  Interestingly, if I
> then manually run "btrfs scrub cancel /home" and "btrfs scrub resume -c3
> /home" then the status reports start working again.  It fails only when
> "btrfs scrub resume -c3 /home" is run from the script
> "/usr/lib/systemd/system-sleep/btrfs-scrub" as follows:
>
> #!/bin/sh
>
> case $1/$2 in
>   pre/*)
>     btrfs scrub cancel /home
>     ;;
>   post/*)
>     sleep 1
>     btrfs scrub resume -c3 /home
>     ;;
> esac
>
> Without the sleep, it does not resume the scrub.  A longer sleep (5
> seconds) does not resolve the issue with the status reports.
>
> Maybe this is some kind of systemd problem...  :(

Oof. So possibly two bugs.

Well, you could post an inquiry about it to systemd-devel@. And then
maybe try to reproduce with something that has systemd-245 to see if
it makes a difference. I ran into something that looked like races in
244 with a rust generator I was testing, that didn't happen in 245.
*shrug*

But I'm not sure testing wise how to isolate the systemd from the
socket questions.


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: btrfs-progs reports nonsense scrub status
  2020-05-14  0:13                                                       ` Chris Murphy
@ 2020-05-14 16:18                                                         ` Graham Cobb
  2020-05-15  2:37                                                           ` Andrew Pam
  0 siblings, 1 reply; 48+ messages in thread
From: Graham Cobb @ 2020-05-14 16:18 UTC (permalink / raw)
  To: Chris Murphy, Andrew Pam; +Cc: Btrfs BTRFS

On 14/05/2020 01:13, Chris Murphy wrote:
> On Wed, May 13, 2020 at 12:49 AM Andrew Pam <andrew@sericyb.com.au> wrote:
>>
>> On 11/5/20 2:46 pm, Chris Murphy wrote:
>>> I also wonder whether the socket that Graham mentions, could get in
>>> some kind of stuck or confused state due to sleep/wake cycle? 

Clarification: I originally said that the socket was for communication
between the scrub process and the kernel, but that was wrong.

I tried to correct that in a later message but, in any case, the socket
is between the scrub usermode process and the "status" command.

I suppose it is possible that suspend/resume causes some problem with
sockets or with the thread which processes the socket but I would be
surprised. In any case, that should only cause strangeness with "scrub
status".

>>> My case,
>>> NVMe, is maybe not the best example because that's just PCIe. In your
>>> case it's real drives, so it's SCSI, block, and maybe libata and other
>>> things.
>>
>> Could be.  When I start a new scrub and suspend the system, after a
>> resume further attempts to run "btrfs scrub status -dR /home" result in
>> the following:
>>
>> NOTE: Reading progress from status file

This means that scrub status could not connect to the socket and has
fallen back to reading the file (which may be out of date if the scrub
is actually still running). Either there is some strangeness about the
socket, or the scrub usermode process itself is not running, or the
thread which handles the socket is stuck (it normally sleeps in a
"poll" waiting for connections to the socket).
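
A few quick checks for those three cases, as a rough sketch (run as
root; I am assuming the socket sits next to the scrub.status.<UUID>
file under /var/lib/btrfs, so treat the exact name with caution):

# Is the userspace scrub process still running at all?
pgrep -a btrfs

# Is there a socket left next to the status file?
ls -l /var/lib/btrfs/

# Is anything btrfs-related still listening on a unix socket?
ss -xlp | grep btrfs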

>> UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
>> scrub device /dev/sda (id 1) status
>> Scrub started:    Wed May 13 16:10:12 2020
>> Status:           running
>> Duration:         0:00:22
>>         data_extents_scrubbed: 0
>>         tree_extents_scrubbed: 29238
>>         data_bytes_scrubbed: 0
>>         tree_bytes_scrubbed: 479035392
>>         read_errors: 0
>>         csum_errors: 0
>>         verify_errors: 0
>>         no_csum: 0
>>         csum_discards: 0
>>         super_errors: 0
>>         malloc_errors: 0
>>         uncorrectable_errors: 0
>>         unverified_errors: 0
>>         corrected_errors: 0
>>         last_physical: 0
>> scrub device /dev/sdb (id 2) status
>> Scrub started:    Wed May 13 16:10:12 2020
>> Status:           running
>> Duration:         0:00:23
>>         data_extents_scrubbed: 0
>>         tree_extents_scrubbed: 27936
>>         data_bytes_scrubbed: 0
>>         tree_bytes_scrubbed: 457703424
>>         read_errors: 0
>>         csum_errors: 0
>>         verify_errors: 0
>>         no_csum: 0
>>         csum_discards: 0
>>         super_errors: 0
>>         malloc_errors: 0
>>         uncorrectable_errors: 0
>>         unverified_errors: 0
>>         corrected_errors: 0
>>         last_physical: 0
>>
>> So it appears that the socket connection to the kernel is not able to be
>> reestablished after the resume from suspend-to-RAM.  Interestingly, if I
>> then manually run "btrfs scrub cancel /home" and "btrfs scrub resume -c3
>> /home" then the status reports start working again.  It fails only when
>> "btrfs scrub resume -c3 /home" is run from the script
>> "/usr/lib/systemd/system-sleep/btrfs-scrub" as follows:
>>
>> #!/bin/sh
>>
>> case $1/$2 in
>>   pre/*)
>>     btrfs scrub cancel /home
>>     ;;
>>   post/*)
>>     sleep 1
>>     btrfs scrub resume -c3 /home
>>     ;;
>> esac

Try scrub resume restarts the scrub and backgrounds it. Maybe there is
something strange about background processes created in the restore
script. What happens if you specify "-B" on the resume?

Another potential problem is that the "scrub cancel" cancels the scrub
but the scrub process doesn't immediately exit (it waits to receive the
"cancelled" status and get the final stats, then it writes them out and
does some tidying up). So maybe there is a race: maybe the scrub is
still cancelling (either in the kernel, or the scrub usermode process
itself is still thinking about exiting) when the suspend happens. On
resume, maybe the cancel then completes -- that might even be why the
resume doesn't work without the delay.

It is even possible that the resume could succeed in resuming but use
bad last_physical data if the cancel process is still writing it out. I
don't suppose anyone has really tested races between cancel and resume.

Try adding a sleep after the cancel (or logic to see if the scrub
process has actually exited).
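
Something like this is what I have in mind for the hook -- an untested
sketch, using the same mount point as your script; the 30 second cap,
the pgrep match and the log file path are just guesses:

#!/bin/sh
# Variant of /usr/lib/systemd/system-sleep/btrfs-scrub: after the
# cancel, wait (up to 30s) for the userspace scrub process to finish
# writing out its final stats before the suspend goes ahead.

case $1/$2 in
  pre/*)
    btrfs scrub cancel /home
    i=0
    # Loose check: waits for any process named btrfs to exit, which
    # should be good enough if only the scrub is expected to be running.
    while pgrep -x btrfs >/dev/null && [ "$i" -lt 30 ]; do
      sleep 1
      i=$((i + 1))
    done
    ;;
  post/*)
    sleep 1
    # Keep stderr so any logging from the patched progress_one_dev
    # ends up somewhere inspectable.
    btrfs scrub resume -c3 /home >>/var/log/btrfs-scrub-resume.log 2>&1
    ;;
esac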

>>
>> Without the sleep, it does not resume the scrub.  A longer sleep (5
>> seconds) does not resolve the issue with the status reports.
>>
>> Maybe this is some kind of systemd problem...  :(
> 
> Oof. So possibly two bugs.
> 
> Well, you could post an inquiry about it to systemd-devel@. And then
> maybe try to reproduce with something that has systemd-245 to see if
> it makes a difference. I ran into something that looked like races in
> 244 with a rust generator I was testing, that didn't happen in 245.
> *shrug*
> 
> But I'm not sure testing wise how to isolate the systemd from the
> socket questions.
> 
> 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: btrfs-progs reports nonsense scrub status
  2020-05-14 16:18                                                         ` Graham Cobb
@ 2020-05-15  2:37                                                           ` Andrew Pam
  2020-05-15 11:27                                                             ` Graham Cobb
  2020-07-04 14:51                                                             ` Andrew Pam
  0 siblings, 2 replies; 48+ messages in thread
From: Andrew Pam @ 2020-05-15  2:37 UTC (permalink / raw)
  To: Graham Cobb, Chris Murphy; +Cc: Btrfs BTRFS

On 15/5/20 2:18 am, Graham Cobb wrote:
> Try scrub resume restarts the scrub and backgrounds it. Maybe there is
> something strange about background processes created in the restore
> script. What happens if you specify "-B" on the resume?

Then the resume does not occur.  After the resume from suspend-to-RAM,
no "btrfs" process is running and the status still shows as "aborted".
This suggests that the "btrfs resume -B" fails when run from within the
systemd post-resume script.

> Try adding a sleep after the cancel (or logic to see if the scrub
> process has actually exited).

I tried adding both "killall -s0 -w btrfs" and "sleep 5" - even with
these changes, the "btrfs resume" (without -B option) once again shows
"last_physical: 0"

Thanks,
	Andrew

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: btrfs-progs reports nonsense scrub status
  2020-05-15  2:37                                                           ` Andrew Pam
@ 2020-05-15 11:27                                                             ` Graham Cobb
  2020-07-04 14:51                                                             ` Andrew Pam
  1 sibling, 0 replies; 48+ messages in thread
From: Graham Cobb @ 2020-05-15 11:27 UTC (permalink / raw)
  To: Andrew Pam, Chris Murphy; +Cc: Btrfs BTRFS

On 15/05/2020 03:37, Andrew Pam wrote:
> On 15/5/20 2:18 am, Graham Cobb wrote:
>> Try scrub resume restarts the scrub and backgrounds it. Maybe there is
>> something strange about background processes created in the restore
>> script. What happens if you specify "-B" on the resume?
> 
> Then the resume does not occur.  After the resume from suspend-to-RAM,
> no "btrfs" process is running and the status still shows as "aborted".
> This suggests that the "btrfs resume -B" fails when run from within the
> systemd post-resume script.

Maybe something special about the post-resume script. Have you tried not
resuming the scrub in the systemd script but doing it by hand after
everything has settled down?

> 
>> Try adding a sleep after the cancel (or logic to see if the scrub
>> process has actually exited).
> 
> I tried adding both "killall -s0 -w btrfs" and "sleep 5" - even with
> these changes, the "btrfs resume" (without -B option) once again shows
> "last_physical: 0"

Oh well, worth a try.

Did you try running the resume using my logging patch? Was the 0 for
last_physical definitely being returned from the kernel in the ioctl
response? It may mean running it manually (or redirecting stderr to a file).

Graham

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: btrfs-progs reports nonsense scrub status
  2020-05-15  2:37                                                           ` Andrew Pam
  2020-05-15 11:27                                                             ` Graham Cobb
@ 2020-07-04 14:51                                                             ` Andrew Pam
  1 sibling, 0 replies; 48+ messages in thread
From: Andrew Pam @ 2020-07-04 14:51 UTC (permalink / raw)
  To: Graham Cobb, Chris Murphy; +Cc: Btrfs BTRFS

On 15/5/20 12:37 pm, Andrew Pam wrote:
> After the resume from suspend-to-RAM, no "btrfs" process is running
> and the status still shows as "aborted". This suggests that the
> "btrfs resume -B" fails when run from within the systemd post-resume
> script.

Here's what I ultimately ended up with:

http://www.sericyb.com.au/HOWTO-btrfs-scrub-suspend.html

Hope that helps,
		Andrew

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2020-07-04 14:56 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-05  5:46 btrfs-progs reports nonsense scrub status Andrew Pam
2020-05-05  9:51 ` Graham Cobb
2020-05-05 10:10   ` Andrew Pam
2020-05-05 20:39   ` Andrew Pam
2020-05-06 23:42     ` Chris Murphy
2020-05-07  1:09       ` Andrew Pam
2020-05-07  5:10         ` Chris Murphy
2020-05-07  5:36           ` Chris Murphy
2020-05-07  5:56             ` Andrew Pam
2020-05-08  2:26               ` Chris Murphy
2020-05-08  2:31                 ` Andrew Pam
2020-05-08  2:48                   ` Chris Murphy
2020-05-08  2:54                     ` Andrew Pam
2020-05-08  3:55                       ` Chris Murphy
2020-05-08  4:04                         ` Andrew Pam
2020-05-08  4:21                           ` Chris Murphy
2020-05-08  4:42                             ` Andrew Pam
2020-05-08  4:45                               ` Chris Murphy
2020-05-08  4:52                                 ` Andrew Pam
2020-05-08  7:37                                   ` Chris Murphy
2020-05-08  8:19                                     ` Andrew Pam
2020-05-08 11:21                                       ` Graham Cobb
2020-05-08 12:54                                         ` Andrew Pam
2020-05-08 17:06                                         ` Andrew Pam
2020-05-08 21:20                                           ` Chris Murphy
2020-05-08 21:31                                           ` Chris Murphy
2020-05-09 10:15                                             ` Andrew Pam
2020-05-09 20:33                                               ` Chris Murphy
2020-05-10  1:14                                                 ` Chris Murphy
2020-05-10  1:28                                                   ` Andrew Pam
2020-05-10  1:32                                                   ` Andrew Pam
2020-05-10 11:52                                                   ` Graham Cobb
2020-05-10 17:21                                                     ` Graham Cobb
2020-05-10 17:27                                                       ` Andrew Pam
2020-05-10  1:30                                                 ` Andrew Pam
2020-05-10  6:17                                                 ` Andrew Pam
2020-05-11  1:39                                                 ` Andrew Pam
2020-05-11  4:46                                                   ` Chris Murphy
2020-05-13  6:47                                                     ` Andrew Pam
2020-05-14  0:13                                                       ` Chris Murphy
2020-05-14 16:18                                                         ` Graham Cobb
2020-05-15  2:37                                                           ` Andrew Pam
2020-05-15 11:27                                                             ` Graham Cobb
2020-07-04 14:51                                                             ` Andrew Pam
2020-05-09  8:02                                           ` Su Yue
2020-05-09  8:23                                             ` Andrew Pam
2020-05-09 20:28                                               ` Chris Murphy
2020-05-09 20:39                                                 ` Andrew Pam
