* btrfs scrub: cancel + resume not resuming?
@ 2020-01-09 10:03 Sebastian Döring
  2020-01-09 10:19 ` Graham Cobb
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Sebastian Döring @ 2020-01-09 10:03 UTC
  To: linux-btrfs

Maybe I'm doing it entirely wrong, but I can't seem to get 'btrfs
scrub resume' to work properly. During a running scrub the resume
information (like data_bytes_scrubbed:1081454592) gets written to a
file in /var/lib/btrfs, but as soon as the scrub is cancelled all
relevant fields are zeroed. 'btrfs scrub resume' then seems to
re-start from the very beginning.

This is on linux-5.5-rc5 and btrfs-progs 5.4, but I've been seeing
this for a while now.

Is this intended/expected behavior? Am I using the btrfs-progs wrong?
How can I interrupt and resume a scrub?


* Re: btrfs scrub: cancel + resume not resuming?
  2020-01-09 10:03 btrfs scrub: cancel + resume not resuming? Sebastian Döring
@ 2020-01-09 10:19 ` Graham Cobb
  2020-01-09 17:06   ` Graham Cobb
  2020-01-09 10:34 ` btrfs scrub: cancel + resume not resuming? Holger Hoffstätte
  2020-01-22 15:52 ` David Sterba
  2 siblings, 1 reply; 11+ messages in thread
From: Graham Cobb @ 2020-01-09 10:19 UTC
  To: Sebastian Döring, linux-btrfs

On 09/01/2020 10:03, Sebastian Döring wrote:
> Maybe I'm doing it entirely wrong, but I can't seem to get 'btrfs
> scrub resume' to work properly. During a running scrub the resume
> information (like data_bytes_scrubbed:1081454592) gets written to a
> file in /var/lib/btrfs, but as soon as the scrub is cancelled all
> relevant fields are zeroed. 'btrfs scrub resume' then seems to
> re-start from the very beginning.
> 
> This is on linux-5.5-rc5 and btrfs-progs 5.4, but I've been seeing
> this for a while now.
> 
> Is this intended/expected behavior? Am I using the btrfs-progs wrong?
> How can I interrupt and resume a scrub?

Coincidentally, I noticed exactly the same thing yesterday!

I have just run a quick test. It works with kernel 4.19 but doesn't with
kernel 5.3. This is using exactly the same version of btrfs-progs:
v5.3.1 (I just rebooted the same system with an old kernel to check).

As Sebastian says, the symptom is that the file in /var/lib/btrfs shows
all fields as zero after the cancel (although "cancelled" and "finished"
are both 1). In particular, last_physical is zero so the scrub always
resumes from the beginning.

With the old kernel, the file in /var/lib/btrfs correctly has all the
values filled in after the cancel so the scrub can be resumed.

Graham
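
A minimal sketch for inspecting the progress file described above. It assumes the record fields are "key:value" pairs separated by '|' (the thread only confirms the "key:value" form), and the function name scrub_field is hypothetical.

```shell
#!/bin/sh
# Print the value of one "key:value" field from a scrub status record on stdin.
# Assumption: fields on a record line are separated by '|'.
scrub_field() {
    # $1 = field name, for example last_physical
    tr '|' '\n' | sed -n "s/^$1://p"
}

# Hypothetical usage (the exact file name under /var/lib/btrfs is an assumption):
#   scrub_field last_physical < /var/lib/btrfs/scrub.status.FSID
```

If the value printed for last_physical is 0 after a cancel, a later resume has no saved position to continue from, which matches the symptom reported here.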


* Re: btrfs scrub: cancel + resume not resuming?
  2020-01-09 10:03 btrfs scrub: cancel + resume not resuming? Sebastian Döring
  2020-01-09 10:19 ` Graham Cobb
@ 2020-01-09 10:34 ` Holger Hoffstätte
  2020-01-09 10:52   ` Graham Cobb
  2020-01-22 15:52 ` David Sterba
  2 siblings, 1 reply; 11+ messages in thread
From: Holger Hoffstätte @ 2020-01-09 10:34 UTC
  To: Sebastian Döring, linux-btrfs

On 1/9/20 11:03 AM, Sebastian Döring wrote:
> Maybe I'm doing it entirely wrong, but I can't seem to get 'btrfs
> scrub resume' to work properly. During a running scrub the resume
> information (like data_bytes_scrubbed:1081454592) gets written to a
> file in /var/lib/btrfs, but as soon as the scrub is cancelled all
> relevant fields are zeroed. 'btrfs scrub resume' then seems to
> re-start from the very beginning.
> 
> This is on linux-5.5-rc5 and btrfs-progs 5.4, but I've been seeing
> this for a while now.
> 
> Is this intended/expected behavior? Am I using the btrfs-progs wrong?
> How can I interrupt and resume a scrub?

Using 5.4.9+ (all of btrfs-5.5) and btrfs-progs 5.4 I just tried and
it still works for me (and always has):

$btrfs scrub start /mnt/backup
scrub started on /mnt/backup, fsid d163af2f-6e03-4972-bfd6-30c68b6ed312 (pid=25633)

$btrfs scrub cancel /mnt/backup
scrub cancelled

$btrfs scrub resume /mnt/backup
scrub resumed on /mnt/backup, fsid d163af2f-6e03-4972-bfd6-30c68b6ed312 (pid=25704)

..and it keeps munching away as expected.

TBH it's a bit odd that there is no "pause" - I'd expect cancel to be final,
but apart from that it seems to work.

-h


* Re: btrfs scrub: cancel + resume not resuming?
  2020-01-09 10:34 ` btrfs scrub: cancel + resume not resuming? Holger Hoffstätte
@ 2020-01-09 10:52   ` Graham Cobb
  2020-01-09 11:05     ` Holger Hoffstätte
  0 siblings, 1 reply; 11+ messages in thread
From: Graham Cobb @ 2020-01-09 10:52 UTC
  To: Holger Hoffstätte, Sebastian Döring, linux-btrfs

On 09/01/2020 10:34, Holger Hoffstätte wrote:
> On 1/9/20 11:03 AM, Sebastian Döring wrote:
>> Maybe I'm doing it entirely wrong, but I can't seem to get 'btrfs
>> scrub resume' to work properly. During a running scrub the resume
>> information (like data_bytes_scrubbed:1081454592) gets written to a
>> file in /var/lib/btrfs, but as soon as the scrub is cancelled all
>> relevant fields are zeroed. 'btrfs scrub resume' then seems to
>> re-start from the very beginning.
>>
>> This is on linux-5.5-rc5 and btrfs-progs 5.4, but I've been seeing
>> this for a while now.
>>
>> Is this intended/expected behavior? Am I using the btrfs-progs wrong?
>> How can I interrupt and resume a scrub?
> 
> Using 5.4.9+ (all of btrfs-5.5) and btrfs-progs 5.4 I just tried and
> it still works for me (and always has):
> 
> $btrfs scrub start /mnt/backup
> scrub started on /mnt/backup, fsid d163af2f-6e03-4972-bfd6-30c68b6ed312
> (pid=25633)
> 
> $btrfs scrub cancel /mnt/backup
> scrub cancelled
> 
> $btrfs scrub resume /mnt/backup
> scrub resumed on /mnt/backup, fsid d163af2f-6e03-4972-bfd6-30c68b6ed312
> (pid=25704)
> 
> ..and it keeps munching away as expected.

Can you check that the resume has really started from where the scrub
was cancelled? What I (and, I think, Sebastian) are seeing is that the
resume "works" but actually restarts from the beginning.

For example, something like:

btrfs scrub start /mnt/backup
sleep 300
btrfs scrub status -R /mnt/backup
btrfs scrub cancel /mnt/backup
btrfs scrub resume /mnt/backup
sleep 100
btrfs scrub status -R /mnt/backup

and check the last_physical in the second status is higher than the one
in the first status.
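
The comparison suggested above could be scripted roughly as follows. This is a sketch only: it assumes the raw status output contains a line of the form "last_physical: NUMBER", and the helper name last_physical is hypothetical.

```shell
#!/bin/sh
# Read `btrfs scrub status -R` output on stdin and print the last_physical value.
last_physical() {
    sed -n 's/^[[:space:]]*last_physical:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Usage, following the command sequence above:
#   before=$(btrfs scrub status -R /mnt/backup | last_physical)
#   btrfs scrub cancel /mnt/backup
#   btrfs scrub resume /mnt/backup
#   sleep 100
#   after=$(btrfs scrub status -R /mnt/backup | last_physical)
#   [ "$after" -gt "$before" ] && echo "resumed" || echo "restarted from the beginning"
```

Capturing the two values in variables avoids comparing long numbers by eye.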


* Re: btrfs scrub: cancel + resume not resuming?
  2020-01-09 10:52   ` Graham Cobb
@ 2020-01-09 11:05     ` Holger Hoffstätte
  2020-01-09 11:13       ` Graham Cobb
  2020-01-09 11:16       ` Holger Hoffstätte
  0 siblings, 2 replies; 11+ messages in thread
From: Holger Hoffstätte @ 2020-01-09 11:05 UTC
  To: Graham Cobb, Sebastian Döring, linux-btrfs

On 1/9/20 11:52 AM, Graham Cobb wrote:
> On 09/01/2020 10:34, Holger Hoffstätte wrote:
>> On 1/9/20 11:03 AM, Sebastian Döring wrote:
>>> Maybe I'm doing it entirely wrong, but I can't seem to get 'btrfs
>>> scrub resume' to work properly. During a running scrub the resume
>>> information (like data_bytes_scrubbed:1081454592) gets written to a
>>> file in /var/lib/btrfs, but as soon as the scrub is cancelled all
>>> relevant fields are zeroed. 'btrfs scrub resume' then seems to
>>> re-start from the very beginning.
>>>
>>> This is on linux-5.5-rc5 and btrfs-progs 5.4, but I've been seeing
>>> this for a while now.
>>>
>>> Is this intended/expected behavior? Am I using the btrfs-progs wrong?
>>> How can I interrupt and resume a scrub?
>>
>> Using 5.4.9+ (all of btrfs-5.5) and btrfs-progs 5.4 I just tried and
>> it still works for me (and always has):
>>
>> $btrfs scrub start /mnt/backup
>> scrub started on /mnt/backup, fsid d163af2f-6e03-4972-bfd6-30c68b6ed312
>> (pid=25633)
>>
>> $btrfs scrub cancel /mnt/backup
>> scrub cancelled
>>
>> $btrfs scrub resume /mnt/backup
>> scrub resumed on /mnt/backup, fsid d163af2f-6e03-4972-bfd6-30c68b6ed312
>> (pid=25704)
>>
>> ..and it keeps munching away as expected.
> 
> Can you check that the resume has really started from where the scrub
> was cancelled? What I (and, I think, Sebastian) are seeing is that the
> resume "works" but actually restarts from the beginning.
> 
> For example, something like:
> 
> btrfs scrub start /mnt/backup
> sleep 300
> btrfs scrub status -R /mnt/backup
> btrfs scrub cancel /mnt/backup
> btrfs scrub resume /mnt/backup
> sleep 100
> btrfs scrub status -R /mnt/backup
> 
> and check the last_physical in the second status is higher than the one
> in the first status.
> 

Well, yes. Reduced the wait times a bit and:

$cat test-scrub
#!/bin/sh
btrfs scrub start /mnt/backup
sleep 30
btrfs scrub status -R /mnt/backup
btrfs scrub cancel /mnt/backup
btrfs scrub resume /mnt/backup
sleep 10
btrfs scrub status -R /mnt/backup

$./test-scrub
scrub started on /mnt/backup, fsid d163af2f-6e03-4972-bfd6-30c68b6ed312 (pid=26390)
UUID:             d163af2f-6e03-4972-bfd6-30c68b6ed312
Scrub started:    Thu Jan  9 12:02:18 2020
Status:           running
Duration:         0:00:25
	data_extents_scrubbed: 65419
	tree_extents_scrubbed: 28
	data_bytes_scrubbed: 4117274624
	tree_bytes_scrubbed: 458752
	read_errors: 0
	csum_errors: 0
	verify_errors: 0
	no_csum: 0
	csum_discards: 0
	super_errors: 0
	malloc_errors: 0
	uncorrectable_errors: 0
	unverified_errors: 0
	corrected_errors: 0
	last_physical: 3591372800
         ^^^^^^^^^^^^^^^^^^^^^^^^^
scrub cancelled
scrub resumed on /mnt/backup, fsid d163af2f-6e03-4972-bfd6-30c68b6ed312 (pid=26399)
UUID:             d163af2f-6e03-4972-bfd6-30c68b6ed312
Scrub resumed:    Thu Jan  9 12:02:49 2020
Status:           running
Duration:         0:00:36
	data_extents_scrubbed: 12648
	tree_extents_scrubbed: 28
	data_bytes_scrubbed: 823394304
	tree_bytes_scrubbed: 458752
	read_errors: 0
	csum_errors: 0
	verify_errors: 0
	no_csum: 0
	csum_discards: 0
	super_errors: 0
	malloc_errors: 0
	uncorrectable_errors: 0
	unverified_errors: 0
	corrected_errors: 0
	last_physical: 923205632
         ^^^^^^^^^^^^^^^^^^^^^^^^

Not sure what I'm doing wrong ;)

-h


* Re: btrfs scrub: cancel + resume not resuming?
  2020-01-09 11:05     ` Holger Hoffstätte
@ 2020-01-09 11:13       ` Graham Cobb
  2020-01-09 11:16       ` Holger Hoffstätte
  1 sibling, 0 replies; 11+ messages in thread
From: Graham Cobb @ 2020-01-09 11:13 UTC
  To: Holger Hoffstätte, Sebastian Döring, linux-btrfs

On 09/01/2020 11:05, Holger Hoffstätte wrote:
> On 1/9/20 11:52 AM, Graham Cobb wrote:
>> On 09/01/2020 10:34, Holger Hoffstätte wrote:
>>> On 1/9/20 11:03 AM, Sebastian Döring wrote:
>>>> Maybe I'm doing it entirely wrong, but I can't seem to get 'btrfs
>>>> scrub resume' to work properly. During a running scrub the resume
>>>> information (like data_bytes_scrubbed:1081454592) gets written to a
>>>> file in /var/lib/btrfs, but as soon as the scrub is cancelled all
>>>> relevant fields are zeroed. 'btrfs scrub resume' then seems to
>>>> re-start from the very beginning.
>>>>
>>>> This is on linux-5.5-rc5 and btrfs-progs 5.4, but I've been seeing
>>>> this for a while now.
>>>>
>>>> Is this intended/expected behavior? Am I using the btrfs-progs wrong?
>>>> How can I interrupt and resume a scrub?
>>>
>>> Using 5.4.9+ (all of btrfs-5.5) and btrfs-progs 5.4 I just tried and
>>> it still works for me (and always has):
>>>
>>> $btrfs scrub start /mnt/backup
>>> scrub started on /mnt/backup, fsid d163af2f-6e03-4972-bfd6-30c68b6ed312
>>> (pid=25633)
>>>
>>> $btrfs scrub cancel /mnt/backup
>>> scrub cancelled
>>>
>>> $btrfs scrub resume /mnt/backup
>>> scrub resumed on /mnt/backup, fsid d163af2f-6e03-4972-bfd6-30c68b6ed312
>>> (pid=25704)
>>>
>>> ..and it keeps munching away as expected.
>>
>> Can you check that the resume has really started from where the scrub
>> was cancelled? What I (and, I think, Sebastian) are seeing is that the
>> resume "works" but actually restarts from the beginning.
>>
>> For example, something like:
>>
>> btrfs scrub start /mnt/backup
>> sleep 300
>> btrfs scrub status -R /mnt/backup
>> btrfs scrub cancel /mnt/backup
>> btrfs scrub resume /mnt/backup
>> sleep 100
>> btrfs scrub status -R /mnt/backup
>>
>> and check the last_physical in the second status is higher than the one
>> in the first status.
>>
> 
> Well, yes. Reduced the wait times a bit and:
> 
> $cat test-scrub
> #!/bin/sh
> btrfs scrub start /mnt/backup
> sleep 30
> btrfs scrub status -R /mnt/backup
> btrfs scrub cancel /mnt/backup
> btrfs scrub resume /mnt/backup
> sleep 10
> btrfs scrub status -R /mnt/backup
> 
> $./test-scrub
> scrub started on /mnt/backup, fsid d163af2f-6e03-4972-bfd6-30c68b6ed312
> (pid=26390)
> UUID:             d163af2f-6e03-4972-bfd6-30c68b6ed312
> Scrub started:    Thu Jan  9 12:02:18 2020
> Status:           running
> Duration:         0:00:25
>     data_extents_scrubbed: 65419
>     tree_extents_scrubbed: 28
>     data_bytes_scrubbed: 4117274624
>     tree_bytes_scrubbed: 458752
>     read_errors: 0
>     csum_errors: 0
>     verify_errors: 0
>     no_csum: 0
>     csum_discards: 0
>     super_errors: 0
>     malloc_errors: 0
>     uncorrectable_errors: 0
>     unverified_errors: 0
>     corrected_errors: 0
>     last_physical: 3591372800
>         ^^^^^^^^^^^^^^^^^^^^^^^^^
> scrub cancelled
> scrub resumed on /mnt/backup, fsid d163af2f-6e03-4972-bfd6-30c68b6ed312
> (pid=26399)
> UUID:             d163af2f-6e03-4972-bfd6-30c68b6ed312
> Scrub resumed:    Thu Jan  9 12:02:49 2020
> Status:           running
> Duration:         0:00:36
>     data_extents_scrubbed: 12648
>     tree_extents_scrubbed: 28
>     data_bytes_scrubbed: 823394304
>     tree_bytes_scrubbed: 458752
>     read_errors: 0
>     csum_errors: 0
>     verify_errors: 0
>     no_csum: 0
>     csum_discards: 0
>     super_errors: 0
>     malloc_errors: 0
>     uncorrectable_errors: 0
>     unverified_errors: 0
>     corrected_errors: 0
>     last_physical: 923205632
>         ^^^^^^^^^^^^^^^^^^^^^^^^
> 
> Not sure what I'm doing wrong ;)

So, you ARE seeing the same problem we are (as 923205632 is <
3591372800)! The resume started from scratch, which is not what it is
supposed to do. You should have seen 4514578432 (3591372800+923205632)
as last_physical in the second status.
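
The arithmetic above can be checked with a short sketch using the two values from the quoted output. The function name resume_continued is hypothetical, and the sum is only a rough expectation: it assumes the resumed run would cover the same distance from the saved position.

```shell
#!/bin/sh
# Succeed only if the post-resume position is past the pre-cancel position.
resume_continued() {
    # $1 = last_physical before the cancel, $2 = last_physical after the resume
    [ "$2" -gt "$1" ]
}

before=3591372800   # last_physical in the first status
after=923205632     # last_physical in the second status

if resume_continued "$before" "$after"; then
    echo "resume continued past $before"
else
    echo "resume restarted: $after is below $before"
fi

# Rough expectation had the resume continued from the saved position.
echo "expected about $((before + after))"
```

With the values from this thread the script reports a restart and an expected position of about 4514578432.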



* Re: btrfs scrub: cancel + resume not resuming?
  2020-01-09 11:05     ` Holger Hoffstätte
  2020-01-09 11:13       ` Graham Cobb
@ 2020-01-09 11:16       ` Holger Hoffstätte
  1 sibling, 0 replies; 11+ messages in thread
From: Holger Hoffstätte @ 2020-01-09 11:16 UTC
  To: Graham Cobb, Sebastian Döring, linux-btrfs

On 1/9/20 12:05 PM, Holger Hoffstätte wrote:
> $cat test-scrub
> #!/bin/sh
> btrfs scrub start /mnt/backup
> sleep 30
> btrfs scrub status -R /mnt/backup
> btrfs scrub cancel /mnt/backup
> btrfs scrub resume /mnt/backup
> sleep 10
> btrfs scrub status -R /mnt/backup
> 
[snip]
>      last_physical: 3591372800
>          ^^^^^^^^^^^^^^^^^^^^^^^^^
[snip]
>      last_physical: 923205632
>          ^^^^^^^^^^^^^^^^^^^^^^^^
> 
> Not sure what I'm doing wrong ;)

AARGH. What I'm doing wrong is that I can't read and that it indeed seems
to start from the beginning. Nice catch!

-h


* Re: btrfs scrub: cancel + resume not resuming?
  2020-01-09 10:19 ` Graham Cobb
@ 2020-01-09 17:06   ` Graham Cobb
  2020-01-09 20:35     ` Graham Cobb
  0 siblings, 1 reply; 11+ messages in thread
From: Graham Cobb @ 2020-01-09 17:06 UTC
  To: Sebastian Döring, linux-btrfs

On 09/01/2020 10:19, Graham Cobb wrote:
> On 09/01/2020 10:03, Sebastian Döring wrote:
>> Maybe I'm doing it entirely wrong, but I can't seem to get 'btrfs
>> scrub resume' to work properly. During a running scrub the resume
>> information (like data_bytes_scrubbed:1081454592) gets written to a
>> file in /var/lib/btrfs, but as soon as the scrub is cancelled all
>> relevant fields are zeroed. 'btrfs scrub resume' then seems to
>> re-start from the very beginning.
>>
>> This is on linux-5.5-rc5 and btrfs-progs 5.4, but I've been seeing
>> this for a while now.
>>
>> Is this intended/expected behavior? Am I using the btrfs-progs wrong?
>> How can I interrupt and resume a scrub?
> 
> Coincidentally, I noticed exactly the same thing yesterday!
> 
> I have just run a quick test. It works with kernel 4.19 but doesn't with
> kernel 5.3. This is using exactly the same version of btrfs-progs:
> v5.3.1 (I just rebooted the same system with an old kernel to check).
> 
> As Sebastian says, the symptom is that the file in /var/lib/btrfs shows
> all fields as zero after the cancel (although "cancelled" and "finished"
> are both 1). In particular, last_physical is zero so the scrub always
> resumes from the beginning.
> 
> With the old kernel, the file in /var/lib/btrfs correctly has all the
> values filled in after the cancel so the scrub can be resumed.

I have spent the last couple of hours instrumenting the code of scrub.c
to try to work out what is going on. The relationship between the main
thread, the thread where the scrub is running and the thread where the
status updates are being received from the kernel is quite horrible. Not
to mention that two of these three threads write out what could be the
final version of the progress file (and use different data structures as
the source for that write!).

The basic problem is that the scrub program seems to assume it will have
seen the cancellation in the update stream *before* the ioctl completes
with the cancelled status. And that seems to happen the other way round
in the 5.x kernel. Although I haven't done an actual comparison with a
4.19 run to check this.

What I haven't checked, yet, is if the 5.x kernel does actually send the
final data update if we stick around long enough to receive it.


* Re: btrfs scrub: cancel + resume not resuming?
  2020-01-09 17:06   ` Graham Cobb
@ 2020-01-09 20:35     ` Graham Cobb
  2020-01-13 13:57       ` btrfs scrub: cancel + resume not resuming - kernel regression Graham Cobb
  0 siblings, 1 reply; 11+ messages in thread
From: Graham Cobb @ 2020-01-09 20:35 UTC
  To: Sebastian Döring, linux-btrfs

On 09/01/2020 17:06, Graham Cobb wrote:
> On 09/01/2020 10:19, Graham Cobb wrote:
>> On 09/01/2020 10:03, Sebastian Döring wrote:
>>> Maybe I'm doing it entirely wrong, but I can't seem to get 'btrfs
>>> scrub resume' to work properly. During a running scrub the resume
>>> information (like data_bytes_scrubbed:1081454592) gets written to a
>>> file in /var/lib/btrfs, but as soon as the scrub is cancelled all
>>> relevant fields are zeroed. 'btrfs scrub resume' then seems to
>>> re-start from the very beginning.
>>>
>>> This is on linux-5.5-rc5 and btrfs-progs 5.4, but I've been seeing
>>> this for a while now.
>>>
>>> Is this intended/expected behavior? Am I using the btrfs-progs wrong?
>>> How can I interrupt and resume a scrub?
>>
>> Coincidentally, I noticed exactly the same thing yesterday!
>>
>> I have just run a quick test. It works with kernel 4.19 but doesn't with
>> kernel 5.3. This is using exactly the same version of btrfs-progs:
>> v5.3.1 (I just rebooted the same system with an old kernel to check).
>>
>> As Sebastian says, the symptom is that the file in /var/lib/btrfs shows
>> all fields as zero after the cancel (although "cancelled" and "finished"
>> are both 1). In particular, last_physical is zero so the scrub always
>> resumes from the beginning.
>>
>> With the old kernel, the file in /var/lib/btrfs correctly has all the
>> values filled in after the cancel so the scrub can be resumed.
> 
> I have spent the last couple of hours instrumenting the code of scrub.c
> to try to work out what is going on. 

I was over-complicating it. The problem is simple:

In kernel 4.19, BTRFS_IOC_SCRUB fills in the (final) progress values in
the scrub args EVEN WHEN THE SCRUB IS CANCELLED! If the errno is 125
(ECANCELED), and presumably most other values, the output arguments are valid.

In kernel 5.3, THAT IS NO LONGER THE CASE! If the errno is 125, the
progress values are all 0.

This ABI change breaks btrfs-scrub -- in particular the scrub
cancel-resume handling. This relies on the scrub ioctl reporting the
progress values when the scrub is cancelled: those values are written
out to the file in /var/lib/btrfs and read back in for the resume.

I haven't attempted to look at the kernel code to see why the behaviour
changed.


* Re: btrfs scrub: cancel + resume not resuming - kernel regression
  2020-01-09 20:35     ` Graham Cobb
@ 2020-01-13 13:57       ` Graham Cobb
  0 siblings, 0 replies; 11+ messages in thread
From: Graham Cobb @ 2020-01-13 13:57 UTC
  To: linux-btrfs; +Cc: Sebastian Döring

On 09/01/2020 20:35, Graham Cobb wrote:
> On 09/01/2020 17:06, Graham Cobb wrote:
>> On 09/01/2020 10:19, Graham Cobb wrote:
>>> On 09/01/2020 10:03, Sebastian Döring wrote:
>>>> Maybe I'm doing it entirely wrong, but I can't seem to get 'btrfs
>>>> scrub resume' to work properly. During a running scrub the resume
>>>> information (like data_bytes_scrubbed:1081454592) gets written to a
>>>> file in /var/lib/btrfs, but as soon as the scrub is cancelled all
>>>> relevant fields are zeroed. 'btrfs scrub resume' then seems to
>>>> re-start from the very beginning.
>>>>
>>>> This is on linux-5.5-rc5 and btrfs-progs 5.4, but I've been seeing
>>>> this for a while now.
>>>>
>>>> Is this intended/expected behavior? Am I using the btrfs-progs wrong?
>>>> How can I interrupt and resume a scrub?
>>>
>>> Coincidentally, I noticed exactly the same thing yesterday!
>>>
>>> I have just run a quick test. It works with kernel 4.19 but doesn't with
>>> kernel 5.3. This is using exactly the same version of btrfs-progs:
>>> v5.3.1 (I just rebooted the same system with an old kernel to check).
>>>
>>> As Sebastian says, the symptom is that the file in /var/lib/btrfs shows
>>> all fields as zero after the cancel (although "cancelled" and "finished"
>>> are both 1). In particular, last_physical is zero so the scrub always
>>> resumes from the beginning.
>>>
>>> With the old kernel, the file in /var/lib/btrfs correctly has all the
>>> values filled in after the cancel so the scrub can be resumed.
>>
>> I have spent the last couple of hours instrumenting the code of scrub.c
>> to try to work out what is going on. 
> 
> I was over-complicating it. The problem is simple:
> 
> In kernel 4.19, BTRFS_IOC_SCRUB fills in the (final) progress values in
> the scrub args EVEN WHEN THE SCRUB IS CANCELLED! If the errno is 125
> (and presumably most other values) the output arguments are valid.
> 
> In kernel 5.3, THAT IS NO LONGER THE CASE! If the errno is 125, the
> progress values are all 0.
> 
> This ABI change breaks btrfs-scrub -- in particular the scrub
> cancel-resume handling. This relies on the scrub ioctl reporting the
> progress values when the scrub is cancelled: those values are written
> out to the file in /var/lib/btrfs and read back in for the resume.
> 
> I haven't attempted to look at the kernel code to see why the behaviour
> changed.

This regression in btrfs-scrub is a kernel problem: the scrub ioctl ABI
seems to have been broken some time between kernel 4.19 and kernel 5.3.

Do we need to provide any more information? I am not in a position to do
a bisect at this point, but if it is not obvious what change has caused
the breakage I can try to do so later in the week.


* Re: btrfs scrub: cancel + resume not resuming?
  2020-01-09 10:03 btrfs scrub: cancel + resume not resuming? Sebastian Döring
  2020-01-09 10:19 ` Graham Cobb
  2020-01-09 10:34 ` btrfs scrub: cancel + resume not resuming? Holger Hoffstätte
@ 2020-01-22 15:52 ` David Sterba
  2 siblings, 0 replies; 11+ messages in thread
From: David Sterba @ 2020-01-22 15:52 UTC
  To: Sebastian Döring; +Cc: linux-btrfs

On Thu, Jan 09, 2020 at 11:03:08AM +0100, Sebastian Döring wrote:
> Maybe I'm doing it entirely wrong, but I can't seem to get 'btrfs
> scrub resume' to work properly. During a running scrub the resume
> information (like data_bytes_scrubbed:1081454592) gets written to a
> file in /var/lib/btrfs, but as soon as the scrub is cancelled all
> relevant fields are zeroed. 'btrfs scrub resume' then seems to
> re-start from the very beginning.

For the record, fix is queued for stable 5.4.14. Thanks for the report.
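
A rough sketch for checking whether a running 5.4.y kernel is at or past that stable release. It assumes the queued fix did land in 5.4.14 and only looks at the 5.4 series, since the thread does not say which other series received it. The function name has_scrub_fix_5_4 is hypothetical, and distribution kernels may backport fixes without changing the version string.

```shell
#!/bin/sh
# Succeed if a kernel release string is a 5.4.y stable release with y >= 14.
has_scrub_fix_5_4() {
    case "$1" in
        5.4.*) ;;
        *) return 1 ;;
    esac
    patch=${1#5.4.}          # strip the "5.4." prefix
    patch=${patch%%[!0-9]*}  # keep the leading digits, dropping suffixes such as "-generic"
    [ -n "$patch" ] && [ "$patch" -ge 14 ]
}

# Usage:
#   has_scrub_fix_5_4 "$(uname -r)" && echo "5.4 stable kernel at or past 5.4.14"
```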

