All of lore.kernel.org
 help / color / mirror / Atom feed
* Safe unmounting of external btrfs disk
@ 2020-11-26 18:33 Oliver Freyermuth
  2020-11-26 23:38 ` Chris Murphy
  0 siblings, 1 reply; 3+ messages in thread
From: Oliver Freyermuth @ 2020-11-26 18:33 UTC (permalink / raw)
  To: linux-btrfs

Dear BTRFS experts,

I've had a rather strange occurence with my external BTRFS backup disk last night,
which makes me question what is the correct way to safely remove a USB drive with BTRFS on it.

Here's the timeline:
- 02:00:00 am: btrbk starts running.
- 02:01:17 am: btrbk deletes the last old subvolume from the disk
                (I have btrfs_commit_delete = no, so the delayed deletion basically starts some time after).
- ~02:18 am: I performn an "umount" of the disk.
- ~02:18:43 am: I unplug the USB drive.
- 02:25:05 am: My kernel tells me this:
[19268.944902] BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
[19268.944904] BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
[19268.944926] BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
[19268.944926] BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
[19268.944938] BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 5, rd 0, flush 0, corrupt 0, gen 0
[19268.944938] BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 6, rd 0, flush 0, corrupt 0, gen 0
[19268.944963] BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
[19268.944964] BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 8, rd 0, flush 0, corrupt 0, gen 0
[19268.944973] BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 9, rd 0, flush 0, corrupt 0, gen 0
[19268.944974] BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 10, rd 0, flush 0, corrupt 0, gen 0
[19268.946460] BTRFS: error (device sdc1) in btrfs_commit_transaction:2327: errno=-5 IO failure (Error while writing out transaction)
[19268.946461] BTRFS info (device sdc1): forced readonly
[19268.946462] BTRFS warning (device sdc1): Skipping commit of aborted transaction.
[19268.946463] BTRFS: error (device sdc1) in cleanup_transaction:1898: errno=-5 IO failure
[19268.946483] BTRFS error (device sdc1): commit super ret -5

What's worrisome is that this happened 7 minutes after the device was unmounted and unpluggied.
I use Kernel 5.9.11 and the following mount options on the backup disk:
  space_cache=v2,noatime,compress-force=zstd:6,commit=120,subvolid=0,noauto

Reconecting the disk, I do not see any subvolumes still pending deletion.

The good news: It seems the filesystem did not suffer at all,
probably since the whole transaction never reached the disk.

Is this behaviour expected?
If yes, how to "unmount" correctly (btrfs filesystem sync only seems to work on mounted filesystems)?
I believe udisks unmounts and then quickly removes power, so this would basically be similar to what I did manually here.

I've never seen this on earlier kernels, but I don't check my kernel logs each time, and don't unmount before shutdown each time,
so I can't state where this is a kernel version dependent issue.

Cheers,
	Oliver

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Safe unmounting of external btrfs disk
  2020-11-26 18:33 Safe unmounting of external btrfs disk Oliver Freyermuth
@ 2020-11-26 23:38 ` Chris Murphy
  2020-11-27  2:17   ` Oliver Freyermuth
  0 siblings, 1 reply; 3+ messages in thread
From: Chris Murphy @ 2020-11-26 23:38 UTC (permalink / raw)
  To: Oliver Freyermuth; +Cc: Btrfs BTRFS

On Thu, Nov 26, 2020 at 11:35 AM Oliver Freyermuth
<o.freyermuth@googlemail.com> wrote:
>
> Dear BTRFS experts,
>
> I've had a rather strange occurence with my external BTRFS backup disk last night,
> which makes me question what is the correct way to safely remove a USB drive with BTRFS on it.
>
> Here's the timeline:
> - 02:00:00 am: btrbk starts running.
> - 02:01:17 am: btrbk deletes the last old subvolume from the disk
>                 (I have btrfs_commit_delete = no, so the delayed deletion basically starts some time after).
> - ~02:18 am: I performn an "umount" of the disk.

How was this done? If it's "umount" on cli, what was the exact command
and did it complete?

> - ~02:18:43 am: I unplug the USB drive.
> - 02:25:05 am: My kernel tells me this:
> [19268.944902] BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0

This suggests the file system was not actually unmounted as far as the
kernel was concerned, at the time the device was removed.

> Is this behaviour expected?

No, btrfs should block umount until it's done writing and flushing. It
is still possible those writes are in the drive's write cache, and
could get lost by disconnecting the drive before those writes are on
stable media, but Btrfs wouldn't know about that and wouldn't
complain. At next mount, it might know something went wrong only if
the lost writes were out of order, like the super block was update on
disk but the new trees it points to were lost. That'd be a drive bug,
not a Btrfs bug.


> If yes, how to "unmount" correctly (btrfs filesystem sync only seems to work on mounted filesystems)?
> I believe udisks unmounts and then quickly removes power, so this would basically be similar to what I did manually here.

Enable scsi event tracing:

# echo "scsi:*" > /sys/kernel/debug/tracing/set_event

On an HDD (i.e. nossd mount option), for 'umount' I get a bunch of
WRITE_10 commands that look like tree updates. Followed by
SYNCHRONIZE_CACHE. And then I see two WRITE_10 commands that
correspond to superblock 1 and 2. Followed by SYNCHRONIZE_CACHE. And
then one WRITE_10 for the 3rd superblock. That's it. And that should
be sufficient and nothing else happens after that - all of this is
blocking until it's done, as in, the drive itself claims the command
is done.

For this command:

# echo 1 > /sys/block/sdb/device/delete

I see two commands, SYNCHRONIZE_CACHE and START_STOP. That's it.

From this I conclude umount should be sufficient. I'm not certain
deleting the device is really necessary but it doesn't hurt.

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Safe unmounting of external btrfs disk
  2020-11-26 23:38 ` Chris Murphy
@ 2020-11-27  2:17   ` Oliver Freyermuth
  0 siblings, 0 replies; 3+ messages in thread
From: Oliver Freyermuth @ 2020-11-27  2:17 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS


Thanks for the reply!

Am 27.11.20 um 00:38 schrieb Chris Murphy:
> On Thu, Nov 26, 2020 at 11:35 AM Oliver Freyermuth
> <o.freyermuth@googlemail.com> wrote:
>>
>> Dear BTRFS experts,
>>
>> I've had a rather strange occurence with my external BTRFS backup disk last night,
>> which makes me question what is the correct way to safely remove a USB drive with BTRFS on it.
>>
>> Here's the timeline:
>> - 02:00:00 am: btrbk starts running.
>> - 02:01:17 am: btrbk deletes the last old subvolume from the disk
>>                  (I have btrfs_commit_delete = no, so the delayed deletion basically starts some time after).
>> - ~02:18 am: I performn an "umount" of the disk.
> 
> How was this done? If it's "umount" on cli, what was the exact command
> and did it complete?

It was "umount" on the only mountpoint the device was mounted to.

Of course, I have to add at this point that everything is in my logs, apart from the "umount" command,
which is only in non-timestamped shell history.

Since the shell in which I ran "umount" was already closed when I checked the kernel logs,
it's only my memory we rely on here. While I am 98 % sure I entered the command, and only then unplugged the disk,
the same way I do it every day (unless the days I shut down the system and unplug then),
I give 2 % to my memory playing tricks on me and will bump that by +1 % each time I think about it
due to the unreliable way human brains work.

> 
>> - ~02:18:43 am: I unplug the USB drive.
>> - 02:25:05 am: My kernel tells me this:
>> [19268.944902] BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
> 
> This suggests the file system was not actually unmounted as far as the
> kernel was concerned, at the time the device was removed.
> 
>> Is this behaviour expected?
> 
> No, btrfs should block umount until it's done writing and flushing. It
> is still possible those writes are in the drive's write cache, and
> could get lost by disconnecting the drive before those writes are on
> stable media, but Btrfs wouldn't know about that and wouldn't
> complain. At next mount, it might know something went wrong only if
> the lost writes were out of order, like the super block was update on
> disk but the new trees it points to were lost. That'd be a drive bug,
> not a Btrfs bug.

That's what I had hoped and would have expected.
So indeed the only valid explanation up to this point would be that my memory played a trick on me.
The good news is: I usually do this test about once per day. So if it was *not* a human memory fault,
it will happen again.

>> If yes, how to "unmount" correctly (btrfs filesystem sync only seems to work on mounted filesystems)?
>> I believe udisks unmounts and then quickly removes power, so this would basically be similar to what I did manually here.
> 
> Enable scsi event tracing:
> 
> # echo "scsi:*" > /sys/kernel/debug/tracing/set_event
> 
> On an HDD (i.e. nossd mount option), for 'umount' I get a bunch of
> WRITE_10 commands that look like tree updates. Followed by
> SYNCHRONIZE_CACHE. And then I see two WRITE_10 commands that
> correspond to superblock 1 and 2. Followed by SYNCHRONIZE_CACHE. And
> then one WRITE_10 for the 3rd superblock. That's it. And that should
> be sufficient and nothing else happens after that - all of this is
> blocking until it's done, as in, the drive itself claims the command
> is done.

I can confirm the very same observations here, thanks for the nice explanation
of what these translate to!

> 
> For this command:
> 
> # echo 1 > /sys/block/sdb/device/delete
> 
> I see two commands, SYNCHRONIZE_CACHE and START_STOP. That's it.
> 
>  From this I conclude umount should be sufficient. I'm not certain
> deleting the device is really necessary but it doesn't hurt.
> 

Ok!
Then I will go this way from now on. If I manage to reproduce this with may daily "detach the backup disk"
routine, it should show up again soon. If now, I have to run a memtest86 on my brain,
too bad these are not CoW and can not be scrubbed...

Cheers and thanks,
	Oliver

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-11-27  2:17 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-26 18:33 Safe unmounting of external btrfs disk Oliver Freyermuth
2020-11-26 23:38 ` Chris Murphy
2020-11-27  2:17   ` Oliver Freyermuth

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.