* Failover for unattached USB device
@ 2018-10-16 22:14 Dmitry Katsubo
  2018-10-24 15:03 ` Dmitry Katsubo
  0 siblings, 1 reply; 5+ messages in thread
From: Dmitry Katsubo @ 2018-10-16 22:14 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2193 bytes --]

Dear btrfs team / community,

Sometimes the kernel resets the USB subsystem (it looks like a hardware
problem), after which all USB devices are detached and then attached back.
After a few hours of struggling, btrfs finally reaches the point where a
read-only filesystem mount is necessary. During this time, when I try to
access the mounted filesystem (/mnt/backups), some directories list
successfully while others report an error:

root@debian:~# ll /mnt/backups/
total 14334
drwxr-xr-x 1 adm users    116 Sep 12 00:35 .
drwxrwxr-x 1 adm users    164 Sep 19 22:44 ..
-rw-r--r-- 1 adm users  79927 Feb  7  2018 contacts.zip
drwxr-xr-x 1 adm users    254 Feb  4  2018 attic
drwxr-xr-x 1 adm users     16 Feb 23  2018 recent
...
root@debian:~# ll /mnt/backups/attic/
ls: reading directory '/mnt/backups/attic/': Input/output error
total 0
drwxr-xr-x 1 adm users 254 Feb  4  2018 .
drwxr-xr-x 1 adm users 116 Sep 12 00:35 ..

It looks like this depends on whether the content is in the disk cache...

What is surprising: when I try to create a file, I succeed:

root@debian:~# touch /mnt/backups/.mounted
root@debian:~# ll /mnt/backups/.mounted
-rw-r--r-- 1 root root 0 Sep 20 16:52 /mnt/backups/.mounted
root@debian:~# rm /mnt/backups/.mounted

My btrfs volume consists of two identical drives combined into a RAID1 volume:

# btrfs filesystem df /mnt/backups
Data, RAID1: total=880.00GiB, used=878.96GiB
System, RAID1: total=8.00MiB, used=144.00KiB
Metadata, RAID1: total=2.00GiB, used=1.13GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

# btrfs filesystem show /mnt/backups
Label: none  uuid: a657364b-36d2-4c1f-8e5d-dc3d28166190
        Total devices 2 FS bytes used 880.09GiB
        devid    1 size 3.64TiB used 882.01GiB path /dev/sdf
        devid    2 size 3.64TiB used 882.01GiB path /dev/sde

As a workaround I can monitor dmesg output, but:

1. It would be nice if I could tell btrfs to remount read-only once a
certain error rate per minute is reached.
2. It would be nice if btrfs could detect that both drives are
unavailable and unmount the filesystem (as remounting read-only won't
help much).
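A minimal sketch of the dmesg-watching workaround in point 1; the threshold, the sample log lines, and the remount action are illustrative assumptions, not existing btrfs knobs:

```shell
# Count "BTRFS error" lines in a log excerpt and decide whether a
# read-only remount is warranted. The sample excerpt mimics the
# attached kernel log; in practice the input would come from dmesg.
log=$(mktemp)
cat > "$log" <<'EOF'
[1203125.061068] BTRFS error (device sdf): bdev /dev/sdh errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
[1203125.061244] BTRFS error (device sdf): bdev /dev/sdg errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
[1203130.411911] btrfs_dev_stat_print_on_error: 42 callbacks suppressed
EOF
limit=1                                # max tolerated errors per interval
errors=$(grep -c 'BTRFS error' "$log")
echo "errors in interval: $errors"
if [ "$errors" -gt "$limit" ]; then
    # on the real system: mount -o remount,ro /mnt/backups
    echo "threshold exceeded, would remount read-only"
fi
rm -f "$log"
```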

Kernel log for Linux v4.14.2 is attached.

-- 
With best regards,
Dmitry

[-- Attachment #2: log.txt --]
[-- Type: text/plain, Size: 4475 bytes --]

Jun 29 18:54:56 debian kernel: [1197865.440396] usb 4-2: USB disconnect, device number 3
Jun 29 18:54:56 debian kernel: [1197865.440403] usb 4-2.2: USB disconnect, device number 5
Jun 29 18:54:56 debian kernel: [1197865.476118] usb 4-2.3: USB disconnect, device number 8
Jun 29 18:54:56 debian kernel: [1197865.549379] usb 4-2.4: USB disconnect, device number 7
...
Jun 29 18:54:58 debian kernel: [1197867.517728] usb-storage 4-2.3:1.0: USB Mass Storage device detected
Jun 29 18:54:58 debian kernel: [1197867.524021] usb-storage 4-2.3:1.0: Quirks match for vid 152d pid 0567: 5000000
Jun 29 18:54:58 debian kernel: [1197867.603859] usb 4-2.4: new full-speed USB device number 13 using ehci-pci
Jun 29 18:54:58 debian kernel: [1197867.725595] usb-storage 4-2.4:1.2: USB Mass Storage device detected
Jun 29 18:54:58 debian kernel: [1197867.728602] scsi host9: usb-storage 4-2.4:1.2
Jun 29 18:54:59 debian kernel: [1197868.528737] scsi 7:0:0:0: Direct-Access     ST4000DM 004-2CV104       0125 PQ: 0 ANSI: 6
Jun 29 18:54:59 debian kernel: [1197868.529310] scsi 7:0:0:1: Direct-Access     ST4000DM 004-2CV104       0125 PQ: 0 ANSI: 6
Jun 29 18:54:59 debian kernel: [1197868.530093] sd 7:0:0:0: Attached scsi generic sg5 type 0
Jun 29 18:54:59 debian kernel: [1197868.530588] sd 7:0:0:1: Attached scsi generic sg6 type 0
Jun 29 18:54:59 debian kernel: [1197868.533064] sd 7:0:0:1: [sdh] Very big device. Trying to use READ CAPACITY(16).
Jun 29 18:54:59 debian kernel: [1197868.533619] sd 7:0:0:1: [sdh] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
Jun 29 18:54:59 debian kernel: [1197868.533626] sd 7:0:0:1: [sdh] 4096-byte physical blocks
Jun 29 18:54:59 debian kernel: [1197868.534063] sd 7:0:0:1: [sdh] Write Protect is off
Jun 29 18:54:59 debian kernel: [1197868.534069] sd 7:0:0:1: [sdh] Mode Sense: 67 00 10 08
Jun 29 18:54:59 debian kernel: [1197868.534422] sd 7:0:0:1: [sdh] No Caching mode page found
Jun 29 18:54:59 debian kernel: [1197868.534542] sd 7:0:0:1: [sdh] Assuming drive cache: write through
Jun 29 18:54:59 debian kernel: [1197868.535563] sd 7:0:0:1: [sdh] Very big device. Trying to use READ CAPACITY(16).
Jun 29 18:54:59 debian kernel: [1197868.536702] sd 7:0:0:0: [sdg] Very big device. Trying to use READ CAPACITY(16).
Jun 29 18:54:59 debian kernel: [1197868.537454] sd 7:0:0:0: [sdg] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
Jun 29 18:54:59 debian kernel: [1197868.537459] sd 7:0:0:0: [sdg] 4096-byte physical blocks
Jun 29 18:54:59 debian kernel: [1197868.538327] sd 7:0:0:0: [sdg] Write Protect is off
Jun 29 18:54:59 debian kernel: [1197868.538331] sd 7:0:0:0: [sdg] Mode Sense: 67 00 10 08
...
Jun 29 20:22:35 debian kernel: [1203125.061068] BTRFS error (device sdf): bdev /dev/sdh errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
Jun 29 20:22:35 debian kernel: [1203125.061244] BTRFS error (device sdf): bdev /dev/sdg errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
Jun 29 20:22:35 debian kernel: [1203125.061412] BTRFS error (device sdf): bdev /dev/sdh errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
Jun 29 20:22:35 debian kernel: [1203125.061530] BTRFS error (device sdf): bdev /dev/sdg errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
Jun 29 20:22:35 debian kernel: [1203125.061770] BTRFS error (device sdf): bdev /dev/sdh errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
Jun 29 20:22:35 debian kernel: [1203125.061894] BTRFS error (device sdf): bdev /dev/sdg errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
Jun 29 20:22:40 debian kernel: [1203130.411911] btrfs_dev_stat_print_on_error: 42 callbacks suppressed
...
Jun 29 23:51:36 debian kernel: [1215666.863475] BTRFS error (device sdf): bdev /dev/sdh errs: wr 0, rd 1867, flush 0, corrupt 0, gen 0
Jun 29 23:51:36 debian kernel: [1215666.864464] BTRFS error (device sdf): bdev /dev/sdg errs: wr 0, rd 1867, flush 0, corrupt 0, gen 0
Jun 29 23:51:36 debian kernel: [1215666.865392] BTRFS: error (device sdf) in btrfs_run_delayed_refs:3089: errno=-5 IO failure
Jun 29 23:51:36 debian kernel: [1215666.866354] BTRFS info (device sdf): forced readonly
Jun 29 23:51:36 debian kernel: [1215666.866357] BTRFS warning (device sdf): Skipping commit of aborted transaction.
Jun 29 23:51:36 debian kernel: [1215666.866360] BTRFS: error (device sdf) in cleanup_transaction:1873: errno=-5 IO failure
Jun 29 23:51:36 debian kernel: [1215666.868305] BTRFS error (device sdf): commit super ret -5
Jun 29 23:51:36 debian kernel: [1215666.869849] BTRFS error (device sdf): cleaner transaction attach returned -30

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Failover for unattached USB device
  2018-10-16 22:14 Failover for unattached USB device Dmitry Katsubo
@ 2018-10-24 15:03 ` Dmitry Katsubo
  2018-10-24 18:05   ` Chris Murphy
  0 siblings, 1 reply; 5+ messages in thread
From: Dmitry Katsubo @ 2018-10-24 15:03 UTC (permalink / raw)
  To: linux-btrfs

On 2018-10-17 00:14, Dmitry Katsubo wrote:
> As a workaround I can monitor dmesg output, but:
> 
> 1. It would be nice if I could tell btrfs to remount read-only once a
> certain error rate per minute is reached.
> 2. It would be nice if btrfs could detect that both drives are
> unavailable and unmount the filesystem (as remounting read-only won't
> help much).
> 
> Kernel log for Linux v4.14.2 is attached.

I wonder if somebody could advise further on a workaround. I understand
that running a btrfs volume over USB devices is not ideal, but I think
btrfs could play some role here as well.

In particular, I wonder if btrfs could detect that all devices in a
RAID1 volume have become inaccessible and, instead of reporting an
ever-increasing "write error" counter to the kernel log, simply render
the volume read-only. "Inaccessible" could mean that the same block
cannot be written back to the minimum number of devices in the RAID
volume, at which point btrfs gives up.

Maybe someone can advise a quick way of checking that the filesystem is
healthy? Right now the only way I see is to make a tiny write (like
creating a file and instantly removing it) to make it fail faster...
Checking for write IO errors in the "btrfs dev stats /mnt/backups"
output could be an option, provided that the delta is computed over some
period of time and the write error counter increases for both devices in
the volume (as I am not interested in a single failing block which btrfs
tries to write again and again, increasing the write error counter).
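The dev-stats delta idea above could be prototyped along these lines; the two snapshot files and their numbers are made-up stand-ins for `btrfs dev stats /mnt/backups` captured at two points in time:

```shell
# Report devices whose write_io_errs counter grew between two
# snapshots of `btrfs dev stats` output (sample data, not real stats).
cat > /tmp/stats_before <<'EOF'
[/dev/sdf].write_io_errs   0
[/dev/sde].write_io_errs   0
EOF
cat > /tmp/stats_after <<'EOF'
[/dev/sdf].write_io_errs   3
[/dev/sde].write_io_errs   2
EOF
grown=$(awk '/write_io_errs/ {
    split($1, part, "]")                 # "[/dev/sdf].write_io_errs"
    dev = substr(part[1], 2)             # -> "/dev/sdf"
    if (NR == FNR) base[dev] = $2        # first file: baseline
    else if ($2 + 0 > base[dev] + 0) print dev
}' /tmp/stats_before /tmp/stats_after)
echo "$grown"
# Act only when every device in the RAID1 pair regressed:
if [ "$(printf '%s\n' "$grown" | grep -c .)" -eq 2 ]; then
    echo "both devices failing, volume should go read-only"
fi
```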

Thanks for any feedback.

-- 
With best regards,
Dmitry


* Re: Failover for unattached USB device
  2018-10-24 15:03 ` Dmitry Katsubo
@ 2018-10-24 18:05   ` Chris Murphy
  2018-10-25  9:47     ` Dmitry Katsubo
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Murphy @ 2018-10-24 18:05 UTC (permalink / raw)
  To: Dmitry Katsubo; +Cc: linux-btrfs

On Wed, Oct 24, 2018 at 9:03 AM, Dmitry Katsubo <dma_k@mail.ru> wrote:
> On 2018-10-17 00:14, Dmitry Katsubo wrote:
>>
>> As a workaround I can monitor dmesg output, but:
>>
>> 1. It would be nice if I could tell btrfs to remount read-only once a
>> certain error rate per minute is reached.
>> 2. It would be nice if btrfs could detect that both drives are
>> unavailable and unmount the filesystem (as remounting read-only won't
>> help much).
>>
>> Kernel log for Linux v4.14.2 is attached.
>
>
> I wonder if somebody could advise further on a workaround. I understand
> that running a btrfs volume over USB devices is not ideal, but I think
> btrfs could play some role here as well.

I think about the best we can expect in the short term is that Btrfs
goes read-only before the file system becomes corrupted in a way it
can't recover from with a normal mount. And I'm not certain it is at
that state of development right now for all cases. And I say the same
thing for other file systems as well.

Running Btrfs on USB devices is fine, so long as they're well behaved.
I have such a setup with USB 3.0 devices. Perhaps I got a bit lucky,
because there are a lot of known bugs with USB controllers, USB bridge
chipsets, and USB hubs.

Having user-definable switches for when to go read-only is, I think,
misleading to the user, and very likely will mislead the file system.
The file system needs to go read-only when it gets confused, period.
It doesn't matter what the error rate is.

The workaround is really to do the hard work of making the devices
stable, not asking Btrfs to paper over known unstable hardware.

In my case, I started out with rare disconnects and resets with
directly attached drives. This was a couple years ago. It was a Btrfs
raid1 setup, and the drives would not go missing at the same time, but
both would just drop off from time to time. Btrfs would complain of
dropped writes, I vaguely remember it going read only. But normal
mounts worked, sometimes with scary errors but always finding a good
copy on the other drive, and doing passive fixups. Scrub would always
fix up the rest. I'm still using those same file systems on those
devices, but now they go through a dyconn USB 3.0 hub with a decently
good power supply. I originally thought the drop offs were power
related, so I explicitly looked for a USB hub that could supply at
least 2 A, and this one is 12 VDC @ 2500 mA. A laptop drive will draw
nearly 1 A on spin-up, but after that P = A x V applies. Laptop drives
during read/write use 1.5 W to 2.5 W @ 5 VDC.

1.5-2.5 W = A * 5 V
Therefore A = 0.3-0.5A

And for 4 drives at possibly 0.5 A (although my drives are all at the
1.6 W read/write), that's 2 A @ 5 V, which is easily handled by the hub
power supply (which by my calculation could do 6 A @ 5 V, not
accounting for any resistance).
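As a quick sanity check of the arithmetic above (nothing here is new data, just the quoted numbers rerun: 1.5 to 2.5 W drives at 5 VDC, and a 12 VDC @ 2500 mA hub supply):

```shell
# Redo the power arithmetic: drive amps at 5 V, and the hub supply's
# equivalent budget at 5 V (12 V * 2.5 A, losses ignored).
awk 'BEGIN {
    v = 5.0
    printf "drive low:  %.1f A\n", 1.5 / v
    printf "drive high: %.1f A\n", 2.5 / v
    printf "hub budget: %.1f A\n", (12 * 2.5) / v
}'
```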

Anyway, as it turns out I don't think it was power related, as the
Intel NUC in question probably had just enough amps per port. What it
really was, was an incompatibility between the Intel controller and the
bridge chipset in the USB-SATA cases. A USB hub is similar to an
ethernet hub in that it actually reads the USB stream and rewrites it
out, so hubs are pretty complicated little things, and having a good
one matters.

>
> In particular, I wonder if btrfs could detect that all devices in a
> RAID1 volume have become inaccessible and, instead of reporting an
> ever-increasing "write error" counter to the kernel log, simply render
> the volume read-only. "Inaccessible" could mean that the same block
> cannot be written back to the minimum number of devices in the RAID
> volume, at which point btrfs gives up.

There are pending patches for something similar that you can find in
the archives. I think the reason they haven't been merged yet is there
haven't been enough comments and feedback (?). I think Anand Jain is
the author of those patches so you might dig around in the archives.
In a way you have an ideal setup for testing them out. Just make sure
you have backups...


>
> Maybe someone can advise a quick way of checking that the filesystem is
> healthy?

'btrfs check' without the --repair flag is safe and read-only, but
takes a long time because it'll read all metadata. The fastest safe way
is to mount it ro, read a recently written directory, and see if there
are any kernel errors. You could recursively copy files from a
directory to /dev/null and then check kernel messages for any errors.
So long as metadata is DUP, there is a good chance a bad copy of
metadata can be automatically fixed up with a good copy. If there's
only a single copy of metadata, or both copies get corrupt, then
it's difficult. Usually recovery of data is possible, but depending on
what's damaged, repair might not be possible.
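The recursive-read check above could be scripted roughly like this; the scratch directory is a stand-in for a recently written path such as /mnt/backups/recent, and the dmesg follow-up is left as a comment since it only makes sense on the real machine:

```shell
# Read every file under a directory, discard the data, and report
# whether the reads succeeded. On the real volume, follow up with
# something like: dmesg | tail -n 100 | grep 'BTRFS error'
dir=$(mktemp -d)                     # scratch stand-in for the mount
echo "payload" > "$dir/file1"
mkdir "$dir/sub" && echo "more" > "$dir/sub/file2"
if find "$dir" -type f -exec cat {} + > /dev/null; then
    echo "read back OK"
else
    echo "read errors, check the kernel log"
fi
rm -rf "$dir"
```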


-- 
Chris Murphy


* Re: Failover for unattached USB device
  2018-10-24 18:05   ` Chris Murphy
@ 2018-10-25  9:47     ` Dmitry Katsubo
  2018-10-25 18:34       ` Chris Murphy
  0 siblings, 1 reply; 5+ messages in thread
From: Dmitry Katsubo @ 2018-10-25  9:47 UTC (permalink / raw)
  To: linux-btrfs

On 2018-10-24 20:05, Chris Murphy wrote:
> I think about the best we can expect in the short term is that Btrfs
> goes read-only before the file system becomes corrupted in a way it
> can't recover with a normal mount. And I'm not certain it is in this
> state of development right now for all cases. And I say the same thing
> for other file systems as well.
> 
> Running Btrfs on USB devices is fine, so long as they're well behaved.
> I have such a setup with USB 3.0 devices. Perhaps I got a bit lucky,
> because there are a lot of known bugs with USB controllers, USB bridge
> chipsets, and USB hubs.
> 
> Having user definable switches for when to go read-only is, I think
> misleading to the user, and very likely will mislead the file system.
> The file system needs to go read-only when it gets confused, period.
> It doesn't matter what the error rate is.

In general I agree. I just wonder why it couldn't happen sooner. For
example, from the log I originally attached one can see that btrfs made
1867 attempts to read (perhaps the same) block from both devices in the
RAID1 volume, without success:

BTRFS error (device sdf): bdev /dev/sdh errs: wr 0, rd 1867, flush 0, corrupt 0, gen 0
BTRFS error (device sdf): bdev /dev/sdg errs: wr 0, rd 1867, flush 0, corrupt 0, gen 0

The attempts lasted for 29 minutes.

> The workaround is really to do the hard work of making the devices
> stable, not asking Btrfs to paper over known unstable hardware.
> 
> In my case, I started out with rare disconnects and resets with
> directly attached drives. This was a couple years ago. It was a Btrfs
> raid1 setup, and the drives would not go missing at the same time, but
> both would just drop off from time to time. Btrfs would complain of
> dropped writes, I vaguely remember it going read only. But normal
> mounts worked, sometimes with scary errors but always finding a good
> copy on the other drive, and doing passive fixups. Scrub would always
> fix up the rest. I'm still using those same file systems on those
> devices, but now they go through a dyconn USB 3.0 hub with a decently
> good power supply. I originally thought the drop offs were power
> related, so I explicitly looked for a USB hub that could supply at
> least 2 A, and this one is 12 VDC @ 2500 mA. A laptop drive will draw
> nearly 1 A on spin-up, but after that P = A x V applies. Laptop drives
> during read/write use 1.5 W to 2.5 W @ 5 VDC.
> 
> 1.5-2.5 W = A * 5 V
> Therefore A = 0.3-0.5A
> 
> And for 4 drives at possibly 0.5 A (although my drives are all at the
> 1.6 W read/write), that's 2 A @ 5 V, which is easily maintained for
> the hub power supply (which by my calculation could do 6 A @ 5 V, not
> accounting for any resistance).
> 
> Anyway, as it turns out I don't think it was power related, as the
> Intel NUC in question probably had just enough amps per port. What it
> really was, was an incompatibility between the Intel controller and the
> bridge chipset in the USB-SATA cases. A USB hub is similar to an
> ethernet hub in that it actually reads the USB stream and rewrites it
> out, so hubs are pretty complicated little things, and having a good
> one matters.

Thanks for this information. I have a situation similar to yours, with
the only important difference that my drives are placed in a USB dock
with independent power and cooling, like this one:

https://www.ebay.com/itm/Mediasonic-ProBox-4-Bay-3-5-Hard-Drive-Enclosure-USB-3-0-eSATA-Sata-3-6-0Gbps/273161164246

so I don't think I need to worry about amps. The dock is connected
directly to a USB port on the motherboard.

However, indeed there could be bugs both on the dock side and in the
south bridge. Moreover, I could imagine that a USB reset happens due to
another USB device, like a wave started in one place turning into a
tsunami for the whole USB subsystem.

> There are pending patches for something similar that you can find in
> the archives. I think the reason they haven't been merged yet is there
> haven't been enough comments and feedback (?). I think Anand Jain is
> the author of those patches so you might dig around in the archives.
> In a way you have an ideal setup for testing them out. Just make sure
> you have backups...

Thanks for the reference. Should I look for this patch here:

https://patchwork.kernel.org/project/linux-btrfs/list/?submitter=34632&order=-date

or was this patch only floating around on this mailing list?

> 'btrfs check' without the --repair flag is safe and read only but
> takes a long time because it'll read all metadata. The fastest safe
> way is to mount it ro and read a directory recently being written to
> and see if there are any kernel errors. You could recursively copy
> files from a directory to /dev/null and then check kernel messages for
> any errors. So long as metadata is DUP, there is a good chance a bad
> copy of metadata can be automatically fixed up with a good copy. If
> there's only single copy of metadata, or both copies get corrupt, then
> it's difficult. Usually recovery of data is possible, but depending on
> what's damaged, repair might not be possible.

I think "btrfs check" would be too heavy. Monitoring kernel errors is
something I was thinking about as well.

I didn't observe any errors while doing "btrfs check" on this volume
after several such resets, because that volume is mostly used for
reading and the chance that a USB reset happens during a write is very
low.

-- 
With best regards,
Dmitry


* Re: Failover for unattached USB device
  2018-10-25  9:47     ` Dmitry Katsubo
@ 2018-10-25 18:34       ` Chris Murphy
  0 siblings, 0 replies; 5+ messages in thread
From: Chris Murphy @ 2018-10-25 18:34 UTC (permalink / raw)
  To: Dmitry Katsubo; +Cc: linux-btrfs

On Thu, Oct 25, 2018 at 3:47 AM, Dmitry Katsubo <dma_k@mail.ru> wrote:
>
>
> BTRFS error (device sdf): bdev /dev/sdh errs: wr 0, rd 1867, flush 0, corrupt 0, gen 0
> BTRFS error (device sdf): bdev /dev/sdg errs: wr 0, rd 1867, flush 0, corrupt 0, gen 0
>
> Attempts lasted for 29 minutes.

Yep, and it floods the log. It's extra fun if the journal is on the
device with errors. The more errors, the more writes and reads to the
problem drive, the more errors, the more writes, the more errors...
snowball.

But that's the state of error handling on Btrfs, which is still more
sophisticated than other file systems. It's not more sophisticated
than the kernel's md driver, which does have some sort of read error
rate limit and then it'll kick the drive out of the array (faulty
state) and stop complaining about it. And I think it considers a drive
faulty on a single write failure.

>
> Thanks for this information. I have a situation similar to yours, with
> the only important difference that my drives are placed in a USB dock
> with independent power and cooling, like this one:
>
> https://www.ebay.com/itm/Mediasonic-ProBox-4-Bay-3-5-Hard-Drive-Enclosure-USB-3-0-eSATA-Sata-3-6-0Gbps/273161164246
>
> so I don't think I need to worry about amps. The dock is connected
> directly to a USB port on the motherboard.

It is entirely plausible that this still needs a hub, but it really
depends on the exact errors you're getting. And those need to go to the
linux-usb list; I don't know enough about it.

And it might require a bit of luck to get a reply because it's a very
busy list. My main recommendation is to be very concise. They will want
to know the hardware setup (topology), lsusb -v, lspci, and a complete
dmesg. It'll seem reasonable to snip down to just the usb error
messages, but that almost always drives developers crazy, because
important hints can show up in kernel messages during boot, so they
will inevitably want the whole dmesg. The ideal scenario is to do a
clean boot, reproduce the problem, and then capture the dmesg; that way
it's concise and isn't two weeks old with a bunch of device connects
and disconnects. There almost certainly are usb kernel parameters for
debugging; ideally, search the linux-usb list archives to find out what
they are (I'm not sure) so that you already have them set for your
clean boot.

There might be usb quirks for your hardware setup that apply. Or they
might suggest that it still needs a USB hub to clean things up between
the controller and the bridge chipset.



>
> However, indeed there could be bugs both on the dock side and in the
> south bridge. Moreover, I could imagine that a USB reset happens due to
> another USB device, like a wave started in one place turning into a
> tsunami for the whole USB subsystem.

If there is a hub, one of its jobs is to prevent that from happening.
And if the drive enclosure and the problem device are on separate
ports, they are effectively going through a built-in hub in the usb
host device. But yeah, you want to tell linux-usb exactly what devices
(and chipsets, which lsusb -v will show) you're using, because they may
already know about such problems.


>
>> There are pending patches for something similar that you can find in
>> the archives. I think the reason they haven't been merged yet is there
>> haven't been enough comments and feedback (?). I think Anand Jain is
>> the author of those patches so you might dig around in the archives.
>> In a way you have an ideal setup for testing them out. Just make sure
>> you have backups...
>
>
> Thanks for the reference. Should I look for this patch here:
>
> https://patchwork.kernel.org/project/linux-btrfs/list/?submitter=34632&order=-date

Maybe, it's a lot of patches to go through. I'm using
https://lore.kernel.org/linux-btrfs which has a search field.

This is the recent email I was thinking of that might point you in the
right direction:

https://lore.kernel.org/linux-btrfs/2287c62d-6dbb-3b30-1134-d754e42941ab@oracle.com/

A complicating factor is that the block layer does do some retries. I'm
also not familiar enough with the way md does retries and marks drives
faulty, or whether that is really what Btrfs should replicate. Some of
these conversations require cooperation with other kernel developers, I
suspect, like libata, SCSI, USB, SD, in order to make sure no one is
hit by some big surprise.



>
> I didn't observe any errors while doing "btrfs check" on this volume after
> several such resets, because that volume is mostly used for reading and
> chance that USB reset happens during the write is very low.

If it mounts and the most recent changes are readable without errors,
the file system is probably fine. Btrfs is pretty good at detecting and
correcting hardware-related problems, in that it is fussier than other
file systems: it can detect such problems in both metadata and data,
and should be able to avoid them in the first place due to always-on
COW (as long as you haven't disabled it).

But there is some evidence that old Btrfs bugs could induce corruption
in metadata that doesn't turn into a visible problem until much later.
Scrub only checks whether metadata and its checksum match up
(corruption detection elsewhere in the storage stack), so it most often
can't find bugs that cause corruption. Your best bet for side-stepping
such problems is backups, and using the most recent kernel you can. If
you encounter some problem that might be a bug, inevitably you'll need
to test with a newer kernel version anyway to see if it's still a bug.
Each merge cycle involves thousands of lines of changes just for Btrfs,
and there's more to the storage stack in the kernel than just Btrfs.

In your use case, with mostly reads and probably no concern about write
performance, you could consider mounting with notreelog. This drops the
use of the tree log, which is used to improve performance on operations
that use fsync. With this option, transactions calling fsync() fall
back to sync(), so it's safer but slower.
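For what it's worth, notreelog can be set per mount or persistently; the fstab line below is only a sketch, reusing the UUID shown by `btrfs filesystem show` earlier in the thread:

```
# one-off: mount -o remount,notreelog /mnt/backups
# persistent /etc/fstab entry:
UUID=a657364b-36d2-4c1f-8e5d-dc3d28166190  /mnt/backups  btrfs  notreelog  0  0
```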


-- 
Chris Murphy

