* Can't remount a BTRFS partition read write after a drive failure
@ 2017-05-16 12:56 Sylvain Leroux
  2017-05-17  4:10 ` Chris Murphy
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Sylvain Leroux @ 2017-05-16 12:56 UTC (permalink / raw)
  To: linux-btrfs



Hi,
I'm evaluating BTRFS on an external USB HDD attached to a Debian
Stretch/Sid system.

The drive is not reliable, and I noticed that once there is an error and
the USB device appears dead to the kernel, I am later unable to remount
the drive read-write. I can still mount it read-only, though.

This seems to be systematic. It also happens occasionally when the
computer wakes up from sleep with the drive still attached.

Power cycling the disk does not change anything, but restarting the
computer "solves" the issue.


Could this be caused by BTRFS getting confused because the kernel
assigns a different device name to the drive when it brings it back
online? Or does BTRFS not realize that the "original" drive has gone away?


Attached is the dmesg output corresponding to my issue.
Initially, the drive was mounted rw as /dev/sdb, the btrfs partition
being /dev/sdb1.
After the failure, I power cycled the drive, and the kernel brought it
back as /dev/sdc.

I can mount /dev/sdc1 read-only, but I'm unable to mount it read-write.
Interestingly, the message in dmesg still mentions /dev/sdb1 as the
device, whereas it should be /dev/sdc1.
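
For reference, a minimal sketch of how one might check whether the kernel
still holds a btrfs mount on a device node that has vanished. It assumes
that the node of a disconnected disk is removed from /dev while the old
mount stays listed in /proc/mounts; the function name stale_btrfs_mounts
is made up for illustration.

```shell
#!/bin/sh
# List btrfs mounts whose source device node no longer exists.
# $1: a file in /proc/mounts format (defaults to /proc/mounts).
# Assumption: a missing node means the mount refers to a device that is gone.
stale_btrfs_mounts() {
    mounts_file="${1:-/proc/mounts}"
    while read -r dev mnt fstype _rest; do
        # Only btrfs entries are of interest here.
        [ "$fstype" = "btrfs" ] || continue
        # Print "device mountpoint" when the device node is missing.
        [ -e "$dev" ] || printf '%s %s\n' "$dev" "$mnt"
    done < "$mounts_file"
}
```

Calling `stale_btrfs_mounts` with no argument reads /proc/mounts. If it
lists the old /dev/sdb1 mount, unmounting that mount point before mounting
/dev/sdc1 may be worth trying, on the assumption that the error state
belongs to the old mount rather than to the disk.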



sylvain@bulbizarre:~$ uname -a
Linux bulbizarre 4.9.0-2-amd64 #1 SMP Debian 4.9.18-1 (2017-03-30)
x86_64 GNU/Linux
sylvain@bulbizarre:~$ btrfs --version
btrfs-progs v4.7.3
sylvain@bulbizarre:~$ sudo btrfs fi show
Label: 'G-Drive'  uuid: 9465dad3-6604-4437-883e-f66386d69ac8
    Total devices 1 FS bytes used 503.34GiB
    devid    1 size 931.51GiB used 507.02GiB path /dev/sdc1
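
Since the kernel name changed from sdb to sdc, the filesystem UUID
reported above is a more stable handle than the device name. A minimal
sketch for resolving it to the current device node, assuming udev
maintains /dev/disk/by-uuid on this system; the helper name dev_for_uuid
is hypothetical.

```shell
#!/bin/sh
# Print the device node that currently carries a given filesystem UUID.
# $1: filesystem UUID
# $2: directory of UUID symlinks (defaults to /dev/disk/by-uuid, which is
#     assumed to be maintained by udev)
dev_for_uuid() {
    link="${2:-/dev/disk/by-uuid}/$1"
    # No symlink means no attached device currently advertises this UUID.
    [ -L "$link" ] || return 1
    # Resolve the symlink to the real device node path.
    readlink -f "$link"
}
```

In the situation described here, `dev_for_uuid 9465dad3-6604-4437-883e-f66386d69ac8`
would be expected to print /dev/sdc1. mount(8) also accepts the same
identifier directly in the form `UUID=...`.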



Any clue that would help me mount the disk read-write again without
restarting the system would be greatly appreciated!
Regards,
- Sylvain.

-- 
-- Sylvain Leroux
-- sylvain@chicoree.fr
-- http://www.chicoree.fr


[-- Attachment #1.1.2: dmesg.txt --]
[-- Type: text/plain, Size: 29402 bytes --]

[57157.203272] sd 6:0:0:0: [sdb] tag#0 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD 
[57157.203278] sd 6:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 2c dd 25 e8 00 02 00 00
[57157.203306] scsi host6: uas_eh_bus_reset_handler start
[57157.283228] usb 4-1.2: reset high-speed USB device number 3 using ehci-pci
[57157.473053] scsi host6: uas_eh_bus_reset_handler success
[57167.688022] sd 6:0:0:0: [sdb] tag#0 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD 
[57167.688030] sd 6:0:0:0: [sdb] tag#0 CDB: Test Unit Ready 00 00 00 00 00 00
[57167.688035] scsi host6: uas_eh_bus_reset_handler start
[57167.768020] usb 4-1.2: reset high-speed USB device number 3 using ehci-pci
[57168.068580] scsi host6: uas_eh_bus_reset_handler FAILED err -19
[57168.068588] sd 6:0:0:0: Device offlined - not ready after error recovery
[57168.068603] sd 6:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
[57168.068610] sd 6:0:0:0: [sdb] tag#0 CDB: Read(10) 28 00 2c dd 25 e8 00 02 00 00
[57168.068613] blk_update_request: I/O error, dev sdb, sector 752690664
[57168.068626] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
[57168.068833] sd 6:0:0:0: rejecting I/O to offline device
[57168.068844] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
[57168.068862] BTRFS error (device sdb1): error reading free space cache
[57168.068866] BTRFS warning (device sdb1): failed to load free space cache for block group 384428933120, rebuilding it now
[57168.068988] sd 6:0:0:0: rejecting I/O to offline device
[57168.068992] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
[57168.069076] usb 4-1.2: USB disconnect, device number 3
[57168.069290] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
[57168.069366] BTRFS error (device sdb1): error reading free space cache
[57168.069371] BTRFS warning (device sdb1): failed to load free space cache for block group 385502674944, rebuilding it now
[57168.071123] sd 6:0:0:0: [sdb] Synchronizing SCSI cache
[57168.077427] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 1, rd 4, flush 0, corrupt 0, gen 0
[57168.077880] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 2, rd 4, flush 0, corrupt 0, gen 0
[57168.078272] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 3, rd 4, flush 0, corrupt 0, gen 0
[57168.078572] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 4, rd 4, flush 0, corrupt 0, gen 0
[57168.079000] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 5, rd 4, flush 0, corrupt 0, gen 0
[57168.079207] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 6, rd 4, flush 0, corrupt 0, gen 0
[57168.100126] sd 6:0:0:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[57168.100195] BTRFS error (device sdb1): error reading free space cache
[57168.100203] BTRFS warning (device sdb1): failed to load free space cache for block group 386576416768, rebuilding it now
[57168.101236] BTRFS error (device sdb1): error reading free space cache
[57168.101241] BTRFS warning (device sdb1): failed to load free space cache for block group 387650158592, rebuilding it now
[57168.101435] BTRFS error (device sdb1): error reading free space cache
[57168.101440] BTRFS warning (device sdb1): failed to load free space cache for block group 388723900416, rebuilding it now
[57168.101637] BTRFS error (device sdb1): error reading free space cache
[57168.101643] BTRFS warning (device sdb1): failed to load free space cache for block group 389797642240, rebuilding it now
[57168.101890] BTRFS error (device sdb1): error reading free space cache
[57168.101895] BTRFS warning (device sdb1): failed to load free space cache for block group 390871384064, rebuilding it now
[57168.102054] BTRFS error (device sdb1): error reading free space cache
[57168.102059] BTRFS warning (device sdb1): failed to load free space cache for block group 391945125888, rebuilding it now
[57168.102192] BTRFS error (device sdb1): error reading free space cache
[57168.102197] BTRFS warning (device sdb1): failed to load free space cache for block group 393018867712, rebuilding it now
[57168.102313] BTRFS error (device sdb1): error reading free space cache
[57168.102317] BTRFS warning (device sdb1): failed to load free space cache for block group 394092609536, rebuilding it now
[57168.102477] BTRFS error (device sdb1): error reading free space cache
[57168.102481] BTRFS warning (device sdb1): failed to load free space cache for block group 395166351360, rebuilding it now
[57168.102598] BTRFS error (device sdb1): error reading free space cache
[57168.102601] BTRFS warning (device sdb1): failed to load free space cache for block group 396240093184, rebuilding it now
[57168.102713] BTRFS error (device sdb1): error reading free space cache
[57168.102717] BTRFS warning (device sdb1): failed to load free space cache for block group 397313835008, rebuilding it now
[57168.102849] BTRFS error (device sdb1): error reading free space cache
[57168.102854] BTRFS warning (device sdb1): failed to load free space cache for block group 398387576832, rebuilding it now
[57168.102983] BTRFS error (device sdb1): error reading free space cache
[57168.102988] BTRFS warning (device sdb1): failed to load free space cache for block group 399461318656, rebuilding it now
[57168.103118] BTRFS error (device sdb1): error reading free space cache
[57168.103122] BTRFS warning (device sdb1): failed to load free space cache for block group 400535060480, rebuilding it now
[57168.103330] BTRFS error (device sdb1): error reading free space cache
[57168.103334] BTRFS warning (device sdb1): failed to load free space cache for block group 401608802304, rebuilding it now
[57168.103501] BTRFS error (device sdb1): error reading free space cache
[57168.103505] BTRFS warning (device sdb1): failed to load free space cache for block group 402682544128, rebuilding it now
[57168.103643] BTRFS error (device sdb1): error reading free space cache
[57168.103649] BTRFS warning (device sdb1): failed to load free space cache for block group 403756285952, rebuilding it now
[57168.103776] BTRFS error (device sdb1): error reading free space cache
[57168.103781] BTRFS warning (device sdb1): failed to load free space cache for block group 404830027776, rebuilding it now
[57168.103903] BTRFS error (device sdb1): error reading free space cache
[57168.103908] BTRFS warning (device sdb1): failed to load free space cache for block group 405903769600, rebuilding it now
[57168.104276] BTRFS error (device sdb1): error reading free space cache
[57168.104282] BTRFS warning (device sdb1): failed to load free space cache for block group 406977511424, rebuilding it now
[57168.104407] BTRFS error (device sdb1): error reading free space cache
[57168.104412] BTRFS warning (device sdb1): failed to load free space cache for block group 408051253248, rebuilding it now
[57168.104657] BTRFS error (device sdb1): error reading free space cache
[57168.104661] BTRFS warning (device sdb1): failed to load free space cache for block group 409124995072, rebuilding it now
[57168.104963] BTRFS error (device sdb1): error reading free space cache
[57168.104970] BTRFS warning (device sdb1): failed to load free space cache for block group 410198736896, rebuilding it now
[57168.105134] BTRFS error (device sdb1): error reading free space cache
[57168.105140] BTRFS warning (device sdb1): failed to load free space cache for block group 411272478720, rebuilding it now
[57168.105282] BTRFS error (device sdb1): error reading free space cache
[57168.105288] BTRFS warning (device sdb1): failed to load free space cache for block group 412346220544, rebuilding it now
[57168.105428] BTRFS error (device sdb1): error reading free space cache
[57168.105434] BTRFS warning (device sdb1): failed to load free space cache for block group 413419962368, rebuilding it now
[57168.105594] BTRFS error (device sdb1): error reading free space cache
[57168.105599] BTRFS warning (device sdb1): failed to load free space cache for block group 415567446016, rebuilding it now
[57168.105753] BTRFS error (device sdb1): error reading free space cache
[57168.105757] BTRFS warning (device sdb1): failed to load free space cache for block group 416641187840, rebuilding it now
[57168.105880] BTRFS error (device sdb1): error reading free space cache
[57168.105885] BTRFS warning (device sdb1): failed to load free space cache for block group 417714929664, rebuilding it now
[57168.105998] BTRFS error (device sdb1): error reading free space cache
[57168.106004] BTRFS warning (device sdb1): failed to load free space cache for block group 418788671488, rebuilding it now
[57168.106215] BTRFS error (device sdb1): error reading free space cache
[57168.106219] BTRFS warning (device sdb1): failed to load free space cache for block group 419862413312, rebuilding it now
[57168.106339] BTRFS error (device sdb1): error reading free space cache
[57168.106344] BTRFS warning (device sdb1): failed to load free space cache for block group 420936155136, rebuilding it now
[57168.106530] BTRFS error (device sdb1): error reading free space cache
[57168.106535] BTRFS warning (device sdb1): failed to load free space cache for block group 422009896960, rebuilding it now
[57168.106820] BTRFS error (device sdb1): error reading free space cache
[57168.106824] BTRFS warning (device sdb1): failed to load free space cache for block group 423083638784, rebuilding it now
[57168.107214] BTRFS error (device sdb1): error reading free space cache
[57168.107218] BTRFS warning (device sdb1): failed to load free space cache for block group 426304864256, rebuilding it now
[57168.107323] BTRFS error (device sdb1): error reading free space cache
[57168.107326] BTRFS warning (device sdb1): failed to load free space cache for block group 427378606080, rebuilding it now
[57168.107445] BTRFS error (device sdb1): error reading free space cache
[57168.107451] BTRFS warning (device sdb1): failed to load free space cache for block group 428452347904, rebuilding it now
[57168.107579] BTRFS error (device sdb1): error reading free space cache
[57168.107583] BTRFS warning (device sdb1): failed to load free space cache for block group 429526089728, rebuilding it now
[57168.107937] BTRFS error (device sdb1): error reading free space cache
[57168.107941] BTRFS warning (device sdb1): failed to load free space cache for block group 430599831552, rebuilding it now
[57168.108266] BTRFS error (device sdb1): error reading free space cache
[57168.108271] BTRFS warning (device sdb1): failed to load free space cache for block group 431673573376, rebuilding it now
[57168.108421] BTRFS error (device sdb1): error reading free space cache
[57168.108426] BTRFS warning (device sdb1): failed to load free space cache for block group 432747315200, rebuilding it now
[57168.108546] BTRFS error (device sdb1): error reading free space cache
[57168.108549] BTRFS warning (device sdb1): failed to load free space cache for block group 433821057024, rebuilding it now
[57168.108658] BTRFS error (device sdb1): error reading free space cache
[57168.108663] BTRFS warning (device sdb1): failed to load free space cache for block group 434894798848, rebuilding it now
[57168.108783] BTRFS error (device sdb1): error reading free space cache
[57168.108787] BTRFS warning (device sdb1): failed to load free space cache for block group 435968540672, rebuilding it now
[57168.108908] BTRFS error (device sdb1): error reading free space cache
[57168.108913] BTRFS warning (device sdb1): failed to load free space cache for block group 437042282496, rebuilding it now
[57168.109035] BTRFS error (device sdb1): error reading free space cache
[57168.109040] BTRFS warning (device sdb1): failed to load free space cache for block group 438116024320, rebuilding it now
[57168.109258] BTRFS error (device sdb1): error reading free space cache
[57168.109263] BTRFS warning (device sdb1): failed to load free space cache for block group 439189766144, rebuilding it now
[57168.109386] BTRFS error (device sdb1): error reading free space cache
[57168.109390] BTRFS warning (device sdb1): failed to load free space cache for block group 440263507968, rebuilding it now
[57168.109516] BTRFS error (device sdb1): error reading free space cache
[57168.109520] BTRFS warning (device sdb1): failed to load free space cache for block group 441337249792, rebuilding it now
[57168.109645] BTRFS error (device sdb1): error reading free space cache
[57168.109649] BTRFS warning (device sdb1): failed to load free space cache for block group 442947862528, rebuilding it now
[57168.111282] BTRFS error (device sdb1): error reading free space cache
[57168.111287] BTRFS warning (device sdb1): failed to load free space cache for block group 444021604352, rebuilding it now
[57168.111383] BTRFS error (device sdb1): error reading free space cache
[57168.111387] BTRFS warning (device sdb1): failed to load free space cache for block group 445095346176, rebuilding it now
[57168.111596] BTRFS error (device sdb1): error reading free space cache
[57168.111601] BTRFS warning (device sdb1): failed to load free space cache for block group 446169088000, rebuilding it now
[57168.111716] BTRFS error (device sdb1): error reading free space cache
[57168.111720] BTRFS warning (device sdb1): failed to load free space cache for block group 447242829824, rebuilding it now
[57168.111823] BTRFS error (device sdb1): error reading free space cache
[57168.111827] BTRFS warning (device sdb1): failed to load free space cache for block group 448316571648, rebuilding it now
[57168.111949] BTRFS error (device sdb1): error reading free space cache
[57168.111953] BTRFS warning (device sdb1): failed to load free space cache for block group 449390313472, rebuilding it now
[57168.112204] BTRFS error (device sdb1): error reading free space cache
[57168.112209] BTRFS warning (device sdb1): failed to load free space cache for block group 450464055296, rebuilding it now
[57168.112346] BTRFS error (device sdb1): error reading free space cache
[57168.112350] BTRFS warning (device sdb1): failed to load free space cache for block group 451537797120, rebuilding it now
[57168.112470] BTRFS error (device sdb1): error reading free space cache
[57168.112476] BTRFS warning (device sdb1): failed to load free space cache for block group 452611538944, rebuilding it now
[57168.112601] BTRFS error (device sdb1): error reading free space cache
[57168.112606] BTRFS warning (device sdb1): failed to load free space cache for block group 453685280768, rebuilding it now
[57168.112728] BTRFS error (device sdb1): error reading free space cache
[57168.112732] BTRFS warning (device sdb1): failed to load free space cache for block group 454759022592, rebuilding it now
[57168.112860] BTRFS error (device sdb1): error reading free space cache
[57168.112864] BTRFS warning (device sdb1): failed to load free space cache for block group 455832764416, rebuilding it now
[57168.112988] BTRFS error (device sdb1): error reading free space cache
[57168.112993] BTRFS warning (device sdb1): failed to load free space cache for block group 456906506240, rebuilding it now
[57168.113232] BTRFS error (device sdb1): error reading free space cache
[57168.113236] BTRFS warning (device sdb1): failed to load free space cache for block group 457980248064, rebuilding it now
[57168.113521] BTRFS error (device sdb1): error reading free space cache
[57168.113525] BTRFS warning (device sdb1): failed to load free space cache for block group 459053989888, rebuilding it now
[57168.113623] BTRFS error (device sdb1): error reading free space cache
[57168.113628] BTRFS warning (device sdb1): failed to load free space cache for block group 460127731712, rebuilding it now
[57168.113826] BTRFS error (device sdb1): error reading free space cache
[57168.113831] BTRFS warning (device sdb1): failed to load free space cache for block group 461201473536, rebuilding it now
[57168.113944] BTRFS error (device sdb1): error reading free space cache
[57168.113947] BTRFS warning (device sdb1): failed to load free space cache for block group 462275215360, rebuilding it now
[57168.114052] BTRFS error (device sdb1): error reading free space cache
[57168.114057] BTRFS warning (device sdb1): failed to load free space cache for block group 463348957184, rebuilding it now
[57168.114208] BTRFS error (device sdb1): error reading free space cache
[57168.114211] BTRFS warning (device sdb1): failed to load free space cache for block group 464422699008, rebuilding it now
[57168.114307] BTRFS error (device sdb1): error reading free space cache
[57168.114310] BTRFS warning (device sdb1): failed to load free space cache for block group 465496440832, rebuilding it now
[57168.114402] BTRFS error (device sdb1): error reading free space cache
[57168.114408] BTRFS warning (device sdb1): failed to load free space cache for block group 466570182656, rebuilding it now
[57168.114522] BTRFS error (device sdb1): error reading free space cache
[57168.114526] BTRFS warning (device sdb1): failed to load free space cache for block group 467643924480, rebuilding it now
[57168.114615] BTRFS error (device sdb1): error reading free space cache
[57168.114620] BTRFS warning (device sdb1): failed to load free space cache for block group 468717666304, rebuilding it now
[57168.114709] BTRFS error (device sdb1): error reading free space cache
[57168.114712] BTRFS warning (device sdb1): failed to load free space cache for block group 469791408128, rebuilding it now
[57168.114798] BTRFS error (device sdb1): error reading free space cache
[57168.114801] BTRFS warning (device sdb1): failed to load free space cache for block group 470865149952, rebuilding it now
[57168.114883] BTRFS error (device sdb1): error reading free space cache
[57168.114886] BTRFS warning (device sdb1): failed to load free space cache for block group 471938891776, rebuilding it now
[57168.114971] BTRFS error (device sdb1): error reading free space cache
[57168.114974] BTRFS warning (device sdb1): failed to load free space cache for block group 473012633600, rebuilding it now
[57168.115096] BTRFS error (device sdb1): error reading free space cache
[57168.115100] BTRFS warning (device sdb1): failed to load free space cache for block group 474086375424, rebuilding it now
[57168.115241] BTRFS error (device sdb1): error reading free space cache
[57168.115247] BTRFS warning (device sdb1): failed to load free space cache for block group 475160117248, rebuilding it now
[57168.115426] BTRFS error (device sdb1): error reading free space cache
[57168.115430] BTRFS warning (device sdb1): failed to load free space cache for block group 476233859072, rebuilding it now
[57168.115526] BTRFS error (device sdb1): error reading free space cache
[57168.115529] BTRFS warning (device sdb1): failed to load free space cache for block group 478381342720, rebuilding it now
[57168.115619] BTRFS error (device sdb1): error reading free space cache
[57168.115622] BTRFS warning (device sdb1): failed to load free space cache for block group 481602568192, rebuilding it now
[57168.115710] BTRFS error (device sdb1): error reading free space cache
[57168.115713] BTRFS warning (device sdb1): failed to load free space cache for block group 482676310016, rebuilding it now
[57168.115799] BTRFS error (device sdb1): error reading free space cache
[57168.115803] BTRFS warning (device sdb1): failed to load free space cache for block group 483750051840, rebuilding it now
[57168.115912] BTRFS error (device sdb1): error reading free space cache
[57168.115917] BTRFS warning (device sdb1): failed to load free space cache for block group 485897535488, rebuilding it now
[57168.116000] BTRFS error (device sdb1): error reading free space cache
[57168.116003] BTRFS warning (device sdb1): failed to load free space cache for block group 486971277312, rebuilding it now
[57168.116121] BTRFS error (device sdb1): error reading free space cache
[57168.116125] BTRFS warning (device sdb1): failed to load free space cache for block group 488045019136, rebuilding it now
[57168.116212] BTRFS warning (device sdb1): failed to load free space cache for block group 489118760960, rebuilding it now
[57168.116300] BTRFS warning (device sdb1): failed to load free space cache for block group 490192502784, rebuilding it now
[57168.116389] BTRFS warning (device sdb1): failed to load free space cache for block group 491266244608, rebuilding it now
[57168.116471] BTRFS warning (device sdb1): failed to load free space cache for block group 492339986432, rebuilding it now
[57168.116563] BTRFS warning (device sdb1): failed to load free space cache for block group 493413728256, rebuilding it now
[57168.116678] BTRFS warning (device sdb1): failed to load free space cache for block group 494487470080, rebuilding it now
[57168.116771] BTRFS warning (device sdb1): failed to load free space cache for block group 495561211904, rebuilding it now
[57168.116858] BTRFS warning (device sdb1): failed to load free space cache for block group 496634953728, rebuilding it now
[57168.116941] BTRFS warning (device sdb1): failed to load free space cache for block group 497708695552, rebuilding it now
[57168.117027] BTRFS warning (device sdb1): failed to load free space cache for block group 498782437376, rebuilding it now
[57168.128139] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128151] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128156] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128159] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128163] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128166] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128169] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128172] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128176] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128179] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128182] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128185] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128188] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128192] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128195] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128199] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128202] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128205] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128208] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128212] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128215] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128218] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128221] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128225] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128229] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128240] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128248] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128258] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128267] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128276] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128286] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128296] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128305] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128313] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128322] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128331] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128340] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128352] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128361] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128371] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128379] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128388] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128400] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128408] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128416] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128424] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128432] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128449] scsi 6:0:0:0: rejecting I/O to dead device
[57168.128459] BTRFS: error (device sdb1) in btrfs_commit_transaction:2227: errno=-5 IO failure (Error while writing out transaction)
[57168.128463] BTRFS info (device sdb1): forced readonly
[57168.128465] ------------[ cut here ]------------
[57168.128502] WARNING: CPU: 0 PID: 2379 at /build/linux-9r9Ph5/linux-4.9.18/fs/btrfs/transaction.c:1850 cleanup_transaction+0x1f0/0x2e0 [btrfs]
[57168.128510] BTRFS: Transaction aborted (error -5)
[57168.128511] Modules linked in: fuse sdhci_pci sdhci jmb38x_ms mmc_core memstick xt_CHECKSUM iptable_mangle ipt_REJECT nf_reject_ipv4 xt_tcpudp ebtable_filter ebtables ip6table_filter ip6_tables nfsv3 nfs_acl xt_addrtype xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter nf_nat nf_conntrack bridge stp llc aufs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache tun binfmt_misc arc4 iwldvm mac80211 snd_hda_codec_hdmi intel_powerclamp uvcvideo coretemp videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core kvm_intel videodev snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel iTCO_wdt snd_hda_codec kvm dell_wmi iTCO_vendor_support sparse_keymap media iwlwifi snd_hda_core dell_laptop snd_hwdep
[57168.128608]  snd_pcm_oss irqbypass cfg80211 snd_mixer_oss intel_cstate snd_pcm dell_smbios intel_uncore snd_timer dcdbas joydev evdev pcspkr mei_me video serio_raw sg wmi snd i7core_edac edac_core rfkill lpc_ich mfd_core mei soundcore button ac dell_smo8800 shpchp battery acpi_cpufreq tpm_tis tpm_tis_core tpm nvidia_drm(POE) drm_kms_helper drm nvidia_modeset(POE) nvidia(POE) parport_pc sunrpc ppdev lp parport ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto ecb glue_helper lrw gf128mul ablk_helper cryptd aes_x86_64 mbcache btrfs crc32c_generic xor raid6_pq dm_mod sr_mod cdrom sd_mod uas usb_storage ahci libahci xhci_pci xhci_hcd libata ehci_pci ehci_hcd psmouse crc32c_intel scsi_mod usbcore r8169 i2c_i801 i2c_smbus mii usb_common thermal fan
[57168.128681] CPU: 0 PID: 2379 Comm: btrfs-transacti Tainted: P           OE   4.9.0-2-amd64 #1 Debian 4.9.18-1
[57168.128683] Hardware name: Dell Inc. XPS L701X   /0T105W, BIOS A07 12/24/2010
[57168.128685]  0000000000000000 ffffffff83b28714 ffffab3f06017d58 0000000000000000
[57168.128689]  ffffffff83876e9e ffff8b649b9da528 ffffab3f06017db0 ffff8b648a3bd320
[57168.128695]  00000000fffffffb ffff8b67d9bb4100 0000000000000000 ffffffff83876f1f
[57168.128700] Call Trace:
[57168.128708]  [<ffffffff83b28714>] ? dump_stack+0x5c/0x78
[57168.128713]  [<ffffffff83876e9e>] ? __warn+0xbe/0xe0
[57168.128716]  [<ffffffff83876f1f>] ? warn_slowpath_fmt+0x5f/0x80
[57168.128718]  [<ffffffff83b2e983>] ? ___ratelimit+0xa3/0xf0
[57168.128748]  [<ffffffffc02e6310>] ? cleanup_transaction+0x1f0/0x2e0 [btrfs]
[57168.128752]  [<ffffffff838b8d00>] ? prepare_to_wait_event+0xf0/0xf0
[57168.128781]  [<ffffffffc02e7d68>] ? btrfs_commit_transaction+0x298/0xa10 [btrfs]
[57168.128809]  [<ffffffffc02e8576>] ? start_transaction+0x96/0x480 [btrfs]
[57168.128834]  [<ffffffffc02e29ac>] ? transaction_kthread+0x1dc/0x200 [btrfs]
[57168.128862]  [<ffffffffc02e27d0>] ? btrfs_cleanup_transaction+0x580/0x580 [btrfs]
[57168.128867]  [<ffffffff838965ce>] ? kthread+0xce/0xf0
[57168.128871]  [<ffffffff83824701>] ? __switch_to+0x2c1/0x6c0
[57168.128874]  [<ffffffff83896500>] ? kthread_park+0x60/0x60
[57168.128878]  [<ffffffff83dfb2f5>] ? ret_from_fork+0x25/0x30
[57168.128880] ---[ end trace 791e7db30a7cb691 ]---
[57168.128883] BTRFS: error (device sdb1) in cleanup_transaction:1850: errno=-5 IO failure
[57168.128887] BTRFS info (device sdb1): delayed_refs has NO entry
[58508.440104] CE: hpet6 increased min_delta_ns to 20115 nsec
[59259.276647] CE: hpet5 increased min_delta_ns to 20115 nsec
[60908.738878] usb 2-1: USB disconnect, device number 2
[64205.696040] usb 4-1.2: new high-speed USB device number 4 using ehci-pci
[64205.885758] usb 4-1.2: New USB device found, idVendor=4971, idProduct=8032
[64205.885762] usb 4-1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[64205.885765] usb 4-1.2: Product: ev All-Terrain Case USB
[64205.885767] usb 4-1.2: Manufacturer: HGST
[64205.885772] usb 4-1.2: SerialNumber: EB0151700314
[64205.886677] scsi host7: uas
[64205.887706] scsi 7:0:0:0: Direct-Access     ev All-  Terrain Case USB 2103 PQ: 0 ANSI: 6
[64205.944929] sd 7:0:0:0: Attached scsi generic sg2 type 0
[64205.946467] sd 7:0:0:0: [sdc] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
[64205.946471] sd 7:0:0:0: [sdc] 4096-byte physical blocks
[64205.947467] sd 7:0:0:0: [sdc] Write Protect is off
[64205.947472] sd 7:0:0:0: [sdc] Mode Sense: 53 00 00 08
[64205.948801] sd 7:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[64205.992983]  sdc: sdc1
[64205.995462] sd 7:0:0:0: [sdc] Attached SCSI disk
[64521.173370] BTRFS info (device sdb1): disk space caching is enabled
[64521.173374] btrfs_printk: 50 callbacks suppressed
[64521.173376] BTRFS error (device sdb1): Remounting read-write after error is not allowed
[64667.233156] BTRFS info (device sdb1): disk space caching is enabled
[64667.233160] BTRFS error (device sdb1): Remounting read-write after error is not allowed




* Re: Can't remount a BTRFS partition read write after a drive failure
  2017-05-16 12:56 Can't remount a BTRFS partition read write after a drive failure Sylvain Leroux
@ 2017-05-17  4:10 ` Chris Murphy
  2017-05-17  9:22 ` Duncan
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Chris Murphy @ 2017-05-17  4:10 UTC (permalink / raw)
  To: Sylvain Leroux; +Cc: Btrfs BTRFS

On Tue, May 16, 2017 at 6:56 AM, Sylvain Leroux <sylvain@chicoree.fr> wrote:
> Hi,
> I'm investigating BTRFS using an external USB HDD on a Linux Debian
> Stretch/Sid system.
>
> The drive is not reliable. And I noticed when there is an error and the
> USB device appears to be dead to the kernel, I am later unable to
> remount rw the drive. I can mount it read only though.
>
> This seems to be a systematic behavior. And it occasionally happens when
> the computer wake up from sleep and the drive is still attached.
>
> Power cycling the disk do not change anything, but restarting the
> computer "solves" the issue.
>
>
> I believe this may be caused by BTRFS having issues since the kernel
> assign a different device name to the drive when it bring it back
> inline? Or BTRFS didn't realize the "original" drive has gone away?
>
>
> Attached are the dmesg output corresponding to my issue.
> Initially, the drive was mounted rw and associated to the /dev/sdb
> device, the btrfs partition being /dev/sdb1
> After the failure, I power cycled the drive, and the kernel brought it
> back as /dev/sdc
>
> I can mount /dev/sdc1 read only.
> But I'm unable to mount it read-write. Interestingly, the message in
> dmesg still mention /dev/sdb1 as the device whereas it should be /dev/sdc1.
>
>
>
> sylvain@bulbizarre:~$ uname -a
> Linux bulbizarre 4.9.0-2-amd64 #1 SMP Debian 4.9.18-1 (2017-03-30)
> x86_64 GNU/Linux
> sylvain@bulbizarre:~$ btrfs --version
> btrfs-progs v4.7.3
> sylvain@bulbizarre:~$ sudo btrfs fi show
> Label: 'G-Drive'  uuid: 9465dad3-6604-4437-883e-f66386d69ac8
>     Total devices 1 FS bytes used 503.34GiB
>     devid    1 size 931.51GiB used 507.02GiB path /dev/sdc1
>
>
>
> Any clue to help me mounting back the disk without having to restart the
> system would be greatly appreciated!
> Regards,
> - Sylvain.


Seems normal to me. The device is seriously misbehaving, and Btrfs gets
confused. It goes read-only to avoid causing more problems by issuing
writes, based on that confusion, to a misbehaving and unreliable
drive.


-- 
Chris Murphy


* Re: Can't remount a BTRFS partition read write after a drive failure
  2017-05-16 12:56 Can't remount a BTRFS partition read write after a drive failure Sylvain Leroux
  2017-05-17  4:10 ` Chris Murphy
@ 2017-05-17  9:22 ` Duncan
  2017-05-17 15:21 ` Ivan Sizov
       [not found] ` <CAMG9ccz555xia-FpqGsoy-uzYXcyYwG6H0EC18O2AC9yLUuBpg@mail.gmail.com>
  3 siblings, 0 replies; 7+ messages in thread
From: Duncan @ 2017-05-17  9:22 UTC (permalink / raw)
  To: linux-btrfs

Sylvain Leroux posted on Tue, 16 May 2017 14:56:37 +0200 as excerpted:

> I'm investigating BTRFS using an external USB HDD on a Linux Debian
> Stretch/Sid system.
> 
> The drive is not reliable. And I noticed when there is an error and the
> USB device appears to be dead to the kernel, I am later unable to
> remount rw the drive. I can mount it read only though.
> 
> This seems to be a systematic behavior. And it occasionally happens when
> the computer wake up from sleep and the drive is still attached.
> 
> Power cycling the disk do not change anything, but restarting the
> computer "solves" the issue.
> 
> 
> I believe this may be caused by BTRFS having issues since the kernel
> assign a different device name to the drive when it bring it back
> inline? Or BTRFS didn't realize the "original" drive has gone away?

> Initially, the drive was mounted rw and associated to the /dev/sdb
> device, the btrfs partition being /dev/sdb1 After the failure, I power
> cycled the drive, and the kernel brought it back as /dev/sdc
> 
> I can mount /dev/sdc1 read only.
> But I'm unable to mount it read-write. Interestingly, the message in
> dmesg still mention /dev/sdb1 as the device whereas it should be
> /dev/sdc1.
> 
> sylvain@bulbizarre:~$ uname -[r]
> 4.9.0-2-amd64

This is a known issue.  Btrfs doesn't yet properly track devices and is 
thus unaware that the old device (/dev/sdb1) has gone away.  It does see 
the new device (/dev/sdc1, after btrfs device scan, which udev normally 
runs automatically when a new device appears), and can thus mount it, but 
because it still thinks the old device is there as well, as Chris Murphy 
says, it gets confused, and to be safe, only allows mounting read-only.
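
The name mismatch described above can be spotted mechanically in a saved
kernel log. What follows is only a minimal sketch in Python: the regular
expressions are modelled on the dmesg lines quoted in this thread, the
function name stale_btrfs_devices is hypothetical, and the heuristic (compare
each BTRFS message against the most recently attached disk) only makes sense
on a test box with a single hot-plugged drive.

```python
import re

# Assumed message formats, copied from the dmesg excerpt in this thread.
BTRFS_RE = re.compile(r"BTRFS (?:info|warning|error) \(device (\w+)\)")
ATTACH_RE = re.compile(r"sd [\d:]+ \[(\w+)\] Attached SCSI disk")


def stale_btrfs_devices(log_text):
    """Return partitions named in BTRFS messages that do not belong to
    the disk most recently attached before that message (heuristic)."""
    last_disk = None
    stale = set()
    for line in log_text.splitlines():
        attached = ATTACH_RE.search(line)
        if attached:
            # Remember the newest disk name, e.g. "sdc".
            last_disk = attached.group(1)
            continue
        btrfs = BTRFS_RE.search(line)
        if btrfs and last_disk and not btrfs.group(1).startswith(last_disk):
            # e.g. message says "sdb1" although the disk came back as "sdc".
            stale.add(btrfs.group(1))
    return sorted(stale)
```

Fed the log attached to the original message, this would report sdb1, since
the BTRFS lines after the reattach still name the old partition.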


There's a patch set in the wings that makes btrfs properly device aware, 
allowing it to track disappearing devices and act accordingly, as a 
prerequisite to the hot-spares feature which the patch set introduces, 
but that patch set is tied up waiting for a different patch series (IIRC 
a change in the device-flush handling, I'm not a dev and haven't tracked 
the specifics), so it could be a while, 4.13 at absolute minimum, since 
4.12 is the current dev kernel series.

Meanwhile, the history of USB connection flakiness and problems in 
general means btrfs is not generally recommended for USB attached 
devices, at least until those above mentioned patches go in.  For some 
people on specific hardware it works... until it doesn't and we get the 
reports here.  But there are enough of those reports that we simply don't 
recommend btrfs, if the device(s) hosting the filesystem are going to be 
USB-attached.

FWIW, direct SATA connections (eSATA for external) seem to be a better 
choice.  Or choose a different filesystem that's more stable and mature 
(btrfs is still stabilizing, not yet fully stable and mature), and proven 
ready to handle such issues in a better way.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Can't remount a BTRFS partition read write after a drive failure
  2017-05-16 12:56 Can't remount a BTRFS partition read write after a drive failure Sylvain Leroux
  2017-05-17  4:10 ` Chris Murphy
  2017-05-17  9:22 ` Duncan
@ 2017-05-17 15:21 ` Ivan Sizov
       [not found] ` <CAMG9ccz555xia-FpqGsoy-uzYXcyYwG6H0EC18O2AC9yLUuBpg@mail.gmail.com>
  3 siblings, 0 replies; 7+ messages in thread
From: Ivan Sizov @ 2017-05-17 15:21 UTC (permalink / raw)
  To: Sylvain Leroux; +Cc: Btrfs BTRFS

2017-05-16 15:56 GMT+03:00 Sylvain Leroux <sylvain@chicoree.fr>:
>
>
> The drive is not reliable. And I noticed when there is an error and the
> USB device appears to be dead to the kernel, I am later unable to
> remount rw the drive. I can mount it read only though.
>
> This seems to be a systematic behavior. And it occasionally happens when
> the computer wake up from sleep and the drive is still attached.
>
> Power cycling the disk do not change anything, but restarting the
> computer "solves" the issue.

(Maybe off-topic) It seems your disk's USB-SATA controller is almost
dead. You shouldn't keep using it over USB because this leads to data
corruption. Detach the HDD from its case and plug it directly into a SATA
port, or replace the controller.

--
Ivan Sizov


* Re: Can't remount a BTRFS partition read write after a drive failure
       [not found] ` <CAMG9ccz555xia-FpqGsoy-uzYXcyYwG6H0EC18O2AC9yLUuBpg@mail.gmail.com>
@ 2017-05-18  7:22   ` Sylvain Leroux
  2017-05-19  2:23     ` Duncan
  0 siblings, 1 reply; 7+ messages in thread
From: Sylvain Leroux @ 2017-05-18  7:22 UTC (permalink / raw)
  To: Ivan Sizov <sivan606@gmail.com>;Chris Murphy; +Cc: Btrfs BTRFS

On 05/17/2017 09:19 AM, Ivan Sizov wrote:
> > The drive is not reliable. And I noticed when there is an error and the
> > USB device appears to be dead to the kernel, I am later unable to
> > remount rw the drive. I can mount it read only though.
> > This seems to be a systematic behavior. And it occasionally happens when
> > the computer wake up from sleep and the drive is still attached.
> > Power cycling the disk do not change anything, but restarting the
> > computer "solves" the issue.
>
> (Maybe off-topic) It seems your disk's USB-SATA controller is almost
> dead. You shouldn't keep using it over USB because this leads to data
> corruption. Detach the HDD from its case and plug it directly into a SATA
> port, or replace the controller.


Thank you Chris, Ivan, for your answers.

I understand the drive appears dead to the kernel and the safest
solution is to remount the drive read-only.

But...

To give you more details about my particular use case, we are
investigating the resilience of various FS to hardware failures. The
disk is (presumably) working but we are using a modified USB cable to
produce bus errors on purpose.

If I understand correctly, when we switch the cable to "faulty mode", the
kernel detects USB errors or something similar, considers the device dead,
and tries to reset the bus. At that point, the drive is remounted ro.

However, when we switch the cable back to "normal mode", we are unable
to force a rw remount of the drive, even if we replace our cable with a
genuine one and/or power cycle the drive. BTRFS just refuses to
remount that drive rw. FWIW, BTRFS is the only filesystem we've tested
that treats a faulty drive as _definitively_ faulty, without any way for
the administrator to override that.
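
For a test harness like the one described above, one way to record when the
filesystem flips to ro (and whether a later remount rw took effect) is to
read the mount options from /proc/mounts. This is just a sketch in Python:
the helper names and the /mnt/gdrive mount point are hypothetical, and only
the standard whitespace-separated /proc/mounts layout (device, mount point,
type, options, ...) is assumed.

```python
def mount_options(mounts_text, mount_point):
    """Return the option list of the last entry for mount_point,
    or None if it is not mounted.

    Note: /proc/mounts escapes spaces in paths as \\040; this sketch
    does not decode them.
    """
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # Later entries shadow earlier ones on the same mount point.
            options = fields[3].split(",")
    return options


def is_read_only(mounts_text, mount_point):
    """True if the mount point is currently mounted with the 'ro' option."""
    opts = mount_options(mounts_text, mount_point)
    if opts is None:
        raise LookupError(f"{mount_point} is not mounted")
    return "ro" in opts


# Typical use on a live system (hypothetical mount point):
#   with open("/proc/mounts") as f:
#       print(is_read_only(f.read(), "/mnt/gdrive"))
```

Polling this before and after each cable switch gives a timestamped record
of the ro/rw state that can be compared across the filesystems under test.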


Here we are in a very special use case. But I think we would see
similar behavior if some drive case or cable were dying and the
administrator replaced it but was then unable to remount the drive rw
after having fixed the problem. Or did I miss something?


-- 
-- Sylvain Leroux
-- sylvain@chicoree.fr
-- http://www.chicoree.fr




* Re: Can't remount a BTRFS partition read write after a drive failure
  2017-05-18  7:22   ` Sylvain Leroux
@ 2017-05-19  2:23     ` Duncan
  2017-05-20  9:11       ` Sylvain Leroux
  0 siblings, 1 reply; 7+ messages in thread
From: Duncan @ 2017-05-19  2:23 UTC (permalink / raw)
  To: linux-btrfs

Sylvain Leroux posted on Thu, 18 May 2017 09:22:55 +0200 as excerpted:

> Thank you Chris, Ivan, for your answers.

> Here we are in a very special use case. But I think we would see a
> similar behavior if some drive case or cable was dying, the
> administrator replaced it, but was unable to remount rw the drive after
> having fixed the problem. Or did I miss something?

What you seem to have missed is my earlier reply, saying, effectively...

Known issue.  Btrfs currently doesn't really have a concept of devices 
coming and going while it's mounted.  If a device disappears, btrfs will 
continue to try to write to it until there are enough errors to kick it 
into read-only mode.  Even then, it continues to be part of the 
filesystem.

So when the kernel kicks the device and it reappears as another device, 
btrfs is still tracking the first, now non-existent device, for that 
filesystem (which is tracked by UUID, so it will be considered the same 
filesystem even when appearing as a different device, because the UUID is 
the same), and will attempt to add the second device to it (which is 
logical because unlike most filesystems, btrfs can actually /be/ multi-
device).

But the filesystem properties still say only one device, so btrfs knows 
there's /something/ wrong and won't mount the filesystem at its new 
device location except read-only, to prevent further damage due to the 
confusion.
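
To make the sequence above concrete, here is a toy model in Python. It
follows the explanation as given in this message only; it is not btrfs code,
and the class and method names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFilesystem:
    """Toy model of the behaviour described above; not actual btrfs code."""
    uuid: str
    total_devices: int  # device count recorded in the filesystem itself
    tracked: list = field(default_factory=list)  # device paths seen so far

    def scan(self, path):
        # A device carrying the same UUID is attached to the same
        # filesystem; a path that has disappeared is never dropped.
        if path not in self.tracked:
            self.tracked.append(path)

    def allowed_mount_modes(self):
        # More tracked devices than the filesystem claims to have means
        # something is wrong, so only read-only is permitted.
        if len(self.tracked) > self.total_devices:
            return {"ro"}
        return {"ro", "rw"}


fs = ToyFilesystem(uuid="9465dad3-6604-4437-883e-f66386d69ac8", total_devices=1)
fs.scan("/dev/sdb1")   # initial mount
fs.scan("/dev/sdc1")   # same drive after the power cycle
print(fs.allowed_mount_modes())
```

With both paths tracked against a filesystem that records a single device,
the model only offers ro, which mirrors the situation in the original report.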

There are patches in the pipeline to give btrfs the sense of dynamically 
appearing and disappearing devices that it currently lacks, but they're 
in a patch set (the btrfs global hot-spare feature patch set, which 
obviously requires btrfs to know when a device is gone and needs to be 
replaced by the hot-spare) that has been held up waiting for a different 
patch set that itself hasn't been integrated yet.

So try again in a few kernel cycles...

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Can't remount a BTRFS partition read write after a drive failure
  2017-05-19  2:23     ` Duncan
@ 2017-05-20  9:11       ` Sylvain Leroux
  0 siblings, 0 replies; 7+ messages in thread
From: Sylvain Leroux @ 2017-05-20  9:11 UTC (permalink / raw)
  Cc: Btrfs BTRFS

On 05/19/2017 04:23 AM, Duncan wrote:
> What you seem to have missed is my earlier reply, saying,
> effectively...
>
> Known issue. [...]

Indeed I missed that reply. Sorry about that.

Thank you Duncan for your reply, for the great explanations AND for
having taken the time to repost it!


> So try again in a few kernel cycles...

We will do exactly as you say.

-- 
-- Sylvain Leroux
-- sylvain@chicoree.fr
-- http://www.chicoree.fr



