* Question: raid1 behaviour on failure
@ 2016-04-18 5:06 Matthias Bodenbinder
2016-04-18 7:22 ` Qu Wenruo
0 siblings, 1 reply; 32+ messages in thread
From: Matthias Bodenbinder @ 2016-04-18 5:06 UTC
To: linux-btrfs
Hi,
I have a raid1 with 3 drives: 698, 465 and 232 GB. I copied 1.7 GB of data to that raid1, balanced the filesystem and then removed the biggest drive (hotplug).
The data was still available. Then I copied the /root directory to the raid1. It showed up via ls -l. Then I plugged the missing hard drive back in (hotplug). After a few seconds, "btrfs fi show" gave its usual output:
Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
Total devices 3 FS bytes used 1.60GiB
devid 1 size 698.64GiB used 4.03GiB path /dev/sdg
devid 2 size 465.76GiB used 4.03GiB path /dev/sdh
devid 3 size 232.88GiB used 0.00B path /dev/sdi
The /root is still showing up, but the raid1 is now mounted in *read-only* mode.
I unmounted it and mounted it again. Now the /root directory on the raid1 is no longer available. It's gone.
I guess I missed some important step to recover the degraded raid1 before unmounting it.
What is it that I missed?
Matthias
* Re: Question: raid1 behaviour on failure
2016-04-18 5:06 Question: raid1 behaviour on failure Matthias Bodenbinder
@ 2016-04-18 7:22 ` Qu Wenruo
2016-04-20 5:17 ` Matthias Bodenbinder
0 siblings, 1 reply; 32+ messages in thread
From: Qu Wenruo @ 2016-04-18 7:22 UTC
To: Matthias Bodenbinder, linux-btrfs
I'm not quite sure about the raid1 behavior.
But your "hotplug" seems to be the problem.
IIRC Btrfs is known to have problems with re-appearing devices.
If the hot-removed device is fully wiped before being re-plugged, it should
not cause the RO mount (transaction abort).
BTW, it would be better to post the dmesg output for easier debugging.
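A minimal sketch of one way to pull the relevant kernel messages out of the log before posting them. The helper name filter_btrfs is made up for this example, and the case-insensitive match on "btrfs" is only an assumption about which lines are worth posting; the full log may still be needed.

```shell
# Keep only Btrfs-related lines from kernel log text read on stdin.
# filter_btrfs is a hypothetical helper name, not part of btrfs-progs.
filter_btrfs() {
    grep -i 'btrfs'
}

# Typical use on a live system (reading the kernel log may need root):
#   dmesg | filter_btrfs
#   journalctl -k | filter_btrfs
```

Lines from other subsystems (USB, SCSI) are dropped by this filter, so it is only a first pass.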
Hopefully someone else can give a better explanation of this.
Thanks,
Qu
Matthias Bodenbinder wrote on 2016/04/18 07:06 +0200:
> Hi,
>
> I have a raid1 with 3 drives: 698, 465 and 232 GB. I copied 1.7 GB of data to that raid1, balanced the filesystem and then removed the biggest drive (hotplug).
>
> The data was still available. Then I copied the /root directory to the raid1. It showed up via ls -l. Then I plugged the missing hard drive back in (hotplug). After a few seconds, "btrfs fi show" gave its usual output:
>
> Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
> Total devices 3 FS bytes used 1.60GiB
> devid 1 size 698.64GiB used 4.03GiB path /dev/sdg
> devid 2 size 465.76GiB used 4.03GiB path /dev/sdh
> devid 3 size 232.88GiB used 0.00B path /dev/sdi
>
> The /root is still showing up, but the raid1 is now mounted in *read-only* mode.
>
> I unmounted it and mounted it again. Now the /root directory on the raid1 is no longer available. It's gone.
>
> I guess I missed some important step to recover the degraded raid1 before unmounting it.
>
> What is it that I missed?
>
> Matthias
>
* Re: Question: raid1 behaviour on failure
2016-04-18 7:22 ` Qu Wenruo
@ 2016-04-20 5:17 ` Matthias Bodenbinder
2016-04-20 7:25 ` Qu Wenruo
` (2 more replies)
0 siblings, 3 replies; 32+ messages in thread
From: Matthias Bodenbinder @ 2016-04-20 5:17 UTC
To: linux-btrfs
On 18.04.2016 at 09:22, Qu Wenruo wrote:
> BTW, it would be better to post the dmesg output for easier debugging.
So here we go. I ran the same test again. Here is a full log of what I did. It looks like a bug in btrfs to me.
Sequence of events:
1. mount the raid1 (2 discs of different sizes)
2. unplug the biggest drive (hotplug)
3. try to copy something to the degraded raid1
4. plug in the device again (hotplug)
This scenario does not work. The disc array is NOT redundant! I cannot work with it while a drive is missing, and I cannot reattach the device so that everything works again.
The btrfs module crashes during the test.
I am using LMDE2 with backports:
btrfs-tools 4.4-1~bpo8+1
linux-image-4.4.0-0.bpo.1-amd64
Matthias
rakete - root - /root
1# mount /mnt/raid1/
Journal:
Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): enabling auto defrag
Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): disk space caching is enabled
Apr 20 07:01:16 rakete kernel: BTRFS: has skinny extents
rakete - root - /mnt/raid1
3# ll
total 0
drwxrwxr-x 1 root root 36 Nov 14 2014 AfterShot2(64-bit)
drwxrwxr-x 1 root root 5082 Apr 17 09:06 etc
drwxr-xr-x 1 root root 108 Mar 24 07:31 var
4# btrfs fi show
Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
Total devices 3 FS bytes used 1.60GiB
devid 1 size 698.64GiB used 3.03GiB path /dev/sdg
devid 2 size 465.76GiB used 3.03GiB path /dev/sdh
devid 3 size 232.88GiB used 0.00B path /dev/sdi
####
unplug device sdg:
Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical block 243826688, lost sync page write
Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating journal superblock for sdf1-8.
Apr 20 07:03:05 rakete kernel: Aborting journal on device sdf1-8.
Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical block 243826688, lost sync page write
Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating journal superblock for sdf1-8.
Apr 20 07:03:05 rakete umount[16405]: umount: /mnt/raid1: target is busy
Apr 20 07:03:05 rakete umount[16405]: (In some cases useful info about processes that
Apr 20 07:03:05 rakete umount[16405]: use the device is found by lsof(8) or fuser(1).)
Apr 20 07:03:05 rakete systemd[1]: mnt-raid1.mount mount process exited, code=exited status=32
Apr 20 07:03:05 rakete systemd[1]: Failed unmounting /mnt/raid1.
Apr 20 07:03:24 rakete kernel: usb 3-1: new SuperSpeed USB device number 3 using xhci_hcd
Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device found, idVendor=152d, idProduct=0567
Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device strings: Mfr=10, Product=11, SerialNumber=5
Apr 20 07:03:24 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
Apr 20 07:03:24 rakete kernel: usb 3-1: Manufacturer: JMicron
Apr 20 07:03:24 rakete kernel: usb 3-1: SerialNumber: 152D00539000
Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage device detected
Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid 152d pid 0567: 5000000
Apr 20 07:03:24 rakete kernel: scsi host9: usb-storage 3-1:1.0
Apr 20 07:03:24 rakete mtp-probe[16424]: checking bus 3, device 3: "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
Apr 20 07:03:24 rakete mtp-probe[16424]: bus: 3, device: 3 was not an MTP device
Apr 20 07:03:25 rakete kernel: scsi 9:0:0:0: Direct-Access WDC WD20 02FAEX-007BA0 0125 PQ: 0 ANSI: 6
Apr 20 07:03:25 rakete kernel: scsi 9:0:0:1: Direct-Access WDC WD50 01AALS-00L3B2 0125 PQ: 0 ANSI: 6
Apr 20 07:03:25 rakete kernel: scsi 9:0:0:2: Direct-Access SAMSUNG SP2504C 0125 PQ: 0 ANSI: 6
Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: Attached scsi generic sg6 type 0
Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: Attached scsi generic sg7 type 0
Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Write Protect is off
Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Mode Sense: 67 00 10 08
Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: Attached scsi generic sg8 type 0
Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] 976773168 512-byte logical blocks: (500 GB/466 GiB)
Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] No Caching mode page found
Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Assuming drive cache: write through
Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Write Protect is off
Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Mode Sense: 67 00 10 08
Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] 488395055 512-byte logical blocks: (250 GB/233 GiB)
Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] No Caching mode page found
Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Assuming drive cache: write through
Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Write Protect is off
Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Mode Sense: 67 00 10 08
Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] No Caching mode page found
Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Assuming drive cache: write through
Apr 20 07:03:25 rakete kernel: sdf: sdf1
Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Attached SCSI disk
Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Attached SCSI disk
Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Attached SCSI disk
Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): recovery complete
Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): mounted filesystem with ordered data mode. Opts: (null)
Apr 20 07:03:25 rakete udisksd[3671]: Error statting /dev/sdg: No such file or directory
####
5# btrfs fi show
Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
Total devices 3 FS bytes used 1.60GiB
devid 2 size 465.76GiB used 3.03GiB path /dev/sdj
devid 3 size 232.88GiB used 0.00B path /dev/sdk
*** Some devices missing
####
still mounted in rw mode:
/dev/sdg on /mnt/raid1 type btrfs (rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
####
7# cp -r /root/ .
cp: cannot create directory './root': Input/output error
Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 5, flush 0, corrupt 0, gen 0
Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 6, flush 0, corrupt 0, gen 0
Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 7, flush 0, corrupt 0, gen 0
Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 8, flush 0, corrupt 0, gen 0
Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 9, flush 0, corrupt 0, gen 0
Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 10, flush 0, corrupt 0, gen 0
Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): error reading free space cache
Apr 20 07:05:37 rakete kernel: BTRFS warning (device sdi): failed to load free space cache for block group 20497563648, rebuilding it now
Apr 20 07:05:37 rakete kernel: ------------[ cut here ]------------
Apr 20 07:05:37 rakete kernel: WARNING: CPU: 7 PID: 16738 at /build/linux-H3jpF0/linux-4.4.6/fs/btrfs/ctree.c:1156 __btrfs_cow_block+0x56f/0x5e0 [btrfs]()
Apr 20 07:05:37 rakete kernel: BTRFS: Transaction aborted (error -5)
Apr 20 07:05:37 rakete kernel: Modules linked in: uas usb_storage pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) binfmt_misc dvb_ttpci saa7146_vv ttpci_eeprom saa7146 videobuf_dma_sg videobuf_core dvb_core v4l2_common videodev media cfg80211 vboxdrv(O) cpufreq_powersave cpufreq_conservative cpufreq_userspace cpufreq_stats snd_hda_codec_hdmi intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul eeepc_wmi asus_wmi joydev sparse_keymap drbg iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek rfkill ansi_cprng snd_hda_codec_generic nvidia(PO) aesni_intel aes_x86_64 lrw gf128mul snd_hda_intel glue_helper ablk_helper snd_hda_codec cryptd snd_hda_core serio_raw pcspkr snd_hwdep snd_pcm i2c_i801 snd_timer snd lpc_ich soundcore 8250_fintek mei_me shpchp mei
Apr 20 07:05:37 rakete kernel: mfd_core battery tpm_tis tpm evdev processor drm fuse ecryptfs cbc sha256_ssse3 sha256_generic hmac encrypted_keys parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache jbd2 btrfs raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic usbhid hid raid6_pq libcrc32c crc32c_generic md_mod dm_mirror dm_region_hash dm_log dm_mod sr_mod sg cdrom sd_mod ata_generic ahci libahci pata_via xhci_pci ehci_pci crc32c_intel xhci_hcd ehci_hcd libata psmouse scsi_mod atl1c usbcore usb_common fjes video wmi fan thermal button
Apr 20 07:05:37 rakete kernel: CPU: 7 PID: 16738 Comm: cp Tainted: P O 4.4.0-0.bpo.1-amd64 #1 Debian 4.4.6-1~bpo8+1
Apr 20 07:05:37 rakete kernel: Hardware name: System manufacturer System Product Name/P8H67-V, BIOS 3707 07/12/2013
Apr 20 07:05:37 rakete kernel: 0000000000000286 000000006a1407c8 ffffffff812ed425 ffff88016b6dfb90
Apr 20 07:05:37 rakete kernel: ffffffffa03817b8 ffffffff81077ea1 ffff88018e7fcd30 ffff88016b6dfbe8
Apr 20 07:05:37 rakete kernel: ffff88005d863e88 ffff8801cde7a980 ffff88018e7fce48 ffffffff81077f2c
Apr 20 07:05:37 rakete kernel: Call Trace:
Apr 20 07:05:37 rakete kernel: [<ffffffff812ed425>] ? dump_stack+0x5c/0x77
Apr 20 07:05:37 rakete kernel: [<ffffffff81077ea1>] ? warn_slowpath_common+0x81/0xb0
Apr 20 07:05:37 rakete kernel: [<ffffffff81077f2c>] ? warn_slowpath_fmt+0x5c/0x80
Apr 20 07:05:37 rakete kernel: [<ffffffffa02d74af>] ? __btrfs_cow_block+0x56f/0x5e0 [btrfs]
Apr 20 07:05:37 rakete kernel: [<ffffffffa02d76af>] ? btrfs_cow_block+0x10f/0x1d0 [btrfs]
Apr 20 07:05:37 rakete kernel: [<ffffffffa02db2cd>] ? btrfs_search_slot+0x1fd/0xa30 [btrfs]
Apr 20 07:05:37 rakete kernel: [<ffffffffa02dd3f1>] ? btrfs_insert_empty_items+0x71/0xc0 [btrfs]
Apr 20 07:05:37 rakete kernel: [<ffffffff811f4d92>] ? insert_inode_locked4+0xa2/0x1c0
Apr 20 07:05:37 rakete kernel: [<ffffffffa030ee5d>] ? btrfs_new_inode+0x1cd/0x590 [btrfs]
Apr 20 07:05:37 rakete kernel: [<ffffffffa0310a77>] ? btrfs_mkdir+0x107/0x1f0 [btrfs]
Apr 20 07:05:37 rakete kernel: [<ffffffff811e80b0>] ? vfs_mkdir+0xb0/0x140
Apr 20 07:05:37 rakete kernel: [<ffffffff811e9d3e>] ? SyS_mkdir+0xce/0x110
Apr 20 07:05:37 rakete kernel: [<ffffffff81592736>] ? system_call_fast_compare_end+0xc/0x6b
Apr 20 07:05:37 rakete kernel: ---[ end trace 025eb0e83ffed96f ]---
Apr 20 07:05:37 rakete kernel: BTRFS: error (device sdi) in __btrfs_cow_block:1156: errno=-5 IO failure
Apr 20 07:05:37 rakete kernel: BTRFS info (device sdi): forced readonly
####
Try to copy again:
11# cp -r /root/ .
cp: cannot create directory './root': Read-only file system
####
/dev/sdg on /mnt/raid1 type btrfs (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
####
plug in device sdg again:
Apr 20 07:07:39 rakete udisksd[3671]: Cleaning up mount point /media/matthias/BACKUP (device 8:81 no longer exist)
Apr 20 07:07:39 rakete kernel: usb 3-1: USB disconnect, device number 3
Apr 20 07:07:39 rakete udisksd[3671]: Error statting /dev/sdg: No such file or directory
Apr 20 07:07:39 rakete umount[16807]: umount: /mnt/raid1: target is busy
Apr 20 07:07:39 rakete umount[16807]: (In some cases useful info about processes that
Apr 20 07:07:39 rakete umount[16807]: use the device is found by lsof(8) or fuser(1).)
Apr 20 07:07:39 rakete systemd[1]: mnt-raid1.mount mount process exited, code=exited status=32
Apr 20 07:07:39 rakete systemd[1]: Failed unmounting /mnt/raid1.
Apr 20 07:08:01 rakete kernel: usb 3-1: new SuperSpeed USB device number 4 using xhci_hcd
Apr 20 07:08:01 rakete kernel: usb 3-1: New USB device found, idVendor=152d, idProduct=0567
Apr 20 07:08:01 rakete kernel: usb 3-1: New USB device strings: Mfr=10, Product=11, SerialNumber=5
Apr 20 07:08:01 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
Apr 20 07:08:01 rakete kernel: usb 3-1: Manufacturer: JMicron
Apr 20 07:08:01 rakete kernel: usb 3-1: SerialNumber: 152D00539000
Apr 20 07:08:01 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage device detected
Apr 20 07:08:01 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid 152d pid 0567: 5000000
Apr 20 07:08:01 rakete kernel: scsi host10: usb-storage 3-1:1.0
Apr 20 07:08:01 rakete mtp-probe[16826]: checking bus 3, device 4: "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
Apr 20 07:08:01 rakete mtp-probe[16826]: bus: 3, device: 4 was not an MTP device
Apr 20 07:08:02 rakete kernel: scsi 10:0:0:0: Direct-Access WDC WD20 02FAEX-007BA0 0125 PQ: 0 ANSI: 6
Apr 20 07:08:02 rakete kernel: scsi 10:0:0:1: Direct-Access WDC WD75 00AACS-00C7B0 0125 PQ: 0 ANSI: 6
Apr 20 07:08:02 rakete kernel: scsi 10:0:0:2: Direct-Access WDC WD50 01AALS-00L3B2 0125 PQ: 0 ANSI: 6
Apr 20 07:08:02 rakete kernel: scsi 10:0:0:3: Direct-Access SAMSUNG SP2504C 0125 PQ: 0 ANSI: 6
Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: Attached scsi generic sg6 type 0
Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Write Protect is off
Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Mode Sense: 67 00 10 08
Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: Attached scsi generic sg7 type 0
Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] No Caching mode page found
Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Assuming drive cache: write through
Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] 1465149168 512-byte logical blocks: (750 GB/699 GiB)
Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: Attached scsi generic sg8 type 0
Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Write Protect is off
Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Mode Sense: 67 00 10 08
Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: Attached scsi generic sg9 type 0
Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] 976773168 512-byte logical blocks: (500 GB/466 GiB)
Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] No Caching mode page found
Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Assuming drive cache: write through
Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Write Protect is off
Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Mode Sense: 67 00 10 08
Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] 488395055 512-byte logical blocks: (250 GB/233 GiB)
Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] No Caching mode page found
Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Assuming drive cache: write through
Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Write Protect is off
Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Mode Sense: 67 00 10 08
Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] No Caching mode page found
Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Assuming drive cache: write through
Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Attached SCSI disk
Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Attached SCSI disk
Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Attached SCSI disk
Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Attached SCSI disk
Apr 20 07:08:02 rakete kernel: EXT4-fs (sdf1): recovery complete
Apr 20 07:08:02 rakete kernel: EXT4-fs (sdf1): mounted filesystem with ordered data mode. Opts: (null)
####
still ro mode
/dev/sdj on /mnt/raid1 type btrfs (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
####
14# btrfs fi show
Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
Total devices 3 FS bytes used 1.60GiB
devid 1 size 698.64GiB used 3.03GiB path /dev/sdj
devid 2 size 465.76GiB used 3.03GiB path /dev/sdk
devid 3 size 232.88GiB used 0.00B path /dev/sdl
####
* Re: Question: raid1 behaviour on failure
2016-04-20 5:17 ` Matthias Bodenbinder
@ 2016-04-20 7:25 ` Qu Wenruo
2016-04-21 5:22 ` Matthias Bodenbinder
2016-04-20 13:32 ` Anand Jain
2016-04-21 6:23 ` Satoru Takeuchi
2 siblings, 1 reply; 32+ messages in thread
From: Qu Wenruo @ 2016-04-20 7:25 UTC
To: Matthias Bodenbinder, linux-btrfs
Matthias Bodenbinder wrote on 2016/04/20 07:17 +0200:
> On 18.04.2016 at 09:22, Qu Wenruo wrote:
>> BTW, it would be better to post the dmesg output for easier debugging.
>
> So here we go. I ran the same test again. Here is a full log of what I did. It looks like a bug in btrfs to me.
> Sequence of events:
> 1. mount the raid1 (2 discs of different sizes)
> 2. unplug the biggest drive (hotplug)
> 3. try to copy something to the degraded raid1
> 4. plug in the device again (hotplug)
>
> This scenario does not work. The disc array is NOT redundant! I cannot work with it while a drive is missing, and I cannot reattach the device so that everything works again.
>
> The btrfs module crashes during the test.
>
> I am using LMDE2 with backports:
> btrfs-tools 4.4-1~bpo8+1
> linux-image-4.4.0-0.bpo.1-amd64
>
> Matthias
>
>
> rakete - root - /root
> 1# mount /mnt/raid1/
>
> Journal:
>
> Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): enabling auto defrag
> Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): disk space caching is enabled
> Apr 20 07:01:16 rakete kernel: BTRFS: has skinny extents
>
> rakete - root - /mnt/raid1
> 3# ll
> total 0
> drwxrwxr-x 1 root root 36 Nov 14 2014 AfterShot2(64-bit)
> drwxrwxr-x 1 root root 5082 Apr 17 09:06 etc
> drwxr-xr-x 1 root root 108 Mar 24 07:31 var
>
> 4# btrfs fi show
> Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
> Total devices 3 FS bytes used 1.60GiB
> devid 1 size 698.64GiB used 3.03GiB path /dev/sdg
> devid 2 size 465.76GiB used 3.03GiB path /dev/sdh
> devid 3 size 232.88GiB used 0.00B path /dev/sdi
>
> ####
> unplug device sdg:
>
> Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical block 243826688, lost sync page write
> Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating journal superblock for sdf1-8.
> Apr 20 07:03:05 rakete kernel: Aborting journal on device sdf1-8.
> Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical block 243826688, lost sync page write
> Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating journal superblock for sdf1-8.
> Apr 20 07:03:05 rakete umount[16405]: umount: /mnt/raid1: target is busy
> Apr 20 07:03:05 rakete umount[16405]: (In some cases useful info about processes that
> Apr 20 07:03:05 rakete umount[16405]: use the device is found by lsof(8) or fuser(1).)
> Apr 20 07:03:05 rakete systemd[1]: mnt-raid1.mount mount process exited, code=exited status=32
> Apr 20 07:03:05 rakete systemd[1]: Failed unmounting /mnt/raid1.
> Apr 20 07:03:24 rakete kernel: usb 3-1: new SuperSpeed USB device number 3 using xhci_hcd
> Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device found, idVendor=152d, idProduct=0567
> Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device strings: Mfr=10, Product=11, SerialNumber=5
> Apr 20 07:03:24 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
> Apr 20 07:03:24 rakete kernel: usb 3-1: Manufacturer: JMicron
> Apr 20 07:03:24 rakete kernel: usb 3-1: SerialNumber: 152D00539000
> Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage device detected
> Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid 152d pid 0567: 5000000
> Apr 20 07:03:24 rakete kernel: scsi host9: usb-storage 3-1:1.0
> Apr 20 07:03:24 rakete mtp-probe[16424]: checking bus 3, device 3: "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
> Apr 20 07:03:24 rakete mtp-probe[16424]: bus: 3, device: 3 was not an MTP device
> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:0: Direct-Access WDC WD20 02FAEX-007BA0 0125 PQ: 0 ANSI: 6
> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:1: Direct-Access WDC WD50 01AALS-00L3B2 0125 PQ: 0 ANSI: 6
> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:2: Direct-Access SAMSUNG SP2504C 0125 PQ: 0 ANSI: 6
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: Attached scsi generic sg6 type 0
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: Attached scsi generic sg7 type 0
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Write Protect is off
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Mode Sense: 67 00 10 08
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: Attached scsi generic sg8 type 0
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] 976773168 512-byte logical blocks: (500 GB/466 GiB)
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] No Caching mode page found
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Assuming drive cache: write through
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Write Protect is off
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Mode Sense: 67 00 10 08
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] 488395055 512-byte logical blocks: (250 GB/233 GiB)
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] No Caching mode page found
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Assuming drive cache: write through
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Write Protect is off
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Mode Sense: 67 00 10 08
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] No Caching mode page found
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Assuming drive cache: write through
> Apr 20 07:03:25 rakete kernel: sdf: sdf1
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Attached SCSI disk
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Attached SCSI disk
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Attached SCSI disk
> Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): recovery complete
> Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): mounted filesystem with ordered data mode. Opts: (null)
> Apr 20 07:03:25 rakete udisksd[3671]: Error statting /dev/sdg: No such file or directory
>
>
> ####
> 5# btrfs fi show
> Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
> Total devices 3 FS bytes used 1.60GiB
> devid 2 size 465.76GiB used 3.03GiB path /dev/sdj
> devid 3 size 232.88GiB used 0.00B path /dev/sdk
> *** Some devices missing
> ####
> still mounted in rw mode:
> /dev/sdg on /mnt/raid1 type btrfs (rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
> ####
Unfortunately, this is the designed behavior.
The fs is still rw only because it hasn't hit any critical problem yet.
If you try to touch a file and then sync the fs, btrfs will become RO
immediately.
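The touch-and-sync check could be sketched roughly as follows. mount_mode is a hypothetical helper that reads /proc/mounts-style text on stdin and prints whether a given mount point is currently ro or rw; the probe file name in the usage comment is also made up. Note that /proc/mounts uses a different layout from the "mount" output quoted in this thread.

```shell
# Print "ro" or "rw" for the mount point given as $1, reading
# /proc/mounts-style lines (device mountpoint fstype options ...) on stdin.
# mount_mode is a hypothetical helper name.
mount_mode() {
    awk -v mp="$1" '
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro" || opts[i] == "rw")
                    mode = opts[i]
        }
        END { if (mode != "") print mode }
    '
}

# Usage on the degraded filesystem (run as root):
#   touch /mnt/raid1/probe; sync
#   mount_mode /mnt/raid1 < /proc/mounts
```

If the same mount point appears more than once, the last matching line wins, which corresponds to the most recent mount.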
> 7# cp -r /root/ .
> cp: cannot create directory './root': Input/output error
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 5, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 6, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 7, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 8, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 9, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 10, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): error reading free space cache
> Apr 20 07:05:37 rakete kernel: BTRFS warning (device sdi): failed to load free space cache for block group 20497563648, rebuilding it now
> Apr 20 07:05:37 rakete kernel: ------------[ cut here ]------------
> Apr 20 07:05:37 rakete kernel: WARNING: CPU: 7 PID: 16738 at /build/linux-H3jpF0/linux-4.4.6/fs/btrfs/ctree.c:1156 __btrfs_cow_block+0x56f/0x5e0 [btrfs]()
> Apr 20 07:05:37 rakete kernel: BTRFS: Transaction aborted (error -5)
> Apr 20 07:05:37 rakete kernel: Modules linked in: uas usb_storage pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) binfmt_misc dvb_ttpci saa7146_vv ttpci_eeprom saa7146 videobuf_dma_sg videobuf_core dvb_core v4l2_common videodev media cfg80211 vboxdrv(O) cpufreq_powersave cpufreq_conservative cpufreq_userspace cpufreq_stats snd_hda_codec_hdmi intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul eeepc_wmi asus_wmi joydev sparse_keymap drbg iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek rfkill ansi_cprng snd_hda_codec_generic nvidia(PO) aesni_intel aes_x86_64 lrw gf128mul snd_hda_intel glue_helper ablk_helper snd_hda_codec cryptd snd_hda_core serio_raw pcspkr snd_hwdep snd_pcm i2c_i801 snd_timer snd lpc_ich soundcore 8250_fintek mei_me shpchp mei
> Apr 20 07:05:37 rakete kernel: mfd_core battery tpm_tis tpm evdev processor drm fuse ecryptfs cbc sha256_ssse3 sha256_generic hmac encrypted_keys parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache jbd2 btrfs raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic usbhid hid raid6_pq libcrc32c crc32c_generic md_mod dm_mirror dm_region_hash dm_log dm_mod sr_mod sg cdrom sd_mod ata_generic ahci libahci pata_via xhci_pci ehci_pci crc32c_intel xhci_hcd ehci_hcd libata psmouse scsi_mod atl1c usbcore usb_common fjes video wmi fan thermal button
> Apr 20 07:05:37 rakete kernel: CPU: 7 PID: 16738 Comm: cp Tainted: P O 4.4.0-0.bpo.1-amd64 #1 Debian 4.4.6-1~bpo8+1
> Apr 20 07:05:37 rakete kernel: Hardware name: System manufacturer System Product Name/P8H67-V, BIOS 3707 07/12/2013
> Apr 20 07:05:37 rakete kernel: 0000000000000286 000000006a1407c8 ffffffff812ed425 ffff88016b6dfb90
> Apr 20 07:05:37 rakete kernel: ffffffffa03817b8 ffffffff81077ea1 ffff88018e7fcd30 ffff88016b6dfbe8
> Apr 20 07:05:37 rakete kernel: ffff88005d863e88 ffff8801cde7a980 ffff88018e7fce48 ffffffff81077f2c
> Apr 20 07:05:37 rakete kernel: Call Trace:
> Apr 20 07:05:37 rakete kernel: [<ffffffff812ed425>] ? dump_stack+0x5c/0x77
> Apr 20 07:05:37 rakete kernel: [<ffffffff81077ea1>] ? warn_slowpath_common+0x81/0xb0
> Apr 20 07:05:37 rakete kernel: [<ffffffff81077f2c>] ? warn_slowpath_fmt+0x5c/0x80
> Apr 20 07:05:37 rakete kernel: [<ffffffffa02d74af>] ? __btrfs_cow_block+0x56f/0x5e0 [btrfs]
> Apr 20 07:05:37 rakete kernel: [<ffffffffa02d76af>] ? btrfs_cow_block+0x10f/0x1d0 [btrfs]
> Apr 20 07:05:37 rakete kernel: [<ffffffffa02db2cd>] ? btrfs_search_slot+0x1fd/0xa30 [btrfs]
> Apr 20 07:05:37 rakete kernel: [<ffffffffa02dd3f1>] ? btrfs_insert_empty_items+0x71/0xc0 [btrfs]
> Apr 20 07:05:37 rakete kernel: [<ffffffff811f4d92>] ? insert_inode_locked4+0xa2/0x1c0
> Apr 20 07:05:37 rakete kernel: [<ffffffffa030ee5d>] ? btrfs_new_inode+0x1cd/0x590 [btrfs]
> Apr 20 07:05:37 rakete kernel: [<ffffffffa0310a77>] ? btrfs_mkdir+0x107/0x1f0 [btrfs]
> Apr 20 07:05:37 rakete kernel: [<ffffffff811e80b0>] ? vfs_mkdir+0xb0/0x140
> Apr 20 07:05:37 rakete kernel: [<ffffffff811e9d3e>] ? SyS_mkdir+0xce/0x110
> Apr 20 07:05:37 rakete kernel: [<ffffffff81592736>] ? system_call_fast_compare_end+0xc/0x6b
> Apr 20 07:05:37 rakete kernel: ---[ end trace 025eb0e83ffed96f ]---
> Apr 20 07:05:37 rakete kernel: BTRFS: error (device sdi) in __btrfs_cow_block:1156: errno=-5 IO failure
> Apr 20 07:05:37 rakete kernel: BTRFS info (device sdi): forced readonly
Btrfs fails to read the space cache and also fails to create the new directory.
The failure in cow_block during mkdir is critical, so btrfs becomes RO.
All expected behavior so far.
You may try the degraded mount option, but AFAIK it may not handle a case
like yours.
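A rough sketch of how the degraded mount could be attempted, with no promise that it copes with this hotplug case. needs_degraded is a hypothetical helper that looks for the "Some devices missing" marker printed by "btrfs fi show" earlier in this thread; the device and mount point in the usage comment are taken from the logs and will differ after a re-plug.

```shell
# Succeed if `btrfs filesystem show` output on stdin reports a missing device.
# needs_degraded is a hypothetical helper name, not part of btrfs-progs.
needs_degraded() {
    grep 'Some devices missing' > /dev/null
}

# Usage (run as root, with the filesystem unmounted first):
#   if btrfs filesystem show /dev/sdh | needs_degraded; then
#       mount -o degraded /dev/sdh /mnt/raid1
#   fi
```

The helper only inspects text, so it can be tried against saved "btrfs fi show" output before touching the real devices.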
Thanks,
Qu
>
> ####
> Try to copy again:
> 11# cp -r /root/ .
> cp: cannot create directory './root': Read-only file system
> ####
> /dev/sdg on /mnt/raid1 type btrfs (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
> ####
> plug in device sdg again:
>
> Apr 20 07:07:39 rakete udisksd[3671]: Cleaning up mount point /media/matthias/BACKUP (device 8:81 no longer exist)
> Apr 20 07:07:39 rakete kernel: usb 3-1: USB disconnect, device number 3
> Apr 20 07:07:39 rakete udisksd[3671]: Error statting /dev/sdg: No such file or directory
> Apr 20 07:07:39 rakete umount[16807]: umount: /mnt/raid1: target is busy
> Apr 20 07:07:39 rakete umount[16807]: (In some cases useful info about processes that
> Apr 20 07:07:39 rakete umount[16807]: use the device is found by lsof(8) or fuser(1).)
> Apr 20 07:07:39 rakete systemd[1]: mnt-raid1.mount mount process exited, code=exited status=32
> Apr 20 07:07:39 rakete systemd[1]: Failed unmounting /mnt/raid1.
> Apr 20 07:08:01 rakete kernel: usb 3-1: new SuperSpeed USB device number 4 using xhci_hcd
> Apr 20 07:08:01 rakete kernel: usb 3-1: New USB device found, idVendor=152d, idProduct=0567
> Apr 20 07:08:01 rakete kernel: usb 3-1: New USB device strings: Mfr=10, Product=11, SerialNumber=5
> Apr 20 07:08:01 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
> Apr 20 07:08:01 rakete kernel: usb 3-1: Manufacturer: JMicron
> Apr 20 07:08:01 rakete kernel: usb 3-1: SerialNumber: 152D00539000
> Apr 20 07:08:01 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage device detected
> Apr 20 07:08:01 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid 152d pid 0567: 5000000
> Apr 20 07:08:01 rakete kernel: scsi host10: usb-storage 3-1:1.0
> Apr 20 07:08:01 rakete mtp-probe[16826]: checking bus 3, device 4: "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
> Apr 20 07:08:01 rakete mtp-probe[16826]: bus: 3, device: 4 was not an MTP device
> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:0: Direct-Access WDC WD20 02FAEX-007BA0 0125 PQ: 0 ANSI: 6
> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:1: Direct-Access WDC WD75 00AACS-00C7B0 0125 PQ: 0 ANSI: 6
> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:2: Direct-Access WDC WD50 01AALS-00L3B2 0125 PQ: 0 ANSI: 6
> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:3: Direct-Access SAMSUNG SP2504C 0125 PQ: 0 ANSI: 6
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: Attached scsi generic sg6 type 0
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Write Protect is off
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Mode Sense: 67 00 10 08
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: Attached scsi generic sg7 type 0
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] No Caching mode page found
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Assuming drive cache: write through
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] 1465149168 512-byte logical blocks: (750 GB/699 GiB)
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: Attached scsi generic sg8 type 0
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Write Protect is off
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Mode Sense: 67 00 10 08
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: Attached scsi generic sg9 type 0
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] 976773168 512-byte logical blocks: (500 GB/466 GiB)
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] No Caching mode page found
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Assuming drive cache: write through
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Write Protect is off
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Mode Sense: 67 00 10 08
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] 488395055 512-byte logical blocks: (250 GB/233 GiB)
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] No Caching mode page found
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Assuming drive cache: write through
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Write Protect is off
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Mode Sense: 67 00 10 08
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] No Caching mode page found
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Assuming drive cache: write through
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Attached SCSI disk
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Attached SCSI disk
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Attached SCSI disk
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Attached SCSI disk
> Apr 20 07:08:02 rakete kernel: EXT4-fs (sdf1): recovery complete
> Apr 20 07:08:02 rakete kernel: EXT4-fs (sdf1): mounted filesystem with ordered data mode. Opts: (null)
>
> ####
> still ro mode
> /dev/sdj on /mnt/raid1 type btrfs (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
> ####
> 14# btrfs fi show
> Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
> Total devices 3 FS bytes used 1.60GiB
> devid 1 size 698.64GiB used 3.03GiB path /dev/sdj
> devid 2 size 465.76GiB used 3.03GiB path /dev/sdk
> devid 3 size 232.88GiB used 0.00B path /dev/sdl
> ####
>
>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Question: raid1 behaviour on failure
2016-04-20 5:17 ` Matthias Bodenbinder
2016-04-20 7:25 ` Qu Wenruo
@ 2016-04-20 13:32 ` Anand Jain
2016-04-21 5:15 ` Matthias Bodenbinder
2016-04-21 6:23 ` Satoru Takeuchi
2 siblings, 1 reply; 32+ messages in thread
From: Anand Jain @ 2016-04-20 13:32 UTC (permalink / raw)
To: Matthias Bodenbinder, linux-btrfs
> 1. mount the raid1 (2 disc with different size)
> 2. unplug the biggest drive (hotplug)
Btrfs won't know that you have unplugged a disk.
Though it experiences IO failures, it won't close the bdev.
> 3. try to copy something to the degraded raid1
This will work as long as you do _not_ run umount/mount.
However, once you umount/mount you won't be able to mount,
even with the -o degraded option. (There are some workaround
patches on the ML.)
> 4. plugin the device again (hotplug)
This is a bad test case.
- Since btrfs didn't close the device at #2 above, the block
layer will create a new device instance and path when you plug
the device in again.
Btrfs will then promptly scan the device and update its
records. But note that it is still using the old bdev, so
you will continue to see the IO errors, and no IO will go
to the new device instance.
As a first step, there are patches on the ML under test which
force-close the device upon losing access to it.
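The effect of holding on to the old bdev can be illustrated with a loose userspace analogy. This assumes a POSIX system and is not how btrfs or the block layer is implemented; unlike the unplugged disk, the old file here stays writable. The point is only that a process which opened a path keeps talking to the old object even after the path is removed and recreated. The path name `disk` and the function `stale_handle_demo` are hypothetical.

```python
import os
import tempfile


def stale_handle_demo():
    """Show that an already-open handle keeps pointing at the old object
    after the path it was opened from is removed and recreated."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "disk")      # hypothetical stand-in for a device node
        old = open(path, "w+")                # the "filesystem" opens the device
        old_ino = os.fstat(old.fileno()).st_ino

        os.unlink(path)                       # the device goes away
        with open(path, "w") as fresh:        # it reappears as a new instance
            fresh.write("new instance\n")
        new_ino = os.stat(path).st_ino

        # IO through the old handle never reaches the new instance.
        old.write("written through the old handle\n")
        old.flush()
        with open(path) as reader:
            visible = reader.read()
        old.close()
        return old_ino, new_ino, visible


old_ino, new_ino, visible = stale_handle_demo()
print(old_ino != new_ino, repr(visible))
```

In the analogy, only closing the old handle and reopening the path would make the process use the new instance, which is roughly what the force-close patches mentioned above aim at for the bdev.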
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Question: raid1 behaviour on failure
2016-04-20 13:32 ` Anand Jain
@ 2016-04-21 5:15 ` Matthias Bodenbinder
2016-04-21 7:19 ` Anand Jain
0 siblings, 1 reply; 32+ messages in thread
From: Matthias Bodenbinder @ 2016-04-21 5:15 UTC (permalink / raw)
To: linux-btrfs
On 20.04.2016 at 15:32, Anand Jain wrote:
>> 1. mount the raid1 (2 disc with different size)
>
>> 2. unplug the biggest drive (hotplug)
>
> Btrfs won't know that you have plugged-out a disk.
> Though it experiences IO failures, it won't close the bdev.
Well, as far as I can tell, mdadm can handle this use case. I tested that. I have an mdadm raid5 running. I accidentally unplugged a SATA cable from one of the devices and the raid still worked. I did not even notice that the cable was unplugged until a few hours later. Then I plugged in the cable again and that was it. mdadm recovered the raid5 without any problem. -> This is redundancy!
>
>> 3. try to copy something to the degraded raid1
>
> This will work as long as you do _not_ run unmount/mount.
I did not umount the raid1 when I tried to copy something. As you can see from the sequence of events: I removed the drive and immediately afterwards tried to copy something to the degraded array. This copy failed with a crash of the btrfs module. -> This is NOT redundancy.
The umount and mount operations came afterwards.
In a nutshell, I have to say that the btrfs behaviour is by no means compliant with my understanding of redundancy.
Matthias
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Question: raid1 behaviour on failure
2016-04-20 7:25 ` Qu Wenruo
@ 2016-04-21 5:22 ` Matthias Bodenbinder
2016-04-21 5:43 ` Qu Wenruo
0 siblings, 1 reply; 32+ messages in thread
From: Matthias Bodenbinder @ 2016-04-21 5:22 UTC (permalink / raw)
To: linux-btrfs
On 20.04.2016 at 09:25, Qu Wenruo wrote:
>
> Unfortunately, this is the designed behavior.
>
> The fs is rw just because it doesn't hit any critical problem.
>
> If you try to touch a file and then sync the fs, btrfs will become RO immediately.
>
....
> Btrfs fails to read space cache, nor make a new dir.
>
> The failure on cow_block in mkdir is ciritical, and btrfs become RO.
>
> All expected behavior so far.
>
> You may try use degraded mount option, but AFAIK it may not handle case like yours.
This really scares me. "Expected behaviour"?
So you are saying: if one of the drives in the raid1 dies without btrfs noticing, the redundancy is lost.
Let's say the power unit of a disc dies. This disc will disappear from the raid1 pretty much as suddenly as in my test case here. No difference.
You are saying that in this case btrfs should behave exactly like this? If that is the case, I may need to rethink my interpretation of redundancy.
Matthias
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Question: raid1 behaviour on failure
2016-04-21 5:22 ` Matthias Bodenbinder
@ 2016-04-21 5:43 ` Qu Wenruo
2016-04-21 6:02 ` Liu Bo
2016-04-21 17:40 ` Matthias Bodenbinder
0 siblings, 2 replies; 32+ messages in thread
From: Qu Wenruo @ 2016-04-21 5:43 UTC (permalink / raw)
To: Matthias Bodenbinder, linux-btrfs
Matthias Bodenbinder wrote on 2016/04/21 07:22 +0200:
> Am 20.04.2016 um 09:25 schrieb Qu Wenruo:
>
>>
>> Unfortunately, this is the designed behavior.
>>
>> The fs is rw just because it doesn't hit any critical problem.
>>
>> If you try to touch a file and then sync the fs, btrfs will become RO immediately.
>>
> ....
>
>> Btrfs fails to read space cache, nor make a new dir.
>>
>> The failure on cow_block in mkdir is ciritical, and btrfs become RO.
>>
>> All expected behavior so far.
>>
>> You may try use degraded mount option, but AFAIK it may not handle case like yours.
>
> This really scares me. "Expected bevahour"?
> So you are saying: If one of the drives in the raid1 is going dead without noticing btrfs, the redundancy is lost.
>
> Lets say, the power unit of a disc is going dead. This disc will disappear from the raid1 pretty much as suddenly as in my test case here. No difference.
>
> You are saying that in this case, btrfs should exactly behave like this? If that is the case I eventually need to rethink my interpretation of redundancy.
>
> Matthias
>
The "expected behavior" just means that aborting the transaction on a
critical error is expected.
And you should know that btrfs is not doing full block-level RAID1; it's
doing RAID at the chunk level.
That requires considering more things than full block-level RAID1, but
it's also more flexible than block-level RAID1.
(For example, you can use 3 devices with different sizes for btrfs
RAID1 and get more available space than with mdadm raid1.)
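As a rough worked example of the size claim, here is a small Python sketch. It assumes the commonly cited rule that a two-copy chunk-level RAID1 can use min(total/2, total - largest device), and that a block-level mirror across all members is limited by its smallest member. The device sizes are those from `btrfs fi show` in this thread; the function names are hypothetical.

```python
def chunk_raid1_usable(sizes):
    """Usable space when every chunk is stored on two different devices.

    Assumed rule: min(total / 2, total - largest device).
    """
    total = sum(sizes)
    return min(total / 2, total - max(sizes))


def full_mirror_usable(sizes):
    """Usable space when every member holds a complete copy:
    limited by the smallest member."""
    return min(sizes)


# Device sizes in GiB, taken from `btrfs fi show` in this thread.
sizes_gib = [698.64, 465.76, 232.88]

print(round(chunk_raid1_usable(sizes_gib), 2))
print(round(full_mirror_usable(sizes_gib), 2))
```

With these three devices the two smaller ones together are exactly as large as the biggest one, so under the assumed rule the chunk-level result equals the size of the largest device, while a mirror over all three members would be capped by the 232.88 GiB disk.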
You may think the behavior is totally insane for btrfs RAID1, but don't
forget that btrfs can have different metadata/data profiles.
(And there is even a plan already to support different profiles for
different subvolumes.)
If your metadata is RAID1, your data can still be RAID0, and in
that case a missing device can still cause huge problems.
There are already unmerged patches which partly implement the mdadm-level
behavior, such as automatically switching to degraded mode without making
the fs RO.
The original patchset:
http://comments.gmane.org/gmane.comp.file-systems.btrfs/48335
Or the latest patchset inside Anand Jain's auto-replace patchset:
http://thread.gmane.org/gmane.comp.file-systems.btrfs/55446
Thanks,
Qu
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Question: raid1 behaviour on failure
2016-04-21 5:43 ` Qu Wenruo
@ 2016-04-21 6:02 ` Liu Bo
2016-04-21 6:09 ` Qu Wenruo
2016-04-21 17:40 ` Matthias Bodenbinder
1 sibling, 1 reply; 32+ messages in thread
From: Liu Bo @ 2016-04-21 6:02 UTC (permalink / raw)
To: Qu Wenruo; +Cc: Matthias Bodenbinder, linux-btrfs
On Thu, Apr 21, 2016 at 01:43:56PM +0800, Qu Wenruo wrote:
>
>
> Matthias Bodenbinder wrote on 2016/04/21 07:22 +0200:
> >Am 20.04.2016 um 09:25 schrieb Qu Wenruo:
> >
> >>
> >>Unfortunately, this is the designed behavior.
> >>
> >>The fs is rw just because it doesn't hit any critical problem.
> >>
> >>If you try to touch a file and then sync the fs, btrfs will become RO immediately.
> >>
> >....
> >
> >>Btrfs fails to read space cache, nor make a new dir.
> >>
> >>The failure on cow_block in mkdir is ciritical, and btrfs become RO.
> >>
> >>All expected behavior so far.
> >>
> >>You may try use degraded mount option, but AFAIK it may not handle case like yours.
> >
> >This really scares me. "Expected bevahour"?
> >So you are saying: If one of the drives in the raid1 is going dead without noticing btrfs, the redundancy is lost.
> >
> >Lets say, the power unit of a disc is going dead. This disc will disappear from the raid1 pretty much as suddenly as in my test case here. No difference.
> >
> >You are saying that in this case, btrfs should exactly behave like this? If that is the case I eventually need to rethink my interpretation of redundancy.
> >
> >Matthias
> >
>
> The "expected behavior" just means the abort transaction behavior for
> critical error is expected.
>
> And you should know, btrfs is not doing full block level RAID1, it's doing
> RAID at chunk level.
> Which needs to consider more things than full block level RAID1, and it's
> more flex than block level raid1.
> (For example, you can use 3 devices with different sizes to do btrfs RAID1
> and get more available size than mdadm raid1)
>
> You may think the behavior is totally insane for btrfs RAID1, but don't
> forget, btrfs can have different metdata/data profile.
> (And even more, there is already plan to support different profile for
> different subvolumes)
>
> In case your metadata is RAID1, your data can still be RAID0, and in that
> case a missing devices can still cause huge problem.
From a user's point of view, what you're saying is more of an excuse and
kind of irrelevant. Please stop doing that and try to fix the insane behavior instead.
Thanks,
-liubo
>
> There are already unmerged patches which will partly do the mdadm level
> behavior, like automatically change to degraded mode without making the fs
> RO.
>
> The original patchset:
> http://comments.gmane.org/gmane.comp.file-systems.btrfs/48335
>
> Or the latest patchset inside Anand Jain's auto-replace patchset:
> http://thread.gmane.org/gmane.comp.file-systems.btrfs/55446
>
> Thanks,
> Qu
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Question: raid1 behaviour on failure
2016-04-21 6:02 ` Liu Bo
@ 2016-04-21 6:09 ` Qu Wenruo
0 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2016-04-21 6:09 UTC (permalink / raw)
To: bo.li.liu; +Cc: Matthias Bodenbinder, linux-btrfs
Liu Bo wrote on 2016/04/20 23:02 -0700:
> On Thu, Apr 21, 2016 at 01:43:56PM +0800, Qu Wenruo wrote:
>>
>>
>> Matthias Bodenbinder wrote on 2016/04/21 07:22 +0200:
>>> Am 20.04.2016 um 09:25 schrieb Qu Wenruo:
>>>
>>>>
>>>> Unfortunately, this is the designed behavior.
>>>>
>>>> The fs is rw just because it doesn't hit any critical problem.
>>>>
>>>> If you try to touch a file and then sync the fs, btrfs will become RO immediately.
>>>>
>>> ....
>>>
>>>> Btrfs fails to read space cache, nor make a new dir.
>>>>
>>>> The failure on cow_block in mkdir is ciritical, and btrfs become RO.
>>>>
>>>> All expected behavior so far.
>>>>
>>>> You may try use degraded mount option, but AFAIK it may not handle case like yours.
>>>
>>> This really scares me. "Expected bevahour"?
>>> So you are saying: If one of the drives in the raid1 is going dead without noticing btrfs, the redundancy is lost.
>>>
>>> Lets say, the power unit of a disc is going dead. This disc will disappear from the raid1 pretty much as suddenly as in my test case here. No difference.
>>>
>>> You are saying that in this case, btrfs should exactly behave like this? If that is the case I eventually need to rethink my interpretation of redundancy.
>>>
>>> Matthias
>>>
>>
>> The "expected behavior" just means the abort transaction behavior for
>> critical error is expected.
>>
>> And you should know, btrfs is not doing full block level RAID1, it's doing
>> RAID at chunk level.
>> Which needs to consider more things than full block level RAID1, and it's
>> more flex than block level raid1.
>> (For example, you can use 3 devices with different sizes to do btrfs RAID1
>> and get more available size than mdadm raid1)
>>
>> You may think the behavior is totally insane for btrfs RAID1, but don't
>> forget, btrfs can have different metdata/data profile.
>> (And even more, there is already plan to support different profile for
>> different subvolumes)
>>
>> In case your metadata is RAID1, your data can still be RAID0, and in that
>> case a missing devices can still cause huge problem.
>
> From an user's point of view, what you're saying is more an excuse and
> kind of irrelavant. Stop doing that please, try to fix the insane behavior instead.
>
> Thanks,
>
> -liubo
Didn't you see that I submitted the first version of the per-chunk
degradable patchset a long time ago to address the problem?
And you should blame the person who is blocking the patchset from
being merged by refusing to split them out.
Thanks,
Qu
>
>>
>> There are already unmerged patches which will partly do the mdadm level
>> behavior, like automatically change to degraded mode without making the fs
>> RO.
>>
>> The original patchset:
>> http://comments.gmane.org/gmane.comp.file-systems.btrfs/48335
>>
>> Or the latest patchset inside Anand Jain's auto-replace patchset:
>> http://thread.gmane.org/gmane.comp.file-systems.btrfs/55446
>>
>> Thanks,
>> Qu
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Question: raid1 behaviour on failure
2016-04-20 5:17 ` Matthias Bodenbinder
2016-04-20 7:25 ` Qu Wenruo
2016-04-20 13:32 ` Anand Jain
@ 2016-04-21 6:23 ` Satoru Takeuchi
2016-04-21 11:09 ` Austin S. Hemmelgarn
` (2 more replies)
2 siblings, 3 replies; 32+ messages in thread
From: Satoru Takeuchi @ 2016-04-21 6:23 UTC (permalink / raw)
To: Matthias Bodenbinder, linux-btrfs
On 2016/04/20 14:17, Matthias Bodenbinder wrote:
> Am 18.04.2016 um 09:22 schrieb Qu Wenruo:
>> BTW, it would be better to post the dmesg for better debug.
>
> So here we. I did the same test again. Here is a full log of what i did. It seems to be mean like a bug in btrfs.
> Sequenz of events:
> 1. mount the raid1 (2 disc with different size)
> 2. unplug the biggest drive (hotplug)
> 3. try to copy something to the degraded raid1
> 4. plugin the device again (hotplug)
>
> This scenario does not work. The disc array is NOT redundant! I can not work with it while a drive is missing and I can not reattach the device so that everything works again.
>
> The btrfs module crashes during the test.
>
> I am using LMDE2 with backports:
> btrfs-tools 4.4-1~bpo8+1
> linux-image-4.4.0-0.bpo.1-amd64
>
> Matthias
>
>
> rakete - root - /root
> 1# mount /mnt/raid1/
>
> Journal:
>
> Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): enabling auto defrag
> Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): disk space caching is enabled
> Apr 20 07:01:16 rakete kernel: BTRFS: has skinny extents
>
> rakete - root - /mnt/raid1
> 3# ll
> total 0
> drwxrwxr-x 1 root root 36 Nov 14 2014 AfterShot2(64-bit)
> drwxrwxr-x 1 root root 5082 Apr 17 09:06 etc
> drwxr-xr-x 1 root root 108 Mär 24 07:31 var
>
> 4# btrfs fi show
> Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
> Total devices 3 FS bytes used 1.60GiB
> devid 1 size 698.64GiB used 3.03GiB path /dev/sdg
> devid 2 size 465.76GiB used 3.03GiB path /dev/sdh
> devid 3 size 232.88GiB used 0.00B path /dev/sdi
>
> ####
> unplug device sdg:
>
> Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical block 243826688, lost sync page write
> Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating journal superblock for sdf1-8.
> Apr 20 07:03:05 rakete kernel: Aborting journal on device sdf1-8.
> Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical block 243826688, lost sync page write
> Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating journal superblock for sdf1-8.
> Apr 20 07:03:05 rakete umount[16405]: umount: /mnt/raid1: target is busy
> Apr 20 07:03:05 rakete umount[16405]: (In some cases useful info about processes that
> Apr 20 07:03:05 rakete umount[16405]: use the device is found by lsof(8) or fuser(1).)
> Apr 20 07:03:05 rakete systemd[1]: mnt-raid1.mount mount process exited, code=exited status=32
> Apr 20 07:03:05 rakete systemd[1]: Failed unmounting /mnt/raid1.
> Apr 20 07:03:24 rakete kernel: usb 3-1: new SuperSpeed USB device number 3 using xhci_hcd
> Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device found, idVendor=152d, idProduct=0567
> Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device strings: Mfr=10, Product=11, SerialNumber=5
> Apr 20 07:03:24 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
> Apr 20 07:03:24 rakete kernel: usb 3-1: Manufacturer: JMicron
> Apr 20 07:03:24 rakete kernel: usb 3-1: SerialNumber: 152D00539000
> Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage device detected
> Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid 152d pid 0567: 5000000
> Apr 20 07:03:24 rakete kernel: scsi host9: usb-storage 3-1:1.0
> Apr 20 07:03:24 rakete mtp-probe[16424]: checking bus 3, device 3: "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
> Apr 20 07:03:24 rakete mtp-probe[16424]: bus: 3, device: 3 was not an MTP device
> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:0: Direct-Access WDC WD20 02FAEX-007BA0 0125 PQ: 0 ANSI: 6
> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:1: Direct-Access WDC WD50 01AALS-00L3B2 0125 PQ: 0 ANSI: 6
> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:2: Direct-Access SAMSUNG SP2504C 0125 PQ: 0 ANSI: 6
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: Attached scsi generic sg6 type 0
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: Attached scsi generic sg7 type 0
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Write Protect is off
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Mode Sense: 67 00 10 08
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: Attached scsi generic sg8 type 0
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] 976773168 512-byte logical blocks: (500 GB/466 GiB)
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] No Caching mode page found
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Assuming drive cache: write through
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Write Protect is off
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Mode Sense: 67 00 10 08
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] 488395055 512-byte logical blocks: (250 GB/233 GiB)
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] No Caching mode page found
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Assuming drive cache: write through
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Write Protect is off
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Mode Sense: 67 00 10 08
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] No Caching mode page found
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Assuming drive cache: write through
> Apr 20 07:03:25 rakete kernel: sdf: sdf1
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Attached SCSI disk
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Attached SCSI disk
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Attached SCSI disk
> Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): recovery complete
> Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): mounted filesystem with ordered data mode. Opts: (null)
> Apr 20 07:03:25 rakete udisksd[3671]: Error statting /dev/sdg: No such file or directory
>
>
> ####
> 5# btrfs fi show
> Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
> Total devices 3 FS bytes used 1.60GiB
> devid 2 size 465.76GiB used 3.03GiB path /dev/sdj
> devid 3 size 232.88GiB used 0.00B path /dev/sdk
> *** Some devices missing
> ####
Here the names of *online* devices are changed
(/dev/sdh => /dev/sdj, /dev/sdi => /dev/sdk) after just
offlining a device (/dev/sdf). It's odd regardless of
whether Btrfs works fine or not.
Can anyone explain this behavior?
Thanks,
Satoru
> still mounted in rw mode:
> /dev/sdg on /mnt/raid1 type btrfs (rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
> ####
> 7# cp -r /root/ .
> cp: cannot create directory './root': Input/output error
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 5, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 6, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 7, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 8, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 9, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 10, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): error reading free space cache
> Apr 20 07:05:37 rakete kernel: BTRFS warning (device sdi): failed to load free space cache for block group 20497563648, rebuilding it now
> Apr 20 07:05:37 rakete kernel: ------------[ cut here ]------------
> Apr 20 07:05:37 rakete kernel: WARNING: CPU: 7 PID: 16738 at /build/linux-H3jpF0/linux-4.4.6/fs/btrfs/ctree.c:1156 __btrfs_cow_block+0x56f/0x5e0 [btrfs]()
> Apr 20 07:05:37 rakete kernel: BTRFS: Transaction aborted (error -5)
> Apr 20 07:05:37 rakete kernel: Modules linked in: uas usb_storage pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) binfmt_misc dvb_ttpci saa7146_vv ttpci_eeprom saa7146 videobuf_dma_sg videobuf_core dvb_core v4l2_common videodev media cfg80211 vboxdrv(O) cpufreq_powersave cpufreq_conservative cpufreq_userspace cpufreq_stats snd_hda_codec_hdmi intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul eeepc_wmi asus_wmi joydev sparse_keymap drbg iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek rfkill ansi_cprng snd_hda_codec_generic nvidia(PO) aesni_intel aes_x86_64 lrw gf128mul snd_hda_intel glue_helper ablk_helper snd_hda_codec cryptd snd_hda_core serio_raw pcspkr snd_hwdep snd_pcm i2c_i801 snd_timer snd lpc_ich soundcore 8250_fintek mei_me shpchp mei
> Apr 20 07:05:37 rakete kernel: mfd_core battery tpm_tis tpm evdev processor drm fuse ecryptfs cbc sha256_ssse3 sha256_generic hmac encrypted_keys parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache jbd2 btrfs raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic usbhid hid raid6_pq libcrc32c crc32c_generic md_mod dm_mirror dm_region_hash dm_log dm_mod sr_mod sg cdrom sd_mod ata_generic ahci libahci pata_via xhci_pci ehci_pci crc32c_intel xhci_hcd ehci_hcd libata psmouse scsi_mod atl1c usbcore usb_common fjes video wmi fan thermal button
> Apr 20 07:05:37 rakete kernel: CPU: 7 PID: 16738 Comm: cp Tainted: P O 4.4.0-0.bpo.1-amd64 #1 Debian 4.4.6-1~bpo8+1
> Apr 20 07:05:37 rakete kernel: Hardware name: System manufacturer System Product Name/P8H67-V, BIOS 3707 07/12/2013
> Apr 20 07:05:37 rakete kernel: 0000000000000286 000000006a1407c8 ffffffff812ed425 ffff88016b6dfb90
> Apr 20 07:05:37 rakete kernel: ffffffffa03817b8 ffffffff81077ea1 ffff88018e7fcd30 ffff88016b6dfbe8
> Apr 20 07:05:37 rakete kernel: ffff88005d863e88 ffff8801cde7a980 ffff88018e7fce48 ffffffff81077f2c
> Apr 20 07:05:37 rakete kernel: Call Trace:
> Apr 20 07:05:37 rakete kernel: [<ffffffff812ed425>] ? dump_stack+0x5c/0x77
> Apr 20 07:05:37 rakete kernel: [<ffffffff81077ea1>] ? warn_slowpath_common+0x81/0xb0
> Apr 20 07:05:37 rakete kernel: [<ffffffff81077f2c>] ? warn_slowpath_fmt+0x5c/0x80
> Apr 20 07:05:37 rakete kernel: [<ffffffffa02d74af>] ? __btrfs_cow_block+0x56f/0x5e0 [btrfs]
> Apr 20 07:05:37 rakete kernel: [<ffffffffa02d76af>] ? btrfs_cow_block+0x10f/0x1d0 [btrfs]
> Apr 20 07:05:37 rakete kernel: [<ffffffffa02db2cd>] ? btrfs_search_slot+0x1fd/0xa30 [btrfs]
> Apr 20 07:05:37 rakete kernel: [<ffffffffa02dd3f1>] ? btrfs_insert_empty_items+0x71/0xc0 [btrfs]
> Apr 20 07:05:37 rakete kernel: [<ffffffff811f4d92>] ? insert_inode_locked4+0xa2/0x1c0
> Apr 20 07:05:37 rakete kernel: [<ffffffffa030ee5d>] ? btrfs_new_inode+0x1cd/0x590 [btrfs]
> Apr 20 07:05:37 rakete kernel: [<ffffffffa0310a77>] ? btrfs_mkdir+0x107/0x1f0 [btrfs]
> Apr 20 07:05:37 rakete kernel: [<ffffffff811e80b0>] ? vfs_mkdir+0xb0/0x140
> Apr 20 07:05:37 rakete kernel: [<ffffffff811e9d3e>] ? SyS_mkdir+0xce/0x110
> Apr 20 07:05:37 rakete kernel: [<ffffffff81592736>] ? system_call_fast_compare_end+0xc/0x6b
> Apr 20 07:05:37 rakete kernel: ---[ end trace 025eb0e83ffed96f ]---
> Apr 20 07:05:37 rakete kernel: BTRFS: error (device sdi) in __btrfs_cow_block:1156: errno=-5 IO failure
> Apr 20 07:05:37 rakete kernel: BTRFS info (device sdi): forced readonly
>
> ####
> Try to copy again:
> 11# cp -r /root/ .
> cp: cannot create directory './root': Read-only file system
> ####
> /dev/sdg on /mnt/raid1 type btrfs (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
> ####
> plugin device sdg again:
>
> Apr 20 07:07:39 rakete udisksd[3671]: Cleaning up mount point /media/matthias/BACKUP (device 8:81 no longer exist)
> Apr 20 07:07:39 rakete kernel: usb 3-1: USB disconnect, device number 3
> Apr 20 07:07:39 rakete udisksd[3671]: Error statting /dev/sdg: No such file or directory
> Apr 20 07:07:39 rakete umount[16807]: umount: /mnt/raid1: target is busy
> Apr 20 07:07:39 rakete umount[16807]: (In some cases useful info about processes that
> Apr 20 07:07:39 rakete umount[16807]: use the device is found by lsof(8) or fuser(1).)
> Apr 20 07:07:39 rakete systemd[1]: mnt-raid1.mount mount process exited, code=exited status=32
> Apr 20 07:07:39 rakete systemd[1]: Failed unmounting /mnt/raid1.
> Apr 20 07:08:01 rakete kernel: usb 3-1: new SuperSpeed USB device number 4 using xhci_hcd
> Apr 20 07:08:01 rakete kernel: usb 3-1: New USB device found, idVendor=152d, idProduct=0567
> Apr 20 07:08:01 rakete kernel: usb 3-1: New USB device strings: Mfr=10, Product=11, SerialNumber=5
> Apr 20 07:08:01 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
> Apr 20 07:08:01 rakete kernel: usb 3-1: Manufacturer: JMicron
> Apr 20 07:08:01 rakete kernel: usb 3-1: SerialNumber: 152D00539000
> Apr 20 07:08:01 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage device detected
> Apr 20 07:08:01 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid 152d pid 0567: 5000000
> Apr 20 07:08:01 rakete kernel: scsi host10: usb-storage 3-1:1.0
> Apr 20 07:08:01 rakete mtp-probe[16826]: checking bus 3, device 4: "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
> Apr 20 07:08:01 rakete mtp-probe[16826]: bus: 3, device: 4 was not an MTP device
> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:0: Direct-Access WDC WD20 02FAEX-007BA0 0125 PQ: 0 ANSI: 6
> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:1: Direct-Access WDC WD75 00AACS-00C7B0 0125 PQ: 0 ANSI: 6
> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:2: Direct-Access WDC WD50 01AALS-00L3B2 0125 PQ: 0 ANSI: 6
> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:3: Direct-Access SAMSUNG SP2504C 0125 PQ: 0 ANSI: 6
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: Attached scsi generic sg6 type 0
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Write Protect is off
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Mode Sense: 67 00 10 08
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: Attached scsi generic sg7 type 0
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] No Caching mode page found
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Assuming drive cache: write through
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] 1465149168 512-byte logical blocks: (750 GB/699 GiB)
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: Attached scsi generic sg8 type 0
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Write Protect is off
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Mode Sense: 67 00 10 08
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: Attached scsi generic sg9 type 0
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] 976773168 512-byte logical blocks: (500 GB/466 GiB)
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] No Caching mode page found
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Assuming drive cache: write through
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Write Protect is off
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Mode Sense: 67 00 10 08
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] 488395055 512-byte logical blocks: (250 GB/233 GiB)
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] No Caching mode page found
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Assuming drive cache: write through
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Write Protect is off
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Mode Sense: 67 00 10 08
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] No Caching mode page found
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Assuming drive cache: write through
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Attached SCSI disk
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Attached SCSI disk
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Attached SCSI disk
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Attached SCSI disk
> Apr 20 07:08:02 rakete kernel: EXT4-fs (sdf1): recovery complete
> Apr 20 07:08:02 rakete kernel: EXT4-fs (sdf1): mounted filesystem with ordered data mode. Opts: (null)
>
> ####
> still ro mode
> /dev/sdj on /mnt/raid1 type btrfs (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
> ####
> 14# btrfs fi show
> Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
> Total devices 3 FS bytes used 1.60GiB
> devid 1 size 698.64GiB used 3.03GiB path /dev/sdj
> devid 2 size 465.76GiB used 3.03GiB path /dev/sdk
> devid 3 size 232.88GiB used 0.00B path /dev/sdl
> ####
>
>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Question: raid1 behaviour on failure
2016-04-21 5:15 ` Matthias Bodenbinder
@ 2016-04-21 7:19 ` Anand Jain
0 siblings, 0 replies; 32+ messages in thread
From: Anand Jain @ 2016-04-21 7:19 UTC (permalink / raw)
To: Matthias Bodenbinder, linux-btrfs
On 04/21/2016 01:15 PM, Matthias Bodenbinder wrote:
> On 20.04.2016 at 15:32, Anand Jain wrote:
>>> 1. mount the raid1 (2 disc with different size)
>>
>>> 2. unplug the biggest drive (hotplug)
>>
>> Btrfs won't know that you have plugged-out a disk.
>> Though it experiences IO failures, it won't close the bdev.
>
> Well, as far as I can tell mdadm can handle this use case. I tested that. I have an mdadm raid5 running. I accidentally unplugged a SATA cable from one of the devices and the raid still worked. I did not even notice that the cable was unplugged until a few hours later. Then I plugged in the cable again and that was it. mdadm recovered the raid5 without any problem. -> This is redundancy!
Yep. I meant to say it's a bug in btrfs that it won't know
about the missing device (after mount). Please do test the hot
spare patch set; it has the first few steps (not yet complete)
to handle a failed device while the FS is mounted.
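To illustrate the "experiences IO failures but does not close the bdev" point, here is a rough Python sketch that pulls the per-device error counters out of the journal. The sample lines are copied (unwrapped) from the log posted in this thread; the names latest_error_counters and went_readonly are made up for this example and are not part of any btrfs tool.

```python
import re

# Sample lines copied from the journal posted in this thread.
LOG = """\
Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 10, flush 0, corrupt 0, gen 0
Apr 20 07:05:37 rakete kernel: BTRFS info (device sdi): forced readonly
"""

# Matches the counter summary btrfs prints for a block device.
ERRS = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def latest_error_counters(log_text):
    """Return {device: {counter: value}}, keeping the last line seen per device."""
    counters = {}
    for line in log_text.splitlines():
        match = ERRS.search(line)
        if match:
            fields = match.groupdict()
            dev = fields.pop("dev")
            counters[dev] = {key: int(value) for key, value in fields.items()}
    return counters


def went_readonly(log_text):
    """True if the journal shows btrfs forcing the filesystem read-only."""
    return any("forced readonly" in line for line in log_text.splitlines())


counters = latest_error_counters(LOG)
print(counters)
print("forced readonly:", went_readonly(LOG))
```

In the posted log the read error count on /dev/sdg keeps climbing while the device path stays the same, which is consistent with btrfs still holding the old bdev open after the drive is gone.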
>>> 3. try to copy something to the degraded raid1
>>
>> This will work as long as you do _not_ run unmount/mount.
>
> I did not umount the raid1 when I tried to copy something. As you can see from the sequence of events: I removed the drive and immediately afterwards tried to copy something to the degraded array. This copy failed with a crash of the btrfs module. -> This is NOT redundancy.
>
> The umount and mount operations come afterwards.
>
> In a nutshell I have to say that the btrfs behaviour is by no means compliant with my understanding of redundancy.
A known issue.
Your testing and validation of the hot spare patch set will help.
Thanks, Anand
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Question: raid1 behaviour on failure
2016-04-21 6:23 ` Satoru Takeuchi
@ 2016-04-21 11:09 ` Austin S. Hemmelgarn
2016-04-21 11:28 ` Henk Slager
[not found] ` <57188534.1070408@jp.fujitsu.com>
2 siblings, 0 replies; 32+ messages in thread
From: Austin S. Hemmelgarn @ 2016-04-21 11:09 UTC (permalink / raw)
To: Satoru Takeuchi, Matthias Bodenbinder, linux-btrfs
On 2016-04-21 02:23, Satoru Takeuchi wrote:
> On 2016/04/20 14:17, Matthias Bodenbinder wrote:
>> On 18.04.2016 at 09:22, Qu Wenruo wrote:
>>> BTW, it would be better to post the dmesg for better debug.
>>
>> So here we go. I did the same test again. Here is a full log of what I
>> did. It seems to me like a bug in btrfs.
>> Sequence of events:
>> 1. mount the raid1 (2 disc with different size)
>> 2. unplug the biggest drive (hotplug)
>> 3. try to copy something to the degraded raid1
>> 4. plugin the device again (hotplug)
>>
>> This scenario does not work. The disc array is NOT redundant! I can
>> not work with it while a drive is missing and I can not reattach the
>> device so that everything works again.
>>
>> The btrfs module crashes during the test.
>>
>> I am using LMDE2 with backports:
>> btrfs-tools 4.4-1~bpo8+1
>> linux-image-4.4.0-0.bpo.1-amd64
>>
>> Matthias
>>
>>
>> rakete - root - /root
>> 1# mount /mnt/raid1/
>>
>> Journal:
>>
>> Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): enabling auto
>> defrag
>> Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): disk space
>> caching is enabled
>> Apr 20 07:01:16 rakete kernel: BTRFS: has skinny extents
>>
>> rakete - root - /mnt/raid1
>> 3# ll
>> total 0
>> drwxrwxr-x 1 root root 36 Nov 14 2014 AfterShot2(64-bit)
>> drwxrwxr-x 1 root root 5082 Apr 17 09:06 etc
>> drwxr-xr-x 1 root root 108 Mar 24 07:31 var
>>
>> 4# btrfs fi show
>> Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>> Total devices 3 FS bytes used 1.60GiB
>> devid 1 size 698.64GiB used 3.03GiB path /dev/sdg
>> devid 2 size 465.76GiB used 3.03GiB path /dev/sdh
>> devid 3 size 232.88GiB used 0.00B path /dev/sdi
>>
>> ####
>> unplug device sdg:
>>
>> Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical
>> block 243826688, lost sync page write
>> Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating
>> journal superblock for sdf1-8.
>> Apr 20 07:03:05 rakete kernel: Aborting journal on device sdf1-8.
>> Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical
>> block 243826688, lost sync page write
>> Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating
>> journal superblock for sdf1-8.
>> Apr 20 07:03:05 rakete umount[16405]: umount: /mnt/raid1: target is busy
>> Apr 20 07:03:05 rakete umount[16405]: (In some cases useful info about
>> processes that
>> Apr 20 07:03:05 rakete umount[16405]: use the device is found by
>> lsof(8) or fuser(1).)
>> Apr 20 07:03:05 rakete systemd[1]: mnt-raid1.mount mount process
>> exited, code=exited status=32
>> Apr 20 07:03:05 rakete systemd[1]: Failed unmounting /mnt/raid1.
>> Apr 20 07:03:24 rakete kernel: usb 3-1: new SuperSpeed USB device
>> number 3 using xhci_hcd
>> Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device found,
>> idVendor=152d, idProduct=0567
>> Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device strings:
>> Mfr=10, Product=11, SerialNumber=5
>> Apr 20 07:03:24 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
>> Apr 20 07:03:24 rakete kernel: usb 3-1: Manufacturer: JMicron
>> Apr 20 07:03:24 rakete kernel: usb 3-1: SerialNumber: 152D00539000
>> Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage
>> device detected
>> Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: Quirks match for
>> vid 152d pid 0567: 5000000
>> Apr 20 07:03:24 rakete kernel: scsi host9: usb-storage 3-1:1.0
>> Apr 20 07:03:24 rakete mtp-probe[16424]: checking bus 3, device 3:
>> "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
>> Apr 20 07:03:24 rakete mtp-probe[16424]: bus: 3, device: 3 was not an
>> MTP device
>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:0: Direct-Access WDC
>> WD20 02FAEX-007BA0 0125 PQ: 0 ANSI: 6
>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:1: Direct-Access WDC
>> WD50 01AALS-00L3B2 0125 PQ: 0 ANSI: 6
>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:2: Direct-Access
>> SAMSUNG SP2504C 0125 PQ: 0 ANSI: 6
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: Attached scsi generic sg6
>> type 0
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: Attached scsi generic sg7
>> type 0
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] 3907029168 512-byte
>> logical blocks: (2.00 TB/1.82 TiB)
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Write Protect is off
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Mode Sense: 67 00 10 08
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: Attached scsi generic sg8
>> type 0
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] 976773168 512-byte
>> logical blocks: (500 GB/466 GiB)
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] No Caching mode page
>> found
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Assuming drive cache:
>> write through
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Write Protect is off
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Mode Sense: 67 00 10 08
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] 488395055 512-byte
>> logical blocks: (250 GB/233 GiB)
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] No Caching mode page
>> found
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Assuming drive cache:
>> write through
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Write Protect is off
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Mode Sense: 67 00 10 08
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] No Caching mode page
>> found
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Assuming drive cache:
>> write through
>> Apr 20 07:03:25 rakete kernel: sdf: sdf1
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Attached SCSI disk
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Attached SCSI disk
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Attached SCSI disk
>> Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): recovery complete
>> Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): mounted filesystem with
>> ordered data mode. Opts: (null)
>> Apr 20 07:03:25 rakete udisksd[3671]: Error statting /dev/sdg: No such
>> file or directory
>>
>>
>> ####
>> 5# btrfs fi show
>> Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>> Total devices 3 FS bytes used 1.60GiB
>> devid 2 size 465.76GiB used 3.03GiB path /dev/sdj
>> devid 3 size 232.88GiB used 0.00B path /dev/sdk
>> *** Some devices missing
>> ####
>
> Here the names of *online* devices are changed
> (/dev/sdh => /dev/sdj, /dev/sdi => /dev/sdk) after just
> offlining a device (/dev/sdf). It's odd regardless of
> whether Btrfs works fine or not.
>
> Can anyone explain this behavior?
It's a side effect of the reference counting done in the kernel. If
something is holding open references to the block device (for example,
if there's a mounted filesystem on one of its partitions), then the
kernel has to keep the internal structures relating to that block device
around, even if the device isn't there anymore. This means that when
the disk reappears, the old name is still in use, so the kernel has to
allocate a new one (because it can't safely assume that the disk is the
same one that was there previously). It has some annoying side effects,
but it's still a whole lot better than the system crashing from a NULL
pointer dereference.
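As a toy illustration of that reference counting (not the kernel's actual allocation code), the following Python sketch keeps a name reserved while anything still holds a reference to it. The class DiskNames and its methods are hypothetical and exist only for this example; the real logic in the sd driver is more involved.

```python
import string


class DiskNames:
    """Toy model: a name can be reused only once nothing references it."""

    def __init__(self):
        # name -> {"present": bool, "refs": int}; a key exists while the name is reserved
        self.slots = {}

    def attach(self):
        """A disk appears: hand out the first name that is not reserved."""
        for letter in string.ascii_lowercase:
            name = "sd" + letter
            if name not in self.slots:
                self.slots[name] = {"present": True, "refs": 0}
                return name
        raise RuntimeError("toy model ran out of names")

    def open(self, name):
        """Something (e.g. a mounted filesystem) takes a reference."""
        self.slots[name]["refs"] += 1

    def close(self, name):
        """A reference is dropped; the name is freed if the disk is also gone."""
        self.slots[name]["refs"] -= 1
        self._maybe_free(name)

    def detach(self, name):
        """The hardware disappears; the name stays reserved while refs remain."""
        self.slots[name]["present"] = False
        self._maybe_free(name)

    def _maybe_free(self, name):
        slot = self.slots[name]
        if not slot["present"] and slot["refs"] == 0:
            del self.slots[name]


names = DiskNames()
disks = [names.attach() for _ in range(3)]  # three disks show up
names.open(disks[0])                        # a mounted filesystem holds the first one
names.detach(disks[0])                      # the disk is unplugged while still held
replugged = names.attach()                  # the same disk comes back
names.close(disks[0])                       # the filesystem finally lets go
reused = names.attach()                     # a later disk can take the old name
print(disks, replugged, reused)
```

In this model the replugged disk gets a fresh name because the old one is still referenced, and the old name only becomes available again once the last reference is dropped.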
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Question: raid1 behaviour on failure
2016-04-21 6:23 ` Satoru Takeuchi
2016-04-21 11:09 ` Austin S. Hemmelgarn
@ 2016-04-21 11:28 ` Henk Slager
2016-04-21 17:27 ` Matthias Bodenbinder
[not found] ` <57188534.1070408@jp.fujitsu.com>
2 siblings, 1 reply; 32+ messages in thread
From: Henk Slager @ 2016-04-21 11:28 UTC (permalink / raw)
To: Satoru Takeuchi; +Cc: Matthias Bodenbinder, linux-btrfs
On Thu, Apr 21, 2016 at 8:23 AM, Satoru Takeuchi
<takeuchi_satoru@jp.fujitsu.com> wrote:
> On 2016/04/20 14:17, Matthias Bodenbinder wrote:
>>
>> On 18.04.2016 at 09:22, Qu Wenruo wrote:
>>>
>>> BTW, it would be better to post the dmesg for better debug.
>>
>>
>> So here we go. I did the same test again. Here is a full log of what I did.
>> It seems to me like a bug in btrfs.
>> Sequence of events:
>> 1. mount the raid1 (2 disc with different size)
>> 2. unplug the biggest drive (hotplug)
>> 3. try to copy something to the degraded raid1
>> 4. plugin the device again (hotplug)
>>
>> This scenario does not work. The disc array is NOT redundant! I can not
>> work with it while a drive is missing and I can not reattach the device so
>> that everything works again.
>>
>> The btrfs module crashes during the test.
>>
>> I am using LMDE2 with backports:
>> btrfs-tools 4.4-1~bpo8+1
>> linux-image-4.4.0-0.bpo.1-amd64
>>
>> Matthias
>>
>>
>> rakete - root - /root
>> 1# mount /mnt/raid1/
>>
>> Journal:
>>
>> Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): enabling auto
>> defrag
>> Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): disk space caching
>> is enabled
>> Apr 20 07:01:16 rakete kernel: BTRFS: has skinny extents
>>
>> rakete - root - /mnt/raid1
>> 3# ll
>> total 0
>> drwxrwxr-x 1 root root 36 Nov 14 2014 AfterShot2(64-bit)
>> drwxrwxr-x 1 root root 5082 Apr 17 09:06 etc
>> drwxr-xr-x 1 root root 108 Mar 24 07:31 var
>>
>> 4# btrfs fi show
>> Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>> Total devices 3 FS bytes used 1.60GiB
>> devid 1 size 698.64GiB used 3.03GiB path /dev/sdg
>> devid 2 size 465.76GiB used 3.03GiB path /dev/sdh
>> devid 3 size 232.88GiB used 0.00B path /dev/sdi
>>
>> ####
>> unplug device sdg:
>>
>> Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical block
>> 243826688, lost sync page write
>> Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating
>> journal superblock for sdf1-8.
>> Apr 20 07:03:05 rakete kernel: Aborting journal on device sdf1-8.
>> Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical block
>> 243826688, lost sync page write
>> Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating
>> journal superblock for sdf1-8.
>> Apr 20 07:03:05 rakete umount[16405]: umount: /mnt/raid1: target is busy
>> Apr 20 07:03:05 rakete umount[16405]: (In some cases useful info about
>> processes that
>> Apr 20 07:03:05 rakete umount[16405]: use the device is found by lsof(8)
>> or fuser(1).)
>> Apr 20 07:03:05 rakete systemd[1]: mnt-raid1.mount mount process exited,
>> code=exited status=32
>> Apr 20 07:03:05 rakete systemd[1]: Failed unmounting /mnt/raid1.
>> Apr 20 07:03:24 rakete kernel: usb 3-1: new SuperSpeed USB device number 3
>> using xhci_hcd
>> Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device found,
>> idVendor=152d, idProduct=0567
>> Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device strings: Mfr=10,
>> Product=11, SerialNumber=5
>> Apr 20 07:03:24 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
>> Apr 20 07:03:24 rakete kernel: usb 3-1: Manufacturer: JMicron
>> Apr 20 07:03:24 rakete kernel: usb 3-1: SerialNumber: 152D00539000
>> Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage
>> device detected
>> Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid
>> 152d pid 0567: 5000000
>> Apr 20 07:03:24 rakete kernel: scsi host9: usb-storage 3-1:1.0
>> Apr 20 07:03:24 rakete mtp-probe[16424]: checking bus 3, device 3:
>> "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
>> Apr 20 07:03:24 rakete mtp-probe[16424]: bus: 3, device: 3 was not an MTP
>> device
>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:0: Direct-Access WDC WD20
>> 02FAEX-007BA0 0125 PQ: 0 ANSI: 6
>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:1: Direct-Access WDC WD50
>> 01AALS-00L3B2 0125 PQ: 0 ANSI: 6
>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:2: Direct-Access SAMSUNG
>> SP2504C 0125 PQ: 0 ANSI: 6
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: Attached scsi generic sg6 type
>> 0
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: Attached scsi generic sg7 type
>> 0
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] 3907029168 512-byte
>> logical blocks: (2.00 TB/1.82 TiB)
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Write Protect is off
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Mode Sense: 67 00 10 08
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: Attached scsi generic sg8 type
>> 0
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] 976773168 512-byte
>> logical blocks: (500 GB/466 GiB)
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] No Caching mode page
>> found
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Assuming drive cache:
>> write through
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Write Protect is off
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Mode Sense: 67 00 10 08
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] 488395055 512-byte
>> logical blocks: (250 GB/233 GiB)
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] No Caching mode page
>> found
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Assuming drive cache:
>> write through
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Write Protect is off
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Mode Sense: 67 00 10 08
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] No Caching mode page
>> found
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Assuming drive cache:
>> write through
>> Apr 20 07:03:25 rakete kernel: sdf: sdf1
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Attached SCSI disk
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Attached SCSI disk
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Attached SCSI disk
>> Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): recovery complete
>> Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): mounted filesystem with
>> ordered data mode. Opts: (null)
>> Apr 20 07:03:25 rakete udisksd[3671]: Error statting /dev/sdg: No such
>> file or directory
>>
>>
>> ####
>> 5# btrfs fi show
>> Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>> Total devices 3 FS bytes used 1.60GiB
>> devid 2 size 465.76GiB used 3.03GiB path /dev/sdj
>> devid 3 size 232.88GiB used 0.00B path /dev/sdk
>> *** Some devices missing
>> ####
>
>
> Here the names of *online* devices are changed
> (/dev/sdh => /dev/sdj, /dev/sdi => /dev/sdk) after just
> offlining a device (/dev/sdf). It's odd regardless of
> whether Btrfs works fine or not.
>
> Can anyone explain this behavior?
All 4 drives (WD20, WD75, WD50, SP2504C) get disconnected twice in
this test. What is on the WD20 is unclear to me, but the raid1 array is
{WD75, WD50, SP2504C}.
So the test as described by Matthias is not what actually happens.
In fact, the whole btrfs fs is 'disconnected on the lower layers of
the kernel', but there is no unmount. You can see the SCSI addresses go
from 8?.0.0.x to 9.0.0.x to 10.0.0.x. In the 9.0.0.x state the tools
show 1 device missing (WD75), but in fact the whole fs state is messed
up. So, as Anand already indicated, it is a bad test, and the result is
what one can expect from an unpatched 4.4.0 kernel. (I'm curious to
know how md raidX would handle this.)
a) My best guess is that the 4 drives are in a USB-connected drive bay
and that Matthias unplugged the WD75 (so cut its power and SATA
connection), tried the file copy and then plugged the WD75 back into
the drive bay. The (un)plug of a hard disk is then assumed to trigger
a USB link re-init by the chipset in the drive bay.
b) Another possibility is that the (un)plug of the WD75 causes the host
USB chipset to re-init the USB link because of (too big?) changes in
electrical current. That would likely mean separate USB cables and
maybe some SATA.
c) Or some flaw in the LMDE2 distribution in combination with btrfs. I
don't know what is in linux-image-4.4.0-0.bpo.1-amd64.
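To make the re-enumeration easier to see, here is a small Python sketch that groups the disks reported in the journal by SCSI host number. The sample lines are copied (unwrapped) from the journal quoted in this thread; the function name enumerate_hosts and the two regexes are my own and not part of any kernel or btrfs tool.

```python
import re
from collections import defaultdict

# Sample lines copied (unwrapped) from the journal quoted in this thread.
LOG = """\
Apr 20 07:03:25 rakete kernel: scsi 9:0:0:0: Direct-Access WDC WD20 02FAEX-007BA0 0125 PQ: 0 ANSI: 6
Apr 20 07:03:25 rakete kernel: scsi 9:0:0:1: Direct-Access WDC WD50 01AALS-00L3B2 0125 PQ: 0 ANSI: 6
Apr 20 07:03:25 rakete kernel: scsi 9:0:0:2: Direct-Access SAMSUNG SP2504C 0125 PQ: 0 ANSI: 6
Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Attached SCSI disk
Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Attached SCSI disk
Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Attached SCSI disk
Apr 20 07:08:02 rakete kernel: scsi 10:0:0:0: Direct-Access WDC WD20 02FAEX-007BA0 0125 PQ: 0 ANSI: 6
Apr 20 07:08:02 rakete kernel: scsi 10:0:0:1: Direct-Access WDC WD75 00AACS-00C7B0 0125 PQ: 0 ANSI: 6
Apr 20 07:08:02 rakete kernel: scsi 10:0:0:2: Direct-Access WDC WD50 01AALS-00L3B2 0125 PQ: 0 ANSI: 6
Apr 20 07:08:02 rakete kernel: scsi 10:0:0:3: Direct-Access SAMSUNG SP2504C 0125 PQ: 0 ANSI: 6
Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Attached SCSI disk
Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Attached SCSI disk
Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Attached SCSI disk
Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Attached SCSI disk
"""

# "scsi H:C:T:L: Direct-Access <vendor/model/rev> PQ: ..." gives the disk model.
MODEL = re.compile(r"scsi (\d+):\d+:\d+:(\d+): Direct-Access\s+(.+?)\s+PQ:")
# "sd H:C:T:L: [sdX] Attached SCSI disk" gives the block device name.
DISK = re.compile(r"sd (\d+):\d+:\d+:(\d+): \[(sd[a-z]+)\] Attached SCSI disk")


def enumerate_hosts(log_text):
    """Return {scsi_host: {device_name: model}} for every attached disk."""
    models = {}
    names = {}
    for line in log_text.splitlines():
        match = MODEL.search(line)
        if match:
            models[(int(match.group(1)), int(match.group(2)))] = match.group(3)
            continue
        match = DISK.search(line)
        if match:
            names[(int(match.group(1)), int(match.group(2)))] = match.group(3)
    hosts = defaultdict(dict)
    for (host, lun), name in names.items():
        hosts[host][name] = models.get((host, lun), "unknown")
    return dict(hosts)


hosts = enumerate_hosts(LOG)
for host in sorted(hosts):
    print(host, hosts[host])
```

With these lines, sdj is the WD50 under host 9 but the WD75 under host 10, so a device name alone does not identify a disk across the USB re-init.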
>> still mounted in rw mode:
>> /dev/sdg on /mnt/raid1 type btrfs
>> (rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
>> ####
>> 7# cp -r /root/ .
>> cp: cannot create directory './root': Input/output error
>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg
>> errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg
>> errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg
>> errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg
>> errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg
>> errs: wr 0, rd 5, flush 0, corrupt 0, gen 0
>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg
>> errs: wr 0, rd 6, flush 0, corrupt 0, gen 0
>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg
>> errs: wr 0, rd 7, flush 0, corrupt 0, gen 0
>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg
>> errs: wr 0, rd 8, flush 0, corrupt 0, gen 0
>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg
>> errs: wr 0, rd 9, flush 0, corrupt 0, gen 0
>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg
>> errs: wr 0, rd 10, flush 0, corrupt 0, gen 0
>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): error reading
>> free space cache
>> Apr 20 07:05:37 rakete kernel: BTRFS warning (device sdi): failed to load
>> free space cache for block group 20497563648, rebuilding it now
>> Apr 20 07:05:37 rakete kernel: ------------[ cut here ]------------
>> Apr 20 07:05:37 rakete kernel: WARNING: CPU: 7 PID: 16738 at
>> /build/linux-H3jpF0/linux-4.4.6/fs/btrfs/ctree.c:1156
>> __btrfs_cow_block+0x56f/0x5e0 [btrfs]()
>> Apr 20 07:05:37 rakete kernel: BTRFS: Transaction aborted (error -5)
>> Apr 20 07:05:37 rakete kernel: Modules linked in: uas usb_storage pci_stub
>> vboxpci(O) vboxnetadp(O) vboxnetflt(O) binfmt_misc dvb_ttpci saa7146_vv
>> ttpci_eeprom saa7146 videobuf_dma_sg videobuf_core dvb_core v4l2_common
>> videodev media cfg80211 vboxdrv(O) cpufreq_powersave cpufreq_conservative
>> cpufreq_userspace cpufreq_stats snd_hda_codec_hdmi intel_rapl iosf_mbi
>> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
>> crct10dif_pclmul crc32_pclmul eeepc_wmi asus_wmi joydev sparse_keymap drbg
>> iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek rfkill ansi_cprng
>> snd_hda_codec_generic nvidia(PO) aesni_intel aes_x86_64 lrw gf128mul
>> snd_hda_intel glue_helper ablk_helper snd_hda_codec cryptd snd_hda_core
>> serio_raw pcspkr snd_hwdep snd_pcm i2c_i801 snd_timer snd lpc_ich soundcore
>> 8250_fintek mei_me shpchp mei
>> Apr 20 07:05:37 rakete kernel: mfd_core battery tpm_tis tpm evdev
>> processor drm fuse ecryptfs cbc sha256_ssse3 sha256_generic hmac
>> encrypted_keys parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache jbd2
>> btrfs raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor
>> hid_generic usbhid hid raid6_pq libcrc32c crc32c_generic md_mod dm_mirror
>> dm_region_hash dm_log dm_mod sr_mod sg cdrom sd_mod ata_generic ahci libahci
>> pata_via xhci_pci ehci_pci crc32c_intel xhci_hcd ehci_hcd libata psmouse
>> scsi_mod atl1c usbcore usb_common fjes video wmi fan thermal button
>> Apr 20 07:05:37 rakete kernel: CPU: 7 PID: 16738 Comm: cp Tainted: P
>> O 4.4.0-0.bpo.1-amd64 #1 Debian 4.4.6-1~bpo8+1
>> Apr 20 07:05:37 rakete kernel: Hardware name: System manufacturer System
>> Product Name/P8H67-V, BIOS 3707 07/12/2013
>> Apr 20 07:05:37 rakete kernel: 0000000000000286 000000006a1407c8
>> ffffffff812ed425 ffff88016b6dfb90
>> Apr 20 07:05:37 rakete kernel: ffffffffa03817b8 ffffffff81077ea1
>> ffff88018e7fcd30 ffff88016b6dfbe8
>> Apr 20 07:05:37 rakete kernel: ffff88005d863e88 ffff8801cde7a980
>> ffff88018e7fce48 ffffffff81077f2c
>> Apr 20 07:05:37 rakete kernel: Call Trace:
>> Apr 20 07:05:37 rakete kernel: [<ffffffff812ed425>] ?
>> dump_stack+0x5c/0x77
>> Apr 20 07:05:37 rakete kernel: [<ffffffff81077ea1>] ?
>> warn_slowpath_common+0x81/0xb0
>> Apr 20 07:05:37 rakete kernel: [<ffffffff81077f2c>] ?
>> warn_slowpath_fmt+0x5c/0x80
>> Apr 20 07:05:37 rakete kernel: [<ffffffffa02d74af>] ?
>> __btrfs_cow_block+0x56f/0x5e0 [btrfs]
>> Apr 20 07:05:37 rakete kernel: [<ffffffffa02d76af>] ?
>> btrfs_cow_block+0x10f/0x1d0 [btrfs]
>> Apr 20 07:05:37 rakete kernel: [<ffffffffa02db2cd>] ?
>> btrfs_search_slot+0x1fd/0xa30 [btrfs]
>> Apr 20 07:05:37 rakete kernel: [<ffffffffa02dd3f1>] ?
>> btrfs_insert_empty_items+0x71/0xc0 [btrfs]
>> Apr 20 07:05:37 rakete kernel: [<ffffffff811f4d92>] ?
>> insert_inode_locked4+0xa2/0x1c0
>> Apr 20 07:05:37 rakete kernel: [<ffffffffa030ee5d>] ?
>> btrfs_new_inode+0x1cd/0x590 [btrfs]
>> Apr 20 07:05:37 rakete kernel: [<ffffffffa0310a77>] ?
>> btrfs_mkdir+0x107/0x1f0 [btrfs]
>> Apr 20 07:05:37 rakete kernel: [<ffffffff811e80b0>] ?
>> vfs_mkdir+0xb0/0x140
>> Apr 20 07:05:37 rakete kernel: [<ffffffff811e9d3e>] ?
>> SyS_mkdir+0xce/0x110
>> Apr 20 07:05:37 rakete kernel: [<ffffffff81592736>] ?
>> system_call_fast_compare_end+0xc/0x6b
>> Apr 20 07:05:37 rakete kernel: ---[ end trace 025eb0e83ffed96f ]---
>> Apr 20 07:05:37 rakete kernel: BTRFS: error (device sdi) in
>> __btrfs_cow_block:1156: errno=-5 IO failure
>> Apr 20 07:05:37 rakete kernel: BTRFS info (device sdi): forced readonly
>>
>> ####
>> Try to copy again:
>> 11# cp -r /root/ .
>> cp: cannot create directory './root': Read-only file system
>> ####
>> /dev/sdg on /mnt/raid1 type btrfs
>> (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
>> ####
>> plugin device sdg again:
>>
>> Apr 20 07:07:39 rakete udisksd[3671]: Cleaning up mount point
>> /media/matthias/BACKUP (device 8:81 no longer exist)
>> Apr 20 07:07:39 rakete kernel: usb 3-1: USB disconnect, device number 3
>> Apr 20 07:07:39 rakete udisksd[3671]: Error statting /dev/sdg: No such
>> file or directory
>> Apr 20 07:07:39 rakete umount[16807]: umount: /mnt/raid1: target is busy
>> Apr 20 07:07:39 rakete umount[16807]: (In some cases useful info about
>> processes that
>> Apr 20 07:07:39 rakete umount[16807]: use the device is found by lsof(8)
>> or fuser(1).)
>> Apr 20 07:07:39 rakete systemd[1]: mnt-raid1.mount mount process exited,
>> code=exited status=32
>> Apr 20 07:07:39 rakete systemd[1]: Failed unmounting /mnt/raid1.
>> Apr 20 07:08:01 rakete kernel: usb 3-1: new SuperSpeed USB device number 4
>> using xhci_hcd
>> Apr 20 07:08:01 rakete kernel: usb 3-1: New USB device found,
>> idVendor=152d, idProduct=0567
>> Apr 20 07:08:01 rakete kernel: usb 3-1: New USB device strings: Mfr=10,
>> Product=11, SerialNumber=5
>> Apr 20 07:08:01 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
>> Apr 20 07:08:01 rakete kernel: usb 3-1: Manufacturer: JMicron
>> Apr 20 07:08:01 rakete kernel: usb 3-1: SerialNumber: 152D00539000
>> Apr 20 07:08:01 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage
>> device detected
>> Apr 20 07:08:01 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid
>> 152d pid 0567: 5000000
>> Apr 20 07:08:01 rakete kernel: scsi host10: usb-storage 3-1:1.0
>> Apr 20 07:08:01 rakete mtp-probe[16826]: checking bus 3, device 4:
>> "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
>> Apr 20 07:08:01 rakete mtp-probe[16826]: bus: 3, device: 4 was not an MTP
>> device
>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:0: Direct-Access WDC WD20
>> 02FAEX-007BA0 0125 PQ: 0 ANSI: 6
>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:1: Direct-Access WDC WD75
>> 00AACS-00C7B0 0125 PQ: 0 ANSI: 6
>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:2: Direct-Access WDC WD50
>> 01AALS-00L3B2 0125 PQ: 0 ANSI: 6
>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:3: Direct-Access SAMSUNG
>> SP2504C 0125 PQ: 0 ANSI: 6
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: Attached scsi generic sg6 type
>> 0
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] 3907029168 512-byte
>> logical blocks: (2.00 TB/1.82 TiB)
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Write Protect is off
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Mode Sense: 67 00 10 08
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: Attached scsi generic sg7 type
>> 0
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] No Caching mode page
>> found
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Assuming drive cache:
>> write through
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] 1465149168 512-byte
>> logical blocks: (750 GB/699 GiB)
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: Attached scsi generic sg8 type
>> 0
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Write Protect is off
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Mode Sense: 67 00 10 08
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: Attached scsi generic sg9 type
>> 0
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] 976773168 512-byte
>> logical blocks: (500 GB/466 GiB)
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] No Caching mode page
>> found
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Assuming drive cache:
>> write through
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Write Protect is off
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Mode Sense: 67 00 10 08
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] 488395055 512-byte
>> logical blocks: (250 GB/233 GiB)
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] No Caching mode page
>> found
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Assuming drive cache:
>> write through
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Write Protect is off
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Mode Sense: 67 00 10 08
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] No Caching mode page
>> found
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Assuming drive cache:
>> write through
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Attached SCSI disk
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Attached SCSI disk
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Attached SCSI disk
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Attached SCSI disk
>> Apr 20 07:08:02 rakete kernel: EXT4-fs (sdf1): recovery complete
>> Apr 20 07:08:02 rakete kernel: EXT4-fs (sdf1): mounted filesystem with
>> ordered data mode. Opts: (null)
>>
>> ####
>> still ro mode
>> /dev/sdj on /mnt/raid1 type btrfs
>> (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
>> ####
>> 14# btrfs fi show
>> Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>> Total devices 3 FS bytes used 1.60GiB
>> devid 1 size 698.64GiB used 3.03GiB path /dev/sdj
>> devid 2 size 465.76GiB used 3.03GiB path /dev/sdk
>> devid 3 size 232.88GiB used 0.00B path /dev/sdl
>> ####
>>
>>
>>
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Question: raid1 behaviour on failure
[not found] ` <57188534.1070408@jp.fujitsu.com>
@ 2016-04-21 11:58 ` Qu Wenruo
2016-04-22 2:21 ` Satoru Takeuchi
0 siblings, 1 reply; 32+ messages in thread
From: Qu Wenruo @ 2016-04-21 11:58 UTC (permalink / raw)
To: Satoru Takeuchi, Matthias Bodenbinder, linux-btrfs
On 04/21/2016 03:45 PM, Satoru Takeuchi wrote:
> On 2016/04/21 15:23, Satoru Takeuchi wrote:
>> On 2016/04/20 14:17, Matthias Bodenbinder wrote:
>>> On 18.04.2016 at 09:22, Qu Wenruo wrote:
>>>> BTW, it would be better to post the dmesg for better debug.
>>>
>>> So here we go. I did the same test again. Here is a full log of what I
>>> did. It looks like a bug in btrfs.
>>> Sequence of events:
>>> 1. mount the raid1 (2 discs of different sizes)
>>> 2. unplug the biggest drive (hotplug)
>>> 3. try to copy something to the degraded raid1
>>> 4. plug in the device again (hotplug)
>>>
>>> This scenario does not work. The disc array is NOT redundant! I can
>>> not work with it while a drive is missing and I can not reattach the
>>> device so that everything works again.
>>>
>>> The btrfs module crashes during the test.
>>>
>>> I am using LMDE2 with backports:
>>> btrfs-tools 4.4-1~bpo8+1
>>> linux-image-4.4.0-0.bpo.1-amd64
>>>
>>> Matthias
>>>
>>>
>>> rakete - root - /root
>>> 1# mount /mnt/raid1/
>>>
>>> Journal:
>>>
>>> Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): enabling auto
>>> defrag
>>> Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): disk space
>>> caching is enabled
>>> Apr 20 07:01:16 rakete kernel: BTRFS: has skinny extents
>>>
>>> rakete - root - /mnt/raid1
>>> 3# ll
>>> total 0
>>> drwxrwxr-x 1 root root 36 Nov 14 2014 AfterShot2(64-bit)
>>> drwxrwxr-x 1 root root 5082 Apr 17 09:06 etc
>>> drwxr-xr-x 1 root root 108 Mar 24 07:31 var
>>>
>>> 4# btrfs fi show
>>> Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>>> Total devices 3 FS bytes used 1.60GiB
>>> devid 1 size 698.64GiB used 3.03GiB path /dev/sdg
>>> devid 2 size 465.76GiB used 3.03GiB path /dev/sdh
>>> devid 3 size 232.88GiB used 0.00B path /dev/sdi
>>>
>>> ####
>>> unplug device sdg:
>>>
>>> Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical
>>> block 243826688, lost sync page write
>>> Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating
>>> journal superblock for sdf1-8.
>>> Apr 20 07:03:05 rakete kernel: Aborting journal on device sdf1-8.
>>> Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical
>>> block 243826688, lost sync page write
>>> Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating
>>> journal superblock for sdf1-8.
>>> Apr 20 07:03:05 rakete umount[16405]: umount: /mnt/raid1: target is busy
>>> Apr 20 07:03:05 rakete umount[16405]: (In some cases useful info
>>> about processes that
>>> Apr 20 07:03:05 rakete umount[16405]: use the device is found by
>>> lsof(8) or fuser(1).)
>>> Apr 20 07:03:05 rakete systemd[1]: mnt-raid1.mount mount process
>>> exited, code=exited status=32
>>> Apr 20 07:03:05 rakete systemd[1]: Failed unmounting /mnt/raid1.
>>> Apr 20 07:03:24 rakete kernel: usb 3-1: new SuperSpeed USB device
>>> number 3 using xhci_hcd
>>> Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device found,
>>> idVendor=152d, idProduct=0567
>>> Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device strings:
>>> Mfr=10, Product=11, SerialNumber=5
>>> Apr 20 07:03:24 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
>>> Apr 20 07:03:24 rakete kernel: usb 3-1: Manufacturer: JMicron
>>> Apr 20 07:03:24 rakete kernel: usb 3-1: SerialNumber: 152D00539000
>>> Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage
>>> device detected
>>> Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: Quirks match for
>>> vid 152d pid 0567: 5000000
>>> Apr 20 07:03:24 rakete kernel: scsi host9: usb-storage 3-1:1.0
>>> Apr 20 07:03:24 rakete mtp-probe[16424]: checking bus 3, device 3:
>>> "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
>>> Apr 20 07:03:24 rakete mtp-probe[16424]: bus: 3, device: 3 was not an
>>> MTP device
>>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:0: Direct-Access WDC
>>> WD20 02FAEX-007BA0 0125 PQ: 0 ANSI: 6
>>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:1: Direct-Access WDC
>>> WD50 01AALS-00L3B2 0125 PQ: 0 ANSI: 6
>>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:2: Direct-Access
>>> SAMSUNG SP2504C 0125 PQ: 0 ANSI: 6
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: Attached scsi generic sg6
>>> type 0
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: Attached scsi generic sg7
>>> type 0
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] 3907029168 512-byte
>>> logical blocks: (2.00 TB/1.82 TiB)
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Write Protect is off
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Mode Sense: 67 00 10 08
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: Attached scsi generic sg8
>>> type 0
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] 976773168 512-byte
>>> logical blocks: (500 GB/466 GiB)
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] No Caching mode page
>>> found
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Assuming drive
>>> cache: write through
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Write Protect is off
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Mode Sense: 67 00 10 08
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] 488395055 512-byte
>>> logical blocks: (250 GB/233 GiB)
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] No Caching mode page
>>> found
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Assuming drive
>>> cache: write through
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Write Protect is off
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Mode Sense: 67 00 10 08
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] No Caching mode page
>>> found
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Assuming drive
>>> cache: write through
>>> Apr 20 07:03:25 rakete kernel: sdf: sdf1
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Attached SCSI disk
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Attached SCSI disk
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Attached SCSI disk
>>> Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): recovery complete
>>> Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): mounted filesystem
>>> with ordered data mode. Opts: (null)
>>> Apr 20 07:03:25 rakete udisksd[3671]: Error statting /dev/sdg: No
>>> such file or directory
>>>
>>>
>>> ####
>>> 5# btrfs fi show
>>> Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>>> Total devices 3 FS bytes used 1.60GiB
>>> devid 2 size 465.76GiB used 3.03GiB path /dev/sdj
>>> devid 3 size 232.88GiB used 0.00B path /dev/sdk
>>> *** Some devices missing
>>> ####
>>
>> Here the names of *online* devices are changed
>> (/dev/sdh => /dev/sdj, /dev/sdi => /dev/sdk) after just
>> offlining a device (/dev/sdf). It's odd regardless of
>> whether Btrfs works fine or not.
>>
>> Can anyone explain this behavior?
>
> FYI,
>
> I tried to reproduce this problem on a VM.
> Here the USB storage devices are /dev/sd{a,b,c}.
>
> Steps to reproduce:
>
> 1. create a fs on /dev/sd{a,b,c}
> 2. mount this fs
> 3. Surprise unplug /dev/sdc
> 4. Write to this fs till ENOSPC happens
>
> Then, although there are I/O errors about /dev/sdc,
> device names didn't change and ro remount didn't happen.
>
> command log:
> =================================
> # mkfs.btrfs -f -m raid1 -d raid1 /dev/sd{a,b,c}
> btrfs-progs v4.5.1-41-g8202204-dirty
> See http://btrfs.wiki.kernel.org for more information.
>
> Label: (null)
> UUID: 16a54915-c807-42cf-8365-82c0780c5ab5
> Node size: 16384
> Sector size: 4096
> Filesystem size: 15.00GiB
> Block group profiles:
> Data: RAID1 1.01GiB
> Metadata: RAID1 1.01GiB
> System: RAID1 12.00MiB
> SSD detected: no
> Incompat features: extref, skinny-metadata
> Number of devices: 3
> Devices:
> ID SIZE PATH
> 1 5.00GiB /dev/sda
> 2 5.00GiB /dev/sdb
> 3 5.00GiB /dev/sdc
>
> # mount /dev/sda /scratch_mnt/
> # btrfs fi show /scratch_mnt/
> Label: none uuid: 16a54915-c807-42cf-8365-82c0780c5ab5
> Total devices 3 FS bytes used 640.00KiB
> devid 1 size 5.00GiB used 2.00GiB path /dev/sda
> devid 2 size 5.00GiB used 1.01GiB path /dev/sdb
> devid 3 size 5.00GiB used 1.01GiB path /dev/sdc
>
> #
> # # *** surprise unplug happens here ***
> #
> # btrfs fi show /scratch_mnt/
Would you please post the output of "btrfs-debug-tree -t 3"?
My guess is that there is no raid1 stripe on device 3,
so all data/metadata allocation and COW proceed without problems.
The "btrfs-debug-tree -t 3" output would verify my guess.
Thanks,
Qu
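A rough way to check this from the dump is to count how many chunk stripes reference each devid. The following is only a sketch: the helper name stripes_per_devid is made up for illustration, and it assumes the dump prints one "stripe <n> devid <id> offset <bytes>" line per stripe, which may differ between btrfs-progs versions.

```shell
# Count chunk stripes per devid in a chunk-tree dump read from stdin.
# Assumption: each stripe is printed as "stripe <n> devid <id> offset <bytes>".
# stripes_per_devid is a hypothetical helper name, not part of btrfs-progs.
stripes_per_devid() {
    # Keep only the devid of each stripe line, then count occurrences.
    sed -n 's/.*stripe [0-9][0-9]* devid \([0-9][0-9]*\) offset.*/\1/p' |
        sort -n | uniq -c |
        awk '{ print "devid " $2 " stripes=" $1 }'
}
```

It could be fed with something like "btrfs-debug-tree -t 3 /dev/sda | stripes_per_devid" (the device path is just an example). A devid that does not show up in the result has no chunk stripes allocated on it.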
> Label: none uuid: 16a54915-c807-42cf-8365-82c0780c5ab5
> Total devices 3 FS bytes used 1.81GiB
> devid 1 size 5.00GiB used 2.00GiB path /dev/sda
> devid 2 size 5.00GiB used 2.01GiB path /dev/sdb
> *** Some devices missing
>
> # cp -a linux /scratch_mnt/
> # cp -a linux /scratch_mnt/linux.2
> # cp -a linux /scratch_mnt/linux.3
> cp: error writing ‘/scratch_mnt/linux.3/drivers/scsi/lpfc/lpfc_els.c’:
> No space left on device
> ...
> # mount | grep scratch
> /dev/sda on /scratch_mnt type btrfs
> (rw,relatime,seclabel,space_cache,subvolid=5,subvol=/)
> # dmesg | tail
> [ 1400.778705] BTRFS warning (device sdc): lost page write due to IO
> error on /dev/sdc
> [ 1438.604796] btrfs_dev_stat_print_on_error: 174 callbacks suppressed
> [ 1438.604803] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125633,
> rd 1, flush 276, corrupt 0, gen 0
> [ 1438.609782] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125634,
> rd 1, flush 276, corrupt 0, gen 0
> [ 1438.613331] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125634,
> rd 1, flush 277, corrupt 0, gen 0
> [ 1438.669090] btrfs_end_buffer_write_sync: 52 callbacks suppressed
> [ 1438.669095] BTRFS warning (device sdc): lost page write due to IO
> error on /dev/sdc
> [ 1438.669098] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125635,
> rd 1, flush 277, corrupt 0, gen 0
> [ 1438.672621] BTRFS warning (device sdc): lost page write due to IO
> error on /dev/sdc
> [ 1438.672626] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125636,
> rd 1, flush 277, corrupt 0, gen 0
> =================================
>
> Thanks,
> Satoru
>
>>
>> Thanks,
>> Satoru
>>
>>> still mounted in rw mode:
>>> /dev/sdg on /mnt/raid1 type btrfs
>>> (rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
>>> ####
>>> 7# cp -r /root/ .
>>> cp: cannot create directory './root': Input/output error
>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>> /dev/sdg errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>> /dev/sdg errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>> /dev/sdg errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>> /dev/sdg errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>> /dev/sdg errs: wr 0, rd 5, flush 0, corrupt 0, gen 0
>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>> /dev/sdg errs: wr 0, rd 6, flush 0, corrupt 0, gen 0
>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>> /dev/sdg errs: wr 0, rd 7, flush 0, corrupt 0, gen 0
>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>> /dev/sdg errs: wr 0, rd 8, flush 0, corrupt 0, gen 0
>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>> /dev/sdg errs: wr 0, rd 9, flush 0, corrupt 0, gen 0
>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>> /dev/sdg errs: wr 0, rd 10, flush 0, corrupt 0, gen 0
>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): error
>>> reading free space cache
>>> Apr 20 07:05:37 rakete kernel: BTRFS warning (device sdi): failed to
>>> load free space cache for block group 20497563648, rebuilding it now
>>> Apr 20 07:05:37 rakete kernel: ------------[ cut here ]------------
>>> Apr 20 07:05:37 rakete kernel: WARNING: CPU: 7 PID: 16738 at
>>> /build/linux-H3jpF0/linux-4.4.6/fs/btrfs/ctree.c:1156
>>> __btrfs_cow_block+0x56f/0x5e0 [btrfs]()
>>> Apr 20 07:05:37 rakete kernel: BTRFS: Transaction aborted (error -5)
>>> Apr 20 07:05:37 rakete kernel: Modules linked in: uas usb_storage
>>> pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) binfmt_misc dvb_ttpci
>>> saa7146_vv ttpci_eeprom saa7146 videobuf_dma_sg videobuf_core
>>> dvb_core v4l2_common videodev media cfg80211 vboxdrv(O)
>>> cpufreq_powersave cpufreq_conservative cpufreq_userspace
>>> cpufreq_stats snd_hda_codec_hdmi intel_rapl iosf_mbi
>>> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
>>> irqbypass crct10dif_pclmul crc32_pclmul eeepc_wmi asus_wmi joydev
>>> sparse_keymap drbg iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek
>>> rfkill ansi_cprng snd_hda_codec_generic nvidia(PO) aesni_intel
>>> aes_x86_64 lrw gf128mul snd_hda_intel glue_helper ablk_helper
>>> snd_hda_codec cryptd snd_hda_core serio_raw pcspkr snd_hwdep snd_pcm
>>> i2c_i801 snd_timer snd lpc_ich soundcore 8250_fintek mei_me shpchp mei
>>> Apr 20 07:05:37 rakete kernel: mfd_core battery tpm_tis tpm evdev
>>> processor drm fuse ecryptfs cbc sha256_ssse3 sha256_generic hmac
>>> encrypted_keys parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache
>>> jbd2 btrfs raid456 async_raid6_recov async_memcpy async_pq async_xor
>>> async_tx xor hid_generic usbhid hid raid6_pq libcrc32c crc32c_generic
>>> md_mod dm_mirror dm_region_hash dm_log dm_mod sr_mod sg cdrom sd_mod
>>> ata_generic ahci libahci pata_via xhci_pci ehci_pci crc32c_intel
>>> xhci_hcd ehci_hcd libata psmouse scsi_mod atl1c usbcore usb_common
>>> fjes video wmi fan thermal button
>>> Apr 20 07:05:37 rakete kernel: CPU: 7 PID: 16738 Comm: cp Tainted:
>>> P O 4.4.0-0.bpo.1-amd64 #1 Debian 4.4.6-1~bpo8+1
>>> Apr 20 07:05:37 rakete kernel: Hardware name: System manufacturer
>>> System Product Name/P8H67-V, BIOS 3707 07/12/2013
>>> Apr 20 07:05:37 rakete kernel: 0000000000000286 000000006a1407c8
>>> ffffffff812ed425 ffff88016b6dfb90
>>> Apr 20 07:05:37 rakete kernel: ffffffffa03817b8 ffffffff81077ea1
>>> ffff88018e7fcd30 ffff88016b6dfbe8
>>> Apr 20 07:05:37 rakete kernel: ffff88005d863e88 ffff8801cde7a980
>>> ffff88018e7fce48 ffffffff81077f2c
>>> Apr 20 07:05:37 rakete kernel: Call Trace:
>>> Apr 20 07:05:37 rakete kernel: [<ffffffff812ed425>] ?
>>> dump_stack+0x5c/0x77
>>> Apr 20 07:05:37 rakete kernel: [<ffffffff81077ea1>] ?
>>> warn_slowpath_common+0x81/0xb0
>>> Apr 20 07:05:37 rakete kernel: [<ffffffff81077f2c>] ?
>>> warn_slowpath_fmt+0x5c/0x80
>>> Apr 20 07:05:37 rakete kernel: [<ffffffffa02d74af>] ?
>>> __btrfs_cow_block+0x56f/0x5e0 [btrfs]
>>> Apr 20 07:05:37 rakete kernel: [<ffffffffa02d76af>] ?
>>> btrfs_cow_block+0x10f/0x1d0 [btrfs]
>>> Apr 20 07:05:37 rakete kernel: [<ffffffffa02db2cd>] ?
>>> btrfs_search_slot+0x1fd/0xa30 [btrfs]
>>> Apr 20 07:05:37 rakete kernel: [<ffffffffa02dd3f1>] ?
>>> btrfs_insert_empty_items+0x71/0xc0 [btrfs]
>>> Apr 20 07:05:37 rakete kernel: [<ffffffff811f4d92>] ?
>>> insert_inode_locked4+0xa2/0x1c0
>>> Apr 20 07:05:37 rakete kernel: [<ffffffffa030ee5d>] ?
>>> btrfs_new_inode+0x1cd/0x590 [btrfs]
>>> Apr 20 07:05:37 rakete kernel: [<ffffffffa0310a77>] ?
>>> btrfs_mkdir+0x107/0x1f0 [btrfs]
>>> Apr 20 07:05:37 rakete kernel: [<ffffffff811e80b0>] ?
>>> vfs_mkdir+0xb0/0x140
>>> Apr 20 07:05:37 rakete kernel: [<ffffffff811e9d3e>] ?
>>> SyS_mkdir+0xce/0x110
>>> Apr 20 07:05:37 rakete kernel: [<ffffffff81592736>] ?
>>> system_call_fast_compare_end+0xc/0x6b
>>> Apr 20 07:05:37 rakete kernel: ---[ end trace 025eb0e83ffed96f ]---
>>> Apr 20 07:05:37 rakete kernel: BTRFS: error (device sdi) in
>>> __btrfs_cow_block:1156: errno=-5 IO failure
>>> Apr 20 07:05:37 rakete kernel: BTRFS info (device sdi): forced readonly
>>>
>>> ####
>>> Try to copy again:
>>> 11# cp -r /root/ .
>>> cp: cannot create directory './root': Read-only file system
>>> ####
>>> /dev/sdg on /mnt/raid1 type btrfs
>>> (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
>>> ####
>>> plug in device sdg again:
>>>
>>> Apr 20 07:07:39 rakete udisksd[3671]: Cleaning up mount point
>>> /media/matthias/BACKUP (device 8:81 no longer exist)
>>> Apr 20 07:07:39 rakete kernel: usb 3-1: USB disconnect, device number 3
>>> Apr 20 07:07:39 rakete udisksd[3671]: Error statting /dev/sdg: No
>>> such file or directory
>>> Apr 20 07:07:39 rakete umount[16807]: umount: /mnt/raid1: target is busy
>>> Apr 20 07:07:39 rakete umount[16807]: (In some cases useful info
>>> about processes that
>>> Apr 20 07:07:39 rakete umount[16807]: use the device is found by
>>> lsof(8) or fuser(1).)
>>> Apr 20 07:07:39 rakete systemd[1]: mnt-raid1.mount mount process
>>> exited, code=exited status=32
>>> Apr 20 07:07:39 rakete systemd[1]: Failed unmounting /mnt/raid1.
>>> Apr 20 07:08:01 rakete kernel: usb 3-1: new SuperSpeed USB device
>>> number 4 using xhci_hcd
>>> Apr 20 07:08:01 rakete kernel: usb 3-1: New USB device found,
>>> idVendor=152d, idProduct=0567
>>> Apr 20 07:08:01 rakete kernel: usb 3-1: New USB device strings:
>>> Mfr=10, Product=11, SerialNumber=5
>>> Apr 20 07:08:01 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
>>> Apr 20 07:08:01 rakete kernel: usb 3-1: Manufacturer: JMicron
>>> Apr 20 07:08:01 rakete kernel: usb 3-1: SerialNumber: 152D00539000
>>> Apr 20 07:08:01 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage
>>> device detected
>>> Apr 20 07:08:01 rakete kernel: usb-storage 3-1:1.0: Quirks match for
>>> vid 152d pid 0567: 5000000
>>> Apr 20 07:08:01 rakete kernel: scsi host10: usb-storage 3-1:1.0
>>> Apr 20 07:08:01 rakete mtp-probe[16826]: checking bus 3, device 4:
>>> "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
>>> Apr 20 07:08:01 rakete mtp-probe[16826]: bus: 3, device: 4 was not an
>>> MTP device
>>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:0: Direct-Access WDC
>>> WD20 02FAEX-007BA0 0125 PQ: 0 ANSI: 6
>>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:1: Direct-Access WDC
>>> WD75 00AACS-00C7B0 0125 PQ: 0 ANSI: 6
>>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:2: Direct-Access WDC
>>> WD50 01AALS-00L3B2 0125 PQ: 0 ANSI: 6
>>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:3: Direct-Access
>>> SAMSUNG SP2504C 0125 PQ: 0 ANSI: 6
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: Attached scsi generic sg6
>>> type 0
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] 3907029168 512-byte
>>> logical blocks: (2.00 TB/1.82 TiB)
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Write Protect is off
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Mode Sense: 67 00
>>> 10 08
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: Attached scsi generic sg7
>>> type 0
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] No Caching mode
>>> page found
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Assuming drive
>>> cache: write through
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] 1465149168 512-byte
>>> logical blocks: (750 GB/699 GiB)
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: Attached scsi generic sg8
>>> type 0
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Write Protect is off
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Mode Sense: 67 00
>>> 10 08
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: Attached scsi generic sg9
>>> type 0
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] 976773168 512-byte
>>> logical blocks: (500 GB/466 GiB)
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] No Caching mode
>>> page found
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Assuming drive
>>> cache: write through
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Write Protect is off
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Mode Sense: 67 00
>>> 10 08
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] 488395055 512-byte
>>> logical blocks: (250 GB/233 GiB)
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] No Caching mode
>>> page found
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Assuming drive
>>> cache: write through
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Write Protect is off
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Mode Sense: 67 00
>>> 10 08
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] No Caching mode
>>> page found
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Assuming drive
>>> cache: write through
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Attached SCSI disk
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Attached SCSI disk
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Attached SCSI disk
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Attached SCSI disk
>>> Apr 20 07:08:02 rakete kernel: EXT4-fs (sdf1): recovery complete
>>> Apr 20 07:08:02 rakete kernel: EXT4-fs (sdf1): mounted filesystem
>>> with ordered data mode. Opts: (null)
>>>
>>> ####
>>> still ro mode
>>> /dev/sdj on /mnt/raid1 type btrfs
>>> (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
>>> ####
>>> 14# btrfs fi show
>>> Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>>> Total devices 3 FS bytes used 1.60GiB
>>> devid 1 size 698.64GiB used 3.03GiB path /dev/sdj
>>> devid 2 size 465.76GiB used 3.03GiB path /dev/sdk
>>> devid 3 size 232.88GiB used 0.00B path /dev/sdl
>>> ####
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Question: raid1 behaviour on failure
2016-04-21 11:28 ` Henk Slager
@ 2016-04-21 17:27 ` Matthias Bodenbinder
2016-04-26 16:19 ` Henk Slager
0 siblings, 1 reply; 32+ messages in thread
From: Matthias Bodenbinder @ 2016-04-21 17:27 UTC (permalink / raw)
To: linux-btrfs
On 21.04.2016 at 13:28, Henk Slager wrote:
>> Can anyone explain this behavior?
>
> All 4 drives (WD20, WD75, WD50, SP2504C) get a disconnect twice in
> this test. What is on WD20 is unclear to me, but the raid1 array is
> {WD75, WD50, SP2504C}
> So the test as described by Matthias is not what actually happens.
> In fact, the whole btrfs fs is 'disconnected on the lower layers of
> the kernel' but there is no unmount. You can see the scsi items go
> from 8?.0.0.x to
> 9.0.0.x to 10.0.0.x. In the 9.0.0.x state, the tools show then 1 dev
> missing (WD75), but in fact the whole fs state is messed up. So as
> indicated by Anand already, it is a bad test and it is what one can
> expect from an unpatched 4.4.0 kernel. ( I'm curious to know how md
> raidX would handle this ).
>
> a) My best guess is that the 4 drives are in a USB connected drivebay
> and that Matthias unplugged WD75 (so cut its power and SATA
> connection), did the file copy trial and then plugged in the WD75
> again into the drivebay. The (un)plug of a harddisk is then assumed to
> trigger a USB link re-init by the chipset in the drivebay.
>
> b) Another possibility is that the (un)plug of the WD75 causes the host
> USB chipset to re-init the USB link due to (too big?) changes in
> electrical current. And likely separate USB cables and maybe some
> SATA.
>
> c) Or some flaw in the LMDE2 distribution in combination with btrfs. I
> don't know what is in the linux-image-4.4.0-0.bpo.1-amd64
>
Just to clarify my setup: the HDs are mounted in a FANTEC QB-35US3-6G case. According to the handbook, it has "Hot-Plug for USB / eSATA interface".
It is equipped with 4 HDs. 3 of them are part of the raid1. The fourth HD is a 2 TB device with an ext4 filesystem and is not relevant to this thread.
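For anyone who wants to check the renumbering that Henk describes, the SCSI address and device name of every attached disk can be pulled out of the journal. This is only a sketch: the function name scsi_disk_map is made up for illustration, and it assumes the "sd H:C:T:L: [sdX] Attached SCSI disk" kernel message format seen in the logs in this thread.

```shell
# Print "<host:channel:target:lun> <disk>" for every
# "Attached SCSI disk" kernel message read from stdin.
# scsi_disk_map is a hypothetical helper name used only for this sketch.
scsi_disk_map() {
    sed -n 's/.*sd \([0-9][0-9:]*\): \[\(sd[a-z]*\)] Attached SCSI disk.*/\1 \2/p'
}
```

It can be run as "journalctl -k | scsi_disk_map". If the SCSI host number in the first column changes between reconnects, the whole enclosure was re-enumerated and not just one drive.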
Matthias
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Question: raid1 behaviour on failure
2016-04-21 5:43 ` Qu Wenruo
2016-04-21 6:02 ` Liu Bo
@ 2016-04-21 17:40 ` Matthias Bodenbinder
2016-04-22 6:02 ` Qu Wenruo
1 sibling, 1 reply; 32+ messages in thread
From: Matthias Bodenbinder @ 2016-04-21 17:40 UTC (permalink / raw)
To: linux-btrfs
On 21.04.2016 at 07:43, Qu Wenruo wrote:
> There are already unmerged patches which will partly do the mdadm level behavior, like automatically change to degraded mode without making the fs RO.
>
> The original patchset:
> http://comments.gmane.org/gmane.comp.file-systems.btrfs/48335
The description of this patch says:
"Although the one-size-fit-all solution is quite safe, it's too strict if
data and metadata has different duplication level."
...
"This patchset will introduce a new per-chunk degradable check for btrfs,
allow above case to succeed, and it's quite small anyway."
My raid1 is "-m raid1 -d raid1". Both have the same duplication level. Would that patch make any difference?
And what do I need to do to test this on "debian stable"? I am not a programmer, but I know how to use git and how to compile, given proper configuration directions.
Matthias
> Or the latest patchset inside Anand Jain's auto-replace patchset:
> http://thread.gmane.org/gmane.comp.file-systems.btrfs/55446
>
> Thanks,
> Qu
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Question: raid1 behaviour on failure
2016-04-21 11:58 ` Qu Wenruo
@ 2016-04-22 2:21 ` Satoru Takeuchi
2016-04-22 5:32 ` Qu Wenruo
0 siblings, 1 reply; 32+ messages in thread
From: Satoru Takeuchi @ 2016-04-22 2:21 UTC (permalink / raw)
To: Qu Wenruo, Matthias Bodenbinder, linux-btrfs
On 2016/04/21 20:58, Qu Wenruo wrote:
>
>
> On 04/21/2016 03:45 PM, Satoru Takeuchi wrote:
>> On 2016/04/21 15:23, Satoru Takeuchi wrote:
>>> On 2016/04/20 14:17, Matthias Bodenbinder wrote:
>>>> On 18.04.2016 at 09:22, Qu Wenruo wrote:
>>>>> BTW, it would be better to post the dmesg for better debug.
>>>>
>>>> So here we go. I did the same test again. Here is a full log of what I
>>>> did. It looks like a bug in btrfs.
>>>> Sequence of events:
>>>> 1. mount the raid1 (2 discs of different sizes)
>>>> 2. unplug the biggest drive (hotplug)
>>>> 3. try to copy something to the degraded raid1
>>>> 4. plug in the device again (hotplug)
>>>>
>>>> This scenario does not work. The disc array is NOT redundant! I can
>>>> not work with it while a drive is missing and I can not reattach the
>>>> device so that everything works again.
>>>>
>>>> The btrfs module crashes during the test.
>>>>
>>>> I am using LMDE2 with backports:
>>>> btrfs-tools 4.4-1~bpo8+1
>>>> linux-image-4.4.0-0.bpo.1-amd64
>>>>
>>>> Matthias
>>>>
>>>>
>>>> rakete - root - /root
>>>> 1# mount /mnt/raid1/
>>>>
>>>> Journal:
>>>>
>>>> Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): enabling auto
>>>> defrag
>>>> Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): disk space
>>>> caching is enabled
>>>> Apr 20 07:01:16 rakete kernel: BTRFS: has skinny extents
>>>>
>>>> rakete - root - /mnt/raid1
>>>> 3# ll
>>>> total 0
>>>> drwxrwxr-x 1 root root 36 Nov 14 2014 AfterShot2(64-bit)
>>>> drwxrwxr-x 1 root root 5082 Apr 17 09:06 etc
>>>> drwxr-xr-x 1 root root 108 Mar 24 07:31 var
>>>>
>>>> 4# btrfs fi show
>>>> Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>>>> Total devices 3 FS bytes used 1.60GiB
>>>> devid 1 size 698.64GiB used 3.03GiB path /dev/sdg
>>>> devid 2 size 465.76GiB used 3.03GiB path /dev/sdh
>>>> devid 3 size 232.88GiB used 0.00B path /dev/sdi
>>>>
>>>> ####
>>>> unplug device sdg:
>>>>
>>>> Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical
>>>> block 243826688, lost sync page write
>>>> Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating
>>>> journal superblock for sdf1-8.
>>>> Apr 20 07:03:05 rakete kernel: Aborting journal on device sdf1-8.
>>>> Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical
>>>> block 243826688, lost sync page write
>>>> Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating
>>>> journal superblock for sdf1-8.
>>>> Apr 20 07:03:05 rakete umount[16405]: umount: /mnt/raid1: target is busy
>>>> Apr 20 07:03:05 rakete umount[16405]: (In some cases useful info
>>>> about processes that
>>>> Apr 20 07:03:05 rakete umount[16405]: use the device is found by
>>>> lsof(8) or fuser(1).)
>>>> Apr 20 07:03:05 rakete systemd[1]: mnt-raid1.mount mount process
>>>> exited, code=exited status=32
>>>> Apr 20 07:03:05 rakete systemd[1]: Failed unmounting /mnt/raid1.
>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: new SuperSpeed USB device
>>>> number 3 using xhci_hcd
>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device found,
>>>> idVendor=152d, idProduct=0567
>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device strings:
>>>> Mfr=10, Product=11, SerialNumber=5
>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: Manufacturer: JMicron
>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: SerialNumber: 152D00539000
>>>> Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage
>>>> device detected
>>>> Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: Quirks match for
>>>> vid 152d pid 0567: 5000000
>>>> Apr 20 07:03:24 rakete kernel: scsi host9: usb-storage 3-1:1.0
>>>> Apr 20 07:03:24 rakete mtp-probe[16424]: checking bus 3, device 3:
>>>> "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
>>>> Apr 20 07:03:24 rakete mtp-probe[16424]: bus: 3, device: 3 was not an
>>>> MTP device
>>>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:0: Direct-Access WDC
>>>> WD20 02FAEX-007BA0 0125 PQ: 0 ANSI: 6
>>>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:1: Direct-Access WDC
>>>> WD50 01AALS-00L3B2 0125 PQ: 0 ANSI: 6
>>>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:2: Direct-Access
>>>> SAMSUNG SP2504C 0125 PQ: 0 ANSI: 6
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: Attached scsi generic sg6
>>>> type 0
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: Attached scsi generic sg7
>>>> type 0
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] 3907029168 512-byte
>>>> logical blocks: (2.00 TB/1.82 TiB)
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Write Protect is off
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Mode Sense: 67 00 10 08
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: Attached scsi generic sg8
>>>> type 0
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] 976773168 512-byte
>>>> logical blocks: (500 GB/466 GiB)
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] No Caching mode page
>>>> found
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Assuming drive
>>>> cache: write through
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Write Protect is off
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Mode Sense: 67 00 10 08
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] 488395055 512-byte
>>>> logical blocks: (250 GB/233 GiB)
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] No Caching mode page
>>>> found
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Assuming drive
>>>> cache: write through
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Write Protect is off
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Mode Sense: 67 00 10 08
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] No Caching mode page
>>>> found
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Assuming drive
>>>> cache: write through
>>>> Apr 20 07:03:25 rakete kernel: sdf: sdf1
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Attached SCSI disk
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Attached SCSI disk
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Attached SCSI disk
>>>> Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): recovery complete
>>>> Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): mounted filesystem
>>>> with ordered data mode. Opts: (null)
>>>> Apr 20 07:03:25 rakete udisksd[3671]: Error statting /dev/sdg: No
>>>> such file or directory
>>>>
>>>>
>>>> ####
>>>> 5# btrfs fi show
>>>> Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>>>> Total devices 3 FS bytes used 1.60GiB
>>>> devid 2 size 465.76GiB used 3.03GiB path /dev/sdj
>>>> devid 3 size 232.88GiB used 0.00B path /dev/sdk
>>>> *** Some devices missing
>>>> ####
>>>
>>> Here the names of *online* devices are changed
>>> (/dev/sdh => /dev/sdj, /dev/sdi => /dev/sdk) after just
>>> offlining a device (/dev/sdf). It's odd regardless of
>>> whether Btrfs works fine or not.
>>>
>>> Can anyone explain this behavior?
>>
>> FYI,
>>
>> I tried to reproduce this problem on a VM.
>> Here the USB storage devices are /dev/sd{a,b,c}.
>>
>> Steps to reproduce:
>>
>> 1. create a fs on /dev/sd{a,b,c}
>> 2. mount this fs
>> 3. Surprise unplug /dev/sdc
>> 4. Write to this fs till ENOSPC happens
>>
>> Then, although there were I/O errors on /dev/sdc,
>> the device names didn't change and no ro remount happened.
>>
>> command log:
>> =================================
>> # mkfs.btrfs -f -m raid1 -d raid1 /dev/sd{a,b,c}
>> btrfs-progs v4.5.1-41-g8202204-dirty
>> See http://btrfs.wiki.kernel.org for more information.
>>
>> Label: (null)
>> UUID: 16a54915-c807-42cf-8365-82c0780c5ab5
>> Node size: 16384
>> Sector size: 4096
>> Filesystem size: 15.00GiB
>> Block group profiles:
>> Data: RAID1 1.01GiB
>> Metadata: RAID1 1.01GiB
>> System: RAID1 12.00MiB
>> SSD detected: no
>> Incompat features: extref, skinny-metadata
>> Number of devices: 3
>> Devices:
>> ID SIZE PATH
>> 1 5.00GiB /dev/sda
>> 2 5.00GiB /dev/sdb
>> 3 5.00GiB /dev/sdc
>>
>> # mount /dev/sda /scratch_mnt/
>> # btrfs fi show /scratch_mnt/
>> Label: none uuid: 16a54915-c807-42cf-8365-82c0780c5ab5
>> Total devices 3 FS bytes used 640.00KiB
>> devid 1 size 5.00GiB used 2.00GiB path /dev/sda
>> devid 2 size 5.00GiB used 1.01GiB path /dev/sdb
>> devid 3 size 5.00GiB used 1.01GiB path /dev/sdc
>>
>> #
>> # # *** surprise unplug happens here ***
>> #
>> # btrfs fi show /scratch_mnt/
>
> Would you please post the output of "btrfs-debug-tree -t 3"?
>
> My guess is that there is no raid1 stripe on device 3, so all data/metadata allocation/COW happens without problems.
> The "btrfs-debug-tree -t 3" output would verify my guess.
OK, here it is.
btrfs-debug-tree -t 3 before cp:
===========================
btrfs-progs v4.5.1-41-g8202204-dirty
chunk tree
leaf 20987904 items 6 free space 15503 generation 5 owner 3
fs uuid 30771a06-e6a8-4cbc-a094-893049fa5060
chunk uuid 2325f1b9-1bf0-4247-8c29-7b179eabf1b2
item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 16185 itemsize 98
dev item devid 1 total_bytes 5368709120 bytes used 2147483648
dev uuid 06bc0993-39d3-4d9a-b484-760ae2150c3a
item 1 key (DEV_ITEMS DEV_ITEM 2) itemoff 16087 itemsize 98
dev item devid 2 total_bytes 5368709120 bytes used 1082130432
dev uuid 3868895f-295b-4a89-a01c-ad0f1c5ac758
item 2 key (DEV_ITEMS DEV_ITEM 3) itemoff 15989 itemsize 98
dev item devid 3 total_bytes 5368709120 bytes used 1082130432
dev uuid 911e8702-9428-4b8e-bc6d-d212e909a1ef
item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 20971520) itemoff 15877 itemsize 112
chunk length 8388608 owner 2 stripe_len 65536
type SYSTEM|RAID1 num_stripes 2
stripe 0 devid 3 offset 1048576
dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
stripe 1 devid 2 offset 1048576
dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 29360128) itemoff 15765 itemsize 112
chunk length 1073741824 owner 2 stripe_len 65536
type METADATA|RAID1 num_stripes 2
stripe 0 devid 1 offset 20971520
dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
stripe 1 devid 3 offset 9437184
dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 1103101952) itemoff 15653 itemsize 112
chunk length 1073741824 owner 2 stripe_len 65536
type DATA|RAID1 num_stripes 2
stripe 0 devid 2 offset 9437184
dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
stripe 1 devid 1 offset 1094713344
dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
total bytes 16106127360
bytes used 114688
uuid 30771a06-e6a8-4cbc-a094-893049fa5060
===========================
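As a sanity check on the dump above, each dev item's "bytes used" should match the summed length of the chunks that have a stripe on that device. The following is a minimal Python sketch of that arithmetic, assuming that every RAID1 stripe occupies the full chunk length on its device. The chunk lengths and devids are copied by hand from the dump; the names chunks_before_cp, dev_bytes_used and used_per_device are only illustrative.

```python
from collections import defaultdict

# (chunk length in bytes, devids holding a stripe), copied from the dump above
chunks_before_cp = [
    (8388608, (3, 2)),      # SYSTEM|RAID1
    (1073741824, (1, 3)),   # METADATA|RAID1
    (1073741824, (2, 1)),   # DATA|RAID1
]

# "bytes used" taken from the three dev items in the same dump
dev_bytes_used = {1: 2147483648, 2: 1082130432, 3: 1082130432}


def used_per_device(chunks):
    """Sum chunk lengths per devid (assumption: one full copy per stripe)."""
    used = defaultdict(int)
    for length, devids in chunks:
        for devid in devids:
            used[devid] += length
    return dict(used)


computed = used_per_device(chunks_before_cp)
for devid in sorted(dev_bytes_used):
    # Print the computed total and whether it agrees with the dev item.
    print(devid, computed[devid], computed[devid] == dev_bytes_used[devid])
```

If the assumption holds, all three devices should compare equal, which suggests the chunk items and dev items in this dump are consistent with each other.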
Here I hot unplug devid 2 (/dev/sdb).
btrfs-debug-tree -t 3 after cp (which causes ENOSPC):
===========================
btrfs-progs v4.5.1-41-g8202204-dirty
warning, device 2 is missing
chunk tree
leaf 20987904 items 11 free space 14818 generation 9 owner 3
fs uuid 30771a06-e6a8-4cbc-a094-893049fa5060
chunk uuid 2325f1b9-1bf0-4247-8c29-7b179eabf1b2
item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 16185 itemsize 98
dev item devid 1 total_bytes 5368709120 bytes used 4294967296
dev uuid 06bc0993-39d3-4d9a-b484-760ae2150c3a
item 1 key (DEV_ITEMS DEV_ITEM 2) itemoff 16087 itemsize 98
dev item devid 2 total_bytes 5368709120 bytes used 5367660544
dev uuid 3868895f-295b-4a89-a01c-ad0f1c5ac758
item 2 key (DEV_ITEMS DEV_ITEM 3) itemoff 15989 itemsize 98
dev item devid 3 total_bytes 5368709120 bytes used 5367660544
dev uuid 911e8702-9428-4b8e-bc6d-d212e909a1ef
item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 20971520) itemoff 15877 itemsize 112
chunk length 8388608 owner 2 stripe_len 65536
type SYSTEM|RAID1 num_stripes 2
stripe 0 devid 3 offset 1048576
dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
stripe 1 devid 2 offset 1048576
dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 29360128) itemoff 15765 itemsize 112
chunk length 1073741824 owner 2 stripe_len 65536
type METADATA|RAID1 num_stripes 2
stripe 0 devid 1 offset 20971520
dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
stripe 1 devid 3 offset 9437184
dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 1103101952) itemoff 15653 itemsize 112
chunk length 1073741824 owner 2 stripe_len 65536
type DATA|RAID1 num_stripes 2
stripe 0 devid 2 offset 9437184
dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
stripe 1 devid 1 offset 1094713344
dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
item 6 key (FIRST_CHUNK_TREE CHUNK_ITEM 2176843776) itemoff 15541 itemsize 112
chunk length 1073741824 owner 2 stripe_len 65536
type DATA|RAID1 num_stripes 2
stripe 0 devid 2 offset 1083179008
dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
stripe 1 devid 3 offset 1083179008
dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
item 7 key (FIRST_CHUNK_TREE CHUNK_ITEM 3250585600) itemoff 15429 itemsize 112
chunk length 1073741824 owner 2 stripe_len 65536
type DATA|RAID1 num_stripes 2
stripe 0 devid 1 offset 2168455168
dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
stripe 1 devid 3 offset 2156920832
dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
item 8 key (FIRST_CHUNK_TREE CHUNK_ITEM 4324327424) itemoff 15317 itemsize 112
chunk length 1073741824 owner 2 stripe_len 65536
type DATA|RAID1 num_stripes 2
stripe 0 devid 2 offset 2156920832
dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
stripe 1 devid 1 offset 3242196992
dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
item 9 key (FIRST_CHUNK_TREE CHUNK_ITEM 5398069248) itemoff 15205 itemsize 112
chunk length 1073741824 owner 2 stripe_len 65536
type DATA|RAID1 num_stripes 2
stripe 0 devid 2 offset 3230662656
dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
stripe 1 devid 3 offset 3230662656
dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
item 10 key (FIRST_CHUNK_TREE CHUNK_ITEM 6471811072) itemoff 15093 itemsize 112
chunk length 1064304640 owner 2 stripe_len 65536
type DATA|RAID1 num_stripes 2
stripe 0 devid 2 offset 4304404480
dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
stripe 1 devid 3 offset 4304404480
dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
total bytes 16106127360
bytes used 6711709696
uuid 30771a06-e6a8-4cbc-a094-893049fa5060
===========================
Both before and after cp, there are
chunks with a stripe on /dev/sdb (devid 2).
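
To check a dump like the ones above without reading it by eye, the stripes can be grouped per chunk and filtered by devid. The following is a minimal Python sketch, not part of btrfs-progs. It assumes the text layout printed by btrfs-debug-tree as shown above; the names parse_chunks and chunks_on_device are hypothetical, and SAMPLE holds just two chunk items copied from the "after cp" dump.

```python
import re

# Patterns for the lines of a chunk item as they appear in the dump above.
CHUNK_KEY = re.compile(r"^item \d+ key \(FIRST_CHUNK_TREE CHUNK_ITEM (\d+)\)")
CHUNK_TYPE = re.compile(r"^type (\S+) num_stripes \d+")
STRIPE = re.compile(r"^stripe \d+ devid (\d+) offset \d+")

SAMPLE = """\
item 7 key (FIRST_CHUNK_TREE CHUNK_ITEM 3250585600) itemoff 15429 itemsize 112
chunk length 1073741824 owner 2 stripe_len 65536
type DATA|RAID1 num_stripes 2
stripe 0 devid 1 offset 2168455168
dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
stripe 1 devid 3 offset 2156920832
dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
item 10 key (FIRST_CHUNK_TREE CHUNK_ITEM 6471811072) itemoff 15093 itemsize 112
chunk length 1064304640 owner 2 stripe_len 65536
type DATA|RAID1 num_stripes 2
stripe 0 devid 2 offset 4304404480
dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
stripe 1 devid 3 offset 4304404480
dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
"""


def parse_chunks(dump):
    """Return one dict per CHUNK_ITEM with its logical start, type and devids."""
    chunks, current = [], None
    for raw in dump.splitlines():
        line = raw.strip()
        m = CHUNK_KEY.match(line)
        if m:
            current = {"logical": int(m.group(1)), "type": None, "devids": []}
            chunks.append(current)
        elif line.startswith("item "):
            current = None  # some other item (e.g. DEV_ITEM): not a chunk
        elif current is not None:
            m = CHUNK_TYPE.match(line)
            if m:
                current["type"] = m.group(1)
                continue
            m = STRIPE.match(line)
            if m:
                current["devids"].append(int(m.group(1)))
    return chunks


def chunks_on_device(chunks, devid):
    """Keep only the chunks that have at least one stripe on devid."""
    return [c for c in chunks if devid in c["devids"]]


for chunk in chunks_on_device(parse_chunks(SAMPLE), 2):
    print(chunk["logical"], chunk["type"], chunk["devids"])
```

Fed the full "after cp" dump instead of SAMPLE, this should report six chunks with a stripe on devid 2. Four of them (2176843776, 4324327424, 5398069248 and 6471811072) do not appear in the "before cp" dump.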
Thanks,
Satoru
>
> Thanks,
> Qu
>> Label: none uuid: 16a54915-c807-42cf-8365-82c0780c5ab5
>> Total devices 3 FS bytes used 1.81GiB
>> devid 1 size 5.00GiB used 2.00GiB path /dev/sda
>> devid 2 size 5.00GiB used 2.01GiB path /dev/sdb
>> *** Some devices missing
>>
>> # cp -a linux /scratch_mnt/
>> # cp -a linux /scratch_mnt/linux.2
>> # cp -a linux /scratch_mnt/linux.3
>> cp: error writing ‘/scratch_mnt/linux.3/drivers/scsi/lpfc/lpfc_els.c’:
>> No space left on device
>> ...
>> # mount | grep scratch
>> /dev/sda on /scratch_mnt type btrfs
>> (rw,relatime,seclabel,space_cache,subvolid=5,subvol=/)
>> # dmesg | tail
>> [ 1400.778705] BTRFS warning (device sdc): lost page write due to IO
>> error on /dev/sdc
>> [ 1438.604796] btrfs_dev_stat_print_on_error: 174 callbacks suppressed
>> [ 1438.604803] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125633,
>> rd 1, flush 276, corrupt 0, gen 0
>> [ 1438.609782] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125634,
>> rd 1, flush 276, corrupt 0, gen 0
>> [ 1438.613331] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125634,
>> rd 1, flush 277, corrupt 0, gen 0
>> [ 1438.669090] btrfs_end_buffer_write_sync: 52 callbacks suppressed
>> [ 1438.669095] BTRFS warning (device sdc): lost page write due to IO
>> error on /dev/sdc
>> [ 1438.669098] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125635,
>> rd 1, flush 277, corrupt 0, gen 0
>> [ 1438.672621] BTRFS warning (device sdc): lost page write due to IO
>> error on /dev/sdc
>> [ 1438.672626] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125636,
>> rd 1, flush 277, corrupt 0, gen 0
>> =================================
>>
>> Thanks,
>> Satoru
>>
>>>
>>> Thanks,
>>> Satoru
>>>
>>>> still mounted in rw mode:
>>>> /dev/sdg on /mnt/raid1 type btrfs
>>>> (rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
>>>> ####
>>>> 7# cp -r /root/ .
>>>> cp: cannot create directory './root': Input/output error
>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>> /dev/sdg errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>> /dev/sdg errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>> /dev/sdg errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>> /dev/sdg errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>> /dev/sdg errs: wr 0, rd 5, flush 0, corrupt 0, gen 0
>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>> /dev/sdg errs: wr 0, rd 6, flush 0, corrupt 0, gen 0
>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>> /dev/sdg errs: wr 0, rd 7, flush 0, corrupt 0, gen 0
>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>> /dev/sdg errs: wr 0, rd 8, flush 0, corrupt 0, gen 0
>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>> /dev/sdg errs: wr 0, rd 9, flush 0, corrupt 0, gen 0
>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>> /dev/sdg errs: wr 0, rd 10, flush 0, corrupt 0, gen 0
>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): error
>>>> reading free space cache
>>>> Apr 20 07:05:37 rakete kernel: BTRFS warning (device sdi): failed to
>>>> load free space cache for block group 20497563648, rebuilding it now
>>>> Apr 20 07:05:37 rakete kernel: ------------[ cut here ]------------
>>>> Apr 20 07:05:37 rakete kernel: WARNING: CPU: 7 PID: 16738 at
>>>> /build/linux-H3jpF0/linux-4.4.6/fs/btrfs/ctree.c:1156
>>>> __btrfs_cow_block+0x56f/0x5e0 [btrfs]()
>>>> Apr 20 07:05:37 rakete kernel: BTRFS: Transaction aborted (error -5)
>>>> Apr 20 07:05:37 rakete kernel: Modules linked in: uas usb_storage
>>>> pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) binfmt_misc dvb_ttpci
>>>> saa7146_vv ttpci_eeprom saa7146 videobuf_dma_sg videobuf_core
>>>> dvb_core v4l2_common videodev media cfg80211 vboxdrv(O)
>>>> cpufreq_powersave cpufreq_conservative cpufreq_userspace
>>>> cpufreq_stats snd_hda_codec_hdmi intel_rapl iosf_mbi
>>>> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
>>>> irqbypass crct10dif_pclmul crc32_pclmul eeepc_wmi asus_wmi joydev
>>>> sparse_keymap drbg iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek
>>>> rfkill ansi_cprng snd_hda_codec_generic nvidia(PO) aesni_intel
>>>> aes_x86_64 lrw gf128mul snd_hda_intel glue_helper ablk_helper
>>>> snd_hda_codec cryptd snd_hda_core serio_raw pcspkr snd_hwdep snd_pcm
>>>> i2c_i801 snd_timer snd lpc_ich soundcore 8250_fintek mei_me shpchp mei
>>>> Apr 20 07:05:37 rakete kernel: mfd_core battery tpm_tis tpm evdev
>>>> processor drm fuse ecryptfs cbc sha256_ssse3 sha256_generic hmac
>>>> encrypted_keys parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache
>>>> jbd2 btrfs raid456 async_raid6_recov async_memcpy async_pq async_xor
>>>> async_tx xor hid_generic usbhid hid raid6_pq libcrc32c crc32c_generic
>>>> md_mod dm_mirror dm_region_hash dm_log dm_mod sr_mod sg cdrom sd_mod
>>>> ata_generic ahci libahci pata_via xhci_pci ehci_pci crc32c_intel
>>>> xhci_hcd ehci_hcd libata psmouse scsi_mod atl1c usbcore usb_common
>>>> fjes video wmi fan thermal button
>>>> Apr 20 07:05:37 rakete kernel: CPU: 7 PID: 16738 Comm: cp Tainted:
>>>> P O 4.4.0-0.bpo.1-amd64 #1 Debian 4.4.6-1~bpo8+1
>>>> Apr 20 07:05:37 rakete kernel: Hardware name: System manufacturer
>>>> System Product Name/P8H67-V, BIOS 3707 07/12/2013
>>>> Apr 20 07:05:37 rakete kernel: 0000000000000286 000000006a1407c8
>>>> ffffffff812ed425 ffff88016b6dfb90
>>>> Apr 20 07:05:37 rakete kernel: ffffffffa03817b8 ffffffff81077ea1
>>>> ffff88018e7fcd30 ffff88016b6dfbe8
>>>> Apr 20 07:05:37 rakete kernel: ffff88005d863e88 ffff8801cde7a980
>>>> ffff88018e7fce48 ffffffff81077f2c
>>>> Apr 20 07:05:37 rakete kernel: Call Trace:
>>>> Apr 20 07:05:37 rakete kernel: [<ffffffff812ed425>] ?
>>>> dump_stack+0x5c/0x77
>>>> Apr 20 07:05:37 rakete kernel: [<ffffffff81077ea1>] ?
>>>> warn_slowpath_common+0x81/0xb0
>>>> Apr 20 07:05:37 rakete kernel: [<ffffffff81077f2c>] ?
>>>> warn_slowpath_fmt+0x5c/0x80
>>>> Apr 20 07:05:37 rakete kernel: [<ffffffffa02d74af>] ?
>>>> __btrfs_cow_block+0x56f/0x5e0 [btrfs]
>>>> Apr 20 07:05:37 rakete kernel: [<ffffffffa02d76af>] ?
>>>> btrfs_cow_block+0x10f/0x1d0 [btrfs]
>>>> Apr 20 07:05:37 rakete kernel: [<ffffffffa02db2cd>] ?
>>>> btrfs_search_slot+0x1fd/0xa30 [btrfs]
>>>> Apr 20 07:05:37 rakete kernel: [<ffffffffa02dd3f1>] ?
>>>> btrfs_insert_empty_items+0x71/0xc0 [btrfs]
>>>> Apr 20 07:05:37 rakete kernel: [<ffffffff811f4d92>] ?
>>>> insert_inode_locked4+0xa2/0x1c0
>>>> Apr 20 07:05:37 rakete kernel: [<ffffffffa030ee5d>] ?
>>>> btrfs_new_inode+0x1cd/0x590 [btrfs]
>>>> Apr 20 07:05:37 rakete kernel: [<ffffffffa0310a77>] ?
>>>> btrfs_mkdir+0x107/0x1f0 [btrfs]
>>>> Apr 20 07:05:37 rakete kernel: [<ffffffff811e80b0>] ?
>>>> vfs_mkdir+0xb0/0x140
>>>> Apr 20 07:05:37 rakete kernel: [<ffffffff811e9d3e>] ?
>>>> SyS_mkdir+0xce/0x110
>>>> Apr 20 07:05:37 rakete kernel: [<ffffffff81592736>] ?
>>>> system_call_fast_compare_end+0xc/0x6b
>>>> Apr 20 07:05:37 rakete kernel: ---[ end trace 025eb0e83ffed96f ]---
>>>> Apr 20 07:05:37 rakete kernel: BTRFS: error (device sdi) in
>>>> __btrfs_cow_block:1156: errno=-5 IO failure
>>>> Apr 20 07:05:37 rakete kernel: BTRFS info (device sdi): forced readonly
>>>>
>>>> ####
>>>> Try to copy again:
>>>> 11# cp -r /root/ .
>>>> cp: cannot create directory './root': Read-only file system
>>>> ####
>>>> /dev/sdg on /mnt/raid1 type btrfs
>>>> (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
>>>> ####
>>>> plug in device sdg again:
>>>>
>>>> Apr 20 07:07:39 rakete udisksd[3671]: Cleaning up mount point
>>>> /media/matthias/BACKUP (device 8:81 no longer exist)
>>>> Apr 20 07:07:39 rakete kernel: usb 3-1: USB disconnect, device number 3
>>>> Apr 20 07:07:39 rakete udisksd[3671]: Error statting /dev/sdg: No
>>>> such file or directory
>>>> Apr 20 07:07:39 rakete umount[16807]: umount: /mnt/raid1: target is busy
>>>> Apr 20 07:07:39 rakete umount[16807]: (In some cases useful info
>>>> about processes that
>>>> Apr 20 07:07:39 rakete umount[16807]: use the device is found by
>>>> lsof(8) or fuser(1).)
>>>> Apr 20 07:07:39 rakete systemd[1]: mnt-raid1.mount mount process
>>>> exited, code=exited status=32
>>>> Apr 20 07:07:39 rakete systemd[1]: Failed unmounting /mnt/raid1.
>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: new SuperSpeed USB device
>>>> number 4 using xhci_hcd
>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: New USB device found,
>>>> idVendor=152d, idProduct=0567
>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: New USB device strings:
>>>> Mfr=10, Product=11, SerialNumber=5
>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: Manufacturer: JMicron
>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: SerialNumber: 152D00539000
>>>> Apr 20 07:08:01 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage
>>>> device detected
>>>> Apr 20 07:08:01 rakete kernel: usb-storage 3-1:1.0: Quirks match for
>>>> vid 152d pid 0567: 5000000
>>>> Apr 20 07:08:01 rakete kernel: scsi host10: usb-storage 3-1:1.0
>>>> Apr 20 07:08:01 rakete mtp-probe[16826]: checking bus 3, device 4:
>>>> "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
>>>> Apr 20 07:08:01 rakete mtp-probe[16826]: bus: 3, device: 4 was not an
>>>> MTP device
>>>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:0: Direct-Access WDC
>>>> WD20 02FAEX-007BA0 0125 PQ: 0 ANSI: 6
>>>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:1: Direct-Access WDC
>>>> WD75 00AACS-00C7B0 0125 PQ: 0 ANSI: 6
>>>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:2: Direct-Access WDC
>>>> WD50 01AALS-00L3B2 0125 PQ: 0 ANSI: 6
>>>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:3: Direct-Access
>>>> SAMSUNG SP2504C 0125 PQ: 0 ANSI: 6
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: Attached scsi generic sg6
>>>> type 0
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] 3907029168 512-byte
>>>> logical blocks: (2.00 TB/1.82 TiB)
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Write Protect is off
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Mode Sense: 67 00
>>>> 10 08
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: Attached scsi generic sg7
>>>> type 0
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] No Caching mode
>>>> page found
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Assuming drive
>>>> cache: write through
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] 1465149168 512-byte
>>>> logical blocks: (750 GB/699 GiB)
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: Attached scsi generic sg8
>>>> type 0
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Write Protect is off
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Mode Sense: 67 00
>>>> 10 08
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: Attached scsi generic sg9
>>>> type 0
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] 976773168 512-byte
>>>> logical blocks: (500 GB/466 GiB)
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] No Caching mode
>>>> page found
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Assuming drive
>>>> cache: write through
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Write Protect is off
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Mode Sense: 67 00
>>>> 10 08
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] 488395055 512-byte
>>>> logical blocks: (250 GB/233 GiB)
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] No Caching mode
>>>> page found
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Assuming drive
>>>> cache: write through
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Write Protect is off
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Mode Sense: 67 00
>>>> 10 08
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] No Caching mode
>>>> page found
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Assuming drive
>>>> cache: write through
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Attached SCSI disk
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Attached SCSI disk
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Attached SCSI disk
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Attached SCSI disk
>>>> Apr 20 07:08:02 rakete kernel: EXT4-fs (sdf1): recovery complete
>>>> Apr 20 07:08:02 rakete kernel: EXT4-fs (sdf1): mounted filesystem
>>>> with ordered data mode. Opts: (null)
>>>>
>>>> ####
>>>> still ro mode
>>>> /dev/sdj on /mnt/raid1 type btrfs
>>>> (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
>>>> ####
>>>> 14# btrfs fi show
>>>> Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>>>> Total devices 3 FS bytes used 1.60GiB
>>>> devid 1 size 698.64GiB used 3.03GiB path /dev/sdj
>>>> devid 2 size 465.76GiB used 3.03GiB path /dev/sdk
>>>> devid 3 size 232.88GiB used 0.00B path /dev/sdl
>>>> ####
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe
>>>> linux-btrfs" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Question: raid1 behaviour on failure
2016-04-22 2:21 ` Satoru Takeuchi
@ 2016-04-22 5:32 ` Qu Wenruo
2016-04-22 6:17 ` Satoru Takeuchi
0 siblings, 1 reply; 32+ messages in thread
From: Qu Wenruo @ 2016-04-22 5:32 UTC (permalink / raw)
To: Satoru Takeuchi, Qu Wenruo, Matthias Bodenbinder, linux-btrfs
Satoru Takeuchi wrote on 2016/04/22 11:21 +0900:
> On 2016/04/21 20:58, Qu Wenruo wrote:
>>
>>
>> On 04/21/2016 03:45 PM, Satoru Takeuchi wrote:
>>> On 2016/04/21 15:23, Satoru Takeuchi wrote:
>>>> On 2016/04/20 14:17, Matthias Bodenbinder wrote:
>>>>> On 18.04.2016 at 09:22, Qu Wenruo wrote:
>>>>>> BTW, it would be better to post the dmesg for better debug.
>>>>>
>>>>> So here we go. I did the same test again. Here is a full log of what I
>>>>> did. It seems to me like a bug in btrfs.
>>>>> Sequence of events:
>>>>> 1. mount the raid1 (2 discs of different sizes)
>>>>> 2. unplug the biggest drive (hotplug)
>>>>> 3. try to copy something to the degraded raid1
>>>>> 4. plug in the device again (hotplug)
>>>>>
>>>>> This scenario does not work. The disc array is NOT redundant! I can
>>>>> not work with it while a drive is missing and I can not reattach the
>>>>> device so that everything works again.
>>>>>
>>>>> The btrfs module crashes during the test.
>>>>>
>>>>> I am using LMDE2 with backports:
>>>>> btrfs-tools 4.4-1~bpo8+1
>>>>> linux-image-4.4.0-0.bpo.1-amd64
>>>>>
>>>>> Matthias
>>>>>
>>>>>
>>>>> rakete - root - /root
>>>>> 1# mount /mnt/raid1/
>>>>>
>>>>> Journal:
>>>>>
>>>>> Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): enabling auto
>>>>> defrag
>>>>> Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): disk space
>>>>> caching is enabled
>>>>> Apr 20 07:01:16 rakete kernel: BTRFS: has skinny extents
>>>>>
>>>>> rakete - root - /mnt/raid1
>>>>> 3# ll
>>>>> total 0
>>>>> drwxrwxr-x 1 root root 36 Nov 14 2014 AfterShot2(64-bit)
>>>>> drwxrwxr-x 1 root root 5082 Apr 17 09:06 etc
>>>>> drwxr-xr-x 1 root root 108 Mar 24 07:31 var
>>>>>
>>>>> 4# btrfs fi show
>>>>> Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>>>>> Total devices 3 FS bytes used 1.60GiB
>>>>> devid 1 size 698.64GiB used 3.03GiB path /dev/sdg
>>>>> devid 2 size 465.76GiB used 3.03GiB path /dev/sdh
>>>>> devid 3 size 232.88GiB used 0.00B path /dev/sdi
>>>>>
>>>>> ####
>>>>> unplug device sdg:
>>>>>
>>>>> Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical
>>>>> block 243826688, lost sync page write
>>>>> Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating
>>>>> journal superblock for sdf1-8.
>>>>> Apr 20 07:03:05 rakete kernel: Aborting journal on device sdf1-8.
>>>>> Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical
>>>>> block 243826688, lost sync page write
>>>>> Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating
>>>>> journal superblock for sdf1-8.
>>>>> Apr 20 07:03:05 rakete umount[16405]: umount: /mnt/raid1: target is
>>>>> busy
>>>>> Apr 20 07:03:05 rakete umount[16405]: (In some cases useful info
>>>>> about processes that
>>>>> Apr 20 07:03:05 rakete umount[16405]: use the device is found by
>>>>> lsof(8) or fuser(1).)
>>>>> Apr 20 07:03:05 rakete systemd[1]: mnt-raid1.mount mount process
>>>>> exited, code=exited status=32
>>>>> Apr 20 07:03:05 rakete systemd[1]: Failed unmounting /mnt/raid1.
>>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: new SuperSpeed USB device
>>>>> number 3 using xhci_hcd
>>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device found,
>>>>> idVendor=152d, idProduct=0567
>>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device strings:
>>>>> Mfr=10, Product=11, SerialNumber=5
>>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI
>>>>> Bridge
>>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: Manufacturer: JMicron
>>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: SerialNumber: 152D00539000
>>>>> Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage
>>>>> device detected
>>>>> Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: Quirks match for
>>>>> vid 152d pid 0567: 5000000
>>>>> Apr 20 07:03:24 rakete kernel: scsi host9: usb-storage 3-1:1.0
>>>>> Apr 20 07:03:24 rakete mtp-probe[16424]: checking bus 3, device 3:
>>>>> "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
>>>>> Apr 20 07:03:24 rakete mtp-probe[16424]: bus: 3, device: 3 was not an
>>>>> MTP device
>>>>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:0: Direct-Access WDC
>>>>> WD20 02FAEX-007BA0 0125 PQ: 0 ANSI: 6
>>>>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:1: Direct-Access WDC
>>>>> WD50 01AALS-00L3B2 0125 PQ: 0 ANSI: 6
>>>>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:2: Direct-Access
>>>>> SAMSUNG SP2504C 0125 PQ: 0 ANSI: 6
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: Attached scsi generic sg6
>>>>> type 0
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: Attached scsi generic sg7
>>>>> type 0
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] 3907029168 512-byte
>>>>> logical blocks: (2.00 TB/1.82 TiB)
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Write Protect is off
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Mode Sense: 67 00
>>>>> 10 08
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: Attached scsi generic sg8
>>>>> type 0
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] 976773168 512-byte
>>>>> logical blocks: (500 GB/466 GiB)
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] No Caching mode page
>>>>> found
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Assuming drive
>>>>> cache: write through
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Write Protect is off
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Mode Sense: 67 00
>>>>> 10 08
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] 488395055 512-byte
>>>>> logical blocks: (250 GB/233 GiB)
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] No Caching mode page
>>>>> found
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Assuming drive
>>>>> cache: write through
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Write Protect is off
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Mode Sense: 67 00
>>>>> 10 08
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] No Caching mode page
>>>>> found
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Assuming drive
>>>>> cache: write through
>>>>> Apr 20 07:03:25 rakete kernel: sdf: sdf1
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Attached SCSI disk
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Attached SCSI disk
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Attached SCSI disk
>>>>> Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): recovery complete
>>>>> Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): mounted filesystem
>>>>> with ordered data mode. Opts: (null)
>>>>> Apr 20 07:03:25 rakete udisksd[3671]: Error statting /dev/sdg: No
>>>>> such file or directory
>>>>>
>>>>>
>>>>> ####
>>>>> 5# btrfs fi show
>>>>> Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>>>>> Total devices 3 FS bytes used 1.60GiB
>>>>> devid 2 size 465.76GiB used 3.03GiB path /dev/sdj
>>>>> devid 3 size 232.88GiB used 0.00B path /dev/sdk
>>>>> *** Some devices missing
>>>>> ####
>>>>
>>>> Here the names of *online* devices are changed
>>>> (/dev/sdh => /dev/sdj, /dev/sdi => /dev/sdk) after just
>>>> offlining a device (/dev/sdf). It's odd regardless of
>>>> whether Btrfs works fine or not.
>>>>
>>>> Can anyone explain this behavior?
>>>
>>> FYI,
>>>
>>> I tried to reproduce this problem on a VM.
>>> Here the USB storage devices are /dev/sd{a,b,c}.
>>>
>>> Steps to reproduce:
>>>
>>> 1. create a fs on /dev/sd{a,b,c}
>>> 2. mount this fs
>>> 3. Surprise unplug /dev/sdc
>>> 4. Write to this fs till ENOSPC happens
>>>
>>> Then, although there were I/O errors on /dev/sdc,
>>> the device names didn't change and no ro remount happened.
>>>
>>> command log:
>>> =================================
>>> # mkfs.btrfs -f -m raid1 -d raid1 /dev/sd{a,b,c}
>>> btrfs-progs v4.5.1-41-g8202204-dirty
>>> See http://btrfs.wiki.kernel.org for more information.
>>>
>>> Label: (null)
>>> UUID: 16a54915-c807-42cf-8365-82c0780c5ab5
>>> Node size: 16384
>>> Sector size: 4096
>>> Filesystem size: 15.00GiB
>>> Block group profiles:
>>> Data: RAID1 1.01GiB
>>> Metadata: RAID1 1.01GiB
>>> System: RAID1 12.00MiB
>>> SSD detected: no
>>> Incompat features: extref, skinny-metadata
>>> Number of devices: 3
>>> Devices:
>>> ID SIZE PATH
>>> 1 5.00GiB /dev/sda
>>> 2 5.00GiB /dev/sdb
>>> 3 5.00GiB /dev/sdc
>>>
>>> # mount /dev/sda /scratch_mnt/
>>> # btrfs fi show /scratch_mnt/
>>> Label: none uuid: 16a54915-c807-42cf-8365-82c0780c5ab5
>>> Total devices 3 FS bytes used 640.00KiB
>>> devid 1 size 5.00GiB used 2.00GiB path /dev/sda
>>> devid 2 size 5.00GiB used 1.01GiB path /dev/sdb
>>> devid 3 size 5.00GiB used 1.01GiB path /dev/sdc
>>>
>>> #
>>> # # *** surprise unplug happens here ***
>>> #
>>> # btrfs fi show /scratch_mnt/
>>
>> Would you please post the output of "btrfs-debug-tree -t 3"?
>>
>> My guess is that there is no raid1 stripe on device 3,
>> so all data/metadata allocation/COW happens without problems.
>> The "btrfs-debug-tree -t 3" output would verify my guess.
>
> OK, here it is.
>
> btrfs-debug-tree -t 3 before cp:
> ===========================
> btrfs-progs v4.5.1-41-g8202204-dirty
> chunk tree
> leaf 20987904 items 6 free space 15503 generation 5 owner 3
> fs uuid 30771a06-e6a8-4cbc-a094-893049fa5060
> chunk uuid 2325f1b9-1bf0-4247-8c29-7b179eabf1b2
> item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 16185 itemsize 98
> dev item devid 1 total_bytes 5368709120 bytes used 2147483648
> dev uuid 06bc0993-39d3-4d9a-b484-760ae2150c3a
> item 1 key (DEV_ITEMS DEV_ITEM 2) itemoff 16087 itemsize 98
> dev item devid 2 total_bytes 5368709120 bytes used 1082130432
> dev uuid 3868895f-295b-4a89-a01c-ad0f1c5ac758
> item 2 key (DEV_ITEMS DEV_ITEM 3) itemoff 15989 itemsize 98
> dev item devid 3 total_bytes 5368709120 bytes used 1082130432
> dev uuid 911e8702-9428-4b8e-bc6d-d212e909a1ef
> item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 20971520) itemoff 15877
> itemsize 112
> chunk length 8388608 owner 2 stripe_len 65536
> type SYSTEM|RAID1 num_stripes 2
> stripe 0 devid 3 offset 1048576
> dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
> stripe 1 devid 2 offset 1048576
> dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
> item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 29360128) itemoff 15765
> itemsize 112
> chunk length 1073741824 owner 2 stripe_len 65536
> type METADATA|RAID1 num_stripes 2
> stripe 0 devid 1 offset 20971520
> dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
> stripe 1 devid 3 offset 9437184
> dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
> item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 1103101952) itemoff 15653
> itemsize 112
> chunk length 1073741824 owner 2 stripe_len 65536
> type DATA|RAID1 num_stripes 2
> stripe 0 devid 2 offset 9437184
> dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
> stripe 1 devid 1 offset 1094713344
> dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
> total bytes 16106127360
> bytes used 114688
> uuid 30771a06-e6a8-4cbc-a094-893049fa5060
> ===========================
>
>
>
> Here I hot unplug devid 2 (/dev/sdb).
>
>
>
> btrfs-debug-tree -t 3 after cp (which causes ENOSPC):
> ===========================
> btrfs-progs v4.5.1-41-g8202204-dirty
> warning, device 2 is missing
> chunk tree
> leaf 20987904 items 11 free space 14818 generation 9 owner 3
> fs uuid 30771a06-e6a8-4cbc-a094-893049fa5060
> chunk uuid 2325f1b9-1bf0-4247-8c29-7b179eabf1b2
> item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 16185 itemsize 98
> dev item devid 1 total_bytes 5368709120 bytes used 4294967296
> dev uuid 06bc0993-39d3-4d9a-b484-760ae2150c3a
> item 1 key (DEV_ITEMS DEV_ITEM 2) itemoff 16087 itemsize 98
> dev item devid 2 total_bytes 5368709120 bytes used 5367660544
> dev uuid 3868895f-295b-4a89-a01c-ad0f1c5ac758
> item 2 key (DEV_ITEMS DEV_ITEM 3) itemoff 15989 itemsize 98
> dev item devid 3 total_bytes 5368709120 bytes used 5367660544
> dev uuid 911e8702-9428-4b8e-bc6d-d212e909a1ef
> item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 20971520) itemoff 15877
> itemsize 112
> chunk length 8388608 owner 2 stripe_len 65536
> type SYSTEM|RAID1 num_stripes 2
> stripe 0 devid 3 offset 1048576
> dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
> stripe 1 devid 2 offset 1048576
> dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
> item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 29360128) itemoff 15765
> itemsize 112
> chunk length 1073741824 owner 2 stripe_len 65536
> type METADATA|RAID1 num_stripes 2
> stripe 0 devid 1 offset 20971520
> dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
> stripe 1 devid 3 offset 9437184
> dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
> item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 1103101952) itemoff 15653
> itemsize 112
> chunk length 1073741824 owner 2 stripe_len 65536
> type DATA|RAID1 num_stripes 2
> stripe 0 devid 2 offset 9437184
> dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
> stripe 1 devid 1 offset 1094713344
> dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
> item 6 key (FIRST_CHUNK_TREE CHUNK_ITEM 2176843776) itemoff 15541
> itemsize 112
> chunk length 1073741824 owner 2 stripe_len 65536
> type DATA|RAID1 num_stripes 2
> stripe 0 devid 2 offset 1083179008
> dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
> stripe 1 devid 3 offset 1083179008
> dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
> item 7 key (FIRST_CHUNK_TREE CHUNK_ITEM 3250585600) itemoff 15429
> itemsize 112
> chunk length 1073741824 owner 2 stripe_len 65536
> type DATA|RAID1 num_stripes 2
> stripe 0 devid 1 offset 2168455168
> dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
> stripe 1 devid 3 offset 2156920832
> dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
> item 8 key (FIRST_CHUNK_TREE CHUNK_ITEM 4324327424) itemoff 15317
> itemsize 112
> chunk length 1073741824 owner 2 stripe_len 65536
> type DATA|RAID1 num_stripes 2
> stripe 0 devid 2 offset 2156920832
> dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
> stripe 1 devid 1 offset 3242196992
> dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
> item 9 key (FIRST_CHUNK_TREE CHUNK_ITEM 5398069248) itemoff 15205
> itemsize 112
> chunk length 1073741824 owner 2 stripe_len 65536
> type DATA|RAID1 num_stripes 2
> stripe 0 devid 2 offset 3230662656
> dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
> stripe 1 devid 3 offset 3230662656
> dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
> item 10 key (FIRST_CHUNK_TREE CHUNK_ITEM 6471811072) itemoff 15093
> itemsize 112
> chunk length 1064304640 owner 2 stripe_len 65536
> type DATA|RAID1 num_stripes 2
> stripe 0 devid 2 offset 4304404480
> dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
> stripe 1 devid 3 offset 4304404480
> dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
> total bytes 16106127360
> bytes used 6711709696
> uuid 30771a06-e6a8-4cbc-a094-893049fa5060
> ===========================
>
> In both before cp and after cp, there are
> chunks containing /dev/sdb (devid 2).
Right, even the newly created data chunks have stripes on devid 2.
That makes the original bug a little strange now.
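To double-check which chunks have a stripe on a given devid, the dump can be summarized with a small script. This is only a rough sketch: it assumes the text layout of the btrfs-debug-tree output quoted above, and the names chunks_by_devid and SAMPLE are made up for illustration (SAMPLE is a shortened excerpt of items 5 and 7 from the "after cp" dump).

```python
import re
from collections import defaultdict

# Patterns follow the "btrfs-debug-tree -t 3" text layout quoted in this
# thread (an assumption; other btrfs-progs versions may print differently).
CHUNK_RE = re.compile(r"CHUNK_ITEM (\d+)\)")
TYPE_RE = re.compile(r"type (\S+) num_stripes (\d+)")
STRIPE_RE = re.compile(r"stripe (\d+) devid (\d+) offset (\d+)")

# Shortened excerpt (items 5 and 7) of the "after cp" dump.
SAMPLE = """\
item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 1103101952) itemoff 15653 itemsize 112
    chunk length 1073741824 owner 2 stripe_len 65536
    type DATA|RAID1 num_stripes 2
        stripe 0 devid 2 offset 9437184
        stripe 1 devid 1 offset 1094713344
item 7 key (FIRST_CHUNK_TREE CHUNK_ITEM 3250585600) itemoff 15429 itemsize 112
    chunk length 1073741824 owner 2 stripe_len 65536
    type DATA|RAID1 num_stripes 2
        stripe 0 devid 1 offset 2168455168
        stripe 1 devid 3 offset 2156920832
"""


def chunks_by_devid(dump):
    """Map each devid to a list of (chunk logical start, chunk type)."""
    result = defaultdict(list)
    chunk_start = None
    chunk_type = None
    for line in dump.splitlines():
        m = CHUNK_RE.search(line)
        if m:
            # A new chunk item begins; remember its logical start.
            chunk_start = int(m.group(1))
            chunk_type = None
            continue
        m = TYPE_RE.search(line)
        if m and chunk_start is not None:
            chunk_type = m.group(1)
            continue
        m = STRIPE_RE.search(line)
        if m and chunk_start is not None:
            # Record that this chunk has a stripe on the given devid.
            result[int(m.group(2))].append((chunk_start, chunk_type))
    return dict(result)


if __name__ == "__main__":
    for devid, chunks in sorted(chunks_by_devid(SAMPLE).items()):
        print(devid, chunks)
```

Fed with the full "after cp" dump instead of the excerpt, devid 2 should also be listed for the data chunks that were allocated after the unplug.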
Thanks,
Qu
>
> Thanks,
> Satoru
>
>>
>> Thanks,
>> Qu
>>> Label: none uuid: 16a54915-c807-42cf-8365-82c0780c5ab5
>>> Total devices 3 FS bytes used 1.81GiB
>>> devid 1 size 5.00GiB used 2.00GiB path /dev/sda
>>> devid 2 size 5.00GiB used 2.01GiB path /dev/sdb
>>> *** Some devices missing
>>>
>>> # cp -a linux /scratch_mnt/
>>> # cp -a linux /scratch_mnt/linux.2
>>> # cp -a linux /scratch_mnt/linux.3
>>> cp: error writing ‘/scratch_mnt/linux.3/drivers/scsi/lpfc/lpfc_els.c’:
>>> No space left on device
>>> ...
>>> # mount | grep scratch
>>> /dev/sda on /scratch_mnt type btrfs
>>> (rw,relatime,seclabel,space_cache,subvolid=5,subvol=/)
>>> # dmesg | tail
>>> [ 1400.778705] BTRFS warning (device sdc): lost page write due to IO
>>> error on /dev/sdc
>>> [ 1438.604796] btrfs_dev_stat_print_on_error: 174 callbacks suppressed
>>> [ 1438.604803] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125633,
>>> rd 1, flush 276, corrupt 0, gen 0
>>> [ 1438.609782] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125634,
>>> rd 1, flush 276, corrupt 0, gen 0
>>> [ 1438.613331] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125634,
>>> rd 1, flush 277, corrupt 0, gen 0
>>> [ 1438.669090] btrfs_end_buffer_write_sync: 52 callbacks suppressed
>>> [ 1438.669095] BTRFS warning (device sdc): lost page write due to IO
>>> error on /dev/sdc
>>> [ 1438.669098] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125635,
>>> rd 1, flush 277, corrupt 0, gen 0
>>> [ 1438.672621] BTRFS warning (device sdc): lost page write due to IO
>>> error on /dev/sdc
>>> [ 1438.672626] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125636,
>>> rd 1, flush 277, corrupt 0, gen 0
>>> =================================
>>>
>>> Thanks,
>>> Satoru
>>>
>>>>
>>>> Thanks,
>>>> Satoru
>>>>
>>>>> still mounted in rw mode:
>>>>> /dev/sdg on /mnt/raid1 type btrfs
>>>>> (rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
>>>>> ####
>>>>> 7# cp -r /root/ .
>>>>> cp: cannot create directory './root': Input/output error
>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>> /dev/sdg errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>> /dev/sdg errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>> /dev/sdg errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>> /dev/sdg errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>> /dev/sdg errs: wr 0, rd 5, flush 0, corrupt 0, gen 0
>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>> /dev/sdg errs: wr 0, rd 6, flush 0, corrupt 0, gen 0
>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>> /dev/sdg errs: wr 0, rd 7, flush 0, corrupt 0, gen 0
>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>> /dev/sdg errs: wr 0, rd 8, flush 0, corrupt 0, gen 0
>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>> /dev/sdg errs: wr 0, rd 9, flush 0, corrupt 0, gen 0
>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>> /dev/sdg errs: wr 0, rd 10, flush 0, corrupt 0, gen 0
>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): error
>>>>> reading free space cache
>>>>> Apr 20 07:05:37 rakete kernel: BTRFS warning (device sdi): failed to
>>>>> load free space cache for block group 20497563648, rebuilding it now
>>>>> Apr 20 07:05:37 rakete kernel: ------------[ cut here ]------------
>>>>> Apr 20 07:05:37 rakete kernel: WARNING: CPU: 7 PID: 16738 at
>>>>> /build/linux-H3jpF0/linux-4.4.6/fs/btrfs/ctree.c:1156
>>>>> __btrfs_cow_block+0x56f/0x5e0 [btrfs]()
>>>>> Apr 20 07:05:37 rakete kernel: BTRFS: Transaction aborted (error -5)
>>>>> Apr 20 07:05:37 rakete kernel: Modules linked in: uas usb_storage
>>>>> pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) binfmt_misc dvb_ttpci
>>>>> saa7146_vv ttpci_eeprom saa7146 videobuf_dma_sg videobuf_core
>>>>> dvb_core v4l2_common videodev media cfg80211 vboxdrv(O)
>>>>> cpufreq_powersave cpufreq_conservative cpufreq_userspace
>>>>> cpufreq_stats snd_hda_codec_hdmi intel_rapl iosf_mbi
>>>>> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
>>>>> irqbypass crct10dif_pclmul crc32_pclmul eeepc_wmi asus_wmi joydev
>>>>> sparse_keymap drbg iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek
>>>>> rfkill ansi_cprng snd_hda_codec_generic nvidia(PO) aesni_intel
>>>>> aes_x86_64 lrw gf128mul snd_hda_intel glue_helper ablk_helper
>>>>> snd_hda_codec cryptd snd_hda_core serio_raw pcspkr snd_hwdep snd_pcm
>>>>> i2c_i801 snd_timer snd lpc_ich soundcore 8250_fintek mei_me shpchp mei
>>>>> Apr 20 07:05:37 rakete kernel: mfd_core battery tpm_tis tpm evdev
>>>>> processor drm fuse ecryptfs cbc sha256_ssse3 sha256_generic hmac
>>>>> encrypted_keys parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache
>>>>> jbd2 btrfs raid456 async_raid6_recov async_memcpy async_pq async_xor
>>>>> async_tx xor hid_generic usbhid hid raid6_pq libcrc32c crc32c_generic
>>>>> md_mod dm_mirror dm_region_hash dm_log dm_mod sr_mod sg cdrom sd_mod
>>>>> ata_generic ahci libahci pata_via xhci_pci ehci_pci crc32c_intel
>>>>> xhci_hcd ehci_hcd libata psmouse scsi_mod atl1c usbcore usb_common
>>>>> fjes video wmi fan thermal button
>>>>> Apr 20 07:05:37 rakete kernel: CPU: 7 PID: 16738 Comm: cp Tainted:
>>>>> P O 4.4.0-0.bpo.1-amd64 #1 Debian 4.4.6-1~bpo8+1
>>>>> Apr 20 07:05:37 rakete kernel: Hardware name: System manufacturer
>>>>> System Product Name/P8H67-V, BIOS 3707 07/12/2013
>>>>> Apr 20 07:05:37 rakete kernel: 0000000000000286 000000006a1407c8
>>>>> ffffffff812ed425 ffff88016b6dfb90
>>>>> Apr 20 07:05:37 rakete kernel: ffffffffa03817b8 ffffffff81077ea1
>>>>> ffff88018e7fcd30 ffff88016b6dfbe8
>>>>> Apr 20 07:05:37 rakete kernel: ffff88005d863e88 ffff8801cde7a980
>>>>> ffff88018e7fce48 ffffffff81077f2c
>>>>> Apr 20 07:05:37 rakete kernel: Call Trace:
>>>>> Apr 20 07:05:37 rakete kernel: [<ffffffff812ed425>] ?
>>>>> dump_stack+0x5c/0x77
>>>>> Apr 20 07:05:37 rakete kernel: [<ffffffff81077ea1>] ?
>>>>> warn_slowpath_common+0x81/0xb0
>>>>> Apr 20 07:05:37 rakete kernel: [<ffffffff81077f2c>] ?
>>>>> warn_slowpath_fmt+0x5c/0x80
>>>>> Apr 20 07:05:37 rakete kernel: [<ffffffffa02d74af>] ?
>>>>> __btrfs_cow_block+0x56f/0x5e0 [btrfs]
>>>>> Apr 20 07:05:37 rakete kernel: [<ffffffffa02d76af>] ?
>>>>> btrfs_cow_block+0x10f/0x1d0 [btrfs]
>>>>> Apr 20 07:05:37 rakete kernel: [<ffffffffa02db2cd>] ?
>>>>> btrfs_search_slot+0x1fd/0xa30 [btrfs]
>>>>> Apr 20 07:05:37 rakete kernel: [<ffffffffa02dd3f1>] ?
>>>>> btrfs_insert_empty_items+0x71/0xc0 [btrfs]
>>>>> Apr 20 07:05:37 rakete kernel: [<ffffffff811f4d92>] ?
>>>>> insert_inode_locked4+0xa2/0x1c0
>>>>> Apr 20 07:05:37 rakete kernel: [<ffffffffa030ee5d>] ?
>>>>> btrfs_new_inode+0x1cd/0x590 [btrfs]
>>>>> Apr 20 07:05:37 rakete kernel: [<ffffffffa0310a77>] ?
>>>>> btrfs_mkdir+0x107/0x1f0 [btrfs]
>>>>> Apr 20 07:05:37 rakete kernel: [<ffffffff811e80b0>] ?
>>>>> vfs_mkdir+0xb0/0x140
>>>>> Apr 20 07:05:37 rakete kernel: [<ffffffff811e9d3e>] ?
>>>>> SyS_mkdir+0xce/0x110
>>>>> Apr 20 07:05:37 rakete kernel: [<ffffffff81592736>] ?
>>>>> system_call_fast_compare_end+0xc/0x6b
>>>>> Apr 20 07:05:37 rakete kernel: ---[ end trace 025eb0e83ffed96f ]---
>>>>> Apr 20 07:05:37 rakete kernel: BTRFS: error (device sdi) in
>>>>> __btrfs_cow_block:1156: errno=-5 IO failure
>>>>> Apr 20 07:05:37 rakete kernel: BTRFS info (device sdi): forced
>>>>> readonly
>>>>>
>>>>> ####
>>>>> Try to copy again:
>>>>> 11# cp -r /root/ .
>>>>> cp: cannot create directory './root': Read-only file system
>>>>> ####
>>>>> /dev/sdg on /mnt/raid1 type btrfs
>>>>> (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
>>>>> ####
>>>>> plug in device sdg again:
>>>>>
>>>>> Apr 20 07:07:39 rakete udisksd[3671]: Cleaning up mount point
>>>>> /media/matthias/BACKUP (device 8:81 no longer exist)
>>>>> Apr 20 07:07:39 rakete kernel: usb 3-1: USB disconnect, device
>>>>> number 3
>>>>> Apr 20 07:07:39 rakete udisksd[3671]: Error statting /dev/sdg: No
>>>>> such file or directory
>>>>> Apr 20 07:07:39 rakete umount[16807]: umount: /mnt/raid1: target is
>>>>> busy
>>>>> Apr 20 07:07:39 rakete umount[16807]: (In some cases useful info
>>>>> about processes that
>>>>> Apr 20 07:07:39 rakete umount[16807]: use the device is found by
>>>>> lsof(8) or fuser(1).)
>>>>> Apr 20 07:07:39 rakete systemd[1]: mnt-raid1.mount mount process
>>>>> exited, code=exited status=32
>>>>> Apr 20 07:07:39 rakete systemd[1]: Failed unmounting /mnt/raid1.
>>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: new SuperSpeed USB device
>>>>> number 4 using xhci_hcd
>>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: New USB device found,
>>>>> idVendor=152d, idProduct=0567
>>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: New USB device strings:
>>>>> Mfr=10, Product=11, SerialNumber=5
>>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI
>>>>> Bridge
>>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: Manufacturer: JMicron
>>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: SerialNumber: 152D00539000
>>>>> Apr 20 07:08:01 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage
>>>>> device detected
>>>>> Apr 20 07:08:01 rakete kernel: usb-storage 3-1:1.0: Quirks match for
>>>>> vid 152d pid 0567: 5000000
>>>>> Apr 20 07:08:01 rakete kernel: scsi host10: usb-storage 3-1:1.0
>>>>> Apr 20 07:08:01 rakete mtp-probe[16826]: checking bus 3, device 4:
>>>>> "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
>>>>> Apr 20 07:08:01 rakete mtp-probe[16826]: bus: 3, device: 4 was not an
>>>>> MTP device
>>>>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:0: Direct-Access WDC
>>>>> WD20 02FAEX-007BA0 0125 PQ: 0 ANSI: 6
>>>>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:1: Direct-Access WDC
>>>>> WD75 00AACS-00C7B0 0125 PQ: 0 ANSI: 6
>>>>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:2: Direct-Access WDC
>>>>> WD50 01AALS-00L3B2 0125 PQ: 0 ANSI: 6
>>>>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:3: Direct-Access
>>>>> SAMSUNG SP2504C 0125 PQ: 0 ANSI: 6
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: Attached scsi generic sg6
>>>>> type 0
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] 3907029168 512-byte
>>>>> logical blocks: (2.00 TB/1.82 TiB)
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Write Protect is off
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Mode Sense: 67 00
>>>>> 10 08
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: Attached scsi generic sg7
>>>>> type 0
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] No Caching mode
>>>>> page found
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Assuming drive
>>>>> cache: write through
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] 1465149168 512-byte
>>>>> logical blocks: (750 GB/699 GiB)
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: Attached scsi generic sg8
>>>>> type 0
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Write Protect is off
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Mode Sense: 67 00
>>>>> 10 08
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: Attached scsi generic sg9
>>>>> type 0
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] 976773168 512-byte
>>>>> logical blocks: (500 GB/466 GiB)
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] No Caching mode
>>>>> page found
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Assuming drive
>>>>> cache: write through
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Write Protect is off
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Mode Sense: 67 00
>>>>> 10 08
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] 488395055 512-byte
>>>>> logical blocks: (250 GB/233 GiB)
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] No Caching mode
>>>>> page found
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Assuming drive
>>>>> cache: write through
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Write Protect is off
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Mode Sense: 67 00
>>>>> 10 08
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] No Caching mode
>>>>> page found
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Assuming drive
>>>>> cache: write through
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Attached SCSI disk
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Attached SCSI disk
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Attached SCSI disk
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Attached SCSI disk
>>>>> Apr 20 07:08:02 rakete kernel: EXT4-fs (sdf1): recovery complete
>>>>> Apr 20 07:08:02 rakete kernel: EXT4-fs (sdf1): mounted filesystem
>>>>> with ordered data mode. Opts: (null)
>>>>>
>>>>> ####
>>>>> still ro mode
>>>>> /dev/sdj on /mnt/raid1 type btrfs
>>>>> (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
>>>>> ####
>>>>> 14# btrfs fi show
>>>>> Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>>>>> Total devices 3 FS bytes used 1.60GiB
>>>>> devid 1 size 698.64GiB used 3.03GiB path /dev/sdj
>>>>> devid 2 size 465.76GiB used 3.03GiB path /dev/sdk
>>>>> devid 3 size 232.88GiB used 0.00B path /dev/sdl
>>>>> ####
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>> linux-btrfs" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Question: raid1 behaviour on failure
2016-04-21 17:40 ` Matthias Bodenbinder
@ 2016-04-22 6:02 ` Qu Wenruo
2016-04-23 7:07 ` Matthias Bodenbinder
0 siblings, 1 reply; 32+ messages in thread
From: Qu Wenruo @ 2016-04-22 6:02 UTC (permalink / raw)
To: Matthias Bodenbinder, linux-btrfs
Matthias Bodenbinder wrote on 2016/04/21 19:40 +0200:
> On 21.04.2016 at 07:43, Qu Wenruo wrote:
>> There are already unmerged patches that partly implement the mdadm-level behavior, such as automatically switching to degraded mode without making the fs RO.
>>
>> The original patchset:
>> http://comments.gmane.org/gmane.comp.file-systems.btrfs/48335
>
> The description of this patch says:
>
> "Although the one-size-fit-all solution is quite safe, it's too strict if
> data and metadata has different duplication level."
> ...
> "This patchset will introduce a new per-chunk degradable check for btrfs,
> allow above case to succeed, and it's quite small anyway."
>
>
> My raid1 is "-m raid1 -d raid1". Both have the same duplication level. Would that patch make any difference?
Without this patchset, we may call abort_transaction() at commit or space
allocation time.
(Though there is also a user who can't reproduce your bug.)
Although this patchset is not a full fix, it provides the basis for a later
raid1 failure fix.
That's the reason Anand Jain picked these patches into his big
auto-replace patchset.
I meant to do further fixes, but Anand Jain is now pushing
auto-replace, so I haven't done anything newer since the original patchset.
>
> And: What do I need to do to test this in "debian stable"? I am not a programmer - but I know how to use git and how to compile with proper configuration directions.
Even without experience in git and kernel compiling, you can still
contribute.
Since Satoru can't reproduce the problem, would you please try his
method to reproduce it?
I noticed your kernel is 4.4, which is not old but not the latest either,
while I think Satoru is using the latest one.
If possible, please use a 4.5 or 4.6-rc kernel, if Debian provides one.
If that's not possible (Debian doesn't provide 4.5 or 4.6-rc), would you
please still try the same process Satoru provided?
The difference is that, unlike in Satoru's test, your fs is not newly
created (empty).
If we can reproduce it, it would be much easier to fix.
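When repeating the test, it may help to record "btrfs fi show" before and after the unplug and compare the two, so that missing devids and changed device paths are easy to spot. Below is a minimal sketch, assuming the output covers a single filesystem and keeps the layout shown in this thread; parse_fi_show and compare are hypothetical helper names, and the sample text is copied from your earlier report.

```python
import re

# Patterns follow the "btrfs fi show" layout quoted in this thread
# (an assumption; only a single filesystem per output is handled).
TOTAL_RE = re.compile(r"Total devices\s+(\d+)")
DEVID_RE = re.compile(r"devid\s+(\d+)\s+size\s+(\S+)\s+used\s+(\S+)\s+path\s+(\S+)")

BEFORE = """\
Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
Total devices 3 FS bytes used 1.60GiB
devid 1 size 698.64GiB used 3.03GiB path /dev/sdg
devid 2 size 465.76GiB used 3.03GiB path /dev/sdh
devid 3 size 232.88GiB used 0.00B path /dev/sdi
"""

AFTER = """\
Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
Total devices 3 FS bytes used 1.60GiB
devid 2 size 465.76GiB used 3.03GiB path /dev/sdj
devid 3 size 232.88GiB used 0.00B path /dev/sdk
*** Some devices missing
"""


def parse_fi_show(text):
    """Return the device count, a devid -> path map, and the missing flag."""
    total = None
    devices = {}
    for line in text.splitlines():
        m = TOTAL_RE.search(line)
        if m:
            total = int(m.group(1))
            continue
        m = DEVID_RE.search(line)
        if m:
            devices[int(m.group(1))] = m.group(4)
    return {
        "total": total,
        "devices": devices,
        "missing_flag": "Some devices missing" in text,
    }


def compare(before, after):
    """Return (devids that disappeared, devids whose path changed)."""
    gone = sorted(set(before["devices"]) - set(after["devices"]))
    renamed = {
        devid: (before["devices"][devid], path)
        for devid, path in after["devices"].items()
        if devid in before["devices"] and before["devices"][devid] != path
    }
    return gone, renamed


if __name__ == "__main__":
    gone, renamed = compare(parse_fi_show(BEFORE), parse_fi_show(AFTER))
    print("missing devids:", gone)
    print("renamed paths:", renamed)
```

Saving the output of each step to a file and feeding the files to parse_fi_show should make the reports from different kernels easier to compare.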
Thanks,
Qu
>
> Matthias
>
>
>> Or the latest patchset inside Anand Jain's auto-replace patchset:
>> http://thread.gmane.org/gmane.comp.file-systems.btrfs/55446
>>
>> Thanks,
>> Qu
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Question: raid1 behaviour on failure
2016-04-22 5:32 ` Qu Wenruo
@ 2016-04-22 6:17 ` Satoru Takeuchi
0 siblings, 0 replies; 32+ messages in thread
From: Satoru Takeuchi @ 2016-04-22 6:17 UTC (permalink / raw)
To: Qu Wenruo, Qu Wenruo, Matthias Bodenbinder, linux-btrfs
On 2016/04/22 14:32, Qu Wenruo wrote:
>
>
> Satoru Takeuchi wrote on 2016/04/22 11:21 +0900:
>> On 2016/04/21 20:58, Qu Wenruo wrote:
>>>
>>>
>>> On 04/21/2016 03:45 PM, Satoru Takeuchi wrote:
>>>> On 2016/04/21 15:23, Satoru Takeuchi wrote:
>>>>> On 2016/04/20 14:17, Matthias Bodenbinder wrote:
>>>>>> On 18.04.2016 at 09:22, Qu Wenruo wrote:
>>>>>>> BTW, it would be better to post the dmesg for better debug.
>>>>>>
>>>>>> So here we go. I did the same test again. Here is a full log of what I
>>>>>> did. It looks like a bug in btrfs.
>>>>>> Sequence of events:
>>>>>> 1. mount the raid1 (2 discs of different sizes)
>>>>>> 2. unplug the biggest drive (hotplug)
>>>>>> 3. try to copy something to the degraded raid1
>>>>>> 4. plug in the device again (hotplug)
>>>>>>
>>>>>> This scenario does not work. The disc array is NOT redundant! I can
>>>>>> not work with it while a drive is missing and I can not reattach the
>>>>>> device so that everything works again.
>>>>>>
>>>>>> The btrfs module crashes during the test.
>>>>>>
>>>>>> I am using LMDE2 with backports:
>>>>>> btrfs-tools 4.4-1~bpo8+1
>>>>>> linux-image-4.4.0-0.bpo.1-amd64
>>>>>>
>>>>>> Matthias
>>>>>>
>>>>>>
>>>>>> rakete - root - /root
>>>>>> 1# mount /mnt/raid1/
>>>>>>
>>>>>> Journal:
>>>>>>
>>>>>> Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): enabling auto
>>>>>> defrag
>>>>>> Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): disk space
>>>>>> caching is enabled
>>>>>> Apr 20 07:01:16 rakete kernel: BTRFS: has skinny extents
>>>>>>
>>>>>> rakete - root - /mnt/raid1
>>>>>> 3# ll
>>>>>> total 0
>>>>>> drwxrwxr-x 1 root root 36 Nov 14 2014 AfterShot2(64-bit)
>>>>>> drwxrwxr-x 1 root root 5082 Apr 17 09:06 etc
>>>>>> drwxr-xr-x 1 root root 108 Mar 24 07:31 var
>>>>>>
>>>>>> 4# btrfs fi show
>>>>>> Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>>>>>> Total devices 3 FS bytes used 1.60GiB
>>>>>> devid 1 size 698.64GiB used 3.03GiB path /dev/sdg
>>>>>> devid 2 size 465.76GiB used 3.03GiB path /dev/sdh
>>>>>> devid 3 size 232.88GiB used 0.00B path /dev/sdi
>>>>>>
>>>>>> ####
>>>>>> unplug device sdg:
>>>>>>
>>>>>> Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical
>>>>>> block 243826688, lost sync page write
>>>>>> Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating
>>>>>> journal superblock for sdf1-8.
>>>>>> Apr 20 07:03:05 rakete kernel: Aborting journal on device sdf1-8.
>>>>>> Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical
>>>>>> block 243826688, lost sync page write
>>>>>> Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating
>>>>>> journal superblock for sdf1-8.
>>>>>> Apr 20 07:03:05 rakete umount[16405]: umount: /mnt/raid1: target is
>>>>>> busy
>>>>>> Apr 20 07:03:05 rakete umount[16405]: (In some cases useful info
>>>>>> about processes that
>>>>>> Apr 20 07:03:05 rakete umount[16405]: use the device is found by
>>>>>> lsof(8) or fuser(1).)
>>>>>> Apr 20 07:03:05 rakete systemd[1]: mnt-raid1.mount mount process
>>>>>> exited, code=exited status=32
>>>>>> Apr 20 07:03:05 rakete systemd[1]: Failed unmounting /mnt/raid1.
>>>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: new SuperSpeed USB device
>>>>>> number 3 using xhci_hcd
>>>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device found,
>>>>>> idVendor=152d, idProduct=0567
>>>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device strings:
>>>>>> Mfr=10, Product=11, SerialNumber=5
>>>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI
>>>>>> Bridge
>>>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: Manufacturer: JMicron
>>>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: SerialNumber: 152D00539000
>>>>>> Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage
>>>>>> device detected
>>>>>> Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: Quirks match for
>>>>>> vid 152d pid 0567: 5000000
>>>>>> Apr 20 07:03:24 rakete kernel: scsi host9: usb-storage 3-1:1.0
>>>>>> Apr 20 07:03:24 rakete mtp-probe[16424]: checking bus 3, device 3:
>>>>>> "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
>>>>>> Apr 20 07:03:24 rakete mtp-probe[16424]: bus: 3, device: 3 was not an
>>>>>> MTP device
>>>>>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:0: Direct-Access WDC
>>>>>> WD20 02FAEX-007BA0 0125 PQ: 0 ANSI: 6
>>>>>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:1: Direct-Access WDC
>>>>>> WD50 01AALS-00L3B2 0125 PQ: 0 ANSI: 6
>>>>>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:2: Direct-Access
>>>>>> SAMSUNG SP2504C 0125 PQ: 0 ANSI: 6
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: Attached scsi generic sg6
>>>>>> type 0
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: Attached scsi generic sg7
>>>>>> type 0
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] 3907029168 512-byte
>>>>>> logical blocks: (2.00 TB/1.82 TiB)
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Write Protect is off
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Mode Sense: 67 00
>>>>>> 10 08
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: Attached scsi generic sg8
>>>>>> type 0
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] 976773168 512-byte
>>>>>> logical blocks: (500 GB/466 GiB)
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] No Caching mode page
>>>>>> found
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Assuming drive
>>>>>> cache: write through
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Write Protect is off
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Mode Sense: 67 00
>>>>>> 10 08
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] 488395055 512-byte
>>>>>> logical blocks: (250 GB/233 GiB)
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] No Caching mode page
>>>>>> found
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Assuming drive
>>>>>> cache: write through
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Write Protect is off
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Mode Sense: 67 00
>>>>>> 10 08
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] No Caching mode page
>>>>>> found
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Assuming drive
>>>>>> cache: write through
>>>>>> Apr 20 07:03:25 rakete kernel: sdf: sdf1
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Attached SCSI disk
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Attached SCSI disk
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Attached SCSI disk
>>>>>> Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): recovery complete
>>>>>> Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): mounted filesystem
>>>>>> with ordered data mode. Opts: (null)
>>>>>> Apr 20 07:03:25 rakete udisksd[3671]: Error statting /dev/sdg: No
>>>>>> such file or directory
>>>>>>
>>>>>>
>>>>>> ####
>>>>>> 5# btrfs fi show
>>>>>> Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>>>>>> Total devices 3 FS bytes used 1.60GiB
>>>>>> devid 2 size 465.76GiB used 3.03GiB path /dev/sdj
>>>>>> devid 3 size 232.88GiB used 0.00B path /dev/sdk
>>>>>> *** Some devices missing
>>>>>> ####
>>>>>
>>>>> Here the names of *online* devices are changed
>>>>> (/dev/sdh => /dev/sdj, /dev/sdi => /dev/sdk) after just
>>>>> offlining a device (/dev/sdf). It's odd regardless of
>>>>> whether Btrfs works fine or not.
>>>>>
>>>>> Can anyone explain this behavior?
>>>>
>>>> FYI,
>>>>
>>>> I tried to reproduce this problem on a VM.
>>>> Here the USB storage devices are /dev/sd{a,b,c}.
>>>>
>>>> Steps to reproduce:
>>>>
>>>> 1. create a fs on /dev/sd{a,b,c}
>>>> 2. mount this fs
>>>> 3. Surprise unplug /dev/sdc
>>>> 4. Write to this fs till ENOSPC happens
>>>>
>>>> Then, although there were I/O errors on /dev/sdc,
>>>> the device names didn't change and no ro remount happened.
>>>>
>>>> command log:
>>>> =================================
>>>> # mkfs.btrfs -f -m raid1 -d raid1 /dev/sd{a,b,c}
>>>> btrfs-progs v4.5.1-41-g8202204-dirty
>>>> See http://btrfs.wiki.kernel.org for more information.
>>>>
>>>> Label: (null)
>>>> UUID: 16a54915-c807-42cf-8365-82c0780c5ab5
>>>> Node size: 16384
>>>> Sector size: 4096
>>>> Filesystem size: 15.00GiB
>>>> Block group profiles:
>>>> Data: RAID1 1.01GiB
>>>> Metadata: RAID1 1.01GiB
>>>> System: RAID1 12.00MiB
>>>> SSD detected: no
>>>> Incompat features: extref, skinny-metadata
>>>> Number of devices: 3
>>>> Devices:
>>>> ID SIZE PATH
>>>> 1 5.00GiB /dev/sda
>>>> 2 5.00GiB /dev/sdb
>>>> 3 5.00GiB /dev/sdc
>>>>
>>>> # mount /dev/sda /scratch_mnt/
>>>> # btrfs fi show /scratch_mnt/
>>>> Label: none uuid: 16a54915-c807-42cf-8365-82c0780c5ab5
>>>> Total devices 3 FS bytes used 640.00KiB
>>>> devid 1 size 5.00GiB used 2.00GiB path /dev/sda
>>>> devid 2 size 5.00GiB used 1.01GiB path /dev/sdb
>>>> devid 3 size 5.00GiB used 1.01GiB path /dev/sdc
>>>>
>>>> #
>>>> # # *** surprise unplug happens here ***
>>>> #
>>>> # btrfs fi show /scratch_mnt/
>>>
>>> Would you please post the output of "btrfs-debug-tree -t 3"?
>>>
>>> My guess is that there is no raid1 stripe on device 3,
>>> so all data/metadata allocation/CoW happens without problems.
>>> The "btrfs-debug-tree -t 3" output would verify my guess.
>>
>> OK, here it is.
>>
>> btrfs-debug-tree -t 3 before cp:
>> ===========================
>> btrfs-progs v4.5.1-41-g8202204-dirty
>> chunk tree
>> leaf 20987904 items 6 free space 15503 generation 5 owner 3
>> fs uuid 30771a06-e6a8-4cbc-a094-893049fa5060
>> chunk uuid 2325f1b9-1bf0-4247-8c29-7b179eabf1b2
>> item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 16185 itemsize 98
>> dev item devid 1 total_bytes 5368709120 bytes used 2147483648
>> dev uuid 06bc0993-39d3-4d9a-b484-760ae2150c3a
>> item 1 key (DEV_ITEMS DEV_ITEM 2) itemoff 16087 itemsize 98
>> dev item devid 2 total_bytes 5368709120 bytes used 1082130432
>> dev uuid 3868895f-295b-4a89-a01c-ad0f1c5ac758
>> item 2 key (DEV_ITEMS DEV_ITEM 3) itemoff 15989 itemsize 98
>> dev item devid 3 total_bytes 5368709120 bytes used 1082130432
>> dev uuid 911e8702-9428-4b8e-bc6d-d212e909a1ef
>> item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 20971520) itemoff 15877
>> itemsize 112
>> chunk length 8388608 owner 2 stripe_len 65536
>> type SYSTEM|RAID1 num_stripes 2
>> stripe 0 devid 3 offset 1048576
>> dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
>> stripe 1 devid 2 offset 1048576
>> dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
>> item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 29360128) itemoff 15765
>> itemsize 112
>> chunk length 1073741824 owner 2 stripe_len 65536
>> type METADATA|RAID1 num_stripes 2
>> stripe 0 devid 1 offset 20971520
>> dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
>> stripe 1 devid 3 offset 9437184
>> dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
>> item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 1103101952) itemoff 15653
>> itemsize 112
>> chunk length 1073741824 owner 2 stripe_len 65536
>> type DATA|RAID1 num_stripes 2
>> stripe 0 devid 2 offset 9437184
>> dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
>> stripe 1 devid 1 offset 1094713344
>> dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
>> total bytes 16106127360
>> bytes used 114688
>> uuid 30771a06-e6a8-4cbc-a094-893049fa5060
>> ===========================
>>
>>
>>
>> Here I hot unplug devid 2 (/dev/sdb).
>>
>>
>>
>> btrfs-debug-tree -t 3 after cp (which causes ENOSPC):
>> ===========================
>> btrfs-progs v4.5.1-41-g8202204-dirty
>> warning, device 2 is missing
>> chunk tree
>> leaf 20987904 items 11 free space 14818 generation 9 owner 3
>> fs uuid 30771a06-e6a8-4cbc-a094-893049fa5060
>> chunk uuid 2325f1b9-1bf0-4247-8c29-7b179eabf1b2
>> item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 16185 itemsize 98
>> dev item devid 1 total_bytes 5368709120 bytes used 4294967296
>> dev uuid 06bc0993-39d3-4d9a-b484-760ae2150c3a
>> item 1 key (DEV_ITEMS DEV_ITEM 2) itemoff 16087 itemsize 98
>> dev item devid 2 total_bytes 5368709120 bytes used 5367660544
>> dev uuid 3868895f-295b-4a89-a01c-ad0f1c5ac758
>> item 2 key (DEV_ITEMS DEV_ITEM 3) itemoff 15989 itemsize 98
>> dev item devid 3 total_bytes 5368709120 bytes used 5367660544
>> dev uuid 911e8702-9428-4b8e-bc6d-d212e909a1ef
>> item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 20971520) itemoff 15877
>> itemsize 112
>> chunk length 8388608 owner 2 stripe_len 65536
>> type SYSTEM|RAID1 num_stripes 2
>> stripe 0 devid 3 offset 1048576
>> dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
>> stripe 1 devid 2 offset 1048576
>> dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
>> item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 29360128) itemoff 15765
>> itemsize 112
>> chunk length 1073741824 owner 2 stripe_len 65536
>> type METADATA|RAID1 num_stripes 2
>> stripe 0 devid 1 offset 20971520
>> dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
>> stripe 1 devid 3 offset 9437184
>> dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
>> item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 1103101952) itemoff 15653
>> itemsize 112
>> chunk length 1073741824 owner 2 stripe_len 65536
>> type DATA|RAID1 num_stripes 2
>> stripe 0 devid 2 offset 9437184
>> dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
>> stripe 1 devid 1 offset 1094713344
>> dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
>> item 6 key (FIRST_CHUNK_TREE CHUNK_ITEM 2176843776) itemoff 15541
>> itemsize 112
>> chunk length 1073741824 owner 2 stripe_len 65536
>> type DATA|RAID1 num_stripes 2
>> stripe 0 devid 2 offset 1083179008
>> dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
>> stripe 1 devid 3 offset 1083179008
>> dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
>> item 7 key (FIRST_CHUNK_TREE CHUNK_ITEM 3250585600) itemoff 15429
>> itemsize 112
>> chunk length 1073741824 owner 2 stripe_len 65536
>> type DATA|RAID1 num_stripes 2
>> stripe 0 devid 1 offset 2168455168
>> dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
>> stripe 1 devid 3 offset 2156920832
>> dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
>> item 8 key (FIRST_CHUNK_TREE CHUNK_ITEM 4324327424) itemoff 15317
>> itemsize 112
>> chunk length 1073741824 owner 2 stripe_len 65536
>> type DATA|RAID1 num_stripes 2
>> stripe 0 devid 2 offset 2156920832
>> dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
>> stripe 1 devid 1 offset 3242196992
>> dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
>> item 9 key (FIRST_CHUNK_TREE CHUNK_ITEM 5398069248) itemoff 15205
>> itemsize 112
>> chunk length 1073741824 owner 2 stripe_len 65536
>> type DATA|RAID1 num_stripes 2
>> stripe 0 devid 2 offset 3230662656
>> dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
>> stripe 1 devid 3 offset 3230662656
>> dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
>> item 10 key (FIRST_CHUNK_TREE CHUNK_ITEM 6471811072) itemoff 15093
>> itemsize 112
>> chunk length 1064304640 owner 2 stripe_len 65536
>> type DATA|RAID1 num_stripes 2
>> stripe 0 devid 2 offset 4304404480
>> dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
>> stripe 1 devid 3 offset 4304404480
>> dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
>> total bytes 16106127360
>> bytes used 6711709696
>> uuid 30771a06-e6a8-4cbc-a094-893049fa5060
>> ===========================
>>
>> Both before and after cp, there are
>> chunks containing /dev/sdb (devid 2).
>
> Right, even newly created data chunks have stripes on devid 2.
>
> That makes the original bug a little strange now.
Yes, so I guess the root cause of the original bug
is the renaming of the two still-online
devices (/dev/sdh => /dev/sdj, /dev/sdi => /dev/sdk).
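The renaming can be spotted mechanically by comparing two "btrfs fi show" outputs keyed on devid rather than on the /dev/sdX path. The following is only a minimal sketch: the function names and the regular expression are illustrative assumptions, not part of btrfs-progs, and the sample texts are copied from the outputs posted earlier in this thread.

```python
import re

# Matches lines such as:
#   devid 2 size 465.76GiB used 3.03GiB path /dev/sdj
# The pattern is an assumption based on the output format seen in this thread.
DEVID_LINE = re.compile(
    r"devid\s+(\d+)\s+size\s+\S+\s+used\s+\S+\s+path\s+(\S+)"
)


def parse_fi_show(text):
    """Return ({devid: path}, missing_flag) parsed from `btrfs fi show` text."""
    devices = {int(m.group(1)): m.group(2) for m in DEVID_LINE.finditer(text)}
    missing = "Some devices missing" in text
    return devices, missing


def diff_device_paths(before, after):
    """Compare two snapshots: renamed devids, vanished devids, missing flag."""
    old, _ = parse_fi_show(before)
    new, missing = parse_fi_show(after)
    renamed = {d: (old[d], new[d]) for d in old if d in new and old[d] != new[d]}
    gone = sorted(d for d in old if d not in new)
    return renamed, gone, missing


# Output before the unplug, as posted in the thread.
BEFORE = """\
Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
Total devices 3 FS bytes used 1.60GiB
devid 1 size 698.64GiB used 3.03GiB path /dev/sdg
devid 2 size 465.76GiB used 3.03GiB path /dev/sdh
devid 3 size 232.88GiB used 0.00B path /dev/sdi
"""

# Output after the unplug, as posted in the thread.
AFTER = """\
Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
Total devices 3 FS bytes used 1.60GiB
devid 2 size 465.76GiB used 3.03GiB path /dev/sdj
devid 3 size 232.88GiB used 0.00B path /dev/sdk
*** Some devices missing
"""

if __name__ == "__main__":
    renamed, gone, missing = diff_device_paths(BEFORE, AFTER)
    for devid, (old_path, new_path) in sorted(renamed.items()):
        print(f"devid {devid}: {old_path} => {new_path}")
    print("vanished devids:", gone, "| missing reported:", missing)
```

Keying on devid is a deliberate choice here, because the kernel may hand out new sdX names when devices are re-enumerated while the devid stays with the device.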
Thanks,
Satoru
>
> Thanks,
> Qu
>>
>> Thanks,
>> Satoru
>>
>>>
>>> Thanks,
>>> Qu
>>>> Label: none uuid: 16a54915-c807-42cf-8365-82c0780c5ab5
>>>> Total devices 3 FS bytes used 1.81GiB
>>>> devid 1 size 5.00GiB used 2.00GiB path /dev/sda
>>>> devid 2 size 5.00GiB used 2.01GiB path /dev/sdb
>>>> *** Some devices missing
>>>>
>>>> # cp -a linux /scratch_mnt/
>>>> # cp -a linux /scratch_mnt/linux.2
>>>> # cp -a linux /scratch_mnt/linux.3
>>>> cp: error writing ‘/scratch_mnt/linux.3/drivers/scsi/lpfc/lpfc_els.c’:
>>>> No space left on device
>>>> ...
>>>> # mount | grep scratch
>>>> /dev/sda on /scratch_mnt type btrfs
>>>> (rw,relatime,seclabel,space_cache,subvolid=5,subvol=/)
>>>> # dmesg | tail
>>>> [ 1400.778705] BTRFS warning (device sdc): lost page write due to IO
>>>> error on /dev/sdc
>>>> [ 1438.604796] btrfs_dev_stat_print_on_error: 174 callbacks suppressed
>>>> [ 1438.604803] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125633,
>>>> rd 1, flush 276, corrupt 0, gen 0
>>>> [ 1438.609782] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125634,
>>>> rd 1, flush 276, corrupt 0, gen 0
>>>> [ 1438.613331] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125634,
>>>> rd 1, flush 277, corrupt 0, gen 0
>>>> [ 1438.669090] btrfs_end_buffer_write_sync: 52 callbacks suppressed
>>>> [ 1438.669095] BTRFS warning (device sdc): lost page write due to IO
>>>> error on /dev/sdc
>>>> [ 1438.669098] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125635,
>>>> rd 1, flush 277, corrupt 0, gen 0
>>>> [ 1438.672621] BTRFS warning (device sdc): lost page write due to IO
>>>> error on /dev/sdc
>>>> [ 1438.672626] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125636,
>>>> rd 1, flush 277, corrupt 0, gen 0
>>>> =================================
>>>>
>>>> Thanks,
>>>> Satoru
>>>>
>>>>>
>>>>> Thanks,
>>>>> Satoru
>>>>>
>>>>>> still mounted in rw mode:
>>>>>> /dev/sdg on /mnt/raid1 type btrfs
>>>>>> (rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
>>>>>> ####
>>>>>> 7# cp -r /root/ .
>>>>>> cp: cannot create directory './root': Input/output error
>>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>>> /dev/sdg errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
>>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>>> /dev/sdg errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
>>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>>> /dev/sdg errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
>>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>>> /dev/sdg errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
>>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>>> /dev/sdg errs: wr 0, rd 5, flush 0, corrupt 0, gen 0
>>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>>> /dev/sdg errs: wr 0, rd 6, flush 0, corrupt 0, gen 0
>>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>>> /dev/sdg errs: wr 0, rd 7, flush 0, corrupt 0, gen 0
>>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>>> /dev/sdg errs: wr 0, rd 8, flush 0, corrupt 0, gen 0
>>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>>> /dev/sdg errs: wr 0, rd 9, flush 0, corrupt 0, gen 0
>>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>>> /dev/sdg errs: wr 0, rd 10, flush 0, corrupt 0, gen 0
>>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): error
>>>>>> reading free space cache
>>>>>> Apr 20 07:05:37 rakete kernel: BTRFS warning (device sdi): failed to
>>>>>> load free space cache for block group 20497563648, rebuilding it now
>>>>>> Apr 20 07:05:37 rakete kernel: ------------[ cut here ]------------
>>>>>> Apr 20 07:05:37 rakete kernel: WARNING: CPU: 7 PID: 16738 at
>>>>>> /build/linux-H3jpF0/linux-4.4.6/fs/btrfs/ctree.c:1156
>>>>>> __btrfs_cow_block+0x56f/0x5e0 [btrfs]()
>>>>>> Apr 20 07:05:37 rakete kernel: BTRFS: Transaction aborted (error -5)
>>>>>> Apr 20 07:05:37 rakete kernel: Modules linked in: uas usb_storage
>>>>>> pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) binfmt_misc dvb_ttpci
>>>>>> saa7146_vv ttpci_eeprom saa7146 videobuf_dma_sg videobuf_core
>>>>>> dvb_core v4l2_common videodev media cfg80211 vboxdrv(O)
>>>>>> cpufreq_powersave cpufreq_conservative cpufreq_userspace
>>>>>> cpufreq_stats snd_hda_codec_hdmi intel_rapl iosf_mbi
>>>>>> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
>>>>>> irqbypass crct10dif_pclmul crc32_pclmul eeepc_wmi asus_wmi joydev
>>>>>> sparse_keymap drbg iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek
>>>>>> rfkill ansi_cprng snd_hda_codec_generic nvidia(PO) aesni_intel
>>>>>> aes_x86_64 lrw gf128mul snd_hda_intel glue_helper ablk_helper
>>>>>> snd_hda_codec cryptd snd_hda_core serio_raw pcspkr snd_hwdep snd_pcm
>>>>>> i2c_i801 snd_timer snd lpc_ich soundcore 8250_fintek mei_me shpchp mei
>>>>>> Apr 20 07:05:37 rakete kernel: mfd_core battery tpm_tis tpm evdev
>>>>>> processor drm fuse ecryptfs cbc sha256_ssse3 sha256_generic hmac
>>>>>> encrypted_keys parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache
>>>>>> jbd2 btrfs raid456 async_raid6_recov async_memcpy async_pq async_xor
>>>>>> async_tx xor hid_generic usbhid hid raid6_pq libcrc32c crc32c_generic
>>>>>> md_mod dm_mirror dm_region_hash dm_log dm_mod sr_mod sg cdrom sd_mod
>>>>>> ata_generic ahci libahci pata_via xhci_pci ehci_pci crc32c_intel
>>>>>> xhci_hcd ehci_hcd libata psmouse scsi_mod atl1c usbcore usb_common
>>>>>> fjes video wmi fan thermal button
>>>>>> Apr 20 07:05:37 rakete kernel: CPU: 7 PID: 16738 Comm: cp Tainted:
>>>>>> P O 4.4.0-0.bpo.1-amd64 #1 Debian 4.4.6-1~bpo8+1
>>>>>> Apr 20 07:05:37 rakete kernel: Hardware name: System manufacturer
>>>>>> System Product Name/P8H67-V, BIOS 3707 07/12/2013
>>>>>> Apr 20 07:05:37 rakete kernel: 0000000000000286 000000006a1407c8
>>>>>> ffffffff812ed425 ffff88016b6dfb90
>>>>>> Apr 20 07:05:37 rakete kernel: ffffffffa03817b8 ffffffff81077ea1
>>>>>> ffff88018e7fcd30 ffff88016b6dfbe8
>>>>>> Apr 20 07:05:37 rakete kernel: ffff88005d863e88 ffff8801cde7a980
>>>>>> ffff88018e7fce48 ffffffff81077f2c
>>>>>> Apr 20 07:05:37 rakete kernel: Call Trace:
>>>>>> Apr 20 07:05:37 rakete kernel: [<ffffffff812ed425>] ?
>>>>>> dump_stack+0x5c/0x77
>>>>>> Apr 20 07:05:37 rakete kernel: [<ffffffff81077ea1>] ?
>>>>>> warn_slowpath_common+0x81/0xb0
>>>>>> Apr 20 07:05:37 rakete kernel: [<ffffffff81077f2c>] ?
>>>>>> warn_slowpath_fmt+0x5c/0x80
>>>>>> Apr 20 07:05:37 rakete kernel: [<ffffffffa02d74af>] ?
>>>>>> __btrfs_cow_block+0x56f/0x5e0 [btrfs]
>>>>>> Apr 20 07:05:37 rakete kernel: [<ffffffffa02d76af>] ?
>>>>>> btrfs_cow_block+0x10f/0x1d0 [btrfs]
>>>>>> Apr 20 07:05:37 rakete kernel: [<ffffffffa02db2cd>] ?
>>>>>> btrfs_search_slot+0x1fd/0xa30 [btrfs]
>>>>>> Apr 20 07:05:37 rakete kernel: [<ffffffffa02dd3f1>] ?
>>>>>> btrfs_insert_empty_items+0x71/0xc0 [btrfs]
>>>>>> Apr 20 07:05:37 rakete kernel: [<ffffffff811f4d92>] ?
>>>>>> insert_inode_locked4+0xa2/0x1c0
>>>>>> Apr 20 07:05:37 rakete kernel: [<ffffffffa030ee5d>] ?
>>>>>> btrfs_new_inode+0x1cd/0x590 [btrfs]
>>>>>> Apr 20 07:05:37 rakete kernel: [<ffffffffa0310a77>] ?
>>>>>> btrfs_mkdir+0x107/0x1f0 [btrfs]
>>>>>> Apr 20 07:05:37 rakete kernel: [<ffffffff811e80b0>] ?
>>>>>> vfs_mkdir+0xb0/0x140
>>>>>> Apr 20 07:05:37 rakete kernel: [<ffffffff811e9d3e>] ?
>>>>>> SyS_mkdir+0xce/0x110
>>>>>> Apr 20 07:05:37 rakete kernel: [<ffffffff81592736>] ?
>>>>>> system_call_fast_compare_end+0xc/0x6b
>>>>>> Apr 20 07:05:37 rakete kernel: ---[ end trace 025eb0e83ffed96f ]---
>>>>>> Apr 20 07:05:37 rakete kernel: BTRFS: error (device sdi) in
>>>>>> __btrfs_cow_block:1156: errno=-5 IO failure
>>>>>> Apr 20 07:05:37 rakete kernel: BTRFS info (device sdi): forced
>>>>>> readonly
>>>>>>
>>>>>> ####
>>>>>> Try to copy again:
>>>>>> 11# cp -r /root/ .
>>>>>> cp: cannot create directory './root': Read-only file system
>>>>>> ####
>>>>>> /dev/sdg on /mnt/raid1 type btrfs
>>>>>> (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
>>>>>> ####
>>>>>> plug in device sdg again:
>>>>>>
>>>>>> Apr 20 07:07:39 rakete udisksd[3671]: Cleaning up mount point
>>>>>> /media/matthias/BACKUP (device 8:81 no longer exist)
>>>>>> Apr 20 07:07:39 rakete kernel: usb 3-1: USB disconnect, device
>>>>>> number 3
>>>>>> Apr 20 07:07:39 rakete udisksd[3671]: Error statting /dev/sdg: No
>>>>>> such file or directory
>>>>>> Apr 20 07:07:39 rakete umount[16807]: umount: /mnt/raid1: target is
>>>>>> busy
>>>>>> Apr 20 07:07:39 rakete umount[16807]: (In some cases useful info
>>>>>> about processes that
>>>>>> Apr 20 07:07:39 rakete umount[16807]: use the device is found by
>>>>>> lsof(8) or fuser(1).)
>>>>>> Apr 20 07:07:39 rakete systemd[1]: mnt-raid1.mount mount process
>>>>>> exited, code=exited status=32
>>>>>> Apr 20 07:07:39 rakete systemd[1]: Failed unmounting /mnt/raid1.
>>>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: new SuperSpeed USB device
>>>>>> number 4 using xhci_hcd
>>>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: New USB device found,
>>>>>> idVendor=152d, idProduct=0567
>>>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: New USB device strings:
>>>>>> Mfr=10, Product=11, SerialNumber=5
>>>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI
>>>>>> Bridge
>>>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: Manufacturer: JMicron
>>>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: SerialNumber: 152D00539000
>>>>>> Apr 20 07:08:01 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage
>>>>>> device detected
>>>>>> Apr 20 07:08:01 rakete kernel: usb-storage 3-1:1.0: Quirks match for
>>>>>> vid 152d pid 0567: 5000000
>>>>>> Apr 20 07:08:01 rakete kernel: scsi host10: usb-storage 3-1:1.0
>>>>>> Apr 20 07:08:01 rakete mtp-probe[16826]: checking bus 3, device 4:
>>>>>> "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
>>>>>> Apr 20 07:08:01 rakete mtp-probe[16826]: bus: 3, device: 4 was not an
>>>>>> MTP device
>>>>>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:0: Direct-Access WDC
>>>>>> WD20 02FAEX-007BA0 0125 PQ: 0 ANSI: 6
>>>>>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:1: Direct-Access WDC
>>>>>> WD75 00AACS-00C7B0 0125 PQ: 0 ANSI: 6
>>>>>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:2: Direct-Access WDC
>>>>>> WD50 01AALS-00L3B2 0125 PQ: 0 ANSI: 6
>>>>>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:3: Direct-Access
>>>>>> SAMSUNG SP2504C 0125 PQ: 0 ANSI: 6
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: Attached scsi generic sg6
>>>>>> type 0
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] 3907029168 512-byte
>>>>>> logical blocks: (2.00 TB/1.82 TiB)
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Write Protect is off
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Mode Sense: 67 00
>>>>>> 10 08
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: Attached scsi generic sg7
>>>>>> type 0
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] No Caching mode
>>>>>> page found
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Assuming drive
>>>>>> cache: write through
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] 1465149168 512-byte
>>>>>> logical blocks: (750 GB/699 GiB)
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: Attached scsi generic sg8
>>>>>> type 0
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Write Protect is off
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Mode Sense: 67 00
>>>>>> 10 08
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: Attached scsi generic sg9
>>>>>> type 0
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] 976773168 512-byte
>>>>>> logical blocks: (500 GB/466 GiB)
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] No Caching mode
>>>>>> page found
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Assuming drive
>>>>>> cache: write through
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Write Protect is off
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Mode Sense: 67 00
>>>>>> 10 08
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] 488395055 512-byte
>>>>>> logical blocks: (250 GB/233 GiB)
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] No Caching mode
>>>>>> page found
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Assuming drive
>>>>>> cache: write through
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Write Protect is off
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Mode Sense: 67 00
>>>>>> 10 08
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] No Caching mode
>>>>>> page found
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Assuming drive
>>>>>> cache: write through
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Attached SCSI disk
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Attached SCSI disk
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Attached SCSI disk
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Attached SCSI disk
>>>>>> Apr 20 07:08:02 rakete kernel: EXT4-fs (sdf1): recovery complete
>>>>>> Apr 20 07:08:02 rakete kernel: EXT4-fs (sdf1): mounted filesystem
>>>>>> with ordered data mode. Opts: (null)
>>>>>>
>>>>>> ####
>>>>>> still ro mode
>>>>>> /dev/sdj on /mnt/raid1 type btrfs
>>>>>> (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
>>>>>> ####
>>>>>> 14# btrfs fi show
>>>>>> Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>>>>>> Total devices 3 FS bytes used 1.60GiB
>>>>>> devid 1 size 698.64GiB used 3.03GiB path /dev/sdj
>>>>>> devid 2 size 465.76GiB used 3.03GiB path /dev/sdk
>>>>>> devid 3 size 232.88GiB used 0.00B path /dev/sdl
>>>>>> ####
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>>> linux-btrfs" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Question: raid1 behaviour on failure
2016-04-22 6:02 ` Qu Wenruo
@ 2016-04-23 7:07 ` Matthias Bodenbinder
2016-04-23 7:17 ` Matthias Bodenbinder
` (2 more replies)
0 siblings, 3 replies; 32+ messages in thread
From: Matthias Bodenbinder @ 2016-04-23 7:07 UTC (permalink / raw)
To: linux-btrfs
Here is my newest test. The backports provide a 4.5 kernel:
####
kernel: 4.5.0-0.bpo.1-amd64
btrfs-tools: 4.4-1~bpo8+1
####
This time the raid1 is automatically unmounted after I unplug the device and it cannot be mounted while the device is missing. See below.
Matthias
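For context on the failed mount: btrfs normally refuses to mount a multi-device filesystem with a member missing unless the "degraded" mount option is given. Whether that would have helped in this particular test is not shown in this thread. The sketch below only builds the command line and does not run it; the helper name mount_argv and the example device /dev/sdh are illustrative assumptions.

```python
def mount_argv(fi_show_text, device, mountpoint):
    """Build a mount command, adding -o degraded when a device is missing.

    fi_show_text is the text printed by `btrfs fi show`; the presence of
    the "Some devices missing" marker is used as the only signal.
    Nothing is executed here; the caller decides whether to run it.
    """
    argv = ["mount", "-t", "btrfs"]
    if "Some devices missing" in fi_show_text:
        # `degraded` is the btrfs mount option for mounting with missing members.
        argv += ["-o", "degraded"]
    return argv + [device, mountpoint]


if __name__ == "__main__":
    sample = "Total devices 3 FS bytes used 1.60GiB\n*** Some devices missing\n"
    # /dev/sdh is a hypothetical surviving member, used only for illustration.
    print(" ".join(mount_argv(sample, "/dev/sdh", "/mnt/raid1")))
```

Returning the argument list instead of calling mount directly keeps the decision to mount degraded with the administrator, since writing to a degraded raid1 has consequences of its own.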
####
1) turn on the Fantec case:
Apr 23 08:45:38 rakete kernel: usb 3-1: new SuperSpeed USB device number 2 using xhci_hcd
Apr 23 08:45:38 rakete kernel: usb 3-1: New USB device found, idVendor=152d, idProduct=0567
Apr 23 08:45:38 rakete kernel: usb 3-1: New USB device strings: Mfr=10, Product=11, SerialNumber=5
Apr 23 08:45:38 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
Apr 23 08:45:38 rakete kernel: usb 3-1: Manufacturer: JMicron
Apr 23 08:45:38 rakete kernel: usb 3-1: SerialNumber: 152D00539000
Apr 23 08:45:38 rakete mtp-probe[3641]: checking bus 3, device 2: "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
Apr 23 08:45:38 rakete mtp-probe[3641]: bus: 3, device: 2 was not an MTP device
Apr 23 08:45:38 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage device detected
Apr 23 08:45:38 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid 152d pid 0567: 5000000
Apr 23 08:45:38 rakete kernel: scsi host8: usb-storage 3-1:1.0
Apr 23 08:45:38 rakete kernel: usbcore: registered new interface driver usb-storage
Apr 23 08:45:38 rakete kernel: usbcore: registered new interface driver uas
Apr 23 08:45:39 rakete kernel: scsi 8:0:0:0: Direct-Access WDC WD20 02FAEX-007BA0 0125 PQ: 0 ANSI: 6
Apr 23 08:45:39 rakete kernel: scsi 8:0:0:1: Direct-Access WDC WD75 00AACS-00C7B0 0125 PQ: 0 ANSI: 6
Apr 23 08:45:39 rakete kernel: scsi 8:0:0:2: Direct-Access WDC WD50 01AALS-00L3B2 0125 PQ: 0 ANSI: 6
Apr 23 08:45:39 rakete kernel: scsi 8:0:0:3: Direct-Access SAMSUNG SP2504C 0125 PQ: 0 ANSI: 6
Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: Attached scsi generic sg6 type 0
Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: Attached scsi generic sg7 type 0
Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] Write Protect is off
Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] Mode Sense: 67 00 10 08
Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: Attached scsi generic sg8 type 0
Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] 1465149168 512-byte logical blocks: (750 GB/699 GiB)
Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] Write Protect is off
Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] Mode Sense: 67 00 10 08
Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: [sdh] 976773168 512-byte logical blocks: (500 GB/466 GiB)
Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: Attached scsi generic sg9 type 0
Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] No Caching mode page found
Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] Assuming drive cache: write through
Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: [sdh] Write Protect is off
Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: [sdh] Mode Sense: 67 00 10 08
Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: [sdi] 488395055 512-byte logical blocks: (250 GB/233 GiB)
Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] No Caching mode page found
Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] Assuming drive cache: write through
Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: [sdi] Write Protect is off
Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: [sdi] Mode Sense: 67 00 10 08
Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: [sdh] No Caching mode page found
Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: [sdh] Assuming drive cache: write through
Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: [sdi] No Caching mode page found
Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: [sdi] Assuming drive cache: write through
Apr 23 08:45:39 rakete kernel: sdf: sdf1
Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] Attached SCSI disk
Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] Attached SCSI disk
Apr 23 08:45:40 rakete kernel: sd 8:0:0:2: [sdh] Attached SCSI disk
Apr 23 08:45:40 rakete kernel: sd 8:0:0:3: [sdi] Attached SCSI disk
Apr 23 08:45:40 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 1 transid 89 /dev/sdg
Apr 23 08:45:40 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 2 transid 89 /dev/sdh
Apr 23 08:45:40 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 3 transid 89 /dev/sdi
Apr 23 08:45:40 rakete kernel: EXT4-fs (sdf1): mounted filesystem with ordered data mode. Opts: (null)
Apr 23 08:45:40 rakete udisksd[2422]: Mounted /dev/sdf1 at /media/matthias/BACKUP on behalf of uid 1000
####
7# mount /mnt/raid1/
Apr 23 08:47:31 rakete kernel: BTRFS info (device sdi): enabling auto defrag
Apr 23 08:47:31 rakete kernel: BTRFS info (device sdi): disk space caching is enabled
Apr 23 08:47:31 rakete kernel: BTRFS: has skinny extents
8# btrfs fi show
Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
Total devices 3 FS bytes used 1.60GiB
devid 1 size 698.64GiB used 3.03GiB path /dev/sdg
devid 2 size 465.76GiB used 3.03GiB path /dev/sdh
devid 3 size 232.88GiB used 0.00B path /dev/sdi
9# ls -l /mnt/raid1/
total 0
drwxrwxr-x 1 root root 36 Nov 14 2014 AfterShot2(64-bit)
drwxrwxr-x 1 root root 5082 Apr 17 09:06 etc
drwxr-xr-x 1 root root 108 Mar 24 07:31 var
####
Unplug the biggest HD
Apr 23 08:51:29 rakete kernel: usb 3-1: USB disconnect, device number 2
Apr 23 08:51:29 rakete kernel: Buffer I/O error on dev sdf1, logical block 243826688, lost sync page write
Apr 23 08:51:29 rakete kernel: JBD2: Error -5 detected when updating journal superblock for sdf1-8.
Apr 23 08:51:29 rakete kernel: Aborting journal on device sdf1-8.
Apr 23 08:51:29 rakete kernel: Buffer I/O error on dev sdf1, logical block 243826688, lost sync page write
Apr 23 08:51:29 rakete kernel: JBD2: Error -5 detected when updating journal superblock for sdf1-8.
Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 5, flush 0, corrupt 0, gen 0
Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 6, flush 0, corrupt 0, gen 0
Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 7, flush 0, corrupt 0, gen 0
Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 8, flush 0, corrupt 0, gen 0
Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 9, flush 0, corrupt 0, gen 0
Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 10, flush 0, corrupt 0, gen 0
Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): error reading free space cache
Apr 23 08:51:29 rakete kernel: BTRFS warning (device sdi): failed to load free space cache for block group 20497563648, rebuilding it now
Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): error reading free space cache
Apr 23 08:51:29 rakete kernel: BTRFS warning (device sdi): failed to load free space cache for block group 21571305472, rebuilding it now
Apr 23 08:51:29 rakete kernel: BTRFS: error (device sdi) in btrfs_commit_transaction:2142: errno=-5 IO failure (Error while writing out transaction)
Apr 23 08:51:29 rakete kernel: BTRFS info (device sdi): forced readonly
Apr 23 08:51:29 rakete kernel: BTRFS warning (device sdi): Skipping commit of aborted transaction.
Apr 23 08:51:29 rakete kernel: ------------[ cut here ]------------
Apr 23 08:51:29 rakete kernel: WARNING: CPU: 1 PID: 4277 at /build/linux-Ki7dwx/linux-4.5.1/fs/btrfs/transaction.c:1764 cleanup_transaction+0x96/0x300 [btrfs]()
Apr 23 08:51:29 rakete kernel: BTRFS: Transaction aborted (error -5)
Apr 23 08:51:29 rakete kernel: Modules linked in: uas(E) usb_storage(E) pci_stub(E) vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) binfmt_misc(E) dvb_ttpci(E) saa7146_vv(E) ttpci_eeprom(E) saa7146(E) videobuf_dma_sg(E) videobuf_core(E) dvb_core(E) v4l2_common(E) videodev(E) media(E) cfg80211(E) vboxdrv(OE) cpufreq_powersave(E) cpufreq_conservative(E) cpufreq_userspace(E) cpufreq_stats(E) snd_hda_codec_hdmi(E) intel_rapl(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) drbg(E) ansi_cprng(E) eeepc_wmi(E) asus_wmi(E) sparse_keymap(E) iTCO_wdt(E) joydev(E) iTCO_vendor_support(E) rfkill(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) snd_hda_codec_realtek(E) pcspkr(E) snd_hda_codec_generic(E)
Apr 23 08:51:29 rakete kernel: serio_raw(E) i2c_i801(E) lpc_ich(E) mfd_core(E) snd_hda_intel(E) snd_hda_codec(E) snd_hda_core(E) snd_hwdep(E) snd_pcm(E) evdev(E) battery(E) snd_timer(E) 8250_fintek(E) snd(E) mei_me(E) soundcore(E) mei(E) shpchp(E) tpm_tis(E) tpm(E) processor(E) nvidia(POE) drm(E) fuse(E) ecryptfs(E) cbc(E) hmac(E) encrypted_keys(E) parport_pc(E) ppdev(E) lp(E) parport(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) btrfs(E) raid456(E) async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) async_tx(E) xor(E) hid_generic(E) usbhid(E) hid(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) md_mod(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) sg(E) sr_mod(E) cdrom(E) sd_mod(E) ata_generic(E) ahci(E) pata_via(E) libahci(E) crc32c_intel(E) xhci_pci(E) ehci_pci(E) psmouse(E) libata(E) xhci_hcd(E)
Apr 23 08:51:29 rakete kernel: ehci_hcd(E) atl1c(E) scsi_mod(E) usbcore(E) usb_common(E) wmi(E) fan(E) thermal(E) fjes(E) video(E) button(E)
Apr 23 08:51:29 rakete kernel: CPU: 1 PID: 4277 Comm: umount Tainted: P OE 4.5.0-0.bpo.1-amd64 #1 Debian 4.5.1-1~bpo8+1
Apr 23 08:51:29 rakete kernel: Hardware name: System manufacturer System Product Name/P8H67-V, BIOS 3707 07/12/2013
Apr 23 08:51:29 rakete kernel: 0000000000000286 00000000bfe9047d ffffffff813099f5 ffff8801c1103ca8
Apr 23 08:51:29 rakete kernel: ffffffffc03a8c98 ffffffff81079a61 ffff880214562d90 ffff8801c1103d00
Apr 23 08:51:29 rakete kernel: ffff8801c5165980 00000000fffffffb ffff880214562d90 ffffffff81079aec
Apr 23 08:51:29 rakete kernel: Call Trace:
Apr 23 08:51:29 rakete kernel: [<ffffffff813099f5>] ? dump_stack+0x5c/0x77
Apr 23 08:51:29 rakete kernel: [<ffffffff81079a61>] ? warn_slowpath_common+0x81/0xb0
Apr 23 08:51:29 rakete kernel: [<ffffffff81079aec>] ? warn_slowpath_fmt+0x5c/0x80
Apr 23 08:51:29 rakete kernel: [<ffffffffc0320e46>] ? cleanup_transaction+0x96/0x300 [btrfs]
Apr 23 08:51:29 rakete kernel: [<ffffffff810b94a0>] ? wait_woken+0x90/0x90
Apr 23 08:51:29 rakete kernel: [<ffffffffc0321bf3>] ? btrfs_commit_transaction+0x2b3/0xa30 [btrfs]
Apr 23 08:51:29 rakete kernel: [<ffffffffc0322406>] ? start_transaction+0x96/0x4d0 [btrfs]
Apr 23 08:51:29 rakete kernel: [<ffffffffc031d0d2>] ? close_ctree+0x2b2/0x360 [btrfs]
Apr 23 08:51:29 rakete kernel: [<ffffffff81206fd7>] ? evict_inodes+0x147/0x170
Apr 23 08:51:29 rakete kernel: [<ffffffff811eda39>] ? generic_shutdown_super+0x69/0xf0
Apr 23 08:51:29 rakete kernel: [<ffffffff811edace>] ? kill_anon_super+0xe/0x20
Apr 23 08:51:29 rakete kernel: [<ffffffffc02f1603>] ? btrfs_kill_super+0x13/0x100 [btrfs]
Apr 23 08:51:29 rakete kernel: [<ffffffff811ed4c4>] ? deactivate_locked_super+0x34/0x60
Apr 23 08:51:29 rakete kernel: [<ffffffff81209d5b>] ? cleanup_mnt+0x3b/0x80
Apr 23 08:51:29 rakete kernel: [<ffffffff81096114>] ? task_work_run+0x74/0x90
Apr 23 08:51:29 rakete kernel: [<ffffffff8100334a>] ? exit_to_usermode_loop+0xba/0xc0
Apr 23 08:51:29 rakete kernel: [<ffffffff81003bcf>] ? syscall_return_slowpath+0x8f/0x110
Apr 23 08:51:29 rakete kernel: [<ffffffff815b9918>] ? int_ret_from_sys_call+0x25/0x8f
Apr 23 08:51:29 rakete kernel: ---[ end trace 6bbe2b6d20973e0e ]---
Apr 23 08:51:29 rakete kernel: BTRFS: error (device sdi) in cleanup_transaction:1764: errno=-5 IO failure
Apr 23 08:51:29 rakete kernel: BTRFS info (device sdi): delayed_refs has NO entry
Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): commit super ret -5
Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): cleaner transaction attach returned -30
....
Apr 23 08:51:48 rakete kernel: usb 3-1: new SuperSpeed USB device number 3 using xhci_hcd
Apr 23 08:51:48 rakete kernel: usb 3-1: New USB device found, idVendor=152d, idProduct=0567
Apr 23 08:51:48 rakete kernel: usb 3-1: New USB device strings: Mfr=10, Product=11, SerialNumber=5
Apr 23 08:51:48 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
Apr 23 08:51:48 rakete kernel: usb 3-1: Manufacturer: JMicron
Apr 23 08:51:48 rakete kernel: usb 3-1: SerialNumber: 152D00539000
Apr 23 08:51:48 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage device detected
Apr 23 08:51:48 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid 152d pid 0567: 5000000
Apr 23 08:51:48 rakete kernel: scsi host9: usb-storage 3-1:1.0
Apr 23 08:51:48 rakete mtp-probe[4301]: checking bus 3, device 3: "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
Apr 23 08:51:48 rakete mtp-probe[4301]: bus: 3, device: 3 was not an MTP device
Apr 23 08:51:49 rakete kernel: scsi 9:0:0:0: Direct-Access WDC WD20 02FAEX-007BA0 0125 PQ: 0 ANSI: 6
Apr 23 08:51:49 rakete kernel: scsi 9:0:0:1: Direct-Access WDC WD50 01AALS-00L3B2 0125 PQ: 0 ANSI: 6
Apr 23 08:51:49 rakete kernel: scsi 9:0:0:2: Direct-Access SAMSUNG SP2504C 0125 PQ: 0 ANSI: 6
Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: Attached scsi generic sg6 type 0
Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: Attached scsi generic sg7 type 0
Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: Attached scsi generic sg8 type 0
Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] Write Protect is off
Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] Mode Sense: 67 00 10 08
Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] 976773168 512-byte logical blocks: (500 GB/466 GiB)
Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] No Caching mode page found
Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] Assuming drive cache: write through
Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] Write Protect is off
Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] Mode Sense: 67 00 10 08
Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] 488395055 512-byte logical blocks: (250 GB/233 GiB)
Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] No Caching mode page found
Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] Assuming drive cache: write through
Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] Write Protect is off
Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] Mode Sense: 67 00 10 08
Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] No Caching mode page found
Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] Assuming drive cache: write through
Apr 23 08:51:49 rakete kernel: sdf: sdf1
Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] Attached SCSI disk
Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] Attached SCSI disk
Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] Attached SCSI disk
Apr 23 08:51:49 rakete udisksd[2422]: Mounted /dev/sdf1 at /media/matthias/BACKUP on behalf of uid 1000
Apr 23 08:51:49 rakete kernel: EXT4-fs (sdf1): recovery complete
Apr 23 08:51:49 rakete kernel: EXT4-fs (sdf1): mounted filesystem with ordered data mode. Opts: (null)
####
10# btrfs fi show
warning, device 1 is missing
warning devid 1 not found already
Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
Total devices 3 FS bytes used 1.60GiB
devid 2 size 465.76GiB used 3.03GiB path /dev/sdg
devid 3 size 232.88GiB used 0.00B path /dev/sdh
*** Some devices missing
####
This time the raid1 is in state "unmounted" after removing the device. This is different to what I found with kernel 4.4.
12# ls -l /mnt/raid1/
total 0
####
Trying to mount it again:
14# mount /mnt/raid1/
mount: wrong fs type, bad option, bad superblock on /dev/sdh,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so.
####
Apr 23 08:54:35 rakete kernel: BTRFS info (device sdh): enabling auto defrag
Apr 23 08:54:35 rakete kernel: BTRFS info (device sdh): disk space caching is enabled
Apr 23 08:54:35 rakete kernel: BTRFS: has skinny extents
Apr 23 08:54:35 rakete kernel: BTRFS: failed to read the system array on sdh
Apr 23 08:54:35 rakete kernel: BTRFS: open_ctree failed
####
Plug in the device again.
Apr 23 08:55:44 rakete kernel: usb 3-1: USB disconnect, device number 3
Apr 23 08:56:06 rakete kernel: usb 3-1: new SuperSpeed USB device number 4 using xhci_hcd
Apr 23 08:56:06 rakete kernel: usb 3-1: New USB device found, idVendor=152d, idProduct=0567
Apr 23 08:56:06 rakete kernel: usb 3-1: New USB device strings: Mfr=10, Product=11, SerialNumber=5
Apr 23 08:56:06 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
Apr 23 08:56:06 rakete kernel: usb 3-1: Manufacturer: JMicron
Apr 23 08:56:06 rakete kernel: usb 3-1: SerialNumber: 152D00539000
Apr 23 08:56:06 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage device detected
Apr 23 08:56:06 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid 152d pid 0567: 5000000
Apr 23 08:56:06 rakete kernel: scsi host10: usb-storage 3-1:1.0
Apr 23 08:56:06 rakete mtp-probe[4751]: checking bus 3, device 4: "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
Apr 23 08:56:06 rakete mtp-probe[4751]: bus: 3, device: 4 was not an MTP device
Apr 23 08:56:07 rakete kernel: scsi 10:0:0:0: Direct-Access WDC WD20 02FAEX-007BA0 0125 PQ: 0 ANSI: 6
Apr 23 08:56:07 rakete kernel: scsi 10:0:0:1: Direct-Access WDC WD75 00AACS-00C7B0 0125 PQ: 0 ANSI: 6
Apr 23 08:56:07 rakete kernel: scsi 10:0:0:2: Direct-Access WDC WD50 01AALS-00L3B2 0125 PQ: 0 ANSI: 6
Apr 23 08:56:07 rakete kernel: scsi 10:0:0:3: Direct-Access SAMSUNG SP2504C 0125 PQ: 0 ANSI: 6
Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: Attached scsi generic sg6 type 0
Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] Write Protect is off
Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] Mode Sense: 67 00 10 08
Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: Attached scsi generic sg7 type 0
Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] No Caching mode page found
Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] Assuming drive cache: write through
Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: Attached scsi generic sg8 type 0
Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] 1465149168 512-byte logical blocks: (750 GB/699 GiB)
Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: Attached scsi generic sg9 type 0
Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] Write Protect is off
Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] Mode Sense: 67 00 10 08
Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] 976773168 512-byte logical blocks: (500 GB/466 GiB)
Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] No Caching mode page found
Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] Assuming drive cache: write through
Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] 488395055 512-byte logical blocks: (250 GB/233 GiB)
Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] Write Protect is off
Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] Mode Sense: 67 00 10 08
Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] Write Protect is off
Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] Mode Sense: 67 00 10 08
Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] No Caching mode page found
Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] Assuming drive cache: write through
Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] No Caching mode page found
Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] Assuming drive cache: write through
Apr 23 08:56:07 rakete kernel: sdf: sdf1
Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] Attached SCSI disk
Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] Attached SCSI disk
Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] Attached SCSI disk
Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] Attached SCSI disk
Apr 23 08:56:07 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 1 transid 89 /dev/sdg
Apr 23 08:56:07 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 2 transid 89 /dev/sdh
Apr 23 08:56:07 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 3 transid 89 /dev/sdi
Apr 23 08:56:07 rakete kernel: EXT4-fs (sdf1): recovery complete
Apr 23 08:56:07 rakete kernel: EXT4-fs (sdf1): mounted filesystem with ordered data mode. Opts: (null)
####
15# btrfs fi show
Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
Total devices 3 FS bytes used 1.60GiB
devid 1 size 698.64GiB used 3.03GiB path /dev/sdg
devid 2 size 465.76GiB used 3.03GiB path /dev/sdh
devid 3 size 232.88GiB used 0.00B path /dev/sdi
####
18# mount /mnt/raid1/
Apr 23 08:57:00 rakete kernel: BTRFS info (device sdi): enabling auto defrag
Apr 23 08:57:00 rakete kernel: BTRFS info (device sdi): disk space caching is enabled
Apr 23 08:57:00 rakete kernel: BTRFS: has skinny extents
####
19# ls -l /mnt/raid1/
total 0
drwxrwxr-x 1 root root 36 Nov 14 2014 AfterShot2(64-bit)
drwxrwxr-x 1 root root 5082 Apr 17 09:06 etc
drwxr-xr-x 1 root root 108 Mar 24 07:31 var
####
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Question: raid1 behaviour on failure
2016-04-23 7:07 ` Matthias Bodenbinder
@ 2016-04-23 7:17 ` Matthias Bodenbinder
2016-04-26 8:17 ` Satoru Takeuchi
2016-04-26 15:16 ` Henk Slager
2 siblings, 0 replies; 32+ messages in thread
From: Matthias Bodenbinder @ 2016-04-23 7:17 UTC (permalink / raw)
To: linux-btrfs
Am 23.04.2016 um 09:07 schrieb Matthias Bodenbinder:
> 14# mount /mnt/raid1/
> mount: wrong fs type, bad option, bad superblock on /dev/sdh,
> missing codepage or helper program, or other error
>
> In some cases useful info is found in syslog - try
> dmesg | tail or so.
> ####
My /etc/fstab has the following entry for the raid1:
UUID=16d5891f-5d52-4b29-8591-588ddf11e73d /mnt/raid1 btrfs noauto,noatime,autodefrag 1 2
Matthias
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Question: raid1 behaviour on failure
2016-04-23 7:07 ` Matthias Bodenbinder
2016-04-23 7:17 ` Matthias Bodenbinder
@ 2016-04-26 8:17 ` Satoru Takeuchi
2016-04-26 15:16 ` Henk Slager
2 siblings, 0 replies; 32+ messages in thread
From: Satoru Takeuchi @ 2016-04-26 8:17 UTC (permalink / raw)
To: Matthias Bodenbinder, linux-btrfs
On 2016/04/23 16:07, Matthias Bodenbinder wrote:
> Here is my newest test. The backports provide a 4.5 kernel:
>
> ####
> kernel: 4.5.0-0.bpo.1-amd64
> btrfs-tools: 4.4-1~bpo8+1
> ####
>
> This time the raid1 is automatically unmounted after I unplug the device and it can not be mounted while the device is missing. See below.
>
> Matthias
As I said before, I think this problem is caused not by Btrfs
but by the hardware.
Please see the following comments.
>
>
> ####
> 1) turn on the Fantec case:
>
> Apr 23 08:45:38 rakete kernel: usb 3-1: new SuperSpeed USB device number 2 using xhci_hcd
> Apr 23 08:45:38 rakete kernel: usb 3-1: New USB device found, idVendor=152d, idProduct=0567
> Apr 23 08:45:38 rakete kernel: usb 3-1: New USB device strings: Mfr=10, Product=11, SerialNumber=5
> Apr 23 08:45:38 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
> Apr 23 08:45:38 rakete kernel: usb 3-1: Manufacturer: JMicron
> Apr 23 08:45:38 rakete kernel: usb 3-1: SerialNumber: 152D00539000
> Apr 23 08:45:38 rakete mtp-probe[3641]: checking bus 3, device 2: "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
> Apr 23 08:45:38 rakete mtp-probe[3641]: bus: 3, device: 2 was not an MTP device
> Apr 23 08:45:38 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage device detected
> Apr 23 08:45:38 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid 152d pid 0567: 5000000
> Apr 23 08:45:38 rakete kernel: scsi host8: usb-storage 3-1:1.0
> Apr 23 08:45:38 rakete kernel: usbcore: registered new interface driver usb-storage
> Apr 23 08:45:38 rakete kernel: usbcore: registered new interface driver uas
> Apr 23 08:45:39 rakete kernel: scsi 8:0:0:0: Direct-Access WDC WD20 02FAEX-007BA0 0125 PQ: 0 ANSI: 6
> Apr 23 08:45:39 rakete kernel: scsi 8:0:0:1: Direct-Access WDC WD75 00AACS-00C7B0 0125 PQ: 0 ANSI: 6
> Apr 23 08:45:39 rakete kernel: scsi 8:0:0:2: Direct-Access WDC WD50 01AALS-00L3B2 0125 PQ: 0 ANSI: 6
> Apr 23 08:45:39 rakete kernel: scsi 8:0:0:3: Direct-Access SAMSUNG SP2504C 0125 PQ: 0 ANSI: 6
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: Attached scsi generic sg6 type 0
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: Attached scsi generic sg7 type 0
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] Write Protect is off
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] Mode Sense: 67 00 10 08
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: Attached scsi generic sg8 type 0
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] 1465149168 512-byte logical blocks: (750 GB/699 GiB)
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] Write Protect is off
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] Mode Sense: 67 00 10 08
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: [sdh] 976773168 512-byte logical blocks: (500 GB/466 GiB)
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: Attached scsi generic sg9 type 0
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] No Caching mode page found
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] Assuming drive cache: write through
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: [sdh] Write Protect is off
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: [sdh] Mode Sense: 67 00 10 08
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: [sdi] 488395055 512-byte logical blocks: (250 GB/233 GiB)
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] No Caching mode page found
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] Assuming drive cache: write through
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: [sdi] Write Protect is off
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: [sdi] Mode Sense: 67 00 10 08
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: [sdh] No Caching mode page found
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: [sdh] Assuming drive cache: write through
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: [sdi] No Caching mode page found
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: [sdi] Assuming drive cache: write through
> Apr 23 08:45:39 rakete kernel: sdf: sdf1
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] Attached SCSI disk
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] Attached SCSI disk
> Apr 23 08:45:40 rakete kernel: sd 8:0:0:2: [sdh] Attached SCSI disk
> Apr 23 08:45:40 rakete kernel: sd 8:0:0:3: [sdi] Attached SCSI disk
When you turned on the Fantec case, four disks, WD20(sdf), WD75(sdg),
WD50(sdh), and SP2504C(sdi), were attached, as expected.
> Apr 23 08:45:40 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 1 transid 89 /dev/sdg
> Apr 23 08:45:40 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 2 transid 89 /dev/sdh
> Apr 23 08:45:40 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 3 transid 89 /dev/sdi
> Apr 23 08:45:40 rakete kernel: EXT4-fs (sdf1): mounted filesystem with ordered data mode. Opts: (null)
> Apr 23 08:45:40 rakete udisksd[2422]: Mounted /dev/sdf1 at /media/matthias/BACKUP on behalf of uid 1000
>
> ####
>
> 7# mount /mnt/raid1/
>
> Apr 23 08:47:31 rakete kernel: BTRFS info (device sdi): enabling auto defrag
> Apr 23 08:47:31 rakete kernel: BTRFS info (device sdi): disk space caching is enabled
> Apr 23 08:47:31 rakete kernel: BTRFS: has skinny extents
>
> 8# btrfs fi show
> Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
> Total devices 3 FS bytes used 1.60GiB
> devid 1 size 698.64GiB used 3.03GiB path /dev/sdg
> devid 2 size 465.76GiB used 3.03GiB path /dev/sdh
> devid 3 size 232.88GiB used 0.00B path /dev/sdi
>
> 9# ls -l /mnt/raid1/
> total 0
> drwxrwxr-x 1 root root 36 Nov 14 2014 AfterShot2(64-bit)
> drwxrwxr-x 1 root root 5082 Apr 17 09:06 etc
> drwxr-xr-x 1 root root 108 Mar 24 07:31 var
>
> ####
>
> Unplug the biggest HD
Then you hot-unplugged the biggest disk, WD75(sdg).
>
> Apr 23 08:51:29 rakete kernel: usb 3-1: USB disconnect, device number 2
> Apr 23 08:51:29 rakete kernel: Buffer I/O error on dev sdf1, logical block 243826688, lost sync page write
> Apr 23 08:51:29 rakete kernel: JBD2: Error -5 detected when updating journal superblock for sdf1-8.
> Apr 23 08:51:29 rakete kernel: Aborting journal on device sdf1-8.
> Apr 23 08:51:29 rakete kernel: Buffer I/O error on dev sdf1, logical block 243826688, lost sync page write
> Apr 23 08:51:29 rakete kernel: JBD2: Error -5 detected when updating journal superblock for sdf1-8.
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 5, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 6, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 7, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 8, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 9, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 10, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): error reading free space cache
> Apr 23 08:51:29 rakete kernel: BTRFS warning (device sdi): failed to load free space cache for block group 20497563648, rebuilding it now
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): error reading free space cache
> Apr 23 08:51:29 rakete kernel: BTRFS warning (device sdi): failed to load free space cache for block group 21571305472, rebuilding it now
> Apr 23 08:51:29 rakete kernel: BTRFS: error (device sdi) in btrfs_commit_transaction:2142: errno=-5 IO failure (Error while writing out transaction)
> Apr 23 08:51:29 rakete kernel: BTRFS info (device sdi): forced readonly
> Apr 23 08:51:29 rakete kernel: BTRFS warning (device sdi): Skipping commit of aborted transaction.
> Apr 23 08:51:29 rakete kernel: ------------[ cut here ]------------
> Apr 23 08:51:29 rakete kernel: WARNING: CPU: 1 PID: 4277 at /build/linux-Ki7dwx/linux-4.5.1/fs/btrfs/transaction.c:1764 cleanup_transaction+0x96/0x300 [btrfs]()
> Apr 23 08:51:29 rakete kernel: BTRFS: Transaction aborted (error -5)
> Apr 23 08:51:29 rakete kernel: Modules linked in: uas(E) usb_storage(E) pci_stub(E) vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) binfmt_misc(E) dvb_ttpci(E) saa7146_vv(E) ttpci_eeprom(E) saa7146(E) videobuf_dma_sg(E) videobuf_core(E) dvb_core(E) v4l2_common(E) videodev(E) media(E) cfg80211(E) vboxdrv(OE) cpufreq_powersave(E) cpufreq_conservative(E) cpufreq_userspace(E) cpufreq_stats(E) snd_hda_codec_hdmi(E) intel_rapl(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) drbg(E) ansi_cprng(E) eeepc_wmi(E) asus_wmi(E) sparse_keymap(E) iTCO_wdt(E) joydev(E) iTCO_vendor_support(E) rfkill(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) snd_hda_codec_realtek(E) pcspkr(E) snd_hda_codec_generic(E)
> Apr 23 08:51:29 rakete kernel: serio_raw(E) i2c_i801(E) lpc_ich(E) mfd_core(E) snd_hda_intel(E) snd_hda_codec(E) snd_hda_core(E) snd_hwdep(E) snd_pcm(E) evdev(E) battery(E) snd_timer(E) 8250_fintek(E) snd(E) mei_me(E) soundcore(E) mei(E) shpchp(E) tpm_tis(E) tpm(E) processor(E) nvidia(POE) drm(E) fuse(E) ecryptfs(E) cbc(E) hmac(E) encrypted_keys(E) parport_pc(E) ppdev(E) lp(E) parport(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) btrfs(E) raid456(E) async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) async_tx(E) xor(E) hid_generic(E) usbhid(E) hid(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) md_mod(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) sg(E) sr_mod(E) cdrom(E) sd_mod(E) ata_generic(E) ahci(E) pata_via(E) libahci(E) crc32c_intel(E) xhci_pci(E) ehci_pci(E) psmouse(E) libata(E) xhci_hcd(E)
> Apr 23 08:51:29 rakete kernel: ehci_hcd(E) atl1c(E) scsi_mod(E) usbcore(E) usb_common(E) wmi(E) fan(E) thermal(E) fjes(E) video(E) button(E)
> Apr 23 08:51:29 rakete kernel: CPU: 1 PID: 4277 Comm: umount Tainted: P OE 4.5.0-0.bpo.1-amd64 #1 Debian 4.5.1-1~bpo8+1
> Apr 23 08:51:29 rakete kernel: Hardware name: System manufacturer System Product Name/P8H67-V, BIOS 3707 07/12/2013
> Apr 23 08:51:29 rakete kernel: 0000000000000286 00000000bfe9047d ffffffff813099f5 ffff8801c1103ca8
> Apr 23 08:51:29 rakete kernel: ffffffffc03a8c98 ffffffff81079a61 ffff880214562d90 ffff8801c1103d00
> Apr 23 08:51:29 rakete kernel: ffff8801c5165980 00000000fffffffb ffff880214562d90 ffffffff81079aec
> Apr 23 08:51:29 rakete kernel: Call Trace:
> Apr 23 08:51:29 rakete kernel: [<ffffffff813099f5>] ? dump_stack+0x5c/0x77
> Apr 23 08:51:29 rakete kernel: [<ffffffff81079a61>] ? warn_slowpath_common+0x81/0xb0
> Apr 23 08:51:29 rakete kernel: [<ffffffff81079aec>] ? warn_slowpath_fmt+0x5c/0x80
> Apr 23 08:51:29 rakete kernel: [<ffffffffc0320e46>] ? cleanup_transaction+0x96/0x300 [btrfs]
> Apr 23 08:51:29 rakete kernel: [<ffffffff810b94a0>] ? wait_woken+0x90/0x90
> Apr 23 08:51:29 rakete kernel: [<ffffffffc0321bf3>] ? btrfs_commit_transaction+0x2b3/0xa30 [btrfs]
> Apr 23 08:51:29 rakete kernel: [<ffffffffc0322406>] ? start_transaction+0x96/0x4d0 [btrfs]
> Apr 23 08:51:29 rakete kernel: [<ffffffffc031d0d2>] ? close_ctree+0x2b2/0x360 [btrfs]
> Apr 23 08:51:29 rakete kernel: [<ffffffff81206fd7>] ? evict_inodes+0x147/0x170
> Apr 23 08:51:29 rakete kernel: [<ffffffff811eda39>] ? generic_shutdown_super+0x69/0xf0
> Apr 23 08:51:29 rakete kernel: [<ffffffff811edace>] ? kill_anon_super+0xe/0x20
> Apr 23 08:51:29 rakete kernel: [<ffffffffc02f1603>] ? btrfs_kill_super+0x13/0x100 [btrfs]
> Apr 23 08:51:29 rakete kernel: [<ffffffff811ed4c4>] ? deactivate_locked_super+0x34/0x60
> Apr 23 08:51:29 rakete kernel: [<ffffffff81209d5b>] ? cleanup_mnt+0x3b/0x80
> Apr 23 08:51:29 rakete kernel: [<ffffffff81096114>] ? task_work_run+0x74/0x90
> Apr 23 08:51:29 rakete kernel: [<ffffffff8100334a>] ? exit_to_usermode_loop+0xba/0xc0
> Apr 23 08:51:29 rakete kernel: [<ffffffff81003bcf>] ? syscall_return_slowpath+0x8f/0x110
> Apr 23 08:51:29 rakete kernel: [<ffffffff815b9918>] ? int_ret_from_sys_call+0x25/0x8f
> Apr 23 08:51:29 rakete kernel: ---[ end trace 6bbe2b6d20973e0e ]---
> Apr 23 08:51:29 rakete kernel: BTRFS: error (device sdi) in cleanup_transaction:1764: errno=-5 IO failure
> Apr 23 08:51:29 rakete kernel: BTRFS info (device sdi): delayed_refs has NO entry
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): commit super ret -5
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): cleaner transaction attach returned -30
>
> ....
>
> Apr 23 08:51:48 rakete kernel: usb 3-1: new SuperSpeed USB device number 3 using xhci_hcd
> Apr 23 08:51:48 rakete kernel: usb 3-1: New USB device found, idVendor=152d, idProduct=0567
> Apr 23 08:51:48 rakete kernel: usb 3-1: New USB device strings: Mfr=10, Product=11, SerialNumber=5
> Apr 23 08:51:48 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
> Apr 23 08:51:48 rakete kernel: usb 3-1: Manufacturer: JMicron
> Apr 23 08:51:48 rakete kernel: usb 3-1: SerialNumber: 152D00539000
> Apr 23 08:51:48 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage device detected
> Apr 23 08:51:48 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid 152d pid 0567: 5000000
> Apr 23 08:51:48 rakete kernel: scsi host9: usb-storage 3-1:1.0
> Apr 23 08:51:48 rakete mtp-probe[4301]: checking bus 3, device 3: "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
> Apr 23 08:51:48 rakete mtp-probe[4301]: bus: 3, device: 3 was not an MTP device
> Apr 23 08:51:49 rakete kernel: scsi 9:0:0:0: Direct-Access WDC WD20 02FAEX-007BA0 0125 PQ: 0 ANSI: 6
> Apr 23 08:51:49 rakete kernel: scsi 9:0:0:1: Direct-Access WDC WD50 01AALS-00L3B2 0125 PQ: 0 ANSI: 6
> Apr 23 08:51:49 rakete kernel: scsi 9:0:0:2: Direct-Access SAMSUNG SP2504C 0125 PQ: 0 ANSI: 6
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: Attached scsi generic sg6 type 0
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: Attached scsi generic sg7 type 0
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: Attached scsi generic sg8 type 0
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] Write Protect is off
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] Mode Sense: 67 00 10 08
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] 976773168 512-byte logical blocks: (500 GB/466 GiB)
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] No Caching mode page found
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] Assuming drive cache: write through
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] Write Protect is off
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] Mode Sense: 67 00 10 08
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] 488395055 512-byte logical blocks: (250 GB/233 GiB)
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] No Caching mode page found
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] Assuming drive cache: write through
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] Write Protect is off
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] Mode Sense: 67 00 10 08
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] No Caching mode page found
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] Assuming drive cache: write through
> Apr 23 08:51:49 rakete kernel: sdf: sdf1
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] Attached SCSI disk
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] Attached SCSI disk
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] Attached SCSI disk
After WD75(sdg) was hot-unplugged, surprisingly, the other three
disks, which had not been unplugged, were re-attached.
From the filesystem's point of view, *all* of its backing
devices suddenly went missing at once. In that situation, I guess,
no filesystem can work correctly.
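One rough way to check this from the log is sketched below in Python.
It is only an illustration: it assumes the syslog lines have the same
format as the ones quoted in this mail, and the function name
disks_per_host is made up for this example. It groups the
"Attached SCSI disk" messages by SCSI host number. When a new host
number appears with its own set of disks, the whole enclosure was
re-enumerated rather than a single disk being removed.

```python
import re
from collections import defaultdict

# Matches kernel lines such as:
#   "... kernel: sd 9:0:0:1: [sdg] Attached SCSI disk"
# Group 1 is the SCSI host number, group 2 the block device name.
ATTACH_RE = re.compile(r"sd (\d+):\d+:\d+:\d+: \[(sd[a-z]+)\] Attached SCSI disk")


def disks_per_host(log_lines):
    """Return {scsi_host_number: [device names]} in order of appearance."""
    hosts = defaultdict(list)
    for line in log_lines:
        match = ATTACH_RE.search(line)
        if match:
            hosts[int(match.group(1))].append(match.group(2))
    return dict(hosts)


# Lines copied from the log in this thread (host 8: case turned on,
# host 9: right after WD75 was unplugged).
sample = [
    "Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] Attached SCSI disk",
    "Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] Attached SCSI disk",
    "Apr 23 08:45:40 rakete kernel: sd 8:0:0:2: [sdh] Attached SCSI disk",
    "Apr 23 08:45:40 rakete kernel: sd 8:0:0:3: [sdi] Attached SCSI disk",
    "Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] Attached SCSI disk",
    "Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] Attached SCSI disk",
    "Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] Attached SCSI disk",
]

for host, disks in sorted(disks_per_host(sample).items()):
    print(f"scsi host{host}: {len(disks)} disks attached: {', '.join(disks)}")
```

Here host 8 has four disks and host 9 has three, so the three
remaining disks came back under a new SCSI host.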
> Apr 23 08:51:49 rakete udisksd[2422]: Mounted /dev/sdf1 at /media/matthias/BACKUP on behalf of uid 1000
> Apr 23 08:51:49 rakete kernel: EXT4-fs (sdf1): recovery complete
> Apr 23 08:51:49 rakete kernel: EXT4-fs (sdf1): mounted filesystem with ordered data mode. Opts: (null)
In fact, we can see that problems occurred not only on
Btrfs but also on ext4 on WD20(sdf1). ext4 was remounted
here and recovered from an inconsistent state.
Similar problems would probably happen with any other
filesystem, for example XFS.
Apparently this is not what you intended. You just tried to
hot-unplug one disk to confirm whether Btrfs's RAID1 works
correctly. However, what actually happened is that
all four disks were detached and three of them were
attached again.
>
> ####
>
> 10# btrfs fi show
> warning, device 1 is missing
> warning devid 1 not found already
> Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
> Total devices 3 FS bytes used 1.60GiB
> devid 2 size 465.76GiB used 3.03GiB path /dev/sdg
> devid 3 size 232.88GiB used 0.00B path /dev/sdh
> *** Some devices missing
Furthermore, the device names of the Btrfs member disks changed.
WD50 moved from sdh to sdg and SP2504C moved from sdi to sdh.
That would make things worse.
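The renaming can be seen by comparing the two "btrfs fi show" outputs
quoted in this mail. The following Python sketch is only an
illustration: it assumes the "devid ... path ..." line format shown
above, and the helper names devid_paths and renamed_devices are made
up for this example.

```python
import re

# Matches lines such as:
#   "devid 2 size 465.76GiB used 3.03GiB path /dev/sdg"
DEVID_RE = re.compile(r"devid\s+(\d+)\s+size\s+\S+\s+used\s+\S+\s+path\s+(\S+)")


def devid_paths(fi_show_output):
    """Return {devid: device path} parsed from 'btrfs fi show' text."""
    return {int(m.group(1)): m.group(2) for m in DEVID_RE.finditer(fi_show_output)}


def renamed_devices(before, after):
    """Return {devid: (old path, new path)} for devids whose path changed."""
    return {
        devid: (before[devid], after[devid])
        for devid in before
        if devid in after and before[devid] != after[devid]
    }


# Output before the unplug (command 8# in the quoted mail).
before_text = """
devid 1 size 698.64GiB used 3.03GiB path /dev/sdg
devid 2 size 465.76GiB used 3.03GiB path /dev/sdh
devid 3 size 232.88GiB used 0.00B path /dev/sdi
"""

# Output after the unplug (command 10# in the quoted mail).
after_text = """
devid 2 size 465.76GiB used 3.03GiB path /dev/sdg
devid 3 size 232.88GiB used 0.00B path /dev/sdh
"""

before = devid_paths(before_text)
after = devid_paths(after_text)

missing = sorted(set(before) - set(after))
print("missing devids:", missing)
for devid, (old, new) in sorted(renamed_devices(before, after).items()):
    print(f"devid {devid}: {old} -> {new}")
```

devid 1 (WD75) is missing, and devids 2 and 3 are both found under
different names than the ones the mounted filesystem was using.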
>
> ####
> This time the raid1 is in state "unmounted" after removing the device. This is different to what I found with kernel 4.4.
>
> 12# ls -l /mnt/raid1/
> total 0
>
> ####
> Trying to mount it again:
>
> 14# mount /mnt/raid1/
> mount: wrong fs type, bad option, bad superblock on /dev/sdh,
> missing codepage or helper program, or other error
>
> In some cases useful info is found in syslog - try
> dmesg | tail or so.
> ####
>
> Apr 23 08:54:35 rakete kernel: BTRFS info (device sdh): enabling auto defrag
> Apr 23 08:54:35 rakete kernel: BTRFS info (device sdh): disk space caching is enabled
> Apr 23 08:54:35 rakete kernel: BTRFS: has skinny extents
> Apr 23 08:54:35 rakete kernel: BTRFS: failed to read the system array on sdh
> Apr 23 08:54:35 rakete kernel: BTRFS: open_ctree failed
>
> ####
>
> Plugin the device again.
You hot-plugged the biggest device, WD75.
>
> Apr 23 08:55:44 rakete kernel: usb 3-1: USB disconnect, device number 3
> Apr 23 08:56:06 rakete kernel: usb 3-1: new SuperSpeed USB device number 4 using xhci_hcd
> Apr 23 08:56:06 rakete kernel: usb 3-1: New USB device found, idVendor=152d, idProduct=0567
> Apr 23 08:56:06 rakete kernel: usb 3-1: New USB device strings: Mfr=10, Product=11, SerialNumber=5
> Apr 23 08:56:06 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
> Apr 23 08:56:06 rakete kernel: usb 3-1: Manufacturer: JMicron
> Apr 23 08:56:06 rakete kernel: usb 3-1: SerialNumber: 152D00539000
> Apr 23 08:56:06 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage device detected
> Apr 23 08:56:06 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid 152d pid 0567: 5000000
> Apr 23 08:56:06 rakete kernel: scsi host10: usb-storage 3-1:1.0
> Apr 23 08:56:06 rakete mtp-probe[4751]: checking bus 3, device 4: "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
> Apr 23 08:56:06 rakete mtp-probe[4751]: bus: 3, device: 4 was not an MTP device
> Apr 23 08:56:07 rakete kernel: scsi 10:0:0:0: Direct-Access WDC WD20 02FAEX-007BA0 0125 PQ: 0 ANSI: 6
> Apr 23 08:56:07 rakete kernel: scsi 10:0:0:1: Direct-Access WDC WD75 00AACS-00C7B0 0125 PQ: 0 ANSI: 6
> Apr 23 08:56:07 rakete kernel: scsi 10:0:0:2: Direct-Access WDC WD50 01AALS-00L3B2 0125 PQ: 0 ANSI: 6
> Apr 23 08:56:07 rakete kernel: scsi 10:0:0:3: Direct-Access SAMSUNG SP2504C 0125 PQ: 0 ANSI: 6
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: Attached scsi generic sg6 type 0
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] Write Protect is off
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] Mode Sense: 67 00 10 08
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: Attached scsi generic sg7 type 0
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] No Caching mode page found
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] Assuming drive cache: write through
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: Attached scsi generic sg8 type 0
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] 1465149168 512-byte logical blocks: (750 GB/699 GiB)
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: Attached scsi generic sg9 type 0
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] Write Protect is off
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] Mode Sense: 67 00 10 08
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] 976773168 512-byte logical blocks: (500 GB/466 GiB)
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] No Caching mode page found
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] Assuming drive cache: write through
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] 488395055 512-byte logical blocks: (250 GB/233 GiB)
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] Write Protect is off
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] Mode Sense: 67 00 10 08
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] Write Protect is off
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] Mode Sense: 67 00 10 08
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] No Caching mode page found
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] Assuming drive cache: write through
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] No Caching mode page found
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] Assuming drive cache: write through
> Apr 23 08:56:07 rakete kernel: sdf: sdf1
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] Attached SCSI disk
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] Attached SCSI disk
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] Attached SCSI disk
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] Attached SCSI disk
Then all four devices, WD20(sdf), WD75(sdg), WD50(sdh),
and SP2504C(sdi) were attached. Attaching WD75(sdg) is OK.
However, re-attaching the already-online devices WD20(sdf),
WD50(sdh), and SP2504C(sdi) is apparently strange.
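One way to check which disks came back on each re-enumeration is to group the "Direct-Access" lines of the kernel log by SCSI host number. The following is only a sketch: models_by_host is a made-up helper, and the regular expression assumes the log layout quoted in this thread.

```python
import re
from collections import defaultdict

# Matches kernel lines such as
# "scsi 10:0:0:1: Direct-Access WDC WD75 00AACS-00C7B0 0125 PQ: 0 ANSI: 6".
SCSI_LINE = re.compile(r"scsi (\d+):\d+:\d+:(\d+): Direct-Access\s+(\S+)\s+(\S+)")


def models_by_host(log_text):
    """Return {host_number: [(lun, "VENDOR MODEL"), ...]} from kernel log text."""
    hosts = defaultdict(list)
    for m in SCSI_LINE.finditer(log_text):
        host, lun = int(m.group(1)), int(m.group(2))
        hosts[host].append((lun, f"{m.group(3)} {m.group(4)}"))
    return dict(hosts)


# Lines copied from the quoted syslog (08:51:49 and 08:56:07).
LOG = """\
Apr 23 08:51:49 rakete kernel: scsi 9:0:0:0: Direct-Access WDC WD20 02FAEX-007BA0 0125 PQ: 0 ANSI: 6
Apr 23 08:51:49 rakete kernel: scsi 9:0:0:1: Direct-Access WDC WD50 01AALS-00L3B2 0125 PQ: 0 ANSI: 6
Apr 23 08:51:49 rakete kernel: scsi 9:0:0:2: Direct-Access SAMSUNG SP2504C 0125 PQ: 0 ANSI: 6
Apr 23 08:56:07 rakete kernel: scsi 10:0:0:0: Direct-Access WDC WD20 02FAEX-007BA0 0125 PQ: 0 ANSI: 6
Apr 23 08:56:07 rakete kernel: scsi 10:0:0:1: Direct-Access WDC WD75 00AACS-00C7B0 0125 PQ: 0 ANSI: 6
Apr 23 08:56:07 rakete kernel: scsi 10:0:0:2: Direct-Access WDC WD50 01AALS-00L3B2 0125 PQ: 0 ANSI: 6
Apr 23 08:56:07 rakete kernel: scsi 10:0:0:3: Direct-Access SAMSUNG SP2504C 0125 PQ: 0 ANSI: 6
"""

if __name__ == "__main__":
    for host, disks in sorted(models_by_host(LOG).items()):
        names = ", ".join(f"lun {lun}: {model}" for lun, model in disks)
        print(f"host{host}: {names}")
```

A new host number with a full set of LUNs means the whole enclosure was enumerated again, not just the one disk that was plugged in.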
> Apr 23 08:56:07 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 1 transid 89 /dev/sdg
> Apr 23 08:56:07 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 2 transid 89 /dev/sdh
> Apr 23 08:56:07 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 3 transid 89 /dev/sdi
> Apr 23 08:56:07 rakete kernel: EXT4-fs (sdf1): recovery complete
> Apr 23 08:56:07 rakete kernel: EXT4-fs (sdf1): mounted filesystem with ordered data mode. Opts: (null)
There were some problems on ext4 on WD20(sdf1) again.
Thanks,
Satoru
>
> ####
>
> 15# btrfs fi show
> Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
> Total devices 3 FS bytes used 1.60GiB
> devid 1 size 698.64GiB used 3.03GiB path /dev/sdg
> devid 2 size 465.76GiB used 3.03GiB path /dev/sdh
> devid 3 size 232.88GiB used 0.00B path /dev/sdi
>
> ####
>
> 18# mount /mnt/raid1/
>
> Apr 23 08:57:00 rakete kernel: BTRFS info (device sdi): enabling auto defrag
> Apr 23 08:57:00 rakete kernel: BTRFS info (device sdi): disk space caching is enabled
> Apr 23 08:57:00 rakete kernel: BTRFS: has skinny extents
>
> ####
>
> 19# ls -l /mnt/raid1/
> total 0
> drwxrwxr-x 1 root root 36 Nov 14 2014 AfterShot2(64-bit)
> drwxrwxr-x 1 root root 5082 Apr 17 09:06 etc
> drwxr-xr-x 1 root root 108 Mar 24 07:31 var
>
> ####
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Question: raid1 behaviour on failure
2016-04-23 7:07 ` Matthias Bodenbinder
2016-04-23 7:17 ` Matthias Bodenbinder
2016-04-26 8:17 ` Satoru Takeuchi
@ 2016-04-26 15:16 ` Henk Slager
2 siblings, 0 replies; 32+ messages in thread
From: Henk Slager @ 2016-04-26 15:16 UTC (permalink / raw)
To: Matthias Bodenbinder; +Cc: linux-btrfs
On Sat, Apr 23, 2016 at 9:07 AM, Matthias Bodenbinder
<matthias@bodenbinder.de> wrote:
>
> Here is my newest test. The backports provide a 4.5 kernel:
>
> ####
> kernel: 4.5.0-0.bpo.1-amd64
> btrfs-tools: 4.4-1~bpo8+1
> ####
>
> This time the raid1 is automatically unmounted after I unplug the device and it can not be mounted while the device is missing. See below.
>
> Matthias
>
>
> ####
> 1) turn on the Fantec case:
>
> Apr 23 08:45:38 rakete kernel: usb 3-1: new SuperSpeed USB device number 2 using xhci_hcd
> Apr 23 08:45:38 rakete kernel: usb 3-1: New USB device found, idVendor=152d, idProduct=0567
> Apr 23 08:45:38 rakete kernel: usb 3-1: New USB device strings: Mfr=10, Product=11, SerialNumber=5
> Apr 23 08:45:38 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
> Apr 23 08:45:38 rakete kernel: usb 3-1: Manufacturer: JMicron
> Apr 23 08:45:38 rakete kernel: usb 3-1: SerialNumber: 152D00539000
> Apr 23 08:45:38 rakete mtp-probe[3641]: checking bus 3, device 2: "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
> Apr 23 08:45:38 rakete mtp-probe[3641]: bus: 3, device: 2 was not an MTP device
> Apr 23 08:45:38 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage device detected
> Apr 23 08:45:38 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid 152d pid 0567: 5000000
> Apr 23 08:45:38 rakete kernel: scsi host8: usb-storage 3-1:1.0
> Apr 23 08:45:38 rakete kernel: usbcore: registered new interface driver usb-storage
> Apr 23 08:45:38 rakete kernel: usbcore: registered new interface driver uas
> Apr 23 08:45:39 rakete kernel: scsi 8:0:0:0: Direct-Access WDC WD20 02FAEX-007BA0 0125 PQ: 0 ANSI: 6
> Apr 23 08:45:39 rakete kernel: scsi 8:0:0:1: Direct-Access WDC WD75 00AACS-00C7B0 0125 PQ: 0 ANSI: 6
> Apr 23 08:45:39 rakete kernel: scsi 8:0:0:2: Direct-Access WDC WD50 01AALS-00L3B2 0125 PQ: 0 ANSI: 6
> Apr 23 08:45:39 rakete kernel: scsi 8:0:0:3: Direct-Access SAMSUNG SP2504C 0125 PQ: 0 ANSI: 6
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: Attached scsi generic sg6 type 0
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: Attached scsi generic sg7 type 0
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] Write Protect is off
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] Mode Sense: 67 00 10 08
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: Attached scsi generic sg8 type 0
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] 1465149168 512-byte logical blocks: (750 GB/699 GiB)
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] Write Protect is off
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] Mode Sense: 67 00 10 08
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: [sdh] 976773168 512-byte logical blocks: (500 GB/466 GiB)
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: Attached scsi generic sg9 type 0
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] No Caching mode page found
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] Assuming drive cache: write through
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: [sdh] Write Protect is off
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: [sdh] Mode Sense: 67 00 10 08
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: [sdi] 488395055 512-byte logical blocks: (250 GB/233 GiB)
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] No Caching mode page found
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] Assuming drive cache: write through
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: [sdi] Write Protect is off
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: [sdi] Mode Sense: 67 00 10 08
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: [sdh] No Caching mode page found
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: [sdh] Assuming drive cache: write through
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: [sdi] No Caching mode page found
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: [sdi] Assuming drive cache: write through
> Apr 23 08:45:39 rakete kernel: sdf: sdf1
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] Attached SCSI disk
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] Attached SCSI disk
> Apr 23 08:45:40 rakete kernel: sd 8:0:0:2: [sdh] Attached SCSI disk
> Apr 23 08:45:40 rakete kernel: sd 8:0:0:3: [sdi] Attached SCSI disk
> Apr 23 08:45:40 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 1 transid 89 /dev/sdg
> Apr 23 08:45:40 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 2 transid 89 /dev/sdh
> Apr 23 08:45:40 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 3 transid 89 /dev/sdi
> Apr 23 08:45:40 rakete kernel: EXT4-fs (sdf1): mounted filesystem with ordered data mode. Opts: (null)
> Apr 23 08:45:40 rakete udisksd[2422]: Mounted /dev/sdf1 at /media/matthias/BACKUP on behalf of uid 1000
>
> ####
>
> 7# mount /mnt/raid1/
>
> Apr 23 08:47:31 rakete kernel: BTRFS info (device sdi): enabling auto defrag
> Apr 23 08:47:31 rakete kernel: BTRFS info (device sdi): disk space caching is enabled
> Apr 23 08:47:31 rakete kernel: BTRFS: has skinny extents
>
> 8# btrfs fi show
> Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
> Total devices 3 FS bytes used 1.60GiB
> devid 1 size 698.64GiB used 3.03GiB path /dev/sdg
> devid 2 size 465.76GiB used 3.03GiB path /dev/sdh
> devid 3 size 232.88GiB used 0.00B path /dev/sdi
>
> 9# ls -l /mnt/raid1/
> total 0
> drwxrwxr-x 1 root root 36 Nov 14 2014 AfterShot2(64-bit)
> drwxrwxr-x 1 root root 5082 Apr 17 09:06 etc
> drwxr-xr-x 1 root root 108 Mar 24 07:31 var
>
> ####
>
> Unplug the biggest HD
>
> Apr 23 08:51:29 rakete kernel: usb 3-1: USB disconnect, device number 2
> Apr 23 08:51:29 rakete kernel: Buffer I/O error on dev sdf1, logical block 243826688, lost sync page write
> Apr 23 08:51:29 rakete kernel: JBD2: Error -5 detected when updating journal superblock for sdf1-8.
> Apr 23 08:51:29 rakete kernel: Aborting journal on device sdf1-8.
> Apr 23 08:51:29 rakete kernel: Buffer I/O error on dev sdf1, logical block 243826688, lost sync page write
> Apr 23 08:51:29 rakete kernel: JBD2: Error -5 detected when updating journal superblock for sdf1-8.
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 5, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 6, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 7, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 8, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 9, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 10, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): error reading free space cache
> Apr 23 08:51:29 rakete kernel: BTRFS warning (device sdi): failed to load free space cache for block group 20497563648, rebuilding it now
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): error reading free space cache
> Apr 23 08:51:29 rakete kernel: BTRFS warning (device sdi): failed to load free space cache for block group 21571305472, rebuilding it now
> Apr 23 08:51:29 rakete kernel: BTRFS: error (device sdi) in btrfs_commit_transaction:2142: errno=-5 IO failure (Error while writing out transaction)
> Apr 23 08:51:29 rakete kernel: BTRFS info (device sdi): forced readonly
> Apr 23 08:51:29 rakete kernel: BTRFS warning (device sdi): Skipping commit of aborted transaction.
> Apr 23 08:51:29 rakete kernel: ------------[ cut here ]------------
> Apr 23 08:51:29 rakete kernel: WARNING: CPU: 1 PID: 4277 at /build/linux-Ki7dwx/linux-4.5.1/fs/btrfs/transaction.c:1764 cleanup_transaction+0x96/0x300 [btrfs]()
> Apr 23 08:51:29 rakete kernel: BTRFS: Transaction aborted (error -5)
> Apr 23 08:51:29 rakete kernel: Modules linked in: uas(E) usb_storage(E) pci_stub(E) vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) binfmt_misc(E) dvb_ttpci(E) saa7146_vv(E) ttpci_eeprom(E) saa7146(E) videobuf_dma_sg(E) videobuf_core(E) dvb_core(E) v4l2_common(E) videodev(E) media(E) cfg80211(E) vboxdrv(OE) cpufreq_powersave(E) cpufreq_conservative(E) cpufreq_userspace(E) cpufreq_stats(E) snd_hda_codec_hdmi(E) intel_rapl(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) drbg(E) ansi_cprng(E) eeepc_wmi(E) asus_wmi(E) sparse_keymap(E) iTCO_wdt(E) joydev(E) iTCO_vendor_support(E) rfkill(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) snd_hda_codec_realtek(E) pcspkr(E) snd_hda_codec_generic(E)
> Apr 23 08:51:29 rakete kernel: serio_raw(E) i2c_i801(E) lpc_ich(E) mfd_core(E) snd_hda_intel(E) snd_hda_codec(E) snd_hda_core(E) snd_hwdep(E) snd_pcm(E) evdev(E) battery(E) snd_timer(E) 8250_fintek(E) snd(E) mei_me(E) soundcore(E) mei(E) shpchp(E) tpm_tis(E) tpm(E) processor(E) nvidia(POE) drm(E) fuse(E) ecryptfs(E) cbc(E) hmac(E) encrypted_keys(E) parport_pc(E) ppdev(E) lp(E) parport(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) btrfs(E) raid456(E) async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) async_tx(E) xor(E) hid_generic(E) usbhid(E) hid(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) md_mod(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) sg(E) sr_mod(E) cdrom(E) sd_mod(E) ata_generic(E) ahci(E) pata_via(E) libahci(E) crc32c_intel(E) xhci_pci(E) ehci_pci(E) psmouse(E) libata(E) xhci_hcd(E)
> Apr 23 08:51:29 rakete kernel: ehci_hcd(E) atl1c(E) scsi_mod(E) usbcore(E) usb_common(E) wmi(E) fan(E) thermal(E) fjes(E) video(E) button(E)
> Apr 23 08:51:29 rakete kernel: CPU: 1 PID: 4277 Comm: umount Tainted: P OE 4.5.0-0.bpo.1-amd64 #1 Debian 4.5.1-1~bpo8+1
> Apr 23 08:51:29 rakete kernel: Hardware name: System manufacturer System Product Name/P8H67-V, BIOS 3707 07/12/2013
> Apr 23 08:51:29 rakete kernel: 0000000000000286 00000000bfe9047d ffffffff813099f5 ffff8801c1103ca8
> Apr 23 08:51:29 rakete kernel: ffffffffc03a8c98 ffffffff81079a61 ffff880214562d90 ffff8801c1103d00
> Apr 23 08:51:29 rakete kernel: ffff8801c5165980 00000000fffffffb ffff880214562d90 ffffffff81079aec
> Apr 23 08:51:29 rakete kernel: Call Trace:
> Apr 23 08:51:29 rakete kernel: [<ffffffff813099f5>] ? dump_stack+0x5c/0x77
> Apr 23 08:51:29 rakete kernel: [<ffffffff81079a61>] ? warn_slowpath_common+0x81/0xb0
> Apr 23 08:51:29 rakete kernel: [<ffffffff81079aec>] ? warn_slowpath_fmt+0x5c/0x80
> Apr 23 08:51:29 rakete kernel: [<ffffffffc0320e46>] ? cleanup_transaction+0x96/0x300 [btrfs]
> Apr 23 08:51:29 rakete kernel: [<ffffffff810b94a0>] ? wait_woken+0x90/0x90
> Apr 23 08:51:29 rakete kernel: [<ffffffffc0321bf3>] ? btrfs_commit_transaction+0x2b3/0xa30 [btrfs]
> Apr 23 08:51:29 rakete kernel: [<ffffffffc0322406>] ? start_transaction+0x96/0x4d0 [btrfs]
> Apr 23 08:51:29 rakete kernel: [<ffffffffc031d0d2>] ? close_ctree+0x2b2/0x360 [btrfs]
> Apr 23 08:51:29 rakete kernel: [<ffffffff81206fd7>] ? evict_inodes+0x147/0x170
> Apr 23 08:51:29 rakete kernel: [<ffffffff811eda39>] ? generic_shutdown_super+0x69/0xf0
> Apr 23 08:51:29 rakete kernel: [<ffffffff811edace>] ? kill_anon_super+0xe/0x20
> Apr 23 08:51:29 rakete kernel: [<ffffffffc02f1603>] ? btrfs_kill_super+0x13/0x100 [btrfs]
> Apr 23 08:51:29 rakete kernel: [<ffffffff811ed4c4>] ? deactivate_locked_super+0x34/0x60
> Apr 23 08:51:29 rakete kernel: [<ffffffff81209d5b>] ? cleanup_mnt+0x3b/0x80
> Apr 23 08:51:29 rakete kernel: [<ffffffff81096114>] ? task_work_run+0x74/0x90
> Apr 23 08:51:29 rakete kernel: [<ffffffff8100334a>] ? exit_to_usermode_loop+0xba/0xc0
> Apr 23 08:51:29 rakete kernel: [<ffffffff81003bcf>] ? syscall_return_slowpath+0x8f/0x110
> Apr 23 08:51:29 rakete kernel: [<ffffffff815b9918>] ? int_ret_from_sys_call+0x25/0x8f
> Apr 23 08:51:29 rakete kernel: ---[ end trace 6bbe2b6d20973e0e ]---
> Apr 23 08:51:29 rakete kernel: BTRFS: error (device sdi) in cleanup_transaction:1764: errno=-5 IO failure
> Apr 23 08:51:29 rakete kernel: BTRFS info (device sdi): delayed_refs has NO entry
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): commit super ret -5
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): cleaner transaction attach returned -30
>
> ....
In these 19 (or fewer) seconds the Linux system decides to unmount the
btrfs raid1 filesystem (as all its devices have disappeared). I am
wondering whether this is done directly by the kernel or whether it is
udisksd that initiates it.
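The 19-second figure can be reproduced from the syslog timestamps. This is a small sketch only: parse_stamp and gap_seconds are made-up helpers, and because syslog stamps carry no year, 2016 is assumed from the dates of this thread.

```python
from datetime import datetime

ASSUMED_YEAR = 2016  # assumption: syslog stamps have no year; taken from the thread


def parse_stamp(line):
    """Parse the leading "Mon DD HH:MM:SS" syslog timestamp of a log line."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{ASSUMED_YEAR} {stamp}", "%Y %b %d %H:%M:%S")


def gap_seconds(earlier_line, later_line):
    """Return the number of seconds between two syslog lines."""
    return (parse_stamp(later_line) - parse_stamp(earlier_line)).total_seconds()


# Last line logged after the unplug and first line of the next enumeration,
# both copied from the quoted syslog (without the "> " quote prefix).
LAST_BEFORE = ("Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): "
               "cleaner transaction attach returned -30")
FIRST_AFTER = ("Apr 23 08:51:48 rakete kernel: usb 3-1: new SuperSpeed USB "
               "device number 3 using xhci_hcd")

if __name__ == "__main__":
    print(f"gap: {gap_seconds(LAST_BEFORE, FIRST_AFTER):.0f} s")
```

Syslog resolution is one second, so the computed gap is an upper bound on the time in which the unmount took place.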
>
> Apr 23 08:51:48 rakete kernel: usb 3-1: new SuperSpeed USB device number 3 using xhci_hcd
> Apr 23 08:51:48 rakete kernel: usb 3-1: New USB device found, idVendor=152d, idProduct=0567
> Apr 23 08:51:48 rakete kernel: usb 3-1: New USB device strings: Mfr=10, Product=11, SerialNumber=5
> Apr 23 08:51:48 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
> Apr 23 08:51:48 rakete kernel: usb 3-1: Manufacturer: JMicron
> Apr 23 08:51:48 rakete kernel: usb 3-1: SerialNumber: 152D00539000
> Apr 23 08:51:48 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage device detected
> Apr 23 08:51:48 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid 152d pid 0567: 5000000
> Apr 23 08:51:48 rakete kernel: scsi host9: usb-storage 3-1:1.0
> Apr 23 08:51:48 rakete mtp-probe[4301]: checking bus 3, device 3: "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
> Apr 23 08:51:48 rakete mtp-probe[4301]: bus: 3, device: 3 was not an MTP device
> Apr 23 08:51:49 rakete kernel: scsi 9:0:0:0: Direct-Access WDC WD20 02FAEX-007BA0 0125 PQ: 0 ANSI: 6
> Apr 23 08:51:49 rakete kernel: scsi 9:0:0:1: Direct-Access WDC WD50 01AALS-00L3B2 0125 PQ: 0 ANSI: 6
> Apr 23 08:51:49 rakete kernel: scsi 9:0:0:2: Direct-Access SAMSUNG SP2504C 0125 PQ: 0 ANSI: 6
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: Attached scsi generic sg6 type 0
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: Attached scsi generic sg7 type 0
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: Attached scsi generic sg8 type 0
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] Write Protect is off
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] Mode Sense: 67 00 10 08
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] 976773168 512-byte logical blocks: (500 GB/466 GiB)
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] No Caching mode page found
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] Assuming drive cache: write through
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] Write Protect is off
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] Mode Sense: 67 00 10 08
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] 488395055 512-byte logical blocks: (250 GB/233 GiB)
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] No Caching mode page found
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] Assuming drive cache: write through
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] Write Protect is off
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] Mode Sense: 67 00 10 08
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] No Caching mode page found
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] Assuming drive cache: write through
> Apr 23 08:51:49 rakete kernel: sdf: sdf1
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] Attached SCSI disk
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] Attached SCSI disk
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] Attached SCSI disk
> Apr 23 08:51:49 rakete udisksd[2422]: Mounted /dev/sdf1 at /media/matthias/BACKUP on behalf of uid 1000
> Apr 23 08:51:49 rakete kernel: EXT4-fs (sdf1): recovery complete
> Apr 23 08:51:49 rakete kernel: EXT4-fs (sdf1): mounted filesystem with ordered data mode. Opts: (null)
>
> ####
>
> 10# btrfs fi show
> warning, device 1 is missing
> warning devid 1 not found already
> Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
> Total devices 3 FS bytes used 1.60GiB
> devid 2 size 465.76GiB used 3.03GiB path /dev/sdg
> devid 3 size 232.88GiB used 0.00B path /dev/sdh
> *** Some devices missing
>
> ####
> This time the raid1 is in state "unmounted" after removing the device. This is different to what I found with kernel 4.4.
>
> 12# ls -l /mnt/raid1/
> total 0
>
> ####
> Trying to mount it again:
>
> 14# mount /mnt/raid1/
> mount: wrong fs type, bad option, bad superblock on /dev/sdh,
> missing codepage or helper program, or other error
>
> In some cases useful info is found in syslog - try
> dmesg | tail or so.
> ####
>
> Apr 23 08:54:35 rakete kernel: BTRFS info (device sdh): enabling auto defrag
> Apr 23 08:54:35 rakete kernel: BTRFS info (device sdh): disk space caching is enabled
> Apr 23 08:54:35 rakete kernel: BTRFS: has skinny extents
> Apr 23 08:54:35 rakete kernel: BTRFS: failed to read the system array on sdh
> Apr 23 08:54:35 rakete kernel: BTRFS: open_ctree failed
>
> ####
>
> Plug in the device again.
>
> Apr 23 08:55:44 rakete kernel: usb 3-1: USB disconnect, device number 3
> Apr 23 08:56:06 rakete kernel: usb 3-1: new SuperSpeed USB device number 4 using xhci_hcd
> Apr 23 08:56:06 rakete kernel: usb 3-1: New USB device found, idVendor=152d, idProduct=0567
> Apr 23 08:56:06 rakete kernel: usb 3-1: New USB device strings: Mfr=10, Product=11, SerialNumber=5
> Apr 23 08:56:06 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
> Apr 23 08:56:06 rakete kernel: usb 3-1: Manufacturer: JMicron
> Apr 23 08:56:06 rakete kernel: usb 3-1: SerialNumber: 152D00539000
> Apr 23 08:56:06 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage device detected
> Apr 23 08:56:06 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid 152d pid 0567: 5000000
> Apr 23 08:56:06 rakete kernel: scsi host10: usb-storage 3-1:1.0
> Apr 23 08:56:06 rakete mtp-probe[4751]: checking bus 3, device 4: "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
> Apr 23 08:56:06 rakete mtp-probe[4751]: bus: 3, device: 4 was not an MTP device
> Apr 23 08:56:07 rakete kernel: scsi 10:0:0:0: Direct-Access WDC WD20 02FAEX-007BA0 0125 PQ: 0 ANSI: 6
> Apr 23 08:56:07 rakete kernel: scsi 10:0:0:1: Direct-Access WDC WD75 00AACS-00C7B0 0125 PQ: 0 ANSI: 6
> Apr 23 08:56:07 rakete kernel: scsi 10:0:0:2: Direct-Access WDC WD50 01AALS-00L3B2 0125 PQ: 0 ANSI: 6
> Apr 23 08:56:07 rakete kernel: scsi 10:0:0:3: Direct-Access SAMSUNG SP2504C 0125 PQ: 0 ANSI: 6
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: Attached scsi generic sg6 type 0
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] Write Protect is off
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] Mode Sense: 67 00 10 08
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: Attached scsi generic sg7 type 0
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] No Caching mode page found
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] Assuming drive cache: write through
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: Attached scsi generic sg8 type 0
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] 1465149168 512-byte logical blocks: (750 GB/699 GiB)
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: Attached scsi generic sg9 type 0
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] Write Protect is off
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] Mode Sense: 67 00 10 08
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] 976773168 512-byte logical blocks: (500 GB/466 GiB)
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] No Caching mode page found
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] Assuming drive cache: write through
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] 488395055 512-byte logical blocks: (250 GB/233 GiB)
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] Write Protect is off
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] Mode Sense: 67 00 10 08
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] Write Protect is off
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] Mode Sense: 67 00 10 08
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] No Caching mode page found
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] Assuming drive cache: write through
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] No Caching mode page found
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] Assuming drive cache: write through
> Apr 23 08:56:07 rakete kernel: sdf: sdf1
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] Attached SCSI disk
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] Attached SCSI disk
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] Attached SCSI disk
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] Attached SCSI disk
> Apr 23 08:56:07 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 1 transid 89 /dev/sdg
> Apr 23 08:56:07 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 2 transid 89 /dev/sdh
> Apr 23 08:56:07 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 3 transid 89 /dev/sdi
> Apr 23 08:56:07 rakete kernel: EXT4-fs (sdf1): recovery complete
> Apr 23 08:56:07 rakete kernel: EXT4-fs (sdf1): mounted filesystem with ordered data mode. Opts: (null)
>
> ####
>
> 15# btrfs fi show
> Label: none uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
> Total devices 3 FS bytes used 1.60GiB
> devid 1 size 698.64GiB used 3.03GiB path /dev/sdg
> devid 2 size 465.76GiB used 3.03GiB path /dev/sdh
> devid 3 size 232.88GiB used 0.00B path /dev/sdi
>
> ####
>
> 18# mount /mnt/raid1/
>
> Apr 23 08:57:00 rakete kernel: BTRFS info (device sdi): enabling auto defrag
> Apr 23 08:57:00 rakete kernel: BTRFS info (device sdi): disk space caching is enabled
> Apr 23 08:57:00 rakete kernel: BTRFS: has skinny extents
>
> ####
>
> 19# ls -l /mnt/raid1/
> total 0
> drwxrwxr-x 1 root root 36 Nov 14 2014 AfterShot2(64-bit)
> drwxrwxr-x 1 root root 5082 Apr 17 09:06 etc
> drwxr-xr-x 1 root root 108 Mar 24 07:31 var
>
> ####
>
>
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Question: raid1 behaviour on failure
2016-04-21 17:27 ` Matthias Bodenbinder
@ 2016-04-26 16:19 ` Henk Slager
2016-04-26 16:42 ` Holger Hoffstätte
2016-04-28 5:09 ` Matthias Bodenbinder
0 siblings, 2 replies; 32+ messages in thread
From: Henk Slager @ 2016-04-26 16:19 UTC (permalink / raw)
To: Matthias Bodenbinder; +Cc: linux-btrfs
On Thu, Apr 21, 2016 at 7:27 PM, Matthias Bodenbinder
<matthias@bodenbinder.de> wrote:
> Am 21.04.2016 um 13:28 schrieb Henk Slager:
>>> Can anyone explain this behavior?
>>
>> All 4 drives (WD20, WD75, WD50, SP2504C) get a disconnect twice in
>> this test. What is on WD20 is unclear to me, but the raid1 array is
>> {WD75, WD50, SP2504C}
>> So the test as described by Matthias is not what actually happens.
>> In fact, the whole btrfs fs is 'disconnected on the lower layers of
>> the kernel' but there is no unmount. You can see the scsi items go
>> from 8?.0.0.x to
>> 9.0.0.x to 10.0.0.x. In the 9.0.0.x state, the tools show then 1 dev
>> missing (WD75), but in fact the whole fs state is messed up. So as
>> indicated by Anand already, it is a bad test and it is what one can
>> expect from an unpatched 4.4.0 kernel. ( I'm curious to know how md
>> raidX would handle this ).
>>
>> a) My best guess is that the 4 drives are in a USB connected drivebay
>> and that Matthias unplugged WD75 (so cut its power and SATA
>> connection), did the file copy trial and then plugged in the WD75
>> again into the drivebay. The (un)plug of a harddisk is then assumed to
>> trigger a USB link re-init by the chipset in the drivebay.
>>
>> b) Another possibility is that due to (un)plug of WD75 cause the host
>> USB chipset to re-init the USB link due to (too big?) changes in
>> electrical current. And likely separate USB cables and maybe some
>> SATA.
>>
>> c) Or some flaw in the LMDE2 distribution in combination with btrfs. I
>> don't know what is in the linux-image-4.4.0-0.bpo.1-amd64
>>
>
> Just to clarify my setup: my HDs are mounted in a FANTEC QB-35US3-6G case. According to the handbook it has "Hot-Plug for USB / eSATA interface".
>
> It is equipped with 4 HDs. 3 of them are part of the raid1. The fourth HD is a 2 TB device with ext4 filesystem and no relevance for this thread.
It looks like a JMS567 with SATA port multipliers behind it is used in
this drivebay. The command lsusb -v could show that. So your HW
setup is like JBOD, not RAID.
IMHO, using such a setup for software RAID (like btrfs RAID1)
fundamentally violates the concept of RAID (redundant array of
independent disks). It depends on where you define the system border
of the (independent) disks.
If it is at:
A) the 4 (or 3 disks in this case) SATA+power interfaces inside the drivebay or
B) inside the PC's chipset.
In case A) there is a shared removable link (USB) inside the
filesystem processing machine.
In case B) the disks aren't really independent as they share a
removable link (and as proven by the (un)plug of 1 device affecting
all others).
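Whether the members of an array share one removable link can be checked by grouping the block devices by the SCSI host that appears in their sysfs path. The sketch below works on plain strings: the USB parent path is taken from the mtp-probe lines in the quoted log, but the path tails and the internal disk sda are hypothetical examples, and group_by_scsi_host and shared_hosts are made-up helper names.

```python
import re
from collections import defaultdict

HOST_COMPONENT = re.compile(r"/(host\d+)/")


def group_by_scsi_host(sysfs_paths):
    """Group block device names by the SCSI host found in their sysfs path."""
    groups = defaultdict(list)
    for name, path in sysfs_paths.items():
        m = HOST_COMPONENT.search(path)
        groups[m.group(1) if m else "unknown"].append(name)
    return {host: sorted(names) for host, names in groups.items()}


def shared_hosts(groups, members):
    """Return {host: member names} for hosts carrying more than one array member."""
    shared = {}
    for host, names in groups.items():
        on_host = [n for n in names if n in members]
        if len(on_host) > 1:
            shared[host] = on_host
    return shared


# USB parent as printed by mtp-probe in the quoted log.
USB_PARENT = "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"

# Hypothetical resolved sysfs paths; only USB_PARENT comes from the log.
EXAMPLE_PATHS = {
    "sda": "/sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda",
    "sdf": USB_PARENT + "/3-1:1.0/host10/target10:0:0/10:0:0:0/block/sdf",
    "sdg": USB_PARENT + "/3-1:1.0/host10/target10:0:0/10:0:0:1/block/sdg",
    "sdh": USB_PARENT + "/3-1:1.0/host10/target10:0:0/10:0:0:2/block/sdh",
    "sdi": USB_PARENT + "/3-1:1.0/host10/target10:0:0/10:0:0:3/block/sdi",
}

if __name__ == "__main__":
    groups = group_by_scsi_host(EXAMPLE_PATHS)
    print(shared_hosts(groups, {"sdg", "sdh", "sdi"}))
```

On a live system the paths could be obtained with os.path.realpath("/sys/block/sdg") and so on. If all raid1 members end up under the same host, as in this example, a single link reset takes all of them away at once.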
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Question: raid1 behaviour on failure
2016-04-26 16:19 ` Henk Slager
@ 2016-04-26 16:42 ` Holger Hoffstätte
2016-04-28 5:12 ` Matthias Bodenbinder
2016-04-28 5:09 ` Matthias Bodenbinder
1 sibling, 1 reply; 32+ messages in thread
From: Holger Hoffstätte @ 2016-04-26 16:42 UTC (permalink / raw)
To: Henk Slager, Matthias Bodenbinder; +Cc: linux-btrfs
On 04/26/16 18:19, Henk Slager wrote:
> It looks like a JMS567 + SATA port multipliers behind it are used in
> this drivebay. The command lsusb -v could show that. So your HW
> setup is like JBOD, not RAID.
I hate to quote the "harmful" trope, but..
SATA Port Multipliers Considered Harmful
https://www.usenix.org/system/files/fastpw13-paper7_0.pdf
aka: how to make any RAID setup useless in 1 easy step.
-h
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Question: raid1 behaviour on failure
2016-04-26 16:19 ` Henk Slager
2016-04-26 16:42 ` Holger Hoffstätte
@ 2016-04-28 5:09 ` Matthias Bodenbinder
2016-04-28 19:14 ` Henk Slager
1 sibling, 1 reply; 32+ messages in thread
From: Matthias Bodenbinder @ 2016-04-28 5:09 UTC (permalink / raw)
To: linux-btrfs
Am 26.04.2016 um 18:19 schrieb Henk Slager:
> It looks like a JMS567 with SATA port multipliers behind it is used in
> this drivebay. The command lsusb -v could confirm that. So your HW
> setup is like JBOD, not RAID.
Here is the output of lsusb -v:
Bus 003 Device 004: ID 152d:0567 JMicron Technology Corp. / JMicron USA Technology Corp.
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 3.00
bDeviceClass 0 (Defined at Interface level)
bDeviceSubClass 0
bDeviceProtocol 0
bMaxPacketSize0 9
idVendor 0x152d JMicron Technology Corp. / JMicron USA Technology Corp.
idProduct 0x0567
bcdDevice 2.05
iManufacturer 10 JMicron
iProduct 11 USB to ATA/ATAPI Bridge
iSerial 5 152D00539000
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 44
bNumInterfaces 1
bConfigurationValue 1
iConfiguration 0
bmAttributes 0xc0
Self Powered
MaxPower 2mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 8 Mass Storage
bInterfaceSubClass 6 SCSI
bInterfaceProtocol 80 Bulk-Only
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0400 1x 1024 bytes
bInterval 0
bMaxBurst 15
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x02 EP 2 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0400 1x 1024 bytes
bInterval 0
bMaxBurst 15
Binary Object Store Descriptor:
bLength 5
bDescriptorType 15
wTotalLength 22
bNumDeviceCaps 2
USB 2.0 Extension Device Capability:
bLength 7
bDescriptorType 16
bDevCapabilityType 2
bmAttributes 0x00000002
Link Power Management (LPM) Supported
SuperSpeed USB Device Capability:
bLength 10
bDescriptorType 16
bDevCapabilityType 3
bmAttributes 0x00
wSpeedsSupported 0x000e
Device can operate at Full Speed (12Mbps)
Device can operate at High Speed (480Mbps)
Device can operate at SuperSpeed (5Gbps)
bFunctionalitySupport 1
Lowest fully-functional device speed is Full Speed (12Mbps)
bU1DevExitLat 10 micro seconds
bU2DevExitLat 2047 micro seconds
Device Status: 0x0001
Self Powered
> IMHO, using such a setup for software RAID (like btrfs RAID1)
> fundamentally violates the concept of RAID (redundant array of
> independent disks). It depends on where you define the system border
> of the (independent) disks.
> If it is at:
>
> A) the 4 (or 3 disks in this case) SATA+power interfaces inside the drivebay or
>
> B) inside the PC's chipset.
>
> In case A) there is a shared removable link (USB) inside the
> filesystem processing machine.
> In case B) the disks aren't really independent as they share a
> removable link (and as proven by the (un)plug of 1 device affecting
> all others).
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Question: raid1 behaviour on failure
2016-04-26 16:42 ` Holger Hoffstätte
@ 2016-04-28 5:12 ` Matthias Bodenbinder
2016-04-28 5:24 ` Gareth Pye
0 siblings, 1 reply; 32+ messages in thread
From: Matthias Bodenbinder @ 2016-04-28 5:12 UTC (permalink / raw)
To: linux-btrfs
Am 26.04.2016 um 18:42 schrieb Holger Hoffstätte:
> On 04/26/16 18:19, Henk Slager wrote:
>> It looks like a JMS567 with SATA port multipliers behind it is used in
>> this drivebay. The command lsusb -v could confirm that. So your HW
>> setup is like JBOD, not RAID.
>
> I hate to quote the "harmful" trope, but..
>
> SATA Port Multipliers Considered Harmful
> https://www.usenix.org/system/files/fastpw13-paper7_0.pdf
>
> aka: how to make any RAID setup useless in 1 easy step.
Interesting article, but it has no date on it. It could be outdated or brand new.
Matthias
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Question: raid1 behaviour on failure
2016-04-28 5:12 ` Matthias Bodenbinder
@ 2016-04-28 5:24 ` Gareth Pye
2016-04-28 8:08 ` Duncan
0 siblings, 1 reply; 32+ messages in thread
From: Gareth Pye @ 2016-04-28 5:24 UTC (permalink / raw)
To: Matthias Bodenbinder; +Cc: linux-btrfs
PDF doc info dates it at 23/1/2013, which is the best guess that can
easily be found.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Question: raid1 behaviour on failure
2016-04-28 5:24 ` Gareth Pye
@ 2016-04-28 8:08 ` Duncan
0 siblings, 0 replies; 32+ messages in thread
From: Duncan @ 2016-04-28 8:08 UTC (permalink / raw)
To: linux-btrfs
Gareth Pye posted on Thu, 28 Apr 2016 15:24:51 +1000 as excerpted:
> PDF doc info dates it at 23/1/2013, which is the best guess that can
> easily be found.
Well, "easily" is relative, but motivated by your observation I first
confirmed it, then decided to see what google had to say about the
authors.
I only looked at the two University of Minnesota authors. David Lilja has
been a professor there since the 90s, with google turning up various
lectures, etc, at other universities. Peng Li, listed as a student in the
paper, was presumably a graduate student. His linkedin profile says he's
been at Intel from Aug 2015 to present (software engineer, non-volatile
memory device R&D), but was Sr. Engineer, Seagate Tech,
Minneapolis/St.Paul area, July 2013 to August 2015 (drive arch and
performance modeling), and was a summer intern at Huawei in the San Fran
area in the middle of 2012. There are several patents and papers to his
name.
More importantly for us, however, linkedin links to his personal page,
still at the University of Minnesota, as he graduated there with a
doctorate. His PhD advisor, no surprise, was Prof. David J Lilja.
http://people.ece.umn.edu/~lipeng/
That page lists as one project:
Reliability of SATA Port Multiplier (2012).
So while the paper probably came out in January of 2013 as the pdf date
suggests, he was working on it in 2012.
BTW, his personal site was last updated in June of 2013 and thus doesn't
mention anything about his move to Intel in 2015. I'd guess he hasn't
touched it since getting the doctorate and the job at Seagate: the page
mentions that job, but the Linkedin profile says it didn't start until
July of that year, the month after the last update to his personal page
at the university.
Took me longer to write that up than to find it, so it wasn't hard, but
as I said, "easy" is relative, so YMMV. =:^)
Meanwhile, that was just a single sampling, as the paper itself points
out, so we don't know where it falls among other port multipliers, or
even if its behavior was characteristic of that brand and model.
What we do have, however, is that semi-official paper, along with other
observations here about the reliability, or more accurately, lack of
reliability, of the various USB2SATA bridge chips, etc. Even without the
port multiplier, prior real-world experience posted here suggests that
while single-device btrfs on USB via a USB2SATA bridge may be reasonable,
such a device is not particularly reliable as part of a multi-device
btrfs. Too often the bridges and the devices behind them drop out
temporarily due to power or other reasons, and btrfs at this point simply
doesn't cope well with devices dropping out and appearing again, possibly
as other devices. With a single-device btrfs there isn't much to screw
up: the data either gets there or doesn't, and the atomic-cow nature of
btrfs does at least normally allow for recovery to a known past state
plus replay of the fsync log between commits if it doesn't. Multi-device,
however, can quickly get out of hand, particularly if more than one
device is playing the disappear-and-reappear game at once.
A reasonable conclusion then, is that the given layout isn't particularly
reliable at more than one point, making multi-device anything over it
rather unwise. JBOD /as/ /JBOD/, creating individual single-device
filesystems on each device (or device partition), may be somewhat more
workable, but multi-device, whether at the btrfs level or dm- or md-raid
level underneath some other filesystem, isn't likely to be very reliable
at all.
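For anyone who wants to watch for this kind of dropout anyway, here is a
minimal Python sketch. It is my assumption, not something from this
thread, that the per-device error counters printed by
"btrfs device stats <mountpoint>" are a useful signal here; the sketch
only parses text of the form "[/dev/sdX].counter_name N" and reports the
counters that are not zero. The function name is hypothetical.

```python
import re

# Matches lines such as "[/dev/sdg].write_io_errs   12".
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def nonzero_counters(stats_text):
    """Return {device: {counter: value}} for every counter greater than zero."""
    bad = {}
    for line in stats_text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match and int(match["value"]) > 0:
            bad.setdefault(match["dev"], {})[match["counter"]] = int(match["value"])
    return bad
```

Feeding it the saved output of the stats command gives an empty dict when
all counters are zero, so it is easy to hang a cron warning off it.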
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Question: raid1 behaviour on failure
2016-04-28 5:09 ` Matthias Bodenbinder
@ 2016-04-28 19:14 ` Henk Slager
0 siblings, 0 replies; 32+ messages in thread
From: Henk Slager @ 2016-04-28 19:14 UTC (permalink / raw)
To: Matthias Bodenbinder; +Cc: linux-btrfs
On Thu, Apr 28, 2016 at 7:09 AM, Matthias Bodenbinder
<matthias@bodenbinder.de> wrote:
> Am 26.04.2016 um 18:19 schrieb Henk Slager:
>> It looks like a JMS567 with SATA port multipliers behind it is used in
>> this drivebay. The command lsusb -v could confirm that. So your HW
>> setup is like JBOD, not RAID.
>
> Here is the output of lsusb -v:
>
>
> Bus 003 Device 004: ID 152d:0567 JMicron Technology Corp. / JMicron USA Technology Corp.
> Device Descriptor:
> bLength 18
> bDescriptorType 1
> bcdUSB 3.00
> bDeviceClass 0 (Defined at Interface level)
> bDeviceSubClass 0
> bDeviceProtocol 0
> bMaxPacketSize0 9
> idVendor 0x152d JMicron Technology Corp. / JMicron USA Technology Corp.
> idProduct 0x0567
> bcdDevice 2.05
> iManufacturer 10 JMicron
> iProduct 11 USB to ATA/ATAPI Bridge
> iSerial 5 152D00539000
> bNumConfigurations 1
OK, that is how the drivebay presents itself. It does not really
correspond to this:
http://www.jmicron.com/PDF/brief/jms567.pdf
It looks more like a jms562 is used, but I don't know what is on the
PCB and in the FW.
Anyhow, hot (un)plug capability on the 4 internal SATA i/f is not
explicitly mentioned. If you expect or want that, ask Fantec I would
say.
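To compare what the bridge reports against a datasheet, the identifying
fields can be pulled out of the lsusb -v text. A minimal Python sketch
follows; it assumes the field layout seen in the output quoted in this
mail (field name, value, optional description), and the function name is
made up for illustration.

```python
import re

# Field name, then its value, then an optional free-text description.
FIELD = re.compile(r"^\s*(idVendor|idProduct|bcdUSB|iProduct)\s+(\S+)\s*(.*)$")


def parse_descriptor(text):
    """Return {field: (value, description)} for the first device descriptor found."""
    info = {}
    for line in text.splitlines():
        match = FIELD.match(line)
        # Keep only the first occurrence of each field.
        if match and match.group(1) not in info:
            key, value, rest = match.groups()
            info[key] = (value, rest.strip())
    return info
```

For the output above this yields idVendor 0x152d and idProduct 0x0567,
which identifies how the bridge presents itself but not necessarily which
chip is actually on the PCB.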
> Configuration Descriptor:
> bLength 9
> bDescriptorType 2
> wTotalLength 44
> bNumInterfaces 1
> bConfigurationValue 1
> iConfiguration 0
> bmAttributes 0xc0
> Self Powered
> MaxPower 2mA
> Interface Descriptor:
> bLength 9
> bDescriptorType 4
> bInterfaceNumber 0
> bAlternateSetting 0
> bNumEndpoints 2
> bInterfaceClass 8 Mass Storage
> bInterfaceSubClass 6 SCSI
> bInterfaceProtocol 80 Bulk-Only
> iInterface 0
> Endpoint Descriptor:
> bLength 7
> bDescriptorType 5
> bEndpointAddress 0x81 EP 1 IN
> bmAttributes 2
> Transfer Type Bulk
> Synch Type None
> Usage Type Data
> wMaxPacketSize 0x0400 1x 1024 bytes
> bInterval 0
> bMaxBurst 15
> Endpoint Descriptor:
> bLength 7
> bDescriptorType 5
> bEndpointAddress 0x02 EP 2 OUT
> bmAttributes 2
> Transfer Type Bulk
> Synch Type None
> Usage Type Data
> wMaxPacketSize 0x0400 1x 1024 bytes
> bInterval 0
> bMaxBurst 15
> Binary Object Store Descriptor:
> bLength 5
> bDescriptorType 15
> wTotalLength 22
> bNumDeviceCaps 2
> USB 2.0 Extension Device Capability:
> bLength 7
> bDescriptorType 16
> bDevCapabilityType 2
> bmAttributes 0x00000002
> Link Power Management (LPM) Supported
> SuperSpeed USB Device Capability:
> bLength 10
> bDescriptorType 16
> bDevCapabilityType 3
> bmAttributes 0x00
> wSpeedsSupported 0x000e
> Device can operate at Full Speed (12Mbps)
> Device can operate at High Speed (480Mbps)
> Device can operate at SuperSpeed (5Gbps)
> bFunctionalitySupport 1
> Lowest fully-functional device speed is Full Speed (12Mbps)
> bU1DevExitLat 10 micro seconds
> bU2DevExitLat 2047 micro seconds
> Device Status: 0x0001
> Self Powered
>
>
>
>> IMHO, using such a setup for software RAID (like btrfs RAID1)
>> fundamentally violates the concept of RAID (redundant array of
>> independent disks). It depends on where you define the system border
>> of the (independent) disks.
>> If it is at:
>>
>> A) the 4 (or 3 disks in this case) SATA+power interfaces inside the drivebay or
>>
>> B) inside the PC's chipset.
>>
>> In case A) there is a shared removable link (USB) inside the
>> filesystem processing machine.
>> In case B) the disks aren't really independent as they share a
>> removable link (and as proven by the (un)plug of 1 device affecting
>> all others).
^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2016-04-28 19:14 UTC | newest]
Thread overview: 32+ messages
2016-04-18 5:06 Question: raid1 behaviour on failure Matthias Bodenbinder
2016-04-18 7:22 ` Qu Wenruo
2016-04-20 5:17 ` Matthias Bodenbinder
2016-04-20 7:25 ` Qu Wenruo
2016-04-21 5:22 ` Matthias Bodenbinder
2016-04-21 5:43 ` Qu Wenruo
2016-04-21 6:02 ` Liu Bo
2016-04-21 6:09 ` Qu Wenruo
2016-04-21 17:40 ` Matthias Bodenbinder
2016-04-22 6:02 ` Qu Wenruo
2016-04-23 7:07 ` Matthias Bodenbinder
2016-04-23 7:17 ` Matthias Bodenbinder
2016-04-26 8:17 ` Satoru Takeuchi
2016-04-26 15:16 ` Henk Slager
2016-04-20 13:32 ` Anand Jain
2016-04-21 5:15 ` Matthias Bodenbinder
2016-04-21 7:19 ` Anand Jain
2016-04-21 6:23 ` Satoru Takeuchi
2016-04-21 11:09 ` Austin S. Hemmelgarn
2016-04-21 11:28 ` Henk Slager
2016-04-21 17:27 ` Matthias Bodenbinder
2016-04-26 16:19 ` Henk Slager
2016-04-26 16:42 ` Holger Hoffstätte
2016-04-28 5:12 ` Matthias Bodenbinder
2016-04-28 5:24 ` Gareth Pye
2016-04-28 8:08 ` Duncan
2016-04-28 5:09 ` Matthias Bodenbinder
2016-04-28 19:14 ` Henk Slager
[not found] ` <57188534.1070408@jp.fujitsu.com>
2016-04-21 11:58 ` Qu Wenruo
2016-04-22 2:21 ` Satoru Takeuchi
2016-04-22 5:32 ` Qu Wenruo
2016-04-22 6:17 ` Satoru Takeuchi