* Question: raid1 behaviour on failure
@ 2016-04-18  5:06 Matthias Bodenbinder
  2016-04-18  7:22 ` Qu Wenruo
  0 siblings, 1 reply; 32+ messages in thread
From: Matthias Bodenbinder @ 2016-04-18  5:06 UTC (permalink / raw)
  To: linux-btrfs

Hi,

I have a raid1 with 3 drives: 698, 465 and 232 GB. I copied 1.7 GB of data to that raid1, balanced the filesystem and then removed the biggest drive (hotplug).

The data was still available. Then I copied the /root directory to the raid1. It showed up via ls -l. Then I plugged the missing hard drive back in (hotplug). After a few seconds "btrfs fi show" gave the usual output:

Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
	Total devices 3 FS bytes used 1.60GiB
	devid    1 size 698.64GiB used 4.03GiB path /dev/sdg
	devid    2 size 465.76GiB used 4.03GiB path /dev/sdh
	devid    3 size 232.88GiB used 0.00B path /dev/sdi

The /root is still showing up, but the raid1 is now mounted in *read-only* mode. 

I unmounted it and mounted it again. Now the /root directory on the raid1 is no longer available. It's gone.

I guess I missed some important step to recover the degraded raid1 before unmounting it.

What is it that I missed?

Matthias



* Re: Question: raid1 behaviour on failure
  2016-04-18  5:06 Question: raid1 behaviour on failure Matthias Bodenbinder
@ 2016-04-18  7:22 ` Qu Wenruo
  2016-04-20  5:17   ` Matthias Bodenbinder
  0 siblings, 1 reply; 32+ messages in thread
From: Qu Wenruo @ 2016-04-18  7:22 UTC (permalink / raw)
  To: Matthias Bodenbinder, linux-btrfs

I'm not quite sure about the raid1 behavior.

But your "hotplug" seems to be the problem.
IIRC, Btrfs is known to have problems with re-appearing devices.

If the hot-removed device is fully wiped before being re-plugged, it should
not cause the RO mount (aborted transaction).

BTW, it would be better to post the dmesg output for easier debugging.

Hopefully someone else can give a better explanation of this.

Thanks,
Qu





* Re: Question: raid1 behaviour on failure
  2016-04-18  7:22 ` Qu Wenruo
@ 2016-04-20  5:17   ` Matthias Bodenbinder
  2016-04-20  7:25     ` Qu Wenruo
                       ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Matthias Bodenbinder @ 2016-04-20  5:17 UTC (permalink / raw)
  To: linux-btrfs

On 18.04.2016 at 09:22, Qu Wenruo wrote:
> BTW, it would be better to post the dmesg output for easier debugging.

So here we go. I did the same test again. Here is a full log of what I did. It seems to me like a bug in btrfs.
Sequence of events:
1. mount the raid1 (2 discs of different sizes)
2. unplug the biggest drive (hotplug)
3. try to copy something to the degraded raid1
4. plug in the device again (hotplug)

This scenario does not work. The disc array is NOT redundant! I cannot work with it while a drive is missing and I cannot reattach the device so that everything works again.

The btrfs module crashes during the test.

I am using LMDE2 with backports:
btrfs-tools 4.4-1~bpo8+1
linux-image-4.4.0-0.bpo.1-amd64

Matthias


rakete - root - /root
1# mount /mnt/raid1/

Journal:

Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): enabling auto defrag
Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): disk space caching is enabled
Apr 20 07:01:16 rakete kernel: BTRFS: has skinny extents

rakete - root - /mnt/raid1
3# ll
total 0
drwxrwxr-x 1 root root   36 Nov 14  2014 AfterShot2(64-bit)
drwxrwxr-x 1 root root 5082 Apr 17 09:06 etc
drwxr-xr-x 1 root root  108 Mar 24 07:31 var

4# btrfs fi show
Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
	Total devices 3 FS bytes used 1.60GiB
	devid    1 size 698.64GiB used 3.03GiB path /dev/sdg
	devid    2 size 465.76GiB used 3.03GiB path /dev/sdh
	devid    3 size 232.88GiB used 0.00B path /dev/sdi

####
unplug device sdg:

Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical block 243826688, lost sync page write
Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating journal superblock for sdf1-8.
Apr 20 07:03:05 rakete kernel: Aborting journal on device sdf1-8.
Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical block 243826688, lost sync page write
Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating journal superblock for sdf1-8.
Apr 20 07:03:05 rakete umount[16405]: umount: /mnt/raid1: target is busy
Apr 20 07:03:05 rakete umount[16405]: (In some cases useful info about processes that
Apr 20 07:03:05 rakete umount[16405]: use the device is found by lsof(8) or fuser(1).)
Apr 20 07:03:05 rakete systemd[1]: mnt-raid1.mount mount process exited, code=exited status=32
Apr 20 07:03:05 rakete systemd[1]: Failed unmounting /mnt/raid1.
Apr 20 07:03:24 rakete kernel: usb 3-1: new SuperSpeed USB device number 3 using xhci_hcd
Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device found, idVendor=152d, idProduct=0567
Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device strings: Mfr=10, Product=11, SerialNumber=5
Apr 20 07:03:24 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
Apr 20 07:03:24 rakete kernel: usb 3-1: Manufacturer: JMicron
Apr 20 07:03:24 rakete kernel: usb 3-1: SerialNumber: 152D00539000
Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage device detected
Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid 152d pid 0567: 5000000
Apr 20 07:03:24 rakete kernel: scsi host9: usb-storage 3-1:1.0
Apr 20 07:03:24 rakete mtp-probe[16424]: checking bus 3, device 3: "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
Apr 20 07:03:24 rakete mtp-probe[16424]: bus: 3, device: 3 was not an MTP device
Apr 20 07:03:25 rakete kernel: scsi 9:0:0:0: Direct-Access     WDC WD20 02FAEX-007BA0    0125 PQ: 0 ANSI: 6
Apr 20 07:03:25 rakete kernel: scsi 9:0:0:1: Direct-Access     WDC WD50 01AALS-00L3B2    0125 PQ: 0 ANSI: 6
Apr 20 07:03:25 rakete kernel: scsi 9:0:0:2: Direct-Access     SAMSUNG  SP2504C          0125 PQ: 0 ANSI: 6
Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: Attached scsi generic sg6 type 0
Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: Attached scsi generic sg7 type 0
Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Write Protect is off
Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Mode Sense: 67 00 10 08
Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: Attached scsi generic sg8 type 0
Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] 976773168 512-byte logical blocks: (500 GB/466 GiB)
Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] No Caching mode page found
Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Assuming drive cache: write through
Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Write Protect is off
Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Mode Sense: 67 00 10 08
Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] 488395055 512-byte logical blocks: (250 GB/233 GiB)
Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] No Caching mode page found
Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Assuming drive cache: write through
Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Write Protect is off
Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Mode Sense: 67 00 10 08
Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] No Caching mode page found
Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Assuming drive cache: write through
Apr 20 07:03:25 rakete kernel:  sdf: sdf1
Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Attached SCSI disk
Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Attached SCSI disk
Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Attached SCSI disk
Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): recovery complete
Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): mounted filesystem with ordered data mode. Opts: (null)
Apr 20 07:03:25 rakete udisksd[3671]: Error statting /dev/sdg: No such file or directory


####
5# btrfs fi show
Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
	Total devices 3 FS bytes used 1.60GiB
	devid    2 size 465.76GiB used 3.03GiB path /dev/sdj
	devid    3 size 232.88GiB used 0.00B path /dev/sdk
	*** Some devices missing
####
still mounted in rw mode:
/dev/sdg on /mnt/raid1 type btrfs (rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
####
7# cp -r /root/ .
cp: cannot create directory './root': Input/output error
Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 5, flush 0, corrupt 0, gen 0
Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 6, flush 0, corrupt 0, gen 0
Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 7, flush 0, corrupt 0, gen 0
Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 8, flush 0, corrupt 0, gen 0
Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 9, flush 0, corrupt 0, gen 0
Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 10, flush 0, corrupt 0, gen 0
Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): error reading free space cache
Apr 20 07:05:37 rakete kernel: BTRFS warning (device sdi): failed to load free space cache for block group 20497563648, rebuilding it now
Apr 20 07:05:37 rakete kernel: ------------[ cut here ]------------
Apr 20 07:05:37 rakete kernel: WARNING: CPU: 7 PID: 16738 at /build/linux-H3jpF0/linux-4.4.6/fs/btrfs/ctree.c:1156 __btrfs_cow_block+0x56f/0x5e0 [btrfs]()
Apr 20 07:05:37 rakete kernel: BTRFS: Transaction aborted (error -5)
Apr 20 07:05:37 rakete kernel: Modules linked in: uas usb_storage pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) binfmt_misc dvb_ttpci saa7146_vv ttpci_eeprom saa7146 videobuf_dma_sg videobuf_core dvb_core v4l2_common videodev media cfg80211 vboxdrv(O) cpufreq_powersave cpufreq_conservative cpufreq_userspace cpufreq_stats snd_hda_codec_hdmi intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul eeepc_wmi asus_wmi joydev sparse_keymap drbg iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek rfkill ansi_cprng snd_hda_codec_generic nvidia(PO) aesni_intel aes_x86_64 lrw gf128mul snd_hda_intel glue_helper ablk_helper snd_hda_codec cryptd snd_hda_core serio_raw pcspkr snd_hwdep snd_pcm i2c_i801 snd_timer snd lpc_ich soundcore 8250_fintek mei_me shpchp mei
Apr 20 07:05:37 rakete kernel:  mfd_core battery tpm_tis tpm evdev processor drm fuse ecryptfs cbc sha256_ssse3 sha256_generic hmac encrypted_keys parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache jbd2 btrfs raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic usbhid hid raid6_pq libcrc32c crc32c_generic md_mod dm_mirror dm_region_hash dm_log dm_mod sr_mod sg cdrom sd_mod ata_generic ahci libahci pata_via xhci_pci ehci_pci crc32c_intel xhci_hcd ehci_hcd libata psmouse scsi_mod atl1c usbcore usb_common fjes video wmi fan thermal button
Apr 20 07:05:37 rakete kernel: CPU: 7 PID: 16738 Comm: cp Tainted: P           O    4.4.0-0.bpo.1-amd64 #1 Debian 4.4.6-1~bpo8+1
Apr 20 07:05:37 rakete kernel: Hardware name: System manufacturer System Product Name/P8H67-V, BIOS 3707 07/12/2013
Apr 20 07:05:37 rakete kernel:  0000000000000286 000000006a1407c8 ffffffff812ed425 ffff88016b6dfb90
Apr 20 07:05:37 rakete kernel:  ffffffffa03817b8 ffffffff81077ea1 ffff88018e7fcd30 ffff88016b6dfbe8
Apr 20 07:05:37 rakete kernel:  ffff88005d863e88 ffff8801cde7a980 ffff88018e7fce48 ffffffff81077f2c
Apr 20 07:05:37 rakete kernel: Call Trace:
Apr 20 07:05:37 rakete kernel:  [<ffffffff812ed425>] ? dump_stack+0x5c/0x77
Apr 20 07:05:37 rakete kernel:  [<ffffffff81077ea1>] ? warn_slowpath_common+0x81/0xb0
Apr 20 07:05:37 rakete kernel:  [<ffffffff81077f2c>] ? warn_slowpath_fmt+0x5c/0x80
Apr 20 07:05:37 rakete kernel:  [<ffffffffa02d74af>] ? __btrfs_cow_block+0x56f/0x5e0 [btrfs]
Apr 20 07:05:37 rakete kernel:  [<ffffffffa02d76af>] ? btrfs_cow_block+0x10f/0x1d0 [btrfs]
Apr 20 07:05:37 rakete kernel:  [<ffffffffa02db2cd>] ? btrfs_search_slot+0x1fd/0xa30 [btrfs]
Apr 20 07:05:37 rakete kernel:  [<ffffffffa02dd3f1>] ? btrfs_insert_empty_items+0x71/0xc0 [btrfs]
Apr 20 07:05:37 rakete kernel:  [<ffffffff811f4d92>] ? insert_inode_locked4+0xa2/0x1c0
Apr 20 07:05:37 rakete kernel:  [<ffffffffa030ee5d>] ? btrfs_new_inode+0x1cd/0x590 [btrfs]
Apr 20 07:05:37 rakete kernel:  [<ffffffffa0310a77>] ? btrfs_mkdir+0x107/0x1f0 [btrfs]
Apr 20 07:05:37 rakete kernel:  [<ffffffff811e80b0>] ? vfs_mkdir+0xb0/0x140
Apr 20 07:05:37 rakete kernel:  [<ffffffff811e9d3e>] ? SyS_mkdir+0xce/0x110
Apr 20 07:05:37 rakete kernel:  [<ffffffff81592736>] ? system_call_fast_compare_end+0xc/0x6b
Apr 20 07:05:37 rakete kernel: ---[ end trace 025eb0e83ffed96f ]---
Apr 20 07:05:37 rakete kernel: BTRFS: error (device sdi) in __btrfs_cow_block:1156: errno=-5 IO failure
Apr 20 07:05:37 rakete kernel: BTRFS info (device sdi): forced readonly
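
A minimal sketch for condensing the repeated "bdev ... errs:" lines above into one summary per device. The function name latest_bdev_errors and the sample variable are illustrative assumptions, not part of btrfs. On a live system, "btrfs device stats <mountpoint>" reports the same kind of counters directly.

```shell
#!/bin/sh
# Hypothetical helper: reads journal/dmesg text on stdin and prints the most
# recent write/read error counters that btrfs logged for each block device.
latest_bdev_errors() {
    awk '
        /bdev .* errs:/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "bdev") dev = $(i + 1)
                # "10," + 0 strips the trailing comma via numeric coercion.
                if ($i == "wr")   wr = $(i + 1) + 0
                if ($i == "rd")   rd = $(i + 1) + 0
            }
            seen[dev] = "wr=" wr " rd=" rd
        }
        END { for (d in seen) print d, seen[d] }
    '
}

# Two lines copied from the journal above.
sample_log='Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 9, flush 0, corrupt 0, gen 0
Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 10, flush 0, corrupt 0, gen 0'

printf '%s\n' "$sample_log" | latest_bdev_errors
```

With the sample input, only the last counter for /dev/sdg is kept, which is usually what matters when judging how far the failure progressed.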

####
Try to copy again:
11# cp -r /root/ .
cp: cannot create directory './root': Read-only file system
####
/dev/sdg on /mnt/raid1 type btrfs (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
####
plug in device sdg again:

Apr 20 07:07:39 rakete udisksd[3671]: Cleaning up mount point /media/matthias/BACKUP (device 8:81 no longer exist)
Apr 20 07:07:39 rakete kernel: usb 3-1: USB disconnect, device number 3
Apr 20 07:07:39 rakete udisksd[3671]: Error statting /dev/sdg: No such file or directory
Apr 20 07:07:39 rakete umount[16807]: umount: /mnt/raid1: target is busy
Apr 20 07:07:39 rakete umount[16807]: (In some cases useful info about processes that
Apr 20 07:07:39 rakete umount[16807]: use the device is found by lsof(8) or fuser(1).)
Apr 20 07:07:39 rakete systemd[1]: mnt-raid1.mount mount process exited, code=exited status=32
Apr 20 07:07:39 rakete systemd[1]: Failed unmounting /mnt/raid1.
Apr 20 07:08:01 rakete kernel: usb 3-1: new SuperSpeed USB device number 4 using xhci_hcd
Apr 20 07:08:01 rakete kernel: usb 3-1: New USB device found, idVendor=152d, idProduct=0567
Apr 20 07:08:01 rakete kernel: usb 3-1: New USB device strings: Mfr=10, Product=11, SerialNumber=5
Apr 20 07:08:01 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
Apr 20 07:08:01 rakete kernel: usb 3-1: Manufacturer: JMicron
Apr 20 07:08:01 rakete kernel: usb 3-1: SerialNumber: 152D00539000
Apr 20 07:08:01 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage device detected
Apr 20 07:08:01 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid 152d pid 0567: 5000000
Apr 20 07:08:01 rakete kernel: scsi host10: usb-storage 3-1:1.0
Apr 20 07:08:01 rakete mtp-probe[16826]: checking bus 3, device 4: "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
Apr 20 07:08:01 rakete mtp-probe[16826]: bus: 3, device: 4 was not an MTP device
Apr 20 07:08:02 rakete kernel: scsi 10:0:0:0: Direct-Access     WDC WD20 02FAEX-007BA0    0125 PQ: 0 ANSI: 6
Apr 20 07:08:02 rakete kernel: scsi 10:0:0:1: Direct-Access     WDC WD75 00AACS-00C7B0    0125 PQ: 0 ANSI: 6
Apr 20 07:08:02 rakete kernel: scsi 10:0:0:2: Direct-Access     WDC WD50 01AALS-00L3B2    0125 PQ: 0 ANSI: 6
Apr 20 07:08:02 rakete kernel: scsi 10:0:0:3: Direct-Access     SAMSUNG  SP2504C          0125 PQ: 0 ANSI: 6
Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: Attached scsi generic sg6 type 0
Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Write Protect is off
Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Mode Sense: 67 00 10 08
Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: Attached scsi generic sg7 type 0
Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] No Caching mode page found
Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Assuming drive cache: write through
Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] 1465149168 512-byte logical blocks: (750 GB/699 GiB)
Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: Attached scsi generic sg8 type 0
Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Write Protect is off
Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Mode Sense: 67 00 10 08
Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: Attached scsi generic sg9 type 0
Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] 976773168 512-byte logical blocks: (500 GB/466 GiB)
Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] No Caching mode page found
Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Assuming drive cache: write through
Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Write Protect is off
Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Mode Sense: 67 00 10 08
Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] 488395055 512-byte logical blocks: (250 GB/233 GiB)
Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] No Caching mode page found
Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Assuming drive cache: write through
Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Write Protect is off
Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Mode Sense: 67 00 10 08
Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] No Caching mode page found
Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Assuming drive cache: write through
Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Attached SCSI disk
Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Attached SCSI disk
Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Attached SCSI disk
Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Attached SCSI disk
Apr 20 07:08:02 rakete kernel: EXT4-fs (sdf1): recovery complete
Apr 20 07:08:02 rakete kernel: EXT4-fs (sdf1): mounted filesystem with ordered data mode. Opts: (null)

####
still ro mode
/dev/sdj on /mnt/raid1 type btrfs (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
####
14# btrfs fi show
Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
	Total devices 3 FS bytes used 1.60GiB
	devid    1 size 698.64GiB used 3.03GiB path /dev/sdj
	devid    2 size 465.76GiB used 3.03GiB path /dev/sdk
	devid    3 size 232.88GiB used 0.00B path /dev/sdl
####
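
The "btrfs fi show" listings in this log differ in one detail: after the unplug, the output ends with "*** Some devices missing". A minimal sketch for checking that marker from a script follows. The function names and the sample variable are illustrative assumptions, and the device count is simply the number of "devid" lines.

```shell
#!/bin/sh
# Hypothetical helper: reads `btrfs filesystem show` output on stdin and
# reports whether the "Some devices missing" marker is present.
fs_has_missing_device() {
    if grep -q 'Some devices missing'; then
        echo "degraded"
    else
        echo "complete"
    fi
}

# Hypothetical helper: counts the devid lines, i.e. the devices still present.
count_present_devices() {
    grep -c '^[[:space:]]*devid'
}

# Sample input copied from step 5 of the log (after the unplug).
sample_degraded='Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
    Total devices 3 FS bytes used 1.60GiB
    devid    2 size 465.76GiB used 3.03GiB path /dev/sdj
    devid    3 size 232.88GiB used 0.00B path /dev/sdk
    *** Some devices missing'

printf '%s\n' "$sample_degraded" | fs_has_missing_device
printf '%s\n' "$sample_degraded" | count_present_devices

# On a live system (needs root), the same check would read:
#   btrfs filesystem show /mnt/raid1 | fs_has_missing_device
```

Comparing the counted devices against the "Total devices" line gives the same answer without relying on the marker text.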







* Re: Question: raid1 behaviour on failure
  2016-04-20  5:17   ` Matthias Bodenbinder
@ 2016-04-20  7:25     ` Qu Wenruo
  2016-04-21  5:22       ` Matthias Bodenbinder
  2016-04-20 13:32     ` Anand Jain
  2016-04-21  6:23     ` Satoru Takeuchi
  2 siblings, 1 reply; 32+ messages in thread
From: Qu Wenruo @ 2016-04-20  7:25 UTC (permalink / raw)
  To: Matthias Bodenbinder, linux-btrfs



Matthias Bodenbinder wrote on 2016/04/20 07:17 +0200:
> On 18.04.2016 at 09:22, Qu Wenruo wrote:
>> BTW, it would be better to post the dmesg output for easier debugging.
>
> So here we go. I did the same test again. Here is a full log of what I did. It seems to me like a bug in btrfs.
> Sequence of events:
> 1. mount the raid1 (2 discs of different sizes)
> 2. unplug the biggest drive (hotplug)
> 3. try to copy something to the degraded raid1
> 4. plug in the device again (hotplug)
>
> This scenario does not work. The disc array is NOT redundant! I cannot work with it while a drive is missing and I cannot reattach the device so that everything works again.
>
> The btrfs module crashes during the test.
>
> I am using LMDE2 with backports:
> btrfs-tools 4.4-1~bpo8+1
> linux-image-4.4.0-0.bpo.1-amd64
>
> Matthias
>
>
> rakete - root - /root
> 1# mount /mnt/raid1/
>
> Journal:
>
> Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): enabling auto defrag
> Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): disk space caching is enabled
> Apr 20 07:01:16 rakete kernel: BTRFS: has skinny extents
>
> rakete - root - /mnt/raid1
> 3# ll
> total 0
> drwxrwxr-x 1 root root   36 Nov 14  2014 AfterShot2(64-bit)
> drwxrwxr-x 1 root root 5082 Apr 17 09:06 etc
> drwxr-xr-x 1 root root  108 Mar 24 07:31 var
>
> 4# btrfs fi show
> Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
> 	Total devices 3 FS bytes used 1.60GiB
> 	devid    1 size 698.64GiB used 3.03GiB path /dev/sdg
> 	devid    2 size 465.76GiB used 3.03GiB path /dev/sdh
> 	devid    3 size 232.88GiB used 0.00B path /dev/sdi
>
> ####
> unplug device sdg:
>
> Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical block 243826688, lost sync page write
> Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating journal superblock for sdf1-8.
> Apr 20 07:03:05 rakete kernel: Aborting journal on device sdf1-8.
> Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical block 243826688, lost sync page write
> Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating journal superblock for sdf1-8.
> Apr 20 07:03:05 rakete umount[16405]: umount: /mnt/raid1: target is busy
> Apr 20 07:03:05 rakete umount[16405]: (In some cases useful info about processes that
> Apr 20 07:03:05 rakete umount[16405]: use the device is found by lsof(8) or fuser(1).)
> Apr 20 07:03:05 rakete systemd[1]: mnt-raid1.mount mount process exited, code=exited status=32
> Apr 20 07:03:05 rakete systemd[1]: Failed unmounting /mnt/raid1.
> Apr 20 07:03:24 rakete kernel: usb 3-1: new SuperSpeed USB device number 3 using xhci_hcd
> Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device found, idVendor=152d, idProduct=0567
> Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device strings: Mfr=10, Product=11, SerialNumber=5
> Apr 20 07:03:24 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
> Apr 20 07:03:24 rakete kernel: usb 3-1: Manufacturer: JMicron
> Apr 20 07:03:24 rakete kernel: usb 3-1: SerialNumber: 152D00539000
> Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage device detected
> Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid 152d pid 0567: 5000000
> Apr 20 07:03:24 rakete kernel: scsi host9: usb-storage 3-1:1.0
> Apr 20 07:03:24 rakete mtp-probe[16424]: checking bus 3, device 3: "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
> Apr 20 07:03:24 rakete mtp-probe[16424]: bus: 3, device: 3 was not an MTP device
> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:0: Direct-Access     WDC WD20 02FAEX-007BA0    0125 PQ: 0 ANSI: 6
> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:1: Direct-Access     WDC WD50 01AALS-00L3B2    0125 PQ: 0 ANSI: 6
> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:2: Direct-Access     SAMSUNG  SP2504C          0125 PQ: 0 ANSI: 6
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: Attached scsi generic sg6 type 0
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: Attached scsi generic sg7 type 0
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Write Protect is off
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Mode Sense: 67 00 10 08
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: Attached scsi generic sg8 type 0
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] 976773168 512-byte logical blocks: (500 GB/466 GiB)
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] No Caching mode page found
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Assuming drive cache: write through
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Write Protect is off
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Mode Sense: 67 00 10 08
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] 488395055 512-byte logical blocks: (250 GB/233 GiB)
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] No Caching mode page found
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Assuming drive cache: write through
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Write Protect is off
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Mode Sense: 67 00 10 08
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] No Caching mode page found
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Assuming drive cache: write through
> Apr 20 07:03:25 rakete kernel:  sdf: sdf1
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Attached SCSI disk
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Attached SCSI disk
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Attached SCSI disk
> Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): recovery complete
> Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): mounted filesystem with ordered data mode. Opts: (null)
> Apr 20 07:03:25 rakete udisksd[3671]: Error statting /dev/sdg: No such file or directory
>
>
> ####
> 5# btrfs fi show
> Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
> 	Total devices 3 FS bytes used 1.60GiB
> 	devid    2 size 465.76GiB used 3.03GiB path /dev/sdj
> 	devid    3 size 232.88GiB used 0.00B path /dev/sdk
> 	*** Some devices missing
> ####
> still mounted in rw mode:
> /dev/sdg on /mnt/raid1 type btrfs (rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
> ####

Unfortunately, this is the designed behavior.

The fs is still rw only because it hasn't hit a critical problem yet.

If you touch a file and then sync the fs, btrfs will become RO
immediately.
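
A minimal sketch for observing that rw-to-ro flip from a script, assuming the mount(8) line format shown in the log. The function name mount_mode is an illustrative assumption; it only parses text and does not touch the filesystem.

```shell
#!/bin/sh
# Hypothetical helper: prints "rw" or "ro" for one line in mount(8) format,
# e.g. "/dev/sdg on /mnt/raid1 type btrfs (rw,noatime,...)".
mount_mode() {
    # Keep only the option list between the parentheses.
    opts=$(printf '%s\n' "$1" | sed -n 's/.*(\(.*\)).*/\1/p')
    # Wrap in commas so "ro"/"rw" only match as whole options.
    case ",$opts," in
        *,ro,*) echo ro ;;
        *,rw,*) echo rw ;;
        *)      echo unknown ;;
    esac
}

# Lines copied from the log, before and after the failed copy.
before='/dev/sdg on /mnt/raid1 type btrfs (rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/)'
after='/dev/sdg on /mnt/raid1 type btrfs (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)'

mount_mode "$before"
mount_mode "$after"

# On a live system, the probe described above might look like:
#   touch /mnt/raid1/probe; sync
#   mount_mode "$(mount | grep ' on /mnt/raid1 ')"
```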

> 7# cp -r /root/ .
> cp: cannot create directory './root': Input/output error
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 5, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 6, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 7, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 8, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 9, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 10, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): error reading free space cache
> Apr 20 07:05:37 rakete kernel: BTRFS warning (device sdi): failed to load free space cache for block group 20497563648, rebuilding it now
> Apr 20 07:05:37 rakete kernel: ------------[ cut here ]------------
> Apr 20 07:05:37 rakete kernel: WARNING: CPU: 7 PID: 16738 at /build/linux-H3jpF0/linux-4.4.6/fs/btrfs/ctree.c:1156 __btrfs_cow_block+0x56f/0x5e0 [btrfs]()
> Apr 20 07:05:37 rakete kernel: BTRFS: Transaction aborted (error -5)
> Apr 20 07:05:37 rakete kernel: Modules linked in: uas usb_storage pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) binfmt_misc dvb_ttpci saa7146_vv ttpci_eeprom saa7146 videobuf_dma_sg videobuf_core dvb_core v4l2_common videodev media cfg80211 vboxdrv(O) cpufreq_powersave cpufreq_conservative cpufreq_userspace cpufreq_stats snd_hda_codec_hdmi intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul eeepc_wmi asus_wmi joydev sparse_keymap drbg iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek rfkill ansi_cprng snd_hda_codec_generic nvidia(PO) aesni_intel aes_x86_64 lrw gf128mul snd_hda_intel glue_helper ablk_helper snd_hda_codec cryptd snd_hda_core serio_raw pcspkr snd_hwdep snd_pcm i2c_i801 snd_timer snd lpc_ich soundcore 8250_fintek mei_me shpchp mei
> Apr 20 07:05:37 rakete kernel:  mfd_core battery tpm_tis tpm evdev processor drm fuse ecryptfs cbc sha256_ssse3 sha256_generic hmac encrypted_keys parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache jbd2 btrfs raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic usbhid hid raid6_pq libcrc32c crc32c_generic md_mod dm_mirror dm_region_hash dm_log dm_mod sr_mod sg cdrom sd_mod ata_generic ahci libahci pata_via xhci_pci ehci_pci crc32c_intel xhci_hcd ehci_hcd libata psmouse scsi_mod atl1c usbcore usb_common fjes video wmi fan thermal button
> Apr 20 07:05:37 rakete kernel: CPU: 7 PID: 16738 Comm: cp Tainted: P           O    4.4.0-0.bpo.1-amd64 #1 Debian 4.4.6-1~bpo8+1
> Apr 20 07:05:37 rakete kernel: Hardware name: System manufacturer System Product Name/P8H67-V, BIOS 3707 07/12/2013
> Apr 20 07:05:37 rakete kernel:  0000000000000286 000000006a1407c8 ffffffff812ed425 ffff88016b6dfb90
> Apr 20 07:05:37 rakete kernel:  ffffffffa03817b8 ffffffff81077ea1 ffff88018e7fcd30 ffff88016b6dfbe8
> Apr 20 07:05:37 rakete kernel:  ffff88005d863e88 ffff8801cde7a980 ffff88018e7fce48 ffffffff81077f2c
> Apr 20 07:05:37 rakete kernel: Call Trace:
> Apr 20 07:05:37 rakete kernel:  [<ffffffff812ed425>] ? dump_stack+0x5c/0x77
> Apr 20 07:05:37 rakete kernel:  [<ffffffff81077ea1>] ? warn_slowpath_common+0x81/0xb0
> Apr 20 07:05:37 rakete kernel:  [<ffffffff81077f2c>] ? warn_slowpath_fmt+0x5c/0x80
> Apr 20 07:05:37 rakete kernel:  [<ffffffffa02d74af>] ? __btrfs_cow_block+0x56f/0x5e0 [btrfs]
> Apr 20 07:05:37 rakete kernel:  [<ffffffffa02d76af>] ? btrfs_cow_block+0x10f/0x1d0 [btrfs]
> Apr 20 07:05:37 rakete kernel:  [<ffffffffa02db2cd>] ? btrfs_search_slot+0x1fd/0xa30 [btrfs]
> Apr 20 07:05:37 rakete kernel:  [<ffffffffa02dd3f1>] ? btrfs_insert_empty_items+0x71/0xc0 [btrfs]
> Apr 20 07:05:37 rakete kernel:  [<ffffffff811f4d92>] ? insert_inode_locked4+0xa2/0x1c0
> Apr 20 07:05:37 rakete kernel:  [<ffffffffa030ee5d>] ? btrfs_new_inode+0x1cd/0x590 [btrfs]
> Apr 20 07:05:37 rakete kernel:  [<ffffffffa0310a77>] ? btrfs_mkdir+0x107/0x1f0 [btrfs]
> Apr 20 07:05:37 rakete kernel:  [<ffffffff811e80b0>] ? vfs_mkdir+0xb0/0x140
> Apr 20 07:05:37 rakete kernel:  [<ffffffff811e9d3e>] ? SyS_mkdir+0xce/0x110
> Apr 20 07:05:37 rakete kernel:  [<ffffffff81592736>] ? system_call_fast_compare_end+0xc/0x6b
> Apr 20 07:05:37 rakete kernel: ---[ end trace 025eb0e83ffed96f ]---
> Apr 20 07:05:37 rakete kernel: BTRFS: error (device sdi) in __btrfs_cow_block:1156: errno=-5 IO failure
> Apr 20 07:05:37 rakete kernel: BTRFS info (device sdi): forced readonly

Btrfs fails to read the space cache and also fails to create the new dir.

The failure in cow_block during mkdir is critical, so btrfs becomes RO.

All expected behavior so far.

You may try the degraded mount option, but AFAIK it may not handle a case 
like yours.
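
To make the sequence in the quoted log easier to follow, here is a minimal sketch (an illustration only, not a btrfs tool) that pulls the per-device error counters and the "forced readonly" event out of kernel log lines. The function name summarize and the trimmed sample lines are assumptions for the example; the message formats are copied from the log above.

```python
import re

# Per-device error counters as btrfs prints them in the kernel log, e.g.
# "bdev /dev/sdg errs: wr 0, rd 10, flush 0, corrupt 0, gen 0".
ERRS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def summarize(log_lines):
    """Return (latest error counters per device, whether the fs went RO)."""
    counters = {}
    forced_ro = False
    for line in log_lines:
        match = ERRS_RE.search(line)
        if match:
            fields = match.groupdict()
            dev = fields.pop("dev")
            # Later lines overwrite earlier ones, so the last counters win.
            counters[dev] = {name: int(value) for name, value in fields.items()}
        if "forced readonly" in line:
            forced_ro = True
    return counters, forced_ro


# Sample lines taken from the log above, with the timestamp prefix trimmed.
sample = [
    "kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 9, flush 0, corrupt 0, gen 0",
    "kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 10, flush 0, corrupt 0, gen 0",
    "kernel: BTRFS: error (device sdi) in __btrfs_cow_block:1156: errno=-5 IO failure",
    "kernel: BTRFS info (device sdi): forced readonly",
]

counters, forced_ro = summarize(sample)
print(counters, forced_ro)
```

On the sample, only read errors are counted against /dev/sdg before the transaction abort turns the filesystem read-only.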

Thanks,
Qu
>
> ####
> Try to copy again:
> 11# cp -r /root/ .
> cp: cannot create directory './root': Read-only file system
> ####
> /dev/sdg on /mnt/raid1 type btrfs (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
> ####
> plugin device sdg again:
>
> Apr 20 07:07:39 rakete udisksd[3671]: Cleaning up mount point /media/matthias/BACKUP (device 8:81 no longer exist)
> Apr 20 07:07:39 rakete kernel: usb 3-1: USB disconnect, device number 3
> Apr 20 07:07:39 rakete udisksd[3671]: Error statting /dev/sdg: No such file or directory
> Apr 20 07:07:39 rakete umount[16807]: umount: /mnt/raid1: target is busy
> Apr 20 07:07:39 rakete umount[16807]: (In some cases useful info about processes that
> Apr 20 07:07:39 rakete umount[16807]: use the device is found by lsof(8) or fuser(1).)
> Apr 20 07:07:39 rakete systemd[1]: mnt-raid1.mount mount process exited, code=exited status=32
> Apr 20 07:07:39 rakete systemd[1]: Failed unmounting /mnt/raid1.
> Apr 20 07:08:01 rakete kernel: usb 3-1: new SuperSpeed USB device number 4 using xhci_hcd
> Apr 20 07:08:01 rakete kernel: usb 3-1: New USB device found, idVendor=152d, idProduct=0567
> Apr 20 07:08:01 rakete kernel: usb 3-1: New USB device strings: Mfr=10, Product=11, SerialNumber=5
> Apr 20 07:08:01 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
> Apr 20 07:08:01 rakete kernel: usb 3-1: Manufacturer: JMicron
> Apr 20 07:08:01 rakete kernel: usb 3-1: SerialNumber: 152D00539000
> Apr 20 07:08:01 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage device detected
> Apr 20 07:08:01 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid 152d pid 0567: 5000000
> Apr 20 07:08:01 rakete kernel: scsi host10: usb-storage 3-1:1.0
> Apr 20 07:08:01 rakete mtp-probe[16826]: checking bus 3, device 4: "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
> Apr 20 07:08:01 rakete mtp-probe[16826]: bus: 3, device: 4 was not an MTP device
> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:0: Direct-Access     WDC WD20 02FAEX-007BA0    0125 PQ: 0 ANSI: 6
> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:1: Direct-Access     WDC WD75 00AACS-00C7B0    0125 PQ: 0 ANSI: 6
> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:2: Direct-Access     WDC WD50 01AALS-00L3B2    0125 PQ: 0 ANSI: 6
> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:3: Direct-Access     SAMSUNG  SP2504C          0125 PQ: 0 ANSI: 6
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: Attached scsi generic sg6 type 0
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Write Protect is off
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Mode Sense: 67 00 10 08
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: Attached scsi generic sg7 type 0
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] No Caching mode page found
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Assuming drive cache: write through
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] 1465149168 512-byte logical blocks: (750 GB/699 GiB)
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: Attached scsi generic sg8 type 0
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Write Protect is off
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Mode Sense: 67 00 10 08
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: Attached scsi generic sg9 type 0
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] 976773168 512-byte logical blocks: (500 GB/466 GiB)
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] No Caching mode page found
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Assuming drive cache: write through
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Write Protect is off
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Mode Sense: 67 00 10 08
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] 488395055 512-byte logical blocks: (250 GB/233 GiB)
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] No Caching mode page found
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Assuming drive cache: write through
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Write Protect is off
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Mode Sense: 67 00 10 08
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] No Caching mode page found
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Assuming drive cache: write through
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Attached SCSI disk
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Attached SCSI disk
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Attached SCSI disk
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Attached SCSI disk
> Apr 20 07:08:02 rakete kernel: EXT4-fs (sdf1): recovery complete
> Apr 20 07:08:02 rakete kernel: EXT4-fs (sdf1): mounted filesystem with ordered data mode. Opts: (null)
>
> ####
> still ro mode
> /dev/sdj on /mnt/raid1 type btrfs (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
> ####
> 14# btrfs fi show
> Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
> 	Total devices 3 FS bytes used 1.60GiB
> 	devid    1 size 698.64GiB used 3.03GiB path /dev/sdj
> 	devid    2 size 465.76GiB used 3.03GiB path /dev/sdk
> 	devid    3 size 232.88GiB used 0.00B path /dev/sdl
> ####
>
>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Question: raid1 behaviour on failure
  2016-04-20  5:17   ` Matthias Bodenbinder
  2016-04-20  7:25     ` Qu Wenruo
@ 2016-04-20 13:32     ` Anand Jain
  2016-04-21  5:15       ` Matthias Bodenbinder
  2016-04-21  6:23     ` Satoru Takeuchi
  2 siblings, 1 reply; 32+ messages in thread
From: Anand Jain @ 2016-04-20 13:32 UTC (permalink / raw)
  To: Matthias Bodenbinder, linux-btrfs



> 1. mount the raid1 (2 disc with different size)

> 2. unplug the biggest drive (hotplug)

   Btrfs won't know that you have unplugged a disk.
   Though it experiences IO failures, it won't close the bdev.

> 3. try to copy something to the degraded raid1

   This will work as long as you do _not_ run unmount/mount.

   However, once you umount/mount, you won't be able to mount
   even with the -o degraded option. (There are some workaround
   patches on the ML.)

> 4. plugin the device again (hotplug)

   This is a bad test case.

   - Since btrfs didn't close the device at #2 above, the block
   layer will create a new device instance and path when you plug
   the device back in.

   Btrfs will then promptly scan the device and update its
   records. But note that it's still using the old bdev, so
   you will continue to see the IO errors, and no IO will go
   to the new device instance.

   There are patches on the ML under test which will force-close
   the device upon losing access to it, as a
   first step.
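
The stale-handle situation described above can be pictured with a toy model. This is only a sketch under stated assumptions: BlockDevice and ToyFs are hypothetical classes invented for illustration and do not correspond to kernel structures. They only mimic the idea that the recorded path is updated on rescan while IO keeps going to the old, dead device instance.

```python
class BlockDevice:
    """Hypothetical stand-in for one block-layer device instance."""

    def __init__(self, path):
        self.path = path
        self.alive = True

    def write(self, data):
        if not self.alive:
            raise IOError(f"I/O error on {self.path}")
        return len(data)


class ToyFs:
    """Toy filesystem that keeps an open handle to its member device."""

    def __init__(self, bdev):
        self.bdev = bdev                  # open handle, not closed on failure
        self.recorded_path = bdev.path

    def scan(self, new_bdev):
        # The rescan only updates the recorded path; the old handle is kept.
        self.recorded_path = new_bdev.path

    def write(self, data):
        return self.bdev.write(data)


old = BlockDevice("/dev/sdg")
fs = ToyFs(old)

old.alive = False                 # step 2: the disk is unplugged
new = BlockDevice("/dev/sdj")     # step 4: replugging creates a new instance
fs.scan(new)

try:
    fs.write(b"data")
    outcome = "ok"
except IOError as exc:
    outcome = str(exc)

print(fs.recorded_path, "->", outcome)
```

In this model the new instance itself is perfectly writable; the errors persist only because the toy filesystem never swaps its handle, which is the gap the force-close patches mentioned above are meant to address.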



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Question: raid1 behaviour on failure
  2016-04-20 13:32     ` Anand Jain
@ 2016-04-21  5:15       ` Matthias Bodenbinder
  2016-04-21  7:19         ` Anand Jain
  0 siblings, 1 reply; 32+ messages in thread
From: Matthias Bodenbinder @ 2016-04-21  5:15 UTC (permalink / raw)
  To: linux-btrfs

On 20.04.2016 at 15:32, Anand Jain wrote:
>> 1. mount the raid1 (2 disc with different size)
> 
>> 2. unplug the biggest drive (hotplug)
> 
>   Btrfs won't know that you have plugged-out a disk.
>   Though it experiences IO failures, it won't close the bdev.

Well, as far as I can tell mdadm can handle this use case. I tested that. I have an mdadm raid5 running. I accidentally unplugged a SATA cable from one of the devices and the raid still worked. I did not even notice that the cable was unplugged until a few hours later. Then I plugged in the cable again and that was it. mdadm recovered the raid5 without any problem. -> This is redundancy!


> 
>> 3. try to copy something to the degraded raid1
> 
>   This will work as long as you do _not_ run unmount/mount.
 
I did not umount the raid1 when I tried to copy something. As you can see from the sequence of events: I removed the drive and immediately afterwards tried to copy something to the degraded array. This copy failed with a crash of the btrfs module. -> This is NOT redundancy.

The umount and mount operations came afterwards.

In a nutshell I have to say that the btrfs behaviour is by no means compliant with my understanding of redundancy.


Matthias




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Question: raid1 behaviour on failure
  2016-04-20  7:25     ` Qu Wenruo
@ 2016-04-21  5:22       ` Matthias Bodenbinder
  2016-04-21  5:43         ` Qu Wenruo
  0 siblings, 1 reply; 32+ messages in thread
From: Matthias Bodenbinder @ 2016-04-21  5:22 UTC (permalink / raw)
  To: linux-btrfs

On 20.04.2016 at 09:25, Qu Wenruo wrote:

> 
> Unfortunately, this is the designed behavior.
> 
> The fs is rw just because it doesn't hit any critical problem.
> 
> If you try to touch a file and then sync the fs, btrfs will become RO immediately.
> 
....

> Btrfs fails to read space cache, nor make a new dir.
> 
> The failure on cow_block in mkdir is ciritical, and btrfs become RO.
> 
> All expected behavior so far.
> 
> You may try use degraded mount option, but AFAIK it may not handle case like yours.

This really scares me. "Expected behaviour"? 
So you are saying: If one of the drives in the raid1 dies without btrfs noticing, the redundancy is lost. 

Let's say the power unit of a disc dies. This disc will disappear from the raid1 pretty much as suddenly as in my test case here. No difference.

You are saying that in this case, btrfs should behave exactly like this? If that is the case I may need to rethink my interpretation of redundancy.

Matthias




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Question: raid1 behaviour on failure
  2016-04-21  5:22       ` Matthias Bodenbinder
@ 2016-04-21  5:43         ` Qu Wenruo
  2016-04-21  6:02           ` Liu Bo
  2016-04-21 17:40           ` Matthias Bodenbinder
  0 siblings, 2 replies; 32+ messages in thread
From: Qu Wenruo @ 2016-04-21  5:43 UTC (permalink / raw)
  To: Matthias Bodenbinder, linux-btrfs



Matthias Bodenbinder wrote on 2016/04/21 07:22 +0200:
> On 20.04.2016 at 09:25, Qu Wenruo wrote:
>
>>
>> Unfortunately, this is the designed behavior.
>>
>> The fs is rw just because it doesn't hit any critical problem.
>>
>> If you try to touch a file and then sync the fs, btrfs will become RO immediately.
>>
> ....
>
>> Btrfs fails to read space cache, nor make a new dir.
>>
>> The failure on cow_block in mkdir is ciritical, and btrfs become RO.
>>
>> All expected behavior so far.
>>
>> You may try use degraded mount option, but AFAIK it may not handle case like yours.
>
> This really scares me. "Expected bevahour"?
> So you are saying: If one of the drives in the raid1 is going dead without noticing btrfs, the redundancy is lost.
>
> Lets say, the power unit of a disc is going dead. This disc will disappear from the raid1 pretty much as suddenly as in my test case here. No difference.
>
> You are saying that in this case, btrfs should exactly behave like this? If that is the case I eventually need to rethink my interpretation of redundancy.
>
> Matthias
>

The "expected behavior" just means that aborting the transaction on a 
critical error is expected.

And you should know that btrfs does not do full block-level RAID1; it 
does RAID at the chunk level.
That needs to take more things into account than full block-level RAID1, 
but it is also more flexible than block-level RAID1.
(For example, you can use 3 devices with different sizes for btrfs 
RAID1 and get more available space than with mdadm raid1.)
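
As a rough worked example of the capacity point, the sketch below compares the two layouts for the three device sizes reported by "btrfs fi show" in this thread. It rests on assumptions: two-copy chunk-level RAID1 is approximated by the common rule min(total / 2, total - largest), ignoring metadata overhead and chunk granularity, and the block-level case assumes every member holds a full copy, so the smallest device is the limit. The function names are made up for the example.

```python
def btrfs_raid1_usable(sizes):
    """Approximate usable space for two-copy, chunk-level RAID1.

    Every chunk needs two copies on two different devices, so usable
    space is bounded by half the total and by what the other devices
    can mirror against the largest one.
    """
    total = sum(sizes)
    return min(total / 2, total - max(sizes))


def mirror_all_usable(sizes):
    """Usable space when every device holds a full copy (block-level mirror)."""
    return min(sizes)


# Device sizes in GiB, taken from the "btrfs fi show" output in this thread.
sizes_gib = [698.64, 465.76, 232.88]

print("chunk-level RAID1 (approx.):", btrfs_raid1_usable(sizes_gib), "GiB")
print("mirror across all devices:  ", mirror_all_usable(sizes_gib), "GiB")
```

With these sizes the two smaller devices together match the largest one, so under the stated approximation nearly all of the largest device can be mirrored.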

You may think the behavior is totally insane for btrfs RAID1, but don't 
forget that btrfs can have different metadata/data profiles.
(What's more, there is already a plan to support different profiles for 
different subvolumes.)

If your metadata is RAID1, your data can still be RAID0, and in 
that case a missing device can still cause huge problems.

There are already unmerged patches that partly implement the mdadm-like 
behavior, such as automatically switching to degraded mode without making 
the fs RO.

The original patchset:
http://comments.gmane.org/gmane.comp.file-systems.btrfs/48335

Or the latest patchset inside Anand Jain's auto-replace patchset:
http://thread.gmane.org/gmane.comp.file-systems.btrfs/55446

Thanks,
Qu



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Question: raid1 behaviour on failure
  2016-04-21  5:43         ` Qu Wenruo
@ 2016-04-21  6:02           ` Liu Bo
  2016-04-21  6:09             ` Qu Wenruo
  2016-04-21 17:40           ` Matthias Bodenbinder
  1 sibling, 1 reply; 32+ messages in thread
From: Liu Bo @ 2016-04-21  6:02 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Matthias Bodenbinder, linux-btrfs

On Thu, Apr 21, 2016 at 01:43:56PM +0800, Qu Wenruo wrote:
> 
> 
> Matthias Bodenbinder wrote on 2016/04/21 07:22 +0200:
> >On 20.04.2016 at 09:25, Qu Wenruo wrote:
> >
> >>
> >>Unfortunately, this is the designed behavior.
> >>
> >>The fs is rw just because it doesn't hit any critical problem.
> >>
> >>If you try to touch a file and then sync the fs, btrfs will become RO immediately.
> >>
> >....
> >
> >>Btrfs fails to read space cache, nor make a new dir.
> >>
> >>The failure on cow_block in mkdir is ciritical, and btrfs become RO.
> >>
> >>All expected behavior so far.
> >>
> >>You may try use degraded mount option, but AFAIK it may not handle case like yours.
> >
> >This really scares me. "Expected bevahour"?
> >So you are saying: If one of the drives in the raid1 is going dead without noticing btrfs, the redundancy is lost.
> >
> >Lets say, the power unit of a disc is going dead. This disc will disappear from the raid1 pretty much as suddenly as in my test case here. No difference.
> >
> >You are saying that in this case, btrfs should exactly behave like this? If that is the case I eventually need to rethink my interpretation of redundancy.
> >
> >Matthias
> >
> 
> The "expected behavior" just means the abort transaction behavior for
> critical error is expected.
> 
> And you should know, btrfs is not doing full block level RAID1, it's doing
> RAID at chunk level.
> Which needs to consider more things than full block level RAID1, and it's
> more flex than block level raid1.
> (For example, you can use 3 devices with different sizes to do btrfs RAID1
> and get more available size than mdadm raid1)
> 
> You may think the behavior is totally insane for btrfs RAID1, but don't
> forget, btrfs can have different metdata/data profile.
> (And even more, there is already plan to support different profile for
> different subvolumes)
> 
> In case your metadata is RAID1, your data can still be RAID0, and in that
> case a missing devices can still cause huge problem.

From a user's point of view, what you're saying is more of an excuse and
kind of irrelevant.  Stop doing that please, and try to fix the insane behavior instead.

Thanks,

-liubo

> 
> There are already unmerged patches which will partly do the mdadm level
> behavior, like automatically change to degraded mode without making the fs
> RO.
> 
> The original patchset:
> http://comments.gmane.org/gmane.comp.file-systems.btrfs/48335
> 
> Or the latest patchset inside Anand Jain's auto-replace patchset:
> http://thread.gmane.org/gmane.comp.file-systems.btrfs/55446
> 
> Thanks,
> Qu

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Question: raid1 behaviour on failure
  2016-04-21  6:02           ` Liu Bo
@ 2016-04-21  6:09             ` Qu Wenruo
  0 siblings, 0 replies; 32+ messages in thread
From: Qu Wenruo @ 2016-04-21  6:09 UTC (permalink / raw)
  To: bo.li.liu; +Cc: Matthias Bodenbinder, linux-btrfs



Liu Bo wrote on 2016/04/20 23:02 -0700:
> On Thu, Apr 21, 2016 at 01:43:56PM +0800, Qu Wenruo wrote:
>>
>>
>> Matthias Bodenbinder wrote on 2016/04/21 07:22 +0200:
>>> On 20.04.2016 at 09:25, Qu Wenruo wrote:
>>>
>>>>
>>>> Unfortunately, this is the designed behavior.
>>>>
>>>> The fs is rw just because it doesn't hit any critical problem.
>>>>
>>>> If you try to touch a file and then sync the fs, btrfs will become RO immediately.
>>>>
>>> ....
>>>
>>>> Btrfs fails to read space cache, nor make a new dir.
>>>>
>>>> The failure on cow_block in mkdir is ciritical, and btrfs become RO.
>>>>
>>>> All expected behavior so far.
>>>>
>>>> You may try use degraded mount option, but AFAIK it may not handle case like yours.
>>>
>>> This really scares me. "Expected bevahour"?
>>> So you are saying: If one of the drives in the raid1 is going dead without noticing btrfs, the redundancy is lost.
>>>
>>> Lets say, the power unit of a disc is going dead. This disc will disappear from the raid1 pretty much as suddenly as in my test case here. No difference.
>>>
>>> You are saying that in this case, btrfs should exactly behave like this? If that is the case I eventually need to rethink my interpretation of redundancy.
>>>
>>> Matthias
>>>
>>
>> The "expected behavior" just means the abort transaction behavior for
>> critical error is expected.
>>
>> And you should know, btrfs is not doing full block level RAID1, it's doing
>> RAID at chunk level.
>> Which needs to consider more things than full block level RAID1, and it's
>> more flex than block level raid1.
>> (For example, you can use 3 devices with different sizes to do btrfs RAID1
>> and get more available size than mdadm raid1)
>>
>> You may think the behavior is totally insane for btrfs RAID1, but don't
>> forget, btrfs can have different metdata/data profile.
>> (And even more, there is already plan to support different profile for
>> different subvolumes)
>>
>> In case your metadata is RAID1, your data can still be RAID0, and in that
>> case a missing devices can still cause huge problem.
>
> From an user's point of view, what you're saying is more an excuse and
> kind of irrelavant.  Stop doing that please, try to fix the insane behavior instead.
>
> Thanks,
>
> -liubo

Didn't you see that I submitted the first version of the per-chunk 
degradable patchset a long time ago to address the problem?

And you should blame the person who is blocking the patchset from 
being merged by refusing to split the patches out.

Thanks,
Qu

>
>>
>> There are already unmerged patches which will partly do the mdadm level
>> behavior, like automatically change to degraded mode without making the fs
>> RO.
>>
>> The original patchset:
>> http://comments.gmane.org/gmane.comp.file-systems.btrfs/48335
>>
>> Or the latest patchset inside Anand Jain's auto-replace patchset:
>> http://thread.gmane.org/gmane.comp.file-systems.btrfs/55446
>>
>> Thanks,
>> Qu



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Question: raid1 behaviour on failure
  2016-04-20  5:17   ` Matthias Bodenbinder
  2016-04-20  7:25     ` Qu Wenruo
  2016-04-20 13:32     ` Anand Jain
@ 2016-04-21  6:23     ` Satoru Takeuchi
  2016-04-21 11:09       ` Austin S. Hemmelgarn
                         ` (2 more replies)
  2 siblings, 3 replies; 32+ messages in thread
From: Satoru Takeuchi @ 2016-04-21  6:23 UTC (permalink / raw)
  To: Matthias Bodenbinder, linux-btrfs

On 2016/04/20 14:17, Matthias Bodenbinder wrote:
> On 18.04.2016 at 09:22, Qu Wenruo wrote:
>> BTW, it would be better to post the dmesg for better debug.
>
> So here we. I did the same test again. Here is a full log of what i did. It seems to be mean like a bug in btrfs.
> Sequenz of events:
> 1. mount the raid1 (2 disc with different size)
> 2. unplug the biggest drive (hotplug)
> 3. try to copy something to the degraded raid1
> 4. plugin the device again (hotplug)
>
> This scenario does not work. The disc array is NOT redundant! I can not work with it while a drive is missing and I can not reattach the device so that everything works again.
>
> The btrfs module crashes during the test.
>
> I am using LMDE2 with backports:
> btrfs-tools 4.4-1~bpo8+1
> linux-image-4.4.0-0.bpo.1-amd64
>
> Matthias
>
>
> rakete - root - /root
> 1# mount /mnt/raid1/
>
> Journal:
>
> Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): enabling auto defrag
> Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): disk space caching is enabled
> Apr 20 07:01:16 rakete kernel: BTRFS: has skinny extents
>
> rakete - root - /mnt/raid1
> 3# ll
> total 0
> drwxrwxr-x 1 root root   36 Nov 14  2014 AfterShot2(64-bit)
> drwxrwxr-x 1 root root 5082 Apr 17 09:06 etc
> drwxr-xr-x 1 root root  108 Mar 24 07:31 var
>
> 4# btrfs fi show
> Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
> 	Total devices 3 FS bytes used 1.60GiB
> 	devid    1 size 698.64GiB used 3.03GiB path /dev/sdg
> 	devid    2 size 465.76GiB used 3.03GiB path /dev/sdh
> 	devid    3 size 232.88GiB used 0.00B path /dev/sdi
>
> ####
> unplug device sdg:
>
> Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical block 243826688, lost sync page write
> Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating journal superblock for sdf1-8.
> Apr 20 07:03:05 rakete kernel: Aborting journal on device sdf1-8.
> Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical block 243826688, lost sync page write
> Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating journal superblock for sdf1-8.
> Apr 20 07:03:05 rakete umount[16405]: umount: /mnt/raid1: target is busy
> Apr 20 07:03:05 rakete umount[16405]: (In some cases useful info about processes that
> Apr 20 07:03:05 rakete umount[16405]: use the device is found by lsof(8) or fuser(1).)
> Apr 20 07:03:05 rakete systemd[1]: mnt-raid1.mount mount process exited, code=exited status=32
> Apr 20 07:03:05 rakete systemd[1]: Failed unmounting /mnt/raid1.
> Apr 20 07:03:24 rakete kernel: usb 3-1: new SuperSpeed USB device number 3 using xhci_hcd
> Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device found, idVendor=152d, idProduct=0567
> Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device strings: Mfr=10, Product=11, SerialNumber=5
> Apr 20 07:03:24 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
> Apr 20 07:03:24 rakete kernel: usb 3-1: Manufacturer: JMicron
> Apr 20 07:03:24 rakete kernel: usb 3-1: SerialNumber: 152D00539000
> Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage device detected
> Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid 152d pid 0567: 5000000
> Apr 20 07:03:24 rakete kernel: scsi host9: usb-storage 3-1:1.0
> Apr 20 07:03:24 rakete mtp-probe[16424]: checking bus 3, device 3: "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
> Apr 20 07:03:24 rakete mtp-probe[16424]: bus: 3, device: 3 was not an MTP device
> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:0: Direct-Access     WDC WD20 02FAEX-007BA0    0125 PQ: 0 ANSI: 6
> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:1: Direct-Access     WDC WD50 01AALS-00L3B2    0125 PQ: 0 ANSI: 6
> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:2: Direct-Access     SAMSUNG  SP2504C          0125 PQ: 0 ANSI: 6
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: Attached scsi generic sg6 type 0
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: Attached scsi generic sg7 type 0
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Write Protect is off
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Mode Sense: 67 00 10 08
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: Attached scsi generic sg8 type 0
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] 976773168 512-byte logical blocks: (500 GB/466 GiB)
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] No Caching mode page found
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Assuming drive cache: write through
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Write Protect is off
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Mode Sense: 67 00 10 08
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] 488395055 512-byte logical blocks: (250 GB/233 GiB)
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] No Caching mode page found
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Assuming drive cache: write through
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Write Protect is off
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Mode Sense: 67 00 10 08
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] No Caching mode page found
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Assuming drive cache: write through
> Apr 20 07:03:25 rakete kernel:  sdf: sdf1
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Attached SCSI disk
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Attached SCSI disk
> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Attached SCSI disk
> Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): recovery complete
> Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): mounted filesystem with ordered data mode. Opts: (null)
> Apr 20 07:03:25 rakete udisksd[3671]: Error statting /dev/sdg: No such file or directory
>
>
> ####
> 5# btrfs fi show
> Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
> 	Total devices 3 FS bytes used 1.60GiB
> 	devid    2 size 465.76GiB used 3.03GiB path /dev/sdj
> 	devid    3 size 232.88GiB used 0.00B path /dev/sdk
> 	*** Some devices missing
> ####

Here the names of *online* devices are changed
(/dev/sdh => /dev/sdj, /dev/sdi => /dev/sdk) after just
offlining a device (/dev/sdf). It's odd regardless of
whether Btrfs works fine or not.

Can anyone explain this behavior?
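
For reference, the renaming can be checked mechanically by keying on devid instead of path. The following is a minimal sketch, not a btrfs-progs feature: the helper devid_paths and its regular expression are assumptions written against the "btrfs fi show" output format quoted in this mail, and the two inputs are copied from outputs 4# and 5# above.

```python
import re

# Matches lines such as:
#   devid    2 size 465.76GiB used 3.03GiB path /dev/sdj
DEVID_RE = re.compile(r"devid\s+(\d+)\s+size\s+\S+\s+used\s+\S+\s+path\s+(\S+)")


def devid_paths(fi_show_output):
    """Map devid -> device path for one 'btrfs fi show' output."""
    return {int(m.group(1)): m.group(2) for m in DEVID_RE.finditer(fi_show_output)}


before = """
Total devices 3 FS bytes used 1.60GiB
devid    1 size 698.64GiB used 3.03GiB path /dev/sdg
devid    2 size 465.76GiB used 3.03GiB path /dev/sdh
devid    3 size 232.88GiB used 0.00B path /dev/sdi
"""

after = """
Total devices 3 FS bytes used 1.60GiB
devid    2 size 465.76GiB used 3.03GiB path /dev/sdj
devid    3 size 232.88GiB used 0.00B path /dev/sdk
*** Some devices missing
"""

old_paths = devid_paths(before)
new_paths = devid_paths(after)

# Devices present in both outputs whose path changed.
renamed = {
    devid: (old_paths[devid], new_paths[devid])
    for devid in old_paths
    if devid in new_paths and old_paths[devid] != new_paths[devid]
}
# Devices that were listed before but are no longer reported.
missing = sorted(devid for devid in old_paths if devid not in new_paths)

print("renamed:", renamed)
print("missing devids:", missing)
```

Keying on devid keeps the comparison stable even when the kernel hands out new sdX names after a re-enumeration.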

Thanks,
Satoru

> still mounted in rw mode:
> /dev/sdg on /mnt/raid1 type btrfs (rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
> ####
> 7# cp -r /root/ .
> cp: cannot create directory './root': Input/output error
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 5, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 6, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 7, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 8, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 9, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg errs: wr 0, rd 10, flush 0, corrupt 0, gen 0
> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): error reading free space cache
> Apr 20 07:05:37 rakete kernel: BTRFS warning (device sdi): failed to load free space cache for block group 20497563648, rebuilding it now
> Apr 20 07:05:37 rakete kernel: ------------[ cut here ]------------
> Apr 20 07:05:37 rakete kernel: WARNING: CPU: 7 PID: 16738 at /build/linux-H3jpF0/linux-4.4.6/fs/btrfs/ctree.c:1156 __btrfs_cow_block+0x56f/0x5e0 [btrfs]()
> Apr 20 07:05:37 rakete kernel: BTRFS: Transaction aborted (error -5)
> Apr 20 07:05:37 rakete kernel: Modules linked in: uas usb_storage pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) binfmt_misc dvb_ttpci saa7146_vv ttpci_eeprom saa7146 videobuf_dma_sg videobuf_core dvb_core v4l2_common videodev media cfg80211 vboxdrv(O) cpufreq_powersave cpufreq_conservative cpufreq_userspace cpufreq_stats snd_hda_codec_hdmi intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul eeepc_wmi asus_wmi joydev sparse_keymap drbg iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek rfkill ansi_cprng snd_hda_codec_generic nvidia(PO) aesni_intel aes_x86_64 lrw gf128mul snd_hda_intel glue_helper ablk_helper snd_hda_codec cryptd snd_hda_core serio_raw pcspkr snd_hwdep snd_pcm i2c_i801 snd_timer snd lpc_ich soundcore 8250_fintek mei_me shpchp mei
> Apr 20 07:05:37 rakete kernel:  mfd_core battery tpm_tis tpm evdev processor drm fuse ecryptfs cbc sha256_ssse3 sha256_generic hmac encrypted_keys parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache jbd2 btrfs raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic usbhid hid raid6_pq libcrc32c crc32c_generic md_mod dm_mirror dm_region_hash dm_log dm_mod sr_mod sg cdrom sd_mod ata_generic ahci libahci pata_via xhci_pci ehci_pci crc32c_intel xhci_hcd ehci_hcd libata psmouse scsi_mod atl1c usbcore usb_common fjes video wmi fan thermal button
> Apr 20 07:05:37 rakete kernel: CPU: 7 PID: 16738 Comm: cp Tainted: P           O    4.4.0-0.bpo.1-amd64 #1 Debian 4.4.6-1~bpo8+1
> Apr 20 07:05:37 rakete kernel: Hardware name: System manufacturer System Product Name/P8H67-V, BIOS 3707 07/12/2013
> Apr 20 07:05:37 rakete kernel:  0000000000000286 000000006a1407c8 ffffffff812ed425 ffff88016b6dfb90
> Apr 20 07:05:37 rakete kernel:  ffffffffa03817b8 ffffffff81077ea1 ffff88018e7fcd30 ffff88016b6dfbe8
> Apr 20 07:05:37 rakete kernel:  ffff88005d863e88 ffff8801cde7a980 ffff88018e7fce48 ffffffff81077f2c
> Apr 20 07:05:37 rakete kernel: Call Trace:
> Apr 20 07:05:37 rakete kernel:  [<ffffffff812ed425>] ? dump_stack+0x5c/0x77
> Apr 20 07:05:37 rakete kernel:  [<ffffffff81077ea1>] ? warn_slowpath_common+0x81/0xb0
> Apr 20 07:05:37 rakete kernel:  [<ffffffff81077f2c>] ? warn_slowpath_fmt+0x5c/0x80
> Apr 20 07:05:37 rakete kernel:  [<ffffffffa02d74af>] ? __btrfs_cow_block+0x56f/0x5e0 [btrfs]
> Apr 20 07:05:37 rakete kernel:  [<ffffffffa02d76af>] ? btrfs_cow_block+0x10f/0x1d0 [btrfs]
> Apr 20 07:05:37 rakete kernel:  [<ffffffffa02db2cd>] ? btrfs_search_slot+0x1fd/0xa30 [btrfs]
> Apr 20 07:05:37 rakete kernel:  [<ffffffffa02dd3f1>] ? btrfs_insert_empty_items+0x71/0xc0 [btrfs]
> Apr 20 07:05:37 rakete kernel:  [<ffffffff811f4d92>] ? insert_inode_locked4+0xa2/0x1c0
> Apr 20 07:05:37 rakete kernel:  [<ffffffffa030ee5d>] ? btrfs_new_inode+0x1cd/0x590 [btrfs]
> Apr 20 07:05:37 rakete kernel:  [<ffffffffa0310a77>] ? btrfs_mkdir+0x107/0x1f0 [btrfs]
> Apr 20 07:05:37 rakete kernel:  [<ffffffff811e80b0>] ? vfs_mkdir+0xb0/0x140
> Apr 20 07:05:37 rakete kernel:  [<ffffffff811e9d3e>] ? SyS_mkdir+0xce/0x110
> Apr 20 07:05:37 rakete kernel:  [<ffffffff81592736>] ? system_call_fast_compare_end+0xc/0x6b
> Apr 20 07:05:37 rakete kernel: ---[ end trace 025eb0e83ffed96f ]---
> Apr 20 07:05:37 rakete kernel: BTRFS: error (device sdi) in __btrfs_cow_block:1156: errno=-5 IO failure
> Apr 20 07:05:37 rakete kernel: BTRFS info (device sdi): forced readonly
>
> ####
> Try to copy again:
> 11# cp -r /root/ .
> cp: cannot create directory './root': Read-only file system
> ####
> /dev/sdg on /mnt/raid1 type btrfs (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
> ####
> plugin device sdg again:
>
> Apr 20 07:07:39 rakete udisksd[3671]: Cleaning up mount point /media/matthias/BACKUP (device 8:81 no longer exist)
> Apr 20 07:07:39 rakete kernel: usb 3-1: USB disconnect, device number 3
> Apr 20 07:07:39 rakete udisksd[3671]: Error statting /dev/sdg: No such file or directory
> Apr 20 07:07:39 rakete umount[16807]: umount: /mnt/raid1: target is busy
> Apr 20 07:07:39 rakete umount[16807]: (In some cases useful info about processes that
> Apr 20 07:07:39 rakete umount[16807]: use the device is found by lsof(8) or fuser(1).)
> Apr 20 07:07:39 rakete systemd[1]: mnt-raid1.mount mount process exited, code=exited status=32
> Apr 20 07:07:39 rakete systemd[1]: Failed unmounting /mnt/raid1.
> Apr 20 07:08:01 rakete kernel: usb 3-1: new SuperSpeed USB device number 4 using xhci_hcd
> Apr 20 07:08:01 rakete kernel: usb 3-1: New USB device found, idVendor=152d, idProduct=0567
> Apr 20 07:08:01 rakete kernel: usb 3-1: New USB device strings: Mfr=10, Product=11, SerialNumber=5
> Apr 20 07:08:01 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
> Apr 20 07:08:01 rakete kernel: usb 3-1: Manufacturer: JMicron
> Apr 20 07:08:01 rakete kernel: usb 3-1: SerialNumber: 152D00539000
> Apr 20 07:08:01 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage device detected
> Apr 20 07:08:01 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid 152d pid 0567: 5000000
> Apr 20 07:08:01 rakete kernel: scsi host10: usb-storage 3-1:1.0
> Apr 20 07:08:01 rakete mtp-probe[16826]: checking bus 3, device 4: "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
> Apr 20 07:08:01 rakete mtp-probe[16826]: bus: 3, device: 4 was not an MTP device
> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:0: Direct-Access     WDC WD20 02FAEX-007BA0    0125 PQ: 0 ANSI: 6
> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:1: Direct-Access     WDC WD75 00AACS-00C7B0    0125 PQ: 0 ANSI: 6
> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:2: Direct-Access     WDC WD50 01AALS-00L3B2    0125 PQ: 0 ANSI: 6
> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:3: Direct-Access     SAMSUNG  SP2504C          0125 PQ: 0 ANSI: 6
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: Attached scsi generic sg6 type 0
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Write Protect is off
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Mode Sense: 67 00 10 08
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: Attached scsi generic sg7 type 0
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] No Caching mode page found
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Assuming drive cache: write through
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] 1465149168 512-byte logical blocks: (750 GB/699 GiB)
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: Attached scsi generic sg8 type 0
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Write Protect is off
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Mode Sense: 67 00 10 08
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: Attached scsi generic sg9 type 0
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] 976773168 512-byte logical blocks: (500 GB/466 GiB)
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] No Caching mode page found
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Assuming drive cache: write through
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Write Protect is off
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Mode Sense: 67 00 10 08
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] 488395055 512-byte logical blocks: (250 GB/233 GiB)
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] No Caching mode page found
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Assuming drive cache: write through
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Write Protect is off
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Mode Sense: 67 00 10 08
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] No Caching mode page found
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Assuming drive cache: write through
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Attached SCSI disk
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Attached SCSI disk
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Attached SCSI disk
> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Attached SCSI disk
> Apr 20 07:08:02 rakete kernel: EXT4-fs (sdf1): recovery complete
> Apr 20 07:08:02 rakete kernel: EXT4-fs (sdf1): mounted filesystem with ordered data mode. Opts: (null)
>
> ####
> still ro mode
> /dev/sdj on /mnt/raid1 type btrfs (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
> ####
> 14# btrfs fi show
> Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
> 	Total devices 3 FS bytes used 1.60GiB
> 	devid    1 size 698.64GiB used 3.03GiB path /dev/sdj
> 	devid    2 size 465.76GiB used 3.03GiB path /dev/sdk
> 	devid    3 size 232.88GiB used 0.00B path /dev/sdl
> ####

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Question: raid1 behaviour on failure
  2016-04-21  5:15       ` Matthias Bodenbinder
@ 2016-04-21  7:19         ` Anand Jain
  0 siblings, 0 replies; 32+ messages in thread
From: Anand Jain @ 2016-04-21  7:19 UTC (permalink / raw)
  To: Matthias Bodenbinder, linux-btrfs



On 04/21/2016 01:15 PM, Matthias Bodenbinder wrote:
> On 20.04.2016 at 15:32, Anand Jain wrote:
>>> 1. mount the raid1 (2 disc with different size)
>>
>>> 2. unplug the biggest drive (hotplug)
>>
>>    Btrfs won't know that you have unplugged a disk.
>>    Though it experiences IO failures, it won't close the bdev.
>
> Well, as far as I can tell mdadm can handle this use case. I tested that. I have an mdadm raid5 running. I accidentally unplugged a SATA cable from one of the devices and the raid still worked. I did not even notice that the cable was unplugged until a few hours later. Then I plugged in the cable again and that was it. mdadm recovered the raid5 without any problem. -> This is redundancy!

  Yep. I meant to say it's a bug in btrfs that it won't know
  about the missing device (after mount). Please do test the hot
  spare patch set; it takes a few first steps (not yet complete)
  towards handling a failed device while the FS is mounted.


>>> 3. try to copy something to the degraded raid1
>>
>>    This will work as long as you do _not_ run unmount/mount.
>
> I did not umount the raid1 when I tried to copy something. As you can see from the sequence of events: I removed the drive and immediately afterwards tried to copy something to the degraded array. This copy failed with a crash of the btrfs module. -> This is NOT redundancy.
>
> The umount and mount operations came afterwards.
>
> In a nutshell I have to say that the btrfs behaviour is by no means compliant with my understanding of redundancy.

  This is a known issue.
  Your testing and validation of the hot spare patch set would help.

Thanks, Anand


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Question: raid1 behaviour on failure
  2016-04-21  6:23     ` Satoru Takeuchi
@ 2016-04-21 11:09       ` Austin S. Hemmelgarn
  2016-04-21 11:28       ` Henk Slager
       [not found]       ` <57188534.1070408@jp.fujitsu.com>
  2 siblings, 0 replies; 32+ messages in thread
From: Austin S. Hemmelgarn @ 2016-04-21 11:09 UTC (permalink / raw)
  To: Satoru Takeuchi, Matthias Bodenbinder, linux-btrfs

On 2016-04-21 02:23, Satoru Takeuchi wrote:
> On 2016/04/20 14:17, Matthias Bodenbinder wrote:
>> On 18.04.2016 at 09:22, Qu Wenruo wrote:
>>> BTW, it would be better to post the dmesg for better debug.
>>
>> So here we go. I did the same test again. Here is a full log of what I
>> did. It looks to me like a bug in btrfs.
>> Sequence of events:
>> 1. mount the raid1 (2 discs of different sizes)
>> 2. unplug the biggest drive (hotplug)
>> 3. try to copy something to the degraded raid1
>> 4. plug in the device again (hotplug)
>>
>> This scenario does not work. The disc array is NOT redundant! I can
>> not work with it while a drive is missing and I can not reattach the
>> device so that everything works again.
>>
>> The btrfs module crashes during the test.
>>
>> I am using LMDE2 with backports:
>> btrfs-tools 4.4-1~bpo8+1
>> linux-image-4.4.0-0.bpo.1-amd64
>>
>> Matthias
>>
>>
>> rakete - root - /root
>> 1# mount /mnt/raid1/
>>
>> Journal:
>>
>> Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): enabling auto
>> defrag
>> Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): disk space
>> caching is enabled
>> Apr 20 07:01:16 rakete kernel: BTRFS: has skinny extents
>>
>> rakete - root - /mnt/raid1
>> 3# ll
>> total 0
>> drwxrwxr-x 1 root root   36 Nov 14  2014 AfterShot2(64-bit)
>> drwxrwxr-x 1 root root 5082 Apr 17 09:06 etc
>> drwxr-xr-x 1 root root  108 Mar 24 07:31 var
>>
>> 4# btrfs fi show
>> Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>>     Total devices 3 FS bytes used 1.60GiB
>>     devid    1 size 698.64GiB used 3.03GiB path /dev/sdg
>>     devid    2 size 465.76GiB used 3.03GiB path /dev/sdh
>>     devid    3 size 232.88GiB used 0.00B path /dev/sdi
>>
>> ####
>> unplug device sdg:
>>
>> Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical
>> block 243826688, lost sync page write
>> Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating
>> journal superblock for sdf1-8.
>> Apr 20 07:03:05 rakete kernel: Aborting journal on device sdf1-8.
>> Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical
>> block 243826688, lost sync page write
>> Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating
>> journal superblock for sdf1-8.
>> Apr 20 07:03:05 rakete umount[16405]: umount: /mnt/raid1: target is busy
>> Apr 20 07:03:05 rakete umount[16405]: (In some cases useful info about
>> processes that
>> Apr 20 07:03:05 rakete umount[16405]: use the device is found by
>> lsof(8) or fuser(1).)
>> Apr 20 07:03:05 rakete systemd[1]: mnt-raid1.mount mount process
>> exited, code=exited status=32
>> Apr 20 07:03:05 rakete systemd[1]: Failed unmounting /mnt/raid1.
>> Apr 20 07:03:24 rakete kernel: usb 3-1: new SuperSpeed USB device
>> number 3 using xhci_hcd
>> Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device found,
>> idVendor=152d, idProduct=0567
>> Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device strings:
>> Mfr=10, Product=11, SerialNumber=5
>> Apr 20 07:03:24 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
>> Apr 20 07:03:24 rakete kernel: usb 3-1: Manufacturer: JMicron
>> Apr 20 07:03:24 rakete kernel: usb 3-1: SerialNumber: 152D00539000
>> Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage
>> device detected
>> Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: Quirks match for
>> vid 152d pid 0567: 5000000
>> Apr 20 07:03:24 rakete kernel: scsi host9: usb-storage 3-1:1.0
>> Apr 20 07:03:24 rakete mtp-probe[16424]: checking bus 3, device 3:
>> "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
>> Apr 20 07:03:24 rakete mtp-probe[16424]: bus: 3, device: 3 was not an
>> MTP device
>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:0: Direct-Access     WDC
>> WD20 02FAEX-007BA0    0125 PQ: 0 ANSI: 6
>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:1: Direct-Access     WDC
>> WD50 01AALS-00L3B2    0125 PQ: 0 ANSI: 6
>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:2: Direct-Access
>> SAMSUNG  SP2504C          0125 PQ: 0 ANSI: 6
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: Attached scsi generic sg6
>> type 0
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: Attached scsi generic sg7
>> type 0
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] 3907029168 512-byte
>> logical blocks: (2.00 TB/1.82 TiB)
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Write Protect is off
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Mode Sense: 67 00 10 08
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: Attached scsi generic sg8
>> type 0
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] 976773168 512-byte
>> logical blocks: (500 GB/466 GiB)
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] No Caching mode page
>> found
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Assuming drive cache:
>> write through
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Write Protect is off
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Mode Sense: 67 00 10 08
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] 488395055 512-byte
>> logical blocks: (250 GB/233 GiB)
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] No Caching mode page
>> found
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Assuming drive cache:
>> write through
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Write Protect is off
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Mode Sense: 67 00 10 08
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] No Caching mode page
>> found
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Assuming drive cache:
>> write through
>> Apr 20 07:03:25 rakete kernel:  sdf: sdf1
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Attached SCSI disk
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Attached SCSI disk
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Attached SCSI disk
>> Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): recovery complete
>> Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): mounted filesystem with
>> ordered data mode. Opts: (null)
>> Apr 20 07:03:25 rakete udisksd[3671]: Error statting /dev/sdg: No such
>> file or directory
>>
>>
>> ####
>> 5# btrfs fi show
>> Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>>     Total devices 3 FS bytes used 1.60GiB
>>     devid    2 size 465.76GiB used 3.03GiB path /dev/sdj
>>     devid    3 size 232.88GiB used 0.00B path /dev/sdk
>>     *** Some devices missing
>> ####
>
> Here the names of *online* devices are changed
> (/dev/sdh => /dev/sdj, /dev/sdi => /dev/sdk) after just
> offlining a device (/dev/sdf). It's odd regardless of
> whether Btrfs works fine or not.
>
> Can anyone explain this behavior?
It's a side effect of the reference counting done in the kernel.  If
something is holding open references to the block device (for example,
if there's a mounted filesystem on one of its partitions), then the
kernel has to keep the internal structures relating to that block device
around, even if the device isn't there anymore.  This means that when
the disk reappears, the old name is still in use, so the kernel has to
allocate a new one (because it can't safely assume that the disk is the
same one that was there previously).  It has some annoying side effects,
but it's still a whole lot better than the system crashing from a NULL
pointer dereference.
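
To make the effect concrete, here is a toy model in Python.  It is only
a sketch: the class and method names (ToyDiskNames, attach, hold,
release, detach) are made up for illustration, and the real kernel
allocator differs in detail.  The one behaviour it tries to mirror is
that a name stays reserved for as long as anything still holds a
reference to the vanished device.

```python
import string


class ToyDiskNames:
    """Toy name allocator: a name is reusable only once nothing references it."""

    def __init__(self):
        self.refs = {}     # name -> number of open references
        self.gone = set()  # names whose hardware has disappeared

    def attach(self):
        # Hand out the lowest "sdX" name that is not still being tracked.
        for letter in string.ascii_lowercase:
            name = "sd" + letter
            if name not in self.refs:
                self.refs[name] = 0
                return name
        raise RuntimeError("toy model ran out of names")

    def hold(self, name):
        # Something (e.g. a mounted filesystem) opens the device.
        self.refs[name] += 1

    def release(self, name):
        # A holder lets go; the name is freed if the hardware is already gone.
        self.refs[name] -= 1
        self._maybe_free(name)

    def detach(self, name):
        # The hardware disappears; the name survives while references remain.
        self.gone.add(name)
        self._maybe_free(name)

    def _maybe_free(self, name):
        if name in self.gone and self.refs[name] == 0:
            del self.refs[name]
            self.gone.discard(name)


names = ToyDiskNames()
first = names.attach()    # disk appears
names.hold(first)         # a filesystem on it is mounted
names.detach(first)       # disk is unplugged, name is still pinned
second = names.attach()   # disk reappears under a different name
names.release(first)      # the old mount finally goes away
third = names.attach()    # only now can the old name be handed out again
```

In this model the replugged disk cannot get its old name back until the
stale mount is gone, which matches what the "btrfs fi show" output in
this thread looks like.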

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Question: raid1 behaviour on failure
  2016-04-21  6:23     ` Satoru Takeuchi
  2016-04-21 11:09       ` Austin S. Hemmelgarn
@ 2016-04-21 11:28       ` Henk Slager
  2016-04-21 17:27         ` Matthias Bodenbinder
       [not found]       ` <57188534.1070408@jp.fujitsu.com>
  2 siblings, 1 reply; 32+ messages in thread
From: Henk Slager @ 2016-04-21 11:28 UTC (permalink / raw)
  To: Satoru Takeuchi; +Cc: Matthias Bodenbinder, linux-btrfs

On Thu, Apr 21, 2016 at 8:23 AM, Satoru Takeuchi
<takeuchi_satoru@jp.fujitsu.com> wrote:
> On 2016/04/20 14:17, Matthias Bodenbinder wrote:
>>
>> On 18.04.2016 at 09:22, Qu Wenruo wrote:
>>>
>>> BTW, it would be better to post the dmesg for better debug.
>>
>>
>> So here we go. I did the same test again. Here is a full log of what I did.
>> It looks to me like a bug in btrfs.
>> Sequence of events:
>> 1. mount the raid1 (2 discs of different sizes)
>> 2. unplug the biggest drive (hotplug)
>> 3. try to copy something to the degraded raid1
>> 4. plug in the device again (hotplug)
>>
>> This scenario does not work. The disc array is NOT redundant! I can not
>> work with it while a drive is missing and I can not reattach the device so
>> that everything works again.
>>
>> The btrfs module crashes during the test.
>>
>> I am using LMDE2 with backports:
>> btrfs-tools 4.4-1~bpo8+1
>> linux-image-4.4.0-0.bpo.1-amd64
>>
>> Matthias
>>
>>
>> rakete - root - /root
>> 1# mount /mnt/raid1/
>>
>> Journal:
>>
>> Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): enabling auto
>> defrag
>> Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): disk space caching
>> is enabled
>> Apr 20 07:01:16 rakete kernel: BTRFS: has skinny extents
>>
>> rakete - root - /mnt/raid1
>> 3# ll
>> total 0
>> drwxrwxr-x 1 root root   36 Nov 14  2014 AfterShot2(64-bit)
>> drwxrwxr-x 1 root root 5082 Apr 17 09:06 etc
>> drwxr-xr-x 1 root root  108 Mar 24 07:31 var
>>
>> 4# btrfs fi show
>> Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>>         Total devices 3 FS bytes used 1.60GiB
>>         devid    1 size 698.64GiB used 3.03GiB path /dev/sdg
>>         devid    2 size 465.76GiB used 3.03GiB path /dev/sdh
>>         devid    3 size 232.88GiB used 0.00B path /dev/sdi
>>
>> ####
>> unplug device sdg:
>>
>> Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical block
>> 243826688, lost sync page write
>> Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating
>> journal superblock for sdf1-8.
>> Apr 20 07:03:05 rakete kernel: Aborting journal on device sdf1-8.
>> Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical block
>> 243826688, lost sync page write
>> Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating
>> journal superblock for sdf1-8.
>> Apr 20 07:03:05 rakete umount[16405]: umount: /mnt/raid1: target is busy
>> Apr 20 07:03:05 rakete umount[16405]: (In some cases useful info about
>> processes that
>> Apr 20 07:03:05 rakete umount[16405]: use the device is found by lsof(8)
>> or fuser(1).)
>> Apr 20 07:03:05 rakete systemd[1]: mnt-raid1.mount mount process exited,
>> code=exited status=32
>> Apr 20 07:03:05 rakete systemd[1]: Failed unmounting /mnt/raid1.
>> Apr 20 07:03:24 rakete kernel: usb 3-1: new SuperSpeed USB device number 3
>> using xhci_hcd
>> Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device found,
>> idVendor=152d, idProduct=0567
>> Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device strings: Mfr=10,
>> Product=11, SerialNumber=5
>> Apr 20 07:03:24 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
>> Apr 20 07:03:24 rakete kernel: usb 3-1: Manufacturer: JMicron
>> Apr 20 07:03:24 rakete kernel: usb 3-1: SerialNumber: 152D00539000
>> Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage
>> device detected
>> Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid
>> 152d pid 0567: 5000000
>> Apr 20 07:03:24 rakete kernel: scsi host9: usb-storage 3-1:1.0
>> Apr 20 07:03:24 rakete mtp-probe[16424]: checking bus 3, device 3:
>> "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
>> Apr 20 07:03:24 rakete mtp-probe[16424]: bus: 3, device: 3 was not an MTP
>> device
>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:0: Direct-Access     WDC WD20
>> 02FAEX-007BA0    0125 PQ: 0 ANSI: 6
>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:1: Direct-Access     WDC WD50
>> 01AALS-00L3B2    0125 PQ: 0 ANSI: 6
>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:2: Direct-Access     SAMSUNG
>> SP2504C          0125 PQ: 0 ANSI: 6
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: Attached scsi generic sg6 type
>> 0
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: Attached scsi generic sg7 type
>> 0
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] 3907029168 512-byte
>> logical blocks: (2.00 TB/1.82 TiB)
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Write Protect is off
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Mode Sense: 67 00 10 08
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: Attached scsi generic sg8 type
>> 0
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] 976773168 512-byte
>> logical blocks: (500 GB/466 GiB)
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] No Caching mode page
>> found
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Assuming drive cache:
>> write through
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Write Protect is off
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Mode Sense: 67 00 10 08
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] 488395055 512-byte
>> logical blocks: (250 GB/233 GiB)
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] No Caching mode page
>> found
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Assuming drive cache:
>> write through
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Write Protect is off
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Mode Sense: 67 00 10 08
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] No Caching mode page
>> found
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Assuming drive cache:
>> write through
>> Apr 20 07:03:25 rakete kernel:  sdf: sdf1
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Attached SCSI disk
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Attached SCSI disk
>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Attached SCSI disk
>> Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): recovery complete
>> Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): mounted filesystem with
>> ordered data mode. Opts: (null)
>> Apr 20 07:03:25 rakete udisksd[3671]: Error statting /dev/sdg: No such
>> file or directory
>>
>>
>> ####
>> 5# btrfs fi show
>> Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>>         Total devices 3 FS bytes used 1.60GiB
>>         devid    2 size 465.76GiB used 3.03GiB path /dev/sdj
>>         devid    3 size 232.88GiB used 0.00B path /dev/sdk
>>         *** Some devices missing
>> ####
>
>
> Here the names of *online* devices are changed
> (/dev/sdh => /dev/sdj, /dev/sdi => /dev/sdk) after just
> offlining a device (/dev/sdf). It's odd regardless of
> whether Btrfs works fine or not.
>
> Can anyone explain this behavior?

All 4 drives (WD20, WD75, WD50, SP2504C) are disconnected twice in
this test. What is on the WD20 is unclear to me, but the raid1 array is
{WD75, WD50, SP2504C}.
So the test as described by Matthias is not what actually happens.
In fact, the whole btrfs fs is 'disconnected on the lower layers of
the kernel', but there is no unmount.  You can see the SCSI addresses go
from 8?:0:0:x to 9:0:0:x to 10:0:0:x. In the 9:0:0:x state, the tools
then show 1 device missing (WD75), but in fact the whole fs state is
messed up. So, as Anand already indicated, it is a bad test, and this is
what one can expect from an unpatched 4.4.0 kernel. (I'm curious to know
how md raidX would handle this.)
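
The re-enumeration can be read straight out of the journal. A minimal
sketch in Python, assuming the journal lines are unwrapped (one message
per line, as in Matthias's original mail); the helper name
disks_per_host and the regular expression are mine and not part of any
tool:

```python
import re
from collections import defaultdict

# Matches e.g. "sd 9:0:0:1: [sdj] Attached SCSI disk" and captures the
# SCSI host number and the kernel disk name.
ATTACHED = re.compile(r"sd (\d+):\d+:\d+:\d+: \[(sd[a-z]+)\] Attached SCSI disk")


def disks_per_host(journal_lines):
    """Map each SCSI host number to the disk names attached under it."""
    hosts = defaultdict(list)
    for line in journal_lines:
        match = ATTACHED.search(line)
        if match:
            hosts[int(match.group(1))].append(match.group(2))
    return dict(hosts)


# Lines copied from the journal quoted in this thread.
SAMPLE = [
    "Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Attached SCSI disk",
    "Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Attached SCSI disk",
    "Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Attached SCSI disk",
    "Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Attached SCSI disk",
    "Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Attached SCSI disk",
    "Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Attached SCSI disk",
    "Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Attached SCSI disk",
]

for host, disks in sorted(disks_per_host(SAMPLE).items()):
    print(f"scsi host{host}: {', '.join(disks)}")
```

For these lines it lists three disks under host 9 and four under
host 10, so every disk in the enclosure was attached again each time,
not just the WD75.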

a) My best guess is that the 4 drives are in a USB-connected drive bay
and that Matthias unplugged the WD75 (so cut its power and SATA
connection), did the file copy trial and then plugged the WD75 back
into the drive bay. The (un)plug of a hard disk is then assumed to
trigger a USB link re-init by the chipset in the drive bay.

b) Another possibility is that (un)plugging the WD75 causes the host
USB chipset to re-init the USB link because of (too big?) changes in
electrical current. In that case there are likely separate USB cables
and maybe some SATA connections.

c) Or some flaw in the LMDE2 distribution in combination with btrfs. I
don't know what is in linux-image-4.4.0-0.bpo.1-amd64.

>> still mounted in rw mode:
>> /dev/sdg on /mnt/raid1 type btrfs
>> (rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
>> ####
>> 7# cp -r /root/ .
>> cp: cannot create directory './root': Input/output error
>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg
>> errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg
>> errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg
>> errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg
>> errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg
>> errs: wr 0, rd 5, flush 0, corrupt 0, gen 0
>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg
>> errs: wr 0, rd 6, flush 0, corrupt 0, gen 0
>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg
>> errs: wr 0, rd 7, flush 0, corrupt 0, gen 0
>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg
>> errs: wr 0, rd 8, flush 0, corrupt 0, gen 0
>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg
>> errs: wr 0, rd 9, flush 0, corrupt 0, gen 0
>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev /dev/sdg
>> errs: wr 0, rd 10, flush 0, corrupt 0, gen 0
>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): error reading
>> free space cache
>> Apr 20 07:05:37 rakete kernel: BTRFS warning (device sdi): failed to load
>> free space cache for block group 20497563648, rebuilding it now
>> Apr 20 07:05:37 rakete kernel: ------------[ cut here ]------------
>> Apr 20 07:05:37 rakete kernel: WARNING: CPU: 7 PID: 16738 at
>> /build/linux-H3jpF0/linux-4.4.6/fs/btrfs/ctree.c:1156
>> __btrfs_cow_block+0x56f/0x5e0 [btrfs]()
>> Apr 20 07:05:37 rakete kernel: BTRFS: Transaction aborted (error -5)
>> Apr 20 07:05:37 rakete kernel: Modules linked in: uas usb_storage pci_stub
>> vboxpci(O) vboxnetadp(O) vboxnetflt(O) binfmt_misc dvb_ttpci saa7146_vv
>> ttpci_eeprom saa7146 videobuf_dma_sg videobuf_core dvb_core v4l2_common
>> videodev media cfg80211 vboxdrv(O) cpufreq_powersave cpufreq_conservative
>> cpufreq_userspace cpufreq_stats snd_hda_codec_hdmi intel_rapl iosf_mbi
>> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
>> crct10dif_pclmul crc32_pclmul eeepc_wmi asus_wmi joydev sparse_keymap drbg
>> iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek rfkill ansi_cprng
>> snd_hda_codec_generic nvidia(PO) aesni_intel aes_x86_64 lrw gf128mul
>> snd_hda_intel glue_helper ablk_helper snd_hda_codec cryptd snd_hda_core
>> serio_raw pcspkr snd_hwdep snd_pcm i2c_i801 snd_timer snd lpc_ich soundcore
>> 8250_fintek mei_me shpchp mei
>> Apr 20 07:05:37 rakete kernel:  mfd_core battery tpm_tis tpm evdev
>> processor drm fuse ecryptfs cbc sha256_ssse3 sha256_generic hmac
>> encrypted_keys parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache jbd2
>> btrfs raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor
>> hid_generic usbhid hid raid6_pq libcrc32c crc32c_generic md_mod dm_mirror
>> dm_region_hash dm_log dm_mod sr_mod sg cdrom sd_mod ata_generic ahci libahci
>> pata_via xhci_pci ehci_pci crc32c_intel xhci_hcd ehci_hcd libata psmouse
>> scsi_mod atl1c usbcore usb_common fjes video wmi fan thermal button
>> Apr 20 07:05:37 rakete kernel: CPU: 7 PID: 16738 Comm: cp Tainted: P
>> O    4.4.0-0.bpo.1-amd64 #1 Debian 4.4.6-1~bpo8+1
>> Apr 20 07:05:37 rakete kernel: Hardware name: System manufacturer System
>> Product Name/P8H67-V, BIOS 3707 07/12/2013
>> Apr 20 07:05:37 rakete kernel:  0000000000000286 000000006a1407c8
>> ffffffff812ed425 ffff88016b6dfb90
>> Apr 20 07:05:37 rakete kernel:  ffffffffa03817b8 ffffffff81077ea1
>> ffff88018e7fcd30 ffff88016b6dfbe8
>> Apr 20 07:05:37 rakete kernel:  ffff88005d863e88 ffff8801cde7a980
>> ffff88018e7fce48 ffffffff81077f2c
>> Apr 20 07:05:37 rakete kernel: Call Trace:
>> Apr 20 07:05:37 rakete kernel:  [<ffffffff812ed425>] ?
>> dump_stack+0x5c/0x77
>> Apr 20 07:05:37 rakete kernel:  [<ffffffff81077ea1>] ?
>> warn_slowpath_common+0x81/0xb0
>> Apr 20 07:05:37 rakete kernel:  [<ffffffff81077f2c>] ?
>> warn_slowpath_fmt+0x5c/0x80
>> Apr 20 07:05:37 rakete kernel:  [<ffffffffa02d74af>] ?
>> __btrfs_cow_block+0x56f/0x5e0 [btrfs]
>> Apr 20 07:05:37 rakete kernel:  [<ffffffffa02d76af>] ?
>> btrfs_cow_block+0x10f/0x1d0 [btrfs]
>> Apr 20 07:05:37 rakete kernel:  [<ffffffffa02db2cd>] ?
>> btrfs_search_slot+0x1fd/0xa30 [btrfs]
>> Apr 20 07:05:37 rakete kernel:  [<ffffffffa02dd3f1>] ?
>> btrfs_insert_empty_items+0x71/0xc0 [btrfs]
>> Apr 20 07:05:37 rakete kernel:  [<ffffffff811f4d92>] ?
>> insert_inode_locked4+0xa2/0x1c0
>> Apr 20 07:05:37 rakete kernel:  [<ffffffffa030ee5d>] ?
>> btrfs_new_inode+0x1cd/0x590 [btrfs]
>> Apr 20 07:05:37 rakete kernel:  [<ffffffffa0310a77>] ?
>> btrfs_mkdir+0x107/0x1f0 [btrfs]
>> Apr 20 07:05:37 rakete kernel:  [<ffffffff811e80b0>] ?
>> vfs_mkdir+0xb0/0x140
>> Apr 20 07:05:37 rakete kernel:  [<ffffffff811e9d3e>] ?
>> SyS_mkdir+0xce/0x110
>> Apr 20 07:05:37 rakete kernel:  [<ffffffff81592736>] ?
>> system_call_fast_compare_end+0xc/0x6b
>> Apr 20 07:05:37 rakete kernel: ---[ end trace 025eb0e83ffed96f ]---
>> Apr 20 07:05:37 rakete kernel: BTRFS: error (device sdi) in
>> __btrfs_cow_block:1156: errno=-5 IO failure
>> Apr 20 07:05:37 rakete kernel: BTRFS info (device sdi): forced readonly
>>
>> ####
>> Try to copy again:
>> 11# cp -r /root/ .
>> cp: cannot create directory './root': Read-only file system
>> ####
>> /dev/sdg on /mnt/raid1 type btrfs
>> (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
>> ####
>> plugin device sdg again:
>>
>> Apr 20 07:07:39 rakete udisksd[3671]: Cleaning up mount point
>> /media/matthias/BACKUP (device 8:81 no longer exist)
>> Apr 20 07:07:39 rakete kernel: usb 3-1: USB disconnect, device number 3
>> Apr 20 07:07:39 rakete udisksd[3671]: Error statting /dev/sdg: No such
>> file or directory
>> Apr 20 07:07:39 rakete umount[16807]: umount: /mnt/raid1: target is busy
>> Apr 20 07:07:39 rakete umount[16807]: (In some cases useful info about
>> processes that
>> Apr 20 07:07:39 rakete umount[16807]: use the device is found by lsof(8)
>> or fuser(1).)
>> Apr 20 07:07:39 rakete systemd[1]: mnt-raid1.mount mount process exited,
>> code=exited status=32
>> Apr 20 07:07:39 rakete systemd[1]: Failed unmounting /mnt/raid1.
>> Apr 20 07:08:01 rakete kernel: usb 3-1: new SuperSpeed USB device number 4
>> using xhci_hcd
>> Apr 20 07:08:01 rakete kernel: usb 3-1: New USB device found,
>> idVendor=152d, idProduct=0567
>> Apr 20 07:08:01 rakete kernel: usb 3-1: New USB device strings: Mfr=10,
>> Product=11, SerialNumber=5
>> Apr 20 07:08:01 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
>> Apr 20 07:08:01 rakete kernel: usb 3-1: Manufacturer: JMicron
>> Apr 20 07:08:01 rakete kernel: usb 3-1: SerialNumber: 152D00539000
>> Apr 20 07:08:01 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage
>> device detected
>> Apr 20 07:08:01 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid
>> 152d pid 0567: 5000000
>> Apr 20 07:08:01 rakete kernel: scsi host10: usb-storage 3-1:1.0
>> Apr 20 07:08:01 rakete mtp-probe[16826]: checking bus 3, device 4:
>> "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
>> Apr 20 07:08:01 rakete mtp-probe[16826]: bus: 3, device: 4 was not an MTP
>> device
>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:0: Direct-Access     WDC WD20
>> 02FAEX-007BA0    0125 PQ: 0 ANSI: 6
>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:1: Direct-Access     WDC WD75
>> 00AACS-00C7B0    0125 PQ: 0 ANSI: 6
>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:2: Direct-Access     WDC WD50
>> 01AALS-00L3B2    0125 PQ: 0 ANSI: 6
>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:3: Direct-Access     SAMSUNG
>> SP2504C          0125 PQ: 0 ANSI: 6
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: Attached scsi generic sg6 type
>> 0
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] 3907029168 512-byte
>> logical blocks: (2.00 TB/1.82 TiB)
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Write Protect is off
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Mode Sense: 67 00 10 08
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: Attached scsi generic sg7 type
>> 0
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] No Caching mode page
>> found
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Assuming drive cache:
>> write through
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] 1465149168 512-byte
>> logical blocks: (750 GB/699 GiB)
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: Attached scsi generic sg8 type
>> 0
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Write Protect is off
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Mode Sense: 67 00 10 08
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: Attached scsi generic sg9 type
>> 0
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] 976773168 512-byte
>> logical blocks: (500 GB/466 GiB)
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] No Caching mode page
>> found
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Assuming drive cache:
>> write through
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Write Protect is off
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Mode Sense: 67 00 10 08
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] 488395055 512-byte
>> logical blocks: (250 GB/233 GiB)
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] No Caching mode page
>> found
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Assuming drive cache:
>> write through
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Write Protect is off
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Mode Sense: 67 00 10 08
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] No Caching mode page
>> found
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Assuming drive cache:
>> write through
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Attached SCSI disk
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Attached SCSI disk
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Attached SCSI disk
>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Attached SCSI disk
>> Apr 20 07:08:02 rakete kernel: EXT4-fs (sdf1): recovery complete
>> Apr 20 07:08:02 rakete kernel: EXT4-fs (sdf1): mounted filesystem with
>> ordered data mode. Opts: (null)
>>
>> ####
>> still ro mode
>> /dev/sdj on /mnt/raid1 type btrfs
>> (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
>> ####
>> 14# btrfs fi show
>> Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>>         Total devices 3 FS bytes used 1.60GiB
>>         devid    1 size 698.64GiB used 3.03GiB path /dev/sdj
>>         devid    2 size 465.76GiB used 3.03GiB path /dev/sdk
>>         devid    3 size 232.88GiB used 0.00B path /dev/sdl
>> ####
>>
>>
>>
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Question: raid1 behaviour on failure
       [not found]       ` <57188534.1070408@jp.fujitsu.com>
@ 2016-04-21 11:58         ` Qu Wenruo
  2016-04-22  2:21           ` Satoru Takeuchi
  0 siblings, 1 reply; 32+ messages in thread
From: Qu Wenruo @ 2016-04-21 11:58 UTC (permalink / raw)
  To: Satoru Takeuchi, Matthias Bodenbinder, linux-btrfs



On 04/21/2016 03:45 PM, Satoru Takeuchi wrote:
> On 2016/04/21 15:23, Satoru Takeuchi wrote:
>> On 2016/04/20 14:17, Matthias Bodenbinder wrote:
>>> On 18.04.2016 at 09:22, Qu Wenruo wrote:
>>>> BTW, it would be better to post the dmesg for better debug.
>>>
>>> So here we go. I did the same test again. Here is a full log of what I
>>> did. It seems to me like a bug in btrfs.
>>> Sequence of events:
>>> 1. mount the raid1 (2 discs with different sizes)
>>> 2. unplug the biggest drive (hotplug)
>>> 3. try to copy something to the degraded raid1
>>> 4. plug in the device again (hotplug)
>>>
>>> This scenario does not work. The disc array is NOT redundant! I
>>> cannot work with it while a drive is missing and I cannot reattach the
>>> device so that everything works again.
>>>
>>> The btrfs module crashes during the test.
>>>
>>> I am using LMDE2 with backports:
>>> btrfs-tools 4.4-1~bpo8+1
>>> linux-image-4.4.0-0.bpo.1-amd64
>>>
>>> Matthias
>>>
>>>
>>> rakete - root - /root
>>> 1# mount /mnt/raid1/
>>>
>>> Journal:
>>>
>>> Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): enabling auto
>>> defrag
>>> Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): disk space
>>> caching is enabled
>>> Apr 20 07:01:16 rakete kernel: BTRFS: has skinny extents
>>>
>>> rakete - root - /mnt/raid1
>>> 3# ll
>>> total 0
>>> drwxrwxr-x 1 root root   36 Nov 14  2014 AfterShot2(64-bit)
>>> drwxrwxr-x 1 root root 5082 Apr 17 09:06 etc
>>> drwxr-xr-x 1 root root  108 Mar 24 07:31 var
>>>
>>> 4# btrfs fi show
>>> Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>>>     Total devices 3 FS bytes used 1.60GiB
>>>     devid    1 size 698.64GiB used 3.03GiB path /dev/sdg
>>>     devid    2 size 465.76GiB used 3.03GiB path /dev/sdh
>>>     devid    3 size 232.88GiB used 0.00B path /dev/sdi
>>>
>>> ####
>>> unplug device sdg:
>>>
>>> Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical
>>> block 243826688, lost sync page write
>>> Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating
>>> journal superblock for sdf1-8.
>>> Apr 20 07:03:05 rakete kernel: Aborting journal on device sdf1-8.
>>> Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical
>>> block 243826688, lost sync page write
>>> Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating
>>> journal superblock for sdf1-8.
>>> Apr 20 07:03:05 rakete umount[16405]: umount: /mnt/raid1: target is busy
>>> Apr 20 07:03:05 rakete umount[16405]: (In some cases useful info
>>> about processes that
>>> Apr 20 07:03:05 rakete umount[16405]: use the device is found by
>>> lsof(8) or fuser(1).)
>>> Apr 20 07:03:05 rakete systemd[1]: mnt-raid1.mount mount process
>>> exited, code=exited status=32
>>> Apr 20 07:03:05 rakete systemd[1]: Failed unmounting /mnt/raid1.
>>> Apr 20 07:03:24 rakete kernel: usb 3-1: new SuperSpeed USB device
>>> number 3 using xhci_hcd
>>> Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device found,
>>> idVendor=152d, idProduct=0567
>>> Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device strings:
>>> Mfr=10, Product=11, SerialNumber=5
>>> Apr 20 07:03:24 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
>>> Apr 20 07:03:24 rakete kernel: usb 3-1: Manufacturer: JMicron
>>> Apr 20 07:03:24 rakete kernel: usb 3-1: SerialNumber: 152D00539000
>>> Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage
>>> device detected
>>> Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: Quirks match for
>>> vid 152d pid 0567: 5000000
>>> Apr 20 07:03:24 rakete kernel: scsi host9: usb-storage 3-1:1.0
>>> Apr 20 07:03:24 rakete mtp-probe[16424]: checking bus 3, device 3:
>>> "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
>>> Apr 20 07:03:24 rakete mtp-probe[16424]: bus: 3, device: 3 was not an
>>> MTP device
>>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:0: Direct-Access     WDC
>>> WD20 02FAEX-007BA0    0125 PQ: 0 ANSI: 6
>>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:1: Direct-Access     WDC
>>> WD50 01AALS-00L3B2    0125 PQ: 0 ANSI: 6
>>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:2: Direct-Access
>>> SAMSUNG  SP2504C          0125 PQ: 0 ANSI: 6
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: Attached scsi generic sg6
>>> type 0
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: Attached scsi generic sg7
>>> type 0
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] 3907029168 512-byte
>>> logical blocks: (2.00 TB/1.82 TiB)
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Write Protect is off
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Mode Sense: 67 00 10 08
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: Attached scsi generic sg8
>>> type 0
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] 976773168 512-byte
>>> logical blocks: (500 GB/466 GiB)
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] No Caching mode page
>>> found
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Assuming drive
>>> cache: write through
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Write Protect is off
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Mode Sense: 67 00 10 08
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] 488395055 512-byte
>>> logical blocks: (250 GB/233 GiB)
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] No Caching mode page
>>> found
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Assuming drive
>>> cache: write through
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Write Protect is off
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Mode Sense: 67 00 10 08
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] No Caching mode page
>>> found
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Assuming drive
>>> cache: write through
>>> Apr 20 07:03:25 rakete kernel:  sdf: sdf1
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Attached SCSI disk
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Attached SCSI disk
>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Attached SCSI disk
>>> Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): recovery complete
>>> Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): mounted filesystem
>>> with ordered data mode. Opts: (null)
>>> Apr 20 07:03:25 rakete udisksd[3671]: Error statting /dev/sdg: No
>>> such file or directory
>>>
>>>
>>> ####
>>> 5# btrfs fi show
>>> Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>>>     Total devices 3 FS bytes used 1.60GiB
>>>     devid    2 size 465.76GiB used 3.03GiB path /dev/sdj
>>>     devid    3 size 232.88GiB used 0.00B path /dev/sdk
>>>     *** Some devices missing
>>> ####
>>
>> Here the names of *online* devices are changed
>> (/dev/sdh => /dev/sdj, /dev/sdi => /dev/sdk) after just
>> offlining a device (/dev/sdg). It's odd regardless of
>> whether Btrfs works fine or not.
>>
>> Can anyone explain this behavior?
>
> FYI,
>
> I tried to reproduce this problem on a VM.
> Here the USB storage devices are /dev/sd{a,b,c}.
>
> Steps to reproduce:
>
>   1. create a fs on /dev/sd{a,b,c}
>   2. mount this fs
>   3. Surprise unplug /dev/sdc
>   4. Write to this fs till ENOSPC happens
>
> Then, although there are I/O errors about /dev/sdc,
> device names didn't change and ro remount didn't happen.
>
> command log:
> =================================
> # mkfs.btrfs -f -m raid1 -d raid1 /dev/sd{a,b,c}
> btrfs-progs v4.5.1-41-g8202204-dirty
> See http://btrfs.wiki.kernel.org for more information.
>
> Label:              (null)
> UUID:               16a54915-c807-42cf-8365-82c0780c5ab5
> Node size:          16384
> Sector size:        4096
> Filesystem size:    15.00GiB
> Block group profiles:
>    Data:             RAID1             1.01GiB
>    Metadata:         RAID1             1.01GiB
>    System:           RAID1            12.00MiB
> SSD detected:       no
> Incompat features:  extref, skinny-metadata
> Number of devices:  3
> Devices:
>     ID        SIZE  PATH
>      1     5.00GiB  /dev/sda
>      2     5.00GiB  /dev/sdb
>      3     5.00GiB  /dev/sdc
>
> # mount /dev/sda /scratch_mnt/
> # btrfs fi show /scratch_mnt/
> Label: none  uuid: 16a54915-c807-42cf-8365-82c0780c5ab5
>          Total devices 3 FS bytes used 640.00KiB
>          devid    1 size 5.00GiB used 2.00GiB path /dev/sda
>          devid    2 size 5.00GiB used 1.01GiB path /dev/sdb
>          devid    3 size 5.00GiB used 1.01GiB path /dev/sdc
>
> #
> # # *** surprise unplug happens here ***
> #
> # btrfs fi show /scratch_mnt/

Would you please post the output of "btrfs-debug-tree -t 3"?

My guess is that there is no raid1 stripe on device 3, so all
data/metadata allocation and COW happen without problems.
The "btrfs-debug-tree -t 3" output would verify that guess.

Thanks,
Qu
> Label: none  uuid: 16a54915-c807-42cf-8365-82c0780c5ab5
>          Total devices 3 FS bytes used 1.81GiB
>          devid    1 size 5.00GiB used 2.00GiB path /dev/sda
>          devid    2 size 5.00GiB used 2.01GiB path /dev/sdb
>          *** Some devices missing
>
> # cp -a linux /scratch_mnt/
> # cp -a linux /scratch_mnt/linux.2
> # cp -a linux /scratch_mnt/linux.3
> cp: error writing ‘/scratch_mnt/linux.3/drivers/scsi/lpfc/lpfc_els.c’:
> No space left on device
> ...
> # mount | grep scratch
> /dev/sda on /scratch_mnt type btrfs
> (rw,relatime,seclabel,space_cache,subvolid=5,subvol=/)
> # dmesg | tail
> [ 1400.778705] BTRFS warning (device sdc): lost page write due to IO
> error on /dev/sdc
> [ 1438.604796] btrfs_dev_stat_print_on_error: 174 callbacks suppressed
> [ 1438.604803] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125633,
> rd 1, flush 276, corrupt 0, gen 0
> [ 1438.609782] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125634,
> rd 1, flush 276, corrupt 0, gen 0
> [ 1438.613331] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125634,
> rd 1, flush 277, corrupt 0, gen 0
> [ 1438.669090] btrfs_end_buffer_write_sync: 52 callbacks suppressed
> [ 1438.669095] BTRFS warning (device sdc): lost page write due to IO
> error on /dev/sdc
> [ 1438.669098] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125635,
> rd 1, flush 277, corrupt 0, gen 0
> [ 1438.672621] BTRFS warning (device sdc): lost page write due to IO
> error on /dev/sdc
> [ 1438.672626] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125636,
> rd 1, flush 277, corrupt 0, gen 0
> =================================
>
> Thanks,
> Satoru
>
>>
>> Thanks,
>> Satoru
>>
>>> still mounted in rw mode:
>>> /dev/sdg on /mnt/raid1 type btrfs
>>> (rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
>>> ####
>>> 7# cp -r /root/ .
>>> cp: cannot create directory './root': Input/output error
>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>> /dev/sdg errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>> /dev/sdg errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>> /dev/sdg errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>> /dev/sdg errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>> /dev/sdg errs: wr 0, rd 5, flush 0, corrupt 0, gen 0
>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>> /dev/sdg errs: wr 0, rd 6, flush 0, corrupt 0, gen 0
>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>> /dev/sdg errs: wr 0, rd 7, flush 0, corrupt 0, gen 0
>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>> /dev/sdg errs: wr 0, rd 8, flush 0, corrupt 0, gen 0
>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>> /dev/sdg errs: wr 0, rd 9, flush 0, corrupt 0, gen 0
>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>> /dev/sdg errs: wr 0, rd 10, flush 0, corrupt 0, gen 0
>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): error
>>> reading free space cache
>>> Apr 20 07:05:37 rakete kernel: BTRFS warning (device sdi): failed to
>>> load free space cache for block group 20497563648, rebuilding it now
>>> Apr 20 07:05:37 rakete kernel: ------------[ cut here ]------------
>>> Apr 20 07:05:37 rakete kernel: WARNING: CPU: 7 PID: 16738 at
>>> /build/linux-H3jpF0/linux-4.4.6/fs/btrfs/ctree.c:1156
>>> __btrfs_cow_block+0x56f/0x5e0 [btrfs]()
>>> Apr 20 07:05:37 rakete kernel: BTRFS: Transaction aborted (error -5)
>>> Apr 20 07:05:37 rakete kernel: Modules linked in: uas usb_storage
>>> pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) binfmt_misc dvb_ttpci
>>> saa7146_vv ttpci_eeprom saa7146 videobuf_dma_sg videobuf_core
>>> dvb_core v4l2_common videodev media cfg80211 vboxdrv(O)
>>> cpufreq_powersave cpufreq_conservative cpufreq_userspace
>>> cpufreq_stats snd_hda_codec_hdmi intel_rapl iosf_mbi
>>> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
>>> irqbypass crct10dif_pclmul crc32_pclmul eeepc_wmi asus_wmi joydev
>>> sparse_keymap drbg iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek
>>> rfkill ansi_cprng snd_hda_codec_generic nvidia(PO) aesni_intel
>>> aes_x86_64 lrw gf128mul snd_hda_intel glue_helper ablk_helper
>>> snd_hda_codec cryptd snd_hda_core serio_raw pcspkr snd_hwdep snd_pcm
>>> i2c_i801 snd_timer snd lpc_ich soundcore 8250_fintek mei_me shpchp mei
>>> Apr 20 07:05:37 rakete kernel:  mfd_core battery tpm_tis tpm evdev
>>> processor drm fuse ecryptfs cbc sha256_ssse3 sha256_generic hmac
>>> encrypted_keys parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache
>>> jbd2 btrfs raid456 async_raid6_recov async_memcpy async_pq async_xor
>>> async_tx xor hid_generic usbhid hid raid6_pq libcrc32c crc32c_generic
>>> md_mod dm_mirror dm_region_hash dm_log dm_mod sr_mod sg cdrom sd_mod
>>> ata_generic ahci libahci pata_via xhci_pci ehci_pci crc32c_intel
>>> xhci_hcd ehci_hcd libata psmouse scsi_mod atl1c usbcore usb_common
>>> fjes video wmi fan thermal button
>>> Apr 20 07:05:37 rakete kernel: CPU: 7 PID: 16738 Comm: cp Tainted:
>>> P           O    4.4.0-0.bpo.1-amd64 #1 Debian 4.4.6-1~bpo8+1
>>> Apr 20 07:05:37 rakete kernel: Hardware name: System manufacturer
>>> System Product Name/P8H67-V, BIOS 3707 07/12/2013
>>> Apr 20 07:05:37 rakete kernel:  0000000000000286 000000006a1407c8
>>> ffffffff812ed425 ffff88016b6dfb90
>>> Apr 20 07:05:37 rakete kernel:  ffffffffa03817b8 ffffffff81077ea1
>>> ffff88018e7fcd30 ffff88016b6dfbe8
>>> Apr 20 07:05:37 rakete kernel:  ffff88005d863e88 ffff8801cde7a980
>>> ffff88018e7fce48 ffffffff81077f2c
>>> Apr 20 07:05:37 rakete kernel: Call Trace:
>>> Apr 20 07:05:37 rakete kernel:  [<ffffffff812ed425>] ?
>>> dump_stack+0x5c/0x77
>>> Apr 20 07:05:37 rakete kernel:  [<ffffffff81077ea1>] ?
>>> warn_slowpath_common+0x81/0xb0
>>> Apr 20 07:05:37 rakete kernel:  [<ffffffff81077f2c>] ?
>>> warn_slowpath_fmt+0x5c/0x80
>>> Apr 20 07:05:37 rakete kernel:  [<ffffffffa02d74af>] ?
>>> __btrfs_cow_block+0x56f/0x5e0 [btrfs]
>>> Apr 20 07:05:37 rakete kernel:  [<ffffffffa02d76af>] ?
>>> btrfs_cow_block+0x10f/0x1d0 [btrfs]
>>> Apr 20 07:05:37 rakete kernel:  [<ffffffffa02db2cd>] ?
>>> btrfs_search_slot+0x1fd/0xa30 [btrfs]
>>> Apr 20 07:05:37 rakete kernel:  [<ffffffffa02dd3f1>] ?
>>> btrfs_insert_empty_items+0x71/0xc0 [btrfs]
>>> Apr 20 07:05:37 rakete kernel:  [<ffffffff811f4d92>] ?
>>> insert_inode_locked4+0xa2/0x1c0
>>> Apr 20 07:05:37 rakete kernel:  [<ffffffffa030ee5d>] ?
>>> btrfs_new_inode+0x1cd/0x590 [btrfs]
>>> Apr 20 07:05:37 rakete kernel:  [<ffffffffa0310a77>] ?
>>> btrfs_mkdir+0x107/0x1f0 [btrfs]
>>> Apr 20 07:05:37 rakete kernel:  [<ffffffff811e80b0>] ?
>>> vfs_mkdir+0xb0/0x140
>>> Apr 20 07:05:37 rakete kernel:  [<ffffffff811e9d3e>] ?
>>> SyS_mkdir+0xce/0x110
>>> Apr 20 07:05:37 rakete kernel:  [<ffffffff81592736>] ?
>>> system_call_fast_compare_end+0xc/0x6b
>>> Apr 20 07:05:37 rakete kernel: ---[ end trace 025eb0e83ffed96f ]---
>>> Apr 20 07:05:37 rakete kernel: BTRFS: error (device sdi) in
>>> __btrfs_cow_block:1156: errno=-5 IO failure
>>> Apr 20 07:05:37 rakete kernel: BTRFS info (device sdi): forced readonly
>>>
>>> ####
>>> Try to copy again:
>>> 11# cp -r /root/ .
>>> cp: cannot create directory './root': Read-only file system
>>> ####
>>> /dev/sdg on /mnt/raid1 type btrfs
>>> (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
>>> ####
>>> plugin device sdg again:
>>>
>>> Apr 20 07:07:39 rakete udisksd[3671]: Cleaning up mount point
>>> /media/matthias/BACKUP (device 8:81 no longer exist)
>>> Apr 20 07:07:39 rakete kernel: usb 3-1: USB disconnect, device number 3
>>> Apr 20 07:07:39 rakete udisksd[3671]: Error statting /dev/sdg: No
>>> such file or directory
>>> Apr 20 07:07:39 rakete umount[16807]: umount: /mnt/raid1: target is busy
>>> Apr 20 07:07:39 rakete umount[16807]: (In some cases useful info
>>> about processes that
>>> Apr 20 07:07:39 rakete umount[16807]: use the device is found by
>>> lsof(8) or fuser(1).)
>>> Apr 20 07:07:39 rakete systemd[1]: mnt-raid1.mount mount process
>>> exited, code=exited status=32
>>> Apr 20 07:07:39 rakete systemd[1]: Failed unmounting /mnt/raid1.
>>> Apr 20 07:08:01 rakete kernel: usb 3-1: new SuperSpeed USB device
>>> number 4 using xhci_hcd
>>> Apr 20 07:08:01 rakete kernel: usb 3-1: New USB device found,
>>> idVendor=152d, idProduct=0567
>>> Apr 20 07:08:01 rakete kernel: usb 3-1: New USB device strings:
>>> Mfr=10, Product=11, SerialNumber=5
>>> Apr 20 07:08:01 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
>>> Apr 20 07:08:01 rakete kernel: usb 3-1: Manufacturer: JMicron
>>> Apr 20 07:08:01 rakete kernel: usb 3-1: SerialNumber: 152D00539000
>>> Apr 20 07:08:01 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage
>>> device detected
>>> Apr 20 07:08:01 rakete kernel: usb-storage 3-1:1.0: Quirks match for
>>> vid 152d pid 0567: 5000000
>>> Apr 20 07:08:01 rakete kernel: scsi host10: usb-storage 3-1:1.0
>>> Apr 20 07:08:01 rakete mtp-probe[16826]: checking bus 3, device 4:
>>> "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
>>> Apr 20 07:08:01 rakete mtp-probe[16826]: bus: 3, device: 4 was not an
>>> MTP device
>>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:0: Direct-Access     WDC
>>> WD20 02FAEX-007BA0    0125 PQ: 0 ANSI: 6
>>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:1: Direct-Access     WDC
>>> WD75 00AACS-00C7B0    0125 PQ: 0 ANSI: 6
>>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:2: Direct-Access     WDC
>>> WD50 01AALS-00L3B2    0125 PQ: 0 ANSI: 6
>>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:3: Direct-Access
>>> SAMSUNG  SP2504C          0125 PQ: 0 ANSI: 6
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: Attached scsi generic sg6
>>> type 0
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] 3907029168 512-byte
>>> logical blocks: (2.00 TB/1.82 TiB)
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Write Protect is off
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Mode Sense: 67 00
>>> 10 08
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: Attached scsi generic sg7
>>> type 0
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] No Caching mode
>>> page found
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Assuming drive
>>> cache: write through
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] 1465149168 512-byte
>>> logical blocks: (750 GB/699 GiB)
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: Attached scsi generic sg8
>>> type 0
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Write Protect is off
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Mode Sense: 67 00
>>> 10 08
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: Attached scsi generic sg9
>>> type 0
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] 976773168 512-byte
>>> logical blocks: (500 GB/466 GiB)
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] No Caching mode
>>> page found
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Assuming drive
>>> cache: write through
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Write Protect is off
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Mode Sense: 67 00
>>> 10 08
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] 488395055 512-byte
>>> logical blocks: (250 GB/233 GiB)
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] No Caching mode
>>> page found
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Assuming drive
>>> cache: write through
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Write Protect is off
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Mode Sense: 67 00
>>> 10 08
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] No Caching mode
>>> page found
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Assuming drive
>>> cache: write through
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Attached SCSI disk
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Attached SCSI disk
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Attached SCSI disk
>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Attached SCSI disk
>>> Apr 20 07:08:02 rakete kernel: EXT4-fs (sdf1): recovery complete
>>> Apr 20 07:08:02 rakete kernel: EXT4-fs (sdf1): mounted filesystem
>>> with ordered data mode. Opts: (null)
>>>
>>> ####
>>> still ro mode
>>> /dev/sdj on /mnt/raid1 type btrfs
>>> (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
>>> ####
>>> 14# btrfs fi show
>>> Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>>>     Total devices 3 FS bytes used 1.60GiB
>>>     devid    1 size 698.64GiB used 3.03GiB path /dev/sdj
>>>     devid    2 size 465.76GiB used 3.03GiB path /dev/sdk
>>>     devid    3 size 232.88GiB used 0.00B path /dev/sdl
>>> ####
>>>
>>>
>>>
>>>
>>>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Question: raid1 behaviour on failure
  2016-04-21 11:28       ` Henk Slager
@ 2016-04-21 17:27         ` Matthias Bodenbinder
  2016-04-26 16:19           ` Henk Slager
  0 siblings, 1 reply; 32+ messages in thread
From: Matthias Bodenbinder @ 2016-04-21 17:27 UTC (permalink / raw)
  To: linux-btrfs

On 21.04.2016 at 13:28, Henk Slager wrote:
>> Can anyone explain this behavior?
> 
> All 4 drives (WD20, WD75, WD50, SP2504C) get a disconnect twice in
> this test. What is on WD20 is unclear to me, but the raid1 array is
> {WD75, WD50, SP2504C}
> So the test as described by Matthias is not what actually happens.
> In fact, the whole btrfs fs is 'disconnected on the lower layers of
> the kernel' but there is no unmount.  You can see the scsi items go
> from 8?.0.0.x to
> 9.0.0.x to 10.0.0.x. In the 9.0.0.x state, the tools show then 1 dev
> missing (WD75), but in fact the whole fs state is messed up. So as
> indicated by Anand already, it is a bad test and it is what one can
> expect from an unpatched 4.4.0 kernel. ( I'm curious to know how md
> raidX would handle this ).
> 
> a) My best guess is that the 4 drives are in a USB connected drivebay
> and that Matthias unplugged WD75 (so cut its power and SATA
> connection), did the file copy trial and then plugged in the WD75
> again into the drivebay. The (un)plug of a harddisk is then assumed to
> trigger a USB link re-init by the chipset in the drivebay.
> 
> b) Another possibility is that the (un)plug of WD75 causes the host
> USB chipset to re-init the USB link due to (too big?) changes in
> electrical current. And likely separate USB cables and maybe some
> SATA.
> 
> c) Or some flaw in the LMDE2 distribution in combination with btrfs. I
> don't know what is in the linux-image-4.4.0-0.bpo.1-amd64
> 

Just to clarify my setup: my HDs are mounted in a FANTEC QB-35US3-6G case. According to the manual it has "Hot-Plug for  USB / eSATA interface".

It is equipped with 4 HDs. 3 of them are part of the raid1. The fourth HD is a 2 TB device with ext4 filesystem and no relevance for this thread.
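
The re-enumeration Henk describes (SCSI host 9, then host 10, with every disk in the case re-attached each time) can be checked mechanically. Below is a minimal sketch in Python. It assumes the kernel journal lines are available unwrapped, one message per line (for example from "journalctl -k"); the function name usb_enumerations and the SAMPLE list are illustrative and not part of any tool.

```python
import re

# "scsi host9: usb-storage 3-1:1.0" marks a fresh enumeration of the USB bridge.
HOST_RE = re.compile(r"scsi host(\d+): usb-storage")
# "sd 9:0:0:1: [sdj] Attached SCSI disk" ties a disk name to a SCSI host number.
DISK_RE = re.compile(r"sd (\d+):\d+:\d+:\d+: \[(sd[a-z]+)\] Attached SCSI disk")


def usb_enumerations(journal_lines):
    """Map each usb-storage SCSI host number to the disks attached under it."""
    hosts = {}
    for line in journal_lines:
        match = HOST_RE.search(line)
        if match:
            hosts.setdefault(int(match.group(1)), [])
            continue
        match = DISK_RE.search(line)
        if match:
            hosts.setdefault(int(match.group(1)), []).append(match.group(2))
    return hosts


# Lines taken from the journal posted in this thread, unwrapped.
SAMPLE = [
    "Apr 20 07:03:24 rakete kernel: scsi host9: usb-storage 3-1:1.0",
    "Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Attached SCSI disk",
    "Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Attached SCSI disk",
    "Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Attached SCSI disk",
    "Apr 20 07:08:01 rakete kernel: scsi host10: usb-storage 3-1:1.0",
    "Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Attached SCSI disk",
    "Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Attached SCSI disk",
    "Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Attached SCSI disk",
    "Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Attached SCSI disk",
]

for host, disks in usb_enumerations(SAMPLE).items():
    print(f"scsi host{host}: {', '.join(disks)}")
```

If each hot-plug event shows up as a new host number that carries all disks of the case, the whole bridge was reset and not just the one drive.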

Matthias


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Question: raid1 behaviour on failure
  2016-04-21  5:43         ` Qu Wenruo
  2016-04-21  6:02           ` Liu Bo
@ 2016-04-21 17:40           ` Matthias Bodenbinder
  2016-04-22  6:02             ` Qu Wenruo
  1 sibling, 1 reply; 32+ messages in thread
From: Matthias Bodenbinder @ 2016-04-21 17:40 UTC (permalink / raw)
  To: linux-btrfs

On 21.04.2016 at 07:43, Qu Wenruo wrote:
> There are already unmerged patches which will partly do the mdadm level behavior, like automatically change to degraded mode without making the fs RO.
> 
> The original patchset:
> http://comments.gmane.org/gmane.comp.file-systems.btrfs/48335

The description of this patch says:

"Although the one-size-fit-all solution is quite safe, it's too strict if
data and metadata has different duplication level." 
...
"This patchset will introduce a new per-chunk degradable check for btrfs,
allow above case to succeed, and it's quite small anyway."
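
To make the quoted distinction concrete, here is a toy model in Python of a filesystem-wide check versus a per-chunk check. It is not the kernel patch: the function names, the example layouts, and the representation of a chunk as a (profile, devids) pair are assumptions made for illustration. The per-profile tolerances are the usual btrfs ones (raid1 survives one missing device, single survives none).

```python
# Number of missing devices each block group profile can tolerate.
TOLERATED = {
    "single": 0, "dup": 0, "raid0": 0,
    "raid1": 1, "raid10": 1, "raid5": 1, "raid6": 2,
}


def global_check(chunks, missing):
    """Filesystem-wide check: the weakest profile in use decides for all chunks."""
    weakest = min(TOLERATED[profile] for profile, _ in chunks)
    return len(missing) <= weakest


def per_chunk_check(chunks, missing):
    """Per-chunk check: a chunk only counts missing devices it has stripes on."""
    return all(
        len(missing & set(devids)) <= TOLERATED[profile]
        for profile, devids in chunks
    )


# Hypothetical layout: raid1 metadata on devid 1+2, single data on devid 1 only.
mixed = [("raid1", (1, 2)), ("single", (1,))]
print(global_check(mixed, {2}), per_chunk_check(mixed, {2}))

# Hypothetical layout with "-m raid1 -d raid1": every chunk is raid1.
all_raid1 = [("raid1", (1, 2)), ("raid1", (2, 3))]
print(global_check(all_raid1, {1}), per_chunk_check(all_raid1, {1}))
```

In this model the mixed layout fails the filesystem-wide check but passes the per-chunk check when devid 2 is missing, while the all-raid1 layout passes both checks with one device missing, so the two checks only differ when the profiles differ.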


My raid1 is "-m raid1 -d raid1". Both have the same duplication level. Would that patch make any difference?

And: What do I need to do to test this in "debian stable"? I am not a programmer - but I know how to use git and how to compile with proper configuration directions.

Matthias


> Or the latest patchset inside Anand Jain's auto-replace patchset:
> http://thread.gmane.org/gmane.comp.file-systems.btrfs/55446
> 
> Thanks,
> Qu



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Question: raid1 behaviour on failure
  2016-04-21 11:58         ` Qu Wenruo
@ 2016-04-22  2:21           ` Satoru Takeuchi
  2016-04-22  5:32             ` Qu Wenruo
  0 siblings, 1 reply; 32+ messages in thread
From: Satoru Takeuchi @ 2016-04-22  2:21 UTC (permalink / raw)
  To: Qu Wenruo, Matthias Bodenbinder, linux-btrfs

On 2016/04/21 20:58, Qu Wenruo wrote:
>
>
> On 04/21/2016 03:45 PM, Satoru Takeuchi wrote:
>> On 2016/04/21 15:23, Satoru Takeuchi wrote:
>>> On 2016/04/20 14:17, Matthias Bodenbinder wrote:
>>>> On 18.04.2016 at 09:22, Qu Wenruo wrote:
>>>>> BTW, it would be better to post the dmesg for better debug.
>>>>
>>>> So here we go. I did the same test again. Here is a full log of what I
>>>> did. It seems to me like a bug in btrfs.
>>>> Sequence of events:
>>>> 1. mount the raid1 (2 discs with different sizes)
>>>> 2. unplug the biggest drive (hotplug)
>>>> 3. try to copy something to the degraded raid1
>>>> 4. plug in the device again (hotplug)
>>>>
>>>> This scenario does not work. The disc array is NOT redundant! I
>>>> cannot work with it while a drive is missing and I cannot reattach the
>>>> device so that everything works again.
>>>>
>>>> The btrfs module crashes during the test.
>>>>
>>>> I am using LMDE2 with backports:
>>>> btrfs-tools 4.4-1~bpo8+1
>>>> linux-image-4.4.0-0.bpo.1-amd64
>>>>
>>>> Matthias
>>>>
>>>>
>>>> rakete - root - /root
>>>> 1# mount /mnt/raid1/
>>>>
>>>> Journal:
>>>>
>>>> Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): enabling auto
>>>> defrag
>>>> Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): disk space
>>>> caching is enabled
>>>> Apr 20 07:01:16 rakete kernel: BTRFS: has skinny extents
>>>>
>>>> rakete - root - /mnt/raid1
>>>> 3# ll
>>>> total 0
>>>> drwxrwxr-x 1 root root   36 Nov 14  2014 AfterShot2(64-bit)
>>>> drwxrwxr-x 1 root root 5082 Apr 17 09:06 etc
>>>> drwxr-xr-x 1 root root  108 Mar 24 07:31 var
>>>>
>>>> 4# btrfs fi show
>>>> Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>>>>     Total devices 3 FS bytes used 1.60GiB
>>>>     devid    1 size 698.64GiB used 3.03GiB path /dev/sdg
>>>>     devid    2 size 465.76GiB used 3.03GiB path /dev/sdh
>>>>     devid    3 size 232.88GiB used 0.00B path /dev/sdi
>>>>
>>>> ####
>>>> unplug device sdg:
>>>>
>>>> Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical
>>>> block 243826688, lost sync page write
>>>> Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating
>>>> journal superblock for sdf1-8.
>>>> Apr 20 07:03:05 rakete kernel: Aborting journal on device sdf1-8.
>>>> Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical
>>>> block 243826688, lost sync page write
>>>> Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating
>>>> journal superblock for sdf1-8.
>>>> Apr 20 07:03:05 rakete umount[16405]: umount: /mnt/raid1: target is busy
>>>> Apr 20 07:03:05 rakete umount[16405]: (In some cases useful info
>>>> about processes that
>>>> Apr 20 07:03:05 rakete umount[16405]: use the device is found by
>>>> lsof(8) or fuser(1).)
>>>> Apr 20 07:03:05 rakete systemd[1]: mnt-raid1.mount mount process
>>>> exited, code=exited status=32
>>>> Apr 20 07:03:05 rakete systemd[1]: Failed unmounting /mnt/raid1.
>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: new SuperSpeed USB device
>>>> number 3 using xhci_hcd
>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device found,
>>>> idVendor=152d, idProduct=0567
>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device strings:
>>>> Mfr=10, Product=11, SerialNumber=5
>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: Manufacturer: JMicron
>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: SerialNumber: 152D00539000
>>>> Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage
>>>> device detected
>>>> Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: Quirks match for
>>>> vid 152d pid 0567: 5000000
>>>> Apr 20 07:03:24 rakete kernel: scsi host9: usb-storage 3-1:1.0
>>>> Apr 20 07:03:24 rakete mtp-probe[16424]: checking bus 3, device 3:
>>>> "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
>>>> Apr 20 07:03:24 rakete mtp-probe[16424]: bus: 3, device: 3 was not an
>>>> MTP device
>>>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:0: Direct-Access     WDC
>>>> WD20 02FAEX-007BA0    0125 PQ: 0 ANSI: 6
>>>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:1: Direct-Access     WDC
>>>> WD50 01AALS-00L3B2    0125 PQ: 0 ANSI: 6
>>>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:2: Direct-Access
>>>> SAMSUNG  SP2504C          0125 PQ: 0 ANSI: 6
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: Attached scsi generic sg6
>>>> type 0
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: Attached scsi generic sg7
>>>> type 0
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] 3907029168 512-byte
>>>> logical blocks: (2.00 TB/1.82 TiB)
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Write Protect is off
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Mode Sense: 67 00 10 08
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: Attached scsi generic sg8
>>>> type 0
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] 976773168 512-byte
>>>> logical blocks: (500 GB/466 GiB)
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] No Caching mode page
>>>> found
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Assuming drive
>>>> cache: write through
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Write Protect is off
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Mode Sense: 67 00 10 08
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] 488395055 512-byte
>>>> logical blocks: (250 GB/233 GiB)
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] No Caching mode page
>>>> found
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Assuming drive
>>>> cache: write through
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Write Protect is off
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Mode Sense: 67 00 10 08
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] No Caching mode page
>>>> found
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Assuming drive
>>>> cache: write through
>>>> Apr 20 07:03:25 rakete kernel:  sdf: sdf1
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Attached SCSI disk
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Attached SCSI disk
>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Attached SCSI disk
>>>> Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): recovery complete
>>>> Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): mounted filesystem
>>>> with ordered data mode. Opts: (null)
>>>> Apr 20 07:03:25 rakete udisksd[3671]: Error statting /dev/sdg: No
>>>> such file or directory
>>>>
>>>>
>>>> ####
>>>> 5# btrfs fi show
>>>> Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>>>>     Total devices 3 FS bytes used 1.60GiB
>>>>     devid    2 size 465.76GiB used 3.03GiB path /dev/sdj
>>>>     devid    3 size 232.88GiB used 0.00B path /dev/sdk
>>>>     *** Some devices missing
>>>> ####
>>>
>>> Here the names of *online* devices are changed
>>> (/dev/sdh => /dev/sdj, /dev/sdi => /dev/sdk) after just
>>> offlining a device (/dev/sdg). It's odd regardless of
>>> whether Btrfs works fine or not.
>>>
>>> Can anyone explain this behavior?
>>
>> FYI,
>>
>> I tried to reproduce this problem on VM.
>> Here USB storages are /dev/sd{a,b,c}.
>>
>> Steps to reproduce:
>>
>>   1. create a fs on /dev/sd{a,b,c}
>>   2. mount this fs
>>   3. Surprise unplug /dev/sdc
>>   4. Write to this fs till ENOSPC happens
>>
>> Then, although there are I/O errors about /dev/sdc,
>> device names didn't change and ro remount didn't happen.
>>
>> command log:
>> =================================
>> # mkfs.btrfs -f -m raid1 -d raid1 /dev/sd{a,b,c}
>> btrfs-progs v4.5.1-41-g8202204-dirty
>> See http://btrfs.wiki.kernel.org for more information.
>>
>> Label:              (null)
>> UUID:               16a54915-c807-42cf-8365-82c0780c5ab5
>> Node size:          16384
>> Sector size:        4096
>> Filesystem size:    15.00GiB
>> Block group profiles:
>>    Data:             RAID1             1.01GiB
>>    Metadata:         RAID1             1.01GiB
>>    System:           RAID1            12.00MiB
>> SSD detected:       no
>> Incompat features:  extref, skinny-metadata
>> Number of devices:  3
>> Devices:
>>     ID        SIZE  PATH
>>      1     5.00GiB  /dev/sda
>>      2     5.00GiB  /dev/sdb
>>      3     5.00GiB  /dev/sdc
>>
>> # mount /dev/sda /scratch_mnt/
>> # btrfs fi show /scratch_mnt/
>> Label: none  uuid: 16a54915-c807-42cf-8365-82c0780c5ab5
>>          Total devices 3 FS bytes used 640.00KiB
>>          devid    1 size 5.00GiB used 2.00GiB path /dev/sda
>>          devid    2 size 5.00GiB used 1.01GiB path /dev/sdb
>>          devid    3 size 5.00GiB used 1.01GiB path /dev/sdc
>>
>> #
>> # # *** surprise unplug happens here ***
>> #
>> # btrfs fi show /scratch_mnt/
>
> Would you please post the output of "btrfs-debug-tree -t 3"?
>
> My guess is that there is no raid1 stripe on device 3, so all data/metadata allocation and CoW proceed without problems.
> "btrfs-debug-tree -t 3" output would verify my guess.

OK, here it is.

btrfs-debug-tree -t 3 before cp:
===========================
btrfs-progs v4.5.1-41-g8202204-dirty
chunk tree
leaf 20987904 items 6 free space 15503 generation 5 owner 3
fs uuid 30771a06-e6a8-4cbc-a094-893049fa5060
chunk uuid 2325f1b9-1bf0-4247-8c29-7b179eabf1b2
	item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 16185 itemsize 98
		dev item devid 1 total_bytes 5368709120 bytes used 2147483648
		dev uuid 06bc0993-39d3-4d9a-b484-760ae2150c3a
	item 1 key (DEV_ITEMS DEV_ITEM 2) itemoff 16087 itemsize 98
		dev item devid 2 total_bytes 5368709120 bytes used 1082130432
		dev uuid 3868895f-295b-4a89-a01c-ad0f1c5ac758
	item 2 key (DEV_ITEMS DEV_ITEM 3) itemoff 15989 itemsize 98
		dev item devid 3 total_bytes 5368709120 bytes used 1082130432
		dev uuid 911e8702-9428-4b8e-bc6d-d212e909a1ef
	item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 20971520) itemoff 15877 itemsize 112
		chunk length 8388608 owner 2 stripe_len 65536
		type SYSTEM|RAID1 num_stripes 2
			stripe 0 devid 3 offset 1048576
			dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
			stripe 1 devid 2 offset 1048576
			dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
	item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 29360128) itemoff 15765 itemsize 112
		chunk length 1073741824 owner 2 stripe_len 65536
		type METADATA|RAID1 num_stripes 2
			stripe 0 devid 1 offset 20971520
			dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
			stripe 1 devid 3 offset 9437184
			dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
	item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 1103101952) itemoff 15653 itemsize 112
		chunk length 1073741824 owner 2 stripe_len 65536
		type DATA|RAID1 num_stripes 2
			stripe 0 devid 2 offset 9437184
			dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
			stripe 1 devid 1 offset 1094713344
			dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
total bytes 16106127360
bytes used 114688
uuid 30771a06-e6a8-4cbc-a094-893049fa5060
===========================



At this point I hot-unplugged devid 2 (/dev/sdb).



btrfs-debug-tree -t 3 after cp (which causes ENOSPC):
===========================
btrfs-progs v4.5.1-41-g8202204-dirty
warning, device 2 is missing
chunk tree
leaf 20987904 items 11 free space 14818 generation 9 owner 3
fs uuid 30771a06-e6a8-4cbc-a094-893049fa5060
chunk uuid 2325f1b9-1bf0-4247-8c29-7b179eabf1b2
	item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 16185 itemsize 98
		dev item devid 1 total_bytes 5368709120 bytes used 4294967296
		dev uuid 06bc0993-39d3-4d9a-b484-760ae2150c3a
	item 1 key (DEV_ITEMS DEV_ITEM 2) itemoff 16087 itemsize 98
		dev item devid 2 total_bytes 5368709120 bytes used 5367660544
		dev uuid 3868895f-295b-4a89-a01c-ad0f1c5ac758
	item 2 key (DEV_ITEMS DEV_ITEM 3) itemoff 15989 itemsize 98
		dev item devid 3 total_bytes 5368709120 bytes used 5367660544
		dev uuid 911e8702-9428-4b8e-bc6d-d212e909a1ef
	item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 20971520) itemoff 15877 itemsize 112
		chunk length 8388608 owner 2 stripe_len 65536
		type SYSTEM|RAID1 num_stripes 2
			stripe 0 devid 3 offset 1048576
			dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
			stripe 1 devid 2 offset 1048576
			dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
	item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 29360128) itemoff 15765 itemsize 112
		chunk length 1073741824 owner 2 stripe_len 65536
		type METADATA|RAID1 num_stripes 2
			stripe 0 devid 1 offset 20971520
			dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
			stripe 1 devid 3 offset 9437184
			dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
	item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 1103101952) itemoff 15653 itemsize 112
		chunk length 1073741824 owner 2 stripe_len 65536
		type DATA|RAID1 num_stripes 2
			stripe 0 devid 2 offset 9437184
			dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
			stripe 1 devid 1 offset 1094713344
			dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
	item 6 key (FIRST_CHUNK_TREE CHUNK_ITEM 2176843776) itemoff 15541 itemsize 112
		chunk length 1073741824 owner 2 stripe_len 65536
		type DATA|RAID1 num_stripes 2
			stripe 0 devid 2 offset 1083179008
			dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
			stripe 1 devid 3 offset 1083179008
			dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
	item 7 key (FIRST_CHUNK_TREE CHUNK_ITEM 3250585600) itemoff 15429 itemsize 112
		chunk length 1073741824 owner 2 stripe_len 65536
		type DATA|RAID1 num_stripes 2
			stripe 0 devid 1 offset 2168455168
			dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
			stripe 1 devid 3 offset 2156920832
			dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
	item 8 key (FIRST_CHUNK_TREE CHUNK_ITEM 4324327424) itemoff 15317 itemsize 112
		chunk length 1073741824 owner 2 stripe_len 65536
		type DATA|RAID1 num_stripes 2
			stripe 0 devid 2 offset 2156920832
			dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
			stripe 1 devid 1 offset 3242196992
			dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
	item 9 key (FIRST_CHUNK_TREE CHUNK_ITEM 5398069248) itemoff 15205 itemsize 112
		chunk length 1073741824 owner 2 stripe_len 65536
		type DATA|RAID1 num_stripes 2
			stripe 0 devid 2 offset 3230662656
			dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
			stripe 1 devid 3 offset 3230662656
			dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
	item 10 key (FIRST_CHUNK_TREE CHUNK_ITEM 6471811072) itemoff 15093 itemsize 112
		chunk length 1064304640 owner 2 stripe_len 65536
		type DATA|RAID1 num_stripes 2
			stripe 0 devid 2 offset 4304404480
			dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
			stripe 1 devid 3 offset 4304404480
			dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
total bytes 16106127360
bytes used 6711709696
uuid 30771a06-e6a8-4cbc-a094-893049fa5060
===========================

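Both before and after cp, there are chunks with a stripe on
/dev/sdb (devid 2). Items 6, 8, 9 and 10, which appear only in
the dump taken after the unplug, also place a stripe on devid 2.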
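
In case it helps anyone who wants to repeat this check on their own dump,
here is a rough sketch of how the chunk list could be filtered by devid. It
assumes the text layout of "btrfs-debug-tree -t 3" shown above; the function
name chunks_on_device and the SAMPLE excerpt (items 3 to 5 of the "before cp"
dump) are only for illustration and are not part of btrfs-progs.

```python
import re

# Excerpt of the "before cp" chunk tree dump above (items 3 to 5 only).
SAMPLE = """\
item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 20971520) itemoff 15877 itemsize 112
    chunk length 8388608 owner 2 stripe_len 65536
    type SYSTEM|RAID1 num_stripes 2
        stripe 0 devid 3 offset 1048576
        stripe 1 devid 2 offset 1048576
item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 29360128) itemoff 15765 itemsize 112
    chunk length 1073741824 owner 2 stripe_len 65536
    type METADATA|RAID1 num_stripes 2
        stripe 0 devid 1 offset 20971520
        stripe 1 devid 3 offset 9437184
item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 1103101952) itemoff 15653 itemsize 112
    chunk length 1073741824 owner 2 stripe_len 65536
    type DATA|RAID1 num_stripes 2
        stripe 0 devid 2 offset 9437184
        stripe 1 devid 1 offset 1094713344
"""


def chunks_on_device(dump, devid):
    """Return the chunks in a chunk tree dump that have a stripe on devid.

    Each result is a dict with the chunk's logical offset, its type
    string (e.g. "DATA|RAID1") and the devids of all its stripes.
    """
    chunks = []
    current = None
    for line in dump.splitlines():
        # A CHUNK_ITEM key line starts a new chunk record.
        m = re.search(r"CHUNK_ITEM (\d+)\)", line)
        if m:
            current = {"offset": int(m.group(1)), "type": None, "devids": []}
            chunks.append(current)
            continue
        if current is None:
            continue  # still in the DEV_ITEM part of the leaf
        m = re.search(r"type (\S+) num_stripes", line)
        if m:
            current["type"] = m.group(1)
            continue
        # "stripe N devid D ..." lines; "stripe_len" does not match this.
        m = re.search(r"stripe \d+ devid (\d+)", line)
        if m:
            current["devids"].append(int(m.group(1)))
    return [c for c in chunks if devid in c["devids"]]


if __name__ == "__main__":
    for chunk in chunks_on_device(SAMPLE, 2):
        print(chunk["offset"], chunk["type"], chunk["devids"])
```

Feeding it the full "after cp" dump instead of SAMPLE should list every
chunk that still references the missing device.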

Thanks,
Satoru

>
> Thanks,
> Qu
>> Label: none  uuid: 16a54915-c807-42cf-8365-82c0780c5ab5
>>          Total devices 3 FS bytes used 1.81GiB
>>          devid    1 size 5.00GiB used 2.00GiB path /dev/sda
>>          devid    2 size 5.00GiB used 2.01GiB path /dev/sdb
>>          *** Some devices missing
>>
>> # cp -a linux /scratch_mnt/
>> # cp -a linux /scratch_mnt/linux.2
>> # cp -a linux /scratch_mnt/linux.3
>> cp: error writing ‘/scratch_mnt/linux.3/drivers/scsi/lpfc/lpfc_els.c’:
>> No space left on device
>> ...
>> # mount | grep scratch
>> /dev/sda on /scratch_mnt type btrfs
>> (rw,relatime,seclabel,space_cache,subvolid=5,subvol=/)
>> # dmesg | tail
>> [ 1400.778705] BTRFS warning (device sdc): lost page write due to IO
>> error on /dev/sdc
>> [ 1438.604796] btrfs_dev_stat_print_on_error: 174 callbacks suppressed
>> [ 1438.604803] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125633,
>> rd 1, flush 276, corrupt 0, gen 0
>> [ 1438.609782] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125634,
>> rd 1, flush 276, corrupt 0, gen 0
>> [ 1438.613331] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125634,
>> rd 1, flush 277, corrupt 0, gen 0
>> [ 1438.669090] btrfs_end_buffer_write_sync: 52 callbacks suppressed
>> [ 1438.669095] BTRFS warning (device sdc): lost page write due to IO
>> error on /dev/sdc
>> [ 1438.669098] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125635,
>> rd 1, flush 277, corrupt 0, gen 0
>> [ 1438.672621] BTRFS warning (device sdc): lost page write due to IO
>> error on /dev/sdc
>> [ 1438.672626] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125636,
>> rd 1, flush 277, corrupt 0, gen 0
>> =================================
>>
>> Thanks,
>> Satoru
>>
>>>
>>> Thanks,
>>> Satoru
>>>
>>>> still mounted in rw mode:
>>>> /dev/sdg on /mnt/raid1 type btrfs
>>>> (rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
>>>> ####
>>>> 7# cp -r /root/ .
>>>> cp: cannot create directory './root': Input/output error
>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>> /dev/sdg errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>> /dev/sdg errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>> /dev/sdg errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>> /dev/sdg errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>> /dev/sdg errs: wr 0, rd 5, flush 0, corrupt 0, gen 0
>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>> /dev/sdg errs: wr 0, rd 6, flush 0, corrupt 0, gen 0
>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>> /dev/sdg errs: wr 0, rd 7, flush 0, corrupt 0, gen 0
>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>> /dev/sdg errs: wr 0, rd 8, flush 0, corrupt 0, gen 0
>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>> /dev/sdg errs: wr 0, rd 9, flush 0, corrupt 0, gen 0
>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>> /dev/sdg errs: wr 0, rd 10, flush 0, corrupt 0, gen 0
>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): error
>>>> reading free space cache
>>>> Apr 20 07:05:37 rakete kernel: BTRFS warning (device sdi): failed to
>>>> load free space cache for block group 20497563648, rebuilding it now
>>>> Apr 20 07:05:37 rakete kernel: ------------[ cut here ]------------
>>>> Apr 20 07:05:37 rakete kernel: WARNING: CPU: 7 PID: 16738 at
>>>> /build/linux-H3jpF0/linux-4.4.6/fs/btrfs/ctree.c:1156
>>>> __btrfs_cow_block+0x56f/0x5e0 [btrfs]()
>>>> Apr 20 07:05:37 rakete kernel: BTRFS: Transaction aborted (error -5)
>>>> Apr 20 07:05:37 rakete kernel: Modules linked in: uas usb_storage
>>>> pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) binfmt_misc dvb_ttpci
>>>> saa7146_vv ttpci_eeprom saa7146 videobuf_dma_sg videobuf_core
>>>> dvb_core v4l2_common videodev media cfg80211 vboxdrv(O)
>>>> cpufreq_powersave cpufreq_conservative cpufreq_userspace
>>>> cpufreq_stats snd_hda_codec_hdmi intel_rapl iosf_mbi
>>>> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
>>>> irqbypass crct10dif_pclmul crc32_pclmul eeepc_wmi asus_wmi joydev
>>>> sparse_keymap drbg iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek
>>>> rfkill ansi_cprng snd_hda_codec_generic nvidia(PO) aesni_intel
>>>> aes_x86_64 lrw gf128mul snd_hda_intel glue_helper ablk_helper
>>>> snd_hda_codec cryptd snd_hda_core serio_raw pcspkr snd_hwdep snd_pcm
>>>> i2c_i801 snd_timer snd lpc_ich soundcore 8250_fintek mei_me shpchp mei
>>>> Apr 20 07:05:37 rakete kernel:  mfd_core battery tpm_tis tpm evdev
>>>> processor drm fuse ecryptfs cbc sha256_ssse3 sha256_generic hmac
>>>> encrypted_keys parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache
>>>> jbd2 btrfs raid456 async_raid6_recov async_memcpy async_pq async_xor
>>>> async_tx xor hid_generic usbhid hid raid6_pq libcrc32c crc32c_generic
>>>> md_mod dm_mirror dm_region_hash dm_log dm_mod sr_mod sg cdrom sd_mod
>>>> ata_generic ahci libahci pata_via xhci_pci ehci_pci crc32c_intel
>>>> xhci_hcd ehci_hcd libata psmouse scsi_mod atl1c usbcore usb_common
>>>> fjes video wmi fan thermal button
>>>> Apr 20 07:05:37 rakete kernel: CPU: 7 PID: 16738 Comm: cp Tainted:
>>>> P           O    4.4.0-0.bpo.1-amd64 #1 Debian 4.4.6-1~bpo8+1
>>>> Apr 20 07:05:37 rakete kernel: Hardware name: System manufacturer
>>>> System Product Name/P8H67-V, BIOS 3707 07/12/2013
>>>> Apr 20 07:05:37 rakete kernel:  0000000000000286 000000006a1407c8
>>>> ffffffff812ed425 ffff88016b6dfb90
>>>> Apr 20 07:05:37 rakete kernel:  ffffffffa03817b8 ffffffff81077ea1
>>>> ffff88018e7fcd30 ffff88016b6dfbe8
>>>> Apr 20 07:05:37 rakete kernel:  ffff88005d863e88 ffff8801cde7a980
>>>> ffff88018e7fce48 ffffffff81077f2c
>>>> Apr 20 07:05:37 rakete kernel: Call Trace:
>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffff812ed425>] ?
>>>> dump_stack+0x5c/0x77
>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffff81077ea1>] ?
>>>> warn_slowpath_common+0x81/0xb0
>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffff81077f2c>] ?
>>>> warn_slowpath_fmt+0x5c/0x80
>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffffa02d74af>] ?
>>>> __btrfs_cow_block+0x56f/0x5e0 [btrfs]
>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffffa02d76af>] ?
>>>> btrfs_cow_block+0x10f/0x1d0 [btrfs]
>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffffa02db2cd>] ?
>>>> btrfs_search_slot+0x1fd/0xa30 [btrfs]
>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffffa02dd3f1>] ?
>>>> btrfs_insert_empty_items+0x71/0xc0 [btrfs]
>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffff811f4d92>] ?
>>>> insert_inode_locked4+0xa2/0x1c0
>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffffa030ee5d>] ?
>>>> btrfs_new_inode+0x1cd/0x590 [btrfs]
>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffffa0310a77>] ?
>>>> btrfs_mkdir+0x107/0x1f0 [btrfs]
>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffff811e80b0>] ?
>>>> vfs_mkdir+0xb0/0x140
>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffff811e9d3e>] ?
>>>> SyS_mkdir+0xce/0x110
>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffff81592736>] ?
>>>> system_call_fast_compare_end+0xc/0x6b
>>>> Apr 20 07:05:37 rakete kernel: ---[ end trace 025eb0e83ffed96f ]---
>>>> Apr 20 07:05:37 rakete kernel: BTRFS: error (device sdi) in
>>>> __btrfs_cow_block:1156: errno=-5 IO failure
>>>> Apr 20 07:05:37 rakete kernel: BTRFS info (device sdi): forced readonly
>>>>
>>>> ####
>>>> Try to copy again:
>>>> 11# cp -r /root/ .
>>>> cp: cannot create directory './root': Read-only file system
>>>> ####
>>>> /dev/sdg on /mnt/raid1 type btrfs
>>>> (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
>>>> ####
>>>> plug in device sdg again:
>>>>
>>>> Apr 20 07:07:39 rakete udisksd[3671]: Cleaning up mount point
>>>> /media/matthias/BACKUP (device 8:81 no longer exist)
>>>> Apr 20 07:07:39 rakete kernel: usb 3-1: USB disconnect, device number 3
>>>> Apr 20 07:07:39 rakete udisksd[3671]: Error statting /dev/sdg: No
>>>> such file or directory
>>>> Apr 20 07:07:39 rakete umount[16807]: umount: /mnt/raid1: target is busy
>>>> Apr 20 07:07:39 rakete umount[16807]: (In some cases useful info
>>>> about processes that
>>>> Apr 20 07:07:39 rakete umount[16807]: use the device is found by
>>>> lsof(8) or fuser(1).)
>>>> Apr 20 07:07:39 rakete systemd[1]: mnt-raid1.mount mount process
>>>> exited, code=exited status=32
>>>> Apr 20 07:07:39 rakete systemd[1]: Failed unmounting /mnt/raid1.
>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: new SuperSpeed USB device
>>>> number 4 using xhci_hcd
>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: New USB device found,
>>>> idVendor=152d, idProduct=0567
>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: New USB device strings:
>>>> Mfr=10, Product=11, SerialNumber=5
>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: Manufacturer: JMicron
>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: SerialNumber: 152D00539000
>>>> Apr 20 07:08:01 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage
>>>> device detected
>>>> Apr 20 07:08:01 rakete kernel: usb-storage 3-1:1.0: Quirks match for
>>>> vid 152d pid 0567: 5000000
>>>> Apr 20 07:08:01 rakete kernel: scsi host10: usb-storage 3-1:1.0
>>>> Apr 20 07:08:01 rakete mtp-probe[16826]: checking bus 3, device 4:
>>>> "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
>>>> Apr 20 07:08:01 rakete mtp-probe[16826]: bus: 3, device: 4 was not an
>>>> MTP device
>>>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:0: Direct-Access     WDC
>>>> WD20 02FAEX-007BA0    0125 PQ: 0 ANSI: 6
>>>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:1: Direct-Access     WDC
>>>> WD75 00AACS-00C7B0    0125 PQ: 0 ANSI: 6
>>>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:2: Direct-Access     WDC
>>>> WD50 01AALS-00L3B2    0125 PQ: 0 ANSI: 6
>>>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:3: Direct-Access
>>>> SAMSUNG  SP2504C          0125 PQ: 0 ANSI: 6
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: Attached scsi generic sg6
>>>> type 0
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] 3907029168 512-byte
>>>> logical blocks: (2.00 TB/1.82 TiB)
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Write Protect is off
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Mode Sense: 67 00
>>>> 10 08
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: Attached scsi generic sg7
>>>> type 0
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] No Caching mode
>>>> page found
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Assuming drive
>>>> cache: write through
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] 1465149168 512-byte
>>>> logical blocks: (750 GB/699 GiB)
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: Attached scsi generic sg8
>>>> type 0
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Write Protect is off
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Mode Sense: 67 00
>>>> 10 08
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: Attached scsi generic sg9
>>>> type 0
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] 976773168 512-byte
>>>> logical blocks: (500 GB/466 GiB)
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] No Caching mode
>>>> page found
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Assuming drive
>>>> cache: write through
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Write Protect is off
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Mode Sense: 67 00
>>>> 10 08
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] 488395055 512-byte
>>>> logical blocks: (250 GB/233 GiB)
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] No Caching mode
>>>> page found
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Assuming drive
>>>> cache: write through
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Write Protect is off
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Mode Sense: 67 00
>>>> 10 08
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] No Caching mode
>>>> page found
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Assuming drive
>>>> cache: write through
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Attached SCSI disk
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Attached SCSI disk
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Attached SCSI disk
>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Attached SCSI disk
>>>> Apr 20 07:08:02 rakete kernel: EXT4-fs (sdf1): recovery complete
>>>> Apr 20 07:08:02 rakete kernel: EXT4-fs (sdf1): mounted filesystem
>>>> with ordered data mode. Opts: (null)
>>>>
>>>> ####
>>>> still ro mode
>>>> /dev/sdj on /mnt/raid1 type btrfs
>>>> (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
>>>> ####
>>>> 14# btrfs fi show
>>>> Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>>>>     Total devices 3 FS bytes used 1.60GiB
>>>>     devid    1 size 698.64GiB used 3.03GiB path /dev/sdj
>>>>     devid    2 size 465.76GiB used 3.03GiB path /dev/sdk
>>>>     devid    3 size 232.88GiB used 0.00B path /dev/sdl
>>>> ####
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe
>>>> linux-btrfs" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Question: raid1 behaviour on failure
  2016-04-22  2:21           ` Satoru Takeuchi
@ 2016-04-22  5:32             ` Qu Wenruo
  2016-04-22  6:17               ` Satoru Takeuchi
  0 siblings, 1 reply; 32+ messages in thread
From: Qu Wenruo @ 2016-04-22  5:32 UTC (permalink / raw)
  To: Satoru Takeuchi, Qu Wenruo, Matthias Bodenbinder, linux-btrfs



Satoru Takeuchi wrote on 2016/04/22 11:21 +0900:
> On 2016/04/21 20:58, Qu Wenruo wrote:
>>
>>
>> On 04/21/2016 03:45 PM, Satoru Takeuchi wrote:
>>> On 2016/04/21 15:23, Satoru Takeuchi wrote:
>>>> On 2016/04/20 14:17, Matthias Bodenbinder wrote:
>>>>> On 18.04.2016 at 09:22, Qu Wenruo wrote:
>>>>>> BTW, it would be better to post the dmesg for better debug.
>>>>>
>>>>> So here we go. I did the same test again. Here is a full log of what I
>>>>> did. It seems to me like a bug in btrfs.
>>>>> Sequence of events:
>>>>> 1. mount the raid1 (2 discs of different sizes)
>>>>> 2. unplug the biggest drive (hotplug)
>>>>> 3. try to copy something to the degraded raid1
>>>>> 4. plug in the device again (hotplug)
>>>>>
>>>>> This scenario does not work. The disc array is NOT redundant! I can
>>>>> not work with it while a drive is missing and I can not reattach the
>>>>> device so that everything works again.
>>>>>
>>>>> The btrfs module crashes during the test.
>>>>>
>>>>> I am using LMDE2 with backports:
>>>>> btrfs-tools 4.4-1~bpo8+1
>>>>> linux-image-4.4.0-0.bpo.1-amd64
>>>>>
>>>>> Matthias
>>>>>
>>>>>
>>>>> rakete - root - /root
>>>>> 1# mount /mnt/raid1/
>>>>>
>>>>> Journal:
>>>>>
>>>>> Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): enabling auto
>>>>> defrag
>>>>> Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): disk space
>>>>> caching is enabled
>>>>> Apr 20 07:01:16 rakete kernel: BTRFS: has skinny extents
>>>>>
>>>>> rakete - root - /mnt/raid1
>>>>> 3# ll
>>>>> total 0
>>>>> drwxrwxr-x 1 root root   36 Nov 14  2014 AfterShot2(64-bit)
>>>>> drwxrwxr-x 1 root root 5082 Apr 17 09:06 etc
>>>>> drwxr-xr-x 1 root root  108 Mar 24 07:31 var
>>>>>
>>>>> 4# btrfs fi show
>>>>> Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>>>>>     Total devices 3 FS bytes used 1.60GiB
>>>>>     devid    1 size 698.64GiB used 3.03GiB path /dev/sdg
>>>>>     devid    2 size 465.76GiB used 3.03GiB path /dev/sdh
>>>>>     devid    3 size 232.88GiB used 0.00B path /dev/sdi
>>>>>
>>>>> ####
>>>>> unplug device sdg:
>>>>>
>>>>> Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical
>>>>> block 243826688, lost sync page write
>>>>> Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating
>>>>> journal superblock for sdf1-8.
>>>>> Apr 20 07:03:05 rakete kernel: Aborting journal on device sdf1-8.
>>>>> Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical
>>>>> block 243826688, lost sync page write
>>>>> Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating
>>>>> journal superblock for sdf1-8.
>>>>> Apr 20 07:03:05 rakete umount[16405]: umount: /mnt/raid1: target is
>>>>> busy
>>>>> Apr 20 07:03:05 rakete umount[16405]: (In some cases useful info
>>>>> about processes that
>>>>> Apr 20 07:03:05 rakete umount[16405]: use the device is found by
>>>>> lsof(8) or fuser(1).)
>>>>> Apr 20 07:03:05 rakete systemd[1]: mnt-raid1.mount mount process
>>>>> exited, code=exited status=32
>>>>> Apr 20 07:03:05 rakete systemd[1]: Failed unmounting /mnt/raid1.
>>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: new SuperSpeed USB device
>>>>> number 3 using xhci_hcd
>>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device found,
>>>>> idVendor=152d, idProduct=0567
>>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device strings:
>>>>> Mfr=10, Product=11, SerialNumber=5
>>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI
>>>>> Bridge
>>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: Manufacturer: JMicron
>>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: SerialNumber: 152D00539000
>>>>> Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage
>>>>> device detected
>>>>> Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: Quirks match for
>>>>> vid 152d pid 0567: 5000000
>>>>> Apr 20 07:03:24 rakete kernel: scsi host9: usb-storage 3-1:1.0
>>>>> Apr 20 07:03:24 rakete mtp-probe[16424]: checking bus 3, device 3:
>>>>> "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
>>>>> Apr 20 07:03:24 rakete mtp-probe[16424]: bus: 3, device: 3 was not an
>>>>> MTP device
>>>>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:0: Direct-Access     WDC
>>>>> WD20 02FAEX-007BA0    0125 PQ: 0 ANSI: 6
>>>>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:1: Direct-Access     WDC
>>>>> WD50 01AALS-00L3B2    0125 PQ: 0 ANSI: 6
>>>>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:2: Direct-Access
>>>>> SAMSUNG  SP2504C          0125 PQ: 0 ANSI: 6
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: Attached scsi generic sg6
>>>>> type 0
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: Attached scsi generic sg7
>>>>> type 0
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] 3907029168 512-byte
>>>>> logical blocks: (2.00 TB/1.82 TiB)
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Write Protect is off
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Mode Sense: 67 00
>>>>> 10 08
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: Attached scsi generic sg8
>>>>> type 0
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] 976773168 512-byte
>>>>> logical blocks: (500 GB/466 GiB)
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] No Caching mode page
>>>>> found
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Assuming drive
>>>>> cache: write through
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Write Protect is off
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Mode Sense: 67 00
>>>>> 10 08
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] 488395055 512-byte
>>>>> logical blocks: (250 GB/233 GiB)
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] No Caching mode page
>>>>> found
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Assuming drive
>>>>> cache: write through
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Write Protect is off
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Mode Sense: 67 00
>>>>> 10 08
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] No Caching mode page
>>>>> found
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Assuming drive
>>>>> cache: write through
>>>>> Apr 20 07:03:25 rakete kernel:  sdf: sdf1
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Attached SCSI disk
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Attached SCSI disk
>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Attached SCSI disk
>>>>> Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): recovery complete
>>>>> Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): mounted filesystem
>>>>> with ordered data mode. Opts: (null)
>>>>> Apr 20 07:03:25 rakete udisksd[3671]: Error statting /dev/sdg: No
>>>>> such file or directory
>>>>>
>>>>>
>>>>> ####
>>>>> 5# btrfs fi show
>>>>> Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>>>>>     Total devices 3 FS bytes used 1.60GiB
>>>>>     devid    2 size 465.76GiB used 3.03GiB path /dev/sdj
>>>>>     devid    3 size 232.88GiB used 0.00B path /dev/sdk
>>>>>     *** Some devices missing
>>>>> ####
>>>>
>>>> Here the names of *online* devices are changed
>>>> (/dev/sdh => /dev/sdj, /dev/sdi => /dev/sdk) after just
>>>> offlining a device (/dev/sdf). It's odd regardless of
>>>> whether Btrfs works fine or not.
>>>>
>>>> Can anyone explain this behavior?
>>>
>>> FYI,
>>>
>>> I tried to reproduce this problem on a VM.
>>> Here the USB storage devices are /dev/sd{a,b,c}.
>>>
>>> Steps to reproduce:
>>>
>>>   1. Create a fs on /dev/sd{a,b,c}
>>>   2. Mount this fs
>>>   3. Surprise-unplug /dev/sdc
>>>   4. Write to this fs until ENOSPC happens
>>>
>>> Then, although there were I/O errors on /dev/sdc,
>>> the device names didn't change and no ro remount happened.
>>>
>>> command log:
>>> =================================
>>> # mkfs.btrfs -f -m raid1 -d raid1 /dev/sd{a,b,c}
>>> btrfs-progs v4.5.1-41-g8202204-dirty
>>> See http://btrfs.wiki.kernel.org for more information.
>>>
>>> Label:              (null)
>>> UUID:               16a54915-c807-42cf-8365-82c0780c5ab5
>>> Node size:          16384
>>> Sector size:        4096
>>> Filesystem size:    15.00GiB
>>> Block group profiles:
>>>    Data:             RAID1             1.01GiB
>>>    Metadata:         RAID1             1.01GiB
>>>    System:           RAID1            12.00MiB
>>> SSD detected:       no
>>> Incompat features:  extref, skinny-metadata
>>> Number of devices:  3
>>> Devices:
>>>     ID        SIZE  PATH
>>>      1     5.00GiB  /dev/sda
>>>      2     5.00GiB  /dev/sdb
>>>      3     5.00GiB  /dev/sdc
>>>
>>> # mount /dev/sda /scratch_mnt/
>>> # btrfs fi show /scratch_mnt/
>>> Label: none  uuid: 16a54915-c807-42cf-8365-82c0780c5ab5
>>>          Total devices 3 FS bytes used 640.00KiB
>>>          devid    1 size 5.00GiB used 2.00GiB path /dev/sda
>>>          devid    2 size 5.00GiB used 1.01GiB path /dev/sdb
>>>          devid    3 size 5.00GiB used 1.01GiB path /dev/sdc
>>>
>>> #
>>> # # *** surprise unplug happens here ***
>>> #
>>> # btrfs fi show /scratch_mnt/
>>
>> Would you please post the output of "btrfs-debug-tree -t 3"?
>>
>> My guess is that there is no raid1 stripe on device 3,
>> so all data/metadata allocation/COW happens without a problem.
>> The "btrfs-debug-tree -t 3" output would verify my guess.
>
> OK, here it is.
>
> btrfs-debug-tree -t 3 before cp:
> ===========================
> btrfs-progs v4.5.1-41-g8202204-dirty
> chunk tree
> leaf 20987904 items 6 free space 15503 generation 5 owner 3
> fs uuid 30771a06-e6a8-4cbc-a094-893049fa5060
> chunk uuid 2325f1b9-1bf0-4247-8c29-7b179eabf1b2
>     item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 16185 itemsize 98
>         dev item devid 1 total_bytes 5368709120 bytes used 2147483648
>         dev uuid 06bc0993-39d3-4d9a-b484-760ae2150c3a
>     item 1 key (DEV_ITEMS DEV_ITEM 2) itemoff 16087 itemsize 98
>         dev item devid 2 total_bytes 5368709120 bytes used 1082130432
>         dev uuid 3868895f-295b-4a89-a01c-ad0f1c5ac758
>     item 2 key (DEV_ITEMS DEV_ITEM 3) itemoff 15989 itemsize 98
>         dev item devid 3 total_bytes 5368709120 bytes used 1082130432
>         dev uuid 911e8702-9428-4b8e-bc6d-d212e909a1ef
>     item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 20971520) itemoff 15877
> itemsize 112
>         chunk length 8388608 owner 2 stripe_len 65536
>         type SYSTEM|RAID1 num_stripes 2
>             stripe 0 devid 3 offset 1048576
>             dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
>             stripe 1 devid 2 offset 1048576
>             dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
>     item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 29360128) itemoff 15765
> itemsize 112
>         chunk length 1073741824 owner 2 stripe_len 65536
>         type METADATA|RAID1 num_stripes 2
>             stripe 0 devid 1 offset 20971520
>             dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
>             stripe 1 devid 3 offset 9437184
>             dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
>     item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 1103101952) itemoff 15653
> itemsize 112
>         chunk length 1073741824 owner 2 stripe_len 65536
>         type DATA|RAID1 num_stripes 2
>             stripe 0 devid 2 offset 9437184
>             dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
>             stripe 1 devid 1 offset 1094713344
>             dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
> total bytes 16106127360
> bytes used 114688
> uuid 30771a06-e6a8-4cbc-a094-893049fa5060
> ===========================
>
>
>
> Here I hot unplug devid 2 (/dev/sdb).
>
>
>
> btrfs-debug-tree -t 3 after cp (which causes ENOSPC):
> ===========================
> btrfs-progs v4.5.1-41-g8202204-dirty
> warning, device 2 is missing
> chunk tree
> leaf 20987904 items 11 free space 14818 generation 9 owner 3
> fs uuid 30771a06-e6a8-4cbc-a094-893049fa5060
> chunk uuid 2325f1b9-1bf0-4247-8c29-7b179eabf1b2
>     item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 16185 itemsize 98
>         dev item devid 1 total_bytes 5368709120 bytes used 4294967296
>         dev uuid 06bc0993-39d3-4d9a-b484-760ae2150c3a
>     item 1 key (DEV_ITEMS DEV_ITEM 2) itemoff 16087 itemsize 98
>         dev item devid 2 total_bytes 5368709120 bytes used 5367660544
>         dev uuid 3868895f-295b-4a89-a01c-ad0f1c5ac758
>     item 2 key (DEV_ITEMS DEV_ITEM 3) itemoff 15989 itemsize 98
>         dev item devid 3 total_bytes 5368709120 bytes used 5367660544
>         dev uuid 911e8702-9428-4b8e-bc6d-d212e909a1ef
>     item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 20971520) itemoff 15877
> itemsize 112
>         chunk length 8388608 owner 2 stripe_len 65536
>         type SYSTEM|RAID1 num_stripes 2
>             stripe 0 devid 3 offset 1048576
>             dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
>             stripe 1 devid 2 offset 1048576
>             dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
>     item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 29360128) itemoff 15765
> itemsize 112
>         chunk length 1073741824 owner 2 stripe_len 65536
>         type METADATA|RAID1 num_stripes 2
>             stripe 0 devid 1 offset 20971520
>             dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
>             stripe 1 devid 3 offset 9437184
>             dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
>     item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 1103101952) itemoff 15653
> itemsize 112
>         chunk length 1073741824 owner 2 stripe_len 65536
>         type DATA|RAID1 num_stripes 2
>             stripe 0 devid 2 offset 9437184
>             dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
>             stripe 1 devid 1 offset 1094713344
>             dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
>     item 6 key (FIRST_CHUNK_TREE CHUNK_ITEM 2176843776) itemoff 15541
> itemsize 112
>         chunk length 1073741824 owner 2 stripe_len 65536
>         type DATA|RAID1 num_stripes 2
>             stripe 0 devid 2 offset 1083179008
>             dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
>             stripe 1 devid 3 offset 1083179008
>             dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
>     item 7 key (FIRST_CHUNK_TREE CHUNK_ITEM 3250585600) itemoff 15429
> itemsize 112
>         chunk length 1073741824 owner 2 stripe_len 65536
>         type DATA|RAID1 num_stripes 2
>             stripe 0 devid 1 offset 2168455168
>             dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
>             stripe 1 devid 3 offset 2156920832
>             dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
>     item 8 key (FIRST_CHUNK_TREE CHUNK_ITEM 4324327424) itemoff 15317
> itemsize 112
>         chunk length 1073741824 owner 2 stripe_len 65536
>         type DATA|RAID1 num_stripes 2
>             stripe 0 devid 2 offset 2156920832
>             dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
>             stripe 1 devid 1 offset 3242196992
>             dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
>     item 9 key (FIRST_CHUNK_TREE CHUNK_ITEM 5398069248) itemoff 15205
> itemsize 112
>         chunk length 1073741824 owner 2 stripe_len 65536
>         type DATA|RAID1 num_stripes 2
>             stripe 0 devid 2 offset 3230662656
>             dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
>             stripe 1 devid 3 offset 3230662656
>             dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
>     item 10 key (FIRST_CHUNK_TREE CHUNK_ITEM 6471811072) itemoff 15093
> itemsize 112
>         chunk length 1064304640 owner 2 stripe_len 65536
>         type DATA|RAID1 num_stripes 2
>             stripe 0 devid 2 offset 4304404480
>             dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
>             stripe 1 devid 3 offset 4304404480
>             dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
> total bytes 16106127360
> bytes used 6711709696
> uuid 30771a06-e6a8-4cbc-a094-893049fa5060
> ===========================
>
> Both before and after cp, there are
> chunks containing /dev/sdb (devid 2).

Right, even newly created data chunks have stripes on devid 2.

That makes the original bug a little strange now.
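
For anyone who wants to repeat this check without reading the dump by eye, here is a minimal sketch that lists which chunks have a stripe on a given devid. It assumes the text layout of "btrfs-debug-tree -t 3" shown in the quoted dumps; the function name chunks_by_devid and the shortened SAMPLE text are made up for illustration and are not part of btrfs-progs.

```python
import re
from collections import defaultdict

# Shortened excerpt in the layout of the "btrfs-debug-tree -t 3" dumps
# quoted in this thread (dev uuid lines omitted for brevity).
SAMPLE = """\
    item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 29360128) itemoff 15765 itemsize 112
        chunk length 1073741824 owner 2 stripe_len 65536
        type METADATA|RAID1 num_stripes 2
            stripe 0 devid 1 offset 20971520
            stripe 1 devid 3 offset 9437184
    item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 1103101952) itemoff 15653 itemsize 112
        chunk length 1073741824 owner 2 stripe_len 65536
        type DATA|RAID1 num_stripes 2
            stripe 0 devid 2 offset 9437184
            stripe 1 devid 1 offset 1094713344
"""

CHUNK_RE = re.compile(r"CHUNK_ITEM (\d+)\)")
TYPE_RE = re.compile(r"^\s*type (\S+) num_stripes \d+")
STRIPE_RE = re.compile(r"^\s*stripe \d+ devid (\d+) offset \d+")


def chunks_by_devid(text):
    """Map each devid to the (chunk logical start, chunk type) pairs
    that have a stripe on that device."""
    result = defaultdict(list)
    chunk_start = None
    chunk_type = None
    for line in text.splitlines():
        match = CHUNK_RE.search(line)
        if match:
            # A new chunk item begins; its type line follows later.
            chunk_start = int(match.group(1))
            chunk_type = None
            continue
        match = TYPE_RE.match(line)
        if match:
            chunk_type = match.group(1)
            continue
        match = STRIPE_RE.match(line)
        if match and chunk_start is not None:
            result[int(match.group(1))].append((chunk_start, chunk_type))
    return dict(result)


if __name__ == "__main__":
    for devid, chunks in sorted(chunks_by_devid(SAMPLE).items()):
        print("devid", devid, chunks)
```

Feeding it the full "after cp" dump instead of SAMPLE should show the newly allocated data chunks under devid 2 as well, which is the observation above.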

Thanks,
Qu
>
> Thanks,
> Satoru
>
>>
>> Thanks,
>> Qu
>>> Label: none  uuid: 16a54915-c807-42cf-8365-82c0780c5ab5
>>>          Total devices 3 FS bytes used 1.81GiB
>>>          devid    1 size 5.00GiB used 2.00GiB path /dev/sda
>>>          devid    2 size 5.00GiB used 2.01GiB path /dev/sdb
>>>          *** Some devices missing
>>>
>>> # cp -a linux /scratch_mnt/
>>> # cp -a linux /scratch_mnt/linux.2
>>> # cp -a linux /scratch_mnt/linux.3
>>> cp: error writing ‘/scratch_mnt/linux.3/drivers/scsi/lpfc/lpfc_els.c’:
>>> No space left on device
>>> ...
>>> # mount | grep scratch
>>> /dev/sda on /scratch_mnt type btrfs
>>> (rw,relatime,seclabel,space_cache,subvolid=5,subvol=/)
>>> # dmesg | tail
>>> [ 1400.778705] BTRFS warning (device sdc): lost page write due to IO
>>> error on /dev/sdc
>>> [ 1438.604796] btrfs_dev_stat_print_on_error: 174 callbacks suppressed
>>> [ 1438.604803] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125633,
>>> rd 1, flush 276, corrupt 0, gen 0
>>> [ 1438.609782] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125634,
>>> rd 1, flush 276, corrupt 0, gen 0
>>> [ 1438.613331] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125634,
>>> rd 1, flush 277, corrupt 0, gen 0
>>> [ 1438.669090] btrfs_end_buffer_write_sync: 52 callbacks suppressed
>>> [ 1438.669095] BTRFS warning (device sdc): lost page write due to IO
>>> error on /dev/sdc
>>> [ 1438.669098] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125635,
>>> rd 1, flush 277, corrupt 0, gen 0
>>> [ 1438.672621] BTRFS warning (device sdc): lost page write due to IO
>>> error on /dev/sdc
>>> [ 1438.672626] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125636,
>>> rd 1, flush 277, corrupt 0, gen 0
>>> =================================
>>>
>>> Thanks,
>>> Satoru
>>>
>>>>
>>>> Thanks,
>>>> Satoru
>>>>
>>>>> still mounted in rw mode:
>>>>> /dev/sdg on /mnt/raid1 type btrfs
>>>>> (rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
>>>>> ####
>>>>> 7# cp -r /root/ .
>>>>> cp: cannot create directory './root':
>>>>> Input/output error
>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>> /dev/sdg errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>> /dev/sdg errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>> /dev/sdg errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>> /dev/sdg errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>> /dev/sdg errs: wr 0, rd 5, flush 0, corrupt 0, gen 0
>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>> /dev/sdg errs: wr 0, rd 6, flush 0, corrupt 0, gen 0
>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>> /dev/sdg errs: wr 0, rd 7, flush 0, corrupt 0, gen 0
>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>> /dev/sdg errs: wr 0, rd 8, flush 0, corrupt 0, gen 0
>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>> /dev/sdg errs: wr 0, rd 9, flush 0, corrupt 0, gen 0
>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>> /dev/sdg errs: wr 0, rd 10, flush 0, corrupt 0, gen 0
>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): error
>>>>> reading free space cache
>>>>> Apr 20 07:05:37 rakete kernel: BTRFS warning (device sdi): failed to
>>>>> load free space cache for block group 20497563648, rebuilding it now
>>>>> Apr 20 07:05:37 rakete kernel: ------------[ cut here ]------------
>>>>> Apr 20 07:05:37 rakete kernel: WARNING: CPU: 7 PID: 16738 at
>>>>> /build/linux-H3jpF0/linux-4.4.6/fs/btrfs/ctree.c:1156
>>>>> __btrfs_cow_block+0x56f/0x5e0 [btrfs]()
>>>>> Apr 20 07:05:37 rakete kernel: BTRFS: Transaction aborted (error -5)
>>>>> Apr 20 07:05:37 rakete kernel: Modules linked in: uas usb_storage
>>>>> pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) binfmt_misc dvb_ttpci
>>>>> saa7146_vv ttpci_eeprom saa7146 videobuf_dma_sg videobuf_core
>>>>> dvb_core v4l2_common videodev media cfg80211 vboxdrv(O)
>>>>> cpufreq_powersave cpufreq_conservative cpufreq_userspace
>>>>> cpufreq_stats snd_hda_codec_hdmi intel_rapl iosf_mbi
>>>>> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
>>>>> irqbypass crct10dif_pclmul crc32_pclmul eeepc_wmi asus_wmi joydev
>>>>> sparse_keymap drbg iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek
>>>>> rfkill ansi_cprng snd_hda_codec_generic nvidia(PO) aesni_intel
>>>>> aes_x86_64 lrw gf128mul snd_hda_intel glue_helper ablk_helper
>>>>> snd_hda_codec cryptd snd_hda_core serio_raw pcspkr snd_hwdep snd_pcm
>>>>> i2c_i801 snd_timer snd lpc_ich soundcore 8250_fintek mei_me shpchp mei
>>>>> Apr 20 07:05:37 rakete kernel:  mfd_core battery tpm_tis tpm evdev
>>>>> processor drm fuse ecryptfs cbc sha256_ssse3 sha256_generic hmac
>>>>> encrypted_keys parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache
>>>>> jbd2 btrfs raid456 async_raid6_recov async_memcpy async_pq async_xor
>>>>> async_tx xor hid_generic usbhid hid raid6_pq libcrc32c crc32c_generic
>>>>> md_mod dm_mirror dm_region_hash dm_log dm_mod sr_mod sg cdrom sd_mod
>>>>> ata_generic ahci libahci pata_via xhci_pci ehci_pci crc32c_intel
>>>>> xhci_hcd ehci_hcd libata psmouse scsi_mod atl1c usbcore usb_common
>>>>> fjes video wmi fan thermal button
>>>>> Apr 20 07:05:37 rakete kernel: CPU: 7 PID: 16738 Comm: cp Tainted:
>>>>> P           O    4.4.0-0.bpo.1-amd64 #1 Debian 4.4.6-1~bpo8+1
>>>>> Apr 20 07:05:37 rakete kernel: Hardware name: System manufacturer
>>>>> System Product Name/P8H67-V, BIOS 3707 07/12/2013
>>>>> Apr 20 07:05:37 rakete kernel:  0000000000000286 000000006a1407c8
>>>>> ffffffff812ed425 ffff88016b6dfb90
>>>>> Apr 20 07:05:37 rakete kernel:  ffffffffa03817b8 ffffffff81077ea1
>>>>> ffff88018e7fcd30 ffff88016b6dfbe8
>>>>> Apr 20 07:05:37 rakete kernel:  ffff88005d863e88 ffff8801cde7a980
>>>>> ffff88018e7fce48 ffffffff81077f2c
>>>>> Apr 20 07:05:37 rakete kernel: Call Trace:
>>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffff812ed425>] ?
>>>>> dump_stack+0x5c/0x77
>>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffff81077ea1>] ?
>>>>> warn_slowpath_common+0x81/0xb0
>>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffff81077f2c>] ?
>>>>> warn_slowpath_fmt+0x5c/0x80
>>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffffa02d74af>] ?
>>>>> __btrfs_cow_block+0x56f/0x5e0 [btrfs]
>>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffffa02d76af>] ?
>>>>> btrfs_cow_block+0x10f/0x1d0 [btrfs]
>>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffffa02db2cd>] ?
>>>>> btrfs_search_slot+0x1fd/0xa30 [btrfs]
>>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffffa02dd3f1>] ?
>>>>> btrfs_insert_empty_items+0x71/0xc0 [btrfs]
>>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffff811f4d92>] ?
>>>>> insert_inode_locked4+0xa2/0x1c0
>>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffffa030ee5d>] ?
>>>>> btrfs_new_inode+0x1cd/0x590 [btrfs]
>>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffffa0310a77>] ?
>>>>> btrfs_mkdir+0x107/0x1f0 [btrfs]
>>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffff811e80b0>] ?
>>>>> vfs_mkdir+0xb0/0x140
>>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffff811e9d3e>] ?
>>>>> SyS_mkdir+0xce/0x110
>>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffff81592736>] ?
>>>>> system_call_fast_compare_end+0xc/0x6b
>>>>> Apr 20 07:05:37 rakete kernel: ---[ end trace 025eb0e83ffed96f ]---
>>>>> Apr 20 07:05:37 rakete kernel: BTRFS: error (device sdi) in
>>>>> __btrfs_cow_block:1156: errno=-5 IO failure
>>>>> Apr 20 07:05:37 rakete kernel: BTRFS info (device sdi): forced
>>>>> readonly
>>>>>
>>>>> ####
>>>>> Try to copy again:
>>>>> 11# cp -r /root/ .
>>>>> cp: cannot create directory './root': Read-only file system
>>>>> ####
>>>>> /dev/sdg on /mnt/raid1 type btrfs
>>>>> (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
>>>>> ####
>>>>> plug in device sdg again:
>>>>>
>>>>> Apr 20 07:07:39 rakete udisksd[3671]: Cleaning up mount point
>>>>> /media/matthias/BACKUP (device 8:81 no longer exist)
>>>>> Apr 20 07:07:39 rakete kernel: usb 3-1: USB disconnect, device
>>>>> number 3
>>>>> Apr 20 07:07:39 rakete udisksd[3671]: Error statting /dev/sdg: No
>>>>> such file or directory
>>>>> Apr 20 07:07:39 rakete umount[16807]: umount: /mnt/raid1: target is
>>>>> busy
>>>>> Apr 20 07:07:39 rakete umount[16807]: (In some cases useful info
>>>>> about processes that
>>>>> Apr 20 07:07:39 rakete umount[16807]: use the device is found by
>>>>> lsof(8) or fuser(1).)
>>>>> Apr 20 07:07:39 rakete systemd[1]: mnt-raid1.mount mount process
>>>>> exited, code=exited status=32
>>>>> Apr 20 07:07:39 rakete systemd[1]: Failed unmounting /mnt/raid1.
>>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: new SuperSpeed USB device
>>>>> number 4 using xhci_hcd
>>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: New USB device found,
>>>>> idVendor=152d, idProduct=0567
>>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: New USB device strings:
>>>>> Mfr=10, Product=11, SerialNumber=5
>>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI
>>>>> Bridge
>>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: Manufacturer: JMicron
>>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: SerialNumber: 152D00539000
>>>>> Apr 20 07:08:01 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage
>>>>> device detected
>>>>> Apr 20 07:08:01 rakete kernel: usb-storage 3-1:1.0: Quirks match for
>>>>> vid 152d pid 0567: 5000000
>>>>> Apr 20 07:08:01 rakete kernel: scsi host10: usb-storage 3-1:1.0
>>>>> Apr 20 07:08:01 rakete mtp-probe[16826]: checking bus 3, device 4:
>>>>> "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
>>>>> Apr 20 07:08:01 rakete mtp-probe[16826]: bus: 3, device: 4 was not an
>>>>> MTP device
>>>>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:0: Direct-Access     WDC
>>>>> WD20 02FAEX-007BA0    0125 PQ: 0 ANSI: 6
>>>>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:1: Direct-Access     WDC
>>>>> WD75 00AACS-00C7B0    0125 PQ: 0 ANSI: 6
>>>>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:2: Direct-Access     WDC
>>>>> WD50 01AALS-00L3B2    0125 PQ: 0 ANSI: 6
>>>>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:3: Direct-Access
>>>>> SAMSUNG  SP2504C          0125 PQ: 0 ANSI: 6
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: Attached scsi generic sg6
>>>>> type 0
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] 3907029168 512-byte
>>>>> logical blocks: (2.00 TB/1.82 TiB)
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Write Protect is off
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Mode Sense: 67 00
>>>>> 10 08
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: Attached scsi generic sg7
>>>>> type 0
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] No Caching mode
>>>>> page found
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Assuming drive
>>>>> cache: write through
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] 1465149168 512-byte
>>>>> logical blocks: (750 GB/699 GiB)
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: Attached scsi generic sg8
>>>>> type 0
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Write Protect is off
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Mode Sense: 67 00
>>>>> 10 08
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: Attached scsi generic sg9
>>>>> type 0
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] 976773168 512-byte
>>>>> logical blocks: (500 GB/466 GiB)
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] No Caching mode
>>>>> page found
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Assuming drive
>>>>> cache: write through
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Write Protect is off
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Mode Sense: 67 00
>>>>> 10 08
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] 488395055 512-byte
>>>>> logical blocks: (250 GB/233 GiB)
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] No Caching mode
>>>>> page found
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Assuming drive
>>>>> cache: write through
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Write Protect is off
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Mode Sense: 67 00
>>>>> 10 08
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] No Caching mode
>>>>> page found
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Assuming drive
>>>>> cache: write through
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Attached SCSI disk
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Attached SCSI disk
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Attached SCSI disk
>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Attached SCSI disk
>>>>> Apr 20 07:08:02 rakete kernel: EXT4-fs (sdf1): recovery complete
>>>>> Apr 20 07:08:02 rakete kernel: EXT4-fs (sdf1): mounted filesystem
>>>>> with ordered data mode. Opts: (null)
>>>>>
>>>>> ####
>>>>> still ro mode
>>>>> /dev/sdj on /mnt/raid1 type btrfs
>>>>> (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
>>>>> ####
>>>>> 14# btrfs fi show
>>>>> Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>>>>>     Total devices 3 FS bytes used 1.60GiB
>>>>>     devid    1 size 698.64GiB used 3.03GiB path /dev/sdj
>>>>>     devid    2 size 465.76GiB used 3.03GiB path /dev/sdk
>>>>>     devid    3 size 232.88GiB used 0.00B path /dev/sdl
>>>>> ####
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>> linux-btrfs" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>
>



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Question: raid1 behaviour on failure
  2016-04-21 17:40           ` Matthias Bodenbinder
@ 2016-04-22  6:02             ` Qu Wenruo
  2016-04-23  7:07               ` Matthias Bodenbinder
  0 siblings, 1 reply; 32+ messages in thread
From: Qu Wenruo @ 2016-04-22  6:02 UTC (permalink / raw)
  To: Matthias Bodenbinder, linux-btrfs



Matthias Bodenbinder wrote on 2016/04/21 19:40 +0200:
> On 21.04.2016 at 07:43, Qu Wenruo wrote:
>> There are already unmerged patches which will partly do the mdadm level behavior, like automatically change to degraded mode without making the fs RO.
>>
>> The original patchset:
>> http://comments.gmane.org/gmane.comp.file-systems.btrfs/48335
>
> The description of this patch says:
>
> "Although the one-size-fit-all solution is quite safe, it's too strict if
> data and metadata has different duplication level."
> ...
> "This patchset will introduce a new per-chunk degradable check for btrfs,
> allow above case to succeed, and it's quite small anyway."
>
>
> My raid1 is "-m raid1 -d raid1". Both have the same duplication level. Would that patch make any difference?

Without this patch, we can hit abort_transaction() at commit or space
allocation time.
(Although another user couldn't reproduce your bug.)

Although this patchset is not a full fix, it provides the basis for a
later raid1 failure fix.
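
To illustrate what "per-chunk degradable check" means in the patch description quoted above, here is a rough sketch in Python. It is only a model of the idea, not the kernel code: the function names, the (profile, devids) chunk representation and the mixed-profile layout are assumptions made up for this example. The tolerated-failure numbers are the usual ones for each profile.

```python
# Number of missing devices each block group profile can tolerate.
TOLERATED = {
    "SINGLE": 0,
    "DUP": 0,
    "RAID0": 0,
    "RAID1": 1,
    "RAID10": 1,
    "RAID5": 1,
    "RAID6": 2,
}


def global_check(chunks, missing):
    """One-size-fits-all check: compare the total number of missing
    devices against the least tolerant profile used anywhere."""
    lowest = min((TOLERATED[profile] for profile, _ in chunks), default=0)
    return len(missing) <= lowest


def per_chunk_check(chunks, missing):
    """Per-chunk check: every chunk must individually have no more
    missing stripes than its own profile tolerates."""
    return all(
        sum(1 for devid in devids if devid in missing) <= TOLERATED[profile]
        for profile, devids in chunks
    )


# Chunk layout taken from the "before cp" dump earlier in this thread.
THREAD_CHUNKS = [
    ("RAID1", [3, 2]),  # SYSTEM
    ("RAID1", [1, 3]),  # METADATA
    ("RAID1", [2, 1]),  # DATA
]

# Hypothetical layout (not from this thread): raid1 metadata, single data
# whose only stripe sits on the surviving device.
HYPOTHETICAL_MIXED = [
    ("RAID1", [1, 2]),
    ("SINGLE", [1]),
]

if __name__ == "__main__":
    print(global_check(THREAD_CHUNKS, {2}), per_chunk_check(THREAD_CHUNKS, {2}))
    print(global_check(HYPOTHETICAL_MIXED, {2}),
          per_chunk_check(HYPOTHETICAL_MIXED, {2}))
```

In this model the two checks only disagree for the mixed layout; for an all-raid1 layout with one device missing they give the same answer, which is why the patchset is a basis for the raid1 fix rather than the fix itself.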

And that's the reason Anand Jain picked these patches into the big
auto-replace patchset.

I meant to do further fixes, but now Anand Jain is pushing
auto-replace, so I haven't done anything newer since the original patchset.

>
> And: What do I need to do to test this in "debian stable"? I am not a programmer - but I know how to use git and how to compile with proper configuration directions.

Even without experience in git and kernel compilation, you can still
contribute.

Since Satoru can't reproduce the problem, would you please try his
method to reproduce it?

I noticed your kernel is 4.4, which is not old but still not the latest,
while I think Satoru is using the latest one.

If possible, please use a 4.5/4.6-rc kernel if Debian provides one.
If that's not possible (Debian doesn't provide 4.5 or 4.6-rc), would you
please try the same process Satoru provided?

Unlike in Satoru's process, your fs is not newly created (empty).

If we can reproduce it, it will be much easier to fix.

Thanks,
Qu

>
> Matthias
>
>
>> Or the latest patchset inside Anand Jain's auto-replace patchset:
>> http://thread.gmane.org/gmane.comp.file-systems.btrfs/55446
>>
>> Thanks,
>> Qu
>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>
>



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Question: raid1 behaviour on failure
  2016-04-22  5:32             ` Qu Wenruo
@ 2016-04-22  6:17               ` Satoru Takeuchi
  0 siblings, 0 replies; 32+ messages in thread
From: Satoru Takeuchi @ 2016-04-22  6:17 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo, Matthias Bodenbinder, linux-btrfs

On 2016/04/22 14:32, Qu Wenruo wrote:
>
>
> Satoru Takeuchi wrote on 2016/04/22 11:21 +0900:
>> On 2016/04/21 20:58, Qu Wenruo wrote:
>>>
>>>
>>> On 04/21/2016 03:45 PM, Satoru Takeuchi wrote:
>>>> On 2016/04/21 15:23, Satoru Takeuchi wrote:
>>>>> On 2016/04/20 14:17, Matthias Bodenbinder wrote:
>>>>>> On 18.04.2016 at 09:22, Qu Wenruo wrote:
>>>>>>> BTW, it would be better to post the dmesg for better debug.
>>>>>>
>>>>>> So here we go. I did the same test again. Here is a full log of what I
>>>>>> did. It seems to me like a bug in btrfs.
>>>>>> Sequence of events:
>>>>>> 1. mount the raid1 (2 discs with different sizes)
>>>>>> 2. unplug the biggest drive (hotplug)
>>>>>> 3. try to copy something to the degraded raid1
>>>>>> 4. plug in the device again (hotplug)
>>>>>>
>>>>>> This scenario does not work. The disc array is NOT redundant! I can
>>>>>> not work with it while a drive is missing and I can not reattach the
>>>>>> device so that everything works again.
>>>>>>
>>>>>> The btrfs module crashes during the test.
>>>>>>
>>>>>> I am using LMDE2 with backports:
>>>>>> btrfs-tools 4.4-1~bpo8+1
>>>>>> linux-image-4.4.0-0.bpo.1-amd64
>>>>>>
>>>>>> Matthias
>>>>>>
>>>>>>
>>>>>> rakete - root - /root
>>>>>> 1# mount /mnt/raid1/
>>>>>>
>>>>>> Journal:
>>>>>>
>>>>>> Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): enabling auto
>>>>>> defrag
>>>>>> Apr 20 07:01:16 rakete kernel: BTRFS info (device sdi): disk space
>>>>>> caching is enabled
>>>>>> Apr 20 07:01:16 rakete kernel: BTRFS: has skinny extents
>>>>>>
>>>>>> rakete - root - /mnt/raid1
>>>>>> 3# ll
>>>>>> total 0
>>>>>> drwxrwxr-x 1 root root   36 Nov 14  2014 AfterShot2(64-bit)
>>>>>> drwxrwxr-x 1 root root 5082 Apr 17 09:06 etc
>>>>>> drwxr-xr-x 1 root root  108 Mar 24 07:31 var
>>>>>>
>>>>>> 4# btrfs fi show
>>>>>> Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>>>>>>     Total devices 3 FS bytes used 1.60GiB
>>>>>>     devid    1 size 698.64GiB used 3.03GiB path /dev/sdg
>>>>>>     devid    2 size 465.76GiB used 3.03GiB path /dev/sdh
>>>>>>     devid    3 size 232.88GiB used 0.00B path /dev/sdi
>>>>>>
>>>>>> ####
>>>>>> unplug device sdg:
>>>>>>
>>>>>> Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical
>>>>>> block 243826688, lost sync page write
>>>>>> Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating
>>>>>> journal superblock for sdf1-8.
>>>>>> Apr 20 07:03:05 rakete kernel: Aborting journal on device sdf1-8.
>>>>>> Apr 20 07:03:05 rakete kernel: Buffer I/O error on dev sdf1, logical
>>>>>> block 243826688, lost sync page write
>>>>>> Apr 20 07:03:05 rakete kernel: JBD2: Error -5 detected when updating
>>>>>> journal superblock for sdf1-8.
>>>>>> Apr 20 07:03:05 rakete umount[16405]: umount: /mnt/raid1: target is
>>>>>> busy
>>>>>> Apr 20 07:03:05 rakete umount[16405]: (In some cases useful info
>>>>>> about processes that
>>>>>> Apr 20 07:03:05 rakete umount[16405]: use the device is found by
>>>>>> lsof(8) or fuser(1).)
>>>>>> Apr 20 07:03:05 rakete systemd[1]: mnt-raid1.mount mount process
>>>>>> exited, code=exited status=32
>>>>>> Apr 20 07:03:05 rakete systemd[1]: Failed unmounting /mnt/raid1.
>>>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: new SuperSpeed USB device
>>>>>> number 3 using xhci_hcd
>>>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device found,
>>>>>> idVendor=152d, idProduct=0567
>>>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: New USB device strings:
>>>>>> Mfr=10, Product=11, SerialNumber=5
>>>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI
>>>>>> Bridge
>>>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: Manufacturer: JMicron
>>>>>> Apr 20 07:03:24 rakete kernel: usb 3-1: SerialNumber: 152D00539000
>>>>>> Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage
>>>>>> device detected
>>>>>> Apr 20 07:03:24 rakete kernel: usb-storage 3-1:1.0: Quirks match for
>>>>>> vid 152d pid 0567: 5000000
>>>>>> Apr 20 07:03:24 rakete kernel: scsi host9: usb-storage 3-1:1.0
>>>>>> Apr 20 07:03:24 rakete mtp-probe[16424]: checking bus 3, device 3:
>>>>>> "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
>>>>>> Apr 20 07:03:24 rakete mtp-probe[16424]: bus: 3, device: 3 was not an
>>>>>> MTP device
>>>>>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:0: Direct-Access     WDC
>>>>>> WD20 02FAEX-007BA0    0125 PQ: 0 ANSI: 6
>>>>>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:1: Direct-Access     WDC
>>>>>> WD50 01AALS-00L3B2    0125 PQ: 0 ANSI: 6
>>>>>> Apr 20 07:03:25 rakete kernel: scsi 9:0:0:2: Direct-Access
>>>>>> SAMSUNG  SP2504C          0125 PQ: 0 ANSI: 6
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: Attached scsi generic sg6
>>>>>> type 0
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: Attached scsi generic sg7
>>>>>> type 0
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] 3907029168 512-byte
>>>>>> logical blocks: (2.00 TB/1.82 TiB)
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Write Protect is off
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Mode Sense: 67 00
>>>>>> 10 08
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: Attached scsi generic sg8
>>>>>> type 0
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] 976773168 512-byte
>>>>>> logical blocks: (500 GB/466 GiB)
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] No Caching mode page
>>>>>> found
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Assuming drive
>>>>>> cache: write through
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Write Protect is off
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Mode Sense: 67 00
>>>>>> 10 08
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] 488395055 512-byte
>>>>>> logical blocks: (250 GB/233 GiB)
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] No Caching mode page
>>>>>> found
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Assuming drive
>>>>>> cache: write through
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Write Protect is off
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Mode Sense: 67 00
>>>>>> 10 08
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] No Caching mode page
>>>>>> found
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Assuming drive
>>>>>> cache: write through
>>>>>> Apr 20 07:03:25 rakete kernel:  sdf: sdf1
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:0: [sdf] Attached SCSI disk
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:1: [sdj] Attached SCSI disk
>>>>>> Apr 20 07:03:25 rakete kernel: sd 9:0:0:2: [sdk] Attached SCSI disk
>>>>>> Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): recovery complete
>>>>>> Apr 20 07:03:25 rakete kernel: EXT4-fs (sdf1): mounted filesystem
>>>>>> with ordered data mode. Opts: (null)
>>>>>> Apr 20 07:03:25 rakete udisksd[3671]: Error statting /dev/sdg: No
>>>>>> such file or directory
>>>>>>
>>>>>>
>>>>>> ####
>>>>>> 5# btrfs fi show
>>>>>> Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>>>>>>     Total devices 3 FS bytes used 1.60GiB
>>>>>>     devid    2 size 465.76GiB used 3.03GiB path /dev/sdj
>>>>>>     devid    3 size 232.88GiB used 0.00B path /dev/sdk
>>>>>>     *** Some devices missing
>>>>>> ####
>>>>>
>>>>> Here the names of the *online* devices changed
>>>>> (/dev/sdh => /dev/sdj, /dev/sdi => /dev/sdk) after just
>>>>> offlining a single device (/dev/sdg). That is odd regardless
>>>>> of whether Btrfs works fine or not.
>>>>>
>>>>> Can anyone explain this behavior?
>>>>
>>>> FYI,
>>>>
>>>> I tried to reproduce this problem on VM.
>>>> Here USB storages are /dev/sd{a,b,c}.
>>>>
>>>> Steps to reproduce:
>>>>
>>>>   1. create a fs on /dev/sd{a,b,c}
>>>>   2. mount this fs
>>>>   3. Surprise unplug /dev/sdc
>>>>   4. Write to this fs till ENOSPC happens
>>>>
>>>> Then, although there are I/O errors about /dev/sdc,
>>>> device names didn't change and ro remount didn't happen.
>>>>
>>>> command log:
>>>> =================================
>>>> # mkfs.btrfs -f -m raid1 -d raid1 /dev/sd{a,b,c}
>>>> btrfs-progs v4.5.1-41-g8202204-dirty
>>>> See http://btrfs.wiki.kernel.org for more information.
>>>>
>>>> Label:              (null)
>>>> UUID:               16a54915-c807-42cf-8365-82c0780c5ab5
>>>> Node size:          16384
>>>> Sector size:        4096
>>>> Filesystem size:    15.00GiB
>>>> Block group profiles:
>>>>    Data:             RAID1             1.01GiB
>>>>    Metadata:         RAID1             1.01GiB
>>>>    System:           RAID1            12.00MiB
>>>> SSD detected:       no
>>>> Incompat features:  extref, skinny-metadata
>>>> Number of devices:  3
>>>> Devices:
>>>>     ID        SIZE  PATH
>>>>      1     5.00GiB  /dev/sda
>>>>      2     5.00GiB  /dev/sdb
>>>>      3     5.00GiB  /dev/sdc
>>>>
>>>> # mount /dev/sda /scratch_mnt/
>>>> # btrfs fi show /scratch_mnt/
>>>> Label: none  uuid: 16a54915-c807-42cf-8365-82c0780c5ab5
>>>>          Total devices 3 FS bytes used 640.00KiB
>>>>          devid    1 size 5.00GiB used 2.00GiB path /dev/sda
>>>>          devid    2 size 5.00GiB used 1.01GiB path /dev/sdb
>>>>          devid    3 size 5.00GiB used 1.01GiB path /dev/sdc
>>>>
>>>> #
>>>> # # *** surprise unplug happens here ***
>>>> #
>>>> # btrfs fi show /scratch_mnt/
>>>
>>> Would you please post the output of "btrfs-debug-tree -t 3"?
>>>
>>> My guess is that there is no raid1 stripe on device 3,
>>> so all data/metadata allocation/COW happens without problems.
>>> The "btrfs-debug-tree -t 3" output would verify that guess.
>>
>> OK, here it is.
>>
>> btrfs-debug-tree -t 3 before cp:
>> ===========================
>> btrfs-progs v4.5.1-41-g8202204-dirty
>> chunk tree
>> leaf 20987904 items 6 free space 15503 generation 5 owner 3
>> fs uuid 30771a06-e6a8-4cbc-a094-893049fa5060
>> chunk uuid 2325f1b9-1bf0-4247-8c29-7b179eabf1b2
>>     item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 16185 itemsize 98
>>         dev item devid 1 total_bytes 5368709120 bytes used 2147483648
>>         dev uuid 06bc0993-39d3-4d9a-b484-760ae2150c3a
>>     item 1 key (DEV_ITEMS DEV_ITEM 2) itemoff 16087 itemsize 98
>>         dev item devid 2 total_bytes 5368709120 bytes used 1082130432
>>         dev uuid 3868895f-295b-4a89-a01c-ad0f1c5ac758
>>     item 2 key (DEV_ITEMS DEV_ITEM 3) itemoff 15989 itemsize 98
>>         dev item devid 3 total_bytes 5368709120 bytes used 1082130432
>>         dev uuid 911e8702-9428-4b8e-bc6d-d212e909a1ef
>>     item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 20971520) itemoff 15877
>> itemsize 112
>>         chunk length 8388608 owner 2 stripe_len 65536
>>         type SYSTEM|RAID1 num_stripes 2
>>             stripe 0 devid 3 offset 1048576
>>             dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
>>             stripe 1 devid 2 offset 1048576
>>             dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
>>     item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 29360128) itemoff 15765
>> itemsize 112
>>         chunk length 1073741824 owner 2 stripe_len 65536
>>         type METADATA|RAID1 num_stripes 2
>>             stripe 0 devid 1 offset 20971520
>>             dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
>>             stripe 1 devid 3 offset 9437184
>>             dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
>>     item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 1103101952) itemoff 15653
>> itemsize 112
>>         chunk length 1073741824 owner 2 stripe_len 65536
>>         type DATA|RAID1 num_stripes 2
>>             stripe 0 devid 2 offset 9437184
>>             dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
>>             stripe 1 devid 1 offset 1094713344
>>             dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
>> total bytes 16106127360
>> bytes used 114688
>> uuid 30771a06-e6a8-4cbc-a094-893049fa5060
>> ===========================
>>
>>
>>
>> Here I hot unplug devid 2 (/dev/sdb).
>>
>>
>>
>> btrfs-debug-tree -t 3 after cp (which causes ENOSPC):
>> ===========================
>> btrfs-progs v4.5.1-41-g8202204-dirty
>> warning, device 2 is missing
>> chunk tree
>> leaf 20987904 items 11 free space 14818 generation 9 owner 3
>> fs uuid 30771a06-e6a8-4cbc-a094-893049fa5060
>> chunk uuid 2325f1b9-1bf0-4247-8c29-7b179eabf1b2
>>     item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 16185 itemsize 98
>>         dev item devid 1 total_bytes 5368709120 bytes used 4294967296
>>         dev uuid 06bc0993-39d3-4d9a-b484-760ae2150c3a
>>     item 1 key (DEV_ITEMS DEV_ITEM 2) itemoff 16087 itemsize 98
>>         dev item devid 2 total_bytes 5368709120 bytes used 5367660544
>>         dev uuid 3868895f-295b-4a89-a01c-ad0f1c5ac758
>>     item 2 key (DEV_ITEMS DEV_ITEM 3) itemoff 15989 itemsize 98
>>         dev item devid 3 total_bytes 5368709120 bytes used 5367660544
>>         dev uuid 911e8702-9428-4b8e-bc6d-d212e909a1ef
>>     item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 20971520) itemoff 15877
>> itemsize 112
>>         chunk length 8388608 owner 2 stripe_len 65536
>>         type SYSTEM|RAID1 num_stripes 2
>>             stripe 0 devid 3 offset 1048576
>>             dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
>>             stripe 1 devid 2 offset 1048576
>>             dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
>>     item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 29360128) itemoff 15765
>> itemsize 112
>>         chunk length 1073741824 owner 2 stripe_len 65536
>>         type METADATA|RAID1 num_stripes 2
>>             stripe 0 devid 1 offset 20971520
>>             dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
>>             stripe 1 devid 3 offset 9437184
>>             dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
>>     item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 1103101952) itemoff 15653
>> itemsize 112
>>         chunk length 1073741824 owner 2 stripe_len 65536
>>         type DATA|RAID1 num_stripes 2
>>             stripe 0 devid 2 offset 9437184
>>             dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
>>             stripe 1 devid 1 offset 1094713344
>>             dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
>>     item 6 key (FIRST_CHUNK_TREE CHUNK_ITEM 2176843776) itemoff 15541
>> itemsize 112
>>         chunk length 1073741824 owner 2 stripe_len 65536
>>         type DATA|RAID1 num_stripes 2
>>             stripe 0 devid 2 offset 1083179008
>>             dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
>>             stripe 1 devid 3 offset 1083179008
>>             dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
>>     item 7 key (FIRST_CHUNK_TREE CHUNK_ITEM 3250585600) itemoff 15429
>> itemsize 112
>>         chunk length 1073741824 owner 2 stripe_len 65536
>>         type DATA|RAID1 num_stripes 2
>>             stripe 0 devid 1 offset 2168455168
>>             dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
>>             stripe 1 devid 3 offset 2156920832
>>             dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
>>     item 8 key (FIRST_CHUNK_TREE CHUNK_ITEM 4324327424) itemoff 15317
>> itemsize 112
>>         chunk length 1073741824 owner 2 stripe_len 65536
>>         type DATA|RAID1 num_stripes 2
>>             stripe 0 devid 2 offset 2156920832
>>             dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
>>             stripe 1 devid 1 offset 3242196992
>>             dev uuid: 06bc0993-39d3-4d9a-b484-760ae2150c3a
>>     item 9 key (FIRST_CHUNK_TREE CHUNK_ITEM 5398069248) itemoff 15205
>> itemsize 112
>>         chunk length 1073741824 owner 2 stripe_len 65536
>>         type DATA|RAID1 num_stripes 2
>>             stripe 0 devid 2 offset 3230662656
>>             dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
>>             stripe 1 devid 3 offset 3230662656
>>             dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
>>     item 10 key (FIRST_CHUNK_TREE CHUNK_ITEM 6471811072) itemoff 15093
>> itemsize 112
>>         chunk length 1064304640 owner 2 stripe_len 65536
>>         type DATA|RAID1 num_stripes 2
>>             stripe 0 devid 2 offset 4304404480
>>             dev uuid: 3868895f-295b-4a89-a01c-ad0f1c5ac758
>>             stripe 1 devid 3 offset 4304404480
>>             dev uuid: 911e8702-9428-4b8e-bc6d-d212e909a1ef
>> total bytes 16106127360
>> bytes used 6711709696
>> uuid 30771a06-e6a8-4cbc-a094-893049fa5060
>> ===========================
>>
>> Both before and after cp, there are
>> chunks with stripes on /dev/sdb (devid 2).
>
> Right, even newly created data chunks have stripes on devid 2.
>
> That makes the original bug a little strange.
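
As a cross-check of the "after cp" dump quoted above, the per-device
"bytes used" values can be recomputed from the chunk items. This is only
a sketch: the chunk lengths and devids are copied by hand from that dump,
while the tuple layout and the function name are made up for illustration
and are not part of btrfs-progs.

```python
# (chunk type, chunk length in bytes, devids holding a stripe), copied by
# hand from items 3..10 of the "after cp" chunk tree dump. The tuple layout
# is an assumption made for this sketch only.
chunks_after_cp = [
    ("SYSTEM",   8388608,    (3, 2)),   # item 3
    ("METADATA", 1073741824, (1, 3)),   # item 4
    ("DATA",     1073741824, (2, 1)),   # item 5
    ("DATA",     1073741824, (2, 3)),   # item 6
    ("DATA",     1073741824, (1, 3)),   # item 7
    ("DATA",     1073741824, (2, 1)),   # item 8
    ("DATA",     1073741824, (2, 3)),   # item 9
    ("DATA",     1064304640, (2, 3)),   # item 10
]


def bytes_used_per_devid(chunks):
    """Sum the chunk length once for every device that holds a stripe of it."""
    used = {}
    for _chunk_type, length, devids in chunks:
        for devid in devids:
            used[devid] = used.get(devid, 0) + length
    return used


used = bytes_used_per_devid(chunks_after_cp)
for devid in sorted(used):
    print("devid", devid, "bytes used", used[devid])

# Items 6..10 do not exist in the "before cp" dump, so they were allocated
# after the unplug. Count how many of them still place a stripe on devid 2.
new_chunks = chunks_after_cp[3:]
new_on_devid2 = sum(1 for _t, _len, devids in new_chunks if 2 in devids)
print("new chunks with a stripe on devid 2:", new_on_devid2, "of", len(new_chunks))
```

The sums agree with the three DEV_ITEM entries in the dump, and most of
the chunks allocated after the unplug still reference the missing devid 2.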

Yes, so I guess the root cause of the original bug
is the renaming of the two devices that were still
online (/dev/sdh => /dev/sdj, /dev/sdi => /dev/sdk).
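
To make the renaming easy to spot, two "btrfs fi show" outputs can be
compared by devid instead of by path. The following is a rough sketch
only: the two snippets are taken from Matthias's report, and the regular
expression and helper names are assumptions for illustration, not an
existing tool.

```python
import re

# Matches lines such as:
#   devid    1 size 698.64GiB used 3.03GiB path /dev/sdg
DEVID_LINE = re.compile(
    r"devid\s+(\d+)\s+size\s+\S+\s+used\s+\S+\s+path\s+(\S+)"
)


def devid_paths(fi_show_output):
    """Return {devid: path} for every 'devid' line in 'btrfs fi show' output."""
    return {int(m.group(1)): m.group(2)
            for m in DEVID_LINE.finditer(fi_show_output)}


def compare(before_map, after_map):
    """Return (devids that disappeared, {devid: (old path, new path)})."""
    missing = sorted(set(before_map) - set(after_map))
    renamed = {devid: (before_map[devid], after_map[devid])
               for devid in sorted(before_map)
               if devid in after_map and before_map[devid] != after_map[devid]}
    return missing, renamed


# Output before the unplug (command 4# in the report).
before = """
    devid    1 size 698.64GiB used 3.03GiB path /dev/sdg
    devid    2 size 465.76GiB used 3.03GiB path /dev/sdh
    devid    3 size 232.88GiB used 0.00B path /dev/sdi
"""

# Output after the unplug (command 5# in the report).
after = """
    devid    2 size 465.76GiB used 3.03GiB path /dev/sdj
    devid    3 size 232.88GiB used 0.00B path /dev/sdk
    *** Some devices missing
"""

missing, renamed = compare(devid_paths(before), devid_paths(after))
print("missing devids:", missing)
for devid, (old, new) in renamed.items():
    print("devid", devid, "moved from", old, "to", new)
```

Keying on devid rather than on /dev/sdX names keeps the comparison
meaningful even when the kernel hands out new names after a re-scan.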

Thanks,
Satoru

>
> Thanks,
> Qu
>>
>> Thanks,
>> Satoru
>>
>>>
>>> Thanks,
>>> Qu
>>>> Label: none  uuid: 16a54915-c807-42cf-8365-82c0780c5ab5
>>>>          Total devices 3 FS bytes used 1.81GiB
>>>>          devid    1 size 5.00GiB used 2.00GiB path /dev/sda
>>>>          devid    2 size 5.00GiB used 2.01GiB path /dev/sdb
>>>>          *** Some devices missing
>>>>
>>>> # cp -a linux /scratch_mnt/
>>>> # cp -a linux /scratch_mnt/linux.2
>>>> # cp -a linux /scratch_mnt/linux.3
>>>> cp: error writing ‘/scratch_mnt/linux.3/drivers/scsi/lpfc/lpfc_els.c’:
>>>> No space left on device
>>>> ...
>>>> # mount | grep scratch
>>>> /dev/sda on /scratch_mnt type btrfs
>>>> (rw,relatime,seclabel,space_cache,subvolid=5,subvol=/)
>>>> # dmesg | tail
>>>> [ 1400.778705] BTRFS warning (device sdc): lost page write due to IO
>>>> error on /dev/sdc
>>>> [ 1438.604796] btrfs_dev_stat_print_on_error: 174 callbacks suppressed
>>>> [ 1438.604803] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125633,
>>>> rd 1, flush 276, corrupt 0, gen 0
>>>> [ 1438.609782] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125634,
>>>> rd 1, flush 276, corrupt 0, gen 0
>>>> [ 1438.613331] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125634,
>>>> rd 1, flush 277, corrupt 0, gen 0
>>>> [ 1438.669090] btrfs_end_buffer_write_sync: 52 callbacks suppressed
>>>> [ 1438.669095] BTRFS warning (device sdc): lost page write due to IO
>>>> error on /dev/sdc
>>>> [ 1438.669098] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125635,
>>>> rd 1, flush 277, corrupt 0, gen 0
>>>> [ 1438.672621] BTRFS warning (device sdc): lost page write due to IO
>>>> error on /dev/sdc
>>>> [ 1438.672626] BTRFS error (device sdc): bdev /dev/sdc errs: wr 125636,
>>>> rd 1, flush 277, corrupt 0, gen 0
>>>> =================================
>>>>
>>>> Thanks,
>>>> Satoru
>>>>
>>>>>
>>>>> Thanks,
>>>>> Satoru
>>>>>
>>>>>> still mounted in rw mode:
>>>>>> /dev/sdg on /mnt/raid1 type btrfs
>>>>>> (rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
>>>>>> ####
>>>>>> 7# cp -r /root/ .
>>>>>> cp: cannot create directory './root':
>>>>>> Input/output error
>>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>>> /dev/sdg errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
>>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>>> /dev/sdg errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
>>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>>> /dev/sdg errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
>>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>>> /dev/sdg errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
>>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>>> /dev/sdg errs: wr 0, rd 5, flush 0, corrupt 0, gen 0
>>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>>> /dev/sdg errs: wr 0, rd 6, flush 0, corrupt 0, gen 0
>>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>>> /dev/sdg errs: wr 0, rd 7, flush 0, corrupt 0, gen 0
>>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>>> /dev/sdg errs: wr 0, rd 8, flush 0, corrupt 0, gen 0
>>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>>> /dev/sdg errs: wr 0, rd 9, flush 0, corrupt 0, gen 0
>>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): bdev
>>>>>> /dev/sdg errs: wr 0, rd 10, flush 0, corrupt 0, gen 0
>>>>>> Apr 20 07:05:37 rakete kernel: BTRFS error (device sdi): error
>>>>>> reading free space cache
>>>>>> Apr 20 07:05:37 rakete kernel: BTRFS warning (device sdi): failed to
>>>>>> load free space cache for block group 20497563648, rebuilding it now
>>>>>> Apr 20 07:05:37 rakete kernel: ------------[ cut here ]------------
>>>>>> Apr 20 07:05:37 rakete kernel: WARNING: CPU: 7 PID: 16738 at
>>>>>> /build/linux-H3jpF0/linux-4.4.6/fs/btrfs/ctree.c:1156
>>>>>> __btrfs_cow_block+0x56f/0x5e0 [btrfs]()
>>>>>> Apr 20 07:05:37 rakete kernel: BTRFS: Transaction aborted (error -5)
>>>>>> Apr 20 07:05:37 rakete kernel: Modules linked in: uas usb_storage
>>>>>> pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) binfmt_misc dvb_ttpci
>>>>>> saa7146_vv ttpci_eeprom saa7146 videobuf_dma_sg videobuf_core
>>>>>> dvb_core v4l2_common videodev media cfg80211 vboxdrv(O)
>>>>>> cpufreq_powersave cpufreq_conservative cpufreq_userspace
>>>>>> cpufreq_stats snd_hda_codec_hdmi intel_rapl iosf_mbi
>>>>>> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
>>>>>> irqbypass crct10dif_pclmul crc32_pclmul eeepc_wmi asus_wmi joydev
>>>>>> sparse_keymap drbg iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek
>>>>>> rfkill ansi_cprng snd_hda_codec_generic nvidia(PO) aesni_intel
>>>>>> aes_x86_64 lrw gf128mul snd_hda_intel glue_helper ablk_helper
>>>>>> snd_hda_codec cryptd snd_hda_core serio_raw pcspkr snd_hwdep snd_pcm
>>>>>> i2c_i801 snd_timer snd lpc_ich soundcore 8250_fintek mei_me shpchp mei
>>>>>> Apr 20 07:05:37 rakete kernel:  mfd_core battery tpm_tis tpm evdev
>>>>>> processor drm fuse ecryptfs cbc sha256_ssse3 sha256_generic hmac
>>>>>> encrypted_keys parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache
>>>>>> jbd2 btrfs raid456 async_raid6_recov async_memcpy async_pq async_xor
>>>>>> async_tx xor hid_generic usbhid hid raid6_pq libcrc32c crc32c_generic
>>>>>> md_mod dm_mirror dm_region_hash dm_log dm_mod sr_mod sg cdrom sd_mod
>>>>>> ata_generic ahci libahci pata_via xhci_pci ehci_pci crc32c_intel
>>>>>> xhci_hcd ehci_hcd libata psmouse scsi_mod atl1c usbcore usb_common
>>>>>> fjes video wmi fan thermal button
>>>>>> Apr 20 07:05:37 rakete kernel: CPU: 7 PID: 16738 Comm: cp Tainted:
>>>>>> P           O    4.4.0-0.bpo.1-amd64 #1 Debian 4.4.6-1~bpo8+1
>>>>>> Apr 20 07:05:37 rakete kernel: Hardware name: System manufacturer
>>>>>> System Product Name/P8H67-V, BIOS 3707 07/12/2013
>>>>>> Apr 20 07:05:37 rakete kernel:  0000000000000286 000000006a1407c8
>>>>>> ffffffff812ed425 ffff88016b6dfb90
>>>>>> Apr 20 07:05:37 rakete kernel:  ffffffffa03817b8 ffffffff81077ea1
>>>>>> ffff88018e7fcd30 ffff88016b6dfbe8
>>>>>> Apr 20 07:05:37 rakete kernel:  ffff88005d863e88 ffff8801cde7a980
>>>>>> ffff88018e7fce48 ffffffff81077f2c
>>>>>> Apr 20 07:05:37 rakete kernel: Call Trace:
>>>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffff812ed425>] ?
>>>>>> dump_stack+0x5c/0x77
>>>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffff81077ea1>] ?
>>>>>> warn_slowpath_common+0x81/0xb0
>>>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffff81077f2c>] ?
>>>>>> warn_slowpath_fmt+0x5c/0x80
>>>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffffa02d74af>] ?
>>>>>> __btrfs_cow_block+0x56f/0x5e0 [btrfs]
>>>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffffa02d76af>] ?
>>>>>> btrfs_cow_block+0x10f/0x1d0 [btrfs]
>>>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffffa02db2cd>] ?
>>>>>> btrfs_search_slot+0x1fd/0xa30 [btrfs]
>>>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffffa02dd3f1>] ?
>>>>>> btrfs_insert_empty_items+0x71/0xc0 [btrfs]
>>>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffff811f4d92>] ?
>>>>>> insert_inode_locked4+0xa2/0x1c0
>>>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffffa030ee5d>] ?
>>>>>> btrfs_new_inode+0x1cd/0x590 [btrfs]
>>>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffffa0310a77>] ?
>>>>>> btrfs_mkdir+0x107/0x1f0 [btrfs]
>>>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffff811e80b0>] ?
>>>>>> vfs_mkdir+0xb0/0x140
>>>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffff811e9d3e>] ?
>>>>>> SyS_mkdir+0xce/0x110
>>>>>> Apr 20 07:05:37 rakete kernel:  [<ffffffff81592736>] ?
>>>>>> system_call_fast_compare_end+0xc/0x6b
>>>>>> Apr 20 07:05:37 rakete kernel: ---[ end trace 025eb0e83ffed96f ]---
>>>>>> Apr 20 07:05:37 rakete kernel: BTRFS: error (device sdi) in
>>>>>> __btrfs_cow_block:1156: errno=-5 IO failure
>>>>>> Apr 20 07:05:37 rakete kernel: BTRFS info (device sdi): forced
>>>>>> readonly
>>>>>>
>>>>>> ####
>>>>>> Try to copy again:
>>>>>> 11# cp -r /root/ .
>>>>>> cp: cannot create directory './root': Read-only file system
>>>>>> ####
>>>>>> /dev/sdg on /mnt/raid1 type btrfs
>>>>>> (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
>>>>>> ####
>>>>>> plug in device sdg again:
>>>>>>
>>>>>> Apr 20 07:07:39 rakete udisksd[3671]: Cleaning up mount point
>>>>>> /media/matthias/BACKUP (device 8:81 no longer exist)
>>>>>> Apr 20 07:07:39 rakete kernel: usb 3-1: USB disconnect, device
>>>>>> number 3
>>>>>> Apr 20 07:07:39 rakete udisksd[3671]: Error statting /dev/sdg: No
>>>>>> such file or directory
>>>>>> Apr 20 07:07:39 rakete umount[16807]: umount: /mnt/raid1: target is
>>>>>> busy
>>>>>> Apr 20 07:07:39 rakete umount[16807]: (In some cases useful info
>>>>>> about processes that
>>>>>> Apr 20 07:07:39 rakete umount[16807]: use the device is found by
>>>>>> lsof(8) or fuser(1).)
>>>>>> Apr 20 07:07:39 rakete systemd[1]: mnt-raid1.mount mount process
>>>>>> exited, code=exited status=32
>>>>>> Apr 20 07:07:39 rakete systemd[1]: Failed unmounting /mnt/raid1.
>>>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: new SuperSpeed USB device
>>>>>> number 4 using xhci_hcd
>>>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: New USB device found,
>>>>>> idVendor=152d, idProduct=0567
>>>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: New USB device strings:
>>>>>> Mfr=10, Product=11, SerialNumber=5
>>>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI
>>>>>> Bridge
>>>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: Manufacturer: JMicron
>>>>>> Apr 20 07:08:01 rakete kernel: usb 3-1: SerialNumber: 152D00539000
>>>>>> Apr 20 07:08:01 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage
>>>>>> device detected
>>>>>> Apr 20 07:08:01 rakete kernel: usb-storage 3-1:1.0: Quirks match for
>>>>>> vid 152d pid 0567: 5000000
>>>>>> Apr 20 07:08:01 rakete kernel: scsi host10: usb-storage 3-1:1.0
>>>>>> Apr 20 07:08:01 rakete mtp-probe[16826]: checking bus 3, device 4:
>>>>>> "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
>>>>>> Apr 20 07:08:01 rakete mtp-probe[16826]: bus: 3, device: 4 was not an
>>>>>> MTP device
>>>>>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:0: Direct-Access     WDC
>>>>>> WD20 02FAEX-007BA0    0125 PQ: 0 ANSI: 6
>>>>>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:1: Direct-Access     WDC
>>>>>> WD75 00AACS-00C7B0    0125 PQ: 0 ANSI: 6
>>>>>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:2: Direct-Access     WDC
>>>>>> WD50 01AALS-00L3B2    0125 PQ: 0 ANSI: 6
>>>>>> Apr 20 07:08:02 rakete kernel: scsi 10:0:0:3: Direct-Access
>>>>>> SAMSUNG  SP2504C          0125 PQ: 0 ANSI: 6
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: Attached scsi generic sg6
>>>>>> type 0
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] 3907029168 512-byte
>>>>>> logical blocks: (2.00 TB/1.82 TiB)
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Write Protect is off
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Mode Sense: 67 00
>>>>>> 10 08
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: Attached scsi generic sg7
>>>>>> type 0
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] No Caching mode
>>>>>> page found
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Assuming drive
>>>>>> cache: write through
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] 1465149168 512-byte
>>>>>> logical blocks: (750 GB/699 GiB)
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: Attached scsi generic sg8
>>>>>> type 0
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Write Protect is off
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Mode Sense: 67 00
>>>>>> 10 08
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: Attached scsi generic sg9
>>>>>> type 0
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] 976773168 512-byte
>>>>>> logical blocks: (500 GB/466 GiB)
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] No Caching mode
>>>>>> page found
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Assuming drive
>>>>>> cache: write through
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Write Protect is off
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Mode Sense: 67 00
>>>>>> 10 08
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] 488395055 512-byte
>>>>>> logical blocks: (250 GB/233 GiB)
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] No Caching mode
>>>>>> page found
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Assuming drive
>>>>>> cache: write through
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Write Protect is off
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Mode Sense: 67 00
>>>>>> 10 08
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] No Caching mode
>>>>>> page found
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Assuming drive
>>>>>> cache: write through
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:0: [sdf] Attached SCSI disk
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:1: [sdj] Attached SCSI disk
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:2: [sdk] Attached SCSI disk
>>>>>> Apr 20 07:08:02 rakete kernel: sd 10:0:0:3: [sdl] Attached SCSI disk
>>>>>> Apr 20 07:08:02 rakete kernel: EXT4-fs (sdf1): recovery complete
>>>>>> Apr 20 07:08:02 rakete kernel: EXT4-fs (sdf1): mounted filesystem
>>>>>> with ordered data mode. Opts: (null)
>>>>>>
>>>>>> ####
>>>>>> still ro mode
>>>>>> /dev/sdj on /mnt/raid1 type btrfs
>>>>>> (ro,noatime,space_cache,autodefrag,subvolid=5,subvol=/)
>>>>>> ####
>>>>>> 14# btrfs fi show
>>>>>> Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>>>>>>     Total devices 3 FS bytes used 1.60GiB
>>>>>>     devid    1 size 698.64GiB used 3.03GiB path /dev/sdj
>>>>>>     devid    2 size 465.76GiB used 3.03GiB path /dev/sdk
>>>>>>     devid    3 size 232.88GiB used 0.00B path /dev/sdl
>>>>>> ####
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>>> linux-btrfs" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Question: raid1 behaviour on failure
  2016-04-22  6:02             ` Qu Wenruo
@ 2016-04-23  7:07               ` Matthias Bodenbinder
  2016-04-23  7:17                 ` Matthias Bodenbinder
                                   ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Matthias Bodenbinder @ 2016-04-23  7:07 UTC (permalink / raw)
  To: linux-btrfs

Here is my newest test. The backports provide a 4.5 kernel:

####
kernel: 4.5.0-0.bpo.1-amd64
btrfs-tools: 4.4-1~bpo8+1 
####

This time the raid1 is automatically unmounted after I unplug the device, and it cannot be mounted while the device is missing. See below.

Matthias


####
1) turn on the Fantec case:

Apr 23 08:45:38 rakete kernel: usb 3-1: new SuperSpeed USB device number 2 using xhci_hcd
Apr 23 08:45:38 rakete kernel: usb 3-1: New USB device found, idVendor=152d, idProduct=0567
Apr 23 08:45:38 rakete kernel: usb 3-1: New USB device strings: Mfr=10, Product=11, SerialNumber=5
Apr 23 08:45:38 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
Apr 23 08:45:38 rakete kernel: usb 3-1: Manufacturer: JMicron
Apr 23 08:45:38 rakete kernel: usb 3-1: SerialNumber: 152D00539000
Apr 23 08:45:38 rakete mtp-probe[3641]: checking bus 3, device 2: "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
Apr 23 08:45:38 rakete mtp-probe[3641]: bus: 3, device: 2 was not an MTP device
Apr 23 08:45:38 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage device detected
Apr 23 08:45:38 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid 152d pid 0567: 5000000
Apr 23 08:45:38 rakete kernel: scsi host8: usb-storage 3-1:1.0
Apr 23 08:45:38 rakete kernel: usbcore: registered new interface driver usb-storage
Apr 23 08:45:38 rakete kernel: usbcore: registered new interface driver uas
Apr 23 08:45:39 rakete kernel: scsi 8:0:0:0: Direct-Access     WDC WD20 02FAEX-007BA0    0125 PQ: 0 ANSI: 6
Apr 23 08:45:39 rakete kernel: scsi 8:0:0:1: Direct-Access     WDC WD75 00AACS-00C7B0    0125 PQ: 0 ANSI: 6
Apr 23 08:45:39 rakete kernel: scsi 8:0:0:2: Direct-Access     WDC WD50 01AALS-00L3B2    0125 PQ: 0 ANSI: 6
Apr 23 08:45:39 rakete kernel: scsi 8:0:0:3: Direct-Access     SAMSUNG  SP2504C          0125 PQ: 0 ANSI: 6
Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: Attached scsi generic sg6 type 0
Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: Attached scsi generic sg7 type 0
Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] Write Protect is off
Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] Mode Sense: 67 00 10 08
Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: Attached scsi generic sg8 type 0
Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] 1465149168 512-byte logical blocks: (750 GB/699 GiB)
Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] Write Protect is off
Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] Mode Sense: 67 00 10 08
Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: [sdh] 976773168 512-byte logical blocks: (500 GB/466 GiB)
Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: Attached scsi generic sg9 type 0
Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] No Caching mode page found
Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] Assuming drive cache: write through
Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: [sdh] Write Protect is off
Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: [sdh] Mode Sense: 67 00 10 08
Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: [sdi] 488395055 512-byte logical blocks: (250 GB/233 GiB)
Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] No Caching mode page found
Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] Assuming drive cache: write through
Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: [sdi] Write Protect is off
Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: [sdi] Mode Sense: 67 00 10 08
Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: [sdh] No Caching mode page found
Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: [sdh] Assuming drive cache: write through
Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: [sdi] No Caching mode page found
Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: [sdi] Assuming drive cache: write through
Apr 23 08:45:39 rakete kernel:  sdf: sdf1
Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] Attached SCSI disk
Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] Attached SCSI disk
Apr 23 08:45:40 rakete kernel: sd 8:0:0:2: [sdh] Attached SCSI disk
Apr 23 08:45:40 rakete kernel: sd 8:0:0:3: [sdi] Attached SCSI disk
Apr 23 08:45:40 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 1 transid 89 /dev/sdg
Apr 23 08:45:40 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 2 transid 89 /dev/sdh
Apr 23 08:45:40 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 3 transid 89 /dev/sdi
Apr 23 08:45:40 rakete kernel: EXT4-fs (sdf1): mounted filesystem with ordered data mode. Opts: (null)
Apr 23 08:45:40 rakete udisksd[2422]: Mounted /dev/sdf1 at /media/matthias/BACKUP on behalf of uid 1000

####

7# mount /mnt/raid1/

Apr 23 08:47:31 rakete kernel: BTRFS info (device sdi): enabling auto defrag
Apr 23 08:47:31 rakete kernel: BTRFS info (device sdi): disk space caching is enabled
Apr 23 08:47:31 rakete kernel: BTRFS: has skinny extents

8# btrfs fi show
Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
	Total devices 3 FS bytes used 1.60GiB
	devid    1 size 698.64GiB used 3.03GiB path /dev/sdg
	devid    2 size 465.76GiB used 3.03GiB path /dev/sdh
	devid    3 size 232.88GiB used 0.00B path /dev/sdi

9# ls -l /mnt/raid1/
total 0
drwxrwxr-x 1 root root   36 Nov 14  2014 AfterShot2(64-bit)
drwxrwxr-x 1 root root 5082 Apr 17 09:06 etc
drwxr-xr-x 1 root root  108 Mar 24 07:31 var

####

Unplug the biggest HD

Apr 23 08:51:29 rakete kernel: usb 3-1: USB disconnect, device number 2
Apr 23 08:51:29 rakete kernel: Buffer I/O error on dev sdf1, logical block 243826688, lost sync page write
Apr 23 08:51:29 rakete kernel: JBD2: Error -5 detected when updating journal superblock for sdf1-8.
Apr 23 08:51:29 rakete kernel: Aborting journal on device sdf1-8.
Apr 23 08:51:29 rakete kernel: Buffer I/O error on dev sdf1, logical block 243826688, lost sync page write
Apr 23 08:51:29 rakete kernel: JBD2: Error -5 detected when updating journal superblock for sdf1-8.
Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 5, flush 0, corrupt 0, gen 0
Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 6, flush 0, corrupt 0, gen 0
Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 7, flush 0, corrupt 0, gen 0
Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 8, flush 0, corrupt 0, gen 0
Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 9, flush 0, corrupt 0, gen 0
Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 10, flush 0, corrupt 0, gen 0
Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): error reading free space cache
Apr 23 08:51:29 rakete kernel: BTRFS warning (device sdi): failed to load free space cache for block group 20497563648, rebuilding it now
Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): error reading free space cache
Apr 23 08:51:29 rakete kernel: BTRFS warning (device sdi): failed to load free space cache for block group 21571305472, rebuilding it now
Apr 23 08:51:29 rakete kernel: BTRFS: error (device sdi) in btrfs_commit_transaction:2142: errno=-5 IO failure (Error while writing out transaction)
Apr 23 08:51:29 rakete kernel: BTRFS info (device sdi): forced readonly
Apr 23 08:51:29 rakete kernel: BTRFS warning (device sdi): Skipping commit of aborted transaction.
Apr 23 08:51:29 rakete kernel: ------------[ cut here ]------------
Apr 23 08:51:29 rakete kernel: WARNING: CPU: 1 PID: 4277 at /build/linux-Ki7dwx/linux-4.5.1/fs/btrfs/transaction.c:1764 cleanup_transaction+0x96/0x300 [btrfs]()
Apr 23 08:51:29 rakete kernel: BTRFS: Transaction aborted (error -5)
Apr 23 08:51:29 rakete kernel: Modules linked in: uas(E) usb_storage(E) pci_stub(E) vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) binfmt_misc(E) dvb_ttpci(E) saa7146_vv(E) ttpci_eeprom(E) saa7146(E) videobuf_dma_sg(E) videobuf_core(E) dvb_core(E) v4l2_common(E) videodev(E) media(E) cfg80211(E) vboxdrv(OE) cpufreq_powersave(E) cpufreq_conservative(E) cpufreq_userspace(E) cpufreq_stats(E) snd_hda_codec_hdmi(E) intel_rapl(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) drbg(E) ansi_cprng(E) eeepc_wmi(E) asus_wmi(E) sparse_keymap(E) iTCO_wdt(E) joydev(E) iTCO_vendor_support(E) rfkill(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) snd_hda_codec_realtek(E) pcspkr(E) snd_hda_codec_generic(E)
Apr 23 08:51:29 rakete kernel:  serio_raw(E) i2c_i801(E) lpc_ich(E) mfd_core(E) snd_hda_intel(E) snd_hda_codec(E) snd_hda_core(E) snd_hwdep(E) snd_pcm(E) evdev(E) battery(E) snd_timer(E) 8250_fintek(E) snd(E) mei_me(E) soundcore(E) mei(E) shpchp(E) tpm_tis(E) tpm(E) processor(E) nvidia(POE) drm(E) fuse(E) ecryptfs(E) cbc(E) hmac(E) encrypted_keys(E) parport_pc(E) ppdev(E) lp(E) parport(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) btrfs(E) raid456(E) async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) async_tx(E) xor(E) hid_generic(E) usbhid(E) hid(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) md_mod(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) sg(E) sr_mod(E) cdrom(E) sd_mod(E) ata_generic(E) ahci(E) pata_via(E) libahci(E) crc32c_intel(E) xhci_pci(E) ehci_pci(E) psmouse(E) libata(E) xhci_hcd(E)
Apr 23 08:51:29 rakete kernel:  ehci_hcd(E) atl1c(E) scsi_mod(E) usbcore(E) usb_common(E) wmi(E) fan(E) thermal(E) fjes(E) video(E) button(E)
Apr 23 08:51:29 rakete kernel: CPU: 1 PID: 4277 Comm: umount Tainted: P           OE   4.5.0-0.bpo.1-amd64 #1 Debian 4.5.1-1~bpo8+1
Apr 23 08:51:29 rakete kernel: Hardware name: System manufacturer System Product Name/P8H67-V, BIOS 3707 07/12/2013
Apr 23 08:51:29 rakete kernel:  0000000000000286 00000000bfe9047d ffffffff813099f5 ffff8801c1103ca8
Apr 23 08:51:29 rakete kernel:  ffffffffc03a8c98 ffffffff81079a61 ffff880214562d90 ffff8801c1103d00
Apr 23 08:51:29 rakete kernel:  ffff8801c5165980 00000000fffffffb ffff880214562d90 ffffffff81079aec
Apr 23 08:51:29 rakete kernel: Call Trace:
Apr 23 08:51:29 rakete kernel:  [<ffffffff813099f5>] ? dump_stack+0x5c/0x77
Apr 23 08:51:29 rakete kernel:  [<ffffffff81079a61>] ? warn_slowpath_common+0x81/0xb0
Apr 23 08:51:29 rakete kernel:  [<ffffffff81079aec>] ? warn_slowpath_fmt+0x5c/0x80
Apr 23 08:51:29 rakete kernel:  [<ffffffffc0320e46>] ? cleanup_transaction+0x96/0x300 [btrfs]
Apr 23 08:51:29 rakete kernel:  [<ffffffff810b94a0>] ? wait_woken+0x90/0x90
Apr 23 08:51:29 rakete kernel:  [<ffffffffc0321bf3>] ? btrfs_commit_transaction+0x2b3/0xa30 [btrfs]
Apr 23 08:51:29 rakete kernel:  [<ffffffffc0322406>] ? start_transaction+0x96/0x4d0 [btrfs]
Apr 23 08:51:29 rakete kernel:  [<ffffffffc031d0d2>] ? close_ctree+0x2b2/0x360 [btrfs]
Apr 23 08:51:29 rakete kernel:  [<ffffffff81206fd7>] ? evict_inodes+0x147/0x170
Apr 23 08:51:29 rakete kernel:  [<ffffffff811eda39>] ? generic_shutdown_super+0x69/0xf0
Apr 23 08:51:29 rakete kernel:  [<ffffffff811edace>] ? kill_anon_super+0xe/0x20
Apr 23 08:51:29 rakete kernel:  [<ffffffffc02f1603>] ? btrfs_kill_super+0x13/0x100 [btrfs]
Apr 23 08:51:29 rakete kernel:  [<ffffffff811ed4c4>] ? deactivate_locked_super+0x34/0x60
Apr 23 08:51:29 rakete kernel:  [<ffffffff81209d5b>] ? cleanup_mnt+0x3b/0x80
Apr 23 08:51:29 rakete kernel:  [<ffffffff81096114>] ? task_work_run+0x74/0x90
Apr 23 08:51:29 rakete kernel:  [<ffffffff8100334a>] ? exit_to_usermode_loop+0xba/0xc0
Apr 23 08:51:29 rakete kernel:  [<ffffffff81003bcf>] ? syscall_return_slowpath+0x8f/0x110
Apr 23 08:51:29 rakete kernel:  [<ffffffff815b9918>] ? int_ret_from_sys_call+0x25/0x8f
Apr 23 08:51:29 rakete kernel: ---[ end trace 6bbe2b6d20973e0e ]---
Apr 23 08:51:29 rakete kernel: BTRFS: error (device sdi) in cleanup_transaction:1764: errno=-5 IO failure
Apr 23 08:51:29 rakete kernel: BTRFS info (device sdi): delayed_refs has NO entry
Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): commit super ret -5
Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): cleaner transaction attach returned -30

....

Apr 23 08:51:48 rakete kernel: usb 3-1: new SuperSpeed USB device number 3 using xhci_hcd
Apr 23 08:51:48 rakete kernel: usb 3-1: New USB device found, idVendor=152d, idProduct=0567
Apr 23 08:51:48 rakete kernel: usb 3-1: New USB device strings: Mfr=10, Product=11, SerialNumber=5
Apr 23 08:51:48 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
Apr 23 08:51:48 rakete kernel: usb 3-1: Manufacturer: JMicron
Apr 23 08:51:48 rakete kernel: usb 3-1: SerialNumber: 152D00539000
Apr 23 08:51:48 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage device detected
Apr 23 08:51:48 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid 152d pid 0567: 5000000
Apr 23 08:51:48 rakete kernel: scsi host9: usb-storage 3-1:1.0
Apr 23 08:51:48 rakete mtp-probe[4301]: checking bus 3, device 3: "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
Apr 23 08:51:48 rakete mtp-probe[4301]: bus: 3, device: 3 was not an MTP device
Apr 23 08:51:49 rakete kernel: scsi 9:0:0:0: Direct-Access     WDC WD20 02FAEX-007BA0    0125 PQ: 0 ANSI: 6
Apr 23 08:51:49 rakete kernel: scsi 9:0:0:1: Direct-Access     WDC WD50 01AALS-00L3B2    0125 PQ: 0 ANSI: 6
Apr 23 08:51:49 rakete kernel: scsi 9:0:0:2: Direct-Access     SAMSUNG  SP2504C          0125 PQ: 0 ANSI: 6
Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: Attached scsi generic sg6 type 0
Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: Attached scsi generic sg7 type 0
Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: Attached scsi generic sg8 type 0
Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] Write Protect is off
Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] Mode Sense: 67 00 10 08
Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] 976773168 512-byte logical blocks: (500 GB/466 GiB)
Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] No Caching mode page found
Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] Assuming drive cache: write through
Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] Write Protect is off
Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] Mode Sense: 67 00 10 08
Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] 488395055 512-byte logical blocks: (250 GB/233 GiB)
Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] No Caching mode page found
Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] Assuming drive cache: write through
Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] Write Protect is off
Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] Mode Sense: 67 00 10 08
Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] No Caching mode page found
Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] Assuming drive cache: write through
Apr 23 08:51:49 rakete kernel:  sdf: sdf1
Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] Attached SCSI disk
Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] Attached SCSI disk
Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] Attached SCSI disk
Apr 23 08:51:49 rakete udisksd[2422]: Mounted /dev/sdf1 at /media/matthias/BACKUP on behalf of uid 1000
Apr 23 08:51:49 rakete kernel: EXT4-fs (sdf1): recovery complete
Apr 23 08:51:49 rakete kernel: EXT4-fs (sdf1): mounted filesystem with ordered data mode. Opts: (null)

####

10# btrfs fi show
warning, device 1 is missing
warning devid 1 not found already
Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
	Total devices 3 FS bytes used 1.60GiB
	devid    2 size 465.76GiB used 3.03GiB path /dev/sdg
	devid    3 size 232.88GiB used 0.00B path /dev/sdh
	*** Some devices missing

####
This time the raid1 ends up unmounted after I remove the device. This differs from what I saw with kernel 4.4.
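
The "btrfs fi show" output above can also be checked mechanically. The following is only a minimal sketch: the helper name missing_devids is made up for illustration, and it assumes devids are numbered 1..N, which matches this filesystem but is not guaranteed in general (for example after a device has been removed or replaced).

```python
import re

# Output of "btrfs fi show" copied from the message above
# (indentation written with spaces here).
FI_SHOW = """\
warning, device 1 is missing
warning devid 1 not found already
Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
    Total devices 3 FS bytes used 1.60GiB
    devid    2 size 465.76GiB used 3.03GiB path /dev/sdg
    devid    3 size 232.88GiB used 0.00B path /dev/sdh
    *** Some devices missing
"""


def missing_devids(text):
    """Return the devids the filesystem expects but does not list."""
    total = int(re.search(r"Total devices (\d+)", text).group(1))
    # Only lines that start with "devid" count as present devices;
    # the "warning devid 1 ..." line is deliberately not matched.
    present = {int(d) for d in re.findall(r"^\s*devid\s+(\d+)\s", text, re.M)}
    # Assumption: devids run from 1 to total.
    return sorted(set(range(1, total + 1)) - present)


print(missing_devids(FI_SHOW))  # prints [1]
```

For this output the sketch reports devid 1, which agrees with the two warning lines printed by the tool itself.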

12# ls -l /mnt/raid1/
total 0

####
Trying to mount it again:

14# mount /mnt/raid1/
mount: wrong fs type, bad option, bad superblock on /dev/sdh,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.
####

Apr 23 08:54:35 rakete kernel: BTRFS info (device sdh): enabling auto defrag
Apr 23 08:54:35 rakete kernel: BTRFS info (device sdh): disk space caching is enabled
Apr 23 08:54:35 rakete kernel: BTRFS: has skinny extents
Apr 23 08:54:35 rakete kernel: BTRFS: failed to read the system array on sdh
Apr 23 08:54:35 rakete kernel: BTRFS: open_ctree failed

####

Plug in the device again.

Apr 23 08:55:44 rakete kernel: usb 3-1: USB disconnect, device number 3
Apr 23 08:56:06 rakete kernel: usb 3-1: new SuperSpeed USB device number 4 using xhci_hcd
Apr 23 08:56:06 rakete kernel: usb 3-1: New USB device found, idVendor=152d, idProduct=0567
Apr 23 08:56:06 rakete kernel: usb 3-1: New USB device strings: Mfr=10, Product=11, SerialNumber=5
Apr 23 08:56:06 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
Apr 23 08:56:06 rakete kernel: usb 3-1: Manufacturer: JMicron
Apr 23 08:56:06 rakete kernel: usb 3-1: SerialNumber: 152D00539000
Apr 23 08:56:06 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage device detected
Apr 23 08:56:06 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid 152d pid 0567: 5000000
Apr 23 08:56:06 rakete kernel: scsi host10: usb-storage 3-1:1.0
Apr 23 08:56:06 rakete mtp-probe[4751]: checking bus 3, device 4: "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
Apr 23 08:56:06 rakete mtp-probe[4751]: bus: 3, device: 4 was not an MTP device
Apr 23 08:56:07 rakete kernel: scsi 10:0:0:0: Direct-Access     WDC WD20 02FAEX-007BA0    0125 PQ: 0 ANSI: 6
Apr 23 08:56:07 rakete kernel: scsi 10:0:0:1: Direct-Access     WDC WD75 00AACS-00C7B0    0125 PQ: 0 ANSI: 6
Apr 23 08:56:07 rakete kernel: scsi 10:0:0:2: Direct-Access     WDC WD50 01AALS-00L3B2    0125 PQ: 0 ANSI: 6
Apr 23 08:56:07 rakete kernel: scsi 10:0:0:3: Direct-Access     SAMSUNG  SP2504C          0125 PQ: 0 ANSI: 6
Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: Attached scsi generic sg6 type 0
Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] Write Protect is off
Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] Mode Sense: 67 00 10 08
Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: Attached scsi generic sg7 type 0
Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] No Caching mode page found
Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] Assuming drive cache: write through
Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: Attached scsi generic sg8 type 0
Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] 1465149168 512-byte logical blocks: (750 GB/699 GiB)
Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: Attached scsi generic sg9 type 0
Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] Write Protect is off
Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] Mode Sense: 67 00 10 08
Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] 976773168 512-byte logical blocks: (500 GB/466 GiB)
Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] No Caching mode page found
Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] Assuming drive cache: write through
Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] 488395055 512-byte logical blocks: (250 GB/233 GiB)
Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] Write Protect is off
Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] Mode Sense: 67 00 10 08
Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] Write Protect is off
Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] Mode Sense: 67 00 10 08
Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] No Caching mode page found
Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] Assuming drive cache: write through
Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] No Caching mode page found
Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] Assuming drive cache: write through
Apr 23 08:56:07 rakete kernel:  sdf: sdf1
Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] Attached SCSI disk
Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] Attached SCSI disk
Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] Attached SCSI disk
Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] Attached SCSI disk
Apr 23 08:56:07 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 1 transid 89 /dev/sdg
Apr 23 08:56:07 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 2 transid 89 /dev/sdh
Apr 23 08:56:07 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 3 transid 89 /dev/sdi
Apr 23 08:56:07 rakete kernel: EXT4-fs (sdf1): recovery complete
Apr 23 08:56:07 rakete kernel: EXT4-fs (sdf1): mounted filesystem with ordered data mode. Opts: (null)

####

15# btrfs fi show
Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
	Total devices 3 FS bytes used 1.60GiB
	devid    1 size 698.64GiB used 3.03GiB path /dev/sdg
	devid    2 size 465.76GiB used 3.03GiB path /dev/sdh
	devid    3 size 232.88GiB used 0.00B path /dev/sdi

####

18# mount /mnt/raid1/

Apr 23 08:57:00 rakete kernel: BTRFS info (device sdi): enabling auto defrag
Apr 23 08:57:00 rakete kernel: BTRFS info (device sdi): disk space caching is enabled
Apr 23 08:57:00 rakete kernel: BTRFS: has skinny extents

####

19# ls -l /mnt/raid1/
total 0
drwxrwxr-x 1 root root   36 Nov 14  2014 AfterShot2(64-bit)
drwxrwxr-x 1 root root 5082 Apr 17 09:06 etc
drwxr-xr-x 1 root root  108 Mar 24 07:31 var

####





* Re: Question: raid1 behaviour on failure
  2016-04-23  7:07               ` Matthias Bodenbinder
@ 2016-04-23  7:17                 ` Matthias Bodenbinder
  2016-04-26  8:17                 ` Satoru Takeuchi
  2016-04-26 15:16                 ` Henk Slager
  2 siblings, 0 replies; 32+ messages in thread
From: Matthias Bodenbinder @ 2016-04-23  7:17 UTC (permalink / raw)
  To: linux-btrfs

On 23.04.2016 at 09:07, Matthias Bodenbinder wrote:
> 14# mount /mnt/raid1/
> mount: wrong fs type, bad option, bad superblock on /dev/sdh,
>        missing codepage or helper program, or other error
> 
>        In some cases useful info is found in syslog - try
>        dmesg | tail or so.
> ####

My /etc/fstab has the following entry for the raid1:

UUID=16d5891f-5d52-4b29-8591-588ddf11e73d /mnt/raid1 btrfs    noauto,noatime,autodefrag  1   2

Matthias



* Re: Question: raid1 behaviour on failure
  2016-04-23  7:07               ` Matthias Bodenbinder
  2016-04-23  7:17                 ` Matthias Bodenbinder
@ 2016-04-26  8:17                 ` Satoru Takeuchi
  2016-04-26 15:16                 ` Henk Slager
  2 siblings, 0 replies; 32+ messages in thread
From: Satoru Takeuchi @ 2016-04-26  8:17 UTC (permalink / raw)
  To: Matthias Bodenbinder, linux-btrfs

On 2016/04/23 16:07, Matthias Bodenbinder wrote:
> Here is my newest test. The backports provide a 4.5 kernel:
>
> ####
> kernel: 4.5.0-0.bpo.1-amd64
> btrfs-tools: 4.4-1~bpo8+1
> ####
>
> This time the raid1 is automatically unmounted after I unplug the device and it can not be mounted while the device is missing. See below.
>
> Matthias

As I said before, I think this problem is caused by the
hardware, not by Btrfs.

Please see the following comments.

>
>
> ####
> 1) turn on the Fantec case:
>
> Apr 23 08:45:38 rakete kernel: usb 3-1: new SuperSpeed USB device number 2 using xhci_hcd
> Apr 23 08:45:38 rakete kernel: usb 3-1: New USB device found, idVendor=152d, idProduct=0567
> Apr 23 08:45:38 rakete kernel: usb 3-1: New USB device strings: Mfr=10, Product=11, SerialNumber=5
> Apr 23 08:45:38 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
> Apr 23 08:45:38 rakete kernel: usb 3-1: Manufacturer: JMicron
> Apr 23 08:45:38 rakete kernel: usb 3-1: SerialNumber: 152D00539000
> Apr 23 08:45:38 rakete mtp-probe[3641]: checking bus 3, device 2: "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
> Apr 23 08:45:38 rakete mtp-probe[3641]: bus: 3, device: 2 was not an MTP device
> Apr 23 08:45:38 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage device detected
> Apr 23 08:45:38 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid 152d pid 0567: 5000000
> Apr 23 08:45:38 rakete kernel: scsi host8: usb-storage 3-1:1.0
> Apr 23 08:45:38 rakete kernel: usbcore: registered new interface driver usb-storage
> Apr 23 08:45:38 rakete kernel: usbcore: registered new interface driver uas
> Apr 23 08:45:39 rakete kernel: scsi 8:0:0:0: Direct-Access     WDC WD20 02FAEX-007BA0    0125 PQ: 0 ANSI: 6
> Apr 23 08:45:39 rakete kernel: scsi 8:0:0:1: Direct-Access     WDC WD75 00AACS-00C7B0    0125 PQ: 0 ANSI: 6
> Apr 23 08:45:39 rakete kernel: scsi 8:0:0:2: Direct-Access     WDC WD50 01AALS-00L3B2    0125 PQ: 0 ANSI: 6
> Apr 23 08:45:39 rakete kernel: scsi 8:0:0:3: Direct-Access     SAMSUNG  SP2504C          0125 PQ: 0 ANSI: 6
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: Attached scsi generic sg6 type 0
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: Attached scsi generic sg7 type 0
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] Write Protect is off
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] Mode Sense: 67 00 10 08
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: Attached scsi generic sg8 type 0
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] 1465149168 512-byte logical blocks: (750 GB/699 GiB)
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] Write Protect is off
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] Mode Sense: 67 00 10 08
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: [sdh] 976773168 512-byte logical blocks: (500 GB/466 GiB)
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: Attached scsi generic sg9 type 0
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] No Caching mode page found
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] Assuming drive cache: write through
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: [sdh] Write Protect is off
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: [sdh] Mode Sense: 67 00 10 08
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: [sdi] 488395055 512-byte logical blocks: (250 GB/233 GiB)
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] No Caching mode page found
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] Assuming drive cache: write through
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: [sdi] Write Protect is off
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: [sdi] Mode Sense: 67 00 10 08
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: [sdh] No Caching mode page found
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: [sdh] Assuming drive cache: write through
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: [sdi] No Caching mode page found
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: [sdi] Assuming drive cache: write through
> Apr 23 08:45:39 rakete kernel:  sdf: sdf1
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] Attached SCSI disk
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] Attached SCSI disk
> Apr 23 08:45:40 rakete kernel: sd 8:0:0:2: [sdh] Attached SCSI disk
> Apr 23 08:45:40 rakete kernel: sd 8:0:0:3: [sdi] Attached SCSI disk

When you turned on the Fantec case, four disks, WD20 (sdf), WD75 (sdg),
WD50 (sdh), and SP2504C (sdi), were attached, as expected.
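
The disk-to-name mapping stated here can be read off the quoted log lines. As a rough sketch (the function name disks_by_name is hypothetical, and the regular expressions only cover the two line shapes shown in this log, with the timestamp and host prefix dropped):

```python
import re

# Kernel log lines copied from the quoted dmesg output above,
# without the "Apr 23 08:45:39 rakete kernel: " prefix.
LOG = """\
scsi 8:0:0:0: Direct-Access     WDC WD20 02FAEX-007BA0    0125 PQ: 0 ANSI: 6
scsi 8:0:0:1: Direct-Access     WDC WD75 00AACS-00C7B0    0125 PQ: 0 ANSI: 6
scsi 8:0:0:2: Direct-Access     WDC WD50 01AALS-00L3B2    0125 PQ: 0 ANSI: 6
scsi 8:0:0:3: Direct-Access     SAMSUNG  SP2504C          0125 PQ: 0 ANSI: 6
sd 8:0:0:0: [sdf] Attached SCSI disk
sd 8:0:0:1: [sdg] Attached SCSI disk
sd 8:0:0:2: [sdh] Attached SCSI disk
sd 8:0:0:3: [sdi] Attached SCSI disk
"""


def disks_by_name(log):
    """Map each sd name (e.g. 'sdg') to the model string at its SCSI address."""
    models, names = {}, {}
    for line in log.splitlines():
        scsi = re.match(r"scsi (\S+): Direct-Access\s+(.*?)\s+\S+ PQ:", line)
        if scsi:
            # Collapse the column padding inside the vendor/model fields.
            models[scsi.group(1)] = " ".join(scsi.group(2).split())
            continue
        sd = re.match(r"sd (\S+): \[(\w+)\] Attached SCSI disk", line)
        if sd:
            names[sd.group(1)] = sd.group(2)
    # Join the two tables on the SCSI address (host:channel:id:lun).
    return {name: models[addr] for addr, name in names.items() if addr in models}


for name, model in sorted(disks_by_name(LOG).items()):
    print(name, model)
```

Joining on the SCSI address rather than on line order matters because the kernel interleaves the messages of the four disks.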

> Apr 23 08:45:40 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 1 transid 89 /dev/sdg
> Apr 23 08:45:40 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 2 transid 89 /dev/sdh
> Apr 23 08:45:40 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 3 transid 89 /dev/sdi
> Apr 23 08:45:40 rakete kernel: EXT4-fs (sdf1): mounted filesystem with ordered data mode. Opts: (null)
> Apr 23 08:45:40 rakete udisksd[2422]: Mounted /dev/sdf1 at /media/matthias/BACKUP on behalf of uid 1000
>
> ####
>
> 7# mount /mnt/raid1/
>
> Apr 23 08:47:31 rakete kernel: BTRFS info (device sdi): enabling auto defrag
> Apr 23 08:47:31 rakete kernel: BTRFS info (device sdi): disk space caching is enabled
> Apr 23 08:47:31 rakete kernel: BTRFS: has skinny extents
>
> 8# btrfs fi show
> Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
> 	Total devices 3 FS bytes used 1.60GiB
> 	devid    1 size 698.64GiB used 3.03GiB path /dev/sdg
> 	devid    2 size 465.76GiB used 3.03GiB path /dev/sdh
> 	devid    3 size 232.88GiB used 0.00B path /dev/sdi
>
> 9# ls -l /mnt/raid1/
> total 0
> drwxrwxr-x 1 root root   36 Nov 14  2014 AfterShot2(64-bit)
> drwxrwxr-x 1 root root 5082 Apr 17 09:06 etc
> drwxr-xr-x 1 root root  108 Mar 24 07:31 var
>
> ####
>
> Unplug the biggest HD

Then you hot-unplugged the biggest disk, WD75(sdg).
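
To see what the unplug changed, the device tables before and after the USB reconnect at 08:51:48 can be compared. This is a sketch only: the two dictionaries are typed in by hand from the dmesg excerpts in this thread (models shortened to their first distinguishing token), and the helper name compare is made up for illustration.

```python
# Device name -> model, read off the dmesg excerpts in this thread.
# "before" is the state at 08:45, "after" the state at 08:51:49.
before = {"sdf": "WD20", "sdg": "WD75", "sdh": "WD50", "sdi": "SP2504C"}
after = {"sdf": "WD20", "sdg": "WD50", "sdh": "SP2504C"}


def compare(before, after):
    """Return (models that vanished, models whose device name changed)."""
    name_before = {model: name for name, model in before.items()}
    name_after = {model: name for name, model in after.items()}
    vanished = sorted(set(name_before) - set(name_after))
    renamed = {
        model: (name_before[model], name_after[model])
        for model in name_after
        if model in name_before and name_before[model] != name_after[model]
    }
    return vanished, renamed


vanished, renamed = compare(before, after)
print("vanished:", vanished)
for model, (old, new) in sorted(renamed.items()):
    print(f"{model}: {old} -> {new}")
```

With these tables, WD75 is the only model that vanishes, while WD50 moves from sdh to sdg and SP2504C from sdi to sdh. So /dev/sdh in the later mount error is not the same disk as /dev/sdh in the read errors logged at 08:51:29.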

>
> Apr 23 08:51:29 rakete kernel: usb 3-1: USB disconnect, device number 2
> Apr 23 08:51:29 rakete kernel: Buffer I/O error on dev sdf1, logical block 243826688, lost sync page write
> Apr 23 08:51:29 rakete kernel: JBD2: Error -5 detected when updating journal superblock for sdf1-8.
> Apr 23 08:51:29 rakete kernel: Aborting journal on device sdf1-8.
> Apr 23 08:51:29 rakete kernel: Buffer I/O error on dev sdf1, logical block 243826688, lost sync page write
> Apr 23 08:51:29 rakete kernel: JBD2: Error -5 detected when updating journal superblock for sdf1-8.
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 5, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 6, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 7, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 8, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 9, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 10, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): error reading free space cache
> Apr 23 08:51:29 rakete kernel: BTRFS warning (device sdi): failed to load free space cache for block group 20497563648, rebuilding it now
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): error reading free space cache
> Apr 23 08:51:29 rakete kernel: BTRFS warning (device sdi): failed to load free space cache for block group 21571305472, rebuilding it now
> Apr 23 08:51:29 rakete kernel: BTRFS: error (device sdi) in btrfs_commit_transaction:2142: errno=-5 IO failure (Error while writing out transaction)
> Apr 23 08:51:29 rakete kernel: BTRFS info (device sdi): forced readonly
> Apr 23 08:51:29 rakete kernel: BTRFS warning (device sdi): Skipping commit of aborted transaction.
> Apr 23 08:51:29 rakete kernel: ------------[ cut here ]------------
> Apr 23 08:51:29 rakete kernel: WARNING: CPU: 1 PID: 4277 at /build/linux-Ki7dwx/linux-4.5.1/fs/btrfs/transaction.c:1764 cleanup_transaction+0x96/0x300 [btrfs]()
> Apr 23 08:51:29 rakete kernel: BTRFS: Transaction aborted (error -5)
> Apr 23 08:51:29 rakete kernel: Modules linked in: uas(E) usb_storage(E) pci_stub(E) vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) binfmt_misc(E) dvb_ttpci(E) saa7146_vv(E) ttpci_eeprom(E) saa7146(E) videobuf_dma_sg(E) videobuf_core(E) dvb_core(E) v4l2_common(E) videodev(E) media(E) cfg80211(E) vboxdrv(OE) cpufreq_powersave(E) cpufreq_conservative(E) cpufreq_userspace(E) cpufreq_stats(E) snd_hda_codec_hdmi(E) intel_rapl(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) drbg(E) ansi_cprng(E) eeepc_wmi(E) asus_wmi(E) sparse_keymap(E) iTCO_wdt(E) joydev(E) iTCO_vendor_support(E) rfkill(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) snd_hda_codec_realtek(E) pcspkr(E) snd_hda_codec_generic(E)
> Apr 23 08:51:29 rakete kernel:  serio_raw(E) i2c_i801(E) lpc_ich(E) mfd_core(E) snd_hda_intel(E) snd_hda_codec(E) snd_hda_core(E) snd_hwdep(E) snd_pcm(E) evdev(E) battery(E) snd_timer(E) 8250_fintek(E) snd(E) mei_me(E) soundcore(E) mei(E) shpchp(E) tpm_tis(E) tpm(E) processor(E) nvidia(POE) drm(E) fuse(E) ecryptfs(E) cbc(E) hmac(E) encrypted_keys(E) parport_pc(E) ppdev(E) lp(E) parport(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) btrfs(E) raid456(E) async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) async_tx(E) xor(E) hid_generic(E) usbhid(E) hid(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) md_mod(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) sg(E) sr_mod(E) cdrom(E) sd_mod(E) ata_generic(E) ahci(E) pata_via(E) libahci(E) crc32c_intel(E) xhci_pci(E) ehci_pci(E) psmouse(E) libata(E) xhci_hcd(E)
> Apr 23 08:51:29 rakete kernel:  ehci_hcd(E) atl1c(E) scsi_mod(E) usbcore(E) usb_common(E) wmi(E) fan(E) thermal(E) fjes(E) video(E) button(E)
> Apr 23 08:51:29 rakete kernel: CPU: 1 PID: 4277 Comm: umount Tainted: P           OE   4.5.0-0.bpo.1-amd64 #1 Debian 4.5.1-1~bpo8+1
> Apr 23 08:51:29 rakete kernel: Hardware name: System manufacturer System Product Name/P8H67-V, BIOS 3707 07/12/2013
> Apr 23 08:51:29 rakete kernel:  0000000000000286 00000000bfe9047d ffffffff813099f5 ffff8801c1103ca8
> Apr 23 08:51:29 rakete kernel:  ffffffffc03a8c98 ffffffff81079a61 ffff880214562d90 ffff8801c1103d00
> Apr 23 08:51:29 rakete kernel:  ffff8801c5165980 00000000fffffffb ffff880214562d90 ffffffff81079aec
> Apr 23 08:51:29 rakete kernel: Call Trace:
> Apr 23 08:51:29 rakete kernel:  [<ffffffff813099f5>] ? dump_stack+0x5c/0x77
> Apr 23 08:51:29 rakete kernel:  [<ffffffff81079a61>] ? warn_slowpath_common+0x81/0xb0
> Apr 23 08:51:29 rakete kernel:  [<ffffffff81079aec>] ? warn_slowpath_fmt+0x5c/0x80
> Apr 23 08:51:29 rakete kernel:  [<ffffffffc0320e46>] ? cleanup_transaction+0x96/0x300 [btrfs]
> Apr 23 08:51:29 rakete kernel:  [<ffffffff810b94a0>] ? wait_woken+0x90/0x90
> Apr 23 08:51:29 rakete kernel:  [<ffffffffc0321bf3>] ? btrfs_commit_transaction+0x2b3/0xa30 [btrfs]
> Apr 23 08:51:29 rakete kernel:  [<ffffffffc0322406>] ? start_transaction+0x96/0x4d0 [btrfs]
> Apr 23 08:51:29 rakete kernel:  [<ffffffffc031d0d2>] ? close_ctree+0x2b2/0x360 [btrfs]
> Apr 23 08:51:29 rakete kernel:  [<ffffffff81206fd7>] ? evict_inodes+0x147/0x170
> Apr 23 08:51:29 rakete kernel:  [<ffffffff811eda39>] ? generic_shutdown_super+0x69/0xf0
> Apr 23 08:51:29 rakete kernel:  [<ffffffff811edace>] ? kill_anon_super+0xe/0x20
> Apr 23 08:51:29 rakete kernel:  [<ffffffffc02f1603>] ? btrfs_kill_super+0x13/0x100 [btrfs]
> Apr 23 08:51:29 rakete kernel:  [<ffffffff811ed4c4>] ? deactivate_locked_super+0x34/0x60
> Apr 23 08:51:29 rakete kernel:  [<ffffffff81209d5b>] ? cleanup_mnt+0x3b/0x80
> Apr 23 08:51:29 rakete kernel:  [<ffffffff81096114>] ? task_work_run+0x74/0x90
> Apr 23 08:51:29 rakete kernel:  [<ffffffff8100334a>] ? exit_to_usermode_loop+0xba/0xc0
> Apr 23 08:51:29 rakete kernel:  [<ffffffff81003bcf>] ? syscall_return_slowpath+0x8f/0x110
> Apr 23 08:51:29 rakete kernel:  [<ffffffff815b9918>] ? int_ret_from_sys_call+0x25/0x8f
> Apr 23 08:51:29 rakete kernel: ---[ end trace 6bbe2b6d20973e0e ]---
> Apr 23 08:51:29 rakete kernel: BTRFS: error (device sdi) in cleanup_transaction:1764: errno=-5 IO failure
> Apr 23 08:51:29 rakete kernel: BTRFS info (device sdi): delayed_refs has NO entry
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): commit super ret -5
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): cleaner transaction attach returned -30
>
> ....
>
> Apr 23 08:51:48 rakete kernel: usb 3-1: new SuperSpeed USB device number 3 using xhci_hcd
> Apr 23 08:51:48 rakete kernel: usb 3-1: New USB device found, idVendor=152d, idProduct=0567
> Apr 23 08:51:48 rakete kernel: usb 3-1: New USB device strings: Mfr=10, Product=11, SerialNumber=5
> Apr 23 08:51:48 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
> Apr 23 08:51:48 rakete kernel: usb 3-1: Manufacturer: JMicron
> Apr 23 08:51:48 rakete kernel: usb 3-1: SerialNumber: 152D00539000
> Apr 23 08:51:48 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage device detected
> Apr 23 08:51:48 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid 152d pid 0567: 5000000
> Apr 23 08:51:48 rakete kernel: scsi host9: usb-storage 3-1:1.0
> Apr 23 08:51:48 rakete mtp-probe[4301]: checking bus 3, device 3: "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
> Apr 23 08:51:48 rakete mtp-probe[4301]: bus: 3, device: 3 was not an MTP device
> Apr 23 08:51:49 rakete kernel: scsi 9:0:0:0: Direct-Access     WDC WD20 02FAEX-007BA0    0125 PQ: 0 ANSI: 6
> Apr 23 08:51:49 rakete kernel: scsi 9:0:0:1: Direct-Access     WDC WD50 01AALS-00L3B2    0125 PQ: 0 ANSI: 6
> Apr 23 08:51:49 rakete kernel: scsi 9:0:0:2: Direct-Access     SAMSUNG  SP2504C          0125 PQ: 0 ANSI: 6
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: Attached scsi generic sg6 type 0
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: Attached scsi generic sg7 type 0
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: Attached scsi generic sg8 type 0
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] Write Protect is off
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] Mode Sense: 67 00 10 08
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] 976773168 512-byte logical blocks: (500 GB/466 GiB)
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] No Caching mode page found
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] Assuming drive cache: write through
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] Write Protect is off
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] Mode Sense: 67 00 10 08
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] 488395055 512-byte logical blocks: (250 GB/233 GiB)
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] No Caching mode page found
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] Assuming drive cache: write through
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] Write Protect is off
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] Mode Sense: 67 00 10 08
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] No Caching mode page found
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] Assuming drive cache: write through
> Apr 23 08:51:49 rakete kernel:  sdf: sdf1
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] Attached SCSI disk
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] Attached SCSI disk
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] Attached SCSI disk

After you hot-unplugged WD75 (sdg), surprisingly, the other three
disks, in other words the disks you did not unplug, were re-attached.
From the filesystem's point of view, *all* of its backing devices
suddenly went missing at once. In that situation, I guess, no
filesystem can work correctly.
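
As a rough illustration, the USB-level events can be pulled out of the
log with a small Python sketch. The log lines are copied from this
thread; the regular expression and the helper name usb_events are my
own and only cover the two message forms that appear here. Every event
is on the same port (3-1), which suggests the enclosure shows up as a
single USB device, so unplugging one disk re-enumerates the whole
bridge.

```python
import re

# USB bridge events copied from the logs quoted in this thread.
LOG = """\
Apr 23 08:45:38 rakete kernel: usb 3-1: new SuperSpeed USB device number 2 using xhci_hcd
Apr 23 08:51:29 rakete kernel: usb 3-1: USB disconnect, device number 2
Apr 23 08:51:48 rakete kernel: usb 3-1: new SuperSpeed USB device number 3 using xhci_hcd
Apr 23 08:55:44 rakete kernel: usb 3-1: USB disconnect, device number 3
Apr 23 08:56:06 rakete kernel: usb 3-1: new SuperSpeed USB device number 4 using xhci_hcd
"""

# Hypothetical pattern: matches only the two message forms seen above.
EVENT_RE = re.compile(
    r"(\d\d:\d\d:\d\d) \S+ kernel: usb (\S+): "
    r"(new SuperSpeed USB device number|USB disconnect, device number) (\d+)"
)


def usb_events(log):
    """Return (time, port, 'connect' or 'disconnect', device_number) tuples."""
    events = []
    for line in log.splitlines():
        m = EVENT_RE.search(line)
        if m:
            is_disconnect = m.group(3).startswith("USB disconnect")
            kind = "disconnect" if is_disconnect else "connect"
            events.append((m.group(1), m.group(2), kind, int(m.group(4))))
    return events


events = usb_events(LOG)
for time, port, kind, number in events:
    print(time, port, kind, number)
```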

> Apr 23 08:51:49 rakete udisksd[2422]: Mounted /dev/sdf1 at /media/matthias/BACKUP on behalf of uid 1000
> Apr 23 08:51:49 rakete kernel: EXT4-fs (sdf1): recovery complete
> Apr 23 08:51:49 rakete kernel: EXT4-fs (sdf1): mounted filesystem with ordered data mode. Opts: (null)

In fact, the problems were not limited to Btrfs: ext4 on
WD20 (sdf1) was affected as well. ext4 was remounted here and
had to recover from an inconsistent state. Similar problems
would probably happen on any other filesystem, for example
XFS.

Apparently this is not what you intended. You just tried to
hot-unplug one disk to confirm whether Btrfs's RAID1 works
correctly or not. However, what actually happened was that
all four disks were detached and three of them were
attached again.

>
> ####
>
> 10# btrfs fi show
> warning, device 1 is missing
> warning devid 1 not found already
> Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
> 	Total devices 3 FS bytes used 1.60GiB
> 	devid    2 size 465.76GiB used 3.03GiB path /dev/sdg
> 	devid    3 size 232.88GiB used 0.00B path /dev/sdh
> 	*** Some devices missing

Furthermore, the device names seen by Btrfs changed: WD50
moved from sdh to sdg, and SP2504C moved from sdi to sdh.
That would make things worse.
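
The renaming can be read off the logs mechanically. Below is a minimal
Python sketch, assuming only the two kernel message forms shown in the
excerpts ("Direct-Access" and "Attached SCSI disk"). The excerpts are
trimmed from the logs in this thread (host 8 is the first power-on,
host 9 is after the unplug); the helper names names_by_host and renamed
are hypothetical.

```python
import re

# Excerpts trimmed from the logs in this thread (timestamps removed).
LOG = """\
scsi 8:0:0:1: Direct-Access     WDC WD75 00AACS-00C7B0    0125 PQ: 0 ANSI: 6
scsi 8:0:0:2: Direct-Access     WDC WD50 01AALS-00L3B2    0125 PQ: 0 ANSI: 6
scsi 8:0:0:3: Direct-Access     SAMSUNG  SP2504C          0125 PQ: 0 ANSI: 6
sd 8:0:0:1: [sdg] Attached SCSI disk
sd 8:0:0:2: [sdh] Attached SCSI disk
sd 8:0:0:3: [sdi] Attached SCSI disk
scsi 9:0:0:1: Direct-Access     WDC WD50 01AALS-00L3B2    0125 PQ: 0 ANSI: 6
scsi 9:0:0:2: Direct-Access     SAMSUNG  SP2504C          0125 PQ: 0 ANSI: 6
sd 9:0:0:1: [sdg] Attached SCSI disk
sd 9:0:0:2: [sdh] Attached SCSI disk
"""

# The model string is everything between "Direct-Access" and the revision field.
MODEL_RE = re.compile(r"scsi (\d+):0:0:(\d+): Direct-Access\s+(.+?)\s+\S+\s+PQ:")
DISK_RE = re.compile(r"sd (\d+):0:0:(\d+): \[(sd[a-z]+)\] Attached SCSI disk")


def names_by_host(log):
    """Return {host: {model: device_name}} for every disk seen in the log."""
    models, names = {}, {}
    for line in log.splitlines():
        m = MODEL_RE.search(line)
        d = DISK_RE.search(line)
        if m:
            # Collapse the column padding inside the vendor/model string.
            models[(m.group(1), m.group(2))] = " ".join(m.group(3).split())
        elif d:
            names[(d.group(1), d.group(2))] = d.group(3)
    result = {}
    for (host, lun), model in models.items():
        result.setdefault(host, {})[model] = names.get((host, lun))
    return result


def renamed(before, after):
    """Return {model: (old_name, new_name)} for disks whose name changed."""
    return {
        model: (before[model], after[model])
        for model in before
        if model in after and before[model] != after[model]
    }


hosts = names_by_host(LOG)
changes = renamed(hosts["8"], hosts["9"])
for model, (old, new) in changes.items():
    print(f"{model}: {old} -> {new}")
```

Keying on the model string is only good enough for this enclosure,
where every disk is a different model; it is not a general way to
identify disks.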

>
> ####
> This time the raid1 is in state "unmounted" after removing the device. This is different to what I found with kernel 4.4.
>
> 12# ls -l /mnt/raid1/
> total 0
>
> ####
> Trying to mount it again:
>
> 14# mount /mnt/raid1/
> mount: wrong fs type, bad option, bad superblock on /dev/sdh,
>         missing codepage or helper program, or other error
>
>         In some cases useful info is found in syslog - try
>         dmesg | tail or so.
> ####
>
> Apr 23 08:54:35 rakete kernel: BTRFS info (device sdh): enabling auto defrag
> Apr 23 08:54:35 rakete kernel: BTRFS info (device sdh): disk space caching is enabled
> Apr 23 08:54:35 rakete kernel: BTRFS: has skinny extents
> Apr 23 08:54:35 rakete kernel: BTRFS: failed to read the system array on sdh
> Apr 23 08:54:35 rakete kernel: BTRFS: open_ctree failed
>
> ####
>
> Plugin the device again.

You hot-plugged the biggest device, WD75.

>
> Apr 23 08:55:44 rakete kernel: usb 3-1: USB disconnect, device number 3
> Apr 23 08:56:06 rakete kernel: usb 3-1: new SuperSpeed USB device number 4 using xhci_hcd
> Apr 23 08:56:06 rakete kernel: usb 3-1: New USB device found, idVendor=152d, idProduct=0567
> Apr 23 08:56:06 rakete kernel: usb 3-1: New USB device strings: Mfr=10, Product=11, SerialNumber=5
> Apr 23 08:56:06 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
> Apr 23 08:56:06 rakete kernel: usb 3-1: Manufacturer: JMicron
> Apr 23 08:56:06 rakete kernel: usb 3-1: SerialNumber: 152D00539000
> Apr 23 08:56:06 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage device detected
> Apr 23 08:56:06 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid 152d pid 0567: 5000000
> Apr 23 08:56:06 rakete kernel: scsi host10: usb-storage 3-1:1.0
> Apr 23 08:56:06 rakete mtp-probe[4751]: checking bus 3, device 4: "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
> Apr 23 08:56:06 rakete mtp-probe[4751]: bus: 3, device: 4 was not an MTP device
> Apr 23 08:56:07 rakete kernel: scsi 10:0:0:0: Direct-Access     WDC WD20 02FAEX-007BA0    0125 PQ: 0 ANSI: 6
> Apr 23 08:56:07 rakete kernel: scsi 10:0:0:1: Direct-Access     WDC WD75 00AACS-00C7B0    0125 PQ: 0 ANSI: 6
> Apr 23 08:56:07 rakete kernel: scsi 10:0:0:2: Direct-Access     WDC WD50 01AALS-00L3B2    0125 PQ: 0 ANSI: 6
> Apr 23 08:56:07 rakete kernel: scsi 10:0:0:3: Direct-Access     SAMSUNG  SP2504C          0125 PQ: 0 ANSI: 6
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: Attached scsi generic sg6 type 0
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] Write Protect is off
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] Mode Sense: 67 00 10 08
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: Attached scsi generic sg7 type 0
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] No Caching mode page found
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] Assuming drive cache: write through
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: Attached scsi generic sg8 type 0
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] 1465149168 512-byte logical blocks: (750 GB/699 GiB)
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: Attached scsi generic sg9 type 0
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] Write Protect is off
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] Mode Sense: 67 00 10 08
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] 976773168 512-byte logical blocks: (500 GB/466 GiB)
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] No Caching mode page found
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] Assuming drive cache: write through
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] 488395055 512-byte logical blocks: (250 GB/233 GiB)
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] Write Protect is off
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] Mode Sense: 67 00 10 08
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] Write Protect is off
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] Mode Sense: 67 00 10 08
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] No Caching mode page found
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] Assuming drive cache: write through
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] No Caching mode page found
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] Assuming drive cache: write through
> Apr 23 08:56:07 rakete kernel:  sdf: sdf1
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] Attached SCSI disk
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] Attached SCSI disk
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] Attached SCSI disk
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] Attached SCSI disk

Then all four devices, WD20 (sdf), WD75 (sdg), WD50 (sdh),
and SP2504C (sdi), were attached. Attaching WD75 (sdg) is OK.
However, re-attaching the already-online devices WD20 (sdf),
WD50 (sdh), and SP2504C (sdi) is clearly strange.

> Apr 23 08:56:07 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 1 transid 89 /dev/sdg
> Apr 23 08:56:07 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 2 transid 89 /dev/sdh
> Apr 23 08:56:07 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 3 transid 89 /dev/sdi
> Apr 23 08:56:07 rakete kernel: EXT4-fs (sdf1): recovery complete
> Apr 23 08:56:07 rakete kernel: EXT4-fs (sdf1): mounted filesystem with ordered data mode. Opts: (null)

Again, ext4 on WD20 (sdf1) had to go through recovery.

Thanks,
Satoru

>
> ####
>
> 15# btrfs fi show
> Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
> 	Total devices 3 FS bytes used 1.60GiB
> 	devid    1 size 698.64GiB used 3.03GiB path /dev/sdg
> 	devid    2 size 465.76GiB used 3.03GiB path /dev/sdh
> 	devid    3 size 232.88GiB used 0.00B path /dev/sdi
>
> ####
>
> 18# mount /mnt/raid1/
>
> Apr 23 08:57:00 rakete kernel: BTRFS info (device sdi): enabling auto defrag
> Apr 23 08:57:00 rakete kernel: BTRFS info (device sdi): disk space caching is enabled
> Apr 23 08:57:00 rakete kernel: BTRFS: has skinny extents
>
> ####
>
> 19# ls -l /mnt/raid1/
> total 0
> drwxrwxr-x 1 root root   36 Nov 14  2014 AfterShot2(64-bit)
> drwxrwxr-x 1 root root 5082 Apr 17 09:06 etc
> drwxr-xr-x 1 root root  108 Mar 24 07:31 var
>
> ####
>
>
>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Question: raid1 behaviour on failure
  2016-04-23  7:07               ` Matthias Bodenbinder
  2016-04-23  7:17                 ` Matthias Bodenbinder
  2016-04-26  8:17                 ` Satoru Takeuchi
@ 2016-04-26 15:16                 ` Henk Slager
  2 siblings, 0 replies; 32+ messages in thread
From: Henk Slager @ 2016-04-26 15:16 UTC (permalink / raw)
  To: Matthias Bodenbinder; +Cc: linux-btrfs

On Sat, Apr 23, 2016 at 9:07 AM, Matthias Bodenbinder
<matthias@bodenbinder.de> wrote:
>
> Here is my newest test. The backports provide a 4.5 kernel:
>
> ####
> kernel: 4.5.0-0.bpo.1-amd64
> btrfs-tools: 4.4-1~bpo8+1
> ####
>
> This time the raid1 is automatically unmounted after I unplug the device and it can not be mounted while the device is missing. See below.
>
> Matthias
>
>
> ####
> 1) turn on the Fantec case:
>
> Apr 23 08:45:38 rakete kernel: usb 3-1: new SuperSpeed USB device number 2 using xhci_hcd
> Apr 23 08:45:38 rakete kernel: usb 3-1: New USB device found, idVendor=152d, idProduct=0567
> Apr 23 08:45:38 rakete kernel: usb 3-1: New USB device strings: Mfr=10, Product=11, SerialNumber=5
> Apr 23 08:45:38 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
> Apr 23 08:45:38 rakete kernel: usb 3-1: Manufacturer: JMicron
> Apr 23 08:45:38 rakete kernel: usb 3-1: SerialNumber: 152D00539000
> Apr 23 08:45:38 rakete mtp-probe[3641]: checking bus 3, device 2: "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
> Apr 23 08:45:38 rakete mtp-probe[3641]: bus: 3, device: 2 was not an MTP device
> Apr 23 08:45:38 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage device detected
> Apr 23 08:45:38 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid 152d pid 0567: 5000000
> Apr 23 08:45:38 rakete kernel: scsi host8: usb-storage 3-1:1.0
> Apr 23 08:45:38 rakete kernel: usbcore: registered new interface driver usb-storage
> Apr 23 08:45:38 rakete kernel: usbcore: registered new interface driver uas
> Apr 23 08:45:39 rakete kernel: scsi 8:0:0:0: Direct-Access     WDC WD20 02FAEX-007BA0    0125 PQ: 0 ANSI: 6
> Apr 23 08:45:39 rakete kernel: scsi 8:0:0:1: Direct-Access     WDC WD75 00AACS-00C7B0    0125 PQ: 0 ANSI: 6
> Apr 23 08:45:39 rakete kernel: scsi 8:0:0:2: Direct-Access     WDC WD50 01AALS-00L3B2    0125 PQ: 0 ANSI: 6
> Apr 23 08:45:39 rakete kernel: scsi 8:0:0:3: Direct-Access     SAMSUNG  SP2504C          0125 PQ: 0 ANSI: 6
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: Attached scsi generic sg6 type 0
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: Attached scsi generic sg7 type 0
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] Write Protect is off
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] Mode Sense: 67 00 10 08
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: Attached scsi generic sg8 type 0
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] 1465149168 512-byte logical blocks: (750 GB/699 GiB)
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] Write Protect is off
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] Mode Sense: 67 00 10 08
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: [sdh] 976773168 512-byte logical blocks: (500 GB/466 GiB)
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: Attached scsi generic sg9 type 0
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] No Caching mode page found
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] Assuming drive cache: write through
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: [sdh] Write Protect is off
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: [sdh] Mode Sense: 67 00 10 08
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: [sdi] 488395055 512-byte logical blocks: (250 GB/233 GiB)
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] No Caching mode page found
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] Assuming drive cache: write through
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: [sdi] Write Protect is off
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: [sdi] Mode Sense: 67 00 10 08
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: [sdh] No Caching mode page found
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:2: [sdh] Assuming drive cache: write through
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: [sdi] No Caching mode page found
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:3: [sdi] Assuming drive cache: write through
> Apr 23 08:45:39 rakete kernel:  sdf: sdf1
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:0: [sdf] Attached SCSI disk
> Apr 23 08:45:39 rakete kernel: sd 8:0:0:1: [sdg] Attached SCSI disk
> Apr 23 08:45:40 rakete kernel: sd 8:0:0:2: [sdh] Attached SCSI disk
> Apr 23 08:45:40 rakete kernel: sd 8:0:0:3: [sdi] Attached SCSI disk
> Apr 23 08:45:40 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 1 transid 89 /dev/sdg
> Apr 23 08:45:40 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 2 transid 89 /dev/sdh
> Apr 23 08:45:40 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 3 transid 89 /dev/sdi
> Apr 23 08:45:40 rakete kernel: EXT4-fs (sdf1): mounted filesystem with ordered data mode. Opts: (null)
> Apr 23 08:45:40 rakete udisksd[2422]: Mounted /dev/sdf1 at /media/matthias/BACKUP on behalf of uid 1000
>
> ####
>
> 7# mount /mnt/raid1/
>
> Apr 23 08:47:31 rakete kernel: BTRFS info (device sdi): enabling auto defrag
> Apr 23 08:47:31 rakete kernel: BTRFS info (device sdi): disk space caching is enabled
> Apr 23 08:47:31 rakete kernel: BTRFS: has skinny extents
>
> 8# btrfs fi show
> Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>         Total devices 3 FS bytes used 1.60GiB
>         devid    1 size 698.64GiB used 3.03GiB path /dev/sdg
>         devid    2 size 465.76GiB used 3.03GiB path /dev/sdh
>         devid    3 size 232.88GiB used 0.00B path /dev/sdi
>
> 9# ls -l /mnt/raid1/
> total 0
> drwxrwxr-x 1 root root   36 Nov 14  2014 AfterShot2(64-bit)
> drwxrwxr-x 1 root root 5082 Apr 17 09:06 etc
> drwxr-xr-x 1 root root  108 Mar 24 07:31 var
>
> ####
>
> Unplug the biggest HD
>
> Apr 23 08:51:29 rakete kernel: usb 3-1: USB disconnect, device number 2
> Apr 23 08:51:29 rakete kernel: Buffer I/O error on dev sdf1, logical block 243826688, lost sync page write
> Apr 23 08:51:29 rakete kernel: JBD2: Error -5 detected when updating journal superblock for sdf1-8.
> Apr 23 08:51:29 rakete kernel: Aborting journal on device sdf1-8.
> Apr 23 08:51:29 rakete kernel: Buffer I/O error on dev sdf1, logical block 243826688, lost sync page write
> Apr 23 08:51:29 rakete kernel: JBD2: Error -5 detected when updating journal superblock for sdf1-8.
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 5, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 6, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 7, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 8, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 9, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): bdev /dev/sdh errs: wr 0, rd 10, flush 0, corrupt 0, gen 0
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): error reading free space cache
> Apr 23 08:51:29 rakete kernel: BTRFS warning (device sdi): failed to load free space cache for block group 20497563648, rebuilding it now
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): error reading free space cache
> Apr 23 08:51:29 rakete kernel: BTRFS warning (device sdi): failed to load free space cache for block group 21571305472, rebuilding it now
> Apr 23 08:51:29 rakete kernel: BTRFS: error (device sdi) in btrfs_commit_transaction:2142: errno=-5 IO failure (Error while writing out transaction)
> Apr 23 08:51:29 rakete kernel: BTRFS info (device sdi): forced readonly
> Apr 23 08:51:29 rakete kernel: BTRFS warning (device sdi): Skipping commit of aborted transaction.
> Apr 23 08:51:29 rakete kernel: ------------[ cut here ]------------
> Apr 23 08:51:29 rakete kernel: WARNING: CPU: 1 PID: 4277 at /build/linux-Ki7dwx/linux-4.5.1/fs/btrfs/transaction.c:1764 cleanup_transaction+0x96/0x300 [btrfs]()
> Apr 23 08:51:29 rakete kernel: BTRFS: Transaction aborted (error -5)
> Apr 23 08:51:29 rakete kernel: Modules linked in: uas(E) usb_storage(E) pci_stub(E) vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) binfmt_misc(E) dvb_ttpci(E) saa7146_vv(E) ttpci_eeprom(E) saa7146(E) videobuf_dma_sg(E) videobuf_core(E) dvb_core(E) v4l2_common(E) videodev(E) media(E) cfg80211(E) vboxdrv(OE) cpufreq_powersave(E) cpufreq_conservative(E) cpufreq_userspace(E) cpufreq_stats(E) snd_hda_codec_hdmi(E) intel_rapl(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) drbg(E) ansi_cprng(E) eeepc_wmi(E) asus_wmi(E) sparse_keymap(E) iTCO_wdt(E) joydev(E) iTCO_vendor_support(E) rfkill(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) snd_hda_codec_realtek(E) pcspkr(E) snd_hda_codec_generic(E)
> Apr 23 08:51:29 rakete kernel:  serio_raw(E) i2c_i801(E) lpc_ich(E) mfd_core(E) snd_hda_intel(E) snd_hda_codec(E) snd_hda_core(E) snd_hwdep(E) snd_pcm(E) evdev(E) battery(E) snd_timer(E) 8250_fintek(E) snd(E) mei_me(E) soundcore(E) mei(E) shpchp(E) tpm_tis(E) tpm(E) processor(E) nvidia(POE) drm(E) fuse(E) ecryptfs(E) cbc(E) hmac(E) encrypted_keys(E) parport_pc(E) ppdev(E) lp(E) parport(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) btrfs(E) raid456(E) async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) async_tx(E) xor(E) hid_generic(E) usbhid(E) hid(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) md_mod(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) sg(E) sr_mod(E) cdrom(E) sd_mod(E) ata_generic(E) ahci(E) pata_via(E) libahci(E) crc32c_intel(E) xhci_pci(E) ehci_pci(E) psmouse(E) libata(E) xhci_hcd(E)
> Apr 23 08:51:29 rakete kernel:  ehci_hcd(E) atl1c(E) scsi_mod(E) usbcore(E) usb_common(E) wmi(E) fan(E) thermal(E) fjes(E) video(E) button(E)
> Apr 23 08:51:29 rakete kernel: CPU: 1 PID: 4277 Comm: umount Tainted: P           OE   4.5.0-0.bpo.1-amd64 #1 Debian 4.5.1-1~bpo8+1
> Apr 23 08:51:29 rakete kernel: Hardware name: System manufacturer System Product Name/P8H67-V, BIOS 3707 07/12/2013
> Apr 23 08:51:29 rakete kernel:  0000000000000286 00000000bfe9047d ffffffff813099f5 ffff8801c1103ca8
> Apr 23 08:51:29 rakete kernel:  ffffffffc03a8c98 ffffffff81079a61 ffff880214562d90 ffff8801c1103d00
> Apr 23 08:51:29 rakete kernel:  ffff8801c5165980 00000000fffffffb ffff880214562d90 ffffffff81079aec
> Apr 23 08:51:29 rakete kernel: Call Trace:
> Apr 23 08:51:29 rakete kernel:  [<ffffffff813099f5>] ? dump_stack+0x5c/0x77
> Apr 23 08:51:29 rakete kernel:  [<ffffffff81079a61>] ? warn_slowpath_common+0x81/0xb0
> Apr 23 08:51:29 rakete kernel:  [<ffffffff81079aec>] ? warn_slowpath_fmt+0x5c/0x80
> Apr 23 08:51:29 rakete kernel:  [<ffffffffc0320e46>] ? cleanup_transaction+0x96/0x300 [btrfs]
> Apr 23 08:51:29 rakete kernel:  [<ffffffff810b94a0>] ? wait_woken+0x90/0x90
> Apr 23 08:51:29 rakete kernel:  [<ffffffffc0321bf3>] ? btrfs_commit_transaction+0x2b3/0xa30 [btrfs]
> Apr 23 08:51:29 rakete kernel:  [<ffffffffc0322406>] ? start_transaction+0x96/0x4d0 [btrfs]
> Apr 23 08:51:29 rakete kernel:  [<ffffffffc031d0d2>] ? close_ctree+0x2b2/0x360 [btrfs]
> Apr 23 08:51:29 rakete kernel:  [<ffffffff81206fd7>] ? evict_inodes+0x147/0x170
> Apr 23 08:51:29 rakete kernel:  [<ffffffff811eda39>] ? generic_shutdown_super+0x69/0xf0
> Apr 23 08:51:29 rakete kernel:  [<ffffffff811edace>] ? kill_anon_super+0xe/0x20
> Apr 23 08:51:29 rakete kernel:  [<ffffffffc02f1603>] ? btrfs_kill_super+0x13/0x100 [btrfs]
> Apr 23 08:51:29 rakete kernel:  [<ffffffff811ed4c4>] ? deactivate_locked_super+0x34/0x60
> Apr 23 08:51:29 rakete kernel:  [<ffffffff81209d5b>] ? cleanup_mnt+0x3b/0x80
> Apr 23 08:51:29 rakete kernel:  [<ffffffff81096114>] ? task_work_run+0x74/0x90
> Apr 23 08:51:29 rakete kernel:  [<ffffffff8100334a>] ? exit_to_usermode_loop+0xba/0xc0
> Apr 23 08:51:29 rakete kernel:  [<ffffffff81003bcf>] ? syscall_return_slowpath+0x8f/0x110
> Apr 23 08:51:29 rakete kernel:  [<ffffffff815b9918>] ? int_ret_from_sys_call+0x25/0x8f
> Apr 23 08:51:29 rakete kernel: ---[ end trace 6bbe2b6d20973e0e ]---
> Apr 23 08:51:29 rakete kernel: BTRFS: error (device sdi) in cleanup_transaction:1764: errno=-5 IO failure
> Apr 23 08:51:29 rakete kernel: BTRFS info (device sdi): delayed_refs has NO entry
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): commit super ret -5
> Apr 23 08:51:29 rakete kernel: BTRFS error (device sdi): cleaner transaction attach returned -30
>
> ....

Within these 19 seconds (or fewer), the Linux system decides to
unmount the btrfs raid1 filesystem (as all its devices have
disappeared). I wonder whether this is done directly by the kernel
or whether udisksd initiates it.
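
For reference, the 19 seconds is just the gap between the "USB
disconnect" line (08:51:29) and the next "new SuperSpeed USB device"
line (08:51:48). A minimal Python sketch of that arithmetic follows.
The helper name gap_seconds is mine, and the year 2016 is an
assumption taken from the thread dates, since syslog timestamps do not
carry a year.

```python
from datetime import datetime


def gap_seconds(first, second, year=2016):
    """Seconds between two syslog timestamps such as 'Apr 23 08:51:29'.

    The year is an assumption; syslog lines do not include one.
    """
    fmt = "%Y %b %d %H:%M:%S"
    t1 = datetime.strptime(f"{year} {first}", fmt)
    t2 = datetime.strptime(f"{year} {second}", fmt)
    return int((t2 - t1).total_seconds())


# Disconnect of USB device number 2, then enumeration of device number 3.
gap = gap_seconds("Apr 23 08:51:29", "Apr 23 08:51:48")
print(gap)
```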

>
> Apr 23 08:51:48 rakete kernel: usb 3-1: new SuperSpeed USB device number 3 using xhci_hcd
> Apr 23 08:51:48 rakete kernel: usb 3-1: New USB device found, idVendor=152d, idProduct=0567
> Apr 23 08:51:48 rakete kernel: usb 3-1: New USB device strings: Mfr=10, Product=11, SerialNumber=5
> Apr 23 08:51:48 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
> Apr 23 08:51:48 rakete kernel: usb 3-1: Manufacturer: JMicron
> Apr 23 08:51:48 rakete kernel: usb 3-1: SerialNumber: 152D00539000
> Apr 23 08:51:48 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage device detected
> Apr 23 08:51:48 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid 152d pid 0567: 5000000
> Apr 23 08:51:48 rakete kernel: scsi host9: usb-storage 3-1:1.0
> Apr 23 08:51:48 rakete mtp-probe[4301]: checking bus 3, device 3: "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
> Apr 23 08:51:48 rakete mtp-probe[4301]: bus: 3, device: 3 was not an MTP device
> Apr 23 08:51:49 rakete kernel: scsi 9:0:0:0: Direct-Access     WDC WD20 02FAEX-007BA0    0125 PQ: 0 ANSI: 6
> Apr 23 08:51:49 rakete kernel: scsi 9:0:0:1: Direct-Access     WDC WD50 01AALS-00L3B2    0125 PQ: 0 ANSI: 6
> Apr 23 08:51:49 rakete kernel: scsi 9:0:0:2: Direct-Access     SAMSUNG  SP2504C          0125 PQ: 0 ANSI: 6
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: Attached scsi generic sg6 type 0
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: Attached scsi generic sg7 type 0
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: Attached scsi generic sg8 type 0
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] Write Protect is off
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] Mode Sense: 67 00 10 08
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] 976773168 512-byte logical blocks: (500 GB/466 GiB)
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] No Caching mode page found
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] Assuming drive cache: write through
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] Write Protect is off
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] Mode Sense: 67 00 10 08
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] 488395055 512-byte logical blocks: (250 GB/233 GiB)
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] No Caching mode page found
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] Assuming drive cache: write through
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] Write Protect is off
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] Mode Sense: 67 00 10 08
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] No Caching mode page found
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] Assuming drive cache: write through
> Apr 23 08:51:49 rakete kernel:  sdf: sdf1
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:0: [sdf] Attached SCSI disk
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:1: [sdg] Attached SCSI disk
> Apr 23 08:51:49 rakete kernel: sd 9:0:0:2: [sdh] Attached SCSI disk
> Apr 23 08:51:49 rakete udisksd[2422]: Mounted /dev/sdf1 at /media/matthias/BACKUP on behalf of uid 1000
> Apr 23 08:51:49 rakete kernel: EXT4-fs (sdf1): recovery complete
> Apr 23 08:51:49 rakete kernel: EXT4-fs (sdf1): mounted filesystem with ordered data mode. Opts: (null)
>
> ####
>
> 10# btrfs fi show
> warning, device 1 is missing
> warning devid 1 not found already
> Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>         Total devices 3 FS bytes used 1.60GiB
>         devid    2 size 465.76GiB used 3.03GiB path /dev/sdg
>         devid    3 size 232.88GiB used 0.00B path /dev/sdh
>         *** Some devices missing
>
> ####
> This time the raid1 is in state "unmounted" after removing the device. This is different to what I found with kernel 4.4.
>
> 12# ls -l /mnt/raid1/
> total 0
>
> ####
> Trying to mount it again:
>
> 14# mount /mnt/raid1/
> mount: wrong fs type, bad option, bad superblock on /dev/sdh,
>        missing codepage or helper program, or other error
>
>        In some cases useful info is found in syslog - try
>        dmesg | tail or so.
> ####
>
> Apr 23 08:54:35 rakete kernel: BTRFS info (device sdh): enabling auto defrag
> Apr 23 08:54:35 rakete kernel: BTRFS info (device sdh): disk space caching is enabled
> Apr 23 08:54:35 rakete kernel: BTRFS: has skinny extents
> Apr 23 08:54:35 rakete kernel: BTRFS: failed to read the system array on sdh
> Apr 23 08:54:35 rakete kernel: BTRFS: open_ctree failed
>
> ####
>
> Plugin the device again.
>
> Apr 23 08:55:44 rakete kernel: usb 3-1: USB disconnect, device number 3
> Apr 23 08:56:06 rakete kernel: usb 3-1: new SuperSpeed USB device number 4 using xhci_hcd
> Apr 23 08:56:06 rakete kernel: usb 3-1: New USB device found, idVendor=152d, idProduct=0567
> Apr 23 08:56:06 rakete kernel: usb 3-1: New USB device strings: Mfr=10, Product=11, SerialNumber=5
> Apr 23 08:56:06 rakete kernel: usb 3-1: Product: USB to ATA/ATAPI Bridge
> Apr 23 08:56:06 rakete kernel: usb 3-1: Manufacturer: JMicron
> Apr 23 08:56:06 rakete kernel: usb 3-1: SerialNumber: 152D00539000
> Apr 23 08:56:06 rakete kernel: usb-storage 3-1:1.0: USB Mass Storage device detected
> Apr 23 08:56:06 rakete kernel: usb-storage 3-1:1.0: Quirks match for vid 152d pid 0567: 5000000
> Apr 23 08:56:06 rakete kernel: scsi host10: usb-storage 3-1:1.0
> Apr 23 08:56:06 rakete mtp-probe[4751]: checking bus 3, device 4: "/sys/devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb3/3-1"
> Apr 23 08:56:06 rakete mtp-probe[4751]: bus: 3, device: 4 was not an MTP device
> Apr 23 08:56:07 rakete kernel: scsi 10:0:0:0: Direct-Access     WDC WD20 02FAEX-007BA0    0125 PQ: 0 ANSI: 6
> Apr 23 08:56:07 rakete kernel: scsi 10:0:0:1: Direct-Access     WDC WD75 00AACS-00C7B0    0125 PQ: 0 ANSI: 6
> Apr 23 08:56:07 rakete kernel: scsi 10:0:0:2: Direct-Access     WDC WD50 01AALS-00L3B2    0125 PQ: 0 ANSI: 6
> Apr 23 08:56:07 rakete kernel: scsi 10:0:0:3: Direct-Access     SAMSUNG  SP2504C          0125 PQ: 0 ANSI: 6
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: Attached scsi generic sg6 type 0
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] Write Protect is off
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] Mode Sense: 67 00 10 08
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: Attached scsi generic sg7 type 0
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] No Caching mode page found
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] Assuming drive cache: write through
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: Attached scsi generic sg8 type 0
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] 1465149168 512-byte logical blocks: (750 GB/699 GiB)
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: Attached scsi generic sg9 type 0
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] Write Protect is off
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] Mode Sense: 67 00 10 08
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] 976773168 512-byte logical blocks: (500 GB/466 GiB)
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] No Caching mode page found
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] Assuming drive cache: write through
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] 488395055 512-byte logical blocks: (250 GB/233 GiB)
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] Write Protect is off
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] Mode Sense: 67 00 10 08
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] Write Protect is off
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] Mode Sense: 67 00 10 08
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] No Caching mode page found
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] Assuming drive cache: write through
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] No Caching mode page found
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] Assuming drive cache: write through
> Apr 23 08:56:07 rakete kernel:  sdf: sdf1
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:0: [sdf] Attached SCSI disk
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:1: [sdg] Attached SCSI disk
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:2: [sdh] Attached SCSI disk
> Apr 23 08:56:07 rakete kernel: sd 10:0:0:3: [sdi] Attached SCSI disk
> Apr 23 08:56:07 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 1 transid 89 /dev/sdg
> Apr 23 08:56:07 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 2 transid 89 /dev/sdh
> Apr 23 08:56:07 rakete kernel: BTRFS: device fsid 16d5891f-5d52-4b29-8591-588ddf11e73d devid 3 transid 89 /dev/sdi
> Apr 23 08:56:07 rakete kernel: EXT4-fs (sdf1): recovery complete
> Apr 23 08:56:07 rakete kernel: EXT4-fs (sdf1): mounted filesystem with ordered data mode. Opts: (null)
>
> ####
>
> 15# btrfs fi show
> Label: none  uuid: 16d5891f-5d52-4b29-8591-588ddf11e73d
>         Total devices 3 FS bytes used 1.60GiB
>         devid    1 size 698.64GiB used 3.03GiB path /dev/sdg
>         devid    2 size 465.76GiB used 3.03GiB path /dev/sdh
>         devid    3 size 232.88GiB used 0.00B path /dev/sdi
>
> ####
>
> 18# mount /mnt/raid1/
>
> Apr 23 08:57:00 rakete kernel: BTRFS info (device sdi): enabling auto defrag
> Apr 23 08:57:00 rakete kernel: BTRFS info (device sdi): disk space caching is enabled
> Apr 23 08:57:00 rakete kernel: BTRFS: has skinny extents
>
> ####
>
> 19# ls -l /mnt/raid1/
> total 0
> drwxrwxr-x 1 root root   36 Nov 14  2014 AfterShot2(64-bit)
> drwxrwxr-x 1 root root 5082 Apr 17 09:06 etc
> drwxr-xr-x 1 root root  108 Mar 24 07:31 var
>
> ####
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Question: raid1 behaviour on failure
  2016-04-21 17:27         ` Matthias Bodenbinder
@ 2016-04-26 16:19           ` Henk Slager
  2016-04-26 16:42             ` Holger Hoffstätte
  2016-04-28  5:09             ` Matthias Bodenbinder
  0 siblings, 2 replies; 32+ messages in thread
From: Henk Slager @ 2016-04-26 16:19 UTC (permalink / raw)
  To: Matthias Bodenbinder; +Cc: linux-btrfs

On Thu, Apr 21, 2016 at 7:27 PM, Matthias Bodenbinder
<matthias@bodenbinder.de> wrote:
> Am 21.04.2016 um 13:28 schrieb Henk Slager:
>>> Can anyone explain this behavior?
>>
>> All 4 drives (WD20, WD75, WD50, SP2504C) get a disconnect twice in
>> this test. What is on WD20 is unclear to me, but the raid1 array is
>> {WD75, WD50, SP2504C}
>> So the test as described by Matthias is not what actually happens.
>> In fact, the whole btrfs fs is 'disconnected on the lower layers of
>> the kernel' but there is no unmount.  You can see the scsi items go
>> from 8?.0.0.x to
>> 9.0.0.x to 10.0.0.x. In the 9.0.0.x state, the tools show then 1 dev
>> missing (WD75), but in fact the whole fs state is messed up. So as
>> indicated by Anand already, it is a bad test and it is what one can
>> expect from an unpatched 4.4.0 kernel. ( I'm curious to know how md
>> raidX would handle this ).
>>
>> a) My best guess is that the 4 drives are in a USB connected drivebay
>> and that Matthias unplugged WD75 (so cut its power and SATA
>> connection), did the file copy trial and then plugged in the WD75
>> again into the drivebay. The (un)plug of a harddisk is then assumed to
>> trigger a USB link re-init by the chipset in the drivebay.
>>
>> b) Another possibility is that the (un)plug of WD75 causes the host
>> USB chipset to re-init the USB link due to (too big?) changes in
>> electrical current. And likely separate USB cables and maybe some
>> SATA.
>>
>> c) Or some flaw in the LMDE2 distribution in combination with btrfs. I
>> don't know what is in the linux-image-4.4.0-0.bpo.1-amd64
>>
>
> Just to clarify my setup. The HDs are mounted in a FANTEC QB-35US3-6G case. According to the handbook it has "Hot-Plug for USB / eSATA interface".
>
> It is equipped with 4 HDs. 3 of them are part of the raid1. The fourth HD is a 2 TB device with ext4 filesystem and no relevance for this thread.

It looks like a JMS567 + SATA port multipliers behind it are used in
this drivebay. The command   lsusb -v  could show that. So your HW
setup is like JBOD, not RAID.

IMHO, using such a setup for software RAID (like btrfs RAID1)
fundamentally violates the concept of RAID (redundant array of
independent disks). It depends on where you define the system border
of the (independent) disks.
If it is at:

A) the 4 (or 3 disks in this case) SATA+power interfaces inside the drivebay or

B) inside the PC's chipset.

In case A) there is a shared removable link (USB) inside the
filesystem processing machine.
In case B) the disks aren't really independent as they share a
removable link (and as proven by the (un)plug of 1 device affecting
all others).
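
Whether several disks share one removable USB link can be checked from
sysfs. The following is only a minimal sketch, assuming the usual Linux
layout where a block device's resolved sysfs path contains the USB device
directory (named like "3-1") as one of its components. The function names
are made up for illustration and are not part of any existing tool.

```python
import os
import re
from collections import defaultdict

# Assumption: a USB device directory in sysfs is named "bus-port[.port...]",
# e.g. "3-1" or "3-1.2". Interface directories such as "3-1:1.0" do not match.
USB_DEVICE_RE = re.compile(r"^\d+-\d+(?:\.\d+)*$")


def usb_parent(sysfs_path):
    """Return the deepest USB device component in a sysfs path, or None."""
    parent = None
    for part in sysfs_path.split("/"):
        if USB_DEVICE_RE.match(part):
            parent = part
    return parent


def group_by_usb_parent(paths):
    """Group block device names by the USB device they sit behind.

    `paths` maps a block device name (e.g. "sdg") to its resolved sysfs
    path. Devices that are not behind USB end up under the key None.
    """
    groups = defaultdict(list)
    for name, path in paths.items():
        groups[usb_parent(path)].append(name)
    return {parent: sorted(names) for parent, names in groups.items()}


def scan_sys_block(root="/sys/block"):
    """Resolve every entry below /sys/block to its real path (Linux only)."""
    return {name: os.path.realpath(os.path.join(root, name))
            for name in sorted(os.listdir(root))}
```

Calling group_by_usb_parent(scan_sys_block()) on the machine in question
and looking for a USB key with more than one disk would show which disks
go away together when that one link resets.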

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Question: raid1 behaviour on failure
  2016-04-26 16:19           ` Henk Slager
@ 2016-04-26 16:42             ` Holger Hoffstätte
  2016-04-28  5:12               ` Matthias Bodenbinder
  2016-04-28  5:09             ` Matthias Bodenbinder
  1 sibling, 1 reply; 32+ messages in thread
From: Holger Hoffstätte @ 2016-04-26 16:42 UTC (permalink / raw)
  To: Henk Slager, Matthias Bodenbinder; +Cc: linux-btrfs

On 04/26/16 18:19, Henk Slager wrote:
> It looks like a JMS567 + SATA port multipliers behind it are used in
> this drivebay. The command   lsusb -v  could show that. So your HW
> setup is like JBOD, not RAID.

I hate to quote the "harmful" trope, but..

SATA Port Multipliers Considered Harmful
https://www.usenix.org/system/files/fastpw13-paper7_0.pdf

aka: how to make any RAID setup useless in 1 easy step.

-h


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Question: raid1 behaviour on failure
  2016-04-26 16:19           ` Henk Slager
  2016-04-26 16:42             ` Holger Hoffstätte
@ 2016-04-28  5:09             ` Matthias Bodenbinder
  2016-04-28 19:14               ` Henk Slager
  1 sibling, 1 reply; 32+ messages in thread
From: Matthias Bodenbinder @ 2016-04-28  5:09 UTC (permalink / raw)
  To: linux-btrfs

Am 26.04.2016 um 18:19 schrieb Henk Slager:
> It looks like a JMS567 + SATA port multipliers behind it are used in
> this drivebay. The command   lsusb -v  could show that. So your HW
> setup is like JBOD, not RAID.

Here is the output of lsusb -v:


Bus 003 Device 004: ID 152d:0567 JMicron Technology Corp. / JMicron USA Technology Corp. 
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               3.00
  bDeviceClass            0 (Defined at Interface level)
  bDeviceSubClass         0 
  bDeviceProtocol         0 
  bMaxPacketSize0         9
  idVendor           0x152d JMicron Technology Corp. / JMicron USA Technology Corp.
  idProduct          0x0567 
  bcdDevice            2.05
  iManufacturer          10 JMicron
  iProduct               11 USB to ATA/ATAPI Bridge
  iSerial                 5 152D00539000
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           44
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0 
    bmAttributes         0xc0
      Self Powered
    MaxPower                2mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           2
      bInterfaceClass         8 Mass Storage
      bInterfaceSubClass      6 SCSI
      bInterfaceProtocol     80 Bulk-Only
      iInterface              0 
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0400  1x 1024 bytes
        bInterval               0
        bMaxBurst              15
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x02  EP 2 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0400  1x 1024 bytes
        bInterval               0
        bMaxBurst              15
Binary Object Store Descriptor:
  bLength                 5
  bDescriptorType        15
  wTotalLength           22
  bNumDeviceCaps          2
  USB 2.0 Extension Device Capability:
    bLength                 7
    bDescriptorType        16
    bDevCapabilityType      2
    bmAttributes   0x00000002
      Link Power Management (LPM) Supported
  SuperSpeed USB Device Capability:
    bLength                10
    bDescriptorType        16
    bDevCapabilityType      3
    bmAttributes         0x00
    wSpeedsSupported   0x000e
      Device can operate at Full Speed (12Mbps)
      Device can operate at High Speed (480Mbps)
      Device can operate at SuperSpeed (5Gbps)
    bFunctionalitySupport   1
      Lowest fully-functional device speed is Full Speed (12Mbps)
    bU1DevExitLat          10 micro seconds
    bU2DevExitLat        2047 micro seconds
Device Status:     0x0001
  Self Powered
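
For comparing this against the kernel log (idVendor=152d, idProduct=0567),
the identifying fields can be pulled out of the lsusb -v text with a few
lines of Python. This is a rough sketch that only assumes the "name value
label" line layout visible in the output above. parse_lsusb_device and
usb_id are hypothetical helper names, not part of usbutils.

```python
import re

# Matches descriptor lines such as
#   "  idVendor           0x152d JMicron Technology Corp. / ..."
# The trailing label is optional (idProduct has none in the output above).
FIELD_RE = re.compile(
    r"^\s*(idVendor|idProduct|bcdUSB|iProduct)\s+(\S+)(?:\s+(.*\S))?\s*$"
)


def parse_lsusb_device(text):
    """Extract a few identifying fields from one `lsusb -v` device block."""
    info = {}
    for line in text.splitlines():
        m = FIELD_RE.match(line)
        if m and m.group(1) not in info:
            key, value, label = m.groups()
            info[key] = {"value": value, "label": label or ""}
    return info


def usb_id(info):
    """Format vendor and product as the usual 'vvvv:pppp' hex pair."""
    vendor = int(info["idVendor"]["value"], 16)
    product = int(info["idProduct"]["value"], 16)
    return "{:04x}:{:04x}".format(vendor, product)
```

Note that this only reports how the bridge presents itself on the bus. It
says nothing certain about which chip or firmware is actually inside the
enclosure.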



> IMHO, using such a setup for software RAID (like btrfs RAID1)
> fundamentally violates the concept of RAID (redundant array of
> independent disks). It depends on where you define the system border
> of the (independent) disks.
> If it is at:
> 
> A) the 4 (or 3 disks in this case) SATA+power interfaces inside the drivebay or
> 
> B) inside the PC's chipset.
> 
> In case A) there is a shared removable link (USB) inside the
> filesystem processing machine.
> In case B) the disks aren't really independent as they share a
> removable link (and as proven by the (un)plug of 1 device affecting
> all others).



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Question: raid1 behaviour on failure
  2016-04-26 16:42             ` Holger Hoffstätte
@ 2016-04-28  5:12               ` Matthias Bodenbinder
  2016-04-28  5:24                 ` Gareth Pye
  0 siblings, 1 reply; 32+ messages in thread
From: Matthias Bodenbinder @ 2016-04-28  5:12 UTC (permalink / raw)
  To: linux-btrfs

Am 26.04.2016 um 18:42 schrieb Holger Hoffstätte:
> On 04/26/16 18:19, Henk Slager wrote:
>> It looks like a JMS567 + SATA port multipliers behind it are used in
>> this drivebay. The command   lsusb -v  could show that. So your HW
>> setup is like JBOD, not RAID.
> 
> I hate to quote the "harmful" trope, but..
> 
> SATA Port Multipliers Considered Harmful
> https://www.usenix.org/system/files/fastpw13-paper7_0.pdf
> 
> aka: how to make any RAID setup useless in 1 easy step.

Interesting article but it has no date to it. Could be outdated or brand new.

Matthias




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Question: raid1 behaviour on failure
  2016-04-28  5:12               ` Matthias Bodenbinder
@ 2016-04-28  5:24                 ` Gareth Pye
  2016-04-28  8:08                   ` Duncan
  0 siblings, 1 reply; 32+ messages in thread
From: Gareth Pye @ 2016-04-28  5:24 UTC (permalink / raw)
  To: Matthias Bodenbinder; +Cc: linux-btrfs

PDF doc info dates it at 23/1/2013, which is the best guess that can
easily be found.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Question: raid1 behaviour on failure
  2016-04-28  5:24                 ` Gareth Pye
@ 2016-04-28  8:08                   ` Duncan
  0 siblings, 0 replies; 32+ messages in thread
From: Duncan @ 2016-04-28  8:08 UTC (permalink / raw)
  To: linux-btrfs

Gareth Pye posted on Thu, 28 Apr 2016 15:24:51 +1000 as excerpted:

> PDF doc info dates it at 23/1/2013, which is the best guess that can
> easily be found.

Well, "easily" is relative, but motivated by your observation I first 
confirmed it, then decided to see what google had to say about the 
authors.

I only looked at the two University of Minnesota authors.  David Lilja is 
a professor there since the 90s, with google turning up various lectures, 
etc, at other universities.  Peng Li, listed as a student in the paper, 
was presumably a graduate student.  His linkedin profile says he's at 
Intel from Aug 2015 to present (software engineer, non-volatile memory 
device R&D), but was Sr. Engineer, Seagate Tech, Minneapolis/St.Paul 
area, July 2013 to August 2015 (drive arch and performance modeling), and 
was a summer intern at Huawei in San Fran area in the middle of 2012.  
There are several patents and papers to his name.

More importantly for us, however, linkedin links to his personal page, 
still University of Minnesota as he graduated there with a doctorate, PhD 
Advisor, no surprise, Prof. David J Lilja.

http://people.ece.umn.edu/~lipeng/

That page lists as one project:

Reliability of SATA Port Multiplier (2012).

So while the paper probably came out in January of 2013 as the pdf date 
suggests, he was working on it in 2012.

BTW, his personal site was last updated in June of 2013 and thus doesn't 
mention anything about his move to Intel in 2015.  I'd guess he hasn't 
touched it since getting the doctorate and the job at Seagate, given the 
page mentions that, but the Linkedin profile said it didn't start until 
July of that year, the month after the last update to his personal page 
at the university.


Took me longer to write that up than to find it, so it wasn't hard, but 
as I said, "easy" is relative, so YMMV. =:^)

Meanwhile, that was just a single sampling, as the paper itself points 
out, so we don't know where it falls among other port multipliers, or 
even if its behavior was characteristic of that brand and model.

What we do have, however, is that semi-official paper, along with other 
observations here about the reliability, or more accurately, lack of 
reliability, of the various USB2SATA bridge chips, etc.  Even without the 
port multiplier, prior real world posted experience here suggests that 
while single device btrfs on USB via USB2SATA bridge may be reasonable, 
it's not particularly reliable as part of a multi-device btrfs, as too 
often the bridges and devices behind them drop out temporarily due to 
power or other reasons, and btrfs at this point simply doesn't cope well 
with devices dropping out and appearing again, possibly as other 
devices.  With a single-device btrfs there isn't much to screw up, the 
data either gets there or doesn't, and the atomic-cow nature of btrfs 
does at least normally allow for recovery to a known past state plus 
replay of the fsync log between commits if it doesn't, but multi-device 
can quickly get out of hand, particularly if more than one device is 
playing the disappear and reappear game at once.


A reasonable conclusion then, is that the given layout isn't particularly 
reliable at more than one point, making multi-device anything over it 
rather unwise.  JBOD /as/ /JBOD/, creating individual single-device 
filesystems on each device (or device partition), may be somewhat more 
workable, but multi-device, whether at the btrfs level or dm- or md-raid 
level underneath some other filesystem, isn't likely to be very reliable 
at all.
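
For anyone wanting to check their own logs for the disappear-and-reappear 
pattern, something like the following could be a starting point.  It is 
only a sketch: it assumes syslog-style lines in the format posted earlier 
in this thread ("usb 3-1: USB disconnect, ..." and "sd 10:0:0:1: [sdg] 
Attached SCSI disk"), and summarize() is a name made up for illustration.

```python
import re

# Assumed line formats, taken from the kernel log posted in this thread.
DISCONNECT_RE = re.compile(
    r"kernel: usb (\S+): USB disconnect, device number (\d+)"
)
ATTACH_RE = re.compile(
    r"kernel: sd (\d+):\d+:\d+:\d+: \[(\w+)\] Attached SCSI disk"
)


def summarize(lines):
    """Collect USB disconnects and the disks attached per SCSI host.

    Returns (disconnects, attaches) where disconnects is a list of
    (usb_port, device_number) tuples in log order, and attaches maps a
    SCSI host number to the disk names attached under it, in log order.
    """
    disconnects = []
    attaches = {}
    for line in lines:
        m = DISCONNECT_RE.search(line)
        if m:
            disconnects.append((m.group(1), int(m.group(2))))
            continue
        m = ATTACH_RE.search(line)
        if m:
            attaches.setdefault(int(m.group(1)), []).append(m.group(2))
    return disconnects, attaches
```

If a single disconnect is followed by every disk of the filesystem 
re-attaching under a new SCSI host number, then all of them dropped out 
together, not just the one that was pulled.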

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Question: raid1 behaviour on failure
  2016-04-28  5:09             ` Matthias Bodenbinder
@ 2016-04-28 19:14               ` Henk Slager
  0 siblings, 0 replies; 32+ messages in thread
From: Henk Slager @ 2016-04-28 19:14 UTC (permalink / raw)
  To: Matthias Bodenbinder; +Cc: linux-btrfs

On Thu, Apr 28, 2016 at 7:09 AM, Matthias Bodenbinder
<matthias@bodenbinder.de> wrote:
> Am 26.04.2016 um 18:19 schrieb Henk Slager:
>> It looks like a JMS567 + SATA port multipliers behind it are used in
>> this drivebay. The command   lsusb -v  could show that. So your HW
>> setup is like JBOD, not RAID.
>
> Here is the output of lsusb -v:
>
>
> Bus 003 Device 004: ID 152d:0567 JMicron Technology Corp. / JMicron USA Technology Corp.
> Device Descriptor:
>   bLength                18
>   bDescriptorType         1
>   bcdUSB               3.00
>   bDeviceClass            0 (Defined at Interface level)
>   bDeviceSubClass         0
>   bDeviceProtocol         0
>   bMaxPacketSize0         9
>   idVendor           0x152d JMicron Technology Corp. / JMicron USA Technology Corp.
>   idProduct          0x0567
>   bcdDevice            2.05
>   iManufacturer          10 JMicron
>   iProduct               11 USB to ATA/ATAPI Bridge
>   iSerial                 5 152D00539000
>   bNumConfigurations      1

OK, that is how the drivebay presents itself. It does not really
correspond to this:
http://www.jmicron.com/PDF/brief/jms567.pdf
It looks more like a jms562 is used, but I don't know what is on the
PCB and in the FW

Anyhow, hot (un)plug capability on the 4 internal SATA i/f is not
explicitly mentioned. If you expect or want that, ask Fantec I would
say.

>   Configuration Descriptor:
>     bLength                 9
>     bDescriptorType         2
>     wTotalLength           44
>     bNumInterfaces          1
>     bConfigurationValue     1
>     iConfiguration          0
>     bmAttributes         0xc0
>       Self Powered
>     MaxPower                2mA
>     Interface Descriptor:
>       bLength                 9
>       bDescriptorType         4
>       bInterfaceNumber        0
>       bAlternateSetting       0
>       bNumEndpoints           2
>       bInterfaceClass         8 Mass Storage
>       bInterfaceSubClass      6 SCSI
>       bInterfaceProtocol     80 Bulk-Only
>       iInterface              0
>       Endpoint Descriptor:
>         bLength                 7
>         bDescriptorType         5
>         bEndpointAddress     0x81  EP 1 IN
>         bmAttributes            2
>           Transfer Type            Bulk
>           Synch Type               None
>           Usage Type               Data
>         wMaxPacketSize     0x0400  1x 1024 bytes
>         bInterval               0
>         bMaxBurst              15
>       Endpoint Descriptor:
>         bLength                 7
>         bDescriptorType         5
>         bEndpointAddress     0x02  EP 2 OUT
>         bmAttributes            2
>           Transfer Type            Bulk
>           Synch Type               None
>           Usage Type               Data
>         wMaxPacketSize     0x0400  1x 1024 bytes
>         bInterval               0
>         bMaxBurst              15
> Binary Object Store Descriptor:
>   bLength                 5
>   bDescriptorType        15
>   wTotalLength           22
>   bNumDeviceCaps          2
>   USB 2.0 Extension Device Capability:
>     bLength                 7
>     bDescriptorType        16
>     bDevCapabilityType      2
>     bmAttributes   0x00000002
>       Link Power Management (LPM) Supported
>   SuperSpeed USB Device Capability:
>     bLength                10
>     bDescriptorType        16
>     bDevCapabilityType      3
>     bmAttributes         0x00
>     wSpeedsSupported   0x000e
>       Device can operate at Full Speed (12Mbps)
>       Device can operate at High Speed (480Mbps)
>       Device can operate at SuperSpeed (5Gbps)
>     bFunctionalitySupport   1
>       Lowest fully-functional device speed is Full Speed (12Mbps)
>     bU1DevExitLat          10 micro seconds
>     bU2DevExitLat        2047 micro seconds
> Device Status:     0x0001
>   Self Powered
>
>
>
>> IMHO, using such a setup for software RAID (like btrfs RAID1)
>> fundamentally violates the concept of RAID (redundant array of
>> independent disks). It depends on where you define the system border
>> of the (independent) disks.
>> If it is at:
>>
>> A) the 4 (or 3 disks in this case) SATA+power interfaces inside the drivebay or
>>
>> B) inside the PC's chipset.
>>
>> In case A) there is a shared removable link (USB) inside the
>> filesystem processing machine.
>> In case B) the disks aren't really independent as they share a
>> removable link (and as proven by the (un)plug of 1 device affecting
>> all others).

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2016-04-28 19:14 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-18  5:06 Question: raid1 behaviour on failure Matthias Bodenbinder
2016-04-18  7:22 ` Qu Wenruo
2016-04-20  5:17   ` Matthias Bodenbinder
2016-04-20  7:25     ` Qu Wenruo
2016-04-21  5:22       ` Matthias Bodenbinder
2016-04-21  5:43         ` Qu Wenruo
2016-04-21  6:02           ` Liu Bo
2016-04-21  6:09             ` Qu Wenruo
2016-04-21 17:40           ` Matthias Bodenbinder
2016-04-22  6:02             ` Qu Wenruo
2016-04-23  7:07               ` Matthias Bodenbinder
2016-04-23  7:17                 ` Matthias Bodenbinder
2016-04-26  8:17                 ` Satoru Takeuchi
2016-04-26 15:16                 ` Henk Slager
2016-04-20 13:32     ` Anand Jain
2016-04-21  5:15       ` Matthias Bodenbinder
2016-04-21  7:19         ` Anand Jain
2016-04-21  6:23     ` Satoru Takeuchi
2016-04-21 11:09       ` Austin S. Hemmelgarn
2016-04-21 11:28       ` Henk Slager
2016-04-21 17:27         ` Matthias Bodenbinder
2016-04-26 16:19           ` Henk Slager
2016-04-26 16:42             ` Holger Hoffstätte
2016-04-28  5:12               ` Matthias Bodenbinder
2016-04-28  5:24                 ` Gareth Pye
2016-04-28  8:08                   ` Duncan
2016-04-28  5:09             ` Matthias Bodenbinder
2016-04-28 19:14               ` Henk Slager
     [not found]       ` <57188534.1070408@jp.fujitsu.com>
2016-04-21 11:58         ` Qu Wenruo
2016-04-22  2:21           ` Satoru Takeuchi
2016-04-22  5:32             ` Qu Wenruo
2016-04-22  6:17               ` Satoru Takeuchi
