* Problems with "btrfs dev remove" of dead disk
@ 2016-02-14 21:55 Andy Smith
  2016-02-14 23:49 ` Chris Murphy
  2016-02-15  3:40 ` Anand Jain
  0 siblings, 2 replies; 4+ messages in thread
From: Andy Smith @ 2016-02-14 21:55 UTC (permalink / raw)
  To: linux-btrfs

Hi,

One of my drives died earlier in a fairly emphatic way: not only did
it show I/O errors and get removed as a device by the kernel, but it
was also making audible grinding/screeching noises until I hot
unplugged it.

Feb 14 18:29:36 specialbrew kernel: [27576156.070961] ata6.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
Feb 14 18:29:37 specialbrew kernel: [27576157.215312] ata6.00: hard resetting link
Feb 14 18:29:37 specialbrew kernel: [27576157.555369] ata6.00: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 14 18:29:37 specialbrew kernel: [27576157.560028] ata6.01: hard resetting link
Feb 14 18:29:38 specialbrew kernel: [27576157.915797] ata6.01: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 14 18:29:38 specialbrew kernel: [27576157.920591] ata6.02: hard resetting link
Feb 14 18:29:38 specialbrew kernel: [27576158.275759] ata6.02: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 14 18:29:38 specialbrew kernel: [27576158.280603] ata6.03: hard resetting link
Feb 14 18:29:38 specialbrew kernel: [27576158.603658] ata6.03: SATA link down (SStatus 0 SControl 320)
Feb 14 18:29:38 specialbrew kernel: [27576158.608844] ata6.04: hard resetting link
Feb 14 18:29:39 specialbrew kernel: [27576158.947805] ata6.04: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 14 18:29:39 specialbrew kernel: [27576158.953058] ata6.05: hard resetting link
Feb 14 18:29:39 specialbrew kernel: [27576159.291801] ata6.05: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 14 18:29:39 specialbrew kernel: [27576159.297143] ata6.06: hard resetting link
Feb 14 18:29:39 specialbrew kernel: [27576159.639850] ata6.06: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 14 18:29:39 specialbrew kernel: [27576159.645411] ata6.07: hard resetting link
Feb 14 18:29:40 specialbrew kernel: [27576159.971581] ata6.07: SATA link down (SStatus 0 SControl 320)
Feb 14 18:29:40 specialbrew kernel: [27576159.977251] ata6.08: hard resetting link
Feb 14 18:29:40 specialbrew kernel: [27576160.303533] ata6.08: SATA link down (SStatus 0 SControl 320)
Feb 14 18:29:40 specialbrew kernel: [27576160.310056] ata6.09: hard resetting link
Feb 14 18:29:40 specialbrew kernel: [27576160.635541] ata6.09: SATA link down (SStatus 0 SControl 320)
Feb 14 18:29:40 specialbrew kernel: [27576160.641371] ata6.10: hard resetting link
Feb 14 18:29:41 specialbrew kernel: [27576160.967639] ata6.10: SATA link down (SStatus 0 SControl 320)
Feb 14 18:29:41 specialbrew kernel: [27576160.973591] ata6.11: hard resetting link
Feb 14 18:29:41 specialbrew kernel: [27576161.299570] ata6.11: SATA link down (SStatus 0 SControl 320)
Feb 14 18:29:41 specialbrew kernel: [27576161.305670] ata6.12: hard resetting link
Feb 14 18:29:41 specialbrew kernel: [27576161.631589] ata6.12: SATA link down (SStatus 0 SControl 320)
Feb 14 18:29:41 specialbrew kernel: [27576161.637725] ata6.13: hard resetting link
Feb 14 18:29:42 specialbrew kernel: [27576161.963597] ata6.13: SATA link down (SStatus 0 SControl 320)
Feb 14 18:29:42 specialbrew kernel: [27576161.969538] ata6.14: hard resetting link
Feb 14 18:29:42 specialbrew kernel: [27576162.295657] ata6.14: SATA link down (SStatus 0 SControl 320)
Feb 14 18:29:42 specialbrew kernel: [27576162.303094] ata6.00: configured for UDMA/100
Feb 14 18:29:42 specialbrew kernel: [27576162.310674] ata6.01: configured for UDMA/100
Feb 14 18:29:42 specialbrew kernel: [27576162.317928] ata6.02: configured for UDMA/100
Feb 14 18:29:42 specialbrew kernel: [27576162.326589] ata6.04: configured for UDMA/100
Feb 14 18:29:42 specialbrew kernel: [27576162.337178] ata6.05: configured for UDMA/100
Feb 14 18:29:42 specialbrew kernel: [27576162.344438] ata6.06: configured for UDMA/100
Feb 14 18:29:43 specialbrew kernel: [27576163.607145] ata6.03: hard resetting link
Feb 14 18:29:44 specialbrew kernel: [27576163.935962] ata6.03: SATA link down (SStatus 0 SControl 320)
Feb 14 18:29:44 specialbrew kernel: [27576163.942835] ata6.03: limiting SATA link speed to 1.5 Gbps
Feb 14 18:29:49 specialbrew kernel: [27576168.939422] ata6.03: hard resetting link
Feb 14 18:29:49 specialbrew kernel: [27576169.264031] ata6.03: SATA link down (SStatus 0 SControl 310)
Feb 14 18:29:49 specialbrew kernel: [27576169.270519] ata6.03: disabled
Feb 14 18:29:49 specialbrew kernel: [27576169.276874] end_request: I/O error, dev sdh, sector 0
Feb 14 18:29:49 specialbrew kernel: [27576169.282908] btrfs_dev_stat_print_on_error: 965 callbacks suppressed
Feb 14 18:29:49 specialbrew kernel: [27576169.282929] ata6: EH complete
Feb 14 18:29:49 specialbrew kernel: [27576169.294246] BTRFS: bdev /dev/sdh errs: wr 125, rd 8, flush 1, corrupt 0, gen 0
Feb 14 18:29:49 specialbrew kernel: [27576169.300987] sd 5:3:0:0: rejecting I/O to offline device
Feb 14 18:29:49 specialbrew kernel: [27576169.307016] BTRFS: lost page write due to I/O error on /dev/sdh
Feb 14 18:29:49 specialbrew kernel: [27576169.312976] BTRFS: bdev /dev/sdh errs: wr 126, rd 8, flush 1, corrupt 0, gen 0
Feb 14 18:29:49 specialbrew kernel: [27576169.319049] ata6.03: detaching (SCSI 5:3:0:0)
Feb 14 18:29:49 specialbrew kernel: [27576169.319433] sd 5:3:0:0: rejecting I/O to offline device
Feb 14 18:29:49 specialbrew kernel: [27576169.319443] BTRFS: lost page write due to I/O error on /dev/sdh
Feb 14 18:29:49 specialbrew kernel: [27576169.319448] BTRFS: bdev /dev/sdh errs: wr 127, rd 8, flush 1, corrupt 0, gen 0
Feb 14 18:29:49 specialbrew kernel: [27576169.319521] sd 5:3:0:0: rejecting I/O to offline device
Feb 14 18:29:49 specialbrew kernel: [27576169.319523] BTRFS: lost page write due to I/O error on /dev/sdh
Feb 14 18:29:49 specialbrew kernel: [27576169.319526] BTRFS: bdev /dev/sdh errs: wr 128, rd 8, flush 1, corrupt 0, gen 0
Feb 14 18:29:49 specialbrew kernel: [27576169.426264] sd 5:3:0:0: [sdh] Synchronizing SCSI cache
Feb 14 18:29:49 specialbrew kernel: [27576169.432734] sd 5:3:0:0: [sdh]  
Feb 14 18:29:49 specialbrew kernel: [27576169.438653] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 14 18:29:49 specialbrew kernel: [27576169.444590] sd 5:3:0:0: [sdh] Stopping disk
Feb 14 18:29:49 specialbrew kernel: [27576169.450961] sd 5:3:0:0: [sdh] START_STOP FAILED
Feb 14 18:29:49 specialbrew kernel: [27576169.456838] sd 5:3:0:0: [sdh]  
Feb 14 18:29:49 specialbrew kernel: [27576169.462622] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 14 18:30:21 specialbrew kernel: [27576201.178630] BTRFS: bdev /dev/sdh errs: wr 128, rd 8, flush 2, corrupt 0, gen 0
Feb 14 18:30:21 specialbrew kernel: [27576201.309583] BTRFS: lost page write due to I/O error on /dev/sdh
Feb 14 18:30:21 specialbrew kernel: [27576201.315761] BTRFS: bdev /dev/sdh errs: wr 129, rd 8, flush 2, corrupt 0, gen 0
Feb 14 18:30:21 specialbrew kernel: [27576201.322086] BTRFS: lost page write due to I/O error on /dev/sdh

…and those BTRFS: messages continue now even though the system no
longer has a /dev/sdh.
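
The "errs:" lines in the log above carry running per-device counters (wr, rd, flush, corrupt, gen). As a minimal sketch, the helper below pulls the most recent set of counters for one device out of kernel log text. The function name latest_btrfs_errs is hypothetical; it assumes the log format shown above and uses only grep, sed and tail.

```shell
# Print the most recent btrfs error counters logged for one device.
# Reads kernel log text on stdin; $1 is the device path, e.g. /dev/sdh.
# Assumes the "BTRFS: bdev <dev> errs: ..." line format seen in this thread.
latest_btrfs_errs() {
    dev="$1"
    # Keep only the counter lines for this device, strip everything up to
    # and including "errs: ", and print the last match.
    grep -F "BTRFS: bdev $dev errs:" | sed 's/.*errs: //' | tail -n 1
}
```

It could be fed from "dmesg" or a saved kernel log, for example: dmesg | latest_btrfs_errs /dev/sdh. On a mounted filesystem, "btrfs device stats <mountpoint>" should report the same counters directly.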

Now:

$ sudo btrfs fi sh /srv/tank
Label: 'tank'  uuid: 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb
        Total devices 6 FS bytes used 1.57TiB
        devid    3 size 1.82TiB used 383.00GiB path /dev/sdg
        devid    4 size 1.82TiB used 384.00GiB path /dev/sdf
        devid    5 size 2.73TiB used 1.25TiB path /dev/sdk
        devid    6 size 1.82TiB used 347.00GiB path /dev/sdj
        devid    7 size 2.73TiB used 464.00GiB path /dev/sde
        *** Some devices missing
$ sudo btrfs dev usage /srv/tank
/dev/sde, ID: 7
   Device size:             2.73TiB
   Data,RAID1:            464.00GiB
   Unallocated:             2.28TiB

/dev/sdf, ID: 4
   Device size:             1.82TiB
   Data,RAID1:            383.00GiB
   Metadata,RAID1:          1.00GiB
   Unallocated:             1.44TiB

/dev/sdg, ID: 3
   Device size:             1.82TiB
   Data,RAID1:            382.00GiB
   Metadata,RAID1:          1.00GiB
   Unallocated:             1.45TiB

/dev/sdh, ID: 2
   Device size:               0.00B
   Data,RAID1:            383.00GiB
   Metadata,RAID1:          1.00GiB
   System,RAID1:           32.00MiB
   Unallocated:             1.44TiB

/dev/sdj, ID: 6
   Device size:             1.82TiB
   Data,RAID1:            347.00GiB
   Unallocated:             1.48TiB

/dev/sdk, ID: 5
   Device size:             2.73TiB
   Data,RAID1:              1.25TiB
   Metadata,RAID1:          3.00GiB
   System,RAID1:           32.00MiB
   Unallocated:             1.48TiB
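
The "fi show" output above reports "Total devices 6" but lists only five devid lines, which is where the "Some devices missing" note comes from. A rough sketch of that comparison follows. The function name count_missing_devices is hypothetical, and it assumes the output of a single filesystem in the format shown above.

```shell
# Compare "Total devices N" with the number of "devid" lines in
# `btrfs filesystem show` output read from stdin.
# Assumes the output describes exactly one filesystem.
count_missing_devices() {
    awk '
        /Total devices/ {
            # The count is the field right after the word "devices".
            for (i = 1; i <= NF; i++)
                if ($i == "devices") total = $(i + 1)
        }
        $1 == "devid" { listed++ }
        END {
            printf "total=%d listed=%d missing=%d\n", total, listed, total - listed
        }
    '
}
```

Usage would be along the lines of: sudo btrfs fi sh /srv/tank | count_missing_devices. It only counts lines, so it cannot say which devid is absent; "btrfs dev usage" above is what shows that it is devid 2.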

So, ideally I'd like to remove the missing device sdh (id 2) to have
redundant copies of the data until I can insert a new drive. But
"remove" doesn't seem to want to work:

$ sudo btrfs dev remove /dev/sdh /srv/tank
ERROR: not a block device: /dev/sdh
$ sudo btrfs dev remove 2 /srv/tank
ERROR: not a block device: 2
$ btrfs --version
btrfs-progs v4.4

I expect my kernel might be too old as it is a Debian backports
version on wheezy (linux-image-3.16.0-0.bpo.4-amd64
3.16.7-ckt20-1+deb8u3~bpo70+1).

If I upgrade the kernel then should one of those remove commands
above work?

I would rather not reboot just now if I can achieve redundancy in
some other way. Would a rebalance like:

$ sudo btrfs balance -f -v -sdevid=2 -mdevid=2 /srv/tank

reconstruct redundant copies elsewhere?

With this btrfs-progs and kernel version, will a later "btrfs
replace start -r /dev/sdh /dev/sdl" work without me rebooting into a
newer kernel, even though /dev/sdh doesn't exist as a device to the
kernel right now?

Any information/advice appreciated.

Cheers,
Andy


* Re: Problems with "btrfs dev remove" of dead disk
  2016-02-14 21:55 Problems with "btrfs dev remove" of dead disk Andy Smith
@ 2016-02-14 23:49 ` Chris Murphy
  2016-02-15  0:13   ` Andy Smith
  2016-02-15  3:40 ` Anand Jain
  1 sibling, 1 reply; 4+ messages in thread
From: Chris Murphy @ 2016-02-14 23:49 UTC (permalink / raw)
  To: Andy Smith; +Cc: Btrfs BTRFS

On Sun, Feb 14, 2016 at 2:55 PM, Andy Smith <andy@strugglers.net> wrote:

>
> So, ideally I'd like to remove the missing device sdh (id 2) to have
> redundant copies of the data until I can insert a new drive. But
> "remove" doesn't seem to want to work:
>
> $ sudo btrfs dev remove /dev/sdh /srv/tank
> ERROR: not a block device: /dev/sdh


Since now it's a missing device, it should be

sudo btrfs device remove missing /srv/tank

But I'm not sure if this works when the volume is not already mounted
degraded. There is no automatic degraded mechanism yet; I think that
needs something even newer than 4.4, plus a patch.

http://www.spinics.net/lists/linux-btrfs/msg51912.html



> With this btrfs-progs and kernel version, will a later "btrfs
> replace start -r /dev/sdh /dev/sdl" work without me rebooting into a
> newer kernel, even though /dev/sdh doesn't exist as a device to the
> kernel right now?
>
> Any information/advice appreciated.

On the one hand, the rebuild doesn't change any fs features, so a
device rebuilt with e.g. kernel 4.4 should read fine later if you go
back to 3.16. On the other hand, there are hundreds of bug fixes and
many thousands of insertions and deletions in Btrfs since then, so it
really doesn't make sense to me you'd want to increase risk of more
Btrfs problems when such known things are now fixed. Consider 4.1.15
if you want a stable long term yet currently supportable kernel.


-- 
Chris Murphy


* Re: Problems with "btrfs dev remove" of dead disk
  2016-02-14 23:49 ` Chris Murphy
@ 2016-02-15  0:13   ` Andy Smith
  0 siblings, 0 replies; 4+ messages in thread
From: Andy Smith @ 2016-02-15  0:13 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

Hi Chris,

On Sun, Feb 14, 2016 at 04:49:29PM -0700, Chris Murphy wrote:
> On Sun, Feb 14, 2016 at 2:55 PM, Andy Smith <andy@strugglers.net> wrote:
> > $ sudo btrfs dev remove /dev/sdh /srv/tank
> > ERROR: not a block device: /dev/sdh
> 
> 
> Since now it's a missing device, it should be
> 
> sudo btrfs device remove missing /srv/tank

$ sudo btrfs device remove missing /srv/tank
ERROR: error removing device 'missing': no missing devices found to remove

> But I'm not sure if this works when the volume is not already mounted
> degraded.

I have now done:

# mount -oremount,degraded /srv/tank 

and tried again, but it produces the same response ("mount" now does
show "degraded" as one of the mount flags, however).

I have not yet tried completely unmounting it and mounting it again.

> it really doesn't make sense to me you'd want to increase risk of
> more Btrfs problems when such known things are now fixed. Consider
> 4.1.15 if you want a stable long term yet currently supportable
> kernel.

It is inconvenient to reboot just now, so if I'm able to fix things
without doing so (e.g. by balance or replace) then I would like to.

If that won't be possible then I will of course boot into a newer
kernel at the same time.

If I end up booting into 4.1.15 then it should be possible to mount
degraded and remove missing?

Cheers,
Andy


* Re: Problems with "btrfs dev remove" of dead disk
  2016-02-14 21:55 Problems with "btrfs dev remove" of dead disk Andy Smith
  2016-02-14 23:49 ` Chris Murphy
@ 2016-02-15  3:40 ` Anand Jain
  1 sibling, 0 replies; 4+ messages in thread
From: Anand Jain @ 2016-02-15  3:40 UTC (permalink / raw)
  To: Andy Smith; +Cc: linux-btrfs



> Feb 14 18:30:21 specialbrew kernel: [27576201.178630] BTRFS: bdev /dev/sdh errs: wr 128, rd 8, flush 2, corrupt 0, gen 0
> Feb 14 18:30:21 specialbrew kernel: [27576201.309583] BTRFS: lost page write due to I/O error on /dev/sdh
> Feb 14 18:30:21 specialbrew kernel: [27576201.315761] BTRFS: bdev /dev/sdh errs: wr 129, rd 8, flush 2, corrupt 0, gen 0
> Feb 14 18:30:21 specialbrew kernel: [27576201.322086] BTRFS: lost page write due to I/O error on /dev/sdh
>
> …and those BTRFS: messages continue now even though the system no
> longer has a /dev/sdh.

    You need the patch set
       [PATCH 00/15] btrfs: Hot spare and Auto replace

    which includes the patch required here:
       [PATCH 07/15] btrfs: introduce device dynamic state transition to offline or failed

    That patch takes care of stopping the I/O when a disk fails.

> Now:
>
> $ sudo btrfs fi sh /srv/tank
> Label: 'tank'  uuid: 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb
>          Total devices 6 FS bytes used 1.57TiB
>          devid    3 size 1.82TiB used 383.00GiB path /dev/sdg
>          devid    4 size 1.82TiB used 384.00GiB path /dev/sdf
>          devid    5 size 2.73TiB used 1.25TiB path /dev/sdk
>          devid    6 size 1.82TiB used 347.00GiB path /dev/sdj
>          devid    7 size 2.73TiB used 464.00GiB path /dev/sde
>          *** Some devices missing

  btrfs-progs has code that works out the "missing" status in user
  space instead of obtaining it from the kernel:
  ---
  commit 206efb60cbe3049e0d44c6da3c1909aeee18f813
     btrfs-progs: Add missing devices check for mounted btrfs.
  ---

  So I recommend using 'btrfs fi show -m', which I guess in your
  case will not show devid 2 as missing, because without the
  kernel patch
    [PATCH 07/15] btrfs: introduce device dynamic state transition to offline or failed
  the kernel won't make that (online to offline/failed) transition at all.

  The only current workaround to tell the kernel that a device is
  missing is to unmount and mount again (a remount is not enough,
  which is a bug). That is not an acceptable workaround for
  enterprise use. Sorry about that.


> $ sudo btrfs dev usage /srv/tank

::

> /dev/sdh, ID: 2
>     Device size:               0.00B
>     Data,RAID1:            383.00GiB
>     Metadata,RAID1:          1.00GiB
>     System,RAID1:           32.00MiB
>     Unallocated:             1.44TiB

  Yep, the kernel does not know that the device is missing. That
  part of the code is in the patch to be integrated, as above.

> So, ideally I'd like to remove the missing device sdh (id 2) to have
> redundant copies of the data until I can insert a new drive. But
> "remove" doesn't seem to want to work:


> $ sudo btrfs dev remove /dev/sdh /srv/tank
> ERROR: not a block device: /dev/sdh
> $ sudo btrfs dev remove 2 /srv/tank
> ERROR: not a block device: 2
> $ btrfs --version
> btrfs-progs v4.4

  The device node is gone now, so the only option is to use the devid
  if you want to remove/delete it, but that needs the patch set
    [PATCH 0/7] Introduce device delete by devid
  I think this is being integrated into 4.5.x (it needs both kernel
  and progs patches).


  If you happen to try any of these patches, please consider
  posting the results.


> I expect my kernel might be too old as it is a Debian backports
> version on wheezy (linux-image-3.16.0-0.bpo.4-amd64
> 3.16.7-ckt20-1+deb8u3~bpo70+1).
>
> If I upgrade the kernel then should one of those remove commands
> above work?


> I would rather not reboot just now if I can achieve redundancy in
> some other way. Would a rebalance like:
>
> $ sudo btrfs balance -f -v -sdevid=2 -mdevid=2 /srv/tank
>
> reconstruct redundant copies elsewhere?

  No. Please don't do that. It would aggravate the I/O errors and
  the disk would never be removed from the kernel.

  I suggest a reboot if btrfs is your root filesystem or btrfs is
  not built as a kernel module; otherwise:
  umount
  modprobe -r btrfs (removes stale device entries)
  btrfs dev scan
  mount

  Now 'btrfs fi show -m' should show devid 2 as missing.
  You can then either replace devid 2 or delete devid 2, based
  on your business data protection needs.
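
The four steps above can be sketched as one shell function. This is only an illustration under stated assumptions: the function name reload_btrfs_degraded and the RUN variable are hypothetical, and the "-o degraded" on the final mount is an assumption not spelled out in the steps above (a filesystem with a missing member normally refuses to mount without it). By default the commands are printed rather than executed.

```shell
# Sketch of the unmount / module reload / rescan / mount sequence.
# Assumptions (not from the thread): the RUN variable, the function name,
# and the use of -o degraded on the final mount.
# With RUN=echo (the default) the commands are only printed;
# set RUN=sudo to actually execute them.
RUN="${RUN:-echo}"

reload_btrfs_degraded() {
    mnt="$1"   # mount point, e.g. /srv/tank
    dev="$2"   # any surviving member device, e.g. /dev/sdg
    # Stop at the first step that fails.
    $RUN umount "$mnt" &&
    $RUN modprobe -r btrfs &&
    $RUN btrfs device scan &&
    $RUN mount -o degraded "$dev" "$mnt"
}
```

A dry run would be: reload_btrfs_degraded /srv/tank /dev/sdg. Note that "modprobe -r btrfs" fails while any other btrfs filesystem is still mounted, so every btrfs mount on the machine has to be unmounted first.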

  Kindly note: if you are trying the hot spare and auto replace patches,
  in this context after the reboot the device will be identified
  as missing, not failed. So auto replace won't trigger a replace
  even if you have a spare device. This is as designed.


> With this btrfs-progs and kernel version, will a later "btrfs
> replace start -r /dev/sdh /dev/sdl" work without me rebooting into a
> newer kernel, even though /dev/sdh doesn't exist as a device to the
> kernel right now?

  Yes, you can consider this without needing to reboot; however, the
  command takes the devid rather than the device path (with your
  mount point in place of /btrfs):
    btrfs replace start -r 2 /dev/sdl /btrfs
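
As a small sketch of the devid form of the command, the helper below only prints the replace command and a follow-up status command; it does not run them. The function name replace_cmds is hypothetical, and the example arguments (devid 2, /dev/sdl, /srv/tank) are the values used elsewhere in this thread.

```shell
# Print (do not run) the replace and status commands for a source devid.
# Rejects a device path such as /dev/sdh as the source, since the device
# node no longer exists and only the numeric devid can identify it.
replace_cmds() {
    devid="$1"; target="$2"; mnt="$3"
    case "$devid" in
        ''|*[!0-9]*)
            echo "source must be a numeric devid, got: $devid" >&2
            return 1
            ;;
    esac
    echo "btrfs replace start -r $devid $target $mnt"
    echo "btrfs replace status $mnt"
}
```

For example, replace_cmds 2 /dev/sdl /srv/tank prints the two commands to run as root.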

Thanks, Anand


> Any information/advice appreciated.
>
> Cheers,
> Andy