Odd behaviour of replace -- unknown resulting state

From: Hugo Mills @ 2017-12-09 17:43 UTC
  To: Btrfs mailing list

   This is on kernel 4.10, so it may already have been fixed in a
later release. If so, apologies for the noise.

   I had a filesystem on 6 devices with a badly failing drive in it
(/dev/sdi). I replaced the drive with a new one:

# btrfs replace start /dev/sdi /dev/sdj /media/video

   Once it had finished(*), I resized the device from 6 TB to 8 TB:

# btrfs fi resize 2:max /media/video

   I also removed another, smaller, device:

# btrfs dev del 7 /media/video

   Following this, btrfs fi show reported the correct (new) size for
devid 2, but still listed the old device node as its path:

Label: 'amelia'  uuid: f7409f7d-bea2-4818-b937-9e45d754b5f1
       Total devices 5 FS bytes used 9.15TiB
       devid    2 size 7.28TiB used 6.44TiB path /dev/sdi2
       devid    3 size 3.63TiB used 3.46TiB path /dev/sde2
       devid    4 size 3.63TiB used 3.45TiB path /dev/sdd2
       devid    5 size 1.81TiB used 1.65TiB path /dev/sdh2
       devid    6 size 3.63TiB used 3.43TiB path /dev/sdc2

   Note that device 2 definitely isn't /dev/sdi2, because /dev/sdi2
was on a 6 TB device, not an 8 TB device.
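
   As a rough check of the size argument above: btrfs fi show prints
sizes in binary units (TiB), while drives are sold in decimal TB. The
sketch below does that conversion. The helper name tb_to_tib is made up
for illustration, and the figures ignore the small amount of space lost
to partitioning.

```shell
# Convert a nominal drive capacity in decimal TB to TiB,
# the unit used in the btrfs fi show output.
# tb_to_tib is a hypothetical helper, not part of btrfs-progs.
tb_to_tib() {
    awk -v tb="$1" 'BEGIN { printf "%.2f\n", tb * 1e12 / (2 ^ 40) }'
}

tb_to_tib 8   # new drive: prints 7.28, matching devid 2 above
tb_to_tib 6   # old drive: prints 5.46
```

   So a 7.28TiB devid 2 is consistent with the new 8 TB drive, and not
with the old 6 TB one.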

   Finally, I physically removed the two retired devices from the
machine: the replaced drive (/dev/sdi) and the deleted one (devid 7).
Pulling the deleted one caused no trouble, but since pulling /dev/sdi,
btrfs fi show reports this:

Label: 'amelia'  uuid: f7409f7d-bea2-4818-b937-9e45d754b5f1
       Total devices 5 FS bytes used 9.15TiB
       devid    3 size 3.63TiB used 3.46TiB path /dev/sde2
       devid    4 size 3.63TiB used 3.45TiB path /dev/sdd2
       devid    5 size 1.81TiB used 1.65TiB path /dev/sdh2
       devid    6 size 3.63TiB used 3.43TiB path /dev/sdc2
       *** Some devices missing

   So, what's the *actual* current state of this filesystem? It's not
throwing write errors in the kernel logs from having a missing device,
so it seems like it's probably OK. However, the FS's idea of which
devices it's got seems to be confused.
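
   The inconsistency is visible in the output itself: the header still
claims 5 devices, but only 4 devid lines are listed. A minimal sketch
for spotting that mismatch mechanically is below. It assumes the text
layout shown in this message, reads the output for a single filesystem
on stdin, and the function name check_devices is made up for
illustration.

```shell
# Compare the "Total devices" count in btrfs fi show output with the
# number of devid lines actually listed. Reads one filesystem's output
# on stdin. check_devices is a hypothetical helper name.
check_devices() {
    awk '
        /Total devices/ { total = $3 }    # e.g. "Total devices 5 FS bytes used ..."
        $1 == "devid"   { listed++ }      # one line per device that is present
        END { printf "total=%d listed=%d\n", total, listed }
    '
}

# Fed with the second listing from this message:
check_devices <<'EOF'
Label: 'amelia'  uuid: f7409f7d-bea2-4818-b937-9e45d754b5f1
       Total devices 5 FS bytes used 9.15TiB
       devid    3 size 3.63TiB used 3.46TiB path /dev/sde2
       devid    4 size 3.63TiB used 3.45TiB path /dev/sdd2
       devid    5 size 1.81TiB used 1.65TiB path /dev/sdh2
       devid    6 size 3.63TiB used 3.43TiB path /dev/sdc2
       *** Some devices missing
EOF
# prints: total=5 listed=4
```

   On a live system the same function could be fed from
"btrfs fi show /media/video". This only reports what the tool prints; it
says nothing about whether the on-disk state is actually healthy.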

   I suspect that if I reboot, it'll all be fine, but I'd be happier
if it hadn't got into this state in the first place.

   Is this bug fixed in later versions of the kernel? Can anyone think
of any issues I might have if I leave it in this state for a while?
Likewise, any issues I might have from a reboot? (Probably into 4.14)

   Hugo.

(*) as an aside, it was reporting over 300% complete when it finally
    completed. Not sure if that's been fixed since 4.10, either.

-- 
Hugo Mills             | Biphocles: Plato's optician
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4          |
