* XFS filesystem claims to be mounted after a disconnect
@ 2014-05-02 13:47 Martin Papik
  2014-05-02 15:04 ` Eric Sandeen
  2014-05-02 15:07 ` Eric Sandeen
  0 siblings, 2 replies; 31+ messages in thread
From: Martin Papik @ 2014-05-02 13:47 UTC (permalink / raw)
  To: xfs




I ran into a problem using XFS. The USB device on which I have an XFS
file system got disconnected and xfs_repair and xfs_check fail with a
message saying the file system is mounted writable. There is no entry
in /etc/mtab or /proc/mounts. However I see messages in the kernel log
(dmesg) about write failures to the disconnected drive.


Please let me know what I can do short of zeroing the log, which I
believe would result in some data loss.

Martin


# xfs_repair /dev/sdd104
xfs_repair: /dev/sdd104 contains a mounted filesystem

fatal error -- couldn't initialize XFS library
# xfs_check /dev/sdd104
xfs_check: /dev/sdd104 contains a mounted and writable filesystem

fatal error -- couldn't initialize XFS library


partial dmesg output:

[346220.652432] Buffer I/O error on device sdb104, logical block
3906961152
[346220.652440] Buffer I/O error on device sdb104, logical block
3906961153
[346220.652443] Buffer I/O error on device sdb104, logical block
3906961154
[346220.652446] Buffer I/O error on device sdb104, logical block
3906961155
[346220.652449] Buffer I/O error on device sdb104, logical block
3906961156
[346220.652452] Buffer I/O error on device sdb104, logical block
3906961157
[346220.652455] Buffer I/O error on device sdb104, logical block
3906961158
[346220.652459] Buffer I/O error on device sdb104, logical block
3906961159
[346220.652473] Buffer I/O error on device sdb104, logical block
3906961352
[346220.652476] Buffer I/O error on device sdb104, logical block
3906961353
[346554.917502] quiet_error: 1924 callbacks suppressed
[346554.917510] Buffer I/O error on device sdb104, logical block 0
[346554.917518] Buffer I/O error on device sdb104, logical block 1
[346554.917522] Buffer I/O error on device sdb104, logical block 2
[346554.917525] Buffer I/O error on device sdb104, logical block 3
[346554.917529] Buffer I/O error on device sdb104, logical block 4
[346554.917532] Buffer I/O error on device sdb104, logical block 5
[346554.917536] Buffer I/O error on device sdb104, logical block 6
[346554.917539] Buffer I/O error on device sdb104, logical block 7
[346554.951030] Buffer I/O error on device sdb104, logical block
3906961152
[346554.951051] Buffer I/O error on device sdb104, logical block
3906961153

The current disk is /dev/sdd104 and sdb104 doesn't appear anywhere in
the kernel, despite the messages in dmesg.

# cat /proc/partitions
major minor  #blocks  name

   7        0     131072 loop0
   8        0  488386584 sda
   8        1     204800 sda1
   8        2  104384512 sda2
   8        3          1 sda3
   8        4   15471640 sda4
   8        5   30403584 sda5
   8        6     104391 sda6
   8        7  117187584 sda7
  11        0    1048575 sr0
   8       64 1953481728 sde
   8       75 1953480687 sde11
   8       80  312571224 sdf
   8       48 1953481728 sdd
 259        0 1953480687 sdd104



_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XFS filesystem claims to be mounted after a disconnect
  2014-05-02 13:47 XFS filesystem claims to be mounted after a disconnect Martin Papik
@ 2014-05-02 15:04 ` Eric Sandeen
  2014-05-02 15:07 ` Eric Sandeen
  1 sibling, 0 replies; 31+ messages in thread
From: Eric Sandeen @ 2014-05-02 15:04 UTC (permalink / raw)
  To: Martin Papik, xfs


On 5/2/14, 8:47 AM, Martin Papik wrote:
> 
> 
> I ran into a problem using XFS. The USB device on which I have an XFS
> file system got disconnected and xfs_repair and xfs_check fail with a
> message saying the file system is mounted writable. There is no entry
> in /etc/mtab or /proc/mounts. However I see messages in the kernel log
> (dmesg) about write failures to the disconnected drive.
> 

platform_check_iswritable() and platform_check_ismounted() in xfsprogs check
these things.

platform_check_ismounted() does a ustat() of the block device,
"ustat() returns  information  about  a mounted file system"
and it knows if it's mounted or not,
"EINVAL: dev does not refer to a device containing a mounted file system."

so something, somewhere thinks it's mounted.  Check /proc/mounts?
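As a rough sketch of the discrepancy at play here (this is not the actual xfsprogs code): tools that scan /proc/mounts see one view, while ustat() asks the kernel's superblock table directly, so the two can disagree for a half-torn-down mount. The helper below implements only the /proc/mounts side; the device path is illustrative.

```python
def is_in_proc_mounts(device):
    """Return True if `device` appears as a mount source in /proc/mounts."""
    with open("/proc/mounts") as mounts:
        return any(line.split()[0] == device for line in mounts)

# platform_check_ismounted() in xfsprogs instead stat()s the device and
# calls ustat(): the kernel answers from its internal mount table, which
# is why xfs_repair can say "mounted" while this check finds nothing.
```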

> Please let me know what I can do short of zeroing the log, which I
> believe would result in some data loss.

Hate to say it, but a reboot may be simplest.  Zeroing the log won't
help.

- -Eric

> Martin
> 
> 
> # xfs_repair /dev/sdd104
> xfs_repair: /dev/sdd104 contains a mounted filesystem
> 
> fatal error -- couldn't initialize XFS library
> # xfs_check /dev/sdd104
> xfs_check: /dev/sdd104 contains a mounted and writable filesystem
> 
> fatal error -- couldn't initialize XFS library
> 
> 
> partial dmesg output:
> 
> [346220.652432] Buffer I/O error on device sdb104, logical block
> 3906961152
> [346220.652440] Buffer I/O error on device sdb104, logical block
> 3906961153
> [346220.652443] Buffer I/O error on device sdb104, logical block
> 3906961154
> [346220.652446] Buffer I/O error on device sdb104, logical block
> 3906961155
> [346220.652449] Buffer I/O error on device sdb104, logical block
> 3906961156
> [346220.652452] Buffer I/O error on device sdb104, logical block
> 3906961157
> [346220.652455] Buffer I/O error on device sdb104, logical block
> 3906961158
> [346220.652459] Buffer I/O error on device sdb104, logical block
> 3906961159
> [346220.652473] Buffer I/O error on device sdb104, logical block
> 3906961352
> [346220.652476] Buffer I/O error on device sdb104, logical block
> 3906961353
> [346554.917502] quiet_error: 1924 callbacks suppressed
> [346554.917510] Buffer I/O error on device sdb104, logical block 0
> [346554.917518] Buffer I/O error on device sdb104, logical block 1
> [346554.917522] Buffer I/O error on device sdb104, logical block 2
> [346554.917525] Buffer I/O error on device sdb104, logical block 3
> [346554.917529] Buffer I/O error on device sdb104, logical block 4
> [346554.917532] Buffer I/O error on device sdb104, logical block 5
> [346554.917536] Buffer I/O error on device sdb104, logical block 6
> [346554.917539] Buffer I/O error on device sdb104, logical block 7
> [346554.951030] Buffer I/O error on device sdb104, logical block
> 3906961152
> [346554.951051] Buffer I/O error on device sdb104, logical block
> 3906961153
> 
> The current disk is /dev/sdd104 and sdb104 doesn't appear anywhere in
> the kernel, despite the messages in dmesg.
> 
> # cat /proc/partitions
> major minor  #blocks  name
> 
>    7        0     131072 loop0
>    8        0  488386584 sda
>    8        1     204800 sda1
>    8        2  104384512 sda2
>    8        3          1 sda3
>    8        4   15471640 sda4
>    8        5   30403584 sda5
>    8        6     104391 sda6
>    8        7  117187584 sda7
>   11        0    1048575 sr0
>    8       64 1953481728 sde
>    8       75 1953480687 sde11
>    8       80  312571224 sdf
>    8       48 1953481728 sdd
>  259        0 1953480687 sdd104
> 
> 
> 



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XFS filesystem claims to be mounted after a disconnect
  2014-05-02 13:47 XFS filesystem claims to be mounted after a disconnect Martin Papik
  2014-05-02 15:04 ` Eric Sandeen
@ 2014-05-02 15:07 ` Eric Sandeen
  2014-05-02 15:44   ` Mark Tinguely
  2014-05-02 16:44   ` Martin Papik
  1 sibling, 2 replies; 31+ messages in thread
From: Eric Sandeen @ 2014-05-02 15:07 UTC (permalink / raw)
  To: Martin Papik, xfs


On 5/2/14, 8:47 AM, Martin Papik wrote:
> 
> 
> I ran into a problem using XFS. The USB device on which I have an XFS
> file system got disconnected and xfs_repair and xfs_check fail with a
> message saying the file system is mounted writable. There is no entry
> in /etc/mtab or /proc/mounts. However I see messages in the kernel log
> (dmesg) about write failures to the disconnected drive.

platform_check_iswritable() and platform_check_ismounted() in xfsprogs check
these things.

platform_check_ismounted() does a ustat() of the block device,
"ustat() returns  information  about  a mounted file system"
and it knows if it's mounted or not,
"EINVAL: dev does not refer to a device containing a mounted file system."

so something, somewhere thinks it's mounted.  Check /proc/mounts?

> Please let me know what I can do short of zeroing the log, which I
> believe would result in some data loss.

Hate to say it, but a reboot may be simplest.  Zeroing the log won't
help.  OTOH, if you lost USB connectivity, you already lost some data.

- -Eric



> Martin
> 
> 
> # xfs_repair /dev/sdd104
> xfs_repair: /dev/sdd104 contains a mounted filesystem
> 
> fatal error -- couldn't initialize XFS library
> # xfs_check /dev/sdd104
> xfs_check: /dev/sdd104 contains a mounted and writable filesystem
> 
> fatal error -- couldn't initialize XFS library
> 
> 
> partial dmesg output:
> 
> [346220.652432] Buffer I/O error on device sdb104, logical block
> 3906961152
> [346220.652440] Buffer I/O error on device sdb104, logical block
> 3906961153
> [346220.652443] Buffer I/O error on device sdb104, logical block
> 3906961154
> [346220.652446] Buffer I/O error on device sdb104, logical block
> 3906961155
> [346220.652449] Buffer I/O error on device sdb104, logical block
> 3906961156
> [346220.652452] Buffer I/O error on device sdb104, logical block
> 3906961157
> [346220.652455] Buffer I/O error on device sdb104, logical block
> 3906961158
> [346220.652459] Buffer I/O error on device sdb104, logical block
> 3906961159
> [346220.652473] Buffer I/O error on device sdb104, logical block
> 3906961352
> [346220.652476] Buffer I/O error on device sdb104, logical block
> 3906961353
> [346554.917502] quiet_error: 1924 callbacks suppressed
> [346554.917510] Buffer I/O error on device sdb104, logical block 0
> [346554.917518] Buffer I/O error on device sdb104, logical block 1
> [346554.917522] Buffer I/O error on device sdb104, logical block 2
> [346554.917525] Buffer I/O error on device sdb104, logical block 3
> [346554.917529] Buffer I/O error on device sdb104, logical block 4
> [346554.917532] Buffer I/O error on device sdb104, logical block 5
> [346554.917536] Buffer I/O error on device sdb104, logical block 6
> [346554.917539] Buffer I/O error on device sdb104, logical block 7
> [346554.951030] Buffer I/O error on device sdb104, logical block
> 3906961152
> [346554.951051] Buffer I/O error on device sdb104, logical block
> 3906961153
> 
> The current disk is /dev/sdd104 and sdb104 doesn't appear anywhere in
> the kernel, despite the messages in dmesg.
> 
> # cat /proc/partitions
> major minor  #blocks  name
> 
>    7        0     131072 loop0
>    8        0  488386584 sda
>    8        1     204800 sda1
>    8        2  104384512 sda2
>    8        3          1 sda3
>    8        4   15471640 sda4
>    8        5   30403584 sda5
>    8        6     104391 sda6
>    8        7  117187584 sda7
>   11        0    1048575 sr0
>    8       64 1953481728 sde
>    8       75 1953480687 sde11
>    8       80  312571224 sdf
>    8       48 1953481728 sdd
>  259        0 1953480687 sdd104
> 
> 
> 



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XFS filesystem claims to be mounted after a disconnect
  2014-05-02 15:07 ` Eric Sandeen
@ 2014-05-02 15:44   ` Mark Tinguely
  2014-05-02 16:26     ` Martin Papik
  2014-05-02 16:44   ` Martin Papik
  1 sibling, 1 reply; 31+ messages in thread
From: Mark Tinguely @ 2014-05-02 15:44 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Martin Papik, xfs


Please do a "ps -ef" before umount to see if the unmount is hung.

--Mark.

On 05/02/14 10:07, Eric Sandeen wrote:
>
> On 5/2/14, 8:47 AM, Martin Papik wrote:
>>
>>
>> I ran into a problem using XFS. The USB device on which I have an XFS
>> file system got disconnected and xfs_repair and xfs_check fail with a
>> message saying the file system is mounted writable. There is no entry
>> in /etc/mtab or /proc/mounts. However I see messages in the kernel log
>> (dmesg) about write failures to the disconnected drive.
>
> platform_check_iswritable() and platform_check_ismounted() in xfsprogs check
> these things.
>
> platform_check_ismounted() does a ustat() of the block device,
> "ustat() returns  information  about  a mounted file system"
> and it knows if it's mounted or not,
> "EINVAL: dev does not refer to a device containing a mounted file system."
>
> so something, somewhere thinks it's mounted.  Check /proc/mounts?
>
>> Please let me know what I can do short of zeroing the log, which I
>> believe would result in some data loss.
>
> Hate to say it, but a reboot may be simplest.  Zeroing the log won't
> help.  OTOH, if you lost USB connectivity, you already lost some data.
>
> - -Eric
>
>
>
>> Martin
>>


<deletes>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XFS filesystem claims to be mounted after a disconnect
  2014-05-02 15:44   ` Mark Tinguely
@ 2014-05-02 16:26     ` Martin Papik
  0 siblings, 0 replies; 31+ messages in thread
From: Martin Papik @ 2014-05-02 16:26 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: xfs


On 05/02/2014 06:44 PM, Mark Tinguely wrote:
> 
> Please do a "ps -ef" before umount to see if the unmount is hung.
> 
> --Mark.

Can't do that any more; I think I've figured out part of the problem
and "resolved" it. There's nothing in "ps -ef" any more, though maybe
there was. What seemed to be the problem was that after the device got
disconnected the xfs module was still aware of the mount, even though
I could not see it in /proc/mounts or find any other trace of it
anywhere. After I went through all processes that had a working
directory on the mounted device and changed it to something else, xfs
was able to complete its job. But until then it wasn't allowing me to
do anything.
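The hunt described above can be scripted; this is a hypothetical sketch (not the exact commands used) that walks /proc to find processes whose working directory sits under a given mount point. Open file descriptors would need a similar pass over /proc/<pid>/fd.

```python
import os

def procs_with_cwd_under(mount_point):
    """PIDs of processes whose current working directory is under mount_point."""
    root = mount_point.rstrip("/")
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            cwd = os.readlink("/proc/%s/cwd" % entry)
        except OSError:
            continue  # process exited, or no permission to inspect it
        if cwd == root or cwd.startswith(root + "/"):
            pids.append(int(entry))
    return pids
```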

It's strange that I couldn't see the xfs mount (/proc/mounts), nor any
directories in use (find /proc -type l -ls | grep <mount-dir>). Any
idea why this is? Personally I think this is some kind of bug. IMHO
the xfs driver should be able to inform the user/admin that the
filesystem is still in use.

Below is what appeared in the kernel log after the last process
stopped using the filesystem.

One more question, even though in this case it's moot now: is there
any possibility of telling xfs that device sdd104 is the new sdb104
and having it resume writing where it left off? I know there are
risks involved, especially if the USB device doesn't report write
completion correctly (FUA). But I'd like to know. I believe that in
this case it would have been safe; the filesystem was idle for at
least an hour before the USB disconnected.

Martin

[352505.707397] XFS (sdb104): metadata I/O error: block 0x7470230c
("xlog_iodone") error 19 numblks 64
[352505.707415] XFS (sdb104): xfs_do_force_shutdown(0x2) called from
line 1115 of file
/build/buildd/linux-lts-raring-3.8.0/fs/xfs/xfs_log.c.  Return address
= 0xffffffffa07f4fd1
[352505.707445] XFS (sdb104): Log I/O Error Detected.  Shutting down
filesystem
[352505.707448] XFS (sdb104): Please umount the filesystem and rectify
the problem(s)
[352505.707452] XFS (sdb104): Unable to update superblock counters.
Freespace may not be correct on next mount.
[352505.707463] XFS (sdb104): xfs_log_force: error 5 returned.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XFS filesystem claims to be mounted after a disconnect
  2014-05-02 15:07 ` Eric Sandeen
  2014-05-02 15:44   ` Mark Tinguely
@ 2014-05-02 16:44   ` Martin Papik
  2014-05-02 16:53     ` Eric Sandeen
  1 sibling, 1 reply; 31+ messages in thread
From: Martin Papik @ 2014-05-02 16:44 UTC (permalink / raw)
  To: Eric Sandeen, xfs



> so something, somewhere thinks it's mounted.  Check /proc/mounts?

There was nothing in /proc/mounts, nor was there any visible reference
in /proc. If there had been, the first thing I would have done is make
sure no process was using the FS. As it was, I only did that later
based on a hunch.

>> Please let me know what I can do short of zeroing the log, which
>> I believe would result in some data loss.
> 
> Hate to say it, but a reboot may be simplest.  Zeroing the log
> won't help.  OTOH, if you lost USB connectivity, you already lost
> some data.

Please explain why losing USB connectivity means I've lost data. Is a
SATA/SCSI/NBD disconnect less likely to lose data?

Is XFS not stable enough to function without needing a reboot in
case of a relatively minor HW failure? Minor meaning affecting only
some disks.

Martin


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XFS filesystem claims to be mounted after a disconnect
  2014-05-02 16:44   ` Martin Papik
@ 2014-05-02 16:53     ` Eric Sandeen
  2014-05-02 17:54       ` Martin Papik
  0 siblings, 1 reply; 31+ messages in thread
From: Eric Sandeen @ 2014-05-02 16:53 UTC (permalink / raw)
  To: Martin Papik, xfs


On 5/2/14, 11:44 AM, Martin Papik wrote:
> 
>> so something, somewhere thinks it's mounted.  Check /proc/mounts?
> 
> There was nothing in /proc/mounts, nor was there any visible reference
> in /proc. If there was, the first thing I would have done is to make
> sure no process is using the FS. But I only did it later based on a hunch.

BTW sorry for replying twice, my mailer was being weird and the first
one didn't seem to send.

>>> Please let me know what I can do short of zeroing the log, which
>>> I believe would result in some data loss.
> 
>> Hate to say it, but a reboot may be simplest.  Zeroing the log
>> won't help.  OTOH, if you lost USB connectivity, you already lost
>> some data.
> 
> Please explain why losing USB connectivity means I've lost data. Is a
> SATA/SCSI/NBD disconnect less likely to lose data?

If the device goes away and does not come back, any pending buffered data
to that device will be lost.  That's true of any filesystem, on any type
of connection.

i.e. these IOs have nowhere to go:

> [346220.652432] Buffer I/O error on device sdb104, logical block
> 3906961152

and if they can't ever hit the disk, they'll be lost.
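The distinction between data sitting in the page cache and data that has reached the device can be seen from any program; a minimal illustration (the file and its contents are just placeholders):

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"precious bytes")  # buffered: lives in the page cache for
                                 # now; this is the data that is lost if
                                 # the device vanishes before writeback
os.fsync(fd)                     # force it down to the device; only after
                                 # this returns is the write durable
os.close(fd)
```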

In the USB case when it comes back with a new name, as far as I know
there is no mechanism to handle that anywhere in the kernel.

> Is XFS not stable enough to function without needing a reboot in
> case of a relatively minor HW failure? Minor meaning affecting only
> some disks.

It's not a question of XFS stability, IMHO.  XFS was talking to device A;
device A went away and never came back.

The issue of being unable to repair it seems to have been a result of files
still open on the (disappeared) device?  Once you resolved that, all was
well, and no reboot was needed, correct?

I suggested the reboot as a big-hammer fix to clear the mysterious stale
mount; turns out that was not required, apparently.

If ustat(device) was reporting that it's mounted, but /proc/partitions
didn't show it, then the device was in some kind of limbo state, I guess,
and that sort of umount handling is below XFS (or any other filesystem),
as far as I know.

What initiated the unmount, was it you (after the USB disconnect) or some
udev magic?

- -Eric

> Martin
> 



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: XFS filesystem claims to be mounted after a disconnect
  2014-05-02 16:53     ` Eric Sandeen
@ 2014-05-02 17:54       ` Martin Papik
  2014-05-02 18:39         ` Eric Sandeen
  0 siblings, 1 reply; 31+ messages in thread
From: Martin Papik @ 2014-05-02 17:54 UTC (permalink / raw)
  To: Eric Sandeen, xfs



> In the USB case when it comes back with a new name, as far as I
> know there is no mechanism to handle that anywhere in the kernel.

Is there a mechanism for other devices?

>> Is XFS not stable enough to function without needing a reboot
>> in case of a relatively minor HW failure? Minor meaning affecting
>> only some disks.
> 
> It's not a question of XFS stability, IMHO.  XFS was talking to
> device A; device A went away and never came back.

Well, it kinda did come back, but that's a different story.

> The issue of being unable to repair it seems to have been a result
> of files still open on the (disappeared) device?  Once you resolved
> that, all was well, and no reboot was needed, correct?

Yup, but xfs was still active without a trace in /proc/mounts, which
is what confused me.

> I suggested the reboot as a big-hammer fix to clear the mysterious
> stale mount; turns out that was not required, apparently.

I don't like that particular hammer. Personal opinion, sure, but it
seems to me that reboot is what you do when you don't know what went
wrong or you know it's totally fubar. In this case, IMHO, not fubar.

> If ustat(device) was reporting that it's mounted, but
> /proc/partitions didn't show it, then the device was in some kind
> of limbo state, I guess, and that sort of umount handling is below
> XFS (or any other filesystem), as far as I know.

I'm confused here. /dev/old was not in /proc/partitions or
/proc/mounts; /dev/new was in /proc/partitions but not in
/proc/mounts. Yet even after a disconnect and reconnect of the drive,
/dev/new refused to be acted on by xfs_check or xfs_repair. How did
that happen? All right, apparently there was a stale xfs instance in
the kernel, not visible anywhere, but that was attached to /dev/old;
so why did xfs_repair fail to work on /dev/new until the stale xfs
instance in the kernel finished shutting down?

> What initiated the unmount, was it you (after the USB disconnect)
> or some udev magic?

The disconnect of the USB drive. Specifically, the internal hub in the
notebook failed (I don't know how) and I reset it from ssh (the
keyboard is also on the hub), see below. I didn't find any messages
from any user-space system, though they might not log everything; but
there were messages about the XFS driver detecting the error, the USB
hub being fubar-ed, and the device going off-line. So I'm guessing it
was the panic action, or maybe userspace. I'm not sure; I wasn't able
to find out how XFS handles errors, there's nothing in the manual
pages and google didn't help. Do you know? I.e. the equivalent of
errors=remount-ro, or whatever. One page claimed xfs doesn't recognize
this option. My system has the defaults and it's ubuntu/precise, if
that helps.

Martin




May  2 15:49:06 lennie kernel: [344344.325232] sd 11:0:0:0: rejecting
I/O to offline device
May  2 15:49:39 lennie kernel: [344377.367220] hub 2-1:1.0:
hub_port_status failed (err = -110)
May  2 15:49:44 lennie kernel: [344382.459545] hub 2-1:1.0:
hub_port_status failed (err = -110)
May  2 15:49:50 lennie kernel: [344387.551918] hub 2-1:1.0:
hub_port_status failed (err = -110)
May  2 15:49:50 lennie kernel: [344388.413611] sd 6:0:0:0: rejecting
I/O to offline device
May  2 15:49:50 lennie kernel: [344388.413650] sd 6:0:0:0: rejecting
I/O to offline device
May  2 15:49:50 lennie kernel: [344388.413668] sd 6:0:0:0: rejecting
I/O to offline device
May  2 15:49:52 lennie kernel: [344390.062780] sd 6:0:0:0: rejecting
I/O to offline device
May  2 15:49:52 lennie kernel: [344390.062837] ffff8801034da000: 80 ab
4d 03 01 88 ff ff 00 00 70 b4 f0 7f 00 00  ..M.......p.....
May  2 15:49:52 lennie kernel: [344390.062844] XFS (sdb104): Internal
error xfs_dir2_data_reada_verify at line 226 of file
/build/buildd/linux-lts-raring-3.8.0/fs/xfs/xfs_dir2_data.c.
 Caller 0xffffffffa079e33f
May  2 15:49:52 lennie kernel: [344390.062844]
May  2 15:49:52 lennie kernel: [344390.062852] Pid: 642, comm:
kworker/0:1H Tainted: G         C   3.8.0-39-generic #57~precise1-Ubuntu
May  2 15:49:52 lennie kernel: [344390.062854] Call Trace:
May  2 15:49:52 lennie kernel: [344390.062902]  [<ffffffffa07a018f>]
xfs_error_report+0x3f/0x50 [xfs]
May  2 15:49:52 lennie kernel: [344390.062921]  [<ffffffffa079e33f>] ?
xfs_buf_iodone_work+0x3f/0xa0 [xfs]
May  2 15:49:52 lennie kernel: [344390.062939]  [<ffffffffa07a01fe>]
xfs_corruption_error+0x5e/0x90 [xfs]
May  2 15:49:52 lennie kernel: [344390.062966]  [<ffffffffa07da159>]
xfs_dir2_data_reada_verify+0x59/0xa0 [xfs]
May  2 15:49:52 lennie kernel: [344390.062986]  [<ffffffffa079e33f>] ?
xfs_buf_iodone_work+0x3f/0xa0 [xfs]
May  2 15:49:52 lennie kernel: [344390.062994]  [<ffffffff8108e54a>] ?
finish_task_switch+0x4a/0xf0
May  2 15:49:52 lennie kernel: [344390.063013]  [<ffffffffa079e33f>]
xfs_buf_iodone_work+0x3f/0xa0 [xfs]
May  2 15:49:52 lennie kernel: [344390.063019]  [<ffffffff81078de1>]
process_one_work+0x141/0x4a0
May  2 15:49:52 lennie kernel: [344390.063024]  [<ffffffff81079dd8>]
worker_thread+0x168/0x410
May  2 15:49:52 lennie kernel: [344390.063029]  [<ffffffff81079c70>] ?
manage_workers+0x120/0x120
May  2 15:49:52 lennie kernel: [344390.063034]  [<ffffffff8107f300>]
kthread+0xc0/0xd0
May  2 15:49:52 lennie kernel: [344390.063039]  [<ffffffff8107f240>] ?
flush_kthread_worker+0xb0/0xb0
May  2 15:49:52 lennie kernel: [344390.063046]  [<ffffffff816ff56c>]
ret_from_fork+0x7c/0xb0
May  2 15:49:52 lennie kernel: [344390.063050]  [<ffffffff8107f240>] ?
flush_kthread_worker+0xb0/0xb0
May  2 15:49:52 lennie kernel: [344390.063054] XFS (sdb104):
Corruption detected. Unmount and run xfs_repair
May  2 15:49:52 lennie kernel: [344390.067128] sd 6:0:0:0: rejecting
I/O to offline device
May  2 15:49:52 lennie kernel: [344390.067158] XFS (sdb104): metadata
I/O error: block 0x8a6ec930 ("xfs_trans_read_buf_map") error 117 numblks 8
May  2 15:49:52 lennie kernel: [344390.067179] ffff8801034da000: 80 ab
4d 03 01 88 ff ff 00 00 70 b4 f0 7f 00 00  ..M.......p.....
May  2 15:49:52 lennie kernel: [344390.067184] XFS (sdb104): Internal
error xfs_dir2_block_verify at line 71 of file
/build/buildd/linux-lts-raring-3.8.0/fs/xfs/xfs_dir2_block.c.
 Caller 0xffffffffa07d7f3e
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBCgAGBQJTY9vCAAoJELsEaSRwbVYrIhsQAIDDL7yshllWCBcxSDmfdefh
PMTgMxvzprexd+5xqh14klDySA78FZM44bzMd5mjABQ+GvE0hhbB6kLMQSuySXWi
c+nNtpZXsW7R+o5D0GymWF1PYn3KfbE/aJ3lrLtA6yddwV0KanB4SxD45HoiKGdJ
1a2uLZB4G8ZjvyO6tQYn63R9GMWIX0mK5TovzrXY5JRaTIhYxwwTJjKzQpT+N67m
nWb86Ve3ahDQHBZx1hhf/xRtKYjgPENH57goKyZqdcmUlTgm2AUhsN0tbfm5T1sX
Bb0f4ZOebkfdhXfq5Sk/Eysz7gL+CdPwETJUwr/Z42QFUZfkK1/G1bbJTXZeXi8B
cngPk65VxV4UCGX3nzVpg5wk7scelIFULrmUM8FgiR3+SN6oZ4cWycQLGYr44j4k
UchuHcZpuMvCiHIPXWGk1PASIWUqdy7eroj900pVVGBMRwyiNe3pmbVHOpjK2owi
KaCUiDB86WuKK9V5SSWL3UgVfjy994vZEIvOczaf7+vKfkhW4OX2MJNXDGmWW0/E
3JFbIrD8ETPGhYR2+emRZhOa6op8I5buvkegfMLgWhRxh5jlxxeZ6e2ZdUHc8Ty2
r8xaKnoJArehYzUKxqPCBLwRNljGBMrZ+F1O2Ifemm4cWtocmG56Ae3WvbM+btEH
2po38EG9LNPvuquUJqxy
=+zQ+
-----END PGP SIGNATURE-----

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: XFS filesystem claims to be mounted after a disconnect
  2014-05-02 17:54       ` Martin Papik
@ 2014-05-02 18:39         ` Eric Sandeen
  2014-05-02 19:07           ` Martin Papik
  0 siblings, 1 reply; 31+ messages in thread
From: Eric Sandeen @ 2014-05-02 18:39 UTC (permalink / raw)
  To: Martin Papik, xfs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 5/2/14, 12:54 PM, Martin Papik wrote:
> 
>> In the USB case when it comes back with a new name, as far as I
>> know there is no mechanism to handle that anywhere in the kernel.
> 
> Is there a mechanism for other devices?

to be honest, I'm not certain; if it came back under the same device
name, things may have continued.  I'm not sure.

In general, filesystems are not very happy with storage being yanked
out from under them.

>>> Is XFS is not stable enough to function without a need to reboot
>>> in case of a relatively minor HW failure? Minor meaning affecting
>>> only some disks.
> 
>> It's not a question of XFS stability, IMHO.  XFS was talking to
>> device A; device A went away and never came back.
> 
> Well, it kinda did come back, but that's a different story.
> 
>> The issue of being unable to repair it seems to have been a result
>> of files still open on the (disappeared) device?  Once you resolved
>> that, all was well, and no reboot was needed, correct?
> 
> Yup, but xfs was still active without a trace in /proc/mounts, which
> is what confused me.

I agree, it's confusing.

>> I suggested the reboot as a big-hammer fix to clear the mysterious
>> stale mount; turns out that was not required, apparently.
> 
> I don't like that particular hammer. Personal opinion, sure, but it
> seems to me that reboot is what you do when you don't know what went
> wrong or you know it's totally fubar. In this case, IMHO, not fubar.

Well, I did say that it was the simplest thing.  Not the best or
most informative thing.  :)

>> If ustat(device) was reporting that it's mounted, but
>> /proc/partitions didn't show it, then the device was in some kind
>> of limbo state, I guess, and that sort of umount handling is below
>> XFS (or any other filesystem), as far as I know.
> 
> I'm confused here. /dev/old was not in /proc/partitions or
> /proc/mounts; /dev/new was in /proc/partitions but not in
> /proc/mounts. Even after a disconnect and reconnect of the drive,
> /dev/new refused to be acted on by xfs_check or xfs_repair. How did
> that happen? All right, apparently there was a stale xfs instance in
> the kernel, not visible anywhere, that was attached to /dev/old; so
> why did xfs_repair fail to work on /dev/new until that stale xfs
> instance finished shutting down?

Somewhere in the vfs, the filesystem was still present in a way that
the ustat syscall reported that it was mounted. xfs_repair uses this
syscall to determine mounted state.  It called sys_ustat, got an
answer of "it's mounted" and refused to continue. 

It refused to continue because running xfs_repair on a mounted filesystem
would lead to severe damage.
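For anyone hitting the same confusion: the /proc/mounts table and the ustat(2) answer are produced differently, which is why they can disagree here. A minimal Linux-only Python sketch of the /proc/mounts side follows; this is only an illustration of what table-reading tools see, not how xfs_repair checks (it asks the kernel via ustat(2)).

```python
def proc_mounts(path="/proc/mounts"):
    """Parse /proc/mounts into (device, mountpoint) pairs.

    Linux-only sketch.  The kernel escapes spaces and tabs in mount
    entries as octal sequences such as \\040, so decode them back.
    """
    entries = []
    with open(path) as f:
        for line in f:
            dev, mnt = line.split()[:2]
            entries.append((dev, mnt.encode().decode("unicode_escape")))
    return entries

def listed_in_proc_mounts(device):
    """True if the given device node appears in /proc/mounts.

    Caveat (the point of this thread): a lazily-detached filesystem
    has already left this table while ustat(2) still reports it as
    mounted, so False here does not guarantee it is safe to repair.
    """
    return any(dev == device for dev, _ in proc_mounts())
```

In the situation reported above, a check like this returns False for the device while the kernel's ustat(2) still says "mounted", and xfs_repair rightly trusts the latter.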

>> What initiated the unmount, was it you (after the USB disconnect)
>> or some udev magic?
> 
> The disconnect of the USB drive, specifically the internal HUB in the
> notebook failed (don't know how), I reset it from ssh (keyboard is
> also on the hub), see below, I didn't find any messages from any user
> space system, but they might not log everything, but there were
> messages about the XFS driver detecting the error, the USB hub being
> fubar-ed, the device being off-line, so I'm guessing it was the panic
> action, or maybe userspace. I'm not sure, I wasn't able to find out
> how XFS handles errors, there's nothing in the manual pages and google
> didn't help. Do you know? I.e. the equivalent of errors=remount_ro, or
> whatever. One page claimed xfs doesn't recognize this option. My
> system has the defaults and it's ubuntu/precise, if that helps.

If xfs encounters an insurmountable error, it will shut down, and all
operations will return EIO or EUCLEAN.  You are right that there is no
errors=* mount option; the behavior is not configurable on xfs.
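Incidentally, the "error 117" in the dmesg excerpts earlier in this thread is EUCLEAN. On Linux both codes can be decoded from Python's standard errno module:

```python
import errno
import os

# The two errors a shut-down XFS returns, per the note above.
# "error 117" in the dmesg excerpts in this thread is EUCLEAN,
# which the kernel describes as "Structure needs cleaning".
for code in (errno.EIO, errno.EUCLEAN):
    print(code, errno.errorcode[code], os.strerror(code))
```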

You're right that this doesn't seem to be well described in documentation,
that's probably something we should address.

As for the root cause event; XFS on a yanked and re-plugged USB device
is not something that is heavily tested, to be honest, and it's something
that no filesystem handles particularly well, as far as I know.
(I know that ext4 has had some patches to at least make it a bit less
noisy...)

- -Eric

> Martin
> 
> 
> 
> 
> May  2 15:49:06 lennie kernel: [344344.325232] sd 11:0:0:0: rejecting
> I/O to offline device
> May  2 15:49:39 lennie kernel: [344377.367220] hub 2-1:1.0:
> hub_port_status failed (err = -110)
> May  2 15:49:44 lennie kernel: [344382.459545] hub 2-1:1.0:
> hub_port_status failed (err = -110)
> May  2 15:49:50 lennie kernel: [344387.551918] hub 2-1:1.0:
> hub_port_status failed (err = -110)
> May  2 15:49:50 lennie kernel: [344388.413611] sd 6:0:0:0: rejecting
> I/O to offline device
> May  2 15:49:50 lennie kernel: [344388.413650] sd 6:0:0:0: rejecting
> I/O to offline device
> May  2 15:49:50 lennie kernel: [344388.413668] sd 6:0:0:0: rejecting
> I/O to offline device
> ...
> 


* Re: XFS filesystem claims to be mounted after a disconnect
  2014-05-02 18:39         ` Eric Sandeen
@ 2014-05-02 19:07           ` Martin Papik
  2014-05-02 19:16             ` Eric Sandeen
  2014-05-02 23:35             ` Dave Chinner
  0 siblings, 2 replies; 31+ messages in thread
From: Martin Papik @ 2014-05-02 19:07 UTC (permalink / raw)
  To: Eric Sandeen, xfs

> to be honest, I'm not certain; if it came back under the same
> device name, things may have continued.  I'm not sure.

Personally, I haven't seen it reconnect even once. I've seen disks
fail to appear until the old references are removed, or even
partitions not being detected until everything is clean. Reconnecting
has worked only on SW raid, and only when everything was just right.

> In general, filesystems are not very happy with storage being
> yanked out from under them.

Yup, I know that, though with raid 1, 5 or 6 some yanking is
possible. But I wish it were possible here too, even if only manually
and at my own risk.

> Well, I did say that it was the simplest thing.  Not the best or 
> most informative thing.  :)

I know, I'm just philosophically opposed to rebooting; every time I'm
forced to reboot a system I have a nagging feeling I don't really know
what the problem is or how to fix it. Having to reboot makes me think
I'm stupid, so I prefer fixing things.

> Somewhere in the vfs, the filesystem was still present in a way
> that the ustat syscall reported that it was mounted. xfs_repair
> uses this syscall to determine mounted state.  It called sys_ustat,
> got an answer of "it's mounted" and refused to continue.
> 
> It refused to continue because running xfs_repair on a mounted
> filesystem would lead to severe damage.

I understand that, and I'm okay with whatever I need to do in order to
restore the FS after the failure, but it would be good to have xfs
report the status correctly, i.e. show up in /proc/mounts UNTIL all
resources are released. What do you think?

> If xfs encounters an insurmountable error, it will shut down, and
> all operations will return EIO or EUCLEAN.  You are right that
> there is no errors=* mount option; the behavior is not configurable
> on xfs.

IMHO it should be, but since the last email I've glanced at some
mailing lists and understand that there's some reluctance, in the name
of not polluting the FS after an error. But at least a R/O remount
should be possible, to prevent yanking libraries from under
applications (root FS).

> You're right that this doesn't seem to be well described in
> documentation, that's probably something we should address.

Yup, any idea when? .... Also, I think it would be good to have a
section on what to do when things go south and what to expect. E.g. I
found out the hard way that xfs_check on a 2TB disk allocates 16G of
memory, so now I'm running it with cgroup based limitations, otherwise
I couldn't even open my emails now. I'm still not sure when to run
xfs_check and when xfs_repair, etc. At least I haven't seen such docs.
Maybe I missed them.

Martin

* Re: XFS filesystem claims to be mounted after a disconnect
  2014-05-02 19:07           ` Martin Papik
@ 2014-05-02 19:16             ` Eric Sandeen
  2014-05-02 19:29               ` Martin Papik
  2014-05-02 23:35             ` Dave Chinner
  1 sibling, 1 reply; 31+ messages in thread
From: Eric Sandeen @ 2014-05-02 19:16 UTC (permalink / raw)
  To: Martin Papik, xfs

On 5/2/14, 2:07 PM, Martin Papik wrote:

...

>> You're right that this doesn't seem to be well described in
>> documentation, that's probably something we should address.
> 
> Yup, any idea when? .... Also, I think it would be good to have a
> section on what to do when things go south and what to expect. E.g. I
> found out the hard way that xfs_check on a 2TB disk allocates 16G of
> memory, so now I'm running it with cgroup based limitations, otherwise
> I couldn't even open my emails now. I'm still not sure when to run
> xfs_check and when xfs_repair, etc. At least I haven't seen such docs.
> Maybe I missed them.

We have a lot of docs at http://xfs.org/index.php/XFS_Papers_and_Documentation
in publican/xml format, but Dave has been making noises about converting
that to asciidoc.  In any case, the goal is documentation which is readily
available, version-controlled, for which patches may be submitted...

Like everything else, there are more things to do than there are hours
to do them.

- -Eric

> Martin
> 


* Re: XFS filesystem claims to be mounted after a disconnect
  2014-05-02 19:16             ` Eric Sandeen
@ 2014-05-02 19:29               ` Martin Papik
  2014-05-02 23:38                 ` Dave Chinner
  0 siblings, 1 reply; 31+ messages in thread
From: Martin Papik @ 2014-05-02 19:29 UTC (permalink / raw)
  To: Eric Sandeen, xfs


> We have a lot of docs at 
> http://xfs.org/index.php/XFS_Papers_and_Documentation in 
> publican/xml format, but Dave has been making noises about 
> converting that to asciidoc.  In any case, the goal is 
> documentation which is readily available, version-controlled, for 
> which patches may be submitted...

Unless this documentation is part of the installable package, its
utility is limited. My expectation was for the manual pages to be the
reference. There could be a snippet in each man page saying "for more
information go to http://....", which would help locate this
additional documentation. :-)

Anyway, thanks for the link. I'll have a look.

Martin

* Re: XFS filesystem claims to be mounted after a disconnect
  2014-05-02 19:07           ` Martin Papik
  2014-05-02 19:16             ` Eric Sandeen
@ 2014-05-02 23:35             ` Dave Chinner
  2014-05-03  0:04               ` Martin Papik
  1 sibling, 1 reply; 31+ messages in thread
From: Dave Chinner @ 2014-05-02 23:35 UTC (permalink / raw)
  To: Martin Papik; +Cc: Eric Sandeen, xfs

On Fri, May 02, 2014 at 10:07:20PM +0300, Martin Papik wrote:
> > to be honest, I'm not certain; if it came back under the same
> > device name, things may have continued.  I'm not sure.

No, they won't, because the disconnection breaks all references from
the filesystem to the original block device.

> Personally, I haven't seen it reconnect even once. I've seen disks
> fail to appear until the old references are removed, or even
> partitions not detecting until all is clean. Reconnecting, only on SW
> raid, and only when everything was just right.

Right, that's because sw raid probes the new drive, finds the MD/LVM
signature, and knows where it belongs. Nothing else does that.

> > Somewhere in the vfs, the filesystem was still present in a way
> > that the ustat syscall reported that it was mounted. xfs_repair
> > uses this syscall to determine mounted state.  It called sys_ustat,
> > got an answer of "it's mounted" and refused to continue.
> > 
> > It refused to continue because running xfs_repair on a mounted
> > filesystem would lead to severe damage.
> 
> I understand that, and I'm okay with whatever I need to do in order to
> restore the FS after the failure, but it would be good to have xfs
> report the status correctly, i.e. show up in /proc/mounts UNTIL all
> resources are released. What do you think?

It's called a lazy unmount: "umount -l". It disconnects the
filesystem from the namespace, but it still lives on in the kernel
until all references to the filesystem go away. Given that the
hot-unplug procedure can call back into the filesystem to
sync it (once it's been disconnected!), the hot unplug can deadlock
on filesystem locks that can't be released until the hot-unplug
errors everything out.

So you can end up with the system in an unrecoverable state when USB
unplugs.
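Independent of /proc/mounts, one can still ask the VFS whether a directory is a mount boundary by comparing device numbers with its parent, which is essentially the check mountpoint(1) performs and what Python ships as os.path.ismount. A rough, non-XFS-specific sketch:

```python
import os

def is_mount_boundary(path):
    """Rough equivalent of mountpoint(1) / os.path.ismount():
    a directory starts a new filesystem if its device number differs
    from its parent's, or if it is its own parent (as "/" is).
    Unlike reading /proc/mounts, this asks the VFS via stat(2)."""
    st = os.stat(path)
    parent = os.stat(os.path.join(path, ".."))
    if st.st_dev != parent.st_dev:
        return True
    return st.st_ino == parent.st_ino

print(is_mount_boundary("/"))      # True
print(is_mount_boundary("/proc"))  # True on a typical Linux system
```

For the lazy-unmount limbo described above, even this check is of limited help at the old mountpoint, since the directory has already been detached from the namespace; it is mainly useful for confirming what is still attached.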

> > If xfs encounters an insurmountable error, it will shut down, and
> > all operations will return EIO or EUCLEAN.  You are right that
> > there is no errors=* mount option; the behavior is not configurable
> > on xfs.
> 
> IMHO it should be, but since the last email I've glanced at some
> mailing lists and understand that there's some reluctance, in the name
> of not polluting the FS after an error. But at least a R/O remount
> should be possible, to prevent yanking libraries from under
> applications (root FS).

What you see here has nothing to do with XFS's shutdown behaviour.
The filesystem is already unmounted, it just can't be destroyed
because there are still kernel internal references to it.

> > documentation, that's probably something we should address.
> 
> Yup, any idea when? .... Also, I think it would be good to have a
> section on what to do when things go south and what to expect. E.g. I
> found out the hard way that xfs_check on a 2TB disk allocates 16G of
> memory, so now I'm running it with cgroup based limitations, otherwise

$ man xfs_check
....
Note that xfs_check is deprecated and scheduled for removal in June
2014. Please use xfs_repair -n instead.
....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: XFS filesystem claims to be mounted after a disconnect
  2014-05-02 19:29               ` Martin Papik
@ 2014-05-02 23:38                 ` Dave Chinner
  0 siblings, 0 replies; 31+ messages in thread
From: Dave Chinner @ 2014-05-02 23:38 UTC (permalink / raw)
  To: Martin Papik; +Cc: Eric Sandeen, xfs

On Fri, May 02, 2014 at 10:29:50PM +0300, Martin Papik wrote:
> 
> > We have a lot of docs at 
> > http://xfs.org/index.php/XFS_Papers_and_Documentation in 
> > publican/xml format, but Dave has been making noises about 
> > converting that to asciidoc.  In any case, the goal is 
> > documentation which is readily available, version-controlled, for 
> > which patches may be submitted...
> 
> Unless this documentation is part of the installable package, its
> utility is limited. My expectation was for the manual pages to be the
> reference. There could be a snippet in each man page saying "for more
> information go to http://....", which would help locate this
> additional documentation. :-)

Eventually it will become exactly that - an installable XFS
documentation package that ends up in /usr/share/doc/xfs.  Other
parts of it will end up as the source for the wiki pages e.g.  the
FAQ. As I do stuff, it ends up here:

http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/xfs-documentation.git

But right now it's time and resources that limit the conversion and
development of that repository, so don't hold your breath too long
waiting for it.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: XFS filesystem claims to be mounted after a disconnect
  2014-05-02 23:35             ` Dave Chinner
@ 2014-05-03  0:04               ` Martin Papik
  2014-05-03  3:02                 ` Dave Chinner
  0 siblings, 1 reply; 31+ messages in thread
From: Martin Papik @ 2014-05-03  0:04 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs


> It's called a lazy unmount: "umount -l". It disconnects the 
> filesystem from the namespace, but it still lives on in the kernel 
> until all references to the filesystem go away. Given that the 
> hot-unplug procedure can call back into the filesystem to sync it
> (once it's been disconnected!) the hot unplug can deadlock on
> filesystem locks that can't be released until the hot-unplug errors
> everything out.
> 
> So you can end up with the system in an unrecoverable state when
> USB unplugs.

And the disconnect from the namespace is what removes it from
/proc/mounts?

By hot unplug, do you mean a user initiated "remove device" or a pull
out of the USB cable? I'm sorry, I don't understand your example.
Would you be kind enough to elaborate?

>>> If xfs encounters an insurmountable error, it will shut down,
>>> and all operations will return EIO or EUCLEAN.  You are right
>>> that there is no errors=* mount option; the behavior is not
>>> configurable on xfs.
>> 
>> IMHO it should be, but since the last email I've glanced at some 
>> mailing lists and understand that there's some reluctance, in the
>> name of not polluting the FS after an error. But at least a R/O
>> remount should be possible, to prevent yanking libraries from
>> under applications (root FS).
> 
> What you see here has nothing to do with XFS's shutdown behaviour. 
> The filesystem is already unmounted, it just can't be destroyed 
> because there are still kernel internal references to it.

How can I detect this situation? I mean I didn't see anything in
/proc/mounts or references to the mount point from /proc/<pid>/*, so I
only managed to correct it (chdir elsewhere) by chance on a hunch.
Would it not be desirable to know that there's a phantom FS referenced
by a number of processes?

Also, do you know if this affects other filesystems? I never saw this
with ext3/4 or reiser, I don't have much practical experience with
other filesystems. I ask because your explanation sounds like it's vfs
rather than xfs, but as I said, I never saw this before.

>>> documentation, that's probably something we should address.
>> 
>> Yup, any idea when? .... Also, I think it would be good to have
>> a section on what to do when things go south and what to expect.
>> E.g. I found out the hard way that xfs_check on a 2TB disk
>> allocates 16G of memory, so now I'm running it with cgroup based
>> limitations, otherwise
> 
> $ man xfs_check .... Note that xfs_check is deprecated and
> scheduled for removal in June 2014. Please use xfs_repair -n
> instead.

Thanks, I didn't know that.

Martin

* Re: XFS filesystem claims to be mounted after a disconnect
  2014-05-03  0:04               ` Martin Papik
@ 2014-05-03  3:02                 ` Dave Chinner
  2014-06-02 11:22                   ` Martin Papik
  0 siblings, 1 reply; 31+ messages in thread
From: Dave Chinner @ 2014-05-03  3:02 UTC (permalink / raw)
  To: Martin Papik; +Cc: xfs

On Sat, May 03, 2014 at 03:04:48AM +0300, Martin Papik wrote:
> 
> > It's called a lazy unmount: "umount -l". It disconnects the 
> > filesystem from the namespace, but it still lives on in the kernel 
> > until all references to the filesystem go away. Given that the 
> > hot-unplug procedure can call back into the filesystem to sync it
> > (once it's been disconnected!) the hot unplug can deadlock on
> > filesystem locks that can't be released until the hot-unplug errors
> > everything out.
> > 
> > So you can end up with the system in an unrecoverable state when
> > USB unplugs.
> 
> And the disconnect from the namespace is what removes it from
> /proc/mounts?

I believe so.

> By hot unplug, do you mean a user initiated "remove device" or a pull
> out of the USB cable? I'm sorry, I don't understand your example.
> Would you be kind enough to elaborate?

Anything that causes a hot-unplug to occur. There's no real
difference between echoing a value to the relevant sysfs file to
trigger the hot-unplug and simply pulling the plug on the active
device. It could even occur because something went wrong in the USB
subsystem (e.g. a hub stopped communicating) and so the end devices
disappeared, even though nothing is wrong with them.

> >>> If xfs encounters an insurmountable error, it will shut down,
> >>> and all operations will return EIO or EUCLEAN.  You are right
> >>> that there is no errors=* mount option; the behavior is not
> >>> configurable on xfs.
> >> 
> >> IMHO it should be, but since the last email I've glanced at some 
> >> mailing lists and understand that there's some reluctance, in the
> >> name of not polluting the FS after an error. But at least a R/O
> >> remount should be possible, to prevent yanking libraries from
> >> under applications (root FS).
> > 
> > What you see here has nothing to do with XFS's shutdown behaviour. 
> > The filesystem is already unmounted, it just can't be destroyed 
> > because there are still kernel internal references to it.
> 
> How can I detect this situation? I mean I didn't see anything in
> /proc/mounts or references to the mount point from /proc/<pid>/*, so I
> only managed to correct it (chdir elsewhere) by chance on a hunch.
> Would it not be desirable to know that there's a phantom FS referenced
> by a number of processes?

lsof.

> Also, do you know if this affects other filesystems? I never saw this
> with ext3/4 or reiser, I don't have much practical experience with
> other filesystems. I ask because your explanation sounds like it's vfs
> rather than xfs, but as I said, I never saw this before.

Yes, it affects all filesystems - the same behaviour occurs
regardless of the filesystem that is active on the block device.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: XFS filesystem claims to be mounted after a disconnect
  2014-05-03  3:02                 ` Dave Chinner
@ 2014-06-02 11:22                   ` Martin Papik
  2014-06-02 23:41                     ` Dave Chinner
  0 siblings, 1 reply; 31+ messages in thread
From: Martin Papik @ 2014-06-02 11:22 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs


>> How can I detect this situation? I mean I didn't see anything in 
>> /proc/mounts or references to the mount point from /proc/<pid>/*,
>> so I only managed to correct it (chdir elsewhere) by chance on a
>> hunch. Would it not be desirable to know that there's a phantom
>> FS referenced by a number of processes?
> 
> lsof.

No good. It happened again and the only thing showing up in lsof
that's even remotely a match is the kernel process associated with the
filesystem instance. "[xfs-data/sde103]". The mountpoint has been
removed by the automatic mounting facility (udev? systemd?). The
device is no longer in /dev (udev?). There's nothing useful in the
output of "find /proc -ls" either.

Any other suggestions to locate the processes that are holding up XFS?

Martin



* Re: XFS filesystem claims to be mounted after a disconnect
  2014-06-02 11:22                   ` Martin Papik
@ 2014-06-02 23:41                     ` Dave Chinner
  2014-06-03  9:23                       ` Martin Papik
  0 siblings, 1 reply; 31+ messages in thread
From: Dave Chinner @ 2014-06-02 23:41 UTC (permalink / raw)
  To: Martin Papik; +Cc: xfs

On Mon, Jun 02, 2014 at 02:22:15PM +0300, Martin Papik wrote:
> >> How can I detect this situation? I mean I didn't see anything in 
> >> /proc/mounts or references to the mount point from /proc/<pid>/*,
> >> so I only managed to correct it (chdir elsewhere) by chance on a
> >> hunch. Would it not be desirable to know that there's a phantom
> >> FS referenced by a number of processes?
> > 
> > lsof.
> 
> No good.

lsof reports such things as belonging to / because it can't find the
correct path for them. Indeed, you can't find them by filtering on
mount point, file or anything else. But they are there.

e.g:

$ sudo mount /dev/vdc /mnt/scratch
$ cd /mnt/scratch
$ sudo umount -l /mnt/scratch
$ sleep 300
$ cd ~
$ ps -ef |grep [s]leep
dave     16341  7432  0 09:27 pts/1    00:00:00 sleep 300
$ sudo lsof |grep sleep
sleep     16341            dave  cwd       DIR             253,32        6         96 /
sleep     16341            dave  rtd       DIR                8,1     4096          2 /
sleep     16341            dave  txt       REG                8,1    31208      32607 /bin/sleep
sleep     16341            dave  mem       REG                8,1  1742520     245384 /lib/x86_64-linux-gnu/libc-2.17.so
sleep     16341            dave  mem       REG                8,1   145160     245381 /lib/x86_64-linux-gnu/ld-2.17.so
sleep     16341            dave  mem       REG                8,1  1607584      98063 /usr/lib/locale/locale-archive
sleep     16341            dave    0u      CHR              136,1      0t0          4 /dev/pts/1
sleep     16341            dave    1u      CHR              136,1      0t0          4 /dev/pts/1
sleep     16341            dave    2u      CHR              136,1      0t0          4 /dev/pts/1

See the first two DIR lines? They have different devices but the
same path (/). That's what you need to look for - the non-root device
with a root path - in this case 253,32:

$ ls -l /dev/vdc
brw-rw---- 1 root disk 253, 32 Jun  3 09:01 /dev/vdc
$

With this knowledge, the simple way:

$ sudo lsof |grep "253,32"
bash       7432            dave  cwd       DIR             253,32        6         96 /
sleep     16341            dave  cwd       DIR             253,32        6         96 /

Those are the two processes holding references to the unmounted
filesystem.
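The same search can be scripted instead of grepping lsof output. This
is only a sketch, not an XFS tool: it stats every process's cwd and
root link under /proc and compares st_dev, so it needs enough
privilege to read other processes' links, and the 253,32 example
device number is taken from the lsof output above.

```python
import os

def procs_on_device(st_dev):
    """Return pids whose cwd or root directory lives on device st_dev."""
    pids = []
    for pid in os.listdir('/proc'):
        if not pid.isdigit():
            continue
        for link in ('cwd', 'root'):
            try:
                if os.stat(f'/proc/{pid}/{link}').st_dev == st_dev:
                    pids.append(int(pid))
                    break
            except OSError:
                pass  # process exited, or we lack permission
    return pids

if __name__ == '__main__':
    # lsof prints "253,32"; os.makedev turns major,minor into st_dev form
    print(procs_on_device(os.makedev(253, 32)))
```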

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: XFS filesystem claims to be mounted after a disconnect
  2014-06-02 23:41                     ` Dave Chinner
@ 2014-06-03  9:23                       ` Martin Papik
  2014-06-03  9:55                         ` Stefan Ring
  0 siblings, 1 reply; 31+ messages in thread
From: Martin Papik @ 2014-06-03  9:23 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs


> lsof reports such things as belonging to / because it can't find
> the correct path for them. Indeed, you can't find them by filtering
> on mount point, file or anything else. But they are there.

I didn't know that, sorry. I'll let you know how it works out when it
craps out next time.

But I noticed something interesting while playing around with it.
There were 5 kernel processes doing something with the defunct xfs
mount, which is no surprise. When I ran xfs_repair under strace,
there was nothing that looked like a scan of kernel objects (e.g. via
/sys), but I did notice a call to ustat on the newly connected
device, which had a different ID; the disconnected one was
/dev/sde103 and the new device (the one ustat-ed) was /dev/sdd103,
yet ustat reported it mounted. Does XFS do this?

Martin


* Re: XFS filesystem claims to be mounted after a disconnect
  2014-06-03  9:23                       ` Martin Papik
@ 2014-06-03  9:55                         ` Stefan Ring
  2014-06-03 10:48                           ` Martin Papik
  0 siblings, 1 reply; 31+ messages in thread
From: Stefan Ring @ 2014-06-03  9:55 UTC (permalink / raw)
  To: Martin Papik; +Cc: Linux fs XFS

From skimming this thread, it seems that there is some hardware issue
at work here, but nonetheless, I had a very similar situation a while
ago that was rather puzzling to me at the time, having to do with
mount namespaces:
http://oss.sgi.com/pipermail/xfs/2012-August/020910.html


* Re: XFS filesystem claims to be mounted after a disconnect
  2014-06-03  9:55                         ` Stefan Ring
@ 2014-06-03 10:48                           ` Martin Papik
  2014-06-03 21:28                             ` Dave Chinner
  0 siblings, 1 reply; 31+ messages in thread
From: Martin Papik @ 2014-06-03 10:48 UTC (permalink / raw)
  To: Stefan Ring; +Cc: Linux fs XFS

On 06/03/2014 12:55 PM, Stefan Ring wrote:
> From skimming this thread, it seems that there is some hardware
> issue at work here, but nonetheless, I had a very similar situation
> a while ago that was rather puzzling to me at the time, having to
> do with mount namespaces: 
> http://oss.sgi.com/pipermail/xfs/2012-August/020910.html
> 

Hardware issue or not, IMHO XFS has some issues. Specifically, thus
far I have not seen any other filesystem prevent fsck on a USB disk
that disconnected and was reconnected. After all the reconnected
device is a new device. But the new device (different from the
previous one, e.g. sda and sdb) can't be checked (xfs_repair) or mounted.

All right, here's a bit of an experiment. I have a hard drive I use
for testing with several small partitions with several filesystems.

After automounting I see this:

$ cat /proc/mounts | grep media/T
/dev/sdf101 /media/T2 ext2
rw,nosuid,nodev,relatime,errors=continue,user_xattr,acl 0 0
/dev/sdf102 /media/T4 btrfs rw,nosuid,nodev,relatime,nospace_cache 0 0
/dev/sdf104 /media/T5 ext4 rw,nosuid,nodev,relatime,data=ordered 0 0
/dev/sdf103 /media/T4_ ext3
rw,nosuid,nodev,relatime,errors=continue,user_xattr,acl,barrier=1,data=ordered
0 0
/dev/sdf100 /media/TEST xfs
rw,nosuid,nodev,relatime,attr2,inode64,noquota 0 0

I open hexedit on some files on ext4 and xfs

and I see this:

$ lsof | grep TEST
hexedit   24010      martin    3u      REG              259,2   4198400        131 /media/TEST/TEST...FILE
hexedit   24011      martin    3u      REG              259,6   4198400         12 /media/T5/TEST...FILE

After yanking the USB cable I see this:

$ cat /proc/mounts | grep media/T
  --- no output ---
$ lsof | grep TEST
hexedit   24010      martin    3u  unknown                                         /TEST...FILE (stat: Input/output error)
hexedit   24011      martin    3u      REG              259,6   4198400         12 /TEST...FILE

After reconnecting the device ext4 mounts, xfs does not.

dmesg contains this (among other [unrelated] things):

[3095915.107117] sd 60:0:0:0: [sdf] 976773167 512-byte logical blocks:
(500 GB/465 GiB)
[3095915.108343] sd 60:0:0:0: [sdf] Write Protect is off
[3095915.108360] sd 60:0:0:0: [sdf] Mode Sense: 1c 00 00 00
[3095915.110633] sd 60:0:0:0: [sdf] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[3095915.207622]  sdf: sdf69 sdf100 sdf101 sdf102 sdf103 sdf104 sdf105
[3095915.210148] sd 60:0:0:0: [sdf] Attached SCSI disk
[3095917.969887] XFS (sdf100): Mounting Filesystem
[3095918.209464] XFS (sdf100): Starting recovery (logdev: internal)
[3095918.260450] XFS (sdf100): Ending recovery (logdev: internal)
[3096069.218797] XFS (sdf100): metadata I/O error: block 0xa02007
("xlog_iodone") error 19 numblks 64
[3096069.218808] XFS (sdf100): xfs_do_force_shutdown(0x2) called from
line 1115 of file
/build/buildd/linux-lts-raring-3.8.0/fs/xfs/xfs_log.c.  Return address
= 0xffffffffa07f4fd1
[3096069.218830] XFS (sdf100): Log I/O Error Detected.  Shutting down
filesystem
[3096069.218833] XFS (sdf100): Please umount the filesystem and
rectify the problem(s)
[3096099.254131] XFS (sdf100): xfs_log_force: error 5 returned.
[3096129.289338] XFS (sdf100): xfs_log_force: error 5 returned.
[3096159.324525] XFS (sdf100): xfs_log_force: error 5 returned.
[3096185.296795] sd 61:0:0:0: [sdg] 976773167 512-byte logical blocks:
(500 GB/465 GiB)
[3096185.297431] sd 61:0:0:0: [sdg] Write Protect is off
[3096185.297447] sd 61:0:0:0: [sdg] Mode Sense: 1c 00 00 00
[3096185.298022] sd 61:0:0:0: [sdg] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[3096185.392940]  sdg: sdg69 sdg100 sdg101 sdg102 sdg103 sdg104 sdg105
[3096185.395247] sd 61:0:0:0: [sdg] Attached SCSI disk
[3096189.359859] XFS (sdf100): xfs_log_force: error 5 returned.
[3096219.395200] XFS (sdf100): xfs_log_force: error 5 returned.
[3096249.430490] XFS (sdf100): xfs_log_force: error 5 returned.
[3096279.465765] XFS (sdf100): xfs_log_force: error 5 returned.
[3096309.501089] XFS (sdf100): xfs_log_force: error 5 returned.
[3096339.536371] XFS (sdf100): xfs_log_force: error 5 returned.
[3096369.571713] XFS (sdf100): xfs_log_force: error 5 returned.
[3096399.607003] XFS (sdf100): xfs_log_force: error 5 returned.
[3096429.642332] XFS (sdf100): xfs_log_force: error 5 returned.
[3096459.677730] XFS (sdf100): xfs_log_force: error 5 returned.
[3096489.712934] XFS (sdf100): xfs_log_force: error 5 returned.
[3096519.748242] XFS (sdf100): xfs_log_force: error 5 returned.
[3096549.783642] XFS (sdf100): xfs_log_force: error 5 returned.

sdf100 (the old device) and sdg100 (the reconnected device) are
different, but XFS won't touch it.

# xfs_repair /dev/sdg100
xfs_repair: /dev/sdg100 contains a mounted filesystem

fatal error -- couldn't initialize XFS library


Also please do carefully note the difference between the lsof output
for the hung file descriptor on xfs versus ext4. ext4 reports
everything the same as before except the mount path; the xfs report
changes: the device ID is missing and the file type changes from REG
to unknown.

So, AFAIK and IMHO this is an issue with XFS. The impact can be the
inability to recover from a device disconnect, since so far I don't
see a good way to figure out which processes are holding up the FS.
And besides, having to kill processes to mount a filesystem (xfs) is
not a happy state of affairs.

Oh yes, there is a hardware issue somewhere, but that is not the cause
of the XFS behavior, only the trigger: the experiment in this email
was done without my USB hub going nuts, with a good old-fashioned
cable yank. And yes, it's not an everyday occurrence, but a stable and
reliable FS should deal with it. At least I think so, don't you? Sadly
I can't help with the coding; I am not familiar with the code base,
and I got a bit lost trying to follow the path of ustat and
/proc/mounts - it has been ages since I touched the kernel sources.
But I can provide information about what happened. :-) I hope it helps
us all have a good FS.

Martin

PS

# xfs_repair /dev/sdg100
xfs_repair: /dev/sdg100 contains a mounted filesystem

fatal error -- couldn't initialize XFS library
# kill 24010
# xfs_repair /dev/sdg100
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.




* Re: XFS filesystem claims to be mounted after a disconnect
  2014-06-03 10:48                           ` Martin Papik
@ 2014-06-03 21:28                             ` Dave Chinner
  2014-06-03 22:37                               ` Martin Papik
  2014-06-03 22:58                               ` Martin Papik
  0 siblings, 2 replies; 31+ messages in thread
From: Dave Chinner @ 2014-06-03 21:28 UTC (permalink / raw)
  To: Martin Papik; +Cc: Stefan Ring, Linux fs XFS

On Tue, Jun 03, 2014 at 01:48:31PM +0300, Martin Papik wrote:
> On 06/03/2014 12:55 PM, Stefan Ring wrote:
> > From skimming this thread, it seems that there is some hardware
> > issue at work here, but nonetheless, I had a very similar situation
> > a while ago that was rather puzzling to me at the time, having to
> > do with mount namespaces: 
> > http://oss.sgi.com/pipermail/xfs/2012-August/020910.html
> > 
> 
> Hardware issue or not, IMHO XFS has some issues.

No issues; XFS just responds differently to hot-unplug scenarios
than ext4 does. The ext4 behaviour is actually problematic when it
comes to data and filesystem security in error conditions, and so it
is not a model we should be following.

To summarise, yanking the device out from behind XFS is causing an
EIO error to a critical metadata write and it is shutting down to
prevent further error and/or corruption propagation. You have to
unmount the XFS shutdown filesystem before you can access the
filesystem and mount point again.

The fact that ext4 is not failing when you yank the plug is a bad
sign. That's actually a major potential for Bad Stuff because
there's no guarantee that the device you plugged back in is the same
device, yet ext4 appears to think it is just fine. What happens next
is likely to be filesystem corruption and data loss.

> $ cat /proc/mounts | grep media/T
>   --- no output ---
> $ lsof | grep TEST
> hexedit   24010      martin    3u  unknown
>               /TEST...FILE (stat: Input/output error)

Yup, EIO - the device is gone, filesystem shut down. This is a correct
response to the conditions you have created.

> hexedit   24011      martin    3u      REG              259,6
> 4198400         12 /TEST...FILE
> 
> After reconnecting the device ext4 mounts, xfs does not.

Yup - XFS refuses to mount a filesystem with a duplicate UUID,
preventing you from mounting the same filesystem from two different
logical block device instances that point to the same physical disk.
That's the only sane thing to do in enterprise storage systems that
use multi-pathing to present failure-tolerant access to a physical
device.
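For completeness, when you are certain the reconnected disk really is
the same single device (no multipathing involved) and the phantom
mount has been dealt with, the collision can be removed by giving the
filesystem a fresh UUID. A sketch with an assumed device path; the
DRY_RUN guard just prints the commands instead of running them, and
note xfs_admin requires a clean log on an unmounted filesystem:

```shell
#!/bin/sh
# Inspect, then regenerate, the UUID that the duplicate-UUID check
# keys on. DEV is a placeholder for the reconnected device.
DEV="${DEV:-/dev/sdg100}"
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "$@"; else "$@"; fi; }

run blkid -s UUID -o value "$DEV"   # show the current filesystem UUID
run xfs_admin -U generate "$DEV"    # write a fresh random UUID
```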

> dmesg contains this (among other [unrelated] things):
> 
> [3095915.107117] sd 60:0:0:0: [sdf] 976773167 512-byte logical blocks:
> (500 GB/465 GiB)
> [3095915.108343] sd 60:0:0:0: [sdf] Write Protect is off
> [3095915.108360] sd 60:0:0:0: [sdf] Mode Sense: 1c 00 00 00
> [3095915.110633] sd 60:0:0:0: [sdf] Write cache: enabled, read cache:
> enabled, doesn't support DPO or FUA
> [3095915.207622]  sdf: sdf69 sdf100 sdf101 sdf102 sdf103 sdf104 sdf105
> [3095915.210148] sd 60:0:0:0: [sdf] Attached SCSI disk
> [3095917.969887] XFS (sdf100): Mounting Filesystem
> [3095918.209464] XFS (sdf100): Starting recovery (logdev: internal)
> [3095918.260450] XFS (sdf100): Ending recovery (logdev: internal)
> [3096069.218797] XFS (sdf100): metadata I/O error: block 0xa02007
> ("xlog_iodone") error 19 numblks 64

#define ENODEV          19      /* No such device */

Yup, that's what happened to the filesystem - you unplugged the
device and it:

> [3096069.218808] XFS (sdf100): xfs_do_force_shutdown(0x2) called from
> line 1115 of file
> /build/buildd/linux-lts-raring-3.8.0/fs/xfs/xfs_log.c.  Return address
> = 0xffffffffa07f4fd1
> [3096069.218830] XFS (sdf100): Log I/O Error Detected.  Shutting down
> filesystem
> [3096069.218833] XFS (sdf100): Please umount the filesystem and
> rectify the problem(s)

triggered a shutdown and told you what to do next.

> [3096099.254131] XFS (sdf100): xfs_log_force: error 5 returned.
> [3096129.289338] XFS (sdf100): xfs_log_force: error 5 returned.
> [3096159.324525] XFS (sdf100): xfs_log_force: error 5 returned.
> [3096185.296795] sd 61:0:0:0: [sdg] 976773167 512-byte logical blocks:
> (500 GB/465 GiB)
> [3096185.297431] sd 61:0:0:0: [sdg] Write Protect is off
> [3096185.297447] sd 61:0:0:0: [sdg] Mode Sense: 1c 00 00 00
> [3096185.298022] sd 61:0:0:0: [sdg] Write cache: enabled, read cache:

Then the device was hot-plugged and it came back as a different
block device.

> sdf100 (the old device) and sdg100 (the reconnected device) are
> different, but XFS won't touch it.
> 
> # xfs_repair /dev/sdg100
> xfs_repair: /dev/sdg100 contains a mounted filesystem
> 
> fatal error -- couldn't initialize XFS library

Yup, because the filesystem is still mounted at /media/TEST. XFS
checks whether the filesystem on the block device is mounted, not
whether the block device *instance* is mounted. Again, this is
needed in redundant path storage setups because, for example,
/dev/sdc and /dev/sdx might be the same physical disk and filesystem
but have different paths to get them.

> Also please do carefully note the difference between the lsof output
> for the hung file descriptor for xfs and ext4. ext4 reports everything
> the same as before, except for the mount path. xfs report changes, the
> device ID is missing, the file changes from REG to unknown.

Of course - it can't be queried because the filesystem has shut down
and it returned an error.

> So, AFAIK and IMHO this is an issue with XFS. The impact can be the
> inability to recover from a device disconnect, since so far I don't
> see a good way to figure out which processes are holding up the FS.
> And besides, having to kill processes to mount a filesystem (xfs) is
> not a happy state of affairs.

I think you have incorrect expectations of how filesystems should
handle device hot-unplug and a later replug.  You're expecting a
filesystem that is designed for robustness in data center
environments and complex redundant path storage configurations to
behave like a filesystem designed for your laptop.

Hot-unplug is a potential data loss event. Silent data loss is the
single worst evil a filesystem can perpetrate on a user because the
user does not know they lost their important cat videos until they
try to show them to their friends. Now, would you prefer to know you
lost your cat videos straight away (XFS behaviour), or a few months
later when you try to retrieve them (ext4 behaviour)?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: XFS filesystem claims to be mounted after a disconnect
  2014-06-03 21:28                             ` Dave Chinner
@ 2014-06-03 22:37                               ` Martin Papik
  2014-06-05  0:55                                 ` Dave Chinner
  2014-06-03 22:58                               ` Martin Papik
  1 sibling, 1 reply; 31+ messages in thread
From: Martin Papik @ 2014-06-03 22:37 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Linux fs XFS


I think you're trying too hard to defend XFS, which may be causing you
to miss my point. Or it could be my bad communication.

When I yank and replug a device, I can remount it only if I kill
certain processes. But this limitation exists only on the same box;
i.e. it won't prevent me from mounting the same disk on a different
machine, just on the same one.

So here are a few questions.

(1) If the device vanished, why not just terminate the mount instance?
After all it's not like the device comes back one day and the
processes will be able to continue writing to the files. If you could
do that it would be good, but you yourself said you can't.

(2) Following the methods of the prior experiments I did this:
connected the disk to PC1, opened a file in hexedit, yanked the disk,
plugged it back in; at this point PC1 won't touch the disk. I then
moved the disk to PC2, where it automatically and silently (Mounting
Filesystem ++ Ending clean mount) mounts the FS. Then I moved the disk
back, and it still doesn't mount, claiming it's mounted - never mind
that since then the FS was mounted somewhere else, so for all intents
and purposes it's a completely different disk, to which (question 1)
the potentially unwritten data will never be written back. I
apologize, but I really don't see what XFS is protecting me from or
how, and I doubt its success rate. Can you please explain?

(3) Isn't it possible that XFS just doesn't recognize that whatever
error condition happened is permanent and the disk won't come back?
Isn't XFS just forcing me to take a manual action by accident?
Imagine I have some files, just saved them, didn't call fsync, the
data is still in some cache, the cable is yanked, and the data is
lost. In that case XFS won't complain; it only complains if there's a
process holding a reference. That seems more like circumstance than
design. Is it? Is this actual intentional behavior, designed as
opposed to just happening? Again, I apologize, but it really seems to
me that this isn't right, that it's neither intentional nor correct.
I mean, if it was intentional, why not prevent me from mounting the
FS on PC2? It wasn't cleanly unmounted, there was a file open, so
clearly there was some rationale to stop the auto-mount. And if it
didn't stop me from mounting on PC2, why does it stop me on PC1, if
after all the unwritten data will never ever be written?

> Yup - XFS refuses to mount a filesystem with a duplicate UUID, 
> preventing you from mounting the same filesystem from two
> different logical block device instances that point to the same
> physical disk. That's the only sane thing to do in enterprise
> storage systems that use multi-pathing to present failure-tolerant
> access to a physical device.

Actually, IMHO it would also be sane to forget you ever saw a UUID
once the last underlying physical device is gone and you're never
going to write to it again. If you're never touching the FS with UUID
XYZ, then it's not mounted enough to prevent use. IMHO. Yes, as long
as you do have a functioning relationship with UUID XYZ through
/dev/sda1, lock /dev/sdb1 if it has the same UUID; but not after
you've lost all block devices. Or, attempting to put my understanding
of the situation in humorous terms: "the kernel is preventing access
to /dev/sdg100 out of grief for the death of /dev/sdf100". Lame joke,
yes, but please think: what is the actual benefit of me having to
kill a process, after which I yank again, plug again, and the FS
mounts silently? I really don't get this. How is this not a bug?

Martin

On 06/04/2014 12:28 AM, Dave Chinner wrote:
> On Tue, Jun 03, 2014 at 01:48:31PM +0300, Martin Papik wrote:
>> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
>> 
>> On 06/03/2014 12:55 PM, Stefan Ring wrote:
>>> From skimming this thread, it seems that there is some
>>> hardware issue at work here, but nonetheless, I had a very
>>> similar situation a while ago that was rather puzzling to me at
>>> the time, having to do with mount namespaces: 
>>> http://oss.sgi.com/pipermail/xfs/2012-August/020910.html
>>> 
>> 
>> Hardware issue or not, IMHO XFS has some issues.
> 
> No issues, XFS just behaves differently to hot-unplug scenarios to 
> ext4. the ext4 behaviour is actually problematic when it comes to
> data and filesystem security in error conditions and so it is not a
> model we shoul dbe following.
> 
> To summarise, yanking the device out from behind XFS iis causin an 
> EIO error to a critical metadata write and it is shutting down to 
> prevent further error and/or corruption propagation. You have to 
> unmount the XFS shutdown filesystem before you can access the 
> filesystem and mount point again.
> 
> The fact that ext4 is not failing when you yank the plug is a bad 
> sign. That's actually a major potential for Bad Stuff because 
> there's no guarantee that the device you plugged back in is the
> same device, yet ext4 appears to think it is just fine. What
> happens next is likely to be filesystem corruption and data loss.
> 
>> $ cat /proc/mounts | grep media/T --- no output --- $ lsof | grep
>> TEST hexedit   24010      martin    3u  unknown /TEST...FILE
>> (stat: Input/output error)
> 
> Yup, EIO - the device is gone, filesystem shutdown. This is a
> correct reposnse to the conditions you have created.
> 
>> hexedit   24011      martin    3u      REG              259,6 
>> 4198400         12 /TEST...FILE
>> 
>> After reconnecting the device ext4 mounts, xfs does not.
> 
> Yup - XFS refuses to mount a filesystem with a duplicate UUID, 
> preventing you from mounting the same filesystem from two
> different logical block device instances that point to the same
> physical disk. That's the only sane thing to do in enterprise
> storage systems that use multi-pathing to present failure-tolerant
> access to a physical device.
> 
>> dmegs contains this (among other [unrelated] things):
>> 
>> [3095915.107117] sd 60:0:0:0: [sdf] 976773167 512-byte logical
>> blocks: (500 GB/465 GiB) [3095915.108343] sd 60:0:0:0: [sdf]
>> Write Protect is off [3095915.108360] sd 60:0:0:0: [sdf] Mode
>> Sense: 1c 00 00 00 [3095915.110633] sd 60:0:0:0: [sdf] Write
>> cache: enabled, read cache: enabled, doesn't support DPO or FUA 
>> [3095915.207622]  sdf: sdf69 sdf100 sdf101 sdf102 sdf103 sdf104
>> sdf105 [3095915.210148] sd 60:0:0:0: [sdf] Attached SCSI disk 
>> [3095917.969887] XFS (sdf100): Mounting Filesystem 
>> [3095918.209464] XFS (sdf100): Starting recovery (logdev:
>> internal) [3095918.260450] XFS (sdf100): Ending recovery (logdev:
>> internal) [3096069.218797] XFS (sdf100): metadata I/O error:
>> block 0xa02007 ("xlog_iodone") error 19 numblks 64
> 
> #define ENODEV          19      /* No such device */
> 
> Yup, that's what happened to the filesystem - you unplugged the 
> device and it:
> 
>> [3096069.218808] XFS (sdf100): xfs_do_force_shutdown(0x2) called
>> from line 1115 of file 
>> /build/buildd/linux-lts-raring-3.8.0/fs/xfs/xfs_log.c.  Return
>> address = 0xffffffffa07f4fd1 [3096069.218830] XFS (sdf100): Log
>> I/O Error Detected.  Shutting down filesystem [3096069.218833]
>> XFS (sdf100): Please umount the filesystem and rectify the
>> problem(s)
> 
> triggered a shutdown and told you what to do next.
> 
>> [3096099.254131] XFS (sdf100): xfs_log_force: error 5 returned.
>> [3096129.289338] XFS (sdf100): xfs_log_force: error 5 returned.
>> [3096159.324525] XFS (sdf100): xfs_log_force: error 5 returned.
>> [3096185.296795] sd 61:0:0:0: [sdg] 976773167 512-byte logical blocks: (500 GB/465 GiB)
>> [3096185.297431] sd 61:0:0:0: [sdg] Write Protect is off
>> [3096185.297447] sd 61:0:0:0: [sdg] Mode Sense: 1c 00 00 00
>> [3096185.298022] sd 61:0:0:0: [sdg] Write cache: enabled, read cache:
> 
> Then the device was hot-plugged and it came back as a different 
> block device.
> 
>> sdf100 (the old device) and sdg100 (the reconnected device) are 
>> different, but XFS won't touch it.
>> 
>> # xfs_repair /dev/sdg100 xfs_repair: /dev/sdg100 contains a
>> mounted filesystem
>> 
>> fatal error -- couldn't initialize XFS library
> 
> Yup, because the filesystem is still mounted at /mnt/TEST. XFS 
> checks whether the filesystem on the block device is mounted, not 
> whether the block device *instance* is mounted. Again, this is 
> needed in redundant path storage setups because, for example, 
> /dev/sdc and /dev/sdx might be the same physical disk and
> filesystem but have different paths to get them.
> 
>> Also please do carefully note the difference between the lsof
>> output for the hung file descriptor for xfs and ext4. ext4
>> reports everything the same as before, except for the mount path.
>> xfs reports changes: the device ID is missing, the file changes
>> from REG to unknown.
> 
> Of course - it can't be queried because the filesystem has shut
> down and it returned an error.
> 
>> So, AFAIK and IMHO this is an issue with XFS. The impact can be
>> the inability to recover from a device disconnect, since so far I
>> don't see a good way to figure out which processes are holding up
>> the FS. And besides, having to kill processes to mount a
>> filesystem (xfs) is not a happy state of affairs.
> 
> I think you have incorrect expectations of how filesystems should 
> handle device hot-unplug and a later replug.  You're expecting a 
> filesystem that is designed for robustness in data center 
> environments and complex redundant path storage configurations to 
> behave like a filesystem designed for your laptop.
> 
> Hot-unplug is a potential data loss event. Silent data loss is the 
> single worst evil a filesystem can perpetrate on a user because
> the user does not know they lost their important cat videos until
> they try to show them to their friends. Now, would you prefer to
> know you lost your cat videos straight away (XFS behaviour), or a
> few months later when you try to retrieve them (ext4 behaviour)?
> 
> Cheers,
> 
> Dave.
> 


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: XFS filesystem claims to be mounted after a disconnect
  2014-06-03 21:28                             ` Dave Chinner
  2014-06-03 22:37                               ` Martin Papik
@ 2014-06-03 22:58                               ` Martin Papik
  2014-06-05  0:08                                 ` Dave Chinner
  1 sibling, 1 reply; 31+ messages in thread
From: Martin Papik @ 2014-06-03 22:58 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Linux fs XFS

>> [3096069.218830] XFS (sdf100): Log I/O Error Detected.  Shutting down filesystem
>> [3096069.218833] XFS (sdf100): Please umount the filesystem and rectify the problem(s)
> 
> triggered a shutdown and told you what to do next.

Okay, here I'll pick nits. I hope you're not sick of me yet.

1) I would LOVE to unmount the FS, but how? umount /dev/xxx ... device
no longer there. umount /media/xxx ... mount point no longer there.

2) I can't rectify the problems exactly because the FS is mounted
(according to xfs_repair [ustat]), yet not mounted (according to
/proc/mounts). .... unless rectifying the problem means reporting this
as a bug. :-)

3) "Shutting down filesystem" ... isn't this when the new device
should no longer be detected as mounted?

4) come to think of it, if XFS is shutting down, why isn't it
unmounting itself?

Anyway, sorry if I'm being annoying, but I'm really really convinced
this is wrong. :-)

Martin


* Re: XFS filesystem claims to be mounted after a disconnect
  2014-06-03 22:58                               ` Martin Papik
@ 2014-06-05  0:08                                 ` Dave Chinner
  2014-06-05  1:07                                   ` Martin Papik
  0 siblings, 1 reply; 31+ messages in thread
From: Dave Chinner @ 2014-06-05  0:08 UTC (permalink / raw)
  To: Martin Papik; +Cc: Linux fs XFS

On Wed, Jun 04, 2014 at 01:58:54AM +0300, Martin Papik wrote:
> >> [3096069.218830] XFS (sdf100): Log I/O Error Detected.  Shutting down filesystem
> >> [3096069.218833] XFS (sdf100): Please umount the filesystem and rectify the problem(s)
> > 
> > triggered a shutdown and told you what to do next.
> 
> Okay, here I'll pick nits. I hope you're not sick of me yet.
> 
> 1) I would LOVE to unmount the FS, but how? umount /dev/xxx ... device
> no longer there. umount /media/xxx ... mount point no longer there.

Oh, something is doing a lazy unmount automatically on device
unplug? I missed the implications of that - your system is behaving
exactly as it has been told to behave.

That is, lazy unmount only detaches the mount namespace from the
mount - it doesn't actually tell the kernel to unmount the
filesystem internally, instead it just removes the reference count
it has on it. If there are other open references to the filesystem,
then it won't actually do the real unmount until those references go
away.  i.e. lazy unmount is designed to leave the kernel superblock
(i.e. the filesystem) mounted internally until the last reference to
it goes away.

And that leaves the user to find those references and clean them up
so the kernel can actually unmount it. Put simply, the system is
behaving exactly as it has been asked to act in response to your
actions. Whether the automounter is behaving correctly or not, that
is a different matter, but it is certainly not an XFS bug that a
lazy unmount is leaving you with a mess that you need to clean up
manually.
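To make "find those references and clean them up" concrete: once the mount point has vanished, tools like fuser -m have nothing to point at, but the open fds are still visible under /proc. A minimal sketch (the helper name and approach are mine, not a standard tool):

```shell
# find_holders PREFIX: print "pid path" for every open fd whose target
# lies under PREFIX. It scans the /proc/<pid>/fd symlinks, so it only
# sees fds the invoking user may inspect (run as root for full coverage).
find_holders() {
    prefix="$1"
    for fd in /proc/[0-9]*/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            "$prefix"*)
                pid=${fd#/proc/}
                pid=${pid%%/*}
                printf '%s %s\n' "$pid" "$target" ;;
        esac
    done | sort -u
}

# Example: find_holders /media/TEST
```

Closing or killing the listed holders lets the deferred (lazy) unmount finally complete, after which xfs_repair stops reporting the filesystem as mounted.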

> 2) I can't rectify the problems exactly because the FS is mounted
> (according to xfs_repair [ustat]), yet not mounted (according to
> /proc/mounts). .... unless rectifying the problem means reporting this
> as a bug. :-)

Not a bug, it's the desired behaviour of lazy unmounts. Fix userspace
not to hold references when unmounting the filesystem...

> 3) "Shutting down filesystem" ... isn't this when the new device
> should no longer be detected as mounted?

No. Filesystems get shut down for all sorts of reasons and the
correct action to take after unmounting the filesystem depends on
the reason for the shutdown. i.e. a shutdown filesystem requires
manual intervention to recover from, and so the filesystem remains
mounted until such manual intervention can take place.

> 4) come to think of it, if XFS is shutting down, why isn't it
> unmounting itself?

Because a filesystem cannot unmount itself - that has to be done
from userspace.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: XFS filesystem claims to be mounted after a disconnect
  2014-06-03 22:37                               ` Martin Papik
@ 2014-06-05  0:55                                 ` Dave Chinner
  2014-06-05  1:38                                   ` Martin Papik
  2014-06-05 19:39                                   ` Martin Papik
  0 siblings, 2 replies; 31+ messages in thread
From: Dave Chinner @ 2014-06-05  0:55 UTC (permalink / raw)
  To: Martin Papik; +Cc: Linux fs XFS

On Wed, Jun 04, 2014 at 01:37:15AM +0300, Martin Papik wrote:
> 
> I think you're trying too hard to defend XFS which may be causing you
> to miss my point. Or it could be my bad communication.

Or it could be that you lack the knowledge base to understand what I
explained to you. That happens all the time because this stuff is
complex and very few people actually have the time to understand how
it is all supposed to work.

> When I yank and replug a device, I can only remount the device only if
> I kill certain processes. But this limitation exists only on the same
> box. I.e. it won't prevent me from mounting the same disk on a
> different machine, just the same one.
> 
> So here are a few questions.
> 
> (1) If the device vanished, why not just terminate the mount instance?

That's what the automounter is doing from userspace with the lazy
unmount on reception of a device unplug event. i.e. the policy of
what to do when a device unplug event occurs is handled in
userspace, and it has nothing to do with the filesystem on the block
device.

> (2) Following the methods of the prior experiments I did this,
> connected the disk to PC1, hexedit file, yank disk, plug disk, at this
> point PC1 won't touch the disk, moved the disk to PC2, it
> automatically, silently (Mounting Filesystem ++ Ending clean mount)
> mounts the FS, then move the disk back and the disk still doesn't
> mount, claiming it's mounted, never mind that since then the FS was
> mounted somewhere else and for all intents and purposes it a
> completely different disk, to which (question 1) the potentially
> unwritten data will never be written back. I apologize, but I really
> don't see what XFS is protecting me from or how and I doubt its
> success rate. Can you please explain?

It's not protecting you against doing this. You can subvert
/etc/shadow doing this for all I care, but the fact is that until
you clean up the original mess your cable yanking created, XFS won't
allow you to mount that filesystem again on that system.

As I've already explained, we do not allow multiple instances of the
same filesystem to be mounted because in XFS's primary target market
(i.e. servers and enterprise storage) this can occur because of
multi-pathing presenting the same devices multiple times. And in
those environments, mounting the same filesystem multiple times
through different block devices is *always* a mistake and will
result in filesystem corruption and data loss.
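As a toy model of that rule (a sketch only: the function name is made up, and the real check happens in the kernel against its table of mounted superblocks, not in shell):

```shell
# One filesystem UUID may be active through at most one block device at
# a time; a second mount attempt while the UUID is live is refused.
MOUNTED_UUIDS=""

try_mount() {   # try_mount <device> <uuid>
    dev="$1"; uuid="$2"
    case " $MOUNTED_UUIDS " in
        *" $uuid "*)
            echo "mount: $dev: filesystem with UUID $uuid already mounted"
            return 1 ;;
    esac
    MOUNTED_UUIDS="$MOUNTED_UUIDS $uuid"
    echo "mounted $dev ($uuid)"
}

try_mount /dev/sdf100 1234-ABCD            # first path: succeeds
try_mount /dev/sdg100 1234-ABCD || true    # same fs via a second path: refused
```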

> (3) Isn't it possible that XFS just doesn't recognize that whatever
> error condition happened is permanent and the disk won't come back.

XFS can't determine correctly if it is a fatal permanent or
temporary error condition. Hence if we get an error from the storage
(regardless of the error) in a situation we can't recover
from, it is considered fatal regardless of whether the device is
replugged or not. Your case is a failed log IO, which is always a
fatal, unrecoverable error....

> Isn't XFS just forcing me to take a manual action by accident?

No, by intent. Obvious, in-your-face intent. Filesystem corruption
events require manual intervention to analyse and take appropriate
action. You may not think it's necessary for your use case, but
years of use in mission critical data storage environments has
proven otherwise....

> Imagine, I have some files, just saved them, didn't call fsync, the
> data is still in some cache, the cable is yanked, and the data is
> lost. But in this case the XFS won't complain.

It does complain - it logs that it is discarding data unless a
shutdown has already occurred, and then it doesn't bother because
it's already indicated to the log that the filesystem is in big
trouble....

> Only if there's a process. Seems more like circumstance than design.
> Is it? Is this actual intentional behavior?

Lazy unmount does this by intent and XFS has no control over this.
Lazy unmount is done by your userspace software, not the filesystem.
You're shooting the messenger.

> > Yup - XFS refuses to mount a filesystem with a duplicate UUID, 
> > preventing you from mounting the same filesystem from two
> > different logical block device instances that point to the same
> > physical disk. That's the only sane thing to do in enterprise
> > storage systems that use multi-pathing to present failure-tolerant
> > access to a physical device.
> 
> Actually, IMHO it would also be sane to forget you ever saw a UUID
> after the last underlying physical device is gone and you're not going
> to be ever writing to this.

And how does the referenced, mounted filesystem know this? It can't
- it actually holds a reference to the block device that got yanked,
and internally that block device doesn't go away until the
filesystem releases its reference.

> Since if you're never touching the FS with
> UUID XYZ then it's not mounted enough to prevent use. IMHO. But yes,
> as long as you do have a functioning relationship with UUID XYZ
> through /dev/sda1, lock /dev/sdb1 if it has the same UUID. But not
> after you've lost all block devices. ........ Or attempting to put my
> understanding of the situation in humorous terms "the kernel is
> preventing access to /dev/sdg100 out of grief for the death of
> /dev/sdf100".

/dev/sdf still exists inside the kernel while the filesystem that
was using it is still mounted. You just can't see kernel-internal
references to the block device. Sound familiar? It's just like processes
and lazy unmounts, yes? IOWs, what is happening is this:

Yank the device, the device hot-unplugs and nothing new can now use
it. It still has active references, so it isn't cleaned up. It sends
an unplug event to userspace, probably caught by udev, fed into
dbus, picked up by the automounter, which does a lazy unmount of the
filesystem on the device. Filesystem is removed from the namespace,
but open references to it still exist so it's not fully unmounted
and so still holds a block device reference.  Userspace references
to filesystem go away, filesystem completes unmount, releases
blockdev reference, blockdev cleans up and disappears completely,
filesystem cleans up and disappears completely.
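The same reference-counting lifetime can be watched from userspace with an ordinary file (an analogy only, not the blockdev code path): unlinking removes the name immediately, but the inode survives until the last open fd is closed, just as the yanked block device survives inside the kernel until the lazily-unmounted filesystem drops its reference.

```shell
# The name disappears at once; the object lives on while referenced.
tmp=$(mktemp)
exec 5< "$tmp"              # take a reference (open fd)
rm "$tmp"                   # the name is gone, like the /dev node after unplug
readlink /proc/$$/fd/5      # target now ends in " (deleted)" - still alive
exec 5<&-                   # last reference dropped; the inode is freed
```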

Userspace causes the mess because of its handling of the unplug event,
and there's nothing we can do in the kernel about that, because....

> Lame joke, yes, but think please, what is the actual
> benefit of me having to kill a process, after which I yank again, plug
> again, and the FS mounts silently. I really don't get this. How is
> this not a bug?

.... until the userspace references to the filesystem go away, the
kernel still has a huge amount of internally referenced state that
you can't see from userspace. So, the bug here is in userspace by
using lazy unmounts and not dropping active references in a timely
fashion after an unplug event has occurred.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: XFS filesystem claims to be mounted after a disconnect
  2014-06-05  0:08                                 ` Dave Chinner
@ 2014-06-05  1:07                                   ` Martin Papik
  0 siblings, 0 replies; 31+ messages in thread
From: Martin Papik @ 2014-06-05  1:07 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Linux fs XFS

>> 1) I would LOVE to unmount the FS, but how? umount /dev/xxx ... 
>> device no longer there. umount /media/xxx ... mount point no 
>> longer there.
> 
> Oh, something is doing a lazy unmount automatically on device 
> unplug? I missed the implications of that - your system is
> behaving exactly as it has been told to behave.
> 
> That is, lazy unmount only detaches the mount namespace from the 
> mount - it doesn't actually tell the kernel to unmount the 
> filesystem internally, instead it just removes the reference count
>  it has on it. If there are other open references to the 
> filesystem, then it won't actually do the real unmount until those 
> references go away.  i.e. lazy unmount is designed to leave the 
> kernel superblock (i.e. the filesystem) mounted internally until 
> the last reference to it goes away.
> 
> And that leaves the user to find those references and clean them up
> so the kernel can actually unmount it. Put simply, the system is
> behaving exactly as it has been asked to act in response to your
> actions. Whether the automounter is behaving correctly or not, that
> is a different matter, but it is certainly not an XFS bug that a
> lazy unmount is leaving you with a mess that you need to clean up
> manually.

But XFS is the one that prevents the repair. For reasons you've
outlined, granted, but XFS no longer has access to the device, so
it shouldn't be blocking it.

>> 2) I can't rectify the problems exactly because the FS is mounted
>> (according to xfs_repair [ustat]), yet not mounted (according to
>> /proc/mounts). .... unless rectifying the problem means reporting
>> this as a bug. :-)
> 
> Not a bug, it's the desired behaviour of lazy unmounts. Fix 
> userspace not to hold references when unmounting the filesystem...

Yet it doesn't affect ext4 (to pick an example at random). And the
only way to fix the userspace in this case is to start killing
processes, and again, this is only required for XFS.

>> 3) "Shutting down filesystem" ... isn't this when the new device
>>  should no longer be detected as mounted?
> 
> No. Filesystems get shut down for all sorts of reasons and the 
> correct action to take after unmounting the filesystem depends on 
> the reason for the shutdown. i.e. a shutdown filesystem requires 
> manual intervention to recover from, and so the filesystem remains
>  mounted until such manual intervention can take place.

Once more, shouldn't XFS stop holding onto the UUID after the FS is
shut down AND the underlying device (all of them, in case of
multipath) is returning an error code which means the device won't
ever come back? Seriously, the device is gone, won't come back.
Wouldn't it make sense to just let xfs_repair do its job?

And one more question, did you see the lsof output in my previous
email? Did you notice that while both the XFS and ext4 files are still
there, the file that's still in use on ext4 shows the device number,
but the XFS one does not. Just to refresh, here's a copy.

$ lsof | grep TEST
hexedit   24010      martin    3u  unknown /TEST...FILE (stat: Input/output error)
hexedit   24011      martin    3u      REG              259,6 4198400         12 /TEST...FILE

See, ext4 was device 259:6, but on XFS the device number doesn't show up.

Looks like lsof is doing a stat (not an lstat) on /proc/X/fd/Y, and
ext4 returns the full inode info, but XFS doesn't. Is this OK? This
info would be the only way to positively tie the processes to the
specific filesystem, wouldn't it?

# stat -L /proc/{15478,15496}/fd/3
stat: cannot stat `/proc/15478/fd/3': Input/output error
  File: `/proc/15496/fd/3'
  Size: 4198400   	Blocks: 520        IO Block: 4096   regular file
Device: 10306h/66310d	Inode: 12          Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/  martin)   Gid: ( 1000/  martin)
Access: 2014-06-05 03:46:42.969617017 +0300
Modify: 2014-03-11 16:24:22.500349375 +0300
Change: 2014-03-11 16:24:22.500349375 +0300
 Birth: -

# strace -e trace=stat stat -L /proc/{15478,15496}/fd/3
stat("/proc/15478/fd/3", 0x7fffcfad7580) = -1 EIO (Input/output error)
stat: cannot stat `/proc/15478/fd/3': Input/output error
stat("/proc/15496/fd/3", {st_mode=S_IFREG|0644, st_size=4198400, ...}) = 0
  File: `/proc/15496/fd/3'
  Size: 4198400   	Blocks: 520        IO Block: 4096   regular file
Device: 10306h/66310d	Inode: 12          Links: 1
stat("/lib/x86_64-linux-gnu/tls/x86_64", 0x7fffcfad69a0) = -1 ENOENT (No such file or directory)
stat("/lib/x86_64-linux-gnu/tls", 0x7fffcfad69a0) = -1 ENOENT (No such file or directory)
stat("/lib/x86_64-linux-gnu/x86_64", 0x7fffcfad69a0) = -1 ENOENT (No such file or directory)
stat("/lib/x86_64-linux-gnu", {st_mode=S_IFDIR|0755, st_size=12288, ...}) = 0
stat("/usr/lib/x86_64-linux-gnu/tls/x86_64", 0x7fffcfad69a0) = -1 ENOENT (No such file or directory)
stat("/usr/lib/x86_64-linux-gnu/tls", 0x7fffcfad69a0) = -1 ENOENT (No such file or directory)
stat("/usr/lib/x86_64-linux-gnu/x86_64", 0x7fffcfad69a0) = -1 ENOENT (No such file or directory)
stat("/usr/lib/x86_64-linux-gnu", {st_mode=S_IFDIR|0755, st_size=69632, ...}) = 0
stat("/lib/tls/x86_64", 0x7fffcfad69a0) = -1 ENOENT (No such file or directory)
stat("/lib/tls", 0x7fffcfad69a0)        = -1 ENOENT (No such file or directory)
stat("/lib/x86_64", 0x7fffcfad69a0)     = -1 ENOENT (No such file or directory)
stat("/lib", {st_mode=S_IFDIR|0755, st_size=12288, ...}) = 0
stat("/usr/lib/tls/x86_64", 0x7fffcfad69a0) = -1 ENOENT (No such file or directory)
stat("/usr/lib/tls", 0x7fffcfad69a0)    = -1 ENOENT (No such file or directory)
stat("/usr/lib/x86_64", 0x7fffcfad69a0) = -1 ENOENT (No such file or directory)
stat("/usr/lib", {st_mode=S_IFDIR|0755, st_size=90112, ...}) = 0
Access: (0644/-rw-r--r--)  Uid: ( 1000/  martin)   Gid: ( 1000/  martin)
Access: 2014-06-05 03:46:42.969617017 +0300
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=195, ...}) = 0
Modify: 2014-03-11 16:24:22.500349375 +0300
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=195, ...}) = 0
Change: 2014-03-11 16:24:22.500349375 +0300
 Birth: -

>> 4) come to think of it, if XFS is shutting down, why isn't it 
>> unmounting itself?
> 
> Because a filesystem cannot unmount itself - that has to be done 
> from userspace.

That makes sense.

Martin


* Re: XFS filesystem claims to be mounted after a disconnect
  2014-06-05  0:55                                 ` Dave Chinner
@ 2014-06-05  1:38                                   ` Martin Papik
  2014-06-05 19:39                                   ` Martin Papik
  1 sibling, 0 replies; 31+ messages in thread
From: Martin Papik @ 2014-06-05  1:38 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Linux fs XFS

>> I think you're trying too hard to defend XFS which may be causing
>> you to miss my point. Or it could be my bad communication.
> 
> Or it could be that you lack the knowledge base to understand what I
> explained to you. That happens all the time because this stuff is 
> complex and very few people actually have the time to understand
> how it is all supposed to work.

Yup, it's arcane, but I understand what you're trying to tell me, I
just don't agree. Mostly because I simply don't believe that the kernel
(block device layer) won't indicate a permanent error (device gone), or
that the FS needs to hold onto a FS (UUID) which it won't ever reach
again through that dead reference. Consequently I believe the FS
should be able to determine that it's time to stop blocking the use of
the FS. And since I believe it could, I think it should. .... OTOH
what seems to be happening is that the FS keeps trying to finish
writing the log entries to the journal on a device it won't ever see
again. And at the same time, it's stopping the use of the FS (uuid),
for the right reasons (I get it) but in the wrong circumstances
(device gone, no need to block, no plans to finish writing). IMHO.

> XFS can't determine correctly if it is a fatal permanent or 
> temporary error condition.

I don't believe the block device layer doesn't return error codes
detailed enough to know whether the device is GONE or just temporarily
insane. Or is the block device layer really that bad?
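For reference, the two numeric codes quoted earlier in this thread can be decoded mechanically (a sketch that assumes python3 is on the box; the values themselves come from <errno.h>):

```shell
# "error 19" from xlog_iodone and "error 5" from xfs_log_force:
for code in 5 19; do
    python3 -c 'import sys, errno, os
c = int(sys.argv[1])
print("error %d = %s (%s)" % (c, errno.errorcode[c], os.strerror(c)))' "$code"
done
# error 5 = EIO (Input/output error)
# error 19 = ENODEV (No such device)
```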

> Hence if we get an error from the storage (regardless of the error)
> in a situation we can't recover from, it is considered fatal
> regardless of whether the device is replugged or not. Your case is a
> failed log IO, which is always a fatal, unrecoverable error....

I think you're misunderstanding me, I am not expecting the FS to
automagically start writing again after a reconnect (though I wish for
it). The old device is dead, there's a new device, and the old device
will stay dead until the last reference to it goes away, at which point
the device ID will be freed up for use. I'm merely hoping that the
complete and
permanent disappearance of a disk on one device wouldn't prevent the
use of the same disk as a new device.

>> Isn't XFS just forcing me to take a manual action by accident?
> 
> No, by intent. Obvious, in-your-face intent. Filesystem corruption 
> events require manual intervention to analyse and take appropriate 
> action. You may not think it's necessary for your use case, but 
> years of use in mission critical data storage environments has 
> proven otherwise....
> 
>> Imagine, I have some files, just saved them, didn't call fsync,
>> the data is still in some cache, the cable is yanked, and the
>> data is lost. But in this case the XFS won't complain.
> 
> It does complain - it logs that it is discarding data unless a 
> shutdown has already occurred, and then it doesn't bother because 
> it's already indicated to the log that the filesystem is in big 
> trouble....

Yes, it always complains, which is not the same as what it's doing to
me: it's preventing the use of the filesystem until some processes are
killed, processes which will never ever EVER succeed in messing up the
filesystem, since the device the FS was using is dead and gone. And
the reason I'm stressing that this is accidental is because A) it
doesn't provide any benefit for the FS (no one will ever write to the
device from the old device) and B) it makes me jump through hoops only
on the one PC, which means it's not a FS-related hassle, it's
PC-related. In which case, why am I going through it?

>> Only if there's a process. Seems more like circumstance than
>> design. Is it? Is this an actual intentional behavior.
> 
> Lazy unmount does this by intent and XFS has no control over
> this. Lazy unmount is done by your userspace software, not the
> filesystem. You're shooting the messenger.

Okay, I get that, automount is triggering the disappearance of the
mount point and the /proc/mounts entry, triggered by
udev/dbus/whatever. I get it.

>>> Yup - XFS refuses to mount a filesystem with a duplicate UUID,
>>>  preventing you from mounting the same filesystem from two 
>>> different logical block device instances that point to the
>>> same physical disk. That's the only sane thing to do in
>>> enterprise storage systems that use multi-pathing to present
>>> failure-tolerant access to a physical device.
>> 
>> Actually, IMHO it would also be sane to forget you ever saw a
>> UUID after the last underlying physical device is gone and you're
>> not going to be ever writing to this.
> 
> And how does the referenced, mounted filesystem know this? It
> can't - it actually holds a reference to the block device that got
> yanked, and internally that block device doesn't go away until the 
> filesystem releases its reference.

But a write and a read should return different error messages,
shouldn't they? Doesn't the block device layer let the FS layer know that the
device is gone? Something like ENODEV, or something like it, I don't
know, something. There must be something, otherwise it's a kernel bug.
IMHO, etc.

Martin


* Re: XFS filesystem claims to be mounted after a disconnect
  2014-06-05  0:55                                 ` Dave Chinner
  2014-06-05  1:38                                   ` Martin Papik
@ 2014-06-05 19:39                                   ` Martin Papik
  2014-06-05 22:41                                     ` Dave Chinner
  1 sibling, 1 reply; 31+ messages in thread
From: Martin Papik @ 2014-06-05 19:39 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Linux fs XFS

Dear Dave,

Thanks for your patience, and I'd like to ask a few more questions if
I may.

You mentioned that XFS is primarily for server-type environments and
specifically mentioned multipath access to storage devices. I had a
setup in the past: a blade center with a shared SCSI array where
everything was duplicated. Each blade had two connections, one to each
of a pair of switches; the switches had duplicated connections to two
controllers (4 connections); and the controllers had access to the SCSI
drives (a dozen). So in this scenario each blade saw two SCSI devices
for each volume on the array. So.....

Does XFS bind to both devices? If so, does it start using the second
if it loses access to the one it is currently using, e.g. when you do
maintenance on one of the redundant switches? And does XFS pick the
device back up when it returns (e.g. maintenance done)? This assumes,
of course, that only one of the pair is down at a time.
... Or does XFS just prevent the use of the redundant device in order
to avoid the problems of two FS instances writing to the same device?

Martin
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBCgAGBQJTkMdoAAoJELsEaSRwbVYrbfgP/0zfMZaKz4f8Si9eIkdVLn1o
nnS0KRo3Ma751vvRWeu8Nf9Mki4rSW9bTE9au2O5jameUTGJl/4TbeC91HTbnRq7
tVjjNw9rf7I36kLEv0fZmm5zjlpn4WLSulcjdOEJqGB4NGng2HULqIUJQ62MPvW4
ZRluzgq3fZPg3wOpKinEUviwDjcwOe/9grCIieNxgVwk7GKwal8ytH/Z58++AoFW
5PB5MBlPTYFOTfIWupkGoWGyUS3M7+Ddwq9RmD871XXFnQOeyRlxF8iGp4Kqw1h1
D+jhUqiMZWApzprs6V1zQJ9Z7NbBZyHk/xwbB2EF/aQUsiukLchsX7rxgPFShWkA
dJ8biDbbqQY4+ZFYt5NXi9TlffpGoCutWkU73DdGxWArM5cI6eFOCSRPE7Gfm8zR
ZhUeq1g6UP9mefsm7ZOlVq4KanbEgdkkh2I1y/aTSUHd80O7Fge2vpN089Ub1LnI
kcRcTN0b11Ut6nJkPAbVpVnCBpz6F0qAYUlW6ECW0F5w3Qvd5VZhRN5OLL3OTjo/
2k+YRA/9g9tlZNNcZWdFR08TZLzMACxNef6uNb0HIlIJNEq8+a1JipklWgOjqOt2
2/eGduI5ijt0i+4MqUnrhStIUb0ac0al4vliQ0ijAfpD8WbnrzznvGBAZoCUSjWC
1cKR41L/uYmjdLmB6AHc
=lbaf
-----END PGP SIGNATURE-----


* Re: XFS filesystem claims to be mounted after a disconnect
  2014-06-05 19:39                                   ` Martin Papik
@ 2014-06-05 22:41                                     ` Dave Chinner
  2014-06-06  0:47                                       ` Martin Papik
  0 siblings, 1 reply; 31+ messages in thread
From: Dave Chinner @ 2014-06-05 22:41 UTC (permalink / raw)
  To: Martin Papik; +Cc: Linux fs XFS

On Thu, Jun 05, 2014 at 10:39:42PM +0300, Martin Papik wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
> 
> Dear Dave,
> 
> Thanks for your patience, and I'd like to ask a few more questions if
> I may.
> 
> You mentioned that XFS is primarily for server type environment and
> specifically mentioned multipath access to storage devices. I had a
> setup in the past, blade center with a shared SCSI array where
> everything was duplicated. The blades had two connections one to each
> of a pair of switches, the switches had duplicated connections to two
> controllers (4 connections) and the controllers had access to the SCSI
> drives (a dozen). So in this scenario each blade had two SCSI drives
> showing for each volume on the array. So.....
> 
> Does XFS bind to both devices?

No. Use dm-multipath to make them appear as a single block device
made up as a pair of redundant paths in primary/secondary failover
or active/active load balancing configurations. dm-multipath handles
failover between the two block devices on path failure
transparently.

	  XFS
	   |
	dm-mp-0
	/    \
      sdc    sdd

The mounted filesystem doesn't even know there are multiple paths in
this configuration; however, the XFS UUID trapping behaviour avoids
this problem by preventing you from doing XFS operations directly on
/dev/sdc or /dev/sdd while the filesystem is mounted on
/dev/dm-mp-0....
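As a concrete sketch of that configuration, a minimal multipath.conf fragment might look like the following (the WWID and alias are made-up placeholders, not values from this thread):

```
# /etc/multipath.conf -- minimal illustrative fragment
defaults {
    user_friendly_names yes
}
multipaths {
    multipath {
        # Placeholder WWID; the real one can be found with
        # `multipath -ll` or `/lib/udev/scsi_id -g -u /dev/sdc`.
        wwid  3600508b400105e210000900000490000
        alias mpath0
    }
}
```

The filesystem is then mounted from /dev/mapper/mpath0 rather than from either of the underlying /dev/sdc or /dev/sdd paths.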

Cheers.

Dave.

-- 
Dave Chinner
david@fromorbit.com


* Re: XFS filesystem claims to be mounted after a disconnect
  2014-06-05 22:41                                     ` Dave Chinner
@ 2014-06-06  0:47                                       ` Martin Papik
  0 siblings, 0 replies; 31+ messages in thread
From: Martin Papik @ 2014-06-06  0:47 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Linux fs XFS

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512


Thanks Dave

But wouldn't the same thing that prevents xfs_repair also prevent
dm-multipath from using the device? Or would you typically set up a
partitioning or volume scheme on top of dm-multipath?

Martin

> No. Use dm-multipath to make them appear as a single block device 
> made up as a pair of redundant paths in primary/secondary failover 
> or active/active load balancing configurations. dm-multipath
> handles failover between the two block devices on path failure 
> transparently.
> 
> 	  XFS
> 	   |
> 	dm-mp-0
> 	/    \
>       sdc    sdd
> 
> The mounted filesystem doesn't even know there are multiple paths
> in this configuration, however the XFS UUID trapping behaviour
> avoids this problem by preventing you from doing XFS operations
> directly on /dev/sdc or /dev/sdd while the filesystem is mounted on 
> /dev/dm-mp-0....
> 
> Cheers.
> 
> Dave.
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBCgAGBQJTkQ+yAAoJELsEaSRwbVYrJB4QAIyjUk7un+bth/5WnW/imwoh
CGlkS7kuU8Iu96VqwoojOZtY6RcUX5Yq+NoW7NDigFv0iEkaM2j0wg8JoNw6EFqX
uiJuCkBUI3/v8QkFU0GtA4/6NFsu9N709f8J0ilof20L0bNm2wh7SiXqrp28X3HS
8Y1G50PXbBnEM5yKbmXLpVTOhEw35NlaPKBYZ5gBFohPXDb4+J6+g0kJD5wZPSWK
lsHyW8tuuntGIZTtGZ4sHvak5THCXILClmFcPrdzzU110c601RTzJfeq9L7QXTMI
thrUH3KGvT2s4vTwqLLFHdBZNwEXi2X0PaiM7uO0/AYvaaoDfwnLh0lJmSSxRFZn
YYFa6Ko20l4nxFZoy6eRUh8vxvOat4VOlYowlyEcnPTE1nSeJfulun94ztjNXs2k
5VtWlFh12CXSVrynXBVnYKvFOAdecZcztqxjAjjTcjIrWHbZv0UXFtflG1V+zjgD
63SqxwShL2C/685gFXHh7mY0K2vAbnBK+IrRSTWMKVMZ16OuuIFA2WZebnskDfAf
vWWFv8mAEAUP/xYF7u4g4XVov2aQAVxjDosXxuqUbW8QHlTXNB/wc7akXgoD3fmZ
u0cB5Paw9JJeGt5eKYbtReL+Wt3STKtYGu2yNOZo/oiUr0uuoAkcjSfUGqrdu6HL
kQgRYDDjQt37JPVIyyNK
=55eH
-----END PGP SIGNATURE-----


end of thread, other threads:[~2014-06-06  0:48 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-02 13:47 XFS filesystem claims to be mounted after a disconnect Martin Papik
2014-05-02 15:04 ` Eric Sandeen
2014-05-02 15:07 ` Eric Sandeen
2014-05-02 15:44   ` Mark Tinguely
2014-05-02 16:26     ` Martin Papik
2014-05-02 16:44   ` Martin Papik
2014-05-02 16:53     ` Eric Sandeen
2014-05-02 17:54       ` Martin Papik
2014-05-02 18:39         ` Eric Sandeen
2014-05-02 19:07           ` Martin Papik
2014-05-02 19:16             ` Eric Sandeen
2014-05-02 19:29               ` Martin Papik
2014-05-02 23:38                 ` Dave Chinner
2014-05-02 23:35             ` Dave Chinner
2014-05-03  0:04               ` Martin Papik
2014-05-03  3:02                 ` Dave Chinner
2014-06-02 11:22                   ` Martin Papik
2014-06-02 23:41                     ` Dave Chinner
2014-06-03  9:23                       ` Martin Papik
2014-06-03  9:55                         ` Stefan Ring
2014-06-03 10:48                           ` Martin Papik
2014-06-03 21:28                             ` Dave Chinner
2014-06-03 22:37                               ` Martin Papik
2014-06-05  0:55                                 ` Dave Chinner
2014-06-05  1:38                                   ` Martin Papik
2014-06-05 19:39                                   ` Martin Papik
2014-06-05 22:41                                     ` Dave Chinner
2014-06-06  0:47                                       ` Martin Papik
2014-06-03 22:58                               ` Martin Papik
2014-06-05  0:08                                 ` Dave Chinner
2014-06-05  1:07                                   ` Martin Papik
