* trouble with generic/081
@ 2016-12-14 16:43 Christoph Hellwig
  2016-12-15  6:29 ` Eryu Guan
  2016-12-15  6:36 ` Dave Chinner
  0 siblings, 2 replies; 29+ messages in thread
From: Christoph Hellwig @ 2016-12-14 16:43 UTC (permalink / raw)
  To: eguan; +Cc: fstests

Hi Eryu,

I'm running into a fairly reproducible issue with generic/081
(about every other run): for some reason the umount call in
_cleanup doesn't do anything because it thinks the file system isn't
mounted, but then vgremove complains that there is a mounted file
system.  This leads to the scratch device not being released and all
subsequent tests failing.

Here is the output if I let the commands in _cleanup print to stdout:

QA output created by 081
Silence is golden
umount: /mnt/test/mnt_081: not mounted
  Logical volume vg_081/snap_081 contains a filesystem in use.
  PV /dev/sdc belongs to Volume Group vg_081 so please use vgreduce first.

You added a comment in _cleanup that says:

# lvm may have umounted it on I/O error, but in case it does not

Does LVM really unmount filesystems on its own?  Could we be racing
with it?

With a "sleep 1" added before the umount call the test passes reliably
for me, but that seems like papering over the issue.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: trouble with generic/081
  2016-12-14 16:43 trouble with generic/081 Christoph Hellwig
@ 2016-12-15  6:29 ` Eryu Guan
  2016-12-15  6:36 ` Dave Chinner
  1 sibling, 0 replies; 29+ messages in thread
From: Eryu Guan @ 2016-12-15  6:29 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: fstests

On Wed, Dec 14, 2016 at 08:43:14AM -0800, Christoph Hellwig wrote:
> Hi Eryu,
> 
> I'm running into a fairly reproducible issue with generic/081
> (about every other run): For some reason the umount call in
> _cleanup doesn't do anything because it thinks the file system isn't
> mounted, but then vgremove complains that there is a mounted file
> system.  This leads to the scratch device not being released and all
> subsequent tests failing.
> 
> Here is the output if I let the commands in _cleanup print to stdout:
> 
> QA output created by 081
> Silence is golden
> umount: /mnt/test/mnt_081: not mounted
>   Logical volume vg_081/snap_081 contains a filesystem in use.
>   PV /dev/sdc belongs to Volume Group vg_081 so please use vgreduce first.

Yes, I have this problem too.

My original patch didn't have "-c fsync" in the last xfs_io pwrite
command,

$XFS_IO_PROG -fc "pwrite 0 5m" -c fsync $mnt/testfile >>$seqres.full 2>&1

and Brian suggested that an explicit fsync would make the test clearer.
Dave then added it and committed the patch.

https://www.spinics.net/lists/fstests/msg01265.html

This cleanup failure was the exact reason why I didn't include fsync at
first.

https://www.spinics.net/lists/fstests/msg01269.html

Then I sent a follow-up patch to work around this issue, but Dave
suggested that we should triage and fix the underlying bug first (if
there is one).

https://www.spinics.net/lists/fstests/msg01406.html

I tried to follow up & dig into it but went nowhere; I didn't know that
part of the code well enough.

> 
> You added a comment in _cleanup that says:
> 
> # lvm may have umounted it on I/O error, but in case it does not
> 
> Does LVM really unmount filesystems on its own?  Could we be racing
> with it?

IIRC, there are some kind of hooks in LVM that unmount the filesystems,
but I can't recall the details now.  Judging from the end result, the
filesystems do get unmounted; perhaps that's why you see "/mnt/test/mnt_081:
not mounted" (this error message is redirected to /dev/null in the
test).

> 
> With a "sleep 1" added before the umount call the test passes reliably
> for me, but that seems like papering over the issue.

Do you have any preference on this?

Thanks,
Eryu

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: trouble with generic/081
  2016-12-14 16:43 trouble with generic/081 Christoph Hellwig
  2016-12-15  6:29 ` Eryu Guan
@ 2016-12-15  6:36 ` Dave Chinner
  2016-12-15  8:42   ` Christoph Hellwig
  1 sibling, 1 reply; 29+ messages in thread
From: Dave Chinner @ 2016-12-15  6:36 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: eguan, fstests

On Wed, Dec 14, 2016 at 08:43:14AM -0800, Christoph Hellwig wrote:
> Hi Eryu,
> 
> I'm running into a fairly reproducible issue with generic/081
> (about every other run): For some reason the umount call in
> _cleanup doesn't do anything because it thinks the file system isn't
> mounted, but then vgremove complains that there is a mounted file
> system.  This leads to the scratch device not being released and all
> subsequent tests failing.

Yup, been seeing that on my pmem test setup for months. Reported
along with the subsequent LVM configuration fuckup it resulted in:

https://www.redhat.com/archives/dm-devel/2016-July/msg00405.html

> Here is the output if I let the commands in _cleanup print to stdout:
> 
> QA output created by 081
> Silence is golden
> umount: /mnt/test/mnt_081: not mounted
>   Logical volume vg_081/snap_081 contains a filesystem in use.
>   PV /dev/sdc belongs to Volume Group vg_081 so please use vgreduce first.
> 
> You added a comment in _cleanup that says:
> 
> # lvm may have umounted it on I/O error, but in case it does not
> 
> Does LVM really unmount filesystems on its own?  Could we be racing
> with it?

Nope, I'm pretty sure it's a snapshot lifecycle issue - the snapshot
is still busy doing something (probably IO) for a short while after
we unmount, so LVM can't tear it down immediately like we ask. Wait
a few seconds, the snapshot work finishes, goes idle, and then it
can be torn down.
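
One way to watch for that window (an illustration only - I'm assuming the
open count as device-mapper reports it is what LVM trips over, and using the
dm name lvm would create for the test's snapshot):

  # poll until device-mapper reports the snapshot as no longer open
  # (a real script would also put a timeout on this)
  while [ "$(dmsetup info -c --noheadings -o open vg_081-snap_081)" != "0" ]; do
          sleep 1
  done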

But if you consider the fuckup that occurs if generic/085 starts up
and tries to reconfigure LVM while the snapshot from generic/081 is
still in this whacky window (as reported in the above link), this is
really quite a nasty bug.

> With a "sleep 1" added before the umount call the test passes reliably
> for me, but that seems like papering over the issue.

Yup, same here. My local patch is this:

---
 tests/generic/081 | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tests/generic/081 b/tests/generic/081
index 11755d4d89ff..ff33ffaa4fb8 100755
--- a/tests/generic/081
+++ b/tests/generic/081
@@ -36,6 +36,11 @@ _cleanup()
 	rm -f $tmp.*
 	# lvm may have umounted it on I/O error, but in case it does not
 	$UMOUNT_PROG $mnt >/dev/null 2>&1
+
+	# on a pmem device, the vgremove/pvremove commands fail immediately
+	# after unmount. Wait a bit before removing them in the hope it
+	# succeeds.
+	sleep 5
 	$LVM_PROG vgremove -f $vgname >>$seqres.full 2>&1
 	$LVM_PROG pvremove -f $SCRATCH_DEV >>$seqres.full 2>&1
 }
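
An alternative that avoids a hard-coded delay would be to retry the removal
for a bounded time; something along these lines (a rough sketch only, not
the patch I'm running, and untested):

  # retry vgremove for up to ~30s instead of sleeping unconditionally
  for i in $(seq 1 30); do
          $LVM_PROG vgremove -f $vgname >>$seqres.full 2>&1 && break
          sleep 1
  done
  $LVM_PROG pvremove -f $SCRATCH_DEV >>$seqres.full 2>&1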

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: trouble with generic/081
  2016-12-15  6:36 ` Dave Chinner
@ 2016-12-15  8:42   ` Christoph Hellwig
  2016-12-15  9:16     ` Zdenek Kabelac
  0 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2016-12-15  8:42 UTC (permalink / raw)
  To: Dave Chinner; +Cc: eguan, fstests, dm-devel

On Thu, Dec 15, 2016 at 05:36:50PM +1100, Dave Chinner wrote:
> Yup, same here. My local patch is this:

I have a sleep 1 before the unmount.  To be honest, this lvm behavior
of auto-unmounting on error seems like a huge mess; I wonder if there is
a way to disable it?

Even on a production system I'd much rather have a shut-down XFS fs
than LVM trying to unmount, probably hanging because there are busy
fds on the fs - and if it doesn't hang, the application might now write
to another fs because of this.  It's just an amazingly stupid idea.

> 
> ---
>  tests/generic/081 | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/tests/generic/081 b/tests/generic/081
> index 11755d4d89ff..ff33ffaa4fb8 100755
> --- a/tests/generic/081
> +++ b/tests/generic/081
> @@ -36,6 +36,11 @@ _cleanup()
>  	rm -f $tmp.*
>  	# lvm may have umounted it on I/O error, but in case it does not
>  	$UMOUNT_PROG $mnt >/dev/null 2>&1
> +
> +	# on a pmem device, the vgremove/pvremove commands fail immediately
> +	# after unmount. Wait a bit before removing them in the hope it
> +	# succeeds.
> +	sleep 5
>  	$LVM_PROG vgremove -f $vgname >>$seqres.full 2>&1
>  	$LVM_PROG pvremove -f $SCRATCH_DEV >>$seqres.full 2>&1
>  }
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
---end quoted text---

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: trouble with generic/081
  2016-12-15  8:42   ` Christoph Hellwig
@ 2016-12-15  9:16     ` Zdenek Kabelac
  2016-12-16  8:15       ` Christoph Hellwig
  0 siblings, 1 reply; 29+ messages in thread
From: Zdenek Kabelac @ 2016-12-15  9:16 UTC (permalink / raw)
  To: Christoph Hellwig, Dave Chinner; +Cc: eguan, fstests, dm-devel

Dne 15.12.2016 v 09:42 Christoph Hellwig napsal(a):
> On Thu, Dec 15, 2016 at 05:36:50PM +1100, Dave Chinner wrote:
>> Yup, same here. My local patch is this:
>
> I have a sleep 1 before the unmount.  To be honest this lvm behavior of
> auto-unmounting on error seems like a huge mess, I wonder if there is a way
> to disable it?
>
> Even on a production system I'd much rather have a shutdown XFS fs than LVM
> trying to unmount, probably hanging because there are busy fds on the fs,
> and if not the application might not write to another fs because of this.
> It's just an amazingly stupid idea.
>

Hi

So let me explain the logic behind this 'amazingly stupid' idea.

A full pool is by far the worst case, and ATM lvm2 has only a single policy
(and has had for many, many years): prevent a bigger disaster by intentionally
causing a smaller one - and sometimes it works (since typically, if apps have
open descriptors and can't open new ones, the apps die, descriptors are
closed, the FS is unmounted....) (Originally we wanted to be even more 'brutal' -
and replace such thin devices with error targets...)  Unfortunately remounting
the filesystem read-only will not ensure there are no further writes (journal
replays, endless retries by XFS to store unwritten blocks...), but it could
possibly be a better choice.

What is worth emphasizing here: 95% fullness of the thin-pool (which is the
moment where the unmount will be tried) is simply a failure of the admin - it
should never get to that fullness in the first place - plain and simple.

A thin-pool is NOT designed to be routinely operated at its corner cases - if
that's your common use scenario, you've picked the wrong technology - it doesn't
fit.  We have already seen some use scenarios where a 100% full pool was meant
to be part of the regular work-flow....

If you do monitor with a 70% threshold and you already have a pool over 95%
full, there is nearly zero chance it will get fixed by some miracle.
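
(For reference, that monitoring threshold and the automatic extension are
configured in lvm.conf - an illustrative snippet, the values are examples only:

  activation {
          # dmeventd starts auto-extending the thin-pool at 70% usage,
          # growing it by 20% of its size each time
          thin_pool_autoextend_threshold = 70
          thin_pool_autoextend_percent = 20
  }
)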

Filesystems are not yet ready to deal with a full thin-pool well - this
is a plain real-world fact. They will destroy themselves totally when they
suddenly face massive numbers of random sector errors. It may cause serious,
possibly even irreparable, damage in the case of XFS. ext4 is ahead, with its
years-proven remount-read-only-on-error behavior; only very recently has XFS
gained something close to this.
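
(That ext4 behavior is its errors= policy; for illustration, it can be set per
mount or as the filesystem default:

  mount -o errors=remount-ro /dev/vg/lv /mnt
  tune2fs -e remount-ro /dev/vg/lv
)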


But of course we are open to proposals for what we should do better so as not
to be 'amazingly stupid' (just note that trying to use a full thin-pool is not
'amazingly smart' either) - this is not comparable to the case of a single full
filesystem.


We do already have some plans to call 'fstrim' at some points - but we still
need something as an emergency stop for the filesystem before it hits
the full-pool wall - that is the case which always means
data loss and is non-trivial to recover from.

For each such proposal it's worth opening an upstream Bugzilla.

Regards

Zdenek

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: trouble with generic/081
  2016-12-15  9:16     ` Zdenek Kabelac
@ 2016-12-16  8:15       ` Christoph Hellwig
  2016-12-16  9:31         ` Zdenek Kabelac
  2017-01-04 23:03         ` Eric Sandeen
  0 siblings, 2 replies; 29+ messages in thread
From: Christoph Hellwig @ 2016-12-16  8:15 UTC (permalink / raw)
  To: Zdenek Kabelac; +Cc: Christoph Hellwig, Dave Chinner, eguan, fstests, dm-devel

On Thu, Dec 15, 2016 at 10:16:23AM +0100, Zdenek Kabelac wrote:
> So let me explain the logic behind this 'amazingly stupid' idea.

And that logic doesn't make any sense at all.  Invisibly unmounting
a file system behind the user's back is actively harmful, as it
contradicts the principle of least surprise, and the xfstests mess
is one simple example of it.  Add an in-kernel callback to tell the
fs to shut down, but NOT unmount and expose the namespace below it,
which the administrator has probably intentionally hidden.

Even worse, unmount may trigger further writes, and with fses not
handling them the fs might now be stuck after being detached from
the namespace, without a way for the admin to detect or recover from this.

What XFS did on IRIX was to let the volume manager call into the fs
and shut it down.  At this point no further writes are possible,
but we do not expose the namespace under the mount point, and the
admin can fix the situation with all the normal tools.
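
(Something close to that shutdown already exists as a manual operation
today - for illustration only:

  # force the filesystem into a shut-down state; the mount point stays put
  xfs_io -x -c "shutdown" /mnt/test

the missing piece is having the volume manager trigger it automatically.)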

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: trouble with generic/081
  2016-12-16  8:15       ` Christoph Hellwig
@ 2016-12-16  9:31         ` Zdenek Kabelac
  2017-01-04 23:03         ` Eric Sandeen
  1 sibling, 0 replies; 29+ messages in thread
From: Zdenek Kabelac @ 2016-12-16  9:31 UTC (permalink / raw)
  To: dm-devel

Dne 16.12.2016 v 09:15 Christoph Hellwig napsal(a):
> On Thu, Dec 15, 2016 at 10:16:23AM +0100, Zdenek Kabelac wrote:
>> So let me explain the logic behind this 'amazingly stupid' idea.
>
> And that logic doesn't make any sense at all.  invisibly unmounting
> a file system behind the users back is actively harmful, as it is
> contradicting the principle of least surprise, and the xfstests mess
> is one simple example for it.  Add a callback in-kernel to tell the
> fs to shut down but NOT unmount and expose the namespace below it,
> which the administrator has probably intentionally hid.
>
> Even worse unmount may trigger further writes and with fses not
> handling them the fs might now be stuck after being detached from
> the namespace without a way for the admin to detect or recover this.
>
> What XFS did on IRIX was to let the volume manager call into the fs
> and shut it down.  At this point no further writes are possible,
> but we do not expose the namespace under the mount point, and the
> admin can fix the situation with all the normal tools.
>

Please propose something better.
(doing nothing is NOT better)

Regards


Zdenek

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: trouble with generic/081
  2016-12-16  8:15       ` Christoph Hellwig
  2016-12-16  9:31         ` Zdenek Kabelac
@ 2017-01-04 23:03         ` Eric Sandeen
  2017-01-05 10:35             ` Zdenek Kabelac
  1 sibling, 1 reply; 29+ messages in thread
From: Eric Sandeen @ 2017-01-04 23:03 UTC (permalink / raw)
  To: Christoph Hellwig, Zdenek Kabelac; +Cc: Dave Chinner, eguan, fstests, dm-devel



On 12/16/16 2:15 AM, Christoph Hellwig wrote:
> On Thu, Dec 15, 2016 at 10:16:23AM +0100, Zdenek Kabelac wrote:
>> So let me explain the logic behind this 'amazingly stupid' idea.
> 
> And that logic doesn't make any sense at all.  invisibly unmounting
> a file system behind the users back is actively harmful, as it is
> contradicting the principle of least surprise, and the xfstests mess
> is one simple example for it.  Add a callback in-kernel to tell the
> fs to shut down but NOT unmount and expose the namespace below it,
> which the administrator has probably intentionally hid.
> 
> Even worse unmount may trigger further writes and with fses not
> handling them the fs might now be stuck after being detached from
> the namespace without a way for the admin to detect or recover this.
> 
> What XFS did on IRIX was to let the volume manager call into the fs
> and shut it down.  At this point no further writes are possible,
> but we do not expose the namespace under the mount point, and the
> admin can fix the situation with all the normal tools.

<late to the party>

Is there a need for this kind of call-up when xfs now has the configurable
error handling so that it will shut down after X retries or Y seconds 
of a persistent error?

-Eric

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: trouble with generic/081
  2017-01-04 23:03         ` Eric Sandeen
@ 2017-01-05 10:35             ` Zdenek Kabelac
  0 siblings, 0 replies; 29+ messages in thread
From: Zdenek Kabelac @ 2017-01-05 10:35 UTC (permalink / raw)
  To: Eric Sandeen, Christoph Hellwig, Zdenek Kabelac
  Cc: Dave Chinner, eguan, fstests, dm-devel

Dne 5.1.2017 v 00:03 Eric Sandeen napsal(a):
>
>
> On 12/16/16 2:15 AM, Christoph Hellwig wrote:
>> On Thu, Dec 15, 2016 at 10:16:23AM +0100, Zdenek Kabelac wrote:
>>> So let me explain the logic behind this 'amazingly stupid' idea.
>>
>> And that logic doesn't make any sense at all.  invisibly unmounting
>> a file system behind the users back is actively harmful, as it is
>> contradicting the principle of least surprise, and the xfstests mess
>> is one simple example for it.  Add a callback in-kernel to tell the
>> fs to shut down but NOT unmount and expose the namespace below it,
>> which the administrator has probably intentionally hid.
>>
>> Even worse unmount may trigger further writes and with fses not
>> handling them the fs might now be stuck after being detached from
>> the namespace without a way for the admin to detect or recover this.
>>
>> What XFS did on IRIX was to let the volume manager call into the fs
>> and shut it down.  At this point no further writes are possible,
>> but we do not expose the namespace under the mount point, and the
>> admin can fix the situation with all the normal tools.
>
> <late to the party>
>
> Is there a need for this kind of call-up when xfs now has the configurable
> error handling so that it will shut down after X retries or Y seconds
> of a persistent error?


We likely need to open an RFE bugzilla here - and specify how it should
work when certain conditions are met.

The current 'best effort' tries to minimize damage by doing a full stop
when the pool approaches 95% fullness.  That is relatively 'low/small' for a
small thin-pool - but for a commonly sized thin-pool there is reasonably big
free space left to flush most of the page cache to
disk before things go crazy.

Now - we could probably detect the kernel version and the xfs/ext4 features
present - and change reactions accordingly.

What I expect from this BZ is: how to detect things and what the 'best'
thing to do is.

I'm clearly not an expert on all filesystems and all their features - but lvm2
needs to work reasonably well across all variants of kernels and filesystems -
so we cannot say to the user: we now require you to use the latest 4.10
kernel with these features enabled, otherwise all your data could be lost.

We need to know what to do with 3.X kernels, 4.X kernels and the features
present in the kernel, and how we can detect them at runtime.

Regards

Zdenek


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: trouble with generic/081
  2017-01-05 10:35             ` Zdenek Kabelac
  (?)
@ 2017-01-05 16:26             ` Mike Snitzer
  2017-01-05 17:42                 ` Zdenek Kabelac
  -1 siblings, 1 reply; 29+ messages in thread
From: Mike Snitzer @ 2017-01-05 16:26 UTC (permalink / raw)
  To: Zdenek Kabelac
  Cc: Eric Sandeen, Christoph Hellwig, dm-devel, Dave Chinner, eguan, fstests

On Thu, Jan 05 2017 at  5:35am -0500,
Zdenek Kabelac <zkabelac@redhat.com> wrote:

> Dne 5.1.2017 v 00:03 Eric Sandeen napsal(a):
> >
> >
> >On 12/16/16 2:15 AM, Christoph Hellwig wrote:
> >>On Thu, Dec 15, 2016 at 10:16:23AM +0100, Zdenek Kabelac wrote:
> >>>So let me explain the logic behind this 'amazingly stupid' idea.
> >>
> >>And that logic doesn't make any sense at all.  invisibly unmounting
> >>a file system behind the users back is actively harmful, as it is
> >>contradicting the principle of least surprise, and the xfstests mess
> >>is one simple example for it.  Add a callback in-kernel to tell the
> >>fs to shut down but NOT unmount and expose the namespace below it,
> >>which the administrator has probably intentionally hid.
> >>
> >>Even worse unmount may trigger further writes and with fses not
> >>handling them the fs might now be stuck after being detached from
> >>the namespace without a way for the admin to detect or recover this.
> >>
> >>What XFS did on IRIX was to let the volume manager call into the fs
> >>and shut it down.  At this point no further writes are possible,
> >>but we do not expose the namespace under the mount point, and the
> >>admin can fix the situation with all the normal tools.
> >
> ><late to the party>
> >
> >Is there a need for this kind of call-up when xfs now has the configurable
> >error handling so that it will shut down after X retries or Y seconds
> >of a persistent error?
> 
> 
> We need likely to open  RFE bugzilla  here - and specify how it should
> work when some conditions are met.
> 
> Current 'best effort' tries to minimize damage by trying to do a full-stop
> when pool approaches 95% fullness.  Which is relatively 'low/small'
> for small sized thin-pool - but there is reasonable big free space
> for
> commonly sized thin-pool to be able to flush most of page cache on
> disk before things will go crazy.
> 
> Now - we could probably detect presence of kernel version and
> xfs/ext4 present features - and change reactions.
> 
> What I expect from this BZ is -  how to detect things and what is
> the 'best' thing to do.
> 
> I'm clearly not an expert on all filesystem and all their features - but lvm2
> needs to work reasonable well across all variants of kernels and
> filesystems -  so we cannot say to user - now  we require you to use
> the latest 4.10
> kernel with these features enabled otherwise all your data could be lost.
> 
> We need to know what to do with 3.X  kernel,  4.X kernel and present
> features in kernel and how we can detect them in runtime.

No, we need to fix upstream.  It is the job of distros to sort out other
solutions.  And yeah, I appreciate that you need to worry about distro X,
Y, Z from Red Hat, but lvm2 needs to not be so cute about things.

And there has been significant progress on XFS's error handling so that
it no longer hangs in the face of ENOSPC.

Eric says the basics are documented in Documentation/filesystems/xfs.txt
under "Error Handling".

Bottom line is lvm2 really has no business unmounting the filesystem.
That lvm2 "feature" should be reverted.  It isn't what anyone would
expect, and it only serves to create problems.  Nice intent, but it was
a misfire to implement it.  At most, a discard should be issued once you
cross a threshold.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: trouble with generic/081
  2017-01-05 16:26             ` Mike Snitzer
@ 2017-01-05 17:42                 ` Zdenek Kabelac
  0 siblings, 0 replies; 29+ messages in thread
From: Zdenek Kabelac @ 2017-01-05 17:42 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Eric Sandeen, Christoph Hellwig, dm-devel, Dave Chinner, eguan, fstests

Dne 5.1.2017 v 17:26 Mike Snitzer napsal(a):
> On Thu, Jan 05 2017 at  5:35am -0500,
> Zdenek Kabelac <zkabelac@redhat.com> wrote:
>
>> Dne 5.1.2017 v 00:03 Eric Sandeen napsal(a):
>>>
>>>
>>> On 12/16/16 2:15 AM, Christoph Hellwig wrote:
>>>> On Thu, Dec 15, 2016 at 10:16:23AM +0100, Zdenek Kabelac wrote:
>>>>> So let me explain the logic behind this 'amazingly stupid' idea.
>>>>
>>>> And that logic doesn't make any sense at all.  invisibly unmounting
>>>> a file system behind the users back is actively harmful, as it is
>>>> contradicting the principle of least surprise, and the xfstests mess
>>>> is one simple example for it.  Add a callback in-kernel to tell the
>>>> fs to shut down but NOT unmount and expose the namespace below it,
>>>> which the administrator has probably intentionally hid.
>>>>
>>>> Even worse unmount may trigger further writes and with fses not
>>>> handling them the fs might now be stuck after being detached from
>>>> the namespace without a way for the admin to detect or recover this.
>>>>
>>>> What XFS did on IRIX was to let the volume manager call into the fs
>>>> and shut it down.  At this point no further writes are possible,
>>>> but we do not expose the namespace under the mount point, and the
>>>> admin can fix the situation with all the normal tools.
>>>
>>> <late to the party>
>>>
>>> Is there a need for this kind of call-up when xfs now has the configurable
>>> error handling so that it will shut down after X retries or Y seconds
>>> of a persistent error?
>>
>>
>> We need likely to open  RFE bugzilla  here - and specify how it should
>> work when some conditions are met.
>>
>> Current 'best effort' tries to minimize damage by trying to do a full-stop
>> when pool approaches 95% fullness.  Which is relatively 'low/small'
>> for small sized thin-pool - but there is reasonable big free space
>> for
>> commonly sized thin-pool to be able to flush most of page cache on
>> disk before things will go crazy.
>>
>> Now - we could probably detect presence of kernel version and
>> xfs/ext4 present features - and change reactions.
>>
>> What I expect from this BZ is -  how to detect things and what is
>> the 'best' thing to do.
>>
>> I'm clearly not an expert on all filesystem and all their features - but lvm2
>> needs to work reasonable well across all variants of kernels and
>> filesystems -  so we cannot say to user - now  we require you to use
>> the latest 4.10
>> kernel with these features enabled otherwise all your data could be lost.
>>
>> We need to know what to do with 3.X  kernel,  4.X kernel and present
>> features in kernel and how we can detect them in runtime.
>
> No we need to fix upstream.  It is the job of distros to sort out other
> solutions.  And yeah I appreciate that you need to worry about distro X,
> Y, Z from Red Hat but lvm2 needs to not be so cute about things.
>
> And there has been significant progress on XFS's error handling so that
> it no longer hangs in the face of ENOSPC.
>
> Eric says the basics are documented in Documentation/filesystems/xfs.txt
> under "Error Handling".
>
> Bottomline is lvm2 really has no business unmounting the filesystem.
> That lvm2 "feature" should be reverted.  It isn't what anyone would
> expect.  And it only serves to create problems.  Nice intent but it was
> a misfire to implement it.  At most a discard should be issued once you
> cross a threshold.
>


Users are mostly using thin LVs with ext4 and XFS filesystems.
Lots of users use quite ancient kernels (<4.X).

When lvm2 decides to UNMOUNT volumes, it's the moment where everything else
HAS failed (mainly, the admin has failed to provide the promised space),
and it should be seen as a mild OOPS replacement.


Un/fortunately lvm2 does care about older distributions and kernels - so
unlike much other 'modern' software, you can take a recent lvm2, compile & run
it on a several-years-old system and it does its best - so far I'd call that a feature.

Not really sure what you mean by leaving this to 'distro' maintainers,
since these are typically only able to run 'configure & make',
without any big idea about configurable lvm2 internals.

--

Now, there is no objection to adding configurable behavior (policies) for such
cases - here you are 'pushing on an open door', since we have planned to
provide these for some time (I think there are a couple of BZs already).

So far I understand that the admin HAS to configure the 'better XFS logic'
himself on the filesystem - it's not a granted default so far, even on a 4.10 kernel?

On the lvm2 side we need to first convert the plugin code executions to
lvconvert policies (I will make a BZ for this one).


Regards

Zdenek


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: trouble with generic/081
  2017-01-05 17:42                 ` Zdenek Kabelac
  (?)
@ 2017-01-05 18:07                 ` Mike Snitzer
  -1 siblings, 0 replies; 29+ messages in thread
From: Mike Snitzer @ 2017-01-05 18:07 UTC (permalink / raw)
  To: Zdenek Kabelac
  Cc: Eric Sandeen, Christoph Hellwig, dm-devel, Dave Chinner, eguan, fstests

On Thu, Jan 05 2017 at 12:42pm -0500,
Zdenek Kabelac <zkabelac@redhat.com> wrote:

> Dne 5.1.2017 v 17:26 Mike Snitzer napsal(a):
> >On Thu, Jan 05 2017 at  5:35am -0500,
> >Zdenek Kabelac <zkabelac@redhat.com> wrote:
> >
> >>Dne 5.1.2017 v 00:03 Eric Sandeen napsal(a):
> >>>
> >>>
> >>>On 12/16/16 2:15 AM, Christoph Hellwig wrote:
> >>>>On Thu, Dec 15, 2016 at 10:16:23AM +0100, Zdenek Kabelac wrote:
> >>>>>So let me explain the logic behind this 'amazingly stupid' idea.
> >>>>
> >>>>And that logic doesn't make any sense at all.  invisibly unmounting
> >>>>a file system behind the users back is actively harmful, as it is
> >>>>contradicting the principle of least surprise, and the xfstests mess
> >>>>is one simple example for it.  Add a callback in-kernel to tell the
> >>>>fs to shut down but NOT unmount and expose the namespace below it,
> >>>>which the administrator has probably intentionally hid.
> >>>>
> >>>>Even worse unmount may trigger further writes and with fses not
> >>>>handling them the fs might now be stuck after being detached from
> >>>>the namespace without a way for the admin to detect or recover this.
> >>>>
> >>>>What XFS did on IRIX was to let the volume manager call into the fs
> >>>>and shut it down.  At this point no further writes are possible,
> >>>>but we do not expose the namespace under the mount point, and the
> >>>>admin can fix the situation with all the normal tools.
> >>>
> >>><late to the party>
> >>>
> >>>Is there a need for this kind of call-up when xfs now has the configurable
> >>>error handling so that it will shut down after X retries or Y seconds
> >>>of a persistent error?
> >>
> >>
> >>We need likely to open  RFE bugzilla  here - and specify how it should
> >>work when some conditions are met.
> >>
> >>Current 'best effort' tries to minimize damage by trying to do a full-stop
> >>when pool approaches 95% fullness.  Which is relatively 'low/small'
> >>for small sized thin-pool - but there is reasonable big free space
> >>for
> >>commonly sized thin-pool to be able to flush most of page cache on
> >>disk before things will go crazy.
> >>
> >>Now - we could probably detect presence of kernel version and
> >>xfs/ext4 present features - and change reactions.
> >>
> >>What I expect from this BZ is -  how to detect things and what is
> >>the 'best' thing to do.
> >>
> >>I'm clearly not an expert on all filesystem and all their features - but lvm2
> >>needs to work reasonable well across all variants of kernels and
> >>filesystems -  so we cannot say to user - now  we require you to use
> >>the latest 4.10
> >>kernel with these features enabled otherwise all your data could be lost.
> >>
> >>We need to know what to do with 3.X  kernel,  4.X kernel and present
> >>features in kernel and how we can detect them in runtime.
> >
> >No we need to fix upstream.  It is the job of distros to sort out other
> >solutions.  And yeah I appreciate that you need to worry about distro X,
> >Y, Z from Red Hat but lvm2 needs to not be so cute about things.
> >
> >And there has been significant progress on XFS's error handling so that
> >it no longer hangs in the face of ENOSPC.
> >
> >Eric says the basics are documented in Documentation/filesystems/xfs.txt
> >under "Error Handling".
> >
> >Bottomline is lvm2 really has no business unmounting the filesystem.
> >That lvm2 "feature" should be reverted.  It isn't what anyone would
> >expect.  And it only serves to create problems.  Nice intent but it was
> >a misfire to implement it.  At most a discard should be issued once you
> >cross a threshold.
> >
> 
> 
> Users are mostly using thin LVs with  ext4 and XFS filesytems.
> Lots of users do use quite ancient kernels (<4.X).
> 
> When lvm2 decides to UNMOUNT volumes - it's the moment where
> everything else HAS failed (mainly Admin has failed to provide
> promised space)
> And it should be seen as mild OOPS replacement.
> 
> 
> Un/fortunately lvm2 does care about older distributions and kernels
> - so unlike many other 'modern' software you can take recent lvm2 -
> compile & run on several years system and it does its best - so far
> I'd call it a feature.
> 
> Not really sure what do you mean by - leaving this on 'distro'
> maintainers since these are typically able to run   'configure &
> make',
> without no big idea about configurable lvm2 internals.
> 
> --
> 
> Now there is no objection about adding configurable behavior on such cases
> (policies) here you are 'breaking to the open doors' since we do
> plan to provide these for some time (I think there are couple BZs
> already)
> 
> So far I understand that admin  HAS to configure 'better XFS logic'
> himself on filesystem - it's not  granted default so far even on
> 4.10 kernel ?
> 
> On lvm2 side we need to first convert plugin code executions to
> lvconvert policies (will make BZ for this one).

lvm2 isn't the one that needs to fix this!

but it certainly should not be doing an unmount.  discard at most.

why is discard kernel version dependent?

You're making an historically annoying problem even more so.  please
stop this.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: trouble with generic/081
  2017-01-05 10:35             ` Zdenek Kabelac
  (?)
  (?)
@ 2017-01-05 18:24             ` Eric Sandeen
  2017-01-05 18:52               ` Mike Snitzer
  2017-01-05 19:13               ` Zdenek Kabelac
  -1 siblings, 2 replies; 29+ messages in thread
From: Eric Sandeen @ 2017-01-05 18:24 UTC (permalink / raw)
  To: Zdenek Kabelac, Christoph Hellwig; +Cc: dm-devel, Dave Chinner, eguan

(dropping fstests list)

On 1/5/17 4:35 AM, Zdenek Kabelac wrote:
> Dne 5.1.2017 v 00:03 Eric Sandeen napsal(a):
>>
>>
>> On 12/16/16 2:15 AM, Christoph Hellwig wrote:
>>> On Thu, Dec 15, 2016 at 10:16:23AM +0100, Zdenek Kabelac wrote:

...

>>> What XFS did on IRIX was to let the volume manager call into the fs
>>> and shut it down.  At this point no further writes are possible,
>>> but we do not expose the namespace under the mount point, and the
>>> admin can fix the situation with all the normal tools.
>>
>> <late to the party>
>>
>> Is there a need for this kind of call-up when xfs now has the configurable
>> error handling so that it will shut down after X retries or Y seconds
>> of a persistent error?
> 
> 
> We need likely to open  RFE bugzilla  here - and specify how it should
> work when some conditions are met.

We need volume manager people & filesystem people to coordinate a solution;
bugzillas are rarely the best place to do that.  ;)

> Current 'best effort' tries to minimize damage by trying to do a full-stop
> when pool approaches 95% fullness. Which is relatively 'low/small'
> for small sized thin-pool - but there is reasonable big free space
> for commonly sized thin-pool to be able to flush most of page cache
> on disk before things will go crazy.

Sounds like pure speculation. "95%" says nothing about actual space left
vs. actual amount of outstanding buffered IO.

> Now - we could probably detect presence of kernel version and
> xfs/ext4 present features - and change reactions.

Unlikely.  Kernel version doesn't mean anything when distros are
involved.  Many features are not advertised in any way.

> What I expect from this BZ is - how to detect things and what is the
> 'best' thing to do.
> 
> I'm clearly not an expert on all filesystem and all their features -
> but lvm2 needs to work reasonable well across all variants of kernels
> and filesystems - so we cannot say to user - now we require you to
> use the latest 4.10 kernel with these features enabled otherwise all
> your data could be lost.
> 
> We need to know what to do with 3.X kernel, 4.X kernel and present
> features in kernel and how we can detect them in runtime.

Like Mike said, we need to make upstream work, first.

Distros can figure out where to go from there.


Anyway, at this point I'm not convinced that anything but the filesystem
should be making decisions based on storage error conditions.

I think unmounting the filesystem is a terrible idea, and hch & mike
seem to agree.  It's problematic in many ways.

I'm not super keen on shutting down the filesystem, for similar reasons,
but I have a more open mind about that because the implications to the
system are not so severe.

Upstream now has better xfs error handling configurability.  Have you
tested with that?  (for that matter, what thinp test framework exists
on the lvm2/dm side?  We currently have only minimal testing in fstests,
to be honest.  Until we have a framework to test against, this seems likely
to continue going in theoretical circles.)

-Eric

> Regards
> 
> Zdenek
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: trouble with generic/081
  2017-01-05 17:42                 ` Zdenek Kabelac
  (?)
  (?)
@ 2017-01-05 18:40                 ` Eric Sandeen
  -1 siblings, 0 replies; 29+ messages in thread
From: Eric Sandeen @ 2017-01-05 18:40 UTC (permalink / raw)
  To: Zdenek Kabelac, Mike Snitzer
  Cc: Christoph Hellwig, dm-devel, Dave Chinner, eguan, fstests

On 1/5/17 11:42 AM, Zdenek Kabelac wrote:
> Dne 5.1.2017 v 17:26 Mike Snitzer napsal(a):

...

>> Bottomline is lvm2 really has no business unmounting the filesystem.
>> That lvm2 "feature" should be reverted.  It isn't what anyone would
>> expect.  And it only serves to create problems.  Nice intent but it was
>> a misfire to implement it.  At most a discard should be issued once you
>> cross a threshold.

Agreed.

> 
> 
> Users are mostly using thin LVs with ext4 and XFS filesystems.
> Lots of users do use quite ancient kernels (<4.X).

Yes, old kernels have bugs.  Enterprise distros should fix those bugs,
configure && make distros and roll-your-own users get what they
get, IMHO.

> When lvm2 decides to UNMOUNT volumes - it's the moment where
> everything else HAS failed (mainly Admin has failed to provide
> promised space)
> And it should be seen as mild OOPS replacement.

unmounting is a bad idea.  It's not even guaranteed to complete,
because unmount may well require IO.  It has undesirable and unpredictable
impacts on the system.  At a bare minimum, it should not be the default
behavior.

> 
> Un/fortunately lvm2 does care about older distributions and kernels -
> so unlike many other 'modern' software you can take recent lvm2 -
> compile & run on several years system and it does its best - so far
> I'd call it a feature.

It's great that lvm2 stays compatible - xfsprogs does too,
FWIW - but trying to work around every old bad behavior is likely
to be a fool's errand, IMHO.

> Not really sure what do you mean by - leaving this on 'distro'
> maintainers since these are typically able to run 'configure &
> make', without no big idea about configurable lvm2 internals.

But unmounting /does not/ solve that for them.  It's bad behavior.
It's far better IMHO to let xfs spew errors to the log for a day
than to unmount a volume and then fill the root fs with ongoing
IO, for example.
 
> -- 
> 
> Now there is no objection about adding configurable behavior on such cases
> (policies) here you are 'breaking to the open doors' since we do plan
> to provide these for some time (I think there are couple BZs
> already)
> 
> So far I understand that admin HAS to configure 'better XFS logic'
> himself on filesystem - it's not granted default so far even on 4.10
> kernel ?

Ok, back to concrete things: Right, XFS behavior is tunable, but
largely unchanged other than a better behavior at unmount.  Default
behavior in the face of IO errors is otherwise unchanged.  Not all
the world is thinp; there are other error conditions and cases which
may warrant different responses.  As such, the default remains to
keep trying IO unless something critical fails, which will then shut
down the filesystem.

That said, lvm2 offering to configure such behavior for xfs on a thin
volume might be quite reasonable.

I really think we need a robust test framework for all of this so we
can move beyond anecdotes and assumptions, and get some repeatable
regression tests going.

-Eric

> On lvm2 side we need to first convert plugin code executions to lvconvert policies (will make BZ for this one).
> 
> 
> Regards
> 
> Zdenek
> 
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: trouble with generic/081
  2017-01-05 18:24             ` Eric Sandeen
@ 2017-01-05 18:52               ` Mike Snitzer
  2017-01-05 19:13               ` Zdenek Kabelac
  1 sibling, 0 replies; 29+ messages in thread
From: Mike Snitzer @ 2017-01-05 18:52 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Christoph Hellwig, dm-devel, Dave Chinner, eguan, Zdenek Kabelac

On Thu, Jan 05 2017 at  1:24pm -0500,
Eric Sandeen <sandeen@sandeen.net> wrote:

> Upstream now has better xfs error handling configurability.  Have you
> tested with that?  (for that matter, what thinp test framework exists
> on the lvm2/dm side?  We currently have only minimal testing fstests,
> to be honest.  Until we have a framework to test against this seems likely
> to continue going in theoretical circles.)

device-mapper-test-suite (dmts) has various thinp out of space tests,
e.g.:

# dmtest run --suite thin-provisioning -n /out_of_*space/
Loaded suite thin-provisioning
DeletionTests
  delete_after_out_of_space...PASS
DiscardSlowTests
  discard_after_out_of_space...PASS

# dmtest run --suite thin-provisioning -t PoolResizeWhenOutOfSpaceTests
Loaded suite thin-provisioning
PoolResizeWhenOutOfSpaceTests
  io_to_provisioned_region_with_OODS_held_io...PASS
  out_of_data_space_errors_immediately_if_requested...PASS
  out_of_data_space_times_out...PASS
  resize_after_OODS_error_immediately...PASS
  resize_after_OODS_held_io...PASS
  resize_after_OODS_held_io_ext4...#<Test::Unit::Error:0x00000003746d48
 @exception=
  #<ProcessControl::ExitError: command failed: fsck.ext4 -fn /dev/mapper/test-dev-60498>,
 @test_name=
  "test_resize_after_OODS_held_io_ext4(PoolResizeWhenOutOfSpaceTests)">
FAIL
  resize_after_OODS_held_io_preload...PASS
  resize_after_OODS_held_io_timed_out_preload...PASS
  resize_io...PASS

I need to look closer at the 'resize_after_OODS_held_io_ext4' FAIL;
after an initial look, the dmts test code seems to be buggy.

But we can easily extend it to have specific coverage with XFS on top.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: trouble with generic/081
  2017-01-05 18:24             ` Eric Sandeen
  2017-01-05 18:52               ` Mike Snitzer
@ 2017-01-05 19:13               ` Zdenek Kabelac
  2017-01-05 19:29                 ` Eric Sandeen
  1 sibling, 1 reply; 29+ messages in thread
From: Zdenek Kabelac @ 2017-01-05 19:13 UTC (permalink / raw)
  To: Eric Sandeen, Christoph Hellwig; +Cc: dm-devel, Dave Chinner, eguan

Dne 5.1.2017 v 19:24 Eric Sandeen napsal(a):
> (dropping fstests list)
>
> On 1/5/17 4:35 AM, Zdenek Kabelac wrote:
>> Dne 5.1.2017 v 00:03 Eric Sandeen napsal(a):
>>>
>>>
>>> On 12/16/16 2:15 AM, Christoph Hellwig wrote:
>>>> On Thu, Dec 15, 2016 at 10:16:23AM +0100, Zdenek Kabelac wrote:
>
> ...
>
>>>> What XFS did on IRIX was to let the volume manager call into the fs
>>>> and shut it down.  At this point no further writes are possible,
>>>> but we do not expose the namespace under the mount point, and the
>>>> admin can fix the situation with all the normal tools.
>>>
>>> <late to the party>
>>>
>>> Is there a need for this kind of call-up when xfs now has the configurable
>>> error handling so that it will shut down after X retries or Y seconds
>>> of a persistent error?
>>
>>
>> We need likely to open  RFE bugzilla  here - and specify how it should
>> work when some conditions are met.
>
> We need volume manager people & filesystem people to coordinate a solution,
> bugzillas are rarely the best place to do that.  ;)

IMHO it's usually better than sending mails to various lists -
we need all the facts in a single place instead of digging them out of lists.


>
>> Current 'best effort' tries to minimize damage by trying to do a full-stop
>> when pool approaches 95% fullness. Which is relatively 'low/small'
>> for small sized thin-pool - but there is reasonable big free space
>> for commonly sized thin-pool to be able to flush most of page cache
>> on disk before things will go crazy.
>
> Sounds like pure speculation. "95%" says nothing about actual space left
> vs. actual amount of outstanding buffered IO.

It's quite a similar approach to a filesystem reserving some space for the
'root' user - so things can simply proceed when the user has exhausted the fs.


>> Now - we could probably detect presence of kernel version and
>> xfs/ext4 present features - and change reactions.
>
> Unlikely.  Kernel version doesn't mean anything when distros are
> involved.  Many features are not advertised in any way.

Aren't those 'new' features exposed via sysfs in some way?

>>
>> We need to know what to do with 3.X kernel, 4.X kernel and present
>> features in kernel and how we can detect them in runtime.
>
> Like Mike said, we need to make upstream work, first.
>
> Distros can figure out where to go from there.
>

lvm2 upstream is 'distro' & 'kernel' independent.

It is designed to COVER UP known kernel bugs and detect the features present.
That is the design purpose of lvm2 and a KEY feature of lvm2.

So we can't just 'drop' existing users because we like the new 4.X kernel so much.
(Yet we may issue a serious WARNING message when the user is using something
with bad consequences for him.)


> Anyway, at this point I'm not convinced that anything but the filesystem
> should be making decisions based on storage error conditions.

So far I'm not convinced that doing nothing is better than at least trying an unmount.

Doing nothing is known to cause SEVERE filesystem damage, while I haven't
heard of such damage when the 'unmount' is in the field.

Users are not happy - but usually the filesystem is repairable once new space
is added. (Note here - users are using a couple of LVs and usually have
some space left, so the flush & umount can succeed.)

>
> I think unmounting the filesystem is a terrible idea, and hch & mike
> seem to agree.  It's problematic in many ways.

So let's return to the core trouble -


A data-exhausted thin-pool allows the 'fs' user to write to already-provisioned
space, while erroring out on non-provisioned/missing blocks.

If the filesystem is not immediately stopped on the 1st such error (like
remount-ro does for ext4), it continues to destroy itself to a major degree,
because after a reboot the non-provisioned space may actually be there - users
typically use snapshots, and a write requires provisioning new space, but the
old space remains there, since the thin volume metadata does not point to a
'non-existing' block for the failed provisioning but to the old one that
existed before the error.

This puts the filesystem in a rather 'tragic' situation, as it reads data out
of the thin volume without knowing how consistent the data are - i.e. some
mixture of old and new data.

I've proposed a couple of things - i.e.:

A configurable option so that the 1st provisioning error makes ALL further
'writes' to the thin volume fail - this solves the filesystem repair trouble -
but it was not seen as a good idea by Mike, as it would complicate the logic
in the thinp target.

We could possibly implement this by remapping tables via lvm - but it's not
quite easy to provide such a feature.


We could actually put 'error' targets in place of the thins - and let the
filesystem deal with it - but still, some older XFS basically OOMs later
without telling the user a word about how bad it is (we've seen users with
lots of RAM working on it for 2 days....) unless the user monitors syslog
for the stream of write errors.
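
(Roughly what that would look like with plain dmsetup - the device name is a
placeholder and this is only a sketch:

  # swap the thin LV's table for an error target of the same size;
  # every subsequent read/write on the volume then fails immediately
  size=$(blockdev --getsz /dev/mapper/vg-thinvol)
  dmsetup suspend vg-thinvol
  dmsetup load vg-thinvol --table "0 $size error"
  dmsetup resume vg-thinvol
)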


>
> I'm not super keen on shutting down the filesystem, for similar reasons,
> but I have a more open mind about that because the implications to the
> system are not so severe.

Yes - an instant 'shutdown' is a nice option - except a lot of users
are not using thin for their root volume - just for some data volume (virtual
machines), so killing the machine is quite a major obstruction then - unmount
is just a tiny bit nicer.


> Upstream now has better xfs error handling configurability.  Have you
> tested with that?  (for that matter, what thinp test framework exists
> on the lvm2/dm side?  We currently have only minimal testing fstests,
> to be honest.  Until we have a framework to test against this seems likely
> to continue going in theoretical circles.)

See e.g. the lvm2/tests/shell subdir.

Regards

Zdenek

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: trouble with generic/081
  2017-01-05 19:13               ` Zdenek Kabelac
@ 2017-01-05 19:29                 ` Eric Sandeen
  2017-01-05 21:12                   ` Zdenek Kabelac
  0 siblings, 1 reply; 29+ messages in thread
From: Eric Sandeen @ 2017-01-05 19:29 UTC (permalink / raw)
  To: Zdenek Kabelac, Christoph Hellwig; +Cc: dm-devel, Dave Chinner, eguan

On 1/5/17 1:13 PM, Zdenek Kabelac wrote:
>> Anyway, at this point I'm not convinced that anything but the filesystem
>> should be making decisions based on storage error conditions.
> 
> So far I'm not convinced  doing nothing is better than trying at least unmount.
> 
> Since doing nothing is known to cause  SEVERE filesystem damage,
> while I haven't heard about them when 'unmount' is in the field.

I'm pretty sure that's exactly what started this thread.  ;)

Failing IOs should never cause "severe filesystem damage" - that is what
a journaling filesystem is /for/.  Can you explain further?

(A journal may not be replayable on mount if it needs to allocate more
thin blocks on replay and is unable to do so, but that should just fail
gracefully)

> 
> Users are not happy - but usually filesystem is repairable when new space
> is added. (Note here -  user are using couple LVs and usually have
> some space left to succeed with flush & umount)
> 
>>
>> I think unmounting the filesystem is a terrible idea, and hch & mike
>> seem to agree.  It's problematic in many ways.
> 
> So let's remain core trouble -
> 
> 
> Data-exhausted thinpool allows 'fs' user to write to provisioned
> space - while error-out on non-provisioned/missing.

as expected.

> If the filesystem is not immediately stopped on the 1st. such error
> (like remount-ro does for ext4) it continues to destroy itself to
> major degree 

Again, please provide concrete examples here so we know what you've
seen in this regard.

It's my expectation that a journaling filesystem will make noise,
might shut down, but should never be /damaged/ by IO errors.  If that's
not the case, can you provide a reproducer?

> as after reboot the non-provisioned space may actually
> be there - as user do typically use snapshots - and write requires
> provisioning new space - but old space remains there - as thin volume
> metadata do not point to 'non-existing' block for failed provisioning
> - but the old existing one before error.
> 
> This puts filesystem in rather 'tragical' situation as it reads data
> out of thin volume without knowing how consistent they are - i.e.
> some mixture of old and new data.

I lost you there.  A testcase for this filesystem-destroying
behavior might help me understand.

> I've proposed couple things - i.e.:
> 
> Configurable option that 1st. provisioning error makes ALL further
> 'writes' to thin failing - this solves filesystem repair trouble -
> but was not seen as good idea by Mike as this would complicate logic
> in thinp target.
> 
> We could possibly implement this by remapping tables via lvm - but
> it's not quite easy to provide such a feature.
> 
> We could actually put 'error' targets instead of thins  - and let filesystem
> deals with it - but still some older XFS basically OOM later without telling
> user a word how bad is that   (seen users with lots of RAM and working for 2 days....) unless user monitors syslog for stream or write errors.
> 
> 
>>
>> I'm not super keen on shutting down the filesystem, for similar reasons,
>> but I have a more open mind about that because the implications to the
>> system are not so severe.
> 
> Yes -   instant 'shutdown' is nice option - except lots of users
> are not using thin for their root volume - just for some data volume
> (virtual machines), so killing machine is quite major obstruction
> then - unmount is just a tiny bit nicer.

I am not following you at all.  I'm not talking about killing the machine,
I am talking about shutting down the /filesystem/ which has hit errors.
("shutdown" is xfs-speak for ext4's remount-ro, more or less).

> 
> 
>> Upstream now has better xfs error handling configurability.  Have you
>> tested with that?  (for that matter, what thinp test framework exists
>> on the lvm2/dm side?  We currently have only minimal testing fstests,
>> to be honest.  Until we have a framework to test against this seems likely
>> to continue going in theoretical circles.)
> 
> See i.e.  lvm2/tests/shell  subdir

Thx.

-Eric

> Regards
> 
> Zdenek

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: trouble with generic/081
  2017-01-05 19:29                 ` Eric Sandeen
@ 2017-01-05 21:12                   ` Zdenek Kabelac
  2017-01-05 22:03                     ` Eric Sandeen
  2017-01-05 22:46                     ` Dave Chinner
  0 siblings, 2 replies; 29+ messages in thread
From: Zdenek Kabelac @ 2017-01-05 21:12 UTC (permalink / raw)
  To: Eric Sandeen, Christoph Hellwig; +Cc: dm-devel, Dave Chinner, eguan

Dne 5.1.2017 v 20:29 Eric Sandeen napsal(a):
> On 1/5/17 1:13 PM, Zdenek Kabelac wrote:
>>> Anyway, at this point I'm not convinced that anything but the filesystem
>>> should be making decisions based on storage error conditions.
>>
>> So far I'm not convinced  doing nothing is better than trying at least unmount.
>>
>> Since doing nothing is known to cause  SEVERE filesystem damage,
>> while I haven't heard about them when 'unmount' is in the field.
>
> I'm pretty sure that's exactly what started this thread.  ;)
>
> Failing IOs should never cause "severe filesystem damage" - that is what
> a journaling filesystem is /for/.  Can you explain further?

Well, all I know are user reports - from users who were using 'XFS'
with an exhausted thin-pool while having 'snapshots' of their volumes.

Since there was no 'umount', and XFS upon a write error just retried
endlessly to write the block over and over, the system appeared
to the users nice & usable for quite a long time (especially when the boxes
had 32G of RAM or more...).

Maybe the writes passed to 'uniquely' owned blocks...

Then after some day, two, three, the OOM killer finally struck.
Users realized the thin-pool was out of space - added room to the VG and pool
and tried xfs_repair - but the whole FS was largely lost.

With the umount the user cannot use the machine and is mostly forced to reboot
(which was the main point of the umount - to disrupt the user's work).

So if I hear all the voices correctly now -


we now want to let users continue to use such systems and let them figure out by
themselves that something is wrong when they get an occasional write failure,
while XFS now avoids destruction by shutting down on any journal failure.

> (A journal may not be replayable on mount if it needs to allocate more
> thin blocks on replay and is unable to do so, but that should just fail
> gracefully)


I don't have a test sample myself - just some guidance on how to get to it.

Use a thin LV and make some thin snapshots.

Then change various parts of the origin - at various moments before the pool is
out of space.

So you will get lots of different scenarios of missing data.

You will mostly not get into the mentioned trouble if you
have just a single thin LV and you exhaust the thin-pool while using it.

Games with snapshot are needed.
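
Something along these lines, as a rough and untested sketch (device names and
sizes are made up):

  # over-provisioned thin pool: 64M of real space, 256M virtual LV
  vgcreate vg_test /dev/sdX
  lvcreate --type thin-pool -L 64M -n pool vg_test
  lvcreate -V 256M -n thin1 --thinpool vg_test/pool
  mkfs.xfs /dev/vg_test/thin1
  mount /dev/vg_test/thin1 /mnt/thin1

  # populate the origin, snapshot it, then rewrite the origin;
  # the rewrite needs fresh provisioning (CoW against the snapshot),
  # so the pool fills up while the old blocks stay referenced
  dd if=/dev/urandom of=/mnt/thin1/data bs=1M count=32 conv=fsync
  lvcreate -s -n snap1 vg_test/thin1
  dd if=/dev/urandom of=/mnt/thin1/data bs=1M count=32 conv=fsync,notrunc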



Regards

Zdenek

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: trouble with generic/081
  2017-01-05 21:12                   ` Zdenek Kabelac
@ 2017-01-05 22:03                     ` Eric Sandeen
  2017-01-05 22:46                     ` Dave Chinner
  1 sibling, 0 replies; 29+ messages in thread
From: Eric Sandeen @ 2017-01-05 22:03 UTC (permalink / raw)
  To: Zdenek Kabelac, Christoph Hellwig; +Cc: dm-devel, Dave Chinner, eguan

On 1/5/17 3:12 PM, Zdenek Kabelac wrote:
> Dne 5.1.2017 v 20:29 Eric Sandeen napsal(a):
> 
> So  if I hear all voice correctly now -
> 
> 
> we now want to let user continue to use such systems and let them
> figure out themself something is wrong when they get occasional write
> failure and XFS now avoid destruction by shutting down on any journal
> failure.

No - I'd like to know, specifically, what scenario leads to filesystem
corruption when we run out of space in the thin-pool, because I do
not expect that result.

Anecdotes about "destruction" don't really help.

>> (A journal may not be replayable on mount if it needs to allocate more
>> thin blocks on replay and is unable to do so, but that should just fail
>> gracefully)
> 
> 
> I don't have a test sample myself - just some guides how to get to it.
> 
> Use  LV and make some thin snapshots.
> 
> Then change various parts of origin - at various moment before pool is out-of-space
> 
> So you will get lots of different scenarios of missing data.

(missing /data/ is not "filesystem destruction" - if we're just talking
about buffered IO which was never fsynced and does not persist on
disk, that is expected with or without thinp on any sort of storage
trouble.)
 
> You will mostly not get into those mentioned trouble if you
> have just single thinLV and you exhaust thin-pool while using it.
> 
> Games with snapshot are needed.

Please formalize that, and I'll be happy to take a look.

-Eric

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: trouble with generic/081
  2017-01-05 21:12                   ` Zdenek Kabelac
  2017-01-05 22:03                     ` Eric Sandeen
@ 2017-01-05 22:46                     ` Dave Chinner
  2017-01-09 13:39                       ` Christoph Hellwig
  1 sibling, 1 reply; 29+ messages in thread
From: Dave Chinner @ 2017-01-05 22:46 UTC (permalink / raw)
  To: Zdenek Kabelac; +Cc: Christoph Hellwig, dm-devel, Eric Sandeen, eguan

On Thu, Jan 05, 2017 at 10:12:25PM +0100, Zdenek Kabelac wrote:
> Dne 5.1.2017 v 20:29 Eric Sandeen napsal(a):
> >On 1/5/17 1:13 PM, Zdenek Kabelac wrote:
> >>>Anyway, at this point I'm not convinced that anything but the filesystem
> >>>should be making decisions based on storage error conditions.
> >>
> >>So far I'm not convinced  doing nothing is better than trying at least unmount.
> >>
> >>Since doing nothing is known to cause  SEVERE filesystem damage,
> >>while I haven't heard about them when 'unmount' is in the field.
> >
> >I'm pretty sure that's exactly what started this thread.  ;)
> >
> >Failing IOs should never cause "severe filesystem damage" - that is what
> >a journaling filesystem is /for/.  Can you explain further?
> 
> well all I know are user reports - from users who were using 'XFS'
> with exhausted  thin-pool while  having 'snapshots' of their volumes.
> 
> Since there was no 'umount' and  XFS upon write error just retried
> endlessly to write block over and over -  system appeared

Which has already been fixed upstream.

And my 2c worth on the "lvm unmounting filesystems on error" - stop
it, now. It's the wrong thing to do, and it makes it impossible for
filesystems to handle the error and recover gracefully when
possible.

> to the users nice & usable for quite long time (especially when
> boxes had 32G of RAM or more...)
> 
> Maybe writes passed to 'uniquely' owned blocks....
> 
> Then after some day, two, three,  OOM finally killed.
> Users realized thin-pool was out-of-space - added room to VG and pool
> and tried  xfs_repair - but whole FS was largely lost.

That sounds very much like a block device snapshot corruption
problem, not a filesystem problem. As always, the filesystem gets
blamed for data loss, regardless of where the problem really lies.

> Use  LV and make some thin snapshots.
> 
> Then change various parts of origin - at various moment before pool
> is out-of-space
> 
> So you will get lots of different scenarios of missing data.
> 
> You will mostly not get into those mentioned trouble if you
> have just single thinLV and you exhaust thin-pool while using it.
> 
> Games with snapshot are needed.

This really sounds like a problem with snapshot ENOSPC error
handling, not a filesystem issue - the filesystem is simply the
messenger here...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: trouble with generic/081
  2017-01-05 22:46                     ` Dave Chinner
@ 2017-01-09 13:39                       ` Christoph Hellwig
  2017-01-09 14:22                         ` Zdenek Kabelac
  0 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2017-01-09 13:39 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, dm-devel, Eric Sandeen, eguan, Zdenek Kabelac

On Fri, Jan 06, 2017 at 09:46:00AM +1100, Dave Chinner wrote:
> And my 2c worth on the "lvm unmounting filesystems on error" - stop
> it, now. It's the wrong thing to do, and it makes it impossible for
> filesystems to handle the error and recover gracefully when
> possible.

It's causing way more harm than it helps due to the namespace
implications.  And we'll have to fix it in the FS for real because
other storage can run out of space as well.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: trouble with generic/081
  2017-01-09 13:39                       ` Christoph Hellwig
@ 2017-01-09 14:22                         ` Zdenek Kabelac
  2017-01-09 14:54                           ` Eric Sandeen
  2017-01-09 15:01                           ` Christoph Hellwig
  0 siblings, 2 replies; 29+ messages in thread
From: Zdenek Kabelac @ 2017-01-09 14:22 UTC (permalink / raw)
  To: Christoph Hellwig, Dave Chinner; +Cc: dm-devel, Eric Sandeen, eguan

Dne 9.1.2017 v 14:39 Christoph Hellwig napsal(a):
> On Fri, Jan 06, 2017 at 09:46:00AM +1100, Dave Chinner wrote:
>> And my 2c worth on the "lvm unmounting filesystems on error" - stop
>> it, now. It's the wrong thing to do, and it makes it impossible for
>> filesystems to handle the error and recover gracefully when
>> possible.
>
> It's causing way more harm than it helps due to the namespace
> implications.  And we'll have to fix it in the FS for real because
> other storage can run out of space as well.
>


Hi

I may be blind, but I still miss some simple things here -


lvm2 will initiate a lazy umount of ALL thin devices from a thin-pool
when it gets to about 95% fullness (so it's a bit sooner than 100%,
with still some 5% of 'free space').

This should mostly trigger flushing of as much dirty data to disk as possible -
perhaps even causing 100% fullness (which is not wanted, but unavoidable with
today's RAM and device sizes).
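
For reference, the knobs involved look roughly like this (values are from memory
and only examples - see lvm.conf(5) for the exact semantics):

  # /etc/lvm/lvm.conf
  activation {
      # dmeventd monitors the pool; above the threshold it tries to
      # autoextend the pool by the given percentage - only when that
      # fails (no space left in the VG) does the out-of-space handling
      # described above kick in
      thin_pool_autoextend_threshold = 80
      thin_pool_autoextend_percent = 20
  }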

So now the filesystem gets into the position of hitting ENOSPC errors.

I'd expect XFS to switch off (xfs_shutdown) on this condition.

So it should get into the EXACT same state as is advocated here - the do-nothing
case - just without invoking the 'lazy umount', and significantly later
(so IMHO possibly causing more damage to the user).

So is XFS incapable of handling a lazy umount at all under such conditions?

I really would like to first understand why there is such a big halo effect
around this - since in my eyes lvm2 was designed to operate within
threshold bounds - if the thin-pool volume is outside the configured bounds, lvm2
is not capable of delivering (resizing) more space, and it tries to simply stop
further operations - so the disruption of work was 'intended'.


What I'm getting from this thread is that this is unwanted, and users wish to
continue using the thin-pool further with possible damage to their data set - and
lvm2 WILL provide configurable settings for this.

I'm not disputing this part AT ALL  - just to make it really clear.


But could anyone from XFS specify why umount causes some 'more'
damage than no umount at all?

With my naive thinking, I'd just assume I'd hit the 'xfs_shutdown' rather earlier.
Is xfs refusing to umount an 'erroring' device?


Also please note - since namespaces were mentioned here - if this is something
Docker related: be aware that for Docker thins, lvm2 has for a while been leaving
such volumes intact already (-> no umount for Docker thin volumes).


Regards

Zdenek

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: trouble with generic/081
  2017-01-09 14:22                         ` Zdenek Kabelac
@ 2017-01-09 14:54                           ` Eric Sandeen
  2017-01-09 15:11                             ` Zdenek Kabelac
  2017-01-10  4:30                             ` Darrick J. Wong
  2017-01-09 15:01                           ` Christoph Hellwig
  1 sibling, 2 replies; 29+ messages in thread
From: Eric Sandeen @ 2017-01-09 14:54 UTC (permalink / raw)
  To: Zdenek Kabelac, Christoph Hellwig, Dave Chinner; +Cc: dm-devel, eguan

On 1/9/17 8:22 AM, Zdenek Kabelac wrote:
> But could anyone from XFS specify -  why  umount is causing some
> 'more' damage, than no umount at all?

Please reread this thread...  it /started/ with problems
/caused by unmount/ for Christoph.

It's not that unmount damages the filesystem per se; it damages the
/system/ when it uncovers the underlying mountpoint and applications
continue writing without error, to the /wrong filesystem/.

Further, if unmount requires IO that can't complete due to ENOSPC,
then launching the unmount behind the admin's back may cause other
problems when it gets detached from the namespace.

Christoph later clarified:

> Even on a production system I'd much rather have a shutdown XFS fs
> than LVM trying to unmount, probably hanging because there are busy
> fds on the fs, and if not the application might not write to another
> fs because of this.  It's just an amazingly stupid idea.

And again:

> invi[si]bly unmounting
> a file system behind the users back is actively harmful, as it is
> contradicting the principle of least surprise, and the xfstests mess
> is one simple example for it.
...
> [ it's undesirable to ] unmount and expose the namespace below it,
> which the administrator has probably intentionally hid.

> Even worse unmount may trigger further writes and with fses not
> handling them the fs might now be stuck after being detached from
> the namespace without a way for the admin to detect or recover this.

Dave agreed:

> And my 2c worth on the "lvm unmounting filesystems on error" - stop
> it, now. It's the wrong thing to do, and it makes it impossible for
> filesystems to handle the error and recover gracefully when
> possible.

Now, as for:

> Is xfs refusing to umount 'erroring' device ? 

I'm not sure what you mean by this.  XFS never unmounts itself.
If you're asking if an XFS filesystem in an error state (i.e., shutdown,
or with failing IOs) can unmount - yes, it is possible to unmount
a filesystem in this state.  The administrator can make that choice.
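
Concretely, something like this works (mountpoint is made up; xfs_io's expert
mode is assumed):

  xfs_io -x -c "shutdown -f" /mnt/thin1   # put the fs into shutdown state
  touch /mnt/thin1/newfile                # fails with an I/O error
  umount /mnt/thin1                       # unmount still succeeds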



You've had two preeminent XFS developers ask repeatedly that
you stop unmounting xfs from lvm2.  I really wish you would take their
advice.

We need to work together to ensure that these subsystems react
in the best possible way to overprovisioned storage, and give the admin
the best chance at recovery without more adverse effects.  For the reasons
stated above, unmounting the filesystem does not achieve this goal.

It leads to application IO to wrong filesystems (quite possibly root),
detached filesystems which can no longer be administered, possible hung
unmounts, etc.  Automatic unmount is the wrong reaction, please stop doing
it.

Thanks,
-Eric

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: trouble with generic/081
  2017-01-09 14:22                         ` Zdenek Kabelac
  2017-01-09 14:54                           ` Eric Sandeen
@ 2017-01-09 15:01                           ` Christoph Hellwig
  1 sibling, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2017-01-09 15:01 UTC (permalink / raw)
  To: Zdenek Kabelac
  Cc: Christoph Hellwig, Eric Sandeen, dm-devel, Dave Chinner, eguan

On Mon, Jan 09, 2017 at 03:22:00PM +0100, Zdenek Kabelac wrote:
> lvm2 will initiate lazy umount of ALL thin devices from a thin-pool
> when it gets about 95% fullness  (so it's a bit sooner than 100%
> with still some 5% 'free-space').

Yes, and we want this not to be done.  Not for XFS and not for any
other file system.  umounting is something neither lvm nor the
filesystem should do, but instead the administrator.

> So it should get into EXACT same state as  is advocated here - do nothing
> case - without invoking 'lazy umount' but significantly later
> (so IMHO causing possibly more damage to a user).

The lazy unmount does not help.  If LVM wants to give the fs a heads-up
about running out of space, please do so - this would be useful, as we
could then do sensible things, like doing a discard of all free blocks
or an explicit, early fs shutdown.  But don't just lazily unmount the fs,
which only creates lots of problems.
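
As an aside, the 'discard of all free blocks' part is already scriptable from
such a hook - a sketch with a made-up mountpoint:

  # return every unused filesystem block to the thin pool
  fstrim -v /mnt/thin1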

> But could anyone from XFS specify -  why  umount is causing some 'more'
> damage, than no umount at all?

It is causing damage because unmount causes additional I/O, at a point
where we possibly might not want more I/O.  It also doesn't let the
file system do useful things.  And last but not least it changes the
namespace (aka mount topology) in an absolutely unexpected way, which
might even introduce security issues because now users could write
into otherwise hidden directories in the parent mountpoint.

> Also please note - since its mentioned namespaces here - if this is
> something Docker related - be aware for Docker thins  - lvm2 for awhile is
> leaving such volumes intact already (-> no umount for docker thins volume).

The filesystem namespace has nothing to do with docker at all.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: trouble with generic/081
  2017-01-09 14:54                           ` Eric Sandeen
@ 2017-01-09 15:11                             ` Zdenek Kabelac
  2017-01-10  2:48                               ` Theodore Ts'o
  2017-01-10  4:30                             ` Darrick J. Wong
  1 sibling, 1 reply; 29+ messages in thread
From: Zdenek Kabelac @ 2017-01-09 15:11 UTC (permalink / raw)
  To: Eric Sandeen, Christoph Hellwig, Dave Chinner; +Cc: dm-devel, eguan

Dne 9.1.2017 v 15:54 Eric Sandeen napsal(a):
> On 1/9/17 8:22 AM, Zdenek Kabelac wrote:
>> But could anyone from XFS specify -  why  umount is causing some
>> 'more' damage, than no umount at all?
>
> Please reread this thread...  it /started/ with problems
> /caused by unmount/ for Christoph.
>
> It's not that unmount damages the filesystem per se; it damages the
> /system/ when it uncovers the underlying mountpoint and applications
> continue writing without error, to the /wrong filesystem/.
>
> Further, if unmount requires IO that can't complete due to ENOSPC,
> then launching the unmount behind the admin's back may cause other
> problems when it gets detached from the namespace.
>
> Christoph later clarified:
>
>> Even on a production system I'd much rather have a shutdown XFS fs
>> than LVM trying to unmount, probably hanging because there are busy
>> fds on the fs, and if not the application might not write to another
>> fs because of this.  It's just an amazingly stupid idea.
>
> And again:
>
>> invi[si]bly unmounting
>> a file system behind the users back is actively harmful, as it is
>> contradicting the principle of least surprise, and the xfstests mess
>> is one simple example for it.
> ...
>> [ it's undesirable to ] unmount and expose the namespace below it,
>> which the administrator has probably intentionally hid.
>
>> Even worse unmount may trigger further writes and with fses not
>> handling them the fs might now be stuck after being detached from
>> the namespace without a way for the admin to detect or recover this.
>
> Dave agreed:
>
>> And my 2c worth on the "lvm unmounting filesystems on error" - stop
>> it, now. It's the wrong thing to do, and it makes it impossible for
>> filesystems to handle the error and recover gracefully when
>> possible.
>
> Now, as for:
>
>> Is xfs refusing to umount 'erroring' device ?
>
> I'm not sure what you mean by this.  XFS never unmounts itself.
> If you're asking if an XFS filesystem in an error state (i.e., shutdown,
> or with failing IOs) can unmount - yes, it is possible to unmount
> a filesystem in this state.  The administrator can make that choice.
>
>
>
> You've had two preeminent XFS developers ask repeatedly that
> you stop unmounting xfs from lvm2.  I really wish you would take their
> advice.


As I've already said a couple of times - lvm2 WILL provide this feature.
For sure - I'll repeat it again - it will get there.


>
> We need to work together to ensure that these subsystems react
> in the best possible way to overprovisioned storage, and give the admin
> the best chance at recovery without more adverse affects.  For the reasons
> stated above, unmounting the filesystem does not achieve this goal.
>
> It leads to application IO to wrong filesystems (quite possibly root),
> detached filesystems which can no longer be administered, possible hung
> unmounts, etc.  Automatic unmount is the wrong reaction, please stop doing
> it.

I do get this - but we have 2 cases - each has its merit.

You have the case where an application will write to a 'different' filesystem,
while in other cases users will be able to continue to use their filesystem
and cause irreparable filesystem damage (whichever you want to believe
is the failure - of the thin-pool or of the filesystem).

So I believe it's up to the user to be able to pick which version he
prefers - and that's why lvm2 will make this behavior configurable.

You could of course do other jobs in the hook - e.g. call fstrim.

Regards

Zdenek

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: trouble with generic/081
  2017-01-09 15:11                             ` Zdenek Kabelac
@ 2017-01-10  2:48                               ` Theodore Ts'o
  0 siblings, 0 replies; 29+ messages in thread
From: Theodore Ts'o @ 2017-01-10  2:48 UTC (permalink / raw)
  To: Zdenek Kabelac
  Cc: Christoph Hellwig, Dave Chinner, dm-devel, Eric Sandeen, eguan

On Mon, Jan 09, 2017 at 04:11:08PM +0100, Zdenek Kabelac wrote:
> 
> You have the case where application will write to 'different' filesystem,
> while in other cases user will be able to continue to use their filesystem
> and cause irreparable filesystem damage (whatever you want to believe
> is failure if thin-pool or filesystem).

Huh?  Where do you get this fantasy that ENOSPC causes "irreparable
file system damage"?  We have tests in xfstests that repeatedly hammer
file systems' ENOSPC handling --- the file system is filled so it is
almost full, and then one thread is constantly creating and deleting
files to keep the file system almost full, and then backing off, while
another thread is trying to continuously allocate space, and retrying
when it gets ENOSPC.
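
The shape of it, heavily simplified (the real tests live in xfstests; paths and
sizes here are made up):

  # thread 1: keep the fs hovering around full, then back off
  while :; do
      dd if=/dev/zero of=/mnt/test/filler bs=1M count=64 2>/dev/null
      sleep 1
      rm -f /mnt/test/filler
  done &

  # thread 2: keep allocating, tolerate ENOSPC and retry
  while :; do
      dd if=/dev/zero of=/mnt/test/worker bs=1M count=16 conv=fsync \
          2>/dev/null || echo "allocation failed (likely ENOSPC), retrying"
      rm -f /mnt/test/worker
  done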

If the file system gets "irreparably damaged", it's a file system
bug.  I can't remember the last time ext4 or xfs has had file system
issues with this.  Btrfs has in the past had issues here --- but this
is a file system bug.  It should be fixed, or no responsible system
administrator should be using btrfs on that enterprise distribution
until it is fixed, Docker or no Docker. 

	     	      	 	       - Ted

P.S.  And no responsible enterprise distro provider should be shipping
btrfs if it hasn't had all of the bug fixes so it doesn't corrupt
the file system on ENOSPC.  Since I believe Red Hat is a responsible
enterprise distro provider, what's the problem?  :-)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: trouble with generic/081
  2017-01-09 14:54                           ` Eric Sandeen
  2017-01-09 15:11                             ` Zdenek Kabelac
@ 2017-01-10  4:30                             ` Darrick J. Wong
  1 sibling, 0 replies; 29+ messages in thread
From: Darrick J. Wong @ 2017-01-10  4:30 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Christoph Hellwig, dm-devel, Dave Chinner, eguan, Zdenek Kabelac

On Mon, Jan 09, 2017 at 08:54:02AM -0600, Eric Sandeen wrote:
> On 1/9/17 8:22 AM, Zdenek Kabelac wrote:
> > But could anyone from XFS specify -  why  umount is causing some
> > 'more' damage, than no umount at all?
> 
> Please reread this thread...  it /started/ with problems
> /caused by unmount/ for Christoph.
> 
> It's not that unmount damages the filesystem per se; it damages the
> /system/ when it uncovers the underlying mountpoint and applications
> continue writing without error, to the /wrong filesystem/.
> 
> Further, if unmount requires IO that can't complete due to ENOSPC,
> then launching the unmount behind the admin's back may cause other
> problems when it gets detached from the namespace.
> 
> Christoph later clarified:
> 
> > Even on a production system I'd much rather have a shutdown XFS fs
> > than LVM trying to unmount, probably hanging because there are busy
> > fds on the fs, and if not the application might not write to another
> > fs because of this.  It's just an amazingly stupid idea.
> 
> And again:
> 
> > invi[si]bly unmounting
> > a file system behind the users back is actively harmful, as it is
> > contradicting the principle of least surprise, and the xfstests mess
> > is one simple example for it.
> ...
> > [ it's undesirable to ] unmount and expose the namespace below it,
> > which the administrator has probably intentionally hid.
> 
> > Even worse unmount may trigger further writes and with fses not
> > handling them the fs might now be stuck after being detached from
> > the namespace without a way for the admin to detect or recover this.
> 
> Dave agreed:
> 
> > And my 2c worth on the "lvm unmounting filesystems on error" - stop
> > it, now. It's the wrong thing to do, and it makes it impossible for
> > filesystems to handle the error and recover gracefully when
> > possible.
> 
> Now, as for:
> 
> > Is xfs refusing to umount 'erroring' device ? 
> 
> I'm not sure what you mean by this.  XFS never unmounts itself.
> If you're asking if an XFS filesystem in an error state (i.e., shutdown,
> or with failing IOs) can unmount - yes, it is possible to unmount
> a filesystem in this state.  The administrator can make that choice.
> 
> 
> 
> You've had two preeminent XFS developers ask repeatedly that

Three.  We've improved XFS' behavior w.r.t. error codes from lower
layers, so please just let XFS (and everything else) deal with it.  If
the FS actually gets corrupt when log writes fail then let's get a bug
report going and actually fix that.  Every filesystem driver has some
sort of well-known strategy for dealing with problems -- XFS shuts down;
ext4 can remount, continue, or panic; etc.  As a convention we don't
nuke mountpoints from orbit when storage errors happen; this is clearly
a surprising deviation from that.
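
For reference, that configurability looks roughly like this - paths, device
names and values are illustrative and depend on the kernel version:

  # XFS (recent kernels): bound the metadata-write retries on ENOSPC
  # instead of retrying forever, and fail remaining retries at unmount
  echo 10 > /sys/fs/xfs/dm-3/error/metadata/ENOSPC/max_retries
  echo 1  > /sys/fs/xfs/dm-3/error/fail_at_unmount

  # ext4: pick the error policy at mount time
  mount -o errors=remount-ro /dev/vg_test/thin1 /mnt/thin1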

--D

> you stop unmounting xfs from lvm2.  I really wish you would take their
> advice.
> 
> We need to work together to ensure that these subsystems react
> in the best possible way to overprovisioned storage, and give the admin
> the best chance at recovery without more adverse effects.  For the reasons
> stated above, unmounting the filesystem does not achieve this goal.
> 
> It leads to application IO to wrong filesystems (quite possibly root),
> detached filesystems which can no longer be administered, possible hung
> unmounts, etc.  Automatic unmount is the wrong reaction, please stop doing
> it.
> 
> Thanks,
> -Eric
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2017-01-10  4:30 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-14 16:43 trouble with generic/081 Christoph Hellwig
2016-12-15  6:29 ` Eryu Guan
2016-12-15  6:36 ` Dave Chinner
2016-12-15  8:42   ` Christoph Hellwig
2016-12-15  9:16     ` Zdenek Kabelac
2016-12-16  8:15       ` Christoph Hellwig
2016-12-16  9:31         ` Zdenek Kabelac
2017-01-04 23:03         ` Eric Sandeen
2017-01-05 10:35           ` Zdenek Kabelac
2017-01-05 10:35             ` Zdenek Kabelac
2017-01-05 16:26             ` Mike Snitzer
2017-01-05 17:42               ` Zdenek Kabelac
2017-01-05 17:42                 ` Zdenek Kabelac
2017-01-05 18:07                 ` Mike Snitzer
2017-01-05 18:40                 ` Eric Sandeen
2017-01-05 18:24             ` Eric Sandeen
2017-01-05 18:52               ` Mike Snitzer
2017-01-05 19:13               ` Zdenek Kabelac
2017-01-05 19:29                 ` Eric Sandeen
2017-01-05 21:12                   ` Zdenek Kabelac
2017-01-05 22:03                     ` Eric Sandeen
2017-01-05 22:46                     ` Dave Chinner
2017-01-09 13:39                       ` Christoph Hellwig
2017-01-09 14:22                         ` Zdenek Kabelac
2017-01-09 14:54                           ` Eric Sandeen
2017-01-09 15:11                             ` Zdenek Kabelac
2017-01-10  2:48                               ` Theodore Ts'o
2017-01-10  4:30                             ` Darrick J. Wong
2017-01-09 15:01                           ` Christoph Hellwig
