linux-xfs.vger.kernel.org archive mirror
* Shutdown filesystem when a thin pool become full
@ 2017-05-22 14:25 Gionatan Danti
  2017-05-22 23:09 ` Carlos Maiolino
  2017-05-23 13:24 ` Eric Sandeen
  0 siblings, 2 replies; 30+ messages in thread
From: Gionatan Danti @ 2017-05-22 14:25 UTC (permalink / raw)
  To: linux-xfs; +Cc: g.danti

Hi all,
I have a question regarding how to automatically shut down (or put in
read-only mode) an XFS filesystem which resides on a full thin pool.

Background info: when a thin pool becomes full, it will deny any write
that needs a new allocation, but will continue to allow writes to
already-allocated storage chunks. So it behaves like a badly damaged disk,
where some places can be written while others throw an error.

Playing with the dmsetup table it is possible to deny *all* I/O to the
volume, but that requires careful monitoring and a sensible script to do
the right thing (tm).
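
For reference, the dmsetup trick I am talking about is roughly the
following (a rough sketch, with my volume names only as an example; the dm
"error" target simply fails every I/O):

# replace the volume's table with the "error" target, denying all I/O
SECTORS=$(blockdev --getsz /dev/vg_system/thinvol)
dmsetup suspend vg_system-thinvol
dmsetup load vg_system-thinvol --table "0 $SECTORS error"
dmsetup resume vg_system-thinvol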

From my testing, it appears that XFS will shut itself down when a
metadata write fails; however, failed *data* updates continue without
triggering any filesystem response. Is this by design? Can I do
something about that?

As a side note, I perfectly understand that async writes should *not*
put the filesystem into a suspended state; however, I see this behavior
for synchronous writes as well.

Thanks.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Shutdown filesystem when a thin pool become full
  2017-05-22 14:25 Shutdown filesystem when a thin pool become full Gionatan Danti
@ 2017-05-22 23:09 ` Carlos Maiolino
  2017-05-23 10:56   ` Gionatan Danti
  2017-05-23 13:24 ` Eric Sandeen
  1 sibling, 1 reply; 30+ messages in thread
From: Carlos Maiolino @ 2017-05-22 23:09 UTC (permalink / raw)
  To: Gionatan Danti; +Cc: linux-xfs

Hi,

On Mon, May 22, 2017 at 04:25:37PM +0200, Gionatan Danti wrote:
> Hi all,
> I have a question regarding how to automatically shutdown (or put in
> read-only mode) an XFS filesystem which reside on a full thin pool.
> 

If you want XFS to shut down when it hits an error, you can configure it
through sysfs in the:

/sys/fs/xfs/<dev>/error/

directory.

You can find detailed info in Documentation/filesystems/xfs.txt (from linux
source).
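
For example (a quick sketch, using dm-6 only as a placeholder device name;
xfs.txt documents the full set of knobs):

# shut down at unmount time instead of hanging if failed metadata buffers remain
echo 1 > /sys/fs/xfs/dm-6/error/fail_at_unmount
# for the default error class, stop retrying failed metadata writes after 60 seconds
echo 60 > /sys/fs/xfs/dm-6/error/metadata/default/retry_timeout_seconds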

I should say, though, that this is a bad idea. A full thin pool will return an
ENOSPC error to the filesystem, and the filesystem should act accordingly.

Would you want the filesystem to shut down because it hit an ENOSPC, or to
report it to the user and let him/her take the appropriate action, like freeing
space or increasing the dm-thin pool size?

> Background info: when a thin pool become full, it will deny any new
> unallocated writes, but will continue to allow writes to already-allocated
> storage chunks. So, it behave as a badly damaged disk, where some places can
> be written, but others throw an error.
> 

The behavior really depends on several configurations, including how dm-thin
is set to deal with ENOSPC situations: dm-thin can be configured to queue new
incoming I/Os, fail the I/O, etc. But even in a full-pool situation you can
still rewrite already-allocated data, so this doesn't change much from what
would happen if you fill up a regular block device: XFS tries to allocate a new
block, the block device returns ENOSPC, and XFS acts accordingly.
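
With lvm2 on top of dm-thin, for example, the out-of-space policy can be
switched roughly like this (a sketch; vg_system/thinpool is a placeholder
name):

# fail new allocations immediately once the pool is full
lvchange vg_system/thinpool --errorwhenfull=y
# or keep the default and queue I/O, waiting for the pool to be extended
lvchange vg_system/thinpool --errorwhenfull=n
# check the current policy
lvs -o +whenfull vg_system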

> Playing with dmsetup table it is possible to deny *all* I/O to the volume,
> but it require careful monitor and a sensible script to do the right thing
> (tm).
> 
> From my testing, it happear that XFS will shutdown itself when a metadata
> write fails, however, failed *data* update will continue without triggering
> any filesystem response. Is this by design? Can I do something about that?
> 

What exactly do you mean by a failed data update? As long as you don't need to
allocate any new block, either for data or metadata, you should be able to
rewrite data normally even if the device is full.

How dm-thin and XFS work together really depends on how you configure both of
them. XFS has a configuration system you can use to set how it behaves in some
error situations according to your preferences: it can be set to retry the
writes for a specific amount of time, or to fail immediately and shut down the
filesystem.


Worth mentioning, though: there is a bug that might be triggered when you
overcommit the space (create a virtual device larger than the real physical
space). It's not too easy to trigger, and the fix is already a work in
progress.



Cheers.

-- 
Carlos

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Shutdown filesystem when a thin pool become full
  2017-05-22 23:09 ` Carlos Maiolino
@ 2017-05-23 10:56   ` Gionatan Danti
  2017-05-23 11:01     ` Gionatan Danti
  2017-05-23 12:11     ` Carlos Maiolino
  0 siblings, 2 replies; 30+ messages in thread
From: Gionatan Danti @ 2017-05-23 10:56 UTC (permalink / raw)
  To: linux-xfs; +Cc: g.danti

On 23/05/2017 01:09, Carlos Maiolino wrote:
> 
> If you want that XFS shut down when it hits an error, you can configure it
> through sysfs in:
> 
> /sys/fs/xfs/<dev>/error/
> 
> directory.
> 
> You can find detailed info in Documentation/filesystems/xfs.txt (from linux
> source).

Very useful documentation, thank you.

> 
> Although, I should say this is a bad idea. A full thin pool will return an
> ENOSPC error to the filesystem, and, the filesystem should act accordingly.
> 
> Would you want the filesystem to shut down because it hit an ENOSPC, or report
> to the user and let him/her to take the appropriate action, like freeing space
> or increasing the dm-thin pool size?

Does a full thin pool *really* report ENOSPC? In all my tests, I
simply see "Buffer I/O error on dev" in the dmesg output (see below). How
can I see whether an ENOSPC was returned?
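
Is tracing the writing process the right way to check it? Something like
(a rough sketch on my test paths):

strace -f -e trace=write dd if=/dev/zero of=/mnt/storage/disk.img \
    bs=1M count=2048 oflag=sync 2>&1 | grep -E 'ENOSPC|EIO'
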
> The behavior really depends on several configurations, including how dm-thin
> will deal with ENOSPC situations, dm-thin can be configured to queue new
> incoming IOs, fail the IO, etc, but, even in a full disk situation, you can
> still rewrite data, so this behavior doesn't change much from what would happen
> if you fill up a regular block device, XFS will try to allocate a new block, the
> block device returns an ENOSPC and XFS acts accordingly.

I am testing with "errorwhenfull=y" set on the thin pool.

> What exactly do you mean by failed data update? As long as you don't need to
> allocate any new block either for data of metadata, you should be able to rewrite
> data normally if the device is full.

I refer to updates for which the metadata write completes successfully
(i.e., it lands in already-allocated space), but the data writeout does
*not* (i.e., the new data requires a new allocation, which fails).

> How dm-thin and XFS will work together, really depends on how you configure both
> of them. XFS has a configuration system that you can use to set how it will
> behave in some error situations according to your preferences, it can be set to
> retry the writes for a specific amount of time, or simply fail immediately and
> shut down the filesystem.

Ok, I need to do some testing ;)

> Worth to mention though, there is a bug that might be triggered when you
> overcommit the space (creates a virtual device larger than the real physical
> space), it's not too easy to trigger it though, and the fix is already WIP.

Can you link to the bug?

Thanks a lot.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Shutdown filesystem when a thin pool become full
  2017-05-23 10:56   ` Gionatan Danti
@ 2017-05-23 11:01     ` Gionatan Danti
  2017-05-23 12:27       ` Carlos Maiolino
  2017-05-23 12:11     ` Carlos Maiolino
  1 sibling, 1 reply; 30+ messages in thread
From: Gionatan Danti @ 2017-05-23 11:01 UTC (permalink / raw)
  To: linux-xfs; +Cc: g.danti

On 23/05/2017 12:56, Gionatan Danti wrote:
> Does a full thin pool *really* report ENOSPC? In all my tests, I
> simply see "Buffer I/O error on dev" in the dmesg output (see below).

Ok, I forgot to attach the debug logs :p

This is my initial LVM state:
[root@blackhole tmp]# lvs
   LV       VG        Attr       LSize  Pool     Origin Data%  Meta% Move Log Cpy%Sync Convert
   root     vg_system -wi-ao---- 50.00g
   swap     vg_system -wi-ao----  7.62g
   thinpool vg_system twi-aot---  1.00g                 1.51   0.98
   thinvol  vg_system Vwi-aot---  2.00g thinpool        0.76
[root@blackhole tmp]# lvchange vg_system/thinpool --errorwhenfull=y
   Logical volume vg_system/thinpool changed.

I created an XFS filesystem on /dev/vg_system/thinvol and mounted it
under /mnt/storage. Then I filled it:

[root@blackhole tmp]# dd if=/dev/zero of=/mnt/storage/disk.img bs=1M count=2048 oflag=sync
dd: error writing ‘/mnt/storage/disk.img’: Input/output error
1009+0 records in
1008+0 records out
1056964608 bytes (1.1 GB) copied, 59.7361 s, 17.7 MB/s

[root@blackhole tmp]# df -h
Filesystem                     Size  Used Avail Use% Mounted on
/dev/mapper/vg_system-root      50G   47G  3.8G  93% /
devtmpfs                       3.8G     0  3.8G   0% /dev
tmpfs                          3.8G   84K  3.8G   1% /dev/shm
tmpfs                          3.8G  9.1M  3.8G   1% /run
tmpfs                          3.8G     0  3.8G   0% /sys/fs/cgroup
tmpfs                          3.8G   16K  3.8G   1% /tmp
/dev/sda1                     1014M  314M  701M  31% /boot
tmpfs                          774M   16K  774M   1% /run/user/42
tmpfs                          774M     0  774M   0% /run/user/0
/dev/mapper/vg_system-thinvol  2.0G  1.1G  993M  52% /mnt/storage

On dmesg I can see the following:

[ 3005.331830] XFS (dm-6): Mounting V5 Filesystem
[ 3005.443769] XFS (dm-6): Ending clean mount
[ 5891.595901] device-mapper: thin: Data device (dm-3) discard unsupported: Disabling discard passdown.
[ 5970.314062] device-mapper: thin: 253:4: reached low water mark for data device: sending event.
[ 5970.358234] device-mapper: thin: 253:4: switching pool to out-of-data-space (error IO) mode
[ 5970.358528] Buffer I/O error on dev dm-6, logical block 389248, lost async page write
[ 5970.358546] Buffer I/O error on dev dm-6, logical block 389249, lost async page write
[ 5970.358552] Buffer I/O error on dev dm-6, logical block 389250, lost async page write
[ 5970.358557] Buffer I/O error on dev dm-6, logical block 389251, lost async page write
[ 5970.358562] Buffer I/O error on dev dm-6, logical block 389252, lost async page write
[ 5970.358567] Buffer I/O error on dev dm-6, logical block 389253, lost async page write
[ 5970.358573] Buffer I/O error on dev dm-6, logical block 389254, lost async page write
[ 5970.358577] Buffer I/O error on dev dm-6, logical block 389255, lost async page write
[ 5970.358583] Buffer I/O error on dev dm-6, logical block 389256, lost async page write
[ 5970.358594] Buffer I/O error on dev dm-6, logical block 389257, lost async page write

This appears as a "normal" I/O error, right? Or am I missing something?

Thanks.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Shutdown filesystem when a thin pool become full
  2017-05-23 10:56   ` Gionatan Danti
  2017-05-23 11:01     ` Gionatan Danti
@ 2017-05-23 12:11     ` Carlos Maiolino
  1 sibling, 0 replies; 30+ messages in thread
From: Carlos Maiolino @ 2017-05-23 12:11 UTC (permalink / raw)
  To: Gionatan Danti; +Cc: linux-xfs

Hi,

> > Although, I should say this is a bad idea. A full thin pool will return an
> > ENOSPC error to the filesystem, and, the filesystem should act accordingly.
> > 
> > Would you want the filesystem to shut down because it hit an ENOSPC, or report
> > to the user and let him/her to take the appropriate action, like freeing space
> > or increasing the dm-thin pool size?
> 
> Does a full thin pool *really* report a ENOSPC? On all my tests, I simply
> see "buffer i/o error on dev" on dmesg output (see below). How can I see if
> a ENOSPC was returned?

Yes, it does, unless you are using an old version of the dm-thin module:

commit c3667cc6190469d2c7196c2d4dc75fcb33a0814f
Author: Mike Snitzer <snitzer@redhat.com>
Date:   Thu Mar 10 11:31:35 2016 -0500

    dm thin: consistently return -ENOSPC if pool has run out of data space


It used to return EIO, then we decided it should return ENOSPC instead of EIO.
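
If you want to double-check what your running kernel exposes, something like
this should be enough (a sketch; the exact version strings will differ):

# kernel version and the thin-pool target version it provides
uname -r
dmsetup targets | grep thin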

See my comments in your next e-mail

> > The behavior really depends on several configurations, including how dm-thin
> > will deal with ENOSPC situations, dm-thin can be configured to queue new
> > incoming IOs, fail the IO, etc, but, even in a full disk situation, you can
> > still rewrite data, so this behavior doesn't change much from what would happen
> > if you fill up a regular block device, XFS will try to allocate a new block, the
> > block device returns an ENOSPC and XFS acts accordingly.
> 
> I am testing with "errorwhenfull=y" when thin pool fills.
> 

Ok, so it should return -ENOSPC


> > What exactly do you mean by failed data update? As long as you don't need to
> > allocate any new block either for data of metadata, you should be able to rewrite
> > data normally if the device is full.
> 
> I refer to update for which the metadata write completes successfully (ie:
> they are writen to already-allocated space), but the data writeout does
> *not* (ie: new data require a new allocation, which it fails).
> 

Yup, that will make XFS complain about metadata errors, once it can't write
back the items queued in the AIL.

> Can you link to the bug?
> 

https://www.spinics.net/lists/linux-xfs/msg06986.html


> Thanks a lot.
> 
> -- 
> Danti Gionatan
> Supporto Tecnico
> Assyoma S.r.l. - www.assyoma.it
> email: g.danti@assyoma.it - info@assyoma.it
> GPG public key ID: FF5F32A8
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Carlos

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Shutdown filesystem when a thin pool become full
  2017-05-23 11:01     ` Gionatan Danti
@ 2017-05-23 12:27       ` Carlos Maiolino
  2017-05-23 20:05         ` Gionatan Danti
  0 siblings, 1 reply; 30+ messages in thread
From: Carlos Maiolino @ 2017-05-23 12:27 UTC (permalink / raw)
  To: Gionatan Danti; +Cc: linux-xfs

On Tue, May 23, 2017 at 01:01:06PM +0200, Gionatan Danti wrote:
> On 23/05/2017 12:56, Gionatan Danti wrote:
> > Does a full thin pool *really* report ENOSPC? In all my tests, I
> > simply see "Buffer I/O error on dev" in the dmesg output (see below).
> 
> Ok, I forget to attach the debug logs :p
> 
> This is my initial LVM state:
> [root@blackhole tmp]# lvs
>   LV       VG        Attr       LSize  Pool     Origin Data%  Meta% Move Log
> Cpy%Sync Convert
>   root     vg_system -wi-ao---- 50.00g
>   swap     vg_system -wi-ao----  7.62g
>   thinpool vg_system twi-aot---  1.00g                 1.51   0.98
>   thinvol  vg_system Vwi-aot---  2.00g thinpool        0.76
> [root@blackhole tmp]# lvchange vg_system/thinpool --errorwhenfull=y
>   Logical volume vg_system/thinpool changed.
> 
> I create an XFS filesystem on /dev/vg_system/thinvol and mounted it under
> /mnt/storage. Then I filled it:
> 
> [root@blackhole tmp]# dd if=/dev/zero of=/mnt/storage/disk.img bs=1M
> count=2048 oflag=sync
> dd: error writing ‘/mnt/storage/disk.img’: Input/output error

Aha, you are using the sync flag; that's why you are getting I/O errors instead
of ENOSPC. I don't remember off the top of my head why exactly, it's been a
while since I started to work on this XFS and dm-thin integration, but IIRC the
problem is that XFS reserves the data required and doesn't expect to get an
ENOSPC once the device "has space", so when the sync occurs, kaboom. I should
take a look at it again.


> [ 3005.331830] XFS (dm-6): Mounting V5 Filesystem
> [ 3005.443769] XFS (dm-6): Ending clean mount
> [ 5891.595901] device-mapper: thin: Data device (dm-3) discard unsupported:
> Disabling discard passdown.
> [ 5970.314062] device-mapper: thin: 253:4: reached low water mark for data
> device: sending event.
> [ 5970.358234] device-mapper: thin: 253:4: switching pool to
> out-of-data-space (error IO) mode
> [ 5970.358528] Buffer I/O error on dev dm-6, logical block 389248, lost
> async page write
> [ 5970.358546] Buffer I/O error on dev dm-6, logical block 389249, lost
> async page write
> async page write
> [ 5970.358577] Buffer I/O error on dev dm-6, logical block 389255, lost
> async page write
> [ 5970.358583] Buffer I/O error on dev dm-6, logical block 389256, lost
> async page write
> [ 5970.358594] Buffer I/O error on dev dm-6, logical block 389257, lost
> async page write
> 
> This appears as a "normal" I/O error, right? Or I am missing something?

Yeah, I don't remember exactly the details of this part of the problem, but
yes, it looks like you are also hitting the problem I've been working on, which
basically makes XFS spin indefinitely in xfsaild, trying to retry the buffers
that failed, but it can't because they are flush locked. It basically has all
the data committed to the AIL but can't flush it to its respective place due to
lack of space, so you will keep seeing this message until it either permanently
fails the buffers, you expand the dm-thin pool, or you unmount the filesystem.

Currently, in all three cases XFS can hang, unless you have set the
'max_retries' configuration to '0' before reproducing the problem.

Which kernel version are you using?

If you have the possibility, you can test my patches to fix this problem:

https://www.spinics.net/lists/linux-xfs/msg06986.html

It will certainly have a V3, but the patches shouldn't explode your system :)
And more testing is always welcome.

With the patchset you will still get the errors, since the device will not have
the space XFS expects it to have, but the errors will simply go away as soon as
you extend the pool device to allow more space, or XFS will shut down if you try
to unmount it, instead of hanging the filesystem.

Cheers.

-- 
Carlos

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Shutdown filesystem when a thin pool become full
  2017-05-22 14:25 Shutdown filesystem when a thin pool become full Gionatan Danti
  2017-05-22 23:09 ` Carlos Maiolino
@ 2017-05-23 13:24 ` Eric Sandeen
  2017-05-23 20:23   ` Gionatan Danti
  1 sibling, 1 reply; 30+ messages in thread
From: Eric Sandeen @ 2017-05-23 13:24 UTC (permalink / raw)
  To: Gionatan Danti, linux-xfs

On 5/22/17 9:25 AM, Gionatan Danti wrote:
> Hi all, I have a question regarding how to automatically shutdown (or
> put in read-only mode) an XFS filesystem which reside on a full thin
> pool.
> 
> Background info: when a thin pool become full, it will deny any new
> unallocated writes, but will continue to allow writes to
> already-allocated storage chunks. So, it behave as a badly damaged
> disk, where some places can be written, but others throw an error.
> 
> Playing with dmsetup table it is possible to deny *all* I/O to the
> volume, but it require careful monitor and a sensible script to do
> the right thing (tm).
> 
> From my testing, it happear that XFS will shutdown itself when a
> metadata write fails, however, failed *data* update will continue
> without triggering any filesystem response. Is this by design? Can I
> do something about that?

It is pretty much by design - keeping in mind that XFS predates the
general notion of thin storage by a decade or two.  ;)

In the big picture, xfs's shutdown behavior is there to preserve the
consistency of the filesystem - if metadata writes are failing,
continuing would likely lead to corruption.

If /data/ writes are failing, that's for the application to deal with.

Shutting down the filesystem for an unwritable data block is probably not
what people want to see, in general.

> As a side note, I perfectly understand that async writes should *not*
> put the filesystem in suspended state; however, I found this behavior
> for synchronized writes also.

I would expect that synchronous writes and/or things like fsync calls
would return errors if the underlying data IO fails.
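
For example, something along these lines should surface the error to the
caller (a sketch with xfs_io; the path is just an example):

# synchronous write: open with O_SYNC so each pwrite must reach stable storage or fail
xfs_io -f -s -c 'pwrite 0 2g' /mnt/storage/disk.img
# buffered write followed by an explicit fsync: the error shows up at the fsync
xfs_io -f -c 'pwrite 0 2g' -c fsync /mnt/storage/disk.img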

As for your question about ENOSPC vs EIO, even though the storage may
return ENOSPC, I believe that gets turned into an EIO when it passes
through XFS to userspace.

Historically, ENOSPC meant that the filesystem itself has run out of
blocks, and EIO meant that the underlying device could not complete the
IO - so it's a matter of semantics, and I'm not sure anyone has
really settled on a consistent story with thin devices.

-Eric

> Thanks.
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Shutdown filesystem when a thin pool become full
  2017-05-23 12:27       ` Carlos Maiolino
@ 2017-05-23 20:05         ` Gionatan Danti
  2017-05-23 21:33           ` Eric Sandeen
  2017-06-13  9:09           ` Gionatan Danti
  0 siblings, 2 replies; 30+ messages in thread
From: Gionatan Danti @ 2017-05-23 20:05 UTC (permalink / raw)
  To: linux-xfs; +Cc: g.danti

On 23-05-2017 14:27 Carlos Maiolino wrote:
> 
> Aha, you are using sync flag, that's why you are getting IO errors 
> instead of
> ENOSPC, I don't remember from the top of my mind why exactly, it's been 
> a while
> since I started to work on this XFS and dm-thin integration, but IIRC, 
> the
> problem is that XFS reserves the data required, and don't expect to get 
> an
> ENOSPC once the device "have space", and when the sync occurs, kaboom. 
> I should
> take a look again on it.

Ok, I tried with a more typical non-sync write and it seems to report 
ENOSPC:

[root@blackhole ~]# dd if=/dev/zero of=/mnt/storage/disk.img bs=1M count=2048
dd: error writing ‘/mnt/storage/disk.img’: No space left on device
2002+0 records in
2001+0 records out
2098917376 bytes (2.1 GB) copied, 7.88216 s, 266 MB/s

With /sys/fs/xfs/dm-6/error/metadata/ENOSPC/max_retries = -1 (default), 
I have the following dmesg output:

[root@blackhole ~]# dmesg
[23152.667198] XFS (dm-6): Mounting V5 Filesystem
[23152.762711] XFS (dm-6): Ending clean mount
[23192.704672] device-mapper: thin: 253:4: reached low water mark for data device: sending event.
[23192.988356] device-mapper: thin: 253:4: switching pool to out-of-data-space (error IO) mode
[23193.046288] Buffer I/O error on dev dm-6, logical block 385299, lost async page write
[23193.046299] Buffer I/O error on dev dm-6, logical block 385300, lost async page write
[23193.046302] Buffer I/O error on dev dm-6, logical block 385301, lost async page write
[23193.046304] Buffer I/O error on dev dm-6, logical block 385302, lost async page write
[23193.046307] Buffer I/O error on dev dm-6, logical block 385303, lost async page write
[23193.046309] Buffer I/O error on dev dm-6, logical block 385304, lost async page write
[23193.046312] Buffer I/O error on dev dm-6, logical block 385305, lost async page write
[23193.046314] Buffer I/O error on dev dm-6, logical block 385306, lost async page write
[23193.046316] Buffer I/O error on dev dm-6, logical block 385307, lost async page write
[23193.046319] Buffer I/O error on dev dm-6, logical block 385308, lost async page write

With /sys/fs/xfs/dm-6/error/metadata/ENOSPC/max_retries = 0, dmesg 
output is slightly different:

[root@blackhole default]# dmesg
[23557.594502] device-mapper: thin: 253:4: switching pool to out-of-data-space (error IO) mode
[23557.649772] buffer_io_error: 257430 callbacks suppressed
[23557.649784] Buffer I/O error on dev dm-6, logical block 381193, lost async page write
[23557.649805] Buffer I/O error on dev dm-6, logical block 381194, lost async page write
[23557.649811] Buffer I/O error on dev dm-6, logical block 381195, lost async page write
[23557.649818] Buffer I/O error on dev dm-6, logical block 381196, lost async page write
[23557.649862] Buffer I/O error on dev dm-6, logical block 381197, lost async page write
[23557.649871] Buffer I/O error on dev dm-6, logical block 381198, lost async page write
[23557.649880] Buffer I/O error on dev dm-6, logical block 381199, lost async page write
[23557.649888] Buffer I/O error on dev dm-6, logical block 381200, lost async page write
[23557.649897] Buffer I/O error on dev dm-6, logical block 381201, lost async page write
[23557.649903] Buffer I/O error on dev dm-6, logical block 381202, lost async page write

Notice the suppressed buffer_io_error entries: are they related to the
bug you linked before?
Anyway, in *no* case did I get a filesystem shutdown on these errors.

Trying to be pragmatic, my main concern is to avoid extended filesystem
and/or data corruption in case a thin pool inadvertently becomes full.
For example, with ext4 I can mount the filesystem with
"errors=remount-ro,data=journal" and *any* filesystem error (due to the
thin pool or other problems) will put the filesystem into a read-only
state, avoiding significant damage.
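
(For reference, the ext4 setup I am comparing against is simply something
like the following; the device path is just an example:)

mount -o errors=remount-ro,data=journal /dev/vg_system/somevol /mnt/ext4test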

Can I replicate this behavior with XFS, and if so, how? From my
understanding, XFS does not have a "remount read-only" mode. Moreover, as
long as its metadata can be safely stored on disk (i.e., it hits already
allocated space), it seems to happily continue running, disregarding any
data writeout problem/error. As a note, ext4 without "data=journal"
behaves quite similarly, with a read-only remount happening on metadata
errors only.

Surely I am missing something... right?
Thanks.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Shutdown filesystem when a thin pool become full
  2017-05-23 13:24 ` Eric Sandeen
@ 2017-05-23 20:23   ` Gionatan Danti
  2017-05-24  7:38     ` Carlos Maiolino
  0 siblings, 1 reply; 30+ messages in thread
From: Gionatan Danti @ 2017-05-23 20:23 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs, g.danti

On 23-05-2017 15:24 Eric Sandeen wrote:
> It is pretty much by design - keeping in mind that XFS predates the
> general notion of thin storage by a decade or two.  ;)
> 
> In the big picture, xfs's shutdown behavior is there to preserve the
> consistency of the filesystem - if metadata writes are failing,
> continuing would likely lead to corruption.
> 
> If /data/ writes are failing, that's for the application to deal with.
> 
> Shutting down the filesystem for an unwritable data block is probably 
> not
> what people want to see, in general.

True.

But preparing for the worst (i.e., a thin pool which inadvertently
filled up), I think that a read-only/shutdown mode would be valuable for
the (large) class of applications which don't really know how to deal
with I/O errors. Moreover, applications not issuing fsyncs will *never*
know that an I/O error happened on data writeout. On the other hand,
sure, applications which do not use fsyncs probably do not really care
about their data...

Eric: in general terms, how do you feel about XFS on thinly-provisioned
volumes? Is it a production-level setup, or does it have too many rough
edges?

> As for your question about ENOSPC vs EIO, even though the storage may
> return ENOSPC, I believe that gets turned into an EIO when it passes
> through XFS to userspace.
> 
> Historically, ENOSPC meant that the filesystem itself has run out of
> blocks, and EIO meant that the underlying device could not complete the
> IO - so it's a matter of semantics, and I'm not sure anyone has
> really settled on a consistent story with thin devices.
> 

Yeah, I agree ;)

Thanks.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Shutdown filesystem when a thin pool become full
  2017-05-23 20:05         ` Gionatan Danti
@ 2017-05-23 21:33           ` Eric Sandeen
  2017-05-24 17:52             ` Gionatan Danti
  2017-06-13  9:09           ` Gionatan Danti
  1 sibling, 1 reply; 30+ messages in thread
From: Eric Sandeen @ 2017-05-23 21:33 UTC (permalink / raw)
  To: Gionatan Danti, linux-xfs

On 5/23/17 3:05 PM, Gionatan Danti wrote:
> Il 23-05-2017 14:27 Carlos Maiolino ha scritto:
>>
>> Aha, you are using sync flag, that's why you are getting IO errors instead of
>> ENOSPC, I don't remember from the top of my mind why exactly, it's been a while
>> since I started to work on this XFS and dm-thin integration, but IIRC, the
>> problem is that XFS reserves the data required, and don't expect to get an
>> ENOSPC once the device "have space", and when the sync occurs, kaboom. I should
>> take a look again on it.
> 
> Ok, I tried with a more typical non-sync write and it seems to report ENOSPC:
> 
> [root@blackhole ~]# dd if=/dev/zero of=/mnt/storage/disk.img bs=1M count=2048
> dd: error writing ‘/mnt/storage/disk.img’: No space left on device
> 2002+0 records in
> 2001+0 records out
> 2098917376 bytes (2.1 GB) copied, 7.88216 s, 266 MB/s
> 
> With /sys/fs/xfs/dm-6/error/metadata/ENOSPC/max_retries = -1 (default), I have the following dmesg output:
> 
> [root@blackhole ~]# dmesg
> [23152.667198] XFS (dm-6): Mounting V5 Filesystem
> [23152.762711] XFS (dm-6): Ending clean mount
> [23192.704672] device-mapper: thin: 253:4: reached low water mark for data device: sending event.
> [23192.988356] device-mapper: thin: 253:4: switching pool to out-of-data-space (error IO) mode
> [23193.046288] Buffer I/O error on dev dm-6, logical block 385299, lost async page write
> [23193.046299] Buffer I/O error on dev dm-6, logical block 385300, lost async page write
> [23193.046302] Buffer I/O error on dev dm-6, logical block 385301, lost async page write
> [23193.046304] Buffer I/O error on dev dm-6, logical block 385302, lost async page write
> [23193.046307] Buffer I/O error on dev dm-6, logical block 385303, lost async page write
> [23193.046309] Buffer I/O error on dev dm-6, logical block 385304, lost async page write
> [23193.046312] Buffer I/O error on dev dm-6, logical block 385305, lost async page write
> [23193.046314] Buffer I/O error on dev dm-6, logical block 385306, lost async page write
> [23193.046316] Buffer I/O error on dev dm-6, logical block 385307, lost async page write
> [23193.046319] Buffer I/O error on dev dm-6, logical block 385308, lost async page write
> 
> With /sys/fs/xfs/dm-6/error/metadata/ENOSPC/max_retries = 0, dmesg output is slightly different:


Try setting EIO rather than ENOSPC.
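
i.e. something along these lines (a sketch using the dm-6 device from your
logs):

# fail EIO-class metadata errors immediately instead of retrying
echo 0 > /sys/fs/xfs/dm-6/error/metadata/EIO/max_retries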

> 
> [root@blackhole default]# dmesg
> [23557.594502] device-mapper: thin: 253:4: switching pool to out-of-data-space (error IO) mode
> [23557.649772] buffer_io_error: 257430 callbacks suppressed
> [23557.649784] Buffer I/O error on dev dm-6, logical block 381193, lost async page write
> [23557.649805] Buffer I/O error on dev dm-6, logical block 381194, lost async page write
> [23557.649811] Buffer I/O error on dev dm-6, logical block 381195, lost async page write
> [23557.649818] Buffer I/O error on dev dm-6, logical block 381196, lost async page write
> [23557.649862] Buffer I/O error on dev dm-6, logical block 381197, lost async page write
> [23557.649871] Buffer I/O error on dev dm-6, logical block 381198, lost async page write
> [23557.649880] Buffer I/O error on dev dm-6, logical block 381199, lost async page write
> [23557.649888] Buffer I/O error on dev dm-6, logical block 381200, lost async page write
> [23557.649897] Buffer I/O error on dev dm-6, logical block 381201, lost async page write
> [23557.649903] Buffer I/O error on dev dm-6, logical block 381202, lost async page write
> 
> Notice the suppressed buffer_io_error entries: are they related to the bug you linked before?
> Anyway, in *no* cases I had a filesystem shutdown on these errors.

Yep, that's "just" data IO.

> Trying to be pragmatic, my main concern is to avoid extended filesystem and/or data corruption in the case a thin pool become inadvertently full. For example, with ext4 I can mount the filesystem with "errors=remount-ro,data=journaled" and *any* filesystem error (due to thinpool or other problems) will put the filesystem in a read-only state, avoiding significan damages.
> 
> If, and how, I can replicate this behavior with XFS? From my understanding, XFS does not have a "remount read-only" mode. Moreover, until its metadata can be safely stored on disk (ie: they hit already allocated space), it seems to happily continue to run, disregarding data writeout problem/error. As a note, ext4 without "data=jornaled" bahave quite similarly, whit a read-only remount happening on metadata errors only.
> 
> Surely I am missing something... right?

Even if you run out of space, xfs should not become corrupted.
You may need to add space to successfully replay the log afterward,
but if you do, it should replay and everything should be consistent
(which is not the same as "no data was lost") - is that not the case?

As for xfs happily running with metadata errors, the tunable error behavior
should make it stop more quickly, at least if you set it for EIO.
It usually will eventually stop when it hits what it considers a critical
metadata IO error...

-Eric

> Thanks.
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Shutdown filesystem when a thin pool become full
  2017-05-23 20:23   ` Gionatan Danti
@ 2017-05-24  7:38     ` Carlos Maiolino
  2017-05-24 17:50       ` Gionatan Danti
  0 siblings, 1 reply; 30+ messages in thread
From: Carlos Maiolino @ 2017-05-24  7:38 UTC (permalink / raw)
  To: Gionatan Danti; +Cc: Eric Sandeen, linux-xfs

On Tue, May 23, 2017 at 10:23:55PM +0200, Gionatan Danti wrote:
> Il 23-05-2017 15:24 Eric Sandeen ha scritto:
> > It is pretty much by design - keeping in mind that XFS predates the
> > general notion of thin storage by a decade or two.  ;)
> > 
> > In the big picture, xfs's shutdown behavior is there to preserve the
> > consistency of the filesystem - if metadata writes are failing,
> > continuing would likely lead to corruption.
> > 
> > If /data/ writes are failing, that's for the application to deal with.
> > 
> > Shutting down the filesystem for an unwritable data block is probably
> > not
> > what people want to see, in general.
> 
> True.
> 
> But preparing for the worse (ie: a thin pool which inadvertently filled), I
> think that a read-only/shutdown mode would be valuable for the (large) class
> of application which don't really know how to deal with I/O errors.
> Moreover, applications not issuing fsyncs will *never* know that an I/O
> error happened on data writeout. On the other hand, sure, applications which
> do non use fsyncs probably do not really care about their data...
> 

If the application doesn't deal with I/O errors and doesn't ensure its data is
actually written, what difference will a read-only FS make? :) The application
will send a write request, the filesystem will deny it (because it is
read-only), and the application will not care :)

> Eric: in general terms, how do you feel about XFS on thinly-provisioned
> volumes? Is it a production-level setup or they have too many rough edges?
> 

Any application that doesn't ensure its data is stored, with fsync, and any
application that doesn't check for I/O errors: well, that is the application's
fault, and the filesystem can't do much other than report the error to
userspace.

Thin provisioning is not a new thing, and XFS has been designed to work on such
volumes from the beginning, so yes, it is reliable when used with thin
provisioned volumes. But of course, it requires the thin provisioning system to
be well administered, as any other filesystem would require.
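
On the lvm2 side, "well administered" usually means monitoring plus
autoextension, along the lines of the following lvm.conf settings (a sketch;
the thresholds are just examples):

# /etc/lvm/lvm.conf, activation section: dmeventd grows the pool automatically
# when it crosses 80% usage, extending it by 20% each time
thin_pool_autoextend_threshold = 80
thin_pool_autoextend_percent = 20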

From a historical point of view, XFS has always retried failed I/O submissions
forever, since most EIOs are considered temporary.

Well, things change, and we implemented a way to configure how long XFS should
retry such I/Os; that's where the configuration system comes in.

Regarding your not seeing XFS shut down during your tests: XFS will shut down
if there are problems writing metadata to disk. Ensuring data is properly
written is the application's responsibility; the filesystem should guarantee
its own consistency.


Cheers

> > As for your question about ENOSPC vs EIO, even though the storage may
> > return ENOSPC, I believe that gets turned into an EIO when it passes
> > through XFS to userspace.
> > 
> > Historically, ENOSPC meant that the filesystem itself has run out of
> > blocks, and EIO meant that the underlying device could not complete the
> > IO - so it's a matter of semantics, and I'm not sure anyone has
> > really settled on a consistent story with thin devices.
> > 
> 
> Yeah, I agree ;)
> 
> Thanks.
> 
> -- 
> Danti Gionatan
> Supporto Tecnico
> Assyoma S.r.l. - www.assyoma.it
> email: g.danti@assyoma.it - info@assyoma.it
> GPG public key ID: FF5F32A8
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Carlos

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Shutdown filesystem when a thin pool become full
  2017-05-24  7:38     ` Carlos Maiolino
@ 2017-05-24 17:50       ` Gionatan Danti
  0 siblings, 0 replies; 30+ messages in thread
From: Gionatan Danti @ 2017-05-24 17:50 UTC (permalink / raw)
  To: Eric Sandeen, linux-xfs; +Cc: g.danti



On 24/05/2017 09:38, Carlos Maiolino wrote:
> 
> If the application don't deal with the I/O errors, and ensure its data is
> already written, what difference a RO fs will do? :) the application will send a
> write request, the filesystem will deny it (because it is in RO), and the
> application will not care :)

Maybe I am wrong, but a read-only filesystem guarantees that *no other
data modifications* can be done, effectively freezing the volume.

With a full thin pool, XFS will continue to serve writes to already
allocated chunks, but will reject writes to unallocated ones. I think
this can lead to some inconsistencies, for example:
- this part of a file was updated, that one failed, but nobody noticed;
- a file was copied, but its content was lost because the data writeout
failed and no fsync was issued (and file managers often do *exactly* this);
- of two files, this one was updated, the other failed;
- writing to a file, its size is updated (not only the apparent size, but
the real/allocated one as well) but the data writeout fails. In this case,
reading the file over the unallocated space returns EIO, but you need to
*read all the data* up to the EIO to realize that the file has a serious
problem.

In all these cases, I feel that a "shut down the filesystem at the first
data writeout problem" knob could save the day. Even better would be a
"put the filesystem in read-only mode" option.

True, a well-behaved application should issue fsync() and check for I/O
errors, but many applications don't do that. Hence I was asking if XFS
can be suspended/turned off.

Maybe I'm raising naive problems; after all, a full filesystem will
behave somewhat similarly in at least some cases. However, from the
linux-lvm mailing list I understand that a full thin pool is *not*
comparable to a full filesystem, right?

Thanks.


-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Shutdown filesystem when a thin pool become full
  2017-05-23 21:33           ` Eric Sandeen
@ 2017-05-24 17:52             ` Gionatan Danti
  0 siblings, 0 replies; 30+ messages in thread
From: Gionatan Danti @ 2017-05-24 17:52 UTC (permalink / raw)
  To: Eric Sandeen, linux-xfs; +Cc: g.danti

On 23/05/2017 23:33, Eric Sandeen wrote:
> Try setting EIO rather than ENOSPC.
> 
> Even if you run out of space, xfs should not become corrupted.
> You may need to add space to successfully replay the log afterward,
> but if you do, it should replay and everyhthing should be consistent
> (which is not the same as "no data was lost") - is that not the case?
> 
> As for xfs happily running with metadata errors, the tunable error behavior
> should make it stop more quickly, at least if you set it for EIO.
> It usually will eventually stop when it hits what it considers a critical
> metadata IO error...
> 
> -Eric
>

Mmm ok, I need to play better with these tunables ;)

Thanks.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Shutdown filesystem when a thin pool become full
  2017-05-23 20:05         ` Gionatan Danti
  2017-05-23 21:33           ` Eric Sandeen
@ 2017-06-13  9:09           ` Gionatan Danti
  2017-06-15 11:51             ` Gionatan Danti
  1 sibling, 1 reply; 30+ messages in thread
From: Gionatan Danti @ 2017-06-13  9:09 UTC (permalink / raw)
  To: linux-xfs; +Cc: g.danti

Sorry for the bump, but further tests show unexpected behavior and I 
would really like to understand what I am missing.

Current setup:	CentOS 7.3 x86-64
Kernel version:	3.10.0-514.21.1.el7.x86_64

LVM2 version (from lvm version):
   LVM version:     2.02.166(2)-RHEL7 (2016-11-16)
   Library version: 1.02.135-RHEL7 (2016-11-16)
   Driver version:  4.34.0

On 23/05/2017 22:05, Gionatan Danti wrote:
> 
> Ok, I tried with a more typical non-sync write and it seems to report 
> ENOSPC:
> 
> [root@blackhole ~]# dd if=/dev/zero of=/mnt/storage/disk.img bs=1M 
> count=2048
> dd: error writing ‘/mnt/storage/disk.img’: No space left on device
> 2002+0 records in
> 2001+0 records out
> 2098917376 bytes (2.1 GB) copied, 7.88216 s, 266 MB/s
> 

Contrary to what I reported above, the thin pool seems to *not* report
ENOSPC when full. This means that any new data submitted to the
filesystem is reported as "written" even though it never was.

I fully understand that applications which care about their data should
regularly use fsync(). However, *many* applications don't do that. One
notable example is Windows Explorer: when accessing a full thinvol via a
Samba share, it will blatantly continue to "write" to the share without
notifying the user in any way that something is wrong. This is a recipe for
disaster, as the user continues to upload files which basically get lost...

Yes, the lack of fsync() use really is an application-level problem.
However, sending files to (basically) /dev/null when the pool is full
does not seem a smart thing.

I am surely doing something wrong, but I cannot find what. Have a look
below at how to reproduce...

# thinpool has errorwhenfull=y set
# thinpool is 256M, thin volume is 1G
[root@blackhole mnt]# lvs -o +whenfull
   LV       VG        Attr       LSize   Pool     Origin Data%  Meta% Move Log Cpy%Sync Convert WhenFull
   fatvol   vg_kvm    -wi-ao---- 256.00m
   storage  vg_kvm    -wi-a----- 300.00g
   thinpool vg_kvm    twi-aot--- 256.00m                 1.46   0.98                         error
   thinvol  vg_kvm    Vwi-aot---   1.00g thinpool        0.37
   root     vg_system -wi-ao----  50.00g
   swap     vg_system -wi-ao----   7.62g

# current device mappings
[root@blackhole mnt]# ls -al /dev/mapper/ | grep thin
lrwxrwxrwx.  1 root root       7 13 giu 09.37 vg_kvm-thinpool -> ../dm-7
lrwxrwxrwx.  1 root root       7 13 giu 09.39 vg_kvm-thinpool_tdata -> ../dm-5
lrwxrwxrwx.  1 root root       7 13 giu 09.39 vg_kvm-thinpool_tmeta -> ../dm-4
lrwxrwxrwx.  1 root root       7 13 giu 09.39 vg_kvm-thinpool-tpool -> ../dm-6
lrwxrwxrwx.  1 root root       7 13 giu 10.46 vg_kvm-thinvol -> ../dm-8

# disabled ENOSPC max_retries (default value does not change anything)
[root@blackhole mnt]# cat /sys/fs/xfs/dm-8/error/metadata/ENOSPC/max_retries
0

# current filesystem use
[root@blackhole mnt]# df -h | grep thin
/dev/mapper/vg_kvm-thinvol 1021M   33M  989M   4% /mnt/thinvol

# write 400M - it should fill the thinpool
[root@blackhole mnt]# dd if=/dev/zero of=/mnt/thinvol/disk.img bs=1M count=400
400+0 records in
400+0 records out
419430400 bytes (419 MB) copied, 0.424677 s, 988 MB/s

... wait 30 seconds ...

# thin pool switched to out-of-space mode
[root@blackhole mnt]# dmesg
[ 4408.257419] XFS (dm-8): Mounting V5 Filesystem
[ 4408.368891] XFS (dm-8): Ending clean mount
[ 4460.147962] device-mapper: thin: 253:6: switching pool to out-of-data-space (error IO) mode
[ 4460.218484] buffer_io_error: 199623 callbacks suppressed
[ 4460.218497] Buffer I/O error on dev dm-8, logical block 86032, lost async page write
[ 4460.218510] Buffer I/O error on dev dm-8, logical block 86033, lost async page write
[ 4460.218516] Buffer I/O error on dev dm-8, logical block 86034, lost async page write
[ 4460.218521] Buffer I/O error on dev dm-8, logical block 86035, lost async page write
[ 4460.218526] Buffer I/O error on dev dm-8, logical block 86036, lost async page write
[ 4460.218531] Buffer I/O error on dev dm-8, logical block 86037, lost async page write
[ 4460.218536] Buffer I/O error on dev dm-8, logical block 86038, lost async page write
[ 4460.218541] Buffer I/O error on dev dm-8, logical block 86039, lost async page write
[ 4460.218546] Buffer I/O error on dev dm-8, logical block 86040, lost async page write
[ 4460.218551] Buffer I/O error on dev dm-8, logical block 86041, lost async page write

# current thinpool state
[root@blackhole mnt]# lvs -o +whenfull
   LV       VG        Attr       LSize   Pool     Origin Data%  Meta% Move Log Cpy%Sync Convert WhenFull
   fatvol   vg_kvm    -wi-a----- 256.00m
   storage  vg_kvm    -wi-a----- 300.00g
   thinpool vg_kvm    twi-aot-D- 256.00m                 100.00 4.10                         error
   thinvol  vg_kvm    Vwi-aot---   1.00g thinpool        25.00
   root     vg_system -wi-ao----  50.00g
   swap     vg_system -wi-ao----   7.62g

# write another 400M - they should *not* be allowed to complete without errors
[root@blackhole mnt]# dd if=/dev/zero of=/mnt/thinvol/disk2.img bs=1M count=400
400+0 records in
400+0 records out
419430400 bytes (419 MB) copied, 0.36643 s, 1.1 GB/s

# no errors reported! have a look at dmesg

[root@blackhole mnt]# dmesg
[ 4603.649156] buffer_io_error: 44890 callbacks suppressed
[ 4603.649163] Buffer I/O error on dev dm-8, logical block 163776, lost async page write
[ 4603.649172] Buffer I/O error on dev dm-8, logical block 163777, lost async page write
[ 4603.649175] Buffer I/O error on dev dm-8, logical block 163778, lost async page write
[ 4603.649178] Buffer I/O error on dev dm-8, logical block 163779, lost async page write
[ 4603.649181] Buffer I/O error on dev dm-8, logical block 163780, lost async page write
[ 4603.649184] Buffer I/O error on dev dm-8, logical block 163781, lost async page write
[ 4603.649187] Buffer I/O error on dev dm-8, logical block 163782, lost async page write
[ 4603.649189] Buffer I/O error on dev dm-8, logical block 163783, lost async page write
[ 4603.649192] Buffer I/O error on dev dm-8, logical block 163784, lost async page write
[ 4603.649194] Buffer I/O error on dev dm-8, logical block 163785, lost async page write

# current filesystem use
[root@blackhole mnt]# df -h | grep thin
/dev/mapper/vg_kvm-thinvol 1021M  833M  189M  82% /mnt/thinvol

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Shutdown filesystem when a thin pool become full
  2017-06-13  9:09           ` Gionatan Danti
@ 2017-06-15 11:51             ` Gionatan Danti
  2017-06-15 13:14               ` Carlos Maiolino
  0 siblings, 1 reply; 30+ messages in thread
From: Gionatan Danti @ 2017-06-15 11:51 UTC (permalink / raw)
  To: linux-xfs

On 13-06-2017 11:09 Gionatan Danti wrote:
> Sorry for the bump, but further tests show unexpected behavior and I
> would really like to understand what I am missing.
> 
> Current setup:	CentOS 7.3 x86-64
> Kernel version:	3.10.0-514.21.1.el7.x86_64
> 
> LVM2 version (from lvm version):
>   LVM version:     2.02.166(2)-RHEL7 (2016-11-16)
>   Library version: 1.02.135-RHEL7 (2016-11-16)
>   Driver version:  4.34.0
> 
> On 23/05/2017 22:05, Gionatan Danti wrote:
>> 
>> Ok, I tried with a more typical non-sync write and it seems to report 
>> ENOSPC:
>> 
>> [root@blackhole ~]# dd if=/dev/zero of=/mnt/storage/disk.img bs=1M 
>> count=2048
>> dd: error writing ‘/mnt/storage/disk.img’: No space left on device
>> 2002+0 records in
>> 2001+0 records out
>> 2098917376 bytes (2.1 GB) copied, 7.88216 s, 266 MB/s
>> 
> 
> Contrary to what reported above, thin pool seems to *not* reporting
> ENOSPC when full. This means that any new data submitted to the
> filesystem will be reported as "written" but they never were.
> 
> I fully understand that application who cares for their data should
> regularly use fsync(). However, *many* application don't do that. One
> notable example is Windows Explorer: when accessing a full thinvol via
> a samba share, it will blatantly continue do "write" to the share
> without notice the user in any way that something is wrong. This is a
> recipe for disaster, as the user continues to uploads file which
> basically get lost...
> 
> Yes, the lacking fsync() use really is an application-level problem.
> However, sending files to (basically) /dev/null when the pool is full
> does not seems a smart thing.
> 
> I am surely doing wrong something, but I can not found what. Give a
> look below for how to reproduce...
> 
> # thinpool has errorwhenfull=y set
> # thinpool is 256M, thin volume is 1G
> [root@blackhole mnt]# lvs -o +whenfull
>   LV       VG        Attr       LSize   Pool     Origin Data%  Meta%
> Move Log Cpy%Sync Convert WhenFull
>   fatvol   vg_kvm    -wi-ao---- 256.00m
> 
>   storage  vg_kvm    -wi-a----- 300.00g
> 
>   thinpool vg_kvm    twi-aot--- 256.00m                 1.46   0.98
>                      error
>   thinvol  vg_kvm    Vwi-aot---   1.00g thinpool        0.37
> 
>   root     vg_system -wi-ao----  50.00g
> 
>   swap     vg_system -wi-ao----   7.62g
> 
> # current device mappings
> [root@blackhole mnt]# ls -al /dev/mapper/ | grep thin
> lrwxrwxrwx.  1 root root       7 13 giu 09.37 vg_kvm-thinpool -> 
> ../dm-7
> lrwxrwxrwx.  1 root root       7 13 giu 09.39 vg_kvm-thinpool_tdata -> 
> ../dm-5
> lrwxrwxrwx.  1 root root       7 13 giu 09.39 vg_kvm-thinpool_tmeta -> 
> ../dm-4
> lrwxrwxrwx.  1 root root       7 13 giu 09.39 vg_kvm-thinpool-tpool -> 
> ../dm-6
> lrwxrwxrwx.  1 root root       7 13 giu 10.46 vg_kvm-thinvol -> ../dm-8
> 
> # disabled ENOSPC max_retries (default value does not change anything)
> [root@blackhole mnt]# cat 
> /sys/fs/xfs/dm-8/error/metadata/ENOSPC/max_retries
> 0
> 
> # current filesystem use
> [root@blackhole mnt]# df -h | grep thin
> /dev/mapper/vg_kvm-thinvol 1021M   33M  989M   4% /mnt/thinvol
> 
> # write 400M - it should fill the thinpool
> [root@blackhole mnt]# dd if=/dev/zero of=/mnt/thinvol/disk.img bs=1M 
> count=400
> 400+0 records in
> 400+0 records out
> 419430400 bytes (419 MB) copied, 0.424677 s, 988 MB/s
> 
> ... wait 30 seconds ...
> 
> # thin pool switched to out-of-space mode
> [root@blackhole mnt]# dmesg
> [ 4408.257419] XFS (dm-8): Mounting V5 Filesystem
> [ 4408.368891] XFS (dm-8): Ending clean mount
> [ 4460.147962] device-mapper: thin: 253:6: switching pool to
> out-of-data-space (error IO) mode
> [ 4460.218484] buffer_io_error: 199623 callbacks suppressed
> [ 4460.218497] Buffer I/O error on dev dm-8, logical block 86032, lost
> async page write
> [ 4460.218510] Buffer I/O error on dev dm-8, logical block 86033, lost
> async page write
> [ 4460.218516] Buffer I/O error on dev dm-8, logical block 86034, lost
> async page write
> [ 4460.218521] Buffer I/O error on dev dm-8, logical block 86035, lost
> async page write
> [ 4460.218526] Buffer I/O error on dev dm-8, logical block 86036, lost
> async page write
> [ 4460.218531] Buffer I/O error on dev dm-8, logical block 86037, lost
> async page write
> [ 4460.218536] Buffer I/O error on dev dm-8, logical block 86038, lost
> async page write
> [ 4460.218541] Buffer I/O error on dev dm-8, logical block 86039, lost
> async page write
> [ 4460.218546] Buffer I/O error on dev dm-8, logical block 86040, lost
> async page write
> [ 4460.218551] Buffer I/O error on dev dm-8, logical block 86041, lost
> async page write
> 
> # current thinpool state
> [root@blackhole mnt]# lvs -o +whenfull
>   LV       VG        Attr       LSize   Pool     Origin Data%  Meta%
> Move Log Cpy%Sync Convert WhenFull
>   fatvol   vg_kvm    -wi-a----- 256.00m
> 
>   storage  vg_kvm    -wi-a----- 300.00g
> 
>   thinpool vg_kvm    twi-aot-D- 256.00m                 100.00 4.10
>                      error
>   thinvol  vg_kvm    Vwi-aot---   1.00g thinpool        25.00
> 
>   root     vg_system -wi-ao----  50.00g
> 
>   swap     vg_system -wi-ao----   7.62g
> 
> # write another 400M - they should *not* be allowed to complete without 
> errors
> [root@blackhole mnt]# dd if=/dev/zero of=/mnt/thinvol/disk2.img bs=1M 
> count=400
> 400+0 records in
> 400+0 records out
> 419430400 bytes (419 MB) copied, 0.36643 s, 1.1 GB/s
> 
> # no errors reported! give a look at dmesg
> 
> [root@blackhole mnt]# dmesg
> [ 4603.649156] buffer_io_error: 44890 callbacks suppressed
> [ 4603.649163] Buffer I/O error on dev dm-8, logical block 163776,
> lost async page write
> [ 4603.649172] Buffer I/O error on dev dm-8, logical block 163777,
> lost async page write
> [ 4603.649175] Buffer I/O error on dev dm-8, logical block 163778,
> lost async page write
> [ 4603.649178] Buffer I/O error on dev dm-8, logical block 163779,
> lost async page write
> [ 4603.649181] Buffer I/O error on dev dm-8, logical block 163780,
> lost async page write
> [ 4603.649184] Buffer I/O error on dev dm-8, logical block 163781,
> lost async page write
> [ 4603.649187] Buffer I/O error on dev dm-8, logical block 163782,
> lost async page write
> [ 4603.649189] Buffer I/O error on dev dm-8, logical block 163783,
> lost async page write
> [ 4603.649192] Buffer I/O error on dev dm-8, logical block 163784,
> lost async page write
> [ 4603.649194] Buffer I/O error on dev dm-8, logical block 163785,
> lost async page write
> 
> # current filesystem use
> [root@blackhole mnt]# df -h | grep thin
> /dev/mapper/vg_kvm-thinvol 1021M  833M  189M  82% /mnt/thinvol

Hi all,
any suggestions regarding this issue?

Regards.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Shutdown filesystem when a thin pool become full
  2017-06-15 11:51             ` Gionatan Danti
@ 2017-06-15 13:14               ` Carlos Maiolino
  2017-06-15 14:10                 ` Carlos Maiolino
  0 siblings, 1 reply; 30+ messages in thread
From: Carlos Maiolino @ 2017-06-15 13:14 UTC (permalink / raw)
  To: Gionatan Danti; +Cc: linux-xfs

Hi,

> > > Ok, I tried with a more typical non-sync write and it seems to
> > > report ENOSPC:
> > > 
> > > [root@blackhole ~]# dd if=/dev/zero of=/mnt/storage/disk.img bs=1M
> > > count=2048
> > > dd: error writing ‘/mnt/storage/disk.img’: No space left on device
> > > 2002+0 records in
> > > 2001+0 records out
> > > 2098917376 bytes (2.1 GB) copied, 7.88216 s, 266 MB/s
> > > 
> > 
> > # thin pool switched to out-of-space mode
> > [root@blackhole mnt]# dmesg
> > [ 4408.257419] XFS (dm-8): Mounting V5 Filesystem
> > [ 4408.368891] XFS (dm-8): Ending clean mount
> > [ 4460.147962] device-mapper: thin: 253:6: switching pool to
> > out-of-data-space (error IO) mode
> > [ 4460.218484] buffer_io_error: 199623 callbacks suppressed
> > [ 4460.218497] Buffer I/O error on dev dm-8, logical block 86032, lost
> > async page write
> > 
.
.
.
> > # write another 400M - they should *not* be allowed to complete without
> > errors
> > [root@blackhole mnt]# dd if=/dev/zero of=/mnt/thinvol/disk2.img bs=1M
> > count=400
> > 400+0 records in
> > 400+0 records out
> > 419430400 bytes (419 MB) copied, 0.36643 s, 1.1 GB/s
> > 
> > # no errors reported! give a look at dmesg
> > 
> > [root@blackhole mnt]# dmesg
> > [ 4603.649156] buffer_io_error: 44890 callbacks suppressed
> > [ 4603.649163] Buffer I/O error on dev dm-8, logical block 163776,
> > lost async page write
> > [ 4603.649172] Buffer I/O error on dev dm-8, logical block 163777,
> > # current filesystem use
> > [root@blackhole mnt]# df -h | grep thin
> > /dev/mapper/vg_kvm-thinvol 1021M  833M  189M  82% /mnt/thinvol
> 

> Hi all,
> any suggestion regarding the issue?
> 
> Regards.
> 

Unfortunately, there isn't much a filesystem can do here. I'll need to talk with the
device-mapper folks to get a better understanding of how we can handle such cases,
if it is possible at all.

The problem you are seeing there is caused by buffered writes: you don't get an
ENOSPC because the device itself virtually still has space to be allocated, so
the lack of space only becomes visible when the blocks are actually being
allocated from the pool itself. Still, there isn't much a filesystem can do here
about buffered writes.

As you noticed, XFS will report errors when it fails to write metadata to the
device, but for user data it is up to the application to ensure the data is
consistent. That said, I think I actually found a problem while running the
tests you mentioned; I'll need to look deeper into it.
 

-- 
Carlos

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Shutdown filesystem when a thin pool become full
  2017-06-15 13:14               ` Carlos Maiolino
@ 2017-06-15 14:10                 ` Carlos Maiolino
  2017-06-15 15:04                   ` Gionatan Danti
  0 siblings, 1 reply; 30+ messages in thread
From: Carlos Maiolino @ 2017-06-15 14:10 UTC (permalink / raw)
  To: Gionatan Danti, linux-xfs

> > Hi all,
> > any suggestion regarding the issue?
> > 
> > Regards.
> > 
> 
> Unfortunately, there isn't much a filesystem can do here. I'll need to talk with
> device-mapper folks to get a better understanding how we can handle such cases,
> if possible at all.
> 
> The problem you are seeing there is because of buffered writes, you don't get a
> ENOSPC because the device itself virtually still has space to be allocated, so
> the lack of space is only visible when the blocks are really being allocated
> from the POOL itself, but still, there isn't much a filesystem can do here
> regarding buffered writes.
> 
> As you noticed, XFS will report errors regarding the problems to write metadata
> to the device, but again, user data, is up to the application to ensure the data
> is consistent, although 

>I think I actually found a problem with it while doing
> some tests as you mentioned, I'll need to look deeper into them.

Disregard this comment, I messed up some of my tests. So, basically, the
application is responsible for its user data and needs to use fsync()/fdatasync()
to ensure the data is properly written; this is not the filesystem's responsibility.

cheers
>  
> 
> -- 
> Carlos
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Carlos

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Shutdown filesystem when a thin pool become full
  2017-06-15 14:10                 ` Carlos Maiolino
@ 2017-06-15 15:04                   ` Gionatan Danti
  2017-06-20 10:19                     ` Gionatan Danti
  2017-06-20 11:05                     ` Carlos Maiolino
  0 siblings, 2 replies; 30+ messages in thread
From: Gionatan Danti @ 2017-06-15 15:04 UTC (permalink / raw)
  To: linux-xfs; +Cc: g.danti

On 15/06/2017 16:10, Carlos Maiolino wrote:
> 
> Disregard this comment, I messed up with some tests, so, basically, the
> application is responsible for the user data, and need to use fsync/fdatasync to
> ensure the data is properly written, this is not FS responsibility.
> 
> cheers

Hi Carlos,
I fully agree that it is the application's responsibility to issue appropriate 
fsync() calls. However, knowing that this does not always happen in the real 
world, I am trying to be as "fail-safe" as possible.

 From my understanding of your previous message, a full thin pool with 
--errorwhenfull=y should return ENOSPC to the filesystem. Does this work 
on normal cached/buffered/async writes, or with O_DIRECT writes only?
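
For context, the pool behavior I am assuming in all of my tests is the 
"error when full" one, set with something like the command below (the VG and 
pool names are just placeholders from my test box):

# make the pool return errors immediately instead of queueing I/O once
# its data space is exhausted
[root@blackhole ~]# lvchange --errorwhenfull y vg_kvm/thinpool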

If that is not the case, how can I prevent further writes to a data-full 
thin pool? With ext4, I can use "data=journal,errors=remount-ro" to 
catch any write error and stop the filesystem (or remount it 
read-only), losing only a few seconds' worth of data. This *will* work 
even for applications that do not issue fsync(), as the read-only 
filesystem will not let the write() syscall complete successfully.

On XFS (which I would *really* like to use, because it is quite a bit more 
advanced), all writes directed to a full thin pool basically end up in 
/dev/null and, as write() succeeded, the application/user will *not* be 
alerted in any way. If the thin pool could communicate its "out of free 
space" condition to the filesystem, the problem could be avoided.

If this cannot be done, the only remaining possibility is to instruct 
the filesystem to stop itself on data writeout errors. So we have come 
full circle back to my original question: how can I stop XFS when writes 
return I/O errors? Please note that I tried setting every 
/sys/fs/xfs/dm-8/error/metadata/*/max_retries tunable to 0, but I could 
not get the filesystem to suspend itself, even when dmesg reported 
metadata write errors.
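
For reference, this is the kind of thing I tried (dm-8 is just the device 
name on my test box, and the exact error classes present under error/metadata/ 
may differ between kernels):

# tell XFS not to retry failed metadata writes (0 = do not retry)
[root@blackhole ~]# for f in /sys/fs/xfs/dm-8/error/metadata/*/max_retries; do echo 0 > "$f"; done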

Thank you very much.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Shutdown filesystem when a thin pool become full
  2017-06-15 15:04                   ` Gionatan Danti
@ 2017-06-20 10:19                     ` Gionatan Danti
  2017-06-20 11:05                     ` Carlos Maiolino
  1 sibling, 0 replies; 30+ messages in thread
From: Gionatan Danti @ 2017-06-20 10:19 UTC (permalink / raw)
  To: linux-xfs; +Cc: g.danti

Il 15-06-2017 17:04 Gionatan Danti ha scritto:
> On 15/06/2017 16:10, Carlos Maiolino wrote:
>> 
>> Disregard this comment, I messed up with some tests, so, basically, 
>> the
>> application is responsible for the user data, and need to use 
>> fsync/fdatasync to
>> ensure the data is properly written, this is not FS responsibility.
>> 
>> cheers
> 
> Hi Carlos,
> I fully agree that it is application responsibility to issue
> appropriate fsync(). However, knowing that this not always happens in
> real-world, I am trying to be as much "fail-safe" as possible.
> 
> From my understanding of your previous message, a full thin pool with
> --errorwhenfull=y should return ENOSPC to the filesystem. Does this
> work on normal cached/buffered/async writes, or with O_DIRECT writes
> only?
> 
> If it is not the case, how can I prevent further writes to a data-full
> thin pool? With ext4, I can use "data=journal,errors=remount-ro" to
> catch any write errors and stop the filesystem (or remount it
> read-only), losing only some seconds worth of data. This *will* works
> even for applications that do not issue fsync(), as the read-only
> filesystem will not let the write() syscall to complete successfully.
> 
> On XFS (which I would *really* use, because it is quite more
> advanced), all writes directed to a full thin-pool will basically end
> on /dev/null and, as write() succeeded, the application/user will
> *not* be alerted on any way. If the thin-pool can communicate its "end
> of free space" to the filesystem, the problem can be avoided.
> 
> If this can not be done, the only remaining possibility is to instruct
> the filesystem to stop itself on data writeout errors. So, we got
> full-circle about my original question: how can I stop XFS when writes
> return I/O errors? Please note that I tried to set any
> /sys/fs/xfs/dm-8/error/metadata/*/max_retries tunable to 0, but I can
> not get the filesystem to suspend itself, even when dmesg reported
> metadata write errors.
> 
> Thank you very much.

Hi all,
any thoughts on the matter?

Thanks.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Shutdown filesystem when a thin pool become full
  2017-06-15 15:04                   ` Gionatan Danti
  2017-06-20 10:19                     ` Gionatan Danti
@ 2017-06-20 11:05                     ` Carlos Maiolino
  2017-06-20 15:03                       ` Gionatan Danti
  1 sibling, 1 reply; 30+ messages in thread
From: Carlos Maiolino @ 2017-06-20 11:05 UTC (permalink / raw)
  To: Gionatan Danti; +Cc: linux-xfs

Hi Gionatan,

On Thu, Jun 15, 2017 at 05:04:48PM +0200, Gionatan Danti wrote:
> On 15/06/2017 16:10, Carlos Maiolino wrote:
> > 
> > Disregard this comment, I messed up with some tests, so, basically, the
> > application is responsible for the user data, and need to use fsync/fdatasync to
> > ensure the data is properly written, this is not FS responsibility.
> > 
> > cheers
> 
> Hi Carlos,
> I fully agree that it is application responsibility to issue appropriate
> fsync(). However, knowing that this not always happens in real-world, I am
> trying to be as much "fail-safe" as possible.
> 
Yeah, unfortunately, the real world has lots of badly written applications :(


> From my understanding of your previous message, a full thin pool with
> --errorwhenfull=y should return ENOSPC to the filesystem. Does this work on
> normal cached/buffered/async writes, or with O_DIRECT writes only?
> 

AFAIK, it will return ENOSPC with O_DIRECT, yes. With async writes, you won't
have any error returned until you issue an fsync()/fdatasync(), which, per my
understanding, will return EIO.
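
Using the same kind of dd runs you already posted (file names are just
placeholders), the difference should look roughly like this:

# O_DIRECT write: the ENOSPC from the full pool should be reported right away
[root@blackhole ~]# dd if=/dev/zero of=/mnt/thinvol/direct.img bs=1M count=400 oflag=direct

# buffered write: the error should only surface when dd calls fsync() at the end
[root@blackhole ~]# dd if=/dev/zero of=/mnt/thinvol/buffered.img bs=1M count=400 conv=fsync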

> If it is not the case, how can I prevent further writes to a data-full thin
> pool? With ext4, I can use "data=journal,errors=remount-ro" to catch any
> write errors and stop the filesystem (or remount it read-only), losing only
> some seconds worth of data. This *will* works even for applications that do
> not issue fsync(), as the read-only filesystem will not let the write()
> syscall to complete successfully.
> 
It 'works' on ext4 because it journals the data first, and at some point it
will try to allocate blocks for metadata; that will fail, which is what lets
ext4 catch this corner case. Although, IIRC, 'data=journal' mode isn't really
supported at all. I have even heard rumors about the possibility of this option
being removed from ext4, but I don't follow ext4 development closely enough to
tell you whether this is just a rumor or they are really considering it.

> On XFS (which I would *really* use, because it is quite more advanced), all
> writes directed to a full thin-pool will basically end on /dev/null and, as
> write() succeeded, the application/user will *not* be alerted on any way. If
> the thin-pool can communicate its "end of free space" to the filesystem, the
> problem can be avoided.
> 

The application won't be alerted in any way unless it uses fsync()/fdatasync(),
whatever filesystem is being used. Even with data=journal on ext4 this won't
happen: ext4 gets remounted read-only because there were 'metadata' errors while
writing the file to the journal. But again, that is not a fix for a faulty
application, and it is not even a reliable way of shutting down the filesystem
the way you are expecting it to. Whether it shuts the filesystem down depends on
the number of blocks being allocated: even with data=journal, if the blocks
allocated are enough to hold the metadata but not the data, you will see the
same problem you are seeing with XFS (or ext4 without data=journal), so don't
rely on it.


> If this can not be done, the only remaining possibility is to instruct the
> filesystem to stop itself on data writeout errors. So, we got full-circle
> about my original question: how can I stop XFS when writes return I/O
> errors? Please note that I tried to set any
> /sys/fs/xfs/dm-8/error/metadata/*/max_retries tunable to 0, but I can not
> get the filesystem to suspend itself, even when dmesg reported metadata
> write errors.

Yes, these options won't help, because they are configuration options
for metadata errors, not data errors.

Please bear in mind that your question should really be: "how can I stop a filesystem
when async writes return I/O errors?", because this isn't an XFS-specific issue.

But again, there isn't much you can do here: async writes are supposed to
behave this way, and whoever is writing "data" to the device is supposed to take
care of their own data.

Imagine, for example, a situation where you have two applications using the same
filesystem (quite common, right?). Applications A and B both issue buffered
writes, and application A's data hits an I/O error for some reason: an overly
busy storage device, a missed SCSI command, whatever, anything that could be
retried.

Then the filesystem shuts down because of that, which also affects
application B, even though nothing went wrong with application B.

One of the goals of multitasking is having applications running at the same time
without affecting each other.

Now consider that application B is a well-written application and application
A isn't.

App B cares about its data being written to disk, while app A doesn't.

In the case of an occasional error, app B will retry writing its data, while
app A won't.

Should we really shut down the filesystem here, affecting everything on the
system, because application A does not care for its own data?

Shutting a filesystem down has basically one purpose: avoiding corruption. We
basically only shut down a filesystem when keeping it alive could cause a problem
for everything using it (a really, really simplified explanation here).

Surely this can be improved, but in the end the application will always need to
check on its own data.

I am not really a device-mapper developer and I don't know its code in depth.
But I know it will issue warnings when there is no more space left, and you can
also configure a watermark to warn the admin when the space used reaches that
watermark.

For now, I believe the best solution is to have a reasonable watermark set on the
thin device, and have the admin take the appropriate action whenever this watermark
is reached.
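
As a rough sketch of what I mean (the threshold values are only examples, and
autoextend obviously only helps if the VG has free space to grow into):

# /etc/lvm/lvm.conf, activation section: let dmeventd grow the pool
# automatically once it is 80% full, 20% at a time
thin_pool_autoextend_threshold = 80
thin_pool_autoextend_percent = 20

# and/or watch the pool usage from a monitoring job
[root@blackhole ~]# lvs -o lv_name,data_percent,metadata_percent vg_kvm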

Cheers.

> 
> Thank you very much.
> 
-- 
Carlos

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Shutdown filesystem when a thin pool become full
  2017-06-20 11:05                     ` Carlos Maiolino
@ 2017-06-20 15:03                       ` Gionatan Danti
  2017-06-20 15:28                         ` Brian Foster
  0 siblings, 1 reply; 30+ messages in thread
From: Gionatan Danti @ 2017-06-20 15:03 UTC (permalink / raw)
  To: linux-xfs; +Cc: g.danti

Il 20-06-2017 13:05 Carlos Maiolino ha scritto:
> 
> AFAIK, it will return ENOSPC with O_DIRECT, yes. With async writes, you 
> won't
> have any error returned until you issue a fsync/fdatasync, which, per 
> my
> understanding, it will return an EIO.
> 

OK, I was missing that: so ENOSPC will be returned for O_DIRECT writes only. 
I'll make a note of it ;)

> 
> The application won't be alerted in any way unless it uses 
> fsync()/fdatasync()
> with any filesystem being used, even using data=journal in ext4, this 
> won't
> happen, ext4 gets mounted as read-only because there were 'metadata' 
> errors when
> writing the file to the journal, but again, it is not a fix for a 
> faulty
> application, it is not even reliable for shutting down the filesystem 
> the way
> you are thinking this will. It will only shut down the filesystem 
> depending on
> the amount of blocks being allocated, even when using data=journal, if 
> the
> amount of blocks allocated are enough to hold the metadata, but not the 
> data,
> you will see the same problem as you are seeing with XFS (or ext4 
> without
> data=journal), so, don't rely on it.
> 

This somewhat scares me. From my understanding, a full thin pool will 
eventually bring XFS to a halt (filesystem shutdown) but, from my 
testing, this can take a fair amount of time and failed writes. During this 
period, any writes will be lost without anybody noticing. In fact, I 
opened a similar thread on the lvm mailing list discussing this very 
same problem.

> 
> Yes, these options won't help, because they are configuration options
> for metadata errors, not data errors.
> 
> Please, bear in mind that your question should be: "how can I stop a 
> filesystem
> when async writes return I/O errors", because this isn't a XFS issue.
> 
> BUt again, there isn't too much you can do here, async writes are 
> supposed to
> behave this way. And whoever is writing "data" to the device is 
> supposed to care
> of their own data.
> 
> Imagine for example a situation where you have 2 applications using the 
> same
> filesystem (quite common right?), then application A and B issues 
> buffered
> writes, and for some reason, application A data, hits an IO error, for 
> any
> reason, maybe a too busy storage, a missed scsi command, whatever, 
> anything that
> can be retried.
> 
> then the filesystem shuts down because of that, which will also affect
> application B, even if nothing wrong happened with application B.
> 
> One of the goals of multitasking is having applications running at the 
> same time
> without affecting each other.
> 
> Now, consider that, application B is a well written application, and 
> application
> A isn't.
> 
> App B cares for its data to be written to disk, while app A doesn't.
> 
> In case of a casual error, app B will retry to write its data, while 
> app A
> won't.
> 
> Should we really shutdown the filesystem here affecting everything on 
> the
> system, because application A is not caring for its own data?
> 
> Shutting a filesystem down, has basically one purpose: avoid 
> corruption, we
> basically only shutdown a filesystem when keeping it alive can cause a 
> problem
> with everything using it (really really simple explanation here).
> 
> Surely this can be improved, but at the end, the application will 
> always need to
> check for its own data.

I think the key improvement would be to let the filesystem know about 
the full thin pool - i.e. returning ENOSPC at some convenient time (a wild 
guess: can we return ENOSPC during delayed block allocation?)

> 
> I am not really a device-mapper developer and I don't know much about 
> its code
> in depth. But, I know it will issue warnings when there isn't more 
> space left,
> and you can configure a watermark too, to warn the admin when the space 
> used
> reaches that watermark.
> 
> By now, I believe the best solution is to have a reasonable watermark 
> set on the
> thin device, and the Admin take the appropriate action whenever this 
> watermark
> is achieved.

Yeah, lvmthin *will* issue appropriate warnings while the pool fills up. 
However, this requires active monitoring which, albeit a great idea and 
"the right thing to do (tm)", adds complexity and can itself fail. In 
recent enough (experimental) versions, lvmthin can be instructed to 
execute specific actions when data allocation rises above some 
threshold, which somewhat addresses my concerns at the block layer.
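
(For others reading the thread: if I recall the option name correctly, this is
the dmeventd hook in lvm.conf that runs an external command each time the pool
crosses its usage thresholds; the script path below is of course just a
placeholder.)

# lvm.conf, dmeventd section
thin_command = "/usr/local/sbin/thin-pool-almost-full.sh"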

Thank you for your patience and sharing, Carlos.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Shutdown filesystem when a thin pool become full
  2017-06-20 15:03                       ` Gionatan Danti
@ 2017-06-20 15:28                         ` Brian Foster
  2017-06-20 15:34                           ` Luis R. Rodriguez
  2017-06-20 15:55                           ` Gionatan Danti
  0 siblings, 2 replies; 30+ messages in thread
From: Brian Foster @ 2017-06-20 15:28 UTC (permalink / raw)
  To: Gionatan Danti; +Cc: linux-xfs

On Tue, Jun 20, 2017 at 05:03:42PM +0200, Gionatan Danti wrote:
> Il 20-06-2017 13:05 Carlos Maiolino ha scritto:
> > 
...
> > Surely this can be improved, but at the end, the application will always
> > need to
> > check for its own data.
> 
> I think the key improvement would be to let the filesystem know about the
> full thin pool - ie: returing ENOSPC at some convenient time (a wild guess:
> can we return ENOSPC during delayed block allocation?)
> 

FWIW, I played with something like this a while ago. See the following
(and its predecessor for a more detailed cover letter):

  http://oss.sgi.com/pipermail/xfs/2016-April/048166.html

You lose some allocation efficiency with this approach because XFS
relies on a worst-case allocation reservation in dm-thin, but IIRC that
only really manifested when the volume was near ENOSPC. If one finds
that tradeoff acceptable, I think it's otherwise possible to forward
ENOSPC from the block device earlier than is done currently.

Brian

> > 
> > I am not really a device-mapper developer and I don't know much about
> > its code
> > in depth. But, I know it will issue warnings when there isn't more space
> > left,
> > and you can configure a watermark too, to warn the admin when the space
> > used
> > reaches that watermark.
> > 
> > By now, I believe the best solution is to have a reasonable watermark
> > set on the
> > thin device, and the Admin take the appropriate action whenever this
> > watermark
> > is achieved.
> 
> Yeah, lvmthin *will* return appropriate warnings during pool filling.
> However, this require active monitoring which, albeit a great idea and "the
> right thing to do (tm)", it adds complexity and can itself fail. In recent
> enought (experimental) versions, lvmthin can be instructed to execute
> specific actions when data allocation is higher than some threshold, which
> somewhat addresses my concerns at the block layer.
> 
> Thank you for your patience and sharing, Carlos.
> 
> -- 
> Danti Gionatan
> Supporto Tecnico
> Assyoma S.r.l. - www.assyoma.it
> email: g.danti@assyoma.it - info@assyoma.it
> GPG public key ID: FF5F32A8
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Shutdown filesystem when a thin pool become full
  2017-06-20 15:28                         ` Brian Foster
@ 2017-06-20 15:34                           ` Luis R. Rodriguez
  2017-06-20 17:01                             ` Brian Foster
  2017-06-20 15:55                           ` Gionatan Danti
  1 sibling, 1 reply; 30+ messages in thread
From: Luis R. Rodriguez @ 2017-06-20 15:34 UTC (permalink / raw)
  To: Brian Foster; +Cc: Gionatan Danti, linux-xfs

On Tue, Jun 20, 2017 at 11:28:58AM -0400, Brian Foster wrote:
> On Tue, Jun 20, 2017 at 05:03:42PM +0200, Gionatan Danti wrote:
> > Il 20-06-2017 13:05 Carlos Maiolino ha scritto:
> > > 
> ...
> > > Surely this can be improved, but at the end, the application will always
> > > need to
> > > check for its own data.
> > 
> > I think the key improvement would be to let the filesystem know about the
> > full thin pool - ie: returing ENOSPC at some convenient time (a wild guess:
> > can we return ENOSPC during delayed block allocation?)
> > 
> 
> FWIW, I played with something like this a while ago. See the following
> (and its predecessor for a more detailed cover letter):
> 
>   http://oss.sgi.com/pipermail/xfs/2016-April/048166.html

Any chance this is up in a tree somewhere?

  Luis

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Shutdown filesystem when a thin pool become full
  2017-06-20 15:28                         ` Brian Foster
  2017-06-20 15:34                           ` Luis R. Rodriguez
@ 2017-06-20 15:55                           ` Gionatan Danti
  2017-06-20 17:02                             ` Brian Foster
  1 sibling, 1 reply; 30+ messages in thread
From: Gionatan Danti @ 2017-06-20 15:55 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs, g.danti

Il 20-06-2017 17:28 Brian Foster ha scritto:
> 
> FWIW, I played with something like this a while ago. See the following
> (and its predecessor for a more detailed cover letter):
> 
>   http://oss.sgi.com/pipermail/xfs/2016-April/048166.html
> 
> You lose some allocation efficiency with this approach because XFS
> relies on a worst case allocation reservation in dm-thin, but IIRC that
> only really manifested when the volume was near ENOSPC. If one finds
> that tradeoff acceptable, I think it's otherwise possible to forward
> ENOSPC from the the block device earlier than is done currently.
> 
> Brian

Very informative thread, thanks for linking. From here [1]:

"That just doesn't help us avoid the overprovisioned situation where we
have data in pagecache and nowhere to write it back to (w/o setting the
volume read-only). The only way I'm aware of to handle that is to
account for the space at write time."

I fully understand that: after all, writes sitting in the page cache are 
not, well, written yet. I can also imagine the profound ramifications of 
correctly covering every failed data writeout corner case. What would be 
a great first step, however, is that at the *first* failed data writeout 
due to a full thin pool, an ENOSPC (or similar) is returned to the 
filesystem. Catching this situation, the filesystem could reject any 
further buffered writes until manual intervention.

Well, my main concern is to avoid sustained writes to a filled pool; 
surely your patches target a much bigger (and better!) solution.

[1] http://oss.sgi.com/pipermail/xfs/2016-April/048378.html

> 
>> >
>> > I am not really a device-mapper developer and I don't know much about
>> > its code
>> > in depth. But, I know it will issue warnings when there isn't more space
>> > left,
>> > and you can configure a watermark too, to warn the admin when the space
>> > used
>> > reaches that watermark.
>> >
>> > By now, I believe the best solution is to have a reasonable watermark
>> > set on the
>> > thin device, and the Admin take the appropriate action whenever this
>> > watermark
>> > is achieved.
>> 
>> Yeah, lvmthin *will* return appropriate warnings during pool filling.
>> However, this require active monitoring which, albeit a great idea and 
>> "the
>> right thing to do (tm)", it adds complexity and can itself fail. In 
>> recent
>> enought (experimental) versions, lvmthin can be instructed to execute
>> specific actions when data allocation is higher than some threshold, 
>> which
>> somewhat addresses my concerns at the block layer.
>> 
>> Thank you for your patience and sharing, Carlos.
>> 
>> --
>> Danti Gionatan
>> Supporto Tecnico
>> Assyoma S.r.l. - www.assyoma.it
>> email: g.danti@assyoma.it - info@assyoma.it
>> GPG public key ID: FF5F32A8
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-xfs" 
>> in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Shutdown filesystem when a thin pool become full
  2017-06-20 15:34                           ` Luis R. Rodriguez
@ 2017-06-20 17:01                             ` Brian Foster
  0 siblings, 0 replies; 30+ messages in thread
From: Brian Foster @ 2017-06-20 17:01 UTC (permalink / raw)
  To: Luis R. Rodriguez; +Cc: Gionatan Danti, linux-xfs

On Tue, Jun 20, 2017 at 05:34:44PM +0200, Luis R. Rodriguez wrote:
> On Tue, Jun 20, 2017 at 11:28:58AM -0400, Brian Foster wrote:
> > On Tue, Jun 20, 2017 at 05:03:42PM +0200, Gionatan Danti wrote:
> > > Il 20-06-2017 13:05 Carlos Maiolino ha scritto:
> > > > 
> > ...
> > > > Surely this can be improved, but at the end, the application will always
> > > > need to
> > > > check for its own data.
> > > 
> > > I think the key improvement would be to let the filesystem know about the
> > > full thin pool - ie: returing ENOSPC at some convenient time (a wild guess:
> > > can we return ENOSPC during delayed block allocation?)
> > > 
> > 
> > FWIW, I played with something like this a while ago. See the following
> > (and its predecessor for a more detailed cover letter):
> > 
> >   http://oss.sgi.com/pipermail/xfs/2016-April/048166.html
> 
> Any chance this is up in a tree somewhere?
> 

No, I don't have an upstream tree hosted anywhere unfortunately.

Brian

>   Luis
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Shutdown filesystem when a thin pool become full
  2017-06-20 15:55                           ` Gionatan Danti
@ 2017-06-20 17:02                             ` Brian Foster
  2017-06-20 18:43                               ` Gionatan Danti
  0 siblings, 1 reply; 30+ messages in thread
From: Brian Foster @ 2017-06-20 17:02 UTC (permalink / raw)
  To: Gionatan Danti; +Cc: linux-xfs

On Tue, Jun 20, 2017 at 05:55:26PM +0200, Gionatan Danti wrote:
> Il 20-06-2017 17:28 Brian Foster ha scritto:
> > 
> > FWIW, I played with something like this a while ago. See the following
> > (and its predecessor for a more detailed cover letter):
> > 
> >   http://oss.sgi.com/pipermail/xfs/2016-April/048166.html
> > 
> > You lose some allocation efficiency with this approach because XFS
> > relies on a worst case allocation reservation in dm-thin, but IIRC that
> > only really manifested when the volume was near ENOSPC. If one finds
> > that tradeoff acceptable, I think it's otherwise possible to forward
> > ENOSPC from the the block device earlier than is done currently.
> > 
> > Brian
> 
> Very informative thread, thanks for linking. From here [1]:
> 
> "That just doesn't help us avoid the overprovisioned situation where we
> have data in pagecache and nowhere to write it back to (w/o setting the
> volume read-only). The only way I'm aware of to handle that is to
> account for the space at write time."
> 
> I fully understand that: after all, writes sitting in pagecaches are not,
> well, yet written. I can also imagine what profound ramifications would have
> to correctly cover any failed data writeout corner case. What would be a
> great first step, however, is that at the *first* failed data writeout due
> to full thin pool, a ENOSPC (or similar) to be returned to the filesystem.
> Catching this situation, the filesystem can reject any further buffered
> writes until manual intervention.
> 
> Well, my main concern is to avoid sunstained writes to a filled pool, surely
> your patch target a whole bigger (and better!) solution.
> 

ISTM you might as well write something in userspace that receives a
notification from device-mapper and shuts down or remounts the fs if the
volume has gone inactive or hit a watermark. I don't think we'd bury
anything in XFS that cuts off and then resumes operations based on
underlying device errors like that. That sounds like a very crude
approach with a narrow use case.
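
A very rough sketch of the kind of userspace helper I mean (pool/mount names,
the threshold and the shutdown action itself are all placeholders, not a
recommendation):

#!/bin/sh
# crude watchdog: if the thin pool is nearly full, stop the filesystem before
# buffered writes start silently disappearing; run it from cron or hook it
# into whatever dm/lvm notification mechanism is available
POOL=vg_kvm/thinpool        # placeholder pool name
MNT=/mnt/thinvol            # placeholder mount point
LIMIT=95                    # placeholder threshold (percent of data space)

used=$(lvs --noheadings -o data_percent "$POOL" | tr -d ' ' | cut -d. -f1)
if [ "$used" -ge "$LIMIT" ]; then
        # try a read-only remount first, fall back to a forced XFS shutdown
        mount -o remount,ro "$MNT" || xfs_io -x -c "shutdown -f" "$MNT"
fi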

That said, I don't think I'd be opposed to something in XFS that
(optionally) shuts down the fs in response to a similar dm notification,
provided we know with certainty that the underlying device is inactive
(and that it can be accomplished relatively cleanly).

Brian

> [1] http://oss.sgi.com/pipermail/xfs/2016-April/048378.html
> 
> > 
> > > >
> > > > I am not really a device-mapper developer and I don't know much about
> > > > its code
> > > > in depth. But, I know it will issue warnings when there isn't more space
> > > > left,
> > > > and you can configure a watermark too, to warn the admin when the space
> > > > used
> > > > reaches that watermark.
> > > >
> > > > By now, I believe the best solution is to have a reasonable watermark
> > > > set on the
> > > > thin device, and the Admin take the appropriate action whenever this
> > > > watermark
> > > > is achieved.
> > > 
> > > Yeah, lvmthin *will* return appropriate warnings during pool filling.
> > > However, this require active monitoring which, albeit a great idea
> > > and "the
> > > right thing to do (tm)", it adds complexity and can itself fail. In
> > > recent
> > > enought (experimental) versions, lvmthin can be instructed to execute
> > > specific actions when data allocation is higher than some threshold,
> > > which
> > > somewhat addresses my concerns at the block layer.
> > > 
> > > Thank you for your patience and sharing, Carlos.
> > > 
> > > --
> > > Danti Gionatan
> > > Supporto Tecnico
> > > Assyoma S.r.l. - www.assyoma.it
> > > email: g.danti@assyoma.it - info@assyoma.it
> > > GPG public key ID: FF5F32A8
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs"
> > > in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> -- 
> Danti Gionatan
> Supporto Tecnico
> Assyoma S.r.l. - www.assyoma.it
> email: g.danti@assyoma.it - info@assyoma.it
> GPG public key ID: FF5F32A8
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Shutdown filesystem when a thin pool become full
  2017-06-20 17:02                             ` Brian Foster
@ 2017-06-20 18:43                               ` Gionatan Danti
  2017-06-21  9:44                                 ` Carlos Maiolino
  2017-06-21  9:53                                 ` Brian Foster
  0 siblings, 2 replies; 30+ messages in thread
From: Gionatan Danti @ 2017-06-20 18:43 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs, g.danti

Il 20-06-2017 19:02 Brian Foster ha scritto:
> 
> ISTM you might as well write something in userspace that receives a
> notification from device-mapper and shuts down or remounts the fs if 
> the
> volume has gone inactive or hit a watermark. I don't think we'd bury
> anything in XFS that cuts off and then resumes operations based on
> underlying device errors like that. That sounds like a very crude
> approach with a narrow use case.

Absolutely, crude & ugly...

> 
> That said, I don't think I'd be opposed to something in XFS that
> (optionally) shutdown the fs in response to a similar dm notification
> provided we know with certainty that the underlying device is inactive
> (and that it can be accomplished relatively cleanly).
> 

This would be a much better approach. Any chance of getting it implemented?
About your patches, I really like that they address the fallocate 
propagation issue. Are they a work in progress? Any chance of having them 
ship in the mainline kernel?

Thanks.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Shutdown filesystem when a thin pool become full
  2017-06-20 18:43                               ` Gionatan Danti
@ 2017-06-21  9:44                                 ` Carlos Maiolino
  2017-06-21 10:39                                   ` Gionatan Danti
  2017-06-21  9:53                                 ` Brian Foster
  1 sibling, 1 reply; 30+ messages in thread
From: Carlos Maiolino @ 2017-06-21  9:44 UTC (permalink / raw)
  To: Gionatan Danti; +Cc: linux-xfs

Hi,

> > ISTM you might as well write something in userspace that receives a
> > notification from device-mapper and shuts down or remounts the fs if the
> > volume has gone inactive or hit a watermark. I don't think we'd bury
> > anything in XFS that cuts off and then resumes operations based on
> > underlying device errors like that. That sounds like a very crude
> > approach with a narrow use case.
> 
> Absolutely, crude & ugly...
> 
> > 
> > That said, I don't think I'd be opposed to something in XFS that
> > (optionally) shutdown the fs in response to a similar dm notification
> > provided we know with certainty that the underlying device is inactive
> > (and that it can be accomplished relatively cleanly).
> > 
> 
> This would be a much better approach. Any chances to get it implemented?

I think this is doable. I've been talking with Jeff, who has been working on an
enhanced writeback error notification mechanism for a while, which will be useful
for something like that, or for some other type of enhanced communication between
dm-thin and filesystems.

Such improvements have been under discussion, I believe, since I brought up the
subject at Linux Plumbers 2013, but there is a lot of work to be done yet.


Now I'm quoting your previous email:

>This somewhat scares me. From my understanding, a full thin pool will
>eventually bring XFS to an halt (filesystem shutdown) but, from my testing,
>this can take a fair amount of time/failed writes. During this period, any
>writes will be lost without nobody noticing that. In fact, I opened a similar
>thread on the lvm mailing list discussing this very same problem.

By "eventually", you should say metadata errors, filesystems won't shutdown
during data errors.


>Yeah, lvmthin *will* return appropriate warnings during pool filling. However,
>this require active monitoring which, albeit a great idea and "the right thing
>to do (tm)", it adds complexity and can itself fail.

Well, yes, this is what sysadmins are supposed to do, no? Regarding the
complexity, everything we've been discussing here will also add lots of
complexity to the filesystem/block subsystems, and it can fail too. For
example, as I wrote before, one application could shut down the filesystem and
make it inaccessible to the other applications, which is usually not what
anybody wants; you also don't want to take away the applications' ability to
still read their data if such a corner case happens (physical space full,
virtual space still available).

>In recent enought
>(experimental) versions, lvmthin can be instructed to execute specific actions
>when data allocation is higher than some threshold, which somewhat addresses
>my concerns at the block layer.

That's good, I didn't know about that; that is a good way to manage such situations,
like telling LVM to remount a filesystem read-only after a threshold.

In the end, though, I feel that what you are looking for is a way for the
filesystem/block layer to take the monitoring job away from the sysadmin. Yes,
there are many things that can be done better, and yes, there will be lots of
improvements in this area in the near future, but this still won't remove the
sysadmins' responsibility to monitor their systems and ensure they take
the required actions when needed.

Thin provisioning isn't a new technology; it has been on the market for ages,
overprovisioning included, and these same problems were expected to happen AFAIK,
with the sysadmin expected to take the appropriate actions.

It's been too long since I worked with dedicated storage arrays using thin
provisioning, so I don't remember how dedicated hardware is expected to behave
when the physical space is full, or even whether there is any standard to follow
in this situation, but I *think* the same behavior is expected: data writes
failing, nobody caring about it other than the userspace application, and
the filesystem not taking any action until some metadata write fails.

But I still think that, if you don't want to risk such a situation, the
applications should do their job well, the sysadmin should monitor the systems
as required, or overprovisioning should not be used at all.

But anyway, thanks for this discussion, it brought up nice ideas for future
improvements.

Cheers

--
Carlos

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Shutdown filesystem when a thin pool become full
  2017-06-20 18:43                               ` Gionatan Danti
  2017-06-21  9:44                                 ` Carlos Maiolino
@ 2017-06-21  9:53                                 ` Brian Foster
  1 sibling, 0 replies; 30+ messages in thread
From: Brian Foster @ 2017-06-21  9:53 UTC (permalink / raw)
  To: Gionatan Danti; +Cc: linux-xfs

On Tue, Jun 20, 2017 at 08:43:30PM +0200, Gionatan Danti wrote:
> Il 20-06-2017 19:02 Brian Foster ha scritto:
> > 
> > ISTM you might as well write something in userspace that receives a
> > notification from device-mapper and shuts down or remounts the fs if the
> > volume has gone inactive or hit a watermark. I don't think we'd bury
> > anything in XFS that cuts off and then resumes operations based on
> > underlying device errors like that. That sounds like a very crude
> > approach with a narrow use case.
> 
> Absolutely, crude & ugly...
> 
> > 
> > That said, I don't think I'd be opposed to something in XFS that
> > (optionally) shutdown the fs in response to a similar dm notification
> > provided we know with certainty that the underlying device is inactive
> > (and that it can be accomplished relatively cleanly).
> > 
> 
> This would be a much better approach. Any chances to get it implemented?
> About your patches, I really like that they addresses the fallocate
> propagation issue. Are they a work in progress? Any chances to have them
> shipping in mainline kernel?
> 

Not likely anytime soon, at least. Those patches were an experiment for
an idea on how to improve usability in an XFS/dm-thin setup, but
otherwise didn't seem to get much traction. One major downside is that
the solution is specific to XFS.

Brian

> Thanks.
> 
> -- 
> Danti Gionatan
> Supporto Tecnico
> Assyoma S.r.l. - www.assyoma.it
> email: g.danti@assyoma.it - info@assyoma.it
> GPG public key ID: FF5F32A8
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Shutdown filesystem when a thin pool become full
  2017-06-21  9:44                                 ` Carlos Maiolino
@ 2017-06-21 10:39                                   ` Gionatan Danti
  0 siblings, 0 replies; 30+ messages in thread
From: Gionatan Danti @ 2017-06-21 10:39 UTC (permalink / raw)
  To: linux-xfs; +Cc: g.danti

Il 21-06-2017 11:44 Carlos Maiolino ha scritto:
> 
> I think this is doable, I've been talking with Jeff who is working on 
> an
> enhanced writeback error notification for a while, which will be useful 
> for
> something like that, or some other type of enhanced communication 
> between
> dm-thin and filesystems.
> 
> Such improvements have been in discussion, I believe, since I brought 
> up the
> subject in Linux Plumbers 2013, but there are a lot of work to be done 
> yet.
> 

Hi Carlos, this would be great. Glad to hear it is in 
progress/discussion.

> 
> At the end though, I feel that what you are looking for, is a way that 
> the
> filesystem/block layer can remove the monitoring job from the sysadmin.
> 

No, this is a misunderstanding due to bad communication on my part, 
sorry :)

I don't want to remove part of my job/responsibility; rather, as I 
always prepare for the worst (i.e. my monitoring failing *while* users 
fill the thin pool), I would like to have a "fail-safe" backup plan. The 
"gold standard" would be for a thin pool to react like a full filesystem - 
i.e. you (and the application) get an ENOSPC / "No space left on device" error.

This would mimic what happens in the ZFS world when using a sparse volume. 
From its man page:
"A 'sparse volume' is a volume where the reservation is less then the 
volume size. Consequently, writes to a sparse volume can fail with 
ENOSPC when the pool is low on space."
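
(For comparison, a minimal sketch of the ZFS side - pool name and size are
placeholders:)

# create a sparse (thin) zvol: no reservation is taken, so later writes may
# fail with ENOSPC once the pool runs low on space
zfs create -s -V 100G tank/sparsevol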

> 
> Yes,
> there are many things that can be done better, yes, there will be lots 
> of
> improvements in this area in the near future, but this still won't 
> remove the
> responsibility of the sysadmins to monitor their systems and ensure 
> they take
> the required actions when needed.
> 

I agree, absolutely.

> 
> Thin provisioning isn't a new technology, it is in the market for ages,
> overprovisioning indeed, and these same problems were expected to 
> happen AFAIK,
> and the sysadmin, expected to take the appropriate actions.
> 
> It's been too long since I worked with dedicated storages using thin
> provisioning, so I don't remember how a dedicated hardware is expected 
> to behave
> when the physical space is full, or even if there is any standard to 
> follow on
> this situation, but I *think*, the same behavior is expected, data 
> writes
> failing, and nobody caring about it other than the userspace 
> application, and
> the filesystem not taking any action until some metadata write fail.
> 
> But I still think that, if you don't want to risk such situation, the
> applications should be doing their job well, the sysadmin monitoring 
> the systems
> as required, or not using overprovisioning at all.
> 

Overprovisioning by itself is not my main point: after all, if I blatantly 
lie by claiming to have space that I don't really have, some problems can be 
expected ;)

Rather, my main concern is that, when using snapshots, even a filesystem 
smaller than the thin pool can hit a "no more data space available" 
condition, simply due to the additional CoW tracking needed to keep the 
snapshot "alive". But hey - this is itself a form of overprovisioning, after 
all...

> 
> But anyway, thanks for such discussion, it brought nice ideas of future
> improvements.
> 

Thank you (and others) for all the hard work!

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2017-06-21 10:39 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-22 14:25 Shutdown filesystem when a thin pool become full Gionatan Danti
2017-05-22 23:09 ` Carlos Maiolino
2017-05-23 10:56   ` Gionatan Danti
2017-05-23 11:01     ` Gionatan Danti
2017-05-23 12:27       ` Carlos Maiolino
2017-05-23 20:05         ` Gionatan Danti
2017-05-23 21:33           ` Eric Sandeen
2017-05-24 17:52             ` Gionatan Danti
2017-06-13  9:09           ` Gionatan Danti
2017-06-15 11:51             ` Gionatan Danti
2017-06-15 13:14               ` Carlos Maiolino
2017-06-15 14:10                 ` Carlos Maiolino
2017-06-15 15:04                   ` Gionatan Danti
2017-06-20 10:19                     ` Gionatan Danti
2017-06-20 11:05                     ` Carlos Maiolino
2017-06-20 15:03                       ` Gionatan Danti
2017-06-20 15:28                         ` Brian Foster
2017-06-20 15:34                           ` Luis R. Rodriguez
2017-06-20 17:01                             ` Brian Foster
2017-06-20 15:55                           ` Gionatan Danti
2017-06-20 17:02                             ` Brian Foster
2017-06-20 18:43                               ` Gionatan Danti
2017-06-21  9:44                                 ` Carlos Maiolino
2017-06-21 10:39                                   ` Gionatan Danti
2017-06-21  9:53                                 ` Brian Foster
2017-05-23 12:11     ` Carlos Maiolino
2017-05-23 13:24 ` Eric Sandeen
2017-05-23 20:23   ` Gionatan Danti
2017-05-24  7:38     ` Carlos Maiolino
2017-05-24 17:50       ` Gionatan Danti

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).