From: Carlos Maiolino <cmaiolino@redhat.com>
To: xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org
Subject: Re: [RFC PATCH 0/9] dm-thin/xfs: prototype a block reservation allocation model
Date: Mon, 21 Mar 2016 14:33:46 +0100
Message-ID: <20160321133346.GD25476@redhat.com>
In-Reply-To: <1458225037-24155-1-git-send-email-bfoster@redhat.com>

Hi.

From my point of view, I like the idea of an interface between the filesystem
and the thin-provisioned device, so that we can actually know whether the thin
volume is running out of space. But before we start to discuss how this should
be implemented, I'd like to ask whether it should be implemented at all.

After a few days of discussing this with some block layer and dm-thin developers,
what I mostly hear/read is that a thin volume should be transparent to the
filesystem, so the filesystem itself should not know it's running over a
thin-provisioned volume. The interface being discussed here breaks this
abstraction.

What I would like to know is the POV of block layer and dm-thin developers
regarding this. I know that this subject has been discussed for a while, but I
have never seen a conclusion on whether thin-provisioned devices should be
transparent to the filesystem or not.

From a storage perspective, I believe that all dedicated storage hardware that
actually provides thin provisioning does it in a way that is transparent to the
filesystem, which doesn't mean dm-thin must follow the same behavior.

A layer of communication between the fs and dm-thin would be great, mainly to
avoid cases of data loss like the one you already mentioned, such as items in
the AIL that cannot be written back to disk due to lack of space (which I've
been working on for the past few days). But before actually working on and
changing the filesystem, I'd like to understand what the block/dm-thin layer
actually expects here. I tried to google it a bit to see if there is any
standard for how thin-provisioned devices should behave, but I didn't find
anything, so any input will be appreciated.

Cheers

--
Carlos


On Thu, Mar 17, 2016 at 10:30:28AM -0400, Brian Foster wrote:
> Hi all,
> 
> This is a proof-of-concept of a block reservation allocation model
> between XFS and dm-thin. The purpose is to create a mechanism by which
> the filesystem can determine an underlying thin volume is out of space
> and opt to return -ENOSPC to userspace rather than waiting until the
> volume is driven out of space (and deactivated or transitioned
> read-only). The idea, in principle, is to use a similar reservation
> model for thin pool blocks as the filesystem does today for delayed
> allocation blocks and to prevent similar risk of overprovisioning of fs
> blocks.
> 
> This idea was concocted a while back during some discussions around how
> to provide a more user friendly out of space condition to users of
> filesystems on top of thin devices. At the moment, we (XFS) write to the
> underlying volume until it runs out of space and transitions to
> read-only. The administrator is responsible to prevent or recover from
> this condition via auto provisioning and/or monitoring for low watermark
> notifications. With a reservation model, the filesystem returns -ENOSPC
> at write time when the underlying pool is out of space and operation
> otherwise continues (e.g., space can be freed from the fs) as if the fs
> itself were out of space.
> 
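As a rough illustration of the reservation model described above, here is a
minimal Python sketch. It is not the code from this series: ThinPool,
reserve(), provision() and buffered_write() are hypothetical names, and the
pool is modelled as plain block counters. What it tries to show is that space
is checked when the write reserves it, so the caller gets -ENOSPC up front
instead of the pool running dry later during writeback.

```python
import errno


class ThinPool:
    """Toy model of a thin pool. All names here are hypothetical."""

    def __init__(self, total_blocks):
        self.total = total_blocks
        self.allocated = 0  # blocks already backed by real storage
        self.reserved = 0   # blocks promised to the fs, not yet provisioned

    def free(self):
        # Blocks that are neither allocated nor promised to anyone.
        return self.total - self.allocated - self.reserved

    def reserve(self, nblocks):
        # Fail at reservation time rather than at writeback time.
        if nblocks > self.free():
            raise OSError(errno.ENOSPC, "thin pool cannot cover reservation")
        self.reserved += nblocks

    def provision(self, nblocks):
        # Turn part of an existing reservation into a real allocation.
        if nblocks > self.reserved:
            raise ValueError("provisioning more than was reserved")
        self.reserved -= nblocks
        self.allocated += nblocks


def buffered_write(pool, nblocks):
    """Reserve space for a write; return 0 or a negative errno."""
    try:
        pool.reserve(nblocks)
    except OSError as err:
        return -err.errno
    return 0
```

With a 4-block pool, buffered_write(pool, 3) succeeds and a following
buffered_write(pool, 2) returns -ENOSPC while the pool itself stays usable,
which is the behavior the paragraph above is after.
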
> Joe and Mike were kind enough to hack together a dm block reservation
> mechanism to help us experiment further. I slightly modified and hacked
> in an additional provision call based on their code, and then hacked up
> an integration with the existing XFS resource reservation mechanism. I
> think the results are slightly encouraging, at least in that the basic
> buffered write mechanism works as expected without too much inefficiency
> due to the over-reservation.
> 
> There are still flaws and tradeoffs to this approach, of course. The
> current implementation uses a worst case reservation model that assumes
> every unallocated filesystem block requires a new dm-thin block
> allocation. With dm-thin block sizes on the order of 256k-1MB for larger
> volumes, this is a significant over-reservation for 4k (or smaller)
> filesystem blocks. XFS has algorithms in some areas (buffered writes)
> that deal with this problem already, but at the very least, further
> optimization might be in order to improve performance. This also doesn't
> consider other operations (fallocate) or filesystems that might not be
> immediately suited to handle this limitation. Also, the interface to the
> block device is clearly crude, incomplete and hacked together
> (particularly the provision bits added by me). It remains to be seen
> whether we can define a sane interface to fully support this
> functionality.
> 
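To put numbers on the over-reservation, here is a small worked example in
Python. It assumes the pure worst case described above (every filesystem block
lands in its own dm-thin block) and ignores pool metadata and whatever XFS
does to reduce the reservation. The helper names are made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB
SECTOR = 512  # bytes per sector, as used for "512 sectors" in the dmesg output


def worst_case_factor(thin_block, fs_block):
    """How many times larger the reservation is than the data written."""
    return thin_block // fs_block


def worst_case_reservable_bytes(pool_bytes, thin_block, fs_block):
    """Data that can be reserved if each fs block pins a whole thin block."""
    thin_blocks = pool_bytes // thin_block
    return thin_blocks * fs_block


print(worst_case_factor(256 * KIB, 4 * KIB))  # 256k thin blocks, 4k fs blocks
print(worst_case_factor(1 * MIB, 4 * KIB))    # 1MB thin blocks, 4k fs blocks
print(worst_case_reservable_bytes(1 * GIB, 512 * SECTOR, 4 * KIB) // MIB)
```

So a 4k filesystem block reserves 64x its size with 256k thin blocks and 256x
with 1MB thin blocks. For the 1G pool with 512-sector (256k) thin blocks used
in the P.S. below, the strict worst case would exhaust the reservation after
16MB of scattered 4k blocks, which is why the existing XFS algorithms and
further optimization matter here.
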
> As far as the implementation goes, this is a toy/experiment with various
> other known issues (mostly documented in the code, see the comments in
> xfs_thin.c) and should not be used for anything outside of
> experimentation. I haven't done much testing beyond simple buffered
> write runs to ENOSPC, so problems in other areas can be expected.
> Apologies for whatever general shoddiness might be discovered, but I
> wanted to get something posted to generate discussion before putting too
> much effort into testing and exploring all of the dark corners where
> more issues certainly lurk.
> 
> In summary, the primary purpose of this series is to close the loop on
> some of the early XFS/dm-thin discussion around whether something like
> this is feasible, worthwhile, and to otherwise gather initial thoughts
> from fs and dm folks on the general topic. If worth pursuing further,
> discussion around things like an appropriate interface to the block
> device is certainly warranted.
> 
> Thanks again to Joe and Mike for entertaining the idea and hacking
> something together to play around with. Thoughts, reviews, flames
> appreciated. (BTW, I'm also planning to be at LSF if anybody is
> interested in discussing this further).
> 
> Brian
> 
> P.S., With these patches applied, use the following to create an
> over-provisioned thin volume and mount XFS in "reservation mode:"
> 
> # lvcreate --thinpool test/pool -L1G
> # lvcreate -T test/pool -n thin -V 10G
> # mkfs.xfs -f /dev/test/thin
> # mount /dev/test/thin /mnt -o discard
> # dmesg | tail
> ...
> XFS (dm-8): Mounting V5 Filesystem
> XFS (dm-8): Ending clean mount
> XFS (dm-8): Thin pool reservation enabled
> XFS (dm-8): Thin reserve blocksize: 512 sectors
> # dd if=/dev/zero of=/mnt/file bs=4k
> dd: error writing '/mnt/file': No space left on device
> ...
> 
> Brian Foster (6):
>   dm thin: update reserve space func to allow reduction
>   block: add a block_device_operations method to provision space
>   dm: add method to provision space
>   dm thin: add method to provision space
>   xfs: thin block device reservation mechanism
>   xfs: adopt a reserved allocation model on dm-thin devices
> 
> Joe Thornber (1):
>   dm thin: add methods to set and get reserved space
> 
> Mike Snitzer (2):
>   block: add block_device_operations methods to set and get reserved
>     space
>   dm: add methods to set and get reserved space
> 
>  drivers/md/dm-thin.c          | 187 +++++++++++++++++++++++++++--
>  drivers/md/dm.c               | 110 +++++++++++++++++
>  fs/block_dev.c                |  30 +++++
>  fs/xfs/Makefile               |   1 +
>  fs/xfs/libxfs/xfs_alloc.c     |   6 +
>  fs/xfs/xfs_mount.c            |  81 +++++++++++--
>  fs/xfs/xfs_mount.h            |   7 ++
>  fs/xfs/xfs_thin.c             | 273 ++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_thin.h             |   9 ++
>  fs/xfs/xfs_trace.h            |  27 +++++
>  fs/xfs/xfs_trans.c            |  26 +++-
>  include/linux/blkdev.h        |   7 ++
>  include/linux/device-mapper.h |   7 ++
>  13 files changed, 749 insertions(+), 22 deletions(-)
>  create mode 100644 fs/xfs/xfs_thin.c
>  create mode 100644 fs/xfs/xfs_thin.h
> 
> -- 
> 2.4.3



