linux-fsdevel.vger.kernel.org archive mirror
From: Avi Kivity <avi@scylladb.com>
To: Brian Foster <bfoster@redhat.com>, Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 09/10] iomap: add a IOMAP_DIO_NOALLOC flag
Date: Thu, 14 Jan 2021 12:43:37 +0200
Message-ID: <8ed44546-e5bd-dd60-a16b-ab185de3d5b9@scylladb.com>
In-Reply-To: <20210114102347.GD1333929@bfoster>

On 1/14/21 12:23 PM, Brian Foster wrote:
> On Thu, Jan 14, 2021 at 09:49:35AM +1100, Dave Chinner wrote:
>> On Wed, Jan 13, 2021 at 10:32:15AM -0500, Brian Foster wrote:
>>> On Wed, Jan 13, 2021 at 10:29:23AM +1100, Dave Chinner wrote:
>>>> On Tue, Jan 12, 2021 at 05:26:15PM +0100, Christoph Hellwig wrote:
>>>>> Add a flag to request that the iomap instances do not allocate blocks
>>>>> by translating it to another new IOMAP_NOALLOC flag.
>>>> Except that "no allocation" is not what XFS needs for concurrent
>>>> sub-block DIO.
>>>>
>>>> We are trying to avoid external sub-block IO outside the range of
>>>> the user data IO (COW, sub-block zeroing, etc) so that we don't
>>>> trash adjacent sub-block IO in flight. This means we can't do
>>>> sub-block zeroing and that then means we can't map unwritten extents
>>>> or allocate new extents for the sub-block IO.  It also means the IO
>>>> range cannot span EOF because that triggers unconditional sub-block
>>>> zeroing in iomap_dio_rw_actor().
>>>>
>>>> And because we may have to map multiple extents to fully span an IO
>>>> range, we have to guarantee that subsequent extents for the IO are
>>>> also written otherwise we have a partial write abort case. Hence we
>>>> have single extent limitations as well.
>>>>
>>>> So "no allocation" really doesn't describe what we want this flag to
>>>> do at all.
>>>>
>>>> If we're going to use a flag for this specific functionality, let's
>>>> call it what it is: IOMAP_DIO_UNALIGNED/IOMAP_UNALIGNED and do two
>>>> things with it.
>>>>
>>>> 	1. Make unaligned IO a formal part of the iomap_dio_rw()
>>>> 	behaviour so it can do the common checks for things that
>>>> 	need exclusive serialisation for unaligned IO (i.e. avoid IO
>>>> 	spanning EOF, abort if there are cached pages over the
>>>> 	range, etc).
>>>>
>>>> 	2. Require the filesystem mapping callback to only allow
>>>> 	unaligned IO into ranges that are contiguous and don't
>>>> 	require mapping state changes or sub-block zeroing to be
>>>> 	performed during the sub-block IO.
>>>>
>>>>
>>> Something I hadn't thought about before is whether applications might
>>> depend on current unaligned dio serialization for coherency and thus
>>> break if the kernel suddenly allows concurrent unaligned dio to pass
>>> through. Should this be something that is explicitly requested by
>>> userspace?
>> If applications are relying on an undocumented, implementation
>> specific behaviour of a filesystem that only occurs for IOs of a
>> certain size for implicit data coherency between independent,
>> non-overlapping DIOs and/or page cache IO, then they are already
>> broken and need fixing because that behaviour is not guaranteed to
>> occur. e.g. 512 byte block size filesystem does not provide such
>> serialisation, so if the app depends on 512 byte DIOs being
>> serialised completely by the filesystem then it already fails on 512
>> byte block size filesystems.
>>
> I'm not sure how the block size relates beyond just changing the
> alignment requirements..?
>
>> So, no, we simply don't care about breaking broken applications that
>> are already broken.
>>
> I agree in general, but I'm not sure that helps us on the "don't break
> userspace" front. We can call userspace broken all we want, but if some
> application has such a workload that historically functions correctly
> due to this serialization and all of a sudden starts to cause data
> corruption because we decide to remove it, I fear we'd end up taking the
> blame regardless. :/


I think it's unlikely. Application writers rarely know about such 
issues, so they can't knowingly depend on them. The sub-sub-genre of 
application writers who rely on dio/aio will be a lot more careful and 
wary of the filesystem.


In this particular case, triggering serialization also triggers blocking
in io_submit, which is the aio/dio user's worst nightmare, worse than the
runner-up by several orders of magnitude. I have code to detect these
cases and try to prevent serialization, or, when serialization is
inevitable, do the serialization in userspace so my io_submits don't get
blocked.
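
As a rough illustration of the kind of userspace check described above,
here is a minimal sketch. It is not the code referred to in this message;
the names dio_is_block_aligned(), fs_block_of() and choose_submit_path()
are hypothetical. It assumes the filesystem block size is a known power of
two and that only sub-block (unaligned) writes risk kernel-side
serialization, as discussed in the quoted thread. Other conditions raised
there, such as IO spanning EOF, are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Where a write should go: straight to io_submit, or through a
 * userspace queue that serializes writers touching the same block. */
enum submit_path { SUBMIT_DIRECT, SUBMIT_SERIALIZED };

/* True if v is a non-zero power of two. */
static bool is_pow2(uint64_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* True when [off, off + len) starts and ends on a filesystem block
 * boundary, i.e. the write needs no sub-block handling. */
static bool dio_is_block_aligned(uint64_t off, uint64_t len, uint64_t blksz)
{
    assert(is_pow2(blksz)); /* assumption: block size is a power of two */
    return ((off | len) & (blksz - 1)) == 0;
}

/* Index of the filesystem block containing byte offset off; usable as
 * a key for a per-block userspace lock or queue. */
static uint64_t fs_block_of(uint64_t off, uint64_t blksz)
{
    assert(is_pow2(blksz));
    return off / blksz;
}

/* Hypothetical policy: unaligned writes are serialized in userspace so
 * that io_submit itself is less likely to block in the kernel. */
static enum submit_path choose_submit_path(uint64_t off, uint64_t len,
                                           uint64_t blksz)
{
    if (!dio_is_block_aligned(off, len, blksz))
        return SUBMIT_SERIALIZED;
    return SUBMIT_DIRECT;
}
```

With a 4096-byte block size, a 512-byte write at offset 512 would take the
serialized path, while the same write on a 512-byte block size filesystem
would be submitted directly. This matches the point made earlier in the
thread that the serialization depends on the filesystem block size.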



