From: Avi Kivity <avi@scylladb.com>
To: Brian Foster <bfoster@redhat.com>, Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>,
    linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 09/10] iomap: add a IOMAP_DIO_NOALLOC flag
Date: Thu, 14 Jan 2021 12:43:37 +0200
Message-ID: <8ed44546-e5bd-dd60-a16b-ab185de3d5b9@scylladb.com>
In-Reply-To: <20210114102347.GD1333929@bfoster>

On 1/14/21 12:23 PM, Brian Foster wrote:
> On Thu, Jan 14, 2021 at 09:49:35AM +1100, Dave Chinner wrote:
>> On Wed, Jan 13, 2021 at 10:32:15AM -0500, Brian Foster wrote:
>>> On Wed, Jan 13, 2021 at 10:29:23AM +1100, Dave Chinner wrote:
>>>> On Tue, Jan 12, 2021 at 05:26:15PM +0100, Christoph Hellwig wrote:
>>>>> Add a flag to request that the iomap instances do not allocate blocks
>>>>> by translating it to another new IOMAP_NOALLOC flag.
>>>> Except "no allocation" that is not what XFS needs for concurrent
>>>> sub-block DIO.
>>>>
>>>> We are trying to avoid external sub-block IO outside the range of
>>>> the user data IO (COW, sub-block zeroing, etc) so that we don't
>>>> trash adjacent sub-block IO in flight. This means we can't do
>>>> sub-block zeroing and that then means we can't map unwritten extents
>>>> or allocate new extents for the sub-block IO. It also means the IO
>>>> range cannot span EOF because that triggers unconditional sub-block
>>>> zeroing in iomap_dio_rw_actor().
>>>>
>>>> And because we may have to map multiple extents to fully span an IO
>>>> range, we have to guarantee that subsequent extents for the IO are
>>>> also written otherwise we have a partial write abort case. Hence we
>>>> have single extent limitations as well.
>>>>
>>>> So "no allocation" really doesn't describe what we want this flag to
>>>> do at all.
>>>>
>>>> If we're going to use a flag for this specific functionality, let's
>>>> call it what it is: IOMAP_DIO_UNALIGNED/IOMAP_UNALIGNED and do two
>>>> things with it.
>>>>
>>>> 1. Make unaligned IO a formal part of the iomap_dio_rw()
>>>> behaviour so it can do the common checks for things that
>>>> need exclusive serialisation for unaligned IO (i.e. avoid IO
>>>> spanning EOF, abort if there are cached pages over the
>>>> range, etc).
>>>>
>>>> 2. require the filesystem mapping callback to only allow
>>>> unaligned IO into ranges that are contiguous and don't
>>>> require mapping state changes or sub-block zeroing to be
>>>> performed during the sub-block IO.
>>>>
>>>>
>>> Something I hadn't thought about before is whether applications might
>>> depend on current unaligned dio serialization for coherency and thus
>>> break if the kernel suddenly allows concurrent unaligned dio to pass
>>> through. Should this be something that is explicitly requested by
>>> userspace?
>> If applications are relying on an undocumented, implementation
>> specific behaviour of a filesystem that only occurs for IOs of a
>> certain size for implicit data coherency between independent,
>> non-overlapping DIOs and/or page cache IO, then they are already
>> broken and need fixing because that behaviour is not guaranteed to
>> occur. e.g. 512 byte block size filesystem does not provide such
>> serialisation, so if the app depends on 512 byte DIOs being
>> serialised completely by the filesystem then it already fails on 512
>> byte block size filesystems.
>>
> I'm not sure how the block size relates beyond just changing the
> alignment requirements..?
>
>> So, no, we simply don't care about breaking broken applications that
>> are already broken.
>>
> I agree in general, but I'm not sure that helps us on the "don't break
> userspace" front. We can call userspace broken all we want, but if some
> application has such a workload that historically functions correctly
> due to this serialization and all of a sudden starts to cause data
> corruption because we decide to remove it, I fear we'd end up taking the
> blame regardless. :/

I think it's unlikely. Application writers rarely know about such
issues, so they can't knowingly depend on them. The sub-sub-genre of
application writers who rely on dio/aio will be a lot more careful and
wary of the filesystem.

In this particular case, triggering serialization also triggers blocking
in io_submit, which is the aio/dio user's worst nightmare, worse by
several orders of magnitude than the runner-up. I have code to detect
these cases and try to prevent serialization, or, when serialization is
inevitable, do the serialization in userspace so my io_submits don't get
blocked.
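
As an illustration only (this is not the code mentioned above, and every
name in it is hypothetical), the detect-then-serialize idea could be
sketched in Python roughly as follows. It assumes the risky writes are
the ones named earlier in the thread, i.e. writes not aligned to the
filesystem block size or reaching past EOF, and that such a write should
only be issued while no other I/O on the same file is in flight. The
real kernel conditions may well differ.

```python
from collections import deque


def may_serialize(offset, length, block_size, file_size):
    """Guess whether a DIO write could hit the kernel's exclusive path.

    Assumption: a write is risky if it is not aligned to the filesystem
    block size or if it reaches past the current end of file.
    """
    unaligned = offset % block_size != 0 or length % block_size != 0
    beyond_eof = offset + length > file_size
    return unaligned or beyond_eof


class FileGate:
    """Userspace gate for one file: risky writes run alone, others share."""

    def __init__(self):
        self.shared_in_flight = 0         # ordinary writes currently issued
        self.exclusive_in_flight = False  # a risky write is currently issued
        self.waiting = deque()            # (write, exclusive), arrival order

    def _can_start(self, exclusive):
        if self.exclusive_in_flight:
            return False
        if exclusive:
            return self.shared_in_flight == 0
        return True

    def _start(self, exclusive):
        if exclusive:
            self.exclusive_in_flight = True
        else:
            self.shared_in_flight += 1

    def submit(self, write, exclusive):
        """Return True if the caller may issue `write` right away.

        Otherwise the write is queued behind earlier arrivals, so nothing
        is handed to the kernel that would have to wait there.
        """
        if not self.waiting and self._can_start(exclusive):
            self._start(exclusive)
            return True
        self.waiting.append((write, exclusive))
        return False

    def complete(self, exclusive):
        """Record a completion; return queued writes that may now be issued."""
        if exclusive:
            self.exclusive_in_flight = False
        else:
            self.shared_in_flight -= 1
        ready = []
        while self.waiting and self._can_start(self.waiting[0][1]):
            write, excl = self.waiting.popleft()
            self._start(excl)
            ready.append(write)
        return ready
```

A caller would pass the result of may_serialize() as the `exclusive`
argument to submit(), issue the write only when submit() returns True,
and issue whatever complete() returns once an earlier write finishes.
The queue is strictly FIFO, which keeps the sketch short at the cost of
holding ordinary writes behind a waiting risky one.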