linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andreas Gruenbacher <agruenba@redhat.com>
To: Christoph Hellwig <hch@lst.de>
Cc: cluster-devel <cluster-devel@redhat.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Dave Chinner <dchinner@redhat.com>
Subject: Re: [PATCH v4 11/11] iomap: Complete partial direct I/O writes synchronously
Date: Wed, 16 May 2018 22:27:18 +0200	[thread overview]
Message-ID: <CAHc6FU7Td_tabaXM3aSQA8krfxMrgV=NiCcXggf6_aC1WA0nNw@mail.gmail.com> (raw)
In-Reply-To: <20180515072411.GB23202@lst.de>

On 15 May 2018 at 09:24, Christoph Hellwig <hch@lst.de> wrote:
> On Mon, May 14, 2018 at 05:36:24PM +0200, Andreas Gruenbacher wrote:
>> According to xfstest generic/240, applications see, to expect direct I/O
>> writes to either complete as a whole or to fail; short direct I/O writes
>> are apparently not appreciated.  This means that when only part of an
>> asynchronous direct I/O write succeeds, we can either fail the entire
>> write, or we can wait wait for the partial write to complete and retry
>> the remaining write using buffered I/O.  The old __blockdev_direct_IO
>> helper has code for waiting for partial writes to complete; the new
>> iomap_dio_rw iomap helper does not.
>>
>> The above mentioned fallback mode is used by gfs2, which doesn't allow
>> block allocations under direct I/O to avoid taking cluster-wide
>> exclusive locks.  As a consequence, an asynchronous direct I/O write to
>> a file range that ends in a hole will result in a short write.  When
>> that happens, we want to retry the remaining write using buffered I/O.
>>
>> To allow that, change iomap_dio_rw to wait for short direct I/O writes
>> like __blockdev_direct_IO does instead of returning -EIOCBQUEUED.
>>
>> This fixes xfstest generic/240 on gfs2.
>
> The code looks pretty racy to me.

I/O completion triggers when dio->ref reaches zero, so
iomap_dio_bio_end_io will never evaluate submit.waiter before the
atomic_dec_and_test in iomap_dio_rw. We probably need an
smp_mb__before_atomic() before that atomic_dec_and_test to make sure
that iomap_dio_bio_end_io sees the right value of submit.waiter
though. Any concerns beyond that?

> Why would gfs2 cause a short direct I/O write to start with? I suspect that is
> where the problem that needs fixing is burried.

This is caused by the fact that gfs2 takes a deferred rather than an
exclusive cluster-wide lock during direct I/O operations, which allows
concurrent operations. This is critical for performance. In that
locking mode, however, it's only possible to write to preallocated
space, and new allocations are not possible. So user-space requests an
asynchronous direct I/O write into a file range that happens to
contain a hole. iomap_dio_rw is walf-way done with the write when
ops->iomap_begin hits the hole. In that state, the only choice is to
wait for the completion of the partial write and to switch locking
modes before completing the rest of the write. That's what we did
before iomap; the old code did support that. Leaving things unchanged
and completing partial direct I/O writes asynchronously breaks xfstest
generic/240, and likely also user-space.

The only other workaround I can think of would be to check all
asynchronous direct I/O write file ranges for holes before calling
iomap_dio_rw. That would be a major nightmare though.

Thanks,
Andreas

  reply	other threads:[~2018-05-16 20:27 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-14 15:36 [PATCH v4 00/11] gfs2 iomap write support Andreas Gruenbacher
2018-05-14 15:36 ` [PATCH v4 01/11] gfs2: Update find_metapath comment Andreas Gruenbacher
2018-05-14 15:36 ` [PATCH v4 02/11] gfs2: hole_size improvement Andreas Gruenbacher
2018-05-14 15:36 ` [PATCH v4 03/11] gfs2: gfs2_stuffed_write_end cleanup Andreas Gruenbacher
2018-05-14 15:36 ` [PATCH v4 04/11] gfs2: Remove ordered write mode handling from gfs2_trans_add_data Andreas Gruenbacher
2018-05-14 15:36 ` [PATCH v4 05/11] gfs2: Iomap cleanups and improvements Andreas Gruenbacher
2018-05-14 15:36 ` [PATCH v4 06/11] iomap: Add write_{begin,end} iomap operations Andreas Gruenbacher
2018-05-15  1:11   ` Dave Chinner
2018-05-15  7:22     ` Christoph Hellwig
2018-05-15  8:16       ` Andreas Gruenbacher
2018-05-18 16:04         ` Christoph Hellwig
2018-05-25 17:58           ` Andreas Grünbacher
2018-05-28 13:02             ` Christoph Hellwig
2018-05-14 15:36 ` [PATCH v4 07/11] gfs2: iomap buffered write support Andreas Gruenbacher
2018-05-14 15:36 ` [PATCH v4 08/11] gfs2: gfs2_extent_length cleanup Andreas Gruenbacher
2018-05-14 15:36 ` [PATCH v4 09/11] gfs2: iomap direct I/O support Andreas Gruenbacher
2018-05-15  7:31   ` Christoph Hellwig
2018-05-16 20:36     ` Andreas Gruenbacher
2018-05-14 15:36 ` [PATCH v4 10/11] gfs2: Remove gfs2_write_{begin,end} Andreas Gruenbacher
2018-05-14 15:36 ` [PATCH v4 11/11] iomap: Complete partial direct I/O writes synchronously Andreas Gruenbacher
2018-05-15  7:24   ` Christoph Hellwig
2018-05-16 20:27     ` Andreas Gruenbacher [this message]
2018-05-18 15:56       ` Christoph Hellwig
2018-05-18 18:35         ` Andreas Grünbacher

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHc6FU7Td_tabaXM3aSQA8krfxMrgV=NiCcXggf6_aC1WA0nNw@mail.gmail.com' \
    --to=agruenba@redhat.com \
    --cc=cluster-devel@redhat.com \
    --cc=dchinner@redhat.com \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --subject='Re: [PATCH v4 11/11] iomap: Complete partial direct I/O writes synchronously' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).