linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Robert Dorr <rdorr@microsoft.com>
To: Dave Chinner <david@fromorbit.com>, Jan Kara <jack@suse.cz>
Cc: "linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"hch@lst.de" <hch@lst.de>, Slava Oks <slavao@microsoft.com>,
	Jasraj Dange <jasrajd@microsoft.com>,
	Michael Nelson <micn@microsoft.com>,
	Amit Banerjee <amit.banerjee@microsoft.com>,
	Travis Wright <twright@microsoft.com>,
	Bob Ward <bobward@microsoft.com>,
	Scott Konersmann <scottkon@microsoft.com>
Subject: RE: [PATCH 2/4] iomap: iomap_dio_rw() handles all sync writes
Date: Wed, 2 May 2018 14:27:37 +0000	[thread overview]
Message-ID: <BN6PR21MB07409BC39716894D592EB0C0B8800@BN6PR21MB0740.namprd21.prod.outlook.com> (raw)
In-Reply-To: <20180502024540.GA23861@dastard>

Thanks for your continued effort Dave.

In the current implementation the first write to the location updates the metadata and must issue the flush.   In Windows SQL Server can avoid this behavior.   SQL Server can issue DeviceIoControl with SET_FILE_VALID_DATA and then SetEndOfFile.  The SetEndOfFile acquires space and saves metadata without requiring the actual write.   This allows us to quickly create a large file and the writes do not need the added flush.

Is this something that fallocate could accommodate to avoid having to write once (triggers flush for metadata) and then secondary writes can use FUA and avoid the flush?



-----Original Message-----
From: Dave Chinner <david@fromorbit.com> 
Sent: Tuesday, May 1, 2018 9:46 PM
To: Jan Kara <jack@suse.cz>
Cc: linux-xfs@vger.kernel.org; linux-fsdevel@vger.kernel.org; linux-block@vger.kernel.org; hch@lst.de; Robert Dorr <rdorr@microsoft.com>
Subject: Re: [PATCH 2/4] iomap: iomap_dio_rw() handles all sync writes

On Sat, Apr 21, 2018 at 03:03:09PM +0200, Jan Kara wrote:
> On Wed 18-04-18 14:08:26, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Currently iomap_dio_rw() only handles (data)sync write completions 
> > for AIO. This means we can't optimised non-AIO IO to minimise device 
> > flushes as we can't tell the caller whether a flush is required or 
> > not.
> > 
> > To solve this problem and enable further optimisations, make 
> > iomap_dio_rw responsible for data sync behaviour for all IO, not 
> > just AIO.
> > 
> > In doing so, the sync operation is now accounted as part of the DIO 
> > IO by inode_dio_end(), hence post-IO data stability updates will no 
> > long race against operations that serialise via inode_dio_wait() 
> > such as truncate or hole punch.
> > 
> > Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> Looks good to me. You can add:
> 
> Reviewed-by: Jan Kara <jack@suse.cz>

It looks good, but it's broken in a subtle, nasty way. :/

> > @@ -768,14 +776,8 @@ static ssize_t iomap_dio_complete(struct 
> > iomap_dio *dio)  static void iomap_dio_complete_work(struct 
> > work_struct *work)  {
> >  	struct iomap_dio *dio = container_of(work, struct iomap_dio, aio.work);
> > -	struct kiocb *iocb = dio->iocb;
> > -	bool is_write = (dio->flags & IOMAP_DIO_WRITE);
> > -	ssize_t ret;
> >  
> > -	ret = iomap_dio_complete(dio);
> > -	if (is_write && ret > 0)
> > -		ret = generic_write_sync(iocb, ret);
> > -	iocb->ki_complete(iocb, ret, 0);
> > +	dio->iocb->ki_complete(dio->iocb, iomap_dio_complete(dio), 0);

This generates a use after free from KASAN from generic/016. it appears the compiler orders the code so that it dereferences
dio->iocb after iomap_dio_complete() has freed the dio structure
(yay!).

I'll post a new version of the patchset now that I've got changes to
2 of the 3 remaining patches in it....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2018-05-02 14:27 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-18  4:08 [PATCH 0/4 V2] iomap: Use FUA for O_DSYNC DIO writes Dave Chinner
2018-04-18  4:08 ` [PATCH 1/4] xfs: move generic_write_sync calls inwards Dave Chinner
2018-04-18  4:08 ` [PATCH 2/4] iomap: iomap_dio_rw() handles all sync writes Dave Chinner
2018-04-21 13:03   ` Jan Kara
2018-05-02  2:45     ` Dave Chinner
2018-05-02 14:27       ` Robert Dorr [this message]
2018-05-03 13:34         ` Jan Kara
2018-05-03 12:51       ` Jan Kara
2018-04-18  4:08 ` [PATCH 3/4] blk: add blk_queue_fua() helper function Dave Chinner
2018-04-18 10:34   ` Christoph Hellwig
2018-04-18 12:18     ` Matthew Wilcox
2018-04-18 14:23   ` Jens Axboe
2018-04-18  4:08 ` [PATCH 4/4] iomap: Use FUA for pure data O_DSYNC DIO writes Dave Chinner
2018-04-21 12:54   ` Jan Kara
2018-04-24 17:34     ` Christoph Hellwig
2018-04-24 22:07       ` Holger Hoffstätte
2018-04-25  5:20         ` Christoph Hellwig
2018-04-25 13:02         ` Jan Kara
2018-05-02  2:34       ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=BN6PR21MB07409BC39716894D592EB0C0B8800@BN6PR21MB0740.namprd21.prod.outlook.com \
    --to=rdorr@microsoft.com \
    --cc=amit.banerjee@microsoft.com \
    --cc=bobward@microsoft.com \
    --cc=david@fromorbit.com \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=jasrajd@microsoft.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=micn@microsoft.com \
    --cc=scottkon@microsoft.com \
    --cc=slavao@microsoft.com \
    --cc=twright@microsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).