linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Shiyang Ruan <ruansy.fnst@cn.fujitsu.com>,
	Goldwyn Rodrigues <rgoldwyn@suse.de>,
	linux-btrfs@vger.kernel.org, kilobyte@angband.pl,
	linux-fsdevel@vger.kernel.org, willy@infradead.org, hch@lst.de,
	dsterba@suse.cz, nborisov@suse.com, linux-nvdimm@lists.01.org
Subject: Re: [PATCH 04/18] dax: Introduce IOMAP_DAX_COW to CoW edges during writes
Date: Thu, 30 May 2019 13:16:05 +0200	[thread overview]
Message-ID: <20190530111605.GC29237@quack2.suse.cz> (raw)
In-Reply-To: <20190529221445.GE16786@dread.disaster.area>

On Thu 30-05-19 08:14:45, Dave Chinner wrote:
> On Wed, May 29, 2019 at 03:46:29PM +0200, Jan Kara wrote:
> > On Wed 29-05-19 14:46:58, Dave Chinner wrote:
> > >  iomap_apply()
> > > 
> > >  	->iomap_begin()
> > > 		map old data extent that we copy from
> > > 
> > > 		allocate new data extent we copy to in data fork,
> > > 		immediately replacing old data extent
> > > 
> > > 		return transaction handle as private data
> 
> This holds the inode block map locked exclusively across the IO,
> so....

Does it? We do hold XFS_IOLOCK_EXCL during the whole dax write. But
xfs_file_iomap_begin() does release XFS_ILOCK_* on exit AFAICS. So I don't
see anything that would prevent page fault from mapping blocks into page
tables just after xfs_file_iomap_begin() returns.

> > > 	dax_iomap_actor()
> > > 		copies data from old extent to new extent
> > > 
> > > 	->iomap_end
> > > 		commits transaction now data has been copied, making
> > > 		the COW operation atomic with the data copy.
> > > 
> > > 
> > > This, in fact, should be how we do all DAX writes that require
> > > allocation, because then we get rid of the need to zero newly
> > > allocated or unwritten extents before we copy the data into it. i.e.
> > > we only need to write once to newly allocated storage rather than
> > > twice.
> > 
> > You need to be careful though. You need to synchronize with page faults so
> > that they cannot see and expose in page tables blocks you've allocated
> > before their contents is filled.
> 
> ... so the page fault will block trying to map the blocks because
> it can't get the xfs_inode->i_ilock until the allocation transaciton
> commits....
> 
> > This race was actually the strongest
> > motivation for pre-zeroing of blocks. OTOH copy_from_iter() in
> > dax_iomap_actor() needs to be able to fault pages to copy from (and these
> > pages may be from the same file you're writing to) so you cannot just block
> > faulting for the file through I_MMAP_LOCK.
> 
> Right, it doesn't take the I_MMAP_LOCK, but it would block further
> in. And, really, I'm not caring all this much about this corner
> case. i.e.  anyone using a "mmap()+write() zero copy" pattern on DAX
> within a file is unbeleivably naive - the data still gets copied by
> the CPU in the write() call. It's far simpler and more effcient to
> just mmap() both ranges of the file(s) and memcpy() in userspace....
> 
> FWIW, it's to avoid problems with stupid userspace stuff that nobody
> really should be doing that I want range locks for the XFS inode
> locks.  If userspace overlaps the ranges and deadlocks in that case,
> they they get to keep all the broken bits because, IMO, they are
> doing something monumentally stupid. I'd probably be making it
> return EDEADLOCK back out to userspace in the case rather than
> deadlocking but, fundamentally, I think it's broken behaviour that
> we should be rejecting with an error rather than adding complexity
> trying to handle it.

I agree with this. We must just prevent user from taking the kernel down
with maliciously created IOs...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2019-05-30 11:16 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-29 17:26 [PATCH v4 00/18] btrfs dax support Goldwyn Rodrigues
2019-04-29 17:26 ` [PATCH 01/18] btrfs: create a mount option for dax Goldwyn Rodrigues
2019-05-21 18:02   ` Darrick J. Wong
2019-04-29 17:26 ` [PATCH 02/18] btrfs: Carve out btrfs_get_extent_map_write() out of btrfs_get_blocks_write() Goldwyn Rodrigues
2019-04-29 17:26 ` [PATCH 03/18] btrfs: basic dax read Goldwyn Rodrigues
2019-05-21 15:14   ` Darrick J. Wong
2019-05-22 21:50     ` Goldwyn Rodrigues
2019-04-29 17:26 ` [PATCH 04/18] dax: Introduce IOMAP_DAX_COW to CoW edges during writes Goldwyn Rodrigues
2019-05-21 16:51   ` Darrick J. Wong
2019-05-22 20:14     ` Goldwyn Rodrigues
2019-05-23  2:10       ` Dave Chinner
2019-05-23  9:05     ` Shiyang Ruan
2019-05-23 11:51       ` Goldwyn Rodrigues
2019-05-27  8:25         ` Shiyang Ruan
2019-05-28  9:17           ` Jan Kara
2019-05-29  2:01             ` Shiyang Ruan
2019-05-29  2:47               ` Dave Chinner
2019-05-29  4:02                 ` Shiyang Ruan
2019-05-29  4:07                   ` Darrick J. Wong
2019-05-29  4:46                     ` Dave Chinner
2019-05-29 13:46                       ` Jan Kara
2019-05-29 22:14                         ` Dave Chinner
2019-05-30 11:16                           ` Jan Kara [this message]
2019-05-30 22:59                             ` Dave Chinner
2019-04-29 17:26 ` [PATCH 05/18] btrfs: return whether extent is nocow or not Goldwyn Rodrigues
2019-04-29 17:26 ` [PATCH 06/18] btrfs: Rename __endio_write_update_ordered() to btrfs_update_ordered_extent() Goldwyn Rodrigues
2019-04-29 17:26 ` [PATCH 07/18] btrfs: add dax write support Goldwyn Rodrigues
2019-05-21 17:08   ` Darrick J. Wong
2019-04-29 17:26 ` [PATCH 08/18] dax: memcpy page in case of IOMAP_DAX_COW for mmap faults Goldwyn Rodrigues
2019-05-21 17:46   ` Darrick J. Wong
2019-05-22 19:11     ` Goldwyn Rodrigues
2019-05-23  4:02       ` Darrick J. Wong
2019-05-23 12:10     ` Jan Kara
2019-04-29 17:26 ` [PATCH 09/18] btrfs: Add dax specific address_space_operations Goldwyn Rodrigues
2019-04-29 17:26 ` [PATCH 10/18] dax: replace mmap entry in case of CoW Goldwyn Rodrigues
2019-05-21 17:35   ` Darrick J. Wong
2019-05-23 13:38   ` Jan Kara
2019-04-29 17:26 ` [PATCH 11/18] btrfs: add dax mmap support Goldwyn Rodrigues
2019-04-29 17:26 ` [PATCH 12/18] btrfs: allow MAP_SYNC mmap Goldwyn Rodrigues
2019-05-10 15:32   ` [PATCH for-goldwyn] btrfs: disallow MAP_SYNC outside of DAX mounts Adam Borowski
2019-05-10 15:41     ` Dan Williams
2019-05-10 15:59       ` Pankaj Gupta
2019-05-23 13:44   ` [PATCH 12/18] btrfs: allow MAP_SYNC mmap Jan Kara
2019-05-23 16:19     ` Adam Borowski
2019-04-29 17:26 ` [PATCH 13/18] fs: dedup file range to use a compare function Goldwyn Rodrigues
2019-05-21 18:17   ` Darrick J. Wong
2019-04-29 17:26 ` [PATCH 14/18] dax: memcpy before zeroing range Goldwyn Rodrigues
2019-05-21 17:27   ` Darrick J. Wong
2019-04-29 17:26 ` [PATCH 15/18] btrfs: handle dax page zeroing Goldwyn Rodrigues
2019-04-29 17:26 ` [PATCH 16/18] btrfs: Writeprotect mmap pages on snapshot Goldwyn Rodrigues
2019-05-23 14:04   ` Jan Kara
2019-05-23 15:27     ` Goldwyn Rodrigues
2019-05-23 19:07       ` Jan Kara
2019-05-23 21:22         ` Goldwyn Rodrigues
2019-04-29 17:26 ` [PATCH 17/18] btrfs: Disable dax-based defrag and send Goldwyn Rodrigues
2019-04-29 17:26 ` [PATCH 18/18] btrfs: trace functions for btrfs_iomap_begin/end Goldwyn Rodrigues
  -- strict thread matches above, loose matches on Subject: below --
2019-04-16 16:41 [PATCH v3 00/18] btrfs dax support Goldwyn Rodrigues
2019-04-16 16:41 ` [PATCH 04/18] dax: Introduce IOMAP_DAX_COW to CoW edges during writes Goldwyn Rodrigues
2019-04-17 16:46   ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190530111605.GC29237@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=dsterba@suse.cz \
    --cc=hch@lst.de \
    --cc=kilobyte@angband.pl \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=nborisov@suse.com \
    --cc=rgoldwyn@suse.de \
    --cc=ruansy.fnst@cn.fujitsu.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).