From: Jan Kara <jack@suse.cz>
To: Dave Chinner <david@fromorbit.com>
Cc: kilobyte@angband.pl, Jan Kara <jack@suse.cz>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	nborisov@suse.com, Goldwyn Rodrigues <rgoldwyn@suse.de>,
	linux-nvdimm@lists.01.org, dsterba@suse.cz, willy@infradead.org,
	linux-fsdevel@vger.kernel.org, hch@lst.de,
	linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 04/18] dax: Introduce IOMAP_DAX_COW to CoW edges during writes
Date: Thu, 30 May 2019 13:16:05 +0200
Message-ID: <20190530111605.GC29237@quack2.suse.cz>
In-Reply-To: <20190529221445.GE16786@dread.disaster.area>

On Thu 30-05-19 08:14:45, Dave Chinner wrote:
> On Wed, May 29, 2019 at 03:46:29PM +0200, Jan Kara wrote:
> > On Wed 29-05-19 14:46:58, Dave Chinner wrote:
> > >  iomap_apply()
> > > 
> > >  	->iomap_begin()
> > > 		map old data extent that we copy from
> > > 
> > > 		allocate new data extent we copy to in data fork,
> > > 		immediately replacing old data extent
> > > 
> > > 		return transaction handle as private data
> 
> This holds the inode block map locked exclusively across the IO,
> so....

Does it? We do hold XFS_IOLOCK_EXCL during the whole dax write. But
xfs_file_iomap_begin() does release XFS_ILOCK_* on exit AFAICS. So I don't
see anything that would prevent page fault from mapping blocks into page
tables just after xfs_file_iomap_begin() returns.
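
To make the ordering concrete, here is a toy, single-threaded model of the
window I mean. It is only a sketch: none of these names are kernel symbols,
the struct and the model_*() helpers are made up for illustration, and the
single flag standing in for XFS_ILOCK_EXCL ignores everything else the real
code does.

```c
#include <stdbool.h>

/* Hypothetical toy state, not a kernel structure. */
struct model {
	bool ilock_held;   /* stands in for XFS_ILOCK_EXCL being held */
	bool block_mapped; /* new extent is visible in the block map */
	bool data_copied;  /* new extent has been filled with data */
};

/* Models ->iomap_begin(): allocate the new extent under the lock. */
static void model_iomap_begin(struct model *m, bool hold_lock_across_io)
{
	m->ilock_held = true;
	m->block_mapped = true;
	if (!hold_lock_across_io)
		m->ilock_held = false; /* lock dropped on exit */
}

/* Models the actor: copy the data into the new extent. */
static void model_actor(struct model *m)
{
	m->data_copied = true;
}

/* Models ->iomap_end(): commit and drop the lock if still held. */
static void model_iomap_end(struct model *m)
{
	m->ilock_held = false;
}

/*
 * Would a page fault arriving right now map a block whose contents
 * have not been written yet? A fault that finds the lock held is
 * treated as blocked, so it exposes nothing.
 */
static bool model_fault_exposes_uninit(const struct model *m)
{
	if (m->ilock_held)
		return false;
	return m->block_mapped && !m->data_copied;
}
```

With hold_lock_across_io == false the model reports an exposed block between
model_iomap_begin() and model_actor(), which is the race. With it set to true
the fault blocks instead, which is where the copy_from_iter() concern quoted
below comes in.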

> > > 	dax_iomap_actor()
> > > 		copies data from old extent to new extent
> > > 
> > > 	->iomap_end
> > > 		commits transaction now data has been copied, making
> > > 		the COW operation atomic with the data copy.
> > > 
> > > 
> > > This, in fact, should be how we do all DAX writes that require
> > > allocation, because then we get rid of the need to zero newly
> > > allocated or unwritten extents before we copy the data into it. i.e.
> > > we only need to write once to newly allocated storage rather than
> > > twice.
> > 
> > You need to be careful though. You need to synchronize with page faults so
> > that they cannot see and expose in page tables blocks you've allocated
> > > before their contents are filled.
> 
> ... so the page fault will block trying to map the blocks because
> it can't get the xfs_inode->i_ilock until the allocation transaction
> commits....
> 
> > This race was actually the strongest
> > motivation for pre-zeroing of blocks. OTOH copy_from_iter() in
> > dax_iomap_actor() needs to be able to fault pages to copy from (and these
> > pages may be from the same file you're writing to) so you cannot just block
> > faulting for the file through I_MMAP_LOCK.
> 
> Right, it doesn't take the I_MMAP_LOCK, but it would block further
> in. And, really, I'm not caring all this much about this corner
> case. i.e.  anyone using a "mmap()+write() zero copy" pattern on DAX
> within a file is unbelievably naive - the data still gets copied by
> the CPU in the write() call. It's far simpler and more efficient to
> just mmap() both ranges of the file(s) and memcpy() in userspace....
> 
> FWIW, it's to avoid problems with stupid userspace stuff that nobody
> really should be doing that I want range locks for the XFS inode
> locks.  If userspace overlaps the ranges and deadlocks in that case,
> then they get to keep all the broken bits because, IMO, they are
> doing something monumentally stupid. I'd probably be making it
> return EDEADLOCK back out to userspace in the case rather than
> deadlocking but, fundamentally, I think it's broken behaviour that
> we should be rejecting with an error rather than adding complexity
> trying to handle it.

I agree with this. We just must prevent users from taking the kernel down
with maliciously crafted IOs...
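
For reference, the userspace alternative mentioned above (mmap() both ranges
and memcpy() between them) could look roughly like the sketch below. The
helper name copy_range_mmap() is made up for illustration. It assumes fd is
open O_RDWR, that both ranges already lie within the file, and that the
ranges do not overlap (the two mappings sit at unrelated virtual addresses,
so memmove() could not detect a file-level overlap anyway).

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy len bytes inside one file from src_off to
 * dst_off through two shared mappings. Returns 0 on success, -1 with
 * errno set on failure.
 */
static int copy_range_mmap(int fd, off_t src_off, off_t dst_off, size_t len)
{
	long page = sysconf(_SC_PAGESIZE);

	if (page <= 0 || src_off < 0 || dst_off < 0) {
		errno = EINVAL;
		return -1;
	}
	if (len == 0)
		return 0;
	/* Reject overlapping file ranges (see the note above). */
	if (src_off < dst_off + (off_t)len && dst_off < src_off + (off_t)len) {
		errno = EINVAL;
		return -1;
	}

	/* mmap() offsets must be page aligned, so map from the page start. */
	size_t src_pad = (size_t)(src_off % page);
	size_t dst_pad = (size_t)(dst_off % page);

	char *src = mmap(NULL, len + src_pad, PROT_READ, MAP_SHARED,
			 fd, src_off - (off_t)src_pad);
	if (src == MAP_FAILED)
		return -1;

	char *dst = mmap(NULL, len + dst_pad, PROT_READ | PROT_WRITE,
			 MAP_SHARED, fd, dst_off - (off_t)dst_pad);
	if (dst == MAP_FAILED) {
		int saved = errno;

		munmap(src, len + src_pad);
		errno = saved;
		return -1;
	}

	/* The single CPU copy; no write() call is involved. */
	memcpy(dst + dst_pad, src + src_pad, len);

	/* Push the destination range out before unmapping. */
	int ret = msync(dst, len + dst_pad, MS_SYNC);
	int saved = errno;

	munmap(dst, len + dst_pad);
	munmap(src, len + src_pad);
	errno = saved;
	return ret;
}
```

A caller would do something like copy_range_mmap(fd, 100, 9000, 9) on a file
that is at least 9009 bytes long. Error handling here is minimal on purpose.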

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
