From: Jan Kara <jack@suse.cz>
To: Dave Chinner <david@fromorbit.com>
Cc: kilobyte@angband.pl, Jan Kara <jack@suse.cz>,
    "Darrick J. Wong" <darrick.wong@oracle.com>, nborisov@suse.com,
    Goldwyn Rodrigues <rgoldwyn@suse.de>, linux-nvdimm@lists.01.org,
    dsterba@suse.cz, willy@infradead.org, linux-fsdevel@vger.kernel.org,
    hch@lst.de, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 04/18] dax: Introduce IOMAP_DAX_COW to CoW edges during writes
Date: Thu, 30 May 2019 13:16:05 +0200
Message-ID: <20190530111605.GC29237@quack2.suse.cz>
In-Reply-To: <20190529221445.GE16786@dread.disaster.area>

On Thu 30-05-19 08:14:45, Dave Chinner wrote:
> On Wed, May 29, 2019 at 03:46:29PM +0200, Jan Kara wrote:
> > On Wed 29-05-19 14:46:58, Dave Chinner wrote:
> > > iomap_apply()
> > >
> > >     ->iomap_begin()
> > >         map old data extent that we copy from
> > >
> > >         allocate new data extent we copy to in data fork,
> > >         immediately replacing old data extent
> > >
> > >         return transaction handle as private data
>
> This holds the inode block map locked exclusively across the IO,
> so....

Does it? We do hold XFS_IOLOCK_EXCL during the whole dax write. But
xfs_file_iomap_begin() does release XFS_ILOCK_* on exit AFAICS. So I don't
see anything that would prevent a page fault from mapping blocks into page
tables just after xfs_file_iomap_begin() returns.

> > >     dax_iomap_actor()
> > >         copies data from old extent to new extent
> > >
> > >     ->iomap_end
> > >         commits transaction now data has been copied, making
> > >         the COW operation atomic with the data copy.
> > >
> > >
> > > This, in fact, should be how we do all DAX writes that require
> > > allocation, because then we get rid of the need to zero newly
> > > allocated or unwritten extents before we copy the data into it. i.e.
> > > we only need to write once to newly allocated storage rather than
> > > twice.
> >
> > You need to be careful though. You need to synchronize with page faults so
> > that they cannot see and expose in page tables blocks you've allocated
> > before their contents are filled.
>
> ... so the page fault will block trying to map the blocks because
> it can't get the xfs_inode->i_ilock until the allocation transaction
> commits....
>
> > This race was actually the strongest
> > motivation for pre-zeroing of blocks. OTOH copy_from_iter() in
> > dax_iomap_actor() needs to be able to fault pages to copy from (and these
> > pages may be from the same file you're writing to) so you cannot just block
> > faulting for the file through I_MMAP_LOCK.
>
> Right, it doesn't take the I_MMAP_LOCK, but it would block further
> in. And, really, I'm not caring all this much about this corner
> case. i.e. anyone using a "mmap()+write() zero copy" pattern on DAX
> within a file is unbelievably naive - the data still gets copied by
> the CPU in the write() call. It's far simpler and more efficient to
> just mmap() both ranges of the file(s) and memcpy() in userspace....
>
> FWIW, it's to avoid problems with stupid userspace stuff that nobody
> really should be doing that I want range locks for the XFS inode
> locks. If userspace overlaps the ranges and deadlocks in that case,
> then they get to keep all the broken bits because, IMO, they are
> doing something monumentally stupid. I'd probably be making it
> return EDEADLOCK back out to userspace in the case rather than
> deadlocking but, fundamentally, I think it's broken behaviour that
> we should be rejecting with an error rather than adding complexity
> trying to handle it.

I agree with this. We must just prevent the user from taking the kernel
down with maliciously created IOs...

                                                                Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
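The userspace alternative Dave mentions above (mmap() both ranges and memcpy() between them, instead of passing an mmap()ed buffer to write()) might look roughly like the sketch below. This is only an illustration, not code from the patch series: copy_range_mmap() is a hypothetical helper name, it assumes page-aligned offsets and a file that already covers both ranges, and it uses plain MAP_SHARED so that it also runs on non-DAX filesystems. It rejects overlapping ranges with an error, in the spirit of the "reject rather than handle" argument in the message.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch series): copy len bytes from
 * src_off to dst_off within the file behind fd by mapping both ranges
 * and copying in userspace.
 *
 * Assumptions: fd is open O_RDWR, both offsets are page aligned, and the
 * file already extends over both ranges (touching pages past EOF would
 * raise SIGBUS). Returns 0 on success, -1 with errno set on failure.
 */
static int copy_range_mmap(int fd, off_t src_off, off_t dst_off, size_t len)
{
	long page = sysconf(_SC_PAGESIZE);

	if (page <= 0 || len == 0 || src_off % page || dst_off % page) {
		errno = EINVAL;
		return -1;
	}

	/* Reject overlapping ranges instead of trying to handle them. */
	if (src_off < dst_off + (off_t)len && dst_off < src_off + (off_t)len) {
		errno = EINVAL;
		return -1;
	}

	void *src = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, src_off);
	if (src == MAP_FAILED)
		return -1;

	void *dst = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
			 fd, dst_off);
	if (dst == MAP_FAILED) {
		int saved = errno;

		munmap(src, len);
		errno = saved;
		return -1;
	}

	/* The only data copy: CPU loads and stores, no write() syscall. */
	memcpy(dst, src, len);

	/* Push the destination range to stable storage before unmapping. */
	int ret = msync(dst, len, MS_SYNC);
	int saved = errno;

	munmap(dst, len);
	munmap(src, len);
	errno = saved;
	return ret;
}
```

A caller would open the file O_RDWR, make sure it is large enough (for example with ftruncate()), and call copy_range_mmap(fd, 0, page_size, page_size) to copy the first page onto the second.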