From: Jan Kara <jack@suse.cz>
To: NeilBrown <neilb@suse.com>
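Cc: Jan Kara <jack@suse.cz>,
	linux-fsdevel@vger.kernel.org, "Wilcox,
	Matthew R" <matthew.r.wilcox@intel.com>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	linux-nvdimm@lists.01.org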
Subject: Re: [PATCH 12/12] dax: New fault locking
Date: Fri, 18 Mar 2016 16:39:19 +0100
Message-ID: <20160318153919.GG7152@quack.suse.cz>
In-Reply-To: <20160318141618.GF7152@quack.suse.cz>

On Fri 18-03-16 15:16:18, Jan Kara wrote:
> On Wed 16-03-16 08:34:28, NeilBrown wrote:
> > On Fri, Mar 11 2016, NeilBrown wrote:
> > 
> > > On Fri, Mar 11 2016, Jan Kara wrote:
> > >
> > >> Currently DAX page fault locking is racy.
> > >>
> > >> CPU0 (write fault)		CPU1 (read fault)
> > >>
> > >> __dax_fault()			__dax_fault()
> > >>   get_block(inode, block, &bh, 0) -> not mapped
> > >> 				  get_block(inode, block, &bh, 0)
> > >> 				    -> not mapped
> > >>   if (!buffer_mapped(&bh))
> > >>     if (vmf->flags & FAULT_FLAG_WRITE)
> > >>       get_block(inode, block, &bh, 1) -> allocates blocks
> > >>   if (page) -> no
> > >> 				  if (!buffer_mapped(&bh))
> > >> 				    if (vmf->flags & FAULT_FLAG_WRITE) {
> > >> 				    } else {
> > >> 				      dax_load_hole();
> > >> 				    }
> > >>   dax_insert_mapping()
> > >>
> > >> And we are in a situation where we fail in dax_radix_entry() with -EIO.
> > >>
> > >> Another problem with the current DAX page fault locking is that there is
> > >> no race-free way to clear dirty tag in the radix tree. We can always
> > >> end up with clean radix tree and dirty data in CPU cache.
> > >>
> > >> We fix the first problem by introducing locking of exceptional radix
> > >> tree entries in DAX mappings acting very similarly to page lock and thus
> > >> synchronizing properly faults against the same mapping index. The same
> > >> lock can later be used to avoid races when clearing radix tree dirty
> > >> tag.
> > >
> > > Hi,
> > >  I think the exception locking bits look good - I cannot comment on the
> > >  rest.
> > >  It looks like it was a good idea to bring the locking into dax.c instead
> > >  of trying to make it generic.
> > >
> > 
> > Actually ... I'm still bothered by the exclusive waiting.  If an entry
> > is locked and there are two threads in dax_pfn_mkwrite() then one would
> > be woken up when the entry is unlocked and it will just set the TAG_DIRTY
> > flag and then continue without ever waking the next waiter on the
> > wait queue.
> > 
> > I *think* that any thread which gets an exclusive wakeup is responsible
> > for performing another wakeup.  In this case it must either lock the
> > slot, or call __wake_up().
> > That means:
> >   grab_mapping_entry needs to call wakeup:
> >      if radix_tree_preload() fails
> >      if radix_tree_insert fails other than with -EEXIST
> >      if a valid page was found
> 
> Why would we need to call wake up when a valid page was found? In that case
> there should not be any process waiting for the radix tree entry lock.
> Otherwise I agree with you. Thanks for pointing this out, you've likely
> saved me quite some debugging ;).
> 
> 
> >   dax_delete_mapping_entry needs to call wakeup
> >      in the fail case, though as that isn't expected (WARN_ON_ONCE)
> >         it shouldn't be a problem not to wake up here
> >   dax_pfn_mkwrite needs to call wakeup unconditionally

Actually, after some thought I don't think the wakeup is needed except for
dax_pfn_mkwrite(). In the other cases we know there is no radix tree
exceptional entry and thus there can be no waiters for its lock...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
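
The wakeup rule discussed in this thread can be illustrated with a small
userspace model. This is only a sketch under stated assumptions, not the
kernel code and not the patch: the names waiter_kind, WAITER_LOCKS_ENTRY,
WAITER_ONLY_TAGS and stranded_waiters() are hypothetical, and the wait
queue is reduced to an array that is woken strictly one waiter at a time.
A waiter that locks the entry will unlock it later and so wakes the next
waiter; a dax_pfn_mkwrite()-style waiter only sets the dirty tag, so
unless it passes the wakeup on explicitly, everyone queued behind it
keeps sleeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical waiter kinds for this model; these are not kernel types. */
enum waiter_kind {
	WAITER_LOCKS_ENTRY,	/* takes the entry lock and unlocks it later */
	WAITER_ONLY_TAGS,	/* dax_pfn_mkwrite()-like: sets the dirty tag only */
};

/*
 * Model of an exclusive (wake-one) wait queue on a single locked entry.
 * All n waiters are already queued and the lock holder is about to unlock.
 * Returns the number of waiters that are never woken.
 */
size_t stranded_waiters(const enum waiter_kind *queue, size_t n,
			bool tagger_passes_wakeup)
{
	size_t next = 0;		/* first waiter that is still asleep */
	bool wakeup_pending = true;	/* the initial unlock wakes one waiter */

	assert(queue != NULL || n == 0);

	while (wakeup_pending && next < n) {
		enum waiter_kind kind = queue[next++];

		if (kind == WAITER_LOCKS_ENTRY) {
			/* It now holds the lock; its unlock wakes one more. */
			wakeup_pending = true;
		} else {
			/* It never took the lock, so no unlock will follow. */
			wakeup_pending = tagger_passes_wakeup;
		}
	}
	return n - next;
}
```

With two WAITER_ONLY_TAGS waiters queued, the model leaves one waiter
stranded when the wakeup is not passed on and none when it is, which is
the dax_pfn_mkwrite() case described above.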
