From: Jan Kara <jack@suse.cz>
To: NeilBrown <neilb@suse.com>
Cc: Jan Kara <jack@suse.cz>, linux-fsdevel@vger.kernel.org, "Wilcox, Matthew R" <matthew.r.wilcox@intel.com>, Ross Zwisler <ross.zwisler@linux.intel.com>, Dan Williams <dan.j.williams@intel.com>, linux-nvdimm@lists.01.org
Subject: Re: [PATCH 12/12] dax: New fault locking
Date: Fri, 18 Mar 2016 15:16:18 +0100
Message-ID: <20160318141618.GF7152@quack.suse.cz>
In-Reply-To: <87egbbh1cr.fsf@notabene.neil.brown.name>

On Wed 16-03-16 08:34:28, NeilBrown wrote:
> On Fri, Mar 11 2016, NeilBrown wrote:
>
> > On Fri, Mar 11 2016, Jan Kara wrote:
> >
> >> Currently DAX page fault locking is racy.
> >>
> >> CPU0 (write fault)              CPU1 (read fault)
> >>
> >> __dax_fault()                   __dax_fault()
> >>   get_block(inode, block, &bh, 0) -> not mapped
> >>                                   get_block(inode, block, &bh, 0)
> >>                                     -> not mapped
> >>   if (!buffer_mapped(&bh))
> >>     if (vmf->flags & FAULT_FLAG_WRITE)
> >>       get_block(inode, block, &bh, 1) -> allocates blocks
> >>   if (page) -> no
> >>                                   if (!buffer_mapped(&bh))
> >>                                     if (vmf->flags & FAULT_FLAG_WRITE) {
> >>                                     } else {
> >>                                       dax_load_hole();
> >>                                     }
> >>   dax_insert_mapping()
> >>
> >> And we are in a situation where we fail in dax_radix_entry() with -EIO.
> >>
> >> Another problem with the current DAX page fault locking is that there is
> >> no race-free way to clear dirty tag in the radix tree. We can always
> >> end up with clean radix tree and dirty data in CPU cache.
> >>
> >> We fix the first problem by introducing locking of exceptional radix
> >> tree entries in DAX mappings acting very similarly to page lock and thus
> >> synchronizing properly faults against the same mapping index. The same
> >> lock can later be used to avoid races when clearing radix tree dirty
> >> tag.
> >
> > Hi,
> >  I think the exception locking bits look good - I cannot comment on the
> >  rest.
> >  It looks like it was a good idea to bring the locking into dax.c instead
> >  of trying to make it generic.
> >
>
> Actually ... I'm still bothered by the exclusive waiting.  If an entry
> is locked and there are two threads in dax_pfn_mkwrite() then one would
> be woken up when the entry is unlocked and it will just set the TAG_DIRTY
> flag and then continue without ever waking the next waiter on the
> wait queue.
>
> I *think* that any thread which gets an exclusive wakeup is responsible
> for performing another wakeup.  In this case it must either lock the
> slot, or call __wakeup.
> That means:
>   grab_mapping_entry needs to call wakeup:
>     if radix_tree_preload() fails
>     if radix_tree_insert fails other than with -EEXIST
>     if a valid page was found

Why would we need to call wake up when a valid page was found? In that case
there should not be any process waiting for the radix tree entry lock.
Otherwise I agree with you. Thanks for pointing this out, you've likely
saved me quite some debugging ;).

>   dax_delete_mapping_entry needs to call wakeup
>     in the fail case, though as that isn't expected (WARN_ON_ONCE)
>     it shouldn't be a problem not to wakeup here
>   dax_pfn_mkwrite needs to call wakeup unconditionally

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
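
The wakeup-forwarding rule discussed above can be illustrated with a small userspace model. This is only a sketch, not the dax.c code: the names `waiter_action`, `run_waiters`, `stranded_waiters` and the two example arrays are hypothetical and exist purely for illustration. The model assumes that an unlock wakes exactly one exclusive waiter, and that a woken waiter either takes the entry lock (so its later unlock wakes the next waiter), bails out but forwards the wakeup, or bails out silently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What a woken waiter does once it runs (hypothetical, illustration only). */
enum waiter_action {
    TAKE_LOCK,     /* locks the entry; its later unlock wakes the next waiter */
    BAIL_FORWARD,  /* does not lock, but passes the wakeup on before leaving */
    BAIL_SILENT,   /* does not lock and wakes nobody: the wakeup is lost */
};

/*
 * Model of an exclusive wait queue on one locked entry.
 * The initial unlock wakes only the first waiter; each further wakeup
 * happens only if the waiter that just ran unlocks or forwards.
 * Returns the number of waiters that ever got to run.
 */
size_t run_waiters(const enum waiter_action *actions, size_t n)
{
    assert(actions != NULL || n == 0);

    size_t ran = 0;
    bool wakeup_pending = (n > 0);  /* the unlock wakes exactly one waiter */

    while (wakeup_pending && ran < n) {
        enum waiter_action a = actions[ran];
        ran++;
        /* Only a silent bail-out fails to produce the next wakeup. */
        wakeup_pending = (a != BAIL_SILENT);
    }
    return ran;
}

/* Number of waiters left sleeping on the queue forever. */
size_t stranded_waiters(const enum waiter_action *actions, size_t n)
{
    return n - run_waiters(actions, n);
}

/* Two mkwrite-style waiters that only set a tag, then one real locker. */
const enum waiter_action example_silent[]  = { BAIL_SILENT,  BAIL_SILENT,  TAKE_LOCK };
const enum waiter_action example_forward[] = { BAIL_FORWARD, BAIL_FORWARD, TAKE_LOCK };
```

In this model, `stranded_waiters(example_silent, 3)` leaves two waiters asleep because the first woken thread never passes the wakeup on, while `stranded_waiters(example_forward, 3)` leaves none. That matches the rule stated in the message: a thread that receives an exclusive wakeup must either lock the slot or issue another wakeup.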