nvdimm.lists.linux.dev archive mirror
From: Dan Williams <dan.j.williams@intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
	Josh Triplett <josh.triplett@intel.com>,
	Mike Snitzer <snitzer@redhat.com>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Matthew Wilcox <mawilcox@microsoft.com>,
	Dave Chinner <david@fromorbit.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Christoph Hellwig <hch@lst.de>,
	linux-xfs <linux-xfs@vger.kernel.org>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Paul McKenney <paulmck@linux.vnet.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v8 15/18] mm, fs, dax: handle layout changes to pinned dax mappings
Date: Thu, 19 Apr 2018 20:00:09 -0700
Message-ID: <CAPcyv4ievTQtaMRBe8Pdz=f-rvpcv82tLDTFbJ6v5WSixU91pg@mail.gmail.com>
In-Reply-To: <20180419104432.7lzk7nbjmwav6ojl@quack2.suse.cz>

On Thu, Apr 19, 2018 at 3:44 AM, Jan Kara <jack@suse.cz> wrote:
> On Fri 13-04-18 15:03:51, Dan Williams wrote:
>> On Mon, Apr 9, 2018 at 9:51 AM, Dan Williams <dan.j.williams@intel.com> wrote:
>> > On Mon, Apr 9, 2018 at 9:49 AM, Jan Kara <jack@suse.cz> wrote:
>> >> On Sat 07-04-18 12:38:24, Dan Williams wrote:
>> > [..]
>> >>> I wonder if this can be trivially solved by using srcu. I.e. we don't
>> >>> need to wait for a global quiescent state, just a
>> >>> get_user_pages_fast() quiescent state. ...or is that an abuse of the
>> >>> srcu api?
>> >>
>> >> Well, I'd rather use the percpu rwsemaphore (linux/percpu-rwsem.h) than
>> >> SRCU. It is a more-or-less standard locking mechanism rather than relying
>> >> on implementation properties of SRCU which is a data structure protection
>> >> method. And the overhead of percpu rwsemaphore for your use case should be
>> >> about the same as that of SRCU.
>> >
>> > I was just about to ask that. Yes, it seems they would share similar
>> > properties and it would be better to use the explicit implementation
>> > rather than a side effect of srcu.
>>
>> ...unfortunately:
>>
>>  BUG: sleeping function called from invalid context at
>> ./include/linux/percpu-rwsem.h:34
>>  [..]
>>  Call Trace:
>>   dump_stack+0x85/0xcb
>>   ___might_sleep+0x15b/0x240
>>   dax_layout_lock+0x18/0x80
>>   get_user_pages_fast+0xf8/0x140
>>
>> ...and thinking about it more srcu is a better fit. We don't need the
>> 100% exclusion provided by an rwsem; we only need the guarantee that
>> all cpus that might have been running get_user_pages_fast() have
>> finished it at least once.
>>
>> In my tests synchronize_srcu is a bit slower than unpatched for the
>> trivial 100 truncate test, but certainly not the 200x latency you were
>> seeing with synchronize_rcu.
>>
>> Elapsed time:
>> 0.006149178 unpatched
>> 0.009426360 srcu
>
> Hum, right. Yesterday I was looking into KSM for a different reason and
> I've noticed it also write-protects pages and deals with races with GUP.
> And what KSM relies on is:
>
> write_protect_page()
>   ...
>   entry = ptep_clear_flush(vma, pvmw.address, pvmw.pte);
>   /*
>    * Check that no O_DIRECT or similar I/O is in progress on the
>    * page
>    */
>   if (page_mapcount(page) + 1 + swapped != page_count(page)) {
>     page used -> bail

Slick.

>   }
>
> And this really works because gup_pte_range() does:
>
>   page = pte_page(pte);
>   head = compound_head(page);
>
>   if (!page_cache_get_speculative(head))
>     goto pte_unmap;
>
>   if (unlikely(pte_val(pte) != pte_val(*ptep))) {
>     bail

Need to add a similar check to __gup_device_huge_pmd.

>   }
>
> So either write_protect_page() sees the elevated reference or
> gup_pte_range() bails because it will see the pte changed.
>
> In the truncate path things are a bit different but in principle the same
> should work - once truncate blocks page faults and unmaps pages from page
> tables, we can be sure GUP will not grab the page anymore or we'll see
> elevated page count. So IMO there's no need for any additional locking
> against the GUP path (but a comment explaining this is highly desirable I
> guess).

Yes, those "pte_val(pte) != pte_val(*ptep)" checks should be
documented for the same reason we require comments on rmb/wmb pairs.
I'll take a look, thanks Jan.
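
As a starting point for that comment, the argument above can be
modelled in userspace. This is only a sketch, not kernel code: every
toy_* name is hypothetical, a plain integer stands in for the pte, and
C11 sequentially consistent atomics stand in for whatever ordering
ptep_clear_flush() and page_cache_get_speculative() actually provide.
Each side writes its own location first (the pte, or the refcount) and
only then reads the other, so at least one of them observes the other.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
struct toy_page {
	atomic_int refcount;	/* models page_count() */
	int mapcount;		/* models page_mapcount() */
};

struct toy_pte {
	_Atomic uint64_t val;	/* models the pte; 0 means "cleared" */
};

/* GUP-fast side: speculative reference, then recheck the pte. */
static bool toy_gup_try_pin(struct toy_pte *ptep, struct toy_page *page,
			    uint64_t snapshot)
{
	if (snapshot == 0)
		return false;			/* nothing was mapped */

	atomic_fetch_add(&page->refcount, 1);	/* speculative get */

	if (snapshot != atomic_load(&ptep->val)) {
		/* pte changed under us: drop the reference and bail */
		atomic_fetch_sub(&page->refcount, 1);
		return false;
	}
	return true;				/* caller now holds a pin */
}

/* Unmap side: clear the pte first, then look for unexpected references. */
static bool toy_unmap_is_idle(struct toy_pte *ptep, struct toy_page *page)
{
	atomic_store(&ptep->val, 0);		/* models ptep_clear_flush() */
	/* one reference per mapping plus one held by the caller */
	return atomic_load(&page->refcount) == page->mapcount + 1;
}

/* Put the toy page back into the "mapped once, caller holds a ref" state. */
static void toy_reset(struct toy_pte *ptep, struct toy_page *page)
{
	atomic_store(&ptep->val, 0x1000);
	atomic_store(&page->refcount, 2);
	page->mapcount = 1;
}

/* Step through two interleavings sequentially and assert the invariant. */
static void toy_check_orderings(void)
{
	struct toy_pte pte = { 0 };
	struct toy_page page = { 0 };
	uint64_t snap;
	bool pinned, idle;

	/* A: GUP completes first, so the unmap side sees the page busy. */
	toy_reset(&pte, &page);
	pinned = toy_gup_try_pin(&pte, &page, atomic_load(&pte.val));
	idle = toy_unmap_is_idle(&pte, &page);
	assert(pinned && !idle);

	/* B: GUP reads the pte, unmap runs, then GUP tries to pin. */
	toy_reset(&pte, &page);
	snap = atomic_load(&pte.val);
	idle = toy_unmap_is_idle(&pte, &page);
	pinned = toy_gup_try_pin(&pte, &page, snap);
	assert(idle && !pinned);

	/* The failed pin must not leak its speculative reference. */
	assert(atomic_load(&page.refcount) == 2);
}
```

The model only steps through the interleavings one at a time; it says
nothing about which barriers a given architecture needs, which is
exactly the part the in-tree comment has to spell out.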
