From: Dan Williams <dan.j.williams@intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <mawilcox@microsoft.com>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Dave Chinner <david@fromorbit.com>,
	linux-xfs <linux-xfs@vger.kernel.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH v4 17/18] mm, fs, dax: dax_flush_dma, handle dma vs block-map-change collisions
Date: Fri, 9 Mar 2018 08:15:06 -0800
Message-ID: <CAPcyv4iO1TVLXPPC=RXHWLfaSXO0wBcSOUMJevZgbeVmF+vrTA@mail.gmail.com>
In-Reply-To: <20180309125625.n6hpu6ooi6rhzqzs@quack2.suse.cz>

On Fri, Mar 9, 2018 at 4:56 AM, Jan Kara <jack@suse.cz> wrote:
> On Thu 08-03-18 09:02:30, Dan Williams wrote:
>> On Mon, Jan 8, 2018 at 5:50 AM, Jan Kara <jack@suse.cz> wrote:
>> > On Sun 07-01-18 13:58:42, Dan Williams wrote:
>> >> On Thu, Jan 4, 2018 at 3:12 AM, Jan Kara <jack@suse.cz> wrote:
>> >> > On Sat 23-12-17 16:57:31, Dan Williams wrote:
>> >> >
>> >> >> +     /*
>> >> >> +      * Flush dax_dma_lock() sections to ensure all possible page
>> >> >> +      * references have been taken, or will block on the fs
>> >> >> +      * 'mmap_lock'.
>> >> >> +      */
>> >> >> +     synchronize_rcu();
>> >> >
>> >> > Frankly, I don't like synchronize_rcu() in a relatively hot path like this.
>> >> > Cannot we just abuse get_dev_pagemap() to fail if truncation is in progress
>> >> > for the pfn? We could indicate that by some bit in struct page or something
>> >> > like that.
>> >>
>> >> We would need a lockless way to take a reference conditionally if the
>> >> page is not subject to truncation.
>> >>
>> >> I recall the raid5 code did something similar where it split a
>> >> reference count into 2 fields. I.e. take page->_refcount and use the
>> >> upper bits as a truncation count. Something like:
>> >>
>> >> do {
>> >>     old = atomic_read(&page->_refcount);
>> >>     if (old & trunc_mask) /* upper bits of _refcount */
>> >>         return false;
>> >>     new = old + 1;
>> >> } while (atomic_cmpxchg(&page->_refcount, old, new) != old);
>> >> return true; /* we incremented the _refcount while the truncation
>> >> count was zero */
>> >>
>> >> ...the only concern is teaching the put_page() path to consider that
>> >> 'trunc_mask' when determining that the page is idle.
>> >>
>> >> Other ideas?
>> >
>> > What I rather thought about was an update to GUP paths (like
>> > follow_page_pte()):
>> >
>> >         if (flags & FOLL_GET) {
>> >                 get_page(page);
>> >                 if (pte_devmap(pte)) {
>> >                         /*
>> >                          * Pairs with the barrier in the truncate path.
>> >                          * Could be possibly _after_atomic version of the
>> >                          * barrier.
>> >                          */
>> >                         smp_mb();
>> >                         if (PageTruncateInProgress(page)) {
>> >                                 put_page(page);
>> >                                 ..bail...
>> >                         }
>> >                 }
>> >         }
>> >
>> > and in the truncate path:
>> >
>> >         down_write(inode->i_mmap_sem);
>> >         walk all pages in the mapping and mark them PageTruncateInProgress().
>> >         unmap_mapping_range(...);
>> >         /*
>> >          * Pairs with the barrier in GUP path. In fact not necessary since
>> >          * unmap_mapping_range() provides us with the barrier already.
>> >          */
>> >         smp_mb();
>> >         /*
>> >          * By now we are either guaranteed to see grabbed page reference or
>> >          * GUP is guaranteed to see PageTruncateInProgress().
>> >          */
>> >         while ((page = dax_find_referenced_page(mapping))) {
>> >                 ...
>> >         }
>> >
>> > The barriers need some verification, I've opted for the conservative option
>> > but I guess you get the idea.
>>
>> [ Reviving this thread for the next rev of this patch set for 4.17
>> consideration ]
>>
>> I don't think this barrier scheme can work in the presence of
>> get_user_pages_fast(). The get_user_pages_fast() path can race
>> unmap_mapping_range() to take out an elevated reference count on a
>> page.
>
> Why the scheme cannot work? Sure you'd need to patch also gup_pte_range()
> and a similar thing for PMDs to recheck PageTruncateInProgress() after
> grabbing the page reference. But in principle I don't see anything
> fundamentally different between gup_fast() and plain gup().

Ah, yes I didn't grok the abort on PageTruncateInProgress() until I
read this again (and again), I'll try that.
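
A minimal userspace sketch of the recheck scheme quoted above, for illustration only. It is not kernel code: `struct model_page` and the `model_*` helpers are hypothetical names, C11 sequentially consistent atomics stand in for the `smp_mb()` pairing, and a reference count of zero is assumed to mean "idle". The point it shows is the ordering: the GUP side takes its reference first and then rechecks the flag, while the truncate side sets the flag first and then looks for references, so at least one side observes the other.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; not the kernel layout. */
struct model_page {
	atomic_int refcount;     /* models page->_refcount */
	atomic_bool truncating;  /* models PageTruncateInProgress() */
};

/* Drop one reference; models put_page(). */
static void model_put(struct model_page *page)
{
	int old = atomic_fetch_sub(&page->refcount, 1);

	assert(old > 0); /* catch unbalanced puts in the model */
}

/*
 * GUP side: take the reference, then recheck the flag.
 * The default (seq_cst) atomics order the increment before the flag
 * load, which is the role smp_mb() plays in the quoted proposal.
 * Returns false when the caller must bail out.
 */
static bool model_gup_get(struct model_page *page)
{
	atomic_fetch_add(&page->refcount, 1); /* models get_page() */
	if (atomic_load(&page->truncating)) {
		model_put(page);
		return false;
	}
	return true;
}

/* Truncate side, step 1: mark the page before looking for references. */
static void model_truncate_begin(struct model_page *page)
{
	atomic_store(&page->truncating, true);
}

/*
 * Truncate side, step 2: after the flag is set, either this load sees a
 * reference taken earlier, or a later model_gup_get() sees the flag.
 * Assumption: zero references means idle in this model.
 */
static bool model_page_idle(struct model_page *page)
{
	return atomic_load(&page->refcount) == 0;
}

/* Truncate side, step 3: clear the flag once the block map change is done. */
static void model_truncate_end(struct model_page *page)
{
	atomic_store(&page->truncating, false);
}
```

In this model a truncater would call `model_truncate_begin()`, wait until `model_page_idle()` returns true, change the block map, and then call `model_truncate_end()`. A bailed-out `model_gup_get()` may raise the count briefly, so the waiter has to tolerate transient references.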
