netdev.vger.kernel.org archive mirror
From: John Hubbard <jhubbard@nvidia.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
	"Al Viro" <viro@zeniv.linux.org.uk>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Benjamin Herrenschmidt" <benh@kernel.crashing.org>,
	"Björn Töpel" <bjorn.topel@intel.com>,
	"Christoph Hellwig" <hch@infradead.org>,
	"Daniel Vetter" <daniel@ffwll.ch>,
	"Dave Chinner" <david@fromorbit.com>,
	"David Airlie" <airlied@linux.ie>,
	"David S . Miller" <davem@davemloft.net>,
	"Ira Weiny" <ira.weiny@intel.com>, "Jan Kara" <jack@suse.cz>,
	"Jason Gunthorpe" <jgg@ziepe.ca>, "Jens Axboe" <axboe@kernel.dk>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Magnus Karlsson" <magnus.karlsson@intel.com>,
	"Mauro Carvalho Chehab" <mchehab@kernel.org>,
	"Michael Ellerman" <mpe@ellerman.id.au>,
	"Michal Hocko" <mhocko@suse.com>,
	"Mike Kravetz" <mike.kravetz@oracle.com>,
	"Paul Mackerras" <paulus@samba.org>,
	"Shuah Khan" <shuah@kernel.org>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	bpf@vger.kernel.org,
	"Maling list - DRI developers" <dri-devel@lists.freedesktop.org>,
	"KVM list" <kvm@vger.kernel.org>,
	linux-block@vger.kernel.org,
	"Linux Doc Mailing List" <linux-doc@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-kselftest@vger.kernel.org,
	"Linux-media@vger.kernel.org" <linux-media@vger.kernel.org>,
	linux-rdma <linux-rdma@vger.kernel.org>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
	Netdev <netdev@vger.kernel.org>, "Linux MM" <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"Ralph Campbell" <rcampbell@nvidia.com>
Subject: Re: [PATCH v4 04/23] mm: devmap: refactor 1-based refcounting for ZONE_DEVICE pages
Date: Wed, 13 Nov 2019 14:46:50 -0800
Message-ID: <00148078-1795-da3e-916e-3ae2dcdd553d@nvidia.com>
In-Reply-To: <CAPcyv4hr64b-k4j7ZY796+k-+Dy11REMcvPJ+QjTsyJ3vSdfKg@mail.gmail.com>

On 11/13/19 2:00 PM, Dan Williams wrote:
...
>> Ugh, when did all this HMM specific manipulation sneak into the
>> generic ZONE_DEVICE path? It used to be gated by pgmap type with its
>> own put_zone_device_private_page(). For example it's certainly
>> unnecessary and might be broken (would need to check) to call
>> mem_cgroup_uncharge() on a DAX page. ZONE_DEVICE users are not a
>> monolith and the HMM use case leaks pages into code paths that DAX
>> explicitly avoids.
> 
> It's been this way for a while and I did not react previously,
> apologies for that. I think __ClearPageActive, __ClearPageWaiters, and
> mem_cgroup_uncharge, belong behind a device-private conditional. The
> history here is:
> 
> Move some, but not all HMM specifics to hmm_devmem_free():
>      2fa147bdbf67 mm, dev_pagemap: Do not clear ->mapping on final put
> 
> Remove the clearing of mapping since no upstream consumers needed it:
>      b7a523109fb5 mm: don't clear ->mapping in hmm_devmem_free
> 
> Add it back in once an upstream consumer arrived:
>      7ab0ad0e74f8 mm/hmm: fix ZONE_DEVICE anon page mapping reuse
> 
> We're now almost entirely free of ->page_free callbacks except for
> that weird nouveau case, can that FIXME in nouveau_dmem_page_free()
> also result in killing the ->page_free() callback altogether? In the
> meantime I'm proposing a cleanup like this:


OK, assuming this is acceptable (no obvious problems jump out at me,
and we can also test it with HMM), how would you like to proceed as far
as patches go: add such a patch to this series, or send it as a
stand-alone patch either before or after this series? Or something else?
And did you plan on sending it out as such?

Also, the diff didn't quite make it through intact to my "git apply", so
I'm re-posting it in the hope that it survives this time:

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index f9f76f6ba07b..21db1ce8c0ae 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -338,13 +338,7 @@ static void pmem_release_disk(void *__pmem)
  	put_disk(pmem->disk);
  }
  
-static void pmem_pagemap_page_free(struct page *page)
-{
-	wake_up_var(&page->_refcount);
-}
-
  static const struct dev_pagemap_ops fsdax_pagemap_ops = {
-	.page_free		= pmem_pagemap_page_free,
  	.kill			= pmem_pagemap_kill,
  	.cleanup		= pmem_pagemap_cleanup,
  };
diff --git a/mm/memremap.c b/mm/memremap.c
index 03ccbdfeb697..157edb8f7cf8 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -419,12 +419,6 @@ void __put_devmap_managed_page(struct page *page)
  	 * holds a reference on the page.
  	 */
  	if (count == 1) {
-		/* Clear Active bit in case of parallel mark_page_accessed */
-		__ClearPageActive(page);
-		__ClearPageWaiters(page);
-
-		mem_cgroup_uncharge(page);
-
  		/*
  		 * When a device_private page is freed, the page->mapping field
  		 * may still contain a (stale) mapping value. For example, the
@@ -446,10 +440,17 @@ void __put_devmap_managed_page(struct page *page)
  		 * handled differently or not done at all, so there is no need
  		 * to clear page->mapping.
  		 */
-		if (is_device_private_page(page))
-			page->mapping = NULL;
+		if (is_device_private_page(page)) {
+			/* Clear Active bit in case of parallel mark_page_accessed */
+			__ClearPageActive(page);
+			__ClearPageWaiters(page);
  
-		page->pgmap->ops->page_free(page);
+			mem_cgroup_uncharge(page);
+
+			page->mapping = NULL;
+			page->pgmap->ops->page_free(page);
+		} else
+			wake_up_var(&page->_refcount);
  	} else if (!count)
  		__put_page(page);
  }
-- 
2.24.0
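
For anyone following along without a kernel tree handy, the control flow
that the mm/memremap.c hunk above ends up with can be modeled in plain
userspace C. This is only a rough sketch under stated assumptions: the
model_* names, the struct fields and the counters are all hypothetical
stand-ins invented for illustration, not kernel definitions, and the
plain decrement stands in for whatever atomic helper the real function
uses. It shows only the proposed gating: the HMM-style cleanup and the
->page_free() call happen for device-private pages alone, and every
other devmap-managed page just gets a wakeup on its refcount.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the pgmap type; not the kernel enum. */
enum model_pgmap_type { MODEL_DEVICE_PRIVATE, MODEL_FS_DAX };

/* Hypothetical stand-in for struct page, reduced to what the hunk touches. */
struct model_page {
	int refcount;		/* 1-based: a count of 1 means "free" */
	enum model_pgmap_type type;
	bool active;		/* models the Active page flag */
	bool waiters;		/* models the Waiters page flag */
	bool charged;		/* models a memcg charge */
	void *mapping;		/* models page->mapping */
	int page_free_calls;	/* times the driver ->page_free() would run */
	int wakeups;		/* times wake_up_var(&page->_refcount) would run */
	bool released;		/* set where the real code calls __put_page() */
};

/* Models the post-patch shape of __put_devmap_managed_page(). */
static void model_put_devmap_managed_page(struct model_page *page)
{
	int count;

	assert(page->refcount >= 1);
	count = --page->refcount;

	if (count == 1) {
		if (page->type == MODEL_DEVICE_PRIVATE) {
			/* Device-private only: flag cleanup, uncharge, callback. */
			page->active = false;
			page->waiters = false;
			page->charged = false;
			page->mapping = NULL;
			page->page_free_calls++;
		} else {
			/* Everything else: only wake refcount waiters. */
			page->wakeups++;
		}
	} else if (count == 0) {
		page->released = true;
	}
}
```

In this model, dropping a reference from 2 to 1 on a MODEL_FS_DAX page
bumps only the wakeups counter and leaves the flags, charge and mapping
alone, which is the behavior change the patch is after for DAX.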


thanks,
-- 
John Hubbard
NVIDIA

> 
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index ad8e4df1282b..4eae441f86c9 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -337,13 +337,7 @@ static void pmem_release_disk(void *__pmem)
>          put_disk(pmem->disk);
>   }
> 
> -static void pmem_pagemap_page_free(struct page *page)
> -{
> -       wake_up_var(&page->_refcount);
> -}
> -
>   static const struct dev_pagemap_ops fsdax_pagemap_ops = {
> -       .page_free              = pmem_pagemap_page_free,
>          .kill                   = pmem_pagemap_kill,
>          .cleanup                = pmem_pagemap_cleanup,
>   };
> diff --git a/mm/memremap.c b/mm/memremap.c
> index 03ccbdfeb697..157edb8f7cf8 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -419,12 +419,6 @@ void __put_devmap_managed_page(struct page *page)
>           * holds a reference on the page.
>           */
>          if (count == 1) {
> -               /* Clear Active bit in case of parallel mark_page_accessed */
> -               __ClearPageActive(page);
> -               __ClearPageWaiters(page);
> -
> -               mem_cgroup_uncharge(page);
> -
>                  /*
>                   * When a device_private page is freed, the page->mapping field
>                   * may still contain a (stale) mapping value. For example, the
> @@ -446,10 +440,17 @@ void __put_devmap_managed_page(struct page *page)
>                   * handled differently or not done at all, so there is no need
>                   * to clear page->mapping.
>                   */
> -               if (is_device_private_page(page))
> -                       page->mapping = NULL;
> +               if (is_device_private_page(page)) {
> +                       /* Clear Active bit in case of parallel
> mark_page_accessed */
> +                       __ClearPageActive(page);
> +                       __ClearPageWaiters(page);
> 
> -               page->pgmap->ops->page_free(page);
> +                       mem_cgroup_uncharge(page);
> +
> +                       page->mapping = NULL;
> +                       page->pgmap->ops->page_free(page);
> +               } else
> +                       wake_up_var(&page->_refcount);
>          } else if (!count)
>                  __put_page(page);
>   }
> 

