linux-nvdimm.lists.01.org archive mirror
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: alexander.h.duyck@linux.intel.com
Cc: "Jérôme Glisse" <jglisse@redhat.com>,
	"Barret Rhoden" <brho@google.com>,
	"Zhang, Yu C" <yu.c.zhang@intel.com>,
	"KVM list" <kvm@vger.kernel.org>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	"Jan Kara" <jack@suse.cz>, "David Hildenbrand" <david@redhat.com>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	"Linux MM" <linux-mm@kvack.org>,
	rkrcmar@redhat.com, "Paolo Bonzini" <pbonzini@redhat.com>,
	"Christoph Hellwig" <hch@lst.de>
Subject: Re: [PATCH RFC 2/3] mm: Add support for exposing if dev_pagemap supports refcount pinning
Date: Mon, 3 Dec 2018 13:05:37 -0800	[thread overview]
Message-ID: <CAPcyv4hPDjHzKd4wTh8Ujv-xL8YsJpcFXOp5ocJ-5fVJZ3=vRw@mail.gmail.com> (raw)
In-Reply-To: <2a3f70b011b56de2289e2f304b3d2d617c5658fb.camel@linux.intel.com>

On Mon, Dec 3, 2018 at 12:53 PM Alexander Duyck
<alexander.h.duyck@linux.intel.com> wrote:
>
> On Mon, 2018-12-03 at 12:31 -0800, Dan Williams wrote:
> > On Mon, Dec 3, 2018 at 12:21 PM Alexander Duyck
> > <alexander.h.duyck@linux.intel.com> wrote:
> > >
> > > On Mon, 2018-12-03 at 11:47 -0800, Dan Williams wrote:
> > > > On Mon, Dec 3, 2018 at 11:25 AM Alexander Duyck
> > > > <alexander.h.duyck@linux.intel.com> wrote:
> > > > >
> > > > > Add a means of exposing if a pagemap supports refcount pinning. I am doing
> > > > > this to expose if a given pagemap has backing struct pages that will allow
> > > > > for the reference count of the page to be incremented to lock the page
> > > > > into place.
> > > > >
> > > > > The KVM code already has several spots where it was trying to use a
> > > > > pfn_valid check combined with a PageReserved check to determien if it could
> > > > > take a reference on the page. I am adding this check so in the case of the
> > > > > page having the reserved flag checked we can check the pagemap for the page
> > > > > to determine if we might fall into the special DAX case.
> > > > >
> > > > > Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> > > > > ---
> > > > >  drivers/nvdimm/pfn_devs.c |    2 ++
> > > > >  include/linux/memremap.h  |    5 ++++-
> > > > >  include/linux/mm.h        |   11 +++++++++++
> > > > >  3 files changed, 17 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> > > > > index 6f22272e8d80..7a4a85bcf7f4 100644
> > > > > --- a/drivers/nvdimm/pfn_devs.c
> > > > > +++ b/drivers/nvdimm/pfn_devs.c
> > > > > @@ -640,6 +640,8 @@ static int __nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap)
> > > > >         } else
> > > > >                 return -ENXIO;
> > > > >
> > > > > +       pgmap->support_refcount_pinning = true;
> > > > > +
> > > >
> > > > There should be no dev_pagemap instance instance where this isn't
> > > > true, so I'm missing why this is needed?
> > >
> > > I thought in the case of HMM there were instances where you couldn't
> > > pin the page, isn't there? Specifically I am thinking of the definition
> > > of MEMORY_DEVICE_PUBLIC:
> > >   Device memory that is cache coherent from device and CPU point of
> > >   view. This is use on platform that have an advance system bus (like
> > >   CAPI or CCIX). A driver can hotplug the device memory using
> > >   ZONE_DEVICE and with that memory type. Any page of a process can be
> > >   migrated to such memory. However no one should be allow to pin such
> > >   memory so that it can always be evicted.
> > >
> > > It sounds like MEMORY_DEVICE_PUBLIC and MMIO would want to fall into
> > > the same category here in order to allow a hot-plug event to remove the
> > > device and take the memory with it, or is my understanding on this not
> > > correct?
> >
> > I don't understand how HMM expects to enforce no pinning, but in any
> > event it should always be the expectation an elevated reference count
> > on a page prevents that page from disappearing. Anything else is
> > broken.
>
> I don't think that is true for device MMIO though.
>
> In the case of MMIO you have the memory region backed by a device, if
> that device is hot-plugged or fails in some way then that backing would
> go away and the reads would return and all 1's response.

Until p2pdma there are no struct pages for device memory, is that what
you're referring?

Otherwise any device driver that leaks "struct pages" into random code
paths in the kernel had better not expect to be able to
surprise-remove those pages from the system. Any dev_pagemap user
should expect to do a coordinated removal with the driver that waits
for page references to drop before the device can be physically
removed.

> Holding a reference to the page doesn't guarantee that the backing
> device cannot go away.

Correct there is no physical guarantee, but that's not the point. It
needs to be coordinated, otherwise all bets are off with respect to
system stability.

> I believe that is the origin of the original use
> of the PageReserved check in KVM in terms of if it will try to use the
> get_page/put_page functions.

Is it? MMIO does not typically have a corresponding 'struct page'.

> I believe this is also why
> MEMORY_DEVICE_PUBLIC specifically calls out that you should not allow
> pinning such memory.

I don't think that call out was referencing device hotplug, I believe
it was the HMM expectation that it should be able to move an HMM page
from device to System-RAM at will.
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

  reply	other threads:[~2018-12-03 21:05 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-03 19:25 [PATCH RFC 0/3] Fix KVM misinterpreting Reserved page as an MMIO page Alexander Duyck
2018-12-03 19:25 ` [PATCH RFC 1/3] kvm: Split use cases for kvm_is_reserved_pfn to kvm_is_refcounted_pfn Alexander Duyck
2018-12-03 19:25 ` [PATCH RFC 2/3] mm: Add support for exposing if dev_pagemap supports refcount pinning Alexander Duyck
2018-12-03 19:47   ` Dan Williams
2018-12-03 20:21     ` Alexander Duyck
2018-12-03 20:31       ` Dan Williams
2018-12-03 20:53         ` Alexander Duyck
2018-12-03 21:05           ` Dan Williams [this message]
2018-12-03 21:50             ` Alexander Duyck
2018-12-04 19:08               ` Dan Williams
2018-12-04 22:51                 ` Alexander Duyck
2018-12-04 23:24                   ` Barret Rhoden
2018-12-05  0:01                     ` Alexander Duyck
2018-12-05  0:26                       ` Dan Williams
2018-12-05  8:13                         ` David Hildenbrand
2018-12-03 19:25 ` [PATCH RFC 3/3] kvm: Add additional check to determine if a page is refcounted Alexander Duyck
2018-12-04  6:59 ` [PATCH RFC 0/3] Fix KVM misinterpreting Reserved page as an MMIO page Yi Zhang
2018-12-04 18:45   ` Alexander Duyck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAPcyv4hPDjHzKd4wTh8Ujv-xL8YsJpcFXOp5ocJ-5fVJZ3=vRw@mail.gmail.com' \
    --to=dan.j.williams@intel.com \
    --cc=alexander.h.duyck@linux.intel.com \
    --cc=brho@google.com \
    --cc=david@redhat.com \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=jglisse@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=pbonzini@redhat.com \
    --cc=rkrcmar@redhat.com \
    --cc=yu.c.zhang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).