From: Jerome Glisse <jglisse@redhat.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>,
	Ingo Molnar <mingo@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>, Ingo Molnar <mingo@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Logan Gunthorpe <logang@deltatee.com>,
	Kirill Shutemov <kirill.shutemov@linux.intel.com>
Subject: Re: [PATCH v2] mm, zone_device: replace {get, put}_zone_device_page() with a single reference
Date: Mon, 1 May 2017 16:32:37 -0400
Message-ID: <20170501203236.GA20927@redhat.com>
In-Reply-To: <CAPcyv4gFMyXhqY9enam5v9nFwjSULLE=PUEqGP0psLMcA9fzDA@mail.gmail.com>

On Mon, May 01, 2017 at 01:19:24PM -0700, Dan Williams wrote:
> On Mon, May 1, 2017 at 6:55 AM, Jerome Glisse <jglisse@redhat.com> wrote:
> > On Mon, May 01, 2017 at 01:23:59PM +0300, Kirill A. Shutemov wrote:
> >> On Sun, Apr 30, 2017 at 07:14:24PM -0400, Jerome Glisse wrote:
> >> > On Sat, Apr 29, 2017 at 01:17:26PM +0300, Kirill A. Shutemov wrote:
> >> > > On Fri, Apr 28, 2017 at 03:33:07PM -0400, Jerome Glisse wrote:
> >> > > > On Fri, Apr 28, 2017 at 12:22:24PM -0700, Dan Williams wrote:
> >> > > > > Are you sure about needing to hook the 2 -> 1 transition? Could we
> >> > > > > change ZONE_DEVICE pages to not have an elevated reference count when
> >> > > > > they are created so you can keep the HMM references out of the mm hot
> >> > > > > path?
> >> > > >
> >> > > > 100% sure on that :) I need to call back into the driver on the 2->1
> >> > > > transition, no way around that. If we change ZONE_DEVICE to not have an
> >> > > > elevated reference count then you need to make a lot more changes to mm
> >> > > > so that ZONE_DEVICE is never used as a fallback for memory allocation.
> >> > > > You also need to make changes to be sure that ZONE_DEVICE pages never
> >> > > > end up in one of the paths that try to put them back on the lru. There
> >> > > > are a lot of places that would need to be updated, and it would be
> >> > > > highly intrusive and add a lot of special cases to other hot code paths.
> >> > >
> >> > > Could you explain more on where the requirement comes from or point me to
> >> > > where I can read about this?
> >> > >
> >> >
> >> > HMM ZONE_DEVICE pages are used like other pages (anonymous or file-backed
> >> > pages) in _any_ vma. So I need to know when a page is freed, i.e. as a
> >> > result of unmap, exit, migration, or anything else that would free the
> >> > memory. For zone device, a page is free once its refcount reaches 1, so I
> >> > need to catch the refcount transition from 2->1.
> >>
> >> What if we would rework zone device to have pages with refcount 0 at
> >> start?
> >
> > That is a _lot_ of work off the top of my head, because it would need
> > changes in a lot of places and likely in more hot code paths than simply
> > adding something to put_page(). Note that I only need something in
> > put_page(); I do not need anything in the get page path. Is adding a
> > conditional branch for HMM pages in put_page() that much of a problem?
> >
> >
> >> > This is the only way I can inform the device that the page is now free. See
> >> >
> >> > https://cgit.freedesktop.org/~glisse/linux/commit/?h=hmm-v21&id=52da8fe1a088b87b5321319add79e43b8372ed7d
> >> >
> >> > There is _no_ way around that.
> >>
> >> I'm still not convinced that it's impossible.
> >>
> >> Could you describe lifecycle for pages in case of HMM?
> >
> > A process mallocs something and hands it over to some function in the
> > program that uses the GPU; that function calls a GPU API (OpenCL, CUDA,
> > ...) that triggers a migration to device memory.
> >
> > So in the kernel you get a migration like any existing migration: the
> > original page is unmapped, and if the refcount is all ok (no pin) then a
> > device page is allocated and things are migrated to device memory.
> >
> > What happens after is unknown. Either the userspace/kernel driver decides
> > to migrate back to system memory, or there is an munmap, or there is a
> > CPU page fault, ... So from that point on the device page has the exact
> > same life as a regular page.
> >
> > Above I describe the migrate case, but you can also have a new memory
> > allocation that directly allocates device memory. For instance, if the
> > GPU takes a page fault on an address that isn't backed by anything then
> > we can directly allocate a device page. No migration is involved in that
> > case.
> >
> > HMM pages are like any other pages in most respects. The exceptions are:
> >   - no GUP
> >   - no KSM
> >   - no lru reclaim
> >   - no NUMA balancing
> >   - no regular migration (existing migrate_page)
> >
> > The fact that the minimum refcount for ZONE_DEVICE is 1 already gives
> > us most of the above exceptions for free. Converting the refcount to
> > be like other pages would mean that all of the above would need to
> > be audited and probably modified to ignore ZONE_DEVICE pages (I am
> > pretty sure Dan does not want any of the above either).
> 
> Right, adding HMM references to get_page() and put_page() seems less
> intrusive. Given how uncommon HMM hardware is (insert grumble about no
> visible upstream user of this functionality) I think the 'static
> branch' approach helps mitigate the impact for everything else.
> Looking back, I should have used that mechanism for the pmem use case,
> but it's moot now.

I do not need anything in get_page(); all I need is something in put_page()
to catch the 2 -> 1 refcount transition to know when a page is freed.
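
To make the asymmetry concrete, here is a minimal user-space sketch of a
put-side hook. It is not kernel code: struct sketch_page, hmm_enabled,
sketch_get_page(), sketch_put_page() and sketch_driver_free() are all
hypothetical names, and the plain bool only stands in for the 'static
branch' mentioned above. It assumes a device page is created with a
refcount of 1 and counts as free again once it drops back to 1.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; not the kernel definition. */
struct sketch_page {
	atomic_int refcount;	/* device pages idle at 1, never at 0 */
	bool is_device;		/* stands in for a ZONE_DEVICE check */
	void (*page_free)(struct sketch_page *page);	/* driver callback */
};

/* Stands in for a static branch key; false when no such device exists. */
static bool hmm_enabled;

/* Counts driver notifications so the effect can be observed. */
static int freed_count;

/* Hypothetical driver callback: the device learns the page is free. */
static void sketch_driver_free(struct sketch_page *page)
{
	(void)page;
	freed_count++;
}

/* The get side needs no hook at all: it only takes a reference. */
static void sketch_get_page(struct sketch_page *page)
{
	atomic_fetch_add(&page->refcount, 1);
}

/* The put side checks for the 2 -> 1 transition on device pages. */
static void sketch_put_page(struct sketch_page *page)
{
	/* atomic_fetch_sub() returns the value held before the decrement. */
	int old = atomic_fetch_sub(&page->refcount, 1);

	assert(old > 0);
	if (hmm_enabled && page->is_device && old == 2 && page->page_free)
		page->page_free(page);
}
```

With hmm_enabled left false the extra condition short-circuits on its
first test, which is roughly the cost a static branch is meant to reduce
further for everyone who does not have this kind of hardware.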

Cheers,
Jérôme
