From: Jerome Glisse
Date: Fri, 12 Apr 2013 09:32:46 -0400
Subject: Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes
In-Reply-To: <51679F46.7030901@gmail.com>
To: Simon Jeons
Cc: Michel Lespinasse, Shachar Raindel, lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss

On Fri, Apr 12, 2013 at 1:44 AM, Simon Jeons wrote:

> Hi Jerome,
>
> On 04/12/2013 10:57 AM, Jerome Glisse wrote:
>
> On Thu, Apr 11, 2013 at 9:54 PM, Simon Jeons wrote:
>
>> Hi Jerome,
>>
>> On 04/12/2013 02:38 AM, Jerome Glisse wrote:
>>
>>> On Thu, Apr 11, 2013 at 11:42:05AM +0800, Simon Jeons wrote:
>>>
>>>> Hi Jerome,
>>>> On 04/11/2013 04:45 AM, Jerome Glisse wrote:
>>>>
>>>>> On Wed, Apr 10, 2013 at 09:41:57AM +0800, Simon Jeons wrote:
>>>>>
>>>>>> Hi Jerome,
>>>>>> On 04/09/2013 10:21 PM, Jerome Glisse wrote:
>>>>>>
>>>>>>> On Tue, Apr 09, 2013 at 04:28:09PM +0800, Simon Jeons wrote:
>>>>>>>
>>>>>>>> Hi Jerome,
>>>>>>>> On 02/10/2013 12:29 AM, Jerome Glisse wrote:
>>>>>>>>
>>>>>>>>> On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> We would like to present a reference implementation for safely
>>>>>>>>>>> sharing memory pages from user space with the hardware, without
>>>>>>>>>>> pinning.
>>>>>>>>>>>
>>>>>>>>>>> We will be happy to hear the community feedback on our prototype
>>>>>>>>>>> implementation, and suggestions for future improvements.
>>>>>>>>>>>
>>>>>>>>>>> We would also like to discuss adding features to the core MM
>>>>>>>>>>> subsystem to assist hardware access to user memory without
>>>>>>>>>>> pinning.
>>>>>>>>>>
>>>>>>>>>> This sounds kinda scary TBH; however I do understand the need for
>>>>>>>>>> such technology.
>>>>>>>>>>
>>>>>>>>>> I think one issue is that many MM developers are insufficiently
>>>>>>>>>> aware of such developments; having a technology presentation would
>>>>>>>>>> probably help there; but traditionally LSF/MM sessions are more
>>>>>>>>>> interactive between developers who are already quite familiar with
>>>>>>>>>> the technology. I think it would help if you could send in advance
>>>>>>>>>> a detailed presentation of the problem and the proposed solutions
>>>>>>>>>> (and then what they require of the MM layer) so people can be
>>>>>>>>>> better prepared.
>>>>>>>>>>
>>>>>>>>>> And first I'd like to ask, aren't IOMMUs supposed to already
>>>>>>>>>> largely solve this problem? (probably a dumb question, but that
>>>>>>>>>> just tells you how much you need to explain :)
>>>>>>>>>
>>>>>>>>> For GPUs the motivation is threefold. With the advance of GPU
>>>>>>>>> compute, and also with newer graphics programs, we see a massive
>>>>>>>>> increase in GPU memory consumption. We can easily reach buffers
>>>>>>>>> that are bigger than 1 GB. So the first motivation is to directly
>>>>>>>>> use, on the GPU, the memory the user allocated through malloc;
>>>>>>>>> this avoids copying 1 GB of data with the CPU into the GPU
>>>>>>>>> buffer. The second, and most important for GPU compute, is using
>>>>>>>>> the GPU seamlessly with the CPU; to achieve this you want the
>>>>>>>>> programmer to have a single address space on the CPU and GPU, so
>>>>>>>>> that the same address points to the same object on the GPU as on
>>>>>>>>> the CPU. This would also be a tremendously cleaner design, from
>>>>>>>>> the driver point of view, for memory management.
>>>>>>>>>
>>>>>>>>> And last, and most important: with such big buffers (>1 GB),
>>>>>>>>> memory pinning is becoming way too expensive, and it also
>>>>>>>>> drastically reduces the freedom of the mm to free pages for other
>>>>>>>>> processes. Most of the time only a small window (everything is
>>>>>>>>> relative, the window can be >100 MB, not so small :)) of the
>>>>>>>>> object will be in use by the hardware. Hardware pagefault support
>>>>>>>>> would avoid the necessity to
>>>>>>>>
>>>>>>>> What's the meaning of hardware pagefault?
>>>>>>>
>>>>>>> It's a PCIe extension (well, it's a combination of extensions that
>>>>>>> allow that, see http://www.pcisig.com/specifications/iov/ats/). The
>>>>>>> idea is that the iommu can trigger a regular pagefault inside a
>>>>>>> process address space on behalf of the hardware. The only iommu
>>>>>>> supporting that right now is the AMD IOMMU v2 that you find on
>>>>>>> recent AMD platforms.
>>>>>>
>>>>>> Why do we need hardware page faults? A regular page fault is
>>>>>> triggered by the CPU MMU, correct?
>>>>>
>>>>> Well, here I abuse the regular page fault term. The idea is that with
>>>>> hardware page faults you don't need to pin memory or take a reference
>>>>> on a page for the hardware to use it. So the kernel can free, as
>>>>> usual, pages that would otherwise have been
>>>>
>>>> For the case when the GPU needs to pin memory, why does the GPU need
>>>> to grab the memory of a normal process instead of allocating it for
>>>> itself?
>>>
>>> Pinned memory is today's world, where the gpu allocates its own memory
>>> (GBs of memory) that disappears from kernel control, ie the kernel can
>>> no longer reclaim this memory; it's lost memory (I have already had
>>> complaints about that from users who saw GBs of memory vanish and
>>> couldn't understand why the GPU was using so much).
>>>
>>> In tomorrow's world we want the gpu to be able to access memory that
>>> the application allocated through a simple malloc, and we want the
>>> kernel to be able to recycle any page at any time, because of memory
>>> pressure or because the kernel decides to do so.
>>>
>>> That's just what we want to do. To achieve it we are getting hw that
>>> can do pagefaults. No change to kernel core mm code (some improvements
>>> might be made).
>>
>> The memory disappears since you have a reference (gup) against it,
>> correct?
>> In tomorrow's world you want the page fault triggered through the
>> iommu driver, which calls get_user_pages; it will also take a
>> reference (since gup is called), won't it? Anyway, assuming tomorrow's
>> world doesn't take a reference, don't we need to care about a page
>> used by the GPU being reclaimed?
>
> Right now the code uses gup because it's convenient, but it drops the
> reference right after the fault. So the reference is held only for a
> short period of time.
>
> Are you sure gup will drop the reference right after the fault? I dug
> into the code again and failed to verify it. Could you point it out to
> me?

In amd_iommu_v2.c, do_fault() does get_user_pages() followed by
put_page().
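Roughly, the fault path looks like the sketch below. Note this is a
simplified paraphrase of that era's driver, not the exact upstream code;
the field and helper names around the gup call are approximate:

static void do_fault(struct work_struct *work)
{
	struct fault *fault = container_of(work, struct fault, work);
	int write = !!(fault->flags & PPR_FAULT_WRITE);
	struct page *page;
	int npages;

	/* Resolve the device fault exactly like a CPU fault would. */
	down_read(&fault->state->mm->mmap_sem);
	npages = get_user_pages(NULL, fault->state->mm, fault->address,
				1, write, 0, &page, NULL);
	up_read(&fault->state->mm->mmap_sem);

	if (npages == 1)
		put_page(page);	/* drop the reference right away, no pin */
	else
		set_pri_tag_status(fault->state, fault->tag, PPR_INVALID);

	/* Answer the PPR so the device retries its access. */
	finish_pri_tag(fault->dev_state, fault->state, fault->tag);
}

So gup is only used here to populate the process page tables on behalf
of the device; once the peripheral page request is answered, reclaim is
free to evict the page again and the device will simply fault once more.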

Cheers,
Jerome