From: Simon Jeons <simon.jeons@gmail.com>
To: Jerome Glisse <j.glisse@gmail.com>
Cc: Michel Lespinasse <walken@google.com>,
	Shachar Raindel <raindel@mellanox.com>,
	lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
	Andrea Arcangeli <aarcange@redhat.com>,
	Roland Dreier <roland@purestorage.com>,
	Haggai Eran <haggaie@mellanox.com>,
	Or Gerlitz <ogerlitz@mellanox.com>,
	Sagi Grimberg <sagig@mellanox.com>,
	Liran Liss <liranl@mellanox.com>
Subject: Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes
Date: Fri, 12 Apr 2013 13:44:38 +0800
Message-ID: <51679F46.7030901@gmail.com>
In-Reply-To: <CAH3drwYee1mKMPcT5QJNsaGGEvJHNTPFEvndpvS+HkeuwwAYmg@mail.gmail.com>

Hi Jerome,
On 04/12/2013 10:57 AM, Jerome Glisse wrote:
> On Thu, Apr 11, 2013 at 9:54 PM, Simon Jeons <simon.jeons@gmail.com> wrote:
>
>     Hi Jerome,
>
>     On 04/12/2013 02:38 AM, Jerome Glisse wrote:
>
>         On Thu, Apr 11, 2013 at 11:42:05AM +0800, Simon Jeons wrote:
>
>             Hi Jerome,
>             On 04/11/2013 04:45 AM, Jerome Glisse wrote:
>
>                 On Wed, Apr 10, 2013 at 09:41:57AM +0800, Simon Jeons wrote:
>
>                     Hi Jerome,
>                     On 04/09/2013 10:21 PM, Jerome Glisse wrote:
>
>                         On Tue, Apr 09, 2013 at 04:28:09PM +0800, Simon Jeons wrote:
>
>                             Hi Jerome,
>                             On 02/10/2013 12:29 AM, Jerome Glisse wrote:
>
>                                 On Sat, Feb 9, 2013 at 1:05 AM, Michel
>                                 Lespinasse <walken@google.com> wrote:
>
>                                     On Fri, Feb 8, 2013 at 3:18 AM, Shachar
>                                     Raindel <raindel@mellanox.com> wrote:
>
>                                         Hi,
>
>                                         We would like to present a reference
>                                         implementation for safely sharing
>                                         memory pages from user space with
>                                         the hardware, without pinning.
>
>                                         We will be happy to hear the
>                                         community feedback on our prototype
>                                         implementation, and suggestions for
>                                         future improvements.
>
>                                         We would also like to discuss adding
>                                         features to the core MM subsystem to
>                                         assist hardware access to user
>                                         memory without pinning.
>
>                                     This sounds kinda scary TBH; however I
>                                     do understand the need for such
>                                     technology.
>
>                                     I think one issue is that many MM
>                                     developers are insufficiently aware of
>                                     such developments; having a technology
>                                     presentation would probably help there;
>                                     but traditionally LSF/MM sessions are
>                                     more interactive between developers who
>                                     are already quite familiar with the
>                                     technology. I think it would help if you
>                                     could send in advance a detailed
>                                     presentation of the problem and the
>                                     proposed solutions (and then what they
>                                     require of the MM layer) so people can
>                                     be better prepared.
>
>                                     And first I'd like to ask, aren't IOMMUs
>                                     supposed to already largely solve this
>                                     problem? (probably a dumb question, but
>                                     that just tells you how much you need to
>                                     explain :)
>
>                                 For GPUs the motivation is threefold. With
>                                 the advance of GPU compute, and with newer
>                                 graphics programs, we see a massive
>                                 increase in GPU memory consumption; we can
>                                 easily reach buffers bigger than 1GB. So
>                                 the first motivation is to let the GPU
>                                 directly use the memory the user allocated
>                                 through malloc, which avoids copying 1GB of
>                                 data from the CPU to the GPU buffer. The
>                                 second, and most important for GPU compute,
>                                 is using the GPU seamlessly with the CPU;
>                                 to achieve this you want the programmer to
>                                 have a single address space on the CPU and
>                                 GPU, so that the same address points to the
>                                 same object on the GPU as on the CPU. This
>                                 would also be a tremendously cleaner
>                                 design, from the driver's point of view,
>                                 for memory management.
>
>                                 And last, the most important: with such big
>                                 buffers (>1GB), memory pinning becomes far
>                                 too expensive and also drastically reduces
>                                 the freedom of the mm to free pages for
>                                 other processes. Most of the time only a
>                                 small window of the object will be in use
>                                 by the hardware (everything is relative;
>                                 the window can be >100MB, so not that small
>                                 :)). Hardware pagefault support would avoid
>                                 the necessity to
>
>                             What's the meaning of a hardware pagefault?
>
>                         It's a PCIe extension (well, a combination of
>                         extensions that allows it; see
>                         http://www.pcisig.com/specifications/iov/ats/). The
>                         idea is that the iommu can trigger a regular
>                         pagefault inside a process address space on behalf
>                         of the hardware. The only iommu supporting that
>                         right now is the AMD iommu v2 that you find on
>                         recent AMD platforms.
>
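(For concreteness, servicing such a hardware-initiated fault on the CPU side
looks roughly like the sketch below. This is a simplified illustration
modeled on the in-tree amd_iommu_v2 driver, not its actual code;
handle_device_fault is a made-up name, and 2013-era kernel interfaces such
as mm->mmap_sem are assumed.)

  #include <linux/mm.h>
  #include <linux/sched.h>

  /*
   * Sketch: the IOMMU driver receives a page request from the device
   * and runs the normal CPU fault path on the target mm, so no page
   * needs to be pinned up front.
   */
  static void handle_device_fault(struct mm_struct *mm,
                                  unsigned long address, bool write)
  {
          struct vm_area_struct *vma;

          down_read(&mm->mmap_sem);
          vma = find_vma(mm, address);
          if (vma && vma->vm_start <= address)
                  /* Fault the page in exactly as a CPU fault would. */
                  handle_mm_fault(mm, vma, address,
                                  write ? FAULT_FLAG_WRITE : 0);
          up_read(&mm->mmap_sem);
          /* The IOMMU then completes the page request back to the device. */
  }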
>                     Why do we need hardware page faults? A regular page
>                     fault is triggered by the CPU MMU, correct?
>
>                 Well, here I abuse the term "regular page fault". The idea
>                 is that with hardware page faults you don't need to pin
>                 memory or take a reference on a page for the hardware to
>                 use it, so the kernel can free, as usual, pages that would
>                 otherwise have been
>
>             For the case where the GPU needs to pin memory, why does the
>             GPU need to grab the memory of a normal process instead of
>             allocating it for itself?
>
>         Pinned memory is today's world, where the gpu allocates its own
>         memory (GBs of it) that disappears from kernel control, i.e. the
>         kernel can no longer reclaim this memory; it's lost memory (I have
>         already had complaints about that from users who saw GBs of memory
>         vanish and couldn't understand why the GPU was using so much).
>
>         In tomorrow's world we want the gpu to be able to access memory
>         that the application allocated through a simple malloc, and we
>         want the kernel to be able to recycle any page at any time,
>         because of memory pressure or because the kernel decides to do so.
>
>         That's just what we want to do. To achieve it we are getting
>         hardware that can do pagefaults. No change to kernel core mm code
>         is needed (though some improvements might be made).
>
>
>     The memory disappears because you hold a reference (gup) against it,
>     correct? In tomorrow's world you want the page fault triggered through
>     the iommu driver, which calls get_user_pages; that will also take a
>     reference (since gup is called), won't it? Anyway, assuming tomorrow's
>     world doesn't take a reference, don't we need to care that a page in
>     use by the GPU might be reclaimed?
>
>
> Right now the code uses gup because it's convenient, but it drops the
> reference right after the fault, so the reference is held only for a
> short period of time.

Are you sure gup drops the reference right after the fault? I dug through
the code again and failed to verify this. Could you point it out to me?
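
For reference, the fault-then-release pattern Jerome describes would look
roughly like this (a minimal sketch, assuming the 2013-era get_user_pages()
signature; error handling and the device-programming step are elided, and
fault_in_page_for_device is a made-up name):

  #include <linux/mm.h>
  #include <linux/sched.h>

  /*
   * Sketch: gup is used only as a convenient way to run the fault path;
   * the elevated refcount is dropped as soon as the fault is done, so
   * the kernel stays free to reclaim the page later.
   */
  static int fault_in_page_for_device(struct mm_struct *mm,
                                      unsigned long addr)
  {
          struct page *page;
          long ret;

          down_read(&mm->mmap_sem);
          ret = get_user_pages(current, mm, addr, 1 /* nr_pages */,
                               1 /* write */, 0 /* force */, &page, NULL);
          up_read(&mm->mmap_sem);
          if (ret != 1)
                  return -EFAULT;

          /* ... let the IOMMU/device pick up the new translation ... */

          put_page(page);  /* reference dropped right after the fault */
          return 0;
  }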

>
> No, you don't need to care about reclaim, thanks to mmu notifiers: before
> a page is removed the mmu notifiers are called, and the iommu registers a
> notifier, so it gets the invalidate event, invalidates the device tlb,
> and things go on. If the gpu accesses the page again, a new pagefault
> happens and a new page is allocated.

Good idea! ;-)
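
The notifier hookup Jerome describes would look roughly like this (a sketch
against the 2013-era mmu_notifier API; dev_tlb_invalidate() is a
hypothetical stand-in for whatever device-specific shootdown the driver
performs):

  #include <linux/mmu_notifier.h>

  /* Hypothetical device-side TLB shootdown for a range of a given mm. */
  static void dev_tlb_invalidate(struct mm_struct *mm,
                                 unsigned long start, unsigned long end);

  /*
   * Called before the kernel unmaps or reclaims pages in [start, end);
   * invalidating the device tlb here means the device's next access
   * faults and picks up a fresh translation.
   */
  static void dev_invalidate_range_start(struct mmu_notifier *mn,
                                         struct mm_struct *mm,
                                         unsigned long start,
                                         unsigned long end)
  {
          dev_tlb_invalidate(mm, start, end);
  }

  static const struct mmu_notifier_ops dev_mmu_ops = {
          .invalidate_range_start = dev_invalidate_range_start,
  };

  static struct mmu_notifier dev_mn = { .ops = &dev_mmu_ops };

  /* Tie the notifier to the process address space, e.g. at bind time:
   *         mmu_notifier_register(&dev_mn, current->mm);
   */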

>
> All this code is upstream in the linux kernel; just read it. There is
> simply no device that uses it yet.
>
> That being said, we will want improvements so that pages that are hot on
> the device are not reclaimed. But it can work without such improvements.
>
> Cheers,
> Jerome
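
For anyone who wants to find the upstream code Jerome mentions, the
in-tree entry point is the amd_iommu_v2 driver
(drivers/iommu/amd_iommu_v2.c). Binding a process address space to a
device looks roughly like the sketch below (2013-era signatures assumed;
the helper name and the PASID count are illustrative):

  #include <linux/amd-iommu.h>
  #include <linux/pci.h>
  #include <linux/sched.h>

  /*
   * Sketch: attach the current task's page tables to a device PASID so
   * that the device's faults on that PASID are serviced from this mm.
   */
  static int attach_process_to_device(struct pci_dev *pdev, int pasid)
  {
          int ret;

          /* Declare how many PASIDs this device will use. */
          ret = amd_iommu_init_device(pdev, 16 /* illustrative */);
          if (ret)
                  return ret;

          ret = amd_iommu_bind_pasid(pdev, pasid, current);
          if (ret)
                  amd_iommu_free_device(pdev);
          return ret;
  }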


Thread overview: 34+ messages
2013-02-08 11:18 [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes Shachar Raindel
2013-02-08 15:21 ` Jerome Glisse
2013-04-16  7:03   ` Simon Jeons
2013-04-16 16:27     ` Jerome Glisse
2013-04-16 23:50       ` Simon Jeons
2013-04-17 14:01         ` Jerome Glisse
2013-04-17 23:48           ` Simon Jeons
2013-04-18  1:02             ` Jerome Glisse
2013-02-09  6:05 ` Michel Lespinasse
2013-02-09 16:29   ` Jerome Glisse
2013-04-09  8:28     ` Simon Jeons
2013-04-09 14:21       ` Jerome Glisse
2013-04-10  1:41         ` Simon Jeons
2013-04-10 20:45           ` Jerome Glisse
2013-04-11  3:42             ` Simon Jeons
2013-04-11 18:38               ` Jerome Glisse
2013-04-12  1:54                 ` Simon Jeons
2013-04-12  2:11                   ` [Lsf-pc] " Rik van Riel
2013-04-12  2:57                   ` Jerome Glisse
2013-04-12  5:44                     ` Simon Jeons [this message]
2013-04-12 13:32                       ` Jerome Glisse
2013-04-10  1:57     ` Simon Jeons
2013-04-10 20:55       ` Jerome Glisse
2013-04-11  3:37         ` Simon Jeons
2013-04-11 18:48           ` Jerome Glisse
2013-04-12  3:13             ` Simon Jeons
2013-04-12  3:21               ` Jerome Glisse
2013-04-15  8:39     ` Simon Jeons
2013-04-15 15:38       ` Jerome Glisse
2013-04-16  4:20         ` Simon Jeons
2013-04-16 16:19           ` Jerome Glisse
2013-02-10  7:54   ` Shachar Raindel
2013-04-09  8:17 ` Simon Jeons
2013-04-10  1:48   ` Simon Jeons
