From: "Figo.zhang" <figo1802@gmail.com>
To: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>, John Hubbard <jhubbard@nvidia.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	David Nellans <dnellans@nvidia.com>
Subject: Re: [HMM 00/16] HMM (Heterogeneous Memory Management) v19
Date: Thu, 6 Apr 2017 11:22:12 +0800
Message-ID: <CAF7GXvptCfV89rAi=j1cy1df12039GDpq_DHOyx+_xk0FjBDPg@mail.gmail.com>
In-Reply-To: <20170405204026.3940-1-jglisse@redhat.com>

>
>
>
>
> Heterogeneous Memory Management (HMM) (description and justification)
>
> Today device drivers expose a dedicated memory allocation API through
> their device file, often relying on a combination of IOCTL and mmap
> calls. The device can only access and use memory allocated through this
> API. This effectively splits the program address space into objects
> allocated for and usable by the device, and other regular memory
> (malloc, mmap of a file, shared memory, …) accessible only by the CPU
> (or in a very limited way by a device, through pinned memory).
>
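
A minimal userspace sketch may help make the status quo above concrete
(the device node, ioctl number and structure are hypothetical, made up
for illustration, not taken from any real driver):

/*
 * Status quo sketch: memory the device can use must come from the
 * driver's own allocator, reached through its device file.
 */
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/ioctl.h>

struct drv_alloc {			/* hypothetical ioctl argument */
	uint64_t size;			/* in: bytes requested */
	uint64_t offset;		/* out: mmap offset of the new object */
};

#define DRV_IOCTL_ALLOC _IOWR('d', 0x01, struct drv_alloc)	/* hypothetical */

int main(void)
{
	struct drv_alloc req = { .size = 1 << 20 };
	int fd = open("/dev/mydev0", O_RDWR);	/* hypothetical device file */
	void *buf;

	if (fd < 0 || ioctl(fd, DRV_IOCTL_ALLOC, &req) < 0)
		return 1;

	/* Only memory obtained this way is visible to the device. */
	buf = mmap(NULL, req.size, PROT_READ | PROT_WRITE, MAP_SHARED,
		   fd, req.offset);
	if (buf == MAP_FAILED)
		return 1;

	/*
	 * The CPU must copy regular malloc()/file/shared memory data into
	 * 'buf' before the device can work on it -- the duplication that
	 * the next paragraph describes.
	 */
	return 0;
}
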
> Allowing different isolated components of a program to use a device thus
> requires duplicating the input data structures using the device memory
> allocator. This is reasonable for simple data structures (arrays, grids,
> images, …) but it gets extremely complex with advanced data structures
> (lists, trees, graphs, …) that rely on a web of memory pointers. This is
> becoming a serious limitation on the kind of workloads that can be
> offloaded to devices like GPUs.
>

How does the current GPU software stack handle this? By maintaining a
complex middleware/HAL layer?


>
> New industry standards like C++, OpenCL or CUDA are pushing to remove
> this barrier. This requires a shared address space between the GPU
> device and the CPU so that the GPU can access any memory of a process
> (while still obeying memory protections such as read only).


Can the GPU access all of the process's VMAs, or only those VMAs whose
backing system memory has been migrated/mirrored into the GPU page table?



> This kind of feature is also appearing in
> various other operating systems.
>
> HMM is a set of helpers to facilitate several aspects of address space
> sharing and device memory management. Unlike existing sharing mechanisms
> that rely on pinning the pages used by a device, HMM relies on
> mmu_notifier to propagate CPU page table updates to the device page
> table.
>
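
To sketch the mechanism described here, the following is roughly what a
driver (or HMM on its behalf) does with mmu_notifier so that CPU page
table invalidations are reflected in the device page table. The mydev_*
names are illustrative, and the mmu_notifier prototypes are those of the
kernel this series is posted against:

/*
 * Rough kernel-side sketch: register an mmu_notifier so that CPU page
 * table changes tear down the corresponding device page table entries.
 */
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/mmu_notifier.h>

struct mydev_mirror {
	struct mmu_notifier	notifier;
	/* ... driver-private handle to the device page table ... */
};

/* Device-specific: tear down device PTEs for [start, end) and wait for
 * any in-flight device access to finish. */
static void mydev_unmap_device_range(struct mydev_mirror *mirror,
				     unsigned long start, unsigned long end)
{
}

static void mydev_invalidate_range_start(struct mmu_notifier *mn,
					 struct mm_struct *mm,
					 unsigned long start,
					 unsigned long end)
{
	struct mydev_mirror *mirror =
		container_of(mn, struct mydev_mirror, notifier);

	/* The CPU page table is about to change: invalidate the mirrored
	 * device page table entries before the change becomes visible. */
	mydev_unmap_device_range(mirror, start, end);
}

static const struct mmu_notifier_ops mydev_mn_ops = {
	.invalidate_range_start	= mydev_invalidate_range_start,
};

int mydev_mirror_mm(struct mydev_mirror *mirror, struct mm_struct *mm)
{
	mirror->notifier.ops = &mydev_mn_ops;
	return mmu_notifier_register(&mirror->notifier, mm);
}
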
> Duplicating the CPU page table is only one aspect necessary for
> efficiently using a device like a GPU. GPU local memory has bandwidth in
> the terabytes-per-second range, but it is connected to main memory
> through a system bus like PCIe that is limited to 32 gigabytes/second
> (PCIe 4.0 x16). Thus it is necessary to allow migration of process
> memory from main system memory to device memory. The issue is that on
> platforms that only have PCIe, the device memory is not accessible by
> the CPU with the same properties as main memory (cache coherency, atomic
> operations, …).
>
> To allow migration from main memory to device memory, HMM provides a set
> of helpers to hotplug device memory as a new type of ZONE_DEVICE memory
> which is un-addressable by the CPU but still has struct pages
> representing it. This allows most of the core kernel logic that deals
> with process memory to stay oblivious to the peculiarities of device
> memory.
>
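
As a sketch of that hotplug step, assuming the hmm_devmem_add() helper
introduced later in this series (patch 14); the exact prototypes are
those of the posted patches and may still change, and the mydev_* names
are illustrative:

/*
 * Sketch: hotplug device memory as un-addressable ZONE_DEVICE pages so
 * that each device page has a struct page but no CPU-accessible mapping.
 */
#include <linux/device.h>
#include <linux/err.h>
#include <linux/hmm.h>

static const struct hmm_devmem_ops mydev_devmem_ops = {
	/* .fault and .free callbacks elided: .fault migrates a page back
	 * to system memory on CPU access, .free reclaims a device page. */
};

int mydev_add_device_memory(struct device *dev, unsigned long size)
{
	struct hmm_devmem *devmem;

	/* Register 'size' bytes of device memory with the core mm. */
	devmem = hmm_devmem_add(&mydev_devmem_ops, dev, size);
	if (IS_ERR(devmem))
		return PTR_ERR(devmem);

	/* Device pages can now be handed out to migrate_vma(), below. */
	return 0;
}
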
> When the page backing an address of a process is migrated to device
> memory, the CPU page table entry is set to a new specific swap entry.
> CPU access to such an address triggers a migration back to system
> memory, just as if the page had been swapped out to disk. HMM also
> blocks anyone from pinning a ZONE_DEVICE page, so that it can always be
> migrated back to system memory if the CPU accesses it. Conversely, HMM
> does not migrate to device memory any page that is pinned in system
> memory.
>

Is the purpose of migrating system pages to the device so that the device
can read the system memory?
If the CPU/programs want to read the device data, do they need to pin/map
the device memory into the process address space?
If multiple applications want to read the same device memory region
concurrently, how is that done?

It would be helpful to have a diagram showing how the CPU and GPU share
the address space.


>
> To allow efficient migration between device memory and main memory, a
> new migrate_vma() helper is added with this patchset. It allows
> leveraging the device DMA engine to perform the copy operation.
>
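
A sketch of how a driver might use this helper, with prototypes following
the posted patch 06 (they may still change); the mydev_* callbacks stand
in for the driver's device memory allocator and DMA engine:

/*
 * Sketch: migrate a small user address range to device memory with
 * migrate_vma(), letting the device DMA engine do the copies.
 */
#include <linux/migrate.h>
#include <linux/mm.h>

#define MYDEV_NPAGES	16	/* keep the example range small */

static void mydev_alloc_and_copy(struct vm_area_struct *vma,
				 const unsigned long *src,
				 unsigned long *dst,
				 unsigned long start,
				 unsigned long end,
				 void *private)
{
	/* For each src[] entry flagged for migration: allocate a device
	 * page, store its migrate_pfn() in dst[], and use the device DMA
	 * engine to copy the contents of the old system memory page. */
}

static void mydev_finalize_and_map(struct vm_area_struct *vma,
				   const unsigned long *src,
				   const unsigned long *dst,
				   unsigned long start,
				   unsigned long end,
				   void *private)
{
	/* Migration is final at this point: update the device page table
	 * to point at the new device pages for entries that migrated. */
}

static const struct migrate_vma_ops mydev_migrate_ops = {
	.alloc_and_copy		= mydev_alloc_and_copy,
	.finalize_and_map	= mydev_finalize_and_map,
};

int mydev_migrate_to_device(struct vm_area_struct *vma,
			    unsigned long start, unsigned long end)
{
	unsigned long src[MYDEV_NPAGES] = { 0 };
	unsigned long dst[MYDEV_NPAGES] = { 0 };

	/* Caller guarantees (end - start) <= MYDEV_NPAGES * PAGE_SIZE. */
	return migrate_vma(&mydev_migrate_ops, vma, start, end,
			   src, dst, NULL);
}
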
> This feature will be used by upstream drivers like nouveau and mlx5, and
> probably others in the future (amdgpu is the next suspect in line). We
> are actively working on nouveau and mlx5 support. To test this patchset
> we also worked with the NVIDIA closed-source driver team; they have more
> resources than us to test this kind of infrastructure, and also a bigger
> and better userspace eco-system with various real industry workloads
> that can be used to test and profile HMM.
>
> The expected workload is a program that builds a data set on the CPU
> (from disk, from network, from sensors, …). The program uses a GPU API
> (OpenCL, CUDA, ...) to give hints on memory placement for the input data
> and also for the output buffer. The program calls the GPU API to
> schedule a GPU job; this happens through a device-driver-specific ioctl.
> All of this is hidden from the programmer's point of view in the case of
> a C++ compiler that transparently offloads some parts of a program to
> the GPU. The program can keep doing other work on the CPU while the GPU
> is crunching numbers.
>
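
A userspace sketch of this workload, where the gpu_*() calls and hint
values are hypothetical placeholders for whatever API (OpenCL, CUDA, ...)
the runtime exposes; the point is that both buffers come from plain
malloc(), with no device-specific allocator and no duplication:

/*
 * Workload sketch: build the data set on the CPU, hand the same pointers
 * to the GPU runtime, keep working on the CPU while the GPU runs.
 */
#include <stdlib.h>

/* Hypothetical GPU runtime API -- illustrative only, not a real library. */
enum gpu_hint { GPU_HINT_READ_MOSTLY, GPU_HINT_PREFER_DEVICE };
void gpu_mem_hint(void *ptr, size_t size, enum gpu_hint hint);
void gpu_launch_job(const char *kernel, void *in, void *out, size_t n);
void gpu_wait(void);

int main(void)
{
	size_t n = 1 << 26;
	float *data   = malloc(n * sizeof(*data));	/* input, built on the CPU */
	float *result = malloc(n * sizeof(*result));	/* output, written by the GPU */

	if (!data || !result)
		return 1;

	/* ... fill 'data' from disk, network, sensors, ... */

	/* Placement hints for the input data and the output buffer. */
	gpu_mem_hint(data,   n * sizeof(*data),   GPU_HINT_READ_MOSTLY);
	gpu_mem_hint(result, n * sizeof(*result), GPU_HINT_PREFER_DEVICE);

	/* Schedule the GPU job (a device-driver-specific ioctl underneath);
	 * the GPU dereferences the same pointers the CPU uses. */
	gpu_launch_job("my_kernel", data, result, n);

	/* The CPU is free to do other work here while the GPU crunches. */

	gpu_wait();		/* results are visible through 'result' */

	free(data);
	free(result);
	return 0;
}
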
> It is expected that the CPU will not access the same data set as the GPU
> while the GPU is working on it, but this is not mandatory. In fact we
> expect some small memory objects to be actively accessed by both the GPU
> and the CPU concurrently, as synchronization channels and/or for
> monitoring purposes. Such objects will stay in system memory and should
> not be bottlenecked by system bus bandwidth (rare write and read
> accesses from both CPU and GPU).
>
> As we are relying on the device driver API, HMM does not introduce any
> new syscall nor does it modify any existing ones. It does not change any
> POSIX semantics or behaviors. For instance, the child after a fork of a
> process that is using HMM will not be impacted in any way, nor is there
> any data hazard between child COW or parent COW of memory that was
> migrated to the device prior to the fork.
>
> HMM assumes a number of hardware features. The device must allow the
> device page table to be updated at any time (i.e. device jobs must be
> preemptable). The device page table must provide memory protections such
> as read only. The device must track write accesses (dirty bit). The
> device must have a minimum granularity that matches PAGE_SIZE (i.e. 4k).
>
>
>
>
