* [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes @ 2013-02-08 11:18 Shachar Raindel 2013-02-08 15:21 ` Jerome Glisse ` (2 more replies) 0 siblings, 3 replies; 34+ messages in thread From: Shachar Raindel @ 2013-02-08 11:18 UTC (permalink / raw) To: lsf-pc Cc: linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss Hi, We would like to present a reference implementation for safely sharing memory pages from user space with the hardware, without pinning. We will be happy to hear the community feedback on our prototype implementation, and suggestions for future improvements. We would also like to discuss adding features to the core MM subsystem to assist hardware access to user memory without pinning. Following is a longer motivation and explanation of the technology presented: Many application developers would like to be able to communicate directly with the hardware from userspace. Use cases for that include high performance networking APIs such as InfiniBand, RoCE and iWarp, and interfacing with GPUs. Currently, if the user space application wants to share system memory with the hardware device, the kernel component must pin the memory pages in RAM, using get_user_pages. This is a hurdle, as it usually makes large portions of the application memory unmovable. This pinning also makes the user space development model very complicated – one needs to register memory before using it for communication with the hardware. We use the mmu-notifiers [1] mechanism to inform the hardware when the mapping of a page is changed. If the hardware tries to access a page which is not yet mapped for the hardware, it requests a resolution for the page address from the kernel. This mechanism allows the hardware to access the entire address space of the user application, without pinning even a single page. 
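The notifier-driven scheme described above can be modeled in a few lines of userspace C. This is only an illustrative sketch (the structures and function names below are invented, not the kernel's mmu_notifier API or any real driver interface), but it captures the contract: the device caches translations in a secondary TLB, an invalidation callback drops them when the mm changes a mapping, and a device-side miss is resolved by asking the kernel again, so no page ever needs to stay pinned.

```c
/*
 * Toy userspace model of the mechanism described above: the device keeps
 * a secondary TLB for a user address space; an mmu-notifier-style callback
 * invalidates stale entries, and a device-side miss "faults" back into
 * the mm for resolution. All names here are illustrative, not the actual
 * kernel or driver API.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES   8
#define NO_ENTRY ((unsigned long)-1)

struct toy_mm {                    /* CPU page table: virtual page -> pfn */
    unsigned long pte[NPAGES];
};

struct toy_sec_tlb {               /* device's cached translations */
    unsigned long entry[NPAGES];
};

static void sec_tlb_init(struct toy_sec_tlb *tlb)
{
    for (size_t i = 0; i < NPAGES; i++)
        tlb->entry[i] = NO_ENTRY;
}

/* mmu-notifier-style callback: a mapping changed, so drop the cached copy.
 * Because nothing is pinned, the mm stays free to move or swap the page. */
static void invalidate_page(struct toy_sec_tlb *tlb, size_t vpage)
{
    tlb->entry[vpage] = NO_ENTRY;
}

/* Device access: on a secondary-TLB miss, request resolution from the mm. */
static unsigned long device_access(struct toy_sec_tlb *tlb,
                                   const struct toy_mm *mm,
                                   size_t vpage, bool *missed)
{
    *missed = (tlb->entry[vpage] == NO_ENTRY);
    if (*missed)
        tlb->entry[vpage] = mm->pte[vpage];   /* "page fault" resolution */
    return tlb->entry[vpage];
}
```

The point of the scheme is visible even in this toy: after an invalidation the mm may hand the same virtual page a different physical page, and the device simply takes one extra miss and picks up the new translation.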
We would like to use the LSF/MM forum opportunity to discuss open issues we have for further development, such as: -Allowing the hardware to perform page table walk, similar to get_user_pages_fast, to resolve user pages that are already in RAM. -Batching page eviction by various kernel subsystems (swapper, page-cache) to reduce the amount of communication needed with the hardware in such events -Hinting from the hardware to the MM regarding page fetches which are speculative, similarly to prefetching done by the page-cache -Page-in notifications from the kernel to the driver, such that we can keep our secondary TLB in sync with the kernel page table without incurring page faults. -Allowed and banned actions while in an MMU notifier callback. We have already done some work on making the MMU notifiers sleepable [2], but there might be additional limitations, which we would like to discuss. -Hinting from the MMU notifiers as to the reason for the notification - for example we would like to react differently if a page was moved by NUMA migration vs. a page being swapped out. [1] http://lwn.net/Articles/266320/ [2] http://comments.gmane.org/gmane.linux.kernel.mm/85002 Thanks, --Shachar -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-02-08 11:18 [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes Shachar Raindel @ 2013-02-08 15:21 ` Jerome Glisse 2013-04-16 7:03 ` Simon Jeons 2013-02-09 6:05 ` Michel Lespinasse 2013-04-09 8:17 ` Simon Jeons 2 siblings, 1 reply; 34+ messages in thread From: Jerome Glisse @ 2013-02-08 15:21 UTC (permalink / raw) To: Shachar Raindel Cc: lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss On Fri, Feb 8, 2013 at 6:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: > Hi, > > We would like to present a reference implementation for safely sharing > memory pages from user space with the hardware, without pinning. > > We will be happy to hear the community feedback on our prototype > implementation, and suggestions for future improvements. > > We would also like to discuss adding features to the core MM subsystem to > assist hardware access to user memory without pinning. > > Following is a longer motivation and explanation on the technology > presented: > > Many application developers would like to be able to be able to communicate > directly with the hardware from the userspace. > > Use cases for that includes high performance networking API such as > InfiniBand, RoCE and iWarp and interfacing with GPUs. > > Currently, if the user space application wants to share system memory with > the hardware device, the kernel component must pin the memory pages in RAM, > using get_user_pages. > > This is a hurdle, as it usually makes large portions the application memory > unmovable. This pinning also makes the user space development model very > complicated – one needs to register memory before using it for communication > with the hardware. 
> > We use the mmu-notifiers [1] mechanism to inform the hardware when the > mapping of a page is changed. If the hardware tries to access a page which > is not yet mapped for the hardware, it requests a resolution for the page > address from the kernel. > > This mechanism allows the hardware to access the entire address space of the > user application, without pinning even a single page. > > We would like to use the LSF/MM forum opportunity to discuss open issues we > have for further development, such as: > > -Allowing the hardware to perform page table walk, similar to > get_user_pages_fast to resolve user pages that are already in RAM. > > -Batching page eviction by various kernel subsystems (swapper, page-cache) > to reduce the amount of communication needed with the hardware in such > events > > -Hinting from the hardware to the MM regarding page fetches which are > speculative, similarly to prefetching done by the page-cache > > -Page-in notifications from the kernel to the driver, such that we can keep > our secondary TLB in sync with the kernel page table without incurring page > faults. > > -Allowed and banned actions while in an MMU notifier callback. We have > already done some work on making the MMU notifiers sleepable [2], but there > might be additional limitations, which we would like to discuss. > > -Hinting from the MMU notifiers as for the reason for the notification - for > example we would like to react differently if a page was moved by NUMA > migration vs. page being swapped out. > > [1] http://lwn.net/Articles/266320/ > > [2] http://comments.gmane.org/gmane.linux.kernel.mm/85002 > > Thanks, > > --Shachar As a GPU driver developer I can say that this is something we want to do in the very near future. Also I think we would like other capabilities: - a hint to the mm on memory ranges that are best not to evict (it is easier for the driver to know what is hot and going to see activity) Dunno how big the change to the page eviction path would need to be. 
Cheers, Jerome
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-02-08 15:21 ` Jerome Glisse @ 2013-04-16 7:03 ` Simon Jeons 2013-04-16 16:27 ` Jerome Glisse 0 siblings, 1 reply; 34+ messages in thread From: Simon Jeons @ 2013-04-16 7:03 UTC (permalink / raw) To: Jerome Glisse Cc: Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss Hi Jerome, On 02/08/2013 11:21 PM, Jerome Glisse wrote: > On Fri, Feb 8, 2013 at 6:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: >> Hi, >> >> We would like to present a reference implementation for safely sharing >> memory pages from user space with the hardware, without pinning. >> >> We will be happy to hear the community feedback on our prototype >> implementation, and suggestions for future improvements. >> >> We would also like to discuss adding features to the core MM subsystem to >> assist hardware access to user memory without pinning. >> >> Following is a longer motivation and explanation on the technology >> presented: >> >> Many application developers would like to be able to be able to communicate >> directly with the hardware from the userspace. >> >> Use cases for that includes high performance networking API such as >> InfiniBand, RoCE and iWarp and interfacing with GPUs. >> >> Currently, if the user space application wants to share system memory with >> the hardware device, the kernel component must pin the memory pages in RAM, >> using get_user_pages. >> >> This is a hurdle, as it usually makes large portions the application memory >> unmovable. This pinning also makes the user space development model very >> complicated – one needs to register memory before using it for communication >> with the hardware. >> >> We use the mmu-notifiers [1] mechanism to inform the hardware when the >> mapping of a page is changed. 
If the hardware tries to access a page which >> is not yet mapped for the hardware, it requests a resolution for the page >> address from the kernel. >> >> This mechanism allows the hardware to access the entire address space of the >> user application, without pinning even a single page. >> >> We would like to use the LSF/MM forum opportunity to discuss open issues we >> have for further development, such as: >> >> -Allowing the hardware to perform page table walk, similar to >> get_user_pages_fast to resolve user pages that are already in RAM. get_user_pages_fast just takes a page reference instead of populating the PTE in the page table, correct? Then how can the GPU driver use the IOMMU to access the page? >> >> -Batching page eviction by various kernel subsystems (swapper, page-cache) >> to reduce the amount of communication needed with the hardware in such >> events >> >> -Hinting from the hardware to the MM regarding page fetches which are >> speculative, similarly to prefetching done by the page-cache >> >> -Page-in notifications from the kernel to the driver, such that we can keep >> our secondary TLB in sync with the kernel page table without incurring page >> faults. >> >> -Allowed and banned actions while in an MMU notifier callback. We have >> already done some work on making the MMU notifiers sleepable [2], but there >> might be additional limitations, which we would like to discuss. >> >> -Hinting from the MMU notifiers as for the reason for the notification - for >> example we would like to react differently if a page was moved by NUMA >> migration vs. page being swapped out. >> >> [1] http://lwn.net/Articles/266320/ >> >> [2] http://comments.gmane.org/gmane.linux.kernel.mm/85002 >> >> Thanks, >> >> --Shachar > As a GPU driver developer i can say that this is something we want to > do in a very near future. 
Also i think we would like another > capabilities : > > - hint to mm on memory range that are best not to evict (easier for > driver to know what is hot and gonna see activities) > > Dunno how big the change to the page eviction path would need to be. > > Cheers, > Jerome
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-16 7:03 ` Simon Jeons @ 2013-04-16 16:27 ` Jerome Glisse 2013-04-16 23:50 ` Simon Jeons 0 siblings, 1 reply; 34+ messages in thread From: Jerome Glisse @ 2013-04-16 16:27 UTC (permalink / raw) To: Simon Jeons Cc: Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss [-- Attachment #1: Type: text/plain, Size: 2786 bytes --] On Tue, Apr 16, 2013 at 3:03 AM, Simon Jeons <simon.jeons@gmail.com> wrote: > Hi Jerome, > > On 02/08/2013 11:21 PM, Jerome Glisse wrote: > >> On Fri, Feb 8, 2013 at 6:18 AM, Shachar Raindel <raindel@mellanox.com> >> wrote: >> >>> Hi, >>> >>> We would like to present a reference implementation for safely sharing >>> memory pages from user space with the hardware, without pinning. >>> >>> We will be happy to hear the community feedback on our prototype >>> implementation, and suggestions for future improvements. >>> >>> We would also like to discuss adding features to the core MM subsystem to >>> assist hardware access to user memory without pinning. >>> >>> Following is a longer motivation and explanation on the technology >>> presented: >>> >>> Many application developers would like to be able to be able to >>> communicate >>> directly with the hardware from the userspace. >>> >>> Use cases for that includes high performance networking API such as >>> InfiniBand, RoCE and iWarp and interfacing with GPUs. >>> >>> Currently, if the user space application wants to share system memory >>> with >>> the hardware device, the kernel component must pin the memory pages in >>> RAM, >>> using get_user_pages. >>> >>> This is a hurdle, as it usually makes large portions the application >>> memory >>> unmovable. 
This pinning also makes the user space development model very >>> complicated – one needs to register memory before using it for >>> communication >>> with the hardware. >>> >>> We use the mmu-notifiers [1] mechanism to inform the hardware when the >>> mapping of a page is changed. If the hardware tries to access a page >>> which >>> is not yet mapped for the hardware, it requests a resolution for the page >>> address from the kernel. >>> >>> This mechanism allows the hardware to access the entire address space of >>> the >>> user application, without pinning even a single page. >>> >>> We would like to use the LSF/MM forum opportunity to discuss open issues >>> we >>> have for further development, such as: >>> >>> -Allowing the hardware to perform page table walk, similar to >>> get_user_pages_fast to resolve user pages that are already in RAM. >>> >> > get_user_pages_fast just get page reference count instead of populate the > pte to page table, correct? Then how can GPU driver use iommu to access the > page? > As I said, this is for pre-filling already-present entries, i.e. PTEs that are present with a valid page (no special bit set). This is an optimization so that the GPU can pre-fill its TLB without having to take any mmap_sem. The hope is that in the most common case this will be enough, but in some cases you will have to go through the lengthy non-fast gup. Cheers, Jerome
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-16 16:27 ` Jerome Glisse @ 2013-04-16 23:50 ` Simon Jeons 2013-04-17 14:01 ` Jerome Glisse 0 siblings, 1 reply; 34+ messages in thread From: Simon Jeons @ 2013-04-16 23:50 UTC (permalink / raw) To: Jerome Glisse Cc: Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss On 04/17/2013 12:27 AM, Jerome Glisse wrote: [snip] > > > As i said this is for pre-filling already present entry, ie pte that > are present with a valid page (no special bit set). This is an > optimization so that the GPU can pre-fill its tlb without having to > take any mmap_sem. Hope is that in most common case this will be > enough, but in some case you will have to go through the lengthy non > fast gup. I know this. What I am concerned about is that the pte you mentioned is for the normal CPU, correct? How can you pre-fill the PTE and TLB of the GPU? > > Cheers, > Jerome
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-16 23:50 ` Simon Jeons @ 2013-04-17 14:01 ` Jerome Glisse 2013-04-17 23:48 ` Simon Jeons 0 siblings, 1 reply; 34+ messages in thread From: Jerome Glisse @ 2013-04-17 14:01 UTC (permalink / raw) To: Simon Jeons Cc: Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss On Tue, Apr 16, 2013 at 7:50 PM, Simon Jeons <simon.jeons@gmail.com> wrote: > On 04/17/2013 12:27 AM, Jerome Glisse wrote: > > [snip] > > >> >> As i said this is for pre-filling already present entry, ie pte that are >> present with a valid page (no special bit set). This is an optimization so >> that the GPU can pre-fill its tlb without having to take any mmap_sem. Hope >> is that in most common case this will be enough, but in some case you will >> have to go through the lengthy non fast gup. >> > > I know this. What I concern is the pte you mentioned is for normal cpu, > correct? How can you pre-fill pte and tlb of GPU? > You are getting confused: the idea is to look at the CPU PTE and prefill the GPU PTE. I do not prefill the CPU PTE; if a CPU PTE is valid then I use the page it points to to prefill the GPU PTE. So I don't pre-fill the CPU PTE or the GPU TLB; I pre-fill the GPU PTE from the CPU PTE if the CPU PTE is valid. Other GPU PTEs are marked as invalid and will trigger a fault that will be handled using gup, which will fill the CPU PTE (if the fault happens at a valid address), at which point the GPU PTE is updated, or an error is reported if the fault happened at an invalid address. Cheers, Jerome
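The pre-fill Jerome describes can be sketched as a simple walk. This is again an illustrative userspace model with invented names, not real driver code: mirror only the CPU PTEs that are already valid into the GPU page table, and leave everything else invalid so that the first GPU access goes through the slow fault path.

```c
/*
 * Sketch of the pre-fill described above: mirror only already-valid CPU
 * PTEs into the GPU page table; everything else stays invalid and is
 * resolved later by the fault path (gup). Illustrative model only.
 */
#include <stddef.h>

#define PTE_NONE 0UL    /* "not present" in this toy encoding */

/* Returns how many GPU PTEs were pre-filled. */
static size_t gpu_prefill(const unsigned long *cpu_pte,
                          unsigned long *gpu_pte, size_t n)
{
    size_t filled = 0;

    for (size_t i = 0; i < n; i++) {
        if (cpu_pte[i] != PTE_NONE) {
            gpu_pte[i] = cpu_pte[i];   /* valid CPU mapping: mirror it */
            filled++;
        } else {
            gpu_pte[i] = PTE_NONE;     /* GPU access here will fault */
        }
    }
    return filled;
}
```

Nothing in the CPU page table is modified by the walk, which is what makes it cheap: it is purely a read of existing valid entries.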
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-17 14:01 ` Jerome Glisse @ 2013-04-17 23:48 ` Simon Jeons 2013-04-18 1:02 ` Jerome Glisse 0 siblings, 1 reply; 34+ messages in thread From: Simon Jeons @ 2013-04-17 23:48 UTC (permalink / raw) To: Jerome Glisse Cc: Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss Hi Jerome, On 04/17/2013 10:01 PM, Jerome Glisse wrote: > On Tue, Apr 16, 2013 at 7:50 PM, Simon Jeons <simon.jeons@gmail.com > <mailto:simon.jeons@gmail.com>> wrote: > > On 04/17/2013 12:27 AM, Jerome Glisse wrote: > > [snip] > > > > As i said this is for pre-filling already present entry, ie > pte that are present with a valid page (no special bit set). > This is an optimization so that the GPU can pre-fill its tlb > without having to take any mmap_sem. Hope is that in most > common case this will be enough, but in some case you will > have to go through the lengthy non > fast gup. > > > I know this. What I concern is the pte you mentioned is for normal > cpu, correct? How can you pre-fill pte and tlb of GPU? > > > You getting confuse, idea is to look at cpu pte and prefill gpu pte. I > do not prefill cpu pte, if a cpu pte is valid then i use the page it > point to prefill the GPU pte. Yes, confused! > > So i don't pre-fill CPU PTE and TLB GPU, i pre-fill GPU PTE from CPU > PTE if CPU PTE is valid. Other GPU PTE are marked as invalid and will > trigger a fault that will be handle using gup that will fill CPU PTE > (if fault happen at a valid address) at which point GPU PTE is updated > or error is reported if fault happened at an invalid address. gup is used to fill the CPU PTE; could you point out to me which code will re-fill the GPU PTE? gup fast? Is the GPU page table different from the CPU's? 
> > Cheers, > Jerome
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-17 23:48 ` Simon Jeons @ 2013-04-18 1:02 ` Jerome Glisse 0 siblings, 0 replies; 34+ messages in thread From: Jerome Glisse @ 2013-04-18 1:02 UTC (permalink / raw) To: Simon Jeons Cc: Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss [-- Attachment #1: Type: text/plain, Size: 1623 bytes --] On Wed, Apr 17, 2013 at 7:48 PM, Simon Jeons <simon.jeons@gmail.com> wrote: > Hi Jerome, > > On 04/17/2013 10:01 PM, Jerome Glisse wrote: > > On Tue, Apr 16, 2013 at 7:50 PM, Simon Jeons <simon.jeons@gmail.com>wrote: > >> On 04/17/2013 12:27 AM, Jerome Glisse wrote: >> >> [snip] >> >> >>> >>> As i said this is for pre-filling already present entry, ie pte that are >>> present with a valid page (no special bit set). This is an optimization so >>> that the GPU can pre-fill its tlb without having to take any mmap_sem. Hope >>> is that in most common case this will be enough, but in some case you will >>> have to go through the lengthy non fast gup. >>> >> >> I know this. What I concern is the pte you mentioned is for normal cpu, >> correct? How can you pre-fill pte and tlb of GPU? >> > > You getting confuse, idea is to look at cpu pte and prefill gpu pte. I do > not prefill cpu pte, if a cpu pte is valid then i use the page it point to > prefill the GPU pte. > > > Yes, confused! > > > > So i don't pre-fill CPU PTE and TLB GPU, i pre-fill GPU PTE from CPU PTE > if CPU PTE is valid. Other GPU PTE are marked as invalid and will trigger a > fault that will be handle using gup that will fill CPU PTE (if fault happen > at a valid address) at which point GPU PTE is updated or error is reported > if fault happened at an invalid address. > > > gup is used to fill CPU PTE, could you point out to me which codes will > re-fill GPU PTE? gup fast? > GPU page table is different from CPU? 
> > The GPU interrupt handler will schedule a work thread that will call gup and then update the GPU page table. Cheers, Jerome
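The two-stage flow in Jerome's answer follows the usual kernel split between interrupt context, which must not sleep, and deferred work, which may. A toy userspace model of that split (all names are invented; `gup_resolve` merely stands in for get_user_pages):

```c
/*
 * Toy model of the fault path: the GPU "interrupt handler" only queues
 * the faulting virtual page (it cannot sleep), and a deferred worker
 * later resolves it - standing in for get_user_pages(), which may
 * sleep - and updates the GPU page table. Illustrative names throughout.
 */
#include <stddef.h>

#define QLEN 16

struct fault_queue {
    size_t vpage[QLEN];
    size_t head, tail;
};

/* "Top half": runs in interrupt context, so just record and defer. */
static void gpu_fault_irq(struct fault_queue *q, size_t vpage)
{
    q->vpage[q->tail++] = vpage;
}

/* Stand-in for gup: resolve a user page; the real call may sleep. */
static unsigned long gup_resolve(const unsigned long *cpu_pte, size_t vpage)
{
    return cpu_pte[vpage];
}

/* "Work thread": drain the queue and fill the GPU page table. */
static size_t gpu_fault_worker(struct fault_queue *q,
                               const unsigned long *cpu_pte,
                               unsigned long *gpu_pte)
{
    size_t handled = 0;

    while (q->head != q->tail) {
        size_t v = q->vpage[q->head++];
        gpu_pte[v] = gup_resolve(cpu_pte, v);
        handled++;
    }
    return handled;
}
```

The design point is simply that the sleepable part (gup) is pushed out of the interrupt handler into the worker, which is why the handler only needs to enqueue.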
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-02-08 11:18 [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes Shachar Raindel 2013-02-08 15:21 ` Jerome Glisse @ 2013-02-09 6:05 ` Michel Lespinasse 2013-02-09 16:29 ` Jerome Glisse 2013-02-10 7:54 ` Shachar Raindel 2013-04-09 8:17 ` Simon Jeons 2 siblings, 2 replies; 34+ messages in thread From: Michel Lespinasse @ 2013-02-09 6:05 UTC (permalink / raw) To: Shachar Raindel Cc: lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: > Hi, > > We would like to present a reference implementation for safely sharing > memory pages from user space with the hardware, without pinning. > > We will be happy to hear the community feedback on our prototype > implementation, and suggestions for future improvements. > > We would also like to discuss adding features to the core MM subsystem to > assist hardware access to user memory without pinning. This sounds kinda scary TBH; however I do understand the need for such technology. I think one issue is that many MM developers are insufficiently aware of such developments; having a technology presentation would probably help there; but traditionally LSF/MM sessions are more interactive between developers who are already quite familiar with the technology. I think it would help if you could send in advance a detailed presentation of the problem and the proposed solutions (and then what they require of the MM layer) so people can be better prepared. And first I'd like to ask, aren't IOMMUs supposed to already largely solve this problem ? (probably a dumb question, but that just tells you how much you need to explain :) -- Michel "Walken" Lespinasse A program is never fully debugged until the last user dies. 
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-02-09 6:05 ` Michel Lespinasse @ 2013-02-09 16:29 ` Jerome Glisse 2013-04-09 8:28 ` Simon Jeons ` (2 more replies) 2013-02-10 7:54 ` Shachar Raindel 1 sibling, 3 replies; 34+ messages in thread From: Jerome Glisse @ 2013-02-09 16:29 UTC (permalink / raw) To: Michel Lespinasse Cc: Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> wrote: > On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: >> Hi, >> >> We would like to present a reference implementation for safely sharing >> memory pages from user space with the hardware, without pinning. >> >> We will be happy to hear the community feedback on our prototype >> implementation, and suggestions for future improvements. >> >> We would also like to discuss adding features to the core MM subsystem to >> assist hardware access to user memory without pinning. > > This sounds kinda scary TBH; however I do understand the need for such > technology. > > I think one issue is that many MM developers are insufficiently aware > of such developments; having a technology presentation would probably > help there; but traditionally LSF/MM sessions are more interactive > between developers who are already quite familiar with the technology. > I think it would help if you could send in advance a detailed > presentation of the problem and the proposed solutions (and then what > they require of the MM layer) so people can be better prepared. > > And first I'd like to ask, aren't IOMMUs supposed to already largely > solve this problem ? (probably a dumb question, but that just tells > you how much you need to explain :) For GPU the motivation is threefold. 
With the advance of GPU compute and also with newer graphics programs, we see a massive increase in GPU memory consumption. We can easily reach buffers that are bigger than 1 GB. So the first motivation is to directly use the memory the user allocated through malloc in the GPU; this avoids copying 1 GB of data with the CPU to the GPU buffer. The second, and most important to GPU compute, is the use of the GPU seamlessly with the CPU; in order to achieve this you want the programmer to have a single address space on the CPU and GPU, so that the same address points to the same object on the GPU as on the CPU. This would also be a tremendously cleaner design from the driver's point of view toward memory management. And last, the most important: with such big buffers (>1 GB), memory pinning is becoming way too expensive and also drastically reduces the freedom of the mm to free pages for other processes. Most of the time only a small window (everything is relative; the window can be > 100 MB, so not that small :)) of the object will be in use by the hardware. Hardware page fault support would avoid the necessity to pin memory and thus offer greater flexibility. At the same time the driver wants to avoid page faults as much as possible; this is why I would like to be able to give hints to the mm about ranges of addresses where it should avoid freeing pages (swapping them out). The iommu was designed with other goals, which were first to isolate devices from one another and restrict device access to allowed memory, and second to remap addresses that are above the device's address space limit. Lots of devices can only address 24 bits or 32 bits of memory, and on a computer with several GB of memory a lot of the pages suddenly become unreachable to the hardware. The iommu allows working around this by remapping those high pages into addresses that the hardware can reach. Hardware page fault support is a new feature of the iommu, designed to help the OS and driver reduce memory pinning and also share an address space. 
Though I am sure there are other motivations that I am not even aware of or would think of. Btw I won't be at LSF/MM, so a good free beer (or other beverage) on me for whoever takes notes on this subject at the next conf we run into each other. Cheers, Jerome
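The address-limit remapping role of the IOMMU mentioned above can be illustrated with a toy single-level IOVA table (invented names and a userspace model; a real IOMMU does this through hardware page tables): high physical pages are assigned device-visible addresses inside a window the limited device can actually reach.

```c
/*
 * Toy illustration of the remapping role described above: a device that
 * can only address 32 bits gets high physical pages remapped into a
 * small device-visible (IOVA) window. Invented names, userspace model.
 */
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE  4096UL
#define IOVA_BASE  0x10000000UL      /* well below the 4 GiB limit */
#define IOVA_SLOTS 8

struct toy_iommu {
    uint64_t phys[IOVA_SLOTS];       /* slot -> physical page address */
    size_t used;
};

/* Map one physical page (possibly above 4 GiB) into the device window;
 * returns a device-visible address guaranteed to fit in 32 bits. */
static uint64_t iommu_map(struct toy_iommu *io, uint64_t phys_page)
{
    io->phys[io->used] = phys_page;
    return IOVA_BASE + PAGE_SIZE * io->used++;
}

/* The "hardware" side: translate a device address back to physical. */
static uint64_t iommu_translate(const struct toy_iommu *io, uint64_t iova)
{
    size_t slot = (size_t)((iova - IOVA_BASE) / PAGE_SIZE);

    return io->phys[slot] + (iova & (PAGE_SIZE - 1));
}
```

The isolation goal Jerome lists first falls out of the same structure: the device can only reach physical pages that appear in the table.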
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-02-09 16:29 ` Jerome Glisse @ 2013-04-09 8:28 ` Simon Jeons 2013-04-09 14:21 ` Jerome Glisse 2013-04-10 1:57 ` Simon Jeons 2013-04-15 8:39 ` Simon Jeons 2 siblings, 1 reply; 34+ messages in thread From: Simon Jeons @ 2013-04-09 8:28 UTC (permalink / raw) To: Jerome Glisse Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss Hi Jerome, On 02/10/2013 12:29 AM, Jerome Glisse wrote: > On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> wrote: >> On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: >>> Hi, >>> >>> We would like to present a reference implementation for safely sharing >>> memory pages from user space with the hardware, without pinning. >>> >>> We will be happy to hear the community feedback on our prototype >>> implementation, and suggestions for future improvements. >>> >>> We would also like to discuss adding features to the core MM subsystem to >>> assist hardware access to user memory without pinning. >> This sounds kinda scary TBH; however I do understand the need for such >> technology. >> >> I think one issue is that many MM developers are insufficiently aware >> of such developments; having a technology presentation would probably >> help there; but traditionally LSF/MM sessions are more interactive >> between developers who are already quite familiar with the technology. >> I think it would help if you could send in advance a detailed >> presentation of the problem and the proposed solutions (and then what >> they require of the MM layer) so people can be better prepared. >> >> And first I'd like to ask, aren't IOMMUs supposed to already largely >> solve this problem ? 
(probably a dumb question, but that just tells >> you how much you need to explain :) > For GPU the motivation is three fold. With the advance of GPU compute > and also with newer graphic program we see a massive increase in GPU > memory consumption. We easily can reach buffer that are bigger than > 1gbytes. So the first motivation is to directly use the memory the > user allocated through malloc in the GPU this avoid copying 1gbytes of > data with the cpu to the gpu buffer. The second and mostly important > to GPU compute is the use of GPU seamlessly with the CPU, in order to > achieve this you want the programmer to have a single address space on > the CPU and GPU. So that the same address point to the same object on > GPU as on the CPU. This would also be a tremendous cleaner design from > driver point of view toward memory management. > > And last, the most important, with such big buffer (>1gbytes) the > memory pinning is becoming way to expensive and also drastically > reduce the freedom of the mm to free page for other process. Most of > the time a small window (every thing is relative the window can be > > 100mbytes not so small :)) of the object will be in use by the > hardware. The hardware pagefault support would avoid the necessity to What is the meaning of a hardware page fault? > pin memory and thus offer greater flexibility. At the same time the > driver wants to avoid page fault as much as possible this is why i > would like to be able to give hint to the mm about range of address it > should avoid freeing page (swapping them out). > > The iommu was designed with other goals, which were first isolate > device from one another and restrict device access to allowed memory. > Second allow to remap address that are above device address space When is address remapping needed? > limit. Lot of device can only address 24bit or 32bit of memory and > with computer with several gbytes of memory suddenly lot of the page > become unreachable to the hardware. 
The iommu allow to work around > this by remapping those high page into address that the hardware can > reach. > > The hardware page fault support is a new feature of iommu designed to > help the os and driver to reduce memory pinning and also share address > space. Thought i am sure there are other motivations that i am not > even aware off or would think off. > > Btw i won't be at LSF/MM so a free good beer (or other beverage) on me > to whoever takes note on this subject in next conf we run into each > others. > > Cheers, > Jerome > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-09 8:28 ` Simon Jeons @ 2013-04-09 14:21 ` Jerome Glisse 2013-04-10 1:41 ` Simon Jeons 0 siblings, 1 reply; 34+ messages in thread From: Jerome Glisse @ 2013-04-09 14:21 UTC (permalink / raw) To: Simon Jeons Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss On Tue, Apr 09, 2013 at 04:28:09PM +0800, Simon Jeons wrote: > Hi Jerome, > On 02/10/2013 12:29 AM, Jerome Glisse wrote: > >On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> wrote: > >>On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: > >>>Hi, > >>> > >>>We would like to present a reference implementation for safely sharing > >>>memory pages from user space with the hardware, without pinning. > >>> > >>>We will be happy to hear the community feedback on our prototype > >>>implementation, and suggestions for future improvements. > >>> > >>>We would also like to discuss adding features to the core MM subsystem to > >>>assist hardware access to user memory without pinning. > >>This sounds kinda scary TBH; however I do understand the need for such > >>technology. > >> > >>I think one issue is that many MM developers are insufficiently aware > >>of such developments; having a technology presentation would probably > >>help there; but traditionally LSF/MM sessions are more interactive > >>between developers who are already quite familiar with the technology. > >>I think it would help if you could send in advance a detailed > >>presentation of the problem and the proposed solutions (and then what > >>they require of the MM layer) so people can be better prepared. > >> > >>And first I'd like to ask, aren't IOMMUs supposed to already largely > >>solve this problem ? 
(probably a dumb question, but that just tells > >>you how much you need to explain :) > >For GPU the motivation is three fold. With the advance of GPU compute > >and also with newer graphic program we see a massive increase in GPU > >memory consumption. We easily can reach buffer that are bigger than > >1gbytes. So the first motivation is to directly use the memory the > >user allocated through malloc in the GPU this avoid copying 1gbytes of > >data with the cpu to the gpu buffer. The second and mostly important > >to GPU compute is the use of GPU seamlessly with the CPU, in order to > >achieve this you want the programmer to have a single address space on > >the CPU and GPU. So that the same address point to the same object on > >GPU as on the CPU. This would also be a tremendous cleaner design from > >driver point of view toward memory management. > > > >And last, the most important, with such big buffer (>1gbytes) the > >memory pinning is becoming way to expensive and also drastically > >reduce the freedom of the mm to free page for other process. Most of > >the time a small window (every thing is relative the window can be > > >100mbytes not so small :)) of the object will be in use by the > >hardware. The hardware pagefault support would avoid the necessity to > > What's the meaning of hardware pagefault? It's a PCIe extension (well, a combination of extensions that together allow it; see http://www.pcisig.com/specifications/iov/ats/). The idea is that the IOMMU can trigger a regular page fault inside a process address space on behalf of the hardware. The only IOMMU supporting this right now is the AMD IOMMU v2, found on recent AMD platforms. > > >pin memory and thus offer greater flexibility. At the same time the > >driver wants to avoid page fault as much as possible this is why i > >would like to be able to give hint to the mm about range of address it > >should avoid freeing page (swapping them out).
> > > >The iommu was designed with other goals, which were first isolate > >device from one another and restrict device access to allowed memory. > >Second allow to remap address that are above device address space > > When need remap address? Some hardware has a 24-bit or 32-bit address limitation; the IOMMU allows remapping memory that is above this range into the working range of the device, just as I said below. Or is your question about something different? Cheers, Jerome > >limit. Lot of device can only address 24bit or 32bit of memory and > >with computer with several gbytes of memory suddenly lot of the page > >become unreachable to the hardware. The iommu allow to work around > >this by remapping those high page into address that the hardware can > >reach. > > > >The hardware page fault support is a new feature of iommu designed to > >help the os and driver to reduce memory pinning and also share address > >space. Thought i am sure there are other motivations that i am not > >even aware off or would think off. > > > >Btw i won't be at LSF/MM so a free good beer (or other beverage) on me > >to whoever takes note on this subject in next conf we run into each > >others. > > > > Cheers, > > Jerome ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-09 14:21 ` Jerome Glisse @ 2013-04-10 1:41 ` Simon Jeons 2013-04-10 20:45 ` Jerome Glisse 0 siblings, 1 reply; 34+ messages in thread From: Simon Jeons @ 2013-04-10 1:41 UTC (permalink / raw) To: Jerome Glisse Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss Hi Jerome, On 04/09/2013 10:21 PM, Jerome Glisse wrote: > On Tue, Apr 09, 2013 at 04:28:09PM +0800, Simon Jeons wrote: >> Hi Jerome, >> On 02/10/2013 12:29 AM, Jerome Glisse wrote: >>> On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> wrote: >>>> On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: >>>>> Hi, >>>>> >>>>> We would like to present a reference implementation for safely sharing >>>>> memory pages from user space with the hardware, without pinning. >>>>> >>>>> We will be happy to hear the community feedback on our prototype >>>>> implementation, and suggestions for future improvements. >>>>> >>>>> We would also like to discuss adding features to the core MM subsystem to >>>>> assist hardware access to user memory without pinning. >>>> This sounds kinda scary TBH; however I do understand the need for such >>>> technology. >>>> >>>> I think one issue is that many MM developers are insufficiently aware >>>> of such developments; having a technology presentation would probably >>>> help there; but traditionally LSF/MM sessions are more interactive >>>> between developers who are already quite familiar with the technology. >>>> I think it would help if you could send in advance a detailed >>>> presentation of the problem and the proposed solutions (and then what >>>> they require of the MM layer) so people can be better prepared. >>>> >>>> And first I'd like to ask, aren't IOMMUs supposed to already largely >>>> solve this problem ? 
(probably a dumb question, but that just tells >>>> you how much you need to explain :) >>> For GPU the motivation is three fold. With the advance of GPU compute >>> and also with newer graphic program we see a massive increase in GPU >>> memory consumption. We easily can reach buffer that are bigger than >>> 1gbytes. So the first motivation is to directly use the memory the >>> user allocated through malloc in the GPU this avoid copying 1gbytes of >>> data with the cpu to the gpu buffer. The second and mostly important >>> to GPU compute is the use of GPU seamlessly with the CPU, in order to >>> achieve this you want the programmer to have a single address space on >>> the CPU and GPU. So that the same address point to the same object on >>> GPU as on the CPU. This would also be a tremendous cleaner design from >>> driver point of view toward memory management. >>> >>> And last, the most important, with such big buffer (>1gbytes) the >>> memory pinning is becoming way to expensive and also drastically >>> reduce the freedom of the mm to free page for other process. Most of >>> the time a small window (every thing is relative the window can be > >>> 100mbytes not so small :)) of the object will be in use by the >>> hardware. The hardware pagefault support would avoid the necessity to >> What's the meaning of hardware pagefault? > It's a PCIE extension (well it's a combination of extension that allow > that see http://www.pcisig.com/specifications/iov/ats/). Idea is that the > iommu can trigger a regular pagefault inside a process address space on > behalf of the hardware. The only iommu supporting that right now is the > AMD iommu v2 that you find on recent AMD platform. Why need hardware page fault? regular page fault is trigger by cpu mmu, correct? >>> pin memory and thus offer greater flexibility. 
At the same time the >>> driver wants to avoid page fault as much as possible this is why i >>> would like to be able to give hint to the mm about range of address it >>> should avoid freeing page (swapping them out). >>> >>> The iommu was designed with other goals, which were first isolate >>> device from one another and restrict device access to allowed memory. >>> Second allow to remap address that are above device address space >> When need remap address? > Some hardware have 24bits or 32bits address limitation, iommu allow to > remap memory that are above this range into the working range of the > device. Just as i said below. Or are your question different ? Oh, so this method can replace bounce buffers, correct? > > Cheers, > Jerome > >>> limit. Lot of device can only address 24bit or 32bit of memory and >>> with computer with several gbytes of memory suddenly lot of the page >>> become unreachable to the hardware. The iommu allow to work around >>> this by remapping those high page into address that the hardware can >>> reach. >>> >>> The hardware page fault support is a new feature of iommu designed to >>> help the os and driver to reduce memory pinning and also share address >>> space. Thought i am sure there are other motivations that i am not >>> even aware off or would think off. >>> >>> Btw i won't be at LSF/MM so a free good beer (or other beverage) on me >>> to whoever takes note on this subject in next conf we run into each >>> others. >>> ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-10 1:41 ` Simon Jeons @ 2013-04-10 20:45 ` Jerome Glisse 2013-04-11 3:42 ` Simon Jeons 0 siblings, 1 reply; 34+ messages in thread From: Jerome Glisse @ 2013-04-10 20:45 UTC (permalink / raw) To: Simon Jeons Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss On Wed, Apr 10, 2013 at 09:41:57AM +0800, Simon Jeons wrote: > Hi Jerome, > On 04/09/2013 10:21 PM, Jerome Glisse wrote: > >On Tue, Apr 09, 2013 at 04:28:09PM +0800, Simon Jeons wrote: > >>Hi Jerome, > >>On 02/10/2013 12:29 AM, Jerome Glisse wrote: > >>>On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> wrote: > >>>>On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: > >>>>>Hi, > >>>>> > >>>>>We would like to present a reference implementation for safely sharing > >>>>>memory pages from user space with the hardware, without pinning. > >>>>> > >>>>>We will be happy to hear the community feedback on our prototype > >>>>>implementation, and suggestions for future improvements. > >>>>> > >>>>>We would also like to discuss adding features to the core MM subsystem to > >>>>>assist hardware access to user memory without pinning. > >>>>This sounds kinda scary TBH; however I do understand the need for such > >>>>technology. > >>>> > >>>>I think one issue is that many MM developers are insufficiently aware > >>>>of such developments; having a technology presentation would probably > >>>>help there; but traditionally LSF/MM sessions are more interactive > >>>>between developers who are already quite familiar with the technology. > >>>>I think it would help if you could send in advance a detailed > >>>>presentation of the problem and the proposed solutions (and then what > >>>>they require of the MM layer) so people can be better prepared. 
> >>>> > >>>>And first I'd like to ask, aren't IOMMUs supposed to already largely > >>>>solve this problem ? (probably a dumb question, but that just tells > >>>>you how much you need to explain :) > >>>For GPU the motivation is three fold. With the advance of GPU compute > >>>and also with newer graphic program we see a massive increase in GPU > >>>memory consumption. We easily can reach buffer that are bigger than > >>>1gbytes. So the first motivation is to directly use the memory the > >>>user allocated through malloc in the GPU this avoid copying 1gbytes of > >>>data with the cpu to the gpu buffer. The second and mostly important > >>>to GPU compute is the use of GPU seamlessly with the CPU, in order to > >>>achieve this you want the programmer to have a single address space on > >>>the CPU and GPU. So that the same address point to the same object on > >>>GPU as on the CPU. This would also be a tremendous cleaner design from > >>>driver point of view toward memory management. > >>> > >>>And last, the most important, with such big buffer (>1gbytes) the > >>>memory pinning is becoming way to expensive and also drastically > >>>reduce the freedom of the mm to free page for other process. Most of > >>>the time a small window (every thing is relative the window can be > > >>>100mbytes not so small :)) of the object will be in use by the > >>>hardware. The hardware pagefault support would avoid the necessity to > >>What's the meaning of hardware pagefault? > >It's a PCIE extension (well it's a combination of extension that allow > >that see http://www.pcisig.com/specifications/iov/ats/). Idea is that the > >iommu can trigger a regular pagefault inside a process address space on > >behalf of the hardware. The only iommu supporting that right now is the > >AMD iommu v2 that you find on recent AMD platform. > > Why need hardware page fault? regular page fault is trigger by cpu > mmu, correct? Well here i abuse regular page fault term. 
The idea is that with hardware page faults you don't need to pin memory or take a reference on a page for the hardware to use it, so the kernel can free, as usual, pages that would otherwise have been pinned. If the GPU is really using them, it will trigger a fault through the IOMMU driver, which calls get_user_pages (and can end up calling handle_mm_fault, just like a regular page fault taken on the CPU). One use case is a GPU working on a big dataset (think a buffer of a gigabyte or more that can sit on disk and just be paged in when a chunk is needed). This is one example, but usually GPUs work on very large datasets, because that's what they are good at. > > >>>pin memory and thus offer greater flexibility. At the same time the > >>>driver wants to avoid page fault as much as possible this is why i > >>>would like to be able to give hint to the mm about range of address it > >>>should avoid freeing page (swapping them out). > >>> > >>>The iommu was designed with other goals, which were first isolate > >>>device from one another and restrict device access to allowed memory. > >>>Second allow to remap address that are above device address space > >>When need remap address? > >Some hardware have 24bits or 32bits address limitation, iommu allow to > >remap memory that are above this range into the working range of the > >device. Just as i said below. Or are your question different ? > > Oh, this method can replace bounce buffer, correct? Yes, no bounce buffer; bounce buffers are frowned upon in the GPU world because you really, really don't want to use the DMA sync buffer API. Cheers, Jerome ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-10 20:45 ` Jerome Glisse @ 2013-04-11 3:42 ` Simon Jeons 2013-04-11 18:38 ` Jerome Glisse 0 siblings, 1 reply; 34+ messages in thread From: Simon Jeons @ 2013-04-11 3:42 UTC (permalink / raw) To: Jerome Glisse Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss Hi Jerome, On 04/11/2013 04:45 AM, Jerome Glisse wrote: > On Wed, Apr 10, 2013 at 09:41:57AM +0800, Simon Jeons wrote: >> Hi Jerome, >> On 04/09/2013 10:21 PM, Jerome Glisse wrote: >>> On Tue, Apr 09, 2013 at 04:28:09PM +0800, Simon Jeons wrote: >>>> Hi Jerome, >>>> On 02/10/2013 12:29 AM, Jerome Glisse wrote: >>>>> On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> wrote: >>>>>> On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: >>>>>>> Hi, >>>>>>> >>>>>>> We would like to present a reference implementation for safely sharing >>>>>>> memory pages from user space with the hardware, without pinning. >>>>>>> >>>>>>> We will be happy to hear the community feedback on our prototype >>>>>>> implementation, and suggestions for future improvements. >>>>>>> >>>>>>> We would also like to discuss adding features to the core MM subsystem to >>>>>>> assist hardware access to user memory without pinning. >>>>>> This sounds kinda scary TBH; however I do understand the need for such >>>>>> technology. >>>>>> >>>>>> I think one issue is that many MM developers are insufficiently aware >>>>>> of such developments; having a technology presentation would probably >>>>>> help there; but traditionally LSF/MM sessions are more interactive >>>>>> between developers who are already quite familiar with the technology. 
>>>>>> I think it would help if you could send in advance a detailed >>>>>> presentation of the problem and the proposed solutions (and then what >>>>>> they require of the MM layer) so people can be better prepared. >>>>>> >>>>>> And first I'd like to ask, aren't IOMMUs supposed to already largely >>>>>> solve this problem ? (probably a dumb question, but that just tells >>>>>> you how much you need to explain :) >>>>> For GPU the motivation is three fold. With the advance of GPU compute >>>>> and also with newer graphic program we see a massive increase in GPU >>>>> memory consumption. We easily can reach buffer that are bigger than >>>>> 1gbytes. So the first motivation is to directly use the memory the >>>>> user allocated through malloc in the GPU this avoid copying 1gbytes of >>>>> data with the cpu to the gpu buffer. The second and mostly important >>>>> to GPU compute is the use of GPU seamlessly with the CPU, in order to >>>>> achieve this you want the programmer to have a single address space on >>>>> the CPU and GPU. So that the same address point to the same object on >>>>> GPU as on the CPU. This would also be a tremendous cleaner design from >>>>> driver point of view toward memory management. >>>>> >>>>> And last, the most important, with such big buffer (>1gbytes) the >>>>> memory pinning is becoming way to expensive and also drastically >>>>> reduce the freedom of the mm to free page for other process. Most of >>>>> the time a small window (every thing is relative the window can be > >>>>> 100mbytes not so small :)) of the object will be in use by the >>>>> hardware. The hardware pagefault support would avoid the necessity to >>>> What's the meaning of hardware pagefault? >>> It's a PCIE extension (well it's a combination of extension that allow >>> that see http://www.pcisig.com/specifications/iov/ats/). Idea is that the >>> iommu can trigger a regular pagefault inside a process address space on >>> behalf of the hardware. 
The only iommu supporting that right now is the >>> AMD iommu v2 that you find on recent AMD platform. >> Why need hardware page fault? regular page fault is trigger by cpu >> mmu, correct? > Well here i abuse regular page fault term. Idea is that with hardware page > fault you don't need to pin memory or take reference on page for hardware to > use it. So that kernel can free as usual page that would otherwise have been For the case when GPU need to pin memory, why GPU need grap the memory of normal process instead of allocating for itself? > pinned. If GPU is really using them it will trigger a fault through the iommu > driver that call get_user_pages (which can end up calling handle_mm_fault like > a regular page fault that happened on the CPU). This time normal process can't use this page, correct? So GPU and normal process both have their own pages? > One use case is GPU working on BIG dataset (think > GB buffer that can be on disk > and just paged in when a chunk is needed). This is one example, but usualy GPU > works on very large dataset because that's what they are good at. >>>>> pin memory and thus offer greater flexibility. At the same time the >>>>> driver wants to avoid page fault as much as possible this is why i >>>>> would like to be able to give hint to the mm about range of address it >>>>> should avoid freeing page (swapping them out). >>>>> >>>>> The iommu was designed with other goals, which were first isolate >>>>> device from one another and restrict device access to allowed memory. >>>>> Second allow to remap address that are above device address space >>>> When need remap address? >>> Some hardware have 24bits or 32bits address limitation, iommu allow to >>> remap memory that are above this range into the working range of the >>> device. Just as i said below. Or are your question different ? >> Oh, this method can replace bounce buffer, correct? 
> Yes, no bounce buffer, bounce buffer is frowned upon in GPU world because you > really really really don't want to use the dma sync buffer API. > > Cheers, > Jerome ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-11 3:42 ` Simon Jeons @ 2013-04-11 18:38 ` Jerome Glisse 2013-04-12 1:54 ` Simon Jeons 0 siblings, 1 reply; 34+ messages in thread From: Jerome Glisse @ 2013-04-11 18:38 UTC (permalink / raw) To: Simon Jeons Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss On Thu, Apr 11, 2013 at 11:42:05AM +0800, Simon Jeons wrote: > Hi Jerome, > On 04/11/2013 04:45 AM, Jerome Glisse wrote: > >On Wed, Apr 10, 2013 at 09:41:57AM +0800, Simon Jeons wrote: > >>Hi Jerome, > >>On 04/09/2013 10:21 PM, Jerome Glisse wrote: > >>>On Tue, Apr 09, 2013 at 04:28:09PM +0800, Simon Jeons wrote: > >>>>Hi Jerome, > >>>>On 02/10/2013 12:29 AM, Jerome Glisse wrote: > >>>>>On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> wrote: > >>>>>>On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: > >>>>>>>Hi, > >>>>>>> > >>>>>>>We would like to present a reference implementation for safely sharing > >>>>>>>memory pages from user space with the hardware, without pinning. > >>>>>>> > >>>>>>>We will be happy to hear the community feedback on our prototype > >>>>>>>implementation, and suggestions for future improvements. > >>>>>>> > >>>>>>>We would also like to discuss adding features to the core MM subsystem to > >>>>>>>assist hardware access to user memory without pinning. > >>>>>>This sounds kinda scary TBH; however I do understand the need for such > >>>>>>technology. > >>>>>> > >>>>>>I think one issue is that many MM developers are insufficiently aware > >>>>>>of such developments; having a technology presentation would probably > >>>>>>help there; but traditionally LSF/MM sessions are more interactive > >>>>>>between developers who are already quite familiar with the technology. 
> >>>>>>I think it would help if you could send in advance a detailed > >>>>>>presentation of the problem and the proposed solutions (and then what > >>>>>>they require of the MM layer) so people can be better prepared. > >>>>>> > >>>>>>And first I'd like to ask, aren't IOMMUs supposed to already largely > >>>>>>solve this problem ? (probably a dumb question, but that just tells > >>>>>>you how much you need to explain :) > >>>>>For GPU the motivation is three fold. With the advance of GPU compute > >>>>>and also with newer graphic program we see a massive increase in GPU > >>>>>memory consumption. We easily can reach buffer that are bigger than > >>>>>1gbytes. So the first motivation is to directly use the memory the > >>>>>user allocated through malloc in the GPU this avoid copying 1gbytes of > >>>>>data with the cpu to the gpu buffer. The second and mostly important > >>>>>to GPU compute is the use of GPU seamlessly with the CPU, in order to > >>>>>achieve this you want the programmer to have a single address space on > >>>>>the CPU and GPU. So that the same address point to the same object on > >>>>>GPU as on the CPU. This would also be a tremendous cleaner design from > >>>>>driver point of view toward memory management. > >>>>> > >>>>>And last, the most important, with such big buffer (>1gbytes) the > >>>>>memory pinning is becoming way to expensive and also drastically > >>>>>reduce the freedom of the mm to free page for other process. Most of > >>>>>the time a small window (every thing is relative the window can be > > >>>>>100mbytes not so small :)) of the object will be in use by the > >>>>>hardware. The hardware pagefault support would avoid the necessity to > >>>>What's the meaning of hardware pagefault? > >>>It's a PCIE extension (well it's a combination of extension that allow > >>>that see http://www.pcisig.com/specifications/iov/ats/). 
Idea is that the > >>>iommu can trigger a regular pagefault inside a process address space on > >>>behalf of the hardware. The only iommu supporting that right now is the > >>>AMD iommu v2 that you find on recent AMD platform. > >>Why need hardware page fault? regular page fault is trigger by cpu > >>mmu, correct? > >Well here i abuse regular page fault term. Idea is that with hardware page > >fault you don't need to pin memory or take reference on page for hardware to > >use it. So that kernel can free as usual page that would otherwise have been > > For the case when GPU need to pin memory, why GPU need grap the > memory of normal process instead of allocating for itself? Pinned memory is today's world, where the GPU allocates its own memory (GBs of it) that disappears from kernel control, i.e. the kernel can no longer reclaim this memory; it is lost (I have already had complaints from users who saw GBs of memory vanish and couldn't understand why the GPU was using so much). Tomorrow's world is one where the GPU can access memory that the application allocated through a simple malloc, and where the kernel can recycle any page at any time, because of memory pressure or because the kernel decides to do so. That's what we want to do, and to achieve it we are getting hardware that can take page faults. No change to the kernel's core mm code is required (though some improvements might be made). > > >pinned. If GPU is really using them it will trigger a fault through the iommu > >driver that call get_user_pages (which can end up calling handle_mm_fault like > >a regular page fault that happened on the CPU). > > This time normal process can't use this page, correct? So GPU and > normal process both have their own pages? No. In tomorrow's world the GPU and the CPU both use the same page, in the same address space, at the same time, just like two CPU cores each running a different thread of the same process would. Just consider the GPU a new CPU core working in the same address space, using the same memory, all at the same time as the CPU. Cheers, Jerome ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-11 18:38 ` Jerome Glisse @ 2013-04-12 1:54 ` Simon Jeons 2013-04-12 2:11 ` [Lsf-pc] " Rik van Riel 2013-04-12 2:57 ` Jerome Glisse 0 siblings, 2 replies; 34+ messages in thread From: Simon Jeons @ 2013-04-12 1:54 UTC (permalink / raw) To: Jerome Glisse Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss Hi Jerome, On 04/12/2013 02:38 AM, Jerome Glisse wrote: > On Thu, Apr 11, 2013 at 11:42:05AM +0800, Simon Jeons wrote: >> Hi Jerome, >> On 04/11/2013 04:45 AM, Jerome Glisse wrote: >>> On Wed, Apr 10, 2013 at 09:41:57AM +0800, Simon Jeons wrote: >>>> Hi Jerome, >>>> On 04/09/2013 10:21 PM, Jerome Glisse wrote: >>>>> On Tue, Apr 09, 2013 at 04:28:09PM +0800, Simon Jeons wrote: >>>>>> Hi Jerome, >>>>>> On 02/10/2013 12:29 AM, Jerome Glisse wrote: >>>>>>> On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> wrote: >>>>>>>> On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> We would like to present a reference implementation for safely sharing >>>>>>>>> memory pages from user space with the hardware, without pinning. >>>>>>>>> >>>>>>>>> We will be happy to hear the community feedback on our prototype >>>>>>>>> implementation, and suggestions for future improvements. >>>>>>>>> >>>>>>>>> We would also like to discuss adding features to the core MM subsystem to >>>>>>>>> assist hardware access to user memory without pinning. >>>>>>>> This sounds kinda scary TBH; however I do understand the need for such >>>>>>>> technology. 
>>>>>>>> >>>>>>>> I think one issue is that many MM developers are insufficiently aware >>>>>>>> of such developments; having a technology presentation would probably >>>>>>>> help there; but traditionally LSF/MM sessions are more interactive >>>>>>>> between developers who are already quite familiar with the technology. >>>>>>>> I think it would help if you could send in advance a detailed >>>>>>>> presentation of the problem and the proposed solutions (and then what >>>>>>>> they require of the MM layer) so people can be better prepared. >>>>>>>> >>>>>>>> And first I'd like to ask, aren't IOMMUs supposed to already largely >>>>>>>> solve this problem ? (probably a dumb question, but that just tells >>>>>>>> you how much you need to explain :) >>>>>>> For GPU the motivation is three fold. With the advance of GPU compute >>>>>>> and also with newer graphic program we see a massive increase in GPU >>>>>>> memory consumption. We easily can reach buffer that are bigger than >>>>>>> 1gbytes. So the first motivation is to directly use the memory the >>>>>>> user allocated through malloc in the GPU this avoid copying 1gbytes of >>>>>>> data with the cpu to the gpu buffer. The second and mostly important >>>>>>> to GPU compute is the use of GPU seamlessly with the CPU, in order to >>>>>>> achieve this you want the programmer to have a single address space on >>>>>>> the CPU and GPU. So that the same address point to the same object on >>>>>>> GPU as on the CPU. This would also be a tremendous cleaner design from >>>>>>> driver point of view toward memory management. >>>>>>> >>>>>>> And last, the most important, with such big buffer (>1gbytes) the >>>>>>> memory pinning is becoming way to expensive and also drastically >>>>>>> reduce the freedom of the mm to free page for other process. Most of >>>>>>> the time a small window (every thing is relative the window can be > >>>>>>> 100mbytes not so small :)) of the object will be in use by the >>>>>>> hardware. 
The hardware pagefault support would avoid the necessity to >>>>>> What's the meaning of hardware pagefault? >>>>> It's a PCIE extension (well it's a combination of extension that allow >>>>> that see http://www.pcisig.com/specifications/iov/ats/). Idea is that the >>>>> iommu can trigger a regular pagefault inside a process address space on >>>>> behalf of the hardware. The only iommu supporting that right now is the >>>>> AMD iommu v2 that you find on recent AMD platform. >>>> Why need hardware page fault? regular page fault is trigger by cpu >>>> mmu, correct? >>> Well here i abuse regular page fault term. Idea is that with hardware page >>> fault you don't need to pin memory or take reference on page for hardware to >>> use it. So that kernel can free as usual page that would otherwise have been >> For the case when GPU need to pin memory, why GPU need grap the >> memory of normal process instead of allocating for itself? > Pin memory is today world where gpu allocate its own memory (GB of memory) > that disappear from kernel control ie kernel can no longer reclaim this > memory it's lost memory (i had complain about that already from user than > saw GB of memory vanish and couldn't understand why the GPU was using so > much). > > Tomorrow world we want gpu to be able to access memory that the application > allocated through a simple malloc and we want the kernel to be able to > recycly any page at any time because of memory pressure or because kernel > decide to do so. > > That's just what we want to do. To achieve so we are getting hw that can do > pagefault. No change to kernel core mm code (some improvement might be made). The memory disappears because a reference (taken via gup) is held on it, correct? In the tomorrow world you want the page fault to be triggered through the iommu driver, which calls get_user_pages; that will also take a reference (since gup is called), won't it?
Anyway, assuming the tomorrow world doesn't take a reference, don't we need to care when a page used by the GPU is reclaimed? > >>> pinned. If GPU is really using them it will trigger a fault through the iommu >>> driver that call get_user_pages (which can end up calling handle_mm_fault like >>> a regular page fault that happened on the CPU). >> This time normal process can't use this page, correct? So GPU and >> normal process both have their own pages? > No, tomorrow world, gpu and cpu both using same page in same address space at > the same time. Just like two cpu core each running a different thread of > the same process would. Just consider the gpu as a new cpu core working in same > address space using the same memory all at the same time as cpu. > > Cheers, > Jerome
* Re: [Lsf-pc] [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-12 1:54 ` Simon Jeons @ 2013-04-12 2:11 ` Rik van Riel 2013-04-12 2:57 ` Jerome Glisse 1 sibling, 0 replies; 34+ messages in thread From: Rik van Riel @ 2013-04-12 2:11 UTC (permalink / raw) To: Simon Jeons Cc: Jerome Glisse, Andrea Arcangeli, Haggai Eran, lsf-pc, Liran Liss, Shachar Raindel, Sagi Grimberg, Roland Dreier, linux-mm, Or Gerlitz, Michel Lespinasse On 04/11/2013 09:54 PM, Simon Jeons wrote: > Hi Jerome, > On 04/12/2013 02:38 AM, Jerome Glisse wrote: >> On Thu, Apr 11, 2013 at 11:42:05AM +0800, Simon Jeons wrote: >> Tomorrow world we want gpu to be able to access memory that the >> application >> allocated through a simple malloc and we want the kernel to be able to >> recycly any page at any time because of memory pressure or because kernel >> decide to do so. >> >> That's just what we want to do. To achieve so we are getting hw that >> can do >> pagefault. No change to kernel core mm code (some improvement might be >> made). > > The memory disappear since you have a reference(gup) against it, > correct? Tomorrow world you want the page fault trigger through iommu > driver that call get_user_pages, it also will take a reference(since gup > is called), isn't it? Anyway, assume tomorrow world doesn't take a > reference, we don't need care page which used by GPU is reclaimed? The GPU and CPU may each have a different page table format. The kernel will need to keep both in sync. That is one of the things this discussion is about. For performance reasons, it may also make sense to locate some of the application's data in the GPU's own memory, so it does not have to cross the PCIE bus every time it needs to load the data. That requires memory coherency code in the kernel. -- All rights reversed
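Rik's point that the CPU and GPU may use different page table entry layouts, with the kernel translating between them to keep the secondary table in sync, can be sketched in userspace. Both PTE formats below are invented for illustration (they are not the real x86 or any GPU format), and `cpu_pte_to_gpu_pte` is a hypothetical helper, not a kernel function:

```c
#include <assert.h>
#include <stdint.h>

/* Invented "CPU" PTE layout: bit 0 = present, bit 1 = writable,
 * bits 12 and up = page frame number. */
#define CPU_PTE_PRESENT 0x1ull
#define CPU_PTE_WRITE   0x2ull

/* Invented "GPU" PTE layout: bit 63 = valid, bit 62 = read-only,
 * low bits = page frame number. */
#define GPU_PTE_VALID   (1ull << 63)
#define GPU_PTE_RDONLY  (1ull << 62)

/* Translate one CPU-format entry into the device's format, preserving
 * the frame number and the write protection. */
static uint64_t cpu_pte_to_gpu_pte(uint64_t cpu_pte)
{
    uint64_t gpu;

    if (!(cpu_pte & CPU_PTE_PRESENT))
        return 0;                          /* not present -> invalid entry */
    gpu = GPU_PTE_VALID | (cpu_pte >> 12); /* copy the frame number */
    if (!(cpu_pte & CPU_PTE_WRITE))
        gpu |= GPU_PTE_RDONLY;             /* keep read-only pages read-only */
    return gpu;
}
```

A driver maintaining such a shadow table would run a translation like this for every entry touched by an mmu-notifier event, which is exactly the synchronization cost under discussion.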
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-12 1:54 ` Simon Jeons 2013-04-12 2:11 ` [Lsf-pc] " Rik van Riel @ 2013-04-12 2:57 ` Jerome Glisse 2013-04-12 5:44 ` Simon Jeons 1 sibling, 1 reply; 34+ messages in thread From: Jerome Glisse @ 2013-04-12 2:57 UTC (permalink / raw) To: Simon Jeons Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss [-- Attachment #1: Type: text/plain, Size: 6456 bytes --] On Thu, Apr 11, 2013 at 9:54 PM, Simon Jeons <simon.jeons@gmail.com> wrote: > Hi Jerome, > > On 04/12/2013 02:38 AM, Jerome Glisse wrote: > >> On Thu, Apr 11, 2013 at 11:42:05AM +0800, Simon Jeons wrote: >> >>> Hi Jerome, >>> On 04/11/2013 04:45 AM, Jerome Glisse wrote: >>> >>>> On Wed, Apr 10, 2013 at 09:41:57AM +0800, Simon Jeons wrote: >>>> >>>>> Hi Jerome, >>>>> On 04/09/2013 10:21 PM, Jerome Glisse wrote: >>>>> >>>>>> On Tue, Apr 09, 2013 at 04:28:09PM +0800, Simon Jeons wrote: >>>>>> >>>>>>> Hi Jerome, >>>>>>> On 02/10/2013 12:29 AM, Jerome Glisse wrote: >>>>>>> >>>>>>>> On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse < >>>>>>>> walken@google.com> wrote: >>>>>>>> >>>>>>>>> On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel < >>>>>>>>> raindel@mellanox.com> wrote: >>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> We would like to present a reference implementation for safely >>>>>>>>>> sharing >>>>>>>>>> memory pages from user space with the hardware, without pinning. >>>>>>>>>> >>>>>>>>>> We will be happy to hear the community feedback on our prototype >>>>>>>>>> implementation, and suggestions for future improvements. >>>>>>>>>> >>>>>>>>>> We would also like to discuss adding features to the core MM >>>>>>>>>> subsystem to >>>>>>>>>> assist hardware access to user memory without pinning. 
>>>>>>>>>> >>>>>>>>> This sounds kinda scary TBH; however I do understand the need for >>>>>>>>> such >>>>>>>>> technology. >>>>>>>>> >>>>>>>>> I think one issue is that many MM developers are insufficiently >>>>>>>>> aware >>>>>>>>> of such developments; having a technology presentation would >>>>>>>>> probably >>>>>>>>> help there; but traditionally LSF/MM sessions are more interactive >>>>>>>>> between developers who are already quite familiar with the >>>>>>>>> technology. >>>>>>>>> I think it would help if you could send in advance a detailed >>>>>>>>> presentation of the problem and the proposed solutions (and then >>>>>>>>> what >>>>>>>>> they require of the MM layer) so people can be better prepared. >>>>>>>>> >>>>>>>>> And first I'd like to ask, aren't IOMMUs supposed to already >>>>>>>>> largely >>>>>>>>> solve this problem ? (probably a dumb question, but that just tells >>>>>>>>> you how much you need to explain :) >>>>>>>>> >>>>>>>> For GPU the motivation is three fold. With the advance of GPU >>>>>>>> compute >>>>>>>> and also with newer graphic program we see a massive increase in GPU >>>>>>>> memory consumption. We easily can reach buffer that are bigger than >>>>>>>> 1gbytes. So the first motivation is to directly use the memory the >>>>>>>> user allocated through malloc in the GPU this avoid copying 1gbytes >>>>>>>> of >>>>>>>> data with the cpu to the gpu buffer. The second and mostly important >>>>>>>> to GPU compute is the use of GPU seamlessly with the CPU, in order >>>>>>>> to >>>>>>>> achieve this you want the programmer to have a single address space >>>>>>>> on >>>>>>>> the CPU and GPU. So that the same address point to the same object >>>>>>>> on >>>>>>>> GPU as on the CPU. This would also be a tremendous cleaner design >>>>>>>> from >>>>>>>> driver point of view toward memory management. 
>>>>>>>> >>>>>>>> And last, the most important, with such big buffer (>1gbytes) the >>>>>>>> memory pinning is becoming way to expensive and also drastically >>>>>>>> reduce the freedom of the mm to free page for other process. Most of >>>>>>>> the time a small window (every thing is relative the window can be > >>>>>>>> 100mbytes not so small :)) of the object will be in use by the >>>>>>>> hardware. The hardware pagefault support would avoid the necessity >>>>>>>> to >>>>>>>> >>>>>>> What's the meaning of hardware pagefault? >>>>>>> >>>>>> It's a PCIE extension (well it's a combination of extension that allow >>>>>> that see http://www.pcisig.com/**specifications/iov/ats/<http://www.pcisig.com/specifications/iov/ats/>). >>>>>> Idea is that the >>>>>> iommu can trigger a regular pagefault inside a process address space >>>>>> on >>>>>> behalf of the hardware. The only iommu supporting that right now is >>>>>> the >>>>>> AMD iommu v2 that you find on recent AMD platform. >>>>>> >>>>> Why need hardware page fault? regular page fault is trigger by cpu >>>>> mmu, correct? >>>>> >>>> Well here i abuse regular page fault term. Idea is that with hardware >>>> page >>>> fault you don't need to pin memory or take reference on page for >>>> hardware to >>>> use it. So that kernel can free as usual page that would otherwise have >>>> been >>>> >>> For the case when GPU need to pin memory, why GPU need grap the >>> memory of normal process instead of allocating for itself? >>> >> Pin memory is today world where gpu allocate its own memory (GB of memory) >> that disappear from kernel control ie kernel can no longer reclaim this >> memory it's lost memory (i had complain about that already from user than >> saw GB of memory vanish and couldn't understand why the GPU was using so >> much). 
>> >> Tomorrow world we want gpu to be able to access memory that the >> application >> allocated through a simple malloc and we want the kernel to be able to >> recycly any page at any time because of memory pressure or because kernel >> decide to do so. >> >> That's just what we want to do. To achieve so we are getting hw that can >> do >> pagefault. No change to kernel core mm code (some improvement might be >> made). >> > > The memory disappear since you have a reference(gup) against it, correct? > Tomorrow world you want the page fault trigger through iommu driver that > call get_user_pages, it also will take a reference(since gup is called), > isn't it? Anyway, assume tomorrow world doesn't take a reference, we don't > need care page which used by GPU is reclaimed? > > Right now the code uses gup because it's convenient, but it drops the reference right after the fault, so the reference is held only for a short period of time. And no, you don't need to care about reclaim, thanks to the mmu notifier: before a page is removed the mmu notifier is called, and since the iommu registers a notifier it gets the invalidate event, invalidates the device TLB, and things go on. If the GPU accesses the page, a new page fault happens and a new page is allocated. All this code is upstream in the Linux kernel, just read it. There is just no device that uses it yet. That being said, we will want improvements so that pages that are hot in the device are not reclaimed. But it can work without such improvements. Cheers, Jerome
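The invalidate-then-refault cycle Jerome describes (mmu notifier fires, driver drops the stale device TLB entry, the next device access faults and maps a fresh page) can be simulated in userspace. This is a toy model, not kernel code; the arrays, frame numbers, and function names are all invented for the sketch:

```c
#include <assert.h>

#define NPAGES 4

/* Toy model: -1 means "no mapping". */
static int primary[NPAGES]    = { -1, -1, -1, -1 }; /* "CPU page table"  */
static int device_tlb[NPAGES] = { -1, -1, -1, -1 }; /* "device TLB"      */
static int next_frame = 100;                        /* fake frame source */

/* Stand-in for the mmu-notifier invalidate callback: the kernel is about
 * to reclaim this page, so the driver must drop the device TLB entry. */
static void invalidate_range(int page)
{
    device_tlb[page] = -1;  /* invalidate the device's cached translation */
    primary[page]    = -1;  /* kernel frees the backing frame */
}

/* Device-side access: TLB hit, or device page fault that repopulates
 * both tables, possibly with a brand-new frame. */
static int device_access(int page)
{
    if (device_tlb[page] < 0) {               /* device page fault */
        if (primary[page] < 0)
            primary[page] = next_frame++;     /* allocate a new page */
        device_tlb[page] = primary[page];     /* refill the device TLB */
    }
    return device_tlb[page];
}
```

After an invalidation, the device transparently ends up on a different frame, which is why no long-term pin is needed.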
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-12 2:57 ` Jerome Glisse @ 2013-04-12 5:44 ` Simon Jeons 2013-04-12 13:32 ` Jerome Glisse 0 siblings, 1 reply; 34+ messages in thread From: Simon Jeons @ 2013-04-12 5:44 UTC (permalink / raw) To: Jerome Glisse Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss [-- Attachment #1: Type: text/plain, Size: 9882 bytes --] Hi Jerome, On 04/12/2013 10:57 AM, Jerome Glisse wrote: > On Thu, Apr 11, 2013 at 9:54 PM, Simon Jeons <simon.jeons@gmail.com > <mailto:simon.jeons@gmail.com>> wrote: > > Hi Jerome, > > On 04/12/2013 02:38 AM, Jerome Glisse wrote: > > On Thu, Apr 11, 2013 at 11:42:05AM +0800, Simon Jeons wrote: > > Hi Jerome, > On 04/11/2013 04:45 AM, Jerome Glisse wrote: > > On Wed, Apr 10, 2013 at 09:41:57AM +0800, Simon Jeons > wrote: > > Hi Jerome, > On 04/09/2013 10:21 PM, Jerome Glisse wrote: > > On Tue, Apr 09, 2013 at 04:28:09PM +0800, > Simon Jeons wrote: > > Hi Jerome, > On 02/10/2013 12:29 AM, Jerome Glisse wrote: > > On Sat, Feb 9, 2013 at 1:05 AM, Michel > Lespinasse <walken@google.com > <mailto:walken@google.com>> wrote: > > On Fri, Feb 8, 2013 at 3:18 AM, > Shachar Raindel > <raindel@mellanox.com > <mailto:raindel@mellanox.com>> wrote: > > Hi, > > We would like to present a > reference implementation for > safely sharing > memory pages from user space > with the hardware, without > pinning. > > We will be happy to hear the > community feedback on our > prototype > implementation, and > suggestions for future > improvements. > > We would also like to discuss > adding features to the core MM > subsystem to > assist hardware access to user > memory without pinning. > > This sounds kinda scary TBH; > however I do understand the need > for such > technology. 
> > I think one issue is that many MM > developers are insufficiently aware > of such developments; having a > technology presentation would probably > help there; but traditionally > LSF/MM sessions are more interactive > between developers who are already > quite familiar with the technology. > I think it would help if you could > send in advance a detailed > presentation of the problem and > the proposed solutions (and then what > they require of the MM layer) so > people can be better prepared. > > And first I'd like to ask, aren't > IOMMUs supposed to already largely > solve this problem ? (probably a > dumb question, but that just tells > you how much you need to explain :) > > For GPU the motivation is three fold. > With the advance of GPU compute > and also with newer graphic program we > see a massive increase in GPU > memory consumption. We easily can > reach buffer that are bigger than > 1gbytes. So the first motivation is to > directly use the memory the > user allocated through malloc in the > GPU this avoid copying 1gbytes of > data with the cpu to the gpu buffer. > The second and mostly important > to GPU compute is the use of GPU > seamlessly with the CPU, in order to > achieve this you want the programmer > to have a single address space on > the CPU and GPU. So that the same > address point to the same object on > GPU as on the CPU. This would also be > a tremendous cleaner design from > driver point of view toward memory > management. > > And last, the most important, with > such big buffer (>1gbytes) the > memory pinning is becoming way to > expensive and also drastically > reduce the freedom of the mm to free > page for other process. Most of > the time a small window (every thing > is relative the window can be > > 100mbytes not so small :)) of the > object will be in use by the > hardware. The hardware pagefault > support would avoid the necessity to > > What's the meaning of hardware pagefault? 
> > It's a PCIE extension (well it's a combination > of extension that allow > that see > http://www.pcisig.com/specifications/iov/ats/). Idea > is that the > iommu can trigger a regular pagefault inside a > process address space on > behalf of the hardware. The only iommu > supporting that right now is the > AMD iommu v2 that you find on recent AMD platform. > > Why need hardware page fault? regular page fault > is trigger by cpu > mmu, correct? > > Well here i abuse regular page fault term. Idea is > that with hardware page > fault you don't need to pin memory or take reference > on page for hardware to > use it. So that kernel can free as usual page that > would otherwise have been > > For the case when GPU need to pin memory, why GPU need > grap the > memory of normal process instead of allocating for itself? > > Pin memory is today world where gpu allocate its own memory > (GB of memory) > that disappear from kernel control ie kernel can no longer > reclaim this > memory it's lost memory (i had complain about that already > from user than > saw GB of memory vanish and couldn't understand why the GPU > was using so > much). > > Tomorrow world we want gpu to be able to access memory that > the application > allocated through a simple malloc and we want the kernel to be > able to > recycly any page at any time because of memory pressure or > because kernel > decide to do so. > > That's just what we want to do. To achieve so we are getting > hw that can do > pagefault. No change to kernel core mm code (some improvement > might be made). > > > The memory disappear since you have a reference(gup) against it, > correct? Tomorrow world you want the page fault trigger through > iommu driver that call get_user_pages, it also will take a > reference(since gup is called), isn't it? Anyway, assume tomorrow > world doesn't take a reference, we don't need care page which used > by GPU is reclaimed? 
> > > Right now code use gup because it's convenient but it drop the > reference right after the fault. So reference is hold only for short > period of time. Are you sure gup drops the reference right after the fault? I dug through the code again and failed to verify it. Could you point it out to me? > > No you don't need to care about reclaim thanks to mmu notifier, ie > before page is remove mmu notifier is call and iommu register a > notifier, so it get the invalidate event and invalidate the device tlb > and things goes on. If gpu access the page a new pagefault happen and > a new page is allocated. Good idea! ;-) > > All this code is upstream in linux kernel just read it. There is just > no device that use it yet. > > That being said we will want improvement so that page that are hot in > the device are not reclaimed. But it can work without such improvement. > > Cheers, > Jerome
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-12 5:44 ` Simon Jeons @ 2013-04-12 13:32 ` Jerome Glisse 0 siblings, 0 replies; 34+ messages in thread From: Jerome Glisse @ 2013-04-12 13:32 UTC (permalink / raw) To: Simon Jeons Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss [-- Attachment #1: Type: text/plain, Size: 6419 bytes --] On Fri, Apr 12, 2013 at 1:44 AM, Simon Jeons <simon.jeons@gmail.com> wrote: > Hi Jerome, > > On 04/12/2013 10:57 AM, Jerome Glisse wrote: > > On Thu, Apr 11, 2013 at 9:54 PM, Simon Jeons <simon.jeons@gmail.com>wrote: > >> Hi Jerome, >> >> On 04/12/2013 02:38 AM, Jerome Glisse wrote: >> >>> On Thu, Apr 11, 2013 at 11:42:05AM +0800, Simon Jeons wrote: >>> >>>> Hi Jerome, >>>> On 04/11/2013 04:45 AM, Jerome Glisse wrote: >>>> >>>>> On Wed, Apr 10, 2013 at 09:41:57AM +0800, Simon Jeons wrote: >>>>> >>>>>> Hi Jerome, >>>>>> On 04/09/2013 10:21 PM, Jerome Glisse wrote: >>>>>> >>>>>>> On Tue, Apr 09, 2013 at 04:28:09PM +0800, Simon Jeons wrote: >>>>>>> >>>>>>>> Hi Jerome, >>>>>>>> On 02/10/2013 12:29 AM, Jerome Glisse wrote: >>>>>>>> >>>>>>>>> On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse < >>>>>>>>> walken@google.com> wrote: >>>>>>>>> >>>>>>>>>> On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel < >>>>>>>>>> raindel@mellanox.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> We would like to present a reference implementation for safely >>>>>>>>>>> sharing >>>>>>>>>>> memory pages from user space with the hardware, without pinning. >>>>>>>>>>> >>>>>>>>>>> We will be happy to hear the community feedback on our prototype >>>>>>>>>>> implementation, and suggestions for future improvements. 
>>>>>>>>>>> >>>>>>>>>>> We would also like to discuss adding features to the core MM >>>>>>>>>>> subsystem to >>>>>>>>>>> assist hardware access to user memory without pinning. >>>>>>>>>>> >>>>>>>>>> This sounds kinda scary TBH; however I do understand the need for >>>>>>>>>> such >>>>>>>>>> technology. >>>>>>>>>> >>>>>>>>>> I think one issue is that many MM developers are insufficiently >>>>>>>>>> aware >>>>>>>>>> of such developments; having a technology presentation would >>>>>>>>>> probably >>>>>>>>>> help there; but traditionally LSF/MM sessions are more interactive >>>>>>>>>> between developers who are already quite familiar with the >>>>>>>>>> technology. >>>>>>>>>> I think it would help if you could send in advance a detailed >>>>>>>>>> presentation of the problem and the proposed solutions (and then >>>>>>>>>> what >>>>>>>>>> they require of the MM layer) so people can be better prepared. >>>>>>>>>> >>>>>>>>>> And first I'd like to ask, aren't IOMMUs supposed to already >>>>>>>>>> largely >>>>>>>>>> solve this problem ? (probably a dumb question, but that just >>>>>>>>>> tells >>>>>>>>>> you how much you need to explain :) >>>>>>>>>> >>>>>>>>> For GPU the motivation is three fold. With the advance of GPU >>>>>>>>> compute >>>>>>>>> and also with newer graphic program we see a massive increase in >>>>>>>>> GPU >>>>>>>>> memory consumption. We easily can reach buffer that are bigger than >>>>>>>>> 1gbytes. So the first motivation is to directly use the memory the >>>>>>>>> user allocated through malloc in the GPU this avoid copying >>>>>>>>> 1gbytes of >>>>>>>>> data with the cpu to the gpu buffer. The second and mostly >>>>>>>>> important >>>>>>>>> to GPU compute is the use of GPU seamlessly with the CPU, in order >>>>>>>>> to >>>>>>>>> achieve this you want the programmer to have a single address >>>>>>>>> space on >>>>>>>>> the CPU and GPU. So that the same address point to the same object >>>>>>>>> on >>>>>>>>> GPU as on the CPU. 
This would also be a tremendous cleaner design >>>>>>>>> from >>>>>>>>> driver point of view toward memory management. >>>>>>>>> >>>>>>>>> And last, the most important, with such big buffer (>1gbytes) the >>>>>>>>> memory pinning is becoming way to expensive and also drastically >>>>>>>>> reduce the freedom of the mm to free page for other process. Most >>>>>>>>> of >>>>>>>>> the time a small window (every thing is relative the window can be >>>>>>>>> > >>>>>>>>> 100mbytes not so small :)) of the object will be in use by the >>>>>>>>> hardware. The hardware pagefault support would avoid the necessity >>>>>>>>> to >>>>>>>>> >>>>>>>> What's the meaning of hardware pagefault? >>>>>>>> >>>>>>> It's a PCIE extension (well it's a combination of extension that >>>>>>> allow >>>>>>> that see http://www.pcisig.com/specifications/iov/ats/). Idea is >>>>>>> that the >>>>>>> iommu can trigger a regular pagefault inside a process address space >>>>>>> on >>>>>>> behalf of the hardware. The only iommu supporting that right now is >>>>>>> the >>>>>>> AMD iommu v2 that you find on recent AMD platform. >>>>>>> >>>>>> Why need hardware page fault? regular page fault is trigger by cpu >>>>>> mmu, correct? >>>>>> >>>>> Well here i abuse regular page fault term. Idea is that with hardware >>>>> page >>>>> fault you don't need to pin memory or take reference on page for >>>>> hardware to >>>>> use it. So that kernel can free as usual page that would otherwise >>>>> have been >>>>> >>>> For the case when GPU need to pin memory, why GPU need grap the >>>> memory of normal process instead of allocating for itself? >>>> >>> Pin memory is today world where gpu allocate its own memory (GB of >>> memory) >>> that disappear from kernel control ie kernel can no longer reclaim this >>> memory it's lost memory (i had complain about that already from user than >>> saw GB of memory vanish and couldn't understand why the GPU was using so >>> much). 
>>> >>> Tomorrow world we want gpu to be able to access memory that the >>> application >>> allocated through a simple malloc and we want the kernel to be able to >>> recycly any page at any time because of memory pressure or because kernel >>> decide to do so. >>> >>> That's just what we want to do. To achieve so we are getting hw that can >>> do >>> pagefault. No change to kernel core mm code (some improvement might be >>> made). >>> >> >> The memory disappear since you have a reference(gup) against it, >> correct? Tomorrow world you want the page fault trigger through iommu >> driver that call get_user_pages, it also will take a reference(since gup is >> called), isn't it? Anyway, assume tomorrow world doesn't take a reference, >> we don't need care page which used by GPU is reclaimed? >> >> > Right now code use gup because it's convenient but it drop the reference > right after the fault. So reference is hold only for short period of time. > > > Are you sure gup will drop the reference right after the fault? I redig > the codes and fail verify it. Could you point out to me? > > In amd_iommu_v2.c:do_fault, get_user_pages is followed by put_page. Cheers, Jerome
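The pattern Jerome points at, gup taking a page reference that the fault handler drops immediately, can be illustrated with a toy refcount in userspace. The names mirror the kernel API (`get_user_pages`, `put_page`, `do_fault`) but everything here is a simplified stand-in, not the actual amd_iommu_v2 code:

```c
#include <assert.h>

/* Toy page with an explicit reference count. */
struct page { int refcount; };

static void get_page(struct page *p) { p->refcount++; }
static void put_page(struct page *p) { p->refcount--; }

/* Stand-in for get_user_pages: fault the page in and take a reference
 * so it cannot be reclaimed while we look at it. */
static struct page *fake_get_user_pages(struct page *p)
{
    get_page(p);
    return p;
}

/* do_fault-style handler: resolve the faulting address, program the
 * device translation, then drop the reference straight away. */
static void do_fault(struct page *p)
{
    struct page *resolved = fake_get_user_pages(p);
    /* ... the real handler would write the translation to the IOMMU ... */
    put_page(resolved);  /* reference held only across the fault */
}
```

Because the refcount returns to its starting value after every fault, the device never pins the page long-term; reclaim safety then relies on the mmu-notifier invalidation discussed earlier in the thread.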
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-02-09 16:29 ` Jerome Glisse 2013-04-09 8:28 ` Simon Jeons @ 2013-04-10 1:57 ` Simon Jeons 2013-04-10 20:55 ` Jerome Glisse 2013-04-15 8:39 ` Simon Jeons 2 siblings, 1 reply; 34+ messages in thread From: Simon Jeons @ 2013-04-10 1:57 UTC (permalink / raw) To: Jerome Glisse Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss Hi Jerome, On 02/10/2013 12:29 AM, Jerome Glisse wrote: > On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> wrote: >> On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: >>> Hi, >>> >>> We would like to present a reference implementation for safely sharing >>> memory pages from user space with the hardware, without pinning. >>> >>> We will be happy to hear the community feedback on our prototype >>> implementation, and suggestions for future improvements. >>> >>> We would also like to discuss adding features to the core MM subsystem to >>> assist hardware access to user memory without pinning. >> This sounds kinda scary TBH; however I do understand the need for such >> technology. >> >> I think one issue is that many MM developers are insufficiently aware >> of such developments; having a technology presentation would probably >> help there; but traditionally LSF/MM sessions are more interactive >> between developers who are already quite familiar with the technology. >> I think it would help if you could send in advance a detailed >> presentation of the problem and the proposed solutions (and then what >> they require of the MM layer) so people can be better prepared. >> >> And first I'd like to ask, aren't IOMMUs supposed to already largely >> solve this problem ? 
(probably a dumb question, but that just tells >> you how much you need to explain :) > For GPU the motivation is three fold. With the advance of GPU compute > and also with newer graphic program we see a massive increase in GPU > memory consumption. We easily can reach buffer that are bigger than > 1gbytes. So the first motivation is to directly use the memory the > user allocated through malloc in the GPU this avoid copying 1gbytes of > data with the cpu to the gpu buffer. The second and mostly important > to GPU compute is the use of GPU seamlessly with the CPU, in order to > achieve this you want the programmer to have a single address space on > the CPU and GPU. So that the same address point to the same object on > GPU as on the CPU. This would also be a tremendous cleaner design from > driver point of view toward memory management. When will the GPU consume memory? A userspace process like mplayer will have video data, and the GPU will play this data and use mplayer's memory, since the video data is loaded in the mplayer process's address space? So the GPU code will call gup to take a reference on the memory? Please correct me if my understanding is wrong. ;-) > And last, the most important, with such big buffer (>1gbytes) the > memory pinning is becoming way to expensive and also drastically > reduce the freedom of the mm to free page for other process. Most of > the time a small window (every thing is relative the window can be > > 100mbytes not so small :)) of the object will be in use by the > hardware. The hardware pagefault support would avoid the necessity to > pin memory and thus offer greater flexibility. At the same time the > driver wants to avoid page fault as much as possible this is why i > would like to be able to give hint to the mm about range of address it > should avoid freeing page (swapping them out). > > The iommu was designed with other goals, which were first isolate > device from one another and restrict device access to allowed memory.
> Second allow to remap address that are above device address space > limit. Lot of device can only address 24bit or 32bit of memory and > with computer with several gbytes of memory suddenly lot of the page > become unreachable to the hardware. The iommu allow to work around > this by remapping those high page into address that the hardware can > reach. > > The hardware page fault support is a new feature of iommu designed to > help the os and driver to reduce memory pinning and also share address > space. Thought i am sure there are other motivations that i am not > even aware off or would think off. > > Btw i won't be at LSF/MM so a free good beer (or other beverage) on me > to whoever takes note on this subject in next conf we run into each > others. > > Cheers, > Jerome > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-10 1:57 ` Simon Jeons @ 2013-04-10 20:55 ` Jerome Glisse 2013-04-11 3:37 ` Simon Jeons 0 siblings, 1 reply; 34+ messages in thread From: Jerome Glisse @ 2013-04-10 20:55 UTC (permalink / raw) To: Simon Jeons Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss On Wed, Apr 10, 2013 at 09:57:02AM +0800, Simon Jeons wrote: > Hi Jerome, > On 02/10/2013 12:29 AM, Jerome Glisse wrote: > >On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> wrote: > >>On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: > >>>Hi, > >>> > >>>We would like to present a reference implementation for safely sharing > >>>memory pages from user space with the hardware, without pinning. > >>> > >>>We will be happy to hear the community feedback on our prototype > >>>implementation, and suggestions for future improvements. > >>> > >>>We would also like to discuss adding features to the core MM subsystem to > >>>assist hardware access to user memory without pinning. > >>This sounds kinda scary TBH; however I do understand the need for such > >>technology. > >> > >>I think one issue is that many MM developers are insufficiently aware > >>of such developments; having a technology presentation would probably > >>help there; but traditionally LSF/MM sessions are more interactive > >>between developers who are already quite familiar with the technology. > >>I think it would help if you could send in advance a detailed > >>presentation of the problem and the proposed solutions (and then what > >>they require of the MM layer) so people can be better prepared. > >> > >>And first I'd like to ask, aren't IOMMUs supposed to already largely > >>solve this problem ? 
(probably a dumb question, but that just tells > >>you how much you need to explain :) > >For GPU the motivation is three fold. With the advance of GPU compute > >and also with newer graphic program we see a massive increase in GPU > >memory consumption. We easily can reach buffer that are bigger than > >1gbytes. So the first motivation is to directly use the memory the > >user allocated through malloc in the GPU this avoid copying 1gbytes of > >data with the cpu to the gpu buffer. The second and mostly important > >to GPU compute is the use of GPU seamlessly with the CPU, in order to > >achieve this you want the programmer to have a single address space on > >the CPU and GPU. So that the same address point to the same object on > >GPU as on the CPU. This would also be a tremendous cleaner design from > >driver point of view toward memory management. > > When GPU will comsume memory? > > The userspace process like mplayer will have video datas and GPU > will play this datas and use memory of mplayer since these video > datas load in mplayer process's address space? So GPU codes will > call gup to take a reference of memory? Please correct me if my > understanding is wrong. ;-) The first target is not things such as video decompression, though those could also benefit from it given an updated driver kernel API. When using iommu hardware page faults we don't call get_user_pages (gup), thus we don't take a reference on the page. That's the whole point of the hardware pagefault: not taking a reference on the page. Cheers, Jerome Glisse
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-10 20:55 ` Jerome Glisse @ 2013-04-11 3:37 ` Simon Jeons 2013-04-11 18:48 ` Jerome Glisse 0 siblings, 1 reply; 34+ messages in thread From: Simon Jeons @ 2013-04-11 3:37 UTC (permalink / raw) To: Jerome Glisse Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss Hi Jerome, On 04/11/2013 04:55 AM, Jerome Glisse wrote: > On Wed, Apr 10, 2013 at 09:57:02AM +0800, Simon Jeons wrote: >> Hi Jerome, >> On 02/10/2013 12:29 AM, Jerome Glisse wrote: >>> On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> wrote: >>>> On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: >>>>> Hi, >>>>> >>>>> We would like to present a reference implementation for safely sharing >>>>> memory pages from user space with the hardware, without pinning. >>>>> >>>>> We will be happy to hear the community feedback on our prototype >>>>> implementation, and suggestions for future improvements. >>>>> >>>>> We would also like to discuss adding features to the core MM subsystem to >>>>> assist hardware access to user memory without pinning. >>>> This sounds kinda scary TBH; however I do understand the need for such >>>> technology. >>>> >>>> I think one issue is that many MM developers are insufficiently aware >>>> of such developments; having a technology presentation would probably >>>> help there; but traditionally LSF/MM sessions are more interactive >>>> between developers who are already quite familiar with the technology. >>>> I think it would help if you could send in advance a detailed >>>> presentation of the problem and the proposed solutions (and then what >>>> they require of the MM layer) so people can be better prepared. >>>> >>>> And first I'd like to ask, aren't IOMMUs supposed to already largely >>>> solve this problem ? 
(probably a dumb question, but that just tells >>>> you how much you need to explain :) >>> For GPU the motivation is three fold. With the advance of GPU compute >>> and also with newer graphic program we see a massive increase in GPU >>> memory consumption. We easily can reach buffer that are bigger than >>> 1gbytes. So the first motivation is to directly use the memory the >>> user allocated through malloc in the GPU this avoid copying 1gbytes of >>> data with the cpu to the gpu buffer. The second and mostly important >>> to GPU compute is the use of GPU seamlessly with the CPU, in order to >>> achieve this you want the programmer to have a single address space on >>> the CPU and GPU. So that the same address point to the same object on >>> GPU as on the CPU. This would also be a tremendous cleaner design from >>> driver point of view toward memory management. >> When GPU will comsume memory? >> >> The userspace process like mplayer will have video datas and GPU >> will play this datas and use memory of mplayer since these video >> datas load in mplayer process's address space? So GPU codes will >> call gup to take a reference of memory? Please correct me if my >> understanding is wrong. ;-) > First target is not thing such as video decompression, however they could > too benefit from it given updated driver kernel API. In case of using > iommu hardware page fault we don't call get_user_pages (gup) those we > don't take a reference on the page. That's the whole point of the hardware > pagefault, not taking reference on the page. mplayer process is running on normal CPU or GPU? chipset_integrated graphics will use normal memory and discrete graphics will use its own memory, correct? So the memory used by discrete graphics won't need gup, correct? > > Cheers, > Jerome Glisse -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-11 3:37 ` Simon Jeons @ 2013-04-11 18:48 ` Jerome Glisse 2013-04-12 3:13 ` Simon Jeons 0 siblings, 1 reply; 34+ messages in thread From: Jerome Glisse @ 2013-04-11 18:48 UTC (permalink / raw) To: Simon Jeons Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss On Thu, Apr 11, 2013 at 11:37:35AM +0800, Simon Jeons wrote: > Hi Jerome, > On 04/11/2013 04:55 AM, Jerome Glisse wrote: > >On Wed, Apr 10, 2013 at 09:57:02AM +0800, Simon Jeons wrote: > >>Hi Jerome, > >>On 02/10/2013 12:29 AM, Jerome Glisse wrote: > >>>On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> wrote: > >>>>On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: > >>>>>Hi, > >>>>> > >>>>>We would like to present a reference implementation for safely sharing > >>>>>memory pages from user space with the hardware, without pinning. > >>>>> > >>>>>We will be happy to hear the community feedback on our prototype > >>>>>implementation, and suggestions for future improvements. > >>>>> > >>>>>We would also like to discuss adding features to the core MM subsystem to > >>>>>assist hardware access to user memory without pinning. > >>>>This sounds kinda scary TBH; however I do understand the need for such > >>>>technology. > >>>> > >>>>I think one issue is that many MM developers are insufficiently aware > >>>>of such developments; having a technology presentation would probably > >>>>help there; but traditionally LSF/MM sessions are more interactive > >>>>between developers who are already quite familiar with the technology. > >>>>I think it would help if you could send in advance a detailed > >>>>presentation of the problem and the proposed solutions (and then what > >>>>they require of the MM layer) so people can be better prepared. 
> >>>> > >>>>And first I'd like to ask, aren't IOMMUs supposed to already largely > >>>>solve this problem ? (probably a dumb question, but that just tells > >>>>you how much you need to explain :) > >>>For GPU the motivation is three fold. With the advance of GPU compute > >>>and also with newer graphic program we see a massive increase in GPU > >>>memory consumption. We easily can reach buffer that are bigger than > >>>1gbytes. So the first motivation is to directly use the memory the > >>>user allocated through malloc in the GPU this avoid copying 1gbytes of > >>>data with the cpu to the gpu buffer. The second and mostly important > >>>to GPU compute is the use of GPU seamlessly with the CPU, in order to > >>>achieve this you want the programmer to have a single address space on > >>>the CPU and GPU. So that the same address point to the same object on > >>>GPU as on the CPU. This would also be a tremendous cleaner design from > >>>driver point of view toward memory management. > >>When GPU will comsume memory? > >> > >>The userspace process like mplayer will have video datas and GPU > >>will play this datas and use memory of mplayer since these video > >>datas load in mplayer process's address space? So GPU codes will > >>call gup to take a reference of memory? Please correct me if my > >>understanding is wrong. ;-) > >First target is not thing such as video decompression, however they could > >too benefit from it given updated driver kernel API. In case of using > >iommu hardware page fault we don't call get_user_pages (gup) those we > >don't take a reference on the page. That's the whole point of the hardware > >pagefault, not taking reference on the page. > > mplayer process is running on normal CPU or GPU? > chipset_integrated graphics will use normal memory and discrete > graphics will use its own memory, correct? So the memory used by > discrete graphics won't need gup, correct? mplayer can decode video in software an only use the cpu. 
It can also use one of the acceleration APIs such as VDPAU. In any case mplayer is still opening the video file, allocating some memory with malloc, reading from the file into this memory, eventually doing some preprocessing on that memory, and then doing a memcpy from this memory to memory allocated by the gpu driver. Now imagine a world where you don't have to memcpy so that the gpu can access it. Even if it's doable today it's really not something you want to do, ie gup on a page and not releasing the page for minutes. There are two kinds of integrated GPU; on x86 an integrated GPU should be considered as a discrete GPU because the BIOS steals a chunk of system ram and transforms it into fake vram. This stolen chunk is never ever under the control of the linux kernel (from the mm pov the gpu kernel driver is in charge of it). In any case both discrete GPUs and integrated GPUs have their own page table or memory controller, and they map system memory or video memory into it, sometimes interleaving (at address 0x100000 64k is in vram but at address 0x100000+64k it's system memory pointing to some pages). So right now any time we map a normal system ram page we take a reference on it so it does not go away. We decided not to use gup because it would break several kernel assumptions about anonymous memory in the GPU case. But we could use gup for short-lived memory transactions like a memcpy from system ram to vram (no matter if it's fake vram or real vram). Cheers, Jerome
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-11 18:48 ` Jerome Glisse @ 2013-04-12 3:13 ` Simon Jeons 2013-04-12 3:21 ` Jerome Glisse 0 siblings, 1 reply; 34+ messages in thread From: Simon Jeons @ 2013-04-12 3:13 UTC (permalink / raw) To: Jerome Glisse Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss Hi Jerome, On 04/12/2013 02:48 AM, Jerome Glisse wrote: > On Thu, Apr 11, 2013 at 11:37:35AM +0800, Simon Jeons wrote: >> Hi Jerome, >> On 04/11/2013 04:55 AM, Jerome Glisse wrote: >>> On Wed, Apr 10, 2013 at 09:57:02AM +0800, Simon Jeons wrote: >>>> Hi Jerome, >>>> On 02/10/2013 12:29 AM, Jerome Glisse wrote: >>>>> On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> wrote: >>>>>> On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: >>>>>>> Hi, >>>>>>> >>>>>>> We would like to present a reference implementation for safely sharing >>>>>>> memory pages from user space with the hardware, without pinning. >>>>>>> >>>>>>> We will be happy to hear the community feedback on our prototype >>>>>>> implementation, and suggestions for future improvements. >>>>>>> >>>>>>> We would also like to discuss adding features to the core MM subsystem to >>>>>>> assist hardware access to user memory without pinning. >>>>>> This sounds kinda scary TBH; however I do understand the need for such >>>>>> technology. >>>>>> >>>>>> I think one issue is that many MM developers are insufficiently aware >>>>>> of such developments; having a technology presentation would probably >>>>>> help there; but traditionally LSF/MM sessions are more interactive >>>>>> between developers who are already quite familiar with the technology. 
>>>>>> I think it would help if you could send in advance a detailed >>>>>> presentation of the problem and the proposed solutions (and then what >>>>>> they require of the MM layer) so people can be better prepared. >>>>>> >>>>>> And first I'd like to ask, aren't IOMMUs supposed to already largely >>>>>> solve this problem ? (probably a dumb question, but that just tells >>>>>> you how much you need to explain :) >>>>> For GPU the motivation is three fold. With the advance of GPU compute >>>>> and also with newer graphic program we see a massive increase in GPU >>>>> memory consumption. We easily can reach buffer that are bigger than >>>>> 1gbytes. So the first motivation is to directly use the memory the >>>>> user allocated through malloc in the GPU this avoid copying 1gbytes of >>>>> data with the cpu to the gpu buffer. The second and mostly important >>>>> to GPU compute is the use of GPU seamlessly with the CPU, in order to >>>>> achieve this you want the programmer to have a single address space on >>>>> the CPU and GPU. So that the same address point to the same object on >>>>> GPU as on the CPU. This would also be a tremendous cleaner design from >>>>> driver point of view toward memory management. >>>> When GPU will comsume memory? >>>> >>>> The userspace process like mplayer will have video datas and GPU >>>> will play this datas and use memory of mplayer since these video >>>> datas load in mplayer process's address space? So GPU codes will >>>> call gup to take a reference of memory? Please correct me if my >>>> understanding is wrong. ;-) >>> First target is not thing such as video decompression, however they could >>> too benefit from it given updated driver kernel API. In case of using >>> iommu hardware page fault we don't call get_user_pages (gup) those we >>> don't take a reference on the page. That's the whole point of the hardware >>> pagefault, not taking reference on the page. >> mplayer process is running on normal CPU or GPU? 
>> chipset_integrated graphics will use normal memory and discrete >> graphics will use its own memory, correct? So the memory used by >> discrete graphics won't need gup, correct? > mplayer can decode video in software an only use the cpu. It can also use > one of the accleration API such as VDPAU. In any case mplayer is still opening > the video file allocating some memory with malloc, reading from file into > this memory eventually do some preprocessing on that memory and then > memcpy from this memory to memory allocated by the gpu driver. > > No imagine a world where you don't have to memcpy so that the gpu can access > it. Even if it's doable today it's really not something you want todo, ie > gup on page and not releasing page for minutes. > > There is two kind of integrated GPU, on x86 integrated GPU should be considered > as discrete GPU because BIOS steal a chunk of system ram and transform it in > fake vram. This stolen chunk is never ever under the control of the linux kernel > (from mm pov the gpu kernel driver is in charge of it). When I configure the integrated GPU in the BIOS during system boot, it seems that we can preallocate memory for the integrated GPU; is this the memory you mentioned? > > In any case both discrete GPU and integrated GPU have their own page table or A discrete GPU will not use normal memory even if its own memory is exhausted, correct? > memory controller and they map system memory in it or video memory, sometime > interleaving (at address 0x100000 64k is in vram but at address 0x10000+64k it's > system memory pointing to some pages). > > So right now any time we map a normal system ram page we take a reference on it > so it does not goes away. We decided to not use gup because it will break several > kernel assumption on anonymous memory in case of GPU. But we could use gup for > short lived memory transaction like memcpy from system ram to vram (no matter if > it's fake vram or real vram).
> > Cheers, > Jerome
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-12 3:13 ` Simon Jeons @ 2013-04-12 3:21 ` Jerome Glisse 0 siblings, 0 replies; 34+ messages in thread From: Jerome Glisse @ 2013-04-12 3:21 UTC (permalink / raw) To: Simon Jeons Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss [-- Attachment #1: Type: text/plain, Size: 5427 bytes --] On Thu, Apr 11, 2013 at 11:13 PM, Simon Jeons <simon.jeons@gmail.com> wrote: > Hi Jerome, > > On 04/12/2013 02:48 AM, Jerome Glisse wrote: > >> On Thu, Apr 11, 2013 at 11:37:35AM +0800, Simon Jeons wrote: >> >>> Hi Jerome, >>> On 04/11/2013 04:55 AM, Jerome Glisse wrote: >>> >>>> On Wed, Apr 10, 2013 at 09:57:02AM +0800, Simon Jeons wrote: >>>> >>>>> Hi Jerome, >>>>> On 02/10/2013 12:29 AM, Jerome Glisse wrote: >>>>> >>>>>> On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> >>>>>> wrote: >>>>>> >>>>>>> On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel < >>>>>>> raindel@mellanox.com> wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> We would like to present a reference implementation for safely >>>>>>>> sharing >>>>>>>> memory pages from user space with the hardware, without pinning. >>>>>>>> >>>>>>>> We will be happy to hear the community feedback on our prototype >>>>>>>> implementation, and suggestions for future improvements. >>>>>>>> >>>>>>>> We would also like to discuss adding features to the core MM >>>>>>>> subsystem to >>>>>>>> assist hardware access to user memory without pinning. >>>>>>>> >>>>>>> This sounds kinda scary TBH; however I do understand the need for >>>>>>> such >>>>>>> technology. 
>>>>>>> >>>>>>> I think one issue is that many MM developers are insufficiently aware >>>>>>> of such developments; having a technology presentation would probably >>>>>>> help there; but traditionally LSF/MM sessions are more interactive >>>>>>> between developers who are already quite familiar with the >>>>>>> technology. >>>>>>> I think it would help if you could send in advance a detailed >>>>>>> presentation of the problem and the proposed solutions (and then what >>>>>>> they require of the MM layer) so people can be better prepared. >>>>>>> >>>>>>> And first I'd like to ask, aren't IOMMUs supposed to already largely >>>>>>> solve this problem ? (probably a dumb question, but that just tells >>>>>>> you how much you need to explain :) >>>>>>> >>>>>> For GPU the motivation is three fold. With the advance of GPU compute >>>>>> and also with newer graphic program we see a massive increase in GPU >>>>>> memory consumption. We easily can reach buffer that are bigger than >>>>>> 1gbytes. So the first motivation is to directly use the memory the >>>>>> user allocated through malloc in the GPU this avoid copying 1gbytes of >>>>>> data with the cpu to the gpu buffer. The second and mostly important >>>>>> to GPU compute is the use of GPU seamlessly with the CPU, in order to >>>>>> achieve this you want the programmer to have a single address space on >>>>>> the CPU and GPU. So that the same address point to the same object on >>>>>> GPU as on the CPU. This would also be a tremendous cleaner design from >>>>>> driver point of view toward memory management. >>>>>> >>>>> When GPU will comsume memory? >>>>> >>>>> The userspace process like mplayer will have video datas and GPU >>>>> will play this datas and use memory of mplayer since these video >>>>> datas load in mplayer process's address space? So GPU codes will >>>>> call gup to take a reference of memory? Please correct me if my >>>>> understanding is wrong. 
;-) >>>>> >>>> First target is not thing such as video decompression, however they >>>> could >>>> too benefit from it given updated driver kernel API. In case of using >>>> iommu hardware page fault we don't call get_user_pages (gup) those we >>>> don't take a reference on the page. That's the whole point of the >>>> hardware >>>> pagefault, not taking reference on the page. >>>> >>> mplayer process is running on normal CPU or GPU? >>> chipset_integrated graphics will use normal memory and discrete >>> graphics will use its own memory, correct? So the memory used by >>> discrete graphics won't need gup, correct? >>> >> mplayer can decode video in software an only use the cpu. It can also use >> one of the accleration API such as VDPAU. In any case mplayer is still >> opening >> the video file allocating some memory with malloc, reading from file into >> this memory eventually do some preprocessing on that memory and then >> memcpy from this memory to memory allocated by the gpu driver. >> >> No imagine a world where you don't have to memcpy so that the gpu can >> access >> it. Even if it's doable today it's really not something you want todo, ie >> gup on page and not releasing page for minutes. >> >> There is two kind of integrated GPU, on x86 integrated GPU should be >> considered >> as discrete GPU because BIOS steal a chunk of system ram and transform it >> in >> fake vram. This stolen chunk is never ever under the control of the linux >> kernel >> (from mm pov the gpu kernel driver is in charge of it). >> > > I configure integrated GPU in BIOS during system boot, it's seems that we > can preallocate memory for integrated GPU, is this the memory you mentioned > ? Most likely it's > In any case both discrete GPU and integrated GPU have their own page table >> or >> > > Discrete GPU will not use normal memory even if their own memory is > exhaused, correct? 
They will consume normal memory. Right now you can see that under heavy load a huge chunk of your system memory disappears; it's the gpu driver that is using it. It gets mapped into the gpu address space, and from the gpu unit's pov it's just like any other memory (ie vram or system ram look the same to the gpu acceleration core, system ram is just slower). Cheers Jerome
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-02-09 16:29 ` Jerome Glisse 2013-04-09 8:28 ` Simon Jeons 2013-04-10 1:57 ` Simon Jeons @ 2013-04-15 8:39 ` Simon Jeons 2013-04-15 15:38 ` Jerome Glisse 2 siblings, 1 reply; 34+ messages in thread From: Simon Jeons @ 2013-04-15 8:39 UTC (permalink / raw) To: Jerome Glisse Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss Hi Jerome, On 02/10/2013 12:29 AM, Jerome Glisse wrote: > On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> wrote: >> On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: >>> Hi, >>> >>> We would like to present a reference implementation for safely sharing >>> memory pages from user space with the hardware, without pinning. >>> >>> We will be happy to hear the community feedback on our prototype >>> implementation, and suggestions for future improvements. >>> >>> We would also like to discuss adding features to the core MM subsystem to >>> assist hardware access to user memory without pinning. >> This sounds kinda scary TBH; however I do understand the need for such >> technology. >> >> I think one issue is that many MM developers are insufficiently aware >> of such developments; having a technology presentation would probably >> help there; but traditionally LSF/MM sessions are more interactive >> between developers who are already quite familiar with the technology. >> I think it would help if you could send in advance a detailed >> presentation of the problem and the proposed solutions (and then what >> they require of the MM layer) so people can be better prepared. >> >> And first I'd like to ask, aren't IOMMUs supposed to already largely >> solve this problem ? 
(probably a dumb question, but that just tells >> you how much you need to explain :) > For GPU the motivation is three fold. With the advance of GPU compute > and also with newer graphic program we see a massive increase in GPU > memory consumption. We easily can reach buffer that are bigger than > 1gbytes. So the first motivation is to directly use the memory the > user allocated through malloc in the GPU this avoid copying 1gbytes of > data with the cpu to the gpu buffer. The second and mostly important The pinned memory you mentioned is the memory user allocated or the memory of gpu buffer? > to GPU compute is the use of GPU seamlessly with the CPU, in order to > achieve this you want the programmer to have a single address space on > the CPU and GPU. So that the same address point to the same object on > GPU as on the CPU. This would also be a tremendous cleaner design from > driver point of view toward memory management. > > And last, the most important, with such big buffer (>1gbytes) the > memory pinning is becoming way to expensive and also drastically > reduce the freedom of the mm to free page for other process. Most of > the time a small window (every thing is relative the window can be > > 100mbytes not so small :)) of the object will be in use by the > hardware. The hardware pagefault support would avoid the necessity to > pin memory and thus offer greater flexibility. At the same time the > driver wants to avoid page fault as much as possible this is why i > would like to be able to give hint to the mm about range of address it > should avoid freeing page (swapping them out). > > The iommu was designed with other goals, which were first isolate > device from one another and restrict device access to allowed memory. > Second allow to remap address that are above device address space > limit. 
Lot of device can only address 24bit or 32bit of memory and > with computer with several gbytes of memory suddenly lot of the page > become unreachable to the hardware. The iommu allow to work around > this by remapping those high page into address that the hardware can > reach. > > The hardware page fault support is a new feature of iommu designed to > help the os and driver to reduce memory pinning and also share address > space. Thought i am sure there are other motivations that i am not > even aware off or would think off. > > Btw i won't be at LSF/MM so a free good beer (or other beverage) on me > to whoever takes note on this subject in next conf we run into each > others. > > Cheers, > Jerome > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-15 8:39 ` Simon Jeons @ 2013-04-15 15:38 ` Jerome Glisse 2013-04-16 4:20 ` Simon Jeons 0 siblings, 1 reply; 34+ messages in thread From: Jerome Glisse @ 2013-04-15 15:38 UTC (permalink / raw) To: Simon Jeons Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss [-- Attachment #1: Type: text/plain, Size: 2163 bytes --] On Mon, Apr 15, 2013 at 4:39 AM, Simon Jeons <simon.jeons@gmail.com> wrote: > Hi Jerome, > On 02/10/2013 12:29 AM, Jerome Glisse wrote: > >> On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> >> wrote: >> >>> On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> >>> wrote: >>> >>>> Hi, >>>> >>>> We would like to present a reference implementation for safely sharing >>>> memory pages from user space with the hardware, without pinning. >>>> >>>> We will be happy to hear the community feedback on our prototype >>>> implementation, and suggestions for future improvements. >>>> >>>> We would also like to discuss adding features to the core MM subsystem >>>> to >>>> assist hardware access to user memory without pinning. >>>> >>> This sounds kinda scary TBH; however I do understand the need for such >>> technology. >>> >>> I think one issue is that many MM developers are insufficiently aware >>> of such developments; having a technology presentation would probably >>> help there; but traditionally LSF/MM sessions are more interactive >>> between developers who are already quite familiar with the technology. >>> I think it would help if you could send in advance a detailed >>> presentation of the problem and the proposed solutions (and then what >>> they require of the MM layer) so people can be better prepared. 
>>> >>> And first I'd like to ask, aren't IOMMUs supposed to already largely >>> solve this problem ? (probably a dumb question, but that just tells >>> you how much you need to explain :) >>> >> For GPU the motivation is three fold. With the advance of GPU compute >> and also with newer graphic program we see a massive increase in GPU >> memory consumption. We easily can reach buffer that are bigger than >> 1gbytes. So the first motivation is to directly use the memory the >> user allocated through malloc in the GPU this avoid copying 1gbytes of >> data with the cpu to the gpu buffer. The second and mostly important >> > > The pinned memory you mentioned is the memory user allocated or the memory > of gpu buffer? > Memory user allocated, we don't want to pin this memory. Cheers, Jerome [-- Attachment #2: Type: text/html, Size: 2975 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-15 15:38 ` Jerome Glisse @ 2013-04-16 4:20 ` Simon Jeons 2013-04-16 16:19 ` Jerome Glisse 0 siblings, 1 reply; 34+ messages in thread From: Simon Jeons @ 2013-04-16 4:20 UTC (permalink / raw) To: Jerome Glisse Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss [-- Attachment #1: Type: text/plain, Size: 2993 bytes --] Hi Jerome, On 04/15/2013 11:38 PM, Jerome Glisse wrote: > On Mon, Apr 15, 2013 at 4:39 AM, Simon Jeons <simon.jeons@gmail.com > <mailto:simon.jeons@gmail.com>> wrote: > > Hi Jerome, > On 02/10/2013 12:29 AM, Jerome Glisse wrote: > > On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse > <walken@google.com <mailto:walken@google.com>> wrote: > > On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel > <raindel@mellanox.com <mailto:raindel@mellanox.com>> wrote: > > Hi, > > We would like to present a reference implementation > for safely sharing > memory pages from user space with the hardware, > without pinning. > > We will be happy to hear the community feedback on our > prototype > implementation, and suggestions for future improvements. > > We would also like to discuss adding features to the > core MM subsystem to > assist hardware access to user memory without pinning. > > This sounds kinda scary TBH; however I do understand the > need for such > technology. > > I think one issue is that many MM developers are > insufficiently aware > of such developments; having a technology presentation > would probably > help there; but traditionally LSF/MM sessions are more > interactive > between developers who are already quite familiar with the > technology. 
> I think it would help if you could send in advance a detailed > presentation of the problem and the proposed solutions > (and then what > they require of the MM layer) so people can be better > prepared. > > And first I'd like to ask, aren't IOMMUs supposed to > already largely > solve this problem ? (probably a dumb question, but that > just tells > you how much you need to explain :) > > For GPU the motivation is threefold. With the advance of GPU > compute > and also with newer graphics programs we see a massive increase > in GPU > memory consumption. We can easily reach buffers that are bigger > than > 1 GB. So the first motivation is to directly use the memory the > user allocated through malloc on the GPU; this avoids copying > 1 GB of > data from the CPU to the GPU buffer. The second and most > important > > > The pinned memory you mentioned is the memory the user allocated or > the memory of the GPU buffer? > > > The memory the user allocated; we don't want to pin this memory. After this idea is merged, we won't need to allocate memory for the integrated GPU buffer, and a discrete GPU won't need to have its own memory, correct? > > Cheers, > Jerome [-- Attachment #2: Type: text/html, Size: 5155 bytes --]
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-16 4:20 ` Simon Jeons @ 2013-04-16 16:19 ` Jerome Glisse 0 siblings, 0 replies; 34+ messages in thread From: Jerome Glisse @ 2013-04-16 16:19 UTC (permalink / raw) To: Simon Jeons Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss [-- Attachment #1: Type: text/plain, Size: 3267 bytes --] On Tue, Apr 16, 2013 at 12:20 AM, Simon Jeons <simon.jeons@gmail.com> wrote: > Hi Jerome, > > On 04/15/2013 11:38 PM, Jerome Glisse wrote: > > On Mon, Apr 15, 2013 at 4:39 AM, Simon Jeons <simon.jeons@gmail.com>wrote: > >> Hi Jerome, >> On 02/10/2013 12:29 AM, Jerome Glisse wrote: >> >>> On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> >>> wrote: >>> >>>> On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> >>>> wrote: >>>> >>>>> Hi, >>>>> >>>>> We would like to present a reference implementation for safely sharing >>>>> memory pages from user space with the hardware, without pinning. >>>>> >>>>> We will be happy to hear the community feedback on our prototype >>>>> implementation, and suggestions for future improvements. >>>>> >>>>> We would also like to discuss adding features to the core MM subsystem >>>>> to >>>>> assist hardware access to user memory without pinning. >>>>> >>>> This sounds kinda scary TBH; however I do understand the need for such >>>> technology. >>>> >>>> I think one issue is that many MM developers are insufficiently aware >>>> of such developments; having a technology presentation would probably >>>> help there; but traditionally LSF/MM sessions are more interactive >>>> between developers who are already quite familiar with the technology. 
>>>> I think it would help if you could send in advance a detailed >>>> presentation of the problem and the proposed solutions (and then what >>>> they require of the MM layer) so people can be better prepared. >>>> >>>> And first I'd like to ask, aren't IOMMUs supposed to already largely >>>> solve this problem ? (probably a dumb question, but that just tells >>>> you how much you need to explain :) >>>> >>> For GPU the motivation is threefold. With the advance of GPU compute >>> and also with newer graphics programs we see a massive increase in GPU >>> memory consumption. We can easily reach buffers that are bigger than >>> 1 GB. So the first motivation is to directly use the memory the >>> user allocated through malloc on the GPU; this avoids copying 1 GB of >>> data from the CPU to the GPU buffer. The second and most important >>> >> >> The pinned memory you mentioned is the memory the user allocated or the >> memory of the GPU buffer? >> > > The memory the user allocated; we don't want to pin this memory. > > > After this idea is merged, we won't need to allocate memory for the integrated > GPU buffer, and a discrete GPU won't need to have its own memory, correct? > You need to stop considering discrete and integrated GPUs as different; they are not, from the driver's point of view. An integrated GPU will keep stealing a chunk of system memory at boot, because that's a BIOS thing and BIOSes don't change like that. Both (integrated and discrete) will keep allocating system memory in the kernel for their own buffers, because APIs such as OpenGL or OpenCL need to. The transparent use of the same address space on the GPU as on the CPU will only happen with newer APIs such as OpenCL 2.0, or other APIs that are scheduled down the road. And a discrete GPU will keep having its own memory; the whole point is that GDDR5 can be more than 10 times faster than the fastest DDR3. You cannot beat that. GPUs are all about bandwidth, and that's not going to change.
Cheers, Jerome [-- Attachment #2: Type: text/html, Size: 5812 bytes --]
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-02-09 6:05 ` Michel Lespinasse 2013-02-09 16:29 ` Jerome Glisse @ 2013-02-10 7:54 ` Shachar Raindel 1 sibling, 0 replies; 34+ messages in thread From: Shachar Raindel @ 2013-02-10 7:54 UTC (permalink / raw) To: Michel Lespinasse Cc: lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss [-- Attachment #1: Type: text/plain, Size: 4809 bytes --] On 2/9/2013 8:05 AM, Michel Lespinasse wrote: > On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel<raindel@mellanox.com> wrote: >> Hi, >> >> We would like to present a reference implementation for safely sharing >> memory pages from user space with the hardware, without pinning. >> >> We will be happy to hear the community feedback on our prototype >> implementation, and suggestions for future improvements. >> >> We would also like to discuss adding features to the core MM subsystem to >> assist hardware access to user memory without pinning. > This sounds kinda scary TBH; however I do understand the need for such > technology. The technological challenges here are actually rather similar to the ones experienced by hypervisors that want to allow swapping of virtual machines. As a result, we benefit greatly from the mmu notifiers implemented for KVM. Reading the page table directly will be another level of challenge. > I think one issue is that many MM developers are insufficiently aware > of such developments; having a technology presentation would probably > help there; but traditionally LSF/MM sessions are more interactive > between developers who are already quite familiar with the technology. > I think it would help if you could send in advance a detailed > presentation of the problem and the proposed solutions (and then what > they require of the MM layer) so people can be better prepared. 
We hope to send out an RFC patch-set of the feature implementation for our hardware soon, which might help to demonstrate a use case for the technology. The current programming model for InfiniBand (and related network protocols - RoCE, iWarp) relies on the user space program registering memory regions for use with the hardware. Upon registration, the driver performs pinning (get_user_pages) of the memory area, updates a mapping table in the hardware and provides the user application with a handle for the mapping. The user space application then uses this handle to request the hardware to access this area for network IO. While achieving unbeatable IO performance (round-trip latency, for user space programs, of less than 2 microseconds, bandwidth of 56 Gbit/second), this model is relatively hard to use: - The need for explicit memory registration for each area makes the API rather complex to use. An ideal API would have a handle per process that allows it to communicate with the hardware using the process virtual addresses. - After a part of the address space has been registered, the application must be careful not to move the pages around. For example, doing a fork results in all of the memory registrations pointing to the wrong pages (which is very hard to debug). This was partially addressed in [1], but the cure is nearly as bad as the disease - when MADVISE_DONTFORK is used on the heap, a simple call to malloc in the child process might crash the process. - Memory which was registered is not swappable. As a result, one cannot write applications that overcommit physical memory while using this API. Similarly to what Jerome described about GPU applications, for network access the application might want to use ~10% of its allocated memory space, but it is required to either pin all of the memory, use heuristics to predict what memory will be used, or perform expensive copying/pinning for every network transaction. All of these are non-optimal.
> And first I'd like to ask, aren't IOMMUs supposed to already largely > solve this problem ? (probably a dumb question, but that just tells > you how much you need to explain :) > IOMMU v1 doesn't solve this problem, as it gives you only one mapping table per PCI function. If you want ~64 processes on your machine to be able to access the network, this is not nearly enough. It helps in implementing PCI pass-thru for virtualized guests (with the hardware devices exposing several virtual PCI functions for the guests), but that is still not enough for user space applications. To some extent, IOMMU v1 might even be an obstacle to implementing such a feature, as it prevents PCI devices from accessing parts of the memory, requiring driver intervention for every page fault, even if the page is in memory. IOMMU v2 [2] is a step in the direction that we are moving towards, offering PASID - a unique identifier for each transaction that the device performs, allowing the transaction to be associated with a specific process. However, the challenges there are similar to those we encounter when using an address translation table on the PCI device itself (NIC/GPU). References: 1. MADVISE_DONTFORK - http://lwn.net/Articles/171956/ 2. AMD IOMMU v2 - http://www.linux-kvm.org/wiki/images/b/b1/2011-forum-amd-iommuv2-kvm.pdf [-- Attachment #2: Type: text/html, Size: 9341 bytes --]
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-02-08 11:18 [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes Shachar Raindel 2013-02-08 15:21 ` Jerome Glisse 2013-02-09 6:05 ` Michel Lespinasse @ 2013-04-09 8:17 ` Simon Jeons 2013-04-10 1:48 ` Simon Jeons 2 siblings, 1 reply; 34+ messages in thread From: Simon Jeons @ 2013-04-09 8:17 UTC (permalink / raw) To: Shachar Raindel Cc: lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss Hi Shachar, On 02/08/2013 07:18 PM, Shachar Raindel wrote: > Hi, > > We would like to present a reference implementation for safely sharing > memory pages from user space with the hardware, without pinning. > > We will be happy to hear the community feedback on our prototype > implementation, and suggestions for future improvements. > > We would also like to discuss adding features to the core MM subsystem > to assist hardware access to user memory without pinning. > > Following is a longer motivation and explanation on the technology > presented: > > Many application developers would like to be able to > communicate directly with the hardware from userspace. > > Use cases for that include high performance networking APIs such as > InfiniBand, RoCE and iWarp, and interfacing with GPUs. > > Currently, if the user space application wants to share system memory > with the hardware device, the kernel component must pin the memory > pages in RAM, using get_user_pages. > > This is a hurdle, as it usually makes large portions of the application > memory unmovable. This pinning also makes the user space development > model very complicated - one needs to register memory before using it > for communication with the hardware. > > We use the mmu-notifiers [1] mechanism to inform the hardware when the > mapping of a page is changed.
If the hardware tries to access a page > which is not yet mapped for the hardware, it requests a resolution for > the page address from the kernel. mmu_notifiers is used for the host to notice that a guest page changed, isn't it? Why do you say that it is used for informing the hardware when the mapping of a page is changed? > > This mechanism allows the hardware to access the entire address space > of the user application, without pinning even a single page. > > We would like to use the LSF/MM forum opportunity to discuss open > issues we have for further development, such as: > > -Allowing the hardware to perform page table walk, similar to > get_user_pages_fast to resolve user pages that are already in RAM. > > -Batching page eviction by various kernel subsystems (swapper, > page-cache) to reduce the amount of communication needed with the > hardware in such events > > -Hinting from the hardware to the MM regarding page fetches which are > speculative, similarly to prefetching done by the page-cache > > -Page-in notifications from the kernel to the driver, such that we can > keep our secondary TLB in sync with the kernel page table without > incurring page faults. > > -Allowed and banned actions while in an MMU notifier callback. We have > already done some work on making the MMU notifiers sleepable [2], but > there might be additional limitations, which we would like to discuss. > > -Hinting from the MMU notifiers as for the reason for the notification > - for example we would like to react differently if a page was moved > by NUMA migration vs. page being swapped out. > > [1] http://lwn.net/Articles/266320/ > > [2] http://comments.gmane.org/gmane.linux.kernel.mm/85002 > > Thanks, > > --Shachar > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-09 8:17 ` Simon Jeons @ 2013-04-10 1:48 ` Simon Jeons 0 siblings, 0 replies; 34+ messages in thread From: Simon Jeons @ 2013-04-10 1:48 UTC (permalink / raw) To: Shachar Raindel Cc: lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss, Jerome Glisse Ping Jerome, On 04/09/2013 04:17 PM, Simon Jeons wrote: > Hi Shachar, > On 02/08/2013 07:18 PM, Shachar Raindel wrote: >> Hi, >> >> We would like to present a reference implementation for safely >> sharing memory pages from user space with the hardware, without pinning. >> >> We will be happy to hear the community feedback on our prototype >> implementation, and suggestions for future improvements. >> >> We would also like to discuss adding features to the core MM >> subsystem to assist hardware access to user memory without pinning. >> >> Following is a longer motivation and explanation on the technology >> presented: >> >> Many application developers would like to be able to >> communicate directly with the hardware from userspace. >> >> Use cases for that include high performance networking APIs such as >> InfiniBand, RoCE and iWarp, and interfacing with GPUs. >> >> Currently, if the user space application wants to share system memory >> with the hardware device, the kernel component must pin the memory >> pages in RAM, using get_user_pages. >> >> This is a hurdle, as it usually makes large portions of the application >> memory unmovable. This pinning also makes the user space development >> model very complicated - one needs to register memory before using it >> for communication with the hardware. >> >> We use the mmu-notifiers [1] mechanism to inform the hardware when >> the mapping of a page is changed.
If the hardware tries to access a >> page which is not yet mapped for the hardware, it requests a >> resolution for the page address from the kernel. > > mmu_notifiers is used for the host to notice that a guest page changed, isn't it? Why > do you say that it is used for informing the hardware when the mapping > of a page is changed? > >> >> This mechanism allows the hardware to access the entire address space >> of the user application, without pinning even a single page. >> >> We would like to use the LSF/MM forum opportunity to discuss open >> issues we have for further development, such as: >> >> -Allowing the hardware to perform page table walk, similar to >> get_user_pages_fast to resolve user pages that are already in RAM. >> >> -Batching page eviction by various kernel subsystems (swapper, >> page-cache) to reduce the amount of communication needed with the >> hardware in such events >> >> -Hinting from the hardware to the MM regarding page fetches which are >> speculative, similarly to prefetching done by the page-cache >> >> -Page-in notifications from the kernel to the driver, such that we >> can keep our secondary TLB in sync with the kernel page table without >> incurring page faults. >> >> -Allowed and banned actions while in an MMU notifier callback. We >> have already done some work on making the MMU notifiers sleepable >> [2], but there might be additional limitations, which we would like >> to discuss. >> >> -Hinting from the MMU notifiers as for the reason for the >> notification - for example we would like to react differently if a >> page was moved by NUMA migration vs. page being swapped out. >> >> [1] http://lwn.net/Articles/266320/ >> >> [2] http://comments.gmane.org/gmane.linux.kernel.mm/85002 >> >> Thanks, >> >> --Shachar >> >> -- >> To unsubscribe, send a message with 'unsubscribe linux-mm' in >> the body to majordomo@kvack.org. For more info on Linux MM, >> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
end of thread, other threads:[~2013-04-18 1:02 UTC | newest] Thread overview: 34+ messages -- 2013-02-08 11:18 [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes Shachar Raindel 2013-02-08 15:21 ` Jerome Glisse 2013-04-16 7:03 ` Simon Jeons 2013-04-16 16:27 ` Jerome Glisse 2013-04-16 23:50 ` Simon Jeons 2013-04-17 14:01 ` Jerome Glisse 2013-04-17 23:48 ` Simon Jeons 2013-04-18 1:02 ` Jerome Glisse 2013-02-09 6:05 ` Michel Lespinasse 2013-02-09 16:29 ` Jerome Glisse 2013-04-09 8:28 ` Simon Jeons 2013-04-09 14:21 ` Jerome Glisse 2013-04-10 1:41 ` Simon Jeons 2013-04-10 20:45 ` Jerome Glisse 2013-04-11 3:42 ` Simon Jeons 2013-04-11 18:38 ` Jerome Glisse 2013-04-12 1:54 ` Simon Jeons 2013-04-12 2:11 ` [Lsf-pc] " Rik van Riel 2013-04-12 2:57 ` Jerome Glisse 2013-04-12 5:44 ` Simon Jeons 2013-04-12 13:32 ` Jerome Glisse 2013-04-10 1:57 ` Simon Jeons 2013-04-10 20:55 ` Jerome Glisse 2013-04-11 3:37 ` Simon Jeons 2013-04-11 18:48 ` Jerome Glisse 2013-04-12 3:13 ` Simon Jeons 2013-04-12 3:21 ` Jerome Glisse 2013-04-15 8:39 ` Simon Jeons 2013-04-15 15:38 ` Jerome Glisse 2013-04-16 4:20 ` Simon Jeons 2013-04-16 16:19 ` Jerome Glisse 2013-02-10 7:54 ` Shachar Raindel 2013-04-09 8:17 ` Simon Jeons 2013-04-10 1:48 ` Simon Jeons