* [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes @ 2013-02-08 11:18 Shachar Raindel 2013-02-08 15:21 ` Jerome Glisse ` (2 more replies) 0 siblings, 3 replies; 34+ messages in thread From: Shachar Raindel @ 2013-02-08 11:18 UTC (permalink / raw) To: lsf-pc Cc: linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss Hi, We would like to present a reference implementation for safely sharing memory pages from user space with the hardware, without pinning. We will be happy to hear the community feedback on our prototype implementation, and suggestions for future improvements. We would also like to discuss adding features to the core MM subsystem to assist hardware access to user memory without pinning. Following is a longer motivation and explanation of the technology presented: Many application developers would like to be able to communicate directly with the hardware from userspace. Use cases for that include high performance networking APIs such as InfiniBand, RoCE and iWarp, and interfacing with GPUs. Currently, if the user space application wants to share system memory with the hardware device, the kernel component must pin the memory pages in RAM, using get_user_pages. This is a hurdle, as it usually makes large portions of the application memory unmovable. This pinning also makes the user space development model very complicated – one needs to register memory before using it for communication with the hardware. We use the mmu-notifiers [1] mechanism to inform the hardware when the mapping of a page is changed. If the hardware tries to access a page which is not yet mapped for the hardware, it requests a resolution for the page address from the kernel. This mechanism allows the hardware to access the entire address space of the user application, without pinning even a single page. 
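The notifier-driven scheme described above can be modeled in a few lines of userspace C. This is only an illustrative sketch (the structures and function names below are invented, not the kernel's mmu_notifier API or any real driver interface), but it captures the contract: the device caches translations in a secondary TLB, an invalidation callback drops them when the mm changes a mapping, and a device-side miss is resolved by asking the kernel again, so no page ever needs to stay pinned.

```c
/*
 * Toy userspace model of the mechanism described above: the device keeps
 * a secondary TLB for a user address space; an mmu-notifier-style callback
 * invalidates stale entries, and a device-side miss "faults" back into
 * the mm for resolution. All names here are illustrative, not the actual
 * kernel or driver API.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES   8
#define NO_ENTRY ((unsigned long)-1)

struct toy_mm {                    /* CPU page table: virtual page -> pfn */
    unsigned long pte[NPAGES];
};

struct toy_sec_tlb {               /* device's cached translations */
    unsigned long entry[NPAGES];
};

static void sec_tlb_init(struct toy_sec_tlb *tlb)
{
    for (size_t i = 0; i < NPAGES; i++)
        tlb->entry[i] = NO_ENTRY;
}

/* mmu-notifier-style callback: a mapping changed, so drop the cached copy.
 * Because nothing is pinned, the mm stays free to move or swap the page. */
static void invalidate_page(struct toy_sec_tlb *tlb, size_t vpage)
{
    tlb->entry[vpage] = NO_ENTRY;
}

/* Device access: on a secondary-TLB miss, request resolution from the mm. */
static unsigned long device_access(struct toy_sec_tlb *tlb,
                                   const struct toy_mm *mm,
                                   size_t vpage, bool *missed)
{
    *missed = (tlb->entry[vpage] == NO_ENTRY);
    if (*missed)
        tlb->entry[vpage] = mm->pte[vpage];   /* "page fault" resolution */
    return tlb->entry[vpage];
}
```

The point of the scheme is visible even in this toy: after an invalidation the mm may hand the same virtual page a different physical page, and the device simply takes one extra miss and picks up the new translation.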
We would like to use the LSF/MM forum opportunity to discuss open issues we have for further development, such as: -Allowing the hardware to perform page table walk, similar to get_user_pages_fast, to resolve user pages that are already in RAM. -Batching page eviction by various kernel subsystems (swapper, page-cache) to reduce the amount of communication needed with the hardware in such events -Hinting from the hardware to the MM regarding page fetches which are speculative, similarly to prefetching done by the page-cache -Page-in notifications from the kernel to the driver, such that we can keep our secondary TLB in sync with the kernel page table without incurring page faults. -Allowed and banned actions while in an MMU notifier callback. We have already done some work on making the MMU notifiers sleepable [2], but there might be additional limitations, which we would like to discuss. -Hinting from the MMU notifiers as to the reason for the notification - for example we would like to react differently if a page was moved by NUMA migration vs. a page being swapped out. [1] http://lwn.net/Articles/266320/ [2] http://comments.gmane.org/gmane.linux.kernel.mm/85002 Thanks, --Shachar -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-02-08 11:18 [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes Shachar Raindel @ 2013-02-08 15:21 ` Jerome Glisse 2013-04-16 7:03 ` Simon Jeons 2013-02-09 6:05 ` Michel Lespinasse 2013-04-09 8:17 ` Simon Jeons 2 siblings, 1 reply; 34+ messages in thread From: Jerome Glisse @ 2013-02-08 15:21 UTC (permalink / raw) To: Shachar Raindel Cc: lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss On Fri, Feb 8, 2013 at 6:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: > Hi, > > We would like to present a reference implementation for safely sharing > memory pages from user space with the hardware, without pinning. > > We will be happy to hear the community feedback on our prototype > implementation, and suggestions for future improvements. > > We would also like to discuss adding features to the core MM subsystem to > assist hardware access to user memory without pinning. > > Following is a longer motivation and explanation on the technology > presented: > > Many application developers would like to be able to be able to communicate > directly with the hardware from the userspace. > > Use cases for that includes high performance networking API such as > InfiniBand, RoCE and iWarp and interfacing with GPUs. > > Currently, if the user space application wants to share system memory with > the hardware device, the kernel component must pin the memory pages in RAM, > using get_user_pages. > > This is a hurdle, as it usually makes large portions the application memory > unmovable. This pinning also makes the user space development model very > complicated – one needs to register memory before using it for communication > with the hardware. 
> > We use the mmu-notifiers [1] mechanism to inform the hardware when the > mapping of a page is changed. If the hardware tries to access a page which > is not yet mapped for the hardware, it requests a resolution for the page > address from the kernel. > > This mechanism allows the hardware to access the entire address space of the > user application, without pinning even a single page. > > We would like to use the LSF/MM forum opportunity to discuss open issues we > have for further development, such as: > > -Allowing the hardware to perform page table walk, similar to > get_user_pages_fast to resolve user pages that are already in RAM. > > -Batching page eviction by various kernel subsystems (swapper, page-cache) > to reduce the amount of communication needed with the hardware in such > events > > -Hinting from the hardware to the MM regarding page fetches which are > speculative, similarly to prefetching done by the page-cache > > -Page-in notifications from the kernel to the driver, such that we can keep > our secondary TLB in sync with the kernel page table without incurring page > faults. > > -Allowed and banned actions while in an MMU notifier callback. We have > already done some work on making the MMU notifiers sleepable [2], but there > might be additional limitations, which we would like to discuss. > > -Hinting from the MMU notifiers as for the reason for the notification - for > example we would like to react differently if a page was moved by NUMA > migration vs. page being swapped out. > > [1] http://lwn.net/Articles/266320/ > > [2] http://comments.gmane.org/gmane.linux.kernel.mm/85002 > > Thanks, > > --Shachar As a GPU driver developer I can say that this is something we want to do in the very near future. Also I think we would like other capabilities: - a hint to the mm on memory ranges that are best not to evict (it is easier for the driver to know what is hot and going to see activity) Dunno how big the change to the page eviction path would need to be. 
Cheers, Jerome
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-02-08 15:21 ` Jerome Glisse @ 2013-04-16 7:03 ` Simon Jeons 2013-04-16 16:27 ` Jerome Glisse 0 siblings, 1 reply; 34+ messages in thread From: Simon Jeons @ 2013-04-16 7:03 UTC (permalink / raw) To: Jerome Glisse Cc: Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss Hi Jerome, On 02/08/2013 11:21 PM, Jerome Glisse wrote: > On Fri, Feb 8, 2013 at 6:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: >> Hi, >> >> We would like to present a reference implementation for safely sharing >> memory pages from user space with the hardware, without pinning. >> >> We will be happy to hear the community feedback on our prototype >> implementation, and suggestions for future improvements. >> >> We would also like to discuss adding features to the core MM subsystem to >> assist hardware access to user memory without pinning. >> >> Following is a longer motivation and explanation on the technology >> presented: >> >> Many application developers would like to be able to be able to communicate >> directly with the hardware from the userspace. >> >> Use cases for that includes high performance networking API such as >> InfiniBand, RoCE and iWarp and interfacing with GPUs. >> >> Currently, if the user space application wants to share system memory with >> the hardware device, the kernel component must pin the memory pages in RAM, >> using get_user_pages. >> >> This is a hurdle, as it usually makes large portions the application memory >> unmovable. This pinning also makes the user space development model very >> complicated – one needs to register memory before using it for communication >> with the hardware. >> >> We use the mmu-notifiers [1] mechanism to inform the hardware when the >> mapping of a page is changed. 
If the hardware tries to access a page which >> is not yet mapped for the hardware, it requests a resolution for the page >> address from the kernel. >> >> This mechanism allows the hardware to access the entire address space of the >> user application, without pinning even a single page. >> >> We would like to use the LSF/MM forum opportunity to discuss open issues we >> have for further development, such as: >> >> -Allowing the hardware to perform page table walk, similar to >> get_user_pages_fast to resolve user pages that are already in RAM. get_user_pages_fast just takes a page reference instead of populating the PTE in the page table, correct? Then how can the GPU driver use the IOMMU to access the page? >> >> -Batching page eviction by various kernel subsystems (swapper, page-cache) >> to reduce the amount of communication needed with the hardware in such >> events >> >> -Hinting from the hardware to the MM regarding page fetches which are >> speculative, similarly to prefetching done by the page-cache >> >> -Page-in notifications from the kernel to the driver, such that we can keep >> our secondary TLB in sync with the kernel page table without incurring page >> faults. >> >> -Allowed and banned actions while in an MMU notifier callback. We have >> already done some work on making the MMU notifiers sleepable [2], but there >> might be additional limitations, which we would like to discuss. >> >> -Hinting from the MMU notifiers as for the reason for the notification - for >> example we would like to react differently if a page was moved by NUMA >> migration vs. page being swapped out. >> >> [1] http://lwn.net/Articles/266320/ >> >> [2] http://comments.gmane.org/gmane.linux.kernel.mm/85002 >> >> Thanks, >> >> --Shachar > As a GPU driver developer i can say that this is something we want to > do in a very near future. 
Also i think we would like another > capabilities : > > - hint to mm on memory range that are best not to evict (easier for > driver to know what is hot and gonna see activities) > > Dunno how big the change to the page eviction path would need to be. > > Cheers, > Jerome
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-16 7:03 ` Simon Jeons @ 2013-04-16 16:27 ` Jerome Glisse 2013-04-16 23:50 ` Simon Jeons 0 siblings, 1 reply; 34+ messages in thread From: Jerome Glisse @ 2013-04-16 16:27 UTC (permalink / raw) To: Simon Jeons Cc: Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss [-- Attachment #1: Type: text/plain, Size: 2786 bytes --] On Tue, Apr 16, 2013 at 3:03 AM, Simon Jeons <simon.jeons@gmail.com> wrote: > Hi Jerome, > > On 02/08/2013 11:21 PM, Jerome Glisse wrote: > >> On Fri, Feb 8, 2013 at 6:18 AM, Shachar Raindel <raindel@mellanox.com> >> wrote: >> >>> Hi, >>> >>> We would like to present a reference implementation for safely sharing >>> memory pages from user space with the hardware, without pinning. >>> >>> We will be happy to hear the community feedback on our prototype >>> implementation, and suggestions for future improvements. >>> >>> We would also like to discuss adding features to the core MM subsystem to >>> assist hardware access to user memory without pinning. >>> >>> Following is a longer motivation and explanation on the technology >>> presented: >>> >>> Many application developers would like to be able to be able to >>> communicate >>> directly with the hardware from the userspace. >>> >>> Use cases for that includes high performance networking API such as >>> InfiniBand, RoCE and iWarp and interfacing with GPUs. >>> >>> Currently, if the user space application wants to share system memory >>> with >>> the hardware device, the kernel component must pin the memory pages in >>> RAM, >>> using get_user_pages. >>> >>> This is a hurdle, as it usually makes large portions the application >>> memory >>> unmovable. 
This pinning also makes the user space development model very >>> complicated – one needs to register memory before using it for >>> communication >>> with the hardware. >>> >>> We use the mmu-notifiers [1] mechanism to inform the hardware when the >>> mapping of a page is changed. If the hardware tries to access a page >>> which >>> is not yet mapped for the hardware, it requests a resolution for the page >>> address from the kernel. >>> >>> This mechanism allows the hardware to access the entire address space of >>> the >>> user application, without pinning even a single page. >>> >>> We would like to use the LSF/MM forum opportunity to discuss open issues >>> we >>> have for further development, such as: >>> >>> -Allowing the hardware to perform page table walk, similar to >>> get_user_pages_fast to resolve user pages that are already in RAM. >>> >> > get_user_pages_fast just get page reference count instead of populate the > pte to page table, correct? Then how can GPU driver use iommu to access the > page? > As I said, this is for pre-filling already-present entries, i.e. PTEs that are present with a valid page (no special bit set). This is an optimization so that the GPU can pre-fill its TLB without having to take any mmap_sem. The hope is that in the most common case this will be enough, but in some cases you will have to go through the lengthy non-fast gup. Cheers, Jerome
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-16 16:27 ` Jerome Glisse @ 2013-04-16 23:50 ` Simon Jeons 2013-04-17 14:01 ` Jerome Glisse 0 siblings, 1 reply; 34+ messages in thread From: Simon Jeons @ 2013-04-16 23:50 UTC (permalink / raw) To: Jerome Glisse Cc: Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss On 04/17/2013 12:27 AM, Jerome Glisse wrote: [snip] > > > As i said this is for pre-filling already present entry, ie pte that > are present with a valid page (no special bit set). This is an > optimization so that the GPU can pre-fill its tlb without having to > take any mmap_sem. Hope is that in most common case this will be > enough, but in some case you will have to go through the lengthy non > fast gup. I know this. What I am concerned about is that the pte you mentioned is for the normal CPU, correct? How can you pre-fill the PTE and TLB of the GPU? > > Cheers, > Jerome
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-16 23:50 ` Simon Jeons @ 2013-04-17 14:01 ` Jerome Glisse 2013-04-17 23:48 ` Simon Jeons 0 siblings, 1 reply; 34+ messages in thread From: Jerome Glisse @ 2013-04-17 14:01 UTC (permalink / raw) To: Simon Jeons Cc: Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss On Tue, Apr 16, 2013 at 7:50 PM, Simon Jeons <simon.jeons@gmail.com> wrote: > On 04/17/2013 12:27 AM, Jerome Glisse wrote: > > [snip] > > >> >> As i said this is for pre-filling already present entry, ie pte that are >> present with a valid page (no special bit set). This is an optimization so >> that the GPU can pre-fill its tlb without having to take any mmap_sem. Hope >> is that in most common case this will be enough, but in some case you will >> have to go through the lengthy non fast gup. >> > > I know this. What I concern is the pte you mentioned is for normal cpu, > correct? How can you pre-fill pte and tlb of GPU? > You are getting confused: the idea is to look at the CPU PTE and prefill the GPU PTE. I do not prefill the CPU PTE; if a CPU PTE is valid then I use the page it points to to prefill the GPU PTE. So I don't pre-fill the CPU PTE or the GPU TLB; I pre-fill the GPU PTE from the CPU PTE if the CPU PTE is valid. Other GPU PTEs are marked as invalid and will trigger a fault that will be handled using gup, which will fill the CPU PTE (if the fault happens at a valid address), at which point the GPU PTE is updated, or an error is reported if the fault happened at an invalid address. Cheers, Jerome
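The pre-fill Jerome describes can be sketched as a simple walk. This is again an illustrative userspace model with invented names, not real driver code: mirror only the CPU PTEs that are already valid into the GPU page table, and leave everything else invalid so that the first GPU access goes through the slow fault path.

```c
/*
 * Sketch of the pre-fill described above: mirror only already-valid CPU
 * PTEs into the GPU page table; everything else stays invalid and is
 * resolved later by the fault path (gup). Illustrative model only.
 */
#include <stddef.h>

#define PTE_NONE 0UL    /* "not present" in this toy encoding */

/* Returns how many GPU PTEs were pre-filled. */
static size_t gpu_prefill(const unsigned long *cpu_pte,
                          unsigned long *gpu_pte, size_t n)
{
    size_t filled = 0;

    for (size_t i = 0; i < n; i++) {
        if (cpu_pte[i] != PTE_NONE) {
            gpu_pte[i] = cpu_pte[i];   /* valid CPU mapping: mirror it */
            filled++;
        } else {
            gpu_pte[i] = PTE_NONE;     /* GPU access here will fault */
        }
    }
    return filled;
}
```

Nothing in the CPU page table is modified by the walk, which is what makes it cheap: it is purely a read of existing valid entries.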
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-17 14:01 ` Jerome Glisse @ 2013-04-17 23:48 ` Simon Jeons 2013-04-18 1:02 ` Jerome Glisse 0 siblings, 1 reply; 34+ messages in thread From: Simon Jeons @ 2013-04-17 23:48 UTC (permalink / raw) To: Jerome Glisse Cc: Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss Hi Jerome, On 04/17/2013 10:01 PM, Jerome Glisse wrote: > On Tue, Apr 16, 2013 at 7:50 PM, Simon Jeons <simon.jeons@gmail.com > <mailto:simon.jeons@gmail.com>> wrote: > > On 04/17/2013 12:27 AM, Jerome Glisse wrote: > > [snip] > > > > As i said this is for pre-filling already present entry, ie > pte that are present with a valid page (no special bit set). > This is an optimization so that the GPU can pre-fill its tlb > without having to take any mmap_sem. Hope is that in most > common case this will be enough, but in some case you will > have to go through the lengthy non > fast gup. > > > I know this. What I concern is the pte you mentioned is for normal > cpu, correct? How can you pre-fill pte and tlb of GPU? > > > You getting confuse, idea is to look at cpu pte and prefill gpu pte. I > do not prefill cpu pte, if a cpu pte is valid then i use the page it > point to prefill the GPU pte. Yes, confused! > > So i don't pre-fill CPU PTE and TLB GPU, i pre-fill GPU PTE from CPU > PTE if CPU PTE is valid. Other GPU PTE are marked as invalid and will > trigger a fault that will be handle using gup that will fill CPU PTE > (if fault happen at a valid address) at which point GPU PTE is updated > or error is reported if fault happened at an invalid address. gup is used to fill the CPU PTE; could you point out to me which code will re-fill the GPU PTE? gup fast? Is the GPU page table different from the CPU's? 
> > Cheers, > Jerome
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-17 23:48 ` Simon Jeons @ 2013-04-18 1:02 ` Jerome Glisse 0 siblings, 0 replies; 34+ messages in thread From: Jerome Glisse @ 2013-04-18 1:02 UTC (permalink / raw) To: Simon Jeons Cc: Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss [-- Attachment #1: Type: text/plain, Size: 1623 bytes --] On Wed, Apr 17, 2013 at 7:48 PM, Simon Jeons <simon.jeons@gmail.com> wrote: > Hi Jerome, > > On 04/17/2013 10:01 PM, Jerome Glisse wrote: > > On Tue, Apr 16, 2013 at 7:50 PM, Simon Jeons <simon.jeons@gmail.com>wrote: > >> On 04/17/2013 12:27 AM, Jerome Glisse wrote: >> >> [snip] >> >> >>> >>> As i said this is for pre-filling already present entry, ie pte that are >>> present with a valid page (no special bit set). This is an optimization so >>> that the GPU can pre-fill its tlb without having to take any mmap_sem. Hope >>> is that in most common case this will be enough, but in some case you will >>> have to go through the lengthy non fast gup. >>> >> >> I know this. What I concern is the pte you mentioned is for normal cpu, >> correct? How can you pre-fill pte and tlb of GPU? >> > > You getting confuse, idea is to look at cpu pte and prefill gpu pte. I do > not prefill cpu pte, if a cpu pte is valid then i use the page it point to > prefill the GPU pte. > > > Yes, confused! > > > > So i don't pre-fill CPU PTE and TLB GPU, i pre-fill GPU PTE from CPU PTE > if CPU PTE is valid. Other GPU PTE are marked as invalid and will trigger a > fault that will be handle using gup that will fill CPU PTE (if fault happen > at a valid address) at which point GPU PTE is updated or error is reported > if fault happened at an invalid address. > > > gup is used to fill CPU PTE, could you point out to me which codes will > re-fill GPU PTE? gup fast? > GPU page table is different from CPU? 
> > The GPU interrupt handler will schedule a work thread that will call gup and then update the GPU page table. Cheers, Jerome
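The two-stage flow in Jerome's answer follows the usual kernel split between interrupt context, which must not sleep, and deferred work, which may. A toy userspace model of that split (all names are invented; `gup_resolve` merely stands in for get_user_pages):

```c
/*
 * Toy model of the fault path: the GPU "interrupt handler" only queues
 * the faulting virtual page (it cannot sleep), and a deferred worker
 * later resolves it - standing in for get_user_pages(), which may
 * sleep - and updates the GPU page table. Illustrative names throughout.
 */
#include <stddef.h>

#define QLEN 16

struct fault_queue {
    size_t vpage[QLEN];
    size_t head, tail;
};

/* "Top half": runs in interrupt context, so just record and defer. */
static void gpu_fault_irq(struct fault_queue *q, size_t vpage)
{
    q->vpage[q->tail++] = vpage;
}

/* Stand-in for gup: resolve a user page; the real call may sleep. */
static unsigned long gup_resolve(const unsigned long *cpu_pte, size_t vpage)
{
    return cpu_pte[vpage];
}

/* "Work thread": drain the queue and fill the GPU page table. */
static size_t gpu_fault_worker(struct fault_queue *q,
                               const unsigned long *cpu_pte,
                               unsigned long *gpu_pte)
{
    size_t handled = 0;

    while (q->head != q->tail) {
        size_t v = q->vpage[q->head++];
        gpu_pte[v] = gup_resolve(cpu_pte, v);
        handled++;
    }
    return handled;
}
```

The design point is simply that the sleepable part (gup) is pushed out of the interrupt handler into the worker, which is why the handler only needs to enqueue.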
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-02-08 11:18 [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes Shachar Raindel 2013-02-08 15:21 ` Jerome Glisse @ 2013-02-09 6:05 ` Michel Lespinasse 2013-02-09 16:29 ` Jerome Glisse 2013-02-10 7:54 ` Shachar Raindel 2013-04-09 8:17 ` Simon Jeons 2 siblings, 2 replies; 34+ messages in thread From: Michel Lespinasse @ 2013-02-09 6:05 UTC (permalink / raw) To: Shachar Raindel Cc: lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: > Hi, > > We would like to present a reference implementation for safely sharing > memory pages from user space with the hardware, without pinning. > > We will be happy to hear the community feedback on our prototype > implementation, and suggestions for future improvements. > > We would also like to discuss adding features to the core MM subsystem to > assist hardware access to user memory without pinning. This sounds kinda scary TBH; however I do understand the need for such technology. I think one issue is that many MM developers are insufficiently aware of such developments; having a technology presentation would probably help there; but traditionally LSF/MM sessions are more interactive between developers who are already quite familiar with the technology. I think it would help if you could send in advance a detailed presentation of the problem and the proposed solutions (and then what they require of the MM layer) so people can be better prepared. And first I'd like to ask, aren't IOMMUs supposed to already largely solve this problem ? (probably a dumb question, but that just tells you how much you need to explain :) -- Michel "Walken" Lespinasse A program is never fully debugged until the last user dies. 
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-02-09 6:05 ` Michel Lespinasse @ 2013-02-09 16:29 ` Jerome Glisse 2013-04-09 8:28 ` Simon Jeons ` (2 more replies) 2013-02-10 7:54 ` Shachar Raindel 1 sibling, 3 replies; 34+ messages in thread From: Jerome Glisse @ 2013-02-09 16:29 UTC (permalink / raw) To: Michel Lespinasse Cc: Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> wrote: > On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: >> Hi, >> >> We would like to present a reference implementation for safely sharing >> memory pages from user space with the hardware, without pinning. >> >> We will be happy to hear the community feedback on our prototype >> implementation, and suggestions for future improvements. >> >> We would also like to discuss adding features to the core MM subsystem to >> assist hardware access to user memory without pinning. > > This sounds kinda scary TBH; however I do understand the need for such > technology. > > I think one issue is that many MM developers are insufficiently aware > of such developments; having a technology presentation would probably > help there; but traditionally LSF/MM sessions are more interactive > between developers who are already quite familiar with the technology. > I think it would help if you could send in advance a detailed > presentation of the problem and the proposed solutions (and then what > they require of the MM layer) so people can be better prepared. > > And first I'd like to ask, aren't IOMMUs supposed to already largely > solve this problem ? (probably a dumb question, but that just tells > you how much you need to explain :) For GPU the motivation is threefold. 
With the advance of GPU compute and also with newer graphics programs, we see a massive increase in GPU memory consumption. We can easily reach buffers that are bigger than 1 GB. So the first motivation is to directly use the memory the user allocated through malloc in the GPU; this avoids copying 1 GB of data with the CPU to the GPU buffer. The second, and most important to GPU compute, is the use of the GPU seamlessly with the CPU; in order to achieve this you want the programmer to have a single address space on the CPU and GPU, so that the same address points to the same object on the GPU as on the CPU. This would also be a tremendously cleaner design from the driver's point of view toward memory management. And last, the most important: with such big buffers (>1 GB), memory pinning is becoming way too expensive and also drastically reduces the freedom of the mm to free pages for other processes. Most of the time only a small window (everything is relative; the window can be > 100 MB, so not that small :)) of the object will be in use by the hardware. Hardware page fault support would avoid the necessity to pin memory and thus offer greater flexibility. At the same time the driver wants to avoid page faults as much as possible; this is why I would like to be able to give hints to the mm about ranges of addresses where it should avoid freeing pages (swapping them out). The iommu was designed with other goals, which were first to isolate devices from one another and restrict device access to allowed memory, and second to remap addresses that are above the device's address space limit. Lots of devices can only address 24 bits or 32 bits of memory, and on a computer with several GB of memory a lot of the pages suddenly become unreachable to the hardware. The iommu allows working around this by remapping those high pages into addresses that the hardware can reach. Hardware page fault support is a new feature of the iommu, designed to help the OS and driver reduce memory pinning and also share an address space. 
Though I am sure there are other motivations that I am not even aware of or would think of. Btw I won't be at LSF/MM, so a good free beer (or other beverage) on me for whoever takes notes on this subject at the next conf we run into each other. Cheers, Jerome
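The address-limit remapping role of the IOMMU mentioned above can be illustrated with a toy single-level IOVA table (invented names and a userspace model; a real IOMMU does this through hardware page tables): high physical pages are assigned device-visible addresses inside a window the limited device can actually reach.

```c
/*
 * Toy illustration of the remapping role described above: a device that
 * can only address 32 bits gets high physical pages remapped into a
 * small device-visible (IOVA) window. Invented names, userspace model.
 */
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE  4096UL
#define IOVA_BASE  0x10000000UL      /* well below the 4 GiB limit */
#define IOVA_SLOTS 8

struct toy_iommu {
    uint64_t phys[IOVA_SLOTS];       /* slot -> physical page address */
    size_t used;
};

/* Map one physical page (possibly above 4 GiB) into the device window;
 * returns a device-visible address guaranteed to fit in 32 bits. */
static uint64_t iommu_map(struct toy_iommu *io, uint64_t phys_page)
{
    io->phys[io->used] = phys_page;
    return IOVA_BASE + PAGE_SIZE * io->used++;
}

/* The "hardware" side: translate a device address back to physical. */
static uint64_t iommu_translate(const struct toy_iommu *io, uint64_t iova)
{
    size_t slot = (size_t)((iova - IOVA_BASE) / PAGE_SIZE);

    return io->phys[slot] + (iova & (PAGE_SIZE - 1));
}
```

The isolation goal Jerome lists first falls out of the same structure: the device can only reach physical pages that appear in the table.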
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-02-09 16:29 ` Jerome Glisse @ 2013-04-09 8:28 ` Simon Jeons 2013-04-09 14:21 ` Jerome Glisse 2013-04-10 1:57 ` Simon Jeons 2013-04-15 8:39 ` Simon Jeons 2 siblings, 1 reply; 34+ messages in thread From: Simon Jeons @ 2013-04-09 8:28 UTC (permalink / raw) To: Jerome Glisse Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss Hi Jerome, On 02/10/2013 12:29 AM, Jerome Glisse wrote: > On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> wrote: >> On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: >>> Hi, >>> >>> We would like to present a reference implementation for safely sharing >>> memory pages from user space with the hardware, without pinning. >>> >>> We will be happy to hear the community feedback on our prototype >>> implementation, and suggestions for future improvements. >>> >>> We would also like to discuss adding features to the core MM subsystem to >>> assist hardware access to user memory without pinning. >> This sounds kinda scary TBH; however I do understand the need for such >> technology. >> >> I think one issue is that many MM developers are insufficiently aware >> of such developments; having a technology presentation would probably >> help there; but traditionally LSF/MM sessions are more interactive >> between developers who are already quite familiar with the technology. >> I think it would help if you could send in advance a detailed >> presentation of the problem and the proposed solutions (and then what >> they require of the MM layer) so people can be better prepared. >> >> And first I'd like to ask, aren't IOMMUs supposed to already largely >> solve this problem ? 
(probably a dumb question, but that just tells >> you how much you need to explain :) > For GPU the motivation is three fold. With the advance of GPU compute > and also with newer graphic program we see a massive increase in GPU > memory consumption. We easily can reach buffer that are bigger than > 1gbytes. So the first motivation is to directly use the memory the > user allocated through malloc in the GPU this avoid copying 1gbytes of > data with the cpu to the gpu buffer. The second and mostly important > to GPU compute is the use of GPU seamlessly with the CPU, in order to > achieve this you want the programmer to have a single address space on > the CPU and GPU. So that the same address point to the same object on > GPU as on the CPU. This would also be a tremendous cleaner design from > driver point of view toward memory management. > > And last, the most important, with such big buffer (>1gbytes) the > memory pinning is becoming way to expensive and also drastically > reduce the freedom of the mm to free page for other process. Most of > the time a small window (every thing is relative the window can be > > 100mbytes not so small :)) of the object will be in use by the > hardware. The hardware pagefault support would avoid the necessity to What is the meaning of a hardware page fault? > pin memory and thus offer greater flexibility. At the same time the > driver wants to avoid page fault as much as possible this is why i > would like to be able to give hint to the mm about range of address it > should avoid freeing page (swapping them out). > > The iommu was designed with other goals, which were first isolate > device from one another and restrict device access to allowed memory. > Second allow to remap address that are above device address space When is address remapping needed? > limit. Lot of device can only address 24bit or 32bit of memory and > with computer with several gbytes of memory suddenly lot of the page > become unreachable to the hardware. 
The iommu allow to work around > this by remapping those high page into address that the hardware can > reach. > > The hardware page fault support is a new feature of iommu designed to > help the os and driver to reduce memory pinning and also share address > space. Thought i am sure there are other motivations that i am not > even aware off or would think off. > > Btw i won't be at LSF/MM so a free good beer (or other beverage) on me > to whoever takes note on this subject in next conf we run into each > others. > > Cheers, > Jerome > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-09 8:28 ` Simon Jeons @ 2013-04-09 14:21 ` Jerome Glisse 2013-04-10 1:41 ` Simon Jeons 0 siblings, 1 reply; 34+ messages in thread From: Jerome Glisse @ 2013-04-09 14:21 UTC (permalink / raw) To: Simon Jeons Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss On Tue, Apr 09, 2013 at 04:28:09PM +0800, Simon Jeons wrote: > Hi Jerome, > On 02/10/2013 12:29 AM, Jerome Glisse wrote: > >On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> wrote: > >>On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: > >>>Hi, > >>> > >>>We would like to present a reference implementation for safely sharing > >>>memory pages from user space with the hardware, without pinning. > >>> > >>>We will be happy to hear the community feedback on our prototype > >>>implementation, and suggestions for future improvements. > >>> > >>>We would also like to discuss adding features to the core MM subsystem to > >>>assist hardware access to user memory without pinning. > >>This sounds kinda scary TBH; however I do understand the need for such > >>technology. > >> > >>I think one issue is that many MM developers are insufficiently aware > >>of such developments; having a technology presentation would probably > >>help there; but traditionally LSF/MM sessions are more interactive > >>between developers who are already quite familiar with the technology. > >>I think it would help if you could send in advance a detailed > >>presentation of the problem and the proposed solutions (and then what > >>they require of the MM layer) so people can be better prepared. > >> > >>And first I'd like to ask, aren't IOMMUs supposed to already largely > >>solve this problem ? 
(probably a dumb question, but that just tells > >>you how much you need to explain :) > >For GPU the motivation is three fold. With the advance of GPU compute > >and also with newer graphic program we see a massive increase in GPU > >memory consumption. We easily can reach buffer that are bigger than > >1gbytes. So the first motivation is to directly use the memory the > >user allocated through malloc in the GPU this avoid copying 1gbytes of > >data with the cpu to the gpu buffer. The second and mostly important > >to GPU compute is the use of GPU seamlessly with the CPU, in order to > >achieve this you want the programmer to have a single address space on > >the CPU and GPU. So that the same address point to the same object on > >GPU as on the CPU. This would also be a tremendous cleaner design from > >driver point of view toward memory management. > > > >And last, the most important, with such big buffer (>1gbytes) the > >memory pinning is becoming way to expensive and also drastically > >reduce the freedom of the mm to free page for other process. Most of > >the time a small window (every thing is relative the window can be > > >100mbytes not so small :)) of the object will be in use by the > >hardware. The hardware pagefault support would avoid the necessity to > > What's the meaning of hardware pagefault? It's a PCIe extension (well, a combination of extensions that together allow it; see http://www.pcisig.com/specifications/iov/ats/). The idea is that the IOMMU can trigger a regular page fault inside a process address space on behalf of the hardware. The only IOMMU supporting this right now is the AMD IOMMU v2, found on recent AMD platforms. > > >pin memory and thus offer greater flexibility. At the same time the > >driver wants to avoid page fault as much as possible this is why i > >would like to be able to give hint to the mm about range of address it > >should avoid freeing page (swapping them out).
> > > >The iommu was designed with other goals, which were first isolate > >device from one another and restrict device access to allowed memory. > >Second allow to remap address that are above device address space > > When need remap address? Some hardware has a 24-bit or 32-bit address limitation; the IOMMU allows remapping memory that is above this range into the working range of the device, just as I said below. Or is your question about something different? Cheers, Jerome > >limit. Lot of device can only address 24bit or 32bit of memory and > >with computer with several gbytes of memory suddenly lot of the page > >become unreachable to the hardware. The iommu allow to work around > >this by remapping those high page into address that the hardware can > >reach. > > > >The hardware page fault support is a new feature of iommu designed to > >help the os and driver to reduce memory pinning and also share address > >space. Thought i am sure there are other motivations that i am not > >even aware off or would think off. > > > >Btw i won't be at LSF/MM so a free good beer (or other beverage) on me > >to whoever takes note on this subject in next conf we run into each > >others. > > > > Cheers, > > Jerome ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-09 14:21 ` Jerome Glisse @ 2013-04-10 1:41 ` Simon Jeons 2013-04-10 20:45 ` Jerome Glisse 0 siblings, 1 reply; 34+ messages in thread From: Simon Jeons @ 2013-04-10 1:41 UTC (permalink / raw) To: Jerome Glisse Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss Hi Jerome, On 04/09/2013 10:21 PM, Jerome Glisse wrote: > On Tue, Apr 09, 2013 at 04:28:09PM +0800, Simon Jeons wrote: >> Hi Jerome, >> On 02/10/2013 12:29 AM, Jerome Glisse wrote: >>> On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> wrote: >>>> On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: >>>>> Hi, >>>>> >>>>> We would like to present a reference implementation for safely sharing >>>>> memory pages from user space with the hardware, without pinning. >>>>> >>>>> We will be happy to hear the community feedback on our prototype >>>>> implementation, and suggestions for future improvements. >>>>> >>>>> We would also like to discuss adding features to the core MM subsystem to >>>>> assist hardware access to user memory without pinning. >>>> This sounds kinda scary TBH; however I do understand the need for such >>>> technology. >>>> >>>> I think one issue is that many MM developers are insufficiently aware >>>> of such developments; having a technology presentation would probably >>>> help there; but traditionally LSF/MM sessions are more interactive >>>> between developers who are already quite familiar with the technology. >>>> I think it would help if you could send in advance a detailed >>>> presentation of the problem and the proposed solutions (and then what >>>> they require of the MM layer) so people can be better prepared. >>>> >>>> And first I'd like to ask, aren't IOMMUs supposed to already largely >>>> solve this problem ? 
(probably a dumb question, but that just tells >>>> you how much you need to explain :) >>> For GPU the motivation is three fold. With the advance of GPU compute >>> and also with newer graphic program we see a massive increase in GPU >>> memory consumption. We easily can reach buffer that are bigger than >>> 1gbytes. So the first motivation is to directly use the memory the >>> user allocated through malloc in the GPU this avoid copying 1gbytes of >>> data with the cpu to the gpu buffer. The second and mostly important >>> to GPU compute is the use of GPU seamlessly with the CPU, in order to >>> achieve this you want the programmer to have a single address space on >>> the CPU and GPU. So that the same address point to the same object on >>> GPU as on the CPU. This would also be a tremendous cleaner design from >>> driver point of view toward memory management. >>> >>> And last, the most important, with such big buffer (>1gbytes) the >>> memory pinning is becoming way to expensive and also drastically >>> reduce the freedom of the mm to free page for other process. Most of >>> the time a small window (every thing is relative the window can be > >>> 100mbytes not so small :)) of the object will be in use by the >>> hardware. The hardware pagefault support would avoid the necessity to >> What's the meaning of hardware pagefault? > It's a PCIE extension (well it's a combination of extension that allow > that see http://www.pcisig.com/specifications/iov/ats/). Idea is that the > iommu can trigger a regular pagefault inside a process address space on > behalf of the hardware. The only iommu supporting that right now is the > AMD iommu v2 that you find on recent AMD platform. Why need hardware page fault? regular page fault is trigger by cpu mmu, correct? >>> pin memory and thus offer greater flexibility. 
At the same time the >>> driver wants to avoid page fault as much as possible this is why i >>> would like to be able to give hint to the mm about range of address it >>> should avoid freeing page (swapping them out). >>> >>> The iommu was designed with other goals, which were first isolate >>> device from one another and restrict device access to allowed memory. >>> Second allow to remap address that are above device address space >> When need remap address? > Some hardware have 24bits or 32bits address limitation, iommu allow to > remap memory that are above this range into the working range of the > device. Just as i said below. Or are your question different ? Oh, so this method can replace bounce buffers, correct? > > Cheers, > Jerome > >>> limit. Lot of device can only address 24bit or 32bit of memory and >>> with computer with several gbytes of memory suddenly lot of the page >>> become unreachable to the hardware. The iommu allow to work around >>> this by remapping those high page into address that the hardware can >>> reach. >>> >>> The hardware page fault support is a new feature of iommu designed to >>> help the os and driver to reduce memory pinning and also share address >>> space. Thought i am sure there are other motivations that i am not >>> even aware off or would think off. >>> >>> Btw i won't be at LSF/MM so a free good beer (or other beverage) on me >>> to whoever takes note on this subject in next conf we run into each >>> others. >>> ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-10 1:41 ` Simon Jeons @ 2013-04-10 20:45 ` Jerome Glisse 2013-04-11 3:42 ` Simon Jeons 0 siblings, 1 reply; 34+ messages in thread From: Jerome Glisse @ 2013-04-10 20:45 UTC (permalink / raw) To: Simon Jeons Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss On Wed, Apr 10, 2013 at 09:41:57AM +0800, Simon Jeons wrote: > Hi Jerome, > On 04/09/2013 10:21 PM, Jerome Glisse wrote: > >On Tue, Apr 09, 2013 at 04:28:09PM +0800, Simon Jeons wrote: > >>Hi Jerome, > >>On 02/10/2013 12:29 AM, Jerome Glisse wrote: > >>>On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> wrote: > >>>>On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: > >>>>>Hi, > >>>>> > >>>>>We would like to present a reference implementation for safely sharing > >>>>>memory pages from user space with the hardware, without pinning. > >>>>> > >>>>>We will be happy to hear the community feedback on our prototype > >>>>>implementation, and suggestions for future improvements. > >>>>> > >>>>>We would also like to discuss adding features to the core MM subsystem to > >>>>>assist hardware access to user memory without pinning. > >>>>This sounds kinda scary TBH; however I do understand the need for such > >>>>technology. > >>>> > >>>>I think one issue is that many MM developers are insufficiently aware > >>>>of such developments; having a technology presentation would probably > >>>>help there; but traditionally LSF/MM sessions are more interactive > >>>>between developers who are already quite familiar with the technology. > >>>>I think it would help if you could send in advance a detailed > >>>>presentation of the problem and the proposed solutions (and then what > >>>>they require of the MM layer) so people can be better prepared. 
> >>>> > >>>>And first I'd like to ask, aren't IOMMUs supposed to already largely > >>>>solve this problem ? (probably a dumb question, but that just tells > >>>>you how much you need to explain :) > >>>For GPU the motivation is three fold. With the advance of GPU compute > >>>and also with newer graphic program we see a massive increase in GPU > >>>memory consumption. We easily can reach buffer that are bigger than > >>>1gbytes. So the first motivation is to directly use the memory the > >>>user allocated through malloc in the GPU this avoid copying 1gbytes of > >>>data with the cpu to the gpu buffer. The second and mostly important > >>>to GPU compute is the use of GPU seamlessly with the CPU, in order to > >>>achieve this you want the programmer to have a single address space on > >>>the CPU and GPU. So that the same address point to the same object on > >>>GPU as on the CPU. This would also be a tremendous cleaner design from > >>>driver point of view toward memory management. > >>> > >>>And last, the most important, with such big buffer (>1gbytes) the > >>>memory pinning is becoming way to expensive and also drastically > >>>reduce the freedom of the mm to free page for other process. Most of > >>>the time a small window (every thing is relative the window can be > > >>>100mbytes not so small :)) of the object will be in use by the > >>>hardware. The hardware pagefault support would avoid the necessity to > >>What's the meaning of hardware pagefault? > >It's a PCIE extension (well it's a combination of extension that allow > >that see http://www.pcisig.com/specifications/iov/ats/). Idea is that the > >iommu can trigger a regular pagefault inside a process address space on > >behalf of the hardware. The only iommu supporting that right now is the > >AMD iommu v2 that you find on recent AMD platform. > > Why need hardware page fault? regular page fault is trigger by cpu > mmu, correct? Well here i abuse regular page fault term. 
The idea is that with hardware page faults you don't need to pin memory or take a reference on a page for the hardware to use it, so the kernel can free, as usual, pages that would otherwise have been pinned. If the GPU is really using them, it will trigger a fault through the IOMMU driver, which calls get_user_pages (and can end up calling handle_mm_fault, just like a regular page fault taken on the CPU). One use case is a GPU working on a big dataset (think a buffer of a gigabyte or more that can sit on disk and just be paged in when a chunk is needed). This is one example, but usually GPUs work on very large datasets, because that's what they are good at. > > >>>pin memory and thus offer greater flexibility. At the same time the > >>>driver wants to avoid page fault as much as possible this is why i > >>>would like to be able to give hint to the mm about range of address it > >>>should avoid freeing page (swapping them out). > >>> > >>>The iommu was designed with other goals, which were first isolate > >>>device from one another and restrict device access to allowed memory. > >>>Second allow to remap address that are above device address space > >>When need remap address? > >Some hardware have 24bits or 32bits address limitation, iommu allow to > >remap memory that are above this range into the working range of the > >device. Just as i said below. Or are your question different ? > > Oh, this method can replace bounce buffer, correct? Yes, no bounce buffer; bounce buffers are frowned upon in the GPU world because you really, really don't want to use the DMA sync buffer API. Cheers, Jerome ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-10 20:45 ` Jerome Glisse @ 2013-04-11 3:42 ` Simon Jeons 2013-04-11 18:38 ` Jerome Glisse 0 siblings, 1 reply; 34+ messages in thread From: Simon Jeons @ 2013-04-11 3:42 UTC (permalink / raw) To: Jerome Glisse Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss Hi Jerome, On 04/11/2013 04:45 AM, Jerome Glisse wrote: > On Wed, Apr 10, 2013 at 09:41:57AM +0800, Simon Jeons wrote: >> Hi Jerome, >> On 04/09/2013 10:21 PM, Jerome Glisse wrote: >>> On Tue, Apr 09, 2013 at 04:28:09PM +0800, Simon Jeons wrote: >>>> Hi Jerome, >>>> On 02/10/2013 12:29 AM, Jerome Glisse wrote: >>>>> On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> wrote: >>>>>> On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: >>>>>>> Hi, >>>>>>> >>>>>>> We would like to present a reference implementation for safely sharing >>>>>>> memory pages from user space with the hardware, without pinning. >>>>>>> >>>>>>> We will be happy to hear the community feedback on our prototype >>>>>>> implementation, and suggestions for future improvements. >>>>>>> >>>>>>> We would also like to discuss adding features to the core MM subsystem to >>>>>>> assist hardware access to user memory without pinning. >>>>>> This sounds kinda scary TBH; however I do understand the need for such >>>>>> technology. >>>>>> >>>>>> I think one issue is that many MM developers are insufficiently aware >>>>>> of such developments; having a technology presentation would probably >>>>>> help there; but traditionally LSF/MM sessions are more interactive >>>>>> between developers who are already quite familiar with the technology. 
>>>>>> I think it would help if you could send in advance a detailed >>>>>> presentation of the problem and the proposed solutions (and then what >>>>>> they require of the MM layer) so people can be better prepared. >>>>>> >>>>>> And first I'd like to ask, aren't IOMMUs supposed to already largely >>>>>> solve this problem ? (probably a dumb question, but that just tells >>>>>> you how much you need to explain :) >>>>> For GPU the motivation is three fold. With the advance of GPU compute >>>>> and also with newer graphic program we see a massive increase in GPU >>>>> memory consumption. We easily can reach buffer that are bigger than >>>>> 1gbytes. So the first motivation is to directly use the memory the >>>>> user allocated through malloc in the GPU this avoid copying 1gbytes of >>>>> data with the cpu to the gpu buffer. The second and mostly important >>>>> to GPU compute is the use of GPU seamlessly with the CPU, in order to >>>>> achieve this you want the programmer to have a single address space on >>>>> the CPU and GPU. So that the same address point to the same object on >>>>> GPU as on the CPU. This would also be a tremendous cleaner design from >>>>> driver point of view toward memory management. >>>>> >>>>> And last, the most important, with such big buffer (>1gbytes) the >>>>> memory pinning is becoming way to expensive and also drastically >>>>> reduce the freedom of the mm to free page for other process. Most of >>>>> the time a small window (every thing is relative the window can be > >>>>> 100mbytes not so small :)) of the object will be in use by the >>>>> hardware. The hardware pagefault support would avoid the necessity to >>>> What's the meaning of hardware pagefault? >>> It's a PCIE extension (well it's a combination of extension that allow >>> that see http://www.pcisig.com/specifications/iov/ats/). Idea is that the >>> iommu can trigger a regular pagefault inside a process address space on >>> behalf of the hardware. 
The only iommu supporting that right now is the >>> AMD iommu v2 that you find on recent AMD platform. >> Why need hardware page fault? regular page fault is trigger by cpu >> mmu, correct? > Well here i abuse regular page fault term. Idea is that with hardware page > fault you don't need to pin memory or take reference on page for hardware to > use it. So that kernel can free as usual page that would otherwise have been For the case when GPU need to pin memory, why GPU need grap the memory of normal process instead of allocating for itself? > pinned. If GPU is really using them it will trigger a fault through the iommu > driver that call get_user_pages (which can end up calling handle_mm_fault like > a regular page fault that happened on the CPU). This time normal process can't use this page, correct? So GPU and normal process both have their own pages? > One use case is GPU working on BIG dataset (think > GB buffer that can be on disk > and just paged in when a chunk is needed). This is one example, but usualy GPU > works on very large dataset because that's what they are good at. >>>>> pin memory and thus offer greater flexibility. At the same time the >>>>> driver wants to avoid page fault as much as possible this is why i >>>>> would like to be able to give hint to the mm about range of address it >>>>> should avoid freeing page (swapping them out). >>>>> >>>>> The iommu was designed with other goals, which were first isolate >>>>> device from one another and restrict device access to allowed memory. >>>>> Second allow to remap address that are above device address space >>>> When need remap address? >>> Some hardware have 24bits or 32bits address limitation, iommu allow to >>> remap memory that are above this range into the working range of the >>> device. Just as i said below. Or are your question different ? >> Oh, this method can replace bounce buffer, correct? 
> Yes, no bounce buffer, bounce buffer is frowned upon in GPU world because you > really really really don't want to use the dma sync buffer API. > > Cheers, > Jerome ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-11 3:42 ` Simon Jeons @ 2013-04-11 18:38 ` Jerome Glisse 2013-04-12 1:54 ` Simon Jeons 0 siblings, 1 reply; 34+ messages in thread From: Jerome Glisse @ 2013-04-11 18:38 UTC (permalink / raw) To: Simon Jeons Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss On Thu, Apr 11, 2013 at 11:42:05AM +0800, Simon Jeons wrote: > Hi Jerome, > On 04/11/2013 04:45 AM, Jerome Glisse wrote: > >On Wed, Apr 10, 2013 at 09:41:57AM +0800, Simon Jeons wrote: > >>Hi Jerome, > >>On 04/09/2013 10:21 PM, Jerome Glisse wrote: > >>>On Tue, Apr 09, 2013 at 04:28:09PM +0800, Simon Jeons wrote: > >>>>Hi Jerome, > >>>>On 02/10/2013 12:29 AM, Jerome Glisse wrote: > >>>>>On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> wrote: > >>>>>>On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: > >>>>>>>Hi, > >>>>>>> > >>>>>>>We would like to present a reference implementation for safely sharing > >>>>>>>memory pages from user space with the hardware, without pinning. > >>>>>>> > >>>>>>>We will be happy to hear the community feedback on our prototype > >>>>>>>implementation, and suggestions for future improvements. > >>>>>>> > >>>>>>>We would also like to discuss adding features to the core MM subsystem to > >>>>>>>assist hardware access to user memory without pinning. > >>>>>>This sounds kinda scary TBH; however I do understand the need for such > >>>>>>technology. > >>>>>> > >>>>>>I think one issue is that many MM developers are insufficiently aware > >>>>>>of such developments; having a technology presentation would probably > >>>>>>help there; but traditionally LSF/MM sessions are more interactive > >>>>>>between developers who are already quite familiar with the technology. 
> >>>>>>I think it would help if you could send in advance a detailed > >>>>>>presentation of the problem and the proposed solutions (and then what > >>>>>>they require of the MM layer) so people can be better prepared. > >>>>>> > >>>>>>And first I'd like to ask, aren't IOMMUs supposed to already largely > >>>>>>solve this problem ? (probably a dumb question, but that just tells > >>>>>>you how much you need to explain :) > >>>>>For GPU the motivation is three fold. With the advance of GPU compute > >>>>>and also with newer graphic program we see a massive increase in GPU > >>>>>memory consumption. We easily can reach buffer that are bigger than > >>>>>1gbytes. So the first motivation is to directly use the memory the > >>>>>user allocated through malloc in the GPU this avoid copying 1gbytes of > >>>>>data with the cpu to the gpu buffer. The second and mostly important > >>>>>to GPU compute is the use of GPU seamlessly with the CPU, in order to > >>>>>achieve this you want the programmer to have a single address space on > >>>>>the CPU and GPU. So that the same address point to the same object on > >>>>>GPU as on the CPU. This would also be a tremendous cleaner design from > >>>>>driver point of view toward memory management. > >>>>> > >>>>>And last, the most important, with such big buffer (>1gbytes) the > >>>>>memory pinning is becoming way to expensive and also drastically > >>>>>reduce the freedom of the mm to free page for other process. Most of > >>>>>the time a small window (every thing is relative the window can be > > >>>>>100mbytes not so small :)) of the object will be in use by the > >>>>>hardware. The hardware pagefault support would avoid the necessity to > >>>>What's the meaning of hardware pagefault? > >>>It's a PCIE extension (well it's a combination of extension that allow > >>>that see http://www.pcisig.com/specifications/iov/ats/). 
Idea is that the > >>>iommu can trigger a regular pagefault inside a process address space on > >>>behalf of the hardware. The only iommu supporting that right now is the > >>>AMD iommu v2 that you find on recent AMD platform. > >>Why need hardware page fault? regular page fault is trigger by cpu > >>mmu, correct? > >Well here i abuse regular page fault term. Idea is that with hardware page > >fault you don't need to pin memory or take reference on page for hardware to > >use it. So that kernel can free as usual page that would otherwise have been > > For the case when GPU need to pin memory, why GPU need grap the > memory of normal process instead of allocating for itself? Pinned memory is today's world, where the GPU allocates its own memory (GBs of it) that disappears from kernel control, i.e. the kernel can no longer reclaim this memory; it is lost (I have already had complaints from users who saw GBs of memory vanish and couldn't understand why the GPU was using so much). Tomorrow's world is one where the GPU can access memory that the application allocated through a simple malloc, and where the kernel can recycle any page at any time, because of memory pressure or because the kernel decides to do so. That's what we want to do, and to achieve it we are getting hardware that can take page faults. No change to the kernel's core mm code is required (though some improvements might be made). > > >pinned. If GPU is really using them it will trigger a fault through the iommu > >driver that call get_user_pages (which can end up calling handle_mm_fault like > >a regular page fault that happened on the CPU). > > This time normal process can't use this page, correct? So GPU and > normal process both have their own pages? No. In tomorrow's world the GPU and the CPU both use the same page, in the same address space, at the same time, just like two CPU cores each running a different thread of the same process would. Just consider the GPU a new CPU core working in the same address space, using the same memory, all at the same time as the CPU. Cheers, Jerome ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-11 18:38 ` Jerome Glisse @ 2013-04-12 1:54 ` Simon Jeons 2013-04-12 2:11 ` [Lsf-pc] " Rik van Riel 2013-04-12 2:57 ` Jerome Glisse 0 siblings, 2 replies; 34+ messages in thread From: Simon Jeons @ 2013-04-12 1:54 UTC (permalink / raw) To: Jerome Glisse Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss Hi Jerome, On 04/12/2013 02:38 AM, Jerome Glisse wrote: > On Thu, Apr 11, 2013 at 11:42:05AM +0800, Simon Jeons wrote: >> Hi Jerome, >> On 04/11/2013 04:45 AM, Jerome Glisse wrote: >>> On Wed, Apr 10, 2013 at 09:41:57AM +0800, Simon Jeons wrote: >>>> Hi Jerome, >>>> On 04/09/2013 10:21 PM, Jerome Glisse wrote: >>>>> On Tue, Apr 09, 2013 at 04:28:09PM +0800, Simon Jeons wrote: >>>>>> Hi Jerome, >>>>>> On 02/10/2013 12:29 AM, Jerome Glisse wrote: >>>>>>> On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> wrote: >>>>>>>> On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> We would like to present a reference implementation for safely sharing >>>>>>>>> memory pages from user space with the hardware, without pinning. >>>>>>>>> >>>>>>>>> We will be happy to hear the community feedback on our prototype >>>>>>>>> implementation, and suggestions for future improvements. >>>>>>>>> >>>>>>>>> We would also like to discuss adding features to the core MM subsystem to >>>>>>>>> assist hardware access to user memory without pinning. >>>>>>>> This sounds kinda scary TBH; however I do understand the need for such >>>>>>>> technology. 
>>>>>>>> >>>>>>>> I think one issue is that many MM developers are insufficiently aware >>>>>>>> of such developments; having a technology presentation would probably >>>>>>>> help there; but traditionally LSF/MM sessions are more interactive >>>>>>>> between developers who are already quite familiar with the technology. >>>>>>>> I think it would help if you could send in advance a detailed >>>>>>>> presentation of the problem and the proposed solutions (and then what >>>>>>>> they require of the MM layer) so people can be better prepared. >>>>>>>> >>>>>>>> And first I'd like to ask, aren't IOMMUs supposed to already largely >>>>>>>> solve this problem ? (probably a dumb question, but that just tells >>>>>>>> you how much you need to explain :) >>>>>>> For GPU the motivation is three fold. With the advance of GPU compute >>>>>>> and also with newer graphic program we see a massive increase in GPU >>>>>>> memory consumption. We easily can reach buffer that are bigger than >>>>>>> 1gbytes. So the first motivation is to directly use the memory the >>>>>>> user allocated through malloc in the GPU this avoid copying 1gbytes of >>>>>>> data with the cpu to the gpu buffer. The second and mostly important >>>>>>> to GPU compute is the use of GPU seamlessly with the CPU, in order to >>>>>>> achieve this you want the programmer to have a single address space on >>>>>>> the CPU and GPU. So that the same address point to the same object on >>>>>>> GPU as on the CPU. This would also be a tremendous cleaner design from >>>>>>> driver point of view toward memory management. >>>>>>> >>>>>>> And last, the most important, with such big buffer (>1gbytes) the >>>>>>> memory pinning is becoming way to expensive and also drastically >>>>>>> reduce the freedom of the mm to free page for other process. Most of >>>>>>> the time a small window (every thing is relative the window can be > >>>>>>> 100mbytes not so small :)) of the object will be in use by the >>>>>>> hardware. 
The hardware pagefault support would avoid the necessity to >>>>>> What's the meaning of hardware pagefault? >>>>> It's a PCIE extension (well it's a combination of extension that allow >>>>> that see http://www.pcisig.com/specifications/iov/ats/). Idea is that the >>>>> iommu can trigger a regular pagefault inside a process address space on >>>>> behalf of the hardware. The only iommu supporting that right now is the >>>>> AMD iommu v2 that you find on recent AMD platform. >>>> Why need hardware page fault? regular page fault is trigger by cpu >>>> mmu, correct? >>> Well here i abuse regular page fault term. Idea is that with hardware page >>> fault you don't need to pin memory or take reference on page for hardware to >>> use it. So that kernel can free as usual page that would otherwise have been >> For the case when GPU need to pin memory, why GPU need grap the >> memory of normal process instead of allocating for itself? > Pin memory is today world where gpu allocate its own memory (GB of memory) > that disappear from kernel control ie kernel can no longer reclaim this > memory it's lost memory (i had complain about that already from user than > saw GB of memory vanish and couldn't understand why the GPU was using so > much). > > Tomorrow world we want gpu to be able to access memory that the application > allocated through a simple malloc and we want the kernel to be able to > recycly any page at any time because of memory pressure or because kernel > decide to do so. > > That's just what we want to do. To achieve so we are getting hw that can do > pagefault. No change to kernel core mm code (some improvement might be made). The memory disappears because a reference (taken via gup) is held on it, correct? In the tomorrow world you want the page fault to be triggered through the iommu driver, which calls get_user_pages; that will also take a reference (since gup is called), won't it?
Anyway, assuming the tomorrow world doesn't take a reference, don't we need to care when a page used by the GPU is reclaimed? > >>> pinned. If GPU is really using them it will trigger a fault through the iommu >>> driver that call get_user_pages (which can end up calling handle_mm_fault like >>> a regular page fault that happened on the CPU). >> This time normal process can't use this page, correct? So GPU and >> normal process both have their own pages? > No, tomorrow world, gpu and cpu both using same page in same address space at > the same time. Just like two cpu core each running a different thread of > the same process would. Just consider the gpu as a new cpu core working in same > address space using the same memory all at the same time as cpu. > > Cheers, > Jerome
* Re: [Lsf-pc] [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-12 1:54 ` Simon Jeons @ 2013-04-12 2:11 ` Rik van Riel 2013-04-12 2:57 ` Jerome Glisse 1 sibling, 0 replies; 34+ messages in thread From: Rik van Riel @ 2013-04-12 2:11 UTC (permalink / raw) To: Simon Jeons Cc: Jerome Glisse, Andrea Arcangeli, Haggai Eran, lsf-pc, Liran Liss, Shachar Raindel, Sagi Grimberg, Roland Dreier, linux-mm, Or Gerlitz, Michel Lespinasse On 04/11/2013 09:54 PM, Simon Jeons wrote: > Hi Jerome, > On 04/12/2013 02:38 AM, Jerome Glisse wrote: >> On Thu, Apr 11, 2013 at 11:42:05AM +0800, Simon Jeons wrote: >> Tomorrow world we want gpu to be able to access memory that the >> application >> allocated through a simple malloc and we want the kernel to be able to >> recycly any page at any time because of memory pressure or because kernel >> decide to do so. >> >> That's just what we want to do. To achieve so we are getting hw that >> can do >> pagefault. No change to kernel core mm code (some improvement might be >> made). > > The memory disappear since you have a reference(gup) against it, > correct? Tomorrow world you want the page fault trigger through iommu > driver that call get_user_pages, it also will take a reference(since gup > is called), isn't it? Anyway, assume tomorrow world doesn't take a > reference, we don't need care page which used by GPU is reclaimed? The GPU and CPU may each have a different page table format. The kernel will need to keep both in sync. That is one of the things this discussion is about. For performance reasons, it may also make sense to locate some of the application's data in the GPU's own memory, so it does not have to cross the PCIE bus every time it needs to load the data. That requires memory coherency code in the kernel. -- All rights reversed
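Rik's point that the CPU and GPU may use different page table entry layouts, with the kernel translating between them to keep the secondary table in sync, can be sketched in userspace. Both PTE formats below are invented for illustration (they are not the real x86 or any GPU format), and `cpu_pte_to_gpu_pte` is a hypothetical helper, not a kernel function:

```c
#include <assert.h>
#include <stdint.h>

/* Invented "CPU" PTE layout: bit 0 = present, bit 1 = writable,
 * bits 12 and up = page frame number. */
#define CPU_PTE_PRESENT 0x1ull
#define CPU_PTE_WRITE   0x2ull

/* Invented "GPU" PTE layout: bit 63 = valid, bit 62 = read-only,
 * low bits = page frame number. */
#define GPU_PTE_VALID   (1ull << 63)
#define GPU_PTE_RDONLY  (1ull << 62)

/* Translate one CPU-format entry into the device's format, preserving
 * the frame number and the write protection. */
static uint64_t cpu_pte_to_gpu_pte(uint64_t cpu_pte)
{
    uint64_t gpu;

    if (!(cpu_pte & CPU_PTE_PRESENT))
        return 0;                          /* not present -> invalid entry */
    gpu = GPU_PTE_VALID | (cpu_pte >> 12); /* copy the frame number */
    if (!(cpu_pte & CPU_PTE_WRITE))
        gpu |= GPU_PTE_RDONLY;             /* keep read-only pages read-only */
    return gpu;
}
```

A driver maintaining such a shadow table would run a translation like this for every entry touched by an mmu-notifier event, which is exactly the synchronization cost under discussion.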
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-12 1:54 ` Simon Jeons 2013-04-12 2:11 ` [Lsf-pc] " Rik van Riel @ 2013-04-12 2:57 ` Jerome Glisse 2013-04-12 5:44 ` Simon Jeons 1 sibling, 1 reply; 34+ messages in thread From: Jerome Glisse @ 2013-04-12 2:57 UTC (permalink / raw) To: Simon Jeons Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss [-- Attachment #1: Type: text/plain, Size: 6456 bytes --] On Thu, Apr 11, 2013 at 9:54 PM, Simon Jeons <simon.jeons@gmail.com> wrote: > Hi Jerome, > > On 04/12/2013 02:38 AM, Jerome Glisse wrote: > >> On Thu, Apr 11, 2013 at 11:42:05AM +0800, Simon Jeons wrote: >> >>> Hi Jerome, >>> On 04/11/2013 04:45 AM, Jerome Glisse wrote: >>> >>>> On Wed, Apr 10, 2013 at 09:41:57AM +0800, Simon Jeons wrote: >>>> >>>>> Hi Jerome, >>>>> On 04/09/2013 10:21 PM, Jerome Glisse wrote: >>>>> >>>>>> On Tue, Apr 09, 2013 at 04:28:09PM +0800, Simon Jeons wrote: >>>>>> >>>>>>> Hi Jerome, >>>>>>> On 02/10/2013 12:29 AM, Jerome Glisse wrote: >>>>>>> >>>>>>>> On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse < >>>>>>>> walken@google.com> wrote: >>>>>>>> >>>>>>>>> On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel < >>>>>>>>> raindel@mellanox.com> wrote: >>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> We would like to present a reference implementation for safely >>>>>>>>>> sharing >>>>>>>>>> memory pages from user space with the hardware, without pinning. >>>>>>>>>> >>>>>>>>>> We will be happy to hear the community feedback on our prototype >>>>>>>>>> implementation, and suggestions for future improvements. >>>>>>>>>> >>>>>>>>>> We would also like to discuss adding features to the core MM >>>>>>>>>> subsystem to >>>>>>>>>> assist hardware access to user memory without pinning. 
>>>>>>>>>> >>>>>>>>> This sounds kinda scary TBH; however I do understand the need for >>>>>>>>> such >>>>>>>>> technology. >>>>>>>>> >>>>>>>>> I think one issue is that many MM developers are insufficiently >>>>>>>>> aware >>>>>>>>> of such developments; having a technology presentation would >>>>>>>>> probably >>>>>>>>> help there; but traditionally LSF/MM sessions are more interactive >>>>>>>>> between developers who are already quite familiar with the >>>>>>>>> technology. >>>>>>>>> I think it would help if you could send in advance a detailed >>>>>>>>> presentation of the problem and the proposed solutions (and then >>>>>>>>> what >>>>>>>>> they require of the MM layer) so people can be better prepared. >>>>>>>>> >>>>>>>>> And first I'd like to ask, aren't IOMMUs supposed to already >>>>>>>>> largely >>>>>>>>> solve this problem ? (probably a dumb question, but that just tells >>>>>>>>> you how much you need to explain :) >>>>>>>>> >>>>>>>> For GPU the motivation is three fold. With the advance of GPU >>>>>>>> compute >>>>>>>> and also with newer graphic program we see a massive increase in GPU >>>>>>>> memory consumption. We easily can reach buffer that are bigger than >>>>>>>> 1gbytes. So the first motivation is to directly use the memory the >>>>>>>> user allocated through malloc in the GPU this avoid copying 1gbytes >>>>>>>> of >>>>>>>> data with the cpu to the gpu buffer. The second and mostly important >>>>>>>> to GPU compute is the use of GPU seamlessly with the CPU, in order >>>>>>>> to >>>>>>>> achieve this you want the programmer to have a single address space >>>>>>>> on >>>>>>>> the CPU and GPU. So that the same address point to the same object >>>>>>>> on >>>>>>>> GPU as on the CPU. This would also be a tremendous cleaner design >>>>>>>> from >>>>>>>> driver point of view toward memory management. 
>>>>>>>> >>>>>>>> And last, the most important, with such big buffer (>1gbytes) the >>>>>>>> memory pinning is becoming way to expensive and also drastically >>>>>>>> reduce the freedom of the mm to free page for other process. Most of >>>>>>>> the time a small window (every thing is relative the window can be > >>>>>>>> 100mbytes not so small :)) of the object will be in use by the >>>>>>>> hardware. The hardware pagefault support would avoid the necessity >>>>>>>> to >>>>>>>> >>>>>>> What's the meaning of hardware pagefault? >>>>>>> >>>>>> It's a PCIE extension (well it's a combination of extension that allow >>>>>> that see http://www.pcisig.com/**specifications/iov/ats/<http://www.pcisig.com/specifications/iov/ats/>). >>>>>> Idea is that the >>>>>> iommu can trigger a regular pagefault inside a process address space >>>>>> on >>>>>> behalf of the hardware. The only iommu supporting that right now is >>>>>> the >>>>>> AMD iommu v2 that you find on recent AMD platform. >>>>>> >>>>> Why need hardware page fault? regular page fault is trigger by cpu >>>>> mmu, correct? >>>>> >>>> Well here i abuse regular page fault term. Idea is that with hardware >>>> page >>>> fault you don't need to pin memory or take reference on page for >>>> hardware to >>>> use it. So that kernel can free as usual page that would otherwise have >>>> been >>>> >>> For the case when GPU need to pin memory, why GPU need grap the >>> memory of normal process instead of allocating for itself? >>> >> Pin memory is today world where gpu allocate its own memory (GB of memory) >> that disappear from kernel control ie kernel can no longer reclaim this >> memory it's lost memory (i had complain about that already from user than >> saw GB of memory vanish and couldn't understand why the GPU was using so >> much). 
>> >> Tomorrow world we want gpu to be able to access memory that the >> application >> allocated through a simple malloc and we want the kernel to be able to >> recycly any page at any time because of memory pressure or because kernel >> decide to do so. >> >> That's just what we want to do. To achieve so we are getting hw that can >> do >> pagefault. No change to kernel core mm code (some improvement might be >> made). >> > > The memory disappear since you have a reference(gup) against it, correct? > Tomorrow world you want the page fault trigger through iommu driver that > call get_user_pages, it also will take a reference(since gup is called), > isn't it? Anyway, assume tomorrow world doesn't take a reference, we don't > need care page which used by GPU is reclaimed? > > Right now the code uses gup because it's convenient, but it drops the reference right after the fault, so the reference is held only for a short period of time. And no, you don't need to care about reclaim, thanks to the mmu notifier: before a page is removed the mmu notifier is called, and since the iommu registers a notifier it gets the invalidate event, invalidates the device TLB, and things go on. If the GPU accesses the page, a new page fault happens and a new page is allocated. All this code is upstream in the Linux kernel, just read it. There is just no device that uses it yet. That being said, we will want improvements so that pages that are hot in the device are not reclaimed. But it can work without such improvements. Cheers, Jerome
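The invalidate-then-refault cycle Jerome describes (mmu notifier fires, driver drops the stale device TLB entry, the next device access faults and maps a fresh page) can be simulated in userspace. This is a toy model, not kernel code; the arrays, frame numbers, and function names are all invented for the sketch:

```c
#include <assert.h>

#define NPAGES 4

/* Toy model: -1 means "no mapping". */
static int primary[NPAGES]    = { -1, -1, -1, -1 }; /* "CPU page table"  */
static int device_tlb[NPAGES] = { -1, -1, -1, -1 }; /* "device TLB"      */
static int next_frame = 100;                        /* fake frame source */

/* Stand-in for the mmu-notifier invalidate callback: the kernel is about
 * to reclaim this page, so the driver must drop the device TLB entry. */
static void invalidate_range(int page)
{
    device_tlb[page] = -1;  /* invalidate the device's cached translation */
    primary[page]    = -1;  /* kernel frees the backing frame */
}

/* Device-side access: TLB hit, or device page fault that repopulates
 * both tables, possibly with a brand-new frame. */
static int device_access(int page)
{
    if (device_tlb[page] < 0) {               /* device page fault */
        if (primary[page] < 0)
            primary[page] = next_frame++;     /* allocate a new page */
        device_tlb[page] = primary[page];     /* refill the device TLB */
    }
    return device_tlb[page];
}
```

After an invalidation, the device transparently ends up on a different frame, which is why no long-term pin is needed.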
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-12 2:57 ` Jerome Glisse @ 2013-04-12 5:44 ` Simon Jeons 2013-04-12 13:32 ` Jerome Glisse 0 siblings, 1 reply; 34+ messages in thread From: Simon Jeons @ 2013-04-12 5:44 UTC (permalink / raw) To: Jerome Glisse Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss [-- Attachment #1: Type: text/plain, Size: 9882 bytes --] Hi Jerome, On 04/12/2013 10:57 AM, Jerome Glisse wrote: > On Thu, Apr 11, 2013 at 9:54 PM, Simon Jeons <simon.jeons@gmail.com > <mailto:simon.jeons@gmail.com>> wrote: > > Hi Jerome, > > On 04/12/2013 02:38 AM, Jerome Glisse wrote: > > On Thu, Apr 11, 2013 at 11:42:05AM +0800, Simon Jeons wrote: > > Hi Jerome, > On 04/11/2013 04:45 AM, Jerome Glisse wrote: > > On Wed, Apr 10, 2013 at 09:41:57AM +0800, Simon Jeons > wrote: > > Hi Jerome, > On 04/09/2013 10:21 PM, Jerome Glisse wrote: > > On Tue, Apr 09, 2013 at 04:28:09PM +0800, > Simon Jeons wrote: > > Hi Jerome, > On 02/10/2013 12:29 AM, Jerome Glisse wrote: > > On Sat, Feb 9, 2013 at 1:05 AM, Michel > Lespinasse <walken@google.com > <mailto:walken@google.com>> wrote: > > On Fri, Feb 8, 2013 at 3:18 AM, > Shachar Raindel > <raindel@mellanox.com > <mailto:raindel@mellanox.com>> wrote: > > Hi, > > We would like to present a > reference implementation for > safely sharing > memory pages from user space > with the hardware, without > pinning. > > We will be happy to hear the > community feedback on our > prototype > implementation, and > suggestions for future > improvements. > > We would also like to discuss > adding features to the core MM > subsystem to > assist hardware access to user > memory without pinning. > > This sounds kinda scary TBH; > however I do understand the need > for such > technology. 
> > I think one issue is that many MM > developers are insufficiently aware > of such developments; having a > technology presentation would probably > help there; but traditionally > LSF/MM sessions are more interactive > between developers who are already > quite familiar with the technology. > I think it would help if you could > send in advance a detailed > presentation of the problem and > the proposed solutions (and then what > they require of the MM layer) so > people can be better prepared. > > And first I'd like to ask, aren't > IOMMUs supposed to already largely > solve this problem ? (probably a > dumb question, but that just tells > you how much you need to explain :) > > For GPU the motivation is three fold. > With the advance of GPU compute > and also with newer graphic program we > see a massive increase in GPU > memory consumption. We easily can > reach buffer that are bigger than > 1gbytes. So the first motivation is to > directly use the memory the > user allocated through malloc in the > GPU this avoid copying 1gbytes of > data with the cpu to the gpu buffer. > The second and mostly important > to GPU compute is the use of GPU > seamlessly with the CPU, in order to > achieve this you want the programmer > to have a single address space on > the CPU and GPU. So that the same > address point to the same object on > GPU as on the CPU. This would also be > a tremendous cleaner design from > driver point of view toward memory > management. > > And last, the most important, with > such big buffer (>1gbytes) the > memory pinning is becoming way to > expensive and also drastically > reduce the freedom of the mm to free > page for other process. Most of > the time a small window (every thing > is relative the window can be > > 100mbytes not so small :)) of the > object will be in use by the > hardware. The hardware pagefault > support would avoid the necessity to > > What's the meaning of hardware pagefault? 
> > It's a PCIE extension (well it's a combination > of extension that allow > that see > http://www.pcisig.com/specifications/iov/ats/). Idea > is that the > iommu can trigger a regular pagefault inside a > process address space on > behalf of the hardware. The only iommu > supporting that right now is the > AMD iommu v2 that you find on recent AMD platform. > > Why need hardware page fault? regular page fault > is trigger by cpu > mmu, correct? > > Well here i abuse regular page fault term. Idea is > that with hardware page > fault you don't need to pin memory or take reference > on page for hardware to > use it. So that kernel can free as usual page that > would otherwise have been > > For the case when GPU need to pin memory, why GPU need > grap the > memory of normal process instead of allocating for itself? > > Pin memory is today world where gpu allocate its own memory > (GB of memory) > that disappear from kernel control ie kernel can no longer > reclaim this > memory it's lost memory (i had complain about that already > from user than > saw GB of memory vanish and couldn't understand why the GPU > was using so > much). > > Tomorrow world we want gpu to be able to access memory that > the application > allocated through a simple malloc and we want the kernel to be > able to > recycly any page at any time because of memory pressure or > because kernel > decide to do so. > > That's just what we want to do. To achieve so we are getting > hw that can do > pagefault. No change to kernel core mm code (some improvement > might be made). > > > The memory disappear since you have a reference(gup) against it, > correct? Tomorrow world you want the page fault trigger through > iommu driver that call get_user_pages, it also will take a > reference(since gup is called), isn't it? Anyway, assume tomorrow > world doesn't take a reference, we don't need care page which used > by GPU is reclaimed? 
> > > Right now code use gup because it's convenient but it drop the > reference right after the fault. So reference is hold only for short > period of time. Are you sure gup drops the reference right after the fault? I dug through the code again and failed to verify it. Could you point it out to me? > > No you don't need to care about reclaim thanks to mmu notifier, ie > before page is remove mmu notifier is call and iommu register a > notifier, so it get the invalidate event and invalidate the device tlb > and things goes on. If gpu access the page a new pagefault happen and > a new page is allocated. Good idea! ;-) > > All this code is upstream in linux kernel just read it. There is just > no device that use it yet. > > That being said we will want improvement so that page that are hot in > the device are not reclaimed. But it can work without such improvement. > > Cheers, > Jerome
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-12 5:44 ` Simon Jeons @ 2013-04-12 13:32 ` Jerome Glisse 0 siblings, 0 replies; 34+ messages in thread From: Jerome Glisse @ 2013-04-12 13:32 UTC (permalink / raw) To: Simon Jeons Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss [-- Attachment #1: Type: text/plain, Size: 6419 bytes --] On Fri, Apr 12, 2013 at 1:44 AM, Simon Jeons <simon.jeons@gmail.com> wrote: > Hi Jerome, > > On 04/12/2013 10:57 AM, Jerome Glisse wrote: > > On Thu, Apr 11, 2013 at 9:54 PM, Simon Jeons <simon.jeons@gmail.com>wrote: > >> Hi Jerome, >> >> On 04/12/2013 02:38 AM, Jerome Glisse wrote: >> >>> On Thu, Apr 11, 2013 at 11:42:05AM +0800, Simon Jeons wrote: >>> >>>> Hi Jerome, >>>> On 04/11/2013 04:45 AM, Jerome Glisse wrote: >>>> >>>>> On Wed, Apr 10, 2013 at 09:41:57AM +0800, Simon Jeons wrote: >>>>> >>>>>> Hi Jerome, >>>>>> On 04/09/2013 10:21 PM, Jerome Glisse wrote: >>>>>> >>>>>>> On Tue, Apr 09, 2013 at 04:28:09PM +0800, Simon Jeons wrote: >>>>>>> >>>>>>>> Hi Jerome, >>>>>>>> On 02/10/2013 12:29 AM, Jerome Glisse wrote: >>>>>>>> >>>>>>>>> On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse < >>>>>>>>> walken@google.com> wrote: >>>>>>>>> >>>>>>>>>> On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel < >>>>>>>>>> raindel@mellanox.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> We would like to present a reference implementation for safely >>>>>>>>>>> sharing >>>>>>>>>>> memory pages from user space with the hardware, without pinning. >>>>>>>>>>> >>>>>>>>>>> We will be happy to hear the community feedback on our prototype >>>>>>>>>>> implementation, and suggestions for future improvements. 
>>>>>>>>>>> >>>>>>>>>>> We would also like to discuss adding features to the core MM >>>>>>>>>>> subsystem to >>>>>>>>>>> assist hardware access to user memory without pinning. >>>>>>>>>>> >>>>>>>>>> This sounds kinda scary TBH; however I do understand the need for >>>>>>>>>> such >>>>>>>>>> technology. >>>>>>>>>> >>>>>>>>>> I think one issue is that many MM developers are insufficiently >>>>>>>>>> aware >>>>>>>>>> of such developments; having a technology presentation would >>>>>>>>>> probably >>>>>>>>>> help there; but traditionally LSF/MM sessions are more interactive >>>>>>>>>> between developers who are already quite familiar with the >>>>>>>>>> technology. >>>>>>>>>> I think it would help if you could send in advance a detailed >>>>>>>>>> presentation of the problem and the proposed solutions (and then >>>>>>>>>> what >>>>>>>>>> they require of the MM layer) so people can be better prepared. >>>>>>>>>> >>>>>>>>>> And first I'd like to ask, aren't IOMMUs supposed to already >>>>>>>>>> largely >>>>>>>>>> solve this problem ? (probably a dumb question, but that just >>>>>>>>>> tells >>>>>>>>>> you how much you need to explain :) >>>>>>>>>> >>>>>>>>> For GPU the motivation is three fold. With the advance of GPU >>>>>>>>> compute >>>>>>>>> and also with newer graphic program we see a massive increase in >>>>>>>>> GPU >>>>>>>>> memory consumption. We easily can reach buffer that are bigger than >>>>>>>>> 1gbytes. So the first motivation is to directly use the memory the >>>>>>>>> user allocated through malloc in the GPU this avoid copying >>>>>>>>> 1gbytes of >>>>>>>>> data with the cpu to the gpu buffer. The second and mostly >>>>>>>>> important >>>>>>>>> to GPU compute is the use of GPU seamlessly with the CPU, in order >>>>>>>>> to >>>>>>>>> achieve this you want the programmer to have a single address >>>>>>>>> space on >>>>>>>>> the CPU and GPU. So that the same address point to the same object >>>>>>>>> on >>>>>>>>> GPU as on the CPU. 
This would also be a tremendous cleaner design >>>>>>>>> from >>>>>>>>> driver point of view toward memory management. >>>>>>>>> >>>>>>>>> And last, the most important, with such big buffer (>1gbytes) the >>>>>>>>> memory pinning is becoming way to expensive and also drastically >>>>>>>>> reduce the freedom of the mm to free page for other process. Most >>>>>>>>> of >>>>>>>>> the time a small window (every thing is relative the window can be >>>>>>>>> > >>>>>>>>> 100mbytes not so small :)) of the object will be in use by the >>>>>>>>> hardware. The hardware pagefault support would avoid the necessity >>>>>>>>> to >>>>>>>>> >>>>>>>> What's the meaning of hardware pagefault? >>>>>>>> >>>>>>> It's a PCIE extension (well it's a combination of extension that >>>>>>> allow >>>>>>> that see http://www.pcisig.com/specifications/iov/ats/). Idea is >>>>>>> that the >>>>>>> iommu can trigger a regular pagefault inside a process address space >>>>>>> on >>>>>>> behalf of the hardware. The only iommu supporting that right now is >>>>>>> the >>>>>>> AMD iommu v2 that you find on recent AMD platform. >>>>>>> >>>>>> Why need hardware page fault? regular page fault is trigger by cpu >>>>>> mmu, correct? >>>>>> >>>>> Well here i abuse regular page fault term. Idea is that with hardware >>>>> page >>>>> fault you don't need to pin memory or take reference on page for >>>>> hardware to >>>>> use it. So that kernel can free as usual page that would otherwise >>>>> have been >>>>> >>>> For the case when GPU need to pin memory, why GPU need grap the >>>> memory of normal process instead of allocating for itself? >>>> >>> Pin memory is today world where gpu allocate its own memory (GB of >>> memory) >>> that disappear from kernel control ie kernel can no longer reclaim this >>> memory it's lost memory (i had complain about that already from user than >>> saw GB of memory vanish and couldn't understand why the GPU was using so >>> much). 
>>> >>> Tomorrow world we want gpu to be able to access memory that the >>> application >>> allocated through a simple malloc and we want the kernel to be able to >>> recycly any page at any time because of memory pressure or because kernel >>> decide to do so. >>> >>> That's just what we want to do. To achieve so we are getting hw that can >>> do >>> pagefault. No change to kernel core mm code (some improvement might be >>> made). >>> >> >> The memory disappear since you have a reference(gup) against it, >> correct? Tomorrow world you want the page fault trigger through iommu >> driver that call get_user_pages, it also will take a reference(since gup is >> called), isn't it? Anyway, assume tomorrow world doesn't take a reference, >> we don't need care page which used by GPU is reclaimed? >> >> > Right now code use gup because it's convenient but it drop the reference > right after the fault. So reference is hold only for short period of time. > > > Are you sure gup will drop the reference right after the fault? I redig > the codes and fail verify it. Could you point out to me? > > In amd_iommu_v2.c:do_fault, get_user_pages is followed by put_page. Cheers, Jerome
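The pattern Jerome points at, gup taking a page reference that the fault handler drops immediately, can be illustrated with a toy refcount in userspace. The names mirror the kernel API (`get_user_pages`, `put_page`, `do_fault`) but everything here is a simplified stand-in, not the actual amd_iommu_v2 code:

```c
#include <assert.h>

/* Toy page with an explicit reference count. */
struct page { int refcount; };

static void get_page(struct page *p) { p->refcount++; }
static void put_page(struct page *p) { p->refcount--; }

/* Stand-in for get_user_pages: fault the page in and take a reference
 * so it cannot be reclaimed while we look at it. */
static struct page *fake_get_user_pages(struct page *p)
{
    get_page(p);
    return p;
}

/* do_fault-style handler: resolve the faulting address, program the
 * device translation, then drop the reference straight away. */
static void do_fault(struct page *p)
{
    struct page *resolved = fake_get_user_pages(p);
    /* ... the real handler would write the translation to the IOMMU ... */
    put_page(resolved);  /* reference held only across the fault */
}
```

Because the refcount returns to its starting value after every fault, the device never pins the page long-term; reclaim safety then relies on the mmu-notifier invalidation discussed earlier in the thread.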
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-02-09 16:29 ` Jerome Glisse 2013-04-09 8:28 ` Simon Jeons @ 2013-04-10 1:57 ` Simon Jeons 2013-04-10 20:55 ` Jerome Glisse 2013-04-15 8:39 ` Simon Jeons 2 siblings, 1 reply; 34+ messages in thread From: Simon Jeons @ 2013-04-10 1:57 UTC (permalink / raw) To: Jerome Glisse Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss Hi Jerome, On 02/10/2013 12:29 AM, Jerome Glisse wrote: > On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> wrote: >> On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: >>> Hi, >>> >>> We would like to present a reference implementation for safely sharing >>> memory pages from user space with the hardware, without pinning. >>> >>> We will be happy to hear the community feedback on our prototype >>> implementation, and suggestions for future improvements. >>> >>> We would also like to discuss adding features to the core MM subsystem to >>> assist hardware access to user memory without pinning. >> This sounds kinda scary TBH; however I do understand the need for such >> technology. >> >> I think one issue is that many MM developers are insufficiently aware >> of such developments; having a technology presentation would probably >> help there; but traditionally LSF/MM sessions are more interactive >> between developers who are already quite familiar with the technology. >> I think it would help if you could send in advance a detailed >> presentation of the problem and the proposed solutions (and then what >> they require of the MM layer) so people can be better prepared. >> >> And first I'd like to ask, aren't IOMMUs supposed to already largely >> solve this problem ? 
(probably a dumb question, but that just tells >> you how much you need to explain :) > For GPU the motivation is three fold. With the advance of GPU compute > and also with newer graphic program we see a massive increase in GPU > memory consumption. We easily can reach buffer that are bigger than > 1gbytes. So the first motivation is to directly use the memory the > user allocated through malloc in the GPU this avoid copying 1gbytes of > data with the cpu to the gpu buffer. The second and mostly important > to GPU compute is the use of GPU seamlessly with the CPU, in order to > achieve this you want the programmer to have a single address space on > the CPU and GPU. So that the same address point to the same object on > GPU as on the CPU. This would also be a tremendous cleaner design from > driver point of view toward memory management. When will the GPU consume memory? A userspace process like mplayer will have video data, and the GPU will play this data and use mplayer's memory, since the video data is loaded in the mplayer process's address space? So the GPU code will call gup to take a reference on the memory? Please correct me if my understanding is wrong. ;-) > And last, the most important, with such big buffer (>1gbytes) the > memory pinning is becoming way to expensive and also drastically > reduce the freedom of the mm to free page for other process. Most of > the time a small window (every thing is relative the window can be > > 100mbytes not so small :)) of the object will be in use by the > hardware. The hardware pagefault support would avoid the necessity to > pin memory and thus offer greater flexibility. At the same time the > driver wants to avoid page fault as much as possible this is why i > would like to be able to give hint to the mm about range of address it > should avoid freeing page (swapping them out). > > The iommu was designed with other goals, which were first isolate > device from one another and restrict device access to allowed memory.
> Second allow to remap address that are above device address space > limit. Lot of device can only address 24bit or 32bit of memory and > with computer with several gbytes of memory suddenly lot of the page > become unreachable to the hardware. The iommu allow to work around > this by remapping those high page into address that the hardware can > reach. > > The hardware page fault support is a new feature of iommu designed to > help the os and driver to reduce memory pinning and also share address > space. Thought i am sure there are other motivations that i am not > even aware off or would think off. > > Btw i won't be at LSF/MM so a free good beer (or other beverage) on me > to whoever takes note on this subject in next conf we run into each > others. > > Cheers, > Jerome > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-10 1:57 ` Simon Jeons @ 2013-04-10 20:55 ` Jerome Glisse 2013-04-11 3:37 ` Simon Jeons 0 siblings, 1 reply; 34+ messages in thread From: Jerome Glisse @ 2013-04-10 20:55 UTC (permalink / raw) To: Simon Jeons Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss On Wed, Apr 10, 2013 at 09:57:02AM +0800, Simon Jeons wrote: > Hi Jerome, > On 02/10/2013 12:29 AM, Jerome Glisse wrote: > >On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> wrote: > >>On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: > >>>Hi, > >>> > >>>We would like to present a reference implementation for safely sharing > >>>memory pages from user space with the hardware, without pinning. > >>> > >>>We will be happy to hear the community feedback on our prototype > >>>implementation, and suggestions for future improvements. > >>> > >>>We would also like to discuss adding features to the core MM subsystem to > >>>assist hardware access to user memory without pinning. > >>This sounds kinda scary TBH; however I do understand the need for such > >>technology. > >> > >>I think one issue is that many MM developers are insufficiently aware > >>of such developments; having a technology presentation would probably > >>help there; but traditionally LSF/MM sessions are more interactive > >>between developers who are already quite familiar with the technology. > >>I think it would help if you could send in advance a detailed > >>presentation of the problem and the proposed solutions (and then what > >>they require of the MM layer) so people can be better prepared. > >> > >>And first I'd like to ask, aren't IOMMUs supposed to already largely > >>solve this problem ? 
(probably a dumb question, but that just tells > >>you how much you need to explain :) > >For GPU the motivation is three fold. With the advance of GPU compute > >and also with newer graphic program we see a massive increase in GPU > >memory consumption. We easily can reach buffer that are bigger than > >1gbytes. So the first motivation is to directly use the memory the > >user allocated through malloc in the GPU this avoid copying 1gbytes of > >data with the cpu to the gpu buffer. The second and mostly important > >to GPU compute is the use of GPU seamlessly with the CPU, in order to > >achieve this you want the programmer to have a single address space on > >the CPU and GPU. So that the same address point to the same object on > >GPU as on the CPU. This would also be a tremendous cleaner design from > >driver point of view toward memory management. > > When GPU will comsume memory? > > The userspace process like mplayer will have video datas and GPU > will play this datas and use memory of mplayer since these video > datas load in mplayer process's address space? So GPU codes will > call gup to take a reference of memory? Please correct me if my > understanding is wrong. ;-) The first target is not things such as video decompression, though those could also benefit from it given an updated driver kernel API. When using iommu hardware page faults we don't call get_user_pages (gup), thus we don't take a reference on the page. That's the whole point of the hardware pagefault: not taking a reference on the page. Cheers, Jerome Glisse
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-10 20:55 ` Jerome Glisse @ 2013-04-11 3:37 ` Simon Jeons 2013-04-11 18:48 ` Jerome Glisse 0 siblings, 1 reply; 34+ messages in thread From: Simon Jeons @ 2013-04-11 3:37 UTC (permalink / raw) To: Jerome Glisse Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss Hi Jerome, On 04/11/2013 04:55 AM, Jerome Glisse wrote: > On Wed, Apr 10, 2013 at 09:57:02AM +0800, Simon Jeons wrote: >> Hi Jerome, >> On 02/10/2013 12:29 AM, Jerome Glisse wrote: >>> On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> wrote: >>>> On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: >>>>> Hi, >>>>> >>>>> We would like to present a reference implementation for safely sharing >>>>> memory pages from user space with the hardware, without pinning. >>>>> >>>>> We will be happy to hear the community feedback on our prototype >>>>> implementation, and suggestions for future improvements. >>>>> >>>>> We would also like to discuss adding features to the core MM subsystem to >>>>> assist hardware access to user memory without pinning. >>>> This sounds kinda scary TBH; however I do understand the need for such >>>> technology. >>>> >>>> I think one issue is that many MM developers are insufficiently aware >>>> of such developments; having a technology presentation would probably >>>> help there; but traditionally LSF/MM sessions are more interactive >>>> between developers who are already quite familiar with the technology. >>>> I think it would help if you could send in advance a detailed >>>> presentation of the problem and the proposed solutions (and then what >>>> they require of the MM layer) so people can be better prepared. >>>> >>>> And first I'd like to ask, aren't IOMMUs supposed to already largely >>>> solve this problem ? 
(probably a dumb question, but that just tells >>>> you how much you need to explain :) >>> For GPU the motivation is three fold. With the advance of GPU compute >>> and also with newer graphic program we see a massive increase in GPU >>> memory consumption. We easily can reach buffer that are bigger than >>> 1gbytes. So the first motivation is to directly use the memory the >>> user allocated through malloc in the GPU this avoid copying 1gbytes of >>> data with the cpu to the gpu buffer. The second and mostly important >>> to GPU compute is the use of GPU seamlessly with the CPU, in order to >>> achieve this you want the programmer to have a single address space on >>> the CPU and GPU. So that the same address point to the same object on >>> GPU as on the CPU. This would also be a tremendous cleaner design from >>> driver point of view toward memory management. >> When GPU will comsume memory? >> >> The userspace process like mplayer will have video datas and GPU >> will play this datas and use memory of mplayer since these video >> datas load in mplayer process's address space? So GPU codes will >> call gup to take a reference of memory? Please correct me if my >> understanding is wrong. ;-) > First target is not thing such as video decompression, however they could > too benefit from it given updated driver kernel API. In case of using > iommu hardware page fault we don't call get_user_pages (gup) those we > don't take a reference on the page. That's the whole point of the hardware > pagefault, not taking reference on the page. mplayer process is running on normal CPU or GPU? chipset_integrated graphics will use normal memory and discrete graphics will use its own memory, correct? So the memory used by discrete graphics won't need gup, correct? > > Cheers, > Jerome Glisse -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-11 3:37 ` Simon Jeons @ 2013-04-11 18:48 ` Jerome Glisse 2013-04-12 3:13 ` Simon Jeons 0 siblings, 1 reply; 34+ messages in thread From: Jerome Glisse @ 2013-04-11 18:48 UTC (permalink / raw) To: Simon Jeons Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss On Thu, Apr 11, 2013 at 11:37:35AM +0800, Simon Jeons wrote: > Hi Jerome, > On 04/11/2013 04:55 AM, Jerome Glisse wrote: > >On Wed, Apr 10, 2013 at 09:57:02AM +0800, Simon Jeons wrote: > >>Hi Jerome, > >>On 02/10/2013 12:29 AM, Jerome Glisse wrote: > >>>On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> wrote: > >>>>On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: > >>>>>Hi, > >>>>> > >>>>>We would like to present a reference implementation for safely sharing > >>>>>memory pages from user space with the hardware, without pinning. > >>>>> > >>>>>We will be happy to hear the community feedback on our prototype > >>>>>implementation, and suggestions for future improvements. > >>>>> > >>>>>We would also like to discuss adding features to the core MM subsystem to > >>>>>assist hardware access to user memory without pinning. > >>>>This sounds kinda scary TBH; however I do understand the need for such > >>>>technology. > >>>> > >>>>I think one issue is that many MM developers are insufficiently aware > >>>>of such developments; having a technology presentation would probably > >>>>help there; but traditionally LSF/MM sessions are more interactive > >>>>between developers who are already quite familiar with the technology. > >>>>I think it would help if you could send in advance a detailed > >>>>presentation of the problem and the proposed solutions (and then what > >>>>they require of the MM layer) so people can be better prepared. 
> >>>> > >>>>And first I'd like to ask, aren't IOMMUs supposed to already largely > >>>>solve this problem ? (probably a dumb question, but that just tells > >>>>you how much you need to explain :) > >>>For GPU the motivation is three fold. With the advance of GPU compute > >>>and also with newer graphic program we see a massive increase in GPU > >>>memory consumption. We easily can reach buffer that are bigger than > >>>1gbytes. So the first motivation is to directly use the memory the > >>>user allocated through malloc in the GPU this avoid copying 1gbytes of > >>>data with the cpu to the gpu buffer. The second and mostly important > >>>to GPU compute is the use of GPU seamlessly with the CPU, in order to > >>>achieve this you want the programmer to have a single address space on > >>>the CPU and GPU. So that the same address point to the same object on > >>>GPU as on the CPU. This would also be a tremendous cleaner design from > >>>driver point of view toward memory management. > >>When GPU will comsume memory? > >> > >>The userspace process like mplayer will have video datas and GPU > >>will play this datas and use memory of mplayer since these video > >>datas load in mplayer process's address space? So GPU codes will > >>call gup to take a reference of memory? Please correct me if my > >>understanding is wrong. ;-) > >First target is not thing such as video decompression, however they could > >too benefit from it given updated driver kernel API. In case of using > >iommu hardware page fault we don't call get_user_pages (gup) those we > >don't take a reference on the page. That's the whole point of the hardware > >pagefault, not taking reference on the page. > > mplayer process is running on normal CPU or GPU? > chipset_integrated graphics will use normal memory and discrete > graphics will use its own memory, correct? So the memory used by > discrete graphics won't need gup, correct? mplayer can decode video in software an only use the cpu. 
It can also use one of the acceleration APIs such as VDPAU. In any case mplayer is still opening the video file, allocating some memory with malloc, reading from the file into this memory, eventually doing some preprocessing on that memory, and then doing a memcpy from this memory to memory allocated by the gpu driver. Now imagine a world where you don't have to memcpy so that the gpu can access it. Even if it's doable today it's really not something you want to do, ie gup on a page and not releasing the page for minutes. There are two kinds of integrated GPU; on x86 an integrated GPU should be considered as a discrete GPU because the BIOS steals a chunk of system ram and transforms it into fake vram. This stolen chunk is never ever under the control of the linux kernel (from the mm pov the gpu kernel driver is in charge of it). In any case both discrete GPUs and integrated GPUs have their own page table or memory controller, and they map system memory or video memory into it, sometimes interleaving (at address 0x100000 64k is in vram but at address 0x100000+64k it's system memory pointing to some pages). So right now any time we map a normal system ram page we take a reference on it so it does not go away. We decided not to use gup because it would break several kernel assumptions about anonymous memory in the GPU case. But we could use gup for short-lived memory transactions like a memcpy from system ram to vram (no matter if it's fake vram or real vram). Cheers, Jerome
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-11 18:48 ` Jerome Glisse @ 2013-04-12 3:13 ` Simon Jeons 2013-04-12 3:21 ` Jerome Glisse 0 siblings, 1 reply; 34+ messages in thread From: Simon Jeons @ 2013-04-12 3:13 UTC (permalink / raw) To: Jerome Glisse Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss Hi Jerome, On 04/12/2013 02:48 AM, Jerome Glisse wrote: > On Thu, Apr 11, 2013 at 11:37:35AM +0800, Simon Jeons wrote: >> Hi Jerome, >> On 04/11/2013 04:55 AM, Jerome Glisse wrote: >>> On Wed, Apr 10, 2013 at 09:57:02AM +0800, Simon Jeons wrote: >>>> Hi Jerome, >>>> On 02/10/2013 12:29 AM, Jerome Glisse wrote: >>>>> On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> wrote: >>>>>> On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: >>>>>>> Hi, >>>>>>> >>>>>>> We would like to present a reference implementation for safely sharing >>>>>>> memory pages from user space with the hardware, without pinning. >>>>>>> >>>>>>> We will be happy to hear the community feedback on our prototype >>>>>>> implementation, and suggestions for future improvements. >>>>>>> >>>>>>> We would also like to discuss adding features to the core MM subsystem to >>>>>>> assist hardware access to user memory without pinning. >>>>>> This sounds kinda scary TBH; however I do understand the need for such >>>>>> technology. >>>>>> >>>>>> I think one issue is that many MM developers are insufficiently aware >>>>>> of such developments; having a technology presentation would probably >>>>>> help there; but traditionally LSF/MM sessions are more interactive >>>>>> between developers who are already quite familiar with the technology. 
>>>>>> I think it would help if you could send in advance a detailed >>>>>> presentation of the problem and the proposed solutions (and then what >>>>>> they require of the MM layer) so people can be better prepared. >>>>>> >>>>>> And first I'd like to ask, aren't IOMMUs supposed to already largely >>>>>> solve this problem ? (probably a dumb question, but that just tells >>>>>> you how much you need to explain :) >>>>> For GPU the motivation is three fold. With the advance of GPU compute >>>>> and also with newer graphic program we see a massive increase in GPU >>>>> memory consumption. We easily can reach buffer that are bigger than >>>>> 1gbytes. So the first motivation is to directly use the memory the >>>>> user allocated through malloc in the GPU this avoid copying 1gbytes of >>>>> data with the cpu to the gpu buffer. The second and mostly important >>>>> to GPU compute is the use of GPU seamlessly with the CPU, in order to >>>>> achieve this you want the programmer to have a single address space on >>>>> the CPU and GPU. So that the same address point to the same object on >>>>> GPU as on the CPU. This would also be a tremendous cleaner design from >>>>> driver point of view toward memory management. >>>> When GPU will comsume memory? >>>> >>>> The userspace process like mplayer will have video datas and GPU >>>> will play this datas and use memory of mplayer since these video >>>> datas load in mplayer process's address space? So GPU codes will >>>> call gup to take a reference of memory? Please correct me if my >>>> understanding is wrong. ;-) >>> First target is not thing such as video decompression, however they could >>> too benefit from it given updated driver kernel API. In case of using >>> iommu hardware page fault we don't call get_user_pages (gup) those we >>> don't take a reference on the page. That's the whole point of the hardware >>> pagefault, not taking reference on the page. >> mplayer process is running on normal CPU or GPU? 
>> chipset_integrated graphics will use normal memory and discrete >> graphics will use its own memory, correct? So the memory used by >> discrete graphics won't need gup, correct? > mplayer can decode video in software an only use the cpu. It can also use > one of the accleration API such as VDPAU. In any case mplayer is still opening > the video file allocating some memory with malloc, reading from file into > this memory eventually do some preprocessing on that memory and then > memcpy from this memory to memory allocated by the gpu driver. > > No imagine a world where you don't have to memcpy so that the gpu can access > it. Even if it's doable today it's really not something you want todo, ie > gup on page and not releasing page for minutes. > > There is two kind of integrated GPU, on x86 integrated GPU should be considered > as discrete GPU because BIOS steal a chunk of system ram and transform it in > fake vram. This stolen chunk is never ever under the control of the linux kernel > (from mm pov the gpu kernel driver is in charge of it). When I configure the integrated GPU in the BIOS during system boot, it seems that we can preallocate memory for the integrated GPU; is this the memory you mentioned? > > In any case both discrete GPU and integrated GPU have their own page table or A discrete GPU will not use normal memory even if its own memory is exhausted, correct? > memory controller and they map system memory in it or video memory, sometime > interleaving (at address 0x100000 64k is in vram but at address 0x10000+64k it's > system memory pointing to some pages). > > So right now any time we map a normal system ram page we take a reference on it > so it does not goes away. We decided to not use gup because it will break several > kernel assumption on anonymous memory in case of GPU. But we could use gup for > short lived memory transaction like memcpy from system ram to vram (no matter if > it's fake vram or real vram).
> > Cheers, > Jerome
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-12 3:13 ` Simon Jeons @ 2013-04-12 3:21 ` Jerome Glisse 0 siblings, 0 replies; 34+ messages in thread From: Jerome Glisse @ 2013-04-12 3:21 UTC (permalink / raw) To: Simon Jeons Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss [-- Attachment #1: Type: text/plain, Size: 5427 bytes --] On Thu, Apr 11, 2013 at 11:13 PM, Simon Jeons <simon.jeons@gmail.com> wrote: > Hi Jerome, > > On 04/12/2013 02:48 AM, Jerome Glisse wrote: > >> On Thu, Apr 11, 2013 at 11:37:35AM +0800, Simon Jeons wrote: >> >>> Hi Jerome, >>> On 04/11/2013 04:55 AM, Jerome Glisse wrote: >>> >>>> On Wed, Apr 10, 2013 at 09:57:02AM +0800, Simon Jeons wrote: >>>> >>>>> Hi Jerome, >>>>> On 02/10/2013 12:29 AM, Jerome Glisse wrote: >>>>> >>>>>> On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> >>>>>> wrote: >>>>>> >>>>>>> On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel < >>>>>>> raindel@mellanox.com> wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> We would like to present a reference implementation for safely >>>>>>>> sharing >>>>>>>> memory pages from user space with the hardware, without pinning. >>>>>>>> >>>>>>>> We will be happy to hear the community feedback on our prototype >>>>>>>> implementation, and suggestions for future improvements. >>>>>>>> >>>>>>>> We would also like to discuss adding features to the core MM >>>>>>>> subsystem to >>>>>>>> assist hardware access to user memory without pinning. >>>>>>>> >>>>>>> This sounds kinda scary TBH; however I do understand the need for >>>>>>> such >>>>>>> technology. 
>>>>>>> >>>>>>> I think one issue is that many MM developers are insufficiently aware >>>>>>> of such developments; having a technology presentation would probably >>>>>>> help there; but traditionally LSF/MM sessions are more interactive >>>>>>> between developers who are already quite familiar with the >>>>>>> technology. >>>>>>> I think it would help if you could send in advance a detailed >>>>>>> presentation of the problem and the proposed solutions (and then what >>>>>>> they require of the MM layer) so people can be better prepared. >>>>>>> >>>>>>> And first I'd like to ask, aren't IOMMUs supposed to already largely >>>>>>> solve this problem ? (probably a dumb question, but that just tells >>>>>>> you how much you need to explain :) >>>>>>> >>>>>> For GPU the motivation is three fold. With the advance of GPU compute >>>>>> and also with newer graphic program we see a massive increase in GPU >>>>>> memory consumption. We easily can reach buffer that are bigger than >>>>>> 1gbytes. So the first motivation is to directly use the memory the >>>>>> user allocated through malloc in the GPU this avoid copying 1gbytes of >>>>>> data with the cpu to the gpu buffer. The second and mostly important >>>>>> to GPU compute is the use of GPU seamlessly with the CPU, in order to >>>>>> achieve this you want the programmer to have a single address space on >>>>>> the CPU and GPU. So that the same address point to the same object on >>>>>> GPU as on the CPU. This would also be a tremendous cleaner design from >>>>>> driver point of view toward memory management. >>>>>> >>>>> When GPU will comsume memory? >>>>> >>>>> The userspace process like mplayer will have video datas and GPU >>>>> will play this datas and use memory of mplayer since these video >>>>> datas load in mplayer process's address space? So GPU codes will >>>>> call gup to take a reference of memory? Please correct me if my >>>>> understanding is wrong. 
;-) >>>>> >>>> First target is not thing such as video decompression, however they >>>> could >>>> too benefit from it given updated driver kernel API. In case of using >>>> iommu hardware page fault we don't call get_user_pages (gup) those we >>>> don't take a reference on the page. That's the whole point of the >>>> hardware >>>> pagefault, not taking reference on the page. >>>> >>> mplayer process is running on normal CPU or GPU? >>> chipset_integrated graphics will use normal memory and discrete >>> graphics will use its own memory, correct? So the memory used by >>> discrete graphics won't need gup, correct? >>> >> mplayer can decode video in software an only use the cpu. It can also use >> one of the accleration API such as VDPAU. In any case mplayer is still >> opening >> the video file allocating some memory with malloc, reading from file into >> this memory eventually do some preprocessing on that memory and then >> memcpy from this memory to memory allocated by the gpu driver. >> >> No imagine a world where you don't have to memcpy so that the gpu can >> access >> it. Even if it's doable today it's really not something you want todo, ie >> gup on page and not releasing page for minutes. >> >> There is two kind of integrated GPU, on x86 integrated GPU should be >> considered >> as discrete GPU because BIOS steal a chunk of system ram and transform it >> in >> fake vram. This stolen chunk is never ever under the control of the linux >> kernel >> (from mm pov the gpu kernel driver is in charge of it). >> > > I configure integrated GPU in BIOS during system boot, it's seems that we > can preallocate memory for integrated GPU, is this the memory you mentioned > ? Most likely it's > In any case both discrete GPU and integrated GPU have their own page table >> or >> > > Discrete GPU will not use normal memory even if their own memory is > exhaused, correct? 
They will consume normal memory. Right now you can see that under heavy load a huge chunk of your system memory disappears; it's the gpu driver that is using it. It gets mapped into the gpu address space, and from the gpu unit's pov it's just like any other memory (ie vram or system ram look the same to the gpu acceleration core, system ram is just slower). Cheers Jerome
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-02-09 16:29 ` Jerome Glisse 2013-04-09 8:28 ` Simon Jeons 2013-04-10 1:57 ` Simon Jeons @ 2013-04-15 8:39 ` Simon Jeons 2013-04-15 15:38 ` Jerome Glisse 2 siblings, 1 reply; 34+ messages in thread From: Simon Jeons @ 2013-04-15 8:39 UTC (permalink / raw) To: Jerome Glisse Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss Hi Jerome, On 02/10/2013 12:29 AM, Jerome Glisse wrote: > On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> wrote: >> On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote: >>> Hi, >>> >>> We would like to present a reference implementation for safely sharing >>> memory pages from user space with the hardware, without pinning. >>> >>> We will be happy to hear the community feedback on our prototype >>> implementation, and suggestions for future improvements. >>> >>> We would also like to discuss adding features to the core MM subsystem to >>> assist hardware access to user memory without pinning. >> This sounds kinda scary TBH; however I do understand the need for such >> technology. >> >> I think one issue is that many MM developers are insufficiently aware >> of such developments; having a technology presentation would probably >> help there; but traditionally LSF/MM sessions are more interactive >> between developers who are already quite familiar with the technology. >> I think it would help if you could send in advance a detailed >> presentation of the problem and the proposed solutions (and then what >> they require of the MM layer) so people can be better prepared. >> >> And first I'd like to ask, aren't IOMMUs supposed to already largely >> solve this problem ? 
(probably a dumb question, but that just tells >> you how much you need to explain :) > For GPU the motivation is three fold. With the advance of GPU compute > and also with newer graphic program we see a massive increase in GPU > memory consumption. We easily can reach buffer that are bigger than > 1gbytes. So the first motivation is to directly use the memory the > user allocated through malloc in the GPU this avoid copying 1gbytes of > data with the cpu to the gpu buffer. The second and mostly important The pinned memory you mentioned is the memory user allocated or the memory of gpu buffer? > to GPU compute is the use of GPU seamlessly with the CPU, in order to > achieve this you want the programmer to have a single address space on > the CPU and GPU. So that the same address point to the same object on > GPU as on the CPU. This would also be a tremendous cleaner design from > driver point of view toward memory management. > > And last, the most important, with such big buffer (>1gbytes) the > memory pinning is becoming way to expensive and also drastically > reduce the freedom of the mm to free page for other process. Most of > the time a small window (every thing is relative the window can be > > 100mbytes not so small :)) of the object will be in use by the > hardware. The hardware pagefault support would avoid the necessity to > pin memory and thus offer greater flexibility. At the same time the > driver wants to avoid page fault as much as possible this is why i > would like to be able to give hint to the mm about range of address it > should avoid freeing page (swapping them out). > > The iommu was designed with other goals, which were first isolate > device from one another and restrict device access to allowed memory. > Second allow to remap address that are above device address space > limit. 
Lot of device can only address 24bit or 32bit of memory and > with computer with several gbytes of memory suddenly lot of the page > become unreachable to the hardware. The iommu allow to work around > this by remapping those high page into address that the hardware can > reach. > > The hardware page fault support is a new feature of iommu designed to > help the os and driver to reduce memory pinning and also share address > space. Thought i am sure there are other motivations that i am not > even aware off or would think off. > > Btw i won't be at LSF/MM so a free good beer (or other beverage) on me > to whoever takes note on this subject in next conf we run into each > others. > > Cheers, > Jerome > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-15 8:39 ` Simon Jeons @ 2013-04-15 15:38 ` Jerome Glisse 2013-04-16 4:20 ` Simon Jeons 0 siblings, 1 reply; 34+ messages in thread From: Jerome Glisse @ 2013-04-15 15:38 UTC (permalink / raw) To: Simon Jeons Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss [-- Attachment #1: Type: text/plain, Size: 2163 bytes --] On Mon, Apr 15, 2013 at 4:39 AM, Simon Jeons <simon.jeons@gmail.com> wrote: > Hi Jerome, > On 02/10/2013 12:29 AM, Jerome Glisse wrote: > >> On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> >> wrote: >> >>> On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> >>> wrote: >>> >>>> Hi, >>>> >>>> We would like to present a reference implementation for safely sharing >>>> memory pages from user space with the hardware, without pinning. >>>> >>>> We will be happy to hear the community feedback on our prototype >>>> implementation, and suggestions for future improvements. >>>> >>>> We would also like to discuss adding features to the core MM subsystem >>>> to >>>> assist hardware access to user memory without pinning. >>>> >>> This sounds kinda scary TBH; however I do understand the need for such >>> technology. >>> >>> I think one issue is that many MM developers are insufficiently aware >>> of such developments; having a technology presentation would probably >>> help there; but traditionally LSF/MM sessions are more interactive >>> between developers who are already quite familiar with the technology. >>> I think it would help if you could send in advance a detailed >>> presentation of the problem and the proposed solutions (and then what >>> they require of the MM layer) so people can be better prepared. 
>>> >>> And first I'd like to ask, aren't IOMMUs supposed to already largely >>> solve this problem ? (probably a dumb question, but that just tells >>> you how much you need to explain :) >>> >> For GPU the motivation is three fold. With the advance of GPU compute >> and also with newer graphic program we see a massive increase in GPU >> memory consumption. We easily can reach buffer that are bigger than >> 1gbytes. So the first motivation is to directly use the memory the >> user allocated through malloc in the GPU this avoid copying 1gbytes of >> data with the cpu to the gpu buffer. The second and mostly important >> > > The pinned memory you mentioned is the memory user allocated or the memory > of gpu buffer? > Memory user allocated, we don't want to pin this memory. Cheers, Jerome [-- Attachment #2: Type: text/html, Size: 2975 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-15 15:38 ` Jerome Glisse @ 2013-04-16 4:20 ` Simon Jeons 2013-04-16 16:19 ` Jerome Glisse 0 siblings, 1 reply; 34+ messages in thread From: Simon Jeons @ 2013-04-16 4:20 UTC (permalink / raw) To: Jerome Glisse Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss [-- Attachment #1: Type: text/plain, Size: 2993 bytes --] Hi Jerome, On 04/15/2013 11:38 PM, Jerome Glisse wrote: > On Mon, Apr 15, 2013 at 4:39 AM, Simon Jeons <simon.jeons@gmail.com > <mailto:simon.jeons@gmail.com>> wrote: > > Hi Jerome, > On 02/10/2013 12:29 AM, Jerome Glisse wrote: > > On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse > <walken@google.com <mailto:walken@google.com>> wrote: > > On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel > <raindel@mellanox.com <mailto:raindel@mellanox.com>> wrote: > > Hi, > > We would like to present a reference implementation > for safely sharing > memory pages from user space with the hardware, > without pinning. > > We will be happy to hear the community feedback on our > prototype > implementation, and suggestions for future improvements. > > We would also like to discuss adding features to the > core MM subsystem to > assist hardware access to user memory without pinning. > > This sounds kinda scary TBH; however I do understand the > need for such > technology. > > I think one issue is that many MM developers are > insufficiently aware > of such developments; having a technology presentation > would probably > help there; but traditionally LSF/MM sessions are more > interactive > between developers who are already quite familiar with the > technology. 
> I think it would help if you could send in advance a detailed > presentation of the problem and the proposed solutions > (and then what > they require of the MM layer) so people can be better > prepared. > > And first I'd like to ask, aren't IOMMUs supposed to > already largely > solve this problem ? (probably a dumb question, but that > just tells > you how much you need to explain :) > > For GPU the motivation is threefold. With the advance of GPU > compute > and also with newer graphics programs we see a massive increase > in GPU > memory consumption. We can easily reach buffers that are bigger > than > 1 GB. So the first motivation is to directly use the memory the > user allocated through malloc on the GPU; this avoids copying > 1 GB of > data from the CPU to the GPU buffer. The second and most > important > > > The pinned memory you mentioned is the memory the user allocated or > the memory of the GPU buffer? > > > The memory the user allocated; we don't want to pin this memory. After this idea is merged, we won't need to allocate memory for the integrated GPU buffer, and a discrete GPU won't need to have its own memory, correct? > > Cheers, > Jerome [-- Attachment #2: Type: text/html, Size: 5155 bytes --]
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-16 4:20 ` Simon Jeons @ 2013-04-16 16:19 ` Jerome Glisse 0 siblings, 0 replies; 34+ messages in thread From: Jerome Glisse @ 2013-04-16 16:19 UTC (permalink / raw) To: Simon Jeons Cc: Michel Lespinasse, Shachar Raindel, lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss [-- Attachment #1: Type: text/plain, Size: 3267 bytes --] On Tue, Apr 16, 2013 at 12:20 AM, Simon Jeons <simon.jeons@gmail.com> wrote: > Hi Jerome, > > On 04/15/2013 11:38 PM, Jerome Glisse wrote: > > On Mon, Apr 15, 2013 at 4:39 AM, Simon Jeons <simon.jeons@gmail.com>wrote: > >> Hi Jerome, >> On 02/10/2013 12:29 AM, Jerome Glisse wrote: >> >>> On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> >>> wrote: >>> >>>> On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> >>>> wrote: >>>> >>>>> Hi, >>>>> >>>>> We would like to present a reference implementation for safely sharing >>>>> memory pages from user space with the hardware, without pinning. >>>>> >>>>> We will be happy to hear the community feedback on our prototype >>>>> implementation, and suggestions for future improvements. >>>>> >>>>> We would also like to discuss adding features to the core MM subsystem >>>>> to >>>>> assist hardware access to user memory without pinning. >>>>> >>>> This sounds kinda scary TBH; however I do understand the need for such >>>> technology. >>>> >>>> I think one issue is that many MM developers are insufficiently aware >>>> of such developments; having a technology presentation would probably >>>> help there; but traditionally LSF/MM sessions are more interactive >>>> between developers who are already quite familiar with the technology. 
>>>> I think it would help if you could send in advance a detailed >>>> presentation of the problem and the proposed solutions (and then what >>>> they require of the MM layer) so people can be better prepared. >>>> >>>> And first I'd like to ask, aren't IOMMUs supposed to already largely >>>> solve this problem ? (probably a dumb question, but that just tells >>>> you how much you need to explain :) >>>> >>> For GPU the motivation is threefold. With the advance of GPU compute >>> and also with newer graphics programs we see a massive increase in GPU >>> memory consumption. We can easily reach buffers that are bigger than >>> 1 GB. So the first motivation is to directly use the memory the >>> user allocated through malloc on the GPU; this avoids copying 1 GB of >>> data from the CPU to the GPU buffer. The second and most important >>> >> >> The pinned memory you mentioned is the memory the user allocated or the >> memory of the GPU buffer? >> > > The memory the user allocated; we don't want to pin this memory. > > > After this idea is merged, we won't need to allocate memory for the integrated > GPU buffer, and a discrete GPU won't need to have its own memory, correct? > You need to stop considering discrete and integrated GPUs as different; they are not, from the driver's point of view. An integrated GPU will keep stealing a chunk of system memory at boot, because that's a BIOS thing and BIOSes don't change like that. Both (integrated and discrete) will keep allocating system memory in the kernel for their own buffers, because APIs such as OpenGL or OpenCL need to. The transparent use of the same address space on the GPU as on the CPU will only happen with newer APIs such as OpenCL 2.0, or other APIs that are scheduled down the road. And a discrete GPU will keep having its own memory; the whole point is that GDDR5 can be more than 10 times faster than the fastest DDR3. You cannot beat that. GPUs are all about bandwidth, and that's not going to change.
Cheers, Jerome [-- Attachment #2: Type: text/html, Size: 5812 bytes --]
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-02-09 6:05 ` Michel Lespinasse 2013-02-09 16:29 ` Jerome Glisse @ 2013-02-10 7:54 ` Shachar Raindel 1 sibling, 0 replies; 34+ messages in thread From: Shachar Raindel @ 2013-02-10 7:54 UTC (permalink / raw) To: Michel Lespinasse Cc: lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss [-- Attachment #1: Type: text/plain, Size: 4809 bytes --] On 2/9/2013 8:05 AM, Michel Lespinasse wrote: > On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel<raindel@mellanox.com> wrote: >> Hi, >> >> We would like to present a reference implementation for safely sharing >> memory pages from user space with the hardware, without pinning. >> >> We will be happy to hear the community feedback on our prototype >> implementation, and suggestions for future improvements. >> >> We would also like to discuss adding features to the core MM subsystem to >> assist hardware access to user memory without pinning. > This sounds kinda scary TBH; however I do understand the need for such > technology. The technological challenges here are actually rather similar to the ones experienced by hypervisors that want to allow swapping of virtual machines. As a result, we benefit greatly from the mmu notifiers implemented for KVM. Reading the page table directly will be another level of challenge. > I think one issue is that many MM developers are insufficiently aware > of such developments; having a technology presentation would probably > help there; but traditionally LSF/MM sessions are more interactive > between developers who are already quite familiar with the technology. > I think it would help if you could send in advance a detailed > presentation of the problem and the proposed solutions (and then what > they require of the MM layer) so people can be better prepared. 
We hope to send out an RFC patch-set of the feature implementation for our hardware soon, which might help to demonstrate a use case for the technology. The current programming model for InfiniBand (and related network protocols - RoCE, iWarp) relies on the user space program registering memory regions for use with the hardware. Upon registration, the driver performs pinning (get_user_pages) of the memory area, updates a mapping table in the hardware and provides the user application with a handle for the mapping. The user space application then uses this handle to request the hardware to access this area for network IO. While achieving unbeatable IO performance (round-trip latency, for user space programs, of less than 2 microseconds, bandwidth of 56 Gbit/second), this model is relatively hard to use: - The need for explicit memory registration for each area makes the API rather complex to use. An ideal API would have a handle per process that allows it to communicate with the hardware using the process virtual addresses. - After a part of the address space has been registered, the application must be careful not to move the pages around. For example, doing a fork results in all of the memory registrations pointing to the wrong pages (which is very hard to debug). This was partially addressed in [1], but the cure is nearly as bad as the disease - when MADVISE_DONTFORK is used on the heap, a simple call to malloc in the child process might crash the process. - Memory which was registered is not swappable. As a result, one cannot write applications that overcommit physical memory while using this API. Similarly to what Jerome described about GPU applications, for network access the application might want to use ~10% of its allocated memory space, but it is required to either pin all of the memory, use heuristics to predict what memory will be used, or perform expensive copying/pinning for every network transaction. All of these are non-optimal.
> And first I'd like to ask, aren't IOMMUs supposed to already largely > solve this problem ? (probably a dumb question, but that just tells > you how much you need to explain :) > IOMMU v1 doesn't solve this problem, as it gives you only one mapping table per PCI function. If you want ~64 processes on your machine to be able to access the network, this is not nearly enough. It helps in implementing PCI pass-thru for virtualized guests (with the hardware devices exposing several virtual PCI functions for the guests), but that is still not enough for user space applications. To some extent, IOMMU v1 might even be an obstacle to implementing such a feature, as it prevents PCI devices from accessing parts of the memory, requiring driver intervention for every page fault, even if the page is in memory. IOMMU v2 [2] is a step in the direction that we are moving towards, offering PASID - a unique identifier for each transaction that the device performs, allowing the transaction to be associated with a specific process. However, the challenges there are similar to those we encounter when using an address translation table on the PCI device itself (NIC/GPU). References: 1. MADVISE_DONTFORK - http://lwn.net/Articles/171956/ 2. AMD IOMMU v2 - http://www.linux-kvm.org/wiki/images/b/b1/2011-forum-amd-iommuv2-kvm.pdf [-- Attachment #2: Type: text/html, Size: 9341 bytes --]
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-02-08 11:18 [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes Shachar Raindel 2013-02-08 15:21 ` Jerome Glisse 2013-02-09 6:05 ` Michel Lespinasse @ 2013-04-09 8:17 ` Simon Jeons 2013-04-10 1:48 ` Simon Jeons 2 siblings, 1 reply; 34+ messages in thread From: Simon Jeons @ 2013-04-09 8:17 UTC (permalink / raw) To: Shachar Raindel Cc: lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss Hi Shachar, On 02/08/2013 07:18 PM, Shachar Raindel wrote: > Hi, > > We would like to present a reference implementation for safely sharing > memory pages from user space with the hardware, without pinning. > > We will be happy to hear the community feedback on our prototype > implementation, and suggestions for future improvements. > > We would also like to discuss adding features to the core MM subsystem > to assist hardware access to user memory without pinning. > > Following is a longer motivation and explanation on the technology > presented: > > Many application developers would like to be able to > communicate directly with the hardware from userspace. > > Use cases for that include high performance networking APIs such as > InfiniBand, RoCE and iWarp, and interfacing with GPUs. > > Currently, if the user space application wants to share system memory > with the hardware device, the kernel component must pin the memory > pages in RAM, using get_user_pages. > > This is a hurdle, as it usually makes large portions of the application > memory unmovable. This pinning also makes the user space development > model very complicated - one needs to register memory before using it > for communication with the hardware. > > We use the mmu-notifiers [1] mechanism to inform the hardware when the > mapping of a page is changed.
If the hardware tries to access a page > which is not yet mapped for the hardware, it requests a resolution for > the page address from the kernel. mmu_notifiers is used for the host to notice that a guest page changed, isn't it? Why do you say that it is used for informing the hardware when the mapping of a page is changed? > > This mechanism allows the hardware to access the entire address space > of the user application, without pinning even a single page. > > We would like to use the LSF/MM forum opportunity to discuss open > issues we have for further development, such as: > > -Allowing the hardware to perform page table walk, similar to > get_user_pages_fast to resolve user pages that are already in RAM. > > -Batching page eviction by various kernel subsystems (swapper, > page-cache) to reduce the amount of communication needed with the > hardware in such events > > -Hinting from the hardware to the MM regarding page fetches which are > speculative, similarly to prefetching done by the page-cache > > -Page-in notifications from the kernel to the driver, such that we can > keep our secondary TLB in sync with the kernel page table without > incurring page faults. > > -Allowed and banned actions while in an MMU notifier callback. We have > already done some work on making the MMU notifiers sleepable [2], but > there might be additional limitations, which we would like to discuss. > > -Hinting from the MMU notifiers as for the reason for the notification > - for example we would like to react differently if a page was moved > by NUMA migration vs. page being swapped out. > > [1] http://lwn.net/Articles/266320/ > > [2] http://comments.gmane.org/gmane.linux.kernel.mm/85002 > > Thanks, > > --Shachar > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes 2013-04-09 8:17 ` Simon Jeons @ 2013-04-10 1:48 ` Simon Jeons 0 siblings, 0 replies; 34+ messages in thread From: Simon Jeons @ 2013-04-10 1:48 UTC (permalink / raw) To: Shachar Raindel Cc: lsf-pc, linux-mm, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss, Jerome Glisse Ping Jerome, On 04/09/2013 04:17 PM, Simon Jeons wrote: > Hi Shachar, > On 02/08/2013 07:18 PM, Shachar Raindel wrote: >> Hi, >> >> We would like to present a reference implementation for safely >> sharing memory pages from user space with the hardware, without pinning. >> >> We will be happy to hear the community feedback on our prototype >> implementation, and suggestions for future improvements. >> >> We would also like to discuss adding features to the core MM >> subsystem to assist hardware access to user memory without pinning. >> >> Following is a longer motivation and explanation on the technology >> presented: >> >> Many application developers would like to be able to >> communicate directly with the hardware from userspace. >> >> Use cases for that include high performance networking APIs such as >> InfiniBand, RoCE and iWarp, and interfacing with GPUs. >> >> Currently, if the user space application wants to share system memory >> with the hardware device, the kernel component must pin the memory >> pages in RAM, using get_user_pages. >> >> This is a hurdle, as it usually makes large portions of the application >> memory unmovable. This pinning also makes the user space development >> model very complicated - one needs to register memory before using it >> for communication with the hardware. >> >> We use the mmu-notifiers [1] mechanism to inform the hardware when >> the mapping of a page is changed.
If the hardware tries to access a >> page which is not yet mapped for the hardware, it requests a >> resolution for the page address from the kernel. > > mmu_notifiers is used for the host to notice that a guest page changed, isn't it? Why > do you say that it is used for informing the hardware when the mapping > of a page is changed? > >> >> This mechanism allows the hardware to access the entire address space >> of the user application, without pinning even a single page. >> >> We would like to use the LSF/MM forum opportunity to discuss open >> issues we have for further development, such as: >> >> -Allowing the hardware to perform page table walk, similar to >> get_user_pages_fast to resolve user pages that are already in RAM. >> >> -Batching page eviction by various kernel subsystems (swapper, >> page-cache) to reduce the amount of communication needed with the >> hardware in such events >> >> -Hinting from the hardware to the MM regarding page fetches which are >> speculative, similarly to prefetching done by the page-cache >> >> -Page-in notifications from the kernel to the driver, such that we >> can keep our secondary TLB in sync with the kernel page table without >> incurring page faults. >> >> -Allowed and banned actions while in an MMU notifier callback. We >> have already done some work on making the MMU notifiers sleepable >> [2], but there might be additional limitations, which we would like >> to discuss. >> >> -Hinting from the MMU notifiers as for the reason for the >> notification - for example we would like to react differently if a >> page was moved by NUMA migration vs. page being swapped out. >> >> [1] http://lwn.net/Articles/266320/ >> >> [2] http://comments.gmane.org/gmane.linux.kernel.mm/85002 >> >> Thanks, >> >> --Shachar >> >> -- >> To unsubscribe, send a message with 'unsubscribe linux-mm' in >> the body to majordomo@kvack.org. For more info on Linux MM, >> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
end of thread, other threads:[~2013-04-18 1:02 UTC | newest] Thread overview: 34+ messages -- 2013-02-08 11:18 [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes Shachar Raindel 2013-02-08 15:21 ` Jerome Glisse 2013-04-16 7:03 ` Simon Jeons 2013-04-16 16:27 ` Jerome Glisse 2013-04-16 23:50 ` Simon Jeons 2013-04-17 14:01 ` Jerome Glisse 2013-04-17 23:48 ` Simon Jeons 2013-04-18 1:02 ` Jerome Glisse 2013-02-09 6:05 ` Michel Lespinasse 2013-02-09 16:29 ` Jerome Glisse 2013-04-09 8:28 ` Simon Jeons 2013-04-09 14:21 ` Jerome Glisse 2013-04-10 1:41 ` Simon Jeons 2013-04-10 20:45 ` Jerome Glisse 2013-04-11 3:42 ` Simon Jeons 2013-04-11 18:38 ` Jerome Glisse 2013-04-12 1:54 ` Simon Jeons 2013-04-12 2:11 ` [Lsf-pc] " Rik van Riel 2013-04-12 2:57 ` Jerome Glisse 2013-04-12 5:44 ` Simon Jeons 2013-04-12 13:32 ` Jerome Glisse 2013-04-10 1:57 ` Simon Jeons 2013-04-10 20:55 ` Jerome Glisse 2013-04-11 3:37 ` Simon Jeons 2013-04-11 18:48 ` Jerome Glisse 2013-04-12 3:13 ` Simon Jeons 2013-04-12 3:21 ` Jerome Glisse 2013-04-15 8:39 ` Simon Jeons 2013-04-15 15:38 ` Jerome Glisse 2013-04-16 4:20 ` Simon Jeons 2013-04-16 16:19 ` Jerome Glisse 2013-02-10 7:54 ` Shachar Raindel 2013-04-09 8:17 ` Simon Jeons 2013-04-10 1:48 ` Simon Jeons