On Fri, Apr 12, 2013 at 1:44 AM, Simon Jeons wrote:
> Hi Jerome,
>
> On 04/12/2013 10:57 AM, Jerome Glisse wrote:
>
> On Thu, Apr 11, 2013 at 9:54 PM, Simon Jeons wrote:
>
>> Hi Jerome,
>>
>> On 04/12/2013 02:38 AM, Jerome Glisse wrote:
>>
>>> On Thu, Apr 11, 2013 at 11:42:05AM +0800, Simon Jeons wrote:
>>>
>>>> Hi Jerome,
>>>> On 04/11/2013 04:45 AM, Jerome Glisse wrote:
>>>>
>>>>> On Wed, Apr 10, 2013 at 09:41:57AM +0800, Simon Jeons wrote:
>>>>>
>>>>>> Hi Jerome,
>>>>>> On 04/09/2013 10:21 PM, Jerome Glisse wrote:
>>>>>>
>>>>>>> On Tue, Apr 09, 2013 at 04:28:09PM +0800, Simon Jeons wrote:
>>>>>>>
>>>>>>>> Hi Jerome,
>>>>>>>> On 02/10/2013 12:29 AM, Jerome Glisse wrote:
>>>>>>>>
>>>>>>>>> On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> We would like to present a reference implementation for safely
>>>>>>>>>>> sharing memory pages from user space with the hardware, without
>>>>>>>>>>> pinning.
>>>>>>>>>>>
>>>>>>>>>>> We will be happy to hear the community's feedback on our
>>>>>>>>>>> prototype implementation, and suggestions for future
>>>>>>>>>>> improvements.
>>>>>>>>>>>
>>>>>>>>>>> We would also like to discuss adding features to the core MM
>>>>>>>>>>> subsystem to assist hardware access to user memory without
>>>>>>>>>>> pinning.
>>>>>>>>>>>
>>>>>>>>>> This sounds kinda scary, TBH; however, I do understand the need
>>>>>>>>>> for such technology.
>>>>>>>>>>
>>>>>>>>>> I think one issue is that many MM developers are insufficiently
>>>>>>>>>> aware of such developments; having a technology presentation
>>>>>>>>>> would probably help there. But traditionally LSF/MM sessions are
>>>>>>>>>> more interactive, between developers who are already quite
>>>>>>>>>> familiar with the technology. I think it would help if you could
>>>>>>>>>> send in advance a detailed presentation of the problem and the
>>>>>>>>>> proposed solutions (and then what they require of the MM layer)
>>>>>>>>>> so people can be better prepared.
>>>>>>>>>>
>>>>>>>>>> And first I'd like to ask: aren't IOMMUs supposed to already
>>>>>>>>>> largely solve this problem? (Probably a dumb question, but that
>>>>>>>>>> just tells you how much you need to explain. :)
>>>>>>>>>>
>>>>>>>>> For GPUs the motivation is threefold. With the advance of GPU
>>>>>>>>> compute, and also with newer graphics programs, we see a massive
>>>>>>>>> increase in GPU memory consumption. We can easily reach buffers
>>>>>>>>> that are bigger than 1GB. So the first motivation is to directly
>>>>>>>>> use, on the GPU, the memory the user allocated through malloc;
>>>>>>>>> this avoids copying 1GB of data with the CPU to the GPU buffer.
>>>>>>>>> The second, and most important for GPU compute, is using the GPU
>>>>>>>>> seamlessly with the CPU; to achieve this you want the programmer
>>>>>>>>> to have a single address space on the CPU and GPU, so that the
>>>>>>>>> same address points to the same object on the GPU as on the CPU.
>>>>>>>>> This would also be a tremendously cleaner design, from the
>>>>>>>>> driver's point of view, for memory management.
>>>>>>>>>
>>>>>>>>> And last, the most important: with such big buffers (>1GB),
>>>>>>>>> memory pinning becomes way too expensive and also drastically
>>>>>>>>> reduces the freedom of the MM to free pages for other processes.
>>>>>>>>> Most of the time only a small window (everything is relative; the
>>>>>>>>> window can be >100MB, so not so small :)) of the object will be
>>>>>>>>> in use by the hardware. Hardware pagefault support would avoid
>>>>>>>>> the necessity to
>>>>>>>>>
>>>>>>>> What's the meaning of hardware pagefault?
>>>>>>>>
>>>>>>> It's a PCIe extension (well, it's a combination of extensions that
>>>>>>> allow that; see http://www.pcisig.com/specifications/iov/ats/). The
>>>>>>> idea is that the IOMMU can trigger a regular pagefault inside a
>>>>>>> process address space on behalf of the hardware. The only IOMMU
>>>>>>> supporting that right now is the AMD IOMMU v2 that you find on
>>>>>>> recent AMD platforms.
>>>>>>>
>>>>>> Why is a hardware page fault needed? A regular page fault is
>>>>>> triggered by the CPU MMU, correct?
>>>>>>
>>>>> Well, here I abuse the term "regular page fault". The idea is that
>>>>> with hardware pagefaults you don't need to pin memory or take a
>>>>> reference on a page for the hardware to use it, so the kernel can
>>>>> free, as usual, a page that would otherwise have been
>>>>>
>>>> For the case when the GPU needs to pin memory, why does the GPU need
>>>> to grab the memory of a normal process instead of allocating it for
>>>> itself?
>>>>
>>> Pinned memory is today's world, where the GPU allocates its own memory
>>> (GBs of memory) that disappears from kernel control, i.e. the kernel
>>> can no longer reclaim this memory; it's lost memory (I have already had
>>> complaints about that from users who saw GBs of memory vanish and
>>> couldn't understand why the GPU was using so much).
>>>
>>> In tomorrow's world we want the GPU to be able to access memory that
>>> the application allocated through a simple malloc, and we want the
>>> kernel to be able to recycle any page at any time, because of memory
>>> pressure or because the kernel decides to do so.
>>>
>>> That's just what we want to do. To achieve it we are getting hardware
>>> that can do pagefaults. No changes to kernel core MM code (some
>>> improvements might be made).
>>>
>> The memory disappears since you have a reference (gup) against it,
>> correct? In tomorrow's world you want the page fault triggered through
>> the IOMMU driver, which calls get_user_pages; it will also take a
>> reference (since gup is called), won't it? Anyway, assuming tomorrow's
>> world doesn't take a reference, don't we need to care that a page used
>> by the GPU can be reclaimed?
>>
> Right now the code uses gup because it's convenient, but it drops the
> reference right after the fault. So the reference is held only for a
> short period of time.
>
> Are you sure gup will drop the reference right after the fault? I dug
> through the code again and failed to verify it. Could you point it out
> to me?
>
In amd_iommu_v2.c:do_fault, get_user_pages is followed by put_page.

Cheers,
Jerome
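
For reference, the fault-then-unpin pattern discussed above can be
sketched as follows. This is a condensed, hypothetical approximation
in the spirit of the 3.x-era do_fault() handler in
drivers/iommu/amd_iommu_v2.c, not the verbatim kernel source; the
function name handle_hw_fault and its error handling are invented for
illustration. The point is that get_user_pages() is used only to
populate the process page tables in response to the IOMMU (PRI)
fault, and the reference it takes is dropped immediately with
put_page(), so no long-term pin is held.

    /*
     * Sketch of handling an IOMMU-originated page fault without a
     * long-term pin, using the 3.x-era get_user_pages() signature.
     */
    #include <linux/mm.h>
    #include <linux/sched.h>

    static int handle_hw_fault(struct task_struct *tsk,
                               struct mm_struct *mm,
                               unsigned long address, bool write)
    {
            struct page *page;
            int ret;

            down_read(&mm->mmap_sem);
            /*
             * gup faults the page in and populates the CPU page
             * tables for 'address'. Once those are populated, the
             * IOMMU can translate for the device again, so the page
             * does not need to stay pinned.
             */
            ret = get_user_pages(tsk, mm, address, 1, write ? 1 : 0,
                                 0, &page, NULL);
            up_read(&mm->mmap_sem);

            if (ret != 1)
                    return -EFAULT;

            /* Drop the reference right away: no long-term pin. */
            put_page(page);
            return 0;
    }

The reference is thus held only for the duration of the fault itself,
which is the behavior Jerome describes: reclaim remains free to evict
the page later, at which point the hardware simply faults again.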