linux-kernel.vger.kernel.org archive mirror
* Re: locking user space memory in kernel
@ 2004-04-08  0:45 Libor Michalek
  2004-04-08  5:22 ` Manfred Spraul
  0 siblings, 1 reply; 13+ messages in thread
From: Libor Michalek @ 2004-04-08  0:45 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: Roland Dreier, Eli Cohen, linux-kernel

----- Forwarded message from Manfred Spraul <manfred@colorfullife.com> -----
>
> Date:	Sun, 21 Mar 2004 12:31:59 +0100
> From: Manfred Spraul <manfred@colorfullife.com>
> To: Eli Cohen <mlxk@mellanox.co.il>
> Cc: linux-kernel@vger.kernel.org
> Subject: Re: locking user space memory in kernel
>
> Hi Eli,
>
> I think just get_user_pages() should be sufficient: the pages won't be 
> swapped out. You don't need to set VM_LOCKED in vma->vm_flags to prevent 
> the swap out. In the worst case, the pte is cleared and that will cause a 
> soft page fault, but the physical address won't change. Multiple 
> get_user_pages() calls on overlapping regions are ok, the page count is 
> an atomic_t, at least 24-bit large.

  The soft page fault is a problem if the device is going to write data 
into the buffer and then notify the user that the buffer now contains 
valid data. If the soft page fault occurs before the device has written
to the page list, then by the time the user is notified of the write and 
reads the buffer, the buffer will no longer be backed by the same pages 
as the ones to which the device wrote. Is setting VM_LOCKED the only way 
to prevent the soft page fault and this issue?


-Libor








* Re: locking user space memory in kernel
  2004-04-08  0:45 locking user space memory in kernel Libor Michalek
@ 2004-04-08  5:22 ` Manfred Spraul
  0 siblings, 0 replies; 13+ messages in thread
From: Manfred Spraul @ 2004-04-08  5:22 UTC (permalink / raw)
  To: Libor Michalek; +Cc: Roland Dreier, Eli Cohen, linux-kernel

Libor Michalek wrote:

>----- Forwarded message from Manfred Spraul <manfred@colorfullife.com> -----
>  
>
>>Date:	Sun, 21 Mar 2004 12:31:59 +0100
>>From: Manfred Spraul <manfred@colorfullife.com>
>>To: Eli Cohen <mlxk@mellanox.co.il>
>>Cc: linux-kernel@vger.kernel.org
>>Subject: Re: locking user space memory in kernel
>>
>>Hi Eli,
>>
>>I think just get_user_pages() should be sufficient: the pages won't be 
>>swapped out. You don't need to set VM_LOCKED in vma->vm_flags to prevent 
>>the swap out. In the worst case, the pte is cleared and that will cause a 
>>soft page fault, but the physical address won't change. Multiple 
>>get_user_pages() calls on overlapping regions are ok, the page count is 
>>an atomic_t, at least 24-bit large.
>>    
>>
>
>  The soft page fault is a problem if the device is going to write data 
>into the buffer and then notify the user that the buffer now contains 
>valid data. If the soft page fault occurs before the device has written
>to the page list, once the user is notified of the write and reads the 
>buffer, it will no longer be the same pages as the ones to which the 
>device wrote.
>
No. The physical addresses do not change due to a soft page fault.
A soft fault means that the page table entry is cleared, but that the 
physical page is still in the system memory. do_swap_page does a swap 
cache lookup and finds the original physical page in memory and maps it 
back to the virtual address. The physical page can't be dropped from the 
swap cache because your driver still holds one reference to the page - 
no swapout.

But fork() is a problem for get_user_pages(): You probably have to write 
an improved function (create_user_mapping/destroy_user_mapping) that 
handles fork correctly.
And add arch hooks into the new function - they are required for archs 
with non-coherent CPU caches. Right now O_DIRECT doesn't flush the CPU 
caches, because it's impossible to implement that with get_user_pages(). 
It works, because the data cache is usually coherent and no one loads 
libraries with O_DIRECT.

--
    Manfred
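
Archive note: the reference-count argument above can be sketched as a
small userspace toy model. Everything in it (the toy_page and toy_mm
types and the toy_* functions) is hypothetical and written only for
illustration; none of it is kernel code, and the real swap cache
accounting is more involved. The sketch assumes only what the message
states: clearing the pte drops the mapping, and a page that the driver
still references cannot be freed, so the soft fault maps the same page
back.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these types are hypothetical and are NOT the kernel's. */
struct toy_page {
    int refcount;   /* the mapping's reference plus any driver pins */
    int frame;      /* stands in for the physical frame number */
};

struct toy_mm {
    struct toy_page *pte;        /* mapped page, or NULL if the pte is cleared */
    struct toy_page *swap_cache; /* page still findable after the pte is cleared */
};

/* Map a page: the mapping itself holds one reference. */
void toy_map(struct toy_mm *mm, struct toy_page *pg, int frame)
{
    pg->refcount = 1;
    pg->frame = frame;
    mm->pte = pg;
    mm->swap_cache = NULL;
}

/* Stands in for the driver's get_user_pages(): take an extra reference. */
void toy_pin(struct toy_page *pg)
{
    assert(pg->refcount > 0);   /* only a live page can be pinned */
    pg->refcount++;
}

/* Models reclaim: clear the pte and drop the mapping's reference.
 * Returns 1 if the page was freed, 0 if it survives in the swap cache. */
int toy_reclaim(struct toy_mm *mm)
{
    struct toy_page *pg = mm->pte;

    if (pg == NULL)
        return 0;
    mm->pte = NULL;
    pg->refcount--;
    if (pg->refcount == 0) {
        mm->swap_cache = NULL;  /* nobody holds it: the frame can be reused */
        return 1;
    }
    mm->swap_cache = pg;        /* still referenced: must stay in memory */
    return 0;
}

/* Models the soft fault: re-map the page found in the swap cache.
 * Returns the frame number, or -1 if the page is gone (a hard fault). */
int toy_soft_fault(struct toy_mm *mm)
{
    if (mm->pte != NULL)
        return mm->pte->frame;
    if (mm->swap_cache == NULL)
        return -1;
    mm->pte = mm->swap_cache;
    mm->swap_cache = NULL;
    mm->pte->refcount++;        /* the new mapping takes its reference back */
    return mm->pte->frame;
}
```

In this model, calling toy_map(), toy_pin(), toy_reclaim() and then
toy_soft_fault() yields the original frame number again, while the same
sequence without toy_pin() frees the page and the fault returns -1.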




* Re: locking user space memory in kernel
@ 2004-04-08  6:20 Ross Dickson
  0 siblings, 0 replies; 13+ messages in thread
From: Ross Dickson @ 2004-04-08  6:20 UTC (permalink / raw)
  To: libor; +Cc: linux-kernel, manfred

Libor Michalek wrote: 
 
>----- Forwarded message from Manfred Spraul <manfred@colorfullife.com> ----- 
 > 
 > 
 >>Date: Sun, 21 Mar 2004 12:31:59 +0100 
 >>From: Manfred Spraul <manfred@colorfullife.com> 
 >>To: Eli Cohen <mlxk@mellanox.co.il> 
 >>Cc: linux-kernel@vger.kernel.org 
 >>Subject: Re: locking user space memory in kernel 
 >> 
 >>Hi Eli, 
 >> 
 >>I think just get_user_pages() should be sufficient: the pages won't be 
 >>swapped out. You don't need to set VM_LOCKED in vma->vm_flags to prevent 
 >>the swap out. In the worst case, the pte is cleared and that will cause a 
 >>soft page fault, but the physical address won't change. Multiple 
 >>get_user_pages() calls on overlapping regions are ok, the page count is 
 >>an atomic_t, at least 24-bit large. 
 >> 
 >> 
 > 
 > The soft page fault is a problem if the device is going to write data 
 >into the buffer and then notify the user that the buffer now contains 
 >valid data. If the soft page fault occurs before the device has written 
 >to the page list, once the user is notified of the write and reads the 
 >buffer, it will no longer be the same pages as the ones to which the 
 >device wrote. 
 > 

I know of an open source driver for image acquisition cards, iti-fg, that does
"PAGEWISE transfer of image frames directly into user space". The
release notes mention the memory locking.

It is available here:
http://oss.gom.com/

Regards
Ross.



* Re: locking user space memory in kernel
  2004-03-22 15:22       ` Eli Cohen
@ 2004-03-22 19:34         ` Manfred Spraul
  0 siblings, 0 replies; 13+ messages in thread
From: Manfred Spraul @ 2004-03-22 19:34 UTC (permalink / raw)
  To: Eli Cohen; +Cc: Roland Dreier, linux-kernel

Eli Cohen wrote:

> Roland Dreier wrote:
>
>> I don't think copying all the registered memory on fork() is feasible,
>> because it's going to kill performance (especially since exec() is
>> likely to immediately follow the fork() in the child).  Also, there
>> may not be enough memory around to copy everything.
>>
>>  
>>
> Suppose a new vma flag is introduced, VM_NOCOW, and an API to apply 
> this flag on a range of addresses, splitting or unifying vmas as necessary.

Something like that. But it should be hidden within a suitable 
abstraction. get_user_pages and then put_page is not stateful enough. 
Actually it's fundamentally broken for platforms that need cache flush 
calls. What's needed is create_page_mapping/free_page_mapping, or 
something like that.

And I still think that the initial implementation should copy the 
affected pages within fork() - it might be slow, but at least it's 
simple and correct. _If_ it's too slow, then it can be fixed later.

--
    Manfred





* Re: locking user space memory in kernel
  2004-03-21 18:18     ` Roland Dreier
  2004-03-22 13:15       ` Eli Cohen
@ 2004-03-22 15:22       ` Eli Cohen
  2004-03-22 19:34         ` Manfred Spraul
  1 sibling, 1 reply; 13+ messages in thread
From: Eli Cohen @ 2004-03-22 15:22 UTC (permalink / raw)
  To: Roland Dreier; +Cc: Manfred Spraul, linux-kernel

Roland Dreier wrote:

> I don't think copying all the registered memory on fork() is feasible,
> because it's going to kill performance (especially since exec() is
> likely to immediately follow the fork() in the child).  Also, there
> may not be enough memory around to copy everything.
>
>  
>
Suppose a new vma flag is introduced, VM_NOCOW, and an API to apply this 
flag on a range of addresses, splitting or unifying vmas as necessary. A 
driver which registers memory with hardware would call this function. 
When fork takes place, the ptes of the parent belonging to such vmas 
will not be changed to read only, thus they will not undergo COW. The 
kernel will copy the first and last pages of these vmas to the child. 
All the pages in between will be marked read only and will undergo COW 
when written to. One problem would be that the child can read pages of 
the parent after the parent modifies them, but that could be avoided if 
the address space of the child does not inherit the range of the middle 
pages. ???
Eli


* Re: locking user space memory in kernel
  2004-03-21 18:18     ` Roland Dreier
@ 2004-03-22 13:15       ` Eli Cohen
  2004-03-22 15:22       ` Eli Cohen
  1 sibling, 0 replies; 13+ messages in thread
From: Eli Cohen @ 2004-03-22 13:15 UTC (permalink / raw)
  To: linux-kernel

Roland Dreier wrote:

>I don't think copying all the registered memory on fork() is feasible,
>because it's going to kill performance (especially since exec() is
>likely to immediately follow the fork() in the child).  Also, there
>may not be enough memory around to copy everything.
>
>  
>
Suppose a new vma flag is introduced, VM_NOCOW, and an API to apply this 
flag on a range of addresses, splitting or unifying vmas as necessary. A 
driver which registers memory with hardware would call this function. 
When fork takes place, the ptes of the parent belonging to such vmas 
will not be changed to read only, thus they will not undergo COW. The 
kernel will copy the first and last pages of these vmas to the child. 
All the pages in between will be marked read only and will undergo COW 
when written to. One problem would be that the child can read pages of 
the parent after the parent modifies them, but that could be avoided if 
the address space of the child does not inherit the range of the middle 
pages. ???
Eli


* Re: locking user space memory in kernel
  2004-03-21 17:15   ` Manfred Spraul
@ 2004-03-21 18:18     ` Roland Dreier
  2004-03-22 13:15       ` Eli Cohen
  2004-03-22 15:22       ` Eli Cohen
  0 siblings, 2 replies; 13+ messages in thread
From: Roland Dreier @ 2004-03-21 18:18 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: Eli Cohen, linux-kernel

    Manfred> I think just get_user_pages() should be sufficient: the
    Manfred> pages won't be swapped out. You don't need to set
    Manfred> VM_LOCKED in vma->vm_flags to prevent the swap out. In
    Manfred> the worst case, the pte is cleared and that will cause a
    Manfred> soft page fault, but the physical address won't
    Manfred> change. Multiple get_user_pages() calls on overlapping
    Manfred> regions are ok, the page count is an atomic_t, at least
    Manfred> 24-bit large.

    Roland> There is one case that we ran into where the physical
    Roland> address can change: if a process does a fork() and then
    Roland> triggers COW.

    Manfred> You are right.  What should happen if there are
    Manfred> registered transfers during fork()?  Copy the pages
    Manfred> during the fork() syscall?

The current Mellanox InfiniBand driver goes to some trouble to mark
the memory being registered with VM_DONTCOPY.  This means the vmas
don't get copied into the child of a fork(), so the COW doesn't
happen.  However, this certainly leads to some quirks in semantics.
In particular, an application using fork() has to be careful that
registered memory doesn't share a page with something the child
process wants to use.

I don't think copying all the registered memory on fork() is feasible,
because it's going to kill performance (especially since exec() is
likely to immediately follow the fork() in the child).  Also, there
may not be enough memory around to copy everything.

Out of curiosity, what happens if I fork with pending AIO in the
current kernel?

 - Roland
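
Archive note: the fork()/COW behaviour discussed above can be observed
from userspace with a minimal sketch like the one below. It cannot show
physical addresses, only the visible effect: after fork(), a write by
the parent to a private page is not seen by the child, because the two
processes stop sharing the page at that point. A device that keeps
writing to the page pinned before the fork is in the same position as
the child here. The function name byte_seen_by_child() is made up for
this example.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the byte the child reads from a private page after the parent
 * has overwritten that page post-fork, or -1 on error. */
int byte_seen_by_child(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    int go[2];
    int status;
    char *buf;
    pid_t pid;

    assert(page_size > 0);
    buf = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    if (pipe(go) < 0) {
        munmap(buf, (size_t)page_size);
        return -1;
    }

    buf[0] = 'A';               /* contents at the time of the fork */
    pid = fork();
    if (pid < 0) {
        close(go[0]);
        close(go[1]);
        munmap(buf, (size_t)page_size);
        return -1;
    }
    if (pid == 0) {
        char c;

        close(go[1]);
        /* Block until the parent has modified its copy of the page. */
        if (read(go[0], &c, 1) != 1)
            _exit(255);
        _exit((unsigned char)buf[0]);
    }

    close(go[0]);
    buf[0] = 'B';               /* write fault: the page is no longer shared */
    if (write(go[1], "x", 1) != 1)
        status = -1;            /* child will see EOF and exit with 255 */
    close(go[1]);
    if (waitpid(pid, &status, 0) < 0) {
        munmap(buf, (size_t)page_size);
        return -1;
    }
    munmap(buf, (size_t)page_size);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child reports the value it saw through its exit status; with
copy-on-write semantics that value is the pre-fork 'A', not the
parent's later 'B'.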



* Re: locking user space memory in kernel
  2004-03-21 16:40 ` Roland Dreier
@ 2004-03-21 17:15   ` Manfred Spraul
  2004-03-21 18:18     ` Roland Dreier
  0 siblings, 1 reply; 13+ messages in thread
From: Manfred Spraul @ 2004-03-21 17:15 UTC (permalink / raw)
  To: Roland Dreier; +Cc: Eli Cohen, linux-kernel

Roland Dreier wrote:

>    Manfred> I think just get_user_pages() should be sufficient: the
>    Manfred> pages won't be swapped out. You don't need to set
>    Manfred> VM_LOCKED in vma->vm_flags to prevent the swap out. In
>    Manfred> the worst case, the pte is cleared and that will cause a
>    Manfred> soft page fault, but the physical address won't
>    Manfred> change. Multiple get_user_pages() calls on overlapping
>    Manfred> regions are ok, the page count is an atomic_t, at least
>    Manfred> 24-bit large.
>
>There is one case that we ran into where the physical address can
>change: if a process does a fork() and then triggers COW.
>
You are right.
What should happen if there are registered transfers during fork()? Copy 
the pages during the fork() syscall?

--
    Manfred



* Re: locking user space memory in kernel
  2004-03-21 11:31 Manfred Spraul
  2004-03-21 14:12 ` Manfred Spraul
@ 2004-03-21 16:40 ` Roland Dreier
  2004-03-21 17:15   ` Manfred Spraul
  1 sibling, 1 reply; 13+ messages in thread
From: Roland Dreier @ 2004-03-21 16:40 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: Eli Cohen, linux-kernel

    Manfred> I think just get_user_pages() should be sufficient: the
    Manfred> pages won't be swapped out. You don't need to set
    Manfred> VM_LOCKED in vma->vm_flags to prevent the swap out. In
    Manfred> the worst case, the pte is cleared and that will cause a
    Manfred> soft page fault, but the physical address won't
    Manfred> change. Multiple get_user_pages() calls on overlapping
    Manfred> regions are ok, the page count is an atomic_t, at least
    Manfred> 24-bit large.

There is one case that we ran into where the physical address can
change: if a process does a fork() and then triggers COW.

 - Roland



* Re: locking user space memory in kernel
  2004-03-21 11:31 Manfred Spraul
@ 2004-03-21 14:12 ` Manfred Spraul
  2004-03-21 16:40 ` Roland Dreier
  1 sibling, 0 replies; 13+ messages in thread
From: Manfred Spraul @ 2004-03-21 14:12 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Eli Cohen, linux-kernel

Arjan wrote:

>On Sun, 2004-03-21 at 12:18, Eli Cohen wrote:
>> Hi,
>> I need to be able to lock memory allocated in user space and passed to 
>> my driver, in order to pass it to a dma controller that can maintain a 
>> translation table for each process. The obvious thing is to use 
>
>the linux way is to do it the other way around, provide a device that
>userspace then can mmap......
>
That's definitely the preferred method, but unfortunately there are 
existing APIs that work the other way around. I think the main MPI 
transfer functions must read/write to arbitrary addresses, and I'm sure 
there are other examples.

--
    Manfred



* Re: locking user space memory in kernel
  2004-03-21 11:18 Eli Cohen
@ 2004-03-21 11:35 ` Arjan van de Ven
  0 siblings, 0 replies; 13+ messages in thread
From: Arjan van de Ven @ 2004-03-21 11:35 UTC (permalink / raw)
  To: Eli Cohen; +Cc: linux-kernel


On Sun, 2004-03-21 at 12:18, Eli Cohen wrote:
> Hi,
> I need to be able to lock memory allocated in user space and passed to 
> my driver, in order to pass it to a dma controller that can maintain a 
> translation table for each process. The obvious thing is to use 

the linux way is to do it the other way around, provide a device that
userspace then can mmap......



* Re: locking user space memory in kernel
@ 2004-03-21 11:31 Manfred Spraul
  2004-03-21 14:12 ` Manfred Spraul
  2004-03-21 16:40 ` Roland Dreier
  0 siblings, 2 replies; 13+ messages in thread
From: Manfred Spraul @ 2004-03-21 11:31 UTC (permalink / raw)
  To: Eli Cohen; +Cc: linux-kernel

Hi Eli,

I think just get_user_pages() should be sufficient: the pages won't be 
swapped out. You don't need to set VM_LOCKED in vma->vm_flags to prevent 
the swap out. In the worst case, the pte is cleared and that will cause a 
soft page fault, but the physical address won't change. Multiple 
get_user_pages() calls on overlapping regions are ok: the page count is 
an atomic_t, at least 24 bits wide.

--
    Manfred



* locking user space memory in kernel
@ 2004-03-21 11:18 Eli Cohen
  2004-03-21 11:35 ` Arjan van de Ven
  0 siblings, 1 reply; 13+ messages in thread
From: Eli Cohen @ 2004-03-21 11:18 UTC (permalink / raw)
  To: linux-kernel

Hi,
I need to be able to lock memory allocated in user space and passed to 
my driver, in order to pass it to a dma controller that can maintain a 
translation table for each process. The obvious thing is to use 
sys_mlock() (and sys_munlock() for unlocking) but this function is not 
exported anymore, nor is sys_call_table.  I considered marking the 
relevant vma->vm_flags with VM_LOCKED and calling get_user_pages() but 
that could be overkill if I want to lock just a portion of the VMA. 
Currently I do some hacking to find the addresses of sys_mlock/sys_munlock.
I also need to maintain a reference count on the locking/unlocking such 
that a region that has been locked twice will really be unlocked only 
after unlocking twice. This needs to support partly overlapping regions. To 
cope with this I have implemented some code on top of calls to 
sys_mlock/sys_munlock to provide this functionality.
Are there more standard ways to get this functionality from the kernel? 
Any help is appreciated.

Thanks
Eli
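
Archive note: the reference-counting requirement described above (a
region locked twice stays locked until it has been unlocked twice, with
partly overlapping regions handled correctly) can be sketched with one
counter per page. This is a userspace illustration under assumptions,
not the poster's code and not a kernel interface: struct pin_table,
TOY_NPAGES, pin_range(), unpin_range() and page_is_pinned() are all
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NPAGES 16  /* hypothetical number of pages being tracked */

struct pin_table {
    unsigned int count[TOY_NPAGES];  /* one lock count per page */
};

/* True if pages [first, first + n) lie inside the table. */
static int range_ok(size_t first, size_t n)
{
    return first <= TOY_NPAGES && n <= TOY_NPAGES - first;
}

/* Take one reference on every page in [first, first + n).
 * Returns 0 on success, -1 if the range is out of bounds. */
int pin_range(struct pin_table *t, size_t first, size_t n)
{
    size_t i;

    assert(t != NULL);
    if (!range_ok(first, n))
        return -1;
    for (i = first; i < first + n; i++)
        t->count[i]++;
    return 0;
}

/* Drop one reference on every page in [first, first + n).
 * Returns -1 and changes nothing if the range is out of bounds or
 * any page in it is not currently pinned (an unbalanced unlock). */
int unpin_range(struct pin_table *t, size_t first, size_t n)
{
    size_t i;

    assert(t != NULL);
    if (!range_ok(first, n))
        return -1;
    for (i = first; i < first + n; i++)
        if (t->count[i] == 0)
            return -1;
    for (i = first; i < first + n; i++)
        t->count[i]--;
    return 0;
}

/* A page stays pinned while at least one range still covers it. */
int page_is_pinned(const struct pin_table *t, size_t page)
{
    assert(t != NULL);
    return page < TOY_NPAGES && t->count[page] > 0;
}
```

For example, pinning pages 0-3 and then pages 2-5 and releasing the
first range leaves pages 2-5 pinned; pages 2 and 3 are released only
when the second range is unpinned as well.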

