From: Mark Veltzer <mark.veltzer@gmail.com>
To: Hugh Dickins <hugh.dickins@tiscali.co.uk>,
	linux-kernel@vger.kernel.org, Andi Kleen <andi@firstfloor.org>
Subject: Re: get_user_pages question
Date: Tue, 10 Nov 2009 00:13:31 +0200
Message-ID: <200911100013.31768.mark.veltzer@gmail.com>
In-Reply-To: <Pine.LNX.4.64.0911091031460.15199@sister.anvils>

On Monday 09 November 2009 12:32:52 you wrote:
> On Mon, 9 Nov 2009, Andi Kleen wrote:
> > Mark Veltzer <mark.veltzer@gmail.com> writes:
> > > I am testing this kernel module with several buffers from user space
> > > allocated in several different ways. heap, data segment, static
> > > variable in function and stack. All scenarious work EXCEPT the stack
> > > one. When passing the stack buffer the kernel sees one thing while user
> > > space sees another.
> >
> > In theory it should work, stack is no different from any other pages.
> > First thought was that you used some platform with incoherent caches,
> > but that doesn't seem to be the case if it's standard x86.
> 
> It may be irrelevant to Mark's stack case, but it is worth mentioning
> the fork problem: how a process does get_user_pages to pin down a buffer
> somewhere in anonymous memory, a thread forks (write protecting anonymous
> memory shared between parent and child), child userspace writes to a
> location in the same page as that buffer, causing copy-on-write which
> breaks the connection between the get_user_pages buffer and what child
> userspace sees there afterwards.
> 
> Hugh
> 

Thanks Hugh and Andi

Hugh, you actually hit the nail on the head!

I was forking while doing these mappings, and the child won the race and got to 
keep the pinned pages while the parent was left with a copy which meant 
nothing. It was hard to spot because I was using a library function which 
called a function etc... which eventually did a system(3). 
It only happened in the stack test case because the child was, on purpose, not 
really doing anything with the pinned memory, and so in all the other cases it 
did not touch that memory. The stack is the exception since the child, of 
course, uses it. The child won the race in the stack case and so shared the 
data with the kernel, and the parent got a copy with the old data.
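
The divergence described above can be seen from user space alone. The 
following is a minimal sketch, not the driver test itself: it involves no 
get_user_pages and no kernel module, and the function name is made up for 
illustration. It only shows that once a process has forked, a write to a 
stack buffer in one process is not visible in the other, which is why a page 
pinned before the fork can end up matching only one of them.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (name chosen for this sketch only).
 * Returns 1 if the parent observes the child's write to a stack buffer
 * made after fork(), 0 if it does not, -1 on error.
 */
int parent_sees_child_write(void)
{
	/* On the stack, like the buffer in the failing test case. */
	volatile char buf[4] = "old";
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: the write lands in the child's private copy of the page. */
		buf[0] = 'n';
		_exit(0);
	}
	/* Parent: wait for the child, then inspect its own view of the buffer. */
	if (waitpid(pid, NULL, 0) < 0)
		return -1;
	return buf[0] == 'n';
}
```

The function returns 0: the parent still sees "old". Which of the two 
processes ends up with the original physical page is decided inside the 
kernel and is not something this sketch can show.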

I understand that madvise(2) can prevent this copy-on-write and race between 
child and parent and I also duplicated it in the kernel using the following 
code:

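		[lock the current->mm for writing]
			vma = find_vma(current->mm, [user pointer]);
			/* find_vma() may return a vma that starts above the address */
			if (vma && vma->vm_start <= [user pointer])
				vma->vm_flags |= VM_DONTCOPY;
		[unlock the current->mm for writing]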

The above code is actually a kernel version of madvise(2) with MADV_DONTFORK.

The problem with this solution (either madvise in user space or DONTCOPY in 
kernel) is that I give up the ability to fork(2) since the child is left 
stackless (or with a hole in its stack, I'm not sure...)
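
A minimal userspace sketch of that effect, under some assumptions: it marks a 
separately mmap'd anonymous page with MADV_DONTFORK rather than the real 
stack (so that the child survives long enough to be observed), and the 
function name is made up for illustration. The child simply touches the page 
and the parent reports how the child died.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (name chosen for this sketch only).
 * Maps one anonymous page, marks it MADV_DONTFORK, forks, and lets the
 * child touch the page. Returns the signal that terminated the child,
 * 0 if the child exited normally, -1 on error.
 */
int child_signal_after_dontfork(void)
{
	long page = sysconf(_SC_PAGESIZE);
	volatile char *buf;
	int status = 0;
	pid_t pid;

	buf = mmap(NULL, page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;
	buf[0] = 'x';	/* fault the page in before forking */

	/* Userspace counterpart of setting VM_DONTCOPY on the vma. */
	if (madvise((void *)buf, page, MADV_DONTFORK) != 0) {
		munmap((void *)buf, page);
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		munmap((void *)buf, page);
		return -1;
	}
	if (pid == 0) {
		/* Child: the mapping was not inherited, so this access faults. */
		buf[0] = 'y';
		_exit(0);
	}
	if (waitpid(pid, &status, 0) < 0) {
		munmap((void *)buf, page);
		return -1;
	}
	munmap((void *)buf, page);
	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On Linux the expected return value is SIGSEGV. The parent keeps its mapping 
and can still read and write the page; only the child loses it.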

My question is: is there a way to allow forking while still pinning STACK 
memory via get_user_pages? I can actually live with the current solution since 
I can make sure that the user space thread that does the work with the driver 
never forks, but I'm interested to know what other neat vm tricks Linux has up 
its sleeve...

BTW: would it not be a good addition to the madvise(2) manpage to state that 
you should be careful with madvise(DONTFORK) because you may segfault 
your children, and that doing so on a stack address has an even greater chance 
of crashing children? Who should I talk to about adding this info to the manual 
page? The current manpage that I have only talks about scatter-gather uses of 
DONTFORK and does not mention the problems of DONTFORK...

Thanks in advance
	Mark
