linux-kernel.vger.kernel.org archive mirror
* RE: Users locking memory using futexes
@ 2002-11-12  5:56 Perez-Gonzalez, Inaky
  2002-11-12 17:36 ` Jamie Lokier
  0 siblings, 1 reply; 13+ messages in thread
From: Perez-Gonzalez, Inaky @ 2002-11-12  5:56 UTC (permalink / raw)
  To: 'Jamie Lokier'
  Cc: Rusty Russell, 'mingo@redhat.com', 'Mark Mielke',
	linux-kernel


> > This raises a good point - I guess we should be doing something like
> > checking user limits (against locked memory, 'ulimit -l').

> If futexes are limited by user limits, that's going to mean some
> threading program gets a surprise when too many threads decide to
> block on a resource.  That's really nasty.  (Of course, a program can
> get a surprise due to just running out of memory in sys_futex() too,
> but that's much rarer).
 
Sure, as I mentioned in my email, that would be _a_ way to do it, but I am
not at all convinced it is the best one -- of course, I don't know what the
best way would be; maybe a capability? A per-process tunable in /proc?
Another rlimit, and we break POSIX? [do we?]

Good thing is - I just found out after reading twice - that FUTEX_FD does
not lock the page in memory, so that is one case less to worry about. 

In this context, I was wondering if it really makes sense to worry about a
DoS process with many threads blocked in futex_wait() locking memory.
At least as an exercise ...

> It would be nice if the futex waitqueues could be re-hashed against
> swap entries when pages are swapped out, somehow, but this sounds hard.

I am starting to think it could be done with no real effort -- just off the
top of my not-so-knowledgeable head -- so let's say it can be done:

In futex_wait(), we pin the page and store it and the offset [and whatever
else] as now, but then release it right after queueing in the hash table;
this way the page is free to go out to swap.

Meanwhile, some other process that has locked the futex goes on to do
something else, and the page ends up in swap. Whenever we call _wake() or
tell_waiters(), we need to make sure the page is in RAM; if it is not, we can
page it in and pin it (__pin_page() already does that), do the wakeup, and
unpin it.

So, this would mean this patch should suffice:
--- futex.c	12 Nov 2002 05:38:55 -0000	1.1.1.3.8.1
+++ futex.c	12 Nov 2002 05:50:35 -0000
@@ -281,10 +277,12 @@
 	/* Page is pinned, but may no longer be in this address space. */
 	if (get_user(curval, (int *)uaddr) != 0) {
 		ret = -EFAULT;
+                unpin_page(page);
 		goto out;
 	}
 	if (curval != val) {
 		ret = -EWOULDBLOCK;
+                unpin_page(page);
 		goto out;
 	}
 	/*
@@ -295,6 +293,7 @@
 	 * the waiter from the list.
 	 */
 	add_wait_queue(&q.waiters, &wait);
+        unpin_page(page);
 	set_current_state(TASK_INTERRUPTIBLE);
 	if (!list_empty(&q.list))
 		time = schedule_timeout(time);
@@ -313,7 +312,6 @@
 	/* Were we woken up anyway? */
 	if (!unqueue_me(&q))
 		ret = 0;
-	unpin_page(page);
 
 	return ret;
 }
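To make the proposed pin lifetime concrete, here is a user-space toy model, assuming each page is reduced to a pin count plus a "resident" flag (toy_page, toy_pin(), toy_try_swapout() and the other toy_* names are hypothetical, not kernel API). It shows the page becoming swappable while the waiter sleeps and being brought back in for the wakeup; it deliberately does not model how the waiters are found again if the page comes back in a different frame.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy page: NOT the kernel's struct page. */
struct toy_page {
    int pin_count;   /* > 0 means the page must stay in RAM */
    bool resident;   /* true while the page is in RAM */
    int value;       /* the futex word stored in this page */
};

/* Fault the page in if needed and hold it (loosely modelled on __pin_page()). */
static void toy_pin(struct toy_page *p)
{
    p->resident = true;
    p->pin_count++;
}

static void toy_unpin(struct toy_page *p)
{
    assert(p->pin_count > 0);
    p->pin_count--;
}

/* The swapper may only evict pages nobody has pinned. */
static bool toy_try_swapout(struct toy_page *p)
{
    if (p->pin_count > 0)
        return false;
    p->resident = false;
    return true;
}

/* Proposed futex_wait(): pin only while checking the value and queueing. */
static int toy_wait_prepare(struct toy_page *p, int val)
{
    toy_pin(p);
    if (p->value != val) {
        toy_unpin(p);
        return -1;              /* stands in for -EWOULDBLOCK */
    }
    /* ... add to the wait queue here ... */
    toy_unpin(p);               /* page is now free to go to swap */
    return 0;                   /* caller would sleep here */
}

/* Proposed wake path: page in and pin, wake the waiters, unpin again. */
static void toy_wake(struct toy_page *p)
{
    toy_pin(p);
    /* ... wake the queued waiters here ... */
    toy_unpin(p);
}
```

The difference from the current code is only where the unpin happens: the pin covers the value check and the queueing, not the sleep.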

Rusty, Ingo: am I missing something big here? I am kind of green when it
comes to the interactions between address spaces.

Inaky Perez-Gonzalez -- Not speaking for Intel - opinions are my own [or my
fault]


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Users locking memory using futexes
  2002-11-12  5:56 Users locking memory using futexes Perez-Gonzalez, Inaky
@ 2002-11-12 17:36 ` Jamie Lokier
  0 siblings, 0 replies; 13+ messages in thread
From: Jamie Lokier @ 2002-11-12 17:36 UTC (permalink / raw)
  To: Perez-Gonzalez, Inaky
  Cc: Rusty Russell, 'mingo@redhat.com', 'Mark Mielke',
	linux-kernel

Perez-Gonzalez, Inaky wrote:
> Good thing is - I just found out after reading twice - that FUTEX_FD does
> not lock the page in memory, so that is one case less to worry about. 

Oh yes it does - the page isn't unpinned until wakeup or close.
See where it says in futex_fd():

	page = NULL;
out:
	if (page)
		unpin_page(page);
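The idiom in that excerpt is an ownership hand-off: on success the pin is transferred to the new file and the local pointer is cleared, so the shared exit path only unpins on failure. A minimal user-space sketch of the pattern, assuming a plain pin counter (toy_page, toy_futex_file, toy_futex_fd() and toy_close() are hypothetical; the wakeup path, which also drops the pin, is not modelled):

```c
#include <assert.h>
#include <stddef.h>

/* Toy page with a pin count; not the kernel's struct page. */
struct toy_page { int pins; };

/* Hypothetical stand-in for the futex file, which keeps its page pinned. */
struct toy_futex_file { struct toy_page *page; };

static void toy_unpin(struct toy_page *p)
{
    assert(p->pins > 0);
    p->pins--;
}

/* Same shape as the quoted exit path: on success the pin moves into the
 * file and the local pointer is cleared, so `out:` unpins only on error. */
static int toy_futex_fd(struct toy_page *pg, struct toy_futex_file *f, int fail)
{
    struct toy_page *page = pg;
    int ret;

    page->pins++;               /* pin, as __pin_page() would */
    if (fail) {
        ret = -1;
        goto out;
    }
    f->page = page;             /* the file now owns the pin */
    page = NULL;
    ret = 0;
out:
    if (page)
        toy_unpin(page);
    return ret;
}

/* Closing the file is what finally drops the pin. */
static void toy_close(struct toy_futex_file *f)
{
    if (f->page) {
        toy_unpin(f->page);
        f->page = NULL;
    }
}
```

So after a successful call the page stays pinned for as long as the fd is open, which is exactly the point being made.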

Rusty's got a good point about pipe() though.

Btw, maybe GnuPG can use this "feature" to lock its crypto memory in RAM :)

-- Jamie


* Re: Users locking memory using futexes
  2002-11-12 18:13       ` Rusty Russell
@ 2002-11-13 14:27         ` Alan Cox
  0 siblings, 0 replies; 13+ messages in thread
From: Alan Cox @ 2002-11-13 14:27 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Jamie Lokier, Perez-Gonzalez, Inaky, 'Mark Mielke',
	Linux Kernel Mailing List

On Tue, 2002-11-12 at 18:13, Rusty Russell wrote:
> It's bounded by one page per fd.  If you want better than that, then
> yes we'll need to think harder.

One page per fd is "unbounded" for all intents and purposes. Doing the
page accounting per user doesn't look too scary if we ignore things like
page tables for a first cut.



* Re: Users locking memory using futexes
  2002-11-12 17:57 Perez-Gonzalez, Inaky
@ 2002-11-12 19:17 ` Jamie Lokier
  0 siblings, 0 replies; 13+ messages in thread
From: Jamie Lokier @ 2002-11-12 19:17 UTC (permalink / raw)
  To: Perez-Gonzalez, Inaky
  Cc: Rusty Russell, 'mingo@redhat.com', 'Mark Mielke',
	linux-kernel

Perez-Gonzalez, Inaky wrote:
> Hmm ... still, I want to try Ingo's approach with the ptes; that is the part
> I was missing [knowing that struct page * is not invariant the way the pte
> number is ... obvious as that is].

Btw, the pte number of a private mapping is not invariant over mremap(),
but otherwise I think it is fine.

- Jamie


* Re: Users locking memory using futexes
  2002-11-12 18:06     ` Alan Cox
@ 2002-11-12 18:13       ` Rusty Russell
  2002-11-13 14:27         ` Alan Cox
  0 siblings, 1 reply; 13+ messages in thread
From: Rusty Russell @ 2002-11-12 18:13 UTC (permalink / raw)
  To: Alan Cox
  Cc: Jamie Lokier, Perez-Gonzalez, Inaky, 'Mark Mielke',
	Linux Kernel Mailing List

In message <1037124384.8321.70.camel@irongate.swansea.linux.org.uk> you write:
> On Tue, 2002-11-12 at 17:17, Rusty Russell wrote:
> > > Ouch!  It looks to me like userspace can use FUTEX_FD to lock many
> > > pages of memory, achieving the same as mlock() but without the
> > > resource checks.
> > > 
> > > Denial of service attack?
> > 
> > See "pipe".
> 
> That's not an excuse. If the futex stuff allows arbitrary memory locking
> and it isn't properly accounted, then it's a bug, with the added problem
> that it's easier to have nasty accidents with it than with pipes.

It's bounded by one page per fd.  If you want better than that, then
yes we'll need to think harder.

Frobbing futexes on COW and page-in/out is a possible solution, but
requires careful thought.

Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.


* Re: Users locking memory using futexes
  2002-11-12 17:17   ` Rusty Russell
@ 2002-11-12 18:06     ` Alan Cox
  2002-11-12 18:13       ` Rusty Russell
  0 siblings, 1 reply; 13+ messages in thread
From: Alan Cox @ 2002-11-12 18:06 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Jamie Lokier, Perez-Gonzalez, Inaky, 'Mark Mielke',
	Linux Kernel Mailing List

On Tue, 2002-11-12 at 17:17, Rusty Russell wrote:
> > Ouch!  It looks to me like userspace can use FUTEX_FD to lock many
> > pages of memory, achieving the same as mlock() but without the
> > resource checks.
> > 
> > Denial of service attack?
> 
> See "pipe".

That's not an excuse. If the futex stuff allows arbitrary memory locking
and it isn't properly accounted, then it's a bug, with the added problem
that it's easier to have nasty accidents with it than with pipes.

We have a per-user object nowadays, so accounting per-user locked memory
looks rather doable, for mlock, pipes, AF_UNIX sockets and other
things.
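Per-user accounting of that kind could be sketched as a single counter that every pinning path (mlock, pipe buffers, AF_UNIX sockets, futexes) charges before pinning and uncharges when releasing. This is a hypothetical user-space model: struct toy_user and its fields are illustrative stand-ins, not the fields of the kernel's per-user object, and page tables are ignored.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-user record (not the kernel's real per-user structure). */
struct toy_user {
    unsigned long locked_pages;  /* pages currently pinned on this user's behalf */
    unsigned long limit_pages;   /* e.g. RLIMIT_MEMLOCK expressed in pages */
};

/* Charge n pages before pinning them; refuse with -ENOMEM past the limit. */
static int toy_charge_locked(struct toy_user *u, unsigned long n)
{
    /* Written to avoid unsigned overflow in locked_pages + n, and to
     * refuse if the limit was lowered below what is already charged. */
    if (u->locked_pages > u->limit_pages ||
        n > u->limit_pages - u->locked_pages)
        return -ENOMEM;
    u->locked_pages += n;
    return 0;
}

/* Give the pages back when they are unpinned (munlock, pipe close, wakeup). */
static void toy_uncharge_locked(struct toy_user *u, unsigned long n)
{
    assert(u->locked_pages >= n);
    u->locked_pages -= n;
}
```

Because every path shares one counter, pages pinned through FUTEX_FD would eat into the same budget as mlock() instead of bypassing it.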



* RE: Users locking memory using futexes
@ 2002-11-12 17:57 Perez-Gonzalez, Inaky
  2002-11-12 19:17 ` Jamie Lokier
  0 siblings, 1 reply; 13+ messages in thread
From: Perez-Gonzalez, Inaky @ 2002-11-12 17:57 UTC (permalink / raw)
  To: 'Jamie Lokier'
  Cc: Rusty Russell, 'mingo@redhat.com', 'Mark Mielke',
	linux-kernel


> > Good thing is - I just found out after reading twice - that FUTEX_FD does
> > not lock the page in memory, so that is one case less to worry about.
> 
> Oh yes it does - the page isn't unpinned until wakeup or close.
> See where it says in futex_fd():
> 
> 	page = NULL;
> out:
> 	if (page)
> 		unpin_page(page);

Bang, bang, bang ... assshoooole [hearing whispers in my ears]. Great point:
Inaky 0, Jamie 1 - this will teach me to read _three_ times on Monday
evenings. I am supposed to know all that code by heart ... oh well.

> Rusty's got a good point about pipe() though.

He does; grumble, grumble ... let's see ... with pipe() there is an implicit
limit that constrains you, the number of open files, which you also hit with
futex_fd() (in ... get_unused_fd()), so that is covered. OTOH, with just
futex_wait(), if you set out to use one page per futex you block on, you are
also limited by RLIMIT_NPROC, since each blocked waiter is a separate task
[aside from wasting a lot of memory], so it looks like there is another
roadblock there to control it.
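Those implicit limits translate into a worst-case amount of pinned memory: at most one page per open file for FUTEX_FD, and at most one page per task blocked in futex_wait() (note that RLIMIT_NPROC is per user, not per process). A small sketch of that arithmetic, assuming one distinct pinned page per fd or per blocked task; pin_bound_bytes() and current_pin_bound() are hypothetical helpers, while getrlimit() and sysconf(_SC_PAGESIZE) are the standard calls. With 4 KiB pages, 1024 fds give 4 MiB, and 4000 pinned pages come to 4000 * 4096 = 16,384,000 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/resource.h>
#include <unistd.h>

/* Worst case: every unit of `limit` (an fd, or a blocked task) pins one
 * distinct page of `page_size` bytes. UINT64_MAX means "no bound". */
static uint64_t pin_bound_bytes(rlim_t limit, uint64_t page_size)
{
    assert(page_size > 0);
    if (limit == RLIM_INFINITY || (uint64_t)limit > UINT64_MAX / page_size)
        return UINT64_MAX;
    return (uint64_t)limit * page_size;
}

/* Apply the bound to the caller's soft limit for `resource`
 * (RLIMIT_NOFILE for FUTEX_FD, RLIMIT_NPROC for futex_wait() waiters).
 * Returns 0 if the limit or the page size cannot be queried. */
static uint64_t current_pin_bound(int resource)
{
    struct rlimit rl;
    long page_size = sysconf(_SC_PAGESIZE);

    if (page_size <= 0 || getrlimit(resource, &rl) != 0)
        return 0;
    return pin_bound_bytes(rl.rlim_cur, (uint64_t)page_size);
}
```

The bound is finite, but as Alan points out it is large enough to be "unbounded" in practice unless the limits are small.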

Hmm ... still, I want to try Ingo's approach with the ptes; that is the part
I was missing [knowing that struct page * is not invariant the way the pte
number is ... obvious as that is].

Inaky Perez-Gonzalez -- Not speaking for Intel - opinions are my own [or my
fault]



* Re: Users locking memory using futexes
  2002-11-12  9:11   ` Ingo Molnar
@ 2002-11-12 17:40     ` Jamie Lokier
  0 siblings, 0 replies; 13+ messages in thread
From: Jamie Lokier @ 2002-11-12 17:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Perez-Gonzalez, Inaky, Rusty Russell, 'Mark Mielke',
	linux-kernel

Ingo Molnar wrote:
> > It would be nice if the futex waitqueues could be re-hashed against swap
> > entries when pages are swapped out, somehow, but this sounds hard.
> 
> yes it sounds hard (and somewhat expensive). The simple solution would be
> to hash against the pte address, which is an invariant over swapout - but
> that breaks inter-process futexes. The hard way would be to rehash the
> futex at the pte address upon swapout, and rehash it with the new physical
> page upon swapin. The pte chain case has to be careful, and rehashing
> should only be done when the physical page is truly unmapped even in the
> last process context.

Can't it be hashed against (address space, offset) for shared
mappings, and against (mm, pte address) for private mappings?
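That two-way key could look roughly like the sketch below: shared mappings are identified by (address space, page offset), so every process mapping the same file page computes the same key, while private mappings use (mm, pte address). The struct and field names are hypothetical, the pointers are treated as opaque identities, and the hash is an ordinary multiplicative hash chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum toy_key_kind { TOY_KEY_SHARED, TOY_KEY_PRIVATE };

/* Hypothetical futex key; the pointers are only compared, never dereferenced. */
struct toy_futex_key {
    enum toy_key_kind kind;
    union {
        struct { const void *mapping; unsigned long pgoff; } shared; /* (address space, offset) */
        struct { const void *mm; const void *pte; } priv;            /* (mm, pte address) */
    } u;
    unsigned int offset;   /* byte offset of the futex word within its page */
};

static bool toy_key_equal(const struct toy_futex_key *a, const struct toy_futex_key *b)
{
    if (a->kind != b->kind || a->offset != b->offset)
        return false;
    if (a->kind == TOY_KEY_SHARED)
        return a->u.shared.mapping == b->u.shared.mapping &&
               a->u.shared.pgoff == b->u.shared.pgoff;
    return a->u.priv.mm == b->u.priv.mm && a->u.priv.pte == b->u.priv.pte;
}

/* Multiplicative (Fibonacci) hash into one of 2^bits buckets. */
static unsigned int toy_key_hash(const struct toy_futex_key *k, unsigned int bits)
{
    uint64_t h;

    assert(bits > 0 && bits < 32);
    if (k->kind == TOY_KEY_SHARED)
        h = (uint64_t)(uintptr_t)k->u.shared.mapping ^ ((uint64_t)k->u.shared.pgoff << 12);
    else
        h = (uint64_t)(uintptr_t)k->u.priv.mm ^ (uint64_t)(uintptr_t)k->u.priv.pte;
    h ^= k->offset;
    h *= 0x9E3779B97F4A7C15ULL;
    return (unsigned int)(h >> (64 - bits));
}
```

Neither key mentions a physical page, so both stay valid while the page is swapped out; as noted elsewhere in the thread, the private key still changes under mremap().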

-- Jamie


* Re: Users locking memory using futexes
  2002-11-12  3:46 ` Users locking memory using futexes Jamie Lokier
@ 2002-11-12 17:17   ` Rusty Russell
  2002-11-12 18:06     ` Alan Cox
  0 siblings, 1 reply; 13+ messages in thread
From: Rusty Russell @ 2002-11-12 17:17 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Perez-Gonzalez, Inaky, 'Mark Mielke', linux-kernel

In message <20021112034648.GA11766@bjl1.asuk.net> you write:
> Perez-Gonzalez, Inaky wrote:
> > [...] each time you lock a futex you are pinning the containing page
> > into physical memory, that would cause that if you have, for
> > example, 4000 futexes locked in 4000 different pages, there is going
> > to be 4 MB of memory locked in [...]
> 
> Ouch!  It looks to me like userspace can use FUTEX_FD to lock many
> pages of memory, achieving the same as mlock() but without the
> resource checks.
> 
> Denial of service attack?

See "pipe".

Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.


* Re: Users locking memory using futexes
  2002-11-12  5:21 ` Jamie Lokier
@ 2002-11-12  9:11   ` Ingo Molnar
  2002-11-12 17:40     ` Jamie Lokier
  0 siblings, 1 reply; 13+ messages in thread
From: Ingo Molnar @ 2002-11-12  9:11 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Perez-Gonzalez, Inaky, Rusty Russell, 'Mark Mielke',
	linux-kernel


On Tue, 12 Nov 2002, Jamie Lokier wrote:

> It would be nice if the futex waitqueues could be re-hashed against swap
> entries when pages are swapped out, somehow, but this sounds hard.

yes it sounds hard (and somewhat expensive). The simple solution would be
to hash against the pte address, which is an invariant over swapout - but
that breaks inter-process futexes. The hard way would be to rehash the
futex at the pte address upon swapout, and rehash it with the new physical
page upon swapin. The pte chain case has to be careful, and rehashing
should only be done when the physical page is truly unmapped even in the
last process context.

but this should indeed solve the page lockdown problem.
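The rehash described above could be modelled as moving every waiter from one hash identity to another: physical page -> pte address when the page is finally unmapped and swapped out, then pte address -> new physical page on swap-in. A hypothetical single-mapping toy (no locking, no pte chains; toy_waiter, toy_rehash() and the bucket layout are illustrative only):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NBUCKETS 64

/* A waiter remembers the identity it is currently hashed under. */
struct toy_waiter {
    const void *key;            /* physical page while resident, pte address while swapped */
    struct toy_waiter *next;
};

struct toy_table {
    struct toy_waiter *bucket[TOY_NBUCKETS];
};

static size_t toy_bucket(const void *key)
{
    return (size_t)(((uintptr_t)key >> 4) % TOY_NBUCKETS);
}

static void toy_enqueue(struct toy_table *t, struct toy_waiter *w, const void *key)
{
    size_t b = toy_bucket(key);

    assert(key != NULL);
    w->key = key;
    w->next = t->bucket[b];
    t->bucket[b] = w;
}

static size_t toy_count(const struct toy_table *t, const void *key)
{
    size_t n = 0;

    for (const struct toy_waiter *w = t->bucket[toy_bucket(key)]; w; w = w->next)
        if (w->key == key)
            n++;
    return n;
}

/* Move every waiter hashed under old_key so that it is hashed under new_key. */
static void toy_rehash(struct toy_table *t, const void *old_key, const void *new_key)
{
    struct toy_waiter **pp = &t->bucket[toy_bucket(old_key)];
    struct toy_waiter *moved = NULL;

    /* Unlink first, so re-inserting into the same bucket is never rescanned. */
    while (*pp) {
        struct toy_waiter *w = *pp;
        if (w->key == old_key) {
            *pp = w->next;
            w->next = moved;
            moved = w;
        } else {
            pp = &w->next;
        }
    }
    while (moved) {
        struct toy_waiter *w = moved;
        moved = w->next;
        toy_enqueue(t, w, new_key);
    }
}
```

In a real kernel the swapout rehash would only be valid once the page is unmapped from every process, which this single-mapping toy does not try to capture.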

	Ingo



* Re: Users locking memory using futexes
  2002-11-12  4:16 Perez-Gonzalez, Inaky
@ 2002-11-12  5:21 ` Jamie Lokier
  2002-11-12  9:11   ` Ingo Molnar
  0 siblings, 1 reply; 13+ messages in thread
From: Jamie Lokier @ 2002-11-12  5:21 UTC (permalink / raw)
  To: Perez-Gonzalez, Inaky
  Cc: Rusty Russell, 'mingo@redhat.com', 'Mark Mielke',
	linux-kernel

Perez-Gonzalez, Inaky wrote:
> This raises a good point - I guess we should be doing something like
> checking user limits (against locked memory, 'ulimit -l').

If futexes are limited by user limits, that's going to mean some
threading program gets a surprise when too many threads decide to
block on a resource.  That's really nasty.  (Of course, a program can
get a surprise due to just running out of memory in sys_futex() too,
but that's much rarer).

It would be nice if the futex waitqueues could be re-hashed against
swap entries when pages are swapped out, somehow, but this sounds hard.

-- Jamie


* RE: Users locking memory using futexes
@ 2002-11-12  4:16 Perez-Gonzalez, Inaky
  2002-11-12  5:21 ` Jamie Lokier
  0 siblings, 1 reply; 13+ messages in thread
From: Perez-Gonzalez, Inaky @ 2002-11-12  4:16 UTC (permalink / raw)
  To: 'Jamie Lokier', Rusty Russell
  Cc: 'mingo@redhat.com', 'Mark Mielke', linux-kernel


> Perez-Gonzalez, Inaky wrote:
> > [...] each time you lock a futex you are pinning the containing page
> > into physical memory, that would cause that if you have, for
> > example, 4000 futexes locked in 4000 different pages, there is going
> > to be 4 MB of memory locked in [...]
> 
> Ouch!  It looks to me like userspace can use FUTEX_FD to lock many
> pages of memory, achieving the same as mlock() but without the
> resource checks.

This raises a good point - I guess we should be doing something like
checking user limits (against locked memory, 'ulimit -l'). Something along
the lines of this [warning: quick-and-dirty draft, untested]:

diff -u futex.c.orig futex.c
--- futex.c.orig	2002-11-11 20:06:22.000000000 -0800
+++ futex.c	2002-11-11 20:08:48.000000000 -0800
@@ -261,8 +261,12 @@
 	struct page *page;
 	struct futex_q q;
 
+	if (current->mm->total_vm + 1 >
+            (current->rlim[RLIMIT_MEMLOCK].rlim_cur >> PAGE_SHIFT))
+          return -ENOMEM;
+        
 	init_waitqueue_head(&q.waiters);
-
+        
 	lock_futex_mm();
 
 	page = __pin_page(uaddr - offset);
@@ -358,6 +362,11 @@
 	if (signal < 0 || signal > _NSIG)
 		goto out;
 
+	ret = -ENOMEM;
+        if (current->mm->total_vm + 1 >
+            (current->rlim[RLIMIT_MEMLOCK].rlim_cur >> PAGE_SHIFT))
+          goto out;
+        
 	ret = get_unused_fd();
 	if (ret < 0)
 		goto out;

However, this could break programs that expect the memory they lock
themselves to be the only memory counted against the rlimit ...

What else could be done?

Inaky Perez-Gonzalez -- Not speaking for Intel - opinions are my own [or my
fault]



* Users locking memory using futexes
  2002-11-11 20:28 PROT_SEM + FUTEX Perez-Gonzalez, Inaky
@ 2002-11-12  3:46 ` Jamie Lokier
  2002-11-12 17:17   ` Rusty Russell
  0 siblings, 1 reply; 13+ messages in thread
From: Jamie Lokier @ 2002-11-12  3:46 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Perez-Gonzalez, Inaky, 'Mark Mielke', linux-kernel

Perez-Gonzalez, Inaky wrote:
> [...] each time you lock a futex you are pinning the containing page
> into physical memory, that would cause that if you have, for
> example, 4000 futexes locked in 4000 different pages, there is going
> to be 4 MB of memory locked in [...]

Ouch!  It looks to me like userspace can use FUTEX_FD to lock many
pages of memory, achieving the same as mlock() but without the
resource checks.

Denial of service attack?

-- Jamie


end of thread, other threads:[~2002-11-13 13:55 UTC | newest]

Thread overview: 13+ messages
2002-11-12  5:56 Users locking memory using futexes Perez-Gonzalez, Inaky
2002-11-12 17:36 ` Jamie Lokier
  -- strict thread matches above, loose matches on Subject: below --
2002-11-12 17:57 Perez-Gonzalez, Inaky
2002-11-12 19:17 ` Jamie Lokier
2002-11-12  4:16 Perez-Gonzalez, Inaky
2002-11-12  5:21 ` Jamie Lokier
2002-11-12  9:11   ` Ingo Molnar
2002-11-12 17:40     ` Jamie Lokier
2002-11-11 20:28 PROT_SEM + FUTEX Perez-Gonzalez, Inaky
2002-11-12  3:46 ` Users locking memory using futexes Jamie Lokier
2002-11-12 17:17   ` Rusty Russell
2002-11-12 18:06     ` Alan Cox
2002-11-12 18:13       ` Rusty Russell
2002-11-13 14:27         ` Alan Cox
