All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: Jeff Moyer <jmoyer@redhat.com>
Cc: linux-aio@kvack.org, zach.brown@oracle.com, bcrl@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [patch] aio: remove aio-max-nr and instead use the memlock rlimit to limit the number of pages pinned for the aio completion ring
Date: Mon, 9 Mar 2009 15:36:30 -0700	[thread overview]
Message-ID: <20090309153630.912d36a0.akpm@linux-foundation.org> (raw)
In-Reply-To: <x49bpsaelwa.fsf@segfault.boston.devel.redhat.com>

On Mon, 09 Mar 2009 11:49:57 -0400
Jeff Moyer <jmoyer@redhat.com> wrote:

> Hi,
> 
> Believe it or not, I get numerous questions from customers about the
> suggested tuning value of aio-max-nr.  aio-max-nr limits the total
> number of io events that can be reserved, system wide, for aio
> completions.  Each time io_setup is called, a ring buffer is allocated
> that can hold nr_events I/O completions.  That ring buffer is then
> mapped into the process' address space, and the pages are pinned in
> memory.  So, the reason for this upper limit (I believe) is to keep a
> malicious user from pinning all of kernel memory.  Now, this sounds like
> a much better job for the memlock rlimit to me, hence the following
> patch.
> 
> It's worth noting that, by default, aio-max-nr was set to 65536
> requests.  By default, the memlock rlimit is set to 64kb per process.
> With this patch in place, a single process can specify 2045 for the
> nr_events parameter of io_setup.  Or, for the worst case scenario, a
> process can only create 16 io contexts, each with a single event (since
> the minimum ring size is a page).
> 
> I created a simple program that sets the process' memlock rlimit and
> then drops the CAP_IPC_LOCK capability (the memlock rlimit is a command
> line parameter, so you can see how the nr_events allowable changes with
> different limits in place).  Then, it calls io_setup/io_destroy in a
> loop, increasing nr_events until it fails.  Finally, it calls io_setup
> in a loop with a single event to see how many iterations it can perform
> before receiving -EAGAIN.  I ran the test on a patched kernel and the
> results are in line with expectations.
> 
> I also ran the aio-dio-regress regression tests, and all passed but
> one.  The one that failed only failed due to an expected return code of
> -ENOMEM, and instead got -EAGAIN.  Part of the patch below returns a
> proper error code from aio_setup_ring.  So, I'm calling this a test
> issue, but we can debate this nuance if it is deemed important.
> 
> Further, as a result of this exercise, I noticed that there are numerous
> places in the kernel that test to see if locking memory will exceed the
> maximum amount of allowed locked memory.  I've factored out that bit of
> code in a follow-on patch that I will post.
> 
> Finally, I updated the aio man pages to reflect this change (and fix
> references to aio_context_t that should be io_context_t).
> 
> You can find a copy of the test program I used at:
>   http://people.redhat.com/jmoyer/aio/ioctx_alloc.c
> I'll apologize in advance for the crudity of the test code.
> 
> For those reviewing the below patch, it may be worth looking at:
> 
> commit d55b5fdaf40846221d543937b786956e27837fda
> Author: Zach Brown <zach.brown@oracle.com>
> Date:   Mon Nov 7 00:59:31 2005 -0800
> 
>     [PATCH] aio: remove aio_max_nr accounting race
> 
> The below patch basically reverts the above commit.  There is no
> accounting race now, because we are charging per-process rlimits instead
> of a system-wide maximum number of aio requests.  This has the added
> benefit of reducing some code complexity.

It's risky to simply remove an existing tunable.  What if someone's
mission-critical startup script which is provided by a super-slow or
even no-longer-in-business vendor does 

	if (write(aoi-max-nr, something) == error)
		crash_and_burn()

?

It would be prudent to have a more cautious update scheme.  Leave the
existing tunable in place.  Keep it working if possible.  If someone
uses it, blurt out a loud printk telling them that they're using a
deprecated interface and informing them how to update.

Then at some later time we can remove the old interface.

>  /*------ sysctl variables----*/
> -static DEFINE_SPINLOCK(aio_nr_lock);
> -unsigned long aio_nr;		/* current system wide number of aio requests */
> -unsigned long aio_max_nr = 0x10000; /* system wide maximum number of aio requests */
> +/* current system wide number of aio requests */
> +atomic_t aio_nr = ATOMIC_INIT(0);

Was it a conscious decision to downgrade this from a `long' type to an
`int' type?  It _could_ have used atomic_long_t.



  parent reply	other threads:[~2009-03-09 22:37 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-03-09 15:49 [patch] aio: remove aio-max-nr and instead use the memlock rlimit to limit the number of pages pinned for the aio completion ring Jeff Moyer
2009-03-09 15:54 ` [patch] factor out checks against the memlock rlimit Jeff Moyer
2009-03-09 15:59 ` [patch] man-pages: add documentation about the memlock implications of io_setup Jeff Moyer
2009-03-09 16:45   ` Michael Kerrisk
2009-03-09 16:48   ` Michael Kerrisk
2009-03-09 20:44     ` Jeff Moyer
2009-03-09 16:18 ` [patch] aio: remove aio-max-nr and instead use the memlock rlimit to limit the number of pages pinned for the aio completion ring Avi Kivity
2009-03-09 17:57   ` Jeff Moyer
2009-03-09 19:45     ` Avi Kivity
2009-03-09 20:36       ` Jamie Lokier
2009-03-10  8:36         ` Avi Kivity
2009-03-09 20:31     ` Eric Dumazet
2009-03-12  2:39       ` Eric Dumazet
2009-03-12  2:44         ` Benjamin LaHaise
2009-03-12  3:24           ` Eric Dumazet
2009-03-12  3:29             ` Benjamin LaHaise
2009-03-12  3:33               ` Eric Dumazet
2009-03-12  3:36                 ` Benjamin LaHaise
2009-03-12  3:40                   ` Eric Dumazet
2009-03-12  3:09         ` Eric Dumazet
2009-03-12  5:18           ` [PATCH] fs: fput() can be called from interrupt context Eric Dumazet
2009-03-12  5:42             ` [PATCH] aio: " Eric Dumazet
2009-03-12  5:47             ` [PATCH] fs: " Andrew Morton
2009-03-12  6:10               ` Eric Dumazet
2009-03-12  6:39                 ` Andrew Morton
2009-03-12 13:39                   ` Davide Libenzi
2009-03-13 22:34                     ` Davide Libenzi
2009-03-13 22:43                       ` Eric Dumazet
2009-03-13 23:28                     ` Trond Myklebust
2009-03-14  1:40                       ` Davide Libenzi
2009-03-14  4:02                         ` Trond Myklebust
2009-03-14 14:32                           ` Davide Libenzi
2009-03-15  1:36                             ` [patch] eventfd - remove fput() call from possible IRQ context Davide Libenzi
2009-03-15 17:44                               ` Benjamin LaHaise
2009-03-15 20:08                                 ` [patch] eventfd - remove fput() call from possible IRQ context (2nd rev) Davide Libenzi
2009-03-16 17:25                                   ` Jamie Lokier
2009-03-16 18:36                                     ` Davide Libenzi
2009-03-18 14:22                                   ` Jeff Moyer
2009-03-18 14:46                                     ` Davide Libenzi
2009-03-18 14:55                                     ` Eric Dumazet
2009-03-18 15:25                                       ` Jeff Moyer
2009-03-18 15:43                                         ` Eric Dumazet
2009-03-18 16:13                                           ` Jeff Moyer
2009-03-18 17:25                                     ` [patch] eventfd - remove fput() call from possible IRQ context (3rd rev) Davide Libenzi
2009-03-18 17:34                                       ` Jeff Moyer
2009-03-12 19:22                   ` [PATCH] fs: fput() can be called from interrupt context Eric Dumazet
2009-03-12 20:21                     ` Andrew Morton
2009-03-09 22:36 ` Andrew Morton [this message]
2009-03-10 13:43   ` [patch] aio: remove aio-max-nr and instead use the memlock rlimit to limit the number of pages pinned for the aio completion ring Jeff Moyer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090309153630.912d36a0.akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=bcrl@kvack.org \
    --cc=jmoyer@redhat.com \
    --cc=linux-aio@kvack.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=zach.brown@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.