All of lore.kernel.org
 help / color / mirror / Atom feed
* [RESEND PATCH v2 0/2] aio: decouple ctx's max_reqs and nr_events
@ 2017-05-24 19:36 Mauricio Faria de Oliveira
  2017-05-24 19:36 ` [RESEND PATCH v2 1/2] aio: make nr_events a parameter for aio_setup_ring() Mauricio Faria de Oliveira
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Mauricio Faria de Oliveira @ 2017-05-24 19:36 UTC (permalink / raw)
  To: bcrl, viro; +Cc: jmoyer, linux-aio, linux-fsdevel

Currently, using smaller nr_events in io_setup(), the max number of
allocation requests is inversely proportional to num_possible_cpus()
(which is a problem on larger/non-virtualized systems with many CPUs).

This happens because, internally, nr_events must be increased for the
percpu allocation scheme (according to the number of CPUs), but it is
being reflected on the max_reqs value (aio-nr), which is used to count
against the global limit (aio-max-nr).

This patchset decouples ctx->max_reqs and ctx->nr_events, so that the
extra nr_events which are internally required do not reflect into the
'aio-nr' counter which is externally visible & checked to 'aio-max-nr'.
    
This ensures that userspace can allocate most of the aio-max-nr value
that is made available/visible to it -- regardless of the number of
possible CPUs in the system.

P.S.: this patchset has been 'relatively acked' by Benjamin LaHaise [2],
      but for timing difficulties, it hasn't been queued nor re-submitted
      in a while; sorry.

v2:
 - adjust/move comment in PATCH 2/2 to commit message (Jeff Moyer [3])

Test-case:

     // 64k request (ie, aio-max-nr) to io_setup with nr_events = 1
     for (i = 0; i < 65536; i++)
         if (rc = io_setup(1, &ioctx[i]))
             break;

     printf("rc = %d, i = %d\n", rc, i);

  - on v4.12-rc2, it fails w/ -EAGAIN on request #102 (160 CPUs),
                  and aio-nr value is ~2x the aio-max-nr value (wrong):

    # ./test-case &
    rc = -11, i = 102

    # grep . /proc/sys/fs/aio-*
    /proc/sys/fs/aio-max-nr:65536
    /proc/sys/fs/aio-nr:130560


  - on v4.12-rc2 with patches, it fails w/ -ENOMEM on request #65515,
                 thus going way further toward the actual max limit:

    # ./test-case &
    rc = -12, i = 65515

    # grep . /proc/sys/fs/aio-*
    /proc/sys/fs/aio-max-nr:65536
    /proc/sys/fs/aio-nr:65515


Links:
[1] http://marc.info/?l=linux-aio&m=147562172703469&w=2
[2] http://marc.info/?l=linux-aio&m=147757894603034&w=2
[3] http://marc.info/?l=linux-aio&m=147769074702475&w=2

Mauricio Faria de Oliveira (2):
  aio: make nr_events a parameter for aio_setup_ring()
  aio: use ctx->max_reqs only for counting against the global limit

 fs/aio.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [RESEND PATCH v2 1/2] aio: make nr_events a parameter for aio_setup_ring()
  2017-05-24 19:36 [RESEND PATCH v2 0/2] aio: decouple ctx's max_reqs and nr_events Mauricio Faria de Oliveira
@ 2017-05-24 19:36 ` Mauricio Faria de Oliveira
  2017-05-24 19:36 ` [RESEND PATCH v2 2/2] aio: use ctx->max_reqs only for counting against the global limit Mauricio Faria de Oliveira
  2017-06-09  0:26 ` [RESEND PATCH v2 0/2] aio: decouple ctx's max_reqs and nr_events Mauricio Faria de Oliveira
  2 siblings, 0 replies; 4+ messages in thread
From: Mauricio Faria de Oliveira @ 2017-05-24 19:36 UTC (permalink / raw)
  To: bcrl, viro; +Cc: jmoyer, linux-aio, linux-fsdevel

In order to decouple ctx->max_reqs and ctx->nr_events, first remove
the aio_setup_ring() dependency on ctx->max_reqs -- which currently
is assigned the value of nr_events -- and make the function receive
the value of nr_events directly.

Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
---
 fs/aio.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index f52d925ee259..7c3c01f352c1 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -441,10 +441,9 @@ static int aio_migratepage(struct address_space *mapping, struct page *new,
 #endif
 };
 
-static int aio_setup_ring(struct kioctx *ctx)
+static int aio_setup_ring(struct kioctx *ctx, unsigned nr_events)
 {
 	struct aio_ring *ring;
-	unsigned nr_events = ctx->max_reqs;
 	struct mm_struct *mm = current->mm;
 	unsigned long size, unused;
 	int nr_pages;
@@ -753,7 +752,7 @@ static struct kioctx *ioctx_alloc(unsigned nr_events)
 	if (!ctx->cpu)
 		goto err;
 
-	err = aio_setup_ring(ctx);
+	err = aio_setup_ring(ctx, nr_events);
 	if (err < 0)
 		goto err;
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [RESEND PATCH v2 2/2] aio: use ctx->max_reqs only for counting against the global limit
  2017-05-24 19:36 [RESEND PATCH v2 0/2] aio: decouple ctx's max_reqs and nr_events Mauricio Faria de Oliveira
  2017-05-24 19:36 ` [RESEND PATCH v2 1/2] aio: make nr_events a parameter for aio_setup_ring() Mauricio Faria de Oliveira
@ 2017-05-24 19:36 ` Mauricio Faria de Oliveira
  2017-06-09  0:26 ` [RESEND PATCH v2 0/2] aio: decouple ctx's max_reqs and nr_events Mauricio Faria de Oliveira
  2 siblings, 0 replies; 4+ messages in thread
From: Mauricio Faria de Oliveira @ 2017-05-24 19:36 UTC (permalink / raw)
  To: bcrl, viro; +Cc: jmoyer, linux-aio, linux-fsdevel

Decouple ctx->max_reqs and ctx->nr_events; each one represents
a different side of the same coin -- userspace and kernelspace,
respectively.

Briefly, ctx->max_reqs represents what is userspace/externally
accessible by userspace; and ctx->nr_events represents what is
kernelspace/internally needed by the percpu allocation scheme.

With the percpu scheme, the original value of ctx->max_reqs from
userspace is changed (but still used to count against aio_max_nr)
based on num_possible_cpus(), and it may increase significantly
on systems with great num_possible_cpus() for smaller nr_events.

This eventually prevents userspace applications from getting the
actual value of aio_max_nr in the total requested nr_events.

ctx->max_reqs
=============

The ctx->max_reqs value once again aligns with its description:

  * This is what userspace passed to io_setup(), it's not used for
  * anything but counting against the global max_reqs quota.

It stores the original value of nr_events that userspace passed
to io_setup() (it's not increased to make room for requirements
of the percpu allocation scheme) - and is used to increment and
decrement the 'aio_nr' value, and to check against 'aio_max_nr'.

So, regardless of how many additional nr_events are internally
required for the percpu allocation scheme (e.g. make it 4x the
number of possible CPUs, and double it), userspace can get all
of the 'aio-max-nr' value that is made available/visible to it.

Another benefit is a consistent value in '/proc/sys/fs/aio-nr':
the sum of all values as requested by userspace, and it's less
than or equal to '/proc/sys/fs/aio-max-nr' again (not 2x it).

ctx->nr_events
==============

The ctx->nr_events value is the actual size of the ringbuffer/
number of slots, which may be more than what userspace passed
to io_setup() (depending on the requested value for nr_events
and/or calculations made in aio_setup_ring()) - as determined
by the percpu allocation scheme for its correct/fast behavior.

Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
---
 fs/aio.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 7c3c01f352c1..4967b0e1ef1a 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -706,6 +706,12 @@ static struct kioctx *ioctx_alloc(unsigned nr_events)
 	int err = -ENOMEM;
 
 	/*
+	 * Store the original value of nr_events from userspace for counting
+	 * against the global limit (aio_max_nr).
+	 */
+	unsigned max_reqs = nr_events;
+
+	/*
 	 * We keep track of the number of available ringbuffer slots, to prevent
 	 * overflow (reqs_available), and we also use percpu counters for this.
 	 *
@@ -723,14 +729,14 @@ static struct kioctx *ioctx_alloc(unsigned nr_events)
 		return ERR_PTR(-EINVAL);
 	}
 
-	if (!nr_events || (unsigned long)nr_events > (aio_max_nr * 2UL))
+	if (!nr_events || (unsigned long)max_reqs > aio_max_nr)
 		return ERR_PTR(-EAGAIN);
 
 	ctx = kmem_cache_zalloc(kioctx_cachep, GFP_KERNEL);
 	if (!ctx)
 		return ERR_PTR(-ENOMEM);
 
-	ctx->max_reqs = nr_events;
+	ctx->max_reqs = max_reqs;
 
 	spin_lock_init(&ctx->ctx_lock);
 	spin_lock_init(&ctx->completion_lock);
@@ -763,8 +769,8 @@ static struct kioctx *ioctx_alloc(unsigned nr_events)
 
 	/* limit the number of system wide aios */
 	spin_lock(&aio_nr_lock);
-	if (aio_nr + nr_events > (aio_max_nr * 2UL) ||
-	    aio_nr + nr_events < aio_nr) {
+	if (aio_nr + ctx->max_reqs > aio_max_nr ||
+	    aio_nr + ctx->max_reqs < aio_nr) {
 		spin_unlock(&aio_nr_lock);
 		err = -EAGAIN;
 		goto err_ctx;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [RESEND PATCH v2 0/2] aio: decouple ctx's max_reqs and nr_events
  2017-05-24 19:36 [RESEND PATCH v2 0/2] aio: decouple ctx's max_reqs and nr_events Mauricio Faria de Oliveira
  2017-05-24 19:36 ` [RESEND PATCH v2 1/2] aio: make nr_events a parameter for aio_setup_ring() Mauricio Faria de Oliveira
  2017-05-24 19:36 ` [RESEND PATCH v2 2/2] aio: use ctx->max_reqs only for counting against the global limit Mauricio Faria de Oliveira
@ 2017-06-09  0:26 ` Mauricio Faria de Oliveira
  2 siblings, 0 replies; 4+ messages in thread
From: Mauricio Faria de Oliveira @ 2017-06-09  0:26 UTC (permalink / raw)
  To: bcrl, viro, jmoyer; +Cc: linux-aio, linux-fsdevel

On 05/24/2017 04:36 PM, Mauricio Faria de Oliveira wrote:
> Mauricio Faria de Oliveira (2):
>    aio: make nr_events a parameter for aio_setup_ring()
>    aio: use ctx->max_reqs only for counting against the global limit

I'll resend this in a simpler patch, with subject:
"aio: fix the increment of aio-nr and counting against aio-max-nr"

Hope this helps.
Sorry for the extra email.

-- 
Mauricio Faria de Oliveira
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-06-09  0:27 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-24 19:36 [RESEND PATCH v2 0/2] aio: decouple ctx's max_reqs and nr_events Mauricio Faria de Oliveira
2017-05-24 19:36 ` [RESEND PATCH v2 1/2] aio: make nr_events a parameter for aio_setup_ring() Mauricio Faria de Oliveira
2017-05-24 19:36 ` [RESEND PATCH v2 2/2] aio: use ctx->max_reqs only for counting against the global limit Mauricio Faria de Oliveira
2017-06-09  0:26 ` [RESEND PATCH v2 0/2] aio: decouple ctx's max_reqs and nr_events Mauricio Faria de Oliveira

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.