From: Pekka Enberg <penberg@iki.fi>
To: Jann Horn <jannh@google.com>
Cc: Jens Axboe <axboe@kernel.dk>, io-uring <io-uring@vger.kernel.org>,
	Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>,
	joseph qi <joseph.qi@linux.alibaba.com>,
	Jiufei Xue <jiufei.xue@linux.alibaba.com>,
	Pavel Begunkov <asml.silence@gmail.com>,
	Christoph Lameter <cl@linux.com>,
	Pekka Enberg <penberg@kernel.org>,
	David Rientjes <rientjes@google.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH RFC} io_uring: io_kiocb alloc cache
Date: Wed, 13 May 2020 22:20:57 +0300
Message-ID: <20200513191919.GA10975@nero>
In-Reply-To: <CAG48ez0eGT60a50GAkL3FVvRzpXwhufdr+68k_X_qTgxyZ-oQQ@mail.gmail.com>


Hi,

On Wed, May 13, 2020 at 6:30 PM Jens Axboe <axboe@kernel.dk> wrote:
> > I turned the quick'n dirty from the other day into something a bit 
> > more done. Would be great if someone else could run some
> > performance testing with this, I get about a 10% boost on the pure
> > NOP benchmark with this. But that's just on my laptop in qemu, so
> > some real iron testing would be awesome.

On 5/13/20 8:42 PM, Jann Horn wrote:
> +slab allocator people
> 10% boost compared to which allocator? Are you using CONFIG_SLUB?
 
On Wed, May 13, 2020 at 6:30 PM Jens Axboe <axboe@kernel.dk> wrote:
> > The idea here is to have a percpu alloc cache. There's two sets of 
> > state:
> > 
> > 1) Requests that have IRQ completion. preempt disable is not
> > enough there, we need to disable local irqs. This is a lot slower
> > in certain setups, so we keep this separate.
> > 
> > 2) No IRQ completion, we can get by with just disabling preempt.

On 5/13/20 8:42 PM, Jann Horn wrote:
> +slab allocator people
> The SLUB allocator has percpu caching, too, and as long as you don't 
> enable any SLUB debugging or ASAN or such, and you're not hitting
> any slowpath processing, it doesn't even have to disable interrupts,
> it gets away with cmpxchg_double.

struct io_kiocb is 240 bytes. I don't see a dedicated slab for it in
/proc/slabinfo on my machine, so it was likely merged into the
kmalloc-256 cache. This means that there are 32 objects in the per-CPU
cache. Jens, on the other hand, made the cache much bigger:

+#define IO_KIOCB_CACHE_MAX 256

So I assume that if someone does "perf record", they will see a
significant reduction in page allocator activity with Jens' patch. One
possible way to get a similar effect from SLUB itself is to force the
slab page allocation order to be much higher. IOW, something like the
following completely untested patch:

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 979d9f977409..c3bf7b72026d 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8143,7 +8143,7 @@ static int __init io_uring_init(void)
 
 	BUILD_BUG_ON(ARRAY_SIZE(io_op_defs) != IORING_OP_LAST);
 	BUILD_BUG_ON(__REQ_F_LAST_BIT >= 8 * sizeof(int));
-	req_cachep = KMEM_CACHE(io_kiocb, SLAB_HWCACHE_ALIGN | SLAB_PANIC);
+	req_cachep = KMEM_CACHE(io_kiocb, SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_LARGE_ORDER);
 	return 0;
 };
 __initcall(io_uring_init);
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 6d454886bcaf..316fd821ec1f 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -39,6 +39,8 @@
 #define SLAB_STORE_USER		((slab_flags_t __force)0x00010000U)
 /* Panic if kmem_cache_create() fails */
 #define SLAB_PANIC		((slab_flags_t __force)0x00040000U)
+/* Force slab page allocation order to be as large as possible */
+#define SLAB_LARGE_ORDER	((slab_flags_t __force)0x00080000U)
 /*
  * SLAB_TYPESAFE_BY_RCU - **WARNING** READ THIS!
  *
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 23c7500eea7d..a18bbe9472e4 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -51,7 +51,7 @@ static DECLARE_WORK(slab_caches_to_rcu_destroy_work,
  */
 #define SLAB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
 		SLAB_TRACE | SLAB_TYPESAFE_BY_RCU | SLAB_NOLEAKTRACE | \
-		SLAB_FAILSLAB | SLAB_KASAN)
+		SLAB_FAILSLAB | SLAB_KASAN | SLAB_LARGE_ORDER)
 
 #define SLAB_MERGE_SAME (SLAB_RECLAIM_ACCOUNT | SLAB_CACHE_DMA | \
 			 SLAB_CACHE_DMA32 | SLAB_ACCOUNT)
diff --git a/mm/slub.c b/mm/slub.c
index b762450fc9f0..d1d86b1279aa 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3318,12 +3318,15 @@ static inline unsigned int slab_order(unsigned int size,
 	return order;
 }
 
-static inline int calculate_order(unsigned int size)
+static inline int calculate_order(unsigned int size, slab_flags_t flags)
 {
 	unsigned int order;
 	unsigned int min_objects;
 	unsigned int max_objects;
 
+	if (flags & SLAB_LARGE_ORDER)
+		return slub_max_order;
+
 	/*
 	 * Attempt to find best configuration for a slab. This
 	 * works by first attempting to generate a layout with
@@ -3651,7 +3654,7 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order)
 	if (forced_order >= 0)
 		order = forced_order;
 	else
-		order = calculate_order(size);
+		order = calculate_order(size, flags);
 
 	if ((int)order < 0)
 		return 0;

- Pekka