* [PATCH for-next v4 0/6] implement pcpu bio caching for IRQ I/O
@ 2022-11-02 15:18 Pavel Begunkov
  2022-11-02 15:18 ` [PATCH for-next v4 1/6] mempool: introduce mempool_is_saturated Pavel Begunkov
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Pavel Begunkov @ 2022-11-02 15:18 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: kernel-team, Pavel Begunkov

Add bio pcpu caching for IRQ-driven I/O by extending the REQ_ALLOC_CACHE
infra, which is currently limited to iopoll. Benchmarked with t/io_uring
and an Optane SSD: 2.22 -> 2.32 MIOPS for qd32 (+4.5%) and 2.60 -> 2.82
MIOPS for qd128 (+8.4%).

It works best with per-CPU queues; otherwise other effects come into
play, e.g. bios allocated by one CPU but freed by another. Even so, the
worst case (every put going to the mempool) doesn't show any performance
degradation.

Currently, it's only enabled for previous REQ_ALLOC_CACHE users but will
be turned on system-wide later.
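
For illustration, the core recycling idea can be sketched as a simplified
userspace model (all names and the cap below are made up for the sketch;
this is not the kernel code):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace model of a pcpu bio cache: allocation
 * prefers the per-cpu free list and falls back to the backing
 * allocator (standing in for the bioset's mempool); freeing
 * returns the bio to the cache up to a fixed cap. */
struct model_bio { struct model_bio *next; };

struct model_cache {
	struct model_bio *free_list;
	unsigned int nr;
};

#define MODEL_CACHE_MAX 4	/* illustrative cap, not the kernel's */

static struct model_bio *model_alloc(struct model_cache *c)
{
	struct model_bio *bio = c->free_list;

	if (bio) {				/* cache hit: pop the head */
		c->free_list = bio->next;
		c->nr--;
		return bio;
	}
	return malloc(sizeof(*bio));		/* miss: fall back to the pool */
}

static void model_free(struct model_cache *c, struct model_bio *bio)
{
	if (c->nr >= MODEL_CACHE_MAX) {		/* cache full: free for real */
		free(bio);
		return;
	}
	bio->next = c->free_list;		/* push head, LIFO like bio_put */
	c->free_list = bio;
	c->nr++;
}
```

A cache hit hands back the most recently freed (and so cache-hottest) bio,
while misses and overflow fall through to the backing allocator, mirroring
how the kernel cache falls back to the bioset's mempool.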

v2: fix botched splicing threshold checks
v3: remove merged patch; limit scope of flags var in bio_put_percpu_cache
v4: correct outdated comment
    fix in-irq put -> splice modifying the non-irq safe cache list
    fix alloc null dereference

Pavel Begunkov (6):
  mempool: introduce mempool_is_saturated
  bio: don't rob starving biosets of bios
  bio: split pcpu cache part of bio_put into a helper
  bio: add pcpu caching for non-polling bio_put
  bio: shrink max number of pcpu cached bios
  io_uring/rw: enable bio caches for IRQ rw

 block/bio.c             | 98 +++++++++++++++++++++++++++++++----------
 include/linux/mempool.h |  5 +++
 io_uring/rw.c           |  3 +-
 3 files changed, 82 insertions(+), 24 deletions(-)

-- 
2.38.0



* [PATCH for-next v4 1/6] mempool: introduce mempool_is_saturated
  2022-11-02 15:18 [PATCH for-next v4 0/6] implement pcpu bio caching for IRQ I/O Pavel Begunkov
@ 2022-11-02 15:18 ` Pavel Begunkov
  2022-11-02 15:18 ` [PATCH for-next v4 2/6] bio: don't rob starving biosets of bios Pavel Begunkov
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2022-11-02 15:18 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: kernel-team, Pavel Begunkov

Introduce a helper, mempool_is_saturated(), which tells whether the
mempool is filled up to its minimum reserve. We need it to decide whether
a freed element should go straight back into the mempool or can be kept
in higher-level caches.
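
The semantics of the check can be modelled in a few lines of userspace C
(illustrative names, not the kernel API):

```c
#include <assert.h>

/* Userspace sketch of the mempool_is_saturated() semantics. A pool
 * counts as saturated once its reserve holds at least min_nr
 * elements; only then is it safe to divert frees elsewhere. */
struct model_mempool {
	int curr_nr;	/* elements currently held in reserve */
	int min_nr;	/* guaranteed minimum for forward progress */
};

static int model_is_saturated(const struct model_mempool *pool)
{
	return pool->curr_nr >= pool->min_nr;
}
```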

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/linux/mempool.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/mempool.h b/include/linux/mempool.h
index 0c964ac107c2..4aae6c06c5f2 100644
--- a/include/linux/mempool.h
+++ b/include/linux/mempool.h
@@ -30,6 +30,11 @@ static inline bool mempool_initialized(mempool_t *pool)
 	return pool->elements != NULL;
 }
 
+static inline bool mempool_is_saturated(mempool_t *pool)
+{
+	return READ_ONCE(pool->curr_nr) >= pool->min_nr;
+}
+
 void mempool_exit(mempool_t *pool);
 int mempool_init_node(mempool_t *pool, int min_nr, mempool_alloc_t *alloc_fn,
 		      mempool_free_t *free_fn, void *pool_data,
-- 
2.38.0



* [PATCH for-next v4 2/6] bio: don't rob starving biosets of bios
  2022-11-02 15:18 [PATCH for-next v4 0/6] implement pcpu bio caching for IRQ I/O Pavel Begunkov
  2022-11-02 15:18 ` [PATCH for-next v4 1/6] mempool: introduce mempool_is_saturated Pavel Begunkov
@ 2022-11-02 15:18 ` Pavel Begunkov
  2022-11-02 15:18 ` [PATCH for-next v4 3/6] bio: split pcpu cache part of bio_put into a helper Pavel Begunkov
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2022-11-02 15:18 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: kernel-team, Pavel Begunkov

Biosets keep a mempool, so as long as requests complete we can always
allocate and have forward progress. Percpu bio caches break that
assumption, as we may complete into the cache of one CPU and later try,
and fail, to allocate from another CPU. We also can't grab bios from
another CPU's cache without tricky synchronisation.

If we're allocating a bio while the mempool is undersaturated, remove the
REQ_ALLOC_CACHE flag, so on put the bio will go straight to the mempool.
This might free more bios into the mempool than strictly required, but
assuming there is no memory starvation in the system it'll stabilise and
never hit that path again.
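
The allocation-side behaviour described above can be sketched in
userspace C (a simplified model with hypothetical names and flag values):

```c
#include <assert.h>

/* Sketch of the allocation-side fix: if the backing pool is below
 * its reserve, strip the "cacheable" flag so the eventual put
 * refills the pool instead of a per-cpu cache. The flag bit is
 * illustrative, not the real REQ_ALLOC_CACHE value. */
#define MODEL_REQ_ALLOC_CACHE	(1u << 0)

struct model_pool { int curr_nr, min_nr; };

static unsigned int model_fixup_flags(unsigned int opf,
				      const struct model_pool *pool)
{
	if (pool->curr_nr < pool->min_nr)	/* pool undersaturated */
		opf &= ~MODEL_REQ_ALLOC_CACHE;	/* put -> mempool */
	return opf;
}
```

Since the flag is carried by the bio itself, the decision made at alloc
time automatically steers the matching put, with no extra state needed.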

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 block/bio.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index 57c2f327225b..8afc3e78beff 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -526,6 +526,8 @@ struct bio *bio_alloc_bioset(struct block_device *bdev, unsigned short nr_vecs,
 	}
 	if (unlikely(!p))
 		return NULL;
+	if (!mempool_is_saturated(&bs->bio_pool))
+		opf &= ~REQ_ALLOC_CACHE;
 
 	bio = p + bs->front_pad;
 	if (nr_vecs > BIO_INLINE_VECS) {
-- 
2.38.0



* [PATCH for-next v4 3/6] bio: split pcpu cache part of bio_put into a helper
  2022-11-02 15:18 [PATCH for-next v4 0/6] implement pcpu bio caching for IRQ I/O Pavel Begunkov
  2022-11-02 15:18 ` [PATCH for-next v4 1/6] mempool: introduce mempool_is_saturated Pavel Begunkov
  2022-11-02 15:18 ` [PATCH for-next v4 2/6] bio: don't rob starving biosets of bios Pavel Begunkov
@ 2022-11-02 15:18 ` Pavel Begunkov
  2022-11-02 15:18 ` [PATCH for-next v4 4/6] bio: add pcpu caching for non-polling bio_put Pavel Begunkov
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2022-11-02 15:18 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: kernel-team, Pavel Begunkov

Extract a helper out of bio_put for recycling bios into percpu caches.
It's a preparation patch without functional changes.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 block/bio.c | 38 +++++++++++++++++++++++++-------------
 1 file changed, 25 insertions(+), 13 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 8afc3e78beff..f99d27566839 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -727,6 +727,28 @@ static void bio_alloc_cache_destroy(struct bio_set *bs)
 	bs->cache = NULL;
 }
 
+static inline void bio_put_percpu_cache(struct bio *bio)
+{
+	struct bio_alloc_cache *cache;
+
+	cache = per_cpu_ptr(bio->bi_pool->cache, get_cpu());
+	bio_uninit(bio);
+
+	if ((bio->bi_opf & REQ_POLLED) && !WARN_ON_ONCE(in_interrupt())) {
+		bio->bi_next = cache->free_list;
+		cache->free_list = bio;
+		cache->nr++;
+	} else {
+		put_cpu();
+		bio_free(bio);
+		return;
+	}
+
+	if (cache->nr > ALLOC_CACHE_MAX + ALLOC_CACHE_SLACK)
+		bio_alloc_cache_prune(cache, ALLOC_CACHE_SLACK);
+	put_cpu();
+}
+
 /**
  * bio_put - release a reference to a bio
  * @bio:   bio to release reference to
@@ -742,20 +764,10 @@ void bio_put(struct bio *bio)
 		if (!atomic_dec_and_test(&bio->__bi_cnt))
 			return;
 	}
-
-	if ((bio->bi_opf & REQ_ALLOC_CACHE) && !WARN_ON_ONCE(in_interrupt())) {
-		struct bio_alloc_cache *cache;
-
-		bio_uninit(bio);
-		cache = per_cpu_ptr(bio->bi_pool->cache, get_cpu());
-		bio->bi_next = cache->free_list;
-		cache->free_list = bio;
-		if (++cache->nr > ALLOC_CACHE_MAX + ALLOC_CACHE_SLACK)
-			bio_alloc_cache_prune(cache, ALLOC_CACHE_SLACK);
-		put_cpu();
-	} else {
+	if (bio->bi_opf & REQ_ALLOC_CACHE)
+		bio_put_percpu_cache(bio);
+	else
 		bio_free(bio);
-	}
 }
 EXPORT_SYMBOL(bio_put);
 
-- 
2.38.0



* [PATCH for-next v4 4/6] bio: add pcpu caching for non-polling bio_put
  2022-11-02 15:18 [PATCH for-next v4 0/6] implement pcpu bio caching for IRQ I/O Pavel Begunkov
                   ` (2 preceding siblings ...)
  2022-11-02 15:18 ` [PATCH for-next v4 3/6] bio: split pcpu cache part of bio_put into a helper Pavel Begunkov
@ 2022-11-02 15:18 ` Pavel Begunkov
  2022-11-02 15:18 ` [PATCH for-next v4 5/6] bio: shrink max number of pcpu cached bios Pavel Begunkov
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2022-11-02 15:18 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: kernel-team, Pavel Begunkov

This patch extends REQ_ALLOC_CACHE to IRQ completions, whereas currently
it's limited to iopoll. Instead of guarding the list with irq toggling on
alloc, which is expensive, it keeps an additional irq-safe list from
which bios are spliced in batches to amortise the overhead. The put side
does toggle irqs, but in many cases they're already disabled and so it's
cheap.
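
The two-list scheme can be modelled in userspace C (a single-threaded
sketch with invented names; in the kernel the irq-side sections are
bracketed with local_irq_save()/local_irq_restore(), which is only noted
in comments here):

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of the two-list scheme: process context pops from
 * free_list without irq toggling; IRQ-context frees go to a separate
 * free_list_irq, which the allocator splices over in one batch once
 * it crosses a threshold. */
struct node { struct node *next; };

struct two_list_cache {
	struct node *free_list;		/* process-context only */
	struct node *free_list_irq;	/* touched with irqs off */
	unsigned int nr, nr_irq;
};

#define SPLICE_THRESHOLD 2	/* illustrative, kernel uses 16 */

static void irq_side_free(struct two_list_cache *c, struct node *n)
{
	/* kernel: local_irq_save() around this push */
	n->next = c->free_list_irq;
	c->free_list_irq = n;
	c->nr_irq++;
}

static void splice_irq_list(struct two_list_cache *c)
{
	/* kernel: also an irq-off section */
	c->free_list = c->free_list_irq;
	c->free_list_irq = NULL;
	c->nr += c->nr_irq;
	c->nr_irq = 0;
}

static struct node *cache_pop(struct two_list_cache *c)
{
	if (!c->free_list && c->nr_irq >= SPLICE_THRESHOLD)
		splice_irq_list(c);
	if (!c->free_list)
		return NULL;	/* caller falls back to the mempool */
	struct node *n = c->free_list;

	c->free_list = n->next;
	c->nr--;
	return n;
}
```

Splicing only when the main list is empty and the irq list has built up a
batch is what keeps the irq-off window off the common alloc path.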

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 block/bio.c | 70 +++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 54 insertions(+), 16 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index f99d27566839..d989e45583ac 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -25,9 +25,15 @@
 #include "blk-rq-qos.h"
 #include "blk-cgroup.h"
 
+#define ALLOC_CACHE_THRESHOLD	16
+#define ALLOC_CACHE_SLACK	64
+#define ALLOC_CACHE_MAX		512
+
 struct bio_alloc_cache {
 	struct bio		*free_list;
+	struct bio		*free_list_irq;
 	unsigned int		nr;
+	unsigned int		nr_irq;
 };
 
 static struct biovec_slab {
@@ -408,6 +414,22 @@ static void punt_bios_to_rescuer(struct bio_set *bs)
 	queue_work(bs->rescue_workqueue, &bs->rescue_work);
 }
 
+static void bio_alloc_irq_cache_splice(struct bio_alloc_cache *cache)
+{
+	unsigned long flags;
+
+	/* cache->free_list must be empty */
+	if (WARN_ON_ONCE(cache->free_list))
+		return;
+
+	local_irq_save(flags);
+	cache->free_list = cache->free_list_irq;
+	cache->free_list_irq = NULL;
+	cache->nr += cache->nr_irq;
+	cache->nr_irq = 0;
+	local_irq_restore(flags);
+}
+
 static struct bio *bio_alloc_percpu_cache(struct block_device *bdev,
 		unsigned short nr_vecs, blk_opf_t opf, gfp_t gfp,
 		struct bio_set *bs)
@@ -417,8 +439,12 @@ static struct bio *bio_alloc_percpu_cache(struct block_device *bdev,
 
 	cache = per_cpu_ptr(bs->cache, get_cpu());
 	if (!cache->free_list) {
-		put_cpu();
-		return NULL;
+		if (READ_ONCE(cache->nr_irq) >= ALLOC_CACHE_THRESHOLD)
+			bio_alloc_irq_cache_splice(cache);
+		if (!cache->free_list) {
+			put_cpu();
+			return NULL;
+		}
 	}
 	bio = cache->free_list;
 	cache->free_list = bio->bi_next;
@@ -462,9 +488,6 @@ static struct bio *bio_alloc_percpu_cache(struct block_device *bdev,
  * submit_bio_noacct() should be avoided - instead, use bio_set's front_pad
  * for per bio allocations.
  *
- * If REQ_ALLOC_CACHE is set, the final put of the bio MUST be done from process
- * context, not hard/soft IRQ.
- *
  * Returns: Pointer to new bio on success, NULL on failure.
  */
 struct bio *bio_alloc_bioset(struct block_device *bdev, unsigned short nr_vecs,
@@ -678,11 +701,8 @@ void guard_bio_eod(struct bio *bio)
 	bio_truncate(bio, maxsector << 9);
 }
 
-#define ALLOC_CACHE_MAX		512
-#define ALLOC_CACHE_SLACK	 64
-
-static void bio_alloc_cache_prune(struct bio_alloc_cache *cache,
-				  unsigned int nr)
+static int __bio_alloc_cache_prune(struct bio_alloc_cache *cache,
+				   unsigned int nr)
 {
 	unsigned int i = 0;
 	struct bio *bio;
@@ -694,6 +714,17 @@ static void bio_alloc_cache_prune(struct bio_alloc_cache *cache,
 		if (++i == nr)
 			break;
 	}
+	return i;
+}
+
+static void bio_alloc_cache_prune(struct bio_alloc_cache *cache,
+				  unsigned int nr)
+{
+	nr -= __bio_alloc_cache_prune(cache, nr);
+	if (!READ_ONCE(cache->free_list)) {
+		bio_alloc_irq_cache_splice(cache);
+		__bio_alloc_cache_prune(cache, nr);
+	}
 }
 
 static int bio_cpu_dead(unsigned int cpu, struct hlist_node *node)
@@ -732,6 +763,12 @@ static inline void bio_put_percpu_cache(struct bio *bio)
 	struct bio_alloc_cache *cache;
 
 	cache = per_cpu_ptr(bio->bi_pool->cache, get_cpu());
+	if (READ_ONCE(cache->nr_irq) + cache->nr > ALLOC_CACHE_MAX) {
+		put_cpu();
+		bio_free(bio);
+		return;
+	}
+
 	bio_uninit(bio);
 
 	if ((bio->bi_opf & REQ_POLLED) && !WARN_ON_ONCE(in_interrupt())) {
@@ -739,13 +776,14 @@ static inline void bio_put_percpu_cache(struct bio *bio)
 		cache->free_list = bio;
 		cache->nr++;
 	} else {
-		put_cpu();
-		bio_free(bio);
-		return;
-	}
+		unsigned long flags;
 
-	if (cache->nr > ALLOC_CACHE_MAX + ALLOC_CACHE_SLACK)
-		bio_alloc_cache_prune(cache, ALLOC_CACHE_SLACK);
+		local_irq_save(flags);
+		bio->bi_next = cache->free_list_irq;
+		cache->free_list_irq = bio;
+		cache->nr_irq++;
+		local_irq_restore(flags);
+	}
 	put_cpu();
 }
 
-- 
2.38.0



* [PATCH for-next v4 5/6] bio: shrink max number of pcpu cached bios
  2022-11-02 15:18 [PATCH for-next v4 0/6] implement pcpu bio caching for IRQ I/O Pavel Begunkov
                   ` (3 preceding siblings ...)
  2022-11-02 15:18 ` [PATCH for-next v4 4/6] bio: add pcpu caching for non-polling bio_put Pavel Begunkov
@ 2022-11-02 15:18 ` Pavel Begunkov
  2022-11-02 15:18 ` [PATCH for-next v4 6/6] io_uring/rw: enable bio caches for IRQ rw Pavel Begunkov
  2022-11-16 18:49 ` [PATCH for-next v4 0/6] implement pcpu bio caching for IRQ I/O Jens Axboe
  6 siblings, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2022-11-02 15:18 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: kernel-team, Pavel Begunkov

The downside of the bio pcpu cache is that bios of a CPU will never be
freed unless there is new I/O issued from that CPU. We currently keep up
to 512 bios, which feels like too much; halve it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 block/bio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/bio.c b/block/bio.c
index d989e45583ac..6277a2f68ab8 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -27,7 +27,7 @@
 
 #define ALLOC_CACHE_THRESHOLD	16
 #define ALLOC_CACHE_SLACK	64
-#define ALLOC_CACHE_MAX		512
+#define ALLOC_CACHE_MAX		256
 
 struct bio_alloc_cache {
 	struct bio		*free_list;
-- 
2.38.0



* [PATCH for-next v4 6/6] io_uring/rw: enable bio caches for IRQ rw
  2022-11-02 15:18 [PATCH for-next v4 0/6] implement pcpu bio caching for IRQ I/O Pavel Begunkov
                   ` (4 preceding siblings ...)
  2022-11-02 15:18 ` [PATCH for-next v4 5/6] bio: shrink max number of pcpu cached bios Pavel Begunkov
@ 2022-11-02 15:18 ` Pavel Begunkov
  2022-11-16 18:49 ` [PATCH for-next v4 0/6] implement pcpu bio caching for IRQ I/O Jens Axboe
  6 siblings, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2022-11-02 15:18 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: kernel-team, Pavel Begunkov

Now we can use IOCB_ALLOC_CACHE not only for iopoll'ed reads/writes but
also for normal IRQ-driven I/O.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/rw.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/io_uring/rw.c b/io_uring/rw.c
index bb47cc4da713..5c91cc80b348 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -665,6 +665,7 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode)
 	ret = kiocb_set_rw_flags(kiocb, rw->flags);
 	if (unlikely(ret))
 		return ret;
+	kiocb->ki_flags |= IOCB_ALLOC_CACHE;
 
 	/*
 	 * If the file is marked O_NONBLOCK, still allow retry for it if it
@@ -680,7 +681,7 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode)
 			return -EOPNOTSUPP;
 
 		kiocb->private = NULL;
-		kiocb->ki_flags |= IOCB_HIPRI | IOCB_ALLOC_CACHE;
+		kiocb->ki_flags |= IOCB_HIPRI;
 		kiocb->ki_complete = io_complete_rw_iopoll;
 		req->iopoll_completed = 0;
 	} else {
-- 
2.38.0



* Re: [PATCH for-next v4 0/6] implement pcpu bio caching for IRQ I/O
  2022-11-02 15:18 [PATCH for-next v4 0/6] implement pcpu bio caching for IRQ I/O Pavel Begunkov
                   ` (5 preceding siblings ...)
  2022-11-02 15:18 ` [PATCH for-next v4 6/6] io_uring/rw: enable bio caches for IRQ rw Pavel Begunkov
@ 2022-11-16 18:49 ` Jens Axboe
  6 siblings, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2022-11-16 18:49 UTC (permalink / raw)
  To: Pavel Begunkov, linux-block; +Cc: kernel-team

On Wed, 2 Nov 2022 15:18:18 +0000, Pavel Begunkov wrote:
> Add bio pcpu caching for IRQ-driven I/O. We extend the currently limited to
> iopoll REQ_ALLOC_CACHE infra. Benchmarked with t/io_uring and an Optane SSD:
> 2.22 -> 2.32 MIOPS for qd32 (+4.5%) and 2.60 vs 2.82 for qd128 (+8.4%).
> 
> Works best with per-cpu queues, otherwise there might be some effects at
> play, e.g. bios allocated by one cpu but freed by another, but the worst
> case (always goes to mempool) doesn't show any performance degradation.
> 
> [...]

Applied, thanks!

[1/6] mempool: introduce mempool_is_saturated
      commit: 6e4068a11413b96687a03c39814539e202de294b
[2/6] bio: don't rob starving biosets of bios
      commit: 759aa12f19155fe4e4fb4740450b4aa4233b7d9f
[3/6] bio: split pcpu cache part of bio_put into a helper
      commit: f25cf75a452150c243f74ab1f1836822137d5d2c
[4/6] bio: add pcpu caching for non-polling bio_put
      commit: b99182c501c3dedeba4c9e6c92a60df1a2fee119
[5/6] bio: shrink max number of pcpu cached bios
      commit: 42b2b2fb6ecf1cc11eb7e75782dd7a7a17e6d958
[6/6] io_uring/rw: enable bio caches for IRQ rw
      commit: 12e4e8c7ab5978eb56f9d363461a8a40a8618bf4

Best regards,
-- 
Jens Axboe



