From: Pavel Machek <pavel@ucw.cz>
To: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Mike Snitzer <snitzer@redhat.com>,
kernel list <linux-kernel@vger.kernel.org>,
axboe@fb.com, hch@lst.de, neilb@suse.de,
martin.petersen@oracle.com, dpark@posteo.net,
ming.l@ssi.samsung.com, dm-devel@redhat.com,
ming.lei@canonical.com, agk@redhat.com, jkosina@suse.cz,
geoff@infradead.org, jim@jtan.com, pjk1939@linux.vnet.ibm.com,
minchan@kernel.org, ngupta@vflare.org, oleg.drokin@intel.com,
andreas.dilger@intel.com
Subject: Re: v4.9, 4.4-final: 28 bioset threads on small notebook, 36 threads on cellphone
Date: Tue, 7 Feb 2017 21:39:11 +0100
Message-ID: <20170207203911.GA16980@amd>
In-Reply-To: <20170207024906.4oswyuvxfnqkvbhr@moria.home.lan>
On Mon 2017-02-06 17:49:06, Kent Overstreet wrote:
> On Mon, Feb 06, 2017 at 04:47:24PM -0900, Kent Overstreet wrote:
> > On Mon, Feb 06, 2017 at 01:53:09PM +0100, Pavel Machek wrote:
> > > Still there on v4.9, 36 threads on nokia n900 cellphone.
> > >
> > > So.. what needs to be done there?
>
> > But, I just got an idea for how to handle this that might be halfway sane, maybe
> > I'll try and come up with a patch...
>
> Ok, here's such a patch, only lightly tested:
I guess it would be nice for me to test it... but what is it against?
I tried applying it on top of v4.10-rc5 and linux-next, but got rejects
in both cases.
Thanks,
Pavel
> -- >8 --
> Subject: [PATCH] block: Make rescuer threads per request_queue, not per bioset
>
> Note: this patch is very lightly tested.
>
> Also, trigger rescuing whenever we block with bios on
> current->bio_list, instead of only when we block in
> bio_alloc_bioset(). This is more correct, and should result in fewer
> rescuer threads.
>
> XXX: The current->bio_list plugging needs to be unified with the
> blk_plug mechanism.
>
> TODO: If we change normal request_queue drivers to handle arbitrary size
> bios by processing requests incrementally, instead of splitting bios,
> then we can get rid of rescuer threads from those devices.
> ---
> block/bio.c | 107 ++++---------------------------------------------
> block/blk-core.c | 58 ++++++++++++++++++++++++---
> block/blk-sysfs.c | 2 +
> include/linux/bio.h | 16 ++++----
> include/linux/blkdev.h | 10 +++++
> include/linux/sched.h | 2 +-
> kernel/sched/core.c | 4 ++
> 7 files changed, 83 insertions(+), 116 deletions(-)
>
> diff --git a/block/bio.c b/block/bio.c
> index f3b5786202..9ad54a9b12 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -336,54 +336,6 @@ void bio_chain(struct bio *bio, struct bio *parent)
> }
> EXPORT_SYMBOL(bio_chain);
>
> -static void bio_alloc_rescue(struct work_struct *work)
> -{
> - struct bio_set *bs = container_of(work, struct bio_set, rescue_work);
> - struct bio *bio;
> -
> - while (1) {
> - spin_lock(&bs->rescue_lock);
> - bio = bio_list_pop(&bs->rescue_list);
> - spin_unlock(&bs->rescue_lock);
> -
> - if (!bio)
> - break;
> -
> - generic_make_request(bio);
> - }
> -}
> -
> -static void punt_bios_to_rescuer(struct bio_set *bs)
> -{
> - struct bio_list punt, nopunt;
> - struct bio *bio;
> -
> - /*
> - * In order to guarantee forward progress we must punt only bios that
> - * were allocated from this bio_set; otherwise, if there was a bio on
> - * there for a stacking driver higher up in the stack, processing it
> - * could require allocating bios from this bio_set, and doing that from
> - * our own rescuer would be bad.
> - *
> - * Since bio lists are singly linked, pop them all instead of trying to
> - * remove from the middle of the list:
> - */
> -
> - bio_list_init(&punt);
> - bio_list_init(&nopunt);
> -
> - while ((bio = bio_list_pop(current->bio_list)))
> - bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio);
> -
> - *current->bio_list = nopunt;
> -
> - spin_lock(&bs->rescue_lock);
> - bio_list_merge(&bs->rescue_list, &punt);
> - spin_unlock(&bs->rescue_lock);
> -
> - queue_work(bs->rescue_workqueue, &bs->rescue_work);
> -}
> -
> /**
> * bio_alloc_bioset - allocate a bio for I/O
> * @gfp_mask: the GFP_ mask given to the slab allocator
> @@ -421,54 +373,27 @@ static void punt_bios_to_rescuer(struct bio_set *bs)
> */
> struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
> {
> - gfp_t saved_gfp = gfp_mask;
> unsigned front_pad;
> unsigned inline_vecs;
> struct bio_vec *bvl = NULL;
> struct bio *bio;
> void *p;
>
> - if (!bs) {
> - if (nr_iovecs > UIO_MAXIOV)
> - return NULL;
> + WARN(current->bio_list &&
> + !current->bio_list->q->rescue_workqueue,
> + "allocating bio beneath generic_make_request() without rescuer");
>
> + if (nr_iovecs > UIO_MAXIOV)
> + return NULL;
> +
> + if (!bs) {
> p = kmalloc(sizeof(struct bio) +
> nr_iovecs * sizeof(struct bio_vec),
> gfp_mask);
> front_pad = 0;
> inline_vecs = nr_iovecs;
> } else {
> - /*
> - * generic_make_request() converts recursion to iteration; this
> - * means if we're running beneath it, any bios we allocate and
> - * submit will not be submitted (and thus freed) until after we
> - * return.
> - *
> - * This exposes us to a potential deadlock if we allocate
> - * multiple bios from the same bio_set() while running
> - * underneath generic_make_request(). If we were to allocate
> - * multiple bios (say a stacking block driver that was splitting
> - * bios), we would deadlock if we exhausted the mempool's
> - * reserve.
> - *
> - * We solve this, and guarantee forward progress, with a rescuer
> - * workqueue per bio_set. If we go to allocate and there are
> - * bios on current->bio_list, we first try the allocation
> - * without __GFP_DIRECT_RECLAIM; if that fails, we punt those
> - * bios we would be blocking to the rescuer workqueue before
> - * we retry with the original gfp_flags.
> - */
> -
> - if (current->bio_list && !bio_list_empty(current->bio_list))
> - gfp_mask &= ~__GFP_DIRECT_RECLAIM;
> -
> p = mempool_alloc(&bs->bio_pool, gfp_mask);
> - if (!p && gfp_mask != saved_gfp) {
> - punt_bios_to_rescuer(bs);
> - gfp_mask = saved_gfp;
> - p = mempool_alloc(&bs->bio_pool, gfp_mask);
> - }
> -
> front_pad = bs->front_pad;
> inline_vecs = BIO_INLINE_VECS;
> }
> @@ -483,12 +408,6 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
> unsigned long idx = 0;
>
> bvl = bvec_alloc(gfp_mask, nr_iovecs, &idx, &bs->bvec_pool);
> - if (!bvl && gfp_mask != saved_gfp) {
> - punt_bios_to_rescuer(bs);
> - gfp_mask = saved_gfp;
> - bvl = bvec_alloc(gfp_mask, nr_iovecs, &idx, &bs->bvec_pool);
> - }
> -
> if (unlikely(!bvl))
> goto err_free;
>
> @@ -1938,10 +1857,6 @@ int biovec_init_pool(mempool_t *pool, int pool_entries)
>
> void bioset_exit(struct bio_set *bs)
> {
> - if (bs->rescue_workqueue)
> - destroy_workqueue(bs->rescue_workqueue);
> - bs->rescue_workqueue = NULL;
> -
> mempool_exit(&bs->bio_pool);
> mempool_exit(&bs->bvec_pool);
>
> @@ -1968,10 +1883,6 @@ static int __bioset_init(struct bio_set *bs,
>
> bs->front_pad = front_pad;
>
> - spin_lock_init(&bs->rescue_lock);
> - bio_list_init(&bs->rescue_list);
> - INIT_WORK(&bs->rescue_work, bio_alloc_rescue);
> -
> bs->bio_slab = bio_find_or_create_slab(front_pad + back_pad);
> if (!bs->bio_slab)
> return -ENOMEM;
> @@ -1983,10 +1894,6 @@ static int __bioset_init(struct bio_set *bs,
> biovec_init_pool(&bs->bvec_pool, pool_size))
> goto bad;
>
> - bs->rescue_workqueue = alloc_workqueue("bioset", WQ_MEM_RECLAIM, 0);
> - if (!bs->rescue_workqueue)
> - goto bad;
> -
> return 0;
> bad:
> bioset_exit(bs);
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 7e3cfa9c88..f716164cb3 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -48,6 +48,8 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(block_unplug);
>
> DEFINE_IDA(blk_queue_ida);
>
> +static void bio_rescue_work(struct work_struct *);
> +
> /*
> * For the allocated request tables
> */
> @@ -759,11 +761,21 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
> PERCPU_REF_INIT_ATOMIC, GFP_KERNEL))
> goto fail_bdi;
>
> - if (blkcg_init_queue(q))
> + spin_lock_init(&q->rescue_lock);
> + bio_list_init(&q->rescue_list);
> + INIT_WORK(&q->rescue_work, bio_rescue_work);
> +
> + q->rescue_workqueue = alloc_workqueue("bioset", WQ_MEM_RECLAIM, 0);
> + if (!q->rescue_workqueue)
> goto fail_ref;
>
> + if (blkcg_init_queue(q))
> + goto fail_rescue;
> +
> return q;
>
> +fail_rescue:
> + destroy_workqueue(q->rescue_workqueue);
> fail_ref:
> percpu_ref_exit(&q->q_usage_counter);
> fail_bdi:
> @@ -1994,7 +2006,7 @@ generic_make_request_checks(struct bio *bio)
> */
> blk_qc_t generic_make_request(struct bio *bio)
> {
> - struct bio_list bio_list_on_stack;
> + struct bio_plug_list bio_list_on_stack;
> blk_qc_t ret = BLK_QC_T_NONE;
>
> if (!generic_make_request_checks(bio))
> @@ -2011,7 +2023,9 @@ blk_qc_t generic_make_request(struct bio *bio)
> * should be added at the tail
> */
> if (current->bio_list) {
> - bio_list_add(current->bio_list, bio);
> + WARN(!current->bio_list->q->rescue_workqueue,
> + "submitting bio beneath generic_make_request() without rescuer");
> + bio_list_add(&current->bio_list->bios, bio);
> goto out;
> }
>
> @@ -2030,19 +2044,23 @@ blk_qc_t generic_make_request(struct bio *bio)
> * bio_list, and call into ->make_request() again.
> */
> BUG_ON(bio->bi_next);
> - bio_list_init(&bio_list_on_stack);
> + bio_list_init(&bio_list_on_stack.bios);
> current->bio_list = &bio_list_on_stack;
> +
> do {
> struct request_queue *q = bdev_get_queue(bio->bi_bdev);
>
> + current->bio_list->q = q;
> +
> if (likely(blk_queue_enter(q, false) == 0)) {
> ret = q->make_request_fn(q, bio);
>
> blk_queue_exit(q);
>
> - bio = bio_list_pop(current->bio_list);
> + bio = bio_list_pop(&current->bio_list->bios);
> } else {
> - struct bio *bio_next = bio_list_pop(current->bio_list);
> + struct bio *bio_next =
> + bio_list_pop(&current->bio_list->bios);
>
> bio_io_error(bio);
> bio = bio_next;
> @@ -2055,6 +2073,34 @@ blk_qc_t generic_make_request(struct bio *bio)
> }
> EXPORT_SYMBOL(generic_make_request);
>
> +static void bio_rescue_work(struct work_struct *work)
> +{
> + struct request_queue *q =
> + container_of(work, struct request_queue, rescue_work);
> + struct bio *bio;
> +
> + while (1) {
> + spin_lock(&q->rescue_lock);
> + bio = bio_list_pop(&q->rescue_list);
> + spin_unlock(&q->rescue_lock);
> +
> + if (!bio)
> + break;
> +
> + generic_make_request(bio);
> + }
> +}
> +
> +void blk_punt_blocked_bios(struct bio_plug_list *list)
> +{
> + spin_lock(&list->q->rescue_lock);
> + bio_list_merge(&list->q->rescue_list, &list->bios);
> + bio_list_init(&list->bios);
> + spin_unlock(&list->q->rescue_lock);
> +
> + queue_work(list->q->rescue_workqueue, &list->q->rescue_work);
> +}
> +
> /**
> * submit_bio - submit a bio to the block device layer for I/O
> * @bio: The &struct bio which describes the I/O
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 7f27a18cc4..77529238d1 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -660,6 +660,8 @@ static void blk_release_queue(struct kobject *kobj)
>
> blk_trace_shutdown(q);
>
> + if (q->rescue_workqueue)
> + destroy_workqueue(q->rescue_workqueue);
> if (q->bio_split)
> bioset_free(q->bio_split);
>
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index 1ffe8e37ae..87eeec7eda 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -658,6 +658,13 @@ static inline struct bio *bio_list_get(struct bio_list *bl)
> return bio;
> }
>
> +struct bio_plug_list {
> + struct bio_list bios;
> + struct request_queue *q;
> +};
> +
> +void blk_punt_blocked_bios(struct bio_plug_list *);
> +
> /*
> * Increment chain count for the bio. Make sure the CHAIN flag update
> * is visible before the raised count.
> @@ -687,15 +694,6 @@ struct bio_set {
> mempool_t bio_integrity_pool;
> mempool_t bvec_integrity_pool;
> #endif
> -
> - /*
> - * Deadlock avoidance for stacking block drivers: see comments in
> - * bio_alloc_bioset() for details
> - */
> - spinlock_t rescue_lock;
> - struct bio_list rescue_list;
> - struct work_struct rescue_work;
> - struct workqueue_struct *rescue_workqueue;
> };
>
> struct biovec_slab {
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index c47c358ba0..f64b886c65 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -476,6 +476,16 @@ struct request_queue {
> struct bio_set *bio_split;
>
> bool mq_sysfs_init_done;
> +
> + /*
> + * Deadlock avoidance, to deal with the plugging in
> + * generic_make_request() that converts recursion to iteration to avoid
> + * stack overflow:
> + */
> + spinlock_t rescue_lock;
> + struct bio_list rescue_list;
> + struct work_struct rescue_work;
> + struct workqueue_struct *rescue_workqueue;
> };
>
> #define QUEUE_FLAG_QUEUED 1 /* uses generic tag queueing */
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2865d10a28..59df7a1030 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1762,7 +1762,7 @@ struct task_struct {
> void *journal_info;
>
> /* stacked block device info */
> - struct bio_list *bio_list;
> + struct bio_plug_list *bio_list;
>
> #ifdef CONFIG_BLOCK
> /* stack plugging */
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index bd39d698cb..23b6290ba1 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3439,6 +3439,10 @@ static inline void sched_submit_work(struct task_struct *tsk)
> {
> if (!tsk->state || tsk_is_pi_blocked(tsk))
> return;
> +
> + if (tsk->bio_list && !bio_list_empty(&tsk->bio_list->bios))
> + blk_punt_blocked_bios(tsk->bio_list);
> +
> /*
> * If we are going to sleep and we have plugged IO queued,
> * make sure to submit it to avoid deadlocks.
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html