From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754738AbbCaPnU (ORCPT ); Tue, 31 Mar 2015 11:43:20 -0400
Received: from mail-ie0-f179.google.com ([209.85.223.179]:34469 "EHLO mail-ie0-f179.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752562AbbCaPnS (ORCPT ); Tue, 31 Mar 2015 11:43:18 -0400
Message-ID: <551AC093.6080500@kernel.dk>
Date: Tue, 31 Mar 2015 09:43:15 -0600
From: Jens Axboe
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.5.0
MIME-Version: 1.0
To: Frederic Weisbecker
CC: Rik van Riel , axboe@kernel.org, fweisbec@redhat.com, mingo@kernel.org, linux-kernel@vger.kernel.org, lcapitulino@redhat.com, mtosatti@redhat.com
Subject: Re: [PATCH RFC] nohz,blk-mq: do not create blk-mq workqueues on nohz dedicated CPUs
References: <20150331102726.076a6860@annuminas.surriel.com> <551AB81F.8020806@kernel.dk> <20150331153310.GB29033@lerouge>
In-Reply-To: <20150331153310.GB29033@lerouge>
Content-Type: multipart/mixed; boundary="------------050702000708010404000904"
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

This is a multi-part message in MIME format.
--------------050702000708010404000904
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit

On 03/31/2015 09:33 AM, Frederic Weisbecker wrote:
> On Tue, Mar 31, 2015 at 09:07:11AM -0600, Jens Axboe wrote:
>> On 03/31/2015 08:27 AM, Rik van Riel wrote:
>>> CPUs with nohz_full do not want disruption from timer interrupts,
>>> or other random system things. This includes block mq work.
>>>
>>> There is another issue with block mq vs. realtime tasks that run
>>> 100% of the time, which is not uncommon on systems that have CPUs
>>> dedicated to real time use with isolcpus= and nohz_full=
>>>
>>> Specifically, on systems like that, a block work item may never
>>> get to run, which could lead to filesystems getting stuck forever.
>>>
>>> We can avoid both issues by not scheduling blk-mq workqueues on
>>> cpus in nohz_full mode.
>>>
>>> Question for Jens: should we try to spread out the load for
>>> currently offline and nohz CPUs across the remaining CPUs in
>>> the system, to get the full benefit of blk-mq in these situations?
>>>
>>> If so, do you have any preference on how I should implement that?
>>>
>>> Cc: Frederic Weisbecker
>>> Cc: Ingo Molnar
>>> Cc: Jens Axboe
>>> Signed-off-by: Rik van Riel
>>> ---
>>>  block/blk-mq.c | 5 +++++
>>>  1 file changed, 5 insertions(+)
>>>
>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> index 4f4bea21052e..1004d6817fa4 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -21,6 +21,7 @@
>>>  #include
>>>  #include
>>>  #include
>>> +#include
>>>
>>>  #include
>>>
>>> @@ -1760,6 +1761,10 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
>>>  		if (!cpu_online(i))
>>>  			continue;
>>>
>>> +		/* Do not schedule work on nohz full dedicated CPUs. */
>>> +		if (tick_nohz_full_cpu(i))
>>> +			continue;
>>
>> Is this CPU ever going to queue IO? If yes, then it needs to be mapped. If
>> userspace never runs on it and submits IO, then we'll never run completions
>> on it nor schedule the associated workqueue. So I really don't see how it
>> doesn't already work, as-is.
>
> Well, it's fairly possible that full dynticks CPUs do IO of any sort. Is it possible
> to affine these asynchronous works to specific CPU? The usual scheme of full dynticks
> is to have CPU 0 handling any kind of housekeeping and other CPUs doing latency or performance
> sensitive works that don't want to be disturbed.

That'd be easy enough to do, that's how blk-mq handles offline CPUs as
well. The attached patch is completely untested, but will handle offline
or nohz CPUs in the same fashion - they will punt to hardware queue 0,
which is mapped to CPU0 (and others, depending on the queue vs CPU ratio).

--
Jens Axboe

--------------050702000708010404000904
Content-Type: text/x-patch; name="blk-mq-offline-nohz.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="blk-mq-offline-nohz.patch"

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 5f13f4d0bcce..9cb20d14c6b9 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -51,7 +51,10 @@ int blk_mq_update_queue_map(unsigned int *map, unsigned int nr_queues)
 
 	queue = 0;
 	for_each_possible_cpu(i) {
-		if (!cpu_online(i)) {
+		/*
+		 * Offline or full nohz CPUs get mapped to CPU0
+		 */
+		if (blk_mq_cpu_offline(i)) {
 			map[i] = 0;
 			continue;
 		}
diff --git a/block/blk-mq.c b/block/blk-mq.c
index b7b8933ec241..ec0de2871950 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -366,7 +366,7 @@ static void blk_mq_ipi_complete_request(struct request *rq)
 	if (!test_bit(QUEUE_FLAG_SAME_FORCE, &rq->q->queue_flags))
 		shared = cpus_share_cache(cpu, ctx->cpu);
 
-	if (cpu != ctx->cpu && !shared && cpu_online(ctx->cpu)) {
+	if (cpu != ctx->cpu && !shared && !blk_mq_cpu_offline(ctx->cpu)) {
 		rq->csd.func = __blk_mq_complete_request_remote;
 		rq->csd.info = rq;
 		rq->csd.flags = 0;
@@ -1022,7 +1022,7 @@ void blk_mq_insert_request(struct request *rq, bool at_head, bool run_queue,
 	struct blk_mq_ctx *ctx = rq->mq_ctx, *current_ctx;
 
 	current_ctx = blk_mq_get_ctx(q);
-	if (!cpu_online(ctx->cpu))
+	if (blk_mq_cpu_offline(ctx->cpu))
 		rq->mq_ctx = ctx = current_ctx;
 
 	hctx = q->mq_ops->map_queue(q, ctx->cpu);
@@ -1051,7 +1051,7 @@ static void blk_mq_insert_requests(struct request_queue *q,
 
 	current_ctx = blk_mq_get_ctx(q);
 
-	if (!cpu_online(ctx->cpu))
+	if (blk_mq_cpu_offline(ctx->cpu))
 		ctx = current_ctx;
 	hctx = q->mq_ops->map_queue(q, ctx->cpu);
 
@@ -1757,7 +1757,7 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
 		__ctx->queue = q;
 
 		/* If the cpu isn't online, the cpu is mapped to first hctx */
-		if (!cpu_online(i))
+		if (blk_mq_cpu_offline(i))
 			continue;
 
 		hctx = q->mq_ops->map_queue(q, i);
@@ -1789,7 +1789,7 @@ static void blk_mq_map_swqueue(struct request_queue *q)
 	 */
 	queue_for_each_ctx(q, ctx, i) {
 		/* If the cpu isn't online, the cpu is mapped to first hctx */
-		if (!cpu_online(i))
+		if (blk_mq_cpu_offline(i))
 			continue;
 
 		hctx = q->mq_ops->map_queue(q, i);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 6a48c4c0d8a2..443dc8e0ea24 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -1,6 +1,8 @@
 #ifndef INT_BLK_MQ_H
 #define INT_BLK_MQ_H
 
+#include
+
 struct blk_mq_tag_set;
 
 struct blk_mq_ctx {
@@ -123,4 +125,13 @@ static inline bool blk_mq_hw_queue_mapped(struct blk_mq_hw_ctx *hctx)
 	return hctx->nr_ctx && hctx->tags;
 }
 
+/*
+ * If the CPU is offline or is a nohz CPU, we will remap any IO processing
+ * to the first hardware queue.
+ */
+static inline bool blk_mq_cpu_offline(const unsigned int cpu)
+{
+	return !cpu_online(cpu) || tick_nohz_full_cpu(cpu);
+}
+
 #endif
--------------050702000708010404000904--