From mboxrd@z Thu Jan  1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755040AbcIPIvq (ORCPT ); Fri, 16 Sep 2016 04:51:46 -0400
Received: from mx1.redhat.com ([209.132.183.28]:16223 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753955AbcIPIvi (ORCPT ); Fri, 16 Sep 2016 04:51:38 -0400
From: Alexander Gordeev <agordeev@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: Alexander Gordeev, Jens Axboe, linux-nvme@lists.infradead.org
Subject: [PATCH RFC 00/21] blk-mq: Introduce combined hardware queues
Date: Fri, 16 Sep 2016 10:51:11 +0200
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.5.110.26]); Fri, 16 Sep 2016 08:51:37 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

The Linux block device layer limits the number of hardware context
queues to the number of CPUs in the system. That looks like suboptimal
hardware utilization in systems where the number of CPUs is
(significantly) less than the number of hardware queues.

In addition, there is a need to deal with tag starvation (see commit
0d2602ca "blk-mq: improve support for shared tags maps"). While unused
hardware queues stay idle, extra effort is taken to maintain a notion
of fairness between queue users. A deeper queue depth could probably
mitigate the whole issue sometimes.

All of that leads to a straightforward idea: hardware queues provided
by a device should be utilized as much as possible.

This series is an attempt to introduce a 1:N mapping between CPUs and
hardware queues. The code is experimental, and hence some checks and
sysfs interfaces are withdrawn because they block the demo
implementation.

The implementation evenly distributes hardware queues among CPUs, with
moderate changes to the existing codebase. But further developments of
the design are possible if needed, e.g.
complete device utilization, CPU and/or interrupt topology-driven queue
distribution, workload-driven queue redistribution.

Comments and suggestions are very welcome!

The series is against the linux-block tree.

Thanks!

CC: Jens Axboe
CC: linux-nvme@lists.infradead.org

Alexander Gordeev (21):
  blk-mq: Fix memory leaks on a queue cleanup
  blk-mq: Fix a potential NULL pointer assignment to hctx tags
  block: Get rid of unused request_queue::nr_queues member
  blk-mq: Do not limit number of queues to 'nr_cpu_ids' in allocations
  blk-mq: Update hardware queue map after q->nr_hw_queues is set
  block: Remove redundant blk_mq_ops::map_queue() interface
  blk-mq: Remove a redundant assignment
  blk-mq: Cleanup hardware context data node selection
  blk-mq: Cleanup a loop exit condition
  blk-mq: Get rid of unnecessary blk_mq_free_hw_queues()
  blk-mq: Move duplicating code to blk_mq_exit_hctx()
  blk-mq: Uninit hardware context in order reverse to init
  blk-mq: Move hardware context init code into blk_mq_init_hctx()
  blk-mq: Rework blk_mq_init_hctx() function
  blk-mq: Pair blk_mq_hctx_kobj_init() with blk_mq_hctx_kobj_put()
  blk-mq: Set flush_start_tag to BLK_MQ_MAX_DEPTH
  blk-mq: Introduce a 1:N hardware contexts
  blk-mq: Enable tag numbers exceed hardware queue depth
  blk-mq: Enable combined hardware queues
  blk-mq: Allow combined hardware queues
  null_blk: Do not limit # of hardware queues to # of CPUs

 block/blk-core.c                  |   5 +-
 block/blk-flush.c                 |   6 +-
 block/blk-mq-cpumap.c             |  49 +++--
 block/blk-mq-sysfs.c              |   5 +
 block/blk-mq-tag.c                |   9 +-
 block/blk-mq.c                    | 373 +++++++++++++++----------------------
 block/blk-mq.h                    |   4 +-
 block/blk.h                       |   2 +-
 drivers/block/loop.c              |   3 +-
 drivers/block/mtip32xx/mtip32xx.c |   4 +-
 drivers/block/null_blk.c          |  16 +-
 drivers/block/rbd.c               |   3 +-
 drivers/block/virtio_blk.c        |   6 +-
 drivers/block/xen-blkfront.c      |   6 +-
 drivers/md/dm-rq.c                |   4 +-
 drivers/mtd/ubi/block.c           |   1 -
 drivers/nvme/host/pci.c           |  29 +--
 drivers/nvme/host/rdma.c          |   2 -
 drivers/nvme/target/loop.c        |   2 -
 drivers/scsi/scsi_lib.c           |   4 +-
 include/linux/blk-mq.h            |  51 ++++--
 include/linux/blkdev.h            |   1 -
 22 files changed, 279 insertions(+), 306 deletions(-)

-- 
1.8.3.1
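
The even distribution of hardware queues among CPUs mentioned in the
cover letter can be illustrated with a small user-space sketch. This is
not code from the series: struct hwq_range and hwq_range_for_cpu() are
hypothetical names, and handing the remainder queues to the first CPUs
is an assumption (the cover letter lists complete device utilization as
a possible further development, not as something the patches do).

```c
#include <assert.h>

/* Hypothetical type: a contiguous range of hardware queue indices. */
struct hwq_range {
	unsigned int first;	/* index of the first queue owned by the CPU */
	unsigned int count;	/* number of queues owned by the CPU */
};

/*
 * Spread nr_queues hardware queues evenly over nr_cpus CPUs (1:N).
 * Every CPU gets nr_queues / nr_cpus queues. Assumption of this sketch:
 * the first nr_queues % nr_cpus CPUs get one extra queue each, so that
 * no queue stays idle.
 */
static struct hwq_range hwq_range_for_cpu(unsigned int cpu,
					  unsigned int nr_cpus,
					  unsigned int nr_queues)
{
	struct hwq_range r;
	unsigned int base, extra;

	assert(nr_cpus > 0 && cpu < nr_cpus);

	base = nr_queues / nr_cpus;	/* queues every CPU gets */
	extra = nr_queues % nr_cpus;	/* leftover queues to hand out */

	r.count = base + (cpu < extra ? 1 : 0);
	/* CPUs before this one own cpu * base queues plus their extras. */
	r.first = cpu * base + (cpu < extra ? cpu : extra);
	return r;
}
```

With 4 CPUs and 10 hardware queues, this gives CPU 0 queues 0-2, CPU 1
queues 3-5, CPU 2 queues 6-7 and CPU 3 queues 8-9.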