From: Kashyap Desai
Date: Fri, 7 Jun 2019 15:15:04 +0530
Subject: RE: [PATCH 8/9] scsi: megaraid: convert private reply queue to blk-mq hw queue
Message-ID: <287b092778c749c8b101641f9d48ab44@mail.gmail.com>
In-Reply-To: <f24109eb867deae8cb262466ecc70b09@mail.gmail.com>
References: <20190531022801.10003-1-ming.lei@redhat.com>
    <20190531022801.10003-9-ming.lei@redhat.com>
    <7819e1a523b9e8227e3a9d188ee1e083@mail.gmail.com>
    <20190602064202.GA2731@ming.t460p>
    <20190602074757.GA31572@ming.t460p>
    <020a7707a31803d65dd94cc0928a425a@mail.gmail.com>
    <20190603035605.GB13684@ming.t460p>
    <f24109eb867deae8cb262466ecc70b09@mail.gmail.com>
To: Ming Lei
Cc: Jens Axboe, linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
"Martin K . Petersen" , James Bottomley , Bart Van Assche , Hannes Reinecke , John Garry , Don Brace , Sathya Prakash Veerichetty , Christoph Hellwig Content-Type: text/plain; charset="UTF-8" Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org > > > > > Please drop the patch in my last email, and apply the following patch > > and see if we can make a difference: > > Ming, > > I dropped early patch and applied the below patched. Now, I am getting > expected performance (3.0M IOPS). > Below patch fix the performance issue. See perf report after applying the > same - > > 8.52% [kernel] [k] sbitmap_any_bit_set > 4.19% [kernel] [k] blk_mq_run_hw_queue > 3.76% [megaraid_sas] [k] complete_cmd_fusion > 3.24% [kernel] [k] scsi_queue_rq > 2.53% [megaraid_sas] [k] megasas_build_ldio_fusion > 2.34% [megaraid_sas] [k] megasas_build_and_issue_cmd_fusion > 2.18% [kernel] [k] entry_SYSCALL_64 > 1.85% [kernel] [k] syscall_return_via_sysret > 1.78% [kernel] [k] blk_mq_run_hw_queues > 1.59% [kernel] [k] gup_pmd_range > 1.49% [kernel] [k] _raw_spin_lock_irqsave > 1.24% [kernel] [k] scsi_dec_host_busy > 1.23% [kernel] [k] blk_mq_free_request > 1.23% [kernel] [k] blk_mq_get_request > 0.96% [kernel] [k] __slab_free > 0.91% [kernel] [k] aio_complete > 0.90% [kernel] [k] __sched_text_start > 0.89% [megaraid_sas] [k] megasas_queue_command > 0.85% [kernel] [k] __fget > 0.84% [kernel] [k] scsi_mq_get_budget > > I will do some more testing and update the results. Ming, I did testing on AMD Dual Socket server (AMD EPYC 7601 32-Core Processor). System has total 128 logical cores. Without patch, performance can go upto 2.8M IOPS. See below perf top output. 7.37% [megaraid_sas] [k] complete_cmd_fusion 2.51% [kernel] [k] copy_user_generic_string 2.48% [kernel] [k] read_tsc 2.10% fio [.] thread_main 2.06% [kernel] [k] gup_pgd_range 1.98% [kernel] [k] __get_user_4 1.92% [kernel] [k] entry_SYSCALL_64 1.58% [kernel] [k] scsi_queue_rq 1.55% [megaraid_sas] [k] megasas_queue_command 1.52% [kernel] [k] irq_entries_start 1.43% fio [.] get_io_u 1.39% [kernel] [k] blkdev_direct_IO 1.34% [kernel] [k] __audit_syscall_exit 1.31% [megaraid_sas] [k] megasas_build_and_issue_cmd_fusion 1.27% [kernel] [k] syscall_slow_exit_work 1.23% [kernel] [k] io_submit_one 1.20% [kernel] [k] do_syscall_64 1.17% fio [.] td_io_queue 1.16% [kernel] [k] lookup_ioctx 1.14% [kernel] [k] kmem_cache_alloc 1.10% [megaraid_sas] [k] megasas_build_ldio_fusion 1.07% [kernel] [k] __memset 1.06% [kernel] [k] __virt_addr_valid 0.98% [kernel] [k] blk_mq_get_request 0.94% [kernel] [k] note_interrupt 0.91% [kernel] [k] __get_user_8 0.91% [kernel] [k] aio_read_events 0.85% [kernel] [k] __put_user_4 0.78% fio [.] fio_libaio_commit 0.74% [megaraid_sas] [k] MR_BuildRaidContext 0.70% [kernel] [k] __x64_sys_io_submit 0.69% fio [.] utime_since_now With your patch - Performance can go upto 1.7M IOPS. See below perf top output. 23.01% [kernel] [k] sbitmap_any_bit_set 6.42% [kernel] [k] blk_mq_run_hw_queue 4.44% [megaraid_sas] [k] complete_cmd_fusion 4.23% [kernel] [k] blk_mq_run_hw_queues 1.80% [kernel] [k] read_tsc 1.60% [kernel] [k] copy_user_generic_string 1.33% fio [.] thread_main 1.27% [kernel] [k] irq_entries_start 1.22% [kernel] [k] gup_pgd_range 1.20% [kernel] [k] __get_user_4 1.20% [kernel] [k] entry_SYSCALL_64 1.07% [kernel] [k] scsi_queue_rq 0.88% fio [.] get_io_u 0.87% [megaraid_sas] [k] megasas_queue_command 0.86% [kernel] [k] blkdev_direct_IO 0.85% fio [.] 
     0.80%  [kernel]        [k] note_interrupt
     0.76%  [kernel]        [k] lookup_ioctx
     0.76%  [kernel]        [k] do_syscall_64
     0.75%  [megaraid_sas]  [k] megasas_build_and_issue_cmd_fusion
     0.74%  [megaraid_sas]  [k] megasas_build_ldio_fusion
     0.72%  [kernel]        [k] kmem_cache_alloc
     0.71%  [kernel]        [k] __audit_syscall_exit
     0.67%  [kernel]        [k] __virt_addr_valid
     0.65%  [kernel]        [k] blk_mq_get_request
     0.64%  [kernel]        [k] __memset
     0.62%  [kernel]        [k] syscall_slow_exit_work
     0.60%  [kernel]        [k] io_submit_one
     0.59%  [kernel]        [k] ktime_get
     0.58%  fio             [.] fio_libaio_commit
     0.57%  [kernel]        [k] aio_read_events
     0.54%  [kernel]        [k] __get_user_8
     0.53%  [kernel]        [k] aio_complete_rw
     0.51%  [kernel]        [k] kmem_cache_free

With your patch and the logical CPU count reduced to 64 (via CPU hotplug),
performance can go up to 2.2M IOPS. See the perf top output below.

     9.56%  [kernel]        [k] sbitmap_any_bit_set
     4.62%  [megaraid_sas]  [k] complete_cmd_fusion
     3.02%  [kernel]        [k] blk_mq_run_hw_queue
     2.15%  [kernel]        [k] copy_user_generic_string
     2.13%  [kernel]        [k] blk_mq_run_hw_queues
     2.09%  [kernel]        [k] read_tsc
     1.66%  [kernel]        [k] __get_user_4
     1.59%  [kernel]        [k] entry_SYSCALL_64
     1.57%  [kernel]        [k] gup_pgd_range
     1.55%  fio             [.] thread_main
     1.51%  [kernel]        [k] scsi_queue_rq
     1.31%  [kernel]        [k] __memset
     1.21%  [megaraid_sas]  [k] megasas_build_and_issue_cmd_fusion
     1.16%  [megaraid_sas]  [k] megasas_queue_command
     1.13%  fio             [.] get_io_u
     1.12%  [kernel]        [k] blk_mq_get_request
     1.07%  [kernel]        [k] blkdev_direct_IO
     1.06%  [kernel]        [k] __put_user_4
     1.05%  fio             [.] td_io_queue
     1.02%  [kernel]        [k] syscall_slow_exit_work
     1.00%  [megaraid_sas]  [k] megasas_build_ldio_fusion

In summary, part of the performance drop appears to be correlated with the
number of hctx created in the block layer. I can provide more details and
can test a follow-up patch.
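If I am reading the profiles right, the extra cycles are mostly
blk_mq_run_hw_queues() walking every hctx and checking each one's pending
bitmap the way sbitmap_any_bit_set() does, so the per-IO cost grows with the
number of hw queues. Below is a toy userspace model of that effect, just my
own illustration: the names, sizes and numbers are made up, only the linear
walk over all hw queues mirrors the kernel path.

    /*
     * Toy userspace model (not kernel code) of why the scan cost scales
     * with the number of hw queues: each run visits every hctx's pending
     * bitmap, whether or not that hctx has work.
     */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define CTX_MAP_WORDS	4	/* pretend each hctx tracks a few sw ctxs */

    struct toy_hctx {
    	uint64_t ctx_map[CTX_MAP_WORDS];	/* stand-in for hctx->ctx_map */
    };

    /* stand-in for sbitmap_any_bit_set(): is any pending bit set? */
    static bool toy_any_bit_set(const struct toy_hctx *hctx)
    {
    	for (int i = 0; i < CTX_MAP_WORDS; i++)
    		if (hctx->ctx_map[i])
    			return true;
    	return false;
    }

    /* stand-in for blk_mq_run_hw_queues(): every hctx gets checked */
    static int toy_run_hw_queues(const struct toy_hctx *hctxs, int nr_hctx)
    {
    	int with_work = 0;

    	for (int i = 0; i < nr_hctx; i++)
    		if (toy_any_bit_set(&hctxs[i]))	/* one scan per hctx, busy or idle */
    			with_work++;		/* this hctx would be dispatched */
    	return with_work;
    }

    int main(void)
    {
    	static struct toy_hctx hctxs[128];	/* all idle: worst-case scan */

    	/* 64 vs 128 hctx, mirroring the 64-core and 128-core runs above */
    	for (int nr = 64; nr <= 128; nr += 64)
    		printf("nr_hctx=%3d: scanned %3d bitmaps, %d had pending work\n",
    		       nr, nr, toy_run_hw_queues(hctxs, nr));
    	return 0;
    }

With 128 hctx the walk is simply twice as long as with 64, which matches the
direction of the 1.7M vs 2.2M IOPS numbers above.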
Kashyap

>
> Kashyap
>
> >
> > diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> > index 3d6780504dcb..69d6bffcc8ff 100644
> > --- a/block/blk-mq-debugfs.c
> > +++ b/block/blk-mq-debugfs.c
> > @@ -627,6 +627,9 @@ static int hctx_active_show(void *data, struct seq_file *m)
> >  {
> >  	struct blk_mq_hw_ctx *hctx = data;
> >
> > +	if (hctx->flags & BLK_MQ_F_HOST_TAGS)
> > +		hctx = blk_mq_master_hctx(hctx);
> > +
> >  	seq_printf(m, "%d\n", atomic_read(&hctx->nr_active));
> >  	return 0;
> >  }
> > diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> > index 309ec5079f3f..58ef83a34fda 100644
> > --- a/block/blk-mq-tag.c
> > +++ b/block/blk-mq-tag.c
> > @@ -30,6 +30,9 @@ bool blk_mq_has_free_tags(struct blk_mq_tags *tags)
> >   */
> >  bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
> >  {
> > +	if (hctx->flags & BLK_MQ_F_HOST_TAGS)
> > +		hctx = blk_mq_master_hctx(hctx);
> > +
> >  	if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) &&
> >  	    !test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
> >  		atomic_inc(&hctx->tags->active_queues);
> > @@ -55,6 +58,9 @@ void __blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
> >  {
> >  	struct blk_mq_tags *tags = hctx->tags;
> >
> > +	if (hctx->flags & BLK_MQ_F_HOST_TAGS)
> > +		hctx = blk_mq_master_hctx(hctx);
> > +
> >  	if (!test_and_clear_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
> >  		return;
> >
> > @@ -74,6 +80,10 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
> >
> >  	if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_SHARED))
> >  		return true;
> > +
> > +	if (hctx->flags & BLK_MQ_F_HOST_TAGS)
> > +		hctx = blk_mq_master_hctx(hctx);
> > +
> >  	if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
> >  		return true;
> >
> > diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
> > index 61deab0b5a5a..84e9b46ffc78 100644
> > --- a/block/blk-mq-tag.h
> > +++ b/block/blk-mq-tag.h
> > @@ -36,11 +36,22 @@ extern void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool);
> >  void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
> >  		void *priv);
> >
> > +static inline struct blk_mq_hw_ctx *blk_mq_master_hctx(
> > +		struct blk_mq_hw_ctx *hctx)
> > +{
> > +	return hctx->queue->queue_hw_ctx[0];
> > +}
> > +
> > +
> >  static inline struct sbq_wait_state *bt_wait_ptr(struct sbitmap_queue *bt,
> >  						 struct blk_mq_hw_ctx *hctx)
> >  {
> >  	if (!hctx)
> >  		return &bt->ws[0];
> > +
> > +	if (hctx->flags & BLK_MQ_F_HOST_TAGS)
> > +		hctx = blk_mq_master_hctx(hctx);
> > +
> >  	return sbq_wait_ptr(bt, &hctx->wait_index);
> >  }
> >
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 49d73d979cb3..4196ed3b0085 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -303,7 +303,7 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
> >  	} else {
> >  		if (data->hctx->flags & BLK_MQ_F_TAG_SHARED) {
> >  			rq_flags = RQF_MQ_INFLIGHT;
> > -			atomic_inc(&data->hctx->nr_active);
> > +			blk_mq_inc_nr_active(data->hctx);
> >  		}
> >  		rq->tag = tag;
> >  		rq->internal_tag = -1;
> > @@ -517,7 +517,7 @@ void blk_mq_free_request(struct request *rq)
> >
> >  	ctx->rq_completed[rq_is_sync(rq)]++;
> >  	if (rq->rq_flags & RQF_MQ_INFLIGHT)
> > -		atomic_dec(&hctx->nr_active);
> > +		blk_mq_dec_nr_active(hctx);
> >
> >  	if (unlikely(laptop_mode && !blk_rq_is_passthrough(rq)))
> >  		laptop_io_completion(q->backing_dev_info);
> > @@ -1064,7 +1064,7 @@ bool blk_mq_get_driver_tag(struct request *rq)
> >  	if (rq->tag >= 0) {
> >  		if (shared) {
> >  			rq->rq_flags |= RQF_MQ_INFLIGHT;
> > -			atomic_inc(&data.hctx->nr_active);
> > +			blk_mq_inc_nr_active(data.hctx);
> >  		}
> >  		data.hctx->tags->rqs[rq->tag] = rq;
> >  	}
> > diff --git a/block/blk-mq.h b/block/blk-mq.h
> > index 633a5a77ee8b..f1279b8c2289 100644
> > --- a/block/blk-mq.h
> > +++ b/block/blk-mq.h
> > @@ -193,6 +193,20 @@ unsigned int blk_mq_in_flight(struct request_queue *q, struct hd_struct *part);
> >  void blk_mq_in_flight_rw(struct request_queue *q, struct hd_struct *part,
> >  			 unsigned int inflight[2]);
> >
> > +static inline void blk_mq_inc_nr_active(struct blk_mq_hw_ctx *hctx)
> > +{
> > +	if (hctx->flags & BLK_MQ_F_HOST_TAGS)
> > +		hctx = blk_mq_master_hctx(hctx);
> > +	atomic_inc(&hctx->nr_active);
> > +}
> > +
> > +static inline void blk_mq_dec_nr_active(struct blk_mq_hw_ctx *hctx)
> > +{
> > +	if (hctx->flags & BLK_MQ_F_HOST_TAGS)
> > +		hctx = blk_mq_master_hctx(hctx);
> > +	atomic_dec(&hctx->nr_active);
> > +}
> > +
> >  static inline void blk_mq_put_dispatch_budget(struct blk_mq_hw_ctx *hctx)
> >  {
> >  	struct request_queue *q = hctx->queue;
> > @@ -218,7 +232,7 @@ static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
> >
> >  	if (rq->rq_flags & RQF_MQ_INFLIGHT) {
> >  		rq->rq_flags &= ~RQF_MQ_INFLIGHT;
> > -		atomic_dec(&hctx->nr_active);
> > +		blk_mq_dec_nr_active(hctx);
> >  	}
> >  }
> >
> > Thanks,
> > Ming
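
To make sure I am reading the quoted accounting change correctly: as far as
I can tell, with BLK_MQ_F_HOST_TAGS set every nr_active update is redirected
to hw queue 0 through blk_mq_master_hctx(), so the shared-tag accounting
behaves like a single host-wide counter instead of a per-hctx one. A minimal
userspace sketch of that redirection follows; the types and names are my own
toy stand-ins, not the real blk-mq structures.

    /*
     * Toy model of the redirection in the quoted patch: with a "host tags"
     * flag set, nr_active updates on any hw queue are funneled into hw
     * queue 0 (the "master" hctx), i.e. one host-wide counter.
     */
    #include <stdatomic.h>
    #include <stdio.h>

    #define TOY_F_HOST_TAGS	(1u << 0)

    struct toy_hctx {
    	unsigned int flags;
    	atomic_int nr_active;
    	struct toy_hctx *queue_hw_ctx;	/* back-pointer to the hctx array */
    };

    /* mirrors the idea of blk_mq_master_hctx(): always account on hctx 0 */
    static struct toy_hctx *toy_master_hctx(struct toy_hctx *hctx)
    {
    	return &hctx->queue_hw_ctx[0];
    }

    static void toy_inc_nr_active(struct toy_hctx *hctx)
    {
    	if (hctx->flags & TOY_F_HOST_TAGS)
    		hctx = toy_master_hctx(hctx);
    	atomic_fetch_add(&hctx->nr_active, 1);
    }

    static void toy_dec_nr_active(struct toy_hctx *hctx)
    {
    	if (hctx->flags & TOY_F_HOST_TAGS)
    		hctx = toy_master_hctx(hctx);
    	atomic_fetch_sub(&hctx->nr_active, 1);
    }

    int main(void)
    {
    	struct toy_hctx hctxs[4];

    	for (int i = 0; i < 4; i++) {
    		hctxs[i].flags = TOY_F_HOST_TAGS;
    		atomic_init(&hctxs[i].nr_active, 0);
    		hctxs[i].queue_hw_ctx = hctxs;
    	}

    	/* start one request on each hw queue, complete the one on hctx 3 */
    	for (int i = 0; i < 4; i++)
    		toy_inc_nr_active(&hctxs[i]);
    	toy_dec_nr_active(&hctxs[3]);

    	/* all accounting landed on the master hctx (hw queue 0) */
    	printf("hctx0 nr_active=%d, hctx1 nr_active=%d\n",
    	       atomic_load(&hctxs[0].nr_active),
    	       atomic_load(&hctxs[1].nr_active));
    	return 0;
    }

If that reading is right, the counter itself is no longer the hot spot; it
is the per-hctx queue running shown in the profiles above that still scales
with the hctx count.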