From: Kashyap Desai <kashyap.desai@broadcom.com>
To: linux-block@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
	Ming Lei <ming.lei@redhat.com>, Omar Sandoval <osandov@fb.com>,
	Christoph Hellwig <hch@lst.de>, Hannes Reinecke <hare@suse.de>,
	linux-scsi <linux-scsi@vger.kernel.org>
Subject: [RFC] bypass scheduler for no sched is set
Date: Wed, 4 Jul 2018 13:29:50 +0530
Message-ID: <CAHsXFKHdhOE_USCASpiRd6dHzcNvo29ipeJ2LhVbEsxnyMow1A@mail.gmail.com>

Hi,

Ming Lei posted the patch series below, which improved performance for the
megaraid_sas driver. Using the same kernel base, I found a further possible
performance improvement in the block layer. This RFC improves both IOPS
and CPU utilization. If it fits the design of blk-mq and scsi-mq, I can
convert it into a PATCH and submit the same or a modified version.

https://marc.info/?l=linux-block&m=153062994403732&w=2

Description of change -

Do not insert the request into the software queue if BLK_MQ_F_NO_SCHED is
set. Instead, submit the request from blk_mq_make_request directly to the
low level driver, through the call chain shown below.

blk_mq_try_issue_directly
    __blk_mq_try_issue_directly
        scsi_queue_rq

A low level driver attached to scsi-mq can set BLK_MQ_F_NO_SCHED if it does
not want the benefit of an IO scheduler (e.g. in case of SSDs connected to
an IT/MR controller). In case of HDDs connected to an HBA, the driver can
avoid setting BLK_MQ_F_NO_SCHED so that the default elevator is set to
mq-deadline.
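
To make the intended control flow concrete, here is a minimal user-space
sketch of the plugged, single-hardware-queue branch with this RFC applied.
It is a model, not kernel code: struct model_queue, model_submit() and the
MODEL_* constants are hypothetical names, the flag value and the batch limit
of 16 are assumptions, and the size-based plug flush of the real code is
left out.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins (assumed values), not the kernel's definitions. */
#define MODEL_F_NO_SCHED        (1u << 6)
#define MODEL_MAX_REQUEST_COUNT 16u   /* assumed plug batch limit */

enum model_path { MODEL_ISSUE_DIRECT, MODEL_PLUGGED };

struct model_queue {
    unsigned int tag_set_flags; /* stands in for q->tag_set->flags */
    unsigned int plugged;       /* requests waiting on the plug list */
    unsigned int issued;        /* requests handed to the driver */
};

/* Hand every plugged request to the driver and empty the plug list. */
static void model_flush_plug(struct model_queue *q)
{
    q->issued += q->plugged;
    q->plugged = 0;
}

/*
 * Submit one request. With the no-sched flag set the request goes
 * straight to the driver; otherwise it is batched on the plug list,
 * which is flushed first once it already holds the batch limit.
 */
static enum model_path model_submit(struct model_queue *q)
{
    assert(q != NULL);

    if (q->tag_set_flags & MODEL_F_NO_SCHED) {
        q->issued++;
        return MODEL_ISSUE_DIRECT;
    }

    if (q->plugged >= MODEL_MAX_REQUEST_COUNT)
        model_flush_plug(q);
    q->plugged++;
    return MODEL_PLUGGED;
}
```

In this model a queue created with MODEL_F_NO_SCHED never accumulates
plugged requests, while a queue without the flag only reaches the driver
when the plug list is flushed.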

Setup and performance numbers are listed below -

I created one R0 VD consisting of 8 SSDs on a MegaRAID adapter.

Without RFC - IOPS reaches 840K and CPU utilization goes up to 11%. Below is
the perf top output.

   5.17%  [kernel]                 [k] _raw_spin_lock
   4.62%  [kernel]                 [k] try_to_grab_pending
   2.29%  [kernel]                 [k] syscall_return_via_sysret
   1.37%  [kernel]                 [k] blk_mq_flush_busy_ctxs
   1.29%  [kernel]                 [k] kobject_get
   1.27%  fio                      [.] axmap_isset
   1.25%  [kernel]                 [k] flush_busy_ctx
   1.20%  [kernel]                 [k] scsi_dispatch_cmd
   1.18%  [kernel]                 [k] blk_mq_get_request
   1.16%  [kernel]                 [k] blk_mq_hctx_mark_pending.isra.45
   1.09%  [kernel]                 [k] irq_entries_start
   0.94%  [kernel]                 [k] del_timer
   0.91%  [kernel]                 [k] scsi_softirq_done
   0.90%  [kernel]                 [k] sbitmap_any_bit_set
   0.83%  [kernel]                 [k] blk_mq_free_request
   0.82%  [kernel]                 [k] kobject_put
   0.81%  [sd_mod]                 [k] sd_setup_read_write_cmnd
   0.80%  [kernel]                 [k] scsi_mq_get_budget
   0.79%  [kernel]                 [k] blk_mq_get_tag
   0.70%  [kernel]                 [k] blk_mq_dispatch_rq_list
   0.61%  [kernel]                 [k] bt_iter
   0.60%  fio                      [.] __fio_gettime
   0.59%  [kernel]                 [k] blk_mq_complete_request
   0.59%  [kernel]                 [k] gup_pgd_range
   0.57%  [kernel]                 [k] scsi_queue_rq


After applying RFC - IOPS reaches 1066K and CPU utilization goes up to 6%.

   2.56%  [kernel]             [k] syscall_return_via_sysret
   2.46%  [kernel]             [k] irq_entries_start
   2.43%  [kernel]             [k] kobject_get
   2.40%  [kernel]             [k] bt_iter
   2.16%  fio                  [.] axmap_isset
   2.06%  [kernel]             [k] _raw_spin_lock
   1.76%  [kernel]             [k] __audit_syscall_exit
   1.51%  [kernel]             [k] scsi_dispatch_cmd
   1.49%  [kernel]             [k] blk_mq_free_request
   1.49%  [sd_mod]             [k] sd_setup_read_write_cmnd
   1.45%  [kernel]             [k] scsi_softirq_done
   1.32%  [kernel]             [k] switch_mm_irqs_off
   1.28%  [kernel]             [k] scsi_mq_get_budget
   1.22%  [kernel]             [k] blk_mq_check_inflight
   1.13%  [kernel]             [k] kobject_put
   1.11%  fio                  [.] __fio_gettime
   0.95%  [kernel]             [k] gup_pgd_range
   0.90%  [kernel]             [k] blk_mq_get_tag
   0.88%  [kernel]             [k] read_tsc
   0.85%  [kernel]             [k] scsi_end_request
   0.85%  fio                  [.] get_io_u
   0.84%  [kernel]             [k] lookup_ioctx
   0.81%  [kernel]             [k] blk_mq_complete_request
   0.80%  [kernel]             [k] blk_mq_get_request
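
For comparing the two runs, the small helpers below compute the relative
IOPS change and the IOPS delivered per percentage point of CPU. The function
names are made up for illustration, and the inputs are simply the figures
quoted above.

```c
#include <assert.h>

/* Relative change from 'before' to 'after', in percent. */
static double pct_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}

/* Thousands of IOPS delivered per percentage point of CPU utilization. */
static double kiops_per_cpu_pct(double kiops, double cpu_pct)
{
    assert(cpu_pct > 0.0);
    return kiops / cpu_pct;
}
```

With the numbers above, pct_change(840, 1066) comes to roughly 26.9
(percent), and kiops_per_cpu_pct() gives about 76.4 before versus about
177.7 after.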

Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
---

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4d1c048..ab27788 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1811,32 +1811,35 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
         blk_insert_flush(rq);
         blk_mq_run_hw_queue(data.hctx, true);
     } else if (plug && q->nr_hw_queues == 1) {
-        struct request *last = NULL;
-
         blk_mq_put_ctx(data.ctx);
         blk_mq_bio_to_request(rq, bio);
+        /* BLK_MQ_F_NO_SCHED set: bypass the plug list and issue directly */
+        if (q->tag_set->flags & BLK_MQ_F_NO_SCHED) {
+            blk_mq_try_issue_directly(data.hctx, rq, &cookie);
+        } else {
+            struct request *last = NULL;
+            /*
+             * @request_count may become stale because of schedule
+             * out, so check the list again.
+             */
+            if (list_empty(&plug->mq_list))
+                request_count = 0;
+            else if (blk_queue_nomerges(q))
+                request_count = blk_plug_queued_count(q);
+
+            if (!request_count)
+                trace_block_plug(q);
+            else
+                last = list_entry_rq(plug->mq_list.prev);
+
+            if (request_count >= BLK_MAX_REQUEST_COUNT || (last &&
+                blk_rq_bytes(last) >= BLK_PLUG_FLUSH_SIZE)) {
+                blk_flush_plug_list(plug, false);
+                trace_block_plug(q);
+            }

-        /*
-         * @request_count may become stale because of schedule
-         * out, so check the list again.
-         */
-        if (list_empty(&plug->mq_list))
-            request_count = 0;
-        else if (blk_queue_nomerges(q))
-            request_count = blk_plug_queued_count(q);
-
-        if (!request_count)
-            trace_block_plug(q);
-        else
-            last = list_entry_rq(plug->mq_list.prev);
-
-        if (request_count >= BLK_MAX_REQUEST_COUNT || (last &&
-            blk_rq_bytes(last) >= BLK_PLUG_FLUSH_SIZE)) {
-            blk_flush_plug_list(plug, false);
-            trace_block_plug(q);
+            list_add_tail(&rq->queuelist, &plug->mq_list);
         }
-
-        list_add_tail(&rq->queuelist, &plug->mq_list);
     } else if (plug && !blk_queue_nomerges(q)) {
         blk_mq_bio_to_request(rq, bio);
