All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org,
	"Martin K . Petersen" <martin.petersen@oracle.com>,
	linux-scsi@vger.kernel.org, Omar Sandoval <osandov@fb.com>,
	Kashyap Desai <kashyap.desai@broadcom.com>,
	"Ewan D . Milne" <emilne@redhat.com>,
	Hannes Reinecke <hare@suse.de>
Subject: Re: [PATCH V3 for 5.11 00/12] blk-mq/scsi: tracking device queue depth via sbitmap
Date: Wed, 23 Sep 2020 15:31:06 -0600	[thread overview]
Message-ID: <CADbZ7FqgpoSdZpSkATRRkUHLzhifATVzseBm3et5__8=gt3NYA@mail.gmail.com> (raw)
In-Reply-To: <20200923013339.1621784-1-ming.lei@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 4266 bytes --]

Hi Ming,

I have tested this patch extensively in our labs.

This patch gives excellent results when a single device can provide
very high IOPs, and only a few of those devices are available on the
system.
Thus, if a RAID 0 volume is created out of many high end NVMe devices,
then that RAID0 volume can potentially reach a max IOPs that is a
summation of the maxs IOPS for all the underlying drives. Without this
patch, the current kernel code cannot get there.

For example, for a simple RAID0 volume with 32 NVMe drives, I got
almost 100% performance boost with this patch.
The NVMe stack does not have this limitation, and this patch goes a
long way in closing that gap.

I have also tested it in many other configurations, and  did not see
any adverse side effects.

Please feel free to add:
Tested-by: Sumanesh Samanta

Thanks,
Sumanesh




On Tue, Sep 22, 2020 at 7:33 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> Hi,
>
> scsi uses one global atomic variable to track queue depth for each
> LUN/request queue. This way can't scale well when there is lots of CPU
> cores and the disk is very fast. Broadcom guys has complained that their
> high end HBA can't reach top performance because .device_busy is
> operated in IO path.
>
> Replace the atomic variable sdev->device_busy with sbitmap for
> tracking scsi device queue depth.
>
> Test on scsi_debug shows this way improve IOPS > 20%. Meantime
> the IOPS difference is just ~1% compared with bypassing .device_busy
> on scsi_debug via patches[1]
>
> The 1st 6 patches moves percpu allocation hint into sbitmap, since
> the improvement by doing percpu allocation hint on sbitmap is observable.
> Meantime export helpers for SCSI.
>
> Patch 7 and 8 prepares for the conversion by returning budget token
> from .get_budget callback, meantime passes the budget token to driver
> via 'struct blk_mq_queue_data' in .queue_rq().
>
> The last four patches changes SCSI for switching to track device queue
> depth via sbitmap.
>
> The patchset have been tested by Broadcom, and obvious performance boost
> can be observed.
>
> Given it is based on both for-5.10/block and 5.10/scsi-queue, the target
> is for v5.11. And it is posted out just for getting full/enough review.
>
> Please comment and review!
>
> V3:
>         - rebase on both for-5.10/block and 5.10/scsi-queue.
>
> V2:
>         - fix one build failure
>
>
> Ming Lei (12):
>   sbitmap: remove sbitmap_clear_bit_unlock
>   sbitmap: maintain allocation round_robin in sbitmap
>   sbitmap: add helpers for updating allocation hint
>   sbitmap: move allocation hint into sbitmap
>   sbitmap: export sbitmap_weight
>   sbitmap: add helper of sbitmap_calculate_shift
>   blk-mq: add callbacks for storing & retrieving budget token
>   blk-mq: return budget token from .get_budget callback
>   scsi: put hot fields of scsi_host_template into one cacheline
>   scsi: add scsi_device_busy() to read sdev->device_busy
>   scsi: make sure sdev->queue_depth is <= shost->can_queue
>   scsi: replace sdev->device_busy with sbitmap
>
>  block/blk-mq-sched.c                 |  17 ++-
>  block/blk-mq.c                       |  38 +++--
>  block/blk-mq.h                       |  25 +++-
>  block/kyber-iosched.c                |   3 +-
>  drivers/message/fusion/mptsas.c      |   2 +-
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c |   2 +-
>  drivers/scsi/scsi.c                  |   4 +
>  drivers/scsi/scsi_lib.c              |  69 ++++++---
>  drivers/scsi/scsi_priv.h             |   1 +
>  drivers/scsi/scsi_scan.c             |  22 ++-
>  drivers/scsi/scsi_sysfs.c            |   4 +-
>  drivers/scsi/sg.c                    |   2 +-
>  include/linux/blk-mq.h               |  13 +-
>  include/linux/sbitmap.h              |  84 +++++++----
>  include/scsi/scsi_cmnd.h             |   2 +
>  include/scsi/scsi_device.h           |   8 +-
>  include/scsi/scsi_host.h             |  72 ++++-----
>  lib/sbitmap.c                        | 213 +++++++++++++++------------
>  18 files changed, 376 insertions(+), 205 deletions(-)
>
> Cc: Omar Sandoval <osandov@fb.com>
> Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
> Cc: Ewan D. Milne <emilne@redhat.com>
> Cc: Hannes Reinecke <hare@suse.de>
>
> --
> 2.25.2
>

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4178 bytes --]

      parent reply	other threads:[~2020-09-23 21:31 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-23  1:33 [PATCH V3 for 5.11 00/12] blk-mq/scsi: tracking device queue depth via sbitmap Ming Lei
2020-09-23  1:33 ` [PATCH V3 for 5.11 01/12] sbitmap: remove sbitmap_clear_bit_unlock Ming Lei
2020-09-23  7:59   ` Johannes Thumshirn
2020-09-23  1:33 ` [PATCH V3 for 5.11 02/12] sbitmap: maintain allocation round_robin in sbitmap Ming Lei
2020-09-23  1:33 ` [PATCH V3 for 5.11 03/12] sbitmap: add helpers for updating allocation hint Ming Lei
2020-09-23  1:33 ` [PATCH V3 for 5.11 04/12] sbitmap: move allocation hint into sbitmap Ming Lei
2020-09-23  6:36   ` Hannes Reinecke
2020-11-10  4:20     ` Ming Lei
2020-09-23  1:33 ` [PATCH V3 for 5.11 05/12] sbitmap: export sbitmap_weight Ming Lei
2020-09-23  6:37   ` Hannes Reinecke
2020-09-23  1:33 ` [PATCH V3 for 5.11 06/12] sbitmap: add helper of sbitmap_calculate_shift Ming Lei
2020-09-23  6:38   ` Hannes Reinecke
2020-09-23  1:33 ` [PATCH V3 for 5.11 07/12] blk-mq: add callbacks for storing & retrieving budget token Ming Lei
2020-09-23  6:39   ` Hannes Reinecke
2020-09-23  1:33 ` [PATCH V3 for 5.11 08/12] blk-mq: return budget token from .get_budget callback Ming Lei
2020-09-23  6:45   ` Hannes Reinecke
2020-09-23  1:33 ` [PATCH V3 for 5.11 09/12] scsi: put hot fields of scsi_host_template into one cacheline Ming Lei
2020-09-23  6:46   ` Hannes Reinecke
2020-09-23  1:33 ` [PATCH V3 for 5.11 10/12] scsi: add scsi_device_busy() to read sdev->device_busy Ming Lei
2020-09-23  6:47   ` Hannes Reinecke
2020-09-23  1:33 ` [PATCH V3 for 5.11 11/12] scsi: make sure sdev->queue_depth is <= shost->can_queue Ming Lei
2020-09-23  6:47   ` Hannes Reinecke
2020-09-23  7:38   ` John Garry
2020-11-10  9:28     ` Ming Lei
2020-09-23  1:33 ` [PATCH V3 for 5.11 12/12] scsi: replace sdev->device_busy with sbitmap Ming Lei
2020-09-23  6:50   ` Hannes Reinecke
2020-09-23 21:31 ` Sumanesh Samanta [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CADbZ7FqgpoSdZpSkATRRkUHLzhifATVzseBm3et5__8=gt3NYA@mail.gmail.com' \
    --to=sumanesh.samanta@broadcom.com \
    --cc=axboe@kernel.dk \
    --cc=emilne@redhat.com \
    --cc=hare@suse.de \
    --cc=kashyap.desai@broadcom.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=ming.lei@redhat.com \
    --cc=osandov@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.