* scsi-mq V4
@ 2014-07-18 10:12 Christoph Hellwig
  2014-07-18 10:13 ` [PATCH 01/14] scsi: add scsi_setup_cmnd helper Christoph Hellwig
                   ` (13 more replies)
  0 siblings, 14 replies; 38+ messages in thread
From: Christoph Hellwig @ 2014-07-18 10:12 UTC (permalink / raw)
  To: James Bottomley, linux-scsi
  Cc: Jens Axboe, Bart Van Assche, Mike Christie, Martin K. Petersen,
	Robert Elliott, Webb Scales, linux-kernel

At this point the code is ready for merging and use by developers and early
adopters.  Except for the newly added first patch, all patches have been
through multiple review cycles, and I would like to merge the series early
next week assuming I can get reviews for it.  Please scream loudly if you see
any reason not to merge it now.

The core blk-mq code isn't that suitable for slow devices
yet, mostly due to the lack of an I/O scheduler, but Jens is working on it.
Similarly there is no dm-multipath support for drivers using blk-mq yet,
but I'm working on it.  It should also be noted that the code doesn't
actually support multiple hardware queues or fine grained tuning of the
blk-mq parameters yet.  All these could be added fairly easily as soon
as low-level drivers want to make use of them.

The amount of changes to the existing code is fairly small, and they are
mostly speedups or cleanups that apply to the old path as well.  Because
of this I also haven't bothered to put it under a config option, just
like the blk-mq core.

The usage of blk-mq dramatically decreases CPU usage under all workloads, going
down from the 100% CPU usage that the old setup can easily hit to usually less
than 20% for maxing out storage subsystems with 512-byte reads and writes,
and it makes it easy to achieve millions of IOPS.  Bart and Robert have
helped with some very detailed measurements that they might be able to send
in reply to this, although these usually involve significantly reworked
low-level drivers to avoid other bottlenecks.

One major objection to previous iterations of this code was the simple
replacement of the host_lock with atomic counters for the host and target
busy counters.  The host_lock avoidance on its own already improves
performance, and with the patch to avoid maintaining the per-target busy
counter unless needed we now replace a lock round trip on the host_lock with
just a single atomic increment in the submission path and a single atomic
decrement in the completion path, which should provide benefits even for the
oddest RISC architecture.  Longer term I'd still love to get rid of these
entirely and use the counters in blk-mq, but due to the difference in how they
are maintained this doesn't seem feasible as long as we still need to
support the legacy request code path.
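
As a rough illustration of the counter scheme described above, here is a
minimal userspace sketch using C11 atomics.  It is not the code from the
patches: struct demo_host and both functions are hypothetical names, the
kernel uses its own atomic_t API rather than <stdatomic.h>, and the
over-limit handling shown here is only an assumption about how such a
check could look.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a host; not the kernel's struct Scsi_Host. */
struct demo_host {
	atomic_int host_busy;	/* commands currently outstanding */
	int can_queue;		/* maximum number of outstanding commands */
};

/*
 * Submission path: a single atomic increment in the common case.
 * If the limit is exceeded the increment is undone and the caller
 * has to retry later.
 */
static bool demo_host_queue_ready(struct demo_host *h)
{
	int busy = atomic_fetch_add(&h->host_busy, 1) + 1;

	if (busy > h->can_queue) {
		atomic_fetch_sub(&h->host_busy, 1);
		return false;
	}
	return true;
}

/* Completion path: a single atomic decrement, no lock taken. */
static void demo_host_complete(struct demo_host *h)
{
	atomic_fetch_sub(&h->host_busy, 1);
}
```

In this sketch neither path takes a lock, so concurrent submitters only
contend on the cache line holding the counter.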

Changes from V3:
 - micro optimize the scsi_*_queue_ready functions (Webb Scales)
 - reverted an unintended but harmless transformation in
   scsi_host_queue_ready (Reported by Webb Scales)
 - remove a superfluous cancel_delayed_work (Reported by Mike Christie)
 - fix for error handling during failed host initialization
   (Reported by Robert Elliott)

Changes from V2:
 - rebased on top of the I/O path cleanups

Changes from V1:
 - rebased on top of the core-for-3.17 branch, most notably the
   scsi logging changes
 - fixed handling of cmd_list to prevent crashes for some heavy
   workloads
 - fixed incorrect handling of !target->can_queue
 - avoid scheduling a workqueue on I/O completions when no queues
   are congested

In addition to the patches in this thread there is also a git tree available at:

	git://git.infradead.org/users/hch/scsi.git scsi-mq.4

This work was sponsored by the ION division of Fusion IO.


* scsi-mq V2
@ 2014-06-25 16:51 Christoph Hellwig
  2014-06-25 16:51 ` [PATCH 07/14] scsi: convert host_busy to atomic_t Christoph Hellwig
  0 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2014-06-25 16:51 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jens Axboe, Bart Van Assche, Robert Elliott, linux-scsi, linux-kernel

This is the second post of the scsi-mq series.

At this point the code is ready for merging and use by developers and early
adopters.  The core blk-mq code isn't that suitable for slow devices
yet, mostly due to the lack of an I/O scheduler, but Jens is working on it.
Similarly there is no dm-multipath support for drivers using blk-mq yet,
but I'm working on it.  It should also be noted that the code doesn't
actually support multiple hardware queues or fine grained tuning of the
blk-mq parameters yet.  All these could be added fairly easily as soon
as low-level drivers want to make use of them.

The amount of changes to the existing code is fairly small, and they are
mostly speedups or cleanups that apply to the old path as well.  Because
of this I also haven't bothered to put it under a config option, just
like the blk-mq core.

The usage of blk-mq dramatically decreases CPU usage under all workloads, going
down from the 100% CPU usage that the old setup can easily hit to usually less
than 20% for maxing out storage subsystems with 512-byte reads and writes,
and it makes it easy to achieve millions of IOPS.  Bart and Robert have
helped with some very detailed measurements that they might be able to send
in reply to this, although these usually involve significantly reworked
low-level drivers to avoid other bottlenecks.

One major objection to previous iterations of this code was the simple
replacement of the host_lock with atomic counters for the host and target
busy counters.  The host_lock avoidance on its own already improves
performance, and with the patch to avoid maintaining the per-target busy
counter unless needed we now replace a lock round trip on the host_lock with
just a single atomic increment in the submission path and a single atomic
decrement in the completion path, which should provide benefits even for the
oddest RISC architecture.  Longer term I'd still love to get rid of these
entirely and use the counters in blk-mq, but due to the difference in how they
are maintained this doesn't seem feasible as long as we still need to
support the legacy request code path.

Changes from V1:
 - rebased on top of the core-for-3.17 branch, most notably the
   scsi logging changes
 - fixed handling of cmd_list to prevent crashes for some heavy
   workloads
 - fixed incorrect handling of !target->can_queue
 - avoid scheduling a workqueue on I/O completions when no queues
   are congested

In addition to the patches in this thread there is also a git tree available at:

	git://git.infradead.org/users/hch/scsi.git scsi-mq.2

This work was sponsored by the ION division of Fusion IO.


* scsi-mq
@ 2014-06-12 13:48 Christoph Hellwig
  2014-06-12 13:48 ` [PATCH 07/14] scsi: convert host_busy to atomic_t Christoph Hellwig
  0 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2014-06-12 13:48 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jens Axboe, Bart Van Assche, Robert Elliott, linux-scsi, linux-kernel

With all the required blk-mq work and the previous set of scsi midlayer
updates in Linus' tree, this is the time for the first formal scsi-mq
submission.

At this point the code is ready for merging and use by developers and early
adopters.  The core blk-mq code isn't that suitable for slow devices
yet, mostly due to the lack of an I/O scheduler, but Jens is working on it.
Similarly there is no dm-multipath support for drivers using blk-mq yet,
but I'm working on it.  It should also be noted that the code doesn't
actually support multiple hardware queues or fine grained tuning of the
blk-mq parameters yet.  All these could be added fairly easily as soon
as low-level drivers want to make use of them.

The amount of changes to the existing code is fairly small, and they are
mostly speedups or cleanups that apply to the old path as well.  Because
of this I also haven't bothered to put it under a config option, just
like the blk-mq core.

The usage of blk-mq dramatically decreases CPU usage under all workloads, going
down from the 100% CPU usage that the old setup can easily hit to usually less
than 20% for maxing out storage subsystems with 512-byte reads and writes,
and it makes it easy to achieve millions of IOPS.  Bart and Robert have
helped with some very detailed measurements that they might be able to send
in reply to this, although these usually involve significantly reworked
low-level drivers to avoid other bottlenecks.

One major objection to previous iterations of this code was the simple
replacement of the host_lock with atomic counters for the host and target
busy counters.  The host_lock avoidance on its own already improves
performance, and with the patch to avoid maintaining the per-target busy
counter unless needed we now replace a lock round trip on the host_lock with
just a single atomic increment in the submission path and a single atomic
decrement in the completion path, which should provide benefits even for the
oddest RISC architecture.  Longer term I'd still love to get rid of these
entirely and use the counters in blk-mq, but due to the difference in how they
are maintained this doesn't seem feasible as long as we still need to
support the legacy request code path.

In addition to the patches in this thread there is also a git tree available at:

	git://git.infradead.org/users/hch/scsi.git scsi-mq

This work was sponsored by the ION division of Fusion IO.



end of thread, other threads:[~2014-08-19 16:11 UTC | newest]

Thread overview: 38+ messages
2014-07-18 10:12 scsi-mq V4 Christoph Hellwig
2014-07-18 10:13 ` [PATCH 01/14] scsi: add scsi_setup_cmnd helper Christoph Hellwig
2014-07-22  3:42   ` Martin K. Petersen
2014-07-22 17:20   ` Webb Scales
2014-07-18 10:13 ` [PATCH 02/14] scsi: split __scsi_queue_insert Christoph Hellwig
2014-07-22  3:44   ` Martin K. Petersen
2014-07-18 10:13 ` [PATCH 03/14] scsi: centralize command re-queueing in scsi_dispatch_fn Christoph Hellwig
2014-07-22  3:46   ` Martin K. Petersen
2014-07-18 10:13 ` [PATCH 04/14] scsi: set ->scsi_done before calling scsi_dispatch_cmd Christoph Hellwig
2014-07-22  3:48   ` Martin K. Petersen
2014-07-18 10:13 ` [PATCH 05/14] scsi: push host_lock down into scsi_{host,target}_queue_ready Christoph Hellwig
2014-07-22  3:52   ` Martin K. Petersen
2014-07-18 10:13 ` [PATCH 06/14] scsi: convert target_busy to an atomic_t Christoph Hellwig
2014-07-22  3:56   ` Martin K. Petersen
2014-07-18 10:13 ` [PATCH 07/14] scsi: convert host_busy to atomic_t Christoph Hellwig
2014-07-22  4:01   ` Martin K. Petersen
2014-07-22  4:18   ` Martin K. Petersen
2014-07-25 11:38     ` Christoph Hellwig
2014-07-18 10:13 ` [PATCH 08/14] scsi: convert device_busy " Christoph Hellwig
2014-07-18 10:13 ` [PATCH 09/14] scsi: fix the {host,target,device}_blocked counter mess Christoph Hellwig
2014-07-25 19:08   ` Martin K. Petersen
2014-07-18 10:13 ` [PATCH 10/14] scsi: only maintain target_blocked if the driver has a target queue limit Christoph Hellwig
2014-07-25 19:10   ` Martin K. Petersen
2014-07-18 10:13 ` [PATCH 11/14] scsi: unwind blk_end_request_all and blk_end_request_err calls Christoph Hellwig
2014-07-25 19:12   ` Martin K. Petersen
2014-07-18 10:13 ` [PATCH 12/14] scatterlist: allow chaining to preallocated chunks Christoph Hellwig
2014-07-25 19:15   ` Martin K. Petersen
2014-07-18 10:13 ` [PATCH 13/14] scsi: add support for a blk-mq based I/O path Christoph Hellwig
2014-07-25 19:29   ` Martin K. Petersen
2014-08-18 22:21   ` Kashyap Desai
2014-08-19 15:41     ` Kashyap Desai
2014-08-19 16:06     ` Christoph Hellwig
2014-08-19 16:11       ` Kashyap Desai
2014-07-18 10:13 ` [PATCH 14/14] fnic: reject device resets without assigned tags for the blk-mq case Christoph Hellwig
2014-07-25 19:31   ` Martin K. Petersen
  -- strict thread matches above, loose matches on Subject: below --
2014-06-25 16:51 scsi-mq V2 Christoph Hellwig
2014-06-25 16:51 ` [PATCH 07/14] scsi: convert host_busy to atomic_t Christoph Hellwig
2014-07-09 11:15   ` Hannes Reinecke
2014-06-12 13:48 scsi-mq Christoph Hellwig
2014-06-12 13:48 ` [PATCH 07/14] scsi: convert host_busy to atomic_t Christoph Hellwig
