All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Snitzer <snitzer@redhat.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: Bart Van Assche <Bart.VanAssche@wdc.com>,
	"axboe@kernel.dk" <axboe@kernel.dk>,
	"dm-devel@redhat.com" <dm-devel@redhat.com>,
	"hch@infradead.org" <hch@infradead.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"axboe@fb.com" <axboe@fb.com>,
	"martin.petersen@oracle.com" <martin.petersen@oracle.com>
Subject: Re: [PATCH V3 0/5]  dm-rq: improve sequential I/O performance
Date: Fri, 12 Jan 2018 12:18:41 -0500	[thread overview]
Message-ID: <20180112171840.GA4541@redhat.com> (raw)
In-Reply-To: <20180112033308.GC25090@ming.t460p>

On Thu, Jan 11 2018 at 10:33pm -0500,
Ming Lei <ming.lei@redhat.com> wrote:

> On Thu, Jan 11, 2018 at 08:57:21PM -0500, Mike Snitzer wrote:
> > On Thu, Jan 11 2018 at  8:42pm -0500,
> > Ming Lei <ming.lei@redhat.com> wrote:
> > 
> > > On Thu, Jan 11, 2018 at 10:37:37PM +0000, Bart Van Assche wrote:
> > > > On Thu, 2018-01-11 at 17:07 -0500, Mike Snitzer wrote:
> > > > > Bart, if for some reason we regress for some workload you're able to
> > > > > more readily test we can deal with it.  But I'm too encouraged by Ming's
> > > > > performance improvements to hold these changes back any further.
> > > > 
> > > > Sorry Mike but I think Ming's measurement results are not sufficient to
> > > > motivate upstreaming of these patches. What I remember from previous versions
> > > > of this patch series is that Ming measured the performance impact of this
> > > > patch series only for the Emulex FC driver (lpfc). That driver limits queue
> > > > depth to 3. Instead of modifying the dm code, that driver needs to be fixed
> > > > such that it allows larger queue depths.
> > > > 
> > > > Additionally, some time ago I had explained in detail why I think that patch
> > > > 2/5 in this series is wrong and why it will introduce fairness regressions
> > > > in multi-LUN workloads. I think we need performance measurements for this
> > > > patch series for at least the following two configurations before this patch
> > > > series can be considered for upstream inclusion:
> > > > * The performance impact of this patch series for SCSI devices with a
> > > >   realistic queue depth (e.g. 64 or 128).
> > > 
> > > Please look at the cover letter or patch 5's commit log, it mentions that
> > > dm-mpath over virtio-scsi is tested, and the default queue depth of virito-scsi
> > > is 128.
> > > 
> > > > * The performance impact for this patch series for a SCSI host with which
> > > >   multiple LUNs are associated and for which the target system often replies
> > > >   with the SCSI "BUSY" status.
> > > 
> > > I don't think this patch is related with this case, this patch just provides the
> > > BUSY feedback from underlying queue to blk-mq via dm-rq.
> > 
> > Hi Ming,
> > 
> > Please see https://www.redhat.com/archives/dm-devel/2017-April/msg00190.html
> > 
> > Specifically:
> > "That patch [commit 6077c2d706] can be considered as a first step that
> > can be refined further, namely by modifying the dm-rq code further such
> > that dm-rq queues are only rerun after the busy condition has been
> > cleared. The patch at the start of this thread is easier to review and
> > easier to test than any patch that would only rerun dm-rq queues after
> > the busy condition has been cleared."
> > 
> > Do you have time to reason through Bart's previous proposal for how to
> > better address the specific issue that was documented to be fixed in the
> > header for commit 6077c2d706 ?
> 
> Hi Mike,
> 
> Recently we fixed one such issue in blk-mq (eb619fdb2d4cb8b3d34), and I
> highly guess that may fix Bart's case.

Wow, kind of staggering that there was no mention of this fix prior to
now.  Seems _very_ relevant (like the real fix).

> Let's see this commit 6077c2d706 again, I don't know the idea behind the
> fix, which just adds random of 100ms, and this delay degrades performance a
> lot, since no hctx can be scheduled any more during the delay.

I'd love to try to reproduce the issue documented in commit
6077c2d706 but sadly I cannot get that srp-test to work on my system.
Just fails with:

# while srp-test/run_tests -d -r 30 -t 02-mq; do :; done
Unloaded the ib_srpt kernel module
Unloaded the rdma_rxe kernel module
Zero-initializing /dev/ram0 ... done
Zero-initializing /dev/ram1 ... done
Configured SRP target driver
Running test srp-test/tests/02-mq ...
Test file I/O on top of multipath concurrently with logout and login (0 min; mq)
SRP login failed
Test srp-test/tests/02-mq failed

Think the login is failing due to target not being setup properly?
Yeap, now from reading this thread, it is clear that srp-test only works
if you have an IB HCA:
https://patchwork.kernel.org/patch/10041381/

> So again it is definitely a workaround, no any reasoning, no any theory, just
> say it is a fix. Are there anyone who can explain the story?
> 
> IMO, the blk_get_request() in dm-mpath is just like a kmalloc(ATOMIC) which
> is used in other blk-mq drivers' IO path too, so don't know why dm-mq is so
> strange to require such ugly 100ms delay.

The header for commit 6077c2d706 notes that same action that Jens took
to unwedge the request stalls (in the patchwork thread I referenced
above), with:

echo run > /sys/kernel/debug/block/nullb1/state
vs what Bart noted in commit 6077c2d706:
echo run >/sys/kernel/debug/block/dm-0/state

Really does feel like the issue Jens' shared tags fix addressed _is_ the
root cause that commit 6077c2d706 worked around.

> So I suggest to remove it, let's see if there are reports, and if there
> are, I am quite confident we can root cause that and fix that.

Yeah, it really is rediculous that we're getting this kind of pushback
for something that was/is so poorly justified.  And especially given
that dm_mq_queue_rq() _still_ has this:

        if (ti->type->busy && ti->type->busy(ti))
                return BLK_STS_RESOURCE;

yet our desire to avoid blk_mq_delay_run_hw_queue(hctx, 100/*ms*/) in
response to blk_get_request() failure, just prior to returning
BLK_STS_RESOURCE in dm_mq_queue_rq(), is met with hellfire.

I refuse to submit to this highly unexpected cargo cult programming.

This is going upstream for 4.16:
https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/commit/?h=dm-4.16&id=5b18cff4baedde77e0d69bd62a13ae78f9488d89

I've lost patience with the unwillingness to reassess/justify the need
for commit 6077c2d706 ; SO it is just getting removed and we'll debug
and fix any future reported blk-mq stalls (and/or high cpu load) in a
much more precise manner -- provided the reporter is willing to work
with us!

Mike

  reply	other threads:[~2018-01-12 17:18 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-11  6:01 [PATCH V3 0/5] dm-rq: improve sequential I/O performance Ming Lei
2018-01-11  6:01 ` [PATCH V3 1/5] dm-mpath: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE Ming Lei
2018-01-11  6:01 ` [PATCH V3 2/5] dm-mpath: return DM_MAPIO_REQUEUE in case of rq allocation failure Ming Lei
2018-01-12 19:04   ` Bart Van Assche
2018-01-12 19:04     ` Bart Van Assche
2018-01-13  1:29     ` Ming Lei
2018-01-11  6:01 ` [PATCH V3 3/5] blk-mq: move actual issue into one helper Ming Lei
2018-01-11 22:09   ` Mike Snitzer
2018-01-11  6:01 ` [PATCH V3 4/5] blk-mq: return dispatch result to caller in blk_mq_try_issue_directly Ming Lei
2018-01-11 22:10   ` Mike Snitzer
2018-01-11  6:01 ` [PATCH V3 5/5] blk-mq: issue request directly for blk_insert_cloned_request Ming Lei
2018-01-11 22:42   ` Mike Snitzer
2018-01-11 22:07 ` [PATCH V3 0/5] dm-rq: improve sequential I/O performance Mike Snitzer
2018-01-11 22:37   ` Bart Van Assche
2018-01-11 22:37     ` Bart Van Assche
2018-01-11 22:58     ` Mike Snitzer
2018-01-11 23:27       ` Bart Van Assche
2018-01-11 23:27         ` Bart Van Assche
2018-01-12  1:43         ` Mike Snitzer
2018-01-12  1:42     ` Ming Lei
2018-01-12  1:57       ` Mike Snitzer
2018-01-12  3:33         ` Ming Lei
2018-01-12 17:18           ` Mike Snitzer [this message]
2018-01-12 17:26             ` Bart Van Assche
2018-01-12 17:26               ` Bart Van Assche
2018-01-12 17:40               ` Mike Snitzer
2018-01-12 17:46                 ` Bart Van Assche
2018-01-12 17:46                   ` Bart Van Assche
2018-01-12 18:06                   ` Mike Snitzer
2018-01-12 18:54                     ` Bart Van Assche
2018-01-12 18:54                       ` Bart Van Assche
2018-01-12 19:29                       ` Mike Snitzer
2018-01-12 19:53                       ` Elliott, Robert (Persistent Memory)
2018-01-12 19:53                         ` Elliott, Robert (Persistent Memory)
2018-01-13  0:52                         ` Mike Snitzer
2018-01-13  1:00                           ` Bart Van Assche
2018-01-13  1:00                             ` Bart Van Assche
2018-01-13  1:37                             ` Mike Snitzer
2018-01-13 15:14                               ` Mike Snitzer
2018-01-12 22:31                       ` Mike Snitzer
2018-01-13 15:04                         ` Ming Lei
2018-01-13 15:04                           ` Ming Lei
2018-01-13 15:10                           ` Mike Snitzer
2018-01-12 23:17                       ` Mike Snitzer
2018-01-12 23:42                         ` Bart Van Assche
2018-01-12 23:42                           ` Bart Van Assche
2018-01-13  0:45                           ` Mike Snitzer
2018-01-13 14:34                       ` Ming Lei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180112171840.GA4541@redhat.com \
    --to=snitzer@redhat.com \
    --cc=Bart.VanAssche@wdc.com \
    --cc=axboe@fb.com \
    --cc=axboe@kernel.dk \
    --cc=dm-devel@redhat.com \
    --cc=hch@infradead.org \
    --cc=linux-block@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=ming.lei@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.