From: Daniel Wagner <dwagner@suse.de>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org, Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Mike Galbraith <efault@gmx.de>,
	Christoph Hellwig <hch@infradead.org>,
	Sagi Grimberg <sagi@grimberg.me>
Subject: Re: [PATCH 2/3] blk-mq: Always complete remote completions requests in softirq
Date: Tue, 8 Dec 2020 09:44:09 +0100
Message-ID: <20201208084409.koeftbpnvesp4xtv@beryllium.lan>
In-Reply-To: <20201208082220.hhel5ubeh4uqrwnd@linutronix.de>

On Tue, Dec 08, 2020 at 09:22:20AM +0100, Sebastian Andrzej Siewior wrote:
> Sagi mentioned nvme-tcp as a user of this remote completion and Daniel
> has been kind to run some nvme-tcp tests.

I've started with some benchmarking. The first thing I tried was to find
a setup where the remote path is actually taken. I found one with nvme-fc
and a workload which results in ca. 10% remote completions.

Setup:
 - NVMe over FC
 - 2x Emulex LPe36000 32Gb PCIe Fibre Channel Adapter
 - 8 mpaths
 - 4x E7-4820 v3, 80 cores

Workload:
 - fio --rw=randwrite --name=test --size=50M \
       --iodepth=32 --direct=0 --bs=4k --numjobs=40 \
       --time_based --runtime=1h --ioengine=libaio \
       --group_reporting

(I played around a bit with different workloads; most of them
 won't use the remote completion path.)
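
For reference, whether a completion goes the remote path at all is
decided by blk_mq_complete_need_ipi(). Simplified (and slightly
abridged) from v5.10, the check is roughly:

static inline bool blk_mq_complete_need_ipi(struct request *rq)
{
	int cpu = raw_smp_processor_id();

	if (!IS_ENABLED(CONFIG_SMP) ||
	    !test_bit(QUEUE_FLAG_SAME_COMP, &rq->q->queue_flags))
		return false;

	/* same CPU or same cache domain: complete locally */
	if (cpu == rq->mq_ctx->cpu ||
	    (!test_bit(QUEUE_FLAG_SAME_FORCE, &rq->q->queue_flags) &&
	     cpus_share_cache(cpu, rq->mq_ctx->cpu)))
		return false;

	/* don't try to IPI an offline CPU */
	return cpu_online(rq->mq_ctx->cpu);
}

So the remote path only triggers when the completing CPU doesn't share
a cache domain with the CPU the request was submitted on, which is why
most of the workloads I tried never hit it.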

I've annotated the code with a counter and exported it via
debugfs.

@@ -671,6 +673,8 @@ bool blk_mq_complete_request_remote(struct request *rq)
                return false;

        if (blk_mq_complete_need_ipi(rq)) {
+               ctx->rq_remote++;
+
                rq->csd.func = __blk_mq_complete_request_remote;
                rq->csd.info = rq;
                rq->csd.flags = 0;
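
For completeness, the debugfs side is just the new rq_remote counter in
struct blk_mq_ctx plus a show callback next to the existing per-ctx
"completed" attribute. Something along these lines (a sketch of the
idea, not the exact diff I applied; ctx_remote_show and the "remote"
attribute name are just placeholders here):

/*
 * block/blk-mq-debugfs.c: hooked into blk_mq_debugfs_ctx_attrs[] as
 * {"remote", 0400, ctx_remote_show}, next to the "completed" entry.
 */
static int ctx_remote_show(void *data, struct seq_file *m)
{
	struct blk_mq_ctx *ctx = data;

	seq_printf(m, "%lu\n", ctx->rq_remote);
	return 0;
}

The percentages in the tables further down are then simply
remote/completed per namespace.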


And I hacked up a small script to collect the data.

- Baseline (5.10-rc7)

  Starting 40 processes
  Jobs: 40 (f=40): [w(40)][100.0%][r=0KiB/s,w=12.0GiB/s][r=0,w=3405k IOPS][eta 00m:00s]
  test: (groupid=0, jobs=40): err= 0: pid=14225: Mon Dec  7 20:09:57 2020
    write: IOPS=3345k, BW=12.8GiB/s (13.7GB/s)(44.9TiB/3600002msec)
      slat (usec): min=2, max=90772, avg= 9.43, stdev=10.67
      clat (usec): min=2, max=91343, avg=371.79, stdev=119.52
       lat (usec): min=5, max=91358, avg=381.31, stdev=122.45
      clat percentiles (usec):
       |  1.00th=[  231],  5.00th=[  245], 10.00th=[  253], 20.00th=[  273],
       | 30.00th=[  293], 40.00th=[  322], 50.00th=[  351], 60.00th=[  388],
       | 70.00th=[  420], 80.00th=[  465], 90.00th=[  529], 95.00th=[  570],
       | 99.00th=[  644], 99.50th=[  676], 99.90th=[  750], 99.95th=[  783],
       | 99.99th=[  873]
     bw (  KiB/s): min=107333, max=749152, per=2.51%, avg=335200.07, stdev=87628.57, samples=288000
     iops        : min=26833, max=187286, avg=83799.66, stdev=21907.09, samples=288000
    lat (usec)   : 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%
    lat (usec)   : 250=8.04%, 500=6.79%, 750=13.75%, 1000=0.09%
    lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
    lat (msec)   : 100=0.01%
    cpu          : usr=29.14%, sys=70.83%, ctx=320219, majf=0, minf=13056
    IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=28.7%, >=64=0.0%
       submit    : 0=0.0%, 4=28.7%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
       complete  : 0=0.0%, 4=28.7%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
       issued rwts: total=0,12042333583,0,0 short=0,0,0,0 dropped=0,0,0,0
       latency   : target=0, window=0, percentile=100.00%, depth=32

  Run status group 0 (all jobs):
    WRITE: bw=12.8GiB/s (13.7GB/s), 12.8GiB/s-12.8GiB/s (13.7GB/s-13.7GB/s), io=44.9TiB (49.3TB), run=3600002-3600002msec

  Disk stats (read/write):
    nvme5n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

- Patched

  Jobs: 40 (f=40): [w(40)][100.0%][r=0KiB/s,w=12.9GiB/s][r=0,w=3383k IOPS][eta 00m:00s]
  test: (groupid=0, jobs=40): err= 0: pid=13413: Mon Dec  7 21:31:01 2020
    write: IOPS=3371k, BW=12.9GiB/s (13.8GB/s)(45.2TiB/3600004msec)
      slat (nsec): min=1984, max=90341k, avg=9308.73, stdev=7068.58
      clat (usec): min=2, max=91259, avg=368.94, stdev=118.31
       lat (usec): min=5, max=91269, avg=378.34, stdev=121.43
      clat percentiles (usec):
       |  1.00th=[  231],  5.00th=[  245], 10.00th=[  255], 20.00th=[  277],
       | 30.00th=[  302], 40.00th=[  318], 50.00th=[  334], 60.00th=[  359],
       | 70.00th=[  392], 80.00th=[  433], 90.00th=[  562], 95.00th=[  635],
       | 99.00th=[  693], 99.50th=[  709], 99.90th=[  766], 99.95th=[  816],
       | 99.99th=[  914]
     bw (  KiB/s): min=124304, max=770204, per=2.50%, avg=337559.07, stdev=91383.66, samples=287995
     iops        : min=31076, max=192551, avg=84389.45, stdev=22845.91, samples=287995
    lat (usec)   : 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%
    lat (usec)   : 250=7.44%, 500=7.84%, 750=13.79%, 1000=0.15%
    lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
    lat (msec)   : 100=0.01%
    cpu          : usr=30.30%, sys=69.69%, ctx=179950, majf=0, minf=7403
    IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=29.2%, >=64=0.0%
       submit    : 0=0.0%, 4=29.2%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
       complete  : 0=0.0%, 4=29.2%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
       issued rwts: total=0,12135617715,0,0 short=0,0,0,0 dropped=0,0,0,0
       latency   : target=0, window=0, percentile=100.00%, depth=32

  Run status group 0 (all jobs):
    WRITE: bw=12.9GiB/s (13.8GB/s), 12.9GiB/s-12.9GiB/s (13.8GB/s-13.8GB/s), io=45.2TiB (49.7TB), run=3600004-3600004msec

  Disk stats (read/write):
    nvme1n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

- Baseline

  nvme5c0n1 completed     411218 remote      38777 9.43%
  nvme5c0n2 completed          0 remote          0 0.00%
  nvme5c1n1 completed     411270 remote      38770 9.43%
  nvme5c1n2 completed         50 remote          0 0.00%
  nvme5c2n1 completed          0 remote          0 0.00%
  nvme5c2n2 completed          0 remote          0 0.00%
  nvme5c3n1 completed     411220 remote      38751 9.42%
  nvme5c3n2 completed          0 remote          0 0.00%
  nvme5c4n1 completed          0 remote          0 0.00%
  nvme5c4n2 completed          0 remote          0 0.00%
  nvme5c5n1 completed          0 remote          0 0.00%
  nvme5c5n2 completed          0 remote          0 0.00%
  nvme5c6n1 completed     411216 remote      38759 9.43%
  nvme5c6n2 completed          0 remote          0 0.00%
  nvme5c7n1 completed          0 remote          0 0.00%
  nvme5c7n2 completed          0 remote          0 0.00%

- Patched

  nvme1c0n1 completed          0 remote          0 0.00%
  nvme1c0n2 completed          0 remote          0 0.00%
  nvme1c1n1 completed     172202 remote      17813 10.34%
  nvme1c1n2 completed         50 remote          0 0.00%
  nvme1c2n1 completed     172147 remote      17831 10.36%
  nvme1c2n2 completed          0 remote          0 0.00%
  nvme1c3n1 completed          0 remote          0 0.00%
  nvme1c3n2 completed          0 remote          0 0.00%
  nvme1c4n1 completed     172159 remote      17825 10.35%
  nvme1c4n2 completed          0 remote          0 0.00%
  nvme1c5n1 completed          0 remote          0 0.00%
  nvme1c5n2 completed          0 remote          0 0.00%
  nvme1c6n1 completed          0 remote          0 0.00%
  nvme1c6n2 completed          0 remote          0 0.00%
  nvme1c7n1 completed     172156 remote      17781 10.33%
  nvme1c7n2 completed          0 remote          0 0.00%


It looks like the patched version shows slightly better numbers for this
workload. slat seems to be one of the major differences between the two
runs, but it is also the only thing I spotted that looks really off.

I'll keep going with some more testing. Let me know what kind of tests
you would also like to see. I'll do a few plain NVMe tests next.

Thanks,
Daniel

