All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Garry <john.garry@huawei.com>
To: Marc Zyngier <maz@kernel.org>
Cc: Ming Lei <ming.lei@redhat.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"chenxiang (M)" <chenxiang66@hisilicon.com>,
	"bigeasy@linutronix.de" <bigeasy@linutronix.de>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"hare@suse.com" <hare@suse.com>, "hch@lst.de" <hch@lst.de>,
	"axboe@kernel.dk" <axboe@kernel.dk>,
	"bvanassche@acm.org" <bvanassche@acm.org>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"mingo@redhat.com" <mingo@redhat.com>
Subject: Re: [PATCH RFC 1/1] genirq: Make threaded handler use irq affinity for managed interrupt
Date: Mon, 16 Dec 2019 10:47:45 +0000	[thread overview]
Message-ID: <7db89b97-1b9e-8dd1-684a-3eef1b1af244@huawei.com> (raw)
In-Reply-To: <20191214135641.5a817512@why>

On 14/12/2019 13:56, Marc Zyngier wrote:
> On Fri, 13 Dec 2019 15:43:07 +0000
> John Garry <john.garry@huawei.com> wrote:
> 
> [...]
> 
>> john@ubuntu:~$ ./dump-io-irq-affinity
>> kernel version:
>> Linux ubuntu 5.5.0-rc1-00003-g7adc5d7ec1ca-dirty #1440 SMP PREEMPT Fri Dec 13 14:53:19 GMT 2019 aarch64 aarch64 aarch64 GNU/Linux
>> PCI name is 04:00.0: nvme0n1
>> irq 56, cpu list 75, effective list 5
>> irq 60, cpu list 24-28, effective list 10
> 
> The NUMA selection code definitely gets in the way. And to be honest,
> this NUMA thing is only there for the benefit of a terminally broken
> implementation (Cavium ThunderX), which we should have never supported
> the first place.
> 
> Let's rework this and simply use the managed affinity whenever
> available instead. It may well be that it will break TX1, but I care
> about it just as much as Cavium/Marvell does...

I'm just wondering if non-managed interrupts should be included in the 
load balancing calculation? Couldn't irqbalance (if active) start moving 
non-managed interrupts around anyway?

> 
> Please give this new patch a shot on your system (my D05 doesn't have
> any managed devices):

We could consider supporting platform msi managed interrupts, but I 
doubt the value.

> 
> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/commit/?h=irq/its-balance-mappings&id=1e987d83b8d880d56c9a2d8a86289631da94e55a
> 

I quickly tested that in my NVMe env, and I see a performance boost of 
1055K -> 1206K IOPS. Results at bottom.

Here's the irq mapping dump:

PCI name is 04:00.0: nvme0n1
irq 56, cpu list 75, effective list 75
irq 60, cpu list 24-28, effective list 26
irq 61, cpu list 29-33, effective list 30
irq 62, cpu list 34-38, effective list 35
irq 63, cpu list 39-43, effective list 40
irq 64, cpu list 44-47, effective list 45
irq 65, cpu list 48-51, effective list 49
irq 66, cpu list 52-55, effective list 55
irq 67, cpu list 56-59, effective list 57
irq 68, cpu list 60-63, effective list 61
irq 69, cpu list 64-67, effective list 65
irq 70, cpu list 68-71, effective list 69
irq 71, cpu list 72-75, effective list 73
irq 72, cpu list 76-79, effective list 77
irq 73, cpu list 80-83, effective list 81
irq 74, cpu list 84-87, effective list 85
irq 75, cpu list 88-91, effective list 89
irq 76, cpu list 92-95, effective list 93
irq 77, cpu list 0-3, effective list 1
irq 78, cpu list 4-7, effective list 6
irq 79, cpu list 8-11, effective list 9
irq 80, cpu list 12-15, effective list 13
irq 81, cpu list 16-19, effective list 17
irq 82, cpu list 20-23, effective list 21
PCI name is 81:00.0: nvme1n1
irq 100, cpu list 0-3, effective list 0
irq 101, cpu list 4-7, effective list 4
irq 102, cpu list 8-11, effective list 8
irq 103, cpu list 12-15, effective list 12
irq 104, cpu list 16-19, effective list 16
irq 105, cpu list 20-23, effective list 20
irq 57, cpu list 63, effective list 63
irq 83, cpu list 24-28, effective list 26
irq 84, cpu list 29-33, effective list 29
irq 85, cpu list 34-38, effective list 34
irq 86, cpu list 39-43, effective list 39
irq 87, cpu list 44-47, effective list 44
irq 88, cpu list 48-51, effective list 48
irq 89, cpu list 52-55, effective list 54
irq 90, cpu list 56-59, effective list 56
irq 91, cpu list 60-63, effective list 60
irq 92, cpu list 64-67, effective list 64
irq 93, cpu list 68-71, effective list 68
irq 94, cpu list 72-75, effective list 72
irq 95, cpu list 76-79, effective list 76
irq 96, cpu list 80-83, effective list 80
irq 97, cpu list 84-87, effective list 84
irq 98, cpu list 88-91, effective list 88
irq 99, cpu list 92-95, effective list 92

I'm still getting the CPU lockup (even on CPUs which have a single NVMe 
completion interrupt assigned), which taints these results. That lockup 
needs to be fixed.

We'll check on our SAS env also. I did already hack something up similar 
to your change and again we saw a boost there.

Thanks,
John


before

job1: (groupid=0, jobs=20): err= 0: pid=1328: Mon Dec 16 10:03:35 2019
    read: IOPS=1055k, BW=4121MiB/s (4322MB/s)(994GiB/246946msec)
     slat (usec): min=2, max=36747k, avg= 6.08, stdev=4018.85
     clat (usec): min=13, max=145774k, avg=369.87, stdev=50221.38
      lat (usec): min=22, max=145774k, avg=376.12, stdev=50387.08
     clat percentiles (usec):
      |  1.00th=[  105],  5.00th=[  128], 10.00th=[  149], 20.00th=[  178],
      | 30.00th=[  210], 40.00th=[  243], 50.00th=[  281], 60.00th=[  326],
      | 70.00th=[  396], 80.00th=[  486], 90.00th=[  570], 95.00th=[  619],
      | 99.00th=[  775], 99.50th=[  906], 99.90th=[ 1254], 99.95th=[ 1631],
      | 99.99th=[ 3884]
    bw (  KiB/s): min=    8, max=715687, per=5.65%, avg=238518.42, 
stdev=115795.80, samples=8726
    iops        : min=    2, max=178921, avg=59629.49, stdev=28948.95, 
samples=8726
   lat (usec)   : 20=0.01%, 50=0.01%, 100=0.60%, 250=41.66%, 500=39.19%
   lat (usec)   : 750=17.36%, 1000=0.95%
   lat (msec)   : 2=0.20%, 4=0.02%, 10=0.01%, 20=0.01%, 50=0.01%
   lat (msec)   : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
   lat (msec)   : 2000=0.01%, >=2000=0.01%
   cpu          : usr=8.26%, sys=33.56%, ctx=132171506, majf=0, minf=6774
   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, 
 >=64=0.0%
      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
 >=64=0.0%
      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, 
 >=64=0.0%
      issued rwt: total=260541724,0,0, short=0,0,0, dropped=0,0,0
      latency   : target=0, window=0, percentile=100.00%, depth=20

Run status group 0 (all jobs):
    READ: bw=4121MiB/s (4322MB/s), 4121MiB/s-4121MiB/s 
(4322MB/s-4322MB/s), io=994GiB (1067GB), run=246946-246946msec

Disk stats (read/write):
   nvme0n1: ios=136993553/0, merge=0/0, ticks=42019997/0, 
in_queue=14168, util=100.00%
   nvme1n1: ios=123408538/0, merge=0/0, ticks=37371364/0, 
in_queue=44672, util=100.00%
john@ubuntu:~$ dmesg | grep "Linux v"
[    0.000000] Linux version 5.5.0-rc1-dirty 
(john@john-ThinkCentre-M93p) (gcc version 7.3.1 20180425 
[linaro-7.3-2018.05-rc1 revision 
38aec9a676236eaa42ca03ccb3a6c1dd0182c29f] (Linaro GCC 7.3-2018.05-rc1)) 
#546 SMP PREEMPT Mon Dec 16 09:47:44 GMT 2019
john@ubuntu:~$


after

Creat 4k_read_depth20_fiotest file sucessfully_cpu_liuyifan_nvme.sh 4k 
read 20 1
job1: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 
4096B-4096B, ioengine=libaio, iodepth=20
...
job1: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 
4096B-4096B, ioengine=libaio, iodepth=20
...
fio-3.1
Starting 20 processes
[  318.569268] rcu: INFO: rcu_preempt self-detected stall on CPU0 
IOPS][eta 04m:30s]
[  318.575010] rcu: 26-....: (1 GPs behind) 
idle=b82/1/0x4000000000000004 softirq=842/843 fqs=2508
[  355.781759] rcu: INFO: rcu_preempt self-detected stall on CPU0 
IOPS][eta 03m:53s]
[  355.787499] rcu: 34-....: (1 GPs behind) 
idle=35a/1/0x4000000000000004 softirq=10395/11729 fqs=2623
[  407.805329] rcu: INFO: rcu_preempt self-detected stall on CPU0 
IOPS][eta 03m:01s]
[  407.811069] rcu: 0-....: (1 GPs behind) idle=0ba/0/0x3 
softirq=10830/14926 fqs=2625
Jobs: 20 (f=20): [R(20)][61.0%][r=4747MiB/s,w=0KiB/s][r=1215k,w=0[ 
470.817317] rcu: INFO: rcu_preempt self-detected stall on CPU
[  470.824912] rcu: 0-....: (2779 ticks this GP) idle=0ba/0/0x3 
softirq=14927/14927 fqs=10501
[  533.829618] rcu: INFO: rcu_preempt self-detected stall on CPU0 
IOPS][eta 00m:54s]
[  533.835360] rcu: 39-....: (1 GPs behind) 
idle=74e/1/0x4000000000000004 softirq=3422/3422 fqs=17226
Jobs: 20 (f=20): [R(20)][100.0%][r=4822MiB/s,w=0KiB/s][r=1234k,w=0 
IOPS][eta 00m:00s]
job1: (groupid=0, jobs=20): err= 0: pid=1273: Mon Dec 16 10:15:55 2019
    read: IOPS=1206k, BW=4712MiB/s (4941MB/s)(1381GiB/300002msec)
     slat (usec): min=2, max=165648k, avg= 7.26, stdev=10373.59
     clat (usec): min=12, max=191808k, avg=323.17, stdev=57005.77
      lat (usec): min=19, max=191808k, avg=330.59, stdev=58014.79
     clat percentiles (usec):
      |  1.00th=[  106],  5.00th=[  151], 10.00th=[  174], 20.00th=[  194],
      | 30.00th=[  212], 40.00th=[  231], 50.00th=[  247], 60.00th=[  262],
      | 70.00th=[  285], 80.00th=[  330], 90.00th=[  457], 95.00th=[  537],
      | 99.00th=[  676], 99.50th=[  807], 99.90th=[ 1647], 99.95th=[ 2376],
      | 99.99th=[ 6915]
    bw (  KiB/s): min=    8, max=648593, per=5.73%, avg=276597.82, 
stdev=98174.89, samples=10475
    iops        : min=    2, max=162148, avg=69149.31, stdev=24543.72, 
samples=10475
   lat (usec)   : 20=0.01%, 50=0.01%, 100=0.67%, 250=51.48%, 500=41.68%
   lat (usec)   : 750=5.54%, 1000=0.33%
   lat (msec)   : 2=0.23%, 4=0.05%, 10=0.02%, 20=0.01%, 50=0.01%
   lat (msec)   : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
   lat (msec)   : 2000=0.01%, >=2000=0.01%
   cpu          : usr=9.77%, sys=41.68%, ctx=218155976, majf=0, minf=6376
   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, 
 >=64=0.0%
      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
 >=64=0.0%
      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, 
 >=64=0.0%
      issued rwt: total=361899317,0,0, short=0,0,0, dropped=0,0,0
      latency   : target=0, window=0, percentile=100.00%, depth=20

Run status group 0 (all jobs):
    READ: bw=4712MiB/s (4941MB/s), 4712MiB/s-4712MiB/s 
(4941MB/s-4941MB/s), io=1381GiB (1482GB), run=300002-300002msec

Disk stats (read/write):
   nvme0n1: ios=188627578/0, merge=0/0, ticks=50365208/0, 
in_queue=55380, util=100.00%
   nvme1n1: ios=173066657/0, merge=0/0, ticks=38804419/0, 
in_queue=151212, util=100.00%
john@ubuntu:~$ dmesg | grep "Linux v"
[    0.000000] Linux version 5.5.0-rc1-00001-g1e987d83b8d8-dirty 
(john@john-ThinkCentre-M93p) (gcc version 7.3.1 20180425 
[linaro-7.3-2018.05-rc1 revision 
38aec9a676236eaa42ca03ccb3a6c1dd0182c29f] (Linaro GCC 7.3-2018.05-rc1)) 
#547 SMP PREEMPT Mon Dec 16 10:02:27 GMT 2019

  reply	other threads:[~2019-12-16 10:47 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-06 14:35 [PATCH RFC 0/1] Threaded handler uses irq affinity for when the interrupt is managed John Garry
2019-12-06 14:35 ` [PATCH RFC 1/1] genirq: Make threaded handler use irq affinity for managed interrupt John Garry
2019-12-06 15:22   ` Marc Zyngier
2019-12-06 16:16     ` John Garry
2019-12-07  8:03   ` Ming Lei
2019-12-09 14:30     ` John Garry
2019-12-09 15:09       ` Hannes Reinecke
2019-12-09 15:17         ` Marc Zyngier
2019-12-09 15:25           ` Hannes Reinecke
2019-12-09 15:36             ` Marc Zyngier
2019-12-09 15:49           ` Qais Yousef
2019-12-09 15:55             ` Marc Zyngier
2019-12-10  1:43       ` Ming Lei
2019-12-10  9:45         ` John Garry
2019-12-10 10:06           ` Ming Lei
2019-12-10 10:28           ` Marc Zyngier
2019-12-10 10:59             ` John Garry
2019-12-10 11:36               ` Marc Zyngier
2019-12-10 12:05                 ` John Garry
2019-12-10 18:32                   ` Marc Zyngier
2019-12-11  9:41                     ` John Garry
2019-12-13 10:07                       ` John Garry
2019-12-13 10:31                         ` Marc Zyngier
2019-12-13 12:08                           ` John Garry
2019-12-14 10:59                             ` Marc Zyngier
2019-12-11 17:09         ` John Garry
2019-12-12 22:38           ` Ming Lei
2019-12-13 11:12             ` John Garry
2019-12-13 13:18               ` Ming Lei
2019-12-13 15:43                 ` John Garry
2019-12-13 17:12                   ` Ming Lei
2019-12-13 17:50                     ` John Garry
2019-12-14 13:56                   ` Marc Zyngier
2019-12-16 10:47                     ` John Garry [this message]
2019-12-16 11:40                       ` Marc Zyngier
2019-12-16 14:17                         ` John Garry
2019-12-16 18:00                           ` Marc Zyngier
2019-12-16 18:50                             ` John Garry
2019-12-20 11:30                               ` John Garry
2019-12-20 14:43                                 ` Marc Zyngier
2019-12-20 15:38                                   ` John Garry
2019-12-20 16:16                                     ` Marc Zyngier
2019-12-20 23:31                                     ` Ming Lei
2019-12-23  9:07                                       ` Marc Zyngier
2019-12-23 10:26                                         ` John Garry
2019-12-23 10:47                                           ` Marc Zyngier
2019-12-23 11:35                                             ` John Garry
2019-12-24  1:59                                             ` Ming Lei
2019-12-24 11:20                                               ` Marc Zyngier
2019-12-25  0:48                                                 ` Ming Lei
2020-01-02 10:35                                                   ` John Garry
2020-01-03  0:46                                                     ` Ming Lei
2020-01-03 10:41                                                       ` John Garry
2020-01-03 11:29                                                         ` Ming Lei
2020-01-03 11:50                                                           ` John Garry
2020-01-04 12:03                                                             ` Ming Lei
2020-05-30  7:46 ` [tip: irq/core] irqchip/gic-v3-its: Balance initial LPI affinity across CPUs tip-bot2 for Marc Zyngier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7db89b97-1b9e-8dd1-684a-3eef1b1af244@huawei.com \
    --to=john.garry@huawei.com \
    --cc=axboe@kernel.dk \
    --cc=bigeasy@linutronix.de \
    --cc=bvanassche@acm.org \
    --cc=chenxiang66@hisilicon.com \
    --cc=hare@suse.com \
    --cc=hch@lst.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=ming.lei@redhat.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.