LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: John Garry <john.garry@huawei.com>
Cc: tglx@linutronix.de, chenxiang66@hisilicon.com,
	bigeasy@linutronix.de, linux-kernel@vger.kernel.org,
	maz@kernel.org, hare@suse.com, hch@lst.de, axboe@kernel.dk,
	bvanassche@acm.org, peterz@infradead.org, mingo@redhat.com
Subject: Re: [PATCH RFC 1/1] genirq: Make threaded handler use irq affinity for managed interrupt
Date: Fri, 13 Dec 2019 06:38:05 +0800
Message-ID: <20191212223805.GA24463@ming.t460p> (raw)
In-Reply-To: <0ad37515-c22d-6857-65a2-cc28256a8afa@huawei.com>


[-- Attachment #1: Type: text/plain, Size: 4575 bytes --]

On Wed, Dec 11, 2019 at 05:09:18PM +0000, John Garry wrote:
> On 10/12/2019 01:43, Ming Lei wrote:
> > > > > For when the interrupt is managed, allow the threaded part to run on all
> > > > > cpus in the irq affinity mask.
> > > > I remembered that performance drop is observed by this approach in some
> > > > test.
> > >  From checking the thread about the NVMe interrupt swamp, just switching to
> > > threaded handler alone degrades performance. I didn't see any specific
> > > results for this change from Long Li -https://lkml.org/lkml/2019/8/21/128
> 
> Hi Ming,
> 
> > I am pretty clear the reason for Azure, which is caused by aggressive interrupt
> > coalescing, and this behavior shouldn't be very common, and it can be
> > addressed by the following patch:
> 
> I am running some NVMe perf tests with Marc's patch.

We need to confirm that if Marc's patch works as expected, could you
collect log via the attached script?

> 
> I see this almost always eventually (with or without that patch):
> 
> [   66.018140] rcu: INFO: rcu_preempt self-detected stall on CPU2% done]
> [5058MB/0KB/0KB /s] [1295K/0/0 iops] [eta 01m:39s]
> [   66.023885] rcu: 12-....: (5250 ticks this GP)
> idle=182/1/0x4000000000000004 softirq=517/517 fqs=2529
> [   66.033306] (t=5254 jiffies g=733 q=2241)
> [   66.037394] Task dump for CPU 12:
> [   66.040696] fio             R  running task        0   798    796
> 0x00000002
> [   66.047733] Call trace:
> [   66.050173]  dump_backtrace+0x0/0x1a0
> [   66.053823]  show_stack+0x14/0x20
> [   66.057126]  sched_show_task+0x164/0x1a0
> [   66.061036]  dump_cpu_task+0x40/0x2e8
> [   66.064686]  rcu_dump_cpu_stacks+0xa0/0xe0
> [   66.068769]  rcu_sched_clock_irq+0x6d8/0xaa8
> [   66.073027]  update_process_times+0x2c/0x50
> [   66.077198]  tick_sched_handle.isra.14+0x30/0x50
> [   66.081802]  tick_sched_timer+0x48/0x98
> [   66.085625]  __hrtimer_run_queues+0x120/0x1b8
> [   66.089968]  hrtimer_interrupt+0xd4/0x250
> [   66.093966]  arch_timer_handler_phys+0x28/0x40
> [   66.098398]  handle_percpu_devid_irq+0x80/0x140
> [   66.102915]  generic_handle_irq+0x24/0x38
> [   66.106911]  __handle_domain_irq+0x5c/0xb0
> [   66.110995]  gic_handle_irq+0x5c/0x148
> [   66.114731]  el1_irq+0xb8/0x180
> [   66.117858]  efi_header_end+0x94/0x234
> [   66.121595]  irq_exit+0xd0/0xd8
> [   66.124724]  __handle_domain_irq+0x60/0xb0
> [   66.128806]  gic_handle_irq+0x5c/0x148
> [   66.132542]  el0_irq_naked+0x4c/0x54
> [   97.152870] rcu: INFO: rcu_preempt self-detected stall on CPU8% done]
> [4736MB/0KB/0KB /s] [1212K/0/0 iops] [eta 01m:08s]
> [   97.158616] rcu: 8-....: (1 GPs behind) idle=08e/1/0x4000000000000002
> softirq=462/505 fqs=2621
> [   97.167414] (t=5253 jiffies g=737 q=5507)
> [   97.171498] Task dump for CPU 8:
> [pu_task+0x40/0x2e8
> [   97.198705]  rcu_dump_cpu_stacks+0xa0/0xe0
> [   97.202788]  rcu_sched_clock_irq+0x6d8/0xaa8
> [   97.207046]  update_process_times+0x2c/0x50
> [   97.211217]  tick_sched_handle.isra.14+0x30/0x50
> [   97.215820]  tick_sched_timer+0x48/0x98
> [   97.219644]  __hrtimer_run_queues+0x120/0x1b8
> [   97.223989]  hrtimer_interrupt+0xd4/0x250
> [   97.227987]  arch_timer_handler_phys+0x28/0x40
> [   97.232418]  handle_percpu_devid_irq+0x80/0x140
> [   97.236935]  generic_handle_irq+0x24/0x38
> [   97.240931]  __handle_domain_irq+0x5c/0xb0
> [   97.245015]  gic_handle_irq+0x5c/0x148
> [   97.248751]  el1_irq+0xb8/0x180
> [   97.251880]  find_busiest_group+0x18c/0x9e8
> [   97.256050]  load_balance+0x154/0xb98
> [   97.259700]  rebalance_domains+0x1cc/0x2f8
> [   97.263783]  run_rebalance_domains+0x78/0xe0
> [   97.268040]  efi_header_end+0x114/0x234
> [   97.271864]  run_ksoftirqd+0x38/0x48
> [   97.275427]  smpboot_thread_fn+0x16c/0x270
> [   97.279511]  kthread+0x118/0x120
> [   97.282726]  ret_from_fork+0x10/0x18
> [   97.286289] Task dump for CPU 12:
> [   97.289591] kworker/12:1    R  running task        0   570      2
> 0x0000002a
> [   97.296634] Workqueue:  0x0 (mm_percpu_wq)
> [   97.300718] Call trace:
> [   97.303152]  __switch_to+0xbc/0x218
> [   97.306632]  page_wait_table+0x1500/0x1800
> 
> Would this be the same interrupt "swamp" issue?

It could be, but reason need to investigated.

You never provide the test details(how many drives, how many disks
attached to each drive) as I asked, so I can't comment on the reason,
also no reason shows that the patch is a good fix.

My theory is simple, so far, the CPU is still much quicker than
current storage in case that IO aren't from multiple disks which are
connected to same drive.

Thanks, 
Ming

[-- Attachment #2: dump-io-irq-affinity --]
[-- Type: text/plain, Size: 1023 bytes --]

#!/bin/sh

get_disk_from_pcid()
{
	PCID=$1

	DISKS=`find /sys/block -name "*"`
	for DISK in $DISKS; do
		DISKP=`realpath $DISK/device`
		echo $DISKP | grep $PCID > /dev/null
		[ $? -eq 0 ] && echo `basename $DISK` && break
	done
}

dump_irq_affinity()
{
	PCID=$1
	PCIP=`find /sys/devices -name *$PCID | grep pci`

	[[ ! -d $PCIP/msi_irqs ]] && return

	IRQS=`ls $PCIP/msi_irqs`

	[ $? -ne 0 ] && return

	DISK=`get_disk_from_pcid $PCID`
	echo "PCI name is $PCID: $DISK"

	for IRQ in $IRQS; do
	    [ -f /proc/irq/$IRQ/smp_affinity_list ] && CPUS=`cat /proc/irq/$IRQ/smp_affinity_list`
	    [ -f /proc/irq/$IRQ/effective_affinity_list ] && ECPUS=`cat /proc/irq/$IRQ/effective_affinity_list`
	    echo -e "\tirq $IRQ, cpu list $CPUS, effective list $ECPUS"
	done
}


if [ $# -ge 1 ]; then
	PCIDS=$1
else
#	PCID=`lspci | grep "Non-Volatile memory" | cut -c1-7`
	PCIDS=`lspci | grep "Non-Volatile memory controller" | awk '{print $1}'`
fi

echo "kernel version: "
uname -a

for PCID in $PCIDS; do
	dump_irq_affinity $PCID
done

  reply index

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-06 14:35 [PATCH RFC 0/1] Threaded handler uses irq affinity for when the interrupt is managed John Garry
2019-12-06 14:35 ` [PATCH RFC 1/1] genirq: Make threaded handler use irq affinity for managed interrupt John Garry
2019-12-06 15:22   ` Marc Zyngier
2019-12-06 16:16     ` John Garry
2019-12-07  8:03   ` Ming Lei
2019-12-09 14:30     ` John Garry
2019-12-09 15:09       ` Hannes Reinecke
2019-12-09 15:17         ` Marc Zyngier
2019-12-09 15:25           ` Hannes Reinecke
2019-12-09 15:36             ` Marc Zyngier
2019-12-09 15:49           ` Qais Yousef
2019-12-09 15:55             ` Marc Zyngier
2019-12-10  1:43       ` Ming Lei
2019-12-10  9:45         ` John Garry
2019-12-10 10:06           ` Ming Lei
2019-12-10 10:28           ` Marc Zyngier
2019-12-10 10:59             ` John Garry
2019-12-10 11:36               ` Marc Zyngier
2019-12-10 12:05                 ` John Garry
2019-12-10 18:32                   ` Marc Zyngier
2019-12-11  9:41                     ` John Garry
2019-12-13 10:07                       ` John Garry
2019-12-13 10:31                         ` Marc Zyngier
2019-12-13 12:08                           ` John Garry
2019-12-14 10:59                             ` Marc Zyngier
2019-12-11 17:09         ` John Garry
2019-12-12 22:38           ` Ming Lei [this message]
2019-12-13 11:12             ` John Garry
2019-12-13 13:18               ` Ming Lei
2019-12-13 15:43                 ` John Garry
2019-12-13 17:12                   ` Ming Lei
2019-12-13 17:50                     ` John Garry
2019-12-14 13:56                   ` Marc Zyngier
2019-12-16 10:47                     ` John Garry
2019-12-16 11:40                       ` Marc Zyngier
2019-12-16 14:17                         ` John Garry
2019-12-16 18:00                           ` Marc Zyngier
2019-12-16 18:50                             ` John Garry
2019-12-20 11:30                               ` John Garry
2019-12-20 14:43                                 ` Marc Zyngier
2019-12-20 15:38                                   ` John Garry
2019-12-20 16:16                                     ` Marc Zyngier
2019-12-20 23:31                                     ` Ming Lei
2019-12-23  9:07                                       ` Marc Zyngier
2019-12-23 10:26                                         ` John Garry
2019-12-23 10:47                                           ` Marc Zyngier
2019-12-23 11:35                                             ` John Garry
2019-12-24  1:59                                             ` Ming Lei
2019-12-24 11:20                                               ` Marc Zyngier
2019-12-25  0:48                                                 ` Ming Lei
2020-01-02 10:35                                                   ` John Garry
2020-01-03  0:46                                                     ` Ming Lei
2020-01-03 10:41                                                       ` John Garry
2020-01-03 11:29                                                         ` Ming Lei
2020-01-03 11:50                                                           ` John Garry
2020-01-04 12:03                                                             ` Ming Lei
2020-05-30  7:46 ` [tip: irq/core] irqchip/gic-v3-its: Balance initial LPI affinity across CPUs tip-bot2 for Marc Zyngier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191212223805.GA24463@ming.t460p \
    --to=ming.lei@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=bigeasy@linutronix.de \
    --cc=bvanassche@acm.org \
    --cc=chenxiang66@hisilicon.com \
    --cc=hare@suse.com \
    --cc=hch@lst.de \
    --cc=john.garry@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git