From: Marc Zyngier <maz@kernel.org>
To: John Garry <john.garry@huawei.com>
Cc: Ming Lei <ming.lei@redhat.com>, <tglx@linutronix.de>,
	<chenxiang66@hisilicon.com>, <bigeasy@linutronix.de>,
	<linux-kernel@vger.kernel.org>, <hare@suse.com>, <hch@lst.de>,
	<axboe@kernel.dk>, <bvanassche@acm.org>, <peterz@infradead.org>,
	<mingo@redhat.com>
Subject: Re: [PATCH RFC 1/1] genirq: Make threaded handler use irq affinity for managed interrupt
Date: Tue, 10 Dec 2019 10:28:12 +0000	[thread overview]
Message-ID: <048746c22898849d28985c0f65cf2c2a@www.loen.fr> (raw)
In-Reply-To: <28424a58-1159-c3f9-1efb-f1366993afcf@huawei.com>

On 2019-12-10 09:45, John Garry wrote:
> On 10/12/2019 01:43, Ming Lei wrote:
>> On Mon, Dec 09, 2019 at 02:30:59PM +0000, John Garry wrote:
>>> On 07/12/2019 08:03, Ming Lei wrote:
>>>> On Fri, Dec 06, 2019 at 10:35:04PM +0800, John Garry wrote:
>>>>> Currently the cpu allowed mask for the threaded part of a threaded
>>>>> irq handler will be set to the effective affinity of the hard irq.
>>>>>
>>>>> Typically the effective affinity of the hard irq will be for a
>>>>> single cpu. As such, the threaded handler would always run on the
>>>>> same cpu as the hard irq.
>>>>>
>>>>> We have seen scenarios in high data-rate throughput testing where
>>>>> the cpu handling the interrupt can be totally saturated handling
>>>>> both the hard interrupt and threaded handler parts, limiting
>>>>> throughput.
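[For context, the change under discussion amounts to roughly the
following in irq_thread_check_affinity() in kernel/irq/manage.c. This is
a simplified sketch of the idea, not the exact RFC diff; the thread-flag
test and error handling of the real function are omitted:

/*
 * Sketch: when the interrupt is managed, let the irq thread follow
 * the full affinity mask rather than the (typically single-CPU)
 * effective affinity of the hard irq.
 */
static void irq_thread_check_affinity(struct irq_desc *desc,
				      struct irqaction *action)
{
	cpumask_var_t mask;

	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
		return;

	raw_spin_lock_irq(&desc->lock);
	if (cpumask_available(desc->irq_common_data.affinity)) {
		const struct cpumask *m;

		if (irqd_affinity_is_managed(&desc->irq_data))
			/* managed: roam over the whole affinity mask */
			m = desc->irq_common_data.affinity;
		else
			/* default: follow the hard irq's effective affinity */
			m = irq_data_get_effective_affinity_mask(&desc->irq_data);
		cpumask_copy(mask, m);
	}
	raw_spin_unlock_irq(&desc->lock);

	set_cpus_allowed_ptr(current, mask);
	free_cpumask_var(mask);
}

With this, the scheduler is free to place the irq thread on any CPU in
the managed mask, rather than stacking it on the CPU taking the hard
interrupt.]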
>>>>
>
> Hi Ming,
>
>>>> Frankly speaking, I have never observed a single CPU saturated by
>>>> one storage completion queue's interrupt load, because the CPU is
>>>> still much quicker than current storage devices.
>>>>
>>>> If there are more drives, one CPU won't handle more than one
>>>> queue's (drive's) interrupt if (nr_drive * nr_hw_queues) <
>>>> nr_cpu_cores.
>>>
>>> Are things this simple? I mean, can you guarantee that fio
>>> processes are evenly distributed as such?
>> That is why I ask you for the details of your test.
>> If you mean hisilicon SAS,
>
> Yes, it is.
>
>> the interrupt load should have been distributed well given the
>> device has multiple reply queues for distributing interrupt load.
>>
>>>
>>>>
>>>> So could you describe your case in a bit of detail? Then we can
>>>> confirm whether this change is really needed.
>>>
>>> The issue is that the CPU is saturated servicing the hard and
>>> threaded parts of the interrupt together - here's the sort of thing
>>> we saw previously:
>>> Before:
>>> CPU	%usr	%sys	%irq	%soft	%idle
>>> all	2.9	13.1	1.2	4.6	78.2
>>> 0	0.0	29.3	10.1	58.6	2.0
>>> 1	18.2	39.4	0.0	1.0	41.4
>>> 2	0.0	2.0	0.0	0.0	98.0
>>>
>>> CPU0 effectively has no idle time.
>> The result just shows the saturation; we need to root-cause it
>> instead of working around it via random changes.
>>
>>>
>>> Then, by allowing the threaded part to roam:
>>> After:
>>> CPU	%usr	%sys	%irq	%soft	%idle
>>> all	3.5	18.4	2.7	6.8	68.6
>>> 0	0.0	20.6	29.9	29.9	19.6
>>> 1	0.0	39.8	0.0	50.0	10.2
>>>
>>> Note: I think that I may be able to reduce the hard irq part of the
>>> load in the endpoint driver, but not by enough that we would no
>>> longer see this issue.
>>>
>>>>
>>>>>
>>>>> For when the interrupt is managed, allow the threaded part to run
>>>>> on all cpus in the irq affinity mask.
>>>>
>>>> I remember that a performance drop was observed with this approach
>>>> in some tests.
>>>
>>> From checking the thread about the NVMe interrupt swamp, just
>>> switching to a threaded handler alone degrades performance. I didn't
>>> see any specific results for this change from Long Li -
>>> https://lkml.org/lkml/2019/8/21/128
>> I am pretty clear on the reason for Azure: it is caused by aggressive
>> interrupt coalescing. This behavior shouldn't be very common, and it
>> can be addressed by the following patch:
>>
>> http://lists.infradead.org/pipermail/linux-nvme/2019-November/028008.html
>> Then please share your lockup story: which HBA/driver, the test
>> steps, whether you complete IOs from multiple disks (LUNs) on a
>> single CPU, whether you have multiple queues, how many active LUNs
>> are involved in the test, ...
>
> There is no lockup, just a potential performance boost from this
> change.
>
> My colleague Xiang Chen can provide specifics of the test, as he is
> the one running it.
>
> But one key bit of info - which I did not think relevant before - is
> that we have 2x SAS controllers running the throughput test on the
> same host.
>
> As such, the completion queue interrupts would be spread identically
> over the CPUs for each controller. I notice that the ARM GICv3 ITS
> interrupt controller (which we use) does not use the generic irq
> matrix allocator, which I think would really help with this.
>
> Hi Marc,
>
> Is there any reason why we couldn't utilise the generic irq matrix
> allocator for GICv3?

For a start, the ITS code predates the matrix allocator by about three
years. Also, my understanding of this allocator is that it allows
x86 to cope with a very small number of possible interrupt vectors
per CPU. The ITS doesn't have such an issue, as:

1) the namespace is global, and not per CPU
2) the namespace is *huge*

Now, what property of the matrix allocator is the ITS code missing?
I'd be more than happy to improve it.
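[For reference, the allocation path of the matrix allocator, as consumed
by the x86 vector domain, has roughly this shape. This is a sketch of
typical usage with an illustrative function name (assign_irq_vector is
not the exact x86 code), not anything the ITS implements:

/*
 * Sketch of a consumer of the matrix allocator: irq_matrix_alloc()
 * picks the least-loaded CPU from the supplied mask and hands back
 * a free per-CPU vector number on that CPU.
 */
static int assign_irq_vector(struct irq_matrix *vector_matrix,
			     const struct cpumask *affinity)
{
	unsigned int cpu;
	int vec;

	vec = irq_matrix_alloc(vector_matrix, affinity, false, &cpu);
	if (vec < 0)
		return vec;	/* per-CPU vector space exhausted */

	/* @vec is only meaningful on @cpu; route the interrupt there */
	return vec;
}

The property John seems to be after is presumably not the vector-space
management itself, but the per-CPU accounting that spreads interrupts
across the mask as a side effect of picking the least-loaded CPU.]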

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

Thread overview: 57+ messages
2019-12-06 14:35 [PATCH RFC 0/1] Threaded handler uses irq affinity for when the interrupt is managed John Garry
2019-12-06 14:35 ` [PATCH RFC 1/1] genirq: Make threaded handler use irq affinity for managed interrupt John Garry
2019-12-06 15:22   ` Marc Zyngier
2019-12-06 16:16     ` John Garry
2019-12-07  8:03   ` Ming Lei
2019-12-09 14:30     ` John Garry
2019-12-09 15:09       ` Hannes Reinecke
2019-12-09 15:17         ` Marc Zyngier
2019-12-09 15:25           ` Hannes Reinecke
2019-12-09 15:36             ` Marc Zyngier
2019-12-09 15:49           ` Qais Yousef
2019-12-09 15:55             ` Marc Zyngier
2019-12-10  1:43       ` Ming Lei
2019-12-10  9:45         ` John Garry
2019-12-10 10:06           ` Ming Lei
2019-12-10 10:28           ` Marc Zyngier [this message]
2019-12-10 10:59             ` John Garry
2019-12-10 11:36               ` Marc Zyngier
2019-12-10 12:05                 ` John Garry
2019-12-10 18:32                   ` Marc Zyngier
2019-12-11  9:41                     ` John Garry
2019-12-13 10:07                       ` John Garry
2019-12-13 10:31                         ` Marc Zyngier
2019-12-13 12:08                           ` John Garry
2019-12-14 10:59                             ` Marc Zyngier
2019-12-11 17:09         ` John Garry
2019-12-12 22:38           ` Ming Lei
2019-12-13 11:12             ` John Garry
2019-12-13 13:18               ` Ming Lei
2019-12-13 15:43                 ` John Garry
2019-12-13 17:12                   ` Ming Lei
2019-12-13 17:50                     ` John Garry
2019-12-14 13:56                   ` Marc Zyngier
2019-12-16 10:47                     ` John Garry
2019-12-16 11:40                       ` Marc Zyngier
2019-12-16 14:17                         ` John Garry
2019-12-16 18:00                           ` Marc Zyngier
2019-12-16 18:50                             ` John Garry
2019-12-20 11:30                               ` John Garry
2019-12-20 14:43                                 ` Marc Zyngier
2019-12-20 15:38                                   ` John Garry
2019-12-20 16:16                                     ` Marc Zyngier
2019-12-20 23:31                                     ` Ming Lei
2019-12-23  9:07                                       ` Marc Zyngier
2019-12-23 10:26                                         ` John Garry
2019-12-23 10:47                                           ` Marc Zyngier
2019-12-23 11:35                                             ` John Garry
2019-12-24  1:59                                             ` Ming Lei
2019-12-24 11:20                                               ` Marc Zyngier
2019-12-25  0:48                                                 ` Ming Lei
2020-01-02 10:35                                                   ` John Garry
2020-01-03  0:46                                                     ` Ming Lei
2020-01-03 10:41                                                       ` John Garry
2020-01-03 11:29                                                         ` Ming Lei
2020-01-03 11:50                                                           ` John Garry
2020-01-04 12:03                                                             ` Ming Lei
2020-05-30  7:46 ` [tip: irq/core] irqchip/gic-v3-its: Balance initial LPI affinity across CPUs tip-bot2 for Marc Zyngier
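
[As the tip-bot entry above shows, the thread ultimately led to
balancing the initial LPI affinity inside the ITS driver itself. A
minimal sketch of the underlying idea, with illustrative names; the
real code in drivers/irqchip/irq-gic-v3-its.c is more involved:

/*
 * Sketch: keep a per-CPU count of routed LPIs and target a new
 * interrupt at the least-loaded CPU in its affinity mask. Names
 * here are illustrative, not the driver's.
 */
static DEFINE_PER_CPU(atomic_t, lpi_count);

static unsigned int pick_least_loaded_cpu(const struct cpumask *mask)
{
	unsigned int cpu, best = nr_cpu_ids;
	unsigned int min = UINT_MAX;

	for_each_cpu(cpu, mask) {
		unsigned int count = atomic_read(per_cpu_ptr(&lpi_count, cpu));

		if (count < min) {
			min = count;
			best = cpu;
		}
	}
	/* caller bumps lpi_count on @best once the route is programmed */
	return best;
}

This gives the ITS the load-spreading property of the matrix allocator
without adopting its per-CPU vector-space management, which, as Marc
notes above, the ITS does not need.]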
