From: Marc Zyngier <maz@kernel.org>
To: John Garry <john.garry@huawei.com>
Cc: Jason Cooper <jason@lakedaemon.net>,
luojiaxing <luojiaxing@huawei.com>,
linux-kernel@vger.kernel.org, Ming Lei <ming.lei@redhat.com>,
"Wangzhou (B)" <wangzhou1@hisilicon.com>,
Thomas Gleixner <tglx@linutronix.de>,
linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v3 2/2] irqchip/gic-v3-its: Balance initial LPI affinity across CPUs
Date: Wed, 18 Mar 2020 17:30:44 +0000 [thread overview]
Message-ID: <a24fad17d178209d35bbcb9f270c84ff@kernel.org> (raw)
In-Reply-To: <894aabcc-9676-3945-7a62-70fb930fd8a5@huawei.com>
On 2020-03-18 15:34, John Garry wrote:
>>>> +static int its_select_cpu(struct irq_data *d,
>>>> + const struct cpumask *aff_mask)
>>>> +{
>>>> + struct its_device *its_dev = irq_data_get_irq_chip_data(d);
>>>> + cpumask_var_t tmpmask;
>>>> + int cpu, node;
>>>> +
>>>> + if (!alloc_cpumask_var(&tmpmask, GFP_KERNEL))
>>>> + return -ENOMEM;
>>>> +
>>>> + node = its_dev->its->numa_node;
>>>> +
>>>> + if (!irqd_affinity_is_managed(d)) {
>>>> + /* First try the NUMA node */
>>>> + if (node != NUMA_NO_NODE) {
>>>> + /*
>>>> + * Try the intersection of the affinity mask and the
>>>> + * node mask (and the online mask, just to be safe).
>>>> + */
>>>> + cpumask_and(tmpmask, cpumask_of_node(node), aff_mask);
>>>> + cpumask_and(tmpmask, tmpmask, cpu_online_mask);
>>>> +
>>>> + /* If that doesn't work, try the nodemask itself */
>>>
>>> So if tmpmsk is empty...
>>
>> Which means the proposed affinity mask isn't part of the node mask the
>> first place.
>> Why did we get such an affinity the first place?
>
> It seems to be just irqbalance setting the affinity mask via sysfs:
>
> [44.782116] Calltrace:
> [44.782119] its_select_cpu+0x420/0x6e0
> [44.782121] its_set_affinity+0x180/0x208
> [44.782126] msi_domain_set_affinity+0x44/0xb8
> [44.782130] irq_do_set_affinity+0x48/0x190
> [44.782132] irq_set_affinity_locked+0xc0/0xe8
> [44.782134] __irq_set_affinity+0x48/0x78
> [44.782136] write_irq_affinity.isra.8+0xec/0x110
> [44.782138] irq_affinity_proc_write+0x1c/0x28
> [44.782142] proc_reg_write+0x70/0xb8
> [44.782147] __vfs_write+0x18/0x40
> [44.782149] vfs_write+0xb0/0x1d0
> [44.782151] ksys_write+0x64/0xe8
> [44.782154] __arm64_sys_write+0x18/0x20
> [44.782157] el0_svc_common.constprop.2+0x88/0x150
> [44.782159] do_el0_svc+0x20/0x80
> [44.782162] el0_sync_handler+0x118/0x188
> [44.782164] el0_sync+0x140/0x180
>
> And for some reason fancied cpu62.
Hmmm. OK. I'm surprised that irqbalance dries to set a range of CPUs,
instead of
a particular CPU though.
>
>>
>>>
>>>> + if (cpumask_empty(tmpmask))
>>>> + cpumask_and(tmpmask, cpumask_of_node(node), cpu_online_mask);
>>>
>>> now the tmpmask may have no intersection with the aff_mask...
>>
>> But it has the mask for CPUs that are best suited for this interrupt,
>> right?
>> If I understand the topology of your machine, it has an ITS per 64
>> CPUs,
>> and
>> this device is connected to the ITS that serves the second socket.
>
> No, this one (D06ES) has a single ITS:
>
> john@ubuntu:~/kernel-dev$ dmesg | grep ITS
> [ 0.000000] SRAT: PXM 0 -> ITS 0 -> Node 0
> [ 0.000000] ITS [mem 0x202100000-0x20211ffff]
> [ 0.000000] ITS@0x0000000202100000: Using ITS number 0
> [ 0.000000] ITS@0x0000000202100000: allocated 8192 Devices
> @23ea9f0000 (indirect, esz 8, psz 16K, shr 1)
> [ 0.000000] ITS@0x0000000202100000: allocated 2048 Virtual CPUs
> @23ea9d8000 (indirect, esz 16, psz 4K, shr 1)
> [ 0.000000] ITS@0x0000000202100000: allocated 256 Interrupt
> Collections @23ea9d3000 (flat, esz 16, psz 4K, shr 1)
> [ 0.000000] ITS: Using DirectLPI for VPE invalidation
> [ 0.000000] ITS: Enabling GICv4 support
> [ 0.044034] Platform MSI: ITS@0x202100000 domain created
> [ 0.044042] PCI/MSI: ITS@0x202100000 domain created
There's something I'm missing here. If there's a single ITS in the
system,
node affinity must cover the whole system, not half of it.
> D06CS has 2x ITS, as you may know :)
>
> And, FWIW, the device is on the 2nd socket, numa node #2.
You've lost me. Single ITS, but two sockets?
>
> So the cpu mask of node #0 (where the ITS lives) is 0-23. So no
> intersection with what userspace requested.
>
>>> if (cpu < 0 || cpu >= nr_cpu_ids)
>>> return -EINVAL;
>>>
>>> if (cpu != its_dev->event_map.col_map[id]) {
>>> its_inc_lpi_count(d, cpu);
>>> its_dec_lpi_count(d, its_dev->event_map.col_map[id]);
>>> target_col = &its_dev->its->collections[cpu];
>>> its_send_movi(its_dev, target_col, id);
>>> its_dev->event_map.col_map[id] = cpu;
>>> irq_data_update_effective_affinity(d, cpumask_of(cpu));
>>> }
>>>
>>> So cpu may not be a member of mask_val. Hence the inconsistency of
>>> the
>>> affinity list and effective affinity. We could just drop the AND of
>>> the ITS node mask in its_select_cpu().
>>
>> That would be a departure from the algorithm Thomas proposed, which
>> made
>> a lot of sense in my opinion. What its_select_cpu() does in this case
>> is
>> probably the best that can be achieved from a latency perspective,
>> as it keeps the interrupt local to the socket that generated it.
>
> We seem to be following what Thomas described for a non-managed
> interrupt bound to a node. But is this interrupt bound to the node?
If the ITS advertizes affinity to a node (through SRAT, for example),
we should use that. And that's what we have in this patch.
> Regardless of that, what you're saying seems right - keep local
> interrupt bound to the node. But the problem is that userspace is
> doing its own thing.
Unless you tell the interrupt subsystem that userspace cannot balance
this interrupt, it can happen.
> BTW, sorry if any text formatting is mangled. I have to improve my WFH
> setup....
You're doing fine! ;-)
M.
--
Jazz is not dead. It just smells funny...
next prev parent reply other threads:[~2020-03-18 17:30 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-03-16 11:54 [PATCH v3 0/2] irqchip/gic-v3-its: Balance LPI affinity across CPUs Marc Zyngier
2020-03-16 11:54 ` [PATCH v3 1/2] irqchip/gic-v3-its: Track LPI distribution on a per CPU basis Marc Zyngier
2020-03-16 11:54 ` [PATCH v3 2/2] irqchip/gic-v3-its: Balance initial LPI affinity across CPUs Marc Zyngier
2020-03-16 13:02 ` John Garry
2020-03-16 13:14 ` Marc Zyngier
2020-03-17 18:43 ` John Garry
2020-03-18 14:16 ` Marc Zyngier
2020-03-18 14:25 ` John Garry
2020-03-18 12:22 ` John Garry
2020-03-18 14:04 ` Marc Zyngier
2020-03-18 15:34 ` John Garry
2020-03-18 17:30 ` Marc Zyngier [this message]
2020-03-18 19:00 ` John Garry
2020-03-27 17:52 ` John Garry
2020-03-19 12:31 ` [PATCH v3 0/2] irqchip/gic-v3-its: Balance " John Garry
2020-03-27 17:47 ` John Garry
2020-04-01 11:33 ` John Garry
2020-05-14 12:05 ` John Garry
2020-05-15 10:14 ` Marc Zyngier
2020-05-15 11:50 ` John Garry
2020-05-15 15:37 ` Marc Zyngier
2020-05-15 16:15 ` John Garry
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a24fad17d178209d35bbcb9f270c84ff@kernel.org \
--to=maz@kernel.org \
--cc=jason@lakedaemon.net \
--cc=john.garry@huawei.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=luojiaxing@huawei.com \
--cc=ming.lei@redhat.com \
--cc=tglx@linutronix.de \
--cc=wangzhou1@hisilicon.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).