From: Thomas Gleixner <tglx@linutronix.de>
To: David Woodhouse <dwmw2@infradead.org>, x86@kernel.org
Cc: iommu <iommu@lists.linux-foundation.org>,
	kvm <kvm@vger.kernel.org>,
	linux-hyperv@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH 07/13] irqdomain: Add max_affinity argument to irq_domain_alloc_descs()
Date: Wed, 07 Oct 2020 17:57:36 +0200	[thread overview]
Message-ID: <87a6wy3u6n.fsf@nanos.tec.linutronix.de> (raw)
In-Reply-To: <336029ca32524147a61b6fa1eb734debc9d51a00.camel@infradead.org>

On Wed, Oct 07 2020 at 15:10, David Woodhouse wrote:
> On Wed, 2020-10-07 at 15:37 +0200, Thomas Gleixner wrote:
>> What is preventing you from changing the function signature? But handing
>> down irqdomain here is not cutting it. The right thing to do is to
>> replace 'struct irq_affinity_desc *affinity' with something more
>> flexible.
>
> Yeah, although I do think I want to ditch this part completely, and
> treat the "possible" mask for the domain very much more like we do
> cpu_online_mask. In that we can allow affinities to be *requested*
> which are outside it, and they can just never be *effective* while
> those CPUs aren't present and reachable.

Huch?

> That way a lot of the nastiness about preparing an "acceptable" mask in
> advance can go away.

There is not a lot of nastiness.

> It's *also* driven, as I noted, by the realisation that on x86, the
> x86_non_ir_cpumask will only ever contain those CPUs which have been
> brought online so far and meet the criteria... but won't that be true
> on *any* system where CPU numbers are virtual and not 1:1 mapped with
> whatever determines reachability by the IRQ domain? It isn't *such* an
> x86ism, surely?

Yes, but that's exactly the reason why I don't want whatever mask name
you chose to be directly exposed. I want it to be part of the irq
domains, because then you can express arbitrary per-domain limits.
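
Roughly, the lookup I have in mind would be something like the sketch
below. This is illustrative only: the 'possible_affinity' member and the
hierarchy walk are assumptions for the sake of the example, not the
actual patch.

#include <linux/cpumask.h>
#include <linux/irqdomain.h>

/*
 * Sketch: resolve the possible-affinity limit of a domain by walking up
 * the hierarchy. A domain without an explicit limit inherits its
 * parent's; the root falls back to cpu_possible_mask.
 */
static const struct cpumask *
irqdomain_get_possible_affinity(struct irq_domain *domain)
{
	while (domain) {
		if (domain->possible_affinity)	/* hypothetical member */
			return domain->possible_affinity;
		domain = domain->parent;
	}
	return cpu_possible_mask;
}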

>> Fact is, that if there are CPUs which cannot be targeted by device
>> interrupts then the multiqueue affinity mechanism has to be fixed to
>> handle this. Right now it's just broken.
>
> I think part of the problem there is that I don't really understand how
> this part is *supposed* to work. I was focusing on getting the simple
> case working first, in the expectation that we'd come back to that part
> and you'd keep me honest. Is there some decent documentation on this
> that I'm missing?

TLDR & HTF;

Multiqueue devices want at most one queue per CPU, or, if the device
has fewer queues than CPUs, they want each queue to have a fixed
association with a set of CPUs.

At setup time this association is established over the possible CPUs so
that 'physical' hotplug is handled correctly.

If a queue has no online CPUs it cannot be started. If it's active and
the last CPU goes down then it's quiesced and stopped and the core code
shuts down the interrupt and does not move it to a still online CPU.
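
A minimal sketch of that shutdown decision (simplified; not the actual
core code, just the check it boils down to):

#include <linux/cpumask.h>

static bool managed_irq_has_online_target(const struct cpumask *affinity)
{
	/*
	 * If no CPU in the fixed association is online anymore, the
	 * interrupt gets shut down instead of being migrated to some
	 * unrelated online CPU.
	 */
	return cpumask_intersects(affinity, cpu_online_mask);
}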

So with your hackery, we end up in a situation where we have a large
possible mask, but not all CPUs in that mask can be reached, which means
that in a one-queue-per-CPU scenario all unreachable CPUs would have
dysfunctional queues.

So that spreading algorithm needs to know about this limitation.
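
In other words, the spreading has to look roughly like this. This is a
hedged sketch, not actual kernel code; 'domain_reachable' is a placeholder
for whatever per-domain mask ends up being exposed:

#include <linux/cpumask.h>
#include <linux/slab.h>

/*
 * Sketch: restrict the spread to CPUs the irq domain can actually
 * reach, so that no queue ends up associated only with CPUs its
 * interrupt can never target.
 */
static int spread_queues(const struct cpumask *domain_reachable,
			 unsigned int nr_queues)
{
	cpumask_var_t targets;
	unsigned int cpu, q = 0;

	if (!zalloc_cpumask_var(&targets, GFP_KERNEL))
		return -ENOMEM;

	/* Only CPUs to which the domain can deliver interrupts */
	cpumask_and(targets, cpu_possible_mask, domain_reachable);

	for_each_cpu(cpu, targets) {
		/* associate 'cpu' with queue 'q', round robin */
		q = (q + 1) % nr_queues;
	}

	free_cpumask_var(targets);
	return 0;
}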

>> So if you look at X86 then you have either:
>> 
>>    [VECTOR] ----------------- [IO/APIC]
>>                           |-- [MSI]
>>                           |-- [WHATEVER]
>> 
>> or
>> 
>>    [VECTOR] ---[REMAP]------- [IO/APIC]
>>              |            |-- [MSI]
>>              |----------------[WHATEVER]
>
> Hierarchically, theoretically the IOAPIC and HPET really ought to be
> children of the MSI domain. It's the Compatibility MSI which has the
> restriction on destination ID, because of how many bits it interprets
> from the MSI address. HPET and IOAPIC are just generating MSIs that hit
> that upstream limit.

We kinda have that, but not nicely abstracted. But we surely can and
should fix that.
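
The limit itself is trivial to express: without remapping the
compatibility format only carries 8 bits of destination APIC ID. A
sketch of how the reachable mask could be built (illustrative only, not
the actual patch; the mask name is the one from your series):

#include <linux/cpumask.h>
#include <linux/smp.h>

static void build_x86_non_ir_cpumask(struct cpumask *x86_non_ir_cpumask)
{
	unsigned int cpu;

	cpumask_clear(x86_non_ir_cpumask);
	for_each_possible_cpu(cpu) {
		/* cpu_physical_id() is the APIC ID of the CPU on x86 */
		if (cpu_physical_id(cpu) < 256)
			cpumask_set_cpu(cpu, x86_non_ir_cpumask);
	}
}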

>> So if REMAP allows cpu_possible_mask and VECTOR some restricted subset
>> then irqdomain_get_possible_affinity() will return the correct result
>> independent of whether remapping is enabled or not.
>
> Sure. Although VECTOR doesn't have the restriction. As noted, it's the
> Compatibility MSI that does. So the diagram looks something like this:
>
>  [ VECTOR ] ---- [ REMAP ] ---- [ IR-MSI ] ---- [ IR-HPET ]
>               |                             |---[ IR-PCI-MSI ]
>               |                             |---[ IR-IOAPIC ]
>               |
>               |--[ COMPAT-MSI ] ---- [ HPET ]
>                                  |-- [ PCI-MSI ]
>                                  |-- [ IOAPIC ]
>
>
> In this diagram, it's COMPAT-MSI that has the affinity restriction,
> inherited by its children.
>
> Now, I understand that you're not keen on IOAPIC actually being a child
> of the MSI domain, and that's fine. In Linux right now, those generic
> 'IR-MSI' and 'COMPAT-MSI' domains don't exist. So all three of the
> compatibility HPET, PCI-MSI and IOAPIC domains would have to add that
> same 8-bit affinity limit for themselves, as their parent is the VECTOR
> domain.

No. We fix it properly, not by hacking around it.

> I suppose it *might* not hurt to pretend that VECTOR does indeed have
> the limit, and to *remove* it in the remapping domain. And then the
> affinity limit could be removed in one place by the REMAP domain
> because even now in Linux's imprecise hierarchy, the IR-HPET, IR-PCI-
> MSI and IR-IOAPIC domains *are* children of that.

It's not rocket science to fix that as the irq remapping code already
differentiates between the device types.

Thanks,

        tglx



