linux-kernel.vger.kernel.org archive mirror
* irqdomain API: how to set affinity of parent irq of chained irqs?
@ 2022-05-02  8:21 Marek Behún
  2022-05-02  9:31 ` Marc Zyngier
  0 siblings, 1 reply; 7+ messages in thread
From: Marek Behún @ 2022-05-02  8:21 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: linux-kernel, Thomas Gleixner, pali

Dear Marc, Thomas,

we have run into a problem that you can hopefully shed some light on:
what is the intended way to set the affinity (and possibly other IRQ
attributes) of the parent IRQ of chained IRQs when using the
irqdomain API?

We are working on a driver that
- registers an irqchip and adds an irqdomain
- calls irq_set_chained_handler_and_data(parent_irq, handler)
  where handler triggers handling of child IRQs
- but since parent_irq isn't requested with request_threaded_irq(),
  it does not show up in proc/sysfs, only in debugfs
- the HW does not support setting affinity for the chained IRQs, only 
  the parent (which comes from a GIC chip)

The problem is that the parent IRQ, as mentioned in the third point, does
not show up in proc/sysfs.

Is there some precedent for this?

Thank you.

Marek

* Re: irqdomain API: how to set affinity of parent irq of chained irqs?
  2022-05-02  8:21 irqdomain API: how to set affinity of parent irq of chained irqs? Marek Behún
@ 2022-05-02  9:31 ` Marc Zyngier
  2022-05-02 15:45   ` Marek Behún
  0 siblings, 1 reply; 7+ messages in thread
From: Marc Zyngier @ 2022-05-02  9:31 UTC (permalink / raw)
  To: Marek Behún; +Cc: linux-kernel, Thomas Gleixner, pali

On Mon, 02 May 2022 09:21:37 +0100,
Marek Behún <kabel@kernel.org> wrote:
> 
> Dear Marc, Thomas,
> 
> we have encountered the following problem that can hopefully be put
> some light onto: What is the intended way to set affinity (and possibly
> other irq attributes) of parent IRQ of chained IRQs, when using the
> irqdomain API?

Simples: you can't. What sense does it make to change the affinity of
the parent interrupt, given that its fate is tied to *all* of the
other interrupts that are muxed to it?

Moving the parent interrupt breaks userspace's view of how interrupt
affinity is managed (change the affinity of one interrupt, see all the
others move the same way). Which is why we don't expose this interrupt
to userspace, as this can only lead to bad things.

Note that this has nothing to do with the irqdomain API, but
everything to do with the userspace ABI.
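
To make the point above concrete, here is a minimal userspace model of
it. This is not kernel code; the struct and function names (toy_parent,
toy_child, toy_set_affinity_via_child) are made up for illustration.
It only shows that when children have no routing of their own, moving
"one" of them necessarily moves its siblings too:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, not kernel code: one parent line muxing several children. */
struct toy_parent {
	int target_cpu;			/* CPU the parent interrupt is routed to */
};

struct toy_child {
	struct toy_parent *parent;	/* children have no routing of their own */
};

/* A child is always handled on whatever CPU its parent fires on. */
static int toy_child_effective_cpu(const struct toy_child *c)
{
	assert(c != NULL && c->parent != NULL);
	return c->parent->target_cpu;
}

/*
 * "Changing the affinity of one child" can only be implemented by moving
 * the parent, which silently moves every sibling as well.
 */
static void toy_set_affinity_via_child(struct toy_child *c, int cpu)
{
	assert(c != NULL && c->parent != NULL);
	c->parent->target_cpu = cpu;
}
```

With two children a and b behind the same parent, calling
toy_set_affinity_via_child(&a, 2) also changes what
toy_child_effective_cpu(&b) returns, which is the surprise for
userspace described above.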

> 
> We are working on a driver that
> - registers an irqchip and adds an irqdomain
> - calls irq_set_chained_handler_and_data(parent_irq, handler)
>   where handler triggers handling of child IRQs
> - but since parent_irq isn't requested with request_threaded_irq(),
>   it does not show up in proc/sysfs, only in debugfs
> - the HW does not support setting affinity for the chained IRQs, only 
>   the parent (which comes from a GIC chip)
> 
> The problem is that the parent IRQ, as mentioned in the third point, does
> not show up in proc/sysfs.
> 
> Is there some precedent for this?

There were precedents of irqchips doing terrible things, such as
implementing a set_affinity() callback in the chained irqchip.
Thankfully, they have been either fixed or eradicated.

	M.

-- 
Without deviation from the norm, progress is not possible.

* Re: irqdomain API: how to set affinity of parent irq of chained irqs?
  2022-05-02  9:31 ` Marc Zyngier
@ 2022-05-02 15:45   ` Marek Behún
  2022-05-03  9:32     ` Marc Zyngier
  0 siblings, 1 reply; 7+ messages in thread
From: Marek Behún @ 2022-05-02 15:45 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: linux-kernel, Thomas Gleixner, pali

On Mon, 02 May 2022 10:31:11 +0100
Marc Zyngier <maz@kernel.org> wrote:

> On Mon, 02 May 2022 09:21:37 +0100,
> Marek Behún <kabel@kernel.org> wrote:
> > 
> > Dear Marc, Thomas,
> > 
> > we have encountered the following problem that can hopefully be put
> > some light onto: What is the intended way to set affinity (and possibly
> > other irq attributes) of parent IRQ of chained IRQs, when using the
> > irqdomain API?  
> 
> Simples: you can't. What sense does it make to change the affinity of
> the parent interrupt, given that its fate is tied to *all* of the
> other interrupts that are muxed to it?

Dear Marc,

thank you for your answer. Still:

What about when we want to set the same affinity for all the chained
interrupts?

Example: on Armada 385 there are 4 PCIe controllers. Each controller
has one interrupt from which we trigger chained interrupts. We would
like to configure each controller to trigger its interrupt (and thus
all chained interrupts in the domain) on a different CPU core.

Moreover, we would really like to do this at runtime, through sysfs,
depending, for example, on whether there are cards plugged into the
PCIe ports.

Maybe there should be some mechanism that allows changing the affinity
of a whole irqdomain, or something?

Marek

* Re: irqdomain API: how to set affinity of parent irq of chained irqs?
  2022-05-02 15:45   ` Marek Behún
@ 2022-05-03  9:32     ` Marc Zyngier
  2023-04-06 23:56       ` Radu Rendec
  0 siblings, 1 reply; 7+ messages in thread
From: Marc Zyngier @ 2022-05-03  9:32 UTC (permalink / raw)
  To: Marek Behún; +Cc: linux-kernel, Thomas Gleixner, pali

On Mon, 02 May 2022 16:45:59 +0100,
Marek Behún <kabel@kernel.org> wrote:
> 
> On Mon, 02 May 2022 10:31:11 +0100
> Marc Zyngier <maz@kernel.org> wrote:
> 
> > On Mon, 02 May 2022 09:21:37 +0100,
> > Marek Behún <kabel@kernel.org> wrote:
> > > 
> > > Dear Marc, Thomas,
> > > 
> > > we have encountered the following problem that can hopefully be put
> > > some light onto: What is the intended way to set affinity (and possibly
> > > other irq attributes) of parent IRQ of chained IRQs, when using the
> > > irqdomain API?  
> > 
> > Simples: you can't. What sense does it make to change the affinity of
> > the parent interrupt, given that its fate is tied to *all* of the
> > other interrupts that are muxed to it?
> 
> Dear Marc,
> 
> thank you for your answer. Still:
> 
> What about when we want to set the same affinity for all the chained
> interrupts?
> 
> Example: on Armada 385 there are 4 PCIe controllers. Each controller
> has one interrupt from which we trigger chained interrupts. We would
> like to configure each controller to trigger interrupt (and thus all
> chained interrupts in the domain) on different CPU core.
> 
> Moreover we would really like to do this in runtime, through sysfs,
> depending on for example whether there are cards plugged in the PCIe
> ports.
> 
> Maybe there should be some mechanism to allow to change affinity for
> whole irqdomain, or something?

Should? Maybe. But not for an irqdomain (which really doesn't have
anything to do with interrupt affinity).

What you may want is a new sysfs interface that would allow a parent
interrupt's affinity to be changed, while also exposing to userspace
all the interrupts this affects *at the same time*. Something like:

/sys/kernel/irq/42/smp_affinity_list
/sys/kernel/irq/42/muxed_irqs/
/sys/kernel/irq/42/muxed_irqs/56 -> ../../56
/sys/kernel/irq/42/muxed_irqs/57 -> ../../57

The main issues are that:

- we don't really track the muxing information in any of the data
  structures, so you can't just walk a short list and generate this
  information. You'd need to build the topology information at
  allocation time (or fish it out at runtime, but that's likely a
  pain).

- sysfs doesn't deal with affinities at all. procfs does, but adding
  more crap there is frowned upon.

- it *must* be a new interface. You can't repurpose the existing one,
  as something like irqbalance would otherwise be massively
  confused by seeing interrupts moving around behind its back.

- conversely, you'll need to teach irqbalance how to deal with this
  new interface.

- this needs to be safe against CPU hotplug. It probably already is,
  but nobody ever tested it, given that userspace can't interact with
  these interrupts at the moment.
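
As a rough illustration of the first point (building the topology at
allocation time), here is a small userspace sketch. It is not kernel
code: the toy_topology structure and its helpers are hypothetical, and
the printed paths simply mirror the layout suggested above, which does
not exist today. The idea is only that each child records its parent
when it is allocated, so the muxed_irqs list can later be generated by
a simple walk:

```c
#include <assert.h>
#include <stdio.h>

#define TOY_MAX_IRQS	64
#define TOY_NO_PARENT	(-1)

/* parent_of[irq] is the muxing (parent) irq, or TOY_NO_PARENT. */
struct toy_topology {
	int parent_of[TOY_MAX_IRQS];
};

static void toy_topology_init(struct toy_topology *t)
{
	for (int i = 0; i < TOY_MAX_IRQS; i++)
		t->parent_of[i] = TOY_NO_PARENT;
}

/* Record the relationship when a child is allocated. Returns 0 or -1. */
static int toy_topology_add(struct toy_topology *t, int child, int parent)
{
	if (child < 0 || child >= TOY_MAX_IRQS ||
	    parent < 0 || parent >= TOY_MAX_IRQS || child == parent)
		return -1;
	t->parent_of[child] = parent;
	return 0;
}

/* Collect the irqs muxed behind 'parent' into out[]; returns the count. */
static int toy_topology_children(const struct toy_topology *t, int parent,
				 int *out, int max)
{
	int n = 0;

	for (int irq = 0; irq < TOY_MAX_IRQS && n < max; irq++)
		if (t->parent_of[irq] == parent)
			out[n++] = irq;
	return n;
}

/* Print the hypothetical muxed_irqs directory for 'parent'. */
static void toy_topology_print(const struct toy_topology *t, int parent)
{
	int kids[TOY_MAX_IRQS];
	int n = toy_topology_children(t, parent, kids, TOY_MAX_IRQS);

	printf("/sys/kernel/irq/%d/muxed_irqs/\n", parent);
	for (int i = 0; i < n; i++)
		printf("/sys/kernel/irq/%d/muxed_irqs/%d -> ../../%d\n",
		       parent, kids[i], kids[i]);
}
```

After toy_topology_add(&t, 56, 42) and toy_topology_add(&t, 57, 42),
toy_topology_print(&t, 42) lists irqs 56 and 57 under parent 42. A
real implementation would also have to handle teardown and locking,
which this sketch ignores.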

	M.

-- 
Without deviation from the norm, progress is not possible.

* Re: irqdomain API: how to set affinity of parent irq of chained irqs?
  2022-05-03  9:32     ` Marc Zyngier
@ 2023-04-06 23:56       ` Radu Rendec
  2023-04-07  9:18         ` Marc Zyngier
  0 siblings, 1 reply; 7+ messages in thread
From: Radu Rendec @ 2023-04-06 23:56 UTC (permalink / raw)
  To: Marc Zyngier, Marek Behún
  Cc: linux-kernel, Thomas Gleixner, pali, Brian Masney

Hello Marc, Marek,

On Tue, 2022-05-03 at 10:32 +0100, Marc Zyngier wrote:
> On Mon, 02 May 2022 16:45:59 +0100,
> Marek Behún <kabel@kernel.org> wrote:
> > 
> > On Mon, 02 May 2022 10:31:11 +0100
> > Marc Zyngier <maz@kernel.org> wrote:
> > 
> > > On Mon, 02 May 2022 09:21:37 +0100,
> > > Marek Behún <kabel@kernel.org> wrote:
> > > > 
> > > > Dear Marc, Thomas,
> > > > 
> > > > we have encountered the following problem that can hopefully be put
> > > > some light onto: What is the intended way to set affinity (and possibly
> > > > other irq attributes) of parent IRQ of chained IRQs, when using the
> > > > irqdomain API?  
> > > 
> > > Simples: you can't. What sense does it make to change the affinity of
> > > the parent interrupt, given that its fate is tied to *all* of the
> > > other interrupts that are muxed to it?
> > 
> > Dear Marc,
> > 
> > thank you for your answer. Still:
> > 
> > What about when we want to set the same affinity for all the chained
> > interrupts?
> > 
> > Example: on Armada 385 there are 4 PCIe controllers. Each controller
> > has one interrupt from which we trigger chained interrupts. We would
> > like to configure each controller to trigger interrupt (and thus all
> > chained interrupts in the domain) on different CPU core.
> > 
> > Moreover we would really like to do this in runtime, through sysfs,
> > depending on for example whether there are cards plugged in the PCIe
> > ports.
> > 
> > Maybe there should be some mechanism to allow to change affinity for
> > whole irqdomain, or something?
> 
> Should? Maybe. But not for an irqdomain (which really doesn't have
> anything to do with interrupt affinity).
> 
> What you may want is a new sysfs interface that would allow a parent
> interrupt affinity being changed, but also exposing to userspace all
> the interrupts this affects *at the same time*. something like:
> 
> /sys/kernel/irq/42/smp_affinity_list
> /sys/kernel/irq/42/muxed_irqs/
> /sys/kernel/irq/42/muxed_irqs/56 -> ../../56
> /sys/kernel/irq/42/muxed_irqs/57 -> ../../57
> 
> The main issues are that:
> 
> - we don't really track the muxing information in any of the data
>   structures, so you can't just walk a short list and generate this
>   information. You'd need to build the topology information at
>   allocation time (or fish it out at runtime, but that's likely a
>   pain).
> 
> - sysfs doesn't deal with affinities at all. procfs does, but adding
>   more crap there is frowned upon.
> 
> - it *must* be a new interface. You can't repurpose the existing one,
>   as something like irqbalance would otherwise be massively
>   confused by seeing interrupts moving around behind its back.
> 
> - conversely, you'll need to teach irqbalance how to deal with this
>   new interface.
> 
> - this needs to be safe against CPU hotplug. It probably already is,
>   but nobody ever tested it, given that userspace can't interact with
>   these interrupts at the moment.

Are you aware of any work being done (or having been done) in this
area? Thanks in advance!

My colleagues and I are looking into picking this up and implementing
the new sysfs interface and the related irqbalance changes, and we are
currently evaluating the level of effort. Obviously, we would like to
avoid any effort duplication.

Best regards,
Radu


* Re: irqdomain API: how to set affinity of parent irq of chained irqs?
  2023-04-06 23:56       ` Radu Rendec
@ 2023-04-07  9:18         ` Marc Zyngier
  2023-04-20 22:22           ` Radu Rendec
  0 siblings, 1 reply; 7+ messages in thread
From: Marc Zyngier @ 2023-04-07  9:18 UTC (permalink / raw)
  To: Radu Rendec
  Cc: Marek Behún, linux-kernel, Thomas Gleixner, pali, Brian Masney

Hi Radu,

On Fri, 07 Apr 2023 00:56:40 +0100,
Radu Rendec <rrendec@redhat.com> wrote:
> 
> Hello Marc, Marek,
> 
> On Tue, 2022-05-03 at 10:32 +0100, Marc Zyngier wrote:
> > On Mon, 02 May 2022 16:45:59 +0100,
> > Marek Behún <kabel@kernel.org> wrote:
> > > 
> > > On Mon, 02 May 2022 10:31:11 +0100
> > > Marc Zyngier <maz@kernel.org> wrote:
> > > 
> > > > On Mon, 02 May 2022 09:21:37 +0100,
> > > > Marek Behún <kabel@kernel.org> wrote:
> > > > > 
> > > > > Dear Marc, Thomas,
> > > > > 
> > > > > we have encountered the following problem that can hopefully be put
> > > > > some light onto: What is the intended way to set affinity (and possibly
> > > > > other irq attributes) of parent IRQ of chained IRQs, when using the
> > > > > irqdomain API?  
> > > > 
> > > > Simples: you can't. What sense does it make to change the affinity of
> > > > the parent interrupt, given that its fate is tied to *all* of the
> > > > other interrupts that are muxed to it?
> > > 
> > > Dear Marc,
> > > 
> > > thank you for your answer. Still:
> > > 
> > > What about when we want to set the same affinity for all the chained
> > > interrupts?
> > > 
> > > Example: on Armada 385 there are 4 PCIe controllers. Each controller
> > > has one interrupt from which we trigger chained interrupts. We would
> > > like to configure each controller to trigger interrupt (and thus all
> > > chained interrupts in the domain) on different CPU core.
> > > 
> > > Moreover we would really like to do this in runtime, through sysfs,
> > > depending on for example whether there are cards plugged in the PCIe
> > > ports.
> > > 
> > > Maybe there should be some mechanism to allow to change affinity for
> > > whole irqdomain, or something?
> > 
> > Should? Maybe. But not for an irqdomain (which really doesn't have
> > anything to do with interrupt affinity).
> > 
> > What you may want is a new sysfs interface that would allow a parent
> > interrupt affinity being changed, but also exposing to userspace all
> > the interrupts this affects *at the same time*. something like:
> > 
> > /sys/kernel/irq/42/smp_affinity_list
> > /sys/kernel/irq/42/muxed_irqs/
> > /sys/kernel/irq/42/muxed_irqs/56 -> ../../56
> > /sys/kernel/irq/42/muxed_irqs/57 -> ../../57
> > 
> > The main issues are that:
> > 
> > - we don't really track the muxing information in any of the data
> >   structures, so you can't just walk a short list and generate this
> >   information. You'd need to build the topology information at
> >   allocation time (or fish it out at runtime, but that's likely a
> >   pain).
> > 
> > - sysfs doesn't deal with affinities at all. procfs does, but adding
> >   more crap there is frowned upon.
> > 
> > - it *must* be a new interface. You can't repurpose the existing one,
> >   as something like irqbalance would be otherwise be massively
> >   confused by seeing interrupts moving around behind its back.
> > 
> > - conversely, you'll need to teach irqbalance how to deal with this
> >   new interface.
> > 
> > - this needs to be safe against CPU hotplug. It probably already is,
> >   but nobody ever tested it, given that userspace can't interact with
> >   these interrupts at the moment.
> 
> Are you aware of any work being done (or having been done) in this
> area? Thanks in advance!
> 
> My colleagues and I are looking into picking this up and implementing
> the new sysfs interface and the related irqbalance changes, and we are
> currently evaluating the level of effort. Obviously, we would like to
> avoid any effort duplication.

I don't think anyone ever tried it (it's far easier to just moan about
it than to do anything useful). But if you want to start looking into
that, that'd be great.

One of my concerns is that allowing affinity changes for chained
interrupts may uncover issues in existing drivers, so it would have to
be an explicit buy-in for any chained irqchip. That's probably not too
hard to achieve anyway given that you'll need some new infrastructure
to track the muxed interrupts.

Hopefully this will result in something actually happening! ;-)

Cheers,

	M.

-- 
Without deviation from the norm, progress is not possible.

* Re: irqdomain API: how to set affinity of parent irq of chained irqs?
  2023-04-07  9:18         ` Marc Zyngier
@ 2023-04-20 22:22           ` Radu Rendec
  0 siblings, 0 replies; 7+ messages in thread
From: Radu Rendec @ 2023-04-20 22:22 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Marek Behún, linux-kernel, Thomas Gleixner, pali, Brian Masney

Hi Marc,

On Fri, 2023-04-07 at 10:18 +0100, Marc Zyngier wrote:
> On Fri, 07 Apr 2023 00:56:40 +0100, Radu Rendec <rrendec@redhat.com> wrote:
> > Are you aware of any work being done (or having been done) in this
> > area? Thanks in advance!
> > 
> > My colleagues and I are looking into picking this up and implementing
> > the new sysfs interface and the related irqbalance changes, and we are
> > currently evaluating the level of effort. Obviously, we would like to
> > avoid any effort duplication.
> 
> I don't think anyone ever tried it (it's far easier to just moan about
> it than to do anything useful). But if you want to start looking into
> that, that'd be great.

Thanks for the feedback and sorry for the late reply. It looks like I
already started. I have been working on a "sandbox" driver that
implements hierarchical/muxed interrupts and would allow me to test in
a generic environment, without requiring mux hardware or messing with
real interrupts.

But first, I would like to clarify something, just to make sure I'm on
the right track. It looks to me like with the hierarchical IRQ domain
API, there is always a 1:1 end-to-end mapping between virqs and the
hwirqs near the CPU. IOW, there is a 1:1 mapping between a given virq
and the corresponding hwirq in each IRQ domain along the chain, and
there is no other virq in-between. I looked at many of the irqchip
drivers that implement the hierarchical API, and couldn't find a single
one that does muxed IRQs. Furthermore, the revmap in struct irq_domain
is clearly a 1:1 map, so when an IRQ vector is entered, there is no way
to map multiple virqs (and run the associated handlers). I tried it in
my test driver, and if the .alloc domain op implementation allocates
the same hwirq in the parent domain for two different (v)irqs, the
revmap slot in the parent domain is overwritten.
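
A toy model of that observation, for illustration only (plain userspace
C, not the kernel's struct irq_domain; toy_domain and its helpers are
invented names, and virq 0 is treated as "unmapped" in this model). The
reverse map has exactly one slot per hwirq, so mapping a second virq to
the same hwirq replaces the first:

```c
#include <assert.h>

#define TOY_NR_HWIRQS	8

/* One slot per hwirq; 0 means "no virq mapped" in this model. */
struct toy_domain {
	unsigned int revmap[TOY_NR_HWIRQS];
};

/* Map virq onto hwirq. Returns the virq previously stored (0 if none). */
static unsigned int toy_domain_map(struct toy_domain *d, unsigned int hwirq,
				   unsigned int virq)
{
	unsigned int old;

	assert(hwirq < TOY_NR_HWIRQS);
	old = d->revmap[hwirq];
	d->revmap[hwirq] = virq;	/* a second mapping overwrites the first */
	return old;
}

/* Look up the single virq associated with hwirq (0 if unmapped). */
static unsigned int toy_domain_find(const struct toy_domain *d,
				    unsigned int hwirq)
{
	assert(hwirq < TOY_NR_HWIRQS);
	return d->revmap[hwirq];
}
```

Mapping virq 56 and then virq 57 to hwirq 3 leaves only 57 reachable
through toy_domain_find(), so a lookup on interrupt entry can never
fan out to both.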

If my understanding is correct, muxed IRQs are not possible with the
hierarchical IRQ domain API. That means in this particular case you can
never indirectly change the affinity of a different IRQ because hwirqs
are never shared. So, this is just a matter of exposing the affinity
through the new sysfs API for every irqchip driver that opts in.

On the other hand, muxed IRQs *are* possible with the legacy API, and
drivers/irqchip/irq-imx-intmux.c is a clear example of that. However,
in this case one (or multiple) additional virq(s) exist at the mux
level, and it is the virq handler that implements the logic to invoke
the appropriate downstream (child) virq handler(s). Then the virq(s) at
the mux level and all the corresponding downstream virqs share the same
affinity setting, because they also share the same hwirq in the root
domain (which is where affinity is really implemented). And yes, in
this case the relationship between these virqs is not tracked anywhere
currently. Is this what you had in mind when you mentioned below a "new
infrastructure to track muxed interrupts"?
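
For readers less familiar with the pattern, the demultiplexing step can
be sketched in userspace roughly as follows. This is a simplified model
and not the code in irq-imx-intmux.c; toy_mux, toy_mux_handle and
toy_count are hypothetical names, and the "pending" word stands in for
whatever status register a real mux would expose:

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MUX_LINES	32

typedef void (*toy_handler_t)(void *data);

/* One handler (and its cookie) per input line of the mux. */
struct toy_mux {
	toy_handler_t handler[TOY_MUX_LINES];
	void *data[TOY_MUX_LINES];
};

/*
 * Model of the mux-level handler: walk the pending bits and invoke the
 * child handler registered for each one. Returns how many were called.
 */
static int toy_mux_handle(const struct toy_mux *m, uint32_t pending)
{
	int handled = 0;

	for (int bit = 0; bit < TOY_MUX_LINES; bit++) {
		if (!(pending & (UINT32_C(1) << bit)))
			continue;
		if (m->handler[bit]) {
			m->handler[bit](m->data[bit]);
			handled++;
		}
	}
	return handled;
}

/* Example child handler: counts how often it was invoked. */
static void toy_count(void *data)
{
	(*(int *)data)++;
}
```

All child handlers run from within the single mux-level handler, and
therefore on whatever CPU the parent interrupt was delivered to, which
is why they all share its affinity.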

> One of my concerns is that allowing affinity changes for chained
> interrupts may uncover issues in existing drivers, so it would have to
> be an explicit buy-in for any chained irqchip. That's probably not too
> hard to achieve anyway given that you'll need some new infrastructure
> to track the muxed interrupts.

The first thing that comes to mind for the "explicit buy-in" is a new
function pointer in struct irq_chip to set the affinity in a mux-aware
manner. Something like irq_set_affinity_shared or _chained. I may not
see the whole picture yet but so far my thinking is that the existing
irq_set_affinity must remain unchanged in order to preserve
compatibility/behavior of the procfs interface.
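
Roughly, the opt-in could look like the following userspace sketch.
This is only a model of the idea: toy_irq_chip is not the kernel's
struct irq_chip, the set_affinity_chained member and the
toy_set_chained_affinity() helper are hypothetical names, and returning
-EOPNOTSUPP for chips that have not opted in is an assumption:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_irq_chip {
	const char *name;
	/* Existing-style callback; its behaviour stays unchanged. */
	int (*set_affinity)(unsigned int irq, unsigned int cpu);
	/* Hypothetical opt-in callback; NULL means "not opted in". */
	int (*set_affinity_chained)(unsigned int parent_irq, unsigned int cpu);
};

/* Core-side helper: only chips that opted in can have the parent moved. */
static int toy_set_chained_affinity(const struct toy_irq_chip *chip,
				    unsigned int parent_irq, unsigned int cpu)
{
	if (chip == NULL || chip->set_affinity_chained == NULL)
		return -EOPNOTSUPP;
	return chip->set_affinity_chained(parent_irq, cpu);
}

/* Example opt-in implementation that accepts any request. */
static int toy_accept_any(unsigned int parent_irq, unsigned int cpu)
{
	(void)parent_irq;
	(void)cpu;
	return 0;
}
```

A chip that leaves set_affinity_chained as NULL keeps today's
behaviour, while one that sets it (here to toy_accept_any) becomes
eligible for the new interface.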

> Hopefully this will result in something actually happening! ;-)

I really hope so. I am also excited to have the opportunity to work on
this. I will likely need your guidance along the way but I think it's
better to talk in advance than submit a huge patch series that makes no
sense :)

Thanks,
Radu

