* "irq 4: Affinity broken due to vector space exhaustion." warning on restart of ttyS0 console @ 2020-11-10 19:24 Prarit Bhargava 2020-11-10 20:56 ` Thomas Gleixner 0 siblings, 1 reply; 7+ messages in thread From: Prarit Bhargava @ 2020-11-10 19:24 UTC (permalink / raw) To: linux-kernel Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, Marc Zyngier, Andy Lutomirski, Ming Lei, Peter Xu Occasionally when logging out of the ttyS0 aka serial console I see that irq 4: Affinity broken due to vector space exhaustion. is output to the console. At boot the default smp_affinity is /proc/irq/4/smp_affinity:000000ff,ffff0000,00ffffff The irqbalance service runs and can change this value. Depending on system load and behaviour, the IRQ can be assigned to a cpu on socket 1 and the smp_affinity is changed to socket 1. In that case, /proc/irq/4/smp_affinity:ffffff00,0000ffff,ff000000 When the user logs out of the serial console, the console is shut down and IRQ is free'd. The IRQ is immediately reacquired by the serial console when it starts up again. For example, Red Hat Enterprise Linux 8.4 Beta (Ootpa) Kernel 5.10.0-rc2+ on an x86_64 HOSTNAME: intel-whitley-07.khw1.lab.eng.bos.redhat.com Activate the web console with: systemctl enable --now cockpit.socket intel-whitley-07 login: root Password: Last login: Mon Nov 9 19:13:33 on ttyS0 [07:25 PM root@intel-whitley-07 ~]# exit *** console shutdown, IRQ released for cpu on socket 1 *** console starts back up again, IRQ assigned to on cpu on socket 0 In this process, however, the smp_affinity is not cleared for IRQ4. That is, it remains as /proc/irq/4/smp_affinity:ffffff00,0000ffff,ff000000 so that the check in activate_reserved() fails and "irq 4: Affinity broken due to vector space exhaustion." is output to the screen. I am not sure of correct fix here. It looks like the smp_affinity should be reset to default at irq shutdown, however, I cannot determine if that should be done for every IRQ, or (hopefully not) per driver. Can anyone offer guidance on a fix? P. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: "irq 4: Affinity broken due to vector space exhaustion." warning on restart of ttyS0 console 2020-11-10 19:24 "irq 4: Affinity broken due to vector space exhaustion." warning on restart of ttyS0 console Prarit Bhargava @ 2020-11-10 20:56 ` Thomas Gleixner 2020-11-11 16:14 ` Prarit Bhargava 2020-12-09 6:33 ` Shung-Hsi Yu 0 siblings, 2 replies; 7+ messages in thread From: Thomas Gleixner @ 2020-11-10 20:56 UTC (permalink / raw) To: Prarit Bhargava, linux-kernel Cc: Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, Marc Zyngier, Andy Lutomirski, Ming Lei, Peter Xu Prarit, On Tue, Nov 10 2020 at 14:24, Prarit Bhargava wrote: > Occasionally when logging out of the ttyS0 aka serial console I see that > > irq 4: Affinity broken due to vector space exhaustion. > > *** console shutdown, IRQ released for cpu on socket 1 > *** console starts back up again, IRQ assigned to on cpu on socket 0 > > In this process, however, the smp_affinity is not cleared for IRQ4. That is, it > remains as > > /proc/irq/4/smp_affinity:ffffff00,0000ffff,ff000000 > > so that the check in activate_reserved() fails and > > "irq 4: Affinity broken due to vector space exhaustion." > > is output to the screen. > > I am not sure of correct fix here. It looks like the smp_affinity should be > reset to default at irq shutdown, however, I cannot determine if that should be > done for every IRQ, or (hopefully not) per driver. This has been that way forever and there was a discussion about this at least 15 years ago. I can't find it at the moment because I can't access my archives and I failed to find the conversation on lore. But here is the gist: At some point I actually wanted to reset the affinity mask to the default affinity mask in free_irq() when the last action was removed. That broke setups where the affinity of the serial console interrupt, was set to some specific CPU by the admin and then it was moved to some other place due to logout -> shutdown -> startup. So we left the historic behaviour untouched. So now you are complaining about it the other way round and I have to tell you that there is no correct fix neither in the core nor in a driver. The real problem is irqbalanced aggressively exhausting the vector space of a _whole_ socket to the point that there is not a single vector left for serial. That's the problem you want to fix. Thanks, tglx ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: "irq 4: Affinity broken due to vector space exhaustion." warning on restart of ttyS0 console 2020-11-10 20:56 ` Thomas Gleixner @ 2020-11-11 16:14 ` Prarit Bhargava 2020-12-09 6:33 ` Shung-Hsi Yu 1 sibling, 0 replies; 7+ messages in thread From: Prarit Bhargava @ 2020-11-11 16:14 UTC (permalink / raw) To: Thomas Gleixner, linux-kernel Cc: Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, Marc Zyngier, Andy Lutomirski, Ming Lei, Peter Xu On 11/10/20 3:56 PM, Thomas Gleixner wrote: > Prarit, > > On Tue, Nov 10 2020 at 14:24, Prarit Bhargava wrote: >> Occasionally when logging out of the ttyS0 aka serial console I see that >> >> irq 4: Affinity broken due to vector space exhaustion. >> >> *** console shutdown, IRQ released for cpu on socket 1 >> *** console starts back up again, IRQ assigned to on cpu on socket 0 >> >> In this process, however, the smp_affinity is not cleared for IRQ4. That is, it >> remains as >> >> /proc/irq/4/smp_affinity:ffffff00,0000ffff,ff000000 >> >> so that the check in activate_reserved() fails and >> >> "irq 4: Affinity broken due to vector space exhaustion." >> >> is output to the screen. >> >> I am not sure of correct fix here. It looks like the smp_affinity should be >> reset to default at irq shutdown, however, I cannot determine if that should be >> done for every IRQ, or (hopefully not) per driver. > > This has been that way forever and there was a discussion about this > at least 15 years ago. I can't find it at the moment because I can't > access my archives and I failed to find the conversation on lore. > > But here is the gist: > > At some point I actually wanted to reset the affinity mask to the > default affinity mask in free_irq() when the last action was > removed. > > That broke setups where the affinity of the serial console interrupt, > was set to some specific CPU by the admin and then it was moved to some > other place due to logout -> shutdown -> startup. > > So we left the historic behaviour untouched. > > So now you are complaining about it the other way round and I have to > tell you that there is no correct fix neither in the core nor in a > driver. > > The real problem is irqbalanced aggressively exhausting the vector space > of a _whole_ socket to the point that there is not a single vector left > for serial. That's the problem you want to fix. Thanks tglx. I'll go figure this out with the irqbalanced team. P. > > Thanks, > > tglx > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: "irq 4: Affinity broken due to vector space exhaustion." warning on restart of ttyS0 console 2020-11-10 20:56 ` Thomas Gleixner 2020-11-11 16:14 ` Prarit Bhargava @ 2020-12-09 6:33 ` Shung-Hsi Yu 2020-12-09 7:10 ` Shung-Hsi Yu 2020-12-09 18:28 ` Thomas Gleixner 1 sibling, 2 replies; 7+ messages in thread From: Shung-Hsi Yu @ 2020-12-09 6:33 UTC (permalink / raw) To: Thomas Gleixner Cc: Prarit Bhargava, linux-kernel, Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, Marc Zyngier, Andy Lutomirski, Ming Lei, Peter Xu Hi Thomas, On Tue, Nov 10, 2020 at 09:56:27PM +0100, Thomas Gleixner wrote: > The real problem is irqbalanced aggressively exhausting the vector space > of a _whole_ socket to the point that there is not a single vector left > for serial. That's the problem you want to fix. I believe this warning also gets triggered even when there's _no_ vector exhaustion. This seem to happen when the IRQ's affinity mask is set (wrongly) to CPUs on a different NUMA node (e.g. cpumask_of_node(1) when the irqd->irq == 0). $ lscpu ... NUMA node0 CPU(s): 0-25,52-77 NUMA node1 CPU(s): 26-51,78-103 $ cat /sys/kernel/debug/tracing/trace ... irqbalance-1994 [017] d... 74.912799: irq_matrix_alloc: bit=33 cpu=26 online=1 avl=198 alloc=3 managed=1 online_maps=104 global_avl=20687, global_rsvd=341, total_alloc=217 irqbalance-1994 [017] d... 74.912802: vector_alloc: irq=4 vector=33 reserved=0 ret=0 irqbalance-1994 [017] d... 74.912804: vector_update: irq=4 vector=33 cpu=26 prev_vector=33 prev_cpu=7 irqbalance-1994 [017] d... 74.912805: vector_config: irq=4 vector=33 cpu=26 apicdest=0x00000040 <idle>-0 [007] d.h. 74.970733: vector_free_moved: irq=4 cpu=7 vector=33 is_managed=0 <idle>-0 [007] d.h. 74.970738: irq_matrix_free: bit=33 cpu=7 online=1 avl=200 alloc=1 managed=1 online_maps=104 global_avl=20687, global_rsvd=341, total_alloc=217 ... (agetty)-3004 [047] d... 81.731231: vector_deactivate: irq=4 is_managed=0 can_reserve=1 reserve=0 (agetty)-3004 [047] d... 81.738035: vector_clear: irq=4 vector=33 cpu=26 prev_vector=0 prev_cpu=7 (agetty)-3004 [047] d... 81.738040: irq_matrix_free: bit=33 cpu=26 online=1 avl=199 alloc=2 managed=1 online_maps=104 global_avl=20689, global_rsvd=341, total_alloc=215 (agetty)-3004 [047] d... 81.738046: irq_matrix_reserve: online_maps=104 global_avl=20689, global_rsvd=342, total_alloc=215 (agetty)-3004 [047] d... 81.766739: vector_reserve: irq=4 ret=0 (agetty)-3004 [047] d... 81.766741: vector_config: irq=4 vector=239 cpu=0 apicdest=0x00000000 (agetty)-3004 [047] d... 81.777152: vector_activate: irq=4 is_managed=0 can_reserve=1 reserve=0 (agetty)-3004 [047] d... 81.777157: vector_alloc: irq=4 vector=0 reserved=1 ret=-22 ----------------------------------------> irq_matrix_alloc() failed with EINVAL because the cpumask passed in is empty, which is a result of affmask being (ff,ffffc000,000fffff,fc000000) and cpumask_of_node(node) being (00,00003fff,fff00000,03ffffff). (agetty)-3004 [047] d... 81.789349: irq_matrix_alloc: bit=33 cpu=1 online=1 avl=199 alloc=2 managed=1 online_maps=104 global_avl=20688, global_rsvd=341, total_alloc=216 (agetty)-3004 [047] d... 81.789351: vector_alloc: irq=4 vector=33 reserved=1 ret=0 (agetty)-3004 [047] d... 81.789353: vector_update: irq=4 vector=33 cpu=1 prev_vector=0 prev_cpu=26 (agetty)-3004 [047] d... 81.789355: vector_config: irq=4 vector=33 cpu=1 apicdest=0x00000002 ----------------------------------------> "irq 4: Affinity broken due to vector space exhaustion." warning shows up (agetty)-3004 [047] d... 81.900783: irq_matrix_alloc: bit=33 cpu=26 online=1 avl=198 alloc=3 managed=1 online_maps=104 global_avl=20687, global_rsvd=341, total_alloc=217 (agetty)-3004 [047] d... 82.053535: vector_alloc: irq=4 vector=33 reserved=0 ret=0 (agetty)-3004 [047] d... 82.053536: vector_update: irq=4 vector=33 cpu=26 prev_vector=33 prev_cpu=1 (agetty)-3004 [047] d... 82.053538: vector_config: irq=4 vector=33 cpu=26 apicdest=0x00000040 Shung-Hsi Yu ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: "irq 4: Affinity broken due to vector space exhaustion." warning on restart of ttyS0 console 2020-12-09 6:33 ` Shung-Hsi Yu @ 2020-12-09 7:10 ` Shung-Hsi Yu 2020-12-09 18:28 ` Thomas Gleixner 1 sibling, 0 replies; 7+ messages in thread From: Shung-Hsi Yu @ 2020-12-09 7:10 UTC (permalink / raw) To: Thomas Gleixner Cc: Prarit Bhargava, linux-kernel, Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, Marc Zyngier, Andy Lutomirski, Ming Lei, Peter Xu On Wed, Dec 09, 2020 at 02:33:04PM +0800, Shung-Hsi Yu wrote: > Hi Thomas, > > On Tue, Nov 10, 2020 at 09:56:27PM +0100, Thomas Gleixner wrote: > > The real problem is irqbalanced aggressively exhausting the vector space > > of a _whole_ socket to the point that there is not a single vector left > > for serial. That's the problem you want to fix. > > I believe this warning also gets triggered even when there's _no_ vector > exhaustion. > > This seem to happen when the IRQ's affinity mask is set (wrongly) to CPUs on > a different NUMA node (e.g. cpumask_of_node(1) when the irqd->irq == 0). typo ^^^^^^^^^^^^^^ Should be "affinity mask set to cpumask_of_node(1) when irq_data_get_node(irqd) == 0". Shung-Hsi > $ lscpu > ... > NUMA node0 CPU(s): 0-25,52-77 > NUMA node1 CPU(s): 26-51,78-103 > > $ cat /sys/kernel/debug/tracing/trace > ... > irqbalance-1994 [017] d... 74.912799: irq_matrix_alloc: bit=33 cpu=26 online=1 avl=198 alloc=3 managed=1 online_maps=104 global_avl=20687, global_rsvd=341, total_alloc=217 > irqbalance-1994 [017] d... 74.912802: vector_alloc: irq=4 vector=33 reserved=0 ret=0 > irqbalance-1994 [017] d... 74.912804: vector_update: irq=4 vector=33 cpu=26 prev_vector=33 prev_cpu=7 > irqbalance-1994 [017] d... 74.912805: vector_config: irq=4 vector=33 cpu=26 apicdest=0x00000040 > <idle>-0 [007] d.h. 74.970733: vector_free_moved: irq=4 cpu=7 vector=33 is_managed=0 > <idle>-0 [007] d.h. 74.970738: irq_matrix_free: bit=33 cpu=7 online=1 avl=200 alloc=1 managed=1 online_maps=104 global_avl=20687, global_rsvd=341, total_alloc=217 > ... > (agetty)-3004 [047] d... 81.731231: vector_deactivate: irq=4 is_managed=0 can_reserve=1 reserve=0 > (agetty)-3004 [047] d... 81.738035: vector_clear: irq=4 vector=33 cpu=26 prev_vector=0 prev_cpu=7 > (agetty)-3004 [047] d... 81.738040: irq_matrix_free: bit=33 cpu=26 online=1 avl=199 alloc=2 managed=1 online_maps=104 global_avl=20689, global_rsvd=341, total_alloc=215 > (agetty)-3004 [047] d... 81.738046: irq_matrix_reserve: online_maps=104 global_avl=20689, global_rsvd=342, total_alloc=215 > (agetty)-3004 [047] d... 81.766739: vector_reserve: irq=4 ret=0 > (agetty)-3004 [047] d... 81.766741: vector_config: irq=4 vector=239 cpu=0 apicdest=0x00000000 > (agetty)-3004 [047] d... 81.777152: vector_activate: irq=4 is_managed=0 can_reserve=1 reserve=0 > (agetty)-3004 [047] d... 81.777157: vector_alloc: irq=4 vector=0 reserved=1 ret=-22 > ----------------------------------------> irq_matrix_alloc() failed with > EINVAL because the cpumask > passed in is empty, which is a > result of affmask being > (ff,ffffc000,000fffff,fc000000) > and cpumask_of_node(node) > being > (00,00003fff,fff00000,03ffffff). > > (agetty)-3004 [047] d... 81.789349: irq_matrix_alloc: bit=33 cpu=1 online=1 avl=199 alloc=2 managed=1 online_maps=104 global_avl=20688, global_rsvd=341, total_alloc=216 > (agetty)-3004 [047] d... 81.789351: vector_alloc: irq=4 vector=33 reserved=1 ret=0 > (agetty)-3004 [047] d... 81.789353: vector_update: irq=4 vector=33 cpu=1 prev_vector=0 prev_cpu=26 > (agetty)-3004 [047] d... 81.789355: vector_config: irq=4 vector=33 cpu=1 apicdest=0x00000002 > ----------------------------------------> "irq 4: Affinity broken due to > vector space exhaustion." > warning shows up > > (agetty)-3004 [047] d... 81.900783: irq_matrix_alloc: bit=33 cpu=26 online=1 avl=198 alloc=3 managed=1 online_maps=104 global_avl=20687, global_rsvd=341, total_alloc=217 > (agetty)-3004 [047] d... 82.053535: vector_alloc: irq=4 vector=33 reserved=0 ret=0 > (agetty)-3004 [047] d... 82.053536: vector_update: irq=4 vector=33 cpu=26 prev_vector=33 prev_cpu=1 > (agetty)-3004 [047] d... 82.053538: vector_config: irq=4 vector=33 cpu=26 apicdest=0x00000040 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: "irq 4: Affinity broken due to vector space exhaustion." warning on restart of ttyS0 console 2020-12-09 6:33 ` Shung-Hsi Yu 2020-12-09 7:10 ` Shung-Hsi Yu @ 2020-12-09 18:28 ` Thomas Gleixner 2020-12-10 6:18 ` Shung-Hsi Yu 1 sibling, 1 reply; 7+ messages in thread From: Thomas Gleixner @ 2020-12-09 18:28 UTC (permalink / raw) To: Shung-Hsi Yu Cc: Prarit Bhargava, linux-kernel, Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, Marc Zyngier, Andy Lutomirski, Ming Lei, Peter Xu Hi Shung-Hsi! On Wed, Dec 09 2020 at 14:33, Shung-Hsi Yu wrote: > On Tue, Nov 10, 2020 at 09:56:27PM +0100, Thomas Gleixner wrote: >> The real problem is irqbalanced aggressively exhausting the vector space >> of a _whole_ socket to the point that there is not a single vector left >> for serial. That's the problem you want to fix. > > I believe this warning also gets triggered even when there's _no_ vector > exhaustion. > > This seem to happen when the IRQ's affinity mask is set (wrongly) to CPUs on > a different NUMA node (e.g. cpumask_of_node(1) when the irqd->irq == 0). > > $ lscpu > ... > NUMA node0 CPU(s): 0-25,52-77 > NUMA node1 CPU(s): 26-51,78-103 > > $ cat /sys/kernel/debug/tracing/trace > ... > (agetty)-3004 [047] d... 81.777152: vector_activate: irq=4 is_managed=0 can_reserve=1 reserve=0 > (agetty)-3004 [047] d... 81.777157: vector_alloc: irq=4 vector=0 reserved=1 ret=-22 > ----------------------------------------> irq_matrix_alloc() failed with > EINVAL because the cpumask > passed in is empty, which is a > result of affmask being > (ff,ffffc000,000fffff,fc000000) > and cpumask_of_node(node) > being > (00,00003fff,fff00000,03ffffff). > > (agetty)-3004 [047] d... 81.789349: irq_matrix_alloc: bit=33 cpu=1 online=1 avl=199 alloc=2 managed=1 online_maps=104 global_avl=20688, global_rsvd=341, total_alloc=216 > (agetty)-3004 [047] d... 81.789351: vector_alloc: irq=4 vector=33 reserved=1 ret=0 > (agetty)-3004 [047] d... 81.789353: vector_update: irq=4 vector=33 cpu=1 prev_vector=0 prev_cpu=26 > (agetty)-3004 [047] d... 81.789355: vector_config: irq=4 vector=33 cpu=1 apicdest=0x00000002 > ----------------------------------------> "irq 4: Affinity broken due to > vector space exhaustion." > warning shows up > Ok. That's a different story. Nice explanation! But the fix is not to tone down the warning. The proper fix is to do the search in the correct order. Thanks, tglx --- arch/x86/kernel/apic/vector.c | 24 ++++++++++++++---------- 1 file changed, 14 insertions(+), 10 deletions(-) --- a/arch/x86/kernel/apic/vector.c +++ b/arch/x86/kernel/apic/vector.c @@ -273,20 +273,24 @@ static int assign_irq_vector_any_locked( const struct cpumask *affmsk = irq_data_get_affinity_mask(irqd); int node = irq_data_get_node(irqd); - if (node == NUMA_NO_NODE) - goto all; - /* Try the intersection of @affmsk and node mask */ - cpumask_and(vector_searchmask, cpumask_of_node(node), affmsk); - if (!assign_vector_locked(irqd, vector_searchmask)) - return 0; - /* Try the node mask */ - if (!assign_vector_locked(irqd, cpumask_of_node(node))) - return 0; -all: + if (node != NUMA_NO_NODE) { + /* Try the intersection of @affmsk and node mask */ + cpumask_and(vector_searchmask, cpumask_of_node(node), affmsk); + if (!assign_vector_locked(irqd, vector_searchmask)) + return 0; + } + /* Try the full affinity mask */ cpumask_and(vector_searchmask, affmsk, cpu_online_mask); if (!assign_vector_locked(irqd, vector_searchmask)) return 0; + + if (node != NUMA_NO_NODE) { + /* Try the node mask */ + if (!assign_vector_locked(irqd, cpumask_of_node(node))) + return 0; + } + /* Try the full online mask */ return assign_vector_locked(irqd, cpu_online_mask); } ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: "irq 4: Affinity broken due to vector space exhaustion." warning on restart of ttyS0 console 2020-12-09 18:28 ` Thomas Gleixner @ 2020-12-10 6:18 ` Shung-Hsi Yu 0 siblings, 0 replies; 7+ messages in thread From: Shung-Hsi Yu @ 2020-12-10 6:18 UTC (permalink / raw) To: Thomas Gleixner Cc: Prarit Bhargava, linux-kernel, Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, Marc Zyngier, Andy Lutomirski, Ming Lei, Peter Xu On Wed, Dec 09, 2020 at 07:28:49PM +0100, Thomas Gleixner wrote: > But the fix is not to tone down the warning. The proper fix is to do the > search in the correct order. Agree. Thank you for the speedy fix! Tested-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2020-12-10 6:20 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-11-10 19:24 "irq 4: Affinity broken due to vector space exhaustion." warning on restart of ttyS0 console Prarit Bhargava 2020-11-10 20:56 ` Thomas Gleixner 2020-11-11 16:14 ` Prarit Bhargava 2020-12-09 6:33 ` Shung-Hsi Yu 2020-12-09 7:10 ` Shung-Hsi Yu 2020-12-09 18:28 ` Thomas Gleixner 2020-12-10 6:18 ` Shung-Hsi Yu
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.