* [PATCH] x86/resctrl: Fix mbm_setup_overflow_handler() when last CPU goes offline @ 2024-03-27 18:46 Tony Luck 2024-03-27 22:37 ` Reinette Chatre 0 siblings, 1 reply; 4+ messages in thread From: Tony Luck @ 2024-03-27 18:46 UTC (permalink / raw) To: Borislav Petkov Cc: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman, James Morse, Babu Moger, Drew Fustini, x86, linux-kernel, patches, Tony Luck Don't bother looking for another CPU to take over MBM overflow duties when the last CPU in a domain goes offline. Doing so results in this Oops: [ 97.166136] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 97.173118] #PF: supervisor read access in kernel mode [ 97.178263] #PF: error_code(0x0000) - not-present page [ 97.183410] PGD 0 [ 97.185438] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 97.189805] CPU: 36 PID: 235 Comm: cpuhp/36 Tainted: G T 6.9.0-rc1 #356 [ 97.208322] RIP: 0010:__find_nth_andnot_bit+0x66/0x110 Fixes: 978fcca954cb ("x86/resctrl: Allow overflow/limbo handlers to be scheduled on any-but CPU") Signed-off-by: Tony Luck <tony.luck@intel.com> --- arch/x86/kernel/cpu/resctrl/monitor.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c index 757d475158a3..4d9987acffd6 100644 --- a/arch/x86/kernel/cpu/resctrl/monitor.c +++ b/arch/x86/kernel/cpu/resctrl/monitor.c @@ -929,6 +929,10 @@ void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms, unsigned long delay = msecs_to_jiffies(delay_ms); int cpu; + /* Nothing to do if this is the last CPU in a domain going offline */ + if (!delay_ms && bitmap_weight(cpumask_bits(&dom->cpu_mask), nr_cpu_ids) == 1) + return; + /* * When a domain comes online there is no guarantee the filesystem is * mounted. If not, there is no need to catch counter overflow. -- 2.44.0 ^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] x86/resctrl: Fix mbm_setup_overflow_handler() when last CPU goes offline 2024-03-27 18:46 [PATCH] x86/resctrl: Fix mbm_setup_overflow_handler() when last CPU goes offline Tony Luck @ 2024-03-27 22:37 ` Reinette Chatre 2024-03-27 23:01 ` Luck, Tony 0 siblings, 1 reply; 4+ messages in thread From: Reinette Chatre @ 2024-03-27 22:37 UTC (permalink / raw) To: Tony Luck, Borislav Petkov Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse, Babu Moger, Drew Fustini, x86, linux-kernel, patches Hi Tony, Thank you very much for taking a closer look at this. On 3/27/2024 11:46 AM, Tony Luck wrote: > Don't bother looking for another CPU to take over MBM overflow duties > when the last CPU in a domain goes offline. Doing so results in this > Oops: > > [ 97.166136] BUG: kernel NULL pointer dereference, address: 0000000000000000 > [ 97.173118] #PF: supervisor read access in kernel mode > [ 97.178263] #PF: error_code(0x0000) - not-present page > [ 97.183410] PGD 0 > [ 97.185438] Oops: 0000 [#1] PREEMPT SMP NOPTI > [ 97.189805] CPU: 36 PID: 235 Comm: cpuhp/36 Tainted: G T 6.9.0-rc1 #356 > [ 97.208322] RIP: 0010:__find_nth_andnot_bit+0x66/0x110 > > Fixes: 978fcca954cb ("x86/resctrl: Allow overflow/limbo handlers to be scheduled on any-but CPU") > Signed-off-by: Tony Luck <tony.luck@intel.com> > --- > arch/x86/kernel/cpu/resctrl/monitor.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c > index 757d475158a3..4d9987acffd6 100644 > --- a/arch/x86/kernel/cpu/resctrl/monitor.c > +++ b/arch/x86/kernel/cpu/resctrl/monitor.c > @@ -929,6 +929,10 @@ void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms, > unsigned long delay = msecs_to_jiffies(delay_ms); > int cpu; > > + /* Nothing to do if this is the last CPU in a domain going offline */ > + if (!delay_ms && bitmap_weight(cpumask_bits(&dom->cpu_mask), nr_cpu_ids) == 1) > + return; > + > /* > * When a domain comes online there is no guarantee the filesystem is > * mounted. If not, there is no need to catch counter overflow. While this addresses the scenario you tested I do not think that this solves the underlying problem and thus I believe that there remains other scenarios in which this same OOPS can be encountered. For example, I think you will encounter the same OOPS a few lines later within cqm_setup_limbo_handler() if the system happened to have some busy RMIDs. Another example would be if the tick_nohz_full_mask contains all but the exclude CPU. In this scenario a bitmap_weight() test will not be sufficient since it does not give insight into how many CPUs remain after taking into account tick_nohz_full_mask. There seems to be two issues here (although I am not familiar with these flows). First, it seems that tick_nohz_full_mask is not actually allocated unless the user boots with a "nohz_full=". This means that any attempt to access bits within tick_nohz_full_mask will cause this OOPS. If that is allocated then the second issue seems that the buried __ffs() call requires that it not be called with 0 and this checking is not done. To me it seems most appropriate to fix this at the central place to ensure all scenarios are handled instead of scattering checks. To that end, what do you think of something like below? It uses tick_nohz_full_enabled() check to ensure that tick_nohz_full_mask is actually allocated while the other changes aim to avoid __ffs() on 0. diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h index 8f40fb35db78..61337f32830c 100644 --- a/arch/x86/kernel/cpu/resctrl/internal.h +++ b/arch/x86/kernel/cpu/resctrl/internal.h @@ -72,6 +72,7 @@ static inline unsigned int cpumask_any_housekeeping(const struct cpumask *mask, int exclude_cpu) { unsigned int cpu, hk_cpu; + cpumask_var_t cpu_remain; if (exclude_cpu == RESCTRL_PICK_ANY_CPU) cpu = cpumask_any(mask); @@ -85,14 +86,26 @@ cpumask_any_housekeeping(const struct cpumask *mask, int exclude_cpu) if (cpu < nr_cpu_ids && !tick_nohz_full_cpu(cpu)) return cpu; - /* Try to find a CPU that isn't nohz_full to use in preference */ - hk_cpu = cpumask_nth_andnot(0, mask, tick_nohz_full_mask); - if (hk_cpu == exclude_cpu) - hk_cpu = cpumask_nth_andnot(1, mask, tick_nohz_full_mask); + /* Do not try to access tick_nohz_full_mask if it has not been allocated. */ + if (!tick_nohz_full_enabled()) + return cpu; + + if (!zalloc_cpumask_var(&cpu_remain, GFP_KERNEL)) + return cpu; + if (!cpumask_andnot(cpu_remain, mask, tick_nohz_full_mask)) { + free_cpumask_var(cpu_remain); + return cpu; + } + + cpumask_clear_cpu(exclude_cpu, cpu_remain); + + hk_cpu = cpumask_any(cpu_remain); if (hk_cpu < nr_cpu_ids) cpu = hk_cpu; + free_cpumask_var(cpu_remain); + return cpu; } Reinette ^ permalink raw reply related [flat|nested] 4+ messages in thread
* RE: [PATCH] x86/resctrl: Fix mbm_setup_overflow_handler() when last CPU goes offline 2024-03-27 22:37 ` Reinette Chatre @ 2024-03-27 23:01 ` Luck, Tony 2024-03-28 21:11 ` Reinette Chatre 0 siblings, 1 reply; 4+ messages in thread From: Luck, Tony @ 2024-03-27 23:01 UTC (permalink / raw) To: Chatre, Reinette, Borislav Petkov Cc: Yu, Fenghua, Wieczor-Retman, Maciej, Peter Newman, James Morse, Babu Moger, Drew Fustini, x86, linux-kernel, patches > There seems to be two issues here (although I am not familiar with these flows). First, > it seems that tick_nohz_full_mask is not actually allocated unless the user boots > with a "nohz_full=". This means that any attempt to access bits within tick_nohz_full_mask > will cause this OOPS. If that is allocated then the second issue seems that the > buried __ffs() call requires that it not be called with 0 and this checking is not done. > > To me it seems most appropriate to fix this at the central place to ensure all scenarios > are handled instead of scattering checks. Good analysis. > To that end, what do you think of something like below? It uses tick_nohz_full_enabled() check > to ensure that tick_nohz_full_mask is actually allocated while the other changes aim to > avoid __ffs() on 0. Looks good. Tested-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Tont Luck <tony.luck@intel.com> -Tony ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] x86/resctrl: Fix mbm_setup_overflow_handler() when last CPU goes offline 2024-03-27 23:01 ` Luck, Tony @ 2024-03-28 21:11 ` Reinette Chatre 0 siblings, 0 replies; 4+ messages in thread From: Reinette Chatre @ 2024-03-28 21:11 UTC (permalink / raw) To: Luck, Tony, Borislav Petkov Cc: Yu, Fenghua, Wieczor-Retman, Maciej, Peter Newman, James Morse, Babu Moger, Drew Fustini, x86, linux-kernel, patches Hi Tony, On 3/27/2024 4:01 PM, Luck, Tony wrote: >> There seems to be two issues here (although I am not familiar with these flows). First, >> it seems that tick_nohz_full_mask is not actually allocated unless the user boots >> with a "nohz_full=". This means that any attempt to access bits within tick_nohz_full_mask >> will cause this OOPS. If that is allocated then the second issue seems that the >> buried __ffs() call requires that it not be called with 0 and this checking is not done. >> >> To me it seems most appropriate to fix this at the central place to ensure all scenarios >> are handled instead of scattering checks. > > Good analysis. > >> To that end, what do you think of something like below? It uses tick_nohz_full_enabled() check >> to ensure that tick_nohz_full_mask is actually allocated while the other changes aim to >> avoid __ffs() on 0. I studied the flows some more and I no longer believe that there is a risk of __ffs() being called on 0. Looking at the flows starting with cpumask_nth_andnot() the value provided to __ffs() is ensured to be non-zero via either an explicit check or hweight_long(). To confirm this I tested a CONFIG_NO_HZ_FULL kernel by booting with "nohz_full=" set to the highest numbered CPU in the domain. When all the CPUs are offlined the behavior for the "second to last" and "last" CPU are most interesting and worked as expected when using cpumask_nth_andnot(). > > Looks good. > > Tested-by: Tony Luck <tony.luck@intel.com> > Reviewed-by: Tont Luck <tony.luck@intel.com> Considering the motivation above I will submit a change that only adds the tick_nohz_full_enabled() check. Since it is such a big change from the snippet you reviewed and tested here I dropped your tags from it and hope you can reconsider the fix after considering the information above. Reinette ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2024-03-28 21:11 UTC | newest] Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2024-03-27 18:46 [PATCH] x86/resctrl: Fix mbm_setup_overflow_handler() when last CPU goes offline Tony Luck 2024-03-27 22:37 ` Reinette Chatre 2024-03-27 23:01 ` Luck, Tony 2024-03-28 21:11 ` Reinette Chatre
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.