[PATCH] x86/resctrl: Fix mbm_setup_overflow_handler() when last CPU goes offline

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] x86/resctrl: Fix mbm_setup_overflow_handler() when last CPU goes offline
@ 2024-03-27 18:46 Tony Luck
  2024-03-27 22:37 ` Reinette Chatre
  0 siblings, 1 reply; 4+ messages in thread
From: Tony Luck @ 2024-03-27 18:46 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, x86, linux-kernel,
	patches, Tony Luck

Don't bother looking for another CPU to take over MBM overflow duties
when the last CPU in a domain goes offline. Doing so results in this
Oops:

[   97.166136] BUG: kernel NULL pointer dereference, address: 0000000000000000
[   97.173118] #PF: supervisor read access in kernel mode
[   97.178263] #PF: error_code(0x0000) - not-present page
[   97.183410] PGD 0
[   97.185438] Oops: 0000 [#1] PREEMPT SMP NOPTI
[   97.189805] CPU: 36 PID: 235 Comm: cpuhp/36 Tainted: G                T  6.9.0-rc1 #356
[   97.208322] RIP: 0010:__find_nth_andnot_bit+0x66/0x110

Fixes: 978fcca954cb ("x86/resctrl: Allow overflow/limbo handlers to be scheduled on any-but CPU")
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/resctrl/monitor.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 757d475158a3..4d9987acffd6 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -929,6 +929,10 @@ void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms,
 	unsigned long delay = msecs_to_jiffies(delay_ms);
 	int cpu;
 
+	/* Nothing to do if this is the last CPU in a domain going offline */
+	if (!delay_ms && bitmap_weight(cpumask_bits(&dom->cpu_mask), nr_cpu_ids) == 1)
+		return;
+
 	/*
 	 * When a domain comes online there is no guarantee the filesystem is
 	 * mounted. If not, there is no need to catch counter overflow.
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] x86/resctrl: Fix mbm_setup_overflow_handler() when last CPU goes offline
  2024-03-27 18:46 [PATCH] x86/resctrl: Fix mbm_setup_overflow_handler() when last CPU goes offline Tony Luck
@ 2024-03-27 22:37 ` Reinette Chatre
  2024-03-27 23:01   ` Luck, Tony
  0 siblings, 1 reply; 4+ messages in thread
From: Reinette Chatre @ 2024-03-27 22:37 UTC (permalink / raw)
  To: Tony Luck, Borislav Petkov
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, x86, linux-kernel, patches

Hi Tony,

Thank you very much for taking a closer look at this.

On 3/27/2024 11:46 AM, Tony Luck wrote:
> Don't bother looking for another CPU to take over MBM overflow duties
> when the last CPU in a domain goes offline. Doing so results in this
> Oops:
> 
> [   97.166136] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [   97.173118] #PF: supervisor read access in kernel mode
> [   97.178263] #PF: error_code(0x0000) - not-present page
> [   97.183410] PGD 0
> [   97.185438] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [   97.189805] CPU: 36 PID: 235 Comm: cpuhp/36 Tainted: G                T  6.9.0-rc1 #356
> [   97.208322] RIP: 0010:__find_nth_andnot_bit+0x66/0x110
> 
> Fixes: 978fcca954cb ("x86/resctrl: Allow overflow/limbo handlers to be scheduled on any-but CPU")
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/kernel/cpu/resctrl/monitor.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 757d475158a3..4d9987acffd6 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -929,6 +929,10 @@ void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms,
>  	unsigned long delay = msecs_to_jiffies(delay_ms);
>  	int cpu;
>  
> +	/* Nothing to do if this is the last CPU in a domain going offline */
> +	if (!delay_ms && bitmap_weight(cpumask_bits(&dom->cpu_mask), nr_cpu_ids) == 1)
> +		return;
> +
>  	/*
>  	 * When a domain comes online there is no guarantee the filesystem is
>  	 * mounted. If not, there is no need to catch counter overflow.

While this addresses the scenario you tested I do not think that this solves the
underlying problem and thus I believe that there remains other scenarios in which this
same OOPS can be encountered.

For example, I think you will encounter the same OOPS a few lines later within 
cqm_setup_limbo_handler() if the system happened to have some busy RMIDs. Another
example would be if the tick_nohz_full_mask contains all but the exclude CPU.
In this scenario a bitmap_weight() test will not be sufficient since it does
not give insight into how many CPUs remain after taking into account
tick_nohz_full_mask.

There seems to be two issues here (although I am not familiar with these flows). First,
it seems that tick_nohz_full_mask is not actually allocated unless the user boots
with a "nohz_full=". This means that any attempt to access bits within tick_nohz_full_mask
will cause this OOPS. If that is allocated then the second issue seems that the  
buried __ffs() call requires that it not be called with 0 and this checking is not done.

To me it seems most appropriate to fix this at the central place to ensure all scenarios
are handled instead of scattering checks.

To that end, what do you think of something like below? It uses tick_nohz_full_enabled() check
to ensure that tick_nohz_full_mask is actually allocated while the other changes aim to
avoid __ffs() on 0.

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 8f40fb35db78..61337f32830c 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -72,6 +72,7 @@ static inline unsigned int
 cpumask_any_housekeeping(const struct cpumask *mask, int exclude_cpu)
 {
 	unsigned int cpu, hk_cpu;
+	cpumask_var_t cpu_remain;
 
 	if (exclude_cpu == RESCTRL_PICK_ANY_CPU)
 		cpu = cpumask_any(mask);
@@ -85,14 +86,26 @@ cpumask_any_housekeeping(const struct cpumask *mask, int exclude_cpu)
 	if (cpu < nr_cpu_ids && !tick_nohz_full_cpu(cpu))
 		return cpu;
 
-	/* Try to find a CPU that isn't nohz_full to use in preference */
-	hk_cpu = cpumask_nth_andnot(0, mask, tick_nohz_full_mask);
-	if (hk_cpu == exclude_cpu)
-		hk_cpu = cpumask_nth_andnot(1, mask, tick_nohz_full_mask);
+	/* Do not try to access tick_nohz_full_mask if it has not been allocated. */
+	if (!tick_nohz_full_enabled())
+		return cpu;
+
+	if (!zalloc_cpumask_var(&cpu_remain, GFP_KERNEL))
+		return cpu;
 
+	if (!cpumask_andnot(cpu_remain, mask, tick_nohz_full_mask)) {
+		free_cpumask_var(cpu_remain);
+		return cpu;
+	}
+
+	cpumask_clear_cpu(exclude_cpu, cpu_remain);
+
+	hk_cpu = cpumask_any(cpu_remain);
 	if (hk_cpu < nr_cpu_ids)
 		cpu = hk_cpu;
 
+	free_cpumask_var(cpu_remain);
+
 	return cpu;
 }
 

Reinette

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* RE: [PATCH] x86/resctrl: Fix mbm_setup_overflow_handler() when last CPU goes offline
  2024-03-27 22:37 ` Reinette Chatre
@ 2024-03-27 23:01   ` Luck, Tony
  2024-03-28 21:11     ` Reinette Chatre
  0 siblings, 1 reply; 4+ messages in thread
From: Luck, Tony @ 2024-03-27 23:01 UTC (permalink / raw)
  To: Chatre, Reinette, Borislav Petkov
  Cc: Yu, Fenghua, Wieczor-Retman, Maciej, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, x86, linux-kernel, patches

> There seems to be two issues here (although I am not familiar with these flows). First,
> it seems that tick_nohz_full_mask is not actually allocated unless the user boots
> with a "nohz_full=". This means that any attempt to access bits within tick_nohz_full_mask
> will cause this OOPS. If that is allocated then the second issue seems that the  
> buried __ffs() call requires that it not be called with 0 and this checking is not done.
>
> To me it seems most appropriate to fix this at the central place to ensure all scenarios
> are handled instead of scattering checks.

Good analysis.

> To that end, what do you think of something like below? It uses tick_nohz_full_enabled() check
> to ensure that tick_nohz_full_mask is actually allocated while the other changes aim to
> avoid __ffs() on 0.

Looks good.

Tested-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Tont Luck <tony.luck@intel.com>

-Tony

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] x86/resctrl: Fix mbm_setup_overflow_handler() when last CPU goes offline
  2024-03-27 23:01   ` Luck, Tony
@ 2024-03-28 21:11     ` Reinette Chatre
  0 siblings, 0 replies; 4+ messages in thread
From: Reinette Chatre @ 2024-03-28 21:11 UTC (permalink / raw)
  To: Luck, Tony, Borislav Petkov
  Cc: Yu, Fenghua, Wieczor-Retman, Maciej, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, x86, linux-kernel, patches

Hi Tony,

On 3/27/2024 4:01 PM, Luck, Tony wrote:
>> There seems to be two issues here (although I am not familiar with these flows). First,
>> it seems that tick_nohz_full_mask is not actually allocated unless the user boots
>> with a "nohz_full=". This means that any attempt to access bits within tick_nohz_full_mask
>> will cause this OOPS. If that is allocated then the second issue seems that the  
>> buried __ffs() call requires that it not be called with 0 and this checking is not done.
>>
>> To me it seems most appropriate to fix this at the central place to ensure all scenarios
>> are handled instead of scattering checks.
> 
> Good analysis.
> 
>> To that end, what do you think of something like below? It uses tick_nohz_full_enabled() check
>> to ensure that tick_nohz_full_mask is actually allocated while the other changes aim to
>> avoid __ffs() on 0.

I studied the flows some more and I no longer believe that there is a risk of __ffs() being
called on 0. Looking at the flows starting with cpumask_nth_andnot() the value provided to
__ffs() is ensured to be non-zero via either an explicit check or hweight_long().

To confirm this I tested a CONFIG_NO_HZ_FULL kernel by booting with "nohz_full=" set to the
highest numbered CPU in the domain. When all the CPUs are offlined the behavior for the
"second to last" and "last" CPU are most interesting and worked as expected when using 
cpumask_nth_andnot(). 

> 
> Looks good.
> 
> Tested-by: Tony Luck <tony.luck@intel.com>
> Reviewed-by: Tont Luck <tony.luck@intel.com>

Considering the motivation above I will submit a change that only adds the 
tick_nohz_full_enabled() check. Since it is such a big change from the snippet you reviewed
and tested here I dropped your tags from it and hope you can reconsider the fix after considering
the information above.

Reinette

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2024-03-28 21:11 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-27 18:46 [PATCH] x86/resctrl: Fix mbm_setup_overflow_handler() when last CPU goes offline Tony Luck
2024-03-27 22:37 ` Reinette Chatre
2024-03-27 23:01   ` Luck, Tony
2024-03-28 21:11     ` Reinette Chatre

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.