linux-kernel.vger.kernel.org archive mirror
* [PATCH] x86/hyperv: Properly deal with empty cpumasks in hyperv_flush_tlb_multi()
From: Vitaly Kuznetsov @ 2022-01-06  9:46 UTC
  To: linux-hyperv
  Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, Michael Kelley, x86, linux-kernel

KASAN detected the following issue:

 BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_multi+0xf88/0x1060
 Read of size 4 at addr ffff8880011ccbc0 by task kcompactd0/33

 CPU: 1 PID: 33 Comm: kcompactd0 Not tainted 5.14.0-39.el9.x86_64+debug #1
 Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine,
     BIOS Hyper-V UEFI Release v4.0 12/17/2019
 Call Trace:
  dump_stack_lvl+0x57/0x7d
  print_address_description.constprop.0+0x1f/0x140
  ? hyperv_flush_tlb_multi+0xf88/0x1060
  __kasan_report.cold+0x7f/0x11e
  ? hyperv_flush_tlb_multi+0xf88/0x1060
  kasan_report+0x38/0x50
  hyperv_flush_tlb_multi+0xf88/0x1060
  flush_tlb_mm_range+0x1b1/0x200
  ptep_clear_flush+0x10e/0x150
...
 Allocated by task 0:
  kasan_save_stack+0x1b/0x40
  __kasan_kmalloc+0x7c/0x90
  hv_common_init+0xae/0x115
  hyperv_init+0x97/0x501
  apic_intr_mode_init+0xb3/0x1e0
  x86_late_time_init+0x92/0xa2
  start_kernel+0x338/0x3eb
  secondary_startup_64_no_verify+0xc2/0xcb

 The buggy address belongs to the object at ffff8880011cc800
  which belongs to the cache kmalloc-1k of size 1024
 The buggy address is located 960 bytes inside of
  1024-byte region [ffff8880011cc800, ffff8880011ccc00)

'hyperv_flush_tlb_multi+0xf88/0x1060' points to
hv_cpu_number_to_vp_number(), and '960 bytes' means we're trying to get
the VP_INDEX for CPU#240. 'nr_cpus' here is exactly 240, so we're trying
to access past hv_vp_index's last element. This can (and will) happen
when the 'cpus' mask is empty, because cpumask_last() then returns a
value '>= nr_cpus'.
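
The arithmetic above can be checked with a small userspace sketch. It is
not kernel code: model_last_bit() and model_vp_index_offset() are
hypothetical stand-ins for cpumask_last() and for indexing hv_vp_index,
and the 4-byte entry size is an assumption taken from the 'Read of size 4'
line in the report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NR_CPUS 240u /* 'nr_cpus' from the KASAN report */
#define MODEL_BITS_PER_WORD 64u
#define MODEL_WORDS \
	((MODEL_NR_CPUS + MODEL_BITS_PER_WORD - 1) / MODEL_BITS_PER_WORD)

/*
 * Hypothetical model of cpumask_last(): index of the highest set bit,
 * or 'nbits' when no bit is set at all.
 */
static unsigned int model_last_bit(const uint64_t *bits, unsigned int nbits)
{
	assert(nbits <= MODEL_WORDS * MODEL_BITS_PER_WORD);

	for (unsigned int i = nbits; i-- > 0; ) {
		uint64_t word = bits[i / MODEL_BITS_PER_WORD];

		if (word & (UINT64_C(1) << (i % MODEL_BITS_PER_WORD)))
			return i;
	}
	return nbits; /* empty mask: not a valid CPU number */
}

/* Byte offset of entry 'cpu' in an array of 32-bit VP indices (assumed). */
static size_t model_vp_index_offset(unsigned int cpu)
{
	return (size_t)cpu * sizeof(uint32_t);
}
```

With an all-zero mask, model_last_bit(mask, MODEL_NR_CPUS) returns 240,
and model_vp_index_offset(240) is 960, which matches the '960 bytes
inside' offset in the report.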

Commit ad0a6bad4475 ("x86/hyperv: check cpu mask after interrupt has
been disabled") tried to deal with the empty cpumask situation but
apparently didn't fully fix the issue.

The 'cpus' cpumask which is passed to hyperv_flush_tlb_multi() is
'mm_cpumask(mm)' (which is '&mm->cpu_bitmap'). This mask changes every
time the particular mm is scheduled/unscheduled on some CPU (see
switch_mm_irqs_off()). Disabling IRQs on the CPU which is performing the
remote TLB flush has zero influence on whether the particular process can
get scheduled/unscheduled on _other_ CPUs. So if, e.g., the mm was
scheduled on one other CPU and got unscheduled during
hyperv_flush_tlb_multi()'s execution, the cpumask becomes empty.
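
A deterministic userspace sketch of this window, for illustration only:
the mask is checked for emptiness first and read again later, and
model_remote_unschedule() (a hypothetical helper) plays the role of
another CPU switching away from the mm in between. All 'model_' names are
made up here; the real mask is updated concurrently rather than from the
same thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 64u

/* Hypothetical single-word stand-in for mm_cpumask(mm). */
struct model_mask {
	uint64_t bits;
};

static bool model_empty(const struct model_mask *m)
{
	return m->bits == 0;
}

/* Highest set bit, or MODEL_NR_CPUS when the mask is empty. */
static unsigned int model_last(const struct model_mask *m)
{
	for (unsigned int i = MODEL_NR_CPUS; i-- > 0; ) {
		if (m->bits & (UINT64_C(1) << i))
			return i;
	}
	return MODEL_NR_CPUS;
}

/* Models a remote CPU dropping the mm; local IRQ state cannot stop it. */
static void model_remote_unschedule(struct model_mask *m, unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	m->bits &= ~(UINT64_C(1) << cpu);
}

/*
 * Old flow: check for an empty mask, then use the mask later. Returns
 * true when the later lookup yields an out-of-range CPU number.
 */
static bool model_old_flow_out_of_range(struct model_mask *m,
					unsigned int racing_cpu)
{
	if (model_empty(m))
		return false; /* early return taken, nothing indexed */

	model_remote_unschedule(m, racing_cpu); /* the race window */

	return model_last(m) >= MODEL_NR_CPUS;
}
```

Starting from a mask with only CPU 5 set and racing_cpu == 5, the early
check passes and the later lookup still comes back out of range, which
is why the up-front cpumask_empty() test cannot be relied upon.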

It doesn't seem that there's a good way to protect 'mm_cpumask(mm)'
from changing during hyperv_flush_tlb_multi()'s execution. It would be
possible to copy it at the very beginning of the function, but this is a
waste. It seems we can deal with a changing cpumask just fine.

When the 'cpus' cpumask changes during hyperv_flush_tlb_multi()'s
execution, there are two possible issues:
- 'Under-flushing': we will not flush the TLB on a CPU which got added to
the mask while hyperv_flush_tlb_multi() was already running. This is
not a problem as it is equivalent to the mm getting scheduled on that
CPU right after the TLB flush.
- 'Over-flushing': we may flush the TLB on a CPU which is already cleared
from the mask. First, an extra TLB flush preserves correctness. Second,
Hyper-V's TLB flush hypercall takes an 'mm->pgd' argument so Hyper-V may
avoid the flush if CR3 doesn't match.

Fix the immediate issue with cpumask_last()/hv_cpu_number_to_vp_number()
and remove the pointless cpumask_empty() check from the beginning of the
function as it really doesn't protect anything. Also, avoid the hypercall
altogether when 'flush->processor_mask' ends up being empty.

Fixes: ad0a6bad4475 ("x86/hyperv: check cpu mask after interrupt has been disabled")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/hyperv/mmu.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
index bd13736d0c05..0ad2378fe6ad 100644
--- a/arch/x86/hyperv/mmu.c
+++ b/arch/x86/hyperv/mmu.c
@@ -68,15 +68,6 @@ static void hyperv_flush_tlb_multi(const struct cpumask *cpus,
 
 	local_irq_save(flags);
 
-	/*
-	 * Only check the mask _after_ interrupt has been disabled to avoid the
-	 * mask changing under our feet.
-	 */
-	if (cpumask_empty(cpus)) {
-		local_irq_restore(flags);
-		return;
-	}
-
 	flush_pcpu = (struct hv_tlb_flush **)
 		     this_cpu_ptr(hyperv_pcpu_input_arg);
 
@@ -115,7 +106,9 @@ static void hyperv_flush_tlb_multi(const struct cpumask *cpus,
 		 * must. We will also check all VP numbers when walking the
 		 * supplied CPU set to remain correct in all cases.
 		 */
-		if (hv_cpu_number_to_vp_number(cpumask_last(cpus)) >= 64)
+		cpu = cpumask_last(cpus);
+
+		if (cpu < nr_cpumask_bits && hv_cpu_number_to_vp_number(cpu) >= 64)
 			goto do_ex_hypercall;
 
 		for_each_cpu(cpu, cpus) {
@@ -131,6 +124,12 @@ static void hyperv_flush_tlb_multi(const struct cpumask *cpus,
 			__set_bit(vcpu, (unsigned long *)
 				  &flush->processor_mask);
 		}
+
+		/* nothing to flush if 'processor_mask' ends up being empty */
+		if (!flush->processor_mask) {
+			local_irq_restore(flags);
+			return;
+		}
 	}
 
 	/*
-- 
2.33.1
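
For readers who want to poke at the logic outside the kernel, here is a
rough userspace model of the patched branch. It is a simplified sketch,
not the code in arch/x86/hyperv/mmu.c: it covers only the branch touched
by the hunks above, all 'model_' names are hypothetical, the CPU to VP
mapping is a caller-supplied table rather than hv_vp_index, and the
in-loop 'vcpu >= 64' check is inferred from the comment in the diff
context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NR_CPUS 8u

enum model_action {
	MODEL_SKIP,         /* nothing to flush, no hypercall issued */
	MODEL_HYPERCALL,    /* simple hypercall with a 64-bit VP mask */
	MODEL_EX_HYPERCALL, /* extended hypercall for VP numbers >= 64 */
};

/* Highest set bit below MODEL_NR_CPUS, or MODEL_NR_CPUS if none is set. */
static unsigned int model_last(uint64_t cpus)
{
	for (unsigned int i = MODEL_NR_CPUS; i-- > 0; ) {
		if (cpus & (UINT64_C(1) << i))
			return i;
	}
	return MODEL_NR_CPUS;
}

static enum model_action model_flush(uint64_t cpus, const uint32_t *vp_index)
{
	uint64_t processor_mask = 0;
	unsigned int cpu;

	assert(vp_index != NULL);
	assert((cpus >> MODEL_NR_CPUS) == 0);

	/* Only index the table when the mask actually had a last CPU. */
	cpu = model_last(cpus);
	if (cpu < MODEL_NR_CPUS && vp_index[cpu] >= 64)
		return MODEL_EX_HYPERCALL;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (!(cpus & (UINT64_C(1) << cpu)))
			continue;
		if (vp_index[cpu] >= 64)
			return MODEL_EX_HYPERCALL;
		processor_mask |= UINT64_C(1) << vp_index[cpu];
	}

	/* Nothing to flush if the VP mask ended up empty. */
	return processor_mask ? MODEL_HYPERCALL : MODEL_SKIP;
}
```

With a table such as {0, 1, 2, 3, 4, 5, 6, 70}, an empty 'cpus' value
yields MODEL_SKIP without touching the table, 0x05 yields MODEL_HYPERCALL
and 0x80 yields MODEL_EX_HYPERCALL.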



* RE: [PATCH] x86/hyperv: Properly deal with empty cpumasks in hyperv_flush_tlb_multi()
From: Michael Kelley (LINUX) @ 2022-01-07 16:56 UTC
  To: vkuznets, linux-hyperv
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, x86, linux-kernel

From: Vitaly Kuznetsov <vkuznets@redhat.com> Sent: Thursday, January 6, 2022 1:46 AM
> [...]

Thanks for figuring out the core issue with the cpumask changing and
doing what should be the definitive solution!

Reviewed-by: Michael Kelley <mikelley@microsoft.com>






* Re: [PATCH] x86/hyperv: Properly deal with empty cpumasks in hyperv_flush_tlb_multi()
From: Wei Liu @ 2022-01-10 11:51 UTC
  To: Michael Kelley (LINUX)
  Cc: vkuznets, linux-hyperv, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Wei Liu, Dexuan Cui, x86, linux-kernel

On Fri, Jan 07, 2022 at 04:56:29PM +0000, Michael Kelley (LINUX) wrote:
> From: Vitaly Kuznetsov <vkuznets@redhat.com> Sent: Thursday, January 6, 2022 1:46 AM
[...]
> 
> Reviewed-by: Michael Kelley <mikelley@microsoft.com>

Applied to hyperv-next. Thanks.
