* RE: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others()
       [not found] <CA+qYZY3a-FHfWNL2=na6O8TRJYu9kaeyp80VNDxaDTi2EBGoog@mail.gmail.com>
@ 2021-08-06 10:43 ` Michael Kelley
  2021-08-06 17:35   ` David Mozes
  0 siblings, 1 reply; 21+ messages in thread

From: Michael Kelley @ 2021-08-06 10:43 UTC (permalink / raw)
  To: David Moses; +Cc: david.mozes, linux-hyperv, linux-kernel

From: David Moses <mosesster@gmail.com> Sent: Friday, August 6, 2021 2:20 AM

> Hi Michael,
> We are running kernel 4.19.195 (the fix Wei Liu suggested, moving the
> cpumask_empty check after disabling interrupts, is included in this version)
> with the default Hyper-V version.
> I'm getting the 4-byte garbage read (trace included) almost every night.
> We are running on Azure VM Standard D64s_v4 with 64 cores (our system
> includes three such VMs); the application drives very high I/O traffic
> involving iSCSI.
> We believe this issue causes the stack corruption in the RT scheduler that
> I forwarded in the previous mail.
>
> Let us know what more is needed to clarify the problem.
> Is it just Hyper-V related, or could it be a general kernel issue?
>
> Thx David
>
> Even more than that: I added the below patch/fix
>
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 5b58a6c..165727a 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -298,6 +298,9 @@ static inline struct hv_vp_assist_page *hv_get_vp_assist_page(unsigned int cpu)
>   */
>  static inline int hv_cpu_number_to_vp_number(int cpu_number)
>  {
> +	if (WARN_ON_ONCE(cpu_number < 0 || cpu_number >= num_possible_cpus()))
> +		return VP_INVAL;
> +
>  	return hv_vp_index[cpu_number];
>  }
>
> and we have evidence that we reach this point.
>
> See below:
> Aug  5 21:03:01 c-node11 kernel: [17147.089261] WARNING: CPU: 15 PID: 8973 at arch/x86/include/asm/mshyperv.h:301 hyperv_flush_tlb_others+0x1f7/0x760
> Aug  5 21:03:01 c-node11 kernel: [17147.089275] RIP: 0010:hyperv_flush_tlb_others+0x1f7/0x760
> Aug  5 21:03:01 c-node11 kernel: [17147.089275] Code: ff ff be 40 00 00 00 48 89 df e8 c4 ff 3a 00 85 c0 48 89 c2 78 14 48 8b 3d be 52 32 01 f3 48 0f b8 c7 39 c2 0f 82 7e 01 00 00 <0f> 0b ba ff ff ff ff 89 d7 48 89 de e8 68 87 7d 00 3b 05 66 54 32
> Aug  5 21:03:01 c-node11 kernel: [17147.089275] RSP: 0018:ffff8c536bcafa38 EFLAGS: 00010046
> Aug  5 21:03:01 c-node11 kernel: [17147.089275] RAX: 0000000000000040 RBX: ffff8c339542ea00 RCX: ffffffffffffffff
> Aug  5 21:03:01 c-node11 kernel: [17147.089275] RDX: 0000000000000040 RSI: ffffffffffffffff RDI: ffffffffffffffff
> Aug  5 21:03:01 c-node11 kernel: [17147.089275] RBP: ffff8c339878b000 R08: ffffffffffffffff R09: ffffe93ecbcaa0e8
> Aug  5 21:03:01 c-node11 kernel: [17147.089275] R10: 00000000020e0000 R11: 0000000000000000 R12: ffff8c536bcafa88
> Aug  5 21:03:01 c-node11 kernel: [17147.089275] R13: ffffe93efe1ef980 R14: ffff8c339542e600 R15: 00007ffcbc390000
> Aug  5 21:03:01 c-node11 kernel: [17147.089275] FS:  00007fcb8eae37a0(0000) GS:ffff8c339f7c0000(0000) knlGS:0000000000000000
> Aug  5 21:03:01 c-node11 kernel: [17147.089275] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Aug  5 21:03:01 c-node11 kernel: [17147.089275] CR2: 000000000135d1d8 CR3: 0000004037137005 CR4: 00000000003606e0
> Aug  5 21:03:01 c-node11 kernel: [17147.089275] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Aug  5 21:03:01 c-node11 kernel: [17147.089275] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Aug  5 21:03:01 c-node11 kernel: [17147.089275] Call Trace:
> Aug  5 21:03:01 c-node11 kernel: [17147.089275]  flush_tlb_mm_range+0xc3/0x120
> Aug  5 21:03:01 c-node11 kernel: [17147.089275]  ptep_clear_flush+0x3a/0x40
> Aug  5 21:03:01 c-node11 kernel: [17147.089275]  wp_page_copy+0x2e6/0x8f0
> Aug  5 21:03:01 c-node11 kernel: [17147.089275]  ? reuse_swap_page+0x13d/0x390
> Aug  5 21:03:01 c-node11 kernel: [17147.089275]  do_wp_page+0x99/0x4c0
> Aug  5 21:03:01 c-node11 kernel: [17147.089275]  __handle_mm_fault+0xb4e/0x12c0
> Aug  5 21:03:01 c-node11 kernel: [17147.089275]  ? memcg_kmem_get_cache+0x76/0x1a0
> Aug  5 21:03:01 c-node11 kernel: [17147.089275]  handle_mm_fault+0xd6/0x200
> Aug  5 21:03:01 c-node11 kernel: [17147.089275]  __get_user_pages+0x29e/0x780
> Aug  5 21:03:01 c-node11 kernel: [17147.089275]  get_user_pages_remote+0x12c/0x1b0

(FYI -- email to the Linux kernel mailing lists should be in plaintext format, and not use HTML or other formatting.)

This is an excellent experiment. It certainly suggests that the cpumask passed to hyperv_flush_tlb_others() has bits set for CPUs above 64 that don't exist. If that's the case, it would seem to be a general kernel issue rather than something specific to Hyper-V.

Since it looks like you are able to add debugging code to the kernel, here are a couple of thoughts:

1) In hyperv_flush_tlb_others(), after the call to disable interrupts, check the value of cpumask_last(cpus), and if it is greater than num_possible_cpus(), execute a printk() statement that outputs the entire contents of the cpumask that was passed in. There's a special printk format string for printing out bitmaps like cpumasks. Let me know if you would like some help on this code -- I can provide a diff later today. Seeing what the "bad" cpumask looks like might give some clues as to the problem.

2) As a different experiment, you can disable the Hyper-V specific flush routines entirely. At the end of the mmu.c source file, have hyperv_setup_mmu_ops() always return immediately. In this case, the generic Linux kernel flush routines will be used instead of the Hyper-V ones. The code may be marginally slower, but it will then be interesting to see whether the problem shows up elsewhere. Based on your experiment, I'm guessing that there's a general kernel issue rather than something specific to Hyper-V.

Have you run 4.19 kernels prior to 4.19.195 that didn't have this problem? If you have a kernel version that is good, the ultimate step would be to do a bisect and find out where the problem was introduced in the 4.19 series. That could take a while, but it would almost certainly identify the problematic code change and would be beneficial to the Linux kernel community in general.

Michael

^ permalink raw reply	[flat|nested] 21+ messages in thread
* RE: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others()
  2021-08-06 10:43 ` [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() Michael Kelley
@ 2021-08-06 17:35   ` David Mozes
       [not found]     ` <CAHkVu0-ZCXDRZL92d_G3oKpPuKvmY=YEbu9nbx9vkZHnhHFD8Q@mail.gmail.com>
  0 siblings, 1 reply; 21+ messages in thread

From: David Mozes @ 2021-08-06 17:35 UTC (permalink / raw)
  To: Michael Kelley, David Moses
  Cc: linux-hyperv, linux-kernel, David Mozes, תומר אבוטבול

Yes, please provide the diff.

Unfortunately we saw the problem on every 4.19.x version we tested, starting from 4.19.149, and we saw the issue in 5.4.80 as well. I believe you are right that it is a general kernel issue and not Hyper-V specific.

Note that the code I added eliminates the panic we got, but the kernel "doesn't like it". Any suggestions for how we can let the kernel continue working while we do our experiment?

Thx
David

-----Original Message-----
From: Michael Kelley <mikelley@microsoft.com>
Sent: Friday, August 6, 2021 1:43 PM
To: David Moses <mosesster@gmail.com>
Cc: David Mozes <david.mozes@silk.us>; linux-hyperv@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: RE: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others()

^ permalink raw reply	[flat|nested] 21+ messages in thread
[parent not found: <CAHkVu0-ZCXDRZL92d_G3oKpPuKvmY=YEbu9nbx9vkZHnhHFD8Q@mail.gmail.com>]
* RE: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others()
       [not found] ` <CAHkVu0-ZCXDRZL92d_G3oKpPuKvmY=YEbu9nbx9vkZHnhHFD8Q@mail.gmail.com>
@ 2021-08-06 21:51   ` Michael Kelley
  2021-08-07  5:00     ` David Moses
  0 siblings, 1 reply; 21+ messages in thread

From: Michael Kelley @ 2021-08-06 21:51 UTC (permalink / raw)
  To: תומר אבוטבול, David Mozes
  Cc: David Moses, linux-hyperv, linux-kernel

From: תומר אבוטבול <tomer432100@gmail.com> Sent: Friday, August 6, 2021 11:03 AM

> Attaching the patches Michael asked for debugging.
>
> 1) Print the cpumask when cpumask_last(cpus) > num_possible_cpus():
>
> diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
> index e666f7eaf32d..620f656d6195 100644
> --- a/arch/x86/hyperv/mmu.c
> +++ b/arch/x86/hyperv/mmu.c
> @@ -60,6 +60,7 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus,
>  	struct hv_tlb_flush *flush;
>  	u64 status = U64_MAX;
>  	unsigned long flags;
> +	unsigned int cpu_last;
>
>  	trace_hyperv_mmu_flush_tlb_others(cpus, info);
>
> @@ -68,6 +69,11 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus,
>
>  	local_irq_save(flags);
>
> +	cpu_last = cpumask_last(cpus);
> +	if (cpu_last > num_possible_cpus()) {

I think this should be ">=" since CPUs are numbered starting at zero. In your VM with 64 CPUs, having CPU #64 in the list would be an error.

> +		pr_emerg("ERROR_HYPERV: cpu_last=%*pbl", cpumask_pr_args(cpus));
> +	}
> +
>  	/*
>  	 * Only check the mask _after_ interrupt has been disabled to avoid the
>  	 * mask changing under our feet.

> 2) Disable the Hyper-V specific flush routines:
>
> diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
> index e666f7eaf32d..8e77cc84775a 100644
> --- a/arch/x86/hyperv/mmu.c
> +++ b/arch/x86/hyperv/mmu.c
> @@ -235,6 +235,7 @@ static u64 hyperv_flush_tlb_others_ex(const struct cpumask *cpus,
>
>  void hyperv_setup_mmu_ops(void)
>  {
> +	return;
>  	if (!(ms_hyperv.hints & HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED))
>  		return;

Otherwise, this code looks good to me and matches what I had in mind.

Note that the function native_flush_tlb_others() is used when the Hyper-V specific flush function is disabled per patch #2 above, or when hv_cpu_number_to_vp_number() returns VP_INVAL. In a quick glance through the code, it appears that native_flush_tlb_others() will work even if there's a non-existent CPU in the cpumask that is passed as an argument. So perhaps an immediate workaround is patch #2 above.

Perhaps hyperv_flush_tlb_others() should be made equally tolerant of a non-existent CPU being in the list. But if you are willing, I'm still interested in the results of an experiment with just patch #1. I'm curious about what the CPU list looks like when it has a non-existent CPU. Is it complete garbage, or is there just one non-existent CPU?

The other curiosity is that I haven't seen this Linux panic reported by other users, and I think it would have come to our attention if it were happening with any frequency. You see the problem fairly regularly, so I'm wondering what the difference is.

Michael

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others()
  2021-08-06 21:51   ` Michael Kelley
@ 2021-08-07  5:00     ` David Moses
  2021-08-17  9:16       ` David Mozes
  0 siblings, 1 reply; 21+ messages in thread

From: David Moses @ 2021-08-07 5:00 UTC (permalink / raw)
  To: Michael Kelley
  Cc: תומר אבוטבול, David Mozes, linux-hyperv, linux-kernel

Sent from my iPhone

> On Aug 7, 2021, at 12:51 AM, Michael Kelley <mikelley@microsoft.com> wrote:
>
> From: תומר אבוטבול <tomer432100@gmail.com> Sent: Friday, August 6, 2021 11:03 AM
>
>> Attaching the patches Michael asked for debugging.
>
> I think this should be ">=" since CPUs are numbered starting at zero.
> In your VM with 64 CPUs, having CPU #64 in the list would be an error.
>
> Otherwise, this code looks good to me and matches what I had in mind.
>
> Note that the function native_flush_tlb_others() is used when the Hyper-V specific
> flush function is disabled per patch #2 above, or when hv_cpu_number_to_vp_number()
> returns VP_INVAL. In a quick glance through the code, it appears that
> native_flush_tlb_others() will work even if there's a non-existent CPU in the
> cpumask that is passed as an argument. So perhaps an immediate workaround is
> patch #2 above.

The current code of hv_cpu_number_to_vp_number() (where I generated the warning) is returning VP_INVAL in this case (see previous mail), and it looks like that does not completely work around the issue: the CPU hangs even without a panic. Will continue watching.

> Perhaps hyperv_flush_tlb_others() should be made equally tolerant of a non-existent
> CPU being in the list. But if you are willing, I'm still interested in the results of an
> experiment with just patch #1. I'm curious about what the CPU list looks like when
> it has a non-existent CPU. Is it complete garbage, or is there just one non-existent
> CPU?

We will do it, maybe not next week because of vacation, but the week after.

> The other curiosity is that I haven't seen this Linux panic reported by other users,
> and I think it would have come to our attention if it were happening with any frequency.
> You see the problem fairly regularly. So I'm wondering what the difference is.
>
> Michael

^ permalink raw reply	[flat|nested] 21+ messages in thread
* RE: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others()
  2021-08-07  5:00     ` David Moses
@ 2021-08-17  9:16       ` David Mozes
  2021-08-17 11:29         ` Wei Liu
  0 siblings, 1 reply; 21+ messages in thread

From: David Mozes @ 2021-08-17 9:16 UTC (permalink / raw)
  To: David Moses, Michael Kelley
  Cc: תומר אבוטבול, linux-hyperv, linux-kernel

Hi Michael and all.
I am back from the holiday and did your suggestions/requests.

1. While running with patch number 2 (disable the Hyper-V specific flush routines), as you suspected, we got a panic similar to what we got with the Hyper-V specific flush routines. Below is the trace we got:

[32097.577728] kernel BUG at kernel/sched/rt.c:1004!
[32097.577738] invalid opcode: 0000 [#1] SMP
[32097.578711] CPU: 45 PID: 51244 Comm: STAR4BLKS0_WORK Kdump: loaded Tainted: G OE 4.19.195-KM9 #1
[32097.578711] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018
[32097.578711] RIP: 0010:dequeue_top_rt_rq+0x88/0xa0
[32097.578711] Code: 00 48 89 d5 48 0f a3 15 6e 19 82 01 73 d0 48 89 c7 e8 bc b7 fe ff be 02 00 00 00 89 ef 84 c0 74 0b e8 2c 94 04 00 eb b6 0f 0b <0f> 0b e8 b1 93 04 00 eb ab 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00
[32097.578711] RSP: 0018:ffff9442e0de7b48 EFLAGS: 00010046
[32097.578711] RAX: ffff94809f9e1e00 RBX: ffff9448295e4c40 RCX: 00000000ffffffff
[32097.578711] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff94809f9e2040
[32097.578711] RBP: ffff94809f9e1e00 R08: fffffffffff0be25 R09: 00000000000216c0
[32097.578711] R10: 00004bbc85e1eff3 R11: 0000000000000000 R12: 0000000000000000
[32097.578711] R13: ffff9448295e4a20 R14: 0000000000021e00 R15: ffff94809fa21e00
[32097.578711] FS:  00007f7b0cea0700(0000) GS:ffff94809f940000(0000) knlGS:0000000000000000
[32097.578711] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[32097.578711] CR2: ffffffffff600400 CR3: 000000201d5b3002 CR4: 00000000003606e0
[32097.578711] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[32097.578711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[32097.578711] Call Trace:
[32097.578711]  dequeue_rt_stack+0x3e/0x280
[32097.578711]  dequeue_rt_entity+0x1f/0x70
[32097.578711]  dequeue_task_rt+0x26/0x70
[32097.578711]  push_rt_task+0x1e2/0x220
[32097.578711]  push_rt_tasks+0x11/0x20
[32097.578711]  __balance_callback+0x3b/0x60
[32097.578711]  __schedule+0x6e9/0x830
[32097.578711]  schedule+0x28/0x80
[32097.578711]  futex_wait_queue_me+0xb9/0x120
[32097.578711]  futex_wait+0x139/0x250
[32097.578711]  ? try_to_wake_up+0x54/0x460
[32097.578711]  ? enqueue_task_rt+0x9f/0xc0
[32097.578711]  ? get_futex_key+0x2ee/0x450
[32097.578711]  do_futex+0x2eb/0x9f0
[32097.578711]  __x64_sys_futex+0x143/0x180
[32097.578711]  do_syscall_64+0x59/0x1b0
[32097.578711]  ? prepare_exit_to_usermode+0x70/0x90
[32097.578711]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[32097.578711] RIP: 0033:0x7fa2ae151334
[32097.578711] Code: 66 0f 1f 44 00 00 41 52 52 4d 31 d2 ba 02 00 00 00 81 f6 80 00 00 00 64 23 34 25 48 00 00 00 39 d0 75 07 b8 ca 00 00 00 0f 05 <89> d0 87 07 85 c0 75 f1 5a 41 5a c3 83 3d f1 df 20 00 00 74 59 48
[32097.578711] RSP: 002b:00007f7b0ce9f3b0 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[32097.578711] RAX: ffffffffffffffda RBX: 00007f7c1da5bc18 RCX: 00007fa2ae151334
[32097.578711] RDX: 0000000000000002 RSI: 0000000000000080 RDI: 00007f7c1da5bc58
[32097.578711] RBP: 00007f7b0ce9f5b0 R08: 00007f7c1da5bc58 R09: 000000000000c82c
[32097.578711] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f7b1a149cf0
[32097.578711] R13: 00007f7c1da5bc58 R14: 0000000000000001 R15: 00000000000005a1

2. As you requested, and to help the community, we ran patch no. 1 as well.

And that is what we got:

Aug 17 05:36:22 10.230.247.7 [40544.392690] Hyper-V: ERROR_HYPERV: cpu_last=

It looks like we got an empty cpumask!

Would you please let us know what further info you need, and what the next step is for debugging this interesting issue.

Thx
David

-----Original Message-----
From: David Moses <mosesster@gmail.com>
Sent: Saturday, August 7, 2021 8:00 AM
To: Michael Kelley <mikelley@microsoft.com>
Cc: תומר אבוטבול <tomer432100@gmail.com>; David Mozes <david.mozes@silk.us>; linux-hyperv@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others()

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() 2021-08-17 9:16 ` David Mozes @ 2021-08-17 11:29 ` Wei Liu 2021-08-19 11:05 ` David Mozes [not found] ` <CA+qYZY1U04SkyHo7X+rDeE=nUy_X5nxLfShyuLJFzXnFp2A6uw@mail.gmail.com> 0 siblings, 2 replies; 21+ messages in thread From: Wei Liu @ 2021-08-17 11:29 UTC (permalink / raw) To: David Mozes Cc: David Moses, Michael Kelley, תומר אבוטבול, linux-hyperv, linux-kernel, Wei Liu Please use the "reply all" button in your mail client and avoid top-posting. It is very difficult for me to decipher this thread... On Tue, Aug 17, 2021 at 09:16:45AM +0000, David Mozes wrote: > Hi Michael and all . > I am back from the Holiday and did your saggestiones /requstes > > 1. While running with patch number-2 (disable the Hyper-V specific flush routines) > As you suspected, we got panic similar to what we got with the Hyper-V specific flash routines. > Below is the trace we got: > > [32097.577728] kernel BUG at kernel/sched/rt.c:1004! > [32097.577738] invalid opcode: 0000 [#1] SMP > [32097.578711] CPU: 45 PID: 51244 Comm: STAR4BLKS0_WORK Kdump: loaded Tainted: G OE 4.19.195-KM9 #1 It seems that you have out of tree module(s) loaded. Please make sure they don't do anything unusual. 
> [32097.578711] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018 > [32097.578711] RIP: 0010:dequeue_top_rt_rq+0x88/0xa0 > [32097.578711] Code: 00 48 89 d5 48 0f a3 15 6e 19 82 01 73 d0 48 89 c7 e8 bc b7 fe ff be 02 00 00 00 89 ef 84 c0 74 0b e8 2c 94 04 00 eb b6 0f 0b <0f> 0b e8 b1 93 04 00 eb ab 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 > [32097.578711] RSP: 0018:ffff9442e0de7b48 EFLAGS: 00010046 > [32097.578711] RAX: ffff94809f9e1e00 RBX: ffff9448295e4c40 RCX: 00000000ffffffff > [32097.578711] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff94809f9e2040 > [32097.578711] RBP: ffff94809f9e1e00 R08: fffffffffff0be25 R09: 00000000000216c0 > [32097.578711] R10: 00004bbc85e1eff3 R11: 0000000000000000 R12: 0000000000000000 > [32097.578711] R13: ffff9448295e4a20 R14: 0000000000021e00 R15: ffff94809fa21e00 > [32097.578711] FS: 00007f7b0cea0700(0000) GS:ffff94809f940000(0000) knlGS:0000000000000000 > [32097.578711] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [32097.578711] CR2: ffffffffff600400 CR3: 000000201d5b3002 CR4: 00000000003606e0 > [32097.578711] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [32097.578711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > [32097.578711] Call Trace: > [32097.578711] dequeue_rt_stack+0x3e/0x280 > [32097.578711] dequeue_rt_entity+0x1f/0x70 > [32097.578711] dequeue_task_rt+0x26/0x70 > [32097.578711] push_rt_task+0x1e2/0x220 > [32097.578711] push_rt_tasks+0x11/0x20 > [32097.578711] __balance_callback+0x3b/0x60 > [32097.578711] __schedule+0x6e9/0x830 > [32097.578711] schedule+0x28/0x80 It looks like the scheduler is in an irrecoverable state. The stack trace does not show anything related to TLB flush, so it is unclear to me this has anything to do with the original report. Have you tried running the same setup on baremetal? > [32097.578711] futex_wait_queue_me+0xb9/0x120 > [32097.578711] futex_wait+0x139/0x250 > [32097.578711] ? 
try_to_wake_up+0x54/0x460 > [32097.578711] ? enqueue_task_rt+0x9f/0xc0 > [32097.578711] ? get_futex_key+0x2ee/0x450 > [32097.578711] do_futex+0x2eb/0x9f0 > [32097.578711] __x64_sys_futex+0x143/0x180 > [32097.578711] do_syscall_64+0x59/0x1b0 > [32097.578711] ? prepare_exit_to_usermode+0x70/0x90 > [32097.578711] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [32097.578711] RIP: 0033:0x7fa2ae151334 > [32097.578711] Code: 66 0f 1f 44 00 00 41 52 52 4d 31 d2 ba 02 00 00 00 81 f6 80 00 00 00 64 23 34 25 48 00 00 00 39 d0 75 07 b8 ca 00 00 00 0f 05 <89> d0 87 07 85 c0 75 f1 5a 41 5a c3 83 3d f1 df 20 00 00 74 59 48 > [32097.578711] RSP: 002b:00007f7b0ce9f3b0 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca > [32097.578711] RAX: ffffffffffffffda RBX: 00007f7c1da5bc18 RCX: 00007fa2ae151334 > [32097.578711] RDX: 0000000000000002 RSI: 0000000000000080 RDI: 00007f7c1da5bc58 > [32097.578711] RBP: 00007f7b0ce9f5b0 R08: 00007f7c1da5bc58 R09: 000000000000c82c > [32097.578711] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f7b1a149cf0 > [32097.578711] R13: 00007f7c1da5bc58 R14: 0000000000000001 R15: 00000000000005a1 > > > 2. as you requested and to help to the community we running patch no 1 as well : > > And that is what we got: > > Aug 17 05:36:22 10.230.247.7 [40544.392690] Hyper-V: ERROR_HYPERV: cpu_last= > > It looks like we got an empty cpumask ! Assuming this is from the patch below, the code already handles empty cpumask a few lines later. You should perhaps move your change after that to right before cpus is actually used. Wei. 
> > Would you please let us know what further info you need and what is the next step for debugging this interesting issue > > Thx > David > > > > > >> 1) Print the cpumask when < num_possible_cpus(): > >> diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c > >> index e666f7eaf32d..620f656d6195 100644 > >> --- a/arch/x86/hyperv/mmu.c > >> +++ b/arch/x86/hyperv/mmu.c > >> @@ -60,6 +60,7 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus, > >> struct hv_tlb_flush *flush; > >> u64 status = U64_MAX; > >> unsigned long flags; > >> + unsigned int cpu_last; > >> > >> trace_hyperv_mmu_flush_tlb_others(cpus, info); > >> > >> @@ -68,6 +69,11 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus, > >> > >> local_irq_save(flags); > >> > >> + cpu_last = cpumask_last(cpus); > >> + if (cpu_last > num_possible_cpus()) { > > > > I think this should be ">=" since cpus are numbered starting at zero. > > In your VM with 64 CPUs, having CPU #64 in the list would be an error. > > > >> + pr_emerg("ERROR_HYPERV: cpu_last=%*pbl", cpumask_pr_args(cpus)); > >> + } > >> + > >> /* > >> * Only check the mask _after_ interrupt has been disabled to avoid the > >> * mask changing under our feet. > >> ^ permalink raw reply [flat|nested] 21+ messages in thread
* RE: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() 2021-08-17 11:29 ` Wei Liu @ 2021-08-19 11:05 ` David Mozes [not found] ` <CA+qYZY1U04SkyHo7X+rDeE=nUy_X5nxLfShyuLJFzXnFp2A6uw@mail.gmail.com> 1 sibling, 0 replies; 21+ messages in thread From: David Mozes @ 2021-08-19 11:05 UTC (permalink / raw) To: Wei Liu Cc: David Moses, Michael Kelley, תומר אבוטבול, linux-hyperv, linux-kernel Hi Wei, Per your request I moved the cpumask print to two other places, after the handling of the empty mask; see below. And I got the following: Aug 19 02:01:51 c-node05 kernel: [25936.562674] Hyper-V: ERROR_HYPERV2: cpu_last= Aug 19 02:01:51 c-node05 kernel: [25936.562686] WARNING: CPU: 11 PID: 56432 at arch/x86/include/asm/mshyperv.h:301 hyperv_flush_tlb_others+0x23f/0x7b0 So we got an empty cpumask at a different place in the code. Let me know if you need further information from us. How do you suggest we handle this situation? Thx David The new print-cpumask patch: diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c index a72fa69..31f0683 100644 --- a/arch/x86/hyperv/mmu.c +++ b/arch/x86/hyperv/mmu.c @@ -70,9 +70,7 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus, local_irq_save(flags); cpu_last = cpumask_last(cpus); - if (cpu_last >= num_possible_cpus()) { - pr_emerg("ERROR_HYPERV: cpu_last=%*pbl", cpumask_pr_args(cpus)); - } + /* * Only check the mask _after_ interrupt has been disabled to avoid the @@ -83,6 +81,12 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus, return; } + + if (cpu_last >= num_possible_cpus()) { + pr_emerg("ERROR_HYPERV1: cpu_last=%*pbl", cpumask_pr_args(cpus)); + } + + flush_pcpu = (struct hv_tlb_flush **) this_cpu_ptr(hyperv_pcpu_input_arg); @@ -121,6 +125,13 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus, * must. We will also check all VP numbers when walking the * supplied CPU set to remain correct in all cases. 
*/ + cpu_last = cpumask_last(cpus); + + if (cpu_last >= num_possible_cpus()) { + pr_emerg("ERROR_HYPERV2: cpu_last=%*pbl", cpumask_pr_args(cpus)); + } + -----Original Message----- From: Wei Liu <wei.liu@kernel.org> Sent: Tuesday, August 17, 2021 2:30 PM To: David Mozes <david.mozes@silk.us> Cc: David Moses <mosesster@gmail.com>; Michael Kelley <mikelley@microsoft.com>; תומר אבוטבול <tomer432100@gmail.com>; linux-hyperv@vger.kernel.org; linux-kernel@vger.kernel.org; Wei Liu <wei.liu@kernel.org> Subject: Re: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() Please use the "reply all" button in your mail client and avoid top-posting. It is very difficult for me to decipher this thread... On Tue, Aug 17, 2021 at 09:16:45AM +0000, David Mozes wrote: > Hi Michael and all. > I am back from the holiday and did your suggestions/requests > > 1. While running with patch number-2 (disable the Hyper-V specific flush routines) > As you suspected, we got a panic similar to what we got with the Hyper-V specific flush routines. > Below is the trace we got: > > [32097.577728] kernel BUG at kernel/sched/rt.c:1004! > [32097.577738] invalid opcode: 0000 [#1] SMP > [32097.578711] CPU: 45 PID: 51244 Comm: STAR4BLKS0_WORK Kdump: loaded Tainted: G OE 4.19.195-KM9 #1 It seems that you have out-of-tree module(s) loaded. Please make sure they don't do anything unusual. 
> [32097.578711] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018 > [32097.578711] RIP: 0010:dequeue_top_rt_rq+0x88/0xa0 > [32097.578711] Code: 00 48 89 d5 48 0f a3 15 6e 19 82 01 73 d0 48 89 c7 e8 bc b7 fe ff be 02 00 00 00 89 ef 84 c0 74 0b e8 2c 94 04 00 eb b6 0f 0b <0f> 0b e8 b1 93 04 00 eb ab 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 > [32097.578711] RSP: 0018:ffff9442e0de7b48 EFLAGS: 00010046 > [32097.578711] RAX: ffff94809f9e1e00 RBX: ffff9448295e4c40 RCX: 00000000ffffffff > [32097.578711] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff94809f9e2040 > [32097.578711] RBP: ffff94809f9e1e00 R08: fffffffffff0be25 R09: 00000000000216c0 > [32097.578711] R10: 00004bbc85e1eff3 R11: 0000000000000000 R12: 0000000000000000 > [32097.578711] R13: ffff9448295e4a20 R14: 0000000000021e00 R15: ffff94809fa21e00 > [32097.578711] FS: 00007f7b0cea0700(0000) GS:ffff94809f940000(0000) knlGS:0000000000000000 > [32097.578711] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [32097.578711] CR2: ffffffffff600400 CR3: 000000201d5b3002 CR4: 00000000003606e0 > [32097.578711] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [32097.578711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > [32097.578711] Call Trace: > [32097.578711] dequeue_rt_stack+0x3e/0x280 > [32097.578711] dequeue_rt_entity+0x1f/0x70 > [32097.578711] dequeue_task_rt+0x26/0x70 > [32097.578711] push_rt_task+0x1e2/0x220 > [32097.578711] push_rt_tasks+0x11/0x20 > [32097.578711] __balance_callback+0x3b/0x60 > [32097.578711] __schedule+0x6e9/0x830 > [32097.578711] schedule+0x28/0x80 It looks like the scheduler is in an irrecoverable state. The stack trace does not show anything related to TLB flush, so it is unclear to me this has anything to do with the original report. Have you tried running the same setup on baremetal? > [32097.578711] futex_wait_queue_me+0xb9/0x120 > [32097.578711] futex_wait+0x139/0x250 > [32097.578711] ? 
try_to_wake_up+0x54/0x460 > [32097.578711] ? enqueue_task_rt+0x9f/0xc0 > [32097.578711] ? get_futex_key+0x2ee/0x450 > [32097.578711] do_futex+0x2eb/0x9f0 > [32097.578711] __x64_sys_futex+0x143/0x180 > [32097.578711] do_syscall_64+0x59/0x1b0 > [32097.578711] ? prepare_exit_to_usermode+0x70/0x90 > [32097.578711] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [32097.578711] RIP: 0033:0x7fa2ae151334 > [32097.578711] Code: 66 0f 1f 44 00 00 41 52 52 4d 31 d2 ba 02 00 00 00 81 f6 80 00 00 00 64 23 34 25 48 00 00 00 39 d0 75 07 b8 ca 00 00 00 0f 05 <89> d0 87 07 85 c0 75 f1 5a 41 5a c3 83 3d f1 df 20 00 00 74 59 48 > [32097.578711] RSP: 002b:00007f7b0ce9f3b0 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca > [32097.578711] RAX: ffffffffffffffda RBX: 00007f7c1da5bc18 RCX: 00007fa2ae151334 > [32097.578711] RDX: 0000000000000002 RSI: 0000000000000080 RDI: 00007f7c1da5bc58 > [32097.578711] RBP: 00007f7b0ce9f5b0 R08: 00007f7c1da5bc58 R09: 000000000000c82c > [32097.578711] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f7b1a149cf0 > [32097.578711] R13: 00007f7c1da5bc58 R14: 0000000000000001 R15: 00000000000005a1 > > > 2. as you requested and to help to the community we running patch no 1 as well : > > And that is what we got: > > Aug 17 05:36:22 10.230.247.7 [40544.392690] Hyper-V: ERROR_HYPERV: cpu_last= > > It looks like we got an empty cpumask ! Assuming this is from the patch below, the code already handles empty cpumask a few lines later. You should perhaps move your change after that to right before cpus is actually used. Wei. 
> > Would you please let us know what further info you need and what is the next step for debugging this interesting issue > > Thx > David > > > > > >> 1) Print the cpumask when < num_possible_cpus(): > >> diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c > >> index e666f7eaf32d..620f656d6195 100644 > >> --- a/arch/x86/hyperv/mmu.c > >> +++ b/arch/x86/hyperv/mmu.c > >> @@ -60,6 +60,7 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus, > >> struct hv_tlb_flush *flush; > >> u64 status = U64_MAX; > >> unsigned long flags; > >> + unsigned int cpu_last; > >> > >> trace_hyperv_mmu_flush_tlb_others(cpus, info); > >> > >> @@ -68,6 +69,11 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus, > >> > >> local_irq_save(flags); > >> > >> + cpu_last = cpumask_last(cpus); > >> + if (cpu_last > num_possible_cpus()) { > > > > I think this should be ">=" since cpus are numbered starting at zero. > > In your VM with 64 CPUs, having CPU #64 in the list would be an error. > > > >> + pr_emerg("ERROR_HYPERV: cpu_last=%*pbl", cpumask_pr_args(cpus)); > >> + } > >> + > >> /* > >> * Only check the mask _after_ interrupt has been disabled to avoid the > >> * mask changing under our feet. > >> ^ permalink raw reply related [flat|nested] 21+ messages in thread
[parent not found: <CA+qYZY1U04SkyHo7X+rDeE=nUy_X5nxLfShyuLJFzXnFp2A6uw@mail.gmail.com>]
[parent not found: <VI1PR0401MB24153DEC767B0126B1030E07F1C09@VI1PR0401MB2415.eurprd04.prod.outlook.com>]
* Re: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() [not found] ` <VI1PR0401MB24153DEC767B0126B1030E07F1C09@VI1PR0401MB2415.eurprd04.prod.outlook.com> @ 2021-08-22 15:24 ` Wei Liu 2021-08-22 16:25 ` David Mozes 0 siblings, 1 reply; 21+ messages in thread From: Wei Liu @ 2021-08-22 15:24 UTC (permalink / raw) To: David Mozes Cc: David Moses, Wei Liu, Michael Kelley, תומר אבוטבול, linux-hyperv, linux-kernel On Thu, Aug 19, 2021 at 07:55:06AM +0000, David Mozes wrote: > Hi Wei, > I moved the cpumask print to two other places, after the handling of the empty mask; see below. > And I got the following: > > > Aug 19 02:01:51 c-node05 kernel: [25936.562674] Hyper-V: ERROR_HYPERV2: cpu_last= > Aug 19 02:01:51 c-node05 kernel: [25936.562686] WARNING: CPU: 11 PID: 56432 at arch/x86/include/asm/mshyperv.h:301 hyperv_flush_tlb_others+0x23f/0x7b0 > > So we got an empty cpumask at a different place in the code. > Let me know if you need further information from us. > How do you suggest we handle this situation? > Please find a way to reproduce this issue with upstream kernels. Thanks, Wei. ^ permalink raw reply [flat|nested] 21+ messages in thread
* RE: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() 2021-08-22 15:24 ` Wei Liu @ 2021-08-22 16:25 ` David Mozes 2021-08-22 17:32 ` Wei Liu 0 siblings, 1 reply; 21+ messages in thread From: David Mozes @ 2021-08-22 16:25 UTC (permalink / raw) To: Wei Liu Cc: David Moses, Michael Kelley, תומר אבוטבול, linux-hyperv, linux-kernel This is not feasible, since we need a very high load to reproduce. We have tried a lot but can't achieve the desired load. On our kernel, with less load, it is not reproducible either. -----Original Message----- From: Wei Liu <wei.liu@kernel.org> Sent: Sunday, August 22, 2021 6:25 PM To: David Mozes <david.mozes@silk.us> Cc: David Moses <mosesster@gmail.com>; Wei Liu <wei.liu@kernel.org>; Michael Kelley <mikelley@microsoft.com>; תומר אבוטבול <tomer432100@gmail.com>; linux-hyperv@vger.kernel.org; linux-kernel@vger.kernel.org Subject: Re: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() On Thu, Aug 19, 2021 at 07:55:06AM +0000, David Mozes wrote: > Hi Wei, > I moved the cpumask print to two other places, after the handling of the empty mask; see below. > And I got the following: > > > Aug 19 02:01:51 c-node05 kernel: [25936.562674] Hyper-V: ERROR_HYPERV2: cpu_last= > Aug 19 02:01:51 c-node05 kernel: [25936.562686] WARNING: CPU: 11 PID: 56432 at arch/x86/include/asm/mshyperv.h:301 hyperv_flush_tlb_others+0x23f/0x7b0 > > So we got an empty cpumask at a different place in the code. > Let me know if you need further information from us. > How do you suggest we handle this situation? > Please find a way to reproduce this issue with upstream kernels. Thanks, Wei. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() 2021-08-22 16:25 ` David Mozes @ 2021-08-22 17:32 ` Wei Liu 0 siblings, 0 replies; 21+ messages in thread From: Wei Liu @ 2021-08-22 17:32 UTC (permalink / raw) To: David Mozes Cc: Wei Liu, David Moses, Michael Kelley, תומר אבוטבול, linux-hyperv, linux-kernel On Sun, Aug 22, 2021 at 04:25:19PM +0000, David Mozes wrote: > This is not feasible, since we need a very high load to reproduce. > We have tried a lot but can't achieve the desired load. > On our kernel, with less load, it is not reproducible either. There isn't much upstream can do if there is no way to reproduce the issue with an upstream kernel. You can check all the code paths which may modify the cpumask and analyze them. KCSAN may be useful too, but that's only available in 5.8 and later. Thanks, Wei. > > -----Original Message----- > From: Wei Liu <wei.liu@kernel.org> > Sent: Sunday, August 22, 2021 6:25 PM > To: David Mozes <david.mozes@silk.us> > Cc: David Moses <mosesster@gmail.com>; Wei Liu <wei.liu@kernel.org>; Michael Kelley <mikelley@microsoft.com>; תומר אבוטבול <tomer432100@gmail.com>; linux-hyperv@vger.kernel.org; linux-kernel@vger.kernel.org > Subject: Re: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() > > On Thu, Aug 19, 2021 at 07:55:06AM +0000, David Mozes wrote: > > Hi Wei, > > I moved the cpumask print to two other places, after the handling of the empty mask; see below. > > And I got the following: > > > > > > Aug 19 02:01:51 c-node05 kernel: [25936.562674] Hyper-V: ERROR_HYPERV2: cpu_last= > > Aug 19 02:01:51 c-node05 kernel: [25936.562686] WARNING: CPU: 11 PID: 56432 at arch/x86/include/asm/mshyperv.h:301 hyperv_flush_tlb_others+0x23f/0x7b0 > > > > So we got an empty cpumask at a different place in the code. > > Let me know if you need further information from us. > > How do you suggest we handle this situation? > > > > Please find a way to reproduce this issue with upstream kernels. 
> > Thanks, > Wei. ^ permalink raw reply [flat|nested] 21+ messages in thread
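[Editor's note: for anyone wanting to try Wei's KCSAN suggestion on a 5.8-or-later kernel, the relevant knobs are roughly the following. This is a hedged sketch: exact option names and compiler requirements vary by kernel version, so check Documentation/dev-tools/kcsan.rst in your tree.]

```shell
# .config fragment for the Kernel Concurrency Sanitizer (v5.8+).
# KCSAN needs a compiler with -fsanitize=thread support for the kernel
# (recent Clang, or GCC 11+); your tree's Kconfig enforces the details.
CONFIG_KCSAN=y
# Optional tuning (names as of ~v5.10; verify against your tree):
# CONFIG_KCSAN_EARLY_ENABLE=y          # start watching races at boot
# CONFIG_KCSAN_REPORT_ONCE_IN_MS=3000  # rate-limit duplicate reports
```

Data races on the caller-owned cpumask passed to hyperv_flush_tlb_others() are exactly the kind of unannotated concurrent access KCSAN is designed to flag.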
[parent not found: <VI1PR0401MB24150B31A1D63176BBB788D2F1F19@VI1PR0401MB2415.eurprd04.prod.outlook.com>]
* RE: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() [not found] <VI1PR0401MB24150B31A1D63176BBB788D2F1F19@VI1PR0401MB2415.eurprd04.prod.outlook.com> @ 2021-08-05 18:08 ` Michael Kelley 0 siblings, 0 replies; 21+ messages in thread From: Michael Kelley @ 2021-08-05 18:08 UTC (permalink / raw) To: David Mozes, 20201001013814.2435935-1-sashal, linux-kernel, linux-hyperv From: David Mozes <david.mozes@silk.us> > > Hi, > The problem is happening to me very frequently on kernel 4.19.195 > David -- could you give us a little more context? Were you running earlier 4.19.xxx versions and did not see this problem? There was a timing problem in hyperv_flush_tlb_others() that was fixed in early January 2021. The fix was backported to the 4.19 longterm tree, and should be included in 4.19.195. Outside of that, I'm not aware of a problem in this area. For completeness, what version of Hyper-V are you using? And how many vCPUs in your VM? Michael > > > Aug 4 03:59:01 c-node04 kernel: [36976.388554] BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_others+0xec9/0x1640 > Aug 4 03:59:01 c-node04 kernel: [36976.388556] Read of size 4 at addr ffff889e5e127440 by task ps/52478 > Aug 4 03:59:01 c-node04 kernel: [36976.388556] > Aug 4 03:59:01 c-node04 kernel: [36976.388560] CPU: 4 PID: 52478 Comm: ps Kdump: loaded Tainted: G W OE > 4.19.195-KM9 #1 > Aug 4 03:59:01 c-node04 kernel: [36976.388562] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, > BIOS 090008 12/07/2018 > Aug 4 03:59:01 c-node04 kernel: [36976.388562] Call Trace: > Aug 4 03:59:01 c-node04 kernel: [36976.388569] dump_stack+0x11d/0x1a7 > Aug 4 03:59:01 c-node04 kernel: [36976.388572] ? dump_stack_print_info.cold.0+0x1b/0x1b > Aug 4 03:59:01 c-node04 kernel: [36976.388576] ? percpu_ref_tryget_live+0x2f0/0x2f0 > Aug 4 03:59:01 c-node04 kernel: [36976.388580] ? rb_erase_cached+0xc4c/0x2880 > Aug 4 03:59:01 c-node04 kernel: [36976.388584] ? 
printk+0x9f/0xc5 > Aug 4 03:59:01 c-node04 kernel: [36976.388585] ? snapshot_ioctl.cold.1+0x74/0x74 > Aug 4 03:59:01 c-node04 kernel: [36976.388590] print_address_description+0x65/0x22e > Aug 4 03:59:01 c-node04 kernel: [36976.388592] kasan_report.cold.6+0x243/0x2ff > Aug 4 03:59:01 c-node04 kernel: [36976.388594] ? hyperv_flush_tlb_others+0xec9/0x1640 > Aug 4 03:59:01 c-node04 kernel: [36976.388596] hyperv_flush_tlb_others+0xec9/0x1640 > Aug 4 03:59:01 c-node04 kernel: [36976.388601] ? > trace_event_raw_event_hyperv_nested_flush_guest_mapping+0x1b0/0x1b0 > Aug 4 03:59:01 c-node04 kernel: [36976.388603] ? mem_cgroup_try_charge+0x3cc/0x7d0 > Aug 4 03:59:01 c-node04 kernel: [36976.388608] flush_tlb_mm_range+0x25c/0x370 > Aug 4 03:59:01 c-node04 kernel: [36976.388611] ? native_flush_tlb_others+0x3b0/0x3b0 > Aug 4 03:59:01 c-node04 kernel: [36976.388616] ptep_clear_flush+0x192/0x1d0 > Aug 4 03:59:01 c-node04 kernel: [36976.388618] ? pmd_clear_bad+0x70/0x70 > Aug 4 03:59:01 c-node04 kernel: [36976.388622] wp_page_copy+0x861/0x1a30 > Aug 4 03:59:01 c-node04 kernel: [36976.388624] ? follow_pfn+0x2f0/0x2f0 > Aug 4 03:59:01 c-node04 kernel: [36976.388627] ? active_load_balance_cpu_stop+0x10d0/0x10d0 > Aug 4 03:59:01 c-node04 kernel: [36976.388632] ? get_page_from_freelist+0x330c/0x4660 > Aug 4 03:59:01 c-node04 kernel: [36976.388638] ? activate_page+0x660/0x660 > Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? rb_erase+0x2a40/0x2a40 > Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? wake_up_page_bit+0x4d0/0x4d0 > Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? unwind_next_frame+0x113e/0x1920 > Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? __pte_alloc_kernel+0x350/0x350 > Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? deref_stack_reg+0x130/0x130 > Aug 4 03:59:01 c-node04 kernel: [36976.388639] do_wp_page+0x461/0x1ca0 > Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? deref_stack_reg+0x130/0x130 > Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? 
finish_mkwrite_fault+0x710/0x710 > Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? unwind_next_frame+0x105d/0x1920 > Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 > Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? __pte_alloc_kernel+0x350/0x350 > Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? __zone_watermark_ok+0x33c/0x640 > Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? _raw_spin_lock+0x13/0x30 ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() @ 2020-10-01 1:38 Sasha Levin 2020-10-01 9:40 ` Vitaly Kuznetsov 0 siblings, 1 reply; 21+ messages in thread From: Sasha Levin @ 2020-10-01 1:38 UTC (permalink / raw) To: kys, haiyangz, sthemmin, wei.liu Cc: tglx, mingo, bp, x86, hpa, vkuznets, mikelley, linux-hyperv, linux-kernel, Sasha Levin, stable cpumask can change underneath us, which is generally safe except when we call into hv_cpu_number_to_vp_number(): if cpumask ends up empty we pass num_cpu_possible() into hv_cpu_number_to_vp_number(), causing it to read garbage. As reported by KASAN: [ 83.504763] BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_others (include/asm-generic/mshyperv.h:128 arch/x86/hyperv/mmu.c:112) [ 83.908636] Read of size 4 at addr ffff888267c01370 by task kworker/u8:2/106 [ 84.196669] CPU: 0 PID: 106 Comm: kworker/u8:2 Tainted: G W 5.4.60 #1 [ 84.196669] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018 [ 84.196669] Workqueue: writeback wb_workfn (flush-8:0) [ 84.196669] Call Trace: [ 84.196669] dump_stack (lib/dump_stack.c:120) [ 84.196669] print_address_description.constprop.0 (mm/kasan/report.c:375) [ 84.196669] __kasan_report.cold (mm/kasan/report.c:507) [ 84.196669] kasan_report (arch/x86/include/asm/smap.h:71 mm/kasan/common.c:635) [ 84.196669] hyperv_flush_tlb_others (include/asm-generic/mshyperv.h:128 arch/x86/hyperv/mmu.c:112) [ 84.196669] flush_tlb_mm_range (arch/x86/include/asm/paravirt.h:68 arch/x86/mm/tlb.c:798) [ 84.196669] ptep_clear_flush (arch/x86/include/asm/tlbflush.h:586 mm/pgtable-generic.c:88) Fixes: 0e4c88f37693 ("x86/hyper-v: Use cheaper HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} hypercalls when possible") Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: stable@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org> --- arch/x86/hyperv/mmu.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/x86/hyperv/mmu.c 
b/arch/x86/hyperv/mmu.c index 5208ba49c89a9..b1d6afc5fc4a3 100644 --- a/arch/x86/hyperv/mmu.c +++ b/arch/x86/hyperv/mmu.c @@ -109,7 +109,9 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus, * must. We will also check all VP numbers when walking the * supplied CPU set to remain correct in all cases. */ - if (hv_cpu_number_to_vp_number(cpumask_last(cpus)) >= 64) + int last = cpumask_last(cpus); + + if (last < num_possible_cpus() && hv_cpu_number_to_vp_number(last) >= 64) goto do_ex_hypercall; for_each_cpu(cpu, cpus) { -- 2.25.1 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() 2020-10-01 1:38 Sasha Levin @ 2020-10-01 9:40 ` Vitaly Kuznetsov 2020-10-01 11:53 ` Wei Liu 0 siblings, 1 reply; 21+ messages in thread From: Vitaly Kuznetsov @ 2020-10-01 9:40 UTC (permalink / raw) To: Sasha Levin Cc: tglx, mingo, bp, x86, hpa, mikelley, linux-hyperv, linux-kernel, Sasha Levin, stable, kys, haiyangz, sthemmin, wei.liu Sasha Levin <sashal@kernel.org> writes: > cpumask can change underneath us, which is generally safe except when we > call into hv_cpu_number_to_vp_number(): if cpumask ends up empty we pass > num_cpu_possible() into hv_cpu_number_to_vp_number(), causing it to read > garbage. As reported by KASAN: > > [ 83.504763] BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_others (include/asm-generic/mshyperv.h:128 arch/x86/hyperv/mmu.c:112) > [ 83.908636] Read of size 4 at addr ffff888267c01370 by task kworker/u8:2/106 > [ 84.196669] CPU: 0 PID: 106 Comm: kworker/u8:2 Tainted: G W 5.4.60 #1 > [ 84.196669] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018 > [ 84.196669] Workqueue: writeback wb_workfn (flush-8:0) > [ 84.196669] Call Trace: > [ 84.196669] dump_stack (lib/dump_stack.c:120) > [ 84.196669] print_address_description.constprop.0 (mm/kasan/report.c:375) > [ 84.196669] __kasan_report.cold (mm/kasan/report.c:507) > [ 84.196669] kasan_report (arch/x86/include/asm/smap.h:71 mm/kasan/common.c:635) > [ 84.196669] hyperv_flush_tlb_others (include/asm-generic/mshyperv.h:128 arch/x86/hyperv/mmu.c:112) > [ 84.196669] flush_tlb_mm_range (arch/x86/include/asm/paravirt.h:68 arch/x86/mm/tlb.c:798) > [ 84.196669] ptep_clear_flush (arch/x86/include/asm/tlbflush.h:586 mm/pgtable-generic.c:88) > > Fixes: 0e4c88f37693 ("x86/hyper-v: Use cheaper HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} hypercalls when possible") > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> > Cc: stable@kernel.org > Signed-off-by: Sasha Levin 
<sashal@kernel.org> > --- > arch/x86/hyperv/mmu.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c > index 5208ba49c89a9..b1d6afc5fc4a3 100644 > --- a/arch/x86/hyperv/mmu.c > +++ b/arch/x86/hyperv/mmu.c > @@ -109,7 +109,9 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus, > * must. We will also check all VP numbers when walking the > * supplied CPU set to remain correct in all cases. > */ > - if (hv_cpu_number_to_vp_number(cpumask_last(cpus)) >= 64) > + int last = cpumask_last(cpus); > + > + if (last < num_possible_cpus() && hv_cpu_number_to_vp_number(last) >= 64) > goto do_ex_hypercall; In case 'cpus' can end up being empty (I'm genuinely surprised it can) the check is mandatory indeed. I would, however, just return directly in this case: if (last < num_possible_cpus()) return; if (hv_cpu_number_to_vp_number(last) >= 64) goto do_ex_hypercall; as there's nothing to flush, no need to call into hyperv_flush_tlb_others_ex(). Anyway, the fix seems to be correct, so Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> > > for_each_cpu(cpu, cpus) { -- Vitaly ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() 2020-10-01 9:40 ` Vitaly Kuznetsov @ 2020-10-01 11:53 ` Wei Liu 2020-10-01 13:04 ` Sasha Levin 2020-10-01 13:10 ` Vitaly Kuznetsov 0 siblings, 2 replies; 21+ messages in thread From: Wei Liu @ 2020-10-01 11:53 UTC (permalink / raw) To: Vitaly Kuznetsov Cc: Sasha Levin, tglx, mingo, bp, x86, hpa, mikelley, linux-hyperv, linux-kernel, stable, kys, haiyangz, sthemmin, wei.liu On Thu, Oct 01, 2020 at 11:40:04AM +0200, Vitaly Kuznetsov wrote: > Sasha Levin <sashal@kernel.org> writes: > > > cpumask can change underneath us, which is generally safe except when we > > call into hv_cpu_number_to_vp_number(): if cpumask ends up empty we pass > > num_cpu_possible() into hv_cpu_number_to_vp_number(), causing it to read > > garbage. As reported by KASAN: > > > > [ 83.504763] BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_others (include/asm-generic/mshyperv.h:128 arch/x86/hyperv/mmu.c:112) > > [ 83.908636] Read of size 4 at addr ffff888267c01370 by task kworker/u8:2/106 > > [ 84.196669] CPU: 0 PID: 106 Comm: kworker/u8:2 Tainted: G W 5.4.60 #1 > > [ 84.196669] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018 > > [ 84.196669] Workqueue: writeback wb_workfn (flush-8:0) > > [ 84.196669] Call Trace: > > [ 84.196669] dump_stack (lib/dump_stack.c:120) > > [ 84.196669] print_address_description.constprop.0 (mm/kasan/report.c:375) > > [ 84.196669] __kasan_report.cold (mm/kasan/report.c:507) > > [ 84.196669] kasan_report (arch/x86/include/asm/smap.h:71 mm/kasan/common.c:635) > > [ 84.196669] hyperv_flush_tlb_others (include/asm-generic/mshyperv.h:128 arch/x86/hyperv/mmu.c:112) > > [ 84.196669] flush_tlb_mm_range (arch/x86/include/asm/paravirt.h:68 arch/x86/mm/tlb.c:798) > > [ 84.196669] ptep_clear_flush (arch/x86/include/asm/tlbflush.h:586 mm/pgtable-generic.c:88) > > > > Fixes: 0e4c88f37693 ("x86/hyper-v: Use cheaper 
HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} hypercalls when possible") > > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> > > Cc: stable@kernel.org > > Signed-off-by: Sasha Levin <sashal@kernel.org> > > --- > > arch/x86/hyperv/mmu.c | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c > > index 5208ba49c89a9..b1d6afc5fc4a3 100644 > > --- a/arch/x86/hyperv/mmu.c > > +++ b/arch/x86/hyperv/mmu.c > > @@ -109,7 +109,9 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus, > > * must. We will also check all VP numbers when walking the > > * supplied CPU set to remain correct in all cases. > > */ > > - if (hv_cpu_number_to_vp_number(cpumask_last(cpus)) >= 64) > > + int last = cpumask_last(cpus); > > + > > + if (last < num_possible_cpus() && hv_cpu_number_to_vp_number(last) >= 64) > > goto do_ex_hypercall; > > In case 'cpus' can end up being empty (I'm genuinely surprised it can) > the check is mandatory indeed. I would, however, just return directly in > this case: > > if (last < num_possible_cpus()) > return; I think you want last >= num_possible_cpus() here? A more important question is, if the mask can change willy-nilly, what is stopping it from changing between these checks? I.e. is there still a window that hv_cpu_number_to_vp_number(last) can return garbage? Wei. > > if (hv_cpu_number_to_vp_number(last) >= 64) > goto do_ex_hypercall; > > as there's nothing to flush, no need to call into > hyperv_flush_tlb_others_ex(). > > Anyway, the fix seems to be correct, so > > Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> > > > > > > for_each_cpu(cpu, cpus) { > > -- > Vitaly > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() 2020-10-01 11:53 ` Wei Liu @ 2020-10-01 13:04 ` Sasha Levin 2020-10-03 17:40 ` Michael Kelley 2020-10-01 13:10 ` Vitaly Kuznetsov 1 sibling, 1 reply; 21+ messages in thread From: Sasha Levin @ 2020-10-01 13:04 UTC (permalink / raw) To: Wei Liu Cc: Vitaly Kuznetsov, tglx, mingo, bp, x86, hpa, mikelley, linux-hyperv, linux-kernel, stable, kys, haiyangz, sthemmin On Thu, Oct 01, 2020 at 11:53:59AM +0000, Wei Liu wrote: >On Thu, Oct 01, 2020 at 11:40:04AM +0200, Vitaly Kuznetsov wrote: >> Sasha Levin <sashal@kernel.org> writes: >> >> > cpumask can change underneath us, which is generally safe except when we >> > call into hv_cpu_number_to_vp_number(): if cpumask ends up empty we pass >> > num_cpu_possible() into hv_cpu_number_to_vp_number(), causing it to read >> > garbage. As reported by KASAN: >> > >> > [ 83.504763] BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_others (include/asm-generic/mshyperv.h:128 arch/x86/hyperv/mmu.c:112) >> > [ 83.908636] Read of size 4 at addr ffff888267c01370 by task kworker/u8:2/106 >> > [ 84.196669] CPU: 0 PID: 106 Comm: kworker/u8:2 Tainted: G W 5.4.60 #1 >> > [ 84.196669] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018 >> > [ 84.196669] Workqueue: writeback wb_workfn (flush-8:0) >> > [ 84.196669] Call Trace: >> > [ 84.196669] dump_stack (lib/dump_stack.c:120) >> > [ 84.196669] print_address_description.constprop.0 (mm/kasan/report.c:375) >> > [ 84.196669] __kasan_report.cold (mm/kasan/report.c:507) >> > [ 84.196669] kasan_report (arch/x86/include/asm/smap.h:71 mm/kasan/common.c:635) >> > [ 84.196669] hyperv_flush_tlb_others (include/asm-generic/mshyperv.h:128 arch/x86/hyperv/mmu.c:112) >> > [ 84.196669] flush_tlb_mm_range (arch/x86/include/asm/paravirt.h:68 arch/x86/mm/tlb.c:798) >> > [ 84.196669] ptep_clear_flush (arch/x86/include/asm/tlbflush.h:586 mm/pgtable-generic.c:88) >> > >> > Fixes: 
0e4c88f37693 ("x86/hyper-v: Use cheaper HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} hypercalls when possible") >> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> >> > Cc: stable@kernel.org >> > Signed-off-by: Sasha Levin <sashal@kernel.org> >> > --- >> > arch/x86/hyperv/mmu.c | 4 +++- >> > 1 file changed, 3 insertions(+), 1 deletion(-) >> > >> > diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c >> > index 5208ba49c89a9..b1d6afc5fc4a3 100644 >> > --- a/arch/x86/hyperv/mmu.c >> > +++ b/arch/x86/hyperv/mmu.c >> > @@ -109,7 +109,9 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus, >> > * must. We will also check all VP numbers when walking the >> > * supplied CPU set to remain correct in all cases. >> > */ >> > - if (hv_cpu_number_to_vp_number(cpumask_last(cpus)) >= 64) >> > + int last = cpumask_last(cpus); >> > + >> > + if (last < num_possible_cpus() && hv_cpu_number_to_vp_number(last) >= 64) >> > goto do_ex_hypercall; >> >> In case 'cpus' can end up being empty (I'm genuinely surprised it can) I was just as surprised as you and spent the good part of a day debugging this. However, a: WARN_ON(cpumask_empty(cpus)); triggers at that line of code even though we check for cpumask_empty() at the entry of the function. >> the check is mandatory indeed. I would, however, just return directly in >> this case: Makes sense. >> if (last < num_possible_cpus()) >> return; > >I think you want > > last >= num_possible_cpus() > >here? > >A more important question is, if the mask can change willy-nilly, what >is stopping it from changing between these checks? I.e. is there still a >window that hv_cpu_number_to_vp_number(last) can return garbage? It's not that hv_cpu_number_to_vp_number() returns garbage, the issue is that we feed it garbage. hv_cpu_number_to_vp_number() expects that the input would be in the range of 0 <= X < num_possible_cpus(), and here if 'cpus' was empty we would pass in X==num_possible_cpus() making it read out of bounds. 
Maybe it's worthwhile to add a WARN_ON() into hv_cpu_number_to_vp_number() to assert as well. -- Thanks, Sasha ^ permalink raw reply [flat|nested] 21+ messages in thread
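The bounds check Sasha suggests adding to hv_cpu_number_to_vp_number() can be sketched as a self-contained userspace model. The table size, its contents, and the `_model` names below are illustrative stand-ins, not the real arch/x86 definitions:

```c
/* Userspace sketch of the proposed guard in hv_cpu_number_to_vp_number().
 * NUM_POSSIBLE_CPUS, VP_INVAL, and the table contents are hypothetical
 * stand-ins for the kernel's num_possible_cpus()/hv_vp_index[]. */
#define NUM_POSSIBLE_CPUS 4
#define VP_INVAL 0xffffffffu

static const unsigned int hv_vp_index_model[NUM_POSSIBLE_CPUS] = { 0, 1, 2, 3 };

static unsigned int cpu_number_to_vp_number_model(int cpu_number)
{
	/* cpumask_last() on an empty mask yields num_possible_cpus(),
	 * which is one past the last valid index: reject it here rather
	 * than reading past the end of the table. */
	if (cpu_number < 0 || cpu_number >= NUM_POSSIBLE_CPUS)
		return VP_INVAL;
	return hv_vp_index_model[cpu_number];
}
```

With this guard, the empty-mask case that KASAN flagged returns a sentinel instead of out-of-bounds garbage.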
* RE: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() 2020-10-01 13:04 ` Sasha Levin @ 2020-10-03 17:40 ` Michael Kelley 2020-10-05 14:58 ` Wei Liu 0 siblings, 1 reply; 21+ messages in thread From: Michael Kelley @ 2020-10-03 17:40 UTC (permalink / raw) To: Sasha Levin, Wei Liu Cc: vkuznets, tglx, mingo, bp, x86, hpa, linux-hyperv, linux-kernel, stable, KY Srinivasan, Haiyang Zhang, Stephen Hemminger From: Sasha Levin <sashal@kernel.org> Sent: Thursday, October 1, 2020 6:04 AM > > On Thu, Oct 01, 2020 at 11:53:59AM +0000, Wei Liu wrote: > >On Thu, Oct 01, 2020 at 11:40:04AM +0200, Vitaly Kuznetsov wrote: > >> Sasha Levin <sashal@kernel.org> writes: > >> > >> > cpumask can change underneath us, which is generally safe except when we > >> > call into hv_cpu_number_to_vp_number(): if cpumask ends up empty we pass > >> > num_cpu_possible() into hv_cpu_number_to_vp_number(), causing it to read > >> > garbage. As reported by KASAN: > >> > > >> > [ 83.504763] BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_others > (include/asm-generic/mshyperv.h:128 arch/x86/hyperv/mmu.c:112) > >> > [ 83.908636] Read of size 4 at addr ffff888267c01370 by task kworker/u8:2/106 > >> > [ 84.196669] CPU: 0 PID: 106 Comm: kworker/u8:2 Tainted: G W 5.4.60 #1 > >> > [ 84.196669] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, > BIOS 090008 12/07/2018 > >> > [ 84.196669] Workqueue: writeback wb_workfn (flush-8:0) > >> > [ 84.196669] Call Trace: > >> > [ 84.196669] dump_stack (lib/dump_stack.c:120) > >> > [ 84.196669] print_address_description.constprop.0 (mm/kasan/report.c:375) > >> > [ 84.196669] __kasan_report.cold (mm/kasan/report.c:507) > >> > [ 84.196669] kasan_report (arch/x86/include/asm/smap.h:71 > mm/kasan/common.c:635) > >> > [ 84.196669] hyperv_flush_tlb_others (include/asm-generic/mshyperv.h:128 > arch/x86/hyperv/mmu.c:112) > >> > [ 84.196669] flush_tlb_mm_range (arch/x86/include/asm/paravirt.h:68 > arch/x86/mm/tlb.c:798) > >> 
> [ 84.196669] ptep_clear_flush (arch/x86/include/asm/tlbflush.h:586 mm/pgtable- > generic.c:88) > >> > > >> > Fixes: 0e4c88f37693 ("x86/hyper-v: Use cheaper > HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} hypercalls when possible") > >> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> > >> > Cc: stable@kernel.org > >> > Signed-off-by: Sasha Levin <sashal@kernel.org> > >> > --- > >> > arch/x86/hyperv/mmu.c | 4 +++- > >> > 1 file changed, 3 insertions(+), 1 deletion(-) > >> > > >> > diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c > >> > index 5208ba49c89a9..b1d6afc5fc4a3 100644 > >> > --- a/arch/x86/hyperv/mmu.c > >> > +++ b/arch/x86/hyperv/mmu.c > >> > @@ -109,7 +109,9 @@ static void hyperv_flush_tlb_others(const struct cpumask > *cpus, > >> > * must. We will also check all VP numbers when walking the > >> > * supplied CPU set to remain correct in all cases. > >> > */ > >> > - if (hv_cpu_number_to_vp_number(cpumask_last(cpus)) >= 64) > >> > + int last = cpumask_last(cpus); > >> > + > >> > + if (last < num_possible_cpus() && hv_cpu_number_to_vp_number(last) >= > 64) > >> > goto do_ex_hypercall; > >> > >> In case 'cpus' can end up being empty (I'm genuinely suprised it can) > > I was just as surprised as you and spent the good part of a day > debugging this. However, a: > > WARN_ON(cpumask_empty(cpus)); > > triggers at that line of code even though we check for cpumask_empty() > at the entry of the function. What does the call stack look like when this triggers? I'm curious about the path where the 'cpus' could be changing while the flush call is in progress. I wonder if CPUs could ever be added to the mask? Removing CPUs can be handled with some care because an unnecessary flush doesn't hurt anything. But adding CPUs has serious correctness problems. > > >> the check is mandatory indeed. I would, however, just return directly in > >> this case: > > Makes sense. But need to do a local_irq_restore() before returning. 
> > >> if (last < num_possible_cpus()) > >> return; > > > >I think you want > > > > last >= num_possible_cpus() > > > >here? Yes, but also the && must become || > > > >A more important question is, if the mask can change willy-nilly, what > >is stopping it from changing between these checks? I.e. is there still a > >windows that hv_cpu_number_to_vp_number(last) can return garbage? > > It's not that hv_cpu_number_to_vp_number() returns garbage, the issue is > that we feed it garbage. > > hv_cpu_number_to_vp_number() expects that the input would be in the > range of 0 <= X < num_possible_cpus(), and here if 'cpus' was empty we > would pass in X==num_possible_cpus() making it read out of bound. > > Maybe it's worthwhile to add a WARN_ON() into > hv_cpu_number_to_vp_number() to assert as well. If the input cpumask can be changing, the other risk is the for_each_cpu() loop, which also has a call to hv_cpu_number_to_vp_number(). But looking at the implementation of for_each_cpu(), it will always return an in-bounds value, so everything should be OK. > > -- > Thanks, > Sasha ^ permalink raw reply [flat|nested] 21+ messages in thread
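Michael's point about the for_each_cpu() walk can be illustrated with a small userspace model: an iterator over set bits only ever yields in-range indices, and on an empty mask the loop body simply never runs. The names and bit width here are illustrative, not the kernel's bitmap implementation:

```c
/* Model of a for_each_cpu()-style walk: every index produced is the
 * position of a set bit, hence always < MODEL_NBITS. Illustrative
 * userspace code, not the kernel's find_next_bit(). */
#define MODEL_NBITS 8

/* Returns the next set bit at or after 'start', or MODEL_NBITS if none. */
static int next_set_bit(unsigned int mask, int start)
{
	for (int i = start; i < MODEL_NBITS; i++)
		if (mask & (1u << i))
			return i;
	return MODEL_NBITS;
}

/* Counts iterations of the walk; an empty mask yields zero iterations,
 * so no out-of-range index is ever used inside the loop body. */
static int count_visited(unsigned int mask)
{
	int n = 0;

	for (int i = next_set_bit(mask, 0); i < MODEL_NBITS;
	     i = next_set_bit(mask, i + 1))
		n++;
	return n;
}
```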
* Re: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() 2020-10-03 17:40 ` Michael Kelley @ 2020-10-05 14:58 ` Wei Liu 2021-01-05 16:59 ` Michael Kelley 0 siblings, 1 reply; 21+ messages in thread From: Wei Liu @ 2020-10-05 14:58 UTC (permalink / raw) To: Michael Kelley Cc: Sasha Levin, Wei Liu, vkuznets, tglx, mingo, bp, x86, hpa, linux-hyperv, linux-kernel, stable, KY Srinivasan, Haiyang Zhang, Stephen Hemminger On Sat, Oct 03, 2020 at 05:40:15PM +0000, Michael Kelley wrote: > From: Sasha Levin <sashal@kernel.org> Sent: Thursday, October 1, 2020 6:04 AM > > > > On Thu, Oct 01, 2020 at 11:53:59AM +0000, Wei Liu wrote: > > >On Thu, Oct 01, 2020 at 11:40:04AM +0200, Vitaly Kuznetsov wrote: > > >> Sasha Levin <sashal@kernel.org> writes: > > >> > > >> > cpumask can change underneath us, which is generally safe except when we > > >> > call into hv_cpu_number_to_vp_number(): if cpumask ends up empty we pass > > >> > num_cpu_possible() into hv_cpu_number_to_vp_number(), causing it to read > > >> > garbage. 
As reported by KASAN: > > >> > > > >> > [ 83.504763] BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_others > > (include/asm-generic/mshyperv.h:128 arch/x86/hyperv/mmu.c:112) > > >> > [ 83.908636] Read of size 4 at addr ffff888267c01370 by task kworker/u8:2/106 > > >> > [ 84.196669] CPU: 0 PID: 106 Comm: kworker/u8:2 Tainted: G W 5.4.60 #1 > > >> > [ 84.196669] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, > > BIOS 090008 12/07/2018 > > >> > [ 84.196669] Workqueue: writeback wb_workfn (flush-8:0) > > >> > [ 84.196669] Call Trace: > > >> > [ 84.196669] dump_stack (lib/dump_stack.c:120) > > >> > [ 84.196669] print_address_description.constprop.0 (mm/kasan/report.c:375) > > >> > [ 84.196669] __kasan_report.cold (mm/kasan/report.c:507) > > >> > [ 84.196669] kasan_report (arch/x86/include/asm/smap.h:71 > > mm/kasan/common.c:635) > > >> > [ 84.196669] hyperv_flush_tlb_others (include/asm-generic/mshyperv.h:128 > > arch/x86/hyperv/mmu.c:112) > > >> > [ 84.196669] flush_tlb_mm_range (arch/x86/include/asm/paravirt.h:68 > > arch/x86/mm/tlb.c:798) > > >> > [ 84.196669] ptep_clear_flush (arch/x86/include/asm/tlbflush.h:586 mm/pgtable- > > generic.c:88) > > >> > > > >> > Fixes: 0e4c88f37693 ("x86/hyper-v: Use cheaper > > HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} hypercalls when possible") > > >> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> > > >> > Cc: stable@kernel.org > > >> > Signed-off-by: Sasha Levin <sashal@kernel.org> > > >> > --- > > >> > arch/x86/hyperv/mmu.c | 4 +++- > > >> > 1 file changed, 3 insertions(+), 1 deletion(-) > > >> > > > >> > diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c > > >> > index 5208ba49c89a9..b1d6afc5fc4a3 100644 > > >> > --- a/arch/x86/hyperv/mmu.c > > >> > +++ b/arch/x86/hyperv/mmu.c > > >> > @@ -109,7 +109,9 @@ static void hyperv_flush_tlb_others(const struct cpumask > > *cpus, > > >> > * must. We will also check all VP numbers when walking the > > >> > * supplied CPU set to remain correct in all cases. 
> > >> > */ > > >> > - if (hv_cpu_number_to_vp_number(cpumask_last(cpus)) >= 64) > > >> > + int last = cpumask_last(cpus); > > >> > + > > >> > + if (last < num_possible_cpus() && hv_cpu_number_to_vp_number(last) >= > > 64) > > >> > goto do_ex_hypercall; > > >> > > >> In case 'cpus' can end up being empty (I'm genuinely suprised it can) > > > > I was just as surprised as you and spent the good part of a day > > debugging this. However, a: > > > > WARN_ON(cpumask_empty(cpus)); > > > > triggers at that line of code even though we check for cpumask_empty() > > at the entry of the function. > > What does the call stack look like when this triggers? I'm curious about > the path where the 'cpus' could be changing while the flush call is in > progress. > > I wonder if CPUs could ever be added to the mask? Removing CPUs can > be handled with some care because an unnecessary flush doesn't hurt > anything. But adding CPUs has serious correctness problems. > The cpumask_empty check is done before disabling irq. Is it possible the mask is modified by an interrupt? If there is a reliable way to trigger this bug, we may be able to test the following patch. diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c index 5208ba49c89a..23fa08d24c1a 100644 --- a/arch/x86/hyperv/mmu.c +++ b/arch/x86/hyperv/mmu.c @@ -66,11 +66,13 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus, if (!hv_hypercall_pg) goto do_native; - if (cpumask_empty(cpus)) - return; - local_irq_save(flags); + if (cpumask_empty(cpus)) { + local_irq_restore(flags); + return; + } + flush_pcpu = (struct hv_tlb_flush **) this_cpu_ptr(hyperv_pcpu_input_arg); ^ permalink raw reply related [flat|nested] 21+ messages in thread
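The control flow of the patch above can be modeled in plain C. The irq helpers below are stubs standing in for local_irq_save()/local_irq_restore(); this is a sketch of the reordering being tested, not the kernel function:

```c
/* Model of the reordered entry path: the empty-mask check happens only
 * after interrupts are "disabled", and the early return restores them.
 * The _stub helpers are hypothetical stand-ins for the kernel's
 * local_irq_save()/local_irq_restore(). */
#include <stdbool.h>

static int irq_disable_depth;

static void local_irq_save_stub(void)    { irq_disable_depth++; }
static void local_irq_restore_stub(void) { irq_disable_depth--; }

/* Returns true if a flush hypercall would have been issued. */
static bool flush_tlb_others_model(bool mask_empty)
{
	local_irq_save_stub();

	/* Checked under "irqs disabled": an interrupt can no longer
	 * empty the mask between this check and the hypercall. */
	if (mask_empty) {
		local_irq_restore_stub();  /* unwind before returning */
		return false;
	}

	/* ... build hypercall input and issue the flush here ... */
	local_irq_restore_stub();
	return true;
}
```

Both exit paths leave the irq state balanced, which is the detail Michael flagged earlier ("need to do a local_irq_restore() before returning").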
* RE: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() 2020-10-05 14:58 ` Wei Liu @ 2021-01-05 16:59 ` Michael Kelley 2021-01-05 17:10 ` Wei Liu 2021-01-08 15:22 ` Sasha Levin 0 siblings, 2 replies; 21+ messages in thread From: Michael Kelley @ 2021-01-05 16:59 UTC (permalink / raw) To: Wei Liu Cc: Sasha Levin, vkuznets, tglx, mingo, bp, x86, hpa, linux-hyperv, linux-kernel, stable, KY Srinivasan, Haiyang Zhang, Stephen Hemminger From: Wei Liu <wei.liu@kernel.org> Sent: Monday, October 5, 2020 7:59 AM > > On Sat, Oct 03, 2020 at 05:40:15PM +0000, Michael Kelley wrote: > > From: Sasha Levin <sashal@kernel.org> Sent: Thursday, October 1, 2020 6:04 AM > > > > > > On Thu, Oct 01, 2020 at 11:53:59AM +0000, Wei Liu wrote: > > > >On Thu, Oct 01, 2020 at 11:40:04AM +0200, Vitaly Kuznetsov wrote: > > > >> Sasha Levin <sashal@kernel.org> writes: > > > >> > > > >> > cpumask can change underneath us, which is generally safe except when we > > > >> > call into hv_cpu_number_to_vp_number(): if cpumask ends up empty we pass > > > >> > num_cpu_possible() into hv_cpu_number_to_vp_number(), causing it to read > > > >> > garbage. 
As reported by KASAN: > > > >> > > > > >> > [ 83.504763] BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_others > > > (include/asm-generic/mshyperv.h:128 arch/x86/hyperv/mmu.c:112) > > > >> > [ 83.908636] Read of size 4 at addr ffff888267c01370 by task kworker/u8:2/106 > > > >> > [ 84.196669] CPU: 0 PID: 106 Comm: kworker/u8:2 Tainted: G W 5.4.60 #1 > > > >> > [ 84.196669] Hardware name: Microsoft Corporation Virtual Machine/Virtual > Machine, > > > BIOS 090008 12/07/2018 > > > >> > [ 84.196669] Workqueue: writeback wb_workfn (flush-8:0) > > > >> > [ 84.196669] Call Trace: > > > >> > [ 84.196669] dump_stack (lib/dump_stack.c:120) > > > >> > [ 84.196669] print_address_description.constprop.0 (mm/kasan/report.c:375) > > > >> > [ 84.196669] __kasan_report.cold (mm/kasan/report.c:507) > > > >> > [ 84.196669] kasan_report (arch/x86/include/asm/smap.h:71 > > > mm/kasan/common.c:635) > > > >> > [ 84.196669] hyperv_flush_tlb_others (include/asm-generic/mshyperv.h:128 > > > arch/x86/hyperv/mmu.c:112) > > > >> > [ 84.196669] flush_tlb_mm_range (arch/x86/include/asm/paravirt.h:68 > > > arch/x86/mm/tlb.c:798) > > > >> > [ 84.196669] ptep_clear_flush (arch/x86/include/asm/tlbflush.h:586 mm/pgtable- > > > generic.c:88) > > > >> > > > > >> > Fixes: 0e4c88f37693 ("x86/hyper-v: Use cheaper > > > HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} hypercalls when possible") > > > >> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> > > > >> > Cc: stable@kernel.org > > > >> > Signed-off-by: Sasha Levin <sashal@kernel.org> > > > >> > --- > > > >> > arch/x86/hyperv/mmu.c | 4 +++- > > > >> > 1 file changed, 3 insertions(+), 1 deletion(-) > > > >> > > > > >> > diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c > > > >> > index 5208ba49c89a9..b1d6afc5fc4a3 100644 > > > >> > --- a/arch/x86/hyperv/mmu.c > > > >> > +++ b/arch/x86/hyperv/mmu.c > > > >> > @@ -109,7 +109,9 @@ static void hyperv_flush_tlb_others(const struct cpumask > > > *cpus, > > > >> > * must. 
We will also check all VP numbers when walking the > > > >> > * supplied CPU set to remain correct in all cases. > > > >> > */ > > > >> > - if (hv_cpu_number_to_vp_number(cpumask_last(cpus)) >= 64) > > > >> > + int last = cpumask_last(cpus); > > > >> > + > > > >> > + if (last < num_possible_cpus() && > hv_cpu_number_to_vp_number(last) >= > > > 64) > > > >> > goto do_ex_hypercall; > > > >> > > > >> In case 'cpus' can end up being empty (I'm genuinely suprised it can) > > > > > > I was just as surprised as you and spent the good part of a day > > > debugging this. However, a: > > > > > > WARN_ON(cpumask_empty(cpus)); > > > > > > triggers at that line of code even though we check for cpumask_empty() > > > at the entry of the function. > > > > What does the call stack look like when this triggers? I'm curious about > > the path where the 'cpus' could be changing while the flush call is in > > progress. > > > > I wonder if CPUs could ever be added to the mask? Removing CPUs can > > be handled with some care because an unnecessary flush doesn't hurt > > anything. But adding CPUs has serious correctness problems. > > > > The cpumask_empty check is done before disabling irq. Is it possible > the mask is modified by an interrupt? > > If there is a reliable way to trigger this bug, we may be able to test > the following patch. > > diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c > index 5208ba49c89a..23fa08d24c1a 100644 > --- a/arch/x86/hyperv/mmu.c > +++ b/arch/x86/hyperv/mmu.c > @@ -66,11 +66,13 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus, > if (!hv_hypercall_pg) > goto do_native; > > - if (cpumask_empty(cpus)) > - return; > - > local_irq_save(flags); > > + if (cpumask_empty(cpus)) { > + local_irq_restore(flags); > + return; > + } > + > flush_pcpu = (struct hv_tlb_flush **) > this_cpu_ptr(hyperv_pcpu_input_arg); This thread died out 3 months ago without any patches being taken. 
I recently hit the problem again at random, though not in a reproducible way. I'd like to take Wei Liu's latest proposal to check for an empty cpumask *after* interrupts are disabled. I think this will almost certainly solve the problem, and in a cleaner way than Sasha's proposal. I'd also suggest adding a comment in the code to note the importance of the ordering. Michael ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() 2021-01-05 16:59 ` Michael Kelley @ 2021-01-05 17:10 ` Wei Liu 2021-01-08 15:22 ` Sasha Levin 1 sibling, 0 replies; 21+ messages in thread From: Wei Liu @ 2021-01-05 17:10 UTC (permalink / raw) To: Michael Kelley Cc: Wei Liu, Sasha Levin, vkuznets, tglx, mingo, bp, x86, hpa, linux-hyperv, linux-kernel, stable, KY Srinivasan, Haiyang Zhang, Stephen Hemminger On Tue, Jan 05, 2021 at 04:59:10PM +0000, Michael Kelley wrote: > From: Wei Liu <wei.liu@kernel.org> Sent: Monday, October 5, 2020 7:59 AM > > > > On Sat, Oct 03, 2020 at 05:40:15PM +0000, Michael Kelley wrote: > > > From: Sasha Levin <sashal@kernel.org> Sent: Thursday, October 1, 2020 6:04 AM > > > > > > > > On Thu, Oct 01, 2020 at 11:53:59AM +0000, Wei Liu wrote: > > > > >On Thu, Oct 01, 2020 at 11:40:04AM +0200, Vitaly Kuznetsov wrote: > > > > >> Sasha Levin <sashal@kernel.org> writes: > > > > >> > > > > >> > cpumask can change underneath us, which is generally safe except when we > > > > >> > call into hv_cpu_number_to_vp_number(): if cpumask ends up empty we pass > > > > >> > num_cpu_possible() into hv_cpu_number_to_vp_number(), causing it to read > > > > >> > garbage. 
As reported by KASAN: > > > > >> > > > > > >> > [ 83.504763] BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_others > > > > (include/asm-generic/mshyperv.h:128 arch/x86/hyperv/mmu.c:112) > > > > >> > [ 83.908636] Read of size 4 at addr ffff888267c01370 by task kworker/u8:2/106 > > > > >> > [ 84.196669] CPU: 0 PID: 106 Comm: kworker/u8:2 Tainted: G W 5.4.60 #1 > > > > >> > [ 84.196669] Hardware name: Microsoft Corporation Virtual Machine/Virtual > > Machine, > > > > BIOS 090008 12/07/2018 > > > > >> > [ 84.196669] Workqueue: writeback wb_workfn (flush-8:0) > > > > >> > [ 84.196669] Call Trace: > > > > >> > [ 84.196669] dump_stack (lib/dump_stack.c:120) > > > > >> > [ 84.196669] print_address_description.constprop.0 (mm/kasan/report.c:375) > > > > >> > [ 84.196669] __kasan_report.cold (mm/kasan/report.c:507) > > > > >> > [ 84.196669] kasan_report (arch/x86/include/asm/smap.h:71 > > > > mm/kasan/common.c:635) > > > > >> > [ 84.196669] hyperv_flush_tlb_others (include/asm-generic/mshyperv.h:128 > > > > arch/x86/hyperv/mmu.c:112) > > > > >> > [ 84.196669] flush_tlb_mm_range (arch/x86/include/asm/paravirt.h:68 > > > > arch/x86/mm/tlb.c:798) > > > > >> > [ 84.196669] ptep_clear_flush (arch/x86/include/asm/tlbflush.h:586 mm/pgtable- > > > > generic.c:88) > > > > >> > > > > > >> > Fixes: 0e4c88f37693 ("x86/hyper-v: Use cheaper > > > > HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} hypercalls when possible") > > > > >> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> > > > > >> > Cc: stable@kernel.org > > > > >> > Signed-off-by: Sasha Levin <sashal@kernel.org> > > > > >> > --- > > > > >> > arch/x86/hyperv/mmu.c | 4 +++- > > > > >> > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > >> > > > > > >> > diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c > > > > >> > index 5208ba49c89a9..b1d6afc5fc4a3 100644 > > > > >> > --- a/arch/x86/hyperv/mmu.c > > > > >> > +++ b/arch/x86/hyperv/mmu.c > > > > >> > @@ -109,7 +109,9 @@ static void hyperv_flush_tlb_others(const struct 
cpumask > > > > *cpus, > > > > >> > * must. We will also check all VP numbers when walking the > > > > >> > * supplied CPU set to remain correct in all cases. > > > > >> > */ > > > > >> > - if (hv_cpu_number_to_vp_number(cpumask_last(cpus)) >= 64) > > > > >> > + int last = cpumask_last(cpus); > > > > >> > + > > > > >> > + if (last < num_possible_cpus() && > > hv_cpu_number_to_vp_number(last) >= > > > > 64) > > > > >> > goto do_ex_hypercall; > > > > >> > > > > >> In case 'cpus' can end up being empty (I'm genuinely suprised it can) > > > > > > > > I was just as surprised as you and spent the good part of a day > > > > debugging this. However, a: > > > > > > > > WARN_ON(cpumask_empty(cpus)); > > > > > > > > triggers at that line of code even though we check for cpumask_empty() > > > > at the entry of the function. > > > > > > What does the call stack look like when this triggers? I'm curious about > > > the path where the 'cpus' could be changing while the flush call is in > > > progress. > > > > > > I wonder if CPUs could ever be added to the mask? Removing CPUs can > > > be handled with some care because an unnecessary flush doesn't hurt > > > anything. But adding CPUs has serious correctness problems. > > > > > > > The cpumask_empty check is done before disabling irq. Is it possible > > the mask is modified by an interrupt? > > > > If there is a reliable way to trigger this bug, we may be able to test > > the following patch. 
> > > > diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c > > index 5208ba49c89a..23fa08d24c1a 100644 > > --- a/arch/x86/hyperv/mmu.c > > +++ b/arch/x86/hyperv/mmu.c > > @@ -66,11 +66,13 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus, > > if (!hv_hypercall_pg) > > goto do_native; > > > > - if (cpumask_empty(cpus)) > > - return; > > - > > local_irq_save(flags); > > > > + if (cpumask_empty(cpus)) { > > + local_irq_restore(flags); > > + return; > > + } > > + > > flush_pcpu = (struct hv_tlb_flush **) > > this_cpu_ptr(hyperv_pcpu_input_arg); > > This thread died out 3 months ago without any patches being taken. > I recently hit the problem again at random, though not in a > reproducible way. > > I'd like to take Wei Liu's latest proposal to check for an empty > cpumask *after* interrupts are disabled. I think this will almost > certainly solve the problem, and in a cleaner way than Sasha's > proposal. I'd also suggest adding a comment in the code to note > the importance of the ordering. > Sure. Let me prepare a proper patch. Wei. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() 2021-01-05 16:59 ` Michael Kelley 2021-01-05 17:10 ` Wei Liu @ 2021-01-08 15:22 ` Sasha Levin 1 sibling, 0 replies; 21+ messages in thread From: Sasha Levin @ 2021-01-08 15:22 UTC (permalink / raw) To: Michael Kelley Cc: Wei Liu, vkuznets, tglx, mingo, bp, x86, hpa, linux-hyperv, linux-kernel, stable, KY Srinivasan, Haiyang Zhang, Stephen Hemminger On Tue, Jan 05, 2021 at 04:59:10PM +0000, Michael Kelley wrote: >From: Wei Liu <wei.liu@kernel.org> Sent: Monday, October 5, 2020 7:59 AM >> >> On Sat, Oct 03, 2020 at 05:40:15PM +0000, Michael Kelley wrote: >> > From: Sasha Levin <sashal@kernel.org> Sent: Thursday, October 1, 2020 6:04 AM >> > > >> > > On Thu, Oct 01, 2020 at 11:53:59AM +0000, Wei Liu wrote: >> > > >On Thu, Oct 01, 2020 at 11:40:04AM +0200, Vitaly Kuznetsov wrote: >> > > >> Sasha Levin <sashal@kernel.org> writes: >> > > >> >> > > >> > cpumask can change underneath us, which is generally safe except when we >> > > >> > call into hv_cpu_number_to_vp_number(): if cpumask ends up empty we pass >> > > >> > num_cpu_possible() into hv_cpu_number_to_vp_number(), causing it to read >> > > >> > garbage. 
As reported by KASAN: >> > > >> > >> > > >> > [ 83.504763] BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_others >> > > (include/asm-generic/mshyperv.h:128 arch/x86/hyperv/mmu.c:112) >> > > >> > [ 83.908636] Read of size 4 at addr ffff888267c01370 by task kworker/u8:2/106 >> > > >> > [ 84.196669] CPU: 0 PID: 106 Comm: kworker/u8:2 Tainted: G W 5.4.60 #1 >> > > >> > [ 84.196669] Hardware name: Microsoft Corporation Virtual Machine/Virtual >> Machine, >> > > BIOS 090008 12/07/2018 >> > > >> > [ 84.196669] Workqueue: writeback wb_workfn (flush-8:0) >> > > >> > [ 84.196669] Call Trace: >> > > >> > [ 84.196669] dump_stack (lib/dump_stack.c:120) >> > > >> > [ 84.196669] print_address_description.constprop.0 (mm/kasan/report.c:375) >> > > >> > [ 84.196669] __kasan_report.cold (mm/kasan/report.c:507) >> > > >> > [ 84.196669] kasan_report (arch/x86/include/asm/smap.h:71 >> > > mm/kasan/common.c:635) >> > > >> > [ 84.196669] hyperv_flush_tlb_others (include/asm-generic/mshyperv.h:128 >> > > arch/x86/hyperv/mmu.c:112) >> > > >> > [ 84.196669] flush_tlb_mm_range (arch/x86/include/asm/paravirt.h:68 >> > > arch/x86/mm/tlb.c:798) >> > > >> > [ 84.196669] ptep_clear_flush (arch/x86/include/asm/tlbflush.h:586 mm/pgtable- >> > > generic.c:88) >> > > >> > >> > > >> > Fixes: 0e4c88f37693 ("x86/hyper-v: Use cheaper >> > > HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} hypercalls when possible") >> > > >> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> >> > > >> > Cc: stable@kernel.org >> > > >> > Signed-off-by: Sasha Levin <sashal@kernel.org> >> > > >> > --- >> > > >> > arch/x86/hyperv/mmu.c | 4 +++- >> > > >> > 1 file changed, 3 insertions(+), 1 deletion(-) >> > > >> > >> > > >> > diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c >> > > >> > index 5208ba49c89a9..b1d6afc5fc4a3 100644 >> > > >> > --- a/arch/x86/hyperv/mmu.c >> > > >> > +++ b/arch/x86/hyperv/mmu.c >> > > >> > @@ -109,7 +109,9 @@ static void hyperv_flush_tlb_others(const struct cpumask >> > > *cpus, >> > > >> > * 
must. We will also check all VP numbers when walking the >> > > >> > * supplied CPU set to remain correct in all cases. >> > > >> > */ >> > > >> > - if (hv_cpu_number_to_vp_number(cpumask_last(cpus)) >= 64) >> > > >> > + int last = cpumask_last(cpus); >> > > >> > + >> > > >> > + if (last < num_possible_cpus() && >> hv_cpu_number_to_vp_number(last) >= >> > > 64) >> > > >> > goto do_ex_hypercall; >> > > >> >> > > >> In case 'cpus' can end up being empty (I'm genuinely suprised it can) >> > > >> > > I was just as surprised as you and spent the good part of a day >> > > debugging this. However, a: >> > > >> > > WARN_ON(cpumask_empty(cpus)); >> > > >> > > triggers at that line of code even though we check for cpumask_empty() >> > > at the entry of the function. >> > >> > What does the call stack look like when this triggers? I'm curious about >> > the path where the 'cpus' could be changing while the flush call is in >> > progress. >> > >> > I wonder if CPUs could ever be added to the mask? Removing CPUs can >> > be handled with some care because an unnecessary flush doesn't hurt >> > anything. But adding CPUs has serious correctness problems. >> > >> >> The cpumask_empty check is done before disabling irq. Is it possible >> the mask is modified by an interrupt? >> >> If there is a reliable way to trigger this bug, we may be able to test >> the following patch. >> >> diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c >> index 5208ba49c89a..23fa08d24c1a 100644 >> --- a/arch/x86/hyperv/mmu.c >> +++ b/arch/x86/hyperv/mmu.c >> @@ -66,11 +66,13 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus, >> if (!hv_hypercall_pg) >> goto do_native; >> >> - if (cpumask_empty(cpus)) >> - return; >> - >> local_irq_save(flags); >> >> + if (cpumask_empty(cpus)) { >> + local_irq_restore(flags); >> + return; >> + } >> + >> flush_pcpu = (struct hv_tlb_flush **) >> this_cpu_ptr(hyperv_pcpu_input_arg); > >This thread died out 3 months ago without any patches being taken. 
>I recently hit the problem again at random, though not in a >reproducible way. > >I'd like to take Wei Liu's latest proposal to check for an empty >cpumask *after* interrupts are disabled. I think this will almost >certainly solve the problem, and in a cleaner way than Sasha's >proposal. I'd also suggest adding a comment in the code to note >the importance of the ordering. I found that this syzbot reproducer: https://syzkaller.appspot.com//bug?id=47befb59c610a69f024db20b927dea80c88fc045 is pretty good at reproducing the issue too: BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_others+0x11ea/0x17c0 Read of size 4 at addr ffff88810005db20 by task 3.c.exe/13007 CPU: 4 PID: 13007 Comm: 3.c.exe Not tainted 5.10.5 #1 Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 06/17/2020 Call Trace: dump_stack+0xa4/0xd9 print_address_description.constprop.0.cold+0xd4/0x509 kasan_report.cold+0x20/0x37 __asan_report_load4_noabort+0x14/0x20 hyperv_flush_tlb_others+0x11ea/0x17c0 flush_tlb_mm_range+0x1fd/0x360 tlb_flush_mmu+0x1b5/0x510 tlb_finish_mmu+0x89/0x360 exit_mmap+0x24f/0x450 mmput+0x121/0x400 do_exit+0x8cf/0x2a70 do_group_exit+0x100/0x300 get_signal+0x3d7/0x1e70 arch_do_signal+0x8c/0x2670 exit_to_user_mode_prepare+0x154/0x1f0 syscall_exit_to_user_mode+0x42/0x280 do_syscall_64+0x45/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x450c2d Code: Unable to access opcode bytes at RIP 0x450c03. 
RSP: 002b:00007f6c81711d68 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca RAX: fffffffffffffe00 RBX: 0000000000000000 RCX: 0000000000450c2d RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00000000004e0428 RBP: 00007f6c81711d80 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffeeef33d2e R13: 00007ffeeef33d2f R14: 00007ffeeef33dd0 R15: 00007f6c81711e80 Allocated by task 0: kasan_save_stack+0x23/0x50 __kasan_kmalloc.constprop.0+0xcf/0xe0 kasan_kmalloc+0x9/0x10 __kmalloc+0x1c8/0x3b0 kmalloc_array+0x12/0x14 hyperv_init+0xd4/0x3a0 apic_intr_mode_init+0xbb/0x1e8 x86_late_time_init+0x96/0xa7 start_kernel+0x317/0x3d3 x86_64_start_reservations+0x24/0x26 x86_64_start_kernel+0x7a/0x7e secondary_startup_64_no_verify+0xb0/0xbb The buggy address belongs to the object at ffff88810005db00 which belongs to the cache kmalloc-32 of size 32 The buggy address is located 0 bytes to the right of 32-byte region [ffff88810005db00, ffff88810005db20) The buggy address belongs to the page: page:0000000065310ff0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10005d flags: 0x17ffffc0000200(slab) raw: 0017ffffc0000200 0000000000000000 0000000100000001 ffff888100043a40 raw: 0000000000000000 0000000000400040 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88810005da00: 00 00 00 00 fc fc fc fc 00 00 00 00 fc fc fc fc ffff88810005da80: 00 00 00 00 fc fc fc fc 00 00 00 00 fc fc fc fc >ffff88810005db00: 00 00 00 00 fc fc fc fc 00 00 00 fc fc fc fc fc ^ ffff88810005db80: 00 00 00 fc fc fc fc fc 00 00 00 fc fc fc fc fc ffff88810005dc00: 00 00 00 fc fc fc fc fc 00 00 00 fc fc fc fc fc -- Thanks, Sasha ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() 2020-10-01 11:53 ` Wei Liu 2020-10-01 13:04 ` Sasha Levin @ 2020-10-01 13:10 ` Vitaly Kuznetsov 1 sibling, 0 replies; 21+ messages in thread From: Vitaly Kuznetsov @ 2020-10-01 13:10 UTC (permalink / raw) To: Wei Liu Cc: Sasha Levin, tglx, mingo, bp, x86, hpa, mikelley, linux-hyperv, linux-kernel, stable, kys, haiyangz, sthemmin, wei.liu Wei Liu <wei.liu@kernel.org> writes: > On Thu, Oct 01, 2020 at 11:40:04AM +0200, Vitaly Kuznetsov wrote: >> Sasha Levin <sashal@kernel.org> writes: >> >> > cpumask can change underneath us, which is generally safe except when we >> > call into hv_cpu_number_to_vp_number(): if cpumask ends up empty we pass >> > num_cpu_possible() into hv_cpu_number_to_vp_number(), causing it to read >> > garbage. As reported by KASAN: >> > >> > [ 83.504763] BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_others (include/asm-generic/mshyperv.h:128 arch/x86/hyperv/mmu.c:112) >> > [ 83.908636] Read of size 4 at addr ffff888267c01370 by task kworker/u8:2/106 >> > [ 84.196669] CPU: 0 PID: 106 Comm: kworker/u8:2 Tainted: G W 5.4.60 #1 >> > [ 84.196669] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018 >> > [ 84.196669] Workqueue: writeback wb_workfn (flush-8:0) >> > [ 84.196669] Call Trace: >> > [ 84.196669] dump_stack (lib/dump_stack.c:120) >> > [ 84.196669] print_address_description.constprop.0 (mm/kasan/report.c:375) >> > [ 84.196669] __kasan_report.cold (mm/kasan/report.c:507) >> > [ 84.196669] kasan_report (arch/x86/include/asm/smap.h:71 mm/kasan/common.c:635) >> > [ 84.196669] hyperv_flush_tlb_others (include/asm-generic/mshyperv.h:128 arch/x86/hyperv/mmu.c:112) >> > [ 84.196669] flush_tlb_mm_range (arch/x86/include/asm/paravirt.h:68 arch/x86/mm/tlb.c:798) >> > [ 84.196669] ptep_clear_flush (arch/x86/include/asm/tlbflush.h:586 mm/pgtable-generic.c:88) >> > >> > Fixes: 0e4c88f37693 ("x86/hyper-v: Use cheaper 
HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} hypercalls when possible") >> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> >> > Cc: stable@kernel.org >> > Signed-off-by: Sasha Levin <sashal@kernel.org> >> > --- >> > arch/x86/hyperv/mmu.c | 4 +++- >> > 1 file changed, 3 insertions(+), 1 deletion(-) >> > >> > diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c >> > index 5208ba49c89a9..b1d6afc5fc4a3 100644 >> > --- a/arch/x86/hyperv/mmu.c >> > +++ b/arch/x86/hyperv/mmu.c >> > @@ -109,7 +109,9 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus, >> > * must. We will also check all VP numbers when walking the >> > * supplied CPU set to remain correct in all cases. >> > */ >> > - if (hv_cpu_number_to_vp_number(cpumask_last(cpus)) >= 64) >> > + int last = cpumask_last(cpus); >> > + >> > + if (last < num_possible_cpus() && hv_cpu_number_to_vp_number(last) >= 64) >> > goto do_ex_hypercall; >> >> In case 'cpus' can end up being empty (I'm genuinely suprised it can) >> the check is mandatory indeed. I would, however, just return directly in >> this case: >> >> if (last < num_possible_cpus()) >> return; > > I think you want > > last >= num_possible_cpus() > > here? Of course, thanks! > > A more important question is, if the mask can change willy-nilly, what > is stopping it from changing between these checks? I.e. is there still a > windows that hv_cpu_number_to_vp_number(last) can return garbage? > AFAIU some CPUs can be dropped from the mask (because they switch to a different mm?) and if we still flush there it's not a problem. The only real problem I currently see is that we're passing cpumask_last() result to hv_cpu_number_to_vp_number() and cpumask_last() returns num_possible_cpus() when the mask is empty but this can't be passed to hv_cpu_number_to_vp_number(). > Wei. > >> >> if (hv_cpu_number_to_vp_number(last) >= 64) >> goto do_ex_hypercall; >> >> as there's nothing to flush, no need to call into >> hyperv_flush_tlb_others_ex(). 
>> >> Anyway, the fix seems to be correct, so >> >> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> >> >> > >> > for_each_cpu(cpu, cpus) { >> >> -- >> Vitaly >> > -- Vitaly ^ permalink raw reply [flat|nested] 21+ messages in thread
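Vitaly's observation that cpumask_last() returns num_possible_cpus() on an empty mask is the crux of the bug, and is easy to demonstrate with a userspace model (the 8-bit mask and names are illustrative, not the kernel bitmap code):

```c
/* Model of cpumask_last(): scanning for the highest set bit returns
 * the mask size -- the num_possible_cpus() analogue -- when no bit is
 * set, i.e. one past the valid index range. Using that result as a
 * table index without a range check reads out of bounds. */
#define NBITS 8

static int mask_last(unsigned int mask)
{
	for (int i = NBITS - 1; i >= 0; i--)
		if (mask & (1u << i))
			return i;
	return NBITS;  /* empty mask: one past the last valid index */
}
```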
end of thread, other threads:[~2021-08-22 17:32 UTC | newest] Thread overview: 21+ messages [not found] <CA+qYZY3a-FHfWNL2=na6O8TRJYu9kaeyp80VNDxaDTi2EBGoog@mail.gmail.com> 2021-08-06 10:43 ` [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() Michael Kelley 2021-08-06 17:35 ` David Mozes [not found] ` <CAHkVu0-ZCXDRZL92d_G3oKpPuKvmY=YEbu9nbx9vkZHnhHFD8Q@mail.gmail.com> 2021-08-06 21:51 ` Michael Kelley 2021-08-07 5:00 ` David Moses 2021-08-17 9:16 ` David Mozes 2021-08-17 11:29 ` Wei Liu 2021-08-19 11:05 ` David Mozes [not found] ` <CA+qYZY1U04SkyHo7X+rDeE=nUy_X5nxLfShyuLJFzXnFp2A6uw@mail.gmail.com> [not found] ` <VI1PR0401MB24153DEC767B0126B1030E07F1C09@VI1PR0401MB2415.eurprd04.prod.outlook.com> 2021-08-22 15:24 ` Wei Liu 2021-08-22 16:25 ` David Mozes 2021-08-22 17:32 ` Wei Liu [not found] <VI1PR0401MB24150B31A1D63176BBB788D2F1F19@VI1PR0401MB2415.eurprd04.prod.outlook.com> 2021-08-05 18:08 ` Michael Kelley 2020-10-01 1:38 Sasha Levin 2020-10-01 9:40 ` Vitaly Kuznetsov 2020-10-01 11:53 ` Wei Liu 2020-10-01 13:04 ` Sasha Levin 2020-10-03 17:40 ` Michael Kelley 2020-10-05 14:58 ` Wei Liu 2021-01-05 16:59 ` Michael Kelley 2021-01-05 17:10 ` Wei Liu 2021-01-08 15:22 ` Sasha Levin 2020-10-01 13:10 ` Vitaly Kuznetsov