* RE: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others()
       [not found] <CA+qYZY3a-FHfWNL2=na6O8TRJYu9kaeyp80VNDxaDTi2EBGoog@mail.gmail.com>
@ 2021-08-06 10:43 ` Michael Kelley
  2021-08-06 17:35   ` David Mozes
  0 siblings, 1 reply; 21+ messages in thread

From: Michael Kelley @ 2021-08-06 10:43 UTC (permalink / raw)
  To: David Moses; +Cc: david.mozes, linux-hyperv, linux-kernel

From: David Moses <mosesster@gmail.com> Sent: Friday, August 6, 2021 2:20 AM

> Hi Michael,
> We are running kernel 4.19.195 (the fix Wei Liu suggested, moving the
> cpumask_empty check after disabling interrupts, is included in this version)
> with the default Hyper-V version.
> I'm getting the 4-byte garbage read (trace included) almost every night.
> We are running on Azure VM Standard D64s_v4 with 64 cores (our system
> includes three such VMs); the application drives very high I/O traffic
> involving iSCSI.
> We believe this issue causes the stack corruption in the RT scheduler that
> I forwarded in the previous mail.
>
> Let us know what more is needed to clarify the problem.
> Is it just Hyper-V related, or could it be a general kernel issue?
>
> Thx David
>
> Even more than that: I added the below patch/fix
>
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 5b58a6c..165727a 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -298,6 +298,9 @@ static inline struct hv_vp_assist_page *hv_get_vp_assist_page(unsigned int cpu)
>   */
>  static inline int hv_cpu_number_to_vp_number(int cpu_number)
>  {
> +	if (WARN_ON_ONCE(cpu_number < 0 || cpu_number >= num_possible_cpus()))
> +		return VP_INVAL;
> +
>  	return hv_vp_index[cpu_number];
>  }
>
> and we have evidence that we reach this point.
>
> See below:
> Aug  5 21:03:01 c-node11 kernel: [17147.089261] WARNING: CPU: 15 PID: 8973 at arch/x86/include/asm/mshyperv.h:301 hyperv_flush_tlb_others+0x1f7/0x760
> Aug  5 21:03:01 c-node11 kernel: [17147.089275] RIP: 0010:hyperv_flush_tlb_others+0x1f7/0x760
> Aug  5 21:03:01 c-node11 kernel: [17147.089275] Code: ff ff be 40 00 00 00 48 89 df e8 c4 ff 3a 00 85 c0 48 89 c2 78 14 48 8b 3d be 52 32 01 f3 48 0f b8 c7 39 c2 0f 82 7e 01 00 00 <0f> 0b ba ff ff ff ff 89 d7 48 89 de e8 68 87 7d 00 3b 05 66 54 32
> Aug  5 21:03:01 c-node11 kernel: [17147.089275] RSP: 0018:ffff8c536bcafa38 EFLAGS: 00010046
> Aug  5 21:03:01 c-node11 kernel: [17147.089275] RAX: 0000000000000040 RBX: ffff8c339542ea00 RCX: ffffffffffffffff
> Aug  5 21:03:01 c-node11 kernel: [17147.089275] RDX: 0000000000000040 RSI: ffffffffffffffff RDI: ffffffffffffffff
> Aug  5 21:03:01 c-node11 kernel: [17147.089275] RBP: ffff8c339878b000 R08: ffffffffffffffff R09: ffffe93ecbcaa0e8
> Aug  5 21:03:01 c-node11 kernel: [17147.089275] R10: 00000000020e0000 R11: 0000000000000000 R12: ffff8c536bcafa88
> Aug  5 21:03:01 c-node11 kernel: [17147.089275] R13: ffffe93efe1ef980 R14: ffff8c339542e600 R15: 00007ffcbc390000
> Aug  5 21:03:01 c-node11 kernel: [17147.089275] FS:  00007fcb8eae37a0(0000) GS:ffff8c339f7c0000(0000) knlGS:0000000000000000
> Aug  5 21:03:01 c-node11 kernel: [17147.089275] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Aug  5 21:03:01 c-node11 kernel: [17147.089275] CR2: 000000000135d1d8 CR3: 0000004037137005 CR4: 00000000003606e0
> Aug  5 21:03:01 c-node11 kernel: [17147.089275] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Aug  5 21:03:01 c-node11 kernel: [17147.089275] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Aug  5 21:03:01 c-node11 kernel: [17147.089275] Call Trace:
> Aug  5 21:03:01 c-node11 kernel: [17147.089275]  flush_tlb_mm_range+0xc3/0x120
> Aug  5 21:03:01 c-node11 kernel: [17147.089275]  ptep_clear_flush+0x3a/0x40
> Aug  5 21:03:01 c-node11 kernel: [17147.089275]  wp_page_copy+0x2e6/0x8f0
> Aug  5 21:03:01 c-node11 kernel: [17147.089275]  ? reuse_swap_page+0x13d/0x390
> Aug  5 21:03:01 c-node11 kernel: [17147.089275]  do_wp_page+0x99/0x4c0
> Aug  5 21:03:01 c-node11 kernel: [17147.089275]  __handle_mm_fault+0xb4e/0x12c0
> Aug  5 21:03:01 c-node11 kernel: [17147.089275]  ? memcg_kmem_get_cache+0x76/0x1a0
> Aug  5 21:03:01 c-node11 kernel: [17147.089275]  handle_mm_fault+0xd6/0x200
> Aug  5 21:03:01 c-node11 kernel: [17147.089275]  __get_user_pages+0x29e/0x780
> Aug  5 21:03:01 c-node11 kernel: [17147.089275]  get_user_pages_remote+0x12c/0x1b0

(FYI -- email to the Linux kernel mailing lists should be in plaintext format, and not use HTML or other formatting.)

This is an excellent experiment. It certainly suggests that the cpumask passed to hyperv_flush_tlb_others() has bits set for CPUs above 64 that don't exist. If that's the case, it would seem to be a general kernel issue rather than something specific to Hyper-V.

Since it looks like you are able to add debugging code to the kernel, here are a couple of thoughts:

1) In hyperv_flush_tlb_others(), after the call to disable interrupts, check the value of cpumask_last(cpus), and if it is greater than num_possible_cpus(), execute a printk() statement that outputs the entire contents of the cpumask that was passed in. There's a special printk format string for printing out bitmaps like cpumasks. Let me know if you would like some help on this code -- I can provide a diff later today. Seeing what the "bad" cpumask looks like might give some clues as to the problem.

2) As a different experiment, you can disable the Hyper-V specific flush routines entirely. At the end of the mmu.c source file, have hyperv_setup_mmu_ops() always return immediately. In this case, the generic Linux kernel flush routines will be used instead of the Hyper-V ones. The code may be marginally slower, but it will then be interesting to see whether the problem shows up elsewhere. Based on your experiment, I'm guessing that there's a general kernel issue rather than something specific to Hyper-V.

Have you run 4.19 kernels prior to 4.19.195 that didn't have this problem? If you have a kernel version that is good, the ultimate step would be to do a bisect and find out where the problem was introduced in the 4.19 series. That could take a while, but it would almost certainly identify the problematic code change and would be beneficial to the Linux kernel community in general.

Michael

^ permalink raw reply	[flat|nested] 21+ messages in thread
* RE: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others()
  2021-08-06 10:43 ` [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() Michael Kelley
@ 2021-08-06 17:35   ` David Mozes
       [not found]     ` <CAHkVu0-ZCXDRZL92d_G3oKpPuKvmY=YEbu9nbx9vkZHnhHFD8Q@mail.gmail.com>
  0 siblings, 1 reply; 21+ messages in thread

From: David Mozes @ 2021-08-06 17:35 UTC (permalink / raw)
  To: Michael Kelley, David Moses
  Cc: linux-hyperv, linux-kernel, David Mozes, תומר אבוטבול

Yes, please provide the diff.

Unfortunately we saw the problem on every 4.19.x version we tested, starting from 4.19.149, and we saw the issue in 5.4.80 as well. I believe you are right that it is a general kernel issue and not Hyper-V specific.

Note that the code I added eliminates the panic we got, but the kernel "doesn't like it". Any suggestions for how we can let the kernel continue working while we do our experiment?

Thx
David

-----Original Message-----
From: Michael Kelley <mikelley@microsoft.com>
Sent: Friday, August 6, 2021 1:43 PM
To: David Moses <mosesster@gmail.com>
Cc: David Mozes <david.mozes@silk.us>; linux-hyperv@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: RE: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others()

^ permalink raw reply	[flat|nested] 21+ messages in thread
[parent not found: <CAHkVu0-ZCXDRZL92d_G3oKpPuKvmY=YEbu9nbx9vkZHnhHFD8Q@mail.gmail.com>]
* RE: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others()
       [not found] ` <CAHkVu0-ZCXDRZL92d_G3oKpPuKvmY=YEbu9nbx9vkZHnhHFD8Q@mail.gmail.com>
@ 2021-08-06 21:51   ` Michael Kelley
  2021-08-07  5:00     ` David Moses
  0 siblings, 1 reply; 21+ messages in thread

From: Michael Kelley @ 2021-08-06 21:51 UTC (permalink / raw)
  To: תומר אבוטבול, David Mozes
  Cc: David Moses, linux-hyperv, linux-kernel

From: תומר אבוטבול <tomer432100@gmail.com> Sent: Friday, August 6, 2021 11:03 AM

> Attaching the patches Michael asked for debugging.
>
> 1) Print the cpumask when cpumask_last(cpus) > num_possible_cpus():
>
> diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
> index e666f7eaf32d..620f656d6195 100644
> --- a/arch/x86/hyperv/mmu.c
> +++ b/arch/x86/hyperv/mmu.c
> @@ -60,6 +60,7 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus,
>  	struct hv_tlb_flush *flush;
>  	u64 status = U64_MAX;
>  	unsigned long flags;
> +	unsigned int cpu_last;
>
>  	trace_hyperv_mmu_flush_tlb_others(cpus, info);
>
> @@ -68,6 +69,11 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus,
>
>  	local_irq_save(flags);
>
> +	cpu_last = cpumask_last(cpus);
> +	if (cpu_last > num_possible_cpus()) {

I think this should be ">=" since CPUs are numbered starting at zero. In your VM with 64 CPUs, having CPU #64 in the list would be an error.

> +		pr_emerg("ERROR_HYPERV: cpu_last=%*pbl", cpumask_pr_args(cpus));
> +	}
> +
>  	/*
>  	 * Only check the mask _after_ interrupt has been disabled to avoid the
>  	 * mask changing under our feet.

> 2) Disable the Hyper-V specific flush routines:
>
> diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
> index e666f7eaf32d..8e77cc84775a 100644
> --- a/arch/x86/hyperv/mmu.c
> +++ b/arch/x86/hyperv/mmu.c
> @@ -235,6 +235,7 @@ static u64 hyperv_flush_tlb_others_ex(const struct cpumask *cpus,
>
>  void hyperv_setup_mmu_ops(void)
>  {
> +	return;
>  	if (!(ms_hyperv.hints & HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED))
>  		return;

Otherwise, this code looks good to me and matches what I had in mind.

Note that the function native_flush_tlb_others() is used when the Hyper-V specific flush function is disabled per patch #2 above, or when hv_cpu_number_to_vp_number() returns VP_INVAL. In a quick glance through the code, it appears that native_flush_tlb_others() will work even if there's a non-existent CPU in the cpumask that is passed as an argument. So perhaps an immediate workaround is patch #2 above.

Perhaps hyperv_flush_tlb_others() should be made equally tolerant of a non-existent CPU being in the list. But if you are willing, I'm still interested in the results of an experiment with just patch #1. I'm curious about what the CPU list looks like when it has a non-existent CPU. Is it complete garbage, or is there just one non-existent CPU?

The other curiosity is that I haven't seen this Linux panic reported by other users, and I think it would have come to our attention if it were happening with any frequency. You see the problem fairly regularly, so I'm wondering what the difference is.

Michael

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others()
  2021-08-06 21:51   ` Michael Kelley
@ 2021-08-07  5:00     ` David Moses
  2021-08-17  9:16       ` David Mozes
  0 siblings, 1 reply; 21+ messages in thread

From: David Moses @ 2021-08-07 5:00 UTC (permalink / raw)
  To: Michael Kelley
  Cc: תומר אבוטבול, David Mozes, linux-hyperv, linux-kernel

Sent from my iPhone

> On Aug 7, 2021, at 12:51 AM, Michael Kelley <mikelley@microsoft.com> wrote:
>
> From: תומר אבוטבול <tomer432100@gmail.com> Sent: Friday, August 6, 2021 11:03 AM
>
>> Attaching the patches Michael asked for debugging.
>
> I think this should be ">=" since CPUs are numbered starting at zero.
> In your VM with 64 CPUs, having CPU #64 in the list would be an error.
>
> Otherwise, this code looks good to me and matches what I had in mind.
>
> Note that the function native_flush_tlb_others() is used when the Hyper-V specific
> flush function is disabled per patch #2 above, or when hv_cpu_number_to_vp_number()
> returns VP_INVAL. In a quick glance through the code, it appears that
> native_flush_tlb_others() will work even if there's a non-existent CPU in the
> cpumask that is passed as an argument. So perhaps an immediate workaround is
> patch #2 above.

The current code of hv_cpu_number_to_vp_number() (where I generated the warning) is returning VP_INVAL in this case (see previous mail), and it looks like that does not completely work around the issue: the CPU hangs even without a panic. Will continue watching.

> Perhaps hyperv_flush_tlb_others() should be made equally tolerant of a non-existent
> CPU being in the list. But if you are willing, I'm still interested in the results of an
> experiment with just patch #1. I'm curious about what the CPU list looks like when
> it has a non-existent CPU. Is it complete garbage, or is there just one non-existent
> CPU?

We will do it, maybe not next week because of vacation, but the week after.

> The other curiosity is that I haven't seen this Linux panic reported by other users,
> and I think it would have come to our attention if it were happening with any frequency.
> You see the problem fairly regularly. So I'm wondering what the difference is.
>
> Michael

^ permalink raw reply	[flat|nested] 21+ messages in thread
* RE: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others()
  2021-08-07  5:00     ` David Moses
@ 2021-08-17  9:16       ` David Mozes
  2021-08-17 11:29         ` Wei Liu
  0 siblings, 1 reply; 21+ messages in thread

From: David Mozes @ 2021-08-17 9:16 UTC (permalink / raw)
  To: David Moses, Michael Kelley
  Cc: תומר אבוטבול, linux-hyperv, linux-kernel

Hi Michael and all.
I am back from the holiday and did your suggestions/requests.

1. While running with patch number 2 (disable the Hyper-V specific flush routines), as you suspected, we got a panic similar to what we got with the Hyper-V specific flush routines. Below is the trace we got:

[32097.577728] kernel BUG at kernel/sched/rt.c:1004!
[32097.577738] invalid opcode: 0000 [#1] SMP
[32097.578711] CPU: 45 PID: 51244 Comm: STAR4BLKS0_WORK Kdump: loaded Tainted: G OE 4.19.195-KM9 #1
[32097.578711] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018
[32097.578711] RIP: 0010:dequeue_top_rt_rq+0x88/0xa0
[32097.578711] Code: 00 48 89 d5 48 0f a3 15 6e 19 82 01 73 d0 48 89 c7 e8 bc b7 fe ff be 02 00 00 00 89 ef 84 c0 74 0b e8 2c 94 04 00 eb b6 0f 0b <0f> 0b e8 b1 93 04 00 eb ab 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00
[32097.578711] RSP: 0018:ffff9442e0de7b48 EFLAGS: 00010046
[32097.578711] RAX: ffff94809f9e1e00 RBX: ffff9448295e4c40 RCX: 00000000ffffffff
[32097.578711] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff94809f9e2040
[32097.578711] RBP: ffff94809f9e1e00 R08: fffffffffff0be25 R09: 00000000000216c0
[32097.578711] R10: 00004bbc85e1eff3 R11: 0000000000000000 R12: 0000000000000000
[32097.578711] R13: ffff9448295e4a20 R14: 0000000000021e00 R15: ffff94809fa21e00
[32097.578711] FS:  00007f7b0cea0700(0000) GS:ffff94809f940000(0000) knlGS:0000000000000000
[32097.578711] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[32097.578711] CR2: ffffffffff600400 CR3: 000000201d5b3002 CR4: 00000000003606e0
[32097.578711] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[32097.578711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[32097.578711] Call Trace:
[32097.578711]  dequeue_rt_stack+0x3e/0x280
[32097.578711]  dequeue_rt_entity+0x1f/0x70
[32097.578711]  dequeue_task_rt+0x26/0x70
[32097.578711]  push_rt_task+0x1e2/0x220
[32097.578711]  push_rt_tasks+0x11/0x20
[32097.578711]  __balance_callback+0x3b/0x60
[32097.578711]  __schedule+0x6e9/0x830
[32097.578711]  schedule+0x28/0x80
[32097.578711]  futex_wait_queue_me+0xb9/0x120
[32097.578711]  futex_wait+0x139/0x250
[32097.578711]  ? try_to_wake_up+0x54/0x460
[32097.578711]  ? enqueue_task_rt+0x9f/0xc0
[32097.578711]  ? get_futex_key+0x2ee/0x450
[32097.578711]  do_futex+0x2eb/0x9f0
[32097.578711]  __x64_sys_futex+0x143/0x180
[32097.578711]  do_syscall_64+0x59/0x1b0
[32097.578711]  ? prepare_exit_to_usermode+0x70/0x90
[32097.578711]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[32097.578711] RIP: 0033:0x7fa2ae151334
[32097.578711] Code: 66 0f 1f 44 00 00 41 52 52 4d 31 d2 ba 02 00 00 00 81 f6 80 00 00 00 64 23 34 25 48 00 00 00 39 d0 75 07 b8 ca 00 00 00 0f 05 <89> d0 87 07 85 c0 75 f1 5a 41 5a c3 83 3d f1 df 20 00 00 74 59 48
[32097.578711] RSP: 002b:00007f7b0ce9f3b0 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[32097.578711] RAX: ffffffffffffffda RBX: 00007f7c1da5bc18 RCX: 00007fa2ae151334
[32097.578711] RDX: 0000000000000002 RSI: 0000000000000080 RDI: 00007f7c1da5bc58
[32097.578711] RBP: 00007f7b0ce9f5b0 R08: 00007f7c1da5bc58 R09: 000000000000c82c
[32097.578711] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f7b1a149cf0
[32097.578711] R13: 00007f7c1da5bc58 R14: 0000000000000001 R15: 00000000000005a1

2. As you requested, and to help the community, we ran patch no. 1 as well.

And that is what we got:

Aug 17 05:36:22 10.230.247.7 [40544.392690] Hyper-V: ERROR_HYPERV: cpu_last=

It looks like we got an empty cpumask!

Would you please let us know what further info you need, and what the next step is for debugging this interesting issue.

Thx
David

-----Original Message-----
From: David Moses <mosesster@gmail.com>
Sent: Saturday, August 7, 2021 8:00 AM
To: Michael Kelley <mikelley@microsoft.com>
Cc: תומר אבוטבול <tomer432100@gmail.com>; David Mozes <david.mozes@silk.us>; linux-hyperv@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others()

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() 2021-08-17 9:16 ` David Mozes @ 2021-08-17 11:29 ` Wei Liu 2021-08-19 11:05 ` David Mozes [not found] ` <CA+qYZY1U04SkyHo7X+rDeE=nUy_X5nxLfShyuLJFzXnFp2A6uw@mail.gmail.com> 0 siblings, 2 replies; 21+ messages in thread From: Wei Liu @ 2021-08-17 11:29 UTC (permalink / raw) To: David Mozes Cc: David Moses, Michael Kelley, תומר אבוטבול, linux-hyperv, linux-kernel, Wei Liu Please use the "reply all" button in your mail client and avoid top-posting. It is very difficult for me to decipher this thread... On Tue, Aug 17, 2021 at 09:16:45AM +0000, David Mozes wrote: > Hi Michael and all . > I am back from the Holiday and did your saggestiones /requstes > > 1. While running with patch number-2 (disable the Hyper-V specific flush routines) > As you suspected, we got panic similar to what we got with the Hyper-V specific flash routines. > Below is the trace we got: > > [32097.577728] kernel BUG at kernel/sched/rt.c:1004! > [32097.577738] invalid opcode: 0000 [#1] SMP > [32097.578711] CPU: 45 PID: 51244 Comm: STAR4BLKS0_WORK Kdump: loaded Tainted: G OE 4.19.195-KM9 #1 It seems that you have out of tree module(s) loaded. Please make sure they don't do anything unusual. 
> [32097.578711] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018 > [32097.578711] RIP: 0010:dequeue_top_rt_rq+0x88/0xa0 > [32097.578711] Code: 00 48 89 d5 48 0f a3 15 6e 19 82 01 73 d0 48 89 c7 e8 bc b7 fe ff be 02 00 00 00 89 ef 84 c0 74 0b e8 2c 94 04 00 eb b6 0f 0b <0f> 0b e8 b1 93 04 00 eb ab 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 > [32097.578711] RSP: 0018:ffff9442e0de7b48 EFLAGS: 00010046 > [32097.578711] RAX: ffff94809f9e1e00 RBX: ffff9448295e4c40 RCX: 00000000ffffffff > [32097.578711] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff94809f9e2040 > [32097.578711] RBP: ffff94809f9e1e00 R08: fffffffffff0be25 R09: 00000000000216c0 > [32097.578711] R10: 00004bbc85e1eff3 R11: 0000000000000000 R12: 0000000000000000 > [32097.578711] R13: ffff9448295e4a20 R14: 0000000000021e00 R15: ffff94809fa21e00 > [32097.578711] FS: 00007f7b0cea0700(0000) GS:ffff94809f940000(0000) knlGS:0000000000000000 > [32097.578711] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [32097.578711] CR2: ffffffffff600400 CR3: 000000201d5b3002 CR4: 00000000003606e0 > [32097.578711] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [32097.578711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > [32097.578711] Call Trace: > [32097.578711] dequeue_rt_stack+0x3e/0x280 > [32097.578711] dequeue_rt_entity+0x1f/0x70 > [32097.578711] dequeue_task_rt+0x26/0x70 > [32097.578711] push_rt_task+0x1e2/0x220 > [32097.578711] push_rt_tasks+0x11/0x20 > [32097.578711] __balance_callback+0x3b/0x60 > [32097.578711] __schedule+0x6e9/0x830 > [32097.578711] schedule+0x28/0x80 It looks like the scheduler is in an irrecoverable state. The stack trace does not show anything related to TLB flush, so it is unclear to me this has anything to do with the original report. Have you tried running the same setup on baremetal? > [32097.578711] futex_wait_queue_me+0xb9/0x120 > [32097.578711] futex_wait+0x139/0x250 > [32097.578711] ? 
try_to_wake_up+0x54/0x460 > [32097.578711] ? enqueue_task_rt+0x9f/0xc0 > [32097.578711] ? get_futex_key+0x2ee/0x450 > [32097.578711] do_futex+0x2eb/0x9f0 > [32097.578711] __x64_sys_futex+0x143/0x180 > [32097.578711] do_syscall_64+0x59/0x1b0 > [32097.578711] ? prepare_exit_to_usermode+0x70/0x90 > [32097.578711] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [32097.578711] RIP: 0033:0x7fa2ae151334 > [32097.578711] Code: 66 0f 1f 44 00 00 41 52 52 4d 31 d2 ba 02 00 00 00 81 f6 80 00 00 00 64 23 34 25 48 00 00 00 39 d0 75 07 b8 ca 00 00 00 0f 05 <89> d0 87 07 85 c0 75 f1 5a 41 5a c3 83 3d f1 df 20 00 00 74 59 48 > [32097.578711] RSP: 002b:00007f7b0ce9f3b0 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca > [32097.578711] RAX: ffffffffffffffda RBX: 00007f7c1da5bc18 RCX: 00007fa2ae151334 > [32097.578711] RDX: 0000000000000002 RSI: 0000000000000080 RDI: 00007f7c1da5bc58 > [32097.578711] RBP: 00007f7b0ce9f5b0 R08: 00007f7c1da5bc58 R09: 000000000000c82c > [32097.578711] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f7b1a149cf0 > [32097.578711] R13: 00007f7c1da5bc58 R14: 0000000000000001 R15: 00000000000005a1 > > > 2. as you requested and to help to the community we running patch no 1 as well : > > And that is what we got: > > Aug 17 05:36:22 10.230.247.7 [40544.392690] Hyper-V: ERROR_HYPERV: cpu_last= > > It looks like we got an empty cpumask ! Assuming this is from the patch below, the code already handles empty cpumask a few lines later. You should perhaps move your change after that to right before cpus is actually used. Wei. 
> > Would you please let us know what further info you need and what is the next step for debugging this interesting issue > > Thx > David > > > > > >> 1) Print the cpumask when < num_possible_cpus(): > >> diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c > >> index e666f7eaf32d..620f656d6195 100644 > >> --- a/arch/x86/hyperv/mmu.c > >> +++ b/arch/x86/hyperv/mmu.c > >> @@ -60,6 +60,7 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus, > >> struct hv_tlb_flush *flush; > >> u64 status = U64_MAX; > >> unsigned long flags; > >> + unsigned int cpu_last; > >> > >> trace_hyperv_mmu_flush_tlb_others(cpus, info); > >> > >> @@ -68,6 +69,11 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus, > >> > >> local_irq_save(flags); > >> > >> + cpu_last = cpumask_last(cpus); > >> + if (cpu_last > num_possible_cpus()) { > > > > I think this should be ">=" since cpus are numbered starting at zero. > > In your VM with 64 CPUs, having CPU #64 in the list would be an error. > > > >> + pr_emerg("ERROR_HYPERV: cpu_last=%*pbl", cpumask_pr_args(cpus)); > >> + } > >> + > >> /* > >> * Only check the mask _after_ interrupt has been disabled to avoid the > >> * mask changing under our feet. > >> ^ permalink raw reply [flat|nested] 21+ messages in thread
* RE: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() 2021-08-17 11:29 ` Wei Liu @ 2021-08-19 11:05 ` David Mozes [not found] ` <CA+qYZY1U04SkyHo7X+rDeE=nUy_X5nxLfShyuLJFzXnFp2A6uw@mail.gmail.com> 1 sibling, 0 replies; 21+ messages in thread From: David Mozes @ 2021-08-19 11:05 UTC (permalink / raw) To: Wei Liu Cc: David Moses, Michael Kelley, תומר אבוטבול, linux-hyperv, linux-kernel Hi Wei, Per your request I moved the cpumask print to two other places, after the handling of the empty mask; see below. And I got the following: Aug 19 02:01:51 c-node05 kernel: [25936.562674] Hyper-V: ERROR_HYPERV2: cpu_last= Aug 19 02:01:51 c-node05 kernel: [25936.562686] WARNING: CPU: 11 PID: 56432 at arch/x86/include/asm/mshyperv.h:301 hyperv_flush_tlb_others+0x23f/0x7b0 So we got an empty cpumask at a different place in the code. Let me know if you need further information from us. How do you suggest we handle this situation? Thx David The new print-cpumask patch: diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c index a72fa69..31f0683 100644 --- a/arch/x86/hyperv/mmu.c +++ b/arch/x86/hyperv/mmu.c @@ -70,9 +70,7 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus, local_irq_save(flags); cpu_last = cpumask_last(cpus); - if (cpu_last >= num_possible_cpus()) { - pr_emerg("ERROR_HYPERV: cpu_last=%*pbl", cpumask_pr_args(cpus)); - } + /* * Only check the mask _after_ interrupt has been disabled to avoid the @@ -83,6 +81,12 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus, return; } + + if (cpu_last >= num_possible_cpus()) { + pr_emerg("ERROR_HYPERV1: cpu_last=%*pbl", cpumask_pr_args(cpus)); + } + + flush_pcpu = (struct hv_tlb_flush **) this_cpu_ptr(hyperv_pcpu_input_arg); @@ -121,6 +125,13 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus, * must. We will also check all VP numbers when walking the * supplied CPU set to remain correct in all cases. 
*/ + cpu_last = cpumask_last(cpus); + + if (cpu_last >= num_possible_cpus()) { + pr_emerg("ERROR_HYPERV2: cpu_last=%*pbl", cpumask_pr_args(cpus)); + } + -----Original Message----- From: Wei Liu <wei.liu@kernel.org> Sent: Tuesday, August 17, 2021 2:30 PM To: David Mozes <david.mozes@silk.us> Cc: David Moses <mosesster@gmail.com>; Michael Kelley <mikelley@microsoft.com>; תומר אבוטבול <tomer432100@gmail.com>; linux-hyperv@vger.kernel.org; linux-kernel@vger.kernel.org; Wei Liu <wei.liu@kernel.org> Subject: Re: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() Please use the "reply all" button in your mail client and avoid top-posting. It is very difficult for me to decipher this thread... On Tue, Aug 17, 2021 at 09:16:45AM +0000, David Mozes wrote: > Hi Michael and all. > I am back from the holiday and did your suggestions/requests > > 1. While running with patch number-2 (disable the Hyper-V specific flush routines) > As you suspected, we got a panic similar to what we got with the Hyper-V specific flush routines. > Below is the trace we got: > > [32097.577728] kernel BUG at kernel/sched/rt.c:1004! > [32097.577738] invalid opcode: 0000 [#1] SMP > [32097.578711] CPU: 45 PID: 51244 Comm: STAR4BLKS0_WORK Kdump: loaded Tainted: G OE 4.19.195-KM9 #1 It seems that you have out-of-tree module(s) loaded. Please make sure they don't do anything unusual. 
> [32097.578711] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018 > [32097.578711] RIP: 0010:dequeue_top_rt_rq+0x88/0xa0 > [32097.578711] Code: 00 48 89 d5 48 0f a3 15 6e 19 82 01 73 d0 48 89 c7 e8 bc b7 fe ff be 02 00 00 00 89 ef 84 c0 74 0b e8 2c 94 04 00 eb b6 0f 0b <0f> 0b e8 b1 93 04 00 eb ab 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 > [32097.578711] RSP: 0018:ffff9442e0de7b48 EFLAGS: 00010046 > [32097.578711] RAX: ffff94809f9e1e00 RBX: ffff9448295e4c40 RCX: 00000000ffffffff > [32097.578711] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff94809f9e2040 > [32097.578711] RBP: ffff94809f9e1e00 R08: fffffffffff0be25 R09: 00000000000216c0 > [32097.578711] R10: 00004bbc85e1eff3 R11: 0000000000000000 R12: 0000000000000000 > [32097.578711] R13: ffff9448295e4a20 R14: 0000000000021e00 R15: ffff94809fa21e00 > [32097.578711] FS: 00007f7b0cea0700(0000) GS:ffff94809f940000(0000) knlGS:0000000000000000 > [32097.578711] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [32097.578711] CR2: ffffffffff600400 CR3: 000000201d5b3002 CR4: 00000000003606e0 > [32097.578711] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [32097.578711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > [32097.578711] Call Trace: > [32097.578711] dequeue_rt_stack+0x3e/0x280 > [32097.578711] dequeue_rt_entity+0x1f/0x70 > [32097.578711] dequeue_task_rt+0x26/0x70 > [32097.578711] push_rt_task+0x1e2/0x220 > [32097.578711] push_rt_tasks+0x11/0x20 > [32097.578711] __balance_callback+0x3b/0x60 > [32097.578711] __schedule+0x6e9/0x830 > [32097.578711] schedule+0x28/0x80 It looks like the scheduler is in an irrecoverable state. The stack trace does not show anything related to TLB flush, so it is unclear to me this has anything to do with the original report. Have you tried running the same setup on baremetal? > [32097.578711] futex_wait_queue_me+0xb9/0x120 > [32097.578711] futex_wait+0x139/0x250 > [32097.578711] ? 
try_to_wake_up+0x54/0x460 > [32097.578711] ? enqueue_task_rt+0x9f/0xc0 > [32097.578711] ? get_futex_key+0x2ee/0x450 > [32097.578711] do_futex+0x2eb/0x9f0 > [32097.578711] __x64_sys_futex+0x143/0x180 > [32097.578711] do_syscall_64+0x59/0x1b0 > [32097.578711] ? prepare_exit_to_usermode+0x70/0x90 > [32097.578711] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [32097.578711] RIP: 0033:0x7fa2ae151334 > [32097.578711] Code: 66 0f 1f 44 00 00 41 52 52 4d 31 d2 ba 02 00 00 00 81 f6 80 00 00 00 64 23 34 25 48 00 00 00 39 d0 75 07 b8 ca 00 00 00 0f 05 <89> d0 87 07 85 c0 75 f1 5a 41 5a c3 83 3d f1 df 20 00 00 74 59 48 > [32097.578711] RSP: 002b:00007f7b0ce9f3b0 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca > [32097.578711] RAX: ffffffffffffffda RBX: 00007f7c1da5bc18 RCX: 00007fa2ae151334 > [32097.578711] RDX: 0000000000000002 RSI: 0000000000000080 RDI: 00007f7c1da5bc58 > [32097.578711] RBP: 00007f7b0ce9f5b0 R08: 00007f7c1da5bc58 R09: 000000000000c82c > [32097.578711] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f7b1a149cf0 > [32097.578711] R13: 00007f7c1da5bc58 R14: 0000000000000001 R15: 00000000000005a1 > > > 2. as you requested and to help to the community we running patch no 1 as well : > > And that is what we got: > > Aug 17 05:36:22 10.230.247.7 [40544.392690] Hyper-V: ERROR_HYPERV: cpu_last= > > It looks like we got an empty cpumask ! Assuming this is from the patch below, the code already handles empty cpumask a few lines later. You should perhaps move your change after that to right before cpus is actually used. Wei. 
> > Would you please let us know what further info you need and what is the next step for debugging this interesting issue > > Thx > David > > > > > >> 1) Print the cpumask when < num_possible_cpus(): > >> diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c > >> index e666f7eaf32d..620f656d6195 100644 > >> --- a/arch/x86/hyperv/mmu.c > >> +++ b/arch/x86/hyperv/mmu.c > >> @@ -60,6 +60,7 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus, > >> struct hv_tlb_flush *flush; > >> u64 status = U64_MAX; > >> unsigned long flags; > >> + unsigned int cpu_last; > >> > >> trace_hyperv_mmu_flush_tlb_others(cpus, info); > >> > >> @@ -68,6 +69,11 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus, > >> > >> local_irq_save(flags); > >> > >> + cpu_last = cpumask_last(cpus); > >> + if (cpu_last > num_possible_cpus()) { > > > > I think this should be ">=" since cpus are numbered starting at zero. > > In your VM with 64 CPUs, having CPU #64 in the list would be an error. > > > >> + pr_emerg("ERROR_HYPERV: cpu_last=%*pbl", cpumask_pr_args(cpus)); > >> + } > >> + > >> /* > >> * Only check the mask _after_ interrupt has been disabled to avoid the > >> * mask changing under our feet. > >> ^ permalink raw reply related [flat|nested] 21+ messages in thread
[parent not found: <CA+qYZY1U04SkyHo7X+rDeE=nUy_X5nxLfShyuLJFzXnFp2A6uw@mail.gmail.com>]
[parent not found: <VI1PR0401MB24153DEC767B0126B1030E07F1C09@VI1PR0401MB2415.eurprd04.prod.outlook.com>]
* Re: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() [not found] ` <VI1PR0401MB24153DEC767B0126B1030E07F1C09@VI1PR0401MB2415.eurprd04.prod.outlook.com> @ 2021-08-22 15:24 ` Wei Liu 2021-08-22 16:25 ` David Mozes 0 siblings, 1 reply; 21+ messages in thread From: Wei Liu @ 2021-08-22 15:24 UTC (permalink / raw) To: David Mozes Cc: David Moses, Wei Liu, Michael Kelley, תומר אבוטבול, linux-hyperv, linux-kernel On Thu, Aug 19, 2021 at 07:55:06AM +0000, David Mozes wrote: > Hi Wei, > I moved the cpumask print to two other places, after the handling of the empty mask; see below. > And I got the following: > > > Aug 19 02:01:51 c-node05 kernel: [25936.562674] Hyper-V: ERROR_HYPERV2: cpu_last= > Aug 19 02:01:51 c-node05 kernel: [25936.562686] WARNING: CPU: 11 PID: 56432 at arch/x86/include/asm/mshyperv.h:301 hyperv_flush_tlb_others+0x23f/0x7b0 > > So we got an empty cpumask at a different place in the code. > Let me know if you need further information from us. > How do you suggest we handle this situation? > Please find a way to reproduce this issue with upstream kernels. Thanks, Wei. ^ permalink raw reply [flat|nested] 21+ messages in thread
* RE: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() 2021-08-22 15:24 ` Wei Liu @ 2021-08-22 16:25 ` David Mozes 2021-08-22 17:32 ` Wei Liu 0 siblings, 1 reply; 21+ messages in thread From: David Mozes @ 2021-08-22 16:25 UTC (permalink / raw) To: Wei Liu Cc: David Moses, Michael Kelley, תומר אבוטבול, linux-hyperv, linux-kernel This is not feasible, since we need a very high load to reproduce. We have tried a lot but can't achieve the desired load. On our kernel, with less load, it is not reproducible either. -----Original Message----- From: Wei Liu <wei.liu@kernel.org> Sent: Sunday, August 22, 2021 6:25 PM To: David Mozes <david.mozes@silk.us> Cc: David Moses <mosesster@gmail.com>; Wei Liu <wei.liu@kernel.org>; Michael Kelley <mikelley@microsoft.com>; תומר אבוטבול <tomer432100@gmail.com>; linux-hyperv@vger.kernel.org; linux-kernel@vger.kernel.org Subject: Re: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() On Thu, Aug 19, 2021 at 07:55:06AM +0000, David Mozes wrote: > Hi Wei, > I moved the cpumask print to two other places, after the handling of the empty mask; see below. > And I got the following: > > > Aug 19 02:01:51 c-node05 kernel: [25936.562674] Hyper-V: ERROR_HYPERV2: cpu_last= > Aug 19 02:01:51 c-node05 kernel: [25936.562686] WARNING: CPU: 11 PID: 56432 at arch/x86/include/asm/mshyperv.h:301 hyperv_flush_tlb_others+0x23f/0x7b0 > > So we got an empty cpumask at a different place in the code. > Let me know if you need further information from us. > How do you suggest we handle this situation? > Please find a way to reproduce this issue with upstream kernels. Thanks, Wei. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() 2021-08-22 16:25 ` David Mozes @ 2021-08-22 17:32 ` Wei Liu 0 siblings, 0 replies; 21+ messages in thread From: Wei Liu @ 2021-08-22 17:32 UTC (permalink / raw) To: David Mozes Cc: Wei Liu, David Moses, Michael Kelley, תומר אבוטבול, linux-hyperv, linux-kernel On Sun, Aug 22, 2021 at 04:25:19PM +0000, David Mozes wrote: > This is not feasible, since we need a very high load to reproduce. > We have tried a lot but can't achieve the desired load. > On our kernel, with less load, it is not reproducible either. There isn't much upstream can do if there is no way to reproduce the issue with an upstream kernel. You can check all the code paths which may modify the cpumask and analyze them. KCSAN may be useful too, but that's only available in 5.8 and later. Thanks, Wei. > > -----Original Message----- > From: Wei Liu <wei.liu@kernel.org> > Sent: Sunday, August 22, 2021 6:25 PM > To: David Mozes <david.mozes@silk.us> > Cc: David Moses <mosesster@gmail.com>; Wei Liu <wei.liu@kernel.org>; Michael Kelley <mikelley@microsoft.com>; תומר אבוטבול <tomer432100@gmail.com>; linux-hyperv@vger.kernel.org; linux-kernel@vger.kernel.org > Subject: Re: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() > > On Thu, Aug 19, 2021 at 07:55:06AM +0000, David Mozes wrote: > > Hi Wei, > > I moved the cpumask print to two other places, after the handling of the empty mask; see below. > > And I got the following: > > > > > > Aug 19 02:01:51 c-node05 kernel: [25936.562674] Hyper-V: ERROR_HYPERV2: cpu_last= > > Aug 19 02:01:51 c-node05 kernel: [25936.562686] WARNING: CPU: 11 PID: 56432 at arch/x86/include/asm/mshyperv.h:301 hyperv_flush_tlb_others+0x23f/0x7b0 > > > > So we got an empty cpumask at a different place in the code. > > Let me know if you need further information from us. > > How do you suggest we handle this situation? > > > > Please find a way to reproduce this issue with upstream kernels. 
> > Thanks, > Wei. ^ permalink raw reply [flat|nested] 21+ messages in thread
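[Editor's note: for anyone wanting to try Wei's KCSAN suggestion on a 5.8-or-later kernel, the relevant knobs are roughly the following. This is a hedged sketch: exact option names and compiler requirements vary by kernel version, so check Documentation/dev-tools/kcsan.rst in your tree.]

```shell
# .config fragment for the Kernel Concurrency Sanitizer (v5.8+).
# KCSAN needs a compiler with -fsanitize=thread support for the kernel
# (recent Clang, or GCC 11+); your tree's Kconfig enforces the details.
CONFIG_KCSAN=y
# Optional tuning (names as of ~v5.10; verify against your tree):
# CONFIG_KCSAN_EARLY_ENABLE=y          # start watching races at boot
# CONFIG_KCSAN_REPORT_ONCE_IN_MS=3000  # rate-limit duplicate reports
```

Data races on the caller-owned cpumask passed to hyperv_flush_tlb_others() are exactly the kind of unannotated concurrent access KCSAN is designed to flag.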
[parent not found: <VI1PR0401MB24150B31A1D63176BBB788D2F1F19@VI1PR0401MB2415.eurprd04.prod.outlook.com>]
* RE: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() [not found] <VI1PR0401MB24150B31A1D63176BBB788D2F1F19@VI1PR0401MB2415.eurprd04.prod.outlook.com> @ 2021-08-05 18:08 ` Michael Kelley 0 siblings, 0 replies; 21+ messages in thread From: Michael Kelley @ 2021-08-05 18:08 UTC (permalink / raw) To: David Mozes, 20201001013814.2435935-1-sashal, linux-kernel, linux-hyperv From: David Mozes <david.mozes@silk.us> > > Hi, > The problem is happening to me very frequently on kernel 4.19.195 > David -- could you give us a little more context? Were you running earlier 4.19.xxx versions and did not see this problem? There was a timing problem in hyperv_flush_tlb_others() that was fixed in early January 2021. The fix was backported to the 4.19 longterm tree, and should be included in 4.19.195. Outside of that, I'm not aware of a problem in this area. For completeness, what version of Hyper-V are you using? And how many vCPUs in your VM? Michael > > > Aug 4 03:59:01 c-node04 kernel: [36976.388554] BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_others+0xec9/0x1640 > Aug 4 03:59:01 c-node04 kernel: [36976.388556] Read of size 4 at addr ffff889e5e127440 by task ps/52478 > Aug 4 03:59:01 c-node04 kernel: [36976.388556] > Aug 4 03:59:01 c-node04 kernel: [36976.388560] CPU: 4 PID: 52478 Comm: ps Kdump: loaded Tainted: G W OE > 4.19.195-KM9 #1 > Aug 4 03:59:01 c-node04 kernel: [36976.388562] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, > BIOS 090008 12/07/2018 > Aug 4 03:59:01 c-node04 kernel: [36976.388562] Call Trace: > Aug 4 03:59:01 c-node04 kernel: [36976.388569] dump_stack+0x11d/0x1a7 > Aug 4 03:59:01 c-node04 kernel: [36976.388572] ? dump_stack_print_info.cold.0+0x1b/0x1b > Aug 4 03:59:01 c-node04 kernel: [36976.388576] ? percpu_ref_tryget_live+0x2f0/0x2f0 > Aug 4 03:59:01 c-node04 kernel: [36976.388580] ? rb_erase_cached+0xc4c/0x2880 > Aug 4 03:59:01 c-node04 kernel: [36976.388584] ? 
printk+0x9f/0xc5 > Aug 4 03:59:01 c-node04 kernel: [36976.388585] ? snapshot_ioctl.cold.1+0x74/0x74 > Aug 4 03:59:01 c-node04 kernel: [36976.388590] print_address_description+0x65/0x22e > Aug 4 03:59:01 c-node04 kernel: [36976.388592] kasan_report.cold.6+0x243/0x2ff > Aug 4 03:59:01 c-node04 kernel: [36976.388594] ? hyperv_flush_tlb_others+0xec9/0x1640 > Aug 4 03:59:01 c-node04 kernel: [36976.388596] hyperv_flush_tlb_others+0xec9/0x1640 > Aug 4 03:59:01 c-node04 kernel: [36976.388601] ? > trace_event_raw_event_hyperv_nested_flush_guest_mapping+0x1b0/0x1b0 > Aug 4 03:59:01 c-node04 kernel: [36976.388603] ? mem_cgroup_try_charge+0x3cc/0x7d0 > Aug 4 03:59:01 c-node04 kernel: [36976.388608] flush_tlb_mm_range+0x25c/0x370 > Aug 4 03:59:01 c-node04 kernel: [36976.388611] ? native_flush_tlb_others+0x3b0/0x3b0 > Aug 4 03:59:01 c-node04 kernel: [36976.388616] ptep_clear_flush+0x192/0x1d0 > Aug 4 03:59:01 c-node04 kernel: [36976.388618] ? pmd_clear_bad+0x70/0x70 > Aug 4 03:59:01 c-node04 kernel: [36976.388622] wp_page_copy+0x861/0x1a30 > Aug 4 03:59:01 c-node04 kernel: [36976.388624] ? follow_pfn+0x2f0/0x2f0 > Aug 4 03:59:01 c-node04 kernel: [36976.388627] ? active_load_balance_cpu_stop+0x10d0/0x10d0 > Aug 4 03:59:01 c-node04 kernel: [36976.388632] ? get_page_from_freelist+0x330c/0x4660 > Aug 4 03:59:01 c-node04 kernel: [36976.388638] ? activate_page+0x660/0x660 > Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? rb_erase+0x2a40/0x2a40 > Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? wake_up_page_bit+0x4d0/0x4d0 > Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? unwind_next_frame+0x113e/0x1920 > Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? __pte_alloc_kernel+0x350/0x350 > Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? deref_stack_reg+0x130/0x130 > Aug 4 03:59:01 c-node04 kernel: [36976.388639] do_wp_page+0x461/0x1ca0 > Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? deref_stack_reg+0x130/0x130 > Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? 
finish_mkwrite_fault+0x710/0x710 > Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? unwind_next_frame+0x105d/0x1920 > Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 > Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? __pte_alloc_kernel+0x350/0x350 > Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? __zone_watermark_ok+0x33c/0x640 > Aug 4 03:59:01 c-node04 kernel: [36976.388639] ? _raw_spin_lock+0x13/0x30 ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() @ 2020-10-01 1:38 Sasha Levin 2020-10-01 9:40 ` Vitaly Kuznetsov 0 siblings, 1 reply; 21+ messages in thread From: Sasha Levin @ 2020-10-01 1:38 UTC (permalink / raw) To: kys, haiyangz, sthemmin, wei.liu Cc: tglx, mingo, bp, x86, hpa, vkuznets, mikelley, linux-hyperv, linux-kernel, Sasha Levin, stable cpumask can change underneath us, which is generally safe except when we call into hv_cpu_number_to_vp_number(): if cpumask ends up empty we pass num_cpu_possible() into hv_cpu_number_to_vp_number(), causing it to read garbage. As reported by KASAN: [ 83.504763] BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_others (include/asm-generic/mshyperv.h:128 arch/x86/hyperv/mmu.c:112) [ 83.908636] Read of size 4 at addr ffff888267c01370 by task kworker/u8:2/106 [ 84.196669] CPU: 0 PID: 106 Comm: kworker/u8:2 Tainted: G W 5.4.60 #1 [ 84.196669] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018 [ 84.196669] Workqueue: writeback wb_workfn (flush-8:0) [ 84.196669] Call Trace: [ 84.196669] dump_stack (lib/dump_stack.c:120) [ 84.196669] print_address_description.constprop.0 (mm/kasan/report.c:375) [ 84.196669] __kasan_report.cold (mm/kasan/report.c:507) [ 84.196669] kasan_report (arch/x86/include/asm/smap.h:71 mm/kasan/common.c:635) [ 84.196669] hyperv_flush_tlb_others (include/asm-generic/mshyperv.h:128 arch/x86/hyperv/mmu.c:112) [ 84.196669] flush_tlb_mm_range (arch/x86/include/asm/paravirt.h:68 arch/x86/mm/tlb.c:798) [ 84.196669] ptep_clear_flush (arch/x86/include/asm/tlbflush.h:586 mm/pgtable-generic.c:88) Fixes: 0e4c88f37693 ("x86/hyper-v: Use cheaper HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} hypercalls when possible") Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: stable@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org> --- arch/x86/hyperv/mmu.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/x86/hyperv/mmu.c 
b/arch/x86/hyperv/mmu.c index 5208ba49c89a9..b1d6afc5fc4a3 100644 --- a/arch/x86/hyperv/mmu.c +++ b/arch/x86/hyperv/mmu.c @@ -109,7 +109,9 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus, * must. We will also check all VP numbers when walking the * supplied CPU set to remain correct in all cases. */ - if (hv_cpu_number_to_vp_number(cpumask_last(cpus)) >= 64) + int last = cpumask_last(cpus); + + if (last < num_possible_cpus() && hv_cpu_number_to_vp_number(last) >= 64) goto do_ex_hypercall; for_each_cpu(cpu, cpus) { -- 2.25.1 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() 2020-10-01 1:38 Sasha Levin @ 2020-10-01 9:40 ` Vitaly Kuznetsov 2020-10-01 11:53 ` Wei Liu 0 siblings, 1 reply; 21+ messages in thread From: Vitaly Kuznetsov @ 2020-10-01 9:40 UTC (permalink / raw) To: Sasha Levin Cc: tglx, mingo, bp, x86, hpa, mikelley, linux-hyperv, linux-kernel, Sasha Levin, stable, kys, haiyangz, sthemmin, wei.liu Sasha Levin <sashal@kernel.org> writes: > cpumask can change underneath us, which is generally safe except when we > call into hv_cpu_number_to_vp_number(): if cpumask ends up empty we pass > num_cpu_possible() into hv_cpu_number_to_vp_number(), causing it to read > garbage. As reported by KASAN: > > [ 83.504763] BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_others (include/asm-generic/mshyperv.h:128 arch/x86/hyperv/mmu.c:112) > [ 83.908636] Read of size 4 at addr ffff888267c01370 by task kworker/u8:2/106 > [ 84.196669] CPU: 0 PID: 106 Comm: kworker/u8:2 Tainted: G W 5.4.60 #1 > [ 84.196669] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018 > [ 84.196669] Workqueue: writeback wb_workfn (flush-8:0) > [ 84.196669] Call Trace: > [ 84.196669] dump_stack (lib/dump_stack.c:120) > [ 84.196669] print_address_description.constprop.0 (mm/kasan/report.c:375) > [ 84.196669] __kasan_report.cold (mm/kasan/report.c:507) > [ 84.196669] kasan_report (arch/x86/include/asm/smap.h:71 mm/kasan/common.c:635) > [ 84.196669] hyperv_flush_tlb_others (include/asm-generic/mshyperv.h:128 arch/x86/hyperv/mmu.c:112) > [ 84.196669] flush_tlb_mm_range (arch/x86/include/asm/paravirt.h:68 arch/x86/mm/tlb.c:798) > [ 84.196669] ptep_clear_flush (arch/x86/include/asm/tlbflush.h:586 mm/pgtable-generic.c:88) > > Fixes: 0e4c88f37693 ("x86/hyper-v: Use cheaper HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} hypercalls when possible") > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> > Cc: stable@kernel.org > Signed-off-by: Sasha Levin 
<sashal@kernel.org> > --- > arch/x86/hyperv/mmu.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c > index 5208ba49c89a9..b1d6afc5fc4a3 100644 > --- a/arch/x86/hyperv/mmu.c > +++ b/arch/x86/hyperv/mmu.c > @@ -109,7 +109,9 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus, > * must. We will also check all VP numbers when walking the > * supplied CPU set to remain correct in all cases. > */ > - if (hv_cpu_number_to_vp_number(cpumask_last(cpus)) >= 64) > + int last = cpumask_last(cpus); > + > + if (last < num_possible_cpus() && hv_cpu_number_to_vp_number(last) >= 64) > goto do_ex_hypercall; In case 'cpus' can end up being empty (I'm genuinely surprised it can) the check is mandatory indeed. I would, however, just return directly in this case: if (last < num_possible_cpus()) return; if (hv_cpu_number_to_vp_number(last) >= 64) goto do_ex_hypercall; as there's nothing to flush, no need to call into hyperv_flush_tlb_others_ex(). Anyway, the fix seems to be correct, so Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> > > for_each_cpu(cpu, cpus) { -- Vitaly ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() 2020-10-01 9:40 ` Vitaly Kuznetsov @ 2020-10-01 11:53 ` Wei Liu 2020-10-01 13:04 ` Sasha Levin 2020-10-01 13:10 ` Vitaly Kuznetsov 0 siblings, 2 replies; 21+ messages in thread From: Wei Liu @ 2020-10-01 11:53 UTC (permalink / raw) To: Vitaly Kuznetsov Cc: Sasha Levin, tglx, mingo, bp, x86, hpa, mikelley, linux-hyperv, linux-kernel, stable, kys, haiyangz, sthemmin, wei.liu On Thu, Oct 01, 2020 at 11:40:04AM +0200, Vitaly Kuznetsov wrote: > Sasha Levin <sashal@kernel.org> writes: > > > cpumask can change underneath us, which is generally safe except when we > > call into hv_cpu_number_to_vp_number(): if cpumask ends up empty we pass > > num_cpu_possible() into hv_cpu_number_to_vp_number(), causing it to read > > garbage. As reported by KASAN: > > > > [ 83.504763] BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_others (include/asm-generic/mshyperv.h:128 arch/x86/hyperv/mmu.c:112) > > [ 83.908636] Read of size 4 at addr ffff888267c01370 by task kworker/u8:2/106 > > [ 84.196669] CPU: 0 PID: 106 Comm: kworker/u8:2 Tainted: G W 5.4.60 #1 > > [ 84.196669] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018 > > [ 84.196669] Workqueue: writeback wb_workfn (flush-8:0) > > [ 84.196669] Call Trace: > > [ 84.196669] dump_stack (lib/dump_stack.c:120) > > [ 84.196669] print_address_description.constprop.0 (mm/kasan/report.c:375) > > [ 84.196669] __kasan_report.cold (mm/kasan/report.c:507) > > [ 84.196669] kasan_report (arch/x86/include/asm/smap.h:71 mm/kasan/common.c:635) > > [ 84.196669] hyperv_flush_tlb_others (include/asm-generic/mshyperv.h:128 arch/x86/hyperv/mmu.c:112) > > [ 84.196669] flush_tlb_mm_range (arch/x86/include/asm/paravirt.h:68 arch/x86/mm/tlb.c:798) > > [ 84.196669] ptep_clear_flush (arch/x86/include/asm/tlbflush.h:586 mm/pgtable-generic.c:88) > > > > Fixes: 0e4c88f37693 ("x86/hyper-v: Use cheaper 
HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} hypercalls when possible") > > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> > > Cc: stable@kernel.org > > Signed-off-by: Sasha Levin <sashal@kernel.org> > > --- > > arch/x86/hyperv/mmu.c | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c > > index 5208ba49c89a9..b1d6afc5fc4a3 100644 > > --- a/arch/x86/hyperv/mmu.c > > +++ b/arch/x86/hyperv/mmu.c > > @@ -109,7 +109,9 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus, > > * must. We will also check all VP numbers when walking the > > * supplied CPU set to remain correct in all cases. > > */ > > - if (hv_cpu_number_to_vp_number(cpumask_last(cpus)) >= 64) > > + int last = cpumask_last(cpus); > > + > > + if (last < num_possible_cpus() && hv_cpu_number_to_vp_number(last) >= 64) > > goto do_ex_hypercall; > > In case 'cpus' can end up being empty (I'm genuinely surprised it can) > the check is mandatory indeed. I would, however, just return directly in > this case: > > if (last < num_possible_cpus()) > return; I think you want last >= num_possible_cpus() here? A more important question is, if the mask can change willy-nilly, what is stopping it from changing between these checks? I.e. is there still a window that hv_cpu_number_to_vp_number(last) can return garbage? Wei. > > if (hv_cpu_number_to_vp_number(last) >= 64) > goto do_ex_hypercall; > > as there's nothing to flush, no need to call into > hyperv_flush_tlb_others_ex(). > > Anyway, the fix seems to be correct, so > > Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> > > > > > > for_each_cpu(cpu, cpus) { > > -- > Vitaly > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() 2020-10-01 11:53 ` Wei Liu @ 2020-10-01 13:04 ` Sasha Levin 2020-10-03 17:40 ` Michael Kelley 2020-10-01 13:10 ` Vitaly Kuznetsov 1 sibling, 1 reply; 21+ messages in thread From: Sasha Levin @ 2020-10-01 13:04 UTC (permalink / raw) To: Wei Liu Cc: Vitaly Kuznetsov, tglx, mingo, bp, x86, hpa, mikelley, linux-hyperv, linux-kernel, stable, kys, haiyangz, sthemmin On Thu, Oct 01, 2020 at 11:53:59AM +0000, Wei Liu wrote: >On Thu, Oct 01, 2020 at 11:40:04AM +0200, Vitaly Kuznetsov wrote: >> Sasha Levin <sashal@kernel.org> writes: >> >> > cpumask can change underneath us, which is generally safe except when we >> > call into hv_cpu_number_to_vp_number(): if cpumask ends up empty we pass >> > num_cpu_possible() into hv_cpu_number_to_vp_number(), causing it to read >> > garbage. As reported by KASAN: >> > >> > [ 83.504763] BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_others (include/asm-generic/mshyperv.h:128 arch/x86/hyperv/mmu.c:112) >> > [ 83.908636] Read of size 4 at addr ffff888267c01370 by task kworker/u8:2/106 >> > [ 84.196669] CPU: 0 PID: 106 Comm: kworker/u8:2 Tainted: G W 5.4.60 #1 >> > [ 84.196669] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018 >> > [ 84.196669] Workqueue: writeback wb_workfn (flush-8:0) >> > [ 84.196669] Call Trace: >> > [ 84.196669] dump_stack (lib/dump_stack.c:120) >> > [ 84.196669] print_address_description.constprop.0 (mm/kasan/report.c:375) >> > [ 84.196669] __kasan_report.cold (mm/kasan/report.c:507) >> > [ 84.196669] kasan_report (arch/x86/include/asm/smap.h:71 mm/kasan/common.c:635) >> > [ 84.196669] hyperv_flush_tlb_others (include/asm-generic/mshyperv.h:128 arch/x86/hyperv/mmu.c:112) >> > [ 84.196669] flush_tlb_mm_range (arch/x86/include/asm/paravirt.h:68 arch/x86/mm/tlb.c:798) >> > [ 84.196669] ptep_clear_flush (arch/x86/include/asm/tlbflush.h:586 mm/pgtable-generic.c:88) >> > >> > Fixes: 
0e4c88f37693 ("x86/hyper-v: Use cheaper HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} hypercalls when possible") >> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> >> > Cc: stable@kernel.org >> > Signed-off-by: Sasha Levin <sashal@kernel.org> >> > --- >> > arch/x86/hyperv/mmu.c | 4 +++- >> > 1 file changed, 3 insertions(+), 1 deletion(-) >> > >> > diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c >> > index 5208ba49c89a9..b1d6afc5fc4a3 100644 >> > --- a/arch/x86/hyperv/mmu.c >> > +++ b/arch/x86/hyperv/mmu.c >> > @@ -109,7 +109,9 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus, >> > * must. We will also check all VP numbers when walking the >> > * supplied CPU set to remain correct in all cases. >> > */ >> > - if (hv_cpu_number_to_vp_number(cpumask_last(cpus)) >= 64) >> > + int last = cpumask_last(cpus); >> > + >> > + if (last < num_possible_cpus() && hv_cpu_number_to_vp_number(last) >= 64) >> > goto do_ex_hypercall; >> >> In case 'cpus' can end up being empty (I'm genuinely surprised it can) I was just as surprised as you and spent the good part of a day debugging this. However, a: WARN_ON(cpumask_empty(cpus)); triggers at that line of code even though we check for cpumask_empty() at the entry of the function. >> the check is mandatory indeed. I would, however, just return directly in >> this case: Makes sense. >> if (last < num_possible_cpus()) >> return; > >I think you want > > last >= num_possible_cpus() > >here? > >A more important question is, if the mask can change willy-nilly, what >is stopping it from changing between these checks? I.e. is there still a >window that hv_cpu_number_to_vp_number(last) can return garbage? It's not that hv_cpu_number_to_vp_number() returns garbage, the issue is that we feed it garbage. hv_cpu_number_to_vp_number() expects that the input would be in the range of 0 <= X < num_possible_cpus(), and here if 'cpus' was empty we would pass in X==num_possible_cpus() making it read out of bounds. 
Maybe it's worthwhile to add a WARN_ON() into hv_cpu_number_to_vp_number() to assert as well. -- Thanks, Sasha ^ permalink raw reply [flat|nested] 21+ messages in thread
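The bounds check Sasha suggests adding to hv_cpu_number_to_vp_number() can be sketched as a self-contained userspace model. The table size, its contents, and the `_model` names below are illustrative stand-ins, not the real arch/x86 definitions:

```c
/* Userspace sketch of the proposed guard in hv_cpu_number_to_vp_number().
 * NUM_POSSIBLE_CPUS, VP_INVAL, and the table contents are hypothetical
 * stand-ins for the kernel's num_possible_cpus()/hv_vp_index[]. */
#define NUM_POSSIBLE_CPUS 4
#define VP_INVAL 0xffffffffu

static const unsigned int hv_vp_index_model[NUM_POSSIBLE_CPUS] = { 0, 1, 2, 3 };

static unsigned int cpu_number_to_vp_number_model(int cpu_number)
{
	/* cpumask_last() on an empty mask yields num_possible_cpus(),
	 * which is one past the last valid index: reject it here rather
	 * than reading past the end of the table. */
	if (cpu_number < 0 || cpu_number >= NUM_POSSIBLE_CPUS)
		return VP_INVAL;
	return hv_vp_index_model[cpu_number];
}
```

With this guard, the empty-mask case that KASAN flagged returns a sentinel instead of out-of-bounds garbage.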
* RE: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() 2020-10-01 13:04 ` Sasha Levin @ 2020-10-03 17:40 ` Michael Kelley 2020-10-05 14:58 ` Wei Liu 0 siblings, 1 reply; 21+ messages in thread From: Michael Kelley @ 2020-10-03 17:40 UTC (permalink / raw) To: Sasha Levin, Wei Liu Cc: vkuznets, tglx, mingo, bp, x86, hpa, linux-hyperv, linux-kernel, stable, KY Srinivasan, Haiyang Zhang, Stephen Hemminger From: Sasha Levin <sashal@kernel.org> Sent: Thursday, October 1, 2020 6:04 AM > > On Thu, Oct 01, 2020 at 11:53:59AM +0000, Wei Liu wrote: > >On Thu, Oct 01, 2020 at 11:40:04AM +0200, Vitaly Kuznetsov wrote: > >> Sasha Levin <sashal@kernel.org> writes: > >> > >> > cpumask can change underneath us, which is generally safe except when we > >> > call into hv_cpu_number_to_vp_number(): if cpumask ends up empty we pass > >> > num_cpu_possible() into hv_cpu_number_to_vp_number(), causing it to read > >> > garbage. As reported by KASAN: > >> > > >> > [ 83.504763] BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_others > (include/asm-generic/mshyperv.h:128 arch/x86/hyperv/mmu.c:112) > >> > [ 83.908636] Read of size 4 at addr ffff888267c01370 by task kworker/u8:2/106 > >> > [ 84.196669] CPU: 0 PID: 106 Comm: kworker/u8:2 Tainted: G W 5.4.60 #1 > >> > [ 84.196669] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, > BIOS 090008 12/07/2018 > >> > [ 84.196669] Workqueue: writeback wb_workfn (flush-8:0) > >> > [ 84.196669] Call Trace: > >> > [ 84.196669] dump_stack (lib/dump_stack.c:120) > >> > [ 84.196669] print_address_description.constprop.0 (mm/kasan/report.c:375) > >> > [ 84.196669] __kasan_report.cold (mm/kasan/report.c:507) > >> > [ 84.196669] kasan_report (arch/x86/include/asm/smap.h:71 > mm/kasan/common.c:635) > >> > [ 84.196669] hyperv_flush_tlb_others (include/asm-generic/mshyperv.h:128 > arch/x86/hyperv/mmu.c:112) > >> > [ 84.196669] flush_tlb_mm_range (arch/x86/include/asm/paravirt.h:68 > arch/x86/mm/tlb.c:798) > >> 
> [ 84.196669] ptep_clear_flush (arch/x86/include/asm/tlbflush.h:586 mm/pgtable- > generic.c:88) > >> > > >> > Fixes: 0e4c88f37693 ("x86/hyper-v: Use cheaper > HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} hypercalls when possible") > >> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> > >> > Cc: stable@kernel.org > >> > Signed-off-by: Sasha Levin <sashal@kernel.org> > >> > --- > >> > arch/x86/hyperv/mmu.c | 4 +++- > >> > 1 file changed, 3 insertions(+), 1 deletion(-) > >> > > >> > diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c > >> > index 5208ba49c89a9..b1d6afc5fc4a3 100644 > >> > --- a/arch/x86/hyperv/mmu.c > >> > +++ b/arch/x86/hyperv/mmu.c > >> > @@ -109,7 +109,9 @@ static void hyperv_flush_tlb_others(const struct cpumask > *cpus, > >> > * must. We will also check all VP numbers when walking the > >> > * supplied CPU set to remain correct in all cases. > >> > */ > >> > - if (hv_cpu_number_to_vp_number(cpumask_last(cpus)) >= 64) > >> > + int last = cpumask_last(cpus); > >> > + > >> > + if (last < num_possible_cpus() && hv_cpu_number_to_vp_number(last) >= > 64) > >> > goto do_ex_hypercall; > >> > >> In case 'cpus' can end up being empty (I'm genuinely suprised it can) > > I was just as surprised as you and spent the good part of a day > debugging this. However, a: > > WARN_ON(cpumask_empty(cpus)); > > triggers at that line of code even though we check for cpumask_empty() > at the entry of the function. What does the call stack look like when this triggers? I'm curious about the path where the 'cpus' could be changing while the flush call is in progress. I wonder if CPUs could ever be added to the mask? Removing CPUs can be handled with some care because an unnecessary flush doesn't hurt anything. But adding CPUs has serious correctness problems. > > >> the check is mandatory indeed. I would, however, just return directly in > >> this case: > > Makes sense. But need to do a local_irq_restore() before returning. 
> > >> if (last < num_possible_cpus()) > >> return; > > > >I think you want > > > > last >= num_possible_cpus() > > > >here? Yes, but also the && must become || > > > >A more important question is, if the mask can change willy-nilly, what > >is stopping it from changing between these checks? I.e. is there still a > >windows that hv_cpu_number_to_vp_number(last) can return garbage? > > It's not that hv_cpu_number_to_vp_number() returns garbage, the issue is > that we feed it garbage. > > hv_cpu_number_to_vp_number() expects that the input would be in the > range of 0 <= X < num_possible_cpus(), and here if 'cpus' was empty we > would pass in X==num_possible_cpus() making it read out of bound. > > Maybe it's worthwhile to add a WARN_ON() into > hv_cpu_number_to_vp_number() to assert as well. If the input cpumask can be changing, the other risk is the for_each_cpu() loop, which also has a call to hv_cpu_number_to_vp_number(). But looking at the implementation of for_each_cpu(), it will always return an in-bounds value, so everything should be OK. > > -- > Thanks, > Sasha ^ permalink raw reply [flat|nested] 21+ messages in thread
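Michael's point about the for_each_cpu() walk can be illustrated with a small userspace model: an iterator over set bits only ever yields in-range indices, and on an empty mask the loop body simply never runs. The names and bit width here are illustrative, not the kernel's bitmap implementation:

```c
/* Model of a for_each_cpu()-style walk: every index produced is the
 * position of a set bit, hence always < MODEL_NBITS. Illustrative
 * userspace code, not the kernel's find_next_bit(). */
#define MODEL_NBITS 8

/* Returns the next set bit at or after 'start', or MODEL_NBITS if none. */
static int next_set_bit(unsigned int mask, int start)
{
	for (int i = start; i < MODEL_NBITS; i++)
		if (mask & (1u << i))
			return i;
	return MODEL_NBITS;
}

/* Counts iterations of the walk; an empty mask yields zero iterations,
 * so no out-of-range index is ever used inside the loop body. */
static int count_visited(unsigned int mask)
{
	int n = 0;

	for (int i = next_set_bit(mask, 0); i < MODEL_NBITS;
	     i = next_set_bit(mask, i + 1))
		n++;
	return n;
}
```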
* Re: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() 2020-10-03 17:40 ` Michael Kelley @ 2020-10-05 14:58 ` Wei Liu 2021-01-05 16:59 ` Michael Kelley 0 siblings, 1 reply; 21+ messages in thread From: Wei Liu @ 2020-10-05 14:58 UTC (permalink / raw) To: Michael Kelley Cc: Sasha Levin, Wei Liu, vkuznets, tglx, mingo, bp, x86, hpa, linux-hyperv, linux-kernel, stable, KY Srinivasan, Haiyang Zhang, Stephen Hemminger On Sat, Oct 03, 2020 at 05:40:15PM +0000, Michael Kelley wrote: > From: Sasha Levin <sashal@kernel.org> Sent: Thursday, October 1, 2020 6:04 AM > > > > On Thu, Oct 01, 2020 at 11:53:59AM +0000, Wei Liu wrote: > > >On Thu, Oct 01, 2020 at 11:40:04AM +0200, Vitaly Kuznetsov wrote: > > >> Sasha Levin <sashal@kernel.org> writes: > > >> > > >> > cpumask can change underneath us, which is generally safe except when we > > >> > call into hv_cpu_number_to_vp_number(): if cpumask ends up empty we pass > > >> > num_cpu_possible() into hv_cpu_number_to_vp_number(), causing it to read > > >> > garbage. 
As reported by KASAN: > > >> > > > >> > [ 83.504763] BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_others > > (include/asm-generic/mshyperv.h:128 arch/x86/hyperv/mmu.c:112) > > >> > [ 83.908636] Read of size 4 at addr ffff888267c01370 by task kworker/u8:2/106 > > >> > [ 84.196669] CPU: 0 PID: 106 Comm: kworker/u8:2 Tainted: G W 5.4.60 #1 > > >> > [ 84.196669] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, > > BIOS 090008 12/07/2018 > > >> > [ 84.196669] Workqueue: writeback wb_workfn (flush-8:0) > > >> > [ 84.196669] Call Trace: > > >> > [ 84.196669] dump_stack (lib/dump_stack.c:120) > > >> > [ 84.196669] print_address_description.constprop.0 (mm/kasan/report.c:375) > > >> > [ 84.196669] __kasan_report.cold (mm/kasan/report.c:507) > > >> > [ 84.196669] kasan_report (arch/x86/include/asm/smap.h:71 > > mm/kasan/common.c:635) > > >> > [ 84.196669] hyperv_flush_tlb_others (include/asm-generic/mshyperv.h:128 > > arch/x86/hyperv/mmu.c:112) > > >> > [ 84.196669] flush_tlb_mm_range (arch/x86/include/asm/paravirt.h:68 > > arch/x86/mm/tlb.c:798) > > >> > [ 84.196669] ptep_clear_flush (arch/x86/include/asm/tlbflush.h:586 mm/pgtable- > > generic.c:88) > > >> > > > >> > Fixes: 0e4c88f37693 ("x86/hyper-v: Use cheaper > > HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} hypercalls when possible") > > >> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> > > >> > Cc: stable@kernel.org > > >> > Signed-off-by: Sasha Levin <sashal@kernel.org> > > >> > --- > > >> > arch/x86/hyperv/mmu.c | 4 +++- > > >> > 1 file changed, 3 insertions(+), 1 deletion(-) > > >> > > > >> > diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c > > >> > index 5208ba49c89a9..b1d6afc5fc4a3 100644 > > >> > --- a/arch/x86/hyperv/mmu.c > > >> > +++ b/arch/x86/hyperv/mmu.c > > >> > @@ -109,7 +109,9 @@ static void hyperv_flush_tlb_others(const struct cpumask > > *cpus, > > >> > * must. We will also check all VP numbers when walking the > > >> > * supplied CPU set to remain correct in all cases. 
> > >> > */ > > >> > - if (hv_cpu_number_to_vp_number(cpumask_last(cpus)) >= 64) > > >> > + int last = cpumask_last(cpus); > > >> > + > > >> > + if (last < num_possible_cpus() && hv_cpu_number_to_vp_number(last) >= > > 64) > > >> > goto do_ex_hypercall; > > >> > > >> In case 'cpus' can end up being empty (I'm genuinely suprised it can) > > > > I was just as surprised as you and spent the good part of a day > > debugging this. However, a: > > > > WARN_ON(cpumask_empty(cpus)); > > > > triggers at that line of code even though we check for cpumask_empty() > > at the entry of the function. > > What does the call stack look like when this triggers? I'm curious about > the path where the 'cpus' could be changing while the flush call is in > progress. > > I wonder if CPUs could ever be added to the mask? Removing CPUs can > be handled with some care because an unnecessary flush doesn't hurt > anything. But adding CPUs has serious correctness problems. > The cpumask_empty check is done before disabling irq. Is it possible the mask is modified by an interrupt? If there is a reliable way to trigger this bug, we may be able to test the following patch. diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c index 5208ba49c89a..23fa08d24c1a 100644 --- a/arch/x86/hyperv/mmu.c +++ b/arch/x86/hyperv/mmu.c @@ -66,11 +66,13 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus, if (!hv_hypercall_pg) goto do_native; - if (cpumask_empty(cpus)) - return; - local_irq_save(flags); + if (cpumask_empty(cpus)) { + local_irq_restore(flags); + return; + } + flush_pcpu = (struct hv_tlb_flush **) this_cpu_ptr(hyperv_pcpu_input_arg); ^ permalink raw reply related [flat|nested] 21+ messages in thread
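The control flow of the patch above can be modeled in plain C. The irq helpers below are stubs standing in for local_irq_save()/local_irq_restore(); this is a sketch of the reordering being tested, not the kernel function:

```c
/* Model of the reordered entry path: the empty-mask check happens only
 * after interrupts are "disabled", and the early return restores them.
 * The _stub helpers are hypothetical stand-ins for the kernel's
 * local_irq_save()/local_irq_restore(). */
#include <stdbool.h>

static int irq_disable_depth;

static void local_irq_save_stub(void)    { irq_disable_depth++; }
static void local_irq_restore_stub(void) { irq_disable_depth--; }

/* Returns true if a flush hypercall would have been issued. */
static bool flush_tlb_others_model(bool mask_empty)
{
	local_irq_save_stub();

	/* Checked under "irqs disabled": an interrupt can no longer
	 * empty the mask between this check and the hypercall. */
	if (mask_empty) {
		local_irq_restore_stub();  /* unwind before returning */
		return false;
	}

	/* ... build hypercall input and issue the flush here ... */
	local_irq_restore_stub();
	return true;
}
```

Both exit paths leave the irq state balanced, which is the detail Michael flagged earlier ("need to do a local_irq_restore() before returning").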
* RE: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() 2020-10-05 14:58 ` Wei Liu @ 2021-01-05 16:59 ` Michael Kelley 2021-01-05 17:10 ` Wei Liu 2021-01-08 15:22 ` Sasha Levin 0 siblings, 2 replies; 21+ messages in thread From: Michael Kelley @ 2021-01-05 16:59 UTC (permalink / raw) To: Wei Liu Cc: Sasha Levin, vkuznets, tglx, mingo, bp, x86, hpa, linux-hyperv, linux-kernel, stable, KY Srinivasan, Haiyang Zhang, Stephen Hemminger From: Wei Liu <wei.liu@kernel.org> Sent: Monday, October 5, 2020 7:59 AM > > On Sat, Oct 03, 2020 at 05:40:15PM +0000, Michael Kelley wrote: > > From: Sasha Levin <sashal@kernel.org> Sent: Thursday, October 1, 2020 6:04 AM > > > > > > On Thu, Oct 01, 2020 at 11:53:59AM +0000, Wei Liu wrote: > > > >On Thu, Oct 01, 2020 at 11:40:04AM +0200, Vitaly Kuznetsov wrote: > > > >> Sasha Levin <sashal@kernel.org> writes: > > > >> > > > >> > cpumask can change underneath us, which is generally safe except when we > > > >> > call into hv_cpu_number_to_vp_number(): if cpumask ends up empty we pass > > > >> > num_cpu_possible() into hv_cpu_number_to_vp_number(), causing it to read > > > >> > garbage. 
As reported by KASAN: > > > >> > > > > >> > [ 83.504763] BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_others > > > (include/asm-generic/mshyperv.h:128 arch/x86/hyperv/mmu.c:112) > > > >> > [ 83.908636] Read of size 4 at addr ffff888267c01370 by task kworker/u8:2/106 > > > >> > [ 84.196669] CPU: 0 PID: 106 Comm: kworker/u8:2 Tainted: G W 5.4.60 #1 > > > >> > [ 84.196669] Hardware name: Microsoft Corporation Virtual Machine/Virtual > Machine, > > > BIOS 090008 12/07/2018 > > > >> > [ 84.196669] Workqueue: writeback wb_workfn (flush-8:0) > > > >> > [ 84.196669] Call Trace: > > > >> > [ 84.196669] dump_stack (lib/dump_stack.c:120) > > > >> > [ 84.196669] print_address_description.constprop.0 (mm/kasan/report.c:375) > > > >> > [ 84.196669] __kasan_report.cold (mm/kasan/report.c:507) > > > >> > [ 84.196669] kasan_report (arch/x86/include/asm/smap.h:71 > > > mm/kasan/common.c:635) > > > >> > [ 84.196669] hyperv_flush_tlb_others (include/asm-generic/mshyperv.h:128 > > > arch/x86/hyperv/mmu.c:112) > > > >> > [ 84.196669] flush_tlb_mm_range (arch/x86/include/asm/paravirt.h:68 > > > arch/x86/mm/tlb.c:798) > > > >> > [ 84.196669] ptep_clear_flush (arch/x86/include/asm/tlbflush.h:586 mm/pgtable- > > > generic.c:88) > > > >> > > > > >> > Fixes: 0e4c88f37693 ("x86/hyper-v: Use cheaper > > > HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} hypercalls when possible") > > > >> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> > > > >> > Cc: stable@kernel.org > > > >> > Signed-off-by: Sasha Levin <sashal@kernel.org> > > > >> > --- > > > >> > arch/x86/hyperv/mmu.c | 4 +++- > > > >> > 1 file changed, 3 insertions(+), 1 deletion(-) > > > >> > > > > >> > diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c > > > >> > index 5208ba49c89a9..b1d6afc5fc4a3 100644 > > > >> > --- a/arch/x86/hyperv/mmu.c > > > >> > +++ b/arch/x86/hyperv/mmu.c > > > >> > @@ -109,7 +109,9 @@ static void hyperv_flush_tlb_others(const struct cpumask > > > *cpus, > > > >> > * must. 
We will also check all VP numbers when walking the > > > >> > * supplied CPU set to remain correct in all cases. > > > >> > */ > > > >> > - if (hv_cpu_number_to_vp_number(cpumask_last(cpus)) >= 64) > > > >> > + int last = cpumask_last(cpus); > > > >> > + > > > >> > + if (last < num_possible_cpus() && > hv_cpu_number_to_vp_number(last) >= > > > 64) > > > >> > goto do_ex_hypercall; > > > >> > > > >> In case 'cpus' can end up being empty (I'm genuinely suprised it can) > > > > > > I was just as surprised as you and spent the good part of a day > > > debugging this. However, a: > > > > > > WARN_ON(cpumask_empty(cpus)); > > > > > > triggers at that line of code even though we check for cpumask_empty() > > > at the entry of the function. > > > > What does the call stack look like when this triggers? I'm curious about > > the path where the 'cpus' could be changing while the flush call is in > > progress. > > > > I wonder if CPUs could ever be added to the mask? Removing CPUs can > > be handled with some care because an unnecessary flush doesn't hurt > > anything. But adding CPUs has serious correctness problems. > > > > The cpumask_empty check is done before disabling irq. Is it possible > the mask is modified by an interrupt? > > If there is a reliable way to trigger this bug, we may be able to test > the following patch. > > diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c > index 5208ba49c89a..23fa08d24c1a 100644 > --- a/arch/x86/hyperv/mmu.c > +++ b/arch/x86/hyperv/mmu.c > @@ -66,11 +66,13 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus, > if (!hv_hypercall_pg) > goto do_native; > > - if (cpumask_empty(cpus)) > - return; > - > local_irq_save(flags); > > + if (cpumask_empty(cpus)) { > + local_irq_restore(flags); > + return; > + } > + > flush_pcpu = (struct hv_tlb_flush **) > this_cpu_ptr(hyperv_pcpu_input_arg); This thread died out 3 months ago without any patches being taken. 
I recently hit the problem again at random, though not in a reproducible way. I'd like to take Wei Liu's latest proposal to check for an empty cpumask *after* interrupts are disabled. I think this will almost certainly solve the problem, and in a cleaner way than Sasha's proposal. I'd also suggest adding a comment in the code to note the importance of the ordering. Michael ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() 2021-01-05 16:59 ` Michael Kelley @ 2021-01-05 17:10 ` Wei Liu 2021-01-08 15:22 ` Sasha Levin 1 sibling, 0 replies; 21+ messages in thread From: Wei Liu @ 2021-01-05 17:10 UTC (permalink / raw) To: Michael Kelley Cc: Wei Liu, Sasha Levin, vkuznets, tglx, mingo, bp, x86, hpa, linux-hyperv, linux-kernel, stable, KY Srinivasan, Haiyang Zhang, Stephen Hemminger On Tue, Jan 05, 2021 at 04:59:10PM +0000, Michael Kelley wrote: > From: Wei Liu <wei.liu@kernel.org> Sent: Monday, October 5, 2020 7:59 AM > > > > On Sat, Oct 03, 2020 at 05:40:15PM +0000, Michael Kelley wrote: > > > From: Sasha Levin <sashal@kernel.org> Sent: Thursday, October 1, 2020 6:04 AM > > > > > > > > On Thu, Oct 01, 2020 at 11:53:59AM +0000, Wei Liu wrote: > > > > >On Thu, Oct 01, 2020 at 11:40:04AM +0200, Vitaly Kuznetsov wrote: > > > > >> Sasha Levin <sashal@kernel.org> writes: > > > > >> > > > > >> > cpumask can change underneath us, which is generally safe except when we > > > > >> > call into hv_cpu_number_to_vp_number(): if cpumask ends up empty we pass > > > > >> > num_cpu_possible() into hv_cpu_number_to_vp_number(), causing it to read > > > > >> > garbage. 
As reported by KASAN: > > > > >> > > > > > >> > [ 83.504763] BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_others > > > > (include/asm-generic/mshyperv.h:128 arch/x86/hyperv/mmu.c:112) > > > > >> > [ 83.908636] Read of size 4 at addr ffff888267c01370 by task kworker/u8:2/106 > > > > >> > [ 84.196669] CPU: 0 PID: 106 Comm: kworker/u8:2 Tainted: G W 5.4.60 #1 > > > > >> > [ 84.196669] Hardware name: Microsoft Corporation Virtual Machine/Virtual > > Machine, > > > > BIOS 090008 12/07/2018 > > > > >> > [ 84.196669] Workqueue: writeback wb_workfn (flush-8:0) > > > > >> > [ 84.196669] Call Trace: > > > > >> > [ 84.196669] dump_stack (lib/dump_stack.c:120) > > > > >> > [ 84.196669] print_address_description.constprop.0 (mm/kasan/report.c:375) > > > > >> > [ 84.196669] __kasan_report.cold (mm/kasan/report.c:507) > > > > >> > [ 84.196669] kasan_report (arch/x86/include/asm/smap.h:71 > > > > mm/kasan/common.c:635) > > > > >> > [ 84.196669] hyperv_flush_tlb_others (include/asm-generic/mshyperv.h:128 > > > > arch/x86/hyperv/mmu.c:112) > > > > >> > [ 84.196669] flush_tlb_mm_range (arch/x86/include/asm/paravirt.h:68 > > > > arch/x86/mm/tlb.c:798) > > > > >> > [ 84.196669] ptep_clear_flush (arch/x86/include/asm/tlbflush.h:586 mm/pgtable- > > > > generic.c:88) > > > > >> > > > > > >> > Fixes: 0e4c88f37693 ("x86/hyper-v: Use cheaper > > > > HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} hypercalls when possible") > > > > >> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> > > > > >> > Cc: stable@kernel.org > > > > >> > Signed-off-by: Sasha Levin <sashal@kernel.org> > > > > >> > --- > > > > >> > arch/x86/hyperv/mmu.c | 4 +++- > > > > >> > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > >> > > > > > >> > diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c > > > > >> > index 5208ba49c89a9..b1d6afc5fc4a3 100644 > > > > >> > --- a/arch/x86/hyperv/mmu.c > > > > >> > +++ b/arch/x86/hyperv/mmu.c > > > > >> > @@ -109,7 +109,9 @@ static void hyperv_flush_tlb_others(const struct 
cpumask > > > > *cpus, > > > > >> > * must. We will also check all VP numbers when walking the > > > > >> > * supplied CPU set to remain correct in all cases. > > > > >> > */ > > > > >> > - if (hv_cpu_number_to_vp_number(cpumask_last(cpus)) >= 64) > > > > >> > + int last = cpumask_last(cpus); > > > > >> > + > > > > >> > + if (last < num_possible_cpus() && > > hv_cpu_number_to_vp_number(last) >= > > > > 64) > > > > >> > goto do_ex_hypercall; > > > > >> > > > > >> In case 'cpus' can end up being empty (I'm genuinely suprised it can) > > > > > > > > I was just as surprised as you and spent the good part of a day > > > > debugging this. However, a: > > > > > > > > WARN_ON(cpumask_empty(cpus)); > > > > > > > > triggers at that line of code even though we check for cpumask_empty() > > > > at the entry of the function. > > > > > > What does the call stack look like when this triggers? I'm curious about > > > the path where the 'cpus' could be changing while the flush call is in > > > progress. > > > > > > I wonder if CPUs could ever be added to the mask? Removing CPUs can > > > be handled with some care because an unnecessary flush doesn't hurt > > > anything. But adding CPUs has serious correctness problems. > > > > > > > The cpumask_empty check is done before disabling irq. Is it possible > > the mask is modified by an interrupt? > > > > If there is a reliable way to trigger this bug, we may be able to test > > the following patch. 
> > > > diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c > > index 5208ba49c89a..23fa08d24c1a 100644 > > --- a/arch/x86/hyperv/mmu.c > > +++ b/arch/x86/hyperv/mmu.c > > @@ -66,11 +66,13 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus, > > if (!hv_hypercall_pg) > > goto do_native; > > > > - if (cpumask_empty(cpus)) > > - return; > > - > > local_irq_save(flags); > > > > + if (cpumask_empty(cpus)) { > > + local_irq_restore(flags); > > + return; > > + } > > + > > flush_pcpu = (struct hv_tlb_flush **) > > this_cpu_ptr(hyperv_pcpu_input_arg); > > This thread died out 3 months ago without any patches being taken. > I recently hit the problem again at random, though not in a > reproducible way. > > I'd like to take Wei Liu's latest proposal to check for an empty > cpumask *after* interrupts are disabled. I think this will almost > certainly solve the problem, and in a cleaner way than Sasha's > proposal. I'd also suggest adding a comment in the code to note > the importance of the ordering. > Sure. Let me prepare a proper patch. Wei. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() 2021-01-05 16:59 ` Michael Kelley 2021-01-05 17:10 ` Wei Liu @ 2021-01-08 15:22 ` Sasha Levin 1 sibling, 0 replies; 21+ messages in thread From: Sasha Levin @ 2021-01-08 15:22 UTC (permalink / raw) To: Michael Kelley Cc: Wei Liu, vkuznets, tglx, mingo, bp, x86, hpa, linux-hyperv, linux-kernel, stable, KY Srinivasan, Haiyang Zhang, Stephen Hemminger On Tue, Jan 05, 2021 at 04:59:10PM +0000, Michael Kelley wrote: >From: Wei Liu <wei.liu@kernel.org> Sent: Monday, October 5, 2020 7:59 AM >> >> On Sat, Oct 03, 2020 at 05:40:15PM +0000, Michael Kelley wrote: >> > From: Sasha Levin <sashal@kernel.org> Sent: Thursday, October 1, 2020 6:04 AM >> > > >> > > On Thu, Oct 01, 2020 at 11:53:59AM +0000, Wei Liu wrote: >> > > >On Thu, Oct 01, 2020 at 11:40:04AM +0200, Vitaly Kuznetsov wrote: >> > > >> Sasha Levin <sashal@kernel.org> writes: >> > > >> >> > > >> > cpumask can change underneath us, which is generally safe except when we >> > > >> > call into hv_cpu_number_to_vp_number(): if cpumask ends up empty we pass >> > > >> > num_cpu_possible() into hv_cpu_number_to_vp_number(), causing it to read >> > > >> > garbage. 
As reported by KASAN: >> > > >> > >> > > >> > [ 83.504763] BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_others >> > > (include/asm-generic/mshyperv.h:128 arch/x86/hyperv/mmu.c:112) >> > > >> > [ 83.908636] Read of size 4 at addr ffff888267c01370 by task kworker/u8:2/106 >> > > >> > [ 84.196669] CPU: 0 PID: 106 Comm: kworker/u8:2 Tainted: G W 5.4.60 #1 >> > > >> > [ 84.196669] Hardware name: Microsoft Corporation Virtual Machine/Virtual >> Machine, >> > > BIOS 090008 12/07/2018 >> > > >> > [ 84.196669] Workqueue: writeback wb_workfn (flush-8:0) >> > > >> > [ 84.196669] Call Trace: >> > > >> > [ 84.196669] dump_stack (lib/dump_stack.c:120) >> > > >> > [ 84.196669] print_address_description.constprop.0 (mm/kasan/report.c:375) >> > > >> > [ 84.196669] __kasan_report.cold (mm/kasan/report.c:507) >> > > >> > [ 84.196669] kasan_report (arch/x86/include/asm/smap.h:71 >> > > mm/kasan/common.c:635) >> > > >> > [ 84.196669] hyperv_flush_tlb_others (include/asm-generic/mshyperv.h:128 >> > > arch/x86/hyperv/mmu.c:112) >> > > >> > [ 84.196669] flush_tlb_mm_range (arch/x86/include/asm/paravirt.h:68 >> > > arch/x86/mm/tlb.c:798) >> > > >> > [ 84.196669] ptep_clear_flush (arch/x86/include/asm/tlbflush.h:586 mm/pgtable- >> > > generic.c:88) >> > > >> > >> > > >> > Fixes: 0e4c88f37693 ("x86/hyper-v: Use cheaper >> > > HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} hypercalls when possible") >> > > >> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> >> > > >> > Cc: stable@kernel.org >> > > >> > Signed-off-by: Sasha Levin <sashal@kernel.org> >> > > >> > --- >> > > >> > arch/x86/hyperv/mmu.c | 4 +++- >> > > >> > 1 file changed, 3 insertions(+), 1 deletion(-) >> > > >> > >> > > >> > diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c >> > > >> > index 5208ba49c89a9..b1d6afc5fc4a3 100644 >> > > >> > --- a/arch/x86/hyperv/mmu.c >> > > >> > +++ b/arch/x86/hyperv/mmu.c >> > > >> > @@ -109,7 +109,9 @@ static void hyperv_flush_tlb_others(const struct cpumask >> > > *cpus, >> > > >> > * 
must. We will also check all VP numbers when walking the >> > > >> > * supplied CPU set to remain correct in all cases. >> > > >> > */ >> > > >> > - if (hv_cpu_number_to_vp_number(cpumask_last(cpus)) >= 64) >> > > >> > + int last = cpumask_last(cpus); >> > > >> > + >> > > >> > + if (last < num_possible_cpus() && >> hv_cpu_number_to_vp_number(last) >= >> > > 64) >> > > >> > goto do_ex_hypercall; >> > > >> >> > > >> In case 'cpus' can end up being empty (I'm genuinely suprised it can) >> > > >> > > I was just as surprised as you and spent the good part of a day >> > > debugging this. However, a: >> > > >> > > WARN_ON(cpumask_empty(cpus)); >> > > >> > > triggers at that line of code even though we check for cpumask_empty() >> > > at the entry of the function. >> > >> > What does the call stack look like when this triggers? I'm curious about >> > the path where the 'cpus' could be changing while the flush call is in >> > progress. >> > >> > I wonder if CPUs could ever be added to the mask? Removing CPUs can >> > be handled with some care because an unnecessary flush doesn't hurt >> > anything. But adding CPUs has serious correctness problems. >> > >> >> The cpumask_empty check is done before disabling irq. Is it possible >> the mask is modified by an interrupt? >> >> If there is a reliable way to trigger this bug, we may be able to test >> the following patch. >> >> diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c >> index 5208ba49c89a..23fa08d24c1a 100644 >> --- a/arch/x86/hyperv/mmu.c >> +++ b/arch/x86/hyperv/mmu.c >> @@ -66,11 +66,13 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus, >> if (!hv_hypercall_pg) >> goto do_native; >> >> - if (cpumask_empty(cpus)) >> - return; >> - >> local_irq_save(flags); >> >> + if (cpumask_empty(cpus)) { >> + local_irq_restore(flags); >> + return; >> + } >> + >> flush_pcpu = (struct hv_tlb_flush **) >> this_cpu_ptr(hyperv_pcpu_input_arg); > >This thread died out 3 months ago without any patches being taken. 
>I recently hit the problem again at random, though not in a >reproducible way. > >I'd like to take Wei Liu's latest proposal to check for an empty >cpumask *after* interrupts are disabled. I think this will almost >certainly solve the problem, and in a cleaner way than Sasha's >proposal. I'd also suggest adding a comment in the code to note >the importance of the ordering. I found that this syzbot reproducer: https://syzkaller.appspot.com//bug?id=47befb59c610a69f024db20b927dea80c88fc045 is pretty good at reproducing the issue too: BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_others+0x11ea/0x17c0 Read of size 4 at addr ffff88810005db20 by task 3.c.exe/13007 CPU: 4 PID: 13007 Comm: 3.c.exe Not tainted 5.10.5 #1 Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 06/17/2020 Call Trace: dump_stack+0xa4/0xd9 print_address_description.constprop.0.cold+0xd4/0x509 kasan_report.cold+0x20/0x37 __asan_report_load4_noabort+0x14/0x20 hyperv_flush_tlb_others+0x11ea/0x17c0 flush_tlb_mm_range+0x1fd/0x360 tlb_flush_mmu+0x1b5/0x510 tlb_finish_mmu+0x89/0x360 exit_mmap+0x24f/0x450 mmput+0x121/0x400 do_exit+0x8cf/0x2a70 do_group_exit+0x100/0x300 get_signal+0x3d7/0x1e70 arch_do_signal+0x8c/0x2670 exit_to_user_mode_prepare+0x154/0x1f0 syscall_exit_to_user_mode+0x42/0x280 do_syscall_64+0x45/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x450c2d Code: Unable to access opcode bytes at RIP 0x450c03. 
RSP: 002b:00007f6c81711d68 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca RAX: fffffffffffffe00 RBX: 0000000000000000 RCX: 0000000000450c2d RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00000000004e0428 RBP: 00007f6c81711d80 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffeeef33d2e R13: 00007ffeeef33d2f R14: 00007ffeeef33dd0 R15: 00007f6c81711e80 Allocated by task 0: kasan_save_stack+0x23/0x50 __kasan_kmalloc.constprop.0+0xcf/0xe0 kasan_kmalloc+0x9/0x10 __kmalloc+0x1c8/0x3b0 kmalloc_array+0x12/0x14 hyperv_init+0xd4/0x3a0 apic_intr_mode_init+0xbb/0x1e8 x86_late_time_init+0x96/0xa7 start_kernel+0x317/0x3d3 x86_64_start_reservations+0x24/0x26 x86_64_start_kernel+0x7a/0x7e secondary_startup_64_no_verify+0xb0/0xbb The buggy address belongs to the object at ffff88810005db00 which belongs to the cache kmalloc-32 of size 32 The buggy address is located 0 bytes to the right of 32-byte region [ffff88810005db00, ffff88810005db20) The buggy address belongs to the page: page:0000000065310ff0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10005d flags: 0x17ffffc0000200(slab) raw: 0017ffffc0000200 0000000000000000 0000000100000001 ffff888100043a40 raw: 0000000000000000 0000000000400040 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88810005da00: 00 00 00 00 fc fc fc fc 00 00 00 00 fc fc fc fc ffff88810005da80: 00 00 00 00 fc fc fc fc 00 00 00 00 fc fc fc fc >ffff88810005db00: 00 00 00 00 fc fc fc fc 00 00 00 fc fc fc fc fc ^ ffff88810005db80: 00 00 00 fc fc fc fc fc 00 00 00 fc fc fc fc fc ffff88810005dc00: 00 00 00 fc fc fc fc fc 00 00 00 fc fc fc fc fc -- Thanks, Sasha ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() 2020-10-01 11:53 ` Wei Liu 2020-10-01 13:04 ` Sasha Levin @ 2020-10-01 13:10 ` Vitaly Kuznetsov 1 sibling, 0 replies; 21+ messages in thread From: Vitaly Kuznetsov @ 2020-10-01 13:10 UTC (permalink / raw) To: Wei Liu Cc: Sasha Levin, tglx, mingo, bp, x86, hpa, mikelley, linux-hyperv, linux-kernel, stable, kys, haiyangz, sthemmin, wei.liu Wei Liu <wei.liu@kernel.org> writes: > On Thu, Oct 01, 2020 at 11:40:04AM +0200, Vitaly Kuznetsov wrote: >> Sasha Levin <sashal@kernel.org> writes: >> >> > cpumask can change underneath us, which is generally safe except when we >> > call into hv_cpu_number_to_vp_number(): if cpumask ends up empty we pass >> > num_cpu_possible() into hv_cpu_number_to_vp_number(), causing it to read >> > garbage. As reported by KASAN: >> > >> > [ 83.504763] BUG: KASAN: slab-out-of-bounds in hyperv_flush_tlb_others (include/asm-generic/mshyperv.h:128 arch/x86/hyperv/mmu.c:112) >> > [ 83.908636] Read of size 4 at addr ffff888267c01370 by task kworker/u8:2/106 >> > [ 84.196669] CPU: 0 PID: 106 Comm: kworker/u8:2 Tainted: G W 5.4.60 #1 >> > [ 84.196669] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018 >> > [ 84.196669] Workqueue: writeback wb_workfn (flush-8:0) >> > [ 84.196669] Call Trace: >> > [ 84.196669] dump_stack (lib/dump_stack.c:120) >> > [ 84.196669] print_address_description.constprop.0 (mm/kasan/report.c:375) >> > [ 84.196669] __kasan_report.cold (mm/kasan/report.c:507) >> > [ 84.196669] kasan_report (arch/x86/include/asm/smap.h:71 mm/kasan/common.c:635) >> > [ 84.196669] hyperv_flush_tlb_others (include/asm-generic/mshyperv.h:128 arch/x86/hyperv/mmu.c:112) >> > [ 84.196669] flush_tlb_mm_range (arch/x86/include/asm/paravirt.h:68 arch/x86/mm/tlb.c:798) >> > [ 84.196669] ptep_clear_flush (arch/x86/include/asm/tlbflush.h:586 mm/pgtable-generic.c:88) >> > >> > Fixes: 0e4c88f37693 ("x86/hyper-v: Use cheaper 
HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} hypercalls when possible") >> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> >> > Cc: stable@kernel.org >> > Signed-off-by: Sasha Levin <sashal@kernel.org> >> > --- >> > arch/x86/hyperv/mmu.c | 4 +++- >> > 1 file changed, 3 insertions(+), 1 deletion(-) >> > >> > diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c >> > index 5208ba49c89a9..b1d6afc5fc4a3 100644 >> > --- a/arch/x86/hyperv/mmu.c >> > +++ b/arch/x86/hyperv/mmu.c >> > @@ -109,7 +109,9 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus, >> > * must. We will also check all VP numbers when walking the >> > * supplied CPU set to remain correct in all cases. >> > */ >> > - if (hv_cpu_number_to_vp_number(cpumask_last(cpus)) >= 64) >> > + int last = cpumask_last(cpus); >> > + >> > + if (last < num_possible_cpus() && hv_cpu_number_to_vp_number(last) >= 64) >> > goto do_ex_hypercall; >> >> In case 'cpus' can end up being empty (I'm genuinely suprised it can) >> the check is mandatory indeed. I would, however, just return directly in >> this case: >> >> if (last < num_possible_cpus()) >> return; > > I think you want > > last >= num_possible_cpus() > > here? Of course, thanks! > > A more important question is, if the mask can change willy-nilly, what > is stopping it from changing between these checks? I.e. is there still a > windows that hv_cpu_number_to_vp_number(last) can return garbage? > AFAIU some CPUs can be dropped from the mask (because they switch to a different mm?) and if we still flush there it's not a problem. The only real problem I currently see is that we're passing cpumask_last() result to hv_cpu_number_to_vp_number() and cpumask_last() returns num_possible_cpus() when the mask is empty but this can't be passed to hv_cpu_number_to_vp_number(). > Wei. > >> >> if (hv_cpu_number_to_vp_number(last) >= 64) >> goto do_ex_hypercall; >> >> as there's nothing to flush, no need to call into >> hyperv_flush_tlb_others_ex(). 
>> >> Anyway, the fix seems to be correct, so >> >> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> >> >> > >> > for_each_cpu(cpu, cpus) { >> >> -- >> Vitaly >> > -- Vitaly ^ permalink raw reply [flat|nested] 21+ messages in thread
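Vitaly's observation that cpumask_last() returns num_possible_cpus() on an empty mask is the crux of the bug, and is easy to demonstrate with a userspace model (the 8-bit mask and names are illustrative, not the kernel bitmap code):

```c
/* Model of cpumask_last(): scanning for the highest set bit returns
 * the mask size -- the num_possible_cpus() analogue -- when no bit is
 * set, i.e. one past the valid index range. Using that result as a
 * table index without a range check reads out of bounds. */
#define NBITS 8

static int mask_last(unsigned int mask)
{
	for (int i = NBITS - 1; i >= 0; i--)
		if (mask & (1u << i))
			return i;
	return NBITS;  /* empty mask: one past the last valid index */
}
```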
end of thread, other threads:[~2021-08-22 17:32 UTC | newest] Thread overview: 21+ messages [not found] <CA+qYZY3a-FHfWNL2=na6O8TRJYu9kaeyp80VNDxaDTi2EBGoog@mail.gmail.com> 2021-08-06 10:43 ` [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others() Michael Kelley 2021-08-06 17:35 ` David Mozes [not found] ` <CAHkVu0-ZCXDRZL92d_G3oKpPuKvmY=YEbu9nbx9vkZHnhHFD8Q@mail.gmail.com> 2021-08-06 21:51 ` Michael Kelley 2021-08-07 5:00 ` David Moses 2021-08-17 9:16 ` David Mozes 2021-08-17 11:29 ` Wei Liu 2021-08-19 11:05 ` David Mozes [not found] ` <CA+qYZY1U04SkyHo7X+rDeE=nUy_X5nxLfShyuLJFzXnFp2A6uw@mail.gmail.com> [not found] ` <VI1PR0401MB24153DEC767B0126B1030E07F1C09@VI1PR0401MB2415.eurprd04.prod.outlook.com> 2021-08-22 15:24 ` Wei Liu 2021-08-22 16:25 ` David Mozes 2021-08-22 17:32 ` Wei Liu [not found] <VI1PR0401MB24150B31A1D63176BBB788D2F1F19@VI1PR0401MB2415.eurprd04.prod.outlook.com> 2021-08-05 18:08 ` Michael Kelley 2020-10-01 1:38 Sasha Levin 2020-10-01 9:40 ` Vitaly Kuznetsov 2020-10-01 11:53 ` Wei Liu 2020-10-01 13:04 ` Sasha Levin 2020-10-03 17:40 ` Michael Kelley 2020-10-05 14:58 ` Wei Liu 2021-01-05 16:59 ` Michael Kelley 2021-01-05 17:10 ` Wei Liu 2021-01-08 15:22 ` Sasha Levin 2020-10-01 13:10 ` Vitaly Kuznetsov