From: David Mozes <david.mozes@silk.us>
To: David Moses <mosesster@gmail.com>,
	Michael Kelley <mikelley@microsoft.com>
Cc: "תומר אבוטבול" <tomer432100@gmail.com>,
	"linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: RE: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others()
Date: Tue, 17 Aug 2021 09:16:45 +0000
Message-ID: <VI1PR0401MB2415E89B6E3D01B446FD1DACF1FE9@VI1PR0401MB2415.eurprd04.prod.outlook.com>
In-Reply-To: <FD8265E6-895E-45CF-9AE3-787FAD669FC8@gmail.com>

Hi Michael and all,
I am back from holiday and have done what you suggested/requested.

1. While running with patch number 2 (disable the Hyper-V specific flush routines):
As you suspected, we got a panic similar to the one we got with the Hyper-V specific flush routines.
Below is the trace we got:

	[32097.577728] kernel BUG at kernel/sched/rt.c:1004!
[32097.577738] invalid opcode: 0000 [#1] SMP
[32097.578711] CPU: 45 PID: 51244 Comm: STAR4BLKS0_WORK Kdump: loaded Tainted: G           OE     4.19.195-KM9 #1
[32097.578711] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
[32097.578711] RIP: 0010:dequeue_top_rt_rq+0x88/0xa0
[32097.578711] Code: 00 48 89 d5 48 0f a3 15 6e 19 82 01 73 d0 48 89 c7 e8 bc b7 fe ff be 02 00 00 00 89 ef 84 c0 74 0b e8 2c 94 04 00 eb b6 0f 0b <0f> 0b e8 b1 93 04 00 eb ab 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00
[32097.578711] RSP: 0018:ffff9442e0de7b48 EFLAGS: 00010046
[32097.578711] RAX: ffff94809f9e1e00 RBX: ffff9448295e4c40 RCX: 00000000ffffffff
[32097.578711] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff94809f9e2040
[32097.578711] RBP: ffff94809f9e1e00 R08: fffffffffff0be25 R09: 00000000000216c0
[32097.578711] R10: 00004bbc85e1eff3 R11: 0000000000000000 R12: 0000000000000000
[32097.578711] R13: ffff9448295e4a20 R14: 0000000000021e00 R15: ffff94809fa21e00
[32097.578711] FS:  00007f7b0cea0700(0000) GS:ffff94809f940000(0000) knlGS:0000000000000000
[32097.578711] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[32097.578711] CR2: ffffffffff600400 CR3: 000000201d5b3002 CR4: 00000000003606e0
[32097.578711] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[32097.578711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[32097.578711] Call Trace:
[32097.578711]  dequeue_rt_stack+0x3e/0x280
[32097.578711]  dequeue_rt_entity+0x1f/0x70
[32097.578711]  dequeue_task_rt+0x26/0x70
[32097.578711]  push_rt_task+0x1e2/0x220
[32097.578711]  push_rt_tasks+0x11/0x20
[32097.578711]  __balance_callback+0x3b/0x60
[32097.578711]  __schedule+0x6e9/0x830
[32097.578711]  schedule+0x28/0x80
[32097.578711]  futex_wait_queue_me+0xb9/0x120
[32097.578711]  futex_wait+0x139/0x250
[32097.578711]  ? try_to_wake_up+0x54/0x460
[32097.578711]  ? enqueue_task_rt+0x9f/0xc0
[32097.578711]  ? get_futex_key+0x2ee/0x450
[32097.578711]  do_futex+0x2eb/0x9f0
[32097.578711]  __x64_sys_futex+0x143/0x180
[32097.578711]  do_syscall_64+0x59/0x1b0
[32097.578711]  ? prepare_exit_to_usermode+0x70/0x90
[32097.578711]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[32097.578711] RIP: 0033:0x7fa2ae151334
[32097.578711] Code: 66 0f 1f 44 00 00 41 52 52 4d 31 d2 ba 02 00 00 00 81 f6 80 00 00 00 64 23 34 25 48 00 00 00 39 d0 75 07 b8 ca 00 00 00 0f 05 <89> d0 87 07 85 c0 75 f1 5a 41 5a c3 83 3d f1 df 20 00 00 74 59 48
[32097.578711] RSP: 002b:00007f7b0ce9f3b0 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[32097.578711] RAX: ffffffffffffffda RBX: 00007f7c1da5bc18 RCX: 00007fa2ae151334
[32097.578711] RDX: 0000000000000002 RSI: 0000000000000080 RDI: 00007f7c1da5bc58
[32097.578711] RBP: 00007f7b0ce9f5b0 R08: 00007f7c1da5bc58 R09: 000000000000c82c
[32097.578711] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f7b1a149cf0
[32097.578711] R13: 00007f7c1da5bc58 R14: 0000000000000001 R15: 00000000000005a1


2. As you requested, and to help the community, we are running patch no. 1 as well:

And this is what we got:

	Aug 17 05:36:22 10.230.247.7 [40544.392690] Hyper-V: ERROR_HYPERV: cpu_last= 

It looks like we got an empty cpumask!
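
For anyone following along, here is a minimal userspace sketch of why an empty mask could trip the check in patch 1. It assumes cpumask_last() behaves like find_last_bit(), which returns the bitmap size when no bit is set. The helper names (last_set_bit, mask_set_bit, check_as_posted, check_with_ge) and the 128-bit mask width are hypothetical, chosen only for illustration; they are not kernel code and not taken from our setup.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: set bit i in a plain bitmap. */
static void mask_set_bit(unsigned long *bits, size_t i)
{
    bits[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
}

/*
 * Hypothetical stand-in for find_last_bit(): index of the highest set
 * bit below nbits, or nbits itself when the bitmap is empty
 * (assumption: this mirrors what cpumask_last() does).
 */
static size_t last_set_bit(const unsigned long *bits, size_t nbits)
{
    for (size_t i = nbits; i-- > 0; ) {
        if (bits[i / BITS_PER_WORD] & (1UL << (i % BITS_PER_WORD)))
            return i;
    }
    return nbits;
}

/* The comparison as posted in patch 1: strict '>'. */
static bool check_as_posted(const unsigned long *mask, size_t mask_bits,
                            size_t possible_cpus)
{
    return last_set_bit(mask, mask_bits) > possible_cpus;
}

/* The comparison Michael suggested: '>=' because CPUs start at zero. */
static bool check_with_ge(const unsigned long *mask, size_t mask_bits,
                          size_t possible_cpus)
{
    return last_set_bit(mask, mask_bits) >= possible_cpus;
}
```

With 64 possible CPUs and a 128-bit mask, an empty mask makes both checks fire, and the printed CPU list is empty, which would match the log line above. A mask whose only bit is CPU 64 fires only the ">=" variant, which is the case Michael pointed out.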

Would you please let us know what further info you need and what the next step is for debugging this interesting issue?

Thx
David 




-----Original Message-----
From: David Moses <mosesster@gmail.com> 
Sent: Saturday, August 7, 2021 8:00 AM
To: Michael Kelley <mikelley@microsoft.com>
Cc: תומר אבוטבול <tomer432100@gmail.com>; David Mozes <david.mozes@silk.us>; linux-hyperv@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] x86/hyper-v: guard against cpu mask changes in hyperv_flush_tlb_others()




> On Aug 7, 2021, at 12:51 AM, Michael Kelley <mikelley@microsoft.com> wrote:
> 
> From: תומר אבוטבול <tomer432100@gmail.com>  Sent: Friday, August 6, 2021 11:03 AM
> 
>> Attaching the patches Michael asked for debugging 
>> 1) Print the cpumask when < num_possible_cpus():
>> diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
>> index e666f7eaf32d..620f656d6195 100644
>> --- a/arch/x86/hyperv/mmu.c
>> +++ b/arch/x86/hyperv/mmu.c
>> @@ -60,6 +60,7 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus,
>>         struct hv_tlb_flush *flush;
>>         u64 status = U64_MAX;
>>         unsigned long flags;
>> +       unsigned int cpu_last;
>> 
>>         trace_hyperv_mmu_flush_tlb_others(cpus, info);
>> 
>> @@ -68,6 +69,11 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus,
>> 
>>         local_irq_save(flags);
>> 
>> +       cpu_last = cpumask_last(cpus);
>> +       if (cpu_last > num_possible_cpus()) {
> 
> I think this should be ">=" since cpus are numbered starting at zero.
> In your VM with 64 CPUs, having CPU #64 in the list would be error.
> 
>> +               pr_emerg("ERROR_HYPERV: cpu_last=%*pbl", cpumask_pr_args(cpus));
>> +       }
>> +
>>         /*
>>          * Only check the mask _after_ interrupt has been disabled to avoid the
>>          * mask changing under our feet.
>> 
>> 2) disable the Hyper-V specific flush routines:
>> diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
>> index e666f7eaf32d..8e77cc84775a 100644
>> --- a/arch/x86/hyperv/mmu.c
>> +++ b/arch/x86/hyperv/mmu.c
>> @@ -235,6 +235,7 @@ static u64 hyperv_flush_tlb_others_ex(const struct cpumask *cpus,
>> 
>> void hyperv_setup_mmu_ops(void)
>>  {
>> +  return;
>>         if (!(ms_hyperv.hints & HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED))
>>                 return;
> 
> Otherwise, this code looks good to me and matches what I had in mind.
> 
> Note that the function native_flush_tlb_others() is used when the Hyper-V specific
> flush function is disabled per patch #2 above, or when hv_cpu_to_vp_index() returns
> VP_INVALID.  In a quick glance through the code, it appears that native_flush_tlb_others()
> will work even if there's a non-existent CPU in the cpumask that is passed as an argument.
> So perhaps an immediate workaround is Patch #2 above.

The current code of hv_cpu_to_vp_index() (where I generated the warning) returns VP_INVALID in this case (see previous mail), and it looks like it does not completely work around the issue.
The CPU hangs even when it does not panic. We will continue watching.
>   
> 
> Perhaps hyperv_flush_tlb_others() should be made equally tolerant of a non-existent
> CPU being in the list. But if you are willing, I'm still interested in the results of an
> experiment with just Patch #1.  I'm curious about what the CPU list looks like when
> it has a non-existent CPU.  Is it complete garbage, or is there just one non-existent
> CPU?
> 
We will do that, maybe not next week because of vacation, but the week after.

> The other curiosity is that I haven't seen this Linux panic reported by other users,
> and I think it would have come to our attention if it were happening with any frequency.
> You see the problem fairly regularly.  So I'm wondering what the difference is.
> 
> Michael
