On 12.03.21 22:33, Roman Shaposhnik wrote:
> Hi Jürgen,
>
> just wanted to give you (and everyone who may be keeping an eye on
> this) an update.
>
> Somehow, after applying your kernel patch -- the VM is now running 10
> days+ without a problem.

Can you check the kernel console messages, please? There are messages
printed when a potential hang is detected, and the hanging cpu is then
woken up again via another interrupt. Look for messages containing
"csd", so e.g. do "dmesg | grep csd" in the VM.

Thanks,

Juergen

> I'll keep experimenting (A/B-testing style) but at this point I'm
> actually pretty perplexed as to why this patch would make a difference
> (since it is basically just for observability). Any thoughts on that?
>
> Thanks,
> Roman.
>
> On Wed, Feb 24, 2021 at 7:06 PM Roman Shaposhnik wrote:
>>
>> Hi Jürgen!
>>
>> sorry for the belated reply -- I wanted to externalize the VM before I
>> do -- but let me at least reply to you:
>>
>> On Tue, Feb 23, 2021 at 5:17 AM Jürgen Groß wrote:
>>>
>>> On 18.02.21 06:21, Roman Shaposhnik wrote:
>>>> On Wed, Feb 17, 2021 at 12:29 AM Jürgen Groß wrote:
>>>>
>>>> On 17.02.21 09:12, Roman Shaposhnik wrote:
>>>> > Hi Jürgen, thanks for taking a look at this. A few comments below:
>>>> >
>>>> > On Tue, Feb 16, 2021 at 10:47 PM Jürgen Groß wrote:
>>>> >>
>>>> >> On 16.02.21 21:34, Stefano Stabellini wrote:
>>>> >>> + x86 maintainers
>>>> >>>
>>>> >>> It looks like the tlbflush is getting stuck?
>>>> >>
>>>> >> I have seen this case multiple times on customer systems now, but
>>>> >> reproducing it reliably seems to be very hard.
>>>> >
>>>> > It is reliably reproducible under my workload but it takes a long time
>>>> > (~3 days of the workload running in the lab).
>>>>
>>>> This is by far the best reproduction rate I have seen up to now.
>>>>
>>>> The next best reproducer seems to be a huge installation with several
>>>> hundred hosts and thousands of VMs with about 1 crash each week.
>>>>
>>>> >
>>>> >> I suspected fifo events to be to blame, but just yesterday I've been
>>>> >> informed of another case with fifo events disabled in the guest.
>>>> >>
>>>> >> One common pattern seems to be that up to now I have seen this effect
>>>> >> only on systems with Intel Gold cpus. Can it be confirmed to be true
>>>> >> in this case, too?
>>>> >
>>>> > I am pretty sure mine isn't -- I can get you full CPU specs if
>>>> > that's useful.
>>>>
>>>> Just the output of "grep model /proc/cpuinfo" should be enough.
>>>>
>>>> processor: 3
>>>> vendor_id: GenuineIntel
>>>> cpu family: 6
>>>> model: 77
>>>> model name: Intel(R) Atom(TM) CPU C2550 @ 2.40GHz
>>>> stepping: 8
>>>> microcode: 0x12d
>>>> cpu MHz: 1200.070
>>>> cache size: 1024 KB
>>>> physical id: 0
>>>> siblings: 4
>>>> core id: 3
>>>> cpu cores: 4
>>>> apicid: 6
>>>> initial apicid: 6
>>>> fpu: yes
>>>> fpu_exception: yes
>>>> cpuid level: 11
>>>> wp: yes
>>>> flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
>>>> pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp
>>>> lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
>>>> nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est
>>>> tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 movbe popcnt tsc_deadline_timer
>>>> aes rdrand lahf_lm 3dnowprefetch cpuid_fault epb pti ibrs ibpb stibp
>>>> tpr_shadow vnmi flexpriority ept vpid tsc_adjust smep erms dtherm ida
>>>> arat md_clear
>>>> vmx flags: vnmi preemption_timer invvpid ept_x_only flexpriority
>>>> tsc_offset vtpr mtf vapic ept vpid unrestricted_guest
>>>> bugs: cpu_meltdown spectre_v1 spectre_v2 mds msbds_only
>>>> bogomips: 4800.19
>>>> clflush size: 64
>>>> cache_alignment: 64
>>>> address sizes: 36 bits physical, 48 bits virtual
>>>> power management:
>>>>
>>>> >
>>>> >> In case anybody has a reproducer (either in a guest or dom0) with a
>>>> >> setup where a diagnostic kernel can be used, I'd be _very_
>>>> >> interested!
>>>> >
>>>> > I can easily add things to Dom0 and DomU. Whether that will
>>>> > disrupt the experiment is, of course, another matter. Still please
>>>> > let me know what would be helpful to do.
>>>>
>>>> Is there a chance to switch to an upstream kernel in the guest? I'd like
>>>> to add some diagnostic code to the kernel, and creating the patches will
>>>> be easier this way.
>>>>
>>>> That's a bit tough -- the VM is based on stock Ubuntu, and if I upgrade
>>>> the kernel I'll have to fiddle with a lot of things to make the workload
>>>> functional again.
>>>>
>>>> However, I can install a debug kernel (from Ubuntu, etc. etc.)
>>>>
>>>> Of course, if patching the kernel is the only way to make progress --
>>>> let's try that -- please let me know.
>>>
>>> I have found a nice upstream patch, which - with some modifications - I
>>> plan to give our customer as a workaround.
>>>
>>> The patch is for kernel 4.12, but chances are good it will apply to a
>>> 4.15 kernel, too.
>>
>> I'm slightly confused about this patch -- it seems to me that it needs
>> to be applied to the guest kernel, correct?
>>
>> If that's the case -- the challenge I have is that I need to re-build
>> the Canonical (Ubuntu) distro kernel with this patch -- this seems
>> a bit daunting at first (I mean -- I'm pretty good at rebuilding kernels,
>> I just never do it with the vendor ones ;-)).
>>
>> So... if there's anyone here who has any suggestions on how to do that
>> -- I'd appreciate pointers.
>>
>>> I have been able to gather some more data.
>>>
>>> I have contacted the author of the upstream kernel patch I've been using
>>> for our customer (and that helped, by the way).
>>>
>>> It seems as if the problem is occurring when running as a guest at least
>>> under Xen, KVM, and VMware, and there have been reports of bare metal
>>> cases, too. Hunting this bug has been going on for several years now; the
>>> patch author has been at it for 8 months.
>>>
>>> So we can rule out a Xen problem.
>>>
>>> Finding the root cause is still important, of course, and your setup
>>> seems to have the best reproduction rate up to now.
>>>
>>> So any help would really be appreciated.
>>>
>>> Is the VM self contained? Would it be possible to start it e.g. on a
>>> test system on my side? If yes, would you be allowed to pass it on to
>>> me?
>>
>> I'm working on externalizing the VM in a way that doesn't disclose anything
>> about the customer workload. I'm almost there -- sans my question about
>> the vendor kernel rebuild. I plan to make that VM available this week.
>>
>> Goes without saying, but I would really appreciate your help in chasing
>> this.
>>
>> Thanks,
>> Roman.
>
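
For completeness, since the hang can take days to show up and the guest may
be rebooted in between, it can help to search both the live kernel ring
buffer and the journal for the "csd" diagnostics mentioned above. A minimal
sketch, assuming the guest runs systemd-journald with persistent storage
(otherwise only the current boot is available):

  # current kernel ring buffer (old messages may have rotated out)
  dmesg | grep -i csd

  # kernel messages recorded by the journal for the current boot
  journalctl -k | grep -i csd

  # ...and for the previous boot, in case the guest was rebooted after a hang
  journalctl -k -b -1 | grep -i csd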
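
On the question of rebuilding the Canonical distro kernel with the patch
applied, the usual route is Ubuntu's "BuildYourOwnKernel" workflow. A rough
sketch, assuming deb-src entries are enabled in /etc/apt/sources.list; the
patch filename below is a placeholder, and the exact source package name
(linux-image-unsigned-... vs. linux-image-...) depends on the release:

  sudo apt-get build-dep linux linux-image-$(uname -r)
  apt-get source linux-image-unsigned-$(uname -r)  # or linux-image-$(uname -r) on older releases
  cd linux-*/
  patch -p1 < ../csd-diagnostic.patch              # placeholder name for the diagnostic patch
  fakeroot debian/rules clean
  fakeroot debian/rules binary-headers binary-generic binary-perarch
  sudo dpkg -i ../linux-image-*.deb ../linux-modules-*.deb

After installing, reboot the guest into the newly built kernel; the important
part is that the matching image and modules packages are installed together.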