From: Roman Shaposhnik <roman@zededa.com>
To: "Jürgen Groß" <jgross@suse.com>
Cc: "Stefano Stabellini" <sstabellini@kernel.org>,
Xen-devel <xen-devel@lists.xenproject.org>,
"Jan Beulich" <jbeulich@suse.com>,
"Andrew Cooper" <andrew.cooper3@citrix.com>,
"Roger Pau Monné" <roger.pau@citrix.com>, "Wei Liu" <wl@xen.org>,
"George Dunlap" <george.dunlap@citrix.com>
Subject: Re: Linux DomU freezes and dies under heavy memory shuffling
Date: Fri, 12 Mar 2021 13:33:20 -0800 [thread overview]
Message-ID: <CAMmSBy_0zCa1D5dpw4VFAcJwSiE6RAQoBk5vAJzW1ZPk5Zaxww@mail.gmail.com> (raw)
In-Reply-To: <CAMmSBy8pSZROdPo+gee8oxrU9EL=k+QTJj0UxZTi3Bh+S_g2_w@mail.gmail.com>
Hi Jürgen,
just wanted to give you (and everyone who may be keeping an eye on
this) an update.
Somehow, after applying your kernel patch, the VM has now been running
for 10+ days without a problem.
I'll keep experimenting (A/B-testing style) but at this point I'm
actually pretty perplexed as to why this patch would make a difference
(since it is basically just for observability). Any thoughts on that?
Thanks,
Roman.
On Wed, Feb 24, 2021 at 7:06 PM Roman Shaposhnik <roman@zededa.com> wrote:
>
> Hi Jürgen!
>
> sorry for the belated reply -- I wanted to externalize the VM before
> replying -- but let me at least answer you now:
>
> On Tue, Feb 23, 2021 at 5:17 AM Jürgen Groß <jgross@suse.com> wrote:
> >
> > On 18.02.21 06:21, Roman Shaposhnik wrote:
> > > On Wed, Feb 17, 2021 at 12:29 AM Jürgen Groß <jgross@suse.com> wrote:
> > >
> > > On 17.02.21 09:12, Roman Shaposhnik wrote:
> > > > Hi Jürgen, thanks for taking a look at this. A few comments below:
> > > >
> > > > On Tue, Feb 16, 2021 at 10:47 PM Jürgen Groß <jgross@suse.com> wrote:
> > > >>
> > > >> On 16.02.21 21:34, Stefano Stabellini wrote:
> > > >>> + x86 maintainers
> > > >>>
> > > >>> It looks like the tlbflush is getting stuck?
> > > >>
> > > >> I have seen this case multiple times on customer systems now, but
> > > >> reproducing it reliably seems to be very hard.
> > > >
> > > > It is reliably reproducible under my workload, but it takes a long time
> > > > (~3 days of the workload running in the lab).
> > >
> > > This is by far the best reproduction rate I have seen up to now.
> > >
> > > The next best reproducer seems to be a huge installation with several
> > > hundred hosts and thousands of VMs with about 1 crash each week.
> > >
> > > >
> > > >> I suspected fifo events to be blamed, but just yesterday I've been
> > > >> informed of another case with fifo events disabled in the guest.
> > > >>
> > > >> One common pattern seems to be that up to now I have seen this effect
> > > >> only on systems with Intel Gold cpus. Can it be confirmed to be true
> > > >> in this case, too?
> > > >
> > > > I am pretty sure mine isn't -- I can get you full CPU specs if
> > > > that's useful.
> > >
> > > Just the output of "grep model /proc/cpuinfo" should be enough.
> > >
> > >
> > > processor: 3
> > > vendor_id: GenuineIntel
> > > cpu family: 6
> > > model: 77
> > > model name: Intel(R) Atom(TM) CPU C2550 @ 2.40GHz
> > > stepping: 8
> > > microcode: 0x12d
> > > cpu MHz: 1200.070
> > > cache size: 1024 KB
> > > physical id: 0
> > > siblings: 4
> > > core id: 3
> > > cpu cores: 4
> > > apicid: 6
> > > initial apicid: 6
> > > fpu: yes
> > > fpu_exception: yes
> > > cpuid level: 11
> > > wp: yes
> > > flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
> > > pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp
> > > lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
> > > nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est
> > > tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 movbe popcnt tsc_deadline_timer
> > > aes rdrand lahf_lm 3dnowprefetch cpuid_fault epb pti ibrs ibpb stibp
> > > tpr_shadow vnmi flexpriority ept vpid tsc_adjust smep erms dtherm ida
> > > arat md_clear
> > > vmx flags: vnmi preemption_timer invvpid ept_x_only flexpriority
> > > tsc_offset vtpr mtf vapic ept vpid unrestricted_guest
> > > bugs: cpu_meltdown spectre_v1 spectre_v2 mds msbds_only
> > > bogomips: 4800.19
> > > clflush size: 64
> > > cache_alignment: 64
> > > address sizes: 36 bits physical, 48 bits virtual
> > > power management:
> > >
> > > >
> > > >> In case anybody has a reproducer (either in a guest or dom0) with a
> > > >> setup where a diagnostic kernel can be used, I'd be _very_ interested!
> > > >
> > > > I can easily add things to Dom0 and DomU. Whether that will disrupt the
> > > > experiment is, of course, another matter. Still, please let me know what
> > > > would be helpful to do.
> > >
> > > Is there a chance to switch to an upstream kernel in the guest? I'd like
> > > to add some diagnostic code to the kernel and creating the patches will
> > > be easier this way.
> > >
> > >
> > > That's a bit tough -- the VM is based on stock Ubuntu, and if I upgrade
> > > the kernel I'll have to fiddle with a lot of things to make the workload
> > > functional again.
> > >
> > > However, I can install a debug kernel (from Ubuntu, etc. etc.)
> > >
> > > Of course, if patching the kernel is the only way to make progress --
> > > let's try that -- please let me know.
> >
> > I have found a nice upstream patch, which - with some modifications - I
> > plan to give our customer as a workaround.
> >
> > The patch is for kernel 4.12, but chances are good it will apply to a
> > 4.15 kernel, too.
>
> I'm slightly confused about this patch -- it seems to me that it needs
> to be applied to the guest kernel, correct?
>
> If that's the case -- the challenge I have is that I need to rebuild
> the Canonical (Ubuntu) distro kernel with this patch -- which seems
> a bit daunting at first (I mean, I'm pretty good at rebuilding kernels;
> I just never do it with the vendor ones ;-)).
>
> So... if there's anyone here who has any suggestions on how to do that
> -- I'd appreciate pointers.
>
> > I have been able to gather some more data.
> >
> > I have contacted the author of the upstream kernel patch I've been using
> > for our customer (and that helped, by the way).
> >
> > It seems as if the problem occurs when running as a guest at least
> > under Xen, KVM, and VMware, and there have been reports of bare-metal
> > cases, too. The hunt for this bug has been going on for several years
> > now; the patch author has been at it for 8 months.
> >
> > So we can rule out a Xen problem.
> >
> > Finding the root cause is still important, of course, and your setup
> > seems to have the best reproduction rate up to now.
> >
> > So any help would really be appreciated.
> >
> > Is the VM self contained? Would it be possible to start it e.g. on a
> > test system on my side? If yes, would you be allowed to pass it on to
> > me?
>
> I'm working on externalizing the VM in a way that doesn't disclose anything
> about the customer workload. I'm almost there -- sans my question about
> the vendor kernel rebuild. I plan to make that VM available this week.
>
> Goes without saying, but I would really appreciate your help in chasing this.
>
> Thanks,
> Roman.
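For the vendor-kernel rebuild question above, here is a minimal dry-run
sketch of the standard Ubuntu flow. It is not from the original thread:
the patch filename is a placeholder, the `linux-image-$(uname -r)` package
name and `debian/rules` targets follow the usual Ubuntu conventions, and
deb-src entries must be enabled in the apt sources before the fetch steps
will actually work.

```shell
#!/bin/sh
# Dry-run sketch of rebuilding an Ubuntu vendor kernel with a local patch.
# "diagnostic.patch" is a placeholder name; nothing runs unless APPLY=1.
set -eu

PATCH="${PATCH:-diagnostic.patch}"
STEPS=0

run() {
    # Print each step; only execute it when APPLY=1 is set, so the
    # sequence can be reviewed before touching the system.
    STEPS=$((STEPS + 1))
    echo "+ $*"
    [ "${APPLY:-0}" = "1" ] && "$@" || true
}

run sudo apt-get build-dep "linux-image-$(uname -r)"    # build dependencies
run apt-get source "linux-image-$(uname -r)"            # exact vendor source
run sh -c "cd linux-*/ && patch -p1 < ../$PATCH"        # apply the patch
run sh -c "cd linux-*/ && fakeroot debian/rules clean"
run sh -c "cd linux-*/ && fakeroot debian/rules binary-headers binary-generic"

echo "planned $STEPS steps (dry run unless APPLY=1)"
```

Invoked as `APPLY=1 PATCH=/path/to/your.patch sh rebuild.sh` it executes
the steps; as written it only prints the plan. The `binary-generic` target
builds the generic flavour; a different flavour needs the matching target.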
Thread overview: 14+ messages
2021-02-06 20:03 Linux DomU freezes and dies under heavy memory shuffling Roman Shaposhnik
2021-02-16 20:34 ` Stefano Stabellini
2021-02-17 6:47 ` Jürgen Groß
2021-02-17 8:12 ` Roman Shaposhnik
2021-02-17 8:29 ` Jürgen Groß
2021-02-18 5:21 ` Roman Shaposhnik
2021-02-18 9:34 ` Jürgen Groß
2021-02-23 13:17 ` Jürgen Groß
2021-02-25 3:06 ` Roman Shaposhnik
2021-02-25 3:44 ` Elliott Mitchell
2021-02-25 4:30 ` Roman Shaposhnik
2021-02-25 4:47 ` Elliott Mitchell
2021-03-12 21:33 ` Roman Shaposhnik [this message]
2021-03-13 7:18 ` Jürgen Groß