Hi,

Last weekend I deployed a hypervisor built from the 4.10.1 release plus the most recent XSAs (which were under embargo at the time). Prior to this I had only gone as far as XSA-267, having decided to wait before applying later XSAs, so this deployment included the fixes for XSA-273 for the first time.

Over the course of the past week, some guests have started to experience sporadic assertion failures in libc/malloc.c or strange segmentation violations. In most cases the problem is not easily reproducible, but I got a report from one guest administrator that their php-fpm processes are reliably segfaulting almost immediately. For example:

    [19-Nov-2018 06:39:56] WARNING: [pool www] child 3682 exited on signal 11 (SIGSEGV) after 18.601413 seconds from start
    [19-Nov-2018 06:39:56] NOTICE: [pool www] child 3683 started
    [19-Nov-2018 06:40:16] WARNING: [pool www] child 3683 exited on signal 11 (SIGSEGV) after 20.364357 seconds from start
    [19-Nov-2018 06:40:16] NOTICE: [pool www] child 3684 started
    [19-Nov-2018 06:43:43] WARNING: [pool www] child 3426 exited on signal 11 (SIGSEGV) after 1327.885798 seconds from start
    [19-Nov-2018 06:43:43] NOTICE: [pool www] child 3739 started
    [19-Nov-2018 06:43:59] WARNING: [pool www] child 3739 exited on signal 11 (SIGSEGV) after 15.922980 seconds from start

The failures that mention malloc.c happen in multiple different binaries, including grep, perl and shells. They look like this:

    grep: malloc.c:2372: sysmalloc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 *(sizeof(size_t))) - 1)) & ~((2 *(sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long) old_end & pagemask) == 0)' failed.

I have not been able to reproduce any of these problems when I boot the hypervisor with pv-l1tf=false. The php-fpm crash was previously reproducible 100% of the time; the other cases are very hard to trigger at the best of times, but with pv-l1tf=false I cannot trigger them at all. I have since checked out staging-4.10 and am seeing the same thing, so I'm fairly confident it is not something I introduced while applying the XSA patches.

My workload is several hundred PV guests across 9 servers with two different models of Intel CPU. The guests run many different Linux distributions, with roughly a 70/30 split between 32- and 64-bit. So far I have only encountered this on 64-bit guests running Debian jessie and stretch; fewer than 10 guests are affected (as reported so far), and all of them trigger the "d1 L1TF-vulnerable L1e 000000006a6ff960 - Shadowing" warning in dmesg (though there are hundreds of others which trigger it yet seem unaffected). There is also an unconfirmed report from a 64-bit Gentoo guest.

In the text for XSA-273 it says:

    "Shadowing comes with a workload-dependent performance hit to the guest. Once the guest kernel software updates have been applied, a well behaved guest will not write vulnerable PTEs, and will therefore avoid the performance penalty (or crash) entirely."

Does anyone have a reference to what is needed in the Linux kernel for that? Perhaps I can check the status of it in upstream and Debian kernels, and get past the problem by getting an updated kernel onto the affected guests.
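From what I have pieced together so far, the guest-side change appears to be the "PTE inversion" part of the Linux L1TF mitigations: when the kernel writes a not-present PTE (a swap entry, for example), it inverts the physical address bits so that any address the CPU speculates through points above the top of usable RAM rather than at real, possibly cached, data. A minimal sketch of the idea as I understand it (the function name and mask below are mine, not the kernel's, and the real code derives the mask from the CPU's physical address width):

    #include <stdint.h>

    #define PTE_PRESENT    0x1ULL
    #define PTE_ADDR_MASK  0x000ffffffffff000ULL  /* illustrative only */

    /*
     * Invert the physical address bits of a not-present PTE so that a
     * speculative load through it resolves above usable RAM instead of
     * hitting real data in the L1D cache.
     */
    static uint64_t l1tf_protect_nonpresent(uint64_t pte)
    {
        if (pte == 0 || (pte & PTE_PRESENT))
            return pte;                    /* empty or present: leave as-is */

        return (pte & ~PTE_ADDR_MASK) |    /* keep flag/swap-type bits */
               (~pte & PTE_ADDR_MASK);     /* invert the address bits  */
    }

If someone can confirm that this is the relevant change, and which upstream/stable kernels carry it, I can work out where the Debian kernels stand.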
The advisory also says:

    "This behaviour is active by default for guests on affected hardware (controlled by `pv-l1tf=`), but is disabled by default for dom0. Dom0's exemption is because of instabilities when being shadowed, which are under investigation."

I have not had these issues in any of my 9 dom0s, which are all 64-bit Debian jessie; since the L1TF shadowing is not active for dom0, that makes sense. Are the observed dom0 instabilities similar to what I am seeing in some guests?

Any suggestions for further debugging? I have attached an "xl dmesg".

Cheers,
Andy
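P.S. In case it helps anyone triaging similar reports: my reading of the XSA-273 patches (corrections welcome) is that Xen flags a PTE as L1TF-vulnerable when it is not present but its address bits could still name cacheable machine memory, since a speculative load through such an entry can reach real data in the L1D cache; that is what the "Shadowing" warning is reporting before the domain gets switched to shadow paging. A rough paraphrase of the check (not Xen's actual code: the names here are mine, and the real logic also considers superpages and the host address width):

    #include <stdbool.h>
    #include <stdint.h>

    #define PTE_PRESENT    0x1ULL
    #define PTE_ADDR_MASK  0x000ffffffffff000ULL  /* illustrative only */

    /*
     * "safe_maddr" stands in for a boundary above which nothing cacheable
     * lives on the host, so address bits pointing there cannot leak data.
     */
    static bool pte_is_l1tf_vulnerable(uint64_t pte, uint64_t safe_maddr)
    {
        if (pte & PTE_PRESENT)
            return false;           /* present entries translate normally */

        /* Not present, but the address bits still point into the range
         * of potentially cached machine memory. */
        return (pte & PTE_ADDR_MASK) != 0 &&
               (pte & PTE_ADDR_MASK) < safe_maddr;
    }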