* [xen-unstable test] 101698: regressions - FAIL
@ 2016-10-27 18:26 osstest service owner
2016-10-28 9:55 ` Broadwell TLB Erratum Andrew Cooper
0 siblings, 1 reply; 9+ messages in thread
From: osstest service owner @ 2016-10-27 18:26 UTC (permalink / raw)
To: xen-devel, osstest-admin
[-- Attachment #1: Type: text/plain, Size: 18468 bytes --]
flight 101698 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/101698/
Regressions :-(
Tests which did not succeed and are blocking,
including tests which could not be run:
test-xtf-amd64-amd64-5 44 xtf/test-hvm64-xsa-186 fail REGR. vs. 101673
test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install fail REGR. vs. 101673
test-amd64-i386-xl-qemut-debianhvm-amd64 17 guest-start/debianhvm.repeat fail REGR. vs. 101673
Regressions which are regarded as allowable (not blocking):
test-armhf-armhf-libvirt 13 saverestore-support-check fail like 101673
test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail like 101673
test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail like 101673
test-armhf-armhf-libvirt-qcow2 12 saverestore-support-check fail like 101673
test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail like 101673
test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail like 101673
test-armhf-armhf-libvirt-xsm 13 saverestore-support-check fail like 101673
test-armhf-armhf-libvirt-raw 12 saverestore-support-check fail like 101673
test-amd64-amd64-xl-rtds 9 debian-install fail like 101673
Tests which did not succeed, but are not blocking:
test-amd64-amd64-rumprun-amd64 1 build-check(1) blocked n/a
test-amd64-i386-rumprun-i386 1 build-check(1) blocked n/a
build-amd64-rumprun 7 xen-build fail never pass
test-amd64-amd64-xl-pvh-amd 11 guest-start fail never pass
build-i386-rumprun 7 xen-build fail never pass
test-amd64-amd64-libvirt 12 migrate-support-check fail never pass
test-amd64-amd64-libvirt-xsm 12 migrate-support-check fail never pass
test-amd64-i386-libvirt 12 migrate-support-check fail never pass
test-amd64-i386-libvirt-xsm 12 migrate-support-check fail never pass
test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
test-amd64-amd64-libvirt-vhd 11 migrate-support-check fail never pass
test-armhf-armhf-xl-arndale 12 migrate-support-check fail never pass
test-armhf-armhf-xl-arndale 13 saverestore-support-check fail never pass
test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2 fail never pass
test-armhf-armhf-libvirt 12 migrate-support-check fail never pass
test-armhf-armhf-xl-xsm 12 migrate-support-check fail never pass
test-armhf-armhf-xl-xsm 13 saverestore-support-check fail never pass
test-armhf-armhf-xl 12 migrate-support-check fail never pass
test-armhf-armhf-xl-cubietruck 12 migrate-support-check fail never pass
test-armhf-armhf-xl-cubietruck 13 saverestore-support-check fail never pass
test-armhf-armhf-xl 13 saverestore-support-check fail never pass
test-armhf-armhf-xl-credit2 12 migrate-support-check fail never pass
test-armhf-armhf-xl-credit2 13 saverestore-support-check fail never pass
test-amd64-amd64-xl-pvh-intel 11 guest-start fail never pass
test-armhf-armhf-libvirt-qcow2 11 migrate-support-check fail never pass
test-armhf-armhf-xl-rtds 12 migrate-support-check fail never pass
test-armhf-armhf-xl-rtds 13 saverestore-support-check fail never pass
test-armhf-armhf-libvirt-xsm 12 migrate-support-check fail never pass
test-armhf-armhf-xl-multivcpu 12 migrate-support-check fail never pass
test-armhf-armhf-xl-multivcpu 13 saverestore-support-check fail never pass
test-armhf-armhf-libvirt-raw 11 migrate-support-check fail never pass
test-armhf-armhf-xl-vhd 11 migrate-support-check fail never pass
test-armhf-armhf-xl-vhd 12 saverestore-support-check fail never pass
version targeted for testing:
xen e26722422764d3ddfe59e76f5efbd330f8f9288f
baseline version:
xen 6f9b62ca57322197e26d3b22ff11b629697142bd
Last test of basis 101673 2016-10-26 02:01:16 Z 1 days
Testing same since 101698 2016-10-26 19:50:48 Z 0 days 1 attempts
------------------------------------------------------------
People who touched revisions under test:
Andrew Cooper <andrew.cooper3@citrix.com>
Dario Faggioli <dario.faggioli@citrix.com>
David Scott <dave@recoil.org>
Ian Jackson <Ian.Jackson@eu.citrix.com>
Jan Beulich <jbeulich@suse.com>
Juergen Gross <jgross@suse.com>
Meng Xu <mengxu@cis.upenn.edu>
Roger Pau Monne <roger.pau@citrix.com>
Roger Pau Monné <roger.pau@citrix.com>
Wei Liu <wei.liu2@citrix.com>
jobs:
build-amd64-xsm pass
build-armhf-xsm pass
build-i386-xsm pass
build-amd64-xtf pass
build-amd64 pass
build-armhf pass
build-i386 pass
build-amd64-libvirt pass
build-armhf-libvirt pass
build-i386-libvirt pass
build-amd64-oldkern pass
build-i386-oldkern pass
build-amd64-prev pass
build-i386-prev pass
build-amd64-pvops pass
build-armhf-pvops pass
build-i386-pvops pass
build-amd64-rumprun fail
build-i386-rumprun fail
test-xtf-amd64-amd64-1 pass
test-xtf-amd64-amd64-2 pass
test-xtf-amd64-amd64-3 pass
test-xtf-amd64-amd64-4 pass
test-xtf-amd64-amd64-5 pass
test-amd64-amd64-xl pass
test-armhf-armhf-xl pass
test-amd64-i386-xl pass
test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm pass
test-amd64-i386-xl-qemut-debianhvm-amd64-xsm pass
test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm pass
test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm pass
test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm pass
test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm pass
test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm pass
test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm fail
test-amd64-amd64-libvirt-xsm pass
test-armhf-armhf-libvirt-xsm pass
test-amd64-i386-libvirt-xsm pass
test-amd64-amd64-xl-xsm pass
test-armhf-armhf-xl-xsm pass
test-amd64-i386-xl-xsm pass
test-amd64-amd64-qemuu-nested-amd fail
test-amd64-amd64-xl-pvh-amd fail
test-amd64-i386-qemut-rhel6hvm-amd pass
test-amd64-i386-qemuu-rhel6hvm-amd pass
test-amd64-amd64-xl-qemut-debianhvm-amd64 pass
test-amd64-i386-xl-qemut-debianhvm-amd64 fail
test-amd64-amd64-xl-qemuu-debianhvm-amd64 pass
test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
test-amd64-i386-freebsd10-amd64 pass
test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
test-amd64-i386-xl-qemuu-ovmf-amd64 pass
test-amd64-amd64-rumprun-amd64 blocked
test-amd64-amd64-xl-qemut-win7-amd64 fail
test-amd64-i386-xl-qemut-win7-amd64 fail
test-amd64-amd64-xl-qemuu-win7-amd64 fail
test-amd64-i386-xl-qemuu-win7-amd64 fail
test-armhf-armhf-xl-arndale pass
test-amd64-amd64-xl-credit2 pass
test-armhf-armhf-xl-credit2 pass
test-armhf-armhf-xl-cubietruck pass
test-amd64-i386-freebsd10-i386 pass
test-amd64-i386-rumprun-i386 blocked
test-amd64-amd64-qemuu-nested-intel pass
test-amd64-amd64-xl-pvh-intel fail
test-amd64-i386-qemut-rhel6hvm-intel pass
test-amd64-i386-qemuu-rhel6hvm-intel pass
test-amd64-amd64-libvirt pass
test-armhf-armhf-libvirt pass
test-amd64-i386-libvirt pass
test-amd64-amd64-migrupgrade pass
test-amd64-i386-migrupgrade pass
test-amd64-amd64-xl-multivcpu pass
test-armhf-armhf-xl-multivcpu pass
test-amd64-amd64-pair pass
test-amd64-i386-pair pass
test-amd64-amd64-libvirt-pair pass
test-amd64-i386-libvirt-pair pass
test-amd64-amd64-amd64-pvgrub pass
test-amd64-amd64-i386-pvgrub pass
test-amd64-amd64-pygrub pass
test-armhf-armhf-libvirt-qcow2 pass
test-amd64-amd64-xl-qcow2 pass
test-armhf-armhf-libvirt-raw pass
test-amd64-i386-xl-raw pass
test-amd64-amd64-xl-rtds fail
test-armhf-armhf-xl-rtds pass
test-amd64-i386-xl-qemut-winxpsp3-vcpus1 pass
test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 pass
test-amd64-amd64-libvirt-vhd pass
test-armhf-armhf-xl-vhd pass
test-amd64-amd64-xl-qemut-winxpsp3 pass
test-amd64-i386-xl-qemut-winxpsp3 pass
test-amd64-amd64-xl-qemuu-winxpsp3 pass
test-amd64-i386-xl-qemuu-winxpsp3 pass
------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images
Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs
Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master
Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary
Not pushing.
------------------------------------------------------------
commit e26722422764d3ddfe59e76f5efbd330f8f9288f
Author: Jan Beulich <jbeulich@suse.com>
Date: Wed Oct 26 16:13:21 2016 +0200
Revert "keyhandler: rework process of nonirq keyhandler"
This reverts commit 610b4eda2ce2b87cccbc8f61bdec01052e54fc66.
It's not useful without ed7e33747d, which got reverted already.
commit 9f47f3d69f4dcb2b33ccb8fb20057152302ea1ad
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Wed Oct 26 12:06:44 2016 +0100
x86/emul: Move CPUID Faulting fault generation into the emulator
In hindsight, this is a better position for it, as it avoids opencoding
hvmemul_inject_hw_exception() in hvmemul_cpuid(), and reduces the requirements
on other ops->cpuid() hooks wanting to implement cpuid faulting in the future.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
commit 0888d36bb23f7365ce12b03127fd0fb2661ec90e
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Fri Sep 23 14:48:27 2016 +0100
x86/emul: Correct the decoding of SReg3 operands
REX.R is ignored when considering segment register operands, and needs masking
out first.
While fixing this, reorder the user segments in x86_segment to match SReg3
encoding. This avoids needing a translation table between hardware ordering
and Xen's ordering.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
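
For readers unfamiliar with the SReg3 encoding mentioned above, the masking step can be sketched as follows. This is only an illustration, not the actual x86_emulate.c change: the names sreg3 and decode_sreg3() are hypothetical. The one fact relied upon is the architectural SReg3 order (ES, CS, SS, DS, FS, GS), with encodings 6 and 7 being invalid.

```c
#include <stdint.h>

/* User segments in architectural SReg3 encoding order. */
enum sreg3 {
    SREG_ES, SREG_CS, SREG_SS, SREG_DS, SREG_FS, SREG_GS,
    SREG_INVALID, /* encodings 6 and 7 do not name a segment register */
};

/*
 * Hypothetical helper: decode the segment register named by ModRM.reg.
 * 'rex' is the REX prefix byte, or 0 if no REX prefix is present.
 */
static enum sreg3 decode_sreg3(uint8_t rex, uint8_t modrm)
{
    /* A generic decoder extends ModRM.reg to 4 bits using REX.R ... */
    unsigned int reg = (((rex >> 2) & 1u) << 3) | ((modrm >> 3) & 7u);

    /* ... but REX.R is ignored for segment registers, so mask it out. */
    reg &= 7u;

    return reg <= SREG_GS ? (enum sreg3)reg : SREG_INVALID;
}
```

With the enum in hardware order, the masked value can be used directly as the segment index, which is why no translation table is needed.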
commit 22bc820abb5200729dc387e6a0653c31daecfef3
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Tue Oct 25 18:46:39 2016 +0100
x86/emul: Use explicit __attribute__((__packed__)) rather than __packed
x86_emulate.h is included by the userspace test harness. Avoid using
constructs which don't come from standard header files.
Reposition the test harness's inclusion of x86_emulate.h to avoid relying on
any definitions intended for use by x86_emulate.c alone.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
commit 1b843b2097e89d0fae18123cde88da9d167d9a0c
Author: Meng Xu <mengxu@cis.upenn.edu>
Date: Fri Oct 21 22:12:02 2016 -0400
xen: rtds: always clear the flag when replenishing a depleted vcpu
We should clear the __RTDS_depleted bit once a VCPU budget is replenished.
Because repl_timer_handler may be called after rt_schedule
but before rt_context_saved, the VCPU may be neither on a CPU nor on a
queue while it is in the middle of a context switch.
Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
commit 1307f8d3d46fe34f6eb739894008e8af3c168818
Author: Juergen Gross <jgross@suse.com>
Date: Mon Oct 24 13:27:17 2016 +0200
docs: remove wrong statement about bug in xenstore
docs/misc/xenstore.txt states that xenstored will use "0" as a valid
transaction id after 2^32 transactions. This is not true. Remove that
statement.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
commit 0897514b4b376a167f968f79c6ea0dee1061458e
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Wed Oct 26 10:34:21 2016 +0100
tools/oxenstored: Avoid allocating invalid transaction ids
The transaction id of 0 is reserved, meaning "not in a transaction". It is up
to the xenstored server to allocate transaction ids. While oxenstored starts
its ids at 1, but insufficient care is taken with truncation cases.
A 32bit oxenstored has an int with 31 bits of width, meaning that the
transaction id will wrap around to 0 after 2 billion transactions.
A 64bit oxenstored has an int with 63 bits of width, meaning that once 4
billion transactions are used, the allocated id will be truncated when written
into the uint32_t field in the ring. This causes the client to reply with the
truncated id, breaking any further attempt to use any transactions.
Limit all transaction ids to the range between 1 and 0x7ffffffe. This is the
best which can be done without making oxenstored depend on Stdint or Cstruct,
yet still work for 32bit builds.
Also check that the proposed new transaction id isn't currently in use. For
the first 2 billion transactions there is no chance of a collision, and after
that, the chance is at most 20 (the default open transaction quota) in 2
billion.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: David Scott <dave@recoil.org>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
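
The allocation rule described in the commit message above can be sketched in C as follows. oxenstored itself is written in OCaml, so this is an illustration of the idea rather than the actual change: struct txn_table, txn_in_use() and txn_alloc_id() are hypothetical names, and only the 1..0x7ffffffe range and the "not currently in use" check come from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TXN_ID_MAX 0x7ffffffeu /* upper bound named in the commit message */

/* Hypothetical bookkeeping: the ids of currently open transactions. */
struct txn_table {
    uint32_t next;        /* next candidate id */
    const uint32_t *open; /* ids currently in use */
    size_t nr_open;
};

static bool txn_in_use(const struct txn_table *t, uint32_t id)
{
    for (size_t i = 0; i < t->nr_open; i++)
        if (t->open[i] == id)
            return true;
    return false;
}

/*
 * Return an id in [1, TXN_ID_MAX] which is not currently open.
 * Assumes far fewer transactions are open than there are ids, so the
 * loop always terminates (the default quota quoted above is 20).
 */
static uint32_t txn_alloc_id(struct txn_table *t)
{
    for (;;) {
        uint32_t id = t->next;

        /* Never hand out the reserved id 0 or anything out of range. */
        if (id < 1 || id > TXN_ID_MAX)
            id = 1;

        /* Advance the candidate, wrapping from TXN_ID_MAX back to 1. */
        t->next = (id == TXN_ID_MAX) ? 1 : id + 1;

        if (!txn_in_use(t, id))
            return id;
    }
}
```

Because every returned value fits in 31 bits, it survives both a 31-bit OCaml int and the uint32_t field in the ring without truncation.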
commit 4000a7c7d7b0e01837abd3918e393f289c07d68c
Author: Roger Pau Monne <roger.pau@citrix.com>
Date: Tue Oct 25 11:53:28 2016 +0200
tools/configure: fix pkg-config install path for FreeBSD
pkg-config from FreeBSD ports doesn't have ${prefix}/share/pkgconfig in the
default search path, fix this by having a PKG_INSTALLDIR variable that can
be changed on a per-OS basis.
It would be best to use PKG_INSTALLDIR as defined by the pkg.m4 macro, but
sadly this also reports a wrong value on FreeBSD (${libdir}/pkgconfig, which
expands to /usr/local/lib/pkgconfig by default, and is also _not_ part of
the default pkg-config search path).
This patch should not change the behavior for Linux installs.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reported-by: Alexander Nusov <alexander.nusov@nfvexpress.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
commit 0d250b69eae5d1e8039270c763b05acc84589a8c
Author: Ian Jackson <ian.jackson@eu.citrix.com>
Date: Wed Oct 26 12:06:17 2016 +0100
Update QEMU_UPSTREAM_REVISION
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
(qemu changes not included)
[-- Attachment #2: Type: text/plain, Size: 127 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 9+ messages in thread
* Broadwell TLB Erratum
2016-10-27 18:26 [xen-unstable test] 101698: regressions - FAIL osstest service owner
@ 2016-10-28 9:55 ` Andrew Cooper
2016-10-28 10:09 ` Jan Beulich
0 siblings, 1 reply; 9+ messages in thread
From: Andrew Cooper @ 2016-10-28 9:55 UTC (permalink / raw)
To: osstest service owner, xen-devel, Lai, Paul C, Kevin Tian, Jun Nakajima
Cc: Ian Jackson, Wei Liu, Jan Beulich
On 27/10/16 19:26, osstest service owner wrote:
> flight 101698 xen-unstable real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/101698/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
> test-xtf-amd64-amd64-5 44 xtf/test-hvm64-xsa-186 fail REGR. vs. 101673
--- Xen Test Framework ---
Environment: HVM 64bit (Long mode 4 levels)
XSA-186 PoC
******************************
PANIC: Unhandled exception at 0008:fffffffffffffffa
Vec 14 #PF[-I-sr-] %cr2 fffffffffffffffa
******************************
This is an issue I have seen before, and I think it is a TLB erratum in
Broadwell processors. Within XenServer, it has now been observed on one
SDP and two different Broadwell servers from different vendors.
The first CPU I saw it on was
CPU Vendor: Intel, Family 6 (0x6), Model 71 (0x47), Stepping 1 (raw
00040671)
Nobbling-1, which this test ran on, is
CPU Vendor: Intel, Family 6 (0x6), Model 79 (0x4f), Stepping 1 (raw
000406f1)
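
As a cross-check of the two signatures quoted above, the raw value (CPUID leaf 1, EAX) can be decoded into family, model and stepping with the usual Intel rules. This is a minimal sketch; struct cpu_sig and decode_cpu_sig() are names made up for the example.

```c
#include <stdint.h>

struct cpu_sig {
    unsigned int family, model, stepping;
};

/* Decode a raw CPUID.1:EAX signature such as 0x000406f1. */
static struct cpu_sig decode_cpu_sig(uint32_t eax)
{
    struct cpu_sig s;
    unsigned int base_family = (eax >> 8) & 0xf;
    unsigned int base_model  = (eax >> 4) & 0xf;
    unsigned int ext_family  = (eax >> 20) & 0xff;
    unsigned int ext_model   = (eax >> 16) & 0xf;

    s.stepping = eax & 0xf;

    /* The extended family is only added when the base family is 0xf. */
    s.family = (base_family == 0xf) ? base_family + ext_family : base_family;

    /* The extended model only applies to base families 6 and 0xf. */
    s.model = (base_family == 0x6 || base_family == 0xf)
              ? (ext_model << 4) | base_model : base_model;

    return s;
}
```

Decoding 0x00040671 this way gives family 6, model 0x47, stepping 1, and 0x000406f1 gives family 6, model 0x4f, stepping 1, matching the values reported above.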
The code in question sets up the mapping, memcpy()'s an instruction stub
into place, then calls the stub.
This pagefault is from the call, after the memcpy() has succeeded,
therefore proving the mapping is present in the dTLB.
The issue reproduces ~1 in 200 times, but can reliably be found in a
minute or two. Inserting an invlpg instruction immediately before the
call appears to resolve the issue (i.e. the tests run for ~1 hour
without observing the issue).
Architecturally however, this invlpg should have no effect. I think
there is some race condition propagating TLB records to the L1 iTLB if
it is already present in the L1 dTLB.
When I first discovered this, I checked the NDA Specification
Update for the processor, and didn't find any published errata which
matched the symptoms.
~Andrew
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Broadwell TLB Erratum
2016-10-28 9:55 ` Broadwell TLB Erratum Andrew Cooper
@ 2016-10-28 10:09 ` Jan Beulich
2016-10-28 10:36 ` [XTF PATCH] XSA-186: Work around suspected Broadwell TLB erratum Andrew Cooper
0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2016-10-28 10:09 UTC (permalink / raw)
To: Andrew Cooper
Cc: Kevin Tian, Wei Liu, Paul C Lai, Ian Jackson,
osstest service owner, Jun Nakajima, xen-devel
>>> On 28.10.16 at 11:55, <andrew.cooper3@citrix.com> wrote:
> On 27/10/16 19:26, osstest service owner wrote:
>> flight 101698 xen-unstable real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/101698/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>> test-xtf-amd64-amd64-5 44 xtf/test-hvm64-xsa-186 fail REGR. vs. 101673
>
> --- Xen Test Framework ---
> Environment: HVM 64bit (Long mode 4 levels)
> XSA-186 PoC
> ******************************
> PANIC: Unhandled exception at 0008:fffffffffffffffa
> Vec 14 #PF[-I-sr-] %cr2 fffffffffffffffa
> ******************************
>
> This is an issue I have seen before, and I think it is a TLB erratum in
> Broadwell processors. Within XenServer, it has now been observed on one
> SDP and two different Broadwell servers from different vendors.
>
> The first CPU I saw it on was
>
> CPU Vendor: Intel, Family 6 (0x6), Model 71 (0x47), Stepping 1 (raw
> 00040671)
>
> Nobbling-1, which this test ran on, is
>
> CPU Vendor: Intel, Family 6 (0x6), Model 79 (0x4f), Stepping 1 (raw
> 000406f1)
>
>
> The code in question sets up the mapping, memcpy()'s an instruction stub
> into place, then calls the stub.
>
> This pagefault is from the call, after the memcpy() has succeeded,
> therefore proving the mapping is present in the dTLB.
>
> The issue reproduces ~1 in 200 times, but can reliably be found in a
> minute or two. Inserting an invlpg instruction immediately before the
> call appears to resolve the issue (i.e. the tests run for ~1 hour
> without observing the issue).
>
> Architecturally however, this invlpg should have no effect. I think
> there is some race condition propagating TLB records to the L1 iTLB if
> it is already present in the L1 dTLB.
>
> When I first discovered this, I checked the NDA Specification
> Update for the processor, and didn't find any published errata which
> matched the symptoms.
So until you/we hear back from Intel (which as we all know can take
a while), could you insert an INVLPG in the test, to eliminate these
(supposedly spurious) failures?
Jan
^ permalink raw reply [flat|nested] 9+ messages in thread
* [XTF PATCH] XSA-186: Work around suspected Broadwell TLB erratum
2016-10-28 10:09 ` Jan Beulich
@ 2016-10-28 10:36 ` Andrew Cooper
2016-10-28 12:03 ` Jan Beulich
0 siblings, 1 reply; 9+ messages in thread
From: Andrew Cooper @ 2016-10-28 10:36 UTC (permalink / raw)
To: Xen-devel; +Cc: Andrew Cooper
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
tests/xsa-186/main.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/tests/xsa-186/main.c b/tests/xsa-186/main.c
index f2bc8f6..fe7e98b 100644
--- a/tests/xsa-186/main.c
+++ b/tests/xsa-186/main.c
@@ -144,6 +144,29 @@ void test_main(void)
memcpy(stub, insn_buf_start, insn_buf_end - insn_buf_start);
/*
+ * Work around suspected Broadwell TLB Erratum
+ *
+ * Occasionally, this test fails with:
+ *
+ * --- Xen Test Framework ---
+ * Environment: HVM 64bit (Long mode 4 levels)
+ * XSA-186 PoC
+ * ******************************
+ * PANIC: Unhandled exception at 0008:fffffffffffffffa
+ * Vec 14 #PF[-I-sr-] %cr2 fffffffffffffffa
+ * ******************************
+ *
+ * on Broadwell hardware. The mapping is definitely present as the
+ * memcpy() has already succeeded. Inserting an invlpg resolves the
+ * issue, sugguesting that there is a race condition between dTLB/iTLB
+ * handling.
+ *
+ * Work around the issue for now, to avoid intermittent OSSTest failures
+ * from blocking pushes of unrelated changes.
+ */
+ invlpg(stub);
+
+ /*
* Execute the stub.
*
* Intel CPUs are happy doing this for 32 and 64bit. AMD CPUs are happy
--
2.1.4
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [XTF PATCH] XSA-186: Work around suspected Broadwell TLB erratum
2016-10-28 10:36 ` [XTF PATCH] XSA-186: Work around suspected Broadwell TLB erratum Andrew Cooper
@ 2016-10-28 12:03 ` Jan Beulich
2016-10-28 12:39 ` Andrew Cooper
0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2016-10-28 12:03 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Xen-devel
>>> On 28.10.16 at 12:36, <andrew.cooper3@citrix.com> wrote:
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(Maybe you want to drop the ...
> --- a/tests/xsa-186/main.c
> +++ b/tests/xsa-186/main.c
> @@ -144,6 +144,29 @@ void test_main(void)
> memcpy(stub, insn_buf_start, insn_buf_end - insn_buf_start);
>
> /*
> + * Work around suspected Broadwell TLB Erratum
> + *
> + * Occasionally, this test fails with:
> + *
> + * --- Xen Test Framework ---
> + * Environment: HVM 64bit (Long mode 4 levels)
> + * XSA-186 PoC
> + * ******************************
> + * PANIC: Unhandled exception at 0008:fffffffffffffffa
> + * Vec 14 #PF[-I-sr-] %cr2 fffffffffffffffa
> + * ******************************
> + *
> + * on Broadwell hardware. The mapping is definitely present as the
> + * memcpy() has already succeeded. Inserting an invlpg resolves the
> + * issue, sugguesting that there is a race condition between dTLB/iTLB
... stray u which slipped into "suggesting".)
Btw - would you mind trying something else: Instead of the INVLPG,
put a CPUID or some other serializing instruction in here. ISTR that
for self modifying code this is required, i.e. the CPU could have been
fetching instructions ahead of the memcpy(), and nothing would be
there to force it to drop what it has already executed speculatively,
including the exception token.
Jan
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [XTF PATCH] XSA-186: Work around suspected Broadwell TLB erratum
2016-10-28 12:03 ` Jan Beulich
@ 2016-10-28 12:39 ` Andrew Cooper
2016-10-28 12:49 ` Jan Beulich
0 siblings, 1 reply; 9+ messages in thread
From: Andrew Cooper @ 2016-10-28 12:39 UTC (permalink / raw)
To: Jan Beulich; +Cc: Xen-devel
On 28/10/16 13:03, Jan Beulich wrote:
>>>> On 28.10.16 at 12:36, <andrew.cooper3@citrix.com> wrote:
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> (Maybe you want to drop the ...
>
>> --- a/tests/xsa-186/main.c
>> +++ b/tests/xsa-186/main.c
>> @@ -144,6 +144,29 @@ void test_main(void)
>> memcpy(stub, insn_buf_start, insn_buf_end - insn_buf_start);
>>
>> /*
>> + * Work around suspected Broadwell TLB Erratum
>> + *
>> + * Occasionally, this test fails with:
>> + *
>> + * --- Xen Test Framework ---
>> + * Environment: HVM 64bit (Long mode 4 levels)
>> + * XSA-186 PoC
>> + * ******************************
>> + * PANIC: Unhandled exception at 0008:fffffffffffffffa
>> + * Vec 14 #PF[-I-sr-] %cr2 fffffffffffffffa
>> + * ******************************
>> + *
>> + * on Broadwell hardware. The mapping is definitely present as the
>> + * memcpy() has already succeeded. Inserting an invlpg resolves the
>> + * issue, sugguesting that there is a race condition between dTLB/iTLB
> ... stray u which slipped into "suggesting".)
>
> Btw - would you mind trying something else: Instead of the INVLPG,
> put a CPUID or some other serializing instruction in here. ISTR that
> for self modifying code this is required, i.e. the CPU could have been
> fetching instructions ahead of the memcpy(), and nothing would be
> there to force it to drop what it has already executed speculatively,
> including the exception token.
That is an interesting point, but still doesn't explain the symptoms.
If the icache wasn't flushed, we might get junk instructions and a #UD/#GP.
However, in this case the fault is for an instruction fetch from a
non-present page, not a failure to execute what it found there.
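
To make the reading of the "#PF[-I-sr-]" flags concrete, here is a small sketch which renders an x86 page-fault error code as a flag string. The bit positions are architectural; the six-character layout (protection key, instruction fetch, reserved bit, user/supervisor, write/read, present) is an assumption inferred from the panic line above rather than taken from the XTF source, and pfec_to_str() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Architectural x86 page-fault error code bits. */
#define PFEC_PRESENT (1u << 0) /* fault on a present page */
#define PFEC_WRITE   (1u << 1) /* write access (clear: read) */
#define PFEC_USER    (1u << 2) /* user-mode access (clear: supervisor) */
#define PFEC_RSVD    (1u << 3) /* reserved bit set in a paging entry */
#define PFEC_INSN    (1u << 4) /* instruction fetch */
#define PFEC_PK      (1u << 5) /* protection-key violation */

/* Assumed layout: [K][I][R][U/s][W/r][P], with '-' for a clear flag. */
static void pfec_to_str(uint32_t ec, char *buf, size_t len)
{
    snprintf(buf, len, "%c%c%c%c%c%c",
             (ec & PFEC_PK)      ? 'K' : '-',
             (ec & PFEC_INSN)    ? 'I' : '-',
             (ec & PFEC_RSVD)    ? 'R' : '-',
             (ec & PFEC_USER)    ? 'U' : 's',
             (ec & PFEC_WRITE)   ? 'W' : 'r',
             (ec & PFEC_PRESENT) ? 'P' : '-');
}
```

Under that assumed layout, an error code of 0x10 (instruction fetch, supervisor, not present) renders as "-I-sr-", which is the combination seen in the failure.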
I expect a cpuid instruction would resolve the issue, but it also forces
a vmexit which complicates the microarchitectural interactions here.
Something else, like executing an int3 will also serialise the pipeline,
but not vmexit. I will try and find some time to experiment.
~Andrew
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [XTF PATCH] XSA-186: Work around suspected Broadwell TLB erratum
2016-10-28 12:39 ` Andrew Cooper
@ 2016-10-28 12:49 ` Jan Beulich
2016-10-28 13:02 ` Andrew Cooper
0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2016-10-28 12:49 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Xen-devel
>>> On 28.10.16 at 14:39, <andrew.cooper3@citrix.com> wrote:
> On 28/10/16 13:03, Jan Beulich wrote:
>>>>> On 28.10.16 at 12:36, <andrew.cooper3@citrix.com> wrote:
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>> (Maybe you want to drop the ...
>>
>>> --- a/tests/xsa-186/main.c
>>> +++ b/tests/xsa-186/main.c
>>> @@ -144,6 +144,29 @@ void test_main(void)
>>> memcpy(stub, insn_buf_start, insn_buf_end - insn_buf_start);
>>>
>>> /*
>>> + * Work around suspected Broadwell TLB Erratum
>>> + *
>>> + * Occasionally, this test fails with:
>>> + *
>>> + * --- Xen Test Framework ---
>>> + * Environment: HVM 64bit (Long mode 4 levels)
>>> + * XSA-186 PoC
>>> + * ******************************
>>> + * PANIC: Unhandled exception at 0008:fffffffffffffffa
>>> + * Vec 14 #PF[-I-sr-] %cr2 fffffffffffffffa
>>> + * ******************************
>>> + *
>>> + * on Broadwell hardware. The mapping is definitely present as the
>>> + * memcpy() has already succeeded. Inserting an invlpg resolves the
>>> + * issue, sugguesting that there is a race condition between dTLB/iTLB
>> ... stray u which slipped into "suggesting".)
>>
>> Btw - would you mind trying something else: Instead of the INVLPG,
>> put a CPUID or some other serializing instruction in here. ISTR that
>> for self modifying code this is required, i.e. the CPU could have been
>> fetching instructions ahead of the memcpy(), and nothing would be
>> there to force it to drop what it has already executed speculatively,
>> including the exception token.
>
> That is an interesting point, but still doesn't explain the symptoms.
> If the icache wasn't flushed, we might get junk instructions and a #UD/#GP.
No. As the processor speculates the call, it won't be able to fetch
the target instruction and hence would insert an exception token
into the queue. There would be junk instruction bytes only if there
was a prior mapping for that page, but aiui a mapping for that
address gets established exactly once.
> However, in this case the fault is for an instruction fetch from a
> non-present page, not a failure to execute what it found there.
>
> I expect a cpuid instruction would resolve the issue, but it also forces
> a vmexit which complicates the microarchitectural interactions here.
> Something else, like executing an int3 will also serialise the pipeline,
> but not vmexit. I will try and find some time to experiment.
You're in ring 0, aren't you? That gives you plenty of serializing
instructions which don't directly interact with the TLBs. An LLDT
with a zero selector might be the one with least side effects. And
in case you're not in ring 0, make up an interrupt frame and
execute an IRET.
Jan
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [XTF PATCH] XSA-186: Work around suspected Broadwell TLB erratum
2016-10-28 12:49 ` Jan Beulich
@ 2016-10-28 13:02 ` Andrew Cooper
2016-10-28 13:37 ` Jan Beulich
0 siblings, 1 reply; 9+ messages in thread
From: Andrew Cooper @ 2016-10-28 13:02 UTC (permalink / raw)
To: Jan Beulich; +Cc: Xen-devel
On 28/10/16 13:49, Jan Beulich wrote:
>>>> On 28.10.16 at 14:39, <andrew.cooper3@citrix.com> wrote:
>> On 28/10/16 13:03, Jan Beulich wrote:
>>>>>> On 28.10.16 at 12:36, <andrew.cooper3@citrix.com> wrote:
>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>> (Maybe you want to drop the ...
>>>
>>>> --- a/tests/xsa-186/main.c
>>>> +++ b/tests/xsa-186/main.c
>>>> @@ -144,6 +144,29 @@ void test_main(void)
>>>> memcpy(stub, insn_buf_start, insn_buf_end - insn_buf_start);
>>>>
>>>> /*
>>>> + * Work around suspected Broadwell TLB Erratum
>>>> + *
>>>> + * Occasionally, this test fails with:
>>>> + *
>>>> + * --- Xen Test Framework ---
>>>> + * Environment: HVM 64bit (Long mode 4 levels)
>>>> + * XSA-186 PoC
>>>> + * ******************************
>>>> + * PANIC: Unhandled exception at 0008:fffffffffffffffa
>>>> + * Vec 14 #PF[-I-sr-] %cr2 fffffffffffffffa
>>>> + * ******************************
>>>> + *
>>>> + * on Broadwell hardware. The mapping is definitely present as the
>>>> + * memcpy() has already succeeded. Inserting an invlpg resolves the
>>>> + * issue, sugguesting that there is a race condition between dTLB/iTLB
>>> ... stray u which slipped into "suggesting".)
>>>
>>> Btw - would you mind trying something else: Instead of the INVLPG,
>>> put a CPUID or some other serializing instruction in here. ISTR that
>>> for self modifying code this is required, i.e. the CPU could have been
>>> fetching instructions ahead of the memcpy(), and nothing would be
>>> there to force it to drop what it has already executed speculatively,
>>> including the exception token.
>> That is an interesting point, but still doesn't explain the symptoms.
>> If the icache wasn't flushed, we might get junk instructions and a #UD/#GP.
> No. As the processor speculates the call, it won't be able to fetch
> the target instruction and hence would insert an exception token
> into the queue. There would be junk instruction bytes only if there
> was a prior mapping for that page, but aiui a mapping for that
> address gets established exactly once.
Re-reading Intel SDM Vol 3, section 11.6 "Self-Modifying Code":
* A write to a memory location in a code segment that is currently
cached in the processor causes the associated cache line (or lines) to
be invalidated. This check is based on the physical address of the
instruction. If the write affects a prefetched instruction, the
prefetch queue is invalidated. This latter check is based on the linear
address of the instruction.
* Systems software, such as a debugger, that might possibly modify an
instruction using a different linear address than that used to fetch the
instruction, will execute a serializing operation, such as a CPUID
instruction, before the modified instruction is executed, which will
automatically resynchronize the instruction cache and prefetch queue.
As this is a single vcpu using a single flat address space, the memcpy()
should invalidate any speculative execution which has already happened.
>
>> However, in this case the fault is for an instruction fetch from a
>> non-present page, not a failure to execute what it found there.
>>
>> I expect a cpuid instruction would resolve the issue, but it also forces
>> a vmexit which complicates the microarchitectural interactions here.
>> Something else, like executing an int3 will also serialise the pipeline,
>> but not vmexit. I will try and find some time to experiment.
> You're in ring 0, aren't you? That gives you plenty of serializing
> instructions which don't directly interact with the TLBs. An LLDT
> with a zero selector might be the one with least side effects. And
> in case you're not in ring 0, make up an interrupt frame and
> execute an IRET.
Yes - all better options.
~Andrew
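
For reference, the two serializing options discussed above could look roughly like the following. This is only a sketch, not the code from the patch: the helper names (serialize_cpuid, serialize_lldt_null, install_stub) are made up for illustration, and GCC/Clang inline assembly is assumed. CPUID works at any privilege level but, as noted above, forces a vmexit in an HVM guest; LLDT with a null selector is ring 0 only and leaves the LDT unusable until it is reloaded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Serialize the instruction stream with CPUID (any privilege level). */
static inline void serialize_cpuid(void)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t eax = 0, ebx, ecx = 0, edx;

    __asm__ volatile ("cpuid"
                      : "+a" (eax), "=b" (ebx), "+c" (ecx), "=d" (edx)
                      :
                      : "memory");
    (void)ebx;
    (void)edx;
#else
    /* Non-x86 build: compiler barrier only, so the sketch still compiles. */
    __asm__ volatile ("" ::: "memory");
#endif
}

#if defined(__x86_64__) || defined(__i386__)
/* Ring 0 only: LLDT with a null selector. Raises #GP outside ring 0. */
static inline void serialize_lldt_null(void)
{
    __asm__ volatile ("lldt %w0" :: "r" ((uint16_t)0) : "memory");
}
#endif

/*
 * Hypothetical helper: copy instruction bytes into place, then serialize
 * before the caller jumps to them.
 */
static void *install_stub(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, len);
    serialize_cpuid();
    return dst;
}
```

In a ring 0 test, serialize_lldt_null() would take the place of serialize_cpuid() inside install_stub() to avoid the vmexit.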
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
* Re: [XTF PATCH] XSA-186: Work around suspected Broadwell TLB erratum
From: Jan Beulich @ 2016-10-28 13:37 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Xen-devel
>>> On 28.10.16 at 15:02, <andrew.cooper3@citrix.com> wrote:
> On 28/10/16 13:49, Jan Beulich wrote:
>>>>> On 28.10.16 at 14:39, <andrew.cooper3@citrix.com> wrote:
>>> On 28/10/16 13:03, Jan Beulich wrote:
>>>>>>> On 28.10.16 at 12:36, <andrew.cooper3@citrix.com> wrote:
>>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>>> (Maybe you want to drop the ...
>>>>
>>>>> --- a/tests/xsa-186/main.c
>>>>> +++ b/tests/xsa-186/main.c
>>>>> @@ -144,6 +144,29 @@ void test_main(void)
>>>>> memcpy(stub, insn_buf_start, insn_buf_end - insn_buf_start);
>>>>>
>>>>> /*
>>>>> + * Work around suspected Broadwell TLB Erratum
>>>>> + *
>>>>> + * Occasionally, this test failes with:
>>>>> + *
>>>>> + * --- Xen Test Framework ---
>>>>> + * Environment: HVM 64bit (Long mode 4 levels)
>>>>> + * XSA-186 PoC
>>>>> + * ******************************
>>>>> + * PANIC: Unhandled exception at 0008:fffffffffffffffa
>>>>> + * Vec 14 #PF[-I-sr-] %cr2 fffffffffffffffa
>>>>> + * ******************************
>>>>> + *
>>>>> + * on Broadwell hardware. The mapping is definitely present as the
>>>>> + * memcpy() has already succeeded. Inserting an invlpg resolves the
>>>>> + * issue, sugguesting that there is a race conditon between dTLB/iTLB
>>>> ... stray u which slipped into "suggesting".)
>>>>
>>>> Btw - would you mind trying something else: Instead of the INVLPG,
>>>> put a CPUID or some other serializing instruction in here. ISTR that
>>>> for self modifying code this is required, i.e. the CPU could have been
>>>> fetching instructions ahead of the memcpy(), and nothing would be
>>>> there to force it to drop what it has already executed speculatively,
>>>> including the exception token.
>>> That is an interesting point, but still doesn't explain the symptoms.
>>> If the icache wasn't flushed, we might get junk instructions and a #UD/#GP.
>> No. As the processor speculates the call, it won't be able to fetch
>> the target instruction and hence would insert an exception token
>> into the queue. There would be junk instruction bytes only if there
>> was a prior mapping for that page, but aiui a mapping for that
>> address gets established exactly once.
>
> Re-reading Intel Vol 3 11.6 "Self-Modifying Code".
>
> * A write to a memory location in a code segment that is currently
> cached in the processor causes the associated cache line (or lines) to
> be invalidated. This check is based on the physical address of the
> instruction. If the write affects a prefetched instruction, the
> prefetch queue is invalidated. This latter check is based on the linear
> address of the instruction.
>
> * Systems software, such as a debugger, that might possibly modify an
> instruction using a different linear address than that used to fetch the
> instruction, will execute a serializing operation, such as a CPUID
> instruction, before the modified instruction is executed, which will
> automatically resynchronize the instruction cache and prefetch queue.
>
> As this is a single vcpu using a single flat address space, the memcpy()
> should invalidate any speculative execution which has already happened.
And still you describe only the case where there would need to be
a prior mapping - without one there simply is no physical address to
compare against. What if speculative execution ends up performing
the call to stub before you finish populating page tables? That would
also explain the error code. But aiui this might still be an erratum, as
it might be a memory ordering violation (depending on whether insn
fetches count as reads here, which would then have to observe
earlier writes, albeit the addresses of the two are different).
Jan
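
As an aid to reading the error code mentioned above, here is a small sketch that decodes a page-fault error code into the six-character form seen in the panic message (`#PF[-I-sr-]`). The bit positions are the architectural ones; the character layout is inferred from that single sample and is an assumption, not taken from the XTF source. The names decode_pfec and pfec_is are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Architectural x86 page-fault error code bits. */
#define PFEC_PRESENT (1u << 0)  /* fault on a present mapping */
#define PFEC_WRITE   (1u << 1)  /* write access */
#define PFEC_USER    (1u << 2)  /* user-mode access */
#define PFEC_RSVD    (1u << 3)  /* reserved bit set in a paging entry */
#define PFEC_INSN    (1u << 4)  /* instruction fetch */
#define PFEC_PK      (1u << 5)  /* protection-key violation */

/* Assumed layout: PK, insn/data, rsvd, user/supervisor, write/read, present. */
static void decode_pfec(unsigned int ec, char out[7])
{
    assert(out != NULL);
    out[0] = (ec & PFEC_PK)      ? 'K' : '-';
    out[1] = (ec & PFEC_INSN)    ? 'I' : 'd';
    out[2] = (ec & PFEC_RSVD)    ? 'R' : '-';
    out[3] = (ec & PFEC_USER)    ? 'U' : 's';
    out[4] = (ec & PFEC_WRITE)   ? 'W' : 'r';
    out[5] = (ec & PFEC_PRESENT) ? 'P' : '-';
    out[6] = '\0';
}

/* Convenience check: does the error code decode to the expected string? */
static int pfec_is(unsigned int ec, const char *expected)
{
    char buf[7];

    decode_pfec(ec, buf);
    return strcmp(buf, expected) == 0;
}
```

Under this reading, `-I-sr-` corresponds to error code 0x10: a supervisor-mode instruction fetch that found no present mapping, which is what a speculated call to the stub ahead of the page-table update would produce.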
Thread overview: 9+ messages
2016-10-27 18:26 [xen-unstable test] 101698: regressions - FAIL osstest service owner
2016-10-28 9:55 ` Broadwell TLB Erratum Andrew Cooper
2016-10-28 10:09 ` Jan Beulich
2016-10-28 10:36 ` [XTF PATCH] XSA-186: Work around suspected Broadwell TLB erratum Andrew Cooper
2016-10-28 12:03 ` Jan Beulich
2016-10-28 12:39 ` Andrew Cooper
2016-10-28 12:49 ` Jan Beulich
2016-10-28 13:02 ` Andrew Cooper
2016-10-28 13:37 ` Jan Beulich