* [xen-unstable test] 101698: regressions - FAIL
@ 2016-10-27 18:26 osstest service owner
  2016-10-28  9:55 ` Broadwell TLB Erratum Andrew Cooper
  0 siblings, 1 reply; 9+ messages in thread
From: osstest service owner @ 2016-10-27 18:26 UTC (permalink / raw)
  To: xen-devel, osstest-admin

flight 101698 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/101698/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-xtf-amd64-amd64-5       44 xtf/test-hvm64-xsa-186   fail REGR. vs. 101673
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install fail REGR. vs. 101673
 test-amd64-i386-xl-qemut-debianhvm-amd64 17 guest-start/debianhvm.repeat fail REGR. vs. 101673

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-libvirt     13 saverestore-support-check    fail  like 101673
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop             fail like 101673
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop             fail like 101673
 test-armhf-armhf-libvirt-qcow2 12 saverestore-support-check   fail like 101673
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop            fail like 101673
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop            fail like 101673
 test-armhf-armhf-libvirt-xsm 13 saverestore-support-check    fail  like 101673
 test-armhf-armhf-libvirt-raw 12 saverestore-support-check    fail  like 101673
 test-amd64-amd64-xl-rtds      9 debian-install               fail  like 101673

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-rumprun-amd64  1 build-check(1)               blocked  n/a
 test-amd64-i386-rumprun-i386  1 build-check(1)               blocked  n/a
 build-amd64-rumprun           7 xen-build                    fail   never pass
 test-amd64-amd64-xl-pvh-amd  11 guest-start                  fail   never pass
 build-i386-rumprun            7 xen-build                    fail   never pass
 test-amd64-amd64-libvirt     12 migrate-support-check        fail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-check        fail   never pass
 test-amd64-i386-libvirt      12 migrate-support-check        fail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-check        fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-check    fail   never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-libvirt     12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-xsm      12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-xsm      13 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl          12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-check        fail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-check    fail never pass
 test-armhf-armhf-xl          13 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-check    fail   never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start                  fail  never pass
 test-armhf-armhf-libvirt-qcow2 11 migrate-support-check        fail never pass
 test-armhf-armhf-xl-rtds     12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-rtds     13 saverestore-support-check    fail   never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-check        fail  never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-check    fail  never pass
 test-armhf-armhf-libvirt-raw 11 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-vhd      11 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-vhd      12 saverestore-support-check    fail   never pass

version targeted for testing:
 xen                  e26722422764d3ddfe59e76f5efbd330f8f9288f
baseline version:
 xen                  6f9b62ca57322197e26d3b22ff11b629697142bd

Last test of basis   101673  2016-10-26 02:01:16 Z    1 days
Testing same since   101698  2016-10-26 19:50:48 Z    0 days    1 attempts

------------------------------------------------------------
People who touched revisions under test:
  Andrew Cooper <andrew.cooper3@citrix.com>
  Dario Faggioli <dario.faggioli@citrix.com>
  David Scott <dave@recoil.org>
  Ian Jackson <Ian.Jackson@eu.citrix.com>
  Jan Beulich <jbeulich@suse.com>
  Juergen Gross <jgross@suse.com>
  Meng Xu <mengxu@cis.upenn.edu>
  Roger Pau Monne <roger.pau@citrix.com>
  Roger Pau Monné <roger.pau@citrix.com>
  Wei Liu <wei.liu2@citrix.com>

jobs:
 build-amd64-xsm                                              pass    
 build-armhf-xsm                                              pass    
 build-i386-xsm                                               pass    
 build-amd64-xtf                                              pass    
 build-amd64                                                  pass    
 build-armhf                                                  pass    
 build-i386                                                   pass    
 build-amd64-libvirt                                          pass    
 build-armhf-libvirt                                          pass    
 build-i386-libvirt                                           pass    
 build-amd64-oldkern                                          pass    
 build-i386-oldkern                                           pass    
 build-amd64-prev                                             pass    
 build-i386-prev                                              pass    
 build-amd64-pvops                                            pass    
 build-armhf-pvops                                            pass    
 build-i386-pvops                                             pass    
 build-amd64-rumprun                                          fail    
 build-i386-rumprun                                           fail    
 test-xtf-amd64-amd64-1                                       pass    
 test-xtf-amd64-amd64-2                                       pass    
 test-xtf-amd64-amd64-3                                       pass    
 test-xtf-amd64-amd64-4                                       pass    
 test-xtf-amd64-amd64-5                                       pass    
 test-amd64-amd64-xl                                          pass    
 test-armhf-armhf-xl                                          pass    
 test-amd64-i386-xl                                           pass    
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm                pass    
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm                 pass    
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm           pass    
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm            pass    
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm                pass    
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm                 pass    
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm        pass    
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm         fail    
 test-amd64-amd64-libvirt-xsm                                 pass    
 test-armhf-armhf-libvirt-xsm                                 pass    
 test-amd64-i386-libvirt-xsm                                  pass    
 test-amd64-amd64-xl-xsm                                      pass    
 test-armhf-armhf-xl-xsm                                      pass    
 test-amd64-i386-xl-xsm                                       pass    
 test-amd64-amd64-qemuu-nested-amd                            fail    
 test-amd64-amd64-xl-pvh-amd                                  fail    
 test-amd64-i386-qemut-rhel6hvm-amd                           pass    
 test-amd64-i386-qemuu-rhel6hvm-amd                           pass    
 test-amd64-amd64-xl-qemut-debianhvm-amd64                    pass    
 test-amd64-i386-xl-qemut-debianhvm-amd64                     fail    
 test-amd64-amd64-xl-qemuu-debianhvm-amd64                    pass    
 test-amd64-i386-xl-qemuu-debianhvm-amd64                     pass    
 test-amd64-i386-freebsd10-amd64                              pass    
 test-amd64-amd64-xl-qemuu-ovmf-amd64                         pass    
 test-amd64-i386-xl-qemuu-ovmf-amd64                          pass    
 test-amd64-amd64-rumprun-amd64                               blocked 
 test-amd64-amd64-xl-qemut-win7-amd64                         fail    
 test-amd64-i386-xl-qemut-win7-amd64                          fail    
 test-amd64-amd64-xl-qemuu-win7-amd64                         fail    
 test-amd64-i386-xl-qemuu-win7-amd64                          fail    
 test-armhf-armhf-xl-arndale                                  pass    
 test-amd64-amd64-xl-credit2                                  pass    
 test-armhf-armhf-xl-credit2                                  pass    
 test-armhf-armhf-xl-cubietruck                               pass    
 test-amd64-i386-freebsd10-i386                               pass    
 test-amd64-i386-rumprun-i386                                 blocked 
 test-amd64-amd64-qemuu-nested-intel                          pass    
 test-amd64-amd64-xl-pvh-intel                                fail    
 test-amd64-i386-qemut-rhel6hvm-intel                         pass    
 test-amd64-i386-qemuu-rhel6hvm-intel                         pass    
 test-amd64-amd64-libvirt                                     pass    
 test-armhf-armhf-libvirt                                     pass    
 test-amd64-i386-libvirt                                      pass    
 test-amd64-amd64-migrupgrade                                 pass    
 test-amd64-i386-migrupgrade                                  pass    
 test-amd64-amd64-xl-multivcpu                                pass    
 test-armhf-armhf-xl-multivcpu                                pass    
 test-amd64-amd64-pair                                        pass    
 test-amd64-i386-pair                                         pass    
 test-amd64-amd64-libvirt-pair                                pass    
 test-amd64-i386-libvirt-pair                                 pass    
 test-amd64-amd64-amd64-pvgrub                                pass    
 test-amd64-amd64-i386-pvgrub                                 pass    
 test-amd64-amd64-pygrub                                      pass    
 test-armhf-armhf-libvirt-qcow2                               pass    
 test-amd64-amd64-xl-qcow2                                    pass    
 test-armhf-armhf-libvirt-raw                                 pass    
 test-amd64-i386-xl-raw                                       pass    
 test-amd64-amd64-xl-rtds                                     fail    
 test-armhf-armhf-xl-rtds                                     pass    
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1                     pass    
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1                     pass    
 test-amd64-amd64-libvirt-vhd                                 pass    
 test-armhf-armhf-xl-vhd                                      pass    
 test-amd64-amd64-xl-qemut-winxpsp3                           pass    
 test-amd64-i386-xl-qemut-winxpsp3                            pass    
 test-amd64-amd64-xl-qemuu-winxpsp3                           pass    
 test-amd64-i386-xl-qemuu-winxpsp3                            pass    


------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
    http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
    http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

------------------------------------------------------------
commit e26722422764d3ddfe59e76f5efbd330f8f9288f
Author: Jan Beulich <jbeulich@suse.com>
Date:   Wed Oct 26 16:13:21 2016 +0200

    Revert "keyhandler: rework process of nonirq keyhandler"
    
    This reverts commit 610b4eda2ce2b87cccbc8f61bdec01052e54fc66.
    It's not useful without ed7e33747d, which got reverted already.

commit 9f47f3d69f4dcb2b33ccb8fb20057152302ea1ad
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date:   Wed Oct 26 12:06:44 2016 +0100

    x86/emul: Move CPUID Faulting fault generation into the emulator
    
    In hindsight, this is a better position for it, as it avoids opencoding
    hvmemul_inject_hw_exception() in hvmemul_cpuid(), and reduces the requirements
    on other ops->cpuid() hooks wanting to implement cpuid faulting in the future.
    
    Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Reviewed-by: Jan Beulich <jbeulich@suse.com>
    Reviewed-by: Wei Liu <wei.liu2@citrix.com>
    Release-acked-by: Wei Liu <wei.liu2@citrix.com>

commit 0888d36bb23f7365ce12b03127fd0fb2661ec90e
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date:   Fri Sep 23 14:48:27 2016 +0100

    x86/emul: Correct the decoding of SReg3 operands
    
    REX.R is ignored when considering segment register operands, and needs masking
    out first.
    
    While fixing this, reorder the user segments in x86_segment to match SReg3
    encoding.  This avoids needing a translation table between hardware ordering
    and Xen's ordering.
    
    Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Reviewed-by: Jan Beulich <jbeulich@suse.com>
    Release-acked-by: Wei Liu <wei.liu2@citrix.com>

commit 22bc820abb5200729dc387e6a0653c31daecfef3
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date:   Tue Oct 25 18:46:39 2016 +0100

    x86/emul: Use explicit __attribute__((__packed__)) rather than __packed
    
    x86_emulate.h is included by the userspace test harness.  Avoid using
    constructs which don't come from standard header files.
    
    Reposition the test harnesses inclusion of x86_emulate.h to avoid relying on
    any definitions intended for use by x86_emulate.c alone.
    
    Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Reviewed-by: Jan Beulich <jbeulich@suse.com>
    Release-acked-by: Wei Liu <wei.liu2@citrix.com>

commit 1b843b2097e89d0fae18123cde88da9d167d9a0c
Author: Meng Xu <mengxu@cis.upenn.edu>
Date:   Fri Oct 21 22:12:02 2016 -0400

    xen: rtds: always clear the flag when replenishing a depleted vcpu
    
    We should clear the __RTDS_depleted bit once a VCPU budget is replenished.
    Because repl_timer_handler may be called after rt_schedule
    but before rt_context_saved, the VCPU may be not on CPU or on queue
    when the VCPU is in the middle of a context switch.
    
    Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
    Acked-by: Dario Faggioli <dario.faggioli@citrix.com>
    Release-acked-by: Wei Liu <wei.liu2@citrix.com>

commit 1307f8d3d46fe34f6eb739894008e8af3c168818
Author: Juergen Gross <jgross@suse.com>
Date:   Mon Oct 24 13:27:17 2016 +0200

    docs: remove wrong statement about bug in xenstore
    
    docs/misc/xenstore.txt states that xenstored will use "0" as a valid
    transaction id after 2^32 transactions. This is not true. Remove that
    statement.
    
    Signed-off-by: Juergen Gross <jgross@suse.com>
    Acked-by: Wei Liu <wei.liu2@citrix.com>
    Release-acked-by: Wei Liu <wei.liu2@citrix.com>

commit 0897514b4b376a167f968f79c6ea0dee1061458e
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date:   Wed Oct 26 10:34:21 2016 +0100

    tools/oxenstored: Avoid allocating invalid transaction ids
    
    The transaction id of 0 is reserved, meaning "not in a transaction".  It is up
    to the xenstored server to allocate transaction ids.  oxenstored starts its
    ids at 1, but insufficient care is taken with truncation cases.
    
    A 32bit oxenstored has an int with 31 bits of width, meaning that the
    transaction id will wrap around to 0 after 2 billion transactions.
    
    A 64bit oxenstored has an int with 63 bits of width, meaning that once 4
    billion transactions are used, the allocated id will be truncated when written
    into the uint32_t field in the ring.  This causes the client to reply with the
    truncated id, breaking any further attempt to use any transactions.
    
    Limit all transaction ids to the range between 1 and 0x7ffffffe.  This is the
    best which can be done without making oxenstored depend on Stdint or Cstruct,
    yet still work for 32bit builds.
    
    Also check that the proposed new transaction id isn't currently in use.  For
    the first 2 billion transactions there is no chance of a collision, and after
    that, the chance is at most 20 (the default open transaction quota) in 2
    billion.
    
    Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Acked-by: David Scott <dave@recoil.org>
    Release-acked-by: Wei Liu <wei.liu2@citrix.com>
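
The allocation rule described in the commit message above can be sketched in C.
This is only an illustration under stated assumptions: oxenstored itself is
written in OCaml, and the names txid_succ(), txid_alloc(), txid_truncate(),
struct open_set and open_set_contains() are hypothetical, not taken from the
Xen tree.  Only the range 1..0x7ffffffe, the reserved id 0, and the in-use
check come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TXID_MAX 0x7ffffffeu   /* upper bound quoted in the commit message */

/* Hypothetical stand-in for the set of currently open transactions. */
struct open_set {
    uint32_t ids[20];          /* 20 is the default quota quoted above */
    size_t   nr;
};

bool open_set_contains(const struct open_set *s, uint32_t id)
{
    for (size_t i = 0; i < s->nr; i++)
        if (s->ids[i] == id)
            return true;
    return false;
}

/* Step to the next candidate in [1, TXID_MAX]; never yields the reserved 0. */
uint32_t txid_succ(uint32_t id)
{
    return (id == 0 || id >= TXID_MAX) ? 1 : id + 1;
}

/*
 * Hand out the next id that is not currently open.  *next is the allocator
 * state and should start at 1.  The loop terminates as long as fewer than
 * TXID_MAX transactions are open.
 */
uint32_t txid_alloc(uint32_t *next, const struct open_set *open)
{
    uint32_t id = *next;

    if (id == 0 || id > TXID_MAX)
        id = 1;
    while (open_set_contains(open, id))
        id = txid_succ(id);

    *next = txid_succ(id);
    return id;
}

/* What a wider counter becomes when stored in a uint32_t ring field. */
uint32_t txid_truncate(uint64_t wide_id)
{
    return (uint32_t)wide_id;
}

/* Small usage example covering the wrap and collision cases. */
void txid_examples(void)
{
    struct open_set open = { .ids = { 1, 2 }, .nr = 2 };
    uint32_t next = TXID_MAX;

    assert(txid_alloc(&next, &open) == TXID_MAX);   /* last valid id */
    assert(txid_alloc(&next, &open) == 3);          /* wraps, skips 1 and 2 */
    assert(txid_truncate(UINT64_C(0x100000000)) == 0); /* the 64bit failure */
}
```

Keeping the counter inside the valid range means nothing is ever truncated on
the way into the ring, which is the property the commit message is after.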

commit 4000a7c7d7b0e01837abd3918e393f289c07d68c
Author: Roger Pau Monne <roger.pau@citrix.com>
Date:   Tue Oct 25 11:53:28 2016 +0200

    tools/configure: fix pkg-config install path for FreeBSD
    
    pkg-config from FreeBSD ports doesn't have ${prefix}/share/pkgconfig in the
    default search path, fix this by having a PKG_INSTALLDIR variable that can
    be changed on a per-OS basis.
    
    It would be best to use PKG_INSTALLDIR as defined by the pkg.m4 macro, but
    sadly this also reports a wrong value on FreeBSD (${libdir}/pkgconfig, which
    expands to /usr/local/lib/pkgconfig by default, and is also _not_ part of
    the default pkg-config search path).
    
    This patch should not change the behavior for Linux installs.
    
    Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
    Reported-by: Alexander Nusov <alexander.nusov@nfvexpress.com>
    Acked-by: Wei Liu <wei.liu2@citrix.com>
    Release-acked-by: Wei Liu <wei.liu2@citrix.com>

commit 0d250b69eae5d1e8039270c763b05acc84589a8c
Author: Ian Jackson <ian.jackson@eu.citrix.com>
Date:   Wed Oct 26 12:06:17 2016 +0100

    Update QEMU_UPSTREAM_REVISION
    
    Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
(qemu changes not included)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* Broadwell TLB Erratum
  2016-10-27 18:26 [xen-unstable test] 101698: regressions - FAIL osstest service owner
@ 2016-10-28  9:55 ` Andrew Cooper
  2016-10-28 10:09   ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Cooper @ 2016-10-28  9:55 UTC (permalink / raw)
  To: osstest service owner, xen-devel, Lai, Paul C, Kevin Tian, Jun Nakajima
  Cc: Ian Jackson, Wei Liu, Jan Beulich

On 27/10/16 19:26, osstest service owner wrote:
> flight 101698 xen-unstable real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/101698/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-xtf-amd64-amd64-5       44 xtf/test-hvm64-xsa-186   fail REGR. vs. 101673

--- Xen Test Framework ---
Environment: HVM 64bit (Long mode 4 levels)
XSA-186 PoC
******************************
PANIC: Unhandled exception at 0008:fffffffffffffffa
Vec 14 #PF[-I-sr-] %cr2 fffffffffffffffa
******************************

This is an issue I have seen before, and I think it is a TLB erratum in
Broadwell processors. Within XenServer, it has now been observed on one
SDP and two different Broadwell servers from different vendors.

The first CPU I saw it on was

CPU Vendor: Intel, Family 6 (0x6), Model 71 (0x47), Stepping 1 (raw 00040671)

Nobbling-1, which this test ran on, is

CPU Vendor: Intel, Family 6 (0x6), Model 79 (0x4f), Stepping 1 (raw 000406f1)
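
For reference, the family/model/stepping figures follow from the raw values by
the usual CPUID leaf 1 EAX rules.  A minimal sketch, assuming the standard
layout documented in the Intel SDM; struct cpu_sig, decode_cpu_sig() and
cpu_sig_examples() are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stdint.h>

struct cpu_sig {
    unsigned int family;
    unsigned int model;
    unsigned int stepping;
};

/* Decode a raw CPUID.1:EAX signature into display family/model/stepping. */
struct cpu_sig decode_cpu_sig(uint32_t eax)
{
    unsigned int base_family = (eax >> 8)  & 0xf;
    unsigned int base_model  = (eax >> 4)  & 0xf;
    unsigned int ext_model   = (eax >> 16) & 0xf;
    unsigned int ext_family  = (eax >> 20) & 0xff;
    struct cpu_sig s;

    s.stepping = eax & 0xf;

    /* The extended family is only added when the base family is 0xf. */
    s.family = base_family;
    if (base_family == 0xf)
        s.family += ext_family;

    /* The extended model only applies to base families 0x6 and 0xf. */
    s.model = base_model;
    if (base_family == 0x6 || base_family == 0xf)
        s.model |= ext_model << 4;

    return s;
}

/* The two signatures quoted in this message. */
void cpu_sig_examples(void)
{
    struct cpu_sig a = decode_cpu_sig(0x00040671);
    struct cpu_sig b = decode_cpu_sig(0x000406f1);

    assert(a.family == 6 && a.model == 0x47 && a.stepping == 1);
    assert(b.family == 6 && b.model == 0x4f && b.stepping == 1);
}
```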


The code in question sets up the mapping, memcpy()'s an instruction stub
into place, then calls the stub.

This pagefault is from the call, after the memcpy() has succeeded,
therefore proving the mapping is present in the dTLB.

The issue reproduces ~1 in 200 times, but can reliably be found in a
minute or two. Inserting an invlpg instruction immediately before the
call appears to resolve the issue (i.e. the tests run for ~1 hour
without observing the issue).

Architecturally however, this invlpg should have no effect.  I think
there is some race condition propagating TLB records to the L1 iTLB if
it is already present in the L1 dTLB.

When I first discovered this, I checked the NDA Specification
Update for the processor, and didn't find any published errata which
matched the symptoms.

~Andrew


* Re: Broadwell TLB Erratum
  2016-10-28  9:55 ` Broadwell TLB Erratum Andrew Cooper
@ 2016-10-28 10:09   ` Jan Beulich
  2016-10-28 10:36     ` [XTF PATCH] XSA-186: Work around suspected Broadwell TLB erratum Andrew Cooper
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2016-10-28 10:09 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Wei Liu, Paul C Lai, Ian Jackson,
	osstest service owner, Jun Nakajima, xen-devel

>>> On 28.10.16 at 11:55, <andrew.cooper3@citrix.com> wrote:
> On 27/10/16 19:26, osstest service owner wrote:
>> flight 101698 xen-unstable real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/101698/ 
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>>  test-xtf-amd64-amd64-5       44 xtf/test-hvm64-xsa-186   fail REGR. vs. 101673
> 
> --- Xen Test Framework ---
> Environment: HVM 64bit (Long mode 4 levels)
> XSA-186 PoC
> ******************************
> PANIC: Unhandled exception at 0008:fffffffffffffffa
> Vec 14 #PF[-I-sr-] %cr2 fffffffffffffffa
> ******************************
> 
> This is an issue I have seen before, and I think it is a TLB erratum in
> Broadwell processors. Within XenServer, it has now been observed on one
> SDP and two different Broadwell servers from different vendors.
> 
> The first CPU I saw it on was
> 
> CPU Vendor: Intel, Family 6 (0x6), Model 71 (0x47), Stepping 1 (raw 00040671)
> 
> Nobbling-1, which this test ran on, is
> 
> CPU Vendor: Intel, Family 6 (0x6), Model 79 (0x4f), Stepping 1 (raw 000406f1)
> 
> 
> The code in question sets up the mapping, memcpy()'s an instruction stub
> into place, then calls the stub.
> 
> This pagefault is from the call, after the memcpy() has succeeded,
> therefore proving the mapping is present in the dTLB.
> 
> The issue reproduces ~1 in 200 times, but can reliably be found in a
> minute or two. Inserting an invlpg instruction immediately before the
> call appears to resolve the issue (i.e. the tests run for ~1 hour
> without observing the issue).
> 
> Architecturally however, this invlpg should have no effect.  I think
> there is some race condition propagating TLB records to the L1 iTLB if
> it is already present in the L1 dTLB.
> 
> When I first discovered this, I checked the NDA Specification
> Update for the processor, and didn't find any published errata which
> matched the symptoms.

So until you/we hear back from Intel (which as we all know can take
a while), could you insert an INVLPG in the test, to eliminate these
(supposedly spurious) failures?

Jan


* [XTF PATCH] XSA-186: Work around suspected Broadwell TLB erratum
  2016-10-28 10:09   ` Jan Beulich
@ 2016-10-28 10:36     ` Andrew Cooper
  2016-10-28 12:03       ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Cooper @ 2016-10-28 10:36 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 tests/xsa-186/main.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/tests/xsa-186/main.c b/tests/xsa-186/main.c
index f2bc8f6..fe7e98b 100644
--- a/tests/xsa-186/main.c
+++ b/tests/xsa-186/main.c
@@ -144,6 +144,29 @@ void test_main(void)
     memcpy(stub, insn_buf_start, insn_buf_end - insn_buf_start);
 
     /*
+     * Work around suspected Broadwell TLB Erratum
+     *
+     * Occasionally, this test failes with:
+     *
+     *   --- Xen Test Framework ---
+     *   Environment: HVM 64bit (Long mode 4 levels)
+     *   XSA-186 PoC
+     *   ******************************
+     *   PANIC: Unhandled exception at 0008:fffffffffffffffa
+     *   Vec 14 #PF[-I-sr-] %cr2 fffffffffffffffa
+     *   ******************************
+     *
+     * on Broadwell hardware.  The mapping is definitely present as the
+     * memcpy() has already succeeded.  Inserting an invlpg resolves the
+     * issue, sugguesting that there is a race conditon between dTLB/iTLB
+     * handling.
+     *
+     * Work around the issue for now, to avoid intermittent OSSTest failures
+     * from blocking pushes of unrelated changes.
+     */
+    invlpg(stub);
+
+    /*
      * Execute the stub.
      *
      * Intel CPUs are happy doing this for 32 and 64bit.  AMD CPUs are happy
-- 
2.1.4



* Re: [XTF PATCH] XSA-186: Work around suspected Broadwell TLB erratum
  2016-10-28 10:36     ` [XTF PATCH] XSA-186: Work around suspected Broadwell TLB erratum Andrew Cooper
@ 2016-10-28 12:03       ` Jan Beulich
  2016-10-28 12:39         ` Andrew Cooper
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2016-10-28 12:03 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 28.10.16 at 12:36, <andrew.cooper3@citrix.com> wrote:
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
(Maybe you want to drop the ...

> --- a/tests/xsa-186/main.c
> +++ b/tests/xsa-186/main.c
> @@ -144,6 +144,29 @@ void test_main(void)
>      memcpy(stub, insn_buf_start, insn_buf_end - insn_buf_start);
>  
>      /*
> +     * Work around suspected Broadwell TLB Erratum
> +     *
> +     * Occasionally, this test failes with:
> +     *
> +     *   --- Xen Test Framework ---
> +     *   Environment: HVM 64bit (Long mode 4 levels)
> +     *   XSA-186 PoC
> +     *   ******************************
> +     *   PANIC: Unhandled exception at 0008:fffffffffffffffa
> +     *   Vec 14 #PF[-I-sr-] %cr2 fffffffffffffffa
> +     *   ******************************
> +     *
> +     * on Broadwell hardware.  The mapping is definitely present as the
> +     * memcpy() has already succeeded.  Inserting an invlpg resolves the
> +     * issue, sugguesting that there is a race conditon between dTLB/iTLB

... stray u which slipped into "suggesting".)

Btw - would you mind trying something else: Instead of the INVLPG,
put a CPUID or some other serializing instruction in here. ISTR that
for self modifying code this is required, i.e. the CPU could have been
fetching instructions ahead of the memcpy(), and nothing would be
there to force it to drop what it has already executed speculatively,
including the exception token.

Jan


* Re: [XTF PATCH] XSA-186: Work around suspected Broadwell TLB erratum
  2016-10-28 12:03       ` Jan Beulich
@ 2016-10-28 12:39         ` Andrew Cooper
  2016-10-28 12:49           ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Cooper @ 2016-10-28 12:39 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 28/10/16 13:03, Jan Beulich wrote:
>>>> On 28.10.16 at 12:36, <andrew.cooper3@citrix.com> wrote:
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> (Maybe you want to drop the ...
>
>> --- a/tests/xsa-186/main.c
>> +++ b/tests/xsa-186/main.c
>> @@ -144,6 +144,29 @@ void test_main(void)
>>      memcpy(stub, insn_buf_start, insn_buf_end - insn_buf_start);
>>  
>>      /*
>> +     * Work around suspected Broadwell TLB Erratum
>> +     *
>> +     * Occasionally, this test failes with:
>> +     *
>> +     *   --- Xen Test Framework ---
>> +     *   Environment: HVM 64bit (Long mode 4 levels)
>> +     *   XSA-186 PoC
>> +     *   ******************************
>> +     *   PANIC: Unhandled exception at 0008:fffffffffffffffa
>> +     *   Vec 14 #PF[-I-sr-] %cr2 fffffffffffffffa
>> +     *   ******************************
>> +     *
>> +     * on Broadwell hardware.  The mapping is definitely present as the
>> +     * memcpy() has already succeeded.  Inserting an invlpg resolves the
>> +     * issue, sugguesting that there is a race conditon between dTLB/iTLB
> ... stray u which slipped into "suggesting".)
>
> Btw - would you mind trying something else: Instead of the INVLPG,
> put a CPUID or some other serializing instruction in here. ISTR that
> for self modifying code this is required, i.e. the CPU could have been
> fetching instructions ahead of the memcpy(), and nothing would be
> there to force it to drop what it has already executed speculatively,
> including the exception token.

That is an interesting point, but still doesn't explain the symptoms. 
If the icache wasn't flushed, we might get junk instructions and a #UD/#GP.

However, in this case the fault is for an instruction fetch from a
non-present page, not a failure to execute what it found there.
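
The distinction drawn here can be read straight off the page-fault error code.
A minimal sketch, assuming the architectural x86 #PF error-code bit layout.
The PFEC_* macros, struct pf_info and the helper functions are hypothetical
names, and reading the "-I-sr-" flags in the panic line as "instruction fetch,
supervisor, read, not present" (an error code of 0x10) is an inference from
the description in this thread rather than something the log states
numerically:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural #PF error-code bits. */
#define PFEC_PRESENT (1u << 0)  /* 0: page not present, 1: protection fault */
#define PFEC_WRITE   (1u << 1)  /* 0: read access,      1: write access     */
#define PFEC_USER    (1u << 2)  /* 0: supervisor mode,  1: user mode        */
#define PFEC_RSVD    (1u << 3)  /* reserved bit set in a paging structure   */
#define PFEC_INSN    (1u << 4)  /* fault was caused by an instruction fetch */

struct pf_info {
    bool present;
    bool write;
    bool user;
    bool rsvd;
    bool insn_fetch;
};

/* Split an error code into its individual flags. */
struct pf_info pf_decode(uint32_t ec)
{
    struct pf_info info = {
        .present    = ec & PFEC_PRESENT,
        .write      = ec & PFEC_WRITE,
        .user       = ec & PFEC_USER,
        .rsvd       = ec & PFEC_RSVD,
        .insn_fetch = ec & PFEC_INSN,
    };

    return info;
}

/* True for the case discussed here: a fetch that found no mapping at all. */
bool pf_is_fetch_from_not_present(uint32_t ec)
{
    struct pf_info info = pf_decode(ec);

    return info.insn_fetch && !info.present;
}

/* Usage example with the assumed error code for this failure. */
void pf_examples(void)
{
    assert(pf_is_fetch_from_not_present(PFEC_INSN));
    /* A fetch hitting a present but non-executable page looks different. */
    assert(!pf_is_fetch_from_not_present(PFEC_INSN | PFEC_PRESENT));
}
```

Stale or junk bytes at a mapped address would instead be expected to surface
as #UD or #GP, which carry no such error code.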

I expect a cpuid instruction would resolve the issue, but it also forces
a vmexit which complicates the microarchitectural interactions here. 
Something else, like executing an int3 will also serialise the pipeline,
but not vmexit.  I will try and find some time to experiment.

~Andrew



* Re: [XTF PATCH] XSA-186: Work around suspected Broadwell TLB erratum
  2016-10-28 12:39         ` Andrew Cooper
@ 2016-10-28 12:49           ` Jan Beulich
  2016-10-28 13:02             ` Andrew Cooper
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2016-10-28 12:49 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 28.10.16 at 14:39, <andrew.cooper3@citrix.com> wrote:
> On 28/10/16 13:03, Jan Beulich wrote:
>>>>> On 28.10.16 at 12:36, <andrew.cooper3@citrix.com> wrote:
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>> (Maybe you want to drop the ...
>>
>>> --- a/tests/xsa-186/main.c
>>> +++ b/tests/xsa-186/main.c
>>> @@ -144,6 +144,29 @@ void test_main(void)
>>>      memcpy(stub, insn_buf_start, insn_buf_end - insn_buf_start);
>>>  
>>>      /*
>>> +     * Work around suspected Broadwell TLB Erratum
>>> +     *
>>> +     * Occasionally, this test failes with:
>>> +     *
>>> +     *   --- Xen Test Framework ---
>>> +     *   Environment: HVM 64bit (Long mode 4 levels)
>>> +     *   XSA-186 PoC
>>> +     *   ******************************
>>> +     *   PANIC: Unhandled exception at 0008:fffffffffffffffa
>>> +     *   Vec 14 #PF[-I-sr-] %cr2 fffffffffffffffa
>>> +     *   ******************************
>>> +     *
>>> +     * on Broadwell hardware.  The mapping is definitely present as the
>>> +     * memcpy() has already succeeded.  Inserting an invlpg resolves the
>>> +     * issue, sugguesting that there is a race conditon between dTLB/iTLB
>> ... stray u which slipped into "suggesting".)
>>
>> Btw - would you mind trying something else: Instead of the INVLPG,
>> put a CPUID or some other serializing instruction in here. ISTR that
>> for self modifying code this is required, i.e. the CPU could have been
>> fetching instructions ahead of the memcpy(), and nothing would be
>> there to force it to drop what it has already executed speculatively,
>> including the exception token.
> 
> That is an interesting point, but still doesn't explain the symptoms. 
> If the icache wasn't flushed, we might get junk instructions and a #UD/#GP.

No. As the processor speculates the call, it won't be able to fetch
the target instruction and hence would insert an exception token
into the queue. There would be junk instruction bytes only if there
was a prior mapping for that page, but aiui a mapping for that
address gets established exactly once.

> However, in this case the fault is for an instruction fetch from a
> non-present page, not a failure to execute what it found there.
> 
> I expect a cpuid instruction would resolve the issue, but it also forces
> a vmexit which complicates the microarchitectural interactions here. 
> Something else, like executing an int3 will also serialise the pipeline,
> but not vmexit.  I will try and find some time to experiment.

You're in ring 0, aren't you? That gives you plenty of serializing
instructions which don't directly interact with the TLBs. An LLDT
with a zero selector might be the one with least side effects. And
in case you're not in ring 0, make up an interrupt frame and
execute an IRET.

Jan
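
The general pattern discussed above (write code bytes, execute a serialising instruction, then jump to the new code) can be sketched in user space as follows. This is a minimal sketch under stated assumptions, not the XTF change: it assumes x86-64 Linux with GCC or Clang and an environment that permits writable and executable anonymous mappings. `serialise_cpuid()`, `run_stub_serialised()` and `stub_code` are hypothetical names. CPUID is used because the ring 0 alternatives mentioned above (such as LLDT) are not available to user-space code.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* x86-64 machine code for: mov $42, %eax; ret */
static const uint8_t stub_code[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };

/* Execute CPUID leaf 0 purely for its serialising side effect. */
static inline void serialise_cpuid(void)
{
    uint32_t eax = 0, ebx, ecx = 0, edx;

    __asm__ volatile ("cpuid"
                      : "+a" (eax), "=b" (ebx), "+c" (ecx), "=d" (edx)
                      :
                      : "memory");
    (void)ebx;
    (void)edx;
}

/*
 * Copy 'len' bytes of machine code into a fresh executable mapping,
 * serialise, then call it.  Returns the stub's return value, or -1 if
 * the code does not fit or the mapping could not be created.
 */
static int run_stub_serialised(const uint8_t *code, size_t len)
{
    int (*fn)(void);
    int ret;
    void *stub;

    if (len > 4096)
        return -1;

    stub = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (stub == MAP_FAILED)
        return -1;

    memcpy(stub, code, len);   /* data-side write of the new instructions */
    serialise_cpuid();         /* serialising instruction before the first fetch */

    fn = (int (*)(void))(uintptr_t)stub;
    ret = fn();

    munmap(stub, 4096);
    return ret;
}
```

With the bytes above, `run_stub_serialised(stub_code, sizeof(stub_code))` returns 42. The sketch only demonstrates where the serialising instruction sits relative to the memcpy() and the call; it makes no claim about whether this avoids the failure seen on Broadwell.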


* Re: [XTF PATCH] XSA-186: Work around suspected Broadwell TLB erratum
  2016-10-28 12:49           ` Jan Beulich
@ 2016-10-28 13:02             ` Andrew Cooper
  2016-10-28 13:37               ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Cooper @ 2016-10-28 13:02 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 28/10/16 13:49, Jan Beulich wrote:
>>>> On 28.10.16 at 14:39, <andrew.cooper3@citrix.com> wrote:
>> On 28/10/16 13:03, Jan Beulich wrote:
>>>>>> On 28.10.16 at 12:36, <andrew.cooper3@citrix.com> wrote:
>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>> (Maybe you want to drop the ...
>>>
>>>> --- a/tests/xsa-186/main.c
>>>> +++ b/tests/xsa-186/main.c
>>>> @@ -144,6 +144,29 @@ void test_main(void)
>>>>      memcpy(stub, insn_buf_start, insn_buf_end - insn_buf_start);
>>>>  
>>>>      /*
>>>> +     * Work around suspected Broadwell TLB Erratum
>>>> +     *
>>>> +     * Occasionally, this test failes with:
>>>> +     *
>>>> +     *   --- Xen Test Framework ---
>>>> +     *   Environment: HVM 64bit (Long mode 4 levels)
>>>> +     *   XSA-186 PoC
>>>> +     *   ******************************
>>>> +     *   PANIC: Unhandled exception at 0008:fffffffffffffffa
>>>> +     *   Vec 14 #PF[-I-sr-] %cr2 fffffffffffffffa
>>>> +     *   ******************************
>>>> +     *
>>>> +     * on Broadwell hardware.  The mapping is definitely present as the
>>>> +     * memcpy() has already succeeded.  Inserting an invlpg resolves the
>>>> +     * issue, sugguesting that there is a race conditon between dTLB/iTLB
>>> ... stray u which slipped into "suggesting".)
>>>
>>> Btw - would you mind trying something else: Instead of the INVLPG,
>>> put a CPUID or some other serializing instruction in here. ISTR that
>>> for self modifying code this is required, i.e. the CPU could have been
>>> fetching instructions ahead of the memcpy(), and nothing would be
>>> there to force it to drop what it has already executed speculatively,
>>> including the exception token.
>> That is an interesting point, but still doesn't explain the symptoms. 
>> If the icache wasn't flushed, we might get junk instructions and a #UD/#GP.
> No. As the processor speculates the call, it won't be able to fetch
> the target instruction and hence would insert an exception token
> into the queue. There would be junk instruction bytes only if there
> was a prior mapping for that page, but aiui a mapping for that
> address gets established exactly once.

Re-reading Intel SDM Vol 3, Section 11.6 "Self-Modifying Code":

* A write to a memory location in a code segment that is currently
cached in the processor causes the associated cache line (or lines) to
be invalidated. This check is based on the physical address of the
instruction.  If the write affects a prefetched instruction, the
prefetch queue is invalidated. This latter check is based on the linear
address of the instruction.

* Systems software, such as a debugger, that might possibly modify an
instruction using a different linear address than that used to fetch the
instruction, will execute a serializing operation, such as a CPUID
instruction, before the modified instruction is executed, which will
automatically resynchronize the instruction cache and prefetch queue.

As this is a single vcpu using a single flat address space, the memcpy()
should invalidate any speculative execution which has already happened.

>
>> However, in this case the fault is for an instruction fetch from a
>> non-present page, not a failure to execute what it found there.
>>
>> I expect a cpuid instruction would resolve the issue, but it also forces
>> a vmexit which complicates the microarchitectural interactions here. 
>> Something else, like executing an int3 will also serialise the pipeline,
>> but not vmexit.  I will try and find some time to experiment.
> You're in ring 0, aren't you? That gives you plenty of serializing
> instructions which don't directly interact with the TLBs. An LLDT
> with a zero selector might be the one with least side effects. And
> in case you're not in ring 0, make up an interrupt frame and
> execute an IRET.

Yes - all better options.

~Andrew


* Re: [XTF PATCH] XSA-186: Work around suspected Broadwell TLB erratum
  2016-10-28 13:02             ` Andrew Cooper
@ 2016-10-28 13:37               ` Jan Beulich
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2016-10-28 13:37 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 28.10.16 at 15:02, <andrew.cooper3@citrix.com> wrote:
> On 28/10/16 13:49, Jan Beulich wrote:
>>>>> On 28.10.16 at 14:39, <andrew.cooper3@citrix.com> wrote:
>>> On 28/10/16 13:03, Jan Beulich wrote:
>>>>>>> On 28.10.16 at 12:36, <andrew.cooper3@citrix.com> wrote:
>>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>>> (Maybe you want to drop the ...
>>>>
>>>>> --- a/tests/xsa-186/main.c
>>>>> +++ b/tests/xsa-186/main.c
>>>>> @@ -144,6 +144,29 @@ void test_main(void)
>>>>>      memcpy(stub, insn_buf_start, insn_buf_end - insn_buf_start);
>>>>>  
>>>>>      /*
>>>>> +     * Work around suspected Broadwell TLB Erratum
>>>>> +     *
>>>>> +     * Occasionally, this test failes with:
>>>>> +     *
>>>>> +     *   --- Xen Test Framework ---
>>>>> +     *   Environment: HVM 64bit (Long mode 4 levels)
>>>>> +     *   XSA-186 PoC
>>>>> +     *   ******************************
>>>>> +     *   PANIC: Unhandled exception at 0008:fffffffffffffffa
>>>>> +     *   Vec 14 #PF[-I-sr-] %cr2 fffffffffffffffa
>>>>> +     *   ******************************
>>>>> +     *
>>>>> +     * on Broadwell hardware.  The mapping is definitely present as the
>>>>> +     * memcpy() has already succeeded.  Inserting an invlpg resolves the
>>>>> +     * issue, sugguesting that there is a race conditon between dTLB/iTLB
>>>> ... stray u which slipped into "suggesting".)
>>>>
>>>> Btw - would you mind trying something else: Instead of the INVLPG,
>>>> put a CPUID or some other serializing instruction in here. ISTR that
>>>> for self modifying code this is required, i.e. the CPU could have been
>>>> fetching instructions ahead of the memcpy(), and nothing would be
>>>> there to force it to drop what it has already executed speculatively,
>>>> including the exception token.
>>> That is an interesting point, but still doesn't explain the symptoms. 
>>> If the icache wasn't flushed, we might get junk instructions and a #UD/#GP.
>> No. As the processor speculates the call, it won't be able to fetch
>> the target instruction and hence would insert an exception token
>> into the queue. There would be junk instruction bytes only if there
>> was a prior mapping for that page, but aiui a mapping for that
>> address gets established exactly once.
> 
> Re-reading Intel Vol 3 11.6 "Self-Modifying Code".
> 
> * A write to a memory location in a code segment that is currently
> cached in the processor causes the associated cache line (or lines) to
> be invalidated. This check is based on the physical address of the
> instruction.  If the write affects a prefetched instruction, the
> prefetch queue is invalidated. This latter check is based on the linear
> address of the instruction.
> 
> * Systems software, such as a debugger, that might possibly modify an
> instruction using a different linear address than that used to fetch the
> instruction, will execute a serializing operation, such as a CPUID
> instruction, before the modified instruction is executed, which will
> automatically resynchronize the instruction cache and prefetch queue.
> 
> As this is a single vcpu using a single flat address space, the memcpy()
> should invalidate any speculative execution which has already happened.

And still you describe only the case where there would need to be
a prior mapping - without one there simply is no physical address to
compare against. What if speculative execution ends up performing
the call to stub before you finish populating page tables? That would
also explain the error code. But aiui this might still be an erratum, as
it might be a memory ordering violation (depending on whether insn
fetches count as reads here, which would then have to observe
earlier writes, albeit the addresses of the two are different).

Jan


end of thread, other threads:[~2016-10-28 13:37 UTC | newest]

Thread overview: 9+ messages
2016-10-27 18:26 [xen-unstable test] 101698: regressions - FAIL osstest service owner
2016-10-28  9:55 ` Broadwell TLB Erratum Andrew Cooper
2016-10-28 10:09   ` Jan Beulich
2016-10-28 10:36     ` [XTF PATCH] XSA-186: Work around suspected Broadwell TLB erratum Andrew Cooper
2016-10-28 12:03       ` Jan Beulich
2016-10-28 12:39         ` Andrew Cooper
2016-10-28 12:49           ` Jan Beulich
2016-10-28 13:02             ` Andrew Cooper
2016-10-28 13:37               ` Jan Beulich
