* [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
From: osstest service owner @ 2019-05-15 19:48 UTC
To: xen-devel, osstest-admin
flight 136184 qemu-upstream-4.11-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/136184/
Regressions :-(
Tests which did not succeed and are blocking,
including tests which could not be run:
build-arm64-pvops <job status> broken in 134594
build-arm64 <job status> broken in 134594
build-arm64-xsm <job status> broken in 134594
build-arm64-xsm 4 host-install(4) broken in 134594 REGR. vs. 125575
build-arm64-pvops 4 host-install(4) broken in 134594 REGR. vs. 125575
build-arm64 4 host-install(4) broken in 134594 REGR. vs. 125575
test-arm64-arm64-libvirt-xsm 7 xen-boot fail REGR. vs. 125575
test-arm64-arm64-xl 7 xen-boot fail REGR. vs. 125575
test-arm64-arm64-xl-xsm 7 xen-boot fail REGR. vs. 125575
test-arm64-arm64-xl-credit2 7 xen-boot fail REGR. vs. 125575
Tests which are failing intermittently (not blocking):
test-amd64-amd64-xl-qcow2 17 guest-localmigrate/x10 fail in 136057 pass in 134594
test-amd64-amd64-xl-qcow2 16 guest-saverestore.2 fail pass in 136057
Tests which did not succeed, but are not blocking:
test-arm64-arm64-xl 1 build-check(1) blocked in 134594 n/a
build-arm64-libvirt 1 build-check(1) blocked in 134594 n/a
test-arm64-arm64-xl-xsm 1 build-check(1) blocked in 134594 n/a
test-arm64-arm64-xl-credit1 1 build-check(1) blocked in 134594 n/a
test-arm64-arm64-libvirt-xsm 1 build-check(1) blocked in 134594 n/a
test-arm64-arm64-xl-credit2 1 build-check(1) blocked in 134594 n/a
test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install fail never pass
test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install fail never pass
test-amd64-amd64-libvirt-xsm 13 migrate-support-check fail never pass
test-amd64-i386-xl-pvshim 12 guest-start fail never pass
test-amd64-i386-libvirt-xsm 13 migrate-support-check fail never pass
test-amd64-amd64-libvirt 13 migrate-support-check fail never pass
test-amd64-i386-libvirt 13 migrate-support-check fail never pass
test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
test-arm64-arm64-xl-credit1 7 xen-boot fail never pass
test-armhf-armhf-xl-arndale 13 migrate-support-check fail never pass
test-armhf-armhf-xl-arndale 14 saverestore-support-check fail never pass
test-amd64-amd64-libvirt-vhd 12 migrate-support-check fail never pass
test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass
test-armhf-armhf-libvirt 13 migrate-support-check fail never pass
test-armhf-armhf-xl-rtds 13 migrate-support-check fail never pass
test-armhf-armhf-xl-rtds 14 saverestore-support-check fail never pass
test-armhf-armhf-libvirt 14 saverestore-support-check fail never pass
test-armhf-armhf-xl-multivcpu 13 migrate-support-check fail never pass
test-armhf-armhf-xl-multivcpu 14 saverestore-support-check fail never pass
test-armhf-armhf-xl 13 migrate-support-check fail never pass
test-armhf-armhf-xl 14 saverestore-support-check fail never pass
test-armhf-armhf-xl-credit2 13 migrate-support-check fail never pass
test-armhf-armhf-xl-credit2 14 saverestore-support-check fail never pass
test-armhf-armhf-xl-credit1 13 migrate-support-check fail never pass
test-armhf-armhf-xl-credit1 14 saverestore-support-check fail never pass
test-armhf-armhf-xl-cubietruck 13 migrate-support-check fail never pass
test-armhf-armhf-xl-cubietruck 14 saverestore-support-check fail never pass
test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail never pass
test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass
test-armhf-armhf-xl-vhd 12 migrate-support-check fail never pass
test-armhf-armhf-xl-vhd 13 saverestore-support-check fail never pass
test-armhf-armhf-libvirt-raw 12 migrate-support-check fail never pass
test-armhf-armhf-libvirt-raw 13 saverestore-support-check fail never pass
test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail never pass
test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail never pass
test-amd64-amd64-xl-qemuu-win10-i386 10 windows-install fail never pass
test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass
version targeted for testing:
qemuu 2871355a6957f1b3c16f858e3143e0fff0737b6a
baseline version:
qemuu 20c76f9a5fbf16d58c6add2ace2ff0fabd785926
Last test of basis 125575 2018-07-25 18:53:54 Z 294 days
Testing same since 134270 2019-04-01 16:10:50 Z 44 days 19 attempts
------------------------------------------------------------
People who touched revisions under test:
Anthony PERARD <anthony.perard@citrix.com>
Gerd Hoffmann <kraxel@redhat.com>
Greg Kurz <groug@kaod.org>
Jason Wang <jasowang@redhat.com>
Kevin Wolf <kwolf@redhat.com>
Li Qiang <liq3ea@gmail.com>
Michael McConville <mmcco@mykolab.com>
Michael Tokarev <mjt@tls.msk.ru>
Niels de Vos <ndevos@redhat.com>
Paolo Bonzini <pbonzini@redhat.com>
Peter Maydell <peter.maydell@linaro.org>
Philippe Mathieu-Daudé <philmd@redhat.com>
Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Roger Pau Monne <roger.pau@citrix.com>
Roger Pau Monné <roger.pau@citrix.com>
jobs:
build-amd64-xsm pass
build-arm64-xsm pass
build-i386-xsm pass
build-amd64 pass
build-arm64 pass
build-armhf pass
build-i386 pass
build-amd64-libvirt pass
build-arm64-libvirt pass
build-armhf-libvirt pass
build-i386-libvirt pass
build-amd64-pvops pass
build-arm64-pvops pass
build-armhf-pvops pass
build-i386-pvops pass
test-amd64-amd64-xl pass
test-arm64-arm64-xl fail
test-armhf-armhf-xl pass
test-amd64-i386-xl pass
test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm pass
test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm pass
test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm pass
test-amd64-i386-xl-qemuu-debianhvm-i386-xsm pass
test-amd64-amd64-libvirt-xsm pass
test-arm64-arm64-libvirt-xsm fail
test-amd64-i386-libvirt-xsm pass
test-amd64-amd64-xl-xsm pass
test-arm64-arm64-xl-xsm fail
test-amd64-i386-xl-xsm pass
test-amd64-amd64-qemuu-nested-amd fail
test-amd64-amd64-xl-pvhv2-amd pass
test-amd64-i386-qemuu-rhel6hvm-amd pass
test-amd64-amd64-xl-qemuu-debianhvm-amd64 pass
test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
test-amd64-i386-freebsd10-amd64 pass
test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
test-amd64-i386-xl-qemuu-ovmf-amd64 pass
test-amd64-amd64-xl-qemuu-win7-amd64 fail
test-amd64-i386-xl-qemuu-win7-amd64 fail
test-amd64-amd64-xl-qemuu-ws16-amd64 fail
test-amd64-i386-xl-qemuu-ws16-amd64 fail
test-armhf-armhf-xl-arndale pass
test-amd64-amd64-xl-credit1 pass
test-arm64-arm64-xl-credit1 fail
test-armhf-armhf-xl-credit1 pass
test-amd64-amd64-xl-credit2 pass
test-arm64-arm64-xl-credit2 fail
test-armhf-armhf-xl-credit2 pass
test-armhf-armhf-xl-cubietruck pass
test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict fail
test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict fail
test-amd64-i386-freebsd10-i386 pass
test-amd64-amd64-xl-qemuu-win10-i386 fail
test-amd64-i386-xl-qemuu-win10-i386 fail
test-amd64-amd64-qemuu-nested-intel pass
test-amd64-amd64-xl-pvhv2-intel pass
test-amd64-i386-qemuu-rhel6hvm-intel pass
test-amd64-amd64-libvirt pass
test-armhf-armhf-libvirt pass
test-amd64-i386-libvirt pass
test-amd64-amd64-xl-multivcpu pass
test-armhf-armhf-xl-multivcpu pass
test-amd64-amd64-pair pass
test-amd64-i386-pair pass
test-amd64-amd64-libvirt-pair pass
test-amd64-i386-libvirt-pair pass
test-amd64-amd64-amd64-pvgrub pass
test-amd64-amd64-i386-pvgrub pass
test-amd64-amd64-xl-pvshim pass
test-amd64-i386-xl-pvshim fail
test-amd64-amd64-pygrub pass
test-amd64-amd64-xl-qcow2 fail
test-armhf-armhf-libvirt-raw pass
test-amd64-i386-xl-raw pass
test-amd64-amd64-xl-rtds pass
test-armhf-armhf-xl-rtds pass
test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow pass
test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow pass
test-amd64-amd64-xl-shadow pass
test-amd64-i386-xl-shadow pass
test-amd64-amd64-libvirt-vhd pass
test-armhf-armhf-xl-vhd pass
------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images
Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs
Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master
Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary
broken-job build-arm64-pvops broken
broken-job build-arm64 broken
broken-job build-arm64-xsm broken
Not pushing.
------------------------------------------------------------
commit 2871355a6957f1b3c16f858e3143e0fff0737b6a
Author: Kevin Wolf <kwolf@redhat.com>
Date: Thu Oct 11 17:30:39 2018 +0200
gtk: Don't vte_terminal_set_encoding() on new VTE versions
The function vte_terminal_set_encoding() is deprecated since VTE 0.54,
so stop calling it from that version on. This fixes a build error
because of our use of warning flags [-Werror=deprecated-declarations].
Fixes: https://bugs.launchpad.net/bugs/1794939
Reported-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20181011153039.2324-1-kwolf@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 6415994ffcc6d22b3f5add67f63fe77e4b9711f4)
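The guard this commit describes amounts to a compile-time version check. The sketch below is an illustration, not QEMU's actual code: the version macros stand in for the real ones that `<vte/vte.h>` provides, pinned here to 0.54 so the deprecated call is compiled out.

```c
/* Stand-ins for VTE's version macros; real code includes <vte/vte.h>. */
#define VTE_MAJOR_VERSION 0
#define VTE_MINOR_VERSION 54
#define VTE_CHECK_VERSION(maj, min, mic) \
    (VTE_MAJOR_VERSION > (maj) || \
     (VTE_MAJOR_VERSION == (maj) && VTE_MINOR_VERSION >= (min)))

/* Returns 1 if the deprecated vte_terminal_set_encoding() would be
 * called, 0 if the call is compiled out on VTE >= 0.54. */
int would_call_set_encoding(void)
{
#if VTE_CHECK_VERSION(0, 54, 0)
    return 0;
#else
    return 1;
#endif
}
```

Because the check happens in the preprocessor, no deprecated symbol is referenced at all on new VTE, which is what silences -Werror=deprecated-declarations.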
commit 94a715b6cba7225e5db59901e5d0a5252ead9755
Author: Niels de Vos <ndevos@redhat.com>
Date: Tue Mar 5 16:46:34 2019 +0100
gluster: the glfs_io_cbk callback function pointer adds pre/post stat args
The glfs_*_async() functions do a callback once finished. This callback
has changed its arguments: pre- and post-stat structures have been
added. This makes it possible to improve caching, which is useful for
Samba and NFS-Ganesha, but not so much for QEMU. Gluster 6 is the first
release that includes these new arguments.
With an additional check in ./configure, the new arguments can be
conditionally included in the glfs_io_cbk handler.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 0e3b891fefacc0e49f3c8ffa3a753b69eb7214d2)
commit 13bac7abf60e25101ef6059f0da7a168942eccd9
Author: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Date: Tue Mar 5 16:46:33 2019 +0100
gluster: Handle changed glfs_ftruncate signature
New versions of Gluster's libgfapi.so have an updated glfs_ftruncate()
function that returns additional 'struct stat' structures to enable
advanced caching of attributes. This is useful for file servers, not so
much for QEMU. Nevertheless, the API has changed and QEMU needs to
adapt to it.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit e014dbe74e0484188164c61ff6843f8a04a8cb9d)
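Both Gluster commits above follow the same pattern: a libgfapi callback or function grows extra pre/post-stat arguments in Gluster 6, and a configure-time check selects the matching prototype. A minimal, self-contained sketch of that pattern — the `CONFIG_` macro, the stat placeholder, and all function names here are illustrative, not QEMU's actual identifiers:

```c
#include <stddef.h>

/* Hypothetical flag that ./configure would define when building
 * against Gluster >= 6 headers. */
#define CONFIG_GLUSTERFS_IOCB_HAS_STAT 1

struct glfs_stat_sketch { int unused; };  /* stands in for the real stat */

#ifdef CONFIG_GLUSTERFS_IOCB_HAS_STAT
/* Gluster >= 6: the callback receives pre- and post-operation stats. */
static void gluster_finish(void *fd, long ret,
                           struct glfs_stat_sketch *prestat,
                           struct glfs_stat_sketch *poststat,
                           void *opaque)
{
    (void)fd; (void)prestat; (void)poststat;  /* QEMU ignores the stats */
    *(long *)opaque = ret;
}
#else
/* Older Gluster: the original three-argument callback. */
static void gluster_finish(void *fd, long ret, void *opaque)
{
    (void)fd;
    *(long *)opaque = ret;
}
#endif

/* Simulate an async completion delivering a byte count of 4096. */
static long simulate_completion(void)
{
    long result = 0;
#ifdef CONFIG_GLUSTERFS_IOCB_HAS_STAT
    gluster_finish(NULL, 4096, NULL, NULL, &result);
#else
    gluster_finish(NULL, 4096, &result);
#endif
    return result;
}
```

Only the prototype differs between the two branches; the body that QEMU cares about (propagating `ret`) is identical, which is why the configure-time switch is cheap.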
commit 9864a12f4a13f19a7440cb32bd3242506d6b2738
Author: Jason Wang <jasowang@redhat.com>
Date: Tue Dec 4 11:53:43 2018 +0800
net: drop too large packet early
Commit 1592a9947036 ("net: ignore packet size greater than INT_MAX")
tries to detect and drop too-large packets (>INT_MAX) during packet
delivery. Unfortunately, this is not sufficient, as we may hit
another integer overflow when trying to queue such a large packet in
qemu_net_queue_append_iov():
- the size of the allocation may overflow on 32-bit hosts
- packet->size is an int, which may overflow even on 64-bit hosts
Fix this by moving the check to qemu_sendv_packet_async(), which is
the entry point of all networking code, and reduce the limit to
NET_BUFSIZE to be more conservative. This works because:
- Callers that call qemu_sendv_packet_async() directly only care
whether zero is returned, to decide whether to stop the source
from producing more packets. A callback is triggered when the peer
can accept more, at which point the source can be re-enabled. This
is typically used by high-speed networking implementations like
virtio-net or netmap.
- Callers that reach qemu_sendv_packet_async() indirectly through
qemu_sendv_packet() usually ignore the return value. In this case
QEMU will simply drop the packets if the peer can't receive them.
QEMU copies the packet when it is queued, so it is safe for both
kinds of callers to assume the packet was sent.
Since the check moves from qemu_deliver_packet_iov() to
qemu_sendv_packet_async(), it is also safer to make
qemu_deliver_packet_iov() static, to prevent any external users in
the future.
This is a revised patch of CVE-2018-17963.
Cc: qemu-stable@nongnu.org
Cc: Li Qiang <liq3ea@163.com>
Fixes: 1592a9947036 ("net: ignore packet size greater than INT_MAX")
Reported-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 20181204035347.6148-2-jasowang@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 25c01bd19d0e4b66f357618aeefda1ef7a41e21a)
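The early check the commit describes can be sketched as follows. The cap value and the function name are illustrative (QEMU's actual limit is NET_BUFSIZE and the check lives in qemu_sendv_packet_async()); the point is that the sum is computed in a `size_t` before any later code can stash it in a plain `int`:

```c
#include <stddef.h>
#include <sys/uio.h>

/* Illustrative cap; the real code uses NET_BUFSIZE. */
#define NET_BUFSIZE_SKETCH (256 * 1024)

/* Sum an iovec in a size_t and refuse oversized packets up front,
 * before the queueing code can store the size in an int (where it
 * could overflow) or size an allocation from it. Returns the total
 * size, or 0 to mean "packet dropped". */
size_t accept_packet_size(const struct iovec *iov, int iovcnt)
{
    size_t total = 0;
    for (int i = 0; i < iovcnt; i++) {
        total += iov[i].iov_len;
    }
    return total > NET_BUFSIZE_SKETCH ? 0 : total;
}
```

Performing the check once at the single entry point also means none of the downstream queueing or delivery paths need to repeat it.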
commit b697c0aecbf9bc8bdb4f1bf0ea92e6a8fb258094
Author: Jason Wang <jasowang@redhat.com>
Date: Wed May 30 13:16:36 2018 +0800
net: ignore packet size greater than INT_MAX
There should not be a reason for passing a packet size greater than
INT_MAX. It's usually a hint of a bug somewhere, so ignore packet
sizes greater than INT_MAX in qemu_deliver_packet_iov().
CC: qemu-stable@nongnu.org
Reported-by: Daniel Shapira <daniel@twistlock.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 1592a9947036d60dde5404204a5d45975133caf5)
commit f517c1b6079a514c0798eacb3f7c77b9dd8ebbf1
Author: Greg Kurz <groug@kaod.org>
Date: Fri Nov 23 13:28:03 2018 +0100
9p: fix QEMU crash when renaming files
When using the 9P2000.u version of the protocol, the following shell
command line in the guest can cause QEMU to crash:
while true; do rm -rf aa; mkdir -p a/b & touch a/b/c & mv a aa; done
With 9P2000.u, file renaming is handled by the WSTAT command. The
v9fs_wstat() function calls v9fs_complete_rename(), which calls
v9fs_fix_path() for every fid whose path is affected by the change.
The involved calls to v9fs_path_copy() may race with any other access
to the fid path performed by some worker thread, causing a crash like
the one shown below:
Thread 12 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x0000555555a25da2 in local_open_nofollow (fs_ctx=0x555557d958b8, path=0x0,
flags=65536, mode=0) at hw/9pfs/9p-local.c:59
59 while (*path && fd != -1) {
(gdb) bt
#0 0x0000555555a25da2 in local_open_nofollow (fs_ctx=0x555557d958b8,
path=0x0, flags=65536, mode=0) at hw/9pfs/9p-local.c:59
#1 0x0000555555a25e0c in local_opendir_nofollow (fs_ctx=0x555557d958b8,
path=0x0) at hw/9pfs/9p-local.c:92
#2 0x0000555555a261b8 in local_lstat (fs_ctx=0x555557d958b8,
fs_path=0x555556b56858, stbuf=0x7fff84830ef0) at hw/9pfs/9p-local.c:185
#3 0x0000555555a2b367 in v9fs_co_lstat (pdu=0x555557d97498,
path=0x555556b56858, stbuf=0x7fff84830ef0) at hw/9pfs/cofile.c:53
#4 0x0000555555a1e9e2 in v9fs_stat (opaque=0x555557d97498)
at hw/9pfs/9p.c:1083
#5 0x0000555555e060a2 in coroutine_trampoline (i0=-669165424, i1=32767)
at util/coroutine-ucontext.c:116
#6 0x00007fffef4f5600 in __start_context () at /lib64/libc.so.6
#7 0x0000000000000000 in ()
(gdb)
The fix is to take the path write lock when calling v9fs_complete_rename(),
like in v9fs_rename().
Impact: DoS triggered by unprivileged guest users.
Fixes: CVE-2018-19489
Cc: P J P <ppandit@redhat.com>
Reported-by: zhibin hu <noirfate@gmail.com>
Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 1d20398694a3b67a388d955b7a945ba4aa90a8a8)
commit 9af9c1c20e313f597168e0522f5fc8d78123b0c8
Author: Paolo Bonzini <pbonzini@redhat.com>
Date: Tue Nov 20 19:41:48 2018 +0100
nvme: fix out-of-bounds access to the CMB
Because the CMB BAR has a min_access_size of 2, if you read the last
byte it will try to memcpy *2* bytes from n->cmbuf, causing an off-by-one
error. This is CVE-2018-16847.
Another way to fix this might be to register the CMB as a RAM memory
region, which would also be more efficient. However, that might be a
change for big-endian machines; I didn't think this through and I don't
know how real hardware works. Add a basic testcase for the CMB in case
somebody does this change later on.
Cc: Keith Busch <keith.busch@intel.com>
Cc: qemu-block@nongnu.org
Reported-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Tested-by: Li Qiang <liq3ea@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 87ad860c622cc8f8916b5232bd8728c08f938fce)
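The off-by-one is easiest to see in a clamped-read sketch: with a min_access_size of 2, reading the last byte of the CMB becomes a 2-byte memcpy from the end of the buffer. Clamping the copy length to the buffer end avoids the overrun. The function and its names are illustrative, not QEMU's nvme code:

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Read up to `size` bytes from a buffer at `addr`, clamping the copy
 * at the end of the buffer so a 2-byte access to the last byte never
 * reads past cmbuf (the CVE-2018-16847 off-by-one). */
uint16_t cmb_read_sketch(const uint8_t *cmbuf, size_t cmb_size,
                         size_t addr, unsigned size)
{
    uint16_t val = 0;
    size_t n = size;
    if (addr >= cmb_size) {
        return 0;                      /* fully out of range */
    }
    if (addr + n > cmb_size) {
        n = cmb_size - addr;           /* clamp: copy only what exists */
    }
    memcpy(&val, cmbuf + addr, n);
    return val;
}
```

Without the clamp, `memcpy(&val, cmbuf + 3, 2)` on a 4-byte buffer reads one byte past the allocation, which is exactly the access the commit fixes.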
commit c50c704a6a09554925b926c0313280be4a3d7100
Author: Greg Kurz <groug@kaod.org>
Date: Tue Nov 20 13:00:35 2018 +0100
9p: take write lock on fid path updates (CVE-2018-19364)
Recent commit 5b76ef50f62079a fixed a race where v9fs_co_open2() could
possibly overwrite a fid path with v9fs_path_copy() while it is being
accessed by some other thread, i.e., a use-after-free that can be
detected by ASAN with a custom 9p client.
It turns out that the same can happen at several locations where
v9fs_path_copy() is used to set the fid path. The fix is again to
take the write lock.
Fixes CVE-2018-19364.
Cc: P J P <ppandit@redhat.com>
Reported-by: zhibin hu <noirfate@gmail.com>
Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 5b3c77aa581ebb215125c84b0742119483571e55)
commit 03c28544a1b67fd48ef1fa72231818efa8563874
Author: Roger Pau Monne <roger.pau@citrix.com>
Date: Mon Mar 18 18:37:31 2019 +0100
xen-mapcache: use MAP_FIXED flag so the mmap address hint is always honored
Or, if it's not possible to honor the hinted address, an error is
returned instead. This makes it easier to spot the actual failure,
instead of
failing later on when the caller of xen_remap_bucket realizes the
mapping has not been created at the requested address.
Also note that at least on FreeBSD using MAP_FIXED will cause mmap to
try harder to honor the passed address.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Message-Id: <20190318173731.14494-1-roger.pau@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
(cherry picked from commit 4158e93f4aced247c8db94a0275fc027da7dc97e)
commit a35ed1444329599f2975512c82be795f8af284d5
Author: Michael McConville <mmcco@mykolab.com>
Date: Fri Dec 1 11:31:57 2017 -0700
mmap(2) returns MAP_FAILED, not NULL, on failure
Signed-off-by: Michael McConville <mmcco@mykolab.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(cherry picked from commit ab1ce9bd4897b9909836e2d50bca86f2f3f2dddc)
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
^ permalink raw reply [flat|nested] 43+ messages in thread
* [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
@ 2019-05-15 19:48 ` osstest service owner
0 siblings, 0 replies; 43+ messages in thread
From: osstest service owner @ 2019-05-15 19:48 UTC (permalink / raw)
To: xen-devel, osstest-admin
flight 136184 qemu-upstream-4.11-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/136184/
Regressions :-(
Tests which did not succeed and are blocking,
including tests which could not be run:
build-arm64-pvops <job status> broken in 134594
build-arm64 <job status> broken in 134594
build-arm64-xsm <job status> broken in 134594
build-arm64-xsm 4 host-install(4) broken in 134594 REGR. vs. 125575
build-arm64-pvops 4 host-install(4) broken in 134594 REGR. vs. 125575
build-arm64 4 host-install(4) broken in 134594 REGR. vs. 125575
test-arm64-arm64-libvirt-xsm 7 xen-boot fail REGR. vs. 125575
test-arm64-arm64-xl 7 xen-boot fail REGR. vs. 125575
test-arm64-arm64-xl-xsm 7 xen-boot fail REGR. vs. 125575
test-arm64-arm64-xl-credit2 7 xen-boot fail REGR. vs. 125575
Tests which are failing intermittently (not blocking):
test-amd64-amd64-xl-qcow2 17 guest-localmigrate/x10 fail in 136057 pass in 134594
test-amd64-amd64-xl-qcow2 16 guest-saverestore.2 fail pass in 136057
Tests which did not succeed, but are not blocking:
test-arm64-arm64-xl 1 build-check(1) blocked in 134594 n/a
build-arm64-libvirt 1 build-check(1) blocked in 134594 n/a
test-arm64-arm64-xl-xsm 1 build-check(1) blocked in 134594 n/a
test-arm64-arm64-xl-credit1 1 build-check(1) blocked in 134594 n/a
test-arm64-arm64-libvirt-xsm 1 build-check(1) blocked in 134594 n/a
test-arm64-arm64-xl-credit2 1 build-check(1) blocked in 134594 n/a
test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install fail never pass
test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install fail never pass
test-amd64-amd64-libvirt-xsm 13 migrate-support-check fail never pass
test-amd64-i386-xl-pvshim 12 guest-start fail never pass
test-amd64-i386-libvirt-xsm 13 migrate-support-check fail never pass
test-amd64-amd64-libvirt 13 migrate-support-check fail never pass
test-amd64-i386-libvirt 13 migrate-support-check fail never pass
test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
test-arm64-arm64-xl-credit1 7 xen-boot fail never pass
test-armhf-armhf-xl-arndale 13 migrate-support-check fail never pass
test-armhf-armhf-xl-arndale 14 saverestore-support-check fail never pass
test-amd64-amd64-libvirt-vhd 12 migrate-support-check fail never pass
test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass
test-armhf-armhf-libvirt 13 migrate-support-check fail never pass
test-armhf-armhf-xl-rtds 13 migrate-support-check fail never pass
test-armhf-armhf-xl-rtds 14 saverestore-support-check fail never pass
test-armhf-armhf-libvirt 14 saverestore-support-check fail never pass
test-armhf-armhf-xl-multivcpu 13 migrate-support-check fail never pass
test-armhf-armhf-xl-multivcpu 14 saverestore-support-check fail never pass
test-armhf-armhf-xl 13 migrate-support-check fail never pass
test-armhf-armhf-xl 14 saverestore-support-check fail never pass
test-armhf-armhf-xl-credit2 13 migrate-support-check fail never pass
test-armhf-armhf-xl-credit2 14 saverestore-support-check fail never pass
test-armhf-armhf-xl-credit1 13 migrate-support-check fail never pass
test-armhf-armhf-xl-credit1 14 saverestore-support-check fail never pass
test-armhf-armhf-xl-cubietruck 13 migrate-support-check fail never pass
test-armhf-armhf-xl-cubietruck 14 saverestore-support-check fail never pass
test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail never pass
test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass
test-armhf-armhf-xl-vhd 12 migrate-support-check fail never pass
test-armhf-armhf-xl-vhd 13 saverestore-support-check fail never pass
test-armhf-armhf-libvirt-raw 12 migrate-support-check fail never pass
test-armhf-armhf-libvirt-raw 13 saverestore-support-check fail never pass
test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail never pass
test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail never pass
test-amd64-amd64-xl-qemuu-win10-i386 10 windows-install fail never pass
test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass
version targeted for testing:
qemuu 2871355a6957f1b3c16f858e3143e0fff0737b6a
baseline version:
qemuu 20c76f9a5fbf16d58c6add2ace2ff0fabd785926
Last test of basis 125575 2018-07-25 18:53:54 Z 294 days
Testing same since 134270 2019-04-01 16:10:50 Z 44 days 19 attempts
------------------------------------------------------------
People who touched revisions under test:
Anthony PERARD <anthony.perard@citrix.com>
Gerd Hoffmann <kraxel@redhat.com>
Greg Kurz <groug@kaod.org>
Jason Wang <jasowang@redhat.com>
Kevin Wolf <kwolf@redhat.com>
Li Qiang <liq3ea@gmail.com>
Michael McConville <mmcco@mykolab.com>
Michael Tokarev <mjt@tls.msk.ru>
Niels de Vos <ndevos@redhat.com>
Paolo Bonzini <pbonzini@redhat.com>
Peter Maydell <peter.maydell@linaro.org>
Philippe Mathieu-Daudé <philmd@redhat.com>
Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Roger Pau Monne <roger.pau@citrix.com>
Roger Pau Monné <roger.pau@citrix.com>
jobs:
build-amd64-xsm pass
build-arm64-xsm pass
build-i386-xsm pass
build-amd64 pass
build-arm64 pass
build-armhf pass
build-i386 pass
build-amd64-libvirt pass
build-arm64-libvirt pass
build-armhf-libvirt pass
build-i386-libvirt pass
build-amd64-pvops pass
build-arm64-pvops pass
build-armhf-pvops pass
build-i386-pvops pass
test-amd64-amd64-xl pass
test-arm64-arm64-xl fail
test-armhf-armhf-xl pass
test-amd64-i386-xl pass
test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm pass
test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm pass
test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm pass
test-amd64-i386-xl-qemuu-debianhvm-i386-xsm pass
test-amd64-amd64-libvirt-xsm pass
test-arm64-arm64-libvirt-xsm fail
test-amd64-i386-libvirt-xsm pass
test-amd64-amd64-xl-xsm pass
test-arm64-arm64-xl-xsm fail
test-amd64-i386-xl-xsm pass
test-amd64-amd64-qemuu-nested-amd fail
test-amd64-amd64-xl-pvhv2-amd pass
test-amd64-i386-qemuu-rhel6hvm-amd pass
test-amd64-amd64-xl-qemuu-debianhvm-amd64 pass
test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
test-amd64-i386-freebsd10-amd64 pass
test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
test-amd64-i386-xl-qemuu-ovmf-amd64 pass
test-amd64-amd64-xl-qemuu-win7-amd64 fail
test-amd64-i386-xl-qemuu-win7-amd64 fail
test-amd64-amd64-xl-qemuu-ws16-amd64 fail
test-amd64-i386-xl-qemuu-ws16-amd64 fail
test-armhf-armhf-xl-arndale pass
test-amd64-amd64-xl-credit1 pass
test-arm64-arm64-xl-credit1 fail
test-armhf-armhf-xl-credit1 pass
test-amd64-amd64-xl-credit2 pass
test-arm64-arm64-xl-credit2 fail
test-armhf-armhf-xl-credit2 pass
test-armhf-armhf-xl-cubietruck pass
test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict fail
test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict fail
test-amd64-i386-freebsd10-i386 pass
test-amd64-amd64-xl-qemuu-win10-i386 fail
test-amd64-i386-xl-qemuu-win10-i386 fail
test-amd64-amd64-qemuu-nested-intel pass
test-amd64-amd64-xl-pvhv2-intel pass
test-amd64-i386-qemuu-rhel6hvm-intel pass
test-amd64-amd64-libvirt pass
test-armhf-armhf-libvirt pass
test-amd64-i386-libvirt pass
test-amd64-amd64-xl-multivcpu pass
test-armhf-armhf-xl-multivcpu pass
test-amd64-amd64-pair pass
test-amd64-i386-pair pass
test-amd64-amd64-libvirt-pair pass
test-amd64-i386-libvirt-pair pass
test-amd64-amd64-amd64-pvgrub pass
test-amd64-amd64-i386-pvgrub pass
test-amd64-amd64-xl-pvshim pass
test-amd64-i386-xl-pvshim fail
test-amd64-amd64-pygrub pass
test-amd64-amd64-xl-qcow2 fail
test-armhf-armhf-libvirt-raw pass
test-amd64-i386-xl-raw pass
test-amd64-amd64-xl-rtds pass
test-armhf-armhf-xl-rtds pass
test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow pass
test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow pass
test-amd64-amd64-xl-shadow pass
test-amd64-i386-xl-shadow pass
test-amd64-amd64-libvirt-vhd pass
test-armhf-armhf-xl-vhd pass
------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images
Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs
Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master
Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary
broken-job build-arm64-pvops broken
broken-job build-arm64 broken
broken-job build-arm64-xsm broken
Not pushing.
------------------------------------------------------------
commit 2871355a6957f1b3c16f858e3143e0fff0737b6a
Author: Kevin Wolf <kwolf@redhat.com>
Date: Thu Oct 11 17:30:39 2018 +0200
gtk: Don't vte_terminal_set_encoding() on new VTE versions
The function vte_terminal_set_encoding() is deprecated since VTE 0.54,
so stop calling it from that version on. This fixes a build error
because of our use of warning flags [-Werror=deprecated-declarations].
Fixes: https://bugs.launchpad.net/bugs/1794939
Reported-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20181011153039.2324-1-kwolf@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 6415994ffcc6d22b3f5add67f63fe77e4b9711f4)
commit 94a715b6cba7225e5db59901e5d0a5252ead9755
Author: Niels de Vos <ndevos@redhat.com>
Date: Tue Mar 5 16:46:34 2019 +0100
gluster: the glfs_io_cbk callback function pointer adds pre/post stat args
The glfs_*_async() functions do a callback once finished. This callback
has changed its arguments, pre- and post-stat structures have been
added. This makes it possible to improve caching, which is useful for
Samba and NFS-Ganesha, but not so much for QEMU. Gluster 6 is the first
release that includes these new arguments.
With an additional detection in ./configure, the new arguments can
conditionally get included in the glfs_io_cbk handler.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 0e3b891fefacc0e49f3c8ffa3a753b69eb7214d2)
commit 13bac7abf60e25101ef6059f0da7a168942eccd9
Author: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Date: Tue Mar 5 16:46:33 2019 +0100
gluster: Handle changed glfs_ftruncate signature
New versions of Glusters libgfapi.so have an updated glfs_ftruncate()
function that returns additional 'struct stat' structures to enable
advanced caching of attributes. This is useful for file servers, not so
much for QEMU. Nevertheless, the API has changed and needs to be
adopted.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit e014dbe74e0484188164c61ff6843f8a04a8cb9d)
commit 9864a12f4a13f19a7440cb32bd3242506d6b2738
Author: Jason Wang <jasowang@redhat.com>
Date: Tue Dec 4 11:53:43 2018 +0800
net: drop too large packet early
We try to detect and drop too-large packets (>INT_MAX) in 1592a9947036
("net: ignore packet size greater than INT_MAX") during packet
delivery. Unfortunately, this is not sufficient, as we may hit
another integer overflow when trying to queue such a large packet in
qemu_net_queue_append_iov():
- the size of the allocation may overflow on 32-bit
- packet->size is an int, which may overflow even on 64-bit
Fix this by moving the check to qemu_sendv_packet_async(), which is
the entry point of all networking code, and reduce the limit to
NET_BUFSIZE to be more conservative. This works since:
- For the callers that call qemu_sendv_packet_async() directly, they
only care about whether zero is returned, to determine whether to
prevent the source from producing more packets. A callback will be
triggered when the peer can accept more, so the source can be enabled
again. This is usually used by high-speed networking implementations
like virtio-net or netmap.
- For the callers that call qemu_sendv_packet(), which calls
qemu_sendv_packet_async() indirectly, they often ignore the return
value. In this case QEMU will just drop the packets if the peer can't
receive them.
QEMU will copy the packet if it is queued, so it is safe for both
kinds of callers to assume the packet was sent.
Since we move the check from qemu_deliver_packet_iov() to
qemu_sendv_packet_async(), it would be safer to make
qemu_deliver_packet_iov() static to prevent any external users in the
future.
This is a revised patch of CVE-2018-17963.
Cc: qemu-stable@nongnu.org
Cc: Li Qiang <liq3ea@163.com>
Fixes: 1592a9947036 ("net: ignore packet size greater than INT_MAX")
Reported-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 20181204035347.6148-2-jasowang@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 25c01bd19d0e4b66f357618aeefda1ef7a41e21a)
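The entry-point check described above can be sketched in plain C. The function names and the NET_BUFSIZE value below are illustrative stand-ins, not QEMU's actual definitions:

```c
#include <limits.h>
#include <stddef.h>
#include <sys/uio.h>

/* Illustrative stand-in for QEMU's NET_BUFSIZE; the real value lives in
 * the QEMU sources.  The point is only that the cap is far below
 * INT_MAX, so later size arithmetic cannot overflow. */
#define NET_BUFSIZE (256 * 1024)

/* Sum an iovec's lengths, mimicking the size computation the queue code
 * performs.  Done after queuing, a sum like this could overflow; doing
 * the check once at the entry point keeps every later user safe. */
static size_t iov_total_size(const struct iovec *iov, int iovcnt)
{
    size_t total = 0;
    for (int i = 0; i < iovcnt; i++) {
        total += iov[i].iov_len;
    }
    return total;
}

/* Sketch of the single entry-point check: oversized packets are dropped
 * before any queuing or allocation can happen.  Returns 1 if the packet
 * would proceed, 0 if it is dropped. */
static int send_packet_checked(const struct iovec *iov, int iovcnt)
{
    if (iov_total_size(iov, iovcnt) > NET_BUFSIZE) {
        return 0; /* drop early, before any queue can see the packet */
    }
    return 1; /* deliver/queue as usual */
}
```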
commit b697c0aecbf9bc8bdb4f1bf0ea92e6a8fb258094
Author: Jason Wang <jasowang@redhat.com>
Date: Wed May 30 13:16:36 2018 +0800
net: ignore packet size greater than INT_MAX
There should not be a reason for passing a packet size greater than
INT_MAX. It's usually a hint of a bug somewhere, so ignore packet sizes
greater than INT_MAX in qemu_deliver_packet_iov().
CC: qemu-stable@nongnu.org
Reported-by: Daniel Shapira <daniel@twistlock.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 1592a9947036d60dde5404204a5d45975133caf5)
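A minimal sketch of the guard this commit describes; the function name mirrors, but is not, QEMU's actual qemu_deliver_packet_iov():

```c
#include <limits.h>
#include <stddef.h>

/* A packet size above INT_MAX almost certainly indicates a bug in the
 * producer, so the delivery path ignores such packets rather than risk
 * signed-integer trouble downstream.  Returns 1 if the packet would be
 * delivered, 0 if it is ignored. */
static int deliver_packet(size_t size)
{
    if (size > INT_MAX) {
        return 0; /* ignore suspiciously large packet */
    }
    return 1; /* normal delivery would happen here */
}
```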
commit f517c1b6079a514c0798eacb3f7c77b9dd8ebbf1
Author: Greg Kurz <groug@kaod.org>
Date: Fri Nov 23 13:28:03 2018 +0100
9p: fix QEMU crash when renaming files
When using the 9P2000.u version of the protocol, the following shell
command line in the guest can cause QEMU to crash:
while true; do rm -rf aa; mkdir -p a/b & touch a/b/c & mv a aa; done
With 9P2000.u, file renaming is handled by the WSTAT command. The
v9fs_wstat() function calls v9fs_complete_rename(), which calls
v9fs_fix_path() for every fid whose path is affected by the change.
The involved calls to v9fs_path_copy() may race with any other access
to the fid path performed by some worker thread, causing a crash like
shown below:
Thread 12 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x0000555555a25da2 in local_open_nofollow (fs_ctx=0x555557d958b8, path=0x0,
flags=65536, mode=0) at hw/9pfs/9p-local.c:59
59 while (*path && fd != -1) {
(gdb) bt
#0 0x0000555555a25da2 in local_open_nofollow (fs_ctx=0x555557d958b8,
path=0x0, flags=65536, mode=0) at hw/9pfs/9p-local.c:59
#1 0x0000555555a25e0c in local_opendir_nofollow (fs_ctx=0x555557d958b8,
path=0x0) at hw/9pfs/9p-local.c:92
#2 0x0000555555a261b8 in local_lstat (fs_ctx=0x555557d958b8,
fs_path=0x555556b56858, stbuf=0x7fff84830ef0) at hw/9pfs/9p-local.c:185
#3 0x0000555555a2b367 in v9fs_co_lstat (pdu=0x555557d97498,
path=0x555556b56858, stbuf=0x7fff84830ef0) at hw/9pfs/cofile.c:53
#4 0x0000555555a1e9e2 in v9fs_stat (opaque=0x555557d97498)
at hw/9pfs/9p.c:1083
#5 0x0000555555e060a2 in coroutine_trampoline (i0=-669165424, i1=32767)
at util/coroutine-ucontext.c:116
#6 0x00007fffef4f5600 in __start_context () at /lib64/libc.so.6
#7 0x0000000000000000 in ()
(gdb)
The fix is to take the path write lock when calling v9fs_complete_rename(),
like in v9fs_rename().
Impact: DoS triggered by unprivileged guest users.
Fixes: CVE-2018-19489
Cc: P J P <ppandit@redhat.com>
Reported-by: zhibin hu <noirfate@gmail.com>
Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 1d20398694a3b67a388d955b7a945ba4aa90a8a8)
commit 9af9c1c20e313f597168e0522f5fc8d78123b0c8
Author: Paolo Bonzini <pbonzini@redhat.com>
Date: Tue Nov 20 19:41:48 2018 +0100
nvme: fix out-of-bounds access to the CMB
Because the CMB BAR has a min_access_size of 2, if you read the last
byte it will try to memcpy *2* bytes from n->cmbuf, causing an off-by-one
error. This is CVE-2018-16847.
Another way to fix this might be to register the CMB as a RAM memory
region, which would also be more efficient. However, that might be a
change for big-endian machines; I didn't think this through and I don't
know how real hardware works. Add a basic testcase for the CMB in case
somebody does this change later on.
Cc: Keith Busch <keith.busch@intel.com>
Cc: qemu-block@nongnu.org
Reported-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Tested-by: Li Qiang <liq3ea@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 87ad860c622cc8f8916b5232bd8728c08f938fce)
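The off-by-one and its fix can be modeled with a clamped copy. The buffer layout and function signature below are illustrative, not the actual nvme code:

```c
#include <stdint.h>
#include <string.h>

/* Model of the bug: the MMIO layer may widen a 1-byte read at the last
 * CMB offset to the 2-byte min_access_size, so a naive
 * memcpy(&val, cmbuf + addr, size) reads one byte past the buffer.
 * This sketch clamps the copy to what actually fits and reports how
 * many bytes were copied. */
static size_t cmb_read(const uint8_t *cmbuf, size_t cmb_size,
                       size_t addr, size_t size, uint8_t *out)
{
    size_t avail = addr < cmb_size ? cmb_size - addr : 0;
    size_t n = size < avail ? size : avail;
    if (n == 0) {
        return 0; /* fully out of range: nothing to copy */
    }
    memcpy(out, cmbuf + addr, n); /* never crosses the end of cmbuf */
    return n;
}
```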
commit c50c704a6a09554925b926c0313280be4a3d7100
Author: Greg Kurz <groug@kaod.org>
Date: Tue Nov 20 13:00:35 2018 +0100
9p: take write lock on fid path updates (CVE-2018-19364)
Recent commit 5b76ef50f62079a fixed a race where v9fs_co_open2() could
possibly overwrite a fid path with v9fs_path_copy() while it is being
accessed by some other thread, ie, use-after-free that can be detected
by ASAN with a custom 9p client.
It turns out that the same can happen at several locations where
v9fs_path_copy() is used to set the fid path. The fix is again to
take the write lock.
Fixes CVE-2018-19364.
Cc: P J P <ppandit@redhat.com>
Reported-by: zhibin hu <noirfate@gmail.com>
Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 5b3c77aa581ebb215125c84b0742119483571e55)
commit 03c28544a1b67fd48ef1fa72231818efa8563874
Author: Roger Pau Monne <roger.pau@citrix.com>
Date: Mon Mar 18 18:37:31 2019 +0100
xen-mapcache: use MAP_FIXED flag so the mmap address hint is always honored
Or if it's not possible to honor the hinted address an error is returned
instead. This makes it easier to spot the actual failure, instead of
failing later on when the caller of xen_remap_bucket realizes the
mapping has not been created at the requested address.
Also note that at least on FreeBSD using MAP_FIXED will cause mmap to
try harder to honor the passed address.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Message-Id: <20190318173731.14494-1-roger.pau@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
(cherry picked from commit 4158e93f4aced247c8db94a0275fc027da7dc97e)
commit a35ed1444329599f2975512c82be795f8af284d5
Author: Michael McConville <mmcco@mykolab.com>
Date: Fri Dec 1 11:31:57 2017 -0700
mmap(2) returns MAP_FAILED, not NULL, on failure
Signed-off-by: Michael McConville <mmcco@mykolab.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(cherry picked from commit ab1ce9bd4897b9909836e2d50bca86f2f3f2dddc)
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
* Re: [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
@ 2019-05-16 10:37 ` Anthony PERARD
0 siblings, 0 replies; 43+ messages in thread
From: Anthony PERARD @ 2019-05-16 10:37 UTC (permalink / raw)
To: osstest service owner, Ian Jackson, Julien Grall; +Cc: xen-devel
On Wed, May 15, 2019 at 07:48:17PM +0000, osstest service owner wrote:
> flight 136184 qemu-upstream-4.11-testing real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/136184/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
> build-arm64-pvops <job status> broken in 134594
> build-arm64 <job status> broken in 134594
> build-arm64-xsm <job status> broken in 134594
> build-arm64-xsm 4 host-install(4) broken in 134594 REGR. vs. 125575
> build-arm64-pvops 4 host-install(4) broken in 134594 REGR. vs. 125575
> build-arm64 4 host-install(4) broken in 134594 REGR. vs. 125575
> test-arm64-arm64-libvirt-xsm 7 xen-boot fail REGR. vs. 125575
> test-arm64-arm64-xl 7 xen-boot fail REGR. vs. 125575
> test-arm64-arm64-xl-xsm 7 xen-boot fail REGR. vs. 125575
> test-arm64-arm64-xl-credit2 7 xen-boot fail REGR. vs. 125575
>
Ian, Julien,
I can't figure out why Xen consistently fails to boot on rochester* in
the qemu-upstream-4.11-testing flights. The xen-4.11-testing flights
seem to pass.
At boot, the boot loader seems to load the blobs, but when it's time for
Xen to shine, there is no output from Xen on the serial.
Do you know what could cause Xen to fail to boot?
I don't believe a few more patches on top of qemu-xen would.
Thanks,
--
Anthony PERARD
* Re: [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
@ 2019-05-16 21:38 ` Julien Grall
0 siblings, 0 replies; 43+ messages in thread
From: Julien Grall @ 2019-05-16 21:38 UTC (permalink / raw)
To: Anthony PERARD, osstest service owner, Ian Jackson
Cc: xen-devel, Stefano Stabellini
Hi Anthony,
Thank you for CCing me.
On 5/16/19 11:37 AM, Anthony PERARD wrote:
> On Wed, May 15, 2019 at 07:48:17PM +0000, osstest service owner wrote:
>> flight 136184 qemu-upstream-4.11-testing real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/136184/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>> build-arm64-pvops <job status> broken in 134594
>> build-arm64 <job status> broken in 134594
>> build-arm64-xsm <job status> broken in 134594
>> build-arm64-xsm 4 host-install(4) broken in 134594 REGR. vs. 125575
>> build-arm64-pvops 4 host-install(4) broken in 134594 REGR. vs. 125575
>> build-arm64 4 host-install(4) broken in 134594 REGR. vs. 125575
>> test-arm64-arm64-libvirt-xsm 7 xen-boot fail REGR. vs. 125575
>> test-arm64-arm64-xl 7 xen-boot fail REGR. vs. 125575
>> test-arm64-arm64-xl-xsm 7 xen-boot fail REGR. vs. 125575
>> test-arm64-arm64-xl-credit2 7 xen-boot fail REGR. vs. 125575
>>
>
> Ian, Julien,
>
> I can't figure out why Xen consistently fails to boot on rochester* in
> the qemu-upstream-4.11-testing flights. The xen-4.11-testing seems to
> pass.
>
> At boot, the boot loader seems to load blobs, but when it's time to Xen
> to shine, there are no output from Xen on the serial.
The serial console is initialized fairly late in the boot process. Any
useful messages (such as memory setup or even part of the interrupt
setup) will be hidden. To get them, you need earlyprintk. Unfortunately,
it can't be configured at runtime today :(.
>
> Do you know what could cause xen to fail to boot?
It is hard to say without the log. Looking at the difference with the
Xen 4.11 flight on rochester0 [1], it seems the .config is slightly
different: the 4.11 flight has CONFIG_LIVEPATCH set.
I tried to boot the Xen built in this flight on an internal board, but I
can't see any error, so it may be a board-specific issue.
Sorry, I can't provide more input without a proper investigation.
> I don't believe a few more patch on top of qemu-xen would.
Cheers,
[1]
http://logs.test-lab.xenproject.org/osstest/logs/136231/test-arm64-arm64-xl/info.html
>
--
Julien Grall
* Re: [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
@ 2019-05-17 15:53 ` Ian Jackson
0 siblings, 0 replies; 43+ messages in thread
From: Ian Jackson @ 2019-05-17 15:53 UTC (permalink / raw)
To: Julien Grall; +Cc: Anthony Perard, xen-devel, Stefano Stabellini
Julien Grall writes ("Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL"):
> On 5/16/19 11:37 AM, Anthony PERARD wrote:
> >> Tests which did not succeed and are blocking,
> >> including tests which could not be run:
> >> test-arm64-arm64-libvirt-xsm 7 xen-boot fail REGR. vs. 125575
> >> test-arm64-arm64-xl 7 xen-boot fail REGR. vs. 125575
> >> test-arm64-arm64-xl-xsm 7 xen-boot fail REGR. vs. 125575
> >> test-arm64-arm64-xl-credit2 7 xen-boot fail REGR. vs. 125575
..
> > I can't figure out why Xen consistently fails to boot on rochester* in
> > the qemu-upstream-4.11-testing flights. The xen-4.11-testing seems to
> > pass.
> >
> > At boot, the boot loader seems to load blobs, but when it's time to Xen
> > to shine, there are no output from Xen on the serial.
>
> The serial console is initializing fairly late in the process. Any
> useful message (such as memory setup or even part of the interrupts)
> will be hide out. For getting them, you need earlyprintk. Unfortunately
> they can't be configured at runtime today :(.
:-/. Can we configure earlyprintk at compile time? We always
want it to be serial...
> > Do you know what could cause xen to fail to boot?
>
> It is hard to say without the log. Looking at the different with a Xen
> 4.11 flights on rochester0 [1], it seems the .config is slightly
> different. 4.11 flight has CONFIG_LIVEPATCH set.
The osstest history shows this as a 100% repeatable boot failure but
only in the qemu flights.
Comparing 136231 (pass, xen-4.11-testing) with 136184 (fail,
qemu-upstream-4.11-testing), there are no differences in the test job
runvars. Both used the same version of osstest.
But in the build-arm64 (Xen build) job runvars I see the following
differences:
                               136231                   136184
                               pass                     fail
                               xen-4.11-testing         qemu-*4.11*
 build-arm64 (Xen build)
   enable_livepatch            true                     (unset)
   [~built_]revision_qemuu     20c76f9a5fbf...          2871355a6957...
   [~built_]revision_xen       a6e07495c171...          3b062f5040a1...
   ~path_xenlptdist            build/xenlptdist.tar.gz  (unset)
 build-arm64-pvops (kernel build)
   ~host                       rochester1               laxton1
 (~ indicates a variable set by osstest during the test run.)
The qemu revision is clearly not relevant. I did this
git-diff --stat a6e07495c171..3b062f5040a1
in xen.git and the differences really don't seem like they would be
relevant.
I think therefore that we need to blame the livepatch setting. This
comes from osstest's flight construction code. osstest is configured
to enable live patching, in the build, only on the xen-* branches.
Unfortunately due to the xen/cmdline regression, the osstest bisector
does not seem to have a useful enough baseline. I have rm'd the stamp
files and it may manage to do better but I doubt it.
Ian.
* Re: [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
@ 2019-05-17 17:23 ` Anthony PERARD
0 siblings, 0 replies; 43+ messages in thread
From: Anthony PERARD @ 2019-05-17 17:23 UTC (permalink / raw)
To: Julien Grall
Cc: Ian Jackson, Stefano Stabellini, osstest service owner, xen-devel
On Thu, May 16, 2019 at 10:38:54PM +0100, Julien Grall wrote:
> Hi Anthony,
>
> Thank you for CCing me.
>
> On 5/16/19 11:37 AM, Anthony PERARD wrote:
> > On Wed, May 15, 2019 at 07:48:17PM +0000, osstest service owner wrote:
> > > flight 136184 qemu-upstream-4.11-testing real [real]
> > > http://logs.test-lab.xenproject.org/osstest/logs/136184/
> > >
> > > Regressions :-(
> > >
> > > Tests which did not succeed and are blocking,
> > > including tests which could not be run:
> > > build-arm64-pvops <job status> broken in 134594
> > > build-arm64 <job status> broken in 134594
> > > build-arm64-xsm <job status> broken in 134594
> > > build-arm64-xsm 4 host-install(4) broken in 134594 REGR. vs. 125575
> > > build-arm64-pvops 4 host-install(4) broken in 134594 REGR. vs. 125575
> > > build-arm64 4 host-install(4) broken in 134594 REGR. vs. 125575
> > > test-arm64-arm64-libvirt-xsm 7 xen-boot fail REGR. vs. 125575
> > > test-arm64-arm64-xl 7 xen-boot fail REGR. vs. 125575
> > > test-arm64-arm64-xl-xsm 7 xen-boot fail REGR. vs. 125575
> > > test-arm64-arm64-xl-credit2 7 xen-boot fail REGR. vs. 125575
> > >
> >
> > Ian, Julien,
> >
> > I can't figure out why Xen consistently fails to boot on rochester* in
> > the qemu-upstream-4.11-testing flights. The xen-4.11-testing seems to
> > pass.
> >
> > At boot, the boot loader seems to load blobs, but when it's time to Xen
> > to shine, there are no output from Xen on the serial.
>
> The serial console is initializing fairly late in the process. Any useful
> message (such as memory setup or even part of the interrupts) will be hide
> out. For getting them, you need earlyprintk. Unfortunately they can't be
> configured at runtime today :(.
I think I managed to run the job with earlyprintk on rochester, but
Xen booted:
http://logs.test-lab.xenproject.org/osstest/logs/136451/
So that probably wasn't very useful.
(I had to hack osstest in order to compile Xen with earlyprintk.)
--
Anthony PERARD
* Re: [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
@ 2019-05-17 19:00 ` Julien Grall
0 siblings, 0 replies; 43+ messages in thread
From: Julien Grall @ 2019-05-17 19:00 UTC (permalink / raw)
To: Anthony PERARD
Cc: Ian Jackson, Stefano Stabellini, osstest service owner, xen-devel
Hi,
On 5/17/19 6:23 PM, Anthony PERARD wrote:
> On Thu, May 16, 2019 at 10:38:54PM +0100, Julien Grall wrote:
>> Hi Anthony,
>>
>> Thank you for CCing me.
>>
>> On 5/16/19 11:37 AM, Anthony PERARD wrote:
>>> On Wed, May 15, 2019 at 07:48:17PM +0000, osstest service owner wrote:
>>>> flight 136184 qemu-upstream-4.11-testing real [real]
>>>> http://logs.test-lab.xenproject.org/osstest/logs/136184/
>>>>
>>>> Regressions :-(
>>>>
>>>> Tests which did not succeed and are blocking,
>>>> including tests which could not be run:
>>>> build-arm64-pvops <job status> broken in 134594
>>>> build-arm64 <job status> broken in 134594
>>>> build-arm64-xsm <job status> broken in 134594
>>>> build-arm64-xsm 4 host-install(4) broken in 134594 REGR. vs. 125575
>>>> build-arm64-pvops 4 host-install(4) broken in 134594 REGR. vs. 125575
>>>> build-arm64 4 host-install(4) broken in 134594 REGR. vs. 125575
>>>> test-arm64-arm64-libvirt-xsm 7 xen-boot fail REGR. vs. 125575
>>>> test-arm64-arm64-xl 7 xen-boot fail REGR. vs. 125575
>>>> test-arm64-arm64-xl-xsm 7 xen-boot fail REGR. vs. 125575
>>>> test-arm64-arm64-xl-credit2 7 xen-boot fail REGR. vs. 125575
>>>>
>>>
>>> Ian, Julien,
>>>
>>> I can't figure out why Xen consistently fails to boot on rochester* in
>>> the qemu-upstream-4.11-testing flights. The xen-4.11-testing seems to
>>> pass.
>>>
>>> At boot, the boot loader seems to load blobs, but when it's time to Xen
>>> to shine, there are no output from Xen on the serial.
>>
>> The serial console is initializing fairly late in the process. Any useful
>> message (such as memory setup or even part of the interrupts) will be hide
>> out. For getting them, you need earlyprintk. Unfortunately they can't be
>> configured at runtime today :(.
>
> I think I managed to run the job with earlyprintk on rochester, but
> Xen have booted:
> http://logs.test-lab.xenproject.org/osstest/logs/136451/
Yes, this is with earlyprintk. That's going to be fun to reproduce if
earlyprintk modifies the behavior.
I think we can interpret this as earlyprintk adding enough latency to
make everything work.
There are two possible issues I can think of:
1) The boot code does not follow the Arm Arm, so it may be possible
the board is doing something different compared to the others regarding
the memory. IIRC, this is the first hardware we have with cores not
directly designed by Arm.
2) We are missing some errata in Xen. Linux contains 6 errata for
that platform. Looking at them, I don't think they matter for boot time.
1) is currently being looked at (see the MM-PART* patches on the ML). 2)
should probably be addressed at some point, but I may not be able to
send the patches as an Arm employee (we tend to avoid sending patches
showing brokenness in partner silicon).
Cheers,
--
Julien Grall
* Re: [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
@ 2019-05-21 16:52 ` Julien Grall
0 siblings, 0 replies; 43+ messages in thread
From: Julien Grall @ 2019-05-21 16:52 UTC (permalink / raw)
To: Anthony PERARD
Cc: Ian Jackson, Stefano Stabellini, osstest service owner, xen-devel
Hi,
Replying to myself.
On 5/17/19 8:00 PM, Julien Grall wrote:
> Hi,
>
> On 5/17/19 6:23 PM, Anthony PERARD wrote:
>> On Thu, May 16, 2019 at 10:38:54PM +0100, Julien Grall wrote:
>>> Hi Anthony,
>>>
>>> Thank you for CCing me.
>>>
>>> On 5/16/19 11:37 AM, Anthony PERARD wrote:
>>>> On Wed, May 15, 2019 at 07:48:17PM +0000, osstest service owner wrote:
>>>>> flight 136184 qemu-upstream-4.11-testing real [real]
>>>>> http://logs.test-lab.xenproject.org/osstest/logs/136184/
>>>>>
>>>>> Regressions :-(
>>>>>
>>>>> Tests which did not succeed and are blocking,
>>>>> including tests which could not be run:
>>>>> build-arm64-pvops <job status>
>>>>> broken in 134594
>>>>> build-arm64 <job status>
>>>>> broken in 134594
>>>>> build-arm64-xsm <job status>
>>>>> broken in 134594
>>>>> build-arm64-xsm 4 host-install(4) broken in 134594
>>>>> REGR. vs. 125575
>>>>> build-arm64-pvops 4 host-install(4) broken in 134594
>>>>> REGR. vs. 125575
>>>>> build-arm64 4 host-install(4) broken in 134594
>>>>> REGR. vs. 125575
>>>>> test-arm64-arm64-libvirt-xsm 7 xen-boot fail
>>>>> REGR. vs. 125575
>>>>> test-arm64-arm64-xl 7 xen-boot fail
>>>>> REGR. vs. 125575
>>>>> test-arm64-arm64-xl-xsm 7 xen-boot fail
>>>>> REGR. vs. 125575
>>>>> test-arm64-arm64-xl-credit2 7 xen-boot fail
>>>>> REGR. vs. 125575
>>>>>
>>>>
>>>> Ian, Julien,
>>>>
>>>> I can't figure out why Xen consistently fails to boot on rochester* in
>>>> the qemu-upstream-4.11-testing flights. The xen-4.11-testing seems to
>>>> pass.
>>>>
>>>> At boot, the boot loader seems to load the blobs, but when it's time
>>>> for Xen to shine, there is no output from Xen on the serial console.
>>>
>>> The serial console is initialized fairly late in the process. Any
>>> useful messages (such as memory setup or even part of the interrupt
>>> setup) will be hidden. To get them, you need earlyprintk.
>>> Unfortunately, it can't be configured at runtime today :(.
>>
>> I think I managed to run the job with earlyprintk on rochester, but
>> then Xen booted:
>> http://logs.test-lab.xenproject.org/osstest/logs/136451/
>
> Yes, this is with earlyprintk. That's going to be fun to reproduce if
> earlyprintk modifies the behavior.
>
> I think we can interpret this as earlyprintk adding enough latency to
> make everything work.
>
> There are two possible issues I can think of:
> 1) The boot code does not follow the Arm Arm, so it may be possible
> the board is doing something different compared to the others regarding
> memory. IIRC, this is the first hardware we have with cores not
> directly designed by Arm.
> 2) We are missing some errata workarounds in Xen. Linux contains 6
> errata workarounds for that platform. Looking at them, I don't think
> they matter for boot time.
>
> 1) is currently being looked at (see the MM-PART* patches on the ML). 2)
> should probably be addressed at some point, but I may not be able to
> send the patches as an Arm employee (we tend to avoid sending patches
> showing brokenness in partner silicon).
Ian kindly started a couple of jobs over the weekend to confirm whether
it can be reproduced on laxton* (a Seattle board).
The same error cannot be reproduced on laxton*. Looking at the test
history, it looks like the qemu-upstream-4.12-testing flight has run
successfully a few times on rochester*. So we may have fixed the error
in Xen 4.12.
Potential candidates would be:
- 00c96d7742 "xen/arm: mm: Set-up page permission for Xen mappings
earlier on"
- f60658c6ae "xen/arm: Stop relocating Xen"
Ian, is this something the bisector could automatically look at?
If not, I will need to find some time and borrow the board to bisect the
issue.
Cheers,
--
Julien Grall
* Re: [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
@ 2019-06-03 17:15 ` Anthony PERARD
0 siblings, 0 replies; 43+ messages in thread
From: Anthony PERARD @ 2019-06-03 17:15 UTC (permalink / raw)
To: Julien Grall
Cc: Ian Jackson, Stefano Stabellini, osstest service owner, xen-devel
On Tue, May 21, 2019 at 05:52:12PM +0100, Julien Grall wrote:
> The same error cannot be reproduced on laxton*. Looking at the test history,
> it looks like qemu-upstream-4.12-testing flight has run successfully a few
> times on rochester*. So we may have fixed the error in Xen 4.12.
>
> Potential candidates would be:
> - 00c96d7742 "xen/arm: mm: Set-up page permission for Xen mappings earlier on"
> - f60658c6ae "xen/arm: Stop relocating Xen"
>
> Ian, is it something the bisector could automatically look at?
> If not, I will need to find some time and borrow the board to bisect the
> issues.
I attempted to do that bisection myself, and the first commit that git
wanted to try, a common commit to both branches, boots just fine.
It turns out that the first commit that fails to boot on rochester is
e202feb713 "xen/cmdline: Fix buggy strncmp(s, LITERAL, ss - s) construct"
(even with "eb8acba82a xen: Fix backport of .." applied).
I did try a few commits from the stable-4.12 branch and they all booted
just fine on rochester.
Now about the potential candidates:
> - 00c96d7742 "xen/arm: mm: Set-up page permission for Xen mappings earlier on"
This commit alone, cherry-picked on top of stable-4.11, makes Xen boot
on rochester.
> - f60658c6ae "xen/arm: Stop relocating Xen"
With that commit applied, Xen doesn't build, so I couldn't try to boot
it (mm.c: In function ‘setup_pagetables’: mm.c:653:42: error: ‘xen_paddr’ undeclared).
--
Anthony PERARD
* Re: [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
@ 2019-06-04 7:06 ` Jan Beulich
0 siblings, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2019-06-04 7:06 UTC (permalink / raw)
To: Anthony Perard
Cc: Ian Jackson, Julien Grall, Stefano Stabellini,
osstest service owner, xen-devel
>>> On 03.06.19 at 19:15, <anthony.perard@citrix.com> wrote:
> On Tue, May 21, 2019 at 05:52:12PM +0100, Julien Grall wrote:
>> The same error cannot be reproduced on laxton*. Looking at the test history,
>> it looks like qemu-upstream-4.12-testing flight has run successfully a few
>> times on rochester*. So we may have fixed the error in Xen 4.12.
>>
>> Potential candidates would be:
>> - 00c96d7742 "xen/arm: mm: Set-up page permission for Xen mappings earlier on"
>> - f60658c6ae "xen/arm: Stop relocating Xen"
>>
>> Ian, is it something the bisector could automatically look at?
>> If not, I will need to find some time and borrow the board to bisect the
>> issues.
>
> I attempted to do that bisection myself, and the first commit that git
> wanted to try, a common commit to both branches, boots just fine.
Thanks for doing this!
One thing that, for now, completely escapes me: how come the
main 4.11 branch has progressed fine, but the qemuu one has
stalled like this?
> It turns out that the first commit that fails to boot on rochester is
> e202feb713 xen/cmdline: Fix buggy strncmp(s, LITERAL, ss - s) construct
> (even with the "eb8acba82a xen: Fix backport of .." applied)
Now that's a particularly odd regression candidate. It doesn't
touch any Arm code at all (nor does the fixup commit). And the
common code changes don't look "risky" either; the one thing that
jumps out as the most likely of all the unlikely candidates would
seem to be the xen/common/efi/boot.c change, but if there were
a problem there then EFI boot on Arm would be latently
broken in other ways as well. Plus, of course, you say that the
same change is no problem on 4.12.
Of course the commit itself could be further "bisected" - all
changes other than the introduction of cmdline_strcmp() are
completely independent of one another.
Jan
* Re: [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
@ 2019-06-04 9:01 ` Julien Grall
0 siblings, 0 replies; 43+ messages in thread
From: Julien Grall @ 2019-06-04 9:01 UTC (permalink / raw)
To: Jan Beulich, Anthony Perard
Cc: Ian Jackson, Stefano Stabellini, osstest service owner, xen-devel
Hi Jan,
On 6/4/19 8:06 AM, Jan Beulich wrote:
>>>> On 03.06.19 at 19:15, <anthony.perard@citrix.com> wrote:
>> On Tue, May 21, 2019 at 05:52:12PM +0100, Julien Grall wrote:
>>> The same error cannot be reproduced on laxton*. Looking at the test history,
>>> it looks like qemu-upstream-4.12-testing flight has run successfully a few
>>> times on rochester*. So we may have fixed the error in Xen 4.12.
>>>
>>> Potential candidates would be:
>>> - 00c96d7742 "xen/arm: mm: Set-up page permission for Xen mappings earlier on"
>>> - f60658c6ae "xen/arm: Stop relocating Xen"
>>>
>>> Ian, is it something the bisector could automatically look at?
>>> If not, I will need to find some time and borrow the board to bisect the
>>> issues.
>>
>> I attempted to do that bisection myself, and the first commit that git
>> wanted to try, a common commit to both branches, boots just fine.
>
> Thanks for doing this!
>
> One thing that, for now, completely escapes me: How come the
> main 4.11 branch has progressed fine, but the qemuu one has
> got stalled like this?
Because Xen on Arm today does not fully respect the Arm Arm when
modifying the page-tables. This may result in TLB conflicts and a break
of coherency.
>
>> It turns out that the first commit that fails to boot on rochester is
>> e202feb713 xen/cmdline: Fix buggy strncmp(s, LITERAL, ss - s) construct
>> (even with the "eb8acba82a xen: Fix backport of .." applied)
>
> Now that's particularly odd a regression candidate. It doesn't
> touch any Arm code at all (nor does the fixup commit). And the
> common code changes don't look "risky" either; the one thing that
> jumps out as the most likely of all the unlikely candidates would
> seem to be the xen/common/efi/boot.c change, but if there was
> a problem there then the EFI boot on Arm would be latently
> broken in other ways as well. Plus, of course, you say that the
> same change is no problem on 4.12.
>
> Of course the commit itself could be further "bisected" - all
> changes other than the introduction of cmdline_strcmp() are
> completely independent of one another.
I think this is just a red herring. The commit probably modifies the
layout of Xen enough that a TLB conflict will appear.
Anthony said backporting 00c96d7742 "xen/arm: mm: Set-up page permission
for Xen mappings earlier on" makes staging-4.11 boot. This patch
removes some of the potential causes of TLB conflicts.
I haven't suggested a backport of this patch so far, because TLB
conflicts are still possible within the modified function. It might
also expose more TLB conflicts, as more work in Xen is needed (see my
MM-PARTn series).
I don't know whether backporting this patch is worth it compared to the
risk it introduces.
Cheers,
--
Julien Grall
* Re: [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
@ 2019-06-04 9:17 ` Jan Beulich
0 siblings, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2019-06-04 9:17 UTC (permalink / raw)
To: Julien Grall, Stefano Stabellini
Cc: Anthony Perard, Ian Jackson, osstest service owner, xen-devel
>>> On 04.06.19 at 11:01, <julien.grall@arm.com> wrote:
> On 6/4/19 8:06 AM, Jan Beulich wrote:
>>>>> On 03.06.19 at 19:15, <anthony.perard@citrix.com> wrote:
>>> It turns out that the first commit that fails to boot on rochester is
>>> e202feb713 xen/cmdline: Fix buggy strncmp(s, LITERAL, ss - s) construct
>>> (even with the "eb8acba82a xen: Fix backport of .." applied)
>>
>> Now that's a particularly odd regression candidate. It doesn't
>> touch any Arm code at all (nor does the fixup commit). And the
>> common code changes don't look "risky" either; the one thing that
>> jumps out as the most likely of all the unlikely candidates would
>> seem to be the xen/common/efi/boot.c change, but if there were
>> a problem there then EFI boot on Arm would be latently
>> broken in other ways as well. Plus, of course, you say that the
>> same change is no problem on 4.12.
>>
>> Of course the commit itself could be further "bisected" - all
>> changes other than the introduction of cmdline_strcmp() are
>> completely independent of one another.
>
> I think this is just a red herring. The commit probably modifies the
> layout of Xen enough that a TLB conflict will appear.
>
> Anthony said backporting 00c96d7742 "xen/arm: mm: Set-up page permission
> for Xen mappings earlier on" makes staging-4.11 boot. This patch
> removes some of the potential causes of TLB conflicts.
>
> I haven't suggested a backport of this patch so far, because TLB
> conflicts are still possible within the modified function. It might
> also expose more TLB conflicts, as more work in Xen is needed (see my
> MM-PARTn series).
>
> I don't know whether backporting this patch is worth it compared to the
> risk it introduces.
Well, if you don't backport this, what's the alternative road towards a
solution here? I'm afraid the two of you will need to decide one way or
another.
In any event, this sounds to me as if a similar problem could appear at
any time on any branch. Not a very nice state to be in ...
Jan
* Re: [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
@ 2019-06-04 9:57 ` Julien Grall
0 siblings, 0 replies; 43+ messages in thread
From: Julien Grall @ 2019-06-04 9:57 UTC (permalink / raw)
To: Jan Beulich, Stefano Stabellini
Cc: Anthony Perard, Ian Jackson, osstest service owner, xen-devel
On 6/4/19 10:17 AM, Jan Beulich wrote:
>>>> On 04.06.19 at 11:01, <julien.grall@arm.com> wrote:
>> On 6/4/19 8:06 AM, Jan Beulich wrote:
>>>>>> On 03.06.19 at 19:15, <anthony.perard@citrix.com> wrote:
>>>> It turns out that the first commit that fails to boot on rochester is
>>>> e202feb713 xen/cmdline: Fix buggy strncmp(s, LITERAL, ss - s) construct
>>>> (even with the "eb8acba82a xen: Fix backport of .." applied)
>>>
>>> Now that's a particularly odd regression candidate. It doesn't
>>> touch any Arm code at all (nor does the fixup commit). And the
>>> common code changes don't look "risky" either; the one thing that
>>> jumps out as the most likely of all the unlikely candidates would
>>> seem to be the xen/common/efi/boot.c change, but if there were
>>> a problem there then EFI boot on Arm would be latently
>>> broken in other ways as well. Plus, of course, you say that the
>>> same change is no problem on 4.12.
>>>
>>> Of course the commit itself could be further "bisected" - all
>>> changes other than the introduction of cmdline_strcmp() are
>>> completely independent of one another.
>>
>> I think this is just a red herring. The commit probably modifies the
>> layout of Xen enough that a TLB conflict will appear.
>>
>> Anthony said backporting 00c96d7742 "xen/arm: mm: Set-up page permission
>> for Xen mappings earlier on" makes staging-4.11 boot. This patch
>> removes some of the potential causes of TLB conflicts.
>>
>> I haven't suggested a backport of this patch so far, because TLB
>> conflicts are still possible within the modified function. It might
>> also expose more TLB conflicts, as more work in Xen is needed (see my
>> MM-PARTn series).
>>
>> I don't know whether backporting this patch is worth it compared to the
>> risk it introduces.
>
> Well, if you don't backport this, what's the alternative road towards a
> solution here? I'm afraid the two of you will need to decide one way or
> another.
The "two" being?
Looking at the code again, we now avoid replacing a 4KB entry with a 2MB
block entry without respecting the Break-Before-Make sequence. So this
is one (actually two) fewer potential sources of TLB conflict.
This patch may introduce more sources of TLB conflict if the processor is
caching intermediate walks. But this was already the case before, so it
may not be as bad as I first thought.
I would definitely like to hear an opinion from Stefano here.
>
> In any event this sounds to me as if a similar problem could appear at
> any time on any branch. Not a very nice state to be in ...
Thankfully most of those issues will appear at boot time. The update of
Xen page-tables at runtime is mostly correct (missing a couple of locks).
But the failure will depend on your code. I expect that, of the Arm
platforms used in osstest, we would only see the failure on Thunder-X.
It is not a nice state to be in, but the task is quite a big one, as Xen
was designed on wrong assumptions. This implies reworking most of the
boot and memory management code.
Cheers,
--
Julien Grall
* Re: [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
@ 2019-06-04 10:02 ` Jan Beulich
0 siblings, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2019-06-04 10:02 UTC (permalink / raw)
To: Julien Grall
Cc: Anthony Perard, Ian Jackson, Stefano Stabellini,
osstest service owner, xen-devel
>>> On 04.06.19 at 11:57, <julien.grall@arm.com> wrote:
>
> On 6/4/19 10:17 AM, Jan Beulich wrote:
>>>>> On 04.06.19 at 11:01, <julien.grall@arm.com> wrote:
>>> On 6/4/19 8:06 AM, Jan Beulich wrote:
>>>>>>> On 03.06.19 at 19:15, <anthony.perard@citrix.com> wrote:
>>>>> It turns out that the first commit that fails to boot on rochester is
>>>>> e202feb713 xen/cmdline: Fix buggy strncmp(s, LITERAL, ss - s) construct
>>>>> (even with the "eb8acba82a xen: Fix backport of .." applied)
>>>>
>>>> Now that's particularly odd a regression candidate. It doesn't
>>>> touch any Arm code at all (nor does the fixup commit). And the
>>>> common code changes don't look "risky" either; the one thing that
>>>> jumps out as the most likely of all the unlikely candidates would
>>>> seem to be the xen/common/efi/boot.c change, but if there was
>>>> a problem there then the EFI boot on Arm would be latently
>>>> broken in other ways as well. Plus, of course, you say that the
>>>> same change is no problem on 4.12.
>>>>
>>>> Of course the commit itself could be further "bisected" - all
>>>> changes other than the introduction of cmdline_strcmp() are
>>>> completely independent of one another.
>>>
>>> I think this is just a red-herring. The commit is probably modifying
>>> enough the layout of Xen that TLB conflict will appear.
>>>
>>> Anthony said backporting 00c96d7742 "xen/arm: mm: Set-up page permission
>>> for Xen mappings earlier on" makes staging-4.11 boots. This patch
>>> removes some of the potential cause of TLB conflict.
>>>
>>> I haven't suggested a backport of this patch so far, because there are
>>> still TLB conflict possible within the function modified. It might also
>>> be possible that it exposes more of TLB conflict as more work in Xen is
>>> needed (see my MM-PARTn series).
>>>
>>> I don't know whether backporting this patch is worth it compare to the
>>> risk it introduces.
>>
>> Well, if you don't backport this, what's the alternative road towards a
>> solution here? I'm afraid the two of you will need to decide one way or
>> another.
>
> The "two" being?
You and Stefano, as was reflected by the To: list of my earlier reply.
Jan
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
2019-06-04 9:01 ` [Xen-devel] " Julien Grall
@ 2019-06-04 17:09 ` Stefano Stabellini
2019-06-04 17:22 ` Julien Grall
-1 siblings, 1 reply; 43+ messages in thread
From: Stefano Stabellini @ 2019-06-04 17:09 UTC (permalink / raw)
To: Julien Grall
Cc: Stefano Stabellini, osstest service owner, Jan Beulich,
xen-devel, Anthony Perard, Ian Jackson
On Tue, 4 Jun 2019, Julien Grall wrote:
> Hi Jan,
>
> On 6/4/19 8:06 AM, Jan Beulich wrote:
> > > > > On 03.06.19 at 19:15, <anthony.perard@citrix.com> wrote:
> > > On Tue, May 21, 2019 at 05:52:12PM +0100, Julien Grall wrote:
> > > > The same error cannot be reproduced on laxton*. Looking at the test
> > > > history,
> > > > it looks like qemu-upstream-4.12-testing flight has run successfully a
> > > > few
> > > > times on rochester*. So we may have fixed the error in Xen 4.12.
> > > >
> > > > Potential candidates would be:
> > > > - 00c96d7742 "xen/arm: mm: Set-up page permission for Xen mappings
> > > > earlier on"
> > > > - f60658c6ae "xen/arm: Stop relocating Xen"
> > > >
> > > > Ian, is it something the bisector could automatically look at?
> > > > If not, I will need to find some time and borrow the board to bisect the
> > > > issues.
> > >
> > > I attempted to do that bisection myself, and the first commit that git
> > > wanted to try, a common commit to both branches, boots just fine.
> >
> > Thanks for doing this!
> >
> > One thing that, for now, completely escapes me: How come the
> > main 4.11 branch has progressed fine, but the qemuu one has
> > got stalled like this?
>
> Because Xen on Arm today does not fully respect the Arm Arm when modifying the
> page-tables. This may result to TLB conflict and break of coherency.
Yes, I follow your reasoning, but it is still quite strange that it only
happens with the qemu testing branch. Maybe it is because laxton was
picked instead of rochester to run the tests for this branch? Otherwise,
there must be a difference in the Xen configuration between the normal
branch and the qemu testing branch, aside from QEMU of course, which
shouldn't make any difference.
> > > It turns out that the first commit that fails to boot on rochester is
> > > e202feb713 xen/cmdline: Fix buggy strncmp(s, LITERAL, ss - s) construct
> > > (even with the "eb8acba82a xen: Fix backport of .." applied)
> >
> > Now that's particularly odd a regression candidate. It doesn't
> > touch any Arm code at all (nor does the fixup commit). And the
> > common code changes don't look "risky" either; the one thing that
> > jumps out as the most likely of all the unlikely candidates would
> > seem to be the xen/common/efi/boot.c change, but if there was
> > a problem there then the EFI boot on Arm would be latently
> > broken in other ways as well. Plus, of course, you say that the
> > same change is no problem on 4.12.
> >
> > Of course the commit itself could be further "bisected" - all
> > changes other than the introduction of cmdline_strcmp() are
> > completely independent of one another.
>
> I think this is just a red-herring. The commit is probably modifying enough
> the layout of Xen that TLB conflict will appear.
>
> Anthony said backporting 00c96d7742 "xen/arm: mm: Set-up page permission for
> Xen mappings earlier on" makes staging-4.11 boots. This patch removes some of
> the potential cause of TLB conflict.
>
> I haven't suggested a backport of this patch so far, because there are still
> TLB conflict possible within the function modified. It might also be possible
> that it exposes more of TLB conflict as more work in Xen is needed (see my
> MM-PARTn series).
>
> I don't know whether backporting this patch is worth it compare to the risk it
> introduces.
I think we should backport 00c96d7742. We don't need to fix all issues,
we only need to make improvements without introducing more bugs. From
that standpoint, I think 00c96d7742 is doable. I'll backport it now to
4.11. What about the other older staging branches?
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
2019-06-04 17:09 ` Stefano Stabellini
@ 2019-06-04 17:22 ` Julien Grall
2019-06-04 17:39 ` Stefano Stabellini
2019-06-05 10:19 ` Jan Beulich
0 siblings, 2 replies; 43+ messages in thread
From: Julien Grall @ 2019-06-04 17:22 UTC (permalink / raw)
To: Stefano Stabellini
Cc: Anthony Perard, Ian Jackson, osstest service owner, Jan Beulich,
xen-devel
Hi Stefano,
On 6/4/19 6:09 PM, Stefano Stabellini wrote:
> On Tue, 4 Jun 2019, Julien Grall wrote:
>> Hi Jan,
>>
>> On 6/4/19 8:06 AM, Jan Beulich wrote:
>>>>>> On 03.06.19 at 19:15, <anthony.perard@citrix.com> wrote:
>>>> On Tue, May 21, 2019 at 05:52:12PM +0100, Julien Grall wrote:
>>>>> The same error cannot be reproduced on laxton*. Looking at the test
>>>>> history,
>>>>> it looks like qemu-upstream-4.12-testing flight has run successfully a
>>>>> few
>>>>> times on rochester*. So we may have fixed the error in Xen 4.12.
>>>>>
>>>>> Potential candidates would be:
>>>>> - 00c96d7742 "xen/arm: mm: Set-up page permission for Xen mappings
>>>>> earlier on"
>>>>> - f60658c6ae "xen/arm: Stop relocating Xen"
>>>>>
>>>>> Ian, is it something the bisector could automatically look at?
>>>>> If not, I will need to find some time and borrow the board to bisect the
>>>>> issues.
>>>>
>>>> I attempted to do that bisection myself, and the first commit that git
>>>> wanted to try, a common commit to both branches, boots just fine.
>>>
>>> Thanks for doing this!
>>>
>>> One thing that, for now, completely escapes me: How come the
>>> main 4.11 branch has progressed fine, but the qemuu one has
>>> got stalled like this?
>>
>> Because Xen on Arm today does not fully respect the Arm Arm when modifying the
>> page-tables. This may result to TLB conflict and break of coherency.
>
> Yes, I follow your reasoning, but it is still quite strange that it only
> happens with the qemu testing branch. Maybe it is because laxton was
> picked instead of rochester to run the tests for this branch? Otherwise,
> there must be a difference in the Xen configuration between the normal
> branch and the qemu testing branch, aside from QEMU of course, that
> shouldn't make any differences.
Per the discussion before, the .config is different between the 2
flights. QEMU testing is not selecting CONFIG_LIVEPATCH while
staging-4.11 is.
>
>
>>>> It turns out that the first commit that fails to boot on rochester is
>>>> e202feb713 xen/cmdline: Fix buggy strncmp(s, LITERAL, ss - s) construct
>>>> (even with the "eb8acba82a xen: Fix backport of .." applied)
>>>
>>> Now that's particularly odd a regression candidate. It doesn't
>>> touch any Arm code at all (nor does the fixup commit). And the
>>> common code changes don't look "risky" either; the one thing that
>>> jumps out as the most likely of all the unlikely candidates would
>>> seem to be the xen/common/efi/boot.c change, but if there was
>>> a problem there then the EFI boot on Arm would be latently
>>> broken in other ways as well. Plus, of course, you say that the
>>> same change is no problem on 4.12.
>>>
>>> Of course the commit itself could be further "bisected" - all
>>> changes other than the introduction of cmdline_strcmp() are
>>> completely independent of one another.
>>
>> I think this is just a red-herring. The commit is probably modifying enough
>> the layout of Xen that TLB conflict will appear.
>>
>> Anthony said backporting 00c96d7742 "xen/arm: mm: Set-up page permission for
>> Xen mappings earlier on" makes staging-4.11 boots. This patch removes some of
>> the potential cause of TLB conflict.
>>
>> I haven't suggested a backport of this patch so far, because there are still
>> TLB conflict possible within the function modified. It might also be possible
>> that it exposes more of TLB conflict as more work in Xen is needed (see my
>> MM-PARTn series).
>>
>> I don't know whether backporting this patch is worth it compare to the risk it
>> introduces.
>
> I think we should backport 00c96d7742. We don't need to fix all issues,
> we only need to make improvements without introducing more bugs.
> From that standpoints, I think 00c96d7742 is doable. I'll backport it now to
> 4.11.
You don't seem to assess/acknowledge any of the risks I mention in this
thread. Note that I am not suggesting not to backport it; I am trying to
understand how you came to your conclusion here.
> What about the other older stanging branches?
The only one we could consider is 4.10, but AFAICT Jan has already cut
the last release for it.
So I wouldn't consider any backport unless we begin to see the branch
failing.
Cheers,
--
Julien Grall
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
2019-06-04 17:22 ` Julien Grall
@ 2019-06-04 17:39 ` Stefano Stabellini
2019-06-04 17:52 ` Ian Jackson
2019-06-04 20:50 ` Julien Grall
2019-06-05 10:19 ` Jan Beulich
1 sibling, 2 replies; 43+ messages in thread
From: Stefano Stabellini @ 2019-06-04 17:39 UTC (permalink / raw)
To: Julien Grall
Cc: Stefano Stabellini, osstest service owner, Jan Beulich,
xen-devel, Anthony Perard, Ian Jackson
On Tue, 4 Jun 2019, Julien Grall wrote:
> Hi Stefano,
>
> On 6/4/19 6:09 PM, Stefano Stabellini wrote:
> > On Tue, 4 Jun 2019, Julien Grall wrote:
> > > Hi Jan,
> > >
> > > On 6/4/19 8:06 AM, Jan Beulich wrote:
> > > > > > > On 03.06.19 at 19:15, <anthony.perard@citrix.com> wrote:
> > > > > On Tue, May 21, 2019 at 05:52:12PM +0100, Julien Grall wrote:
> > > > > > The same error cannot be reproduced on laxton*. Looking at the test
> > > > > > history,
> > > > > > it looks like qemu-upstream-4.12-testing flight has run successfully
> > > > > > a
> > > > > > few
> > > > > > times on rochester*. So we may have fixed the error in Xen 4.12.
> > > > > >
> > > > > > Potential candidates would be:
> > > > > > - 00c96d7742 "xen/arm: mm: Set-up page permission for Xen
> > > > > > mappings
> > > > > > earlier on"
> > > > > > - f60658c6ae "xen/arm: Stop relocating Xen"
> > > > > >
> > > > > > Ian, is it something the bisector could automatically look at?
> > > > > > If not, I will need to find some time and borrow the board to bisect
> > > > > > the
> > > > > > issues.
> > > > >
> > > > > I attempted to do that bisection myself, and the first commit that git
> > > > > wanted to try, a common commit to both branches, boots just fine.
> > > >
> > > > Thanks for doing this!
> > > >
> > > > One thing that, for now, completely escapes me: How come the
> > > > main 4.11 branch has progressed fine, but the qemuu one has
> > > > got stalled like this?
> > >
> > > Because Xen on Arm today does not fully respect the Arm Arm when modifying
> > > the
> > > page-tables. This may result to TLB conflict and break of coherency.
> >
> > Yes, I follow your reasoning, but it is still quite strange that it only
> > happens with the qemu testing branch. Maybe it is because laxton was
> > picked instead of rochester to run the tests for this branch? Otherwise,
> > there must be a difference in the Xen configuration between the normal
> > branch and the qemu testing branch, aside from QEMU of course, that
> > shouldn't make any differences.
>
> Per the discussion before, the .config is different between the 2 flights.
> QEMU testing is not selecting CONFIG_LIVEPATCH while staging-4.11 is.
Has anybody tried to start selecting CONFIG_LIVEPATCH in the QEMU testing
branch? Is it possible to give it a try?
> > > > > It turns out that the first commit that fails to boot on rochester is
> > > > > e202feb713 xen/cmdline: Fix buggy strncmp(s, LITERAL, ss - s)
> > > > > construct
> > > > > (even with the "eb8acba82a xen: Fix backport of .." applied)
> > > >
> > > > Now that's particularly odd a regression candidate. It doesn't
> > > > touch any Arm code at all (nor does the fixup commit). And the
> > > > common code changes don't look "risky" either; the one thing that
> > > > jumps out as the most likely of all the unlikely candidates would
> > > > seem to be the xen/common/efi/boot.c change, but if there was
> > > > a problem there then the EFI boot on Arm would be latently
> > > > broken in other ways as well. Plus, of course, you say that the
> > > > same change is no problem on 4.12.
> > > >
> > > > Of course the commit itself could be further "bisected" - all
> > > > changes other than the introduction of cmdline_strcmp() are
> > > > completely independent of one another.
> > >
> > > I think this is just a red-herring. The commit is probably modifying
> > > enough
> > > the layout of Xen that TLB conflict will appear.
> > >
> > > Anthony said backporting 00c96d7742 "xen/arm: mm: Set-up page permission
> > > for
> > > Xen mappings earlier on" makes staging-4.11 boots. This patch removes some
> > > of
> > > the potential cause of TLB conflict.
> > >
> > > I haven't suggested a backport of this patch so far, because there are
> > > still
> > > TLB conflict possible within the function modified. It might also be
> > > possible
> > > that it exposes more of TLB conflict as more work in Xen is needed (see my
> > > MM-PARTn series).
> > >
> > > I don't know whether backporting this patch is worth it compare to the
> > > risk it
> > > introduces.
> >
> > I think we should backport 00c96d7742. We don't need to fix all issues,
> > we only need to make improvements without introducing more bugs.
> > From that standpoints, I think 00c96d7742 is doable. I'll backport it now to
> > 4.11.
>
> You don't seem to assess/acknowledge any risk I mention in this thread.
>
> Note that I am not suggesting to not backport it. I am trying to understand
> how you came to your conclusion here.
Based on code inspection, the patch should reduce the risk of Arm Arm
violations, which is consistent with the fact that Anthony found it
"fixing" the regression. Do you foresee cases where the patch increases
the risk of failure?
> > What about the other older stanging branches?
>
> The only one we could consider is 4.10, but AFAICT Jan already did cut the
> last release for it.
>
> So I wouldn't consider any backport unless we begin to see the branch failing.
If Jan already made the last release for 4.10, then there is little point
in backporting it there. However, it is not ideal to have something like
00c96d7742 in some still-maintained staging branches but not all.
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
2019-06-04 17:39 ` Stefano Stabellini
@ 2019-06-04 17:52 ` Ian Jackson
2019-06-04 18:03 ` Stefano Stabellini
2019-06-04 20:50 ` Julien Grall
1 sibling, 1 reply; 43+ messages in thread
From: Ian Jackson @ 2019-06-04 17:52 UTC (permalink / raw)
To: Stefano Stabellini; +Cc: Anthony Perard, xen-devel, Julien Grall, Jan Beulich
Stefano Stabellini writes ("Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL"):
> On Tue, 4 Jun 2019, Julien Grall wrote:
> > Per the discussion before, the .config is different between the 2 flights.
> > QEMU testing is not selecting CONFIG_LIVEPATCH while staging-4.11 is.
>
> Has anybody tried to start selecting CONFIG_LIVEPATCH in the QEMU testing
> branch? Is it possible to give it a try?
I can do this if we think it's desirable. But I think it is probably
actually helpful to test both, just in case non-LIVEPATCH breaks. As
it just has.
AIUI this is thought to be quite a rare problem, so it showing up in a
qemu branch is OK.
Otherwise maybe we would have to add both with- and without-LIVEPATCH
tests to the xen-* flights. We already have both with- and
without-XSM, and this would add another dimension to the build matrix.
And we would have to decide what subset of the tests should be run in
each configuration.
Ian.
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
2019-06-04 17:52 ` Ian Jackson
@ 2019-06-04 18:03 ` Stefano Stabellini
2019-06-04 18:27 ` Ian Jackson
0 siblings, 1 reply; 43+ messages in thread
From: Stefano Stabellini @ 2019-06-04 18:03 UTC (permalink / raw)
To: Ian Jackson
Cc: Anthony Perard, xen-devel, Julien Grall, Stefano Stabellini, Jan Beulich
On Tue, 4 Jun 2019, Ian Jackson wrote:
> Stefano Stabellini writes ("Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL"):
> > On Tue, 4 Jun 2019, Julien Grall wrote:
> > > Per the discussion before, the .config is different between the 2 flights.
> > > QEMU testing is not selecting CONFIG_LIVEPATCH while staging-4.11 is.
> >
> > Has anybody tried to start selecting CONFIG_LIVEPATCH in the QEMU testing
> > branch? Is it possible to give it a try?
>
> I can do this we thinks it's desirable. But I think it is probably
> actually helpful to test both, just in case non-LIVEPATCH breaks. As
> it just have.
>
> AIUI this is thought to be quite a rare problem, so it showing up in a
> qemu branch is OK.
>
> Otherwise maybe we would have to add both with- and without-LIVEPATCH
> tests to the xen-* flights. We already have both with- and
> without-XSM, and this would add another dimension to the build matrix.
> And we would have to decide what subset of the tests should be run in
> each configuration.
Hi Ian,
I agree with you that it would be desirable to test both LIVEPATCH and
non-LIVEPATCH, and I understand the limitations of resources and test
matrix explosion.
Given the chance, I think it would be better if we had an explicit test
for LIVEPATCH rather than a "hidden" enablement of it within another,
different test. Or maybe just call it out explicitly, renaming the test
run to qemu-upstream-livepatch or something like that. In any case, I'll
leave it to you.
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
2019-06-04 18:03 ` Stefano Stabellini
@ 2019-06-04 18:27 ` Ian Jackson
2019-06-04 18:53 ` Stefano Stabellini
0 siblings, 1 reply; 43+ messages in thread
From: Ian Jackson @ 2019-06-04 18:27 UTC (permalink / raw)
To: Stefano Stabellini; +Cc: Anthony Perard, xen-devel, Julien Grall, Jan Beulich
Stefano Stabellini writes ("Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL"):
> I agree with you it would be desirable to test both LIVEPATCH and
> non-LIVEPATCH, and I understand about limitation of resources and test
> matrix explosion.
>
> Given the chance, I think it would be better if we had an explicit test
> about LIVEPATCH rather than a "hidden" enablement of it within another
> different test. Or maybe just call it out explicitly, renaming the test
> run to qemu-upstream-livepatch or something like that. In any case, I'll
> leave it to you.
I think maybe you have misunderstood ?
The thing that triggers this bug, here, is *compiling* Xen with
CONFIG_LIVEPATCH *disabled*.
So, in fact, if it is a hidden anything, it is a hidden *dis*ablement
of a feature which is deliberately only compiled in, and only tested
on, tests of the xen-* branches.
That *disabling* this feature would cause a regression is surprising,
and I think this is only the case because Xen only works by accident
on these boxes ? (Considering the discussion of ARM ARM violations.)
To make it an "explicit" test as you suggest would involve compiling
Xen an additional time. I guess that would actually be changing some
tests on xen-* branches to a version of Xen compiled *without*
livepatch. Right now we build:

  most other branches:
    Xen amd64   with XSM   no livepatch
    Xen armhf   no XSM     no livepatch
    Xen arm64   with XSM   no livepatch

  xen-* branches:
    Xen amd64   with XSM   with livepatch
    Xen armhf   no XSM     with livepatch
    Xen arm64   with XSM   with livepatch
What without-livepatch build should be added to the xen-* branches ?
And in which tests should it replace the existing with-livepatch
builds ? Should I just pick one or two apparently at random ?
NB that I doubt the livepatch maintainers have much of an opinion
here. We would normally expect that compiling in livepatching might
break something but that compiling it out would be fine. So the
current situation is good from that point of view and we might even
worry that changing some of the existing tests to not have
livepatching compiled in might miss some actual livepatch-related
bugs. My normal practice is to try to enable as much as is relevant
and might break things.
But what we have here is *not* a livepatch-related bug. It has
nothing to do with livepatch. It is just that by luck, compiling Xen
*with* livepatching somehow masks the random failure, presumably by
changing exact orderings and timings of memory accesses etc.
Does that make sense ?
Thanks,
Ian.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
2019-06-04 18:27 ` Ian Jackson
@ 2019-06-04 18:53 ` Stefano Stabellini
0 siblings, 0 replies; 43+ messages in thread
From: Stefano Stabellini @ 2019-06-04 18:53 UTC (permalink / raw)
To: Ian Jackson
Cc: lars.kurth, Stefano Stabellini, Julien Grall, Jan Beulich,
Anthony Perard, xen-devel
On Tue, 4 Jun 2019, Ian Jackson wrote:
> Stefano Stabellini writes ("Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL"):
> > I agree with you it would be desirable to test both LIVEPATCH and
> > non-LIVEPATCH, and I understand about limitation of resources and test
> > matrix explosion.
> >
> > Given the chance, I think it would be better if we had an explicit test
> > about LIVEPATCH rather than a "hidden" enablement of it within another
> > different test. Or maybe just call it out explicitly, renaming the test
> > run to qemu-upstream-livepatch or something like that. In any case, I'll
> > leave it to you.
>
> I think maybe you have misunderstood ?
>
> The thing that triggers this bug, here, is *compiling* Xen with
> CONFIG_LIVEPATCH *disabled*.
I followed, but I mistyped and inverted the condition.
> So, in fact, if it is a hidden anything, it is a hidden *dis*ablement
> of a feature which is deliberately only compiled in, and only tested
> on, tests of the xen-* branches.
>
> That *disabling* this feature would cause a regression is surprising,
> and I think this is only the case because Xen only works by accident
> on these boxes ? (Considering the discussion of ARM ARM violations.)
Yes, that is the current thinking.
> To make it an "explicit" test as you suggest would involve compiling
> Xen an additional time. I guess that would actually be changing some
> tests on xen-* branches to a version of Xen compiled *without*
> livepatch. Right now we build
>
> most other branches
> Xen amd64 with XSM no livepatch
> Xen armhf no XSM no livepatch
> Xen arm64 with XSM no livepatch
>
> xen-* branches
> Xen amd64 with XSM with livepatch
> Xen armhf no XSM with livepatch
> Xen arm64 with XSM with livepatch
>
> What without-livepatch build should be added to the xen-* branches ?
> And in which tests should it replace the existing with-livepatch
> builds ? Should I just pick one or two apparently at random ?
>
> NB that I doubt the livepatch maintainers have much of an opinion
> here. We would normally expect that compiling in livepatching might
> break something but that compiling it out would be fine. So the
> current situation is good from that point of view and we might even
> worry that changing some of the existing tests to not have
> livepatching compiled in might miss some actual livepatch-related
> bugs. My normal practice is to try to enable as much as is relevant
> and might break things.
I think it is a good practice in general, especially if we only have the
resources for one type of test.
My point is that differences in the kconfig (except maybe for drivers
such as UARTs) can have a significant impact, either directly or
indirectly, as in this case. The problem will only get worse as more
kconfig options are introduced. We cannot test all possible
combinations. However, I think different kconfigs deserve to be called
out explicitly in the tests. This is what I was trying to say. Maybe we
can pick 2 or 3 "interesting" Xen kconfigs and run tests for them. But
of course this is predicated on hardware and resource availability that
we might not have.
Specifically in your matrix above, maybe:
xen-* branches
Xen amd64 kconfig_1
Xen amd64 kconfig_2
Xen armhf kconfig_1
Xen arm64 kconfig_1
Xen arm64 kconfig_2
where kconfig_1 has as few options as possible enabled (no XSM, no
LIVEPATCH) and kconfig_2 has as many options as possible enabled (both
XSM and LIVEPATCH). Note that I only added kconfig_1 to the armhf line
because it doesn't look like a good idea to run both on arm32. One day
it would be great to add a kconfig_3 with a hand-picked set of options,
and maybe more (kconfig_4, maybe a random kconfig, etc.).
The other branches would ideally follow the same pattern. If we don't
have enough resources, they could run with kconfig_1 or kconfig_2 only.
Funnily enough, we discussed something very similar just this morning in
the FuSa Call, because we'll need a special kconfig to be tested for
safety certifications. It might end up looking very much like
kconfig_1. (CC'ing Lars here to connect the dots.)
> But what we have here is *not* a livepatch-related bug. It has
> nothing to do with livepatch. It is just that by luck, compiling Xen
> *with* livepatching somehow masks the random failure, presumably by
> changing exact orderings and timings of memory accesses etc.
>
> Does that make sense ?
Yes, I got it.
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
2019-06-04 17:39 ` Stefano Stabellini
2019-06-04 17:52 ` Ian Jackson
@ 2019-06-04 20:50 ` Julien Grall
2019-06-04 23:11 ` Stefano Stabellini
1 sibling, 1 reply; 43+ messages in thread
From: Julien Grall @ 2019-06-04 20:50 UTC (permalink / raw)
To: Stefano Stabellini
Cc: Anthony Perard, Ian Jackson, osstest service owner, Jan Beulich,
xen-devel
On 6/4/19 6:39 PM, Stefano Stabellini wrote:
> On Tue, 4 Jun 2019, Julien Grall wrote:
>> Hi Stefano,
>>
>> On 6/4/19 6:09 PM, Stefano Stabellini wrote:
>>> On Tue, 4 Jun 2019, Julien Grall wrote:
>>>> Hi Jan,
>>>>
>>>> On 6/4/19 8:06 AM, Jan Beulich wrote:
>>>>>>>> On 03.06.19 at 19:15, <anthony.perard@citrix.com> wrote:
>>>>>> On Tue, May 21, 2019 at 05:52:12PM +0100, Julien Grall wrote:
>>>>>>> The same error cannot be reproduced on laxton*. Looking at the test
>>>>>>> history,
>>>>>>> it looks like qemu-upstream-4.12-testing flight has run successfully
>>>>>>> a
>>>>>>> few
>>>>>>> times on rochester*. So we may have fixed the error in Xen 4.12.
>>>>>>>
>>>>>>> Potential candidates would be:
>>>>>>> - 00c96d7742 "xen/arm: mm: Set-up page permission for Xen
>>>>>>> mappings
>>>>>>> earlier on"
>>>>>>> - f60658c6ae "xen/arm: Stop relocating Xen"
>>>>>>>
>>>>>>> Ian, is it something the bisector could automatically look at?
>>>>>>> If not, I will need to find some time and borrow the board to bisect
>>>>>>> the
>>>>>>> issues.
>>>>>>
>>>>>> I attempted to do that bisection myself, and the first commit that git
>>>>>> wanted to try, a common commit to both branches, boots just fine.
>>>>>
>>>>> Thanks for doing this!
>>>>>
>>>>> One thing that, for now, completely escapes me: How come the
>>>>> main 4.11 branch has progressed fine, but the qemuu one has
>>>>> got stalled like this?
>>>>
>>>> Because Xen on Arm today does not fully respect the Arm Arm when modifying
>>>> the
>>>> page-tables. This may result to TLB conflict and break of coherency.
>>>
>>> Yes, I follow your reasoning, but it is still quite strange that it only
>>> happens with the qemu testing branch. Maybe it is because laxton was
>>> picked instead of rochester to run the tests for this branch? Otherwise,
>>> there must be a difference in the Xen configuration between the normal
>>> branch and the qemu testing branch, aside from QEMU of course, that
>>> shouldn't make any differences.
>>
>> Per the discussion before, the .config is different between the 2 flights.
>> QEMU testing is not selecting CONFIG_LIVEPATCH while staging-4.11 is.
>
> Has anybody tried to start selecting CONFIG_LIVEPATCH in the QEMU testing
> branch? Is it possible to give it a try?
I don't know, and I am not sure how this would help here; it is pretty
clear that backporting 00c96d7742 "xen/arm: mm: Set-up page permission
for Xen mappings earlier on" is actually going to help booting.
So it is very unlikely that CONFIG_LIVEPATCH is the problem.
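For the record, the config delta between the two flights' hypervisor builds
can be seen by diffing their .config files. A sketch (file names and
contents are made up for illustration; the real configs live in the osstest
build logs for each flight):

```shell
# Two made-up .config excerpts standing in for the two flights' builds.
printf 'CONFIG_XSM=y\nCONFIG_LIVEPATCH=y\n' > staging-4.11.config
printf 'CONFIG_XSM=y\n# CONFIG_LIVEPATCH is not set\n' > qemuu-4.11.config

# Show only the options that differ between the two builds.
diff staging-4.11.config qemuu-4.11.config || true
```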
>
>
>>>>>> It turns out that the first commit that fails to boot on rochester is
>>>>>> e202feb713 xen/cmdline: Fix buggy strncmp(s, LITERAL, ss - s)
>>>>>> construct
>>>>>> (even with the "eb8acba82a xen: Fix backport of .." applied)
>>>>>
>>>>> Now that's particularly odd a regression candidate. It doesn't
>>>>> touch any Arm code at all (nor does the fixup commit). And the
>>>>> common code changes don't look "risky" either; the one thing that
>>>>> jumps out as the most likely of all the unlikely candidates would
>>>>> seem to be the xen/common/efi/boot.c change, but if there was
>>>>> a problem there then the EFI boot on Arm would be latently
>>>>> broken in other ways as well. Plus, of course, you say that the
>>>>> same change is no problem on 4.12.
>>>>>
>>>>> Of course the commit itself could be further "bisected" - all
>>>>> changes other than the introduction of cmdline_strcmp() are
>>>>> completely independent of one another.
>>>>
>>>> I think this is just a red-herring. The commit is probably modifying
>>>> enough
>>>> the layout of Xen that TLB conflict will appear.
>>>>
>>>> Anthony said backporting 00c96d7742 "xen/arm: mm: Set-up page permission
>>>> for
>>>> Xen mappings earlier on" makes staging-4.11 boots. This patch removes some
>>>> of
>>>> the potential cause of TLB conflict.
>>>>
>>>> I haven't suggested a backport of this patch so far, because there are
>>>> still
>>>> TLB conflict possible within the function modified. It might also be
>>>> possible
>>>> that it exposes more of TLB conflict as more work in Xen is needed (see my
>>>> MM-PARTn series).
>>>>
>>>> I don't know whether backporting this patch is worth it compare to the
>>>> risk it
>>>> introduces.
>>>
>>> I think we should backport 00c96d7742. We don't need to fix all issues,
>>> we only need to make improvements without introducing more bugs.
>>> From that standpoints, I think 00c96d7742 is doable. I'll backport it now to
>>> 4.11.
>>
>> You don't seem to assess/acknowledge any risk I mention in this thread.
>>
>> Note that I am not suggesting to not backport it. I am trying to understand
>> how you came to your conclusion here.
>
> Based on the fact that by code inspection the patch should be risk
> decremental in terms of Arm Arm violations, which is consistent with the
> fact that Anthony found it "fixing" the regression. Do you foresee cases
> where the patch increments the risk of failure?
Well yes and no. I guess you haven't read what I wrote on the separate
thread.
Yes, two potential sources of TLB conflict are removed by avoiding
replacing 4KB entries with 2MB block entries (and vice versa) without
respecting Break-Before-Make.
No, this patch introduces another source of TLB conflict if the
processor is caching intermediate translations (this is
implementation-defined).
The bug reported by osstest actually taught me that even if Xen boots
today on a given platform, that may not be the case tomorrow, because of
a slight change in the code ordering (and therefore memory accesses).
/!\ Below is my interpretation and does not imply I am correct ;)
However, such Arm Arm violations are mostly gathered around boot and
shouldn't affect runtime. IOW, Xen would stop booting on those platforms
rather than becoming unreliable. So it would not be too bad.
/!\ End
We just have to be aware of the risk we are taking with backporting the
patch.
>>> What about the other older stanging branches?
>>
>> The only one we could consider is 4.10, but AFAICT Jan already did cut the
>> last release for it.
>>
>> So I wouldn't consider any backport unless we begin to see the branch failing.
>
> If Jan already made the last release for 4.10, then little point in
> backporting it to it. However, it is not ideal to have something like
> 00c96d7742 in some still-maintained staging branches but not all.
Cheers,
--
Julien Grall
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
2019-06-04 20:50 ` Julien Grall
@ 2019-06-04 23:11 ` Stefano Stabellini
2019-06-05 10:59 ` Julien Grall
0 siblings, 1 reply; 43+ messages in thread
From: Stefano Stabellini @ 2019-06-04 23:11 UTC (permalink / raw)
To: Julien Grall
Cc: Stefano Stabellini, osstest service owner, Jan Beulich,
xen-devel, Anthony Perard, Ian Jackson
On Tue, 4 Jun 2019, Julien Grall wrote:
> On 6/4/19 6:39 PM, Stefano Stabellini wrote:
> > On Tue, 4 Jun 2019, Julien Grall wrote:
> > > Hi Stefano,
> > >
> > > On 6/4/19 6:09 PM, Stefano Stabellini wrote:
> > > > On Tue, 4 Jun 2019, Julien Grall wrote:
> > > > > Hi Jan,
> > > > >
> > > > > On 6/4/19 8:06 AM, Jan Beulich wrote:
> > > > > > > > > On 03.06.19 at 19:15, <anthony.perard@citrix.com> wrote:
> > > > > > > On Tue, May 21, 2019 at 05:52:12PM +0100, Julien Grall wrote:
> > > > > > > > The same error cannot be reproduced on laxton*. Looking at the
> > > > > > > > test
> > > > > > > > history,
> > > > > > > > it looks like qemu-upstream-4.12-testing flight has run
> > > > > > > > successfully
> > > > > > > > a
> > > > > > > > few
> > > > > > > > times on rochester*. So we may have fixed the error in Xen 4.12.
> > > > > > > >
> > > > > > > > Potential candidates would be:
> > > > > > > > - 00c96d7742 "xen/arm: mm: Set-up page permission for Xen
> > > > > > > > mappings
> > > > > > > > earlier on"
> > > > > > > > - f60658c6ae "xen/arm: Stop relocating Xen"
> > > > > > > >
> > > > > > > > Ian, is it something the bisector could automatically look at?
> > > > > > > > If not, I will need to find some time and borrow the board to
> > > > > > > > bisect
> > > > > > > > the
> > > > > > > > issues.
> > > > > > >
> > > > > > > I attempted to do that bisection myself, and the first commit that
> > > > > > > git
> > > > > > > wanted to try, a common commit to both branches, boots just fine.
> > > > > >
> > > > > > Thanks for doing this!
> > > > > >
> > > > > > One thing that, for now, completely escapes me: How come the
> > > > > > main 4.11 branch has progressed fine, but the qemuu one has
> > > > > > got stalled like this?
> > > > >
> > > > > Because Xen on Arm today does not fully respect the Arm Arm when
> > > > > modifying
> > > > > the
> > > > > page-tables. This may result to TLB conflict and break of coherency.
> > > >
> > > > Yes, I follow your reasoning, but it is still quite strange that it only
> > > > happens with the qemu testing branch. Maybe it is because laxton was
> > > > picked instead of rochester to run the tests for this branch? Otherwise,
> > > > there must be a difference in the Xen configuration between the normal
> > > > branch and the qemu testing branch, aside from QEMU of course, that
> > > > shouldn't make any differences.
> > >
> > > Per the discussion before, the .config is different between the 2 flights.
> > > QEMU testing is not selecting CONFIG_LIVEPATCH while staging-4.11 is.
> >
> > Has anybody tried to start selecting CONFIG_LIVEPATCH in the QEMU testing
> > branch? Is it possible to give it a try?
>
> I don't know and I am not sure how this would help here it is pretty clear
> that backporting 00c96d7742 "xen/arm: mm: Set-up page permission for Xen
> mappings earlier on" is actually going to help booting.
>
> So it is very unlikely that CONFIG_LIVEPATCH is the problem.
I am not blaming CONFIG_LIVEPATCH at all. If we decide that we don't
want to backport 00c96d7742 for one reason or the other, and basically
we cannot fix this bug, enabling CONFIG_LIVEPATCH would probably unblock
the CI-loop (it would be nice to be sure about it). Let's keep in mind
that we always had this bug -- the next 4.11 release is not going to be
any more broken than the previous 4.11 release if we don't fix this
issue, unless you think we backported something that affected the
underlying problem, making it worse.
Note that I am not advocating for leaving this bug unfixed. I am only
suggesting that if we decide it is too risky to backport 00c96d7742 and
we don't know what else to do, it would be good to have a way to unblock
4.11 without having to force-push it. Let's settle the discussion below
first.
> > > > > > > It turns out that the first commit that fails to boot on rochester
> > > > > > > is
> > > > > > > e202feb713 xen/cmdline: Fix buggy strncmp(s, LITERAL, ss - s)
> > > > > > > construct
> > > > > > > (even with the "eb8acba82a xen: Fix backport of .." applied)
> > > > > >
> > > > > > Now that's particularly odd a regression candidate. It doesn't
> > > > > > touch any Arm code at all (nor does the fixup commit). And the
> > > > > > common code changes don't look "risky" either; the one thing that
> > > > > > jumps out as the most likely of all the unlikely candidates would
> > > > > > seem to be the xen/common/efi/boot.c change, but if there was
> > > > > > a problem there then the EFI boot on Arm would be latently
> > > > > > broken in other ways as well. Plus, of course, you say that the
> > > > > > same change is no problem on 4.12.
> > > > > >
> > > > > > Of course the commit itself could be further "bisected" - all
> > > > > > changes other than the introduction of cmdline_strcmp() are
> > > > > > completely independent of one another.
> > > > >
> > > > > I think this is just a red-herring. The commit is probably modifying
> > > > > enough
> > > > > the layout of Xen that TLB conflict will appear.
> > > > >
> > > > > Anthony said backporting 00c96d7742 "xen/arm: mm: Set-up page
> > > > > permission
> > > > > for
> > > > > Xen mappings earlier on" makes staging-4.11 boots. This patch removes
> > > > > some
> > > > > of
> > > > > the potential cause of TLB conflict.
> > > > >
> > > > > I haven't suggested a backport of this patch so far, because there are
> > > > > still
> > > > > TLB conflict possible within the function modified. It might also be
> > > > > possible
> > > > > that it exposes more of TLB conflict as more work in Xen is needed
> > > > > (see my
> > > > > MM-PARTn series).
> > > > >
> > > > > I don't know whether backporting this patch is worth it compare to the
> > > > > risk it
> > > > > introduces.
> > > >
> > > > I think we should backport 00c96d7742. We don't need to fix all issues,
> > > > we only need to make improvements without introducing more bugs.
> > > > From that standpoints, I think 00c96d7742 is doable. I'll backport it
> > > > now to
> > > > 4.11.
> > >
> > > You don't seem to assess/acknowledge any risk I mention in this thread.
> > >
> > > Note that I am not suggesting to not backport it. I am trying to
> > > understand
> > > how you came to your conclusion here.
> >
> > Based on the fact that by code inspection the patch should be risk
> > decremental in terms of Arm Arm violations, which is consistent with the
> > fact that Anthony found it "fixing" the regression. Do you foresee cases
> > where the patch increments the risk of failure?
>
> Well yes and no. I guess you haven't read what I wrote on the separate thread.
I missed it
> Yes, two potential source of TLB conflict is removed by avoiding replacing 4KB
> entries with 2MB block entry (and vice versa) without respecting the
> Break-Before-Make.
This is clear
> No, this patch introducing another source of TLB conflict if the processor is
> caching intermediate translation (this is implementation defined).
By "another source of TLB conflict" are you referring to something new
that wasn't there before? Or are you referring to the fact that we are
still not following the proper sequence to update the Xen pagetables? If
you are referring to the latter, wouldn't it be reasonable to say that
such a problem could also have happened before 00c96d7742?
> The bug reported by osstest actually taught me that even if Xen may boot today
> on a given platform, this may not be the case tomorrow because of the slight
> change in the code ordering (and therefore memory access).
>
> /!\ Below is my interpretation and does not imply I am correct ;)
>
> However, such Arm Arm violations are mostly gathered around boot and shouldn't
> affect runtime. IOW, Xen would stop booting on those platforms rather than
> making unrealiable. So it would not be too bad.
>
> /!\ End
>
> We just have to be aware of the risk we are taking with backporting the patch.
What you wrote here seems to make sense but I would like to understand
the problem mentioned earlier a bit better
> > > > What about the other older stanging branches?
> > >
> > > The only one we could consider is 4.10, but AFAICT Jan already did cut the
> > > last release for it.
> > >
> > > So I wouldn't consider any backport unless we begin to see the branch
> > > failing.
> >
> > If Jan already made the last release for 4.10, then little point in
> > backporting it to it. However, it is not ideal to have something like
> > 00c96d7742 in some still-maintained staging branches but not all.
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
2019-06-04 17:22 ` Julien Grall
2019-06-04 17:39 ` Stefano Stabellini
@ 2019-06-05 10:19 ` Jan Beulich
1 sibling, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2019-06-05 10:19 UTC (permalink / raw)
To: Julien Grall
Cc: Anthony Perard, Ian Jackson, Stefano Stabellini,
osstest service owner, xen-devel
>>> On 04.06.19 at 19:22, <julien.grall@arm.com> wrote:
> The only one we could consider is 4.10, but AFAICT Jan already did cut
> the last release for it.
I've sent a call for backport requests. The tree isn't closed yet, but
soon will be.
Jan
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
2019-06-04 23:11 ` Stefano Stabellini
@ 2019-06-05 10:59 ` Julien Grall
2019-06-05 20:29 ` Stefano Stabellini
0 siblings, 1 reply; 43+ messages in thread
From: Julien Grall @ 2019-06-05 10:59 UTC (permalink / raw)
To: Stefano Stabellini
Cc: Anthony Perard, Ian Jackson, osstest service owner, Jan Beulich,
xen-devel
Hi Stefano,
On 05/06/2019 00:11, Stefano Stabellini wrote:
> On Tue, 4 Jun 2019, Julien Grall wrote:
>> On 6/4/19 6:39 PM, Stefano Stabellini wrote:
>>> On Tue, 4 Jun 2019, Julien Grall wrote:
>>>> Hi Stefano,
>>>>
>>>> On 6/4/19 6:09 PM, Stefano Stabellini wrote:
>>>>> On Tue, 4 Jun 2019, Julien Grall wrote:
>>>>>> Hi Jan,
>>>>>>
>>>>>> On 6/4/19 8:06 AM, Jan Beulich wrote:
>>>>>>>>>> On 03.06.19 at 19:15, <anthony.perard@citrix.com> wrote:
>>>>>>>> On Tue, May 21, 2019 at 05:52:12PM +0100, Julien Grall wrote:
>>>>>>>>> The same error cannot be reproduced on laxton*. Looking at the
>>>>>>>>> test
>>>>>>>>> history,
>>>>>>>>> it looks like qemu-upstream-4.12-testing flight has run
>>>>>>>>> successfully
>>>>>>>>> a
>>>>>>>>> few
>>>>>>>>> times on rochester*. So we may have fixed the error in Xen 4.12.
>>>>>>>>>
>>>>>>>>> Potential candidates would be:
>>>>>>>>> - 00c96d7742 "xen/arm: mm: Set-up page permission for Xen
>>>>>>>>> mappings
>>>>>>>>> earlier on"
>>>>>>>>> - f60658c6ae "xen/arm: Stop relocating Xen"
>>>>>>>>>
>>>>>>>>> Ian, is it something the bisector could automatically look at?
>>>>>>>>> If not, I will need to find some time and borrow the board to
>>>>>>>>> bisect
>>>>>>>>> the
>>>>>>>>> issues.
>>>>>>>>
>>>>>>>> I attempted to do that bisection myself, and the first commit that
>>>>>>>> git
>>>>>>>> wanted to try, a common commit to both branches, boots just fine.
>>>>>>>
>>>>>>> Thanks for doing this!
>>>>>>>
>>>>>>> One thing that, for now, completely escapes me: How come the
>>>>>>> main 4.11 branch has progressed fine, but the qemuu one has
>>>>>>> got stalled like this?
>>>>>>
>>>>>> Because Xen on Arm today does not fully respect the Arm Arm when
>>>>>> modifying
>>>>>> the
>>>>>> page-tables. This may result to TLB conflict and break of coherency.
>>>>>
>>>>> Yes, I follow your reasoning, but it is still quite strange that it only
>>>>> happens with the qemu testing branch. Maybe it is because laxton was
>>>>> picked instead of rochester to run the tests for this branch? Otherwise,
>>>>> there must be a difference in the Xen configuration between the normal
>>>>> branch and the qemu testing branch, aside from QEMU of course, that
>>>>> shouldn't make any differences.
>>>>
>>>> Per the discussion before, the .config is different between the 2 flights.
>>>> QEMU testing is not selecting CONFIG_LIVEPATCH while staging-4.11 is.
>>>
>>> Has anybody tried to start selecting CONFIG_LIVEPATCH in the QEMU testing
>>> branch? Is it possible to give it a try?
>>
>> I don't know and I am not sure how this would help here it is pretty clear
>> that backporting 00c96d7742 "xen/arm: mm: Set-up page permission for Xen
>> mappings earlier on" is actually going to help booting.
>>
>> So it is very unlikely that CONFIG_LIVEPATCH is the problem.
>
> I am not blaming CONFIG_LIVEPATCH at all. If we decide that we don't
> want to backport 00c96d7742 for one reason or the other, and basically
> we cannot fix this bug, enabling CONFIG_LIVEPATCH would probably unblock
> the CI-loop (it would be nice to be sure about it). Let's keep in mind
> that we always had this bug -- the next 4.11 release is not going to be
> any more broken than the previous 4.11 release if we don't fix this
> issue, unless you think we backported something that affected the
> underlying problem, making it worse.
>
> Note that I am not advocating for leaving this bug unfixed. I am only
> suggesting that if we decide it is too risky to backport 00c96d7742 and
> we don't know what else to do, it would be good to have a way to unblock
> 4.11 without having to force-push it. Let's settle the discussion below
> first.
One way to unblock it is to stop testing 4.11 (or just this flight) on Thunder-X.
[...]
>> No, this patch introducing another source of TLB conflict if the processor is
>> caching intermediate translation (this is implementation defined).
>
> By "another source of TLB conflict" are you referring to something new
> that wasn't there before? Or are you referring to the fact that still we
> are not following the proper sequence to update the Xen pagetable? If
> you are referring to the latter, wouldn't it be reasonable to say that
> such a problem could have happened also before 00c96d7742?
It exists, but in a different form. I can't tell whether this is bad or
not, because the re-ordering of the code (and therefore memory accesses)
will affect how TLBs are used. So it is a bit of a gamble here.
>> The bug reported by osstest actually taught me that even if Xen may boot today
>> on a given platform, this may not be the case tomorrow because of the slight
>> change in the code ordering (and therefore memory access).
>>
>> /!\ Below is my interpretation and does not imply I am correct ;)
>>
>> However, such Arm Arm violations are mostly gathered around boot and shouldn't
>> affect runtime. IOW, Xen would stop booting on those platforms rather than
>> making unrealiable. So it would not be too bad.
>>
>> /!\ End
>>
>> We just have to be aware of the risk we are taking with backporting the patch.
>
> What you wrote here seems to make sense but I would like to understand
> the problem mentioned earlier a bit better
>
>
>>>>> What about the other older stanging branches?
>>>>
>>>> The only one we could consider is 4.10, but AFAICT Jan already did cut the
>>>> last release for it.
>>>>
>>>> So I wouldn't consider any backport unless we begin to see the branch
>>>> failing.
>>>
>>> If Jan already made the last release for 4.10, then little point in
>>> backporting it to it. However, it is not ideal to have something like
>>> 00c96d7742 in some still-maintained staging branches but not all.
Jan pointed out it is not yet released. However, we haven't had any
report of problems (aside from the Arm Arm violation) with Xen 4.10 so
far. So I would rather avoid such a backport in a final point release,
as we risk making it more broken than it is today.
I find this acceptable for Xen 4.11 because the patch has been proven to
help. We also still have point releases afterwards if this goes wrong.
Cheers,
--
Julien Grall
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
2019-06-05 10:59 ` Julien Grall
@ 2019-06-05 20:29 ` Stefano Stabellini
2019-06-05 21:38 ` Julien Grall
0 siblings, 1 reply; 43+ messages in thread
From: Stefano Stabellini @ 2019-06-05 20:29 UTC (permalink / raw)
To: Julien Grall
Cc: Stefano Stabellini, osstest service owner, Jan Beulich,
xen-devel, Anthony Perard, Ian Jackson
On Wed, 5 Jun 2019, Julien Grall wrote:
> Hi Stefano,
>
> On 05/06/2019 00:11, Stefano Stabellini wrote:
> > On Tue, 4 Jun 2019, Julien Grall wrote:
> > > On 6/4/19 6:39 PM, Stefano Stabellini wrote:
> > > > On Tue, 4 Jun 2019, Julien Grall wrote:
> > > > > Hi Stefano,
> > > > >
> > > > > On 6/4/19 6:09 PM, Stefano Stabellini wrote:
> > > > > > On Tue, 4 Jun 2019, Julien Grall wrote:
> > > > > > > Hi Jan,
> > > > > > >
> > > > > > > On 6/4/19 8:06 AM, Jan Beulich wrote:
> > > > > > > > > > > On 03.06.19 at 19:15, <anthony.perard@citrix.com> wrote:
> > > > > > > > > On Tue, May 21, 2019 at 05:52:12PM +0100, Julien Grall wrote:
> > > > > > > > > > The same error cannot be reproduced on laxton*. Looking at
> > > > > > > > > > the
> > > > > > > > > > test
> > > > > > > > > > history,
> > > > > > > > > > it looks like qemu-upstream-4.12-testing flight has run
> > > > > > > > > > successfully
> > > > > > > > > > a
> > > > > > > > > > few
> > > > > > > > > > times on rochester*. So we may have fixed the error in Xen
> > > > > > > > > > 4.12.
> > > > > > > > > >
> > > > > > > > > > Potential candidates would be:
> > > > > > > > > > - 00c96d7742 "xen/arm: mm: Set-up page permission for
> > > > > > > > > > Xen
> > > > > > > > > > mappings
> > > > > > > > > > earlier on"
> > > > > > > > > > - f60658c6ae "xen/arm: Stop relocating Xen"
> > > > > > > > > >
> > > > > > > > > > Ian, is it something the bisector could automatically look
> > > > > > > > > > at?
> > > > > > > > > > If not, I will need to find some time and borrow the board
> > > > > > > > > > to
> > > > > > > > > > bisect
> > > > > > > > > > the
> > > > > > > > > > issues.
> > > > > > > > >
> > > > > > > > > I attempted to do that bisection myself, and the first commit
> > > > > > > > > that
> > > > > > > > > git
> > > > > > > > > wanted to try, a common commit to both branches, boots just
> > > > > > > > > fine.
> > > > > > > >
> > > > > > > > Thanks for doing this!
> > > > > > > >
> > > > > > > > One thing that, for now, completely escapes me: How come the
> > > > > > > > main 4.11 branch has progressed fine, but the qemuu one has
> > > > > > > > got stalled like this?
> > > > > > >
> > > > > > > Because Xen on Arm today does not fully respect the Arm Arm when
> > > > > > > modifying the page-tables. This may result in TLB conflicts and a
> > > > > > > break of coherency.
> > > > > >
> > > > > > Yes, I follow your reasoning, but it is still quite strange that it
> > > > > > only happens with the qemu testing branch. Maybe it is because laxton
> > > > > > was picked instead of rochester to run the tests for this branch?
> > > > > > Otherwise, there must be a difference in the Xen configuration
> > > > > > between the normal branch and the qemu testing branch, aside from
> > > > > > QEMU of course, which shouldn't make any difference.
> > > > >
> > > > > Per the discussion before, the .config is different between the 2
> > > > > flights.
> > > > > QEMU testing is not selecting CONFIG_LIVEPATCH while staging-4.11 is.
> > > >
> > > > Has anybody tried to start selecting CONFIG_LIVEPATCH in the QEMU
> > > > testing
> > > > branch? Is it possible to give it a try?
> > >
> > > I don't know, and I am not sure how this would help here. It is pretty
> > > clear that backporting 00c96d7742 "xen/arm: mm: Set-up page permission
> > > for Xen mappings earlier on" is actually going to help booting.
> > >
> > > So it is very unlikely that CONFIG_LIVEPATCH is the problem.
> >
> > I am not blaming CONFIG_LIVEPATCH at all. If we decide that we don't
> > want to backport 00c96d7742 for one reason or the other, and basically
> > we cannot fix this bug, enabling CONFIG_LIVEPATCH would probably unblock
> > the CI-loop (it would be nice to be sure about it). Let's keep in mind
> > that we always had this bug -- the next 4.11 release is not going to be
> > any more broken than the previous 4.11 release if we don't fix this
> > issue, unless you think we backported something that affected the
> > underlying problem, making it worse.
> >
> > Note that I am not advocating for leaving this bug unfixed. I am only
> > suggesting that if we decide it is too risky to backport 00c96d7742 and
> > we don't know what else to do, it would be good to have a way to unblock
> > 4.11 without having to force-push it. Let's settle the discussion below
> > first.
>
> One way to unblock is not testing 4.11 (or just this flight) on Thunder-X.
Yeah, let's keep these options in mind.
> > > No, this patch introduces another source of TLB conflict if the
> > > processor is caching intermediate translations (this is implementation
> > > defined).
> >
> > By "another source of TLB conflict" are you referring to something new
> > that wasn't there before? Or are you referring to the fact that we are
> > still not following the proper sequence to update the Xen page tables?
> > If you are referring to the latter, wouldn't it be reasonable to say
> > that such a problem could also have happened before 00c96d7742?
>
> It exists, but in a different form. I can't tell whether this is bad or
> not, because the re-ordering of the code (and therefore memory accesses)
> will affect how the TLBs are used. So it is a bit of a gamble here.
If I read this right, this is the same underlying issue but due to the
re-ordering of the code, it could manifest differently. For instance the
impact on cache lines could be different.
Is this the case? If so, I think this is a tolerable risk, as other
things could affect it too, such as CONFIG options being
enabled/disabled, as we have just seen with CONFIG_LIVEPATCH. It is
almost "random".
I did take this into account when I wrote earlier that I think it should
be backported. But if you see a different class of problems potentially
being introduced by 00c96d7742 then I think the discussion would change
because it can be considered a regression.
> > > The bug reported by osstest actually taught me that even if Xen may
> > > boot today on a given platform, this may not be the case tomorrow
> > > because of a slight change in code ordering (and therefore memory
> > > accesses).
> > >
> > > /!\ Below is my interpretation and does not imply I am correct ;)
> > >
> > > However, such Arm Arm violations are mostly gathered around boot and
> > > shouldn't affect runtime. IOW, Xen would stop booting on those
> > > platforms rather than becoming unreliable at runtime. So it would not
> > > be too bad.
> > >
> > > /!\ End
> > >
> > > We just have to be aware of the risk we are taking with backporting the
> > > patch.
> >
> > What you wrote here seems to make sense but I would like to understand
> > the problem mentioned earlier a bit better
> >
> >
> > > > > > What about the other older staging branches?
> > > > >
> > > > > The only one we could consider is 4.10, but AFAICT Jan has already
> > > > > cut the last release for it.
> > > > >
> > > > > So I wouldn't consider any backport unless we begin to see the branch
> > > > > failing.
> > > >
> > > > If Jan has already made the last release for 4.10, then there is
> > > > little point in backporting to it. However, it is not ideal to have
> > > > something like 00c96d7742 in some still-maintained staging branches
> > > > but not all.
>
> Jan pointed out it is not yet released. However, we haven't had any
> report of problems (aside from the Arm Arm violation) with Xen 4.10
> today. So I would rather avoid such a backport in a final point release,
> as we risk making it more broken than it is today.
>
> I find this acceptable for Xen 4.11 because it has been proven to help. We
> also still have point releases afterwards if this goes wrong.
If we do the backport, I would prefer to backport it to both trees, for
consistency, and because there might be machines out there where 4.10
doesn't boot with the wrong kconfig. This patch should decrease the risk
of breakage.
However, I see your point too. This is a judgement call -- we do not have
enough data, but we have to make a decision anyway. There is no way to
tell "scientifically" which way is best.
My vote is to backport to both. Jan/others please express your opinion.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
2019-06-05 20:29 ` Stefano Stabellini
@ 2019-06-05 21:38 ` Julien Grall
2019-06-06 8:42 ` Jan Beulich
0 siblings, 1 reply; 43+ messages in thread
From: Julien Grall @ 2019-06-05 21:38 UTC (permalink / raw)
To: Stefano Stabellini
Cc: Anthony Perard, Ian Jackson, osstest service owner, Jan Beulich,
xen-devel
Hi Stefano,
On 6/5/19 9:29 PM, Stefano Stabellini wrote:
> On Wed, 5 Jun 2019, Julien Grall wrote:
>> Hi Stefano,
>>
>> On 05/06/2019 00:11, Stefano Stabellini wrote:
>>> On Tue, 4 Jun 2019, Julien Grall wrote:
>>>> On 6/4/19 6:39 PM, Stefano Stabellini wrote:
>>>>> On Tue, 4 Jun 2019, Julien Grall wrote:
>>>> No, this patch introduces another source of TLB conflict if the
>>>> processor is caching intermediate translations (this is implementation
>>>> defined).
>>>
>>> By "another source of TLB conflict" are you referring to something new
>>> that wasn't there before? Or are you referring to the fact that we are
>>> still not following the proper sequence to update the Xen page tables?
>>> If you are referring to the latter, wouldn't it be reasonable to say
>>> that such a problem could also have happened before 00c96d7742?
>>
>> It exists, but in a different form. I can't tell whether this is bad or
>> not, because the re-ordering of the code (and therefore memory accesses)
>> will affect how the TLBs are used. So it is a bit of a gamble here.
>
> If I read this right, this is the same underlying issue but due to the
> re-ordering of the code, it could manifest differently. For instance the
> impact on cache lines could be different.
I am sorry, but how did you come up with the cache line difference here?
It has nothing to do with cache lines; it has to do with how the TLBs are
filled at a given point. If you re-order memory accesses, then you may
well have a different state of the TLBs at a given point.
>
> Is this the case? If so, I think this is a tolerable risk, as other
> things could affect it too, such as CONFIG options being
> enabled/disabled, as we have just seen with CONFIG_LIVEPATCH. It is
> almost "random".
See above. But yes it is almost random.
>>>> The bug reported by osstest actually taught me that even if Xen may
>>>> boot today on a given platform, this may not be the case tomorrow
>>>> because of a slight change in code ordering (and therefore memory
>>>> accesses).
>>>>
>>>> /!\ Below is my interpretation and does not imply I am correct ;)
>>>>
>>>> However, such Arm Arm violations are mostly gathered around boot and
>>>> shouldn't affect runtime. IOW, Xen would stop booting on those
>>>> platforms rather than becoming unreliable at runtime. So it would not
>>>> be too bad.
>>>>
>>>> /!\ End
>>>>
>>>> We just have to be aware of the risk we are taking with backporting the
>>>> patch.
>>>
>>> What you wrote here seems to make sense but I would like to understand
>>> the problem mentioned earlier a bit better
>>>
>>>
>>>>>>> What about the other older staging branches?
>>>>>>
>>>>>> The only one we could consider is 4.10, but AFAICT Jan has already
>>>>>> cut the last release for it.
>>>>>>
>>>>>> So I wouldn't consider any backport unless we begin to see the branch
>>>>>> failing.
>>>>>
>>>>> If Jan has already made the last release for 4.10, then there is
>>>>> little point in backporting to it. However, it is not ideal to have
>>>>> something like 00c96d7742 in some still-maintained staging branches
>>>>> but not all.
>>
>> Jan pointed out it is not yet released. However, we haven't had any
>> report of problems (aside from the Arm Arm violation) with Xen 4.10
>> today. So I would rather avoid such a backport in a final point release,
>> as we risk making it more broken than it is today.
>>
>> I find this acceptable for Xen 4.11 because it has been proven to help. We
>> also still have point releases afterwards if this goes wrong.
>
> If we do the backport, I would prefer to backport it to both trees, for
> consistency, and because there might be machines out there where 4.10
> doesn't boot with the wrong kconfig. This patch should decrease the risk
> of breakage.
The counterpoint here is that Xen 4.10 is going to be out of support in a
few weeks. If you are about to use Xen 4.10 for a new product, then you
have already made the wrong choice. Why would you use an out-of-support
release?
If you already use Xen 4.10, then this release probably runs fine on your
platform. Why take the risk of breaking it?
Note that osstest does not test Xen 4.10 (or earlier) on Thunder-X, so
this does not need to be factored into the decision.
>
> However, I see your point too. This is a judgement call -- we do not
> have enough data, but we have to make a decision anyway. There is no way
> to tell "scientifically" which way is best.
I also understand your point; however, it is a bit worrying that "not
enough data" means we are happy to backport a patch in a final point
release. I would have expected more caution during backports.
>
> My vote is to backport to both. Jan/others please express your opinion.
To follow the vote convention:
4.11: -1
4.10: -1 (I was tempted by a -2, but if the others feel it should be
backported then I will not push back).
Cheers,
--
Julien Grall
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
2019-06-05 21:38 ` Julien Grall
@ 2019-06-06 8:42 ` Jan Beulich
2019-06-06 8:47 ` Julien Grall
0 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2019-06-06 8:42 UTC (permalink / raw)
To: Julien Grall, Stefano Stabellini
Cc: Anthony Perard, Ian Jackson, osstest service owner, xen-devel
>>> On 05.06.19 at 23:38, <julien.grall@arm.com> wrote:
> On 6/5/19 9:29 PM, Stefano Stabellini wrote:
>> My vote is to backport to both. Jan/others please express your opinion.
>
> To follow the vote convention:
>
> 4.11: -1
Hmm, I'm surprised by this. Didn't I see you mention to Ian (on irc)
you'd prefer backporting over working around this in osstest?
> 4.10: -1 (I was tempted by a -2, but if the others feel it should be
> backported then I will not push back).
Considering the situation, I'm leaning towards Julien's opinion here.
But take this with care - I have way too little insight to have a
meaningful opinion.
Jan
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
2019-06-06 8:42 ` Jan Beulich
@ 2019-06-06 8:47 ` Julien Grall
2019-06-06 22:21 ` Stefano Stabellini
0 siblings, 1 reply; 43+ messages in thread
From: Julien Grall @ 2019-06-06 8:47 UTC (permalink / raw)
To: Jan Beulich, Stefano Stabellini
Cc: Anthony Perard, Ian Jackson, osstest service owner, xen-devel
On 06/06/2019 09:42, Jan Beulich wrote:
>>>> On 05.06.19 at 23:38, <julien.grall@arm.com> wrote:
>> On 6/5/19 9:29 PM, Stefano Stabellini wrote:
>>> My vote is to backport to both. Jan/others please express your opinion.
>>
>> To follow the vote convention:
>>
>> 4.11: -1
>
> Hmm, I'm surprised by this. Didn't I see you mention to Ian (on irc)
> you'd prefer backporting over working around this in osstest?
My mistake; it should be +1 here.
>
>> 4.10: -1 (I was tempted by a -2, but if the others feel it should be
>> backported then I will not push back).
>
> Considering the situation, I'm leaning towards Julien's opinion here.
> But take this with care - I have way too little insight to have a
> meaningful opinion.
>
> Jan
>
>
--
Julien Grall
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
2019-06-06 8:47 ` Julien Grall
@ 2019-06-06 22:21 ` Stefano Stabellini
2019-06-07 9:33 ` Julien Grall
0 siblings, 1 reply; 43+ messages in thread
From: Stefano Stabellini @ 2019-06-06 22:21 UTC (permalink / raw)
To: Julien Grall
Cc: Stefano Stabellini, osstest service owner, Jan Beulich,
xen-devel, Anthony Perard, Ian Jackson
On Thu, 6 Jun 2019, Julien Grall wrote:
> On 06/06/2019 09:42, Jan Beulich wrote:
> > > > > On 05.06.19 at 23:38, <julien.grall@arm.com> wrote:
> > > On 6/5/19 9:29 PM, Stefano Stabellini wrote:
> > > > My vote is to backport to both. Jan/others please express your opinion.
> > >
> > > To follow the vote convention:
> > >
> > > 4.11: -1
> >
> > Hmm, I'm surprised by this. Didn't I see you mention to Ian (on irc)
> > you'd prefer backporting over working around this in osstest?
>
> My mistake; it should be +1 here.
>
> > > 4.10: -1 (I was tempted by a -2, but if the others feel it should be
> > > backported then I will not push back).
> >
> > Considering the situation, I'm leaning towards Julien's opinion here.
> > But take this with care - I have way too little insight to have a
> > meaningful opinion.
All right. I backported the patch to staging-4.11 only.
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
2019-06-06 22:21 ` Stefano Stabellini
@ 2019-06-07 9:33 ` Julien Grall
0 siblings, 0 replies; 43+ messages in thread
From: Julien Grall @ 2019-06-07 9:33 UTC (permalink / raw)
To: Stefano Stabellini
Cc: Anthony Perard, Ian Jackson, osstest service owner, Jan Beulich,
xen-devel
Hi Stefano,
On 06/06/2019 23:21, Stefano Stabellini wrote:
> On Thu, 6 Jun 2019, Julien Grall wrote:
>> On 06/06/2019 09:42, Jan Beulich wrote:
>>>>>> On 05.06.19 at 23:38, <julien.grall@arm.com> wrote:
>>>> On 6/5/19 9:29 PM, Stefano Stabellini wrote:
>>>>> My vote is to backport to both. Jan/others please express your opinion.
>>>>
>>>> To follow the vote convention:
>>>>
>>>> 4.11: -1
>>>
>>> Hmm, I'm surprised by this. Didn't I see you mention to Ian (on irc)
>>> you'd prefer backporting over working around this in osstest?
>>
>> My mistake here. It should be +1 here.
>>
>>>> 4.10: -1 (I was tempted by a -2, but if the others feel it should be
>>>> backported then I will not push back).
>>>
>>> Considering the situation, I'm leaning towards Julien's opinion here.
>>> But take this with care - I have way too little insight to have a
>>> meaningful opinion.
>
> All right. I backported the patch to staging-4.11 only.
Thank you! I will watch the next osstest flight for qemu-upstream-4.11 and see
if it boots.
Cheers,
--
Julien Grall