* [qemu-upstream-4.11-testing test] 136184: regressions - FAIL

From: osstest service owner @ 2019-05-15 19:48 UTC (permalink / raw)
To: xen-devel, osstest-admin

flight 136184 qemu-upstream-4.11-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/136184/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-arm64-pvops            <job status>                 broken in 134594
 build-arm64                  <job status>                 broken in 134594
 build-arm64-xsm              <job status>                 broken in 134594
 build-arm64-xsm           4 host-install(4)  broken in 134594 REGR. vs. 125575
 build-arm64-pvops         4 host-install(4)  broken in 134594 REGR. vs. 125575
 build-arm64               4 host-install(4)  broken in 134594 REGR. vs. 125575
 test-arm64-arm64-libvirt-xsm  7 xen-boot     fail REGR. vs. 125575
 test-arm64-arm64-xl           7 xen-boot     fail REGR. vs. 125575
 test-arm64-arm64-xl-xsm       7 xen-boot     fail REGR. vs. 125575
 test-arm64-arm64-xl-credit2   7 xen-boot     fail REGR. vs. 125575

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qcow2  17 guest-localmigrate/x10  fail in 136057 pass in 134594
 test-amd64-amd64-xl-qcow2  16 guest-saverestore.2     fail pass in 136057

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl            1 build-check(1)  blocked in 134594 n/a
 build-arm64-libvirt            1 build-check(1)  blocked in 134594 n/a
 test-arm64-arm64-xl-xsm        1 build-check(1)  blocked in 134594 n/a
 test-arm64-arm64-xl-credit1    1 build-check(1)  blocked in 134594 n/a
 test-arm64-arm64-libvirt-xsm   1 build-check(1)  blocked in 134594 n/a
 test-arm64-arm64-xl-credit2    1 build-check(1)  blocked in 134594 n/a
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict  10 debian-hvm-install  fail never pass
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install  fail never pass
 test-amd64-amd64-libvirt-xsm  13 migrate-support-check  fail never pass
 test-amd64-i386-xl-pvshim     12 guest-start            fail never pass
 test-amd64-i386-libvirt-xsm   13 migrate-support-check  fail never pass
 test-amd64-amd64-libvirt      13 migrate-support-check  fail never pass
 test-amd64-i386-libvirt       13 migrate-support-check  fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm  11 migrate-support-check  fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm   11 migrate-support-check  fail never pass
 test-arm64-arm64-xl-credit1    7 xen-boot                  fail never pass
 test-armhf-armhf-xl-arndale   13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-arndale   14 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-vhd  12 migrate-support-check     fail never pass
 test-amd64-amd64-qemuu-nested-amd  17 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-libvirt      13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-rtds      13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-rtds      14 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt      14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl           13 migrate-support-check     fail never pass
 test-armhf-armhf-xl           14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2   13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-credit2   14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1   13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-credit1   14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-check     fail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-check fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64   17 guest-stop          fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64  17 guest-stop          fail never pass
 test-armhf-armhf-xl-vhd       12 migrate-support-check     fail never pass
 test-armhf-armhf-xl-vhd       13 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw  12 migrate-support-check     fail never pass
 test-armhf-armhf-libvirt-raw  13 saverestore-support-check fail never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64  17 guest-stop          fail never pass
 test-amd64-i386-xl-qemuu-ws16-amd64   17 guest-stop          fail never pass
 test-amd64-amd64-xl-qemuu-win10-i386  10 windows-install     fail never pass
 test-amd64-i386-xl-qemuu-win10-i386   10 windows-install     fail never pass

version targeted for testing:
 qemuu                2871355a6957f1b3c16f858e3143e0fff0737b6a
baseline version:
 qemuu                20c76f9a5fbf16d58c6add2ace2ff0fabd785926

Last test of basis   125575  2018-07-25 18:53:54 Z  294 days
Testing same since   134270  2019-04-01 16:10:50 Z   44 days  19 attempts

------------------------------------------------------------
People who touched revisions under test:
  Anthony PERARD <anthony.perard@citrix.com>
  Gerd Hoffmann <kraxel@redhat.com>
  Greg Kurz <groug@kaod.org>
  Jason Wang <jasowang@redhat.com>
  Kevin Wolf <kwolf@redhat.com>
  Li Qiang <liq3ea@gmail.com>
  Michael McConville <mmcco@mykolab.com>
  Michael Tokarev <mjt@tls.msk.ru>
  Niels de Vos <ndevos@redhat.com>
  Paolo Bonzini <pbonzini@redhat.com>
  Peter Maydell <peter.maydell@linaro.org>
  Philippe Mathieu-Daudé <philmd@redhat.com>
  Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
  Roger Pau Monne <roger.pau@citrix.com>
  Roger Pau Monné <roger.pau@citrix.com>

jobs:
 build-amd64-xsm                                      pass
 build-arm64-xsm                                      pass
 build-i386-xsm                                       pass
 build-amd64                                          pass
 build-arm64                                          pass
 build-armhf                                          pass
 build-i386                                           pass
 build-amd64-libvirt                                  pass
 build-arm64-libvirt                                  pass
 build-armhf-libvirt                                  pass
 build-i386-libvirt                                   pass
 build-amd64-pvops                                    pass
 build-arm64-pvops                                    pass
 build-armhf-pvops                                    pass
 build-i386-pvops                                     pass
 test-amd64-amd64-xl                                  pass
 test-arm64-arm64-xl                                  fail
 test-armhf-armhf-xl                                  pass
 test-amd64-i386-xl                                   pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm    pass
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm         pass
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm          pass
 test-amd64-amd64-libvirt-xsm                         pass
 test-arm64-arm64-libvirt-xsm                         fail
 test-amd64-i386-libvirt-xsm                          pass
 test-amd64-amd64-xl-xsm                              pass
 test-arm64-arm64-xl-xsm                              fail
 test-amd64-i386-xl-xsm                               pass
 test-amd64-amd64-qemuu-nested-amd                    fail
 test-amd64-amd64-xl-pvhv2-amd                        pass
 test-amd64-i386-qemuu-rhel6hvm-amd                   pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64            pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64             pass
 test-amd64-i386-freebsd10-amd64                      pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64                 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64                  pass
 test-amd64-amd64-xl-qemuu-win7-amd64                 fail
 test-amd64-i386-xl-qemuu-win7-amd64                  fail
 test-amd64-amd64-xl-qemuu-ws16-amd64                 fail
 test-amd64-i386-xl-qemuu-ws16-amd64                  fail
 test-armhf-armhf-xl-arndale                          pass
 test-amd64-amd64-xl-credit1                          pass
 test-arm64-arm64-xl-credit1                          fail
 test-armhf-armhf-xl-credit1                          pass
 test-amd64-amd64-xl-credit2                          pass
 test-arm64-arm64-xl-credit2                          fail
 test-armhf-armhf-xl-credit2                          pass
 test-armhf-armhf-xl-cubietruck                       pass
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict fail
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict fail
 test-amd64-i386-freebsd10-i386                       pass
 test-amd64-amd64-xl-qemuu-win10-i386                 fail
 test-amd64-i386-xl-qemuu-win10-i386                  fail
 test-amd64-amd64-qemuu-nested-intel                  pass
 test-amd64-amd64-xl-pvhv2-intel                      pass
 test-amd64-i386-qemuu-rhel6hvm-intel                 pass
 test-amd64-amd64-libvirt                             pass
 test-armhf-armhf-libvirt                             pass
 test-amd64-i386-libvirt                              pass
 test-amd64-amd64-xl-multivcpu                        pass
 test-armhf-armhf-xl-multivcpu                        pass
 test-amd64-amd64-pair                                pass
 test-amd64-i386-pair                                 pass
 test-amd64-amd64-libvirt-pair                        pass
 test-amd64-i386-libvirt-pair                         pass
 test-amd64-amd64-amd64-pvgrub                        pass
 test-amd64-amd64-i386-pvgrub                         pass
 test-amd64-amd64-xl-pvshim                           pass
 test-amd64-i386-xl-pvshim                            fail
 test-amd64-amd64-pygrub                              pass
 test-amd64-amd64-xl-qcow2                            fail
 test-armhf-armhf-libvirt-raw                         pass
 test-amd64-i386-xl-raw                               pass
 test-amd64-amd64-xl-rtds                             pass
 test-armhf-armhf-xl-rtds                             pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow     pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow      pass
 test-amd64-amd64-xl-shadow                           pass
 test-amd64-i386-xl-shadow                            pass
 test-amd64-amd64-libvirt-vhd                         pass
 test-armhf-armhf-xl-vhd                              pass

------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

broken-job build-arm64-pvops broken
broken-job build-arm64 broken
broken-job build-arm64-xsm broken

Not pushing.
------------------------------------------------------------

commit 2871355a6957f1b3c16f858e3143e0fff0737b6a
Author: Kevin Wolf <kwolf@redhat.com>
Date:   Thu Oct 11 17:30:39 2018 +0200

    gtk: Don't vte_terminal_set_encoding() on new VTE versions

    The function vte_terminal_set_encoding() is deprecated since VTE 0.54,
    so stop calling it from that version on. This fixes a build error
    caused by our use of warning flags [-Werror=deprecated-declarations].

    Fixes: https://bugs.launchpad.net/bugs/1794939
    Reported-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Message-id: 20181011153039.2324-1-kwolf@redhat.com
    Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
    (cherry picked from commit 6415994ffcc6d22b3f5add67f63fe77e4b9711f4)

commit 94a715b6cba7225e5db59901e5d0a5252ead9755
Author: Niels de Vos <ndevos@redhat.com>
Date:   Tue Mar 5 16:46:34 2019 +0100

    gluster: the glfs_io_cbk callback function pointer adds pre/post stat args

    The glfs_*_async() functions do a callback once finished. This callback
    has changed its arguments: pre- and post-stat structures have been
    added. This makes it possible to improve caching, which is useful for
    Samba and NFS-Ganesha, but not so much for QEMU. Gluster 6 is the first
    release that includes these new arguments.

    With an additional detection in ./configure, the new arguments can
    conditionally be included in the glfs_io_cbk handler.

    Signed-off-by: Niels de Vos <ndevos@redhat.com>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    (cherry picked from commit 0e3b891fefacc0e49f3c8ffa3a753b69eb7214d2)

commit 13bac7abf60e25101ef6059f0da7a168942eccd9
Author: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Date:   Tue Mar 5 16:46:33 2019 +0100

    gluster: Handle changed glfs_ftruncate signature

    New versions of Gluster's libgfapi.so have an updated glfs_ftruncate()
    function that returns additional 'struct stat' structures to enable
    advanced caching of attributes. This is useful for file servers, not so
    much for QEMU. Nevertheless, the API has changed and needs to be
    adapted to.

    Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
    Signed-off-by: Niels de Vos <ndevos@redhat.com>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    (cherry picked from commit e014dbe74e0484188164c61ff6843f8a04a8cb9d)

commit 9864a12f4a13f19a7440cb32bd3242506d6b2738
Author: Jason Wang <jasowang@redhat.com>
Date:   Tue Dec 4 11:53:43 2018 +0800

    net: drop too large packet early

    We try to detect and drop too-large packets (>INT_MAX) in commit
    1592a9947036 ("net: ignore packet size greater than INT_MAX") during
    packet delivery. Unfortunately, this is not sufficient, as we may hit
    another integer overflow when trying to queue such a large packet in
    qemu_net_queue_append_iov():

    - the size of the allocation may overflow on 32 bit
    - packet->size is an int, which may overflow even on 64 bit

    Fix this by moving the check to qemu_sendv_packet_async(), which is
    the entry point of all networking code, and reduce the limit to
    NET_BUFSIZE to be more conservative. This works since:

    - For callers that call qemu_sendv_packet_async() directly, they only
      care about whether zero is returned, to decide whether to stop the
      source from producing more packets. A callback will be triggered if
      the peer can accept more; then the source can be enabled again. This
      is usually used by high-speed networking implementations like
      virtio-net or netmap.
    - For callers that call qemu_sendv_packet(), which calls
      qemu_sendv_packet_async() indirectly, they often ignore the return
      value. In this case qemu will just drop the packets if the peer
      can't receive them. Qemu will copy the packet if it was queued.

    So it is safe for both kinds of callers to assume the packet was sent.

    Since we move the check from qemu_deliver_packet_iov() to
    qemu_sendv_packet_async(), it is safer to make
    qemu_deliver_packet_iov() static to prevent any external user in the
    future.

    This is a revised patch of CVE-2018-17963.

    Cc: qemu-stable@nongnu.org
    Cc: Li Qiang <liq3ea@163.com>
    Fixes: 1592a9947036 ("net: ignore packet size greater than INT_MAX")
    Reported-by: Li Qiang <liq3ea@gmail.com>
    Reviewed-by: Li Qiang <liq3ea@gmail.com>
    Signed-off-by: Jason Wang <jasowang@redhat.com>
    Reviewed-by: Thomas Huth <thuth@redhat.com>
    Message-id: 20181204035347.6148-2-jasowang@redhat.com
    Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    (cherry picked from commit 25c01bd19d0e4b66f357618aeefda1ef7a41e21a)

commit b697c0aecbf9bc8bdb4f1bf0ea92e6a8fb258094
Author: Jason Wang <jasowang@redhat.com>
Date:   Wed May 30 13:16:36 2018 +0800

    net: ignore packet size greater than INT_MAX

    There should not be a reason for passing a packet size greater than
    INT_MAX. It's usually a hint of a bug somewhere, so ignore packet
    sizes greater than INT_MAX in qemu_deliver_packet_iov().

    CC: qemu-stable@nongnu.org
    Reported-by: Daniel Shapira <daniel@twistlock.com>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Jason Wang <jasowang@redhat.com>
    (cherry picked from commit 1592a9947036d60dde5404204a5d45975133caf5)

commit f517c1b6079a514c0798eacb3f7c77b9dd8ebbf1
Author: Greg Kurz <groug@kaod.org>
Date:   Fri Nov 23 13:28:03 2018 +0100

    9p: fix QEMU crash when renaming files

    When using the 9P2000.u version of the protocol, the following shell
    command line in the guest can cause QEMU to crash:

        while true; do rm -rf aa; mkdir -p a/b & touch a/b/c & mv a aa; done

    With 9P2000.u, file renaming is handled by the WSTAT command. The
    v9fs_wstat() function calls v9fs_complete_rename(), which calls
    v9fs_fix_path() for every fid whose path is affected by the change.
    The involved calls to v9fs_path_copy() may race with any other access
    to the fid path performed by some worker thread, causing a crash like
    the one shown below:

        Thread 12 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
        0x0000555555a25da2 in local_open_nofollow (fs_ctx=0x555557d958b8,
            path=0x0, flags=65536, mode=0) at hw/9pfs/9p-local.c:59
        59          while (*path && fd != -1) {
        (gdb) bt
        #0  0x0000555555a25da2 in local_open_nofollow (fs_ctx=0x555557d958b8,
            path=0x0, flags=65536, mode=0) at hw/9pfs/9p-local.c:59
        #1  0x0000555555a25e0c in local_opendir_nofollow (fs_ctx=0x555557d958b8,
            path=0x0) at hw/9pfs/9p-local.c:92
        #2  0x0000555555a261b8 in local_lstat (fs_ctx=0x555557d958b8,
            fs_path=0x555556b56858, stbuf=0x7fff84830ef0) at hw/9pfs/9p-local.c:185
        #3  0x0000555555a2b367 in v9fs_co_lstat (pdu=0x555557d97498,
            path=0x555556b56858, stbuf=0x7fff84830ef0) at hw/9pfs/cofile.c:53
        #4  0x0000555555a1e9e2 in v9fs_stat (opaque=0x555557d97498)
            at hw/9pfs/9p.c:1083
        #5  0x0000555555e060a2 in coroutine_trampoline (i0=-669165424, i1=32767)
            at util/coroutine-ucontext.c:116
        #6  0x00007fffef4f5600 in __start_context () at /lib64/libc.so.6
        #7  0x0000000000000000 in ()
        (gdb)

    The fix is to take the path write lock when calling
    v9fs_complete_rename(), like in v9fs_rename().

    Impact: DoS triggered by unprivileged guest users.

    Fixes: CVE-2018-19489
    Cc: P J P <ppandit@redhat.com>
    Reported-by: zhibin hu <noirfate@gmail.com>
    Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org>
    Signed-off-by: Greg Kurz <groug@kaod.org>
    (cherry picked from commit 1d20398694a3b67a388d955b7a945ba4aa90a8a8)

commit 9af9c1c20e313f597168e0522f5fc8d78123b0c8
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Tue Nov 20 19:41:48 2018 +0100

    nvme: fix out-of-bounds access to the CMB

    Because the CMB BAR has a min_access_size of 2, a read of the last
    byte will try to memcpy *2* bytes from n->cmbuf, causing an off-by-one
    error. This is CVE-2018-16847.

    Another way to fix this might be to register the CMB as a RAM memory
    region, which would also be more efficient. However, that might be a
    change for big-endian machines; I didn't think this through and I
    don't know how real hardware works. Add a basic testcase for the CMB
    in case somebody does this change later on.

    Cc: Keith Busch <keith.busch@intel.com>
    Cc: qemu-block@nongnu.org
    Reported-by: Li Qiang <liq3ea@gmail.com>
    Reviewed-by: Li Qiang <liq3ea@gmail.com>
    Tested-by: Li Qiang <liq3ea@gmail.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
    Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    (cherry picked from commit 87ad860c622cc8f8916b5232bd8728c08f938fce)

commit c50c704a6a09554925b926c0313280be4a3d7100
Author: Greg Kurz <groug@kaod.org>
Date:   Tue Nov 20 13:00:35 2018 +0100

    9p: take write lock on fid path updates (CVE-2018-19364)

    Recent commit 5b76ef50f62079a fixed a race where v9fs_co_open2() could
    possibly overwrite a fid path with v9fs_path_copy() while it is being
    accessed by some other thread, i.e., a use-after-free that can be
    detected by ASAN with a custom 9p client.

    It turns out that the same can happen at several locations where
    v9fs_path_copy() is used to set the fid path. The fix is again to take
    the write lock.

    Fixes CVE-2018-19364.

    Cc: P J P <ppandit@redhat.com>
    Reported-by: zhibin hu <noirfate@gmail.com>
    Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org>
    Signed-off-by: Greg Kurz <groug@kaod.org>
    (cherry picked from commit 5b3c77aa581ebb215125c84b0742119483571e55)

commit 03c28544a1b67fd48ef1fa72231818efa8563874
Author: Roger Pau Monne <roger.pau@citrix.com>
Date:   Mon Mar 18 18:37:31 2019 +0100

    xen-mapcache: use MAP_FIXED flag so the mmap address hint is always honored

    If it's not possible to honor the hinted address, an error is now
    returned instead. This makes it easier to spot the actual failure,
    instead of failing later on when the caller of xen_remap_bucket
    realizes the mapping has not been created at the requested address.

    Also note that, at least on FreeBSD, using MAP_FIXED will cause mmap
    to try harder to honor the passed address.

    Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
    Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
    Acked-by: Anthony PERARD <anthony.perard@citrix.com>
    Reviewed-by: Igor Druzhinin <igor.druzhinin@cirtix.com>
    Message-Id: <20190318173731.14494-1-roger.pau@citrix.com>
    Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
    (cherry picked from commit 4158e93f4aced247c8db94a0275fc027da7dc97e)

commit a35ed1444329599f2975512c82be795f8af284d5
Author: Michael McConville <mmcco@mykolab.com>
Date:   Fri Dec 1 11:31:57 2017 -0700

    mmap(2) returns MAP_FAILED, not NULL, on failure

    Signed-off-by: Michael McConville <mmcco@mykolab.com>
    Reviewed-by: John Snow <jsnow@redhat.com>
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    (cherry picked from commit ab1ce9bd4897b9909836e2d50bca86f2f3f2dddc)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Add a basic testcase for the CMB in case somebody does this change later on. Cc: Keith Busch <keith.busch@intel.com> Cc: qemu-block@nongnu.org Reported-by: Li Qiang <liq3ea@gmail.com> Reviewed-by: Li Qiang <liq3ea@gmail.com> Tested-by: Li Qiang <liq3ea@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> (cherry picked from commit 87ad860c622cc8f8916b5232bd8728c08f938fce) commit c50c704a6a09554925b926c0313280be4a3d7100 Author: Greg Kurz <groug@kaod.org> Date: Tue Nov 20 13:00:35 2018 +0100 9p: take write lock on fid path updates (CVE-2018-19364) Recent commit 5b76ef50f62079a fixed a race where v9fs_co_open2() could possibly overwrite a fid path with v9fs_path_copy() while it is being accessed by some other thread, ie, use-after-free that can be detected by ASAN with a custom 9p client. It turns out that the same can happen at several locations where v9fs_path_copy() is used to set the fid path. The fix is again to take the write lock. Fixes CVE-2018-19364. Cc: P J P <ppandit@redhat.com> Reported-by: zhibin hu <noirfate@gmail.com> Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org> Signed-off-by: Greg Kurz <groug@kaod.org> (cherry picked from commit 5b3c77aa581ebb215125c84b0742119483571e55) commit 03c28544a1b67fd48ef1fa72231818efa8563874 Author: Roger Pau Monne <roger.pau@citrix.com> Date: Mon Mar 18 18:37:31 2019 +0100 xen-mapcache: use MAP_FIXED flag so the mmap address hint is always honored Or if it's not possible to honor the hinted address an error is returned instead. This makes it easier to spot the actual failure, instead of failing later on when the caller of xen_remap_bucket realizes the mapping has not been created at the requested address. Also note that at least on FreeBSD using MAP_FIXED will cause mmap to try harder to honor the passed address. 
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Igor Druzhinin <igor.druzhinin@cirtix.com> Message-Id: <20190318173731.14494-1-roger.pau@citrix.com> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> (cherry picked from commit 4158e93f4aced247c8db94a0275fc027da7dc97e) commit a35ed1444329599f2975512c82be795f8af284d5 Author: Michael McConville <mmcco@mykolab.com> Date: Fri Dec 1 11:31:57 2017 -0700 mmap(2) returns MAP_FAILED, not NULL, on failure Signed-off-by: Michael McConville <mmcco@mykolab.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> (cherry picked from commit ab1ce9bd4897b9909836e2d50bca86f2f3f2dddc) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
* Re: [qemu-upstream-4.11-testing test] 136184: regressions - FAIL @ 2019-05-16 10:37 ` Anthony PERARD 0 siblings, 0 replies; 43+ messages in thread From: Anthony PERARD @ 2019-05-16 10:37 UTC (permalink / raw) To: osstest service owner, Ian Jackson, Julien Grall; +Cc: xen-devel On Wed, May 15, 2019 at 07:48:17PM +0000, osstest service owner wrote: > flight 136184 qemu-upstream-4.11-testing real [real] > http://logs.test-lab.xenproject.org/osstest/logs/136184/ > > Regressions :-( > > Tests which did not succeed and are blocking, > including tests which could not be run: > build-arm64-pvops <job status> broken in 134594 > build-arm64 <job status> broken in 134594 > build-arm64-xsm <job status> broken in 134594 > build-arm64-xsm 4 host-install(4) broken in 134594 REGR. vs. 125575 > build-arm64-pvops 4 host-install(4) broken in 134594 REGR. vs. 125575 > build-arm64 4 host-install(4) broken in 134594 REGR. vs. 125575 > test-arm64-arm64-libvirt-xsm 7 xen-boot fail REGR. vs. 125575 > test-arm64-arm64-xl 7 xen-boot fail REGR. vs. 125575 > test-arm64-arm64-xl-xsm 7 xen-boot fail REGR. vs. 125575 > test-arm64-arm64-xl-credit2 7 xen-boot fail REGR. vs. 125575 > Ian, Julien, I can't figure out why Xen consistently fails to boot on rochester* in the qemu-upstream-4.11-testing flights. xen-4.11-testing seems to pass. At boot, the boot loader seems to load the blobs, but when it's time for Xen to shine, there is no output from Xen on the serial console. Do you know what could cause Xen to fail to boot? I don't believe a few more patches on top of qemu-xen would. Thanks, -- Anthony PERARD
* Re: [qemu-upstream-4.11-testing test] 136184: regressions - FAIL @ 2019-05-16 21:38 ` Julien Grall 0 siblings, 0 replies; 43+ messages in thread From: Julien Grall @ 2019-05-16 21:38 UTC (permalink / raw) To: Anthony PERARD, osstest service owner, Ian Jackson Cc: xen-devel, Stefano Stabellini Hi Anthony, Thank you for CCing me. On 5/16/19 11:37 AM, Anthony PERARD wrote: > On Wed, May 15, 2019 at 07:48:17PM +0000, osstest service owner wrote: >> flight 136184 qemu-upstream-4.11-testing real [real] >> http://logs.test-lab.xenproject.org/osstest/logs/136184/ >> >> Regressions :-( >> >> Tests which did not succeed and are blocking, >> including tests which could not be run: >> build-arm64-pvops <job status> broken in 134594 >> build-arm64 <job status> broken in 134594 >> build-arm64-xsm <job status> broken in 134594 >> build-arm64-xsm 4 host-install(4) broken in 134594 REGR. vs. 125575 >> build-arm64-pvops 4 host-install(4) broken in 134594 REGR. vs. 125575 >> build-arm64 4 host-install(4) broken in 134594 REGR. vs. 125575 >> test-arm64-arm64-libvirt-xsm 7 xen-boot fail REGR. vs. 125575 >> test-arm64-arm64-xl 7 xen-boot fail REGR. vs. 125575 >> test-arm64-arm64-xl-xsm 7 xen-boot fail REGR. vs. 125575 >> test-arm64-arm64-xl-credit2 7 xen-boot fail REGR. vs. 125575 >> > > Ian, Julien, > > I can't figure out why Xen consistently fails to boot on rochester* in > the qemu-upstream-4.11-testing flights. The xen-4.11-testing seems to > pass. > > At boot, the boot loader seems to load blobs, but when it's time to Xen > to shine, there are no output from Xen on the serial. The serial console is initialized fairly late in the boot process, so any useful messages (such as the memory setup or even part of the interrupt setup) will be hidden. To get them, you need earlyprintk. Unfortunately, it can't be configured at runtime today :(. > > Do you know what could cause xen to fail to boot? It is hard to say without the log.
Looking at the differences with a Xen 4.11 flight on rochester0 [1], it seems the .config is slightly different: the 4.11 flight has CONFIG_LIVEPATCH set. I tried to boot the xen built in this flight on an internal board, but I can't see any error. So it may be some board-specific issue. Sorry, I can't provide more input without a proper investigation. > I don't believe a few more patch on top of qemu-xen would. Cheers, [1] http://logs.test-lab.xenproject.org/osstest/logs/136231/test-arm64-arm64-xl/info.html -- Julien Grall
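For context, early printk on Xen for Arm at the time was strictly a build-time choice (documented in Xen's docs/misc/arm/early-printk.txt), selected through a make variable rather than a runtime option. A hedged sketch of the kind of rebuild this implies — the platform name `thunderx` for the rochester boards is an assumption, not something stated in this thread:

```shell
# Early printk is baked into the Xen/Arm hypervisor at build time: the
# UART driver for the target platform must be selected when compiling.
# "thunderx" below is an assumed platform name for the rochester machines.
make -C xen clean
make -C xen XEN_TARGET_ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- \
    CONFIG_EARLY_PRINTK=thunderx
```

Because the option is compile-time only, a flight has to ship a specially built hypervisor to capture early boot output, which is why it cannot simply be toggled on a failing host.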
* Re: [qemu-upstream-4.11-testing test] 136184: regressions - FAIL @ 2019-05-17 15:53 ` Ian Jackson 0 siblings, 0 replies; 43+ messages in thread From: Ian Jackson @ 2019-05-17 15:53 UTC (permalink / raw) To: Julien Grall; +Cc: Anthony Perard, xen-devel, Stefano Stabellini Julien Grall writes ("Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL"): > On 5/16/19 11:37 AM, Anthony PERARD wrote: > >> Tests which did not succeed and are blocking, > >> including tests which could not be run: > >> test-arm64-arm64-libvirt-xsm 7 xen-boot fail REGR. vs. 125575 > >> test-arm64-arm64-xl 7 xen-boot fail REGR. vs. 125575 > >> test-arm64-arm64-xl-xsm 7 xen-boot fail REGR. vs. 125575 > >> test-arm64-arm64-xl-credit2 7 xen-boot fail REGR. vs. 125575 .. > > I can't figure out why Xen consistently fails to boot on rochester* in > > the qemu-upstream-4.11-testing flights. The xen-4.11-testing seems to > > pass. > > > > At boot, the boot loader seems to load blobs, but when it's time to Xen > > to shine, there are no output from Xen on the serial. > > The serial console is initializing fairly late in the process. Any > useful message (such as memory setup or even part of the interrupts) > will be hide out. For getting them, you need earlyprintk. Unfortunately > they can't be configured at runtime today :(. :-/. Can we configure the earlyprintk at compile-time ? We always want it to be serial... > > Do you know what could cause xen to fail to boot? > > It is hard to say without the log. Looking at the different with a Xen > 4.11 flights on rochester0 [1], it seems the .config is slightly > different. 4.11 flight has CONFIG_LIVEPATCH set. The osstest history shows this as a 100% repeatable boot failure but only in the qemu flights. Comparing 136231 (pass, xen-4.11-testing) with 136184 (fail, qemu-upstream-4.11-testing), there are no differences in the test job runvars. Both used the same version of osstest. 
But in the build-arm64 (Xen build) job runvars I see the following differences:

                                 136231                   136184
                                 pass                     fail
                                 xen-4.11-testing         qemu-*4.11*
  build-arm64 (Xen build)
    enable_livepatch             true                     (unset)
    [~built_]revision_qemuu      20c76f9a5fbf...          2871355a6957...
    [~built_]revision_xen        a6e07495c171...          3b062f5040a1...
    ~path_xenlptdist             build/xenlptdist.tar.gz  (unset)
  build-arm64-pvops (kernel build)
    ~host                        rochester1               laxton1

(~ indicates a variable set by osstest during the test run.) The qemu revision is clearly not relevant. I did git diff --stat a6e07495c171..3b062f5040a1 in xen.git and the differences really don't seem like they would be relevant. I think, therefore, that we need to blame the livepatch setting. This comes from osstest's flight construction code: osstest is configured to enable live patching, in the build, only on the xen-* branches. Unfortunately, due to the xen/cmdline regression, the osstest bisector does not seem to have a useful enough baseline. I have rm'd the stamp files and it may manage to do better, but I doubt it. Ian.
* Re: [qemu-upstream-4.11-testing test] 136184: regressions - FAIL @ 2019-05-17 17:23 ` Anthony PERARD 0 siblings, 0 replies; 43+ messages in thread From: Anthony PERARD @ 2019-05-17 17:23 UTC (permalink / raw) To: Julien Grall Cc: Ian Jackson, Stefano Stabellini, osstest service owner, xen-devel On Thu, May 16, 2019 at 10:38:54PM +0100, Julien Grall wrote: > Hi Anthony, > > Thank you for CCing me. > > On 5/16/19 11:37 AM, Anthony PERARD wrote: > > On Wed, May 15, 2019 at 07:48:17PM +0000, osstest service owner wrote: > > > flight 136184 qemu-upstream-4.11-testing real [real] > > > http://logs.test-lab.xenproject.org/osstest/logs/136184/ > > > > > > Regressions :-( > > > > > > Tests which did not succeed and are blocking, > > > including tests which could not be run: > > > build-arm64-pvops <job status> broken in 134594 > > > build-arm64 <job status> broken in 134594 > > > build-arm64-xsm <job status> broken in 134594 > > > build-arm64-xsm 4 host-install(4) broken in 134594 REGR. vs. 125575 > > > build-arm64-pvops 4 host-install(4) broken in 134594 REGR. vs. 125575 > > > build-arm64 4 host-install(4) broken in 134594 REGR. vs. 125575 > > > test-arm64-arm64-libvirt-xsm 7 xen-boot fail REGR. vs. 125575 > > > test-arm64-arm64-xl 7 xen-boot fail REGR. vs. 125575 > > > test-arm64-arm64-xl-xsm 7 xen-boot fail REGR. vs. 125575 > > > test-arm64-arm64-xl-credit2 7 xen-boot fail REGR. vs. 125575 > > > > > > > Ian, Julien, > > > > I can't figure out why Xen consistently fails to boot on rochester* in > > the qemu-upstream-4.11-testing flights. The xen-4.11-testing seems to > > pass. > > > > At boot, the boot loader seems to load blobs, but when it's time to Xen > > to shine, there are no output from Xen on the serial. > > The serial console is initializing fairly late in the process. Any useful > message (such as memory setup or even part of the interrupts) will be hide > out. For getting them, you need earlyprintk. 
> Unfortunately they can't be configured at runtime today :(. I think I managed to run the job with earlyprintk on rochester, but Xen has booted: http://logs.test-lab.xenproject.org/osstest/logs/136451/ So that probably wasn't very useful. (I had to hack osstest in order to compile xen with early printk.) -- Anthony PERARD
* Re: [qemu-upstream-4.11-testing test] 136184: regressions - FAIL @ 2019-05-17 19:00 ` Julien Grall 0 siblings, 0 replies; 43+ messages in thread From: Julien Grall @ 2019-05-17 19:00 UTC (permalink / raw) To: Anthony PERARD Cc: Ian Jackson, Stefano Stabellini, osstest service owner, xen-devel Hi, On 5/17/19 6:23 PM, Anthony PERARD wrote: > On Thu, May 16, 2019 at 10:38:54PM +0100, Julien Grall wrote: >> Hi Anthony, >> >> Thank you for CCing me. >> >> On 5/16/19 11:37 AM, Anthony PERARD wrote: >>> On Wed, May 15, 2019 at 07:48:17PM +0000, osstest service owner wrote: >>>> flight 136184 qemu-upstream-4.11-testing real [real] >>>> http://logs.test-lab.xenproject.org/osstest/logs/136184/ >>>> >>>> Regressions :-( >>>> >>>> Tests which did not succeed and are blocking, >>>> including tests which could not be run: >>>> build-arm64-pvops <job status> broken in 134594 >>>> build-arm64 <job status> broken in 134594 >>>> build-arm64-xsm <job status> broken in 134594 >>>> build-arm64-xsm 4 host-install(4) broken in 134594 REGR. vs. 125575 >>>> build-arm64-pvops 4 host-install(4) broken in 134594 REGR. vs. 125575 >>>> build-arm64 4 host-install(4) broken in 134594 REGR. vs. 125575 >>>> test-arm64-arm64-libvirt-xsm 7 xen-boot fail REGR. vs. 125575 >>>> test-arm64-arm64-xl 7 xen-boot fail REGR. vs. 125575 >>>> test-arm64-arm64-xl-xsm 7 xen-boot fail REGR. vs. 125575 >>>> test-arm64-arm64-xl-credit2 7 xen-boot fail REGR. vs. 125575 >>>> >>> >>> Ian, Julien, >>> >>> I can't figure out why Xen consistently fails to boot on rochester* in >>> the qemu-upstream-4.11-testing flights. The xen-4.11-testing seems to >>> pass. >>> >>> At boot, the boot loader seems to load blobs, but when it's time to Xen >>> to shine, there are no output from Xen on the serial. >> >> The serial console is initializing fairly late in the process. Any useful >> message (such as memory setup or even part of the interrupts) will be hide >> out. For getting them, you need earlyprintk. 
>> Unfortunately they can't be configured at runtime today :(. > > I think I managed to run the job with earlyprintk on rochester, but > Xen have booted: > http://logs.test-lab.xenproject.org/osstest/logs/136451/ Yes, this is with earlyprintk. That's going to be fun to reproduce if earlyprintk modifies the behavior. I think we can interpret this as earlyprintk adding enough latency to make everything work. There are two possible issues I can think of: 1) The boot code does not follow the Arm Arm, so it may be possible the board is doing something different compared to the others regarding the memory. IIRC, this is the first hardware we have with cores not directly designed by Arm. 2) We are missing some errata in Xen. Linux contains 6 errata for that platform; looking at them, I don't think they matter at boot time. 1) is currently being looked at (see the MM-PART* patches on the ML). 2) should probably be addressed at some point, but I may not be able to send them as an Arm employee (we tend to avoid sending patches showing brokenness in partner silicon). Cheers, -- Julien Grall
* Re: [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
  From: Julien Grall @ 2019-05-21 16:52 UTC
  To: Anthony PERARD
  Cc: Ian Jackson, Stefano Stabellini, osstest service owner, xen-devel

Hi,

Answering to myself.

On 5/17/19 8:00 PM, Julien Grall wrote:
> Hi,
>
> On 5/17/19 6:23 PM, Anthony PERARD wrote:
>> On Thu, May 16, 2019 at 10:38:54PM +0100, Julien Grall wrote:
>>> Hi Anthony,
>>>
>>> Thank you for CCing me.
>>>
>>> On 5/16/19 11:37 AM, Anthony PERARD wrote:
>>>> On Wed, May 15, 2019 at 07:48:17PM +0000, osstest service owner wrote:
>>>>> flight 136184 qemu-upstream-4.11-testing real [real]
>>>>> http://logs.test-lab.xenproject.org/osstest/logs/136184/
>>>>>
>>>>> Regressions :-(
>>>>>
>>>>> Tests which did not succeed and are blocking,
>>>>> including tests which could not be run:
>>>>> build-arm64-pvops            <job status>    broken in 134594
>>>>> build-arm64                  <job status>    broken in 134594
>>>>> build-arm64-xsm              <job status>    broken in 134594
>>>>> build-arm64-xsm         4 host-install(4)    broken in 134594 REGR. vs. 125575
>>>>> build-arm64-pvops       4 host-install(4)    broken in 134594 REGR. vs. 125575
>>>>> build-arm64             4 host-install(4)    broken in 134594 REGR. vs. 125575
>>>>> test-arm64-arm64-libvirt-xsm  7 xen-boot     fail REGR. vs. 125575
>>>>> test-arm64-arm64-xl           7 xen-boot     fail REGR. vs. 125575
>>>>> test-arm64-arm64-xl-xsm       7 xen-boot     fail REGR. vs. 125575
>>>>> test-arm64-arm64-xl-credit2   7 xen-boot     fail REGR. vs. 125575
>>>>>
>>>>
>>>> Ian, Julien,
>>>>
>>>> I can't figure out why Xen consistently fails to boot on rochester* in
>>>> the qemu-upstream-4.11-testing flights. The xen-4.11-testing seems to
>>>> pass.
>>>>
>>>> At boot, the boot loader seems to load blobs, but when it's time for Xen
>>>> to shine, there is no output from Xen on the serial.
>>>
>>> The serial console is initialized fairly late in the process. Any useful
>>> message (such as memory setup or even part of the interrupt setup) will be
>>> hidden. To get them, you need earlyprintk. Unfortunately it can't be
>>> configured at runtime today :(.
>>
>> I think I managed to run the job with earlyprintk on rochester, and
>> Xen booted:
>> http://logs.test-lab.xenproject.org/osstest/logs/136451/
>
> Yes, this is with earlyprintk. That's going to be fun to reproduce if
> earlyprintk modifies the behaviour. I think we can interpret this as
> earlyprintk adding enough latency to make everything work.
>
> There are two possible issues I can think of:
>  1) The boot code does not follow the Arm Arm, so it may be that the
>     board behaves differently from the others regarding memory. IIRC,
>     this is the first hardware we have with cores not directly designed
>     by Arm.
>  2) We are missing some errata workarounds in Xen. Linux contains 6
>     errata workarounds for that platform. Looking at them, I don't think
>     they matter for boot time.
>
> 1) is currently being looked at (see the MM-PART* patches on the ML). 2)
> should probably be addressed at some point, but I may not be able to send
> the patches as an Arm employee (we tend to avoid sending patches showing
> brokenness in partner silicon).

Ian kindly started a couple of jobs over the weekend to confirm whether
it can be reproduced on laxton* (Seattle board).

The same error cannot be reproduced on laxton*. Looking at the test
history, it looks like the qemu-upstream-4.12-testing flight has run
successfully a few times on rochester*. So we may have fixed the error
in Xen 4.12.

Potential candidates would be:
 - 00c96d7742 "xen/arm: mm: Set-up page permission for Xen mappings earlier on"
 - f60658c6ae "xen/arm: Stop relocating Xen"

Ian, is it something the bisector could automatically look at?
If not, I will need to find some time and borrow the board to bisect
the issue.

Cheers,

-- 
Julien Grall
* Re: [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
  From: Anthony PERARD @ 2019-06-03 17:15 UTC
  To: Julien Grall
  Cc: Ian Jackson, Stefano Stabellini, osstest service owner, xen-devel

On Tue, May 21, 2019 at 05:52:12PM +0100, Julien Grall wrote:
> The same error cannot be reproduced on laxton*. Looking at the test
> history, it looks like the qemu-upstream-4.12-testing flight has run
> successfully a few times on rochester*. So we may have fixed the error
> in Xen 4.12.
>
> Potential candidates would be:
> - 00c96d7742 "xen/arm: mm: Set-up page permission for Xen mappings earlier on"
> - f60658c6ae "xen/arm: Stop relocating Xen"
>
> Ian, is it something the bisector could automatically look at?
> If not, I will need to find some time and borrow the board to bisect
> the issue.

I attempted to do that bisection myself, and the first commit that git
wanted to try, a commit common to both branches, boots just fine.

It turns out that the first commit that fails to boot on rochester is
e202feb713 xen/cmdline: Fix buggy strncmp(s, LITERAL, ss - s) construct
(even with "eb8acba82a xen: Fix backport of .." applied).

I did try a few commits from the stable-4.12 branch and they all booted
just fine on rochester.

Now, about the potential candidates:

> - 00c96d7742 "xen/arm: mm: Set-up page permission for Xen mappings earlier on"

This commit alone, cherry-picked on top of stable-4.11, makes Xen boot
on rochester.

> - f60658c6ae "xen/arm: Stop relocating Xen"

With that commit applied, Xen doesn't build, so I couldn't try to boot it.
(mm.c: In function 'setup_pagetables':
 mm.c:653:42: error: 'xen_paddr' undeclared)

-- 
Anthony PERARD
* Re: [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
  From: Jan Beulich @ 2019-06-04 7:06 UTC
  To: Anthony Perard
  Cc: Ian Jackson, Julien Grall, Stefano Stabellini, osstest service owner, xen-devel

>>> On 03.06.19 at 19:15, <anthony.perard@citrix.com> wrote:
> On Tue, May 21, 2019 at 05:52:12PM +0100, Julien Grall wrote:
>> The same error cannot be reproduced on laxton*. Looking at the test
>> history, it looks like the qemu-upstream-4.12-testing flight has run
>> successfully a few times on rochester*. So we may have fixed the error
>> in Xen 4.12.
>>
>> Potential candidates would be:
>> - 00c96d7742 "xen/arm: mm: Set-up page permission for Xen mappings earlier on"
>> - f60658c6ae "xen/arm: Stop relocating Xen"
>>
>> Ian, is it something the bisector could automatically look at?
>> If not, I will need to find some time and borrow the board to bisect
>> the issue.
>
> I attempted to do that bisection myself, and the first commit that git
> wanted to try, a commit common to both branches, boots just fine.

Thanks for doing this!

One thing that, for now, completely escapes me: how come the main 4.11
branch has progressed fine, but the qemuu one has got stalled like this?

> It turns out that the first commit that fails to boot on rochester is
> e202feb713 xen/cmdline: Fix buggy strncmp(s, LITERAL, ss - s) construct
> (even with "eb8acba82a xen: Fix backport of .." applied)

Now that's a particularly odd regression candidate. It doesn't touch any
Arm code at all (nor does the fixup commit). And the common code changes
don't look "risky" either; the one thing that jumps out as the most
likely of all the unlikely candidates would seem to be the
xen/common/efi/boot.c change, but if there was a problem there then EFI
boot on Arm would be latently broken in other ways as well. Plus, of
course, you say that the same change is no problem on 4.12.

Of course the commit itself could be further "bisected" - all changes
other than the introduction of cmdline_strcmp() are completely
independent of one another.

Jan
* Re: [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
  From: Julien Grall @ 2019-06-04 9:01 UTC
  To: Jan Beulich, Anthony Perard
  Cc: Ian Jackson, Stefano Stabellini, osstest service owner, xen-devel

Hi Jan,

On 6/4/19 8:06 AM, Jan Beulich wrote:
>>>> On 03.06.19 at 19:15, <anthony.perard@citrix.com> wrote:
>> On Tue, May 21, 2019 at 05:52:12PM +0100, Julien Grall wrote:
>>> The same error cannot be reproduced on laxton*. Looking at the test
>>> history, it looks like the qemu-upstream-4.12-testing flight has run
>>> successfully a few times on rochester*. So we may have fixed the error
>>> in Xen 4.12.
>>>
>>> Potential candidates would be:
>>> - 00c96d7742 "xen/arm: mm: Set-up page permission for Xen mappings earlier on"
>>> - f60658c6ae "xen/arm: Stop relocating Xen"
>>>
>>> Ian, is it something the bisector could automatically look at?
>>> If not, I will need to find some time and borrow the board to bisect
>>> the issue.
>>
>> I attempted to do that bisection myself, and the first commit that git
>> wanted to try, a commit common to both branches, boots just fine.
>
> Thanks for doing this!
>
> One thing that, for now, completely escapes me: how come the main 4.11
> branch has progressed fine, but the qemuu one has got stalled like this?

Because Xen on Arm today does not fully respect the Arm Arm when
modifying the page tables. This may result in TLB conflicts and broken
coherency.

>> It turns out that the first commit that fails to boot on rochester is
>> e202feb713 xen/cmdline: Fix buggy strncmp(s, LITERAL, ss - s) construct
>> (even with "eb8acba82a xen: Fix backport of .." applied)
>
> Now that's a particularly odd regression candidate. It doesn't touch any
> Arm code at all (nor does the fixup commit). And the common code changes
> don't look "risky" either; the one thing that jumps out as the most
> likely of all the unlikely candidates would seem to be the
> xen/common/efi/boot.c change, but if there was a problem there then EFI
> boot on Arm would be latently broken in other ways as well. Plus, of
> course, you say that the same change is no problem on 4.12.
>
> Of course the commit itself could be further "bisected" - all changes
> other than the introduction of cmdline_strcmp() are completely
> independent of one another.

I think this is just a red herring. The commit probably modifies the
layout of Xen enough that a TLB conflict appears.

Anthony said backporting 00c96d7742 "xen/arm: mm: Set-up page permission
for Xen mappings earlier on" makes staging-4.11 boot. This patch removes
some of the potential causes of TLB conflicts.

I haven't suggested a backport of this patch so far, because TLB
conflicts are still possible within the modified function. It might also
expose more TLB conflicts, as more work in Xen is needed (see my
MM-PARTn series).

I don't know whether backporting this patch is worth it compared to the
risk it introduces.

Cheers,

-- 
Julien Grall
* Re: [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
  From: Jan Beulich @ 2019-06-04 9:17 UTC
  To: Julien Grall, Stefano Stabellini
  Cc: Anthony Perard, Ian Jackson, osstest service owner, xen-devel

>>> On 04.06.19 at 11:01, <julien.grall@arm.com> wrote:
> On 6/4/19 8:06 AM, Jan Beulich wrote:
>>>>> On 03.06.19 at 19:15, <anthony.perard@citrix.com> wrote:
>>> It turns out that the first commit that fails to boot on rochester is
>>> e202feb713 xen/cmdline: Fix buggy strncmp(s, LITERAL, ss - s) construct
>>> (even with "eb8acba82a xen: Fix backport of .." applied)
>>
>> Now that's a particularly odd regression candidate. It doesn't touch any
>> Arm code at all (nor does the fixup commit). And the common code changes
>> don't look "risky" either; the one thing that jumps out as the most
>> likely of all the unlikely candidates would seem to be the
>> xen/common/efi/boot.c change, but if there was a problem there then EFI
>> boot on Arm would be latently broken in other ways as well. Plus, of
>> course, you say that the same change is no problem on 4.12.
>>
>> Of course the commit itself could be further "bisected" - all changes
>> other than the introduction of cmdline_strcmp() are completely
>> independent of one another.
>
> I think this is just a red herring. The commit probably modifies the
> layout of Xen enough that a TLB conflict appears.
>
> Anthony said backporting 00c96d7742 "xen/arm: mm: Set-up page permission
> for Xen mappings earlier on" makes staging-4.11 boot. This patch removes
> some of the potential causes of TLB conflicts.
>
> I haven't suggested a backport of this patch so far, because TLB
> conflicts are still possible within the modified function. It might also
> expose more TLB conflicts, as more work in Xen is needed (see my
> MM-PARTn series).
>
> I don't know whether backporting this patch is worth it compared to the
> risk it introduces.

Well, if you don't backport this, what's the alternative road towards a
solution here? I'm afraid the two of you will need to decide one way or
another.

In any event this sounds to me as if a similar problem could appear at
any time on any branch. Not a very nice state to be in ...

Jan
* Re: [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
  From: Julien Grall @ 2019-06-04 9:57 UTC
  To: Jan Beulich, Stefano Stabellini
  Cc: Anthony Perard, Ian Jackson, osstest service owner, xen-devel

On 6/4/19 10:17 AM, Jan Beulich wrote:
>>>> On 04.06.19 at 11:01, <julien.grall@arm.com> wrote:
>> I think this is just a red herring. The commit probably modifies the
>> layout of Xen enough that a TLB conflict appears.
>>
>> Anthony said backporting 00c96d7742 "xen/arm: mm: Set-up page permission
>> for Xen mappings earlier on" makes staging-4.11 boot. This patch removes
>> some of the potential causes of TLB conflicts.
>>
>> I haven't suggested a backport of this patch so far, because TLB
>> conflicts are still possible within the modified function. It might also
>> expose more TLB conflicts, as more work in Xen is needed (see my
>> MM-PARTn series).
>>
>> I don't know whether backporting this patch is worth it compared to the
>> risk it introduces.
>
> Well, if you don't backport this, what's the alternative road towards a
> solution here? I'm afraid the two of you will need to decide one way or
> another.

The "two" being?

Looking at the code again, we now avoid replacing a 4KB entry with a 2MB
block entry without respecting the Break-Before-Make sequence. So this
is one (actually two) fewer potential source of TLB conflict.

This patch may introduce more sources of TLB conflict if the processor
is caching intermediate walks. But this was already the case before, so
it may not be as bad as I first thought.

I would definitely like to hear an opinion from Stefano here.

> In any event this sounds to me as if a similar problem could appear at
> any time on any branch. Not a very nice state to be in ...

Thankfully, most of those issues will appear at boot time. The update of
Xen page tables at runtime is sort of correct (missing a couple of
locks), but the failure will depend on your code. I expect that we would
not see the failure on any of the Arm platforms used in osstest except
the Thunder-X.

It is not a nice state to be in, but the task is quite important, as Xen
was designed on wrong assumptions. This implies reworking most of the
boot and memory management code.

Cheers,

-- 
Julien Grall
* Re: [qemu-upstream-4.11-testing test] 136184: regressions - FAIL @ 2019-06-04 10:02 ` Jan Beulich 0 siblings, 0 replies; 43+ messages in thread From: Jan Beulich @ 2019-06-04 10:02 UTC (permalink / raw) To: Julien Grall Cc: Anthony Perard, Ian Jackson, Stefano Stabellini, osstest service owner, xen-devel >>> On 04.06.19 at 11:57, <julien.grall@arm.com> wrote: > > On 6/4/19 10:17 AM, Jan Beulich wrote: >>>>> On 04.06.19 at 11:01, <julien.grall@arm.com> wrote: >>> On 6/4/19 8:06 AM, Jan Beulich wrote: >>>>>>> On 03.06.19 at 19:15, <anthony.perard@citrix.com> wrote: >>>>> It turns out that the first commit that fails to boot on rochester is >>>>> e202feb713 xen/cmdline: Fix buggy strncmp(s, LITERAL, ss - s) construct >>>>> (even with the "eb8acba82a xen: Fix backport of .." applied) >>>> >>>> Now that's particularly odd a regression candidate. It doesn't >>>> touch any Arm code at all (nor does the fixup commit). And the >>>> common code changes don't look "risky" either; the one thing that >>>> jumps out as the most likely of all the unlikely candidates would >>>> seem to be the xen/common/efi/boot.c change, but if there was >>>> a problem there then the EFI boot on Arm would be latently >>>> broken in other ways as well. Plus, of course, you say that the >>>> same change is no problem on 4.12. >>>> >>>> Of course the commit itself could be further "bisected" - all >>>> changes other than the introduction of cmdline_strcmp() are >>>> completely independent of one another. >>> >>> I think this is just a red-herring. The commit is probably modifying >>> enough the layout of Xen that TLB conflict will appear. >>> >>> Anthony said backporting 00c96d7742 "xen/arm: mm: Set-up page permission >>> for Xen mappings earlier on" makes staging-4.11 boots. This patch >>> removes some of the potential cause of TLB conflict. >>> >>> I haven't suggested a backport of this patch so far, because there are >>> still TLB conflict possible within the function modified. 
It might also >>> be possible that it exposes more of TLB conflict as more work in Xen is >>> needed (see my MM-PARTn series). >>> >>> I don't know whether backporting this patch is worth it compare to the >>> risk it introduces. >> >> Well, if you don't backport this, what's the alternative road towards a >> solution here? I'm afraid the two of you will need to decide one way or >> another. > > The "two" being? You and Stefano, as was reflected by the To: list of my earlier reply. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL 2019-06-04 9:01 ` [Xen-devel] " Julien Grall (?) (?) @ 2019-06-04 17:09 ` Stefano Stabellini 2019-06-04 17:22 ` Julien Grall -1 siblings, 1 reply; 43+ messages in thread From: Stefano Stabellini @ 2019-06-04 17:09 UTC (permalink / raw) To: Julien Grall Cc: Stefano Stabellini, osstest service owner, Jan Beulich, xen-devel, Anthony Perard, Ian Jackson On Tue, 4 Jun 2019, Julien Grall wrote: > Hi Jan, > > On 6/4/19 8:06 AM, Jan Beulich wrote: > > > > > On 03.06.19 at 19:15, <anthony.perard@citrix.com> wrote: > > > On Tue, May 21, 2019 at 05:52:12PM +0100, Julien Grall wrote: > > > > The same error cannot be reproduced on laxton*. Looking at the test > > > > history, > > > > it looks like qemu-upstream-4.12-testing flight has run successfully a > > > > few > > > > times on rochester*. So we may have fixed the error in Xen 4.12. > > > > > > > > Potential candidates would be: > > > > - 00c96d7742 "xen/arm: mm: Set-up page permission for Xen mappings > > > > earlier on" > > > > - f60658c6ae "xen/arm: Stop relocating Xen" > > > > > > > > Ian, is it something the bisector could automatically look at? > > > > If not, I will need to find some time and borrow the board to bisect the > > > > issues. > > > > > > I attempted to do that bisection myself, and the first commit that git > > > wanted to try, a common commit to both branches, boots just fine. > > > > Thanks for doing this! > > > > One thing that, for now, completely escapes me: How come the > > main 4.11 branch has progressed fine, but the qemuu one has > > got stalled like this? > > Because Xen on Arm today does not fully respect the Arm Arm when modifying the > page-tables. This may result to TLB conflict and break of coherency. Yes, I follow your reasoning, but it is still quite strange that it only happens with the qemu testing branch. Maybe it is because laxton was picked instead of rochester to run the tests for this branch? 
Otherwise, there must be a difference in the Xen configuration between the normal branch and the qemu testing branch, aside from QEMU of course, that shouldn't make any differences. > > > It turns out that the first commit that fails to boot on rochester is > > > e202feb713 xen/cmdline: Fix buggy strncmp(s, LITERAL, ss - s) construct > > > (even with the "eb8acba82a xen: Fix backport of .." applied) > > > > Now that's particularly odd a regression candidate. It doesn't > > touch any Arm code at all (nor does the fixup commit). And the > > common code changes don't look "risky" either; the one thing that > > jumps out as the most likely of all the unlikely candidates would > > seem to be the xen/common/efi/boot.c change, but if there was > > a problem there then the EFI boot on Arm would be latently > > broken in other ways as well. Plus, of course, you say that the > > same change is no problem on 4.12. > > > > Of course the commit itself could be further "bisected" - all > > changes other than the introduction of cmdline_strcmp() are > > completely independent of one another. > > I think this is just a red-herring. The commit is probably modifying enough > the layout of Xen that TLB conflict will appear. > > Anthony said backporting 00c96d7742 "xen/arm: mm: Set-up page permission for > Xen mappings earlier on" makes staging-4.11 boots. This patch removes some of > the potential cause of TLB conflict. > > I haven't suggested a backport of this patch so far, because there are still > TLB conflict possible within the function modified. It might also be possible > that it exposes more of TLB conflict as more work in Xen is needed (see my > MM-PARTn series). > > I don't know whether backporting this patch is worth it compare to the risk it > introduces. I think we should backport 00c96d7742. We don't need to fix all issues, we only need to make improvements without introducing more bugs. From that standpoint, I think 00c96d7742 is doable. I'll backport it now to 4.11. 
What about the other older staging branches? _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL 2019-06-04 17:09 ` Stefano Stabellini @ 2019-06-04 17:22 ` Julien Grall 2019-06-04 17:39 ` Stefano Stabellini 2019-06-05 10:19 ` Jan Beulich 0 siblings, 2 replies; 43+ messages in thread From: Julien Grall @ 2019-06-04 17:22 UTC (permalink / raw) To: Stefano Stabellini Cc: Anthony Perard, Ian Jackson, osstest service owner, Jan Beulich, xen-devel Hi Stefano, On 6/4/19 6:09 PM, Stefano Stabellini wrote: > On Tue, 4 Jun 2019, Julien Grall wrote: >> Hi Jan, >> >> On 6/4/19 8:06 AM, Jan Beulich wrote: >>>>>> On 03.06.19 at 19:15, <anthony.perard@citrix.com> wrote: >>>> On Tue, May 21, 2019 at 05:52:12PM +0100, Julien Grall wrote: >>>>> The same error cannot be reproduced on laxton*. Looking at the test >>>>> history, >>>>> it looks like qemu-upstream-4.12-testing flight has run successfully a >>>>> few >>>>> times on rochester*. So we may have fixed the error in Xen 4.12. >>>>> >>>>> Potential candidates would be: >>>>> - 00c96d7742 "xen/arm: mm: Set-up page permission for Xen mappings >>>>> earlier on" >>>>> - f60658c6ae "xen/arm: Stop relocating Xen" >>>>> >>>>> Ian, is it something the bisector could automatically look at? >>>>> If not, I will need to find some time and borrow the board to bisect the >>>>> issues. >>>> >>>> I attempted to do that bisection myself, and the first commit that git >>>> wanted to try, a common commit to both branches, boots just fine. >>> >>> Thanks for doing this! >>> >>> One thing that, for now, completely escapes me: How come the >>> main 4.11 branch has progressed fine, but the qemuu one has >>> got stalled like this? >> >> Because Xen on Arm today does not fully respect the Arm Arm when modifying the >> page-tables. This may result to TLB conflict and break of coherency. > > Yes, I follow your reasoning, but it is still quite strange that it only > happens with the qemu testing branch. 
Maybe it is because laxton was > picked instead of rochester to run the tests for this branch? Otherwise, > there must be a difference in the Xen configuration between the normal > branch and the qemu testing branch, aside from QEMU of course, that > shouldn't make any differences. Per the discussion before, the .config is different between the 2 flights. QEMU testing is not selecting CONFIG_LIVEPATCH while staging-4.11 is. > > >>>> It turns out that the first commit that fails to boot on rochester is >>>> e202feb713 xen/cmdline: Fix buggy strncmp(s, LITERAL, ss - s) construct >>>> (even with the "eb8acba82a xen: Fix backport of .." applied) >>> >>> Now that's particularly odd a regression candidate. It doesn't >>> touch any Arm code at all (nor does the fixup commit). And the >>> common code changes don't look "risky" either; the one thing that >>> jumps out as the most likely of all the unlikely candidates would >>> seem to be the xen/common/efi/boot.c change, but if there was >>> a problem there then the EFI boot on Arm would be latently >>> broken in other ways as well. Plus, of course, you say that the >>> same change is no problem on 4.12. >>> >>> Of course the commit itself could be further "bisected" - all >>> changes other than the introduction of cmdline_strcmp() are >>> completely independent of one another. >> >> I think this is just a red-herring. The commit is probably modifying enough >> the layout of Xen that TLB conflict will appear. >> >> Anthony said backporting 00c96d7742 "xen/arm: mm: Set-up page permission for >> Xen mappings earlier on" makes staging-4.11 boots. This patch removes some of >> the potential cause of TLB conflict. >> >> I haven't suggested a backport of this patch so far, because there are still >> TLB conflict possible within the function modified. It might also be possible >> that it exposes more of TLB conflict as more work in Xen is needed (see my >> MM-PARTn series). 
>> >> I don't know whether backporting this patch is worth it compare to the risk it >> introduces. > > I think we should backport 00c96d7742. We don't need to fix all issues, > we only need to make improvements without introducing more bugs. > From that standpoints, I think 00c96d7742 is doable. I'll backport it now to > 4.11. You don't seem to assess/acknowledge any risk I mention in this thread. Note that I am not suggesting to not backport it. I am trying to understand how you came to your conclusion here. > What about the other older stanging branches? The only one we could consider is 4.10, but AFAICT Jan already did cut the last release for it. So I wouldn't consider any backport unless we begin to see the branch failing. Cheers, -- Julien Grall _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL 2019-06-04 17:22 ` Julien Grall @ 2019-06-04 17:39 ` Stefano Stabellini 2019-06-04 17:52 ` Ian Jackson 2019-06-04 20:50 ` Julien Grall 2019-06-05 10:19 ` Jan Beulich 1 sibling, 2 replies; 43+ messages in thread From: Stefano Stabellini @ 2019-06-04 17:39 UTC (permalink / raw) To: Julien Grall Cc: Stefano Stabellini, osstest service owner, Jan Beulich, xen-devel, Anthony Perard, Ian Jackson On Tue, 4 Jun 2019, Julien Grall wrote: > Hi Stefano, > > On 6/4/19 6:09 PM, Stefano Stabellini wrote: > > On Tue, 4 Jun 2019, Julien Grall wrote: > > > Hi Jan, > > > > > > On 6/4/19 8:06 AM, Jan Beulich wrote: > > > > > > > On 03.06.19 at 19:15, <anthony.perard@citrix.com> wrote: > > > > > On Tue, May 21, 2019 at 05:52:12PM +0100, Julien Grall wrote: > > > > > > The same error cannot be reproduced on laxton*. Looking at the test > > > > > > history, > > > > > > it looks like qemu-upstream-4.12-testing flight has run successfully > > > > > > a > > > > > > few > > > > > > times on rochester*. So we may have fixed the error in Xen 4.12. > > > > > > > > > > > > Potential candidates would be: > > > > > > - 00c96d7742 "xen/arm: mm: Set-up page permission for Xen > > > > > > mappings > > > > > > earlier on" > > > > > > - f60658c6ae "xen/arm: Stop relocating Xen" > > > > > > > > > > > > Ian, is it something the bisector could automatically look at? > > > > > > If not, I will need to find some time and borrow the board to bisect > > > > > > the > > > > > > issues. > > > > > > > > > > I attempted to do that bisection myself, and the first commit that git > > > > > wanted to try, a common commit to both branches, boots just fine. > > > > > > > > Thanks for doing this! > > > > > > > > One thing that, for now, completely escapes me: How come the > > > > main 4.11 branch has progressed fine, but the qemuu one has > > > > got stalled like this? 
> > > > > > Because Xen on Arm today does not fully respect the Arm Arm when modifying > > > the > > > page-tables. This may result to TLB conflict and break of coherency. > > > > Yes, I follow your reasoning, but it is still quite strange that it only > > happens with the qemu testing branch. Maybe it is because laxton was > > picked instead of rochester to run the tests for this branch? Otherwise, > > there must be a difference in the Xen configuration between the normal > > branch and the qemu testing branch, aside from QEMU of course, that > > shouldn't make any differences. > > Per the discussion before, the .config is different between the 2 flights. > QEMU testing is not selecting CONFIG_LIVEPATCH while staging-4.11 is. Has anybody tried to start selecting CONFIG_LIVEPATCH in the QEMU testing branch? Is it possible to give it a try? > > > > > It turns out that the first commit that fails to boot on rochester is > > > > > e202feb713 xen/cmdline: Fix buggy strncmp(s, LITERAL, ss - s) > > > > > construct > > > > > (even with the "eb8acba82a xen: Fix backport of .." applied) > > > > > > > > Now that's particularly odd a regression candidate. It doesn't > > > > touch any Arm code at all (nor does the fixup commit). And the > > > > common code changes don't look "risky" either; the one thing that > > > > jumps out as the most likely of all the unlikely candidates would > > > > seem to be the xen/common/efi/boot.c change, but if there was > > > > a problem there then the EFI boot on Arm would be latently > > > > broken in other ways as well. Plus, of course, you say that the > > > > same change is no problem on 4.12. > > > > > > > > Of course the commit itself could be further "bisected" - all > > > > changes other than the introduction of cmdline_strcmp() are > > > > completely independent of one another. > > > > > > I think this is just a red-herring. The commit is probably modifying > > > enough > > > the layout of Xen that TLB conflict will appear. 
> > > Anthony said backporting 00c96d7742 "xen/arm: mm: Set-up page permission > > > for > > > Xen mappings earlier on" makes staging-4.11 boots. This patch removes some > > > of > > > the potential cause of TLB conflict. > > > > > > I haven't suggested a backport of this patch so far, because there are > > > still > > > TLB conflict possible within the function modified. It might also be > > > possible > > > that it exposes more of TLB conflict as more work in Xen is needed (see my > > > MM-PARTn series). > > > > > > I don't know whether backporting this patch is worth it compare to the > > > risk it > > > introduces. > > > > I think we should backport 00c96d7742. We don't need to fix all issues, > > we only need to make improvements without introducing more bugs. > > From that standpoints, I think 00c96d7742 is doable. I'll backport it now to > > 4.11. > > You don't seem to assess/acknowledge any risk I mention in this thread. > > Note that I am not suggesting to not backport it. I am trying to understand > how you came to your conclusion here. Based on the fact that, by code inspection, the patch should reduce risk in terms of Arm Arm violations, which is consistent with the fact that Anthony found it "fixing" the regression. Do you foresee cases where the patch increases the risk of failure? > > What about the other older stanging branches? > > The only one we could consider is 4.10, but AFAICT Jan already did cut the > last release for it. > > So I wouldn't consider any backport unless we begin to see the branch failing. If Jan already made the last release for 4.10, then there is little point in backporting it there. However, it is not ideal to have something like 00c96d7742 in some still-maintained staging branches but not all. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL 2019-06-04 17:39 ` Stefano Stabellini @ 2019-06-04 17:52 ` Ian Jackson 2019-06-04 18:03 ` Stefano Stabellini 0 siblings, 1 reply; 43+ messages in thread From: Ian Jackson @ 2019-06-04 17:52 UTC (permalink / raw) To: Stefano Stabellini; +Cc: Anthony Perard, xen-devel, Julien Grall, Jan Beulich Stefano Stabellini writes ("Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL"): > On Tue, 4 Jun 2019, Julien Grall wrote: > > Per the discussion before, the .config is different between the 2 flights. > > QEMU testing is not selecting CONFIG_LIVEPATCH while staging-4.11 is. > > Has anybody tried to start selecting CONFIG_LIVEPATCH in the QEMU testing > branch? Is it possible to give it a try? I can do this if we think it's desirable. But I think it is probably actually helpful to test both, just in case non-LIVEPATCH breaks. As it just has. AIUI this is thought to be quite a rare problem, so it showing up in a qemu branch is OK. Otherwise maybe we would have to add both with- and without-LIVEPATCH tests to the xen-* flights. We already have both with- and without-XSM, and this would add another dimension to the build matrix. And we would have to decide what subset of the tests should be run in each configuration. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL 2019-06-04 17:52 ` Ian Jackson @ 2019-06-04 18:03 ` Stefano Stabellini 2019-06-04 18:27 ` Ian Jackson 0 siblings, 1 reply; 43+ messages in thread From: Stefano Stabellini @ 2019-06-04 18:03 UTC (permalink / raw) To: Ian Jackson Cc: Anthony Perard, xen-devel, Julien Grall, Stefano Stabellini, Jan Beulich On Tue, 4 Jun 2019, Ian Jackson wrote: > Stefano Stabellini writes ("Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL"): > > On Tue, 4 Jun 2019, Julien Grall wrote: > > > Per the discussion before, the .config is different between the 2 flights. > > > QEMU testing is not selecting CONFIG_LIVEPATCH while staging-4.11 is. > > > > Has anybody tried to start selecting CONFIG_LIVEPATCH in the QEMU testing > > branch? Is it possible to give it a try? > > I can do this we thinks it's desirable. But I think it is probably > actually helpful to test both, just in case non-LIVEPATCH breaks. As > it just have. > > AIUI this is thought to be quite a rare problem, so it showing up in a > qemu branch is OK. > > Otherwise maybe we would have to add both with- and without-LIVEPATCH > tests to the xen-* flights. We already have both with- and > without-XSM, and this would add another dimension to the build matrix. > And we would have to decide what subset of the tests should be run in > each configuration. Hi Ian, I agree with you it would be desirable to test both LIVEPATCH and non-LIVEPATCH, and I understand about limitation of resources and test matrix explosion. Given the chance, I think it would be better if we had an explicit test about LIVEPATCH rather than a "hidden" enablement of it within another different test. Or maybe just call it out explicitly, renaming the test run to qemu-upstream-livepatch or something like that. In any case, I'll leave it to you. 
_______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL 2019-06-04 18:03 ` Stefano Stabellini @ 2019-06-04 18:27 ` Ian Jackson 2019-06-04 18:53 ` Stefano Stabellini 0 siblings, 1 reply; 43+ messages in thread From: Ian Jackson @ 2019-06-04 18:27 UTC (permalink / raw) To: Stefano Stabellini; +Cc: Anthony Perard, xen-devel, Julien Grall, Jan Beulich Stefano Stabellini writes ("Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL"): > I agree with you it would be desirable to test both LIVEPATCH and > non-LIVEPATCH, and I understand about limitation of resources and test > matrix explosion. > > Given the chance, I think it would be better if we had an explicit test > about LIVEPATCH rather than a "hidden" enablement of it within another > different test. Or maybe just call it out explicitly, renaming the test > run to qemu-upstream-livepatch or something like that. In any case, I'll > leave it to you. I think maybe you have misunderstood ? The thing that triggers this bug, here, is *compiling* Xen with CONFIG_LIVEPATCH *disabled*. So, in fact, if it is a hidden anything, it is a hidden *dis*ablement of a feature which is deliberately only compiled in, and only tested on, tests of the xen-* branches. That *disabling* this feature would cause a regression is surprising, and I think this is only the case because Xen only works by accident on these boxes ? (Considering the discussion of ARM ARM violations.) To make it an "explicit" test as you suggest would involve compiling Xen an additional time. I guess that would actually be changing some tests on xen-* branches to a version of Xen compiled *without* livepatch. Right now we build most other branches Xen amd64 with XSM no livepatch Xen armhf no XSM no livepatch Xen arm64 with XSM no livepatch xen-* branches Xen amd64 with XSM with livepatch Xen armhf no XSM with livepatch Xen arm64 with XSM with livepatch What without-livepatch build should be added to the xen-* branches ? 
And in which tests should it replace the existing with-livepatch builds ? Should I just pick one or two apparently at random ? NB that I doubt the livepatch maintainers have much of an opinion here. We would normally expect that compiling in livepatching might break something but that compiling it out would be fine. So the current situation is good from that point of view and we might even worry that changing some of the existing tests to not have livepatching compiled in might miss some actual livepatch-related bugs. My normal practice is to try to enable as much as is relevant and might break things. But what we have here is *not* a livepatch-related bug. It has nothing to do with livepatch. It is just that by luck, compiling Xen *with* livepatching somehow masks the random failure, presumably by changing exact orderings and timings of memory accesses etc. Does that make sense ? Thanks, Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL 2019-06-04 18:27 ` Ian Jackson @ 2019-06-04 18:53 ` Stefano Stabellini 0 siblings, 0 replies; 43+ messages in thread From: Stefano Stabellini @ 2019-06-04 18:53 UTC (permalink / raw) To: Ian Jackson Cc: lars.kurth, Stefano Stabellini, Julien Grall, Jan Beulich, Anthony Perard, xen-devel On Tue, 4 Jun 2019, Ian Jackson wrote: > Stefano Stabellini writes ("Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL"): > > I agree with you it would be desirable to test both LIVEPATCH and > > non-LIVEPATCH, and I understand about limitation of resources and test > > matrix explosion. > > > > Given the chance, I think it would be better if we had an explicit test > > about LIVEPATCH rather than a "hidden" enablement of it within another > > different test. Or maybe just call it out explicitly, renaming the test > > run to qemu-upstream-livepatch or something like that. In any case, I'll > > leave it to you. > > I think maybe you have misunderstood ? > > The thing that triggers this bug, here, is *compiling* Xen with > CONFIG_LIVEPATCH *disabled*. I followed, but I mistyped inverting the condition. > So, in fact, if it is a hidden anything, it is a hidden *dis*ablement > of a feature which is deliberately only compiled in, and only tested > on, tests of the xen-* branches. > > That *disabling* this feature would cause a regression is surprising, > and I think this is only the case because Xen only works by accident > on these boxes ? (Considering the discussion of ARM ARM violations.) Yes, that is the current thinking. > To make it an "explicit" test as you suggest would involve compiling > Xen an additional time. I guess that would actually be changing some > tests on xen-* branches to a version of Xen compiled *without* > livepatch. 
Right now we build > > most other branches > Xen amd64 with XSM no livepatch > Xen armhf no XSM no livepatch > Xen arm64 with XSM no livepatch > > xen-* branches > Xen amd64 with XSM with livepatch > Xen armhf no XSM with livepatch > Xen arm64 with XSM with livepatch > > What without-livepatch build should be added to the xen-* branches ? > And in which tests should it replace the existing with-livepatch > builds ? Should I just pick one or two apparently at random ? > > NB that I doubt the livepatch maintainers have much of an opinion > here. We would normally expect that compiling in livepatching might > break something but that compiling it out would be fine. So the > current situation is good from that point of view and we might even > worry that changing some of the existing tests to not have > livepatching compiled in might miss some actual livepatch-related > bugs. My normal practice is to try to enable as much as is relevant > and might break things. I think it is a good practice in general, especially if we only have the resources for one type of test. My point is that differences in the kconfig (except maybe for drivers such as UARTs) can have a significant impact either directly or indirectly, as in this case. The problem will only get worse as more kconfig options are introduced. We cannot test all possible combinations. However, I think different kconfigs deserve to be called out explicitly in the tests. This is what I was trying to say. Maybe we can pick 2 or 3 "interesting" Xen kconfigs and run tests for them. But of course this is predicated on hardware and resource availability that we might not have. Specifically in your matrix above, maybe: xen-* branches Xen amd64 kconfig_1 Xen amd64 kconfig_2 Xen armhf kconfig_1 Xen arm64 kconfig_1 Xen arm64 kconfig_2 where kconfig_1 has as few options as possible enabled (no XSM, no LIVEPATCH) and kconfig_2 has as many options as possible enabled (both XSM and LIVEPATCH). 
Note that I only added kconfig_1 to the armhf line because it doesn't look like a good idea to run both on arm32. One day it would be great to add a kconfig_3 with a hand-picked set of options, and maybe more (kconfig_4, maybe a random kconfig, etc.). The other branches ideally would follow the same pattern. If we don't have enough resources, they could run with kconfig_1 or kconfig_2 only. Funnily enough, we discussed something very similar just this morning in the FuSa Call because we'll need a special kconfig for safety certifications to be tested. It might end up looking very much like kconfig_1 (CC'ing Lars here to connect the dots.) > But what we have here is *not* a livepatch-related bug. It has > nothing to do with livepatch. It is just that by luck, compiling Xen > *with* livepatching somehow masks the random failure, presumably by > changing exact orderings and timings of memory accesses etc. > > Does that make sense ? Yes, I got it.
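[Editor's note: for concreteness, the kconfig_1/kconfig_2 split sketched above might correspond to Xen `.config` fragments like the following. This is purely illustrative: the thread only names XSM and LIVEPATCH, so only those two options are shown; a real minimal or maximal configuration would toggle many more options.]

```
# kconfig_1: as few options as possible enabled
# CONFIG_XSM is not set
# CONFIG_LIVEPATCH is not set

# kconfig_2: as many options as possible enabled
CONFIG_XSM=y
CONFIG_LIVEPATCH=y
```

On a Xen tree with Kconfig support, such a fragment would be dropped into xen/.config and completed with the usual Kconfig machinery before building.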
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL 2019-06-04 17:39 ` Stefano Stabellini 2019-06-04 17:52 ` Ian Jackson @ 2019-06-04 20:50 ` Julien Grall 2019-06-04 23:11 ` Stefano Stabellini 1 sibling, 1 reply; 43+ messages in thread From: Julien Grall @ 2019-06-04 20:50 UTC (permalink / raw) To: Stefano Stabellini Cc: Anthony Perard, Ian Jackson, osstest service owner, Jan Beulich, xen-devel On 6/4/19 6:39 PM, Stefano Stabellini wrote: > On Tue, 4 Jun 2019, Julien Grall wrote: >> Hi Stefano, >> >> On 6/4/19 6:09 PM, Stefano Stabellini wrote: >>> On Tue, 4 Jun 2019, Julien Grall wrote: >>>> Hi Jan, >>>> >>>> On 6/4/19 8:06 AM, Jan Beulich wrote: >>>>>>>> On 03.06.19 at 19:15, <anthony.perard@citrix.com> wrote: >>>>>> On Tue, May 21, 2019 at 05:52:12PM +0100, Julien Grall wrote: >>>>>>> The same error cannot be reproduced on laxton*. Looking at the test >>>>>>> history, >>>>>>> it looks like qemu-upstream-4.12-testing flight has run successfully >>>>>>> a >>>>>>> few >>>>>>> times on rochester*. So we may have fixed the error in Xen 4.12. >>>>>>> >>>>>>> Potential candidates would be: >>>>>>> - 00c96d7742 "xen/arm: mm: Set-up page permission for Xen >>>>>>> mappings >>>>>>> earlier on" >>>>>>> - f60658c6ae "xen/arm: Stop relocating Xen" >>>>>>> >>>>>>> Ian, is it something the bisector could automatically look at? >>>>>>> If not, I will need to find some time and borrow the board to bisect >>>>>>> the >>>>>>> issues. >>>>>> >>>>>> I attempted to do that bisection myself, and the first commit that git >>>>>> wanted to try, a common commit to both branches, boots just fine. >>>>> >>>>> Thanks for doing this! >>>>> >>>>> One thing that, for now, completely escapes me: How come the >>>>> main 4.11 branch has progressed fine, but the qemuu one has >>>>> got stalled like this? >>>> >>>> Because Xen on Arm today does not fully respect the Arm Arm when modifying >>>> the >>>> page-tables. This may result to TLB conflict and break of coherency. 
>>> >>> Yes, I follow your reasoning, but it is still quite strange that it only >>> happens with the qemu testing branch. Maybe it is because laxton was >>> picked instead of rochester to run the tests for this branch? Otherwise, >>> there must be a difference in the Xen configuration between the normal >>> branch and the qemu testing branch, aside from QEMU of course, that >>> shouldn't make any differences. >> >> Per the discussion before, the .config is different between the 2 flights. >> QEMU testing is not selecting CONFIG_LIVEPATCH while staging-4.11 is. > > Has anybody tried to start selecting CONFIG_LIVEPATCH in the QEMU testing > branch? Is it possible to give it a try? I don't know, and I am not sure how that would help: it is pretty clear that backporting 00c96d7742 "xen/arm: mm: Set-up page permission for Xen mappings earlier on" is actually going to help booting. So it is very unlikely that CONFIG_LIVEPATCH is the problem. > > >>>>>> It turns out that the first commit that fails to boot on rochester is >>>>>> e202feb713 xen/cmdline: Fix buggy strncmp(s, LITERAL, ss - s) >>>>>> construct >>>>>> (even with the "eb8acba82a xen: Fix backport of .." applied) >>>>> >>>>> Now that's particularly odd a regression candidate. It doesn't >>>>> touch any Arm code at all (nor does the fixup commit). And the >>>>> common code changes don't look "risky" either; the one thing that >>>>> jumps out as the most likely of all the unlikely candidates would >>>>> seem to be the xen/common/efi/boot.c change, but if there was >>>>> a problem there then the EFI boot on Arm would be latently >>>>> broken in other ways as well. Plus, of course, you say that the >>>>> same change is no problem on 4.12. >>>>> >>>>> Of course the commit itself could be further "bisected" - all >>>>> changes other than the introduction of cmdline_strcmp() are >>>>> completely independent of one another. >>>> >>>> I think this is just a red-herring. 
The commit is probably modifying >>>> enough >>>> the layout of Xen that TLB conflict will appear. >>>> >>>> Anthony said backporting 00c96d7742 "xen/arm: mm: Set-up page permission >>>> for >>>> Xen mappings earlier on" makes staging-4.11 boots. This patch removes some >>>> of >>>> the potential cause of TLB conflict. >>>> >>>> I haven't suggested a backport of this patch so far, because there are >>>> still >>>> TLB conflict possible within the function modified. It might also be >>>> possible >>>> that it exposes more of TLB conflict as more work in Xen is needed (see my >>>> MM-PARTn series). >>>> >>>> I don't know whether backporting this patch is worth it compare to the >>>> risk it >>>> introduces. >>> >>> I think we should backport 00c96d7742. We don't need to fix all issues, >>> we only need to make improvements without introducing more bugs. >>> From that standpoints, I think 00c96d7742 is doable. I'll backport it now to >>> 4.11. >> >> You don't seem to assess/acknowledge any risk I mention in this thread. >> >> Note that I am not suggesting to not backport it. I am trying to understand >> how you came to your conclusion here. > > Based on the fact that by code inspection the patch should be risk > decremental in terms of Arm Arm violations, which is consistent with the > fact that Anthony found it "fixing" the regression. Do you foresee cases > where the patch increments the risk of failure? Well yes and no. I guess you haven't read what I wrote on the separate thread. Yes, two potential sources of TLB conflict are removed by avoiding replacing 4KB entries with a 2MB block entry (and vice versa) without respecting the Break-Before-Make sequence. No, this patch introduces another source of TLB conflict if the processor is caching intermediate translations (this is implementation defined). 
The bug reported by osstest actually taught me that even if Xen may boot today on a given platform, this may not be the case tomorrow, because of a slight change in code ordering (and therefore memory accesses). /!\ Below is my interpretation and does not imply I am correct ;) However, such Arm Arm violations are mostly gathered around boot and shouldn't affect runtime. IOW, Xen would stop booting on those platforms rather than becoming unreliable. So it would not be too bad. /!\ End We just have to be aware of the risk we are taking with backporting the patch. >>> What about the other older staging branches? >> >> The only one we could consider is 4.10, but AFAICT Jan already did cut the >> last release for it. >> >> So I wouldn't consider any backport unless we begin to see the branch failing. > > If Jan already made the last release for 4.10, then little point in > backporting it to it. However, it is not ideal to have something like > 00c96d7742 in some still-maintained staging branches but not all. Cheers, -- Julien Grall
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL 2019-06-04 20:50 ` Julien Grall @ 2019-06-04 23:11 ` Stefano Stabellini 2019-06-05 10:59 ` Julien Grall 0 siblings, 1 reply; 43+ messages in thread From: Stefano Stabellini @ 2019-06-04 23:11 UTC (permalink / raw) To: Julien Grall Cc: Stefano Stabellini, osstest service owner, Jan Beulich, xen-devel, Anthony Perard, Ian Jackson On Tue, 4 Jun 2019, Julien Grall wrote: > On 6/4/19 6:39 PM, Stefano Stabellini wrote: > > On Tue, 4 Jun 2019, Julien Grall wrote: > > > Hi Stefano, > > > > > > On 6/4/19 6:09 PM, Stefano Stabellini wrote: > > > > On Tue, 4 Jun 2019, Julien Grall wrote: > > > > > Hi Jan, > > > > > > > > > > On 6/4/19 8:06 AM, Jan Beulich wrote: > > > > > > > > > On 03.06.19 at 19:15, <anthony.perard@citrix.com> wrote: > > > > > > > On Tue, May 21, 2019 at 05:52:12PM +0100, Julien Grall wrote: > > > > > > > > The same error cannot be reproduced on laxton*. Looking at the > > > > > > > > test > > > > > > > > history, > > > > > > > > it looks like qemu-upstream-4.12-testing flight has run > > > > > > > > successfully > > > > > > > > a > > > > > > > > few > > > > > > > > times on rochester*. So we may have fixed the error in Xen 4.12. > > > > > > > > > > > > > > > > Potential candidates would be: > > > > > > > > - 00c96d7742 "xen/arm: mm: Set-up page permission for Xen > > > > > > > > mappings > > > > > > > > earlier on" > > > > > > > > - f60658c6ae "xen/arm: Stop relocating Xen" > > > > > > > > > > > > > > > > Ian, is it something the bisector could automatically look at? > > > > > > > > If not, I will need to find some time and borrow the board to > > > > > > > > bisect > > > > > > > > the > > > > > > > > issues. > > > > > > > > > > > > > > I attempted to do that bisection myself, and the first commit that > > > > > > > git > > > > > > > wanted to try, a common commit to both branches, boots just fine. > > > > > > > > > > > > Thanks for doing this! 
> > > > > > > > > > > > One thing that, for now, completely escapes me: How come the > > > > > > main 4.11 branch has progressed fine, but the qemuu one has > > > > > > got stalled like this? > > > > > > > > > > Because Xen on Arm today does not fully respect the Arm Arm when > > > > > modifying > > > > > the > > > > > page-tables. This may result to TLB conflict and break of coherency. > > > > > > > > Yes, I follow your reasoning, but it is still quite strange that it only > > > > happens with the qemu testing branch. Maybe it is because laxton was > > > > picked instead of rochester to run the tests for this branch? Otherwise, > > > > there must be a difference in the Xen configuration between the normal > > > > branch and the qemu testing branch, aside from QEMU of course, that > > > > shouldn't make any differences. > > > > > > Per the discussion before, the .config is different between the 2 flights. > > > QEMU testing is not selecting CONFIG_LIVEPATCH while staging-4.11 is. > > > > Has anybody tried to start selecting CONFIG_LIVEPATCH in the QEMU testing > > branch? Is it possible to give it a try? > > I don't know and I am not sure how this would help here it is pretty clear > that backporting 00c96d7742 "xen/arm: mm: Set-up page permission for Xen > mappings earlier on" is actually going to help booting. > > So it is very unlikely that CONFIG_LIVEPATCH is the problem. I am not blaming CONFIG_LIVEPATCH at all. If we decide that we don't want to backport 00c96d7742 for one reason or the other, and basically we cannot fix this bug, enabling CONFIG_LIVEPATCH would probably unblock the CI-loop (it would be nice to be sure about it). Let's keep in mind that we always had this bug -- the next 4.11 release is not going to be any more broken than the previous 4.11 release if we don't fix this issue, unless you think we backported something that affected the underlying problem, making it worse. Note that I am not advocating for leaving this bug unfixed. 
I am only suggesting that if we decide it is too risky to backport 00c96d7742 and we don't know what else to do, it would be good to have a way to unblock 4.11 without having to force-push it. Let's settle the discussion below first. > > > > > > > It turns out that the first commit that fails to boot on rochester > > > > > > > is > > > > > > > e202feb713 xen/cmdline: Fix buggy strncmp(s, LITERAL, ss - s) > > > > > > > construct > > > > > > > (even with the "eb8acba82a xen: Fix backport of .." applied) > > > > > > > > > > > > Now that's particularly odd a regression candidate. It doesn't > > > > > > touch any Arm code at all (nor does the fixup commit). And the > > > > > > common code changes don't look "risky" either; the one thing that > > > > > > jumps out as the most likely of all the unlikely candidates would > > > > > > seem to be the xen/common/efi/boot.c change, but if there was > > > > > > a problem there then the EFI boot on Arm would be latently > > > > > > broken in other ways as well. Plus, of course, you say that the > > > > > > same change is no problem on 4.12. > > > > > > > > > > > > Of course the commit itself could be further "bisected" - all > > > > > > changes other than the introduction of cmdline_strcmp() are > > > > > > completely independent of one another. > > > > > > > > > > I think this is just a red-herring. The commit is probably modifying > > > > > enough > > > > > the layout of Xen that TLB conflict will appear. > > > > > > > > > > Anthony said backporting 00c96d7742 "xen/arm: mm: Set-up page > > > > > permission > > > > > for > > > > > Xen mappings earlier on" makes staging-4.11 boots. This patch removes > > > > > some > > > > > of > > > > > the potential cause of TLB conflict. > > > > > > > > > > I haven't suggested a backport of this patch so far, because there are > > > > > still > > > > > TLB conflict possible within the function modified. 
It might also be > > > > > possible > > > > > that it exposes more of TLB conflict as more work in Xen is needed > > > > > (see my > > > > > MM-PARTn series). > > > > > > > > > > I don't know whether backporting this patch is worth it compare to the > > > > > risk it > > > > > introduces. > > > > > > > > I think we should backport 00c96d7742. We don't need to fix all issues, > > > > we only need to make improvements without introducing more bugs. > > > > From that standpoints, I think 00c96d7742 is doable. I'll backport it > > > > now to > > > > 4.11. > > > > > > You don't seem to assess/acknowledge any risk I mention in this thread. > > > > > > Note that I am not suggesting to not backport it. I am trying to > > > understand > > > how you came to your conclusion here. > > > > Based on the fact that by code inspection the patch should be risk > > decremental in terms of Arm Arm violations, which is consistent with the > > fact that Anthony found it "fixing" the regression. Do you foresee cases > > where the patch increments the risk of failure? > > Well yes and no. I guess you haven't read what I wrote on the separate thread. I missed it > Yes, two potential source of TLB conflict is removed by avoiding replacing 4KB > entries with 2MB block entry (and vice versa) without respecting the > Break-Before-Make. This is clear > No, this patch introducing another source of TLB conflict if the processor is > caching intermediate translation (this is implementation defined). By "another source of TLB conflict" are you referring to something new that wasn't there before? Or are you referring to the fact that still we are not following the proper sequence to update the Xen pagetable? If you are referring to the latter, wouldn't it be reasonable to say that such a problem could have happened also before 00c96d7742? 
> The bug reported by osstest actually taught me that even if Xen may boot today > on a given platform, this may not be the case tomorrow because of the slight > change in the code ordering (and therefore memory access). > > /!\ Below is my interpretation and does not imply I am correct ;) > > However, such Arm Arm violations are mostly gathered around boot and shouldn't > affect runtime. IOW, Xen would stop booting on those platforms rather than > making unrealiable. So it would not be too bad. > > /!\ End > > We just have to be aware of the risk we are taking with backporting the patch. What you wrote here seems to make sense, but I would like to understand the problem mentioned earlier a bit better. > > > > What about the other older staging branches? > > > > > > The only one we could consider is 4.10, but AFAICT Jan already did cut the > > > last release for it. > > > > > > So I wouldn't consider any backport unless we begin to see the branch > > > failing. > > > > If Jan already made the last release for 4.10, then little point in > > backporting it to it. However, it is not ideal to have something like > > 00c96d7742 in some still-maintained staging branches but not all.
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL 2019-06-04 23:11 ` Stefano Stabellini @ 2019-06-05 10:59 ` Julien Grall 2019-06-05 20:29 ` Stefano Stabellini 0 siblings, 1 reply; 43+ messages in thread From: Julien Grall @ 2019-06-05 10:59 UTC (permalink / raw) To: Stefano Stabellini Cc: Anthony Perard, Ian Jackson, osstest service owner, Jan Beulich, xen-devel Hi Stefano, On 05/06/2019 00:11, Stefano Stabellini wrote: > On Tue, 4 Jun 2019, Julien Grall wrote: >> On 6/4/19 6:39 PM, Stefano Stabellini wrote: >>> On Tue, 4 Jun 2019, Julien Grall wrote: >>>> Hi Stefano, >>>> >>>> On 6/4/19 6:09 PM, Stefano Stabellini wrote: >>>>> On Tue, 4 Jun 2019, Julien Grall wrote: >>>>>> Hi Jan, >>>>>> >>>>>> On 6/4/19 8:06 AM, Jan Beulich wrote: >>>>>>>>>> On 03.06.19 at 19:15, <anthony.perard@citrix.com> wrote: >>>>>>>> On Tue, May 21, 2019 at 05:52:12PM +0100, Julien Grall wrote: >>>>>>>>> The same error cannot be reproduced on laxton*. Looking at the >>>>>>>>> test >>>>>>>>> history, >>>>>>>>> it looks like qemu-upstream-4.12-testing flight has run >>>>>>>>> successfully >>>>>>>>> a >>>>>>>>> few >>>>>>>>> times on rochester*. So we may have fixed the error in Xen 4.12. >>>>>>>>> >>>>>>>>> Potential candidates would be: >>>>>>>>> - 00c96d7742 "xen/arm: mm: Set-up page permission for Xen >>>>>>>>> mappings >>>>>>>>> earlier on" >>>>>>>>> - f60658c6ae "xen/arm: Stop relocating Xen" >>>>>>>>> >>>>>>>>> Ian, is it something the bisector could automatically look at? >>>>>>>>> If not, I will need to find some time and borrow the board to >>>>>>>>> bisect >>>>>>>>> the >>>>>>>>> issues. >>>>>>>> >>>>>>>> I attempted to do that bisection myself, and the first commit that >>>>>>>> git >>>>>>>> wanted to try, a common commit to both branches, boots just fine. >>>>>>> >>>>>>> Thanks for doing this! 
>>>>>>> >>>>>>> One thing that, for now, completely escapes me: How come the >>>>>>> main 4.11 branch has progressed fine, but the qemuu one has >>>>>>> got stalled like this? >>>>>> >>>>>> Because Xen on Arm today does not fully respect the Arm Arm when >>>>>> modifying >>>>>> the >>>>>> page-tables. This may result to TLB conflict and break of coherency. >>>>> >>>>> Yes, I follow your reasoning, but it is still quite strange that it only >>>>> happens with the qemu testing branch. Maybe it is because laxton was >>>>> picked instead of rochester to run the tests for this branch? Otherwise, >>>>> there must be a difference in the Xen configuration between the normal >>>>> branch and the qemu testing branch, aside from QEMU of course, that >>>>> shouldn't make any differences. >>>> >>>> Per the discussion before, the .config is different between the 2 flights. >>>> QEMU testing is not selecting CONFIG_LIVEPATCH while staging-4.11 is. >>> >>> Has anybody tried to start selecting CONFIG_LIVEPATCH in the QEMU testing >>> branch? Is it possible to give it a try? >> >> I don't know and I am not sure how this would help here it is pretty clear >> that backporting 00c96d7742 "xen/arm: mm: Set-up page permission for Xen >> mappings earlier on" is actually going to help booting. >> >> So it is very unlikely that CONFIG_LIVEPATCH is the problem. > > I am not blaming CONFIG_LIVEPATCH at all. If we decide that we don't > want to backport 00c96d7742 for one reason or the other, and basically > we cannot fix this bug, enabling CONFIG_LIVEPATCH would probably unblock > the CI-loop (it would be nice to be sure about it). Let's keep in mind > that we always had this bug -- the next 4.11 release is not going to be > any more broken than the previous 4.11 release if we don't fix this > issue, unless you think we backported something that affected the > underlying problem, making it worse. > > Note that I am not advocating for leaving this bug unfixed. 
I am only > suggesting that if we decide it is too risky to backport 00c96d7742 and > we don't know what else to do, it would be good to have a way to unblock > 4.11 without having to force-push it. Let's settle the discussion below > first. One way to unblock is not testing 4.11 (or just this flight) on Thunder-X. [...] >> No, this patch introducing another source of TLB conflict if the processor is >> caching intermediate translation (this is implementation defined). > > By "another source of TLB conflict" are you referring to something new > that wasn't there before? Or are you referring to the fact that still we > are not following the proper sequence to update the Xen pagetable? If > you are referring to the latter, wouldn't it be reasonable to say that > such a problem could have happened also before 00c96d7742? It exists, but in a different form. I can't tell whether this is bad or not because the re-ordering of the code (and therefore memory access) will affect how TLBs are used. So it is a bit of gambling here. >> The bug reported by osstest actually taught me that even if Xen may boot today >> on a given platform, this may not be the case tomorrow because of the slight >> change in the code ordering (and therefore memory access). >> >> /!\ Below is my interpretation and does not imply I am correct ;) >> >> However, such Arm Arm violations are mostly gathered around boot and shouldn't >> affect runtime. IOW, Xen would stop booting on those platforms rather than >> making unrealiable. So it would not be too bad. >> >> /!\ End >> >> We just have to be aware of the risk we are taking with backporting the patch. > > What you wrote here seems to make sense but I would like to understand > the problem mentioned earlier a bit better > > >>>>> What about the other older staging branches? >>>> >>>> The only one we could consider is 4.10, but AFAICT Jan already did cut the >>>> last release for it. 
>>>> >>>> So I wouldn't consider any backport unless we begin to see the branch >>>> failing. >>> >>> If Jan already made the last release for 4.10, then little point in >>> backporting it to it. However, it is not ideal to have something like >>> 00c96d7742 in some still-maintained staging branches but not all. Jan pointed out it is not yet released. However, we haven't had any problem reports (aside from the Arm Arm violation) with Xen 4.10 to date. So I would rather avoid such a backport in a final point release, as we risk making it more broken than it is today. I find this acceptable for Xen 4.11 because the backport has been proven to help. We also still have point releases afterwards if this goes wrong. Cheers, -- Julien Grall
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL 2019-06-05 10:59 ` Julien Grall @ 2019-06-05 20:29 ` Stefano Stabellini 2019-06-05 21:38 ` Julien Grall 0 siblings, 1 reply; 43+ messages in thread From: Stefano Stabellini @ 2019-06-05 20:29 UTC (permalink / raw) To: Julien Grall Cc: Stefano Stabellini, osstest service owner, Jan Beulich, xen-devel, Anthony Perard, Ian Jackson On Wed, 5 Jun 2019, Julien Grall wrote: > Hi Stefano, > > On 05/06/2019 00:11, Stefano Stabellini wrote: > > On Tue, 4 Jun 2019, Julien Grall wrote: > > > On 6/4/19 6:39 PM, Stefano Stabellini wrote: > > > > On Tue, 4 Jun 2019, Julien Grall wrote: > > > > > Hi Stefano, > > > > > > > > > > On 6/4/19 6:09 PM, Stefano Stabellini wrote: > > > > > > On Tue, 4 Jun 2019, Julien Grall wrote: > > > > > > > Hi Jan, > > > > > > > > > > > > > > On 6/4/19 8:06 AM, Jan Beulich wrote: > > > > > > > > > > > On 03.06.19 at 19:15, <anthony.perard@citrix.com> wrote: > > > > > > > > > On Tue, May 21, 2019 at 05:52:12PM +0100, Julien Grall wrote: > > > > > > > > > > The same error cannot be reproduced on laxton*. Looking at > > > > > > > > > > the > > > > > > > > > > test > > > > > > > > > > history, > > > > > > > > > > it looks like qemu-upstream-4.12-testing flight has run > > > > > > > > > > successfully > > > > > > > > > > a > > > > > > > > > > few > > > > > > > > > > times on rochester*. So we may have fixed the error in Xen > > > > > > > > > > 4.12. > > > > > > > > > > > > > > > > > > > > Potential candidates would be: > > > > > > > > > > - 00c96d7742 "xen/arm: mm: Set-up page permission for > > > > > > > > > > Xen > > > > > > > > > > mappings > > > > > > > > > > earlier on" > > > > > > > > > > - f60658c6ae "xen/arm: Stop relocating Xen" > > > > > > > > > > > > > > > > > > > > Ian, is it something the bisector could automatically look > > > > > > > > > > at? 
> > > > > > > > > > If not, I will need to find some time and borrow the board > > > > > > > > > > to > > > > > > > > > > bisect > > > > > > > > > > the > > > > > > > > > > issues. > > > > > > > > > > > > > > > > > > I attempted to do that bisection myself, and the first commit > > > > > > > > > that > > > > > > > > > git > > > > > > > > > wanted to try, a common commit to both branches, boots just > > > > > > > > > fine. > > > > > > > > > > > > > > > > Thanks for doing this! > > > > > > > > > > > > > > > > One thing that, for now, completely escapes me: How come the > > > > > > > > main 4.11 branch has progressed fine, but the qemuu one has > > > > > > > > got stalled like this? > > > > > > > > > > > > > > Because Xen on Arm today does not fully respect the Arm Arm when > > > > > > > modifying > > > > > > > the > > > > > > > page-tables. This may result to TLB conflict and break of > > > > > > > coherency. > > > > > > > > > > > > Yes, I follow your reasoning, but it is still quite strange that it > > > > > > only > > > > > > happens with the qemu testing branch. Maybe it is because laxton was > > > > > > picked instead of rochester to run the tests for this branch? > > > > > > Otherwise, > > > > > > there must be a difference in the Xen configuration between the > > > > > > normal > > > > > > branch and the qemu testing branch, aside from QEMU of course, that > > > > > > shouldn't make any differences. > > > > > > > > > > Per the discussion before, the .config is different between the 2 > > > > > flights. > > > > > QEMU testing is not selecting CONFIG_LIVEPATCH while staging-4.11 is. > > > > > > > > Has anybody tried to start selecting CONFIG_LIVEPATCH in the QEMU > > > > testing > > > > branch? Is it possible to give it a try? > > > > > > I don't know and I am not sure how this would help here it is pretty clear > > > that backporting 00c96d7742 "xen/arm: mm: Set-up page permission for Xen > > > mappings earlier on" is actually going to help booting. 
> > > So it is very unlikely that CONFIG_LIVEPATCH is the problem.
> >
> > I am not blaming CONFIG_LIVEPATCH at all. If we decide that we don't
> > want to backport 00c96d7742 for one reason or the other, and basically
> > we cannot fix this bug, enabling CONFIG_LIVEPATCH would probably unblock
> > the CI-loop (it would be nice to be sure about it). Let's keep in mind
> > that we always had this bug -- the next 4.11 release is not going to be
> > any more broken than the previous 4.11 release if we don't fix this
> > issue, unless you think we backported something that affected the
> > underlying problem, making it worse.
> >
> > Note that I am not advocating for leaving this bug unfixed. I am only
> > suggesting that if we decide it is too risky to backport 00c96d7742 and
> > we don't know what else to do, it would be good to have a way to unblock
> > 4.11 without having to force-push it. Let's settle the discussion below
> > first.
>
> One way to unblock is not testing 4.11 (or just this flight) on Thunder-X.

Yeah, let's keep these options in mind.

> > > No, this patch introduces another source of TLB conflict if the
> > > processor is caching intermediate translations (this is
> > > implementation defined).
> >
> > By "another source of TLB conflict" are you referring to something new
> > that wasn't there before? Or are you referring to the fact that we are
> > still not following the proper sequence to update the Xen pagetable? If
> > you are referring to the latter, wouldn't it be reasonable to say that
> > such a problem could have happened also before 00c96d7742?
>
> It exists but in a different form. I can't tell whether this is bad or
> not because the re-ordering of the code (and therefore memory accesses)
> will affect how TLBs are used. So it is a bit of gambling here.

If I read this right, this is the same underlying issue but due to the
re-ordering of the code, it could manifest differently. For instance the
impact on cache lines could be different.

Is this the case? If so, I think this is a tolerable risk, as other
things could affect it too, such as CONFIG options being
enabled/disabled, as we have just seen with CONFIG_LIVEPATCH. It is
almost "random".

I did take this into account when I wrote earlier that I think it should
be backported. But if you see a different class of problems potentially
being introduced by 00c96d7742, then I think the discussion would
change, because it can be considered a regression.

> > > The bug reported by osstest actually taught me that even if Xen may
> > > boot today on a given platform, this may not be the case tomorrow
> > > because of the slight change in the code ordering (and therefore
> > > memory accesses).
> > >
> > > /!\ Below is my interpretation and does not imply I am correct ;)
> > >
> > > However, such Arm Arm violations are mostly gathered around boot and
> > > shouldn't affect runtime. IOW, Xen would stop booting on those
> > > platforms rather than becoming unreliable. So it would not be too bad.
> > >
> > > /!\ End
> > >
> > > We just have to be aware of the risk we are taking with backporting
> > > the patch.
> >
> > What you wrote here seems to make sense but I would like to understand
> > the problem mentioned earlier a bit better.
> >
> > > > > > What about the other older staging branches?
> > > > >
> > > > > The only one we could consider is 4.10, but AFAICT Jan already
> > > > > did cut the last release for it.
> > > > >
> > > > > So I wouldn't consider any backport unless we begin to see the
> > > > > branch failing.
> > > >
> > > > If Jan already made the last release for 4.10, then little point in
> > > > backporting it to it. However, it is not ideal to have something
> > > > like 00c96d7742 in some still-maintained staging branches but not
> > > > all.
>
> Jan pointed out it is not yet released. However, we didn't get any
> report of problems (aside from the Arm Arm violation) with Xen 4.10
> today. So I would rather avoid such a backport in a final point release,
> as we risk making it more broken than it is today.
>
> I find this acceptable for Xen 4.11 because it has been proven to help.
> We also still have point releases afterwards if this goes wrong.

If we do the backport, I would prefer to backport it to both trees, for
consistency, and because there might be machines out there where 4.10
doesn't boot with the wrong kconfig. This patch should decrease the risk
of breakage.

However, I see your point too. This is a judgement call -- we have not
enough data but we have to make a decision anyway. No way to tell which
way is best "scientifically".

My vote is to backport to both. Jan/others please express your opinion.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
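For context on the TLB conflict being debated above: on Arm, rewriting a
valid page-table entry directly to another valid entry can leave two
different translations for the same address live in the TLB at once,
which the architecture only rules out if the update follows the
Break-Before-Make sequence (write an invalid entry, invalidate the TLB,
then write the new entry). The toy model below is illustrative Python
only -- it is not Xen code, and every name in it is invented for the
sketch -- but it shows why the intermediate invalid step matters:

```python
# Toy model of why Break-Before-Make (BBM) matters when changing a
# page-table entry on Arm. Illustrative only: real TLB-conflict aborts
# also depend on implementation-defined caching of intermediate
# translation levels, which this model does not capture.

class ToyMMU:
    def __init__(self):
        self.page_table = {}   # va -> pa (the in-memory translation)
        self.tlb = set()       # cached (va, pa) pairs

    def access(self, va):
        """A table walk may cache a new TLB entry without evicting old ones."""
        pa = self.page_table.get(va)
        if pa is not None:
            self.tlb.add((va, pa))
        return pa

    def conflicting(self, va):
        """True if the TLB holds two different translations for va."""
        return len({pa for (v, pa) in self.tlb if v == va}) > 1

    def tlb_flush(self):
        self.tlb.clear()

def remap_without_bbm(mmu, va, new_pa):
    mmu.page_table[va] = new_pa   # old TLB entry may still be live
    mmu.access(va)                # a speculative walk caches the new one too

def remap_with_bbm(mmu, va, new_pa):
    del mmu.page_table[va]        # 1. break: write an invalid entry
    mmu.tlb_flush()               # 2. invalidate stale TLB entries
    mmu.page_table[va] = new_pa   # 3. make: write the new entry
    mmu.access(va)

mmu = ToyMMU()
mmu.page_table[0x1000] = 0xA000
mmu.access(0x1000)                   # old translation now cached
remap_without_bbm(mmu, 0x1000, 0xB000)
print(mmu.conflicting(0x1000))       # True: two live translations

mmu2 = ToyMMU()
mmu2.page_table[0x1000] = 0xA000
mmu2.access(0x1000)
remap_with_bbm(mmu2, 0x1000, 0xB000)
print(mmu2.conflicting(0x1000))      # False: BBM avoided the conflict
```

Because the conflict only appears if a walk happens while both
translations are reachable, re-ordering unrelated memory accesses (as
00c96d7742 does) changes which translations the TLB holds at any given
point -- which is why the risk is hard to quantify, as discussed in the
messages above.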
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
  2019-06-05 20:29 ` Stefano Stabellini
@ 2019-06-05 21:38 ` Julien Grall
  2019-06-06  8:42 ` Jan Beulich
  0 siblings, 1 reply; 43+ messages in thread
From: Julien Grall @ 2019-06-05 21:38 UTC (permalink / raw)
To: Stefano Stabellini
Cc: Anthony Perard, Ian Jackson, osstest service owner, Jan Beulich, xen-devel

Hi Stefano,

On 6/5/19 9:29 PM, Stefano Stabellini wrote:
> On Wed, 5 Jun 2019, Julien Grall wrote:
>> Hi Stefano,
>>
>> On 05/06/2019 00:11, Stefano Stabellini wrote:
>>> On Tue, 4 Jun 2019, Julien Grall wrote:
>>>> On 6/4/19 6:39 PM, Stefano Stabellini wrote:
>>>>> On Tue, 4 Jun 2019, Julien Grall wrote:
>>>> No, this patch introduces another source of TLB conflict if the
>>>> processor is caching intermediate translations (this is
>>>> implementation defined).
>>>
>>> By "another source of TLB conflict" are you referring to something new
>>> that wasn't there before? Or are you referring to the fact that we are
>>> still not following the proper sequence to update the Xen pagetable? If
>>> you are referring to the latter, wouldn't it be reasonable to say that
>>> such a problem could have happened also before 00c96d7742?
>>
>> It exists but in a different form. I can't tell whether this is bad or
>> not because the re-ordering of the code (and therefore memory accesses)
>> will affect how TLBs are used. So it is a bit of gambling here.
>
> If I read this right, this is the same underlying issue but due to the
> re-ordering of the code, it could manifest differently. For instance the
> impact on cache lines could be different.

I am sorry, but how did you come up with the cache-line difference here?
It has nothing to do with cache lines; it is just about how the TLBs are
filled at a given point. If you re-order memory accesses, then you may
as well have a different state of the TLBs at a given point.

>
> Is this the case? If so, I think this is a tolerable risk, as other
> things could affect it too, such as CONFIG options being
> enabled/disabled, as we have just seen with CONFIG_LIVEPATCH. It is
> almost "random".

See above. But yes, it is almost random.

>>>> The bug reported by osstest actually taught me that even if Xen may
>>>> boot today on a given platform, this may not be the case tomorrow
>>>> because of the slight change in the code ordering (and therefore
>>>> memory accesses).
>>>>
>>>> /!\ Below is my interpretation and does not imply I am correct ;)
>>>>
>>>> However, such Arm Arm violations are mostly gathered around boot and
>>>> shouldn't affect runtime. IOW, Xen would stop booting on those
>>>> platforms rather than becoming unreliable. So it would not be too bad.
>>>>
>>>> /!\ End
>>>>
>>>> We just have to be aware of the risk we are taking with backporting
>>>> the patch.
>>>
>>> What you wrote here seems to make sense but I would like to understand
>>> the problem mentioned earlier a bit better.
>>>
>>>>>>> What about the other older staging branches?
>>>>>>
>>>>>> The only one we could consider is 4.10, but AFAICT Jan already did
>>>>>> cut the last release for it.
>>>>>>
>>>>>> So I wouldn't consider any backport unless we begin to see the
>>>>>> branch failing.
>>>>>
>>>>> If Jan already made the last release for 4.10, then little point in
>>>>> backporting it to it. However, it is not ideal to have something
>>>>> like 00c96d7742 in some still-maintained staging branches but not
>>>>> all.
>>
>> Jan pointed out it is not yet released. However, we didn't get any
>> report of problems (aside from the Arm Arm violation) with Xen 4.10
>> today. So I would rather avoid such a backport in a final point release,
>> as we risk making it more broken than it is today.
>>
>> I find this acceptable for Xen 4.11 because it has been proven to help.
>> We also still have point releases afterwards if this goes wrong.
>
> If we do the backport, I would prefer to backport it to both trees, for
> consistency, and because there might be machines out there where 4.10
> doesn't boot with the wrong kconfig. This patch should decrease the risk
> of breakage.

The counter-point here is that Xen 4.10 is going to be out of support in
a few weeks. If you are about to use Xen 4.10 for your new product, then
you already made the wrong choice. Why would you use an out-of-support
release? If you already use Xen 4.10, then you are probably fine running
this release on your platform. Why would you take the risk of breaking
them?

Note that osstest does not test Xen 4.10 (or earlier) on Thunder-X, so
this does not need to be factored into the decision.

>
> However, I see your point too. This is a judgement call -- we have not
> enough data but we have to make a decision anyway. No way to tell which
> way is best "scientifically".

I also understand your point; however, it is a bit worrying that a lack
of data means we are happy to backport a patch into a final point
release. I would have thought more caution would be applied when
backporting.

>
> My vote is to backport to both. Jan/others please express your opinion.

To follow the vote convention:

4.11: -1
4.10: -1 (I was tempted by a -2, but if the others feel it should be
backported then I will not push back).

Cheers,

--
Julien Grall
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
  2019-06-05 21:38 ` Julien Grall
@ 2019-06-06  8:42 ` Jan Beulich
  2019-06-06  8:47 ` Julien Grall
  0 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2019-06-06 8:42 UTC (permalink / raw)
To: Julien Grall, Stefano Stabellini
Cc: Anthony Perard, Ian Jackson, osstest service owner, xen-devel

>>> On 05.06.19 at 23:38, <julien.grall@arm.com> wrote:
> On 6/5/19 9:29 PM, Stefano Stabellini wrote:
>> My vote is to backport to both. Jan/others please express your opinion.
>
> To follow the vote convention:
>
> 4.11: -1

Hmm, I'm surprised by this. Didn't I see you mention to Ian (on irc)
you'd prefer backporting over working around this in osstest?

> 4.10: -1 (I was tempted by a -2, but if the others feel it should be
> backported then I will not push back).

Considering the situation, I'm leaning towards Julien's opinion here.
But take this with care - I have way too little insight to have a
meaningful opinion.

Jan
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
  2019-06-06  8:42 ` Jan Beulich
@ 2019-06-06  8:47 ` Julien Grall
  2019-06-06 22:21 ` Stefano Stabellini
  0 siblings, 1 reply; 43+ messages in thread
From: Julien Grall @ 2019-06-06 8:47 UTC (permalink / raw)
To: Jan Beulich, Stefano Stabellini
Cc: Anthony Perard, Ian Jackson, osstest service owner, xen-devel

On 06/06/2019 09:42, Jan Beulich wrote:
>>>> On 05.06.19 at 23:38, <julien.grall@arm.com> wrote:
>> On 6/5/19 9:29 PM, Stefano Stabellini wrote:
>>> My vote is to backport to both. Jan/others please express your opinion.
>>
>> To follow the vote convention:
>>
>> 4.11: -1
>
> Hmm, I'm surprised by this. Didn't I see you mention to Ian (on irc)
> you'd prefer backporting over working around this in osstest?

My mistake here. It should be +1 here.

>> 4.10: -1 (I was tempted by a -2, but if the others feel it should be
>> backported then I will not push back).
>
> Considering the situation, I'm leaning towards Julien's opinion here.
> But take this with care - I have way too little insight to have a
> meaningful opinion.
>
> Jan

--
Julien Grall
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
  2019-06-06  8:47 ` Julien Grall
@ 2019-06-06 22:21 ` Stefano Stabellini
  2019-06-07  9:33 ` Julien Grall
  0 siblings, 1 reply; 43+ messages in thread
From: Stefano Stabellini @ 2019-06-06 22:21 UTC (permalink / raw)
To: Julien Grall
Cc: Stefano Stabellini, osstest service owner, Jan Beulich, xen-devel, Anthony Perard, Ian Jackson

On Thu, 6 Jun 2019, Julien Grall wrote:
> On 06/06/2019 09:42, Jan Beulich wrote:
> > > > > On 05.06.19 at 23:38, <julien.grall@arm.com> wrote:
> > > On 6/5/19 9:29 PM, Stefano Stabellini wrote:
> > > > My vote is to backport to both. Jan/others please express your opinion.
> > >
> > > To follow the vote convention:
> > >
> > > 4.11: -1
> >
> > Hmm, I'm surprised by this. Didn't I see you mention to Ian (on irc)
> > you'd prefer backporting over working around this in osstest?
>
> My mistake here. It should be +1 here.
>
> > > 4.10: -1 (I was tempted by a -2, but if the others feel it should be
> > > backported then I will not push back).
> >
> > Considering the situation, I'm leaning towards Julien's opinion here.
> > But take this with care - I have way too little insight to have a
> > meaningful opinion.

All right. I backported the patch to staging-4.11 only.
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
  2019-06-06 22:21 ` Stefano Stabellini
@ 2019-06-07  9:33 ` Julien Grall
  0 siblings, 0 replies; 43+ messages in thread
From: Julien Grall @ 2019-06-07 9:33 UTC (permalink / raw)
To: Stefano Stabellini
Cc: Anthony Perard, Ian Jackson, osstest service owner, Jan Beulich, xen-devel

Hi Stefano,

On 06/06/2019 23:21, Stefano Stabellini wrote:
> On Thu, 6 Jun 2019, Julien Grall wrote:
>> On 06/06/2019 09:42, Jan Beulich wrote:
>>>>>> On 05.06.19 at 23:38, <julien.grall@arm.com> wrote:
>>>> On 6/5/19 9:29 PM, Stefano Stabellini wrote:
>>>>> My vote is to backport to both. Jan/others please express your opinion.
>>>>
>>>> To follow the vote convention:
>>>>
>>>> 4.11: -1
>>>
>>> Hmm, I'm surprised by this. Didn't I see you mention to Ian (on irc)
>>> you'd prefer backporting over working around this in osstest?
>>
>> My mistake here. It should be +1 here.
>>
>>>> 4.10: -1 (I was tempted by a -2, but if the others feel it should be
>>>> backported then I will not push back).
>>>
>>> Considering the situation, I'm leaning towards Julien's opinion here.
>>> But take this with care - I have way too little insight to have a
>>> meaningful opinion.
>
> All right. I backported the patch to staging-4.11 only.

Thank you! I will watch the next osstest flight for qemu-upstream-4.11
and see if it boots.

Cheers,

--
Julien Grall
* Re: [Xen-devel] [qemu-upstream-4.11-testing test] 136184: regressions - FAIL
  2019-06-04 17:22 ` Julien Grall
  2019-06-04 17:39 ` Stefano Stabellini
@ 2019-06-05 10:19 ` Jan Beulich
  1 sibling, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2019-06-05 10:19 UTC (permalink / raw)
To: Julien Grall
Cc: Anthony Perard, Ian Jackson, Stefano Stabellini, osstest service owner, xen-devel

>>> On 04.06.19 at 19:22, <julien.grall@arm.com> wrote:
> The only one we could consider is 4.10, but AFAICT Jan already did cut
> the last release for it.

I've sent a call for backport requests. The tree isn't closed yet, but
soon will be.

Jan
end of thread, other threads:[~2019-06-07 9:49 UTC | newest]

Thread overview: 43+ messages -- links below jump to the message on this page --
2019-05-15 19:48 [qemu-upstream-4.11-testing test] 136184: regressions - FAIL osstest service owner
2019-05-15 19:48 ` [Xen-devel] " osstest service owner
2019-05-16 10:37 ` Anthony PERARD
2019-05-16 10:37 ` [Xen-devel] " Anthony PERARD
2019-05-16 21:38 ` Julien Grall
2019-05-16 21:38 ` [Xen-devel] " Julien Grall
2019-05-17 15:53 ` Ian Jackson
2019-05-17 15:53 ` [Xen-devel] " Ian Jackson
2019-05-17 17:23 ` Anthony PERARD
2019-05-17 17:23 ` [Xen-devel] " Anthony PERARD
2019-05-17 19:00 ` Julien Grall
2019-05-17 19:00 ` [Xen-devel] " Julien Grall
2019-05-21 16:52 ` Julien Grall
2019-05-21 16:52 ` [Xen-devel] " Julien Grall
2019-06-03 17:15 ` Anthony PERARD
2019-06-03 17:15 ` [Xen-devel] " Anthony PERARD
2019-06-04  7:06 ` Jan Beulich
2019-06-04  7:06 ` [Xen-devel] " Jan Beulich
2019-06-04  9:01 ` Julien Grall
2019-06-04  9:01 ` [Xen-devel] " Julien Grall
2019-06-04  9:17 ` Jan Beulich
2019-06-04  9:17 ` [Xen-devel] " Jan Beulich
2019-06-04  9:57 ` Julien Grall
2019-06-04  9:57 ` [Xen-devel] " Julien Grall
2019-06-04 10:02 ` Jan Beulich
2019-06-04 10:02 ` [Xen-devel] " Jan Beulich
2019-06-04 17:09 ` Stefano Stabellini
2019-06-04 17:22 ` Julien Grall
2019-06-04 17:39 ` Stefano Stabellini
2019-06-04 17:52 ` Ian Jackson
2019-06-04 18:03 ` Stefano Stabellini
2019-06-04 18:27 ` Ian Jackson
2019-06-04 18:53 ` Stefano Stabellini
2019-06-04 20:50 ` Julien Grall
2019-06-04 23:11 ` Stefano Stabellini
2019-06-05 10:59 ` Julien Grall
2019-06-05 20:29 ` Stefano Stabellini
2019-06-05 21:38 ` Julien Grall
2019-06-06  8:42 ` Jan Beulich
2019-06-06  8:47 ` Julien Grall
2019-06-06 22:21 ` Stefano Stabellini
2019-06-07  9:33 ` Julien Grall
2019-06-05 10:19 ` Jan Beulich