* [xen-unstable test] 174809: regressions - trouble: broken/fail/pass
@ 2022-11-18 10:21 osstest service owner
2022-11-18 14:39 ` Roger Pau Monné
From: osstest service owner @ 2022-11-18 10:21 UTC (permalink / raw)
To: xen-devel
flight 174809 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/174809/
Regressions :-(
Tests which did not succeed and are blocking,
including tests which could not be run:
test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict <job status> broken
test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 5 host-install(5) broken REGR. vs. 174797
test-amd64-amd64-xl-credit2 20 guest-localmigrate/x10 fail REGR. vs. 174797
test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 15 guest-saverestore fail REGR. vs. 174797
test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 15 guest-saverestore fail REGR. vs. 174797
test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 12 debian-hvm-install fail REGR. vs. 174797
test-amd64-i386-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail REGR. vs. 174797
test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 12 debian-hvm-install fail REGR. vs. 174797
test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 12 debian-hvm-install fail REGR. vs. 174797
test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 12 debian-hvm-install fail REGR. vs. 174797
test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 15 guest-saverestore fail REGR. vs. 174797
Tests which did not succeed, but are not blocking:
test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 174797
test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 174797
test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 174797
test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 174797
test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 174797
test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 174797
test-armhf-armhf-libvirt 16 saverestore-support-check fail like 174797
test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 174797
test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 174797
test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 174797
test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 174797
test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 174797
test-amd64-i386-xl-pvshim 14 guest-start fail never pass
test-arm64-arm64-xl-seattle 15 migrate-support-check fail never pass
test-arm64-arm64-xl-seattle 16 saverestore-support-check fail never pass
test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
test-amd64-i386-libvirt-xsm 15 migrate-support-check fail never pass
test-amd64-i386-libvirt 15 migrate-support-check fail never pass
test-arm64-arm64-xl 15 migrate-support-check fail never pass
test-arm64-arm64-xl 16 saverestore-support-check fail never pass
test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
test-amd64-i386-libvirt-raw 14 migrate-support-check fail never pass
test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass
test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
test-armhf-armhf-xl 15 migrate-support-check fail never pass
test-armhf-armhf-xl 16 saverestore-support-check fail never pass
test-arm64-arm64-xl-vhd 14 migrate-support-check fail never pass
test-arm64-arm64-xl-vhd 15 saverestore-support-check fail never pass
test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
test-armhf-armhf-xl-vhd 14 migrate-support-check fail never pass
test-armhf-armhf-xl-vhd 15 saverestore-support-check fail never pass
test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail never pass
test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass
version targeted for testing:
xen db8fa01c61db0317a9ee947925226234c65d48e8
baseline version:
xen f5d56f4b253072264efc0fece698a91779e362f5
Last test of basis 174797 2022-11-17 03:03:07 Z 1 days
Testing same since 174809 2022-11-18 00:06:55 Z 0 days 1 attempts
------------------------------------------------------------
People who touched revisions under test:
Andrew Cooper <andrew.cooper3@citrix.com>
Anthony PERARD <anthony.perard@citrix.com>
Jan Beulich <jbeulich@suse.com>
jobs:
build-amd64-xsm pass
build-arm64-xsm pass
build-i386-xsm pass
build-amd64-xtf pass
build-amd64 pass
build-arm64 pass
build-armhf pass
build-i386 pass
build-amd64-libvirt pass
build-arm64-libvirt pass
build-armhf-libvirt pass
build-i386-libvirt pass
build-amd64-prev pass
build-i386-prev pass
build-amd64-pvops pass
build-arm64-pvops pass
build-armhf-pvops pass
build-i386-pvops pass
test-xtf-amd64-amd64-1 pass
test-xtf-amd64-amd64-2 pass
test-xtf-amd64-amd64-3 pass
test-xtf-amd64-amd64-4 pass
test-xtf-amd64-amd64-5 pass
test-amd64-amd64-xl pass
test-amd64-coresched-amd64-xl pass
test-arm64-arm64-xl pass
test-armhf-armhf-xl pass
test-amd64-i386-xl pass
test-amd64-coresched-i386-xl pass
test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm fail
test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm fail
test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm fail
test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm fail
test-amd64-amd64-xl-qemut-debianhvm-i386-xsm fail
test-amd64-i386-xl-qemut-debianhvm-i386-xsm fail
test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm fail
test-amd64-i386-xl-qemuu-debianhvm-i386-xsm fail
test-amd64-amd64-libvirt-xsm pass
test-arm64-arm64-libvirt-xsm pass
test-amd64-i386-libvirt-xsm pass
test-amd64-amd64-xl-xsm pass
test-arm64-arm64-xl-xsm pass
test-amd64-i386-xl-xsm pass
test-amd64-amd64-qemuu-nested-amd fail
test-amd64-amd64-xl-pvhv2-amd pass
test-amd64-i386-qemut-rhel6hvm-amd pass
test-amd64-i386-qemuu-rhel6hvm-amd pass
test-amd64-amd64-dom0pvh-xl-amd pass
test-amd64-amd64-xl-qemut-debianhvm-amd64 pass
test-amd64-i386-xl-qemut-debianhvm-amd64 pass
test-amd64-amd64-xl-qemuu-debianhvm-amd64 pass
test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
test-amd64-i386-freebsd10-amd64 pass
test-amd64-amd64-qemuu-freebsd11-amd64 pass
test-amd64-amd64-qemuu-freebsd12-amd64 pass
test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
test-amd64-i386-xl-qemuu-ovmf-amd64 pass
test-amd64-amd64-xl-qemut-win7-amd64 fail
test-amd64-i386-xl-qemut-win7-amd64 fail
test-amd64-amd64-xl-qemuu-win7-amd64 fail
test-amd64-i386-xl-qemuu-win7-amd64 fail
test-amd64-amd64-xl-qemut-ws16-amd64 fail
test-amd64-i386-xl-qemut-ws16-amd64 fail
test-amd64-amd64-xl-qemuu-ws16-amd64 fail
test-amd64-i386-xl-qemuu-ws16-amd64 fail
test-armhf-armhf-xl-arndale pass
test-amd64-amd64-examine-bios pass
test-amd64-i386-examine-bios pass
test-amd64-amd64-xl-credit1 pass
test-arm64-arm64-xl-credit1 pass
test-armhf-armhf-xl-credit1 pass
test-amd64-amd64-xl-credit2 fail
test-arm64-arm64-xl-credit2 pass
test-armhf-armhf-xl-credit2 pass
test-armhf-armhf-xl-cubietruck pass
test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict broken
test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict pass
test-amd64-amd64-examine pass
test-arm64-arm64-examine pass
test-armhf-armhf-examine pass
test-amd64-i386-examine pass
test-amd64-i386-freebsd10-i386 pass
test-amd64-amd64-qemuu-nested-intel pass
test-amd64-amd64-xl-pvhv2-intel pass
test-amd64-i386-qemut-rhel6hvm-intel pass
test-amd64-i386-qemuu-rhel6hvm-intel pass
test-amd64-amd64-dom0pvh-xl-intel pass
test-amd64-amd64-libvirt pass
test-armhf-armhf-libvirt pass
test-amd64-i386-libvirt pass
test-amd64-amd64-livepatch pass
test-amd64-i386-livepatch pass
test-amd64-amd64-migrupgrade pass
test-amd64-i386-migrupgrade pass
test-amd64-amd64-xl-multivcpu pass
test-armhf-armhf-xl-multivcpu pass
test-amd64-amd64-pair pass
test-amd64-i386-pair pass
test-amd64-amd64-libvirt-pair pass
test-amd64-i386-libvirt-pair pass
test-amd64-amd64-xl-pvshim pass
test-amd64-i386-xl-pvshim fail
test-amd64-amd64-pygrub pass
test-armhf-armhf-libvirt-qcow2 pass
test-amd64-amd64-xl-qcow2 pass
test-arm64-arm64-libvirt-raw pass
test-armhf-armhf-libvirt-raw pass
test-amd64-i386-libvirt-raw pass
test-amd64-amd64-xl-rtds pass
test-armhf-armhf-xl-rtds pass
test-arm64-arm64-xl-seattle pass
test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow pass
test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow pass
test-amd64-amd64-xl-shadow pass
test-amd64-i386-xl-shadow pass
test-arm64-arm64-xl-thunderx pass
test-amd64-amd64-examine-uefi pass
test-amd64-i386-examine-uefi pass
test-amd64-amd64-libvirt-vhd pass
test-arm64-arm64-xl-vhd pass
test-armhf-armhf-xl-vhd pass
test-amd64-i386-xl-vhd pass
------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images
Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs
Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master
Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary
broken-job test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict broken
broken-step test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict host-install(5)
Not pushing.
------------------------------------------------------------
commit db8fa01c61db0317a9ee947925226234c65d48e8
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Thu Oct 20 12:14:30 2022 +0100
xen/arm: Correct the p2m pool size calculations
Allocating or freeing p2m pages doesn't alter the size of the mempool; only
the split between free and used pages.
Right now, the hypercalls operate on the free subset of the pool, meaning that
XEN_DOMCTL_get_paging_mempool_size varies with time as the guest shuffles its
physmap, and XEN_DOMCTL_set_paging_mempool_size ignores the used subset of the
pool and lets the guest grow unbounded.
This fixes test-paging-mempool on ARM so that the behaviour matches x86.
This is part of XSA-409 / CVE-2022-33747.
Fixes: cbea5a1149ca ("xen/arm: Allocate and free P2M pages from the P2M pool")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
commit 7c3bbd940dd8aeb1649734e5055798cc6f3fea4e
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Tue Oct 25 15:27:05 2022 +0100
xen/arm, libxl: Revert XEN_DOMCTL_shadow_op; use p2m mempool hypercalls
This reverts most of commit cf2a68d2ffbc3ce95e01449d46180bddb10d24a0, and bits
of cbea5a1149ca7fd4b7cdbfa3ec2e4f109b601ff7.
First of all, with ARM borrowing x86's implementation, the logic to set the
pool size should have been common, not duplicated. Introduce
libxl__domain_set_paging_mempool_size() as a shared implementation, and use it
from the ARM and x86 paths. It is left as an exercise to the reader to judge
how libxl/xl can reasonably function without the ability to query the pool
size...
Remove ARM's p2m_domctl() infrastructure now that the functionality has been
replaced with a working and unit-tested interface.
This is part of XSA-409 / CVE-2022-33747.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
commit bd87315a603bf25e869e6293f7db7b1024d67999
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Thu Oct 20 12:13:46 2022 +0100
tools/tests: Unit test for paging mempool size
Exercise some basic functionality of the new
xc_{get,set}_paging_mempool_size() hypercalls.
This passes on x86, but fails currently on ARM. ARM will be fixed up in
future patches.
This is part of XSA-409 / CVE-2022-33747.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
commit 22b20bd98c025e06525410e3ab3494d5e63489f7
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Fri Oct 21 14:13:00 2022 +0100
xen: Introduce non-broken hypercalls for the paging mempool size
The existing XEN_DOMCTL_SHADOW_OP_{GET,SET}_ALLOCATION have problems:
* All set_allocation() flavours have an overflow-before-widen bug when
calculating "sc->mb << (20 - PAGE_SHIFT)".
* All flavours have a granularity of 1M. This was tolerable when the size of
the pool could only be set at the same granularity, but is broken now that
ARM has a 16-page stopgap allocation in use.
* All get_allocation() flavours round up, and in particular turn 0 into 1,
meaning the get op returns junk before a successful set op.
* The x86 flavours reject the hypercalls before the VM has vCPUs allocated,
despite the pool size being a domain property.
* Even the hypercall names are long-obsolete.
Implement a better interface, which can be first used to unit test the
behaviour, and subsequently correct a broken implementation. The old
interface will be retired in due course.
The unit of bytes (as opposed to pages) is a deliberate API/ABI improvement to
more easily support multiple page granularities.
This is part of XSA-409 / CVE-2022-33747.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
commit e5ac68a0110cb43a3a0bc17d545ae7a0bd746ef9
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Mon Nov 14 21:47:59 2022 +0000
x86/hvm: Revert per-domain APIC acceleration support
I was really hoping to avoid this, but it's now too late in the 4.17 freeze and
we still don't have working fixes.
The in-Xen calculations for assistance capabilities are buggy. For the
avoidance of doubt, the original intention was to be able to control every
aspect of APIC acceleration so we could comprehensively test Xen's support,
as it has proved to be buggy time and time again.
Even after a protracted discussion on what the new API ought to mean, attempts
to apply it to the existing logic have been unsuccessful, proving that the
API/ABI is too complicated for most people to reason about.
This reverts most of:
2ce11ce249a3981bac50914c6a90f681ad7a4222
6b2b9b3405092c3ad38d7342988a584b8efa674c
leaving in place the non-APIC specific changes (minimal as they are).
This takes us back to the behaviour of Xen 4.16 where APIC acceleration is
configured on a per system basis.
This work will be revisited in due course.
Fixes: 2ce11ce249a3 ("x86/HVM: allow per-domain usage of hardware virtualized APIC")
Fixes: 6b2b9b340509 ("x86: report Interrupt Controller Virtualization capabilities")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
(qemu changes not included)
* Re: [xen-unstable test] 174809: regressions - trouble: broken/fail/pass
2022-11-18 10:21 [xen-unstable test] 174809: regressions - trouble: broken/fail/pass osstest service owner
@ 2022-11-18 14:39 ` Roger Pau Monné
2022-11-18 17:22 ` Flask vs paging mempool - Was: " Andrew Cooper
From: Roger Pau Monné @ 2022-11-18 14:39 UTC (permalink / raw)
To: Andrew Cooper, Henry Wang; +Cc: xen-devel
On Fri, Nov 18, 2022 at 10:21:52AM +0000, osstest service owner wrote:
> flight 174809 xen-unstable real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/174809/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
> test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict <job status> broken
> test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 5 host-install(5) broken REGR. vs. 174797
> test-amd64-amd64-xl-credit2 20 guest-localmigrate/x10 fail REGR. vs. 174797
> test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 15 guest-saverestore fail REGR. vs. 174797
> test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 15 guest-saverestore fail REGR. vs. 174797
> test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 12 debian-hvm-install fail REGR. vs. 174797
> test-amd64-i386-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail REGR. vs. 174797
> test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 12 debian-hvm-install fail REGR. vs. 174797
> test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 12 debian-hvm-install fail REGR. vs. 174797
> test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 12 debian-hvm-install fail REGR. vs. 174797
> test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 15 guest-saverestore fail REGR. vs. 174797
Looking at a random failure:
Nov 18 01:55:09.233941 (d1) Searching bootorder for: HALT
Nov 18 01:55:11.681666 (d1) drive 0x000f5890: PCHS=16383/16/63 translation=lba LCHS=1024/255/63 s=20480000
Nov 18 01:55:11.693694 (d1) Space available for UMB: cb000-e9000, f52e0-f5820
Nov 18 01:55:11.693754 (d1) Returned 258048 bytes of ZoneHigh
Nov 18 01:55:11.705648 (d1) e820 map has 8 items:
Nov 18 01:55:11.705676 (d1) 0: 0000000000000000 - 000000000009fc00 = 1 RAM
Nov 18 01:55:11.705701 (d1) 1: 000000000009fc00 - 00000000000a0000 = 2 RESERVED
Nov 18 01:55:11.717716 (d1) 2: 00000000000f0000 - 0000000000100000 = 2 RESERVED
Nov 18 01:55:11.717768 (d1) 3: 0000000000100000 - 00000000effff000 = 1 RAM
Nov 18 01:55:11.729687 (d1) 4: 00000000effff000 - 00000000f0000000 = 2 RESERVED
Nov 18 01:55:11.729745 (d1) 5: 00000000fc000000 - 00000000fc00b000 = 4 NVS
Nov 18 01:55:11.741693 (d1) 6: 00000000fc00b000 - 0000000100000000 = 2 RESERVED
Nov 18 01:55:11.741752 (d1) 7: 0000000100000000 - 0000000148000000 = 1 RAM
Nov 18 01:55:11.753644 (d1) enter handle_19:
Nov 18 01:55:11.753721 (d1) NULL
Nov 18 01:55:11.753796 (d1) Booting from DVD/CD...
Nov 18 01:55:11.753864 (d1) Booting from 0000:7c00
Nov 18 01:55:11.753936 (XEN) arch/x86/mm/hap/hap.c:304: d1 failed to allocate from HAP pool
Nov 18 01:55:18.633799 (XEN) Failed to shatter gfn 7ed37: -12
Nov 18 01:55:18.633866 (XEN) d1v0 EPT violation 0x19c (--x/rw-) gpa 0x0000007ed373a1 mfn 0x33ed37 type 0
Nov 18 01:55:18.645790 (XEN) d1v0 Walking EPT tables for GFN 7ed37:
Nov 18 01:55:18.645850 (XEN) d1v0 epte 9c0000047eba3107
Nov 18 01:55:18.645893 (XEN) d1v0 epte 9c000003000003f3
Nov 18 01:55:18.645935 (XEN) d1v0 --- GLA 0x7ed373a1
Nov 18 01:55:18.657783 (XEN) domain_crash called from arch/x86/hvm/vmx/vmx.c:3758
Nov 18 01:55:18.657844 (XEN) Domain 1 (vcpu#0) crashed on cpu#8:
Nov 18 01:55:18.669781 (XEN) ----[ Xen-4.17-rc x86_64 debug=y Not tainted ]----
Nov 18 01:55:18.669843 (XEN) CPU: 8
Nov 18 01:55:18.669884 (XEN) RIP: 0020:[<000000007ed373a1>]
Nov 18 01:55:18.681711 (XEN) RFLAGS: 0000000000010002 CONTEXT: hvm guest (d1v0)
Nov 18 01:55:18.681772 (XEN) rax: 000000007ed373a1 rbx: 000000007ed3726c rcx: 0000000000000000
Nov 18 01:55:18.693713 (XEN) rdx: 000000007ed2e610 rsi: 0000000000008e38 rdi: 000000007ed37448
Nov 18 01:55:18.693775 (XEN) rbp: 0000000001b410a0 rsp: 0000000000320880 r8: 0000000000000000
Nov 18 01:55:18.705725 (XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
Nov 18 01:55:18.717733 (XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000000
Nov 18 01:55:18.717794 (XEN) r15: 0000000000000000 cr0: 0000000000000011 cr4: 0000000000000000
Nov 18 01:55:18.729713 (XEN) cr3: 0000000000400000 cr2: 0000000000000000
Nov 18 01:55:18.729771 (XEN) fsb: 0000000000000000 gsb: 0000000000000000 gss: 0000000000000002
Nov 18 01:55:18.741711 (XEN) ds: 0028 es: 0028 fs: 0000 gs: 0000 ss: 0028 cs: 0020
It seems to be related to the paging pool; adding Andrew and Henry so
that they are aware.
Roger.
* Flask vs paging mempool - Was: [xen-unstable test] 174809: regressions - trouble: broken/fail/pass
2022-11-18 14:39 ` Roger Pau Monné
@ 2022-11-18 17:22 ` Andrew Cooper
2022-11-18 21:10 ` Jason Andryuk
From: Andrew Cooper @ 2022-11-18 17:22 UTC (permalink / raw)
To: Roger Pau Monne, Henry Wang, Anthony Perard, Daniel Smith, Jason Andryuk
Cc: xen-devel
On 18/11/2022 14:39, Roger Pau Monne wrote:
> Nov 18 01:55:11.753936 (XEN) arch/x86/mm/hap/hap.c:304: d1 failed to allocate from HAP pool
> Nov 18 01:55:18.633799 (XEN) Failed to shatter gfn 7ed37: -12
> Nov 18 01:55:18.633866 (XEN) d1v0 EPT violation 0x19c (--x/rw-) gpa 0x0000007ed373a1 mfn 0x33ed37 type 0
> Nov 18 01:55:18.645790 (XEN) d1v0 Walking EPT tables for GFN 7ed37:
> Nov 18 01:55:18.645850 (XEN) d1v0 epte 9c0000047eba3107
> Nov 18 01:55:18.645893 (XEN) d1v0 epte 9c000003000003f3
> Nov 18 01:55:18.645935 (XEN) d1v0 --- GLA 0x7ed373a1
> Nov 18 01:55:18.657783 (XEN) domain_crash called from arch/x86/hvm/vmx/vmx.c:3758
> Nov 18 01:55:18.657844 (XEN) Domain 1 (vcpu#0) crashed on cpu#8:
> Nov 18 01:55:18.669781 (XEN) ----[ Xen-4.17-rc x86_64 debug=y Not tainted ]----
> Nov 18 01:55:18.669843 (XEN) CPU: 8
> Nov 18 01:55:18.669884 (XEN) RIP: 0020:[<000000007ed373a1>]
> Nov 18 01:55:18.681711 (XEN) RFLAGS: 0000000000010002 CONTEXT: hvm guest (d1v0)
> Nov 18 01:55:18.681772 (XEN) rax: 000000007ed373a1 rbx: 000000007ed3726c rcx: 0000000000000000
> Nov 18 01:55:18.693713 (XEN) rdx: 000000007ed2e610 rsi: 0000000000008e38 rdi: 000000007ed37448
> Nov 18 01:55:18.693775 (XEN) rbp: 0000000001b410a0 rsp: 0000000000320880 r8: 0000000000000000
> Nov 18 01:55:18.705725 (XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
> Nov 18 01:55:18.717733 (XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000000
> Nov 18 01:55:18.717794 (XEN) r15: 0000000000000000 cr0: 0000000000000011 cr4: 0000000000000000
> Nov 18 01:55:18.729713 (XEN) cr3: 0000000000400000 cr2: 0000000000000000
> Nov 18 01:55:18.729771 (XEN) fsb: 0000000000000000 gsb: 0000000000000000 gss: 0000000000000002
> Nov 18 01:55:18.741711 (XEN) ds: 0028 es: 0028 fs: 0000 gs: 0000 ss: 0028 cs: 0020
>
> It seems to be related to the paging pool adding Andrew and Henry so
> that he is aware.
Summary of what I've just given on IRC/Matrix.
This crash is caused by two things. First
(XEN) FLASK: Denying unknown domctl: 86.
because I completely forgot to wire up Flask for the new hypercalls.
But so did the original XSA-409 fix (as SECCLASS_SHADOW is behind
CONFIG_X86), so I don't feel quite as bad about this.
And second because libxl ignores the error it gets back, and blindly
continues onward. Anthony has posted "libs/light: Propagate
libxl__arch_domain_create() return code" to fix the libxl half of the
bug, and I posted a second libxl bugfix to fix an error message. Both
are very simple.
For Flask, we need new access vectors because this is a common
hypercall, but I'm unsure how to interlink it with x86's shadow
control. This will require a bit of pondering, but it is probably
easier to just leave them unlinked.
Flask is listed as experimental which means it doesn't technically
matter if we break it, but it is used by OpenXT so not fixing it for
4.17 would be rather rude.
~Andrew
* Re: Flask vs paging mempool - Was: [xen-unstable test] 174809: regressions - trouble: broken/fail/pass
2022-11-18 17:22 ` Flask vs paging mempool - Was: " Andrew Cooper
@ 2022-11-18 21:10 ` Jason Andryuk
2022-11-20 11:08 ` Daniel P. Smith
2022-11-21 11:37 ` Andrew Cooper
From: Jason Andryuk @ 2022-11-18 21:10 UTC (permalink / raw)
To: Andrew Cooper
Cc: Roger Pau Monne, Henry Wang, Anthony Perard, Daniel Smith, xen-devel
On Fri, Nov 18, 2022 at 12:22 PM Andrew Cooper
<Andrew.Cooper3@citrix.com> wrote:
>
> On 18/11/2022 14:39, Roger Pau Monne wrote:
> > Nov 18 01:55:11.753936 (XEN) arch/x86/mm/hap/hap.c:304: d1 failed to allocate from HAP pool
> > Nov 18 01:55:18.633799 (XEN) Failed to shatter gfn 7ed37: -12
> > Nov 18 01:55:18.633866 (XEN) d1v0 EPT violation 0x19c (--x/rw-) gpa 0x0000007ed373a1 mfn 0x33ed37 type 0
> > Nov 18 01:55:18.645790 (XEN) d1v0 Walking EPT tables for GFN 7ed37:
> > Nov 18 01:55:18.645850 (XEN) d1v0 epte 9c0000047eba3107
> > Nov 18 01:55:18.645893 (XEN) d1v0 epte 9c000003000003f3
> > Nov 18 01:55:18.645935 (XEN) d1v0 --- GLA 0x7ed373a1
> > Nov 18 01:55:18.657783 (XEN) domain_crash called from arch/x86/hvm/vmx/vmx.c:3758
> > Nov 18 01:55:18.657844 (XEN) Domain 1 (vcpu#0) crashed on cpu#8:
> > Nov 18 01:55:18.669781 (XEN) ----[ Xen-4.17-rc x86_64 debug=y Not tainted ]----
> > Nov 18 01:55:18.669843 (XEN) CPU: 8
> > Nov 18 01:55:18.669884 (XEN) RIP: 0020:[<000000007ed373a1>]
> > Nov 18 01:55:18.681711 (XEN) RFLAGS: 0000000000010002 CONTEXT: hvm guest (d1v0)
> > Nov 18 01:55:18.681772 (XEN) rax: 000000007ed373a1 rbx: 000000007ed3726c rcx: 0000000000000000
> > Nov 18 01:55:18.693713 (XEN) rdx: 000000007ed2e610 rsi: 0000000000008e38 rdi: 000000007ed37448
> > Nov 18 01:55:18.693775 (XEN) rbp: 0000000001b410a0 rsp: 0000000000320880 r8: 0000000000000000
> > Nov 18 01:55:18.705725 (XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
> > Nov 18 01:55:18.717733 (XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000000
> > Nov 18 01:55:18.717794 (XEN) r15: 0000000000000000 cr0: 0000000000000011 cr4: 0000000000000000
> > Nov 18 01:55:18.729713 (XEN) cr3: 0000000000400000 cr2: 0000000000000000
> > Nov 18 01:55:18.729771 (XEN) fsb: 0000000000000000 gsb: 0000000000000000 gss: 0000000000000002
> > Nov 18 01:55:18.741711 (XEN) ds: 0028 es: 0028 fs: 0000 gs: 0000 ss: 0028 cs: 0020
> >
> > It seems to be related to the paging pool adding Andrew and Henry so
> > that he is aware.
>
> Summary of what I've just given on IRC/Matrix.
>
> This crash is caused by two things. First
>
> (XEN) FLASK: Denying unknown domctl: 86.
>
> because I completely forgot to wire up Flask for the new hypercalls.
> But so did the original XSA-409 fix (as SECCLASS_SHADOW is behind
> CONFIG_X86), so I don't feel quite as bad about this.
Broken for ARM, but not for x86, right?
I think SECCLASS_SHADOW is available in the policy bits - it's just
whether or not the hook functions are available?
> And second because libxl ignores the error it gets back, and blindly
> continues onward. Anthony has posted "libs/light: Propagate
> libxl__arch_domain_create() return code" to fix the libxl half of the
> bug, and I posted a second libxl bugfix to fix an error message. Both
> are very simple.
>
>
> For Flask, we need new access vectors because this is a common
> hypercall, but I'm unsure how to interlink it with x86's shadow
> control. This will require a bit of pondering, but it is probably
> easier to just leave them unlinked.
It sort of seems like it could go under domain2 since domain/domain2
have most of the memory stuff, but it is non-PV. shadow has its own
set of hooks. It could go in hvm which already has some memory stuff.
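To make the options concrete, a new vector would be declared in Flask's policy definitions, roughly along these lines. This is purely an illustrative fragment: the class choice (domain2 vs. hvm vs. shadow) and the permission names below are invented here, and were still under discussion in this thread:

```
# Hypothetical fragment of xen/xsm/flask/policy/access_vectors.
# Class and permission names are illustrative only.
class domain2
{
    ...
    get_paging_mempool_size
    set_paging_mempool_size
}
```

Whichever class is chosen, the corresponding check then has to be wired into the flask_domctl() dispatch so that domctl 86 stops hitting the "unknown domctl" deny path quoted above.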
> Flask is listed as experimental which means it doesn't technically
> matter if we break it, but it is used by OpenXT so not fixing it for
> 4.17 would be rather rude.
It's definitely nicer to have functional Flask in the release. OpenXT
can use a backport if necessary, so it doesn't need to be a release
blocker. Having said that, Flask is a nice feature of Xen, so it
would be good to have it functioning in 4.17.
Regards,
Jason
* Re: Flask vs paging mempool - Was: [xen-unstable test] 174809: regressions - trouble: broken/fail/pass
2022-11-18 21:10 ` Jason Andryuk
@ 2022-11-20 11:08 ` Daniel P. Smith
2022-11-21 8:04 ` Jan Beulich
2022-11-21 11:37 ` Andrew Cooper
1 sibling, 1 reply; 8+ messages in thread
From: Daniel P. Smith @ 2022-11-20 11:08 UTC (permalink / raw)
To: Jason Andryuk, Andrew Cooper
Cc: Roger Pau Monne, Henry Wang, Anthony Perard, xen-devel
On 11/18/22 16:10, Jason Andryuk wrote:
> On Fri, Nov 18, 2022 at 12:22 PM Andrew Cooper
> <Andrew.Cooper3@citrix.com> wrote:
>>
>> On 18/11/2022 14:39, Roger Pau Monne wrote:
>>> Nov 18 01:55:11.753936 (XEN) arch/x86/mm/hap/hap.c:304: d1 failed to allocate from HAP pool
>>> Nov 18 01:55:18.633799 (XEN) Failed to shatter gfn 7ed37: -12
>>> Nov 18 01:55:18.633866 (XEN) d1v0 EPT violation 0x19c (--x/rw-) gpa 0x0000007ed373a1 mfn 0x33ed37 type 0
>>> Nov 18 01:55:18.645790 (XEN) d1v0 Walking EPT tables for GFN 7ed37:
>>> Nov 18 01:55:18.645850 (XEN) d1v0 epte 9c0000047eba3107
>>> Nov 18 01:55:18.645893 (XEN) d1v0 epte 9c000003000003f3
>>> Nov 18 01:55:18.645935 (XEN) d1v0 --- GLA 0x7ed373a1
>>> Nov 18 01:55:18.657783 (XEN) domain_crash called from arch/x86/hvm/vmx/vmx.c:3758
>>> Nov 18 01:55:18.657844 (XEN) Domain 1 (vcpu#0) crashed on cpu#8:
>>> Nov 18 01:55:18.669781 (XEN) ----[ Xen-4.17-rc x86_64 debug=y Not tainted ]----
>>> Nov 18 01:55:18.669843 (XEN) CPU: 8
>>> Nov 18 01:55:18.669884 (XEN) RIP: 0020:[<000000007ed373a1>]
>>> Nov 18 01:55:18.681711 (XEN) RFLAGS: 0000000000010002 CONTEXT: hvm guest (d1v0)
>>> Nov 18 01:55:18.681772 (XEN) rax: 000000007ed373a1 rbx: 000000007ed3726c rcx: 0000000000000000
>>> Nov 18 01:55:18.693713 (XEN) rdx: 000000007ed2e610 rsi: 0000000000008e38 rdi: 000000007ed37448
>>> Nov 18 01:55:18.693775 (XEN) rbp: 0000000001b410a0 rsp: 0000000000320880 r8: 0000000000000000
>>> Nov 18 01:55:18.705725 (XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
>>> Nov 18 01:55:18.717733 (XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000000
>>> Nov 18 01:55:18.717794 (XEN) r15: 0000000000000000 cr0: 0000000000000011 cr4: 0000000000000000
>>> Nov 18 01:55:18.729713 (XEN) cr3: 0000000000400000 cr2: 0000000000000000
>>> Nov 18 01:55:18.729771 (XEN) fsb: 0000000000000000 gsb: 0000000000000000 gss: 0000000000000002
>>> Nov 18 01:55:18.741711 (XEN) ds: 0028 es: 0028 fs: 0000 gs: 0000 ss: 0028 cs: 0020
>>>
>>> It seems to be related to the paging pool adding Andrew and Henry so
>>> that he is aware.
>>
>> Summary of what I've just given on IRC/Matrix.
>>
>> This crash is caused by two things. First
>>
>> (XEN) FLASK: Denying unknown domctl: 86.
>>
>> because I completely forgot to wire up Flask for the new hypercalls.
>> But so did the original XSA-409 fix (as SECCLASS_SHADOW is behind
>> CONFIG_X86), so I don't feel quite as bad about this.
>
> Broken for ARM, but not for x86, right?
>
> I think SECCLASS_SHADOW is available in the policy bits - it's just
> whether or not the hook functions are available?
>
>> And second because libxl ignores the error it gets back, and blindly
>> continues onward. Anthony has posted "libs/light: Propagate
>> libxl__arch_domain_create() return code" to fix the libxl half of the
>> bug, and I posted a second libxl bugfix to fix an error message. Both
>> are very simple.
>>
>>
>> For Flask, we need new access vectors because this is a common
>> hypercall, but I'm unsure how to interlink it with x86's shadow
>> control. This will require a bit of pondering, but it is probably
>> easier to just leave them unlinked.
>
> It sort of seems like it could go under domain2 since domain/domain2
> have most of the memory stuff, but it is non-PV. shadow has its own
> set of hooks. It could go in hvm which already has some memory stuff.
Since the new hypercall is for managing a memory pool for any domain
type, though HVM is the only one supported today, IMHO it belongs under
domain/domain2.
Something to consider is that there is another managed guest memory
pool, the PoD pool, which has a dedicated privilege. This leads me to
the question of whether access to manage the PoD pool size and the
paging pool size should be separate accesses or fall under the same
access. IMHO it should be the latter, as I can see no benefit in
disaggregating access to the PoD pool and the paging pool. In fact I
find myself asking whether the managing domain should have control over
the size of any backing memory pool for the target domain, and I am not
seeing any benefit to discriminating between which backing memory pool a
managing domain should be able to manage. With that said, I am open to
being convinced otherwise.
Since this is an XSA fix that will be backported, moving the get/set PoD
hypercalls under a new permission would be too disruptive. I would
recommend introducing a set/getmempools permission under the domain
access vector, controlling only access to the paging pool. Planning can
then occur for 4.18 to transition get/set PoD target to being controlled
via set/getmempools.
>> Flask is listed as experimental which means it doesn't technically
>> matter if we break it, but it is used by OpenXT so not fixing it for
>> 4.17 would be rather rude.
>
> It's definitely nicer to have functional Flask in the release. OpenXT
> can use a backport if necessary, so it doesn't need to be a release
> blocker. Having said that, Flask is a nice feature of Xen, so it
> would be good to have it functioning in 4.17.
As maintainer I would really prefer not to see 4.17 go out with any part
of XSM broken. While it is considered experimental, which I hope to
rectify, it is a long-standing feature that has been kept stable and for
which there is a sizeable user base. IMHO it deserves a proper fix
before release.
V/r,
Daniel P. Smith
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Flask vs paging mempool - Was: [xen-unstable test] 174809: regressions - trouble: broken/fail/pass
2022-11-20 11:08 ` Daniel P. Smith
@ 2022-11-21 8:04 ` Jan Beulich
2022-11-21 12:14 ` Daniel P. Smith
0 siblings, 1 reply; 8+ messages in thread
From: Jan Beulich @ 2022-11-21 8:04 UTC (permalink / raw)
To: Daniel P. Smith
Cc: Roger Pau Monne, Henry Wang, Anthony Perard, xen-devel,
Jason Andryuk, Andrew Cooper
On 20.11.2022 12:08, Daniel P. Smith wrote:
> On 11/18/22 16:10, Jason Andryuk wrote:
>> On Fri, Nov 18, 2022 at 12:22 PM Andrew Cooper <Andrew.Cooper3@citrix.com> wrote:
>>> For Flask, we need new access vectors because this is a common
>>> hypercall, but I'm unsure how to interlink it with x86's shadow
>>> control. This will require a bit of pondering, but it is probably
>>> easier to just leave them unlinked.
>>
>> It sort of seems like it could go under domain2 since domain/domain2
>> have most of the memory stuff, but it is non-PV. shadow has its own
>> set of hooks. It could go in hvm which already has some memory stuff.
>
> Since the new hypercall is for managing a memory pool for any domain
> type, though HVM is the only one supported today, IMHO it belongs under
> domain/domain2.
>
> Something to consider is that there is another managed guest memory
> pool, the PoD pool, which has a dedicated privilege. This leads me to
> the question of whether access to manage the PoD pool size and the
> paging pool size should be separate accesses or fall under the same
> access. IMHO it should be the latter, as I can see no benefit in
> disaggregating access to the PoD pool and the paging pool. In fact I
> find myself asking whether the managing domain should have control over
> the size of any backing memory pool for the target domain, and I am not
> seeing any benefit to discriminating between which backing memory pool a
> managing domain should be able to manage. With that said, I am open to
> being convinced otherwise.
Yet the two pools are of quite different nature: The PoD pool is memory
the domain itself gets to use (more precisely it is memory temporarily
"stolen" from the domain). The paging pool, otoh, is memory we need to
make the domain actually function, without the guest having access to
that memory.
Jan
* Re: Flask vs paging mempool - Was: [xen-unstable test] 174809: regressions - trouble: broken/fail/pass
2022-11-18 21:10 ` Jason Andryuk
2022-11-20 11:08 ` Daniel P. Smith
@ 2022-11-21 11:37 ` Andrew Cooper
1 sibling, 0 replies; 8+ messages in thread
From: Andrew Cooper @ 2022-11-21 11:37 UTC (permalink / raw)
To: Jason Andryuk
Cc: Roger Pau Monne, Henry Wang, Anthony Perard, Daniel Smith, xen-devel
On 18/11/2022 21:10, Jason Andryuk wrote:
> On Fri, Nov 18, 2022 at 12:22 PM Andrew Cooper
> <Andrew.Cooper3@citrix.com> wrote:
>> On 18/11/2022 14:39, Roger Pau Monne wrote:
>>> Nov 18 01:55:11.753936 (XEN) arch/x86/mm/hap/hap.c:304: d1 failed to allocate from HAP pool
>>> Nov 18 01:55:18.633799 (XEN) Failed to shatter gfn 7ed37: -12
>>> Nov 18 01:55:18.633866 (XEN) d1v0 EPT violation 0x19c (--x/rw-) gpa 0x0000007ed373a1 mfn 0x33ed37 type 0
>>> Nov 18 01:55:18.645790 (XEN) d1v0 Walking EPT tables for GFN 7ed37:
>>> Nov 18 01:55:18.645850 (XEN) d1v0 epte 9c0000047eba3107
>>> Nov 18 01:55:18.645893 (XEN) d1v0 epte 9c000003000003f3
>>> Nov 18 01:55:18.645935 (XEN) d1v0 --- GLA 0x7ed373a1
>>> Nov 18 01:55:18.657783 (XEN) domain_crash called from arch/x86/hvm/vmx/vmx.c:3758
>>> Nov 18 01:55:18.657844 (XEN) Domain 1 (vcpu#0) crashed on cpu#8:
>>> Nov 18 01:55:18.669781 (XEN) ----[ Xen-4.17-rc x86_64 debug=y Not tainted ]----
>>> Nov 18 01:55:18.669843 (XEN) CPU: 8
>>> Nov 18 01:55:18.669884 (XEN) RIP: 0020:[<000000007ed373a1>]
>>> Nov 18 01:55:18.681711 (XEN) RFLAGS: 0000000000010002 CONTEXT: hvm guest (d1v0)
>>> Nov 18 01:55:18.681772 (XEN) rax: 000000007ed373a1 rbx: 000000007ed3726c rcx: 0000000000000000
>>> Nov 18 01:55:18.693713 (XEN) rdx: 000000007ed2e610 rsi: 0000000000008e38 rdi: 000000007ed37448
>>> Nov 18 01:55:18.693775 (XEN) rbp: 0000000001b410a0 rsp: 0000000000320880 r8: 0000000000000000
>>> Nov 18 01:55:18.705725 (XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
>>> Nov 18 01:55:18.717733 (XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000000
>>> Nov 18 01:55:18.717794 (XEN) r15: 0000000000000000 cr0: 0000000000000011 cr4: 0000000000000000
>>> Nov 18 01:55:18.729713 (XEN) cr3: 0000000000400000 cr2: 0000000000000000
>>> Nov 18 01:55:18.729771 (XEN) fsb: 0000000000000000 gsb: 0000000000000000 gss: 0000000000000002
>>> Nov 18 01:55:18.741711 (XEN) ds: 0028 es: 0028 fs: 0000 gs: 0000 ss: 0028 cs: 0020
>>>
>>> It seems to be related to the paging pool; adding Andrew and Henry so
>>> that they are aware.
>> Summary of what I've just given on IRC/Matrix.
>>
>> This crash is caused by two things. First
>>
>> (XEN) FLASK: Denying unknown domctl: 86.
>>
>> because I completely forgot to wire up Flask for the new hypercalls.
>> But so did the original XSA-409 fix (as SECCLASS_SHADOW is behind
>> CONFIG_X86), so I don't feel quite as bad about this.
> Broken for ARM, but not for x86, right?
Specifically, the original XSA-409 fix broke Flask (on ARM only) by
introducing shadow domctl to ARM without making flask_shadow_control()
common.
I "fixed" that by removing ARM's use of shadow domctl, and broke it
differently by not adding Flask controls for the new common hypercalls.
> I think SECCLASS_SHADOW is available in the policy bits - it's just
> whether or not the hook functions are available?
I suspect so.
>> And second because libxl ignores the error it gets back, and blindly
>> continues onward. Anthony has posted "libs/light: Propagate
>> libxl__arch_domain_create() return code" to fix the libxl half of the
>> bug, and I posted a second libxl bugfix to fix an error message. Both
>> are very simple.
>>
>>
>> For Flask, we need new access vectors because this is a common
>> hypercall, but I'm unsure how to interlink it with x86's shadow
>> control. This will require a bit of pondering, but it is probably
>> easier to just leave them unlinked.
> It sort of seems like it could go under domain2 since domain/domain2
> have most of the memory stuff, but it is non-PV. shadow has its own
> set of hooks. It could go in hvm which already has some memory stuff.
Having looked at all the proposed options, I'm going to put it in
domain/domain2.
This new hypercall is intentionally common and applicable to all domain
types (eventually; x86 PV guests use this memory pool during migrate).
Furthermore, it needs backporting along with all the other fixes to try
to make XSA-409 work.
~Andrew
* Re: Flask vs paging mempool - Was: [xen-unstable test] 174809: regressions - trouble: broken/fail/pass
2022-11-21 8:04 ` Jan Beulich
@ 2022-11-21 12:14 ` Daniel P. Smith
0 siblings, 0 replies; 8+ messages in thread
From: Daniel P. Smith @ 2022-11-21 12:14 UTC (permalink / raw)
To: Jan Beulich
Cc: Roger Pau Monne, Henry Wang, Anthony Perard, xen-devel,
Jason Andryuk, Andrew Cooper
On 11/21/22 03:04, Jan Beulich wrote:
> On 20.11.2022 12:08, Daniel P. Smith wrote:
>> On 11/18/22 16:10, Jason Andryuk wrote:
>>> On Fri, Nov 18, 2022 at 12:22 PM Andrew Cooper <Andrew.Cooper3@citrix.com> wrote:
>>>> For Flask, we need new access vectors because this is a common
>>>> hypercall, but I'm unsure how to interlink it with x86's shadow
>>>> control. This will require a bit of pondering, but it is probably
>>>> easier to just leave them unlinked.
>>>
>>> It sort of seems like it could go under domain2 since domain/domain2
>>> have most of the memory stuff, but it is non-PV. shadow has its own
>>> set of hooks. It could go in hvm which already has some memory stuff.
>>
>> Since the new hypercall is for managing a memory pool for any domain
>> type, though HVM is the only one supported today, IMHO it belongs under
>> domain/domain2.
>>
>> Something to consider is that there is another managed guest memory
>> pool, the PoD pool, which has a dedicated privilege. This leads me to
>> the question of whether access to manage the PoD pool size and the
>> paging pool size should be separate accesses or fall under the same
>> access. IMHO it should be the latter, as I can see no benefit in
>> disaggregating access to the PoD pool and the paging pool. In fact I
>> find myself asking whether the managing domain should have control over
>> the size of any backing memory pool for the target domain, and I am not
>> seeing any benefit to discriminating between which backing memory pool a
>> managing domain should be able to manage. With that said, I am open to
>> being convinced otherwise.
>
> Yet the two pools are of quite different nature: The PoD pool is memory
> the domain itself gets to use (more precisely it is memory temporarily
> "stolen" from the domain). The paging pool, otoh, is memory we need to
> make the domain actually function, without the guest having access to
> that memory.
The question is not necessarily what the pools' exact purposes are, but
who will need control over their size. If one takes a coarser view and
says these memory pools relate to how a domain consumes memory, then it
follows that the only entity needing access is the entity granted
control/management of the domain's memory usage. In the end there will
still be an access check for both calls; the question is whether it
makes any sense to differentiate between them in the security model. As
I just outlined, IMHO there is not, but I am open to hearing why they
would need to be differentiated in the security model.
v/r,
dps
Thread overview: 8+ messages
-- links below jump to the message on this page --
2022-11-18 10:21 [xen-unstable test] 174809: regressions - trouble: broken/fail/pass osstest service owner
2022-11-18 14:39 ` Roger Pau Monné
2022-11-18 17:22 ` Flask vs paging mempool - Was: " Andrew Cooper
2022-11-18 21:10 ` Jason Andryuk
2022-11-20 11:08 ` Daniel P. Smith
2022-11-21 8:04 ` Jan Beulich
2022-11-21 12:14 ` Daniel P. Smith
2022-11-21 11:37 ` Andrew Cooper