* [xen-unstable test] 174809: regressions - trouble: broken/fail/pass
From: osstest service owner @ 2022-11-18 10:21 UTC (permalink / raw)
  To: xen-devel

flight 174809 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/174809/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict    <job status>   broken
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 5 host-install(5) broken REGR. vs. 174797
 test-amd64-amd64-xl-credit2  20 guest-localmigrate/x10   fail REGR. vs. 174797
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 15 guest-saverestore fail REGR. vs. 174797
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 15 guest-saverestore fail REGR. vs. 174797
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 12 debian-hvm-install fail REGR. vs. 174797
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail REGR. vs. 174797
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 12 debian-hvm-install fail REGR. vs. 174797
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 12 debian-hvm-install fail REGR. vs. 174797
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 12 debian-hvm-install fail REGR. vs. 174797
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 15 guest-saverestore fail REGR. vs. 174797

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop            fail like 174797
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop             fail like 174797
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop            fail like 174797
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop             fail like 174797
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 174797
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop             fail like 174797
 test-armhf-armhf-libvirt     16 saverestore-support-check    fail  like 174797
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop            fail like 174797
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 174797
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check    fail  like 174797
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop             fail like 174797
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop            fail like 174797
 test-amd64-i386-xl-pvshim    14 guest-start                  fail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-check    fail   never pass
 test-amd64-amd64-libvirt     15 migrate-support-check        fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check        fail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-check        fail   never pass
 test-amd64-i386-libvirt      15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl          15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl          16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      16 saverestore-support-check    fail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check        fail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check    fail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check    fail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-check        fail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check        fail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check    fail  never pass
 test-armhf-armhf-xl          15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-vhd      14 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-vhd      15 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check        fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check    fail never pass
 test-armhf-armhf-xl-rtds     15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-rtds     16 saverestore-support-check    fail   never pass
 test-armhf-armhf-libvirt     15 migrate-support-check        fail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check        fail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-vhd      14 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-vhd      15 saverestore-support-check    fail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check        fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check        fail   never pass

version targeted for testing:
 xen                  db8fa01c61db0317a9ee947925226234c65d48e8
baseline version:
 xen                  f5d56f4b253072264efc0fece698a91779e362f5

Last test of basis   174797  2022-11-17 03:03:07 Z    1 days
Testing same since   174809  2022-11-18 00:06:55 Z    0 days    1 attempts

------------------------------------------------------------
People who touched revisions under test:
  Andrew Cooper <andrew.cooper3@citrix.com>
  Anthony PERARD <anthony.perard@citrix.com>
  Jan Beulich <jbeulich@suse.com>

jobs:
 build-amd64-xsm                                              pass    
 build-arm64-xsm                                              pass    
 build-i386-xsm                                               pass    
 build-amd64-xtf                                              pass    
 build-amd64                                                  pass    
 build-arm64                                                  pass    
 build-armhf                                                  pass    
 build-i386                                                   pass    
 build-amd64-libvirt                                          pass    
 build-arm64-libvirt                                          pass    
 build-armhf-libvirt                                          pass    
 build-i386-libvirt                                           pass    
 build-amd64-prev                                             pass    
 build-i386-prev                                              pass    
 build-amd64-pvops                                            pass    
 build-arm64-pvops                                            pass    
 build-armhf-pvops                                            pass    
 build-i386-pvops                                             pass    
 test-xtf-amd64-amd64-1                                       pass    
 test-xtf-amd64-amd64-2                                       pass    
 test-xtf-amd64-amd64-3                                       pass    
 test-xtf-amd64-amd64-4                                       pass    
 test-xtf-amd64-amd64-5                                       pass    
 test-amd64-amd64-xl                                          pass    
 test-amd64-coresched-amd64-xl                                pass    
 test-arm64-arm64-xl                                          pass    
 test-armhf-armhf-xl                                          pass    
 test-amd64-i386-xl                                           pass    
 test-amd64-coresched-i386-xl                                 pass    
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm           fail    
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm            fail    
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm        fail    
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm         fail    
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm                 fail    
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm                  fail    
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm                 fail    
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm                  fail    
 test-amd64-amd64-libvirt-xsm                                 pass    
 test-arm64-arm64-libvirt-xsm                                 pass    
 test-amd64-i386-libvirt-xsm                                  pass    
 test-amd64-amd64-xl-xsm                                      pass    
 test-arm64-arm64-xl-xsm                                      pass    
 test-amd64-i386-xl-xsm                                       pass    
 test-amd64-amd64-qemuu-nested-amd                            fail    
 test-amd64-amd64-xl-pvhv2-amd                                pass    
 test-amd64-i386-qemut-rhel6hvm-amd                           pass    
 test-amd64-i386-qemuu-rhel6hvm-amd                           pass    
 test-amd64-amd64-dom0pvh-xl-amd                              pass    
 test-amd64-amd64-xl-qemut-debianhvm-amd64                    pass    
 test-amd64-i386-xl-qemut-debianhvm-amd64                     pass    
 test-amd64-amd64-xl-qemuu-debianhvm-amd64                    pass    
 test-amd64-i386-xl-qemuu-debianhvm-amd64                     pass    
 test-amd64-i386-freebsd10-amd64                              pass    
 test-amd64-amd64-qemuu-freebsd11-amd64                       pass    
 test-amd64-amd64-qemuu-freebsd12-amd64                       pass    
 test-amd64-amd64-xl-qemuu-ovmf-amd64                         pass    
 test-amd64-i386-xl-qemuu-ovmf-amd64                          pass    
 test-amd64-amd64-xl-qemut-win7-amd64                         fail    
 test-amd64-i386-xl-qemut-win7-amd64                          fail    
 test-amd64-amd64-xl-qemuu-win7-amd64                         fail    
 test-amd64-i386-xl-qemuu-win7-amd64                          fail    
 test-amd64-amd64-xl-qemut-ws16-amd64                         fail    
 test-amd64-i386-xl-qemut-ws16-amd64                          fail    
 test-amd64-amd64-xl-qemuu-ws16-amd64                         fail    
 test-amd64-i386-xl-qemuu-ws16-amd64                          fail    
 test-armhf-armhf-xl-arndale                                  pass    
 test-amd64-amd64-examine-bios                                pass    
 test-amd64-i386-examine-bios                                 pass    
 test-amd64-amd64-xl-credit1                                  pass    
 test-arm64-arm64-xl-credit1                                  pass    
 test-armhf-armhf-xl-credit1                                  pass    
 test-amd64-amd64-xl-credit2                                  fail    
 test-arm64-arm64-xl-credit2                                  pass    
 test-armhf-armhf-xl-credit2                                  pass    
 test-armhf-armhf-xl-cubietruck                               pass    
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict        broken  
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict         pass    
 test-amd64-amd64-examine                                     pass    
 test-arm64-arm64-examine                                     pass    
 test-armhf-armhf-examine                                     pass    
 test-amd64-i386-examine                                      pass    
 test-amd64-i386-freebsd10-i386                               pass    
 test-amd64-amd64-qemuu-nested-intel                          pass    
 test-amd64-amd64-xl-pvhv2-intel                              pass    
 test-amd64-i386-qemut-rhel6hvm-intel                         pass    
 test-amd64-i386-qemuu-rhel6hvm-intel                         pass    
 test-amd64-amd64-dom0pvh-xl-intel                            pass    
 test-amd64-amd64-libvirt                                     pass    
 test-armhf-armhf-libvirt                                     pass    
 test-amd64-i386-libvirt                                      pass    
 test-amd64-amd64-livepatch                                   pass    
 test-amd64-i386-livepatch                                    pass    
 test-amd64-amd64-migrupgrade                                 pass    
 test-amd64-i386-migrupgrade                                  pass    
 test-amd64-amd64-xl-multivcpu                                pass    
 test-armhf-armhf-xl-multivcpu                                pass    
 test-amd64-amd64-pair                                        pass    
 test-amd64-i386-pair                                         pass    
 test-amd64-amd64-libvirt-pair                                pass    
 test-amd64-i386-libvirt-pair                                 pass    
 test-amd64-amd64-xl-pvshim                                   pass    
 test-amd64-i386-xl-pvshim                                    fail    
 test-amd64-amd64-pygrub                                      pass    
 test-armhf-armhf-libvirt-qcow2                               pass    
 test-amd64-amd64-xl-qcow2                                    pass    
 test-arm64-arm64-libvirt-raw                                 pass    
 test-armhf-armhf-libvirt-raw                                 pass    
 test-amd64-i386-libvirt-raw                                  pass    
 test-amd64-amd64-xl-rtds                                     pass    
 test-armhf-armhf-xl-rtds                                     pass    
 test-arm64-arm64-xl-seattle                                  pass    
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow             pass    
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow              pass    
 test-amd64-amd64-xl-shadow                                   pass    
 test-amd64-i386-xl-shadow                                    pass    
 test-arm64-arm64-xl-thunderx                                 pass    
 test-amd64-amd64-examine-uefi                                pass    
 test-amd64-i386-examine-uefi                                 pass    
 test-amd64-amd64-libvirt-vhd                                 pass    
 test-arm64-arm64-xl-vhd                                      pass    
 test-armhf-armhf-xl-vhd                                      pass    
 test-amd64-i386-xl-vhd                                       pass    


------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
    http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
    http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

broken-job test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict broken
broken-step test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict host-install(5)

Not pushing.

------------------------------------------------------------
commit db8fa01c61db0317a9ee947925226234c65d48e8
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date:   Thu Oct 20 12:14:30 2022 +0100

    xen/arm: Correct the p2m pool size calculations
    
    Allocating or freeing p2m pages doesn't alter the size of the mempool; only
    the split between free and used pages.
    
    Right now, the hypercalls operate on the free subset of the pool, meaning that
    XEN_DOMCTL_get_paging_mempool_size varies with time as the guest shuffles its
    physmap, and XEN_DOMCTL_set_paging_mempool_size ignores the used subset of the
    pool and lets the guest grow unbounded.
    
    This fixes test-paging-mempool on ARM so that the behaviour matches x86.
    
    This is part of XSA-409 / CVE-2022-33747.
    
    Fixes: cbea5a1149ca ("xen/arm: Allocate and free P2M pages from the P2M pool")
    Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Reviewed-by: Julien Grall <jgrall@amazon.com>
    Release-acked-by: Henry Wang <Henry.Wang@arm.com>
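
As a rough illustration of the accounting described above (a minimal C sketch, not the actual Xen code): the size reported by XEN_DOMCTL_get_paging_mempool_size and adjusted by XEN_DOMCTL_set_paging_mempool_size is the whole pool, free plus used pages, so it does not drift as the guest shuffles its physmap:

    /* Sketch only: the externally visible pool size is free + used, and
     * allocating a p2m page merely moves a page between the two subsets. */
    struct paging_pool {
        unsigned long free_pages;  /* pages available for new P2M tables */
        unsigned long used_pages;  /* pages currently holding P2M tables */
    };

    unsigned long pool_size(const struct paging_pool *p)
    {
        return p->free_pages + p->used_pages;  /* unchanged by alloc/free */
    }

    void alloc_p2m_page(struct paging_pool *p)
    {
        if ( p->free_pages )
        {
            p->free_pages--;   /* the split changes ...   */
            p->used_pages++;   /* ... the total does not. */
        }
    }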

commit 7c3bbd940dd8aeb1649734e5055798cc6f3fea4e
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date:   Tue Oct 25 15:27:05 2022 +0100

    xen/arm, libxl: Revert XEN_DOMCTL_shadow_op; use p2m mempool hypercalls
    
    This reverts most of commit cf2a68d2ffbc3ce95e01449d46180bddb10d24a0, and bits
    of cbea5a1149ca7fd4b7cdbfa3ec2e4f109b601ff7.
    
    First of all, with ARM borrowing x86's implementation, the logic to set the
    pool size should have been common, not duplicated.  Introduce
    libxl__domain_set_paging_mempool_size() as a shared implementation, and use it
    from the ARM and x86 paths.  It is left as an exercise to the reader to judge
    how libxl/xl can reasonably function without the ability to query the pool
    size...
    
    Remove ARM's p2m_domctl() infrastructure now that the functionality has been
    replaced with a working and unit-tested interface.
    
    This is part of XSA-409 / CVE-2022-33747.
    
    Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
    Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
    Release-acked-by: Henry Wang <Henry.Wang@arm.com>

commit bd87315a603bf25e869e6293f7db7b1024d67999
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date:   Thu Oct 20 12:13:46 2022 +0100

    tools/tests: Unit test for paging mempool size
    
    Exercise some basic functionality of the new
    xc_{get,set}_paging_mempool_size() hypercalls.
    
    This passes on x86, but fails currently on ARM.  ARM will be fixed up in
    future patches.
    
    This is part of XSA-409 / CVE-2022-33747.
    
    Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Acked-by: Jan Beulich <jbeulich@suse.com>
    Acked-by: Anthony PERARD <anthony.perard@citrix.com>
    Release-acked-by: Henry Wang <Henry.Wang@arm.com>
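
The xc_{get,set}_paging_mempool_size() names come from the commit above; the signatures used below are assumed for illustration, so treat this only as a sketch of the kind of round-trip such a test exercises:

    /* Sketch, assuming: int xc_get_paging_mempool_size(xch, domid, uint64_t *)
     *                   int xc_set_paging_mempool_size(xch, domid, uint64_t)  */
    #include <inttypes.h>
    #include <stdio.h>
    #include <xenctrl.h>

    int check_paging_mempool(xc_interface *xch, uint32_t domid)
    {
        uint64_t size;

        /* "get" should return a sane value even before any explicit "set". */
        if ( xc_get_paging_mempool_size(xch, domid, &size) )
            return -1;

        printf("d%u paging mempool: %" PRIu64 " bytes\n", domid, size);

        /* "set" followed by "get": the byte count should round-trip exactly. */
        if ( xc_set_paging_mempool_size(xch, domid, size) )
            return -1;

        return xc_get_paging_mempool_size(xch, domid, &size);
    }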

commit 22b20bd98c025e06525410e3ab3494d5e63489f7
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date:   Fri Oct 21 14:13:00 2022 +0100

    xen: Introduce non-broken hypercalls for the paging mempool size
    
    The existing XEN_DOMCTL_SHADOW_OP_{GET,SET}_ALLOCATION have problems:
    
     * All set_allocation() flavours have an overflow-before-widen bug when
       calculating "sc->mb << (20 - PAGE_SHIFT)".
     * All flavours have a granularity of 1M.  This was tolerable when the size of
       the pool could only be set at the same granularity, but is broken now that
       ARM has a 16-page stopgap allocation in use.
     * All get_allocation() flavours round up, and in particular turn 0 into 1,
       meaning the get op returns junk before a successful set op.
     * The x86 flavours reject the hypercalls before the VM has vCPUs allocated,
       despite the pool size being a domain property.
     * Even the hypercall names are long-obsolete.
    
    Implement a better interface, which can be first used to unit test the
    behaviour, and subsequently correct a broken implementation.  The old
    interface will be retired in due course.
    
    The unit of bytes (as opposed to pages) is a deliberate API/ABI improvement to
    more easily support multiple page granularities.
    
    This is part of XSA-409 / CVE-2022-33747.
    
    Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
    Reviewed-by: Jan Beulich <jbeulich@suse.com>
    Acked-by: Anthony PERARD <anthony.perard@citrix.com>
    Release-acked-by: Henry Wang <Henry.Wang@arm.com>
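
A standalone C illustration of the overflow-before-widen pattern named in the first bullet above (simplified; not the actual Xen code, with sc->mb standing in for the old interface's 32-bit megabyte count):

    #include <stdint.h>

    #define PAGE_SHIFT 12

    /* Buggy: the shift happens in 32-bit arithmetic and wraps for
     * mb >= (1u << 24), i.e. pools of 16 TiB or more, before the
     * result is widened to 64 bits. */
    uint64_t pool_pages_buggy(uint32_t mb)
    {
        return mb << (20 - PAGE_SHIFT);
    }

    /* Fixed: widen first, then shift. */
    uint64_t pool_pages_fixed(uint32_t mb)
    {
        return (uint64_t)mb << (20 - PAGE_SHIFT);
    }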

commit e5ac68a0110cb43a3a0bc17d545ae7a0bd746ef9
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date:   Mon Nov 14 21:47:59 2022 +0000

    x86/hvm: Revert per-domain APIC acceleration support
    
    I was really hoping to avoid this, but it's now too late in the 4.17 freeze and
    we still don't have working fixes.
    
    The in-Xen calculations for assistance capabilities are buggy.  For the
    avoidance of doubt, the original intention was to be able to control every
    aspect of APIC acceleration so we could comprehensively test Xen's support,
    as it has proved to be buggy time and time again.
    
    Even after a protracted discussion on what the new API ought to mean, attempts
    to apply it to the existing logic have been unsuccessful, proving that the
    API/ABI is too complicated for most people to reason about.
    
    This reverts most of:
      2ce11ce249a3981bac50914c6a90f681ad7a4222
      6b2b9b3405092c3ad38d7342988a584b8efa674c
    
    leaving in place the non-APIC specific changes (minimal as they are).
    
    This takes us back to the behaviour of Xen 4.16 where APIC acceleration is
    configured on a per system basis.
    
    This work will be revisited in due course.
    
    Fixes: 2ce11ce249a3 ("x86/HVM: allow per-domain usage of hardware virtualized APIC")
    Fixes: 6b2b9b340509 ("x86: report Interrupt Controller Virtualization capabilities")
    Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Acked-by: Jan Beulich <jbeulich@suse.com>
    Release-acked-by: Henry Wang <Henry.Wang@arm.com>
(qemu changes not included)



* Re: [xen-unstable test] 174809: regressions - trouble: broken/fail/pass
From: Roger Pau Monné @ 2022-11-18 14:39 UTC (permalink / raw)
  To: Andrew Cooper, Henry Wang; +Cc: xen-devel

On Fri, Nov 18, 2022 at 10:21:52AM +0000, osstest service owner wrote:
> flight 174809 xen-unstable real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/174809/
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict    <job status>   broken
>  test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 5 host-install(5) broken REGR. vs. 174797
>  test-amd64-amd64-xl-credit2  20 guest-localmigrate/x10   fail REGR. vs. 174797
>  test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 15 guest-saverestore fail REGR. vs. 174797
>  test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 15 guest-saverestore fail REGR. vs. 174797
>  test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 12 debian-hvm-install fail REGR. vs. 174797
>  test-amd64-i386-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail REGR. vs. 174797
>  test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 12 debian-hvm-install fail REGR. vs. 174797
>  test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 12 debian-hvm-install fail REGR. vs. 174797
>  test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 12 debian-hvm-install fail REGR. vs. 174797
>  test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 15 guest-saverestore fail REGR. vs. 174797

Looking at a random failure:

Nov 18 01:55:09.233941 (d1) Searching bootorder for: HALT
Nov 18 01:55:11.681666 (d1) drive 0x000f5890: PCHS=16383/16/63 translation=lba LCHS=1024/255/63 s=20480000
Nov 18 01:55:11.693694 (d1) Space available for UMB: cb000-e9000, f52e0-f5820
Nov 18 01:55:11.693754 (d1) Returned 258048 bytes of ZoneHigh
Nov 18 01:55:11.705648 (d1) e820 map has 8 items:
Nov 18 01:55:11.705676 (d1)   0: 0000000000000000 - 000000000009fc00 = 1 RAM
Nov 18 01:55:11.705701 (d1)   1: 000000000009fc00 - 00000000000a0000 = 2 RESERVED
Nov 18 01:55:11.717716 (d1)   2: 00000000000f0000 - 0000000000100000 = 2 RESERVED
Nov 18 01:55:11.717768 (d1)   3: 0000000000100000 - 00000000effff000 = 1 RAM
Nov 18 01:55:11.729687 (d1)   4: 00000000effff000 - 00000000f0000000 = 2 RESERVED
Nov 18 01:55:11.729745 (d1)   5: 00000000fc000000 - 00000000fc00b000 = 4 NVS
Nov 18 01:55:11.741693 (d1)   6: 00000000fc00b000 - 0000000100000000 = 2 RESERVED
Nov 18 01:55:11.741752 (d1)   7: 0000000100000000 - 0000000148000000 = 1 RAM
Nov 18 01:55:11.753644 (d1) enter handle_19:
Nov 18 01:55:11.753721 (d1)   NULL
Nov 18 01:55:11.753796 (d1) Booting from DVD/CD...
Nov 18 01:55:11.753864 (d1) Booting from 0000:7c00
Nov 18 01:55:11.753936 (XEN) arch/x86/mm/hap/hap.c:304: d1 failed to allocate from HAP pool
Nov 18 01:55:18.633799 (XEN) Failed to shatter gfn 7ed37: -12
Nov 18 01:55:18.633866 (XEN) d1v0 EPT violation 0x19c (--x/rw-) gpa 0x0000007ed373a1 mfn 0x33ed37 type 0
Nov 18 01:55:18.645790 (XEN) d1v0 Walking EPT tables for GFN 7ed37:
Nov 18 01:55:18.645850 (XEN) d1v0  epte 9c0000047eba3107
Nov 18 01:55:18.645893 (XEN) d1v0  epte 9c000003000003f3
Nov 18 01:55:18.645935 (XEN) d1v0  --- GLA 0x7ed373a1
Nov 18 01:55:18.657783 (XEN) domain_crash called from arch/x86/hvm/vmx/vmx.c:3758
Nov 18 01:55:18.657844 (XEN) Domain 1 (vcpu#0) crashed on cpu#8:
Nov 18 01:55:18.669781 (XEN) ----[ Xen-4.17-rc  x86_64  debug=y  Not tainted ]----
Nov 18 01:55:18.669843 (XEN) CPU:    8
Nov 18 01:55:18.669884 (XEN) RIP:    0020:[<000000007ed373a1>]
Nov 18 01:55:18.681711 (XEN) RFLAGS: 0000000000010002   CONTEXT: hvm guest (d1v0)
Nov 18 01:55:18.681772 (XEN) rax: 000000007ed373a1   rbx: 000000007ed3726c   rcx: 0000000000000000
Nov 18 01:55:18.693713 (XEN) rdx: 000000007ed2e610   rsi: 0000000000008e38   rdi: 000000007ed37448
Nov 18 01:55:18.693775 (XEN) rbp: 0000000001b410a0   rsp: 0000000000320880   r8:  0000000000000000
Nov 18 01:55:18.705725 (XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
Nov 18 01:55:18.717733 (XEN) r12: 0000000000000000   r13: 0000000000000000   r14: 0000000000000000
Nov 18 01:55:18.717794 (XEN) r15: 0000000000000000   cr0: 0000000000000011   cr4: 0000000000000000
Nov 18 01:55:18.729713 (XEN) cr3: 0000000000400000   cr2: 0000000000000000
Nov 18 01:55:18.729771 (XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000002
Nov 18 01:55:18.741711 (XEN) ds: 0028   es: 0028   fs: 0000   gs: 0000   ss: 0028   cs: 0020

It seems to be related to the paging pool; adding Andrew and Henry so
that they are aware.

Roger.



* Flask vs paging mempool - Was: [xen-unstable test] 174809: regressions - trouble: broken/fail/pass
From: Andrew Cooper @ 2022-11-18 17:22 UTC (permalink / raw)
  To: Roger Pau Monne, Henry Wang, Anthony Perard, Daniel Smith, Jason Andryuk
  Cc: xen-devel

On 18/11/2022 14:39, Roger Pau Monne wrote:
> Nov 18 01:55:11.753936 (XEN) arch/x86/mm/hap/hap.c:304: d1 failed to allocate from HAP pool
> Nov 18 01:55:18.633799 (XEN) Failed to shatter gfn 7ed37: -12
> Nov 18 01:55:18.633866 (XEN) d1v0 EPT violation 0x19c (--x/rw-) gpa 0x0000007ed373a1 mfn 0x33ed37 type 0
> Nov 18 01:55:18.645790 (XEN) d1v0 Walking EPT tables for GFN 7ed37:
> Nov 18 01:55:18.645850 (XEN) d1v0  epte 9c0000047eba3107
> Nov 18 01:55:18.645893 (XEN) d1v0  epte 9c000003000003f3
> Nov 18 01:55:18.645935 (XEN) d1v0  --- GLA 0x7ed373a1
> Nov 18 01:55:18.657783 (XEN) domain_crash called from arch/x86/hvm/vmx/vmx.c:3758
> Nov 18 01:55:18.657844 (XEN) Domain 1 (vcpu#0) crashed on cpu#8:
> Nov 18 01:55:18.669781 (XEN) ----[ Xen-4.17-rc  x86_64  debug=y  Not tainted ]----
> Nov 18 01:55:18.669843 (XEN) CPU:    8
> Nov 18 01:55:18.669884 (XEN) RIP:    0020:[<000000007ed373a1>]
> Nov 18 01:55:18.681711 (XEN) RFLAGS: 0000000000010002   CONTEXT: hvm guest (d1v0)
> Nov 18 01:55:18.681772 (XEN) rax: 000000007ed373a1   rbx: 000000007ed3726c   rcx: 0000000000000000
> Nov 18 01:55:18.693713 (XEN) rdx: 000000007ed2e610   rsi: 0000000000008e38   rdi: 000000007ed37448
> Nov 18 01:55:18.693775 (XEN) rbp: 0000000001b410a0   rsp: 0000000000320880   r8:  0000000000000000
> Nov 18 01:55:18.705725 (XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
> Nov 18 01:55:18.717733 (XEN) r12: 0000000000000000   r13: 0000000000000000   r14: 0000000000000000
> Nov 18 01:55:18.717794 (XEN) r15: 0000000000000000   cr0: 0000000000000011   cr4: 0000000000000000
> Nov 18 01:55:18.729713 (XEN) cr3: 0000000000400000   cr2: 0000000000000000
> Nov 18 01:55:18.729771 (XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000002
> Nov 18 01:55:18.741711 (XEN) ds: 0028   es: 0028   fs: 0000   gs: 0000   ss: 0028   cs: 0020
>
> It seems to be related to the paging pool; adding Andrew and Henry so
> that they are aware.

Summary of what I've just given on IRC/Matrix.

This crash is caused by two things.  First

  (XEN) FLASK: Denying unknown domctl: 86.

because I completely forgot to wire up Flask for the new hypercalls. 
But so did the original XSA-409 fix (as SECCLASS_SHADOW is behind
CONFIG_X86), so I don't feel quite as bad about this.

And second because libxl ignores the error it gets back, and blindly
continues onward.  Anthony has posted "libs/light: Propagate
libxl__arch_domain_create() return code" to fix the libxl half of the
bug, and I posted a second libxl bugfix to fix an error message.  Both
are very simple.
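
The libxl half really is just plumbing a return value that used to be dropped; a minimal sketch with illustrative names (not the actual patch):

    #include <stdio.h>

    /* Stand-in for libxl__arch_domain_create(); fails the way a
     * Flask-denied domctl would. */
    static int arch_domain_create(void)
    {
        return -1;
    }

    static int domain_create(void)
    {
        int rc = arch_domain_create();

        /* Previously the error was ignored and creation blindly carried
         * on; now the failure is propagated to the caller. */
        if ( rc )
            return rc;

        return 0;
    }

    int main(void)
    {
        printf("domain_create() = %d\n", domain_create());
        return 0;
    }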


For Flask, we need new access vectors because this is a common
hypercall, but I'm unsure how to interlink it with x86's shadow
control.  This will require a bit of pondering, but it is probably
easier to just leave them unlinked.


Flask is listed as experimental which means it doesn't technically
matter if we break it, but it is used by OpenXT so not fixing it for
4.17 would be rather rude.

~Andrew


* Re: Flask vs paging mempool - Was: [xen-unstable test] 174809: regressions - trouble: broken/fail/pass
From: Jason Andryuk @ 2022-11-18 21:10 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monne, Henry Wang, Anthony Perard, Daniel Smith, xen-devel

On Fri, Nov 18, 2022 at 12:22 PM Andrew Cooper
<Andrew.Cooper3@citrix.com> wrote:
>
> On 18/11/2022 14:39, Roger Pau Monne wrote:
> > Nov 18 01:55:11.753936 (XEN) arch/x86/mm/hap/hap.c:304: d1 failed to allocate from HAP pool
> > Nov 18 01:55:18.633799 (XEN) Failed to shatter gfn 7ed37: -12
> > Nov 18 01:55:18.633866 (XEN) d1v0 EPT violation 0x19c (--x/rw-) gpa 0x0000007ed373a1 mfn 0x33ed37 type 0
> > Nov 18 01:55:18.645790 (XEN) d1v0 Walking EPT tables for GFN 7ed37:
> > Nov 18 01:55:18.645850 (XEN) d1v0  epte 9c0000047eba3107
> > Nov 18 01:55:18.645893 (XEN) d1v0  epte 9c000003000003f3
> > Nov 18 01:55:18.645935 (XEN) d1v0  --- GLA 0x7ed373a1
> > Nov 18 01:55:18.657783 (XEN) domain_crash called from arch/x86/hvm/vmx/vmx.c:3758
> > Nov 18 01:55:18.657844 (XEN) Domain 1 (vcpu#0) crashed on cpu#8:
> > Nov 18 01:55:18.669781 (XEN) ----[ Xen-4.17-rc  x86_64  debug=y  Not tainted ]----
> > Nov 18 01:55:18.669843 (XEN) CPU:    8
> > Nov 18 01:55:18.669884 (XEN) RIP:    0020:[<000000007ed373a1>]
> > Nov 18 01:55:18.681711 (XEN) RFLAGS: 0000000000010002   CONTEXT: hvm guest (d1v0)
> > Nov 18 01:55:18.681772 (XEN) rax: 000000007ed373a1   rbx: 000000007ed3726c   rcx: 0000000000000000
> > Nov 18 01:55:18.693713 (XEN) rdx: 000000007ed2e610   rsi: 0000000000008e38   rdi: 000000007ed37448
> > Nov 18 01:55:18.693775 (XEN) rbp: 0000000001b410a0   rsp: 0000000000320880   r8:  0000000000000000
> > Nov 18 01:55:18.705725 (XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
> > Nov 18 01:55:18.717733 (XEN) r12: 0000000000000000   r13: 0000000000000000   r14: 0000000000000000
> > Nov 18 01:55:18.717794 (XEN) r15: 0000000000000000   cr0: 0000000000000011   cr4: 0000000000000000
> > Nov 18 01:55:18.729713 (XEN) cr3: 0000000000400000   cr2: 0000000000000000
> > Nov 18 01:55:18.729771 (XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000002
> > Nov 18 01:55:18.741711 (XEN) ds: 0028   es: 0028   fs: 0000   gs: 0000   ss: 0028   cs: 0020
> >
> > It seems to be related to the paging pool; adding Andrew and Henry so
> > that they are aware.
>
> Summary of what I've just given on IRC/Matrix.
>
> This crash is caused by two things.  First
>
>   (XEN) FLASK: Denying unknown domctl: 86.
>
> because I completely forgot to wire up Flask for the new hypercalls.
> But so did the original XSA-409 fix (as SECCLASS_SHADOW is behind
> CONFIG_X86), so I don't feel quite as bad about this.

Broken for ARM, but not for x86, right?

I think SECCLASS_SHADOW is available in the policy bits - it's just
whether or not the hook functions are available?

> And second because libxl ignores the error it gets back, and blindly
> continues onward.  Anthony has posted "libs/light: Propagate
> libxl__arch_domain_create() return code" to fix the libxl half of the
> bug, and I posted a second libxl bugfix to fix an error message.  Both
> are very simple.
>
>
> For Flask, we need new access vectors because this is a common
> hypercall, but I'm unsure how to interlink it with x86's shadow
> control.  This will require a bit of pondering, but it is probably
> easier to just leave them unlinked.

It sort of seems like it could go under domain2 since domain/domain2
have most of the memory stuff, but it is non-PV.  shadow has its own
set of hooks.  It could go in hvm which already has some memory stuff.

> Flask is listed as experimental which means it doesn't technically
> matter if we break it, but it is used by OpenXT so not fixing it for
> 4.17 would be rather rude.

It's definitely nicer to have functional Flask in the release.  OpenXT
can use a backport if necessary, so it doesn't need to be a release
blocker.  Having said that, Flask is a nice feature of Xen, so it
would be good to have it functioning in 4.17.

Regards,
Jason



* Re: Flask vs paging mempool - Was: [xen-unstable test] 174809: regressions - trouble: broken/fail/pass
From: Daniel P. Smith @ 2022-11-20 11:08 UTC (permalink / raw)
  To: Jason Andryuk, Andrew Cooper
  Cc: Roger Pau Monne, Henry Wang, Anthony Perard, xen-devel

On 11/18/22 16:10, Jason Andryuk wrote:
> On Fri, Nov 18, 2022 at 12:22 PM Andrew Cooper
> <Andrew.Cooper3@citrix.com> wrote:
>>
>> On 18/11/2022 14:39, Roger Pau Monne wrote:
>>> Nov 18 01:55:11.753936 (XEN) arch/x86/mm/hap/hap.c:304: d1 failed to allocate from HAP pool
>>> Nov 18 01:55:18.633799 (XEN) Failed to shatter gfn 7ed37: -12
>>> Nov 18 01:55:18.633866 (XEN) d1v0 EPT violation 0x19c (--x/rw-) gpa 0x0000007ed373a1 mfn 0x33ed37 type 0
>>> Nov 18 01:55:18.645790 (XEN) d1v0 Walking EPT tables for GFN 7ed37:
>>> Nov 18 01:55:18.645850 (XEN) d1v0  epte 9c0000047eba3107
>>> Nov 18 01:55:18.645893 (XEN) d1v0  epte 9c000003000003f3
>>> Nov 18 01:55:18.645935 (XEN) d1v0  --- GLA 0x7ed373a1
>>> Nov 18 01:55:18.657783 (XEN) domain_crash called from arch/x86/hvm/vmx/vmx.c:3758
>>> Nov 18 01:55:18.657844 (XEN) Domain 1 (vcpu#0) crashed on cpu#8:
>>> Nov 18 01:55:18.669781 (XEN) ----[ Xen-4.17-rc  x86_64  debug=y  Not tainted ]----
>>> Nov 18 01:55:18.669843 (XEN) CPU:    8
>>> Nov 18 01:55:18.669884 (XEN) RIP:    0020:[<000000007ed373a1>]
>>> Nov 18 01:55:18.681711 (XEN) RFLAGS: 0000000000010002   CONTEXT: hvm guest (d1v0)
>>> Nov 18 01:55:18.681772 (XEN) rax: 000000007ed373a1   rbx: 000000007ed3726c   rcx: 0000000000000000
>>> Nov 18 01:55:18.693713 (XEN) rdx: 000000007ed2e610   rsi: 0000000000008e38   rdi: 000000007ed37448
>>> Nov 18 01:55:18.693775 (XEN) rbp: 0000000001b410a0   rsp: 0000000000320880   r8:  0000000000000000
>>> Nov 18 01:55:18.705725 (XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
>>> Nov 18 01:55:18.717733 (XEN) r12: 0000000000000000   r13: 0000000000000000   r14: 0000000000000000
>>> Nov 18 01:55:18.717794 (XEN) r15: 0000000000000000   cr0: 0000000000000011   cr4: 0000000000000000
>>> Nov 18 01:55:18.729713 (XEN) cr3: 0000000000400000   cr2: 0000000000000000
>>> Nov 18 01:55:18.729771 (XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000002
>>> Nov 18 01:55:18.741711 (XEN) ds: 0028   es: 0028   fs: 0000   gs: 0000   ss: 0028   cs: 0020
>>>
>>> It seems to be related to the paging pool; adding Andrew and Henry so
>>> that they are aware.
>>
>> Summary of what I've just given on IRC/Matrix.
>>
>> This crash is caused by two things.  First
>>
>>    (XEN) FLASK: Denying unknown domctl: 86.
>>
>> because I completely forgot to wire up Flask for the new hypercalls.
>> But so did the original XSA-409 fix (as SECCLASS_SHADOW is behind
>> CONFIG_X86), so I don't feel quite as bad about this.
> 
> Broken for ARM, but not for x86, right?
> 
> I think SECCLASS_SHADOW is available in the policy bits - it's just
> whether or not the hook functions are available?
> 
>> And second because libxl ignores the error it gets back, and blindly
>> continues onward.  Anthony has posted "libs/light: Propagate
>> libxl__arch_domain_create() return code" to fix the libxl half of the
>> bug, and I posted a second libxl bugfix to fix an error message.  Both
>> are very simple.
>>
>>
>> For Flask, we need new access vectors because this is a common
>> hypercall, but I'm unsure how to interlink it with x86's shadow
>> control.  This will require a bit of pondering, but it is probably
>> easier to just leave them unlinked.
> 
> It sort of seems like it could go under domain2 since domain/domain2
> have most of the memory stuff, but it is non-PV.  shadow has its own
> set of hooks.  It could go in hvm which already has some memory stuff.

Since the new hypercall is for managing a memory pool for any domain, 
though HVM is the only one supported today, imho it belongs under 
domain/domain2.

Something to consider is that there is another guest memory pool that is 
managed, the PoD pool, which has a dedicated privilege for it. This 
leads me to the question of whether access to manage the PoD pool and 
the paging pool size should be separate accesses or whether they should 
be under the same access. IMHO I believe it should be the latter as I 
can see no benefit in disaggregating access to the PoD pool and the 
paging pool. In fact I find myself thinking in terms of whether the 
managing domain should have control over the size of any backing memory 
pools for the target domain. I am not seeing any benefit to discriminating 
between which backing memory pool a managing domain should be able to 
manage. With that said, I am open to being convinced otherwise.

Since this is an XSA fix that will be backported, moving get/set PoD 
hypercalls under a new permission would be too disruptive. I would 
recommend introducing the permission set/getmempools under the domain 
access vector, which will only control access to the paging pool. Then 
planning can occur for 4.18 to look at transitioning get/set PoD target 
to being controlled via get/setmempools.

>> Flask is listed as experimental which means it doesn't technically
>> matter if we break it, but it is used by OpenXT so not fixing it for
>> 4.17 would be rather rude.
> 
> It's definitely nicer to have functional Flask in the release.  OpenXT
> can use a backport if necessary, so it doesn't need to be a release
> blocker.  Having said that, Flask is a nice feature of Xen, so it
> would be good to have it functioning in 4.17.

As maintainer I would really prefer not to see 4.17 go out with any part 
of XSM broken. While it is considered experimental, which I hope to 
rectify, it is a long-standing feature that has been kept stable, and 
for which there is a sizeable user base. IMHO it deserves a 
proper fix before release.

V/r,
Daniel P. Smith



* Re: Flask vs paging mempool - Was: [xen-unstable test] 174809: regressions - trouble: broken/fail/pass
From: Jan Beulich @ 2022-11-21  8:04 UTC (permalink / raw)
  To: Daniel P. Smith
  Cc: Roger Pau Monne, Henry Wang, Anthony Perard, xen-devel,
	Jason Andryuk, Andrew Cooper

On 20.11.2022 12:08, Daniel P. Smith wrote:
> On 11/18/22 16:10, Jason Andryuk wrote:
>> On Fri, Nov 18, 2022 at 12:22 PM Andrew Cooper <Andrew.Cooper3@citrix.com> wrote:
>>> For Flask, we need new access vectors because this is a common
>>> hypercall, but I'm unsure how to interlink it with x86's shadow
>>> control.  This will require a bit of pondering, but it is probably
>>> easier to just leave them unlinked.
>>
>> It sort of seems like it could go under domain2 since domain/domain2
>> have most of the memory stuff, but it is non-PV.  shadow has its own
>> set of hooks.  It could go in hvm which already has some memory stuff.
> 
> Since the new hypercall is for managing a memory pool for any domain, 
> though HVM is the only one supported today, imho it belongs under 
> domain/domain2.
> 
> Something to consider is that there is another guest memory pool that is 
> managed, the PoD pool, which has a dedicated privilege for it. This 
> leads me to the question of whether access to manage the PoD pool and 
> the paging pool size should be separate accesses or whether they should 
> be under the same access. IMHO I believe it should be the latter as I 
> can see no benefit in disaggregating access to the PoD pool and the 
> paging pool. In fact I find myself thinking in terms of whether the 
> managing domain should have control over the size of any backing memory 
> pools for the target domain. I am not seeing any benefit to discriminating 
> between which backing memory pool a managing domain should be able to 
> manage. With that said, I am open to being convinced otherwise.

Yet the two pools are of quite different nature: The PoD pool is memory
the domain itself gets to use (more precisely it is memory temporarily
"stolen" from the domain). The paging pool, otoh, is memory we need to
make the domain actually function, without the guest having access to
that memory.

Jan



* Re: Flask vs paging mempool - Was: [xen-unstable test] 174809: regressions - trouble: broken/fail/pass
From: Andrew Cooper @ 2022-11-21 11:37 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Roger Pau Monne, Henry Wang, Anthony Perard, Daniel Smith, xen-devel

On 18/11/2022 21:10, Jason Andryuk wrote:
> On Fri, Nov 18, 2022 at 12:22 PM Andrew Cooper
> <Andrew.Cooper3@citrix.com> wrote:
>> On 18/11/2022 14:39, Roger Pau Monne wrote:
>>> Nov 18 01:55:11.753936 (XEN) arch/x86/mm/hap/hap.c:304: d1 failed to allocate from HAP pool
>>> Nov 18 01:55:18.633799 (XEN) Failed to shatter gfn 7ed37: -12
>>> Nov 18 01:55:18.633866 (XEN) d1v0 EPT violation 0x19c (--x/rw-) gpa 0x0000007ed373a1 mfn 0x33ed37 type 0
>>> Nov 18 01:55:18.645790 (XEN) d1v0 Walking EPT tables for GFN 7ed37:
>>> Nov 18 01:55:18.645850 (XEN) d1v0  epte 9c0000047eba3107
>>> Nov 18 01:55:18.645893 (XEN) d1v0  epte 9c000003000003f3
>>> Nov 18 01:55:18.645935 (XEN) d1v0  --- GLA 0x7ed373a1
>>> Nov 18 01:55:18.657783 (XEN) domain_crash called from arch/x86/hvm/vmx/vmx.c:3758
>>> Nov 18 01:55:18.657844 (XEN) Domain 1 (vcpu#0) crashed on cpu#8:
>>> Nov 18 01:55:18.669781 (XEN) ----[ Xen-4.17-rc  x86_64  debug=y  Not tainted ]----
>>> Nov 18 01:55:18.669843 (XEN) CPU:    8
>>> Nov 18 01:55:18.669884 (XEN) RIP:    0020:[<000000007ed373a1>]
>>> Nov 18 01:55:18.681711 (XEN) RFLAGS: 0000000000010002   CONTEXT: hvm guest (d1v0)
>>> Nov 18 01:55:18.681772 (XEN) rax: 000000007ed373a1   rbx: 000000007ed3726c   rcx: 0000000000000000
>>> Nov 18 01:55:18.693713 (XEN) rdx: 000000007ed2e610   rsi: 0000000000008e38   rdi: 000000007ed37448
>>> Nov 18 01:55:18.693775 (XEN) rbp: 0000000001b410a0   rsp: 0000000000320880   r8:  0000000000000000
>>> Nov 18 01:55:18.705725 (XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
>>> Nov 18 01:55:18.717733 (XEN) r12: 0000000000000000   r13: 0000000000000000   r14: 0000000000000000
>>> Nov 18 01:55:18.717794 (XEN) r15: 0000000000000000   cr0: 0000000000000011   cr4: 0000000000000000
>>> Nov 18 01:55:18.729713 (XEN) cr3: 0000000000400000   cr2: 0000000000000000
>>> Nov 18 01:55:18.729771 (XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000002
>>> Nov 18 01:55:18.741711 (XEN) ds: 0028   es: 0028   fs: 0000   gs: 0000   ss: 0028   cs: 0020
>>>
>>> It seems to be related to the paging pool; adding Andrew and Henry so
>>> that they are aware.
>> Summary of what I've just given on IRC/Matrix.
>>
>> This crash is caused by two things.  First
>>
>>   (XEN) FLASK: Denying unknown domctl: 86.
>>
>> because I completely forgot to wire up Flask for the new hypercalls.
>> But so did the original XSA-409 fix (as SECCLASS_SHADOW is behind
>> CONFIG_X86), so I don't feel quite as bad about this.
> Broken for ARM, but not for x86, right?

Specifically, the original XSA-409 fix broke Flask (on ARM only) by
introducing shadow domctl to ARM without making flask_shadow_control()
common.

I "fixed" that by removing ARM's use of shadow domctl, and broke it
differently by not adding Flask controls for the new common hypercalls.

> I think SECCLASS_SHADOW is available in the policy bits - it's just
> whether or not the hook functions are available?

I suspect so.

>> And second because libxl ignores the error it gets back, and blindly
>> continues onward.  Anthony has posted "libs/light: Propagate
>> libxl__arch_domain_create() return code" to fix the libxl half of the
>> bug, and I posted a second libxl bugfix to fix an error message.  Both
>> are very simple.
>>
>>
>> For Flask, we need new access vectors because this is a common
>> hypercall, but I'm unsure how to interlink it with x86's shadow
>> control.  This will require a bit of pondering, but it is probably
>> easier to just leave them unlinked.
> It sort of seems like it could go under domain2 since domain/domain2
> have most of the memory stuff, but it is non-PV.  shadow has its own
> set of hooks.  It could go in hvm which already has some memory stuff.

Having looked at all the proposed options, I'm going to put it in
domain/domain2.

This new hypercall is intentionally common, and applicable to all domain
types (eventually - x86 PV guests use this memory pool during migrate). 
Furthermore, it needs backporting along with all the other fixes to try
and make 409 work.

~Andrew


* Re: Flask vs paging mempool - Was: [xen-unstable test] 174809: regressions - trouble: broken/fail/pass
From: Daniel P. Smith @ 2022-11-21 12:14 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monne, Henry Wang, Anthony Perard, xen-devel,
	Jason Andryuk, Andrew Cooper

On 11/21/22 03:04, Jan Beulich wrote:
> On 20.11.2022 12:08, Daniel P. Smith wrote:
>> On 11/18/22 16:10, Jason Andryuk wrote:
>>> On Fri, Nov 18, 2022 at 12:22 PM Andrew Cooper <Andrew.Cooper3@citrix.com> wrote:
>>>> For Flask, we need new access vectors because this is a common
>>>> hypercall, but I'm unsure how to interlink it with x86's shadow
>>>> control.  This will require a bit of pondering, but it is probably
>>>> easier to just leave them unlinked.
>>>
>>> It sort of seems like it could go under domain2 since domain/domain2
>>> have most of the memory stuff, but it is non-PV.  shadow has its own
>>> set of hooks.  It could go in hvm which already has some memory stuff.
>>
>> Since the new hypercall is for managing a memory pool for any domain,
>> though HVM is the only one supported today, imho it belongs under
>> domain/domain2.
>>
>> Something to consider is that there is another guest memory pool that is
>> managed, the PoD pool, which has a dedicated privilege for it. This
>> leads me to the question of whether access to manage the PoD pool and
>> the paging pool size should be separate accesses or whether they should
>> be under the same access. IMHO I believe it should be the latter as I
>> can see no benefit in disaggregating access to the PoD pool and the
>> paging pool. In fact I find myself thinking in terms of whether the
>> managing domain should have control over the size of any backing memory
>> pools for the target domain. I am not seeing any benefit to discriminating
>> between which backing memory pool a managing domain should be able to
>> manage. With that said, I am open to being convinced otherwise.
> 
> Yet the two pools are of quite different nature: The PoD pool is memory
> the domain itself gets to use (more precisely it is memory temporarily
> "stolen" from the domain). The paging pool, otoh, is memory we need to
> make the domain actually function, without the guest having access to
> that memory.

The question is not necessarily what the pools' exact purposes are, but 
who will need control over their size. If one takes a coarser view and 
says that these memory pools relate to how a domain is consuming memory, then 
it follows that the only entity needing access is the entity granted 
control/management over the domain's memory usage. In the end there will 
still be an access check for both calls; the question is whether it 
makes any sense to differentiate between them in the security model. As 
I just outlined, IMHO there is not, but I am open to hearing why they 
would need to be differentiated in the security model.

v/r,
dps


