* [xen-unstable test] 164996: regressions - FAIL
@ 2021-09-16  4:06 osstest service owner
  2021-09-16 16:21 ` Jan Beulich
  0 siblings, 1 reply; 17+ messages in thread
From: osstest service owner @ 2021-09-16  4:06 UTC (permalink / raw)
  To: xen-devel, osstest-admin

flight 164996 xen-unstable real [real]
flight 165002 xen-unstable real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/164996/
http://logs.test-lab.xenproject.org/osstest/logs/165002/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-arm64-arm64-libvirt-raw 17 guest-start/debian.repeat fail REGR. vs. 164945

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail pass in 165002-retest
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail pass in 165002-retest

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop            fail like 164945
 test-armhf-armhf-libvirt     16 saverestore-support-check    fail  like 164945
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 164945
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop            fail like 164945
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop             fail like 164945
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 164945
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop             fail like 164945
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check    fail  like 164945
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop             fail like 164945
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop            fail like 164945
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop            fail like 164945
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop             fail like 164945
 test-arm64-arm64-xl-seattle  15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-check    fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check        fail   never pass
 test-amd64-i386-libvirt      15 migrate-support-check        fail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-check        fail   never pass
 test-amd64-amd64-libvirt     15 migrate-support-check        fail   never pass
 test-amd64-i386-xl-pvshim    14 guest-start                  fail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check    fail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check        fail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl          15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl          16 saverestore-support-check    fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-check        fail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check        fail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check    fail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-vhd      14 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-vhd      15 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check        fail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check    fail  never pass
 test-armhf-armhf-xl          15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          16 saverestore-support-check    fail   never pass
 test-armhf-armhf-libvirt     15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check        fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check    fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check        fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-vhd      14 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-vhd      15 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-rtds     15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-rtds     16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check    fail   never pass

version targeted for testing:
 xen                  487975df53b5298316b594550c79934d646701bd
baseline version:
 xen                  c76cfada1cfad05aaf64ce3ad305c5467650e782

Last test of basis   164945  2021-09-10 21:23:48 Z    5 days
Failing since        164951  2021-09-12 00:14:36 Z    4 days    8 attempts
Testing same since   164996  2021-09-15 11:47:08 Z    0 days    1 attempts

------------------------------------------------------------
People who touched revisions under test:
  Andrew Cooper <andrew.cooper3@citrix.com>
  Daniel P. Smith <dpsmith@apertussolutions.com>
  Ian Jackson <iwj@xenproject.org>
  Jan Beulich <jbeulich@suse.com>
  Nick Rosbrook <rosbrookn@ainfosec.com>
  Penny Zheng <penny.zheng@arm.com>
  Roger Pau Monne <roger.pau@citrix.com>
  Roger Pau Monné <roger.pau@citrix.com>
  Stefano Stabellini <stefano.stabellini@xilinx.com>

jobs:
 build-amd64-xsm                                              pass    
 build-arm64-xsm                                              pass    
 build-i386-xsm                                               pass    
 build-amd64-xtf                                              pass    
 build-amd64                                                  pass    
 build-arm64                                                  pass    
 build-armhf                                                  pass    
 build-i386                                                   pass    
 build-amd64-libvirt                                          pass    
 build-arm64-libvirt                                          pass    
 build-armhf-libvirt                                          pass    
 build-i386-libvirt                                           pass    
 build-amd64-prev                                             pass    
 build-i386-prev                                              pass    
 build-amd64-pvops                                            pass    
 build-arm64-pvops                                            pass    
 build-armhf-pvops                                            pass    
 build-i386-pvops                                             pass    
 test-xtf-amd64-amd64-1                                       pass    
 test-xtf-amd64-amd64-2                                       pass    
 test-xtf-amd64-amd64-3                                       pass    
 test-xtf-amd64-amd64-4                                       pass    
 test-xtf-amd64-amd64-5                                       pass    
 test-amd64-amd64-xl                                          pass    
 test-amd64-coresched-amd64-xl                                pass    
 test-arm64-arm64-xl                                          pass    
 test-armhf-armhf-xl                                          pass    
 test-amd64-i386-xl                                           pass    
 test-amd64-coresched-i386-xl                                 pass    
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm           pass    
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm            pass    
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm        pass    
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm         pass    
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm                 fail    
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm                  fail    
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm                 pass    
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm                  pass    
 test-amd64-amd64-libvirt-xsm                                 pass    
 test-arm64-arm64-libvirt-xsm                                 pass    
 test-amd64-i386-libvirt-xsm                                  pass    
 test-amd64-amd64-xl-xsm                                      pass    
 test-arm64-arm64-xl-xsm                                      pass    
 test-amd64-i386-xl-xsm                                       pass    
 test-amd64-amd64-qemuu-nested-amd                            fail    
 test-amd64-amd64-xl-pvhv2-amd                                pass    
 test-amd64-i386-qemut-rhel6hvm-amd                           pass    
 test-amd64-i386-qemuu-rhel6hvm-amd                           pass    
 test-amd64-amd64-dom0pvh-xl-amd                              pass    
 test-amd64-amd64-xl-qemut-debianhvm-amd64                    pass    
 test-amd64-i386-xl-qemut-debianhvm-amd64                     pass    
 test-amd64-amd64-xl-qemuu-debianhvm-amd64                    pass    
 test-amd64-i386-xl-qemuu-debianhvm-amd64                     pass    
 test-amd64-i386-freebsd10-amd64                              pass    
 test-amd64-amd64-qemuu-freebsd11-amd64                       pass    
 test-amd64-amd64-qemuu-freebsd12-amd64                       pass    
 test-amd64-amd64-xl-qemuu-ovmf-amd64                         pass    
 test-amd64-i386-xl-qemuu-ovmf-amd64                          pass    
 test-amd64-amd64-xl-qemut-win7-amd64                         fail    
 test-amd64-i386-xl-qemut-win7-amd64                          fail    
 test-amd64-amd64-xl-qemuu-win7-amd64                         fail    
 test-amd64-i386-xl-qemuu-win7-amd64                          fail    
 test-amd64-amd64-xl-qemut-ws16-amd64                         fail    
 test-amd64-i386-xl-qemut-ws16-amd64                          fail    
 test-amd64-amd64-xl-qemuu-ws16-amd64                         fail    
 test-amd64-i386-xl-qemuu-ws16-amd64                          fail    
 test-armhf-armhf-xl-arndale                                  pass    
 test-amd64-amd64-xl-credit1                                  pass    
 test-arm64-arm64-xl-credit1                                  pass    
 test-armhf-armhf-xl-credit1                                  pass    
 test-amd64-amd64-xl-credit2                                  pass    
 test-arm64-arm64-xl-credit2                                  pass    
 test-armhf-armhf-xl-credit2                                  pass    
 test-armhf-armhf-xl-cubietruck                               pass    
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict        pass    
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict         pass    
 test-amd64-amd64-examine                                     pass    
 test-arm64-arm64-examine                                     pass    
 test-armhf-armhf-examine                                     pass    
 test-amd64-i386-examine                                      pass    
 test-amd64-i386-freebsd10-i386                               pass    
 test-amd64-amd64-qemuu-nested-intel                          pass    
 test-amd64-amd64-xl-pvhv2-intel                              pass    
 test-amd64-i386-qemut-rhel6hvm-intel                         pass    
 test-amd64-i386-qemuu-rhel6hvm-intel                         pass    
 test-amd64-amd64-dom0pvh-xl-intel                            pass    
 test-amd64-amd64-libvirt                                     pass    
 test-armhf-armhf-libvirt                                     pass    
 test-amd64-i386-libvirt                                      pass    
 test-amd64-amd64-livepatch                                   pass    
 test-amd64-i386-livepatch                                    pass    
 test-amd64-amd64-migrupgrade                                 pass    
 test-amd64-i386-migrupgrade                                  pass    
 test-amd64-amd64-xl-multivcpu                                pass    
 test-armhf-armhf-xl-multivcpu                                pass    
 test-amd64-amd64-pair                                        pass    
 test-amd64-i386-pair                                         pass    
 test-amd64-amd64-libvirt-pair                                pass    
 test-amd64-i386-libvirt-pair                                 pass    
 test-amd64-amd64-xl-pvshim                                   pass    
 test-amd64-i386-xl-pvshim                                    fail    
 test-amd64-amd64-pygrub                                      pass    
 test-armhf-armhf-libvirt-qcow2                               pass    
 test-amd64-amd64-xl-qcow2                                    pass    
 test-arm64-arm64-libvirt-raw                                 fail    
 test-armhf-armhf-libvirt-raw                                 pass    
 test-amd64-i386-libvirt-raw                                  pass    
 test-amd64-amd64-xl-rtds                                     pass    
 test-armhf-armhf-xl-rtds                                     pass    
 test-arm64-arm64-xl-seattle                                  pass    
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow             pass    
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow              pass    
 test-amd64-amd64-xl-shadow                                   pass    
 test-amd64-i386-xl-shadow                                    pass    
 test-arm64-arm64-xl-thunderx                                 pass    
 test-amd64-amd64-libvirt-vhd                                 pass    
 test-arm64-arm64-xl-vhd                                      pass    
 test-armhf-armhf-xl-vhd                                      pass    
 test-amd64-i386-xl-vhd                                       pass    


------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
    http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
    http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

------------------------------------------------------------
commit 487975df53b5298316b594550c79934d646701bd
Author: Penny Zheng <penny.zheng@arm.com>
Date:   Fri Sep 10 02:52:15 2021 +0000

    xen/arm: introduce allocate_static_memory
    
    This commit introduces a new function allocate_static_memory to allocate
    static memory as guest RAM for domains on Static Allocation.
    
    It uses acquire_domstatic_pages to acquire pre-configured static memory
    for the domain, and uses guest_physmap_add_pages to set up the P2M table.
    These pre-defined static memory banks shall be mapped to the usual guest
    memory addresses (GUEST_RAM0_BASE, GUEST_RAM1_BASE) defined by
    xen/include/public/arch-arm.h.
    
    In order to deal with the trouble of count-to-order conversion when the page
    count is not a power of two, this commit exports p2m_insert_mapping and
    introduces a new function, guest_physmap_add_pages, to handle adding guest
    RAM p2m mappings for nr_pages.
    
    Signed-off-by: Penny Zheng <penny.zheng@arm.com>
    Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

commit c7fe462c0d274ffa30c9448c0a80affa075d789d
Author: Penny Zheng <penny.zheng@arm.com>
Date:   Fri Sep 10 02:52:14 2021 +0000

    xen/arm: introduce acquire_staticmem_pages and acquire_domstatic_pages
    
    The new function acquire_staticmem_pages acquires nr_mfns contiguous pages of
    static memory, starting at #smfn. It is the equivalent of alloc_heap_pages
    for static memory.
    
    For each page, it shall check that the page is reserved (PGC_reserved) and
    free. It shall also perform the necessary initialization, mostly the same as
    in alloc_heap_pages, such as following the same cache-coherency policy and
    setting the page status to PGC_state_inuse.
    
    New function acquire_domstatic_pages is the equivalent of alloc_domheap_pages
    for static memory, and it is to acquire nr_mfns contiguous pages of
    static memory and assign them to one specific domain.
    
    It uses acquire_staticmem_pages to acquire nr_mfns pages of static memory.
    Then on success, it will use assign_pages to assign those pages to one
    specific domain.
    
    In order to differentiate pages of static memory from those allocated from
    the heap, this patch introduces a new page flag, PGC_reserved, and marks
    pages of static memory PGC_reserved when initializing them.
    
    Signed-off-by: Penny Zheng <penny.zheng@arm.com>
    Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

commit 5260e8fb93f0e1f094de4142b2abad45844ab89c
Author: Penny Zheng <penny.zheng@arm.com>
Date:   Fri Sep 10 02:52:13 2021 +0000

    xen: re-define assign_pages and introduce a new function assign_page
    
    In order to deal with the trouble of count-to-order conversion when the page
    count is not a power of two, this commit re-defines assign_pages to take a
    number of pages (nr) and introduces assign_page for the original
    single-order case.
    
    To reduce backporting confusion, the order of the assign_pages parameters is
    altered, so that the compiler points out call sites needing adjustment.
    
    [stefano: switch to unsigned int for nr]
    Signed-off-by: Penny Zheng <penny.zheng@arm.com>
    Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
    Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

commit 4a9e73e6e53e9d8bc005a08c3968ec36d793f140
Author: Penny Zheng <penny.zheng@arm.com>
Date:   Fri Sep 10 02:52:12 2021 +0000

    xen/arm: static memory initialization
    
    This patch introduces static memory initialization during system boot-up.
    
    The new function init_staticmem_pages is responsible for static memory
    initialization.
    
    Helper free_staticmem_pages is the equivalent of free_heap_pages, to free
    nr_mfns pages of static memory.
    
    This commit also introduces a new CONFIG_STATIC_MEMORY option to wrap all
    static-allocation-related code.
    
    Asynchronous scrubbing of static memory pages is left on the TODO list.
    
    Signed-off-by: Penny Zheng <penny.zheng@arm.com>
    Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

commit 540a637c3410780b519fc055f432afe271f642f8
Author: Penny Zheng <penny.zheng@arm.com>
Date:   Fri Sep 10 02:52:11 2021 +0000

    xen: introduce mark_page_free
    
    This commit defines a new helper mark_page_free to extract common code,
    like following the same cache/TLB coherency policy, between free_heap_pages
    and the new function free_staticmem_pages, which will be introduced later.
    
    PDX compression means that conversion between the MFN and the page can be
    non-trivial. As the function is internal, pass both the MFN and the page;
    they are expected to match.
    
    Signed-off-by: Penny Zheng <penny.zheng@arm.com>
    Acked-by: Jan Beulich <jbeulich@suse.com>
    Reviewed-by: Julien Grall <jgrall@amazon.com>

commit 41c031ff437b66cfac4b120bd7698ca039850690
Author: Penny Zheng <penny.zheng@arm.com>
Date:   Fri Sep 10 02:52:10 2021 +0000

    xen/arm: introduce domain on Static Allocation
    
    Static Allocation refers to systems or sub-systems (domains) for which memory
    areas are pre-defined by configuration, using physical address ranges.
    
    This pre-defined memory -- Static Memory -- is reserved from RAM at boot and
    shall never be handed to the heap allocator or boot allocator for any use.
    
    Memory can be statically allocated to a domain using the property "xen,static-
    mem" defined in the domain configuration. The number of cells for the address
    and the size must be defined using respectively the properties
    "#xen,static-mem-address-cells" and "#xen,static-mem-size-cells".
    
    The property 'memory' is still needed and should match the amount of memory
    given to the guest. Currently, it either comes from static memory or lets Xen
    allocate from heap. *Mixing* is not supported.
    
    The static memory will be mapped in the guest at the usual guest memory
    addresses (GUEST_RAM0_BASE, GUEST_RAM1_BASE) defined by
    xen/include/public/arch-arm.h.
    
    This patch introduces this new `xen,static-mem` feature, and also documents
    and parses this new attribute at boot time.
    
    This patch also introduces a new field "bool xen_domain" in "struct membank"
    to tell whether the memory bank is reserved as a whole hardware resource or
    bound to a Xen domain node through "xen,static-mem".
    
    Signed-off-by: Penny Zheng <penny.zheng@arm.com>
    Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
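
Based on the property names documented in the commit message, a domain node
using this binding might look roughly like the following sketch (the node name,
addresses and sizes are made-up illustrative values, not taken from any real
board or from the patch itself):

```dts
domU1 {
    compatible = "xen,domain";
    #xen,static-mem-address-cells = <0x2>;
    #xen,static-mem-size-cells = <0x2>;
    /* one pre-defined bank: 512MB at 0x30000000 (illustrative values) */
    xen,static-mem = <0x0 0x30000000 0x0 0x20000000>;
    /* 'memory' is still required and should match the static bank size */
    memory = <0x0 0x80000>;
};
```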

commit 904ba3ce2e46e59080dc09676cede5df63b59f20
Author: Penny Zheng <penny.zheng@arm.com>
Date:   Fri Sep 10 02:52:09 2021 +0000

    xen/arm: introduce new helper device_tree_get_meminfo
    
    This commit creates a new helper device_tree_get_meminfo to iterate over a
    device tree property to get memory info, like "reg".
    
    Signed-off-by: Penny Zheng <penny.zheng@arm.com>
    Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

commit a89bcd9737757e4d671783588a6041a84a5e1754
Author: Roger Pau Monne <roger.pau@citrix.com>
Date:   Wed Jul 7 09:15:31 2021 +0200

    tools/go: honor append build flags
    
    Make the go build use APPEND_{C/LD}FLAGS when necessary, just like
    other parts of the build.
    
    Reported-by: Ting-Wei Lan <lantw44@gmail.com>
    Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
    Acked-by: Nick Rosbrook <rosbrookn@ainfosec.com>
    Acked-by: Ian Jackson <iwj@xenproject.org>

commit 6d45368a0a89e01a3a01d156af61fea565db96cc
Author: Daniel P. Smith <dpsmith@apertussolutions.com>
Date:   Fri Sep 10 16:12:59 2021 -0400

    xsm: drop dubious xsm_op_t type
    
    The type xsm_op_t masks the use of void pointers. This commit drops the
    xsm_op_t type and replaces it and all its uses with an explicit void.
    
    Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
    Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

commit 2928c1d250b157fd4585ca47ba36ad4792723f1f
Author: Daniel P. Smith <dpsmith@apertussolutions.com>
Date:   Fri Sep 10 16:12:58 2021 -0400

    xsm: remove remnants of xsm_memtype hook
    
    In c/s fcb8baddf00e the xsm_memtype hook was removed but some remnants were
    left behind. This commit cleans up those remnants.
    
    Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
    Acked-by: Jan Beulich <jbeulich@suse.com>

commit 4624912c0b5505387e53a12ef3417d001431a29d
Author: Daniel P. Smith <dpsmith@apertussolutions.com>
Date:   Fri Sep 10 16:12:57 2021 -0400

    xsm: remove the ability to disable flask
    
    On Linux when SELinux is put into permissive mode the discretionary access
    controls are still in place. Whereas for Xen when the enforcing state of flask
    is set to permissive, all operations for all domains would succeed, i.e. it
    does not fall back to the default access controls. To provide a means to mimic
    a similar but not equivalent behaviour, a flask op is present to allow a
    one-time switch back to the default access controls, aka the "dummy policy".
    
    While this may be desirable for an OS, Xen is a hypervisor and should not
    allow the switching of which security policy framework is being enforced after
    boot.  This patch removes the flask op to enforce the desired XSM usage model
    requiring a reboot of Xen to change the XSM policy module in use.
    
    Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
    Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

commit f26bb285949b8c233816c4c6a87237ee14a06ebc
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date:   Fri Sep 10 16:12:56 2021 -0400

    xen: Implement xen/alternative-call.h for use in common code
    
    The alternative call infrastructure is x86-only for now, but the common iommu
    code has a variant and more common code wants to use the infrastructure.
    
    Introduce CONFIG_ALTERNATIVE_CALL and a conditional implementation so common
    code can use the optimisation when available, without requiring all
    architectures to implement no-op stubs.
    
    Write some documentation, which was thus far entirely absent, covering the
    requirements for an architecture to implement this optimisation, and how to
    use the infrastructure in general code.
    
    Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
    Acked-by: Jan Beulich <jbeulich@suse.com>
(qemu changes not included)



* Re: [xen-unstable test] 164996: regressions - FAIL
  2021-09-16  4:06 [xen-unstable test] 164996: regressions - FAIL osstest service owner
@ 2021-09-16 16:21 ` Jan Beulich
  2021-09-20 15:44   ` Ian Jackson
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2021-09-16 16:21 UTC (permalink / raw)
  To: xen-devel; +Cc: osstest service owner

On 16.09.2021 06:06, osstest service owner wrote:
> flight 164996 xen-unstable real [real]
> flight 165002 xen-unstable real-retest [real]
> http://logs.test-lab.xenproject.org/osstest/logs/164996/
> http://logs.test-lab.xenproject.org/osstest/logs/165002/
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-arm64-arm64-libvirt-raw 17 guest-start/debian.repeat fail REGR. vs. 164945

Since no one has given any sign so far of looking into this failure, I took
a look, despite having little hope of actually figuring anything out. I'm
pretty sure the randomness of the "when" of this failure correlates
with

Sep 15 14:44:48.518439 [ 1613.227909] rpc-worker: page allocation failure: order:4, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
Sep 15 14:44:55.418534 [ 1613.240888] CPU: 48 PID: 2029 Comm: rpc-worker Not tainted 5.4.17+ #1
Sep 15 14:44:55.430511 [ 1613.247370] Hardware name: Cavium ThunderX CN88XX board (DT)
Sep 15 14:44:55.430576 [ 1613.253099] Call trace:
Sep 15 14:44:55.442497 [ 1613.255620]  dump_backtrace+0x0/0x140
Sep 15 14:44:55.442558 [ 1613.259348]  show_stack+0x14/0x20
Sep 15 14:44:55.442606 [ 1613.262734]  dump_stack+0xbc/0x100
Sep 15 14:44:55.442651 [ 1613.266206]  warn_alloc+0xf8/0x160
Sep 15 14:44:55.454512 [ 1613.269677]  __alloc_pages_slowpath+0x9c4/0x9f0
Sep 15 14:44:55.454574 [ 1613.274277]  __alloc_pages_nodemask+0x1cc/0x248
Sep 15 14:44:55.466498 [ 1613.278878]  kmalloc_order+0x24/0xa8
Sep 15 14:44:55.466559 [ 1613.282523]  __kmalloc+0x244/0x270
Sep 15 14:44:55.466607 [ 1613.285995]  alloc_empty_pages.isra.17+0x34/0xb0
Sep 15 14:44:55.478495 [ 1613.290681]  privcmd_ioctl_mmap_batch.isra.20+0x414/0x428
Sep 15 14:44:55.478560 [ 1613.296149]  privcmd_ioctl+0xbc/0xb7c
Sep 15 14:44:55.478608 [ 1613.299883]  do_vfs_ioctl+0xb8/0xae0
Sep 15 14:44:55.490475 [ 1613.303527]  ksys_ioctl+0x78/0xa8
Sep 15 14:44:55.490536 [ 1613.306911]  __arm64_sys_ioctl+0x1c/0x28
Sep 15 14:44:55.490584 [ 1613.310906]  el0_svc_common.constprop.2+0x88/0x150
Sep 15 14:44:55.502489 [ 1613.315765]  el0_svc_handler+0x20/0x80
Sep 15 14:44:55.502551 [ 1613.319583]  el0_svc+0x8/0xc

As per

Sep 15 14:44:55.502598 [ 1613.322585] Mem-Info:
Sep 15 14:44:55.502643 [ 1613.324918] active_anon:5639 inactive_anon:15857 isolated_anon:0
Sep 15 14:44:55.514480 [ 1613.324918]  active_file:13286 inactive_file:11182 isolated_file:0
Sep 15 14:44:55.514545 [ 1613.324918]  unevictable:0 dirty:30 writeback:0 unstable:0
Sep 15 14:44:55.526477 [ 1613.324918]  slab_reclaimable:10922 slab_unreclaimable:30234
Sep 15 14:44:55.526540 [ 1613.324918]  mapped:11277 shmem:10975 pagetables:401 bounce:0
Sep 15 14:44:55.538474 [ 1613.324918]  free:8364 free_pcp:100 free_cma:1650

the system doesn't really look to be out of memory; as per

Sep 15 14:44:55.598538 [ 1613.419061] DMA32: 2788*4kB (UMEC) 890*8kB (UMEC) 497*16kB (UMEC) 36*32kB (UMC) 1*64kB (C) 1*128kB (C) 9*256kB (C) 7*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 33456kB

there even appear to be a number of higher-order pages available (albeit
without digging I can't tell what "(C)" means). Nevertheless, order-4
allocations aren't really nice.

What I can't see is why this may have started triggering recently. Was
the kernel updated in osstest? Is 512MB of memory perhaps a bit too
small for a Dom0 on this system (with 96 CPUs)? Going through the log
I haven't been able to find crucial information like how much memory
the host has or what the hypervisor command line was.

Jan




* Re: [xen-unstable test] 164996: regressions - FAIL
  2021-09-16 16:21 ` Jan Beulich
@ 2021-09-20 15:44   ` Ian Jackson
  2021-09-20 15:58     ` Jan Beulich
  2021-09-21 23:38     ` Stefano Stabellini
  0 siblings, 2 replies; 17+ messages in thread
From: Ian Jackson @ 2021-09-20 15:44 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

Jan Beulich writes ("Re: [xen-unstable test] 164996: regressions - FAIL"):
> As per
> 
> Sep 15 14:44:55.502598 [ 1613.322585] Mem-Info:
> Sep 15 14:44:55.502643 [ 1613.324918] active_anon:5639 inactive_anon:15857 isolated_anon:0
> Sep 15 14:44:55.514480 [ 1613.324918]  active_file:13286 inactive_file:11182 isolated_file:0
> Sep 15 14:44:55.514545 [ 1613.324918]  unevictable:0 dirty:30 writeback:0 unstable:0
> Sep 15 14:44:55.526477 [ 1613.324918]  slab_reclaimable:10922 slab_unreclaimable:30234
> Sep 15 14:44:55.526540 [ 1613.324918]  mapped:11277 shmem:10975 pagetables:401 bounce:0
> Sep 15 14:44:55.538474 [ 1613.324918]  free:8364 free_pcp:100 free_cma:1650
> 
> the system doesn't look to really be out of memory; as per
> 
> Sep 15 14:44:55.598538 [ 1613.419061] DMA32: 2788*4kB (UMEC) 890*8kB (UMEC) 497*16kB (UMEC) 36*32kB (UMC) 1*64kB (C) 1*128kB (C) 9*256kB (C) 7*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 33456kB
> 
> there even look to be a number of higher order pages available (albeit
> without digging I can't tell what "(C)" means). Nevertheless order-4
> allocations aren't really nice.

The host history suggests this may be related to a qemu update.

http://logs.test-lab.xenproject.org/osstest/results/host/rochester0.html

> What I can't see is why this may have started triggering recently. Was
> the kernel updated in osstest? Is 512MB of memory perhaps a bit too
> small for a Dom0 on this system (with 96 CPUs)? Going through the log
> I haven't been able to find crucial information like how much memory
> the host has or what the hypervisor command line was.

Logs from last host examination, including a dmesg:

http://logs.test-lab.xenproject.org/osstest/results/host/rochester0.examine/

Re the command line, does Xen not print it?

The bootloader output seems garbled in the serial log.

Anyway, I think Xen is being booted EFI judging by the grub cfg:

http://logs.test-lab.xenproject.org/osstest/logs/165002/test-arm64-arm64-libvirt-raw/rochester0--grub.cfg.1

which means that it is probably reading this:

http://logs.test-lab.xenproject.org/osstest/logs/165002/test-arm64-arm64-libvirt-raw/rochester0--xen.cfg

which gives this specification of the command line:

  options=placeholder conswitch=x watchdog noreboot async-show-all console=dtuart dom0_mem=512M,max:512M ucode=scan  

The grub cfg has this:

 multiboot /xen placeholder conswitch=x watchdog noreboot async-show-all console=dtuart dom0_mem=512M,max:512M ucode=scan  ${xen_rm_opts}

It's not clear to me whether xen_rm_opts is "" or "no-real-mode edd=off".

Ian



* Re: [xen-unstable test] 164996: regressions - FAIL
  2021-09-20 15:44   ` Ian Jackson
@ 2021-09-20 15:58     ` Jan Beulich
  2021-09-21 23:38     ` Stefano Stabellini
  1 sibling, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2021-09-20 15:58 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel

On 20.09.2021 17:44, Ian Jackson wrote:
> Jan Beulich writes ("Re: [xen-unstable test] 164996: regressions - FAIL"):
>> As per
>>
>> Sep 15 14:44:55.502598 [ 1613.322585] Mem-Info:
>> Sep 15 14:44:55.502643 [ 1613.324918] active_anon:5639 inactive_anon:15857 isolated_anon:0
>> Sep 15 14:44:55.514480 [ 1613.324918]  active_file:13286 inactive_file:11182 isolated_file:0
>> Sep 15 14:44:55.514545 [ 1613.324918]  unevictable:0 dirty:30 writeback:0 unstable:0
>> Sep 15 14:44:55.526477 [ 1613.324918]  slab_reclaimable:10922 slab_unreclaimable:30234
>> Sep 15 14:44:55.526540 [ 1613.324918]  mapped:11277 shmem:10975 pagetables:401 bounce:0
>> Sep 15 14:44:55.538474 [ 1613.324918]  free:8364 free_pcp:100 free_cma:1650
>>
>> the system doesn't look to really be out of memory; as per
>>
>> Sep 15 14:44:55.598538 [ 1613.419061] DMA32: 2788*4kB (UMEC) 890*8kB (UMEC) 497*16kB (UMEC) 36*32kB (UMC) 1*64kB (C) 1*128kB (C) 9*256kB (C) 7*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 33456kB
>>
>> there even look to be a number of higher order pages available (albeit
>> without digging I can't tell what "(C)" means). Nevertheless order-4
>> allocations aren't really nice.
> 
> The host history suggests this may possibly be related to a qemu update.
> 
> http://logs.test-lab.xenproject.org/osstest/results/host/rochester0.html
> 
>> What I can't see is why this may have started triggering recently. Was
>> the kernel updated in osstest? Is 512MB of memory perhaps a bit too
>> small for a Dom0 on this system (with 96 CPUs)? Going through the log
>> I haven't been able to find crucial information like how much memory
>> the host has or what the hypervisor command line was.
> 
> Logs from last host examination, including a dmesg:
> 
> http://logs.test-lab.xenproject.org/osstest/results/host/rochester0.examine/
> 
> Re the command line, does Xen not print it?
> 
> The bootloader output seems garbled in the serial log.
> 
> Anyway, I think Xen is being booted EFI judging by the grub cfg:
> 
> http://logs.test-lab.xenproject.org/osstest/logs/165002/test-arm64-arm64-libvirt-raw/rochester0--grub.cfg.1

Also judging by output seen in the log file.

> which means that it is probably reading this:
> 
> http://logs.test-lab.xenproject.org/osstest/logs/165002/test-arm64-arm64-libvirt-raw/rochester0--xen.cfg
> 
> which gives this specification of the command line:
> 
>   options=placeholder conswitch=x watchdog noreboot async-show-all console=dtuart dom0_mem=512M,max:512M ucode=scan  

Funny - about half of this looks to be x86-only options.

But yes, this confirms my suspicion that this Dom0 is limited to
512M of RAM.

> The grub cfg has this:
> 
>  multiboot /xen placeholder conswitch=x watchdog noreboot async-show-all console=dtuart dom0_mem=512M,max:512M ucode=scan  ${xen_rm_opts}
> 
> It's not clear to me whether xen_rm_opts is "" or "no-real-mode edd=off".

Which wouldn't matter - the two options are x86-only again, and hence
would (if anything) trigger log messages about unknown options. Such
log messages would be seen in the ring buffer only though, not on the
serial console (as they are issued too early).

Jan




* Re: [xen-unstable test] 164996: regressions - FAIL
  2021-09-20 15:44   ` Ian Jackson
  2021-09-20 15:58     ` Jan Beulich
@ 2021-09-21 23:38     ` Stefano Stabellini
  2021-09-22  7:34       ` Jan Beulich
  1 sibling, 1 reply; 17+ messages in thread
From: Stefano Stabellini @ 2021-09-21 23:38 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Jan Beulich, xen-devel, dpsmith, sstabellini

On Mon, 20 Sep 2021, Ian Jackson wrote:
> Jan Beulich writes ("Re: [xen-unstable test] 164996: regressions - FAIL"):
> > As per
> > 
> > Sep 15 14:44:55.502598 [ 1613.322585] Mem-Info:
> > Sep 15 14:44:55.502643 [ 1613.324918] active_anon:5639 inactive_anon:15857 isolated_anon:0
> > Sep 15 14:44:55.514480 [ 1613.324918]  active_file:13286 inactive_file:11182 isolated_file:0
> > Sep 15 14:44:55.514545 [ 1613.324918]  unevictable:0 dirty:30 writeback:0 unstable:0
> > Sep 15 14:44:55.526477 [ 1613.324918]  slab_reclaimable:10922 slab_unreclaimable:30234
> > Sep 15 14:44:55.526540 [ 1613.324918]  mapped:11277 shmem:10975 pagetables:401 bounce:0
> > Sep 15 14:44:55.538474 [ 1613.324918]  free:8364 free_pcp:100 free_cma:1650
> > 
> > the system doesn't look to really be out of memory; as per
> > 
> > Sep 15 14:44:55.598538 [ 1613.419061] DMA32: 2788*4kB (UMEC) 890*8kB (UMEC) 497*16kB (UMEC) 36*32kB (UMC) 1*64kB (C) 1*128kB (C) 9*256kB (C) 7*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 33456kB
> > 
> > there even look to be a number of higher order pages available (albeit
> > without digging I can't tell what "(C)" means). Nevertheless order-4
> > allocations aren't really nice.
> 
> The host history suggests this may possibly be related to a qemu update.
> 
> http://logs.test-lab.xenproject.org/osstest/results/host/rochester0.html
> 
> > What I can't see is why this may have started triggering recently. Was
> > the kernel updated in osstest? Is 512MB of memory perhaps a bit too
> > small for a Dom0 on this system (with 96 CPUs)? Going through the log
> > I haven't been able to find crucial information like how much memory
> > the host has or what the hypervisor command line was.
> 
> Logs from last host examination, including a dmesg:
> 
> http://logs.test-lab.xenproject.org/osstest/results/host/rochester0.examine/
> 
> Re the command line, does Xen not print it?
> 
> The bootloader output seems garbled in the serial log.
> 
> Anyway, I think Xen is being booted EFI judging by the grub cfg:
> 
> http://logs.test-lab.xenproject.org/osstest/logs/165002/test-arm64-arm64-libvirt-raw/rochester0--grub.cfg.1
> 
> which means that it is probably reading this:
> 
> http://logs.test-lab.xenproject.org/osstest/logs/165002/test-arm64-arm64-libvirt-raw/rochester0--xen.cfg
> 
> which gives this specification of the command line:
> 
>   options=placeholder conswitch=x watchdog noreboot async-show-all console=dtuart dom0_mem=512M,max:512M ucode=scan  
> 
> The grub cfg has this:
> 
>  multiboot /xen placeholder conswitch=x watchdog noreboot async-show-all console=dtuart dom0_mem=512M,max:512M ucode=scan  ${xen_rm_opts}
> 
> It's not clear to me whether xen_rm_opts is "" or "no-real-mode edd=off".

I definitely recommend increasing dom0 memory, especially as I guess
the box is going to have a significant amount, far more than 4GB. I
would set it to 2GB. Also, the syntax on ARM is simpler, so it should be
just: dom0_mem=2G
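For illustration, the osstest xen.cfg options line quoted earlier would
then become something like (a sketch; all other options kept as-is):

  options=placeholder conswitch=x watchdog noreboot async-show-all console=dtuart dom0_mem=2G ucode=scan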

In addition, I also did some investigation just in case there is
actually a bug in the code and it is not a simple OOM problem.

Looking at the recent OSSTests results, the first failure is:
https://marc.info/?l=xen-devel&m=163145323631047
http://logs.test-lab.xenproject.org/osstest/logs/164951/

Indeed, the failure is the same test-arm64-arm64-libvirt-raw which is
still failing in more recent tests:
http://logs.test-lab.xenproject.org/osstest/logs/164951/test-arm64-arm64-libvirt-raw/info.html

But if we look at the commit id of flight 164951, it is
6d45368a0a89e01a3a01d156af61fea565db96cc "xsm: drop dubious xsm_op_t
type" by Daniel P. Smith (CCed).

It is interesting because:
- it is *before* all the recent ARM patch series
- it is only 4 commits after master


The 4 commits are:

2021-09-10 16:12 Daniel P. Smith   o xsm: drop dubious xsm_op_t type
2021-09-10 16:12 Daniel P. Smith   o xsm: remove remnants of xsm_memtype hook
2021-09-10 16:12 Daniel P. Smith   o xsm: remove the ability to disable flask
2021-09-10 16:12 Andrew Cooper     o xen: Implement xen/alternative-call.h for use in common code


Looking at them in detail:

- "xen: Implement xen/alternative-call.h for use in common code"
It shouldn't affect ARM at all

- "xsm: remove the ability to disable flask"
It would only affect the test case if libvirt directly or via libxl
calls FLASK_DISABLE.

- "xsm: remove remnants of xsm_memtype hook"
Shouldn't have any effect

- "xsm: drop dubious xsm_op_t type"
It doesn't look like it should have any runtime effect, only build time


So among these four, only "xsm: remove the ability to disable flask"
seems to have the potential to break a libvirt guest start test. Even
then, it is far-fetched, and the lack of an explicit XSM-related error
message in the logs really points in the direction of an OOM.




* Re: [xen-unstable test] 164996: regressions - FAIL
  2021-09-21 23:38     ` Stefano Stabellini
@ 2021-09-22  7:34       ` Jan Beulich
  2021-09-22 11:20         ` Ian Jackson
  2021-09-23  1:10         ` Stefano Stabellini
  0 siblings, 2 replies; 17+ messages in thread
From: Jan Beulich @ 2021-09-22  7:34 UTC (permalink / raw)
  To: Stefano Stabellini, Ian Jackson; +Cc: xen-devel, dpsmith

On 22.09.2021 01:38, Stefano Stabellini wrote:
> On Mon, 20 Sep 2021, Ian Jackson wrote:
>> Jan Beulich writes ("Re: [xen-unstable test] 164996: regressions - FAIL"):
>>> As per
>>>
>>> Sep 15 14:44:55.502598 [ 1613.322585] Mem-Info:
>>> Sep 15 14:44:55.502643 [ 1613.324918] active_anon:5639 inactive_anon:15857 isolated_anon:0
>>> Sep 15 14:44:55.514480 [ 1613.324918]  active_file:13286 inactive_file:11182 isolated_file:0
>>> Sep 15 14:44:55.514545 [ 1613.324918]  unevictable:0 dirty:30 writeback:0 unstable:0
>>> Sep 15 14:44:55.526477 [ 1613.324918]  slab_reclaimable:10922 slab_unreclaimable:30234
>>> Sep 15 14:44:55.526540 [ 1613.324918]  mapped:11277 shmem:10975 pagetables:401 bounce:0
>>> Sep 15 14:44:55.538474 [ 1613.324918]  free:8364 free_pcp:100 free_cma:1650
>>>
>>> the system doesn't look to really be out of memory; as per
>>>
>>> Sep 15 14:44:55.598538 [ 1613.419061] DMA32: 2788*4kB (UMEC) 890*8kB (UMEC) 497*16kB (UMEC) 36*32kB (UMC) 1*64kB (C) 1*128kB (C) 9*256kB (C) 7*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 33456kB
>>>
>>> there even look to be a number of higher order pages available (albeit
>>> without digging I can't tell what "(C)" means). Nevertheless order-4
>>> allocations aren't really nice.
>>
>> The host history suggests this may possibly be related to a qemu update.
>>
>> http://logs.test-lab.xenproject.org/osstest/results/host/rochester0.html

Stefano - as per some of your investigation detailed further down I
wonder whether you had seen this part of Ian's reply. (Question of
course then is how that qemu update had managed to get pushed.)

>> The grub cfg has this:
>>
>>  multiboot /xen placeholder conswitch=x watchdog noreboot async-show-all console=dtuart dom0_mem=512M,max:512M ucode=scan  ${xen_rm_opts}
>>
>> It's not clear to me whether xen_rm_opts is "" or "no-real-mode edd=off".
> 
> I definitely recommend to increase dom0 memory, especially as I guess
> the box is going to have a significant amount, far more than 4GB. I
> would set it to 2GB. Also the syntax on ARM is simpler, so it should be
> just: dom0_mem=2G

Ian - I guess that's an adjustment relatively easy to make? I wonder
though whether we wouldn't want to address the underlying issue first.
Presumably not, because the fix would likely take quite some time to
propagate suitably. Yet if not, we will want to have some way of
verifying that an eventual fix there would have helped here.

> In addition, I also did some investigation just in case there is
> actually a bug in the code and it is not a simple OOM problem.

I think the actual issue is quite clear; what I'm struggling with is
why we weren't hit by it earlier.

As imo always, non-order-0 allocations (perhaps excluding the bringing
up of the kernel or whichever entity) are to be avoided if at all possible.
The offender in this case looks to be privcmd's alloc_empty_pages().
For it to request through kcalloc() what ends up being an order-4
allocation, the original IOCTL_PRIVCMD_MMAPBATCH must specify a pretty
large chunk of guest memory to get mapped. Which may in turn be
questionable, but I'm afraid I don't have the time to try to drill
down where that request is coming from and whether that also wouldn't
better be split up.

The solution looks simple enough - convert from kcalloc() to kvcalloc().
I can certainly spin up a patch to Linux to this effect. Yet that still
won't answer the question of why this issue has popped up all of a
sudden (and hence whether there are things wanting changing elsewhere
as well).
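To put rough numbers on that (a sketch, assuming 4 KiB pages, 8-byte
struct page pointers, and the slab allocator rounding large requests up
to a power-of-two number of contiguous pages):

```python
PAGE_SIZE = 4096   # bytes per page, typical for an arm64 dom0
PTR_SIZE = 8       # sizeof(struct page *) on a 64-bit kernel

def alloc_order(nbytes):
    """Smallest order such that 2**order contiguous pages cover nbytes."""
    pages = -(-nbytes // PAGE_SIZE)   # ceiling division
    order = 0
    while (1 << order) < pages:
        order += 1
    return order

# The page-pointer array stays within an order-3 (32 KiB) allocation up
# to 4096 entries; one more entry pushes it to order 4.
print(alloc_order(4096 * PTR_SIZE))   # -> 3
print(alloc_order(4097 * PTR_SIZE))   # -> 4

# 4097 pages of guest memory mapped in one go is roughly 16 MiB.
print(4097 * PAGE_SIZE // (1 << 20))  # -> 16
```

So an order-4 kcalloc() here would imply a single IOCTL_PRIVCMD_MMAPBATCH
covering more than about 16 MiB of guest memory; kvcalloc() would satisfy
such a request from vmalloc space instead of failing for want of
contiguous pages.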

Jan




* Re: [xen-unstable test] 164996: regressions - FAIL
  2021-09-22  7:34       ` Jan Beulich
@ 2021-09-22 11:20         ` Ian Jackson
  2021-09-22 12:24           ` Jan Beulich
  2021-09-23  1:10         ` Stefano Stabellini
  1 sibling, 1 reply; 17+ messages in thread
From: Ian Jackson @ 2021-09-22 11:20 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Stefano Stabellini, Ian Jackson, xen-devel, dpsmith

Jan Beulich writes ("Re: [xen-unstable test] 164996: regressions - FAIL"):
> On 22.09.2021 01:38, Stefano Stabellini wrote:
> > On Mon, 20 Sep 2021, Ian Jackson wrote:
> >>> Sep 15 14:44:55.598538 [ 1613.419061] DMA32: 2788*4kB (UMEC) 890*8kB (UMEC) 497*16kB (UMEC) 36*32kB (UMC) 1*64kB (C) 1*128kB (C) 9*256kB (C) 7*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 33456kB
> >>>
> >>> there even look to be a number of higher order pages available (albeit
> >>> without digging I can't tell what "(C)" means). Nevertheless order-4
> >>> allocations aren't really nice.
> >>
> >> The host history suggests this may possibly be related to a qemu update.
> >>
> >> http://logs.test-lab.xenproject.org/osstest/results/host/rochester0.html
> 
> Stefano - as per some of your investigation detailed further down I
> wonder whether you had seen this part of Ian's reply. (Question of
> course then is how that qemu update had managed to get pushed.)

I looked for bisection results for this failure and

  http://logs.test-lab.xenproject.org/osstest/results/bisect/xen-unstable/test-arm64-arm64-libvirt-xsm.guest-start--debian.repeat.html

it's a heisenbug.  Also, the tests got reorganised slightly as a
side-effect of dropping some i386 tests, so some of these tests are
"new" from osstest's pov, although their content isn't really new.

Unfortunately, with it being a heisenbug, we won't get any useful
bisection results, which would otherwise conclusively tell us which
tree the problem was in.

> >> The grub cfg has this:
> >>
> >>  multiboot /xen placeholder conswitch=x watchdog noreboot async-show-all console=dtuart dom0_mem=512M,max:512M ucode=scan  ${xen_rm_opts}
> >>
> >> It's not clear to me whether xen_rm_opts is "" or "no-real-mode edd=off".
> > 
> > I definitely recommend to increase dom0 memory, especially as I guess
> > the box is going to have a significant amount, far more than 4GB. I
> > would set it to 2GB. Also the syntax on ARM is simpler, so it should be
> > just: dom0_mem=2G
> 
> Ian - I guess that's an adjustment relatively easy to make? I wonder
> though whether we wouldn't want to address the underlying issue first.
> Presumably not, because the fix would likely take quite some time to
> propagate suitably. Yet if not, we will want to have some way of
> verifying that an eventual fix there would have helped here.

It could propagate fairly quickly.  But I'm loath to make this change
because it seems to me that it would be simply masking the bug.

Notably, when this goes wrong, it seems to happen after the guest has
been started once successfully already.  So there *is* enough
memory...

Ian.



* Re: [xen-unstable test] 164996: regressions - FAIL
  2021-09-22 11:20         ` Ian Jackson
@ 2021-09-22 12:24           ` Jan Beulich
  2021-09-22 12:29             ` Ian Jackson
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2021-09-22 12:24 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Stefano Stabellini, xen-devel, dpsmith

On 22.09.2021 13:20, Ian Jackson wrote:
> Jan Beulich writes ("Re: [xen-unstable test] 164996: regressions - FAIL"):
>> On 22.09.2021 01:38, Stefano Stabellini wrote:
>>> On Mon, 20 Sep 2021, Ian Jackson wrote:
>>>>> Sep 15 14:44:55.598538 [ 1613.419061] DMA32: 2788*4kB (UMEC) 890*8kB (UMEC) 497*16kB (UMEC) 36*32kB (UMC) 1*64kB (C) 1*128kB (C) 9*256kB (C) 7*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 33456kB
>>>>>
>>>>> there even look to be a number of higher order pages available (albeit
>>>>> without digging I can't tell what "(C)" means). Nevertheless order-4
>>>>> allocations aren't really nice.
>>>>
>>>> The host history suggests this may possibly be related to a qemu update.
>>>>
>>>> http://logs.test-lab.xenproject.org/osstest/results/host/rochester0.html
>>
>> Stefano - as per some of your investigation detailed further down I
>> wonder whether you had seen this part of Ian's reply. (Question of
>> course then is how that qemu update had managed to get pushed.)
> 
> I looked for bisection results for this failure and
> 
>   http://logs.test-lab.xenproject.org/osstest/results/bisect/xen-unstable/test-arm64-arm64-libvirt-xsm.guest-start--debian.repeat.html
> 
> it's a heisenbug.  Also, the tests got reorganised slightly as a
> side-effect of dropping some i386 tests, so some of these tests are
> "new" from osstest's pov, although their content isn't really new.
> 
> Unfortunately, with it being a heisenbug, we won't get any useful
> bisection results, which would otherwise conclusively tell us which
> tree the problem was in.

Quite unfortunate.

>>>> The grub cfg has this:
>>>>
>>>>  multiboot /xen placeholder conswitch=x watchdog noreboot async-show-all console=dtuart dom0_mem=512M,max:512M ucode=scan  ${xen_rm_opts}
>>>>
>>>> It's not clear to me whether xen_rm_opts is "" or "no-real-mode edd=off".
>>>
>>> I definitely recommend to increase dom0 memory, especially as I guess
>>> the box is going to have a significant amount, far more than 4GB. I
>>> would set it to 2GB. Also the syntax on ARM is simpler, so it should be
>>> just: dom0_mem=2G
>>
>> Ian - I guess that's an adjustment relatively easy to make? I wonder
>> though whether we wouldn't want to address the underlying issue first.
>> Presumably not, because the fix would likely take quite some time to
>> propagate suitably. Yet if not, we will want to have some way of
>> verifying that an eventual fix there would have helped here.
> 
> It could propagate fairly quickly.

Is the Dom0 kernel used here a distro one or our own build of one of
the upstream trees? In the latter case I'd expect propagation to be
quite a bit faster than in the former case.

>  But I'm loath to make this change
> because it seems to me that it would be simply masking the bug.
> 
> Notably, when this goes wrong, it seems to happen after the guest has
> been started once successfully already.  So there *is* enough
> memory...

Well, there is enough memory, sure, but (transiently as it seems) not
enough contiguous chunks. The likelihood of higher order allocations
failing increases with smaller overall memory amounts (in Dom0 in this
case), afaict, unless there's (aggressive) de-fragmentation.

Jan



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [xen-unstable test] 164996: regressions - FAIL
  2021-09-22 12:24           ` Jan Beulich
@ 2021-09-22 12:29             ` Ian Jackson
  2021-09-22 13:26               ` Jan Beulich
  0 siblings, 1 reply; 17+ messages in thread
From: Ian Jackson @ 2021-09-22 12:29 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Stefano Stabellini, xen-devel, dpsmith

Jan Beulich writes ("Re: [xen-unstable test] 164996: regressions - FAIL"):
> Is the Dom0 kernel used here a distro one or our own build of one of
> the upstream trees? In the latter case I'd expect propagation to be
> quite a bit faster than in the former case.

It's our own build.

> >  But I'm loath to make this change
> > because it seems to me that it would be simply masking the bug.
> > 
> > Notably, when this goes wrong, it seems to happen after the guest has
> > been started once successfully already.  So there *is* enough
> > memory...
> 
> Well, there is enough memory, sure, but (transiently as it seems) not
> enough contiguous chunks. The likelihood of higher order allocations
> failing increases with smaller overall memory amounts (in Dom0 in this
> case), afaict, unless there's (aggressive) de-fragmentation.

Indeed.

I'm not sure, though, that I fully understand the design principles
behind non-order-0 allocations, and memory sizing, and so on.  Your
earlier mail suggested there may not be a design principle, and that
anything relying on non-order-0 atomic allocations is only working by
luck (or an embarrassing excess of RAM).

Ian.



* Re: [xen-unstable test] 164996: regressions - FAIL
  2021-09-22 12:29             ` Ian Jackson
@ 2021-09-22 13:26               ` Jan Beulich
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2021-09-22 13:26 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Stefano Stabellini, xen-devel, dpsmith

On 22.09.2021 14:29, Ian Jackson wrote:
> I'm not sure, though, that I fully understand the design principles
> behind non-order-0 allocations, and memory sizing, and so on.  Your
> earlier mail suggested there may not be a design principle, and that
> anything relying on non-order-0 atomic allocations is only working by
> luck (or an embarrassing excess of RAM).

That's what I think, yes. During boot and in certain other specific
places it may be okay to use such allocations, as long as failure
leads to something non-destructive. A process (or VM) not getting
created successfully _might_ be okay; a process or VM failing when
it already runs is not okay. Just to give an example. The situation
here falls in the latter category, at least from osstest's pov. IOW
assuming that what gets tested is a goal in terms of functionality,
VM creation failing when there is enough memory (just not in the
right "shape") is not okay here. Or else the test was wrongly put
in place.

Therefore a goal I've been trying to follow in the hypervisor is to
eliminate higher order allocations wherever possible. And I think
the kernel wants to follow suit here.

Jan




* Re: [xen-unstable test] 164996: regressions - FAIL
  2021-09-22  7:34       ` Jan Beulich
  2021-09-22 11:20         ` Ian Jackson
@ 2021-09-23  1:10         ` Stefano Stabellini
  2021-09-23  2:56           ` Julien Grall
  2021-09-23  9:24           ` Jan Beulich
  1 sibling, 2 replies; 17+ messages in thread
From: Stefano Stabellini @ 2021-09-23  1:10 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Stefano Stabellini, Ian Jackson, xen-devel, dpsmith

On Wed, 22 Sep 2021, Jan Beulich wrote:
> On 22.09.2021 01:38, Stefano Stabellini wrote:
> > On Mon, 20 Sep 2021, Ian Jackson wrote:
> >> Jan Beulich writes ("Re: [xen-unstable test] 164996: regressions - FAIL"):
> >>> As per
> >>>
> >>> Sep 15 14:44:55.502598 [ 1613.322585] Mem-Info:
> >>> Sep 15 14:44:55.502643 [ 1613.324918] active_anon:5639 inactive_anon:15857 isolated_anon:0
> >>> Sep 15 14:44:55.514480 [ 1613.324918]  active_file:13286 inactive_file:11182 isolated_file:0
> >>> Sep 15 14:44:55.514545 [ 1613.324918]  unevictable:0 dirty:30 writeback:0 unstable:0
> >>> Sep 15 14:44:55.526477 [ 1613.324918]  slab_reclaimable:10922 slab_unreclaimable:30234
> >>> Sep 15 14:44:55.526540 [ 1613.324918]  mapped:11277 shmem:10975 pagetables:401 bounce:0
> >>> Sep 15 14:44:55.538474 [ 1613.324918]  free:8364 free_pcp:100 free_cma:1650
> >>>
> >>> the system doesn't look to really be out of memory; as per
> >>>
> >>> Sep 15 14:44:55.598538 [ 1613.419061] DMA32: 2788*4kB (UMEC) 890*8kB (UMEC) 497*16kB (UMEC) 36*32kB (UMC) 1*64kB (C) 1*128kB (C) 9*256kB (C) 7*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 33456kB
> >>>
> >>> there even look to be a number of higher order pages available (albeit
> >>> without digging I can't tell what "(C)" means). Nevertheless order-4
> >>> allocations aren't really nice.
> >>
> >> The host history suggests this may possibly be related to a qemu update.
> >>
> >> http://logs.test-lab.xenproject.org/osstest/results/host/rochester0.html
> 
> Stefano - as per some of your investigation detailed further down I
> wonder whether you had seen this part of Ian's reply. (Question of
> course then is how that qemu update had managed to get pushed.)
> 
> >> The grub cfg has this:
> >>
> >>  multiboot /xen placeholder conswitch=x watchdog noreboot async-show-all console=dtuart dom0_mem=512M,max:512M ucode=scan  ${xen_rm_opts}
> >>
> >> It's not clear to me whether xen_rm_opts is "" or "no-real-mode edd=off".
> > 
> > I definitely recommend to increase dom0 memory, especially as I guess
> > the box is going to have a significant amount, far more than 4GB. I
> > would set it to 2GB. Also the syntax on ARM is simpler, so it should be
> > just: dom0_mem=2G
> 
> Ian - I guess that's an adjustment relatively easy to make? I wonder
> though whether we wouldn't want to address the underlying issue first.
> Presumably not, because the fix would likely take quite some time to
> propagate suitably. Yet if not, we will want to have some way of
> verifying that an eventual fix there would have helped here.
> 
> > In addition, I also did some investigation just in case there is
> > actually a bug in the code and it is not a simple OOM problem.
> 
> I think the actual issue is quite clear; what I'm struggling with is
> why we weren't hit by it earlier.
> 
> As imo always, non-order-0 allocations (perhaps excluding the bringing
> > up of the kernel or whichever entity) are to be avoided if at all possible.
> The offender in this case looks to be privcmd's alloc_empty_pages().
> For it to request through kcalloc() what ends up being an order-4
> allocation, the original IOCTL_PRIVCMD_MMAPBATCH must specify a pretty
> large chunk of guest memory to get mapped. Which may in turn be
> questionable, but I'm afraid I don't have the time to try to drill
> down where that request is coming from and whether that also wouldn't
> better be split up.
> 
> The solution looks simple enough - convert from kcalloc() to kvcalloc().
> I can certainly spin up a patch to Linux to this effect. Yet that still
> won't answer the question of why this issue has popped up all of a
> sudden (and hence whether there are things wanting changing elsewhere
> as well).

Also, I saw your patches for Linux. Let's say that the patches are
reviewed and enqueued immediately to be sent to Linus at the next
opportunity. It is going to take a while for them to take effect in
OSSTest, unless we import them somehow in the Linux tree used by OSSTest
straight away, right?

Should we arrange for one test OSSTest flight now with the patches
applied to see if they actually fix the issue? Otherwise we might end up
waiting for nothing...



* Re: [xen-unstable test] 164996: regressions - FAIL
  2021-09-23  1:10         ` Stefano Stabellini
@ 2021-09-23  2:56           ` Julien Grall
  2021-09-28 15:24             ` Jan Beulich
  2021-09-23  9:24           ` Jan Beulich
  1 sibling, 1 reply; 17+ messages in thread
From: Julien Grall @ 2021-09-23  2:56 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Jan Beulich, Ian Jackson, xen-devel, dpsmith


Hi,

Sorry for the formatting.


On Thu, 23 Sep 2021, 06:10 Stefano Stabellini, <sstabellini@kernel.org>
wrote:

> On Wed, 22 Sep 2021, Jan Beulich wrote:
> > On 22.09.2021 01:38, Stefano Stabellini wrote:
> > > On Mon, 20 Sep 2021, Ian Jackson wrote:
> > >> Jan Beulich writes ("Re: [xen-unstable test] 164996: regressions -
> FAIL"):
> > >>> As per
> > >>>
> > >>> Sep 15 14:44:55.502598 [ 1613.322585] Mem-Info:
> > >>> Sep 15 14:44:55.502643 [ 1613.324918] active_anon:5639
> inactive_anon:15857 isolated_anon:0
> > >>> Sep 15 14:44:55.514480 [ 1613.324918]  active_file:13286
> inactive_file:11182 isolated_file:0
> > >>> Sep 15 14:44:55.514545 [ 1613.324918]  unevictable:0 dirty:30
> writeback:0 unstable:0
> > >>> Sep 15 14:44:55.526477 [ 1613.324918]  slab_reclaimable:10922
> slab_unreclaimable:30234
> > >>> Sep 15 14:44:55.526540 [ 1613.324918]  mapped:11277 shmem:10975
> pagetables:401 bounce:0
> > >>> Sep 15 14:44:55.538474 [ 1613.324918]  free:8364 free_pcp:100
> free_cma:1650
> > >>>
> > >>> the system doesn't look to really be out of memory; as per
> > >>>
> > >>> Sep 15 14:44:55.598538 [ 1613.419061] DMA32: 2788*4kB (UMEC) 890*8kB
> (UMEC) 497*16kB (UMEC) 36*32kB (UMC) 1*64kB (C) 1*128kB (C) 9*256kB (C)
> 7*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 33456kB
> > >>>
> > >>> there even look to be a number of higher order pages available
> (albeit
> > >>> without digging I can't tell what "(C)" means). Nevertheless order-4
> > >>> allocations aren't really nice.
> > >>
> > >> The host history suggests this may possibly be related to a qemu
> update.
> > >>
> > >>
> http://logs.test-lab.xenproject.org/osstest/results/host/rochester0.html
> >
> > Stefano - as per some of your investigation detailed further down I
> > wonder whether you had seen this part of Ian's reply. (Question of
> > course then is how that qemu update had managed to get pushed.)
> >
> > >> The grub cfg has this:
> > >>
> > >>  multiboot /xen placeholder conswitch=x watchdog noreboot
> async-show-all console=dtuart dom0_mem=512M,max:512M ucode=scan
> ${xen_rm_opts}
> > >>
> > >> It's not clear to me whether xen_rm_opts is "" or "no-real-mode
> edd=off".
> > >
> > > I definitely recommend to increase dom0 memory, especially as I guess
> > > the box is going to have a significant amount, far more than 4GB. I
> > > would set it to 2GB. Also the syntax on ARM is simpler, so it should be
> > > just: dom0_mem=2G
> >
> > Ian - I guess that's an adjustment relatively easy to make? I wonder
> > though whether we wouldn't want to address the underlying issue first.
> > Presumably not, because the fix would likely take quite some time to
> > propagate suitably. Yet if not, we will want to have some way of
> > verifying that an eventual fix there would have helped here.
> >
> > > In addition, I also did some investigation just in case there is
> > > actually a bug in the code and it is not a simple OOM problem.
> >
> > I think the actual issue is quite clear; what I'm struggling with is
> > why we weren't hit by it earlier.
> >
> > As imo always, non-order-0 allocations (perhaps excluding the bringing
> > up of the kernel or whichever entity) are to be avoided if at all possible.
> > The offender in this case looks to be privcmd's alloc_empty_pages().
> > For it to request through kcalloc() what ends up being an order-4
> > allocation, the original IOCTL_PRIVCMD_MMAPBATCH must specify a pretty
> > large chunk of guest memory to get mapped. Which may in turn be
> > questionable, but I'm afraid I don't have the time to try to drill
> > down where that request is coming from and whether that also wouldn't
> > better be split up.
> >
> > The solution looks simple enough - convert from kcalloc() to kvcalloc().
> > I can certainly spin up a patch to Linux to this effect. Yet that still
> > won't answer the question of why this issue has popped up all of the
> > sudden (and hence whether there are things wanting changing elsewhere
> > as well).
>
> Also, I saw your patches for Linux. Let's say that the patches are
> reviewed and enqueued immediately to be sent to Linus at the next
> opportunity. It is going to take a while for them to take effect in
> OSSTest, unless we import them somehow in the Linux tree used by OSSTest
> straight away, right?
>

For Arm testing we don't use a branch provided by Linux upstream. So your
wait will be forever :).


> Should we arrange for one test OSSTest flight now with the patches
> applied to see if they actually fix the issue? Otherwise we might end up
> waiting for nothing...


We could push the patch to the branch we have. However, the Linux we use
is fairly old (I think I did a push last year) and not even the latest
stable.

I can't remember whether we still have some patches on top of Linux to run
on arm (specifically 32-bit). So maybe we should start tracking upstream
instead?

This would have the benefit of picking up any new patches.

Cheers,

[-- Attachment #2: Type: text/html, Size: 6910 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [xen-unstable test] 164996: regressions - FAIL
  2021-09-23  1:10         ` Stefano Stabellini
  2021-09-23  2:56           ` Julien Grall
@ 2021-09-23  9:24           ` Jan Beulich
  1 sibling, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2021-09-23  9:24 UTC (permalink / raw)
  To: Stefano Stabellini, Ian Jackson; +Cc: xen-devel, dpsmith

On 23.09.2021 03:10, Stefano Stabellini wrote:
> On Wed, 22 Sep 2021, Jan Beulich wrote:
>> On 22.09.2021 01:38, Stefano Stabellini wrote:
>>> On Mon, 20 Sep 2021, Ian Jackson wrote:
>>>> Jan Beulich writes ("Re: [xen-unstable test] 164996: regressions - FAIL"):
>>>>> As per
>>>>>
>>>>> Sep 15 14:44:55.502598 [ 1613.322585] Mem-Info:
>>>>> Sep 15 14:44:55.502643 [ 1613.324918] active_anon:5639 inactive_anon:15857 isolated_anon:0
>>>>> Sep 15 14:44:55.514480 [ 1613.324918]  active_file:13286 inactive_file:11182 isolated_file:0
>>>>> Sep 15 14:44:55.514545 [ 1613.324918]  unevictable:0 dirty:30 writeback:0 unstable:0
>>>>> Sep 15 14:44:55.526477 [ 1613.324918]  slab_reclaimable:10922 slab_unreclaimable:30234
>>>>> Sep 15 14:44:55.526540 [ 1613.324918]  mapped:11277 shmem:10975 pagetables:401 bounce:0
>>>>> Sep 15 14:44:55.538474 [ 1613.324918]  free:8364 free_pcp:100 free_cma:1650
>>>>>
>>>>> the system doesn't look to really be out of memory; as per
>>>>>
>>>>> Sep 15 14:44:55.598538 [ 1613.419061] DMA32: 2788*4kB (UMEC) 890*8kB (UMEC) 497*16kB (UMEC) 36*32kB (UMC) 1*64kB (C) 1*128kB (C) 9*256kB (C) 7*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 33456kB
>>>>>
>>>>> there even look to be a number of higher order pages available (albeit
>>>>> without digging I can't tell what "(C)" means). Nevertheless order-4
>>>>> allocations aren't really nice.
>>>>
>>>> The host history suggests this may possibly be related to a qemu update.
>>>>
>>>> http://logs.test-lab.xenproject.org/osstest/results/host/rochester0.html
>>
>> Stefano - as per some of your investigation detailed further down I
>> wonder whether you had seen this part of Ian's reply. (Question of
>> course then is how that qemu update had managed to get pushed.)
>>
>>>> The grub cfg has this:
>>>>
>>>>  multiboot /xen placeholder conswitch=x watchdog noreboot async-show-all console=dtuart dom0_mem=512M,max:512M ucode=scan  ${xen_rm_opts}
>>>>
>>>> It's not clear to me whether xen_rm_opts is "" or "no-real-mode edd=off".
>>>
>>> I definitely recommend increasing dom0 memory, especially as I guess
>>> the box is going to have a significant amount, far more than 4GB. I
>>> would set it to 2GB. Also the syntax on ARM is simpler, so it should be
>>> just: dom0_mem=2G
>>
>> Ian - I guess that's an adjustment relatively easy to make? I wonder
>> though whether we wouldn't want to address the underlying issue first.
>> Presumably not, because the fix would likely take quite some time to
>> propagate suitably. Yet if not, we will want to have some way of
>> verifying that an eventual fix there would have helped here.
>>
>>> In addition, I also did some investigation just in case there is
>>> actually a bug in the code and it is not a simple OOM problem.
>>
>> I think the actual issue is quite clear; what I'm struggling with is
>> why we weren't hit by it earlier.
>>
>> As imo always, non-order-0 allocations (perhaps excluding the bringing
>> up of the kernel or whichever entity) are to be avoided if at all possible.
>> The offender in this case looks to be privcmd's alloc_empty_pages().
>> For it to request through kcalloc() what ends up being an order-4
>> allocation, the original IOCTL_PRIVCMD_MMAPBATCH must specify a pretty
>> large chunk of guest memory to get mapped. Which may in turn be
>> questionable, but I'm afraid I don't have the time to try to drill
>> down where that request is coming from and whether that also wouldn't
>> better be split up.
>>
>> The solution looks simple enough - convert from kcalloc() to kvcalloc().
>> I can certainly spin up a patch to Linux to this effect. Yet that still
>> won't answer the question of why this issue has popped up all of a
>> sudden (and hence whether there are things wanting changing elsewhere
>> as well).
> 
> Also, I saw your patches for Linux. Let's say that the patches are
> reviewed and enqueued immediately to be sent to Linus at the next
> opportunity. It is going to take a while for them to take effect in
> OSSTest, unless we import them somehow in the Linux tree used by OSSTest
> straight away, right?

Yes.

> Should we arrange for one test OSSTest flight now with the patches
> applied to see if they actually fix the issue? Otherwise we might end up
> waiting for nothing...

Not sure how easy it is to do one-off Linux builds then to be used in
hypervisor tests. Ian?

Jan



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [xen-unstable test] 164996: regressions - FAIL
  2021-09-23  2:56           ` Julien Grall
@ 2021-09-28 15:24             ` Jan Beulich
  2021-09-28 16:16               ` Ian Jackson
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2021-09-28 15:24 UTC (permalink / raw)
  To: Julien Grall, Ian Jackson; +Cc: xen-devel, Stefano Stabellini, dpsmith

[-- Attachment #1: Type: text/plain, Size: 599 bytes --]

On 23.09.2021 04:56, Julien Grall wrote:
> We could push the patch to the branch we have. However, the Linux we use
> is fairly old (I think I did a push last year) and not even the latest
> stable.

I don't think that's a problem here - this looks to be 5.4.17-ish, which
the patch should be good for (and it does apply cleanly to plain 5.4.0).

Ian, for your setting up of a one-off flight (as just talked about),
you can find the patch at
https://lists.xen.org/archives/html/xen-devel/2021-09/msg01691.html
(and perhaps in your mailbox). In case that's easier I've also attached
it here.

Jan

[-- Attachment #2: linux-5.15-rc2-xen-privcmd-mmap-kvcalloc.patch --]
[-- Type: text/plain, Size: 1825 bytes --]

xen/privcmd: replace kcalloc() by kvcalloc() when allocating empty pages

Osstest has been suffering test failures for a little while from order-4
allocation failures, resulting from alloc_empty_pages() calling
kcalloc(). As there's no need for physically contiguous space here,
switch to kvcalloc().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: stable@vger.kernel.org
Reviewed-by: Juergen Gross <jgross@suse.com>
---
RFC: I cannot really test this, as alloc_empty_pages() only gets used in
     the auto-translated case (i.e. on Arm or PVH Dom0, the latter of
     which I'm not trusting enough yet to actually start playing with
     guests).

There are quite a few more kcalloc() where it's not immediately clear
how large the element counts could possibly grow nor whether it would be
fine to replace them (i.e. physically contiguous space not required).

I wasn't sure whether to Cc stable@ here; the issue certainly has been
present for quite some time. But it didn't look to cause issues until
recently.

--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -420,7 +420,7 @@ static int alloc_empty_pages(struct vm_a
 	int rc;
 	struct page **pages;
 
-	pages = kcalloc(numpgs, sizeof(pages[0]), GFP_KERNEL);
+	pages = kvcalloc(numpgs, sizeof(pages[0]), GFP_KERNEL);
 	if (pages == NULL)
 		return -ENOMEM;
 
@@ -428,7 +428,7 @@ static int alloc_empty_pages(struct vm_a
 	if (rc != 0) {
 		pr_warn("%s Could not alloc %d pfns rc:%d\n", __func__,
 			numpgs, rc);
-		kfree(pages);
+		kvfree(pages);
 		return -ENOMEM;
 	}
 	BUG_ON(vma->vm_private_data != NULL);
@@ -912,7 +912,7 @@ static void privcmd_close(struct vm_area
 	else
 		pr_crit("unable to unmap MFN range: leaking %d pages. rc=%d\n",
 			numpgs, rc);
-	kfree(pages);
+	kvfree(pages);
 }
 
 static vm_fault_t privcmd_fault(struct vm_fault *vmf)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [xen-unstable test] 164996: regressions - FAIL
  2021-09-28 15:24             ` Jan Beulich
@ 2021-09-28 16:16               ` Ian Jackson
  2021-09-29 21:35                 ` Ian Jackson
  2021-10-04 15:22                 ` Ian Jackson
  0 siblings, 2 replies; 17+ messages in thread
From: Ian Jackson @ 2021-09-28 16:16 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Julien Grall, xen-devel, Stefano Stabellini, dpsmith

Jan Beulich writes ("Re: [xen-unstable test] 164996: regressions - FAIL"):
> Ian, for your setting up of a one-off flight (as just talked about),
> you can find the patch at
> https://lists.xen.org/archives/html/xen-devel/2021-09/msg01691.html
> (and perhaps in your mailbox). In case that's easier I've also attached
> it here.
...
> [DELETED ATTACHMENT linux-5.15-rc2-xen-privcmd-mmap-kvcalloc.patch, plain text]

Thanks.  The attachment didn't apply with git-am, but I managed to make a
tree with it in (albeit with a bogus commit message).

I have a repro of 165218 test-arm64-arm64-libvirt-raw (that's the last
xen-unstable flight) running.  If all goes well it will rebuild Linux
from my branch (new flight 165241) and then run the test using that
kernel (new flight 165242).  I have told it to report to the people on
this thread (and the list).

It will probably report in an hour or two (since it needs to rebuild a
kernel and then negotiate to get a host to run the repro on).
I didn't ask it to keep the host for me, but it ought to publish the
logs and as I say, send an email report here.

Ian.

For my reference:

./mg-transient-task ./mg-repro-setup -P -E...,iwj@xenproject.org,... 165218 test-arm64-arm64-libvirt-raw X --rebuild +linux=https://xenbits.xen.org/git-http/people/iwj/linux.git#164996-fix alloc:equiv-rochester



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [xen-unstable test] 164996: regressions - FAIL
  2021-09-28 16:16               ` Ian Jackson
@ 2021-09-29 21:35                 ` Ian Jackson
  2021-10-04 15:22                 ` Ian Jackson
  1 sibling, 0 replies; 17+ messages in thread
From: Ian Jackson @ 2021-09-29 21:35 UTC (permalink / raw)
  To: Jan Beulich, Julien Grall, xen-devel, Stefano Stabellini, dpsmith

Ian Jackson writes ("Re: [xen-unstable test] 164996: regressions - FAIL"):
> Thanks.  The attachment didn't apply with git-am, but I managed to make a
> tree with it in (albeit with a bogus commit message).
> 
> I have a repro of 165218 test-arm64-arm64-libvirt-raw (that's the last
> xen-unstable flight) running.  If all goes well it will rebuild Linux
> from my branch (new flight 165241) and then run the test using that
> kernel (new flight 165242).  I have told it to report to the people on
> this thread (and the list).
> 
> It will probably report in an hour or two (since it needs to rebuild a
> kernel and then negotiate to get a host to run the repro on).
> I didn't ask it to keep the host for me, but it ought to publish the
> logs and as I say, send an email report here.

Restarted as 165323 and 165324.  Maybe the thing won't catch fire this
time.  Unusual consequences for a small kernel patch :-).

Ian.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [xen-unstable test] 164996: regressions - FAIL
  2021-09-28 16:16               ` Ian Jackson
  2021-09-29 21:35                 ` Ian Jackson
@ 2021-10-04 15:22                 ` Ian Jackson
  1 sibling, 0 replies; 17+ messages in thread
From: Ian Jackson @ 2021-10-04 15:22 UTC (permalink / raw)
  To: Jan Beulich, Julien Grall, xen-devel, Stefano Stabellini, dpsmith

Ian Jackson writes ("Re: [xen-unstable test] 164996: regressions - FAIL"):
> Thanks.  The attachment didn't apply with git-am, but I managed to make a
> tree with it in (albeit with a bogus commit message).
> 
> I have a repro of 165218 test-arm64-arm64-libvirt-raw (that's the last
> xen-unstable flight) running.  If all goes well it will rebuild Linux
> from my branch (new flight 165241) and then run the test using that
> kernel (new flight 165242).  I have told it to report to the people on
> this thread (and the list).
> 
> It will probably report in an hour or two (since it needs to rebuild a
> kernel and then negotiate to get a host to run the repro on).
> I didn't ask it to keep the host for me, but it ought to publish the
> logs and as I say, send an email report here.

This was disrupted by the osstest failure.  I'm running it again.
165354 and 165355.

Ian.

For my reference:

./mg-transient-task ./mg-repro-setup -P -Exen-devel@lists.xenproject.org,jbeulich@suse.com,julien.grall.oss@gmail.com,iwj@xenproject.org,sstabellini@kernel.org,dpsmith@apertussolutions.com 165218 test-arm64-arm64-libvirt-raw X --rebuild +linux=https://xenbits.xen.org/git-http/people/iwj/linux.git#164996-fix alloc:'{equiv-rochester,real}'


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-10-04 15:23 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-16  4:06 [xen-unstable test] 164996: regressions - FAIL osstest service owner
2021-09-16 16:21 ` Jan Beulich
2021-09-20 15:44   ` Ian Jackson
2021-09-20 15:58     ` Jan Beulich
2021-09-21 23:38     ` Stefano Stabellini
2021-09-22  7:34       ` Jan Beulich
2021-09-22 11:20         ` Ian Jackson
2021-09-22 12:24           ` Jan Beulich
2021-09-22 12:29             ` Ian Jackson
2021-09-22 13:26               ` Jan Beulich
2021-09-23  1:10         ` Stefano Stabellini
2021-09-23  2:56           ` Julien Grall
2021-09-28 15:24             ` Jan Beulich
2021-09-28 16:16               ` Ian Jackson
2021-09-29 21:35                 ` Ian Jackson
2021-10-04 15:22                 ` Ian Jackson
2021-09-23  9:24           ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.