* [xen-4.7-testing test] 105948: regressions - FAIL
@ 2017-02-21 23:45 osstest service owner
  2017-02-22  0:02 ` Andrew Cooper
  0 siblings, 1 reply; 9+ messages in thread
From: osstest service owner @ 2017-02-21 23:45 UTC (permalink / raw)
  To: xen-devel, osstest-admin


flight 105948 xen-4.7-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/105948/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-credit2  17 guest-localmigrate/x10   fail REGR. vs. 105855

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-libvirt     13 saverestore-support-check    fail  like 105855
 test-armhf-armhf-libvirt-xsm 13 saverestore-support-check    fail  like 105855
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop             fail like 105855
 test-armhf-armhf-libvirt-raw 12 saverestore-support-check    fail  like 105855
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop            fail like 105855
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop             fail like 105855
 test-amd64-amd64-xl-rtds      9 debian-install               fail  like 105855

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-libvirt-xsm  1 build-check(1)               blocked  n/a
 test-arm64-arm64-xl           1 build-check(1)               blocked  n/a
 build-arm64-libvirt           1 build-check(1)               blocked  n/a
 test-arm64-arm64-libvirt-qcow2  1 build-check(1)               blocked  n/a
 test-arm64-arm64-libvirt      1 build-check(1)               blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)               blocked  n/a
 test-arm64-arm64-xl-rtds      1 build-check(1)               blocked  n/a
 test-arm64-arm64-xl-multivcpu  1 build-check(1)               blocked  n/a
 test-arm64-arm64-xl-xsm       1 build-check(1)               blocked  n/a
 build-arm64                   5 xen-build                    fail   never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start                  fail  never pass
 test-amd64-amd64-xl-pvh-amd  11 guest-start                  fail   never pass
 test-amd64-amd64-libvirt     12 migrate-support-check        fail   never pass
 test-amd64-i386-libvirt      12 migrate-support-check        fail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-check        fail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-check        fail   never pass
 build-arm64-xsm               5 xen-build                    fail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-check    fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
 build-arm64-pvops             5 kernel-build                 fail   never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          13 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-xsm      12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-xsm      13 saverestore-support-check    fail   never pass
 test-armhf-armhf-libvirt     12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-check        fail  never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-check    fail  never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-check        fail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-check    fail never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-check        fail   never pass
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop             fail never pass
 test-armhf-armhf-libvirt-raw 11 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-rtds     12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-rtds     13 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-vhd      11 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-vhd      12 saverestore-support-check    fail   never pass

version targeted for testing:
 xen                  8a9dfe392702cb987cab725fbda7345f4c3053da
baseline version:
 xen                  758378233b0b5d79a29735d95dc72410ef2f19aa

Last test of basis   105855  2017-02-16 15:36:15 Z    5 days
Testing same since   105924  2017-02-20 15:11:38 Z    1 days    3 attempts

------------------------------------------------------------
People who touched revisions under test:
  Andrew Cooper <andrew.cooper3@citrix.com>
  Dario Faggioli <dario.faggioli@citrix.com
  Dario Faggioli <dario.faggioli@citrix.com>
  David Woodhouse <dwmw@amazon.com>
  George Dunlap <george.dunlap@citrix.com>
  Jan Beulich <jbeulich@suse.com>
  Kevin Tian <kevin.tian@intel.com>
  Sergey Dyasli <sergey.dyasli@citrix.com>
  Tamas K Lengyel <tamas@tklengyel.com>

jobs:
 build-amd64-xsm                                              pass    
 build-arm64-xsm                                              fail    
 build-armhf-xsm                                              pass    
 build-i386-xsm                                               pass    
 build-amd64-xtf                                              pass    
 build-amd64                                                  pass    
 build-arm64                                                  fail    
 build-armhf                                                  pass    
 build-i386                                                   pass    
 build-amd64-libvirt                                          pass    
 build-arm64-libvirt                                          blocked 
 build-armhf-libvirt                                          pass    
 build-i386-libvirt                                           pass    
 build-amd64-prev                                             pass    
 build-i386-prev                                              pass    
 build-amd64-pvops                                            pass    
 build-arm64-pvops                                            fail    
 build-armhf-pvops                                            pass    
 build-i386-pvops                                             pass    
 build-amd64-rumprun                                          pass    
 build-i386-rumprun                                           pass    
 test-xtf-amd64-amd64-1                                       pass    
 test-xtf-amd64-amd64-2                                       pass    
 test-xtf-amd64-amd64-3                                       pass    
 test-xtf-amd64-amd64-4                                       pass    
 test-xtf-amd64-amd64-5                                       pass    
 test-amd64-amd64-xl                                          pass    
 test-arm64-arm64-xl                                          blocked 
 test-armhf-armhf-xl                                          pass    
 test-amd64-i386-xl                                           pass    
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm                pass    
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm                 pass    
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm           pass    
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm            pass    
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm                pass    
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm                 pass    
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm        pass    
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm         pass    
 test-amd64-amd64-libvirt-xsm                                 pass    
 test-arm64-arm64-libvirt-xsm                                 blocked 
 test-armhf-armhf-libvirt-xsm                                 pass    
 test-amd64-i386-libvirt-xsm                                  pass    
 test-amd64-amd64-xl-xsm                                      pass    
 test-arm64-arm64-xl-xsm                                      blocked 
 test-armhf-armhf-xl-xsm                                      pass    
 test-amd64-i386-xl-xsm                                       pass    
 test-amd64-amd64-qemuu-nested-amd                            fail    
 test-amd64-amd64-xl-pvh-amd                                  fail    
 test-amd64-i386-qemut-rhel6hvm-amd                           pass    
 test-amd64-i386-qemuu-rhel6hvm-amd                           pass    
 test-amd64-amd64-xl-qemut-debianhvm-amd64                    pass    
 test-amd64-i386-xl-qemut-debianhvm-amd64                     pass    
 test-amd64-amd64-xl-qemuu-debianhvm-amd64                    pass    
 test-amd64-i386-xl-qemuu-debianhvm-amd64                     pass    
 test-amd64-i386-freebsd10-amd64                              pass    
 test-amd64-amd64-xl-qemuu-ovmf-amd64                         pass    
 test-amd64-i386-xl-qemuu-ovmf-amd64                          pass    
 test-amd64-amd64-rumprun-amd64                               pass    
 test-amd64-amd64-xl-qemut-win7-amd64                         fail    
 test-amd64-i386-xl-qemut-win7-amd64                          fail    
 test-amd64-amd64-xl-qemuu-win7-amd64                         fail    
 test-amd64-i386-xl-qemuu-win7-amd64                          fail    
 test-armhf-armhf-xl-arndale                                  pass    
 test-amd64-amd64-xl-credit2                                  fail    
 test-arm64-arm64-xl-credit2                                  blocked 
 test-armhf-armhf-xl-credit2                                  pass    
 test-armhf-armhf-xl-cubietruck                               pass    
 test-amd64-i386-freebsd10-i386                               pass    
 test-amd64-i386-rumprun-i386                                 pass    
 test-amd64-amd64-qemuu-nested-intel                          pass    
 test-amd64-amd64-xl-pvh-intel                                fail    
 test-amd64-i386-qemut-rhel6hvm-intel                         pass    
 test-amd64-i386-qemuu-rhel6hvm-intel                         pass    
 test-amd64-amd64-libvirt                                     pass    
 test-arm64-arm64-libvirt                                     blocked 
 test-armhf-armhf-libvirt                                     pass    
 test-amd64-i386-libvirt                                      pass    
 test-amd64-amd64-migrupgrade                                 pass    
 test-amd64-i386-migrupgrade                                  pass    
 test-amd64-amd64-xl-multivcpu                                pass    
 test-arm64-arm64-xl-multivcpu                                blocked 
 test-armhf-armhf-xl-multivcpu                                pass    
 test-amd64-amd64-pair                                        pass    
 test-amd64-i386-pair                                         pass    
 test-amd64-amd64-libvirt-pair                                pass    
 test-amd64-i386-libvirt-pair                                 pass    
 test-amd64-amd64-amd64-pvgrub                                pass    
 test-amd64-amd64-i386-pvgrub                                 pass    
 test-amd64-amd64-pygrub                                      pass    
 test-arm64-arm64-libvirt-qcow2                               blocked 
 test-amd64-amd64-xl-qcow2                                    pass    
 test-armhf-armhf-libvirt-raw                                 pass    
 test-amd64-i386-xl-raw                                       pass    
 test-amd64-amd64-xl-rtds                                     fail    
 test-arm64-arm64-xl-rtds                                     blocked 
 test-armhf-armhf-xl-rtds                                     pass    
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1                     pass    
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1                     pass    
 test-amd64-amd64-libvirt-vhd                                 pass    
 test-armhf-armhf-xl-vhd                                      pass    
 test-amd64-amd64-xl-qemut-winxpsp3                           pass    
 test-amd64-i386-xl-qemut-winxpsp3                            pass    
 test-amd64-amd64-xl-qemuu-winxpsp3                           pass    
 test-amd64-i386-xl-qemuu-winxpsp3                            pass    


------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
    http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
    http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

------------------------------------------------------------
commit 8a9dfe392702cb987cab725fbda7345f4c3053da
Author: Jan Beulich <jbeulich@suse.com>
Date:   Mon Feb 20 16:02:47 2017 +0100

    VMX: fix VMCS race on context-switch paths
    
    When __context_switch() is being bypassed during original context
    switch handling, the vCPU "owning" the VMCS partially loses control of
    it: It will appear non-running to remote CPUs, and hence their attempt
    to pause the owning vCPU will have no effect on it (as it already
    looks to be paused). At the same time the "owning" CPU will re-enable
    interrupts eventually (the latest when entering the idle loop) and
    hence becomes subject to IPIs from other CPUs requesting access to the
    VMCS. As a result, when __context_switch() finally gets run, the CPU
    may no longer have the VMCS loaded, and hence any accesses to it would
    fail. Hence we may need to re-load the VMCS in vmx_ctxt_switch_from().
    
    For consistency use the new function also in vmx_do_resume(), to
    avoid leaving an open-coded incarnation of it around.
    
    Reported-by: Kevin Mayer <Kevin.Mayer@gdata.de>
    Reported-by: Anshul Makkar <anshul.makkar@citrix.com>
    Signed-off-by: Jan Beulich <jbeulich@suse.com>
    Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Acked-by: Kevin Tian <kevin.tian@intel.com>
    Reviewed-by: Sergey Dyasli <sergey.dyasli@citrix.com>
    Tested-by: Sergey Dyasli <sergey.dyasli@citrix.com>
    master commit: 2f4d2198a9b3ba94c959330b5c94fe95917c364c
    master date: 2017-02-17 15:49:56 +0100
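
A minimal sketch of the re-load described above; the helper name
vmx_vmcs_reload() and the fields it touches are assumptions here and
may not match the actual 4.7 backport:

    static void vmx_vmcs_reload(struct vcpu *v)
    {
        /* Nothing to do if this CPU still has v's VMCS loaded. */
        if ( v->arch.hvm_vmx.vmcs == this_cpu(current_vmcs) )
            return;

        /* Make it current again before any VMCS accesses are made. */
        vmx_load_vmcs(v);
    }

    static void vmx_ctxt_switch_from(struct vcpu *v)
    {
        /*
         * An IPI handler may have taken the VMCS away while interrupts
         * were re-enabled; re-load it before touching VMCS state.
         */
        vmx_vmcs_reload(v);
        /* ... rest of the existing switch-from path ... */
    }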

commit 19d4e55a01cdeafb6b14262806892fcd34bd205d
Author: George Dunlap <george.dunlap@citrix.com>
Date:   Mon Feb 20 16:02:12 2017 +0100

    xen/p2m: Fix p2m_flush_table for non-nested cases
    
    Commit 71bb7304e7a7a35ea6df4b0cedebc35028e4c159 added flushing of
    nested p2m tables whenever the host p2m table changed.  Unfortunately
    in the process, it added a filter to p2m_flush_table() function so
    that the p2m would only be flushed if it was being used as a nested
    p2m.  This meant that the p2m was not being flushed at all for altp2m
    callers.
    
    Only check np2m_base when the p2m's class marks it as a nested p2m.
    
    NB that this is not a security issue: The only time this codepath is
    called is in cases where either nestedp2m or altp2m is enabled, and
    neither of them are in security support.
    
    Reported-by: Matt Leinhos <matt@starlab.io>
    Signed-off-by: George Dunlap <george.dunlap@citrix.com>
    Reviewed-by: Tim Deegan <tim@xen.org>
    Tested-by: Tamas K Lengyel <tamas@tklengyel.com>
    master commit: 6192e6378e094094906950120470a621d5b2977c
    master date: 2017-02-15 17:15:56 +0000
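
In code terms, the filter described above amounts to something like the
following sketch (illustrative only; p2m_is_nestedp2m() and
P2M_BASE_EADDR are existing names, but the exact hunk may differ):

    /*
     * Only short-circuit the flush for nested p2m's whose np2m_base is
     * still unset; altp2m (and any other) tables must still be flushed.
     */
    if ( p2m_is_nestedp2m(p2m) && p2m->np2m_base == P2M_BASE_EADDR )
        return;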

commit ad19a5189d8a5b7d48c40cf62ff3682d24194ddf
Author: David Woodhouse <dwmw@amazon.com>
Date:   Mon Feb 20 16:01:47 2017 +0100

    x86/ept: allow write-combining on !mfn_valid() MMIO mappings again
    
    For some MMIO regions, such as those high above RAM, mfn_valid() will
    return false.
    
    Since the fix for XSA-154 in commit c61a6f74f80e ("x86: enforce
    consistent cachability of MMIO mappings"), guests have no longer been
    able to use PAT to obtain write-combining on such regions because the
    'ignore PAT' bit is set in EPT.
    
    We probably want to err on the side of caution and preserve that
    behaviour for addresses in mmio_ro_ranges, but not for normal MMIO
    mappings. That necessitates a slight refactoring to check mfn_valid()
    later, and let the MMIO case get through to the right code path.
    
    Since we're not bailing out for !mfn_valid() immediately, the range
    checks need to be adjusted to cope — simply by masking in the low bits
    to account for 'order' instead of adding, to avoid overflow when the mfn
    is INVALID_MFN (which happens on unmap, since we carefully call this
    function to fill in the EMT even though the PTE won't be valid).
    
    The range checks are also slightly refactored to put only one of them in
    the fast path in the common case. If it doesn't overlap, then it
    *definitely* isn't contained, so we don't need both checks. And if it
    overlaps and is only one page, then it definitely *is* contained.
    
    Finally, add a comment clarifying how that 'return -1' works — it isn't
    returning an error and causing the mapping to fail; it relies on
    resolve_misconfig() being able to split the mapping later. So it's
    *only* sane to do it where order>0 and the 'problem' will be solved by
    splitting the large page. Not for blindly returning 'error', which I was
    tempted to do in my first attempt.
    
    Signed-off-by: David Woodhouse <dwmw@amazon.com>
    Reviewed-by: Jan Beulich <jbeulich@suse.com>
    Reviewed-by: Kevin Tian <kevin.tian@intel.com>
    master commit: 30921dc2df3665ca1b2593595aa6725ff013d386
    master date: 2017-02-07 14:30:01 +0100
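
The masking trick described above looks roughly like this (a sketch,
not the literal hunk; mfn_x(), mmio_ro_ranges and the rangeset helpers
are existing Xen interfaces):

    /*
     * Cover the order-aligned block by masking in the low bits, instead
     * of adding (1UL << order) - 1, which would wrap for INVALID_MFN.
     */
    unsigned long first = mfn_x(mfn) & ~((1UL << order) - 1);
    unsigned long last  = mfn_x(mfn) | ((1UL << order) - 1);

    /*
     * One rangeset check suffices in the common case: no overlap means
     * definitely not contained; an overlapping single page is
     * definitely contained.
     */
    if ( rangeset_overlaps_range(mmio_ro_ranges, first, last) &&
         order && !rangeset_contains_range(mmio_ro_ranges, first, last) )
        return -1; /* resolve_misconfig() will split the superpage later */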

commit 19addfac3c32d34eee51eb401d18dcd48f6d1298
Author: Dario Faggioli <dario.faggioli@citrix.com>
Date:   Mon Feb 20 16:01:20 2017 +0100

    xen: credit2: never consider CPUs outside of our cpupool.
    
    In fact, relying on the mask of what pCPUs belong to
    which Credit2 runqueue is not enough. If we only do that,
    when Credit2 is the boot scheduler, we may ASSERT() or
    panic when moving a pCPU from Pool-0 to another cpupool.
    
    This is because pCPUs outside of any pool are considered
    part of cpupool0. This puts us at risk of crash when those
    same pCPUs are added to another pool and something
    different than the idle domain is found to be running
    on them.
    
    Note that, even if we prevent the above from happening (which
    is the purpose of this patch), things are still pretty bad. In
    fact, when we remove a pCPU from Pool-0:
    - in Credit1, as we do *not* update prv->ncpus and
      prv->credit, which means we're considering the wrong
      total credits when doing accounting;
    - in Credit2, the pCPU remains part of one runqueue,
      and is hence at least considered during load balancing,
      even if no vCPU should really run there.
    
    In Credit1, this "only" causes skewed accounting and
    no crashes because there is a lot of `cpumask_and`ing
    going on with the cpumask of the domains' cpupool
    (which, BTW, comes at a price).
    
    A quick and not too involved (and easily backportable)
    solution for Credit2 is to do exactly the same.
    
    Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com
    Acked-by: George Dunlap <george.dunlap@citrix.com>
    master commit: e7191920261d20e52ca4c06a03589a1155981b04
    master date: 2017-01-24 17:02:07 +0000
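
The Credit1-style approach described above boils down to always masking
scheduling decisions with the pCPUs of the domain's cpupool, roughly
(illustrative; the actual call sites in sched_credit2.c differ):

    /* Never consider pCPUs outside of the vCPU's own cpupool. */
    cpumask_and(cpumask_scratch, vc->cpu_hard_affinity,
                cpupool_domain_cpumask(vc->domain));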

commit d9dec4151a2ae2708c4b71f9e78257e5c874e6eb
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date:   Mon Feb 20 16:00:20 2017 +0100

    x86/VT-x: Dump VMCS on VMLAUNCH/VMRESUME failure
    
    If a VMLAUNCH/VMRESUME fails due to invalid control or host state, dump the
    VMCS before crashing the domain.
    
    Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
    Acked-by: Kevin Tian <kevin.tian@intel.com>
    Reviewed-by: Jan Beulich <jbeulich@suse.com>
    master commit: d0fd9ae54491328b10dee4003656c14b3bf3d3e9
    master date: 2016-07-04 10:51:48 +0100
(qemu changes not included)



* Re: [xen-4.7-testing test] 105948: regressions - FAIL
  2017-02-21 23:45 [xen-4.7-testing test] 105948: regressions - FAIL osstest service owner
@ 2017-02-22  0:02 ` Andrew Cooper
  2017-02-22  8:46   ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Cooper @ 2017-02-22  0:02 UTC (permalink / raw)
  To: osstest service owner, xen-devel
  Cc: George Dunlap, Dario Faggioli, Jan Beulich

On 21/02/2017 23:45, osstest service owner wrote:
> flight 105948 xen-4.7-testing real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/105948/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-amd64-amd64-xl-credit2  17 guest-localmigrate/x10   fail REGR. vs. 105855

From
http://logs.test-lab.xenproject.org/osstest/logs/105948/test-amd64-amd64-xl-credit2/serial-nobling0.log
around Feb 21 20:32:01.481626

(XEN) csched2_vcpu_insert: Inserting d5v0
(XEN) csched2_vcpu_insert: Inserting d5v1
(XEN) csched2_vcpu_insert: Inserting d5v2
(XEN) csched2_vcpu_insert: Inserting d5v3
(XEN) Assertion 'd->cpupool != NULL' failed at
...5948.build-amd64/xen/xen/include/xen/sched-if.h:200
(XEN) ----[ Xen-4.7.2-pre  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    14
(XEN) RIP:    e008:[<ffff82d080126e70>]
sched_credit2.c#vcpu_is_migrateable+0x22/0x9a
(XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
(XEN) rax: ffff8304573bc000   rbx: ffff83047f114f10   rcx: ffff83007baca000
(XEN) rdx: 0000000000000000   rsi: ffff83023fed0458   rdi: ffff83047f114f10
(XEN) rbp: ffff830473fffd60   rsp: ffff830473fffd40   r8:  0000000014a138bf
(XEN) r9:  0000000014a538bf   r10: 0000000000021e54   r11: 0f0f0f0f0f0f0f0f
(XEN) r12: ffff83047f114190   r13: 000000000000000e   r14: ffff82d0802e66c0
(XEN) r15: ffff83023fed0000   cr0: 0000000080050033   cr4: 00000000003526e4
(XEN) cr3: 0000000457499000   cr2: ffff880002810288
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen code around <ffff82d080126e70>
(sched_credit2.c#vcpu_is_migrateable+0x22/0x9a):
(XEN)  8b 50 68 48 85 d2 75 02 <0f> 0b 49 89 f4 48 89 fb 48 8d 05 81 52
21 00 48
(XEN) Xen stack trace from rsp=ffff830473fffd40:
(XEN)    0000000000000006 ffff83047f114f10 ffff83047f114190 ffff83023fed0498
(XEN)    ffff830473fffe20 ffff82d080129763 ffff830473fffde0 ffff82d0802e66c0
(XEN)    ffff83023fed0000 ffff83023fed0000 ffff83040000000e ffff83040000000e
(XEN)    000000070000000e 000000528d5c4e92 ffff830473fffdc8 ffff830473fffe68
(XEN)    ffff82d080130110 0000000002a5a4fa 0000000000000000 0000000000000000
(XEN)    ffff83023fed0960 ffff83023fed0458 0000000000000002 ffff8300679fc000
(XEN)    ffff82d08033c140 000000528d5c4e92 ffff83023fed0964 000000000000000e
(XEN)    ffff830473fffeb0 ffff82d08012c17e a6671e8700000002 ffff83023ff70160
(XEN)    0000000e00fffe60 ffff83023ff70140 ffff830473fffe60 ffff82d0801301a1
(XEN)    ffff830473fffeb0 ffff82d0801330ad ffff830473fffef0 ffff82d0801bb4a2
(XEN)    00000010679fc000 ffff82d080313180 ffff82d080312a80 ffffffffffffffff
(XEN)    ffff830473ffffff ffff8304677f2000 ffff830473fffee0 ffff82d08012f8bd
(XEN)    ffff830473ffffff ffff83007bad0000 00000000ffffffff ffff83023fec2000
(XEN)    ffff830473fffef0 ffff82d08012f912 ffff830473ffff10 ffff82d080164b17
(XEN)    ffff82d08012f912 ffff8300679fc000 ffff830473fffdd8 0000000000000000
(XEN)    ffffffff81c01fd8 ffffffff81c01fd8 0000000000000000 ffffffff81c01e68
(XEN)    ffffffffffffffff 0000000000000246 000000528d4bace3 0000000000000000
(XEN)    0000000000000000 0000000000000000 ffffffff810013aa ffffffff81c319a0
(XEN)    00000000deadbeef 00000000deadbeef 0000010000000000 ffffffff810013aa
(XEN)    000000000000e033 0000000000000246 ffffffff81c01e50 000000000000e02b
(XEN) Xen call trace:
(XEN)    [<ffff82d080126e70>] sched_credit2.c#vcpu_is_migrateable+0x22/0x9a
(XEN)    [<ffff82d080129763>] sched_credit2.c#csched2_schedule+0x823/0xb4e
(XEN)    [<ffff82d08012c17e>] schedule.c#schedule+0x108/0x609
(XEN)    [<ffff82d08012f8bd>] softirq.c#__do_softirq+0x7f/0x8a
(XEN)    [<ffff82d08012f912>] do_softirq+0x13/0x15
(XEN)    [<ffff82d080164b17>] domain.c#idle_loop+0x55/0x62
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 14:
(XEN) Assertion 'd->cpupool != NULL' failed at
...5948.build-amd64/xen/xen/include/xen/sched-if.h:200
(XEN) ****************************************
(XEN)
(XEN) Manual reset required ('noreboot' specified)

I am guessing the most recent credit2 backports weren't quite so safe?

~Andrew


* Re: [xen-4.7-testing test] 105948: regressions - FAIL
  2017-02-22  0:02 ` Andrew Cooper
@ 2017-02-22  8:46   ` Jan Beulich
  2017-02-22  9:59     ` Dario Faggioli
                       ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Jan Beulich @ 2017-02-22  8:46 UTC (permalink / raw)
  To: Andrew Cooper, Dario Faggioli
  Cc: George Dunlap, xen-devel, osstest service owner

>>> On 22.02.17 at 01:02, <andrew.cooper3@citrix.com> wrote:
> On 21/02/2017 23:45, osstest service owner wrote:
>> flight 105948 xen-4.7-testing real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/105948/ 
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>>  test-amd64-amd64-xl-credit2  17 guest-localmigrate/x10   fail REGR. vs. 105855
> 
> From
> http://logs.test-lab.xenproject.org/osstest/logs/105948/test-amd64-amd64-xl-credit2/serial-nobling0.log
> around Feb 21 20:32:01.481626
> 
> (XEN) csched2_vcpu_insert: Inserting d5v0
> (XEN) csched2_vcpu_insert: Inserting d5v1
> (XEN) csched2_vcpu_insert: Inserting d5v2
> (XEN) csched2_vcpu_insert: Inserting d5v3
> (XEN) Assertion 'd->cpupool != NULL' failed at
> ...5948.build-amd64/xen/xen/include/xen/sched-if.h:200
> (XEN) ----[ Xen-4.7.2-pre  x86_64  debug=y  Not tainted ]----
> (XEN) CPU:    14
> (XEN) RIP:    e008:[<ffff82d080126e70>]
> sched_credit2.c#vcpu_is_migrateable+0x22/0x9a
> (XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
> (XEN) rax: ffff8304573bc000   rbx: ffff83047f114f10   rcx: ffff83007baca000
> (XEN) rdx: 0000000000000000   rsi: ffff83023fed0458   rdi: ffff83047f114f10
> (XEN) rbp: ffff830473fffd60   rsp: ffff830473fffd40   r8:  0000000014a138bf
> (XEN) r9:  0000000014a538bf   r10: 0000000000021e54   r11: 0f0f0f0f0f0f0f0f
> (XEN) r12: ffff83047f114190   r13: 000000000000000e   r14: ffff82d0802e66c0
> (XEN) r15: ffff83023fed0000   cr0: 0000000080050033   cr4: 00000000003526e4
> (XEN) cr3: 0000000457499000   cr2: ffff880002810288
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) Xen code around <ffff82d080126e70>
> (sched_credit2.c#vcpu_is_migrateable+0x22/0x9a):
> (XEN)  8b 50 68 48 85 d2 75 02 <0f> 0b 49 89 f4 48 89 fb 48 8d 05 81 52
> 21 00 48
> (XEN) Xen stack trace from rsp=ffff830473fffd40:
> (XEN)    0000000000000006 ffff83047f114f10 ffff83047f114190 ffff83023fed0498
> (XEN)    ffff830473fffe20 ffff82d080129763 ffff830473fffde0 ffff82d0802e66c0
> (XEN)    ffff83023fed0000 ffff83023fed0000 ffff83040000000e ffff83040000000e
> (XEN)    000000070000000e 000000528d5c4e92 ffff830473fffdc8 ffff830473fffe68
> (XEN)    ffff82d080130110 0000000002a5a4fa 0000000000000000 0000000000000000
> (XEN)    ffff83023fed0960 ffff83023fed0458 0000000000000002 ffff8300679fc000
> (XEN)    ffff82d08033c140 000000528d5c4e92 ffff83023fed0964 000000000000000e
> (XEN)    ffff830473fffeb0 ffff82d08012c17e a6671e8700000002 ffff83023ff70160
> (XEN)    0000000e00fffe60 ffff83023ff70140 ffff830473fffe60 ffff82d0801301a1
> (XEN)    ffff830473fffeb0 ffff82d0801330ad ffff830473fffef0 ffff82d0801bb4a2
> (XEN)    00000010679fc000 ffff82d080313180 ffff82d080312a80 ffffffffffffffff
> (XEN)    ffff830473ffffff ffff8304677f2000 ffff830473fffee0 ffff82d08012f8bd
> (XEN)    ffff830473ffffff ffff83007bad0000 00000000ffffffff ffff83023fec2000
> (XEN)    ffff830473fffef0 ffff82d08012f912 ffff830473ffff10 ffff82d080164b17
> (XEN)    ffff82d08012f912 ffff8300679fc000 ffff830473fffdd8 0000000000000000
> (XEN)    ffffffff81c01fd8 ffffffff81c01fd8 0000000000000000 ffffffff81c01e68
> (XEN)    ffffffffffffffff 0000000000000246 000000528d4bace3 0000000000000000
> (XEN)    0000000000000000 0000000000000000 ffffffff810013aa ffffffff81c319a0
> (XEN)    00000000deadbeef 00000000deadbeef 0000010000000000 ffffffff810013aa
> (XEN)    000000000000e033 0000000000000246 ffffffff81c01e50 000000000000e02b
> (XEN) Xen call trace:
> (XEN)    [<ffff82d080126e70>] sched_credit2.c#vcpu_is_migrateable+0x22/0x9a
> (XEN)    [<ffff82d080129763>] sched_credit2.c#csched2_schedule+0x823/0xb4e
> (XEN)    [<ffff82d08012c17e>] schedule.c#schedule+0x108/0x609
> (XEN)    [<ffff82d08012f8bd>] softirq.c#__do_softirq+0x7f/0x8a
> (XEN)    [<ffff82d08012f912>] do_softirq+0x13/0x15
> (XEN)    [<ffff82d080164b17>] domain.c#idle_loop+0x55/0x62
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 14:
> (XEN) Assertion 'd->cpupool != NULL' failed at
> ...5948.build-amd64/xen/xen/include/xen/sched-if.h:200
> (XEN) ****************************************
> (XEN)
> (XEN) Manual reset required ('noreboot' specified)
> 
> I am guessing the most recent credit2 backports weren't quite so safe?

Well, there was only one in the batch under test (and that is what
adds the cpupool_domain_cpumask() causing the ASSERT() above
to trigger). However, comparing with the staging version of the file
(which is heavily different), the immediate code involved here isn't
all that different, so I wonder whether (a) this is a problem on
staging too or (b) we're missing another backport. Dario?
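
For reference, the helper in question (xen/include/xen/sched-if.h) is
roughly the following, which is why a NULL d->cpupool trips the
assertion seen in the log above:

    static inline cpumask_t *cpupool_domain_cpumask(struct domain *d)
    {
        /*
         * d->cpupool is NULL only for the idle domain, which should
         * never be considered here.
         */
        ASSERT(d->cpupool != NULL);
        return d->cpupool->cpu_valid;
    }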

Jan


* Re: [xen-4.7-testing test] 105948: regressions - FAIL
  2017-02-22  8:46   ` Jan Beulich
@ 2017-02-22  9:59     ` Dario Faggioli
  2017-02-23 23:25     ` Dario Faggioli
  2017-02-24 16:14     ` RFC/PATCH: xen: race during domain destruction [Re: [xen-4.7-testing test] 105948: regressions - FAIL] Dario Faggioli
  2 siblings, 0 replies; 9+ messages in thread
From: Dario Faggioli @ 2017-02-22  9:59 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper
  Cc: George Dunlap, xen-devel, osstest service owner



On Wed, 2017-02-22 at 01:46 -0700, Jan Beulich wrote:
> > > > On 22.02.17 at 01:02, <andrew.cooper3@citrix.com> wrote:
> > (XEN) Xen call trace:
> > (XEN)    [<ffff82d080126e70>]
> > sched_credit2.c#vcpu_is_migrateable+0x22/0x9a
> > (XEN)    [<ffff82d080129763>]
> > sched_credit2.c#csched2_schedule+0x823/0xb4e
> > (XEN)    [<ffff82d08012c17e>] schedule.c#schedule+0x108/0x609
> > (XEN)    [<ffff82d08012f8bd>] softirq.c#__do_softirq+0x7f/0x8a
> > (XEN)    [<ffff82d08012f912>] do_softirq+0x13/0x15
> > (XEN)    [<ffff82d080164b17>] domain.c#idle_loop+0x55/0x62
> > (XEN)
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 14:
> > (XEN) Assertion 'd->cpupool != NULL' failed at
> > ...5948.build-amd64/xen/xen/include/xen/sched-if.h:200
> > (XEN) ****************************************
> > (XEN)
> > (XEN) Manual reset required ('noreboot' specified)
> > 
> > I am guessing the most recent credit2 backports weren't quite so
> > safe?
> 
> Well, there was only one in the batch under test (and that is what
> adds the cpupool_domain_cpumask() causing the ASSERT() above
> to trigger). However, comparing with the staging version of the file
> (which is heavily different), the immediate code involved here isn't
> all that different, so I wonder whether (a) this is a problem on
> staging too or (b) we're missing another backport.
>
Yeah, I also wonder which of these two situations we are in. Staging
looks fine, as far as my testing goes, and also according, e.g., to:

http://logs.test-lab.xenproject.org/osstest/logs/105946/test-amd64-amd64-xl-credit2/info.html

Or:

http://logs.test-lab.xenproject.org/osstest/logs/105900/test-amd64-amd64-xl-credit2/info.html

There appears to have been a problem in a different test step, though,
here:
http://logs.test-lab.xenproject.org/osstest/logs/105919/test-amd64-amd64-xl-credit2/info.html

I noticed it yesterday afternoon and am currently looking into it,
because I don't see what has actually gone wrong.

Also, looking at the history of Credit2 tests in 4.7-testing:
http://logs.test-lab.xenproject.org/osstest/results/history/test-amd64-amd64-xl-credit2/xen-4.7-testing

Things were fine on 2017-02-16, and started failing on 2017-02-20.

>  Dario?
> 
I will investigate.

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: [xen-4.7-testing test] 105948: regressions - FAIL
  2017-02-22  8:46   ` Jan Beulich
  2017-02-22  9:59     ` Dario Faggioli
@ 2017-02-23 23:25     ` Dario Faggioli
  2017-02-24 16:14     ` RFC/PATCH: xen: race during domain destruction [Re: [xen-4.7-testing test] 105948: regressions - FAIL] Dario Faggioli
  2 siblings, 0 replies; 9+ messages in thread
From: Dario Faggioli @ 2017-02-23 23:25 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper
  Cc: George Dunlap, xen-devel, osstest service owner



On Wed, 2017-02-22 at 01:46 -0700, Jan Beulich wrote:
> > > > On 22.02.17 at 01:02, <andrew.cooper3@citrix.com> wrote:
> > (XEN) ****************************************
> > (XEN) Panic on CPU 14:
> > (XEN) Assertion 'd->cpupool != NULL' failed at
> > ...5948.build-amd64/xen/xen/include/xen/sched-if.h:200
> > (XEN) ****************************************
> > (XEN)
> > (XEN) Manual reset required ('noreboot' specified)
> > 
> > I am guessing the most recent credit2 backports weren't quite so
> > safe?
> 
> Well, there was only one in the batch under test (and that is what
> adds the cpupool_domain_cpumask() causing the ASSERT() above
> to trigger). However, comparing with the staging version of the file
> (which is heavily different), the immediate code involved here isn't
> all that different, so I wonder whether (a) this is a problem on
> staging too or (b) we're missing another backport. Dario?
>
Sorry I'm a bit late. But I wasn't feeling too well today, so I
couldn't work much.

In any case, I managed to reproduce this (with staging-4.7) on my
testbox, and I think I've understood what it is (Credit2's load
balancer is considering the original domain, which is in the process of
being destroyed, and hence already has d->cpupool==NULL).

I should be able to put together a patch quickly tomorrow.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* RFC/PATCH: xen: race during domain destruction [Re: [xen-4.7-testing test] 105948: regressions - FAIL]
  2017-02-22  8:46   ` Jan Beulich
  2017-02-22  9:59     ` Dario Faggioli
  2017-02-23 23:25     ` Dario Faggioli
@ 2017-02-24 16:14     ` Dario Faggioli
  2017-02-26 15:53       ` Dario Faggioli
  2 siblings, 1 reply; 9+ messages in thread
From: Dario Faggioli @ 2017-02-24 16:14 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper
  Cc: George Dunlap, xen-devel, osstest service owner, Juergen Gross



[Adding Juergen]

On Wed, 2017-02-22 at 01:46 -0700, Jan Beulich wrote:
> > > > On 22.02.17 at 01:02, <andrew.cooper3@citrix.com> wrote:
> > (XEN) Xen call trace:
> > (XEN)    [<ffff82d080126e70>]
> > sched_credit2.c#vcpu_is_migrateable+0x22/0x9a
> > (XEN)    [<ffff82d080129763>]
> > sched_credit2.c#csched2_schedule+0x823/0xb4e
> > (XEN)    [<ffff82d08012c17e>] schedule.c#schedule+0x108/0x609
> > (XEN)    [<ffff82d08012f8bd>] softirq.c#__do_softirq+0x7f/0x8a
> > (XEN)    [<ffff82d08012f912>] do_softirq+0x13/0x15
> > (XEN)    [<ffff82d080164b17>] domain.c#idle_loop+0x55/0x62
> > (XEN)
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 14:
> > (XEN) Assertion 'd->cpupool != NULL' failed at
> > ...5948.build-amd64/xen/xen/include/xen/sched-if.h:200
> > (XEN) ****************************************
> > (XEN)
> > (XEN) Manual reset required ('noreboot' specified)
> > 
> > I am guessing the most recent credit2 backports weren't quite so
> > safe?
> 
Well, what I'd say we're facing is the surfacing of a latent bug.

> However, comparing with the staging version of the file
> (which is heavily different), the immediate code involved here isn't
> all that different, so I wonder whether (a) this is a problem on
> staging too or (b) we're missing another backport. Dario?
> 
So, according to my investigation, this is a genuine race. It affects
this branch as well as staging, but it manifests less frequently (or, I
should say, very rarely) in the latter.

The problem is that the Credit2's load balancer operates not only on
runnable vCPUs, but also on blocked, sleeping, and paused ones (and
that's by design).

In this case, the original domain is in the process of being destroyed
after migration has completed, and reaches the point where, within
domain_destroy(), we call cpupool_rm_domain(). This removes the domain
from any cpupool and sets d->cpupool = NULL.
Then, on another pCPU -- since the vCPUs of the domain are still around
(until we call sched_destroy_vcpu(), which happens much later) and are
still assigned to a Credit2 runqueue -- balance_load() picks one of
them for moving to another runqueue, and things explode when we realize
that the vCPU is actually out of any pool!

So, I've thought quite a bit about how to solve this. Possibilities are
to act at the Credit2 level, or outside of it.

I drafted a couple of solutions affecting only sched_credit2.c, but was
not satisfied with the results. That's because I ultimately think it
should be safe for a scheduler to play with any vCPU it can reach, and
that means the vCPU must be in a pool.

And that's why I came up with the patch below.

This is a draft and is on top of staging-4.7. I will properly submit it
against staging, if you agree with me it's an ok thing to do.

Basically, I move the call to sched_destroy_vcpu() a little earlier, so
that it happens before cpupool_rm_domain(). This ensures that vCPUs have
valid cpupool information until the very last moment they are accessible
from a scheduler.

---
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 45273d4..4db7750 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -643,7 +643,10 @@ int domain_kill(struct domain *d)
         if ( cpupool_move_domain(d, cpupool0) )
             return -ERESTART;
         for_each_vcpu ( d, v )
+        {
             unmap_vcpu_info(v);
+            sched_destroy_vcpu(v);
+        }
         d->is_dying = DOMDYING_dead;
         /* Mem event cleanup has to go here because the rings 
          * have to be put before we call put_domain. */
@@ -807,7 +810,6 @@ static void complete_domain_destroy(struct rcu_head *head)
             continue;
         tasklet_kill(&v->continue_hypercall_tasklet);
         vcpu_destroy(v);
-        sched_destroy_vcpu(v);
         destroy_waitqueue_vcpu(v);
     }
---

Let me know.

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: RFC/PATCH: xen: race during domain destruction [Re: [xen-4.7-testing test] 105948: regressions - FAIL]
  2017-02-24 16:14     ` RFC/PATCH: xen: race during domain destruction [Re: [xen-4.7-testing test] 105948: regressions - FAIL] Dario Faggioli
@ 2017-02-26 15:53       ` Dario Faggioli
  2017-02-27 15:18         ` Dario Faggioli
  0 siblings, 1 reply; 9+ messages in thread
From: Dario Faggioli @ 2017-02-26 15:53 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper
  Cc: George Dunlap, xen-devel, osstest service owner, Juergen Gross



On Fri, 2017-02-24 at 17:14 +0100, Dario Faggioli wrote:
> On Wed, 2017-02-22 at 01:46 -0700, Jan Beulich wrote:
> > However, comparing with the staging version of the file
> > (which is heavily different), the immediate code involved here
> > isn't
> > all that different, so I wonder whether (a) this is a problem on
> > staging too or (b) we're missing another backport. Dario?
> > 
> So, according to my investigation, this is a genuine race. It affects
> this branch as well as staging, but it manifests less frequently (or,
> I
> should say, very rarely) in the latter.
> 
Actually, this is probably wrong. It looks like the following commit:

 f3d47501db2b7bb8dfd6a3c9710b7aff4b1fc55b
 xen: fix a (latent) cpupool-related race during domain destroy

is not in staging-4.7.

At some point, while investigating, I thought I had seen it there, but
I was wrong!

So, I'd say that the proper solution is to backport that change, and
ignore the drafted patch I sent before.

In any case, I'll try doing the backport myself and test the result on
Monday (tomorrow). And I will let you know.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: RFC/PATCH: xen: race during domain destruction [Re: [xen-4.7-testing test] 105948: regressions - FAIL]
  2017-02-26 15:53       ` Dario Faggioli
@ 2017-02-27 15:18         ` Dario Faggioli
  2017-02-28  9:48           ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Dario Faggioli @ 2017-02-27 15:18 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, xen-devel, Andrew Cooper, osstest service owner,
	Juergen Gross



On Sun, 2017-02-26 at 16:53 +0100, Dario Faggioli wrote:
> On Fri, 2017-02-24 at 17:14 +0100, Dario Faggioli wrote:
> > On Wed, 2017-02-22 at 01:46 -0700, Jan Beulich wrote:
> > > 
> > > However, comparing with the staging version of the file
> > > (which is heavily different), the immediate code involved here
> > > isn't
> > > all that different, so I wonder whether (a) this is a problem on
> > > staging too or (b) we're missing another backport. Dario?
> > > 
> > So, according to my investigation, this is a genuine race. It
> > affects
> > this branch as well as staging, but it manifests less frequently
> > (or,
> > I
> > should say, very rarely) in the latter.
> > 
> Actually, this is probably wrong. It looks like the following commit:
> 
>  f3d47501db2b7bb8dfd6a3c9710b7aff4b1fc55b
>  xen: fix a (latent) cpupool-related race during domain destroy
> 
> is not in staging-4.7.
> 
And my testing confirms that backporting the changeset above (which
applies cleanly on staging-4.7, AFAICT) makes the problem go away.

As the changelog of that commit says, I had even seen something similar
happening during my development... Sorry for not recognising it sooner,
and for failing to request a backport of that change in the first
place.

I'm therefore doing that now: I ask for backport of:

 f3d47501db2b7bb8dfd6a3c9710b7aff4b1fc55b
 xen: fix a (latent) cpupool-related race during domain destroy

to 4.7.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: RFC/PATCH: xen: race during domain destruction [Re: [xen-4.7-testing test] 105948: regressions - FAIL]
  2017-02-27 15:18         ` Dario Faggioli
@ 2017-02-28  9:48           ` Jan Beulich
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2017-02-28  9:48 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: George Dunlap, Andrew Cooper, osstest service owner,
	Juergen Gross, xen-devel

>>> On 27.02.17 at 16:18, <dario.faggioli@citrix.com> wrote:
> I'm therefore doing that now: I ask for backport of:
> 
>  f3d47501db2b7bb8dfd6a3c9710b7aff4b1fc55b
>  xen: fix a (latent) cpupool-related race during domain destroy
> 
> to 4.7.

Thanks for working this out! Applied to 4.7-staging.

Jan



end of thread  [newest message: 2017-02-28  9:48 UTC]

Thread overview: 9+ messages
2017-02-21 23:45 [xen-4.7-testing test] 105948: regressions - FAIL osstest service owner
2017-02-22  0:02 ` Andrew Cooper
2017-02-22  8:46   ` Jan Beulich
2017-02-22  9:59     ` Dario Faggioli
2017-02-23 23:25     ` Dario Faggioli
2017-02-24 16:14     ` RFC/PATCH: xen: race during domain destruction [Re: [xen-4.7-testing test] 105948: regressions - FAIL] Dario Faggioli
2017-02-26 15:53       ` Dario Faggioli
2017-02-27 15:18         ` Dario Faggioli
2017-02-28  9:48           ` Jan Beulich
