* [xen-unstable test] 56759: regressions - FAIL
@ 2015-05-20  9:34 osstest service user
  2015-05-20  9:56 ` Ian Campbell
  0 siblings, 1 reply; 16+ messages in thread
From: osstest service user @ 2015-05-20  9:34 UTC
  To: xen-devel; +Cc: ian.jackson

flight 56759 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/56759/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-xl-multivcpu 17 leak-check/check         fail REGR. vs. 56375

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-freebsd10-amd64 13 guest-localmigrate     fail REGR. vs. 56375
 test-amd64-amd64-libvirt     11 guest-start               fail REGR. vs. 56375
 test-amd64-i386-freebsd10-i386 16 guest-localmigrate/x10 fail blocked in 56375
 test-armhf-armhf-libvirt     11 guest-start                  fail   like 56375
 test-amd64-i386-libvirt      11 guest-start                  fail   like 56375
 test-amd64-amd64-rumpuserxen-amd64 15 rumpuserxen-demo-xenstorels/xenstorels.repeat fail like 56544-bisect

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm 9 debian-hvm-install fail never pass
 test-amd64-amd64-xl-xsm      11 guest-start                  fail   never pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm 9 debian-hvm-install fail never pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 9 debian-hvm-install fail never pass
 test-amd64-i386-xl-xsm       11 guest-start                  fail   never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start                  fail  never pass
 test-amd64-amd64-libvirt-xsm 11 guest-start                  fail   never pass
 test-amd64-i386-libvirt-xsm  11 guest-start                  fail   never pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 9 debian-hvm-install fail never pass
 test-armhf-armhf-libvirt-xsm 11 guest-start                  fail   never pass
 test-amd64-amd64-xl-pvh-amd  11 guest-start                  fail   never pass
 test-armhf-armhf-xl-xsm      11 guest-start                  fail   never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-check        fail   never pass
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop              fail never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-check        fail never pass
 test-armhf-armhf-xl-sedf-pin 12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          12 migrate-support-check        fail   never pass
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop              fail never pass
 test-armhf-armhf-xl-sedf     12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-check        fail  never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop             fail never pass

version targeted for testing:
 xen                  1037e33c88bb0e1fe530c164f242df17030102e1
baseline version:
 xen                  e13013dbf1d5997915548a3b5f1c39594d8c1d7b

------------------------------------------------------------
People who touched revisions under test:
  David Vrabel <david.vrabel@citrix.com>
  George Dunlap <george.dunlap@eu.citrix.com>
  Ian Campbell <ian.campbell@citrix.com>
  Jan Beulich <jbeulich@suse.com>
  Julien Grall <julien.grall@citrix.com> (ARM)
  Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  Roger Pau Monné <roger.pau@citrix.com>
------------------------------------------------------------

jobs:
 build-amd64-xsm                                              pass
 build-armhf-xsm                                              pass
 build-i386-xsm                                               pass
 build-amd64                                                  pass
 build-armhf                                                  pass
 build-i386                                                   pass
 build-amd64-libvirt                                          pass
 build-armhf-libvirt                                          pass
 build-i386-libvirt                                           pass
 build-amd64-oldkern                                          pass
 build-i386-oldkern                                           pass
 build-amd64-pvops                                            pass
 build-armhf-pvops                                            pass
 build-i386-pvops                                             pass
 build-amd64-rumpuserxen                                      pass
 build-i386-rumpuserxen                                       pass
 test-amd64-amd64-xl                                          pass
 test-armhf-armhf-xl                                          pass
 test-amd64-i386-xl                                           pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm                fail
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm                 fail
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm                fail
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm                 fail
 test-amd64-amd64-libvirt-xsm                                 fail
 test-armhf-armhf-libvirt-xsm                                 fail
 test-amd64-i386-libvirt-xsm                                  fail
 test-amd64-amd64-xl-xsm                                      fail
 test-armhf-armhf-xl-xsm                                      fail
 test-amd64-i386-xl-xsm                                       fail
 test-amd64-amd64-xl-pvh-amd                                  fail
 test-amd64-i386-qemut-rhel6hvm-amd                           pass
 test-amd64-i386-qemuu-rhel6hvm-amd                           pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64                    pass
 test-amd64-i386-xl-qemut-debianhvm-amd64                     pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64                    pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64                     pass
 test-amd64-i386-freebsd10-amd64                              fail
 test-amd64-amd64-xl-qemuu-ovmf-amd64                         pass
 test-amd64-i386-xl-qemuu-ovmf-amd64                          pass
 test-amd64-amd64-rumpuserxen-amd64                           fail
 test-amd64-amd64-xl-qemut-win7-amd64                         pass
 test-amd64-i386-xl-qemut-win7-amd64                          fail
 test-amd64-amd64-xl-qemuu-win7-amd64                         fail
 test-amd64-i386-xl-qemuu-win7-amd64                          fail
 test-armhf-armhf-xl-arndale                                  pass
 test-amd64-amd64-xl-credit2                                  pass
 test-armhf-armhf-xl-credit2                                  pass
 test-armhf-armhf-xl-cubietruck                               pass
 test-amd64-i386-freebsd10-i386                               fail
 test-amd64-i386-rumpuserxen-i386                             pass
 test-amd64-amd64-xl-pvh-intel                                fail
 test-amd64-i386-qemut-rhel6hvm-intel                         pass
 test-amd64-i386-qemuu-rhel6hvm-intel                         pass
 test-amd64-amd64-libvirt                                     fail
 test-armhf-armhf-libvirt                                     fail
 test-amd64-i386-libvirt                                      fail
 test-amd64-amd64-xl-multivcpu                                pass
 test-armhf-armhf-xl-multivcpu                                fail
 test-amd64-amd64-pair                                        pass
 test-amd64-i386-pair                                         pass
 test-amd64-amd64-xl-sedf-pin                                 pass
 test-armhf-armhf-xl-sedf-pin                                 pass
 test-amd64-amd64-xl-sedf                                     pass
 test-armhf-armhf-xl-sedf                                     pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1                     pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1                     pass
 test-amd64-amd64-xl-qemut-winxpsp3                           pass
 test-amd64-i386-xl-qemut-winxpsp3                            pass
 test-amd64-amd64-xl-qemuu-winxpsp3                           pass
 test-amd64-i386-xl-qemuu-winxpsp3                            pass


------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
    http://logs.test-lab.xenproject.org/osstest/logs

Test harness code can be found at
    http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

------------------------------------------------------------
commit 1037e33c88bb0e1fe530c164f242df17030102e1
Author: David Vrabel <david.vrabel@citrix.com>
Date:   Tue May 19 15:49:22 2015 +0200

    spinlock: fix build with older GCC

    Older GCC versions such as 4.3 cannot have initializers for the
    members of anonymous structures, so initialize .head_tail instead.

    Use a SPINLOCK_TICKET_INC define so this initializer is near the
    spinlock_tickets_t definition (in case the structure changes requiring
    changes to the initializer).

    Signed-off-by: David Vrabel <david.vrabel@citrix.com>
    Reported-and-tested-by: Jan Beulich <jbeulich@suse.com>

commit db83975f0fcd30370392ed288a7bd2420624ed4e
Author: Jan Beulich <jbeulich@suse.com>
Date:   Tue May 19 11:35:30 2015 +0200

    x86/EFI: keep EFI runtime services top level page tables up-to-date

    Updates to idle_pg_table[] need to be mirrored into the page tables
    used for invoking EFI runtime services.

    Signed-off-by: Jan Beulich <jbeulich@suse.com>
    Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

commit 0d7a599afff0665c74f328f6af85e556688d7908
Author: Jan Beulich <jbeulich@suse.com>
Date:   Mon May 18 12:34:44 2015 +0200

    Revert "x86: rework paging_log_dirty_op to work with hvm guests"

    This reverts commit a809eeea06d20b115d78f12e473502bcb6209844, as it
    breaks PV log dirty mode handling.

commit 08c902a39f5f7aa0e1d5fe664b2b8db458d4fb73
Author: Jan Beulich <jbeulich@suse.com>
Date:   Mon May 18 12:11:31 2015 +0200

    x86emul: also put_fpu() on error paths

    fail_if() and generate_exception_if() could theoretically bypass the
    normal flow reaching put_fpu(), and not invoking it would leave the
    fpu_exception_callback pointer in place, allowing for the callback to
    be called at an unexpected time. Luckily the two
    generate_exception_if()-s that would actually trigger this are
    currently commented out, so this is not (yet) a (security) issue.

    Signed-off-by: Jan Beulich <jbeulich@suse.com>
    Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

commit e4ad2836842ac114e7791963d56ebd02dd4c384f
Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date:   Fri May 15 16:12:22 2015 -0400

    xentrace: Implement cpu mask range parsing of human values (-c).

    Instead of just using -c 0x<some hex value> we can
    also use: -c <starting cpu>-<end cpu>, -c <cpu1>,<cpu2>, or a
    combination of them, or 'all' for all cpus.

    This new format can include just singular CPUs: -c <cpu1>,
    or ranges without a start or end (and xentrace will figure out
    the values), such as: -c -<cpu2> (which will include cpu0, cpu1,
    and cpu2) or -c <cpu2>- (which will include cpu2 and up to MAX_CPUS).

    That should make it easier to trace the right CPU if
    using this along with 'xl vcpu-list'.

    The code has been lifted from the Linux kernel, see file
    lib/bitmap.c, function __bitmap_parselist.

    To make the old behavior and the new function work, we check
    to see if the arguments have '0x' in them. If they do
    we use the old style parsing (limited to 32 CPUs). If that
    does not exist we use the new parsing.

    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
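
As a rough illustration of the -c formats described in the commit message above, the following Python sketch models the parsing rules. It is not the xentrace code (which is C, lifted from lib/bitmap.c in Linux). The function name parse_cpu_mask and the max_cpus parameter are assumptions made for this example, and treating an open-ended range as ending at max_cpus - 1 is a guess at what "up to MAX_CPUS" means.

```python
def parse_cpu_mask(arg, max_cpus=64):
    """Return the set of CPU numbers selected by a -c style argument.

    Hypothetical model of the rules in the commit message, not xentrace itself.
    """
    if arg == "all":
        return set(range(max_cpus))

    if "0x" in arg:
        # Old style: a hex bitmask, limited to 32 CPUs.
        mask = int(arg, 16) & 0xFFFFFFFF
        return {cpu for cpu in range(min(32, max_cpus)) if mask & (1 << cpu)}

    cpus = set()
    for part in arg.split(","):
        if "-" in part:
            start, end = part.split("-", 1)
            # A missing start means CPU 0; a missing end means the last CPU.
            first = int(start) if start else 0
            last = int(end) if end else max_cpus - 1
            if first > last:
                raise ValueError("bad range: " + part)
            cpus.update(range(first, last + 1))
        else:
            cpus.add(int(part))

    if any(cpu >= max_cpus for cpu in cpus):
        raise ValueError("CPU number out of range in: " + arg)
    return cpus


print(sorted(parse_cpu_mask("1-3,6")))
print(sorted(parse_cpu_mask("-2")))
```

Checking for "0x" anywhere in the argument mirrors the commit message's description of how the old and new styles are told apart.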

commit a809eeea06d20b115d78f12e473502bcb6209844
Author: Roger Pau Monné <roger.pau@citrix.com>
Date:   Fri May 15 10:08:33 2015 +0200

    x86: rework paging_log_dirty_op to work with hvm guests

    When the caller of paging_log_dirty_op is a hvm guest Xen would choke when
    trying to copy the dirty bitmap to the guest because the paging lock is
    already held.

    Fix this by independently mapping each page of the guest bitmap as needed
    without the paging lock held.

    Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
    Reviewed-by: Tim Deegan <tim@xen.org>

commit acc0899ef41e763c665c542beca6809049fac11c
Author: Roger Pau Monné <roger.pau@citrix.com>
Date:   Fri May 15 10:07:50 2015 +0200

    x86/hap: make hap_track_dirty_vram use non-contiguous memory for temporary map

    Just like it's done for shadow_track_dirty_vram allocate the temporary
    buffer using non-contiguous memory.

    Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
    Reviewed-by: Tim Deegan <tim@xen.org>

commit bd1b4a71b325933a08099676515a7cc8235d7144
Author: Roger Pau Monné <roger.pau@citrix.com>
Date:   Fri May 15 10:07:20 2015 +0200

    x86/shadow: fix shadow_track_dirty_vram to work on hvm guests

    Modify shadow_track_dirty_vram to use a local buffer and then flush to the
    guest without the paging_lock held. This is modeled after
    hap_track_dirty_vram.

    Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
    Reviewed-by: Tim Deegan <tim@xen.org>

commit f278fcf19ce15f7b7ee69181560b5884a5e12b66
Author: Roger Pau Monné <roger.pau@citrix.com>
Date:   Fri May 15 10:06:04 2015 +0200

    introduce a helper to allocate non-contiguous memory

    The allocator uses independent calls to alloc_domheap_pages in order to get
    the desired amount of memory and then maps all the independent physical
    addresses into a contiguous virtual address space.

    Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
    Tested-by: Julien Grall <julien.grall@citrix.com> (ARM)
    Reviewed-by: Tim Deegan <tim@xen.org>
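
The idea in the commit message above (allocate pages one at a time, then map them into one contiguous virtual range) can be sketched as a toy model in Python. This is only an illustration under stated assumptions: the names alloc_noncontiguous_model and translate, the free-frame list, and the 4096-byte page size are all hypothetical and do not come from the Xen source.

```python
PAGE_SIZE = 4096  # assumed page size for the model


def alloc_noncontiguous_model(free_frames, nr_pages, first_virt_page):
    """Take nr_pages frames one at a time and map them to consecutive
    virtual page numbers. Returns {virtual page: physical frame}."""
    if nr_pages > len(free_frames):
        raise MemoryError("not enough free frames")
    mapping = {}
    for i in range(nr_pages):
        # Each pop stands in for one independent single-page allocation,
        # so the frames need not be physically adjacent.
        frame = free_frames.pop()
        mapping[first_virt_page + i] = frame
    return mapping


def translate(mapping, virt_addr):
    """Translate a virtual address through the model's page mapping."""
    page, offset = divmod(virt_addr, PAGE_SIZE)
    return mapping[page] * PAGE_SIZE + offset


free = [7, 42, 3, 19]
m = alloc_noncontiguous_model(free, 3, first_virt_page=100)
print(m)
```

The virtual pages 100, 101 and 102 are adjacent even though the frames behind them are scattered, which is the property the commit message describes.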

commit e62e49e6d5d4e8d22f3df0b75443ede65a812435
Author: David Vrabel <david.vrabel@citrix.com>
Date:   Fri May 15 09:52:25 2015 +0200

    x86,arm: remove asm/spinlock.h from all architectures

    Now that all architectures use a common ticket lock implementation for
    spinlocks, remove the architecture specific byte lock implementations.

    Signed-off-by: David Vrabel <david.vrabel@citrix.com>
    Reviewed-by: Tim Deegan <tim@xen.org>
    Acked-by: Jan Beulich <jbeulich@suse.com>
    Acked-by: Ian Campbell <ian.campbell@citrix.com>

commit 45fcc4568c5162b00fb3907fb158af82dd484a3d
Author: David Vrabel <david.vrabel@citrix.com>
Date:   Fri May 15 09:49:12 2015 +0200

    use ticket locks for spin locks

    Replace the byte locks with ticket locks.  Ticket locks are: a) fair;
    and b) perform better when contended since they spin without an atomic
    operation.

    The lock is split into two ticket values: head and tail.  A locker
    acquires a ticket by (atomically) increasing tail and using the
    previous tail value.  A CPU holds the lock if its ticket == head.  The
    lock is released by increasing head.

    spin_lock_irq() and spin_lock_irqsave() now spin with irqs disabled
    (previously, they would spin with irqs enabled if possible).  This is
    required to prevent deadlocks when the irq handler tries to take the
    same lock with a higher ticket.

    Architectures need only provide arch_fetch_and_add() and two barriers:
    arch_lock_acquire_barrier() and arch_lock_release_barrier().

    Signed-off-by: David Vrabel <david.vrabel@citrix.com>
    Reviewed-by: Tim Deegan <tim@xen.org>
    Reviewed-by: Jan Beulich <jbeulich@suse.com>
(qemu changes not included)
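
The head/tail scheme described in the "use ticket locks for spin locks" commit message can be modelled in a few lines of Python. This is a sketch for illustration only, not the Xen implementation: the class name TicketLock and the helper run_demo are made up here, and a threading.Lock stands in for the atomic fetch-and-add that the commit message says each architecture provides via arch_fetch_and_add().

```python
import threading
import time


class TicketLock:
    """Toy ticket lock: tail hands out tickets, head says whose turn it is."""

    def __init__(self):
        self.head = 0
        self.tail = 0
        # Assumption: this mutex only emulates an atomic fetch-and-add.
        self._atomic = threading.Lock()

    def _fetch_and_add_tail(self):
        with self._atomic:
            ticket = self.tail
            self.tail += 1
            return ticket

    def acquire(self):
        ticket = self._fetch_and_add_tail()
        # The wait loop only reads head; no atomic operation while spinning.
        while self.head != ticket:
            time.sleep(0)  # yield so the current holder can make progress
        return ticket

    def release(self):
        # Only the holder writes head, so a plain increment is enough here.
        self.head += 1


def run_demo(n_threads=4, iterations=50):
    lock = TicketLock()
    state = {"counter": 0}
    served = []  # tickets in the order they entered the critical section

    def worker():
        for _ in range(iterations):
            ticket = lock.acquire()
            served.append(ticket)
            state["counter"] += 1
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"], served


counter, served = run_demo()
print(counter, served == sorted(served))
```

Tickets are served strictly in the order they were taken, which is the fairness property the commit message refers to.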



* Re: [xen-unstable test] 56759: regressions - FAIL
  2015-05-20  9:34 [xen-unstable test] 56759: regressions - FAIL osstest service user
@ 2015-05-20  9:56 ` Ian Campbell
  2015-05-26  9:11   ` Julien Grall
  2015-05-26 13:29   ` Ian Campbell
  0 siblings, 2 replies; 16+ messages in thread
From: Ian Campbell @ 2015-05-20  9:56 UTC
  To: xen-devel, ian.jackson
  Cc: Julien Grall, Tim Deegan, David Vrabel, Stefano Stabellini

On Wed, 2015-05-20 at 09:34 +0000, osstest service user wrote:
> flight 56759 xen-unstable real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/56759/
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-armhf-armhf-xl-multivcpu 17 leak-check/check         fail REGR. vs. 56375

I'm pretty hard pressed to explain this from the set of commits
currently under test, but it has happened a few times now (e.g. 56700
56576) so it does seem to be real.

http://logs.test-lab.xenproject.org/osstest/results/bisect.xen-unstable.test-armhf-armhf-xl-multivcpu.leak-check--check.html
is working on it and is currently considering the set of changes from:
ianc@cosworth:xen.git$ git log --oneline 9ab42~1...45fcc4
45fcc45 use ticket locks for spin locks
e13013d libxc/restore: add checkpointed flag to the restore context
ce44b40 libxc/restore: introduce setup() and cleanup() on restore
c5c5a04 libxc/restore: split read/handle qemu info
9ab42c9 libxc/restore: introduce process_record()

where e13013d is current master which was pushed in by flight 56375.

I think it unlikely the libxl stuff is responsible, given we don't
migrate on ARM, which would seem to point to the ticket locks...

Ian.


* Re: [xen-unstable test] 56759: regressions - FAIL
  2015-05-20  9:56 ` Ian Campbell
@ 2015-05-26  9:11   ` Julien Grall
  2015-05-26  9:17     ` Ian Campbell
  2015-05-26 13:29   ` Ian Campbell
  1 sibling, 1 reply; 16+ messages in thread
From: Julien Grall @ 2015-05-26  9:11 UTC
  To: Ian Campbell, xen-devel, ian.jackson
  Cc: Julien Grall, Tim Deegan, David Vrabel, Stefano Stabellini

Hi,

On 20/05/2015 11:56, Ian Campbell wrote:
> On Wed, 2015-05-20 at 09:34 +0000, osstest service user wrote:
>> flight 56759 xen-unstable real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/56759/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>>   test-armhf-armhf-xl-multivcpu 17 leak-check/check         fail REGR. vs. 56375
>
> I'm pretty hard pressed to explain this from the set of commits
> currently under test, but it has happened a few times now (e.g. 56700
> 56576) so it does seem to be real.
>
> http://logs.test-lab.xenproject.org/osstest/results/bisect.xen-unstable.test-armhf-armhf-xl-multivcpu.leak-check--check.html
> is working on it and is currently considering the set of changes from:
> ianc@cosworth:xen.git$ git log --oneline 9ab42~1...45fcc4
> 45fcc45 use ticket locks for spin locks
> e13013d libxc/restore: add checkpointed flag to the restore context
> ce44b40 libxc/restore: introduce setup() and cleanup() on restore
> c5c5a04 libxc/restore: split read/handle qemu info
> 9ab42c9 libxc/restore: introduce process_record()
>
> where e13013d is current master which was pushed in by flight 56375.
>
> I think it unlikely the libxl stuff is responsible, given we don't
> migrate on ARM, which would seem to point to the ticket locks...

The test is still failing on the latest flight [1]. Any update on this 
issue?

Regards,

[1] http://logs.test-lab.xenproject.org/osstest/logs/57271/

-- 
Julien Grall


* Re: [xen-unstable test] 56759: regressions - FAIL
  2015-05-26  9:11   ` Julien Grall
@ 2015-05-26  9:17     ` Ian Campbell
  2015-05-26  9:22       ` Julien Grall
  0 siblings, 1 reply; 16+ messages in thread
From: Ian Campbell @ 2015-05-26  9:17 UTC
  To: Julien Grall
  Cc: Tim Deegan, xen-devel, ian.jackson, David Vrabel, Stefano Stabellini

On Tue, 2015-05-26 at 11:11 +0200, Julien Grall wrote:
> Hi,
> 
> On 20/05/2015 11:56, Ian Campbell wrote:
> > On Wed, 2015-05-20 at 09:34 +0000, osstest service user wrote:
> >> flight 56759 xen-unstable real [real]
> >> http://logs.test-lab.xenproject.org/osstest/logs/56759/
> >>
> >> Regressions :-(
> >>
> >> Tests which did not succeed and are blocking,
> >> including tests which could not be run:
> >>   test-armhf-armhf-xl-multivcpu 17 leak-check/check         fail REGR. vs. 56375
> >
> > I'm pretty hard pressed to explain this from the set of commits
> > currently under test, but it has happened a few times now (e.g. 56700
> > 56576) so it does seem to be real.
> >
> > http://logs.test-lab.xenproject.org/osstest/results/bisect.xen-unstable.test-armhf-armhf-xl-multivcpu.leak-check--check.html
> > is working on it and is currently considering the set of changes from:
> > ianc@cosworth:xen.git$ git log --oneline 9ab42~1...45fcc4
> > 45fcc45 use ticket locks for spin locks
> > e13013d libxc/restore: add checkpointed flag to the restore context
> > ce44b40 libxc/restore: introduce setup() and cleanup() on restore
> > c5c5a04 libxc/restore: split read/handle qemu info
> > 9ab42c9 libxc/restore: introduce process_record()
> >
> > where e13013d is current master which was pushed in by flight 56375.
> >
> > I think it unlikely the libxl stuff is responsible, given we don't
> > migrate on ARM, which would seem to point to the ticket locks...
> 
> The test is still failing on the latest flight [1]. Any update on this 
> issue?

The bisection got nowhere.

I've tried to repro on the cubietruck on my desk and have gotten
nowhere.

But I've just now noticed that the failures are on arndale (not sure why
I thought ct).

Can I steal the arndale off your desk please?

BTW, it doesn't seem to be a 100% failure rate, e.g. 57271 seems to have
passed, despite testing the exact same thing as 57242.

Ian.


* Re: [xen-unstable test] 56759: regressions - FAIL
  2015-05-26  9:17     ` Ian Campbell
@ 2015-05-26  9:22       ` Julien Grall
  0 siblings, 0 replies; 16+ messages in thread
From: Julien Grall @ 2015-05-26  9:22 UTC
  To: Ian Campbell
  Cc: Tim Deegan, xen-devel, ian.jackson, David Vrabel, Stefano Stabellini

Hi Ian,

On 26/05/2015 11:17, Ian Campbell wrote:
> On Tue, 2015-05-26 at 11:11 +0200, Julien Grall wrote:
>> Hi,
>>
>> On 20/05/2015 11:56, Ian Campbell wrote:
>>> On Wed, 2015-05-20 at 09:34 +0000, osstest service user wrote:
>>>> flight 56759 xen-unstable real [real]
>>>> http://logs.test-lab.xenproject.org/osstest/logs/56759/
>>>>
>>>> Regressions :-(
>>>>
>>>> Tests which did not succeed and are blocking,
>>>> including tests which could not be run:
>>>>    test-armhf-armhf-xl-multivcpu 17 leak-check/check         fail REGR. vs. 56375
>>>
>>> I'm pretty hard pressed to explain this from the set of commits
>>> currently under test, but it has happened a few times now (e.g. 56700
>>> 56576) so it does seem to be real.
>>>
>>> http://logs.test-lab.xenproject.org/osstest/results/bisect.xen-unstable.test-armhf-armhf-xl-multivcpu.leak-check--check.html
>>> is working on it and is currently considering the set of changes from:
>>> ianc@cosworth:xen.git$ git log --oneline 9ab42~1...45fcc4
>>> 45fcc45 use ticket locks for spin locks
>>> e13013d libxc/restore: add checkpointed flag to the restore context
>>> ce44b40 libxc/restore: introduce setup() and cleanup() on restore
>>> c5c5a04 libxc/restore: split read/handle qemu info
>>> 9ab42c9 libxc/restore: introduce process_record()
>>>
>>> where e13013d is current master which was pushed in by flight 56375.
>>>
>>> I think it unlikely the libxl stuff is responsible, given we don't
>>> migrate on ARM, which would seem to point to the ticket locks...
>>
>> The test is still failing on the latest flight [1]. Any update on this
>> issue?
>
> The bisection got nowhere.
>
> I've tried to repro on the cubietruck on my desk and have gotten
> nowhere.
>
> But I've just now noticed that the failures are on arndale (not sure why
> I thought ct).

We use the same Xen binary (hypervisor/tools) on both platforms, right?

I'm wondering if it's because the processor revision is not the same and
we forgot to implement a workaround for an erratum.

> Can I steal the arndale off your desk please?

Go ahead.

> BTW, it doesn't seem to be a 100% failure rate, e.g. 57271 seems to have
> passed, despite testing the exact same thing as 57242.

I sometimes saw another test failing for ARM too.

Regards,

-- 
Julien Grall


* Re: [xen-unstable test] 56759: regressions - FAIL
  2015-05-20  9:56 ` Ian Campbell
  2015-05-26  9:11   ` Julien Grall
@ 2015-05-26 13:29   ` Ian Campbell
  2015-05-27 16:04     ` Ian Campbell
  1 sibling, 1 reply; 16+ messages in thread
From: Ian Campbell @ 2015-05-26 13:29 UTC
  To: xen-devel
  Cc: Julien Grall, Tim Deegan, ian.jackson, David Vrabel, Stefano Stabellini

On Wed, 2015-05-20 at 10:56 +0100, Ian Campbell wrote:
> On Wed, 2015-05-20 at 09:34 +0000, osstest service user wrote:
> > flight 56759 xen-unstable real [real]
> > http://logs.test-lab.xenproject.org/osstest/logs/56759/
> > 
> > Regressions :-(
> > 
> > Tests which did not succeed and are blocking,
> > including tests which could not be run:
> >  test-armhf-armhf-xl-multivcpu 17 leak-check/check         fail REGR. vs. 56375
> 
> I'm pretty hard pressed to explain this from the set of commits
> currently under test, but it has happened a few times now (e.g. 56700
> 56576) so it does seem to be real.
> 
> http://logs.test-lab.xenproject.org/osstest/results/bisect.xen-unstable.test-armhf-armhf-xl-multivcpu.leak-check--check.html
> is working on it and is currently considering the set of changes from:
> ianc@cosworth:xen.git$ git log --oneline 9ab42~1...45fcc4
> 45fcc45 use ticket locks for spin locks
> e13013d libxc/restore: add checkpointed flag to the restore context
> ce44b40 libxc/restore: introduce setup() and cleanup() on restore
> c5c5a04 libxc/restore: split read/handle qemu info
> 9ab42c9 libxc/restore: introduce process_record()
> 
> where e13013d is current master which was pushed in by flight 56375.
> 
> I think it unlikely the libxl stuff is responsible, given we don't
> migrate on ARM, which would seem to point to the ticket locks...

I've now managed to reproduce using the arndale on my desk.

I'm just starting to dig in to the issue.

So far the only thing I've concluded is that the message comes from
netback trying to read the script node for inclusion in the hotplug
invocation's environment.

I wonder if perhaps the spinlock change has just exposed a pre-existing
race?

Ian.


* Re: [xen-unstable test] 56759: regressions - FAIL
  2015-05-26 13:29   ` Ian Campbell
@ 2015-05-27 16:04     ` Ian Campbell
  2015-05-28  8:50       ` Jan Beulich
  2015-05-29 16:32       ` Ian Campbell
  0 siblings, 2 replies; 16+ messages in thread
From: Ian Campbell @ 2015-05-27 16:04 UTC
  To: xen-devel, Wei Liu
  Cc: Julien Grall, ian.jackson, Tim Deegan, David Vrabel, Stefano Stabellini

On Tue, 2015-05-26 at 14:29 +0100, Ian Campbell wrote:
> On Wed, 2015-05-20 at 10:56 +0100, Ian Campbell wrote:
> > On Wed, 2015-05-20 at 09:34 +0000, osstest service user wrote:
> > > flight 56759 xen-unstable real [real]
> > > http://logs.test-lab.xenproject.org/osstest/logs/56759/
> > > 
> > > Regressions :-(
> > > 
> > > Tests which did not succeed and are blocking,
> > > including tests which could not be run:
> > >  test-armhf-armhf-xl-multivcpu 17 leak-check/check         fail REGR. vs. 56375
> > 
> > I'm pretty hard pressed to explain this from the set of commits
> > currently under test, but it has happened a few times now (e.g. 56700
> > 56576) so it does seem to be real.
> > 
> > http://logs.test-lab.xenproject.org/osstest/results/bisect.xen-unstable.test-armhf-armhf-xl-multivcpu.leak-check--check.html
> > is working on it and is currently considering the set of changes from:
> > ianc@cosworth:xen.git$ git log --oneline 9ab42~1...45fcc4
> > 45fcc45 use ticket locks for spin locks
> > e13013d libxc/restore: add checkpointed flag to the restore context
> > ce44b40 libxc/restore: introduce setup() and cleanup() on restore
> > c5c5a04 libxc/restore: split read/handle qemu info
> > 9ab42c9 libxc/restore: introduce process_record()
> > 
> > where e13013d is current master which was pushed in by flight 56375.
> > 
> > I think it unlikely the libxl stuff is responsible, given we don't
> > migrate on ARM, which would seem to point to the ticket locks...
> 
> I've now managed to reproduce using the arndale on my desk.

... and now I've confirmed that reverting the spin lock change causes
the issue to not happen any more.

> I'm just starting to dig in to the issue.
> 
> So far the only thing I've concluded is that the message comes from
> netback trying to read the script node for inclusion in the hotplug
> invocation's environment.
> 
> I wonder if perhaps the spinlock change has just exposed a pre-existing
> race?

I'm still confirming, but AFAICT libxl does the right thing and writes
state=Closing and waits for it to hit state=Closed before tearing down
the backend directory. AFAICS it is not timing out while waiting.

Looking at the netback side though it seems like netback_remove is
switching to state=Closed _before_ it calls kobject_uevent(...,
KOBJ_OFFLINE) and it is this which generates the call to netback_uevent
which tries and fails to read script and produces the error message.

Since switching to state=Closed is what prompts libxl to go and delete
the xenstore backend dir it seems like it would be possible that
netback_uevent might not happen until the xenstore key was gone,
prompting it to write the error nodes. Is there anything else which
might guard against that possibility?
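
The ordering problem suspected above can be written down as a small deterministic model in Python. Everything in it is an assumption made for illustration: the xenstore is a plain dict, the backend path and script path are placeholders, the error string is not the real log message, and the function names merely echo the roles of netback_remove, netback_uevent and libxl in the description.

```python
def netback_uevent(store, backend):
    """Model of the uevent handler: it needs the backend's 'script' node."""
    script = store.get(backend + "/script")
    if script is None:
        return "error: cannot read script"  # placeholder, not the real message
    return "script=" + script


def libxl_teardown(store, backend):
    """Model of the toolstack: once state is Closed, delete the backend dir."""
    if store.get(backend + "/state") == "Closed":
        for key in [k for k in store if k.startswith(backend + "/")]:
            del store[key]


def simulate(toolstack_runs_before_uevent):
    backend = "backend/vif/1/0"  # hypothetical path
    store = {
        backend + "/state": "Closing",
        backend + "/script": "/etc/xen/scripts/vif-bridge",  # illustrative
    }
    # netback_remove switches to Closed before emitting the OFFLINE uevent.
    store[backend + "/state"] = "Closed"
    if toolstack_runs_before_uevent:
        libxl_teardown(store, backend)       # race lost: dir deleted first
    result = netback_uevent(store, backend)  # KOBJ_OFFLINE handler runs
    libxl_teardown(store, backend)
    return result


print(simulate(toolstack_runs_before_uevent=False))
print(simulate(toolstack_runs_before_uevent=True))
```

The only difference between the two runs is which side gets to run first after state=Closed is written, which is why a change in lock timing could plausibly flip the usual outcome.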

Handwaving a bit (ok, a lot) it's possible that the change of spinlocks
has caused a commonly won race to become a commonly lost one at least
under these circumstances.

My theory is that this is exacerbated on arndale because the CPU is
relatively slow (even compared to cubietruck, which has the same core but
faster DRAM etc.), and because it is dual core while the failing test case
involves a 4 vcpu guest (which is a bit dumb but not invalid), which
loads things even more.

I'm still slightly concerned that perhaps the new spinlock stuff has
some sort of bad behaviour either on arndale specifically or more
generally for ARM systems which has pushed this particular case over the
edge.

Ian.


* Re: [xen-unstable test] 56759: regressions - FAIL
  2015-05-27 16:04     ` Ian Campbell
@ 2015-05-28  8:50       ` Jan Beulich
  2015-05-28  9:26         ` Ian Campbell
  2015-05-29 16:32       ` Ian Campbell
  1 sibling, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2015-05-28  8:50 UTC
  To: David Vrabel, Ian Campbell
  Cc: Wei Liu, Stefano Stabellini, Tim Deegan, ian.jackson,
	Julien Grall, xen-devel

>>> On 27.05.15 at 18:04, <ian.campbell@citrix.com> wrote:
> On Tue, 2015-05-26 at 14:29 +0100, Ian Campbell wrote:
>> On Wed, 2015-05-20 at 10:56 +0100, Ian Campbell wrote:
>> > On Wed, 2015-05-20 at 09:34 +0000, osstest service user wrote:
>> > > flight 56759 xen-unstable real [real]
>> > > http://logs.test-lab.xenproject.org/osstest/logs/56759/ 
>> > > 
>> > > Regressions :-(
>> > > 
>> > > Tests which did not succeed and are blocking,
>> > > including tests which could not be run:
>> > >  test-armhf-armhf-xl-multivcpu 17 leak-check/check         fail REGR. vs. 56375
>> > 
>> > I'm pretty hard pressed to explain this from the set of commits
>> > currently under test, but it has happened a few times now (e.g. 56700
>> > 56576) so it does seem to be real.
>> > 
>> > 
>> > http://logs.test-lab.xenproject.org/osstest/results/bisect.xen-unstable.test-armhf-armhf-xl-multivcpu.leak-check--check.html
>> > is working on it and is currently considering the set of changes from:
>> > ianc@cosworth:xen.git$ git log --oneline 9ab42~1...45fcc4
>> > 45fcc45 use ticket locks for spin locks
>> > e13013d libxc/restore: add checkpointed flag to the restore context
>> > ce44b40 libxc/restore: introduce setup() and cleanup() on restore
>> > c5c5a04 libxc/restore: split read/handle qemu info
>> > 9ab42c9 libxc/restore: introduce process_record()
>> > 
>> > where e13013d is current master which was pushed in by flight 56375.
>> > 
>> > I think it unlikely the libxl stuff is responsible, given we don't
>> > migrate on ARM, which would seem to point to the ticket locks...
>> 
>> I've now managed to reproduce using the arndale on my desk.
> 
> ... and now I've confirmed that reverting the spin lock change causes
> the issue to not happen any more.

Considering that this issue has prevented a push for almost
two weeks, I think we ought to consider reverting the two
offending commits until the problem got sorted out.

Jan

* Re: [xen-unstable test] 56759: regressions - FAIL
  2015-05-28  8:50       ` Jan Beulich
@ 2015-05-28  9:26         ` Ian Campbell
  2015-05-28 10:10           ` Jan Beulich
  0 siblings, 1 reply; 16+ messages in thread
From: Ian Campbell @ 2015-05-28  9:26 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Stefano Stabellini, Tim Deegan, ian.jackson,
	Julien Grall, David Vrabel, xen-devel

On Thu, 2015-05-28 at 09:50 +0100, Jan Beulich wrote:
> >>> On 27.05.15 at 18:04, <ian.campbell@citrix.com> wrote:
> > On Tue, 2015-05-26 at 14:29 +0100, Ian Campbell wrote:
> >> On Wed, 2015-05-20 at 10:56 +0100, Ian Campbell wrote:
> >> > On Wed, 2015-05-20 at 09:34 +0000, osstest service user wrote:
> >> > > flight 56759 xen-unstable real [real]
> >> > > http://logs.test-lab.xenproject.org/osstest/logs/56759/ 
> >> > > 
> >> > > Regressions :-(
> >> > > 
> >> > > Tests which did not succeed and are blocking,
> >> > > including tests which could not be run:
> >> > >  test-armhf-armhf-xl-multivcpu 17 leak-check/check         fail REGR. vs. 56375
> >> > 
> >> > I'm pretty hard pressed to explain this from the set of commits
> >> > currently under test, but it has happened a few times now (e.g. 56700
> >> > 56576) so it does seem to be real.
> >> > 
> >> > 
> >> > http://logs.test-lab.xenproject.org/osstest/results/bisect.xen-unstable.test-armhf-armhf-xl-multivcpu.leak-check--check.html
> >> > is working on it and is currently considering the set of changes from:
> >> > ianc@cosworth:xen.git$ git log --oneline 9ab42~1...45fcc4
> >> > 45fcc45 use ticket locks for spin locks
> >> > e13013d libxc/restore: add checkpointed flag to the restore context
> >> > ce44b40 libxc/restore: introduce setup() and cleanup() on restore
> >> > c5c5a04 libxc/restore: split read/handle qemu info
> >> > 9ab42c9 libxc/restore: introduce process_record()
> >> > 
> >> > where e13013d is current master which was pushed in by flight 56375.
> >> > 
> >> > I think it unlikely the libxl stuff is responsible, given we don't
> >> > migrate on ARM, which would seem to point to the ticket locks...
> >> 
> >> I've now managed to reproduce using the arndale on my desk.
> > 
> > ... and now I've confirmed that reverting the spin lock change causes
> > the issue to not happen any more.
> 
> Considering that this issue has prevented a push for almost
> two weeks, I think we ought to consider reverting the two
> offending commits until the problem got sorted out.

I think that would probably be wise. I'll try and figure out exactly
what is going on and propose some patches ASAP.

Ian.

* Re: [xen-unstable test] 56759: regressions - FAIL
  2015-05-28  9:26         ` Ian Campbell
@ 2015-05-28 10:10           ` Jan Beulich
  2015-05-29  9:56             ` Andrew Cooper
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2015-05-28 10:10 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Wei Liu, Stefano Stabellini, Tim Deegan, ian.jackson,
	Julien Grall, David Vrabel, xen-devel

>>> On 28.05.15 at 11:26, <ian.campbell@citrix.com> wrote:
> On Thu, 2015-05-28 at 09:50 +0100, Jan Beulich wrote:
>> >>> On 27.05.15 at 18:04, <ian.campbell@citrix.com> wrote:
>> > On Tue, 2015-05-26 at 14:29 +0100, Ian Campbell wrote:
>> >> I've now managed to reproduce using the arndale on my desk.
>> > 
>> > ... and now I've confirmed that reverting the spin lock change causes
>> > the issue to not happen any more.
>> 
>> Considering that this issue has prevented a push for almost
>> two weeks, I think we ought to consider reverting the two
>> offending commits until the problem got sorted out.
> 
> I think that would probably be wise. I'll try and figure out exactly
> what is going on and propose some patches ASAP.

Now done and pushed.

Jan

* Re: [xen-unstable test] 56759: regressions - FAIL
  2015-05-28 10:10           ` Jan Beulich
@ 2015-05-29  9:56             ` Andrew Cooper
  2015-05-29 10:40               ` Jan Beulich
  2015-05-29 10:50               ` Ian Campbell
  0 siblings, 2 replies; 16+ messages in thread
From: Andrew Cooper @ 2015-05-29  9:56 UTC (permalink / raw)
  To: Jan Beulich, Ian Campbell
  Cc: Wei Liu, Stefano Stabellini, ian.jackson, Tim Deegan,
	Julien Grall, David Vrabel, xen-devel

On 28/05/15 11:10, Jan Beulich wrote:
>>>> On 28.05.15 at 11:26, <ian.campbell@citrix.com> wrote:
>> On Thu, 2015-05-28 at 09:50 +0100, Jan Beulich wrote:
>>>>>> On 27.05.15 at 18:04, <ian.campbell@citrix.com> wrote:
>>>> On Tue, 2015-05-26 at 14:29 +0100, Ian Campbell wrote:
>>>>> I've now managed to reproduce using the arndale on my desk.
>>>> ... and now I've confirmed that reverting the spin lock change causes
>>>> the issue to not happen any more.
>>> Considering that this issue has prevented a push for almost
>>> two weeks, I think we ought to consider reverting the two
>>> offending commits until the problem got sorted out.
>> I think that would probably be wise. I'll try and figure out exactly
>> what is going on and propose some patches ASAP.
> Now done and pushed.

Wait what?  This failure is not related to spinlocks; it is a networking
behavioural bug (hardware specific, even) which has been uncovered,
showing that there is a preexisting race condition.

It is not reasonable to revert a correct change because it has exposed
an existing race condition elsewhere.  IMO, this should have been a
force push to mark the test as non-blocking.

~Andrew

* Re: [xen-unstable test] 56759: regressions - FAIL
  2015-05-29  9:56             ` Andrew Cooper
@ 2015-05-29 10:40               ` Jan Beulich
  2015-05-29 10:50               ` Ian Campbell
  1 sibling, 0 replies; 16+ messages in thread
From: Jan Beulich @ 2015-05-29 10:40 UTC (permalink / raw)
  To: Andrew Cooper, Ian Campbell
  Cc: Wei Liu, Stefano Stabellini, Tim Deegan, ian.jackson,
	Julien Grall, David Vrabel, xen-devel

>>> On 29.05.15 at 11:56, <andrew.cooper3@citrix.com> wrote:
> On 28/05/15 11:10, Jan Beulich wrote:
>>>>> On 28.05.15 at 11:26, <ian.campbell@citrix.com> wrote:
>>> On Thu, 2015-05-28 at 09:50 +0100, Jan Beulich wrote:
>>>>>>> On 27.05.15 at 18:04, <ian.campbell@citrix.com> wrote:
>>>>> On Tue, 2015-05-26 at 14:29 +0100, Ian Campbell wrote:
>>>>>> I've now managed to reproduce using the arndale on my desk.
>>>>> ... and now I've confirmed that reverting the spin lock change causes
>>>>> the issue to not happen any more.
>>>> Considering that this issue has prevented a push for almost
>>>> two weeks, I think we ought to consider reverting the two
>>>> offending commits until the problem got sorted out.
>>> I think that would probably be wise. I'll try and figure out exactly
>>> what is going on and propose some patches ASAP.
>> Now done and pushed.
> 
> Wait what?  This failure is not related to spinlocks; it is a networking
> behavioural bug (hardware specific, even) which has been uncovered,
> showing that there is a preexisting race condition.

If Ian gives his okay, I'm fine to re-instate the reverted patches (which
incidentally even got a push during the night), but I can't really see the
proof of what you claim in any of the earlier communication.

> It is not reasonable to revert a correct change because it has exposed
> an existing race condition elsewhere.  IMO, this should have been a
> force push to mark the test as non-blocking.

That's one way to view it. I'm not sure a force push would have been
warranted here, as the regression was real. And further holding up
the tree moving forward would have been bad in that situation too,
all the more so as it had - as said above - been stuck for almost
two weeks.

Jan

* Re: [xen-unstable test] 56759: regressions - FAIL
  2015-05-29  9:56             ` Andrew Cooper
  2015-05-29 10:40               ` Jan Beulich
@ 2015-05-29 10:50               ` Ian Campbell
  1 sibling, 0 replies; 16+ messages in thread
From: Ian Campbell @ 2015-05-29 10:50 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, Stefano Stabellini, ian.jackson, Tim Deegan,
	Julien Grall, David Vrabel, Jan Beulich, xen-devel

On Fri, 2015-05-29 at 10:56 +0100, Andrew Cooper wrote:
> On 28/05/15 11:10, Jan Beulich wrote:
> >>>> On 28.05.15 at 11:26, <ian.campbell@citrix.com> wrote:
> >> On Thu, 2015-05-28 at 09:50 +0100, Jan Beulich wrote:
> >>>>>> On 27.05.15 at 18:04, <ian.campbell@citrix.com> wrote:
> >>>> On Tue, 2015-05-26 at 14:29 +0100, Ian Campbell wrote:
> >>>>> I've now managed to reproduce using the arndale on my desk.
> >>>> ... and now I've confirmed that reverting the spin lock change causes
> >>>> the issue to not happen any more.
> >>> Considering that this issue has prevented a push for almost
> >>> two weeks, I think we ought to consider reverting the two
> >>> offending commits until the problem got sorted out.
> >> I think that would probably be wise. I'll try and figure out exactly
> >> what is going on and propose some patches ASAP.
> > Now done and pushed.
> 
> Wait what?  This failure is not related to spinlocks; it is a networking
> behavioural bug (hardware specific, even) which has been uncovered,
> showing that there is a preexisting race condition.

That's the current _hypothesis_, but it hasn't been confirmed what is
actually happening here.

So far doing the apparently obvious fix in netback (deferring the state
change to closed until after the uevent is generated) doesn't seem to
have fixed the issue. So either the hypothesis is wrong or there is
something more subtle going on.

We don't know what is causing this issue yet and therefore neither
holding up the push gate nor force pushing seem appropriate under the
circumstances.

> It is not reasonable to revert a correct change because it has exposed
> an existing race condition elsewhere.  IMO, this should have been a
> force push to mark the test as non-blocking.
> 
> ~Andrew

* Re: [xen-unstable test] 56759: regressions - FAIL
  2015-05-27 16:04     ` Ian Campbell
  2015-05-28  8:50       ` Jan Beulich
@ 2015-05-29 16:32       ` Ian Campbell
  2015-06-02 10:30         ` Jan Beulich
  1 sibling, 1 reply; 16+ messages in thread
From: Ian Campbell @ 2015-05-29 16:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Stefano Stabellini, Tim Deegan, ian.jackson,
	Julien Grall, David Vrabel

On Wed, 2015-05-27 at 17:04 +0100, Ian Campbell wrote:
> Looking at the netback side though it seems like netback_remove is
> switching to state=Closed _before_ it calls kobject_uevent(...,
> KOBJ_OFFLINE) and it is this which generates the call to netback_uevent
> which tries and fails to read script and produces the error message.

I've just sent out a patch which fixes this issue, although I am still
at a loss to explain why we have only started seeing this now and only
under such specific circumstances.
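
As a rough illustration of the ordering problem quoted above, here is a toy model in plain C. It is not the netback code and not the patch itself: the struct, the toy_* functions and the assumption that the toolstack removes the "script" key the instant it sees state=Closed (i.e. the race is always lost) are all made up to show the shape of the issue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_state { TOY_CONNECTED, TOY_CLOSED };

/* Toy stand-in for a backend device plus its xenstore directory. */
struct toy_backend {
    enum toy_state state;
    const char *script;     /* models the "script" key; NULL once removed */
    bool uevent_read_ok;    /* did the offline uevent manage to read it? */
};

static void toy_backend_init(struct toy_backend *be)
{
    be->state = TOY_CONNECTED;
    be->script = "vif-bridge";   /* placeholder value */
    be->uevent_read_ok = false;
}

/* Models the state switch. Assumption: the toolstack reacts at once to
 * Closed and tears down the backend directory (worst case for the race). */
static void toy_set_state(struct toy_backend *be, enum toy_state s)
{
    be->state = s;
    if (s == TOY_CLOSED)
        be->script = NULL;
}

/* Models the uevent handler run for the offline event: it needs to read
 * the script key and fails if the key has already gone. */
static void toy_uevent_offline(struct toy_backend *be)
{
    assert(be != NULL);
    be->uevent_read_ok = (be->script != NULL);
}

/* Ordering as described in the quoted text: Closed first, uevent second. */
static bool toy_remove_state_first(void)
{
    struct toy_backend be;

    toy_backend_init(&be);
    toy_set_state(&be, TOY_CLOSED);
    toy_uevent_offline(&be);
    return be.uevent_read_ok;
}

/* Reordered variant: generate the uevent, then switch to Closed. */
static bool toy_remove_uevent_first(void)
{
    struct toy_backend be;

    toy_backend_init(&be);
    toy_uevent_offline(&be);
    toy_set_state(&be, TOY_CLOSED);
    return be.uevent_read_ok;
}
```

In this model toy_remove_state_first() returns false (the read fails) and toy_remove_uevent_first() returns true. In reality the toolstack reaction is asynchronous, so the first ordering only fails when the race is lost.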

> I'm still slightly concerned that perhaps the new spinlock stuff has
> some sort of bad behaviour either on arndale specifically or more
> generally for ARM systems which has pushed this particular case over the
> edge.

I did run some benchmarks (hackbench+fio on arndale domU and hackbench
on midway dom0) with and without the ticket locks and the results were
close enough that I'm basically not too worried that there is something
wrong with the ticket locks on ARM.

It still niggles somewhat not to have a good theory about why this
change had this seemingly random effect, but I've not got any good ideas
for avenues to explore and I've got other things to do so I think I'll
leave it at that.

Ian.

* Re: [xen-unstable test] 56759: regressions - FAIL
  2015-05-29 16:32       ` Ian Campbell
@ 2015-06-02 10:30         ` Jan Beulich
  2015-06-11 13:22           ` Julien Grall
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2015-06-02 10:30 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Wei Liu, Stefano Stabellini, Tim Deegan, ian.jackson,
	Julien Grall, David Vrabel, xen-devel

>>> On 29.05.15 at 18:32, <ian.campbell@citrix.com> wrote:
> On Wed, 2015-05-27 at 17:04 +0100, Ian Campbell wrote:
>> Looking at the netback side though it seems like netback_remove is
>> switching to state=Closed _before_ it calls kobject_uevent(...,
>> KOBJ_OFFLINE) and it is this which generates the call to netback_uevent
>> which tries and fails to read script and produces the error message.
> 
> I've just sent out a patch which fixes this issue, although I am still
> at a loss to explain why we have only started seeing this now and only
> under such specific circumstances.
> 
>> I'm still slightly concerned that perhaps the new spinlock stuff has
>> some sort of bad behaviour either on arndale specifically or more
>> generally for ARM systems which has pushed this particular case over the
>> edge.
> 
> I did run some benchmarks (hackbench+fio on arndale domU and hackbench
> on midway dom0) with and without the ticket locks and the results were
> close enough that I'm basically not too worried that there is something
> wrong with the ticket locks on ARM.
> 
> It still niggles somewhat not to have a good theory about why this
> change had this seemingly random effect, but I've not got any good ideas
> for avenues to explore and I've got other things to do so I think I'll
> leave it at that.

So should we then re-instate the ticket lock patches?

Jan

* Re: [xen-unstable test] 56759: regressions - FAIL
  2015-06-02 10:30         ` Jan Beulich
@ 2015-06-11 13:22           ` Julien Grall
  0 siblings, 0 replies; 16+ messages in thread
From: Julien Grall @ 2015-06-11 13:22 UTC (permalink / raw)
  To: Jan Beulich, Ian Campbell
  Cc: Wei Liu, Stefano Stabellini, ian.jackson, Tim Deegan,
	Julien Grall, David Vrabel, xen-devel

Hi,

On 02/06/2015 06:30, Jan Beulich wrote:
>>>> On 29.05.15 at 18:32, <ian.campbell@citrix.com> wrote:
>> On Wed, 2015-05-27 at 17:04 +0100, Ian Campbell wrote:
>>> Looking at the netback side though it seems like netback_remove is
>>> switching to state=Closed _before_ it calls kobject_uevent(...,
>>> KOBJ_OFFLINE) and it is this which generates the call to netback_uevent
>>> which tries and fails to read script and produces the error message.
>>
>> I've just sent out a patch which fixes this issue, although I am still
>> at a loss to explain why we have only started seeing this now and only
>> under such specific circumstances.
>>
>>> I'm still slightly concerned that perhaps the new spinlock stuff has
>>> some sort of bad behaviour either on arndale specifically or more
>>> generally for ARM systems which has pushed this particular case over the
>>> edge.
>>
>> I did run some benchmarks (hackbench+fio on arndale domU and hackbench
>> on midway dom0) with and without the ticket locks and the results were
>> close enough that I'm basically not too worried that there is something
>> wrong with the ticket locks on ARM.
>>
>> It still niggles somewhat not to have a good theory about why this
>> change had this seemingly random effect, but I've not got any good ideas
>> for avenues to explore and I've got other things to do so I think I'll
>> leave it at that.
>
> So should we then re-instate the ticket lock patches?

Any update on this?

Regards,

-- 
Julien Grall

end of thread, other threads:[~2015-06-11 13:29 UTC | newest]

Thread overview: 16+ messages
2015-05-20  9:34 [xen-unstable test] 56759: regressions - FAIL osstest service user
2015-05-20  9:56 ` Ian Campbell
2015-05-26  9:11   ` Julien Grall
2015-05-26  9:17     ` Ian Campbell
2015-05-26  9:22       ` Julien Grall
2015-05-26 13:29   ` Ian Campbell
2015-05-27 16:04     ` Ian Campbell
2015-05-28  8:50       ` Jan Beulich
2015-05-28  9:26         ` Ian Campbell
2015-05-28 10:10           ` Jan Beulich
2015-05-29  9:56             ` Andrew Cooper
2015-05-29 10:40               ` Jan Beulich
2015-05-29 10:50               ` Ian Campbell
2015-05-29 16:32       ` Ian Campbell
2015-06-02 10:30         ` Jan Beulich
2015-06-11 13:22           ` Julien Grall
