All of lore.kernel.org
 help / color / mirror / Atom feed
* Prepping a bugfix push
@ 2009-12-03 19:26 Jeremy Fitzhardinge
  2009-12-03 19:35 ` Brendan Cully
  2009-12-04 10:45 ` Ian Campbell
  0 siblings, 2 replies; 17+ messages in thread
From: Jeremy Fitzhardinge @ 2009-12-03 19:26 UTC (permalink / raw)
  To: Ian Campbell, Konrad Rzeszutek Wilk, Paolo Bonzini; +Cc: Xen-devel

I'm preparing a general bugfix push for Linus, targeted at both current
linux-2.6.git and stable.  The list of patches I have lined up (in the
"bugfix" branch) are below.  Is there anything I've overlooked?  Are
there any patches I've forgotten to apply altogether?

(Note, this is all domU stuff; dom0 things will need to mature a bit.)

Thanks,
    J

commit b4606f2165153833247823e8c04c5e88cb3d298b
Author: Ian Campbell <ian.campbell@citrix.com>
Date:   Tue Dec 1 11:47:15 2009 +0000

    xen: explicitly create/destroy stop_machine workqueues outside suspend/resume region.
    
    I have observed cases where the implicit stop_machine_destroy() done by
    stop_machine() hangs while destroying the workqueues, specifically in
    kthread_stop(). This seems to be because timer ticks are not restarted
    until after stop_machine() returns.
    
    Fortunately stop_machine provides a facility to pre-create/post-destroy
    the workqueues so use this to ensure that workqueues are only destroyed
    after everything is really up and running again.
    
    I only actually observed this failure with 2.6.30. It seems that newer
    kernels are somehow more robust against doing kthread_stop() without timer
    interrupts (I tried some backports of some likely looking candidates but
    did not track down the commit which added this robustness). However this
    change seems like a reasonable belt&braces thing to do.
    
    Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
    Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
    Cc: Stable Kernel <stable@kernel.org>

commit 65f63384b391bf4d384327d8a7c6de9860290b5c
Author: Ian Campbell <ian.campbell@citrix.com>
Date:   Tue Dec 1 11:47:14 2009 +0000

    xen: improve error handling in do_suspend.
    
    The existing error handling has a few issues:
    - If freeze_processes() fails it exits with shutting_down = SHUTDOWN_SUSPEND.
    - If dpm_suspend_noirq() fails it exits without resuming xenbus.
    - If stop_machine() fails it exits without resuming xenbus or calling
      dpm_resume_end().
    - xs_suspend()/xs_resume() and dpm_suspend_noirq()/dpm_resume_noirq() were not
      nested in the obvious way.
    
    Fix by ensuring each failure case goto's the correct label. Treat a failure of
    stop_machine() as a cancelled suspend in order to follow the correct resume
    path.
    
    Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
    Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
    Cc: Stable Kernel <stable@kernel.org>

commit fed5ea87e02aaf902ff38c65b4514233db03dc09
Author: Ian Campbell <ian.campbell@citrix.com>
Date:   Tue Dec 1 16:15:30 2009 +0000

    xen: don't leak IRQs over suspend/resume.
    
    On resume irq_info[*].evtchn is reset to 0 since event channel mappings
    are not preserved over suspend/resume. The other contents of irq_info
    is preserved to allow rebind_evtchn_irq() to function.
    
    However when a device resumes it will try to unbind from the
    previous IRQ (e.g.  blkfront goes blkfront_resume() -> blkif_free() ->
    unbind_from_irqhandler() -> unbind_from_irq()). This will fail due to the
    check for VALID_EVTCHN in unbind_from_irq() and the IRQ is leaked. The
    device will then continue to resume and allocate a new IRQ, eventually
    leading to find_unbound_irq() panic()ing.
    
    Fix this by changing unbind_from_irq() to handle teardown of interrupts
    which have type!=IRQT_UNBOUND but are not currently bound to a specific
    event channel.
    
    Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
    Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
    Cc: Stable Kernel <stable@kernel.org>

commit f6eafe3665bcc374c66775d58312d1c06c55303f
Author: Ian Campbell <Ian.Campbell@citrix.com>
Date:   Wed Nov 25 14:12:08 2009 +0000

    xen: call clock resume notifier on all CPUs
    
    tick_resume() is never called on secondary processors. Presumably this
    is because they are offlined for suspend on native and so this is
    normally taken care of in the CPU onlining path. Under Xen we keep all
    CPUs online over a suspend.
    
    This patch papers over the issue for me but I will investigate a more
    generic, less hacky, way of doing to the same.
    
    tick_suspend is also only called on the boot CPU which I presume should
    be fixed too.
    
    Signed-off-by: Ian Campbell <Ian.Campbell@citrix.com>
    Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
    Cc: Stable Kernel <stable@kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>

commit 6aaf5d633bb6cead81b396d861d7bae4b9a0ba7e
Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Date:   Wed Nov 25 13:15:38 2009 -0800

    xen: use iret for return from 64b kernel to 32b usermode
    
    If Xen wants to return to a 32b usermode with sysret it must use the
    right form.  When using VCGF_in_syscall to trigger this, it looks at
    the code segment and does a 32b sysret if it is FLAT_USER_CS32.
    However, this is different from __USER32_CS, so it fails to return
    properly if we use the normal Linux segment.
    
    So avoid the whole mess by dropping VCGF_in_syscall and simply use
    plain iret to return to usermode.
    
    Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
    Acked-by: Jan Beulich <jbeulich@novell.com>
    Cc: Stable Kernel <stable@kernel.org>

commit 922cc38ab71d1360978e65207e4a4f4988987127
Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Date:   Tue Nov 24 09:58:49 2009 -0800

    xen: don't call dpm_resume_noirq() with interrupts disabled.
    
    dpm_resume_noirq() takes a mutex, so it can't be called from a no-interrupt
    context.  Don't call it from within the stop-machine function, but just
    afterwards, since we're resuming anyway, regardless of what happened.
    
    Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
    Cc: Stable Kernel <stable@kernel.org>

commit 499d19b82b586aef18727b9ae1437f8f37b66e91
Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Date:   Tue Nov 24 09:38:25 2009 -0800

    xen: register runstate info for boot CPU early
    
    printk timestamping uses sched_clock, which in turn relies on runstate
    info under Xen.  So make sure we set it up before any printks can
    be called.
    
    Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
    Cc: Stable Kernel <stable@kernel.org>

commit 028896721ac04f6fa0697f3ecac3f98761746363
Author: Ian Campbell <ian.campbell@citrix.com>
Date:   Tue Nov 24 09:32:48 2009 -0800

    xen: register runstate on secondary CPUs
    
    The commit "xen: re-register runstate area earlier on resume" caused us
    to never try and setup the runstate area for secondary CPUs. Ensure that
    we do this...
    
    Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
    Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
    Cc: Stable Kernel <stable@kernel.org>

commit f350c7922faad3397c98c81a9e5658f5a1ef0214
Author: Ian Campbell <ian.campbell@citrix.com>
Date:   Tue Nov 24 10:16:23 2009 +0000

    xen: register timer interrupt with IRQF_TIMER
    
    Otherwise the timer is disabled by dpm_suspend_noirq() which in turn prevents
    correct operation of stop_machine on multi-processor systems and breaks
    suspend.
    
    Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
    Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
    Cc: Stable Kernel <stable@kernel.org>

commit fa24ba62ea2869308ffc9f0b286ac9650b4ca6cb
Author: Ian Campbell <ian.campbell@citrix.com>
Date:   Sat Nov 21 11:32:49 2009 +0000

    xen: correctly restore pfn_to_mfn_list_list after resume
    
    pvops kernels >= 2.6.30 can currently only be saved and restored once. The
    second attempt to save results in:
    
        ERROR Internal error: Frame# in pfn-to-mfn frame list is not in pseudophys
        ERROR Internal error: entry 0: p2m_frame_list[0] is 0xf2c2c2c2, max 0x120000
        ERROR Internal error: Failed to map/save the p2m frame list
    
    I finally narrowed it down to:
    
        commit cdaead6b4e657f960d6d6f9f380e7dfeedc6a09b
            Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
            Date:   Fri Feb 27 15:34:59 2009 -0800
    
                xen: split construction of p2m mfn tables from registration
    
                Build the p2m_mfn_list_list early with the rest of the p2m table, but
                register it later when the real shared_info structure is in place.
    
                Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
    
    The unforeseen side-effect of this change was to cause the mfn list list to not
    be rebuilt on resume. Prior to this change it would have been rebuilt via
    xen_post_suspend() -> xen_setup_shared_info() -> xen_setup_mfn_list_list().
    
    Fix by explicitly calling xen_build_mfn_list_list() from xen_post_suspend().
    
    Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
    Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
    Cc: Stable Kernel <stable@kernel.org>

commit 3905bb2aa7bb801b31946b37a4635ebac4009051
Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Date:   Sat Nov 21 08:46:29 2009 +0800

    xen: restore runstate_info even if !have_vcpu_info_placement
    
    Even if have_vcpu_info_placement is not set, we still need to set up
    the runstate area on each resumed vcpu.
    
    Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
    Cc: Stable Kernel <stable@kernel.org>

commit be012920ecba161ad20303a3f6d9e96c58cf97c7
Author: Ian Campbell <Ian.Campbell@citrix.com>
Date:   Sat Nov 21 08:35:55 2009 +0800

    xen: re-register runstate area earlier on resume.
    
    This is necessary to ensure the runstate area is available to
    xen_sched_clock before any calls to printk which will require it in
    order to provide a timestamp.
    
    I chose to pull the xen_setup_runstate_info out of xen_time_init into
    the caller in order to maintain parity with calling
    xen_setup_runstate_info separately from calling xen_time_resume.
    
    Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
    Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
    Cc: Stable Kernel <stable@kernel.org>

commit ae7888012969355a548372e99b066d9e31153b62
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Wed Jul 8 12:27:39 2009 +0200

    xen: wait up to 5 minutes for device connetion
    
    Increases the device timeout from 10s to 5 minutes, giving the user a
    visual indication during that time in case there are problems.  The patch
    is a backport of changesets 144 and 150 in the Xenbits tree.
    
    Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

commit f8dc33088febc63286b7a60e6b678de8e064de8e
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Wed Jul 8 12:27:38 2009 +0200

    xen: improvement to wait_for_devices()
    
    When printing a warning about a timed-out device, print the
    current state of both ends of the device connection (i.e., backend as
    well as frontend).  This backports half of changeset 146 from the
    Xenbits tree.
    
    Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

commit c6e1971139be1342902873181f3b80a979bfb33b
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Wed Jul 8 12:27:37 2009 +0200

    xen: fix is_disconnected_device/exists_disconnected_device
    
    The logic of is_disconnected_device/exists_disconnected_device is wrong
    in that they are used to test whether a device is trying to connect (i.e.
    connecting).  For this reason the patch fixes them to not consider a
    Closing or Closed device to be connecting.  At the same time the patch
    also renames the functions according to what they really do; you could
    say a closed device is "disconnected" (the old name), but not "connecting"
    (the new name).
    
    This patch is a backport of changeset 909 from the Xenbits tree.
    
    Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

commit db05fed0ad72f264e39bcb366795f7367384ec92
Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Date:   Tue Nov 24 16:41:47 2009 -0800

    xen/xenbus: make DEVICE_ATTR()s static
    
    They don't need to be global, and may cause linker clashes.
    
    Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
    Cc: Stable Kernel <stable@kernel.org>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Prepping a bugfix push
  2009-12-03 19:26 Prepping a bugfix push Jeremy Fitzhardinge
@ 2009-12-03 19:35 ` Brendan Cully
  2009-12-03 19:38   ` Jeremy Fitzhardinge
  2009-12-03 23:22   ` Jeremy Fitzhardinge
  2009-12-04 10:45 ` Ian Campbell
  1 sibling, 2 replies; 17+ messages in thread
From: Brendan Cully @ 2009-12-03 19:35 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Ian Campbell, Paolo Bonzini, Xen-devel, Konrad Rzeszutek Wilk

Not a patch, but I've just tried out xm save -c again with the latest
xen changes, and while I no longer see the grant table version panic,
the guest's devices (aside from the console) appear to be wedged on
resume. Is anyone else seeing this?

After a while on the console I see messages like this:

INFO: task syslogd:2219 blocked for more than 120 seconds.

which I assume is trouble with the block device.

On Thursday, 03 December 2009 at 11:26, Jeremy Fitzhardinge wrote:
> I'm preparing a general bugfix push for Linus, targeted at both current
> linux-2.6.git and stable.  The list of patches I have lined up (in the
> "bugfix" branch) are below.  Is there anything I've overlooked?  Are
> there any patches I've forgotten to apply altogether?
> 
> (Note, this is all domU stuff; dom0 things will need to mature a bit.)
> 
> Thanks,
>     J
> 
> commit b4606f2165153833247823e8c04c5e88cb3d298b
> Author: Ian Campbell <ian.campbell@citrix.com>
> Date:   Tue Dec 1 11:47:15 2009 +0000
> 
>     xen: explicitly create/destroy stop_machine workqueues outside suspend/resume region.
>     
>     I have observed cases where the implicit stop_machine_destroy() done by
>     stop_machine() hangs while destroying the workqueues, specifically in
>     kthread_stop(). This seems to be because timer ticks are not restarted
>     until after stop_machine() returns.
>     
>     Fortunately stop_machine provides a facility to pre-create/post-destroy
>     the workqueues so use this to ensure that workqueues are only destroyed
>     after everything is really up and running again.
>     
>     I only actually observed this failure with 2.6.30. It seems that newer
>     kernels are somehow more robust against doing kthread_stop() without timer
>     interrupts (I tried some backports of some likely looking candidates but
>     did not track down the commit which added this robustness). However this
>     change seems like a reasonable belt&braces thing to do.
>     
>     Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
>     Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>     Cc: Stable Kernel <stable@kernel.org>
> 
> commit 65f63384b391bf4d384327d8a7c6de9860290b5c
> Author: Ian Campbell <ian.campbell@citrix.com>
> Date:   Tue Dec 1 11:47:14 2009 +0000
> 
>     xen: improve error handling in do_suspend.
>     
>     The existing error handling has a few issues:
>     - If freeze_processes() fails it exits with shutting_down = SHUTDOWN_SUSPEND.
>     - If dpm_suspend_noirq() fails it exits without resuming xenbus.
>     - If stop_machine() fails it exits without resuming xenbus or calling
>       dpm_resume_end().
>     - xs_suspend()/xs_resume() and dpm_suspend_noirq()/dpm_resume_noirq() were not
>       nested in the obvious way.
>     
>     Fix by ensuring each failure case goto's the correct label. Treat a failure of
>     stop_machine() as a cancelled suspend in order to follow the correct resume
>     path.
>     
>     Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
>     Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>     Cc: Stable Kernel <stable@kernel.org>
> 
> commit fed5ea87e02aaf902ff38c65b4514233db03dc09
> Author: Ian Campbell <ian.campbell@citrix.com>
> Date:   Tue Dec 1 16:15:30 2009 +0000
> 
>     xen: don't leak IRQs over suspend/resume.
>     
>     On resume irq_info[*].evtchn is reset to 0 since event channel mappings
>     are not preserved over suspend/resume. The other contents of irq_info
>     is preserved to allow rebind_evtchn_irq() to function.
>     
>     However when a device resumes it will try to unbind from the
>     previous IRQ (e.g.  blkfront goes blkfront_resume() -> blkif_free() ->
>     unbind_from_irqhandler() -> unbind_from_irq()). This will fail due to the
>     check for VALID_EVTCHN in unbind_from_irq() and the IRQ is leaked. The
>     device will then continue to resume and allocate a new IRQ, eventually
>     leading to find_unbound_irq() panic()ing.
>     
>     Fix this by changing unbind_from_irq() to handle teardown of interrupts
>     which have type!=IRQT_UNBOUND but are not currently bound to a specific
>     event channel.
>     
>     Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
>     Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>     Cc: Stable Kernel <stable@kernel.org>
> 
> commit f6eafe3665bcc374c66775d58312d1c06c55303f
> Author: Ian Campbell <Ian.Campbell@citrix.com>
> Date:   Wed Nov 25 14:12:08 2009 +0000
> 
>     xen: call clock resume notifier on all CPUs
>     
>     tick_resume() is never called on secondary processors. Presumably this
>     is because they are offlined for suspend on native and so this is
>     normally taken care of in the CPU onlining path. Under Xen we keep all
>     CPUs online over a suspend.
>     
>     This patch papers over the issue for me but I will investigate a more
>     generic, less hacky, way of doing to the same.
>     
>     tick_suspend is also only called on the boot CPU which I presume should
>     be fixed too.
>     
>     Signed-off-by: Ian Campbell <Ian.Campbell@citrix.com>
>     Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>     Cc: Stable Kernel <stable@kernel.org>
>     Cc: Thomas Gleixner <tglx@linutronix.de>
> 
> commit 6aaf5d633bb6cead81b396d861d7bae4b9a0ba7e
> Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> Date:   Wed Nov 25 13:15:38 2009 -0800
> 
>     xen: use iret for return from 64b kernel to 32b usermode
>     
>     If Xen wants to return to a 32b usermode with sysret it must use the
>     right form.  When using VCGF_in_syscall to trigger this, it looks at
>     the code segment and does a 32b sysret if it is FLAT_USER_CS32.
>     However, this is different from __USER32_CS, so it fails to return
>     properly if we use the normal Linux segment.
>     
>     So avoid the whole mess by dropping VCGF_in_syscall and simply use
>     plain iret to return to usermode.
>     
>     Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>     Acked-by: Jan Beulich <jbeulich@novell.com>
>     Cc: Stable Kernel <stable@kernel.org>
> 
> commit 922cc38ab71d1360978e65207e4a4f4988987127
> Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> Date:   Tue Nov 24 09:58:49 2009 -0800
> 
>     xen: don't call dpm_resume_noirq() with interrupts disabled.
>     
>     dpm_resume_noirq() takes a mutex, so it can't be called from a no-interrupt
>     context.  Don't call it from within the stop-machine function, but just
>     afterwards, since we're resuming anyway, regardless of what happened.
>     
>     Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>     Cc: Stable Kernel <stable@kernel.org>
> 
> commit 499d19b82b586aef18727b9ae1437f8f37b66e91
> Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> Date:   Tue Nov 24 09:38:25 2009 -0800
> 
>     xen: register runstate info for boot CPU early
>     
>     printk timestamping uses sched_clock, which in turn relies on runstate
>     info under Xen.  So make sure we set it up before any printks can
>     be called.
>     
>     Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>     Cc: Stable Kernel <stable@kernel.org>
> 
> commit 028896721ac04f6fa0697f3ecac3f98761746363
> Author: Ian Campbell <ian.campbell@citrix.com>
> Date:   Tue Nov 24 09:32:48 2009 -0800
> 
>     xen: register runstate on secondary CPUs
>     
>     The commit "xen: re-register runstate area earlier on resume" caused us
>     to never try and setup the runstate area for secondary CPUs. Ensure that
>     we do this...
>     
>     Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
>     Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>     Cc: Stable Kernel <stable@kernel.org>
> 
> commit f350c7922faad3397c98c81a9e5658f5a1ef0214
> Author: Ian Campbell <ian.campbell@citrix.com>
> Date:   Tue Nov 24 10:16:23 2009 +0000
> 
>     xen: register timer interrupt with IRQF_TIMER
>     
>     Otherwise the timer is disabled by dpm_suspend_noirq() which in turn prevents
>     correct operation of stop_machine on multi-processor systems and breaks
>     suspend.
>     
>     Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
>     Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>     Cc: Stable Kernel <stable@kernel.org>
> 
> commit fa24ba62ea2869308ffc9f0b286ac9650b4ca6cb
> Author: Ian Campbell <ian.campbell@citrix.com>
> Date:   Sat Nov 21 11:32:49 2009 +0000
> 
>     xen: correctly restore pfn_to_mfn_list_list after resume
>     
>     pvops kernels >= 2.6.30 can currently only be saved and restored once. The
>     second attempt to save results in:
>     
>         ERROR Internal error: Frame# in pfn-to-mfn frame list is not in pseudophys
>         ERROR Internal error: entry 0: p2m_frame_list[0] is 0xf2c2c2c2, max 0x120000
>         ERROR Internal error: Failed to map/save the p2m frame list
>     
>     I finally narrowed it down to:
>     
>         commit cdaead6b4e657f960d6d6f9f380e7dfeedc6a09b
>             Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>             Date:   Fri Feb 27 15:34:59 2009 -0800
>     
>                 xen: split construction of p2m mfn tables from registration
>     
>                 Build the p2m_mfn_list_list early with the rest of the p2m table, but
>                 register it later when the real shared_info structure is in place.
>     
>                 Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>     
>     The unforeseen side-effect of this change was to cause the mfn list list to not
>     be rebuilt on resume. Prior to this change it would have been rebuilt via
>     xen_post_suspend() -> xen_setup_shared_info() -> xen_setup_mfn_list_list().
>     
>     Fix by explicitly calling xen_build_mfn_list_list() from xen_post_suspend().
>     
>     Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
>     Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>     Cc: Stable Kernel <stable@kernel.org>
> 
> commit 3905bb2aa7bb801b31946b37a4635ebac4009051
> Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> Date:   Sat Nov 21 08:46:29 2009 +0800
> 
>     xen: restore runstate_info even if !have_vcpu_info_placement
>     
>     Even if have_vcpu_info_placement is not set, we still need to set up
>     the runstate area on each resumed vcpu.
>     
>     Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>     Cc: Stable Kernel <stable@kernel.org>
> 
> commit be012920ecba161ad20303a3f6d9e96c58cf97c7
> Author: Ian Campbell <Ian.Campbell@citrix.com>
> Date:   Sat Nov 21 08:35:55 2009 +0800
> 
>     xen: re-register runstate area earlier on resume.
>     
>     This is necessary to ensure the runstate area is available to
>     xen_sched_clock before any calls to printk which will require it in
>     order to provide a timestamp.
>     
>     I chose to pull the xen_setup_runstate_info out of xen_time_init into
>     the caller in order to maintain parity with calling
>     xen_setup_runstate_info separately from calling xen_time_resume.
>     
>     Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
>     Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>     Cc: Stable Kernel <stable@kernel.org>
> 
> commit ae7888012969355a548372e99b066d9e31153b62
> Author: Paolo Bonzini <pbonzini@redhat.com>
> Date:   Wed Jul 8 12:27:39 2009 +0200
> 
>     xen: wait up to 5 minutes for device connetion
>     
>     Increases the device timeout from 10s to 5 minutes, giving the user a
>     visual indication during that time in case there are problems.  The patch
>     is a backport of changesets 144 and 150 in the Xenbits tree.
>     
>     Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>     Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> 
> commit f8dc33088febc63286b7a60e6b678de8e064de8e
> Author: Paolo Bonzini <pbonzini@redhat.com>
> Date:   Wed Jul 8 12:27:38 2009 +0200
> 
>     xen: improvement to wait_for_devices()
>     
>     When printing a warning about a timed-out device, print the
>     current state of both ends of the device connection (i.e., backend as
>     well as frontend).  This backports half of changeset 146 from the
>     Xenbits tree.
>     
>     Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>     Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> 
> commit c6e1971139be1342902873181f3b80a979bfb33b
> Author: Paolo Bonzini <pbonzini@redhat.com>
> Date:   Wed Jul 8 12:27:37 2009 +0200
> 
>     xen: fix is_disconnected_device/exists_disconnected_device
>     
>     The logic of is_disconnected_device/exists_disconnected_device is wrong
>     in that they are used to test whether a device is trying to connect (i.e.
>     connecting).  For this reason the patch fixes them to not consider a
>     Closing or Closed device to be connecting.  At the same time the patch
>     also renames the functions according to what they really do; you could
>     say a closed device is "disconnected" (the old name), but not "connecting"
>     (the new name).
>     
>     This patch is a backport of changeset 909 from the Xenbits tree.
>     
>     Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>     Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> 
> commit db05fed0ad72f264e39bcb366795f7367384ec92
> Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> Date:   Tue Nov 24 16:41:47 2009 -0800
> 
>     xen/xenbus: make DEVICE_ATTR()s static
>     
>     They don't need to be global, and may cause linker clashes.
>     
>     Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>     Cc: Stable Kernel <stable@kernel.org>
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Prepping a bugfix push
  2009-12-03 19:35 ` Brendan Cully
@ 2009-12-03 19:38   ` Jeremy Fitzhardinge
  2009-12-03 23:22   ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 17+ messages in thread
From: Jeremy Fitzhardinge @ 2009-12-03 19:38 UTC (permalink / raw)
  To: Ian.Campbell, konrad.wilk, pbonzini, xen-devel

On 12/03/09 11:35, Brendan Cully wrote:
> Not a patch, but I've just tried out xm save -c again with the latest
> xen changes, and while I no longer see the grant table version panic,
> the guest's devices (aside from the console) appear to be wedged on
> resume. Is anyone else seeing this?
>   

I tried it successfully the other day, but not since the most recent set
of patches.  If one of them caused this, then "xen: don't leak IRQs over
suspend/resume" looks like the most suspect.

I'm building a test kernel at the moment, so I'll do some more testing.

    J

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Prepping a bugfix push
  2009-12-03 19:35 ` Brendan Cully
  2009-12-03 19:38   ` Jeremy Fitzhardinge
@ 2009-12-03 23:22   ` Jeremy Fitzhardinge
  2009-12-04  0:24     ` Brendan Cully
  1 sibling, 1 reply; 17+ messages in thread
From: Jeremy Fitzhardinge @ 2009-12-03 23:22 UTC (permalink / raw)
  To: Ian.Campbell, konrad.wilk, pbonzini, xen-devel

On 12/03/09 11:35, Brendan Cully wrote:
> Not a patch, but I've just tried out xm save -c again with the latest
> xen changes, and while I no longer see the grant table version panic,
> the guest's devices (aside from the console) appear to be wedged on
> resume. Is anyone else seeing this?
>
> After a while on the console I see messages like this:
>
> INFO: task syslogd:2219 blocked for more than 120 seconds.
>
> which I assume is trouble with the block device.
>    

Is there any backtrace or other info associated with this?

     J

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Prepping a bugfix push
  2009-12-03 23:22   ` Jeremy Fitzhardinge
@ 2009-12-04  0:24     ` Brendan Cully
  2009-12-04  1:10       ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 17+ messages in thread
From: Brendan Cully @ 2009-12-04  0:24 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Ian.Campbell, pbonzini, xen-devel, konrad.wilk

On Thursday, 03 December 2009 at 15:22, Jeremy Fitzhardinge wrote:
> On 12/03/09 11:35, Brendan Cully wrote:
> >Not a patch, but I've just tried out xm save -c again with the latest
> >xen changes, and while I no longer see the grant table version panic,
> >the guest's devices (aside from the console) appear to be wedged on
> >resume. Is anyone else seeing this?
> >
> >After a while on the console I see messages like this:
> >
> >INFO: task syslogd:2219 blocked for more than 120 seconds.
> >
> >which I assume is trouble with the block device.
> 
> Is there any backtrace or other info associated with this?

I take it you couldn't reproduce? Let me know what you want
captured. In the meantime, here's the kernel log:

Dec  4 00:16:12 p32 kernel: suspending xenstore...
Dec  4 00:16:12 p32 kernel: Grant tables using version 2 layout.

Here's a backtrace after the block message:

INFO: task kjournald:880 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald     D 00000000     0   880      2 0x00000000
 cfb6be70 00000246 cf8fa000 00000000 00000002 00000000 c14972a0 00000001
 cf8fa000 c152f380 c152f380 c152f380 cf8fa1fc cf850b10 cf850c9c d0808380
 c1924584 00000000 cfb6be70 c1006f04 c1920284 c1924588 c1920284 cfb49854
Call Trace:
 [<c1006f04>] ? check_events+0x8/0xc
 [<c137b0a8>] io_schedule+0x1e/0x28
 [<c10c2555>] sync_buffer+0x33/0x37
 [<c137b3a1>] __wait_on_bit+0x45/0x62
 [<c10c2522>] ? sync_buffer+0x0/0x37
 [<c137b41d>] out_of_line_wait_on_bit+0x5f/0x72
 [<c10c2522>] ? sync_buffer+0x0/0x37
 [<c1049b7e>] ? wake_bit_function+0x0/0x47
 [<c10c24e5>] __wait_on_buffer+0x23/0x25
 [<c1102940>] journal_commit_transaction+0x50b/0xdeb
 [<c1006efb>] ? xen_restore_fl_direct_end+0x0/0x1
 [<c137c4f4>] ? _spin_unlock_irqrestore+0x17/0x1d
 [<c103fad4>] ? try_to_del_timer_sync+0x49/0x52
 [<c1105b66>] kjournald+0xab/0x1f0
 [<c1049b46>] ? autoremove_wake_function+0x0/0x38
 [<c1105abb>] ? kjournald+0x0/0x1f0
 [<c1049a85>] kthread+0x74/0x7f
 [<c1049a11>] ? kthread+0x0/0x7f
 [<c1009b37>] kernel_thread_helper+0x7/0x10

and here's the result of xm sysrq <dom> l
SysRq : Show backtrace of all active CPUs

and xm sysrq <dom> w

SysRq : Show Blocked State
  task                PC stack   pid father
pdflush       D cf40297c     0   150      2 0x00000000
 cf95dec8 00000246 00000000 cf40297c c1550280 c1553680 c10067df c1550280
 00000000 c152f380 c152f380 c152f380 c1550280 cf8b3130 cf8b32bc d0808380
 c1553680 ffff995d cf95dec8 c1006f04 c1550280 c1553a6c c1550280 c1006efb
Call Trace:
 [<c10067df>] ? xen_force_evtchn_callback+0xf/0x14
 [<c1006f04>] ? check_events+0x8/0xc
 [<c1006efb>] ? xen_restore_fl_direct_end+0x0/0x1
 [<c137b1cb>] schedule_timeout+0xd6/0x13e
 [<c10067df>] ? xen_force_evtchn_callback+0xf/0x14
 [<c103fc07>] ? process_timeout+0x0/0xa
 [<c137a8ea>] io_schedule_timeout+0x1e/0x28
 [<c108e9a8>] congestion_wait+0x51/0x66
 [<c1049b46>] ? autoremove_wake_function+0x0/0x38
 [<c1083ef3>] wb_kupdate+0xb3/0xfa
 [<c10842d5>] ? pdflush+0x0/0x205
 [<c10843d5>] pdflush+0x100/0x205
 [<c1006efb>] ? xen_restore_fl_direct_end+0x0/0x1
 [<c137c4f4>] ? _spin_unlock_irqrestore+0x17/0x1d
 [<c1083e40>] ? wb_kupdate+0x0/0xfa
 [<c1049a85>] kthread+0x74/0x7f
 [<c1049a11>] ? kthread+0x0/0x7f
 [<c1009b37>] kernel_thread_helper+0x7/0x10
kjournald     D 00000000     0   880      2 0x00000000
 cfb6be70 00000246 cf8fa000 00000000 00000002 00000000 c14972a0 00000001
 cf8fa000 c152f380 c152f380 c152f380 cf8fa1fc cf850b10 cf850c9c d0808380
 c1924584 00000000 cfb6be70 c1006f04 c1920284 c1924588 c1920284 cfb49854
Call Trace:
 [<c1006f04>] ? check_events+0x8/0xc
 [<c137b0a8>] io_schedule+0x1e/0x28
 [<c10c2555>] sync_buffer+0x33/0x37
 [<c137b3a1>] __wait_on_bit+0x45/0x62
 [<c10c2522>] ? sync_buffer+0x0/0x37
 [<c137b41d>] out_of_line_wait_on_bit+0x5f/0x72
 [<c10c2522>] ? sync_buffer+0x0/0x37
 [<c1049b7e>] ? wake_bit_function+0x0/0x47
 [<c10c24e5>] __wait_on_buffer+0x23/0x25
 [<c1102940>] journal_commit_transaction+0x50b/0xdeb
 [<c1006efb>] ? xen_restore_fl_direct_end+0x0/0x1
 [<c137c4f4>] ? _spin_unlock_irqrestore+0x17/0x1d
 [<c103fad4>] ? try_to_del_timer_sync+0x49/0x52
 [<c1105b66>] kjournald+0xab/0x1f0
 [<c1049b46>] ? autoremove_wake_function+0x0/0x38
 [<c1105abb>] ? kjournald+0x0/0x1f0
 [<c1049a85>] kthread+0x74/0x7f
 [<c1049a11>] ? kthread+0x0/0x7f
 [<c1009b37>] kernel_thread_helper+0x7/0x10
syslogd       D 00000000     0  2219      1 0x00000000
 cfb33e40 00000286 00000000 00000000 0000aa70 00000000 c10067df cfb7026c
 00000001 c152f380 c152f380 c152f380 00000000 cfa2a130 cfa2a2bc d0808380
 cfb33e64 cfb70254 c1006efb c137c4f4 c10282ff cfb33e58 cfb33e40 c1049d05
Call Trace:
 [<c10067df>] ? xen_force_evtchn_callback+0xf/0x14
 [<c1006efb>] ? xen_restore_fl_direct_end+0x0/0x1
 [<c137c4f4>] ? _spin_unlock_irqrestore+0x17/0x1d
 [<c10282ff>] ? __wake_up+0x3a/0x42
 [<c1049d05>] ? prepare_to_wait+0x41/0x46
 [<c110535f>] log_wait_commit+0x87/0xef
 [<c1049b46>] ? autoremove_wake_function+0x0/0x38
 [<c1100ca8>] journal_stop+0x1dc/0x29f
 [<c110156d>] ? journal_start+0x88/0xb2
 [<c11015b4>] journal_force_commit+0x1d/0x1f
 [<c10f2d8a>] ext3_force_commit+0x1c/0x22
 [<c10ed096>] ext3_write_inode+0x2d/0x3b
 [<c10bd21f>] writeback_single_inode+0x1f5/0x2be
 [<c108326f>] ? do_writepages+0x37/0x39
 [<c10bdaa1>] sync_inode+0x1f/0x37
 [<c10e9409>] ext3_sync_file+0x99/0xb0
 [<c10c06fc>] vfs_fsync+0x73/0xb1
 [<c10c075c>] do_fsync+0x22/0x32
 [<c10c078b>] sys_fsync+0xd/0xf
 [<c1009058>] sysenter_do_call+0x12/0x2c
getty         D cf402840     0  2330      1 0x00000004
 cfb23cc0 00000286 00000000 cf402840 00000000 00000000 00000000 00000000
 00000000 c152f380 c152f380 c152f380 00000000 cf8740f0 cf87427c d0808380
 cfb23d14 c19202c8 c1006efb c137c4f4 00000002 cfb23d08 cfb23cc0 c1049d05
Call Trace:
 [<c1006efb>] ? xen_restore_fl_direct_end+0x0/0x1
 [<c137c4f4>] ? _spin_unlock_irqrestore+0x17/0x1d
 [<c1049d05>] ? prepare_to_wait+0x41/0x46
 [<c1101016>] do_get_write_access+0x2ab/0x3cd
 [<c1006efb>] ? xen_restore_fl_direct_end+0x0/0x1
 [<c1049b7e>] ? wake_bit_function+0x0/0x47
 [<c1101153>] journal_get_write_access+0x1b/0x2a
 [<c10f72c2>] __ext3_journal_get_write_access+0x19/0x3f
 [<c10eab31>] ext3_reserve_inode_write+0x34/0x6a
 [<c10eab7e>] ext3_mark_inode_dirty+0x17/0x2e
 [<c10eda05>] ext3_dirty_inode+0x6b/0x6d
 [<c107ce9c>] ? file_remove_suid+0x1b/0x59
 [<c10bd91d>] __mark_inode_dirty+0x23/0xd2
 [<c10b6618>] file_update_time+0x76/0xb3
 [<c107e0b9>] __generic_file_aio_write_nolock+0x235/0x4b3
 [<c107f0e5>] ? filemap_fault+0x76/0x3b1
 [<c107e5a5>] generic_file_aio_write+0x5d/0xd1
 [<c10e9267>] ext3_file_write+0x27/0xa5
 [<c10a527a>] do_sync_write+0xcd/0x103
 [<c1049b46>] ? autoremove_wake_function+0x0/0x38
 [<c1006efb>] ? xen_restore_fl_direct_end+0x0/0x1
 [<c10d2c2c>] ? locks_free_lock+0x31/0x40
 [<c10d4103>] ? fcntl_setlk+0x3e/0x185
 [<c10a596e>] vfs_write+0x8b/0x13f
 [<c137e474>] ? do_page_fault+0x1b4/0x3a0
 [<c10a51ad>] ? do_sync_write+0x0/0x103
 [<c10a5f1d>] sys_write+0x3d/0x64
 [<c1009058>] sysenter_do_call+0x12/0x2c
Sched Debug Version: v0.09, 2.6.31.6 #8
now at 194118.700060 msecs
  .jiffies                                 : 4294940775
  .sysctl_sched_latency                    : 20.000000
  .sysctl_sched_min_granularity            : 4.000000
  .sysctl_sched_wakeup_granularity         : 5.000000
  .sysctl_sched_child_runs_first           : 0.000001
  .sysctl_sched_features                   : 113917

cpu#0, 2500.152 MHz
  .nr_running                    : 1
  .load                          : 3121
  .nr_switches                   : 17143
  .nr_load_updates               : 48354
  .nr_uninterruptible            : 4
  .next_balance                  : 4294.940896
  .curr->pid                     : 14
  .clock                         : 193919.954084
  .cpu_load[0]                   : 0
  .cpu_load[1]                   : 0
  .cpu_load[2]                   : 0
  .cpu_load[3]                   : 0
  .cpu_load[4]                   : 0

cfs_rq[0]:
  .exec_clock                    : 0.000000
  .MIN_vruntime                  : 0.000001
  .min_vruntime                  : 13296.631519
  .max_vruntime                  : 0.000001
  .spread                        : 0.000000
  .spread0                       : 0.000000
  .nr_running                    : 1
  .load                          : 3121
  .nr_spread_over                : 0

rt_rq[0]:
  .rt_nr_running                 : 0
  .rt_throttled                  : 0
  .rt_time                       : 0.000000
  .rt_runtime                    : 950.000000

runnable tasks:
            task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------
R       xenwatch    14     13276.942024       146   115               0               0               0.000000               0.000000               0.000000

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Prepping a bugfix push
  2009-12-04  0:24     ` Brendan Cully
@ 2009-12-04  1:10       ` Jeremy Fitzhardinge
  2009-12-04  4:29         ` Brendan Cully
  2009-12-04  7:46         ` Ian Campbell
  0 siblings, 2 replies; 17+ messages in thread
From: Jeremy Fitzhardinge @ 2009-12-04  1:10 UTC (permalink / raw)
  To: Ian.Campbell, konrad.wilk, pbonzini, xen-devel

On 12/03/09 16:24, Brendan Cully wrote:
> On Thursday, 03 December 2009 at 15:22, Jeremy Fitzhardinge wrote:
>    
>> On 12/03/09 11:35, Brendan Cully wrote:
>>      
>>> Not a patch, but I've just tried out xm save -c again with the latest
>>> xen changes, and while I no longer see the grant table version panic,
>>> the guest's devices (aside from the console) appear to be wedged on
>>> resume. Is anyone else seeing this?
>>>
>>> After a while on the console I see messages like this:
>>>
>>> INFO: task syslogd:2219 blocked for more than 120 seconds.
>>>
>>> which I assume is trouble with the block device.
>>>        
>> Is there any backtrace or other info associated with this?
>>      
> I take it you couldn't reproduce? Let me know what you want
> captured. In the meantime, here's the kernel log:
>    

Nope, works for me :)  Ian has been doing testing with hundreds or 
thousands of suspend/resumes, so it *can* be reliable.

That said, I am seeing some odd hangs after suspend/resume with the 
bugfix branch merged into Linux mainline...

Can you ping the domain when its in that state?

     J

> Dec  4 00:16:12 p32 kernel: suspending xenstore...
> Dec  4 00:16:12 p32 kernel: Grant tables using version 2 layout.
>
> Here's a backtrace after the block message:
>
> INFO: task kjournald:880 blocked for more than 120 seconds.
> "echo 0>  /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kjournald     D 00000000     0   880      2 0x00000000
>   cfb6be70 00000246 cf8fa000 00000000 00000002 00000000 c14972a0 00000001
>   cf8fa000 c152f380 c152f380 c152f380 cf8fa1fc cf850b10 cf850c9c d0808380
>   c1924584 00000000 cfb6be70 c1006f04 c1920284 c1924588 c1920284 cfb49854
> Call Trace:
>   [<c1006f04>] ? check_events+0x8/0xc
>   [<c137b0a8>] io_schedule+0x1e/0x28
>   [<c10c2555>] sync_buffer+0x33/0x37
>   [<c137b3a1>] __wait_on_bit+0x45/0x62
>   [<c10c2522>] ? sync_buffer+0x0/0x37
>   [<c137b41d>] out_of_line_wait_on_bit+0x5f/0x72
>   [<c10c2522>] ? sync_buffer+0x0/0x37
>   [<c1049b7e>] ? wake_bit_function+0x0/0x47
>   [<c10c24e5>] __wait_on_buffer+0x23/0x25
>   [<c1102940>] journal_commit_transaction+0x50b/0xdeb
>   [<c1006efb>] ? xen_restore_fl_direct_end+0x0/0x1
>   [<c137c4f4>] ? _spin_unlock_irqrestore+0x17/0x1d
>   [<c103fad4>] ? try_to_del_timer_sync+0x49/0x52
>   [<c1105b66>] kjournald+0xab/0x1f0
>   [<c1049b46>] ? autoremove_wake_function+0x0/0x38
>   [<c1105abb>] ? kjournald+0x0/0x1f0
>   [<c1049a85>] kthread+0x74/0x7f
>   [<c1049a11>] ? kthread+0x0/0x7f
>   [<c1009b37>] kernel_thread_helper+0x7/0x10
>
> and here's the result of xm sysrq<dom>  l
> SysRq : Show backtrace of all active CPUs
>
> and xm sysrq<dom>  w
>
> SysRq : Show Blocked State
>    task                PC stack   pid father
> pdflush       D cf40297c     0   150      2 0x00000000
>   cf95dec8 00000246 00000000 cf40297c c1550280 c1553680 c10067df c1550280
>   00000000 c152f380 c152f380 c152f380 c1550280 cf8b3130 cf8b32bc d0808380
>   c1553680 ffff995d cf95dec8 c1006f04 c1550280 c1553a6c c1550280 c1006efb
> Call Trace:
>   [<c10067df>] ? xen_force_evtchn_callback+0xf/0x14
>   [<c1006f04>] ? check_events+0x8/0xc
>   [<c1006efb>] ? xen_restore_fl_direct_end+0x0/0x1
>   [<c137b1cb>] schedule_timeout+0xd6/0x13e
>   [<c10067df>] ? xen_force_evtchn_callback+0xf/0x14
>   [<c103fc07>] ? process_timeout+0x0/0xa
>   [<c137a8ea>] io_schedule_timeout+0x1e/0x28
>   [<c108e9a8>] congestion_wait+0x51/0x66
>   [<c1049b46>] ? autoremove_wake_function+0x0/0x38
>   [<c1083ef3>] wb_kupdate+0xb3/0xfa
>   [<c10842d5>] ? pdflush+0x0/0x205
>   [<c10843d5>] pdflush+0x100/0x205
>   [<c1006efb>] ? xen_restore_fl_direct_end+0x0/0x1
>   [<c137c4f4>] ? _spin_unlock_irqrestore+0x17/0x1d
>   [<c1083e40>] ? wb_kupdate+0x0/0xfa
>   [<c1049a85>] kthread+0x74/0x7f
>   [<c1049a11>] ? kthread+0x0/0x7f
>   [<c1009b37>] kernel_thread_helper+0x7/0x10
> kjournald     D 00000000     0   880      2 0x00000000
>   cfb6be70 00000246 cf8fa000 00000000 00000002 00000000 c14972a0 00000001
>   cf8fa000 c152f380 c152f380 c152f380 cf8fa1fc cf850b10 cf850c9c d0808380
>   c1924584 00000000 cfb6be70 c1006f04 c1920284 c1924588 c1920284 cfb49854
> Call Trace:
>   [<c1006f04>] ? check_events+0x8/0xc
>   [<c137b0a8>] io_schedule+0x1e/0x28
>   [<c10c2555>] sync_buffer+0x33/0x37
>   [<c137b3a1>] __wait_on_bit+0x45/0x62
>   [<c10c2522>] ? sync_buffer+0x0/0x37
>   [<c137b41d>] out_of_line_wait_on_bit+0x5f/0x72
>   [<c10c2522>] ? sync_buffer+0x0/0x37
>   [<c1049b7e>] ? wake_bit_function+0x0/0x47
>   [<c10c24e5>] __wait_on_buffer+0x23/0x25
>   [<c1102940>] journal_commit_transaction+0x50b/0xdeb
>   [<c1006efb>] ? xen_restore_fl_direct_end+0x0/0x1
>   [<c137c4f4>] ? _spin_unlock_irqrestore+0x17/0x1d
>   [<c103fad4>] ? try_to_del_timer_sync+0x49/0x52
>   [<c1105b66>] kjournald+0xab/0x1f0
>   [<c1049b46>] ? autoremove_wake_function+0x0/0x38
>   [<c1105abb>] ? kjournald+0x0/0x1f0
>   [<c1049a85>] kthread+0x74/0x7f
>   [<c1049a11>] ? kthread+0x0/0x7f
>   [<c1009b37>] kernel_thread_helper+0x7/0x10
> syslogd       D 00000000     0  2219      1 0x00000000
>   cfb33e40 00000286 00000000 00000000 0000aa70 00000000 c10067df cfb7026c
>   00000001 c152f380 c152f380 c152f380 00000000 cfa2a130 cfa2a2bc d0808380
>   cfb33e64 cfb70254 c1006efb c137c4f4 c10282ff cfb33e58 cfb33e40 c1049d05
> Call Trace:
>   [<c10067df>] ? xen_force_evtchn_callback+0xf/0x14
>   [<c1006efb>] ? xen_restore_fl_direct_end+0x0/0x1
>   [<c137c4f4>] ? _spin_unlock_irqrestore+0x17/0x1d
>   [<c10282ff>] ? __wake_up+0x3a/0x42
>   [<c1049d05>] ? prepare_to_wait+0x41/0x46
>   [<c110535f>] log_wait_commit+0x87/0xef
>   [<c1049b46>] ? autoremove_wake_function+0x0/0x38
>   [<c1100ca8>] journal_stop+0x1dc/0x29f
>   [<c110156d>] ? journal_start+0x88/0xb2
>   [<c11015b4>] journal_force_commit+0x1d/0x1f
>   [<c10f2d8a>] ext3_force_commit+0x1c/0x22
>   [<c10ed096>] ext3_write_inode+0x2d/0x3b
>   [<c10bd21f>] writeback_single_inode+0x1f5/0x2be
>   [<c108326f>] ? do_writepages+0x37/0x39
>   [<c10bdaa1>] sync_inode+0x1f/0x37
>   [<c10e9409>] ext3_sync_file+0x99/0xb0
>   [<c10c06fc>] vfs_fsync+0x73/0xb1
>   [<c10c075c>] do_fsync+0x22/0x32
>   [<c10c078b>] sys_fsync+0xd/0xf
>   [<c1009058>] sysenter_do_call+0x12/0x2c
> getty         D cf402840     0  2330      1 0x00000004
>   cfb23cc0 00000286 00000000 cf402840 00000000 00000000 00000000 00000000
>   00000000 c152f380 c152f380 c152f380 00000000 cf8740f0 cf87427c d0808380
>   cfb23d14 c19202c8 c1006efb c137c4f4 00000002 cfb23d08 cfb23cc0 c1049d05
> Call Trace:
>   [<c1006efb>] ? xen_restore_fl_direct_end+0x0/0x1
>   [<c137c4f4>] ? _spin_unlock_irqrestore+0x17/0x1d
>   [<c1049d05>] ? prepare_to_wait+0x41/0x46
>   [<c1101016>] do_get_write_access+0x2ab/0x3cd
>   [<c1006efb>] ? xen_restore_fl_direct_end+0x0/0x1
>   [<c1049b7e>] ? wake_bit_function+0x0/0x47
>   [<c1101153>] journal_get_write_access+0x1b/0x2a
>   [<c10f72c2>] __ext3_journal_get_write_access+0x19/0x3f
>   [<c10eab31>] ext3_reserve_inode_write+0x34/0x6a
>   [<c10eab7e>] ext3_mark_inode_dirty+0x17/0x2e
>   [<c10eda05>] ext3_dirty_inode+0x6b/0x6d
>   [<c107ce9c>] ? file_remove_suid+0x1b/0x59
>   [<c10bd91d>] __mark_inode_dirty+0x23/0xd2
>   [<c10b6618>] file_update_time+0x76/0xb3
>   [<c107e0b9>] __generic_file_aio_write_nolock+0x235/0x4b3
>   [<c107f0e5>] ? filemap_fault+0x76/0x3b1
>   [<c107e5a5>] generic_file_aio_write+0x5d/0xd1
>   [<c10e9267>] ext3_file_write+0x27/0xa5
>   [<c10a527a>] do_sync_write+0xcd/0x103
>   [<c1049b46>] ? autoremove_wake_function+0x0/0x38
>   [<c1006efb>] ? xen_restore_fl_direct_end+0x0/0x1
>   [<c10d2c2c>] ? locks_free_lock+0x31/0x40
>   [<c10d4103>] ? fcntl_setlk+0x3e/0x185
>   [<c10a596e>] vfs_write+0x8b/0x13f
>   [<c137e474>] ? do_page_fault+0x1b4/0x3a0
>   [<c10a51ad>] ? do_sync_write+0x0/0x103
>   [<c10a5f1d>] sys_write+0x3d/0x64
>   [<c1009058>] sysenter_do_call+0x12/0x2c
> Sched Debug Version: v0.09, 2.6.31.6 #8
> now at 194118.700060 msecs
>    .jiffies                                 : 4294940775
>    .sysctl_sched_latency                    : 20.000000
>    .sysctl_sched_min_granularity            : 4.000000
>    .sysctl_sched_wakeup_granularity         : 5.000000
>    .sysctl_sched_child_runs_first           : 0.000001
>    .sysctl_sched_features                   : 113917
>
> cpu#0, 2500.152 MHz
>    .nr_running                    : 1
>    .load                          : 3121
>    .nr_switches                   : 17143
>    .nr_load_updates               : 48354
>    .nr_uninterruptible            : 4
>    .next_balance                  : 4294.940896
>    .curr->pid                     : 14
>    .clock                         : 193919.954084
>    .cpu_load[0]                   : 0
>    .cpu_load[1]                   : 0
>    .cpu_load[2]                   : 0
>    .cpu_load[3]                   : 0
>    .cpu_load[4]                   : 0
>
> cfs_rq[0]:
>    .exec_clock                    : 0.000000
>    .MIN_vruntime                  : 0.000001
>    .min_vruntime                  : 13296.631519
>    .max_vruntime                  : 0.000001
>    .spread                        : 0.000000
>    .spread0                       : 0.000000
>    .nr_running                    : 1
>    .load                          : 3121
>    .nr_spread_over                : 0
>
> rt_rq[0]:
>    .rt_nr_running                 : 0
>    .rt_throttled                  : 0
>    .rt_time                       : 0.000000
>    .rt_runtime                    : 950.000000
>
> runnable tasks:
>              task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
> ----------------------------------------------------------------------------------------------------------
> R       xenwatch    14     13276.942024       146   115               0               0               0.000000               0.000000               0.000000
>
>    

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Prepping a bugfix push
  2009-12-04  1:10       ` Jeremy Fitzhardinge
@ 2009-12-04  4:29         ` Brendan Cully
  2009-12-04  7:46         ` Ian Campbell
  1 sibling, 0 replies; 17+ messages in thread
From: Brendan Cully @ 2009-12-04  4:29 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Ian.Campbell, pbonzini, xen-devel, konrad.wilk

On Thursday, 03 December 2009 at 17:10, Jeremy Fitzhardinge wrote:
> On 12/03/09 16:24, Brendan Cully wrote:
> >On Thursday, 03 December 2009 at 15:22, Jeremy Fitzhardinge wrote:
> >>On 12/03/09 11:35, Brendan Cully wrote:
> >>>Not a patch, but I've just tried out xm save -c again with the latest
> >>>xen changes, and while I no longer see the grant table version panic,
> >>>the guest's devices (aside from the console) appear to be wedged on
> >>>resume. Is anyone else seeing this?
> >>>
> >>>After a while on the console I see messages like this:
> >>>
> >>>INFO: task syslogd:2219 blocked for more than 120 seconds.
> >>>
> >>>which I assume is trouble with the block device.
> >>Is there any backtrace or other info associated with this?
> >I take it you couldn't reproduce? Let me know what you want
> >captured. In the meantime, here's the kernel log:
> 
> Nope, works for me :)  Ian has been doing testing with hundreds or
> thousands of suspend/resumes, so it *can* be reliable.
> 
> That said, I am seeing some odd hangs after suspend/resume with the
> bugfix branch merged into Linux mainline...
> 
> Can you ping the domain when its in that state?

Nope. If it matters, my environment is 64-bit xen, 32-bit 2.6.18 dom0,
both the latest HG.

Here's a diff of xenstore-ls before and after xm save -c, which
doesn't say much. I don't know why that VBD ring-ref is changing
though.

--- xs-presuspend.txt	2009-12-03 20:25:51.419603196 -0800
+++ xs-postsuspend.txt	2009-12-03 20:26:20.604309769 -0800
@@ -96,7 +96,7 @@
       backend-id = "0"
       state = "4"
       backend = "/local/domain/0/backend/vbd/18/51713"
-      ring-ref = "8"
+      ring-ref = "101"
       event-channel = "9"
     vif = ""
      0 = ""
@@ -121,6 +121,7 @@
       backend = "/local/domain/0/backend/console/18/0"
    control = ""
     platform-feature-multiprocessor-suspend = "1"
+    shutdown = ""
    error = ""
    memory = ""
     target = "262144"
@@ -223,7 +224,8 @@
   on_crash = "restart"
   xend = ""
    restart_count = "0"
+   last_shutdown_reason = "suspend"
   vcpus = "1"
   vcpu_avail = "1"
   bootloader = ""
-  name = "p32"
+  name = "migrating-p32"

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Prepping a bugfix push
  2009-12-04  1:10       ` Jeremy Fitzhardinge
  2009-12-04  4:29         ` Brendan Cully
@ 2009-12-04  7:46         ` Ian Campbell
  2009-12-04 15:50           ` Ian Campbell
  1 sibling, 1 reply; 17+ messages in thread
From: Ian Campbell @ 2009-12-04  7:46 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: pbonzini, xen-devel, konrad.wilk

On Fri, 2009-12-04 at 01:10 +0000, Jeremy Fitzhardinge wrote: 
> On 12/03/09 16:24, Brendan Cully wrote:
> > On Thursday, 03 December 2009 at 15:22, Jeremy Fitzhardinge wrote:
> >    
> >> On 12/03/09 11:35, Brendan Cully wrote:
> >>      
> >>> Not a patch, but I've just tried out xm save -c again with the latest
> >>> xen changes, and while I no longer see the grant table version panic,
> >>> the guest's devices (aside from the console) appear to be wedged on
> >>> resume. Is anyone else seeing this?
> >>>
> >>> After a while on the console I see messages like this:
> >>>
> >>> INFO: task syslogd:2219 blocked for more than 120 seconds.
> >>>
> >>> which I assume is trouble with the block device.
> >>>        
> >> Is there any backtrace or other info associated with this?
> >>      
> > I take it you couldn't reproduce? Let me know what you want
> > captured. In the meantime, here's the kernel log:
> >    
> 
> Nope, works for me :)  Ian has been doing testing with hundreds or 
> thousands of suspend/resumes, so it *can* be reliable.

I've been doing regular suspend/resumes not checkpoint ones as Brendan
is doing, I did try a couple of checkpointed ones yesterday and they
failed, IIRC with a similar softlockup to this one.

> That said, I am seeing some odd hangs after suspend/resume with the 
> bugfix branch merged into Linux mainline...

Me too, my many successful iterations were with 2.6.31 and 2.6.30 (plus
the fixes). I've also seen some hangs with 2.6.32-rc8 but they don't
look much like this one.

Ian.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Prepping a bugfix push
  2009-12-03 19:26 Prepping a bugfix push Jeremy Fitzhardinge
  2009-12-03 19:35 ` Brendan Cully
@ 2009-12-04 10:45 ` Ian Campbell
  2009-12-04 14:10   ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 17+ messages in thread
From: Ian Campbell @ 2009-12-04 10:45 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Paolo Bonzini, Xen-devel, Konrad Rzeszutek Wilk

On Thu, 2009-12-03 at 19:26 +0000, Jeremy Fitzhardinge wrote:
> I'm preparing a general bugfix push for Linus, targeted at both current
> linux-2.6.git and stable.  The list of patches I have lined up (in the
> "bugfix" branch) are below.  Is there anything I've overlooked?  Are
> there any patches I've forgotten to apply altogether?

I see you've already sent the request, but the selection of save/restore
fixes looks complete to me (modulo the hangs on 2.6.32 which there are
no fixes to yet).

BTW, could these go into the next batch?

63fc31d9102c9dd2af4d8a31e8b5679f0c3d2483 Xen balloon: fix totalram_pages counting.
e6ba99241dbe62ecee5b75f103905802ad091e7a xen: try harder to balloon up under memory pressure.

Ian.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Re: Prepping a bugfix push
  2009-12-04 10:45 ` Ian Campbell
@ 2009-12-04 14:10   ` Konrad Rzeszutek Wilk
  2009-12-04 16:18     ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 17+ messages in thread
From: Konrad Rzeszutek Wilk @ 2009-12-04 14:10 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Paolo Bonzini, Jeremy Fitzhardinge, Xen-devel

On Fri, Dec 04, 2009 at 10:45:10AM +0000, Ian Campbell wrote:
> On Thu, 2009-12-03 at 19:26 +0000, Jeremy Fitzhardinge wrote:
> > I'm preparing a general bugfix push for Linus, targeted at both current
> > linux-2.6.git and stable.  The list of patches I have lined up (in the
> > "bugfix" branch) are below.  Is there anything I've overlooked?  Are
> > there any patches I've forgotten to apply altogether?

How about those three fb-defio patches? Jaya said he would happy
if you posted them to Linus.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Prepping a bugfix push
  2009-12-04  7:46         ` Ian Campbell
@ 2009-12-04 15:50           ` Ian Campbell
  2009-12-04 16:37             ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 17+ messages in thread
From: Ian Campbell @ 2009-12-04 15:50 UTC (permalink / raw)
  To: Jeremy Fitzhardinge, Brendan Cully; +Cc: xen-devel

On Fri, 2009-12-04 at 07:46 +0000, Ian Campbell wrote:
> I've been doing regular suspend/resumes not checkpoint ones as Brendan
> is doing, I did try a couple of checkpointed ones yesterday and they
> failed, IIRC with a similar softlockup to this one. 

So what is happening is that the device event channels are getting torn
down by the resume handler and never completely reinstated in the
cancelled suspend (aka checkpoint) case.

Taking blkfront as an example (although I expect netfront has a similar
issue), what see is: blkfront_resume calls blkif_free which calls
unbind_from_irqhandler, which causes EVTCHNOP_close and unbind the IRQ
from that IRQ (my patch "don't leak IRQs over suspend/resume" did not
change this behaviour in the cancelled suspend case). Then
blkfront_resume calls talk_to_backend which calls setup_blkring which
allocates a new event channel and binds it to a new IRQ. So far so good.
However because the domain never really goes away in the checkpoint case
the backend never notices that it needs to rebind to this new event
channel.

so, using lsevtchn on domain 0 I see:
        --- BEFORE.0	2009-12-04 14:47:58.000000000 +0000
        +++ AFTER.0	2009-12-04 14:48:55.000000000 +0000
        @@ -39,5 +39,5 @@
           39: VCPU 0: Virtual IRQ 3
           40: VCPU 0: Interdomain (Connected) - Remote Domain 1, Port 1
           41: VCPU 0: Interdomain (Connected) - Remote Domain 1, Port 2
        -  42: VCPU 0: Interdomain (Connected) - Remote Domain 1, Port 13
        -  43: VCPU 0: Interdomain (Connected) - Remote Domain 1, Port 14
        +  42: VCPU 0: Interdomain (Waiting connection) - Remote Domain 1
        +  43: VCPU 0: Interdomain (Waiting connection) - Remote Domain 1
and on the guest I see:
        --- BEFORE.U	2009-12-04 14:48:02.000000000 +0000
        +++ AFTER.U	2009-12-04 14:48:53.000000000 +0000
        @@ -10,5 +10,4 @@
           10: VCPU 1: IPI
           11: VCPU 1: Virtual IRQ 1
           12: VCPU 1: IPI
        -  13: VCPU 0: Interdomain (Connected) - Remote Domain 0, Port 42
        -  14: VCPU 0: Interdomain (Connected) - Remote Domain 0, Port 43
        +  13: VCPU 0: Interdomain (Waiting connection) - Remote Domain 0
(guest event channel 13 happens to be block front in both cases)

In 2.6.18 there was a separate ->suspend_cancel() callback for each
driver, called instead of the ->resume() callback in exactly these
circumstances. The cancel callback doesn't do any of the teardown, in
fact for blkfront it doesn't even exist.

(As a proof of concept, commenting out the entire contents of
blkfront_resume and netfront_resume makes checkpointing work OK for me,
at the cost of breaking regular resume, of course)

pv-ops uses the generic power management infrastructure which does not
have a concept of cancelling a suspend. Perhaps it should? Otherwise a
different solution will be required, I'm not sure what that might be yet
yet.

Ian.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Re: Prepping a bugfix push
  2009-12-04 14:10   ` Konrad Rzeszutek Wilk
@ 2009-12-04 16:18     ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 17+ messages in thread
From: Jeremy Fitzhardinge @ 2009-12-04 16:18 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Paolo Bonzini, Xen-devel, Ian Campbell

On 12/04/09 06:10, Konrad Rzeszutek Wilk wrote:
> How about those three fb-defio patches? Jaya said he would happy
> if you posted them to Linus.
>    

Yep, I have them all queued up here and a mail drafted to request a 
pull, but I tried it out and couldn't get FB working.  But it was 
failing to show the grub menu, let alone a kernel framebuffer, so I 
think the problem is on the dom0 side.  I get a blank vnc window, but no 
contents.

I'm updating my test machine's toolstack to see what exciting way that's 
going to further break things...

     J

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Prepping a bugfix push
  2009-12-04 15:50           ` Ian Campbell
@ 2009-12-04 16:37             ` Jeremy Fitzhardinge
  2009-12-04 16:56               ` Ian Campbell
  2009-12-04 17:01               ` Brendan Cully
  0 siblings, 2 replies; 17+ messages in thread
From: Jeremy Fitzhardinge @ 2009-12-04 16:37 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Brendan Cully, xen-devel

On 12/04/09 07:50, Ian Campbell wrote:
> On Fri, 2009-12-04 at 07:46 +0000, Ian Campbell wrote:
>    
>> I've been doing regular suspend/resumes not checkpoint ones as Brendan
>> is doing, I did try a couple of checkpointed ones yesterday and they
>> failed, IIRC with a similar softlockup to this one.
>>      
> So what is happening is that the device event channels are getting torn
> down by the resume handler and never completely reinstated in the
> cancelled suspend (aka checkpoint) case.
>    

Hm.

> In 2.6.18 there was a separate ->suspend_cancel() callback for each
> driver, called instead of the ->resume() callback in exactly these
> circumstances. The cancel callback doesn't do any of the teardown, in
> fact for blkfront it doesn't even exist.
>
> (As a proof of concept, commenting out the entire contents of
> blkfront_resume and netfront_resume makes checkpointing work OK for me,
> at the cost of breaking regular resume, of course)
>
> pv-ops uses the generic power management infrastructure which does not
> have a concept of cancelling a suspend. Perhaps it should? Otherwise a
> different solution will be required, I'm not sure what that might be yet
> yet.
>    

Well, the obvious one is to treat it as a full suspend followed by 
immediate resume.  That is, just remove all the special case handling 
for checkpoint, and let it do the normal resume stuff when the hypercall 
returns.

I think the PM core can fail to suspend; it just resumes anything that 
has been suspended so far.

     J

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Prepping a bugfix push
  2009-12-04 16:37             ` Jeremy Fitzhardinge
@ 2009-12-04 16:56               ` Ian Campbell
  2009-12-05  0:05                 ` Jeremy Fitzhardinge
  2009-12-04 17:01               ` Brendan Cully
  1 sibling, 1 reply; 17+ messages in thread
From: Ian Campbell @ 2009-12-04 16:56 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Brendan Cully, xen-devel

On Fri, 2009-12-04 at 16:37 +0000, Jeremy Fitzhardinge wrote: 
> On 12/04/09 07:50, Ian Campbell wrote:
> > On Fri, 2009-12-04 at 07:46 +0000, Ian Campbell wrote:
> >    
> >> I've been doing regular suspend/resumes not checkpoint ones as Brendan
> >> is doing, I did try a couple of checkpointed ones yesterday and they
> >> failed, IIRC with a similar softlockup to this one.
> >>      
> > So what is happening is that the device event channels are getting torn
> > down by the resume handler and never completely reinstated in the
> > cancelled suspend (aka checkpoint) case.
> >    
> 
> Hm.
> 
> > In 2.6.18 there was a separate ->suspend_cancel() callback for each
> > driver, called instead of the ->resume() callback in exactly these
> > circumstances. The cancel callback doesn't do any of the teardown, in
> > fact for blkfront it doesn't even exist.
> >
> > (As a proof of concept, commenting out the entire contents of
> > blkfront_resume and netfront_resume makes checkpointing work OK for me,
> > at the cost of breaking regular resume, of course)
> >
> > pv-ops uses the generic power management infrastructure which does not
> > have a concept of cancelling a suspend. Perhaps it should? Otherwise a
> > different solution will be required, I'm not sure what that might be yet
> > yet.
> >    
> 
> Well, the obvious one is to treat it as a full suspend followed by 
> immediate resume.  That is, just remove all the special case handling 
> for checkpoint, and let it do the normal resume stuff when the hypercall 
> returns.

I'm not sure how much that will help, some of the resume stuff relies on
the domain actually changing underneath, i.e. the backends are torn down
and resetup by the tools and therefore expect a fresh reconnection, the
hypervisor side of event channels is implicitly reset (the kernel just
resets its own state) etc. None of these things happen during a
checkpoint. Presumably those who are interested in checkpointing would
prefer them not to happen in order to remain fast.

> I think the PM core can fail to suspend; it just resumes anything that 
> has been suspended so far.

An optional separate hook for that case (called in preference to
->resume) might be acceptable upstream? Adding a parameter to the
->resume handler itself might also be acceptable but would involve more
churn.

Ian.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Prepping a bugfix push
  2009-12-04 16:37             ` Jeremy Fitzhardinge
  2009-12-04 16:56               ` Ian Campbell
@ 2009-12-04 17:01               ` Brendan Cully
  2009-12-04 17:12                 ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 17+ messages in thread
From: Brendan Cully @ 2009-12-04 17:01 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: xen-devel, Ian Campbell

On Friday, 04 December 2009 at 08:37, Jeremy Fitzhardinge wrote:
> On 12/04/09 07:50, Ian Campbell wrote:
> >On Fri, 2009-12-04 at 07:46 +0000, Ian Campbell wrote:
> >>I've been doing regular suspend/resumes not checkpoint ones as Brendan
> >>is doing, I did try a couple of checkpointed ones yesterday and they
> >>failed, IIRC with a similar softlockup to this one.
> >So what is happening is that the device event channels are getting torn
> >down by the resume handler and never completely reinstated in the
> >cancelled suspend (aka checkpoint) case.
> 
> Hm.
> 
> >In 2.6.18 there was a separate ->suspend_cancel() callback for each
> >driver, called instead of the ->resume() callback in exactly these
> >circumstances. The cancel callback doesn't do any of the teardown, in
> >fact for blkfront it doesn't even exist.
> >
> >(As a proof of concept, commenting out the entire contents of
> >blkfront_resume and netfront_resume makes checkpointing work OK for me,
> >at the cost of breaking regular resume, of course)
> >
> >pv-ops uses the generic power management infrastructure which does not
> >have a concept of cancelling a suspend. Perhaps it should? Otherwise a
> >different solution will be required, I'm not sure what that might be yet
> >yet.
> 
> Well, the obvious one is to treat it as a full suspend followed by
> immediate resume.  That is, just remove all the special case handling
> for checkpoint, and let it do the normal resume stuff when the
> hypercall returns.
> 
> I think the PM core can fail to suspend; it just resumes anything
> that has been suspended so far.

Hmm. I just tried changing the SUSPEND_CANCEL elfnote to 0 in pvops,
and now save -c takes a very long time. From the xend log:

[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:3025) XendDomainInfo.resumeDomain(19)
[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:2319) Destroying device model
[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:2326) Releasing devices
[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:2332) Removing vif/0
[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:1213) XendDomainInfo.destroyDevice: deviceClass = vif, device = vif/0
[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:2332) Removing vbd/51713
[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:1213) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/51713
[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:2332) Removing console/0
[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:1213) XendDomainInfo.destroyDevice: deviceClass = console, device = console/0
[2009-12-04 08:57:58 4917] INFO (XendDomainInfo:3260) Dev 51713 still active, looping...

that last line repeats for a very long time, and eventually gives
up. The domain is still broken when save completes.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Prepping a bugfix push
  2009-12-04 17:01               ` Brendan Cully
@ 2009-12-04 17:12                 ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 17+ messages in thread
From: Jeremy Fitzhardinge @ 2009-12-04 17:12 UTC (permalink / raw)
  To: Ian.Campbell, xen-devel

On 12/04/09 09:01, Brendan Cully wrote:
> Hmm. I just tried changing the SUSPEND_CANCEL elfnote to 0 in pvops,
> and now save -c takes a very long time. From the xend log:
>    

Yes, the tools are dumb.  It just assumes guests support suspend-cancel, 
aside from not actually suspending them if they don't.

     J

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Prepping a bugfix push
  2009-12-04 16:56               ` Ian Campbell
@ 2009-12-05  0:05                 ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 17+ messages in thread
From: Jeremy Fitzhardinge @ 2009-12-05  0:05 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Brendan Cully, xen-devel

On 12/04/09 08:56, Ian Campbell wrote:
> On Fri, 2009-12-04 at 16:37 +0000, Jeremy Fitzhardinge wrote:
>    
>> On 12/04/09 07:50, Ian Campbell wrote:
>>      
>>> On Fri, 2009-12-04 at 07:46 +0000, Ian Campbell wrote:
>>>
>>>        
>>>> I've been doing regular suspend/resumes not checkpoint ones as Brendan
>>>> is doing, I did try a couple of checkpointed ones yesterday and they
>>>> failed, IIRC with a similar softlockup to this one.
>>>>
>>>>          
>>> So what is happening is that the device event channels are getting torn
>>> down by the resume handler and never completely reinstated in the
>>> cancelled suspend (aka checkpoint) case.
>>>
>>>        
>> Hm.
>>
>>      
>>> In 2.6.18 there was a separate ->suspend_cancel() callback for each
>>> driver, called instead of the ->resume() callback in exactly these
>>> circumstances. The cancel callback doesn't do any of the teardown, in
>>> fact for blkfront it doesn't even exist.
>>>
>>> (As a proof of concept, commenting out the entire contents of
>>> blkfront_resume and netfront_resume makes checkpointing work OK for me,
>>> at the cost of breaking regular resume, of course)
>>>
>>> pv-ops uses the generic power management infrastructure which does not
>>> have a concept of cancelling a suspend. Perhaps it should? Otherwise a
>>> different solution will be required, I'm not sure what that might be yet
>>> yet.
>>>
>>>        
>> Well, the obvious one is to treat it as a full suspend followed by
>> immediate resume.  That is, just remove all the special case handling
>> for checkpoint, and let it do the normal resume stuff when the hypercall
>> returns.
>>      
> I'm not sure how much that will help, some of the resume stuff relies on
> the domain actually changing underneath, i.e. the backends are torn down
> and resetup by the tools and therefore expect a fresh reconnection, the
> hypervisor side of event channels is implicitly reset (the kernel just
> resets its own state) etc. None of these things happen during a
> checkpoint. Presumably those who are interested in checkpointing would
> prefer them not to happen in order to remain fast.
>    

Yes, that's certainly all possible with some biggish performance hit...

How about this: if its a checkpoint, then don't bother calling all the 
resume functions.  We may need to call the device model resume just to 
keep everyone sane and happy, but at the xenbus nodes, filter out the 
calls to the drivers.

>> I think the PM core can fail to suspend; it just resumes anything that
>> has been suspended so far.
>>      
> An optional separate hook for that case (called in preference to
> ->resume) might be acceptable upstream? Adding a parameter to the
> ->resume handler itself might also be acceptable but would involve more
> churn.
>    

The whole area is so fragile and fraught, I don't really want to get 
into it.

     J

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2009-12-05  0:05 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-12-03 19:26 Prepping a bugfix push Jeremy Fitzhardinge
2009-12-03 19:35 ` Brendan Cully
2009-12-03 19:38   ` Jeremy Fitzhardinge
2009-12-03 23:22   ` Jeremy Fitzhardinge
2009-12-04  0:24     ` Brendan Cully
2009-12-04  1:10       ` Jeremy Fitzhardinge
2009-12-04  4:29         ` Brendan Cully
2009-12-04  7:46         ` Ian Campbell
2009-12-04 15:50           ` Ian Campbell
2009-12-04 16:37             ` Jeremy Fitzhardinge
2009-12-04 16:56               ` Ian Campbell
2009-12-05  0:05                 ` Jeremy Fitzhardinge
2009-12-04 17:01               ` Brendan Cully
2009-12-04 17:12                 ` Jeremy Fitzhardinge
2009-12-04 10:45 ` Ian Campbell
2009-12-04 14:10   ` Konrad Rzeszutek Wilk
2009-12-04 16:18     ` Jeremy Fitzhardinge

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.