* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
@ 2017-08-18 14:15 ` ChristianEhrhardt
  2017-08-18 14:22 ` ChristianEhrhardt
                   ` (47 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: ChristianEhrhardt @ 2017-08-18 14:15 UTC (permalink / raw)
  To: qemu-devel

I reached out to the people involved in the initial fixes, which were related to image locking and qemu-nbd. But this might turn out to be something completely different.
So until we know better, it seems wiser to reach out to more people.

=> http://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg03465.html

** Also affects: qemu (Ubuntu)
   Importance: Undecided
       Status: New

** Also affects: qemu
   Importance: Undecided
       Status: New

** Changed in: qemu (Ubuntu)
     Assignee: (unassigned) => ChristianEhrhardt (paelzer)

** Changed in: qemu (Ubuntu)
       Status: New => Confirmed

** Changed in: qemu (Ubuntu)
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1711602

Title:
  --copy-storage-all failing with qemu 2.10

Status in QEMU:
  New
Status in libvirt package in Ubuntu:
  Confirmed
Status in qemu package in Ubuntu:
  Confirmed

Bug description:
  We fixed an issue around disk locking already in regard to qemu-nbd
  [1], but there still seem to be issues.

  $ virsh migrate --live --copy-storage-all kvmguest-artful-normal qemu+ssh://10.22.69.196/system
  error: internal error: qemu unexpectedly closed the monitor: 2017-08-18T12:10:29.800397Z qemu-system-x86_64: -chardev pty,id=charserial0: char device redirected to /dev/pts/0 (label charserial0)
  2017-08-18T12:10:48.545776Z qemu-system-x86_64: load of migration failed: Input/output error

  Source libvirt log for the guest:
  2017-08-18 12:09:08.251+0000: initiating migration
  2017-08-18T12:09:08.809023Z qemu-system-x86_64: Unable to read from socket: Connection reset by peer
  2017-08-18T12:09:08.809481Z qemu-system-x86_64: Unable to read from socket: Connection reset by peer

  Target libvirt log for the guest:
  2017-08-18T12:09:08.730911Z qemu-system-x86_64: load of migration failed: Input/output error
  2017-08-18 12:09:09.010+0000: shutting down, reason=crashed

  Given the timing, it seems that the actual copy now works (it is busy for ~10 seconds in my environment, which would be the copy).
  We also no longer see the old errors, but the migration then fails on the actual take-over.

  Dmesg shows no related denials (checked because apparmor is often in the mix).

  Need to check the libvirt logs of source [2] and target [3] in detail.

  [1]: https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg02200.html
  [2]: http://paste.ubuntu.com/25339356/
  [3]: http://paste.ubuntu.com/25339358/
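
The two log excerpts in the bug description can be lined up by timestamp to see which side reported a problem first. The following is a minimal Python sketch that uses only the two timestamps quoted above; the helper and variable names are illustrative, and the comparison is only meaningful if both hosts share a clock (a later message in the thread says both are containers on one system).

```python
from datetime import datetime, timezone

# Timestamp format used by the qemu log lines quoted in the bug description.
FMT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse(ts):
    """Parse a qemu log timestamp into an aware UTC datetime."""
    return datetime.strptime(ts, FMT).replace(tzinfo=timezone.utc)


# Target: "load of migration failed: Input/output error"
target_failed = parse("2017-08-18T12:09:08.730911Z")
# Source: first "Unable to read from socket: Connection reset by peer"
source_reset = parse("2017-08-18T12:09:08.809023Z")

delta_ms = (source_reset - target_failed).total_seconds() * 1000
print(f"target failed {delta_ms:.3f} ms before the source saw the reset")
```

In this run the target logged its load failure before the source logged the reset, which would fit the target being the side that dropped the connection, though the logs alone do not prove it.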

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1711602/+subscriptions


* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
  2017-08-18 14:15 ` [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10 ChristianEhrhardt
@ 2017-08-18 14:22 ` ChristianEhrhardt
  2017-08-18 15:03 ` ChristianEhrhardt
                   ` (46 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: ChristianEhrhardt @ 2017-08-18 14:22 UTC (permalink / raw)
  To: qemu-devel

The source log is from virsh; I need to ensure we also have a source
libvirtd log.


* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
  2017-08-18 14:15 ` [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10 ChristianEhrhardt
  2017-08-18 14:22 ` ChristianEhrhardt
@ 2017-08-18 15:03 ` ChristianEhrhardt
  2017-08-18 15:03 ` ChristianEhrhardt
                   ` (45 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: ChristianEhrhardt @ 2017-08-18 15:03 UTC (permalink / raw)
  To: qemu-devel

This is pretty reproducible here; the setup is:

- Two systems (actually two lxd containers on one system)
- Both running Artful with qemu 2.10-rc3 and libvirt 3.6
- Storage path is not shared but set up equivalently with a manual pre-copy
- Migration with --copy-storage-all is failing, no other options set, example:
  $ virsh migrate --live --copy-storage-all kvmguest-artful-normal qemu+ssh://10.22.69.100/system
- Same setup works on the qemu versions in Xenial (2.5), Yakkety (2.6), and Zesty (2.8)
- In fact it seems even a migration from a Zesty qemu (2.8) to the new one (2.10) works


* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (2 preceding siblings ...)
  2017-08-18 15:03 ` ChristianEhrhardt
@ 2017-08-18 15:03 ` ChristianEhrhardt
  2017-08-18 15:04 ` ChristianEhrhardt
                   ` (44 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: ChristianEhrhardt @ 2017-08-18 15:03 UTC (permalink / raw)
  To: qemu-devel

To simplify downloading the logs I'm attaching here a full set of:
- virsh
- source libvirtd
- target libvirtd

** Attachment added: "virsh-source.log"
   https://bugs.launchpad.net/qemu/+bug/1711602/+attachment/4934823/+files/virsh-source.log


* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (3 preceding siblings ...)
  2017-08-18 15:03 ` ChristianEhrhardt
@ 2017-08-18 15:04 ` ChristianEhrhardt
  2017-08-18 15:04 ` ChristianEhrhardt
                   ` (43 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: ChristianEhrhardt @ 2017-08-18 15:04 UTC (permalink / raw)
  To: qemu-devel

** Attachment added: "libvirtd-target.log"
   https://bugs.launchpad.net/qemu/+bug/1711602/+attachment/4934824/+files/libvirtd-target.log


* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (4 preceding siblings ...)
  2017-08-18 15:04 ` ChristianEhrhardt
@ 2017-08-18 15:04 ` ChristianEhrhardt
  2017-08-18 15:06 ` ChristianEhrhardt
                   ` (42 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: ChristianEhrhardt @ 2017-08-18 15:04 UTC (permalink / raw)
  To: qemu-devel

** Attachment added: "libvirtd-source.log"
   https://bugs.launchpad.net/qemu/+bug/1711602/+attachment/4934825/+files/libvirtd-source.log


* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (5 preceding siblings ...)
  2017-08-18 15:04 ` ChristianEhrhardt
@ 2017-08-18 15:06 ` ChristianEhrhardt
  2017-08-18 15:09 ` ChristianEhrhardt
                   ` (41 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: ChristianEhrhardt @ 2017-08-18 15:06 UTC (permalink / raw)
  To: qemu-devel

I've seen something in the logs which I want to eliminate from the list of possibilities:
  "warning: host doesn't support requested feature: CPUID.80000001H:ECX.svm [bit 2]"

We have always carried a patch, which I questioned, that enables the svm capability for guests in general. It has worked all along, but I'd have preferred it to be an explicit user opt-in.
I remember seeing the warning in the past, which made me neglect it at first, but maybe the target capability check is now stricter.

I'll drop this change for a test build and run all of this again to be sure.
I doubt that is the reason, but let me take on verifying this particular lead; please stay open to other suggestions.


* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (6 preceding siblings ...)
  2017-08-18 15:06 ` ChristianEhrhardt
@ 2017-08-18 15:09 ` ChristianEhrhardt
  2017-08-18 17:56 ` Dr. David Alan Gilbert
                   ` (40 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: ChristianEhrhardt @ 2017-08-18 15:09 UTC (permalink / raw)
  To: qemu-devel

Currently I plan to test with the svm/vmx changes disabled as well as a
cross test on ppc64 and s390x which might complete the picture.


* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (7 preceding siblings ...)
  2017-08-18 15:09 ` ChristianEhrhardt
@ 2017-08-18 17:56 ` Dr. David Alan Gilbert
  2017-08-21 14:55 ` ChristianEhrhardt
                   ` (39 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-18 17:56 UTC (permalink / raw)
  To: qemu-devel

The 'host doesn't support requested feature' warning is probably a red herring in this case.
The fact that it's failing with an I/O error but nothing else suggests either:
  a) something is closing the socket between the two qemus
  b) the I/O error is coming from storage/NBD

Assuming it fails on precopy, I'd look at the qemu_loadvm_state_section_startfull trace event to watch all the device states load.
You could also add some debug/tracing in qemu_loadvm_state to see at what point it fails.

Dave
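
For possibility (a), the source log's "Connection reset by peer" is what a reader sees when the other end tears the connection down abortively, as opposed to a clean shutdown, which shows up as end-of-file. A minimal, self-contained Python sketch of that difference on a loopback TCP socket follows. It has nothing to do with QEMU's own code, the function name is made up for illustration, and it assumes a platform where SO_LINGER takes two C ints (as on Linux).

```python
import socket
import struct


def observe_peer_close(abortive):
    """Return what a blocking reader sees when its TCP peer closes."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # any free loopback port
    listener.listen(1)
    reader = socket.create_connection(listener.getsockname())
    peer, _ = listener.accept()
    try:
        if abortive:
            # SO_LINGER enabled with a zero timeout makes close() send a
            # TCP RST instead of the usual FIN.
            peer.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                            struct.pack("ii", 1, 0))
        peer.close()
        try:
            # An empty read means orderly end-of-file.
            return "eof" if reader.recv(4096) == b"" else "data"
        except ConnectionResetError:
            return "reset"
    finally:
        reader.close()
        listener.close()


print(observe_peer_close(abortive=False))
print(observe_peer_close(abortive=True))
```

A reset is also sent when a TCP socket is closed while unread data is still queued on it, which is a plausible but unverified fit for a target process that exits in the middle of the stream.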


* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (8 preceding siblings ...)
  2017-08-18 17:56 ` Dr. David Alan Gilbert
@ 2017-08-21 14:55 ` ChristianEhrhardt
  2017-08-21 14:55 ` ChristianEhrhardt
                   ` (38 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: ChristianEhrhardt @ 2017-08-21 14:55 UTC (permalink / raw)
  To: qemu-devel

Hi David,
confirming the red herring on the cpu feature: I had a build without that patch running over the weekend, so this was easy to test, and the migration still fails.

I have about 7 seconds from kicking off the migration until the sync seems to pass its first phase, and then qemu exits (at least that is what libvirt thinks):
  "closed without SHUTDOWN event; assuming the domain crashed"


* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (9 preceding siblings ...)
  2017-08-21 14:55 ` ChristianEhrhardt
@ 2017-08-21 14:55 ` ChristianEhrhardt
  2017-08-21 14:55 ` ChristianEhrhardt
                   ` (37 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: ChristianEhrhardt @ 2017-08-21 14:55 UTC (permalink / raw)
  To: qemu-devel

Since qemu stays alive during that time, I can try to debug what happens.
Using strace to narrow down where things go wrong, I see this right before the end:
     0.000203 recvmsg(27, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="", iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 0 <0.000014>
     0.000049 futex(0xca65dacf4, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0xca4785a80, 20) = 1 <0.000016>
     0.000038 getpid()                  = 29750 <0.000023>
     0.000011 tgkill(29750, 29760, SIGUSR1) = 0 <0.000030>
     0.000012 futex(0xca4785a80, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000048>
     0.000010 futex(0xca47b46e4, FUTEX_WAIT_PRIVATE, 19, NULL) = 0 <0.002215>
     0.000032 sendmsg(21, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="{\"timestamp\": {\"seconds\": 1503322067, \"microseconds\": 613178}, \"event\": \"MIGRATION\", \"data\": {\"status\": \"failed\"}}\r\n", iov_len=116}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 116 <0.000024>
     0.000074 write(2, "2017-08-21T13:27:47.613276Z qemu-system-x86_64: load of migration failed: Input/output error\n", 93) = 93 <0.000022>
     0.000055 close(27)                 = 0 <0.000090>

Here 29750 is the main process (tgid) and 29760 is the third thread started during the migration.
It is the one that issues the vcpu ioctls, so I assume it is simply the thread representing the vcpu.
gdb should be more useful, so I'll continue with that.


* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (10 preceding siblings ...)
  2017-08-21 14:55 ` ChristianEhrhardt
@ 2017-08-21 14:55 ` ChristianEhrhardt
  2017-08-21 14:56 ` ChristianEhrhardt
                   ` (36 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: ChristianEhrhardt @ 2017-08-21 14:55 UTC (permalink / raw)
  To: qemu-devel

As David expected, when I trace process_incoming_migration_co, which prints the "readable" error, I see the error pop up in "qemu_loadvm_state".
It appears as 'Thread 4 "CPU 0/KVM" received signal SIGUSR1' and similar, which is just the breakdown of the guest.


Diving "into" qemu_loadvm_state reveals that it gets as far as "cpu_synchronize_all_pre_loadvm".
In qemu_loadvm_state none of the initial checks fail.
Then the "ret = vmstate_load_state(f, &vmstate_configuration, &savevm_state, 0);" seems to work fine as well.
The crash reproducibly seems to happen in "cpu_synchronize_all_pre_loadvm".

I can catch the incoming qemu easily with:
$ while ! pid=$(pidof qemu-system-x86_64); do /bin/true; done; gdb --pid ${pid}
# Then in gdb break on "cpu_synchronize_all_pre_loadvm"
# And when I step over it, the next thing I see is the "beginning of the end" for the process
Thread 4 "CPU 0/KVM" received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x7f418136e700 (LWP 3887)]
__lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
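
The shell loop above spins at full speed until pidof succeeds. A minimal Python sketch of the same idea, with a short sleep between polls, could look like this. It assumes a Linux /proc layout; `find_pid_by_name`, `wait_for_pid` and their parameters are hypothetical helper names and not part of QEMU, libvirt or gdb.

```python
import os
import time


def find_pid_by_name(name, proc_root="/proc"):
    """Return the first PID whose argv[0] basename equals `name`, else None."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as fh:
                argv0 = fh.read().split(b"\0")[0]
        except OSError:
            continue  # the process went away while we were scanning
        if os.path.basename(argv0).decode(errors="replace") == name:
            return int(entry)
    return None


def wait_for_pid(name, interval=0.01, timeout=60.0, finder=find_pid_by_name):
    """Poll until a process called `name` shows up; return its PID or None."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        pid = finder(name)
        if pid is not None:
            return pid
        time.sleep(interval)  # avoid burning a full core while waiting
    return None


if __name__ == "__main__":
    print(wait_for_pid("qemu-system-x86_64", timeout=1.0))
```

The returned PID could then be handed to `gdb --pid` in the same way as in the shell one-liner.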

The guest only has one vcpu, so CPU_FOREACH(cpu) is not much of a loop.
Looking down that path I tracked it along:
cpu_synchronize_all_pre_loadvm -> cpu_synchronize_pre_loadvm -> kvm_cpu_synchronize_pre_loadvm -> do_run_on_cpu

Here it queues the function "do_kvm_cpu_synchronize_pre_loadvm" onto the vcpu.
That is done via queue_work_on_cpu(cpu, &wi); which in turn eventually uses "qemu_cpu_kick_thread(cpu);"
That seems to trigger the first SIGUSR1.
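
The queue-then-kick pattern described here can be pictured with a small model. This is only a conceptual sketch in Python: the class `VCpu` and its methods are hypothetical, and the wake-up uses a condition variable where the trace suggests QEMU uses SIGUSR1 to kick the vcpu thread.

```python
import threading
from collections import deque


class VCpu:
    """Toy model of a vcpu thread that runs work items queued by other threads."""

    def __init__(self):
        self.vcpu_dirty = False
        self._work = deque()
        self._cond = threading.Condition()
        self._stop = False
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        while True:
            with self._cond:
                # Sleep until there is work or we are asked to stop.
                while not self._work and not self._stop:
                    self._cond.wait()
                if self._stop and not self._work:
                    return
                func, done = self._work.popleft()
            func(self)   # run the work item on the "vcpu" thread
            done.set()   # tell the caller that the item has finished

    def run_on_cpu(self, func):
        """Queue `func`, kick the vcpu thread and wait for completion."""
        done = threading.Event()
        with self._cond:
            self._work.append((func, done))
            self._cond.notify()  # the "kick" (a signal in the real trace)
        done.wait()

    def shutdown(self):
        with self._cond:
            self._stop = True
            self._cond.notify()
        self._thread.join()


def mark_dirty(cpu):
    cpu.vcpu_dirty = True


if __name__ == "__main__":
    cpu = VCpu()
    cpu.run_on_cpu(mark_dirty)
    print(cpu.vcpu_dirty)
    cpu.shutdown()
```

In this model every `run_on_cpu` call produces exactly one kick, which matches the observation of one SIGUSR1 per queued function in the trace.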

Following that I get the breakpoint that I set at "do_kvm_cpu_synchronize_pre_loadvm".
The actual function only sets "cpu->vcpu_dirty = true;" and works.

On the way out from there, there is a "qemu_kvm_wait_io_event" which
leads to the next SIGUSR1.

Going on I see another "do_run_on_cpu" with "vapic_do_enable_tpr_reporting" as the function.
I set a breakpoint on that as well, but dumped the full CPUState before going on:
p *cpu
$4 = {parent_obj = {parent_obj = {class = 0x5ffe7170c0, free = 0x7f62328f15a0 <g_free>, properties = 0x5ffe736e40, ref = 1, 
      parent = 0x5ffe726160}, id = 0x0, realized = true, pending_deleted_event = false, opts = 0x0, hotplugged = 0, parent_bus = 0x0, gpios = {
      lh_first = 0x0}, child_bus = {lh_first = 0x0}, num_child_bus = 0, instance_id_alias = -1, alias_required_for_version = 0}, nr_cores = 1, 
  nr_threads = 1, thread = 0x5ffe803cd0, thread_id = 8498, running = false, has_waiter = false, halt_cond = 0x5ffe803cf0, thread_kicked = true, 
  created = true, stop = false, stopped = true, unplug = false, crash_occurred = false, exit_request = false, interrupt_request = 4, 
  singlestep_enabled = 0, icount_budget = 0, icount_extra = 0, jmp_env = {{__jmpbuf = {0, 0, 0, 0, 0, 0, 0, 0}, __mask_was_saved = 0, 
      __saved_mask = {__val = {0 <repeats 16 times>}}}}, work_mutex = {lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, 
        __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, 
    initialized = true}, queued_work_first = 0x5fffefc990, queued_work_last = 0x5fffefc990, cpu_ases = 0x5ffe803c10, num_ases = 1, 
  as = 0x5ffe7f9690, memory = 0x5ffe725bd0, env_ptr = 0x5ffe7e44c0, tb_jmp_cache = {0x0 <repeats 4096 times>}, gdb_regs = 0x0, 
  gdb_num_regs = 57, gdb_num_g_regs = 57, node = {tqe_next = 0x0, tqe_prev = 0x5ffc1783f0 <cpus>}, breakpoints = {tqh_first = 0x0, 
    tqh_last = 0x5ffe7e4430}, watchpoints = {tqh_first = 0x0, tqh_last = 0x5ffe7e4440}, watchpoint_hit = 0x0, opaque = 0x0, mem_io_pc = 0, 
  mem_io_vaddr = 0, kvm_fd = 19, kvm_state = 0x5ffe7357a0, kvm_run = 0x7f62374bc000, trace_dstate_delayed = {0}, trace_dstate = {0}, 
  cpu_index = 0, halted = 1, can_do_io = 1, exception_index = -1, vcpu_dirty = true, throttle_thread_scheduled = false, icount_decr = {u32 = 0, 
    u16 = {low = 0, high = 0}}, hax_vcpu = 0x0, pending_tlb_flush = 7}

Continuing I hit the "vapic_do_enable_tpr_reporting" with qemu still
running.

Things go on; the next candidate for "do_run_on_cpu" is "kvm_apic_put".
It seems another SIGUSR1 is needed to get that kicked.
The "kvm_apic_put" breakpoint is reached after that kick.

Next for "do_run_on_cpu" is "do_kvm_cpu_synchronize_post_init", and
that triggered the fourth SIGUSR1. Before I only saw four; hopefully
that is still the same with so many breakpoints.

Checked the cpu state again:
1880    static void do_kvm_cpu_synchronize_post_init(CPUState *cpu, run_on_cpu_data arg)
1881    {
1882        kvm_arch_put_registers(cpu, KVM_PUT_FULL_STATE);
1883        cpu->vcpu_dirty = false;
1884    }
1885
(gdb) p cpu
$5 = (CPUState *) 0x5ffe7dc230
(gdb) p *cpu
$6 = {parent_obj = {parent_obj = {class = 0x5ffe7170c0, free = 0x7f62328f15a0 <g_free>, properties = 0x5ffe736e40, ref = 1, 
      parent = 0x5ffe726160}, id = 0x0, realized = true, pending_deleted_event = false, opts = 0x0, hotplugged = 0, parent_bus = 0x0, gpios = {
      lh_first = 0x0}, child_bus = {lh_first = 0x0}, num_child_bus = 0, instance_id_alias = -1, alias_required_for_version = 0}, nr_cores = 1, 
  nr_threads = 1, thread = 0x5ffe803cd0, thread_id = 8498, running = false, has_waiter = false, halt_cond = 0x5ffe803cf0, thread_kicked = false, 
  created = true, stop = false, stopped = true, unplug = false, crash_occurred = false, exit_request = false, interrupt_request = 4, 
  singlestep_enabled = 0, icount_budget = 0, icount_extra = 0, jmp_env = {{__jmpbuf = {0, 0, 0, 0, 0, 0, 0, 0}, __mask_was_saved = 0, 
      __saved_mask = {__val = {0 <repeats 16 times>}}}}, work_mutex = {lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, 
        __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, 
    initialized = true}, queued_work_first = 0x0, queued_work_last = 0x0, cpu_ases = 0x5ffe803c10, num_ases = 1, as = 0x5ffe7f9690, 
  memory = 0x5ffe725bd0, env_ptr = 0x5ffe7e44c0, tb_jmp_cache = {0x0 <repeats 4096 times>}, gdb_regs = 0x0, gdb_num_regs = 57, 
  gdb_num_g_regs = 57, node = {tqe_next = 0x0, tqe_prev = 0x5ffc1783f0 <cpus>}, breakpoints = {tqh_first = 0x0, tqh_last = 0x5ffe7e4430}, 
  watchpoints = {tqh_first = 0x0, tqh_last = 0x5ffe7e4440}, watchpoint_hit = 0x0, opaque = 0x0, mem_io_pc = 0, mem_io_vaddr = 0, kvm_fd = 19, 
  kvm_state = 0x5ffe7357a0, kvm_run = 0x7f62374bc000, trace_dstate_delayed = {0}, trace_dstate = {0}, cpu_index = 0, halted = 1, can_do_io = 1, 
  exception_index = -1, vcpu_dirty = true, throttle_thread_scheduled = false, icount_decr = {u32 = 0, u16 = {low = 0, high = 0}}, 
  hax_vcpu = 0x0, pending_tlb_flush = 7}

And from here stepping into kvm_arch_put_registers:
kvm_arch_put_registers (cpu=cpu@entry=0x5ffe7dc230, level=level@entry=3) at ./target/i386/kvm.c:2591

That still is the same vcpu as all the time, x86_cpu is optimized out
unfortunately as I had no full debug build with -O0.

I see it setting up regs in kvm_arch_put_registers without error (all ret=0) and returning to do_kvm_cpu_synchronize_post_init.
This eventually sets "cpu->vcpu_dirty = false;"

Since all of this seemed good, I stepped through the "way out" and hit another "qemu_kvm_wait_io_event(cpu);".
Not considering this critical, I stepped with "n" and qemu was gone with all its threads.

qemu_kvm_cpu_thread_fn (arg=0x5ffe7dc230) at ./cpus.c:1134
1134        } while (!cpu->unplug || cpu_can_run(cpu));
(gdb) n
1127            if (cpu_can_run(cpu)) {
(gdb) n
1133            qemu_kvm_wait_io_event(cpu);
(gdb) n
[Thread 0x7f6227857700 (LWP 8498) exited]

* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (11 preceding siblings ...)
  2017-08-21 14:55 ` ChristianEhrhardt
@ 2017-08-21 14:56 ` ChristianEhrhardt
  2017-08-21 15:46 ` Dr. David Alan Gilbert
                   ` (35 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: ChristianEhrhardt @ 2017-08-21 14:56 UTC (permalink / raw)
  To: qemu-devel

After this I was trying to start closer to the issue, so I put a break on "process_incoming_migration_co" (to skip over much of the initial setup).
Once that was hit I added "qemu_kvm_cpu_thread_fn" and "qemu_kvm_wait_io_event".

Of course when I try that the other functions do not trigger.
Maybe it is partially influenced by the debugging itself and/or the timing changes it causes.

I'll check what else I can find with slightly different debugging, but
that is the update for now.

* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (12 preceding siblings ...)
  2017-08-21 14:56 ` ChristianEhrhardt
@ 2017-08-21 15:46 ` Dr. David Alan Gilbert
  2017-08-22  8:51 ` ChristianEhrhardt
                   ` (34 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-21 15:46 UTC (permalink / raw)
  To: qemu-devel

oh yeh you want to tell gdb to ignore SIGUSR1, something like: 
  handle SIGUSR1 nostop noprint pass

* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (13 preceding siblings ...)
  2017-08-21 15:46 ` Dr. David Alan Gilbert
@ 2017-08-22  8:51 ` ChristianEhrhardt
  2017-08-22  8:54 ` ChristianEhrhardt
                   ` (33 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: ChristianEhrhardt @ 2017-08-22  8:51 UTC (permalink / raw)
  To: qemu-devel

Sure, but initially I wanted to see what is going on overall so I let it
pop up.

Started another debugging session today.
First I confirmed with
  (gdb) catch syscall exit exit_group
that this is the "normal" exit along with the error message we already knew:
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_FAILED);
     error_report("load of migration failed: %s", strerror(-ret));
     qemu_fclose(mis->from_src_file);
     exit(EXIT_FAILURE);

I found that the return value of qemu_loadvm_state is already -5.
Everything else afterwards is cleanup.
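
As a quick cross-check of how a return value of -5 turns into the "Input/output error" text via strerror(-ret), the errno number can be looked up from Python. This is only an illustration; `describe_ret` is a hypothetical helper name, and the exact message text depends on the C library.

```python
import errno
import os


def describe_ret(ret):
    """Translate a negative errno-style return value into (name, message)."""
    code = -ret
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


if __name__ == "__main__":
    name, message = describe_ret(-5)
    # On Linux, errno 5 is EIO; glibc renders it as "Input/output error".
    print(name, "-", message)
```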

Inside qemu_loadvm_state the first 2/3 pass and then that ret=-5 is from
"ret = qemu_file_get_error(f);".

* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (14 preceding siblings ...)
  2017-08-22  8:51 ` ChristianEhrhardt
@ 2017-08-22  8:54 ` ChristianEhrhardt
  2017-08-22  8:58 ` ChristianEhrhardt
                   ` (32 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: ChristianEhrhardt @ 2017-08-22  8:54 UTC (permalink / raw)
  To: qemu-devel

Via a watchpoint I found that the error is set by qemu_fill_buffer.

b qemu_loadvm_state
handle SIGUSR1 nostop noprint pass
c
# on the break check and watch the status
(gdb) p f
$1 = (QEMUFile *) 0xb9babb3c00
(gdb) p *f
$2 = {ops = 0xb9b89880a0 <channel_input_ops>, hooks = 0x0, opaque = 0xb9bbabfe00, bytes_xfer = 0, xfer_limit = 0, pos = 0, buf_index = 0, 
  buf_size = 0, buf = '\000' <repeats 32767 times>, may_free = {0}, iov = {{iov_base = 0x0, iov_len = 0} <repeats 64 times>}, iovcnt = 0, 
  last_error = 0}

# ok still no err, set watchpoint
(gdb) p &(f->last_error)
$4 = (int *) 0xb9babbc044
(gdb) watch *(int *) 0xb9babbc044
Hardware watchpoint 2: *(int *) 0xb9babbc044

# This catches the following
Thread 1 "qemu-system-x86" hit Hardware watchpoint 2: *(int *) 0xb9babbc044

Old value = 0
New value = -5
0x000000b9b82bd0ec in qemu_file_set_error (ret=-5, f=0xb9babb3c00) at ./migration/qemu-file.c:125
warning: Source file is more recent than executable.
125             f->last_error = ret;
(gdb) bt
#0  0x000000b9b82bd0ec in qemu_file_set_error (ret=-5, f=0xb9babb3c00) at ./migration/qemu-file.c:125
#1  qemu_fill_buffer (f=0xb9babb3c00) at ./migration/qemu-file.c:299
#2  0x000000b9b82bdbb1 in qemu_peek_byte (f=0xb9babb3c00, offset=0) at ./migration/qemu-file.c:553
#3  0x000000b9b82bdc1b in qemu_get_byte (f=f@entry=0xb9babb3c00) at ./migration/qemu-file.c:566
#4  0x000000b9b82b5853 in qemu_loadvm_state_main (f=f@entry=0xb9babb3c00, mis=0xb9b8a4f700 <mis_current>) at ./migration/savevm.c:1947
#5  0x000000b9b82b864f in qemu_loadvm_state (f=f@entry=0xb9babb3c00) at ./migration/savevm.c:2032
#6  0x000000b9b82af5c3 in process_incoming_migration_co (opaque=0xb9babb3c00) at ./migration/migration.c:320
#7  0x000000b9b83e42a6 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ./util/coroutine-ucontext.c:79
#8  0x00007fbf3702fac0 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#9  0x00007fffe3f9f800 in ?? ()
#10 0x0000000000000000 in ?? ()
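
The backtrace shows qemu_fill_buffer recording -5 at qemu-file.c:299. One plausible reading, which is an assumption here and not confirmed by the trace itself, is that the read from the channel returned zero bytes (the peer closed the connection) and that this end-of-stream condition is reported as -EIO. The Python sketch below models only that behaviour with a local socket pair; `FakeQEMUFile` and its methods are hypothetical stand-ins, not QEMU code.

```python
import errno
import socket


class FakeQEMUFile:
    """Toy reader that keeps a position and a sticky last_error."""

    def __init__(self, sock):
        self.sock = sock
        self.pos = 0
        self.last_error = 0

    def set_error(self, ret):
        # Assumption: only the first error is kept, later ones are ignored.
        if self.last_error == 0:
            self.last_error = ret

    def fill_buffer(self, size=32768):
        data = self.sock.recv(size)
        if len(data) > 0:
            self.pos += len(data)
        else:
            # Assumption: a zero-length read (peer closed) is mapped to -EIO.
            self.set_error(-errno.EIO)
        return len(data)


def demo(payload_len=4040):
    src, dst = socket.socketpair()
    f = FakeQEMUFile(dst)
    src.sendall(b"x" * payload_len)
    src.close()  # the sender goes away before the reader expects it
    while f.last_error == 0:
        f.fill_buffer()
    dst.close()
    return f.pos, f.last_error


if __name__ == "__main__":
    print(demo())
```

If that reading is right, the error on the target would say that the stream ended early, not that an individual read was malformed.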

* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (15 preceding siblings ...)
  2017-08-22  8:54 ` ChristianEhrhardt
@ 2017-08-22  8:58 ` ChristianEhrhardt
  2017-08-22  9:01 ` ChristianEhrhardt
                   ` (31 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: ChristianEhrhardt @ 2017-08-22  8:58 UTC (permalink / raw)
  To: qemu-devel

So this is failing I/O that iterates over a channel.
I was tracking down the len, pending and pos used.

I found that this is not completely broken (like no access or a general I/O error).
It starts at pos 0 and iterates with varying offsets, and works for quite some time.
Example:

[...]
Thread 1 "qemu-system-x86" hit Breakpoint 2, qemu_fill_buffer (f=f@entry=0xd3b66f3c00) at ./migration/qemu-file.c:295
295         if (len > 0) {
$11183 = 28728
$11184 = 4040
$11185 = {ops = 0xd3b3d740a0 <channel_input_ops>, hooks = 0x0, opaque = 0xd3b75ee490, bytes_xfer = 0, xfer_limit = 0, pos = 107130146, 
  buf_index = 0, buf_size = 4040, 
  buf = "\v\327\a\000\021\000\[...]\000"..., 
  may_free = {0}, iov = {{iov_base = 0x0, iov_len = 0} <repeats 64 times>}, iovcnt = 0, last_error = 0}
[...]

You can watch the whole file read pass by, one buffer at a time.
Yet this isn't particularly fast, so I tracked the one that has len==0:
 (gdb) b ./migration/qemu-file.c:295 if len == 0

And I got it as:
(gdb) p *f
$11195 = {ops = 0xd3b3d740a0 <channel_input_ops>, hooks = 0x0, opaque = 0xd3b75ee490, bytes_xfer = 0, xfer_limit = 0, pos = 319638837, 
  buf_index = 0, buf_size = 0, buf = '\000' <repeats 5504 times>..., may_free = {0}, iov = {{iov_base = 0x0, iov_len = 0} <repeats 64 times>}, 
  iovcnt = 0, last_error = 0}

Here pending == 0, so buf_size = 0 as well; pos has also been incremented
further, to 319638837.

Checking in detail I found other reads with pending=0 and buf_size=0, as well as non-aligned pos entries, that worked.
So I excluded buf_size=0/pending=0 as well as the alignment as reasons.
Maybe it just iterates pos out of the range that is working?

* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (16 preceding siblings ...)
  2017-08-22  8:58 ` ChristianEhrhardt
@ 2017-08-22  9:01 ` ChristianEhrhardt
  2017-08-22  9:08 ` ChristianEhrhardt
                   ` (30 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: ChristianEhrhardt @ 2017-08-22  9:01 UTC (permalink / raw)
  To: qemu-devel

(gdb) handle SIGUSR1 nostop noprint pass
(gdb) b migration/qemu-file.c:295
(gdb) command
p f->pos
c
end

That showed the pos is ever increasing and fails at an offset it never read before. Yet the absolute number was different.
$1 = 0
$2 = 8948
$3 = 41423
[...]
$11359 = 326387440
$11360 = 326420208 => This was the one failing this time


This was a different f->pos than last time, so I wondered if this would change every time.
With a less interactive gdb config I got in three tries:
1. 313153311
2. 313313376
3. 313571856
So a different f->pos to fail each time.

Different but rather close.
I wondered whether the reason I got a higher offset when tracing in more detail (printing all offsets) could be that something is still being copied/synced and only slowly becomes available.

I stepped through rather slowly and got to 322429260 this time.
So does continuing more slowly through the iterations of qemu_fill_buffer make it fail "later"?
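
To put "different but rather close" into numbers, the offsets reported in this message can be compared directly. This is plain arithmetic on the values above; the variable names are only illustrative.

```python
# Failing f->pos values from the three less interactive runs.
fast_runs = [313153311, 313313376, 313571856]
# Failing f->pos values from the slow stepping run and the run printing every offset.
slow_runs = [322429260, 326420208]

spread_fast = max(fast_runs) - min(fast_runs)
extra_slow = [pos - max(fast_runs) for pos in slow_runs]

print(spread_fast)  # 418545
print(extra_slow)   # [8857404, 12848352]
```

So the fast runs fail within roughly 0.4 MB of each other, while the slowed-down runs get about 9 MB to 13 MB further into the stream before failing.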

Finally, it is surely interesting which channel that actually is; likely the migration socket?
And yes, ioc->name in qio_channel_read is:
 $8 = 0x56ab78e5c0 "migration-socket-incoming"

* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (17 preceding siblings ...)
  2017-08-22  9:01 ` ChristianEhrhardt
@ 2017-08-22  9:08 ` ChristianEhrhardt
  2017-08-22  9:26 ` Dr. David Alan Gilbert
                   ` (29 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: ChristianEhrhardt @ 2017-08-22  9:08 UTC (permalink / raw)
  To: qemu-devel

So TL;DR summary for now:
- error triggers in qio_channel_read
- the channel is migration-socket-incoming
- reads work for a while, but then fail at high f->pos offsets (slightly different ones each time)
- slower execution seems to lead to slightly higher failing offsets
- only happens on --copy-storage-* migrations (libvirt/virsh argument)

I don't really know atm where to look deeper. Is there a good side
channel that I could use to look at what is going on on the
migration-socket-incoming, maybe from the source and target while I block in gdb?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (18 preceding siblings ...)
  2017-08-22  9:08 ` ChristianEhrhardt
@ 2017-08-22  9:26 ` Dr. David Alan Gilbert
  2017-08-22  9:51 ` Dr. David Alan Gilbert
                   ` (28 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-22  9:26 UTC (permalink / raw)
  To: qemu-devel

OK, so that looks like a real case of the migration stream failing and getting an IO error; so the question is why:
  a) Is the source qemu dying first and closing the socket?
  b) Is libvirt closing the socket for some reason?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (19 preceding siblings ...)
  2017-08-22  9:26 ` Dr. David Alan Gilbert
@ 2017-08-22  9:51 ` Dr. David Alan Gilbert
  2017-08-22 10:20 ` ChristianEhrhardt
                   ` (27 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-22  9:51 UTC (permalink / raw)
  To: qemu-devel

Also, you might want to chase it a bit further down; I think we've got:

   qemu-file-channel.c:channel_get_buffer
     io/channel-socket.c or io/channel-file.c  qio_channel_socket_readv or qio_channel_file_readv

It would be good to know what the readv/recvmsg is actually
returning in the case where it's failing.

Dave

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (20 preceding siblings ...)
  2017-08-22  9:51 ` Dr. David Alan Gilbert
@ 2017-08-22 10:20 ` ChristianEhrhardt
  2017-08-22 10:57 ` Dr. David Alan Gilbert
                   ` (26 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: ChristianEhrhardt @ 2017-08-22 10:20 UTC (permalink / raw)
  To: qemu-devel

I'll track down the actual read and then debug the source at the same time (that should be the best way to track the migration socket on both sides).
This might be slightly tricky since I don't know exactly at which offset it fails, but it seems I can safely start looking above 310*10^6.

I'll report back once I know more, thanks for your guidance David.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (21 preceding siblings ...)
  2017-08-22 10:20 ` ChristianEhrhardt
@ 2017-08-22 10:57 ` Dr. David Alan Gilbert
  2017-08-22 11:13 ` Dr. David Alan Gilbert
                   ` (25 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-22 10:57 UTC (permalink / raw)
  To: qemu-devel

Hmm, I just tried to reproduce this and hit (on the source):

main_channel_client_handle_migrate_connected: client 0x5607d785f610 connected: 0 seamless 0
qemu-system-x86_64: /root/qemu/io/channel.c:303: qio_channel_yield: Assertion `!ioc->write_coroutine' failed.
2017-08-22 10:50:04.888+0000: shutting down, reason=crashed

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (22 preceding siblings ...)
  2017-08-22 10:57 ` Dr. David Alan Gilbert
@ 2017-08-22 11:13 ` Dr. David Alan Gilbert
  2017-08-22 11:14 ` ChristianEhrhardt
                   ` (24 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-22 11:13 UTC (permalink / raw)
  To: qemu-devel

OK, 3rd try and I've hit the same behaviour as Christian.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (23 preceding siblings ...)
  2017-08-22 11:13 ` Dr. David Alan Gilbert
@ 2017-08-22 11:14 ` ChristianEhrhardt
  2017-08-22 11:18 ` ChristianEhrhardt
                   ` (23 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: ChristianEhrhardt @ 2017-08-22 11:14 UTC (permalink / raw)
  To: qemu-devel

Stack from qemu_fill_buffer to qio_channel_socket_readv
#0  qio_channel_socket_readv (ioc=<optimized out>, iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, errp=0x0)
    at ./io/channel-socket.c:477
#1  0x0000001486ec97e2 in qio_channel_read (ioc=ioc@entry=0x148a73a6c0, 
    buf=buf@entry=0x1489844c00 "\060\nLw", buflen=buflen@entry=28728, errp=errp@entry=0x0) at ./io/channel.c:112
#2  0x0000001486e005ec in channel_get_buffer (opaque=<optimized out>, 
    buf=0x1489844c00 "\060\nLw", pos=<optimized out>, size=28728) at ./migration/qemu-file-channel.c:80
#3  0x0000001486dff095 in qemu_fill_buffer (f=f@entry=0x1489843c00) at ./migration/qemu-file.c:293

I checked that the fd in recvmsg(sioc->fd, &msg, sflags) is in fact the socket.
E.g. with this fd being 27:
tcp    ESTAB      1405050 0      ::ffff:10.22.69.30:49152                   ::ffff:10.22.69.157:49804                 users:(("qemu-system-x86",pid=29273,fd=27)) ino:3345152 sk:30 <->
         skmem:(r1420644,rb1495660,t0,tb332800,f668,w0,o0,bl0,d14) ts sack cubic wscale:7,7 rto:200 rtt:0.04/0.02 ato:80 mss:8948 cwnd:10 bytes_received:1981460 segs_out:37 segs_in:247 data_segs_in:231 send 17896.0Mbps lastsnd:254728 lastrcv:250372 lastack:250372 rcv_rtt:0.205 rcv_space:115461 minrtt:0.04

I need to break on the failure of that recvmsg in qio_channel_socket_readv.
# the following does not work: due to optimization the ret value is only available later
b io/channel-socket.c:478 if ret < 0
But catching it "inside" the if works:
b io/channel-socket.c:479


Take the following with a grain of salt, this is very threaded and noisy to debug.

Once I hit it, the recvmsg returned "-1"; that was at f->pos = 311641887.
But at the same time I could confirm (via ss) that the socket itself is still open on source and target of the migration.

The -1 came with errno EAGAIN, which makes qio_channel_socket_readv return QIO_CHANNEL_ERR_BLOCK.
That seems to arrive in nbd_rwv (nbd/common.c:44)
and led to "qio_channel_yield".

There are a few coroutine switches in between, so I hope I'm not losing anything.
But that first ret<0 actually worked; it seems the yield and retry got it working.
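
To illustrate the EAGAIN path described above, here is a minimal sketch in plain POSIX C. It is not QEMU code: DEMO_ERR_BLOCK, demo_readv and demo_read_retry are hypothetical names, the sentinel value is an assumption, and poll() only stands in for the coroutine yield.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical sentinel modelled on QIO_CHANNEL_ERR_BLOCK; the value is an assumption. */
#define DEMO_ERR_BLOCK (-2)

/* One recvmsg() attempt; EAGAIN is mapped to the sentinel instead of an error. */
static ssize_t demo_readv(int fd, void *buf, size_t len)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;
    ssize_t ret;

    assert(len > 0);
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    /* Retry if a signal interrupted the call. */
    do {
        ret = recvmsg(fd, &msg, 0);
    } while (ret < 0 && errno == EINTR);

    if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        return DEMO_ERR_BLOCK;      /* no data yet, not a failure */
    }
    return ret;                     /* >0 data, 0 EOF, -1 real error */
}

/* Stand-in for "yield and retry": wait until readable, then try again. */
static ssize_t demo_read_retry(int fd, void *buf, size_t len)
{
    ssize_t ret;

    while ((ret = demo_readv(fd, buf, len)) == DEMO_ERR_BLOCK) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        poll(&pfd, 1, -1);
    }
    return ret;
}
```

On an empty non-blocking socket demo_readv() returns DEMO_ERR_BLOCK, and once the peer has written something demo_read_retry() returns the byte count. So a -1 from recvmsg alone does not say the stream is broken; only an errno other than EAGAIN (or a 0 return, i.e. EOF) does.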

I got back to qemu_fill_buffer iterating further after this.
This hit ret<0 in qio_channel_socket_readv again at f->pos 311641887.

This time, on returning the QIO_CHANNEL_ERR_BLOCK, it returned to "./migration/qemu-file-channel.c:81".
That was interesting as it is different than before.
After this it seemed to become a death spiral - recvmsg returned -1 every time (still at the same offset).
It passed back through nbd_rwv, which called qio_channel_yield multiple times.

Then it continued, and later on 321998304 is the last f->pos I saw.
It no longer passed b io/channel-socket.c:479 at all, but then led to the exit.

Hmm, I might have lost myself on the coroutine switches - but it is odd at least.
Trying to redo less interactive and with a bit more prep ...
Maybe the results are more reliable then ...

Getting back with more later ...

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (24 preceding siblings ...)
  2017-08-22 11:14 ` ChristianEhrhardt
@ 2017-08-22 11:18 ` ChristianEhrhardt
  2017-08-22 11:23 ` Dr. David Alan Gilbert
                   ` (22 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: ChristianEhrhardt @ 2017-08-22 11:18 UTC (permalink / raw)
  To: qemu-devel

Only now read comment #27, thanks David for reproducing with me, it is
somewhat relieving that you seem to see the same.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (25 preceding siblings ...)
  2017-08-22 11:18 ` ChristianEhrhardt
@ 2017-08-22 11:23 ` Dr. David Alan Gilbert
  2017-08-22 11:33 ` ChristianEhrhardt
                   ` (21 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-22 11:23 UTC (permalink / raw)
  To: qemu-devel

(4th try) breakpoint on qemu_file_set_error,  it's bdrv_inactivate_all
that's returning the error.

(gdb) list
1155	    if (inactivate_disks) {
1156	        /* Inactivate before sending QEMU_VM_EOF so that the
1157	         * bdrv_invalidate_cache_all() on the other end won't fail. */
1158	        ret = bdrv_inactivate_all();
1159	        if (ret) {
1160	            qemu_file_set_error(f, ret);
1161	            return ret;
1162	        }
1163	    }
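
The control flow in that listing can be modelled with a small standalone sketch. Everything here is hypothetical (struct demo_file and the demo_* functions are not QEMU's types or API), and the "first error wins" behaviour of the setter is an assumption of this model; the error code passed in is just a placeholder.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the migration stream object. */
struct demo_file {
    int last_error;     /* 0 while the stream is healthy */
};

/* Record an error on the stream; assumption: the first error is kept. */
static void demo_file_set_error(struct demo_file *f, int ret)
{
    assert(f != NULL);
    if (f->last_error == 0) {
        f->last_error = ret;
    }
}

static int demo_file_get_error(const struct demo_file *f)
{
    return f->last_error;
}

/*
 * Model of the completion step: inactivate_ret plays the role of the
 * value returned by bdrv_inactivate_all(). On failure the stream is
 * marked as failed and we return before any end-of-stream marker
 * would be sent.
 */
static int demo_complete(struct demo_file *f, int inactivate_disks,
                         int inactivate_ret)
{
    if (inactivate_disks) {
        int ret = inactivate_ret;
        if (ret) {
            demo_file_set_error(f, ret);
            return ret;
        }
    }
    /* The real code would go on to send QEMU_VM_EOF here. */
    return 0;
}
```

With a failing inactivate step, demo_complete() returns the error and the stream stays marked as failed, so the source side never reaches the point where it would finish the stream cleanly.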

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (26 preceding siblings ...)
  2017-08-22 11:23 ` Dr. David Alan Gilbert
@ 2017-08-22 11:33 ` ChristianEhrhardt
  2017-08-22 11:44 ` Dr. David Alan Gilbert
                   ` (20 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: ChristianEhrhardt @ 2017-08-22 11:33 UTC (permalink / raw)
  To: qemu-devel

For me qemu_file_set_error was always called from qemu_fill_buffer; interesting that it seems different for you.
I'll rerun a few times to ensure it really is always from qemu_fill_buffer for me.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (27 preceding siblings ...)
  2017-08-22 11:33 ` ChristianEhrhardt
@ 2017-08-22 11:44 ` Dr. David Alan Gilbert
  2017-08-22 11:46 ` Dr. David Alan Gilbert
                   ` (19 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-22 11:44 UTC (permalink / raw)
  To: qemu-devel

The difference with qemu_file_set_error is that I'm looking at the
source: the source is erroring and therefore closing the socket, so the
error you're seeing on the destination is real; the socket just EOF'd!
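
The destination-side symptom described above can be reproduced outside QEMU. The following is a minimal sketch in plain POSIX C, not QEMU code: it assumes a connected AF_UNIX stream socket pair stands in for the migration channel, and the function name read_after_peer_close is made up for illustration. When one end is closed without sending anything, a read on the other end returns 0, i.e. EOF.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Illustration only (hypothetical helper, not part of QEMU):
 * create a connected pair of stream sockets, close the "source" end
 * without sending anything, then read on the "destination" end.
 * Returns the result of read(): 0 means EOF, -1 means error.
 */
ssize_t read_after_peer_close(void)
{
    int sv[2];
    char buf[16];
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    close(sv[0]);                        /* the "source" gives up and closes */
    n = read(sv[1], buf, sizeof(buf));   /* the "destination" keeps reading */
    close(sv[1]);
    return n;
}
```

A return value of 0 here means the reader saw a truncated stream rather than a fault of its own, which would fit a load failure being reported on the target while the underlying problem sits on the source.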


* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (28 preceding siblings ...)
  2017-08-22 11:44 ` Dr. David Alan Gilbert
@ 2017-08-22 11:46 ` Dr. David Alan Gilbert
  2017-08-22 11:50 ` ChristianEhrhardt
                   ` (18 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-22 11:46 UTC (permalink / raw)
  To: qemu-devel

Reproduced the assert from #26:
Program received signal SIGABRT, Aborted.
0x00007f02163005f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) where
#0  0x00007f02163005f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f0216301ce8 in __GI_abort () at abort.c:90
#2  0x00007f02162f9566 in __assert_fail_base (fmt=0x7f0216449288 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x560ac0191e93 "!ioc->write_coroutine", file=file@entry=0x560ac0191e66 "/root/qemu/io/channel.c", line=line@entry=303, function=function@entry=0x560ac0191f60 <__PRETTY_FUNCTION__.22239> "qio_channel_yield")
    at assert.c:92
#3  0x00007f02162f9612 in __GI___assert_fail (assertion=assertion@entry=0x560ac0191e93 "!ioc->write_coroutine", file=file@entry=0x560ac0191e66 "/root/qemu/io/channel.c", line=line@entry=303, function=function@entry=0x560ac0191f60 <__PRETTY_FUNCTION__.22239> "qio_channel_yield") at assert.c:101
#4  0x0000560ac0036a08 in qio_channel_yield (ioc=ioc@entry=0x560ac2397c90, condition=condition@entry=G_IO_OUT)
    at /root/qemu/io/channel.c:303
#5  0x0000560ac001930e in nbd_rwv (ioc=0x560ac2397c90, iov=<optimized out>, niov=<optimized out>, length=<optimized out>, do_read=do_read@entry=false, errp=errp@entry=0x0) at /root/qemu/nbd/common.c:47
#6  0x0000560ac0007e24 in nbd_co_send_request (bs=bs@entry=0x560ac30167a0, request=request@entry=0x7f0209afc9a0, qiov=qiov@entry=0x560ac2428d68) at /root/qemu/block/nbd-client.c:154
#7  0x0000560ac0008244 in nbd_client_co_pwritev (bs=0x560ac30167a0, offset=3414163456, bytes=<optimized out>, qiov=0x560ac2428d68, flags=<optimized out>) at /root/qemu/block/nbd-client.c:260
#8  0x0000560ac00030e1 in bdrv_driver_pwritev (bs=bs@entry=0x560ac30167a0, offset=offset@entry=3414163456, bytes=bytes@entry=589824, qiov=qiov@entry=0x560ac2428d68, flags=flags@entry=0) at /root/qemu/block/io.c:877
#9  0x0000560ac0004480 in bdrv_aligned_pwritev (req=req@entry=0x7f0209afcba0, offset=offset@entry=3414163456, bytes=589824, align=align@entry=1, qiov=qiov@entry=0x560ac2428d68, flags=flags@entry=0, child=0x560ac1f0a9b0, child=0x560ac1f0a9b0) at /root/qemu/block/io.c:1382
#10 0x0000560ac0005258 in bdrv_co_pwritev (child=0x560ac1f0a9b0, offset=offset@entry=3414163456, bytes=<optimized out>, qiov=qiov@entry=0x560ac2428d68, flags=0) at /root/qemu/block/io.c:1633
#11 0x0000560abffbf564 in raw_co_pwritev (bs=0x560ac22807f0, offset=3414163456, bytes=<optimized out>, qiov=0x560ac2428d68, flags=<optimized out>) at /root/qemu/block/raw-format.c:243
#12 0x0000560ac00030e1 in bdrv_driver_pwritev (bs=bs@entry=0x560ac22807f0, offset=offset@entry=3414163456, bytes=bytes@entry=589824, qiov=qiov@entry=0x560ac2428d68, flags=flags@entry=0) at /root/qemu/block/io.c:877
#13 0x0000560ac0004480 in bdrv_aligned_pwritev (req=req@entry=0x7f0209afce70, offset=offset@entry=3414163456, bytes=589824, align=align@entry=1, qiov=qiov@entry=0x560ac2428d68, flags=flags@entry=0, child=0x560ac33c1e70, child=0x560ac33c1e70) at /root/qemu/block/io.c:1382
#14 0x0000560ac0005258 in bdrv_co_pwritev (child=0x560ac33c1e70, offset=offset@entry=3414163456, bytes=<optimized out>, bytes@entry=589824, qiov=qiov@entry=0x560ac2428d68, flags=0) at /root/qemu/block/io.c:1633
#15 0x0000560abfff5173 in blk_co_pwritev (blk=0x560ac14f31e0, offset=3414163456, bytes=589824, qiov=0x560ac2428d68, flags=<optimized out>) at /root/qemu/block/block-backend.c:1062
#16 0x0000560abfff528a in blk_aio_write_entry (opaque=0x560ac2c18f70) at /root/qemu/block/block-backend.c:1253
#17 0x0000560ac0092cca in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>)
    at /root/qemu/util/coroutine-ucontext.c:79
#18 0x00007f0216312110 in __start_context () at /lib64/libc.so.6
#19 0x00007fff12c4db30 in  ()
#20 0x0000000000000000 in  ()
(gdb)


* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (29 preceding siblings ...)
  2017-08-22 11:46 ` Dr. David Alan Gilbert
@ 2017-08-22 11:50 ` ChristianEhrhardt
  2017-08-22 15:05 ` Dr. David Alan Gilbert
                   ` (17 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: ChristianEhrhardt @ 2017-08-22 11:50 UTC (permalink / raw)
  To: qemu-devel

In 5/5 tries this was in qemu_fill_buffer in my case.

But that was on the receiving side, and what you found is closer to the root cause on the source of the migration.
I checked qemu_file_set_error on the source and can confirm your finding that on the source it comes from bdrv_inactivate_all.

#0  qemu_file_set_error (f=f@entry=0x6b76b46c00, ret=ret@entry=-1) at ./migration/qemu-file.c:124
#1  0x0000006b727140cb in qemu_savevm_state_complete_precopy (f=0x6b76b46c00, iterable_only=iterable_only@entry=false, 
    inactivate_disks=inactivate_disks@entry=true) at ./migration/savevm.c:1160
#2  0x0000006b7270c84b in migration_completion (start_time=<synthetic pointer>, old_vm_running=<synthetic pointer>, current_active_state=4, 
    s=0x6b74ef53b0) at ./migration/migration.c:1858
#3  migration_thread (opaque=0x6b74ef53b0) at ./migration/migration.c:2023
#4  0x00007f61a740e74a in start_thread (arg=0x7f61467fc700) at pthread_create.c:456
#5  0x00007f61a714acaf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Also, as I outlined in comment #6 (what seems ages ago), the migration
works for me if the source is a qemu 2.8, which would match the
assumption that the root cause is in the source.


* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (30 preceding siblings ...)
  2017-08-22 11:50 ` ChristianEhrhardt
@ 2017-08-22 15:05 ` Dr. David Alan Gilbert
  2017-08-22 15:47 ` ChristianEhrhardt
                   ` (16 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-22 15:05 UTC (permalink / raw)
  To: qemu-devel

OK, Stefan posted a patch for that assert (see 'nbd-client: avoid spurious qio_channel_yield() re-entry'), so now I'm running with the following patch and I'm seeing bdrv_inactivate return -1 for drive-virtio-disk0.
Christian: Could you see what your source says with this patch?

diff --git a/block.c b/block.c
index 3615a68..f9bd689 100644
--- a/block.c
+++ b/block.c
@@ -4078,9 +4078,11 @@ static int bdrv_inactivate_recurse(BlockDriverState *bs,
     BdrvChild *child, *parent;
     int ret;
 
+    fprintf(stderr, "%s: entry for %s\n", __func__, bdrv_get_device_or_node_name(bs));
     if (!setting_flag && bs->drv->bdrv_inactivate) {
         ret = bs->drv->bdrv_inactivate(bs);
         if (ret < 0) {
+            fprintf(stderr, "%s: exit 1(%d) for %s\n", __func__, ret, bdrv_get_device_or_node_name(bs));
             return ret;
         }
     }
@@ -4094,6 +4096,7 @@ static int bdrv_inactivate_recurse(BlockDriverState *bs,
             if (parent->role->inactivate) {
                 ret = parent->role->inactivate(parent);
                 if (ret < 0) {
+                    fprintf(stderr, "%s: exit 2(%d) for %s\n", __func__, ret, bdrv_get_device_or_node_name(bs));
                     bs->open_flags &= ~BDRV_O_INACTIVE;
                     return ret;
                 }
@@ -4109,6 +4112,7 @@ static int bdrv_inactivate_recurse(BlockDriverState *bs,
     QLIST_FOREACH(child, &bs->children, next) {
         ret = bdrv_inactivate_recurse(child->bs, setting_flag);
         if (ret < 0) {
+            fprintf(stderr, "%s: exit 3(%d) for %s\n", __func__, ret, bdrv_get_device_or_node_name(bs));
             return ret;
         }
     }
@@ -4117,6 +4121,7 @@ static int bdrv_inactivate_recurse(BlockDriverState *bs,
      * driver */
     bdrv_release_persistent_dirty_bitmaps(bs);
 
+    fprintf(stderr, "%s: exit end good for %s\n", __func__,  bdrv_get_device_or_node_name(bs));
     return 0;
 }
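
The tracing approach of the patch above can be tried in isolation. The following is a simplified sketch, not QEMU code: struct node and inactivate_recurse are hypothetical stand-ins for BlockDriverState and bdrv_inactivate_recurse, the own_result field replaces the driver callback, and the parent-role check (exit 2 in the patch) is left out. It shows how logging entry and each exit path makes the first failing node visible while the negative return value propagates up unchanged.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for a block node; not QEMU's BlockDriverState. */
struct node {
    const char *name;
    int own_result;             /* 0 on success, negative on failure */
    struct node *children[4];
    size_t n_children;
};

/*
 * Walk the tree depth-first, tracing entry and every exit path on stderr.
 * The first negative result stops the walk and is returned unchanged.
 */
int inactivate_recurse(const struct node *n)
{
    size_t i;
    int ret;

    fprintf(stderr, "%s: entry for %s\n", __func__, n->name);
    if (n->own_result < 0) {
        /* corresponds to "exit 1" in the patch: the node itself failed */
        fprintf(stderr, "%s: exit 1(%d) for %s\n", __func__,
                n->own_result, n->name);
        return n->own_result;
    }
    for (i = 0; i < n->n_children; i++) {
        ret = inactivate_recurse(n->children[i]);
        if (ret < 0) {
            /* corresponds to "exit 3" in the patch: a child failed */
            fprintf(stderr, "%s: exit 3(%d) for %s\n", __func__,
                    ret, n->name);
            return ret;
        }
    }
    fprintf(stderr, "%s: exit end good for %s\n", __func__, n->name);
    return 0;
}
```

Because every return path prints its own marker, the last line of the trace identifies both the node and the code path that produced the error.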


* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (31 preceding siblings ...)
  2017-08-22 15:05 ` Dr. David Alan Gilbert
@ 2017-08-22 15:47 ` ChristianEhrhardt
  2017-08-22 17:57 ` ChristianEhrhardt
                   ` (15 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: ChristianEhrhardt @ 2017-08-22 15:47 UTC (permalink / raw)
  To: qemu-devel

Building with the attached debug patch ...


* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (32 preceding siblings ...)
  2017-08-22 15:47 ` ChristianEhrhardt
@ 2017-08-22 17:57 ` ChristianEhrhardt
  2017-08-22 19:21 ` ChristianEhrhardt
                   ` (14 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: ChristianEhrhardt @ 2017-08-22 17:57 UTC (permalink / raw)
  To: qemu-devel

I haven't added Stefan's patch yet.
Note: the mentioned patch is at: http://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg04027.html

With your debug patch applied I get:
2017-08-22 17:57:04.486+0000: initiating migration
bdrv_inactivate_recurse: entry for drive-virtio-disk0
bdrv_inactivate_recurse: entry for #block145
bdrv_inactivate_recurse: entry for #block328
bdrv_inactivate_recurse: entry for #block210
bdrv_inactivate_recurse: exit end good for #block210
bdrv_inactivate_recurse: exit end good for #block328
bdrv_inactivate_recurse: entry for #block082
bdrv_inactivate_recurse: exit end good for #block082
bdrv_inactivate_recurse: exit end good for #block145
bdrv_inactivate_recurse: exit end good for drive-virtio-disk0
bdrv_inactivate_recurse: entry for #block763
bdrv_inactivate_recurse: entry for #block631
bdrv_inactivate_recurse: exit end good for #block631
bdrv_inactivate_recurse: exit end good for #block763
bdrv_inactivate_recurse: entry for drive-virtio-disk1
bdrv_inactivate_recurse: entry for #block544
bdrv_inactivate_recurse: entry for #block405
bdrv_inactivate_recurse: exit end good for #block405
bdrv_inactivate_recurse: exit end good for #block544
bdrv_inactivate_recurse: exit end good for drive-virtio-disk1
bdrv_inactivate_recurse: entry for #block1086
bdrv_inactivate_recurse: entry for #block919
bdrv_inactivate_recurse: exit end good for #block919
bdrv_inactivate_recurse: exit end good for #block1086
bdrv_inactivate_recurse: entry for drive-virtio-disk0
bdrv_inactivate_recurse: exit 2(-1) for drive-virtio-disk0

I'm currently building one with Stefan's patch applied as well, over
(my) night, but let me know if there is more that makes sense to try.
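
When such a trace gets long, the failing line can be picked out mechanically. The sketch below is a hypothetical helper (parse_exit_line is not part of QEMU) that matches the "exit N(ret) for node" format printed by the debug patch and ignores the entry and "exit end good" lines.

```c
#include <stdio.h>

/*
 * Parse one failure line of the form printed by the debug patch:
 *   "<func>: exit <site>(<ret>) for <node>"
 * Returns 1 and fills the outputs on a match, 0 otherwise.
 * node must point to a buffer of at least 64 bytes.
 */
int parse_exit_line(const char *line, int *site, int *ret, char *node)
{
    /* %*[^:] skips the function name; %63s bounds the node name */
    return sscanf(line, "%*[^:]: exit %d(%d) for %63s",
                  site, ret, node) == 3;
}
```

For the last line of the log above this yields site 2, ret -1 and node drive-virtio-disk0, which is the parent->role->inactivate path in the patch.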


* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (33 preceding siblings ...)
  2017-08-22 17:57 ` ChristianEhrhardt
@ 2017-08-22 19:21 ` ChristianEhrhardt
  2017-08-23  8:02 ` Dr. David Alan Gilbert
                   ` (13 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: ChristianEhrhardt @ 2017-08-22 19:21 UTC (permalink / raw)
  To: qemu-devel

With Stefan's patch and your debug patch applied on both source and target, I still run into the same issue, I'd say.
The IDs are slightly different, but they differ on every try anyway.

Still looks the same for me:
bdrv_inactivate_recurse: entry for drive-virtio-disk0
bdrv_inactivate_recurse: entry for #block184
bdrv_inactivate_recurse: entry for #block319
bdrv_inactivate_recurse: entry for #block218
bdrv_inactivate_recurse: exit end good for #block218
bdrv_inactivate_recurse: exit end good for #block319
bdrv_inactivate_recurse: entry for #block092
bdrv_inactivate_recurse: exit end good for #block092
bdrv_inactivate_recurse: exit end good for #block184
bdrv_inactivate_recurse: exit end good for drive-virtio-disk0
bdrv_inactivate_recurse: entry for #block1905
bdrv_inactivate_recurse: entry for #block1889
bdrv_inactivate_recurse: exit end good for #block1889
bdrv_inactivate_recurse: exit end good for #block1905
bdrv_inactivate_recurse: entry for drive-virtio-disk1
bdrv_inactivate_recurse: entry for #block551
bdrv_inactivate_recurse: entry for #block423
bdrv_inactivate_recurse: exit end good for #block423
bdrv_inactivate_recurse: exit end good for #block551
bdrv_inactivate_recurse: exit end good for drive-virtio-disk1
bdrv_inactivate_recurse: entry for #block2246
bdrv_inactivate_recurse: entry for #block2106
bdrv_inactivate_recurse: exit end good for #block2106
bdrv_inactivate_recurse: exit end good for #block2246
bdrv_inactivate_recurse: entry for drive-virtio-disk0
bdrv_inactivate_recurse: exit 2(-1) for drive-virtio-disk0


* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (34 preceding siblings ...)
  2017-08-22 19:21 ` ChristianEhrhardt
@ 2017-08-23  8:02 ` Dr. David Alan Gilbert
  2017-08-23 14:14 ` Stefan Hajnoczi
                   ` (12 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-23  8:02 UTC (permalink / raw)
  To: qemu-devel

OK, yeah, that's the same symptom I saw: it's that final failure that
causes bdrv_inactivate_all to return a failure and fail the source
migration.


* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (35 preceding siblings ...)
  2017-08-23  8:02 ` Dr. David Alan Gilbert
@ 2017-08-23 14:14 ` Stefan Hajnoczi
  2017-08-23 14:41 ` Dr. David Alan Gilbert
                   ` (11 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Stefan Hajnoczi @ 2017-08-23 14:14 UTC (permalink / raw)
  To: qemu-devel

Please see Fam's patch series "[PATCH for-2.10 0/4] block: Fix
non-shared storage migration", which fixes this issue.


* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (36 preceding siblings ...)
  2017-08-23 14:14 ` Stefan Hajnoczi
@ 2017-08-23 14:41 ` Dr. David Alan Gilbert
  2017-08-23 14:46 ` ChristianEhrhardt
                   ` (10 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-23 14:41 UTC (permalink / raw)
  To: qemu-devel

Yes, seems to fix it for me.

Thanks Christian for filing this; we probably wouldn't have spotted it before the release without it
(which the test Stefan has just added will hopefully cure!).


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (37 preceding siblings ...)
  2017-08-23 14:41 ` Dr. David Alan Gilbert
@ 2017-08-23 14:46 ` ChristianEhrhardt
  2017-08-23 14:55 ` ChristianEhrhardt
                   ` (9 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: ChristianEhrhardt @ 2017-08-23 14:46 UTC (permalink / raw)
  To: qemu-devel

Hi Stefan,
I was involved in the report behind the series "[PATCH for-2.10 0/4] block: Fix non-shared storage migration", but this failure happens on rc3, which already contains that series.

AFAIK Fam's series is:
dd7fdaad iotests: Add non-shared storage migration case 192 (Fam)
5f7772c4 block-backend: Defer shared_perm tightening migration completion (Fam)
3dff24f2 nbd: Fix order of bdrv_set_perm and bdrv_invalidate_cache (Kevin)
80adf54e stubs: Add vm state change handler stubs (Fam)

All of these got into v2.10.0-rc3, which these tests are already based on.
IMHO this is not yet completely fixed for qemu 2.10 and is a regression since 2.9 (well, since 2.8, as I haven't tested 2.9 personally).


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (38 preceding siblings ...)
  2017-08-23 14:46 ` ChristianEhrhardt
@ 2017-08-23 14:55 ` ChristianEhrhardt
  2017-08-23 15:13   ` Eric Blake
  2017-08-23 16:56 ` ChristianEhrhardt
                   ` (8 subsequent siblings)
  48 siblings, 1 reply; 50+ messages in thread
From: ChristianEhrhardt @ 2017-08-23 14:55 UTC (permalink / raw)
  To: qemu-devel

Ok, clarified with Stefanha
It has exactly the same title as a series of 18th August which was related to a similar issue.
It is about an hour old now on qemu-devel, quoting

"This fixes the issue reported as
https://bugs.launchpad.net/bugs/1711602

Fam Zheng (3):
  block-backend: Refactor inactivate check
  block-backend: Allow more "can inactivate" cases
  mirror: Mark target BB as "force allow inactivate"

Stefan Hajnoczi (1):
  block: Update open_flags after ->inactivate() callback"


I'll prep a build with that and test as well


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
  2017-08-23 14:55 ` ChristianEhrhardt
@ 2017-08-23 15:13   ` Eric Blake
  0 siblings, 0 replies; 50+ messages in thread
From: Eric Blake @ 2017-08-23 15:13 UTC (permalink / raw)
  To: Bug 1711602, qemu-devel

On 08/23/2017 09:55 AM, ChristianEhrhardt wrote:
> Ok, clarified with Stefanha
> It has exactly the same title as a series of 18th August which was related to a similar issue.
> It is about an hour old now on qemu-devel, quoting
> 
> "This fixes the issue reported as
> https://bugs.launchpad.net/bugs/1711602
> 
> Fam Zheng (3):
>   block-backend: Refactor inactivate check
>   block-backend: Allow more "can inactivate" cases
>   mirror: Mark target BB as "force allow inactivate"
> 
> Stefan Hajnoczi (1):
>   block: Update open_flags after ->inactivate() callback"
> 
> 
> I'll prep a build with that and test as well

Here's what is brewing for my pull request, although if you can
successfully test things, I'm happy to add a Tested-by: tag before
actually sending the pull request:

git fetch  git://repo.or.cz/qemu/ericb.git nbd

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (39 preceding siblings ...)
  2017-08-23 14:55 ` ChristianEhrhardt
@ 2017-08-23 16:56 ` ChristianEhrhardt
  2017-08-23 17:01 ` ChristianEhrhardt
                   ` (7 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: ChristianEhrhardt @ 2017-08-23 16:56 UTC (permalink / raw)
  To: qemu-devel

Hmm,
it gets further but still cannot complete this kind of migration:

$ virsh migrate --live --copy-storage-all kvmguest-artful-normal qemu+ssh://10.22.69.30/system

Source:
2017-08-23 16:49:23.022+0000: initiating migration
Unexpected error in bdrv_check_perm() at /build/qemu-VjSgVJ/qemu-2.10~rc3+dfsg/block.c:1574:
2017-08-23T16:49:23.203181Z qemu-system-x86_64: Block node is read-only
2017-08-23 16:49:23.762+0000: shutting down, reason=crashed

Target:
2017-08-23T16:49:23.495478Z qemu-system-x86_64: Failed to load virtio_pci/modern_state:modern_state
2017-08-23T16:49:23.495505Z qemu-system-x86_64: Failed to load virtio/extra_state:extra_state
2017-08-23T16:49:23.495510Z qemu-system-x86_64: Failed to load virtio-balloon:virtio
2017-08-23T16:49:23.495515Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:06.0/virtio-balloon'
2017-08-23T16:49:23.496071Z qemu-system-x86_64: load of migration failed: Input/output error
2017-08-23 16:49:23.797+0000: shutting down, reason=crashed

I was too eager to get this close to real, so I don't have David's fprintfs applied anymore. I'll build those and then run it in the debugger, but until then what I can see is that the behavior changes slightly (for the worse).
It now crashes the guest on the source as well when aborting the migration.

I need to debug to confirm, but it seems it still aborts the migration
 -> qemu-system-x86_64: load of migration failed: Input/output error
But then can't fall back to the source and crashes at
 -> qemu-system-x86_64: Block node is read-only
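
To check the order of events across both hosts, the two libvirt logs can be merged by timestamp. Below is a minimal Python sketch, assuming the two timestamp formats seen in the excerpts above and assuming that both hosts' clocks are in sync (nothing in this thread confirms that). parse_line and merge_logs are names made up for this illustration.

```python
import re
from datetime import datetime, timezone

# Two timestamp styles appear in the quoted logs:
#   libvirt: "2017-08-23 16:49:23.022+0000: initiating migration"
#   qemu:    "2017-08-23T16:49:23.203181Z qemu-system-x86_64: ..."
_STAMP = re.compile(
    r"^(\d{4}-\d{2}-\d{2})[ T](\d{2}:\d{2}:\d{2})\.(\d+)(?:Z|\+0000):?\s*(.*)$"
)


def parse_line(line):
    """Return (timestamp, message), or None if the line has no timestamp."""
    m = _STAMP.match(line.strip())
    if m is None:
        return None
    date, clock, frac, message = m.groups()
    # Normalise the fractional part to microseconds (3 or 6 digits in the logs).
    micros = int(frac.ljust(6, "0")[:6])
    stamp = datetime.strptime(f"{date} {clock}", "%Y-%m-%d %H:%M:%S")
    return stamp.replace(microsecond=micros, tzinfo=timezone.utc), message


def merge_logs(**logs):
    """Merge named logs (host=text) into one list sorted by timestamp."""
    events = []
    for host, text in logs.items():
        for line in text.splitlines():
            parsed = parse_line(line)
            if parsed is not None:
                events.append((parsed[0], host, parsed[1]))
    return sorted(events)


if __name__ == "__main__":
    source = (
        "2017-08-23 16:49:23.022+0000: initiating migration\n"
        "2017-08-23T16:49:23.203181Z qemu-system-x86_64: Block node is read-only\n"
    )
    target = (
        "2017-08-23T16:49:23.496071Z qemu-system-x86_64: "
        "load of migration failed: Input/output error\n"
    )
    for stamp, host, message in merge_logs(source=source, target=target):
        print(stamp.strftime("%H:%M:%S.%f"), host, message)
```

Lines without a timestamp, such as the "Unexpected error in bdrv_check_perm()" line, are skipped and need to be matched up by hand.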


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (40 preceding siblings ...)
  2017-08-23 16:56 ` ChristianEhrhardt
@ 2017-08-23 17:01 ` ChristianEhrhardt
  2017-08-23 18:10 ` Dr. David Alan Gilbert
                   ` (6 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: ChristianEhrhardt @ 2017-08-23 17:01 UTC (permalink / raw)
  To: qemu-devel

That was rc3 plus:
- nbd-client-avoid-spurious-qio_channel_yield.patch
- the four patches mentioned in comment #43

I could also rebase onto master + patches, or onto rc4 if there is one soon.
For now I am building with David's debug statements applied again to check whether we still abort around that assert.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (41 preceding siblings ...)
  2017-08-23 17:01 ` ChristianEhrhardt
@ 2017-08-23 18:10 ` Dr. David Alan Gilbert
  2017-08-23 18:55 ` Dr. David Alan Gilbert
                   ` (5 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-23 18:10 UTC (permalink / raw)
  To: qemu-devel

I need to recheck with that combo - I'd seen that error but only when
I'd commented out 'if (!blk->dev && !blk_name(blk)[0]) {'  when
debugging earlier.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (42 preceding siblings ...)
  2017-08-23 18:10 ` Dr. David Alan Gilbert
@ 2017-08-23 18:55 ` Dr. David Alan Gilbert
  2017-08-23 19:06 ` Dr. David Alan Gilbert
                   ` (4 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-23 18:55 UTC (permalink / raw)
  To: qemu-devel

Looks good here, just retested.

Here's the top of my git tree:

f89f59fad5119f878aaedf711af90802ddcb99c7 nbd-client: avoid spurious qio_channel_yield() re-entry
cf26039a2b50f078b4ad90b88eea5bb28971c0d8 block: Update open_flags after ->inactivate() callback
8ccc527d84ec9a5052cfae19edbc44abb5ac03ae mirror: Mark target BB as "force allow inactivate"
34c3f17c99a43f261560edbd3da1188dd0c398ab block-backend: Allow more "can inactivate" cases
952ad9fd9dd43e92016d5bfc0ff93bdeaec13bf9 block-backend: Refactor inactivate check
1f296733876434118fd766cfef5eb6f29ecab6a8 Update version for v2.10.0-rc3 release


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (43 preceding siblings ...)
  2017-08-23 18:55 ` Dr. David Alan Gilbert
@ 2017-08-23 19:06 ` Dr. David Alan Gilbert
  2017-08-23 19:10 ` ChristianEhrhardt
                   ` (3 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-23 19:06 UTC (permalink / raw)
  To: qemu-devel

just tested current head - 1eed33994e28d4a0437ba6e944bbc3ec5e4a29a0 -
seems to work for me.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (44 preceding siblings ...)
  2017-08-23 19:06 ` Dr. David Alan Gilbert
@ 2017-08-23 19:10 ` ChristianEhrhardt
  2017-08-23 21:02 ` ChristianEhrhardt
                   ` (2 subsequent siblings)
  48 siblings, 0 replies; 50+ messages in thread
From: ChristianEhrhardt @ 2017-08-23 19:10 UTC (permalink / raw)
  To: qemu-devel

Yeah, seems to be slightly different from the former assert.

2017-08-23 18:41:54.556+0000: initiating migration
bdrv_inactivate_recurse: entry for drive-virtio-disk0
bdrv_inactivate_recurse: entry for #block133
bdrv_inactivate_recurse: entry for #block329
bdrv_inactivate_recurse: entry for #block202
bdrv_inactivate_recurse: exit end good for #block202
bdrv_inactivate_recurse: exit end good for #block329
bdrv_inactivate_recurse: entry for #block025
bdrv_inactivate_recurse: exit end good for #block025
bdrv_inactivate_recurse: exit end good for #block133
bdrv_inactivate_recurse: exit end good for drive-virtio-disk0
bdrv_inactivate_recurse: entry for #block799
bdrv_inactivate_recurse: entry for #block626
bdrv_inactivate_recurse: exit end good for #block626
bdrv_inactivate_recurse: exit end good for #block799
bdrv_inactivate_recurse: entry for drive-virtio-disk1
bdrv_inactivate_recurse: entry for #block570
bdrv_inactivate_recurse: entry for #block485
bdrv_inactivate_recurse: exit end good for #block485
bdrv_inactivate_recurse: exit end good for #block570
bdrv_inactivate_recurse: exit end good for drive-virtio-disk1
bdrv_inactivate_recurse: entry for #block1058
bdrv_inactivate_recurse: entry for #block920
bdrv_inactivate_recurse: exit end good for #block920
bdrv_inactivate_recurse: exit end good for #block1058
bdrv_inactivate_recurse: entry for drive-virtio-disk0
Unexpected error in bdrv_check_perm() at /build/qemu-0OVYHF/qemu-2.10~rc3+dfsg/block.c:1574:
2017-08-23T18:41:54.730131Z qemu-system-x86_64: Block node is read-only
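
The trace can be checked mechanically for a node that was entered but never left. Below is a small Python sketch, assuming the debug output keeps exactly the "entry for" / "exit end good for" wording shown above (other exit messages from the debug patch, if there are any, would need to be added). unfinished_nodes is a hypothetical helper name.

```python
import re

# Matches the two kinds of debug lines that appear in the trace above.
_TRACE = re.compile(
    r"^bdrv_inactivate_recurse: (entry for|exit end good for) (\S+)$"
)


def unfinished_nodes(trace_text):
    """Return node names that logged an entry but no matching exit, in order."""
    open_nodes = []
    for line in trace_text.splitlines():
        m = _TRACE.match(line.strip())
        if m is None:
            continue  # not a trace line (timestamps, error messages, ...)
        kind, node = m.groups()
        if kind == "entry for":
            open_nodes.append(node)
            continue
        # An exit closes the most recent still-open entry for the same node.
        for i in range(len(open_nodes) - 1, -1, -1):
            if open_nodes[i] == node:
                del open_nodes[i]
                break
    return open_nodes
```

Fed the trace above, it returns ['drive-virtio-disk0']: the second pass over drive-virtio-disk0 is the one that does not complete.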

That error comes from this check in block.c:
1553 /*
1554  * Check whether permissions on this node can be changed in a way that
1555  * @cumulative_perms and @cumulative_shared_perms are the new cumulative
1556  * permissions of all its parents. This involves checking whether all necessary
1557  * permission changes to child nodes can be performed.
1558  *
1559  * A call to this function must always be followed by a call to bdrv_set_perm()
1560  * or bdrv_abort_perm_update().
1561  */
1562 static int bdrv_check_perm(BlockDriverState *bs, uint64_t cumulative_perms,
1563                            uint64_t cumulative_shared_perms,
1564                            GSList *ignore_children, Error **errp)
1565 {
1566     BlockDriver *drv = bs->drv;
1567     BdrvChild *c;
1568     int ret;
1569
1570     /* Write permissions never work with read-only images */
1571     if ((cumulative_perms & (BLK_PERM_WRITE | BLK_PERM_WRITE_UNCHANGED)) &&
1572         !bdrv_is_writable(bs))
1573     {
1574         error_setg(errp, "Block node is read-only");
1575         return -EPERM;
1576     }
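
The "read-only" wording may be misleading here. It appears that bdrv_is_writable() in trees of this period returns false not only for a node opened read-only but also for one that is currently marked inactive. Below is a Python sketch of that test, assuming the flag values BDRV_O_RDWR = 0x0002 and BDRV_O_INACTIVE = 0x0800 from include/block/block.h (quoted from memory, so please check them against the tree being debugged). decode_open_flags and is_writable are illustrative names, not QEMU functions.

```python
# Assumed flag values (from memory of QEMU's include/block/block.h around
# 2.10); verify them against the actual source tree before relying on this.
BDRV_O_RDWR = 0x0002
BDRV_O_INACTIVE = 0x0800

_KNOWN_FLAGS = {
    "BDRV_O_RDWR": BDRV_O_RDWR,
    "BDRV_O_INACTIVE": BDRV_O_INACTIVE,
}


def decode_open_flags(open_flags):
    """Name the known bits in open_flags; leftover bits are returned as hex."""
    names = []
    remaining = open_flags
    for name, bit in _KNOWN_FLAGS.items():
        if open_flags & bit:
            names.append(name)
            remaining &= ~bit
    if remaining:
        names.append(hex(remaining))
    return names


def is_writable(read_only, open_flags):
    """Model of the writability test as assumed here: a node counts as
    writable only if it is not read-only and not marked inactive."""
    return (not read_only) and not (open_flags & BDRV_O_INACTIVE)
```

With the values from the gdb dump in this message (read_only = false, open_flags = 2050), decode_open_flags gives ['BDRV_O_RDWR', 'BDRV_O_INACTIVE'] and is_writable is False under these assumptions, which would be consistent with the error being raised for a node that is not actually read-only.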

I added debug symbols to see in gdb which device that actually is.
I don't know what you might need, so here is the full struct:

(gdb) p *bs
$2 = {open_flags = 2050, read_only = false, encrypted = false, sg = false, probed = false, force_share = false, implicit = true, 
  drv = 0x1a67219800 <bdrv_mirror_top>, opaque = 0x0, aio_context = 0x1a684ae0d0, aio_notifiers = {lh_first = 0x1a6a4850e0}, 
  walking_aio_notifiers = false, filename = "/var/lib/uvtool/libvirt/images/kvmguest-artful-normal.qcow", '\000' <repeats 4037 times>, 
  backing_file = "/var/lib/uvtool/libvirt/images/kvmguest-artful-normal.qcow", '\000' <repeats 4037 times>, 
  backing_format = "qcow2\000\000\000\000\000\000\000\000\000\000", full_open_options = 0x0, 
  exact_filename = "/var/lib/uvtool/libvirt/images/kvmguest-artful-normal.qcow", '\000' <repeats 4037 times>, backing = 0x1a6971a4a0, 
  file = 0x0, bl = {request_alignment = 1, max_pdiscard = 0, pdiscard_alignment = 0, max_pwrite_zeroes = 0, pwrite_zeroes_alignment = 0, 
    opt_transfer = 0, max_transfer = 0, min_mem_alignment = 512, opt_mem_alignment = 4096, max_iov = 1024}, supported_write_flags = 0, 
  supported_zero_flags = 0, node_name = "#block814", '\000' <repeats 22 times>, node_list = {tqe_next = 0x1a684b44d0, tqe_prev = 0x1a6b02e0c0}, 
  bs_list = {tqe_next = 0x1a6a010030, tqe_prev = 0x1a6ab6bc50}, monitor_list = {tqe_next = 0x0, tqe_prev = 0x0}, refcnt = 3, op_blockers = {{
      lh_first = 0x1a69e18e80}, {lh_first = 0x1a69e18ea0}, {lh_first = 0x1a69e18ec0}, {lh_first = 0x1a69e18ee0}, {lh_first = 0x1a69e18f00}, {
      lh_first = 0x0}, {lh_first = 0x1a69e18f40}, {lh_first = 0x1a69e18f60}, {lh_first = 0x1a69e18f80}, {lh_first = 0x1a69e18fa0}, {
      lh_first = 0x1a6989be30}, {lh_first = 0x1a69e18fc0}, {lh_first = 0x1a69e18fe0}, {lh_first = 0x1a69352e90}, {lh_first = 0x1a69352eb0}, {
      lh_first = 0x1a69352ed0}}, job = 0x1a69e18bf0, inherits_from = 0x0, children = {lh_first = 0x1a6971a4a0}, parents = {
    lh_first = 0x1a69e18e00}, options = 0x1a69b636a0, explicit_options = 0x1a69e16bb0, detect_zeroes = BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF, 
  backing_blocker = 0x1a686e2e00, total_sectors = 16777216, before_write_notifiers = {notifiers = {lh_first = 0x0}}, write_threshold_offset = 0, 
  write_threshold_notifier = {notify = 0x0, node = {le_next = 0x0, le_prev = 0x0}}, dirty_bitmap_mutex = {lock = {__data = {__lock = 0, 
        __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, 
      __size = '\000' <repeats 39 times>, __align = 0}, initialized = true}, dirty_bitmaps = {lh_first = 0x0}, wr_highest_offset = {
    value = 1190584320}, copy_on_read = 0, in_flight = 0, serialising_in_flight = 0, wakeup = false, io_plugged = 0, enable_write_cache = 0, 
  quiesce_counter = 0, write_gen = 2, reqs_lock = {locked = 0, ctx = 0x0, from_push = {slh_first = 0x0}, to_pop = {slh_first = 0x0}, 
    handoff = 0, sequence = 0, holder = 0x0}, tracked_requests = {lh_first = 0x0}, flush_queue = {entries = {sqh_first = 0x0, 
      sqh_last = 0x1a69b63680}}, active_flush_req = false, flushed_gen = 2}

And that effectively is my root disk.

At least the obvious flag in the struct says "read_only = false".
It is also read-write at the filesystem level:
-rw------- 1 root root 717160448 Aug 23 18:50 /var/lib/uvtool/libvirt/images/kvmguest-artful-normal.qcow
(qemu is running privileged in this setup with UID 0, so I see no reason to treat it as read-only, IMHO)

So I checked the full condition of the 'if' that leads to the error:
  (cumulative_perms & (BLK_PERM_WRITE | BLK_PERM_WRITE_UNCHANGED))
       3  (in my case)  & (    0x2   |  0x4)
  OK, that is a match (3 & 0x6 = 0x2, which is non-zero).

So it goes on to
  !bdrv_is_writable(bs)

where bdrv_is_writable(bs) effectively is:
  !bdrv_is_read_only(bs) && !(bs->open_flags & BDRV_O_INACTIVE);
       !bs->read_only       ! (2050        &    0x800)
         !false                     !(true)
         true                       false

That makes bdrv_is_writable(bs) false, so !bdrv_is_writable(bs) is true and
we end up in the "Block node is read-only" error path.

So the problem is that BDRV_O_INACTIVE is set?
Sorry, I don't see why that is so (maybe it is too late for today).
But I hope that helps in understanding the remaining case.
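
To make the evaluation above easy to replay outside of gdb, here is a minimal sketch in C. It is not the QEMU code: struct mini_bs and the mini_* helpers are hypothetical stand-ins, and the constant values (0x2, 0x4, 0x800) are simply the ones quoted in this mail, not copied from the QEMU headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as quoted in the mail above; assumed, not taken from QEMU headers. */
#define BLK_PERM_WRITE            0x02u
#define BLK_PERM_WRITE_UNCHANGED  0x04u
#define BDRV_O_INACTIVE           0x0800

/* Hypothetical, stripped-down stand-in for the two fields of
 * BlockDriverState that the check looks at. */
struct mini_bs {
    int open_flags;
    bool read_only;
};

/* Mirrors the expression quoted for bdrv_is_writable(): the node must not
 * be read-only and must not be marked inactive. */
static inline bool mini_is_writable(const struct mini_bs *bs)
{
    return !bs->read_only && !(bs->open_flags & BDRV_O_INACTIVE);
}

/* Returns true when the "Block node is read-only" branch would be taken:
 * some write permission is requested and the node is not writable. */
static inline bool mini_check_hits_error(const struct mini_bs *bs,
                                         uint64_t cumulative_perms)
{
    return (cumulative_perms & (BLK_PERM_WRITE | BLK_PERM_WRITE_UNCHANGED)) &&
           !mini_is_writable(bs);
}
```

Calling mini_check_hits_error() with open_flags = 2050 (0x802), read_only = false and cumulative_perms = 3 takes the error branch; with the 0x800 bit cleared (open_flags = 2) it does not, which matches the suspicion that only the inactive flag is in the way.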

I checked against your commit list and I didn't have the following one yet:
cf26039a2b50f078b4ad90b88eea5bb28971c0d8 block: Update open_flags after ->inactivate() callback
I have now taken it from Eric's PULL 0/6, which appeared after my last test.
I am building with that now and will report once again.

If there is no build hiccup, the next test should just fit in before I fall asleep.
Hoping for the best, so that I can report a Tested-by in time if possible.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1711602

Title:
  --copy-storage-all failing with qemu 2.10

Status in QEMU:
  New
Status in libvirt package in Ubuntu:
  Confirmed
Status in qemu package in Ubuntu:
  Confirmed

* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (45 preceding siblings ...)
  2017-08-23 19:10 ` ChristianEhrhardt
@ 2017-08-23 21:02 ` ChristianEhrhardt
  2017-09-02  8:04 ` Launchpad Bug Tracker
  2017-12-15 16:10 ` Thomas Huth
  48 siblings, 0 replies; 50+ messages in thread
From: ChristianEhrhardt @ 2017-08-23 21:02 UTC (permalink / raw)
  To: qemu-devel

Yes, with the whole series from [1] applied on top it finally works.
I saw that it has already been merged on master.
I expect a late rc4 or an early release tag and will then wrap it all up.

Thanks everybody involved!

[1]: http://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg04513.html

** Changed in: qemu
       Status: New => Fix Committed

** No longer affects: libvirt (Ubuntu)

** Changed in: qemu (Ubuntu)
       Status: Confirmed => In Progress

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1711602

Title:
  --copy-storage-all failing with qemu 2.10

Status in QEMU:
  Fix Committed
Status in qemu package in Ubuntu:
  In Progress

* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (46 preceding siblings ...)
  2017-08-23 21:02 ` ChristianEhrhardt
@ 2017-09-02  8:04 ` Launchpad Bug Tracker
  2017-12-15 16:10 ` Thomas Huth
  48 siblings, 0 replies; 50+ messages in thread
From: Launchpad Bug Tracker @ 2017-09-02  8:04 UTC (permalink / raw)
  To: qemu-devel

This bug was fixed in the package qemu - 1:2.10~rc4+dfsg-0ubuntu1

---------------
qemu (1:2.10~rc4+dfsg-0ubuntu1) artful; urgency=medium

  * Merge with Upstream 2.10-rc4; This fixes a migration issue (LP: #1711602);
    Remaining changes:
    - qemu-kvm to systemd unit
      - d/qemu-kvm-init: script for QEMU KVM preparation modules, ksm,
        hugepages and architecture specifics
      - d/qemu-kvm.service: systemd unit to call qemu-kvm-init
      - d/qemu-system-common.install: install systemd unit and helper script
      - d/qemu-system-common.maintscript: clean old sysv and upstart scripts
      - d/qemu-system-common.qemu-kvm.default: defaults for
        /etc/default/qemu-kvm
      - d/rules: install /etc/default/qemu-kvm
    - Enable nesting by default
      - set nested=1 module option on intel. (is default on amd)
      - re-load kvm_intel.ko if it was loaded without nested=1
      - d/p/ubuntu/expose-vmx_qemu64cpu.patch: expose nested kvm by default
        in qemu64 cpu type.
      - d/p/ubuntu/enable-svm-by-default.patch: Enable nested svm by default
        in qemu64 on amd
    - libvirt/qemu user/group support
      - qemu-system-common.postinst: remove acl placed by udev, and add udevadm
        trigger.
      - qemu-system-common.preinst: add kvm group if needed
    - Distribution specific machine type
      - d/p/ubuntu/define-ubuntu-machine-types.patch: define distro machine
        types to ease future live vm migration.
      - d/qemu-system-x86.NEWS Info on fixed machine type definitions
    - improved dependencies
      - Make qemu-system-common depend on qemu-block-extra
      - Make qemu-utils depend on qemu-block-extra
      - let qemu-utils recommend sharutils
    - s390x support
      - Create qemu-system-s390x package
      - Include s390-ccw.img firmware
      - Enable numa support for s390x
    - ppc64[le] support
      - d/qemu-system-ppc.links provide usr/bin/qemu-system-ppc64le symlink
      - Enable seccomp for ppc64el
      - bump libseccomp-dev dependency, 2.3 is the minimum for ppc64
    - arch aware kvm wrappers
    - update VCS-git to match the Artful branch
    - disable missing x32 architecture
    - d/rules: or32 is now named or1k (since 4a09d0bb)
    - d/qemu-system-common.docs: new paths since (ac06724a)
    - d/qemu-system-common.install: qmp-commands.txt removed, but replaced
      by qapi-schema.json which is already packaged (since 4d8bb958)
    - d/p/02_kfreebsd.patch: utimensat is no more optional upstream (Update
      to Debian patch to match qemu 2.10)
    - s390x package now builds correctly on all architectures (LP 1710695)
  * Added changes:
    - d/qemu-system-common.docs: adapt new path of live-block-operations.rst
      since 8508eee7
    - d/qemu-system-common.docs: adapt q35 config paths since 9ca019c1
    - make nios2/hppa not installed explicitly until further stabilized
    - d/qemu-guest-agent.install: add the new guest agent reference man page
      qemu-ga-ref
    - d/qemu-system-common.install: add the now generated qapi/qmp reference
      along the qapi intro
    - d/not-installed: ignore further generated (since 56e8bdd4) files in
      dh_missing that are already provided in other formats qemu-doc,
      qemu-qmp-ref,qemu-ga-ref
    - d/p/ubuntu/define-ubuntu-machine-types.patch: update to match new
      changes in 2.10-rc4

 -- Christian Ehrhardt <christian.ehrhardt@canonical.com>  Fri, 25 Aug 2017 07:49:30 +0200

** Changed in: qemu (Ubuntu)
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1711602

Title:
  --copy-storage-all failing with qemu 2.10

Status in QEMU:
  Fix Committed
Status in qemu package in Ubuntu:
  Fix Released

* [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10
       [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
                   ` (47 preceding siblings ...)
  2017-09-02  8:04 ` Launchpad Bug Tracker
@ 2017-12-15 16:10 ` Thomas Huth
  48 siblings, 0 replies; 50+ messages in thread
From: Thomas Huth @ 2017-12-15 16:10 UTC (permalink / raw)
  To: qemu-devel

** Changed in: qemu
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1711602

Title:
  --copy-storage-all failing with qemu 2.10

Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Fix Released

end of thread, other threads:[~2017-12-15 16:15 UTC | newest]

Thread overview: 50+ messages
     [not found] <150305905460.11582.12289718300820278863.malonedeb@wampee.canonical.com>
2017-08-18 14:15 ` [Qemu-devel] [Bug 1711602] Re: --copy-storage-all failing with qemu 2.10 ChristianEhrhardt
2017-08-18 14:22 ` ChristianEhrhardt
2017-08-18 15:03 ` ChristianEhrhardt
2017-08-18 15:03 ` ChristianEhrhardt
2017-08-18 15:04 ` ChristianEhrhardt
2017-08-18 15:04 ` ChristianEhrhardt
2017-08-18 15:06 ` ChristianEhrhardt
2017-08-18 15:09 ` ChristianEhrhardt
2017-08-18 17:56 ` Dr. David Alan Gilbert
2017-08-21 14:55 ` ChristianEhrhardt
2017-08-21 14:55 ` ChristianEhrhardt
2017-08-21 14:55 ` ChristianEhrhardt
2017-08-21 14:56 ` ChristianEhrhardt
2017-08-21 15:46 ` Dr. David Alan Gilbert
2017-08-22  8:51 ` ChristianEhrhardt
2017-08-22  8:54 ` ChristianEhrhardt
2017-08-22  8:58 ` ChristianEhrhardt
2017-08-22  9:01 ` ChristianEhrhardt
2017-08-22  9:08 ` ChristianEhrhardt
2017-08-22  9:26 ` Dr. David Alan Gilbert
2017-08-22  9:51 ` Dr. David Alan Gilbert
2017-08-22 10:20 ` ChristianEhrhardt
2017-08-22 10:57 ` Dr. David Alan Gilbert
2017-08-22 11:13 ` Dr. David Alan Gilbert
2017-08-22 11:14 ` ChristianEhrhardt
2017-08-22 11:18 ` ChristianEhrhardt
2017-08-22 11:23 ` Dr. David Alan Gilbert
2017-08-22 11:33 ` ChristianEhrhardt
2017-08-22 11:44 ` Dr. David Alan Gilbert
2017-08-22 11:46 ` Dr. David Alan Gilbert
2017-08-22 11:50 ` ChristianEhrhardt
2017-08-22 15:05 ` Dr. David Alan Gilbert
2017-08-22 15:47 ` ChristianEhrhardt
2017-08-22 17:57 ` ChristianEhrhardt
2017-08-22 19:21 ` ChristianEhrhardt
2017-08-23  8:02 ` Dr. David Alan Gilbert
2017-08-23 14:14 ` Stefan Hajnoczi
2017-08-23 14:41 ` Dr. David Alan Gilbert
2017-08-23 14:46 ` ChristianEhrhardt
2017-08-23 14:55 ` ChristianEhrhardt
2017-08-23 15:13   ` Eric Blake
2017-08-23 16:56 ` ChristianEhrhardt
2017-08-23 17:01 ` ChristianEhrhardt
2017-08-23 18:10 ` Dr. David Alan Gilbert
2017-08-23 18:55 ` Dr. David Alan Gilbert
2017-08-23 19:06 ` Dr. David Alan Gilbert
2017-08-23 19:10 ` ChristianEhrhardt
2017-08-23 21:02 ` ChristianEhrhardt
2017-09-02  8:04 ` Launchpad Bug Tracker
2017-12-15 16:10 ` Thomas Huth
