All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel]  nbd: Possible regression in 2.9 RCs
@ 2017-03-31 16:03 Ciprian Barbu
  2017-03-31 17:32 ` Ciprian Barbu
  2017-03-31 17:43 ` Max Reitz
  0 siblings, 2 replies; 31+ messages in thread
From: Ciprian Barbu @ 2017-03-31 16:03 UTC (permalink / raw)
  To: qemu-devel, Eric Blake, Alexandru Avadanii
  Cc: Jeff Cody, Markus Armbruster, svc-armband

Hello,

Similar to the other thread about possible regression with rbd, there might be a regression with nbd.
This time we are launching an instance from an image (not volume) and try to live migrate it:

nova live-migration <test_instance>

The nova-compute service complains with:

2017-03-31 15:32:56.179 7806 INFO nova.virt.libvirt.driver [req-15d79cbe-5956-4738-92df-3624e6b993ee d795de59fb9a4ea38776a11d20ae8469 cee03e74881f4ccba3b83345fb652b2c - - -] [instance: 6a04508f-5d79-4582-8e2c-4cc368753f6c] Migration running for 0 secs, memory 100% remaining; (bytes processed=0, remaining=0, total=0)
2017-03-31 15:32:58.029 7806 WARNING stevedore.named [req-73bc0113-5555-4dd8-8903-d3540cc61b47 b9fbceeadd2d4d1bab9c90ae104db1f7 7e7db99b32c6467184701e9a0c2f1de7 - - -] Could not load instance_network_info
2017-03-31 15:32:59.038 7806 ERROR nova.virt.libvirt.driver [req-15d79cbe-5956-4738-92df-3624e6b993ee d795de59fb9a4ea38776a11d20ae8469 cee03e74881f4ccba3b83345fb652b2c - - -] [instance: 6a04508f-5d79-4582-8e2c-4cc368753f6c] Live Migration failure: internal error: unable to execute QEMU command 'nbd-server-add': Conflicts with use by drive-virtio-disk0 as 'root', which does not allow 'write' on #block143
2017-03-31 15:32:59.190 7806 ERROR nova.virt.libvirt.driver [req-15d79cbe-5956-4738-92df-3624e6b993ee d795de59fb9a4ea38776a11d20ae8469 cee03e74881f4ccba3b83345fb652b2c - - -] [instance: 6a04508f-5d79-4582-8e2c-4cc368753f6c] Migration operation has aborted

I will try and bisect it myself, but I thought I would paste this here first, just so you know there is this issue too.

Regards,
/Ciprian

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
  2017-03-31 16:03 [Qemu-devel] nbd: Possible regression in 2.9 RCs Ciprian Barbu
@ 2017-03-31 17:32 ` Ciprian Barbu
  2017-03-31 17:36   ` Ciprian Barbu
  2017-03-31 17:43 ` Max Reitz
  1 sibling, 1 reply; 31+ messages in thread
From: Ciprian Barbu @ 2017-03-31 17:32 UTC (permalink / raw)
  To: Ciprian Barbu, qemu-devel, Eric Blake, Alexandru Avadanii
  Cc: Jeff Cody, Markus Armbruster, svc-armband

I suspect the culprit here is [1]. I've started a git bisect between this commit and v2.8.0 and during this time I found similar erros "/root/qemu/nbd/server.c:nbd_receive_request():L706: read failed"

I suspect this fix tried to solve a bunch of these read fails, but broke write requests.

[1] https://github.com/qemu/qemu/commit/8a7ce4f9338c475df1afc12502af704e4300a3e0

From: cosnos-bounces@list.enea.se [mailto:cosnos-bounces@list.enea.se] On Behalf Of Ciprian Barbu
Sent: Friday, March 31, 2017 7:04 PM
To: qemu-devel@nongnu.org; Eric Blake <eblake@redhat.com>; Alexandru Avadanii <Alexandru.Avadanii@enea.com>
Cc: Jeff Cody <jcody@redhat.com>; Markus Armbruster <armbru@redhat.com>; svc-armband <armband@enea.com>
Subject: [Cosnos] [Qemu-devel] nbd: Possible regression in 2.9 RCs

Hello,

Similar to the other thread about possible regression with rbd, there might be a regression with nbd.
This time we are launching an instance from an image (not volume) and try to live migrate it:

nova live-migration <test_instance>

The nova-compute service complains with:

2017-03-31 15:32:56.179 7806 INFO nova.virt.libvirt.driver [req-15d79cbe-5956-4738-92df-3624e6b993ee d795de59fb9a4ea38776a11d20ae8469 cee03e74881f4ccba3b83345fb652b2c - - -] [instance: 6a04508f-5d79-4582-8e2c-4cc368753f6c] Migration running for 0 secs, memory 100% remaining; (bytes processed=0, remaining=0, total=0)
2017-03-31 15:32:58.029 7806 WARNING stevedore.named [req-73bc0113-5555-4dd8-8903-d3540cc61b47 b9fbceeadd2d4d1bab9c90ae104db1f7 7e7db99b32c6467184701e9a0c2f1de7 - - -] Could not load instance_network_info
2017-03-31 15:32:59.038 7806 ERROR nova.virt.libvirt.driver [req-15d79cbe-5956-4738-92df-3624e6b993ee d795de59fb9a4ea38776a11d20ae8469 cee03e74881f4ccba3b83345fb652b2c - - -] [instance: 6a04508f-5d79-4582-8e2c-4cc368753f6c] Live Migration failure: internal error: unable to execute QEMU command 'nbd-server-add': Conflicts with use by drive-virtio-disk0 as 'root', which does not allow 'write' on #block143
2017-03-31 15:32:59.190 7806 ERROR nova.virt.libvirt.driver [req-15d79cbe-5956-4738-92df-3624e6b993ee d795de59fb9a4ea38776a11d20ae8469 cee03e74881f4ccba3b83345fb652b2c - - -] [instance: 6a04508f-5d79-4582-8e2c-4cc368753f6c] Migration operation has aborted

I will try and bisect it myself, but I thought I would paste this here first, just so you know there is this issue too.

Regards,
/Ciprian

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
  2017-03-31 17:32 ` Ciprian Barbu
@ 2017-03-31 17:36   ` Ciprian Barbu
  0 siblings, 0 replies; 31+ messages in thread
From: Ciprian Barbu @ 2017-03-31 17:36 UTC (permalink / raw)
  To: qemu-devel, Eric Blake, Alexandru Avadanii
  Cc: Jeff Cody, Markus Armbruster, svc-armband

Sorry, I think this one is more likely the cause, it's about migration:

https://github.com/qemu/qemu/commit/6f5ef23a3ff09919b73eef8196969685cb2383ee

From: Ciprian Barbu
Sent: Friday, March 31, 2017 8:33 PM
To: Ciprian Barbu <Ciprian.Barbu@enea.com>; qemu-devel@nongnu.org; Eric Blake <eblake@redhat.com>; Alexandru Avadanii <Alexandru.Avadanii@enea.com>
Cc: Jeff Cody <jcody@redhat.com>; Markus Armbruster <armbru@redhat.com>; svc-armband <armband@enea.com>
Subject: RE: [Qemu-devel] nbd: Possible regression in 2.9 RCs

I suspect the culprit here is [1]. I've started a git bisect between this commit and v2.8.0 and during this time I found similar erros "/root/qemu/nbd/server.c:nbd_receive_request():L706: read failed"

I suspect this fix tried to solve a bunch of these read fails, but broke write requests.

[1] https://github.com/qemu/qemu/commit/8a7ce4f9338c475df1afc12502af704e4300a3e0

From: cosnos-bounces@list.enea.se<mailto:cosnos-bounces@list.enea.se> [mailto:cosnos-bounces@list.enea.se] On Behalf Of Ciprian Barbu
Sent: Friday, March 31, 2017 7:04 PM
To: qemu-devel@nongnu.org<mailto:qemu-devel@nongnu.org>; Eric Blake <eblake@redhat.com<mailto:eblake@redhat.com>>; Alexandru Avadanii <Alexandru.Avadanii@enea.com<mailto:Alexandru.Avadanii@enea.com>>
Cc: Jeff Cody <jcody@redhat.com<mailto:jcody@redhat.com>>; Markus Armbruster <armbru@redhat.com<mailto:armbru@redhat.com>>; svc-armband <armband@enea.com<mailto:armband@enea.com>>
Subject: [Cosnos] [Qemu-devel] nbd: Possible regression in 2.9 RCs

Hello,

Similar to the other thread about possible regression with rbd, there might be a regression with nbd.
This time we are launching an instance from an image (not volume) and try to live migrate it:

nova live-migration <test_instance>

The nova-compute service complains with:

2017-03-31 15:32:56.179 7806 INFO nova.virt.libvirt.driver [req-15d79cbe-5956-4738-92df-3624e6b993ee d795de59fb9a4ea38776a11d20ae8469 cee03e74881f4ccba3b83345fb652b2c - - -] [instance: 6a04508f-5d79-4582-8e2c-4cc368753f6c] Migration running for 0 secs, memory 100% remaining; (bytes processed=0, remaining=0, total=0)
2017-03-31 15:32:58.029 7806 WARNING stevedore.named [req-73bc0113-5555-4dd8-8903-d3540cc61b47 b9fbceeadd2d4d1bab9c90ae104db1f7 7e7db99b32c6467184701e9a0c2f1de7 - - -] Could not load instance_network_info
2017-03-31 15:32:59.038 7806 ERROR nova.virt.libvirt.driver [req-15d79cbe-5956-4738-92df-3624e6b993ee d795de59fb9a4ea38776a11d20ae8469 cee03e74881f4ccba3b83345fb652b2c - - -] [instance: 6a04508f-5d79-4582-8e2c-4cc368753f6c] Live Migration failure: internal error: unable to execute QEMU command 'nbd-server-add': Conflicts with use by drive-virtio-disk0 as 'root', which does not allow 'write' on #block143
2017-03-31 15:32:59.190 7806 ERROR nova.virt.libvirt.driver [req-15d79cbe-5956-4738-92df-3624e6b993ee d795de59fb9a4ea38776a11d20ae8469 cee03e74881f4ccba3b83345fb652b2c - - -] [instance: 6a04508f-5d79-4582-8e2c-4cc368753f6c] Migration operation has aborted

I will try and bisect it myself, but I thought I would paste this here first, just so you know there is this issue too.

Regards,
/Ciprian

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
  2017-03-31 16:03 [Qemu-devel] nbd: Possible regression in 2.9 RCs Ciprian Barbu
  2017-03-31 17:32 ` Ciprian Barbu
@ 2017-03-31 17:43 ` Max Reitz
  2017-03-31 17:49   ` Ciprian Barbu
  2017-04-03  8:15   ` Kevin Wolf
  1 sibling, 2 replies; 31+ messages in thread
From: Max Reitz @ 2017-03-31 17:43 UTC (permalink / raw)
  To: Ciprian Barbu, qemu-devel, Eric Blake, Alexandru Avadanii
  Cc: Jeff Cody, Markus Armbruster, svc-armband, Kevin Wolf

[-- Attachment #1: Type: text/plain, Size: 4290 bytes --]

On 31.03.2017 18:03, Ciprian Barbu wrote:
> Hello,
> 
> Similar to the other thread about possible regression with rbd, there might be a regression with nbd.
> This time we are launching an instance from an image (not volume) and try to live migrate it:
> 
> nova live-migration <test_instance>
> 
> The nova-compute service complains with:
> 
> 2017-03-31 15:32:56.179 7806 INFO nova.virt.libvirt.driver [req-15d79cbe-5956-4738-92df-3624e6b993ee d795de59fb9a4ea38776a11d20ae8469 cee03e74881f4ccba3b83345fb652b2c - - -] [instance: 6a04508f-5d79-4582-8e2c-4cc368753f6c] Migration running for 0 secs, memory 100% remaining; (bytes processed=0, remaining=0, total=0)
> 2017-03-31 15:32:58.029 7806 WARNING stevedore.named [req-73bc0113-5555-4dd8-8903-d3540cc61b47 b9fbceeadd2d4d1bab9c90ae104db1f7 7e7db99b32c6467184701e9a0c2f1de7 - - -] Could not load instance_network_info
> 2017-03-31 15:32:59.038 7806 ERROR nova.virt.libvirt.driver [req-15d79cbe-5956-4738-92df-3624e6b993ee d795de59fb9a4ea38776a11d20ae8469 cee03e74881f4ccba3b83345fb652b2c - - -] [instance: 6a04508f-5d79-4582-8e2c-4cc368753f6c] Live Migration failure: internal error: unable to execute QEMU command 'nbd-server-add': Conflicts with use by drive-virtio-disk0 as 'root', which does not allow 'write' on #block143
> 2017-03-31 15:32:59.190 7806 ERROR nova.virt.libvirt.driver [req-15d79cbe-5956-4738-92df-3624e6b993ee d795de59fb9a4ea38776a11d20ae8469 cee03e74881f4ccba3b83345fb652b2c - - -] [instance: 6a04508f-5d79-4582-8e2c-4cc368753f6c] Migration operation has aborted
> 
> I will try and bisect it myself, but I thought I would paste this here first, just so you know there is this issue too.

Well, I already know the commit in question. It's
8a7ce4f9338c475df1afc12502af704e4300a3e0 ("nbd/server: Use real
permissions for NBD exports").

Whether this is a bug depends on the standpoint. I would very much
consider it a bug fix because as of this commit you can no longer create
a writable NBD server on a block device that is in use by a guest device
without the guest device being aware of this.

The problem is that the functionality to "make" the guest device "aware"
of it was introduced only a couple of commits before, and it's called
"share-rw".

So this doesn't work:

$ x86_64-softmmu/qemu-system-x86_64 \
    -blockdev node-name=image,driver=qcow2,\
file.driver=file,file.filename=foo.qcow2 \
    -device virtio-blk,drive=image \
    -qmp stdio
{"QMP": {"version": {"qemu": {"micro": 92, "minor": 8, "major": 2},
"package": " (v2.8.0-2038-g6604c893d0)"}, "capabilities": []}}
{'execute':'qmp_capabilities'}
{"return": {}}
{'execute':'nbd-server-start','arguments':{'addr':{'type':'inet','data':{'host':'localhost','port':'10809'}}}}
{"return": {}}
{'execute':'nbd-server-add','arguments':{'device':'image','writable':true}}
{"error": {"class": "GenericError", "desc": "Conflicts with use by
/machine/peripheral-anon/device[0]/virtio-backend as 'root', which does
not allow 'write' on image"}

But this works:

$ x86_64-softmmu/qemu-system-x86_64 \
    -blockdev node-name=image,driver=qcow2,\
file.driver=file,file.filename=foo.qcow2 \
    -device virtio-blk,drive=image,share-rw=on \
    -qmp stdio
{"QMP": {"version": {"qemu": {"micro": 92, "minor": 8, "major": 2},
"package": " (v2.8.0-2038-g6604c893d0)"}, "capabilities": []}}
{'execute':'qmp_capabilities'}
{"return": {}}
{'execute':'nbd-server-start','arguments':{'addr':{'type':'inet','data':{'host':'localhost','port':'10809'}}}}
{"return": {}}
{'execute':'nbd-server-add','arguments':{'device':'image','writable':true}}
{"return": {}}

(The difference is the share-rw=on in the -device parameter.)


So in theory all that's necessary is to set share-rw=on for the device
in the management layer. But I'm not sure whether that's practical.

As for just allowing the NBD server write access to the device... To me
that appears pretty difficult from an implementation perspective. We
assert that nobody can write without having requested write access and
we make sure that nobody can request write access without it being
allowed. Making an exception for NBD seems very difficult and would
probably mean we'd have to drop the assertion for write accesses altogether.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
  2017-03-31 17:43 ` Max Reitz
@ 2017-03-31 17:49   ` Ciprian Barbu
  2017-03-31 17:57     ` Max Reitz
  2017-04-03 18:52     ` Kashyap Chamarthy
  2017-04-03  8:15   ` Kevin Wolf
  1 sibling, 2 replies; 31+ messages in thread
From: Ciprian Barbu @ 2017-03-31 17:49 UTC (permalink / raw)
  To: Max Reitz, qemu-devel, Eric Blake, Alexandru Avadanii
  Cc: Jeff Cody, Markus Armbruster, svc-armband, Kevin Wolf

Hi,

Thank you for getting back!

I'm trying to follow you, but I don't understand all the details. I would like to ask this question though:

What is the difference between v2.8.0 and this commit? With v2.8.0 the same qemu command worked, but I admit it doesn't request sharing.

We also use libvirt v1.3.4, which might be a problem, but at least we want to understand if the commit in question introduced an obvious problem or if it's all in the details.

Btw, the qemu command generated by libvirt is this one, sorry about that:

2017-03-31 17:40:10.956+0000: starting up libvirt version: 1.3.4, package: 0+amos3~u16.04 (Enea Armband Devops Team <armband@enea.com> Fri, 13 Jan 2017 02:06:05 +0100), qemu version: 2.8.50(Debian 1:2.9+amos2~u16.04), hostname: node-2.domain.tld
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -name instance-00000076,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-14-instance-00000076/master-key.aes -machine virt-2.8,accel=kvm,usb=off,gic-version=3 -cpu host -m 256 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 2812f3c9-f564-499b-a8c7-e9e7ccf24143 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-14-instance-00000076/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -no-shutdown -boot strict=on -kernel /var/lib/nova/instances/2812f3c9-f564-499b-a8c7-e9e7ccf24143/kernel -initrd /var/lib/nova/instances/2812f3c9-f564-499b-a8c7-e9e7ccf24143/ramdisk -append 'root=/dev/vda1 rw rootwait console=tty0 console=ttyS0 console=ttyAMA0' -device i82801b11-bridge,id=pci.1,bus=pcie.0,addr=0x1 -device pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x0 -usb -drive file=/var/lib/nova/instances/2812f3c9-f564-499b-a8c7-e9e7ccf24143/disk,format=qcow2,if=none,id=drive-virtio-disk0,cache=none,aio=native -device virtio-blk-device,scsi=off,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/var/lib/nova/instances/2812f3c9-f564-499b-a8c7-e9e7ccf24143/disk.config,format=raw,if=none,id=drive-virtio-disk1,cache=none,aio=native -device virtio-blk-device,scsi=off,drive=drive-virtio-disk1,id=virtio-disk1 -netdev tap,fd=27,id=hostnet0 -device virtio-net-device,netdev=hostnet0,id=net0,mac=fa:16:3e:82:0a:2b -serial file:/var/lib/nova/instances/2812f3c9-f564-499b-a8c7-e9e7ccf24143/console.log -serial pty -vnc 0.0.0.0:0 -k en-us -device VGA,id=video0,vgamem_mb=16,bus=pci.2,addr=0x1 -device virtio-balloon-device,id=balloon0 -msg timestamp=on
Domain id=14 is tainted: high-privileges

Regards,
/Ciprian

-----Original Message-----
From: Max Reitz [mailto:mreitz@redhat.com] 
Sent: Friday, March 31, 2017 8:43 PM
To: Ciprian Barbu <Ciprian.Barbu@enea.com>; qemu-devel@nongnu.org; Eric Blake <eblake@redhat.com>; Alexandru Avadanii <Alexandru.Avadanii@enea.com>
Cc: Jeff Cody <jcody@redhat.com>; Markus Armbruster <armbru@redhat.com>; svc-armband <armband@enea.com>; Kevin Wolf <kwolf@redhat.com>
Subject: Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs

On 31.03.2017 18:03, Ciprian Barbu wrote:
> Hello,
> 
> Similar to the other thread about possible regression with rbd, there might be a regression with nbd.
> This time we are launching an instance from an image (not volume) and try to live migrate it:
> 
> nova live-migration <test_instance>
> 
> The nova-compute service complains with:
> 
> 2017-03-31 15:32:56.179 7806 INFO nova.virt.libvirt.driver 
> [req-15d79cbe-5956-4738-92df-3624e6b993ee 
> d795de59fb9a4ea38776a11d20ae8469 cee03e74881f4ccba3b83345fb652b2c - - 
> -] [instance: 6a04508f-5d79-4582-8e2c-4cc368753f6c] Migration running 
> for 0 secs, memory 100% remaining; (bytes processed=0, remaining=0, 
> total=0)
> 2017-03-31 15:32:58.029 7806 WARNING stevedore.named 
> [req-73bc0113-5555-4dd8-8903-d3540cc61b47 
> b9fbceeadd2d4d1bab9c90ae104db1f7 7e7db99b32c6467184701e9a0c2f1de7 - - 
> -] Could not load instance_network_info
> 2017-03-31 15:32:59.038 7806 ERROR nova.virt.libvirt.driver 
> [req-15d79cbe-5956-4738-92df-3624e6b993ee 
> d795de59fb9a4ea38776a11d20ae8469 cee03e74881f4ccba3b83345fb652b2c - - 
> -] [instance: 6a04508f-5d79-4582-8e2c-4cc368753f6c] Live Migration 
> failure: internal error: unable to execute QEMU command 
> 'nbd-server-add': Conflicts with use by drive-virtio-disk0 as 'root', 
> which does not allow 'write' on #block143
> 2017-03-31 15:32:59.190 7806 ERROR nova.virt.libvirt.driver 
> [req-15d79cbe-5956-4738-92df-3624e6b993ee 
> d795de59fb9a4ea38776a11d20ae8469 cee03e74881f4ccba3b83345fb652b2c - - 
> -] [instance: 6a04508f-5d79-4582-8e2c-4cc368753f6c] Migration 
> operation has aborted
> 
> I will try and bisect it myself, but I thought I would paste this here first, just so you know there is this issue too.

Well, I already know the commit in question. It's
8a7ce4f9338c475df1afc12502af704e4300a3e0 ("nbd/server: Use real permissions for NBD exports").

Whether this is a bug depends on the standpoint. I would very much consider it a bug fix because as of this commit you can no longer create a writable NBD server on a block device that is in use by a guest device without the guest device being aware of this.

The problem is that the functionality to "make" the guest device "aware"
of it was introduced only a couple of commits before, and it's called "share-rw".

So this doesn't work:

$ x86_64-softmmu/qemu-system-x86_64 \
    -blockdev node-name=image,driver=qcow2,\
file.driver=file,file.filename=foo.qcow2 \
    -device virtio-blk,drive=image \
    -qmp stdio
{"QMP": {"version": {"qemu": {"micro": 92, "minor": 8, "major": 2},
"package": " (v2.8.0-2038-g6604c893d0)"}, "capabilities": []}} {'execute':'qmp_capabilities'}
{"return": {}}
{'execute':'nbd-server-start','arguments':{'addr':{'type':'inet','data':{'host':'localhost','port':'10809'}}}}
{"return": {}}
{'execute':'nbd-server-add','arguments':{'device':'image','writable':true}}
{"error": {"class": "GenericError", "desc": "Conflicts with use by /machine/peripheral-anon/device[0]/virtio-backend as 'root', which does not allow 'write' on image"}

But this works:

$ x86_64-softmmu/qemu-system-x86_64 \
    -blockdev node-name=image,driver=qcow2,\
file.driver=file,file.filename=foo.qcow2 \
    -device virtio-blk,drive=image,share-rw=on \
    -qmp stdio
{"QMP": {"version": {"qemu": {"micro": 92, "minor": 8, "major": 2},
"package": " (v2.8.0-2038-g6604c893d0)"}, "capabilities": []}} {'execute':'qmp_capabilities'}
{"return": {}}
{'execute':'nbd-server-start','arguments':{'addr':{'type':'inet','data':{'host':'localhost','port':'10809'}}}}
{"return": {}}
{'execute':'nbd-server-add','arguments':{'device':'image','writable':true}}
{"return": {}}

(The difference is the share-rw=on in the -device parameter.)


So in theory all that's necessary is to set share-rw=on for the device in the management layer. But I'm not sure whether that's practical.

As for just allowing the NBD server write access to the device... To me that appears pretty difficult from an implementation perspective. We assert that nobody can write without having requested write access and we make sure that nobody can request write access without it being allowed. Making an exception for NBD seems very difficult and would probably mean we'd have to drop the assertion for write accesses altogether.

Max

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
  2017-03-31 17:49   ` Ciprian Barbu
@ 2017-03-31 17:57     ` Max Reitz
  2017-03-31 19:07       ` Alexandru Avadanii
  2017-04-03 18:52     ` Kashyap Chamarthy
  1 sibling, 1 reply; 31+ messages in thread
From: Max Reitz @ 2017-03-31 17:57 UTC (permalink / raw)
  To: Ciprian Barbu, qemu-devel, Eric Blake, Alexandru Avadanii
  Cc: Jeff Cody, Markus Armbruster, svc-armband, Kevin Wolf

[-- Attachment #1: Type: text/plain, Size: 3729 bytes --]

On 31.03.2017 19:49, Ciprian Barbu wrote:
> Hi,
> 
> Thank you for getting back!
> 
> I'm trying to follow you, but I don't understand all the details. I would like to ask this question though:
> 
> What is the difference between v2.8.0 and this commit? With v2.8.0 the same qemu command worked, but I admit it doesn't request sharing.

The difference is that we now make sure there are for example not two
writers for the same block device (unless both agree that it's OK to
have shared writers). We haven't done that before.

The built-in NBD server is completely OK with writes by other parties
(such as guest devices), but guest devices generally are not (because
the guest OS should know about this).

Therefore, you'd now have to set the share-rw flag for guest devices
which should allow shared writers. Otherwise you cannot set up an NBD
server on a block device that is in use by the guest device.

> We also use libvirt v1.3.4, which might be a problem, but at least we want to understand if the commit in question introduced an obvious problem or if it's all in the details.

Well, I don't think there is any libvirt version that knows about this
flag yet.

...maybe another way to resolve the issue would be to set share-rw to
true by default for now, noting that this will be changed in the future.
In any case, I'm afraid we'll have to wait until Kevin is around on Monday.

> Btw, the qemu command generated by libvirt is this one, sorry about that:

And sorry about not yet having thanked you for reporting this issue. :-)

Max

> 2017-03-31 17:40:10.956+0000: starting up libvirt version: 1.3.4, package: 0+amos3~u16.04 (Enea Armband Devops Team <armband@enea.com> Fri, 13 Jan 2017 02:06:05 +0100), qemu version: 2.8.50(Debian 1:2.9+amos2~u16.04), hostname: node-2.domain.tld
> LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -name instance-00000076,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-14-instance-00000076/master-key.aes -machine virt-2.8,accel=kvm,usb=off,gic-version=3 -cpu host -m 256 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 2812f3c9-f564-499b-a8c7-e9e7ccf24143 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-14-instance-00000076/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -no-shutdown -boot strict=on -kernel /var/lib/nova/instances/2812f3c9-f564-499b-a8c7-e9e7ccf24143/kernel -initrd /var/lib/nova/instances/2812f3c9-f564-499b-a8c7-e9e7ccf24143/ramdisk -append 'root=/dev/vda1 rw rootwait console=tty0 console=ttyS0 console=ttyAMA0' -device i82801b11-bridge,id=pci.1,bus=pcie.0,addr=0x1 -device pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x0 -usb -drive file=/var/lib/nova/instances/2812f3c9-f564-499b-a8c7-e9e7ccf24143/disk,format=qcow2,if=none,id=drive-virtio-disk0,cache=none,aio=native -device virtio-blk-device,scsi=off,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/var/lib/nova/instances/2812f3c9-f564-499b-a8c7-e9e7ccf24143/disk.config,format=raw,if=none,id=drive-virtio-disk1,cache=none,aio=native -device virtio-blk-device,scsi=off,drive=drive-virtio-disk1,id=virtio-disk1 -netdev tap,fd=27,id=hostnet0 -device virtio-net-device,netdev=hostnet0,id=net0,mac=fa:16:3e:82:0a:2b -serial file:/var/lib/nova/instances/2812f3c9-f564-499b-a8c7-e9e7ccf24143/console.log -serial pty -vnc 0.0.0.0:0 -k en-us -device VGA,id=video0,vgamem_mb=16,bus=pci.2,addr=0x1 -device virtio-balloon-device,id=balloon0 -msg timestamp=on
> Domain id=14 is tainted: high-privileges
> 
> Regards,
> /Ciprian
>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
  2017-03-31 17:57     ` Max Reitz
@ 2017-03-31 19:07       ` Alexandru Avadanii
  0 siblings, 0 replies; 31+ messages in thread
From: Alexandru Avadanii @ 2017-03-31 19:07 UTC (permalink / raw)
  To: Max Reitz, Ciprian Barbu, qemu-devel, Eric Blake
  Cc: Jeff Cody, Markus Armbruster, svc-armband, Kevin Wolf

Hi,

> -----Original Message-----
> From: Max Reitz [mailto:mreitz@redhat.com]
> Sent: Friday, March 31, 2017 8:57 PM
> To: Ciprian Barbu; qemu-devel@nongnu.org; Eric Blake; Alexandru Avadanii
> Cc: Jeff Cody; Markus Armbruster; svc-armband; Kevin Wolf
> Subject: Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
> 
> On 31.03.2017 19:49, Ciprian Barbu wrote:
> > Hi,
> >
> > Thank you for getting back!
> >
> > I'm trying to follow you, but I don't understand all the details. I would like to
> ask this question though:
> >
> > What is the difference between v2.8.0 and this commit? With v2.8.0 the
> same qemu command worked, but I admit it doesn't request sharing.
> 
> The difference is that we now make sure there are for example not two
> writers for the same block device (unless both agree that it's OK to have
> shared writers). We haven't done that before.
> 
> The built-in NBD server is completely OK with writes by other parties (such as
> guest devices), but guest devices generally are not (because the guest OS
> should know about this).
> 
> Therefore, you'd now have to set the share-rw flag for guest devices which
> should allow shared writers. Otherwise you cannot set up an NBD server on a
> block device that is in use by the guest device.

Thank you for the detailed explanation!

> 
> > We also use libvirt v1.3.4, which might be a problem, but at least we want
> to understand if the commit in question introduced an obvious problem or if
> it's all in the details.
> 
> Well, I don't think there is any libvirt version that knows about this flag yet.
> 
> ...maybe another way to resolve the issue would be to set share-rw to true
> by default for now, noting that this will be changed in the future.
> In any case, I'm afraid we'll have to wait until Kevin is around on Monday.

Ciprian is gone for the week-end too, so next week it is :)
I think we will change the default for now in our builds, and wait for the proper support in Libvirt.
Ty for the default suggestion, btw!

BR,
Alex

> 
> > Btw, the qemu command generated by libvirt is this one, sorry about that:
> 
> And sorry about not yet having thanked you for reporting this issue. :-)
> 
> Max
> 
> > 2017-03-31 17:40:10.956+0000: starting up libvirt version: 1.3.4,
> > package: 0+amos3~u16.04 (Enea Armband Devops Team
> <armband@enea.com>
> > Fri, 13 Jan 2017 02:06:05 +0100), qemu version: 2.8.50(Debian
> > 1:2.9+amos2~u16.04), hostname: node-2.domain.tld LC_ALL=C
> > PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
> > QEMU_AUDIO_DRV=none /usr/bin/kvm -name
> > instance-00000076,debug-threads=on -S -object
> > secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-14-i
> > nstance-00000076/master-key.aes -machine
> > virt-2.8,accel=kvm,usb=off,gic-version=3 -cpu host -m 256 -realtime
> > mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid
> > 2812f3c9-f564-499b-a8c7-e9e7ccf24143 -no-user-config -nodefaults -
> chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-14-
> instance-00000076/monitor.sock,server,nowait -mon
> chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -
> no-shutdown -boot strict=on -kernel /var/lib/nova/instances/2812f3c9-f564-
> 499b-a8c7-e9e7ccf24143/kernel -initrd /var/lib/nova/instances/2812f3c9-
> f564-499b-a8c7-e9e7ccf24143/ramdisk -append 'root=/dev/vda1 rw rootwait
> console=tty0 console=ttyS0 console=ttyAMA0' -device i82801b11-
> bridge,id=pci.1,bus=pcie.0,addr=0x1 -device pci-
> bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x0 -usb -drive
> file=/var/lib/nova/instances/2812f3c9-f564-499b-a8c7-
> e9e7ccf24143/disk,format=qcow2,if=none,id=drive-virtio-
> disk0,cache=none,aio=native -device virtio-blk-device,scsi=off,drive=drive-
> virtio-disk0,id=virtio-disk0,bootindex=1 -drive
> file=/var/lib/nova/instances/2812f3c9-f564-499b-a8c7-
> e9e7ccf24143/disk.config,format=raw,if=none,id=drive-virtio-
> disk1,cache=none,aio=native -device virtio-blk-device,scsi=off,drive=drive-
> virtio-disk1,id=virtio-disk1 -netdev tap,fd=27,id=hostnet0 -device virtio-net-
> device,netdev=hostnet0,id=net0,mac=fa:16:3e:82:0a:2b -serial
> file:/var/lib/nova/instances/2812f3c9-f564-499b-a8c7-
> e9e7ccf24143/console.log -serial pty -vnc 0.0.0.0:0 -k en-us -device
> VGA,id=video0,vgamem_mb=16,bus=pci.2,addr=0x1 -device virtio-balloon-
> device,id=balloon0 -msg timestamp=on Domain id=14 is tainted: high-
> privileges
> >
> > Regards,
> > /Ciprian
> >

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
  2017-03-31 17:43 ` Max Reitz
  2017-03-31 17:49   ` Ciprian Barbu
@ 2017-04-03  8:15   ` Kevin Wolf
  2017-04-03 12:39     ` Max Reitz
  2017-04-03 12:51     ` Peter Krempa
  1 sibling, 2 replies; 31+ messages in thread
From: Kevin Wolf @ 2017-04-03  8:15 UTC (permalink / raw)
  To: Max Reitz
  Cc: Ciprian Barbu, qemu-devel, Eric Blake, pkrempa,
	Alexandru Avadanii, Jeff Cody, Markus Armbruster, svc-armband

[-- Attachment #1: Type: text/plain, Size: 4758 bytes --]

Am 31.03.2017 um 19:43 hat Max Reitz geschrieben:
> On 31.03.2017 18:03, Ciprian Barbu wrote:
> > Hello,
> > 
> > Similar to the other thread about possible regression with rbd, there might be a regression with nbd.
> > This time we are launching an instance from an image (not volume) and try to live migrate it:
> > 
> > nova live-migration <test_instance>
> > 
> > The nova-compute service complains with:
> > 
> > 2017-03-31 15:32:56.179 7806 INFO nova.virt.libvirt.driver [req-15d79cbe-5956-4738-92df-3624e6b993ee d795de59fb9a4ea38776a11d20ae8469 cee03e74881f4ccba3b83345fb652b2c - - -] [instance: 6a04508f-5d79-4582-8e2c-4cc368753f6c] Migration running for 0 secs, memory 100% remaining; (bytes processed=0, remaining=0, total=0)
> > 2017-03-31 15:32:58.029 7806 WARNING stevedore.named [req-73bc0113-5555-4dd8-8903-d3540cc61b47 b9fbceeadd2d4d1bab9c90ae104db1f7 7e7db99b32c6467184701e9a0c2f1de7 - - -] Could not load instance_network_info
> > 2017-03-31 15:32:59.038 7806 ERROR nova.virt.libvirt.driver [req-15d79cbe-5956-4738-92df-3624e6b993ee d795de59fb9a4ea38776a11d20ae8469 cee03e74881f4ccba3b83345fb652b2c - - -] [instance: 6a04508f-5d79-4582-8e2c-4cc368753f6c] Live Migration failure: internal error: unable to execute QEMU command 'nbd-server-add': Conflicts with use by drive-virtio-disk0 as 'root', which does not allow 'write' on #block143
> > 2017-03-31 15:32:59.190 7806 ERROR nova.virt.libvirt.driver [req-15d79cbe-5956-4738-92df-3624e6b993ee d795de59fb9a4ea38776a11d20ae8469 cee03e74881f4ccba3b83345fb652b2c - - -] [instance: 6a04508f-5d79-4582-8e2c-4cc368753f6c] Migration operation has aborted
> > 
> > I will try and bisect it myself, but I thought I would paste this here first, just so you know there is this issue too.
> 
> Well, I already know the commit in question. It's
> 8a7ce4f9338c475df1afc12502af704e4300a3e0 ("nbd/server: Use real
> permissions for NBD exports").
> 
> Whether this is a bug depends on the standpoint. I would very much
> consider it a bug fix because as of this commit you can no longer create
> a writable NBD server on a block device that is in use by a guest device
> without the guest device being aware of this.
> 
> The problem is that the functionality to "make" the guest device "aware"
> of it was introduced only a couple of commits before, and it's called
> "share-rw".
> 
> So this doesn't work:
> 
> $ x86_64-softmmu/qemu-system-x86_64 \
>     -blockdev node-name=image,driver=qcow2,\
> file.driver=file,file.filename=foo.qcow2 \
>     -device virtio-blk,drive=image \
>     -qmp stdio
> {"QMP": {"version": {"qemu": {"micro": 92, "minor": 8, "major": 2},
> "package": " (v2.8.0-2038-g6604c893d0)"}, "capabilities": []}}
> {'execute':'qmp_capabilities'}
> {"return": {}}
> {'execute':'nbd-server-start','arguments':{'addr':{'type':'inet','data':{'host':'localhost','port':'10809'}}}}
> {"return": {}}
> {'execute':'nbd-server-add','arguments':{'device':'image','writable':true}}
> {"error": {"class": "GenericError", "desc": "Conflicts with use by
> /machine/peripheral-anon/device[0]/virtio-backend as 'root', which does
> not allow 'write' on image"}
> 
> But this works:
> 
> $ x86_64-softmmu/qemu-system-x86_64 \
>     -blockdev node-name=image,driver=qcow2,\
> file.driver=file,file.filename=foo.qcow2 \
>     -device virtio-blk,drive=image,share-rw=on \
>     -qmp stdio
> {"QMP": {"version": {"qemu": {"micro": 92, "minor": 8, "major": 2},
> "package": " (v2.8.0-2038-g6604c893d0)"}, "capabilities": []}}
> {'execute':'qmp_capabilities'}
> {"return": {}}
> {'execute':'nbd-server-start','arguments':{'addr':{'type':'inet','data':{'host':'localhost','port':'10809'}}}}
> {"return": {}}
> {'execute':'nbd-server-add','arguments':{'device':'image','writable':true}}
> {"return": {}}
> 
> (The difference is the share-rw=on in the -device parameter.)
> 
> So in theory all that's necessary is to set share-rw=on for the device
> in the management layer. But I'm not sure whether that's practical.

Yes, libvirt needs to provide this option if the guest supports sharing.
If it doesn't support sharing, rejecting a read-write NBD client seems
correct to me.

Peter, Eric, what is the status on the libvirt side here?

> As for just allowing the NBD server write access to the device... To me
> that appears pretty difficult from an implementation perspective. We
> assert that nobody can write without having requested write access and
> we make sure that nobody can request write access without it being
> allowed. Making an exception for NBD seems very difficult and would
> probably mean we'd have to drop the assertion for write accesses altogether.

Making an exception would simply be wrong.

Kevin

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
  2017-04-03  8:15   ` Kevin Wolf
@ 2017-04-03 12:39     ` Max Reitz
  2017-04-03 13:00       ` Kevin Wolf
  2017-04-03 19:44       ` Eric Blake
  2017-04-03 12:51     ` Peter Krempa
  1 sibling, 2 replies; 31+ messages in thread
From: Max Reitz @ 2017-04-03 12:39 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Ciprian Barbu, qemu-devel, Eric Blake, pkrempa,
	Alexandru Avadanii, Jeff Cody, Markus Armbruster, svc-armband

[-- Attachment #1: Type: text/plain, Size: 1435 bytes --]

On 03.04.2017 10:15, Kevin Wolf wrote:
> Am 31.03.2017 um 19:43 hat Max Reitz geschrieben:

[...]

>> So in theory all that's necessary is to set share-rw=on for the device
>> in the management layer. But I'm not sure whether that's practical.
> 
> Yes, libvirt needs to provide this option if the guest supports sharing.
> If it doesn't support sharing, rejecting a read-write NBD client seems
> correct to me.
> 
> Peter, Eric, what is the status on the libvirt side here?
> 
>> As for just allowing the NBD server write access to the device... To me
>> that appears pretty difficult from an implementation perspective. We
>> assert that nobody can write without having requested write access and
>> we make sure that nobody can request write access without it being
>> allowed. Making an exception for NBD seems very difficult and would
>> probably mean we'd have to drop the assertion for write accesses altogether.
> 
> Making an exception would simply be wrong.

Indeed. That is why it would be so difficult.

The question remains whether it is practical not to make an exception.
As far as I know, libvirt is only guaranteed to support older qemu
versions, not necessarily future ones. So we should be allowed to break
existing use cases here until libvirt is updated (assuming it is
possible for libvirt to express "guest device allows shared writes" as
an option for its next release).

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
  2017-04-03  8:15   ` Kevin Wolf
  2017-04-03 12:39     ` Max Reitz
@ 2017-04-03 12:51     ` Peter Krempa
  2017-04-04 13:19       ` Kevin Wolf
  1 sibling, 1 reply; 31+ messages in thread
From: Peter Krempa @ 2017-04-03 12:51 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Max Reitz, Ciprian Barbu, qemu-devel, Eric Blake,
	Alexandru Avadanii, Jeff Cody, Markus Armbruster, svc-armband

[-- Attachment #1: Type: text/plain, Size: 2939 bytes --]

On Mon, Apr 03, 2017 at 10:15:42 +0200, Kevin Wolf wrote:
> Am 31.03.2017 um 19:43 hat Max Reitz geschrieben:
> > On 31.03.2017 18:03, Ciprian Barbu wrote:

[...]

> > So this doesn't work:
> > 
> > $ x86_64-softmmu/qemu-system-x86_64 \
> >     -blockdev node-name=image,driver=qcow2,\
> > file.driver=file,file.filename=foo.qcow2 \
> >     -device virtio-blk,drive=image \
> >     -qmp stdio
> > {"QMP": {"version": {"qemu": {"micro": 92, "minor": 8, "major": 2},
> > "package": " (v2.8.0-2038-g6604c893d0)"}, "capabilities": []}}
> > {'execute':'qmp_capabilities'}
> > {"return": {}}
> > {'execute':'nbd-server-start','arguments':{'addr':{'type':'inet','data':{'host':'localhost','port':'10809'}}}}
> > {"return": {}}
> > {'execute':'nbd-server-add','arguments':{'device':'image','writable':true}}
> > {"error": {"class": "GenericError", "desc": "Conflicts with use by
> > /machine/peripheral-anon/device[0]/virtio-backend as 'root', which does
> > not allow 'write' on image"}
> > 
> > But this works:
> > 
> > $ x86_64-softmmu/qemu-system-x86_64 \
> >     -blockdev node-name=image,driver=qcow2,\
> > file.driver=file,file.filename=foo.qcow2 \
> >     -device virtio-blk,drive=image,share-rw=on \
> >     -qmp stdio
> > {"QMP": {"version": {"qemu": {"micro": 92, "minor": 8, "major": 2},
> > "package": " (v2.8.0-2038-g6604c893d0)"}, "capabilities": []}}
> > {'execute':'qmp_capabilities'}
> > {"return": {}}
> > {'execute':'nbd-server-start','arguments':{'addr':{'type':'inet','data':{'host':'localhost','port':'10809'}}}}
> > {"return": {}}
> > {'execute':'nbd-server-add','arguments':{'device':'image','writable':true}}
> > {"return": {}}
> > 
> > (The difference is the share-rw=on in the -device parameter.)
> > 
> > So in theory all that's necessary is to set share-rw=on for the device
> > in the management layer. But I'm not sure whether that's practical.
> 
> Yes, libvirt needs to provide this option if the guest supports sharing.
> If it doesn't support sharing, rejecting a read-write NBD client seems
> correct to me.
> 
> Peter, Eric, what is the status on the libvirt side here?

Libvirt currently uses the NBD server only for non-shared storage
migration. At that point the disk is not shared (while qemu may think
so) since the other side is not actually running until the mirror
reaches synchronized state.

The fix should be trivial.

> > As for just allowing the NBD server write access to the device... To me
> > that appears pretty difficult from an implementation perspective. We
> > assert that nobody can write without having requested write access and
> > we make sure that nobody can request write access without it being
> > allowed. Making an exception for NBD seems very difficult and would
> > probably mean we'd have to drop the assertion for write accesses altogether.
> 
> Making an exception would simply be wrong.
> 
> Kevin



[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
  2017-04-03 12:39     ` Max Reitz
@ 2017-04-03 13:00       ` Kevin Wolf
  2017-04-03 13:50         ` Peter Krempa
  2017-04-03 19:48         ` Eric Blake
  2017-04-03 19:44       ` Eric Blake
  1 sibling, 2 replies; 31+ messages in thread
From: Kevin Wolf @ 2017-04-03 13:00 UTC (permalink / raw)
  To: Max Reitz
  Cc: Ciprian Barbu, qemu-devel, Eric Blake, pkrempa,
	Alexandru Avadanii, Jeff Cody, Markus Armbruster, svc-armband

[-- Attachment #1: Type: text/plain, Size: 2000 bytes --]

Am 03.04.2017 um 14:39 hat Max Reitz geschrieben:
> On 03.04.2017 10:15, Kevin Wolf wrote:
> > Am 31.03.2017 um 19:43 hat Max Reitz geschrieben:
> 
> [...]
> 
> >> So in theory all that's necessary is to set share-rw=on for the device
> >> in the management layer. But I'm not sure whether that's practical.
> > 
> > Yes, libvirt needs to provide this option if the guest supports sharing.
> > If it doesn't support sharing, rejecting a read-write NBD client seems
> > correct to me.
> > 
> > Peter, Eric, what is the status on the libvirt side here?
> > 
> >> As for just allowing the NBD server write access to the device... To me
> >> that appears pretty difficult from an implementation perspective. We
> >> assert that nobody can write without having requested write access and
> >> we make sure that nobody can request write access without it being
> >> allowed. Making an exception for NBD seems very difficult and would
> >> probably mean we'd have to drop the assertion for write accesses altogether.
> > 
> > Making an exception would simply be wrong.
> 
> Indeed. That is why it would be so difficult.
> 
> The question remains whether it is practical not to make an exception.
> As far as I know, libvirt is only guaranteed to support older qemu
> versions, not necessarily future ones. So we should be allowed to break
> existing use cases here until libvirt is updated (assuming it is
> possible for libvirt to express "guest device allows shared writes" as
> an option for its next release).

If I understand correctly, this is a case of incoming live migration,
i.e. the virtio-blk device which is blocking the writes to the image
doesn't really belong to a running guest yet.

So if we need to make an exception (and actually reading the context
makes it appear so), I guess it would have to be that devices actually
can share the write permission during incoming migration, but not any
more later (unless the share-rw flag is set).

Kevin

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
  2017-04-03 13:00       ` Kevin Wolf
@ 2017-04-03 13:50         ` Peter Krempa
  2017-04-04 12:16           ` Kevin Wolf
  2017-04-03 19:48         ` Eric Blake
  1 sibling, 1 reply; 31+ messages in thread
From: Peter Krempa @ 2017-04-03 13:50 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Max Reitz, Ciprian Barbu, qemu-devel, Eric Blake,
	Alexandru Avadanii, Jeff Cody, Markus Armbruster, svc-armband

[-- Attachment #1: Type: text/plain, Size: 2494 bytes --]

On Mon, Apr 03, 2017 at 15:00:41 +0200, Kevin Wolf wrote:
> Am 03.04.2017 um 14:39 hat Max Reitz geschrieben:
> > On 03.04.2017 10:15, Kevin Wolf wrote:
> > > Am 31.03.2017 um 19:43 hat Max Reitz geschrieben:
> > 
> > [...]
> > 
> > >> So in theory all that's necessary is to set share-rw=on for the device
> > >> in the management layer. But I'm not sure whether that's practical.
> > > 
> > > Yes, libvirt needs to provide this option if the guest supports sharing.
> > > If it doesn't support sharing, rejecting a read-write NBD client seems
> > > correct to me.
> > > 
> > > Peter, Eric, what is the status on the libvirt side here?
> > > 
> > >> As for just allowing the NBD server write access to the device... To me
> > >> that appears pretty difficult from an implementation perspective. We
> > >> assert that nobody can write without having requested write access and
> > >> we make sure that nobody can request write access without it being
> > >> allowed. Making an exception for NBD seems very difficult and would
> > >> probably mean we'd have to drop the assertion for write accesses altogether.
> > > 
> > > Making an exception would simply be wrong.
> > 
> > Indeed. That is why it would be so difficult.
> > 
> > The question remains whether it is practical not to make an exception.
> > As far as I know, libvirt is only guaranteed to support older qemu
> > versions, not necessarily future ones. So we should be allowed to break
> > existing use cases here until libvirt is updated (assuming it is
> > possible for libvirt to express "guest device allows shared writes" as
> > an option for its next release).
> 
> If I understand correctly, this is a case of incoming live migration,
> i.e. the virtio-blk device which is blocking the writes to the image
> doesn't really belong to a running guest yet.

Yes, exactly. libvirt starts the NBD server on the destination of
the migration. Until then the VM did not ever run yet. The VM will run
once the memory migration finishes, so the guest front-end will not
write anything at that point.

> So if we need to make an exception (and actually reading the context
> makes it appear so), I guess it would have to be that devices actually
> can share the write permission during incoming migration, but not any
> more later (unless the share-rw flag is set).

Yes, this would be desired to avoid a regression. Libvirt then tears
down the mirror prior to resuming the VM (afaik).

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
  2017-03-31 17:49   ` Ciprian Barbu
  2017-03-31 17:57     ` Max Reitz
@ 2017-04-03 18:52     ` Kashyap Chamarthy
  2017-04-04  8:07       ` ciprian.barbu
  1 sibling, 1 reply; 31+ messages in thread
From: Kashyap Chamarthy @ 2017-04-03 18:52 UTC (permalink / raw)
  To: Ciprian Barbu
  Cc: Max Reitz, qemu-devel, Eric Blake, Alexandru Avadanii,
	Kevin Wolf, Jeff Cody, Markus Armbruster, svc-armband

On Fri, Mar 31, 2017 at 05:49:52PM +0000, Ciprian Barbu wrote:
> Hi,

[...]

> We also use libvirt v1.3.4, which might be a problem, but at least we
> want to understand if the commit in question introduced an obvious
> problem or if it's all in the details.

A tangential question -- Just curious, your environment
seems...interesting: given you are using a bit old (but above the
MIN_LIBVIRT_VERSION for Nova, which is currently 1.2.1) libvirt version,
with *RC2* of QEMU 2.9 with Nova, and testing live migration.  Me
wonders if this some kind of a custom OpenStack environment, or an
official upstream release that you're simply testing with a newer
version of QEMU...


[1]
http://git.openstack.org/cgit/openstack/nova/tree/nova/virt/libvirt/driver.py?id=068d851#n198

> 
> Btw, the qemu command generated by libvirt is this one, sorry about that:
> 
> 2017-03-31 17:40:10.956+0000: starting up libvirt version: 1.3.4, package: 0+amos3~u16.04 (Enea Armband Devops Team <armband@enea.com> Fri, 13 Jan 2017 02:06:05 +0100), qemu version: 2.8.50(Debian 1:2.9+amos2~u16.04), hostname: node-2.domain.tld
> LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -name instance-00000076,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-14-instance-00000076/master-key.aes -machine virt-2.8,accel=kvm,usb=off,gic-version=3 -cpu host -m 256 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 2812f3c9-f564-499b-a8c7-e9e7ccf24143 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-14-instance-00000076/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -no-shutdown -boot strict=on -kernel /var/lib/nova/instances/2812f3c9-f564-499b-a8c7-e9e7ccf24143/kernel -initrd /var/lib/nova/instances/2812f3c9-f564-499b-a8c7-e9e7ccf24143/ramdisk -append 'root=/dev/vda1 rw rootwait console=tty0 console=ttyS0 console=ttyAMA0' -device i82801b11-bridge,id=pci.1,bus=pcie.0,addr=0x1 -device pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x0 -usb -drive file=/var/lib/nova/instances/2812f3c9-f564-499b-a8c7-e9e7ccf24143/disk,format=qcow2,if=none,id=drive-virtio-disk0,cache=none,aio=native -device virtio-blk-device,scsi=off,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/var/lib/nova/instances/2812f3c9-f564-499b-a8c7-e9e7ccf24143/disk.config,format=raw,if=none,id=drive-virtio-disk1,cache=none,aio=native -device virtio-blk-device,scsi=off,drive=drive-virtio-disk1,id=virtio-disk1 -netdev tap,fd=27,id=hostnet0 -device virtio-net-device,netdev=hostnet0,id=net0,mac=fa:16:3e:82:0a:2b -serial file:/var/lib/nova/instances/2812f3c9-f564-499b-a8c7-e9e7ccf24143/console.log -serial pty -vnc 0.0.0.0:0 -k en-us -device VGA,id=video0,vgamem_mb=16,bus=pci.2,addr=0x1 -device virtio-balloon-device,id=balloon0 -msg timestamp=on
> Domain id=14 is tainted: high-privileges

[...]


-- 
/kashyap

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
  2017-04-03 12:39     ` Max Reitz
  2017-04-03 13:00       ` Kevin Wolf
@ 2017-04-03 19:44       ` Eric Blake
  2017-04-04  8:17         ` ciprian.barbu
  1 sibling, 1 reply; 31+ messages in thread
From: Eric Blake @ 2017-04-03 19:44 UTC (permalink / raw)
  To: Max Reitz, Kevin Wolf
  Cc: Ciprian Barbu, qemu-devel, pkrempa, Alexandru Avadanii,
	Jeff Cody, Markus Armbruster, svc-armband

[-- Attachment #1: Type: text/plain, Size: 2498 bytes --]

On 04/03/2017 07:39 AM, Max Reitz wrote:
>>> As for just allowing the NBD server write access to the device... To me
>>> that appears pretty difficult from an implementation perspective. We
>>> assert that nobody can write without having requested write access and
>>> we make sure that nobody can request write access without it being
>>> allowed. Making an exception for NBD seems very difficult and would
>>> probably mean we'd have to drop the assertion for write accesses altogether.
>>
>> Making an exception would simply be wrong.
> 
> Indeed. That is why it would be so difficult.
> 
> The question remains whether it is practical not to make an exception.
> As far as I know, libvirt is only guaranteed to support older qemu
> versions, not necessarily future ones. So we should be allowed to break
> existing use cases here until libvirt is updated (assuming it is
> possible for libvirt to express "guest device allows shared writes" as
> an option for its next release).

In general, we support:

old qemu, old libvirt (well, as long as those versions are supported)
old qemu, new libvirt
new qemu, new libvirt

but we do NOT make any guarantees of supporting

new qemu, old libvirt

In other words, this may be a failure where new qemu requires extra care
and thus a new libvirt for it to be useful.  It's not nice to break qemu
back-compat if any other solution is possible (new qemu and old libvirt
should work more often than not), but it is the one scenario that no one
supports (whether here, upstream libvirt, or in downstream backports).

Or, put another way, it's perfectly fine if we require that the use of
qemu 2.9 requires that you also use libvirt 3.3.0 or newer (since we
missed the boat on fixing libvirt 3.2 to pass shared-rw or any other
handshaking we come up with), although it's also nice if we figure out
how to make qemu work with what existing libvirt wants to do (the NBD
export needs to be writable by the source pre-migration, and by the
destination post-migration; so there is that aspect of two clients both
wanting to write - but the destination doesn't need to write until after
the source no longer has anything to write, so if we have a clean way to
turn writes off for the source, then turn writes on for the destination,
all before migrating which host is writing, that would be even cleaner).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
  2017-04-03 13:00       ` Kevin Wolf
  2017-04-03 13:50         ` Peter Krempa
@ 2017-04-03 19:48         ` Eric Blake
  1 sibling, 0 replies; 31+ messages in thread
From: Eric Blake @ 2017-04-03 19:48 UTC (permalink / raw)
  To: Kevin Wolf, Max Reitz
  Cc: Ciprian Barbu, qemu-devel, pkrempa, Alexandru Avadanii,
	Jeff Cody, Markus Armbruster, svc-armband

[-- Attachment #1: Type: text/plain, Size: 1274 bytes --]

On 04/03/2017 08:00 AM, Kevin Wolf wrote:
>> The question remains whether it is practical not to make an exception.
>> As far as I know, libvirt is only guaranteed to support older qemu
>> versions, not necessarily future ones. So we should be allowed to break
>> existing use cases here until libvirt is updated (assuming it is
>> possible for libvirt to express "guest device allows shared writes" as
>> an option for its next release).
> 
> If I understand correctly, this is a case of incoming live migration,
> i.e. the virtio-blk device which is blocking the writes to the image
> doesn't really belong to a running guest yet.
> 
> So if we need to make an exception (and actually reading the context
> makes it appear so), I guess it would have to be that devices actually
> can share the write permission during incoming migration, but not any
> more later (unless the share-rw flag is set).

That sounds like the right approach - if nbd-server-add can detect that
it is being invoked during the window between 'qemu -S' and before
'migrate', and allow the shared-rw at that time, then revoke it at the
time the migrate command hits.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
  2017-04-03 18:52     ` Kashyap Chamarthy
@ 2017-04-04  8:07       ` ciprian.barbu
  0 siblings, 0 replies; 31+ messages in thread
From: ciprian.barbu @ 2017-04-04  8:07 UTC (permalink / raw)
  To: Kashyap Chamarthy
  Cc: Max Reitz, qemu-devel, Eric Blake, Alexandru Avadanii,
	Kevin Wolf, Jeff Cody, Markus Armbruster, svc-armband

Hi,

On 03.04.2017 21:52, Kashyap Chamarthy wrote:
> On Fri, Mar 31, 2017 at 05:49:52PM +0000, Ciprian Barbu wrote:
>> Hi,
>
> [...]
>
>> We also use libvirt v1.3.4, which might be a problem, but at least we
>> want to understand if the commit in question introduced an obvious
>> problem or if it's all in the details.
>
> A tangential question -- Just curious, your environment
> seems...interesting: given you are using a bit old (but above the
> MIN_LIBVIRT_VERSION for Nova, which is currently 1.2.1) libvirt version,
> with *RC2* of QEMU 2.9 with Nova, and testing live migration.  Me
> wonders if this some kind of a custom OpenStack environment, or an
> official upstream release that you're simply testing with a newer
> version of QEMU...

Here in the OPNFV ARMBAND Team, the primary goal is to make the OPNFV 
release work on ARM aarch64 servers. We have clusters of servers from 
different vendors, this on in particular is based on the ThunderX cn8890 
SoC, which has received work for live migration from different sources. 
The problem is they have only been upstreamed in new versions of kernel 
and qemu. We tried to backport the changes to earlier versions, 
previously we were running kernel 4.4 and qemu v2.6.1.

But all of this is on top of Openstack Newton, and yes, we basically 
bumped the kernel, qemu and more recently libvirt too (v1.3.4). But as 
described earlier we were kind of forced into this situation.


>
>
> [1]
> http://git.openstack.org/cgit/openstack/nova/tree/nova/virt/libvirt/driver.py?id=068d851#n198
>
>>
>> Btw, the qemu command generated by libvirt is this one, sorry about that:
>>
>> 2017-03-31 17:40:10.956+0000: starting up libvirt version: 1.3.4, package: 0+amos3~u16.04 (Enea Armband Devops Team <armband@enea.com> Fri, 13 Jan 2017 02:06:05 +0100), qemu version: 2.8.50(Debian 1:2.9+amos2~u16.04), hostname: node-2.domain.tld
>> LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -name instance-00000076,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-14-instance-00000076/master-key.aes -machine virt-2.8,accel=kvm,usb=off,gic-version=3 -cpu host -m 256 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 2812f3c9-f564-499b-a8c7-e9e7ccf24143 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-14-instance-00000076/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -no-shutdown -boot strict=on -kernel /var/lib/nova/instances/2812f3c9-f564-499b-a8c7-e9e7ccf24143/kernel -initrd /var/lib/nova/instances/2812f3c9-f564-499b-a8c7-e9e7ccf24143/ramdisk -append 'root=/dev/vda1 rw rootwait console=tty0 console=ttyS0 console=ttyAMA0' -device i82801b11-bridge,id=pci.1,bus=pcie.0,addr=0x1 -device pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x0 -usb -drive file=/var/lib/nova/instances/2812f3c9-f564-499b-a8c7-e9e7ccf24143/disk,format=qcow2,if=none,id=drive-virtio-disk0,cache=none,aio=native -device virtio-blk-device,scsi=off,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/var/lib/nova/instances/2812f3c9-f564-499b-a8c7-e9e7ccf24143/disk.config,format=raw,if=none,id=drive-virtio-disk1,cache=none,aio=native -device virtio-blk-device,scsi=off,drive=drive-virtio-disk1,id=virtio-disk1 -netdev tap,fd=27,id=hostnet0 -device virtio-net-device,netdev=hostnet0,id=net0,mac=fa:16:3e:82:0a:2b -serial file:/var/lib/nova/instances/2812f3c9-f564-499b-a8c7-e9e7ccf24143/console.log -serial pty -vnc 0.0.0.0:0 -k en-us -device VGA,id=video0,vgamem_mb=16,bus=pci.2,addr=0x1 -device virtio-balloon-device,id=balloon0 -msg timestamp=on
>> Domain id=14 is tainted: high-privileges
>
> [...]
>
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
  2017-04-03 19:44       ` Eric Blake
@ 2017-04-04  8:17         ` ciprian.barbu
  2017-04-04 11:00           ` Paolo Bonzini
  0 siblings, 1 reply; 31+ messages in thread
From: ciprian.barbu @ 2017-04-04  8:17 UTC (permalink / raw)
  To: Eric Blake, Max Reitz, Kevin Wolf
  Cc: qemu-devel, pkrempa, Alexandru Avadanii, Jeff Cody,
	Markus Armbruster, svc-armband

Hi,

On 03.04.2017 22:44, Eric Blake wrote:
> On 04/03/2017 07:39 AM, Max Reitz wrote:
>>>> As for just allowing the NBD server write access to the device... To me
>>>> that appears pretty difficult from an implementation perspective. We
>>>> assert that nobody can write without having requested write access and
>>>> we make sure that nobody can request write access without it being
>>>> allowed. Making an exception for NBD seems very difficult and would
>>>> probably mean we'd have to drop the assertion for write accesses altogether.
>>>
>>> Making an exception would simply be wrong.
>>
>> Indeed. That is why it would be so difficult.
>>
>> The question remains whether it is practical not to make an exception.
>> As far as I know, libvirt is only guaranteed to support older qemu
>> versions, not necessarily future ones. So we should be allowed to break
>> existing use cases here until libvirt is updated (assuming it is
>> possible for libvirt to express "guest device allows shared writes" as
>> an option for its next release).
>
> In general, we support:
>
> old qemu, old libvirt (well, as long as those versions are supported)
> old qemu, new libvirt
> new qemu, new libvirt
>
> but we do NOT make any guarantees of supporting
>
> new qemu, old libvirt

Sounds reasonable enough, I guess we didn't look at it this way.

>
> In other words, this may be a failure where new qemu requires extra care
> and thus a new libvirt for it to be useful.  It's not nice to break qemu
> back-compat if any other solution is possible (new qemu and old libvirt
> should work more often than not), but it is the one scenario that no one
> supports (whether here, upstream libvirt, or in downstream backports).
>
> Or, put another way, it's perfectly fine if we require that the use of
> qemu 2.9 requires that you also use libvirt 3.3.0 or newer (since we
> missed the boat on fixing libvirt 3.2 to pass shared-rw or any other
> handshaking we come up with), although it's also nice if we figure out
> how to make qemu work with what existing libvirt wants to do (the NBD
> export needs to be writable by the source pre-migration, and by the
> destination post-migration; so there is that aspect of two clients both
> wanting to write - but the destination doesn't need to write until after
> the source no longer has anything to write, so if we have a clean way to
> turn writes off for the source, then turn writes on for the destination,
> all before migrating which host is writing, that would be even cleaner).
>

Ok, this sounds like you are saying that there isn't much reasoning 
behind this usecase, and also we might be the only ones complaining 
about new qemu not working with an old libvirt.

I think we could do with a 'clean' (as clean as it gets) workaround, 
since it might be very difficult to bump libvirt in our Openstack 
environment, skipping one major versions and several minors. But we 
would be willing to do it as long as there are good chances it should 
work ok. In the mean time we are using a less clean workaround and we 
rely on you guys to help us find a good solution.

Thanks and regards,
/Ciprian

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
  2017-04-04  8:17         ` ciprian.barbu
@ 2017-04-04 11:00           ` Paolo Bonzini
  2017-04-04 11:14             ` Daniel P. Berrange
  0 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2017-04-04 11:00 UTC (permalink / raw)
  To: ciprian.barbu, Eric Blake, Max Reitz, Kevin Wolf
  Cc: pkrempa, svc-armband, Jeff Cody, qemu-devel, Markus Armbruster,
	Alexandru Avadanii



On 04/04/2017 10:17, ciprian.barbu wrote:
>>
>> but we do NOT make any guarantees of supporting
>>
>> new qemu, old libvirt
> 
> Sounds reasonable enough, I guess we didn't look at it this way.

Yes, but usually it's "new qemu, very old libvirt", like several years
old.  I think this should be treated as a regression.

Between this issue and the problem with snapshots, it seems to me that
the op blockers was really, really premature.

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
  2017-04-04 11:00           ` Paolo Bonzini
@ 2017-04-04 11:14             ` Daniel P. Berrange
  0 siblings, 0 replies; 31+ messages in thread
From: Daniel P. Berrange @ 2017-04-04 11:14 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: ciprian.barbu, Eric Blake, Max Reitz, Kevin Wolf, pkrempa,
	svc-armband, Jeff Cody, Markus Armbruster, qemu-devel,
	Alexandru Avadanii

On Tue, Apr 04, 2017 at 01:00:08PM +0200, Paolo Bonzini wrote:
> 
> 
> On 04/04/2017 10:17, ciprian.barbu wrote:
> >>
> >> but we do NOT make any guarantees of supporting
> >>
> >> new qemu, old libvirt
> > 
> > Sounds reasonable enough, I guess we didn't look at it this way.
> 
> Yes, but usually it's "new qemu, very old libvirt", like several years
> old.  I think this should be treated as a regression.
> 
> Between this issue and the problem with snapshots, it seems to me that
> the op blockers was really, really premature.

I think this kind of issue is inevitable when introducing this kind of
feature in to QEMU - no matter how well we think we've identified the
edge cases during development, this kind of issue would inevitably crop
up somewhere.

Even if we had libvirt support already, chances are we would have missed
handling of this edge case.

This feels like a case where the new feature needs to have a different
default - ie default to old behavior and require explicit opt-in to use
of the op-blockers. This allows people to opt-in to test the feature in
real-world & identify problems without risk of breaking the everyone else
out of the box.

Can we get this disabled by default to 2.9, with a view to having some
time for testing before enabling it by default in 2.10 / 2.11

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
  2017-04-03 13:50         ` Peter Krempa
@ 2017-04-04 12:16           ` Kevin Wolf
  2017-04-04 13:51             ` Eric Blake
  2017-04-04 14:04             ` Paolo Bonzini
  0 siblings, 2 replies; 31+ messages in thread
From: Kevin Wolf @ 2017-04-04 12:16 UTC (permalink / raw)
  To: Peter Krempa
  Cc: Max Reitz, Ciprian Barbu, qemu-devel, Eric Blake,
	Alexandru Avadanii, Jeff Cody, Markus Armbruster, svc-armband

[-- Attachment #1: Type: text/plain, Size: 1906 bytes --]

Am 03.04.2017 um 15:50 hat Peter Krempa geschrieben:
> On Mon, Apr 03, 2017 at 15:00:41 +0200, Kevin Wolf wrote:
> > If I understand correctly, this is a case of incoming live migration,
> > i.e. the virtio-blk device which is blocking the writes to the image
> > doesn't really belong to a running guest yet.
> 
> Yes, exactly. libvirt starts the NBD server on the destination of
> the migration. Until then the VM did not ever run yet. The VM will run
> once the memory migration finishes, so the guest front-end will not
> write anything at that point.
> 
> > So if we need to make an exception (and actually reading the context
> > makes it appear so), I guess it would have to be that devices actually
> > can share the write permission during incoming migration, but not any
> > more later (unless the share-rw flag is set).
> 
> Yes, this would be desired to avoid a regression. Libvirt then tears
> down the mirror prior to resuming the VM (afaik).

Now the big question is how to implement this. Just not requesting the
write permission initially if runstate_check(RUN_STATE_INMIGRATE) is
easy. But we need to find a place to actually request it later, after
the mirror has completed, and before the VM is running.

My first thought was that we could add a VMChangeStateHandler and just
request the permission in there. However, requesting the permission can
fail (e.g. because the NBD server hasn't been shut down) and we can't
let a state transition fail from a VMChangeStateHandler.

Maybe the next best thing we could do is to have a BlockDevOps callback
for invalidate_cache/inactivate and get the permissions there. This one
could return an error, which would be passed up the stack and eventually
cause qmp_cont() to fail (i.e. it would refuse to resume the VM). This
sounds workable to me.

Any comments on this approach, or maybe other ideas?

Kevin

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
  2017-04-03 12:51     ` Peter Krempa
@ 2017-04-04 13:19       ` Kevin Wolf
  2017-04-04 13:27         ` Peter Krempa
  0 siblings, 1 reply; 31+ messages in thread
From: Kevin Wolf @ 2017-04-04 13:19 UTC (permalink / raw)
  To: Peter Krempa
  Cc: Max Reitz, Ciprian Barbu, qemu-devel, Eric Blake,
	Alexandru Avadanii, Jeff Cody, Markus Armbruster, svc-armband

[-- Attachment #1: Type: text/plain, Size: 2702 bytes --]

Am 03.04.2017 um 14:51 hat Peter Krempa geschrieben:
> On Mon, Apr 03, 2017 at 10:15:42 +0200, Kevin Wolf wrote:
> > Am 31.03.2017 um 19:43 hat Max Reitz geschrieben:
> > > On 31.03.2017 18:03, Ciprian Barbu wrote:
> 
> [...]
> 
> > > So this doesn't work:
> > > 
> > > $ x86_64-softmmu/qemu-system-x86_64 \
> > >     -blockdev node-name=image,driver=qcow2,\
> > > file.driver=file,file.filename=foo.qcow2 \
> > >     -device virtio-blk,drive=image \
> > >     -qmp stdio
> > > {"QMP": {"version": {"qemu": {"micro": 92, "minor": 8, "major": 2},
> > > "package": " (v2.8.0-2038-g6604c893d0)"}, "capabilities": []}}
> > > {'execute':'qmp_capabilities'}
> > > {"return": {}}
> > > {'execute':'nbd-server-start','arguments':{'addr':{'type':'inet','data':{'host':'localhost','port':'10809'}}}}
> > > {"return": {}}
> > > {'execute':'nbd-server-add','arguments':{'device':'image','writable':true}}
> > > {"error": {"class": "GenericError", "desc": "Conflicts with use by
> > > /machine/peripheral-anon/device[0]/virtio-backend as 'root', which does
> > > not allow 'write' on image"}
> > > 
> > > But this works:
> > > 
> > > $ x86_64-softmmu/qemu-system-x86_64 \
> > >     -blockdev node-name=image,driver=qcow2,\
> > > file.driver=file,file.filename=foo.qcow2 \
> > >     -device virtio-blk,drive=image,share-rw=on \
> > >     -qmp stdio
> > > {"QMP": {"version": {"qemu": {"micro": 92, "minor": 8, "major": 2},
> > > "package": " (v2.8.0-2038-g6604c893d0)"}, "capabilities": []}}
> > > {'execute':'qmp_capabilities'}
> > > {"return": {}}
> > > {'execute':'nbd-server-start','arguments':{'addr':{'type':'inet','data':{'host':'localhost','port':'10809'}}}}
> > > {"return": {}}
> > > {'execute':'nbd-server-add','arguments':{'device':'image','writable':true}}
> > > {"return": {}}
> > > 
> > > (The difference is the share-rw=on in the -device parameter.)
> > > 
> > > So in theory all that's necessary is to set share-rw=on for the device
> > > in the management layer. But I'm not sure whether that's practical.
> > 
> > Yes, libvirt needs to provide this option if the guest supports sharing.
> > If it doesn't support sharing, rejecting a read-write NBD client seems
> > correct to me.
> > 
> > Peter, Eric, what is the status on the libvirt side here?
> 
> Libvirt currently uses the NBD server only for non-shared storage
> migration. At that point the disk is not shared (while qemu may think
> so) since the other side is not actually running until the mirror
> reaches synchronized state.

Yes, I misunderstood the situation at first.

Anyway, is there already a libvirt patch for the cases where the image
is actually shared?

Kevin

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
  2017-04-04 13:19       ` Kevin Wolf
@ 2017-04-04 13:27         ` Peter Krempa
  2017-04-04 13:54           ` Kevin Wolf
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Krempa @ 2017-04-04 13:27 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Max Reitz, Ciprian Barbu, qemu-devel, Eric Blake,
	Alexandru Avadanii, Jeff Cody, Markus Armbruster, svc-armband

[-- Attachment #1: Type: text/plain, Size: 878 bytes --]

On Tue, Apr 04, 2017 at 15:19:02 +0200, Kevin Wolf wrote:
> Am 03.04.2017 um 14:51 hat Peter Krempa geschrieben:
> > On Mon, Apr 03, 2017 at 10:15:42 +0200, Kevin Wolf wrote:
> > > Am 31.03.2017 um 19:43 hat Max Reitz geschrieben:
> > > > On 31.03.2017 18:03, Ciprian Barbu wrote:

[...]

> > > Peter, Eric, what is the status on the libvirt side here?
> > 
> > Libvirt currently uses the NBD server only for non-shared storage
> > migration. At that point the disk is not shared (while qemu may think
> > so) since the other side is not actually running until the mirror
> > reaches synchronized state.
> 
> Yes, I misunderstood the situation at first.
> 
> Anyway, is there already a libvirt patch for the cases where the image
> is actually shared?

As said, we use the nbd server only for migration. There's no other
usage I know of and thus no patch.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
  2017-04-04 12:16           ` Kevin Wolf
@ 2017-04-04 13:51             ` Eric Blake
  2017-04-04 14:04             ` Paolo Bonzini
  1 sibling, 0 replies; 31+ messages in thread
From: Eric Blake @ 2017-04-04 13:51 UTC (permalink / raw)
  To: Kevin Wolf, Peter Krempa
  Cc: Max Reitz, Ciprian Barbu, qemu-devel, Alexandru Avadanii,
	Jeff Cody, Markus Armbruster, svc-armband

[-- Attachment #1: Type: text/plain, Size: 1401 bytes --]

On 04/04/2017 07:16 AM, Kevin Wolf wrote:

> 
> Now the big question is how to implement this. Just not requesting the
> write permission initially if runstate_check(RUN_STATE_INMIGRATE) is
> easy. But we need to find a place to actually request it later, after
> the mirror has completed, and before the VM is running.
> 
> My first thought was that we could add a VMChangeStateHandler and just
> request the permission in there. However, requesting the permission can
> fail (e.g. because the NBD server hasn't been shut down) and we can't
> let a state transition fail from a VMChangeStateHandler.
> 
> Maybe the next best thing we could do is to have a BlockDevOps callback
> for invalidate_cache/inactivate and get the permissions there. This one
> could return an error, which would be passed up the stack and eventually
> cause qmp_cont() to fail (i.e. it would refuse to resume the VM). This
> sounds workable to me.

Having 'cont' refuse when write permissions can't be grabbed after all
sounds reasonable (assuming, of course, that the common case is that the
invalidate_cache/inactivate call that triggers the callback will
normally succeed, because it means the source is now done and the
destination _should_ be able to get write access at that point).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
  2017-04-04 13:27         ` Peter Krempa
@ 2017-04-04 13:54           ` Kevin Wolf
  0 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2017-04-04 13:54 UTC (permalink / raw)
  To: Peter Krempa
  Cc: Max Reitz, Ciprian Barbu, qemu-devel, Eric Blake,
	Alexandru Avadanii, Jeff Cody, Markus Armbruster, svc-armband

[-- Attachment #1: Type: text/plain, Size: 1259 bytes --]

Am 04.04.2017 um 15:27 hat Peter Krempa geschrieben:
> On Tue, Apr 04, 2017 at 15:19:02 +0200, Kevin Wolf wrote:
> > Am 03.04.2017 um 14:51 hat Peter Krempa geschrieben:
> > > On Mon, Apr 03, 2017 at 10:15:42 +0200, Kevin Wolf wrote:
> > > > Am 31.03.2017 um 19:43 hat Max Reitz geschrieben:
> > > > > On 31.03.2017 18:03, Ciprian Barbu wrote:
> 
> [...]
> 
> > > > Peter, Eric, what is the status on the libvirt side here?
> > > 
> > > Libvirt currently uses the NBD server only for non-shared storage
> > > migration. At that point the disk is not shared (while qemu may think
> > > so) since the other side is not actually running until the mirror
> > > reaches synchronized state.
> > 
> > Yes, I misunderstood the situation at first.
> > 
> > Anyway, is there already a libvirt patch for the cases where the image
> > is actually shared?
> 
> As said, we use the nbd server only for migration. There's no other
> usage I know of and thus no patch.

I mean two VMs sharing the same image file, and both expose it as a
guest device. The situation for which we discussed before that libvirt
needs to pass share-rw=on. I understand that this isn't related to this
report, but I want to make sure that it's not forgotten.

Kevin

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
  2017-04-04 12:16           ` Kevin Wolf
  2017-04-04 13:51             ` Eric Blake
@ 2017-04-04 14:04             ` Paolo Bonzini
  2017-04-04 14:53               ` Kevin Wolf
  1 sibling, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2017-04-04 14:04 UTC (permalink / raw)
  To: Kevin Wolf, Peter Krempa
  Cc: svc-armband, Markus Armbruster, Jeff Cody, Ciprian Barbu,
	qemu-devel, Max Reitz, Alexandru Avadanii

[-- Attachment #1: Type: text/plain, Size: 1418 bytes --]



On 04/04/2017 14:16, Kevin Wolf wrote:
> Just not requesting the
> write permission initially if runstate_check(RUN_STATE_INMIGRATE) is
> easy. But we need to find a place to actually request it later, after
> the mirror has completed, and before the VM is running.

Isn't there already a bdrv_invalidate_cache_all or something like that
called when the VM is started?

Having to touch all devices would be ugly.  Maybe some permissions could
be marked as "deferred".

The big question is how this fits into release management.  We have
another important regression from the op blocker work and only a week to
go before the last rc.  Are we going to delay 2.9 arbitrarily?  Are we
going to shorten the 2.10 development period correspondingly?  (I vote
yes and yes, FWIW).

Paolo

> My first thought was that we could add a VMChangeStateHandler and just
> request the permission in there. However, requesting the permission can
> fail (e.g. because the NBD server hasn't been shut down) and we can't
> let a state transition fail from a VMChangeStateHandler.
> 
> Maybe the next best thing we could do is to have a BlockDevOps callback
> for invalidate_cache/inactivate and get the permissions there. This one
> could return an error, which would be passed up the stack and eventually
> cause qmp_cont() to fail (i.e. it would refuse to resume the VM). This
> sounds workable to me.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
  2017-04-04 14:04             ` Paolo Bonzini
@ 2017-04-04 14:53               ` Kevin Wolf
  2017-04-04 15:09                 ` Paolo Bonzini
  0 siblings, 1 reply; 31+ messages in thread
From: Kevin Wolf @ 2017-04-04 14:53 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Krempa, svc-armband, Markus Armbruster, Jeff Cody,
	Ciprian Barbu, qemu-devel, Max Reitz, Alexandru Avadanii

[-- Attachment #1: Type: text/plain, Size: 1881 bytes --]

Am 04.04.2017 um 16:04 hat Paolo Bonzini geschrieben:
> On 04/04/2017 14:16, Kevin Wolf wrote:
> > Just not requesting the
> > write permission initially if runstate_check(RUN_STATE_INMIGRATE) is
> > easy. But we need to find a place to actually request it later, after
> > the mirror has completed, and before the VM is running.
> 
> Isn't there already a bdrv_invalidate_cache_all or something like that
> called when the VM is started?

bdrv_invalidate_cache_all() is different in two aspects.

First, it starts at the BDS level and propagates down to children, but
not up to the parents, where the permissions are set.

Second, bdrv_invalidate_cache_all() is called immediately when migration
completes. This is before the NBD server is shut down, so we can't
actually request write permissions at this point yet.

> Having to touch all devices would be ugly.  Maybe some permissions could
> be marked as "deferred".

Yes, BlockDevOps for it wasn't that great an idea.

We have a common function blkconf_apply_backend_options() that sets the
permissions for any devices that we care about with storage migration.
This function can be changed not to do this during incoming migration,
and then we would need a way to find the BlockConf later when trying to
resume the VM. Failing that, we could try to make it register a notifier
or something.

Or we could mark BlockBackends as "enforce permissions only after
migration has completed" and do this for all BlockBackends of devices.

> The big question is how this fits into release management.  We have
> another important regression from the op blocker work and only a week
> to go before the last rc.  Are we going to delay 2.9 arbitrarily?  Are
> we going to shorten the 2.10 development period correspondingly?  (I
> vote yes and yes, FWIW).

Which is the other regression?

Kevin

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
  2017-04-04 14:53               ` Kevin Wolf
@ 2017-04-04 15:09                 ` Paolo Bonzini
  2017-04-05 11:01                   ` Kevin Wolf
  0 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2017-04-04 15:09 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Peter Krempa, svc-armband, Markus Armbruster, Jeff Cody,
	Ciprian Barbu, qemu-devel, Max Reitz, Alexandru Avadanii

[-- Attachment #1: Type: text/plain, Size: 473 bytes --]



On 04/04/2017 16:53, Kevin Wolf wrote:
>> The big question is how this fits into release management.  We have
>> another important regression from the op blocker work and only a week
>> to go before the last rc.  Are we going to delay 2.9 arbitrarily?  Are
>> we going to shorten the 2.10 development period correspondingly?  (I
>> vote yes and yes, FWIW).
> Which is the other regression?

The assertion failure for snapshot_blkdev with iothreads.

Paolo


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
  2017-04-04 15:09                 ` Paolo Bonzini
@ 2017-04-05 11:01                   ` Kevin Wolf
  2017-04-05 21:13                     ` Paolo Bonzini
  0 siblings, 1 reply; 31+ messages in thread
From: Kevin Wolf @ 2017-04-05 11:01 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Krempa, svc-armband, Markus Armbruster, Jeff Cody,
	Ciprian Barbu, qemu-devel, Max Reitz, Alexandru Avadanii

[-- Attachment #1: Type: text/plain, Size: 1027 bytes --]

Am 04.04.2017 um 17:09 hat Paolo Bonzini geschrieben:
> On 04/04/2017 16:53, Kevin Wolf wrote:
> >> The big question is how this fits into release management.  We have
> >> another important regression from the op blocker work and only a week
> >> to go before the last rc.  Are we going to delay 2.9 arbitrarily?  Are
> >> we going to shorten the 2.10 development period correspondingly?  (I
> >> vote yes and yes, FWIW).
> > Which is the other regression?
> 
> The assertion failure for snapshot_blkdev with iothreads.

Ah, right, I keep forgetting that this started appearing with the op
blocker series because the failure mode is completely different, so it
seems to have been a latent bug somewhere else that was uncovered by it.

If we're sure that the change of the order in bdrv_append() is what
caused the bug to appear, we can just undo that for 2.9, at the cost of
a messed up graph in the error case when bdrv_set_backing_hd() fails
(because we have no way to undo bdrv_replace_node()).

Kevin

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
  2017-04-05 11:01                   ` Kevin Wolf
@ 2017-04-05 21:13                     ` Paolo Bonzini
  2017-04-06  8:48                       ` Kevin Wolf
  0 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2017-04-05 21:13 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Peter Krempa, svc-armband, Markus Armbruster, Jeff Cody,
	Ciprian Barbu, qemu-devel, Max Reitz, Alexandru Avadanii



On 05/04/2017 13:01, Kevin Wolf wrote:
> Am 04.04.2017 um 17:09 hat Paolo Bonzini geschrieben:
>> On 04/04/2017 16:53, Kevin Wolf wrote:
>>>> The big question is how this fits into release management.  We have
>>>> another important regression from the op blocker work and only a week
>>>> to go before the last rc.  Are we going to delay 2.9 arbitrarily?  Are
>>>> we going to shorten the 2.10 development period correspondingly?  (I
>>>> vote yes and yes, FWIW).
>>> Which is the other regression?
>>
>> The assertion failure for snapshot_blkdev with iothreads.
> 
> Ah, right, I keep forgetting that this started appearing with the op
> blocker series because the failure mode is completely different, so it
> seems to have been a latent bug somewhere else that was uncovered by it.
> 
> If we're sure that the change of the order in bdrv_append() is what
> caused the bug to appear, we can just undo that for 2.9, at the cost of
> a messed up graph in the error case when bdrv_set_backing_hd() fails
> (because we have no way to undo bdrv_replace_node()).

I don't know if that is enough to fix all of the issues, but the bug is
easy to reproduce.

The issue is the lack of understanding of what node movement does to
quiesce_counter.  The invariant is that children cannot have a lower
quiesce_counter than parents, I think (paths in the graph can only join
in the children direction, right?).  Is it checked, and are there
violations already?  Maybe we need a get_quiesce_counter method in
BdrvChildRole, to cover BlockBackend's quiesce_counter?  Then we can use
that information to adjust the quiesce_counter when nodes move in the graph.

The block layer has good tests, but as the internal logic grows more
complex we should probably have more C level tests.  I'm constantly
impressed by the amount of tricky cases that test-replication.c catches
in the block job code.

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
  2017-04-05 21:13                     ` Paolo Bonzini
@ 2017-04-06  8:48                       ` Kevin Wolf
  2017-04-06  9:03                         ` Kevin Wolf
  0 siblings, 1 reply; 31+ messages in thread
From: Kevin Wolf @ 2017-04-06  8:48 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Krempa, svc-armband, Markus Armbruster, Jeff Cody,
	Ciprian Barbu, qemu-devel, Max Reitz, Alexandru Avadanii

Am 05.04.2017 um 23:13 hat Paolo Bonzini geschrieben:
> On 05/04/2017 13:01, Kevin Wolf wrote:
> > Am 04.04.2017 um 17:09 hat Paolo Bonzini geschrieben:
> >> On 04/04/2017 16:53, Kevin Wolf wrote:
> >>>> The big question is how this fits into release management.  We have
> >>>> another important regression from the op blocker work and only a week
> >>>> to go before the last rc.  Are we going to delay 2.9 arbitrarily?  Are
> >>>> we going to shorten the 2.10 development period correspondingly?  (I
> >>>> vote yes and yes, FWIW).
> >>> Which is the other regression?
> >>
> >> The assertion failure for snapshot_blkdev with iothreads.
> > 
> > Ah, right, I keep forgetting that this started appearing with the op
> > blocker series because the failure mode is completely different, so it
> > seems to have been a latent bug somewhere else that was uncovered by it.
> > 
> > If we're sure that the change of the order in bdrv_append() is what
> > caused the bug to appear, we can just undo that for 2.9, at the cost of
> > a messed up graph in the error case when bdrv_set_backing_hd() fails
> > (because we have no way to undo bdrv_replace_node()).
> 
> I don't know if that is enough to fix all of the issues, but the bug is
> easy to reproduce.
> 
> The issue is the lack of understanding of what node movement does to
> quiesce_counter.  The invariant is that children cannot have a lower
> quiesce_counter than parents, I think (paths in the graph can only join
> in the children direction, right?).

Maybe I'm missing something, but I think this isn't true at all. Drains
are propagated to the parents, so that this specific node doesn't
receive new requests, but not to the children. The assumption is that
children don't do anything anyway without requests from their parents,
so they are effectively quiesced even with quiesce_counter == 0.

So if anything, the invariant should be the exact opposite: Parents
cannot have a lower quiesce_counter than their children.

I think the exact thing that the quiesce_counter of a node is expected
to be is the number of paths from itself to an explicitly drained node
in the directed block driver graph (counting one path if it is
explicitly drained itself). A path counts multiple times if a node is
explicitly drained multiple times.

> Is it checked, and are there violations already?  Maybe we need a
> get_quiesce_counter method in BdrvChildRole, to cover BlockBackend's
> quiesce_counter?  Then we can use that information to adjust the
> quiesce_counter when nodes move in the graph.

We would need that if we had a downwards propagation and if a
BlockBackend could be drained, but as it stands, I don't see what could
be missing from bdrv_replace_child_noperm() (well, except that I think
your patch is right to avoid calling drained_end/begin if both nodes
were drained because new requests could sneak in this way in theory).

> The block layer has good tests, but as the internal logic grows more
> complex we should probably have more C level tests.  I'm constantly
> impressed by the amount of tricky cases that test-replication.c catches
> in the block job code.

Never really noticed test-replication specifically catching things when
I worked on the op blockers code which changed a lot around block jobs,
but that we should consider this type of tests more often is probably a
good point.

Kevin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
  2017-04-06  8:48                       ` Kevin Wolf
@ 2017-04-06  9:03                         ` Kevin Wolf
  0 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2017-04-06  9:03 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Krempa, svc-armband, Markus Armbruster, Jeff Cody,
	Ciprian Barbu, qemu-devel, Max Reitz, Alexandru Avadanii

Am 06.04.2017 um 10:48 hat Kevin Wolf geschrieben:
> Am 05.04.2017 um 23:13 hat Paolo Bonzini geschrieben:
> > On 05/04/2017 13:01, Kevin Wolf wrote:
> > > Am 04.04.2017 um 17:09 hat Paolo Bonzini geschrieben:
> > >> On 04/04/2017 16:53, Kevin Wolf wrote:
> > >>>> The big question is how this fits into release management.  We have
> > >>>> another important regression from the op blocker work and only a week
> > >>>> to go before the last rc.  Are we going to delay 2.9 arbitrarily?  Are
> > >>>> we going to shorten the 2.10 development period correspondingly?  (I
> > >>>> vote yes and yes, FWIW).
> > >>> Which is the other regression?
> > >>
> > >> The assertion failure for snapshot_blkdev with iothreads.
> > > 
> > > Ah, right, I keep forgetting that this started appearing with the op
> > > blocker series because the failure mode is completely different, so it
> > > seems to have been a latent bug somewhere else that was uncovered by it.
> > > 
> > > If we're sure that the change of the order in bdrv_append() is what
> > > caused the bug to appear, we can just undo that for 2.9, at the cost of
> > > a messed up graph in the error case when bdrv_set_backing_hd() fails
> > > (because we have no way to undo bdrv_replace_node()).
> > 
> > I don't know if that is enough to fix all of the issues, but the bug is
> > easy to reproduce.
> > 
> > The issue is the lack of understanding of what node movement does to
> > quiesce_counter.  The invariant is that children cannot have a lower
> > quiesce_counter than parents, I think (paths in the graph can only join
> > in the children direction, right?).
> 
> Maybe I'm missing something, but I think this isn't true at all. Drains
> are propagated to the parents, so that this specific node doesn't
> receive new requests, but not to the children. The assumption is that
> children don't do anything anyway without requests from their parents,
> so they are effectively quiesced even with quiesce_counter == 0.
> 
> So if anything, the invariant should be the exact opposite: Parents
> cannot have a lower quiesce_counter than their children.
> 
> I think the exact thing that the quiesce_counter of a node is expected
> to be is the number of paths from itself to an explicitly drained node
> in the directed block driver graph (counting one path if it is
> explicitly drained itself). A path counts multiple times if a node is
> explicitly drained multiple times.
> 
> > Is it checked, and are there violations already?  Maybe we need a
> > get_quiesce_counter method in BdrvChildRole, to cover BlockBackend's
> > quiesce_counter?  Then we can use that information to adjust the
> > quiesce_counter when nodes move in the graph.
> 
> We would need that if we had a downwards propagation and if a
> BlockBackend could be drained, but as it stands, I don't see what could
> be missing from bdrv_replace_child_noperm() (well, except that I think
> your patch is right to avoid calling drained_end/begin if both nodes
> were drained because new requests could sneak in this way in theory).

Actually, to get this part completely right, we also need to drain the
BlockBackend _before_ attaching the new BDS. Otherwise, if the old BDS
wasn't quiesced, but the new one is, the BdrvChildRole.drained_begin()
callback could send requests to the already drained new BDS.

Kevin

> > The block layer has good tests, but as the internal logic grows more
> > complex we should probably have more C level tests.  I'm constantly
> > impressed by the amount of tricky cases that test-replication.c catches
> > in the block job code.
> 
> Never really noticed test-replication specifically catching things when
> I worked on the op blockers code which changed a lot around block jobs,
> but that we should consider this type of tests more often is probably a
> good point.
> 
> Kevin

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2017-04-06  9:04 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-31 16:03 [Qemu-devel] nbd: Possible regression in 2.9 RCs Ciprian Barbu
2017-03-31 17:32 ` Ciprian Barbu
2017-03-31 17:36   ` Ciprian Barbu
2017-03-31 17:43 ` Max Reitz
2017-03-31 17:49   ` Ciprian Barbu
2017-03-31 17:57     ` Max Reitz
2017-03-31 19:07       ` Alexandru Avadanii
2017-04-03 18:52     ` Kashyap Chamarthy
2017-04-04  8:07       ` ciprian.barbu
2017-04-03  8:15   ` Kevin Wolf
2017-04-03 12:39     ` Max Reitz
2017-04-03 13:00       ` Kevin Wolf
2017-04-03 13:50         ` Peter Krempa
2017-04-04 12:16           ` Kevin Wolf
2017-04-04 13:51             ` Eric Blake
2017-04-04 14:04             ` Paolo Bonzini
2017-04-04 14:53               ` Kevin Wolf
2017-04-04 15:09                 ` Paolo Bonzini
2017-04-05 11:01                   ` Kevin Wolf
2017-04-05 21:13                     ` Paolo Bonzini
2017-04-06  8:48                       ` Kevin Wolf
2017-04-06  9:03                         ` Kevin Wolf
2017-04-03 19:48         ` Eric Blake
2017-04-03 19:44       ` Eric Blake
2017-04-04  8:17         ` ciprian.barbu
2017-04-04 11:00           ` Paolo Bonzini
2017-04-04 11:14             ` Daniel P. Berrange
2017-04-03 12:51     ` Peter Krempa
2017-04-04 13:19       ` Kevin Wolf
2017-04-04 13:27         ` Peter Krempa
2017-04-04 13:54           ` Kevin Wolf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.