All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] rbd: Possible regression in 2.9 RCs
@ 2017-03-30 23:42 Alexandru Avadanii
  2017-03-31  0:39 ` Eric Blake
  0 siblings, 1 reply; 7+ messages in thread
From: Alexandru Avadanii @ 2017-03-30 23:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: svc-armband

Hi,
While testing out 2.9.0-rc2 on AArch64, we noticed a possible regression in QEMU, related to parsing -drive 'file=rbd/...':
> "conf option 6789 has no value".

Instance logs [1].
Occasionally, we get "conf option too long", with the same effect.

We bisected this manually between 2.8.0 (working ok with the above cmd) and 2.9.0-rc2, and the problematic change seems to be the merge point [2].
The problem is not arch-specific, i.e. it applies to x86 too, but our set up is based on AArch64, so I attached the logs I had lying around.

I would digg deeper into this myself, but I'm quite burnt out for today.

BR,
Alex

[1] http://paste.openstack.org/show/604938/
[2] https://github.com/qemu/qemu/commit/9a81b79

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] rbd: Possible regression in 2.9 RCs
  2017-03-30 23:42 [Qemu-devel] rbd: Possible regression in 2.9 RCs Alexandru Avadanii
@ 2017-03-31  0:39 ` Eric Blake
  2017-03-31  0:52   ` Alexandru Avadanii
  2017-03-31  1:05   ` Alexandru Avadanii
  0 siblings, 2 replies; 7+ messages in thread
From: Eric Blake @ 2017-03-31  0:39 UTC (permalink / raw)
  To: Alexandru Avadanii, qemu-devel; +Cc: svc-armband, Jeff Cody, Markus Armbruster

[-- Attachment #1: Type: text/plain, Size: 2019 bytes --]

On 03/30/2017 06:42 PM, Alexandru Avadanii wrote:
> Hi,
> While testing out 2.9.0-rc2 on AArch64, we noticed a possible regression in QEMU, related to parsing -drive 'file=rbd/...':
>> "conf option 6789 has no value".
> 
> Instance logs [1].

Pastebins don't last forever; it helps to paste the actual error message
in the email for archival purposes, and to make it easier for readers to
see your problem without having to chase URLs:

2017-03-30T20:02:27.499695Z qemu-system-aarch64: -drive
file=rbd:volumes/volume-ea141b5c-cdb3-4765-910d-e7008b209a70:id=compute:key=AQAVkvxXAAAAABAA9ZxWFYdRmV+DSwKr7BKKXg==:auth_supported=cephx\;none:mon_host=192.168.1.2\:6789,format=raw,if=none,id=drive-virtio-disk0,serial=ea141b5c-cdb3-4765-910d-e7008b209a70,cache=writeback:
conf option 6789 has no value

Looking at that, the only instance of 6789 that I see is the
'mon_host=192.168.1.2\:6789,' portion.  I bet what is happening is that
we are mis-parsing the string, and trying to treat it as a key-values
pair.  In other words, it's probably an unintended regression introduced
in the range of 7830f909..0a55679b by Jeff [3] or in Markus' cleanups
between f51c363c..2836284d [4].

On the bright side, we still have time to fix it before 2.9 goes final,
now that you called it to our attention.

> Occasionally, we get "conf option too long", with the same effect.
> 
> We bisected this manually between 2.8.0 (working ok with the above cmd) and 2.9.0-rc2, and the problematic change seems to be the merge point [2].

I suspect you didn't run the bisect quite correctly, as that merge point
has nothing to do with block/rbd.c.

> 
> [1] http://paste.openstack.org/show/604938/
> [2] https://github.com/qemu/qemu/commit/9a81b79
> 

[3] https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg07506.html
[4] https://lists.gnu.org/archive/html/qemu-devel/2017-03/msg05565.html

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] rbd: Possible regression in 2.9 RCs
  2017-03-31  0:39 ` Eric Blake
@ 2017-03-31  0:52   ` Alexandru Avadanii
  2017-03-31  1:05   ` Alexandru Avadanii
  1 sibling, 0 replies; 7+ messages in thread
From: Alexandru Avadanii @ 2017-03-31  0:52 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: svc-armband, Jeff Cody, Markus Armbruster

Hi, Eric,
Thank you for looking into this!

> -----Original Message-----
> From: Eric Blake [mailto:eblake@redhat.com]
> Sent: Friday, March 31, 2017 3:40 AM
> To: Alexandru Avadanii; qemu-devel@nongnu.org
> Cc: svc-armband; Jeff Cody; Markus Armbruster
> Subject: Re: [Qemu-devel] rbd: Possible regression in 2.9 RCs
> 
> On 03/30/2017 06:42 PM, Alexandru Avadanii wrote:
> > Hi,
> > While testing out 2.9.0-rc2 on AArch64, we noticed a possible regression in
> QEMU, related to parsing -drive 'file=rbd/...':
> >> "conf option 6789 has no value".
> >
> > Instance logs [1].
> 
> Pastebins don't last forever; it helps to paste the actual error message in the
> email for archival purposes, and to make it easier for readers to see your
> problem without having to chase URLs:
> 
> 2017-03-30T20:02:27.499695Z qemu-system-aarch64: -drive
> file=rbd:volumes/volume-ea141b5c-cdb3-4765-910d-
> e7008b209a70:id=compute:key=AQAVkvxXAAAAABAA9ZxWFYdRmV+DSwKr
> 7BKKXg==:auth_supported=cephx\;none:mon_host=192.168.1.2\:6789,form
> at=raw,if=none,id=drive-virtio-disk0,serial=ea141b5c-cdb3-4765-910d-
> e7008b209a70,cache=writeback:
> conf option 6789 has no value
> 
> Looking at that, the only instance of 6789 that I see is the
> 'mon_host=192.168.1.2\:6789,' portion.  I bet what is happening is that we
> are mis-parsing the string, and trying to treat it as a key-values pair.  In other
> words, it's probably an unintended regression introduced in the range of
> 7830f909..0a55679b by Jeff [3] or in Markus' cleanups between
> f51c363c..2836284d [4].
> 
> On the bright side, we still have time to fix it before 2.9 goes final, now that
> you called it to our attention.
> 
> > Occasionally, we get "conf option too long", with the same effect.
> >
> > We bisected this manually between 2.8.0 (working ok with the above cmd)
> and 2.9.0-rc2, and the problematic change seems to be the merge point [2].
> 
> I suspect you didn't run the bisect quite correctly, as that merge point has
> nothing to do with block/rbd.c.

I suspect that too, sorry. I'll redo this tomorrow and get back.

> 
> >
> > [1] http://paste.openstack.org/show/604938/
> > [2] https://github.com/qemu/qemu/commit/9a81b79
> >
> 
> [3] https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg07506.html
> [4] https://lists.gnu.org/archive/html/qemu-devel/2017-03/msg05565.html
> 
> --
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] rbd: Possible regression in 2.9 RCs
  2017-03-31  0:39 ` Eric Blake
  2017-03-31  0:52   ` Alexandru Avadanii
@ 2017-03-31  1:05   ` Alexandru Avadanii
  2017-03-31  1:22     ` Eric Blake
  1 sibling, 1 reply; 7+ messages in thread
From: Alexandru Avadanii @ 2017-03-31  1:05 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: svc-armband, Jeff Cody, Markus Armbruster

c7cacb3e7a2e9fdf929c993b98268e4179147cbb is the first bad commit
    block/rbd: parse all options via bdrv_parse_filename

> -----Original Message-----
> From: Alexandru Avadanii
> Sent: Friday, March 31, 2017 3:52 AM
> To: 'Eric Blake'; qemu-devel@nongnu.org
> Cc: svc-armband; Jeff Cody; Markus Armbruster
> Subject: RE: [Qemu-devel] rbd: Possible regression in 2.9 RCs
> 
> Hi, Eric,
> Thank you for looking into this!
> 
> > -----Original Message-----
> > From: Eric Blake [mailto:eblake@redhat.com]
> > Sent: Friday, March 31, 2017 3:40 AM
> > To: Alexandru Avadanii; qemu-devel@nongnu.org
> > Cc: svc-armband; Jeff Cody; Markus Armbruster
> > Subject: Re: [Qemu-devel] rbd: Possible regression in 2.9 RCs
> >
> > On 03/30/2017 06:42 PM, Alexandru Avadanii wrote:
> > > Hi,
> > > While testing out 2.9.0-rc2 on AArch64, we noticed a possible
> > > regression in
> > QEMU, related to parsing -drive 'file=rbd/...':
> > >> "conf option 6789 has no value".
> > >
> > > Instance logs [1].
> >
> > Pastebins don't last forever; it helps to paste the actual error
> > message in the email for archival purposes, and to make it easier for
> > readers to see your problem without having to chase URLs:
> >
> > 2017-03-30T20:02:27.499695Z qemu-system-aarch64: -drive
> > file=rbd:volumes/volume-ea141b5c-cdb3-4765-910d-
> >
> e7008b209a70:id=compute:key=AQAVkvxXAAAAABAA9ZxWFYdRmV+DSwKr
> >
> 7BKKXg==:auth_supported=cephx\;none:mon_host=192.168.1.2\:6789,form
> > at=raw,if=none,id=drive-virtio-disk0,serial=ea141b5c-cdb3-4765-910d-
> > e7008b209a70,cache=writeback:
> > conf option 6789 has no value
> >
> > Looking at that, the only instance of 6789 that I see is the
> > 'mon_host=192.168.1.2\:6789,' portion.  I bet what is happening is
> > that we are mis-parsing the string, and trying to treat it as a
> > key-values pair.  In other words, it's probably an unintended
> > regression introduced in the range of 7830f909..0a55679b by Jeff [3]
> > or in Markus' cleanups between f51c363c..2836284d [4].
> >
> > On the bright side, we still have time to fix it before 2.9 goes
> > final, now that you called it to our attention.
> >
> > > Occasionally, we get "conf option too long", with the same effect.
> > >
> > > We bisected this manually between 2.8.0 (working ok with the above
> > > cmd)
> > and 2.9.0-rc2, and the problematic change seems to be the merge point [2].
> >
> > I suspect you didn't run the bisect quite correctly, as that merge
> > point has nothing to do with block/rbd.c.
> 
> I suspect that too, sorry. I'll redo this tomorrow and get back.
> 
> >
> > >
> > > [1] http://paste.openstack.org/show/604938/
> > > [2] https://github.com/qemu/qemu/commit/9a81b79
> > >
> >
> > [3]
> > https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg07506.html
> > [4]
> > https://lists.gnu.org/archive/html/qemu-devel/2017-03/msg05565.html
> >
> > --
> > Eric Blake   eblake redhat com    +1-919-301-3266
> > Libvirt virtualization library http://libvirt.org


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] rbd: Possible regression in 2.9 RCs
  2017-03-31  1:05   ` Alexandru Avadanii
@ 2017-03-31  1:22     ` Eric Blake
  2017-03-31  2:08       ` Eric Blake
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Blake @ 2017-03-31  1:22 UTC (permalink / raw)
  To: Alexandru Avadanii, qemu-devel; +Cc: svc-armband, Jeff Cody, Markus Armbruster

[-- Attachment #1: Type: text/plain, Size: 1187 bytes --]

On 03/30/2017 08:05 PM, Alexandru Avadanii wrote:
> c7cacb3e7a2e9fdf929c993b98268e4179147cbb is the first bad commit
>     block/rbd: parse all options via bdrv_parse_filename

Yep, my bisect finished about 2 minutes after your email on the same
spot. I'm working on a patch.  I can reproduce the problem with a mere:

./x86_64-softmmu/qemu-system-x86_64 -nodefaults -nographic -qmp stdio
-drive
'file=rbd:volumes/volume-ea141b5c-cdb3-4765-910d-e7008b209a70:id=compute:key=AQAVkvxXAAAAABAA9ZxWFYdRmV+DSwKr7BKKXg==:auth_supported=cephx\;none:mon_host=192.168.1.2\:6789,format=raw,if=none,id=drive-virtio-disk0,serial=ea141b5c-cdb3-4765-910d-e7008b209a70,cache=writeback'

the good behavior (on my setup) just hangs trying to connect to a
non-existent machine, the bad behavior gets rather-badly misparsed
(splitting the escaped : in the host:port portion as if the port were
the next key-value pair) resulting in an instant error message. I don't
have an actual RBD setup for testing the fix, but will cc you on the
patch that I propose once I have something.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] rbd: Possible regression in 2.9 RCs
  2017-03-31  1:22     ` Eric Blake
@ 2017-03-31  2:08       ` Eric Blake
  2017-03-31  3:54         ` Jeff Cody
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Blake @ 2017-03-31  2:08 UTC (permalink / raw)
  To: Alexandru Avadanii, qemu-devel; +Cc: Jeff Cody, Markus Armbruster, svc-armband

[-- Attachment #1: Type: text/plain, Size: 2454 bytes --]

On 03/30/2017 08:22 PM, Eric Blake wrote:
> On 03/30/2017 08:05 PM, Alexandru Avadanii wrote:
>> c7cacb3e7a2e9fdf929c993b98268e4179147cbb is the first bad commit
>>     block/rbd: parse all options via bdrv_parse_filename
> 
> Yep, my bisect finished about 2 minutes after your email on the same
> spot. I'm working on a patch.  I can reproduce the problem with a mere:
> 
> ./x86_64-softmmu/qemu-system-x86_64 -nodefaults -nographic -qmp stdio
> -drive
> 'file=rbd:volumes/volume-ea141b5c-cdb3-4765-910d-e7008b209a70:id=compute:key=AQAVkvxXAAAAABAA9ZxWFYdRmV+DSwKr7BKKXg==:auth_supported=cephx\;none:mon_host=192.168.1.2\:6789,format=raw,if=none,id=drive-virtio-disk0,serial=ea141b5c-cdb3-4765-910d-e7008b209a70,cache=writeback'
> 
> the good behavior (on my setup) just hangs trying to connect to a
> non-existent machine, the bad behavior gets rather-badly misparsed
> (splitting the escaped : in the host:port portion as if the port were
> the next key-value pair) resulting in an instant error message. I don't
> have an actual RBD setup for testing the fix, but will cc you on the
> patch that I propose once I have something.

I found the culprit, but the patch is taking me longer.

We are unescaping key="mon_host", value="192.168.1.2\:6789" when we
first parse the key/value pairs in qemu_rbd_parse_filename(), then we
slam the unescaped values back into a single string with ':' separators
resulting in "mon_host=192.168.1.2:6789" for stuffing through the QDict,
then we pass that string BACK through qemu_rbd_next_tok() inside
qemu_rbd_set_keypairs(), and since we no longer have the \: escape, we
are trying to treat 6789 as a new key on the second pass.  Moral of the
story: don't parse stuff twice.

My patch will be to use a QList instead of a QString for the hidden
"=keyvalue-pairs" object that we use to pass things around between
parsing the filename and actually passing parameters to RBD, matching
the fact that we already have this telling comment:

            /* FIXME: This is pretty ugly, and not the right way to do this.
             *        These should be contained in a structure, and then
             *        passed explicitly as individual key/value pairs to
             *        rados.  Consider this legacy code that needs to be
             *        updated. */

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] rbd: Possible regression in 2.9 RCs
  2017-03-31  2:08       ` Eric Blake
@ 2017-03-31  3:54         ` Jeff Cody
  0 siblings, 0 replies; 7+ messages in thread
From: Jeff Cody @ 2017-03-31  3:54 UTC (permalink / raw)
  To: Eric Blake; +Cc: Alexandru Avadanii, qemu-devel, Markus Armbruster, svc-armband

On Thu, Mar 30, 2017 at 09:08:20PM -0500, Eric Blake wrote:
> On 03/30/2017 08:22 PM, Eric Blake wrote:
> > On 03/30/2017 08:05 PM, Alexandru Avadanii wrote:
> >> c7cacb3e7a2e9fdf929c993b98268e4179147cbb is the first bad commit
> >>     block/rbd: parse all options via bdrv_parse_filename
> > 
> > Yep, my bisect finished about 2 minutes after your email on the same
> > spot. I'm working on a patch.  I can reproduce the problem with a mere:
> > 
> > ./x86_64-softmmu/qemu-system-x86_64 -nodefaults -nographic -qmp stdio
> > -drive
> > 'file=rbd:volumes/volume-ea141b5c-cdb3-4765-910d-e7008b209a70:id=compute:key=AQAVkvxXAAAAABAA9ZxWFYdRmV+DSwKr7BKKXg==:auth_supported=cephx\;none:mon_host=192.168.1.2\:6789,format=raw,if=none,id=drive-virtio-disk0,serial=ea141b5c-cdb3-4765-910d-e7008b209a70,cache=writeback'
> > 
> > the good behavior (on my setup) just hangs trying to connect to a
> > non-existent machine, the bad behavior gets rather-badly misparsed
> > (splitting the escaped : in the host:port portion as if the port were
> > the next key-value pair) resulting in an instant error message. I don't
> > have an actual RBD setup for testing the fix, but will cc you on the
> > patch that I propose once I have something.
> 
> I found the culprit, but the patch is taking me longer.
> 
> We are unescaping key="mon_host", value="192.168.1.2\:6789" when we
> first parse the key/value pairs in qemu_rbd_parse_filename(), then we
> slam the unescaped values back into a single string with ':' separators
> resulting in "mon_host=192.168.1.2:6789" for stuffing through the QDict,
> then we pass that string BACK through qemu_rbd_next_tok() inside
> qemu_rbd_set_keypairs(), and since we no longer have the \: escape, we
> are trying to treat 6789 as a new key on the second pass.  Moral of the
> story: don't parse stuff twice.
> 
> My patch will be to use a QList instead of a QString for the hidden
> "=keyvalue-pairs" object that we use to pass things around between
> parsing the filename and actually passing parameters to RBD, matching
> the fact that we already have this telling comment:
> 
>             /* FIXME: This is pretty ugly, and not the right way to do this.
>              *        These should be contained in a structure, and then
>              *        passed explicitly as individual key/value pairs to
>              *        rados.  Consider this legacy code that needs to be
>              *        updated. */


Ugh.  Yep, I just got done coming to the same conclusion, only to hop on the
email list to see that you did as well.  A QList sounds like a good choice.

Jeff

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-03-31  3:55 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-30 23:42 [Qemu-devel] rbd: Possible regression in 2.9 RCs Alexandru Avadanii
2017-03-31  0:39 ` Eric Blake
2017-03-31  0:52   ` Alexandru Avadanii
2017-03-31  1:05   ` Alexandru Avadanii
2017-03-31  1:22     ` Eric Blake
2017-03-31  2:08       ` Eric Blake
2017-03-31  3:54         ` Jeff Cody

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.