All of lore.kernel.org
 help / color / mirror / Atom feed
From: Markus Armbruster <armbru@redhat.com>
To: Stefan Reiter <s.reiter@proxmox.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
	Wolfgang Bumiller <w.bumiller@proxmox.com>,
	Michael Roth <michael.roth@amd.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	qemu-devel@nongnu.org, Paolo Bonzini <pbonzini@redhat.com>,
	Thomas Lamprecht <t.lamprecht@proxmox.com>
Subject: Re: [PATCH] monitor/qmp: fix race with clients disconnecting early
Date: Tue, 31 Aug 2021 17:45:20 +0200	[thread overview]
Message-ID: <87eea9wrcf.fsf@dusky.pond.sub.org> (raw)
In-Reply-To: <91f2fb28-fd4d-f7ad-13d1-61c7ba16ae3c@proxmox.com> (Stefan Reiter's message of "Thu, 26 Aug 2021 17:15:43 +0200")

Stefan Reiter <s.reiter@proxmox.com> writes:

> On 26/08/2021 15:50, Markus Armbruster wrote:
>> Markus Armbruster <armbru@redhat.com> writes:
>> [...]
>> 
>>> Let me re-explain the bug in my own words, to make sure I understand.
>>>
>>> A QMP monitor normally runs in the monitor I/O thread.
>>>
>>> A QMP monitor can serve only one client at a time.
>>>
>>> It executes out-of-band commands right as it reads them.  In-band
>>> commands are queued, and executed one after the other in the main loop.
>>>
>>> Command output is buffered.  We write it out as fast as the character
>>> device can take it.  If a write fails, we throw away the entire buffer
>>> contents.
>>>
>>> A client can disconnect at any time.  This throws away the queue.  An
>>> in-band command may be executing in the main loop.  An out-of-band
>>> command may be executing in the monitor's thread.
>>>
>>> Such commands (if any) are not affected by the disconnect.  Their output
>>> gets buffered, but write out fails, so it's thrown away.
>>>
>>> *Except* when another client connects quickly enough.  Then we send it
>>> output meant for the previous client.  This is wrong.  I suspect this
>>> could even send invalid JSON.
>>>
>
> I'm not sure this is the case. In all testing I have *never*
> encountered the case of broken JS or any other indication that partial
> output was received.

We buffer monitor output without bounds, and try to write it out as
quickly as the client will take it.  I think short writes are possible
when the client is slow to read.  Such short writes can write partial
JSON expressions; the operating system doesn't know or care.  If the
client disconnects right then, the buffer starts with the remainder of
the JSON expression.  If the buffer is sent to the next client...

> I think the issue is just between starting to execute the command in
> the BH and the new client connecting... can the CHR_EVENTs even be
> triggered when the main thread is busy with the BH?

Good question.  We better find out.

>>> Special case: if in-band command qmp_capabilities is executing when the
>>> client disconnects, and another client connects before the command flips
>>> the monitor from capabilities negotiation mode to command mode, that
>>> client starts in the wrong mode.
>>
>> What the cases have in common: disconnect + connect in monitor I/O
>> thread and the command executing in the main thread change the same
>> monitor state.
>>
>> You observed two issues: one involving the output buffer (new client
>> receives old client's output), and one involving monitor mode (new
>> client has its mode flipped by the old client's qmp_capabilities
>> command).
>>
>> Any monitor state accessed by commands could cause issues.  Right now, I
>> can see only one more: file descriptors.  Cleaning them up on disconnect
>> could mess with the command.
>
> Right, that would make sense, but also only an issue if the reconnect
> can happen in the middle of the command itself.
>
> Maybe we can acquire some kind of lock during actual in-band QMP
> command execution?

Yes, the root cause is insufficient synchronization between in-band
command running in the main loop and disconnect / connect running in the
monitor I/O thread, and synchronizing them properly feels like the best
chance for a complete and reliable fix.

Before the OOB work, everything was in the main thread.  I figure the
misbehavior you found was not possible then.

Synchronization so that disconnect is delayed until after the in-band
command in flight completes should get us back to that state.  But I'm
afraid it could regress the OOB feature.

The point of OOB commands is "exec-oob executes right away no matter
what".  To make that possible OOB-capable commands are severely
restricted in what they can do, and the client needs to limit the number
of commands in flight.

"Right away" should be possible even when the client has to connect
first.  I'm not sure that's actually the case now.  Delaying until after
in-band completes would definitely kill it, though.

I'm afraid the synchronization needs to be more involved.

>> [...]



  reply	other threads:[~2021-08-31 15:46 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-23 10:11 [PATCH] monitor/qmp: fix race with clients disconnecting early Stefan Reiter
2021-08-23 15:05 ` Philippe Mathieu-Daudé
2021-08-25 15:54 ` Markus Armbruster
2021-08-26 13:50   ` Markus Armbruster
2021-08-26 15:15     ` Stefan Reiter
2021-08-31 15:45       ` Markus Armbruster [this message]
2021-09-02 12:05         ` Markus Armbruster
2021-09-02 12:45         ` Markus Armbruster
2021-10-12  9:27           ` Markus Armbruster
2021-10-13 15:44             ` Stefan Reiter
2021-10-14  5:40               ` Markus Armbruster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87eea9wrcf.fsf@dusky.pond.sub.org \
    --to=armbru@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=michael.roth@amd.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=s.reiter@proxmox.com \
    --cc=t.lamprecht@proxmox.com \
    --cc=w.bumiller@proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.