Re: [Qemu-devel] chardev's and fd's in monitors

From: Markus Armbruster <armbru@redhat.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] chardev's and fd's in monitors
Date: Thu, 20 Oct 2016 10:55:52 +0200
Message-ID: <87oa2fwg9z.fsf@dusky.pond.sub.org>
In-Reply-To: <20161019180616.GF2035@work-vm> (David Alan Gilbert's message of "Wed, 19 Oct 2016 19:06:16 +0100")

"Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:

> * Daniel P. Berrange (berrange@redhat.com) wrote:
>> On Wed, Oct 19, 2016 at 02:16:05PM +0200, Markus Armbruster wrote:
>> > "Daniel P. Berrange" <berrange@redhat.com> writes:
>> > 
>> > > On Wed, Oct 19, 2016 at 11:05:53AM +0100, Dr. David Alan Gilbert wrote:
>> > >> 
>> > >> We need a way to be able to report an error without plumbing error_setg
>> > >> up the stack; if you're saying error_report isn't suitable then we
>> > >> should just recommend we switch everything in migration back to
>> > >> fprintf(stderr,
>> > 
>> > In the cases where error_report() isn't suitable, fprintf() is just as
>> > unsuitable for the exact same reasons.
>> > 
>> > > Well both error_report() + fprintf  are broken from POV of anything
>> > > using QMP. error_report() is slightly less broken for HMP,
>> > 
>> > error_report() is not broken at all for HMP code.  The trouble is code
>> > that can't know whether it's running in a context where error_report()
>> > is suitable.
>> > 
>> > >                                                            but doesn't
>> > > help QMP.
>> > 
>> > Correct.
>> > 
>> > > In the short term we should just make error_report be  threadsafe in
>> > > its usage of the monitor.
>> > 
>> > Any problems left once cur_mon is thread-local (which it should be
>> > anyway)?
>> 
>> If we make cur_mon a thread-local, then error_report() is equivalent
>> to fprintf(stderr) for the migration code, since the migration
>> code runs in a different thread, and so would always see
>> cur_mon == NULL.
>
> Yes, that would become safe; it does sound the best fix for the current
> worry.
>
> If we had that, then why not wire up error_report to pass errors back to QMP
> as well?

Well, that would be similar to how QMP used to work.

Back when the design of the QMP monitor was hammered out, we discussed
how to do errors.

Anthony argued for passing around error objects.  I pointed out the
enormous amount of work this would require: every call chain from the
monitor to an error needs to be modified, with ripple effects throughout
QEMU.

So I proposed a shortcut: have a function that reports the error, except
that in QMP context it stores the error in the monitor instead.  That
way, you need to touch only the places reporting errors, not every call
chain leading to one.
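
A minimal sketch of that shortcut, for illustration only.  The Monitor
struct and report_or_store() here are hypothetical stand-ins, not the
code from the commits mentioned in this message:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a monitor; not QEMU's actual Monitor. */
typedef struct Monitor {
    bool is_qmp;        /* true for a QMP monitor, false for HMP */
    bool has_error;     /* set once an error has been stored */
    char error[256];    /* stored error, to be sent by the monitor core */
} Monitor;

/* Monitor whose command is currently executing, NULL outside of one. */
static Monitor *cur_mon;

/*
 * Report an error to the human user, except in QMP context, where it
 * is stored in the monitor instead.
 */
static void report_or_store(const char *fmt, ...)
{
    va_list ap;

    assert(fmt != NULL);
    va_start(ap, fmt);
    if (cur_mon && cur_mon->is_qmp) {
        vsnprintf(cur_mon->error, sizeof(cur_mon->error), fmt, ap);
        cur_mon->has_error = true;
    } else {
        vfprintf(stderr, fmt, ap);
        fputc('\n', stderr);
    }
    va_end(ap);
}
```

Only the call sites that report errors need to call such a function;
the functions between them and the command handler stay untouched.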

Sadly, that function couldn't be error_report() back then, because
Anthony insisted on rich error objects, against my opposition.  To
support them, we invented a new function, in commit 8204a91.  Code still
had to be converted to this new function.  But it was the least
laborious solution given the rich error object requirement.

Anthony reluctantly accepted "store errors in monitor" as a transitional
interface, mostly because we needed to get QMP off the ground fast, and
passing around error objects would have slowed command conversion to a
crawl.  I hoped the transitional interface would turn out to be quite
practical, and remain.

Rich errors turned out to be a dead end, and we abandoned them after a
bit over two years (commit de253f1).

The "store errors in the monitor" interface turned out to be a dead end,
too.  It lingered in the tree for a long time, until commit 4629ed1.  My
memory is foggy on why exactly it didn't work out, but reasons include:

* What if code attempts to store multiple errors?  We initially made
  that an assertion failure, but quickly had to relax that so that
  subsequent errors are silently ignored (commit 27a749f).  That's
  differently suboptimal.

* Failure remains difficult to see in the code.  Before QMP, a monitor
  command handler didn't return status to the monitor core, it simply
  reported it to the human user, possibly buried deep down in some call
  chain.  Only if something up the chain needed to know did we
  additionally propagate failure up the chain in ad hoc ways.  Making
  error propagation the only way to fail commands made failure more
  obvious in the code.

* Plumbing errors to the correct monitor is easy only in the
  (synchronous) monitor command handler.  If the handler kicks off some
  background job, you can't store them in the monitor even if you know
  which monitor kicked off the job, because that could interfere with
  another handler's execution!  You'd have to find some other place to
  store, and create some other code to examine that store and do what
  needs to be done.  Whatever that may be: could be sending the error in
  an asynchronous event, could be retaining it for a later command to
  report synchronously.  But then propagating errors up the call chain
  starts to look more appealing than it used to.
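
For contrast, a rough sketch of what propagating errors up the call
chain looks like.  Err, err_set(), open_backend() and realize_device()
are hypothetical names made up for this example; it only illustrates
the shape of the pattern, not QEMU's actual error API:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical error object, much simpler than anything in QEMU. */
typedef struct Err {
    char msg[128];
} Err;

/* Store an error in *errp; a NULL errp means the caller ignores errors. */
static void err_set(Err **errp, const char *msg)
{
    if (!errp) {
        return;
    }
    /* Storing a second error on top of the first is a programming bug. */
    assert(*errp == NULL);
    *errp = calloc(1, sizeof(**errp));
    if (!*errp) {
        abort();
    }
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
}

/* Deep in the call chain: fails by returning -1 and setting *errp. */
static int open_backend(const char *name, Err **errp)
{
    if (!name || !*name) {
        err_set(errp, "backend name must not be empty");
        return -1;
    }
    return 0;
}

/* Intermediate caller: the failure path is visible right here. */
static int realize_device(const char *backend, Err **errp)
{
    if (open_backend(backend, errp) < 0) {
        return -1;
    }
    return 0;
}
```

Whoever sits at the top of the chain owns the error object.  A
synchronous command handler can hand it to its monitor, and a
background job can keep it with the job until something asks for it.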