From: Markus Armbruster
References: <20161018135213.GI2190@work-vm>
 <20161018140141.GF12728@redhat.com>
 <87wph4g44n.fsf@dusky.pond.sub.org>
 <20161019081210.GA2035@work-vm>
 <20161019084235.GE11194@redhat.com>
 <87twc8d60e.fsf@dusky.pond.sub.org>
 <20161019100552.GD2035@work-vm>
 <20161019101616.GL11194@redhat.com>
 <87a8e0bkl6.fsf@dusky.pond.sub.org>
 <20161019122158.GS11194@redhat.com>
 <20161019180616.GF2035@work-vm>
Date: Thu, 20 Oct 2016 10:55:52 +0200
In-Reply-To: <20161019180616.GF2035@work-vm> (David Alan Gilbert's message of "Wed, 19 Oct 2016 19:06:16 +0100")
Message-ID: <87oa2fwg9z.fsf@dusky.pond.sub.org>
Subject: Re: [Qemu-devel] chardev's and fd's in monitors
To: "Dr. David Alan Gilbert"
Cc: "Daniel P. Berrange", qemu-devel@nongnu.org

"Dr. David Alan Gilbert" writes:

> * Daniel P. Berrange (berrange@redhat.com) wrote:
>> On Wed, Oct 19, 2016 at 02:16:05PM +0200, Markus Armbruster wrote:
>> > "Daniel P. Berrange" writes:
>> >
>> > > On Wed, Oct 19, 2016 at 11:05:53AM +0100, Dr. David Alan Gilbert wrote:
>> > >>
>> > >> We need a way to be able to report an error without plumbing error_setg
>> > >> up the stack; if you're saying error_report isn't suitable then we
>> > >> should just recommend we switch everything in migration back to
>> > >> fprintf(stderr,
>> >
>> > In the cases where error_report() isn't suitable, fprintf() is just as
>> > unsuitable for the exact same reasons.
>> >
>> > > Well both error_report() + fprintf are broken from POV of anything
>> > > using QMP. error_report() is slightly less broken for HMP,
>> >
>> > error_report() is not broken at all for HMP code. The trouble is code
>> > that can't know whether it's running in a context where error_report()
>> > is suitable.
>> >
>> > > but doesn't
>> > > help QMP.
>> >
>> > Correct.
>> >
>> > > In the short term we should just make error_report be threadsafe in
>> > > its usage of the monitor.
>> >
>> > Any problems left once cur_mon is thread-local (which it should be
>> > anyway)?
>>
>> If we make cur_mon a thread-local, then error_report() is equivalent
>> to fprintf(stderr) for the migration code, since the migration
>> code runs in a different thread, and so would always see
>> cur_mon == NULL.
>
> Yes, that would become safe; it does sound the best fix for the current
> worry.
>
> If we had that, then why not wire up error_report to pass errors back to QMP
> as well?

Well, that would be similar to how QMP used to work.

Back when the design of the QMP monitor was hammered out, we discussed
how to do errors. Anthony argued for passing around error objects. I
pointed out the enormous amount of work this would require: every call
chain from the monitor to an error would need to be modified, with
ripple effects throughout QEMU. So I proposed a shortcut: have a
function that reports the error, except in QMP context, where it stores
the error in the monitor instead.
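A minimal sketch of that shortcut, combined with the thread-local cur_mon discussed in the quoted text. Everything here is hypothetical and heavily simplified: `Monitor`, `cur_mon`, `report_error()`, `ReportDest` and `run_command()` are illustrative stand-ins, not QEMU's actual definitions. It assumes C11 `_Thread_local` and POSIX threads (older glibc needs `-pthread`). When a QMP command reports several errors, this sketch keeps the first one and drops the rest.

```c
#include <assert.h>
#include <pthread.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, heavily simplified monitor (not QEMU's struct). */
typedef struct Monitor {
    bool is_qmp;        /* QMP (machine) or HMP (human) monitor */
    bool has_error;     /* QMP only: an error is stored */
    char error[128];    /* QMP only: the stored error message */
    char output[256];   /* HMP only: text shown to the user */
} Monitor;

/* C11 thread-local: each thread starts out with its own NULL copy. */
static _Thread_local Monitor *cur_mon;

typedef enum { TO_STDERR, TO_HMP, TO_QMP } ReportDest;

/* Report an error for a human, except in QMP context, where it is
 * stored in the monitor instead.  Only the first stored error is kept. */
static ReportDest report_error(const char *fmt, ...)
{
    ReportDest dest;
    va_list ap;

    va_start(ap, fmt);
    if (!cur_mon) {
        /* No monitor in this thread: fall back to stderr. */
        vfprintf(stderr, fmt, ap);
        fputc('\n', stderr);
        dest = TO_STDERR;
    } else if (cur_mon->is_qmp) {
        if (!cur_mon->has_error) {
            vsnprintf(cur_mon->error, sizeof(cur_mon->error), fmt, ap);
            cur_mon->has_error = true;
        }
        dest = TO_QMP;
    } else {
        /* HMP: append the message plus a newline to the user's output. */
        size_t len = strlen(cur_mon->output);
        vsnprintf(cur_mon->output + len, sizeof(cur_mon->output) - len,
                  fmt, ap);
        len = strlen(cur_mon->output);
        if (len + 1 < sizeof(cur_mon->output)) {
            cur_mon->output[len] = '\n';
            cur_mon->output[len + 1] = '\0';
        }
        dest = TO_HMP;
    }
    va_end(ap);
    return dest;
}

/* A background thread never sets cur_mon, so it always sees NULL. */
static void *background_job(void *arg)
{
    *(ReportDest *)arg = report_error("migration failed");
    return NULL;
}

/* Run a failing "command" on monitor 'mon' while a background thread
 * also reports an error; destinations come back through the pointers. */
static void run_command(Monitor *mon, ReportDest *cmd_dest,
                        ReportDest *job_dest)
{
    pthread_t tid;
    int rc;

    cur_mon = mon;              /* changes only this thread's copy */
    rc = pthread_create(&tid, NULL, background_job, job_dest);
    assert(rc == 0);
    pthread_join(tid, NULL);
    *cmd_dest = report_error("invalid parameter '%s'", "speed");
    cur_mon = NULL;
}
```

The background thread's report lands on stderr even while the main thread has a monitor set, which is exactly why a thread-local cur_mon makes error_report() behave like fprintf(stderr) in the migration thread.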
That way, code needs to touch only the places that report errors, not
every call chain leading to one.

Sadly, that function couldn't be error_report() back then, because
Anthony insisted on rich error objects, over my objections. To support
them, we invented a new function, in commit 8204a91. Code still had to
be converted to this new function, but it was the least laborious
solution given the rich error object requirement.

Anthony reluctantly accepted "store errors in monitor" as a transitional
interface, mostly because we needed to get QMP off the ground fast, and
passing around error objects would have slowed command conversion to a
crawl. I hoped the transitional interface would prove practical and
remain.

Rich errors turned out to be a dead end, and we abandoned them after a
bit over two years (commit de253f1).

The "store error in the monitor" interface turned out to be a dead end,
too. It lingered in the tree for a long time, until commit 4629ed1. My
memory is foggy on why exactly it didn't work out, but reasons include:

* What if code attempts to store multiple errors? We initially made
  that an assertion failure, but quickly had to relax it so that
  subsequent errors are silently ignored (commit 27a749f). That's
  differently suboptimal.

* Failure remains difficult to see in the code.

  Before QMP, a monitor command handler didn't return status to the
  monitor core; it simply reported failure to the human user, possibly
  buried deep down in some call chain. Only if something up the chain
  needed to know did we additionally propagate failure up the chain, in
  ad hoc ways.

  Making error propagation the only way to fail commands made failure
  more obvious in the code.

* Plumbing errors to the correct monitor is easy only in the
  (synchronous) monitor command handler. If the handler kicks off some
  background job, you can't store the job's errors in the monitor even
  if you know which monitor kicked off the job, because that could
  interfere with another handler's execution! You'd have to find some
  other place to store them, and write some other code to examine that
  store and do what needs to be done. Whatever that may be: it could be
  sending the error in an asynchronous event, or retaining it for a
  later command to report synchronously. But then propagating errors up
  the call chain starts to look more appealing than it used to.
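To make the alternative concrete, here is a minimal sketch of propagating errors up the call chain, loosely modeled on QEMU's `Error **errp` convention. All names (`Err`, `err_set()`, `err_free()`, `open_target()`, `start_transfer()`, `cmd_migrate()`, `Job`, `job_run()`) are hypothetical simplifications, not QEMU's API. Failure is explicit in every signature, and the top of the chain decides where the error goes: back to the caller of a synchronous command, or into the job's own state for a background job. Unlike "store in the monitor", this sketch treats setting an error twice as a bug and asserts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal error object (not QEMU's Error). */
typedef struct Err {
    char msg[128];
} Err;

/* Store an error in *errp.  errp == NULL means the caller doesn't care.
 * Setting an error twice is a programming error. */
static void err_set(Err **errp, const char *msg)
{
    if (!errp) {
        return;
    }
    assert(*errp == NULL);
    *errp = malloc(sizeof(**errp));
    if (!*errp) {
        abort();
    }
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
}

static void err_free(Err *err)
{
    free(err);
}

/* Deep in the call chain: fail and say why (hypothetical check). */
static bool open_target(const char *uri, Err **errp)
{
    if (strncmp(uri, "tcp:", 4) != 0) {
        err_set(errp, "unsupported URI scheme");
        return false;
    }
    return true;
}

/* Middle of the chain: failure is visible in the return value, and
 * errp is simply passed through. */
static bool start_transfer(const char *uri, Err **errp)
{
    if (!open_target(uri, errp)) {
        return false;
    }
    /* ... further steps would fail the same way ... */
    return true;
}

/* Synchronous command: hands the error back to its caller, which would
 * turn it into the command's error reply.  NULL means success. */
static Err *cmd_migrate(const char *uri)
{
    Err *err = NULL;

    if (!start_transfer(uri, &err)) {
        return err;
    }
    return NULL;
}

/* Background job: nobody is waiting for a reply, so the job keeps the
 * error in its own state for a later query or an event. */
typedef struct Job {
    Err *err;           /* NULL unless the job failed */
} Job;

static void job_run(Job *job, const char *uri)
{
    start_transfer(uri, &job->err);
}
```

The background job needs no monitor at all: the error simply travels up to whoever owns the job, and that owner decides how to surface it.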