From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:46006)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1bxHaL-0002PU-4z
	for qemu-devel@nongnu.org; Thu, 20 Oct 2016 13:56:58 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1bxHaH-00080Y-5U
	for qemu-devel@nongnu.org; Thu, 20 Oct 2016 13:56:57 -0400
Received: from mx1.redhat.com ([209.132.183.28]:8778)
	by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
	(Exim 4.71) (envelope-from <dgilbert@redhat.com>) id 1bxHaG-0007zx-UB
	for qemu-devel@nongnu.org; Thu, 20 Oct 2016 13:56:53 -0400
Received: from int-mx13.intmail.prod.int.phx2.redhat.com
	(int-mx13.intmail.prod.int.phx2.redhat.com [10.5.11.26])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mx1.redhat.com (Postfix) with ESMTPS id 0DC6880085
	for <qemu-devel@nongnu.org>; Thu, 20 Oct 2016 17:56:52 +0000 (UTC)
Date: Thu, 20 Oct 2016 18:56:48 +0100
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Message-ID: <20161020175648.GN2039@work-vm>
References: <20161019101616.GL11194@redhat.com>
	<87a8e0bkl6.fsf@dusky.pond.sub.org>
	<20161019122158.GS11194@redhat.com> <20161019180616.GF2035@work-vm>
	<87oa2fwg9z.fsf@dusky.pond.sub.org>
	<20161020090356.GD12145@redhat.com> <20161020095835.GC2039@work-vm>
	<87funrti86.fsf@dusky.pond.sub.org>
	<20161020110832.GG12145@redhat.com>
	<87insnqllc.fsf@dusky.pond.sub.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <87insnqllc.fsf@dusky.pond.sub.org>
Subject: Re: [Qemu-devel] chardev's and fd's in monitors
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Markus Armbruster <armbru@redhat.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>, qemu-devel@nongnu.org

* Markus Armbruster (armbru@redhat.com) wrote:
> "Daniel P. Berrange" <berrange@redhat.com> writes:
> 
> > On Thu, Oct 20, 2016 at 12:42:01PM +0200, Markus Armbruster wrote:
> >> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
> >> 
> >> > * Daniel P. Berrange (berrange@redhat.com) wrote:
> >> >> On Thu, Oct 20, 2016 at 10:55:52AM +0200, Markus Armbruster wrote:
> >> >> 
> >> >> Our code has increasingly converted to propagate errors up the call
> >> >> chain, but having a mix of different error reporting approaches
> >> >> is increasingly causing pain.
> >> >> 
> >> >> eg a function which propagates errors wants to call into a function
> >> >> whicih uses error_report
> >> 
> >> When you add an Error * parameter to a function, you get to convert
> >> everything it calls.  Failure to do so is a bug, simple as that.
> >> 
> >> Likewise, when you add a new call to a function that takes an Error *
> >> parameter.
> >
> > I agree its a bug - just that from a pragmatic POV it isn't always
> > practical to convert the entire call chain in one big bang - particularly
> > if you are working on shared infrastructure.
> 
> Been there, felt the pain.  I just wish we'd be more diligent adding
> FIXME comments when we take shortcuts.
> 
> >                                              For example, the
> > qemu-sockets.c APIs for connecting & listening on unix/tcp sockets
> > use Error * to report errors. Those APIs were used in the block
> > layer, migration, vnc, chardev and network backend code. It clearly
> > isn't practical to convert all those regions to code to correctly
> > propagate Error ** in one go. That's certainly the desired end
> > goal IMHO, but getting there is going to take a while.
> >
> > This slow conversion has been going on for a long time now, so
> > maybe we need to make an aggressive push in strategic areas of
> > the code to stop it dragging out indefinitely. This is kind of
> > a general problem we have in QEMU - we have introduced lots of
> > new standard frameworks, but very rarely finish converting
> > code to use them.
> 
> Yep.
> 
> >                    This is why I did a big push to try to get
> > everything switched over to use QIOChannel, in the shortest
> > possible time, rather than doing only small bits over many
> > releases.
> 
> Much better when you can pull it off.
> 
> A common road block is having to update code nobody wants to touch, let
> alone maintain, with no way to test.  The easy way out is to update just
> the code you actually care about, then call the conversion process
> "incremental".  Well, it's not incremental without commitments to
> increment.

I don't think we should do this until we can come up with a scheme
that has less impact on the code being modified.

> >> >>                          - there's no nice way to propagate the error
> >> >> since it has already been reported.  If the function then wants to
> >> >> explicitly ignore the error, then that's impossible too,since it has
> >> >> already been reported.  Add in our code which doesn't use error_report
> >> >> and instead returns errno values, such as the block layer, and it gets
> >> >> even worse because if that calls a function which propagates an error,
> >> >> it has to throw away that useful error and return a useless invented
> >> >> errno value :(
> >> 
> >> Different bug: when you receive an Error, you have to either handle and
> >> consume it, or pass it on.  Throwing it away and returning an error code
> >> instead counts as neither, and is a bug.
> >
> > Totally agree its a bug - just again hits the reality of having to
> > do major code conversions.
> >
> >> >> IMHO continuing to convert code to propagate errors is the only way
> >> >> out of this swamp, because it provides the greatest flexibility to
> >> >> the callers of said functions to decide how to deal with the error.
> >> 
> >> I wouldn't call the whole situation a swamp.  error_report() is just
> >> fine in many places.  So is returning -errno.  We convert to Error when
> >> these techniques cease to be fine.  The swampy part is old code where
> >> these techniques have never been fine, or at least not for a while.  To
> >> be drained incrementally.
> >
> > Realistically all the major backend subsystems (chardev, network, block,
> > ui and migration) need to be converted to Error ** propagation, since
> > they all ultimately call into some common code that reports Error **.
> 
> Infrastucture generally doesn't know how it's used, which means
> error_report() is generally wrong there.  Sufficiently simple functions
> can keep returning -errno, null, whatever, but the interesting stuff
> needs to use Error.
> > Very few places will end up being able to stick with -errno, or plain
> > error_report in the long term.
> 
> Not sure about "very few".  Less than now.  We'll see.

I'd also prefer we got the very-few level; Migration used to be
characterised by getting a 'load of migration failed -22' and having
no clue in the logs to why; I've slowly fought back to be able
to get an error from the lowest level that caused the failure.
I want more of that, so that when someone gets a rare failure in the field
I can see why.

Dave

> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK