From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:46006) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bxHaL-0002PU-4z for qemu-devel@nongnu.org; Thu, 20 Oct 2016 13:56:58 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bxHaH-00080Y-5U for qemu-devel@nongnu.org; Thu, 20 Oct 2016 13:56:57 -0400 Received: from mx1.redhat.com ([209.132.183.28]:8778) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1bxHaG-0007zx-UB for qemu-devel@nongnu.org; Thu, 20 Oct 2016 13:56:53 -0400 Received: from int-mx13.intmail.prod.int.phx2.redhat.com (int-mx13.intmail.prod.int.phx2.redhat.com [10.5.11.26]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 0DC6880085 for ; Thu, 20 Oct 2016 17:56:52 +0000 (UTC) Date: Thu, 20 Oct 2016 18:56:48 +0100 From: "Dr. David Alan Gilbert" Message-ID: <20161020175648.GN2039@work-vm> References: <20161019101616.GL11194@redhat.com> <87a8e0bkl6.fsf@dusky.pond.sub.org> <20161019122158.GS11194@redhat.com> <20161019180616.GF2035@work-vm> <87oa2fwg9z.fsf@dusky.pond.sub.org> <20161020090356.GD12145@redhat.com> <20161020095835.GC2039@work-vm> <87funrti86.fsf@dusky.pond.sub.org> <20161020110832.GG12145@redhat.com> <87insnqllc.fsf@dusky.pond.sub.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <87insnqllc.fsf@dusky.pond.sub.org> Subject: Re: [Qemu-devel] chardev's and fd's in monitors List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Markus Armbruster Cc: "Daniel P. Berrange" , qemu-devel@nongnu.org * Markus Armbruster (armbru@redhat.com) wrote: > "Daniel P. Berrange" writes: > > > On Thu, Oct 20, 2016 at 12:42:01PM +0200, Markus Armbruster wrote: > >> "Dr. David Alan Gilbert" writes: > >> > >> > * Daniel P. Berrange (berrange@redhat.com) wrote: > >> >> On Thu, Oct 20, 2016 at 10:55:52AM +0200, Markus Armbruster wrote: > >> >> > >> >> Our code has increasingly converted to propagate errors up the call > >> >> chain, but having a mix of different error reporting approaches > >> >> is increasingly causing pain. > >> >> > >> >> eg a function which propagates errors wants to call into a function > >> >> whicih uses error_report > >> > >> When you add an Error * parameter to a function, you get to convert > >> everything it calls. Failure to do so is a bug, simple as that. > >> > >> Likewise, when you add a new call to a function that takes an Error * > >> parameter. > > > > I agree its a bug - just that from a pragmatic POV it isn't always > > practical to convert the entire call chain in one big bang - particularly > > if you are working on shared infrastructure. > > Been there, felt the pain. I just wish we'd be more diligent adding > FIXME comments when we take shortcuts. > > > For example, the > > qemu-sockets.c APIs for connecting & listening on unix/tcp sockets > > use Error * to report errors. Those APIs were used in the block > > layer, migration, vnc, chardev and network backend code. It clearly > > isn't practical to convert all those regions to code to correctly > > propagate Error ** in one go. That's certainly the desired end > > goal IMHO, but getting there is going to take a while. > > > > This slow conversion has been going on for a long time now, so > > maybe we need to make an aggressive push in strategic areas of > > the code to stop it dragging out indefinitely. This is kind of > > a general problem we have in QEMU - we have introduced lots of > > new standard frameworks, but very rarely finish converting > > code to use them. > > Yep. > > > This is why I did a big push to try to get > > everything switched over to use QIOChannel, in the shortest > > possible time, rather than doing only small bits over many > > releases. > > Much better when you can pull it off. > > A common road block is having to update code nobody wants to touch, let > alone maintain, with no way to test. The easy way out is to update just > the code you actually care about, then call the conversion process > "incremental". Well, it's not incremental without commitments to > increment. I don't think we should do this until we can come up with a scheme that has less impact on the code being modified. > >> >> - there's no nice way to propagate the error > >> >> since it has already been reported. If the function then wants to > >> >> explicitly ignore the error, then that's impossible too,since it has > >> >> already been reported. Add in our code which doesn't use error_report > >> >> and instead returns errno values, such as the block layer, and it gets > >> >> even worse because if that calls a function which propagates an error, > >> >> it has to throw away that useful error and return a useless invented > >> >> errno value :( > >> > >> Different bug: when you receive an Error, you have to either handle and > >> consume it, or pass it on. Throwing it away and returning an error code > >> instead counts as neither, and is a bug. > > > > Totally agree its a bug - just again hits the reality of having to > > do major code conversions. > > > >> >> IMHO continuing to convert code to propagate errors is the only way > >> >> out of this swamp, because it provides the greatest flexibility to > >> >> the callers of said functions to decide how to deal with the error. > >> > >> I wouldn't call the whole situation a swamp. error_report() is just > >> fine in many places. So is returning -errno. We convert to Error when > >> these techniques cease to be fine. The swampy part is old code where > >> these techniques have never been fine, or at least not for a while. To > >> be drained incrementally. > > > > Realistically all the major backend subsystems (chardev, network, block, > > ui and migration) need to be converted to Error ** propagation, since > > they all ultimately call into some common code that reports Error **. > > Infrastucture generally doesn't know how it's used, which means > error_report() is generally wrong there. Sufficiently simple functions > can keep returning -errno, null, whatever, but the interesting stuff > needs to use Error. > > Very few places will end up being able to stick with -errno, or plain > > error_report in the long term. > > Not sure about "very few". Less than now. We'll see. I'd also prefer we got the very-few level; Migration used to be characterised by getting a 'load of migration failed -22' and having no clue in the logs to why; I've slowly fought back to be able to get an error from the lowest level that caused the failure. I want more of that, so that when someone gets a rare failure in the field I can see why. Dave > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK