qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* Auditing QEMU to replace NULL with &error_abort
@ 2021-06-22 15:20 John Snow
  2021-06-23 12:16 ` Markus Armbruster
  0 siblings, 1 reply; 7+ messages in thread
From: John Snow @ 2021-06-22 15:20 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Thomas Huth, Philippe Mathieu-Daudé, QEMU Developers, Max Reitz

One of our Bite-Sized tasks on the wiki was to audit QEMU and, where 
applicable, replace NULL with &error_abort.

Everywhere else where it is intentional, we ought to add a comment or 
some other indication explaining why it's the right thing to do in that 
case.

That task was ported to GitLab here:
https://gitlab.com/qemu-project/qemu/-/issues/414

mreitz and thuth have chimed in with excellent clarifications. Phil 
suggests that we should replace the intentional cases of NULL with 
&error_ignore, to possibly log squelched errors in debugging mode. This 
sounds like a great idea to me:

- It allows us to remove NULL entirely, which as mreitz states "is 
fishy", but sometimes valid.
- It annotates callsites where we have decided the ignore is intentional 
and correct.
- It gives us a review opportunity to require good comments at those 
callsites.
- It gives us a good way to measure progress of the audit by making the 
removal of NULL a concrete goal. (Can we use coccinelle to find all 
instances of the literal NULL being passed to a variable named errp?)

 From a brief chat on IRC, Markus is "reluctant to deviate from GError 
even more". It sounds like there isn't consensus here. We should 
probably reach consensus on this point before trying to pass the task 
off to a neophyte, though -- so I'm raising this discussion on the list 
and CC'ing Markus to see if we can define the task better or not.

--js


(Personally, I've got no horse in the race beyond moving these tasks off 
the wiki and onto the tracker. Since I moved the issue, though, I might 
as well make sure the filing is accurate.)



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Auditing QEMU to replace NULL with &error_abort
  2021-06-22 15:20 Auditing QEMU to replace NULL with &error_abort John Snow
@ 2021-06-23 12:16 ` Markus Armbruster
  2021-06-23 15:01   ` John Snow
  2021-06-23 15:08   ` Daniel P. Berrangé
  0 siblings, 2 replies; 7+ messages in thread
From: Markus Armbruster @ 2021-06-23 12:16 UTC (permalink / raw)
  To: John Snow
  Cc: Thomas Huth, Philippe Mathieu-Daudé, QEMU Developers, Max Reitz

John Snow <jsnow@redhat.com> writes:

> One of our Bite-Sized tasks on the wiki was to audit QEMU and, where
> applicable, replace NULL with &error_abort.

Context: NULL argument means "do not pass back an error object".

This has two uses:

1. When the function permits detecting failure in another way (typically
   by returning distinct error value(s)), and the caller is not
   interested in additional detail an error object provides.

2. When the caller ignores errors.

In both cases, the caller can pass NULL to save itself the trouble of
receiving and freeing a (useless) error object.

Our most common *actual* use, however, is

3. When the call site assumes errors can't happen (and breaks when they
do).

This is wrong.  The caller should pass &error_abort instead.

> Everywhere else where it is intentional, we ought to add a comment or
> some other indication explaining why it's the right thing to do in
> that case.
>
> That task was ported to GitLab here:
> https://gitlab.com/qemu-project/qemu/-/issues/414
>
> mreitz and thuth have chimed in with excellent clarifications. Phil
> suggests that we should replace the intentional cases of NULL with 
> &error_ignore, to possibly log squelched errors in debugging
> mode. This sounds like a great idea to me:
>
> - It allows us to remove NULL entirely, which as mreitz states "is
>   fishy", but sometimes valid.

It's perfectly valid when all you'd do with an error object was to throw
it away.

Whether you do that by passing NULL or some other special value doesn't
matter all that much.  NULL is what GError uses.  It has turned out to
be prone to misuse, but I suspect any other value would be just as
prone.

> - It annotates callsites where we have decided the ignore is
>   intentional and correct.

A fresh special value like &error_ignore can serve as "we really mean to
ignore the error object here" signal, but only until people start
misusing it the same way NULL is being misused.

> - It gives us a review opportunity to require good comments at those
>   callsites.

Requiring a use of &error_ignore to come with a comment may be a
workable way to ensure &error_ignore remains a reliable "we mean it".
It may also tempt people to add more NULL arguments, because that's
easier.

Nothing stops us from add good comments to calls passing NULL.

> - It gives us a good way to measure progress of the audit by making
>   the removal of NULL a concrete goal.

Yes, this is far easier to measure than "has a good comment".

>                                        (Can we use coccinelle to find
>  all instances of the literal NULL being passed to a variable named
> errp?)

Maybe.

> From a brief chat on IRC, Markus is "reluctant to deviate from GError
> even more". It sounds like there isn't consensus here. We should 
> probably reach consensus on this point before trying to pass the task
> off to a neophyte, though -- so I'm raising this discussion on the
> list and CC'ing Markus to see if we can define the task better or not.

QEMU's Error preceded the adoption of GLib.  It is "loosely patterned
after Glib's GError" (quoting error.h's big comment).

We've discussed replacing Error by GError a couple of times.
Non-trivial due to differences between the two, in particular
&error_abort, &error_fatal, and error_propagate(()'s "multiple
propagations are fine, first one wins" vs. g_propagate_error()'s
"propagate at most once".

The latter difference seems rather pointless now, but removing it safely
would be a substantial amount of work[1].

&error_abort has been a clear win for us.  &error_fatal too, when used
judiciously.  Marc-André tried to get both into GLib, unsuccessfully[2].

Nevertheless, I'm reluctant to add more differeneces.  That's not a
"no".  It's a "you need to make a compelling case".

> --js
>
>
> (Personally, I've got no horse in the race beyond moving these tasks
> off the wiki and onto the tracker. Since I moved the issue, though, I
> might as well make sure the filing is accurate.)


[1] Converting old code to ERRP_GUARD() may reduce the use of
error_propagate() to a point where removing becomes easy.

[2] https://gitlab.gnome.org/GNOME/glib/-/issues/2288



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Auditing QEMU to replace NULL with &error_abort
  2021-06-23 12:16 ` Markus Armbruster
@ 2021-06-23 15:01   ` John Snow
  2021-06-23 15:08   ` Daniel P. Berrangé
  1 sibling, 0 replies; 7+ messages in thread
From: John Snow @ 2021-06-23 15:01 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Thomas Huth, Philippe Mathieu-Daudé, QEMU Developers, Max Reitz

On 6/23/21 8:16 AM, Markus Armbruster wrote:
> Nevertheless, I'm reluctant to add more differeneces.  That's not a
> "no".  It's a "you need to make a compelling case".

I'm not sure it's my case to make; I believe you are the authority in 
this matter so the work should be defined in terms of what you are 
prepared to accept.

I'm not sure there's a better argument for &error_ignore beyond "It's 
easy to audit", but I find that to be a compelling one especially where 
new contributors are concerned.

If we still wanted to move back to glib (which seems unlikely) replacing 
&error_ignore by NULL seems like a trivial thing to do. Your call.

--js



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Auditing QEMU to replace NULL with &error_abort
  2021-06-23 12:16 ` Markus Armbruster
  2021-06-23 15:01   ` John Snow
@ 2021-06-23 15:08   ` Daniel P. Berrangé
  2021-06-23 15:31     ` Marc-André Lureau
  1 sibling, 1 reply; 7+ messages in thread
From: Daniel P. Berrangé @ 2021-06-23 15:08 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Philippe Mathieu-Daudé,
	Thomas Huth, John Snow, QEMU Developers, Max Reitz

On Wed, Jun 23, 2021 at 02:16:55PM +0200, Markus Armbruster wrote:
> &error_abort has been a clear win for us.  &error_fatal too, when used
> judiciously.  Marc-André tried to get both into GLib, unsuccessfully[2].

...snip...

> [2] https://gitlab.gnome.org/GNOME/glib/-/issues/2288

This doesn't actually suggest adding error_abort/fatal to GLib. Rather
it adds a general callback hook to GLib. Biggest complaints there
are around the callback concept and difficulty of safely using it,
which I can't disagree with.

I wonder if we would have more luck if we explicitly proposed the
error_abort/fatal concept to GLib instead. At least that would not
hit any of the complaints raised about the callback.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Auditing QEMU to replace NULL with &error_abort
  2021-06-23 15:08   ` Daniel P. Berrangé
@ 2021-06-23 15:31     ` Marc-André Lureau
  2021-06-23 15:39       ` Daniel P. Berrangé
  0 siblings, 1 reply; 7+ messages in thread
From: Marc-André Lureau @ 2021-06-23 15:31 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Thomas Huth, QEMU Developers, Philippe Mathieu-Daudé,
	Markus Armbruster, Max Reitz, John Snow

[-- Attachment #1: Type: text/plain, Size: 1155 bytes --]

Hi

On Wed, Jun 23, 2021 at 7:10 PM Daniel P. Berrangé <berrange@redhat.com>
wrote:

> On Wed, Jun 23, 2021 at 02:16:55PM +0200, Markus Armbruster wrote:
> > &error_abort has been a clear win for us.  &error_fatal too, when used
> > judiciously.  Marc-André tried to get both into GLib, unsuccessfully[2].
>
> ...snip...
>
> > [2] https://gitlab.gnome.org/GNOME/glib/-/issues/2288
>
> This doesn't actually suggest adding error_abort/fatal to GLib. Rather
> it adds a general callback hook to GLib. Biggest complaints there
> are around the callback concept and difficulty of safely using it,
> which I can't disagree with.
>
> I wonder if we would have more luck if we explicitly proposed the
> error_abort/fatal concept to GLib instead. At least that would not
> hit any of the complaints raised about the callback.
>
>
Without callbacks, it will be difficult to report errors back to the
monitor, or prettify it the way we want (since we would be using extended
GErrors for hints etc)

But we could have a more specific callback for that perhaps?

You are welcome to propose something else :)

-- 
Marc-André Lureau

[-- Attachment #2: Type: text/html, Size: 1733 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Auditing QEMU to replace NULL with &error_abort
  2021-06-23 15:31     ` Marc-André Lureau
@ 2021-06-23 15:39       ` Daniel P. Berrangé
  2021-06-23 17:24         ` Daniel P. Berrangé
  0 siblings, 1 reply; 7+ messages in thread
From: Daniel P. Berrangé @ 2021-06-23 15:39 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Thomas Huth, QEMU Developers, Philippe Mathieu-Daudé,
	Markus Armbruster, Max Reitz, John Snow

On Wed, Jun 23, 2021 at 07:31:13PM +0400, Marc-André Lureau wrote:
> Hi
> 
> On Wed, Jun 23, 2021 at 7:10 PM Daniel P. Berrangé <berrange@redhat.com>
> wrote:
> 
> > On Wed, Jun 23, 2021 at 02:16:55PM +0200, Markus Armbruster wrote:
> > > &error_abort has been a clear win for us.  &error_fatal too, when used
> > > judiciously.  Marc-André tried to get both into GLib, unsuccessfully[2].
> >
> > ...snip...
> >
> > > [2] https://gitlab.gnome.org/GNOME/glib/-/issues/2288
> >
> > This doesn't actually suggest adding error_abort/fatal to GLib. Rather
> > it adds a general callback hook to GLib. Biggest complaints there
> > are around the callback concept and difficulty of safely using it,
> > which I can't disagree with.
> >
> > I wonder if we would have more luck if we explicitly proposed the
> > error_abort/fatal concept to GLib instead. At least that would not
> > hit any of the complaints raised about the callback.
> >
> >
> Without callbacks, it will be difficult to report errors back to the
> monitor, or prettify it the way we want (since we would be using extended
> GErrors for hints etc)
> 
> But we could have a more specific callback for that perhaps?
> 
> You are welcome to propose something else :)

I was thinking g_set_error would use g_warning/g_error to report
it, and thus involve the g_log_default_handler callback, which
we could provide to feed back into the monitor.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Auditing QEMU to replace NULL with &error_abort
  2021-06-23 15:39       ` Daniel P. Berrangé
@ 2021-06-23 17:24         ` Daniel P. Berrangé
  0 siblings, 0 replies; 7+ messages in thread
From: Daniel P. Berrangé @ 2021-06-23 17:24 UTC (permalink / raw)
  To: Marc-André Lureau, Thomas Huth, QEMU Developers,
	Philippe Mathieu-Daudé,
	Markus Armbruster, Max Reitz, John Snow

On Wed, Jun 23, 2021 at 04:39:18PM +0100, Daniel P. Berrangé wrote:
> On Wed, Jun 23, 2021 at 07:31:13PM +0400, Marc-André Lureau wrote:
> > Hi
> > 
> > On Wed, Jun 23, 2021 at 7:10 PM Daniel P. Berrangé <berrange@redhat.com>
> > wrote:
> > 
> > > On Wed, Jun 23, 2021 at 02:16:55PM +0200, Markus Armbruster wrote:
> > > > &error_abort has been a clear win for us.  &error_fatal too, when used
> > > > judiciously.  Marc-André tried to get both into GLib, unsuccessfully[2].
> > >
> > > ...snip...
> > >
> > > > [2] https://gitlab.gnome.org/GNOME/glib/-/issues/2288
> > >
> > > This doesn't actually suggest adding error_abort/fatal to GLib. Rather
> > > it adds a general callback hook to GLib. Biggest complaints there
> > > are around the callback concept and difficulty of safely using it,
> > > which I can't disagree with.
> > >
> > > I wonder if we would have more luck if we explicitly proposed the
> > > error_abort/fatal concept to GLib instead. At least that would not
> > > hit any of the complaints raised about the callback.
> > >
> > >
> > Without callbacks, it will be difficult to report errors back to the
> > monitor, or prettify it the way we want (since we would be using extended
> > GErrors for hints etc)
> > 
> > But we could have a more specific callback for that perhaps?
> > 
> > You are welcome to propose something else :)
> 
> I was thinking g_set_error would use g_warning/g_error to report
> it, and thus involve the g_log_default_handler callback, which
> we could provide to feed back into the monitor.

I did such a change:

  https://gitlab.gnome.org/dberrange/glib/-/commit/b3e951183f319ef634845485f84deefcc36b4dc3

but when I come to document it, I notice the GLib is very strongly
of the opinion that GError *never* be used to report programmer
errors. 'error_abort' is precisely for programmer errors, so I
think it will be a challenge to get that accepted in GLib.
GLib wants you to use g_error or g_return_if_fail or g_assert
etc, and bypass GError entirely.

Adding 'error_fatal' is a more reasonable challenge, but if we
only get one of the two added to GLib that doesn't especially
help us.

I can see that error_abort is simpler than 4 line code pattern
that would be needed to declare Error & call abort, but I wonder
if we have solved the wrong problem. Should we have just not
used Error object at all for programmer errors like GLib suggests.

Picking one common example. Almost every caller of qemu_opts_create()
passes error_abort, and those which don't look like they probably
should. So what value is the Error object adding instead of just
having the method directly call g_error to report the programmer
error ?  The main one I see is that it redirects output to HMP,
but I think that would be achievable by a custom log handler in
glib. A few other very common repeated usages of error_abort
look similarly redundant to me.

Anyway, this all looks like a bigger can of worms than I expected.

What I think it does show though, is that NULL vs &error_ignore
is a fairly minor detail in the bigger scheme of things, and thus
I'm not convinced it is the best thing to spend time on if we want
to work on error reporting.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-06-23 17:25 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-22 15:20 Auditing QEMU to replace NULL with &error_abort John Snow
2021-06-23 12:16 ` Markus Armbruster
2021-06-23 15:01   ` John Snow
2021-06-23 15:08   ` Daniel P. Berrangé
2021-06-23 15:31     ` Marc-André Lureau
2021-06-23 15:39       ` Daniel P. Berrangé
2021-06-23 17:24         ` Daniel P. Berrangé

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).