On Mon, Oct 4, 2021 at 2:32 PM John Snow <jsnow@redhat.com> wrote:

>
>
> On Mon, Oct 4, 2021 at 6:12 AM Hanna Reitz <hreitz@redhat.com> wrote:
>
>> On 18.09.21 04:14, John Snow wrote:
>> >
>> >
>> > On Fri, Sep 17, 2021 at 8:58 PM John Snow <jsnow@redhat.com
>> > <mailto:jsnow@redhat.com>> wrote:
>> >
>> >
>> >
>> >     On Fri, Sep 17, 2021 at 10:30 AM Hanna Reitz <hreitz@redhat.com
>> >     <mailto:hreitz@redhat.com>> wrote:
>> >
>> >         On 17.09.21 07:40, John Snow wrote:
>> >         > Disable the aqmp logger, which likes to (at the moment)
>> >         print out
>> >         > intermediate warnings and errors that cause session
>> >         termination; disable
>> >         > them so they don't interfere with the job output.
>> >         >
>> >         > Leave any "CRITICAL" warnings enabled though, those are ones
>> >         that we
>> >         > should never see, no matter what.
>> >
>> >         I mean, looks OK to me, but from what I understand (i.e.
>> little),
>> >         qmp_client doesn’t log CRITICAL messages, at least I can’t see
>> >         any. Only
>> >         ERRORs.
>> >
>> >
>> >     There's *one* critical message in protocol.py, used for a
>> >     circumstance that I *think* should be impossible. I do not think I
>> >     currently use any WARNING level statements.
>> >
>> >         I guess I’m missing some CRITICAL messages in external
>> >         functions called
>> >         from qmp_client.py, but shouldn’t we still keep ERRORs?
>> >
>> >
>> >     ...Mayyyyyybe?
>> >
>> >     The errors logged by AQMP are *almost always* raised as Exceptions
>> >     somewhere else, eventually. Sometimes when we encounter them in
>> >     one context, we need to save them and then re-raise them in a
>> >     different execution context. There's one good exception to this:
>> >     My pal, EOFError.
>> >
>> >     If the reader context encounters EOF, it raises EOFError and this
>> >     causes a disconnect to be scheduled asynchronously. *Any*
>> >     Exception that causes a disconnect to be scheduled asynchronously
>> >     is dutifully logged as an ERROR. At this point in the code, we
>> >     don't really know if the user of the library considers this an
>> >     "error" yet or not. I've waffled a lot on how exactly to treat
>> >     this circumstance. ...Hm, I guess that's really the only case
>> >     where I have an error that really ought to be suppressed. I
>> >     suppose what I will do here is: if the exception happens to be an
>> >     EOFError I will drop the severity of the log message down to INFO.
>> >     I don't know why it takes being challenged on this stuff to start
>> >     thinking clearly about it, but here we are. Thank you for your
>> >     feedback :~)
>> >
>> >     --js
>> >
>> >
>> > Oh, CI testing reminds me of why I am a liar here.
>> >
>> > the mirror-top-perms test intentionally expects not to be able to
>> > connect, but we're treated to these two additional lines of output:
>> >
>> > +ERROR:qemu.aqmp.qmp_client.qemub-2536319:Negotiation failed: EOFError
>> > +ERROR:qemu.aqmp.qmp_client.qemub-2536319:Failed to establish session:
>> > EOFError
>> >
>> > Uh. I guess a temporary suppression in mirror-top-perms, then ...?
>>
>> Sounds right to me, if that’s simple enough.
>>
>> (By the way, I understand it right that you want to lower the severity
>> of EOFErrors to INFO only on disconnect, right?  Which is why they’re
>> still logged as ERRORs here, because they aren’t occurring on
>> disconnects?)
>>
>>
> More or less, yeah.
>
> When an EOFError causes the reader coroutine to halt (because it can't
> read the next message), I decided (in v2) to drop that one particular
> logging message down to "INFO", because it might -- or might not be -- an
> expected occurrence from the point of view of whoever is managing the QMP
> connection. Maybe it was expected (The test used qemu-guest-agent or
> something else to make the guest shutdown, taking QEMU down with it without
> the knowledge of the QMP library layer) or maybe it was unexpected (the QMP
> remote really just disappeared from us on a whim). There's no way to know,
> so it probably isn't right to consider it an error.
>
> In the connection case, I left it as an ERROR because the caller asked us
> to connect to an endpoint and we were unable to, which feels unambiguous.
> It will be ultimately reported via Exceptions as qemu.aqmp.ConnectError,
> with additional information available in fields of that exception object.
> Even though the exception is reported to the caller, I decided to log the
> occurrence anyway, because I felt like it should be the job of the library
> to make a good log and not the caller's responsibility to catch the
> exception and then log it themselves.
>
> That does leave us with this atypical case though: the caller is
> intentionally causing problems :)
>
> (Well, atypical for a user of the library who is using it to make a tool
> they expect to work. Quite a typical case for tests in the abstract, though
> we only seem to have precisely one usage of this pattern in iotests so far.)
>
>
>> > In most other cases I rather imagine we do want to see this kind of
>> > output to help give more information about how things have failed --
>> > they ARE errors. We just happen to not care about them right here.
>>
>> Well, in fact we do expect them here, so we could even log them, but
>> first I don’t know whether they’re stable enough for that, and second
>> this would break the “choose the AQMP or legacy QMP back-end via an
>> environment variable” thing.
>>
>>
> Yeah, that's the other half of it, I came to realize. Just logging the
> expected failure feels the most idiomatic, but it requires a new logging
> filter (I have to filter out the PID from the logger name to make it work
> consistently) and breaks the ability to switch.
>
> So I suppose "for now" just disabling the logger output for a critical
> section and leaving a comment that states that once we're feeling confident
> about the new library (maybe after the next release when everything has
> really gone through the testing wash cycle) we can remove the suppression
> and just log the errors the normal way.
>
>
>> > The only thing I don't exactly like about this is that it requires
>> > some knowledge by the caller to know to disable it. It makes writing
>> > negative tests a bit more annoying because the library leans so
>> > heavily on yelling loudly when it encounters problems.
>>
>> Yeah.  I’m afraid I don’t have a good idea on how to convey test writers
>> how to suppress these errors, even if it were a simple one-liner (like
>> `self.vm_b.set_log_threshold(logging.CRITICAL)`)...
>>
>> We could start an “iotests tips and tricks” document, but do we want to?
>>
>> Hanna
>>
>>
> Hm, not just yet. More chances for un-checked stuff to bitrot. There's
> only this one caller that poses a problem here, so it's probably not going
> to be too difficult to just update tests to expect the error logs.
>
> V2 soon. I already respun it, but it's been a week, so I will need to
> scrub it down with a good proof-reading pass. Thanks for your patience, and
> I hope you enjoyed your PTO :~)
>
>
Oh, uh, I got confused and I actually already sent V2 for this series.
Apologies for the confusion.

--js