From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:46503) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fT0eN-0001RY-Ku for qemu-devel@nongnu.org; Wed, 13 Jun 2018 03:57:05 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fT0eK-00085N-IM for qemu-devel@nongnu.org; Wed, 13 Jun 2018 03:57:03 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:36456 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1fT0eK-00084n-D5 for qemu-devel@nongnu.org; Wed, 13 Jun 2018 03:57:00 -0400 From: Markus Armbruster References: <20180524044454.11792-1-peterx@redhat.com> <20180524044454.11792-2-peterx@redhat.com> <20180529113000.1ee70ef4.cohuck@redhat.com> <20180530033045.GC25245@xz-mi> <20180530153600.63f16405.cohuck@redhat.com> Date: Wed, 13 Jun 2018 09:56:53 +0200 In-Reply-To: <20180530153600.63f16405.cohuck@redhat.com> (Cornelia Huck's message of "Wed, 30 May 2018 15:36:00 +0200") Message-ID: <8736xrm856.fsf@dusky.pond.sub.org> MIME-Version: 1.0 Content-Type: text/plain Subject: Re: [Qemu-devel] [PATCH v4 1/2] qemu-error: introduce {error|warn}_report_once List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Cornelia Huck Cc: Peter Xu , "Michael S . Tsirkin" , qemu-devel@nongnu.org, Jason Wang , Markus Armbruster , Philippe =?utf-8?Q?Mathieu-Daud=C3=A9?= , Halil Pasic Cornelia Huck writes: > On Wed, 30 May 2018 11:30:45 +0800 > Peter Xu wrote: > >> On Tue, May 29, 2018 at 11:30:00AM +0200, Cornelia Huck wrote: >> > On Thu, 24 May 2018 12:44:53 +0800 >> > Peter Xu wrote: >> > >> > > There are many error_report()s that can be used in frequently called >> > > functions, especially on IO paths. That can be unideal in that >> > > malicious guest can try to trigger the error tons of time which might >> > > use up the log space on the host (e.g., libvirt can capture the stderr >> > > of QEMU and put it persistently onto disk). In VT-d emulation code, we >> > > have trace_vtd_error() tracer. AFAIU all those places can be replaced >> > > by something like error_report() but trace points are mostly used to >> > > avoid the DDOS attack that mentioned above. However using trace points >> > > mean that errors are not dumped if trace not enabled. >> > > >> > > It's not a big deal in most modern server managements since we have >> > > things like logrotate to maintain the logs and make sure the quota is >> > > expected. However it'll still be nice that we just provide another way >> > > to restrict message generations. In most cases, this kind of >> > > error_report()s will only provide valid information on the first message >> > > sent, and all the rest of similar messages will be mostly talking about >> > > the same thing. This patch introduces *_report_once() helpers to allow >> > > a message to be dumped only once during one QEMU process's life cycle. >> > > It will make sure: (1) it's on by deffault, so we can even get something >> > > without turning the trace on and reproducing, and (2) it won't be >> > > affected by DDOS attack. >> > >> > This is good for something (sub-)system wide, where it is enough to >> > alert the user once; but we may want to print something e.g. once per >> > the device where it happens (see v3 of "vfio-ccw: add force unlimited >> > prefetch property" for an example). >> >> I'm glad that we start to have more users of it, no matter which >> implementation we'll choose. At least it means it makes some sense to >> have such a thing. >> >> For me this patch works nicely enough. Of course there can be >> per-device errors, but AFAICT mostly we don't have any error at >> all... and my debugging experience is that when multiple error happens >> on different devices simutaneously, they'll very possible that it's >> caused by the same problem (again, errors are rare after all), or, the >> rest of the problems are caused by the first error only (so the first >> error might cause a collapse of the rest). That's why I wanted to >> always debug with the first error I encounter, because that's mostly >> always the root cause. In that sense, current patch works nicely for >> me (note that we can have device ID embeded in the error message with >> this one). > > I think we have slightly different use cases here. In your case, > something (an error) happened and you don't really care about any > subsequent messages. In the vfio-ccw case, we want to log when a guest > tries to do things on a certain device, so we can possibly throw a > magic switch for that device. We still want messages after that so we > can catch further devices for which we may want to throw the magic > switch. (Similar for the other message, we want to give a hint why a > certain device may not work as expected.) > >> >> At the same time, I think this patch should be easier to use of course >> - no extra variables to define, and very self contained. So I would >> slightly prefer current patch. >> >> However I'm also fine with the approach proposed in the vfio-ccw patch >> too. Though if so I would possibly drop the 2nd patch too since if >> with the vfio-ccw patch I'll need to introduce one bool for every >> trace_vtd_err() place... then I'd not bother with it any more but >> instead live with that trace_*(). ;) Or I can define one per-IOMMU >> error boolean and pass it in for each of the error_report_once(), but >> it seems a bit awkward too. > > I think we can have both the fine-grained control and convenience > macros for those cases where we really just want to print a message > once. Yes. Cornelia, feel free to post a followup patch that satisfies your needs. >> >> > >> > > >> > > To implement it, I stole the printk_once() macro from Linux. >> > >> > Would something akin to printk_ratelimited() also make sense to avoid >> > log flooding? >> >> Yes it will. IMHO we can have that too as follow up if we want, and >> it does not conflict with this print_once(). I'd say currently this >> error_report_once() is good enough for me, especially lightweighted. >> I suspect we'll need more lines to implement a ratelimited version. > > Yes, and I agree that wants to be a separate patch should we find a use > case for it. Yes.