From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CD224C433F5 for ; Tue, 23 Nov 2021 09:32:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232096AbhKWJfo (ORCPT ); Tue, 23 Nov 2021 04:35:44 -0500 Received: from mail-0301.mail-europe.com ([188.165.51.139]:50054 "EHLO mail-0301.mail-europe.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230236AbhKWJfn (ORCPT ); Tue, 23 Nov 2021 04:35:43 -0500 Date: Tue, 23 Nov 2021 09:32:28 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=emersion.fr; s=protonmail; t=1637659949; bh=WaYv4Vo1ied5Can2vStLBRqq5qJ8vzsHC41nroVCbKU=; h=Date:To:From:Cc:Reply-To:Subject:In-Reply-To:References:From; b=mF2LsX4wZDisixKtW1bcOAJTHf0WjPG91SzlHISEtGuZGk1Cn8biO9ZokaN192Yz8 BzOP9TotglL3KB6ghzW169xRSIpHRdWqu4RtCay7x6PYP5SI2K3qFivh/bRVx5qGHv /mBYvwXmxvkXueWSArOCQrcblEfT1hnRsuTk08NfJ8YqaXEE/RwMIqM5APRvvfsmHz Rctozj430tUUIcsMyBcIOgeY2daYHpj9S7GV1zVdYPa0/Tqx8ydyhdZZEXDk3+gdj3 5f+BQCmZAUVfr/EtsUINCrt9+kkMZWWFG2J9UIjBWseWwcTz4YRxMInK4sb7XDjQuF JQjv+Fr35uKEw== To: Pekka Paalanen From: Simon Ser Cc: jim.cromie@gmail.com, quic_saipraka@quicinc.com, Catalin Marinas , dri-devel , Will Deacon , maz@kernel.org, Vincent Whitchurch , amd-gfx mailing list , Ingo Molnar , Daniel Vetter , Arnd Bergmann , linux-arm-msm@vger.kernel.org, Intel Graphics Development , Steven Rostedt , Jason Baron , Sean Paul , intel-gvt-dev@lists.freedesktop.org, Linux ARM , Sean Paul , Greg KH , LKML , quic_psodagud@quicinc.com, mathieu.desnoyers@efficios.com Reply-To: Simon Ser Subject: Re: [PATCH v10 08/10] dyndbg: add print-to-tracefs, selftest with it - RFC Message-ID: <-PHBNsA2s0YNaFjE_76_aCTSMbqUpcaqbttDKFOZv0n9VRShPsgC8NDHq_S8KCpNbE32E9LRrw7CHb3pgFzgg99jFb0DX59vpcPVODkYe4Y=@emersion.fr> In-Reply-To: <20211123104522.7a336773@eldfell> References: <20211111220206.121610-1-jim.cromie@gmail.com> <20211116104631.195cbd0b@eldfell> <20211118172401.0b4d722e@eldfell> <41ea83b2-a707-cb6f-521e-070bb12502de@akamai.com> <20211122110208.528e1d80@eldfell> <20211123104522.7a336773@eldfell> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org First off, let me reiterate that this feature would be invaluable as user-s= pace developers. It's often pretty difficult to figure out the cause of an EINVA= L, we have to ask users to follow complicated instructions [1] to grab DRM log= s. Then have to skim through several megabytes of logs to find the error. I have a hack [2] which just calls system("sudo dmesg") after a failed atom= ic commit, it's been pretty handy. But it's really just a hack, a proper solut= ion would be awesome. [1]: https://gitlab.freedesktop.org/wlroots/wlroots/-/wikis/DRM-Debugging [2]: https://gitlab.freedesktop.org/emersion/libliftoff/-/merge_requests/61 > > > > Having a subsystem specific trace buffer would allow subsystem spec= ific > > > > trace log permissions depending on the sensitivity of the data. But > > > > doesn't drm output today go to the system log which is typically wo= rld > > > > readable today? dmesg isn't world-readable these days, it's been changed recently-ish (last year?) at least on my distribution (Arch). I need root to grab dmesg. (Maybe we can we just let the DRM master process grab the logs?) > > > Yes, and that is exactly the problem. The DRM debug output is so high > > > traffic it would make the system log both unusable due to cruft and > > > slow down the whole machine. The debug output is only useful when > > > something went wrong, and at that point it is too late to enable > > > debugging. That's why a flight recorder with an over-written circular > > > in-memory buffer is needed. > > > > Seans patch reuses enum drm_debug_category to split the tracing > > stream into 10 sub-streams > > - how much traffic from each ? > > - are some sub-streams more valuable for post-mortem ? > > - any value from further refinement of categories ? > > - drop irrelevant callsites individually to reduce clutter, extend > > buffer time/space ? > > I think it's hard to predict which sub-streams you are going to need > before you have a bug to debug. Hence I would err on the side of > enabling too much. This also means that better or more refined > categorisation might not be that much of help - or if it is, then are > the excluded debug messages worth having in the kernel to begin with. > Well, we're probably not that interested in GPU debugs but just > everything related to the KMS side, which on the existing categories > is... everything except half of CORE and DRIVER, maybe? Not sure. We've been recommending drm.debug=3D0x19F so far (see wiki linked above). KMS + PRIME + ATOMIC + LEASE is definitely something we want in, and CORE + DRIVER contains other useful info. We definitely don't want VBL. > My feeling is that that could mean in the order of hundreds of log > events at framerate (e.g. 60 times per second) per each enabled output > individually. And per DRM device, of course. This is with the > uninteresting GPU debugs already excluded. Indeed, successful KMS atomic commits already generate a lot of noise. On m= y machine, setting drm.debug=3D0x19F and running the following command: sudo dmesg -w | pv >/dev/null I get 400KiB/s when idling, and 850KiB/s when wiggling the cursor. > Still, I don't think the flight recorder buffer would need to be > massive. I suspect it would be enough to hold a few frames' worth which > is just a split second under active operation. When something happens, > the userspace stack is likely going to stop on its tracks immediately > to collect the debug information, which means the flooding should pause > and the relevant messages don't get overwritten before we get them. In > a multi-seat system where each device is controlled by a separate > display server instance, per-device logs would help with this. OTOH, > multi-seat is not a very common use case I suppose. There's also the case of multi-GPU where GPU B's logs could clutter GPU A's= , making it harder to understand the cause of an atomic commit failure on GPU= A. So per-device logs would be useful, but not a hard requirement for me, havi= ng *anything* at all would already be a big win. In my experiments linked above [2], system("sudo dmesg") after atomic commi= t failure worked pretty well, and the bottom of the log contained the cause o= f the failure. It was pretty useful to system("sudo dmesg -C") before perform= ing an atomic commit, to be able to only collect the extract of the log relevan= t to the atomic commit. Having some kind of "marker" mechanism could be pretty cool. "Mark" the log stream before performing an atomic commit (ideally that'd just return e.g. = an uint64 offset), then on failure request the logs collected after that mark.