From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 17 Apr 2023 15:20:00 +0100
Subject: Re: [PATCH v3 6/7] drm: Add fdinfo memory stats
To: Rob Clark
Cc: Rob Clark, Jonathan Corbet, linux-arm-msm@vger.kernel.org, "open list:DOCUMENTATION", Emil Velikov, Christopher Healy, dri-devel@lists.freedesktop.org, open list, Boris Brezillon, Thomas Zimmermann, freedreno@lists.freedesktop.org
References: <8893ad56-8807-eb69-2185-b338725f0b18@linux.intel.com> <09c8d794-bb64-f7ba-f854-f14ac30600a6@linux.intel.com>
From: Tvrtko Ursulin
Organization: Intel Corporation UK Plc
X-Mailing-List: linux-arm-msm@vger.kernel.org

On 17/04/2023 14:42, Rob Clark wrote: > On Mon, Apr 17, 2023 at 4:10 AM Tvrtko Ursulin > wrote: >> >> >> On 16/04/2023 08:48, Daniel Vetter wrote: >>> On Fri, Apr 14, 2023 at 06:40:27AM -0700, Rob Clark wrote: >>>> On Fri, Apr 14, 2023 at 1:57 AM Tvrtko Ursulin >>>> wrote: >>>>> >>>>> >>>>> On 13/04/2023 21:05, Daniel Vetter wrote: >>>>>> On Thu, Apr 13, 2023 at 05:40:21PM +0100, Tvrtko Ursulin wrote: >>>>>>> >>>>>>> On 13/04/2023 14:27, Daniel Vetter wrote: >>>>>>>> On Thu, Apr 13, 2023 at 01:58:34PM +0100, Tvrtko Ursulin wrote: >>>>>>>>> >>>>>>>>> On 12/04/2023 20:18, Daniel Vetter wrote: >>>>>>>>>> On
Wed, Apr 12, 2023 at 11:42:07AM -0700, Rob Clark wrote: >>>>>>>>>>> On Wed, Apr 12, 2023 at 11:17 AM Daniel Vetter wrote: >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Apr 12, 2023 at 10:59:54AM -0700, Rob Clark wrote: >>>>>>>>>>>>> On Wed, Apr 12, 2023 at 7:42 AM Tvrtko Ursulin >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 11/04/2023 23:56, Rob Clark wrote: >>>>>>>>>>>>>>> From: Rob Clark >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Add support to dump GEM stats to fdinfo. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> v2: Fix typos, change size units to match docs, use div_u64 >>>>>>>>>>>>>>> v3: Do it in core >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Signed-off-by: Rob Clark >>>>>>>>>>>>>>> Reviewed-by: Emil Velikov >>>>>>>>>>>>>>> --- >>>>>>>>>>>>>>> Documentation/gpu/drm-usage-stats.rst | 21 ++++++++ >>>>>>>>>>>>>>> drivers/gpu/drm/drm_file.c | 76 +++++++++++++++++++++++++++ >>>>>>>>>>>>>>> include/drm/drm_file.h | 1 + >>>>>>>>>>>>>>> include/drm/drm_gem.h | 19 +++++++ >>>>>>>>>>>>>>> 4 files changed, 117 insertions(+) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst >>>>>>>>>>>>>>> index b46327356e80..b5e7802532ed 100644 >>>>>>>>>>>>>>> --- a/Documentation/gpu/drm-usage-stats.rst >>>>>>>>>>>>>>> +++ b/Documentation/gpu/drm-usage-stats.rst >>>>>>>>>>>>>>> @@ -105,6 +105,27 @@ object belong to this client, in the respective memory region. >>>>>>>>>>>>>>> Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB' >>>>>>>>>>>>>>> indicating kibi- or mebi-bytes. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> +- drm-shared-memory: [KiB|MiB] >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> +The total size of buffers that are shared with another file (ie. have more >>>>>>>>>>>>>>> +than a single handle). >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> +- drm-private-memory: [KiB|MiB] >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> +The total size of buffers that are not shared with another file. >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> +- drm-resident-memory: [KiB|MiB] >>>>>>>>>>>>>>> + >>>>>>>>>>>>>>> +The total size of buffers that are resident in system memory. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I think this naming maybe does not work best with the existing >>>>>>>>>>>>>> drm-memory- keys. >>>>>>>>>>>>> >>>>>>>>>>>>> Actually, it was very deliberate not to conflict with the existing >>>>>>>>>>>>> drm-memory- keys ;-) >>>>>>>>>>>>> >>>>>>>>>>>>> I wouldn't have preferred drm-memory-{active,resident,...} but it >>>>>>>>>>>>> could be mis-parsed by existing userspace so my hands were a bit tied. >>>>>>>>>>>>> >>>>>>>>>>>>>> How about introduce the concept of a memory region from the start and >>>>>>>>>>>>>> use naming similar like we do for engines? >>>>>>>>>>>>>> >>>>>>>>>>>>>> drm-memory-$CATEGORY-$REGION: ... >>>>>>>>>>>>>> >>>>>>>>>>>>>> Then we document a bunch of categories and their semantics, for instance: >>>>>>>>>>>>>> >>>>>>>>>>>>>> 'size' - All reachable objects >>>>>>>>>>>>>> 'shared' - Subset of 'size' with handle_count > 1 >>>>>>>>>>>>>> 'resident' - Objects with backing store >>>>>>>>>>>>>> 'active' - Objects in use, subset of resident >>>>>>>>>>>>>> 'purgeable' - Or inactive? Subset of resident. >>>>>>>>>>>>>> >>>>>>>>>>>>>> We keep the same semantics as with process memory accounting (if I got >>>>>>>>>>>>>> it right) which could be desirable for a simplified mental model. >>>>>>>>>>>>>> >>>>>>>>>>>>>> (AMD needs to remind me of their 'drm-memory-...' keys semantics. 
If we >>>>>>>>>>>>>> correctly captured this in the first round it should be equivalent to >>>>>>>>>>>>>> 'resident' above. In any case we can document no category is equal to >>>>>>>>>>>>>> which category, and at most one of the two must be output.) >>>>>>>>>>>>>> >>>>>>>>>>>>>> Region names we at most partially standardize. Like we could say >>>>>>>>>>>>>> 'system' is to be used where backing store is system RAM and others are >>>>>>>>>>>>>> driver defined. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Then discrete GPUs could emit N sets of key-values, one for each memory >>>>>>>>>>>>>> region they support. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I think this all also works for objects which can be migrated between >>>>>>>>>>>>>> memory regions. 'Size' accounts them against all regions while for >>>>>>>>>>>>>> 'resident' they only appear in the region of their current placement, etc. >>>>>>>>>>>>> >>>>>>>>>>>>> I'm not too sure how to rectify different memory regions with this, >>>>>>>>>>>>> since drm core doesn't really know about the driver's memory regions. >>>>>>>>>>>>> Perhaps we can go back to this being a helper and drivers with vram >>>>>>>>>>>>> just don't use the helper? Or?? >>>>>>>>>>>> >>>>>>>>>>>> I think if you flip it around to drm-$CATEGORY-memory{-$REGION}: then it >>>>>>>>>>>> all works out reasonably consistently? >>>>>>>>>>> >>>>>>>>>>> That is basically what we have now. I could append -system to each to >>>>>>>>>>> make things easier to add vram/etc (from a uabi standpoint).. >>>>>>>>>> >>>>>>>>>> What you have isn't really -system, but everything. So doesn't really make >>>>>>>>>> sense to me to mark this -system, it's only really true for integrated (if >>>>>>>>>> they don't have stolen or something like that). >>>>>>>>>> >>>>>>>>>> Also my comment was more in reply to Tvrtko's suggestion. >>>>>>>>> >>>>>>>>> Right so my proposal was drm-memory-$CATEGORY-$REGION which I think aligns >>>>>>>>> with the current drm-memory-$REGION by extending, rather than creating >>>>>>>>> confusion with different order of key name components. >>>>>>>> >>>>>>>> Oh my comment was pretty much just bikeshed, in case someone creates a >>>>>>>> $REGION that other drivers use for $CATEGORY. Kinda Rob's parsing point. >>>>>>>> So $CATEGORY before the -memory. >>>>>>>> >>>>>>>> Otoh I don't think that'll happen, so I guess we can go with whatever more >>>>>>>> folks like :-) I don't really care much personally. >>>>>>> >>>>>>> Okay I missed the parsing problem. >>>>>>> >>>>>>>>> AMD currently has (among others) drm-memory-vram, which we could define in >>>>>>>>> the spec maps to category X, if category component is not present. >>>>>>>>> >>>>>>>>> Some examples: >>>>>>>>> >>>>>>>>> drm-memory-resident-system: >>>>>>>>> drm-memory-size-lmem0: >>>>>>>>> drm-memory-active-vram: >>>>>>>>> >>>>>>>>> Etc.. I think it creates a consistent story. >>>>>>>>> >>>>>>>>> Other than this, my two I think significant opens which haven't been >>>>>>>>> addressed yet are: >>>>>>>>> >>>>>>>>> 1) >>>>>>>>> >>>>>>>>> Why do we want totals (not per region) when userspace can trivially >>>>>>>>> aggregate if they want. What is the use case? >>>>>>>>> >>>>>>>>> 2) >>>>>>>>> >>>>>>>>> Current proposal limits the value to whole objects and fixates that by >>>>>>>>> having it in the common code. If/when some driver is able to support sub-BO >>>>>>>>> granularity they will need to opt out of the common printer at which point >>>>>>>>> it may be less churn to start with a helper rather than mid-layer. 
Or maybe >>>>>>>>> some drivers already support this, I don't know. Given how important VM BIND >>>>>>>>> is I wouldn't be surprised. >>>>>>>> >>>>>>>> I feel like for drivers using ttm we want a ttm helper which takes care of >>>>>>>> the region printing in hopefully a standard way. And that could then also >>>>>>>> take care of all kinds of of partial binding and funny rules (like maybe >>>>>>>> we want a standard vram region that addds up all the lmem regions on >>>>>>>> intel, so that all dgpu have a common vram bucket that generic tools >>>>>>>> understand?). >>>>>>> >>>>>>> First part yes, but for the second I would think we want to avoid any >>>>>>> aggregation in the kernel which can be done in userspace just as well. Such >>>>>>> total vram bucket would be pretty useless on Intel even since userspace >>>>>>> needs to be region aware to make use of all resources. It could even be >>>>>>> counter productive I think - "why am I getting out of memory when half of my >>>>>>> vram is unused!?". >>>>>> >>>>>> This is not for intel-aware userspace. This is for fairly generic "gputop" >>>>>> style userspace, which might simply have no clue or interest in what lmemX >>>>>> means, but would understand vram. >>>>>> >>>>>> Aggregating makes sense. >>>>> >>>>> Lmem vs vram is now an argument not about aggregation but about >>>>> standardizing regions names. >>>>> >>>>> One detail also is a change in philosophy compared to engine stats where >>>>> engine names are not centrally prescribed and it was expected userspace >>>>> will have to handle things generically and with some vendor specific >>>>> knowledge. >>>>> >>>>> Like in my gputop patches. It doesn't need to understand what is what, >>>>> it just finds what's there and presents it to the user. >>>>> >>>>> Come some accel driver with local memory it wouldn't be vram any more. >>>>> Or even a headless data center GPU. So I really don't think it is good >>>>> to hardcode 'vram' in the spec, or midlayer, or helpers. >>>>> >>>>> And for aggregation.. again, userspace can do it just as well. If we do >>>>> it in kernel then immediately we have multiple sets of keys to output >>>>> for any driver which wants to show the region view. IMO it is just >>>>> pointless work in the kernel and more code in the kernel, when userspace >>>>> can do it. >>>>> >>>>> Proposal A (one a discrete gpu, one category only): >>>>> >>>>> drm-resident-memory: x KiB >>>>> drm-resident-memory-system: x KiB >>>>> drm-resident-memory-vram: x KiB >>>>> >>>>> Two loops in the kernel, more parsing in userspace. >>>> >>>> why would it be more than one loop, ie. >>>> >>>> mem.resident += size; >>>> mem.category[cat].resident += size; >>>> >>>> At the end of the day, there is limited real-estate to show a million >>>> different columns of information. Even the gputop patches I posted >>>> don't show everything of what is currently there. And nvtop only >>>> shows toplevel resident stat. So I think the "everything" stat is >>>> going to be what most tools use. >>> >>> Yeah with enough finesse the double-loop isn't needed, it's just the >>> simplest possible approach. >>> >>> Also this is fdinfo, I _really_ want perf data showing that it's a >>> real-world problem when we conjecture about algorithmic complexity. >>> procutils have been algorithmically garbage since decades after all :-) >> >> Just run it. :) >> >> Algorithmic complexity is quite obvious and not a conjecture - to find >> DRM clients you have to walk _all_ pids and _all_ fds under them. 
So >> amount of work can scale very quickly and even _not_ with the number of >> DRM clients. >> >> It's not too bad on my desktop setup but it is significantly more CPU >> intensive than top(1). >> >> It would be possible to optimise the current code some more by not >> parsing full fdinfo (may become more important as number of keys grow), >> but that's only relevant when number of drm fds is large. It doesn't >> solve the basic pids * open fds search for which we'd need a way to walk >> the list of pids with drm fds directly. > > All of which has (almost[1]) nothing to do with one loop or two Correct, this was just a side discussion where I understood Daniel is asking about the wider performance story. Perhaps I misunderstood. > (ignoring for a moment that I already pointed out a single loop is all > that is needed). If CPU overhead is a problem, we could perhaps come > up some sysfs which has one file per drm_file and side-step crawling > of all of the proc * fd. I'll play around with it some but I'm pretty > sure you are trying to optimize the wrong thing. Yes, that's what I meant too in "a way to walk the list of pids with drm fds directly". Regards, Tvrtko > > BR, > -R > > [1] generally a single process using drm has multiple fd's pointing at > the same drm_file.. which makes the current approach of having to read > fdinfo to find the client-id sub-optimal. But still the total # of > proc * fd is much larger
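
To make the aggregation point above concrete, here is a minimal userspace sketch, assuming the per-region drm-resident-memory-$REGION keys from Proposal A and leaving the /proc crawling to the caller. It shows that a gputop-style tool can derive the driver-wide total itself in a single pass over one fdinfo file; the helper name to_kib and the command-line usage are illustrative, not something from the patch series.

/*
 * Hypothetical sketch: sum all per-region resident keys of one DRM fdinfo
 * file, i.e. the aggregation step argued above to belong in userspace.
 */
#include <stdio.h>
#include <string.h>

static unsigned long long to_kib(unsigned long long val, const char *unit)
{
	/* Per the documented format the value is bytes, KiB or MiB. */
	if (!strcmp(unit, "MiB"))
		return val << 10;
	if (!strcmp(unit, "KiB"))
		return val;
	return val >> 10; /* plain bytes */
}

int main(int argc, char **argv)
{
	unsigned long long total = 0, val;
	char region[64], unit[8];
	char line[256];
	FILE *f;

	if (argc < 2 || !(f = fopen(argv[1], "r"))) {
		fprintf(stderr, "usage: %s /proc/<pid>/fdinfo/<fd>\n", argv[0]);
		return 1;
	}

	while (fgets(line, sizeof(line), f)) {
		/* Match e.g. "drm-resident-memory-vram: 8192 KiB" for any region. */
		unit[0] = '\0';
		if (sscanf(line, "drm-resident-memory-%63[^:]: %llu %7s",
			   region, &val, unit) >= 2)
			total += to_kib(val, unit);
	}
	fclose(f);

	printf("drm-resident-memory (all regions): %llu KiB\n", total);
	return 0;
}

Pointed at any /proc/<pid>/fdinfo/<fd> that emits these keys, this prints the same number a kernel-side "everything" total would carry, which is essentially the userspace-aggregation argument being made above.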