Subject: Re: [Nouveau] [Intel-gfx] [PATCH 0/7] Per client engine busyness
From: Tvrtko Ursulin
Organization: Intel Corporation UK Plc
To: Daniel Vetter
Cc: Intel Graphics Development, Maling list - DRI developers, Daniel Stone,
 Simon Ser, nouveau@lists.freedesktop.org, "Koenig, Christian",
 "Nieto, David M"
Date: Thu, 20 May 2021 09:35:13 +0100
List-Id: Nouveau development list

On 19/05/2021 19:23, Daniel Vetter wrote:
> On Wed, May 19, 2021 at 6:16 PM Tvrtko Ursulin wrote:
>>
>>
>> On 18/05/2021 10:40, Tvrtko Ursulin wrote:
>>>
>>> On 18/05/2021 10:16, Daniel Stone wrote:
>>>> Hi,
>>>>
>>>> On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin wrote:
>>>>> I was just wondering if stat(2) and a chrdev major check would be a
>>>>> solid criteria to more efficiently (compared to parsing the text
>>>>> content) detect drm files while walking procfs.
>>>>
>>>> Maybe I'm missing something, but is the per-PID walk actually a
>>>> measurable performance issue rather than just a bit unpleasant?
>>>
>>> Per pid and per each open fd.
>>>
>>> As said in the other thread, what bothers me a bit in this scheme is
>>> that the cost of obtaining GPU usage scales based on non-GPU criteria.
>>>
>>> For the use case of a top-like tool which shows all processes this is
>>> a smaller additional cost, but for a gpu-top like tool it is somewhat
>>> higher.
>>
>> To further expand, not only would the cost scale per pid multiplied by
>> per open fd, but to detect which of the fds are DRM I see these three
>> options:
>>
>> 1) Open and parse fdinfo.
>> 2) Name based matching ie /dev/dri/.. something.
>> 3) Stat the symlink target and check for DRM major.
>
> stat with symlink following should be plenty fast.

Maybe. I don't think my point about keeping the dentry cache needlessly
hot is getting through at all. On my lightly loaded desktop:

  $ sudo lsof | wc -l
  599551

  $ sudo lsof | grep "/dev/dri/" | wc -l
  1965

It's going to look up ~600k pointless dentries in every iteration, just
to find a handful of DRM ones. Hard to say if that is better or worse
than just parsing fdinfo text for all files. Will see.

>> All sound quite sub-optimal to me.
>>
>> Name based matching is probably the least evil on system resource
>> usage (keeping the dentry cache too hot? too many syscalls?), even
>> though fundamentally I don't think it is the right approach.
>>
>> What happens with dup(2) is another question.
>
> We need benchmark numbers showing that on anything remotely realistic
> it's an actual problem. Until we've demonstrated it's a real problem
> we don't need to solve it.

The point about dup(2) is whether it is possible to distinguish the
duplicated fds in fdinfo. If a DRM client dupes, and we found two
fdinfos each saying the client is using 20% GPU, we don't want to add
that up to 40%.

> E.g. top with any sorting enabled also parses way more than it
> displays on every update. It seems to be doing Just Fine (tm).

Ha, perceptions differ. I see it using 4-5% of a CPU while building the
kernel on a Xeon server, which I find quite a lot.
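For illustration, option 3 (stat the fd symlink target and check the chrdev major) combined with a dup(2)-safe fdinfo aggregation could look roughly like the sketch below. This is only a sketch, not the proposed implementation: the DRM character major (226) comes from the kernel's devices.txt, but the fdinfo key names (`drm-client-id`, `drm-engine-<class>`) assume a per-client stats format along the lines being discussed in this thread, so treat them as placeholders rather than a settled interface.

```python
import os
import stat

DRM_MAJOR = 226  # character device major reserved for DRM (devices.txt)

def is_drm_fd(pid, fd):
    """Option 3 from the thread: stat /proc/<pid>/fd/<fd> (following the
    symlink) and check for a character device with the DRM major."""
    try:
        st = os.stat(f"/proc/{pid}/fd/{fd}")
    except OSError:
        return False  # fd closed meanwhile, or no permission
    return stat.S_ISCHR(st.st_mode) and os.major(st.st_rdev) == DRM_MAJOR

def parse_drm_fdinfo(text):
    """Collect 'drm-*' key/value pairs from one fdinfo blob.

    Hypothetical key names: 'drm-client-id' identifying the client and
    'drm-engine-<class>' busyness values in nanoseconds."""
    out = {}
    for line in text.splitlines():
        if line.startswith("drm-") and ":" in line:
            key, _, val = line.partition(":")
            out[key.strip()] = val.strip()
    return out

def busy_ns_per_client(fdinfos):
    """Sum engine time once per drm-client-id, so fds duplicated with
    dup(2) (same open file description, hence same client id) are not
    double counted -- addressing the 20% + 20% != 40% concern."""
    seen = {}
    for info in fdinfos:
        cid = info.get("drm-client-id")
        if cid is None or cid in seen:
            continue  # unknown client, or a duplicate of one already seen
        seen[cid] = sum(
            int(v.split()[0])
            for k, v in info.items()
            if k.startswith("drm-engine-")
        )
    return seen
```

A gpu-top style tool would walk `/proc/<pid>/fd`, call `is_drm_fd()` on each entry (one stat per fd, versus an open+read+parse per fd for the pure fdinfo approach), and feed the matching fdinfo blobs through `busy_ns_per_client()`; either way the walk still touches every open fd on the system, which is the scaling concern raised above.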
:)

>> Does anyone have any feedback on the /proc/<pid>/gpu idea at all?
>
> When we know we have a problem to solve we can take a look at
> solutions.

Yes, I don't think it would be a problem to add a better solution later,
so I'm happy to try the fdinfo approach first.

I am simply pointing out a fundamental design inefficiency. Even if
machines are getting faster and faster, I don't think that should be an
excuse to waste more and more under the hood, when a more efficient
solution can be designed from the start.

Regards,

Tvrtko
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau