[AMD Official Use Only] Parsing over 550 processes for fdinfo is taking between 40-100ms single threaded in a 2GHz skylake IBRS within a VM using simple string comparisons and DIRent parsing. And that is pretty much the worst case scenario with some more optimized implementations. David ________________________________ From: Daniel Vetter Sent: Wednesday, May 19, 2021 11:23 AM To: Tvrtko Ursulin Cc: Daniel Stone ; jhubbard@nvidia.com ; nouveau@lists.freedesktop.org ; Intel Graphics Development ; Maling list - DRI developers ; Simon Ser ; Koenig, Christian ; aritger@nvidia.com ; Nieto, David M Subject: Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness On Wed, May 19, 2021 at 6:16 PM Tvrtko Ursulin wrote: > > > On 18/05/2021 10:40, Tvrtko Ursulin wrote: > > > > On 18/05/2021 10:16, Daniel Stone wrote: > >> Hi, > >> > >> On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin > >> wrote: > >>> I was just wondering if stat(2) and a chrdev major check would be a > >>> solid criteria to more efficiently (compared to parsing the text > >>> content) detect drm files while walking procfs. > >> > >> Maybe I'm missing something, but is the per-PID walk actually a > >> measurable performance issue rather than just a bit unpleasant? > > > > Per pid and per each open fd. > > > > As said in the other thread what bothers me a bit in this scheme is that > > the cost of obtaining GPU usage scales based on non-GPU criteria. > > > > For use case of a top-like tool which shows all processes this is a > > smaller additional cost, but then for a gpu-top like tool it is somewhat > > higher. > > To further expand, not only cost would scale per pid multiplies per open > fd, but to detect which of the fds are DRM I see these three options: > > 1) Open and parse fdinfo. > 2) Name based matching ie /dev/dri/.. something. > 3) Stat the symlink target and check for DRM major. stat with symlink following should be plenty fast. > All sound quite sub-optimal to me. > > Name based matching is probably the least evil on system resource usage > (Keeping the dentry cache too hot? Too many syscalls?), even though > fundamentally I don't it is the right approach. > > What happens with dup(2) is another question. We need benchmark numbers showing that on anything remotely realistic it's an actual problem. Until we've demonstrated it's a real problem we don't need to solve it. E.g. top with any sorting enabled also parses way more than it displays on every update. It seems to be doing Just Fine (tm). > Does anyone have any feedback on the /proc//gpu idea at all? When we know we have a problem to solve we can take a look at solutions. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation https://nam11.safelinks.protection.outlook.com/?url=http%3A%2F%2Fblog.ffwll.ch%2F&data=04%7C01%7CDavid.Nieto%40amd.com%7Cf6aea97532cf41f916de08d91af32cc1%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637570453997158377%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=4CFrY9qWbJREcIcSzeO9KIn2P%2Fw6k%2BYdNlh6rdS%2BEh4%3D&reserved=0