Date: Tue, 12 Oct 2021 09:36:52 +0200
From: Michal Hocko
To: Suren Baghdasaryan
Cc: Kees Cook, Pavel Machek, Rasmus Villemoes, David Hildenbrand,
    John Hubbard, Andrew Morton, Colin Cross, Sumit Semwal, Dave Hansen,
    Matthew Wilcox, "Kirill A .
    Shutemov", Vlastimil Babka, Johannes Weiner, Jonathan Corbet, Al Viro,
    Randy Dunlap, Kalesh Singh, Peter Xu, rppt@kernel.org, Peter Zijlstra,
    Catalin Marinas, vincenzo.frascino@arm.com, Chinwen Chang (張錦文),
    Axel Rasmussen, Andrea Arcangeli, Jann Horn, apopple@nvidia.com,
    Yu Zhao, Will Deacon, fenghua.yu@intel.com, thunder.leizhen@huawei.com,
    Hugh Dickins, feng.tang@intel.com, Jason Gunthorpe, Roman Gushchin,
    Thomas Gleixner, krisman@collabora.com, Chris Hyser,
    Peter Collingbourne, "Eric W. Biederman", Jens Axboe, legion@kernel.org,
    Rolf Eike Beer, Cyrill Gorcunov, Muchun Song, Viresh Kumar,
    Thomas Cedeno, sashal@kernel.org, cxfcosmos@gmail.com, LKML,
    linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm,
    kernel-team, Tim Murray
Subject: Re: [PATCH v10 3/3] mm: add anonymous vma name refcounting
References: <202110071111.DF87B4EE3@keescook> <202110081344.FE6A7A82@keescook>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon 11-10-21 18:20:25, Suren Baghdasaryan wrote:
> On Mon, Oct 11, 2021 at 6:18 PM Suren Baghdasaryan wrote:
> >
> > On Mon, Oct 11, 2021 at 1:36 AM Michal Hocko wrote:
> > >
> > > On Fri 08-10-21 13:58:01, Kees Cook wrote:
> > > > - Strings for "anon" specifically have no required format (this is good)
> > > >   it's informational like the task_struct::comm and can be (roughly)
> > > >   anything. There's no naming convention for memfds, AF_UNIX, etc. Why
> > > >   is one needed here? That seems like a completely unreasonable
> > > >   requirement.
> > > >
> > > I might be misreading the justification for the feature. Patch 2 is
> > > talking about tools that need to understand memory usage to make
> > > further actions. Also Suren was suggesting "numbering convention" as an
> > > argument against.
> > >
> > > So can we get a clear example how is this being used actually? If this
> > > is just to be used to debug by humans then I can see an argument for
> > > human readable form. If this is, however, meant to be used by tools to
> > > make some actions then the argument for strings is much weaker.
> >
> > The simplest usecase is when we notice that a process consumes more
> > memory than usual and we do "cat /proc/$(pidof my_process)/maps" to
> > check which area is contributing to this growth. The names we assign
> > to anonymous areas are descriptive enough for a developer to get an
> > idea where the increased consumption is coming from and how to proceed
> > with their investigation.
> > There are of course cases when tools are involved, but the end-user is
> > always a human and the final report should contain easily
> > understandable data.

OK, it would have been preferable to be explicit about this main use
case from the very beginning. Just to make sure we are on the same page:
is the primary use case usage and bug reporting?

My initial understanding was that userspace managed memory management
could make an educated guess about targeted reclaim (e.g.
MADV_{FREE,COLD,PAGEOUT} for cached data in memory like uncompressed
images/data). Such a use case would clearly require a standardized
id/naming convention to be application neutral.

> > IIUC, the main argument here is whether the userspace can provide
> > tools to perform the translations between ids and names, with the
> > kernel accepting and reporting ids instead of strings. Technically
> > it's possible, but to be practical that conversion should be fast
> > because we will need to make name->id conversion potentially for each
> > mmap.
> > On the consumer side the performance is not as critical, but the
> > fact that instead of dumping /proc/$pid/maps we will have to parse the
> > file, do id->name conversion and replace all [anon:id] with
> > [anon:name] would be an issue when we do that in bulk, for example
> > when collecting system-wide data for a bugreport.

Whether you use ids or human readable strings you still have to
understand the underlying meaning to make any educated guess.

Let me give you an example. Say I have an application with a memory
leak. Right now I can only tell that it is anonymous memory growing but
it is not clear who uses that anonymous memory. You are adding a means
to tell different users apart. That is really helpful. Now I know this
is an anon user 1234 or MySuperAnonMemory. Neither of them will tell me
more without an id/naming convention or reading the code. A convention
can be useful for the most common users (e.g. a specific allocator) but
I am rather dubious there are many more that would be _generally_
recognized without some understanding of the said application.

Maybe the situation in Android is different because the runtime is more
coupled but is it reasonable to expect any common naming conventions for
general Linux platforms?

I am slightly worried that we have spent way too much time talking
specifics about id->name translation rather than the actual usability of
the token.
-- 
Michal Hocko
SUSE Labs
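
For illustration only, the consumer-side step mentioned in the quoted text (parse the maps output, do the id->name conversion, replace each [anon:id] with [anon:name]) could look roughly like the Python sketch below. The id table, the sample maps lines and the helper name translate_maps are all made up for this example, and the [anon:<id>] form only models the id-based alternative discussed in the thread; none of it is taken from the patch series.

```python
import re

# Hypothetical id -> name table. In an id-based scheme userspace would have
# to maintain such a mapping itself; the entry below is an assumption.
ID_TO_NAME = {"1234": "MySuperAnonMemory"}

# Matches tags of the form [anon:<something>] in maps-style output.
ANON_TAG = re.compile(r"\[anon:([^\]]+)\]")


def translate_maps(maps_text, id_to_name):
    """Replace each [anon:<id>] tag with [anon:<name>].

    Ids that are missing from the table are left unchanged.
    """
    def repl(match):
        ident = match.group(1)
        return "[anon:%s]" % id_to_name.get(ident, ident)

    return ANON_TAG.sub(repl, maps_text)


# Made-up lines in the general shape of /proc/<pid>/maps entries.
SAMPLE = (
    "7f0000000000-7f0000021000 rw-p 00000000 00:00 0    [anon:1234]\n"
    "7f0000021000-7f0000042000 rw-p 00000000 00:00 0    [anon:99]\n"
)

if __name__ == "__main__":
    print(translate_maps(SAMPLE, ID_TO_NAME), end="")
```

Even after such a translation, a reader still has to know what a given name means for the application in question, which is the usability question raised above.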