From: Suren Baghdasaryan
Date: Thu, 7 Oct 2021 08:42:20 -0700
Subject: Re: [PATCH v10 3/3] mm: add anonymous vma name refcounting
References: <20211001205657.815551-1-surenb@google.com> <20211001205657.815551-3-surenb@google.com> <20211005184211.GA19804@duo.ucw.cz> <20211005200411.GB19804@duo.ucw.cz> <6b15c682-72eb-724d-bc43-36ae6b79b91a@redhat.com> <192438ab-a095-d441-6843-432fbbb8e38a@redhat.com>
To: David Hildenbrand
Cc: Michal Hocko, John Hubbard, Pavel Machek, Andrew Morton, Colin Cross,
 Sumit Semwal, Dave Hansen, Kees Cook, Matthew Wilcox, "Kirill A . Shutemov",
 Vlastimil Babka, Johannes Weiner, Jonathan Corbet, Al Viro, Randy Dunlap,
 Kalesh Singh, Peter Xu, rppt@kernel.org, Peter Zijlstra, Catalin Marinas,
 vincenzo.frascino@arm.com, Chinwen Chang (張錦文), Axel Rasmussen,
 Andrea Arcangeli, Jann Horn, apopple@nvidia.com, Yu Zhao, Will Deacon,
 fenghua.yu@intel.com, thunder.leizhen@huawei.com, Hugh Dickins,
 feng.tang@intel.com, Jason Gunthorpe, Roman Gushchin, Thomas Gleixner,
 krisman@collabora.com, chris.hyser@oracle.com, Peter Collingbourne, "Eric W.
 Biederman", Jens Axboe, legion@kernel.org, Rolf Eike Beer, Cyrill Gorcunov,
 Muchun Song, Viresh Kumar, Thomas Cedeno, sashal@kernel.org,
 cxfcosmos@gmail.com, Rasmus Villemoes, LKML, linux-fsdevel@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-mm, kernel-team

On Thu, Oct 7, 2021 at 12:33 AM David Hildenbrand wrote:
>
> On 06.10.21 17:20, Suren Baghdasaryan wrote:
> > On Wed, Oct 6, 2021 at 8:08 AM David Hildenbrand wrote:
> >>
> >> On 06.10.21 17:01, Suren Baghdasaryan wrote:
> >>> On Wed, Oct 6, 2021 at 2:27 AM David Hildenbrand wrote:
> >>>>
> >>>> On 06.10.21 10:27, Michal Hocko wrote:
> >>>>> On Tue 05-10-21 23:57:36, John Hubbard wrote:
> >>>>> [...]
> >>>>>> 1) Yes, just leave the strings in the kernel, that's simple and
> >>>>>> it works, and the alternatives don't really help your case nearly
> >>>>>> enough.
> >>>>>
> >>>>> I do not have a strong opinion. Strings are easier to use but they
> >>>>> are more involved and the necessity of the kref approach just underlines
> >>>>> that. There are going to be new allocations and that always can lead
> >>>>> to surprising side effects. These are small (80B at maximum) so the
> >>>>> overall footprint shouldn't be all that large by default but it can grow
> >>>>> quite large with a very high max_map_count. There are workloads which
> >>>>> really require the default to be set high (e.g. heavy mremap users). So
> >>>>> if anything all those should be __GFP_ACCOUNT and memcg accounted.
> >>>>>
> >>>>> I do agree that numbers are just much simpler from accounting,
> >>>>> performance and implementation POV.
> >>>>
> >>>> +1
> >>>>
> >>>> I can understand that having a string can be quite beneficial e.g., when
> >>>> dumping mmaps. If only user space knows the id <-> string mapping, that
> >>>> can be quite tricky.
> >>>>
> >>>> However, I also do wonder if there would be a way to standardize/reserve
> >>>> ids, such that a given id always corresponds to a specific user. If we
> >>>> use a uint64_t for an id, there would be plenty of room to reserve ids ...
> >>>>
> >>>> I'd really prefer if we can avoid using strings and instead use ids.
> >>>
> >>> I wish it was that simple and for some names like [anon:.bss] or
> >>> [anon:dalvik-zygote space] reserving a unique id would work, however
> >>> some names like [anon:dalvik-/system/framework/boot-core-icu4j.art]
> >>> are generated dynamically at runtime and include package name.
> >>
> >> Valuable information
> >
> > Yeah, I should have described it more clearly the first time around.
> >
> >>
> >>> Packages are constantly evolving, new ones are developed, names can
> >>> change, etc. So assigning a unique id for these names is not really
> >>> feasible.
> >>
> >> So, you'd actually want to generate/reserve an id for a given string at
> >> runtime, assign that id to the VMA, and have a way to match id <->
> >> string somehow?
> >
> > If we go with ids then yes, that is what we would have to do.
> >
> >> That reservation service could be inside the kernel or even (better?) in
> >> user space. The service could for example de-duplicate strings.
> >
> > Yes but it would require an IPC call to that service potentially on
> > every mmap() when we want to name a mapped vma. This would be
> > prohibitive. Even on the consumption side, instead of just dumping
> > /proc/$pid/maps we would have to parse the file and convert all
> > [anon:id] into [anon:name] with each conversion requiring an IPC call
> > (assuming no id->name pair caching on the client side).
>
> mmap() and prctl() already do take the mmap sem in write, so they are
> not the "most lightweight" operations so to say.
>
> We already have two separate operations, first the mmap(), then the
> prctl(). IMHO you could defer the "naming" part to a later point in
> time, without creating too many issues, moving it out of the
> "hot/performance critical phase"
>
> Reading https://lwn.net/Articles/867818/, to me it feels like the use
> case could live with a little larger delay between the mmap popping up
> and a name getting assigned.

That might be doable if occasional inconsistency can be tolerated (we
can't guarantee that maps won't be read before the deferred work names
the vma). However, I would prefer an efficient solution over one that
is inefficient but can be deferred.

>
> --
> Thanks,
>
> David / dhildenb
>
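
For illustration only, the id-based alternative discussed in the quoted
thread (a user-space reservation service that de-duplicates strings, plus
conversion of [anon:id] back into [anon:name] when reading maps) could be
sketched roughly as below. Everything in it is hypothetical: NameRegistry,
acquire(), release(), lookup() and resolve_maps_line() are made-up names,
nothing like this exists in the kernel or in this patch series, and the
sketch runs inside a single process, so it leaves out the IPC round trips
that are the main objection raised above.

```python
import re


class NameRegistry:
    """Hypothetical reservation service: one id per distinct name, refcounted."""

    def __init__(self):
        self._next_id = 1
        self._id_by_name = {}    # name -> id
        self._entry_by_id = {}   # id -> [name, refcount]

    def acquire(self, name):
        """Return the id for name, allocating one on first use (de-duplication)."""
        vma_id = self._id_by_name.get(name)
        if vma_id is None:
            vma_id = self._next_id
            self._next_id += 1
            self._id_by_name[name] = vma_id
            self._entry_by_id[vma_id] = [name, 0]
        self._entry_by_id[vma_id][1] += 1
        return vma_id

    def release(self, vma_id):
        """Drop one reference; forget the name once nobody uses it."""
        entry = self._entry_by_id[vma_id]
        entry[1] -= 1
        if entry[1] == 0:
            del self._entry_by_id[vma_id]
            del self._id_by_name[entry[0]]

    def lookup(self, vma_id):
        """Return the name for an id, or None if the id is unknown."""
        entry = self._entry_by_id.get(vma_id)
        return entry[0] if entry else None


def resolve_maps_line(line, registry):
    """Rewrite "[anon:<id>]" as "[anon:<name>]"; unknown ids are left as-is."""
    def repl(match):
        name = registry.lookup(int(match.group(1)))
        return "[anon:%s]" % name if name is not None else match.group(0)
    return re.sub(r"\[anon:(\d+)\]", repl, line)
```

In a real deployment each acquire() on the mmap() path and each lookup()
while parsing /proc/$pid/maps would presumably be a call into another
process, unless the client caches id->name pairs.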