References: <20210902231813.3597709-1-surenb@google.com>
 <20210902231813.3597709-2-surenb@google.com>
From: Suren Baghdasaryan
Date: Thu, 30 Sep 2021 11:56:12 -0700
Subject: Re: [PATCH v9 2/3] mm: add a field to store names for private anonymous memory
To: Matthew Wilcox
Cc: Andrew Morton, Colin Cross, Sumit Semwal, Michal Hocko,
 Dave Hansen, Kees Cook, "Kirill A. Shutemov", Vlastimil Babka,
 Johannes Weiner, Jonathan Corbet, Al Viro, Randy Dunlap, Kalesh Singh,
 Peter Xu, rppt@kernel.org, Peter Zijlstra, Catalin Marinas,
 vincenzo.frascino@arm.com, Chinwen Chang (張錦文), Axel Rasmussen,
 Andrea Arcangeli, Jann Horn, apopple@nvidia.com, John Hubbard, Yu Zhao,
 Will Deacon, fenghua.yu@intel.com, thunder.leizhen@huawei.com,
 Hugh Dickins, feng.tang@intel.com, Jason Gunthorpe, Roman Gushchin,
 Thomas Gleixner, krisman@collabora.com, chris.hyser@oracle.com,
 Peter Collingbourne, "Eric W. Biederman", Jens Axboe, legion@kernel.org,
 Rolf Eike Beer, Cyrill Gorcunov, Muchun Song, Viresh Kumar,
 Thomas Cedeno, sashal@kernel.org, cxfcosmos@gmail.com,
 Rasmus Villemoes, LKML, linux-fsdevel@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-mm, kernel-team

On Wed, Sep 8, 2021 at 9:05 PM Suren Baghdasaryan wrote:
>
> On Mon, Sep 6, 2021 at 9:57 AM Matthew Wilcox wrote:
> >
> > On Thu, Sep 02, 2021 at 04:18:12PM -0700, Suren Baghdasaryan wrote:
> > > On Android we heavily use a set of tools that use an extended version of
> > > the logic covered in Documentation/vm/pagemap.txt to walk all pages mapped
> > > in userspace and slice their usage by process, shared (COW) vs. unique
> > > mappings, backing, etc.
> > > This can account for real physical memory usage
> > > even in cases like fork without exec (which Android uses heavily to share
> > > as many private COW pages as possible between processes), Kernel SamePage
> > > Merging, and clean zero pages. It produces a measurement of the pages
> > > that only exist in that process (USS, for unique), and a measurement of
> > > the physical memory usage of that process with the cost of shared pages
> > > being evenly split between processes that share them (PSS).
> > >
> > > If all anonymous memory is indistinguishable then figuring out the real
> > > physical memory usage (PSS) of each heap requires either a pagemap walking
> > > tool that can understand the heap debugging of every layer, or for every
> > > layer's heap debugging tools to implement the pagemap walking logic, in
> > > which case it is hard to get a consistent view of memory across the whole
> > > system.
> > >
> > > Tracking the information in userspace leads to all sorts of problems.
> > > It either needs to be stored inside the process, which means every
> > > process has to have an API to export its current heap information upon
> > > request, or it has to be stored externally in a filesystem that
> > > somebody needs to clean up on crashes. It needs to be readable while
> > > the process is still running, so it has to have some sort of
> > > synchronization with every layer of userspace. Efficiently tracking
> > > the ranges requires reimplementing something like the kernel vma
> > > trees, and linking to it from every layer of userspace. It requires
> > > more memory, more syscalls, more runtime cost, and more complexity to
> > > separately track regions that the kernel is already tracking.
> >
> > I understand that the information is currently incoherent, but why is
> > this the right way to make it coherent?
> > It would seem more useful to
> > use something like one of the tracing mechanisms (eg ftrace, LTTng,
> > whatever the current hotness is in userspace tracing) for the malloc
> > library to log all the useful information, instead of injecting a subset
> > of it into the kernel for userspace to read out again.
>
> Sorry for the delay with the response. I'm travelling and my internet
> access is very patchy.
>
> Just to clarify, your suggestion is to require userspace to log any
> allocation using ftrace or a similar mechanism and then for the system
> to parse these logs to calculate the memory usage for each process?
> I didn't think much in this direction but I guess logging each
> allocation in the system and periodically collecting that data would
> be quite expensive both from memory usage and performance POV. I'll
> need to think a bit more but these are to me the obvious downsides of
> this approach.

Sorry for the delay again. Now that I'm back, there should not be any
more of them.

I thought more about the alternative suggestion of having userspace
record allocations, but that would introduce considerable complexity
into userspace. Some daemon would have to collect and consolidate the
data, every user would have to query that daemon for it (via IPC or
something similar), and if the daemon crashed, the data would somehow
have to be recovered. So, in short, it's possible, but it makes things
much more complex than the proposed in-kernel implementation.

OTOH, the only downside of the current implementation is the additional
memory required to store anon vma names. I checked the memory
consumption on the latest Android with these patches, and because we
share vma names during fork, the actual memory required to store vma
names is no more than 600kB. Even on older phones like the Pixel 3 with
4GB of RAM, this is less than 0.015% of total memory. IMHO, this is an
acceptable price to pay.
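As a quick check of the 600kB overhead estimate above, assuming binary
units (1kB = 2^10 bytes, 1GB = 2^30 bytes):

```latex
\frac{600 \times 2^{10}\ \text{bytes}}{4 \times 2^{30}\ \text{bytes}}
  = \frac{75}{2^{19}} \approx 1.43 \times 10^{-4} \approx 0.0143\,\%
```

With decimal units (kB = 10^3 bytes, GB = 10^9 bytes) the same ratio is
exactly 1.5 x 10^-4, i.e. 0.015%.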