References: <20190807171559.182301-1-joel@joelfernandes.org> <20190813100856.GF17933@dhcp22.suse.cz>
In-Reply-To: <20190813100856.GF17933@dhcp22.suse.cz>
From: Jann Horn
Date: Tue, 13 Aug 2019 17:29:09 +0200
Subject: Re: [PATCH v5 1/6] mm/page_idle: Add per-pid idle page tracking using virtual index
To: Michal Hocko, Daniel Gruss, "Joel Fernandes (Google)"
Cc: kernel list, Alexey Dobriyan, Andrew Morton, Borislav Petkov,
 Brendan Gregg, Catalin Marinas, Christian Hansen, Daniel Colascione,
 fmayer@google.com, "H. Peter Anvin", Ingo Molnar, Joel Fernandes,
 Jonathan Corbet, Kees Cook, kernel-team, Linux API,
 linux-doc@vger.kernel.org, linux-fsdevel, Linux-MM, Mike Rapoport,
 Minchan Kim, namhyung@google.com, "Paul E. McKenney", Robin Murphy,
 Roman Gushchin, Stephen Rothwell, Suren Baghdasaryan, Thomas Gleixner,
 Todd Kjos, Vladimir Davydov, Vlastimil Babka, Will Deacon
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Aug 13, 2019 at 12:09 PM Michal Hocko wrote:
> On Mon 12-08-19 20:14:38, Jann Horn wrote:
> > On Wed, Aug 7, 2019 at 7:16 PM Joel Fernandes (Google) wrote:
> > > The page_idle tracking feature currently requires looking up the pagemap
> > > for a process followed by interacting with /sys/kernel/mm/page_idle.
> > > Looking up PFN from pagemap in Android devices is not supported by
> > > unprivileged process and requires SYS_ADMIN and gives 0 for the PFN.
> > >
> > > This patch adds support to directly interact with page_idle tracking at
> > > the PID level by introducing a /proc//page_idle file. It follows
> > > the exact same semantics as the global /sys/kernel/mm/page_idle, but now
> > > looking up PFN through pagemap is not needed since the interface uses
> > > virtual frame numbers, and at the same time also does not require
> > > SYS_ADMIN.
> > >
> > > In Android, we are using this for the heap profiler (heapprofd) which
> > > profiles and pin points code paths which allocates and leaves memory
> > > idle for long periods of time. This method solves the security issue
> > > with userspace learning the PFN, and while at it is also shown to yield
> > > better results than the pagemap lookup, the theory being that the window
> > > where the address space can change is reduced by eliminating the
> > > intermediate pagemap look up stage. In virtual address indexing, the
> > > process's mmap_sem is held for the duration of the access.
> >
> > What happens when you use this interface on shared pages, like memory
> > inherited from the zygote, library file mappings and so on?
> > If two profilers ran concurrently for two different processes that both map
> > the same libraries, would they end up messing up each other's data?
>
> Yup PageIdle state is shared. That is the page_idle semantic even now
> IIRC.
>
> > Can this be used to observe which library pages other processes are
> > accessing, even if you don't have access to those processes, as long
> > as you can map the same libraries? I realize that there are already a
> > bunch of ways to do that with side channels and such; but if you're
> > adding an interface that allows this by design, it seems to me like
> > something that should be gated behind some sort of privilege check.
>
> Hmm, you need to be privileged to get the pfn now and without that you
> cannot get to any page so the new interface is weakening the rules.
> Maybe we should limit setting the idle state to processes with the write
> status. Or do you think that even observing idle status is useful for
> practical side channel attacks? If yes, is that a problem of the
> profiler which does potentially dangerous things?

I suppose read-only access isn't a real problem as long as the
profiler isn't writing the idle state in a very tight loop... but I
don't see a use case where you'd actually want that? As far as I can
tell, if you can't write the idle state, being able to read it is
pretty much useless.

If the profiler only wants to profile process-private memory, then
that should be implementable in a safe way in principle, I think, but
since Joel said that they want to profile CoW memory as well, I think
that's inherently somewhat dangerous.
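
For readers following along, here is a rough sketch of the index
arithmetic behind the two-step lookup that the quoted patch description
wants to avoid (pagemap entry first, then the PFN-indexed idle bitmap).
It only computes offsets and decodes an entry; it does not open any
files. The helper names are made up for illustration, and PAGE_SHIFT = 12
(4 KiB pages) is an assumption. That the proposed per-pid file would use
the same word/bit layout, indexed by virtual frame number instead of PFN,
is my reading of "exact same semantics" in the patch description, not
something confirmed in this thread.

```python
import struct

PAGE_SHIFT = 12               # assumption: 4 KiB pages
PAGEMAP_ENTRY_SIZE = 8        # pagemap holds one 64-bit entry per virtual page
PM_PFN_MASK = (1 << 55) - 1   # bits 0-54: PFN, valid when the page is present
PM_PRESENT = 1 << 63          # bit 63: page is present in RAM
BITMAP_WORD_SIZE = 8          # the idle bitmap is an array of 64-bit words


def pagemap_offset(vaddr):
    """Byte offset of the pagemap entry that describes `vaddr`."""
    return (vaddr >> PAGE_SHIFT) * PAGEMAP_ENTRY_SIZE


def decode_pagemap_entry(raw):
    """Return the PFN stored in an 8-byte pagemap entry, or None.

    None covers both "page not present" and "PFN reported as 0", which
    is what a reader without the required privilege gets back.
    """
    (entry,) = struct.unpack("=Q", raw)   # native byte order, 8 bytes
    if not entry & PM_PRESENT:
        return None
    pfn = entry & PM_PFN_MASK
    return pfn or None


def idle_bit_location(frame_number):
    """(byte offset of the 64-bit word, bit index in that word).

    For the global bitmap, `frame_number` is a PFN. For the proposed
    per-pid file it would be a virtual frame number (assumption, see
    the note above the code).
    """
    return (frame_number // 64) * BITMAP_WORD_SIZE, frame_number % 64
```

With the global interface a profiler needs all three steps; with virtual
indexing only idle_bit_location(vaddr >> PAGE_SHIFT) would remain, which
is why the PFN no longer has to be exposed to userspace.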