From: Jann Horn
Date: Tue, 13 Aug 2019 17:19:27 +0200
Subject: Re: [PATCH v5 1/6] mm/page_idle: Add per-pid idle page tracking using virtual index
In-Reply-To: <20190813142527.GD258732@google.com>
References: <20190807171559.182301-1-joel@joelfernandes.org> <20190813100856.GF17933@dhcp22.suse.cz> <20190813142527.GD258732@google.com>
To: Joel Fernandes, Daniel Gruss
Cc: Michal Hocko, kernel list, Alexey Dobriyan, Andrew Morton, Borislav Petkov, Brendan Gregg, Catalin Marinas, Christian Hansen, Daniel Colascione, fmayer@google.com, "H. Peter Anvin", Ingo Molnar, Jonathan Corbet, Kees Cook, kernel-team, Linux API, linux-doc@vger.kernel.org, linux-fsdevel, Linux-MM, Mike Rapoport, Minchan Kim, namhyung@google.com, "Paul E.
McKenney", Robin Murphy, Roman Gushchin, Stephen Rothwell, Suren Baghdasaryan, Thomas Gleixner, Todd Kjos, Vladimir Davydov, Vlastimil Babka, Will Deacon

On Tue, Aug 13, 2019 at 4:25 PM Joel Fernandes wrote:
> On Tue, Aug 13, 2019 at 12:08:56PM +0200, Michal Hocko wrote:
> > On Mon 12-08-19 20:14:38, Jann Horn wrote:
> > > On Wed, Aug 7, 2019 at 7:16 PM Joel Fernandes (Google)
> > > wrote:
> > > > The page_idle tracking feature currently requires looking up the pagemap
> > > > for a process followed by interacting with /sys/kernel/mm/page_idle.
> > > > Looking up PFN from pagemap in Android devices is not supported by
> > > > unprivileged process and requires SYS_ADMIN and gives 0 for the PFN.
> > > >
> > > > This patch adds support to directly interact with page_idle tracking at
> > > > the PID level by introducing a /proc//page_idle file. It follows
> > > > the exact same semantics as the global /sys/kernel/mm/page_idle, but now
> > > > looking up PFN through pagemap is not needed since the interface uses
> > > > virtual frame numbers, and at the same time also does not require
> > > > SYS_ADMIN.
> > > >
> > > > In Android, we are using this for the heap profiler (heapprofd) which
> > > > profiles and pin points code paths which allocates and leaves memory
> > > > idle for long periods of time. This method solves the security issue
> > > > with userspace learning the PFN, and while at it is also shown to yield
> > > > better results than the pagemap lookup, the theory being that the window
> > > > where the address space can change is reduced by eliminating the
> > > > intermediate pagemap look up stage. In virtual address indexing, the
> > > > process's mmap_sem is held for the duration of the access.
> > >
> > > What happens when you use this interface on shared pages, like memory
> > > inherited from the zygote, library file mappings and so on? If two
> > > profilers ran concurrently for two different processes that both map
> > > the same libraries, would they end up messing up each other's data?
> >
> > Yup PageIdle state is shared. That is the page_idle semantic even now
> > IIRC.
>
> Yes, that's right. This patch doesn't change that semantic. Idle page
> tracking at the core is a global procedure which is based on pages that can
> be shared.
>
> One of the usecases of the heap profiler is to enable profiling of pages that
> are shared between zygote and any processes that are forked. In this case,
> I am told by our team working on the heap profiler, that the monitoring of
> shared pages will help.
>
> > > Can this be used to observe which library pages other processes are
> > > accessing, even if you don't have access to those processes, as long
> > > as you can map the same libraries? I realize that there are already a
> > > bunch of ways to do that with side channels and such; but if you're
> > > adding an interface that allows this by design, it seems to me like
> > > something that should be gated behind some sort of privilege check.
> >
> > Hmm, you need to be priviledged to get the pfn now and without that you
> > cannot get to any page so the new interface is weakening the rules.
> > Maybe we should limit setting the idle state to processes with the write
> > status. Or do you think that even observing idle status is useful for
> > practical side channel attacks? If yes, is that a problem of the
> > profiler which does potentially dangerous things?
>
> The heap profiler is currently unprivileged. Would it help the concern Jann
> raised, if the new interface was limited to only anonymous private/shared
> pages and not to file pages? Or, is this even a real concern?

+Daniel Gruss in case he wants to provide some more detail; he has
been involved in a lot of the public research around this topic.

It is a bit of a concern when code that wasn't hardened as rigorously
as cryptographic library code operates on secret values. A paper was
published this year that abused mincore() in combination with tricks
for flushing the page cache to obtain information about when shared
read-only memory was accessed. In response to that, the semantics of
mincore() were changed to prevent that information from leaking (see
commit 134fca9063ad4851de767d1768180e5dede9a881).

On the other hand, an attacker could also use things like cache timing
attacks instead of abusing operating system features; that is more
hardware-specific, but it has a finer spatial granularity (typically
64 bytes instead of 4096 bytes). Timing-granularity-wise, I'm not sure
whether the proposed interface would be better or worse than existing
cache side-channels on common architectures. There are papers that
demonstrate things like being able to distinguish some classes of
keyboard keys from others on an Android phone.

I don't think limiting it to anonymous pages is necessarily enough to
completely solve this; in a normal Linux environment, it might be good
enough, but on Android, I'm worried about the CoW private memory from
the zygote.
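
To make the spatial-granularity comparison concrete, here is a minimal
Python sketch (not part of the patch or the thread). It assumes the
typical sizes mentioned above, 4096-byte pages and 64-byte cache lines;
the helper names and the two example addresses are hypothetical.

```python
PAGE_SIZE = 4096       # assumed page size in bytes (typical value)
CACHE_LINE_SIZE = 64   # assumed cache line size in bytes (typical value)


def virtual_frame_number(addr: int) -> int:
    """Page-granular view: index of the page that contains addr."""
    return addr // PAGE_SIZE


def cache_line_number(addr: int) -> int:
    """Cache-line-granular view: index of the line that contains addr."""
    return addr // CACHE_LINE_SIZE


def lines_per_page() -> int:
    """Number of distinct cache lines that fall within one page."""
    return PAGE_SIZE // CACHE_LINE_SIZE


# Two hypothetical addresses in the same page, 0x100 bytes apart.
a = 0x7F0000001040
b = 0x7F0000001140

# A page-granular observer sees the same unit for both accesses ...
same_page = virtual_frame_number(a) == virtual_frame_number(b)
# ... while a cache-line-granular observer can tell them apart.
same_line = cache_line_number(a) == cache_line_number(b)

print(same_page, same_line, lines_per_page())  # prints: True False 64
```

Under these assumptions, a page-granular signal lumps together 64
locations that a cache-line-granular signal could distinguish.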