From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References:
 <20201130233504.3725241-1-axelrasmussen@google.com>
From: Axel Rasmussen
Date: Tue, 1 Dec 2020 11:13:43 -0800
Subject: Re: [PATCH] mm: mmap_lock: fix use-after-free race and css ref leak in tracepoints
To: Shakeel Butt
Cc: Greg Thelen, Andrew Morton, Chinwen Chang, Daniel Jordan,
 David Rientjes, Davidlohr Bueso, Ingo Molnar, Jann Horn, Laurent Dufour,
 Michel Lespinasse, Stephen Rothwell, Steven Rostedt, Vlastimil Babka,
 Yafang Shao, "David S . Miller", dsahern@kernel.org, Greg Kroah-Hartman,
 Jakub Kicinski, liuhangbin@gmail.com, Tejun Heo, LKML, Linux MM
Content-Type: text/plain; charset="UTF-8"

On Tue, Dec 1, 2020 at 10:42 AM Shakeel Butt wrote:
>
> On Tue, Dec 1, 2020 at 9:56 AM Greg Thelen wrote:
> >
> > Axel Rasmussen wrote:
> >
> > > On Mon, Nov 30, 2020 at 5:34 PM Shakeel Butt wrote:
> > >>
> > >> On Mon, Nov 30, 2020 at 3:43 PM Axel Rasmussen wrote:
> > >> >
> > >> > syzbot reported[1] a use-after-free introduced in 0f818c4bc1f3. The bug
> > >> > is that an ongoing trace event might race with the tracepoint being
> > >> > disabled (and therefore the _unreg() callback being called). Consider
> > >> > this ordering:
> > >> >
> > >> > T1: trace event fires, get_mm_memcg_path() is called
> > >> > T1: get_memcg_path_buf() returns a buffer pointer
> > >> > T2: trace_mmap_lock_unreg() is called, buffers are freed
> > >> > T1: cgroup_path() is called with the now-freed buffer
> > >>
> > >> Any reason to use the cgroup_path instead of the cgroup_ino? There are
> > >> other examples of trace points using cgroup_ino and no need to
> > >> allocate buffers. Also cgroup namespace might complicate the path
> > >> usage.
> > >
> > > Hmm, so in general I would love to use a numeric identifier instead of a string.
> > >
> > > I did some reading, and it looks like the cgroup_ino() mainly has to
> > > do with writeback, instead of being just a general identifier?
> > > https://www.kernel.org/doc/Documentation/cgroup-v2.txt
> > I think you are confusing cgroup inodes with real filesystem inodes in that doc.
> >
> > >
> > > There is cgroup_id() which I think is almost what I'd want, but there
> > > are a couple problems with it:
> > >
> > > - I don't know of a way for userspace to translate IDs -> paths, to
> > > make them human readable?
> >
> > The id => name map can be built from user space with a tree walk.
> > Example:
> >
> > $ find /sys/fs/cgroup/memory -type d -printf '%i %P\n' # ~ [main]
> > 20387 init.scope
> > 31 system.slice
> >
> > > - Also I think the ID implementation we use for this is "dense",
> > > meaning if a cgroup is removed, its ID is likely to be quickly reused.
> > >
>
> The ID for cgroup nodes (underlying it is kernfs) are allocated from
> idr_alloc_cyclic() which gives new ID after the last allocated ID and
> wrap after around INT_MAX IDs. So, likeliness of repetition is very
> low. Also the file_handle returned by name_to_handle_at() for cgroupfs
> returns the inode ID which gives confidence to the claim of low chance
> of ID reusing.

Ah, for some reason I remembered it using idr_alloc(), but you're right,
it does use cyclical IDs. Even so, tracepoints which expose these IDs
would still be difficult to use I think.

Say we're trying to collect a histogram of lock latencies over the
course of some test we're running. At the end, we want to produce some
kind of human-readable report.

cgroups may come and go throughout the test. Even if we never re-use
IDs, in order to be able to map all of them to human-readable paths, it
seems like we'd need some background process to poll the
/sys/fs/cgroup/memory directory tree as Greg described, keeping track of
the ID<->path mapping.
This seems expensive, and even if we poll relatively frequently we might
still miss short-lived cgroups.

Trying to aggregate such statistics across physical machines, or reboots
of the same machine, is further complicated. The machine(s) may be
running the same application, which runs in a container with the same
path, but it'll end up with different IDs. So we'd have to collect the
ID<->path mapping from each, and then try to match up the names for
aggregation.
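
For illustration only, here is a minimal sketch of the userspace bookkeeping
described above. It is not part of the patch. It assumes the directory inode
number is the ID a tracepoint would report (as in Greg's find example), and
the names snapshot_cgroup_ids() and merge_by_path() are hypothetical.

```python
import os
from collections import Counter


def snapshot_cgroup_ids(root):
    """Walk a cgroup hierarchy once and return {inode number: relative path}.

    Roughly what `find <root> -type d -printf '%i %P\n'` collects. Assumes
    the directory inode number is the ID reported by the tracepoint.
    """
    id_to_path = {}
    for dirpath, _dirnames, _filenames in os.walk(root):
        try:
            inode = os.stat(dirpath).st_ino
        except FileNotFoundError:
            continue  # cgroup was removed between the walk and the stat
        id_to_path[inode] = os.path.relpath(dirpath, root)
    return id_to_path


def merge_by_path(per_machine):
    """Combine ID-keyed counts from several machines into path-keyed counts.

    per_machine is a list of (id_to_path, counts_by_id) pairs, one per
    machine or boot. IDs that appear in the counts but not in that
    machine's snapshot (e.g. short-lived cgroups the poller missed) are
    lumped together under "<unknown>".
    """
    merged = Counter()
    for id_to_path, counts_by_id in per_machine:
        for cgroup_id, count in counts_by_id.items():
            merged[id_to_path.get(cgroup_id, "<unknown>")] += count
    return merged
```

A real collector would have to call snapshot_cgroup_ids() repeatedly for the
whole duration of the test and keep the union of the results. Anything created
and destroyed between two polls still ends up in the "<unknown>" bucket, and
every machine has to ship its own snapshot alongside its counts before
merge_by_path() can line the names up.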