References: <20201130233504.3725241-1-axelrasmussen@google.com>
From: Axel Rasmussen
Date: Tue, 1 Dec 2020 11:13:43 -0800
Subject: Re: [PATCH] mm: mmap_lock: fix use-after-free race and css ref leak in tracepoints
To: Shakeel Butt
Cc: Greg Thelen, Andrew Morton, Chinwen Chang, Daniel Jordan,
    David Rientjes, Davidlohr Bueso, Ingo Molnar, Jann Horn,
    Laurent Dufour, Michel Lespinasse, Stephen Rothwell, Steven Rostedt,
    Vlastimil Babka, Yafang Shao, "David S. Miller", dsahern@kernel.org,
    Greg Kroah-Hartman, Jakub Kicinski, liuhangbin@gmail.com, Tejun Heo,
    LKML, Linux MM
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Dec 1, 2020 at 10:42 AM Shakeel Butt wrote:
>
> On Tue, Dec 1, 2020 at 9:56 AM Greg Thelen wrote:
> >
> > Axel Rasmussen wrote:
> >
> > > On Mon, Nov 30, 2020 at 5:34 PM Shakeel Butt wrote:
> > >>
> > >> On Mon, Nov 30, 2020 at 3:43 PM Axel Rasmussen wrote:
> > >> >
> > >> > syzbot reported[1] a use-after-free introduced in 0f818c4bc1f3. The bug
> > >> > is that an ongoing trace event might race with the tracepoint being
> > >> > disabled (and therefore the _unreg() callback being called). Consider
> > >> > this ordering:
> > >> >
> > >> > T1: trace event fires, get_mm_memcg_path() is called
> > >> > T1: get_memcg_path_buf() returns a buffer pointer
> > >> > T2: trace_mmap_lock_unreg() is called, buffers are freed
> > >> > T1: cgroup_path() is called with the now-freed buffer
> > >>
> > >> Any reason to use the cgroup_path instead of the cgroup_ino? There are
> > >> other examples of trace points using cgroup_ino and no need to
> > >> allocate buffers. Also cgroup namespace might complicate the path
> > >> usage.
> > >
> > > Hmm, so in general I would love to use a numeric identifier instead of a string.
> > >
> > > I did some reading, and it looks like the cgroup_ino() mainly has to
> > > do with writeback, instead of being just a general identifier?
> > > https://www.kernel.org/doc/Documentation/cgroup-v2.txt
> > I think you are confusing cgroup inodes with real filesystem inodes in that doc.
> >
> >
> > > There is cgroup_id() which I think is almost what I'd want, but there
> > > are a couple problems with it:
> > >
> > > - I don't know of a way for userspace to translate IDs -> paths, to
> > > make them human readable?
> >
> > The id => name map can be built from user space with a tree walk.
> > Example:
> >
> > $ find /sys/fs/cgroup/memory -type d -printf '%i %P\n' # ~ [main]
> > 20387 init.scope
> > 31 system.slice
> >
> > > - Also I think the ID implementation we use for this is "dense",
> > > meaning if a cgroup is removed, its ID is likely to be quickly reused.
> > >
>
> The ID for cgroup nodes (underlying it is kernfs) are allocated from
> idr_alloc_cyclic() which gives new ID after the last allocated ID and
> wrap after around INT_MAX IDs. So, likeliness of repetition is very
> low. Also the file_handle returned by name_to_handle_at() for cgroupfs
> returns the inode ID which gives confidence to the claim of low chance
> of ID reusing.

Ah, for some reason I remembered it using idr_alloc(), but you're
right, it does use cyclical IDs.

Even so, tracepoints which expose these IDs would still be difficult
to use I think.

Say we're trying to collect a histogram of lock latencies over the
course of some test we're running. At the end, we want to produce some
kind of human-readable report.

cgroups may come and go throughout the test. Even if we never re-use
IDs, in order to be able to map all of them to human-readable paths,
it seems like we'd need some background process to poll the
/sys/fs/cgroup/memory directory tree as Greg described, keeping track
of the ID<->path mapping. This seems expensive, and even if we poll
relatively frequently we might still miss short-lived cgroups.

Trying to aggregate such statistics across physical machines, or
reboots of the same machine, is further complicated. The machine(s)
may be running the same application, which runs in a container with
the same path, but it'll end up with different IDs. So we'd have to
collect the ID<->path mapping from each, and then try to match up the
names for aggregation.
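
To make that bookkeeping concrete, here is a rough userspace sketch in
Python (not part of the patch). It assumes, following Greg's find
example, that the inode number of a cgroup directory is the ID a
tracepoint would report. The names snapshot_ids(), merge_by_path() and
the "<unknown>" bucket are made up for illustration.

```python
import os
from collections import Counter


def snapshot_ids(root):
    """Return {inode number: path relative to root} for every directory.

    Roughly what `find ROOT -type d -printf '%i %P\\n'` prints. Assumption:
    the directory inode number is the cgroup ID seen in trace output.
    """
    mapping = {}
    for dirpath, _dirnames, _filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        try:
            ino = os.stat(dirpath).st_ino
        except FileNotFoundError:
            continue  # the cgroup was removed while we were walking
        mapping["" if rel == "." else rel] = ino
    # Invert to ID -> path; one snapshot has one path per inode.
    return {ino: path for path, ino in mapping.items()}


def merge_by_path(per_machine):
    """Combine ID-keyed counts from several machines (or boots) by path.

    per_machine is an iterable of (id_to_path, id_to_count) pairs. IDs that
    were never seen by a snapshot (e.g. short-lived cgroups) cannot be
    named, so they are lumped under the hypothetical "<unknown>" key.
    """
    totals = Counter()
    for id_to_path, counts in per_machine:
        for cg_id, n in counts.items():
            totals[id_to_path.get(cg_id, "<unknown>")] += n
    return totals
```

A poller would call snapshot_ids("/sys/fs/cgroup/memory") periodically
and keep the union of the results, then hand one accumulated map per
machine to merge_by_path(). Any cgroup that is created and removed
between two polls ends up in "<unknown>", which is the gap described
above.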