From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.3 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 94C8BC433FE for ; Fri, 4 Dec 2020 17:48:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5FD3B22B40 for ; Fri, 4 Dec 2020 17:48:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729645AbgLDRrv (ORCPT ); Fri, 4 Dec 2020 12:47:51 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39996 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726602AbgLDRru (ORCPT ); Fri, 4 Dec 2020 12:47:50 -0500 Received: from mail-pl1-x642.google.com (mail-pl1-x642.google.com [IPv6:2607:f8b0:4864:20::642]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9F138C061A53 for ; Fri, 4 Dec 2020 09:47:10 -0800 (PST) Received: by mail-pl1-x642.google.com with SMTP id v3so3492015plz.13 for ; Fri, 04 Dec 2020 09:47:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=YRl4xaWPv7d+f+XRgORxuMpHebNtXiM6WlyURXxB4l4=; b=EwITWSd3NyGz01qhAhX9VCQTFEHJcet65JIE+86ZwqIcKRmclA67uCt4bUJt7MbVxI /9SndBkwFsLI346sBttx9KU4tpfcep2/tCg04NApiiMhRyV6XYiOu2BxJ+VJFN9u+6k2 gr7T8qOpGnYqkZ4QGn6TdrgeeKqVmAmb/GsvhsAgnYcCGnqXZSY7uujDKzmlaFAl1aRl Co5J+/K9fY+rsV9Q6RX6MxJWMmfXuYnm3l9fs7xe5FdgMPWzyfh+chMg61OkUw9wN7vE zodmxdGHseiPAKyTVjx+k1oLGlxR4I2KFwJcwE+UggFxVySBjssijUgGUoJytYJPEe27 66Uw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=YRl4xaWPv7d+f+XRgORxuMpHebNtXiM6WlyURXxB4l4=; b=hpMKd95S9A3wi2pk3QQFUirQG3ViKYV6vykwAFSvFzRUeyYTpuBkjapPc/hlCJ3mBa xMh/mJCEAHdNRbDulNDNfiQfZuWdo/3Y9RvAwx7/2GtE8QdMV+cYG/RjWV1csGUh7iF9 LqYa87MXrEc5VaFNGfiZat7ujjrLPHlGvTPDk4FuM0KWeAgqp0XfXytJpFnatNF6AYSF K873HUTVonCPAtNFIom2Oa/aYmkpVwumu7ZrZtQ6jy/nB9ku6bTV4qiGKpCJ/mtaeJ4Z CMNEoPRdh7iYaMCpQRMBUoY9kvf98AT8+kRS6C6H/cSgiPY7AJ+1RdNDNlu1iDRfRTTG /kbA== X-Gm-Message-State: AOAM532LHNTAOSlXahwLdjhlKdMPb65KeAjAEQjKuNqW12/ZO9Zt/4+y wzOz9TVnFan4Zfo1MzRMjYup0o4eKXwe5colD3UI6A== X-Google-Smtp-Source: ABdhPJyjzl/QPPAowJCuEkNFLALdZaDvYL4FurWcvSIVnN4uaZY5O1XjpO5DVYiFSfrCCdfk/z94YhQ6D3TP8DrUM3k= X-Received: by 2002:a17:90b:1287:: with SMTP id fw7mr5184713pjb.52.1607104029937; Fri, 04 Dec 2020 09:47:09 -0800 (PST) MIME-Version: 1.0 References: <20201130233504.3725241-1-axelrasmussen@google.com> <20201201200715.6171d39b@oasis.local.home> <1eb44e95-1fae-5d64-d114-d305c9b8ef63@suse.cz> In-Reply-To: <1eb44e95-1fae-5d64-d114-d305c9b8ef63@suse.cz> From: Axel Rasmussen Date: Fri, 4 Dec 2020 09:46:33 -0800 Message-ID: Subject: Re: [PATCH] mm: mmap_lock: fix use-after-free race and css ref leak in tracepoints To: Vlastimil Babka Cc: Shakeel Butt , Steven Rostedt , Tejun Heo , Greg Thelen , Andrew Morton , Chinwen Chang , Daniel Jordan , David Rientjes , Davidlohr Bueso , Ingo Molnar , Jann Horn , Laurent Dufour , Michel Lespinasse , Stephen Rothwell , Yafang Shao , "David S . Miller" , dsahern@kernel.org, Greg Kroah-Hartman , Jakub Kicinski , liuhangbin@gmail.com, LKML , Linux MM Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Dec 4, 2020 at 8:36 AM Vlastimil Babka wrote: > > On 12/2/20 2:11 AM, Shakeel Butt wrote: > > On Tue, Dec 1, 2020 at 5:07 PM Steven Rostedt wrote: > >> > >> On Tue, 1 Dec 2020 16:36:32 -0800 > >> Shakeel Butt wrote: > >> > >> > SGTM but note that usually Andrew squash all the patches into one > >> > before sending to Linus. If you plan to replace the path buffer with > >> > integer IDs then no need to spend time fixing buffer related bug. > >> > >> I don't think Andrew squashes all the patches. I believe he sends Linus > >> a patch series. > > > > I am talking about the patch and the following fixes to that patch. > > Those are usually squashed into one patch. > > Yeah, if there's a way forward that doesn't need to construct full path on each > event and the associated complexity and just use an ID, let's convert to the ID > and squash it, for less churn. Especially if there are other existing > tracepoints that use the ID. The thing I worry about is that it being inconvenient will prevent folks from using it to send us data on how mmap_lock behaves under their workloads. Anecdotally, I talked to a few teams within Google, and it seems those using containers use paths instead of IDs to refer to them, and they don't have existing infrastructure to keep track of the mapping. So to collect data from those workloads, we'd have to wait for them to build such a thing, vs. just asking them to run a bash one-liner. I haven't done a wider survey, but I suspect the same is true for users of e.g. Docker or Kubernetes. > > If there's further (somewhat orthogonal) work to make the IDs easier for > userspace, it can be added on top later, but really shouldn't need to add the > current complex solution only to remove it later? I'm on board with Shakeel's idea to export this on cgroup creation/deletion instead, since it lets us keep the complexity in exactly one place. I think such a thing would use the same code I'm proposing now, though, so we wouldn't just be deleting it if we merge it. Also, before doing that I think it's worth identifying a second user within the kernel (maybe the writeback tracepoint mentioned earlier in the thread?), so doing it incrementally seems reasonable to me. This plan is also in-line with existing stuff we provide. Histogram triggers provide utilities like ".execname" (PID -> execname) or ".sym" (address -> symbol) for convenience. Sure, userspace could figure these things out for itself, but it's convenient to provide them directly, and it's not so bad since we have exactly one place in the tracing infrastructure that knows how to do these translations. Actually, maybe a ".cgpath" variant would be better than a separate tracepoint. I haven't looked at what either approach would require in detail; maybe another reason to do this iteratively. :) > > Thanks, > Vlastimil From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.3 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 50340C4361A for ; Fri, 4 Dec 2020 17:47:13 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id C525122A83 for ; Fri, 4 Dec 2020 17:47:12 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C525122A83 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 3F5506B005D; Fri, 4 Dec 2020 12:47:12 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 3A8176B006E; Fri, 4 Dec 2020 12:47:12 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2BC816B0070; Fri, 4 Dec 2020 12:47:12 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0209.hostedemail.com [216.40.44.209]) by kanga.kvack.org (Postfix) with ESMTP id 14D306B005D for ; Fri, 4 Dec 2020 12:47:12 -0500 (EST) Received: from smtpin17.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id D0610824999B for ; Fri, 4 Dec 2020 17:47:11 +0000 (UTC) X-FDA: 77556331062.17.sort24_040c9e9273c6 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin17.hostedemail.com (Postfix) with ESMTP id A73EA180D0185 for ; Fri, 4 Dec 2020 17:47:11 +0000 (UTC) X-HE-Tag: sort24_040c9e9273c6 X-Filterd-Recvd-Size: 7029 Received: from mail-pl1-f194.google.com (mail-pl1-f194.google.com [209.85.214.194]) by imf48.hostedemail.com (Postfix) with ESMTP for ; Fri, 4 Dec 2020 17:47:11 +0000 (UTC) Received: by mail-pl1-f194.google.com with SMTP id y10so2052893plr.10 for ; Fri, 04 Dec 2020 09:47:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=YRl4xaWPv7d+f+XRgORxuMpHebNtXiM6WlyURXxB4l4=; b=EwITWSd3NyGz01qhAhX9VCQTFEHJcet65JIE+86ZwqIcKRmclA67uCt4bUJt7MbVxI /9SndBkwFsLI346sBttx9KU4tpfcep2/tCg04NApiiMhRyV6XYiOu2BxJ+VJFN9u+6k2 gr7T8qOpGnYqkZ4QGn6TdrgeeKqVmAmb/GsvhsAgnYcCGnqXZSY7uujDKzmlaFAl1aRl Co5J+/K9fY+rsV9Q6RX6MxJWMmfXuYnm3l9fs7xe5FdgMPWzyfh+chMg61OkUw9wN7vE zodmxdGHseiPAKyTVjx+k1oLGlxR4I2KFwJcwE+UggFxVySBjssijUgGUoJytYJPEe27 66Uw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=YRl4xaWPv7d+f+XRgORxuMpHebNtXiM6WlyURXxB4l4=; b=hR9JwuAbTTaWLfT9DnCnQZuF+hkGyxycjSSjCVRh9njmUgIeLvYm+gqHuZb2dh+Tie +a0UjdBxavX/bfLPen4eOuLatPAuUhjFm33yCgy2uHO/v85M9omGg7z35kdLEWotqwVT tVM++Bm/BrQ11Yik6XQfgeNXUFER80oMtr5VorBldZPbrFettNewGBwwHEMxlGESPPaR hLH2I7pSB/xVBN5eMxZHGwmzwkkF8uaErgrWSX7LqNaOUry61Y2V0y0UkQUnDfC07R9N t7V4E85HpUUu+F0AWAiNhYu52Ng634zUzryYaZMCtU7aFKk6RYdh3gxaeY92mus5qOeB eYTA== X-Gm-Message-State: AOAM530h5Lwpz5OzQWkjsSXSz3Yn8JnJOxijSzlq6zSQs93ngr5HfULn 8ghKNddGfnK/n/S8dBzKUWjlOmCOcJa0p/fTxIZj7A== X-Google-Smtp-Source: ABdhPJyjzl/QPPAowJCuEkNFLALdZaDvYL4FurWcvSIVnN4uaZY5O1XjpO5DVYiFSfrCCdfk/z94YhQ6D3TP8DrUM3k= X-Received: by 2002:a17:90b:1287:: with SMTP id fw7mr5184713pjb.52.1607104029937; Fri, 04 Dec 2020 09:47:09 -0800 (PST) MIME-Version: 1.0 References: <20201130233504.3725241-1-axelrasmussen@google.com> <20201201200715.6171d39b@oasis.local.home> <1eb44e95-1fae-5d64-d114-d305c9b8ef63@suse.cz> In-Reply-To: <1eb44e95-1fae-5d64-d114-d305c9b8ef63@suse.cz> From: Axel Rasmussen Date: Fri, 4 Dec 2020 09:46:33 -0800 Message-ID: Subject: Re: [PATCH] mm: mmap_lock: fix use-after-free race and css ref leak in tracepoints To: Vlastimil Babka Cc: Shakeel Butt , Steven Rostedt , Tejun Heo , Greg Thelen , Andrew Morton , Chinwen Chang , Daniel Jordan , David Rientjes , Davidlohr Bueso , Ingo Molnar , Jann Horn , Laurent Dufour , Michel Lespinasse , Stephen Rothwell , Yafang Shao , "David S . Miller" , dsahern@kernel.org, Greg Kroah-Hartman , Jakub Kicinski , liuhangbin@gmail.com, LKML , Linux MM Content-Type: text/plain; charset="UTF-8" X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Fri, Dec 4, 2020 at 8:36 AM Vlastimil Babka wrote: > > On 12/2/20 2:11 AM, Shakeel Butt wrote: > > On Tue, Dec 1, 2020 at 5:07 PM Steven Rostedt wrote: > >> > >> On Tue, 1 Dec 2020 16:36:32 -0800 > >> Shakeel Butt wrote: > >> > >> > SGTM but note that usually Andrew squash all the patches into one > >> > before sending to Linus. If you plan to replace the path buffer with > >> > integer IDs then no need to spend time fixing buffer related bug. > >> > >> I don't think Andrew squashes all the patches. I believe he sends Linus > >> a patch series. > > > > I am talking about the patch and the following fixes to that patch. > > Those are usually squashed into one patch. > > Yeah, if there's a way forward that doesn't need to construct full path on each > event and the associated complexity and just use an ID, let's convert to the ID > and squash it, for less churn. Especially if there are other existing > tracepoints that use the ID. The thing I worry about is that it being inconvenient will prevent folks from using it to send us data on how mmap_lock behaves under their workloads. Anecdotally, I talked to a few teams within Google, and it seems those using containers use paths instead of IDs to refer to them, and they don't have existing infrastructure to keep track of the mapping. So to collect data from those workloads, we'd have to wait for them to build such a thing, vs. just asking them to run a bash one-liner. I haven't done a wider survey, but I suspect the same is true for users of e.g. Docker or Kubernetes. > > If there's further (somewhat orthogonal) work to make the IDs easier for > userspace, it can be added on top later, but really shouldn't need to add the > current complex solution only to remove it later? I'm on board with Shakeel's idea to export this on cgroup creation/deletion instead, since it lets us keep the complexity in exactly one place. I think such a thing would use the same code I'm proposing now, though, so we wouldn't just be deleting it if we merge it. Also, before doing that I think it's worth identifying a second user within the kernel (maybe the writeback tracepoint mentioned earlier in the thread?), so doing it incrementally seems reasonable to me. This plan is also in-line with existing stuff we provide. Histogram triggers provide utilities like ".execname" (PID -> execname) or ".sym" (address -> symbol) for convenience. Sure, userspace could figure these things out for itself, but it's convenient to provide them directly, and it's not so bad since we have exactly one place in the tracing infrastructure that knows how to do these translations. Actually, maybe a ".cgpath" variant would be better than a separate tracepoint. I haven't looked at what either approach would require in detail; maybe another reason to do this iteratively. :) > > Thanks, > Vlastimil