From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-11.4 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E04ACC4727E for ; Thu, 1 Oct 2020 16:06:27 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 94361207FB for ; Thu, 1 Oct 2020 16:06:27 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="v0zUtKZZ" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732672AbgJAQF3 (ORCPT ); Thu, 1 Oct 2020 12:05:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44524 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732380AbgJAQF2 (ORCPT ); Thu, 1 Oct 2020 12:05:28 -0400 Received: from mail-ej1-x642.google.com (mail-ej1-x642.google.com [IPv6:2a00:1450:4864:20::642]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3E459C0613E2 for ; Thu, 1 Oct 2020 09:05:28 -0700 (PDT) Received: by mail-ej1-x642.google.com with SMTP id md26so3208976ejb.10 for ; Thu, 01 Oct 2020 09:05:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=IMB73dDKkouoCGF4bPiVkzHfDrWjLelK7eqM7hnqOfQ=; b=v0zUtKZZ9u7C+SlsQ2sjJJXn5TQDdDxiCusGcV7Bneg7l6BBK2FRvU6tKaf6cwbkZr Bxr4HqFic8ETbL+6s6FkrDBJ6RrJRdxI5e6jH9HqzuAdekZiVBeBfcE+BsIOQYPibWjH 5wDBH5QUEc3dmwkTqAvAlue2C4c55WjFjAt60tuobqpclbLDVXzW3t/vv/ks3GMmLmS3 6Ce5Q9qnrrxYVWmqMxqGZedF4dz75mbs5iMjV4WMHNXLA/xqRf88EKEmLm0XEz+E9Fl7 PyFGoySwCEvboCSlpTknQ/0twIK7Hhr3NtN64G/bzOO0hO0Q5LjfZQ38IYFXieNRFg9b ZFXQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=IMB73dDKkouoCGF4bPiVkzHfDrWjLelK7eqM7hnqOfQ=; b=TH/PlvhsqKVE7P8ocNyHvt67vcmGy0uf79Yelqey6rrHoNSBT70GuYtvLftGJJU7xr Kpb/qGn4RBDQHd2mqg/8RG3mW4a47pjZ2HGHeS1j1KKVpB9fkQkaEp91jIIpDFcq3LgV zrV3lzu0/QV163iKC4O8CTkZKJQvp/QGKzdIYqCPWz/SvxOi5M+++eWac0wAw8CCkNQz rQAp0X9OmmQ8nvjszJBIySJWFq9ccsko6k74as2rGQNwIRVonP/GardVmE4IQPmzg8va qvO0QkKuCfsAD4e0Ws96F+1Un0Mmr6NpaUy/HglirQWhfzfr85W6bM8QLyFyiGOWpq5r qf/w== X-Gm-Message-State: AOAM533s9voWicjzHIBEi2vu1+HJViPkl6t6qBHzk+TYnkNYykDqsCaE 0DVZ3Lx3lObIdOf7+Lzk0dCZFmzQxPDhE8l++UCRkQ== X-Google-Smtp-Source: ABdhPJxKtttAvVo28q3xfu3pZ2Zdcn3tId/kcpgfnRwxqrJNDHKqiBhxtoN0zCN6pGxjHl75UiIS9gSFsZdo/e98VLw= X-Received: by 2002:a17:906:33c8:: with SMTP id w8mr4247020eja.233.1601568326659; Thu, 01 Oct 2020 09:05:26 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Jann Horn Date: Thu, 1 Oct 2020 18:05:00 +0200 Message-ID: Subject: Re: [PATCH v3 seccomp 5/5] seccomp/cache: Report cache data through /proc/pid/seccomp_cache To: YiFei Zhu Cc: Linux Containers , YiFei Zhu , bpf , kernel list , Aleksa Sarai , Andrea Arcangeli , Andy Lutomirski , David Laight , Dimitrios Skarlatos , Giuseppe Scrivano , Hubertus Franke , Jack Chen , Josep Torrellas , Kees Cook , Tianyin Xu , Tobin Feldman-Fitzthum , Tycho Andersen , Valentin Rothberg , Will Drewry Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Oct 1, 2020 at 2:06 PM YiFei Zhu wrote: > On Wed, Sep 30, 2020 at 5:01 PM Jann Horn wrote: > > Hmm, this won't work, because the task could be exiting, and seccomp > > filters are detached in release_task() (using > > seccomp_filter_release()). And at the moment, seccomp_filter_release() > > just locklessly NULLs out the tsk->seccomp.filter pointer and drops > > the reference. > > > > The locking here is kind of gross, but basically I think you can > > change this code to use lock_task_sighand() / unlock_task_sighand() > > (see the other examples in fs/proc/base.c), and bail out if > > lock_task_sighand() returns NULL. And in seccomp_filter_release(), add > > something like this: > > > > /* We are effectively holding the siglock by not having any sighand. */ > > WARN_ON(tsk->sighand != NULL); > > Ah thanks. I was thinking about how tasks exit and get freed and that > sort of stuff, and how this would race against them. The last time I > worked with procfs there was some magic going on that I could not > figure out, so I was thinking if some magic will stop the task_struct > from being released, considering it's an argument here. > > I just looked at release_task and related functions; looks like it > will, at the end, decrease the reference count of the task_struct. > Does procfs increase the refcount while calling the procfs functions? > Hence, in procfs functions one can rely on the task_struct still being > a task_struct, but any direct effects of release_task may happen while > the procfs functions are running? Yeah. The ONE() entry you're adding to tgid_base_stuff is used to help instantiate a "struct inode" when someone looks up the path "/proc/$tgid/seccomp_cache"; then when that path is opened, a "struct file" is created that holds a reference to the inode; and while that file exists, your proc_pid_seccomp_cache() can be invoked. proc_pid_seccomp_cache() is invoked from proc_single_show() ("PROC_I(inode)->op.proc_show" is proc_pid_seccomp_cache), and proc_single_show() obtains a temporary reference to the task_struct using get_pid_task() on a "struct pid" and drops that reference afterwards with put_task_struct(). The "struct pid" is obtained from the "struct proc_inode", which is essentially a subclass of "struct inode". The "struct pid" is kept refererenced until the inode goes away, via proc_pid_evict_inode(), called by proc_evict_inode(). By looking at put_task_struct() and its callees, you can figure out which parts of the "struct task" are kept alive by the reference to it. By the way, maybe it'd make sense to add this to tid_base_stuff as well? That should just be one extra line of code. Seccomp filters are technically per-thread, so it would make sense to have them visible in the per-thread subdirectories /proc/$pid/task/$tid/.