From: Alexei Starovoitov
Date: Tue, 15 Mar 2022 12:03:04 -0700
Subject: Re: [PATCH bpf-next v1 1/9] bpf: Add mkdir, rmdir, unlink syscalls for prog_bpf_syscall
To: Hao Luo
Cc: Al Viro, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Shakeel Butt, Joe Burton, Tejun Heo, Josh Don, Stanislav Fomichev, bpf, LKML
References: <20220225234339.2386398-1-haoluo@google.com> <20220225234339.2386398-2-haoluo@google.com>

On Tue, Mar 15, 2022 at 11:59 AM Alexei Starovoitov wrote:
>
> On Tue, Mar 15, 2022 at 10:27 AM Hao Luo wrote:
> >
> > On Mon, Mar 14, 2022 at 4:12 PM Al Viro wrote:
> > >
> > > On Mon, Mar 14, 2022 at 10:07:31AM -0700, Hao Luo wrote:
> > > > Hello Al,
> > > >
> > > > > In which contexts can those be called?
> > > > >
> > > >
> > > > In a sleepable context. The plan is to introduce a certain tracepoints
> > > > as sleepable, a program that attaches to sleepable tracepoints is
> > > > allowed to call these functions. In particular, the first sleepable
> > > > tracepoint introduced in this patchset is one at the end of
> > > > cgroup_mkdir(). Do you have any advices?
> > > >
> > > Yes - don't do it, unless you really want a lot of user-triggerable
> > > deadlocks.
> > >
> > > Pathname resolution is not locking-agnostic. In particular, you can't
> > > do it if you are under any ->i_rwsem, whether it's shared or exclusive.
> > > That includes cgroup_mkdir() callchains. And if the pathname passed
> > > to these functions will have you walk through the parent directory,
> > > you would get screwed (e.g. if the next component happens to be
> > > inexistent, triggering a lookup, which takes ->i_rwsem shared).
> >
> > I'm thinking of two options, let's see if either can work out:
> >
> > Option 1: We can put restrictions on the pathname passed into this
> > helper. We can explicitly require the parameter dirfd to be in bpffs
> > (we can verify). In addition, we check pathname to be not containing
> > any dot or dotdot, so the resolved path will end up inside bpffs,
> > therefore won't take ->i_rwsem that is in the callchain of
> > cgroup_mkdir().
> >
> > Option 2: We can avoid pathname resolution entirely. Like above, we
> > can adjust the semantics of this helper to be: making an immediate
> > directory under the dirfd passed in. In particular, like above, we can
> > enforce the dirfd to be in bpffs and pathname to consist of only
> > alphabet and numbers. With these restrictions, we call vfs_mkdir() to
> > create directories.
> >
> > Being able to mkdir from bpf has useful use cases, let's try to make
> > it happen even with many limitations.
>
> Option 3. delegate vfs_mkdir to a worker and wait in the helper.

I meant _don't_ wait, of course.
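
For reference, the name restriction described in Option 2 above (a single component made of letters and digits only) could be checked roughly like this. This is a userspace sketch only, not code from the patchset: the function name name_is_simple() and the SIMPLE_NAME_MAX limit are made up for illustration, and it does not address the dirfd-in-bpffs check or any of the locking concerns raised in the thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical length limit, chosen for illustration only. */
#define SIMPLE_NAME_MAX 255

/*
 * Return true if name is a non-empty string of at most SIMPLE_NAME_MAX
 * ASCII letters and digits. Because '/' and '.' are rejected, such a
 * name can only denote an immediate child of the directory it is
 * created in; ".", ".." and multi-component paths never pass.
 */
static bool name_is_simple(const char *name)
{
	size_t len;

	if (name == NULL || name[0] == '\0')
		return false;

	for (len = 0; name[len] != '\0'; len++) {
		char c = name[len];
		bool is_alpha = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
		bool is_digit = (c >= '0' && c <= '9');

		/* Reject over-long names and any non-alphanumeric byte. */
		if (len >= SIMPLE_NAME_MAX || !(is_alpha || is_digit))
			return false;
	}
	return true;
}
```

With this check, a name such as "cgroup42" is accepted, while "", ".", "..", "a/b" and "a.b" are all rejected, so the caller never needs to walk a path.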