From: ebiederm@xmission.com (Eric W. Biederman)
To: Al Viro
Cc: Tetsuo Handa, linux-fsdevel@vger.kernel.org
References: <20200708142409.8965-1-penguin-kernel@I-love.SAKURA.ne.jp> <1596027885-4730-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp> <20200910035750.GX1236603@ZenIV.linux.org.uk> <20200910112524.GY1236603@ZenIV.linux.org.uk>
Date: Thu, 10 Sep 2020 15:06:34 -0500
In-Reply-To: <20200910112524.GY1236603@ZenIV.linux.org.uk> (Al Viro's message of "Thu, 10 Sep 2020 12:25:24 +0100")
Message-ID: <878sdh5rcl.fsf@x220.int.ebiederm.org>
Subject: Re: [PATCH v2] fput: Allow calling __fput_sync() from !PF_KTHREAD thread.

Al Viro writes:

> On Thu, Sep 10, 2020 at 02:26:46PM +0900, Tetsuo Handa wrote:
>> Thank you for responding. I'm also waiting for your response on
>> "[RFC PATCH] pipe: make pipe_release() deferrable." at
>> https://lore.kernel.org/linux-fsdevel/7ba35ca4-13c1-caa3-0655-50d328304462@i-love.sakura.ne.jp/
>> and "[PATCH] splice: fix premature end of input detection" at
>> https://lore.kernel.org/linux-block/cf26a57e-01f4-32a9-0b2c-9102bffe76b2@i-love.sakura.ne.jp/ .
>>
>> >
>> > NAK. The reason to defer is *NOT* to bypass that BUG_ON() - we really do not
>> > want that thing done on anything other than extremely shallow stack.
>> > Incidentally, why is that thing ever done _not_ in a kernel thread context?
>>
>> What does "that thing" refer to? acct_pin_kill() ?
>> blob_to_mnt() ?
>> I don't know the reason because I'm not the author of these functions.
>
> The latter. What I mean, why not simply do that from inside of
> fork_usermode_driver()?

Because that is a stupid place to do the work. The usermode driver is
currently allowed to die and be respawned by the kernel when needed.
Which means there is not a 1 to 1 relationship between blob_to_mnt and
fork_usermode_driver.

As for the current code being racy, it is approximately as racy as the
current code to load files from an initrd. AKA no one has ever observed
any problems in practice but if you squint you can see where maybe
something could happen.

I think there is a stronger argument for finding a way to guarantee
that flush_delayed_fput will wait until any scheduled delayed_fput_work
has completed, as that is the race Tetsuo is complaining about, and it
also appears to be present in populate_rootfs. Flushing the fput is
needed to ensure the writable struct file is completely gone before an
exec opens the file and calls deny_write_access.

> umd_setup is stored in sub_info->init and
> eventually called from call_usermodehelper_exec_async(), right before
> the created kernel thread is about to call kernel_execve() and stop
> being a kernel thread...

I think you are suggesting calling __fput_sync in umd_setup instead of
calling fput from blob_to_mnt. To have a special case that only
applies the first time a function is called is possible but it is
awkward, and likely more error prone.

I moved all of the user mode driver code out of exec and out of the
user mode helper code as the user mode driver code is essentially
unused at present. The bpf folks really want to try and make it work so
I wrote something that is not completely insane so they can have their
chance to try.

I really suspect it will go the way of all of the migration of the
early kernel init code to userspace with klibc, with the practical
details overwhelming things and making it not work or not worth it in
practice. Time will tell.

I hope that is enough context to understand what is going on there.

Eric
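
The flush race discussed in this message can be illustrated outside the
kernel. What follows is a minimal single-threaded Python model, not
kernel code. The class DelayedFput, its methods, and the helpers
open_for_write() and run() are hypothetical names invented for this
sketch, and it assumes that the worker detaches the whole pending queue
before it performs the closes. Only the names flush_delayed_fput,
delayed_fput_work and deny_write_access come from the message; the
flush_and_wait() variant is just one possible reading of "wait until
scheduled work has completed", not a proposed patch.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Inode:
    """Toy stand-in for an inode: only counts open writers."""
    writers: int = 0


@dataclass
class File:
    """Toy stand-in for a writable struct file."""
    inode: Inode

    def final_put(self):
        # Models the last step of a close: the writer count is dropped.
        self.inode.writers -= 1


def open_for_write(inode):
    """Open the toy inode for writing and return the toy file."""
    inode.writers += 1
    return File(inode)


class DelayedFput:
    """Toy model of a deferred-close queue drained by a worker."""

    def __init__(self):
        self.pending = deque()  # files queued for a deferred close
        self.in_flight = []     # files a worker has claimed but not yet closed

    def fput(self, f):
        # Defer the close instead of doing it immediately.
        self.pending.append(f)

    def worker_claim(self):
        # Assumption: the worker detaches the whole queue in one step...
        self.in_flight.extend(self.pending)
        self.pending.clear()

    def worker_finish(self):
        # ...and only later performs the actual closes.
        for f in self.in_flight:
            f.final_put()
        self.in_flight.clear()

    def flush(self):
        # Drains only what is still queued; claimed work is not waited for.
        while self.pending:
            self.pending.popleft().final_put()

    def flush_and_wait(self):
        # Hypothetical stronger flush. In this single-threaded model,
        # "waiting" is represented by running the claimed work to completion.
        self.flush()
        self.worker_finish()


def deny_write_access(inode):
    """Return True if an exec may proceed, False if a writer still exists."""
    return inode.writers == 0


def run(flush_name):
    """Replay the interleaving: worker claims the queue, then a flush runs."""
    inode = Inode()
    queue = DelayedFput()
    queue.fput(open_for_write(inode))
    queue.worker_claim()          # worker holds the file but has not closed it
    getattr(queue, flush_name)()  # caller flushes before the exec
    return deny_write_access(inode)
```

In this model run("flush") returns False, because the flush finds an
empty queue while the claimed file still counts as a writer, whereas
run("flush_and_wait") returns True. The sketch replays one fixed
interleaving and says nothing about how often the race would be hit in
practice.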