From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A7074C43334 for ; Mon, 4 Jul 2022 19:04:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234607AbiGDTES (ORCPT ); Mon, 4 Jul 2022 15:04:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55130 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234737AbiGDTD7 (ORCPT ); Mon, 4 Jul 2022 15:03:59 -0400 Received: from zeniv.linux.org.uk (zeniv.linux.org.uk [IPv6:2a03:a000:7:0:5054:ff:fe1c:15ff]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 61FEB11C29; Mon, 4 Jul 2022 12:03:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=linux.org.uk; s=zeniv-20220401; h=Sender:In-Reply-To:Content-Type: MIME-Version:References:Message-ID:Subject:Cc:To:From:Date:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=StWcqGItZutY3CkZqZp4EajG2FBl6ewxUbX21EaiIa0=; b=gION5JVUSctjEA3iGVyhgDOwB8 /cIoBlprCjdoAmu+i56RbxnuMuL4IKHr7XgAS37kiaaF4oZ1UP6Cw6APsQOce+xxaXzKDsY66devu sokw+Zjn7Va1VU574s0V9995iYgFWRSsgvDKJYYBEaZM+zFhtFOIEZqa2ed4WycftRArPMeF0UrEC NS3ZibrqrAgjlSoyiT84HEu0f9G07c6zeYA8TeenVF0dYmRU5sfuySzh3JQ3QfHGuHIaEzRGdYXqJ TxAAGr2j4E89bonfbrnO/p8QxTRGrgGkafzSkKhQwMVnXwFLLymD4z2DKEPHZl567ilxQiHg3gQiD yxdElB1Q==; Received: from viro by zeniv.linux.org.uk with local (Exim 4.95 #2 (Red Hat Linux)) id 1o8RLQ-0086lx-NV; Mon, 04 Jul 2022 19:02:52 +0000 Date: Mon, 4 Jul 2022 20:02:52 +0100 From: Al Viro To: Linus Torvalds Cc: Alexander Potapenko , Alexei Starovoitov , Andrew Morton , Andrey Konovalov , Andy Lutomirski , Arnd Bergmann , Borislav Petkov , Christoph Hellwig , Christoph Lameter , David Rientjes , Dmitry Vyukov , Eric Dumazet , Greg Kroah-Hartman , Herbert Xu , Ilya Leoshkevich , Ingo Molnar , Jens Axboe , Joonsoo Kim , Kees Cook , Marco Elver , Mark Rutland , Matthew Wilcox , "Michael S. Tsirkin" , Pekka Enberg , Peter Zijlstra , Petr Mladek , Steven Rostedt , Thomas Gleixner , Vasily Gorbik , Vegard Nossum , Vlastimil Babka , kasan-dev , Linux-MM , linux-arch , Linux Kernel Mailing List , Evgenii Stepanov , Nathan Chancellor , Nick Desaulniers , Segher Boessenkool , Vitaly Buka , linux-toolchains Subject: Re: [PATCH v4 43/45] namei: initialize parameters passed to step_into() Message-ID: References: <20220701142310.2188015-1-glider@google.com> <20220701142310.2188015-44-glider@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: Al Viro Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jul 04, 2022 at 10:36:05AM -0700, Linus Torvalds wrote: > For example, in __follow_mount_rcu(), when we jump to a new mount > point, and that sequence has > > *seqp = read_seqcount_begin(&dentry->d_seq); > > to reset the sequence number to the new path we jumped into. > > But I don't actually see what checks the previous sequence number in > that path. We just reset it to the new one. Theoretically it could be a problem. We have /mnt/foo/bar and /mnt/baz/bar. Something's mounted on /mnt/foo, hiding /mnt/foo/bar. We start a pathwalk for /mnt/baz/bar, someone umounts /mnt/foo and swaps /mnt/foo to /mnt/baz before we get there. We are doomed to get -ECHILD from an attempt to legitimize in the end, no matter what. However, we might get a hard error (-ENOENT, for example) before that, if we pick up the old mount that used to be on top of /mnt/foo (now /mnt/baz) and had been detached before the damn thing had become /mnt/baz and notice that there's no "bar" in its root. It used to be impossible (rename would've failed if the target had been non-empty and had we managed to empty it first, well, there's your point when -ENOENT would've been accurate). With exchange... Yes, it's a possible race. Might need to add if (read_seqretry(&mount_lock, nd->m_seq)) return false; in there. And yes, it's a nice demonstration of how subtle and brittle RCU pathwalk is - nobody noticed this bit of fun back when RENAME_EXCHANGE had been added... It got a lot more readable these days, but... > For __follow_mount_rcu it looks like validating the previous sequence > number is left to the caller, which then does try_to_unlazy_next(). Not really - the caller goes there only if we have __follow_mount_rcu() say "it's too tricky for me, get out of RCU mode and deal with it there". Anyway, I've thrown a mount_lock check in there, running xfstests to see how it goes...