From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-0.8 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 61E24C433E0 for ; Wed, 8 Jul 2020 06:55:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3904E20760 for ; Wed, 8 Jul 2020 06:55:45 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="jqy2ZjZN" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726206AbgGHGzp (ORCPT ); Wed, 8 Jul 2020 02:55:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35696 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726194AbgGHGzo (ORCPT ); Wed, 8 Jul 2020 02:55:44 -0400 Received: from mail-il1-x142.google.com (mail-il1-x142.google.com [IPv6:2607:f8b0:4864:20::142]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 79544C061755 for ; Tue, 7 Jul 2020 23:55:44 -0700 (PDT) Received: by mail-il1-x142.google.com with SMTP id a6so21438576ilq.13 for ; Tue, 07 Jul 2020 23:55:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=r7NYpamvSZwhLmW4eLuWbB/7LiSihL9Y4Yst8qHzeaw=; b=jqy2ZjZNYp2nAz7Ms4z7/xMBFrZI2NXp4yYD3qjCWXrclEl+MtXiiKd/81JXYiUpGw V2UqDjdfwMdqzmvextoZ2txMOLBIuv7B5yqAHOBj9uhk8vrkU06SVNwxHjXkTpJetS1Q uL8C27sC1P47TgsHb1uyL6CSuOW5rxZ5acNvmSPmkbQMJtT5DgR+tBqFSlSdzsLX2S+6 BEwnuUwNWNNPFmKsrnnY61Zw5SkjWUKmiG+HL47F5cHBrG4KdT25touQAA6/2V/O4MFL SeQK11nE6WpngLX8Xi8u8tdX9/k8QS/30e18hF5a2D2EdSs/t+pvYNBW6VifpyfHCIX5 CFSQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=r7NYpamvSZwhLmW4eLuWbB/7LiSihL9Y4Yst8qHzeaw=; b=la7+3NhdS5vkOEd8VrfRBoR3PbadyJJz0npDbabGBkk7V/Y2KpqlaJOMEpgantjbXJ RxzVI2bozAMkeqotlqoQn0qzd+MBB4eh3pI4bsPdsMOfXGQRdLBijRvZa8GhcTgRX7qp QWFQm7cZH5iw0+qGMRBbbEN9sPqfQ9hGxM75XDXrlsgNBDocM+2vLs60NjvHUb2wM+h0 7nAwFhr9e6iTnxuLzNebOoo6bNuHLIjf+i+1Myktn6W00//VoyMe7FfIYX2emz9+hNTj 7qOmQC99v3F2HKEl2y5StbgT7mUhweU5Sm+75mVWgv6wFUUNemLPXIcyqymLjuvgKje2 m5IA== X-Gm-Message-State: AOAM532xsPMwLUhMKxP1zbfvnru7BGuJFhkGRF5QIPBFxixZ2+xALP36 QXiMDm1jJP4q5yZbyuTypLji/HbeuPQrfRoPxWQ= X-Google-Smtp-Source: ABdhPJxh9be9VKVTL45VkmifFR+BVAVbSS59uJi7yDyCPdLd6IiSnif0QyL3oAgRxtCakOf9moRTBFOLV/CESZTWL2Y= X-Received: by 2002:a05:6e02:13e2:: with SMTP id w2mr41024228ilj.9.1594191343714; Tue, 07 Jul 2020 23:55:43 -0700 (PDT) MIME-Version: 1.0 References: <32532923.JtPX5UtSzP@fgdesktop> <106271350.sqX05tTuFB@fgdesktop> <20200707155159.GA48341@redhat.com> <20200707215309.GB48341@redhat.com> In-Reply-To: <20200707215309.GB48341@redhat.com> From: Amir Goldstein Date: Wed, 8 Jul 2020 09:55:32 +0300 Message-ID: Subject: Re: overlayfs: issue with a replaced lower squashfs with export-table To: Vivek Goyal Cc: Miklos Szeredi , overlayfs , Fabian Content-Type: text/plain; charset="UTF-8" Sender: linux-unionfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-unionfs@vger.kernel.org On Wed, Jul 8, 2020 at 12:53 AM Vivek Goyal wrote: > > On Tue, Jul 07, 2020 at 08:41:20PM +0300, Amir Goldstein wrote: > > > > Miklos, > > > > > > > > At first glance I did not understand how changing lower file handles causes > > > > failure to ovl_verify_inode(). > > > > To complete the picture, here is the explanation. > > > > > > > > Upper file A was copied up from lower file with inode 10 in old squashfs > > > > and the "origin" file handle composed of the inode number 10 is recorded > > > > in upper file A. > > > > > > > > With newly formatted lower, lower A has inode 11 and lower B has inode 10. > > > > Upper file B is copied from lower file B with inode 10 in new squashfs and > > > > the "origin" file handle composed of the inode number 10 is recorded > > > > in upper file B. > > > > Now we have two upper files with the same "origin" that are not hardlinks. > > > > > > > > On lookup of both overlay files A and B, ovl_check_origin() decodes lower > > > > file B (inode 10) as the lower inode. > > > > This lower inode is used to get the overlay inode number (10) and as > > > > the key to hash overlay inode in inode cache. > > > > > > > > Suppose A is looked up first and it's inode is hashed. > > > > Then B is looked up and in ovl_get_inode() it finds the inode hashed > > > > by the same lower inode in inode cache, but fails ovl_verify_inode() > > > > because: > > > > d_inode(upperdentry) /* B */ != ovl_inode_upper(inode) /* A */ > > > > > > > > This can also happen when copying overlay layers to a new > > > > fs tree and carrying over the old "origin" xattr. > > > > In practice, the UUID part of the stored "origin" xattr is meant to > > > > protect against decoding lower fh when migrating to another > > > > filesystem, but layers could be migrated inside the same filesystem. > > > > Since squashfs does not have a UUID, re-creating sqhashfs is similar > > > > to migrating layers inside the same filesystem. > > > > > > Hi Amir, > > > > > > So we can't use "origin" if lower layers have been copied. If they > > > have been copied to a different filesystem with uuid we seem to > > > have a mechanism to detect it but otherwise not and we can run > > > into these kind of issues. > > > > > > > Correct. > > > > > My question is, why do we allow copying or updating lower layers > > > with same upper when we know this will break stored origin in > > > upper. > > > > I don't know if we "allow" this. > > So there seem to two cases. One is copying the lower layers to same > filesystem or a different filesystem. And another case is recreating > the lower layers and use previous upper with old upper. IIUC, we > are currently facing problem with second scenario. > > "copying the lower layers" is probably fine as you said because old > tree still keeps that inode number busy and newly created inode > should not acquire that number. > > But in this case looks like we recreated lower and that led to > some file B acquiring an inode number which was used by A. So > this is a different use case. IIUC, simply copying the layer > will not lead to this situation. > > Now question is do we support recreating the lower layer with existing > upper. And I see following text in "Sharing and copying layer" > > "Mounting an overlay using an upper layer path, where the upper layer path > was previously used by another mounted overlay in combination with a > different lower layer path, is allowed, unless the "inodes index" feature > or "metadata only copy up" feature is enabled." > > This seems to suggest that recreating lower layer is allowed as long > as you are not using index or metacopy feature. > > If that's the case, probably "origin" should have been an opt-in > feature or automatically be enabled by some other opt-in feature. > > Yes. "should have been". That is my point. We thought it was a victimless crime to enable "origin" without opt-in and I think we were wrong. > > We never considered the case expect > > for nfs export and index, see overlayfs.rst: > > "When the overlay NFS export feature is enabled, overlay filesystems > > behavior on offline changes of the underlying lower layer is different > > than the behavior when NFS export is disabled. ..." > > > > > Can't I do same thing with ext4/xfs, where I recreate > > > lower layers when inode numbers get exchanged and same problem > > > will happen (despite uuid being same). > > > > > > > Same problem. > > > > > > IOW, how can we support copying layers (with same upper) while origin > > > is in use. Rest of the features are built on top of the assumption > > > that origin is valid. And in case of copying layers, we don't > > > seem to have a sure way to find if origin is valid or not. > > > > > > > With index/nfs_export enabled we at least do: > > /* Verify lower root is upper root origin */ > > and if verification fails we disable the feature. > > So discrepancy is still possible if somebody modifies lower layers > without changing lower root. > > - Copy up file A. > - Unmount overlay > - unlink A > - create B (assume B gets same inode number as A). > - mount overlay; B gets copied up. > > And now we have both upper A and B having same origin despite no > hardlink. Am I understanding it right? > Yes. > Is there a good use case for allowing modifying lower layers with > same upper. Given we are adding more complex features to overlayfs, > will it make sense to not allow modifying lower layer going forward. > We might not be able to detect it but atleast it will be unsupported > configuration. And then we can only focus to provide a work around > for existing use cases. > Agreed. > [..] > > > > > > > > We were aware of the "layer migration" case when designing the > > > > index/nfs_export feature, which is one of the reasons they are > > > > opt-in features. > > > > > > > > But we enabled the functionality of following non-dir origin > > > > unconditionally because we *thought* it is harmless, as the comment > > > > in ovl_lookup() says: > > > > > > > > /* > > > > * Lookup copy up origin by decoding origin file handle. > > > > * We may get a disconnected dentry, which is fine, > > > > * because we only need to hold the origin inode in > > > > * cache and use its inode number. We may even get a > > > > * connected dentry, that is not under any of the lower > > > > * layers root. That is also fine for using it's inode > > > > * number - it's the same as if we held a reference > > > > * to a dentry in lower layer that was moved under us. > > > > */ > > > > > > > > The patch I posted disabled decoding of non-dir origin for the special > > > > case of lower null uuid. > > > > > > > > I think we can also warn and auto-disable decoding non-dir origin in > > > > case index is disabled and we detect this upper inode conflict in > > > > ovl_verify_inode(). > > > > > > > > The problem is if A is not metacopy and looked up first, and B is > > > > metacopy and looked up second, then conflict will be deleted after > > > > the wrong inode has been hashed. > > We don't allow modifying lower layers if metacopy is enabled. > > "For "metadata only copy up" feature there is no verification mechanism at > mount time. So if same upper is mounted with different set of lower, mount > probably will succeed but expect the unexpected later on. So don't do it. > " > > So if somebody is recreating lower or modifying lower with metacopy > on, its an unsupported configuration. > Very good. So this case is settled. Now, suggestions for work arounds: 1. Don't follow with lower null uuid (patch posted) - no caveats 2. Opt-out of following origin with explicit option e.g. "index=nofollow" 3. Don't follow origin unless one of the following opt-in features: metacopy,index,xino If we go for option #3, the easiest recommendation for distros would be to set: CONFIG_OVERLAY_FS_XINO_AUTO=y Apart from "compatibility with applications that expect 32bit inodes" I am not aware of any caveats to enabling XINO_AUTO. The purpose of xino feature is to make overlay st_ino/st_dev more posix-like (currently only for non samefs), so it makes some sense to tie the "origin" xattr feature that preserves the pre copy up persistent st_ino to xino. The xino documentation does not mention the samefs exemption: "...If this feature is disabled or the underlying filesystem doesn't have enough free bits in the inode number, then overlayfs will not be able to guarantee that the values of st_ino and st_dev returned by stat(2) and the value of d_ino returned by readdir(3) will act like on a normal filesystem." and the 'xino' mount option is currently not displayed with samefs. If we implement option #3, then with samefs user could enable xino if default is off and disable xino if default is auto and display xino in mount options. Thanks, Amir.