From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.3 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D7966C433FE for ; Sun, 13 Dec 2020 20:07:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 94B1D207C9 for ; Sun, 13 Dec 2020 20:07:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2391777AbgLMUHb (ORCPT ); Sun, 13 Dec 2020 15:07:31 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45456 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728352AbgLMUHZ (ORCPT ); Sun, 13 Dec 2020 15:07:25 -0500 Received: from mail-il1-x141.google.com (mail-il1-x141.google.com [IPv6:2607:f8b0:4864:20::141]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 57277C0613CF for ; Sun, 13 Dec 2020 12:06:45 -0800 (PST) Received: by mail-il1-x141.google.com with SMTP id c18so13898259iln.10 for ; Sun, 13 Dec 2020 12:06:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sargun.me; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to :user-agent; bh=pi2fcVCqcUFQQYSrnFsWBqF7pr8porn7D+qfGhtFQRo=; b=daXSNb8HWwoWdF2eCXrVn6nFT+Xk99HYG1snGu067/SbxHLH6g2WhFBfUoqagCx3Gb +5utvK97kpFH4wYhyb70+BQRRRLiQomrw6tH2AcKWt5ecwbHLUFmG7sayp8ON4s3DNDq tYrVi5l8Ixgr2b35p7Y577mTiNvwnkq+YUdfI= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to:user-agent; bh=pi2fcVCqcUFQQYSrnFsWBqF7pr8porn7D+qfGhtFQRo=; b=nC57VNXYbJnY+1055rIyuAFubDrfd0rUxZkuWVpgixJH8fK9kjLgGoZ7LsKZjfwfgb sRIr7in3w2Z5Ao/97e4jgUZY13g6fkDEV1ggIqNcyS1qY4BvEzW+HC2S5b9o+Ki4Rlqf HJBBkU/Zc+wP3FSUoDWNY4Cteh4nFtQel6dvjV6R1wzA2jc5Z9ZOPkyl7lAtCzMg+QTk eMi7lbhTffcWdAzAonDgVs5mpx4zIO5hFdFtx+l6MyypSaPdq1mlM1pGnpPXf+JrnwH9 Ve9QES0C5SjA61yBDtaiO7TJlscCtuwa2nIbRg5HC4VHIrzDiQmRxgIMKSPukudBZGyi BshA== X-Gm-Message-State: AOAM533r58mVARo/uCSw9rx9/0Exr8fcMapbHJ/hhFFYv5Rr4g8r3ANm o1mqucjapdVlKYsdpiiYhJXFtkqj0tlEmw== X-Google-Smtp-Source: ABdhPJzidN/pSVIKTGs7cgoxNTakG4hkVJ9yAVpIVbLjGf5Jp1dIA6tvA+cCvEDPJSpco+pJpBGd4g== X-Received: by 2002:a05:6e02:1ba3:: with SMTP id n3mr555071ili.10.1607890004543; Sun, 13 Dec 2020 12:06:44 -0800 (PST) Received: from ircssh-2.c.rugged-nimbus-611.internal (80.60.198.104.bc.googleusercontent.com. [104.198.60.80]) by smtp.gmail.com with ESMTPSA id a15sm9757194ilh.10.2020.12.13.12.06.44 (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Sun, 13 Dec 2020 12:06:44 -0800 (PST) Date: Sun, 13 Dec 2020 20:06:42 +0000 From: Sargun Dhillon To: Jeff Layton Cc: Amir Goldstein , Miklos Szeredi , Vivek Goyal , overlayfs , Linux FS-devel Mailing List , Matthew Wilcox Subject: Re: [PATCH v2 0/3] Check errors on sync for volatile overlayfs mounts Message-ID: <20201213200642.GB8562@ircssh-2.c.rugged-nimbus-611.internal> References: <20201211235002.4195-1-sargun@sargun.me> <7779e2ed97080009d894f3442bfad31972494542.camel@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <7779e2ed97080009d894f3442bfad31972494542.camel@kernel.org> User-Agent: Mutt/1.9.4 (2018-02-28) Precedence: bulk List-ID: X-Mailing-List: linux-unionfs@vger.kernel.org On Sat, Dec 12, 2020 at 06:21:37AM -0500, Jeff Layton wrote: > On Fri, 2020-12-11 at 15:49 -0800, Sargun Dhillon wrote: > > The semantics of errseq and syncfs are such that it is impossible to track > > if any errors have occurred between the time the first error occurred, and > > the user checks for the error (calls syncfs, and subsequently > > errseq_check_and_advance. > > > > Overlayfs has a volatile feature which short-circuits syncfs. This, in turn > > makes it so that the user can have silent data corruption and not know > > about it. The third patch in the series introduces behaviour that makes it > > so that we can track errors, and bubble up whether the user has put > > themselves in bad situation. > > > > This required some gymanstics in errseq, and adding a wrapper around it > > called "errseq_counter" (errseq + counter). The data structure uses an > > atomic to track overflow errors. This approach, rather than moving to an > > atomic64 / u64 is so we can avoid bloating every person that subscribes to > > an errseq, and only add the subscriber behaviour to those who care (at the > > expense of space. > > > > The datastructure is write-optimized, and rightfully so, as the users > > of the counter feature are just overlayfs, and it's called in fsync > > checking, which is a rather seldom operation, and not really on > > any hotpaths. > > > > [1]: https://lore.kernel.org/linux-fsdevel/20201202092720.41522-1-sargun@sargun.me/ > > > > Sargun Dhillon (3): > >   errseq: Add errseq_counter to allow for all errors to be observed > >   errseq: Add mechanism to snapshot errseq_counter and check snapshot > >   overlay: Implement volatile-specific fsync error behaviour > > > >  Documentation/filesystems/overlayfs.rst | 8 ++ > >  fs/buffer.c | 2 +- > >  fs/overlayfs/file.c | 5 +- > >  fs/overlayfs/overlayfs.h | 1 + > >  fs/overlayfs/ovl_entry.h | 3 + > >  fs/overlayfs/readdir.c | 5 +- > >  fs/overlayfs/super.c | 26 +++-- > >  fs/overlayfs/util.c | 28 +++++ > >  fs/super.c | 1 + > >  fs/sync.c | 3 +- > >  include/linux/errseq.h | 18 ++++ > >  include/linux/fs.h | 6 +- > >  include/linux/pagemap.h | 2 +- > >  lib/errseq.c | 129 ++++++++++++++++++++---- > >  14 files changed, 202 insertions(+), 35 deletions(-) > > > > It would hel if you could more clearly lay out the semantics you're > looking for. If I understand correctly: > > You basically want to be able to sample the sb->s_wb_err of the upper > layer at mount time and then always return an error if any new errors > were recorded since that point. > There's two things we want to achieve: 1. If an error occurs on the upperidr after mount time, we want to tell the user on every syncfs they try to do on the overlayfs volume that it occurred, and that the volume is in an inconsistent state. 2. We want to be able to checkpoint some information to disk, and if an overlayfs mount was unmounted, and remounted, while in volatile mode, we want to make sure no error occurred while we were way. > If that's correct, then I'm not sure I get need for all of this extra > counter machinery. Why not just sample it at mount time without > recording it as 0 if the seen flag isn't set. Then just do an > errseq_check against the upper superblock (without advancing) in the > overlayfs ->sync_fs routine and just errseq_set that error into the > overlayfs superblock? The syncfs syscall wrapper should then always > report the latest error. I considered the following options: 1. Make errseq_t a u64: Downside: Bloats all errseq_ts to u64 / 8-byte aligned 2. Make errseq_counter_t an atomic64 / u64 giving us 52 error checking bits vs. just 20: Downside: We would have to do an cmpxchg64 on every error, which seemed like it could be costly on platforms that don't naturally support it. 3. Have an overflow counter: This doesn't introduce extra CPU overhead for any other user of errseq, nor does it introduce much memory overhead. Downside: complexity. > > Or (even better) rework all of the sync_fs/syncfs mess to be more sane, > so that overlayfs has more control over what errors get returned to > userland. ISTM that the main problem you have is that the > errseq_check_and_advance is done in the syscall wrapper, and that's > probably not appropriate for your use-case. > > -- > Jeff Layton >