From: Amir Goldstein
Date: Sat, 27 Apr 2019 17:00:14 -0400
Subject: [TOPIC] Extending the filesystem crash recovery guaranties contract
To: lsf-pc@lists.linux-foundation.org
Cc: Dave Chinner, Theodore Tso, Jan Kara, linux-fsdevel, Jayashree Mohan, Vijaychidambaram Velayudhan Pillai, Filipe Manana

Suggestion for another filesystems track topic.

Some of you may remember the emotional(?) discussions that ensued when the crashmonkey developers embarked on a mission to document and verify filesystem crash recovery guarantees:

https://lore.kernel.org/linux-fsdevel/CAOQ4uxj8YpYPPdEvAvKPKXO7wdBg6T1O3osd6fSPFKH9j=i2Yg@mail.gmail.com/

There are two camps among filesystem developers, and each camp has good arguments: one for wanting to document existing behavior, the other for not wanting to document anything beyond "use fsync if you want any guarantee".

I would like to take a suggestion proposed by Jan in a related discussion:

https://lore.kernel.org/linux-fsdevel/CAOQ4uxjQx+TO3Dt7TA3ocXnNxbr3+oVyJLYUSpv4QCt_Texdvw@mail.gmail.com/

and make a proposal that may be able to meet the concerns of both camps.

The proposal is to add new APIs that communicate the crash consistency requirements of the application to the filesystem. An example API could look like this:

    renameat2(..., RENAME_METADATA_BARRIER | RENAME_DATA_BARRIER)

It's just an example. The API could take another form and may need more barrier types (I proposed to use new sync_file_range() flags).

The idea is simple though. METADATA_BARRIER means that if the rename is observed after a crash, all the inode metadata will be observed as well. DATA_BARRIER means the same for file data. We may also want an "ALL_METADATA_BARRIER" and/or a "METADATA_DEPENDENCY_BARRIER" to more accurately describe what the SOMC (strictly ordered metadata consistency) guarantees actually provide today.

The implementation is also simple. Filesystems that currently have SOMC behavior don't need to do anything to respect METADATA_BARRIER and only need to call filemap_write_and_wait_range() to respect DATA_BARRIER. Filesystem developers are thus not tying their hands w.r.t. future performance optimizations for operations that do not explicitly request a barrier.

Thanks,
Amir.
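
For comparison, here is a minimal userspace sketch of the pattern an application has to use today to get roughly the ordering that the example renameat2() barrier flags would express: write a temporary file, fsync() it, then rename() it over the target. The function name replace_file_atomically() and the ".tmp" suffix are made up for illustration. The barrier flags appear only in a comment, since they are a proposal and no kernel accepts them. Note that fsync() asks for more than a barrier would (it forces the writeback immediately instead of only ordering it before the rename), which is part of the motivation for a separate API.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Replace `path` with `len` bytes from `data` such that, if the rename
 * is observed after a crash, the new file contents are observed too.
 *
 * Hypothetical: under the proposal, steps 2 and 3 might collapse into
 *   renameat2(AT_FDCWD, tmp, AT_FDCWD, path,
 *             RENAME_METADATA_BARRIER | RENAME_DATA_BARRIER);
 * Those flags do not exist today, so this sketch uses fsync() + rename().
 *
 * Returns 0 on success, -1 on failure with errno set.
 */
int replace_file_atomically(const char *path, const void *data, size_t len)
{
    char tmp[4096];
    const char *p = data;
    size_t left = len;
    int saved_errno;
    int fd;

    int n = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof(tmp)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    /* Step 1: write the new contents to a temporary file. */
    fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        p += w;
        left -= (size_t)w;
    }

    /* Step 2: force file data and inode metadata to stable storage. */
    if (fsync(fd) < 0)
        goto fail;
    if (close(fd) < 0) {
        fd = -1; /* do not close the descriptor a second time */
        goto fail;
    }
    fd = -1;

    /* Step 3: atomically point the name at the new file. */
    if (rename(tmp, path) < 0)
        goto fail;
    return 0;

fail:
    saved_errno = errno;
    if (fd >= 0)
        close(fd);
    unlink(tmp); /* best-effort cleanup of the temporary file */
    errno = saved_errno;
    return -1;
}
```

A caller would use it as replace_file_atomically("config", buf, buf_len). The sketch does not fsync() the parent directory, so it orders the data before the rename but does not make the rename itself durable.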