Date: Thu, 9 May 2019 11:43:27 +1000
From: Dave Chinner
To: Theodore Ts'o
Cc: Amir Goldstein, Vijay Chidambaram, lsf-pc@lists.linux-foundation.org, "Darrick J. Wong", Jan Kara, linux-fsdevel, Jayashree Mohan, Filipe Manana, Chris Mason, lwn@lwn.net
Subject: Re: [TOPIC] Extending the filesystem crash recovery guaranties contract
Message-ID: <20190509014327.GT1454@dread.disaster.area>
In-Reply-To: <20190503023043.GB23724@mit.edu>

On Thu, May 02, 2019 at 10:30:43PM -0400, Theodore Ts'o wrote:
> On Thu, May 02, 2019 at 01:39:47PM -0400, Amir Goldstein wrote:
> > I am not saying there is no room for a document that elaborates on those
> > guarantees. I personally think that could be useful and certainly think that
> > your group's work for adding xfstest coverage for API guarantees is useful.
>
> Again, here is my concern. If we promise that ext4 will always obey
> Dave Chinner's SOMC model, it would forever rule out Daejun Park and
> Dongkun Shin's "iJournaling: Fine-grained journaling for improving the
> latency of fsync system call"[1] published in Usenix ATC 2017.

No, it doesn't rule that out at all. In a SOMC model, incremental
journalling is just fine when there are no external dependencies on
the thing being fsync'd. If you have other dependencies (e.g. the
file has just been created and so the dir is dirty, too) then fsync
would need to do the whole shebang, but otherwise....
> So if the crash consistency guarantees forbid future innovations
> where applications might *want* a fast fsync() that doesn't drag
> unrelated inodes into the persistence guarantees,

.... the whole point of SOMC is that it allows filesystems to avoid
dragging external metadata into fsync() operations /unless/ there's
a user visible ordering dependency that must be maintained between
objects. If all you are doing is stabilising file data in a stable
file/directory, then independent, incremental journaling of the
fsync operations on that file fits the SOMC model just fine.

> is that really what
> we want? Do we want to forever rule out various academic
> investigations such as Park and Shin's because "it violates the crash
> consistency recovery model"? Especially if some applications don't
> *need* the crash consistency model?

Stop with the silly inflammatory hyperbole already, Ted. It is not
necessary.

> P.P.S. One of the other discussions that did happen during the main
> LSF/MM File system session, and for which there was general agreement
> across a number of major file system maintainers, was a fsync2()
> system call which would take a list of file descriptors (and flags)
> that should be fsync'ed.

Hmmmm, that wasn't on the agenda, and nobody has documented it as
yet.

> The semantics would be that when the
> fsync2() successfully returns, all of the guarantees of fsync() or
> fdatasync() requested by the list of file descriptors and flags would
> be satisfied. This would allow file systems to more optimally fsync a
> batch of files, for example by implementing data integrity writebacks
> for all of the files, followed by a single journal commit to guarantee
> persistence for all of the metadata changes.

What happens when you get writeback errors on only some of the fds?
How do you report the failures and what do you do with the journal
commit on partial success?
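For illustration only, here is a minimal sketch of batching fsyncs while keeping a separate result per fd, which is the partial-success case raised above. It assumes that the AIO_FSYNC mentioned in this thread refers to Linux native AIO's IOCB_CMD_FSYNC command from <linux/aio_abi.h>. Kernel support for that command varies by version and filesystem. The helper fsync_batch() and the MAX_BATCH limit are hypothetical names for this sketch. They are not part of any proposed fsync2() interface.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

#define MAX_BATCH 64 /* hypothetical limit for this sketch */

/* glibc has no wrappers for the native AIO syscalls, so call them directly. */
static long sys_io_setup(unsigned nr, aio_context_t *ctx)
{
    return syscall(SYS_io_setup, nr, ctx);
}

static long sys_io_submit(aio_context_t ctx, long nr, struct iocb **iocbs)
{
    return syscall(SYS_io_submit, ctx, nr, iocbs);
}

static long sys_io_getevents(aio_context_t ctx, long min_nr, long nr,
                             struct io_event *events, struct timespec *timeout)
{
    return syscall(SYS_io_getevents, ctx, min_nr, nr, events, timeout);
}

static long sys_io_destroy(aio_context_t ctx)
{
    return syscall(SYS_io_destroy, ctx);
}

/*
 * Hypothetical helper: issue an fsync for every fd in fds[] as one AIO
 * batch and store each fd's own outcome (0 or -errno) in results[i].
 * Returns 0 once every request is accounted for, or -errno if the batch
 * could not be run at all.
 */
static int fsync_batch(const int *fds, int n, int *results)
{
    struct iocb cbs[MAX_BATCH];
    struct iocb *ptrs[MAX_BATCH];
    struct io_event events[MAX_BATCH];
    aio_context_t ctx = 0; /* must be zero before io_setup() */
    int issued = 0, reaped = 0;

    if (n <= 0 || n > MAX_BATCH)
        return -EINVAL;
    if (sys_io_setup((unsigned)n, &ctx) < 0)
        return -errno;

    for (int i = 0; i < n; i++) {
        memset(&cbs[i], 0, sizeof(cbs[i]));
        cbs[i].aio_lio_opcode = IOCB_CMD_FSYNC;
        cbs[i].aio_fildes = (uint32_t)fds[i];
        cbs[i].aio_data = (uint64_t)i; /* maps a completion back to its fd */
        ptrs[i] = &cbs[i];
    }

    /* io_submit() may accept only a prefix of the array; the first
     * rejected iocb reports its error synchronously on the next call,
     * so record that error for its fd and carry on with the rest. */
    for (int i = 0; i < n;) {
        long r = sys_io_submit(ctx, n - i, &ptrs[i]);
        if (r <= 0) {
            results[i] = (r < 0) ? -errno : -EAGAIN;
            i++;
        } else {
            i += (int)r;
            issued += (int)r;
        }
    }

    /* Completions arrive in any order; aio_data says which fd each is for. */
    while (reaped < issued) {
        long got = sys_io_getevents(ctx, 1, issued - reaped, events, NULL);
        if (got < 0) {
            if (errno == EINTR)
                continue;
            int err = -errno;
            sys_io_destroy(ctx);
            return err;
        }
        for (long j = 0; j < got; j++)
            results[(size_t)events[j].data] = (int)events[j].res;
        reaped += (int)got;
    }

    sys_io_destroy(ctx);
    return 0;
}
```

A caller that has just created a file might pass both the file's fd and its parent directory's fd. The new directory entry is the kind of external dependency described above. The caller can then check each results[i] independently. Results are matched back through aio_data rather than by completion order, because AIO completions can be reaped in any order.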
Of course, this ignores the elephant in the room: applications can
/already do this/ using AIO_FSYNC and have individual error status
for each fd. Not to mention that filesystems already batch concurrent
fsync journal commits into a single operation. I'm not seeing the
point of a new syscall to do this right now....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com