From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.5 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_PASS,USER_AGENT_MUTT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EB608C282DF for ; Fri, 19 Apr 2019 00:02:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id C363E21736 for ; Fri, 19 Apr 2019 00:02:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726029AbfDSACe (ORCPT ); Thu, 18 Apr 2019 20:02:34 -0400 Received: from mail106.syd.optusnet.com.au ([211.29.132.42]:42232 "EHLO mail106.syd.optusnet.com.au" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725917AbfDSACd (ORCPT ); Thu, 18 Apr 2019 20:02:33 -0400 Received: from dread.disaster.area (pa49-180-172-16.pa.nsw.optusnet.com.au [49.180.172.16]) by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id 202713D83FC; Fri, 19 Apr 2019 10:02:26 +1000 (AEST) Received: from dave by dread.disaster.area with local (Exim 4.92) (envelope-from ) id 1hHGz3-0002L4-B3; Fri, 19 Apr 2019 10:02:25 +1000 Date: Fri, 19 Apr 2019 10:02:25 +1000 From: Dave Chinner To: Amir Goldstein Cc: Andrew Morton , Jan Kara , Al Viro , linux-mm@kvack.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: Re: [PATCH v4] fs/sync.c: sync_file_range(2) may use WB_SYNC_ALL writeback Message-ID: <20190419000225.GF1454@dread.disaster.area> References: <20190409114922.30095-1-amir73il@gmail.com> <20190417054559.29252-1-amir73il@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190417054559.29252-1-amir73il@gmail.com> User-Agent: Mutt/1.10.1 (2018-07-13) X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=UJetJGXy c=1 sm=1 tr=0 cx=a_idp_d a=P9M234EABmfYNCwpuVjnFw==:117 a=P9M234EABmfYNCwpuVjnFw==:17 a=jpOVt7BSZ2e4Z31A5e1TngXxSK0=:19 a=kj9zAlcOel0A:10 a=oexKYjalfGEA:10 a=VwQbUJbxAAAA:8 a=pGLkceISAAAA:8 a=iox4zFpeAAAA:8 a=7-415B0cAAAA:8 a=drOt6m5kAAAA:8 a=VlU3oBYPLQ4wDVVSrwAA:9 a=te7xfvE6NDnlrWn0:21 a=2RiGNtYlV72RUmA0:21 a=CjuIK1q_8ugA:10 a=AjGcO6oz07-iQ99wixmX:22 a=WzC6qhA0u3u7Ye7llzcV:22 a=biEYGPWJfzWAr4FL6Ov7:22 a=RMMjzBEyIzXRtoq5n5K6:22 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org On Wed, Apr 17, 2019 at 08:45:59AM +0300, Amir Goldstein wrote: > Commit 23d0127096cb ("fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE > writeback") claims that sync_file_range(2) syscall was "created for > userspace to be able to issue background writeout and so waiting for > in-flight IO is undesirable there" and changes the writeback (back) to > WB_SYNC_NONE. > > This claim is only partially true. It is true for users that use the flag > SYNC_FILE_RANGE_WRITE by itself, as does PostgreSQL, the user that was > the reason for changing to WB_SYNC_NONE writeback. > > However, that claim is not true for users that use that flag combination > SYNC_FILE_RANGE_{WAIT_BEFORE|WRITE|_WAIT_AFTER}. > Those users explicitly requested to wait for in-flight IO as well as to > writeback of dirty pages. > > Re-brand that flag combination as SYNC_FILE_RANGE_WRITE_AND_WAIT > and use the helper filemap_write_and_wait_range(), that uses WB_SYNC_ALL > writeback, to perform the full range sync request. > > Link: http://lkml.kernel.org/r/20190409114922.30095-1-amir73il@gmail.com > Fixes: 23d0127096cb ("fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE") > Signed-off-by: Amir Goldstein > Acked-by: Jan Kara > Cc: Dave Chinner > Cc: Al Viro > --- > > Andrew, > > V2 of this patch is on your mmtotm queue. > However, I had already sent out V3 with a braino fix and Dave Chinner > just added more review comments which I had addressed in this version. > > Thanks, > Amir. > > Changes since v3: > - Remove unneeded change to VALID_FLAGS (Dave) > - Call file_fdatawait_range() before writeback (Dave) > > Changes since v2: > - Return after filemap_write_and_wait_range() > > Changes since v1: > - Remove non-guaranties of the API from commit message > - Added ACK by Jan > > fs/sync.c | 20 +++++++++++++++----- > include/uapi/linux/fs.h | 3 +++ > 2 files changed, 18 insertions(+), 5 deletions(-) > > diff --git a/fs/sync.c b/fs/sync.c > index b54e0541ad89..1836328f1ae8 100644 > --- a/fs/sync.c > +++ b/fs/sync.c > @@ -235,9 +235,9 @@ SYSCALL_DEFINE1(fdatasync, unsigned int, fd) > } > > /* > - * sys_sync_file_range() permits finely controlled syncing over a segment of > + * ksys_sync_file_range() permits finely controlled syncing over a segment of > * a file in the range offset .. (offset+nbytes-1) inclusive. If nbytes is > - * zero then sys_sync_file_range() will operate from offset out to EOF. > + * zero then ksys_sync_file_range() will operate from offset out to EOF. > * > * The flag bits are: > * > @@ -254,7 +254,7 @@ SYSCALL_DEFINE1(fdatasync, unsigned int, fd) > * Useful combinations of the flag bits are: > * > * SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE: ensures that all pages > - * in the range which were dirty on entry to sys_sync_file_range() are placed > + * in the range which were dirty on entry to ksys_sync_file_range() are placed > * under writeout. This is a start-write-for-data-integrity operation. > * > * SYNC_FILE_RANGE_WRITE: start writeout of all dirty pages in the range which > @@ -266,10 +266,13 @@ SYSCALL_DEFINE1(fdatasync, unsigned int, fd) > * earlier SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE operation to wait > * for that operation to complete and to return the result. > * > - * SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER: > + * SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER > + * (a.k.a. SYNC_FILE_RANGE_WRITE_AND_WAIT): > * a traditional sync() operation. This is a write-for-data-integrity operation > * which will ensure that all pages in the range which were dirty on entry to > - * sys_sync_file_range() are committed to disk. > + * ksys_sync_file_range() are written to disk. It should be noted that disk > + * caches are not flushed by this call, so there are no guarantees here that the > + * data will be available on disk after a crash. > * > * > * SYNC_FILE_RANGE_WAIT_BEFORE and SYNC_FILE_RANGE_WAIT_AFTER will detect any > @@ -344,6 +347,13 @@ int ksys_sync_file_range(int fd, loff_t offset, loff_t nbytes, > goto out_put; > } > > + if ((flags & SYNC_FILE_RANGE_WRITE_AND_WAIT) == > + SYNC_FILE_RANGE_WRITE_AND_WAIT) { > + /* Unlike SYNC_FILE_RANGE_WRITE alone, uses WB_SYNC_ALL */ > + ret = filemap_write_and_wait_range(mapping, offset, endbyte); > + goto out_put; > + } Clunky, now that I look at it in context. + int sync_mode = WB_SYNC_NONE; + + if ((flags & SYNC_FILE_RANGE_WRITE_AND_WAIT) == + SYNC_FILE_RANGE_WRITE_AND_WAIT) + sync_mode = WB_SYNC_ALL; ..... if (flags & SYNC_FILE_RANGE_WRITE) { ret = __filemap_fdatawrite_range(mapping, offset, endbyte, - WB_SYNC_NONE); + sync_mode); Cheers, Dave. -- Dave Chinner david@fromorbit.com