From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-oi0-f71.google.com (mail-oi0-f71.google.com [209.85.218.71])
	by kanga.kvack.org (Postfix) with ESMTP id 166CB6B025F
	for ; Wed, 26 Jul 2017 13:55:43 -0400 (EDT)
Received: by mail-oi0-f71.google.com with SMTP id v11so6886297oif.2
	for ; Wed, 26 Jul 2017 10:55:43 -0700 (PDT)
Received: from mail.kernel.org (mail.kernel.org. [198.145.29.99])
	by mx.google.com with ESMTPS id y2si4606887oiy.62.2017.07.26.10.55.42
	for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Wed, 26 Jul 2017 10:55:42 -0700 (PDT)
From: Jeff Layton
Subject: [PATCH v2 0/4] mm/gfs2: extend file_* API, and convert gfs2 to errseq_t error reporting
Date: Wed, 26 Jul 2017 13:55:34 -0400
Message-Id: <20170726175538.13885-1-jlayton@kernel.org>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Alexander Viro, Jan Kara
Cc: "J . Bruce Fields", Andrew Morton, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, Matthew Wilcox,
	Bob Peterson, Steven Whitehouse, cluster-devel@redhat.com

From: Jeff Layton

I sent a small patch earlier this week to make sync_file_range use
errseq_t reporting. This set respins that patch into a patch that adds a
bit more file_* infrastructure, and then patches to make sync_file_range
and fsync on gfs2 report writeback errors properly.

There's also a small cleanup patch for mm/filemap.c to consolidate the
DAX handling checks in the existing infrastructure.
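
As a rough illustration of the per-file error cursor idea behind the
errseq_t conversion described above, here is a minimal userspace sketch.
Every name in it (model_mapping, model_file and the model_* functions) is
hypothetical, and it deliberately ignores the real errseq_t encoding and
any atomicity concerns. It only models one property: each open file
reports a given writeback error once, independently of other opens.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, simplified stand-in for an address_space's error state. */
struct model_mapping {
	unsigned int seq;	/* bumped each time an error is recorded */
	int err;		/* most recent error (negative errno) */
};

/* Hypothetical stand-in for an open file with its own cursor. */
struct model_file {
	struct model_mapping *mapping;
	unsigned int seen;	/* value of mapping->seq last reported */
};

/* Sample the current sequence at open time. */
void model_open(struct model_file *f, struct model_mapping *m)
{
	f->mapping = m;
	f->seen = m->seq;
}

/* Record a writeback error against the mapping. */
void model_set_wb_err(struct model_mapping *m, int err)
{
	m->err = err;
	m->seq++;
}

/* Report an error once per open file, then advance that file's cursor. */
int model_check_and_advance(struct model_file *f)
{
	if (f->seen == f->mapping->seq)
		return 0;
	f->seen = f->mapping->seq;
	return f->mapping->err;
}

/* Two opens of one mapping: each sees the error exactly once. */
int model_demo(void)
{
	struct model_mapping m = { 0, 0 };
	struct model_file a, b;

	model_open(&a, &m);
	model_open(&b, &m);
	model_set_wb_err(&m, -EIO);

	assert(model_check_and_advance(&a) == -EIO);
	assert(model_check_and_advance(&a) == 0);	/* already reported to a */
	assert(model_check_and_advance(&b) == -EIO);	/* b still sees it */
	return 0;
}
```

In this model, checking through one file never hides the error from the
other, which is the behaviour the file_* helpers are meant to give fsync
and sync_file_range callers.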

Jeff Layton (4):
  mm: consolidate dax / non-dax checks for writeback
  mm: add file_fdatawait_range and file_write_and_wait
  fs: convert sync_file_range to use errseq_t based error-tracking
  gfs2: convert to errseq_t based writeback error reporting for fsync

 fs/gfs2/file.c     |  6 +++--
 fs/sync.c          |  4 +--
 include/linux/fs.h |  7 +++++-
 mm/filemap.c       | 71 +++++++++++++++++++++++++++++++++++++++++++++++++-----
 4 files changed, 77 insertions(+), 11 deletions(-)

-- 
2.13.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-oi0-f72.google.com (mail-oi0-f72.google.com [209.85.218.72])
	by kanga.kvack.org (Postfix) with ESMTP id 889706B0292
	for ; Wed, 26 Jul 2017 13:55:44 -0400 (EDT)
Received: by mail-oi0-f72.google.com with SMTP id b184so12561787oih.9
	for ; Wed, 26 Jul 2017 10:55:44 -0700 (PDT)
Received: from mail.kernel.org (mail.kernel.org. [198.145.29.99])
	by mx.google.com with ESMTPS id y206si8767777oig.366.2017.07.26.10.55.43
	for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Wed, 26 Jul 2017 10:55:43 -0700 (PDT)
From: Jeff Layton
Subject: [PATCH v2 1/4] mm: consolidate dax / non-dax checks for writeback
Date: Wed, 26 Jul 2017 13:55:35 -0400
Message-Id: <20170726175538.13885-2-jlayton@kernel.org>
In-Reply-To: <20170726175538.13885-1-jlayton@kernel.org>
References: <20170726175538.13885-1-jlayton@kernel.org>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Alexander Viro, Jan Kara
Cc: "J . Bruce Fields", Andrew Morton, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, Matthew Wilcox,
	Bob Peterson, Steven Whitehouse, cluster-devel@redhat.com

From: Jeff Layton

We have this complex conditional copied to several places. Turn it into
a helper function.

Signed-off-by: Jeff Layton
---
 mm/filemap.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index e1cca770688f..72e46e6f0d9a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -522,12 +522,17 @@ int filemap_fdatawait(struct address_space *mapping)
 }
 EXPORT_SYMBOL(filemap_fdatawait);
 
+static bool mapping_needs_writeback(struct address_space *mapping)
+{
+	return (!dax_mapping(mapping) && mapping->nrpages) ||
+	    (dax_mapping(mapping) && mapping->nrexceptional);
+}
+
 int filemap_write_and_wait(struct address_space *mapping)
 {
 	int err = 0;
 
-	if ((!dax_mapping(mapping) && mapping->nrpages) ||
-	    (dax_mapping(mapping) && mapping->nrexceptional)) {
+	if (mapping_needs_writeback(mapping)) {
 		err = filemap_fdatawrite(mapping);
 		/*
 		 * Even if the above returned error, the pages may be
@@ -566,8 +571,7 @@ int filemap_write_and_wait_range(struct address_space *mapping,
 {
 	int err = 0;
 
-	if ((!dax_mapping(mapping) && mapping->nrpages) ||
-	    (dax_mapping(mapping) && mapping->nrexceptional)) {
+	if (mapping_needs_writeback(mapping)) {
 		err = __filemap_fdatawrite_range(mapping, lstart, lend,
 						 WB_SYNC_ALL);
 		/* See comment of filemap_write_and_wait() */
@@ -656,8 +660,7 @@ int file_write_and_wait_range(struct file *file, loff_t lstart, loff_t lend)
 	int err = 0, err2;
 	struct address_space *mapping = file->f_mapping;
 
-	if ((!dax_mapping(mapping) && mapping->nrpages) ||
-	    (dax_mapping(mapping) && mapping->nrexceptional)) {
+	if (mapping_needs_writeback(mapping)) {
 		err = __filemap_fdatawrite_range(mapping, lstart, lend,
 						 WB_SYNC_ALL);
 		/* See comment of filemap_write_and_wait() */
-- 
2.13.3

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-oi0-f72.google.com (mail-oi0-f72.google.com [209.85.218.72])
	by kanga.kvack.org (Postfix) with ESMTP id 0FE3E6B02B4
	for ; Wed, 26 Jul 2017 13:55:46 -0400 (EDT)
Received: by mail-oi0-f72.google.com with SMTP id v68so13992964oia.14
	for ; Wed, 26 Jul 2017 10:55:46 -0700 (PDT)
Received: from mail.kernel.org (mail.kernel.org. [198.145.29.99])
	by mx.google.com with ESMTPS id p131si653186oib.339.2017.07.26.10.55.45
	for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Wed, 26 Jul 2017 10:55:45 -0700 (PDT)
From: Jeff Layton
Subject: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
Date: Wed, 26 Jul 2017 13:55:36 -0400
Message-Id: <20170726175538.13885-3-jlayton@kernel.org>
In-Reply-To: <20170726175538.13885-1-jlayton@kernel.org>
References: <20170726175538.13885-1-jlayton@kernel.org>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Alexander Viro, Jan Kara
Cc: "J . Bruce Fields", Andrew Morton, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, Matthew Wilcox,
	Bob Peterson, Steven Whitehouse, cluster-devel@redhat.com

From: Jeff Layton

Some filesystem fsync routines will need these.

Signed-off-by: Jeff Layton
---
 include/linux/fs.h |  7 ++++++-
 mm/filemap.c       | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 21e7df1ad613..bc57a79294f0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2544,6 +2544,8 @@ extern int filemap_fdatawait_range(struct address_space *, loff_t lstart,
 				   loff_t lend);
 extern bool filemap_range_has_page(struct address_space *, loff_t lstart,
 				  loff_t lend);
+extern int __must_check file_fdatawait_range(struct file *file, loff_t lstart,
+						loff_t lend);
 extern int filemap_write_and_wait(struct address_space *mapping);
 extern int filemap_write_and_wait_range(struct address_space *mapping,
 				        loff_t lstart, loff_t lend);
@@ -2552,11 +2554,14 @@ extern int __filemap_fdatawrite_range(struct address_space *mapping,
 extern int filemap_fdatawrite_range(struct address_space *mapping,
 				loff_t start, loff_t end);
 extern int filemap_check_errors(struct address_space *mapping);
-
 extern void __filemap_set_wb_err(struct address_space *mapping, int err);
+
+extern int __must_check file_fdatawait_range(struct file *file, loff_t lstart,
+						loff_t lend);
 extern int __must_check file_check_and_advance_wb_err(struct file *file);
 extern int __must_check file_write_and_wait_range(struct file *file,
 						loff_t start, loff_t end);
+extern int __must_check file_write_and_wait(struct file *file);
 
 /**
  * filemap_set_wb_err - set a writeback error on an address_space
diff --git a/mm/filemap.c b/mm/filemap.c
index 72e46e6f0d9a..b904a8dfa43d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -476,6 +476,29 @@ int filemap_fdatawait_range(struct address_space *mapping, loff_t start_byte,
 EXPORT_SYMBOL(filemap_fdatawait_range);
 
 /**
+ * file_fdatawait_range - wait for writeback to complete
+ * @file:		file pointing to address space structure to wait for
+ * @start_byte:		offset in bytes where the range starts
+ * @end_byte:		offset in bytes where the range ends (inclusive)
+ *
+ * Walk the list of under-writeback pages of the address space that file
+ * refers to, in the given range and wait for all of them.  Check error
+ * status of the address space vs. the file->f_wb_err cursor and return it.
+ *
+ * Since the error status of the file is advanced by this function,
+ * callers are responsible for checking the return value and handling and/or
+ * reporting the error.
+ */
+int file_fdatawait_range(struct file *file, loff_t start_byte, loff_t end_byte)
+{
+	struct address_space *mapping = file->f_mapping;
+
+	__filemap_fdatawait_range(mapping, start_byte, end_byte);
+	return file_check_and_advance_wb_err(file);
+}
+EXPORT_SYMBOL(file_fdatawait_range);
+
+/**
  * filemap_fdatawait_keep_errors - wait for writeback without clearing errors
  * @mapping: address space structure to wait for
  *
@@ -675,6 +698,39 @@ int file_write_and_wait_range(struct file *file, loff_t lstart, loff_t lend)
 EXPORT_SYMBOL(file_write_and_wait_range);
 
 /**
+ * file_write_and_wait - write out whole file and wait on it and return any
+ *			 writeback errors since we last checked
+ * @file:	file to write back and wait on
+ *
+ * Write back the whole file and wait on its mapping. Afterward, check for
+ * errors that may have occurred since our file->f_wb_err cursor was last
+ * updated.
+ */
+int file_write_and_wait(struct file *file)
+{
+	int err = 0, err2;
+	struct address_space *mapping = file->f_mapping;
+
+	if ((!dax_mapping(mapping) && mapping->nrpages) ||
+	    (dax_mapping(mapping) && mapping->nrexceptional)) {
+		err = filemap_fdatawrite(mapping);
+		/* See comment of filemap_write_and_wait() */
+		if (err != -EIO) {
+			loff_t i_size = i_size_read(mapping->host);
+
+			if (i_size != 0)
+				__filemap_fdatawait_range(mapping, 0,
+							  i_size - 1);
+		}
+	}
+	err2 = file_check_and_advance_wb_err(file);
+	if (!err)
+		err = err2;
+	return err;
+}
+EXPORT_SYMBOL(file_write_and_wait);
+
+/**
  * replace_page_cache_page - replace a pagecache page with a new one
  * @old:	page to be replaced
  * @new:	page to replace with
-- 
2.13.3

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-oi0-f69.google.com (mail-oi0-f69.google.com [209.85.218.69])
	by kanga.kvack.org (Postfix) with ESMTP id 7C0266B02F4
	for ; Wed, 26 Jul 2017 13:55:47 -0400 (EDT)
Received: by mail-oi0-f69.google.com with SMTP id p62so13537715oih.12
	for ; Wed, 26 Jul 2017 10:55:47 -0700 (PDT)
Received: from mail.kernel.org (mail.kernel.org. [198.145.29.99])
	by mx.google.com with ESMTPS id z126si8861971oiz.119.2017.07.26.10.55.46
	for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Wed, 26 Jul 2017 10:55:46 -0700 (PDT)
From: Jeff Layton
Subject: [PATCH v2 3/4] fs: convert sync_file_range to use errseq_t based error-tracking
Date: Wed, 26 Jul 2017 13:55:37 -0400
Message-Id: <20170726175538.13885-4-jlayton@kernel.org>
In-Reply-To: <20170726175538.13885-1-jlayton@kernel.org>
References: <20170726175538.13885-1-jlayton@kernel.org>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Alexander Viro, Jan Kara
Cc: "J . Bruce Fields", Andrew Morton, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, Matthew Wilcox,
	Bob Peterson, Steven Whitehouse, cluster-devel@redhat.com

From: Jeff Layton

sync_file_range doesn't call down into the filesystem directly at all.
It only kicks off writeback of pagecache pages and optionally waits
on the result.

Convert sync_file_range to use errseq_t based error tracking, under the
assumption that most users will prefer this behavior when errors occur.

Reviewed-by: Jan Kara
Signed-off-by: Jeff Layton
---
 fs/sync.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/sync.c b/fs/sync.c
index 2a54c1f22035..27d6b8bbcb6a 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -342,7 +342,7 @@ SYSCALL_DEFINE4(sync_file_range, int, fd, loff_t, offset, loff_t, nbytes,
 
 	ret = 0;
 	if (flags & SYNC_FILE_RANGE_WAIT_BEFORE) {
-		ret = filemap_fdatawait_range(mapping, offset, endbyte);
+		ret = file_fdatawait_range(f.file, offset, endbyte);
 		if (ret < 0)
 			goto out_put;
 	}
@@ -355,7 +355,7 @@ SYSCALL_DEFINE4(sync_file_range, int, fd, loff_t, offset, loff_t, nbytes,
 	}
 
 	if (flags & SYNC_FILE_RANGE_WAIT_AFTER)
-		ret = filemap_fdatawait_range(mapping, offset, endbyte);
+		ret = file_fdatawait_range(f.file, offset, endbyte);
 
 out_put:
 	fdput(f);
-- 
2.13.3

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-oi0-f72.google.com (mail-oi0-f72.google.com [209.85.218.72])
	by kanga.kvack.org (Postfix) with ESMTP id CCD126B02F4
	for ; Wed, 26 Jul 2017 13:55:48 -0400 (EDT)
Received: by mail-oi0-f72.google.com with SMTP id p62so13537753oih.12
	for ; Wed, 26 Jul 2017 10:55:48 -0700 (PDT)
Received: from mail.kernel.org (mail.kernel.org. [198.145.29.99])
	by mx.google.com with ESMTPS id q189si9179913oih.549.2017.07.26.10.55.47
	for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Wed, 26 Jul 2017 10:55:48 -0700 (PDT)
From: Jeff Layton
Subject: [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync
Date: Wed, 26 Jul 2017 13:55:38 -0400
Message-Id: <20170726175538.13885-5-jlayton@kernel.org>
In-Reply-To: <20170726175538.13885-1-jlayton@kernel.org>
References: <20170726175538.13885-1-jlayton@kernel.org>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Alexander Viro, Jan Kara
Cc: "J . Bruce Fields", Andrew Morton, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, Matthew Wilcox,
	Bob Peterson, Steven Whitehouse, cluster-devel@redhat.com

From: Jeff Layton

This means that we need to export the new file_fdatawait_range symbol.

Also, fix a place where a writeback error might get dropped in the
gfs2_is_jdata case.

Signed-off-by: Jeff Layton
---
 fs/gfs2/file.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index c2062a108d19..c53ac6efd04c 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end,
 		if (ret)
 			return ret;
 		if (gfs2_is_jdata(ip))
-			filemap_write_and_wait(mapping);
+			ret = file_write_and_wait(file);
+		if (ret)
+			return ret;
 		gfs2_ail_flush(ip->i_gl, 1);
 	}
 
 	if (mapping->nrpages)
-		ret = filemap_fdatawait_range(mapping, start, end);
+		ret = file_fdatawait_range(file, start, end);
 
 	return ret ? ret : ret1;
 }
-- 
2.13.3

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pf0-f197.google.com (mail-pf0-f197.google.com [209.85.192.197])
	by kanga.kvack.org (Postfix) with ESMTP id 622046B025F
	for ; Wed, 26 Jul 2017 15:13:12 -0400 (EDT)
Received: by mail-pf0-f197.google.com with SMTP id k72so89952678pfj.1
	for ; Wed, 26 Jul 2017 12:13:12 -0700 (PDT)
Received: from bombadil.infradead.org (bombadil.infradead.org. [65.50.211.133])
	by mx.google.com with ESMTPS id c9si10159724pgt.207.2017.07.26.12.13.10
	for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Wed, 26 Jul 2017 12:13:11 -0700 (PDT)
Date: Wed, 26 Jul 2017 12:13:05 -0700
From: Matthew Wilcox
Subject: Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
Message-ID: <20170726191305.GC15980@bombadil.infradead.org>
References: <20170726175538.13885-1-jlayton@kernel.org>
	<20170726175538.13885-3-jlayton@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170726175538.13885-3-jlayton@kernel.org>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Jeff Layton
Cc: Alexander Viro, Jan Kara, "J . Bruce Fields", Andrew Morton,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, Bob Peterson, Steven Whitehouse,
	cluster-devel@redhat.com

On Wed, Jul 26, 2017 at 01:55:36PM -0400, Jeff Layton wrote:
> +int file_write_and_wait(struct file *file)
> +{
> +	int err = 0, err2;
> +	struct address_space *mapping = file->f_mapping;
> +
> +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> +	    (dax_mapping(mapping) && mapping->nrexceptional)) {

Since patch 1 exists, shouldn't this use the new helper?

> +		err = filemap_fdatawrite(mapping);
> +		/* See comment of filemap_write_and_wait() */
> +		if (err != -EIO) {
> +			loff_t i_size = i_size_read(mapping->host);
> +
> +			if (i_size != 0)
> +				__filemap_fdatawait_range(mapping, 0,
> +							  i_size - 1);
> +		}
> +	}
> +	err2 = file_check_and_advance_wb_err(file);
> +	if (!err)
> +		err = err2;
> +	return err;

Would this be clearer written as:

	if (err)
		return err;
	return err2;

or even ...

	return err ? err : err2;

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pg0-f72.google.com (mail-pg0-f72.google.com [74.125.83.72])
	by kanga.kvack.org (Postfix) with ESMTP id 9051C6B0292
	for ; Wed, 26 Jul 2017 15:21:11 -0400 (EDT)
Received: by mail-pg0-f72.google.com with SMTP id a2so224748028pgn.15
	for ; Wed, 26 Jul 2017 12:21:11 -0700 (PDT)
Received: from bombadil.infradead.org (bombadil.infradead.org. [65.50.211.133])
	by mx.google.com with ESMTPS id k4si7919512pgr.0.2017.07.26.12.21.10
	for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Wed, 26 Jul 2017 12:21:10 -0700 (PDT)
Date: Wed, 26 Jul 2017 12:21:05 -0700
From: Matthew Wilcox
Subject: Re: [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync
Message-ID: <20170726192105.GD15980@bombadil.infradead.org>
References: <20170726175538.13885-1-jlayton@kernel.org>
	<20170726175538.13885-5-jlayton@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170726175538.13885-5-jlayton@kernel.org>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Jeff Layton
Cc: Alexander Viro, Jan Kara, "J . Bruce Fields", Andrew Morton,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, Bob Peterson, Steven Whitehouse,
	cluster-devel@redhat.com

On Wed, Jul 26, 2017 at 01:55:38PM -0400, Jeff Layton wrote:
> @@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end,
>  		if (ret)
>  			return ret;
>  		if (gfs2_is_jdata(ip))
> -			filemap_write_and_wait(mapping);
> +			ret = file_write_and_wait(file);
> +		if (ret)
> +			return ret;
>  		gfs2_ail_flush(ip->i_gl, 1);
>  	}

Do we want to skip flushing the AIL if there was an error (possibly
previously encountered)?  I'd think we'd want to flush the AIL then report
the error, like this:

 		if (gfs2_is_jdata(ip))
-			filemap_write_and_wait(mapping);
+			ret = file_write_and_wait(file);
 		gfs2_ail_flush(ip->i_gl, 1);
+		if (ret)
+			return ret;
 	}

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-qk0-f200.google.com (mail-qk0-f200.google.com [209.85.220.200])
	by kanga.kvack.org (Postfix) with ESMTP id CCD136B025F
	for ; Wed, 26 Jul 2017 15:50:27 -0400 (EDT)
Received: by mail-qk0-f200.google.com with SMTP id o5so89849795qki.2
	for ; Wed, 26 Jul 2017 12:50:27 -0700 (PDT)
Received: from mx1.redhat.com (mx1.redhat.com. [209.132.183.28])
	by mx.google.com with ESMTPS id k30si3318001qtb.392.2017.07.26.12.50.27
	for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Wed, 26 Jul 2017 12:50:27 -0700 (PDT)
Date: Wed, 26 Jul 2017 15:50:22 -0400 (EDT)
From: Bob Peterson
Message-ID: <4829887.34737343.1501098622466.JavaMail.zimbra@redhat.com>
In-Reply-To: <20170726175538.13885-3-jlayton@kernel.org>
References: <20170726175538.13885-1-jlayton@kernel.org>
	<20170726175538.13885-3-jlayton@kernel.org>
Subject: Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Jeff Layton
Cc: Alexander Viro, Jan Kara, "J . Bruce Fields", Andrew Morton,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, Matthew Wilcox, Steven Whitehouse,
	cluster-devel@redhat.com

----- Original Message -----
| From: Jeff Layton
|
| Some filesystem fsync routines will need these.
|
| Signed-off-by: Jeff Layton
| ---
|  include/linux/fs.h |  7 ++++++-
|  mm/filemap.c       | 56
|  ++++++++++++++++++++++++++++++++++++++++++++++++++++++
|  2 files changed, 62 insertions(+), 1 deletion(-)
(snip)
| diff --git a/mm/filemap.c b/mm/filemap.c
| index 72e46e6f0d9a..b904a8dfa43d 100644
| --- a/mm/filemap.c
| +++ b/mm/filemap.c
(snip)
| @@ -675,6 +698,39 @@ int file_write_and_wait_range(struct file *file, loff_t
| lstart, loff_t lend)
|  EXPORT_SYMBOL(file_write_and_wait_range);
|
|  /**
| + * file_write_and_wait - write out whole file and wait on it and return any
| + *			 writeback errors since we last checked
| + * @file:	file to write back and wait on
| + *
| + * Write back the whole file and wait on its mapping. Afterward, check for
| + * errors that may have occurred since our file->f_wb_err cursor was last
| + * updated.
| + */
| +int file_write_and_wait(struct file *file)
| +{
| +	int err = 0, err2;
| +	struct address_space *mapping = file->f_mapping;
| +
| +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
| +	    (dax_mapping(mapping) && mapping->nrexceptional)) {

Seems like we should make the new function mapping_needs_writeback more
central (mm.h or fs.h?) and call it here ^.

| +		err = filemap_fdatawrite(mapping);
| +		/* See comment of filemap_write_and_wait() */
| +		if (err != -EIO) {
| +			loff_t i_size = i_size_read(mapping->host);
| +
| +			if (i_size != 0)
| +				__filemap_fdatawait_range(mapping, 0,
| +							  i_size - 1);
| +		}
| +	}
| +	err2 = file_check_and_advance_wb_err(file);
| +	if (!err)
| +		err = err2;
| +	return err;

In the past, I've seen more elegant constructs like:
	return (err ? err : err2);
but I don't know what's considered more ugly or hackish.

Regards,

Bob Peterson
Red Hat File Systems

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-qt0-f200.google.com (mail-qt0-f200.google.com [209.85.216.200])
	by kanga.kvack.org (Postfix) with ESMTP id 59A2F6B025F
	for ; Wed, 26 Jul 2017 18:18:33 -0400 (EDT)
Received: by mail-qt0-f200.google.com with SMTP id i19so58303670qte.5
	for ; Wed, 26 Jul 2017 15:18:33 -0700 (PDT)
Received: from mail-qt0-f174.google.com (mail-qt0-f174.google.com. [209.85.216.174])
	by mx.google.com with ESMTPS id v25si13872891qtf.92.2017.07.26.15.18.32
	for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Wed, 26 Jul 2017 15:18:32 -0700 (PDT)
Received: by mail-qt0-f174.google.com with SMTP id p3so52162751qtg.2
	for ; Wed, 26 Jul 2017 15:18:32 -0700 (PDT)
Message-ID: <1501107510.15159.4.camel@redhat.com>
Subject: Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
From: Jeff Layton
Date: Wed, 26 Jul 2017 18:18:30 -0400
In-Reply-To: <20170726191305.GC15980@bombadil.infradead.org>
References: <20170726175538.13885-1-jlayton@kernel.org>
	<20170726175538.13885-3-jlayton@kernel.org>
	<20170726191305.GC15980@bombadil.infradead.org>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Matthew Wilcox, Jeff Layton
Cc: Alexander Viro, Jan Kara, "J . Bruce Fields", Andrew Morton,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, Bob Peterson, Steven Whitehouse,
	cluster-devel@redhat.com

On Wed, 2017-07-26 at 12:13 -0700, Matthew Wilcox wrote:
> On Wed, Jul 26, 2017 at 01:55:36PM -0400, Jeff Layton wrote:
> > +int file_write_and_wait(struct file *file)
> > +{
> > +	int err = 0, err2;
> > +	struct address_space *mapping = file->f_mapping;
> > +
> > +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> > +	    (dax_mapping(mapping) && mapping->nrexceptional)) {
>
> Since patch 1 exists, shouldn't this use the new helper?
>

yes, will fix

> > +		err = filemap_fdatawrite(mapping);
> > +		/* See comment of filemap_write_and_wait() */
> > +		if (err != -EIO) {
> > +			loff_t i_size = i_size_read(mapping->host);
> > +
> > +			if (i_size != 0)
> > +				__filemap_fdatawait_range(mapping, 0,
> > +							  i_size - 1);
> > +		}
> > +	}
> > +	err2 = file_check_and_advance_wb_err(file);
> > +	if (!err)
> > +		err = err2;
> > +	return err;
>
> Would this be clearer written as:
>
> 	if (err)
> 		return err;
> 	return err2;
>
> or even ...
>
> 	return err ? err : err2;
>

Meh -- I like it the way I have it. If we don't have an error already,
then just take the one from the check and advance.

That said, I don't have a terribly strong preference here, so if anyone
does, then I can be easily persuaded.
-- 
-- 
Jeff Layton

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-qk0-f199.google.com (mail-qk0-f199.google.com [209.85.220.199])
	by kanga.kvack.org (Postfix) with ESMTP id 81EDE6B025F
	for ; Wed, 26 Jul 2017 18:22:56 -0400 (EDT)
Received: by mail-qk0-f199.google.com with SMTP id o124so68629114qke.9
	for ; Wed, 26 Jul 2017 15:22:56 -0700 (PDT)
Received: from mail-qk0-f171.google.com (mail-qk0-f171.google.com. [209.85.220.171])
	by mx.google.com with ESMTPS id f92si3086915qtd.528.2017.07.26.15.22.55
	for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Wed, 26 Jul 2017 15:22:55 -0700 (PDT)
Received: by mail-qk0-f171.google.com with SMTP id x191so27409176qka.5
	for ; Wed, 26 Jul 2017 15:22:55 -0700 (PDT)
Message-ID: <1501107773.15159.6.camel@redhat.com>
Subject: Re: [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync
From: Jeff Layton
Date: Wed, 26 Jul 2017 18:22:53 -0400
In-Reply-To: <20170726192105.GD15980@bombadil.infradead.org>
References: <20170726175538.13885-1-jlayton@kernel.org>
	<20170726175538.13885-5-jlayton@kernel.org>
	<20170726192105.GD15980@bombadil.infradead.org>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Matthew Wilcox, Jeff Layton
Cc: Alexander Viro, Jan Kara, "J . Bruce Fields", Andrew Morton,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, Bob Peterson, Steven Whitehouse,
	cluster-devel@redhat.com

On Wed, 2017-07-26 at 12:21 -0700, Matthew Wilcox wrote:
> On Wed, Jul 26, 2017 at 01:55:38PM -0400, Jeff Layton wrote:
> > @@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end,
> >  		if (ret)
> >  			return ret;
> >  		if (gfs2_is_jdata(ip))
> > -			filemap_write_and_wait(mapping);
> > +			ret = file_write_and_wait(file);
> > +		if (ret)
> > +			return ret;
> >  		gfs2_ail_flush(ip->i_gl, 1);
> >  	}
>
> Do we want to skip flushing the AIL if there was an error (possibly
> previously encountered)?  I'd think we'd want to flush the AIL then report
> the error, like this:
>

I wondered about that. Note that earlier in the function, we also bail
out without flushing the AIL if sync_inode_metadata fails, so I assumed
that we'd want to do the same here.

I could definitely be wrong and am fine with changing it if so.
Discarding the error like we do today seems wrong though.

Bob, thoughts? > if (gfs2_is_jdata(ip)) > - filemap_write_and_wait(mapping); > + ret = file_write_and_wait(file); > gfs2_ail_flush(ip->i_gl, 1); > + if (ret) > + return ret; > } -- Jeff Layton -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f69.google.com (mail-wm0-f69.google.com [74.125.82.69]) by kanga.kvack.org (Postfix) with ESMTP id AF6676B0292 for ; Thu, 27 Jul 2017 04:43:44 -0400 (EDT) Received: by mail-wm0-f69.google.com with SMTP id g71so12803126wmg.13 for ; Thu, 27 Jul 2017 01:43:44 -0700 (PDT) Received: from mx1.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id s78si7478423wma.251.2017.07.27.01.43.43 for (version=TLS1 cipher=AES128-SHA bits=128/128); Thu, 27 Jul 2017 01:43:43 -0700 (PDT) Date: Thu, 27 Jul 2017 10:43:41 +0200 From: Jan Kara Subject: Re: [PATCH v2 1/4] mm: consolidate dax / non-dax checks for writeback Message-ID: <20170727084341.GB21100@quack2.suse.cz> References: <20170726175538.13885-1-jlayton@kernel.org> <20170726175538.13885-2-jlayton@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170726175538.13885-2-jlayton@kernel.org> Sender: owner-linux-mm@kvack.org List-ID: To: Jeff Layton Cc: Alexander Viro , Jan Kara , "J . Bruce Fields" , Andrew Morton , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Matthew Wilcox , Bob Peterson , Steven Whitehouse , cluster-devel@redhat.com On Wed 26-07-17 13:55:35, Jeff Layton wrote: > From: Jeff Layton > > We have this complex conditional copied to several places. Turn it into > a helper function. > > Signed-off-by: Jeff Layton Looks good. 
You can add: Reviewed-by: Jan Kara Honza > --- > mm/filemap.c | 15 +++++++++------ > 1 file changed, 9 insertions(+), 6 deletions(-) > > diff --git a/mm/filemap.c b/mm/filemap.c > index e1cca770688f..72e46e6f0d9a 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -522,12 +522,17 @@ int filemap_fdatawait(struct address_space *mapping) > } > EXPORT_SYMBOL(filemap_fdatawait); > > +static bool mapping_needs_writeback(struct address_space *mapping) > +{ > + return (!dax_mapping(mapping) && mapping->nrpages) || > + (dax_mapping(mapping) && mapping->nrexceptional); > +} > + > int filemap_write_and_wait(struct address_space *mapping) > { > int err = 0; > > - if ((!dax_mapping(mapping) && mapping->nrpages) || > - (dax_mapping(mapping) && mapping->nrexceptional)) { > + if (mapping_needs_writeback(mapping)) { > err = filemap_fdatawrite(mapping); > /* > * Even if the above returned error, the pages may be > @@ -566,8 +571,7 @@ int filemap_write_and_wait_range(struct address_space *mapping, > { > int err = 0; > > - if ((!dax_mapping(mapping) && mapping->nrpages) || > - (dax_mapping(mapping) && mapping->nrexceptional)) { > + if (mapping_needs_writeback(mapping)) { > err = __filemap_fdatawrite_range(mapping, lstart, lend, > WB_SYNC_ALL); > /* See comment of filemap_write_and_wait() */ > @@ -656,8 +660,7 @@ int file_write_and_wait_range(struct file *file, loff_t lstart, loff_t lend) > int err = 0, err2; > struct address_space *mapping = file->f_mapping; > > - if ((!dax_mapping(mapping) && mapping->nrpages) || > - (dax_mapping(mapping) && mapping->nrexceptional)) { > + if (mapping_needs_writeback(mapping)) { > err = __filemap_fdatawrite_range(mapping, lstart, lend, > WB_SYNC_ALL); > /* See comment of filemap_write_and_wait() */ > -- > 2.13.3 > -- Jan Kara SUSE Labs, CR
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f72.google.com (mail-wm0-f72.google.com [74.125.82.72]) by kanga.kvack.org (Postfix) with ESMTP id 7F80B6B0292 for ; Thu, 27 Jul 2017 04:49:16 -0400 (EDT) Received: by mail-wm0-f72.google.com with SMTP id 185so7152417wmk.12 for ; Thu, 27 Jul 2017 01:49:16 -0700 (PDT) Received: from mx1.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id t16si7708005wra.201.2017.07.27.01.49.15 for (version=TLS1 cipher=AES128-SHA bits=128/128); Thu, 27 Jul 2017 01:49:15 -0700 (PDT) Date: Thu, 27 Jul 2017 10:49:14 +0200 From: Jan Kara Subject: Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait Message-ID: <20170727084914.GC21100@quack2.suse.cz> References: <20170726175538.13885-1-jlayton@kernel.org> <20170726175538.13885-3-jlayton@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170726175538.13885-3-jlayton@kernel.org> Sender: owner-linux-mm@kvack.org List-ID: To: Jeff Layton Cc: Alexander Viro , Jan Kara , "J . Bruce Fields" , Andrew Morton , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Matthew Wilcox , Bob Peterson , Steven Whitehouse , cluster-devel@redhat.com On Wed 26-07-17 13:55:36, Jeff Layton wrote: > +int file_write_and_wait(struct file *file) > +{ > + int err = 0, err2; > + struct address_space *mapping = file->f_mapping; > + > + if ((!dax_mapping(mapping) && mapping->nrpages) || > + (dax_mapping(mapping) && mapping->nrexceptional)) { > + err = filemap_fdatawrite(mapping); > + /* See comment of filemap_write_and_wait() */ > + if (err != -EIO) { > + loff_t i_size = i_size_read(mapping->host); > + > + if (i_size != 0) > + __filemap_fdatawait_range(mapping, 0, > + i_size - 1); > + } > + } Err, what's the i_size check doing here? I'd just pass ~0 as the end of the range and ignore i_size.
It is much easier than trying to wrap your head around possible races with file operations modifying i_size. Honza -- Jan Kara SUSE Labs, CR From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qk0-f197.google.com (mail-qk0-f197.google.com [209.85.220.197]) by kanga.kvack.org (Postfix) with ESMTP id 634136B025F for ; Thu, 27 Jul 2017 08:47:14 -0400 (EDT) Received: by mail-qk0-f197.google.com with SMTP id p135so16960095qke.0 for ; Thu, 27 Jul 2017 05:47:14 -0700 (PDT) Received: from mx1.redhat.com (mx1.redhat.com. [209.132.183.28]) by mx.google.com with ESMTPS id 28si15006437qts.227.2017.07.27.05.47.13 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 27 Jul 2017 05:47:13 -0700 (PDT) Date: Thu, 27 Jul 2017 08:47:08 -0400 (EDT) From: Bob Peterson Message-ID: <932895023.34932662.1501159628674.JavaMail.zimbra@redhat.com> In-Reply-To: <1501107773.15159.6.camel@redhat.com> References: <20170726175538.13885-1-jlayton@kernel.org> <20170726175538.13885-5-jlayton@kernel.org> <20170726192105.GD15980@bombadil.infradead.org> <1501107773.15159.6.camel@redhat.com> Subject: Re: [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Jeff Layton Cc: Matthew Wilcox , Jeff Layton , Alexander Viro , Jan Kara , "J . 
Bruce Fields" , Andrew Morton , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Steven Whitehouse , cluster-devel@redhat.com ----- Original Message ----- | On Wed, 2017-07-26 at 12:21 -0700, Matthew Wilcox wrote: | > On Wed, Jul 26, 2017 at 01:55:38PM -0400, Jeff Layton wrote: | > > @@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t | > > start, loff_t end, | > > if (ret) | > > return ret; | > > if (gfs2_is_jdata(ip)) | > > - filemap_write_and_wait(mapping); | > > + ret = file_write_and_wait(file); | > > + if (ret) | > > + return ret; | > > gfs2_ail_flush(ip->i_gl, 1); | > > } | > | > Do we want to skip flushing the AIL if there was an error (possibly | > previously encountered)? I'd think we'd want to flush the AIL then report | > the error, like this: | > | | I wondered about that. Note that earlier in the function, we also bail | out without flushing the AIL if sync_inode_metadata fails, so I assumed | that we'd want to do the same here. | | I could definitely be wrong and am fine with changing it if so. | Discarding the error like we do today seems wrong though. | | Bob, thoughts? Hi Jeff, Matthew, I'm not sure there's a right or wrong answer here. I don't know what's best from a "correctness" point of view. I guess I'm leaning toward Jeff's original solution where we don't call gfs2_ail_flush() on error. The main purpose of ail_flush is to go through buffer descriptors (bds) attached to the glock and generate revokes for them in a new transaction. If there's an error condition, trying to go through more hoops will probably just get us into more trouble. If the error is -ENOMEM, we don't want to allocate new memory for the new transaction. If the error is -EIO, we probably don't want to encourage more writing either. So on the one hand, it might be good to get rid of the buffer descriptors so we don't leak memory, but that's probably also done elsewhere. 
I have not chased down what happens in that case, but the same thing would happen in the existing -EIO case a few lines above. On the other hand, we probably don't want to start a new transaction and start adding revokes to it, and such, due to the error. Perhaps Steve Whitehouse can weigh in? Regards, Bob Peterson Red Hat File Systems From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qk0-f199.google.com (mail-qk0-f199.google.com [209.85.220.199]) by kanga.kvack.org (Postfix) with ESMTP id F23406B025F for ; Thu, 27 Jul 2017 08:48:32 -0400 (EDT) Received: by mail-qk0-f199.google.com with SMTP id c2so3990436qkb.10 for ; Thu, 27 Jul 2017 05:48:32 -0700 (PDT) Received: from mail-qt0-f177.google.com (mail-qt0-f177.google.com. [209.85.216.177]) by mx.google.com with ESMTPS id j21si15861534qtf.103.2017.07.27.05.48.32 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 27 Jul 2017 05:48:32 -0700 (PDT) Received: by mail-qt0-f177.google.com with SMTP id v29so41614658qtv.3 for ; Thu, 27 Jul 2017 05:48:32 -0700 (PDT) Message-ID: <1501159710.6279.1.camel@redhat.com> Subject: Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait From: Jeff Layton Date: Thu, 27 Jul 2017 08:48:30 -0400 In-Reply-To: <20170727084914.GC21100@quack2.suse.cz> References: <20170726175538.13885-1-jlayton@kernel.org> <20170726175538.13885-3-jlayton@kernel.org> <20170727084914.GC21100@quack2.suse.cz> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Jan Kara , Jeff Layton Cc: Alexander Viro , "J . 
Bruce Fields" , Andrew Morton , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Matthew Wilcox , Bob Peterson , Steven Whitehouse , cluster-devel@redhat.com On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote: > On Wed 26-07-17 13:55:36, Jeff Layton wrote: > > +int file_write_and_wait(struct file *file) > > +{ > > + int err = 0, err2; > > + struct address_space *mapping = file->f_mapping; > > + > > + if ((!dax_mapping(mapping) && mapping->nrpages) || > > + (dax_mapping(mapping) && mapping->nrexceptional)) { > > + err = filemap_fdatawrite(mapping); > > + /* See comment of filemap_write_and_wait() */ > > + if (err != -EIO) { > > + loff_t i_size = i_size_read(mapping->host); > > + > > + if (i_size != 0) > > + __filemap_fdatawait_range(mapping, 0, > > + i_size - 1); > > + } > > + } > > Err, what's the i_size check doing here? I'd just pass ~0 as the end of the > range and ignore i_size. It is much easier than trying to wrap your head > around possible races with file operations modifying i_size. > > Honza I'm basically emulating _exactly_ what filemap_write_and_wait does here, as I'm leery of making subtle behavior changes in the actual writeback behavior. For example: -----------------8<---------------- static inline int __filemap_fdatawrite(struct address_space *mapping, int sync_mode) { return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode); } int filemap_fdatawrite(struct address_space *mapping) { return __filemap_fdatawrite(mapping, WB_SYNC_ALL); } EXPORT_SYMBOL(filemap_fdatawrite); -----------------8<---------------- ...which then sets up the wbc with the right ranges and sync mode and kicks off writepages. But then, it does the i_size_read to figure out what range it should wait on (with the shortcut for the size == 0 case). My assumption was that it was intentionally designed that way, but I'm guessing from your comments that it wasn't? 
If so, then we can turn file_write_and_wait into a static inline wrapper around file_write_and_wait_range. -- Jeff Layton From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qt0-f199.google.com (mail-qt0-f199.google.com [209.85.216.199]) by kanga.kvack.org (Postfix) with ESMTP id 4D2996B0535 for ; Fri, 28 Jul 2017 08:37:15 -0400 (EDT) Received: by mail-qt0-f199.google.com with SMTP id t37so91645028qtg.6 for ; Fri, 28 Jul 2017 05:37:15 -0700 (PDT) Received: from mx1.redhat.com (mx1.redhat.com. [209.132.183.28]) by mx.google.com with ESMTPS id x3si10231243qte.285.2017.07.28.05.37.14 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 28 Jul 2017 05:37:14 -0700 (PDT) Subject: Re: [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync References: <20170726175538.13885-1-jlayton@kernel.org> <20170726175538.13885-5-jlayton@kernel.org> <20170726192105.GD15980@bombadil.infradead.org> <1501107773.15159.6.camel@redhat.com> <932895023.34932662.1501159628674.JavaMail.zimbra@redhat.com> From: Steven Whitehouse Message-ID: <16d62583-f677-bc34-dccf-d20d9405ca10@redhat.com> Date: Fri, 28 Jul 2017 13:37:05 +0100 MIME-Version: 1.0 In-Reply-To: <932895023.34932662.1501159628674.JavaMail.zimbra@redhat.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Bob Peterson , Jeff Layton Cc: Matthew Wilcox , Jeff Layton , Alexander Viro , Jan Kara , "J . 
Bruce Fields" , Andrew Morton , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, cluster-devel@redhat.com Hi, On 27/07/17 13:47, Bob Peterson wrote: > ----- Original Message ----- > | On Wed, 2017-07-26 at 12:21 -0700, Matthew Wilcox wrote: > | > On Wed, Jul 26, 2017 at 01:55:38PM -0400, Jeff Layton wrote: > | > > @@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t > | > > start, loff_t end, > | > > if (ret) > | > > return ret; > | > > if (gfs2_is_jdata(ip)) > | > > - filemap_write_and_wait(mapping); > | > > + ret = file_write_and_wait(file); > | > > + if (ret) > | > > + return ret; > | > > gfs2_ail_flush(ip->i_gl, 1); > | > > } > | > > | > Do we want to skip flushing the AIL if there was an error (possibly > | > previously encountered)? I'd think we'd want to flush the AIL then report > | > the error, like this: > | > > | > | I wondered about that. Note that earlier in the function, we also bail > | out without flushing the AIL if sync_inode_metadata fails, so I assumed > | that we'd want to do the same here. > | > | I could definitely be wrong and am fine with changing it if so. > | Discarding the error like we do today seems wrong though. > | > | Bob, thoughts? > > Hi Jeff, Matthew, > > I'm not sure there's a right or wrong answer here. I don't know what's > best from a "correctness" point of view. > > I guess I'm leaning toward Jeff's original solution where we don't > call gfs2_ail_flush() on error. The main purpose of ail_flush is to > go through buffer descriptors (bds) attached to the glock and generate > revokes for them in a new transaction. If there's an error condition, > trying to go through more hoops will probably just get us into more > trouble. If the error is -ENOMEM, we don't want to allocate new memory > for the new transaction. If the error is -EIO, we probably don't > want to encourage more writing either. 
> > So on the one hand, it might be good to get rid of the buffer descriptors > so we don't leak memory, but that's probably also done elsewhere. > I have not chased down what happens in that case, but the same thing > would happen in the existing -EIO case a few lines above. > > On the other hand, we probably don't want to start a new transaction > and start adding revokes to it, and such, due to the error. > > Perhaps Steve Whitehouse can weigh in? > > Regards, > > Bob Peterson > Red Hat File Systems Yes, we probably do want to skip the ail flush if there is an error. We don't know whether the error is permanent or transient at that stage. If a previous stage of the fsync has failed, then there may be nothing for the next stage to do anyway, so it is probably not a big deal either way. So long as the error is reported to the caller, then we should be ok, Steve. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yw0-f198.google.com (mail-yw0-f198.google.com [209.85.161.198]) by kanga.kvack.org (Postfix) with ESMTP id 17BF26B0539 for ; Fri, 28 Jul 2017 08:47:40 -0400 (EDT) Received: by mail-yw0-f198.google.com with SMTP id f72so227693001ywb.4 for ; Fri, 28 Jul 2017 05:47:40 -0700 (PDT) Received: from mail-yw0-f173.google.com (mail-yw0-f173.google.com. 
[209.85.161.173]) by mx.google.com with ESMTPS id g7si5074027ywf.26.2017.07.28.05.47.39 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 28 Jul 2017 05:47:39 -0700 (PDT) Received: by mail-yw0-f173.google.com with SMTP id u207so61586777ywc.3 for ; Fri, 28 Jul 2017 05:47:39 -0700 (PDT) Message-ID: <1501246057.8241.1.camel@redhat.com> Subject: Re: [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync From: Jeff Layton Date: Fri, 28 Jul 2017 08:47:37 -0400 In-Reply-To: <16d62583-f677-bc34-dccf-d20d9405ca10@redhat.com> References: <20170726175538.13885-1-jlayton@kernel.org> <20170726175538.13885-5-jlayton@kernel.org> <20170726192105.GD15980@bombadil.infradead.org> <1501107773.15159.6.camel@redhat.com> <932895023.34932662.1501159628674.JavaMail.zimbra@redhat.com> <16d62583-f677-bc34-dccf-d20d9405ca10@redhat.com> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Steven Whitehouse , Bob Peterson Cc: Matthew Wilcox , Jeff Layton , Alexander Viro , Jan Kara , "J . 
Bruce Fields" , Andrew Morton , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, cluster-devel@redhat.com On Fri, 2017-07-28 at 13:37 +0100, Steven Whitehouse wrote: > Hi, > > > On 27/07/17 13:47, Bob Peterson wrote: > > ----- Original Message ----- > > > On Wed, 2017-07-26 at 12:21 -0700, Matthew Wilcox wrote: > > > > On Wed, Jul 26, 2017 at 01:55:38PM -0400, Jeff Layton wrote: > > > > > @@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t > > > > > start, loff_t end, > > > > > if (ret) > > > > > return ret; > > > > > if (gfs2_is_jdata(ip)) > > > > > - filemap_write_and_wait(mapping); > > > > > + ret = file_write_and_wait(file); > > > > > + if (ret) > > > > > + return ret; > > > > > gfs2_ail_flush(ip->i_gl, 1); > > > > > } > > > > > > > > Do we want to skip flushing the AIL if there was an error (possibly > > > > previously encountered)? I'd think we'd want to flush the AIL then report > > > > the error, like this: > > > > > > > > > > I wondered about that. Note that earlier in the function, we also bail > > > out without flushing the AIL if sync_inode_metadata fails, so I assumed > > > that we'd want to do the same here. > > > > > > I could definitely be wrong and am fine with changing it if so. > > > Discarding the error like we do today seems wrong though. > > > > > > Bob, thoughts? > > > > Hi Jeff, Matthew, > > > > I'm not sure there's a right or wrong answer here. I don't know what's > > best from a "correctness" point of view. > > > > I guess I'm leaning toward Jeff's original solution where we don't > > call gfs2_ail_flush() on error. The main purpose of ail_flush is to > > go through buffer descriptors (bds) attached to the glock and generate > > revokes for them in a new transaction. If there's an error condition, > > trying to go through more hoops will probably just get us into more > > trouble. If the error is -ENOMEM, we don't want to allocate new memory > > for the new transaction. 
If the error is -EIO, we probably don't > > want to encourage more writing either. > > > > So on the one hand, it might be good to get rid of the buffer descriptors > > so we don't leak memory, but that's probably also done elsewhere. > > I have not chased down what happens in that case, but the same thing > > would happen in the existing -EIO case a few lines above. > > > > On the other hand, we probably don't want to start a new transaction > > and start adding revokes to it, and such, due to the error. > > > > Perhaps Steve Whitehouse can weigh in? > > > > Regards, > > > > Bob Peterson > > Red Hat File Systems > > Yes, we probably do want to skip the ail flush if there is an error. We > don't know whether the error is permanent or transient at that stage. If > a previous stage of the fsync has failed, then there may be nothing for > the next stage to do anyway, so it is probably not a big deal either > way. So long as the error is reported to the caller, then we should be ok, > Ok, cool. I'll plan to carry this patch for now as it depends on an earlier one in the series. One more question though: Is it correct in the gfs2_is_jdata case to ignore the range that was passed in from the caller? ->fsync gets start and end arguments, but this will always write back the whole range. Is that necessary in this case? -- Jeff Layton From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qt0-f200.google.com (mail-qt0-f200.google.com [209.85.216.200]) by kanga.kvack.org (Postfix) with ESMTP id 080F76B053B for ; Fri, 28 Jul 2017 08:54:46 -0400 (EDT) Received: by mail-qt0-f200.google.com with SMTP id u19so99380844qtc.14 for ; Fri, 28 Jul 2017 05:54:46 -0700 (PDT) Received: from mx1.redhat.com (mx1.redhat.com. 
[209.132.183.28]) by mx.google.com with ESMTPS id 125si12839058qkf.478.2017.07.28.05.54.45 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 28 Jul 2017 05:54:45 -0700 (PDT) Subject: Re: [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync References: <20170726175538.13885-1-jlayton@kernel.org> <20170726175538.13885-5-jlayton@kernel.org> <20170726192105.GD15980@bombadil.infradead.org> <1501107773.15159.6.camel@redhat.com> <932895023.34932662.1501159628674.JavaMail.zimbra@redhat.com> <16d62583-f677-bc34-dccf-d20d9405ca10@redhat.com> <1501246057.8241.1.camel@redhat.com> From: Steven Whitehouse Message-ID: <1d8d38a4-38a4-cf65-5ecd-ec9410f7f504@redhat.com> Date: Fri, 28 Jul 2017 13:54:18 +0100 MIME-Version: 1.0 In-Reply-To: <1501246057.8241.1.camel@redhat.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Jeff Layton , Bob Peterson Cc: Matthew Wilcox , Jeff Layton , Alexander Viro , Jan Kara , "J . Bruce Fields" , Andrew Morton , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, cluster-devel@redhat.com Hi, On 28/07/17 13:47, Jeff Layton wrote: > On Fri, 2017-07-28 at 13:37 +0100, Steven Whitehouse wrote: >> Hi, >> >> >> On 27/07/17 13:47, Bob Peterson wrote: >>> ----- Original Message ----- >>>> On Wed, 2017-07-26 at 12:21 -0700, Matthew Wilcox wrote: >>>>> On Wed, Jul 26, 2017 at 01:55:38PM -0400, Jeff Layton wrote: >>>>>> @@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t >>>>>> start, loff_t end, >>>>>> if (ret) >>>>>> return ret; >>>>>> if (gfs2_is_jdata(ip)) >>>>>> - filemap_write_and_wait(mapping); >>>>>> + ret = file_write_and_wait(file); >>>>>> + if (ret) >>>>>> + return ret; >>>>>> gfs2_ail_flush(ip->i_gl, 1); >>>>>> } >>>>> Do we want to skip flushing the AIL if there was an error (possibly >>>>> previously encountered)? 
I'd think we'd want to flush the AIL then report >>>>> the error, like this: >>>>> >>>> I wondered about that. Note that earlier in the function, we also bail >>>> out without flushing the AIL if sync_inode_metadata fails, so I assumed >>>> that we'd want to do the same here. >>>> >>>> I could definitely be wrong and am fine with changing it if so. >>>> Discarding the error like we do today seems wrong though. >>>> >>>> Bob, thoughts? >>> Hi Jeff, Matthew, >>> >>> I'm not sure there's a right or wrong answer here. I don't know what's >>> best from a "correctness" point of view. >>> >>> I guess I'm leaning toward Jeff's original solution where we don't >>> call gfs2_ail_flush() on error. The main purpose of ail_flush is to >>> go through buffer descriptors (bds) attached to the glock and generate >>> revokes for them in a new transaction. If there's an error condition, >>> trying to go through more hoops will probably just get us into more >>> trouble. If the error is -ENOMEM, we don't want to allocate new memory >>> for the new transaction. If the error is -EIO, we probably don't >>> want to encourage more writing either. >>> >>> So on the one hand, it might be good to get rid of the buffer descriptors >>> so we don't leak memory, but that's probably also done elsewhere. >>> I have not chased down what happens in that case, but the same thing >>> would happen in the existing -EIO case a few lines above. >>> >>> On the other hand, we probably don't want to start a new transaction >>> and start adding revokes to it, and such, due to the error. >>> >>> Perhaps Steve Whitehouse can weigh in? >>> >>> Regards, >>> >>> Bob Peterson >>> Red Hat File Systems >> Yes, we probably do want to skip the ail flush if there is an error. We >> don't know whether the error is permanent or transient at that stage. If >> a previous stage of the fsync has failed, then there may be nothing for >> the next stage to do anyway, so it is probably not a big deal either >> way. 
So long as the error is reported to the caller, then we should be ok, >> > Ok, cool. I'll plan to carry this patch for now as it depends on an > earlier one in the series. One more question though: > > Is it correct in the gfs2_is_jdata case to ignore the range that was > passed in from the caller? ->fsync gets start and end arguments, but > this will always write back the whole range. Is that necessary in this > case? > It probably doesn't matter really. We try to discourage the use of jdata from userspace. There are a few internal files that use it still, and it is there for backwards compatibility more than anything. So performance is generally not a problem for that. The ordered write mode is the important one. So you are right that it might be better to add the range into that call too, but it is not likely that anybody will notice the performance improvement, Steve. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qt0-f197.google.com (mail-qt0-f197.google.com [209.85.216.197]) by kanga.kvack.org (Postfix) with ESMTP id A3FEB6B05DF for ; Mon, 31 Jul 2017 07:27:04 -0400 (EDT) Received: by mail-qt0-f197.google.com with SMTP id t37so123352083qtg.6 for ; Mon, 31 Jul 2017 04:27:04 -0700 (PDT) Received: from mail-qk0-f171.google.com (mail-qk0-f171.google.com. 
[209.85.220.171]) by mx.google.com with ESMTPS id v194si17295722qka.416.2017.07.31.04.27.03 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 31 Jul 2017 04:27:03 -0700 (PDT) Received: by mail-qk0-f171.google.com with SMTP id x191so79287765qka.5 for ; Mon, 31 Jul 2017 04:27:03 -0700 (PDT) Message-ID: <1501500421.4663.4.camel@redhat.com> Subject: Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait From: Jeff Layton Date: Mon, 31 Jul 2017 07:27:01 -0400 In-Reply-To: <1501159710.6279.1.camel@redhat.com> References: <20170726175538.13885-1-jlayton@kernel.org> <20170726175538.13885-3-jlayton@kernel.org> <20170727084914.GC21100@quack2.suse.cz> <1501159710.6279.1.camel@redhat.com> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Jan Kara , Marcelo Tosatti Cc: Alexander Viro , "J . Bruce Fields" , Andrew Morton , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Matthew Wilcox , Bob Peterson , Steven Whitehouse , cluster-devel@redhat.com On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote: > On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote: > > On Wed 26-07-17 13:55:36, Jeff Layton wrote: > > > +int file_write_and_wait(struct file *file) > > > +{ > > > + int err = 0, err2; > > > + struct address_space *mapping = file->f_mapping; > > > + > > > + if ((!dax_mapping(mapping) && mapping->nrpages) || > > > + (dax_mapping(mapping) && mapping->nrexceptional)) { > > > + err = filemap_fdatawrite(mapping); > > > + /* See comment of filemap_write_and_wait() */ > > > + if (err != -EIO) { > > > + loff_t i_size = i_size_read(mapping->host); > > > + > > > + if (i_size != 0) > > > + __filemap_fdatawait_range(mapping, 0, > > > + i_size - 1); > > > + } > > > + } > > > > Err, what's the i_size check doing here? I'd just pass ~0 as the end of the > > range and ignore i_size. 
It is much easier than trying to wrap your head > > around possible races with file operations modifying i_size. > > > > Honza > > I'm basically emulating _exactly_ what filemap_write_and_wait does here, > as I'm leery of making subtle behavior changes in the actual writeback > behavior. For example: > > -----------------8<---------------- > static inline int __filemap_fdatawrite(struct address_space *mapping, > int sync_mode) > { > return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode); > } > > int filemap_fdatawrite(struct address_space *mapping) > { > return __filemap_fdatawrite(mapping, WB_SYNC_ALL); > } > EXPORT_SYMBOL(filemap_fdatawrite); > -----------------8<---------------- > > ...which then sets up the wbc with the right ranges and sync mode and > kicks off writepages. But then, it does the i_size_read to figure out > what range it should wait on (with the shortcut for the size == 0 case). > > My assumption was that it was intentionally designed that way, but I'm > guessing from your comments that it wasn't? If so, then we can turn > file_write_and_wait a static inline wrapper around > file_write_and_wait_range. FWIW, I did a bit of archaeology in the linux-history tree and found this patch from Marcelo in 2004. Is this optimization still helpful? If not, then that does simplify the code a bit. -------------------8<-------------------- [PATCH] small wait_on_page_writeback_range() optimization filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end" parameter. This is not needed since we know the EOF from the inode. Use that instead. 
Signed-off-by: Marcelo Tosatti Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/filemap.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/mm/filemap.c b/mm/filemap.c index 78e18b7639b6..55fb7b4141e4 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range); */ int filemap_fdatawait(struct address_space *mapping) { - return wait_on_page_writeback_range(mapping, 0, -1); + loff_t i_size = i_size_read(mapping->host); + + if (i_size == 0) + return 0; + + return wait_on_page_writeback_range(mapping, 0, + (i_size - 1) >> PAGE_CACHE_SHIFT); } EXPORT_SYMBOL(filemap_fdatawait); From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qt0-f198.google.com (mail-qt0-f198.google.com [209.85.216.198]) by kanga.kvack.org (Postfix) with ESMTP id 804D66B05E1 for ; Mon, 31 Jul 2017 07:32:39 -0400 (EDT) Received: by mail-qt0-f198.google.com with SMTP id l22so103709317qtf.9 for ; Mon, 31 Jul 2017 04:32:39 -0700 (PDT) Received: from mx1.redhat.com (mx1.redhat.com. 
[209.132.183.28]) by mx.google.com with ESMTPS id d56si24315090qtf.418.2017.07.31.04.32.38 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 31 Jul 2017 04:32:38 -0700 (PDT) Subject: Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait References: <20170726175538.13885-1-jlayton@kernel.org> <20170726175538.13885-3-jlayton@kernel.org> <20170727084914.GC21100@quack2.suse.cz> <1501159710.6279.1.camel@redhat.com> <1501500421.4663.4.camel@redhat.com> From: Steven Whitehouse Message-ID: <8d46c4c6-76b5-9726-7d85-249cd9a899f1@redhat.com> Date: Mon, 31 Jul 2017 12:32:31 +0100 MIME-Version: 1.0 In-Reply-To: <1501500421.4663.4.camel@redhat.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Jeff Layton , Jan Kara , Marcelo Tosatti Cc: Alexander Viro , "J . Bruce Fields" , Andrew Morton , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Matthew Wilcox , Bob Peterson , cluster-devel@redhat.com Hi, On 31/07/17 12:27, Jeff Layton wrote: > On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote: >> On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote: >>> On Wed 26-07-17 13:55:36, Jeff Layton wrote: >>>> +int file_write_and_wait(struct file *file) >>>> +{ >>>> + int err = 0, err2; >>>> + struct address_space *mapping = file->f_mapping; >>>> + >>>> + if ((!dax_mapping(mapping) && mapping->nrpages) || >>>> + (dax_mapping(mapping) && mapping->nrexceptional)) { >>>> + err = filemap_fdatawrite(mapping); >>>> + /* See comment of filemap_write_and_wait() */ >>>> + if (err != -EIO) { >>>> + loff_t i_size = i_size_read(mapping->host); >>>> + >>>> + if (i_size != 0) >>>> + __filemap_fdatawait_range(mapping, 0, >>>> + i_size - 1); >>>> + } >>>> + } >>> Err, what's the i_size check doing here? I'd just pass ~0 as the end of the >>> range and ignore i_size. 
It is much easier than trying to wrap your head >>> around possible races with file operations modifying i_size. >>> >>> Honza >> I'm basically emulating _exactly_ what filemap_write_and_wait does here, >> as I'm leery of making subtle behavior changes in the actual writeback >> behavior. For example: >> >> -----------------8<---------------- >> static inline int __filemap_fdatawrite(struct address_space *mapping, >> int sync_mode) >> { >> return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode); >> } >> >> int filemap_fdatawrite(struct address_space *mapping) >> { >> return __filemap_fdatawrite(mapping, WB_SYNC_ALL); >> } >> EXPORT_SYMBOL(filemap_fdatawrite); >> -----------------8<---------------- >> >> ...which then sets up the wbc with the right ranges and sync mode and >> kicks off writepages. But then, it does the i_size_read to figure out >> what range it should wait on (with the shortcut for the size == 0 case). >> >> My assumption was that it was intentionally designed that way, but I'm >> guessing from your comments that it wasn't? If so, then we can turn >> file_write_and_wait a static inline wrapper around >> file_write_and_wait_range. > FWIW, I did a bit of archaeology in the linux-history tree and found > this patch from Marcelo in 2004. Is this optimization still helpful? If > not, then that does simplify the code a bit. > > -------------------8<-------------------- > > [PATCH] small wait_on_page_writeback_range() optimization > > filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end" > parameter. This is not needed since we know the EOF from the inode. Use > that instead. 
> > Signed-off-by: Marcelo Tosatti > Signed-off-by: Andrew Morton > Signed-off-by: Linus Torvalds > --- > mm/filemap.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/mm/filemap.c b/mm/filemap.c > index 78e18b7639b6..55fb7b4141e4 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range); > */ > int filemap_fdatawait(struct address_space *mapping) > { > - return wait_on_page_writeback_range(mapping, 0, -1); > + loff_t i_size = i_size_read(mapping->host); > + > + if (i_size == 0) > + return 0; > + > + return wait_on_page_writeback_range(mapping, 0, > + (i_size - 1) >> PAGE_CACHE_SHIFT); > } > EXPORT_SYMBOL(filemap_fdatawait); > Does this ever get called in cases where we would not hold fs locks? In that case we definitely don't want to be relying on i_size, Steve. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qk0-f199.google.com (mail-qk0-f199.google.com [209.85.220.199]) by kanga.kvack.org (Postfix) with ESMTP id D83626B05E3 for ; Mon, 31 Jul 2017 07:44:19 -0400 (EDT) Received: by mail-qk0-f199.google.com with SMTP id o65so134439966qkl.12 for ; Mon, 31 Jul 2017 04:44:19 -0700 (PDT) Received: from mail-qk0-f175.google.com (mail-qk0-f175.google.com. 
[209.85.220.175]) by mx.google.com with ESMTPS id q5si13968732qte.136.2017.07.31.04.44.19 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 31 Jul 2017 04:44:19 -0700 (PDT) Received: by mail-qk0-f175.google.com with SMTP id x191so79497082qka.5 for ; Mon, 31 Jul 2017 04:44:19 -0700 (PDT) Message-ID: <1501501456.4663.6.camel@redhat.com> Subject: Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait From: Jeff Layton Date: Mon, 31 Jul 2017 07:44:16 -0400 In-Reply-To: <8d46c4c6-76b5-9726-7d85-249cd9a899f1@redhat.com> References: <20170726175538.13885-1-jlayton@kernel.org> <20170726175538.13885-3-jlayton@kernel.org> <20170727084914.GC21100@quack2.suse.cz> <1501159710.6279.1.camel@redhat.com> <1501500421.4663.4.camel@redhat.com> <8d46c4c6-76b5-9726-7d85-249cd9a899f1@redhat.com> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Steven Whitehouse , Jan Kara , Marcelo Tosatti Cc: Alexander Viro , "J . 
Bruce Fields" , Andrew Morton , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Matthew Wilcox , Bob Peterson , cluster-devel@redhat.com On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote: > Hi, > > > On 31/07/17 12:27, Jeff Layton wrote: > > On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote: > > > On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote: > > > > On Wed 26-07-17 13:55:36, Jeff Layton wrote: > > > > > +int file_write_and_wait(struct file *file) > > > > > +{ > > > > > + int err = 0, err2; > > > > > + struct address_space *mapping = file->f_mapping; > > > > > + > > > > > + if ((!dax_mapping(mapping) && mapping->nrpages) || > > > > > + (dax_mapping(mapping) && mapping->nrexceptional)) { > > > > > + err = filemap_fdatawrite(mapping); > > > > > + /* See comment of filemap_write_and_wait() */ > > > > > + if (err != -EIO) { > > > > > + loff_t i_size = i_size_read(mapping->host); > > > > > + > > > > > + if (i_size != 0) > > > > > + __filemap_fdatawait_range(mapping, 0, > > > > > + i_size - 1); > > > > > + } > > > > > + } > > > > > > > > Err, what's the i_size check doing here? I'd just pass ~0 as the end of the > > > > range and ignore i_size. It is much easier than trying to wrap your head > > > > around possible races with file operations modifying i_size. > > > > > > > > Honza > > > > > > I'm basically emulating _exactly_ what filemap_write_and_wait does here, > > > as I'm leery of making subtle behavior changes in the actual writeback > > > behavior. 
For example: > > > > > > -----------------8<---------------- > > > static inline int __filemap_fdatawrite(struct address_space *mapping, > > > int sync_mode) > > > { > > > return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode); > > > } > > > > > > int filemap_fdatawrite(struct address_space *mapping) > > > { > > > return __filemap_fdatawrite(mapping, WB_SYNC_ALL); > > > } > > > EXPORT_SYMBOL(filemap_fdatawrite); > > > -----------------8<---------------- > > > > > > ...which then sets up the wbc with the right ranges and sync mode and > > > kicks off writepages. But then, it does the i_size_read to figure out > > > what range it should wait on (with the shortcut for the size == 0 case). > > > > > > My assumption was that it was intentionally designed that way, but I'm > > > guessing from your comments that it wasn't? If so, then we can turn > > > file_write_and_wait a static inline wrapper around > > > file_write_and_wait_range. > > > > FWIW, I did a bit of archaeology in the linux-history tree and found > > this patch from Marcelo in 2004. Is this optimization still helpful? If > > not, then that does simplify the code a bit. > > > > -------------------8<-------------------- > > > > [PATCH] small wait_on_page_writeback_range() optimization > > > > filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end" > > parameter. This is not needed since we know the EOF from the inode. Use > > that instead. 
> > > > Signed-off-by: Marcelo Tosatti > > Signed-off-by: Andrew Morton > > Signed-off-by: Linus Torvalds > > --- > > mm/filemap.c | 8 +++++++- > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > diff --git a/mm/filemap.c b/mm/filemap.c > > index 78e18b7639b6..55fb7b4141e4 100644 > > --- a/mm/filemap.c > > +++ b/mm/filemap.c > > @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range); > > */ > > int filemap_fdatawait(struct address_space *mapping) > > { > > - return wait_on_page_writeback_range(mapping, 0, -1); > > + loff_t i_size = i_size_read(mapping->host); > > + > > + if (i_size == 0) > > + return 0; > > + > > + return wait_on_page_writeback_range(mapping, 0, > > + (i_size - 1) >> PAGE_CACHE_SHIFT); > > } > > EXPORT_SYMBOL(filemap_fdatawait); > > > > Does this ever get called in cases where we would not hold fs locks? In > that case we definitely don't want to be relying on i_size, > > Steve. > Yes. We can initiate and wait on writeback from any context where you can sleep, really. We're just waiting on whole file writeback here, so I don't think there's anything wrong. As long as the i_size was valid at some point in time prior to waiting then you're ok. The question I have is more whether this optimization is still useful. What we do now is just walk the radix tree and wait_on_page_writeback for each page. Do we gain anything by avoiding ranges beyond the current EOF with the pagecache infrastructure of 2017? -- Jeff Layton -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qk0-f200.google.com (mail-qk0-f200.google.com [209.85.220.200]) by kanga.kvack.org (Postfix) with ESMTP id 4B1C66B05E5 for ; Mon, 31 Jul 2017 08:05:39 -0400 (EDT) Received: by mail-qk0-f200.google.com with SMTP id q1so145921644qkb.3 for ; Mon, 31 Jul 2017 05:05:39 -0700 (PDT) Received: from mx1.redhat.com (mx1.redhat.com. [209.132.183.28]) by mx.google.com with ESMTPS id n4si22471150qte.232.2017.07.31.05.05.38 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 31 Jul 2017 05:05:38 -0700 (PDT) Subject: Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait References: <20170726175538.13885-1-jlayton@kernel.org> <20170726175538.13885-3-jlayton@kernel.org> <20170727084914.GC21100@quack2.suse.cz> <1501159710.6279.1.camel@redhat.com> <1501500421.4663.4.camel@redhat.com> <8d46c4c6-76b5-9726-7d85-249cd9a899f1@redhat.com> <1501501456.4663.6.camel@redhat.com> From: Steven Whitehouse Message-ID: Date: Mon, 31 Jul 2017 13:05:08 +0100 MIME-Version: 1.0 In-Reply-To: <1501501456.4663.6.camel@redhat.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Jeff Layton , Jan Kara , Marcelo Tosatti Cc: Alexander Viro , "J . 
Bruce Fields" , Andrew Morton , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Matthew Wilcox , Bob Peterson , cluster-devel@redhat.com Hi, On 31/07/17 12:44, Jeff Layton wrote: > On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote: >> Hi, >> >> >> On 31/07/17 12:27, Jeff Layton wrote: >>> On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote: >>>> On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote: >>>>> On Wed 26-07-17 13:55:36, Jeff Layton wrote: >>>>>> +int file_write_and_wait(struct file *file) >>>>>> +{ >>>>>> + int err = 0, err2; >>>>>> + struct address_space *mapping = file->f_mapping; >>>>>> + >>>>>> + if ((!dax_mapping(mapping) && mapping->nrpages) || >>>>>> + (dax_mapping(mapping) && mapping->nrexceptional)) { >>>>>> + err = filemap_fdatawrite(mapping); >>>>>> + /* See comment of filemap_write_and_wait() */ >>>>>> + if (err != -EIO) { >>>>>> + loff_t i_size = i_size_read(mapping->host); >>>>>> + >>>>>> + if (i_size != 0) >>>>>> + __filemap_fdatawait_range(mapping, 0, >>>>>> + i_size - 1); >>>>>> + } >>>>>> + } >>>>> Err, what's the i_size check doing here? I'd just pass ~0 as the end of the >>>>> range and ignore i_size. It is much easier than trying to wrap your head >>>>> around possible races with file operations modifying i_size. >>>>> >>>>> Honza >>>> I'm basically emulating _exactly_ what filemap_write_and_wait does here, >>>> as I'm leery of making subtle behavior changes in the actual writeback >>>> behavior. 
For example: >>>> >>>> -----------------8<---------------- >>>> static inline int __filemap_fdatawrite(struct address_space *mapping, >>>> int sync_mode) >>>> { >>>> return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode); >>>> } >>>> >>>> int filemap_fdatawrite(struct address_space *mapping) >>>> { >>>> return __filemap_fdatawrite(mapping, WB_SYNC_ALL); >>>> } >>>> EXPORT_SYMBOL(filemap_fdatawrite); >>>> -----------------8<---------------- >>>> >>>> ...which then sets up the wbc with the right ranges and sync mode and >>>> kicks off writepages. But then, it does the i_size_read to figure out >>>> what range it should wait on (with the shortcut for the size == 0 case). >>>> >>>> My assumption was that it was intentionally designed that way, but I'm >>>> guessing from your comments that it wasn't? If so, then we can turn >>>> file_write_and_wait a static inline wrapper around >>>> file_write_and_wait_range. >>> FWIW, I did a bit of archaeology in the linux-history tree and found >>> this patch from Marcelo in 2004. Is this optimization still helpful? If >>> not, then that does simplify the code a bit. >>> >>> -------------------8<-------------------- >>> >>> [PATCH] small wait_on_page_writeback_range() optimization >>> >>> filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end" >>> parameter. This is not needed since we know the EOF from the inode. Use >>> that instead. 
>>> >>> Signed-off-by: Marcelo Tosatti >>> Signed-off-by: Andrew Morton >>> Signed-off-by: Linus Torvalds >>> --- >>> mm/filemap.c | 8 +++++++- >>> 1 file changed, 7 insertions(+), 1 deletion(-) >>> >>> diff --git a/mm/filemap.c b/mm/filemap.c >>> index 78e18b7639b6..55fb7b4141e4 100644 >>> --- a/mm/filemap.c >>> +++ b/mm/filemap.c >>> @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range); >>> */ >>> int filemap_fdatawait(struct address_space *mapping) >>> { >>> - return wait_on_page_writeback_range(mapping, 0, -1); >>> + loff_t i_size = i_size_read(mapping->host); >>> + >>> + if (i_size == 0) >>> + return 0; >>> + >>> + return wait_on_page_writeback_range(mapping, 0, >>> + (i_size - 1) >> PAGE_CACHE_SHIFT); >>> } >>> EXPORT_SYMBOL(filemap_fdatawait); >>> >> Does this ever get called in cases where we would not hold fs locks? In >> that case we definitely don't want to be relying on i_size, >> >> Steve. >> > Yes. We can initiate and wait on writeback from any context where you > can sleep, really. > > We're just waiting on whole file writeback here, so I don't think > there's anything wrong. As long as the i_size was valid at some point in > time prior to waiting then you're ok. > > The question I have is more whether this optimization is still useful. > > What we do now is just walk the radix tree and wait_on_page_writeback > for each page. Do we gain anything by avoiding ranges beyond the current > EOF with the pagecache infrastructure of 2017? > If this can be called from anywhere without fs locks, then i_size is not known. That has been a problem in the past since i_size may have changed on another node. We avoid that in this case due to only changing i_size under an exclusive lock, and also only having dirty pages when we have an exclusive lock. There is another case though, if the inode is a block device, i_size will be zero. That is the case for the address space that looks after rgrps for GFS2. We do (luckily!) 
call filemap_fdatawait_range() directly in that case. For "normal" inodes though, the address space for metadata is backed by the block device inode, so that looks like it might be an issue, since fs/gfs2/glops.c:inode_go_sync() calls filemap_fdatawait() on the metamapping. It might potentially be an issue in other cases too, Steve. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f69.google.com (mail-wm0-f69.google.com [74.125.82.69]) by kanga.kvack.org (Postfix) with ESMTP id 8FDEB6B05E7 for ; Mon, 31 Jul 2017 08:07:47 -0400 (EDT) Received: by mail-wm0-f69.google.com with SMTP id i187so20944928wma.15 for ; Mon, 31 Jul 2017 05:07:47 -0700 (PDT) Received: from mx1.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id y23si19536940wra.384.2017.07.31.05.07.46 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 31 Jul 2017 05:07:46 -0700 (PDT) Date: Mon, 31 Jul 2017 14:07:44 +0200 From: Jan Kara Subject: Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait Message-ID: <20170731120744.GA25458@quack2.suse.cz> References: <20170726175538.13885-1-jlayton@kernel.org> <20170726175538.13885-3-jlayton@kernel.org> <20170727084914.GC21100@quack2.suse.cz> <1501159710.6279.1.camel@redhat.com> <1501500421.4663.4.camel@redhat.com> <8d46c4c6-76b5-9726-7d85-249cd9a899f1@redhat.com> <1501501456.4663.6.camel@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1501501456.4663.6.camel@redhat.com> Sender: owner-linux-mm@kvack.org List-ID: To: Jeff Layton Cc: Steven Whitehouse , Jan Kara , Marcelo Tosatti , Alexander Viro , "J . 
Bruce Fields" , Andrew Morton , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Matthew Wilcox , Bob Peterson , cluster-devel@redhat.com On Mon 31-07-17 07:44:16, Jeff Layton wrote: > On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote: > > On 31/07/17 12:27, Jeff Layton wrote: > > > On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote: > > > > On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote: > > > > > On Wed 26-07-17 13:55:36, Jeff Layton wrote: > > > > > > +int file_write_and_wait(struct file *file) > > > > > > +{ > > > > > > + int err = 0, err2; > > > > > > + struct address_space *mapping = file->f_mapping; > > > > > > + > > > > > > + if ((!dax_mapping(mapping) && mapping->nrpages) || > > > > > > + (dax_mapping(mapping) && mapping->nrexceptional)) { > > > > > > + err = filemap_fdatawrite(mapping); > > > > > > + /* See comment of filemap_write_and_wait() */ > > > > > > + if (err != -EIO) { > > > > > > + loff_t i_size = i_size_read(mapping->host); > > > > > > + > > > > > > + if (i_size != 0) > > > > > > + __filemap_fdatawait_range(mapping, 0, > > > > > > + i_size - 1); > > > > > > + } > > > > > > + } > > > > > > > > > > Err, what's the i_size check doing here? I'd just pass ~0 as the end of the > > > > > range and ignore i_size. It is much easier than trying to wrap your head > > > > > around possible races with file operations modifying i_size. > > > > > > > > > > Honza > > > > > > > > I'm basically emulating _exactly_ what filemap_write_and_wait does here, > > > > as I'm leery of making subtle behavior changes in the actual writeback > > > > behavior. 
For example: > > > > > > > > -----------------8<---------------- > > > > static inline int __filemap_fdatawrite(struct address_space *mapping, > > > > int sync_mode) > > > > { > > > > return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode); > > > > } > > > > > > > > int filemap_fdatawrite(struct address_space *mapping) > > > > { > > > > return __filemap_fdatawrite(mapping, WB_SYNC_ALL); > > > > } > > > > EXPORT_SYMBOL(filemap_fdatawrite); > > > > -----------------8<---------------- > > > > > > > > ...which then sets up the wbc with the right ranges and sync mode and > > > > kicks off writepages. But then, it does the i_size_read to figure out > > > > what range it should wait on (with the shortcut for the size == 0 case). > > > > > > > > My assumption was that it was intentionally designed that way, but I'm > > > > guessing from your comments that it wasn't? If so, then we can turn > > > > file_write_and_wait a static inline wrapper around > > > > file_write_and_wait_range. > > > > > > FWIW, I did a bit of archaeology in the linux-history tree and found > > > this patch from Marcelo in 2004. Is this optimization still helpful? If > > > not, then that does simplify the code a bit. > > > > > > -------------------8<-------------------- > > > > > > [PATCH] small wait_on_page_writeback_range() optimization > > > > > > filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end" > > > parameter. This is not needed since we know the EOF from the inode. Use > > > that instead. 
> > > > > > Signed-off-by: Marcelo Tosatti > > > Signed-off-by: Andrew Morton > > > Signed-off-by: Linus Torvalds > > > --- > > > mm/filemap.c | 8 +++++++- > > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > > > diff --git a/mm/filemap.c b/mm/filemap.c > > > index 78e18b7639b6..55fb7b4141e4 100644 > > > --- a/mm/filemap.c > > > +++ b/mm/filemap.c > > > @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range); > > > */ > > > int filemap_fdatawait(struct address_space *mapping) > > > { > > > - return wait_on_page_writeback_range(mapping, 0, -1); > > > + loff_t i_size = i_size_read(mapping->host); > > > + > > > + if (i_size == 0) > > > + return 0; > > > + > > > + return wait_on_page_writeback_range(mapping, 0, > > > + (i_size - 1) >> PAGE_CACHE_SHIFT); > > > } > > > EXPORT_SYMBOL(filemap_fdatawait); > > > > > > > Does this ever get called in cases where we would not hold fs locks? In > > that case we definitely don't want to be relying on i_size, > > > > Steve. > > > > Yes. We can initiate and wait on writeback from any context where you > can sleep, really. > > We're just waiting on whole file writeback here, so I don't think > there's anything wrong. As long as the i_size was valid at some point in > time prior to waiting then you're ok. > > The question I have is more whether this optimization is still useful. > > What we do now is just walk the radix tree and wait_on_page_writeback > for each page. Do we gain anything by avoiding ranges beyond the current > EOF with the pagecache infrastructure of 2017? FWIW I'm not aware of any significant benefit of using i_size in filemap_fdatawait() - we iterate to the end of the radix tree node anyway since pagevec_lookup_tag() does not support range searches anyway (I'm working on fixing that however even after that the benefit would be still rather marginal). 
What Marcelo might have meant even back in 2004 was that if we are in the middle of truncate, i_size is already reduced but page cache not truncated yet, then filemap_fdatawait() does not have to wait for writeback of truncated pages. That might be a noticeable benefit even today if such a race happens; however, I'm not sure it's worth optimizing for, and surprises arising from randomly snapshotting i_size (which especially for clustered filesystems may be out of date) IMHO outweigh the possible advantage. Honza -- Jan Kara SUSE Labs, CR From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qt0-f197.google.com (mail-qt0-f197.google.com [209.85.216.197]) by kanga.kvack.org (Postfix) with ESMTP id C539F6B05EA for ; Mon, 31 Jul 2017 08:22:44 -0400 (EDT) Received: by mail-qt0-f197.google.com with SMTP id v49so130621633qtc.2 for ; Mon, 31 Jul 2017 05:22:44 -0700 (PDT) Received: from mail-qk0-f173.google.com (mail-qk0-f173.google.com. 
[209.85.220.173]) by mx.google.com with ESMTPS id m6si23185122qkf.136.2017.07.31.05.22.44 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 31 Jul 2017 05:22:44 -0700 (PDT) Received: by mail-qk0-f173.google.com with SMTP id u139so84710643qka.1 for ; Mon, 31 Jul 2017 05:22:44 -0700 (PDT) Message-ID: <1501503761.4663.11.camel@redhat.com> Subject: Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait From: Jeff Layton Date: Mon, 31 Jul 2017 08:22:41 -0400 In-Reply-To: References: <20170726175538.13885-1-jlayton@kernel.org> <20170726175538.13885-3-jlayton@kernel.org> <20170727084914.GC21100@quack2.suse.cz> <1501159710.6279.1.camel@redhat.com> <1501500421.4663.4.camel@redhat.com> <8d46c4c6-76b5-9726-7d85-249cd9a899f1@redhat.com> <1501501456.4663.6.camel@redhat.com> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Steven Whitehouse , Jan Kara , Marcelo Tosatti Cc: Alexander Viro , "J . 
Bruce Fields" , Andrew Morton , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Matthew Wilcox , Bob Peterson , cluster-devel@redhat.com On Mon, 2017-07-31 at 13:05 +0100, Steven Whitehouse wrote: > Hi, > > > On 31/07/17 12:44, Jeff Layton wrote: > > On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote: > > > Hi, > > > > > > > > > On 31/07/17 12:27, Jeff Layton wrote: > > > > On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote: > > > > > On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote: > > > > > > On Wed 26-07-17 13:55:36, Jeff Layton wrote: > > > > > > > +int file_write_and_wait(struct file *file) > > > > > > > +{ > > > > > > > + int err = 0, err2; > > > > > > > + struct address_space *mapping = file->f_mapping; > > > > > > > + > > > > > > > + if ((!dax_mapping(mapping) && mapping->nrpages) || > > > > > > > + (dax_mapping(mapping) && mapping->nrexceptional)) { > > > > > > > + err = filemap_fdatawrite(mapping); > > > > > > > + /* See comment of filemap_write_and_wait() */ > > > > > > > + if (err != -EIO) { > > > > > > > + loff_t i_size = i_size_read(mapping->host); > > > > > > > + > > > > > > > + if (i_size != 0) > > > > > > > + __filemap_fdatawait_range(mapping, 0, > > > > > > > + i_size - 1); > > > > > > > + } > > > > > > > + } > > > > > > > > > > > > Err, what's the i_size check doing here? I'd just pass ~0 as the end of the > > > > > > range and ignore i_size. It is much easier than trying to wrap your head > > > > > > around possible races with file operations modifying i_size. > > > > > > > > > > > > Honza > > > > > > > > > > I'm basically emulating _exactly_ what filemap_write_and_wait does here, > > > > > as I'm leery of making subtle behavior changes in the actual writeback > > > > > behavior. 
For example: > > > > > > > > > > -----------------8<---------------- > > > > > static inline int __filemap_fdatawrite(struct address_space *mapping, > > > > > int sync_mode) > > > > > { > > > > > return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode); > > > > > } > > > > > > > > > > int filemap_fdatawrite(struct address_space *mapping) > > > > > { > > > > > return __filemap_fdatawrite(mapping, WB_SYNC_ALL); > > > > > } > > > > > EXPORT_SYMBOL(filemap_fdatawrite); > > > > > -----------------8<---------------- > > > > > > > > > > ...which then sets up the wbc with the right ranges and sync mode and > > > > > kicks off writepages. But then, it does the i_size_read to figure out > > > > > what range it should wait on (with the shortcut for the size == 0 case). > > > > > > > > > > My assumption was that it was intentionally designed that way, but I'm > > > > > guessing from your comments that it wasn't? If so, then we can turn > > > > > file_write_and_wait a static inline wrapper around > > > > > file_write_and_wait_range. > > > > > > > > FWIW, I did a bit of archaeology in the linux-history tree and found > > > > this patch from Marcelo in 2004. Is this optimization still helpful? If > > > > not, then that does simplify the code a bit. > > > > > > > > -------------------8<-------------------- > > > > > > > > [PATCH] small wait_on_page_writeback_range() optimization > > > > > > > > filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end" > > > > parameter. This is not needed since we know the EOF from the inode. Use > > > > that instead. 
> > > > > > > > Signed-off-by: Marcelo Tosatti > > > > Signed-off-by: Andrew Morton > > > > Signed-off-by: Linus Torvalds > > > > --- > > > > mm/filemap.c | 8 +++++++- > > > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/mm/filemap.c b/mm/filemap.c > > > > index 78e18b7639b6..55fb7b4141e4 100644 > > > > --- a/mm/filemap.c > > > > +++ b/mm/filemap.c > > > > @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range); > > > > */ > > > > int filemap_fdatawait(struct address_space *mapping) > > > > { > > > > - return wait_on_page_writeback_range(mapping, 0, -1); > > > > + loff_t i_size = i_size_read(mapping->host); > > > > + > > > > + if (i_size == 0) > > > > + return 0; > > > > + > > > > + return wait_on_page_writeback_range(mapping, 0, > > > > + (i_size - 1) >> PAGE_CACHE_SHIFT); > > > > } > > > > EXPORT_SYMBOL(filemap_fdatawait); > > > > > > > > > > Does this ever get called in cases where we would not hold fs locks? In > > > that case we definitely don't want to be relying on i_size, > > > > > > Steve. > > > > > > > Yes. We can initiate and wait on writeback from any context where you > > can sleep, really. > > > > We're just waiting on whole file writeback here, so I don't think > > there's anything wrong. As long as the i_size was valid at some point in > > time prior to waiting then you're ok. > > > > The question I have is more whether this optimization is still useful. > > > > What we do now is just walk the radix tree and wait_on_page_writeback > > for each page. Do we gain anything by avoiding ranges beyond the current > > EOF with the pagecache infrastructure of 2017? > > > > If this can be called from anywhere without fs locks, then i_size is not > known. That has been a problem in the past since i_size may have changed > on another node. We avoid that in this case due to only changing i_size > under an exclusive lock, and also only having dirty pages when we have > an exclusive lock. 
There is another case though, if the inode is a block > device, i_size will be zero. That is the case for the address space that > looks after rgrps for GFS2. We do (luckily!) call > filemap_fdatawait_range() directly in that case. For "normal" inodes > though, the address space for metadata is backed by the block device > inode, so that looks like it might be an issue, since > fs/gfs2/glops.c:inode_go_sync() calls filemap_fdatawait() on the > metamapping. It might potentially be an issue in other cases too, > > Steve. > Some of those do sound problematic. Again though, we're only waiting on writeback here, and I assume with gfs2 that would only be pages that were written on the local node. Is it possible to have pages under writeback and still in the tree, but that are beyond the current i_size? It seems like that's the main worrisome case. -- Jeff Layton From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qk0-f199.google.com (mail-qk0-f199.google.com [209.85.220.199]) by kanga.kvack.org (Postfix) with ESMTP id 9B6756B05EC for ; Mon, 31 Jul 2017 08:25:51 -0400 (EDT) Received: by mail-qk0-f199.google.com with SMTP id m84so9443809qki.5 for ; Mon, 31 Jul 2017 05:25:51 -0700 (PDT) Received: from mx1.redhat.com (mx1.redhat.com. 
[209.132.183.28]) by mx.google.com with ESMTPS id v62si23055152qkc.104.2017.07.31.05.25.50 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 31 Jul 2017 05:25:50 -0700 (PDT) Subject: Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait References: <20170726175538.13885-1-jlayton@kernel.org> <20170726175538.13885-3-jlayton@kernel.org> <20170727084914.GC21100@quack2.suse.cz> <1501159710.6279.1.camel@redhat.com> <1501500421.4663.4.camel@redhat.com> <8d46c4c6-76b5-9726-7d85-249cd9a899f1@redhat.com> <1501501456.4663.6.camel@redhat.com> <1501503761.4663.11.camel@redhat.com> From: Steven Whitehouse Message-ID: <956b81bb-d8d7-9da3-da6f-98bb9963e408@redhat.com> Date: Mon, 31 Jul 2017 13:25:32 +0100 MIME-Version: 1.0 In-Reply-To: <1501503761.4663.11.camel@redhat.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Jeff Layton , Jan Kara , Marcelo Tosatti Cc: Alexander Viro , "J . 
Bruce Fields" , Andrew Morton , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Matthew Wilcox , Bob Peterson , cluster-devel@redhat.com Hi, On 31/07/17 13:22, Jeff Layton wrote: > On Mon, 2017-07-31 at 13:05 +0100, Steven Whitehouse wrote: >> Hi, >> >> >> On 31/07/17 12:44, Jeff Layton wrote: >>> On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote: >>>> Hi, >>>> >>>> >>>> On 31/07/17 12:27, Jeff Layton wrote: >>>>> On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote: >>>>>> On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote: >>>>>>> On Wed 26-07-17 13:55:36, Jeff Layton wrote: >>>>>>>> +int file_write_and_wait(struct file *file) >>>>>>>> +{ >>>>>>>> + int err = 0, err2; >>>>>>>> + struct address_space *mapping = file->f_mapping; >>>>>>>> + >>>>>>>> + if ((!dax_mapping(mapping) && mapping->nrpages) || >>>>>>>> + (dax_mapping(mapping) && mapping->nrexceptional)) { >>>>>>>> + err = filemap_fdatawrite(mapping); >>>>>>>> + /* See comment of filemap_write_and_wait() */ >>>>>>>> + if (err != -EIO) { >>>>>>>> + loff_t i_size = i_size_read(mapping->host); >>>>>>>> + >>>>>>>> + if (i_size != 0) >>>>>>>> + __filemap_fdatawait_range(mapping, 0, >>>>>>>> + i_size - 1); >>>>>>>> + } >>>>>>>> + } >>>>>>> Err, what's the i_size check doing here? I'd just pass ~0 as the end of the >>>>>>> range and ignore i_size. It is much easier than trying to wrap your head >>>>>>> around possible races with file operations modifying i_size. >>>>>>> >>>>>>> Honza >>>>>> I'm basically emulating _exactly_ what filemap_write_and_wait does here, >>>>>> as I'm leery of making subtle behavior changes in the actual writeback >>>>>> behavior. 
For example: >>>>>> >>>>>> -----------------8<---------------- >>>>>> static inline int __filemap_fdatawrite(struct address_space *mapping, >>>>>> int sync_mode) >>>>>> { >>>>>> return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode); >>>>>> } >>>>>> >>>>>> int filemap_fdatawrite(struct address_space *mapping) >>>>>> { >>>>>> return __filemap_fdatawrite(mapping, WB_SYNC_ALL); >>>>>> } >>>>>> EXPORT_SYMBOL(filemap_fdatawrite); >>>>>> -----------------8<---------------- >>>>>> >>>>>> ...which then sets up the wbc with the right ranges and sync mode and >>>>>> kicks off writepages. But then, it does the i_size_read to figure out >>>>>> what range it should wait on (with the shortcut for the size == 0 case). >>>>>> >>>>>> My assumption was that it was intentionally designed that way, but I'm >>>>>> guessing from your comments that it wasn't? If so, then we can turn >>>>>> file_write_and_wait a static inline wrapper around >>>>>> file_write_and_wait_range. >>>>> FWIW, I did a bit of archaeology in the linux-history tree and found >>>>> this patch from Marcelo in 2004. Is this optimization still helpful? If >>>>> not, then that does simplify the code a bit. >>>>> >>>>> -------------------8<-------------------- >>>>> >>>>> [PATCH] small wait_on_page_writeback_range() optimization >>>>> >>>>> filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end" >>>>> parameter. This is not needed since we know the EOF from the inode. Use >>>>> that instead. 
>>>>> >>>>> Signed-off-by: Marcelo Tosatti >>>>> Signed-off-by: Andrew Morton >>>>> Signed-off-by: Linus Torvalds >>>>> --- >>>>> mm/filemap.c | 8 +++++++- >>>>> 1 file changed, 7 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/mm/filemap.c b/mm/filemap.c >>>>> index 78e18b7639b6..55fb7b4141e4 100644 >>>>> --- a/mm/filemap.c >>>>> +++ b/mm/filemap.c >>>>> @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range); >>>>> */ >>>>> int filemap_fdatawait(struct address_space *mapping) >>>>> { >>>>> - return wait_on_page_writeback_range(mapping, 0, -1); >>>>> + loff_t i_size = i_size_read(mapping->host); >>>>> + >>>>> + if (i_size == 0) >>>>> + return 0; >>>>> + >>>>> + return wait_on_page_writeback_range(mapping, 0, >>>>> + (i_size - 1) >> PAGE_CACHE_SHIFT); >>>>> } >>>>> EXPORT_SYMBOL(filemap_fdatawait); >>>>> >>>> Does this ever get called in cases where we would not hold fs locks? In >>>> that case we definitely don't want to be relying on i_size, >>>> >>>> Steve. >>>> >>> Yes. We can initiate and wait on writeback from any context where you >>> can sleep, really. >>> >>> We're just waiting on whole file writeback here, so I don't think >>> there's anything wrong. As long as the i_size was valid at some point in >>> time prior to waiting then you're ok. >>> >>> The question I have is more whether this optimization is still useful. >>> >>> What we do now is just walk the radix tree and wait_on_page_writeback >>> for each page. Do we gain anything by avoiding ranges beyond the current >>> EOF with the pagecache infrastructure of 2017? >>> >> If this can be called from anywhere without fs locks, then i_size is not >> known. That has been a problem in the past since i_size may have changed >> on another node. We avoid that in this case due to only changing i_size >> under an exclusive lock, and also only having dirty pages when we have >> an exclusive lock. There is another case though, if the inode is a block >> device, i_size will be zero. 
That is the case for the address space that >> looks after rgrps for GFS2. We do (luckily!) call >> filemap_fdatawait_range() directly in that case. For "normal" inodes >> though, the address space for metadata is backed by the block device >> inode, so that looks like it might be an issue, since >> fs/gfs2/glops.c:inode_go_sync() calls filemap_fdatawait() on the >> metamapping. It might potentially be an issue in other cases too, >> >> Steve. >> > Some of those do sound problematic. > > Again though, we're only waiting on writeback here, and I assume with > gfs2 that would only be pages that were written on the local node. Yes > > Is it possible to have pages under writeback and still in the tree, > but that are beyond the current i_size? It seems like that's the main > worrisome case. > That's what I was wondering too. I'm not 100% sure without some more detailed investigation. Either way the block device case also seems problematic, although not impossible to special case I suppose. The real question is what do we get from this optimisation? Is the pain of checking correctness worth it for the benefits gained? Steve. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qk0-f197.google.com (mail-qk0-f197.google.com [209.85.220.197]) by kanga.kvack.org (Postfix) with ESMTP id 9DA306B05F3 for ; Mon, 31 Jul 2017 08:38:45 -0400 (EDT) Received: by mail-qk0-f197.google.com with SMTP id x77so75690193qka.15 for ; Mon, 31 Jul 2017 05:38:45 -0700 (PDT) Received: from mx1.redhat.com (mx1.redhat.com. 
[209.132.183.28]) by mx.google.com with ESMTPS id 7si22242599qth.75.2017.07.31.05.38.44 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 31 Jul 2017 05:38:45 -0700 (PDT) Date: Mon, 31 Jul 2017 08:38:41 -0400 (EDT) From: Bob Peterson Message-ID: <1822812523.36420383.1501504721611.JavaMail.zimbra@redhat.com> In-Reply-To: <1501503761.4663.11.camel@redhat.com> References: <20170726175538.13885-1-jlayton@kernel.org> <20170727084914.GC21100@quack2.suse.cz> <1501159710.6279.1.camel@redhat.com> <1501500421.4663.4.camel@redhat.com> <8d46c4c6-76b5-9726-7d85-249cd9a899f1@redhat.com> <1501501456.4663.6.camel@redhat.com> <1501503761.4663.11.camel@redhat.com> Subject: Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Jeff Layton Cc: Steven Whitehouse , Jan Kara , Marcelo Tosatti , Alexander Viro , "J . Bruce Fields" , Andrew Morton , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Matthew Wilcox , cluster-devel@redhat.com, Benjamin Marzinski ----- Original Message ----- | > If this can be called from anywhere without fs locks, then i_size is not | > known. That has been a problem in the past since i_size may have changed | > on another node. We avoid that in this case due to only changing i_size | > under an exclusive lock, and also only having dirty pages when we have | > an exclusive lock. There is another case though, if the inode is a block | > device, i_size will be zero. That is the case for the address space that | > looks after rgrps for GFS2. We do (luckily!) call | > filemap_fdatawait_range() directly in that case. For "normal" inodes | > though, the address space for metadata is backed by the block device | > inode, so that looks like it might be an issue, since | > fs/gfs2/glops.c:inode_go_sync() calls filemap_fdatawait() on the | > metamapping. 
It might potentially be an issue in other cases too, | > | > Steve. | > | | Some of those do sound problematic. | | Again though, we're only waiting on writeback here, and I assume with | gfs2 that would only be pages that were written on the local node. | | Is it possible to have pages under writeback and still in the tree, | but that are beyond the current i_size? It seems like that's the main | worrisome case. | | -- | Jeff Layton Hi Jeff, I believe the answer is yes. I was recently "bitten" by a case where (whether due to a bug or not) I had blocks allocated in a GFS2 file beyond i_size. I had implemented a delete algorithm that used i_size, but I found cases where files couldn't be deleted because of blocks hanging out past EOF. I'm not sure if they can be in writeback, but possibly. It's already on my "to investigate" list, but I haven't gotten to it yet. Yes, it seems like a bug. Yes, we need to fix it. But now there may be lots of legacy file systems out in the field that have this problem. Not sure if they can get to writeback until I study the situation more closely. I believe Ben Marzinski also may have come across a case in which we can have blocks in writeback that are beyond i_size. See the commit message on Ben's patch here: https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git/commit/fs/gfs2?h=for-next&id=fd4c5748b8d3f7420e8932ed0bde3d53cc8acc9d Regards, Bob Peterson Red Hat File Systems -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qt0-f199.google.com (mail-qt0-f199.google.com [209.85.216.199]) by kanga.kvack.org (Postfix) with ESMTP id A4D3D6B05FD for ; Mon, 31 Jul 2017 09:00:42 -0400 (EDT) Received: by mail-qt0-f199.google.com with SMTP id s26so34349904qts.8 for ; Mon, 31 Jul 2017 06:00:42 -0700 (PDT) Received: from mail-qt0-f175.google.com (mail-qt0-f175.google.com. [209.85.216.175]) by mx.google.com with ESMTPS id q12si24398476qtf.9.2017.07.31.06.00.41 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 31 Jul 2017 06:00:41 -0700 (PDT) Received: by mail-qt0-f175.google.com with SMTP id v29so86968834qtv.3 for ; Mon, 31 Jul 2017 06:00:41 -0700 (PDT) Message-ID: <1501506037.4663.13.camel@redhat.com> Subject: Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait From: Jeff Layton Date: Mon, 31 Jul 2017 09:00:37 -0400 In-Reply-To: <20170731120744.GA25458@quack2.suse.cz> References: <20170726175538.13885-1-jlayton@kernel.org> <20170726175538.13885-3-jlayton@kernel.org> <20170727084914.GC21100@quack2.suse.cz> <1501159710.6279.1.camel@redhat.com> <1501500421.4663.4.camel@redhat.com> <8d46c4c6-76b5-9726-7d85-249cd9a899f1@redhat.com> <1501501456.4663.6.camel@redhat.com> <20170731120744.GA25458@quack2.suse.cz> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Jan Kara Cc: Steven Whitehouse , Marcelo Tosatti , Alexander Viro , "J . 
Bruce Fields" , Andrew Morton , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Matthew Wilcox , Bob Peterson , cluster-devel@redhat.com On Mon, 2017-07-31 at 14:07 +0200, Jan Kara wrote: > On Mon 31-07-17 07:44:16, Jeff Layton wrote: > > On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote: > > > On 31/07/17 12:27, Jeff Layton wrote: > > > > On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote: > > > > > On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote: > > > > > > On Wed 26-07-17 13:55:36, Jeff Layton wrote: > > > > > > > +int file_write_and_wait(struct file *file) > > > > > > > +{ > > > > > > > + int err = 0, err2; > > > > > > > + struct address_space *mapping = file->f_mapping; > > > > > > > + > > > > > > > + if ((!dax_mapping(mapping) && mapping->nrpages) || > > > > > > > + (dax_mapping(mapping) && mapping->nrexceptional)) { > > > > > > > + err = filemap_fdatawrite(mapping); > > > > > > > + /* See comment of filemap_write_and_wait() */ > > > > > > > + if (err != -EIO) { > > > > > > > + loff_t i_size = i_size_read(mapping->host); > > > > > > > + > > > > > > > + if (i_size != 0) > > > > > > > + __filemap_fdatawait_range(mapping, 0, > > > > > > > + i_size - 1); > > > > > > > + } > > > > > > > + } > > > > > > > > > > > > Err, what's the i_size check doing here? I'd just pass ~0 as the end of the > > > > > > range and ignore i_size. It is much easier than trying to wrap your head > > > > > > around possible races with file operations modifying i_size. > > > > > > > > > > > > Honza > > > > > > > > > > I'm basically emulating _exactly_ what filemap_write_and_wait does here, > > > > > as I'm leery of making subtle behavior changes in the actual writeback > > > > > behavior. 
For example: > > > > > > > > > > -----------------8<---------------- > > > > > static inline int __filemap_fdatawrite(struct address_space *mapping, > > > > > int sync_mode) > > > > > { > > > > > return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode); > > > > > } > > > > > > > > > > int filemap_fdatawrite(struct address_space *mapping) > > > > > { > > > > > return __filemap_fdatawrite(mapping, WB_SYNC_ALL); > > > > > } > > > > > EXPORT_SYMBOL(filemap_fdatawrite); > > > > > -----------------8<---------------- > > > > > > > > > > ...which then sets up the wbc with the right ranges and sync mode and > > > > > kicks off writepages. But then, it does the i_size_read to figure out > > > > > what range it should wait on (with the shortcut for the size == 0 case). > > > > > > > > > > My assumption was that it was intentionally designed that way, but I'm > > > > > guessing from your comments that it wasn't? If so, then we can turn > > > > > file_write_and_wait a static inline wrapper around > > > > > file_write_and_wait_range. > > > > > > > > FWIW, I did a bit of archaeology in the linux-history tree and found > > > > this patch from Marcelo in 2004. Is this optimization still helpful? If > > > > not, then that does simplify the code a bit. > > > > > > > > -------------------8<-------------------- > > > > > > > > [PATCH] small wait_on_page_writeback_range() optimization > > > > > > > > filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end" > > > > parameter. This is not needed since we know the EOF from the inode. Use > > > > that instead. 
> > > > > > > > Signed-off-by: Marcelo Tosatti > > > > Signed-off-by: Andrew Morton > > > > Signed-off-by: Linus Torvalds > > > > --- > > > > mm/filemap.c | 8 +++++++- > > > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/mm/filemap.c b/mm/filemap.c > > > > index 78e18b7639b6..55fb7b4141e4 100644 > > > > --- a/mm/filemap.c > > > > +++ b/mm/filemap.c > > > > @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range); > > > > */ > > > > int filemap_fdatawait(struct address_space *mapping) > > > > { > > > > - return wait_on_page_writeback_range(mapping, 0, -1); > > > > + loff_t i_size = i_size_read(mapping->host); > > > > + > > > > + if (i_size == 0) > > > > + return 0; > > > > + > > > > + return wait_on_page_writeback_range(mapping, 0, > > > > + (i_size - 1) >> PAGE_CACHE_SHIFT); > > > > } > > > > EXPORT_SYMBOL(filemap_fdatawait); > > > > > > > > > > Does this ever get called in cases where we would not hold fs locks? In > > > that case we definitely don't want to be relying on i_size, > > > > > > Steve. > > > > > > > Yes. We can initiate and wait on writeback from any context where you > > can sleep, really. > > > > We're just waiting on whole file writeback here, so I don't think > > there's anything wrong. As long as the i_size was valid at some point in > > time prior to waiting then you're ok. > > > > The question I have is more whether this optimization is still useful. > > > > What we do now is just walk the radix tree and wait_on_page_writeback > > for each page. Do we gain anything by avoiding ranges beyond the current > > EOF with the pagecache infrastructure of 2017? > > FWIW I'm not aware of any significant benefit of using i_size in > filemap_fdatawait() - we iterate to the end of the radix tree node anyway > since pagevec_lookup_tag() does not support range searches anyway (I'm > working on fixing that however even after that the benefit would be still > rather marginal). 
> > What Marcelo might have meant even back in 2004 was that if we are in the > middle of truncate, i_size is already reduced but page cache not truncated > yet, then filemap_fdatawait() does not have to wait for writeback of > truncated pages. That might be a noticeable benefit even today if such a race > happens; however, I'm not sure it's worth optimizing for, and surprises > arising from randomly snapshotting i_size (which especially for clustered > filesystems may be out of date) IMHO outweigh the possible advantage. > > Honza Thanks for clarifying. Given that file_write_and_wait is a new helper function anyway, I'll just make it a wrapper around file_write_and_wait_range. Since it might be racy, should we remove this optimization from the "legacy" filemap_fdatawait / filemap_fdatawait_keep_errors calls? -- Jeff Layton From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f72.google.com (mail-wm0-f72.google.com [74.125.82.72]) by kanga.kvack.org (Postfix) with ESMTP id 7489D6B0601 for ; Mon, 31 Jul 2017 09:32:46 -0400 (EDT) Received: by mail-wm0-f72.google.com with SMTP id 185so17131254wmk.12 for ; Mon, 31 Jul 2017 06:32:46 -0700 (PDT) Received: from mx1.suse.de (mx2.suse.de. 
[195.135.220.15]) by mx.google.com with ESMTPS id v128si543388wmg.14.2017.07.31.06.32.44 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 31 Jul 2017 06:32:45 -0700 (PDT) Date: Mon, 31 Jul 2017 15:32:43 +0200 From: Jan Kara Subject: Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait Message-ID: <20170731133243.GB27589@quack2.suse.cz> References: <20170726175538.13885-1-jlayton@kernel.org> <20170726175538.13885-3-jlayton@kernel.org> <20170727084914.GC21100@quack2.suse.cz> <1501159710.6279.1.camel@redhat.com> <1501500421.4663.4.camel@redhat.com> <8d46c4c6-76b5-9726-7d85-249cd9a899f1@redhat.com> <1501501456.4663.6.camel@redhat.com> <20170731120744.GA25458@quack2.suse.cz> <1501506037.4663.13.camel@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1501506037.4663.13.camel@redhat.com> Sender: owner-linux-mm@kvack.org List-ID: To: Jeff Layton Cc: Jan Kara , Steven Whitehouse , Marcelo Tosatti , Alexander Viro , "J . 
Bruce Fields" , Andrew Morton , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Matthew Wilcox , Bob Peterson , cluster-devel@redhat.com On Mon 31-07-17 09:00:37, Jeff Layton wrote: > On Mon, 2017-07-31 at 14:07 +0200, Jan Kara wrote: > > On Mon 31-07-17 07:44:16, Jeff Layton wrote: > > > On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote: > > > > On 31/07/17 12:27, Jeff Layton wrote: > > > > > On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote: > > > > > > On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote: > > > > > > > On Wed 26-07-17 13:55:36, Jeff Layton wrote: > > > > > > > > +int file_write_and_wait(struct file *file) > > > > > > > > +{ > > > > > > > > + int err = 0, err2; > > > > > > > > + struct address_space *mapping = file->f_mapping; > > > > > > > > + > > > > > > > > + if ((!dax_mapping(mapping) && mapping->nrpages) || > > > > > > > > + (dax_mapping(mapping) && mapping->nrexceptional)) { > > > > > > > > + err = filemap_fdatawrite(mapping); > > > > > > > > + /* See comment of filemap_write_and_wait() */ > > > > > > > > + if (err != -EIO) { > > > > > > > > + loff_t i_size = i_size_read(mapping->host); > > > > > > > > + > > > > > > > > + if (i_size != 0) > > > > > > > > + __filemap_fdatawait_range(mapping, 0, > > > > > > > > + i_size - 1); > > > > > > > > + } > > > > > > > > + } > > > > > > > > > > > > > > Err, what's the i_size check doing here? I'd just pass ~0 as the end of the > > > > > > > range and ignore i_size. It is much easier than trying to wrap your head > > > > > > > around possible races with file operations modifying i_size. > > > > > > > > > > > > > > Honza > > > > > > > > > > > > I'm basically emulating _exactly_ what filemap_write_and_wait does here, > > > > > > as I'm leery of making subtle behavior changes in the actual writeback > > > > > > behavior. 
For example: > > > > > > > > > > > > -----------------8<---------------- > > > > > > static inline int __filemap_fdatawrite(struct address_space *mapping, > > > > > > int sync_mode) > > > > > > { > > > > > > return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode); > > > > > > } > > > > > > > > > > > > int filemap_fdatawrite(struct address_space *mapping) > > > > > > { > > > > > > return __filemap_fdatawrite(mapping, WB_SYNC_ALL); > > > > > > } > > > > > > EXPORT_SYMBOL(filemap_fdatawrite); > > > > > > -----------------8<---------------- > > > > > > > > > > > > ...which then sets up the wbc with the right ranges and sync mode and > > > > > > kicks off writepages. But then, it does the i_size_read to figure out > > > > > > what range it should wait on (with the shortcut for the size == 0 case). > > > > > > > > > > > > My assumption was that it was intentionally designed that way, but I'm > > > > > > guessing from your comments that it wasn't? If so, then we can turn > > > > > > file_write_and_wait a static inline wrapper around > > > > > > file_write_and_wait_range. > > > > > > > > > > FWIW, I did a bit of archaeology in the linux-history tree and found > > > > > this patch from Marcelo in 2004. Is this optimization still helpful? If > > > > > not, then that does simplify the code a bit. > > > > > > > > > > -------------------8<-------------------- > > > > > > > > > > [PATCH] small wait_on_page_writeback_range() optimization > > > > > > > > > > filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end" > > > > > parameter. This is not needed since we know the EOF from the inode. Use > > > > > that instead. 
> > > > > > > > > > Signed-off-by: Marcelo Tosatti > > > > > Signed-off-by: Andrew Morton > > > > > Signed-off-by: Linus Torvalds > > > > > --- > > > > > mm/filemap.c | 8 +++++++- > > > > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/mm/filemap.c b/mm/filemap.c > > > > > index 78e18b7639b6..55fb7b4141e4 100644 > > > > > --- a/mm/filemap.c > > > > > +++ b/mm/filemap.c > > > > > @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range); > > > > > */ > > > > > int filemap_fdatawait(struct address_space *mapping) > > > > > { > > > > > - return wait_on_page_writeback_range(mapping, 0, -1); > > > > > + loff_t i_size = i_size_read(mapping->host); > > > > > + > > > > > + if (i_size == 0) > > > > > + return 0; > > > > > + > > > > > + return wait_on_page_writeback_range(mapping, 0, > > > > > + (i_size - 1) >> PAGE_CACHE_SHIFT); > > > > > } > > > > > EXPORT_SYMBOL(filemap_fdatawait); > > > > > > > > > > > > > Does this ever get called in cases where we would not hold fs locks? In > > > > that case we definitely don't want to be relying on i_size, > > > > > > > > Steve. > > > > > > > > > > Yes. We can initiate and wait on writeback from any context where you > > > can sleep, really. > > > > > > We're just waiting on whole file writeback here, so I don't think > > > there's anything wrong. As long as the i_size was valid at some point in > > > time prior to waiting then you're ok. > > > > > > The question I have is more whether this optimization is still useful. > > > > > > What we do now is just walk the radix tree and wait_on_page_writeback > > > for each page. Do we gain anything by avoiding ranges beyond the current > > > EOF with the pagecache infrastructure of 2017? 
> > > > FWIW I'm not aware of any significant benefit of using i_size in > > filemap_fdatawait() - we iterate to the end of the radix tree node anyway > > since pagevec_lookup_tag() does not support range searches anyway (I'm > > working on fixing that however even after that the benefit would be still > > rather marginal). > > > > What Marcelo might have meant even back in 2004 was that if we are in the > > middle of truncate, i_size is already reduced but page cache not truncated > > yet, then filemap_fdatawait() does not have to wait for writeback of > > truncated pages. That might be a noticeable benefit even today if such a race > > happens; however, I'm not sure it's worth optimizing for, and surprises > > arising from randomly snapshotting i_size (which especially for clustered > > filesystems may be out of date) IMHO outweigh the possible advantage. > > > > Honza > > Thanks for clarifying. > > Given that file_write_and_wait is a new helper function anyway, I'll > just make it a wrapper around file_write_and_wait_range. Since it might Agreed. > be racy, should we remove this optimization from the "legacy" > filemap_fdatawait / filemap_fdatawait_keep_errors calls? I'm for it. Honza -- Jan Kara SUSE Labs, CR From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi0-f70.google.com (mail-oi0-f70.google.com [209.85.218.70]) by kanga.kvack.org (Postfix) with ESMTP id 7BF3D6B04AA for ; Mon, 31 Jul 2017 12:49:30 -0400 (EDT) Received: by mail-oi0-f70.google.com with SMTP id s21so23459954oie.5 for ; Mon, 31 Jul 2017 09:49:30 -0700 (PDT) Received: from mail.kernel.org (mail.kernel.org. 
[198.145.29.99]) by mx.google.com with ESMTPS id s67si11079712oig.379.2017.07.31.09.49.29 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 31 Jul 2017 09:49:29 -0700 (PDT) From: Jeff Layton Subject: [PATCH v3] mm: add file_fdatawait_range and file_write_and_wait Date: Mon, 31 Jul 2017 12:49:25 -0400 Message-Id: <20170731164925.2158-1-jlayton@kernel.org> In-Reply-To: <20170726175538.13885-3-jlayton@kernel.org> References: <20170726175538.13885-3-jlayton@kernel.org> Sender: owner-linux-mm@kvack.org List-ID: To: Alexander Viro , Jan Kara Cc: "J . Bruce Fields" , Andrew Morton , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Matthew Wilcox , Bob Peterson , Steven Whitehouse , cluster-devel@redhat.com From: Jeff Layton Necessary now for gfs2_fsync and sync_file_range, but there will eventually be other callers. Signed-off-by: Jeff Layton --- include/linux/fs.h | 11 ++++++++++- mm/filemap.c | 23 +++++++++++++++++++++++ 2 files changed, 33 insertions(+), 1 deletion(-) v3: make file_write_and_wait a wrapper around file_write_and_wait_range diff --git a/include/linux/fs.h b/include/linux/fs.h index 526b6a9f30d4..909210bd6366 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2549,6 +2549,8 @@ static inline int filemap_fdatawait(struct address_space *mapping) extern bool filemap_range_has_page(struct address_space *, loff_t lstart, loff_t lend); +extern int __must_check file_fdatawait_range(struct file *file, loff_t lstart, + loff_t lend); extern int filemap_write_and_wait(struct address_space *mapping); extern int filemap_write_and_wait_range(struct address_space *mapping, loff_t lstart, loff_t lend); @@ -2557,12 +2559,19 @@ extern int __filemap_fdatawrite_range(struct address_space *mapping, extern int filemap_fdatawrite_range(struct address_space *mapping, loff_t start, loff_t end); extern int filemap_check_errors(struct address_space *mapping); - extern void __filemap_set_wb_err(struct address_space 
*mapping, int err); + +extern int __must_check file_fdatawait_range(struct file *file, loff_t lstart, + loff_t lend); extern int __must_check file_check_and_advance_wb_err(struct file *file); extern int __must_check file_write_and_wait_range(struct file *file, loff_t start, loff_t end); +static inline int file_write_and_wait(struct file *file) +{ + return file_write_and_wait_range(file, 0, LLONG_MAX); +} + /** * filemap_set_wb_err - set a writeback error on an address_space * @mapping: mapping in which to set writeback error diff --git a/mm/filemap.c b/mm/filemap.c index 953804b29a75..85dfe3bee324 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -476,6 +476,29 @@ int filemap_fdatawait_range(struct address_space *mapping, loff_t start_byte, EXPORT_SYMBOL(filemap_fdatawait_range); /** + * file_fdatawait_range - wait for writeback to complete + * @file: file pointing to address space structure to wait for + * @start_byte: offset in bytes where the range starts + * @end_byte: offset in bytes where the range ends (inclusive) + * + * Walk the list of under-writeback pages of the address space that file + * refers to, in the given range and wait for all of them. Check error + * status of the address space vs. the file->f_wb_err cursor and return it. + * + * Since the error status of the file is advanced by this function, + * callers are responsible for checking the return value and handling and/or + * reporting the error. + */ +int file_fdatawait_range(struct file *file, loff_t start_byte, loff_t end_byte) +{ + struct address_space *mapping = file->f_mapping; + + __filemap_fdatawait_range(mapping, start_byte, end_byte); + return file_check_and_advance_wb_err(file); +} +EXPORT_SYMBOL(file_fdatawait_range); + +/** * filemap_fdatawait_keep_errors - wait for writeback without clearing errors * @mapping: address space structure to wait for * -- 2.13.3 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. 
For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f199.google.com (mail-wr0-f199.google.com [209.85.128.199]) by kanga.kvack.org (Postfix) with ESMTP id 67AE26B050F for ; Tue, 1 Aug 2017 05:52:34 -0400 (EDT) Received: by mail-wr0-f199.google.com with SMTP id z48so1614782wrc.4 for ; Tue, 01 Aug 2017 02:52:34 -0700 (PDT) Received: from mx1.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id b193si870869wme.227.2017.08.01.02.52.32 for (version=TLS1 cipher=AES128-SHA bits=128/128); Tue, 01 Aug 2017 02:52:32 -0700 (PDT) Date: Tue, 1 Aug 2017 11:52:31 +0200 From: Jan Kara Subject: Re: [PATCH v3] mm: add file_fdatawait_range and file_write_and_wait Message-ID: <20170801095231.GE4215@quack2.suse.cz> References: <20170726175538.13885-3-jlayton@kernel.org> <20170731164925.2158-1-jlayton@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170731164925.2158-1-jlayton@kernel.org> Sender: owner-linux-mm@kvack.org List-ID: To: Jeff Layton Cc: Alexander Viro , Jan Kara , "J . Bruce Fields" , Andrew Morton , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Matthew Wilcox , Bob Peterson , Steven Whitehouse , cluster-devel@redhat.com On Mon 31-07-17 12:49:25, Jeff Layton wrote: > From: Jeff Layton > > Necessary now for gfs2_fsync and sync_file_range, but there will > eventually be other callers. > > Signed-off-by: Jeff Layton Looks good to me. 
You can add: Reviewed-by: Jan Kara Honza > --- > include/linux/fs.h | 11 ++++++++++- > mm/filemap.c | 23 +++++++++++++++++++++++ > 2 files changed, 33 insertions(+), 1 deletion(-) > > v3: make file_write_and_wait a wrapper around file_write_and_wait_range > > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 526b6a9f30d4..909210bd6366 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -2549,6 +2549,8 @@ static inline int filemap_fdatawait(struct address_space *mapping) > > extern bool filemap_range_has_page(struct address_space *, loff_t lstart, > loff_t lend); > +extern int __must_check file_fdatawait_range(struct file *file, loff_t lstart, > + loff_t lend); > extern int filemap_write_and_wait(struct address_space *mapping); > extern int filemap_write_and_wait_range(struct address_space *mapping, > loff_t lstart, loff_t lend); > @@ -2557,12 +2559,19 @@ extern int __filemap_fdatawrite_range(struct address_space *mapping, > extern int filemap_fdatawrite_range(struct address_space *mapping, > loff_t start, loff_t end); > extern int filemap_check_errors(struct address_space *mapping); > - > extern void __filemap_set_wb_err(struct address_space *mapping, int err); > + > +extern int __must_check file_fdatawait_range(struct file *file, loff_t lstart, > + loff_t lend); > extern int __must_check file_check_and_advance_wb_err(struct file *file); > extern int __must_check file_write_and_wait_range(struct file *file, > loff_t start, loff_t end); > > +static inline int file_write_and_wait(struct file *file) > +{ > + return file_write_and_wait_range(file, 0, LLONG_MAX); > +} > + > /** > * filemap_set_wb_err - set a writeback error on an address_space > * @mapping: mapping in which to set writeback error > diff --git a/mm/filemap.c b/mm/filemap.c > index 953804b29a75..85dfe3bee324 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -476,6 +476,29 @@ int filemap_fdatawait_range(struct address_space *mapping, loff_t start_byte, > 
EXPORT_SYMBOL(filemap_fdatawait_range); > > /** > + * file_fdatawait_range - wait for writeback to complete > + * @file: file pointing to address space structure to wait for > + * @start_byte: offset in bytes where the range starts > + * @end_byte: offset in bytes where the range ends (inclusive) > + * > + * Walk the list of under-writeback pages of the address space that file > + * refers to, in the given range and wait for all of them. Check error > + * status of the address space vs. the file->f_wb_err cursor and return it. > + * > + * Since the error status of the file is advanced by this function, > + * callers are responsible for checking the return value and handling and/or > + * reporting the error. > + */ > +int file_fdatawait_range(struct file *file, loff_t start_byte, loff_t end_byte) > +{ > + struct address_space *mapping = file->f_mapping; > + > + __filemap_fdatawait_range(mapping, start_byte, end_byte); > + return file_check_and_advance_wb_err(file); > +} > +EXPORT_SYMBOL(file_fdatawait_range); > + > +/** > * filemap_fdatawait_keep_errors - wait for writeback without clearing errors > * @mapping: address space structure to wait for > * > -- > 2.13.3 > -- Jan Kara SUSE Labs, CR -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org
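
Editor's note, not part of the original thread: the following is a minimal userspace sketch of the concern that led to v3, namely that bounding the wait by a snapshot of i_size can skip pages that are still under writeback past EOF, or skip everything when i_size reads as zero (the block device case Steve raised). It is not kernel code. Every name in it (model_mapping, model_fdatawait_range, model_wait_isize, model_wait_whole, model_pages_in_writeback) is hypothetical, and the page cache is modelled as a small array of flags with an assumed 4 KiB page size.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define MODEL_NR_PAGES   8	/* assumption: tiny fixed-size "page cache" */

/* Hypothetical stand-in for an address_space together with its host inode. */
struct model_mapping {
	long long i_size;			/* inode size in bytes, as snapshotted */
	bool writeback[MODEL_NR_PAGES];		/* true: page is under writeback */
};

/*
 * Model of waiting on every page under writeback that overlaps
 * [start_byte, end_byte] (inclusive). "Waiting" just clears the flag.
 * Returns the number of pages waited on.
 */
int model_fdatawait_range(struct model_mapping *m,
			  long long start_byte, long long end_byte)
{
	int waited = 0;

	assert(m != NULL);
	if (start_byte < 0 || end_byte < start_byte)
		return 0;

	long long first = start_byte >> MODEL_PAGE_SHIFT;
	long long last = end_byte >> MODEL_PAGE_SHIFT;

	for (long long i = first; i <= last && i < MODEL_NR_PAGES; i++) {
		if (m->writeback[i]) {
			m->writeback[i] = false;
			waited++;
		}
	}
	return waited;
}

/* The 2004-style optimization: bound the wait by i_size, skip if it is 0. */
int model_wait_isize(struct model_mapping *m)
{
	if (m->i_size == 0)
		return 0;
	return model_fdatawait_range(m, 0, m->i_size - 1);
}

/* The v3 approach: ignore i_size and wait on the whole range. */
int model_wait_whole(struct model_mapping *m)
{
	return model_fdatawait_range(m, 0, LLONG_MAX);
}

/* Count pages still flagged as under writeback. */
int model_pages_in_writeback(const struct model_mapping *m)
{
	int n = 0;

	for (size_t i = 0; i < MODEL_NR_PAGES; i++)
		n += m->writeback[i] ? 1 : 0;
	return n;
}
```

With i_size set to two pages and pages 0, 1 and 3 flagged, model_wait_isize() returns with page 3 still under writeback, while model_wait_whole() leaves nothing behind. With i_size of zero, model_wait_isize() waits on nothing at all. The sketch says nothing about whether such pages can actually occur on a given filesystem; that question was left open in the thread.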