From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-10.1 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS, UNPARSEABLE_RELAY,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 90FE5C433E4 for ; Tue, 21 Jul 2020 20:38:12 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 69CD020717 for ; Tue, 21 Jul 2020 20:38:12 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="v+YoGT/I" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730488AbgGUUiL (ORCPT ); Tue, 21 Jul 2020 16:38:11 -0400 Received: from userp2120.oracle.com ([156.151.31.85]:42060 "EHLO userp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728837AbgGUUiK (ORCPT ); Tue, 21 Jul 2020 16:38:10 -0400 Received: from pps.filterd (userp2120.oracle.com [127.0.0.1]) by userp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 06LKRh4Z178822; Tue, 21 Jul 2020 20:37:54 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=date : from : to : cc : subject : message-id : references : mime-version : content-type : in-reply-to; s=corp-2020-01-29; bh=3x813AIWufPcAcXMzJIzFrj8TwErT72vrpTdX/dgkQA=; b=v+YoGT/IhTP6ML8OX7vng+hNTOCm5q1u3Hs4dEE+ssCk94tH6vbIy0IbFa2ri05c7ejw FyeVkX66gXS73TH/MXTnw4y/wbVCFar/91ku8z4lhzLDBTkec0lDj0PKbkMWVJGr88p2 5i/k77iyEa3BcfY4j6DVOFODdUu9xOO/PD7DcsS4TIu+WmvEqQpQrTgSDmh3lmBst6dT 9nsckwS1r8fVz/xsHZ9HB2TBk+hgzaTS77FPYcCJ6V2/KyPBWLDOqAFmoj1xXN6DShIg jlGncUKSBf8kIbxX8VJD2Js7rK9xVKznCthpLRWY516VxftPSfkckggIXSpOGnpx1Xqs +w== Received: from aserp3020.oracle.com (aserp3020.oracle.com [141.146.126.70]) by userp2120.oracle.com with ESMTP id 32d6ksku6u-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Tue, 21 Jul 2020 20:37:53 +0000 Received: from pps.filterd (aserp3020.oracle.com [127.0.0.1]) by aserp3020.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 06LKS1pC181699; Tue, 21 Jul 2020 20:37:53 GMT Received: from aserv0121.oracle.com (aserv0121.oracle.com [141.146.126.235]) by aserp3020.oracle.com with ESMTP id 32e5fxdcjx-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 21 Jul 2020 20:37:52 +0000 Received: from abhmp0007.oracle.com (abhmp0007.oracle.com [141.146.116.13]) by aserv0121.oracle.com (8.14.4/8.13.8) with ESMTP id 06LKbpxM027485; Tue, 21 Jul 2020 20:37:51 GMT Received: from localhost (/10.159.147.229) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Tue, 21 Jul 2020 13:37:51 -0700 Date: Tue, 21 Jul 2020 13:37:49 -0700 From: "Darrick J. Wong" To: Christoph Hellwig Cc: Dave Chinner , Goldwyn Rodrigues , Damien Le Moal , Naohiro Aota , Johannes Thumshirn , Matthew Wilcox , linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, cluster-devel@redhat.com, linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, Dave Chinner , Goldwyn Rodrigues Subject: Re: [PATCH 3/3] iomap: fall back to buffered writes for invalidation failures Message-ID: <20200721203749.GF3151642@magnolia> References: <20200721183157.202276-1-hch@lst.de> <20200721183157.202276-4-hch@lst.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200721183157.202276-4-hch@lst.de> X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9689 signatures=668680 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=5 malwarescore=0 mlxscore=0 adultscore=0 phishscore=0 spamscore=0 bulkscore=0 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2006250000 definitions=main-2007210135 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9689 signatures=668680 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 malwarescore=0 suspectscore=5 bulkscore=0 mlxscore=0 mlxlogscore=999 impostorscore=0 priorityscore=1501 lowpriorityscore=0 phishscore=0 spamscore=0 adultscore=0 clxscore=1015 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2006250000 definitions=main-2007210135 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org On Tue, Jul 21, 2020 at 08:31:57PM +0200, Christoph Hellwig wrote: > Failing to invalid the page cache means data in incoherent, which is > a very bad state for the system. Always fall back to buffered I/O > through the page cache if we can't invalidate mappings. > > Signed-off-by: Christoph Hellwig > Acked-by: Dave Chinner > Reviewed-by: Goldwyn Rodrigues For the iomap and xfs parts, Reviewed-by: Darrick J. Wong But I'd still like acks from Ted, Andreas, and Damien for ext4, gfs2, and zonefs, respectively. (Particularly if anyone was harboring ideas about trying to get this in before 5.10, though I've not yet heard anyone say that explicitly...) --D > --- > fs/ext4/file.c | 2 ++ > fs/gfs2/file.c | 3 ++- > fs/iomap/direct-io.c | 16 +++++++++++----- > fs/iomap/trace.h | 1 + > fs/xfs/xfs_file.c | 4 ++-- > fs/zonefs/super.c | 7 +++++-- > 6 files changed, 23 insertions(+), 10 deletions(-) > > diff --git a/fs/ext4/file.c b/fs/ext4/file.c > index 2a01e31a032c4c..129cc1dd6b7952 100644 > --- a/fs/ext4/file.c > +++ b/fs/ext4/file.c > @@ -544,6 +544,8 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from) > iomap_ops = &ext4_iomap_overwrite_ops; > ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops, > is_sync_kiocb(iocb) || unaligned_io || extend); > + if (ret == -ENOTBLK) > + ret = 0; > > if (extend) > ret = ext4_handle_inode_extension(inode, offset, ret, count); > diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c > index bebde537ac8cf2..b085a3bea4f0fd 100644 > --- a/fs/gfs2/file.c > +++ b/fs/gfs2/file.c > @@ -835,7 +835,8 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from) > > ret = iomap_dio_rw(iocb, from, &gfs2_iomap_ops, NULL, > is_sync_kiocb(iocb)); > - > + if (ret == -ENOTBLK) > + ret = 0; > out: > gfs2_glock_dq(&gh); > out_uninit: > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c > index 190967e87b69e4..c1aafb2ab99072 100644 > --- a/fs/iomap/direct-io.c > +++ b/fs/iomap/direct-io.c > @@ -10,6 +10,7 @@ > #include > #include > #include > +#include "trace.h" > > #include "../internal.h" > > @@ -401,6 +402,9 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length, > * can be mapped into multiple disjoint IOs and only a subset of the IOs issued > * may be pure data writes. In that case, we still need to do a full data sync > * completion. > + * > + * Returns -ENOTBLK In case of a page invalidation invalidation failure for > + * writes. The callers needs to fall back to buffered I/O in this case. > */ > ssize_t > iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, > @@ -478,13 +482,15 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, > if (iov_iter_rw(iter) == WRITE) { > /* > * Try to invalidate cache pages for the range we are writing. > - * If this invalidation fails, tough, the write will still work, > - * but racing two incompatible write paths is a pretty crazy > - * thing to do, so we don't support it 100%. > + * If this invalidation fails, let the caller fall back to > + * buffered I/O. > */ > if (invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT, > - end >> PAGE_SHIFT)) > - dio_warn_stale_pagecache(iocb->ki_filp); > + end >> PAGE_SHIFT)) { > + trace_iomap_dio_invalidate_fail(inode, pos, count); > + ret = -ENOTBLK; > + goto out_free_dio; > + } > > if (!wait_for_completion && !inode->i_sb->s_dio_done_wq) { > ret = sb_init_dio_done_wq(inode->i_sb); > diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h > index 5693a39d52fb63..fdc7ae388476f5 100644 > --- a/fs/iomap/trace.h > +++ b/fs/iomap/trace.h > @@ -74,6 +74,7 @@ DEFINE_EVENT(iomap_range_class, name, \ > DEFINE_RANGE_EVENT(iomap_writepage); > DEFINE_RANGE_EVENT(iomap_releasepage); > DEFINE_RANGE_EVENT(iomap_invalidatepage); > +DEFINE_RANGE_EVENT(iomap_dio_invalidate_fail); > > #define IOMAP_TYPE_STRINGS \ > { IOMAP_HOLE, "HOLE" }, \ > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c > index a6ef90457abf97..1b4517fc55f1b9 100644 > --- a/fs/xfs/xfs_file.c > +++ b/fs/xfs/xfs_file.c > @@ -553,8 +553,8 @@ xfs_file_dio_aio_write( > xfs_iunlock(ip, iolock); > > /* > - * No fallback to buffered IO on errors for XFS, direct IO will either > - * complete fully or fail. > + * No fallback to buffered IO after short writes for XFS, direct I/O > + * will either complete fully or return an error. > */ > ASSERT(ret < 0 || ret == count); > return ret; > diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c > index 07bc42d62673ce..d0a04528a7e18e 100644 > --- a/fs/zonefs/super.c > +++ b/fs/zonefs/super.c > @@ -786,8 +786,11 @@ static ssize_t zonefs_file_write_iter(struct kiocb *iocb, struct iov_iter *from) > if (iocb->ki_pos >= ZONEFS_I(inode)->i_max_size) > return -EFBIG; > > - if (iocb->ki_flags & IOCB_DIRECT) > - return zonefs_file_dio_write(iocb, from); > + if (iocb->ki_flags & IOCB_DIRECT) { > + ssize_t ret = zonefs_file_dio_write(iocb, from); > + if (ret != -ENOTBLK) > + return ret; > + } > > return zonefs_file_buffered_write(iocb, from); > } > -- > 2.27.0 >