Date: Mon, 1 Aug 2022 10:06:41 +1000
From: Dave Chinner
To: "Darrick J. Wong"
Cc: xfs
Subject: Re: [PATCH] xfs: check return codes when flushing block devices
Message-ID: <20220801000641.GZ3600936@dread.disaster.area>
X-Mailing-List: linux-xfs@vger.kernel.org

On Sun, Jul 31, 2022 at 09:22:28AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong
>
> If a block device cache flush fails, fsync needs to report that to upper
> levels. If the log can't flush the data device, we should shut it down
> immediately because we've just violated an invariant. Hence, check the
> return value of blkdev_issue_flush.
>
> Signed-off-by: Darrick J. Wong
> ---
>  fs/xfs/xfs_file.c | 15 ++++++++++-----
>  fs/xfs/xfs_log.c  |  7 +++++--
>  2 files changed, 15 insertions(+), 7 deletions(-)
>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 5a171c0b244b..88450c33ab01 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -163,9 +163,11 @@ xfs_file_fsync(
>  	 * inode size in case of an extending write.
>  	 */
>  	if (XFS_IS_REALTIME_INODE(ip))
> -		blkdev_issue_flush(mp->m_rtdev_targp->bt_bdev);
> +		error = blkdev_issue_flush(mp->m_rtdev_targp->bt_bdev);
>  	else if (mp->m_logdev_targp != mp->m_ddev_targp)
> -		blkdev_issue_flush(mp->m_ddev_targp->bt_bdev);
> +		error = blkdev_issue_flush(mp->m_ddev_targp->bt_bdev);
> +	if (error)
> +		return error;
>
>  	/*
>  	 * Any inode that has dirty modifications in the log is pinned. The
> @@ -173,8 +175,11 @@ xfs_file_fsync(
>  	 * that happen concurrently to the fsync call, but fsync semantics
>  	 * only require to sync previously completed I/O.
>  	 */
> -	if (xfs_ipincount(ip))
> +	if (xfs_ipincount(ip)) {
>  		error = xfs_fsync_flush_log(ip, datasync, &log_flushed);
> +		if (error)
> +			return error;
> +	}

Shouldn't we still try to flush the data device if necessary, even
if the log flush failed?
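
To make the question concrete, here is a minimal userspace sketch of an
fsync tail that remembers the log force error but still issues the data
device flush, then returns the first error seen. It is not kernel code
and has not been run against XFS. The names model_dev, model_flush and
model_fsync_tail are hypothetical stand-ins, it only covers the
single-device, non-realtime case, and it assumes a failed log force
leaves log_flushed false.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a block device; not a real kernel type. */
struct model_dev {
	int flush_result;	/* value a cache flush on this device returns */
	int flush_calls;	/* number of cache flushes issued so far */
};

/* Stand-in for blkdev_issue_flush(): count the call, return the result. */
int model_flush(struct model_dev *dev)
{
	assert(dev != NULL);
	dev->flush_calls++;
	return dev->flush_result;
}

/*
 * Model of the tail of an fsync for a single-device, non-realtime file.
 *
 * pinned:        the inode has dirty modifications in the log
 * log_result:    what the log force returns (0 or a negative errno)
 * log_did_flush: whether a successful log force flushed the device cache
 *
 * Assumption: a failed log force did not flush the device cache.
 */
int model_fsync_tail(struct model_dev *ddev, bool pinned, int log_result,
		     bool log_did_flush)
{
	bool log_flushed = false;
	int flush_error;
	int error = 0;

	if (pinned) {
		error = log_result;
		if (!error)
			log_flushed = log_did_flush;
	}

	/* Still flush the data device, even if the log force failed. */
	if (!log_flushed) {
		flush_error = model_flush(ddev);
		if (!error)
			error = flush_error;	/* keep the first error */
	}

	return error;
}
```

With this shape the caller still sees the log force error, and a flush
error is only reported when nothing failed before it.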

>
>  	/*
>  	 * If we only have a single device, and the log force about was
> @@ -185,9 +190,9 @@ xfs_file_fsync(
>  	 */
>  	if (!log_flushed && !XFS_IS_REALTIME_INODE(ip) &&
>  	    mp->m_logdev_targp == mp->m_ddev_targp)
> -		blkdev_issue_flush(mp->m_ddev_targp->bt_bdev);
> +		return blkdev_issue_flush(mp->m_ddev_targp->bt_bdev);
>
> -	return error;
> +	return 0;
>  }
>
>  static int
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 4b1c0a9c6368..8a767f4145f0 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -1926,8 +1926,11 @@ xlog_write_iclog(
>  		 * by the LSN in this iclog is on stable storage. This is slow,
>  		 * but it *must* complete before we issue the external log IO.
>  		 */
> -		if (log->l_targ != log->l_mp->m_ddev_targp)
> -			blkdev_issue_flush(log->l_mp->m_ddev_targp->bt_bdev);
> +		if (log->l_targ != log->l_mp->m_ddev_targp &&
> +		    blkdev_issue_flush(log->l_mp->m_ddev_targp->bt_bdev)) {
> +			xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR);
> +			return;
> +		}

That seems pretty drastic, though I'm not sure what else apart from
ignoring the data device flush error can be done here.

Also, it's not actually a log IO error - it's a data device IO error,
so it's really a metadata writeback problem. Hence the use of
SHUTDOWN_LOG_IO_ERROR probably needs a comment to explain why it
needs to be used here...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com