From: Su Yue <glass.su@suse.com>
To: ocfs2-devel@lists.linux.dev
Cc: joseph.qi@linux.alibaba.com, Su Yue <glass.su@suse.com>
Subject: [PATCH 2/4] ocfs2: fix races between hole punching and AIO+DIO
Date: Sun, 31 Mar 2024 19:17:42 +0800
Message-ID: <20240331111744.7224-3-l@damenly.org>
In-Reply-To: <20240331111744.7224-1-l@damenly.org>

From: Su Yue <glass.su@suse.com>

After commit "ocfs2: return real error code in ocfs2_dio_wr_get_block",
fstests/generic/300 went from always failing to failing intermittently:

========================================================================
[  473.293420 ] run fstests generic/300

[  475.296983 ] JBD2: Ignoring recovery information on journal
[  475.302473 ] ocfs2: Mounting device (253,1) on (node local, slot 0)
with ordered data mode.
[  494.290998 ] OCFS2: ERROR (device dm-1): ocfs2_change_extent_flag:
Owner 5668 has an extent at cpos 78723 which can no longer be found
[  494.291609 ] On-disk corruption discovered. Please run fsck.ocfs2
once the filesystem is unmounted.
[  494.292018 ] OCFS2: File system is now read-only.
[  494.292224 ] (kworker/19:11,2628,19):ocfs2_mark_extent_written:5272
ERROR: status = -30
[  494.292602 ] (kworker/19:11,2628,19):ocfs2_dio_end_io_write:2374
ERROR: status = -3
fio: io_u error on file /mnt/scratch/racer: Read-only file system: write
offset=460849152, buflen=131072
=========================================================================

In __blockdev_direct_IO, ocfs2_dio_wr_get_block is called to add
unwritten extents to a list. The extents are also inserted into the
extent tree in ocfs2_write_begin_nolock. Another thread then calls
fallocate to punch a hole over one of the unwritten extents, and the
extent at that cpos is removed by ocfs2_remove_extent(). When the end-io
worker thread runs, ocfs2_search_extent_list finds no extent at the cpos.

    T1                        T2                T3
                              inode lock
                                ...
                                insert extents
                                ...
                              inode unlock
ocfs2_fallocate
 __ocfs2_change_file_space
  inode lock
  lock ip_alloc_sem
  ocfs2_remove_inode_range inode
   ocfs2_remove_btree_range
    ocfs2_remove_extent
    ^---remove the extent at cpos 78723
  ...
  unlock ip_alloc_sem
  inode unlock
                                       ocfs2_dio_end_io
                                        ocfs2_dio_end_io_write
                                         lock ip_alloc_sem
                                         ocfs2_mark_extent_written
                                          ocfs2_change_extent_flag
                                           ocfs2_search_extent_list
                                           ^---failed to find extent
                                          ...
                                          unlock ip_alloc_sem

In most filesystems, fallocate is not allowed to race with AIO+DIO, so
fix it by waiting for all in-flight dio before fallocate/punch_hole, as
ext4 does.

Signed-off-by: Su Yue <glass.su@suse.com>
---
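Note (not part of the commit message): the ordering problem described
above, and the effect of draining in-flight dio first, can be sketched
as a small userspace model. This is only an illustration under stated
assumptions, not ocfs2 or VFS code. struct model_inode and every
model_* name are hypothetical; the mutex stands in for the model's own
bookkeeping rather than i_rwsem or ip_alloc_sem, and the counter only
mimics the role that the in-flight dio count plays for inode_dio_wait().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the inode state in this race. */
struct model_inode {
	pthread_mutex_t lock;     /* protects the fields below */
	pthread_cond_t dio_done;  /* signalled when dio_count drops to 0 */
	int dio_count;            /* direct I/O requests still in flight */
	bool extent_present;      /* the unwritten extent at the cpos */
};

/* T2: insert the unwritten extent and submit the direct I/O. */
static void model_dio_begin(struct model_inode *mi)
{
	pthread_mutex_lock(&mi->lock);
	mi->extent_present = true;
	mi->dio_count++;
	pthread_mutex_unlock(&mi->lock);
}

/* T3: end-io worker. Returns 0 if the extent is still there, -1 if not. */
static int model_dio_end(struct model_inode *mi)
{
	int status;

	pthread_mutex_lock(&mi->lock);
	status = mi->extent_present ? 0 : -1;
	if (--mi->dio_count == 0)
		pthread_cond_broadcast(&mi->dio_done);
	pthread_mutex_unlock(&mi->lock);
	return status;
}

/* T1: punch a hole; optionally drain in-flight dio before removing. */
static void model_punch_hole(struct model_inode *mi, bool wait_for_dio)
{
	pthread_mutex_lock(&mi->lock);
	if (wait_for_dio)
		while (mi->dio_count > 0)
			pthread_cond_wait(&mi->dio_done, &mi->lock);
	mi->extent_present = false;
	pthread_mutex_unlock(&mi->lock);
}

struct end_io_args {
	struct model_inode *mi;
	int status;
};

static void *end_io_worker(void *p)
{
	struct end_io_args *args = p;

	args->status = model_dio_end(args->mi);
	return NULL;
}

/* Run one T2 -> T1 -> T3 scenario; returns what the end-io worker saw. */
static int model_run(bool wait_for_dio)
{
	struct model_inode mi = { .dio_count = 0, .extent_present = false };
	struct end_io_args args = { &mi, 0 };
	pthread_t worker;

	pthread_mutex_init(&mi.lock, NULL);
	pthread_cond_init(&mi.dio_done, NULL);

	model_dio_begin(&mi);
	if (!wait_for_dio)
		model_punch_hole(&mi, false); /* hole lands before end-io */
	if (pthread_create(&worker, NULL, end_io_worker, &args) != 0)
		return -2;
	if (wait_for_dio)
		model_punch_hole(&mi, true);  /* blocks until end-io ran */
	pthread_join(worker, NULL);

	pthread_cond_destroy(&mi.dio_done);
	pthread_mutex_destroy(&mi.lock);
	return args.status;
}
```

In this model, model_run(false) reproduces the ordering in the diagram:
the hole is punched while the dio is still in flight, so the end-io step
no longer finds its extent. With model_run(true) the hole punch cannot
remove the extent until the in-flight count has dropped to zero, so the
end-io step always runs first.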
 fs/ocfs2/file.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 0da8e7bd3261..ccc57038a977 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1936,6 +1936,8 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
 
 	inode_lock(inode);
 
+	/* Wait all existing dio workers, newcomers will block on i_rwsem */
+	inode_dio_wait(inode);
 	/*
 	 * This prevents concurrent writes on other nodes
 	 */
-- 
2.44.0

