From: Ritesh Harjani <riteshh@linux.ibm.com>
To: linux-ext4@vger.kernel.org
Cc: tytso@mit.edu, jack@suse.cz, Ritesh Harjani <riteshh@linux.ibm.com>
Subject: [RFC 2/2] ext4: Fix stale data read issue with DIO read & ext4_page_mkwrite path
Date: Mon, 13 Jan 2020 16:34:22 +0530
Message-ID: <1c2da3cf5e0d90e8650e81f07976629c7d87e8ca.1578907891.git.riteshh@linux.ibm.com>
In-Reply-To: <cover.1578907890.git.riteshh@linux.ibm.com>

Currently there is a small race window: if ext4 allocates a written
block for an mmapped file while a DIO read is in progress, the DIO
read may expose stale data.

This patch fixes the issue as follows:
1. In the non-delalloc path, page_mkwrite uses unwritten blocks by
   default for extent-based files.

2. In the delalloc path, we check during writeback whether DIO is in
   progress. If it is, we allocate unwritten blocks to avoid this race.

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
---
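The two decisions described in the commit message can be sketched as a
small user-space model. This is only an illustration under stated
assumptions: struct model_inode, enum alloc_mode and both helper
functions are hypothetical names that do not exist in the kernel, and
the boolean fields merely stand in for the checks the patch performs
(EXT4_INODE_EXTENTS, ext4_should_journal_data(),
ext4_should_dioread_nolock() and inode->i_dio_count).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_inode {
	bool extent_based;   /* stands in for the EXT4_INODE_EXTENTS flag */
	bool journal_data;   /* stands in for ext4_should_journal_data() */
	bool dioread_nolock; /* stands in for ext4_should_dioread_nolock() */
	int dio_count;       /* stands in for inode->i_dio_count */
};

enum alloc_mode { ALLOC_WRITTEN, ALLOC_UNWRITTEN };

/*
 * Non-delalloc page_mkwrite path: with the patch, extent-based files
 * that do not journal data get unwritten blocks by default.
 */
static enum alloc_mode mkwrite_alloc_mode(const struct model_inode *inode)
{
	if (inode->extent_based && !inode->journal_data)
		return ALLOC_UNWRITTEN;
	return ALLOC_WRITTEN;
}

/*
 * Delalloc writeback path: unwritten blocks are used when
 * dioread_nolock is enabled or when any DIO is in flight.
 */
static enum alloc_mode writeback_alloc_mode(const struct model_inode *inode)
{
	bool dio_in_progress;

	/* An in-flight counter can never be negative. */
	assert(inode->dio_count >= 0);
	dio_in_progress = inode->dio_count != 0;

	if (inode->dioread_nolock || dio_in_progress)
		return ALLOC_UNWRITTEN;
	return ALLOC_WRITTEN;
}
```

The model deliberately leaves out the memory barrier and the tagging of
pages for writeback that precede the i_dio_count check in the real code;
it only captures which allocation mode each path selects.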
 fs/ext4/inode.c | 45 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index d035acab5b2a..07f66782335b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1529,6 +1529,7 @@ struct mpage_da_data {
 	struct ext4_map_blocks map;
 	struct ext4_io_submit io_submit;	/* IO submission data */
 	unsigned int do_map:1;
+	bool dio_in_progress:1;
 };
 
 static void mpage_release_unused_pages(struct mpage_da_data *mpd,
@@ -2359,7 +2360,7 @@ static int mpage_map_one_extent(handle_t *handle, struct mpage_da_data *mpd)
 			   EXT4_GET_BLOCKS_METADATA_NOFAIL |
 			   EXT4_GET_BLOCKS_IO_SUBMIT;
 	dioread_nolock = ext4_should_dioread_nolock(inode);
-	if (dioread_nolock)
+	if (dioread_nolock || mpd->dio_in_progress)
 		get_blocks_flags |= EXT4_GET_BLOCKS_IO_CREATE_EXT;
 	if (map->m_flags & (1 << BH_Delay))
 		get_blocks_flags |= EXT4_GET_BLOCKS_DELALLOC_RESERVE;
@@ -2367,7 +2368,8 @@ static int mpage_map_one_extent(handle_t *handle, struct mpage_da_data *mpd)
 	err = ext4_map_blocks(handle, inode, map, get_blocks_flags);
 	if (err < 0)
 		return err;
-	if (dioread_nolock && (map->m_flags & EXT4_MAP_UNWRITTEN)) {
+	if ((dioread_nolock || mpd->dio_in_progress) &&
+	    (map->m_flags & EXT4_MAP_UNWRITTEN)) {
 		if (!mpd->io_submit.io_end->handle &&
 		    ext4_handle_valid(handle)) {
 			mpd->io_submit.io_end->handle = handle->h_rsv_handle;
@@ -2626,6 +2628,7 @@ static int ext4_writepages(struct address_space *mapping,
 	bool done;
 	struct blk_plug plug;
 	bool give_up_on_write = false;
+	bool dio_in_progress = false;
 
 	if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
 		return -EIO;
@@ -2680,15 +2683,6 @@ static int ext4_writepages(struct address_space *mapping,
 		ext4_journal_stop(handle);
 	}
 
-	if (ext4_should_dioread_nolock(inode)) {
-		/*
-		 * We may need to convert up to one extent per block in
-		 * the page and we may dirty the inode.
-		 */
-		rsv_blocks = 1 + ext4_chunk_trans_blocks(inode,
-						PAGE_SIZE >> inode->i_blkbits);
-	}
-
 	if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX)
 		range_whole = 1;
 
@@ -2712,6 +2706,26 @@ static int ext4_writepages(struct address_space *mapping,
 	done = false;
 	blk_start_plug(&plug);
 
+	/*
+	 * If DIO is in progress, then we use unwritten blocks for allocation.
+	 * This is to avoid a small window of race (stale read) with
+	 * ext4_page_mkwrite path in delalloc case & with DIO read in parallel.
+	 *
+	 * Let's check for i_dio_count after we have tagged pages for writeback.
+	 */
+	smp_mb__before_atomic();
+	dio_in_progress = !!atomic_read(&inode->i_dio_count);
+	mpd.dio_in_progress = dio_in_progress;
+
+	if (ext4_should_dioread_nolock(inode) || dio_in_progress) {
+		/*
+		 * We may need to convert up to one extent per block in
+		 * the page and we may dirty the inode.
+		 */
+		rsv_blocks = 1 + ext4_chunk_trans_blocks(inode,
+						PAGE_SIZE >> inode->i_blkbits);
+	}
+
 	/*
 	 * First writeback pages that don't need mapping - we can avoid
 	 * starting a transaction unnecessarily and also avoid being blocked
@@ -5965,8 +5979,13 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
 		}
 	}
 	unlock_page(page);
-	/* OK, we need to fill the hole... */
-	if (ext4_should_dioread_nolock(inode))
+	/*
+	 * OK, we need to fill the hole...
+	 * By default use unwritten block allocation here to avoid a small
+	 * window of race (stale data read) with DIO read path.
+	 */
+	if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS) &&
+	    !ext4_should_journal_data(inode))
 		get_block = ext4_get_block_unwritten;
 	else
 		get_block = ext4_get_block;
-- 
2.21.0


Thread overview: 17+ messages
2020-01-13 11:04 [RFC 0/2] ext4: Fix stale data read exposure problem with DIO read/page_mkwrite Ritesh Harjani
2020-01-13 11:04 ` [RFC 1/2] iomap: direct-io: Move inode_dio_begin before filemap_write_and_wait_range Ritesh Harjani
2020-01-13 21:51   ` Darrick J. Wong
2020-01-14  9:05     ` Jan Kara
2020-01-14 16:38       ` Christoph Hellwig
2020-01-15  9:19         ` Jan Kara
2020-01-15 14:56           ` Christoph Hellwig
2020-01-14  9:12   ` Jan Kara
2020-01-14 16:37   ` Christoph Hellwig
2020-01-14 17:19     ` Jan Kara
2020-01-14 18:27       ` Christoph Hellwig
2020-01-15  9:08         ` Jan Kara
2020-01-13 11:04 ` Ritesh Harjani [this message]
2020-01-14  9:47   ` [RFC 2/2] ext4: Fix stale data read issue with DIO read & ext4_page_mkwrite path Jan Kara
2020-01-14 22:25     ` Ritesh Harjani
2020-01-14 16:39 ` [RFC 0/2] ext4: Fix stale data read exposure problem with DIO read/page_mkwrite Christoph Hellwig
2020-01-14 22:33   ` Ritesh Harjani
