Linux-ext4 Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 0/3] ext4: fix two issue about bdev_try_to_free_page()
@ 2021-04-08 11:36 Zhang Yi
  2021-04-08 11:36 ` [PATCH 1/3] jbd2: protect buffers release with j_checkpoint_mutex Zhang Yi
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Zhang Yi @ 2021-04-08 11:36 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, adilger.kernel, jack, yi.zhang, yukuai3

Hi,

This first patch fix a potential filesystem inconsistency problem and
other two fix a use after free problem.

Zhang Yi (3):
  jbd2: protect buffers release with j_checkpoint_mutex
  jbd2: do not free buffers in jbd2_journal_try_to_free_buffers()
  ext4: add rcu to prevent use after free when umount filesystem

 fs/ext4/inode.c       |  6 ++++--
 fs/ext4/super.c       | 41 +++++++++++++++++++++++++++++------------
 fs/jbd2/journal.c     | 30 +++++++++++++++++++++++++++---
 fs/jbd2/transaction.c | 20 ++++++++++----------
 include/linux/jbd2.h  | 11 ++++++++++-
 5 files changed, 80 insertions(+), 28 deletions(-)

-- 
2.25.4


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/3] jbd2: protect buffers release with j_checkpoint_mutex
  2021-04-08 11:36 [PATCH 0/3] ext4: fix two issue about bdev_try_to_free_page() Zhang Yi
@ 2021-04-08 11:36 ` Zhang Yi
  2021-04-08 13:45   ` Jan Kara
  2021-04-08 11:36 ` [PATCH 2/3] jbd2: do not free buffers in jbd2_journal_try_to_free_buffers() Zhang Yi
  2021-04-08 11:36 ` [PATCH 3/3] ext4: add rcu to prevent use after free when umount filesystem Zhang Yi
  2 siblings, 1 reply; 9+ messages in thread
From: Zhang Yi @ 2021-04-08 11:36 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, adilger.kernel, jack, yi.zhang, yukuai3

There is a race between jbd2_journal_try_to_free_buffers() and
jbd2_journal_destroy(), so the jbd2_log_do_checkpoint() may still
missing to detect the buffer write io error flag and lead to filesystem
inconsistency.

jbd2_journal_try_to_free_buffers()     ext4_put_super()
                                        jbd2_journal_destroy()
  __jbd2_journal_remove_checkpoint()
  detect buffer write error              jbd2_log_do_checkpoint()
                                         jbd2_cleanup_journal_tail()
                                           <--- lead to inconsistency
  jbd2_journal_abort()

Fix this issue by add j_checkpoint_mutex to protect journal buffer
release on jbd2_journal_try_to_free_buffers().

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/jbd2/transaction.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 9396666b7314..b935b20cbae4 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -2123,6 +2123,7 @@ int jbd2_journal_try_to_free_buffers(journal_t *journal, struct page *page)
 
 	J_ASSERT(PageLocked(page));
 
+	mutex_lock(&journal->j_checkpoint_mutex);
 	head = page_buffers(page);
 	bh = head;
 	do {
@@ -2163,6 +2164,7 @@ int jbd2_journal_try_to_free_buffers(journal_t *journal, struct page *page)
 	if (has_write_io_error)
 		jbd2_journal_abort(journal, -EIO);
 
+	mutex_unlock(&journal->j_checkpoint_mutex);
 	return ret;
 }
 
-- 
2.25.4


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 2/3] jbd2: do not free buffers in jbd2_journal_try_to_free_buffers()
  2021-04-08 11:36 [PATCH 0/3] ext4: fix two issue about bdev_try_to_free_page() Zhang Yi
  2021-04-08 11:36 ` [PATCH 1/3] jbd2: protect buffers release with j_checkpoint_mutex Zhang Yi
@ 2021-04-08 11:36 ` Zhang Yi
  2021-04-08 11:36 ` [PATCH 3/3] ext4: add rcu to prevent use after free when umount filesystem Zhang Yi
  2 siblings, 0 replies; 9+ messages in thread
From: Zhang Yi @ 2021-04-08 11:36 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, adilger.kernel, jack, yi.zhang, yukuai3

This patch move try_to_free_buffers() out from
jbd2_journal_try_to_free_buffers() to the caller function, it just check
the buffers are JBD2 journal busy or not, and the caller should invoke
try_to_free_buffers() if it want to release page.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/inode.c       |  6 ++++--
 fs/ext4/super.c       |  8 +++++---
 fs/jbd2/transaction.c | 18 ++++++++----------
 3 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 0948a43f1b3d..3211af9c969f 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3301,6 +3301,7 @@ static void ext4_journalled_invalidatepage(struct page *page,
 static int ext4_releasepage(struct page *page, gfp_t wait)
 {
 	journal_t *journal = EXT4_JOURNAL(page->mapping->host);
+	int ret = 0;
 
 	trace_ext4_releasepage(page);
 
@@ -3308,9 +3309,10 @@ static int ext4_releasepage(struct page *page, gfp_t wait)
 	if (PageChecked(page))
 		return 0;
 	if (journal)
-		return jbd2_journal_try_to_free_buffers(journal, page);
-	else
+		ret = jbd2_journal_try_to_free_buffers(journal, page);
+	if (!ret)
 		return try_to_free_buffers(page);
+	return 0;
 }
 
 static bool ext4_inode_datasync_dirty(struct inode *inode)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 2a33c53b57d8..02ba47a5bc70 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1450,14 +1450,16 @@ static int bdev_try_to_free_page(struct super_block *sb, struct page *page,
 				 gfp_t wait)
 {
 	journal_t *journal = EXT4_SB(sb)->s_journal;
+	int ret = 0;
 
 	WARN_ON(PageChecked(page));
 	if (!page_has_buffers(page))
 		return 0;
 	if (journal)
-		return jbd2_journal_try_to_free_buffers(journal, page);
-
-	return try_to_free_buffers(page);
+		ret = jbd2_journal_try_to_free_buffers(journal, page);
+	if (!ret)
+		return try_to_free_buffers(page);
+	return 0;
 }
 
 #ifdef CONFIG_FS_ENCRYPTION
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index b935b20cbae4..e4acc84a95fa 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -2089,10 +2089,9 @@ __journal_try_to_free_buffer(journal_t *journal, struct buffer_head *bh)
  * if they are fully written out ordered data, move them onto BUF_CLEAN
  * so try_to_free_buffers() can reap them.
  *
- * This function returns non-zero if we wish try_to_free_buffers()
- * to be called. We do this if the page is releasable by try_to_free_buffers().
- * We also do it if the page has locked or dirty buffers and the caller wants
- * us to perform sync or async writeout.
+ * This function returns zero if all the buffers on this page are
+ * journal cleaned and the caller should invoke try_to_free_buffers() and
+ * could release page if the page is releasable by try_to_free_buffers().
  *
  * This complicates JBD locking somewhat.  We aren't protected by the
  * BKL here.  We wish to remove the buffer from its committing or
@@ -2112,7 +2111,7 @@ __journal_try_to_free_buffer(journal_t *journal, struct buffer_head *bh)
  * cannot happen because we never reallocate freed data as metadata
  * while the data is part of a transaction.  Yes?
  *
- * Return 0 on failure, 1 on success
+ * Return 0 on success, -EBUSY if any buffer is still journal busy.
  */
 int jbd2_journal_try_to_free_buffers(journal_t *journal, struct page *page)
 {
@@ -2142,8 +2141,10 @@ int jbd2_journal_try_to_free_buffers(journal_t *journal, struct page *page)
 		__journal_try_to_free_buffer(journal, bh);
 		spin_unlock(&jh->b_state_lock);
 		jbd2_journal_put_journal_head(jh);
-		if (buffer_jbd(bh))
-			goto busy;
+		if (buffer_jbd(bh)) {
+			ret = -EBUSY;
+			break;
+		}
 
 		/*
 		 * If we free a metadata buffer which has been failed to
@@ -2158,9 +2159,6 @@ int jbd2_journal_try_to_free_buffers(journal_t *journal, struct page *page)
 		}
 	} while ((bh = bh->b_this_page) != head);
 
-	ret = try_to_free_buffers(page);
-
-busy:
 	if (has_write_io_error)
 		jbd2_journal_abort(journal, -EIO);
 
-- 
2.25.4


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 3/3] ext4: add rcu to prevent use after free when umount filesystem
  2021-04-08 11:36 [PATCH 0/3] ext4: fix two issue about bdev_try_to_free_page() Zhang Yi
  2021-04-08 11:36 ` [PATCH 1/3] jbd2: protect buffers release with j_checkpoint_mutex Zhang Yi
  2021-04-08 11:36 ` [PATCH 2/3] jbd2: do not free buffers in jbd2_journal_try_to_free_buffers() Zhang Yi
@ 2021-04-08 11:36 ` Zhang Yi
  2021-04-08 13:51   ` kernel test robot
  2021-04-08 13:56   ` Jan Kara
  2 siblings, 2 replies; 9+ messages in thread
From: Zhang Yi @ 2021-04-08 11:36 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, adilger.kernel, jack, yi.zhang, yukuai3

There is a race between bdev_try_to_free_page() and
jbd2_journal_destroy() that could end up triggering a use after free
issue about journal.

drop cache                           umount filesystem
bdev_try_to_free_page()
 get journal
 jbd2_journal_try_to_free_buffers()  ext4_put_super()
                                      kfree(journal)
   access journal <-- lead to UAF

The above race also could happens between the bdev_try_to_free_page()
and the error path of ext4_fill_super(). This patch avoid this race by
add rcu protection around accessing sbi->s_journal in
bdev_try_to_free_page() and destroy the journal after an rcu grace
period.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/super.c      | 33 ++++++++++++++++++++++++---------
 fs/jbd2/journal.c    | 30 +++++++++++++++++++++++++++---
 include/linux/jbd2.h | 11 ++++++++++-
 3 files changed, 61 insertions(+), 13 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 02ba47a5bc70..6bbaadc5357b 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1150,6 +1150,21 @@ static inline void ext4_quota_off_umount(struct super_block *sb)
 }
 #endif
 
+static int ext4_journal_release(struct ext4_sb_info *sbi)
+{
+	journal_t *journal = sbi->s_journal;
+	int ret;
+
+	ret = jbd2_journal_release(journal);
+	sbi->s_journal = NULL;
+	/*
+	 * Call rcu to prevent racing with bdev_try_to_free_page()
+	 * accessing the journal at the same time.
+	 */
+	call_rcu(&journal->j_rcu, jbd2_journal_release_rcu);
+	return ret;
+}
+
 static void ext4_put_super(struct super_block *sb)
 {
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
@@ -1174,11 +1189,9 @@ static void ext4_put_super(struct super_block *sb)
 
 	if (sbi->s_journal) {
 		aborted = is_journal_aborted(sbi->s_journal);
-		err = jbd2_journal_destroy(sbi->s_journal);
-		sbi->s_journal = NULL;
-		if ((err < 0) && !aborted) {
+		err = ext4_journal_release(sbi);
+		if ((err < 0) && !aborted)
 			ext4_abort(sb, -err, "Couldn't clean up the journal");
-		}
 	}
 
 	ext4_es_unregister_shrinker(sbi);
@@ -1449,14 +1462,18 @@ static int ext4_nfs_commit_metadata(struct inode *inode)
 static int bdev_try_to_free_page(struct super_block *sb, struct page *page,
 				 gfp_t wait)
 {
-	journal_t *journal = EXT4_SB(sb)->s_journal;
+	journal_t *journal;
 	int ret = 0;
 
 	WARN_ON(PageChecked(page));
 	if (!page_has_buffers(page))
 		return 0;
+
+	rcu_read_lock();
+	journal = READ_ONCE(EXT4_SB(sb)->s_journal);
 	if (journal)
 		ret = jbd2_journal_try_to_free_buffers(journal, page);
+	rcu_read_unlock();
 	if (!ret)
 		return try_to_free_buffers(page);
 	return 0;
@@ -5146,10 +5163,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 	ext4_xattr_destroy_cache(sbi->s_ea_block_cache);
 	sbi->s_ea_block_cache = NULL;
 
-	if (sbi->s_journal) {
-		jbd2_journal_destroy(sbi->s_journal);
-		sbi->s_journal = NULL;
-	}
+	if (sbi->s_journal)
+		ext4_journal_release(sbi);
 failed_mount3a:
 	ext4_es_unregister_shrinker(sbi);
 failed_mount3:
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 2dc944442802..071caaaa9de1 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -76,6 +76,8 @@ EXPORT_SYMBOL(jbd2_journal_check_available_features);
 EXPORT_SYMBOL(jbd2_journal_set_features);
 EXPORT_SYMBOL(jbd2_journal_load);
 EXPORT_SYMBOL(jbd2_journal_destroy);
+EXPORT_SYMBOL(jbd2_journal_release);
+EXPORT_SYMBOL(jbd2_journal_release_rcu);
 EXPORT_SYMBOL(jbd2_journal_abort);
 EXPORT_SYMBOL(jbd2_journal_errno);
 EXPORT_SYMBOL(jbd2_journal_ack_err);
@@ -1951,14 +1953,14 @@ int jbd2_journal_load(journal_t *journal)
 }
 
 /**
- * jbd2_journal_destroy() - Release a journal_t structure.
+ * jbd2_journal_release() - Release a journal_t structure.
  * @journal: Journal to act on.
  *
  * Release a journal_t structure once it is no longer in use by the
  * journaled object.
  * Return <0 if we couldn't clean up the journal.
  */
-int jbd2_journal_destroy(journal_t *journal)
+int jbd2_journal_release(journal_t *journal)
 {
 	int err = 0;
 
@@ -2021,11 +2023,33 @@ int jbd2_journal_destroy(journal_t *journal)
 		crypto_free_shash(journal->j_chksum_driver);
 	kfree(journal->j_fc_wbuf);
 	kfree(journal->j_wbuf);
-	kfree(journal);
 
 	return err;
 }
 
+/**
+ * jbd2_journal_release_rcu() - Free a journal_t structure.
+ * @rcu: rcu list node relate to the journal want to free.
+ *
+ * Freeing a journal_t structure after a rcu grace period.
+ */
+void jbd2_journal_release_rcu(struct rcu_head *rcu)
+{
+	kfree(container_of(rcu, journal_t, j_rcu));
+}
+
+/**
+ * jbd2_journal_destroy() - Release and free a journal_t structure.
+ * @journal: Journal to act on.
+ *
+ * Release and free a journal_t structure once it is no longer in use
+ * by the journaled object.
+ */
+void jbd2_journal_destroy(journal_t *journal)
+{
+	jbd2_journal_release(journal);
+	kfree(journal);
+}
 
 /**
  * jbd2_journal_check_used_features() - Check if features specified are used.
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 99d3cd051ac3..39a8d04596a2 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1238,6 +1238,13 @@ struct journal_s
 	 */
 	__u32 j_csum_seed;
 
+	/**
+	 * @j_rcu:
+	 *
+	 * Prevent racing between accessing and destroy at the same time.
+	 */
+	struct rcu_head j_rcu;
+
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 	/**
 	 * @j_trans_commit_map:
@@ -1509,7 +1516,9 @@ extern int	   jbd2_journal_set_features
 extern void	   jbd2_journal_clear_features
 		   (journal_t *, unsigned long, unsigned long, unsigned long);
 extern int	   jbd2_journal_load       (journal_t *journal);
-extern int	   jbd2_journal_destroy    (journal_t *);
+extern void	   jbd2_journal_destroy    (journal_t *);
+extern int	   jbd2_journal_release    (journal_t *);
+extern void	   jbd2_journal_release_rcu     (struct rcu_head *rcu);
 extern int	   jbd2_journal_recover    (journal_t *journal);
 extern int	   jbd2_journal_wipe       (journal_t *, int);
 extern int	   jbd2_journal_skip_recovery	(journal_t *);
-- 
2.25.4


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] jbd2: protect buffers release with j_checkpoint_mutex
  2021-04-08 11:36 ` [PATCH 1/3] jbd2: protect buffers release with j_checkpoint_mutex Zhang Yi
@ 2021-04-08 13:45   ` Jan Kara
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Kara @ 2021-04-08 13:45 UTC (permalink / raw)
  To: Zhang Yi; +Cc: linux-ext4, tytso, adilger.kernel, jack, yukuai3

On Thu 08-04-21 19:36:16, Zhang Yi wrote:
> There is a race between jbd2_journal_try_to_free_buffers() and
> jbd2_journal_destroy(), so the jbd2_log_do_checkpoint() may still
> missing to detect the buffer write io error flag and lead to filesystem
> inconsistency.
> 
> jbd2_journal_try_to_free_buffers()     ext4_put_super()
>                                         jbd2_journal_destroy()
>   __jbd2_journal_remove_checkpoint()
>   detect buffer write error              jbd2_log_do_checkpoint()
>                                          jbd2_cleanup_journal_tail()
>                                            <--- lead to inconsistency
>   jbd2_journal_abort()
> 
> Fix this issue by add j_checkpoint_mutex to protect journal buffer
> release on jbd2_journal_try_to_free_buffers().
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

Thanks for the patch Zhang. I agree with your problem analysis but I don't
think the solution is correct:

>  	J_ASSERT(PageLocked(page));
>  
> +	mutex_lock(&journal->j_checkpoint_mutex);

We cannot grab j_checkpoint_mutex inside jbd2_journal_try_to_free_buffers()
(or even ext4_releasepage()) because that function is called withe a page
lock which ranks below the checkpoint mutex - generally page locks are
acquired within a transaction and thus all locks required to start a
transaction (and j_checkpoint_mutex is one of them) rank above the page
lock.

Also even if the lock ordering was OK, grabbing j_checkpoint_mutex for
every page from memory reclaim just to close this rare race seems like a
performance overkill.

What we seem to need is a quick way of marking the journal as "IO error
occured" in __journal_try_to_free_buffer() before actually removing the
buffer from the checkpoint list. Perhaps this marking could even happen
already in __jbd2_journal_remove_checkpoint() and we can reuse it in
jbd2_log_do_checkpoint() for IO error handling as well... And then once we
are in a safer context, we can do:

	if (!is_journal_aborted(journal) && journal_io_error_happened(journal))
		jbd2_journal_abort(...)

								Honza

>  	head = page_buffers(page);
>  	bh = head;
>  	do {
> @@ -2163,6 +2164,7 @@ int jbd2_journal_try_to_free_buffers(journal_t *journal, struct page *page)
>  	if (has_write_io_error)
>  		jbd2_journal_abort(journal, -EIO);
>  
> +	mutex_unlock(&journal->j_checkpoint_mutex);
>  	return ret;
>  }
>  
> -- 
> 2.25.4
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 3/3] ext4: add rcu to prevent use after free when umount filesystem
  2021-04-08 11:36 ` [PATCH 3/3] ext4: add rcu to prevent use after free when umount filesystem Zhang Yi
@ 2021-04-08 13:51   ` kernel test robot
  2021-04-08 13:56   ` Jan Kara
  1 sibling, 0 replies; 9+ messages in thread
From: kernel test robot @ 2021-04-08 13:51 UTC (permalink / raw)
  To: Zhang Yi, linux-ext4
  Cc: kbuild-all, tytso, adilger.kernel, jack, yi.zhang, yukuai3


[-- Attachment #1: Type: text/plain, Size: 8657 bytes --]

Hi Zhang,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on ext4/dev]
[also build test ERROR on linus/master v5.12-rc6 next-20210408]
[cannot apply to tytso-fscrypt/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Zhang-Yi/ext4-fix-two-issue-about-bdev_try_to_free_page/20210408-193105
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git dev
config: alpha-randconfig-m031-20210408 (attached as .config)
compiler: alpha-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/1cd15af09e1887501090c23700b696d43ebf39ca
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Zhang-Yi/ext4-fix-two-issue-about-bdev_try_to_free_page/20210408-193105
        git checkout 1cd15af09e1887501090c23700b696d43ebf39ca
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=alpha 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   fs/ocfs2/journal.c: In function 'ocfs2_journal_shutdown':
>> fs/ocfs2/journal.c:1012:6: error: invalid use of void expression
    1012 |  if (!jbd2_journal_destroy(journal->j_journal) && !status) {
         |      ^


vim +1012 fs/ocfs2/journal.c

ccd979bdbce9fb Mark Fasheh                 2005-12-15   955  
ccd979bdbce9fb Mark Fasheh                 2005-12-15   956  /*
ccd979bdbce9fb Mark Fasheh                 2005-12-15   957   * If the journal has been kmalloc'd it needs to be freed after this
ccd979bdbce9fb Mark Fasheh                 2005-12-15   958   * call.
ccd979bdbce9fb Mark Fasheh                 2005-12-15   959   */
ccd979bdbce9fb Mark Fasheh                 2005-12-15   960  void ocfs2_journal_shutdown(struct ocfs2_super *osb)
ccd979bdbce9fb Mark Fasheh                 2005-12-15   961  {
ccd979bdbce9fb Mark Fasheh                 2005-12-15   962  	struct ocfs2_journal *journal = NULL;
ccd979bdbce9fb Mark Fasheh                 2005-12-15   963  	int status = 0;
ccd979bdbce9fb Mark Fasheh                 2005-12-15   964  	struct inode *inode = NULL;
ccd979bdbce9fb Mark Fasheh                 2005-12-15   965  	int num_running_trans = 0;
ccd979bdbce9fb Mark Fasheh                 2005-12-15   966  
ebdec83ba46c12 Eric Sesterhenn / snakebyte 2006-01-27   967  	BUG_ON(!osb);
ccd979bdbce9fb Mark Fasheh                 2005-12-15   968  
ccd979bdbce9fb Mark Fasheh                 2005-12-15   969  	journal = osb->journal;
ccd979bdbce9fb Mark Fasheh                 2005-12-15   970  	if (!journal)
ccd979bdbce9fb Mark Fasheh                 2005-12-15   971  		goto done;
ccd979bdbce9fb Mark Fasheh                 2005-12-15   972  
ccd979bdbce9fb Mark Fasheh                 2005-12-15   973  	inode = journal->j_inode;
ccd979bdbce9fb Mark Fasheh                 2005-12-15   974  
ccd979bdbce9fb Mark Fasheh                 2005-12-15   975  	if (journal->j_state != OCFS2_JOURNAL_LOADED)
ccd979bdbce9fb Mark Fasheh                 2005-12-15   976  		goto done;
ccd979bdbce9fb Mark Fasheh                 2005-12-15   977  
2b4e30fbde4258 Joel Becker                 2008-09-03   978  	/* need to inc inode use count - jbd2_journal_destroy will iput. */
ccd979bdbce9fb Mark Fasheh                 2005-12-15   979  	if (!igrab(inode))
ccd979bdbce9fb Mark Fasheh                 2005-12-15   980  		BUG();
ccd979bdbce9fb Mark Fasheh                 2005-12-15   981  
ccd979bdbce9fb Mark Fasheh                 2005-12-15   982  	num_running_trans = atomic_read(&(osb->journal->j_num_trans));
b41079504c786e Tao Ma                      2011-02-24   983  	trace_ocfs2_journal_shutdown(num_running_trans);
ccd979bdbce9fb Mark Fasheh                 2005-12-15   984  
ccd979bdbce9fb Mark Fasheh                 2005-12-15   985  	/* Do a commit_cache here. It will flush our journal, *and*
ccd979bdbce9fb Mark Fasheh                 2005-12-15   986  	 * release any locks that are still held.
ccd979bdbce9fb Mark Fasheh                 2005-12-15   987  	 * set the SHUTDOWN flag and release the trans lock.
ccd979bdbce9fb Mark Fasheh                 2005-12-15   988  	 * the commit thread will take the trans lock for us below. */
ccd979bdbce9fb Mark Fasheh                 2005-12-15   989  	journal->j_state = OCFS2_JOURNAL_IN_SHUTDOWN;
ccd979bdbce9fb Mark Fasheh                 2005-12-15   990  
ccd979bdbce9fb Mark Fasheh                 2005-12-15   991  	/* The OCFS2_JOURNAL_IN_SHUTDOWN will signal to commit_cache to not
ccd979bdbce9fb Mark Fasheh                 2005-12-15   992  	 * drop the trans_lock (which we want to hold until we
ccd979bdbce9fb Mark Fasheh                 2005-12-15   993  	 * completely destroy the journal. */
ccd979bdbce9fb Mark Fasheh                 2005-12-15   994  	if (osb->commit_task) {
ccd979bdbce9fb Mark Fasheh                 2005-12-15   995  		/* Wait for the commit thread */
b41079504c786e Tao Ma                      2011-02-24   996  		trace_ocfs2_journal_shutdown_wait(osb->commit_task);
ccd979bdbce9fb Mark Fasheh                 2005-12-15   997  		kthread_stop(osb->commit_task);
ccd979bdbce9fb Mark Fasheh                 2005-12-15   998  		osb->commit_task = NULL;
ccd979bdbce9fb Mark Fasheh                 2005-12-15   999  	}
ccd979bdbce9fb Mark Fasheh                 2005-12-15  1000  
ccd979bdbce9fb Mark Fasheh                 2005-12-15  1001  	BUG_ON(atomic_read(&(osb->journal->j_num_trans)) != 0);
ccd979bdbce9fb Mark Fasheh                 2005-12-15  1002  
c271c5c22b0a7c Sunil Mushran               2006-12-05  1003  	if (ocfs2_mount_local(osb)) {
2b4e30fbde4258 Joel Becker                 2008-09-03  1004  		jbd2_journal_lock_updates(journal->j_journal);
2b4e30fbde4258 Joel Becker                 2008-09-03  1005  		status = jbd2_journal_flush(journal->j_journal);
2b4e30fbde4258 Joel Becker                 2008-09-03  1006  		jbd2_journal_unlock_updates(journal->j_journal);
c271c5c22b0a7c Sunil Mushran               2006-12-05  1007  		if (status < 0)
c271c5c22b0a7c Sunil Mushran               2006-12-05  1008  			mlog_errno(status);
c271c5c22b0a7c Sunil Mushran               2006-12-05  1009  	}
c271c5c22b0a7c Sunil Mushran               2006-12-05  1010  
d85400af790dba Junxiao Bi                  2018-12-28  1011  	/* Shutdown the kernel journal system */
d85400af790dba Junxiao Bi                  2018-12-28 @1012  	if (!jbd2_journal_destroy(journal->j_journal) && !status) {
c271c5c22b0a7c Sunil Mushran               2006-12-05  1013  		/*
c271c5c22b0a7c Sunil Mushran               2006-12-05  1014  		 * Do not toggle if flush was unsuccessful otherwise
c271c5c22b0a7c Sunil Mushran               2006-12-05  1015  		 * will leave dirty metadata in a "clean" journal
c271c5c22b0a7c Sunil Mushran               2006-12-05  1016  		 */
539d8264093560 Sunil Mushran               2008-07-14  1017  		status = ocfs2_journal_toggle_dirty(osb, 0, 0);
ccd979bdbce9fb Mark Fasheh                 2005-12-15  1018  		if (status < 0)
ccd979bdbce9fb Mark Fasheh                 2005-12-15  1019  			mlog_errno(status);
c271c5c22b0a7c Sunil Mushran               2006-12-05  1020  	}
ae0dff683076b2 Sunil Mushran               2008-10-22  1021  	journal->j_journal = NULL;
ccd979bdbce9fb Mark Fasheh                 2005-12-15  1022  
ccd979bdbce9fb Mark Fasheh                 2005-12-15  1023  	OCFS2_I(inode)->ip_open_count--;
ccd979bdbce9fb Mark Fasheh                 2005-12-15  1024  
ccd979bdbce9fb Mark Fasheh                 2005-12-15  1025  	/* unlock our journal */
e63aecb651ba73 Mark Fasheh                 2007-10-18  1026  	ocfs2_inode_unlock(inode, 1);
ccd979bdbce9fb Mark Fasheh                 2005-12-15  1027  
ccd979bdbce9fb Mark Fasheh                 2005-12-15  1028  	brelse(journal->j_bh);
ccd979bdbce9fb Mark Fasheh                 2005-12-15  1029  	journal->j_bh = NULL;
ccd979bdbce9fb Mark Fasheh                 2005-12-15  1030  
ccd979bdbce9fb Mark Fasheh                 2005-12-15  1031  	journal->j_state = OCFS2_JOURNAL_FREE;
ccd979bdbce9fb Mark Fasheh                 2005-12-15  1032  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 27442 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 3/3] ext4: add rcu to prevent use after free when umount filesystem
  2021-04-08 11:36 ` [PATCH 3/3] ext4: add rcu to prevent use after free when umount filesystem Zhang Yi
  2021-04-08 13:51   ` kernel test robot
@ 2021-04-08 13:56   ` Jan Kara
  2021-04-08 14:38     ` Zhang Yi
  1 sibling, 1 reply; 9+ messages in thread
From: Jan Kara @ 2021-04-08 13:56 UTC (permalink / raw)
  To: Zhang Yi; +Cc: linux-ext4, tytso, adilger.kernel, jack, yukuai3

On Thu 08-04-21 19:36:18, Zhang Yi wrote:
> There is a race between bdev_try_to_free_page() and
> jbd2_journal_destroy() that could end up triggering a use after free
> issue about journal.
> 
> drop cache                           umount filesystem
> bdev_try_to_free_page()
>  get journal
>  jbd2_journal_try_to_free_buffers()  ext4_put_super()
>                                       kfree(journal)
>    access journal <-- lead to UAF
> 
> The above race also could happens between the bdev_try_to_free_page()
> and the error path of ext4_fill_super(). This patch avoid this race by
> add rcu protection around accessing sbi->s_journal in
> bdev_try_to_free_page() and destroy the journal after an rcu grace
> period.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

OK, I see the problem. But cannot the use-after-free happen even for the
superblock itself (e.g., EXT4_SB(sb)->s_journal dereference)? I don't see
anything preventing that as blkdev_releasepage() just shamelessly does:

super = BDEV_I(page->mapping->host)->bdev.bd_super

without making sure the sb cannot go away the instant we load a pointer to
it. Or am I missing something Ted? If I'm right, we'd need some careful
sprinkling of RCU, READ_ONCE(), and careful superblock reference grabbing
to make bdev_try_to_free_page() safe against concurrent
kill_block_super()...

								Honza

> ---
>  fs/ext4/super.c      | 33 ++++++++++++++++++++++++---------
>  fs/jbd2/journal.c    | 30 +++++++++++++++++++++++++++---
>  include/linux/jbd2.h | 11 ++++++++++-
>  3 files changed, 61 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 02ba47a5bc70..6bbaadc5357b 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1150,6 +1150,21 @@ static inline void ext4_quota_off_umount(struct super_block *sb)
>  }
>  #endif
>  
> +static int ext4_journal_release(struct ext4_sb_info *sbi)
> +{
> +	journal_t *journal = sbi->s_journal;
> +	int ret;
> +
> +	ret = jbd2_journal_release(journal);
> +	sbi->s_journal = NULL;
> +	/*
> +	 * Call rcu to prevent racing with bdev_try_to_free_page()
> +	 * accessing the journal at the same time.
> +	 */
> +	call_rcu(&journal->j_rcu, jbd2_journal_release_rcu);
> +	return ret;
> +}
> +
>  static void ext4_put_super(struct super_block *sb)
>  {
>  	struct ext4_sb_info *sbi = EXT4_SB(sb);
> @@ -1174,11 +1189,9 @@ static void ext4_put_super(struct super_block *sb)
>  
>  	if (sbi->s_journal) {
>  		aborted = is_journal_aborted(sbi->s_journal);
> -		err = jbd2_journal_destroy(sbi->s_journal);
> -		sbi->s_journal = NULL;
> -		if ((err < 0) && !aborted) {
> +		err = ext4_journal_release(sbi);
> +		if ((err < 0) && !aborted)
>  			ext4_abort(sb, -err, "Couldn't clean up the journal");
> -		}
>  	}
>  
>  	ext4_es_unregister_shrinker(sbi);
> @@ -1449,14 +1462,18 @@ static int ext4_nfs_commit_metadata(struct inode *inode)
>  static int bdev_try_to_free_page(struct super_block *sb, struct page *page,
>  				 gfp_t wait)
>  {
> -	journal_t *journal = EXT4_SB(sb)->s_journal;
> +	journal_t *journal;
>  	int ret = 0;
>  
>  	WARN_ON(PageChecked(page));
>  	if (!page_has_buffers(page))
>  		return 0;
> +
> +	rcu_read_lock();
> +	journal = READ_ONCE(EXT4_SB(sb)->s_journal);
>  	if (journal)
>  		ret = jbd2_journal_try_to_free_buffers(journal, page);
> +	rcu_read_unlock();
>  	if (!ret)
>  		return try_to_free_buffers(page);
>  	return 0;
> @@ -5146,10 +5163,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
>  	ext4_xattr_destroy_cache(sbi->s_ea_block_cache);
>  	sbi->s_ea_block_cache = NULL;
>  
> -	if (sbi->s_journal) {
> -		jbd2_journal_destroy(sbi->s_journal);
> -		sbi->s_journal = NULL;
> -	}
> +	if (sbi->s_journal)
> +		ext4_journal_release(sbi);
>  failed_mount3a:
>  	ext4_es_unregister_shrinker(sbi);
>  failed_mount3:
> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index 2dc944442802..071caaaa9de1 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -76,6 +76,8 @@ EXPORT_SYMBOL(jbd2_journal_check_available_features);
>  EXPORT_SYMBOL(jbd2_journal_set_features);
>  EXPORT_SYMBOL(jbd2_journal_load);
>  EXPORT_SYMBOL(jbd2_journal_destroy);
> +EXPORT_SYMBOL(jbd2_journal_release);
> +EXPORT_SYMBOL(jbd2_journal_release_rcu);
>  EXPORT_SYMBOL(jbd2_journal_abort);
>  EXPORT_SYMBOL(jbd2_journal_errno);
>  EXPORT_SYMBOL(jbd2_journal_ack_err);
> @@ -1951,14 +1953,14 @@ int jbd2_journal_load(journal_t *journal)
>  }
>  
>  /**
> - * jbd2_journal_destroy() - Release a journal_t structure.
> + * jbd2_journal_release() - Release a journal_t structure.
>   * @journal: Journal to act on.
>   *
>   * Release a journal_t structure once it is no longer in use by the
>   * journaled object.
>   * Return <0 if we couldn't clean up the journal.
>   */
> -int jbd2_journal_destroy(journal_t *journal)
> +int jbd2_journal_release(journal_t *journal)
>  {
>  	int err = 0;
>  
> @@ -2021,11 +2023,33 @@ int jbd2_journal_destroy(journal_t *journal)
>  		crypto_free_shash(journal->j_chksum_driver);
>  	kfree(journal->j_fc_wbuf);
>  	kfree(journal->j_wbuf);
> -	kfree(journal);
>  
>  	return err;
>  }
>  
> +/**
> + * jbd2_journal_release_rcu() - Free a journal_t structure.
> + * @rcu: rcu list node relate to the journal want to free.
> + *
> + * Freeing a journal_t structure after a rcu grace period.
> + */
> +void jbd2_journal_release_rcu(struct rcu_head *rcu)
> +{
> +	kfree(container_of(rcu, journal_t, j_rcu));
> +}
> +
> +/**
> + * jbd2_journal_destroy() - Release and free a journal_t structure.
> + * @journal: Journal to act on.
> + *
> + * Release and free a journal_t structure once it is no longer in use
> + * by the journaled object.
> + */
> +void jbd2_journal_destroy(journal_t *journal)
> +{
> +	jbd2_journal_release(journal);
> +	kfree(journal);
> +}
>  
>  /**
>   * jbd2_journal_check_used_features() - Check if features specified are used.
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index 99d3cd051ac3..39a8d04596a2 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -1238,6 +1238,13 @@ struct journal_s
>  	 */
>  	__u32 j_csum_seed;
>  
> +	/**
> +	 * @j_rcu:
> +	 *
> +	 * Prevent racing between accessing and destroy at the same time.
> +	 */
> +	struct rcu_head j_rcu;
> +
>  #ifdef CONFIG_DEBUG_LOCK_ALLOC
>  	/**
>  	 * @j_trans_commit_map:
> @@ -1509,7 +1516,9 @@ extern int	   jbd2_journal_set_features
>  extern void	   jbd2_journal_clear_features
>  		   (journal_t *, unsigned long, unsigned long, unsigned long);
>  extern int	   jbd2_journal_load       (journal_t *journal);
> -extern int	   jbd2_journal_destroy    (journal_t *);
> +extern void	   jbd2_journal_destroy    (journal_t *);
> +extern int	   jbd2_journal_release    (journal_t *);
> +extern void	   jbd2_journal_release_rcu     (struct rcu_head *rcu);
>  extern int	   jbd2_journal_recover    (journal_t *journal);
>  extern int	   jbd2_journal_wipe       (journal_t *, int);
>  extern int	   jbd2_journal_skip_recovery	(journal_t *);
> -- 
> 2.25.4
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 3/3] ext4: add rcu to prevent use after free when umount filesystem
  2021-04-08 13:56   ` Jan Kara
@ 2021-04-08 14:38     ` Zhang Yi
  2021-04-08 14:53       ` Jan Kara
  0 siblings, 1 reply; 9+ messages in thread
From: Zhang Yi @ 2021-04-08 14:38 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, tytso, adilger.kernel, yukuai3

On 2021/4/8 21:56, Jan Kara wrote:
> On Thu 08-04-21 19:36:18, Zhang Yi wrote:
>> There is a race between bdev_try_to_free_page() and
>> jbd2_journal_destroy() that could end up triggering a use after free
>> issue about journal.
>>
>> drop cache                           umount filesystem
>> bdev_try_to_free_page()
>>  get journal
>>  jbd2_journal_try_to_free_buffers()  ext4_put_super()
>>                                       kfree(journal)
>>    access journal <-- lead to UAF
>>
>> The above race also could happens between the bdev_try_to_free_page()
>> and the error path of ext4_fill_super(). This patch avoid this race by
>> add rcu protection around accessing sbi->s_journal in
>> bdev_try_to_free_page() and destroy the journal after an rcu grace
>> period.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> 
> OK, I see the problem. But cannot the use-after-free happen even for the
> superblock itself (e.g., EXT4_SB(sb)->s_journal dereference)? I don't see
> anything preventing that as blkdev_releasepage() just shamelessly does:
> 
> super = BDEV_I(page->mapping->host)->bdev.bd_super
> 
Hi, Jan.

I checked the superblock. In theory, the bdev_try_to_free_page() is invoked
with page locked, the umount process will wait the page unlock on
kill_block_super()->..->kill_bdev()->truncate_inode_pages_range() before free
superblock, so I guess the use-after-free problem couldn't happen in general.
But I think it's fragile and may invalidate if the bdev has more than one
operners(__blkdev_put() call kill_bdev only if bd_openers becomes zero)?
I will check it.

Thanks,
Yi.

> without making sure the sb cannot go away the instant we load a pointer to
> it. Or am I missing something Ted? If I'm right, we'd need some careful
> sprinkling of RCU, READ_ONCE(), and careful superblock reference grabbing
> to make bdev_try_to_free_page() safe against concurrent
> kill_block_super()...
> 
> 								Honza
> 
>> ---
>>  fs/ext4/super.c      | 33 ++++++++++++++++++++++++---------
>>  fs/jbd2/journal.c    | 30 +++++++++++++++++++++++++++---
>>  include/linux/jbd2.h | 11 ++++++++++-
>>  3 files changed, 61 insertions(+), 13 deletions(-)
>>
>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
>> index 02ba47a5bc70..6bbaadc5357b 100644
>> --- a/fs/ext4/super.c
>> +++ b/fs/ext4/super.c
>> @@ -1150,6 +1150,21 @@ static inline void ext4_quota_off_umount(struct super_block *sb)
>>  }
>>  #endif
>>  
>> +static int ext4_journal_release(struct ext4_sb_info *sbi)
>> +{
>> +	journal_t *journal = sbi->s_journal;
>> +	int ret;
>> +
>> +	ret = jbd2_journal_release(journal);
>> +	sbi->s_journal = NULL;
>> +	/*
>> +	 * Call rcu to prevent racing with bdev_try_to_free_page()
>> +	 * accessing the journal at the same time.
>> +	 */
>> +	call_rcu(&journal->j_rcu, jbd2_journal_release_rcu);
>> +	return ret;
>> +}
>> +
>>  static void ext4_put_super(struct super_block *sb)
>>  {
>>  	struct ext4_sb_info *sbi = EXT4_SB(sb);
>> @@ -1174,11 +1189,9 @@ static void ext4_put_super(struct super_block *sb)
>>  
>>  	if (sbi->s_journal) {
>>  		aborted = is_journal_aborted(sbi->s_journal);
>> -		err = jbd2_journal_destroy(sbi->s_journal);
>> -		sbi->s_journal = NULL;
>> -		if ((err < 0) && !aborted) {
>> +		err = ext4_journal_release(sbi);
>> +		if ((err < 0) && !aborted)
>>  			ext4_abort(sb, -err, "Couldn't clean up the journal");
>> -		}
>>  	}
>>  
>>  	ext4_es_unregister_shrinker(sbi);
>> @@ -1449,14 +1462,18 @@ static int ext4_nfs_commit_metadata(struct inode *inode)
>>  static int bdev_try_to_free_page(struct super_block *sb, struct page *page,
>>  				 gfp_t wait)
>>  {
>> -	journal_t *journal = EXT4_SB(sb)->s_journal;
>> +	journal_t *journal;
>>  	int ret = 0;
>>  
>>  	WARN_ON(PageChecked(page));
>>  	if (!page_has_buffers(page))
>>  		return 0;
>> +
>> +	rcu_read_lock();
>> +	journal = READ_ONCE(EXT4_SB(sb)->s_journal);
>>  	if (journal)
>>  		ret = jbd2_journal_try_to_free_buffers(journal, page);
>> +	rcu_read_unlock();
>>  	if (!ret)
>>  		return try_to_free_buffers(page);
>>  	return 0;
>> @@ -5146,10 +5163,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
>>  	ext4_xattr_destroy_cache(sbi->s_ea_block_cache);
>>  	sbi->s_ea_block_cache = NULL;
>>  
>> -	if (sbi->s_journal) {
>> -		jbd2_journal_destroy(sbi->s_journal);
>> -		sbi->s_journal = NULL;
>> -	}
>> +	if (sbi->s_journal)
>> +		ext4_journal_release(sbi);
>>  failed_mount3a:
>>  	ext4_es_unregister_shrinker(sbi);
>>  failed_mount3:
>> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
>> index 2dc944442802..071caaaa9de1 100644
>> --- a/fs/jbd2/journal.c
>> +++ b/fs/jbd2/journal.c
>> @@ -76,6 +76,8 @@ EXPORT_SYMBOL(jbd2_journal_check_available_features);
>>  EXPORT_SYMBOL(jbd2_journal_set_features);
>>  EXPORT_SYMBOL(jbd2_journal_load);
>>  EXPORT_SYMBOL(jbd2_journal_destroy);
>> +EXPORT_SYMBOL(jbd2_journal_release);
>> +EXPORT_SYMBOL(jbd2_journal_release_rcu);
>>  EXPORT_SYMBOL(jbd2_journal_abort);
>>  EXPORT_SYMBOL(jbd2_journal_errno);
>>  EXPORT_SYMBOL(jbd2_journal_ack_err);
>> @@ -1951,14 +1953,14 @@ int jbd2_journal_load(journal_t *journal)
>>  }
>>  
>>  /**
>> - * jbd2_journal_destroy() - Release a journal_t structure.
>> + * jbd2_journal_release() - Release a journal_t structure.
>>   * @journal: Journal to act on.
>>   *
>>   * Release a journal_t structure once it is no longer in use by the
>>   * journaled object.
>>   * Return <0 if we couldn't clean up the journal.
>>   */
>> -int jbd2_journal_destroy(journal_t *journal)
>> +int jbd2_journal_release(journal_t *journal)
>>  {
>>  	int err = 0;
>>  
>> @@ -2021,11 +2023,33 @@ int jbd2_journal_destroy(journal_t *journal)
>>  		crypto_free_shash(journal->j_chksum_driver);
>>  	kfree(journal->j_fc_wbuf);
>>  	kfree(journal->j_wbuf);
>> -	kfree(journal);
>>  
>>  	return err;
>>  }
>>  
>> +/**
>> + * jbd2_journal_release_rcu() - Free a journal_t structure.
>> + * @rcu: rcu list node relate to the journal want to free.
>> + *
>> + * Freeing a journal_t structure after a rcu grace period.
>> + */
>> +void jbd2_journal_release_rcu(struct rcu_head *rcu)
>> +{
>> +	kfree(container_of(rcu, journal_t, j_rcu));
>> +}
>> +
>> +/**
>> + * jbd2_journal_destroy() - Release and free a journal_t structure.
>> + * @journal: Journal to act on.
>> + *
>> + * Release and free a journal_t structure once it is no longer in use
>> + * by the journaled object.
>> + */
>> +void jbd2_journal_destroy(journal_t *journal)
>> +{
>> +	jbd2_journal_release(journal);
>> +	kfree(journal);
>> +}
>>  
>>  /**
>>   * jbd2_journal_check_used_features() - Check if features specified are used.
>> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
>> index 99d3cd051ac3..39a8d04596a2 100644
>> --- a/include/linux/jbd2.h
>> +++ b/include/linux/jbd2.h
>> @@ -1238,6 +1238,13 @@ struct journal_s
>>  	 */
>>  	__u32 j_csum_seed;
>>  
>> +	/**
>> +	 * @j_rcu:
>> +	 *
>> +	 * Prevent racing between accessing and destroy at the same time.
>> +	 */
>> +	struct rcu_head j_rcu;
>> +
>>  #ifdef CONFIG_DEBUG_LOCK_ALLOC
>>  	/**
>>  	 * @j_trans_commit_map:
>> @@ -1509,7 +1516,9 @@ extern int	   jbd2_journal_set_features
>>  extern void	   jbd2_journal_clear_features
>>  		   (journal_t *, unsigned long, unsigned long, unsigned long);
>>  extern int	   jbd2_journal_load       (journal_t *journal);
>> -extern int	   jbd2_journal_destroy    (journal_t *);
>> +extern void	   jbd2_journal_destroy    (journal_t *);
>> +extern int	   jbd2_journal_release    (journal_t *);
>> +extern void	   jbd2_journal_release_rcu     (struct rcu_head *rcu);
>>  extern int	   jbd2_journal_recover    (journal_t *journal);
>>  extern int	   jbd2_journal_wipe       (journal_t *, int);
>>  extern int	   jbd2_journal_skip_recovery	(journal_t *);
>> -- 
>> 2.25.4
>>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 3/3] ext4: add rcu to prevent use after free when umount filesystem
  2021-04-08 14:38     ` Zhang Yi
@ 2021-04-08 14:53       ` Jan Kara
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Kara @ 2021-04-08 14:53 UTC (permalink / raw)
  To: Zhang Yi; +Cc: Jan Kara, linux-ext4, tytso, adilger.kernel, yukuai3

On Thu 08-04-21 22:38:08, Zhang Yi wrote:
> On 2021/4/8 21:56, Jan Kara wrote:
> > On Thu 08-04-21 19:36:18, Zhang Yi wrote:
> >> There is a race between bdev_try_to_free_page() and
> >> jbd2_journal_destroy() that could end up triggering a use after free
> >> issue about journal.
> >>
> >> drop cache                           umount filesystem
> >> bdev_try_to_free_page()
> >>  get journal
> >>  jbd2_journal_try_to_free_buffers()  ext4_put_super()
> >>                                       kfree(journal)
> >>    access journal <-- lead to UAF
> >>
> >> The above race also could happens between the bdev_try_to_free_page()
> >> and the error path of ext4_fill_super(). This patch avoid this race by
> >> add rcu protection around accessing sbi->s_journal in
> >> bdev_try_to_free_page() and destroy the journal after an rcu grace
> >> period.
> >>
> >> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> > 
> > OK, I see the problem. But cannot the use-after-free happen even for the
> > superblock itself (e.g., EXT4_SB(sb)->s_journal dereference)? I don't see
> > anything preventing that as blkdev_releasepage() just shamelessly does:
> > 
> > super = BDEV_I(page->mapping->host)->bdev.bd_super
> > 
> Hi, Jan.
> 
> I checked the superblock. In theory, the bdev_try_to_free_page() is invoked
> with page locked, the umount process will wait the page unlock on
> kill_block_super()->..->kill_bdev()->truncate_inode_pages_range() before free
> superblock, so I guess the use-after-free problem couldn't happen in general.
> But I think it's fragile and may invalidate if the bdev has more than one
> operners(__blkdev_put() call kill_bdev only if bd_openers becomes zero)?

Yes, kill_bdev() is only called when bd_openers drops to 0 but there can be
other processes having the bdev open (non-exclusively).

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, back to index

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-08 11:36 [PATCH 0/3] ext4: fix two issue about bdev_try_to_free_page() Zhang Yi
2021-04-08 11:36 ` [PATCH 1/3] jbd2: protect buffers release with j_checkpoint_mutex Zhang Yi
2021-04-08 13:45   ` Jan Kara
2021-04-08 11:36 ` [PATCH 2/3] jbd2: do not free buffers in jbd2_journal_try_to_free_buffers() Zhang Yi
2021-04-08 11:36 ` [PATCH 3/3] ext4: add rcu to prevent use after free when umount filesystem Zhang Yi
2021-04-08 13:51   ` kernel test robot
2021-04-08 13:56   ` Jan Kara
2021-04-08 14:38     ` Zhang Yi
2021-04-08 14:53       ` Jan Kara

Linux-ext4 Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-ext4/0 linux-ext4/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-ext4 linux-ext4/ https://lore.kernel.org/linux-ext4 \
		linux-ext4@vger.kernel.org
	public-inbox-index linux-ext4

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-ext4


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git