Linux-ext4 Archive on lore.kernel.org
* [PATCH v2 0/2] jbd2: fix an oops problem
@ 2020-02-11 13:54 zhangyi (F)
  2020-02-11 13:54 ` [PATCH v2 1/2] jbd2: move the clearing of b_modified flag to the journal_unmap_buffer() zhangyi (F)
  2020-02-11 13:55 ` [PATCH v2 2/2] jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer zhangyi (F)
  0 siblings, 2 replies; 5+ messages in thread
From: zhangyi (F) @ 2020-02-11 13:54 UTC (permalink / raw)
  To: linux-ext4; +Cc: jack, tytso, luoshijie1, zhangxiaoxu5, yi.zhang

Changes since v1:
 - Clear b_modified right after set_buffer_freed() instead of reusing
   the code at the end of journal_unmap_buffer().
 - Distinguish metadata buffers by checking whether the page mapping
   belongs to the block device.

Thanks,
Yi.

--------------
Original description:

We encountered a jbd2 oops problem on an aarch64 machine with 4K block
size and 64K page size when doing stress tests.

 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
 ...
 user pgtable: 64k pages, 42-bit VAs, pgdp = (____ptrval____)
 ...
 pc : jbd2_journal_put_journal_head+0x7c/0x284
 lr : jbd2_journal_put_journal_head+0x3c/0x284
 ...
 Call trace:
  jbd2_journal_put_journal_head+0x7c/0x284
  __jbd2_journal_refile_buffer+0x164/0x188
  jbd2_journal_commit_transaction+0x12a0/0x1a50
  kjournald2+0xd0/0x260
  kthread+0x134/0x138
  ret_from_fork+0x10/0x1c
 Code: 51000400 b9000ac0 35000760 f9402274 (b9400a80)
 ---[ end trace 8fa99273d06aeb63 ]---

This patch set fixes the issue. The first patch is just a cleanup, and
the second one describes the root cause and fixes it.


zhangyi (F) (2):
  jbd2: move the clearing of b_modified flag to the
    journal_unmap_buffer()
  jbd2: do not clear the BH_Mapped flag when forgetting a metadata
    buffer

 fs/jbd2/commit.c      | 41 ++++++++++++++++++++---------------------
 fs/jbd2/transaction.c | 10 ++++++----
 2 files changed, 26 insertions(+), 25 deletions(-)

-- 
2.17.2



* [PATCH v2 1/2] jbd2: move the clearing of b_modified flag to the journal_unmap_buffer()
  2020-02-11 13:54 [PATCH v2 0/2] jbd2: fix an oops problem zhangyi (F)
@ 2020-02-11 13:54 ` zhangyi (F)
  2020-02-12 10:45   ` Jan Kara
  2020-02-11 13:55 ` [PATCH v2 2/2] jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer zhangyi (F)
  1 sibling, 1 reply; 5+ messages in thread
From: zhangyi (F) @ 2020-02-11 13:54 UTC (permalink / raw)
  To: linux-ext4; +Cc: jack, tytso, luoshijie1, zhangxiaoxu5, yi.zhang

There is no need to delay clearing the b_modified flag until transaction
commit time when unmapping a journalled buffer, so just move it to
journal_unmap_buffer().

Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
---
 fs/jbd2/commit.c      | 43 +++++++++++++++----------------------------
 fs/jbd2/transaction.c | 10 ++++++----
 2 files changed, 21 insertions(+), 32 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 7f0b362b3842..ecc2ea5f1b59 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -976,34 +976,21 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 		 * it. */
 
 		/*
-		* A buffer which has been freed while still being journaled by
-		* a previous transaction.
-		*/
-		if (buffer_freed(bh)) {
-			/*
-			 * If the running transaction is the one containing
-			 * "add to orphan" operation (b_next_transaction !=
-			 * NULL), we have to wait for that transaction to
-			 * commit before we can really get rid of the buffer.
-			 * So just clear b_modified to not confuse transaction
-			 * credit accounting and refile the buffer to
-			 * BJ_Forget of the running transaction. If the just
-			 * committed transaction contains "add to orphan"
-			 * operation, we can completely invalidate the buffer
-			 * now. We are rather through in that since the
-			 * buffer may be still accessible when blocksize <
-			 * pagesize and it is attached to the last partial
-			 * page.
-			 */
-			jh->b_modified = 0;
-			if (!jh->b_next_transaction) {
-				clear_buffer_freed(bh);
-				clear_buffer_jbddirty(bh);
-				clear_buffer_mapped(bh);
-				clear_buffer_new(bh);
-				clear_buffer_req(bh);
-				bh->b_bdev = NULL;
-			}
+		 * A buffer which has been freed while still being journaled
+		 * by a previous transaction, refile the buffer to BJ_Forget of
+		 * the running transaction. If the just committed transaction
+		 * contains "add to orphan" operation, we can completely
+		 * invalidate the buffer now. We are rather through in that
+		 * since the buffer may be still accessible when blocksize <
+		 * pagesize and it is attached to the last partial page.
+		 */
+		if (buffer_freed(bh) && !jh->b_next_transaction) {
+			clear_buffer_freed(bh);
+			clear_buffer_jbddirty(bh);
+			clear_buffer_mapped(bh);
+			clear_buffer_new(bh);
+			clear_buffer_req(bh);
+			bh->b_bdev = NULL;
 		}
 
 		if (buffer_jbddirty(bh)) {
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 27b9f9dee434..0603dfa9ad90 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -2329,14 +2329,16 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh,
 			return -EBUSY;
 		}
 		/*
-		 * OK, buffer won't be reachable after truncate. We just set
-		 * j_next_transaction to the running transaction (if there is
-		 * one) and mark buffer as freed so that commit code knows it
-		 * should clear dirty bits when it is done with the buffer.
+		 * OK, buffer won't be reachable after truncate. We just clear
+		 * b_modified to not confuse transaction credit accounting, and
+		 * set j_next_transaction to the running transaction (if there
+		 * is one) and mark buffer as freed so that commit code knows
+		 * it should clear dirty bits when it is done with the buffer.
 		 */
 		set_buffer_freed(bh);
 		if (journal->j_running_transaction && buffer_jbddirty(bh))
 			jh->b_next_transaction = journal->j_running_transaction;
+		jh->b_modified = 0;
 		spin_unlock(&journal->j_list_lock);
 		spin_unlock(&jh->b_state_lock);
 		write_unlock(&journal->j_state_lock);
-- 
2.17.2



* [PATCH v2 2/2] jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer
  2020-02-11 13:54 [PATCH v2 0/2] jbd2: fix an oops problem zhangyi (F)
  2020-02-11 13:54 ` [PATCH v2 1/2] jbd2: move the clearing of b_modified flag to the journal_unmap_buffer() zhangyi (F)
@ 2020-02-11 13:55 ` zhangyi (F)
  2020-02-12 10:48   ` Jan Kara
  1 sibling, 1 reply; 5+ messages in thread
From: zhangyi (F) @ 2020-02-11 13:55 UTC (permalink / raw)
  To: linux-ext4; +Cc: jack, tytso, luoshijie1, zhangxiaoxu5, yi.zhang

Commit 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from
an older transaction") sets the BH_Freed flag when forgetting a metadata
buffer which belongs to the committing transaction. The flag tells the
commit process to clear the dirty bits when it is done with the buffer.
But the commit process also clears the BH_Mapped flag at the same time,
which may trigger the NULL pointer oops below when block_size < PAGE_SIZE.

rmdir 1             kjournald2                 mkdir 2
                    jbd2_journal_commit_transaction
                     commit transaction N
jbd2_journal_forget
set_buffer_freed(bh1)
                    jbd2_journal_commit_transaction
                     commit transaction N+1
                     ...
                     clear_buffer_mapped(bh1)
                                               ext4_getblk(bh2 unmapped)
                                               ...
                                               grow_dev_page
                                                init_page_buffers
                                                 bh1->b_private=NULL
                                                 bh2->b_private=NULL
                     jbd2_journal_put_journal_head(jh1)
                      __journal_remove_journal_head(bh1)
                       jh1 is NULL and triggers oops

*) Dir entry blocks bh1 and bh2 belong to one page, and bh2 has
   already been unmapped.

For the metadata buffer we are forgetting, we should always keep the
mapped flag; clearing the dirty flags is enough. So this patch picks out
these buffers and keeps their BH_Mapped flag.

Fixes: 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from an older transaction")
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
---
 fs/jbd2/commit.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index ecc2ea5f1b59..8f6f4ddd8b78 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -985,12 +985,24 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 		 * pagesize and it is attached to the last partial page.
 		 */
 		if (buffer_freed(bh) && !jh->b_next_transaction) {
+			struct address_space *mapping;
+
 			clear_buffer_freed(bh);
 			clear_buffer_jbddirty(bh);
-			clear_buffer_mapped(bh);
-			clear_buffer_new(bh);
-			clear_buffer_req(bh);
-			bh->b_bdev = NULL;
+			/*
+			 * We can (and need to) unmap buffer only for normal
+			 * mappings. Block device buffers need to stay mapped
+			 * all the time. We need to be careful about the check
+			 * because the data page mapping can get cleared under
+			 * our hands.
+			 */
+			mapping = READ_ONCE(bh->b_page->mapping);
+			if (!(mapping && sb_is_blkdev_sb(mapping->host->i_sb))) {
+				clear_buffer_mapped(bh);
+				clear_buffer_new(bh);
+				clear_buffer_req(bh);
+				bh->b_bdev = NULL;
+			}
 		}
 
 		if (buffer_jbddirty(bh)) {
-- 
2.17.2
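
The race described in the commit message above can be sketched as a small
user-space model. This is only an illustration under stated assumptions:
struct toy_bh and every toy_* function are hypothetical stand-ins, not
kernel code, and the model keeps just the two fields involved (the mapped
bit and b_private). It assumes, as the scenario above does, that the page
reinitialisation resets b_private on every buffer that is not mapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BUFFERS_PER_PAGE 16	/* 64K page / 4K block, as in the report */

/* Hypothetical stand-in for struct buffer_head: only the fields the race touches. */
struct toy_bh {
	bool mapped;		/* models BH_Mapped */
	void *b_private;	/* models the pointer to the journal head */
};

/*
 * Models the per-page loop from the scenario: every buffer that is not
 * mapped is reinitialised, which includes resetting b_private.
 * Simplification: the model always marks the buffer mapped afterwards.
 */
static void toy_init_page_buffers(struct toy_bh *page, size_t n)
{
	assert(page != NULL);
	for (size_t i = 0; i < n; i++) {
		if (!page[i].mapped) {
			page[i].b_private = NULL;
			page[i].mapped = true;
		}
	}
}

/*
 * Models the commit path for a forgotten metadata buffer. With keep_mapped
 * false the mapped bit is cleared (behaviour before the patch); with
 * keep_mapped true it is left alone (behaviour the patch aims for).
 */
static void toy_commit_forget(struct toy_bh *bh, bool keep_mapped)
{
	if (!keep_mapped)
		bh->mapped = false;
}

/* Returns true if bh1 still points at its journal head after the reinit. */
static bool toy_journal_head_survives(bool keep_mapped)
{
	static int jh1;	/* placeholder object standing in for a journal head */
	struct toy_bh page[BUFFERS_PER_PAGE] = { 0 };

	page[0].mapped = true;		/* bh1: in use, journal head attached */
	page[0].b_private = &jh1;
	/* page[1] (bh2) stays unmapped, as in the scenario above */

	toy_commit_forget(&page[0], keep_mapped);
	toy_init_page_buffers(page, BUFFERS_PER_PAGE);

	return page[0].b_private == &jh1;
}
```

Calling toy_journal_head_survives(false) models the pre-patch ordering, in
which bh1 loses its journal head pointer; toy_journal_head_survives(true)
models a buffer that stays mapped and is therefore skipped by the reinit.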



* Re: [PATCH v2 1/2] jbd2: move the clearing of b_modified flag to the journal_unmap_buffer()
  2020-02-11 13:54 ` [PATCH v2 1/2] jbd2: move the clearing of b_modified flag to the journal_unmap_buffer() zhangyi (F)
@ 2020-02-12 10:45   ` Jan Kara
  0 siblings, 0 replies; 5+ messages in thread
From: Jan Kara @ 2020-02-12 10:45 UTC (permalink / raw)
  To: zhangyi (F); +Cc: linux-ext4, jack, tytso, luoshijie1, zhangxiaoxu5

On Tue 11-02-20 21:54:59, zhangyi (F) wrote:
> There is no need to delay clearing the b_modified flag until transaction
> commit time when unmapping a journalled buffer, so just move it to
> journal_unmap_buffer().
> 
> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>

The patch looks good to me. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/jbd2/commit.c      | 43 +++++++++++++++----------------------------
>  fs/jbd2/transaction.c | 10 ++++++----
>  2 files changed, 21 insertions(+), 32 deletions(-)
> 
> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> index 7f0b362b3842..ecc2ea5f1b59 100644
> --- a/fs/jbd2/commit.c
> +++ b/fs/jbd2/commit.c
> @@ -976,34 +976,21 @@ void jbd2_journal_commit_transaction(journal_t *journal)
>  		 * it. */
>  
>  		/*
> -		* A buffer which has been freed while still being journaled by
> -		* a previous transaction.
> -		*/
> -		if (buffer_freed(bh)) {
> -			/*
> -			 * If the running transaction is the one containing
> -			 * "add to orphan" operation (b_next_transaction !=
> -			 * NULL), we have to wait for that transaction to
> -			 * commit before we can really get rid of the buffer.
> -			 * So just clear b_modified to not confuse transaction
> -			 * credit accounting and refile the buffer to
> -			 * BJ_Forget of the running transaction. If the just
> -			 * committed transaction contains "add to orphan"
> -			 * operation, we can completely invalidate the buffer
> -			 * now. We are rather through in that since the
> -			 * buffer may be still accessible when blocksize <
> -			 * pagesize and it is attached to the last partial
> -			 * page.
> -			 */
> -			jh->b_modified = 0;
> -			if (!jh->b_next_transaction) {
> -				clear_buffer_freed(bh);
> -				clear_buffer_jbddirty(bh);
> -				clear_buffer_mapped(bh);
> -				clear_buffer_new(bh);
> -				clear_buffer_req(bh);
> -				bh->b_bdev = NULL;
> -			}
> +		 * A buffer which has been freed while still being journaled
> +		 * by a previous transaction, refile the buffer to BJ_Forget of
> +		 * the running transaction. If the just committed transaction
> +		 * contains "add to orphan" operation, we can completely
> +		 * invalidate the buffer now. We are rather through in that
> +		 * since the buffer may be still accessible when blocksize <
> +		 * pagesize and it is attached to the last partial page.
> +		 */
> +		if (buffer_freed(bh) && !jh->b_next_transaction) {
> +			clear_buffer_freed(bh);
> +			clear_buffer_jbddirty(bh);
> +			clear_buffer_mapped(bh);
> +			clear_buffer_new(bh);
> +			clear_buffer_req(bh);
> +			bh->b_bdev = NULL;
>  		}
>  
>  		if (buffer_jbddirty(bh)) {
> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> index 27b9f9dee434..0603dfa9ad90 100644
> --- a/fs/jbd2/transaction.c
> +++ b/fs/jbd2/transaction.c
> @@ -2329,14 +2329,16 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh,
>  			return -EBUSY;
>  		}
>  		/*
> -		 * OK, buffer won't be reachable after truncate. We just set
> -		 * j_next_transaction to the running transaction (if there is
> -		 * one) and mark buffer as freed so that commit code knows it
> -		 * should clear dirty bits when it is done with the buffer.
> +		 * OK, buffer won't be reachable after truncate. We just clear
> +		 * b_modified to not confuse transaction credit accounting, and
> +		 * set j_next_transaction to the running transaction (if there
> +		 * is one) and mark buffer as freed so that commit code knows
> +		 * it should clear dirty bits when it is done with the buffer.
>  		 */
>  		set_buffer_freed(bh);
>  		if (journal->j_running_transaction && buffer_jbddirty(bh))
>  			jh->b_next_transaction = journal->j_running_transaction;
> +		jh->b_modified = 0;
>  		spin_unlock(&journal->j_list_lock);
>  		spin_unlock(&jh->b_state_lock);
>  		write_unlock(&journal->j_state_lock);
> -- 
> 2.17.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v2 2/2] jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer
  2020-02-11 13:55 ` [PATCH v2 2/2] jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer zhangyi (F)
@ 2020-02-12 10:48   ` Jan Kara
  0 siblings, 0 replies; 5+ messages in thread
From: Jan Kara @ 2020-02-12 10:48 UTC (permalink / raw)
  To: zhangyi (F); +Cc: linux-ext4, jack, tytso, luoshijie1, zhangxiaoxu5

On Tue 11-02-20 21:55:00, zhangyi (F) wrote:
> Commit 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from
> an older transaction") sets the BH_Freed flag when forgetting a metadata
> buffer which belongs to the committing transaction. The flag tells the
> commit process to clear the dirty bits when it is done with the buffer.
> But the commit process also clears the BH_Mapped flag at the same time,
> which may trigger the NULL pointer oops below when block_size < PAGE_SIZE.
> 
> rmdir 1             kjournald2                 mkdir 2
>                     jbd2_journal_commit_transaction
>                      commit transaction N
> jbd2_journal_forget
> set_buffer_freed(bh1)
>                     jbd2_journal_commit_transaction
>                      commit transaction N+1
>                      ...
>                      clear_buffer_mapped(bh1)
>                                                ext4_getblk(bh2 unmapped)
>                                                ...
>                                                grow_dev_page
>                                                 init_page_buffers
>                                                  bh1->b_private=NULL
>                                                  bh2->b_private=NULL
>                      jbd2_journal_put_journal_head(jh1)
>                       __journal_remove_journal_head(bh1)
>                        jh1 is NULL and triggers oops
> 
> *) Dir entry blocks bh1 and bh2 belong to one page, and bh2 has
>    already been unmapped.
> 
> For the metadata buffer we are forgetting, we should always keep the
> mapped flag; clearing the dirty flags is enough. So this patch picks out
> these buffers and keeps their BH_Mapped flag.
> 
> Fixes: 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from an older transaction")
> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
> ---
>  fs/jbd2/commit.c | 20 ++++++++++++++++----
>  1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> index ecc2ea5f1b59..8f6f4ddd8b78 100644
> --- a/fs/jbd2/commit.c
> +++ b/fs/jbd2/commit.c
> @@ -985,12 +985,24 @@ void jbd2_journal_commit_transaction(journal_t *journal)
>  		 * pagesize and it is attached to the last partial page.
>  		 */
>  		if (buffer_freed(bh) && !jh->b_next_transaction) {
> +			struct address_space *mapping;
> +
>  			clear_buffer_freed(bh);
>  			clear_buffer_jbddirty(bh);
> -			clear_buffer_mapped(bh);
> -			clear_buffer_new(bh);
> -			clear_buffer_req(bh);
> -			bh->b_bdev = NULL;
> +			/*
> +			 * We can (and need to) unmap buffer only for normal
> +			 * mappings. Block device buffers need to stay mapped
> +			 * all the time. We need to be careful about the check
> +			 * because the data page mapping can get cleared under
> +			 * our hands.
> +			 */
> +			mapping = READ_ONCE(bh->b_page->mapping);
> +			if (!(mapping && sb_is_blkdev_sb(mapping->host->i_sb))) {

As I wrote in my other email, we don't have to be that aggressive and this
condition could actually be (mapping && !sb_is_blkdev_sb()) but I guess it
doesn't really matter. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

									Honza

> +				clear_buffer_mapped(bh);
> +				clear_buffer_new(bh);
> +				clear_buffer_req(bh);
> +				bh->b_bdev = NULL;
> +			}
>  		}
>  
>  		if (buffer_jbddirty(bh)) {
> -- 
> 2.17.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
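
The two conditions compared in the review above differ in only one case,
which a small truth-table sketch makes visible. This is a user-space
illustration, not kernel code: enum toy_mapping and both toy_* functions
are hypothetical names that reduce "mapping" and "sb_is_blkdev_sb(...)" to
plain booleans.

```c
#include <stdbool.h>

/* Hypothetical summary of what the commit code can see via bh->b_page->mapping. */
enum toy_mapping {
	TOY_MAPPING_NULL,	/* page->mapping has already been cleared */
	TOY_MAPPING_BLKDEV,	/* page belongs to the block device (metadata) */
	TOY_MAPPING_FILE,	/* page belongs to a regular inode (data) */
};

/* Condition as posted in the patch: !(mapping && sb_is_blkdev_sb(...)) */
static bool toy_unmap_as_posted(enum toy_mapping m)
{
	bool mapping = (m != TOY_MAPPING_NULL);
	bool is_blkdev = (m == TOY_MAPPING_BLKDEV);

	return !(mapping && is_blkdev);
}

/* Alternative mentioned in the review: mapping && !sb_is_blkdev_sb(...) */
static bool toy_unmap_alternative(enum toy_mapping m)
{
	bool mapping = (m != TOY_MAPPING_NULL);
	bool is_blkdev = (m == TOY_MAPPING_BLKDEV);

	return mapping && !is_blkdev;
}
```

Both forms keep block device buffers mapped and unmap regular file
buffers. They disagree only when the page mapping is already NULL: the
posted condition still unmaps the buffer, while the alternative leaves it
alone.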

