Linux-ext4 Archive on lore.kernel.org
* [PATCH 0/2] jbd2: fix an oops problem
@ 2020-02-03 14:04 zhangyi (F)
  2020-02-03 14:04 ` [PATCH 1/2] jbd2: move the clearing of b_modified flag to the journal_unmap_buffer() zhangyi (F)
  2020-02-03 14:04 ` [PATCH 2/2] jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer zhangyi (F)
  0 siblings, 2 replies; 10+ messages in thread
From: zhangyi (F) @ 2020-02-03 14:04 UTC (permalink / raw)
  To: jack, tytso; +Cc: linux-ext4, yi.zhang, luoshijie1, zhangxiaoxu5

Hi, Ted and Jan
We encountered a jbd2 oops problem on an aarch64 machine with 4K block
size and 64K page size when doing stress tests.

 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
 ...
 user pgtable: 64k pages, 42-bit VAs, pgdp = (____ptrval____)
 ...
 pc : jbd2_journal_put_journal_head+0x7c/0x284
 lr : jbd2_journal_put_journal_head+0x3c/0x284
 ...
 Call trace:
  jbd2_journal_put_journal_head+0x7c/0x284
  __jbd2_journal_refile_buffer+0x164/0x188
  jbd2_journal_commit_transaction+0x12a0/0x1a50
  kjournald2+0xd0/0x260
  kthread+0x134/0x138
  ret_from_fork+0x10/0x1c
 Code: 51000400 b9000ac0 35000760 f9402274 (b9400a80)
 ---[ end trace 8fa99273d06aeb63 ]---

This patch set fixes the issue. The first patch is just a cleanup, and
the second one describes the root cause and fixes it. Please review.

Thanks,
Yi.

zhangyi (F) (2):
  jbd2: move the clearing of b_modified flag to the
    journal_unmap_buffer()
  jbd2: do not clear the BH_Mapped flag when forgetting a metadata
    buffer

 fs/jbd2/commit.c      | 36 +++++++++++++-----------------------
 fs/jbd2/transaction.c | 25 ++++++++++++-------------
 include/linux/jbd2.h  |  2 ++
 3 files changed, 27 insertions(+), 36 deletions(-)

-- 
2.17.2



* [PATCH 1/2] jbd2: move the clearing of b_modified flag to the journal_unmap_buffer()
  2020-02-03 14:04 [PATCH 0/2] jbd2: fix an oops problem zhangyi (F)
@ 2020-02-03 14:04 ` zhangyi (F)
  2020-02-06 11:03   ` Jan Kara
  2020-02-03 14:04 ` [PATCH 2/2] jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer zhangyi (F)
  1 sibling, 1 reply; 10+ messages in thread
From: zhangyi (F) @ 2020-02-03 14:04 UTC (permalink / raw)
  To: jack, tytso; +Cc: linux-ext4, yi.zhang, luoshijie1, zhangxiaoxu5

There is no need to delay clearing the b_modified flag until transaction
commit time when unmapping a journalled buffer, so just move it to
journal_unmap_buffer().

Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
---
 fs/jbd2/commit.c      | 43 +++++++++++++++----------------------------
 fs/jbd2/transaction.c | 24 +++++++++++-------------
 2 files changed, 26 insertions(+), 41 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 2494095e0340..6396fe70085b 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -976,34 +976,21 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 		 * it. */
 
 		/*
-		* A buffer which has been freed while still being journaled by
-		* a previous transaction.
-		*/
-		if (buffer_freed(bh)) {
-			/*
-			 * If the running transaction is the one containing
-			 * "add to orphan" operation (b_next_transaction !=
-			 * NULL), we have to wait for that transaction to
-			 * commit before we can really get rid of the buffer.
-			 * So just clear b_modified to not confuse transaction
-			 * credit accounting and refile the buffer to
-			 * BJ_Forget of the running transaction. If the just
-			 * committed transaction contains "add to orphan"
-			 * operation, we can completely invalidate the buffer
-			 * now. We are rather through in that since the
-			 * buffer may be still accessible when blocksize <
-			 * pagesize and it is attached to the last partial
-			 * page.
-			 */
-			jh->b_modified = 0;
-			if (!jh->b_next_transaction) {
-				clear_buffer_freed(bh);
-				clear_buffer_jbddirty(bh);
-				clear_buffer_mapped(bh);
-				clear_buffer_new(bh);
-				clear_buffer_req(bh);
-				bh->b_bdev = NULL;
-			}
+		 * A buffer which has been freed while still being journaled
+		 * by a previous transaction, refile the buffer to BJ_Forget of
+		 * the running transaction. If the just committed transaction
+		 * contains "add to orphan" operation, we can completely
+		 * invalidate the buffer now. We are rather through in that
+		 * since the buffer may be still accessible when blocksize <
+		 * pagesize and it is attached to the last partial page.
+		 */
+		if (buffer_freed(bh) && !jh->b_next_transaction) {
+			clear_buffer_freed(bh);
+			clear_buffer_jbddirty(bh);
+			clear_buffer_mapped(bh);
+			clear_buffer_new(bh);
+			clear_buffer_req(bh);
+			bh->b_bdev = NULL;
 		}
 
 		if (buffer_jbddirty(bh)) {
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index e77a5a0b4e46..a479cbf8ae54 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -2337,11 +2337,7 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh,
 		set_buffer_freed(bh);
 		if (journal->j_running_transaction && buffer_jbddirty(bh))
 			jh->b_next_transaction = journal->j_running_transaction;
-		spin_unlock(&journal->j_list_lock);
-		spin_unlock(&jh->b_state_lock);
-		write_unlock(&journal->j_state_lock);
-		jbd2_journal_put_journal_head(jh);
-		return 0;
+		may_free = 0;
 	} else {
 		/* Good, the buffer belongs to the running transaction.
 		 * We are writing our own transaction's data, not any
@@ -2369,14 +2365,16 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh,
 	write_unlock(&journal->j_state_lock);
 	jbd2_journal_put_journal_head(jh);
 zap_buffer_unlocked:
-	clear_buffer_dirty(bh);
-	J_ASSERT_BH(bh, !buffer_jbddirty(bh));
-	clear_buffer_mapped(bh);
-	clear_buffer_req(bh);
-	clear_buffer_new(bh);
-	clear_buffer_delay(bh);
-	clear_buffer_unwritten(bh);
-	bh->b_bdev = NULL;
+	if (!buffer_freed(bh)) {
+		clear_buffer_dirty(bh);
+		J_ASSERT_BH(bh, !buffer_jbddirty(bh));
+		clear_buffer_mapped(bh);
+		clear_buffer_req(bh);
+		clear_buffer_new(bh);
+		clear_buffer_delay(bh);
+		clear_buffer_unwritten(bh);
+		bh->b_bdev = NULL;
+	}
 	return may_free;
 }
 
-- 
2.17.2



* [PATCH 2/2] jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer
  2020-02-03 14:04 [PATCH 0/2] jbd2: fix an oops problem zhangyi (F)
  2020-02-03 14:04 ` [PATCH 1/2] jbd2: move the clearing of b_modified flag to the journal_unmap_buffer() zhangyi (F)
@ 2020-02-03 14:04 ` zhangyi (F)
  2020-02-06 11:46   ` Jan Kara
  1 sibling, 1 reply; 10+ messages in thread
From: zhangyi (F) @ 2020-02-03 14:04 UTC (permalink / raw)
  To: jack, tytso; +Cc: linux-ext4, yi.zhang, luoshijie1, zhangxiaoxu5

Commit 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from
an older transaction") sets the BH_Freed flag when forgetting a metadata
buffer which belongs to the committing transaction, to indicate that the
committing process should clear the dirty bits when it is done with the
buffer. But the commit process also clears the BH_Mapped flag at the same
time, which may trigger the NULL pointer oops below when block_size <
PAGE_SIZE.

rmdir 1             kjournald2                 mkdir 2
                    jbd2_journal_commit_transaction
		    commit transaction N
jbd2_journal_forget
set_buffer_freed(bh1)
                    jbd2_journal_commit_transaction
                     commit transaction N+1
                     ...
                     clear_buffer_mapped(bh1)
                                               ext4_getblk(bh2 unmapped)
                                               ...
                                               grow_dev_page
                                                init_page_buffers
                                                 bh1->b_private=NULL
                                                 bh2->b_private=NULL
                     jbd2_journal_put_journal_head(jh1)
                      __journal_remove_journal_head(bh1)
		       jh1 is NULL and triggers oops

*) Dir entry blocks bh1 and bh2 belong to one page, and bh2 has
   already been unmapped.

For the metadata buffer we are forgetting, clearing the dirty flags is
enough, so this patch adds a BH_Unmap flag for the journal_unmap_buffer()
case and keeps the mapped flag for the metadata buffer.

Fixes: 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from an older transaction")
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
---
 fs/jbd2/commit.c      | 11 +++++++----
 fs/jbd2/transaction.c |  1 +
 include/linux/jbd2.h  |  2 ++
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 6396fe70085b..a649cdd1c5e5 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -987,10 +987,13 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 		if (buffer_freed(bh) && !jh->b_next_transaction) {
 			clear_buffer_freed(bh);
 			clear_buffer_jbddirty(bh);
-			clear_buffer_mapped(bh);
-			clear_buffer_new(bh);
-			clear_buffer_req(bh);
-			bh->b_bdev = NULL;
+			if (buffer_unmap(bh)) {
+				clear_buffer_unmap(bh);
+				clear_buffer_mapped(bh);
+				clear_buffer_new(bh);
+				clear_buffer_req(bh);
+				bh->b_bdev = NULL;
+			}
 		}
 
 		if (buffer_jbddirty(bh)) {
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index a479cbf8ae54..717964eec9d3 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -2335,6 +2335,7 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh,
 		 * should clear dirty bits when it is done with the buffer.
 		 */
 		set_buffer_freed(bh);
+		set_buffer_unmap(bh);
 		if (journal->j_running_transaction && buffer_jbddirty(bh))
 			jh->b_next_transaction = journal->j_running_transaction;
 		may_free = 0;
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index f613d8529863..f74906ebc73a 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -310,6 +310,7 @@ enum jbd_state_bits {
 	  = BH_PrivateStart,
 	BH_JWrite,		/* Being written to log (@@@ DEBUGGING) */
 	BH_Freed,		/* Has been freed (truncated) */
+	BH_Unmap,		/* Has been freed and need to unmap */
 	BH_Revoked,		/* Has been revoked from the log */
 	BH_RevokeValid,		/* Revoked flag is valid */
 	BH_JBDDirty,		/* Is dirty but journaled */
@@ -328,6 +329,7 @@ TAS_BUFFER_FNS(Revoked, revoked)
 BUFFER_FNS(RevokeValid, revokevalid)
 TAS_BUFFER_FNS(RevokeValid, revokevalid)
 BUFFER_FNS(Freed, freed)
+BUFFER_FNS(Unmap, unmap)
 BUFFER_FNS(Shadow, shadow)
 BUFFER_FNS(Verified, verified)
 
-- 
2.17.2



* Re: [PATCH 1/2] jbd2: move the clearing of b_modified flag to the journal_unmap_buffer()
  2020-02-03 14:04 ` [PATCH 1/2] jbd2: move the clearing of b_modified flag to the journal_unmap_buffer() zhangyi (F)
@ 2020-02-06 11:03   ` Jan Kara
  0 siblings, 0 replies; 10+ messages in thread
From: Jan Kara @ 2020-02-06 11:03 UTC (permalink / raw)
  To: zhangyi (F); +Cc: jack, tytso, linux-ext4, luoshijie1, zhangxiaoxu5

On Mon 03-02-20 22:04:57, zhangyi (F) wrote:
> There is no need to delay the clearing of b_modified flag to the
> transaction committing time when unmapping the journalled buffer, so
> just move it to the journal_unmap_buffer().
> 
> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>

Thanks for the patch. It looks good, just one small comment below:

> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> index e77a5a0b4e46..a479cbf8ae54 100644
> --- a/fs/jbd2/transaction.c
> +++ b/fs/jbd2/transaction.c
> @@ -2337,11 +2337,7 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh,
>  		set_buffer_freed(bh);
>  		if (journal->j_running_transaction && buffer_jbddirty(bh))
>  			jh->b_next_transaction = journal->j_running_transaction;
> -		spin_unlock(&journal->j_list_lock);
> -		spin_unlock(&jh->b_state_lock);
> -		write_unlock(&journal->j_state_lock);
> -		jbd2_journal_put_journal_head(jh);
> -		return 0;
> +		may_free = 0;

I'd rather add b_modified clearing here than trying to reuse the tail of
the function. Because this condition is different from the other ones that
end up in zap_buffer_locked - here we really want to keep bh and jh mostly
intact.
								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 2/2] jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer
  2020-02-03 14:04 ` [PATCH 2/2] jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer zhangyi (F)
@ 2020-02-06 11:46   ` Jan Kara
  2020-02-06 15:28     ` zhangyi (F)
  2020-02-11  6:51     ` zhangyi (F)
  0 siblings, 2 replies; 10+ messages in thread
From: Jan Kara @ 2020-02-06 11:46 UTC (permalink / raw)
  To: zhangyi (F); +Cc: jack, tytso, linux-ext4, luoshijie1, zhangxiaoxu5

On Mon 03-02-20 22:04:58, zhangyi (F) wrote:
> Commit 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from
> an older transaction") set the BH_Freed flag when forgetting a metadata
> buffer which belongs to the committing transaction, it indicate the
> committing process clear dirty bits when it is done with the buffer. But
> it also clear the BH_Mapped flag at the same time, which may trigger
> below NULL pointer oops when block_size < PAGE_SIZE.
> 
> rmdir 1             kjournald2                 mkdir 2
>                     jbd2_journal_commit_transaction
> 		    commit transaction N
> jbd2_journal_forget
> set_buffer_freed(bh1)
>                     jbd2_journal_commit_transaction
>                      commit transaction N+1
>                      ...
>                      clear_buffer_mapped(bh1)
>                                                ext4_getblk(bh2 ummapped)
>                                                ...
>                                                grow_dev_page
>                                                 init_page_buffers
>                                                  bh1->b_private=NULL
>                                                  bh2->b_private=NULL
>                      jbd2_journal_put_journal_head(jh1)
>                       __journal_remove_journal_head(hb1)
> 		       jh1 is NULL and trigger oops
> 
> *) Dir entry block bh1 and bh2 belongs to one page, and the bh2 has
>    already been unmapped.
> 
> For the metadata buffer we forgetting, clear the dirty flags is enough,
> so this patch add BH_Unmap flag for the journal_unmap_buffer() case and
> keep the mapped flag for the metadata buffer.
> 
> Fixes: 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from an older transaction")
> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>

Good spotting! Thanks for the patch. Some comments below:

> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> index 6396fe70085b..a649cdd1c5e5 100644
> --- a/fs/jbd2/commit.c
> +++ b/fs/jbd2/commit.c
> @@ -987,10 +987,13 @@ void jbd2_journal_commit_transaction(journal_t *journal)
>  		if (buffer_freed(bh) && !jh->b_next_transaction) {
>  			clear_buffer_freed(bh);
>  			clear_buffer_jbddirty(bh);
> -			clear_buffer_mapped(bh);
> -			clear_buffer_new(bh);
> -			clear_buffer_req(bh);
> -			bh->b_bdev = NULL;
> +			if (buffer_unmap(bh)) {
> +				clear_buffer_unmap(bh);
> +				clear_buffer_mapped(bh);
> +				clear_buffer_new(bh);
> +				clear_buffer_req(bh);
> +				bh->b_bdev = NULL;
> +			}

Any reason why you don't want to clear buffer_req and buffer_new flags for
all buffers as well? I agree that b_bdev setting and buffer_mapped need
special treatment.

Also rather than introducing this new buffer_unmap bit, I'd use the fact
this special treatment is needed only for buffers coming from the block device
mapping. And we can check for that like:

		/*
		 * We can (and need to) unmap buffer only for normal mappings.
		 * Block device buffers need to stay mapped all the time.
		 * We need to be careful about the check because the page
		 * mapping can get cleared under our hands.
		 */
		mapping = READ_ONCE(bh->b_page->mapping);
		if (mapping && !sb_is_blkdev_sb(mapping->host->i_sb)) {
			...
		}

Longer term, we might want to rework how the handling of truncated buffers
works with JBD2. There's lots of duplication between jbd2_journal_forget()
and jbd2_journal_unmap_buffer(), the dirtiness is tracked in jh->b_modified
as well as buffer_jbddirty() and it is further redundant with the journal
list the buffer is currently on. So I suspect it could all be simplified if
we took a fresh look at things.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 2/2] jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer
  2020-02-06 11:46   ` Jan Kara
@ 2020-02-06 15:28     ` zhangyi (F)
  2020-02-12  8:45       ` Jan Kara
  2020-02-11  6:51     ` zhangyi (F)
  1 sibling, 1 reply; 10+ messages in thread
From: zhangyi (F) @ 2020-02-06 15:28 UTC (permalink / raw)
  To: Jan Kara; +Cc: tytso, linux-ext4, luoshijie1, zhangxiaoxu5

Thanks for the comments.

On 2020/2/6 19:46, Jan Kara wrote:
> On Mon 03-02-20 22:04:58, zhangyi (F) wrote:
[..]
>> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
>> index 6396fe70085b..a649cdd1c5e5 100644
>> --- a/fs/jbd2/commit.c
>> +++ b/fs/jbd2/commit.c
>> @@ -987,10 +987,13 @@ void jbd2_journal_commit_transaction(journal_t *journal)
>>  		if (buffer_freed(bh) && !jh->b_next_transaction) {
>>  			clear_buffer_freed(bh);
>>  			clear_buffer_jbddirty(bh);
>> -			clear_buffer_mapped(bh);
>> -			clear_buffer_new(bh);
>> -			clear_buffer_req(bh);
>> -			bh->b_bdev = NULL;
>> +			if (buffer_unmap(bh)) {
>> +				clear_buffer_unmap(bh);
>> +				clear_buffer_mapped(bh);
>> +				clear_buffer_new(bh);
>> +				clear_buffer_req(bh);
>> +				bh->b_bdev = NULL;
>> +			}
> 
> Any reason why you don't want to clear buffer_req and buffer_new flags for
> all buffers as well? I agree that b_bdev setting and buffer_mapped need
> special treatment.
> 
IIUC, a buffer coming from jbd2_journal_forget() is always a 'block
device backed' metadata buffer (I'm not entirely sure), and for these
metadata buffers the buffer_new flag will not be set. At the same time,
since such a buffer is always mapped, it's fine to keep the buffer_req
flag even though it has been freed by the filesystem, because it means
the block device has committed this buffer, and it does not seem to
affect reusing the buffer. Am I missing something?

> Also rather than introducing this new buffer_unmap bit, I'd use the fact
> this special treatment is needed only for buffers coming from the block device
> mapping. And we can check for that like:
> 
> 		/*
> 		 * We can (and need to) unmap buffer only for normal mappings.
> 		 * Block device buffers need to stay mapped all the time.
> 		 * We need to be careful about the check because the page
> 		 * mapping can get cleared under our hands.
> 		 */
> 		mapping = READ_ONCE(bh->b_page->mapping);
> 		if (mapping && !sb_is_blkdev_sb(mapping->host->i_sb)) {
> 			...
> 		}
> 
It looks better, I will use this check in the next iteration.

> Longer term, we might want to rework how the handling of truncated buffers
> works with JDB2. There's lots of duplication between jbd2_journal_forget()
> and jbd2_journal_unmap_buffer(), the dirtiness is tracked in jh->b_modified
> as well as buffer_jbddirty() and it is further redundant with the journal
> list the buffer is currently on. So I suspect it could all be simplified if
> we took a fresh look at things.
> 
Indeed, it is tricky and not very easy to understand now; refactoring
this in the future would be great.

Thanks,
Yi.



* Re: [PATCH 2/2] jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer
  2020-02-06 11:46   ` Jan Kara
  2020-02-06 15:28     ` zhangyi (F)
@ 2020-02-11  6:51     ` zhangyi (F)
  2020-02-12  8:47       ` Jan Kara
  1 sibling, 1 reply; 10+ messages in thread
From: zhangyi (F) @ 2020-02-11  6:51 UTC (permalink / raw)
  To: Jan Kara; +Cc: tytso, linux-ext4, luoshijie1, zhangxiaoxu5

On 2020/2/6 19:46, Jan Kara wrote:
> On Mon 03-02-20 22:04:58, zhangyi (F) wrote:
>> Commit 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from
>> an older transaction") set the BH_Freed flag when forgetting a metadata
>> buffer which belongs to the committing transaction, it indicate the
>> committing process clear dirty bits when it is done with the buffer. But
>> it also clear the BH_Mapped flag at the same time, which may trigger
>> below NULL pointer oops when block_size < PAGE_SIZE.
>>
>> rmdir 1             kjournald2                 mkdir 2
>>                     jbd2_journal_commit_transaction
>> 		    commit transaction N
>> jbd2_journal_forget
>> set_buffer_freed(bh1)
>>                     jbd2_journal_commit_transaction
>>                      commit transaction N+1
>>                      ...
>>                      clear_buffer_mapped(bh1)
>>                                                ext4_getblk(bh2 ummapped)
>>                                                ...
>>                                                grow_dev_page
>>                                                 init_page_buffers
>>                                                  bh1->b_private=NULL
>>                                                  bh2->b_private=NULL
>>                      jbd2_journal_put_journal_head(jh1)
>>                       __journal_remove_journal_head(hb1)
>> 		       jh1 is NULL and trigger oops
>>
>> *) Dir entry block bh1 and bh2 belongs to one page, and the bh2 has
>>    already been unmapped.
>>
>> For the metadata buffer we forgetting, clear the dirty flags is enough,
>> so this patch add BH_Unmap flag for the journal_unmap_buffer() case and
>> keep the mapped flag for the metadata buffer.
>>
>> Fixes: 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from an older transaction")
>> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
[..]
> 
> Also rather than introducing this new buffer_unmap bit, I'd use the fact
> this special treatment is needed only for buffers coming from the block device
> mapping. And we can check for that like:
> 
> 		/*
> 		 * We can (and need to) unmap buffer only for normal mappings.
> 		 * Block device buffers need to stay mapped all the time.
> 		 * We need to be careful about the check because the page
> 		 * mapping can get cleared under our hands.
> 		 */
> 		mapping = READ_ONCE(bh->b_page->mapping);
> 		if (mapping && !sb_is_blkdev_sb(mapping->host->i_sb)) {
> 			...
> 		}

Thinking about it again, this may miss clearing the mapped flag if the
'mapping' of a journalled data page was cleared, and finally trigger an
exception if we reuse the buffer again. So I think it should be:

		if (!(mapping && sb_is_blkdev_sb(mapping->host->i_sb))) {
			...
		}

Thanks,
Yi.



* Re: [PATCH 2/2] jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer
  2020-02-06 15:28     ` zhangyi (F)
@ 2020-02-12  8:45       ` Jan Kara
  0 siblings, 0 replies; 10+ messages in thread
From: Jan Kara @ 2020-02-12  8:45 UTC (permalink / raw)
  To: zhangyi (F); +Cc: Jan Kara, tytso, linux-ext4, luoshijie1, zhangxiaoxu5

On Thu 06-02-20 23:28:01, zhangyi (F) wrote:
> Thanks for the comments.
> 
> On 2020/2/6 19:46, Jan Kara wrote:
> > On Mon 03-02-20 22:04:58, zhangyi (F) wrote:
> [..]
> >> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> >> index 6396fe70085b..a649cdd1c5e5 100644
> >> --- a/fs/jbd2/commit.c
> >> +++ b/fs/jbd2/commit.c
> >> @@ -987,10 +987,13 @@ void jbd2_journal_commit_transaction(journal_t *journal)
> >>  		if (buffer_freed(bh) && !jh->b_next_transaction) {
> >>  			clear_buffer_freed(bh);
> >>  			clear_buffer_jbddirty(bh);
> >> -			clear_buffer_mapped(bh);
> >> -			clear_buffer_new(bh);
> >> -			clear_buffer_req(bh);
> >> -			bh->b_bdev = NULL;
> >> +			if (buffer_unmap(bh)) {
> >> +				clear_buffer_unmap(bh);
> >> +				clear_buffer_mapped(bh);
> >> +				clear_buffer_new(bh);
> >> +				clear_buffer_req(bh);
> >> +				bh->b_bdev = NULL;
> >> +			}
> > 
> > Any reason why you don't want to clear buffer_req and buffer_new flags for
> > all buffers as well? I agree that b_bdev setting and buffer_mapped need
> > special treatment.
> > 
> IIUC, for the buffer coming from jbd2_journal_forget() is always 'block
> device backed' metadata buffer (not pretty sure), and for these metadata
  Yes, it is.

> buffer, buffer_new flag will not be set. At the same time, since it's
> always mapped, so it's fine to keep the buffer_req flag even it's freed
> by the filesystem now, because it means the block device has committed
> this buffer, and it seems that it does not affect we reuse this buffer.
> Am I missing something ?

OK, you're right that buffer_new shouldn't be ever set for block backed
buffers and we don't care about buffer_req. So let's keep the split of bits
to clear as you did and just add a comment that for block device buffers it
is enough to clear buffer_jbddirty and buffer_freed, for file mapping
buffers (i.e., journalled data) we have to be more careful and clear more
bits.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 2/2] jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer
  2020-02-11  6:51     ` zhangyi (F)
@ 2020-02-12  8:47       ` Jan Kara
  2020-02-12 13:14         ` zhangyi (F)
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Kara @ 2020-02-12  8:47 UTC (permalink / raw)
  To: zhangyi (F); +Cc: Jan Kara, tytso, linux-ext4, luoshijie1, zhangxiaoxu5

On Tue 11-02-20 14:51:10, zhangyi (F) wrote:
> On 2020/2/6 19:46, Jan Kara wrote:
> > On Mon 03-02-20 22:04:58, zhangyi (F) wrote:
> >> Commit 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from
> >> an older transaction") set the BH_Freed flag when forgetting a metadata
> >> buffer which belongs to the committing transaction, it indicate the
> >> committing process clear dirty bits when it is done with the buffer. But
> >> it also clear the BH_Mapped flag at the same time, which may trigger
> >> below NULL pointer oops when block_size < PAGE_SIZE.
> >>
> >> rmdir 1             kjournald2                 mkdir 2
> >>                     jbd2_journal_commit_transaction
> >> 		    commit transaction N
> >> jbd2_journal_forget
> >> set_buffer_freed(bh1)
> >>                     jbd2_journal_commit_transaction
> >>                      commit transaction N+1
> >>                      ...
> >>                      clear_buffer_mapped(bh1)
> >>                                                ext4_getblk(bh2 ummapped)
> >>                                                ...
> >>                                                grow_dev_page
> >>                                                 init_page_buffers
> >>                                                  bh1->b_private=NULL
> >>                                                  bh2->b_private=NULL
> >>                      jbd2_journal_put_journal_head(jh1)
> >>                       __journal_remove_journal_head(hb1)
> >> 		       jh1 is NULL and trigger oops
> >>
> >> *) Dir entry block bh1 and bh2 belongs to one page, and the bh2 has
> >>    already been unmapped.
> >>
> >> For the metadata buffer we forgetting, clear the dirty flags is enough,
> >> so this patch add BH_Unmap flag for the journal_unmap_buffer() case and
> >> keep the mapped flag for the metadata buffer.
> >>
> >> Fixes: 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from an older transaction")
> >> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
> [..]
> > 
> > Also rather than introducing this new buffer_unmap bit, I'd use the fact
> > this special treatment is needed only for buffers coming from the block device
> > mapping. And we can check for that like:
> > 
> > 		/*
> > 		 * We can (and need to) unmap buffer only for normal mappings.
> > 		 * Block device buffers need to stay mapped all the time.
> > 		 * We need to be careful about the check because the page
> > 		 * mapping can get cleared under our hands.
> > 		 */
> > 		mapping = READ_ONCE(bh->b_page->mapping);
> > 		if (mapping && !sb_is_blkdev_sb(mapping->host->i_sb)) {
> > 			...
> > 		}
> 
> Think about it again, it may missing clearing of mapped flag if 'mapping'
> of journalled data page was cleared, and finally trigger exception if
> we reuse the buffer again. So I think it should be:
> 
> 		if (!(mapping && sb_is_blkdev_sb(mapping->host->i_sb))) {
> 			...
> 		}

Well, if b_page->mapping got cleared, it means the page got fully truncated
and in such case buffers can never be reused - the page and buffers will be
freed once we are done with them. So what you are concerned about cannot
happen. But you're right it is good to explain this in the comment.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 2/2] jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer
  2020-02-12  8:47       ` Jan Kara
@ 2020-02-12 13:14         ` zhangyi (F)
  0 siblings, 0 replies; 10+ messages in thread
From: zhangyi (F) @ 2020-02-12 13:14 UTC (permalink / raw)
  To: Jan Kara; +Cc: tytso, linux-ext4, luoshijie1, zhangxiaoxu5

Hi,

On 2020/2/12 16:47, Jan Kara wrote:
> On Tue 11-02-20 14:51:10, zhangyi (F) wrote:
>> On 2020/2/6 19:46, Jan Kara wrote:
>>> On Mon 03-02-20 22:04:58, zhangyi (F) wrote:
>>>> Commit 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from
>>>> an older transaction") set the BH_Freed flag when forgetting a metadata
>>>> buffer which belongs to the committing transaction, it indicate the
>>>> committing process clear dirty bits when it is done with the buffer. But
>>>> it also clear the BH_Mapped flag at the same time, which may trigger
>>>> below NULL pointer oops when block_size < PAGE_SIZE.
>>>>
>>>> rmdir 1             kjournald2                 mkdir 2
>>>>                     jbd2_journal_commit_transaction
>>>> 		    commit transaction N
>>>> jbd2_journal_forget
>>>> set_buffer_freed(bh1)
>>>>                     jbd2_journal_commit_transaction
>>>>                      commit transaction N+1
>>>>                      ...
>>>>                      clear_buffer_mapped(bh1)
>>>>                                                ext4_getblk(bh2 ummapped)
>>>>                                                ...
>>>>                                                grow_dev_page
>>>>                                                 init_page_buffers
>>>>                                                  bh1->b_private=NULL
>>>>                                                  bh2->b_private=NULL
>>>>                      jbd2_journal_put_journal_head(jh1)
>>>>                       __journal_remove_journal_head(hb1)
>>>> 		       jh1 is NULL and trigger oops
>>>>
>>>> *) Dir entry block bh1 and bh2 belongs to one page, and the bh2 has
>>>>    already been unmapped.
>>>>
>>>> For the metadata buffer we forgetting, clear the dirty flags is enough,
>>>> so this patch add BH_Unmap flag for the journal_unmap_buffer() case and
>>>> keep the mapped flag for the metadata buffer.
>>>>
>>>> Fixes: 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from an older transaction")
>>>> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
>> [..]
>>>
>>> Also rather than introducing this new buffer_unmap bit, I'd use the fact
>>> this special treatment is needed only for buffers coming from the block device
>>> mapping. And we can check for that like:
>>>
>>> 		/*
>>> 		 * We can (and need to) unmap buffer only for normal mappings.
>>> 		 * Block device buffers need to stay mapped all the time.
>>> 		 * We need to be careful about the check because the page
>>> 		 * mapping can get cleared under our hands.
>>> 		 */
>>> 		mapping = READ_ONCE(bh->b_page->mapping);
>>> 		if (mapping && !sb_is_blkdev_sb(mapping->host->i_sb)) {
>>> 			...
>>> 		}
>>
>> Think about it again, it may missing clearing of mapped flag if 'mapping'
>> of journalled data page was cleared, and finally trigger exception if
>> we reuse the buffer again. So I think it should be:
>>
>> 		if (!(mapping && sb_is_blkdev_sb(mapping->host->i_sb))) {
>> 			...
>> 		}
> 
> Well, if b_page->mapping got cleared, it means the page got fully truncated
> and in such case buffers can never be reused - the page and buffers will be
> freed once we are done with them. So what you are concerned about cannot
> happen. But you're right it is good to explain this in the comment.
> 
Yes, you are right, the page and buffer will be freed in release_buffer_page()
and it seems there is no exception. I will send V3 that goes back to the
condition you suggested and adds comments, after testing.

Thanks,
Yi.

