linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] ocfs2: Fix handle refcount leak in two exception handling paths
@ 2021-09-08 10:20 Chenyuan Mi
  2021-09-08 10:51 ` Joseph Qi
  0 siblings, 1 reply; 9+ messages in thread
From: Chenyuan Mi @ 2021-09-08 10:20 UTC (permalink / raw)
  Cc: yuanxzhang, Chenyuan Mi, Xiyu Yang, Xin Tan, Mark Fasheh,
	Joel Becker, Joseph Qi, ocfs2-devel, linux-kernel

The reference counting issue happens in two exception handling paths
of ocfs2_replay_truncate_records(). When executing these two exception
handling paths, the function forgets to decrease the refcount of handle
increased by ocfs2_start_trans(), causing a refcount leak.

Fix this issue by using ocfs2_commit_trans() to decrease the refcount
of handle in two handling paths.

Signed-off-by: Chenyuan Mi <cymi20@fudan.edu.cn>
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
---
 fs/ocfs2/alloc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index f1cc8258d34a..b05fde7edc3a 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -5940,6 +5940,7 @@ static int ocfs2_replay_truncate_records(struct ocfs2_super *osb,
 		status = ocfs2_journal_access_di(handle, INODE_CACHE(tl_inode), tl_bh,
 						 OCFS2_JOURNAL_ACCESS_WRITE);
 		if (status < 0) {
+			ocfs2_commit_trans(osb, handle);
 			mlog_errno(status);
 			goto bail;
 		}
@@ -5964,6 +5965,7 @@ static int ocfs2_replay_truncate_records(struct ocfs2_super *osb,
 						     data_alloc_bh, start_blk,
 						     num_clusters);
 			if (status < 0) {
+				ocfs2_commit_trans(osb, handle);
 				mlog_errno(status);
 				goto bail;
 			}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] ocfs2: Fix handle refcount leak in two exception handling paths
  2021-09-08 10:20 [PATCH v2] ocfs2: Fix handle refcount leak in two exception handling paths Chenyuan Mi
@ 2021-09-08 10:51 ` Joseph Qi
  2021-09-08 17:12   ` [Ocfs2-devel] " Wengang Wang
  2021-09-15  2:36   ` Joseph Qi
  0 siblings, 2 replies; 9+ messages in thread
From: Joseph Qi @ 2021-09-08 10:51 UTC (permalink / raw)
  To: Chenyuan Mi, akpm
  Cc: yuanxzhang, Xiyu Yang, Xin Tan, Mark Fasheh, Joel Becker,
	ocfs2-devel, linux-kernel



On 9/8/21 6:20 PM, Chenyuan Mi wrote:
> The reference counting issue happens in two exception handling paths
> of ocfs2_replay_truncate_records(). When executing these two exception
> handling paths, the function forgets to decrease the refcount of handle
> increased by ocfs2_start_trans(), causing a refcount leak.
> 
> Fix this issue by using ocfs2_commit_trans() to decrease the refcount
> of handle in two handling paths.
> 
> Signed-off-by: Chenyuan Mi <cymi20@fudan.edu.cn>
> Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>

Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
> ---
>  fs/ocfs2/alloc.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
> index f1cc8258d34a..b05fde7edc3a 100644
> --- a/fs/ocfs2/alloc.c
> +++ b/fs/ocfs2/alloc.c
> @@ -5940,6 +5940,7 @@ static int ocfs2_replay_truncate_records(struct ocfs2_super *osb,
>  		status = ocfs2_journal_access_di(handle, INODE_CACHE(tl_inode), tl_bh,
>  						 OCFS2_JOURNAL_ACCESS_WRITE);
>  		if (status < 0) {
> +			ocfs2_commit_trans(osb, handle);
>  			mlog_errno(status);
>  			goto bail;
>  		}
> @@ -5964,6 +5965,7 @@ static int ocfs2_replay_truncate_records(struct ocfs2_super *osb,
>  						     data_alloc_bh, start_blk,
>  						     num_clusters);
>  			if (status < 0) {
> +				ocfs2_commit_trans(osb, handle);
>  				mlog_errno(status);
>  				goto bail;
>  			}
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Ocfs2-devel] [PATCH v2] ocfs2: Fix handle refcount leak in two exception handling paths
  2021-09-08 10:51 ` Joseph Qi
@ 2021-09-08 17:12   ` Wengang Wang
  2021-09-09 11:07     ` Joseph Qi
  2021-09-15  2:36   ` Joseph Qi
  1 sibling, 1 reply; 9+ messages in thread
From: Wengang Wang @ 2021-09-08 17:12 UTC (permalink / raw)
  To: Joseph Qi
  Cc: Chenyuan Mi, akpm, Xin Tan, Xiyu Yang, yuanxzhang, linux-kernel,
	ocfs2-devel

Hi,

Sorry for late involving, but this doesn’t look right to me.

> On Sep 8, 2021, at 3:51 AM, Joseph Qi <joseph.qi@linux.alibaba.com> wrote:
> 
> 
> 
> On 9/8/21 6:20 PM, Chenyuan Mi wrote:
>> The reference counting issue happens in two exception handling paths
>> of ocfs2_replay_truncate_records(). When executing these two exception
>> handling paths, the function forgets to decrease the refcount of handle
>> increased by ocfs2_start_trans(), causing a refcount leak.
>> 
>> Fix this issue by using ocfs2_commit_trans() to decrease the refcount
>> of handle in two handling paths.
>> 
>> Signed-off-by: Chenyuan Mi <cymi20@fudan.edu.cn>
>> Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
>> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
> 
> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
>> ---
>> fs/ocfs2/alloc.c | 2 ++
>> 1 file changed, 2 insertions(+)
>> 
>> diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
>> index f1cc8258d34a..b05fde7edc3a 100644
>> --- a/fs/ocfs2/alloc.c
>> +++ b/fs/ocfs2/alloc.c
>> @@ -5940,6 +5940,7 @@ static int ocfs2_replay_truncate_records(struct ocfs2_super *osb,
>> 		status = ocfs2_journal_access_di(handle, INODE_CACHE(tl_inode), tl_bh,
>> 						 OCFS2_JOURNAL_ACCESS_WRITE);
>> 		if (status < 0) {
>> +			ocfs2_commit_trans(osb, handle);
>> 			mlog_errno(status);
>> 			goto bail;
>> 		}
>> @@ -5964,6 +5965,7 @@ static int ocfs2_replay_truncate_records(struct ocfs2_super *osb,
>> 						     data_alloc_bh, start_blk,
>> 						     num_clusters);
>> 			if (status < 0) {
>> +				ocfs2_commit_trans(osb, handle);

As a transaction, stuff expected to be in the same handle should be treated as atomic.
Here the stuff includes the tl_bh and other metadata block which will be modified in ocfs2_free_clusters().
Coming here, some of related meta blocks may be in the handle but others are not due to the error happened.
If you do a commit, partial meta blocks are committed to log. — that breaks the atomic idea, it will cause FS inconsistency.
So what’s reason you want to commit the meta block changes, which is not all of expected, in this handle to journal log?

Do you really see a hit on the failure? or just you detected the refcount leak by code review?

You may want to look at ocfs2_journal_dirty() for the error handling part.

thanks,
wengang


>> 				mlog_errno(status);
>> 				goto bail;
>> 			}
>> 
> 
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Ocfs2-devel] [PATCH v2] ocfs2: Fix handle refcount leak in two exception handling paths
  2021-09-08 17:12   ` [Ocfs2-devel] " Wengang Wang
@ 2021-09-09 11:07     ` Joseph Qi
       [not found]       ` <CED0D2AD-7905-490E-8D36-50D192CD9BF1@oracle.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Joseph Qi @ 2021-09-09 11:07 UTC (permalink / raw)
  To: Wengang Wang
  Cc: Chenyuan Mi, akpm, Xin Tan, Xiyu Yang, yuanxzhang, linux-kernel,
	ocfs2-devel

Hi Wengang,

On 9/9/21 1:12 AM, Wengang Wang wrote:
> Hi,
> 
> Sorry for late involving, but this doesn’t look right to me.
> 
>> On Sep 8, 2021, at 3:51 AM, Joseph Qi <joseph.qi@linux.alibaba.com> wrote:
>>
>>
>>
>> On 9/8/21 6:20 PM, Chenyuan Mi wrote:
>>> The reference counting issue happens in two exception handling paths
>>> of ocfs2_replay_truncate_records(). When executing these two exception
>>> handling paths, the function forgets to decrease the refcount of handle
>>> increased by ocfs2_start_trans(), causing a refcount leak.
>>>
>>> Fix this issue by using ocfs2_commit_trans() to decrease the refcount
>>> of handle in two handling paths.
>>>
>>> Signed-off-by: Chenyuan Mi <cymi20@fudan.edu.cn>
>>> Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
>>> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
>>
>> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
>>> ---
>>> fs/ocfs2/alloc.c | 2 ++
>>> 1 file changed, 2 insertions(+)
>>>
>>> diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
>>> index f1cc8258d34a..b05fde7edc3a 100644
>>> --- a/fs/ocfs2/alloc.c
>>> +++ b/fs/ocfs2/alloc.c
>>> @@ -5940,6 +5940,7 @@ static int ocfs2_replay_truncate_records(struct ocfs2_super *osb,
>>> 		status = ocfs2_journal_access_di(handle, INODE_CACHE(tl_inode), tl_bh,
>>> 						 OCFS2_JOURNAL_ACCESS_WRITE);
>>> 		if (status < 0) {
>>> +			ocfs2_commit_trans(osb, handle);
>>> 			mlog_errno(status);
>>> 			goto bail;
>>> 		}
>>> @@ -5964,6 +5965,7 @@ static int ocfs2_replay_truncate_records(struct ocfs2_super *osb,
>>> 						     data_alloc_bh, start_blk,
>>> 						     num_clusters);
>>> 			if (status < 0) {
>>> +				ocfs2_commit_trans(osb, handle);
> 
> As a transaction, stuff expected to be in the same handle should be treated as atomic.
> Here the stuff includes the tl_bh and other metadata block which will be modified in ocfs2_free_clusters().
> Coming here, some of related meta blocks may be in the handle but others are not due to the error happened.
> If you do a commit, partial meta blocks are committed to log. — that breaks the atomic idea, it will cause FS inconsistency.
> So what’s reason you want to commit the meta block changes, which is not all of expected, in this handle to journal log?
> 
> Do you really see a hit on the failure? or just you detected the refcount leak by code review?
> 
> You may want to look at ocfs2_journal_dirty() for the error handling part.
> 

For the first error handling, since we don't call ocfs2_journal_dirty()
yet, so won't be a problem.
For the second error handling, I think we don't have a better way. Look
at other callers of ocfs2_free_clusters(), we simply ignore the error
code.
Anyway, we should commit transaction if starts, otherwise journal will
be abnormal.

Thanks,
Joseph


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Ocfs2-devel] [PATCH v2] ocfs2: Fix handle refcount leak in two exception handling paths
       [not found]       ` <CED0D2AD-7905-490E-8D36-50D192CD9BF1@oracle.com>
@ 2021-09-10  1:53         ` Joseph Qi
  2021-09-10 17:00           ` Wengang Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Joseph Qi @ 2021-09-10  1:53 UTC (permalink / raw)
  To: Wengang Wang
  Cc: Chenyuan Mi, akpm, Xin Tan, Xiyu Yang, yuanxzhang, linux-kernel,
	ocfs2-devel



On 9/10/21 1:48 AM, Wengang Wang wrote:
> 
> 
> On Sep 9, 2021, at 4:07 AM, Joseph Qi <joseph.qi@linux.alibaba.com<mailto:joseph.qi@linux.alibaba.com>> wrote:
> 
> Hi Wengang,
> 
> On 9/9/21 1:12 AM, Wengang Wang wrote:
> Hi,
> 
> Sorry for late involving, but this doesn’t look right to me.
> 
> On Sep 8, 2021, at 3:51 AM, Joseph Qi <joseph.qi@linux.alibaba.com<mailto:joseph.qi@linux.alibaba.com>> wrote:
> 
> 
> 
> On 9/8/21 6:20 PM, Chenyuan Mi wrote:
> The reference counting issue happens in two exception handling paths
> of ocfs2_replay_truncate_records(). When executing these two exception
> handling paths, the function forgets to decrease the refcount of handle
> increased by ocfs2_start_trans(), causing a refcount leak.
> 
> Fix this issue by using ocfs2_commit_trans() to decrease the refcount
> of handle in two handling paths.
> 
> Signed-off-by: Chenyuan Mi <cymi20@fudan.edu.cn<mailto:cymi20@fudan.edu.cn>>
> Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn<mailto:xiyuyang19@fudan.edu.cn>>
> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com<mailto:tanxin.ctf@gmail.com>>
> 
> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com<mailto:joseph.qi@linux.alibaba.com>>
> ---
> fs/ocfs2/alloc.c | 2 ++
> 1 file changed, 2 insertions(+)
> 
> diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
> index f1cc8258d34a..b05fde7edc3a 100644
> --- a/fs/ocfs2/alloc.c
> +++ b/fs/ocfs2/alloc.c
> @@ -5940,6 +5940,7 @@ static int ocfs2_replay_truncate_records(struct ocfs2_super *osb,
> status = ocfs2_journal_access_di(handle, INODE_CACHE(tl_inode), tl_bh,
>  OCFS2_JOURNAL_ACCESS_WRITE);
> if (status < 0) {
> + ocfs2_commit_trans(osb, handle);
> mlog_errno(status);
> goto bail;
> }
> @@ -5964,6 +5965,7 @@ static int ocfs2_replay_truncate_records(struct ocfs2_super *osb,
>      data_alloc_bh, start_blk,
>      num_clusters);
> if (status < 0) {
> + ocfs2_commit_trans(osb, handle);
> 
> As a transaction, stuff expected to be in the same handle should be treated as atomic.
> Here the stuff includes the tl_bh and other metadata block which will be modified in ocfs2_free_clusters().
> Coming here, some of related meta blocks may be in the handle but others are not due to the error happened.
> If you do a commit, partial meta blocks are committed to log. — that breaks the atomic idea, it will cause FS inconsistency.
> So what’s reason you want to commit the meta block changes, which is not all of expected, in this handle to journal log?
> 
> Do you really see a hit on the failure? or just you detected the refcount leak by code review?
> 
> You may want to look at ocfs2_journal_dirty() for the error handling part.
> 
> 
> For the first error handling, since we don't call ocfs2_journal_dirty()
> yet, so won't be a problem.
> For the second error handling, I think we don't have a better way. Look
> at other callers of ocfs2_free_clusters(), we simply ignore the error
> code.
> Anyway, we should commit transaction if starts, otherwise journal will
> be abnormal.
> 
> I don't think so. If error happened, we should fail ocfs2, rather than do a partial committing.
> 

Umm... not exactly...
Take ocfs2_free_clusters() for example, when it fails in case of EIO or
ENOMEM, we can't just abort journal in such cases, because it is not so
serious, only a bit blocks still occupied and they will recovery during
the next mount. 
That's why we have "errors=continue" in most filesystems, we should always
consider the business continuity first.
Also you can look at ext4_free_blocks() for reference.

Thanks,
Joseph

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Ocfs2-devel] [PATCH v2] ocfs2: Fix handle refcount leak in two exception handling paths
  2021-09-10  1:53         ` Joseph Qi
@ 2021-09-10 17:00           ` Wengang Wang
  2021-09-14  2:12             ` Joseph Qi
  0 siblings, 1 reply; 9+ messages in thread
From: Wengang Wang @ 2021-09-10 17:00 UTC (permalink / raw)
  To: Joseph Qi
  Cc: Chenyuan Mi, akpm, Xin Tan, Xiyu Yang, yuanxzhang, linux-kernel,
	ocfs2-devel



> On Sep 9, 2021, at 6:53 PM, Joseph Qi <joseph.qi@linux.alibaba.com> wrote:
> 
> 
> 
> On 9/10/21 1:48 AM, Wengang Wang wrote:
>> 
>> 
>> On Sep 9, 2021, at 4:07 AM, Joseph Qi <joseph.qi@linux.alibaba.com<mailto:joseph.qi@linux.alibaba.com>> wrote:
>> 
>> Hi Wengang,
>> 
>> On 9/9/21 1:12 AM, Wengang Wang wrote:
>> Hi,
>> 
>> Sorry for late involving, but this doesn’t look right to me.
>> 
>> On Sep 8, 2021, at 3:51 AM, Joseph Qi <joseph.qi@linux.alibaba.com<mailto:joseph.qi@linux.alibaba.com>> wrote:
>> 
>> 
>> 
>> On 9/8/21 6:20 PM, Chenyuan Mi wrote:
>> The reference counting issue happens in two exception handling paths
>> of ocfs2_replay_truncate_records(). When executing these two exception
>> handling paths, the function forgets to decrease the refcount of handle
>> increased by ocfs2_start_trans(), causing a refcount leak.
>> 
>> Fix this issue by using ocfs2_commit_trans() to decrease the refcount
>> of handle in two handling paths.
>> 
>> Signed-off-by: Chenyuan Mi <cymi20@fudan.edu.cn<mailto:cymi20@fudan.edu.cn>>
>> Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn<mailto:xiyuyang19@fudan.edu.cn>>
>> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com<mailto:tanxin.ctf@gmail.com>>
>> 
>> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com<mailto:joseph.qi@linux.alibaba.com>>
>> ---
>> fs/ocfs2/alloc.c | 2 ++
>> 1 file changed, 2 insertions(+)
>> 
>> diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
>> index f1cc8258d34a..b05fde7edc3a 100644
>> --- a/fs/ocfs2/alloc.c
>> +++ b/fs/ocfs2/alloc.c
>> @@ -5940,6 +5940,7 @@ static int ocfs2_replay_truncate_records(struct ocfs2_super *osb,
>> status = ocfs2_journal_access_di(handle, INODE_CACHE(tl_inode), tl_bh,
>> OCFS2_JOURNAL_ACCESS_WRITE);
>> if (status < 0) {
>> + ocfs2_commit_trans(osb, handle);
>> mlog_errno(status);
>> goto bail;
>> }
>> @@ -5964,6 +5965,7 @@ static int ocfs2_replay_truncate_records(struct ocfs2_super *osb,
>>     data_alloc_bh, start_blk,
>>     num_clusters);
>> if (status < 0) {
>> + ocfs2_commit_trans(osb, handle);
>> 
>> As a transaction, stuff expected to be in the same handle should be treated as atomic.
>> Here the stuff includes the tl_bh and other metadata block which will be modified in ocfs2_free_clusters().
>> Coming here, some of related meta blocks may be in the handle but others are not due to the error happened.
>> If you do a commit, partial meta blocks are committed to log. — that breaks the atomic idea, it will cause FS inconsistency.
>> So what’s reason you want to commit the meta block changes, which is not all of expected, in this handle to journal log?
>> 
>> Do you really see a hit on the failure? or just you detected the refcount leak by code review?
>> 
>> You may want to look at ocfs2_journal_dirty() for the error handling part.
>> 
>> 
>> For the first error handling, since we don't call ocfs2_journal_dirty()
>> yet, so won't be a problem.
>> For the second error handling, I think we don't have a better way. Look
>> at other callers of ocfs2_free_clusters(), we simply ignore the error
>> code.
>> Anyway, we should commit transaction if starts, otherwise journal will
>> be abnormal.
>> 
>> I don't think so. If error happened, we should fail ocfs2, rather than do a partial committing.
>> 
> 
> Umm... not exactly...
> Take ocfs2_free_clusters() for example, when it fails in case of EIO or
> ENOMEM, we can't just abort journal in such cases, because it is not so
> serious, only a bit blocks still occupied and they will recovery during
> the next mount. 

So are you sure a partial journal commitment won’t cause FS inconsistency? any proof for that?
Problem is not just if we can try to free the blocks again or not. The problem is FS in inconsistent state.

I didn’t look into ocfs2_free_clusters() further, but can image this case:

1) We are going to free some clusters/blocks. 
2) We may need a new (not merging into existing) record to remember the new free extent. 
3) The new record needs to be inserted into free extent tree.
4) the block (block A) where the new record to be inserted could be already full thus no space for the new record.
5) then we need at least a new block (block B) to store the new record. (to maintain the free block btree, maybe another block, block C is needed too).
6) So we need to save the pointer (block number) of block B in block A and store the new record in block B.
7) In this case we need to make sure block A and block B either both in journal log, or none of them in journal log.  We don’t allow block A is in journal bot block B is not, right?

go back to question,   Error could after block B is added to journal handle but before block A is added. In case we do a journal commit when hitting that error, we are committing block B to journal but leaving block A not in.
If panic happened, block A could never has the pointer pointing to block B. The result is block B is permanently lost (we can never reuse this block again). 
Or if we add block A to journal first before adding block B and error happens After block A is added and before block B is added.  Then we have the pointer pointing to block B in block A after panic, but block B doesn’t contain valid contents. The result is that we will hit problem when visiting block B as a valid btree block.


> That's why we have "errors=continue" in most filesystems, we should always
> consider the business continuity first.
> Also you can look at ext4_free_blocks() for reference.

OCFS2 doesn’t support ERRORS_CONT, or to say it just ignore that option.  No matter ERRORS_CONT is supported or not by OCFS2,
The key is not to leave the FS in inconsistent state.

thanks,
wengang


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Ocfs2-devel] [PATCH v2] ocfs2: Fix handle refcount leak in two exception handling paths
  2021-09-10 17:00           ` Wengang Wang
@ 2021-09-14  2:12             ` Joseph Qi
  2021-09-14  3:07               ` Wengang Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Joseph Qi @ 2021-09-14  2:12 UTC (permalink / raw)
  To: Wengang Wang
  Cc: Chenyuan Mi, akpm, Xin Tan, Xiyu Yang, yuanxzhang, linux-kernel,
	ocfs2-devel



On 9/11/21 1:00 AM, Wengang Wang wrote:
> 
> 
>> On Sep 9, 2021, at 6:53 PM, Joseph Qi <joseph.qi@linux.alibaba.com> wrote:
>>
>>
>>
>> On 9/10/21 1:48 AM, Wengang Wang wrote:
>>>
>>>
>>> On Sep 9, 2021, at 4:07 AM, Joseph Qi <joseph.qi@linux.alibaba.com<mailto:joseph.qi@linux.alibaba.com>> wrote:
>>>
>>> Hi Wengang,
>>>
>>> On 9/9/21 1:12 AM, Wengang Wang wrote:
>>> Hi,
>>>
>>> Sorry for late involving, but this doesn’t look right to me.
>>>
>>> On Sep 8, 2021, at 3:51 AM, Joseph Qi <joseph.qi@linux.alibaba.com<mailto:joseph.qi@linux.alibaba.com>> wrote:
>>>
>>>
>>>
>>> On 9/8/21 6:20 PM, Chenyuan Mi wrote:
>>> The reference counting issue happens in two exception handling paths
>>> of ocfs2_replay_truncate_records(). When executing these two exception
>>> handling paths, the function forgets to decrease the refcount of handle
>>> increased by ocfs2_start_trans(), causing a refcount leak.
>>>
>>> Fix this issue by using ocfs2_commit_trans() to decrease the refcount
>>> of handle in two handling paths.
>>>
>>> Signed-off-by: Chenyuan Mi <cymi20@fudan.edu.cn<mailto:cymi20@fudan.edu.cn>>
>>> Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn<mailto:xiyuyang19@fudan.edu.cn>>
>>> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com<mailto:tanxin.ctf@gmail.com>>
>>>
>>> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com<mailto:joseph.qi@linux.alibaba.com>>
>>> ---
>>> fs/ocfs2/alloc.c | 2 ++
>>> 1 file changed, 2 insertions(+)
>>>
>>> diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
>>> index f1cc8258d34a..b05fde7edc3a 100644
>>> --- a/fs/ocfs2/alloc.c
>>> +++ b/fs/ocfs2/alloc.c
>>> @@ -5940,6 +5940,7 @@ static int ocfs2_replay_truncate_records(struct ocfs2_super *osb,
>>> status = ocfs2_journal_access_di(handle, INODE_CACHE(tl_inode), tl_bh,
>>> OCFS2_JOURNAL_ACCESS_WRITE);
>>> if (status < 0) {
>>> + ocfs2_commit_trans(osb, handle);
>>> mlog_errno(status);
>>> goto bail;
>>> }
>>> @@ -5964,6 +5965,7 @@ static int ocfs2_replay_truncate_records(struct ocfs2_super *osb,
>>>     data_alloc_bh, start_blk,
>>>     num_clusters);
>>> if (status < 0) {
>>> + ocfs2_commit_trans(osb, handle);
>>>
>>> As a transaction, stuff expected to be in the same handle should be treated as atomic.
>>> Here the stuff includes the tl_bh and other metadata block which will be modified in ocfs2_free_clusters().
>>> Coming here, some of related meta blocks may be in the handle but others are not due to the error happened.
>>> If you do a commit, partial meta blocks are committed to log. — that breaks the atomic idea, it will cause FS inconsistency.
>>> So what’s reason you want to commit the meta block changes, which is not all of expected, in this handle to journal log?
>>>
>>> Do you really see a hit on the failure? or just you detected the refcount leak by code review?
>>>
>>> You may want to look at ocfs2_journal_dirty() for the error handling part.
>>>
>>>
>>> For the first error handling, since we don't call ocfs2_journal_dirty()
>>> yet, so won't be a problem.
>>> For the second error handling, I think we don't have a better way. Look
>>> at other callers of ocfs2_free_clusters(), we simply ignore the error
>>> code.
>>> Anyway, we should commit transaction if starts, otherwise journal will
>>> be abnormal.
>>>
>>> I don't think so. If error happened, we should fail ocfs2, rather than do a partial committing.
>>>
>>
>> Umm... not exactly...
>> Take ocfs2_free_clusters() for example, when it fails in case of EIO or
>> ENOMEM, we can't just abort journal in such cases, because it is not so
>> serious, only a bit blocks still occupied and they will recovery during
>> the next mount. 
> 
> So are you sure a partial journal commitment won’t cause FS inconsistency? any proof for that?
> Problem is not just if we can try to free the blocks again or not. The problem is FS in inconsistent state.
> 
> I didn’t look into ocfs2_free_clusters() further, but can image this case:
> 
> 1) We are going to free some clusters/blocks. 
> 2) We may need a new (not merging into existing) record to remember the new free extent. 
> 3) The new record needs to be inserted into free extent tree.
> 4) the block (block A) where the new record to be inserted could be already full thus no space for the new record.
> 5) then we need at least a new block (block B) to store the new record. (to maintain the free block btree, maybe another block, block C is needed too).
> 6) So we need to save the pointer (block number) of block B in block A and store the new record in block B.
> 7) In this case we need to make sure block A and block B either both in journal log, or none of them in journal log.  We don’t allow block A is in journal bot block B is not, right?
> 
> go back to question,   Error could after block B is added to journal handle but before block A is added. In case we do a journal commit when hitting that error, we are committing block B to journal but leaving block A not in.
> If panic happened, block A could never has the pointer pointing to block B. The result is block B is permanently lost (we can never reuse this block again). 
> Or if we add block A to journal first before adding block B and error happens After block A is added and before block B is added.  Then we have the pointer pointing to block B in block A after panic, but block B doesn’t contain valid contents. The result is that we will hit problem when visiting block B as a valid btree block.
> 
> 
>> That's why we have "errors=continue" in most filesystems, we should always
>> consider the business continuity first.
>> Also you can look at ext4_free_blocks() for reference.
> 
> OCFS2 doesn’t support ERRORS_CONT, or to say it just ignore that option.  No matter ERRORS_CONT is supported or not by OCFS2,
> The key is not to leave the FS in inconsistent state.
> 

I didn't say it won't cause inconsistency, but "don't have a better way".
IIUC, ocfs2_free_cluster() mainly clears the bitmap and mark them free again.
So the lost space is some what a cost for "please don't abort business if
error happens but not be so serious". I think that's why other callers will
also commit transaction even ocfs2_free_cluster() fails.

Thanks,
Joseph


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Ocfs2-devel] [PATCH v2] ocfs2: Fix handle refcount leak in two exception handling paths
  2021-09-14  2:12             ` Joseph Qi
@ 2021-09-14  3:07               ` Wengang Wang
  0 siblings, 0 replies; 9+ messages in thread
From: Wengang Wang @ 2021-09-14  3:07 UTC (permalink / raw)
  To: Joseph Qi
  Cc: Chenyuan Mi, akpm, Xin Tan, Xiyu Yang, yuanxzhang, linux-kernel,
	ocfs2-devel



> On Sep 13, 2021, at 7:12 PM, Joseph Qi <joseph.qi@linux.alibaba.com> wrote:
> 
> 
> 
> On 9/11/21 1:00 AM, Wengang Wang wrote:
>> 
>> 
>>> On Sep 9, 2021, at 6:53 PM, Joseph Qi <joseph.qi@linux.alibaba.com> wrote:
>>> 
>>> 
>>> 
>>> On 9/10/21 1:48 AM, Wengang Wang wrote:
>>>> 
>>>> 
>>>> On Sep 9, 2021, at 4:07 AM, Joseph Qi <joseph.qi@linux.alibaba.com<mailto:joseph.qi@linux.alibaba.com>> wrote:
>>>> 
>>>> Hi Wengang,
>>>> 
>>>> On 9/9/21 1:12 AM, Wengang Wang wrote:
>>>> Hi,
>>>> 
>>>> Sorry for late involving, but this doesn’t look right to me.
>>>> 
>>>> On Sep 8, 2021, at 3:51 AM, Joseph Qi <joseph.qi@linux.alibaba.com<mailto:joseph.qi@linux.alibaba.com>> wrote:
>>>> 
>>>> 
>>>> 
>>>> On 9/8/21 6:20 PM, Chenyuan Mi wrote:
>>>> The reference counting issue happens in two exception handling paths
>>>> of ocfs2_replay_truncate_records(). When executing these two exception
>>>> handling paths, the function forgets to decrease the refcount of handle
>>>> increased by ocfs2_start_trans(), causing a refcount leak.
>>>> 
>>>> Fix this issue by using ocfs2_commit_trans() to decrease the refcount
>>>> of handle in two handling paths.
>>>> 
>>>> Signed-off-by: Chenyuan Mi <cymi20@fudan.edu.cn<mailto:cymi20@fudan.edu.cn>>
>>>> Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn<mailto:xiyuyang19@fudan.edu.cn>>
>>>> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com<mailto:tanxin.ctf@gmail.com>>
>>>> 
>>>> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com<mailto:joseph.qi@linux.alibaba.com>>
>>>> ---
>>>> fs/ocfs2/alloc.c | 2 ++
>>>> 1 file changed, 2 insertions(+)
>>>> 
>>>> diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
>>>> index f1cc8258d34a..b05fde7edc3a 100644
>>>> --- a/fs/ocfs2/alloc.c
>>>> +++ b/fs/ocfs2/alloc.c
>>>> @@ -5940,6 +5940,7 @@ static int ocfs2_replay_truncate_records(struct ocfs2_super *osb,
>>>> status = ocfs2_journal_access_di(handle, INODE_CACHE(tl_inode), tl_bh,
>>>> OCFS2_JOURNAL_ACCESS_WRITE);
>>>> if (status < 0) {
>>>> + ocfs2_commit_trans(osb, handle);
>>>> mlog_errno(status);
>>>> goto bail;
>>>> }
>>>> @@ -5964,6 +5965,7 @@ static int ocfs2_replay_truncate_records(struct ocfs2_super *osb,
>>>>    data_alloc_bh, start_blk,
>>>>    num_clusters);
>>>> if (status < 0) {
>>>> + ocfs2_commit_trans(osb, handle);
>>>> 
>>>> As a transaction, stuff expected to be in the same handle should be treated as atomic.
>>>> Here the stuff includes the tl_bh and other metadata block which will be modified in ocfs2_free_clusters().
>>>> Coming here, some of related meta blocks may be in the handle but others are not due to the error happened.
>>>> If you do a commit, partial meta blocks are committed to log. — that breaks the atomic idea, it will cause FS inconsistency.
>>>> So what’s reason you want to commit the meta block changes, which is not all of expected, in this handle to journal log?
>>>> 
>>>> Do you really see a hit on the failure? or just you detected the refcount leak by code review?
>>>> 
>>>> You may want to look at ocfs2_journal_dirty() for the error handling part.
>>>> 
>>>> 
>>>> For the first error handling, since we don't call ocfs2_journal_dirty()
>>>> yet, so won't be a problem.
>>>> For the second error handling, I think we don't have a better way. Look
>>>> at other callers of ocfs2_free_clusters(), we simply ignore the error
>>>> code.
>>>> Anyway, we should commit transaction if starts, otherwise journal will
>>>> be abnormal.
>>>> 
>>>> I don't think so. If error happened, we should fail ocfs2, rather than do a partial committing.
>>>> 
>>> 
>>> Umm... not exactly...
>>> Take ocfs2_free_clusters() for example, when it fails in case of EIO or
>>> ENOMEM, we can't just abort journal in such cases, because it is not so
>>> serious, only a bit blocks still occupied and they will recovery during
>>> the next mount. 
>> 
>> So are you sure a partial journal commitment won’t cause FS inconsistency? any proof for that?
>> Problem is not just if we can try to free the blocks again or not. The problem is FS in inconsistent state.
>> 
>> I didn’t look into ocfs2_free_clusters() further, but can image this case:
>> 
>> 1) We are going to free some clusters/blocks. 
>> 2) We may need a new (not merging into existing) record to remember the new free extent. 
>> 3) The new record needs to be inserted into free extent tree.
>> 4) the block (block A) where the new record to be inserted could be already full thus no space for the new record.
>> 5) then we need at least a new block (block B) to store the new record. (to maintain the free block btree, maybe another block, block C is needed too).
>> 6) So we need to save the pointer (block number) of block B in block A and store the new record in block B.
>> 7) In this case we need to make sure block A and block B either both in journal log, or none of them in journal log.  We don’t allow block A is in journal bot block B is not, right?
>> 
>> go back to question,   Error could after block B is added to journal handle but before block A is added. In case we do a journal commit when hitting that error, we are committing block B to journal but leaving block A not in.
>> If panic happened, block A could never has the pointer pointing to block B. The result is block B is permanently lost (we can never reuse this block again). 
>> Or if we add block A to journal first before adding block B and error happens After block A is added and before block B is added.  Then we have the pointer pointing to block B in block A after panic, but block B doesn’t contain valid contents. The result is that we will hit problem when visiting block B as a valid btree block.
>> 
>> 
>>> That's why we have "errors=continue" in most filesystems, we should always
>>> consider the business continuity first.
>>> Also you can look at ext4_free_blocks() for reference.
>> 
>> OCFS2 doesn’t support ERRORS_CONT, or to say it just ignore that option.  No matter ERRORS_CONT is supported or not by OCFS2,
>> The key is not to leave the FS in inconsistent state.
>> 
> 
> I didn't say it won't cause inconsistency, but "don't have a better way".
> IIUC, ocfs2_free_cluster() mainly clears the bitmap and mark them free again.
> So the lost space is some what a cost for "please don't abort business if
> error happens but not be so serious". I think that's why other callers will
> also commit transaction even ocfs2_free_cluster() fails.
> 

Hm.. OK, seems it needs a big effort to “have a better way”.
And Maybe freeing blocks is not “so serious”, I let it go.

thanks,
wengang


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Ocfs2-devel] [PATCH v2] ocfs2: Fix handle refcount leak in two exception handling paths
  2021-09-08 10:51 ` Joseph Qi
  2021-09-08 17:12   ` [Ocfs2-devel] " Wengang Wang
@ 2021-09-15  2:36   ` Joseph Qi
  1 sibling, 0 replies; 9+ messages in thread
From: Joseph Qi @ 2021-09-15  2:36 UTC (permalink / raw)
  To: Chenyuan Mi, akpm
  Cc: Xin Tan, Xiyu Yang, yuanxzhang, linux-kernel, ocfs2-devel, Wengang Wang

Hi Andrew,
Now there is no objection on this patch.
Would you please pick it into your -mm tree? Thanks.

Joseph

On 9/8/21 6:51 PM, Joseph Qi wrote:
> 
> 
> On 9/8/21 6:20 PM, Chenyuan Mi wrote:
>> The reference counting issue happens in two exception handling paths
>> of ocfs2_replay_truncate_records(). When executing these two exception
>> handling paths, the function forgets to decrease the refcount of handle
>> increased by ocfs2_start_trans(), causing a refcount leak.
>>
>> Fix this issue by using ocfs2_commit_trans() to decrease the refcount
>> of handle in two handling paths.
>>
>> Signed-off-by: Chenyuan Mi <cymi20@fudan.edu.cn>
>> Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
>> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
> 
> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
>> ---
>>  fs/ocfs2/alloc.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
>> index f1cc8258d34a..b05fde7edc3a 100644
>> --- a/fs/ocfs2/alloc.c
>> +++ b/fs/ocfs2/alloc.c
>> @@ -5940,6 +5940,7 @@ static int ocfs2_replay_truncate_records(struct ocfs2_super *osb,
>>  		status = ocfs2_journal_access_di(handle, INODE_CACHE(tl_inode), tl_bh,
>>  						 OCFS2_JOURNAL_ACCESS_WRITE);
>>  		if (status < 0) {
>> +			ocfs2_commit_trans(osb, handle);
>>  			mlog_errno(status);
>>  			goto bail;
>>  		}
>> @@ -5964,6 +5965,7 @@ static int ocfs2_replay_truncate_records(struct ocfs2_super *osb,
>>  						     data_alloc_bh, start_blk,
>>  						     num_clusters);
>>  			if (status < 0) {
>> +				ocfs2_commit_trans(osb, handle);
>>  				mlog_errno(status);
>>  				goto bail;
>>  			}
>>
> 
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-09-15  2:36 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-08 10:20 [PATCH v2] ocfs2: Fix handle refcount leak in two exception handling paths Chenyuan Mi
2021-09-08 10:51 ` Joseph Qi
2021-09-08 17:12   ` [Ocfs2-devel] " Wengang Wang
2021-09-09 11:07     ` Joseph Qi
     [not found]       ` <CED0D2AD-7905-490E-8D36-50D192CD9BF1@oracle.com>
2021-09-10  1:53         ` Joseph Qi
2021-09-10 17:00           ` Wengang Wang
2021-09-14  2:12             ` Joseph Qi
2021-09-14  3:07               ` Wengang Wang
2021-09-15  2:36   ` Joseph Qi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).