On Sep 9, 2021, at 4:07 AM, Joseph Qi <joseph.qi@linux.alibaba.com> wrote:

Hi Wengang,

On 9/9/21 1:12 AM, Wengang Wang wrote:
Hi,

Sorry for joining late, but this doesn’t look right to me.

On Sep 8, 2021, at 3:51 AM, Joseph Qi <joseph.qi@linux.alibaba.com> wrote:



On 9/8/21 6:20 PM, Chenyuan Mi wrote:
The reference counting issue happens in two exception handling paths
of ocfs2_replay_truncate_records(). When executing these two exception
handling paths, the function forgets to decrease the refcount of handle
increased by ocfs2_start_trans(), causing a refcount leak.

Fix this issue by using ocfs2_commit_trans() to decrease the refcount
of handle in two handling paths.

Signed-off-by: Chenyuan Mi <cymi20@fudan.edu.cn>
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>

Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
---
fs/ocfs2/alloc.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index f1cc8258d34a..b05fde7edc3a 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -5940,6 +5940,7 @@ static int ocfs2_replay_truncate_records(struct ocfs2_super *osb,
status = ocfs2_journal_access_di(handle, INODE_CACHE(tl_inode), tl_bh,
 OCFS2_JOURNAL_ACCESS_WRITE);
if (status < 0) {
+ ocfs2_commit_trans(osb, handle);
mlog_errno(status);
goto bail;
}
@@ -5964,6 +5965,7 @@ static int ocfs2_replay_truncate_records(struct ocfs2_super *osb,
     data_alloc_bh, start_blk,
     num_clusters);
if (status < 0) {
+ ocfs2_commit_trans(osb, handle);

A transaction is meant to be atomic: everything that belongs in the same handle must be committed together or not at all.
Here that includes tl_bh and the other metadata blocks that ocfs2_free_clusters() modifies.
When we reach this point, some of the related metadata blocks may already be in the handle while others are not, because of the error.
If you commit now, only part of the metadata blocks is written to the journal. That breaks atomicity and will cause FS inconsistency.
So why do you want to commit this incomplete set of metadata changes in the handle to the journal?

Did you actually hit this failure, or did you just find the refcount leak by code review?

You may want to look at ocfs2_journal_dirty() for the error handling part.
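
To make the atomicity concern above concrete, here is a minimal user-space sketch. It is not kernel code and does not use any real ocfs2 or jbd2 API: every toy_* name is hypothetical, and the "journal" is just an array of block ids. It only models the difference between committing a half-filled handle and aborting it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model. None of these names are real ocfs2/jbd2 APIs. */
#define TOY_MAX_BLOCKS 8

struct toy_journal {
	int committed[TOY_MAX_BLOCKS];	/* block ids that reached the log */
	size_t ncommitted;
	bool aborted;			/* set once the journal is failed */
};

struct toy_handle {
	struct toy_journal *journal;
	int dirty[TOY_MAX_BLOCKS];	/* block ids attached to this handle */
	size_t ndirty;
};

static void toy_start(struct toy_handle *h, struct toy_journal *j)
{
	h->journal = j;
	h->ndirty = 0;
}

static void toy_dirty(struct toy_handle *h, int blk)
{
	assert(h->ndirty < TOY_MAX_BLOCKS);
	h->dirty[h->ndirty++] = blk;
}

/* Commit: whatever is attached so far goes to the log, complete or not. */
static void toy_commit(struct toy_handle *h)
{
	struct toy_journal *j = h->journal;
	size_t i;

	if (j->aborted)
		return;
	for (i = 0; i < h->ndirty; i++) {
		assert(j->ncommitted < TOY_MAX_BLOCKS);
		j->committed[j->ncommitted++] = h->dirty[i];
	}
	h->ndirty = 0;
}

/* Abort: drop the handle's blocks and fail the whole journal. */
static void toy_abort(struct toy_handle *h)
{
	h->journal->aborted = true;
	h->ndirty = 0;
}

/*
 * An update that must log blocks 1 and 2 together. With fail_midway set,
 * the error hits after block 1 is attached but before block 2 is.
 * Returns the number of blocks that reached the log.
 */
static size_t toy_update(struct toy_journal *j, bool fail_midway,
			 bool commit_on_error)
{
	struct toy_handle h;

	toy_start(&h, j);
	toy_dirty(&h, 1);
	if (fail_midway) {
		if (commit_on_error)
			toy_commit(&h);	/* logs block 1 without block 2 */
		else
			toy_abort(&h);	/* logs nothing, journal is failed */
		return j->ncommitted;
	}
	toy_dirty(&h, 2);
	toy_commit(&h);
	return j->ncommitted;
}
```

In this model, committing on the error path leaves exactly one of the two blocks in the log, while aborting leaves none and marks the journal as failed.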


For the first error path, we haven't called ocfs2_journal_dirty() yet,
so it won't be a problem.
For the second error path, I don't think we have a better way. Other
callers of ocfs2_free_clusters() simply ignore the error code.
Anyway, once a transaction has been started we should commit it;
otherwise the journal will be left in an abnormal state.
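
The pairing rule argued for above (every started handle gets released on every exit path) can be sketched in user space as follows. This is only an illustration under stated assumptions: toy_txn, toy_txn_start(), toy_txn_stop() and toy_replay() are hypothetical names, not ocfs2 or jbd2 functions, and the sketch says nothing about whether releasing the handle should commit or abort its contents.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a refcounted transaction handle. Hypothetical names only. */
struct toy_txn {
	int refcount;
};

static struct toy_txn *toy_txn_start(struct toy_txn *t)
{
	t->refcount++;
	return t;
}

static void toy_txn_stop(struct toy_txn *t)
{
	assert(t->refcount > 0);
	t->refcount--;
}

/*
 * Two steps that can each fail. fail_at selects which step fails
 * (0 means neither). All exits after toy_txn_start() go through the
 * out_stop label, so the reference is always dropped.
 */
static int toy_replay(struct toy_txn *t, int fail_at)
{
	int status = 0;
	struct toy_txn *h = toy_txn_start(t);

	if (fail_at == 1) {	/* first step fails */
		status = -1;
		goto out_stop;
	}

	if (fail_at == 2) {	/* second step fails */
		status = -2;
		goto out_stop;
	}

out_stop:
	toy_txn_stop(h);
	return status;
}
```

Routing both error paths through one label keeps the start/stop pairing in a single place instead of repeating the release call in each branch.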

I don't think so. If an error happens, we should fail ocfs2 rather than do a partial commit.

thanks,
wengang