ocfs2-devel.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
* [Ocfs2-devel] discuss about jbd2 assertion in defragment path
@ 2023-02-10 10:04 Heming Zhao via Ocfs2-devel
  2023-02-14  2:52 ` Joseph Qi via Ocfs2-devel
  0 siblings, 1 reply; 12+ messages in thread
From: Heming Zhao via Ocfs2-devel @ 2023-02-10 10:04 UTC (permalink / raw)
  To: joseph.qi, ocfs2-devel; +Cc: jack

Hello Joseph,

I am sorry to wake up a long time ago thread.

All mails of this thread (my patch is [1]):
[1] https://oss.oracle.com/pipermail/ocfs2-devel/2022-May/000101.html
[2] https://oss.oracle.com/pipermail/ocfs2-devel/2022-June/000105.html
[3] https://oss.oracle.com/pipermail/ocfs2-devel/2022-June/000109.html
[4] https://oss.oracle.com/pipermail/ocfs2-devel/2022-June/000217.html

I re-checked ocfs2 defragmentation & jbd2 flow recently, I still think my
patch [1] is right. At least, the fixing code is correct, the patch commit
log needs to polish.

This bug has the same root cause of commit 7f27ec978b0e ("ocfs2: call
ocfs2_journal_access_di() before ocfs2_journal_dirty() in ocfs2_write_end_nolock()").
For this bug, jbd2_journal_restart() is called by ocfs2_split_extent() during
defragmenting, and it's not about "not enough credits" issue you ever said in [2].

I explain my thinking again in this mail.

the crash call flow:

ocfs2_defrag_extent //caller call it in while() loop.
 + handle = ocfs2_start_trans(osb, credits)
 + __ocfs2_move_extent
 |  + ocfs2_journal_access_di //[a]
 |  + ocfs2_split_extent      //[b]
 |  |  + if                   //[b.1]
 |  |  |   ocfs2_replace_extent_rec/ocfs2_split_and_insert
 |  |  + else
 |  |     ocfs2_try_to_merge_extent
 |  |
 |  + ocfs2_journal_dirty     //[c]
 |
 + ocfs2_commit_trans(osb, handle) //<== complete this handle

In my viewpoint, ocfs2_split_extent() is journal self-service function. I still
belive the two lines ([a] & [c]) in __ocfs2_move_extent() are totally useless.
In ocfs2_split_extent(), the code from the first code line to "if-else" code
area ([b.1]) doesn't need any journal protection, and we also could see there
are only read operations.

If we worry about data corruption after removing [a] & [c], (e.g: my eyes missed
some journal operations from [a] to [b.1]), we could only delete [c]. So the
fixed code seems (only remove line [c]):

ocfs2_defrag_extent
 + handle = ocfs2_start_trans(osb, credits)
 + __ocfs2_move_extent
 |  + ocfs2_journal_access_di //[a]  <-- keep it, but remove pair dirty action
 |  + ocfs2_split_extent      //[b]
 |     + if                   //[b.1]
 |     |   ocfs2_replace_extent_rec/ocfs2_split_and_insert
 |     + else
 |        ocfs2_try_to_merge_extent
 |
 + ocfs2_commit_trans(osb, handle)

Thanks,
Heming

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Ocfs2-devel] discuss about jbd2 assertion in defragment path
  2023-02-10 10:04 [Ocfs2-devel] discuss about jbd2 assertion in defragment path Heming Zhao via Ocfs2-devel
@ 2023-02-14  2:52 ` Joseph Qi via Ocfs2-devel
  2023-02-14  4:33   ` Heming Zhao via Ocfs2-devel
  0 siblings, 1 reply; 12+ messages in thread
From: Joseph Qi via Ocfs2-devel @ 2023-02-14  2:52 UTC (permalink / raw)
  To: Heming Zhao, ocfs2-devel; +Cc: jack

Hi, Sorry about the late reply.
This thread is indeed a long time ago:(
It seems that I said the two ocfs2_journal_access_di() are for different
buffer head. Anyway, I have to recall the discussion before and get back
to you.

Thanks,
Joseph

On 2/10/23 6:04 PM, Heming Zhao wrote:
> Hello Joseph,
> 
> I am sorry to wake up a long time ago thread.
> 
> All mails of this thread (my patch is [1]):
> [1] https://oss.oracle.com/pipermail/ocfs2-devel/2022-May/000101.html
> [2] https://oss.oracle.com/pipermail/ocfs2-devel/2022-June/000105.html
> [3] https://oss.oracle.com/pipermail/ocfs2-devel/2022-June/000109.html
> [4] https://oss.oracle.com/pipermail/ocfs2-devel/2022-June/000217.html
> 
> I re-checked ocfs2 defragmentation & jbd2 flow recently, I still think my
> patch [1] is right. At least, the fixing code is correct, the patch commit
> log needs to polish.
> 
> This bug has the same root cause of commit 7f27ec978b0e ("ocfs2: call
> ocfs2_journal_access_di() before ocfs2_journal_dirty() in ocfs2_write_end_nolock()").
> For this bug, jbd2_journal_restart() is called by ocfs2_split_extent() during
> defragmenting, and it's not about "not enough credits" issue you ever said in [2].
> 
> I explain my thinking again in this mail.
> 
> the crash call flow:
> 
> ocfs2_defrag_extent //caller call it in while() loop.
>  + handle = ocfs2_start_trans(osb, credits)
>  + __ocfs2_move_extent
>  |  + ocfs2_journal_access_di //[a]
>  |  + ocfs2_split_extent      //[b]
>  |  |  + if                   //[b.1]
>  |  |  |   ocfs2_replace_extent_rec/ocfs2_split_and_insert
>  |  |  + else
>  |  |     ocfs2_try_to_merge_extent
>  |  |
>  |  + ocfs2_journal_dirty     //[c]
>  |
>  + ocfs2_commit_trans(osb, handle) //<== complete this handle
> 
> In my viewpoint, ocfs2_split_extent() is journal self-service function. I still
> belive the two lines ([a] & [c]) in __ocfs2_move_extent() are totally useless.
> In ocfs2_split_extent(), the code from the first code line to "if-else" code
> area ([b.1]) doesn't need any journal protection, and we also could see there
> are only read operations.
> 
> If we worry about data corruption after removing [a] & [c], (e.g: my eyes missed
> some journal operations from [a] to [b.1]), we could only delete [c]. So the
> fixed code seems (only remove line [c]):
> 
> ocfs2_defrag_extent
>  + handle = ocfs2_start_trans(osb, credits)
>  + __ocfs2_move_extent
>  |  + ocfs2_journal_access_di //[a]  <-- keep it, but remove pair dirty action
>  |  + ocfs2_split_extent      //[b]
>  |     + if                   //[b.1]
>  |     |   ocfs2_replace_extent_rec/ocfs2_split_and_insert
>  |     + else
>  |        ocfs2_try_to_merge_extent
>  |
>  + ocfs2_commit_trans(osb, handle)
> 
> Thanks,
> Heming

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Ocfs2-devel] discuss about jbd2 assertion in defragment path
  2023-02-14  2:52 ` Joseph Qi via Ocfs2-devel
@ 2023-02-14  4:33   ` Heming Zhao via Ocfs2-devel
  2023-02-14 11:08     ` Joseph Qi via Ocfs2-devel
  0 siblings, 1 reply; 12+ messages in thread
From: Heming Zhao via Ocfs2-devel @ 2023-02-14  4:33 UTC (permalink / raw)
  To: joseph.qi, ocfs2-devel; +Cc: jack

On Tue, Feb 14, 2023 at 10:52:30AM +0800, Joseph Qi wrote:
> Hi, Sorry about the late reply.
> This thread is indeed a long time ago:(
> It seems that I said the two ocfs2_journal_access_di() are for different
> buffer head. Anyway, I have to recall the discussion before and get back
> to you.
> 

If you belive from [a] to [b.1] doesn't need any journal protection, my patch
makes sense from theory.

From code logic, if the defrag path (ocfs2_split_extent) doesn't call
jbd2_journal_restart(), current code is absolute correct. Increasing credits
can't help avoid jbd2 assert crash, but make more easily to trigger crash,
because it makes jbd2_journal_extend() more easily to return "status > 0". In my
view, the correct fix should delete the access/dirty pair and make sure the
ocfs2_split_extent() is journal self-service.

Different buffer head issue could be handled by rechecking ocfs2_split_extent(),
make sure all journal handle branches are correct. I already checked
ocfs2_split_extent, but I can't assure my eyes didn't miss any corner.

Thanks,
Heming

> 
> On 2/10/23 6:04 PM, Heming Zhao wrote:
> > Hello Joseph,
> > 
> > I am sorry to wake up a long time ago thread.
> > 
> > All mails of this thread (my patch is [1]):
> > [1] https://oss.oracle.com/pipermail/ocfs2-devel/2022-May/000101.html
> > [2] https://oss.oracle.com/pipermail/ocfs2-devel/2022-June/000105.html
> > [3] https://oss.oracle.com/pipermail/ocfs2-devel/2022-June/000109.html
> > [4] https://oss.oracle.com/pipermail/ocfs2-devel/2022-June/000217.html
> > 
> > I re-checked ocfs2 defragmentation & jbd2 flow recently, I still think my
> > patch [1] is right. At least, the fixing code is correct, the patch commit
> > log needs to polish.
> > 
> > This bug has the same root cause of commit 7f27ec978b0e ("ocfs2: call
> > ocfs2_journal_access_di() before ocfs2_journal_dirty() in ocfs2_write_end_nolock()").
> > For this bug, jbd2_journal_restart() is called by ocfs2_split_extent() during
> > defragmenting, and it's not about "not enough credits" issue you ever said in [2].
> > 
> > I explain my thinking again in this mail.
> > 
> > the crash call flow:
> > 
> > ocfs2_defrag_extent //caller call it in while() loop.
> >  + handle = ocfs2_start_trans(osb, credits)
> >  + __ocfs2_move_extent
> >  |  + ocfs2_journal_access_di //[a]
> >  |  + ocfs2_split_extent      //[b]
> >  |  |  + if                   //[b.1]
> >  |  |  |   ocfs2_replace_extent_rec/ocfs2_split_and_insert
> >  |  |  + else
> >  |  |     ocfs2_try_to_merge_extent
> >  |  |
> >  |  + ocfs2_journal_dirty     //[c]
> >  |
> >  + ocfs2_commit_trans(osb, handle) //<== complete this handle
> > 
> > In my viewpoint, ocfs2_split_extent() is journal self-service function. I still
> > belive the two lines ([a] & [c]) in __ocfs2_move_extent() are totally useless.
> > In ocfs2_split_extent(), the code from the first code line to "if-else" code
> > area ([b.1]) doesn't need any journal protection, and we also could see there
> > are only read operations.
> > 
> > If we worry about data corruption after removing [a] & [c], (e.g: my eyes missed
> > some journal operations from [a] to [b.1]), we could only delete [c]. So the
> > fixed code seems (only remove line [c]):
> > 
> > ocfs2_defrag_extent
> >  + handle = ocfs2_start_trans(osb, credits)
> >  + __ocfs2_move_extent
> >  |  + ocfs2_journal_access_di //[a]  <-- keep it, but remove pair dirty action
> >  |  + ocfs2_split_extent      //[b]
> >  |     + if                   //[b.1]
> >  |     |   ocfs2_replace_extent_rec/ocfs2_split_and_insert
> >  |     + else
> >  |        ocfs2_try_to_merge_extent
> >  |
> >  + ocfs2_commit_trans(osb, handle)
> > 
> > Thanks,
> > Heming

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Ocfs2-devel] discuss about jbd2 assertion in defragment path
  2023-02-14  4:33   ` Heming Zhao via Ocfs2-devel
@ 2023-02-14 11:08     ` Joseph Qi via Ocfs2-devel
  2023-02-14 11:48       ` Heming Zhao via Ocfs2-devel
  0 siblings, 1 reply; 12+ messages in thread
From: Joseph Qi via Ocfs2-devel @ 2023-02-14 11:08 UTC (permalink / raw)
  To: Heming Zhao, ocfs2-devel; +Cc: jack

Hi,

On 2/14/23 12:33 PM, Heming Zhao wrote:
> On Tue, Feb 14, 2023 at 10:52:30AM +0800, Joseph Qi wrote:
>> Hi, Sorry about the late reply.
>> This thread is indeed a long time ago:(
>> It seems that I said the two ocfs2_journal_access_di() are for different
>> buffer head. Anyway, I have to recall the discussion before and get back
>> to you.
>>
> If you belive from [a] to [b.1] doesn't need any journal protection, my patch
> makes sense from theory.
> 
> From code logic, if the defrag path (ocfs2_split_extent) doesn't call
> jbd2_journal_restart(), current code is absolute correct. Increasing credits
> can't help avoid jbd2 assert crash, but make more easily to trigger crash,
> because it makes jbd2_journal_extend() more easily to return "status > 0". In my
> view, the correct fix should delete the access/dirty pair and make sure the
> ocfs2_split_extent() is journal self-service.
> 
> Different buffer head issue could be handled by rechecking ocfs2_split_extent(),
> make sure all journal handle branches are correct. I already checked
> ocfs2_split_extent, but I can't assure my eyes didn't miss any corner.

ocfs2_split_extent are not just 'read' operations, it may grow the tree,
that's why we have to do a transaction here.

The whole code flow:
ocfs2_ioctl_move_extents
  ocfs2_move_extents
    __ocfs2_move_extents_range
      ocfs2_defrag_extent  [1]
        ocfs2_start_trans  [a]
        __ocfs2_move_extent
          ocfs2_journal_access_di
          ocfs2_split_extent
          ocfs2_journal_dirty
        ocfs2_commit_trans
      ocfs2_move_extent  [2]
        ocfs2_start_trans
        __ocfs2_move_extent
          ocfs2_journal_access_di
          ocfs2_split_extent
          ocfs2_journal_dirty
        ocfs2_commit_trans
    ocfs2_start_trans  [b]
    ocfs2_journal_access_di
    ocfs2_journal_dirty
    ocfs2_commit_trans

In above, [a] and [b] are different transaction start/commit pair, and
each allocates different journal credits, even they are operate the same
bh, they are still different operations.

My another question is, __ocfs2_move_extent will be called by both
ocfs2_defrag_extent and ocfs2_move_extent, why this doesn't happen on the
other path?

Thanks,
Joseph

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Ocfs2-devel] discuss about jbd2 assertion in defragment path
  2023-02-14 11:08     ` Joseph Qi via Ocfs2-devel
@ 2023-02-14 11:48       ` Heming Zhao via Ocfs2-devel
  2023-02-15  2:06         ` Joseph Qi via Ocfs2-devel
  0 siblings, 1 reply; 12+ messages in thread
From: Heming Zhao via Ocfs2-devel @ 2023-02-14 11:48 UTC (permalink / raw)
  To: joseph.qi, ocfs2-devel; +Cc: jack

On Tue, Feb 14, 2023 at 07:08:16PM +0800, Joseph Qi wrote:
> Hi,
> 
> On 2/14/23 12:33 PM, Heming Zhao wrote:
> > On Tue, Feb 14, 2023 at 10:52:30AM +0800, Joseph Qi wrote:
> >> Hi, Sorry about the late reply.
> >> This thread is indeed a long time ago:(
> >> It seems that I said the two ocfs2_journal_access_di() are for different
> >> buffer head. Anyway, I have to recall the discussion before and get back
> >> to you.
> >>
> > If you belive from [a] to [b.1] doesn't need any journal protection, my patch
> > makes sense from theory.
> > 
> > From code logic, if the defrag path (ocfs2_split_extent) doesn't call
> > jbd2_journal_restart(), current code is absolute correct. Increasing credits
> > can't help avoid jbd2 assert crash, but make more easily to trigger crash,
> > because it makes jbd2_journal_extend() more easily to return "status > 0". In my
> > view, the correct fix should delete the access/dirty pair and make sure the
> > ocfs2_split_extent() is journal self-service.
> > 
> > Different buffer head issue could be handled by rechecking ocfs2_split_extent(),
> > make sure all journal handle branches are correct. I already checked
> > ocfs2_split_extent, but I can't assure my eyes didn't miss any corner.
> 
> ocfs2_split_extent are not just 'read' operations, it may grow the tree,
> that's why we have to do a transaction here.

You misunderstand my meaning, my mean: from the caller __ocfs2_move_extent()
calling ocfs2_journal_access_di() (below [a]) to (before) the handling
ctxt.c_contig_type (below [b.1]) in ocfs2_split_extent, this area is purely read
operations. It's the range [ [a], [b.1]) (include [a], not include [b.1])

__ocfs2_move_extent
 + ocfs2_journal_access_di //[a]   <------+
 + ocfs2_split_extent      //[b]          | [a, b.1) area is read operations
 |  + if                   //[b.1] <------+
 |  |  A
 |  |  +---- from this line, this func starts r/w IOs & needs jbd2 operations
 |  |  
 |  |   ocfs2_replace_extent_rec()/ocfs2_split_and_insert()
 |  + else
 |      ocfs2_try_to_merge_extent ()
 |
 + ocfs2_journal_dirty     //[c]

> 
> The whole code flow:
> ocfs2_ioctl_move_extents
>   ocfs2_move_extents
>     __ocfs2_move_extents_range
>       ocfs2_defrag_extent  [1]
>         ocfs2_start_trans  [a]
>         __ocfs2_move_extent
>           ocfs2_journal_access_di
>           ocfs2_split_extent
>           ocfs2_journal_dirty
>         ocfs2_commit_trans
>       ocfs2_move_extent  [2]
>         ocfs2_start_trans
>         __ocfs2_move_extent
>           ocfs2_journal_access_di
>           ocfs2_split_extent
>           ocfs2_journal_dirty
>         ocfs2_commit_trans
>     ocfs2_start_trans  [b]
>     ocfs2_journal_access_di
>     ocfs2_journal_dirty
>     ocfs2_commit_trans
> 
> In above, [a] and [b] are different transaction start/commit pair, and
> each allocates different journal credits, even they are operate the same
> bh, they are still different operations.

I agree above writing, but it can't change my conclusion: ocfs2_split_extent()
is journal self-service.
It seems the author meaning: he wanted [a] to protect defragging actions (extents
changes), wanted [b] to protect inode ctime changes. the
ocfs2_start_trans/ocfs2_commit_trans in [a] area is necessary. but the
access/dirty pair in __ocfs2_move_extent() is potential crash risk.

> 
> My another question is, __ocfs2_move_extent will be called by both
> ocfs2_defrag_extent and ocfs2_move_extent, why this doesn't happen on the
> other path?

The reason is userspace defragfs.ocfs2 never trigger ocfs2_move_extent().
see do_file_defrag():

struct ocfs2_move_extents me = {
	.me_start = 0,
	.me_len = buf->st_size,
	.me_flags = OCFS2_MOVE_EXT_FL_AUTO_DEFRAG, //<-- It makes do_defrag becomes 1
	                                 //in kernel space __ocfs2_move_extents_range()
};

Thanks,
Heming

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Ocfs2-devel] discuss about jbd2 assertion in defragment path
  2023-02-14 11:48       ` Heming Zhao via Ocfs2-devel
@ 2023-02-15  2:06         ` Joseph Qi via Ocfs2-devel
  2023-02-15  6:20           ` Heming Zhao via Ocfs2-devel
  0 siblings, 1 reply; 12+ messages in thread
From: Joseph Qi via Ocfs2-devel @ 2023-02-15  2:06 UTC (permalink / raw)
  To: Heming Zhao, ocfs2-devel; +Cc: jack



On 2/14/23 7:48 PM, Heming Zhao wrote:
> On Tue, Feb 14, 2023 at 07:08:16PM +0800, Joseph Qi wrote:
>> Hi,
>>
>> On 2/14/23 12:33 PM, Heming Zhao wrote:
>>> On Tue, Feb 14, 2023 at 10:52:30AM +0800, Joseph Qi wrote:
>>>> Hi, Sorry about the late reply.
>>>> This thread is indeed a long time ago:(
>>>> It seems that I said the two ocfs2_journal_access_di() are for different
>>>> buffer head. Anyway, I have to recall the discussion before and get back
>>>> to you.
>>>>
>>> If you belive from [a] to [b.1] doesn't need any journal protection, my patch
>>> makes sense from theory.
>>>
>>> From code logic, if the defrag path (ocfs2_split_extent) doesn't call
>>> jbd2_journal_restart(), current code is absolute correct. Increasing credits
>>> can't help avoid jbd2 assert crash, but make more easily to trigger crash,
>>> because it makes jbd2_journal_extend() more easily to return "status > 0". In my
>>> view, the correct fix should delete the access/dirty pair and make sure the
>>> ocfs2_split_extent() is journal self-service.
>>>
>>> Different buffer head issue could be handled by rechecking ocfs2_split_extent(),
>>> make sure all journal handle branches are correct. I already checked
>>> ocfs2_split_extent, but I can't assure my eyes didn't miss any corner.
>>
>> ocfs2_split_extent are not just 'read' operations, it may grow the tree,
>> that's why we have to do a transaction here.
> 
> You misunderstand my meaning, my mean: from the caller __ocfs2_move_extent()
> calling ocfs2_journal_access_di() (below [a]) to (before) the handling
> ctxt.c_contig_type (below [b.1]) in ocfs2_split_extent, this area is purely read
> operations. It's the range [ [a], [b.1]) (include [a], not include [b.1])
> 
> __ocfs2_move_extent
>  + ocfs2_journal_access_di //[a]   <------+
>  + ocfs2_split_extent      //[b]          | [a, b.1) area is read operations
>  |  + if                   //[b.1] <------+
>  |  |  A
>  |  |  +---- from this line, this func starts r/w IOs & needs jbd2 operations
>  |  |  
>  |  |   ocfs2_replace_extent_rec()/ocfs2_split_and_insert()
>  |  + else
>  |      ocfs2_try_to_merge_extent ()
>  |
>  + ocfs2_journal_dirty     //[c]
> 
>>
>> The whole code flow:
>> ocfs2_ioctl_move_extents
>>   ocfs2_move_extents
>>     __ocfs2_move_extents_range
>>       ocfs2_defrag_extent  [1]
>>         ocfs2_start_trans  [a]
>>         __ocfs2_move_extent
>>           ocfs2_journal_access_di
>>           ocfs2_split_extent
>>           ocfs2_journal_dirty
>>         ocfs2_commit_trans
>>       ocfs2_move_extent  [2]
>>         ocfs2_start_trans
>>         __ocfs2_move_extent
>>           ocfs2_journal_access_di
>>           ocfs2_split_extent
>>           ocfs2_journal_dirty
>>         ocfs2_commit_trans
>>     ocfs2_start_trans  [b]
>>     ocfs2_journal_access_di
>>     ocfs2_journal_dirty
>>     ocfs2_commit_trans
>>
>> In above, [a] and [b] are different transaction start/commit pair, and
>> each allocates different journal credits, even they are operate the same
>> bh, they are still different operations.
> 
> I agree above writing, but it can't change my conclusion: ocfs2_split_extent()
> is journal self-service.

Don't understand what does 'journal self-service' mean.

> It seems the author meaning: he wanted [a] to protect defragging actions (extents
> changes), wanted [b] to protect inode ctime changes. the
> ocfs2_start_trans/ocfs2_commit_trans in [a] area is necessary. but the
> access/dirty pair in __ocfs2_move_extent() is potential crash risk.
> 
If so, what is transaction [a] protect? (no dirty buffer)

Thanks,
Joseph

>>
>> My another question is, __ocfs2_move_extent will be called by both
>> ocfs2_defrag_extent and ocfs2_move_extent, why this doesn't happen on the
>> other path?
> 
> The reason is userspace defragfs.ocfs2 never trigger ocfs2_move_extent().
> see do_file_defrag():
> 
> struct ocfs2_move_extents me = {
> 	.me_start = 0,
> 	.me_len = buf->st_size,
> 	.me_flags = OCFS2_MOVE_EXT_FL_AUTO_DEFRAG, //<-- It makes do_defrag becomes 1
> 	                                 //in kernel space __ocfs2_move_extents_range()
> };
> 
> Thanks,
> Heming

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Ocfs2-devel] discuss about jbd2 assertion in defragment path
  2023-02-15  2:06         ` Joseph Qi via Ocfs2-devel
@ 2023-02-15  6:20           ` Heming Zhao via Ocfs2-devel
  2023-02-15  6:42             ` Joseph Qi via Ocfs2-devel
  0 siblings, 1 reply; 12+ messages in thread
From: Heming Zhao via Ocfs2-devel @ 2023-02-15  6:20 UTC (permalink / raw)
  To: Joseph Qi, ocfs2-devel; +Cc: jack

On 2/15/23 10:06 AM, Joseph Qi wrote:
> 
> 
> On 2/14/23 7:48 PM, Heming Zhao wrote:
>> On Tue, Feb 14, 2023 at 07:08:16PM +0800, Joseph Qi wrote:
>>> Hi,
>>>
>>> On 2/14/23 12:33 PM, Heming Zhao wrote:
>>>> On Tue, Feb 14, 2023 at 10:52:30AM +0800, Joseph Qi wrote:
>>>>> Hi, Sorry about the late reply.
>>>>> This thread is indeed a long time ago:(
>>>>> It seems that I said the two ocfs2_journal_access_di() are for different
>>>>> buffer head. Anyway, I have to recall the discussion before and get back
>>>>> to you.
>>>>>
>>>> If you belive from [a] to [b.1] doesn't need any journal protection, my patch
>>>> makes sense from theory.
>>>>
>>>>  From code logic, if the defrag path (ocfs2_split_extent) doesn't call
>>>> jbd2_journal_restart(), current code is absolute correct. Increasing credits
>>>> can't help avoid jbd2 assert crash, but make more easily to trigger crash,
>>>> because it makes jbd2_journal_extend() more easily to return "status > 0". In my
>>>> view, the correct fix should delete the access/dirty pair and make sure the
>>>> ocfs2_split_extent() is journal self-service.
>>>>
>>>> Different buffer head issue could be handled by rechecking ocfs2_split_extent(),
>>>> make sure all journal handle branches are correct. I already checked
>>>> ocfs2_split_extent, but I can't assure my eyes didn't miss any corner.
>>>
>>> ocfs2_split_extent are not just 'read' operations, it may grow the tree,
>>> that's why we have to do a transaction here.
>>
>> You misunderstand my meaning, my mean: from the caller __ocfs2_move_extent()
>> calling ocfs2_journal_access_di() (below [a]) to (before) the handling
>> ctxt.c_contig_type (below [b.1]) in ocfs2_split_extent, this area is purely read
>> operations. It's the range [ [a], [b.1]) (include [a], not include [b.1])
>>
>> __ocfs2_move_extent
>>   + ocfs2_journal_access_di //[a]   <------+
>>   + ocfs2_split_extent      //[b]          | [a, b.1) area is read operations
>>   |  + if                   //[b.1] <------+
>>   |  |  A
>>   |  |  +---- from this line, this func starts r/w IOs & needs jbd2 operations
>>   |  |
>>   |  |   ocfs2_replace_extent_rec()/ocfs2_split_and_insert()
>>   |  + else
>>   |      ocfs2_try_to_merge_extent ()
>>   |
>>   + ocfs2_journal_dirty     //[c]
>>
>>>
>>> The whole code flow:
>>> ocfs2_ioctl_move_extents
>>>    ocfs2_move_extents
>>>      __ocfs2_move_extents_range
>>>        ocfs2_defrag_extent  [1]
>>>          ocfs2_start_trans  [a]
>>>          __ocfs2_move_extent
>>>            ocfs2_journal_access_di
>>>            ocfs2_split_extent
>>>            ocfs2_journal_dirty
>>>          ocfs2_commit_trans
>>>        ocfs2_move_extent  [2]
>>>          ocfs2_start_trans
>>>          __ocfs2_move_extent
>>>            ocfs2_journal_access_di
>>>            ocfs2_split_extent
>>>            ocfs2_journal_dirty
>>>          ocfs2_commit_trans
>>>      ocfs2_start_trans  [b]
>>>      ocfs2_journal_access_di
>>>      ocfs2_journal_dirty
>>>      ocfs2_commit_trans
>>>
>>> In above, [a] and [b] are different transaction start/commit pair, and
>>> each allocates different journal credits, even they are operate the same
>>> bh, they are still different operations.
>>
>> I agree above writing, but it can't change my conclusion: ocfs2_split_extent()
>> is journal self-service.
> 
> Don't understand what does 'journal self-service' mean.

Sorry for my English skill, I invent this word 'journal self-service'.
The meaning is: this function could handle journal operations totally by itself.
Caller doesn't need to call access/dirty pair. Caller only needs to call jbd2
journal start/stop pair.

Because ocfs2_start_trans() & ocfs2_commit_trans are reenterable function,
we even could add these two func in ocfs2_split_extent(). Then any caller won't
take care of any journal operations when calling ocfs2_split_extent().

> 
>> It seems the author meaning: he wanted [a] to protect defragging actions (extents
>> changes), wanted [b] to protect inode ctime changes. the
>> ocfs2_start_trans/ocfs2_commit_trans in [a] area is necessary. but the
>> access/dirty pair in __ocfs2_move_extent() is potential crash risk.
>>
> If so, what is transaction [a] protect? (no dirty buffer)

Transaction [a] protects the dirty buffers in ocfs2_split_extent().
ocfs2_split_extent() has already correctly used access/dirty pair to
mark/handle dirty buffers.

Thanks,
Heming

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Ocfs2-devel] discuss about jbd2 assertion in defragment path
  2023-02-15  6:20           ` Heming Zhao via Ocfs2-devel
@ 2023-02-15  6:42             ` Joseph Qi via Ocfs2-devel
  2023-02-15  7:29               ` Heming Zhao via Ocfs2-devel
  2023-02-15 10:02               ` Heming Zhao via Ocfs2-devel
  0 siblings, 2 replies; 12+ messages in thread
From: Joseph Qi via Ocfs2-devel @ 2023-02-15  6:42 UTC (permalink / raw)
  To: Heming Zhao, ocfs2-devel; +Cc: jack



On 2/15/23 2:20 PM, Heming Zhao wrote:
> On 2/15/23 10:06 AM, Joseph Qi wrote:
>>
>>
>> On 2/14/23 7:48 PM, Heming Zhao wrote:
>>> On Tue, Feb 14, 2023 at 07:08:16PM +0800, Joseph Qi wrote:
>>>> Hi,
>>>>
>>>> On 2/14/23 12:33 PM, Heming Zhao wrote:
>>>>> On Tue, Feb 14, 2023 at 10:52:30AM +0800, Joseph Qi wrote:
>>>>>> Hi, Sorry about the late reply.
>>>>>> This thread is indeed a long time ago:(
>>>>>> It seems that I said the two ocfs2_journal_access_di() are for different
>>>>>> buffer head. Anyway, I have to recall the discussion before and get back
>>>>>> to you.
>>>>>>
>>>>> If you belive from [a] to [b.1] doesn't need any journal protection, my patch
>>>>> makes sense from theory.
>>>>>
>>>>>  From code logic, if the defrag path (ocfs2_split_extent) doesn't call
>>>>> jbd2_journal_restart(), current code is absolute correct. Increasing credits
>>>>> can't help avoid jbd2 assert crash, but make more easily to trigger crash,
>>>>> because it makes jbd2_journal_extend() more easily to return "status > 0". In my
>>>>> view, the correct fix should delete the access/dirty pair and make sure the
>>>>> ocfs2_split_extent() is journal self-service.
>>>>>
>>>>> Different buffer head issue could be handled by rechecking ocfs2_split_extent(),
>>>>> make sure all journal handle branches are correct. I already checked
>>>>> ocfs2_split_extent, but I can't assure my eyes didn't miss any corner.
>>>>
>>>> ocfs2_split_extent are not just 'read' operations, it may grow the tree,
>>>> that's why we have to do a transaction here.
>>>
>>> You misunderstand my meaning, my mean: from the caller __ocfs2_move_extent()
>>> calling ocfs2_journal_access_di() (below [a]) to (before) the handling
>>> ctxt.c_contig_type (below [b.1]) in ocfs2_split_extent, this area is purely read
>>> operations. It's the range [ [a], [b.1]) (include [a], not include [b.1])
>>>
>>> __ocfs2_move_extent
>>>   + ocfs2_journal_access_di //[a]   <------+
>>>   + ocfs2_split_extent      //[b]          | [a, b.1) area is read operations
>>>   |  + if                   //[b.1] <------+
>>>   |  |  A
>>>   |  |  +---- from this line, this func starts r/w IOs & needs jbd2 operations
>>>   |  |
>>>   |  |   ocfs2_replace_extent_rec()/ocfs2_split_and_insert()
>>>   |  + else
>>>   |      ocfs2_try_to_merge_extent ()
>>>   |
>>>   + ocfs2_journal_dirty     //[c]
>>>
>>>>
>>>> The whole code flow:
>>>> ocfs2_ioctl_move_extents
>>>>    ocfs2_move_extents
>>>>      __ocfs2_move_extents_range
>>>>        ocfs2_defrag_extent  [1]
>>>>          ocfs2_start_trans  [a]
>>>>          __ocfs2_move_extent
>>>>            ocfs2_journal_access_di
>>>>            ocfs2_split_extent
>>>>            ocfs2_journal_dirty
>>>>          ocfs2_commit_trans
>>>>        ocfs2_move_extent  [2]
>>>>          ocfs2_start_trans
>>>>          __ocfs2_move_extent
>>>>            ocfs2_journal_access_di
>>>>            ocfs2_split_extent
>>>>            ocfs2_journal_dirty
>>>>          ocfs2_commit_trans
>>>>      ocfs2_start_trans  [b]
>>>>      ocfs2_journal_access_di
>>>>      ocfs2_journal_dirty
>>>>      ocfs2_commit_trans
>>>>
>>>> In above, [a] and [b] are different transaction start/commit pair, and
>>>> each allocates different journal credits, even they are operate the same
>>>> bh, they are still different operations.
>>>
>>> I agree above writing, but it can't change my conclusion: ocfs2_split_extent()
>>> is journal self-service.
>>
>> Don't understand what does 'journal self-service' mean.
> 
> Sorry for my English skill, I invent this word 'journal self-service'.
> The meaning is: this function could handle journal operations totally by itself.
> Caller doesn't need to call access/dirty pair. Caller only needs to call jbd2
> journal start/stop pair.
> 
> Because ocfs2_start_trans() & ocfs2_commit_trans are reenterable function,
> we even could add these two func in ocfs2_split_extent(). Then any caller won't
> take care of any journal operations when calling ocfs2_split_extent().
> 
>>
>>> It seems the author meaning: he wanted [a] to protect defragging actions (extents
>>> changes), wanted [b] to protect inode ctime changes. the
>>> ocfs2_start_trans/ocfs2_commit_trans in [a] area is necessary. but the
>>> access/dirty pair in __ocfs2_move_extent() is potential crash risk.
>>>
>> If so, what is transaction [a] protect? (no dirty buffer)
> 
> Transaction [a] protects the dirty buffers in ocfs2_split_extent().
> ocfs2_split_extent() has already correctly used access/dirty pair to
> mark/handle dirty buffers.
> 
Okay, I think I've gotten your idea now. You change looks reasonable.
Could you please also check if move extents without auto defrag also has
the same issue?

Thanks,
Joseph


_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Ocfs2-devel] discuss about jbd2 assertion in defragment path
  2023-02-15  6:42             ` Joseph Qi via Ocfs2-devel
@ 2023-02-15  7:29               ` Heming Zhao via Ocfs2-devel
  2023-02-15 10:02               ` Heming Zhao via Ocfs2-devel
  1 sibling, 0 replies; 12+ messages in thread
From: Heming Zhao via Ocfs2-devel @ 2023-02-15  7:29 UTC (permalink / raw)
  To: Joseph Qi, ocfs2-devel; +Cc: jack

On 2/15/23 2:42 PM, Joseph Qi wrote:
> 
> 
> On 2/15/23 2:20 PM, Heming Zhao wrote:
>> On 2/15/23 10:06 AM, Joseph Qi wrote:
>>>
>>>
>>> On 2/14/23 7:48 PM, Heming Zhao wrote:
>>>> On Tue, Feb 14, 2023 at 07:08:16PM +0800, Joseph Qi wrote:
>>>>> Hi,
>>>>>
>>>>> On 2/14/23 12:33 PM, Heming Zhao wrote:
>>>>>> On Tue, Feb 14, 2023 at 10:52:30AM +0800, Joseph Qi wrote:
>>>>>>> Hi, Sorry about the late reply.
>>>>>>> This thread is indeed a long time ago:(
>>>>>>> It seems that I said the two ocfs2_journal_access_di() are for different
>>>>>>> buffer head. Anyway, I have to recall the discussion before and get back
>>>>>>> to you.
>>>>>>>
>>>>>> If you belive from [a] to [b.1] doesn't need any journal protection, my patch
>>>>>> makes sense from theory.
>>>>>>
>>>>>>   From code logic, if the defrag path (ocfs2_split_extent) doesn't call
>>>>>> jbd2_journal_restart(), current code is absolute correct. Increasing credits
>>>>>> can't help avoid jbd2 assert crash, but make more easily to trigger crash,
>>>>>> because it makes jbd2_journal_extend() more easily to return "status > 0". In my
>>>>>> view, the correct fix should delete the access/dirty pair and make sure the
>>>>>> ocfs2_split_extent() is journal self-service.
>>>>>>
>>>>>> Different buffer head issue could be handled by rechecking ocfs2_split_extent(),
>>>>>> make sure all journal handle branches are correct. I already checked
>>>>>> ocfs2_split_extent, but I can't assure my eyes didn't miss any corner.
>>>>>
>>>>> ocfs2_split_extent are not just 'read' operations, it may grow the tree,
>>>>> that's why we have to do a transaction here.
>>>>
>>>> You misunderstand my meaning, my mean: from the caller __ocfs2_move_extent()
>>>> calling ocfs2_journal_access_di() (below [a]) to (before) the handling
>>>> ctxt.c_contig_type (below [b.1]) in ocfs2_split_extent, this area is purely read
>>>> operations. It's the range [ [a], [b.1]) (include [a], not include [b.1])
>>>>
>>>> __ocfs2_move_extent
>>>>    + ocfs2_journal_access_di //[a]   <------+
>>>>    + ocfs2_split_extent      //[b]          | [a, b.1) area is read operations
>>>>    |  + if                   //[b.1] <------+
>>>>    |  |  A
>>>>    |  |  +---- from this line, this func starts r/w IOs & needs jbd2 operations
>>>>    |  |
>>>>    |  |   ocfs2_replace_extent_rec()/ocfs2_split_and_insert()
>>>>    |  + else
>>>>    |      ocfs2_try_to_merge_extent ()
>>>>    |
>>>>    + ocfs2_journal_dirty     //[c]
>>>>
>>>>>
>>>>> The whole code flow:
>>>>> ocfs2_ioctl_move_extents
>>>>>     ocfs2_move_extents
>>>>>       __ocfs2_move_extents_range
>>>>>         ocfs2_defrag_extent  [1]
>>>>>           ocfs2_start_trans  [a]
>>>>>           __ocfs2_move_extent
>>>>>             ocfs2_journal_access_di
>>>>>             ocfs2_split_extent
>>>>>             ocfs2_journal_dirty
>>>>>           ocfs2_commit_trans
>>>>>         ocfs2_move_extent  [2]
>>>>>           ocfs2_start_trans
>>>>>           __ocfs2_move_extent
>>>>>             ocfs2_journal_access_di
>>>>>             ocfs2_split_extent
>>>>>             ocfs2_journal_dirty
>>>>>           ocfs2_commit_trans
>>>>>       ocfs2_start_trans  [b]
>>>>>       ocfs2_journal_access_di
>>>>>       ocfs2_journal_dirty
>>>>>       ocfs2_commit_trans
>>>>>
>>>>> In above, [a] and [b] are different transaction start/commit pair, and
>>>>> each allocates different journal credits, even they are operate the same
>>>>> bh, they are still different operations.
>>>>
>>>> I agree above writing, but it can't change my conclusion: ocfs2_split_extent()
>>>> is journal self-service.
>>>
>>> Don't understand what does 'journal self-service' mean.
>>
>> Sorry for my English skill, I invent this word 'journal self-service'.
>> The meaning is: this function could handle journal operations totally by itself.
>> Caller doesn't need to call access/dirty pair. Caller only needs to call jbd2
>> journal start/stop pair.
>>
>> Because ocfs2_start_trans() & ocfs2_commit_trans are reenterable function,
>> we even could add these two func in ocfs2_split_extent(). Then any caller won't
>> take care of any journal operations when calling ocfs2_split_extent().
>>
>>>
>>>> It seems the author meaning: he wanted [a] to protect defragging actions (extents
>>>> changes), wanted [b] to protect inode ctime changes. the
>>>> ocfs2_start_trans/ocfs2_commit_trans in [a] area is necessary. but the
>>>> access/dirty pair in __ocfs2_move_extent() is potential crash risk.
>>>>
>>> If so, what is transaction [a] protect? (no dirty buffer)
>>
>> Transaction [a] protects the dirty buffers in ocfs2_split_extent().
>> ocfs2_split_extent() has already correctly used access/dirty pair to
>> mark/handle dirty buffers.
>>
> Okay, I think I've gotten your idea now. You change looks reasonable.
> Could you please also check if move extents without auto defrag also has
> the same issue?

OK, wait me for checking.

Thanks,
Heming

> 
> Thanks,
> Joseph
> 


_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Ocfs2-devel] discuss about jbd2 assertion in defragment path
  2023-02-15  6:42             ` Joseph Qi via Ocfs2-devel
  2023-02-15  7:29               ` Heming Zhao via Ocfs2-devel
@ 2023-02-15 10:02               ` Heming Zhao via Ocfs2-devel
  2023-02-15 12:04                 ` Joseph Qi via Ocfs2-devel
  1 sibling, 1 reply; 12+ messages in thread
From: Heming Zhao via Ocfs2-devel @ 2023-02-15 10:02 UTC (permalink / raw)
  To: Joseph Qi, ocfs2-devel; +Cc: jack

On 2/15/23 2:42 PM, Joseph Qi wrote:
> 
> 
> On 2/15/23 2:20 PM, Heming Zhao wrote:
>> On 2/15/23 10:06 AM, Joseph Qi wrote:
>>>
>>>
>>> On 2/14/23 7:48 PM, Heming Zhao wrote:
>>>> On Tue, Feb 14, 2023 at 07:08:16PM +0800, Joseph Qi wrote:
>>>>> Hi,
>>>>>
>>>>> On 2/14/23 12:33 PM, Heming Zhao wrote:
>>>>>> On Tue, Feb 14, 2023 at 10:52:30AM +0800, Joseph Qi wrote:
>>>>>>> Hi, Sorry about the late reply.
>>>>>>> This thread is indeed a long time ago:(
>>>>>>> It seems that I said the two ocfs2_journal_access_di() are for different
>>>>>>> buffer head. Anyway, I have to recall the discussion before and get back
>>>>>>> to you.
>>>>>>>
>>>>>> If you belive from [a] to [b.1] doesn't need any journal protection, my patch
>>>>>> makes sense from theory.
>>>>>>
>>>>>>   From code logic, if the defrag path (ocfs2_split_extent) doesn't call
>>>>>> jbd2_journal_restart(), current code is absolute correct. Increasing credits
>>>>>> can't help avoid jbd2 assert crash, but make more easily to trigger crash,
>>>>>> because it makes jbd2_journal_extend() more easily to return "status > 0". In my
>>>>>> view, the correct fix should delete the access/dirty pair and make sure the
>>>>>> ocfs2_split_extent() is journal self-service.
>>>>>>
>>>>>> Different buffer head issue could be handled by rechecking ocfs2_split_extent(),
>>>>>> make sure all journal handle branches are correct. I already checked
>>>>>> ocfs2_split_extent, but I can't assure my eyes didn't miss any corner.
>>>>>
>>>>> ocfs2_split_extent are not just 'read' operations, it may grow the tree,
>>>>> that's why we have to do a transaction here.
>>>>
>>>> You misunderstand my meaning, my mean: from the caller __ocfs2_move_extent()
>>>> calling ocfs2_journal_access_di() (below [a]) to (before) the handling
>>>> ctxt.c_contig_type (below [b.1]) in ocfs2_split_extent, this area is purely read
>>>> operations. It's the range [ [a], [b.1]) (include [a], not include [b.1])
>>>>
>>>> __ocfs2_move_extent
>>>>    + ocfs2_journal_access_di //[a]   <------+
>>>>    + ocfs2_split_extent      //[b]          | [a, b.1) area is read operations
>>>>    |  + if                   //[b.1] <------+
>>>>    |  |  A
>>>>    |  |  +---- from this line, this func starts r/w IOs & needs jbd2 operations
>>>>    |  |
>>>>    |  |   ocfs2_replace_extent_rec()/ocfs2_split_and_insert()
>>>>    |  + else
>>>>    |      ocfs2_try_to_merge_extent ()
>>>>    |
>>>>    + ocfs2_journal_dirty     //[c]
>>>>
>>>>>
>>>>> The whole code flow:
>>>>> ocfs2_ioctl_move_extents
>>>>>     ocfs2_move_extents
>>>>>       __ocfs2_move_extents_range
>>>>>         ocfs2_defrag_extent  [1]
>>>>>           ocfs2_start_trans  [a]
>>>>>           __ocfs2_move_extent
>>>>>             ocfs2_journal_access_di
>>>>>             ocfs2_split_extent
>>>>>             ocfs2_journal_dirty
>>>>>           ocfs2_commit_trans
>>>>>         ocfs2_move_extent  [2]
>>>>>           ocfs2_start_trans
>>>>>           __ocfs2_move_extent
>>>>>             ocfs2_journal_access_di
>>>>>             ocfs2_split_extent
>>>>>             ocfs2_journal_dirty
>>>>>           ocfs2_commit_trans
>>>>>       ocfs2_start_trans  [b]
>>>>>       ocfs2_journal_access_di
>>>>>       ocfs2_journal_dirty
>>>>>       ocfs2_commit_trans
>>>>>
>>>>> In above, [a] and [b] are different transaction start/commit pair, and
>>>>> each allocates different journal credits, even they are operate the same
>>>>> bh, they are still different operations.
>>>>
>>>> I agree above writing, but it can't change my conclusion: ocfs2_split_extent()
>>>> is journal self-service.
>>>
>>> Don't understand what does 'journal self-service' mean.
>>
>> Sorry for my English skill, I invent this word 'journal self-service'.
>> The meaning is: this function could handle journal operations totally by itself.
>> Caller doesn't need to call access/dirty pair. Caller only needs to call jbd2
>> journal start/stop pair.
>>
>> Because ocfs2_start_trans() & ocfs2_commit_trans are reenterable function,
>> we even could add these two func in ocfs2_split_extent(). Then any caller won't
>> take care of any journal operations when calling ocfs2_split_extent().
>>
>>>
>>>> It seems the author meaning: he wanted [a] to protect defragging actions (extents
>>>> changes), wanted [b] to protect inode ctime changes. the
>>>> ocfs2_start_trans/ocfs2_commit_trans in [a] area is necessary. but the
>>>> access/dirty pair in __ocfs2_move_extent() is potential crash risk.
>>>>
>>> If so, what is transaction [a] protect? (no dirty buffer)
>>
>> Transaction [a] protects the dirty buffers in ocfs2_split_extent().
>> ocfs2_split_extent() has already correctly used access/dirty pair to
>> mark/handle dirty buffers.
>>
> Okay, I think I've gotten your idea now. You change looks reasonable.
> Could you please also check if move extents without auto defrag also has
> the same issue?
> 

I have finished code checking. ocfs2_move_extent() path has the same issue.
My patch also works fine from this code branch.
In theory, ocfs2_move_extent() has less opportunity to trigger crash, because
userspace defrag.ocfs2 very likely set less than file size on
'struct ocfs2_move_extents : me_len', if defrag.ocfs2 would support this
code branch.

If you think my patch is acceptable, I will file v2 patch & plan only to change
commit log in v2.

Thanks,
Heming

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Ocfs2-devel] discuss about jbd2 assertion in defragment path
  2023-02-15 10:02               ` Heming Zhao via Ocfs2-devel
@ 2023-02-15 12:04                 ` Joseph Qi via Ocfs2-devel
  2023-02-16 14:58                   ` Heming Zhao via Ocfs2-devel
  0 siblings, 1 reply; 12+ messages in thread
From: Joseph Qi via Ocfs2-devel @ 2023-02-15 12:04 UTC (permalink / raw)
  To: Heming Zhao, ocfs2-devel; +Cc: jack



On 2/15/23 6:02 PM, Heming Zhao wrote:
> On 2/15/23 2:42 PM, Joseph Qi wrote:
>>
>>
>> On 2/15/23 2:20 PM, Heming Zhao wrote:
>>> On 2/15/23 10:06 AM, Joseph Qi wrote:
>>>>
>>>>
>>>> On 2/14/23 7:48 PM, Heming Zhao wrote:
>>>>> On Tue, Feb 14, 2023 at 07:08:16PM +0800, Joseph Qi wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 2/14/23 12:33 PM, Heming Zhao wrote:
>>>>>>> On Tue, Feb 14, 2023 at 10:52:30AM +0800, Joseph Qi wrote:
>>>>>>>> Hi, Sorry about the late reply.
>>>>>>>> This thread is indeed a long time ago:(
>>>>>>>> It seems that I said the two ocfs2_journal_access_di() are for different
>>>>>>>> buffer head. Anyway, I have to recall the discussion before and get back
>>>>>>>> to you.
>>>>>>>>
>>>>>>> If you belive from [a] to [b.1] doesn't need any journal protection, my patch
>>>>>>> makes sense from theory.
>>>>>>>
>>>>>>>   From code logic, if the defrag path (ocfs2_split_extent) doesn't call
>>>>>>> jbd2_journal_restart(), current code is absolute correct. Increasing credits
>>>>>>> can't help avoid jbd2 assert crash, but make more easily to trigger crash,
>>>>>>> because it makes jbd2_journal_extend() more easily to return "status > 0". In my
>>>>>>> view, the correct fix should delete the access/dirty pair and make sure the
>>>>>>> ocfs2_split_extent() is journal self-service.
>>>>>>>
>>>>>>> Different buffer head issue could be handled by rechecking ocfs2_split_extent(),
>>>>>>> make sure all journal handle branches are correct. I already checked
>>>>>>> ocfs2_split_extent, but I can't assure my eyes didn't miss any corner.
>>>>>>
>>>>>> ocfs2_split_extent are not just 'read' operations, it may grow the tree,
>>>>>> that's why we have to do a transaction here.
>>>>>
>>>>> You misunderstand my meaning, my mean: from the caller __ocfs2_move_extent()
>>>>> calling ocfs2_journal_access_di() (below [a]) to (before) the handling
>>>>> ctxt.c_contig_type (below [b.1]) in ocfs2_split_extent, this area is purely read
>>>>> operations. It's the range [ [a], [b.1]) (include [a], not include [b.1])
>>>>>
>>>>> __ocfs2_move_extent
>>>>>    + ocfs2_journal_access_di //[a]   <------+
>>>>>    + ocfs2_split_extent      //[b]          | [a, b.1) area is read operations
>>>>>    |  + if                   //[b.1] <------+
>>>>>    |  |  A
>>>>>    |  |  +---- from this line, this func starts r/w IOs & needs jbd2 operations
>>>>>    |  |
>>>>>    |  |   ocfs2_replace_extent_rec()/ocfs2_split_and_insert()
>>>>>    |  + else
>>>>>    |      ocfs2_try_to_merge_extent ()
>>>>>    |
>>>>>    + ocfs2_journal_dirty     //[c]
>>>>>
>>>>>>
>>>>>> The whole code flow:
>>>>>> ocfs2_ioctl_move_extents
>>>>>>     ocfs2_move_extents
>>>>>>       __ocfs2_move_extents_range
>>>>>>         ocfs2_defrag_extent  [1]
>>>>>>           ocfs2_start_trans  [a]
>>>>>>           __ocfs2_move_extent
>>>>>>             ocfs2_journal_access_di
>>>>>>             ocfs2_split_extent
>>>>>>             ocfs2_journal_dirty
>>>>>>           ocfs2_commit_trans
>>>>>>         ocfs2_move_extent  [2]
>>>>>>           ocfs2_start_trans
>>>>>>           __ocfs2_move_extent
>>>>>>             ocfs2_journal_access_di
>>>>>>             ocfs2_split_extent
>>>>>>             ocfs2_journal_dirty
>>>>>>           ocfs2_commit_trans
>>>>>>       ocfs2_start_trans  [b]
>>>>>>       ocfs2_journal_access_di
>>>>>>       ocfs2_journal_dirty
>>>>>>       ocfs2_commit_trans
>>>>>>
>>>>>> In above, [a] and [b] are different transaction start/commit pair, and
>>>>>> each allocates different journal credits, even they are operate the same
>>>>>> bh, they are still different operations.
>>>>>
>>>>> I agree above writing, but it can't change my conclusion: ocfs2_split_extent()
>>>>> is journal self-service.
>>>>
>>>> Don't understand what does 'journal self-service' mean.
>>>
>>> Sorry for my English skill, I invent this word 'journal self-service'.
>>> The meaning is: this function could handle journal operations totally by itself.
>>> Caller doesn't need to call access/dirty pair. Caller only needs to call jbd2
>>> journal start/stop pair.
>>>
>>> Because ocfs2_start_trans() & ocfs2_commit_trans are reenterable function,
>>> we even could add these two func in ocfs2_split_extent(). Then any caller won't
>>> take care of any journal operations when calling ocfs2_split_extent().
>>>
>>>>
>>>>> It seems the author meaning: he wanted [a] to protect defragging actions (extents
>>>>> changes), wanted [b] to protect inode ctime changes. the
>>>>> ocfs2_start_trans/ocfs2_commit_trans in [a] area is necessary. but the
>>>>> access/dirty pair in __ocfs2_move_extent() is potential crash risk.
>>>>>
>>>> If so, what is transaction [a] protect? (no dirty buffer)
>>>
>>> Transaction [a] protects the dirty buffers in ocfs2_split_extent().
>>> ocfs2_split_extent() has already correctly used access/dirty pair to
>>> mark/handle dirty buffers.
>>>
>> Okay, I think I've gotten your idea now. You change looks reasonable.
>> Could you please also check if move extents without auto defrag also has
>> the same issue?
>>
> 
> I have finished code checking. ocfs2_move_extent() path has the same issue.
> My patch also works fine from this code branch.
> In theory, ocfs2_move_extent() has less opportunity to trigger crash, because
> userspace defrag.ocfs2 very likely set less than file size on
> 'struct ocfs2_move_extents : me_len', if defrag.ocfs2 would support this
> code branch.
> 

If we write a simple program to do move extents ioctl on the same ocfs2
(without auto defrag), can we trigger this crash?

> If you think my patch is acceptable, I will file v2 patch & plan only to change
> commit log in v2.
> 
Sure, with above confirmed.

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Ocfs2-devel] discuss about jbd2 assertion in defragment path
  2023-02-15 12:04                 ` Joseph Qi via Ocfs2-devel
@ 2023-02-16 14:58                   ` Heming Zhao via Ocfs2-devel
  0 siblings, 0 replies; 12+ messages in thread
From: Heming Zhao via Ocfs2-devel @ 2023-02-16 14:58 UTC (permalink / raw)
  To: Joseph Qi, ocfs2-devel; +Cc: jack

On Wed, Feb 15, 2023 at 08:04:48PM +0800, Joseph Qi wrote:
> 
> 
> On 2/15/23 6:02 PM, Heming Zhao wrote:
> > On 2/15/23 2:42 PM, Joseph Qi wrote:
> >>
> >>
> >> On 2/15/23 2:20 PM, Heming Zhao wrote:
> >>> On 2/15/23 10:06 AM, Joseph Qi wrote:
> >>>>
> >>>>
> >>>> On 2/14/23 7:48 PM, Heming Zhao wrote:
> >>>>> On Tue, Feb 14, 2023 at 07:08:16PM +0800, Joseph Qi wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> On 2/14/23 12:33 PM, Heming Zhao wrote:
> >>>>>>> On Tue, Feb 14, 2023 at 10:52:30AM +0800, Joseph Qi wrote:
> >>>>>>>> Hi, Sorry about the late reply.
> >>>>>>>> This thread is indeed a long time ago:(
> >>>>>>>> It seems that I said the two ocfs2_journal_access_di() are for different
> >>>>>>>> buffer head. Anyway, I have to recall the discussion before and get back
> >>>>>>>> to you.
> >>>>>>>>
> >>>>>>> If you belive from [a] to [b.1] doesn't need any journal protection, my patch
> >>>>>>> makes sense from theory.
> >>>>>>>
> >>>>>>>   From code logic, if the defrag path (ocfs2_split_extent) doesn't call
> >>>>>>> jbd2_journal_restart(), current code is absolute correct. Increasing credits
> >>>>>>> can't help avoid jbd2 assert crash, but make more easily to trigger crash,
> >>>>>>> because it makes jbd2_journal_extend() more easily to return "status > 0". In my
> >>>>>>> view, the correct fix should delete the access/dirty pair and make sure the
> >>>>>>> ocfs2_split_extent() is journal self-service.
> >>>>>>>
> >>>>>>> Different buffer head issue could be handled by rechecking ocfs2_split_extent(),
> >>>>>>> make sure all journal handle branches are correct. I already checked
> >>>>>>> ocfs2_split_extent, but I can't assure my eyes didn't miss any corner.
> >>>>>>
> >>>>>> ocfs2_split_extent are not just 'read' operations, it may grow the tree,
> >>>>>> that's why we have to do a transaction here.
> >>>>>
> >>>>> You misunderstand my meaning, my mean: from the caller __ocfs2_move_extent()
> >>>>> calling ocfs2_journal_access_di() (below [a]) to (before) the handling
> >>>>> ctxt.c_contig_type (below [b.1]) in ocfs2_split_extent, this area is purely read
> >>>>> operations. It's the range [ [a], [b.1]) (include [a], not include [b.1])
> >>>>>
> >>>>> __ocfs2_move_extent
> >>>>>    + ocfs2_journal_access_di //[a]   <------+
> >>>>>    + ocfs2_split_extent      //[b]          | [a, b.1) area is read operations
> >>>>>    |  + if                   //[b.1] <------+
> >>>>>    |  |  A
> >>>>>    |  |  +---- from this line, this func starts r/w IOs & needs jbd2 operations
> >>>>>    |  |
> >>>>>    |  |   ocfs2_replace_extent_rec()/ocfs2_split_and_insert()
> >>>>>    |  + else
> >>>>>    |      ocfs2_try_to_merge_extent ()
> >>>>>    |
> >>>>>    + ocfs2_journal_dirty     //[c]
> >>>>>
> >>>>>>
> >>>>>> The whole code flow:
> >>>>>> ocfs2_ioctl_move_extents
> >>>>>>     ocfs2_move_extents
> >>>>>>       __ocfs2_move_extents_range
> >>>>>>         ocfs2_defrag_extent  [1]
> >>>>>>           ocfs2_start_trans  [a]
> >>>>>>           __ocfs2_move_extent
> >>>>>>             ocfs2_journal_access_di
> >>>>>>             ocfs2_split_extent
> >>>>>>             ocfs2_journal_dirty
> >>>>>>           ocfs2_commit_trans
> >>>>>>         ocfs2_move_extent  [2]
> >>>>>>           ocfs2_start_trans
> >>>>>>           __ocfs2_move_extent
> >>>>>>             ocfs2_journal_access_di
> >>>>>>             ocfs2_split_extent
> >>>>>>             ocfs2_journal_dirty
> >>>>>>           ocfs2_commit_trans
> >>>>>>       ocfs2_start_trans  [b]
> >>>>>>       ocfs2_journal_access_di
> >>>>>>       ocfs2_journal_dirty
> >>>>>>       ocfs2_commit_trans
> >>>>>>
> >>>>>> In above, [a] and [b] are different transaction start/commit pair, and
> >>>>>> each allocates different journal credits, even they are operate the same
> >>>>>> bh, they are still different operations.
> >>>>>
> >>>>> I agree above writing, but it can't change my conclusion: ocfs2_split_extent()
> >>>>> is journal self-service.
> >>>>
> >>>> Don't understand what does 'journal self-service' mean.
> >>>
> >>> Sorry for my English skill, I invent this word 'journal self-service'.
> >>> The meaning is: this function could handle journal operations totally by itself.
> >>> Caller doesn't need to call access/dirty pair. Caller only needs to call jbd2
> >>> journal start/stop pair.
> >>>
> >>> Because ocfs2_start_trans() & ocfs2_commit_trans are reenterable function,
> >>> we even could add these two func in ocfs2_split_extent(). Then any caller won't
> >>> take care of any journal operations when calling ocfs2_split_extent().
> >>>
> >>>>
> >>>>> It seems the author meaning: he wanted [a] to protect defragging actions (extents
> >>>>> changes), wanted [b] to protect inode ctime changes. the
> >>>>> ocfs2_start_trans/ocfs2_commit_trans in [a] area is necessary. but the
> >>>>> access/dirty pair in __ocfs2_move_extent() is potential crash risk.
> >>>>>
> >>>> If so, what is transaction [a] protect? (no dirty buffer)
> >>>
> >>> Transaction [a] protects the dirty buffers in ocfs2_split_extent().
> >>> ocfs2_split_extent() has already correctly used access/dirty pair to
> >>> mark/handle dirty buffers.
> >>>
> >> Okay, I think I've gotten your idea now. You change looks reasonable.
> >> Could you please also check if move extents without auto defrag also has
> >> the same issue?
> >>
> > 
> > I have finished code checking. ocfs2_move_extent() path has the same issue.
> > My patch also works fine from this code branch.
> > In theory, ocfs2_move_extent() has less opportunity to trigger crash, because
> > userspace defrag.ocfs2 very likely set less than file size on
> > 'struct ocfs2_move_extents : me_len', if defrag.ocfs2 would support this
> > code branch.
> > 
> 
> If we write a simple program to do move extents ioctl on the same ocfs2
> (without auto defrag), can we trigger this crash?

For this question, first, I modified defrag.ocfs2 to test non-auto defrag flow,
and found three bugs. I will describe it at the end of this mail.

Second, I failed to trigger the crash with non-auto defrag flow.

In theory, the non-auto & auto share the same defrag path from __ocfs2_move_extent().
Before I send mail to start this dicussion, I had tested some days, and failed
to trigger the crash. This crash, happened last year, one of SUSE customer
triggered it. But for confidence, I can't talk too much about it.

Under theory analysis, the crash derives from ocfs2_extend_trans() calling
jbd2_journal_restart() and getting return value "status > 0".

In jbd2_journal_extend(), there are only two cases (A & B) for return "value > 0".

```<A>
/* Don't extend a locked-down transaction! */
if (transaction->t_state != T_RUNNING) {
    jbd2_debug(3, "denied handle %p %d blocks: "
          "transaction not running\n", handle, nblocks);
    goto error_out;
}
```

```<B>
if (wanted > journal->j_max_transaction_buffers) {
    jbd2_debug(3, "denied handle %p %d blocks: "
          "transaction too large\n", handle, nblocks);
    atomic_sub(nblocks, &transaction->t_outstanding_credits);
    goto error_out;
}
```

For case <A>, in my view, if the transaction state is not T_RUNNING, which should
be committing or committed by jbd2. So I did:
- generated depth 3 B+tree
- modified defragfs.ocfs2 with/without OCFS2_MOVE_EXT_FL_PART_DEFRAG.
- mount options using commit_interval=[1,3]
- enable some mlog(0) to output log for slowing down ocfs2 layer defrag speed.


For case <B>, I did
- generated depth 3 B+tree
- modified defragfs.ocfs2 with/without OCFS2_MOVE_EXT_FL_PART_DEFRAG.
- mount options using: resv_level=[0,2]  commit_interval=[1,3,5,30]
- running defragfs, meanwhile, running dd to issue write IOs.

Both two cases with lots of times test failed to trigger crash. I only met
ocfs2 ininite loop issue: ocfs2 kept busy all the time, the defrag job is running
all the time.


At last, I describe the three bugs for non-auto defrag flow.
The fixing code as below:

```
diff --git a/fs/ocfs2/move_extents.c b/fs/ocfs2/move_extents.c
index 192cad0662d8..19693894aeec 100644
--- a/fs/ocfs2/move_extents.c
+++ b/fs/ocfs2/move_extents.c
@@ -444,7 +444,7 @@ static int ocfs2_find_victim_alloc_group(struct inode *inode,
 			bg = (struct ocfs2_group_desc *)gd_bh->b_data;

 			if (vict_blkno < (le64_to_cpu(bg->bg_blkno) +
-						le16_to_cpu(bg->bg_bits))) {
+						(le16_to_cpu(bg->bg_bits) << bits_per_unit))) {

 				*ret_bh = gd_bh;
 				*vict_bit = (vict_blkno - blkno) >>
@@ -559,6 +559,7 @@ static void ocfs2_probe_alloc_group(struct inode *inode, struct buffer_head *bh,
 			last_free_bits++;

 		if (last_free_bits == move_len) {
+			i -= move_len;
 			*goal_bit = i;
 			*phys_cpos = base_cpos + i;
 			break;
@@ -1030,18 +1031,19 @@ int ocfs2_ioctl_move_extents(struct file *filp, void __user *argp)

 	context->range = &range;

+	/*
+	 * ok, the default theshold for the defragmentation
+	 * is 1M, since our maximum clustersize was 1M also.
+	 * any thought?
+	 */
+	if (!range.me_threshold)
+		range.me_threshold = 1024 * 1024;
+
+	if (range.me_threshold > i_size_read(inode))
+		range.me_threshold = i_size_read(inode);
+
 	if (range.me_flags & OCFS2_MOVE_EXT_FL_AUTO_DEFRAG) {
 		context->auto_defrag = 1;
-		/*
-		 * ok, the default theshold for the defragmentation
-		 * is 1M, since our maximum clustersize was 1M also.
-		 * any thought?
-		 */
-		if (!range.me_threshold)
-			range.me_threshold = 1024 * 1024;
-
-		if (range.me_threshold > i_size_read(inode))
-			range.me_threshold = i_size_read(inode);

 		if (range.me_flags & OCFS2_MOVE_EXT_FL_PART_DEFRAG)
 			context->partial = 1;
```

I will file a new patch for above fix.

Thanks,
Heming

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-02-16 14:58 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-10 10:04 [Ocfs2-devel] discuss about jbd2 assertion in defragment path Heming Zhao via Ocfs2-devel
2023-02-14  2:52 ` Joseph Qi via Ocfs2-devel
2023-02-14  4:33   ` Heming Zhao via Ocfs2-devel
2023-02-14 11:08     ` Joseph Qi via Ocfs2-devel
2023-02-14 11:48       ` Heming Zhao via Ocfs2-devel
2023-02-15  2:06         ` Joseph Qi via Ocfs2-devel
2023-02-15  6:20           ` Heming Zhao via Ocfs2-devel
2023-02-15  6:42             ` Joseph Qi via Ocfs2-devel
2023-02-15  7:29               ` Heming Zhao via Ocfs2-devel
2023-02-15 10:02               ` Heming Zhao via Ocfs2-devel
2023-02-15 12:04                 ` Joseph Qi via Ocfs2-devel
2023-02-16 14:58                   ` Heming Zhao via Ocfs2-devel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).