* [Ocfs2-devel] Long io response time doubt
@ 2015-11-12  3:05 Joseph Qi
  2015-11-12  7:23 ` Eric Ren
  0 siblings, 1 reply; 11+ messages in thread
From: Joseph Qi @ 2015-11-12  3:05 UTC (permalink / raw)
  To: ocfs2-devel

Hi Eric,
You reported an issue where the I/O response time may sometimes be long.

From your test case information, I think it was caused by downconvert.
And that seems reasonable, because the downconvert has to happen.

Node 1 wrote the file, and node 2 read it. Since you used buffered I/O,
the data might still be in node 1's page cache after the write finished.
So node 1 has to downconvert first before node 2's read can continue.
That is why you said ocfs2_inode_lock_with_page seemed to spend most of
the time. More specifically, it was ocfs2_inode_lock, called after the
nonblocking lock attempt returned -EAGAIN.

This also explains why direct I/O didn't show the issue, though it took
more time overall.

I am not sure if your test case is the same as what the customer
reported. I think you should recheck the operations on each node.

We have also reported a case before about a DLM handling issue. I am
not sure whether it is related.
https://oss.oracle.com/pipermail/ocfs2-devel/2015-August/011045.html


* [Ocfs2-devel] Long io response time doubt
  2015-11-12  3:05 [Ocfs2-devel] Long io response time doubt Joseph Qi
@ 2015-11-12  7:23 ` Eric Ren
  2015-11-12  8:00   ` Joseph Qi
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Ren @ 2015-11-12  7:23 UTC (permalink / raw)
  To: ocfs2-devel

Hi Joseph,

Thanks for your reply! There are more details I'd like to ask about ;-)

On 11/12/15 11:05, Joseph Qi wrote:
> Hi Eric,
> You reported an issue where the I/O response time may sometimes be long.
>
>  From your test case information, I think it was caused by downconvert.
From what I learned from fs/dlm, the lock manager grants all
down-conversion requests in place, i.e. on the grant queue. Here are
some silly questions:
1. Who may request a down-conversion?
2. When does a down-conversion happen?
3. How could a down-conversion take so long?

Could you describe this case in more detail?
> And that seems reasonable, because the downconvert has to happen.
>
> Node 1 wrote the file, and node 2 read it. Since you used buffered I/O,
> the data might still be in node 1's page cache after the write finished.
Sorry, I cannot understand the relationship between "still in page
cache" and "so ... downconvert".
> So node 1 has to downconvert first before node 2's read can continue.
> That is why you said ocfs2_inode_lock_with_page seemed to spend most of
Actually, what surprises me more is the sheer amount of time spent,
rather than it being the *most* time compared to the "readpage" stuff ;-)
> the time. More specifically, it was ocfs2_inode_lock, called after the
> nonblocking lock attempt returned -EAGAIN.
You mean the read process would repeatedly try the nonblocking lock
until the write process's down-conversion completes?
>
> This also explains why direct I/O didn't show the issue, though it took
> more time overall.
>
> I am not sure if your test case is the same as what the customer
> reported. I think you should recheck the operations on each node.
Yes, we've verified several times, both on sles10 and sles11. On sles10,
each I/O time is smooth, with no long I/O peaks.
>
> We have also reported a case before about a DLM handling issue. I am
> not sure whether it is related.
> https://oss.oracle.com/pipermail/ocfs2-devel/2015-August/011045.html
Thanks, I've read that post. I cannot see any relation yet. Actually,
fs/dlm is also implemented that way; it's the so-called "conversion
deadlock" mentioned in section 2.3.7.3 of the "Programming Locking
Applications" book.

There are only two processes from two nodes. Process A is blocked on the
wait queue because of process B in the convert queue, leaving the grant
queue empty. Is this possible?

You know I'm new here, so some questions may be improper; please point
them out if so ;-)

Thanks,
Eric


* [Ocfs2-devel] Long io response time doubt
  2015-11-12  7:23 ` Eric Ren
@ 2015-11-12  8:00   ` Joseph Qi
  2015-11-12  9:48     ` Eric Ren
  0 siblings, 1 reply; 11+ messages in thread
From: Joseph Qi @ 2015-11-12  8:00 UTC (permalink / raw)
  To: ocfs2-devel

On 2015/11/12 15:23, Eric Ren wrote:
> Hi Joseph,
> 
> Thanks for your reply! There are more details I'd like to ask about ;-)
> 
> On 11/12/15 11:05, Joseph Qi wrote:
>> Hi Eric,
>> You reported an issue where the I/O response time may sometimes be long.
>>
>>  From your test case information, I think it was caused by downconvert.
> From what I learned from fs/dlm, the lock manager grants all
> down-conversion requests in place, i.e. on the grant queue. Here are
> some silly questions:
> 1. Who may request a down-conversion?
> 2. When does a down-conversion happen?
> 3. How could a down-conversion take so long?
IMO, it happens mostly in two cases:
1. The owner knows another node is waiting on the lock; in other words,
it has blocked another node's request. This may be triggered in ast,
bast, or unlock.
2. ocfs2cmt does its periodic commit.

One case that can lead to a long downconvert is that it simply has too
much work to do. I am not sure if there are other cases, or whether it
is a code bug.
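
As a rough sketch of case 1 (hedged - simplified from fs/ocfs2/dlmglue.c
with most details omitted; ocfs2_wake_downconvert_thread() is real, the
helper name here is mine):

/* Another node's conflicting request arrives as a blocking AST, which
 * marks the lockres and wakes ocfs2dc to do the actual downconvert. */
static void blocking_ast_sketch(struct ocfs2_super *osb,
                                struct ocfs2_lock_res *lockres, int level)
{
        /* remember the level the other node needs us to drop to */
        lockres->l_blocking = level;
        lockres_or_flags(lockres, OCFS2_LOCK_BLOCKED);

        /* ocfs2dc picks the lockres up, flushes what it must, and
         * downconverts; that flush is where the time can go */
        ocfs2_wake_downconvert_thread(osb);
}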

> 
> Could you describe this case in more detail?
>> And that seems reasonable, because the downconvert has to happen.
>>
>> Node 1 wrote the file, and node 2 read it. Since you used buffered I/O,
>> the data might still be in node 1's page cache after the write finished.
> Sorry, I cannot understand the relationship between "still in page cache" and "so ... downconvert".
>> So node 1 has to downconvert first before node 2's read can continue.
>> That is why you said ocfs2_inode_lock_with_page seemed to spend most of
> Actually, what surprises me more is the sheer amount of time spent, rather than it being the *most* time compared to the "readpage" stuff ;-)
>> the time. More specifically, it was ocfs2_inode_lock, called after the
>> nonblocking lock attempt returned -EAGAIN.
> You mean the read process would repeatedly try the nonblocking lock until the write process's down-conversion completes?
No. After the nonblocking lock returns -EAGAIN, it will unlock the page
and then call ocfs2_inode_lock and ocfs2_inode_unlock. ocfs2_inode_lock
will wait until the downconvert completes on the other node.
This handles a lock inversion case. You can refer to the comments of
ocfs2_inode_lock_with_page.

>>
>> This also explains why direct I/O didn't show the issue, though it took
>> more time overall.
>>
>> I am not sure if your test case is the same as what the customer
>> reported. I think you should recheck the operations on each node.
> Yes, we've verified several times, both on sles10 and sles11. On sles10, each I/O time is smooth, with no long I/O peaks.
>>
>> We have also reported a case before about a DLM handling issue. I am
>> not sure whether it is related.
>> https://oss.oracle.com/pipermail/ocfs2-devel/2015-August/011045.html
> Thanks, I've read that post. I cannot see any relation yet. Actually, fs/dlm is also implemented that way; it's the so-called "conversion deadlock"
> mentioned in section 2.3.7.3 of the "Programming Locking Applications" book.
> 
> There are only two processes from two nodes. Process A is blocked on the wait queue because of process B in the convert queue, leaving the grant
> queue empty. Is this possible?
So we have to investigate why the convert request cannot be satisfied.
If dlm still works correctly, that is impossible; otherwise it is a bug.

> 
> You know I'm new here, so some questions may be improper; please point them out if so ;-)
> 
> Thanks,
> Eric
> 
> .
> 


* [Ocfs2-devel] Long io response time doubt
  2015-11-12  8:00   ` Joseph Qi
@ 2015-11-12  9:48     ` Eric Ren
  2015-11-13  3:31       ` Joseph Qi
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Ren @ 2015-11-12  9:48 UTC (permalink / raw)
  To: ocfs2-devel

Hi Joseph,

On 11/12/15 16:00, Joseph Qi wrote:
> On 2015/11/12 15:23, Eric Ren wrote:
>> Hi Joseph,
>>
>> Thanks for your reply! There are more details I'd like to ask about ;-)
>>
>> On 11/12/15 11:05, Joseph Qi wrote:
>>> Hi Eric,
>>> You reported an issue where the I/O response time may sometimes be long.
>>>
>>>   From your test case information, I think it was caused by downconvert.
>>  From what I learned from fs/dlm, the lock manager grants all
>> down-conversion requests in place, i.e. on the grant queue. Here are
>> some silly questions:
>> 1. Who may request a down-conversion?
>> 2. When does a down-conversion happen?
>> 3. How could a down-conversion take so long?
> IMO, it happens mostly in two cases:
> 1. The owner knows another node is waiting on the lock; in other words,
> it has blocked another node's request. This may be triggered in ast,
> bast, or unlock.
> 2. ocfs2cmt does its periodic commit.
>
> One case that can lead to a long downconvert is that it simply has too
> much work to do. I am not sure if there are other cases, or whether it
> is a code bug.
OK, I'm not familiar with ocfs2cmt. Could I bother you to explain what
ocfs2cmt does, its relation to reads/writes, and why a down-conversion
can be triggered when it commits?
>> Could you describe this case in more detail?
>>> And that seems reasonable, because the downconvert has to happen.
>>>
>>> Node 1 wrote the file, and node 2 read it. Since you used buffered I/O,
>>> the data might still be in node 1's page cache after the write finished.
>> Sorry, I cannot understand the relationship between "still in page cache" and "so ... downconvert".
>>> So node 1 has to downconvert first before node 2's read can continue.
>>> That is why you said ocfs2_inode_lock_with_page seemed to spend most of
>> Actually, what surprises me more is the sheer amount of time spent, rather than it being the *most* time compared to the "readpage" stuff ;-)
>>> the time. More specifically, it was ocfs2_inode_lock, called after the
>>> nonblocking lock attempt returned -EAGAIN.
>> You mean the read process would repeatedly try the nonblocking lock until the write process's down-conversion completes?
> No. After the nonblocking lock returns -EAGAIN, it will unlock the page
> and then call ocfs2_inode_lock and ocfs2_inode_unlock. ocfs2_inode_lock
Yes.
> will wait until the downconvert completes on the other node.
The other node - the one the read or the write process is on?
> This handles a lock inversion case. You can refer to the comments of
> ocfs2_inode_lock_with_page.
Yeah, actually I've read those comments again and again, but still fail
to get the idea. Could you please explain how this works? I'm really
interested ;-) Forgive me for pasting the code below; it makes it
convenient to refer to.

/*
 * This is working around a lock inversion between tasks acquiring DLM
 * locks while holding a page lock and the downconvert thread which
 * blocks dlm lock acquiry while acquiring page locks.
 *
 * ** These _with_page variantes are only intended to be called from aop
 * methods that hold page locks and return a very specific *positive* error
 * code that aop methods pass up to the VFS -- test for errors with != 0. **
 *
 * The DLM is called such that it returns -EAGAIN if it would have
 * blocked waiting for the downconvert thread.  In that case we unlock
 * our page so the downconvert thread can make progress.  Once we've
 * done this we have to return AOP_TRUNCATED_PAGE so the aop method
 * that called us can bubble that back up into the VFS who will then
 * immediately retry the aop call.
 *
 * We do a blocking lock and immediate unlock before returning, though, so that
 * the lock has a great chance of being cached on this node by the time the VFS
 * calls back to retry the aop.    This has a potential to livelock as nodes
 * ping locks back and forth, but that's a risk we're willing to take to avoid
 * the lock inversion simply.
 */
int ocfs2_inode_lock_with_page(struct inode *inode,
                               struct buffer_head **ret_bh,
                               int ex,
                               struct page *page)
{
        int ret;

        ret = ocfs2_inode_lock_full(inode, ret_bh, ex, OCFS2_LOCK_NONBLOCK);
        if (ret == -EAGAIN) {
                unlock_page(page);
                if (ocfs2_inode_lock(inode, ret_bh, ex) == 0)
                        ocfs2_inode_unlock(inode, ex);
                ret = AOP_TRUNCATED_PAGE;
        }

        return ret;
}
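
And for reference, the VFS side of that retry looks roughly like this (a
simplified fragment modeled on do_generic_file_read() in mm/filemap.c of
this era; not a verbatim copy):

/* sketch of the caller's loop: AOP_TRUNCATED_PAGE means the aop already
 * unlocked the page, so we just drop our reference and retry */
find_page:
        page = find_get_page(mapping, index);
        ...
        error = mapping->a_ops->readpage(filp, page);
        if (error) {
                if (error == AOP_TRUNCATED_PAGE) {
                        page_cache_release(page);
                        goto find_page;
                }
                goto readpage_error;
        }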

Thanks,
Eric
>>> This also explains why direct I/O didn't show the issue, though it took
>>> more time overall.
>>>
>>> I am not sure if your test case is the same as what the customer
>>> reported. I think you should recheck the operations on each node.
>> Yes, we've verified several times, both on sles10 and sles11. On sles10, each I/O time is smooth, with no long I/O peaks.
>>> We have also reported a case before about a DLM handling issue. I am
>>> not sure whether it is related.
>>> https://oss.oracle.com/pipermail/ocfs2-devel/2015-August/011045.html
>> Thanks, I've read that post. I cannot see any relation yet. Actually, fs/dlm is also implemented that way; it's the so-called "conversion deadlock"
>> mentioned in section 2.3.7.3 of the "Programming Locking Applications" book.
>>
>> There are only two processes from two nodes. Process A is blocked on the wait queue because of process B in the convert queue, leaving the grant
>> queue empty. Is this possible?
> So we have to investigate why the convert request cannot be satisfied.
> If dlm still works correctly, that is impossible; otherwise it is a bug.
>
>> You know I'm new here, so some questions may be improper; please point them out if so ;-)
>>
>> Thanks,
>> Eric


* [Ocfs2-devel] Long io response time doubt
  2015-11-12  9:48     ` Eric Ren
@ 2015-11-13  3:31       ` Joseph Qi
  2015-11-14  5:23         ` Eric Ren
  0 siblings, 1 reply; 11+ messages in thread
From: Joseph Qi @ 2015-11-13  3:31 UTC (permalink / raw)
  To: ocfs2-devel

Hi Eric,

On 2015/11/12 17:48, Eric Ren wrote:
> Hi Joseph,
> 
> On 11/12/15 16:00, Joseph Qi wrote:
>> On 2015/11/12 15:23, Eric Ren wrote:
>>> Hi Joseph,
>>>
>>> Thanks for your reply! There are more details I'd like to ask about ;-)
>>>
>>> On 11/12/15 11:05, Joseph Qi wrote:
>>>> Hi Eric,
>>>> You reported an issue where the I/O response time may sometimes be long.
>>>>
>>>>   From your test case information, I think it was caused by downconvert.
>>>  From what I learned from fs/dlm, the lock manager grants all
>>> down-conversion requests in place, i.e. on the grant queue. Here are
>>> some silly questions:
>>> 1. Who may request a down-conversion?
>>> 2. When does a down-conversion happen?
>>> 3. How could a down-conversion take so long?
>> IMO, it happens mostly in two cases:
>> 1. The owner knows another node is waiting on the lock; in other words,
>> it has blocked another node's request. This may be triggered in ast,
>> bast, or unlock.
>> 2. ocfs2cmt does its periodic commit.
>>
>> One case that can lead to a long downconvert is that it simply has too
>> much work to do. I am not sure if there are other cases, or whether it
>> is a code bug.
> OK, I'm not familiar with ocfs2cmt. Could I bother you to explain what ocfs2cmt does,
> its relation to reads/writes, and why a down-conversion can be triggered when it commits?
Sorry, the above explanation was not right and may have misled you.

jbd2/xxx (previously called kjournald2?) does the periodic commit; the
default interval is 5s and can be set with the mount option "commit=".

ocfs2cmt does the checkpoint. It can be woken up by:
a) the unblock lock during downconvert; if jbd2/xxx has already done the
commit, ocfs2cmt won't actually be woken up because everything has
already been checkpointed. So ocfs2cmt works together with jbd2/xxx.
b) evicting an inode and then doing the downconvert.
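
As a hedged sketch, both a) and b) funnel into the same wakeup
(simplified from fs/ocfs2/journal.h of that era; the field name is my
recollection and may differ):

static inline void ocfs2_start_checkpoint(struct ocfs2_super *osb)
{
        /* ocfs2cmt sleeps on this wait queue; once it runs, it asks
         * jbd2 to checkpoint the journal so the downconvert (or the
         * inode teardown) can finish */
        wake_up(&osb->checkpoint_event);
}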

>>> Could you describe this case in more detail?
>>>> And that seems reasonable, because the downconvert has to happen.
>>>>
>>>> Node 1 wrote the file, and node 2 read it. Since you used buffered I/O,
>>>> the data might still be in node 1's page cache after the write finished.
>>> Sorry, I cannot understand the relationship between "still in page cache" and "so ... downconvert".
>>>> So node 1 has to downconvert first before node 2's read can continue.
>>>> That is why you said ocfs2_inode_lock_with_page seemed to spend most of
>>> Actually, what surprises me more is the sheer amount of time spent, rather than it being the *most* time compared to the "readpage" stuff ;-)
>>>> the time. More specifically, it was ocfs2_inode_lock, called after the
>>>> nonblocking lock attempt returned -EAGAIN.
>>> You mean the read process would repeatedly try the nonblocking lock until the write process's down-conversion completes?
>> No. After the nonblocking lock returns -EAGAIN, it will unlock the page
>> and then call ocfs2_inode_lock and ocfs2_inode_unlock. ocfs2_inode_lock
> Yes.
>> will wait until the downconvert completes on the other node.
> The other node - the one the read or the write process is on?
Yes, the node that blocks my request.
For example, if node 1 has EX and node 2 wants to get PR, node 2 should
wait for node 1 to downconvert first.
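
For reference, the standard DLM compatibility for the three modes ocfs2
uses here (this is generic DLM semantics, not ocfs2-specific code):

/*
 *  held \ requested    NL     PR     EX
 *  NL                grant  grant  grant
 *  PR                grant  grant  block
 *  EX                grant  block  block
 *
 * So node 2's PR request conflicts with node 1's EX, and node 1 must
 * downconvert at least to PR before node 2 can be granted.
 */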

Thanks,
Joseph

>> This handles a lock inversion case. You can refer to the comments of
>> ocfs2_inode_lock_with_page.
> Yeah, actually I've read those comments again and again, but still fail to get the idea.
> Could you please explain how this works? I'm really interested ;-) Forgive me for
> pasting the code below; it makes it convenient to refer to.
> 
> /*
>  * This is working around a lock inversion between tasks acquiring DLM
>  * locks while holding a page lock and the downconvert thread which
>  * blocks dlm lock acquiry while acquiring page locks.
>  *
>  * ** These _with_page variantes are only intended to be called from aop
>  * methods that hold page locks and return a very specific *positive* error
>  * code that aop methods pass up to the VFS -- test for errors with != 0. **
>  *
>  * The DLM is called such that it returns -EAGAIN if it would have
>  * blocked waiting for the downconvert thread.  In that case we unlock
>  * our page so the downconvert thread can make progress.  Once we've
>  * done this we have to return AOP_TRUNCATED_PAGE so the aop method
>  * that called us can bubble that back up into the VFS who will then
>  * immediately retry the aop call.
>  *
>  * We do a blocking lock and immediate unlock before returning, though, so that
>  * the lock has a great chance of being cached on this node by the time the VFS
>  * calls back to retry the aop.    This has a potential to livelock as nodes
>  * ping locks back and forth, but that's a risk we're willing to take to avoid
>  * the lock inversion simply.
>  */
> int ocfs2_inode_lock_with_page(struct inode *inode,
>                               struct buffer_head **ret_bh,
>                               int ex,
>                               struct page *page)
> {
>         int ret;
> 
>         ret = ocfs2_inode_lock_full(inode, ret_bh, ex, OCFS2_LOCK_NONBLOCK);
>         if (ret == -EAGAIN) {
>                 unlock_page(page);
>                 if (ocfs2_inode_lock(inode, ret_bh, ex) == 0)
>                         ocfs2_inode_unlock(inode, ex);
>                 ret = AOP_TRUNCATED_PAGE;
>         }
> 
>         return ret;
> }
> 
> Thanks,
> Eric
>>>> This also explains why direct I/O didn't show the issue, though it took
>>>> more time overall.
>>>>
>>>> I am not sure if your test case is the same as what the customer
>>>> reported. I think you should recheck the operations on each node.
>>> Yes, we've verified several times, both on sles10 and sles11. On sles10, each I/O time is smooth, with no long I/O peaks.
>>>> We have also reported a case before about a DLM handling issue. I am
>>>> not sure whether it is related.
>>>> https://oss.oracle.com/pipermail/ocfs2-devel/2015-August/011045.html
>>> Thanks, I've read that post. I cannot see any relation yet. Actually, fs/dlm is also implemented that way; it's the so-called "conversion deadlock"
>>> mentioned in section 2.3.7.3 of the "Programming Locking Applications" book.
>>>
>>> There are only two processes from two nodes. Process A is blocked on the wait queue because of process B in the convert queue, leaving the grant
>>> queue empty. Is this possible?
>> So we have to investigate why the convert request cannot be satisfied.
>> If dlm still works correctly, that is impossible; otherwise it is a bug.
>>
>>> You know I'm new here, so some questions may be improper; please point them out if so ;-)
>>>
>>> Thanks,
>>> Eric
> 
> 
> .
> 


* [Ocfs2-devel] Long io response time doubt
  2015-11-13  3:31       ` Joseph Qi
@ 2015-11-14  5:23         ` Eric Ren
  2015-11-16  1:40           ` Joseph Qi
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Ren @ 2015-11-14  5:23 UTC (permalink / raw)
  To: ocfs2-devel

Hi Joseph,

> >> 2. ocfs2cmt does its periodic commit.
> >>
> >> One case that can lead to a long downconvert is that it simply has too
> >> much work to do. I am not sure if there are other cases, or whether it
> >> is a code bug.
> > OK, I'm not familiar with ocfs2cmt. Could I bother you to explain what ocfs2cmt does,
> > its relation to reads/writes, and why a down-conversion can be triggered when it commits?
> Sorry, the above explanation was not right and may have misled you.
> 
> jbd2/xxx (previously called kjournald2?) does the periodic commit; the
> default interval is 5s and can be set with the mount option "commit=".
> 
> ocfs2cmt does the checkpoint. It can be woken up by:
> a) the unblock lock during downconvert; if jbd2/xxx has already done the
> commit, ocfs2cmt won't actually be woken up because everything has
> already been checkpointed. So ocfs2cmt works together with jbd2/xxx.
OK, thanks for sharing your knowledge ;-)
> b) evicting an inode and then doing the downconvert.
Sorry, I'm confused about b). You mean b) is also part of ocfs2cmt's
work? Does b) have something to do with a)? And what's the meaning of "evict inode"?
Actually, I can hardly understand the idea of b).
> 
> >>> Could you describe this case in more detail?
> >>>> And that seems reasonable, because the downconvert has to happen.
> >>>>
> >>>> Node 1 wrote the file, and node 2 read it. Since you used buffered I/O,
> >>>> the data might still be in node 1's page cache after the write finished.
> >>> Sorry, I cannot understand the relationship between "still in page cache" and "so ... downconvert".
> >>>> So node 1 has to downconvert first before node 2's read can continue.
> >>>> That is why you said ocfs2_inode_lock_with_page seemed to spend most of
> >>> Actually, what surprises me more is the sheer amount of time spent, rather than it being the *most* time compared to the "readpage" stuff ;-)
> >>>> the time. More specifically, it was ocfs2_inode_lock, called after the
> >>>> nonblocking lock attempt returned -EAGAIN.
> >>> You mean the read process would repeatedly try the nonblocking lock until the write process's down-conversion completes?
> >> No. After the nonblocking lock returns -EAGAIN, it will unlock the page
> >> and then call ocfs2_inode_lock and ocfs2_inode_unlock. ocfs2_inode_lock
> > Yes.
> >> will wait until the downconvert completes on the other node.
> > The other node - the one the read or the write process is on?
> Yes, the node that blocks my request.
> For example, if node 1 has EX and node 2 wants to get PR, node 2 should
> wait for node 1 to downconvert first.
OK~

Thanks,
Eric
> 
> Thanks,
> Joseph
> 
> >> This handles a lock inversion case. You can refer to the comments of
> >> ocfs2_inode_lock_with_page.
> > Yeah, actually I've read those comments again and again, but still fail to get the idea.
> > Could you please explain how this works? I'm really interested ;-) Forgive me for
> > pasting the code below; it makes it convenient to refer to.
> > 
> > /*
> >  * This is working around a lock inversion between tasks acquiring DLM
> >  * locks while holding a page lock and the downconvert thread which
> >  * blocks dlm lock acquiry while acquiring page locks.
> >  *
> >  * ** These _with_page variantes are only intended to be called from aop
> >  * methods that hold page locks and return a very specific *positive* error
> >  * code that aop methods pass up to the VFS -- test for errors with != 0. **
> >  *
> >  * The DLM is called such that it returns -EAGAIN if it would have
> >  * blocked waiting for the downconvert thread.  In that case we unlock
> >  * our page so the downconvert thread can make progress.  Once we've
> >  * done this we have to return AOP_TRUNCATED_PAGE so the aop method
> >  * that called us can bubble that back up into the VFS who will then
> >  * immediately retry the aop call.
> >  *
> >  * We do a blocking lock and immediate unlock before returning, though, so that
> >  * the lock has a great chance of being cached on this node by the time the VFS
> >  * calls back to retry the aop.    This has a potential to livelock as nodes
> >  * ping locks back and forth, but that's a risk we're willing to take to avoid
> >  * the lock inversion simply.
> >  */
> > int ocfs2_inode_lock_with_page(struct inode *inode,
> >                               struct buffer_head **ret_bh,
> >                               int ex,
> >                               struct page *page)
> > {
> >         int ret;
> > 
> >         ret = ocfs2_inode_lock_full(inode, ret_bh, ex, OCFS2_LOCK_NONBLOCK);
> >         if (ret == -EAGAIN) {
> >                 unlock_page(page);
> >                 if (ocfs2_inode_lock(inode, ret_bh, ex) == 0)
> >                         ocfs2_inode_unlock(inode, ex);
> >                 ret = AOP_TRUNCATED_PAGE;
> >         }
> > 
> >         return ret;
> > }
> > 
> > Thanks,
> > Eric
> >>>> This also explains why direct I/O didn't show the issue, though it took
> >>>> more time overall.
> >>>>
> >>>> I am not sure if your test case is the same as what the customer
> >>>> reported. I think you should recheck the operations on each node.
> >>> Yes, we've verified several times, both on sles10 and sles11. On sles10, each I/O time is smooth, with no long I/O peaks.
> >>>> We have also reported a case before about a DLM handling issue. I am
> >>>> not sure whether it is related.
> >>>> https://oss.oracle.com/pipermail/ocfs2-devel/2015-August/011045.html
> >>> Thanks, I've read that post. I cannot see any relation yet. Actually, fs/dlm is also implemented that way; it's the so-called "conversion deadlock"
> >>> mentioned in section 2.3.7.3 of the "Programming Locking Applications" book.
> >>>
> >>> There are only two processes from two nodes. Process A is blocked on the wait queue because of process B in the convert queue, leaving the grant
> >>> queue empty. Is this possible?
> >> So we have to investigate why the convert request cannot be satisfied.
> >> If dlm still works correctly, that is impossible; otherwise it is a bug.
> >>
> >>> You know I'm new here, so some questions may be improper; please point them out if so ;-)
> >>>
> >>> Thanks,
> >>> Eric
> > 
> > 
> > .
> > 
> 
> 
> 


* [Ocfs2-devel] Long io response time doubt
  2015-11-14  5:23         ` Eric Ren
@ 2015-11-16  1:40           ` Joseph Qi
  2015-11-24 10:02             ` Eric Ren
  0 siblings, 1 reply; 11+ messages in thread
From: Joseph Qi @ 2015-11-16  1:40 UTC (permalink / raw)
  To: ocfs2-devel

Hi Eric,

On 2015/11/14 13:23, Eric Ren wrote:
> Hi Joseph,
> 
> > >> 2. ocfs2cmt does its periodic commit.
> > >>
> > >> One case that can lead to a long downconvert is that it simply has too
> > >> much work to do. I am not sure if there are other cases, or whether it
> > >> is a code bug.
> > > OK, I'm not familiar with ocfs2cmt. Could I bother you to explain what ocfs2cmt does,
> > > its relation to reads/writes, and why a down-conversion can be triggered when it commits?
> > Sorry, the above explanation was not right and may have misled you.
> > 
> > jbd2/xxx (previously called kjournald2?) does the periodic commit; the
> > default interval is 5s and can be set with the mount option "commit=".
> > 
> > ocfs2cmt does the checkpoint. It can be woken up by:
> > a) the unblock lock during downconvert; if jbd2/xxx has already done the
> > commit, ocfs2cmt won't actually be woken up because everything has
> > already been checkpointed. So ocfs2cmt works together with jbd2/xxx.
> OK, thanks for sharing your knowledge ;-)
> > b) evicting an inode and then doing the downconvert.
> Sorry, I'm confused about b). You mean b) is also part of ocfs2cmt's
> work? Does b) have something to do with a)? And what's the meaning of "evict inode"?
> Actually, I can hardly understand the idea of b).
You can go through the code flow:
iput->iput_final->evict->evict_inode->ocfs2_evict_inode
->ocfs2_clear_inode->ocfs2_checkpoint_inode->ocfs2_start_checkpoint

It happens when a node no longer uses the inode (but has not deleted
it), and it will then free the inode's related lockres.
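
As a hedged sketch of the tail of that chain (function names as in
fs/ocfs2 of that era, bodies heavily simplified):

static void clear_inode_sketch(struct inode *inode)
{
        /* checkpoint the inode's journaled data first - this is the
         * ocfs2cmt wakeup from case b) above */
        ocfs2_checkpoint_inode(inode);

        /* then tear down the inode's cluster locks, which is where
         * the downconvert comes in */
        ocfs2_drop_inode_locks(inode);
}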

Thanks,
Joseph

> >
> > >>> Could you describe this case in more detail?
> > >>>> And that seems reasonable, because the downconvert has to happen.
> > >>>>
> > >>>> Node 1 wrote the file, and node 2 read it. Since you used buffered I/O,
> > >>>> the data might still be in node 1's page cache after the write finished.
> > >>> Sorry, I cannot understand the relationship between "still in page cache" and "so ... downconvert".
> > >>>> So node 1 has to downconvert first before node 2's read can continue.
> > >>>> That is why you said ocfs2_inode_lock_with_page seemed to spend most of
> > >>> Actually, what surprises me more is the sheer amount of time spent, rather than it being the *most* time compared to the "readpage" stuff ;-)
> > >>>> the time. More specifically, it was ocfs2_inode_lock, called after the
> > >>>> nonblocking lock attempt returned -EAGAIN.
> > >>> You mean the read process would repeatedly try the nonblocking lock until the write process's down-conversion completes?
> > >> No. After the nonblocking lock returns -EAGAIN, it will unlock the page
> > >> and then call ocfs2_inode_lock and ocfs2_inode_unlock. ocfs2_inode_lock
> > > Yes.
> > >> will wait until the downconvert completes on the other node.
> > > The other node - the one the read or the write process is on?
> > Yes, the node that blocks my request.
> > For example, if node 1 has EX and node 2 wants to get PR, node 2 should
> > wait for node 1 to downconvert first.
> OK~
> 
> Thanks,
> Eric


* [Ocfs2-devel] Long io response time doubt
  2015-11-16  1:40           ` Joseph Qi
@ 2015-11-24 10:02             ` Eric Ren
  2015-11-24 10:05               ` Eric Ren
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Ren @ 2015-11-24 10:02 UTC (permalink / raw)
  To: ocfs2-devel

Hi Joseph,

I used ftrace's function tracer to record some code flow. There's a
question that confuses me - why is ocfs2_cancel_convert() called here in
the ocfs2dc thread? In other words, what do we expect it to do here?

ocfs2_unblock_lock() {
      ...
      if (lockres->l_flags & OCFS2_LOCK_BUSY) {
          ...
          ocfs2_cancel_convert();
          ...
      }
}

From what I understand,
ocfs2_cancel_convert()->ocfs2_dlm_unlock()->user_dlm_unlock()->dlm_unlock(DLM_LKF_CANCEL)
puts the lock back on the grant queue at its old granted mode. In my
case, you know, reading/writing the same shared file from two nodes, I
think the up-conversion can only happen on the writing node (PR->EX),
while on the reading node no up-conversion is needed, right?
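
For reference, the cancel path looks roughly like this on the dlm side
(a hedged sketch based on fs/dlm/lock.c's do_cancel()/revert_lock();
error handling and the other queue states are omitted):

static void revert_lock_sketch(struct dlm_rsb *r, struct dlm_lkb *lkb)
{
        lkb->lkb_rqmode = DLM_LOCK_IV;     /* no request pending any more */

        /* an in-flight convert goes back onto the grant queue at its
         * old granted mode - this is the move_lkb() seen in ftrace */
        if (lkb->lkb_status == DLM_LKSTS_CONVERT)
                move_lkb(r, lkb, DLM_LKSTS_GRANTED);
}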

But the following output from the writing and reading nodes shows that
ocfs2_cancel_convert() has been called on both nodes. Why could this
happen in this scenario?

On 11/16/15 09:40, Joseph Qi wrote:
>> Sorry, I'm confused about b). You mean b) is also part of ocfs2cmt's
>> work? Does b) have something to do with a)? And what's the meaning of "evict inode"?
>> Actually, I can hardly understand the idea of b).
> You can go through the code flow:
> iput->iput_final->evict->evict_inode->ocfs2_evict_inode
> ->ocfs2_clear_inode->ocfs2_checkpoint_inode->ocfs2_start_checkpoint
>
> It happens when a node no longer uses the inode (but has not deleted
> it), and it will then free the inode's related lockres.
OK, thanks~

Eric


* [Ocfs2-devel] Long io response time doubt
  2015-11-24 10:02             ` Eric Ren
@ 2015-11-24 10:05               ` Eric Ren
  2015-11-26  1:34                 ` Joseph Qi
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Ren @ 2015-11-24 10:05 UTC (permalink / raw)
  To: ocfs2-devel

Sorry, I forgot to add the pieces of code flow...

On reading node:

  3)  dlm_ast-4278  =>  ocfs2dc-4277
  ------------------------------------------

  3)               |  ocfs2_process_blocked_lock() {
  3)               |    ocfs2_unblock_lock() {
  3)   0.116 us    |      ocfs2_prepare_cancel_convert();
  3)               |      ocfs2_cancel_convert() {
  3)               |        user_dlm_unlock() {
  3)               |          dlm_unlock() {
  3)   0.120 us    |            dlm_find_lockspace_local();
  3)   0.158 us    |            find_lkb();
  3)               |            cancel_lock() {
  3)               |              validate_unlock_args() {
  3)   0.093 us    |                del_timeout();
  3)   0.782 us    |              }
  3)               |              _cancel_lock() {
  3)               |                send_common() {
  3)   0.189 us    |                  add_to_waiters();
  3)               |                  create_message() {
  3)               |                    _create_message() {
  3)               |                      dlm_lowcomms_get_buffer() {
  3)   0.156 us    |                        nodeid2con();
  3)   1.680 us    |                      }
  3)   0.108 us    |                      dlm_our_nodeid();
  3)   2.821 us    |                    }
  3)   3.319 us    |                  }
  3)   0.094 us    |                  send_args();
  3)               |                  send_message() {
  3)   0.070 us    |                    dlm_message_out();
  3)   9.485 us    |                    dlm_lowcomms_commit_buffer();
  3) + 10.609 us   |                  }
  3) + 16.054 us   |                }
  3) + 16.632 us   |              }
  3)   0.156 us    |              put_rsb();
  3) + 19.044 us   |            }
  3)               |            dlm_put_lkb() {
  3)   0.094 us    |              __put_lkb();
  3)   0.632 us    |            }
  3)   0.074 us    |            dlm_put_lockspace();
  3) + 22.513 us   |          }
  3) + 23.028 us   |        }
  3) + 23.727 us   |      }
  3) + 25.004 us   |    }
  3)               |    ocfs2_schedule_blocked_lock() {
  3)   0.073 us    |      lockres_set_flags();
  3)   0.592 us    |    }
  3) + 26.852 us   |  }
  ------------------------------------------
  3)  ocfs2dc-4277  =>  dlm_ast-4278
  ------------------------------------------

  3)               |  process_asts() {
  3)   0.202 us    |    dlm_rem_lkb_callback();
  3)   0.081 us    |    dlm_rem_lkb_callback();
  3)               |    fsdlm_lock_ast_wrapper() {
  3)               |      ocfs2_unlock_ast() {
  3)   0.099 us    |        ocfs2_get_inode_osb();
  3)   1.290 us    |        ocfs2_wake_downconvert_thread();
  3)               |        lockres_clear_flags() {
  3)   8.539 us    |          lockres_set_flags();
  3)   9.096 us    |        }
  3) + 12.055 us   |      }
  3) + 12.673 us   |    }
  3)               |    dlm_put_lkb() {
  3)   0.161 us    |      __put_lkb();
  3)   0.718 us    |    }
  3) + 16.133 us   |  }


On writing node:

  3)  kworker-443   =>  ocfs2dc-4456
  ------------------------------------------

  3)               |  ocfs2_process_blocked_lock() {
  3)               |    ocfs2_unblock_lock() {
  3)   0.269 us    |      ocfs2_prepare_cancel_convert();
  3)               |      ocfs2_cancel_convert() {
  3)               |        user_dlm_unlock() {
  3)               |          dlm_unlock() {
  3)   0.321 us    |            dlm_find_lockspace_local();
  3)   0.286 us    |            find_lkb();
  3)               |            cancel_lock() {
  3)               |              validate_unlock_args() {
  3)   0.122 us    |                del_timeout();
  3)   0.901 us    |              }
  3)               |              _cancel_lock() {
  3)               |                do_cancel() {
  3)               |                  revert_lock() {
  3)               |                    move_lkb() {
  3)   0.155 us    |                      del_lkb();
  3)   0.243 us    |                      add_lkb();
  3)   1.778 us    |                    }
  3)   2.577 us    |                  }
  3)               |                  queue_cast() {
  3)   0.102 us    |                    del_timeout();
  3)               |                    dlm_add_ast() {
  3)   0.165 us    |                      dlm_add_lkb_callback();
  3) + 14.492 us   |                    }
  3) + 16.381 us   |                  }
  3) + 20.384 us   |                }
  3)               |                grant_pending_locks() {
  3)               |                  grant_pending_convert() {
  3)               |                    can_be_granted() {
  3)   0.143 us    |                      _can_be_granted();
  3)   0.906 us    |                    }
  3)   1.900 us    |                  }
  3)   2.738 us    |                }
  3) + 24.670 us   |              }
  3)   0.154 us    |              put_rsb();
  3) + 28.068 us   |            }
  3)               |            dlm_put_lkb() {
  3)   0.163 us    |              __put_lkb();
  3)   1.029 us    |            }
  3)   0.195 us    |            dlm_put_lockspace();
  3) + 34.035 us   |          }
  3) + 34.914 us   |        }
  3) + 35.919 us   |      }
  3) + 37.864 us   |    }
  3)               |    ocfs2_schedule_blocked_lock() {
  3)   0.210 us    |      lockres_set_flags();
  0)               |  process_asts() {
  3)   0.998 us    |    }
  0)   0.215 us    |    dlm_rem_lkb_callback();
  3) + 40.671 us   |  }
  0)   0.084 us    |    dlm_rem_lkb_callback();
  0)               |    fsdlm_lock_ast_wrapper() {
  0)               |      ocfs2_unlock_ast() {
  0)   0.088 us    |        ocfs2_get_inode_osb();
  0)   9.498 us    |        ocfs2_wake_downconvert_thread();
  0)               |        lockres_clear_flags() {
  0)   1.272 us    |          lockres_set_flags();
  0)   1.757 us    |        }
  0) + 13.396 us   |      }
  0) + 13.983 us   |    }
  0)               |    dlm_put_lkb() {
  0)   0.136 us    |      __put_lkb();
  0)   0.641 us    |    }
  0) + 17.224 us   |  }


Thanks,
Eric
On 11/24/15 18:02, Eric Ren wrote:
> Hi Joseph,
>
> I used ftrace's function tracer to record some code flow. There's a
> question that confuses me - why is ocfs2_cancel_convert() called here
> in the ocfs2dc thread? In other words, what do we expect it to do here?
>
> ocfs2_unblock_lock() {
>      ...
>      if (lockres->l_flags & OCFS2_LOCK_BUSY) {
>          ...
>          ocfs2_cancel_convert();
>          ...
>      }
> }
>
> From what I understand,
> ocfs2_cancel_convert()->ocfs2_dlm_unlock()->user_dlm_unlock()->dlm_unlock(DLM_LKF_CANCEL)
> puts the lock back on the grant queue at its old granted mode. In my
> case, you know, reading/writing the same shared file from two nodes,
> I think the up-conversion can only happen on the writing node
> (PR->EX), while on the reading node no up-conversion is needed, right?
>
> But the following output from the writing and reading nodes shows that
> ocfs2_cancel_convert() has been called on both nodes. Why could this
> happen in this scenario?
>
> On 11/16/15 09:40, Joseph Qi wrote:
>>> Sorry, I'm confused about b). You mean b) is also part of ocfs2cmt's
>>> work? Does b) have something to do with a)? And what's the meaning 
>>> of "evict inode"?
>>> Actually, I can hardly understand the idea of b).
>> You can go through the code flow:
>> iput->iput_final->evict->evict_inode->ocfs2_evict_inode
>> ->ocfs2_clear_inode->ocfs2_checkpoint_inode->ocfs2_start_checkpoint
>>
>> It happens when a node no longer uses the inode (but has not deleted
>> it), and it will then free the inode's related lockres.
> OK, thanks~
>
> Eric



* [Ocfs2-devel] Long io response time doubt
  2015-11-24 10:05               ` Eric Ren
@ 2015-11-26  1:34                 ` Joseph Qi
  2015-11-26  1:49                   ` Eric Ren
  0 siblings, 1 reply; 11+ messages in thread
From: Joseph Qi @ 2015-11-26  1:34 UTC (permalink / raw)
  To: ocfs2-devel

Hi Eric,
convert has two types, upconvert and downconvert. And please note that
PR and EX are not compatible.
Assume the read node has gotten PR first, and then the write node wants
to get EX: that requires the read node to downconvert PR to NL. Then,
when the read node wants to get PR again, the write node should
downconvert EX to PR (the highest compatible mode) so that the read node
can upconvert NL to PR. And so forth.
So both the read and write nodes will do upconverts and downconverts.
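
A rough timeline of that ping-pong (illustration only):

/*
 *   read node                          write node
 *   ---------                          ----------
 *   holds PR
 *                                      wants EX -> blocked (EX vs PR)
 *   downconvert PR -> NL
 *                                      EX granted, writes
 *   wants PR -> blocked (PR vs EX)
 *                                      downconvert EX -> PR
 *   upconvert NL -> PR granted, reads
 *   ... and so forth; a cancel can show up on either side whenever a
 *   convert is still in flight when that node itself must downconvert
 */
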
The code you pasted calls into fs/dlm, which I am not familiar with :(
I think you can list your questions and send them to cluster-devel.

Thanks,
Joseph

On 2015/11/24 18:05, Eric Ren wrote:
> Sorry, I forgot to add the pieces of code flow...
> 
> On reading node:
> 
>  3)  dlm_ast-4278  =>  ocfs2dc-4277 
>  ------------------------------------------
> 
>  3)               |  ocfs2_process_blocked_lock() {
>  3)               |    ocfs2_unblock_lock() {
>  3)   0.116 us    |      ocfs2_prepare_cancel_convert();
>  3)               |      ocfs2_cancel_convert() {
>  3)               |        user_dlm_unlock() {
>  3)               |          dlm_unlock() {
>  3)   0.120 us    |            dlm_find_lockspace_local();
>  3)   0.158 us    |            find_lkb();
>  3)               |            cancel_lock() {
>  3)               |              validate_unlock_args() {
>  3)   0.093 us    |                del_timeout();
>  3)   0.782 us    |              }
>  3)               |              _cancel_lock() {
>  3)               |                send_common() {
>  3)   0.189 us    |                  add_to_waiters();
>  3)               |                  create_message() {
>  3)               |                    _create_message() {
>  3)               |                      dlm_lowcomms_get_buffer() {
>  3)   0.156 us    |                        nodeid2con();
>  3)   1.680 us    |                      }
>  3)   0.108 us    |                      dlm_our_nodeid();
>  3)   2.821 us    |                    }
>  3)   3.319 us    |                  }
>  3)   0.094 us    |                  send_args();
>  3)               |                  send_message() {
>  3)   0.070 us    |                    dlm_message_out();
>  3)   9.485 us    |                    dlm_lowcomms_commit_buffer();
>  3) + 10.609 us   |                  }
>  3) + 16.054 us   |                }
>  3) + 16.632 us   |              }
>  3)   0.156 us    |              put_rsb();
>  3) + 19.044 us   |            }
>  3)               |            dlm_put_lkb() {
>  3)   0.094 us    |              __put_lkb();
>  3)   0.632 us    |            }
>  3)   0.074 us    |            dlm_put_lockspace();
>  3) + 22.513 us   |          }
>  3) + 23.028 us   |        }
>  3) + 23.727 us   |      }
>  3) + 25.004 us   |    }
>  3)               |    ocfs2_schedule_blocked_lock() {
>  3)   0.073 us    |      lockres_set_flags();
>  3)   0.592 us    |    }
>  3) + 26.852 us   |  }
>  ------------------------------------------
>  3)  ocfs2dc-4277  =>  dlm_ast-4278 
>  ------------------------------------------
> 
>  3)               |  process_asts() {
>  3)   0.202 us    |    dlm_rem_lkb_callback();
>  3)   0.081 us    |    dlm_rem_lkb_callback();
>  3)               |    fsdlm_lock_ast_wrapper() {
>  3)               |      ocfs2_unlock_ast() {
>  3)   0.099 us    |        ocfs2_get_inode_osb();
>  3)   1.290 us    |        ocfs2_wake_downconvert_thread();
>  3)               |        lockres_clear_flags() {
>  3)   8.539 us    |          lockres_set_flags();
>  3)   9.096 us    |        }
>  3) + 12.055 us   |      }
>  3) + 12.673 us   |    }
>  3)               |    dlm_put_lkb() {
>  3)   0.161 us    |      __put_lkb();
>  3)   0.718 us    |    }
>  3) + 16.133 us   |  }
> 
> 
> On writing node:
> 
>  3)  kworker-443   =>  ocfs2dc-4456 
>  ------------------------------------------
> 
>  3)               |  ocfs2_process_blocked_lock() {
>  3)               |    ocfs2_unblock_lock() {
>  3)   0.269 us    |      ocfs2_prepare_cancel_convert();
>  3)               |      ocfs2_cancel_convert() {
>  3)               |        user_dlm_unlock() {
>  3)               |          dlm_unlock() {
>  3)   0.321 us    |            dlm_find_lockspace_local();
>  3)   0.286 us    |            find_lkb();
>  3)               |            cancel_lock() {
>  3)               |              validate_unlock_args() {
>  3)   0.122 us    |                del_timeout();
>  3)   0.901 us    |              }
>  3)               |              _cancel_lock() {
>  3)               |                do_cancel() {
>  3)               |                  revert_lock() {
>  3)               |                    move_lkb() {
>  3)   0.155 us    |                      del_lkb();
>  3)   0.243 us    |                      add_lkb();
>  3)   1.778 us    |                    }
>  3)   2.577 us    |                  }
>  3)               |                  queue_cast() {
>  3)   0.102 us    |                    del_timeout();
>  3)               |                    dlm_add_ast() {
>  3)   0.165 us    |                      dlm_add_lkb_callback();
>  3) + 14.492 us   |                    }
>  3) + 16.381 us   |                  }
>  3) + 20.384 us   |                }
>  3)               |                grant_pending_locks() {
>  3)               |                  grant_pending_convert() {
>  3)               |                    can_be_granted() {
>  3)   0.143 us    |                      _can_be_granted();
>  3)   0.906 us    |                    }
>  3)   1.900 us    |                  }
>  3)   2.738 us    |                }
>  3) + 24.670 us   |              }
>  3)   0.154 us    |              put_rsb();
>  3) + 28.068 us   |            }
>  3)               |            dlm_put_lkb() {
>  3)   0.163 us    |              __put_lkb();
>  3)   1.029 us    |            }
>  3)   0.195 us    |            dlm_put_lockspace();
>  3) + 34.035 us   |          }
>  3) + 34.914 us   |        }
>  3) + 35.919 us   |      }
>  3) + 37.864 us   |    }
>  3)               |    ocfs2_schedule_blocked_lock() {
>  3)   0.210 us    |      lockres_set_flags();
>  0)               |  process_asts() {
>  3)   0.998 us    |    }
>  0)   0.215 us    |    dlm_rem_lkb_callback();
>  3) + 40.671 us   |  }
>  0)   0.084 us    |    dlm_rem_lkb_callback();
>  0)               |    fsdlm_lock_ast_wrapper() {
>  0)               |      ocfs2_unlock_ast() {
>  0)   0.088 us    |        ocfs2_get_inode_osb();
>  0)   9.498 us    |        ocfs2_wake_downconvert_thread();
>  0)               |        lockres_clear_flags() {
>  0)   1.272 us    |          lockres_set_flags();
>  0)   1.757 us    |        }
>  0) + 13.396 us   |      }
>  0) + 13.983 us   |    }
>  0)               |    dlm_put_lkb() {
>  0)   0.136 us    |      __put_lkb();
>  0)   0.641 us    |    }
>  0) + 17.224 us   |  }
> 
> 
> Thanks,
> Eric
> On 11/24/15 18:02, Eric Ren wrote:
>> Hi Joseph,
>>
>> I used ftrace's function tracer to record some code flow. There's a question that confuses me -
>> why is ocfs2_cancel_convert() called here in the ocfs2dc thread? In other words, what do we expect it
>> to do here?
>>
>> ocfs2_unblock_lock() {
>>      ...
>>      if (lockres->l_flags & OCFS2_LOCK_BUSY) {
>>          ...
>>          ocfs2_cancel_convert();
>>          ...
>>      }
>> }
>>
>> From what I understand, ocfs2_cancel_convert()->ocfs2_dlm_unlock()->user_dlm_unlock()->dlm_unlock(DLM_LKF_CANCEL) puts
>> the lock back on the grant queue at its old granted mode. In my case, you know, reading/writing the same shared file from two nodes,
>> I think the up-conversion can only happen on the writing node (PR->EX), while on the reading node no up-conversion is needed, right?
>>
>> But the following output from the writing and reading nodes shows that ocfs2_cancel_convert() has been called on both nodes. Why could
>> this happen in this scenario?
>>
>> On 11/16/15 09:40, Joseph Qi wrote:
>>>> Sorry, I'm confused about b). You mean b) is also part of ocfs2cmt's
>>>> work? Does b) have something to do with a)? And what's the meaning of "evict inode"?
>>>> Actually, I can hardly understand the idea of b).
>>> You can go through the code flow:
>>> iput->iput_final->evict->evict_inode->ocfs2_evict_inode
>>> ->ocfs2_clear_inode->ocfs2_checkpoint_inode->ocfs2_start_checkpoint
>>>
>>> It happens when a node no longer uses the inode (but has not deleted
>>> it), and it will then free the inode's related lockres.
>> OK, thanks~
>>
>> Eric
> 


* [Ocfs2-devel] Long io response time doubt
  2015-11-26  1:34                 ` Joseph Qi
@ 2015-11-26  1:49                   ` Eric Ren
  0 siblings, 0 replies; 11+ messages in thread
From: Eric Ren @ 2015-11-26  1:49 UTC (permalink / raw)
  To: ocfs2-devel

Hi Joseph,

You've cleared up my confusion! A very good explanation~

Thanks,
Eric

On 11/26/15 09:34, Joseph Qi wrote:
> Hi Eric,
> convert has two types, upconvert and downconvert. And please note that
> PR and EX are not compatible.
> Assume the read node has gotten PR first, and then the write node wants
> to get EX: that requires the read node to downconvert PR to NL. Then,
> when the read node wants to get PR again, the write node should
> downconvert EX to PR (the highest compatible mode) so that the read node
> can upconvert NL to PR. And so forth.
> So both the read and write nodes will do upconverts and downconverts.
> The code you pasted calls into fs/dlm, which I am not familiar with :(
> I think you can list your questions and send them to cluster-devel.
>
> Thanks,
> Joseph
>
> On 2015/11/24 18:05, Eric Ren wrote:
>> Sorry, I forgot to add the pieces of code flow...
>>
>> On reading node:
>>
>>   3)  dlm_ast-4278  =>  ocfs2dc-4277
>>   ------------------------------------------
>>
>>   3)               |  ocfs2_process_blocked_lock() {
>>   3)               |    ocfs2_unblock_lock() {
>>   3)   0.116 us    |      ocfs2_prepare_cancel_convert();
>>   3)               |      ocfs2_cancel_convert() {
>>   3)               |        user_dlm_unlock() {
>>   3)               |          dlm_unlock() {
>>   3)   0.120 us    |            dlm_find_lockspace_local();
>>   3)   0.158 us    |            find_lkb();
>>   3)               |            cancel_lock() {
>>   3)               |              validate_unlock_args() {
>>   3)   0.093 us    |                del_timeout();
>>   3)   0.782 us    |              }
>>   3)               |              _cancel_lock() {
>>   3)               |                send_common() {
>>   3)   0.189 us    |                  add_to_waiters();
>>   3)               |                  create_message() {
>>   3)               |                    _create_message() {
>>   3)               |                      dlm_lowcomms_get_buffer() {
>>   3)   0.156 us    |                        nodeid2con();
>>   3)   1.680 us    |                      }
>>   3)   0.108 us    |                      dlm_our_nodeid();
>>   3)   2.821 us    |                    }
>>   3)   3.319 us    |                  }
>>   3)   0.094 us    |                  send_args();
>>   3)               |                  send_message() {
>>   3)   0.070 us    |                    dlm_message_out();
>>   3)   9.485 us    |                    dlm_lowcomms_commit_buffer();
>>   3) + 10.609 us   |                  }
>>   3) + 16.054 us   |                }
>>   3) + 16.632 us   |              }
>>   3)   0.156 us    |              put_rsb();
>>   3) + 19.044 us   |            }
>>   3)               |            dlm_put_lkb() {
>>   3)   0.094 us    |              __put_lkb();
>>   3)   0.632 us    |            }
>>   3)   0.074 us    |            dlm_put_lockspace();
>>   3) + 22.513 us   |          }
>>   3) + 23.028 us   |        }
>>   3) + 23.727 us   |      }
>>   3) + 25.004 us   |    }
>>   3)               |    ocfs2_schedule_blocked_lock() {
>>   3)   0.073 us    |      lockres_set_flags();
>>   3)   0.592 us    |    }
>>   3) + 26.852 us   |  }
>>   ------------------------------------------
>>   3)  ocfs2dc-4277  =>  dlm_ast-4278
>>   ------------------------------------------
>>
>>   3)               |  process_asts() {
>>   3)   0.202 us    |    dlm_rem_lkb_callback();
>>   3)   0.081 us    |    dlm_rem_lkb_callback();
>>   3)               |    fsdlm_lock_ast_wrapper() {
>>   3)               |      ocfs2_unlock_ast() {
>>   3)   0.099 us    |        ocfs2_get_inode_osb();
>>   3)   1.290 us    |        ocfs2_wake_downconvert_thread();
>>   3)               |        lockres_clear_flags() {
>>   3)   8.539 us    |          lockres_set_flags();
>>   3)   9.096 us    |        }
>>   3) + 12.055 us   |      }
>>   3) + 12.673 us   |    }
>>   3)               |    dlm_put_lkb() {
>>   3)   0.161 us    |      __put_lkb();
>>   3)   0.718 us    |    }
>>   3) + 16.133 us   |  }
>>
>>
>> On writing node:
>>
>>   3)  kworker-443   =>  ocfs2dc-4456
>>   ------------------------------------------
>>
>>   3)               |  ocfs2_process_blocked_lock() {
>>   3)               |    ocfs2_unblock_lock() {
>>   3)   0.269 us    |      ocfs2_prepare_cancel_convert();
>>   3)               |      ocfs2_cancel_convert() {
>>   3)               |        user_dlm_unlock() {
>>   3)               |          dlm_unlock() {
>>   3)   0.321 us    |            dlm_find_lockspace_local();
>>   3)   0.286 us    |            find_lkb();
>>   3)               |            cancel_lock() {
>>   3)               |              validate_unlock_args() {
>>   3)   0.122 us    |                del_timeout();
>>   3)   0.901 us    |              }
>>   3)               |              _cancel_lock() {
>>   3)               |                do_cancel() {
>>   3)               |                  revert_lock() {
>>   3)               |                    move_lkb() {
>>   3)   0.155 us    |                      del_lkb();
>>   3)   0.243 us    |                      add_lkb();
>>   3)   1.778 us    |                    }
>>   3)   2.577 us    |                  }
>>   3)               |                  queue_cast() {
>>   3)   0.102 us    |                    del_timeout();
>>   3)               |                    dlm_add_ast() {
>>   3)   0.165 us    |                      dlm_add_lkb_callback();
>>   3) + 14.492 us   |                    }
>>   3) + 16.381 us   |                  }
>>   3) + 20.384 us   |                }
>>   3)               |                grant_pending_locks() {
>>   3)               |                  grant_pending_convert() {
>>   3)               |                    can_be_granted() {
>>   3)   0.143 us    |                      _can_be_granted();
>>   3)   0.906 us    |                    }
>>   3)   1.900 us    |                  }
>>   3)   2.738 us    |                }
>>   3) + 24.670 us   |              }
>>   3)   0.154 us    |              put_rsb();
>>   3) + 28.068 us   |            }
>>   3)               |            dlm_put_lkb() {
>>   3)   0.163 us    |              __put_lkb();
>>   3)   1.029 us    |            }
>>   3)   0.195 us    |            dlm_put_lockspace();
>>   3) + 34.035 us   |          }
>>   3) + 34.914 us   |        }
>>   3) + 35.919 us   |      }
>>   3) + 37.864 us   |    }
>>   3)               |    ocfs2_schedule_blocked_lock() {
>>   3)   0.210 us    |      lockres_set_flags();
>>   0)               |  process_asts() {
>>   3)   0.998 us    |    }
>>   0)   0.215 us    |    dlm_rem_lkb_callback();
>>   3) + 40.671 us   |  }
>>   0)   0.084 us    |    dlm_rem_lkb_callback();
>>   0)               |    fsdlm_lock_ast_wrapper() {
>>   0)               |      ocfs2_unlock_ast() {
>>   0)   0.088 us    |        ocfs2_get_inode_osb();
>>   0)   9.498 us    |        ocfs2_wake_downconvert_thread();
>>   0)               |        lockres_clear_flags() {
>>   0)   1.272 us    |          lockres_set_flags();
>>   0)   1.757 us    |        }
>>   0) + 13.396 us   |      }
>>   0) + 13.983 us   |    }
>>   0)               |    dlm_put_lkb() {
>>   0)   0.136 us    |      __put_lkb();
>>   0)   0.641 us    |    }
>>   0) + 17.224 us   |  }
>>
>>
>> Thanks,
>> Eric
>> On 11/24/15 18:02, Eric Ren wrote:
>>> Hi Joseph,
>>>
>>> I used ftrace's function tracer to record some code flow. There's a question that confuses me -
>>> why is ocfs2_cancel_convert() called here in the ocfs2dc thread? In other words, what do we expect it
>>> to do here?
>>>
>>> ocfs2_unblock_lock() {
>>>       ...
>>>       if (lockres->l_flags & OCFS2_LOCK_BUSY) {
>>>           ...
>>>           ocfs2_cancel_convert();
>>>           ...
>>>       }
>>> }
>>>
>>> From what I understand, ocfs2_cancel_convert()->ocfs2_dlm_unlock()->user_dlm_unlock()->dlm_unlock(DLM_LKF_CANCEL) puts
>>> the lock back on the grant queue at its old granted mode. In my case, you know, reading/writing the same shared file from two nodes,
>>> I think the up-conversion can only happen on the writing node (PR->EX), while on the reading node no up-conversion is needed, right?
>>>
>>> But the following output from the writing and reading nodes shows that ocfs2_cancel_convert() has been called on both nodes. Why could
>>> this happen in this scenario?
>>>
>>> On 11/16/15 09:40, Joseph Qi wrote:
>>>>> Sorry, I'm confused about b). You mean b) is also part of ocfs2cmt's
>>>>> work? Does b) have something to do with a)? And what's the meaning of "evict inode"?
>>>>> Actually, I can hardly understand the idea of b).
>>>> You can go through the code flow:
>>>> iput->iput_final->evict->evict_inode->ocfs2_evict_inode
>>>> ->ocfs2_clear_inode->ocfs2_checkpoint_inode->ocfs2_start_checkpoint
>>>>
>>>> It happens when a node no longer uses the inode (but has not deleted
>>>> it), and it will then free the inode's related lockres.
>>> OK, thanks~
>>>
>>> Eric
>
>


end of thread, other threads:[~2015-11-26  1:49 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-12  3:05 [Ocfs2-devel] Long io response time doubt Joseph Qi
2015-11-12  7:23 ` Eric Ren
2015-11-12  8:00   ` Joseph Qi
2015-11-12  9:48     ` Eric Ren
2015-11-13  3:31       ` Joseph Qi
2015-11-14  5:23         ` Eric Ren
2015-11-16  1:40           ` Joseph Qi
2015-11-24 10:02             ` Eric Ren
2015-11-24 10:05               ` Eric Ren
2015-11-26  1:34                 ` Joseph Qi
2015-11-26  1:49                   ` Eric Ren
