* Strange mmap_lock tracepoint
@ 2021-08-31 20:40 Liam Howlett
  2021-09-01 10:52 ` Vlastimil Babka
  0 siblings, 1 reply; 4+ messages in thread
From: Liam Howlett @ 2021-08-31 20:40 UTC (permalink / raw)
  To: linux-mm; +Cc: Michel Lespinasse


Hello,

I've been trying to trace the mmap_lock calls using tracepoints and
captured this behaviour which I cannot explain.  This was with the maple
tree v2 patches running ebizzy with multiple threads in an x86_64 KVM
using 8 virtual CPUs.

AFAICT, there are zero callers that use the mmap_lock directly besides a
prefetchw(&mm->mmap_lock);
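
For reference, the mmap_lock_* events come from the tracepoint wrappers
in include/linux/mmap_lock.h, roughly (paraphrased, not verbatim):

static inline void mmap_read_lock(struct mm_struct *mm)
{
	__mmap_lock_trace_start_locking(mm, false);	/* mmap_lock_start_locking */
	down_read(&mm->mmap_lock);
	__mmap_lock_trace_acquire_returned(mm, false, true);
}

static inline bool mmap_read_trylock(struct mm_struct *mm)
{
	bool ret;

	__mmap_lock_trace_start_locking(mm, false);	/* mmap_lock_start_locking */
	ret = down_read_trylock(&mm->mmap_lock) != 0;
	/* success=false in the trace means this trylock failed */
	__mmap_lock_trace_acquire_returned(mm, false, ret);
	return ret;
}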


          ebizzy-803     [000] ....  5376.655366: lock_release: 00000000de8cf25e lock
          ebizzy-803     [000] ....  5376.655366: lock_release: 00000000b5e38448 ptlock_ptr(page)
          ebizzy-803     [000] ....  5376.655367: lock_release: 000000006b581afd &mm->mmap_lock
          ebizzy-803     [000] ....  5376.655367: mmap_lock_released: mm=000000001de7b122 memcg_path= write=false
 
          ebizzy-803     [000] ....  5376.655369: mmap_lock_start_locking: mm=000000001de7b122 memcg_path= write=false
 
          ebizzy-803     [000] ....  5376.655369: mmap_lock_acquire_returned: mm=000000001de7b122 memcg_path= write=false success=false
 
          ebizzy-803     [000] ....  5376.655369: mmap_lock_start_locking: mm=000000001de7b122 memcg_path= write=false
 
          ebizzy-803     [000] ....  5376.655369: lock_acquire: 000000006b581afd read &mm->mmap_lock
          ebizzy-803     [000] d...  5376.655369: lock_acquire: 00000000984d533e &sem->wait_lock
          ebizzy-803     [000] d...  5376.655370: lock_release: 00000000984d533e &sem->wait_lock
          ebizzy-803     [000] d...  5376.655370: lock_acquire: 00000000cbb1a456 &rq->__lock
          ebizzy-803     [000] d...  5376.655371: lock_acquire: 00000000b3d53a93 read rcu_read_lock
          ebizzy-803     [000] d...  5376.655371: lock_release: 00000000b3d53a93 rcu_read_lock
          ebizzy-803     [000] d...  5376.655371: lock_acquire: 00000000b3d53a93 read rcu_read_lock
          ebizzy-803     [000] d...  5376.655371: lock_release: 00000000b3d53a93 rcu_read_lock
          ebizzy-803     [000] d...  5376.655372: lock_acquire: 00000000b3d53a93 read rcu_read_lock
          ebizzy-803     [000] d...  5376.655372: lock_release: 00000000b3d53a93 rcu_read_lock
          ebizzy-803     [000] d...  5376.655372: lock_release: 00000000cbb1a456 &rq->__lock
          <idle>-0       [000] d...  5376.655373: lock_acquire: 00000000cbb1a456 &rq->__lock
          <idle>-0       [000] d...  5376.655373: lock_release: 00000000cbb1a456 &rq->__lock
          <idle>-0       [000] d...  5376.655373: lock_acquire: 00000000bb7b405b read tk_core.seq.seqcount
          <idle>-0       [000] d...  5376.655374: lock_release: 00000000bb7b405b tk_core.seq.seqcount
          <idle>-0       [000] d...  5376.655374: lock_acquire: 000000007c3ac380 read jiffies_seq.seqcount
          <idle>-0       [000] d...  5376.655374: lock_release: 000000007c3ac380 jiffies_seq.seqcount
          <idle>-0       [000] d.h.  5376.655460: lock_acquire: 00000000bb7b405b read tk_core.seq.seqcount
          <idle>-0       [000] d.h.  5376.655460: lock_release: 00000000bb7b405b tk_core.seq.seqcount
          <idle>-0       [000] d.h.  5376.655460: lock_acquire: 00000000f7c85b8f hrtimer_bases.lock
          <idle>-0       [000] d.h.  5376.655461: lock_acquire: 00000000bb7b405b read tk_core.seq.seqcount
          <idle>-0       [000] d.h.  5376.655461: lock_release: 00000000bb7b405b tk_core.seq.seqcount
          <idle>-0       [000] d.h.  5376.655461: lock_release: 00000000f7c85b8f hrtimer_bases.lock
          <idle>-0       [000] d.h.  5376.655462: lock_acquire: 00000000bb7b405b read tk_core.seq.seqcount
          <idle>-0       [000] d.h.  5376.655462: lock_release: 00000000bb7b405b tk_core.seq.seqcount
          <idle>-0       [000] d.h.  5376.655462: lock_acquire: 00000000cbb1a456 &rq->__lock
          <idle>-0       [000] d.h.  5376.655463: lock_release: 00000000cbb1a456 &rq->__lock
          <idle>-0       [000] d.h.  5376.655463: lock_acquire: 00000000f7c85b8f hrtimer_bases.lock
          <idle>-0       [000] d.h.  5376.655464: lock_release: 00000000f7c85b8f hrtimer_bases.lock
          <idle>-0       [000] d.h.  5376.655464: lock_acquire: 00000000bb7b405b read tk_core.seq.seqcount
          <idle>-0       [000] d.h.  5376.655464: lock_release: 00000000bb7b405b tk_core.seq.seqcount
          <idle>-0       [000] d...  5376.655466: lock_acquire: 00000000bb7b405b read tk_core.seq.seqcount
          <idle>-0       [000] d...  5376.655466: lock_release: 00000000bb7b405b tk_core.seq.seqcount
          <idle>-0       [000] d...  5376.655467: lock_acquire: 000000007c3ac380 read jiffies_seq.seqcount
          <idle>-0       [000] d...  5376.655467: lock_release: 000000007c3ac380 jiffies_seq.seqcount
          <idle>-0       [000] d.h.  5376.655488: lock_acquire: 00000000bb7b405b read tk_core.seq.seqcount
          <idle>-0       [000] d.h.  5376.655489: lock_release: 00000000bb7b405b tk_core.seq.seqcount
          <idle>-0       [000] d.h.  5376.655489: lock_acquire: 00000000cbb1a456 &rq->__lock
          <idle>-0       [000] dNh.  5376.655490: lock_release: 00000000cbb1a456 &rq->__lock
          <idle>-0       [000] dN..  5376.655490: lock_acquire: 00000000cbb1a456 &rq->__lock
          <idle>-0       [000] d...  5376.655491: lock_release: 00000000cbb1a456 &rq->__lock
          ebizzy-803     [000] d...  5376.655491: lock_acquire: 00000000cbb1a456 &rq->__lock
          ebizzy-803     [000] d...  5376.655492: lock_release: 00000000cbb1a456 &rq->__lock
          ebizzy-803     [000] ....  5376.655492: mmap_lock_acquire_returned: mm=000000001de7b122 memcg_path= write=false success=true

The following tracepoints were enabled:
  events/mmap_lock/enable
  events/lock/enable

My reading of the above trace is that the ebizzy thread dropped the lock,
immediately attempted to reacquire it, detected contention, and so added
itself to the wait list and went to sleep, only to wake up and get the
lock?

Doesn't that mean the mmap_lock is _not_ contended?  The lack of
contention makes sense since there is no tracepoint of an attempt to
acquire the lock between the dropping of the lock and the same MM waking
up to get the lock.
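
The &sem->wait_lock and &rq->__lock events in the trace above are
consistent with the rwsem read slowpath putting the thread on the wait
list and scheduling it out; a heavily abridged paraphrase of
rwsem_down_read_slowpath() from kernel/locking/rwsem.c (not verbatim):

	struct rwsem_waiter waiter;

	waiter.task = current;
	waiter.type = RWSEM_WAITING_FOR_READ;

	raw_spin_lock_irq(&sem->wait_lock);		/* &sem->wait_lock */
	list_add_tail(&waiter.list, &sem->wait_list);
	raw_spin_unlock_irq(&sem->wait_lock);

	/* wait to be handed the lock */
	for (;;) {
		set_current_state(TASK_UNINTERRUPTIBLE);
		if (!smp_load_acquire(&waiter.task))	/* woken by rwsem_mark_wake() */
			break;
		schedule();	/* &rq->__lock and the switch to <idle>-0 */
	}
	__set_current_state(TASK_RUNNING);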

What is even stranger is if I change the value of _Q_PENDING_LOOPS from
(1 << 9) to (1 << 10), the benchmark performs better.

From the above trace and the effects of the _Q_PENDING_LOOPS change, it
looks like the lock is somehow remaining in a state that causes a
failure to acquire the lock even when it is not held or contended?
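
For context, _Q_PENDING_LOOPS only bounds the spin in the qspinlock
slowpath that waits for a pending->locked hand-over before falling back
to queueing; roughly (paraphrased from arch/x86/include/asm/qspinlock.h
and kernel/locking/qspinlock.c, not verbatim):

/* x86 default */
#define _Q_PENDING_LOOPS	(1 << 9)

/*
 * In queued_spin_lock_slowpath(): wait a bounded number of iterations
 * for an in-progress pending->locked hand-over before queueing.
 */
if (val == _Q_PENDING_VAL) {
	int cnt = _Q_PENDING_LOOPS;
	val = atomic_cond_read_relaxed(&lock->val,
				       (VAL != _Q_PENDING_VAL) || !cnt--);
}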


Thanks,
Liam


* Re: Strange mmap_lock tracepoint
  2021-08-31 20:40 Strange mmap_lock tracepoint Liam Howlett
@ 2021-09-01 10:52 ` Vlastimil Babka
  2021-09-01 17:05   ` Liam Howlett
  0 siblings, 1 reply; 4+ messages in thread
From: Vlastimil Babka @ 2021-09-01 10:52 UTC (permalink / raw)
  To: Liam Howlett, linux-mm; +Cc: Michel Lespinasse

On 8/31/21 22:40, Liam Howlett wrote:
> 
> Hello,
> 
> I've been trying to trace the mmap_lock calls using tracepoints and
> captured this behaviour which I cannot explain.  This was with the maple
> tree v2 patches running ebizzy with multiple threads in an x86_64 KVM
> using 8 virtual CPUs.
> 
> AFAICT, there are zero callers that use the mmap_lock directly besides a
> prefetchw(&mm->mmap_lock);
> 
> 
>           ebizzy-803     [000] ....  5376.655366: lock_release: 00000000de8cf25e lock
>           ebizzy-803     [000] ....  5376.655366: lock_release: 00000000b5e38448 ptlock_ptr(page)
>           ebizzy-803     [000] ....  5376.655367: lock_release: 000000006b581afd &mm->mmap_lock
>           ebizzy-803     [000] ....  5376.655367: mmap_lock_released: mm=000000001de7b122 memcg_path= write=false
>  
>           ebizzy-803     [000] ....  5376.655369: mmap_lock_start_locking: mm=000000001de7b122 memcg_path= write=false
>  
>           ebizzy-803     [000] ....  5376.655369: mmap_lock_acquire_returned: mm=000000001de7b122 memcg_path= write=false success=false
>  
>           ebizzy-803     [000] ....  5376.655369: mmap_lock_start_locking: mm=000000001de7b122 memcg_path= write=false

I was thinking khugepaged interference, but this seems to be system-wide
tracing (judging from <idle>-0), so you would see khugepaged too, right?

On the other hand, it seems strange to have a long list of just cpu 0 here.
Are other CPUs missing, or is the interleaving just imperfect because
timestamps are not perfectly in sync between CPUs, and in fact another
CPU took the lock?

>           ebizzy-803     [000] ....  5376.655369: lock_acquire: 000000006b581afd read &mm->mmap_lock
>           ebizzy-803     [000] d...  5376.655369: lock_acquire: 00000000984d533e &sem->wait_lock
>           ebizzy-803     [000] d...  5376.655370: lock_release: 00000000984d533e &sem->wait_lock
>           ebizzy-803     [000] d...  5376.655370: lock_acquire: 00000000cbb1a456 &rq->__lock
>           ebizzy-803     [000] d...  5376.655371: lock_acquire: 00000000b3d53a93 read rcu_read_lock
>           ebizzy-803     [000] d...  5376.655371: lock_release: 00000000b3d53a93 rcu_read_lock
>           ebizzy-803     [000] d...  5376.655371: lock_acquire: 00000000b3d53a93 read rcu_read_lock
>           ebizzy-803     [000] d...  5376.655371: lock_release: 00000000b3d53a93 rcu_read_lock
>           ebizzy-803     [000] d...  5376.655372: lock_acquire: 00000000b3d53a93 read rcu_read_lock
>           ebizzy-803     [000] d...  5376.655372: lock_release: 00000000b3d53a93 rcu_read_lock
>           ebizzy-803     [000] d...  5376.655372: lock_release: 00000000cbb1a456 &rq->__lock
>           <idle>-0       [000] d...  5376.655373: lock_acquire: 00000000cbb1a456 &rq->__lock
>           <idle>-0       [000] d...  5376.655373: lock_release: 00000000cbb1a456 &rq->__lock
>           <idle>-0       [000] d...  5376.655373: lock_acquire: 00000000bb7b405b read tk_core.seq.seqcount
>           <idle>-0       [000] d...  5376.655374: lock_release: 00000000bb7b405b tk_core.seq.seqcount
>           <idle>-0       [000] d...  5376.655374: lock_acquire: 000000007c3ac380 read jiffies_seq.seqcount
>           <idle>-0       [000] d...  5376.655374: lock_release: 000000007c3ac380 jiffies_seq.seqcount
>           <idle>-0       [000] d.h.  5376.655460: lock_acquire: 00000000bb7b405b read tk_core.seq.seqcount
>           <idle>-0       [000] d.h.  5376.655460: lock_release: 00000000bb7b405b tk_core.seq.seqcount
>           <idle>-0       [000] d.h.  5376.655460: lock_acquire: 00000000f7c85b8f hrtimer_bases.lock
>           <idle>-0       [000] d.h.  5376.655461: lock_acquire: 00000000bb7b405b read tk_core.seq.seqcount
>           <idle>-0       [000] d.h.  5376.655461: lock_release: 00000000bb7b405b tk_core.seq.seqcount
>           <idle>-0       [000] d.h.  5376.655461: lock_release: 00000000f7c85b8f hrtimer_bases.lock
>           <idle>-0       [000] d.h.  5376.655462: lock_acquire: 00000000bb7b405b read tk_core.seq.seqcount
>           <idle>-0       [000] d.h.  5376.655462: lock_release: 00000000bb7b405b tk_core.seq.seqcount
>           <idle>-0       [000] d.h.  5376.655462: lock_acquire: 00000000cbb1a456 &rq->__lock
>           <idle>-0       [000] d.h.  5376.655463: lock_release: 00000000cbb1a456 &rq->__lock
>           <idle>-0       [000] d.h.  5376.655463: lock_acquire: 00000000f7c85b8f hrtimer_bases.lock
>           <idle>-0       [000] d.h.  5376.655464: lock_release: 00000000f7c85b8f hrtimer_bases.lock
>           <idle>-0       [000] d.h.  5376.655464: lock_acquire: 00000000bb7b405b read tk_core.seq.seqcount
>           <idle>-0       [000] d.h.  5376.655464: lock_release: 00000000bb7b405b tk_core.seq.seqcount
>           <idle>-0       [000] d...  5376.655466: lock_acquire: 00000000bb7b405b read tk_core.seq.seqcount
>           <idle>-0       [000] d...  5376.655466: lock_release: 00000000bb7b405b tk_core.seq.seqcount
>           <idle>-0       [000] d...  5376.655467: lock_acquire: 000000007c3ac380 read jiffies_seq.seqcount
>           <idle>-0       [000] d...  5376.655467: lock_release: 000000007c3ac380 jiffies_seq.seqcount
>           <idle>-0       [000] d.h.  5376.655488: lock_acquire: 00000000bb7b405b read tk_core.seq.seqcount
>           <idle>-0       [000] d.h.  5376.655489: lock_release: 00000000bb7b405b tk_core.seq.seqcount
>           <idle>-0       [000] d.h.  5376.655489: lock_acquire: 00000000cbb1a456 &rq->__lock
>           <idle>-0       [000] dNh.  5376.655490: lock_release: 00000000cbb1a456 &rq->__lock
>           <idle>-0       [000] dN..  5376.655490: lock_acquire: 00000000cbb1a456 &rq->__lock
>           <idle>-0       [000] d...  5376.655491: lock_release: 00000000cbb1a456 &rq->__lock
>           ebizzy-803     [000] d...  5376.655491: lock_acquire: 00000000cbb1a456 &rq->__lock
>           ebizzy-803     [000] d...  5376.655492: lock_release: 00000000cbb1a456 &rq->__lock
>           ebizzy-803     [000] ....  5376.655492: mmap_lock_acquire_returned: mm=000000001de7b122 memcg_path= write=false success=true
> 
> The following tracepoints were enabled:
>   events/mmap_lock/enable
>   events/lock/enable
> 
> My reading of the above trace is that the ebizzy thread dropped the lock,
> immediately attempted to reacquire it, detected contention, and so added
> itself to the wait list and went to sleep, only to wake up and get the
> lock?

Which path does this mmap_read_trylock() followed by an immediate
mmap_read_lock() come from, anyway?  I mean, down_read() is implemented
like this internally, but that wouldn't generate these mmap_lock_ events.

> Doesn't that mean the mmap_lock is _not_ contended?  The lack of
> contention makes sense since there is no tracepoint of an attempt to
> acquire the lock between the dropping of the lock and the same MM waking
> up to get the lock.
> 
> What is even stranger is if I change the value of _Q_PENDING_LOOPS from
> (1 << 9) to (1 << 10), the benchmark performs better.
> 
> From the above trace and the effects of the _Q_PENDING_LOOPS change, it
> looks like the lock is somehow remaining in a state that causes a
> failure to acquire the lock even when it is not held or contended?
> 
> 
> Thanks,
> Liam
> 




* Re: Strange mmap_lock tracepoint
  2021-09-01 10:52 ` Vlastimil Babka
@ 2021-09-01 17:05   ` Liam Howlett
  2021-09-03  1:57     ` Liam Howlett
  0 siblings, 1 reply; 4+ messages in thread
From: Liam Howlett @ 2021-09-01 17:05 UTC (permalink / raw)
  To: Vlastimil Babka; +Cc: linux-mm, Michel Lespinasse

* Vlastimil Babka <vbabka@suse.cz> [210901 06:52]:
> On 8/31/21 22:40, Liam Howlett wrote:
> > 
> > Hello,
> > 
> > I've been trying to trace the mmap_lock calls using tracepoints and
> > captured this behaviour which I cannot explain.  This was with the maple
> > tree v2 patches running ebizzy with multiple threads in an x86_64 KVM
> > using 8 virtual CPUs.
> > 
> > AFAICT, there are zero callers that use the mmap_lock directly besides a
> > prefetchw(&mm->mmap_lock);
> > 
> > 
> >           ebizzy-803     [000] ....  5376.655366: lock_release: 00000000de8cf25e lock
> >           ebizzy-803     [000] ....  5376.655366: lock_release: 00000000b5e38448 ptlock_ptr(page)
> >           ebizzy-803     [000] ....  5376.655367: lock_release: 000000006b581afd &mm->mmap_lock
> >           ebizzy-803     [000] ....  5376.655367: mmap_lock_released: mm=000000001de7b122 memcg_path= write=false
> >  
> >           ebizzy-803     [000] ....  5376.655369: mmap_lock_start_locking: mm=000000001de7b122 memcg_path= write=false
> >  
> >           ebizzy-803     [000] ....  5376.655369: mmap_lock_acquire_returned: mm=000000001de7b122 memcg_path= write=false success=false
> >  
> >           ebizzy-803     [000] ....  5376.655369: mmap_lock_start_locking: mm=000000001de7b122 memcg_path= write=false
> 
> I was thinking khugepaged interference, but this seems to be system-wide
> tracing (judging from <idle>-0), so you would see khugepaged too, right?

Correct.  All calls will go through Michel's tracepoints from what I can
tell.

> 
> On the other hand, it seems strange to have a long list of just cpu 0 here.
> Are other CPUs missing, or is the interleaving just imperfect because
> timestamps are not perfectly in sync between CPUs, and in fact another
> CPU took the lock?

Yes, it could be the clock.  I had used the default clock, which is local
to each CPU.  I was looking for documentation under 'time stamp' but should
have looked for 'clock'.  I've re-run the test with the counter clock, which
should remove the potential for incorrect ordering.  I've also imported the
output into a spreadsheet and sorted by the counter to ensure that
interleaved printing isn't causing issues.  This is why the output below
has slightly different whitespace than the original.

ebizzy-1437    [003] ....	83650150	: mmap_lock_released: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437    [003] ....	83650157	: mmap_lock_start_locking: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437    [003] ....	83650158	: mmap_lock_acquire_returned: mm=00000000ce5bd903 memcg_path= write=false success=false
ebizzy-1437    [003] ....	83650159	: mmap_lock_start_locking: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437    [003] ....	83650160	: lock_acquire: 0000000055bc357b read &mm->mmap_lock
ebizzy-1437    [003] d...	83650161	: lock_acquire: 00000000b3904b60 &sem->wait_lock
ebizzy-1437    [003] d...	83650163	: lock_release: 00000000b3904b60 &sem->wait_lock
ebizzy-1437    [003] d...	83650164	: lock_acquire: 00000000b5963f30 &rq->__lock
ebizzy-1437    [003] d...	83650168	: lock_acquire: 000000002d23d506 read rcu_read_lock
ebizzy-1437    [003] d...	83650169	: lock_release: 000000002d23d506 rcu_read_lock
ebizzy-1437    [003] d...	83650171	: lock_acquire: 000000002d23d506 read rcu_read_lock
ebizzy-1437    [003] d...	83650173	: lock_release: 000000002d23d506 rcu_read_lock
ebizzy-1437    [003] d...	83650174	: lock_acquire: 000000002d23d506 read rcu_read_lock
ebizzy-1437    [003] d...	83650175	: lock_release: 000000002d23d506 rcu_read_lock
ebizzy-1437    [003] d...	83650178	: lock_release: 00000000b5963f30 &rq->__lock
<idle>-0       [003] d...	83650181	: lock_acquire: 00000000b5963f30 &rq->__lock
<idle>-0       [003] d...	83650182	: lock_release: 00000000b5963f30 &rq->__lock
<idle>-0       [003] d...	83650183	: lock_acquire: 00000000b152519a read tk_core.seq.seqcount
<idle>-0       [003] d...	83650184	: lock_release: 00000000b152519a tk_core.seq.seqcount
<idle>-0       [003] d...	83650185	: lock_acquire: 00000000f5623f3e read jiffies_seq.seqcount
<idle>-0       [003] d...	83650186	: lock_release: 00000000f5623f3e jiffies_seq.seqcount
<idle>-0       [003] d...	83650187	: lock_acquire: 000000006705bb4e &base->lock
<idle>-0       [003] d...	83650188	: lock_release: 000000006705bb4e &base->lock
<idle>-0       [003] d...	83650189	: lock_acquire: 0000000009614e04 hrtimer_bases.lock
<idle>-0       [003] d...	83650190	: lock_release: 0000000009614e04 hrtimer_bases.lock
<idle>-0       [003] d...	83650191	: lock_acquire: 00000000b152519a read tk_core.seq.seqcount
<idle>-0       [003] d...	83650192	: lock_release: 00000000b152519a tk_core.seq.seqcount
<idle>-0       [003] d...	83650193	: lock_acquire: 0000000009614e04 hrtimer_bases.lock
<idle>-0       [003] d...	83650194	: lock_acquire: 00000000b152519a read tk_core.seq.seqcount
<idle>-0       [003] d...	83650195	: lock_release: 00000000b152519a tk_core.seq.seqcount
<idle>-0       [003] d...	83650196	: lock_acquire: 00000000b152519a read tk_core.seq.seqcount
<idle>-0       [003] d...	83650197	: lock_release: 00000000b152519a tk_core.seq.seqcount
<idle>-0       [003] d...	83650198	: lock_release: 0000000009614e04 hrtimer_bases.lock
<idle>-0       [003] d.h.	83651382	: lock_acquire: 00000000b152519a read tk_core.seq.seqcount
<idle>-0       [003] d.h.	83651386	: lock_release: 00000000b152519a tk_core.seq.seqcount
<idle>-0       [003] d.h.	83651391	: lock_acquire: 00000000b5963f30 &rq->__lock
<idle>-0       [003] dNh.	83651403	: lock_release: 00000000b5963f30 &rq->__lock
<idle>-0       [003] dN..	83651406	: lock_acquire: 00000000b152519a read tk_core.seq.seqcount
<idle>-0       [003] dN..	83651409	: lock_release: 00000000b152519a tk_core.seq.seqcount
<idle>-0       [003] dN..	83651411	: lock_acquire: 0000000009614e04 hrtimer_bases.lock
<idle>-0       [003] dN..	83651415	: lock_acquire: 00000000b152519a read tk_core.seq.seqcount
<idle>-0       [003] dN..	83651417	: lock_release: 00000000b152519a tk_core.seq.seqcount
<idle>-0       [003] dN..	83651427	: lock_release: 0000000009614e04 hrtimer_bases.lock
<idle>-0       [003] dN..	83651429	: lock_acquire: 0000000009614e04 hrtimer_bases.lock
<idle>-0       [003] dN..	83651431	: lock_acquire: 00000000b152519a read tk_core.seq.seqcount
<idle>-0       [003] dN..	83651433	: lock_release: 00000000b152519a tk_core.seq.seqcount
<idle>-0       [003] dN..	83651441	: lock_release: 0000000009614e04 hrtimer_bases.lock
<idle>-0       [003] dN..	83651443	: lock_acquire: 00000000b5963f30 &rq->__lock
<idle>-0       [003] d...	83651448	: lock_release: 00000000b5963f30 &rq->__lock
ebizzy-1437    [003] d...	83651453	: lock_acquire: 00000000b5963f30 &rq->__lock
ebizzy-1437    [003] d...	83651455	: lock_release: 00000000b5963f30 &rq->__lock
ebizzy-1437    [003] ....	83651458	: mmap_lock_acquire_returned: mm=00000000ce5bd903 memcg_path= write=false success=true
ebizzy-1437    [003] ....	83651462	: lock_acquire: 000000005ada5d35 fs_reclaim
ebizzy-1437    [003] ....	83651464	: lock_acquire: 000000009ff6ca04 mmu_notifier_invalidate_range_start
ebizzy-1437    [003] ....	83651466	: lock_release: 000000009ff6ca04 mmu_notifier_invalidate_range_start
ebizzy-1437    [003] ....	83651467	: lock_release: 000000005ada5d35 fs_reclaim
ebizzy-1437    [003] d...	83651470	: lock_acquire: 0000000002c996e9 lock
ebizzy-1437    [003] d...	83651472	: lock_release: 0000000002c996e9 lock
ebizzy-1437    [003] ....	83651475	: lock_acquire: 00000000aae3fceb ptlock_ptr(page)
ebizzy-1437    [003] ....	83651476	: lock_acquire: 0000000064812dba lock
ebizzy-1437    [003] ....	83651477	: lock_release: 0000000064812dba lock
ebizzy-1437    [003] ....	83651478	: lock_release: 00000000aae3fceb ptlock_ptr(page)
ebizzy-1437    [003] ....	83651479	: lock_release: 0000000055bc357b &mm->mmap_lock
ebizzy-1437    [003] ....	83651480	: mmap_lock_released: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437    [003] ....	83651489	: mmap_lock_start_locking: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437    [003] ....	83651491	: mmap_lock_acquire_returned: mm=00000000ce5bd903 memcg_path= write=false success=false
ebizzy-1437    [003] ....	83651492	: mmap_lock_start_locking: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437    [003] ....	83651494	: lock_acquire: 0000000055bc357b read &mm->mmap_lock
ebizzy-1437    [003] d...	83651496	: lock_acquire: 00000000b3904b60 &sem->wait_lock
ebizzy-1437    [003] d...	83651497	: lock_release: 00000000b3904b60 &sem->wait_lock
ebizzy-1437    [003] d...	83651499	: lock_acquire: 00000000b5963f30 &rq->__lock
ebizzy-1437    [003] d...	83651501	: lock_acquire: 000000002d23d506 read rcu_read_lock
ebizzy-1437    [003] d...	83651504	: lock_release: 000000002d23d506 rcu_read_lock
ebizzy-1437    [003] d...	83651505	: lock_acquire: 000000002d23d506 read rcu_read_lock
ebizzy-1437    [003] d...	83651507	: lock_release: 000000002d23d506 rcu_read_lock
ebizzy-1437    [003] d...	83651510	: lock_acquire: 000000002d23d506 read rcu_read_lock
ebizzy-1437    [003] d...	83651512	: lock_release: 000000002d23d506 rcu_read_lock
ebizzy-1437    [003] d...	83651515	: lock_release: 00000000b5963f30 &rq->__lock
<idle>-0       [003] d...	83651517	: lock_acquire: 00000000b5963f30 &rq->__lock
<idle>-0       [003] d...	83651520	: lock_release: 00000000b5963f30 &rq->__lock

Please note that this is now next-20210811 directly.  No maple tree
patches.


As for the grouping of each thread, I don't really have an answer.
There are places in the trace which do have different CPUs and threads
running interleaved:

ebizzy-1445    [000] ....	83693711	: lock_release: 0000000055bc357b &mm->mmap_lock
ebizzy-1445    [000] ....	83693715	: mmap_lock_released: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437    [003] ....	83693727	: mmap_lock_start_locking: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437    [003] ....	83693730	: lock_acquire: 0000000055bc357b try read &mm->mmap_lock
ebizzy-1437    [003] ....	83693733	: mmap_lock_acquire_returned: mm=00000000ce5bd903 memcg_path= write=false success=true
ebizzy-1437    [003] ....	83693738	: lock_acquire: 000000005ada5d35 fs_reclaim
ebizzy-1437    [003] ....	83693742	: lock_acquire: 000000009ff6ca04 mmu_notifier_invalidate_range_start
ebizzy-1445    [000] ....	83693743	: mmap_lock_start_locking: mm=00000000ce5bd903 memcg_path= write=false
ebizzy-1437    [003] ....	83693747	: lock_release: 000000009ff6ca04 mmu_notifier_invalidate_range_start


>  <removed old log>
> > 
> > The following tracepoints were enabled:
> >   events/mmap_lock/enable
> >   events/lock/enable
> > 
> > My reading of the above trace is that the ebizzy thread dropped the lock,
> > immediately attempted to reacquire it, detected contention, and so added
> > itself to the wait list and went to sleep, only to wake up and get the
> > lock?
> 
> Which path does this mmap_read_trylock() followed by an immediate
> mmap_read_lock() come from, anyway?  I mean, down_read() is implemented
> like this internally, but that wouldn't generate these mmap_lock_ events.

I think it's do_user_addr_fault()?
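
The trylock-then-lock pattern there looks roughly like this (paraphrased
from do_user_addr_fault() in arch/x86/mm/fault.c, not verbatim):

	if (unlikely(!mmap_read_trylock(mm))) {
		if (!user_mode(regs) && !search_exception_tables(regs->ip)) {
			/* kernel fault with no exception fixup */
			bad_area_nosemaphore(regs, error_code, address);
			return;
		}
retry:
		mmap_read_lock(mm);
	} else {
		/* trylock succeeded; make up for the missed might_sleep() */
		might_sleep();
	}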

> 
> > Doesn't that mean the mmap_lock is _not_ contended?  The lack of
> > contention makes sense since there is no tracepoint of an attempt to
> > acquire the lock between the dropping of the lock and the same MM waking
> > up to get the lock.
> > 
> > What is even stranger is if I change the value of _Q_PENDING_LOOPS from
> > (1 << 9) to (1 << 10), the benchmark performs better.
> > 
> > From the above trace and the effects of the _Q_PENDING_LOOPS change, it
> > looks like the lock is somehow remaining in a state that causes a
> > failure to acquire the lock even when it is not held or contended?

So the new log is with trace_clock set to counter, on next-20210811,
running ./ebizzy -m (mmap only) in a KVM guest with 8 CPUs.

Thanks,
Liam


* Re: Strange mmap_lock tracepoint
  2021-09-01 17:05   ` Liam Howlett
@ 2021-09-03  1:57     ` Liam Howlett
  0 siblings, 0 replies; 4+ messages in thread
From: Liam Howlett @ 2021-09-03  1:57 UTC (permalink / raw)
  To: Vlastimil Babka; +Cc: linux-mm, Michel Lespinasse

* Liam Howlett <liam.howlett@oracle.com> [210901 13:05]:
> * Vlastimil Babka <vbabka@suse.cz> [210901 06:52]:
> > On 8/31/21 22:40, Liam Howlett wrote:
> > > 
> > > Hello,
> > > 
> > > I've been trying to trace the mmap_lock calls using tracepoints and
> > > captured this behaviour which I cannot explain.  This was with the maple
> > > tree v2 patches running ebizzy with multiple threads in an x86_64 KVM
> > > using 8 virtual CPUs.
> > > 
> > > AFAICT, there are zero callers that use the mmap_lock directly besides a
> > > prefetchw(&mm->mmap_lock);
> > > 
> > > 
> > >           ebizzy-803     [000] ....  5376.655366: lock_release: 00000000de8cf25e lock
> > >           ebizzy-803     [000] ....  5376.655366: lock_release: 00000000b5e38448 ptlock_ptr(page)
> > >           ebizzy-803     [000] ....  5376.655367: lock_release: 000000006b581afd &mm->mmap_lock
> > >           ebizzy-803     [000] ....  5376.655367: mmap_lock_released: mm=000000001de7b122 memcg_path= write=false
> > >  
> > >           ebizzy-803     [000] ....  5376.655369: mmap_lock_start_locking: mm=000000001de7b122 memcg_path= write=false
> > >  
> > >           ebizzy-803     [000] ....  5376.655369: mmap_lock_acquire_returned: mm=000000001de7b122 memcg_path= write=false success=false
> > >  
> > >           ebizzy-803     [000] ....  5376.655369: mmap_lock_start_locking: mm=000000001de7b122 memcg_path= write=false
> > 
> > I was thinking khugepaged interference, but this seems to be system-wide
> > tracing (judging from <idle>-0), so you would see khugepaged too, right?
> 
> Correct.  All calls will go through Michel's tracepoints from what I can
> tell.
> 
> > 
> > On the other hand, it seems strange to have a long list of just cpu 0 here.
> > Are other CPUs missing, or is the interleaving just imperfect because
> > timestamps are not perfectly in sync between CPUs, and in fact another
> > CPU took the lock?
> 
> Yes, it could be the clock.  I had used the default clock, which is local
> to each CPU.  I was looking for documentation under 'time stamp' but should
> have looked for 'clock'.  I've re-run the test with the counter clock, which
> should remove the potential for incorrect ordering.  I've also imported the
> output into a spreadsheet and sorted by the counter to ensure that
> interleaved printing isn't causing issues.  This is why the output below
> has slightly different whitespace than the original.
> 

<dropping old log since it's not necessary>

> 
> Please note that this is now next-20210811 directly.  No maple tree
> patches.
> 
> 
> As for the grouping of each thread, I don't really have an answer.
> There are places in the trace which do have different CPUs and threads
> running interleaved:
> 
> ebizzy-1445    [000] ....	83693711	: lock_release: 0000000055bc357b &mm->mmap_lock
> ebizzy-1445    [000] ....	83693715	: mmap_lock_released: mm=00000000ce5bd903 memcg_path= write=false
> ebizzy-1437    [003] ....	83693727	: mmap_lock_start_locking: mm=00000000ce5bd903 memcg_path= write=false
> ebizzy-1437    [003] ....	83693730	: lock_acquire: 0000000055bc357b try read &mm->mmap_lock
> ebizzy-1437    [003] ....	83693733	: mmap_lock_acquire_returned: mm=00000000ce5bd903 memcg_path= write=false success=true
> ebizzy-1437    [003] ....	83693738	: lock_acquire: 000000005ada5d35 fs_reclaim
> ebizzy-1437    [003] ....	83693742	: lock_acquire: 000000009ff6ca04 mmu_notifier_invalidate_range_start
> ebizzy-1445    [000] ....	83693743	: mmap_lock_start_locking: mm=00000000ce5bd903 memcg_path= write=false
> ebizzy-1437    [003] ....	83693747	: lock_release: 000000009ff6ca04 mmu_notifier_invalidate_range_start
> 
> 
> >  <removed old log>
> > > 
> > > The following tracepoints were enabled:
> > >   events/mmap_lock/enable
> > >   events/lock/enable
> > > 
> > > My reading of the above trace is that the ebizzy thread dropped the lock,
> > > immediately attempted to reacquire it, detected contention, and so added
> > > itself to the wait list and went to sleep, only to wake up and get the
> > > lock?
> > 
> > Which path does this mmap_read_trylock() followed by an immediate
> > mmap_read_lock() come from, anyway?  I mean, down_read() is implemented
> > like this internally, but that wouldn't generate these mmap_lock_ events.
> 
> I think it's do_user_addr_fault()?
> 
> > 
> > > Doesn't that mean the mmap_lock is _not_ contended?  The lack of
> > > contention makes sense since there is no tracepoint of an attempt to
> > > acquire the lock between the dropping of the lock and the same MM waking
> > > up to get the lock.
> > > 
> > > What is even stranger is if I change the value of _Q_PENDING_LOOPS from
> > > (1 << 9) to (1 << 10), the benchmark performs better.
> > > 
> > > From the above trace and the effects of the _Q_PENDING_LOOPS change, it
> > > looks like the lock is somehow remaining in a state that causes a
> > > failure to acquire the lock even when it is not held or contended?
> 
> So the new log is with trace_clock set to counter, on next-20210811,
> running ./ebizzy -m (mmap only) in a KVM guest with 8 CPUs.

I was able to get to the bottom of what was happening with a lot of
help from Steven Rostedt.  The logs can be explained by the following:

- Sequence counter values may be missing if a given CPU overruns its ring
  buffer.  These gaps are marked by #### CPU N buffer started #### or
  something similar.

- There may also be a jump in the sequence counter caused by interrupts.

- The mmap_lock logging was racy with other threads' logging of
  mmap_lock.  I have sent out a patch [1] to fix the race.

- When downgrading the writer, the log will show that the lock is
  acquired as a read lock (write=false) without a 'start locking' log
  (see the sketch below).

1.  https://lore.kernel.org/linux-mm/20210903013521.1802774-1-Liam.Howlett@oracle.com/
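
The downgrade case comes from mmap_write_downgrade(), roughly
(paraphrased from include/linux/mmap_lock.h, not verbatim):

static inline void mmap_write_downgrade(struct mm_struct *mm)
{
	/* logged as acquire_returned with write=false, but no start_locking */
	__mmap_lock_trace_acquire_returned(mm, false, true);
	downgrade_write(&mm->mmap_lock);
}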


Thanks everyone for the help in tracking down the odd behaviour.  It was
nice to find out it wasn't entirely user error :)

Cheers,
Liam


end of thread, other threads:[~2021-09-03  1:57 UTC | newest]

Thread overview: 4+ messages
2021-08-31 20:40 Strange mmap_lock tracepoint Liam Howlett
2021-09-01 10:52 ` Vlastimil Babka
2021-09-01 17:05   ` Liam Howlett
2021-09-03  1:57     ` Liam Howlett
