linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Memory compaction and mlockall()
@ 2019-07-10 14:41 Sebastian Andrzej Siewior
  2019-07-10 16:20 ` Dave Hansen
  2019-07-10 18:21 ` Yang Shi
  0 siblings, 2 replies; 10+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-07-10 14:41 UTC (permalink / raw)
  To: linux-mm; +Cc: tglx

Hi,

I've been looking at the following trace:
| cyclicte-526     0d...2.. 6876070 603us : finish_task_switch <-__schedule
| cyclicte-526     0....2.. 6876070 605us : preempt_count_sub <-finish_task_switch
| cyclicte-526     0....1.. 6876070 607us : preempt_count_sub <-schedule
| cyclicte-526     0....... 6876070 610us : finish_wait <-put_and_wait_on_page_locked

I see put_and_wait_on_page_locked after schedule and didn't expect that.
cyclictest then blocks on a lock and switches to `kcompact'. Once it
finishes, it switches back to cyclictest:
| cyclicte-526     0....... 6876070 853us : rt_spin_unlock <-put_and_wait_on_page_locked
| cyclicte-526     0....... 6876070 854us : migrate_enable <-rt_spin_unlock
| cyclicte-526     0....... 6876070 860us : up_read <-do_page_fault
| cyclicte-526     0....... 6876070 861us : __up_read <-do_page_fault
| cyclicte-526     0d...... 6876070 867us : do_PrefetchAbort <-ret_from_exception
| cyclicte-526     0d...... 6876070 868us : do_page_fault <-do_PrefetchAbort
| cyclicte-526     0....... 6876070 870us : down_read_trylock <-do_page_fault
| cyclicte-526     0....... 6876070 872us : __down_read_trylock <-do_page_fault
…
| cyclicte-526     0....... 6876070 914us : __up_read <-do_page_fault
| cyclicte-526     0....... 6876070 923us : sys_clock_gettime32 <-ret_fast_syscall
| cyclicte-526     0....... 6876070 925us : posix_ktime_get_ts <-sys_clock_gettime32

I did not expect a pagefault with mlockall(). I assume it has to do with
memory compaction. I have
| CONFIG_COMPACTION=y
| CONFIG_MIGRATION=y

and Kconfig says:
|config COMPACTION
…
|                                                       You shouldn't
|           disable this option unless there really is a strong reason for
|           it and then we would be really interested to hear about that at
|           linux-mm@kvack.org.

Shouldn't COMPACTION avoid touching/moving mlock()ed pages?

Sebastian


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Memory compaction and mlockall()
  2019-07-10 14:41 Memory compaction and mlockall() Sebastian Andrzej Siewior
@ 2019-07-10 16:20 ` Dave Hansen
  2019-07-10 16:45   ` Sebastian Andrzej Siewior
  2019-07-10 18:21 ` Yang Shi
  1 sibling, 1 reply; 10+ messages in thread
From: Dave Hansen @ 2019-07-10 16:20 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, linux-mm; +Cc: tglx

On 7/10/19 7:41 AM, Sebastian Andrzej Siewior wrote:
> I did not expect a pagefault with mlockall(). I assume it has to do with
> memory compaction.

mlock() doesn't technically prevent faults.  From the manpage:

> mlock(),  mlock2(),  and  mlockall()  lock  part  or  all of the
> calling process's virtual address space into RAM, preventing that
> memory from being paged to the swap area.
I read that as basically saying there will be no major faults, but minor
faults are possible.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Memory compaction and mlockall()
  2019-07-10 16:20 ` Dave Hansen
@ 2019-07-10 16:45   ` Sebastian Andrzej Siewior
  2019-07-10 17:02     ` Dave Hansen
  0 siblings, 1 reply; 10+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-07-10 16:45 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-mm, tglx

On 2019-07-10 09:20:47 [-0700], Dave Hansen wrote:
> On 7/10/19 7:41 AM, Sebastian Andrzej Siewior wrote:
> > I did not expect a pagefault with mlockall(). I assume it has to do with
> > memory compaction.
> 
> mlock() doesn't technically prevent faults.  From the manpage:
> 
> > mlock(),  mlock2(),  and  mlockall()  lock  part  or  all of the
> > calling process's virtual address space into RAM, preventing that
> > memory from being paged to the swap area.
> I read that as basically saying there will be no major faults, but minor
> faults are possible.

It says "lock virtual address space into into RAM". I assumed that there
will be no page faults because everything is locked.

The problem (besides the delay caused by the context switch and fix up)
is that a major fault (with might have happened at the same time in
another thread) will block this minor fault even longer.

Sebastian


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Memory compaction and mlockall()
  2019-07-10 16:45   ` Sebastian Andrzej Siewior
@ 2019-07-10 17:02     ` Dave Hansen
  0 siblings, 0 replies; 10+ messages in thread
From: Dave Hansen @ 2019-07-10 17:02 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-mm, tglx

On 7/10/19 9:45 AM, Sebastian Andrzej Siewior wrote:
> It says "lock virtual address space into into RAM". I assumed that there
> will be no page faults because everything is locked.
> 
> The problem (besides the delay caused by the context switch and fix up)
> is that a major fault (with might have happened at the same time in
> another thread) will block this minor fault even longer.

Yeah, I totally agree this behavior can be problematic and
counterintuitive.  Making it better would be nice.

I just wanted to point out that it's probably not strictly required, though.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Memory compaction and mlockall()
  2019-07-10 14:41 Memory compaction and mlockall() Sebastian Andrzej Siewior
  2019-07-10 16:20 ` Dave Hansen
@ 2019-07-10 18:21 ` Yang Shi
  2019-07-11  9:43   ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 10+ messages in thread
From: Yang Shi @ 2019-07-10 18:21 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: Linux MM, Thomas Gleixner

On Wed, Jul 10, 2019 at 7:41 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> Hi,
>
> I've been looking at the following trace:
> | cyclicte-526     0d...2.. 6876070 603us : finish_task_switch <-__schedule
> | cyclicte-526     0....2.. 6876070 605us : preempt_count_sub <-finish_task_switch
> | cyclicte-526     0....1.. 6876070 607us : preempt_count_sub <-schedule
> | cyclicte-526     0....... 6876070 610us : finish_wait <-put_and_wait_on_page_locked
>
> I see put_and_wait_on_page_locked after schedule and didn't expect that.
> cyclictest then blocks on a lock and switches to `kcompact'. Once it
> finishes, it switches back to cyclictest:
> | cyclicte-526     0....... 6876070 853us : rt_spin_unlock <-put_and_wait_on_page_locked
> | cyclicte-526     0....... 6876070 854us : migrate_enable <-rt_spin_unlock
> | cyclicte-526     0....... 6876070 860us : up_read <-do_page_fault
> | cyclicte-526     0....... 6876070 861us : __up_read <-do_page_fault
> | cyclicte-526     0d...... 6876070 867us : do_PrefetchAbort <-ret_from_exception
> | cyclicte-526     0d...... 6876070 868us : do_page_fault <-do_PrefetchAbort
> | cyclicte-526     0....... 6876070 870us : down_read_trylock <-do_page_fault
> | cyclicte-526     0....... 6876070 872us : __down_read_trylock <-do_page_fault
> …
> | cyclicte-526     0....... 6876070 914us : __up_read <-do_page_fault
> | cyclicte-526     0....... 6876070 923us : sys_clock_gettime32 <-ret_fast_syscall
> | cyclicte-526     0....... 6876070 925us : posix_ktime_get_ts <-sys_clock_gettime32
>
> I did not expect a pagefault with mlockall(). I assume it has to do with
> memory compaction. I have
> | CONFIG_COMPACTION=y
> | CONFIG_MIGRATION=y
>
> and Kconfig says:
> |config COMPACTION
> …
> |                                                       You shouldn't
> |           disable this option unless there really is a strong reason for
> |           it and then we would be really interested to hear about that at
> |           linux-mm@kvack.org.
>
> Shouldn't COMPACTION avoid touching/moving mlock()ed pages?

compaction should not isolate unevictable pages unless you have
/proc/sys/vm/compact_unevictable_allowed set.

>
> Sebastian
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Memory compaction and mlockall()
  2019-07-10 18:21 ` Yang Shi
@ 2019-07-11  9:43   ` Sebastian Andrzej Siewior
  2019-07-11 16:23     ` Yang Shi
  2019-08-12 14:59     ` Vlastimil Babka
  0 siblings, 2 replies; 10+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-07-11  9:43 UTC (permalink / raw)
  To: Yang Shi; +Cc: Linux MM, Thomas Gleixner

On 2019-07-10 11:21:19 [-0700], Yang Shi wrote:
> 
> compaction should not isolate unevictable pages unless you have
> /proc/sys/vm/compact_unevictable_allowed set.

Thank you. This is enabled by default. The documentation for this says
| … compaction is allowed to examine the unevictable lru (mlocked pages) for
| pages to compact.…

so it is actually clear once you know where to look.
If I read this correct, the default behavior was to ignore mlock()ed
pages for compaction then commit
  5bbe3547aa3ba ("mm: allow compaction of unevictable pages")

came along in v4.1-rc1 and changed that behaviour. Is it too late to
flip it back?

Sebastian


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Memory compaction and mlockall()
  2019-07-11  9:43   ` Sebastian Andrzej Siewior
@ 2019-07-11 16:23     ` Yang Shi
  2019-08-12 14:59     ` Vlastimil Babka
  1 sibling, 0 replies; 10+ messages in thread
From: Yang Shi @ 2019-07-11 16:23 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: Linux MM, Thomas Gleixner

On Thu, Jul 11, 2019 at 2:43 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2019-07-10 11:21:19 [-0700], Yang Shi wrote:
> >
> > compaction should not isolate unevictable pages unless you have
> > /proc/sys/vm/compact_unevictable_allowed set.
>
> Thank you. This is enabled by default. The documentation for this says
> | … compaction is allowed to examine the unevictable lru (mlocked pages) for
> | pages to compact.…
>
> so it is actually clear once you know where to look.
> If I read this correct, the default behavior was to ignore mlock()ed
> pages for compaction then commit
>   5bbe3547aa3ba ("mm: allow compaction of unevictable pages")

Yes, before this commit compaction doesn't migrate unevictable pages.
But, other types of migration always do.

>
> came along in v4.1-rc1 and changed that behaviour. Is it too late to
> flip it back?

Disabling it via proc knob isn't fine?

>
> Sebastian


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Memory compaction and mlockall()
  2019-07-11  9:43   ` Sebastian Andrzej Siewior
  2019-07-11 16:23     ` Yang Shi
@ 2019-08-12 14:59     ` Vlastimil Babka
  2019-08-12 15:54       ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 10+ messages in thread
From: Vlastimil Babka @ 2019-08-12 14:59 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Yang Shi; +Cc: Linux MM, Thomas Gleixner

On 7/11/19 11:43 AM, Sebastian Andrzej Siewior wrote:
> On 2019-07-10 11:21:19 [-0700], Yang Shi wrote:
>>
>> compaction should not isolate unevictable pages unless you have
>> /proc/sys/vm/compact_unevictable_allowed set.
> 
> Thank you. This is enabled by default. The documentation for this says
> | … compaction is allowed to examine the unevictable lru (mlocked pages) for
> | pages to compact.…
> 
> so it is actually clear once you know where to look.
> If I read this correct, the default behavior was to ignore mlock()ed
> pages for compaction then commit
>   5bbe3547aa3ba ("mm: allow compaction of unevictable pages")
> 
> came along in v4.1-rc1 and changed that behaviour. Is it too late to
> flip it back?

I would say that enabled is a better default wrt benefits for the
majority of systems. This was assuming that mlock() is primarily used to
prevent sensitive data (crypto keys) from hitting swap, not to give
latency guarantees. You could perhaps argue that enabling PREEMPT_RT
might change the default, but it's somewhat subtle.

> Sebastian
> 



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Memory compaction and mlockall()
  2019-08-12 14:59     ` Vlastimil Babka
@ 2019-08-12 15:54       ` Sebastian Andrzej Siewior
  2019-08-13  8:23         ` Vlastimil Babka
  0 siblings, 1 reply; 10+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-08-12 15:54 UTC (permalink / raw)
  To: Vlastimil Babka; +Cc: Yang Shi, Linux MM, Thomas Gleixner

On 2019-08-12 16:59:00 [+0200], Vlastimil Babka wrote:
> On 7/11/19 11:43 AM, Sebastian Andrzej Siewior wrote:
> > On 2019-07-10 11:21:19 [-0700], Yang Shi wrote:
> >>
> >> compaction should not isolate unevictable pages unless you have
> >> /proc/sys/vm/compact_unevictable_allowed set.
> > 
> > Thank you. This is enabled by default. The documentation for this says
> > | … compaction is allowed to examine the unevictable lru (mlocked pages) for
> > | pages to compact.…
> > 
> > so it is actually clear once you know where to look.
> > If I read this correct, the default behavior was to ignore mlock()ed
> > pages for compaction then commit
> >   5bbe3547aa3ba ("mm: allow compaction of unevictable pages")
> > 
> > came along in v4.1-rc1 and changed that behaviour. Is it too late to
> > flip it back?
> 
> I would say that enabled is a better default wrt benefits for the
> majority of systems. This was assuming that mlock() is primarily used to
> prevent sensitive data (crypto keys) from hitting swap, not to give
> latency guarantees. You could perhaps argue that enabling PREEMPT_RT
> might change the default, but it's somewhat subtle.

A different behaviour depending on PREEMPT_RT is bad. 
From the mlock(2) page:

|NOTES
|
|Memory locking has two main applications: real-time algorithms and
|high-security data processing. Real-time applications require deterministic
|timing, and, like scheduling, paging is one major cause of unexpected program
|execution delays. …
|
|Real-time processes that are using mlockall() to prevent delays on page faults
|should reserve enough locked stack pages before entering the time-critical
|section, so that no page fault can be caused by function calls. …

So if we are not going to revert that, then I would need to update man
page to reflect that we now have an additional knob to consider in order
to disable page faults on mlock()ed pages.

Sebastian


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Memory compaction and mlockall()
  2019-08-12 15:54       ` Sebastian Andrzej Siewior
@ 2019-08-13  8:23         ` Vlastimil Babka
  0 siblings, 0 replies; 10+ messages in thread
From: Vlastimil Babka @ 2019-08-13  8:23 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: Yang Shi, Linux MM, Thomas Gleixner

On 8/12/19 5:54 PM, Sebastian Andrzej Siewior wrote:
> On 2019-08-12 16:59:00 [+0200], Vlastimil Babka wrote:
>> I would say that enabled is a better default wrt benefits for the
>> majority of systems. This was assuming that mlock() is primarily used to
>> prevent sensitive data (crypto keys) from hitting swap, not to give
>> latency guarantees. You could perhaps argue that enabling PREEMPT_RT
>> might change the default, but it's somewhat subtle.
> 
> A different behaviour depending on PREEMPT_RT is bad. 
> From the mlock(2) page:
> 
> |NOTES
> |
> |Memory locking has two main applications: real-time algorithms and
> |high-security data processing. Real-time applications require deterministic
> |timing, and, like scheduling, paging is one major cause of unexpected program
> |execution delays. …
> |
> |Real-time processes that are using mlockall() to prevent delays on page faults
> |should reserve enough locked stack pages before entering the time-critical
> |section, so that no page fault can be caused by function calls. …

Ah, I see, so it is documented for time-critical stuff.

> So if we are not going to revert that, then I would need to update man
> page to reflect that we now have an additional knob to consider in order
> to disable page faults on mlock()ed pages.

If we're going to update man page, then there's also auto NUMA balancing
that can cause minor faults and migrations, AFAIK.

> Sebastian
> 



^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-08-13  8:23 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-10 14:41 Memory compaction and mlockall() Sebastian Andrzej Siewior
2019-07-10 16:20 ` Dave Hansen
2019-07-10 16:45   ` Sebastian Andrzej Siewior
2019-07-10 17:02     ` Dave Hansen
2019-07-10 18:21 ` Yang Shi
2019-07-11  9:43   ` Sebastian Andrzej Siewior
2019-07-11 16:23     ` Yang Shi
2019-08-12 14:59     ` Vlastimil Babka
2019-08-12 15:54       ` Sebastian Andrzej Siewior
2019-08-13  8:23         ` Vlastimil Babka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).