* [PATCH] madvise.2: add MADV_HUGEPAGE and MADV_NOHUGEPAGE
@ 2011-07-27 20:14 Doug Goldstein
From: Doug Goldstein @ 2011-07-27 20:14 UTC (permalink / raw)
  To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w; +Cc: linux-man-u79uwXL29TY76Z2rM5mHXA

Document the MADV_HUGEPAGE and MADV_NOHUGEPAGE flags added to the
madvise() syscall in Linux kernels 2.6.38 and newer.

Signed-off-by: Doug Goldstein <cardoe-VPKZcK2rSRzQT0dZR+AlfA@public.gmane.org>
---
 man2/madvise.2 |   34 ++++++++++++++++++++++++++++++++++
 1 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/man2/madvise.2 b/man2/madvise.2
index 6a449c5..e099e94 100644
--- a/man2/madvise.2
+++ b/man2/madvise.2
@@ -209,6 +209,40 @@ KSM unmerges whatever pages it had merged in the address range specified by
 .IR addr
 and
 .IR length .
+.TP
+.BR MADV_HUGEPAGE " (since Linux 2.6.38)"
+Enables Transparent Huge Pages (THP) for pages in the range specified by
+.I addr
+and
+.IR length .
+Currently, Transparent Huge Pages work only with private anonymous pages (see
+.BR mmap (2)).
+The kernel will regularly scan the areas marked as huge page candidates
+to replace them with huge pages.
+The kernel will also allocate huge pages directly when the region is
+naturally aligned to the huge page size (see
+.BR posix_memalign (3)).
+This feature is primarily aimed at applications that use large mappings of
+data and access large regions of that memory at a time (e.g., virtualization
+systems such as QEMU).
+It can easily waste memory (e.g., a 2MB mapping that only ever accesses
+1 byte will result in 2MB of resident memory instead of one 4KB page).
+See the kernel source file
+.I Documentation/vm/transhuge.txt
+for more details.
+The
+.B MADV_HUGEPAGE
+and
+.B MADV_NOHUGEPAGE
+operations are available only if the kernel was configured with
+.BR CONFIG_TRANSPARENT_HUGEPAGE .
+.TP
+.BR MADV_NOHUGEPAGE " (since Linux 2.6.38)"
+Ensures that memory in the address range specified by
+.I addr
+and
+.I length
+will not be collapsed into huge pages.
 .SH "RETURN VALUE"
 On success
 .BR madvise ()
-- 
1.7.6
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
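The patch describes MADV_HUGEPAGE in prose only. Below is a minimal C sketch of the usage it implies: allocate a buffer naturally aligned to the huge page size with posix_memalign(3), then advise it. It assumes a 2 MiB huge page size (typical on x86-64, but architecture-dependent) and assumes that a kernel built without CONFIG_TRANSPARENT_HUGEPAGE rejects the advice with EINVAL. The helper alloc_thp_buffer() and the HPAGE_SIZE constant are illustrative names, not part of any API.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/mman.h>

/* Assumption: 2 MiB huge pages, as on x86-64. */
#define HPAGE_SIZE ((size_t)2 * 1024 * 1024)

/*
 * Hypothetical helper: allocate len bytes aligned to HPAGE_SIZE and ask the
 * kernel to back them with transparent huge pages.
 * Returns the buffer, or NULL if the allocation fails. *thp_ok is set to 1
 * if madvise() accepted MADV_HUGEPAGE, and to 0 otherwise.
 */
void *alloc_thp_buffer(size_t len, int *thp_ok)
{
    void *buf = NULL;

    *thp_ok = 0;
    /* Natural alignment lets the kernel use huge pages at fault time. */
    if (posix_memalign(&buf, HPAGE_SIZE, len) != 0)
        return NULL;

#ifdef MADV_HUGEPAGE
    if (madvise(buf, len, MADV_HUGEPAGE) == 0)
        *thp_ok = 1;
    /* On failure (typically EINVAL without THP support) the buffer is
       still usable; it is simply backed by normal pages. */
#endif
    return buf;
}
```

The advice is only a hint, so the caller keeps working whether or not *thp_ok ends up as 1; the buffer is released with free(3) as usual.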

* Re: [PATCH] madvise.2: add MADV_HUGEPAGE and MADV_NOHUGEPAGE
@ 2011-09-19  5:26 Michael Kerrisk
From: Michael Kerrisk @ 2011-09-19  5:26 UTC (permalink / raw)
  To: Doug Goldstein; +Cc: linux-man-u79uwXL29TY76Z2rM5mHXA, Andrea Arcangeli

Hello Doug,

On Wed, Jul 27, 2011 at 10:14 PM, Doug Goldstein <cardoe-VPKZcK2rSRzQT0dZR+AlfA@public.gmane.org> wrote:
> Document the MADV_HUGEPAGE and MADV_NOHUGEPAGE flags added to the
> madvise() syscall in Linux kernels 2.6.38 and newer.

Thanks. I've applied this for man-pages-3.34.

Andrea, is there anything you think necessary to add/change?

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/

* Re: [PATCH] madvise.2: add MADV_HUGEPAGE and MADV_NOHUGEPAGE
@ 2011-09-28  1:59 Andrea Arcangeli
From: Andrea Arcangeli @ 2011-09-28  1:59 UTC (permalink / raw)
  To: Michael Kerrisk; +Cc: Doug Goldstein, linux-man-u79uwXL29TY76Z2rM5mHXA

Hello,

On Mon, Sep 19, 2011 at 07:26:49AM +0200, Michael Kerrisk wrote:
> Hello Doug,
> 
> On Wed, Jul 27, 2011 at 10:14 PM, Doug Goldstein <cardoe-VPKZcK2rSRzQT0dZR+AlfA@public.gmane.org> wrote:
> > Document the MADV_HUGEPAGE and MADV_NOHUGEPAGE flags added to the
> > madvise() syscall in Linux kernels 2.6.38 and newer.
> 
> Thanks. I've applied this for man-pages-3.34.
> 
> Andrea, is there anything you think necessary to add/change?

Looking good!

> > +.TP
> > +.BR MADV_HUGEPAGE " (since Linux 2.6.38)"
> > +Enables Transparent Huge Pages (THP) for pages in the range specified by
> > +.I addr
> > +and
> > +.IR length .

Maybe it should also be specified that most common kernel
configurations will by default already behave as if MADV_HUGEPAGE had
been applied, so MADV_HUGEPAGE is normally not necessary. It is mostly
meant for embedded systems whose kernels do not enable that behavior
by default: there it can be used to enable THP selectively, only in
some regions. Whenever MADV_HUGEPAGE is used, it should always be
applied to regions of memory whose access pattern the developer knows
in advance won't increase the memory footprint of the application
when transparent hugepages are enabled.
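Because a transparent huge page can back only a naturally aligned, huge-page-sized extent, one way to apply MADV_HUGEPAGE "only in some region", as suggested above, is to advise just the aligned interior of a larger private anonymous mapping. The sketch below assumes a 2 MiB huge page size; thp_advise_interior() is a hypothetical helper, not an existing API.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define HPAGE_SIZE ((uintptr_t)2 * 1024 * 1024)   /* assumed huge page size */

/*
 * Hypothetical helper: apply MADV_HUGEPAGE to the largest HPAGE_SIZE-aligned
 * subrange of [addr, addr + len). Returns the number of bytes advised
 * (0 if no aligned huge page fits), or -1 if madvise() failed.
 */
long thp_advise_interior(void *addr, size_t len)
{
    /* Round the start up and the end down to huge page boundaries. */
    uintptr_t start = ((uintptr_t)addr + HPAGE_SIZE - 1) & ~(HPAGE_SIZE - 1);
    uintptr_t end = ((uintptr_t)addr + len) & ~(HPAGE_SIZE - 1);

    if (end <= start)
        return 0;   /* region too small or badly placed for any huge page */
#ifdef MADV_HUGEPAGE
    if (madvise((void *)start, end - start, MADV_HUGEPAGE) != 0)
        return -1;
    return (long)(end - start);
#else
    return -1;      /* headers do not know about MADV_HUGEPAGE */
#endif
}
```

Leaving the unaligned head and tail of the mapping unadvised keeps the hint confined to memory that could actually be backed by huge pages.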

> > +.BR MADV_NOHUGEPAGE " (since Linux 2.6.38)"
> > +Ensures that memory in the address range specified by
> > +.IR addr
> > +and
> > +.IR length
> > +will not be collapsed into huge pages.

Maybe it's clearer as "will not be backed by transparent
hugepages". The collapse is done only by khugepaged, but unless
MADV_NOHUGEPAGE is used, transparent hugepages may also be allocated
natively during the page fault, without waiting for them to be
collapsed later.
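The point that THP can be allocated directly at page-fault time, not only by khugepaged, can be observed from user space. Below is a rough C sketch that relies on the per-mapping AnonHugePages field Linux reports in /proc/self/smaps. The helpers anon_huge_kb() and demo_fault_thp() are hypothetical, and the parsing is deliberately simplistic.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: return the AnonHugePages value (in kB) of the mapping
 * in /proc/self/smaps that contains addr, or -1 if it cannot be found.
 */
long anon_huge_kb(const void *addr)
{
    FILE *f = fopen("/proc/self/smaps", "r");
    char line[512];
    int in_target = 0;
    long kb = -1;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long start, end;

        /* Mapping header lines look like "start-end perms offset ...". */
        if (sscanf(line, "%lx-%lx", &start, &end) == 2)
            in_target = (unsigned long)addr >= start &&
                        (unsigned long)addr < end;
        else if (in_target &&
                 sscanf(line, "AnonHugePages: %ld kB", &kb) == 1)
            break;
    }
    fclose(f);
    return kb;
}

/* Map, advise, touch: if THP is enabled, huge pages may appear at fault time. */
long demo_fault_thp(size_t len)
{
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    long kb;

    if (p == MAP_FAILED)
        return -1;
#ifdef MADV_HUGEPAGE
    (void)madvise(p, len, MADV_HUGEPAGE);   /* failure just means no THP */
#endif
    memset(p, 1, len);                      /* first touch triggers faults */
    kb = anon_huge_kb(p);
    munmap(p, len);
    return kb;
}
```

A nonzero result right after the memset() suggests the huge pages came from the fault path rather than from a later khugepaged scan; zero is also a legitimate outcome (THP disabled, no aligned extent, or no free huge page).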

This can be used to selectively disable THP for any app that does
scattered memory accesses that could increase the memory footprint
of the application too much with THP enabled.
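Below is a C sketch of the case described above. A large, sparsely touched region (say, the backing store of a hash table) is mapped and marked MADV_NOHUGEPAGE before its first use, so that scattered writes fault in ordinary pages instead of whole huge pages. The helper sparse_table_create() is a hypothetical name used only for illustration.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: reserve a large anonymous region that will be
 * accessed sparsely and ask the kernel not to back it with THP.
 * Returns the mapping, or NULL if mmap() fails. *nohuge is set to 1 if the
 * MADV_NOHUGEPAGE advice was accepted, and to 0 otherwise.
 */
void *sparse_table_create(size_t len, int *nohuge)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

    *nohuge = 0;
    if (p == MAP_FAILED)
        return NULL;
#ifdef MADV_NOHUGEPAGE
    /* Advise before the first touch, so no fault can allocate a huge page. */
    if (madvise(p, len, MADV_NOHUGEPAGE) == 0)
        *nohuge = 1;
#endif
    return p;
}
```

Issuing the advice before any access matters. As noted above, huge pages can be allocated at the first fault, not only collapsed later.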

Generally, these two MADV_*HUGEPAGE madvise() operations are useful for
dealing with any memory footprint issue that may arise depending on
the kernel default.

For example, the virtual area of the NPTL thread stacks could be a good
candidate for MADV_NOHUGEPAGE, but I think that's not implemented yet.
By contrast, qemu-kvm should do MADV_HUGEPAGE by default: even if
somebody runs KVM on an embedded system, enabling THP for the guest
physical memory wastes no memory in KVM (once the guest reaches peak
load and has touched all of its RAM, which eventually happens), so KVM
will just run faster with no risk of an increased memory footprint.
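Until something like this exists in NPTL itself, an application that allocates its own thread stacks could approximate the idea. Below is a C sketch in which the caller supplies the stack through pthread_attr_setstack(3). This is not what glibc does internally. run_on_nohuge_stack() and echo_with_stack_use() are hypothetical names, and the 8 MiB stack size is an assumption.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

#define STACK_SIZE ((size_t)8 * 1024 * 1024)   /* assumed stack size */

/*
 * Hypothetical helper: run fn(arg) on a thread whose stack is a private
 * anonymous mapping marked MADV_NOHUGEPAGE, then join it and store its
 * return value in *result. Returns 0 on success. Note that no guard page
 * is set up for a caller-supplied stack in this sketch.
 */
int run_on_nohuge_stack(void *(*fn)(void *), void *arg, void **result)
{
    pthread_attr_t attr;
    pthread_t tid;
    int err;
    void *stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);

    if (stack == MAP_FAILED)
        return -1;
#ifdef MADV_NOHUGEPAGE
    /* Stacks are usually touched sparsely; keep them on normal pages. */
    (void)madvise(stack, STACK_SIZE, MADV_NOHUGEPAGE);
#endif
    pthread_attr_init(&attr);
    pthread_attr_setstack(&attr, stack, STACK_SIZE);
    err = pthread_create(&tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    if (err == 0)
        err = pthread_join(tid, result);
    munmap(stack, STACK_SIZE);   /* safe: the thread has exited or never ran */
    return err;
}

/* Example thread body: uses some stack and echoes its argument back. */
void *echo_with_stack_use(void *arg)
{
    volatile char buf[64 * 1024];

    buf[0] = 1;
    buf[sizeof buf - 1] = 2;
    return arg;
}
```

For guest RAM in a qemu-kvm-like program the same pattern would apply with MADV_HUGEPAGE instead, following the reasoning above.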

Not so easy to explain clearly though :) but if we manage to express
these concepts too, it'll avoid the risk of people polluting apps with
these madvise() calls when they're not needed 99% of the time (with a
few exceptions like qemu-kvm and maybe NPTL for the user thread stacks;
the latter has yet to be checked, but for KVM I'm positive it'll be fine).

But hey, your patch is already looking good.

Thanks a lot for helping document this!
Andrea