linux-kernel.vger.kernel.org archive mirror
From: Pasha Tatashin <pasha.tatashin@soleen.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	akpm@linux-foundation.org, x86@kernel.org, bp@alien8.de,
	brauner@kernel.org, bristot@redhat.com, bsegall@google.com,
	dave.hansen@linux.intel.com, dianders@chromium.org,
	dietmar.eggemann@arm.com, eric.devolder@oracle.com,
	hca@linux.ibm.com, hch@infradead.org, hpa@zytor.com,
	jacob.jun.pan@linux.intel.com, jgg@ziepe.ca, jpoimboe@kernel.org,
	jroedel@suse.de, juri.lelli@redhat.com,
	kent.overstreet@linux.dev, kinseyho@google.com,
	kirill.shutemov@linux.intel.com, lstoakes@gmail.com,
	luto@kernel.org, mgorman@suse.de, mic@digikod.net,
	michael.christie@oracle.com, mingo@redhat.com, mjguzik@gmail.com,
	mst@redhat.com, npiggin@gmail.com, peterz@infradead.org,
	pmladek@suse.com, rick.p.edgecombe@intel.com,
	rostedt@goodmis.org, surenb@google.com, tglx@linutronix.de,
	urezki@gmail.com, vincent.guittot@linaro.org,
	vschneid@redhat.com, pasha.tatashin@soleen.com
Subject: [RFC 00/14] Dynamic Kernel Stacks
Date: Mon, 11 Mar 2024 16:46:24 +0000
Message-ID: <20240311164638.2015063-1-pasha.tatashin@soleen.com>

This is a follow-up to the LSF/MM proposal [1]. Please share your
thoughts and comments about the dynamic kernel stacks feature. This is a
WIP: it has not been tested beyond booting on some machines and running
the LKDTM thread exhaust tests. The series also lacks selftests and
documentation.

This feature allows the kernel stack to grow dynamically, from 4KiB up
to THREAD_SIZE. The intent is to save memory on fleet machines. Initial
experiments show an average saving of 70-75% of kernel stack memory.

The average stack depth of a kernel thread depends on the workload,
profiling, virtualization, compiler optimizations, and driver
implementations. As a baseline, the table below shows kernel stack
memory before vs. after on idle, freshly booted machines (Base is
#Stacks x 16 KiB, i.e. every stack fully allocated):

CPU            #Cores  #Stacks  Base (KiB)  Dynamic (KiB)  Saving
AMD Genoa         384     5786       92576          23388  74.74%
Intel Skylake     112     3182       50912          12860  74.74%
AMD Rome          128     3401       54416          14784  72.83%
AMD Rome          256     4908       78528          20876  73.42%
Intel Haswell      72     2644       42304          10624  74.89%

Workloads with millions of threads can benefit significantly from this
feature.

[1] https://lore.kernel.org/all/CA+CK2bBYt9RAVqASB2eLyRQxYT5aiL0fGhUu3TumQCyJCNTWvw@mail.gmail.com

Pasha Tatashin (14):
  task_stack.h: remove obsolete __HAVE_ARCH_KSTACK_END check
  fork: Clean-up ifdef logic around stack allocation
  fork: Clean-up naming of vm_strack/vm_struct variables in vmap stacks
    code
  fork: Remove assumption that vm_area->nr_pages equals to THREAD_SIZE
  fork: check charging success before zeroing stack
  fork: zero vmap stack using clear_page() instead of memset()
  fork: use the first page in stack to store vm_stack in cached_stacks
  fork: separate vmap stack alloction and free calls
  mm/vmalloc: Add a get_vm_area_node() and vmap_pages_range_noflush()
    public functions
  fork: Dynamic Kernel Stacks
  x86: add support for Dynamic Kernel Stacks
  task_stack.h: Clean-up stack_not_used() implementation
  task_stack.h: Add stack_not_used() support for dynamic stack
  fork: Dynamic Kernel Stack accounting

 arch/Kconfig                     |  33 +++
 arch/x86/Kconfig                 |   1 +
 arch/x86/kernel/traps.c          |   3 +
 arch/x86/mm/fault.c              |   3 +
 include/linux/mmzone.h           |   3 +
 include/linux/sched.h            |   2 +-
 include/linux/sched/task_stack.h |  94 ++++++--
 include/linux/vmalloc.h          |  15 ++
 kernel/fork.c                    | 388 ++++++++++++++++++++++++++-----
 kernel/sched/core.c              |   1 +
 mm/internal.h                    |   9 -
 mm/vmalloc.c                     |  24 ++
 mm/vmstat.c                      |   3 +
 13 files changed, 487 insertions(+), 92 deletions(-)

-- 
2.44.0.278.ge034bb2e1d-goog

