* [LSF/MM/BPF TOPIC] The future of memory tiering
@ 2023-04-27  4:30 David Rientjes
  2023-04-27 17:10 ` Frank van der Linden
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: David Rientjes @ 2023-04-27  4:30 UTC (permalink / raw)
  To: Michal Hocko, Dan Williams
  Cc: lsf-pc, linux-mm, Wei Xu, Frank van der Linden, Johannes Weiner,
	Dave Hansen, Huang Ying, Aneesh Kumar K.V, Yang Shi,
	Davidlohr Bueso, Jon Grimm

Hi everybody,

As requested, sending along a last minute topic suggestion for 
consideration for LSF/MM/BPF 2023 :)

For a sizable set of emerging technologies, memory tiering presents one of 
the most formidable challenges and exciting opportunities for the MM 
subsystem today.

"Memory tiering" can mean many different things depending on the user: from 
traditional everyday NUMA, to swap (to zswap), to NVDIMMs, to HBM, to 
locally attached CXL memory, to memory borrowing over PCIe, to memory 
pooling with disaggregation, and beyond.

Just as NUMA started out being useful only for supercomputers, memory 
tiering will likely evolve over the next five years to take on an 
expanding set of use cases, likely with rapidly increasing adoption 
even beyond hyperscalers.

I think a discussion about memory tiering would be highly valuable.  A few 
key questions that I think can drive this discussion:

 - What are the various form factors that must be supported as short-term 
   goals, as well as those that need to be supported 5+ years into the 
   future?

 - What incremental changes need to be made on top of NUMA support to
   fully support the wide range of use cases that will be coming?  (Is
   memory tiering support built entirely upon NUMA?)

 - What is the minimum viable *default* support that the MM subsystem 
   should provide for tiered configs?  What is the set of optimizations
   that should be left to userspace or BPF to control?

 - What are the various page promotion techniques that we must plan for
   beyond traditional NUMA balancing that will allow us to exploit
   hardware innovation?

(And I'm sure there are more topics of discussion that others would 
readily add.  It would be great to have additional ideas in replies.)

A key challenge in all of this is to make memory tiering support in the 
upstream kernel compatible with the roadmaps of various CPU vendors.  A 
key goal is to ensure the end user benefits from all of this rapid 
innovation with generalized support that is well abstracted and allows for 
extensibility.
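
To make the second question above concrete: the only placement vocabulary 
userspace has today is the NUMA API itself.  A minimal sketch -- the node 
numbers are hypothetical, with node 2 standing in for a CPU-less CXL 
memory node; build with -lnuma:

#include <numaif.h>		/* mbind(), MPOL_* (libnuma) */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 1UL << 21;			/* 2 MiB */
	unsigned long nodemask = 1UL << 2;	/* hypothetical CXL node 2 */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;

	/*
	 * Prefer the slow node for this range; the kernel falls back to
	 * other nodes when it fills up.  Note there is no notion of a
	 * "tier" anywhere in this interface -- only nodes.
	 */
	if (mbind(p, len, MPOL_PREFERRED, &nodemask,
		  sizeof(nodemask) * 8, 0))
		perror("mbind");

	memset(p, 0, len);	/* touch so pages actually get placed */
	munmap(p, len);
	return 0;
}

Whether that node-level vocabulary remains the right abstraction for 
tiers is exactly what the question asks.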


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [LSF/MM/BPF TOPIC] The future of memory tiering
  2023-04-27  4:30 [LSF/MM/BPF TOPIC] The future of memory tiering David Rientjes
@ 2023-04-27 17:10 ` Frank van der Linden
  2023-04-28  3:55   ` Wei Xu
  2023-04-28  8:04 ` Huang, Ying
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 7+ messages in thread
From: Frank van der Linden @ 2023-04-27 17:10 UTC (permalink / raw)
  To: David Rientjes
  Cc: Michal Hocko, Dan Williams, lsf-pc, linux-mm, Wei Xu,
	Johannes Weiner, Dave Hansen, Huang Ying, Aneesh Kumar K.V,
	Yang Shi, Davidlohr Bueso, Jon Grimm

On Wed, Apr 26, 2023 at 9:30 PM David Rientjes <rientjes@google.com> wrote:
>
> Hi everybody,
>
> As requested, sending along a last minute topic suggestion for
> consideration for LSF/MM/BPF 2023 :)
>
> For a sizable set of emerging technologies, memory tiering presents one of
> the most formidable challenges and exciting opportunities for the MM
> subsystem today.
>
> "Memory tiering" can mean many different things depending on the user: from
> traditional everyday NUMA, to swap (to zswap), to NVDIMMs, to HBM, to
> locally attached CXL memory, to memory borrowing over PCIe, to memory
> pooling with disaggregation, and beyond.
>
> Just as NUMA started out being useful only for supercomputers, memory
> tiering will likely evolve over the next five years to take on an
> expanding set of use cases, likely with rapidly increasing adoption
> even beyond hyperscalers.
>
> I think a discussion about memory tiering would be highly valuable.  A few
> key questions that I think can drive this discussion:
>
>  - What are the various form factors that must be supported as short-term
>    goals, as well as those that need to be supported 5+ years into the
>    future?
>
>  - What incremental changes need to be made on top of NUMA support to
>    fully support the wide range of use cases that will be coming?  (Is
>    memory tiering support built entirely upon NUMA?)
>
>  - What is the minimum viable *default* support that the MM subsystem
>    should provide for tiered configs?  What is the set of optimizations
>    that should be left to userspace or BPF to control?
>
>  - What are the various page promotion techniques that we must plan for
>    beyond traditional NUMA balancing that will allow us to exploit
>    hardware innovation?
>
> (And I'm sure there are more topics of discussion that others would
> readily add.  It would be great to have additional ideas in replies.)
>
> A key challenge in all of this is to make memory tiering support in the
> upstream kernel compatible with the roadmaps of various CPU vendors.  A
> key goal is to ensure the end user benefits from all of this rapid
> innovation with generalized support that is well abstracted and allows for
> extensibility.

Thank you for bringing this one up. Memory tiering is a very important
topic that should definitely be discussed. I'm especially interested
in the userspace control part (which I proposed as a separate topic,
but I'm happy to see it addressed as part of this discussion too, as
that is where the motivation originally came from). With the increased
complexity introduced by memory tiers, is it still possible to provide
a one-size-fits-all default? If there is such a default, is it
accurately represented by the current model of NUMA nodes, where pages
are demoted to a slower tier as a 'reclaim' operation (i.e. you
basically map a global LRU model onto tiers of increasing latency)?
Are there reasons to break that model, and should applications be able
to do that? Is the current mempolicy/madvise model sufficient?
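
For reference, the entire per-range vocabulary that last question refers
to fits in a few lines.  A hedged sketch, assuming a 5.4+ kernel with
MADV_COLD/MADV_PAGEOUT:

#include <sys/mman.h>

/*
 * Mark a range cold, then ask for it to be reclaimed outright.  Note
 * there is no verb for "demote this to a slower tier": demotion, if it
 * happens at all, happens behind the application's back, on the
 * reclaim path.
 */
static void hint_cold_region(void *buf, size_t len)
{
	madvise(buf, len, MADV_COLD);		/* make it an easy victim */
	madvise(buf, len, MADV_PAGEOUT);	/* reclaim it now */
}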

- Frank


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [LSF/MM/BPF TOPIC] The future of memory tiering
  2023-04-27 17:10 ` Frank van der Linden
@ 2023-04-28  3:55   ` Wei Xu
  0 siblings, 0 replies; 7+ messages in thread
From: Wei Xu @ 2023-04-28  3:55 UTC (permalink / raw)
  To: Frank van der Linden
  Cc: David Rientjes, Michal Hocko, Dan Williams, lsf-pc, linux-mm,
	Johannes Weiner, Dave Hansen, Huang Ying, Aneesh Kumar K.V,
	Yang Shi, Davidlohr Bueso, Jon Grimm, Greg Thelen

On Thu, Apr 27, 2023 at 10:10 AM Frank van der Linden <fvdl@google.com> wrote:
>
> On Wed, Apr 26, 2023 at 9:30 PM David Rientjes <rientjes@google.com> wrote:
> >
> > Hi everybody,
> >
> > As requested, sending along a last minute topic suggestion for
> > consideration for LSF/MM/BPF 2023 :)
> >
> > For a sizable set of emerging technologies, memory tiering presents one of
> > the most formidable challenges and exciting opportunities for the MM
> > subsystem today.
> >
> > "Memory tiering" can mean many different things depending on the user: from
> > traditional everyday NUMA, to swap (to zswap), to NVDIMMs, to HBM, to
> > locally attached CXL memory, to memory borrowing over PCIe, to memory
> > pooling with disaggregation, and beyond.
> >
> > Just as NUMA started out being useful only for supercomputers, memory
> > tiering will likely evolve over the next five years to take on an
> > expanding set of use cases, likely with rapidly increasing adoption
> > even beyond hyperscalers.
> >
> > I think a discussion about memory tiering would be highly valuable.  A few
> > key questions that I think can drive this discussion:
> >
> >  - What are the various form factors that must be supported as short-term
> >    goals, as well as those that need to be supported 5+ years into the
> >    future?
> >
> >  - What incremental changes need to be made on top of NUMA support to
> >    fully support the wide range of use cases that will be coming?  (Is
> >    memory tiering support built entirely upon NUMA?)
> >
> >  - What is the minimum viable *default* support that the MM subsystem
> >    should provide for tiered configs?  What is the set of optimizations
> >    that should be left to userspace or BPF to control?
> >
> >  - What are the various page promotion techniques that we must plan for
> >    beyond traditional NUMA balancing that will allow us to exploit
> >    hardware innovation?
> >
> > (And I'm sure there are more topics of discussion that others would
> > readily add.  It would be great to have additional ideas in replies.)
> >
> > A key challenge in all of this is to make memory tiering support in the
> > upstream kernel compatible with the roadmaps of various CPU vendors.  A
> > key goal is to ensure the end user benefits from all of this rapid
> > innovation with generalized support that is well abstracted and allows for
> > extensibility.
>
> Thank you for bringing this one up. Memory tiering is a very important
> topic that should definitely be discussed. I'm especially interested
> in the userspace control part (which I proposed as a separate topic,
> but I'm happy to see it addressed as part of this discussion too, as
> that is where the motivation originally came from). With the increased
> complexity introduced by memory tiers, is it still possible to provide
> a one-size-fits-all default? If there is such a default, is it
> accurately represented by the current model of NUMA nodes, where pages
> are demoted to a slower tier as a 'reclaim' operation (i.e. you
> basically map a global LRU model onto tiers of increasing latency)?
> Are there reasons to break that model, and should applications be able
> to do that? Is the current mempolicy/madvise model sufficient?
>
> - Frank

I am definitely interested in the discussions on memory tiering as
well. In particular:

- What should be the interface to configure and initialize various
memory devices, especially CXL.mem devices, as tiered memory
nodes/zones? (See the sketch after this list.)
- What kind of framework do we need to leverage existing and future
hardware support (e.g. accessed bits/counters, PMU/IBS, etc.) for page
promotions?
- How can userspace influence memory tiering policies?
- What kind of memory tiering controls do we want to provide for cgroups?
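
On the first question, the starting point is the representation the
kernel already exports: since v6.1 the node-to-tier mapping is visible
under /sys/devices/virtual/memory_tiering/.  A small hedged sketch that
dumps it (assuming such a kernel; the interface is read-only today,
which is rather the point):

#include <dirent.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	const char *base = "/sys/devices/virtual/memory_tiering";
	char path[512], nodes[256];
	struct dirent *e;
	DIR *d = opendir(base);

	if (!d) {
		perror(base);	/* pre-6.1 kernel, or no tiers */
		return 1;
	}
	while ((e = readdir(d))) {
		if (strncmp(e->d_name, "memory_tier", 11))
			continue;	/* skip ".", "..", etc. */
		snprintf(path, sizeof(path), "%s/%s/nodelist",
			 base, e->d_name);
		FILE *f = fopen(path, "r");
		if (f && fgets(nodes, sizeof(nodes), f))
			printf("%s: nodes %s", e->d_name, nodes);
		if (f)
			fclose(f);
	}
	closedir(d);
	return 0;
}

The open design question is what the write side of such an interface
should look like.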

Wei


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [LSF/MM/BPF TOPIC] The future of memory tiering
  2023-04-27  4:30 [LSF/MM/BPF TOPIC] The future of memory tiering David Rientjes
  2023-04-27 17:10 ` Frank van der Linden
@ 2023-04-28  8:04 ` Huang, Ying
  2023-04-29  2:26 ` Yang Shi
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Huang, Ying @ 2023-04-28  8:04 UTC (permalink / raw)
  To: David Rientjes
  Cc: Michal Hocko, Dan Williams, lsf-pc, linux-mm, Wei Xu,
	Frank van der Linden, Johannes Weiner, Dave Hansen,
	Aneesh Kumar K.V, Yang Shi, Davidlohr Bueso, Jon Grimm

David Rientjes <rientjes@google.com> writes:

> Hi everybody,
>
> As requested, sending along a last minute topic suggestion for 
> consideration for LSF/MM/BPF 2023 :)
>
> For a sizable set of emerging technologies, memory tiering presents one of 
> the most formidable challenges and exciting opportunities for the MM 
> subsystem today.
>
> "Memory tiering" can mean many different things depending on the user: from 
> traditional everyday NUMA, to swap (to zswap), to NVDIMMs, to HBM, to 
> locally attached CXL memory, to memory borrowing over PCIe, to memory 
> pooling with disaggregation, and beyond.

Thanks for proposing the topic.  I have a strong interest in memory
tiering topics.

> Just as NUMA started out being useful only for supercomputers, memory 
> tiering will likely evolve over the next five years to take on an 
> expanding set of use cases, likely with rapidly increasing adoption 
> even beyond hyperscalers.
>
> I think a discussion about memory tiering would be highly valuable.  A few 
> key questions that I think can drive this discussion:
>
>  - What are the various form factors that must be supported as short-term 
>    goals, as well as those that need to be supported 5+ years into the 
>    future?
>
>  - What incremental changes need to be made on top of NUMA support to
>    fully support the wide range of use cases that will be coming?  (Is
>    memory tiering support built entirely upon NUMA?)

Yes.  Memory tiers may represent memory compression (such as zswap)
too.  We may extend the framework to cover that.
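
As an aside, today that compressed "tier" is steered by an entirely
separate set of knobs under /sys/module/zswap/parameters/ rather than
by the memory tiering sysfs.  A minimal hedged sketch, assuming a
kernel built with CONFIG_ZSWAP:

#include <stdio.h>

int main(void)
{
	char buf[16] = "unknown\n";
	FILE *f = fopen("/sys/module/zswap/parameters/enabled", "r");

	if (f) {
		fgets(buf, sizeof(buf), f);	/* "Y" or "N" */
		fclose(f);
	}
	printf("zswap enabled: %s", buf);
	return 0;
}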

>  - What is the minimum viable *default* support that the MM subsystem 
>    should provide for tiered configs?  What is the set of optimizations
>    that should be left to userspace or BPF to control?
>
>  - What are the various page promotion techniques that we must plan for
>    beyond traditional NUMA balancing that will allow us to exploit
>    hardware innovation?

I have an interest in hardware innovation too, and I feel it's hard to
unite all of these methods under one framework.

> (And I'm sure there are more topics of discussion that others would 
> readily add.  It would be great to have additional ideas in replies.)
>
> A key challenge in all of this is to make memory tiering support in the 
> upstream kernel compatible with the roadmaps of various CPU vendors.  A 
> key goal is to ensure the end user benefits from all of this rapid 
> innovation with generalized support that is well abstracted and allows for 
> extensibility.

Best Regards,
Huang, Ying


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [LSF/MM/BPF TOPIC] The future of memory tiering
  2023-04-27  4:30 [LSF/MM/BPF TOPIC] The future of memory tiering David Rientjes
  2023-04-27 17:10 ` Frank van der Linden
  2023-04-28  8:04 ` Huang, Ying
@ 2023-04-29  2:26 ` Yang Shi
  2023-05-01 13:16 ` Jason Gunthorpe
  2023-05-01 19:21 ` John Hubbard
  4 siblings, 0 replies; 7+ messages in thread
From: Yang Shi @ 2023-04-29  2:26 UTC (permalink / raw)
  To: David Rientjes
  Cc: Michal Hocko, Dan Williams, lsf-pc, linux-mm, Wei Xu,
	Frank van der Linden, Johannes Weiner, Dave Hansen, Huang Ying,
	Aneesh Kumar K.V, Davidlohr Bueso, Jon Grimm

Hi David,

Thanks for proposing this topic. I'd like to join the discussion.

Some inline comments in the below.

On Wed, Apr 26, 2023 at 9:30 PM David Rientjes <rientjes@google.com> wrote:
>
> Hi everybody,
>
> As requested, sending along a last minute topic suggestion for
> consideration for LSF/MM/BPF 2023 :)
>
> For a sizable set of emerging technologies, memory tiering presents one of
> the most formidable challenges and exciting opportunities for the MM
> subsystem today.
>
> "Memory tiering" can mean many different things depending on the user: from
> traditional everyday NUMA, to swap (to zswap), to NVDIMMs, to HBM, to
> locally attached CXL memory, to memory borrowing over PCIe, to memory
> pooling with disaggregation, and beyond.
>
> Just as NUMA started out being useful only for supercomputers, memory
> tiering will likely evolve over the next five years to take on an
> expanding set of use cases, likely with rapidly increasing adoption
> even beyond hyperscalers.
>
> I think a discussion about memory tiering would be highly valuable.  A few
> key questions that I think can drive this discussion:
>
>  - What are the various form factors that must be supported as short-term
>    goals, as well as those that need to be supported 5+ years into the
>    future?
>
>  - What incremental changes need to be made on top of NUMA support to
>    fully support the wide range of use cases that will be coming?  (Is
>    memory tiering support built entirely upon NUMA?)

AFAICT, per the earlier discussion, NUMA distance may not be enough to
rank memory devices into tiers properly.  We may need to figure out one
or more better metrics.
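
One candidate already exists where firmware supplies an HMAT: since
v5.5 the kernel exports per-node access latency and bandwidth, which
say far more than the single distance scalar.  A hedged sketch that
reads them for node 0 (the node number, and the presence of the
attributes at all, are assumptions about the platform):

#include <stdio.h>

static long read_attr(const char *name)
{
	char path[256];
	long val = -1;
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/node/node0/access0/initiators/%s",
		 name);
	f = fopen(path, "r");
	if (f) {
		if (fscanf(f, "%ld", &val) != 1)
			val = -1;	/* attribute unreadable */
		fclose(f);
	}
	return val;	/* -1 if firmware provided no HMAT data */
}

int main(void)
{
	printf("node0 read_latency:   %ld ns\n", read_attr("read_latency"));
	printf("node0 read_bandwidth: %ld MB/s\n", read_attr("read_bandwidth"));
	return 0;
}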

>
>  - What is the minimum viable *default* support that the MM subsystem
>    should provide for tiered configs?  What is the set of optimizations
>    that should be left to userspace or BPF to control?
>
>  - What are the various page promotion techniques that we must plan for
>    beyond traditional NUMA balancing that will allow us to exploit
>    hardware innovation?
>
> (And I'm sure there are more topics of discussion that others would
> readily add.  It would be great to have additional ideas in replies.)
>
> A key challenge in all of this is to make memory tiering support in the
> upstream kernel compatible with the roadmaps of various CPU vendors.  A
> key goal is to ensure the end user benefits from all of this rapid
> innovation with generalized support that is well abstracted and allows for
> extensibility.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [LSF/MM/BPF TOPIC] The future of memory tiering
  2023-04-27  4:30 [LSF/MM/BPF TOPIC] The future of memory tiering David Rientjes
                   ` (2 preceding siblings ...)
  2023-04-29  2:26 ` Yang Shi
@ 2023-05-01 13:16 ` Jason Gunthorpe
  2023-05-01 19:21 ` John Hubbard
  4 siblings, 0 replies; 7+ messages in thread
From: Jason Gunthorpe @ 2023-05-01 13:16 UTC (permalink / raw)
  To: David Rientjes
  Cc: Michal Hocko, Dan Williams, lsf-pc, linux-mm, Wei Xu,
	Frank van der Linden, Johannes Weiner, Dave Hansen, Huang Ying,
	Aneesh Kumar K.V, Yang Shi, Davidlohr Bueso, Jon Grimm,
	John Hubbard

On Wed, Apr 26, 2023 at 09:30:54PM -0700, David Rientjes wrote:
> Hi everybody,
> 
> As requested, sending along a last minute topic suggestion for 
> consideration for LSF/MM/BPF 2023 :)
> 
> For a sizable set of emerging technologies, memory tiering presents one of 
> the most formidable challenges and exciting opportunities for the MM 
> subsystem today.
> 
> "Memory tiering" can mean many different things depending on the user: from 
> traditional everyday NUMA, to swap (to zswap), to NVDIMMs, to HBM, to 
> locally attached CXL memory, to memory borrowing over PCIe, to memory 
> pooling with disaggregation, and beyond.
> 
> Just as NUMA started out being useful only for supercomputers, memory 
> tiering will likely evolve over the next five years to take on an 
> expanding set of use cases, likely with rapidly increasing adoption 
> even beyond hyperscalers.
> 
> I think a discussion about memory tiering would be highly valuable.  A few 
> key questions that I think can drive this discussion:
> 
>  - What are the various form factors that must be supported as short-term 
>    goals, as well as those that need to be supported 5+ years into the 
>    future?
> 
>  - What incremental changes need to be made on top of NUMA support to
>    fully support the wide range of use cases that will be coming?  (Is
>    memory tiering support built entirely upon NUMA?)
> 
>  - What is the minimum viable *default* support that the MM subsystem 
>    should provide for tiered configs?  What is the set of optimizations
>    that should be left to userspace or BPF to control?
> 
>  - What are the various page promotion techniques that we must plan for
>    beyond traditional NUMA balancing that will allow us to exploit
>    hardware innovation?
> 
> (And I'm sure there are more topics of discussion that others would 
> readily add.  It would be great to have additional ideas in replies.)
> 
> A key challenge in all of this is to make memory tiering support in the 
> upstream kernel compatible with the roadmaps of various CPU vendors.  A 
> key goal is to ensure the end user benefits from all of this rapid 
> innovation with generalized support that is well abstracted and allows for 
> extensibility.

I'm interested in this too; memory pools with strong locality to
specific compute blocks are becoming an increasingly common feature in
supercomputer build-outs.  It would be great to see a comprehensive
approach to this in the mm, not just a solution for the "external
slower DRAM" case.

Jason


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [LSF/MM/BPF TOPIC] The future of memory tiering
  2023-04-27  4:30 [LSF/MM/BPF TOPIC] The future of memory tiering David Rientjes
                   ` (3 preceding siblings ...)
  2023-05-01 13:16 ` Jason Gunthorpe
@ 2023-05-01 19:21 ` John Hubbard
  4 siblings, 0 replies; 7+ messages in thread
From: John Hubbard @ 2023-05-01 19:21 UTC (permalink / raw)
  To: David Rientjes, Michal Hocko, Dan Williams
  Cc: lsf-pc, linux-mm, Wei Xu, Frank van der Linden, Johannes Weiner,
	Dave Hansen, Huang Ying, Aneesh Kumar K.V, Yang Shi,
	Davidlohr Bueso, Jon Grimm

On 4/26/23 21:30, David Rientjes wrote:
> Hi everybody,
> 
> As requested, sending along a last minute topic suggestion for 
> consideration for LSF/MM/BPF 2023 :)
> 
> For a sizable set of emerging technologies, memory tiering presents one of 
> the most formidable challenges and exciting opportunities for the MM 
> subsystem today.
> 
> "Memory tiering" can mean many different things depending on the user: from 
> traditional everyday NUMA, to swap (to zswap), to NVDIMMs, to HBM, to 
> locally attached CXL memory, to memory borrowing over PCIe, to memory 
> pooling with disaggregation, and beyond.
> 
> Just as NUMA started out being useful only for supercomputers, memory 
> tiering will likely evolve over the next five years to take on an 
> expanding set of use cases, likely with rapidly increasing adoption 
> even beyond hyperscalers.
> 
> I think a discussion about memory tiering would be highly valuable.  A few 
> key questions that I think can drive this discussion:
> 
>  - What are the various form factors that must be supported as short-term 
>    goals, as well as those that need to be supported 5+ years into the 
>    future?
> 
>  - What incremental changes need to be made on top of NUMA support to
>    fully support the wide range of use cases that will be coming?  (Is
>    memory tiering support built entirely upon NUMA?)
> 
>  - What is the minimum viable *default* support that the MM subsystem 
>    should provide for tiered configs?  What is the set of optimizations
>    that should be left to userspace or BPF to control?
> 
>  - What are the various page promotion techniques that we must plan for
>    beyond traditional NUMA balancing that will allow us to exploit
>    hardware innovation?
> 
> (And I'm sure there are more topics of discussion that others would 
> readily add.  It would be great to have additional ideas in replies.)
> 
> A key challenge in all of this is to make memory tiering support in the 
> upstream kernel compatible with the roadmaps of various CPU vendors.  A 
> key goal is to ensure the end user benefits from all of this rapid 
> innovation with generalized support that is well abstracted and allows for 
> extensibility.
> 

Yes, this is an extremely relevant topic from our point of view, as Jason
already mentioned. I'm very interested in a system that works well in
the presence of highly capable devices that can handle replayable page
faults and can co-process along with the CPU. Eventually, the kernel
should, arguably, be more aware of what a GPU or smart NIC is doing with
both memory and (device) processor time.

I have lots of examples of that, and one of my favorites is the current
autonuma behavior: unmapping a lot of pages and waiting for *CPU* page
faults in order to decide which NUMA node those pages are best placed
on. Of course, if a GPU or other page-fault-capable device has those
pages mapped, then MMU notifier callbacks force the device to unmap
those pages and fault them in again, at a truly huge, and unnecessary,
performance cost.

Also, when thinking about designs, sometimes it helps to think about
memory from the perspective of these devices, just to kind of shake
up the mental model.
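
A hedged kernel-side sketch of where that cost comes from (the names
here are illustrative, not from any real driver): a device driver's MMU
notifier fires for autonuma's protection flips just as it does for real
invalidations, so the device tears down mappings for pages that never
actually moved.

#include <linux/mmu_notifier.h>

static int demo_invalidate_start(struct mmu_notifier *mn,
				 const struct mmu_notifier_range *range)
{
	/*
	 * Autonuma's PROT_NONE flips arrive here as
	 * MMU_NOTIFY_PROTECTION_VMA ranges.  A fault-capable device
	 * must now invalidate its copies of [start, end) and take
	 * replayable device faults to rebuild them -- even though the
	 * kernel only wanted a CPU hint fault.
	 */
	return 0;
}

static const struct mmu_notifier_ops demo_ops = {
	.invalidate_range_start = demo_invalidate_start,
};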

thanks,
-- 
John Hubbard
NVIDIA



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-05-01 19:21 UTC | newest]

Thread overview: 7 messages
2023-04-27  4:30 [LSF/MM/BPF TOPIC] The future of memory tiering David Rientjes
2023-04-27 17:10 ` Frank van der Linden
2023-04-28  3:55   ` Wei Xu
2023-04-28  8:04 ` Huang, Ying
2023-04-29  2:26 ` Yang Shi
2023-05-01 13:16 ` Jason Gunthorpe
2023-05-01 19:21 ` John Hubbard
