All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] Memory tiering kernel alignment
@ 2024-01-25 18:26 David Rientjes
  2024-01-25 18:52 ` Matthew Wilcox
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: David Rientjes @ 2024-01-25 18:26 UTC (permalink / raw)
  To: John Hubbard, Zi Yan, Bharata B Rao, Dave Jiang,
	Aneesh Kumar K.V, Huang, Ying, Alistair Popple,
	Christoph Lameter, Andrew Morton, Linus Torvalds, Dave Hansen,
	Mel Gorman, Jon Grimm, Gregory Price, Brian Morris, Wei Xu,
	Johannes Weiner
  Cc: linux-mm

Hi everybody,

There is a lot of excitement around upcoming CXL type 3 memory expansion
devices and their cost savings potential.  As the industry starts to
adopt this technology, one of the key components in strategic planning is
how the upstream Linux kernel will support various tiered configurations
to meet various user needs.  I think it goes without saying that this is
quite interesting to cloud providers as well as other hyperscalers :)

I think this discussion would benefit from a collaborative approach
between various stakeholders and interested parties.  Reason being is
that there are several different use cases the need different support
models, but also because there is great incentive toward moving "with"
upstream Linux for this support rather than having multiple parties
bringing up their own stacks only to find that they are diverging from
upstream rather than converging with it.

I'm interested to learn if there is interest in forming a "Linux Memory
Tiering Work Group" to share ideas, discuss multi-faceted approaches, and
keep track of work items?

Some recent discussions have proven that there is widespread interest in
some very foundational topics for this technology such as:

 - Decoupling CPU balancing from memory balancing (or obsoleting CPU
   balancing entirely)

   + John Hubbard notes this would be useful for GPUs:

      a) GPUs have their own processors that are invisible to the kernel's
         NUMA "which tasks are active on which NUMA nodes" calculations,
         and

      b) Similar to where CXL is generally going, we have already built
         fully memory-coherent hardware, which include memory-only NUMA
         nodes.

 - In-kernel hot memory abstraction, informed by hardware hinting drivers
   (incl some architectures like Power10), usable as a NUMA Balancing
   backend for promotion and other areas of the kernel like transparent
   hugepage utilization

 - NUMA and memory tiering enlightenment for accelerators, such as for
   optimal use of GPU memory, extremely important for a cloud provider
   (hint hint :)

 - Asynchronous memory promotion independent of task_numa_fault() while
   considering the cost of page migration (due to identifying cold memory)

It looks like there is already some interest in such a working group that
would have a biweekly discussion of shared interests with the goal of
accelerating design, development, testing, and division of work:

Alistair Popple
Aneesh Kumar K V
Brian Morris
Christoph Lameter
Dan Williams
Gregory Price
Grimm, Jon
Huang, Ying
Johannes Weiner
John Hubbard
Zi Yan

Specifically for the in-kernel hot memory abstraction topic, Google and
Meta recently publushed an OCP base specification "Hyperscale CXL Tiered
Memory Expander Specification" available at
https://drive.google.com/file/d/1fFfU7dFmCyl6V9-9qiakdWaDr9d38ewZ/view?usp=drive_link
that would be great to discuss.

There is also on-going work in the CXL Consortium to standardize some of
the abstractions for CXL 3.1.

If folks are interested in this topic and your name doesn't appear above
(I already got you :), please:

 - reply-all to this email to express interest and expand upon the list
   of topics above to represent additional areas of interest that should
   be included, *or*

 - email me privately to express interest to make sure you are included

Perhaps I'm overly optimistic, but one thing that would be absolutely
*amazing* would be if we all have a very clear and understandable vision
for how Linux will support the wide variety of use cases, even before
that work is fully implemented (or even designed), by LSF/MM/BPF 2024
time in May.

Thanks!


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] Memory tiering kernel alignment
  2024-01-25 18:26 [RFC] Memory tiering kernel alignment David Rientjes
@ 2024-01-25 18:52 ` Matthew Wilcox
  2024-01-25 20:04   ` David Rientjes
  2024-01-26 20:41   ` Christoph Lameter (Ampere)
  2024-01-26  0:04 ` SeongJae Park
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 17+ messages in thread
From: Matthew Wilcox @ 2024-01-25 18:52 UTC (permalink / raw)
  To: David Rientjes
  Cc: John Hubbard, Zi Yan, Bharata B Rao, Dave Jiang,
	Aneesh Kumar K.V, Huang, Ying, Alistair Popple,
	Christoph Lameter, Andrew Morton, Linus Torvalds, Dave Hansen,
	Mel Gorman, Jon Grimm, Gregory Price, Brian Morris, Wei Xu,
	Johannes Weiner, linux-mm

On Thu, Jan 25, 2024 at 10:26:19AM -0800, David Rientjes wrote:
> There is a lot of excitement around upcoming CXL type 3 memory expansion
> devices and their cost savings potential.  As the industry starts to
> adopt this technology, one of the key components in strategic planning is
> how the upstream Linux kernel will support various tiered configurations
> to meet various user needs.  I think it goes without saying that this is
> quite interesting to cloud providers as well as other hyperscalers :)

I'm not excited.  I'm disappointed that people are falling for this scam.
CXL is the ATM of this decade.  The protocol is not fit for the purpose
of accessing remote memory, adding 10ns just for an encode/decode cycle.
Hands up everybody who's excited about memory latency increasing by 17%.

Then there are the lies from the vendors who want you to buy switches.
Not one of them are willing to guarantee you the worst case latency
through their switches.

The concept is wrong.  Nobody wants to tie all of their machines together
into a giant single failure domain.  There's no possible redundancy
here.  Availability is diminished; how do you upgrade firmware on a
switch without taking it down?  Nobody can answer my contentions about
contention either; preventing a single machine from hogging access to
a single CXL endpoint seems like an unsolved problem.

CXL is great for its real purpose of attaching GPUs and migrating memory
back and forth in a software-transparent way.  We should support that,
and nothing more.

We should reject this technology before it harms our kernel and the
entire industry.  There's a reason that SGI died.  Nobody wants to buy
single image machines the size of a data centre.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] Memory tiering kernel alignment
  2024-01-25 18:52 ` Matthew Wilcox
@ 2024-01-25 20:04   ` David Rientjes
  2024-01-25 20:19     ` Matthew Wilcox
  2024-01-26 20:41   ` Christoph Lameter (Ampere)
  1 sibling, 1 reply; 17+ messages in thread
From: David Rientjes @ 2024-01-25 20:04 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: John Hubbard, Zi Yan, Bharata B Rao, Dave Jiang,
	Aneesh Kumar K.V, Huang, Ying, Alistair Popple,
	Christoph Lameter, Andrew Morton, Linus Torvalds, Dave Hansen,
	Mel Gorman, Jon Grimm, Gregory Price, Brian Morris, Wei Xu,
	Johannes Weiner, linux-mm

On Thu, 25 Jan 2024, Matthew Wilcox wrote:

> On Thu, Jan 25, 2024 at 10:26:19AM -0800, David Rientjes wrote:
> > There is a lot of excitement around upcoming CXL type 3 memory expansion
> > devices and their cost savings potential.  As the industry starts to
> > adopt this technology, one of the key components in strategic planning is
> > how the upstream Linux kernel will support various tiered configurations
> > to meet various user needs.  I think it goes without saying that this is
> > quite interesting to cloud providers as well as other hyperscalers :)
> 
> I'm not excited.  I'm disappointed that people are falling for this scam.
> CXL is the ATM of this decade.  The protocol is not fit for the purpose
> of accessing remote memory, adding 10ns just for an encode/decode cycle.
> Hands up everybody who's excited about memory latency increasing by 17%.
> 

Right, I don't think that anybody is claiming that we can leverage locally 
attached CXL memory as through it was DRAM on the same or remote socket 
and that there won't be a noticable impact to application performance 
while the memory is still across the device.

It does offer several cost savings benefits for offloading of cold memory, 
though, if locally attached and I think the support for that use case is 
inevitable -- in fact, Linux has some sophisticated support for the 
locally attached use case already.

> Then there are the lies from the vendors who want you to buy switches.
> Not one of them are willing to guarantee you the worst case latency
> through their switches.
> 

I should have prefaced this thread by saying "locally attached CXL memory 
expansion", because that's the primary focus of many of the folks on this 
email thread :)

FWIW, I fully agree with your evaluation for memory pooling and some of 
the extensions provided by CXL 2.0.  I think that a lot of the pooling 
concepts are currently being overhyped, that's just my personal opinion.  
Happy to talk about the advantages and disadvantages (as well as the use 
cases), but I remain unconvinced on memory pooling use cases.

> The concept is wrong.  Nobody wants to tie all of their machines together
> into a giant single failure domain.  There's no possible redundancy
> here.  Availability is diminished; how do you upgrade firmware on a
> switch without taking it down?  Nobody can answer my contentions about
> contention either; preventing a single machine from hogging access to
> a single CXL endpoint seems like an unsolved problem.
> 
> CXL is great for its real purpose of attaching GPUs and migrating memory
> back and forth in a software-transparent way.  We should support that,
> and nothing more.
> 
> We should reject this technology before it harms our kernel and the
> entire industry.  There's a reason that SGI died.  Nobody wants to buy
> single image machines the size of a data centre.
> 
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] Memory tiering kernel alignment
  2024-01-25 20:04   ` David Rientjes
@ 2024-01-25 20:19     ` Matthew Wilcox
  2024-01-25 21:37       ` David Rientjes
  2024-01-29 10:27       ` David Hildenbrand
  0 siblings, 2 replies; 17+ messages in thread
From: Matthew Wilcox @ 2024-01-25 20:19 UTC (permalink / raw)
  To: David Rientjes
  Cc: John Hubbard, Zi Yan, Bharata B Rao, Dave Jiang,
	Aneesh Kumar K.V, Huang, Ying, Alistair Popple,
	Christoph Lameter, Andrew Morton, Linus Torvalds, Dave Hansen,
	Mel Gorman, Jon Grimm, Gregory Price, Brian Morris, Wei Xu,
	Johannes Weiner, linux-mm

On Thu, Jan 25, 2024 at 12:04:37PM -0800, David Rientjes wrote:
> On Thu, 25 Jan 2024, Matthew Wilcox wrote:
> > On Thu, Jan 25, 2024 at 10:26:19AM -0800, David Rientjes wrote:
> > > There is a lot of excitement around upcoming CXL type 3 memory expansion
> > > devices and their cost savings potential.  As the industry starts to
> > > adopt this technology, one of the key components in strategic planning is
> > > how the upstream Linux kernel will support various tiered configurations
> > > to meet various user needs.  I think it goes without saying that this is
> > > quite interesting to cloud providers as well as other hyperscalers :)
> > 
> > I'm not excited.  I'm disappointed that people are falling for this scam.
> > CXL is the ATM of this decade.  The protocol is not fit for the purpose
> > of accessing remote memory, adding 10ns just for an encode/decode cycle.
> > Hands up everybody who's excited about memory latency increasing by 17%.
> 
> Right, I don't think that anybody is claiming that we can leverage locally 
> attached CXL memory as through it was DRAM on the same or remote socket 
> and that there won't be a noticable impact to application performance 
> while the memory is still across the device.
> 
> It does offer several cost savings benefits for offloading of cold memory, 
> though, if locally attached and I think the support for that use case is 
> inevitable -- in fact, Linux has some sophisticated support for the 
> locally attached use case already.
> 
> > Then there are the lies from the vendors who want you to buy switches.
> > Not one of them are willing to guarantee you the worst case latency
> > through their switches.
> 
> I should have prefaced this thread by saying "locally attached CXL memory 
> expansion", because that's the primary focus of many of the folks on this 
> email thread :)

That's a huge relief.  I was not looking forward to the patches to add
support for pooling (etc).

Using CXL as cold-data-storage makes a certain amount of sense, although
I'm not really sure why it offers an advantage over NAND.  It's faster
than NAND, but you still want to bring it back locally before operating
on it.  NAND is denser, and consumes less power while idle.  NAND comes
with a DMA controller to move the data instead of relying on the CPU to
move the data around.  And of course moving the data first to CXL and
then to swap means that it's got to go over the memory bus multiple
times, unless you're building a swap device which attaches to the
other end of the CXL bus ...



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] Memory tiering kernel alignment
  2024-01-25 20:19     ` Matthew Wilcox
@ 2024-01-25 21:37       ` David Rientjes
  2024-01-25 22:28         ` Gregory Price
                           ` (3 more replies)
  2024-01-29 10:27       ` David Hildenbrand
  1 sibling, 4 replies; 17+ messages in thread
From: David Rientjes @ 2024-01-25 21:37 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: John Hubbard, Zi Yan, Bharata B Rao, Dave Jiang,
	Aneesh Kumar K.V, Huang, Ying, Alistair Popple,
	Christoph Lameter, Andrew Morton, Linus Torvalds, Dave Hansen,
	Mel Gorman, Jon Grimm, Gregory Price, Brian Morris, Wei Xu,
	Johannes Weiner, SeongJae Park, linux-mm


On Thu, 25 Jan 2024, Matthew Wilcox wrote:

> On Thu, Jan 25, 2024 at 12:04:37PM -0800, David Rientjes wrote:
> > On Thu, 25 Jan 2024, Matthew Wilcox wrote:
> > > On Thu, Jan 25, 2024 at 10:26:19AM -0800, David Rientjes wrote:
> > > > There is a lot of excitement around upcoming CXL type 3 memory expansion
> > > > devices and their cost savings potential.  As the industry starts to
> > > > adopt this technology, one of the key components in strategic planning is
> > > > how the upstream Linux kernel will support various tiered configurations
> > > > to meet various user needs.  I think it goes without saying that this is
> > > > quite interesting to cloud providers as well as other hyperscalers :)
> > > 
> > > I'm not excited.  I'm disappointed that people are falling for this scam.
> > > CXL is the ATM of this decade.  The protocol is not fit for the purpose
> > > of accessing remote memory, adding 10ns just for an encode/decode cycle.
> > > Hands up everybody who's excited about memory latency increasing by 17%.
> > 
> > Right, I don't think that anybody is claiming that we can leverage locally 
> > attached CXL memory as through it was DRAM on the same or remote socket 
> > and that there won't be a noticable impact to application performance 
> > while the memory is still across the device.
> > 
> > It does offer several cost savings benefits for offloading of cold memory, 
> > though, if locally attached and I think the support for that use case is 
> > inevitable -- in fact, Linux has some sophisticated support for the 
> > locally attached use case already.
> > 
> > > Then there are the lies from the vendors who want you to buy switches.
> > > Not one of them are willing to guarantee you the worst case latency
> > > through their switches.
> > 
> > I should have prefaced this thread by saying "locally attached CXL memory 
> > expansion", because that's the primary focus of many of the folks on this 
> > email thread :)
> 
> That's a huge relief.  I was not looking forward to the patches to add
> support for pooling (etc).
> 
> Using CXL as cold-data-storage makes a certain amount of sense, although
> I'm not really sure why it offers an advantage over NAND.  It's faster
> than NAND, but you still want to bring it back locally before operating
> on it.  NAND is denser, and consumes less power while idle.  NAND comes
> with a DMA controller to move the data instead of relying on the CPU to
> move the data around.  And of course moving the data first to CXL and
> then to swap means that it's got to go over the memory bus multiple
> times, unless you're building a swap device which attaches to the
> other end of the CXL bus ...
> 

This is **exactly** the type of discussion we're looking to have :)

There are some things that I've chatted informally with folks about that 
I'd like to bring to the forum:

 - Decoupling CPU migration from memory migration for NUMA Balancing (or
   perhaps deprecating CPU migration entirely)

 - Allowing NUMA Balancing to do migration as part of a kthread 
   asynchronous to the NUMA hint fault, in kernel context

 - Abstraction for future hardware devices that can provide an expanded
   view into page hotness that can be leveraged in different areas of the
   kernel, including as a backend for NUMA Balanacing to replace NUMA
   hint faults

 - Per-container support for configuring balancing and memory migration

 - Opting certain types of memory into NUMA Balancing (like tmpfs) while
   leaving other types alone

 - Utilizing hardware accelerated memory migration as a replacement for
   the traditional migrate_pages() path when available

I could go code all of this up and spend an enormous amount of time doing 
so only to get NAKed by somebody because I'm ripping out their critical 
use case that I just didn't know about :)  There's also the question of 
whether DAMON should be the source of truth for this or it should be 
decoupled.

My dream world would be where we could discuss various use cases for 
locally attached CXL memory and determine, as a group, what the shared, 
comprehensive "Linux vision" for it is and do so before LSF/MM/BPF.  In a 
perfect world, we could block out an expanded MM session in Salt Lake City 
to bring all these concepts together, what approaches sound reasonable vs 
unreasonable, and leave that conference with a clear understanding of what 
needs to happen.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] Memory tiering kernel alignment
  2024-01-25 21:37       ` David Rientjes
@ 2024-01-25 22:28         ` Gregory Price
  2024-01-26  0:16         ` SeongJae Park
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Gregory Price @ 2024-01-25 22:28 UTC (permalink / raw)
  To: David Rientjes
  Cc: Matthew Wilcox, John Hubbard, Zi Yan, Bharata B Rao, Dave Jiang,
	Aneesh Kumar K.V, Huang, Ying, Alistair Popple,
	Christoph Lameter, Andrew Morton, Linus Torvalds, Dave Hansen,
	Mel Gorman, Jon Grimm, Gregory Price, Brian Morris, Wei Xu,
	Johannes Weiner, SeongJae Park, linux-mm

On Thu, Jan 25, 2024 at 01:37:02PM -0800, David Rientjes wrote:
> 
> On Thu, 25 Jan 2024, Matthew Wilcox wrote:
> > 
> > That's a huge relief.  I was not looking forward to the patches to add
> > support for pooling (etc).
> > 
> > Using CXL as cold-data-storage makes a certain amount of sense, although
> > I'm not really sure why it offers an advantage over NAND.  It's faster
> > than NAND, but you still want to bring it back locally before operating
> > on it.  NAND is denser, and consumes less power while idle.  NAND comes
> > 
> 
> This is **exactly** the type of discussion we're looking to have :)
> 
> There are some things that I've chatted informally with folks about that 
> I'd like to bring to the forum:
>
[...snip...]

Just going to toss in that tiering from a latency-only perspective is
also too narrow a focus. We should also consider bandwidth.

Saturating a channel means increased latencies - but the problem can be
alleviated by moving hot data *off* of heavily contended devices, even
if they are farther away. In a best case scenario, the addition of local
CXL devices can bring bandwidth-saturated DRAM back down into the
"maximum sustainable" (around ~70% of max), reducing average latencies
and increasing CPU utilization.

We've been finding there are a non-trivial number of workloads that
benefit more from distributing their hot data across their available
bandwidth than they do from just jamming it all on the local socket.

~Gregory


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] Memory tiering kernel alignment
  2024-01-25 18:26 [RFC] Memory tiering kernel alignment David Rientjes
  2024-01-25 18:52 ` Matthew Wilcox
@ 2024-01-26  0:04 ` SeongJae Park
       [not found] ` <tsnp3a6oxglx2siv7aoplo665k7xsigkqtpfm5yiu2r3wvys26@3vntgau4t2gv>
  2024-02-29  2:04 ` Davidlohr Bueso
  3 siblings, 0 replies; 17+ messages in thread
From: SeongJae Park @ 2024-01-26  0:04 UTC (permalink / raw)
  To: David Rientjes
  Cc: John Hubbard, Zi Yan, Bharata B Rao, Dave Jiang,
	Aneesh Kumar K.V, Huang, Ying, Alistair Popple,
	Christoph Lameter, Andrew Morton, Linus Torvalds, Dave Hansen,
	Mel Gorman, Jon Grimm, Gregory Price, Brian Morris, Wei Xu,
	Johannes Weiner, linux-mm

Hi David,

On Thu, 25 Jan 2024 10:26:19 -0800 (PST) David Rientjes <rientjes@google.com> wrote:

> Hi everybody,
> 
[...]
> I'm interested to learn if there is interest in forming a "Linux Memory
> Tiering Work Group" to share ideas, discuss multi-faceted approaches, and
> keep track of work items?
[...]
> If folks are interested in this topic and your name doesn't appear above
> (I already got you :), please:
> 
>  - reply-all to this email to express interest and expand upon the list
>    of topics above to represent additional areas of interest that should
>    be included, *or*
> 
>  - email me privately to express interest to make sure you are included

I'm taking the option one :)  I'm interested in this topic.  I'm not directly
working on this, but collaborating with several parties who working on the
area.  Some academic papers[1] exploring the usage have published, and recently
SK Hynix released their tiered memory management SDK using DAMON[2].  I also
shared a humble idea for DAMOS-based tiered memory management[3].  Please let
me in the loop.

[1] https://arxiv.org/abs/2302.09468
[2] https://github.com/skhynix/hmsdk/releases/tag/hmsdk-v2.0
[3] https://lore.kernel.org/r/20231112195602.61525-1-sj@kernel.org


Thanks,
SJ

[...]


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] Memory tiering kernel alignment
  2024-01-25 21:37       ` David Rientjes
  2024-01-25 22:28         ` Gregory Price
@ 2024-01-26  0:16         ` SeongJae Park
  2024-01-26 21:06         ` Christoph Lameter (Ampere)
  2024-01-28 20:15         ` David Rientjes
  3 siblings, 0 replies; 17+ messages in thread
From: SeongJae Park @ 2024-01-26  0:16 UTC (permalink / raw)
  To: David Rientjes
  Cc: Matthew Wilcox, John Hubbard, Zi Yan, Bharata B Rao, Dave Jiang,
	Aneesh Kumar K.V, Huang, Ying, Alistair Popple,
	Christoph Lameter, Andrew Morton, Linus Torvalds, Dave Hansen,
	Mel Gorman, Jon Grimm, Gregory Price, Brian Morris, Wei Xu,
	Johannes Weiner, SeongJae Park, linux-mm

On Thu, 25 Jan 2024 13:37:02 -0800 (PST) David Rientjes <rientjes@google.com> wrote:

> 
> On Thu, 25 Jan 2024, Matthew Wilcox wrote:
> 
> > On Thu, Jan 25, 2024 at 12:04:37PM -0800, David Rientjes wrote:
> > > On Thu, 25 Jan 2024, Matthew Wilcox wrote:
> > > > On Thu, Jan 25, 2024 at 10:26:19AM -0800, David Rientjes wrote:
[...]
> This is **exactly** the type of discussion we're looking to have :)
> 
> There are some things that I've chatted informally with folks about that 
> I'd like to bring to the forum:
> 
>  - Decoupling CPU migration from memory migration for NUMA Balancing (or
>    perhaps deprecating CPU migration entirely)
> 
>  - Allowing NUMA Balancing to do migration as part of a kthread 
>    asynchronous to the NUMA hint fault, in kernel context
> 
>  - Abstraction for future hardware devices that can provide an expanded
>    view into page hotness that can be leveraged in different areas of the
>    kernel, including as a backend for NUMA Balanacing to replace NUMA
>    hint faults
> 
>  - Per-container support for configuring balancing and memory migration
> 
>  - Opting certain types of memory into NUMA Balancing (like tmpfs) while
>    leaving other types alone
> 
>  - Utilizing hardware accelerated memory migration as a replacement for
>    the traditional migrate_pages() path when available
> 
> I could go code all of this up and spend an enormous amount of time doing 
> so only to get NAKed by somebody because I'm ripping out their critical 
> use case that I just didn't know about :)  There's also the question of 
> whether DAMON should be the source of truth for this or it should be 
> decoupled.

I wouldn't dare to say DAMON should be the source of truth, but I hope DAMON to
be somewhat useful.  DAMON is designed to be able to be easily extended[1] for
various access monitoring / memory management primitives including hardware
features.  DAMOS of today provides a feature called filter[2], which allows
applying specific operations to specific pages depending on their
non-access-pattern information including type (anon vs file-backed) and which
memcg it belongs to.  Hence I think DAMON might be able to be used for a few of
above cases.

[1] https://docs.kernel.org/mm/damon/design.html#configurable-operations-set
[2] https://docs.kernel.org/mm/damon/design.html#filters


Thanks,
SJ

[...]


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] Memory tiering kernel alignment
       [not found] ` <tsnp3a6oxglx2siv7aoplo665k7xsigkqtpfm5yiu2r3wvys26@3vntgau4t2gv>
@ 2024-01-26 14:31   ` John Groves
  0 siblings, 0 replies; 17+ messages in thread
From: John Groves @ 2024-01-26 14:31 UTC (permalink / raw)
  To: David Rientjes
  Cc: John Hubbard, Zi Yan, Bharata B Rao, Dave Jiang,
	Aneesh Kumar K.V, Huang, Ying, Alistair Popple,
	Christoph Lameter, Andrew Morton, Linus Torvalds, Dave Hansen,
	Mel Gorman, Jon Grimm, Gregory Price, Brian Morris, Wei Xu,
	Johannes Weiner, linux-mm

On 24/01/25 10:26AM, David Rientjes wrote:
> Hi everybody,
> 
> There is a lot of excitement around upcoming CXL type 3 memory expansion
> devices and their cost savings potential.  As the industry starts to
> adopt this technology, one of the key components in strategic planning is
> how the upstream Linux kernel will support various tiered configurations
> to meet various user needs.  I think it goes without saying that this is
> quite interesting to cloud providers as well as other hyperscalers :)
> 
> I think this discussion would benefit from a collaborative approach
> between various stakeholders and interested parties.  Reason being is
> that there are several different use cases the need different support
> models, but also because there is great incentive toward moving "with"
> upstream Linux for this support rather than having multiple parties
> bringing up their own stacks only to find that they are diverging from
> upstream rather than converging with it.
> 
> I'm interested to learn if there is interest in forming a "Linux Memory
> Tiering Work Group" to share ideas, discuss multi-faceted approaches, and
> keep track of work items?
> 
> Some recent discussions have proven that there is widespread interest in
> some very foundational topics for this technology such as:
> 
>  - Decoupling CPU balancing from memory balancing (or obsoleting CPU
>    balancing entirely)
> 
>    + John Hubbard notes this would be useful for GPUs:
> 
>       a) GPUs have their own processors that are invisible to the kernel's
>          NUMA "which tasks are active on which NUMA nodes" calculations,
>          and
> 
>       b) Similar to where CXL is generally going, we have already built
>          fully memory-coherent hardware, which include memory-only NUMA
>          nodes.
> 
>  - In-kernel hot memory abstraction, informed by hardware hinting drivers
>    (incl some architectures like Power10), usable as a NUMA Balancing
>    backend for promotion and other areas of the kernel like transparent
>    hugepage utilization
> 
>  - NUMA and memory tiering enlightenment for accelerators, such as for
>    optimal use of GPU memory, extremely important for a cloud provider
>    (hint hint :)
> 
>  - Asynchronous memory promotion independent of task_numa_fault() while
>    considering the cost of page migration (due to identifying cold memory)
> 
> It looks like there is already some interest in such a working group that
> would have a biweekly discussion of shared interests with the goal of
> accelerating design, development, testing, and division of work:
> 
> Alistair Popple
> Aneesh Kumar K V
> Brian Morris
> Christoph Lameter
> Dan Williams
> Gregory Price
> Grimm, Jon
> Huang, Ying
> Johannes Weiner
> John Hubbard
> Zi Yan
> 
> Specifically for the in-kernel hot memory abstraction topic, Google and
> Meta recently publushed an OCP base specification "Hyperscale CXL Tiered
> Memory Expander Specification" available at
> https://drive.google.com/file/d/1fFfU7dFmCyl6V9-9qiakdWaDr9d38ewZ/view?usp=drive_link
> that would be great to discuss.
> 
> There is also on-going work in the CXL Consortium to standardize some of
> the abstractions for CXL 3.1.
> 
> If folks are interested in this topic and your name doesn't appear above
> (I already got you :), please:
> 
>  - reply-all to this email to express interest and expand upon the list
>    of topics above to represent additional areas of interest that should
>    be included, *or*
> 
>  - email me privately to express interest to make sure you are included
> 
> Perhaps I'm overly optimistic, but one thing that would be absolutely
> *amazing* would be if we all have a very clear and understandable vision
> for how Linux will support the wide variety of use cases, even before
> that work is fully implemented (or even designed), by LSF/MM/BPF 2024
> time in May.
> 
> Thanks!
> 

Please add me to the cxl interested parties list. 

John Groves (jgroves@micron.com / John@Jagalactic.com)




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] Memory tiering kernel alignment
  2024-01-25 18:52 ` Matthew Wilcox
  2024-01-25 20:04   ` David Rientjes
@ 2024-01-26 20:41   ` Christoph Lameter (Ampere)
  1 sibling, 0 replies; 17+ messages in thread
From: Christoph Lameter (Ampere) @ 2024-01-26 20:41 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: David Rientjes, John Hubbard, Zi Yan, Bharata B Rao, Dave Jiang,
	Aneesh Kumar K.V, Huang, Ying, Alistair Popple, Andrew Morton,
	Linus Torvalds, Dave Hansen, Mel Gorman, Jon Grimm,
	Gregory Price, Brian Morris, Wei Xu, Johannes Weiner, linux-mm

On Thu, 25 Jan 2024, Matthew Wilcox wrote:

> We should reject this technology before it harms our kernel and the
> entire industry.  There's a reason that SGI died.  Nobody wants to buy
> single image machines the size of a data centre.

Some people with big pockets needed these data center sized single image 
machines. SGI died because Intel was not able to support large memory 
(petabytes!) anymore after they neglected to develop the processor that 
they promised contractually to SGI and therefore the tech became unusuable 
for the deep pocketed customers.

The Linux kernel worked just fine with petabyte sized address spaces.

There are certainly use cases for large memory pools. They can be created 
and have been improvised using RDMA. Basically shifting memory back and 
forth into the small memory processor spaces that Intel confined us in by 
taking sections of a simulated petabyte sized cross machine "address 
space" that is spread over lots of network nodes.

CXL could perhaps allow us to come up with a better solution.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] Memory tiering kernel alignment
  2024-01-25 21:37       ` David Rientjes
  2024-01-25 22:28         ` Gregory Price
  2024-01-26  0:16         ` SeongJae Park
@ 2024-01-26 21:06         ` Christoph Lameter (Ampere)
  2024-01-26 23:03           ` Gregory Price
  2024-01-28 20:15         ` David Rientjes
  3 siblings, 1 reply; 17+ messages in thread
From: Christoph Lameter (Ampere) @ 2024-01-26 21:06 UTC (permalink / raw)
  To: David Rientjes
  Cc: Matthew Wilcox, John Hubbard, Zi Yan, Bharata B Rao, Dave Jiang,
	Aneesh Kumar K.V, Huang, Ying, Alistair Popple, Andrew Morton,
	Linus Torvalds, Dave Hansen, Mel Gorman, Jon Grimm,
	Gregory Price, Brian Morris, Wei Xu, Johannes Weiner,
	SeongJae Park, linux-mm

On Thu, 25 Jan 2024, David Rientjes wrote:

> My dream world would be where we could discuss various use cases for
> locally attached CXL memory and determine, as a group, what the shared,
> comprehensive "Linux vision" for it is and do so before LSF/MM/BPF.  In a
> perfect world, we could block out an expanded MM session in Salt Lake City
> to bring all these concepts together, what approaches sound reasonable vs
> unreasonable, and leave that conference with a clear understanding of what
> needs to happen.

I thought the main use of CXL is as a standardized interconnect. We 
finally can link up heterogeneous systems with various types of nodes in 
larger system configuration. As such it could contain processor nodes, 
memory nodes and i/o nodes and allow the setup of powerful systems with 
large address spaces, coprocessors, diverse types of processing etc etc.

Well yes this is going to create some work but it looks like an exciting 
way to move forward to more powerful system configurations.

All of this will then be possible then in pretty small configurations of 
just a couple of chips.




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] Memory tiering kernel alignment
  2024-01-26 21:06         ` Christoph Lameter (Ampere)
@ 2024-01-26 23:03           ` Gregory Price
  0 siblings, 0 replies; 17+ messages in thread
From: Gregory Price @ 2024-01-26 23:03 UTC (permalink / raw)
  To: Christoph Lameter (Ampere)
  Cc: David Rientjes, Matthew Wilcox, John Hubbard, Zi Yan,
	Bharata B Rao, Dave Jiang, Aneesh Kumar K.V, Huang, Ying,
	Alistair Popple, Andrew Morton, Linus Torvalds, Dave Hansen,
	Mel Gorman, Jon Grimm, Gregory Price, Brian Morris, Wei Xu,
	Johannes Weiner, SeongJae Park, linux-mm

On Fri, Jan 26, 2024 at 01:06:25PM -0800, Christoph Lameter (Ampere) wrote:
> On Thu, 25 Jan 2024, David Rientjes wrote:
> 
> > My dream world would be where we could discuss various use cases for
> > locally attached CXL memory and determine, as a group, what the shared,
> > comprehensive "Linux vision" for it is and do so before LSF/MM/BPF.  In a
> > perfect world, we could block out an expanded MM session in Salt Lake City
> > to bring all these concepts together, what approaches sound reasonable vs
> > unreasonable, and leave that conference with a clear understanding of what
> > needs to happen.
> 
> I thought the main use of CXL is as a standardized interconnect. We finally
> can link up heterogeneous systems with various types of nodes in larger
> system configuration. As such it could contain processor nodes, memory nodes
> and i/o nodes and allow the setup of powerful systems with large address
> spaces, coprocessors, diverse types of processing etc etc.

if you ask 30 people what the "main use" of CXL is/will be, you will get
30 different answers.  We should not try to solve world peace here.

That starts with defining scope, and I think topological details should
be extremely limited for a general tiering system. Otherwise we're
inviting serious trouble.

> 
> Well yes this is going to create some work but it looks like an exciting way
> to move forward to more powerful system configurations.
> 
> All of this will then be possible then in pretty small configurations of
> just a couple of chips.
> 
> 
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] Memory tiering kernel alignment
  2024-01-25 21:37       ` David Rientjes
                           ` (2 preceding siblings ...)
  2024-01-26 21:06         ` Christoph Lameter (Ampere)
@ 2024-01-28 20:15         ` David Rientjes
  3 siblings, 0 replies; 17+ messages in thread
From: David Rientjes @ 2024-01-28 20:15 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: John Hubbard, Zi Yan, Bharata B Rao, Dave Jiang,
	Aneesh Kumar K.V, Huang, Ying, Alistair Popple,
	Christoph Lameter, Andrew Morton, Linus Torvalds, Dave Hansen,
	Mel Gorman, Jon Grimm, Gregory Price, Brian Morris, Wei Xu,
	Johannes Weiner, SeongJae Park, linux-mm

On Thu, 25 Jan 2024, David Rientjes wrote:

> This is **exactly** the type of discussion we're looking to have :)
> 
> There are some things that I've chatted informally with folks about that 
> I'd like to bring to the forum:
> 
>  - Decoupling CPU migration from memory migration for NUMA Balancing (or
>    perhaps deprecating CPU migration entirely)
> 
>  - Allowing NUMA Balancing to do migration as part of a kthread 
>    asynchronous to the NUMA hint fault, in kernel context
> 
>  - Abstraction for future hardware devices that can provide an expanded
>    view into page hotness that can be leveraged in different areas of the
>    kernel, including as a backend for NUMA Balanacing to replace NUMA
>    hint faults
> 
>  - Per-container support for configuring balancing and memory migration
> 
>  - Opting certain types of memory into NUMA Balancing (like tmpfs) while
>    leaving other types alone
> 
>  - Utilizing hardware accelerated memory migration as a replacement for
>    the traditional migrate_pages() path when available
> 

It would be absolutely stunning if all of the above is non controversial 
:)

I would imagine that there are some (perhaps strong) opinions polarized 
between "ok, this makes sense" and "ok, this seems crazy" for each one of 
these.  Or, at least, "prove to me that this makes sense."

We can definitely wait for this work group to be formed and then come back 
to the mailing list, but any early feedback on these would be much 
appreciated especially if the opinions are polarized toward the at least 
some of these being on the crazy side.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] Memory tiering kernel alignment
  2024-01-25 20:19     ` Matthew Wilcox
  2024-01-25 21:37       ` David Rientjes
@ 2024-01-29 10:27       ` David Hildenbrand
  1 sibling, 0 replies; 17+ messages in thread
From: David Hildenbrand @ 2024-01-29 10:27 UTC (permalink / raw)
  To: Matthew Wilcox, David Rientjes
  Cc: John Hubbard, Zi Yan, Bharata B Rao, Dave Jiang,
	Aneesh Kumar K.V, Huang, Ying, Alistair Popple,
	Christoph Lameter, Andrew Morton, Linus Torvalds, Dave Hansen,
	Mel Gorman, Jon Grimm, Gregory Price, Brian Morris, Wei Xu,
	Johannes Weiner, linux-mm

On 25.01.24 21:19, Matthew Wilcox wrote:
> On Thu, Jan 25, 2024 at 12:04:37PM -0800, David Rientjes wrote:
>> On Thu, 25 Jan 2024, Matthew Wilcox wrote:
>>> On Thu, Jan 25, 2024 at 10:26:19AM -0800, David Rientjes wrote:
>>>> There is a lot of excitement around upcoming CXL type 3 memory expansion
>>>> devices and their cost savings potential.  As the industry starts to
>>>> adopt this technology, one of the key components in strategic planning is
>>>> how the upstream Linux kernel will support various tiered configurations
>>>> to meet various user needs.  I think it goes without saying that this is
>>>> quite interesting to cloud providers as well as other hyperscalers :)
>>>
>>> I'm not excited.  I'm disappointed that people are falling for this scam.
>>> CXL is the ATM of this decade.  The protocol is not fit for the purpose
>>> of accessing remote memory, adding 10ns just for an encode/decode cycle.
>>> Hands up everybody who's excited about memory latency increasing by 17%.
>>
>> Right, I don't think that anybody is claiming that we can leverage locally
>> attached CXL memory as through it was DRAM on the same or remote socket
>> and that there won't be a noticable impact to application performance
>> while the memory is still across the device.
>>
>> It does offer several cost savings benefits for offloading of cold memory,
>> though, if locally attached and I think the support for that use case is
>> inevitable -- in fact, Linux has some sophisticated support for the
>> locally attached use case already.
>>
>>> Then there are the lies from the vendors who want you to buy switches.
>>> Not one of them are willing to guarantee you the worst case latency
>>> through their switches.
>>
>> I should have prefaced this thread by saying "locally attached CXL memory
>> expansion", because that's the primary focus of many of the folks on this
>> email thread :)
> 
> That's a huge relief.  I was not looking forward to the patches to add
> support for pooling (etc).

The issue is that CXL standard is at this point extremely 
over-engineered with obscure use cases, and features that feel 
completely detached from reality -- especially, what a commodity OS can 
support and would be willing to support.

Thanks for expressing what I've been thinking all of the time. CXL is 
IMHO great for cheap (slow/cold) memory, GPGPUs etc, and I'm hoping that 
will be the primary focus for the near future -- not all of that dynamic 
capacity, memory pooling etc crap (sorry).

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] Memory tiering kernel alignment
  2024-01-25 18:26 [RFC] Memory tiering kernel alignment David Rientjes
                   ` (2 preceding siblings ...)
       [not found] ` <tsnp3a6oxglx2siv7aoplo665k7xsigkqtpfm5yiu2r3wvys26@3vntgau4t2gv>
@ 2024-02-29  2:04 ` Davidlohr Bueso
  2024-02-29  4:01   ` Bharata B Rao
  3 siblings, 1 reply; 17+ messages in thread
From: Davidlohr Bueso @ 2024-02-29  2:04 UTC (permalink / raw)
  To: David Rientjes
  Cc: John Hubbard, Zi Yan, Bharata B Rao, Dave Jiang,
	Aneesh Kumar K.V, Huang, Ying, Alistair Popple,
	Christoph Lameter, Andrew Morton, Linus Torvalds, Dave Hansen,
	Mel Gorman, Jon Grimm, Gregory Price, Brian Morris, Wei Xu,
	Johannes Weiner, linux-mm, Adam Manzanares

On Thu, 25 Jan 2024, David Rientjes wrote:

>Some recent discussions have proven that there is widespread interest in
>some very foundational topics for this technology such as:
>
> - Decoupling CPU balancing from memory balancing (or obsoleting CPU
>   balancing entirely)
>
>   + John Hubbard notes this would be useful for GPUs:
>
>      a) GPUs have their own processors that are invisible to the kernel's
>         NUMA "which tasks are active on which NUMA nodes" calculations,
>         and
>
>      b) Similar to where CXL is generally going, we have already built
>         fully memory-coherent hardware, which include memory-only NUMA
>         nodes.
>
> - In-kernel hot memory abstraction, informed by hardware hinting drivers
>   (incl some architectures like Power10), usable as a NUMA Balancing
>   backend for promotion and other areas of the kernel like transparent
>   hugepage utilization

Regarding the hardware counters, can/will CPU vendors provide something
better for what is currently there for PEBS/IBS - which needs a lot of
stat crunching to make it useful for hot page detection. imo if any sort
of hw assistance is going to be used, it better be *big* win vs anything
only in software - muddy numbers aren't worth the hassle. Power10 accounts
for time decay, and therefore would be better suited. iirc there was some
mention to possibly model after something along these lines.

Thanks,
Davidlohr


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] Memory tiering kernel alignment
  2024-02-29  2:04 ` Davidlohr Bueso
@ 2024-02-29  4:01   ` Bharata B Rao
  2024-02-29 18:23     ` SeongJae Park
  0 siblings, 1 reply; 17+ messages in thread
From: Bharata B Rao @ 2024-02-29  4:01 UTC (permalink / raw)
  To: David Rientjes, John Hubbard, Zi Yan, Dave Jiang,
	Aneesh Kumar K.V, Huang, Ying, Alistair Popple,
	Christoph Lameter, Andrew Morton, Linus Torvalds, Dave Hansen,
	Mel Gorman, Jon Grimm, Gregory Price, Brian Morris, Wei Xu,
	Johannes Weiner, linux-mm, Adam Manzanares

On 29-Feb-24 7:34 AM, Davidlohr Bueso wrote:
> On Thu, 25 Jan 2024, David Rientjes wrote:
> 
>> Some recent discussions have proven that there is widespread interest in
>> some very foundational topics for this technology such as:
>>
>> - Decoupling CPU balancing from memory balancing (or obsoleting CPU
>>   balancing entirely)
>>
>>   + John Hubbard notes this would be useful for GPUs:
>>
>>      a) GPUs have their own processors that are invisible to the kernel's
>>         NUMA "which tasks are active on which NUMA nodes" calculations,
>>         and
>>
>>      b) Similar to where CXL is generally going, we have already built
>>         fully memory-coherent hardware, which include memory-only NUMA
>>         nodes.
>>
>> - In-kernel hot memory abstraction, informed by hardware hinting drivers
>>   (incl some architectures like Power10), usable as a NUMA Balancing
>>   backend for promotion and other areas of the kernel like transparent
>>   hugepage utilization
> 
> Regarding the hardware counters, can/will CPU vendors provide something
> better for what is currently there for PEBS/IBS - which needs a lot of
> stat crunching to make it useful for hot page detection.

IBS works independent of PMCs and reports useful information like the
virtual and physical address of the access, precise IP, Data Source
info (like cache, DRAM, External memory/CXL etc), remote node indication
etc. Hence it doesn't really need stat crunching.

However it captures and reports access information based on sampling
and I have seen that the best sampling interval isn't always good enough
to match the number of accesses captured by software based mechanism
like NUMA balancing.

Regards,
Bharata.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] Memory tiering kernel alignment
  2024-02-29  4:01   ` Bharata B Rao
@ 2024-02-29 18:23     ` SeongJae Park
  0 siblings, 0 replies; 17+ messages in thread
From: SeongJae Park @ 2024-02-29 18:23 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: David Rientjes, John Hubbard, Zi Yan, Dave Jiang,
	Aneesh Kumar K.V, Huang, Ying, Alistair Popple,
	Christoph Lameter, Andrew Morton, Linus Torvalds, Dave Hansen,
	Mel Gorman, Jon Grimm, Gregory Price, Brian Morris, Wei Xu,
	Johannes Weiner, linux-mm, Adam Manzanares

On Thu, 29 Feb 2024 09:31:02 +0530 Bharata B Rao <bharata@amd.com> wrote:

> On 29-Feb-24 7:34 AM, Davidlohr Bueso wrote:
> > On Thu, 25 Jan 2024, David Rientjes wrote:
> > 
> >> Some recent discussions have proven that there is widespread interest in
> >> some very foundational topics for this technology such as:
> >>
> >> - Decoupling CPU balancing from memory balancing (or obsoleting CPU
> >>   balancing entirely)
> >>
> >>   + John Hubbard notes this would be useful for GPUs:
> >>
> >>      a) GPUs have their own processors that are invisible to the kernel's
> >>         NUMA "which tasks are active on which NUMA nodes" calculations,
> >>         and
> >>
> >>      b) Similar to where CXL is generally going, we have already built
> >>         fully memory-coherent hardware, which include memory-only NUMA
> >>         nodes.
> >>
> >> - In-kernel hot memory abstraction, informed by hardware hinting drivers
> >>   (incl some architectures like Power10), usable as a NUMA Balancing
> >>   backend for promotion and other areas of the kernel like transparent
> >>   hugepage utilization
> > 
> > Regarding the hardware counters, can/will CPU vendors provide something
> > better for what is currently there for PEBS/IBS - which needs a lot of
> > stat crunching to make it useful for hot page detection.
> 
> IBS works independent of PMCs and reports useful information like the
> virtual and physical address of the access, precise IP, Data Source
> info (like cache, DRAM, External memory/CXL etc), remote node indication
> etc. Hence it doesn't really need stat crunching.
> 
> However it captures and reports access information based on sampling
> and I have seen that the best sampling interval isn't always good enough
> to match the number of accesses captured by software based mechanism
> like NUMA balancing.

This is same for sampling methods using time interval such as DAMON.  I'm
trying to make a sort of automatic tuning of such intervals for DAMON based on
realtime monitoring results and real request from users.  For example, if the
user want to find hottest memory region of X % of the total memory, we could
draw hotness histogram of the memory and get some clue about if the current
sampling interval is too large or small.  Just a vague idea.  I haven't spent
time for this topic so far.

Since DAMON is designed to be easy to be extended with multiple access check
methods including hardware features like IBS, I think the DAMON level
auto-tuning might help in this case.


Thanks,
SJ

> 
> Regards,
> Bharata.
> 
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2024-02-29 18:23 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-25 18:26 [RFC] Memory tiering kernel alignment David Rientjes
2024-01-25 18:52 ` Matthew Wilcox
2024-01-25 20:04   ` David Rientjes
2024-01-25 20:19     ` Matthew Wilcox
2024-01-25 21:37       ` David Rientjes
2024-01-25 22:28         ` Gregory Price
2024-01-26  0:16         ` SeongJae Park
2024-01-26 21:06         ` Christoph Lameter (Ampere)
2024-01-26 23:03           ` Gregory Price
2024-01-28 20:15         ` David Rientjes
2024-01-29 10:27       ` David Hildenbrand
2024-01-26 20:41   ` Christoph Lameter (Ampere)
2024-01-26  0:04 ` SeongJae Park
     [not found] ` <tsnp3a6oxglx2siv7aoplo665k7xsigkqtpfm5yiu2r3wvys26@3vntgau4t2gv>
2024-01-26 14:31   ` John Groves
2024-02-29  2:04 ` Davidlohr Bueso
2024-02-29  4:01   ` Bharata B Rao
2024-02-29 18:23     ` SeongJae Park

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.