* CXL 2.0 memory pooling emulation
@ 2023-02-08 22:28 zhiting zhu
  2023-02-15 15:18 ` Jonathan Cameron via
  0 siblings, 1 reply; 7+ messages in thread
From: zhiting zhu @ 2023-02-08 22:28 UTC (permalink / raw)
  To: qemu-devel


Hi,

I saw a PoC:
https://lore.kernel.org/qemu-devel/20220525121422.00003a84@Huawei.com/T/
implementing memory pooling and a fabric manager on QEMU. Is there any
further development on this? Can QEMU emulate memory pooling in a simple
case where two virtual machines are connected to a CXL switch that has
some memory devices attached?

Best,
Zhiting


* Re: CXL 2.0 memory pooling emulation
  2023-02-15 15:18 ` Jonathan Cameron via
@ 2023-02-15  9:10   ` Gregory Price
  2023-02-16 18:00     ` Jonathan Cameron via
  0 siblings, 1 reply; 7+ messages in thread
From: Gregory Price @ 2023-02-15  9:10 UTC (permalink / raw)
  To: Jonathan Cameron; +Cc: zhiting zhu, qemu-devel, linux-cxl, Viacheslav A.Dubeyko

On Wed, Feb 15, 2023 at 03:18:54PM +0000, Jonathan Cameron via wrote:
> On Wed, 8 Feb 2023 16:28:44 -0600
> zhiting zhu <zhitingz@cs.utexas.edu> wrote:
> 
> > Hi,
> > 
> > I saw a PoC:
> > https://lore.kernel.org/qemu-devel/20220525121422.00003a84@Huawei.com/T/
> > implementing memory pooling and a fabric manager on QEMU. Is there any
> > further development on this? Can QEMU emulate memory pooling in a simple
> > case where two virtual machines are connected to a CXL switch that has
> > some memory devices attached?
> > 
> > Best,
> > Zhiting
> [... snip ...]
> 
> Note though that there is a long way to go before we can do what you
> want.  The steps I'd expect to see along the way:
> 
> 1) Emulate a Multi-Headed Device.
>    Initially connect two heads to different host bridges on a single QEMU
>    machine.  That lets us test most of the code flows without needing
>    to handle tests that involve multiple machines.
>    Later, we could add a means to connect between two instances of QEMU.

I've been playing with this a bit.

The hackiest way to do this is to connect the same memory backend to two
type-3 devices, with the obvious caveat that the device state will not
be consistent between views.

But we could, for example, just put the relevant shared state into an
optional shared memory area instead of a normally allocated region.

I can imagine this looking something like:

memory-backend-file,id=mem0,mem-path=/tmp/mem0,size=4G,share=true
cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0,shm_token=mytoken

Then you can have multiple QEMU instances hook their relevant devices up
to a backend that points to the same file, and instantiate their shared
state in the region returned by shmget(mytoken).
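
To make that concrete, the device-side attach could be little more than
the following - a pure sketch, where the struct layout and names are
invented and the string token would presumably be turned into a key_t
via ftok() or similar:

  #include <stdint.h>
  #include <sys/ipc.h>
  #include <sys/shm.h>

  /* Invented layout: whatever state must be common to all heads. */
  struct mhd_shared_state {
      uint32_t nr_heads;
      uint32_t lock;      /* placeholder - needs a real cross-process lock */
      /* ... extent/allocation tracking, head-to-capacity mapping, ... */
  };

  static struct mhd_shared_state *mhd_attach_shared(key_t token)
  {
      int id = shmget(token, sizeof(struct mhd_shared_state),
                      IPC_CREAT | 0600);
      void *p;

      if (id < 0) {
          return NULL;
      }
      p = shmat(id, NULL, 0);
      return p == (void *)-1 ? NULL : p;
  }

The first instance to create the segment would initialize it; everyone
else just attaches.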

Additionally, these devices will require a set of what amounts to
vendor-specific mailbox commands - since the spec doesn't really define
what multi-headed devices "should do" to manage exclusivity.

Not sure if this would be upstream-worthy, or if we'd want to fork
mem/cxl-type3.c into something like mem/cxl-type3-multihead.c.

The base type3 device is going to end up overloaded at some point, I
think, and we'll want to look at trying to abstract it.
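
If we do abstract it, one shape that could take - names invented, purely
a sketch that assumes the CXLType3Dev struct and TYPE_CXL_TYPE3 macro
from hw/cxl/cxl_device.h - is a QOM subclass of the existing device
rather than a copy of the file:

  /* hypothetical hw/mem/cxl_type3_mhd.c */
  #include "qemu/osdep.h"
  #include "hw/cxl/cxl_device.h"

  typedef struct CXLType3MHDev {
      CXLType3Dev parent_obj;   /* reuse the whole existing type3 device */
      char *shm_token;          /* the invented property from above */
      void *shared_state;       /* e.g. the shm segment */
  } CXLType3MHDev;

  static void ct3_mhd_class_init(ObjectClass *oc, void *data)
  {
      /* override realize and the mailbox hooks here with the
       * multi-head specific behaviour */
  }

  static const TypeInfo cxl_type3_mhd_info = {
      .name          = "cxl-type3-mhd",    /* invented device name */
      .parent        = TYPE_CXL_TYPE3,
      .instance_size = sizeof(CXLType3MHDev),
      .class_init    = ct3_mhd_class_init,
  };
  /* plus the usual type_init()/type_register_static() boilerplate */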

~Gregory


* Re: CXL 2.0 memory pooling emulation
  2023-02-08 22:28 CXL 2.0 memory pooling emulation zhiting zhu
@ 2023-02-15 15:18 ` Jonathan Cameron via
  2023-02-15  9:10   ` Gregory Price
  0 siblings, 1 reply; 7+ messages in thread
From: Jonathan Cameron via @ 2023-02-15 15:18 UTC (permalink / raw)
  To: zhiting zhu; +Cc: qemu-devel, linux-cxl, Viacheslav A.Dubeyko

On Wed, 8 Feb 2023 16:28:44 -0600
zhiting zhu <zhitingz@cs.utexas.edu> wrote:

> Hi,
> 
> I saw a PoC:
> https://lore.kernel.org/qemu-devel/20220525121422.00003a84@Huawei.com/T/
> implementing memory pooling and a fabric manager on QEMU. Is there any
> further development on this? Can QEMU emulate memory pooling in a simple
> case where two virtual machines are connected to a CXL switch that has
> some memory devices attached?
> 
> Best,
> Zhiting

Hi Zhiting,

+CC linux-cxl as it's not as much of a firehose as qemu-devel
+CC Slava who has been driving discussion around fabric management.
> 

No progress on that particular approach, though there has been some
discussion of what the FM architecture itself might look like:

https://lore.kernel.org/linux-cxl/7F001EAF-C512-436A-A9DD-E08730C91214@bytedance.com/

There was a sticky problem with doing MCTP over I2C, which is that
there are very few I2C controllers that support the combination of
master and subordinate modes needed for MCTP.  The one that was used for
that PoC (Aspeed) doesn't have ACPI bindings (and they are non-trivial
to add due to clocks etc., and likely to be controversial on the kernel
side given I just want it for emulation!).  So far we don't have DT
bindings for CXL (either the CFMWS - CXL fixed memory windows - or
pxb-cxl - the host bridge).  I'll be sending out one of the precursors
for that as an RFC soon.

So we are in the fun position that we can either emulate the comms path
to the devices, or we can emulate the host actually using the devices.
I was planning to get back to that eventually, but we have other options
now that CXL 3.0 has been published.

CXL 3.0 provides two paths forward that let us test the equivalent
functionality with fewer moving parts.
1) CXL SWCCI, an extra PCI function next to the switch upstream port
   that provides a mailbox taking FM-API commands (a command-line sketch
   of the underlying switch topology follows below).
   PoC kernel code at:
   https://lore.kernel.org/linux-cxl/20221025104243.20836-1-Jonathan.Cameron@huawei.com/
   The latest branch in gitlab.com/jic23/qemu should have switch CCI
   emulation support (branches are dated).  Note we have a lot of stuff
   outstanding, either out for review or backed up behind things that are.
2) Multi-Headed Devices.  These allow FM-API commands to be tunneled over
   the normal CXL mailbox.
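
For reference, the switch topology itself (without the extra CCI
function, which is only in the gitlab branch for now) can already be
described roughly like this - an untested fragment using the
volatile-memdev form used elsewhere in this thread, on top of the usual
memory backend and cxl-fmw machine options:

  -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
  -device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \
  -device cxl-upstream,bus=root_port0,id=us0 \
  -device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \
  -device cxl-type3,bus=swport0,volatile-memdev=mem0,id=cxl-mem0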

I did a very basic PoC to see how this would fit in with the kernel side
of things but recently there has been far too much we need to enable in
the shorter term. 

Note though that there is a long way to go before we can do what you
want.  The steps I'd expect to see along the way:

1) Emulate a Multi-Headed Device.
   Initially connect two heads to different host bridges on a single QEMU
   machine.  That lets us test most of the code flows without needing
   to handle tests that involve multiple machines.
   Later, we could add a means to connect between two instances of QEMU.
2) Add DCD support (we'll need the kernel side of that as well)
3) Wire it all up.
4) Do the same for a Switch with MLDs behind it so we can poke the fun
   corners.

Note that, in common with CXL memory emulation on QEMU in general, the
need to do live address decoding will make performance terrible.
There are probably ways to improve that, but whilst we are at the stage
of trying to get as much functional as possible for testing purposes,
I'm not sure anyone will pursue those options.  It may not make sense in
the longer term either.  I'm more than happy to offer suggestions
/ feedback on approaches to this and will get back to it myself
once some more pressing requirements are dealt with.
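
(For the curious: the cost comes from the fact that every guest access
landing in a CXL fixed memory window goes through MMIO emulation and has
to be decoded in software before it can be routed to a backing store.
Very roughly - power-of-2 interleave only, and ignoring that the real
path walks HDM decoders at the host bridge, switch and device levels -
the per-access arithmetic looks something like this sketch, not the
actual QEMU code:

  #include <stdint.h>

  /* ways_bits = log2(interleave ways), gran_bits = log2(granularity) */
  static unsigned target_index(uint64_t hpa, uint64_t base,
                               unsigned ways_bits, unsigned gran_bits)
  {
      return ((hpa - base) >> gran_bits) & ((1u << ways_bits) - 1);
  }

  static uint64_t hpa_to_dpa(uint64_t hpa, uint64_t base,
                             unsigned ways_bits, unsigned gran_bits)
  {
      uint64_t off  = hpa - base;
      uint64_t low  = off & ((1ULL << gran_bits) - 1);
      uint64_t high = off >> (gran_bits + ways_bits);

      return (high << gran_bits) | low;
  }

Doing that on every read and write is what hurts.)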

Jonathan



* Re: CXL 2.0 memory pooling emulation
  2023-02-15  9:10   ` Gregory Price
@ 2023-02-16 18:00     ` Jonathan Cameron via
  2023-02-16 20:52       ` Gregory Price
  0 siblings, 1 reply; 7+ messages in thread
From: Jonathan Cameron via @ 2023-02-16 18:00 UTC (permalink / raw)
  To: Gregory Price; +Cc: zhiting zhu, qemu-devel, linux-cxl, Viacheslav A.Dubeyko

On Wed, 15 Feb 2023 04:10:20 -0500
Gregory Price <gregory.price@memverge.com> wrote:

> On Wed, Feb 15, 2023 at 03:18:54PM +0000, Jonathan Cameron via wrote:
> > On Wed, 8 Feb 2023 16:28:44 -0600
> > zhiting zhu <zhitingz@cs.utexas.edu> wrote:
> >   
> > > Hi,
> > > 
> > > I saw a PoC:
> > > https://lore.kernel.org/qemu-devel/20220525121422.00003a84@Huawei.com/T/
> > > implementing memory pooling and a fabric manager on QEMU. Is there any
> > > further development on this? Can QEMU emulate memory pooling in a simple
> > > case where two virtual machines are connected to a CXL switch that has
> > > some memory devices attached?
> > > 
> > > Best,
> > > Zhiting  
> > [... snip ...]
> > 
> > Note though that there is a long way to go before we can do what you
> > want.  The steps I'd expect to see along the way:
> > 
> > 1) Emulate a Multi-Headed Device.
> >    Initially connect two heads to different host bridges on a single QEMU
> >    machine.  That lets us test most of the code flows without needing
> >    to handle tests that involve multiple machines.
> >    Later, we could add a means to connect between two instances of QEMU.  
> 
> I've been playing with this a bit.
> 
> Hackiest way to do this is to connect the same memory backend to two
> type-3 devices, with the obvious caveat that the device state will not
> be consistent between views.
> 
> But we could, for example, just put the relevant shared state into an
> optional shared memory area instead of a normally allocated region.
> 
> i can imagine this looking something like
> 
> memory-backend-file,id=mem0,mem-path=/tmp/mem0,size=4G,share=true
> cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0,shm_token=mytoken
> 
> then you can have multiple qemu instances hook their relevant devices up
> to a backend that points to the same file, and instantiate their
> shared state in the region shmget(mytoken).

That's not pretty.  For the local instance I was thinking of a primary
device which also has the FM-API tunneled access via its mailbox, and
secondary devices that don't.  That would also apply to the remote case.
The secondary device would then just receive some control commands on
what to expose up to its host.  Not sure what the convention for doing
that is in QEMU.  Maybe a socket interface like the one used for swtpm,
with some ordering constraints on startup?
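
Just to make the shape concrete, I'd imagine something along these lines
(every mhd-* property here is invented):

  # instance A: primary head - owns the shared state, opens the socket
  -device cxl-type3,bus=rp0,volatile-memdev=mem0,id=mhd0-head0,mhd-primary=on,mhd-socket=/tmp/mhd0.sock

  # instance B: secondary head - connects to the primary's socket at startup
  -device cxl-type3,bus=rp0,volatile-memdev=mem0,id=mhd0-head1,mhd-socket=/tmp/mhd0.sock

with the primary required to be up first, much as swtpm has to be
running before the guest that uses it.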

> 
> Additionally, these devices will require a set of what amounts to
> vendor-specific mailbox commands - since the spec doesn't really define
> what multi-headed devices "should do" to manage exclusivity.

The device shouldn't manage exclusivity.  That's a job for the fabric
manager plus the DCD presentation of the memory, with the device
enforcing some rules; if it supports some of the capacity-adding types,
it might also need a simple allocator.
If we need vendor-specific commands then we need to take that to the
relevant body.  I'm not sure what they would be though.

> 
> Not sure if this would be upstream-worthy, or if we'd want to fork
> mem/cxl-type3.c into like mem/cxl-type3-multihead.c or something.
> 
> The base type3 device is going to end up overloaded at some point i
> think, and we'll want to look at trying to abstract it.

Sure.  Though we might end up with the normal type3 implementation being
(optionally) the primary device for an MHD (the one with the FM-API
tunneling available on its mailbox).  We would need a secondary device
though, which you instantiate with a link to the primary one or with a
socket (assuming the primary opens a socket as well).

Jonathan

> 
> ~Gregory




* Re: CXL 2.0 memory pooling emulation
  2023-02-16 18:00     ` Jonathan Cameron via
@ 2023-02-16 20:52       ` Gregory Price
  2023-02-17 11:14         ` Jonathan Cameron via
  0 siblings, 1 reply; 7+ messages in thread
From: Gregory Price @ 2023-02-16 20:52 UTC (permalink / raw)
  To: Jonathan Cameron; +Cc: zhiting zhu, qemu-devel, linux-cxl, Viacheslav A.Dubeyko

On Thu, Feb 16, 2023 at 06:00:57PM +0000, Jonathan Cameron wrote:
> On Wed, 15 Feb 2023 04:10:20 -0500
> Gregory Price <gregory.price@memverge.com> wrote:
> 
> > On Wed, Feb 15, 2023 at 03:18:54PM +0000, Jonathan Cameron via wrote:
> > > On Wed, 8 Feb 2023 16:28:44 -0600
> > > zhiting zhu <zhitingz@cs.utexas.edu> wrote:
> > >   
> > > 1) Emulate a Multi-Headed Device.
> > >    Initially connect two heads to different host bridges on a single QEMU
> > >    machine.  That lets us test most of the code flows without needing
> > >    to handle tests that involve multiple machines.
> > >    Later, we could add a means to connect between two instances of QEMU.  
> > 
> > Hackiest way to do this is to connect the same memory backend to two
> > type-3 devices, with the obvious caveat that the device state will not
> > be consistent between views.
> > 
> > But we could, for example, just put the relevant shared state into an
> > optional shared memory area instead of a normally allocated region.
> > 
> > i can imagine this looking something like
> > 
> > memory-backend-file,id=mem0,mem-path=/tmp/mem0,size=4G,share=true
> > cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0,shm_token=mytoken
> > 
> > then you can have multiple qemu instances hook their relevant devices up
> > to a backend that points to the same file, and instantiate their
> > shared state in the region shmget(mytoken).
> 
> That's not pretty.  For local instance I was thinking a primary device
> which also has the FM-API tunneled access via mailbox, and secondary devices
> that don't.  That would also apply to remote. The secondary device would
> then just receive some control commands on what to expose up to its host.
> Not sure what convention on how to do that is in QEMU. Maybe a socket
> interface like is done for swtpm? With some ordering constraints on startup.
> 

I agree, it's certainly "not pretty".

I'd go so far as to call the baby ugly :].  Like I said: "the hackiest way".

My understanding from looking around at some road shows is that some
of these early multi-headed devices are basically just SLDs with multiple
heads.  Most of these devices had to be developed well before DCDs (and
therefore the FM-API) were added to the spec, and we haven't seen or
heard of any of these early devices having any form of switch yet.

I don't see how this type of device is feasible unless it's either
statically provisioned (changing firmware settings from the BIOS on
reboot) or exposes custom firmware commands that provide some form of
exclusivity control over memory regions.

The former doesn't really make it a useful pooling device, so I'm sorta
guessing we'll see most of these early devices implement custom commands.

I'm just not sure these early MHDs are going to have any real form of
FM-API, but it would still be nice to emulate them.

~Gregory


* Re: CXL 2.0 memory pooling emulation
  2023-02-17 11:14         ` Jonathan Cameron via
@ 2023-02-17 11:02           ` Gregory Price
  0 siblings, 0 replies; 7+ messages in thread
From: Gregory Price @ 2023-02-17 11:02 UTC (permalink / raw)
  To: Jonathan Cameron; +Cc: zhiting zhu, qemu-devel, linux-cxl, Viacheslav A.Dubeyko

On Fri, Feb 17, 2023 at 11:14:18AM +0000, Jonathan Cameron wrote:
> On Thu, 16 Feb 2023 15:52:31 -0500
> Gregory Price <gregory.price@memverge.com> wrote:
> 
> > 
> > I agree, it's certainly "not pretty".
> > 
> > I'd go so far as to call the baby ugly :].  Like i said: "The Hackiest way"
> > 
> > My understanding from looking around at some road shows is that some
> > of these early multi-headed devices are basically just SLD's with multiple
> > heads. Most of these devices had to be developed well before DCD's and
> > therefore the FM-API were placed in the spec, and we haven't seen or
> > heard of any of these early devices having any form of switch yet.
> > 
> > I don't see how this type of device is feasible unless it's either statically
> > provisioned (change firmware settings from bios on reboot) or implements
> > custom firmware commands to implement some form of exclusivity controls over
> > memory regions.
> > 
> > The former makes it not really a useful pooling device, so I'm sorta guessing
> > we'll see most of these early devices implement custom commands.
> > 
> > I'm just not sure these early MHD's are going to have any real form of
> > FM-API, but it would still be nice to emulate them.
> > 
> Makes sense.  I'd be fine with adding any necessary hooks to allow that
> in the QEMU emulation, but probably not upstreaming the custom stuff.
> 
> Jonathan
> 

I'll have to give it some thought.  The "custom stuff" is mostly init
code, mailbox commands, and the fields those mailbox commands twiddle.

I guess we could create a wrapper device that hooks raw commands?  Is
that what raw commands are intended for?  Notably the kernel has to be
compiled with raw command support, which is disabled by default, but
that's fine.
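
(For reference, assuming I'm remembering the Kconfig symbol correctly,
that's

  CONFIG_CXL_MEM_RAW_COMMANDS=y

with the raw opcodes then going through the CXL_MEM_SEND_COMMAND ioctl
on the memdev's character device.)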

Dunno, just spitballing, but I'm a couple of days away from a first pass
at an MHD, though I'll need to spend quite a bit of time cleaning it up
before I can push an RFC.

~Gregory


* Re: CXL 2.0 memory pooling emulation
  2023-02-16 20:52       ` Gregory Price
@ 2023-02-17 11:14         ` Jonathan Cameron via
  2023-02-17 11:02           ` Gregory Price
  0 siblings, 1 reply; 7+ messages in thread
From: Jonathan Cameron via @ 2023-02-17 11:14 UTC (permalink / raw)
  To: Gregory Price; +Cc: zhiting zhu, qemu-devel, linux-cxl, Viacheslav A.Dubeyko

On Thu, 16 Feb 2023 15:52:31 -0500
Gregory Price <gregory.price@memverge.com> wrote:

> On Thu, Feb 16, 2023 at 06:00:57PM +0000, Jonathan Cameron wrote:
> > On Wed, 15 Feb 2023 04:10:20 -0500
> > Gregory Price <gregory.price@memverge.com> wrote:
> >   
> > > On Wed, Feb 15, 2023 at 03:18:54PM +0000, Jonathan Cameron via wrote:  
> > > > On Wed, 8 Feb 2023 16:28:44 -0600
> > > > zhiting zhu <zhitingz@cs.utexas.edu> wrote:
> > > >   
> > > > 1) Emulate a Multi-Headed Device.
> > > >    Initially connect two heads to different host bridges on a single QEMU
> > > >    machine.  That lets us test most of the code flows without needing
> > > >    to handle tests that involve multiple machines.
> > > >    Later, we could add a means to connect between two instances of QEMU.    
> > > 
> > > Hackiest way to do this is to connect the same memory backend to two
> > > type-3 devices, with the obvious caveat that the device state will not
> > > be consistent between views.
> > > 
> > > But we could, for example, just put the relevant shared state into an
> > > optional shared memory area instead of a normally allocated region.
> > > 
> > > i can imagine this looking something like
> > > 
> > > memory-backend-file,id=mem0,mem-path=/tmp/mem0,size=4G,share=true
> > > cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0,shm_token=mytoken
> > > 
> > > then you can have multiple qemu instances hook their relevant devices up
> > > to a backend that points to the same file, and instantiate their
> > > shared state in the region shmget(mytoken).  
> > 
> > That's not pretty.  For local instance I was thinking a primary device
> > which also has the FM-API tunneled access via mailbox, and secondary devices
> > that don't.  That would also apply to remote. The secondary device would
> > then just receive some control commands on what to expose up to its host.
> > Not sure what convention on how to do that is in QEMU. Maybe a socket
> > interface like is done for swtpm? With some ordering constraints on startup.
> >   
> 
> I agree, it's certainly "not pretty".
> 
> I'd go so far as to call the baby ugly :].  Like i said: "The Hackiest way"
> 
> My understanding from looking around at some road shows is that some
> of these early multi-headed devices are basically just SLD's with multiple
> heads. Most of these devices had to be developed well before DCD's and
> therefore the FM-API were placed in the spec, and we haven't seen or
> heard of any of these early devices having any form of switch yet.
> 
> I don't see how this type of device is feasible unless it's either statically
> provisioned (change firmware settings from bios on reboot) or implements
> custom firmware commands to implement some form of exclusivity controls over
> memory regions.
> 
> The former makes it not really a useful pooling device, so I'm sorta guessing
> we'll see most of these early devices implement custom commands.
> 
> I'm just not sure these early MHD's are going to have any real form of
> FM-API, but it would still be nice to emulate them.
> 
Makes sense.  I'd be fine with adding any necessary hooks to allow that
in the QEMU emulation, but probably not upstreaming the custom stuff.

Jonathan

> ~Gregory



