* [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
@ 2023-04-07 21:05 Dragan Stancevic
2023-04-07 22:23 ` James Houghton
` (6 more replies)
0 siblings, 7 replies; 40+ messages in thread
From: Dragan Stancevic @ 2023-04-07 21:05 UTC (permalink / raw)
To: lsf-pc; +Cc: nil-migration, linux-cxl, linux-mm
Hi folks-
if it's not too late for the schedule...
I am starting to tackle VM live migration and hypervisor clustering over
switched CXL memory[1][2], intended for cloud virtualization types of loads.
I'd be interested in doing a small BoF session with some slides and get
into a discussion/brainstorming with other people that deal with VM/LM
cloud loads. Among other things to discuss would be page migrations over
switched CXL memory, shared in-memory ABI to allow VM hand-off between
hypervisors, etc...
A few of us discussed some of this under the ZONE_XMEM thread, but I
figured it might be better to start a separate thread.
If there is interest, thank you.
[1]. High-level overview available at http://nil-migration.org/
[2]. Based on CXL spec 3.0
--
Peace can only come as a natural consequence
of universal enlightenment -Dr. Nikola Tesla
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-07 21:05 [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory Dragan Stancevic
@ 2023-04-07 22:23 ` James Houghton
2023-04-07 23:17 ` David Rientjes
2023-04-08 0:05 ` Gregory Price
` (5 subsequent siblings)
6 siblings, 1 reply; 40+ messages in thread
From: James Houghton @ 2023-04-07 22:23 UTC (permalink / raw)
To: Dragan Stancevic
Cc: lsf-pc, nil-migration, linux-cxl, linux-mm, David Rientjes
On Fri, Apr 7, 2023 at 5:05 PM Dragan Stancevic <dragan@stancevic.com> wrote:
>
> Hi folks-
>
> if it's not too late for the schedule...
>
> I am starting to tackle VM live migration and hypervisor clustering over
> switched CXL memory[1][2], intended for cloud virtualization types of loads.
>
> I'd be interested in doing a small BoF session with some slides and get
> into a discussion/brainstorming with other people that deal with VM/LM
> cloud loads. Among other things to discuss would be page migrations over
> switched CXL memory, shared in-memory ABI to allow VM hand-off between
> hypervisors, etc...
>
> A few of us discussed some of this under the ZONE_XMEM thread, but I
> figured it might be better to start a separate thread.
>
> If there is interest, thank you.
Hi Dragan,
Thanks for bringing up this topic. I'd be very interested to be part
of this BoF, as I'm also interested in using CXL.mem as a live
migration mechanism.
- James
>
>
> [1]. High-level overview available at http://nil-migration.org/
> [2]. Based on CXL spec 3.0
>
> --
> Peace can only come as a natural consequence
> of universal enlightenment -Dr. Nikola Tesla
>
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-07 22:23 ` James Houghton
@ 2023-04-07 23:17 ` David Rientjes
2023-04-08 1:33 ` Dragan Stancevic
2023-04-08 16:24 ` Dragan Stancevic
0 siblings, 2 replies; 40+ messages in thread
From: David Rientjes @ 2023-04-07 23:17 UTC (permalink / raw)
To: James Houghton, Dragan Stancevic
Cc: lsf-pc, nil-migration, linux-cxl, linux-mm
[-- Attachment #1: Type: text/plain, Size: 1460 bytes --]
On Fri, 7 Apr 2023, James Houghton wrote:
> On Fri, Apr 7, 2023 at 5:05 PM Dragan Stancevic <dragan@stancevic.com> wrote:
> >
> > Hi folks-
> >
> > if it's not too late for the schedule...
> >
> > I am starting to tackle VM live migration and hypervisor clustering over
> > switched CXL memory[1][2], intended for cloud virtualization types of loads.
> >
> > I'd be interested in doing a small BoF session with some slides and get
> > into a discussion/brainstorming with other people that deal with VM/LM
> > cloud loads. Among other things to discuss would be page migrations over
> > switched CXL memory, shared in-memory ABI to allow VM hand-off between
> > hypervisors, etc...
> >
> > A few of us discussed some of this under the ZONE_XMEM thread, but I
> > figured it might be better to start a separate thread.
> >
> > If there is interest, thank you.
>
> Hi Dragan,
>
> Thanks for bringing up this topic. I'd be very interested to be part
> of this BoF, as I'm also interested in using CXL.mem as a live
> migration mechanism.
>
Thanks for cc'ing me, this would be very interesting to talk about. Count
me in!
> > [1]. High-level overview available at http://nil-migration.org/
> > [2]. Based on CXL spec 3.0
> >
Dragan: I'm curious about the reference to CXL spec 3.0 here, is there
something specific about 3.0 that this work depends on or should we be
good-to-go with 2.0 as well? (Are you referring to 3.0 for security
extensions?)
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-07 21:05 [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory Dragan Stancevic
2023-04-07 22:23 ` James Houghton
@ 2023-04-08 0:05 ` Gregory Price
2023-04-11 0:56 ` Dragan Stancevic
2023-04-11 6:37 ` [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory Huang, Ying
2023-04-09 17:40 ` Shreyas Shah
` (4 subsequent siblings)
6 siblings, 2 replies; 40+ messages in thread
From: Gregory Price @ 2023-04-08 0:05 UTC (permalink / raw)
To: Dragan Stancevic; +Cc: lsf-pc, nil-migration, linux-cxl, linux-mm
On Fri, Apr 07, 2023 at 04:05:31PM -0500, Dragan Stancevic wrote:
> Hi folks-
>
> if it's not too late for the schedule...
>
> I am starting to tackle VM live migration and hypervisor clustering over
> switched CXL memory[1][2], intended for cloud virtualization types of loads.
>
> I'd be interested in doing a small BoF session with some slides and get into
> a discussion/brainstorming with other people that deal with VM/LM cloud
> loads. Among other things to discuss would be page migrations over switched
> CXL memory, shared in-memory ABI to allow VM hand-off between hypervisors,
> etc...
>
> A few of us discussed some of this under the ZONE_XMEM thread, but I figured
> it might be better to start a separate thread.
>
> If there is interest, thank you.
>
>
> [1]. High-level overview available at http://nil-migration.org/
> [2]. Based on CXL spec 3.0
>
> --
> Peace can only come as a natural consequence
> of universal enlightenment -Dr. Nikola Tesla
I've been chatting about this with folks offline, figure I'll toss my
thoughts on the issue here.
Some things to consider:
1. If secure-compute is being used, then this mechanism won't work as
pages will be pinned, and therefore not movable and excluded from
using CXL memory at all.
This issue does not exist with traditional live migration, because
typically some kind of copy is used from one virtual space to another
(e.g. RDMA), so pages aren't really migrated in the kernel memory
block/numa node sense.
2. During the migration process, the memory needs to be forced not to be
migrated to another node by other means (tiering software, swap,
etc). The obvious way of doing this would be to migrate and
temporarily pin the page... but going back to problem #1 we see that
ZONE_MOVABLE and Pinning are mutually exclusive. So that's
troublesome.
3. This is changing the semantics of migration from a virtual memory
movement to a physical memory movement. Typically you would expect
the RDMA process for live migration to work something like...
a) migration request arrives
b) source host informs destination host of size requirements
c) destination host allocates memory and passes a Virtual Address
back to source host
d) source host initiates an RDMA from HostA-VA to HostB-VA
e) CPU task is migrated
Importantly, the allocation of memory by Host B handles the important
step of creating HVA->HPA mappings, and the Extended/Nested Page
Tables can simply be flushed and re-created after the VM is fully
migrated.
too long; didn't read: live migration is a virtual address operation,
and node-migration is a PHYSICAL address operation; the virtual
addresses remain the same.
This is problematic, as it's changing the underlying semantics of the
migration operation.
Problem #1 and #2 are head-scratchers, but maybe solvable.
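A toy sketch of why problems #1 and #2 collide (invented types and names, not the kernel's struct page or pinning API): pages eligible for CXL must stay movable, but pinning is exactly what makes a page unmovable.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_zone { TOY_ZONE_NORMAL, TOY_ZONE_MOVABLE };

struct toy_page {
    enum toy_zone zone;
    bool pinned;
};

/* Long-term pinning of a movable-zone page is refused, mirroring the
 * kernel's rule that ZONE_MOVABLE pages cannot be longterm-pinned. */
bool try_longterm_pin(struct toy_page *p)
{
    if (p->zone == TOY_ZONE_MOVABLE)
        return false;
    p->pinned = true;
    return true;
}
```

So any scheme that says "migrate the page to CXL, then pin it there" contradicts itself in this model: the page either stays movable (and can be stolen by tiering/swap) or gets pinned (and was never allowed on CXL in the first place).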
Problem #3 is the meat and potatoes of the issue in my opinion. So let's
consider that a little more closely.
Generically: NIL Migration is basically a pass-by-reference operation.
The reference in this case is... the page tables. You need to know how
to interpret the data in the CXL memory region on the remote host, and
that's a "relative page table translation" (to coin a phrase? I'm not
sure how to best describe it).
That's... complicated to say the least.
1) Pages on the physical hardware do not need to be contiguous
2) The CFMW on source and target host do not need to be mapped at the
same place
3) There's no pre-allocation in these charts, and migration isn't
targeted, so having the source-host "expertly place" the data isn't
possible (right now, I suppose you could make kernel extensions).
4) Similar to problem #2 above, even with a pre-allocate added in, you
would need to ensure those mappings were pinned during migration,
lest the target host end up swapping a page or something.
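One way to picture the "relative page table translation": if each host maps the shared region at a different base HPA inside its CXL Fixed Memory Window, a page could be named by its offset into the shared region rather than by either host's HPA. A minimal sketch with hypothetical helper names (nothing here is an existing kernel interface):

```c
#include <assert.h>
#include <stdint.h>

struct cfmw { uint64_t base_hpa; uint64_t size; };

/* Source host: strip its own window base to get a host-neutral offset. */
uint64_t hpa_to_shared_offset(struct cfmw w, uint64_t hpa)
{
    return hpa - w.base_hpa;
}

/* Destination host: rebuild a local HPA from the shared offset. */
uint64_t shared_offset_to_hpa(struct cfmw w, uint64_t off)
{
    return w.base_hpa + off;
}
```

The hard part problems 1-4 point at is everything around this arithmetic: discovering both window bases, walking the source page tables to produce the offsets, and keeping the mappings stable while the translation is in flight.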
An Option: Make pages physically contiguous on migration to CXL
In this case, you don't necessarily care about the Host Virtual
Addresses, what you actually care about are the structure of the pages
in memory (are they physically contiguous? or do you need to
reconstruct the contiguity by inspecting the page tables?).
If a migration API were capable of reserving large swaths of contiguous
CXL memory, you could discard individual page information and instead
send page range information, reconstructing the virtual-physical
mappings this way.
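A minimal sketch of that compression, assuming a sorted PFN list (illustrative only; the names are invented):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pfn_range { uint64_t start; uint64_t npages; };

/* Collapse a sorted array of PFNs into contiguous ranges; returns the
 * number of ranges written to out[] (caller sizes out[] for the worst
 * case of n single-page ranges). */
size_t pfns_to_ranges(const uint64_t *pfns, size_t n, struct pfn_range *out)
{
    size_t nr = 0;
    for (size_t i = 0; i < n; i++) {
        if (nr && out[nr - 1].start + out[nr - 1].npages == pfns[i])
            out[nr - 1].npages++;   /* extends the current run */
        else
            out[nr++] = (struct pfn_range){ .start = pfns[i], .npages = 1 };
    }
    return nr;
}
```

If migration to CXL produced one large run, the metadata to hand over collapses to a single (start, length) pair instead of a per-page list.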
That's about as far as I've thought about it so far. Feel free to rip
it apart! :]
~Gregory
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-07 23:17 ` David Rientjes
@ 2023-04-08 1:33 ` Dragan Stancevic
2023-04-08 16:24 ` Dragan Stancevic
1 sibling, 0 replies; 40+ messages in thread
From: Dragan Stancevic @ 2023-04-08 1:33 UTC (permalink / raw)
To: David Rientjes, James Houghton; +Cc: lsf-pc, nil-migration, linux-cxl, linux-mm
Hi David-
On 4/7/23 18:17, David Rientjes wrote:
> On Fri, 7 Apr 2023, James Houghton wrote:
>
>> On Fri, Apr 7, 2023 at 5:05 PM Dragan Stancevic <dragan@stancevic.com> wrote:
>>>
>>> Hi folks-
>>>
>>> if it's not too late for the schedule...
>>>
>>> I am starting to tackle VM live migration and hypervisor clustering over
>>> switched CXL memory[1][2], intended for cloud virtualization types of loads.
>>>
>>> I'd be interested in doing a small BoF session with some slides and get
>>> into a discussion/brainstorming with other people that deal with VM/LM
>>> cloud loads. Among other things to discuss would be page migrations over
>>> switched CXL memory, shared in-memory ABI to allow VM hand-off between
>>> hypervisors, etc...
>>>
>>> A few of us discussed some of this under the ZONE_XMEM thread, but I
>>> figured it might be better to start a separate thread.
>>>
>>> If there is interest, thank you.
>>
>> Hi Dragan,
>>
>> Thanks for bringing up this topic. I'd be very interested to be part
>> of this BoF, as I'm also interested in using CXL.mem as a live
>> migration mechanism.
>>
>
> Thanks for cc'ing me, this would be very interesting to talk about. Count
> me in!
>
>>> [1]. High-level overview available at http://nil-migration.org/
>>> [2]. Based on CXL spec 3.0
>>>
>
> Dragan: I'm curious about the reference to CXL spec 3.0 here, is there
> something specific about 3.0 that this work depends on or should we be
> good-to-go with 2.0 as well? (Are you referring to 3.0 for security
> extensions?)
I'm referencing 3.0 with regard to switched/shared memory as defined in
Compute Express Link Specification r3.0, v1.0 8/1/22, Page 51, figure
1-4, black color scheme circle(3) and bars.
It may be possible to do it with 2.0, but as far as I understand[1] the
2.0 spec, it might be a lot more involved/clunky. I think the new 3.0
features make it easier.
[1]. I would love to read the 2.0 spec, but I don't have access to it
(only 3.0). But that is my understanding from speaking with some CXL
folks at last year's Plumbers when I floated this idea with them.
--
Peace can only come as a natural consequence
of universal enlightenment -Dr. Nikola Tesla
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-07 23:17 ` David Rientjes
2023-04-08 1:33 ` Dragan Stancevic
@ 2023-04-08 16:24 ` Dragan Stancevic
1 sibling, 0 replies; 40+ messages in thread
From: Dragan Stancevic @ 2023-04-08 16:24 UTC (permalink / raw)
To: David Rientjes, James Houghton; +Cc: lsf-pc, nil-migration, linux-cxl, linux-mm
Hi David-
On 4/7/23 18:17, David Rientjes wrote:
> On Fri, 7 Apr 2023, James Houghton wrote:
>
>> On Fri, Apr 7, 2023 at 5:05 PM Dragan Stancevic <dragan@stancevic.com> wrote:
>>>
>>> Hi folks-
>>>
>>> if it's not too late for the schedule...
>>>
>>> I am starting to tackle VM live migration and hypervisor clustering over
>>> switched CXL memory[1][2], intended for cloud virtualization types of loads.
>>>
>>> I'd be interested in doing a small BoF session with some slides and get
>>> into a discussion/brainstorming with other people that deal with VM/LM
>>> cloud loads. Among other things to discuss would be page migrations over
>>> switched CXL memory, shared in-memory ABI to allow VM hand-off between
>>> hypervisors, etc...
>>>
>>> A few of us discussed some of this under the ZONE_XMEM thread, but I
>>> figured it might be better to start a separate thread.
>>>
>>> If there is interest, thank you.
>>
>> Hi Dragan,
>>
>> Thanks for bringing up this topic. I'd be very interested to be part
>> of this BoF, as I'm also interested in using CXL.mem as a live
>> migration mechanism.
>>
>
> Thanks for cc'ing me, this would be very interesting to talk about. Count
> me in!
>
>>> [1]. High-level overview available at http://nil-migration.org/
>>> [2]. Based on CXL spec 3.0
>>>
>
> Dragan: I'm curious about the reference to CXL spec 3.0 here, is there
> something specific about 3.0 that this work depends on or should we be
> good-to-go with 2.0 as well? (Are you referring to 3.0 for security
> extensions?)
Sorry, I hit send too soon and then had hosting provider issues... the
hypervisor clustering part[1] might not work on CXL 2.0.
[1]. http://nil-migration.org/ds-nil-migration-p12.png
--
Peace can only come as a natural consequence
of universal enlightenment -Dr. Nikola Tesla
^ permalink raw reply [flat|nested] 40+ messages in thread
* RE: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-07 21:05 [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory Dragan Stancevic
2023-04-07 22:23 ` James Houghton
2023-04-08 0:05 ` Gregory Price
@ 2023-04-09 17:40 ` Shreyas Shah
2023-04-11 1:08 ` Dragan Stancevic
[not found] ` <CGME20230410030532epcas2p49eae675396bf81658c1a3401796da1d4@epcas2p4.samsung.com>
` (3 subsequent siblings)
6 siblings, 1 reply; 40+ messages in thread
From: Shreyas Shah @ 2023-04-09 17:40 UTC (permalink / raw)
To: Dragan Stancevic, lsf-pc; +Cc: nil-migration, linux-cxl, linux-mm
Hi Dragan,
The concept of time-sharing the CXL-attached memory across two NUMA nodes for live migration, and creating a cluster of VMs to increase the compute capacity, is great.
When and where is the BoF?
Regards,
Shreyas
-----Original Message-----
From: Dragan Stancevic <dragan@stancevic.com>
Sent: Friday, April 7, 2023 2:06 PM
To: lsf-pc@lists.linux-foundation.org
Cc: nil-migration@lists.linux.dev; linux-cxl@vger.kernel.org; linux-mm@kvack.org
Subject: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
Hi folks-
if it's not too late for the schedule...
I am starting to tackle VM live migration and hypervisor clustering over switched CXL memory[1][2], intended for cloud virtualization types of loads.
I'd be interested in doing a small BoF session with some slides and get into a discussion/brainstorming with other people that deal with VM/LM cloud loads. Among other things to discuss would be page migrations over switched CXL memory, shared in-memory ABI to allow VM hand-off between hypervisors, etc...
A few of us discussed some of this under the ZONE_XMEM thread, but I figured it might be better to start a separate thread.
If there is interest, thank you.
[1]. High-level overview available at http://nil-migration.org/ [2]. Based on CXL spec 3.0
--
Peace can only come as a natural consequence of universal enlightenment -Dr. Nikola Tesla
^ permalink raw reply [flat|nested] 40+ messages in thread
* RE: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
[not found] ` <CGME20230410030532epcas2p49eae675396bf81658c1a3401796da1d4@epcas2p4.samsung.com>
@ 2023-04-10 3:05 ` Kyungsan Kim
2023-04-10 17:46 ` [External] " Viacheslav A.Dubeyko
2023-04-14 3:27 ` Dragan Stancevic
0 siblings, 2 replies; 40+ messages in thread
From: Kyungsan Kim @ 2023-04-10 3:05 UTC (permalink / raw)
To: dragan
Cc: lsf-pc, linux-mm, linux-fsdevel, linux-cxl, a.manzanares,
viacheslav.dubeyko, dan.j.williams, seungjun.ha, wj28.lee
>Hi folks-
>
>if it's not too late for the schedule...
>
>I am starting to tackle VM live migration and hypervisor clustering over
>switched CXL memory[1][2], intended for cloud virtualization types of loads.
>
>I'd be interested in doing a small BoF session with some slides and get
>into a discussion/brainstorming with other people that deal with VM/LM
>cloud loads. Among other things to discuss would be page migrations over
>switched CXL memory, shared in-memory ABI to allow VM hand-off between
>hypervisors, etc...
>
>A few of us discussed some of this under the ZONE_XMEM thread, but I
>figured it might be better to start a separate thread.
>
>If there is interest, thank you.
I would like to join the discussion as well.
Let me kindly suggest that it would be even better if it included the data flow of the VM/hypervisor as background, and the expected kernel interaction.
>
>
>[1]. High-level overview available at http://nil-migration.org/
>[2]. Based on CXL spec 3.0
>
>--
>Peace can only come as a natural consequence
>of universal enlightenment -Dr. Nikola Tesla
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [External] [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-10 3:05 ` [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory Kyungsan Kim
@ 2023-04-10 17:46 ` Viacheslav A.Dubeyko
2023-04-14 3:27 ` Dragan Stancevic
1 sibling, 0 replies; 40+ messages in thread
From: Viacheslav A.Dubeyko @ 2023-04-10 17:46 UTC (permalink / raw)
To: Kyungsan Kim
Cc: dragan, lsf-pc, linux-mm, Linux FS Devel, linux-cxl,
Adam Manzanares, Dan Williams, seungjun.ha, wj28.lee
> On Apr 9, 2023, at 8:05 PM, Kyungsan Kim <ks0204.kim@samsung.com> wrote:
>
>> Hi folks-
>>
>> if it's not too late for the schedule...
>>
>> I am starting to tackle VM live migration and hypervisor clustering over
>> switched CXL memory[1][2], intended for cloud virtualization types of loads.
>>
>> I'd be interested in doing a small BoF session with some slides and get
>> into a discussion/brainstorming with other people that deal with VM/LM
>> cloud loads. Among other things to discuss would be page migrations over
>> switched CXL memory, shared in-memory ABI to allow VM hand-off between
>> hypervisors, etc...
>>
>> A few of us discussed some of this under the ZONE_XMEM thread, but I
>> figured it might be better to start a separate thread.
>>
>> If there is interest, thank you.
>
> I would like to join the discussion as well.
> Let me kindly suggest that it would be even better if it included the data flow of the VM/hypervisor as background, and the expected kernel interaction.
>
Sounds like an interesting topic to me. I would like to attend the discussion.
Thanks,
Slava.
>>
>>
>> [1]. High-level overview available at http://nil-migration.org/
>> [2]. Based on CXL spec 3.0
>>
>> --
>> Peace can only come as a natural consequence
>> of universal enlightenment -Dr. Nikola Tesla
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-08 0:05 ` Gregory Price
@ 2023-04-11 0:56 ` Dragan Stancevic
2023-04-11 1:48 ` Gregory Price
2023-04-11 6:37 ` [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory Huang, Ying
1 sibling, 1 reply; 40+ messages in thread
From: Dragan Stancevic @ 2023-04-11 0:56 UTC (permalink / raw)
To: Gregory Price; +Cc: lsf-pc, nil-migration, linux-cxl, linux-mm
Hi Gregory-
On 4/7/23 19:05, Gregory Price wrote:
> On Fri, Apr 07, 2023 at 04:05:31PM -0500, Dragan Stancevic wrote:
>> Hi folks-
>>
>> if it's not too late for the schedule...
>>
>> I am starting to tackle VM live migration and hypervisor clustering over
>> switched CXL memory[1][2], intended for cloud virtualization types of loads.
>>
>> I'd be interested in doing a small BoF session with some slides and get into
>> a discussion/brainstorming with other people that deal with VM/LM cloud
>> loads. Among other things to discuss would be page migrations over switched
>> CXL memory, shared in-memory ABI to allow VM hand-off between hypervisors,
>> etc...
>>
>> A few of us discussed some of this under the ZONE_XMEM thread, but I figured
>> it might be better to start a separate thread.
>>
>> If there is interest, thank you.
>>
>>
>> [1]. High-level overview available at http://nil-migration.org/
>> [2]. Based on CXL spec 3.0
>>
>> --
>> Peace can only come as a natural consequence
>> of universal enlightenment -Dr. Nikola Tesla
>
> I've been chatting about this with folks offline, figure I'll toss my
> thoughts on the issue here.
excellent brain dump, thank you
> Some things to consider:
>
> 1. If secure-compute is being used, then this mechanism won't work as
> pages will be pinned, and therefore not movable and excluded from
> using CXL memory at all.
>
> This issue does not exist with traditional live migration, because
> typically some kind of copy is used from one virtual space to another
> (e.g. RDMA), so pages aren't really migrated in the kernel memory
> block/numa node sense.
Right, agreed... I don't think we can migrate in all scenarios, such as
pinning or forms of pass-through, etc.
My opinion, just to start off: as a base requirement, the pages would
have to be movable.
> 2. During the migration process, the memory needs to be forced not to be
> migrated to another node by other means (tiering software, swap,
> etc). The obvious way of doing this would be to migrate and
> temporarily pin the page... but going back to problem #1 we see that
> ZONE_MOVABLE and Pinning are mutually exclusive. So that's
> troublesome.
Yeah, true. I'd have to check the code, but I wonder if perhaps we could
raise the mapcount or refcount on the pages upon migration onto CXL
switched memory. If my memory serves me right, wouldn't move_pages then
back off or stall? I guess it's TBD how workable or useful that would
be, but it's good to be thinking of different ways of doing this.
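As a toy model of that back-off idea (an invented type, not the kernel's struct page or migration path): a migration attempt only proceeds when no extra reference is held, which is roughly how unexpected page references make the kernel's migration code give up on a page.

```c
#include <assert.h>
#include <stdbool.h>

struct migratable { int refcount; };

bool try_migrate(const struct migratable *p)
{
    /* one reference is the migrator's own; anything more means back off */
    return p->refcount == 1;
}
```

In this model, taking an extra reference on every page once it lands on the switched CXL region would make any later move_pages-style attempt fail, keeping the pages parked there until the destination drops the reference.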
> 3. This is changing the semantics of migration from a virtual memory
> movement to a physical memory movement. Typically you would expect
> the RDMA process for live migration to work something like...
>
> a) migration request arrives
> b) source host informs destination host of size requirements
> c) destination host allocates memory and passes a Virtual Address
> back to source host
> d) source host initiates an RDMA from HostA-VA to HostB-VA
> e) CPU task is migrated
>
> Importantly, the allocation of memory by Host B handles the important
> step of creating HVA->HPA mappings, and the Extended/Nested Page
> Tables can simply be flushed and re-created after the VM is fully
> migrated.
>
> too long; didn't read: live migration is a virtual address operation,
> and node-migration is a PHYSICAL address operation; the virtual
> addresses remain the same.
>
> This is problematic, as it's changing the underlying semantics of the
> migration operation.
Those are all valid points, but what if you don't need to recreate
HVA->HPA mappings? If I am understanding the CXL 3.0 spec correctly,
then both virtual addresses and physical addresses wouldn't have to
change, because the fabric "virtualizes" host physical addresses and the
translation is done by the G-FAM/GFD, which has the capability to
translate multi-host HPAs to its internal DPAs. So if you have two
hypervisors seeing the same device physical address, that might work?
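A sketch of that device-side view, with hypothetical structures loosely modeled on per-host decoders (not taken from the spec): a G-FAM device can back two hosts' different HPA ranges with the same DPA range, so both hypervisors reference one copy of the pages.

```c
#include <assert.h>
#include <stdint.h>

struct gfd_decoder { uint64_t hpa_base; uint64_t dpa_base; uint64_t size; };

/* Translate one host's HPA to the device DPA through that host's decoder. */
uint64_t gfd_translate(struct gfd_decoder d, uint64_t hpa)
{
    return d.dpa_base + (hpa - d.hpa_base);
}
```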
> Problem #1 and #2 are head-scratchers, but maybe solvable.
>
> Problem #3 is the meat and potatoes of the issue in my opinion. So let's
> consider that a little more closely.
>
> Generically: NIL Migration is basically a pass-by-reference operation.
Yup, agreed
> The reference in this case is... the page tables. You need to know how
> to interpret the data in the CXL memory region on the remote host, and
> that's a "relative page table translation" (to coin a phrase? I'm not
> sure how to best describe it).
Right, coining phrases... I have been thinking of a "super-page" (for
lack of a better word): a metadata region sitting on the switched
CXL.mem device that would allow hypervisors to synchronize on various
aspects, such as "relative page table translation", host is up, host is
down, list of peers, who owns what, etc... In a perfect scenario, I would
love to see the hypervisors cooperating on a switched CXL.mem device the
same way CPUs on different NUMA nodes cooperate on memory in a single
hypervisor. If either host can allocate and schedule from this space,
then the "NIL" aspect of migration is "free".
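One possible shape for such a metadata region, with invented field names, sizes, and layout (no claim about a real format; a production design would also need atomics and fencing appropriate to CXL.mem):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NIL_MAX_PEERS 16
#define NIL_MAX_VMS   256

struct nil_peer {
    uint8_t  uuid[16];   /* hypervisor identity */
    uint32_t state;      /* 0 = down, 1 = up */
};

/* Header at a fixed offset in the switched CXL device that peers use to
 * advertise liveness and VM ownership. */
struct nil_superpage {
    uint32_t magic;                     /* e.g. "NILM" */
    uint32_t version;
    struct nil_peer peers[NIL_MAX_PEERS];
    uint32_t vm_owner[NIL_MAX_VMS];     /* VM slot -> index into peers[] */
};

/* Hand a VM from one hypervisor to another by flipping the owner index. */
void nil_handoff(struct nil_superpage *sp, uint32_t vm, uint32_t new_peer)
{
    sp->vm_owner[vm] = new_peer;
}
```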
> That's... complicated to say the least.
> 1) Pages on the physical hardware do not need to be contiguous
> 2) The CFMW on source and target host do not need to be mapped at the
> same place
> 3) There's no pre-allocation in these charts, and migration isn't
> targeted, so having the source-host "expertly place" the data isn't
> possible (right now, I suppose you could make kernel extensions).
> 4) Similar to problem #2 above, even with a pre-allocate added in, you
> would need to ensure those mappings were pinned during migration,
> lest the target host end up swapping a page or something.
>
>
>
> An Option: Make pages physically contiguous on migration to CXL
>
> In this case, you don't necessarily care about the Host Virtual
> Addresses, what you actually care about are the structure of the pages
> in memory (are they physically contiguous? or do you need to
> reconstruct the contiguity by inspecting the page tables?).
>
> If a migration API were capable of reserving large swaths of contiguous
> CXL memory, you could discard individual page information and instead
> send page range information, reconstructing the virtual-physical
> mappings this way.
Yeah, good points, but this is all tricky... it seems this would require
quiescing the VM, and that is something I would like to avoid if
possible. I'd like to see the VM still executing while all of its pages
are migrated onto the CXL NUMA node on the source hypervisor, and I
would like to see the VM executing on the destination hypervisor while
migrate_pages is moving pages off of CXL. Of course, what you are
describing above would still be a very fast VM migration, but it would
require quiescing.
> That's about as far as I've thought about it so far. Feel free to rip
> it apart! :]
Those are all great thoughts and I appreciate you sharing them. I don't
have all the answers either :)
> ~Gregory
>
--
Peace can only come as a natural consequence
of universal enlightenment -Dr. Nikola Tesla
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-09 17:40 ` Shreyas Shah
@ 2023-04-11 1:08 ` Dragan Stancevic
2023-04-11 1:17 ` Shreyas Shah
0 siblings, 1 reply; 40+ messages in thread
From: Dragan Stancevic @ 2023-04-11 1:08 UTC (permalink / raw)
To: Shreyas Shah, lsf-pc; +Cc: nil-migration, linux-cxl, linux-mm
Hi Shreyas-
On 4/9/23 12:40, Shreyas Shah wrote:
> Hi Dragan,
>
> The concept of time-sharing the CXL-attached memory across two NUMA nodes for live migration, and creating a cluster of VMs to increase the compute capacity, is great.
>
> When and where is the BoF?
It's a proposal sent in response to the CFP; it has not been accepted
yet. The agenda is selected by the program committee based on interest.
The current proposal is for the LSF/MM/BPF summit[1] running May 8 -
May 10th, but it's not happening unless approved by the committee.
[1]. https://events.linuxfoundation.org/lsfmm/
>
>
> Regards,
> Shreyas
>
>
> -----Original Message-----
> From: Dragan Stancevic <dragan@stancevic.com>
> Sent: Friday, April 7, 2023 2:06 PM
> To: lsf-pc@lists.linux-foundation.org
> Cc: nil-migration@lists.linux.dev; linux-cxl@vger.kernel.org; linux-mm@kvack.org
> Subject: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
>
> Hi folks-
>
> if it's not too late for the schedule...
>
> I am starting to tackle VM live migration and hypervisor clustering over switched CXL memory[1][2], intended for cloud virtualization types of loads.
>
> I'd be interested in doing a small BoF session with some slides and get into a discussion/brainstorming with other people that deal with VM/LM cloud loads. Among other things to discuss would be page migrations over switched CXL memory, shared in-memory ABI to allow VM hand-off between hypervisors, etc...
>
> A few of us discussed some of this under the ZONE_XMEM thread, but I figured it might be better to start a separate thread.
>
> If there is interest, thank you.
>
>
> [1]. High-level overview available at http://nil-migration.org/ [2]. Based on CXL spec 3.0
>
> --
> Peace can only come as a natural consequence of universal enlightenment -Dr. Nikola Tesla
--
Peace can only come as a natural consequence
of universal enlightenment -Dr. Nikola Tesla
^ permalink raw reply [flat|nested] 40+ messages in thread
* RE: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-11 1:08 ` Dragan Stancevic
@ 2023-04-11 1:17 ` Shreyas Shah
2023-04-11 1:32 ` Dragan Stancevic
0 siblings, 1 reply; 40+ messages in thread
From: Shreyas Shah @ 2023-04-11 1:17 UTC (permalink / raw)
To: Dragan Stancevic, lsf-pc; +Cc: nil-migration, linux-cxl, linux-mm
Thank you, Dragan.
Btw, we can demonstrate VM live migration with our FPGA-based CXL memory in our demonstration today. We are not application experts, but we looked at the last diagram in your link and we are confident we can achieve it.
Will there be any interest from the group? I can present for 15 mins plus Q&A.
Regards,
Shreyas
-----Original Message-----
From: Dragan Stancevic <dragan@stancevic.com>
Sent: Monday, April 10, 2023 6:09 PM
To: Shreyas Shah <shreyas.shah@elastics.cloud>; lsf-pc@lists.linux-foundation.org
Cc: nil-migration@lists.linux.dev; linux-cxl@vger.kernel.org; linux-mm@kvack.org
Subject: Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
Hi Shreyas-
On 4/9/23 12:40, Shreyas Shah wrote:
> Hi Dragan,
>
> The concept of time-sharing the CXL-attached memory across two NUMA nodes for live migration, and creating a cluster of VMs to increase the compute capacity, is great.
>
> When and where is the BoF?
It's a proposal sent in response to the CFP; it has not been accepted yet. The agenda is selected by the program committee based on interest. The current proposal is for the LSF/MM/BPF summit[1] running May 8 - May 10th, but it's not happening unless approved by the committee.
[1]. https://events.linuxfoundation.org/lsfmm/
>
>
> Regards,
> Shreyas
>
>
> -----Original Message-----
> From: Dragan Stancevic <dragan@stancevic.com>
> Sent: Friday, April 7, 2023 2:06 PM
> To: lsf-pc@lists.linux-foundation.org
> Cc: nil-migration@lists.linux.dev; linux-cxl@vger.kernel.org;
> linux-mm@kvack.org
> Subject: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
>
> Hi folks-
>
> if it's not too late for the schedule...
>
> I am starting to tackle VM live migration and hypervisor clustering over switched CXL memory[1][2], intended for cloud virtualization types of loads.
>
> I'd be interested in doing a small BoF session with some slides and get into a discussion/brainstorming with other people that deal with VM/LM cloud loads. Among other things to discuss would be page migrations over switched CXL memory, shared in-memory ABI to allow VM hand-off between hypervisors, etc...
>
> A few of us discussed some of this under the ZONE_XMEM thread, but I figured it might be better to start a separate thread.
>
> If there is interest, thank you.
>
>
> [1]. High-level overview available at http://nil-migration.org/
> [2]. Based on CXL spec 3.0
>
> --
> Peace can only come as a natural consequence of universal
> enlightenment -Dr. Nikola Tesla
--
Peace can only come as a natural consequence of universal enlightenment -Dr. Nikola Tesla
* Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-11 1:17 ` Shreyas Shah
@ 2023-04-11 1:32 ` Dragan Stancevic
2023-04-11 4:33 ` Shreyas Shah
0 siblings, 1 reply; 40+ messages in thread
From: Dragan Stancevic @ 2023-04-11 1:32 UTC (permalink / raw)
To: Shreyas Shah, lsf-pc; +Cc: nil-migration, linux-cxl, linux-mm
Hi Shreyas-
Speaking strictly for myself, that sounds interesting.
On 4/10/23 20:17, Shreyas Shah wrote:
> Thank you, Dragon.
>
> Btw, we can demonstrate the VM live migration with our FPGA based CXL memory and our demonstration today. We are not application expert, looked at the last diagram in your link and we are confident we can achieve it.
>
> Will there be any interest from the group? I can present for 15 mins and Q&A.
>
> Regards,
> Shreyas
>
>
>
> -----Original Message-----
> From: Dragan Stancevic <dragan@stancevic.com>
> Sent: Monday, April 10, 2023 6:09 PM
> To: Shreyas Shah <shreyas.shah@elastics.cloud>; lsf-pc@lists.linux-foundation.org
> Cc: nil-migration@lists.linux.dev; linux-cxl@vger.kernel.org; linux-mm@kvack.org
> Subject: Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
>
> Hi Shreyas-
>
> On 4/9/23 12:40, Shreyas Shah wrote:
>> Hi Dragon,
>>
>> The concept is great to time share the CXL attached memory across two NUMA nodes for live migration and create a cluster of VMs to increase the compute capacity.
>>
>> When and where is the BoF?
>
> It's a proposal sent under (CFP), it has not been accepted yet. The agenda is selected by the program committee based on interest. The current proposal is for LSF/MM/BPF summit[1] running May 8 - May 10th, but it's not happening if not approved by committee.
>
>
> [1]. https://events.linuxfoundation.org/lsfmm/
>
>
>>
>>
>> Regards,
>> Shreyas
>>
>>
>> -----Original Message-----
>> From: Dragan Stancevic <dragan@stancevic.com>
>> Sent: Friday, April 7, 2023 2:06 PM
>> To: lsf-pc@lists.linux-foundation.org
>> Cc: nil-migration@lists.linux.dev; linux-cxl@vger.kernel.org;
>> linux-mm@kvack.org
>> Subject: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
>>
>> Hi folks-
>>
>> if it's not too late for the schedule...
>>
>> I am starting to tackle VM live migration and hypervisor clustering over switched CXL memory[1][2], intended for cloud virtualization types of loads.
>>
>> I'd be interested in doing a small BoF session with some slides and get into a discussion/brainstorming with other people that deal with VM/LM cloud loads. Among other things to discuss would be page migrations over switched CXL memory, shared in-memory ABI to allow VM hand-off between hypervisors, etc...
>>
>> A few of us discussed some of this under the ZONE_XMEM thread, but I figured it might be better to start a separate thread.
>>
>> If there is interest, thank you.
>>
>>
>> [1]. High-level overview available at http://nil-migration.org/
>> [2]. Based on CXL spec 3.0
>>
>> --
>> Peace can only come as a natural consequence of universal
>> enlightenment -Dr. Nikola Tesla
>
> --
> --
> Peace can only come as a natural consequence of universal enlightenment -Dr. Nikola Tesla
>
--
Peace can only come as a natural consequence
of universal enlightenment -Dr. Nikola Tesla
* Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-11 0:56 ` Dragan Stancevic
@ 2023-04-11 1:48 ` Gregory Price
2023-04-14 3:32 ` Dragan Stancevic
0 siblings, 1 reply; 40+ messages in thread
From: Gregory Price @ 2023-04-11 1:48 UTC (permalink / raw)
To: Dragan Stancevic; +Cc: lsf-pc, nil-migration, linux-cxl, linux-mm
On Mon, Apr 10, 2023 at 07:56:01PM -0500, Dragan Stancevic wrote:
> Hi Gregory-
>
> On 4/7/23 19:05, Gregory Price wrote:
> > 3. This is changing the semantics of migration from a virtual memory
> > movement to a physical memory movement. Typically you would expect
> > the RDMA process for live migration to work something like...
> >
> > a) migration request arrives
> > b) source host informs destination host of size requirements
> > c) destination host allocations memory and passes a Virtual Address
> > back to source host
> > d) source host initates an RDMA from HostA-VA to HostB-VA
> > e) CPU task is migrated
> >
> > Importantly, the allocation of memory by Host B handles the important
> > step of creating HVA->HPA mappings, and the Extended/Nested Page
> > Tables can simply be flushed and re-created after the VM is fully
> > migrated.
> >
> > too long, didn't read: live migration is a virtual address operation,
> > and node-migration is a PHYSICAL address operation, the virtual
> > addresses remain the same.
> >
> > This is problematic, as it's changing the underlying semantics of the
> > migration operation.
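As a rough sketch, the virtual-address semantics of steps (a) through (e) can be modeled like this (all names are illustrative; the dict copy stands in for the actual RDMA transfer):

```python
# Toy simulation of RDMA-style live migration: the destination allocates its
# own memory (creating fresh HVA->HPA mappings), and data is copied from the
# source's virtual addresses to the destination's virtual addresses.

class Host:
    def __init__(self, name):
        self.name = name
        self.va_space = {}                  # "virtual address" -> page contents

    def allocate(self, npages):
        # (c) destination allocates memory and passes a base virtual
        # address back to the source
        base = f"{self.name}-va"
        for i in range(npages):
            self.va_space[(base, i)] = None
        return base

def live_migrate(src, dst):
    vm_pages = [src.va_space[k] for k in sorted(src.va_space)]  # (a)/(b) size known
    base = dst.allocate(len(vm_pages))                          # (c)
    for i, page in enumerate(vm_pages):                         # (d) HostA-VA -> HostB-VA
        dst.va_space[(base, i)] = page
    return base                                                 # (e) CPU task migrates

src, dst = Host("A"), Host("B")
for i, data in enumerate(["page0", "page1", "page2"]):
    src.va_space[("A-va", i)] = data
base = live_migrate(src, dst)
assert [dst.va_space[(base, i)] for i in range(3)] == ["page0", "page1", "page2"]
```

Note that the destination's page tables are rebuilt from scratch here; nothing about the source's physical layout survives the transfer, which is exactly the property node migration over CXL would change.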
>
> Those are all valid points, but what if you don't need to recreate HVA->HPA
> mappings? If I am understanding the CXL 3.0 spec correctly, then both
> virtual addresses and physical addresses wouldn't have to change. Because
> the fabric "virtualizes" host physical addresses and the translation is done
> by the G-FAM/GFD that has the capability to translate multi-host HPAs to
> its internal DPAs. So if you have two hypervisors seeing device physical
> address as the same physical address, that might work?
>
>
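The address-translation idea quoted above can be sketched as a toy model (purely illustrative; not based on any driver API):

```python
# A switched CXL device (G-FAM/GFD) translates each host's physical address
# (HPA) into its internal device physical address (DPA). If both hypervisors
# map the device window at the same HPA base, neither virtual nor physical
# addresses need to change on VM hand-off.

class GFD:
    def __init__(self, size):
        self.dpa_space = [0] * size          # device-internal memory
        self.decoders = {}                   # host -> HPA base of its window

    def attach(self, host, hpa_base):
        self.decoders[host] = hpa_base

    def access(self, host, hpa):
        dpa = hpa - self.decoders[host]      # per-host HPA -> DPA translation
        return self.dpa_space, dpa

dev = GFD(16)
dev.attach("hypervisorA", hpa_base=0x1000)
dev.attach("hypervisorB", hpa_base=0x1000)   # same HPA window on both hosts

mem, dpa = dev.access("hypervisorA", 0x1004)
mem[dpa] = 42                                # A writes through its window
mem, dpa = dev.access("hypervisorB", 0x1004)
assert mem[dpa] == 42                        # B sees the same data at the same HPA
```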
Hm. I hadn't considered the device side translation (decoders), though
that's obviously a tool in the toolbox. You still have to know how to
slide ranges of data (which you mention below).
>
> > The reference in this case is... the page tables. You need to know how
> > to interpret the data in the CXL memory region on the remote host, and
> > that's a "relative page table translation" (to coin a phrase? I'm not
> > sure how to best describe it).
>
> right, coining phrases... I have been thinking of a "super-page" (for
> lack of a better word): a metadata region sitting on the switched CXL.mem
> device that would allow hypervisors to synchronize on various aspects, such
> as "relative page table translation", host is up, host is down, list of
> peers, who owns what etc... In a perfect scenario, I would love to see the
> hypervisors cooperating on switched CXL.mem device the same way cpus on
> different numa nodes cooperate on memory in a single hypervisor. If either
> host can allocate and schedule from this space, then the "NIL" aspect of
> migration is "free".
>
>
The core of the problem is still that each of the hosts has to agree on
the location (physically) of this region of memory, which could be
problematic unless you have very strong BIOS and/or kernel driver
controls to ensure certain devices are guaranteed to be mapped into
certain spots in the CFMW.
After that it's a matter of treating this memory as incoherent shared
memory and handling ownership in a safe way. If the memory is only used
for migrations, then you don't have to worry about performance.
So I agree, as long as shared memory mapped into the same CFMW area is
used, this mechanism is totally sound.
My main concern is that I don't know of a mechanism to ensure that. I
suppose for those interested, and with special BIOS/EFI, you could do
that - but I think that's going to be a tall ask in a heterogeneous cloud
environment.
> > That's... complicated to say the least.
> >
> > <... snip ...>
> >
> > An Option: Make pages physically contiguous on migration to CXL
> >
> > In this case, you don't necessarily care about the Host Virtual
> > Addresses, what you actually care about are the structure of the pages
> > in memory (are they physically contiguous? or do you need to
> > reconstruct the contiguity by inspecting the page tables?).
> >
> > If a migration API were capable of reserving large swaths of contiguous
> > CXL memory, you could discard individual page information and instead
> > send page range information, reconstructing the virtual-physical
> > mappings this way.
>
> yeah, good points, but this is all tricky though... it seems this would
> require quiescing the VM and that is something I would like to avoid if
> possible. I'd like to see the VM still executing while all of its pages are
> migrated onto CXL NUMA on the source hypervisor. And I would like to see the
> VM executing on the destination hypervisor while migrate_pages is moving
> pages off of CXL. Of course, what you are describing above would still be a
> very fast VM migration, but would require quiescing.
>
>
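The non-quiescing flow Dragan describes can be sketched as a minimal simulation (illustrative only; real page movement would use migrate_pages and shared metadata on the switched device):

```python
# NIL-migration sketch: the VM keeps running while its pages are migrated
# onto the shared CXL node, ownership is flipped in shared metadata (no bulk
# copy), and the destination then drains pages off CXL in the background.

def nil_migrate(vm, shared_cxl, dst_local):
    # Phase 1: on the source, move pages to the shared CXL node while VM runs
    while vm["pages"]:
        shared_cxl.append(vm["pages"].pop(0))
    # Hand-off: flip ownership in shared metadata; no data transfer needed
    vm["owner"] = "destination"
    # Phase 2: on the destination, drain pages off CXL while VM runs
    while shared_cxl:
        dst_local.append(shared_cxl.pop(0))

vm = {"owner": "source", "pages": ["p0", "p1"]}
cxl, dst = [], []
nil_migrate(vm, cxl, dst)
assert vm["owner"] == "destination" and dst == ["p0", "p1"] and not cxl
```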
Possibly. If you're going to quiesce you're probably better off just
snapshotting to shared memory and migrating the snapshot.
Maybe that's the better option for a first-pass migration mechanism. I
don't know.
Anyway, would love to attend this session.
~Gregory
* RE: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-11 1:32 ` Dragan Stancevic
@ 2023-04-11 4:33 ` Shreyas Shah
2023-04-14 3:26 ` Dragan Stancevic
0 siblings, 1 reply; 40+ messages in thread
From: Shreyas Shah @ 2023-04-11 4:33 UTC (permalink / raw)
To: Dragan Stancevic, lsf-pc; +Cc: nil-migration, linux-cxl, linux-mm
Hi Dragan,
Do you all meet on a weekly Zoom or Teams call?
Maybe next time I can present, if you have a slot.
Regards,
Shreyas
-----Original Message-----
From: Dragan Stancevic <dragan@stancevic.com>
Sent: Monday, April 10, 2023 6:33 PM
To: Shreyas Shah <shreyas.shah@elastics.cloud>; lsf-pc@lists.linux-foundation.org
Cc: nil-migration@lists.linux.dev; linux-cxl@vger.kernel.org; linux-mm@kvack.org
Subject: Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
Hi Shreyas-
speaking strictly for myself, sounds interesting
On 4/10/23 20:17, Shreyas Shah wrote:
> Thank you, Dragon.
>
> Btw, we can demonstrate the VM live migration with our FPGA based CXL memory and our demonstration today. We are not application expert, looked at the last diagram in your link and we are confident we can achieve it.
>
> Will there be any interest from the group? I can present for 15 mins and Q&A.
>
> Regards,
> Shreyas
>
>
>
> -----Original Message-----
> From: Dragan Stancevic <dragan@stancevic.com>
> Sent: Monday, April 10, 2023 6:09 PM
> To: Shreyas Shah <shreyas.shah@elastics.cloud>;
> lsf-pc@lists.linux-foundation.org
> Cc: nil-migration@lists.linux.dev; linux-cxl@vger.kernel.org;
> linux-mm@kvack.org
> Subject: Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
>
> Hi Shreyas-
>
> On 4/9/23 12:40, Shreyas Shah wrote:
>> Hi Dragon,
>>
>> The concept is great to time share the CXL attached memory across two NUMA nodes for live migration and create a cluster of VMs to increase the compute capacity.
>>
>> When and where is the BoF?
>
> It's a proposal sent under (CFP), it has not been accepted yet. The agenda is selected by the program committee based on interest. The current proposal is for LSF/MM/BPF summit[1] running May 8 - May 10th, but it's not happening if not approved by committee.
>
>
> [1]. https://events.linuxfoundation.org/lsfmm/
>
>
>>
>>
>> Regards,
>> Shreyas
>>
>>
>> -----Original Message-----
>> From: Dragan Stancevic <dragan@stancevic.com>
>> Sent: Friday, April 7, 2023 2:06 PM
>> To: lsf-pc@lists.linux-foundation.org
>> Cc: nil-migration@lists.linux.dev; linux-cxl@vger.kernel.org;
>> linux-mm@kvack.org
>> Subject: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
>>
>> Hi folks-
>>
>> if it's not too late for the schedule...
>>
>> I am starting to tackle VM live migration and hypervisor clustering over switched CXL memory[1][2], intended for cloud virtualization types of loads.
>>
>> I'd be interested in doing a small BoF session with some slides and get into a discussion/brainstorming with other people that deal with VM/LM cloud loads. Among other things to discuss would be page migrations over switched CXL memory, shared in-memory ABI to allow VM hand-off between hypervisors, etc...
>>
>> A few of us discussed some of this under the ZONE_XMEM thread, but I figured it might be better to start a separate thread.
>>
>> If there is interest, thank you.
>>
>>
>> [1]. High-level overview available at http://nil-migration.org/
>> [2]. Based on CXL spec 3.0
>>
>> --
>> Peace can only come as a natural consequence of universal
>> enlightenment -Dr. Nikola Tesla
>
> --
> --
> Peace can only come as a natural consequence of universal
> enlightenment -Dr. Nikola Tesla
>
--
Peace can only come as a natural consequence of universal enlightenment -Dr. Nikola Tesla
* Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-08 0:05 ` Gregory Price
2023-04-11 0:56 ` Dragan Stancevic
@ 2023-04-11 6:37 ` Huang, Ying
2023-04-11 15:36 ` Gregory Price
2023-04-14 3:33 ` Dragan Stancevic
1 sibling, 2 replies; 40+ messages in thread
From: Huang, Ying @ 2023-04-11 6:37 UTC (permalink / raw)
To: Gregory Price
Cc: Dragan Stancevic, lsf-pc, nil-migration, linux-cxl, linux-mm
Gregory Price <gregory.price@memverge.com> writes:
[snip]
> 2. During the migration process, the memory needs to be forced not to be
> migrated to another node by other means (tiering software, swap,
> etc). The obvious way of doing this would be to migrate and
> temporarily pin the page... but going back to problem #1 we see that
> ZONE_MOVABLE and Pinning are mutually exclusive. So that's
> troublesome.
Can we use memory policy (cpusets, mbind(), set_mempolicy(), etc.) to
avoid moving pages out of the CXL.mem node? Right now there are gaps in
tiering, but I think they are fixable.
Best Regards,
Huang, Ying
[snip]
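The suggestion above - keeping pages on the CXL node via memory policy so tiering won't move them - can be illustrated with a toy demotion pass (hypothetical data structures; real tiering operates on kernel-internal state):

```python
# If the migration path binds the VM's pages to the CXL node in
# mbind()/set_mempolicy() style, a tiering/demotion pass that respects the
# policy nodemask leaves them in place - no pinning required.

CXL_NODE = 2

def tiering_pass(pages):
    moved = []
    for page in pages:
        # respect the policy nodemask: a page bound only to the CXL node stays
        if page["node"] == CXL_NODE and page["policy_nodes"] == {CXL_NODE}:
            continue
        page["node"] = 0                      # demote/promote elsewhere
        moved.append(page)
    return moved

pages = [
    {"node": CXL_NODE, "policy_nodes": {CXL_NODE}},       # bound mid-migration
    {"node": CXL_NODE, "policy_nodes": {0, 1, CXL_NODE}}, # free to move
]
tiering_pass(pages)
assert pages[0]["node"] == CXL_NODE and pages[1]["node"] == 0
```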
* Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-11 6:37 ` [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory Huang, Ying
@ 2023-04-11 15:36 ` Gregory Price
2023-04-12 2:54 ` Huang, Ying
2023-04-14 3:33 ` Dragan Stancevic
1 sibling, 1 reply; 40+ messages in thread
From: Gregory Price @ 2023-04-11 15:36 UTC (permalink / raw)
To: Huang, Ying; +Cc: Dragan Stancevic, lsf-pc, nil-migration, linux-cxl, linux-mm
On Tue, Apr 11, 2023 at 02:37:50PM +0800, Huang, Ying wrote:
> Gregory Price <gregory.price@memverge.com> writes:
>
> [snip]
>
> > 2. During the migration process, the memory needs to be forced not to be
> > migrated to another node by other means (tiering software, swap,
> > etc). The obvious way of doing this would be to migrate and
> > temporarily pin the page... but going back to problem #1 we see that
> > ZONE_MOVABLE and Pinning are mutually exclusive. So that's
> > troublesome.
>
> Can we use memory policy (cpusets, mbind(), set_mempolicy(), etc.) to
> avoid move pages out of CXL.mem node? Now, there are gaps in tiering,
> but I think it is fixable.
>
> Best Regards,
> Huang, Ying
>
> [snip]
That feels like a hack/bodge rather than a proper solution to me.
Maybe this is an affirmative argument for the creation of an EXMEM
zone. Specifically to allow page pinning, but with far more stringent
controls - i.e. the zone is excluded from use via general allocations.
The point of ZONE_MOVABLE is to allow general allocation of userland
data into hotpluggable memory regions.
This memory region is not for general use, and wants to allow pinning
and be hotpluggable under very controlled circumstances. That seems
like a reasonable argument for the creation of EXMEM.
~Gregory
* Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-07 21:05 [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory Dragan Stancevic
` (3 preceding siblings ...)
[not found] ` <CGME20230410030532epcas2p49eae675396bf81658c1a3401796da1d4@epcas2p4.samsung.com>
@ 2023-04-11 18:00 ` Dave Hansen
2023-05-01 23:49 ` Dragan Stancevic
2023-04-11 18:16 ` RAGHU H
2023-05-09 15:08 ` Dragan Stancevic
6 siblings, 1 reply; 40+ messages in thread
From: Dave Hansen @ 2023-04-11 18:00 UTC (permalink / raw)
To: Dragan Stancevic, lsf-pc; +Cc: nil-migration, linux-cxl, linux-mm
On 4/7/23 14:05, Dragan Stancevic wrote:
> I'd be interested in doing a small BoF session with some slides and get
> into a discussion/brainstorming with other people that deal with VM/LM
> cloud loads. Among other things to discuss would be page migrations over
> switched CXL memory, shared in-memory ABI to allow VM hand-off between
> hypervisors, etc...
How would 'struct page' or other kernel metadata be handled?
I assume you'd want a really big CXL memory device with as many hosts
connected to it as is feasible. But, in order to hand the memory off
from one host to another, both would need to have metadata for it at
_some_ point.
So, do all hosts have metadata for the whole CXL memory device all the
time? Or, would they create the metadata (hotplug) when a VM is
migrated in and destroy it (hot unplug) when a VM is migrated out?
That gets back to the granularity question discussed elsewhere in the
thread. How would the metadata allocation granularity interact with the
page allocation granularity? How would fragmentation be avoided so that
hosts don't eat up all their RAM with unused metadata?
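Some back-of-the-envelope numbers for the metadata question: with 4 KiB pages and a 64-byte struct page (typical on x86-64), the memmap costs 64/4096, roughly 1.6% of the device, so keeping metadata for a whole large device on every host adds up quickly:

```python
# Metadata cost if every connected host kept 'struct page' entries for the
# entire CXL memory device all the time.

PAGE_SIZE = 4096          # bytes
STRUCT_PAGE = 64          # bytes, typical x86-64 size

device = 1 << 40          # a 1 TiB CXL memory device
npages = device // PAGE_SIZE
memmap = npages * STRUCT_PAGE

assert memmap == 16 * (1 << 30)                       # 16 GiB of metadata per host
assert round(STRUCT_PAGE / PAGE_SIZE * 100, 2) == 1.56  # ~1.6% overhead
```

That 16 GiB per host is what makes the hotplug/granularity question above more than academic.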
* Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-07 21:05 [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory Dragan Stancevic
` (4 preceding siblings ...)
2023-04-11 18:00 ` [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory Dave Hansen
@ 2023-04-11 18:16 ` RAGHU H
2023-05-09 15:08 ` Dragan Stancevic
6 siblings, 0 replies; 40+ messages in thread
From: RAGHU H @ 2023-04-11 18:16 UTC (permalink / raw)
To: Dragan Stancevic; +Cc: lsf-pc, nil-migration, linux-cxl, linux-mm
Hi Dragan
Please add me to the discussion!
Regards
Raghu
On Sat, Apr 8, 2023 at 2:56 AM Dragan Stancevic <dragan@stancevic.com> wrote:
>
> Hi folks-
>
> if it's not too late for the schedule...
>
> I am starting to tackle VM live migration and hypervisor clustering over
> switched CXL memory[1][2], intended for cloud virtualization types of loads.
>
> I'd be interested in doing a small BoF session with some slides and get
> into a discussion/brainstorming with other people that deal with VM/LM
> cloud loads. Among other things to discuss would be page migrations over
> switched CXL memory, shared in-memory ABI to allow VM hand-off between
> hypervisors, etc...
>
> A few of us discussed some of this under the ZONE_XMEM thread, but I
> figured it might be better to start a separate thread.
>
> If there is interest, thank you.
>
>
> [1]. High-level overview available at http://nil-migration.org/
> [2]. Based on CXL spec 3.0
>
> --
> Peace can only come as a natural consequence
> of universal enlightenment -Dr. Nikola Tesla
* Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-11 15:36 ` Gregory Price
@ 2023-04-12 2:54 ` Huang, Ying
2023-04-12 8:38 ` David Hildenbrand
0 siblings, 1 reply; 40+ messages in thread
From: Huang, Ying @ 2023-04-12 2:54 UTC (permalink / raw)
To: Gregory Price
Cc: Dragan Stancevic, lsf-pc, nil-migration, linux-cxl, linux-mm
Gregory Price <gregory.price@memverge.com> writes:
> On Tue, Apr 11, 2023 at 02:37:50PM +0800, Huang, Ying wrote:
>> Gregory Price <gregory.price@memverge.com> writes:
>>
>> [snip]
>>
>> > 2. During the migration process, the memory needs to be forced not to be
>> > migrated to another node by other means (tiering software, swap,
>> > etc). The obvious way of doing this would be to migrate and
>> > temporarily pin the page... but going back to problem #1 we see that
>> > ZONE_MOVABLE and Pinning are mutually exclusive. So that's
>> > troublesome.
>>
>> Can we use memory policy (cpusets, mbind(), set_mempolicy(), etc.) to
>> avoid move pages out of CXL.mem node? Now, there are gaps in tiering,
>> but I think it is fixable.
>>
>> Best Regards,
>> Huang, Ying
>>
>> [snip]
>
> That feels like a hack/bodge rather than a proper solution to me.
>
> Maybe this is an affirmative argument for the creation of an EXMEM
> zone.
Let's start with requirements. What are the requirements for a new zone
type?
> Specifically to allow page pinning, but with far more stringent
> controls -
> i.e. the zone is excluded from use via general allocations.
This can also be controlled via memory policy. The alternative solution
is to add a per-node attribute.
> The point of ZONE_MOVABLE is to allow general allocation of userland
> data into hotpluggable memory regions.
IIUC, one typical requirement for CXL.mem is that it be hotpluggable, right?
Best Regards,
Huang, Ying
> This memory region is not for general use, and wants to allow pinning
> and be hotpluggable under very controlled circumstances. That seems
> like a reasonable argument for the creation of EXMEM.
>
> ~Gregory
* Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-12 2:54 ` Huang, Ying
@ 2023-04-12 8:38 ` David Hildenbrand
[not found] ` <CGME20230412111034epcas2p1b46d2a26b7d3ac5db3b0e454255527b0@epcas2p1.samsung.com>
` (2 more replies)
0 siblings, 3 replies; 40+ messages in thread
From: David Hildenbrand @ 2023-04-12 8:38 UTC (permalink / raw)
To: Huang, Ying, Gregory Price
Cc: Dragan Stancevic, lsf-pc, nil-migration, linux-cxl, linux-mm
On 12.04.23 04:54, Huang, Ying wrote:
> Gregory Price <gregory.price@memverge.com> writes:
>
>> On Tue, Apr 11, 2023 at 02:37:50PM +0800, Huang, Ying wrote:
>>> Gregory Price <gregory.price@memverge.com> writes:
>>>
>>> [snip]
>>>
>>>> 2. During the migration process, the memory needs to be forced not to be
>>>> migrated to another node by other means (tiering software, swap,
>>>> etc). The obvious way of doing this would be to migrate and
>>>> temporarily pin the page... but going back to problem #1 we see that
>>>> ZONE_MOVABLE and Pinning are mutually exclusive. So that's
>>>> troublesome.
>>>
>>> Can we use memory policy (cpusets, mbind(), set_mempolicy(), etc.) to
>>> avoid move pages out of CXL.mem node? Now, there are gaps in tiering,
>>> but I think it is fixable.
>>>
>>> Best Regards,
>>> Huang, Ying
>>>
>>> [snip]
>>
>> That feels like a hack/bodge rather than a proper solution to me.
>>
>> Maybe this is an affirmative argument for the creation of an EXMEM
>> zone.
>
> Let's start with requirements. What is the requirements for a new zone
> type?
I'm still scratching my head regarding this. I keep hearing all
different kinds of statements that just add more confusion: "we want it
to be hotunpluggable", "we want to allow for long-term pinning memory",
"but we still want it to be movable", "we want to place some unmovable
allocations on it". Huh?
Just to clarify: ZONE_MOVABLE allows for pinning. It just doesn't allow
for long-term pinning of memory.
For good reason, because long-term pinning of memory is just the worst
(memory waste, fragmentation, overcommit) and instead of finding new
ways to *avoid* long-term pinnings, we're coming up with advanced
concepts to work-around the fundamental property of long-term pinnings.
We want all memory to be long-term pinnable and we want all memory to be
movable/hotunpluggable. That's not going to work.
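The tension can be made concrete with a small sketch (simplified; the real offlining path is mm/memory_hotplug.c's migration loop):

```python
# Hot-unplugging a memory range requires migrating every page out of it, and
# a long-term pinned page cannot be migrated - so one pin blocks the whole
# unplug. This is why ZONE_MOVABLE forbids long-term pins.

def try_offline(zone):
    for page in zone:
        if page["longterm_pinned"]:
            return False          # cannot migrate a pinned page away
        page["migrated"] = True   # page moved elsewhere, range shrinks
    return True

movable = [{"longterm_pinned": False, "migrated": False} for _ in range(8)]
assert try_offline(movable)       # fully movable zone offlines fine

movable[3] = {"longterm_pinned": True, "migrated": False}
assert not try_offline(movable)   # a single long-term pin defeats unplug
```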
If you'd ask me today, my prediction is that ZONE_EXMEM is not going to
happen.
--
Thanks,
David / dhildenb
* RE: FW: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
[not found] ` <CGME20230412111034epcas2p1b46d2a26b7d3ac5db3b0e454255527b0@epcas2p1.samsung.com>
@ 2023-04-12 11:10 ` Kyungsan Kim
2023-04-12 11:26 ` David Hildenbrand
2023-04-12 15:40 ` Matthew Wilcox
0 siblings, 2 replies; 40+ messages in thread
From: Kyungsan Kim @ 2023-04-12 11:10 UTC (permalink / raw)
To: david
Cc: lsf-pc, linux-mm, linux-fsdevel, linux-cxl, a.manzanares,
viacheslav.dubeyko, dan.j.williams, seungjun.ha, wj28.lee
>> Gregory Price <gregory.price@memverge.com> writes:
>>
>>> On Tue, Apr 11, 2023 at 02:37:50PM +0800, Huang, Ying wrote:
>>>> Gregory Price <gregory.price@memverge.com> writes:
>>>>
>>>> [snip]
>>>>
>>>>> 2. During the migration process, the memory needs to be forced not to be
>>>>> migrated to another node by other means (tiering software, swap,
>>>>> etc). The obvious way of doing this would be to migrate and
>>>>> temporarily pin the page... but going back to problem #1 we see that
>>>>> ZONE_MOVABLE and Pinning are mutually exclusive. So that's
>>>>> troublesome.
>>>>
>>>> Can we use memory policy (cpusets, mbind(), set_mempolicy(), etc.) to
>>>> avoid move pages out of CXL.mem node? Now, there are gaps in tiering,
>>>> but I think it is fixable.
>>>>
>>>> Best Regards,
>>>> Huang, Ying
>>>>
>>>> [snip]
>>>
>>> That feels like a hack/bodge rather than a proper solution to me.
>>>
>>> Maybe this is an affirmative argument for the creation of an EXMEM
>>> zone.
>>
>> Let's start with requirements. What is the requirements for a new zone
>> type?
>
>I'm stills scratching my head regarding this. I keep hearing all
>different kind of statements that just add more confusions "we want it
>to be hotunpluggable" "we want to allow for long-term pinning memory"
>"but we still want it to be movable" "we want to place some unmovable
>allocations on it". Huh?
>
>Just to clarify: ZONE_MOVABLE allows for pinning. It just doesn't allow
>for long-term pinning of memory.
>
>For good reason, because long-term pinning of memory is just the worst
>(memory waste, fragmentation, overcommit) and instead of finding new
>ways to *avoid* long-term pinnings, we're coming up with advanced
>concepts to work-around the fundamental property of long-term pinnings.
>
>We want all memory to be long-term pinnable and we want all memory to be
>movable/hotunpluggable. That's not going to work.
It looks like there is a misunderstanding about the ZONE_EXMEM argument.
Pinning and pluggability are mutually exclusive, so they cannot happen at the same time.
What we argue is that ZONE_EXMEM does not "confine movability": an allocation context can determine the movability attribute.
Even one unmovable allocation will make the entire CXL DRAM unpluggable.
If you look at ZONE_EXMEM just from the movable/unmovable aspect, we think it is the same as ZONE_NORMAL,
but ZONE_EXMEM works on extended memory, as of now CXL DRAM.
As for why ZONE_EXMEM: it considers not only the pluggability aspect, but also a CXL identifier for the user/kernelspace API,
the abstraction of multiple CXL DRAM channels, and a zone-unit algorithm for CXL HW characteristics.
The last one is only potential at the moment, though.
As mentioned in the ZONE_EXMEM thread, we are preparing slides to explain our experiences and proposals.
It is not the final version yet[1].
[1] https://github.com/OpenMPDK/SMDK/wiki/93.-%5BLSF-MM-BPF-TOPIC%5D-SMDK-inspired-MM-changes-for-CXL
>If you'd ask me today, my prediction is that ZONE_EXMEM is not going to
>happen.
>
>--
>Thanks,
>
>David / dhildenb
* Re: FW: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-12 11:10 ` FW: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory Kyungsan Kim
@ 2023-04-12 11:26 ` David Hildenbrand
[not found] ` <CGME20230414084110epcas2p20b90a8d1892110d7ca3ac16290cd4686@epcas2p2.samsung.com>
2023-04-12 15:40 ` Matthew Wilcox
1 sibling, 1 reply; 40+ messages in thread
From: David Hildenbrand @ 2023-04-12 11:26 UTC (permalink / raw)
To: Kyungsan Kim
Cc: lsf-pc, linux-mm, linux-fsdevel, linux-cxl, a.manzanares,
viacheslav.dubeyko, dan.j.williams, seungjun.ha, wj28.lee
On 12.04.23 13:10, Kyungsan Kim wrote:
>>> Gregory Price <gregory.price@memverge.com> writes:
>>>
>>>> On Tue, Apr 11, 2023 at 02:37:50PM +0800, Huang, Ying wrote:
>>>>> Gregory Price <gregory.price@memverge.com> writes:
>>>>>
>>>>> [snip]
>>>>>
>>>>>> 2. During the migration process, the memory needs to be forced not to be
>>>>>> migrated to another node by other means (tiering software, swap,
>>>>>> etc). The obvious way of doing this would be to migrate and
>>>>>> temporarily pin the page... but going back to problem #1 we see that
>>>>>> ZONE_MOVABLE and Pinning are mutually exclusive. So that's
>>>>>> troublesome.
>>>>>
>>>>> Can we use memory policy (cpusets, mbind(), set_mempolicy(), etc.) to
>>>>> avoid move pages out of CXL.mem node? Now, there are gaps in tiering,
>>>>> but I think it is fixable.
>>>>>
>>>>> Best Regards,
>>>>> Huang, Ying
>>>>>
>>>>> [snip]
>>>>
>>>> That feels like a hack/bodge rather than a proper solution to me.
>>>>
>>>> Maybe this is an affirmative argument for the creation of an EXMEM
>>>> zone.
>>>
>>> Let's start with requirements. What is the requirements for a new zone
>>> type?
>>
>> I'm stills scratching my head regarding this. I keep hearing all
>> different kind of statements that just add more confusions "we want it
>> to be hotunpluggable" "we want to allow for long-term pinning memory"
>> "but we still want it to be movable" "we want to place some unmovable
>> allocations on it". Huh?
>>
>> Just to clarify: ZONE_MOVABLE allows for pinning. It just doesn't allow
>> for long-term pinning of memory.
>>
>> For good reason, because long-term pinning of memory is just the worst
>> (memory waste, fragmentation, overcommit) and instead of finding new
>> ways to *avoid* long-term pinnings, we're coming up with advanced
>> concepts to work-around the fundamental property of long-term pinnings.
>>
>> We want all memory to be long-term pinnable and we want all memory to be
>> movable/hotunpluggable. That's not going to work.
>
> Looks there is misunderstanding about ZONE_EXMEM argument.
> Pinning and plubbability is mutual exclusive so it can not happen at the same time.
> What we argue is ZONE_EXMEM does not "confine movability". an allocation context can determine the movability attribute.
> Even one unmovable allocation will make the entire CXL DRAM unpluggable.
> When you see ZONE_EXMEM just on movable/unmoable aspect, we think it is the same with ZONE_NORMAL,
> but ZONE_EXMEM works on an extended memory, as of now CXL DRAM.
>
> Then why ZONE_EXMEM is, ZONE_EXMEM considers not only the pluggability aspect, but CXL identifier for user/kenelspace API,
> the abstraction of multiple CXL DRAM channels, and zone unit algorithm for CXL HW characteristics.
> The last one is potential at the moment, though.
>
> As mentioned in ZONE_EXMEM thread, we are preparing slides to explain experiences and proposals.
> It it not final version now[1].
> [1] https://github.com/OpenMPDK/SMDK/wiki/93.-%5BLSF-MM-BPF-TOPIC%5D-SMDK-inspired-MM-changes-for-CXL
Yes, hopefully at LSF/MM we can also discuss the problems we are trying
to solve instead of focusing on one solution. [I did not have time to
look at the slides yet, sorry.]
--
Thanks,
David / dhildenb
* Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-12 8:38 ` David Hildenbrand
[not found] ` <CGME20230412111034epcas2p1b46d2a26b7d3ac5db3b0e454255527b0@epcas2p1.samsung.com>
@ 2023-04-12 15:15 ` James Bottomley
2023-05-03 23:42 ` Dragan Stancevic
2023-04-12 15:26 ` Gregory Price
2 siblings, 1 reply; 40+ messages in thread
From: James Bottomley @ 2023-04-12 15:15 UTC (permalink / raw)
To: David Hildenbrand, Huang, Ying, Gregory Price
Cc: Dragan Stancevic, lsf-pc, nil-migration, linux-cxl, linux-mm
On Wed, 2023-04-12 at 10:38 +0200, David Hildenbrand wrote:
> On 12.04.23 04:54, Huang, Ying wrote:
> > Gregory Price <gregory.price@memverge.com> writes:
[...]
> > > That feels like a hack/bodge rather than a proper solution to me.
> > >
> > > Maybe this is an affirmative argument for the creation of an
> > > EXMEM zone.
> >
> > Let's start with requirements. What is the requirements for a new
> > zone type?
>
> I'm still scratching my head regarding this. I keep hearing all
> different kinds of statements that just add more confusion: "we want
> it to be hotunpluggable", "we want to allow for long-term pinning
> memory", "but we still want it to be movable", "we want to place some
> unmovable allocations on it". Huh?
This is the essential question about CXL memory itself: what would its
killer app be? The CXL people (or at least the ones I've talked to)
don't exactly know. Within IBM I've seen lots of ideas but no actual
concrete applications. Given the rates at which memory density in
systems is increasing, I'm a bit dubious of the extensible system pool
argument. Providing extensible memory to VMs sounds a bit more
plausible, particularly as it solves a big part of the local overcommit
problem (although you still have a global one). I'm not really sure I
buy the VM migration use case: iterative transfer works fine with small
down times so transferring memory seems to be the least of problems
with the VM migration use case (it's mostly about problems with
attached devices). CXL 3.0 is adding sharing primitives for memory so
now we have to ask if there are any multi-node shared memory use cases
for this, but most of us have already been burned by multi-node shared
clusters once in our career and are a bit leery of a second go around.
Is there a use case I left out (or needs expanding)?
James
* Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-12 8:38 ` David Hildenbrand
[not found] ` <CGME20230412111034epcas2p1b46d2a26b7d3ac5db3b0e454255527b0@epcas2p1.samsung.com>
2023-04-12 15:15 ` [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory James Bottomley
@ 2023-04-12 15:26 ` Gregory Price
2023-04-12 15:50 ` David Hildenbrand
2 siblings, 1 reply; 40+ messages in thread
From: Gregory Price @ 2023-04-12 15:26 UTC (permalink / raw)
To: David Hildenbrand
Cc: Huang, Ying, Dragan Stancevic, lsf-pc, nil-migration, linux-cxl,
linux-mm
On Wed, Apr 12, 2023 at 10:38:04AM +0200, David Hildenbrand wrote:
> On 12.04.23 04:54, Huang, Ying wrote:
> > Gregory Price <gregory.price@memverge.com> writes:
> >
> > > On Tue, Apr 11, 2023 at 02:37:50PM +0800, Huang, Ying wrote:
> > > > Gregory Price <gregory.price@memverge.com> writes:
> > > >
> > > > [snip]
> > > >
> > > > > 2. During the migration process, the memory needs to be forced not to be
> > > > > migrated to another node by other means (tiering software, swap,
> > > > > etc). The obvious way of doing this would be to migrate and
> > > > > temporarily pin the page... but going back to problem #1 we see that
> > > > > ZONE_MOVABLE and Pinning are mutually exclusive. So that's
> > > > > troublesome.
> > > >
> > > > Can we use memory policy (cpusets, mbind(), set_mempolicy(), etc.) to
> > > > avoid move pages out of CXL.mem node? Now, there are gaps in tiering,
> > > > but I think it is fixable.
> > > >
> > > > Best Regards,
> > > > Huang, Ying
> > > >
> > > > [snip]
> > >
> > > That feels like a hack/bodge rather than a proper solution to me.
> > >
> > > Maybe this is an affirmative argument for the creation of an EXMEM
> > > zone.
> >
> > Let's start with requirements. What is the requirements for a new zone
> > type?
>
> I'm still scratching my head regarding this. I keep hearing all different
> kinds of statements that just add more confusion: "we want it to be
> hotunpluggable", "we want to allow for long-term pinning memory", "but we
> still want it to be movable", "we want to place some unmovable allocations on
> it". Huh?
>
> Just to clarify: ZONE_MOVABLE allows for pinning. It just doesn't allow for
> long-term pinning of memory.
>
I apologize for the confusion, this is my fault. I had assumed that
since dax regions can't be pinned, nodes backed by a dax
device could not be pinned either. In testing, this turned out not to be the case.
Re: long-term pinning, can you be more explicit as to what is considered
long-term? Minutes? Hours? Days? etc.
If a migration operation is considered short-term, then pinning VM
memory during migration deals with this issue cleanly.
So walking back my statement - given my testing, I don't believe there's
a reason for a new zone.
> For good reason, because long-term pinning of memory is just the worst
> (memory waste, fragmentation, overcommit) and instead of finding new ways to
> *avoid* long-term pinnings, we're coming up with advanced concepts to
> work-around the fundamental property of long-term pinnings.
>
> We want all memory to be long-term pinnable and we want all memory to be
> movable/hotunpluggable. That's not going to work.
>
> If you'd ask me today, my prediction is that ZONE_EXMEM is not going to
> happen.
>
> --
> Thanks,
>
> David / dhildenb
>
* Re: FW: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-12 11:10 ` FW: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory Kyungsan Kim
2023-04-12 11:26 ` David Hildenbrand
@ 2023-04-12 15:40 ` Matthew Wilcox
[not found] ` <CGME20230414084114epcas2p4754d6c0d3c86a0d6d4e855058562100f@epcas2p4.samsung.com>
1 sibling, 1 reply; 40+ messages in thread
From: Matthew Wilcox @ 2023-04-12 15:40 UTC (permalink / raw)
To: Kyungsan Kim
Cc: david, lsf-pc, linux-mm, linux-fsdevel, linux-cxl, a.manzanares,
viacheslav.dubeyko, dan.j.williams, seungjun.ha, wj28.lee
On Wed, Apr 12, 2023 at 08:10:33PM +0900, Kyungsan Kim wrote:
> Pinning and pluggability are mutually exclusive, so they cannot happen at the same time.
> What we argue is that ZONE_EXMEM does not "confine movability": an allocation context can determine the movability attribute.
> Even one unmovable allocation will make the entire CXL DRAM unpluggable.
> If you look at ZONE_EXMEM just on the movable/unmovable aspect, we think it is the same as ZONE_NORMAL,
> but ZONE_EXMEM works on extended memory, as of now CXL DRAM.
>
> As for why ZONE_EXMEM: it considers not only the pluggability aspect, but also a CXL identifier for the user/kernelspace API,
> the abstraction of multiple CXL DRAM channels, and a zone unit algorithm for CXL HW characteristics.
> The last one is only a potential one at the moment, though.
>
> As mentioned in the ZONE_EXMEM thread, we are preparing slides to explain our experiences and proposals.
> It is not the final version yet[1].
> [1] https://github.com/OpenMPDK/SMDK/wiki/93.-%5BLSF-MM-BPF-TOPIC%5D-SMDK-inspired-MM-changes-for-CXL
The problem is that you're starting out with a solution. Tell us what
your requirements are, at a really high level, then walk us through
why ZONE_EXMEM is the best way to satisfy those requirements.
Also, those slides are terrible. Even at 200% zoom, the text is tiny.
There is no MAP_NORMAL argument to mmap(), there are no GFP flags to
sys_mmap() and calling mmap() does not typically cause alloc_page() to
be called. I'm not sure that putting your thoughts onto slides is
making them any better organised.
* Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-12 15:26 ` Gregory Price
@ 2023-04-12 15:50 ` David Hildenbrand
2023-04-12 16:34 ` Gregory Price
0 siblings, 1 reply; 40+ messages in thread
From: David Hildenbrand @ 2023-04-12 15:50 UTC (permalink / raw)
To: Gregory Price
Cc: Huang, Ying, Dragan Stancevic, lsf-pc, nil-migration, linux-cxl,
linux-mm
On 12.04.23 17:26, Gregory Price wrote:
> On Wed, Apr 12, 2023 at 10:38:04AM +0200, David Hildenbrand wrote:
>> On 12.04.23 04:54, Huang, Ying wrote:
>>> Gregory Price <gregory.price@memverge.com> writes:
>>>
>>>> On Tue, Apr 11, 2023 at 02:37:50PM +0800, Huang, Ying wrote:
>>>>> Gregory Price <gregory.price@memverge.com> writes:
>>>>>
>>>>> [snip]
>>>>>
>>>>>> 2. During the migration process, the memory needs to be forced not to be
>>>>>> migrated to another node by other means (tiering software, swap,
>>>>>> etc). The obvious way of doing this would be to migrate and
>>>>>> temporarily pin the page... but going back to problem #1 we see that
>>>>>> ZONE_MOVABLE and Pinning are mutually exclusive. So that's
>>>>>> troublesome.
>>>>>
>>>>> Can we use memory policy (cpusets, mbind(), set_mempolicy(), etc.) to
>>>>> avoid move pages out of CXL.mem node? Now, there are gaps in tiering,
>>>>> but I think it is fixable.
>>>>>
>>>>> Best Regards,
>>>>> Huang, Ying
>>>>>
>>>>> [snip]
>>>>
>>>> That feels like a hack/bodge rather than a proper solution to me.
>>>>
>>>> Maybe this is an affirmative argument for the creation of an EXMEM
>>>> zone.
>>>
>>> Let's start with requirements. What is the requirements for a new zone
>>> type?
>>
>> I'm still scratching my head regarding this. I keep hearing all different
>> kinds of statements that just add more confusion: "we want it to be
>> hotunpluggable", "we want to allow for long-term pinning memory", "but we
>> still want it to be movable", "we want to place some unmovable allocations on
>> it". Huh?
>>
>> Just to clarify: ZONE_MOVABLE allows for pinning. It just doesn't allow for
>> long-term pinning of memory.
>>
>
> I apologize for the confusion, this is my fault. I had assumed that
> since dax regions can't be pinned, subsequent nodes backed by a dax
> device could not be pinned. In testing this, this is not the case.
>
> Re: long-term pinning, can you be more explicit as to what is considered
> long-term? Minutes? Hours? Days? etc.
long-term: possibly forever, controlled by user space. In practice,
anything longer than ~10 seconds ( best guess :) ). There can be
long-term pinnings that are of very short duration, we just don't know
what user space is up to and when it will decide to unpin.
Assume user space requests to trigger read/write of a user space page to
a file: the page is pinned, DMA is started, once DMA completes the page
is unpinned. Short-term. User space does not control how long the page
remains pinned.
In contrast:
Example #1: mapping VM guest memory into an IOMMU using vfio for PCI
passthrough requires pinning the pages. Until user space decides to
unmap the pages from the IOMMU, the pages will remain pinned. -> long-term
Example #2: mapping a user space address range into an IOMMU to
repeatedly perform RDMA using that address range requires pinning the
pages. Until user space decides to unregister that range, the pages
remain pinned. -> long-term
Example #3: registering a user space address range with io_uring as a
fixed buffer, such that io_uring OPS can avoid the page table walks by
simply using the pinned pages that were looked up once. As long as the
fixed buffer remains registered, the pages stay pinned. -> long-term
--
Thanks,
David / dhildenb
* Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-12 15:50 ` David Hildenbrand
@ 2023-04-12 16:34 ` Gregory Price
2023-04-14 4:16 ` Dragan Stancevic
0 siblings, 1 reply; 40+ messages in thread
From: Gregory Price @ 2023-04-12 16:34 UTC (permalink / raw)
To: David Hildenbrand
Cc: Huang, Ying, Dragan Stancevic, lsf-pc, nil-migration, linux-cxl,
linux-mm
On Wed, Apr 12, 2023 at 05:50:55PM +0200, David Hildenbrand wrote:
>
> long-term: possibly forever, controlled by user space. In practice, anything
> longer than ~10 seconds ( best guess :) ). There can be long-term pinnings
> that are of very short duration, we just don't know what user space is up to
> and when it will decide to unpin.
>
> Assume user space requests to trigger read/write of a user space page to a
> file: the page is pinned, DMA is started, once DMA completes the page is
> unpinned. Short-term. User space does not control how long the page remains
> pinned.
>
> In contrast:
>
> Example #1: mapping VM guest memory into an IOMMU using vfio for PCI
> passthrough requires pinning the pages. Until user space decides to unmap
> the pages from the IOMMU, the pages will remain pinned. -> long-term
>
> Example #2: mapping a user space address range into an IOMMU to repeatedly
> perform RDMA using that address range requires pinning the pages. Until user
> space decides to unregister that range, the pages remain pinned. ->
> long-term
>
> Example #3: registering a user space address range with io_uring as a fixed
> buffer, such that io_uring OPS can avoid the page table walks by simply
> using the pinned pages that were looked up once. As long as the fixed buffer
> remains registered, the pages stay pinned. -> long-term
>
> --
> Thanks,
>
> David / dhildenb
>
That pretty much precludes live migration from using CXL as a transport
mechanism: since live migration would be a user-initiated process, you
would need what amounts to an atomic move between hosts to ensure pages
are not left pinned.
The more I'm reading, the more I'm somewhat convinced CXL memory should
not allow pinning at all.
I suppose you could implement a new RDMA-like feature where the remote host's
CXL memory is temporarily mapped, data is migrated, and then that area
is unmapped. Basically the exact same RDMA mechanism, but using memory
instead of network. This would make the operation kernel-controlled
if pin/unpin is required.
Lots to talk about.
~Gregory
* Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-11 4:33 ` Shreyas Shah
@ 2023-04-14 3:26 ` Dragan Stancevic
0 siblings, 0 replies; 40+ messages in thread
From: Dragan Stancevic @ 2023-04-14 3:26 UTC (permalink / raw)
To: Shreyas Shah, lsf-pc; +Cc: nil-migration, linux-cxl, linux-mm
On 4/10/23 23:33, Shreyas Shah wrote:
> Hi Dragan,
>
> Do you all meet on Zoom or Teams weekly call?
>
> Maybe next time I can present if you have a slot.
Hi Shreyas-
no, sorry, there is no Zoom call cadence at this time.
> Regards,
> Shreyas
> -----Original Message-----
> From: Dragan Stancevic <dragan@stancevic.com>
> Sent: Monday, April 10, 2023 6:33 PM
> To: Shreyas Shah <shreyas.shah@elastics.cloud>; lsf-pc@lists.linux-foundation.org
> Cc: nil-migration@lists.linux.dev; linux-cxl@vger.kernel.org; linux-mm@kvack.org
> Subject: Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
>
> Hi Shreyas-
>
> speaking strictly for myself, sounds interesting
>
>
> On 4/10/23 20:17, Shreyas Shah wrote:
>> Thank you, Dragan.
>>
>> Btw, we can demonstrate the VM live migration with our FPGA-based CXL memory today. We are not application experts, but we looked at the last diagram in your link and we are confident we can achieve it.
>>
>> Will there be any interest from the group? I can present for 15 mins and Q&A.
>>
>> Regards,
>> Shreyas
>>
>>
>>
>> -----Original Message-----
>> From: Dragan Stancevic <dragan@stancevic.com>
>> Sent: Monday, April 10, 2023 6:09 PM
>> To: Shreyas Shah <shreyas.shah@elastics.cloud>;
>> lsf-pc@lists.linux-foundation.org
>> Cc: nil-migration@lists.linux.dev; linux-cxl@vger.kernel.org;
>> linux-mm@kvack.org
>> Subject: Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
>>
>> Hi Shreyas-
>>
>> On 4/9/23 12:40, Shreyas Shah wrote:
>>> Hi Dragan,
>>>
>>> The concept is great to time share the CXL attached memory across two NUMA nodes for live migration and create a cluster of VMs to increase the compute capacity.
>>>
>>> When and where is the BoF?
>>
>> It's a proposal sent under the CFP; it has not been accepted yet. The agenda is selected by the program committee based on interest. The current proposal is for the LSF/MM/BPF summit[1] running May 8 - May 10th, but it's not happening if not approved by the committee.
>>
>>
>> [1]. https://events.linuxfoundation.org/lsfmm/
>>
>>
>>>
>>>
>>> Regards,
>>> Shreyas
>>>
>>>
>>> -----Original Message-----
>>> From: Dragan Stancevic <dragan@stancevic.com>
>>> Sent: Friday, April 7, 2023 2:06 PM
>>> To: lsf-pc@lists.linux-foundation.org
>>> Cc: nil-migration@lists.linux.dev; linux-cxl@vger.kernel.org;
>>> linux-mm@kvack.org
>>> Subject: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
>>>
>>> Hi folks-
>>>
>>> if it's not too late for the schedule...
>>>
>>> I am starting to tackle VM live migration and hypervisor clustering over switched CXL memory[1][2], intended for cloud virtualization types of loads.
>>>
>>> I'd be interested in doing a small BoF session with some slides and get into a discussion/brainstorming with other people that deal with VM/LM cloud loads. Among other things to discuss would be page migrations over switched CXL memory, shared in-memory ABI to allow VM hand-off between hypervisors, etc...
>>>
>>> A few of us discussed some of this under the ZONE_XMEM thread, but I figured it might be better to start a separate thread.
>>>
>>> If there is interest, thank you.
>>>
>>>
>>> [1]. High-level overview available at http://nil-migration.org/ [2].
>>> Based on CXL spec 3.0
>>>
>>> --
>>> Peace can only come as a natural consequence of universal
>>> enlightenment -Dr. Nikola Tesla
>>
>> --
>> --
>> Peace can only come as a natural consequence of universal
>> enlightenment -Dr. Nikola Tesla
>>
>
> --
> --
> Peace can only come as a natural consequence of universal enlightenment -Dr. Nikola Tesla
>
--
--
Peace can only come as a natural consequence
of universal enlightenment -Dr. Nikola Tesla
* Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-10 3:05 ` [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory Kyungsan Kim
2023-04-10 17:46 ` [External] " Viacheslav A.Dubeyko
@ 2023-04-14 3:27 ` Dragan Stancevic
1 sibling, 0 replies; 40+ messages in thread
From: Dragan Stancevic @ 2023-04-14 3:27 UTC (permalink / raw)
To: Kyungsan Kim
Cc: lsf-pc, linux-mm, linux-fsdevel, linux-cxl, a.manzanares,
viacheslav.dubeyko, dan.j.williams, seungjun.ha, wj28.lee
Hi Kyungsan-
On 4/9/23 22:05, Kyungsan Kim wrote:
>> Hi folks-
>>
>> if it's not too late for the schedule...
>>
>> I am starting to tackle VM live migration and hypervisor clustering over
>> switched CXL memory[1][2], intended for cloud virtualization types of loads.
>>
>> I'd be interested in doing a small BoF session with some slides and get
>> into a discussion/brainstorming with other people that deal with VM/LM
>> cloud loads. Among other things to discuss would be page migrations over
>> switched CXL memory, shared in-memory ABI to allow VM hand-off between
>> hypervisors, etc...
>>
>> A few of us discussed some of this under the ZONE_XMEM thread, but I
>> figured it might be better to start a separate thread.
>>
>> If there is interest, thank you.
>
> I would like to join the discussion as well.
> Let me kindly suggest that it would be great if it included the data flow of VM/hypervisor as background and the expected kernel interaction.
Thank you for the suggestion. Have you had a chance to check out
http://nil-migration.org/? It has a high-level data flow between
hypervisors, both for VM migration and hypervisor clustering. If that is
not enough, I can definitely put more things together. Let me know,
thank you.
>>
>>
>> [1]. High-level overview available at http://nil-migration.org/
>> [2]. Based on CXL spec 3.0
>>
>> --
>> Peace can only come as a natural consequence
>> of universal enlightenment -Dr. Nikola Tesla
>
--
--
Peace can only come as a natural consequence
of universal enlightenment -Dr. Nikola Tesla
* Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-11 1:48 ` Gregory Price
@ 2023-04-14 3:32 ` Dragan Stancevic
2023-04-14 13:16 ` [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory Jonathan Cameron
0 siblings, 1 reply; 40+ messages in thread
From: Dragan Stancevic @ 2023-04-14 3:32 UTC (permalink / raw)
To: Gregory Price; +Cc: lsf-pc, nil-migration, linux-cxl, linux-mm
Hi Gregory-
On 4/10/23 20:48, Gregory Price wrote:
> On Mon, Apr 10, 2023 at 07:56:01PM -0500, Dragan Stancevic wrote:
>> Hi Gregory-
>>
>> On 4/7/23 19:05, Gregory Price wrote:
>>> 3. This is changing the semantics of migration from a virtual memory
>>> movement to a physical memory movement. Typically you would expect
>>> the RDMA process for live migration to work something like...
>>>
>>> a) migration request arrives
>>> b) source host informs destination host of size requirements
>>> c) destination host allocates memory and passes a Virtual Address
>>> back to source host
>>> d) source host initates an RDMA from HostA-VA to HostB-VA
>>> e) CPU task is migrated
>>>
>>> Importantly, the allocation of memory by Host B handles the important
>>> step of creating HVA->HPA mappings, and the Extended/Nested Page
>>> Tables can simply be flushed and re-created after the VM is fully
>>> migrated.
>>>
>>> too long; didn't read: live migration is a virtual address operation,
>>> and node-migration is a PHYSICAL address operation, the virtual
>>> addresses remain the same.
>>>
>>> This is problematic, as it's changing the underlying semantics of the
>>> migration operation.
>>
>> Those are all valid points, but what if you don't need to recreate HVA->HPA
>> mappings? If I am understanding the CXL 3.0 spec correctly, then both
>> virtual addresses and physical addresses wouldn't have to change. Because
>> the fabric "virtualizes" host physical addresses and the translation is done
>> by the G-FAM/GFD that has the capability to translate multi-host HPAs to
>> it's internal DPAs. So if you have two hypervisors seeing device physical
>> address as the same physical address, that might work?
>>
>>
>
> Hm. I hadn't considered the device side translation (decoders), though
> that's obviously a tool in the toolbox. You still have to know how to
> slide ranges of data (which you mention below).
Hmm, do you have any quick thoughts on that?
>>> The reference in this case is... the page tables. You need to know how
>>> to interpret the data in the CXL memory region on the remote host, and
>>> that's a "relative page table translation" (to coin a phrase? I'm not
>>> sure how to best describe it).
>>
>> right, coining phrases... I have been thinking of a "super-page" (for
>> lack of a better word): a metadata region sitting on the switched CXL.mem
>> device that would allow hypervisors to synchronize on various aspects, such
>> as "relative page table translation", host is up, host is down, list of
>> peers, who owns what, etc... In a perfect scenario, I would love to see the
>> hypervisors cooperating on a switched CXL.mem device the same way cpus on
>> different numa nodes cooperate on memory in a single hypervisor. If either
>> host can allocate and schedule from this space then the "NIL" aspect of
>> migration is "free".
>>
>>
>
> The core of the problem is still that each of the hosts has to agree on
> the location (physically) of this region of memory, which could be
> problematic unless you have very strong BIOS and/or kernel driver
> controls to ensure certain devices are guaranteed to be mapped into
> certain spots in the CFMW.
Right, true. The way I am thinking of it is that this would be part of the
data-center ops setup, which at first pass would be somewhat of a
manual setup, the same way as other pre-OS related setup. But later on down
the road perhaps this could be automated, either through some pre-agreed
auto-range detection or similar; it's not unusual for dc ops to name
hypervisors depending on where in the dc/rack/etc. they sit.
> After that it's a matter of treating this memory as incoherent shared
> memory and handling ownership in a safe way. If the memory is only used
> for migrations, then you don't have to worry about performance.
>
> So I agree, as long as shared memory mapped into the same CFMW area is
> used, this mechanism is totally sound.
>
> My main concerns are that I don't know of a mechanism to ensure that. I
> suppose for those interested, and with special BIOS/EFI, you could do
> that - but I think that's going to be a tall ask in a heterogenous cloud
> environment.
Yeah, I get that. But in my experience even heterogeneous setups have
some level of homogeneity, whether it's per rack or per pod. As old
things are sunset and new things are brought in, it gives you these
segments of homogeneity with more or less advanced features. So at the
end of the day, if someone wants a feature X they will need to
understand the feature requirements or limitations. I feel like I deal
with hardware/feature fragmentation all the time, but it doesn't preclude
bringing newer things in. You just have to plan it appropriately.
>>> That's... complicated to say the least.
>>>
>>> <... snip ...>
>>>
>>> An Option: Make pages physically contiguous on migration to CXL
>>>
>>> In this case, you don't necessarily care about the Host Virtual
>>> Addresses, what you actually care about are the structure of the pages
>>> in memory (are they physically contiguous? or do you need to
>>> reconstruct the contiguity by inspecting the page tables?).
>>>
>>> If a migration API were capable of reserving large swaths of contiguous
>>> CXL memory, you could discard individual page information and instead
>>> send page range information, reconstructing the virtual-physical
>>> mappings this way.
>>
>> yeah, good points, but this is all tricky though... it seems this would
>> require quiescing the VM and that is something I would like to avoid if
>> possible. I'd like to see the VM still executing while all of its pages are
>> migrated onto the CXL NUMA node on the source hypervisor. And I would like to see the
>> VM executing on the destination hypervisor while migrate_pages is moving
>> pages off of CXL. Of course, what you are describing above would still be a
>> very fast VM migration, but would require quiescing.
>>
>>
>
> Possibly. If you're going to quiesce you're probably better off just
> snapshotting to shared memory and migrating the snapshot.
That is exactly my thought too.
> Maybe that's the better option for a first-pass migration mechanism. I
> don't know.
I definitely see your point, "canning" and "re-hydration" approach as a
first-pass. I'd be happy with even just a "Hello World" page migration
as a first pass :)
>
> Anyway, would love to attend this session.
>
> ~Gregory
>
--
--
Peace can only come as a natural consequence
of universal enlightenment -Dr. Nikola Tesla
* Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-11 6:37 ` [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory Huang, Ying
2023-04-11 15:36 ` Gregory Price
@ 2023-04-14 3:33 ` Dragan Stancevic
2023-04-14 5:35 ` Huang, Ying
1 sibling, 1 reply; 40+ messages in thread
From: Dragan Stancevic @ 2023-04-14 3:33 UTC (permalink / raw)
To: Huang, Ying, Gregory Price; +Cc: lsf-pc, nil-migration, linux-cxl, linux-mm
On 4/11/23 01:37, Huang, Ying wrote:
> Gregory Price <gregory.price@memverge.com> writes:
>
> [snip]
>
>> 2. During the migration process, the memory needs to be forced not to be
>> migrated to another node by other means (tiering software, swap,
>> etc). The obvious way of doing this would be to migrate and
>> temporarily pin the page... but going back to problem #1 we see that
>> ZONE_MOVABLE and Pinning are mutually exclusive. So that's
>> troublesome.
>
> Can we use memory policy (cpusets, mbind(), set_mempolicy(), etc.) to
> avoid move pages out of CXL.mem node? Now, there are gaps in tiering,
> but I think it is fixable.
Hmmm, I don't know about cpusets. For mbind, are you thinking something
along the lines of MPOL_MF_MOVE_ALL? I guess it does have that
deterministic placement, but it would have to be called from the
process itself, unlike migrate_pages, which takes a pid.
Same for set_mempolicy, right?
I mean I guess, if some of this needs to be added into qemu it's not the
end of the world...
> Best Regards,
> Huang, Ying
>
> [snip]
>
--
--
Peace can only come as a natural consequence
of universal enlightenment -Dr. Nikola Tesla
* Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-12 16:34 ` Gregory Price
@ 2023-04-14 4:16 ` Dragan Stancevic
0 siblings, 0 replies; 40+ messages in thread
From: Dragan Stancevic @ 2023-04-14 4:16 UTC (permalink / raw)
To: Gregory Price, David Hildenbrand
Cc: Huang, Ying, lsf-pc, nil-migration, linux-cxl, linux-mm
Hi Gregory-
On 4/12/23 11:34, Gregory Price wrote:
> On Wed, Apr 12, 2023 at 05:50:55PM +0200, David Hildenbrand wrote:
>>
>> long-term: possibly forever, controlled by user space. In practice, anything
>> longer than ~10 seconds ( best guess :) ). There can be long-term pinnings
>> that are of very short duration, we just don't know what user space is up to
>> and when it will decide to unpin.
>>
>> Assume user space requests to trigger read/write of a user space page to a
>> file: the page is pinned, DMA is started, once DMA completes the page is
>> unpinned. Short-term. User space does not control how long the page remains
>> pinned.
>>
>> In contrast:
>>
>> Example #1: mapping VM guest memory into an IOMMU using vfio for PCI
>> passthrough requires pinning the pages. Until user space decides to unmap
>> the pages from the IOMMU, the pages will remain pinned. -> long-term
>>
>> Example #2: mapping a user space address range into an IOMMU to repeatedly
>> perform RDMA using that address range requires pinning the pages. Until user
>> space decides to unregister that range, the pages remain pinned. ->
>> long-term
>>
>> Example #3: registering a user space address range with io_uring as a fixed
>> buffer, such that io_uring OPS can avoid the page table walks by simply
>> using the pinned pages that were looked up once. As long as the fixed buffer
>> remains registered, the pages stay pinned. -> long-term
>>
>> --
>> Thanks,
>>
>> David / dhildenb
>>
>
> That pretty much precludes live migration from using CXL as a transport
> mechanism, since live migration would be a user-initiated process, you
> would need what amounts to an atomic move between hosts to ensure pages
> are not left pinned.
Do you really need an atomic move in between hosts? I mean, it's not really a
failure if you are in the process of migrating pages onto the switched
CXL memory and one of the pages is pulled out of CXL and back onto
the hypervisor. The running VM cpu can do loads and stores from either.
So it's running, it's not affected. It's just that your migration is
potentially "stalled" or "canceled". You only encounter issues when all
your pages are on CXL and the other hypervisor is pulling pages out.
> The more i'm reading the more i'm somewhat convinced CXL memory should
> not allow pinning at all.
I think you want to be able to somehow pin the pages on one hypervisor
and unpin them on the other hypervisor. Or in some other way "pass
ownership" between the hypervisors. Right? Because of the scenario I
mention above, if your source hypervisor takes a page out of CXL, then
your destination hypervisor has a hole in the VM's address space and can't
run it.
> I suppose you could implement a new RDMA feature where the remote host's
> CXL memory is temporarily mapped, data is migrated, and then that area
> is unmapped. Basically the exact same RDMA mechanism, but using memory
> instead of network. This would make the operation a kernel-controlled
> if pin/unpin is required.
That would move us, I think, from the shared-memory sections of the CXL 3
spec into the sections on direct memory placement. In order of
preference that is #2 for me personally, a "backup" plan if #1, shared
memory, doesn't pan out.
> Lots to talk about.
>
> ~Gregory
>
--
--
Peace can only come as a natural consequence
of universal enlightenment -Dr. Nikola Tesla
* Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-14 3:33 ` Dragan Stancevic
@ 2023-04-14 5:35 ` Huang, Ying
0 siblings, 0 replies; 40+ messages in thread
From: Huang, Ying @ 2023-04-14 5:35 UTC (permalink / raw)
To: Dragan Stancevic
Cc: Gregory Price, lsf-pc, nil-migration, linux-cxl, linux-mm
Dragan Stancevic <dragan@stancevic.com> writes:
> On 4/11/23 01:37, Huang, Ying wrote:
>> Gregory Price <gregory.price@memverge.com> writes:
>> [snip]
>>
>>> 2. During the migration process, the memory needs to be forced not to be
>>> migrated to another node by other means (tiering software, swap,
>>> etc). The obvious way of doing this would be to migrate and
>>> temporarily pin the page... but going back to problem #1 we see that
>>> ZONE_MOVABLE and Pinning are mutually exclusive. So that's
>>> troublesome.
>> Can we use memory policy (cpusets, mbind(), set_mempolicy(), etc.) to
>> avoid moving pages out of the CXL.mem node? Now, there are gaps in tiering,
>> but I think it is fixable.
>
>
> Hmmm, I don't know about cpusets. For mbind, are you thinking
> something along the lines of MPOL_MF_MOVE_ALL? I guess it does have
> that deterministic placement, but this would have to be called from
the process itself. Unlike migrate_pages, which takes a pid.
You can still use migrate_pages(2). But after that, if you want to
prevent the pages from being migrated out of CXL.mem, you can use some
kind of memory policy, such as cpusets, mbind(), or set_mempolicy().
Best Regards,
Huang, Ying
> Same for set_mempolicy, right?
>
> I mean I guess, if some of this needs to be added into qemu it's not
> the end of the world...
>
>
>> Best Regards,
>> Huang, Ying
>> [snip]
>>
* RE: FW: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
[not found] ` <CGME20230414084110epcas2p20b90a8d1892110d7ca3ac16290cd4686@epcas2p2.samsung.com>
@ 2023-04-14 8:41 ` Kyungsan Kim
0 siblings, 0 replies; 40+ messages in thread
From: Kyungsan Kim @ 2023-04-14 8:41 UTC (permalink / raw)
To: david
Cc: lsf-pc, linux-mm, linux-fsdevel, linux-cxl, a.manzanares,
viacheslav.dubeyko, dan.j.williams, seungjun.ha, wj28.lee,
hj96.nam
>On 12.04.23 13:10, Kyungsan Kim wrote:
>>>> Gregory Price <gregory.price@memverge.com> writes:
>>>>
>>>>> On Tue, Apr 11, 2023 at 02:37:50PM +0800, Huang, Ying wrote:
>>>>>> Gregory Price <gregory.price@memverge.com> writes:
>>>>>>
>>>>>> [snip]
>>>>>>
>>>>>>> 2. During the migration process, the memory needs to be forced not to be
>>>>>>> migrated to another node by other means (tiering software, swap,
>>>>>>> etc). The obvious way of doing this would be to migrate and
>>>>>>> temporarily pin the page... but going back to problem #1 we see that
>>>>>>> ZONE_MOVABLE and Pinning are mutually exclusive. So that's
>>>>>>> troublesome.
>>>>>>
>>>>>> Can we use memory policy (cpusets, mbind(), set_mempolicy(), etc.) to
>>>>>> avoid move pages out of CXL.mem node? Now, there are gaps in tiering,
>>>>>> but I think it is fixable.
>>>>>>
>>>>>> Best Regards,
>>>>>> Huang, Ying
>>>>>>
>>>>>> [snip]
>>>>>
>>>>> That feels like a hack/bodge rather than a proper solution to me.
>>>>>
>>>>> Maybe this is an affirmative argument for the creation of an EXMEM
>>>>> zone.
>>>>
>>>> Let's start with requirements. What is the requirements for a new zone
>>>> type?
>>>
>>> I'm still scratching my head regarding this. I keep hearing all
>>> different kinds of statements that just add more confusion: "we want it
>>> to be hotunpluggable", "we want to allow for long-term pinning memory",
>>> "but we still want it to be movable", "we want to place some unmovable
>>> allocations on it". Huh?
>>>
>>> Just to clarify: ZONE_MOVABLE allows for pinning. It just doesn't allow
>>> for long-term pinning of memory.
>>>
>>> For good reason, because long-term pinning of memory is just the worst
>>> (memory waste, fragmentation, overcommit) and instead of finding new
>>> ways to *avoid* long-term pinnings, we're coming up with advanced
>>> concepts to work-around the fundamental property of long-term pinnings.
>>>
>>> We want all memory to be long-term pinnable and we want all memory to be
>>> movable/hotunpluggable. That's not going to work.
>>
>> It looks like there is a misunderstanding about the ZONE_EXMEM argument.
>> Pinning and pluggability are mutually exclusive, so they cannot happen at the same time.
>> What we argue is that ZONE_EXMEM does not "confine movability": an allocation context can determine the movability attribute.
>> Even one unmovable allocation will make the entire CXL DRAM unpluggable.
>> When you see ZONE_EXMEM just on the movable/unmovable aspect, we think it is the same as ZONE_NORMAL,
>> but ZONE_EXMEM works on an extended memory, as of now CXL DRAM.
>>
>> Then as to why ZONE_EXMEM: it considers not only the pluggability aspect, but also a CXL identifier for the user/kernelspace API,
>> the abstraction of multiple CXL DRAM channels, and a zone unit algorithm for CXL HW characteristics.
>> The last one is potential at the moment, though.
>>
>> As mentioned in the ZONE_EXMEM thread, we are preparing slides to explain our experiences and proposals.
>> It is not the final version yet[1].
>> [1] https://github.com/OpenMPDK/SMDK/wiki/93.-%5BLSF-MM-BPF-TOPIC%5D-SMDK-inspired-MM-changes-for-CXL
>
>Yes, hopefully we can discuss at LSF/MM also the problems we are trying
>to solve instead of focusing on one solution. [did not have the time to
>look at the slides yet, sorry]
For sure. The purpose of LSF/MM this year is weighted toward sharing our experiences and issues as a CXL provider over the last couple of years.
We don't think our solution is the only way, but we propose it.
Hopefully, we gradually figure out the best way with the experts here.
>
>--
>Thanks,
>
>David / dhildenb
* RE: RE: FW: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
[not found] ` <CGME20230414084114epcas2p4754d6c0d3c86a0d6d4e855058562100f@epcas2p4.samsung.com>
@ 2023-04-14 8:41 ` Kyungsan Kim
0 siblings, 0 replies; 40+ messages in thread
From: Kyungsan Kim @ 2023-04-14 8:41 UTC (permalink / raw)
To: willy
Cc: lsf-pc, linux-mm, linux-fsdevel, linux-cxl, a.manzanares,
viacheslav.dubeyko, dan.j.williams, seungjun.ha, wj28.lee,
hj96.nam
>On Wed, Apr 12, 2023 at 08:10:33PM +0900, Kyungsan Kim wrote:
>> Pinning and pluggability are mutually exclusive, so they cannot happen at the same time.
>> What we argue is that ZONE_EXMEM does not "confine movability": an allocation context can determine the movability attribute.
>> Even one unmovable allocation will make the entire CXL DRAM unpluggable.
>> When you see ZONE_EXMEM just on the movable/unmovable aspect, we think it is the same as ZONE_NORMAL,
>> but ZONE_EXMEM works on an extended memory, as of now CXL DRAM.
>>
>> Then as to why ZONE_EXMEM: it considers not only the pluggability aspect, but also a CXL identifier for the user/kernelspace API,
>> the abstraction of multiple CXL DRAM channels, and a zone unit algorithm for CXL HW characteristics.
>> The last one is potential at the moment, though.
>>
>> As mentioned in the ZONE_EXMEM thread, we are preparing slides to explain our experiences and proposals.
>> It is not the final version yet[1].
>> [1] https://github.com/OpenMPDK/SMDK/wiki/93.-%5BLSF-MM-BPF-TOPIC%5D-SMDK-inspired-MM-changes-for-CXL
>
>The problem is that you're starting out with a solution. Tell us what
>your requirements are, at a really high level, then walk us through
>why ZONE_EXMEM is the best way to satisfy those requirements.
Thank you for your advice. It makes sense.
We will restate the requirements (use cases and issues) rather than our solution aspect.
Agreement about the requirements should come first at the moment.
Hope we gradually reach a consensus.
>Also, those slides are terrible. Even at 200% zoom, the text is tiny.
>
>There is no MAP_NORMAL argument to mmap(), there are no GFP flags to
>sys_mmap() and calling mmap() does not typically cause alloc_page() to
>be called. I'm not sure that putting your thoughts onto slides is
>making them any better organised.
I'm sorry for the inconvenience. To explain the version of the document: the 1st slide shows the SMDK kernel, not the vanilla kernel.
Specifically, the slide is geared to highlight the flow of the new user/kernel API to implicitly/explicitly access DIMM DRAM or CXL DRAM,
to help understanding in the previous discussion context.
We added MAP_NORMAL/MAP_EXMEM on mmap()/sys_mmap(), and GFP_EXMEM/GFP_NORMAL on alloc_pages().
If you mean COW, please assume the mmap() is called with the MAP_POPULATE flag. We wanted to keep the drawing simple to highlight the purpose.
The document is not the final version; we will apply your comments while preparing it.
* Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-14 3:32 ` Dragan Stancevic
@ 2023-04-14 13:16 ` Jonathan Cameron
0 siblings, 0 replies; 40+ messages in thread
From: Jonathan Cameron @ 2023-04-14 13:16 UTC (permalink / raw)
To: Dragan Stancevic
Cc: Gregory Price, lsf-pc, nil-migration, linux-cxl, linux-mm
On Thu, 13 Apr 2023 22:32:48 -0500
Dragan Stancevic <dragan@stancevic.com> wrote:
> Hi Gregory-
>
>
> On 4/10/23 20:48, Gregory Price wrote:
> > On Mon, Apr 10, 2023 at 07:56:01PM -0500, Dragan Stancevic wrote:
> >> Hi Gregory-
> >>
> >> On 4/7/23 19:05, Gregory Price wrote:
> >>> 3. This is changing the semantics of migration from a virtual memory
> >>> movement to a physical memory movement. Typically you would expect
> >>> the RDMA process for live migration to work something like...
> >>>
> >>> a) migration request arrives
> >>> b) source host informs destination host of size requirements
> >>> c) destination host allocations memory and passes a Virtual Address
> >>> back to source host
> >>> d) source host initates an RDMA from HostA-VA to HostB-VA
> >>> e) CPU task is migrated
> >>>
> >>> Importantly, the allocation of memory by Host B handles the important
> >>> step of creating HVA->HPA mappings, and the Extended/Nested Page
> >>> Tables can simply be flushed and re-created after the VM is fully
> >>> migrated.
> >>>
> >>> to long didn't read: live migration is a virtual address operation,
> >>> and node-migration is a PHYSICAL address operation, the virtual
> >>> addresses remain the same.
> >>>
> >>> This is problematic, as it's changing the underlying semantics of the
> >>> migration operation.
> >>
> >> Those are all valid points, but what if you don't need to recreate HVA->HPA
> >> mappings? If I am understanding the CXL 3.0 spec correctly, then both
> >> virtual addresses and physical addresses wouldn't have to change.
That's implementation defined if we are talking DCD for this. I would suggest making
it very clear which particular CXL options you are thinking of using.
A CXL 2.0 approach of binding LDs to different switch vPPBs (virtual ports) probably doesn't
have this problem, but it has its own limitations and is a much heavier weight thing
to handle.
For DCD, if we assume sharing is used (I'd suggest ignoring other possibilities
for now, as there are architectural gaps that I'm not going into, and the same
issues will occur with them anyway)...
Then what you get if you share on multiple LDs presented to multiple hosts is
a set of extents (each is a base + size, any number any size) that have sequence
numbers.
The device may, typically because of fragmentation of the DPA space exposed to
an LD (typically one of those from a device per host), decide to map what was created
in a particular DPA extent pattern (mapped via nice linear decoders into host PA space)
in a different order and with different-sized extents.
So in general you can't assume a spec-compliant CXL type 3 device (probably a multi-head
device in initial deployments) will map anything to a particular location when moving
the memory between hosts.
So ultimately you'd need to translate between:
Page tables on source + DPA extents info.
and
Page tables needed on the destination to land the parts of the DPA extents (via HDM decoders
applying offsets etc.) in the right place in GPA space so the guest gets the right
mapping.
So that will have some complexity and cost associated with it. Not impossible, but
not a simple reuse of tables from the source on the destination.
This is all PA to GPA translation though, and in many cases I'd not expect that
to be particularly dynamic - so it's a step before you do any actual migration,
hence I'm not sure it matters that it might take a bit of maths.
> Because
> >> the fabric "virtualizes" host physical addresses and the translation is done
> >> by the G-FAM/GFD that has the capability to translate multi-host HPAs to
> >> it's internal DPAs. So if you have two hypervisors seeing device physical
> >> address as the same physical address, that might work?
> >>
> >>
> >
> > Hm. I hadn't considered the device side translation (decoders), though
> > that's obviously a tool in the toolbox. You still have to know how to
> > slide ranges of data (which you mention below).
>
> Hmm, do you have any quick thoughts on that?
HDM decoder programming is hard to do in a dynamic fashion (lots of limitations
on what you can do due to ordering restrictions in the spec). I'd ignore it
for this usecase beyond the fact that you get linear offsets from DPA to HPA
that need to be incorporated in your thinking.
>
>
> >>> The reference in this case is... the page tables. You need to know how
> >>> to interpret the data in the CXL memory region on the remote host, and
> >>> that's a "relative page table translation" (to coin a phrase? I'm not
> >>> sure how to best describe it).
> >>
> >> right, coining phrases... I have been thinking of a "super-page" (for the
> >> lack of a better word) a metadata region sitting on the switched CXL.mem
> >> device that would allow hypervisors to synchronize on various aspects, such
> >> as "relative page table translation", host is up, host is down, list of
> >> peers, who owns what etc... In a perfect scenario, I would love to see the
> >> hypervisors cooperating on switched CXL.mem device the same way cpus on
> >> different numa nodes cooperate on memory in a single hypervisor. If either
> >> host can allocate and schedule from this space then "NIL" aspect of
> >> migration is "free".
> >>
> >>
> >
> > The core of the problem is still that each of the hosts has to agree on
> > the location (physically) of this region of memory, which could be
> > problematic unless you have very strong BIOS and/or kernel driver
> > controls to ensure certain devices are guaranteed to be mapped into
> > certain spots in the CFMW.
>
Right, true. The way I am thinking of it is that this would be a part of
data-center ops setup, which at first pass would be somewhat of a
manual setup, the same way as other pre-OS-related setup. But later on down
the road perhaps this could be automated, either through some pre-agreed
auto-range detection or similar; it's not unusual for dc ops to name
hypervisors depending on where in the dc/rack/etc. they sit.
>
You might be able to constrain particular devices to play nicely with such
a model, but that is out of the scope of the specification, and I'd suggest
in Linux at least we'd write the code to deal with the general case then
maybe have a 'fast path' if the stars align.
Jonathan
* Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-11 18:00 ` [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory Dave Hansen
@ 2023-05-01 23:49 ` Dragan Stancevic
0 siblings, 0 replies; 40+ messages in thread
From: Dragan Stancevic @ 2023-05-01 23:49 UTC (permalink / raw)
To: Dave Hansen, lsf-pc; +Cc: nil-migration, linux-cxl, linux-mm
Hi Dave-
sorry, looks like I've missed your email
On 4/11/23 13:00, Dave Hansen wrote:
> On 4/7/23 14:05, Dragan Stancevic wrote:
>> I'd be interested in doing a small BoF session with some slides and get
>> into a discussion/brainstorming with other people that deal with VM/LM
>> cloud loads. Among other things to discuss would be page migrations over
>> switched CXL memory, shared in-memory ABI to allow VM hand-off between
>> hypervisors, etc...
>
> How would 'struct page' or other kernel metadata be handled?
>
> I assume you'd want a really big CXL memory device with as many hosts
> connected to it as is feasible. But, in order to hand the memory off
> from one host to another, both would need to have metadata for it at
> _some_ point.
To be honest, I have not been thinking of this in terms of a "star"
connection topology, where, say, each host in a rack connects to the same
memory device; I think I'd get bottlenecked on a single device. Evac
of a few hypervisors simultaneously might get a bit dicey.
I've been thinking of it more in terms of multiple memory devices per
rack, connected to various hypervisors to form a hypervisor traversal
graph[1]. For example in this graph, a VM would migrate across a single
hop, or a few hops, to reach its destination hypervisor. And for the
lack of a better word, this would be your "migration namespace" to migrate
the VM across the rack. The critical connections in the graph are
hostfoo04 and hostfoo09, and those you'd use if you want to pop the VM
into a different "migration namespace", for example a different rack or
maybe even a pod.
Of course, this is quite a ways out since there are no CXL 3.0 devices
yet. As a first step I would like to get to a point where I can emulate
this with qemu and just prototype various approaches, but starting with
a single emulated memory device and two hosts.
> So, do all hosts have metadata for the whole CXL memory device all the
> time? Or, would they create the metadata (hotplug) when a VM is
> migrated in and destroy it (hot unplug) when a VM is migrated out?
To be honest I have not thought about hot plugging, but might be
something for me to keep in mind and ponder about it. And if you have
additional thoughts on this I'd love to hear them.
What I was thinking, and this may or may not be possible, or may be
possible only to a certain extent, but my preference would be to keep as
much of the metadata as possible on the memory device itself and have
the hypervisors cooperate through some kind of ownership mechanism.
> That gets back to the granularity question discussed elsewhere in the
> thread. How would the metadata allocation granularity interact with the
> page allocation granularity? How would fragmentation be avoided so that
> hosts don't eat up all their RAM with unused metadata?
Yeah, this is something I am still running through my head. Even if we
have this "ownership-cooperation", is this based on pages, what happens
to the sub-page allocations, do we move them through the buckets or do
we attach ownership to sub-page allocations too. In my ideal world,
you'd have two hypervisors cooperate over this memory as transparently
as CPUs in a single system collaborating across NUMA nodes. A lot to
think about, many problems to solve, and a lot of work to do. I don't
have all the answers yet, but I value all input & help.
[1]. https://nil-migration.org/VM-Graph.png
--
Peace can only come as a natural consequence
of universal enlightenment -Dr. Nikola Tesla
* Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-12 15:15 ` [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory James Bottomley
@ 2023-05-03 23:42 ` Dragan Stancevic
0 siblings, 0 replies; 40+ messages in thread
From: Dragan Stancevic @ 2023-05-03 23:42 UTC (permalink / raw)
To: James Bottomley, David Hildenbrand, Huang, Ying, Gregory Price
Cc: lsf-pc, nil-migration, linux-cxl, linux-mm
Hi James, sorry looks like I missed your email...
On 4/12/23 10:15, James Bottomley wrote:
> On Wed, 2023-04-12 at 10:38 +0200, David Hildenbrand wrote:
>> On 12.04.23 04:54, Huang, Ying wrote:
>>> Gregory Price <gregory.price@memverge.com> writes:
> [...]
>>>> That feels like a hack/bodge rather than a proper solution to me.
>>>>
>>>> Maybe this is an affirmative argument for the creation of an
>>>> EXMEM zone.
>>>
>>> Let's start with requirements. What is the requirements for a new
>>> zone type?
>>
>> I'm still scratching my head regarding this. I keep hearing all
>> different kinds of statements that just add more confusion: "we want
>> it to be hotunpluggable", "we want to allow for long-term pinning
>> memory", "but we still want it to be movable", "we want to place some
>> unmovable allocations on it". Huh?
>
> This is the essential question about CXL memory itself: what would its
> killer app be? The CXL people (or at least the ones I've talked to)
> don't exactly know.
I hope it's not something I've said, I'm not claiming VM migration or
hypervisor clustering is the killer app for CXL. I would never claim
that. And I'm not one of the CXL folks. You can chuck me into the "CXL
enthusiasts" bucket.... For a bit of context, I'm one of the
co-authors/architects of VMware's clustered filesystem[1] and I've
worked on live VM migration as far back as 2003 on the original ESX
server. Back in the day, we introduced the concept of VM live migration
into the x86 data-center parlance with a combination of a process
monitor and a clustered filesystem. The basic mechanism we put forward
at the time was: pre-copy, quiesce, post-copy, un-quiesce. And I think
most hypervisor after which added live migration are using loosely the
same basic principles, iirc xen introduced LM 4 years later in 2007 and
KVM about the same time or perhaps a year later. Anyway, the point that
I am trying to get to is, it bugged me 20 years ago that we quiesced,
and it bugs me today :) I think 20 years ago, quiescing was an
acceptable compromise because we couldn't solve it technologically.
Maybe 20-25 years later, we've reached a point we can solve it
technologically. I don't know, but the problem interests me enough to try.
> Within IBM I've seen lots of ideas but no actual
> concrete applications. Given the rates at which memory density in
> systems is increasing, I'm a bit dubious of the extensible system pool
> argument. Providing extensible memory to VMs sounds a bit more
> plausible, particularly as it solves a big part of the local overcommit
> problem (although you still have a global one). I'm not really sure I
> buy the VM migration use case: iterative transfer works fine with small
> down times so transferring memory seems to be the least of problems
> with the VM migration use case
We do approximately 2.5 Million live migrations per year. Some
migrations take less than a second, some take roughly a second, and
others on very noisy VMs can take several seconds. Whatever that average
is, let's say 1 second per live migration, that's cumulatively roughly
28 days of steal lost to migration per year. As you probably know, live
migrations are essential for de-fragmenting hypervisors/de-stranding
resources and from my perspective, I'd like to see them happen more
often with a smaller customer impact.
> (it's mostly about problems with attached devices).
That is purely virtualization-load-type dependent. Maybe for the cloud
you're running, devices are a problem (I'm guessing here). For us this is
a non-existent problem. We serve approximately 600,000 customers and
don't do any form of pass-through, so it's literally a non-issue. What I am
starting to tackle with nil-migration is to be able to migrate live and
executing memory, instead of frozen memory. Which should especially help
with noisy VMs, and in my experience customers of noisy VMs are more
likely to notice steal and complain about steal. I understand everyone
has their own workloads, and the devices problem will be solved in it's
own right, but it's out of scope for what I am tackling with
nil-migration. My main focus at this time is memory and context migration.
> CXL 3.0 is adding sharing primitives for memory so
> now we have to ask if there are any multi-node shared memory use cases
> for this, but most of us have already been burned by multi-node shared
> clusters once in our career and are a bit leery of a second go around.
Chatting with you at the last LPC, and judging by the combined gray hair
between us, I'll venture to guess we've both fallen off the proverbial
bike many times. It's never stopped me from getting back on. The issue
interests me enough to try.
If you don't mind me asking, what clustering did you work on? Maybe I am
familiar with it.
>
> Is there a use case I left out (or needs expanding)?
>
> James
>
[1]. https://en.wikipedia.org/wiki/VMware_VMFS
--
Peace can only come as a natural consequence
of universal enlightenment -Dr. Nikola Tesla
* Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
2023-04-07 21:05 [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory Dragan Stancevic
` (5 preceding siblings ...)
2023-04-11 18:16 ` RAGHU H
@ 2023-05-09 15:08 ` Dragan Stancevic
6 siblings, 0 replies; 40+ messages in thread
From: Dragan Stancevic @ 2023-05-09 15:08 UTC (permalink / raw)
To: lsf-pc; +Cc: nil-migration, linux-cxl, linux-mm
Hey folks-
to those that attended the BoF, just wanted to say thank you for your
time and attention, I appreciate it.
It was a bit challenging being remote, getting choppy sound, and I
already have hearing loss, so I wasn't able to hear 100% of what people
were saying.
So if you asked me "how you like them apples" and I said "Yeah, bananas
are cool", I apologize. Please feel free to email me with any additional
thoughts or questions.
Thanks
On 4/7/23 16:05, Dragan Stancevic wrote:
> Hi folks-
>
> if it's not too late for the schedule...
>
> I am starting to tackle VM live migration and hypervisor clustering over
> switched CXL memory[1][2], intended for cloud virtualization types of
> loads.
>
> I'd be interested in doing a small BoF session with some slides and get
> into a discussion/brainstorming with other people that deal with VM/LM
> cloud loads. Among other things to discuss would be page migrations over
> switched CXL memory, shared in-memory ABI to allow VM hand-off between
> hypervisors, etc...
>
> A few of us discussed some of this under the ZONE_XMEM thread, but I
> figured it might be better to start a separate thread.
>
> If there is interest, thank you.
>
>
> [1]. High-level overview available at http://nil-migration.org/
> [2]. Based on CXL spec 3.0
>
> --
> Peace can only come as a natural consequence
> of universal enlightenment -Dr. Nikola Tesla
>
--
Peace can only come as a natural consequence
of universal enlightenment -Dr. Nikola Tesla
end of thread, other threads:[~2023-05-09 15:08 UTC | newest]
Thread overview: 40+ messages
2023-04-07 21:05 [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory Dragan Stancevic
2023-04-07 22:23 ` James Houghton
2023-04-07 23:17 ` David Rientjes
2023-04-08 1:33 ` Dragan Stancevic
2023-04-08 16:24 ` Dragan Stancevic
2023-04-08 0:05 ` Gregory Price
2023-04-11 0:56 ` Dragan Stancevic
2023-04-11 1:48 ` Gregory Price
2023-04-14 3:32 ` Dragan Stancevic
2023-04-14 13:16 ` [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory Jonathan Cameron
2023-04-11 6:37 ` [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory Huang, Ying
2023-04-11 15:36 ` Gregory Price
2023-04-12 2:54 ` Huang, Ying
2023-04-12 8:38 ` David Hildenbrand
[not found] ` <CGME20230412111034epcas2p1b46d2a26b7d3ac5db3b0e454255527b0@epcas2p1.samsung.com>
2023-04-12 11:10 ` FW: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory Kyungsan Kim
2023-04-12 11:26 ` David Hildenbrand
[not found] ` <CGME20230414084110epcas2p20b90a8d1892110d7ca3ac16290cd4686@epcas2p2.samsung.com>
2023-04-14 8:41 ` Kyungsan Kim
2023-04-12 15:40 ` Matthew Wilcox
[not found] ` <CGME20230414084114epcas2p4754d6c0d3c86a0d6d4e855058562100f@epcas2p4.samsung.com>
2023-04-14 8:41 ` Kyungsan Kim
2023-04-12 15:15 ` [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory James Bottomley
2023-05-03 23:42 ` Dragan Stancevic
2023-04-12 15:26 ` Gregory Price
2023-04-12 15:50 ` David Hildenbrand
2023-04-12 16:34 ` Gregory Price
2023-04-14 4:16 ` Dragan Stancevic
2023-04-14 3:33 ` Dragan Stancevic
2023-04-14 5:35 ` Huang, Ying
2023-04-09 17:40 ` Shreyas Shah
2023-04-11 1:08 ` Dragan Stancevic
2023-04-11 1:17 ` Shreyas Shah
2023-04-11 1:32 ` Dragan Stancevic
2023-04-11 4:33 ` Shreyas Shah
2023-04-14 3:26 ` Dragan Stancevic
[not found] ` <CGME20230410030532epcas2p49eae675396bf81658c1a3401796da1d4@epcas2p4.samsung.com>
2023-04-10 3:05 ` [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory Kyungsan Kim
2023-04-10 17:46 ` [External] " Viacheslav A.Dubeyko
2023-04-14 3:27 ` Dragan Stancevic
2023-04-11 18:00 ` [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory Dave Hansen
2023-05-01 23:49 ` Dragan Stancevic
2023-04-11 18:16 ` RAGHU H
2023-05-09 15:08 ` Dragan Stancevic