From: Dragan Stancevic <dragan@stancevic.com>
To: James Bottomley <James.Bottomley@HansenPartnership.com>,
	David Hildenbrand <david@redhat.com>,
	"Huang, Ying" <ying.huang@intel.com>,
	Gregory Price <gregory.price@memverge.com>
Cc: lsf-pc@lists.linux-foundation.org, nil-migration@lists.linux.dev,
	linux-cxl@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
Date: Wed, 3 May 2023 18:42:03 -0500
Message-ID: <43e81295-3ef3-e10d-4af8-cf53f06c7120@stancevic.com>
In-Reply-To: <ec7399fbd74d43a06edc8b344fccddbdef44a101.camel@HansenPartnership.com>

Hi James, sorry, it looks like I missed your email...


On 4/12/23 10:15, James Bottomley wrote:
> On Wed, 2023-04-12 at 10:38 +0200, David Hildenbrand wrote:
>> On 12.04.23 04:54, Huang, Ying wrote:
>>> Gregory Price <gregory.price@memverge.com> writes:
> [...]
>>>> That feels like a hack/bodge rather than a proper solution to me.
>>>>
>>>> Maybe this is an affirmative argument for the creation of an
>>>> EXMEM zone.
>>>
>>> Let's start with requirements.  What is the requirements for a new
>>> zone type?
>>
>> I'm still scratching my head regarding this. I keep hearing all
>> different kinds of statements that just add more confusion: "we want
>> it to be hotunpluggable", "we want to allow for long-term pinning
>> memory", "but we still want it to be movable", "we want to place some
>> unmovable allocations on it". Huh?
> 
> This is the essential question about CXL memory itself: what would its
> killer app be?  The CXL people (or at least the ones I've talked to)
> don't exactly know.


I hope it's not something I've said; I'm not claiming VM migration or 
hypervisor clustering is the killer app for CXL. I would never claim 
that. And I'm not one of the CXL folks, you can chuck me into the "CXL 
enthusiasts" bucket... For a bit of context, I'm one of the 
co-authors/architects of VMware's clustered filesystem[1] and I've 
worked on live VM migration as far back as 2003 on the original ESX 
server. Back in the day, we introduced the concept of VM live migration 
into x86 data-center parlance with a combination of a process monitor 
and a clustered filesystem. The basic mechanism we put forward at the 
time was: pre-copy, quiesce, post-copy, un-quiesce (a toy sketch of 
that loop follows below). And I think most hypervisors that added live 
migration afterwards use loosely the same basic principles; IIRC Xen 
introduced live migration four years later, in 2007, and KVM around 
the same time or perhaps a year later. Anyway, the point I am trying 
to get to is: it bugged me 20 years ago that we quiesced, and it bugs 
me today :) I think 20 years ago quiescing was an acceptable 
compromise because we couldn't solve it technologically. Maybe now, 
20-25 years later, we've reached a point where we can. I don't know, 
but the problem interests me enough to try.
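
For anyone who hasn't lived inside one of these loops, here is a toy 
sketch of that pre-copy/quiesce mechanism. This is purely illustrative 
C; none of the structures or helpers below come from any real 
hypervisor:

/* Toy sketch of the classic loop: pre-copy, quiesce, post-copy of
 * the residue, un-quiesce. Illustration only, not a real API. */
#include <stdbool.h>
#include <string.h>

#define NPAGES          1024
#define PAGE_SIZE       4096
#define DOWNTIME_BUDGET 16      /* pages we tolerate copying while paused */

struct toy_vm {
	char pages[NPAGES][PAGE_SIZE];
	bool dirty[NPAGES];     /* stand-in for the hardware dirty log */
	bool running;
};

/* Send every dirty page, clear its bit, return how many were sent. */
static int transfer_dirty(struct toy_vm *src, struct toy_vm *dst)
{
	int sent = 0;

	for (int i = 0; i < NPAGES; i++) {
		if (!src->dirty[i])
			continue;
		memcpy(dst->pages[i], src->pages[i], PAGE_SIZE);
		src->dirty[i] = false;
		sent++;
	}
	return sent;
}

static void live_migrate(struct toy_vm *src, struct toy_vm *dst)
{
	/* Round 0: mark everything dirty so the first pass copies all
	 * of guest RAM while the guest keeps executing. */
	memset(src->dirty, true, sizeof(src->dirty));

	/* Iterative pre-copy: re-send whatever the guest re-dirtied
	 * during the previous round, until the residue is small. */
	while (transfer_dirty(src, dst) > DOWNTIME_BUDGET)
		;

	/* Quiesce: pause the guest -- the downtime that bugs me. */
	src->running = false;

	/* Post-copy the residue (plus vCPU state, elided here), then
	 * un-quiesce on the destination. */
	transfer_dirty(src, dst);
	dst->running = true;
}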


>  Within IBM I've seen lots of ideas but no actual
> concrete applications.  Given the rates at which memory density in
> systems is increasing, I'm a bit dubious of the extensible system pool
> argument.   Providing extensible memory to VMs sounds a bit more
> plausible, particularly as it solves a big part of the local overcommit
> problem (although you still have a global one).  I'm not really sure I
> buy the VM migration use case: iterative transfer works fine with small
> down times so transferring memory seems to be the least of problems
> with the VM migration use case

We do approximately 2.5 million live migrations per year. Some 
migrations take less than a second, some take roughly a second, and 
others, on very noisy VMs, can take several seconds. Whatever the 
average is, let's say 1 second per live migration; that's cumulatively 
roughly 28 days of steal lost to migration per year (arithmetic 
below). As you probably know, live migrations are essential for 
de-fragmenting hypervisors/de-stranding resources, and from my 
perspective, I'd like to see them happen more often, with a smaller 
customer impact.
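
Spelling that arithmetic out:

  2,500,000 migrations/year x ~1 s = 2,500,000 s of cumulative steal
  2,500,000 s / 86,400 s/day      ~= 28.9 days per year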


> (it's mostly about problems with attached devices).

That is purely dependent on the virtualization load type. Maybe for 
the cloud you're running, devices are a problem (I'm guessing here). 
For us this is a non-existent problem: we serve approximately 600,000 
customers and don't do any form of pass-through, so it's literally a 
non-issue. What I am starting to tackle with nil-migration is being 
able to migrate live, executing memory instead of frozen memory, which 
should especially help with noisy VMs; in my experience, customers of 
noisy VMs are more likely to notice steal and complain about it. I 
understand everyone has their own workloads, and the devices problem 
will be solved in its own right, but it's out of scope for what I am 
tackling with nil-migration. My main focus at this time is memory and 
context migration (a rough sketch of the memory-placement side 
follows below).
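
Just to sketch what I mean by the memory side: assuming the shared CXL 
region shows up as its own NUMA node on both hypervisors (node 2 below 
is purely hypothetical), guest RAM could be bound there with the 
standard mbind(2) call, so a migration only has to move vCPU/context 
state while both hosts map the same device memory. A minimal sketch of 
the idea, not nil-migration's actual implementation:

#include <numaif.h>     /* mbind(), MPOL_BIND; link with -lnuma */
#include <sys/mman.h>
#include <stdio.h>

#define GUEST_RAM       (1UL << 30)     /* say, 1 GiB of guest RAM */
#define CXL_NODE        2               /* hypothetical CXL-backed node */

int main(void)
{
	unsigned long nodemask = 1UL << CXL_NODE;
	void *ram = mmap(NULL, GUEST_RAM, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (ram == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Back this range exclusively from the CXL node, so the pages
	 * live in memory the destination hypervisor can also reach. */
	if (mbind(ram, GUEST_RAM, MPOL_BIND, &nodemask,
		  sizeof(nodemask) * 8, 0)) {
		perror("mbind");
		return 1;
	}

	/* ... hand `ram` to the guest as its RAM ... */
	return 0;
}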


>  CXL 3.0 is adding sharing primitives for memory so
> now we have to ask if there are any multi-node shared memory use cases
> for this, but most of us have already been burned by multi-node shared
> clusters once in our career and are a bit leery of a second go around.

Chatting with you at the last LPC, and judging by the combined gray 
hair between us, I'll venture to guess we've both fallen off the 
proverbial bike many times. It's never stopped me from getting back 
on. The issues interest me enough to try.

If you don't mind me asking, what clustering did you work on? Maybe 
I'm familiar with it.


> 
> Is there a use case I left out (or needs expanding)?
> 
> James
> 



[1]. https://en.wikipedia.org/wiki/VMware_VMFS

--
Peace can only come as a natural consequence
of universal enlightenment -Dr. Nikola Tesla


