linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Dan Williams <dan.j.williams@intel.com>,
	Matthew Wilcox <willy@infradead.org>
Cc: Kyungsan Kim <ks0204.kim@samsung.com>,
	lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, linux-cxl@vger.kernel.org,
	a.manzanares@samsung.com, viacheslav.dubeyko@bytedance.com,
	seungjun.ha@samsung.com, wj28.lee@samsung.com
Subject: Re: FW: [LSF/MM/BPF TOPIC] SMDK inspired MM changes for CXL
Date: Thu, 6 Apr 2023 14:27:10 +0200	[thread overview]
Message-ID: <6ebf38f1-b7c4-cb38-b72f-2e406d2a2fdc@redhat.com> (raw)
In-Reply-To: <642dcf4169ae5_21a8294f@dwillia2-xfh.jf.intel.com.notmuch>

On 05.04.23 21:42, Dan Williams wrote:
> Matthew Wilcox wrote:
>> On Tue, Apr 04, 2023 at 09:48:41PM -0700, Dan Williams wrote:
>>> Kyungsan Kim wrote:
>>>> We know the situation. When a CXL DRAM channel is located under ZONE_NORMAL,
>>>> a random allocation of a kernel object by calling kmalloc() siblings makes the entire CXL DRAM unremovable.
>>>> Also, not all kernel objects can be allocated from ZONE_MOVABLE.
>>>>
>>>> ZONE_EXMEM does not confine a movability attribute(movable or unmovable), rather it allows a calling context can decide it.
>>>> In that aspect, it is the same with ZONE_NORMAL but ZONE_EXMEM works for extended memory device.
>>>> It does not mean ZONE_EXMEM support both movability and kernel object allocation at the same time.
>>>> In case multiple CXL DRAM channels are connected, we think a memory consumer possibly dedicate a channel for movable or unmovable purpose.
>>>>
>>>
>>> I want to clarify that I expect the number of people doing physical CXL
>>> hotplug of whole devices to be small compared to dynamic capacity
>>> devices (DCD). DCD is a new feature of the CXL 3.0 specification where a
>>> device maps 1 or more thinly provisioned memory regions that have
>>> individual extents get populated and depopulated by a fabric manager.
>>>
>>> In that scenario there is a semantic where the fabric manager hands out
>>> 100G to a host and asks for it back, it is within the protocol that the
>>> host can say "I can give 97GB back now, come back and ask again if you
>>> need that last 3GB".
>>
>> Presumably it can't give back arbitrary chunks of that 100GB?  There's
>> some granularity that's preferred; maybe on 1GB boundaries or something?
> 
> The device picks a granularity that can be tiny per spec, but it makes
> the hardware more expensive to track in small extents, so I expect
> something reasonable like 1GB, but time will tell once actual devices
> start showing up.

It all sounds a lot like virtio-mem using real hardware [I know, there 
are important differences, but for the dynamic aspect there are very 
similar issues to solve]

Fir virtio-mem, the current best way to support hotplugging of large 
memory to a VM to eventually be able to unplug a big fraction again is 
using a combination of ZONE_MOVABLE and ZONE_NORMAL -- "auto-movable" 
memory onlining policy. What's online to ZONE_MOVABLE can get (fairly) 
reliably unplugged again. What's onlined to ZONE_NORMAL is possibly lost 
forever.

Like (incrementally) hotplugging 1 TiB to a 4 GiB VM. Being able to 
unplug 1 TiB reliably again is pretty much out of scope. But the more 
memory we can reliably get back the better. And the more memory we can 
get in the common case, the better. With a ZONE_NORMAL vs. ZONE_MOVABLE 
ration of 1:3 on could unplug ~768 GiB again reliably. The remainder 
depends on fragmentation on the actual system and the unplug granularity.

The original plan was to use ZONE_PREFER_MOVABLE as a safety buffer to 
reduce ZONE_NORMAL memory without increasing ZONE_MOVABLE memory (and 
possibly harming the system). The underlying idea was that in many 
setups that memory in ZONE_PREFER_MOVABLE would not get used for 
unmovable allocations and it could, therefore, get unplugged fairly 
reliably in these setups. For all other setups, unmmovable allocations 
could leak into ZONE_PREFER_MOVABLE and reduce the number of memory we 
could unplug again. But the system would try to keep unmovable 
allocations to ZONE_NORMAL, so in most cases with some 
ZONE_PREFER_MOVABLE memory we would perform better than with only 
ZONE_NORMAL.

-- 
Thanks,

David / dhildenb


  reply	other threads:[~2023-04-06 12:28 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20230221014114epcas2p1687db1d75765a8f9ed0b3495eab1154d@epcas2p1.samsung.com>
2023-02-21  1:41 ` [LSF/MM/BPF TOPIC] SMDK inspired MM changes for CXL Kyungsan Kim
2023-02-27 23:14   ` Dan Williams
     [not found]     ` <CGME20230228043551epcas2p3085444899b00b106c2901e1f51814d2c@epcas2p3.samsung.com>
2023-02-28  4:35       ` Kyungsan Kim
2023-03-03  6:07   ` Huang, Ying
     [not found]     ` <CGME20230322043354epcas2p2227bcad190a470d635b92f92587dc69e@epcas2p2.samsung.com>
2023-03-22  4:33       ` FW: " Kyungsan Kim
2023-03-22 22:03         ` Dan Williams
     [not found]           ` <CGME20230323105106epcas2p39ea8de619622376a4698db425c6a6fb3@epcas2p3.samsung.com>
2023-03-23 10:51             ` RE(2): " Kyungsan Kim
2023-03-23 12:25               ` David Hildenbrand
     [not found]                 ` <CGME20230324090923epcas2p2710ba4dc8157f9141c03104cf66e9d26@epcas2p2.samsung.com>
2023-03-24  9:09                   ` RE(4): " Kyungsan Kim
2023-03-24  9:12                     ` David Hildenbrand
     [not found]                       ` <CGME20230324092731epcas2p315c348bd76ef9fc84bffdb158e4c1aa4@epcas2p3.samsung.com>
2023-03-24  9:27                         ` RE(2): " Kyungsan Kim
2023-03-24  9:30                           ` David Hildenbrand
     [not found]                             ` <CGME20230324095031epcas2p284095ae90b25a47360b5098478dffdaa@epcas2p2.samsung.com>
2023-03-24  9:50                               ` RE(3): " Kyungsan Kim
2023-03-24 13:08                                 ` Jørgen Hansen
2023-03-24 22:33                                   ` David Hildenbrand
     [not found]                                     ` <CGME20230331114220epcas2p2d5734efcbdd8956f861f8e7178cd5288@epcas2p2.samsung.com>
2023-03-31 11:42                                       ` Kyungsan Kim
2023-03-31 13:42                                         ` Matthew Wilcox
2023-03-31 15:56                                           ` Frank van der Linden
2023-04-03  8:34                                             ` David Hildenbrand
     [not found]                                               ` <CGME20230405021655epcas2p2364b1f56dcde629bbd05bc796c2896aa@epcas2p2.samsung.com>
2023-04-05  2:16                                                 ` Kyungsan Kim
     [not found]                                             ` <CGME20230405020631epcas2p1c85058b28a70bbd46d587e78a9c9c7ad@epcas2p1.samsung.com>
2023-04-05  2:06                                               ` Re: " Kyungsan Kim
2023-04-05  5:00                                                 ` Dan Williams
     [not found]                                           ` <CGME20230405020121epcas2p2d9d39c151b6c5ab9e568ab9e2ab826ce@epcas2p2.samsung.com>
2023-04-05  2:01                                             ` Kyungsan Kim
2023-04-05  3:11                                               ` Matthew Wilcox
2023-04-03  8:28                                         ` David Hildenbrand
     [not found]                                           ` <CGME20230405020916epcas2p24cf04f5354c12632eba50b64b217e403@epcas2p2.samsung.com>
2023-04-05  2:09                                             ` Kyungsan Kim
     [not found]                                   ` <CGME20230331113147epcas2p12655777fec6839f7070ffcc446e3581b@epcas2p1.samsung.com>
2023-03-31 11:31                                     ` RE: RE(3): " Kyungsan Kim
2023-03-24  0:41               ` RE(2): " Huang, Ying
     [not found]                 ` <CGME20230324084808epcas2p354865d38dccddcb5cd46b17610345a5f@epcas2p3.samsung.com>
2023-03-24  8:48                   ` RE(4): " Kyungsan Kim
2023-03-24 13:46                     ` Gregory Price
     [not found]                       ` <CGME20230331113417epcas2p20a886e1712dbdb1f8eec03a2ac0a47e2@epcas2p2.samsung.com>
2023-03-31 11:34                         ` Kyungsan Kim
2023-03-31 15:53                           ` Gregory Price
     [not found]                             ` <CGME20230405020257epcas2p11b253f8c97a353890b96e6ae6eb515d3@epcas2p1.samsung.com>
2023-04-05  2:02                               ` Kyungsan Kim
2023-03-24 14:55               ` RE(2): " Matthew Wilcox
2023-03-24 17:49                 ` Matthew Wilcox
     [not found]                   ` <CGME20230331113715epcas2p13127b95af4000ec1ed96a2e9d89b7444@epcas2p1.samsung.com>
2023-03-31 11:37                     ` Kyungsan Kim
2023-03-31 12:54                       ` Matthew Wilcox
     [not found]                         ` <CGME20230405020027epcas2p4682d43446a493385b60c39a1dbbf07d6@epcas2p4.samsung.com>
2023-04-05  2:00                           ` Kyungsan Kim
2023-04-05  4:48                             ` Dan Williams
2023-04-05 18:12                               ` Matthew Wilcox
2023-04-05 19:42                                 ` Dan Williams
2023-04-06 12:27                                   ` David Hildenbrand [this message]
     [not found]                                     ` <CGME20230407093007epcas2p32addf5da24110c3e45c90a15dcde0d01@epcas2p3.samsung.com>
2023-04-07  9:30                                       ` Kyungsan Kim
     [not found]                   ` <CGME20230331113845epcas2p313118617918ae2bf634c3c475fc5dbd8@epcas2p3.samsung.com>
2023-03-31 11:38                     ` Re: RE(2): " Kyungsan Kim
2023-03-26  7:21               ` Mike Rapoport
2023-03-30 22:03                 ` Dragan Stancevic
2023-04-03  8:44                   ` Mike Rapoport
2023-04-04  4:27                     ` Dragan Stancevic
2023-04-04  6:47                       ` Huang, Ying
2023-04-06 22:27                         ` Dragan Stancevic
2023-04-07  0:58                           ` Huang, Ying
     [not found]                             ` <CGME20230407092950epcas2p12bc20c2952a800cf3f4f1d0b695f67e2@epcas2p1.samsung.com>
2023-04-07  9:29                               ` Kyungsan Kim
2023-04-07 14:35                             ` Dragan Stancevic
     [not found]                       ` <CGME20230405101840epcas2p4c92037ceba77dfe963d24791a9058450@epcas2p4.samsung.com>
2023-04-05 10:18                         ` Kyungsan Kim
     [not found]                 ` <CGME20230331114526epcas2p2b6f1d4c8c1c0b2e3c12a425b6e48c0d8@epcas2p2.samsung.com>
2023-03-31 11:45                   ` RE: RE(2): " Kyungsan Kim
2023-04-04  8:31                     ` Mike Rapoport
2023-04-04 17:58                       ` Adam Manzanares
2023-04-01 10:51                         ` Gregory Price
2023-04-04 18:59                           ` [External] " Viacheslav A.Dubeyko
2023-04-01 11:51                             ` Gregory Price
2023-04-04 21:09                               ` Viacheslav A.Dubeyko
     [not found]                               ` <642cb7ec58c71_21a829453@dwillia2-xfh.jf.intel.com.notmuch>
2023-04-05  2:34                                 ` Gregory Price
     [not found]                               ` <CGME20230405101843epcas2p2c819c8d60b2a9a776124c2b4bc25af14@epcas2p2.samsung.com>
2023-04-05 10:18                                 ` Kyungsan Kim
2023-03-30 22:02   ` Dragan Stancevic
     [not found]     ` <CGME20230331114649epcas2p23d52cd1d224085e6192a0aaf22948e3e@epcas2p2.samsung.com>
2023-03-31 11:46       ` Kyungsan Kim
     [not found]   ` <CGME20230414084120epcas2p37f105901350410772a3115a5a490c215@epcas2p3.samsung.com>
2023-04-14  8:41     ` FW: " Kyungsan Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6ebf38f1-b7c4-cb38-b72f-2e406d2a2fdc@redhat.com \
    --to=david@redhat.com \
    --cc=a.manzanares@samsung.com \
    --cc=dan.j.williams@intel.com \
    --cc=ks0204.kim@samsung.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lsf-pc@lists.linux-foundation.org \
    --cc=seungjun.ha@samsung.com \
    --cc=viacheslav.dubeyko@bytedance.com \
    --cc=willy@infradead.org \
    --cc=wj28.lee@samsung.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).