linux-cxl.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Kyungsan Kim <ks0204.kim@samsung.com>
To: Jorgen.Hansen@wdc.com
Cc: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, linux-cxl@vger.kernel.org,
	a.manzanares@samsung.com, viacheslav.dubeyko@bytedance.com,
	dan.j.williams@intel.com, seungjun.ha@samsung.com,
	wj28.lee@samsung.com
Subject: RE: RE: RE(3): FW: [LSF/MM/BPF TOPIC] SMDK inspired MM changes for CXL
Date: Fri, 31 Mar 2023 20:31:47 +0900	[thread overview]
Message-ID: <20230331113147.399972-1-ks0204.kim@samsung.com> (raw)
In-Reply-To: <E224146D-058D-48B3-8788-A6BC3370044F@wdc.com>

Hi Jorgen Hansen.
Thank you for joining this topic and share your thoughts.
I'm sorry for late reply due to some major tasks of our team this week.

>> On 24 Mar 2023, at 10.50, Kyungsan Kim <ks0204.kim@samsung.com> wrote:
>> 
>>> On 24.03.23 10:27, Kyungsan Kim wrote:
>>>>> On 24.03.23 10:09, Kyungsan Kim wrote:
>>>>>> Thank you David Hinderbrand for your interest on this topic.
>>>>>> 
>>>>>>>> 
>>>>>>>>> Kyungsan Kim wrote:
>>>>>>>>> [..]
>>>>>>>>>>> In addition to CXL memory, we may have other kind of memory in the
>>>>>>>>>>> system, for example, HBM (High Bandwidth Memory), memory in FPGA card,
>>>>>>>>>>> memory in GPU card, etc.  I guess that we need to consider them
>>>>>>>>>>> together.  Do we need to add one zone type for each kind of memory?
>>>>>>>>>> 
>>>>>>>>>> We also don't think a new zone is needed for every single memory
>>>>>>>>>> device.  Our viewpoint is the sole ZONE_NORMAL becomes not enough to
>>>>>>>>>> manage multiple volatile memory devices due to the increased device
>>>>>>>>>> types.  Including CXL DRAM, we think the ZONE_EXMEM can be used to
>>>>>>>>>> represent extended volatile memories that have different HW
>>>>>>>>>> characteristics.
>>>>>>>>> 
>>>>>>>>> Some advice for the LSF/MM discussion, the rationale will need to be
>>>>>>>>> more than "we think the ZONE_EXMEM can be used to represent extended
>>>>>>>>> volatile memories that have different HW characteristics". It needs to
>>>>>>>>> be along the lines of "yes, to date Linux has been able to describe DDR
>>>>>>>>> with NUMA effects, PMEM with high write overhead, and HBM with improved
>>>>>>>>> bandwidth not necessarily latency, all without adding a new ZONE, but a
>>>>>>>>> new ZONE is absolutely required now to enable use case FOO, or address
>>>>>>>>> unfixable NUMA problem BAR." Without FOO and BAR to discuss the code
>>>>>>>>> maintainability concern of "fewer degress of freedom in the ZONE
>>>>>>>>> dimension" starts to dominate.
>>>>>>>> 
>>>>>>>> One problem we experienced was occured in the combination of hot-remove and kerelspace allocation usecases.
>>>>>>>> ZONE_NORMAL allows kernel context allocation, but it does not allow hot-remove because kernel resides all the time.
>>>>>>>> ZONE_MOVABLE allows hot-remove due to the page migration, but it only allows userspace allocation.
>>>>>>>> Alternatively, we allocated a kernel context out of ZONE_MOVABLE by adding GFP_MOVABLE flag.
>>>>>> 
>>>>>>> That sounds like a bad hack :) .
>>>>>> I consent you.
>>>>>> 
>>>>>>>> In case, oops and system hang has occasionally occured because ZONE_MOVABLE can be swapped.
>>>>>>>> We resolved the issue using ZONE_EXMEM by allowing seletively choice of the two usecases.
>>>>>> 
>>>>>>> I once raised the idea of a ZONE_PREFER_MOVABLE [1], maybe that's
>>>>>>> similar to what you have in mind here. In general, adding new zones is
>>>>>>> frowned upon.
>>>>>> 
>>>>>> Actually, we have already studied your idea and thought it is similar with us in 2 aspects.
>>>>>> 1. ZONE_PREFER_MOVABLE allows a kernelspace allocation using a new zone
>>>>>> 2. ZONE_PREFER_MOVABLE helps less fragmentation by splitting zones, and ordering allocation requests from the zones.
>>>>>> 
>>>>>> We think ZONE_EXMEM also helps less fragmentation.
>>>>>> Because it is a separated zone and handles a page allocation as movable by default.
>>>>> 
>>>>> So how is it different that it would justify a different (more confusing
>>>>> IMHO) name? :) Of course, names don't matter that much, but I'd be
>>>>> interested in which other aspect that zone would be "special".
>>>> 
>>>> FYI for the first time I named it as ZONE_CXLMEM, but we thought it would be needed to cover other extended memory types as well.
>>>> So I changed it as ZONE_EXMEM.
>>>> We also would like to point out a "special" zone aspeact, which is different from ZONE_NORMAL for tranditional DDR DRAM.
>>>> Of course, a symbol naming is important more or less to represent it very nicely, though.
>>>> Do you prefer ZONE_SPECIAL? :)
>>> 
>>> I called it ZONE_PREFER_MOVABLE. If you studied that approach there must
>>> be a good reason to name it differently?
>>> 
>> 
>> The intention of ZONE_EXMEM is a separated logical management dimension originated from the HW diffrences of extended memory devices.
>> Althought the ZONE_EXMEM considers the movable and frementation aspect, it is not all what ZONE_EXMEM considers.
>> So it is named as it.
>
>Given that CXL memory devices can potentially cover a wide range of technologies with quite different latency and bandwidth metrics, will one zone serve as the management vehicle that you seek? If a system contains both CXL attached DRAM and, let say, a byte-addressable CXL SSD - both used as (different) byte addressable tiers in a tiered memory hierarchy, allocating memory from the ZONE_EXMEM doesn’t really tell you much about what you get. So the client would still need an orthogonal method to characterize the desired performance characteristics. 

I agree that a heterogeneous system would be able to adopt multiple types of extended memory devices.
We think ZONE_EXMEM can apply different management algorithms for each extended memory type. 
What we think is ZONE_NORMAL : ZONE_EXMEM = 1 : N, where N is the number of HW device type.
ZONE_NORMAL is for conventional DDR DRAM on DIMM F/F, while ZONE_EXMEM is for extended memories, CXL DRAM, CXL SSD, etc on other F/Fs such as EDSFF. 

We think the movable attribute is a requirement for CXL DRAM device. 
However, there are other SW points we are concerning - implicit allocation and unintended migration - with CXL HW differences.
So, I'm not sure if it is possible or good to cover the matters by combination of ZONE_MOVABLE and ZONE_PREFER_MOVABLE design.
Let me point out again, we proposed the ZONE_EXMEM for the special logical management of extended memory devices.

Specifically, for the performance metric, we think it would be handled not in the zone, but in a node unit.


>This method could be combined with a fabric independent zone such as ZONE_PREFER_MOVABLE to address the kernel allocation issue. At the same time, this new zone could also be useful in other cases, such as virtio-mem.

We agree with your thought. Along with adoption of CXL memory pool and fabric, virtualization SW layers would be added.
Considering not only baremetal OS, but memory inflation/deflation between baremetal OS and a hypervisor, we think ZONE_EXMEM can be useful as the identifier for CXL memory.


>
>Thanks,
>Jorgen

  parent reply	other threads:[~2023-03-31 11:32 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20230221014114epcas2p1687db1d75765a8f9ed0b3495eab1154d@epcas2p1.samsung.com>
2023-02-21  1:41 ` [LSF/MM/BPF TOPIC] SMDK inspired MM changes for CXL Kyungsan Kim
2023-02-27 23:14   ` Dan Williams
     [not found]     ` <CGME20230228043551epcas2p3085444899b00b106c2901e1f51814d2c@epcas2p3.samsung.com>
2023-02-28  4:35       ` Kyungsan Kim
2023-03-03  6:07   ` Huang, Ying
     [not found]     ` <CGME20230322043354epcas2p2227bcad190a470d635b92f92587dc69e@epcas2p2.samsung.com>
2023-03-22  4:33       ` FW: " Kyungsan Kim
2023-03-22 22:03         ` Dan Williams
     [not found]           ` <CGME20230323105106epcas2p39ea8de619622376a4698db425c6a6fb3@epcas2p3.samsung.com>
2023-03-23 10:51             ` RE(2): " Kyungsan Kim
2023-03-23 12:25               ` David Hildenbrand
     [not found]                 ` <CGME20230324090923epcas2p2710ba4dc8157f9141c03104cf66e9d26@epcas2p2.samsung.com>
2023-03-24  9:09                   ` RE(4): " Kyungsan Kim
2023-03-24  9:12                     ` David Hildenbrand
     [not found]                       ` <CGME20230324092731epcas2p315c348bd76ef9fc84bffdb158e4c1aa4@epcas2p3.samsung.com>
2023-03-24  9:27                         ` RE(2): " Kyungsan Kim
2023-03-24  9:30                           ` David Hildenbrand
     [not found]                             ` <CGME20230324095031epcas2p284095ae90b25a47360b5098478dffdaa@epcas2p2.samsung.com>
2023-03-24  9:50                               ` RE(3): " Kyungsan Kim
2023-03-24 13:08                                 ` Jørgen Hansen
2023-03-24 22:33                                   ` David Hildenbrand
     [not found]                                     ` <CGME20230331114220epcas2p2d5734efcbdd8956f861f8e7178cd5288@epcas2p2.samsung.com>
2023-03-31 11:42                                       ` Kyungsan Kim
2023-03-31 13:42                                         ` Matthew Wilcox
2023-03-31 15:56                                           ` Frank van der Linden
2023-04-03  8:34                                             ` David Hildenbrand
     [not found]                                               ` <CGME20230405021655epcas2p2364b1f56dcde629bbd05bc796c2896aa@epcas2p2.samsung.com>
2023-04-05  2:16                                                 ` Kyungsan Kim
     [not found]                                             ` <CGME20230405020631epcas2p1c85058b28a70bbd46d587e78a9c9c7ad@epcas2p1.samsung.com>
2023-04-05  2:06                                               ` Re: " Kyungsan Kim
2023-04-05  5:00                                                 ` Dan Williams
     [not found]                                           ` <CGME20230405020121epcas2p2d9d39c151b6c5ab9e568ab9e2ab826ce@epcas2p2.samsung.com>
2023-04-05  2:01                                             ` Kyungsan Kim
2023-04-05  3:11                                               ` Matthew Wilcox
2023-04-03  8:28                                         ` David Hildenbrand
     [not found]                                           ` <CGME20230405020916epcas2p24cf04f5354c12632eba50b64b217e403@epcas2p2.samsung.com>
2023-04-05  2:09                                             ` Kyungsan Kim
     [not found]                                   ` <CGME20230331113147epcas2p12655777fec6839f7070ffcc446e3581b@epcas2p1.samsung.com>
2023-03-31 11:31                                     ` Kyungsan Kim [this message]
2023-03-24  0:41               ` RE(2): " Huang, Ying
     [not found]                 ` <CGME20230324084808epcas2p354865d38dccddcb5cd46b17610345a5f@epcas2p3.samsung.com>
2023-03-24  8:48                   ` RE(4): " Kyungsan Kim
2023-03-24 13:46                     ` Gregory Price
     [not found]                       ` <CGME20230331113417epcas2p20a886e1712dbdb1f8eec03a2ac0a47e2@epcas2p2.samsung.com>
2023-03-31 11:34                         ` Kyungsan Kim
2023-03-31 15:53                           ` Gregory Price
     [not found]                             ` <CGME20230405020257epcas2p11b253f8c97a353890b96e6ae6eb515d3@epcas2p1.samsung.com>
2023-04-05  2:02                               ` Kyungsan Kim
2023-03-24 14:55               ` RE(2): " Matthew Wilcox
2023-03-24 17:49                 ` Matthew Wilcox
     [not found]                   ` <CGME20230331113715epcas2p13127b95af4000ec1ed96a2e9d89b7444@epcas2p1.samsung.com>
2023-03-31 11:37                     ` Kyungsan Kim
2023-03-31 12:54                       ` Matthew Wilcox
     [not found]                         ` <CGME20230405020027epcas2p4682d43446a493385b60c39a1dbbf07d6@epcas2p4.samsung.com>
2023-04-05  2:00                           ` Kyungsan Kim
2023-04-05  4:48                             ` Dan Williams
2023-04-05 18:12                               ` Matthew Wilcox
2023-04-05 19:42                                 ` Dan Williams
2023-04-06 12:27                                   ` David Hildenbrand
     [not found]                                     ` <CGME20230407093007epcas2p32addf5da24110c3e45c90a15dcde0d01@epcas2p3.samsung.com>
2023-04-07  9:30                                       ` Kyungsan Kim
     [not found]                   ` <CGME20230331113845epcas2p313118617918ae2bf634c3c475fc5dbd8@epcas2p3.samsung.com>
2023-03-31 11:38                     ` Re: RE(2): " Kyungsan Kim
2023-03-26  7:21               ` Mike Rapoport
2023-03-30 22:03                 ` Dragan Stancevic
2023-04-03  8:44                   ` Mike Rapoport
2023-04-04  4:27                     ` Dragan Stancevic
2023-04-04  6:47                       ` Huang, Ying
2023-04-06 22:27                         ` Dragan Stancevic
2023-04-07  0:58                           ` Huang, Ying
     [not found]                             ` <CGME20230407092950epcas2p12bc20c2952a800cf3f4f1d0b695f67e2@epcas2p1.samsung.com>
2023-04-07  9:29                               ` Kyungsan Kim
2023-04-07 14:35                             ` Dragan Stancevic
     [not found]                       ` <CGME20230405101840epcas2p4c92037ceba77dfe963d24791a9058450@epcas2p4.samsung.com>
2023-04-05 10:18                         ` Kyungsan Kim
     [not found]                 ` <CGME20230331114526epcas2p2b6f1d4c8c1c0b2e3c12a425b6e48c0d8@epcas2p2.samsung.com>
2023-03-31 11:45                   ` RE: RE(2): " Kyungsan Kim
2023-04-04  8:31                     ` Mike Rapoport
2023-04-04 17:58                       ` Adam Manzanares
2023-04-01 10:51                         ` Gregory Price
2023-04-04 18:59                           ` [External] " Viacheslav A.Dubeyko
2023-04-01 11:51                             ` Gregory Price
2023-04-04 21:09                               ` Viacheslav A.Dubeyko
     [not found]                               ` <642cb7ec58c71_21a829453@dwillia2-xfh.jf.intel.com.notmuch>
2023-04-05  2:34                                 ` Gregory Price
     [not found]                               ` <CGME20230405101843epcas2p2c819c8d60b2a9a776124c2b4bc25af14@epcas2p2.samsung.com>
2023-04-05 10:18                                 ` Kyungsan Kim
2023-03-30 22:02   ` Dragan Stancevic
     [not found]     ` <CGME20230331114649epcas2p23d52cd1d224085e6192a0aaf22948e3e@epcas2p2.samsung.com>
2023-03-31 11:46       ` Kyungsan Kim
     [not found]   ` <CGME20230414084120epcas2p37f105901350410772a3115a5a490c215@epcas2p3.samsung.com>
2023-04-14  8:41     ` FW: " Kyungsan Kim
2023-05-09 18:45       ` MTK

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230331113147.399972-1-ks0204.kim@samsung.com \
    --to=ks0204.kim@samsung.com \
    --cc=Jorgen.Hansen@wdc.com \
    --cc=a.manzanares@samsung.com \
    --cc=dan.j.williams@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lsf-pc@lists.linux-foundation.org \
    --cc=seungjun.ha@samsung.com \
    --cc=viacheslav.dubeyko@bytedance.com \
    --cc=wj28.lee@samsung.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).