linux-cxl.vger.kernel.org archive mirror
From: MTK <kim1158@gmail.com>
To: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
	linux-cxl@vger.kernel.org,
	Dan Williams <dan.j.williams@intel.com>,
	mhocko@kernel.org, "david@redhat.com" <david@redhat.com>,
	willy@infradead.org, sj@kernel.org, ks0204.kim@samsung.com
Subject: Re: FW: [LSF/MM/BPF TOPIC] SMDK inspired MM changes for CXL
Date: Wed, 10 May 2023 03:45:45 +0900
Message-ID: <CAARenATFs8rPcmjy7Zo4SSm5WH=ODwOcG6ip8_jTthT88PKGLg@mail.gmail.com>
In-Reply-To: <20230414084120.440801-1-ks0204.kim@samsung.com>

Hello all,

I appreciate all of the feedback and questions during my session on
5/8 at 13:00 PDT.
For those who are interested, please find my slides at [2].
My apologies: I did not manage the time slot well, so I missed some
of the content I had prepared.

The Program Committee kindly allowed me a make-up session of a few more minutes
around 5/10 15:30 PDT, after the "MM process: Akpm" session. Please find the schedule at [1].
Thank you, Dan Williams and Michal Hocko.

The remaining discussion points I have in mind are:
- further sync-up of CXL requirements with the kernel
- what ZONE_EXMEM does to address those requirements
- quick answers to the feedback I missed on 5/8
- alignment with the kernel's direction


[1] https://github.com/OpenMPDK/SMDK/wiki/93.-%5BLSF-MM-BPF-TOPIC%5D-SMDK-inspired-MM-changes-for-CXL
[2] https://docs.google.com/spreadsheets/d/1tIDYHgLhhcetoXtgyvcoM6YZWWHcVLdNYipBq2dH-_k/edit#gid=0

On Fri, Apr 14, 2023 at 5:45 PM Kyungsan Kim <ks0204.kim@samsung.com> wrote:
>
> >CXL is a promising technology that leads to fundamental changes in computing architecture.
> >To facilitate the adoption and widespread use of CXL memory, we are developing a memory tiering solution called SMDK[1][2].
> >Using SMDK and CXL RAM devices, our team has been working with industry and academic partners over the last year.
> >Also, thanks to many researchers' efforts, CXL adoption is gradually moving forward from basic enablement to real-world composite use cases.
> >At this moment, based on the research and experience gained working on SMDK, we would like to suggest a session at LSF/MM/BPF this year
> >to propose possible Linux MM changes, with a brief overview of SMDK.
> >
> >Adam Manzanares kindly advised me that at LSF/MM/BPF it is preferred to discuss implementation details of a given problem on which there is consensus.
> >Considering the adoption stage of CXL technology, however, let me suggest a design-level discussion of the two MM extensions of SMDK this year.
> >Once we reach design consensus with participants, we hope to continue with follow-up discussions on additional implementation details.
> >
> >
> >1. A new zone, ZONE_EXMEM
> >We added ZONE_EXMEM to manage CXL RAM device(s), separately from ZONE_NORMAL for conventional DRAM, for the three reasons below.
> >
> >1) CXL RAM differs from conventional DRAM in many characteristics, because a CXL device inherits and extends the PCIe specification.
> >ex) frequency range, pluggability, link speed/width negotiation, host/device flow control, power throttling, channel-interleaving methodology, error handling, etc.
> >The primary use case of CXL RAM is likely to be System RAM.
> >However, to deal with the hardware differences properly, correspondingly different MM algorithms are needed.
> >
> >2) Historically, zones have been added to reflect the evolution of CPU, IO, and memory devices.
> >ex) ZONE_DMA(32), ZONE_HIGHMEM, ZONE_DEVICE, and ZONE_MOVABLE.
> >Each zone applies different MM algorithms for page reclaim, compaction, migration, and fragmentation handling.
> >At first, we tried to reuse the existing zones, ZONE_DEVICE and ZONE_MOVABLE, for CXL RAM.
> >However, the purpose and implementation of those zones do not fit CXL RAM.
> >
> >3) The industry is preparing CXL-capable systems that connect dozens of CXL devices to a single server.
> >When each CXL device becomes a separate node, an administrator/programmer needs to be aware of and manually control all of the nodes using 3rd party software such as numactl and libnuma.
> >ZONE_EXMEM allows CXL RAM devices to be assembled into a single ZONE_EXMEM zone, and provides an abstraction to userspace by managing the devices seamlessly.
> >The zone can also interleave the assembled devices in software to achieve aggregated bandwidth.
> >We would like to discuss whether this can coexist with HW interleaving, similar to SW/HW RAID0.
> >To aid understanding, please refer to the node partition part of the picture[3].
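
A minimal sketch of the RAID0-like software interleave described above, assuming a fixed interleave granularity in pages and a zone-relative page index. The names exmem_interleave() and struct exmem_location are hypothetical and are not taken from SMDK or the kernel; the sketch ignores hardware interleave sets entirely.

```c
#include <assert.h>

/*
 * Hypothetical sketch: RAID0-style software interleaving of pages across
 * the CXL RAM devices assembled into one zone. These names are
 * illustrative only and are not SMDK or kernel APIs.
 */
struct exmem_location {
	unsigned int device;        /* index of the CXL RAM device */
	unsigned long device_page;  /* page offset within that device */
};

/*
 * Map a zone-relative page index to (device, page offset) by striping
 * chunks of granule_pages pages round-robin over nr_devices devices.
 */
static struct exmem_location exmem_interleave(unsigned long zone_page,
					      unsigned int nr_devices,
					      unsigned long granule_pages)
{
	struct exmem_location loc;
	unsigned long chunk, within;

	assert(nr_devices > 0 && granule_pages > 0);

	chunk = zone_page / granule_pages;   /* stripe chunk holding the page */
	within = zone_page % granule_pages;  /* offset inside that chunk */

	loc.device = (unsigned int)(chunk % nr_devices);
	/* Each device receives every nr_devices-th chunk, packed contiguously. */
	loc.device_page = (chunk / nr_devices) * granule_pages + within;
	return loc;
}
```

With granule_pages = 1, consecutive pages rotate over devices 0, 1, 2, ...; a larger granule keeps runs of consecutive pages on one device, trading some balance for locality.
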
> >
> >
> >2. User/Kernelspace Programmable Interface
> >A memory tiering solution typically attempts to place hot data in near memory and cold data in far memory as accurately as possible [4][5][6][7].
> >We noticed that the hotness/coldness of data is determined by the memory access pattern of the running application and/or kernel context.
> >Hence, a running context needs an identifier to distinguish near memory from far memory.
> >When CXL RAM(s) is managed as a NUMA node, the node id can more or less function as a CXL identifier.
> >However, a node id has a limitation: it is ephemeral information that varies dynamically with the online status of the CXL topology and system sockets.
> >For this reason, we provide programmable interfaces for userspace and kernelspace contexts to explicitly (de)allocate memory from DRAM and CXL RAM regardless of system changes.
> >Specifically, MAP_EXMEM and GFP_EXMEM flags were added to the mmap() syscall and the kmalloc() family, respectively.
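
To illustrate why naming a kind of memory is more robust than naming a node id, here is a hypothetical userspace model. It does not use the real MAP_EXMEM or GFP_EXMEM definitions, whose values are not given here; enum mem_kind and resolve_mem_kind() are invented for illustration, and the lookup simply picks the first online matching node rather than applying any real placement policy.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stable memory kinds that a caller names instead of a node id. */
enum mem_kind { MEM_DRAM, MEM_CXL };

/*
 * Resolve a memory kind to a currently online node id at allocation time.
 * online[] and kind[] describe the topology as it is *now*; node ids can
 * go offline or change as CXL devices and sockets come and go.
 * Returns -1 if no online node currently backs the requested kind.
 */
static int resolve_mem_kind(const bool *online, const enum mem_kind *kind,
			    int nr_nodes, enum mem_kind want)
{
	assert(nr_nodes >= 0);
	for (int nid = 0; nid < nr_nodes; nid++)
		if (online[nid] && kind[nid] == want)
			return nid;
	return -1;
}
```

If node 1 goes offline, a request for MEM_CXL that used to resolve to node 1 now resolves to node 2 without the caller changing anything, whereas a hard-coded node id would have gone stale.
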
> >
> >Thanks to Adam Manzanares for reviewing this CFP thoroughly.
> >
> >
> >[1]SMDK: https://github.com/openMPDK/SMDK
> >[2]SMT: Software-defined Memory Tiering for Heterogeneous Computing systems with CXL Memory Expander, https://ieeexplore.ieee.org/document/10032695
> >[3]SMDK node partition: https://github.com/OpenMPDK/SMDK/wiki/2.-SMDK-Architecture#memory-partition
> >[4]TMO: Transparent Memory Offloading in Datacenters, https://dl.acm.org/doi/10.1145/3503222.3507731
> >[5]TPP: Transparent Page Placement for CXL-Enabled Tiered Memory, https://arxiv.org/abs/2206.02878
> >[6]Pond: CXL-Based Memory Pooling Systems for Cloud Platforms, https://dl.acm.org/doi/10.1145/3575693.3578835
> >[7]Hierarchical NUMA: https://blog.linuxplumbersconf.org/2017/ocw/system/presentations/4656/original/Hierarchical_NUMA_Design_Plumbers_2017.pdf
>
> Let us restate the original CFP from a requirements point of view, along with our thoughts on each.
>
> 1) CXL DRAM pluggability
> Issue: an arbitrary unmovable allocation can make a CXL DRAM unpluggable.
> It can originate from userspace, e.g., pinning a DMA buffer, or from kernelspace, e.g., pinned metadata such as struct page, zone, etc.
> For this reason, we should separate logical memory online/offline from physical add/remove.
> Thought: a CXL DRAM should be usable in a selective manner, either pluggable or unpluggable.
> To be clear, the two modes are mutually exclusive, so they cannot apply at the same time to a single CXL DRAM channel.
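
The mutually exclusive modes and the split between logical offline and physical removal could be modeled roughly as below. This is a sketch under assumptions: struct cxl_channel, enum cxl_plug_mode and the two predicates are hypothetical names, and a real implementation would also have to migrate movable pages away before offlining.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-channel policy; exactly one mode applies at a time. */
enum cxl_plug_mode { CXL_PLUGGABLE, CXL_UNPLUGGABLE };

struct cxl_channel {
	enum cxl_plug_mode mode;
	bool online;                /* logical state: pages usable by MM */
	unsigned long nr_unmovable; /* pages that cannot be migrated away */
};

/* Unmovable allocations (pinned DMA buffers, kernel metadata) may only
 * land on online channels that have given up physical removability. */
static bool may_place_unmovable(const struct cxl_channel *ch)
{
	return ch->online && ch->mode == CXL_UNPLUGGABLE;
}

/* Physical removal is a separate step from logical offlining: it is
 * allowed only for a pluggable channel that is already offline and
 * holds no unmovable pages. */
static bool may_remove_physically(const struct cxl_channel *ch)
{
	/* Invariant of this model: pluggable channels never hold unmovable pages. */
	assert(ch->mode == CXL_UNPLUGGABLE || ch->nr_unmovable == 0);
	return ch->mode == CXL_PLUGGABLE && !ch->online && ch->nr_unmovable == 0;
}
```

An unpluggable channel accepts pinned DMA buffers and kernel metadata but is never physically removed; a pluggable channel never holds unmovable pages, so it can be removed once it has been taken offline.
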
>
> 2) CXL DRAM identifier (API and ABI)
> Issue: a user/kernel context has to use the node id of a CXL memory node to access CXL DRAM explicitly or implicitly.
> Thought: a node id is ephemeral information. Userspace and kernelspace memory tiering solutions need an API and/or ABI rather than a node id.
>
> 3) Prevention of unintended CXL page migration
> Issue: during a zswap operation, a page in near memory (DIMM DRAM) is allocated to store a page swapped out from far memory (CXL DRAM).
> Thought: in the swap flow, far memory should not be accidentally promoted to near memory.
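
One way to express the swap-path requirement is a placement rule for the compressed copy, sketched below. It is a hypothetical policy, not zswap's actual allocation logic: enum tier and compressed_copy_tier() are invented names, and returning -1 to bypass compression when far memory is full is an assumption.

```c
#include <assert.h>

/* Hypothetical tier labels; not kernel identifiers. */
enum tier { NEAR_MEM, FAR_MEM };

/*
 * Hypothetical policy: choose the tier for the buffer that holds the
 * compressed (zswap-style) copy of a page being swapped out. Keeping the
 * copy on the page's source tier avoids implicitly promoting cold
 * far-memory data into near memory.
 * far_has_room: nonzero if the far tier can satisfy the allocation now.
 * Returns the tier to use, or -1 to skip compression and let the page go
 * to the backing swap device instead (an assumption of this sketch).
 */
static int compressed_copy_tier(enum tier src, int far_has_room)
{
	assert(src == NEAR_MEM || src == FAR_MEM);
	if (src == NEAR_MEM)
		return NEAR_MEM;
	return far_has_room ? FAR_MEM : -1;
}
```

The point of the rule is that a cold page evicted from CXL DRAM stays on the far tier (or leaves memory entirely) instead of silently consuming DIMM DRAM.
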
>
> 4) Too many CXL nodes appearing in userland
> Issue: many CXL memory nodes will appear to userland as CXL-capable servers, switches, and fabric topologies develop.
> Currently, to achieve aggregated bandwidth across the CXL nodes, userland needs to be aware of and manage the nodes using 3rd party SW such as numactl and libnuma.
> Thought: the kernel could provide an abstraction layer so that userland can deal with this seamlessly.
> Traditionally, a node implies multiple memory channels at the same distance, and a node is the largest management unit in MM, i.e., Node - Zone - Page.
> So we think multiple CXL DRAMs can appear as a single node, and the management unit for a single CXL DRAM should therefore be smaller than a node.
>


-- 
------------------------------------------------------------
the person who practices a truth goes toward light.


Thread overview: 67+ messages
2023-02-21  1:41 ` [LSF/MM/BPF TOPIC] SMDK inspired MM changes for CXL Kyungsan Kim
2023-02-27 23:14   ` Dan Williams
2023-02-28  4:35       ` Kyungsan Kim
2023-03-03  6:07   ` Huang, Ying
2023-03-22  4:33       ` FW: " Kyungsan Kim
2023-03-22 22:03         ` Dan Williams
2023-03-23 10:51             ` RE(2): " Kyungsan Kim
2023-03-23 12:25               ` David Hildenbrand
2023-03-24  9:09                   ` RE(4): " Kyungsan Kim
2023-03-24  9:12                     ` David Hildenbrand
2023-03-24  9:27                         ` RE(2): " Kyungsan Kim
2023-03-24  9:30                           ` David Hildenbrand
2023-03-24  9:50                               ` RE(3): " Kyungsan Kim
2023-03-24 13:08                                 ` Jørgen Hansen
2023-03-24 22:33                                   ` David Hildenbrand
2023-03-31 11:42                                       ` Kyungsan Kim
2023-03-31 13:42                                         ` Matthew Wilcox
2023-03-31 15:56                                           ` Frank van der Linden
2023-04-03  8:34                                             ` David Hildenbrand
2023-04-05  2:16                                                 ` Kyungsan Kim
2023-04-05  2:06                                               ` Re: " Kyungsan Kim
2023-04-05  5:00                                                 ` Dan Williams
2023-04-05  2:01                                             ` Kyungsan Kim
2023-04-05  3:11                                               ` Matthew Wilcox
2023-04-03  8:28                                         ` David Hildenbrand
2023-04-05  2:09                                             ` Kyungsan Kim
2023-03-31 11:31                                     ` RE: RE(3): " Kyungsan Kim
2023-03-24  0:41               ` RE(2): " Huang, Ying
2023-03-24  8:48                   ` RE(4): " Kyungsan Kim
2023-03-24 13:46                     ` Gregory Price
2023-03-31 11:34                         ` Kyungsan Kim
2023-03-31 15:53                           ` Gregory Price
2023-04-05  2:02                               ` Kyungsan Kim
2023-03-24 14:55               ` RE(2): " Matthew Wilcox
2023-03-24 17:49                 ` Matthew Wilcox
2023-03-31 11:37                     ` Kyungsan Kim
2023-03-31 12:54                       ` Matthew Wilcox
2023-04-05  2:00                           ` Kyungsan Kim
2023-04-05  4:48                             ` Dan Williams
2023-04-05 18:12                               ` Matthew Wilcox
2023-04-05 19:42                                 ` Dan Williams
2023-04-06 12:27                                   ` David Hildenbrand
2023-04-07  9:30                                       ` Kyungsan Kim
2023-03-31 11:38                     ` Re: RE(2): " Kyungsan Kim
2023-03-26  7:21               ` Mike Rapoport
2023-03-30 22:03                 ` Dragan Stancevic
2023-04-03  8:44                   ` Mike Rapoport
2023-04-04  4:27                     ` Dragan Stancevic
2023-04-04  6:47                       ` Huang, Ying
2023-04-06 22:27                         ` Dragan Stancevic
2023-04-07  0:58                           ` Huang, Ying
2023-04-07  9:29                               ` Kyungsan Kim
2023-04-07 14:35                             ` Dragan Stancevic
2023-04-05 10:18                         ` Kyungsan Kim
2023-03-31 11:45                   ` RE: RE(2): " Kyungsan Kim
2023-04-04  8:31                     ` Mike Rapoport
2023-04-04 17:58                       ` Adam Manzanares
2023-04-01 10:51                         ` Gregory Price
2023-04-04 18:59                           ` [External] " Viacheslav A.Dubeyko
2023-04-01 11:51                             ` Gregory Price
2023-04-04 21:09                               ` Viacheslav A.Dubeyko
2023-04-05  2:34                                 ` Gregory Price
2023-04-05 10:18                                 ` Kyungsan Kim
2023-03-30 22:02   ` Dragan Stancevic
2023-03-31 11:46       ` Kyungsan Kim
2023-04-14  8:41     ` FW: " Kyungsan Kim
2023-05-09 18:45       ` MTK [this message]
