From: Dan Williams <dan.j.williams@intel.com>
To: Gregory Price <gregory.price@memverge.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Dan Williams <dan.j.williams@intel.com>, <linux-cxl@vger.kernel.org>
Subject: Re: [GIT preview] for-6.3/cxl-ram-region
Date: Mon, 30 Jan 2023 12:10:08 -0800	[thread overview]
Message-ID: <63d8242084087_3a36e529420@dwillia2-xfh.jf.intel.com.notmuch> (raw)
In-Reply-To: <Y9fRVu+yLza4d5Vt@memverge.com>

Gregory Price wrote:
[..]
> I found the same results.
> 
> Reference command and config for list readers:
> 
> sudo /opt/qemu-cxl/bin/qemu-system-x86_64 \
> -drive file=/var/lib/libvirt/images/cxl.qcow2,format=qcow2,index=0,media=disk,id=hd \
> -m 2G,slots=4,maxmem=4G \
> -smp 4 \
> -machine type=q35,accel=kvm,cxl=on \
> -enable-kvm \
> -nographic \
> -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
> -device cxl-rp,id=rp0,bus=cxl.0,chassis=0,port=0,slot=0 \
> -object memory-backend-ram,id=mem0,size=1G,share=on \
> -device cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0 \
> -M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=1G
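> 
> As a sanity check once the guest boots (assuming the cxl tool from the
> ndctl project is installed), mem0 should show up in:
> 
> cxl list -M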
> 
> 
> echo region0 > /sys/bus/cxl/devices/decoder0.0/create_ram_region
> echo 1 > /sys/bus/cxl/devices/region0/interleave_ways
> echo 256 > /sys/bus/cxl/devices/region0/interleave_granularity
> echo 0x40000000 > /sys/bus/cxl/devices/region0/size
> echo mem0 > /sys/bus/cxl/devices/region0/target0
> 
> Not sure if this is a bug or a missing feature, but after attaching a
> device to the target, reading target0 produces no output:
> 
> ```
> [root@fedora ~]# cat /sys/bus/cxl/devices/region0/target0
> 
> [root@fedora ~]#
> ```

Hmm, did you not get:

"-bash: echo: write error: Invalid argument"

...at that step? Because targetX expects an endpoint decoder, not a
memdev.
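
For list readers, the corrected tail of the sequence would look
something like this (illustrative only: assuming mem0's endpoint
decoder enumerated as decoder2.0):

echo decoder2.0 > /sys/bus/cxl/devices/region0/target0
echo 1 > /sys/bus/cxl/devices/region0/commit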


> 
> It would be nice, for the sake of easier topology reporting, if this
> either reported the configured target or added a link to the targets
> in the directory.
> 
> But this looks good to me so far. Excited to see the devdax patch; I
> think I can whip up a sample DCD device (for command testing and proof
> of concept) pretty quickly after this.
> 
> 
> One question re: auto-online of the devdax hookup - is the intent for
> auto-online to follow /sys/devices/system/memory/auto_online_blocks
> settings or should we consider controlling auto-online more granularly?
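> 
> For reference, that global knob accepts offline, online, online_kernel,
> or online_movable:
> 
> cat /sys/devices/system/memory/auto_online_blocks
> echo online_movable > /sys/devices/system/memory/auto_online_blocks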
> 
> It's a bit of a catch-22 if we follow auto_online_blocks:
>   1) for local memory expanders, if it is off, this is annoying
>   2) for statically configured remote pools (remote expanders),
>      this is annoying for the same reason
>   3) for early DCDs (multi-headed expander, no switch), the pattern /
>      expectation I'm seeing is that the device expects hosts to see all
>      memory blocks when the device is hooked up, and then expects hosts
>      to "play nice" by only onlining blocks that have been allocated
>      (there are some device-side exclusion features to enforce security).
> 
> Basically, early DCDs will look like remote expanders with some
> exclusivity controls (configured via the DCD commands).
> 
> So with the pattern above, let's say you have a 1TB pool attached to 4
> hosts.  Each host would produce the following commands:
> 
> echo region0 > /sys/bus/cxl/devices/decoder0.0/create_ram_region
> echo 1 > /sys/bus/cxl/devices/region0/interleave_ways
> echo 256 > /sys/bus/cxl/devices/region0/interleave_granularity
> echo 0x10000000000 > /sys/bus/cxl/devices/region0/size
> echo mem0 > /sys/bus/cxl/devices/region0/target0

> and mem0 would get 4096 memory# blocks (presumably under region/devdax?)

At 1T of size, mem0 would be hosting 4294967296 256-byte blocks.
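
(Arithmetic: 1 TiB / 256 B = 2^40 / 2^8 = 2^32 = 4294967296 granules.
interleave_granularity is a byte count, a far smaller unit than the
memory hotplug block size, which is typically 128 MiB or larger on x86.)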

> A provisioning command would be sent via the device interface
> 
> ioctl(DCD(N blocks)) -> /sys/bus/cxl/devices/mem0/dev 
> return: DCD return structure with extents[blocks[a,b,c],...]

In the DCD case the CXL-region would be instantiated ahead of time and
associated with a DAX-region. Upon each capacity addition event a new
devdax instance would appear in that region.
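
As a purely hypothetical sketch (instance names illustrative, not
necessarily what this branch produces):

/sys/bus/dax/devices/
    dax0.0   <- devdax instance from the first capacity-add event
    dax0.1   <- devdax instance from a later capacity-add event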

> Then the final action would be
> echo online > /sys/bus/cxl/devices/region0/devdax/memory[a,b,c...]

Something like that, yes.
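
Note that per-block control already exists under the memory sysfs
hierarchy today, e.g. for block N:

echo online_movable > /sys/devices/system/memory/memoryN/state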

> or online_movable, or probably some other special zone to make sure
> the memory is not used by the kernel (so it can be released later)
> 
> 
> So to me, it feels like we might want more granular auto-online control,
> but I don't know how possible that is.

Yes, I think it also needs to coordinate with the existing udev rules
and policy around auto-onlining new memory blocks.
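
For example, the stock auto-online udev rule that some distros ship
onlines every hotplugged block unconditionally; its well-known shape is
something like:

SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online_movable"

A DCD-aware policy would need to be more selective than that.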

> Note: This is me relaying what I've seen/heard from some device vendors
> in terms of what they think the control scheme will be, so if something
> is wildly off-base, it would be good to address the expectations.
> 
> 
> Either way: This is awesome, thank you for sharing the preview Dan.

Thanks for testing!

