From: Hao Xiang <hao.xiang@bytedance.com>
To: Gregory Price <gregory.price@memverge.com>
Cc: "Ho-Ren (Jack) Chuang" <horenchuang@bytedance.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Jonathan Cameron" <Jonathan.Cameron@huawei.com>,
	"Ben Widawsky" <ben.widawsky@intel.com>,
	"Gregory Price" <gourry.memverge@gmail.com>,
	"Fan Ni" <fan.ni@samsung.com>, "Ira Weiny" <ira.weiny@intel.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"David Hildenbrand" <david@redhat.com>,
	"Igor Mammedov" <imammedo@redhat.com>,
	"Eric Blake" <eblake@redhat.com>,
	"Markus Armbruster" <armbru@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Daniel P. Berrangé" <berrange@redhat.com>,
	"Eduardo Habkost" <eduardo@habkost.net>,
	qemu-devel@nongnu.org, "Ho-Ren (Jack) Chuang" <horenc@vt.edu>,
	linux-cxl@vger.kernel.org
Subject: Re: [External] Re: [QEMU-devel][RFC PATCH 1/1] backends/hostmem: qapi/qom: Add an ObjectOption for memory-backend-* called HostMemType and its arg 'cxlram'
Date: Tue, 9 Jan 2024 11:33:04 -0800
Message-ID: <CAAYibXhY5p6VN7yAMpmfAgHO+gsf51dvNw68y__RYV+43CVVLQ@mail.gmail.com>
In-Reply-To: <ZZydwBTS4NeSizzb@memverge.com>

On Mon, Jan 8, 2024 at 5:13 PM Gregory Price <gregory.price@memverge.com> wrote:
>
> On Mon, Jan 08, 2024 at 05:05:38PM -0800, Hao Xiang wrote:
> > On Mon, Jan 8, 2024 at 2:47 PM Hao Xiang <hao.xiang@bytedance.com> wrote:
> > >
> > > On Mon, Jan 8, 2024 at 9:15 AM Gregory Price <gregory.price@memverge.com> wrote:
> > > >
> > > > On Fri, Jan 05, 2024 at 09:59:19PM -0800, Hao Xiang wrote:
> > > > > On Wed, Jan 3, 2024 at 1:56 PM Gregory Price <gregory.price@memverge.com> wrote:
> > > > > >
> > > > > > For a variety of performance reasons, this will not work the way you
> > > > > > want it to.  You are essentially telling QEMU to map vmem0 into a
> > > > > > virtual cxl device, and now any memory accesses to that memory region
> > > > > > will end up going through the cxl-type3 device logic - which is an I/O
> > > > > > path from the perspective of QEMU.
> > > > >
> > > > > I didn't understand exactly how the virtual cxl-type3 device works. I
> > > > > thought it would use the same "guest virtual address -> guest
> > > > > physical address -> host physical address" translation, done entirely
> > > > > by the CPU. But if it goes through an emulation path handled by the
> > > > > virtual cxl-type3 device, I agree the performance would be bad. Do you
> > > > > know why accessing memory on a virtual cxl-type3 device can't use
> > > > > nested page table translation?
> > > > >
> > > >
> > > > Because a byte-access on CXL memory can have checks on it that must be
> > > > emulated by the virtual device, and because there are caching
> > > > implications that have to be emulated as well.
> > >
> > > Interesting. Now that I see cxl_type3_read/cxl_type3_write: if the
> > > CXL memory data path goes through them, the performance would be
> > > pretty problematic. We have actually run Intel's Memory Latency
> > > Checker benchmark from inside a guest VM with both system DRAM and
> > > virtual CXL-type3 memory configured. The idle latency on the virtual
> > > CXL memory is 2x that of system DRAM, which is on par with the
> > > benchmark running on a physical host. I need to debug this more to
> > > understand why the latency is so much better than I would now expect.
> >
> > So we double-checked the benchmark results. What we see is that running
> > Intel Memory Latency Checker from a guest VM with virtual CXL memory
> > vs. from a physical host with a CXL 1.1 memory expander gives the same
> > latency.
> >
> > From the guest VM: local-socket system-DRAM latency is 117.0ns,
> > local-socket CXL-DRAM latency is 269.4ns.
> > From the physical host: local-socket system-DRAM latency is 113.6ns,
> > local-socket CXL-DRAM latency is 267.5ns.
> >
> > I also set debugger breakpoints on cxl_type3_read/cxl_type3_write
> > while running the benchmark, but those two functions are never hit.
> > We used the virtual CXL configuration while launching QEMU, but the
> > CXL memory is presented as a separate NUMA node and we are not
> > creating devdax devices. Does that make any difference?
> >
>
> Could you possibly share your full QEMU configuration and what OS/kernel
> you are running inside the guest?

Sounds like the technical details are explained in the other thread.
From what I understand now, if we don't go through a complex CXL
setup, memory accesses don't go through the emulation path.

Here is our exact setup. The guest runs Linux kernel 6.6-rc2:

taskset --cpu-list 0-47,96-143 \
numactl -N 0 -m 0 ${QEMU} \
-M q35,cxl=on,hmat=on \
-m 64G \
-smp 8,sockets=1,cores=8,threads=1 \
-object memory-backend-ram,id=ram0,size=45G \
-numa node,memdev=ram0,cpus=0-7,nodeid=0 \
-msg timestamp=on -L /usr/share/seabios \
-enable-kvm \
-object memory-backend-ram,id=vmem0,size=19G,host-nodes=${HOST_CXL_NODE},policy=bind \
-device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
-device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
-device cxl-type3,bus=root_port13,volatile-memdev=vmem0,id=cxl-vmem0 \
-numa node,memdev=vmem0,nodeid=1 \
-M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=19G,cxl-fmw.0.interleave-granularity=8k \
-numa dist,src=0,dst=0,val=10 \
-numa dist,src=0,dst=1,val=14 \
-numa dist,src=1,dst=0,val=14 \
-numa dist,src=1,dst=1,val=10 \
-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=read-latency,latency=91 \
-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=read-latency,latency=100 \
-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=write-latency,latency=91 \
-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=write-latency,latency=100 \
-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=read-bandwidth,bandwidth=262100M \
-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=read-bandwidth,bandwidth=30000M \
-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=write-bandwidth,bandwidth=176100M \
-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=write-bandwidth,bandwidth=30000M \
-drive file="${DISK_IMG}",format=qcow2 \
-device pci-bridge,chassis_nr=3,id=pci.3,bus=pcie.0,addr=0xd \
-netdev tap,id=vm-sk-tap22,ifname=tap22,script=/usr/local/etc/qemu-ifup,downscript=no \
-device virtio-net-pci,netdev=vm-sk-tap22,id=net0,mac=02:11:17:01:7e:33,bus=pci.3,addr=0x1,bootindex=3 \
-serial mon:stdio
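
As a quick guest-side sanity check (a sketch, assuming numactl and Intel
MLC are installed in the guest; the guest node numbering follows the
-numa options above but is otherwise an assumption, not something
verified in this thread):

numactl --hardware
    # node 1 should show the 19G CXL-backed range at distance 14 from node 0
cat /sys/devices/system/node/node1/meminfo
numactl --cpunodebind=0 --membind=1 ./mlc --idle_latency
    # idle latency with the benchmark buffer bound to the CXL-backed node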

>
> The only thing I'm surprised by is that the numa node appears without
> requiring the driver to generate the NUMA node.  It's possible I missed
> a QEMU update that allows this.
>
> ~Gregory

Thread overview: 19+ messages
2024-01-01  7:53 [QEMU-devel][RFC PATCH 0/1] Introduce HostMemType for 'memory-backend-*' Ho-Ren (Jack) Chuang
2024-01-01  7:53 ` [QEMU-devel][RFC PATCH 1/1] backends/hostmem: qapi/qom: Add an ObjectOption for memory-backend-* called HostMemType and its arg 'cxlram' Ho-Ren (Jack) Chuang
2024-01-02 10:29   ` Philippe Mathieu-Daudé
2024-01-02 13:03   ` David Hildenbrand
2024-01-06  0:45     ` [External] " Hao Xiang
2024-01-03 21:56   ` Gregory Price
2024-01-06  5:59     ` [External] " Hao Xiang
2024-01-08 17:15       ` Gregory Price
2024-01-08 22:47         ` Hao Xiang
2024-01-09  1:05           ` Hao Xiang
2024-01-09  1:13             ` Gregory Price
2024-01-09 19:33               ` Hao Xiang [this message]
2024-01-09 19:57                 ` Gregory Price
2024-01-09 21:27                   ` Hao Xiang
2024-01-09 22:13                     ` Gregory Price
2024-01-09 23:55                       ` Hao Xiang
2024-01-10 14:31                         ` Jonathan Cameron
2024-01-12 15:32   ` Markus Armbruster
