* [RFC 0/3] virtio: NUMA-aware memory allocation
@ 2020-06-25 13:57 Stefan Hajnoczi
  2020-06-25 13:57 ` [RFC 1/3] virtio-pci: use NUMA-aware memory allocation in probe Stefan Hajnoczi
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Stefan Hajnoczi @ 2020-06-25 13:57 UTC
  To: kvm; +Cc: virtualization, Michael S. Tsirkin, Jason Wang

These patches are not ready to be merged because I was unable to measure a
performance improvement. I'm publishing them so they are archived in case
someone picks up this work again in the future.

The goal of these patches is to allocate virtqueues and driver state from the
device's NUMA node for optimal memory access latency. Only guests with a vNUMA
topology and virtio devices spread across vNUMA nodes benefit from this.  In
other cases the memory placement is fine and we don't need to take NUMA into
account inside the guest.
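
The pattern is the same in each patch: look up the device's NUMA node
with dev_to_node() and pass it to the _node variant of the allocator.
A minimal sketch of the idea (the helper below is illustrative, not
actual driver code):

    #include <linux/device.h>
    #include <linux/slab.h>
    #include <linux/virtio.h>

    /* Allocate driver state on the virtio device's NUMA node instead
     * of the node of whichever CPU .probe() happens to run on.
     * dev_to_node() returns NUMA_NO_NODE when the node is unknown,
     * which kmalloc_node() treats as "no preference".
     */
    static void *alloc_driver_state(struct virtio_device *vdev, size_t size)
    {
            int node = dev_to_node(&vdev->dev);

            return kmalloc_node(size, GFP_KERNEL, node);
    }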

These patches could be extended to virtio_net.ko and other devices in the
future. I only tested virtio_blk.ko.

The benchmark configuration was designed to trigger worst-case NUMA placement:
 * Physical NVMe storage controller on host NUMA node 0
 * IOThread pinned to host NUMA node 0
 * virtio-blk-pci device in vNUMA node 1
 * vCPU 0 on host NUMA node 1 and vCPU 1 on host NUMA node 0
 * vCPU 0 in vNUMA node 0 and vCPU 1 in vNUMA node 1

The intent is to have .probe() code run on vCPU 0 in vNUMA node 0 (host NUMA
node 1) so that memory is allocated in the wrong NUMA node for the
virtio-blk-pci device. Applying these patches fixes memory placement so that
virtqueues and driver state are allocated in vNUMA node 1, where the
virtio-blk-pci device is located.

The fio 4KB randread benchmark results do not show a significant improvement:

Name                  IOPS   Error
virtio-blk        42373.79 ± 0.54%
virtio-blk-numa   42517.07 ± 0.79%

Stefan Hajnoczi (3):
  virtio-pci: use NUMA-aware memory allocation in probe
  virtio_ring: use NUMA-aware memory allocation in probe
  virtio-blk: use NUMA-aware memory allocation in probe

 include/linux/gfp.h                |  2 +-
 drivers/block/virtio_blk.c         |  7 +++++--
 drivers/virtio/virtio_pci_common.c | 16 ++++++++++++----
 drivers/virtio/virtio_ring.c       | 26 +++++++++++++++++---------
 mm/page_alloc.c                    |  2 +-
 5 files changed, 36 insertions(+), 17 deletions(-)

-- 
2.26.2



* [RFC 1/3] virtio-pci: use NUMA-aware memory allocation in probe
  2020-06-25 13:57 [RFC 0/3] virtio: NUMA-aware memory allocation Stefan Hajnoczi
@ 2020-06-25 13:57 ` Stefan Hajnoczi
  2020-06-25 13:57 ` [RFC 2/3] virtio_ring: " Stefan Hajnoczi
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Stefan Hajnoczi @ 2020-06-25 13:57 UTC
  To: kvm; +Cc: virtualization, Michael S. Tsirkin, Jason Wang

Allocate frequently-accessed data structures from the NUMA node
associated with this virtio-pci device. This avoids slow cross-NUMA node
memory accesses.

Only memory allocations that meet both of the following criteria are
made NUMA-aware:

1. The allocation is made during probe. If it were made in the data
   path instead, we would hopefully already be executing on a CPU in
   the same NUMA node as the device. If the CPU is not in the right
   NUMA node, it's unclear whether forcing memory allocations to use
   the device's NUMA node would increase or decrease performance.

2. The memory will be frequently accessed from the data path. There is
   no need to worry about data that is not accessed from
   performance-critical code paths.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 drivers/virtio/virtio_pci_common.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index 222d630c41fc..cc6e49f9c698 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -178,11 +178,13 @@ static struct virtqueue *vp_setup_vq(struct virtio_device *vdev, unsigned index,
 				     u16 msix_vec)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	struct virtio_pci_vq_info *info = kmalloc(sizeof *info, GFP_KERNEL);
+	int node = dev_to_node(&vdev->dev);
+	struct virtio_pci_vq_info *info;
 	struct virtqueue *vq;
 	unsigned long flags;
 
 	/* fill out our structure that represents an active queue */
+	info = kmalloc_node(sizeof *info, GFP_KERNEL, node);
 	if (!info)
 		return ERR_PTR(-ENOMEM);
 
@@ -283,10 +285,12 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs,
 		struct irq_affinity *desc)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	int node = dev_to_node(&vdev->dev);
 	u16 msix_vec;
 	int i, err, nvectors, allocated_vectors, queue_idx = 0;
 
-	vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL);
+	vp_dev->vqs = kcalloc_node(nvqs, sizeof(*vp_dev->vqs),
+				   GFP_KERNEL, node);
 	if (!vp_dev->vqs)
 		return -ENOMEM;
 
@@ -355,9 +359,11 @@ static int vp_find_vqs_intx(struct virtio_device *vdev, unsigned nvqs,
 		const char * const names[], const bool *ctx)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	int node = dev_to_node(&vdev->dev);
 	int i, err, queue_idx = 0;
 
-	vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL);
+	vp_dev->vqs = kcalloc_node(nvqs, sizeof(*vp_dev->vqs),
+				   GFP_KERNEL, node);
 	if (!vp_dev->vqs)
 		return -ENOMEM;
 
@@ -513,10 +519,12 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,
 			    const struct pci_device_id *id)
 {
 	struct virtio_pci_device *vp_dev, *reg_dev = NULL;
+	int node = dev_to_node(&pci_dev->dev);
 	int rc;
 
 	/* allocate our structure and fill it out */
-	vp_dev = kzalloc(sizeof(struct virtio_pci_device), GFP_KERNEL);
+	vp_dev = kzalloc_node(sizeof(struct virtio_pci_device),
+			      GFP_KERNEL, node);
 	if (!vp_dev)
 		return -ENOMEM;
 
-- 
2.26.2



* [RFC 2/3] virtio_ring: use NUMA-aware memory allocation in probe
  2020-06-25 13:57 [RFC 0/3] virtio: NUMA-aware memory allocation Stefan Hajnoczi
  2020-06-25 13:57 ` [RFC 1/3] virtio-pci: use NUMA-aware memory allocation in probe Stefan Hajnoczi
@ 2020-06-25 13:57 ` Stefan Hajnoczi
  2020-06-25 13:57 ` [RFC 3/3] virtio-blk: " Stefan Hajnoczi
  2020-06-28  6:34 ` [RFC 0/3] virtio: NUMA-aware memory allocation Jason Wang
  3 siblings, 0 replies; 8+ messages in thread
From: Stefan Hajnoczi @ 2020-06-25 13:57 UTC
  To: kvm; +Cc: virtualization, Michael S. Tsirkin, Jason Wang

Allocate frequently-accessed data structures from the NUMA node
associated with this device to avoid slow cross-NUMA node memory
accesses.

Only memory allocations that meet both of the following criteria are
made NUMA-aware:

1. The allocation is made during probe. If it were made in the data
   path instead, we would hopefully already be executing on a CPU in
   the same NUMA node as the device. If the CPU is not in the right
   NUMA node, it's unclear whether forcing memory allocations to use
   the device's NUMA node would increase or decrease performance.

2. The memory will be frequently accessed from the data path. There is
   no need to worry about data that is not accessed from
   performance-critical code paths.

This patch adds a non-__meminit caller of alloc_pages_exact_nid(), so
I've removed the __meminit annotation added by commit e19318116048
("mm/page_alloc.c: add __meminit to alloc_pages_exact_nid()").
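
__meminit code may be discarded once memory initialization is complete
(the annotation becomes __init when memory hotplug is disabled), so it
must not be reachable from a driver probe path. Simplified from
include/linux/init.h:

    #ifdef CONFIG_MEMORY_HOTPLUG
    #define __meminit
    #else
    #define __meminit __init    /* discarded after boot */
    #endif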

Cc: Fabian Frederick <fabf@skynet.be>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
I have included the alloc_pages_exact_nid() __meminit removal in this
patch to provide context for reviewers.
---
 include/linux/gfp.h          |  2 +-
 drivers/virtio/virtio_ring.c | 26 +++++++++++++++++---------
 mm/page_alloc.c              |  2 +-
 3 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 4aba4c86c626..9b69df707c7a 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -563,7 +563,7 @@ extern unsigned long get_zeroed_page(gfp_t gfp_mask);
 
 void *alloc_pages_exact(size_t size, gfp_t gfp_mask);
 void free_pages_exact(void *virt, size_t size);
-void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask);
+void *alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask);
 
 #define __get_free_page(gfp_mask) \
 		__get_free_pages((gfp_mask), 0)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 58b96baa8d48..d06b42309bed 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -276,7 +276,9 @@ static void *vring_alloc_queue(struct virtio_device *vdev, size_t size,
 		return dma_alloc_coherent(vdev->dev.parent, size,
 					  dma_handle, flag);
 	} else {
-		void *queue = alloc_pages_exact(PAGE_ALIGN(size), flag);
+		int node = dev_to_node(&vdev->dev);
+		void *queue = alloc_pages_exact_nid(node, PAGE_ALIGN(size),
+						    flag);
 
 		if (queue) {
 			phys_addr_t phys_addr = virt_to_phys(queue);
@@ -1567,6 +1569,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
 	struct vring_packed_desc_event *driver, *device;
 	dma_addr_t ring_dma_addr, driver_event_dma_addr, device_event_dma_addr;
 	size_t ring_size_in_bytes, event_size_in_bytes;
+	int node = dev_to_node(&vdev->dev);
 	unsigned int i;
 
 	ring_size_in_bytes = num * sizeof(struct vring_packed_desc);
@@ -1591,7 +1594,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
 	if (!device)
 		goto err_device;
 
-	vq = kmalloc(sizeof(*vq), GFP_KERNEL);
+	vq = kmalloc_node(sizeof(*vq), GFP_KERNEL, node);
 	if (!vq)
 		goto err_vq;
 
@@ -1639,9 +1642,10 @@ static struct virtqueue *vring_create_virtqueue_packed(
 	vq->packed.event_flags_shadow = 0;
 	vq->packed.avail_used_flags = 1 << VRING_PACKED_DESC_F_AVAIL;
 
-	vq->packed.desc_state = kmalloc_array(num,
+	vq->packed.desc_state = kmalloc_array_node(num,
 			sizeof(struct vring_desc_state_packed),
-			GFP_KERNEL);
+			GFP_KERNEL,
+			node);
 	if (!vq->packed.desc_state)
 		goto err_desc_state;
 
@@ -1653,9 +1657,10 @@ static struct virtqueue *vring_create_virtqueue_packed(
 	for (i = 0; i < num-1; i++)
 		vq->packed.desc_state[i].next = i + 1;
 
-	vq->packed.desc_extra = kmalloc_array(num,
+	vq->packed.desc_extra = kmalloc_array_node(num,
 			sizeof(struct vring_desc_extra_packed),
-			GFP_KERNEL);
+			GFP_KERNEL,
+			node);
 	if (!vq->packed.desc_extra)
 		goto err_desc_extra;
 
@@ -2059,13 +2064,14 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
 					void (*callback)(struct virtqueue *),
 					const char *name)
 {
+	int node = dev_to_node(&vdev->dev);
 	unsigned int i;
 	struct vring_virtqueue *vq;
 
 	if (virtio_has_feature(vdev, VIRTIO_F_RING_PACKED))
 		return NULL;
 
-	vq = kmalloc(sizeof(*vq), GFP_KERNEL);
+	vq = kmalloc_node(sizeof(*vq), GFP_KERNEL, node);
 	if (!vq)
 		return NULL;
 
@@ -2110,8 +2116,10 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
 					vq->split.avail_flags_shadow);
 	}
 
-	vq->split.desc_state = kmalloc_array(vring.num,
-			sizeof(struct vring_desc_state_split), GFP_KERNEL);
+	vq->split.desc_state = kmalloc_array_node(vring.num,
+			sizeof(struct vring_desc_state_split),
+			GFP_KERNEL,
+			node);
 	if (!vq->split.desc_state) {
 		kfree(vq);
 		return NULL;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 13cc653122b7..2216022d8987 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5053,7 +5053,7 @@ EXPORT_SYMBOL(alloc_pages_exact);
  *
  * Return: pointer to the allocated area or %NULL in case of error.
  */
-void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask)
+void *alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask)
 {
 	unsigned int order = get_order(size);
 	struct page *p;
-- 
2.26.2



* [RFC 3/3] virtio-blk: use NUMA-aware memory allocation in probe
  2020-06-25 13:57 [RFC 0/3] virtio: NUMA-aware memory allocation Stefan Hajnoczi
  2020-06-25 13:57 ` [RFC 1/3] virtio-pci: use NUMA-aware memory allocation in probe Stefan Hajnoczi
  2020-06-25 13:57 ` [RFC 2/3] virtio_ring: " Stefan Hajnoczi
@ 2020-06-25 13:57 ` Stefan Hajnoczi
  2020-06-28  6:34 ` [RFC 0/3] virtio: NUMA-aware memory allocation Jason Wang
  3 siblings, 0 replies; 8+ messages in thread
From: Stefan Hajnoczi @ 2020-06-25 13:57 UTC
  To: kvm; +Cc: virtualization, Michael S. Tsirkin, Jason Wang

Allocate frequently-accessed data structures from the NUMA node
associated with this device to avoid slow cross-NUMA node memory
accesses.

Only memory allocations that meet both of the following criteria are
made NUMA-aware:

1. The allocation is made during probe. If it were made in the data
   path instead, we would hopefully already be executing on a CPU in
   the same NUMA node as the device. If the CPU is not in the right
   NUMA node, it's unclear whether forcing memory allocations to use
   the device's NUMA node would increase or decrease performance.

2. The memory will be frequently accessed from the data path. There is
   no need to worry about data that is not accessed from
   performance-critical code paths.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 drivers/block/virtio_blk.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 9d21bf0f155e..40845e9ad3b1 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -482,6 +482,7 @@ static int init_vq(struct virtio_blk *vblk)
 	unsigned short num_vqs;
 	struct virtio_device *vdev = vblk->vdev;
 	struct irq_affinity desc = { 0, };
+	int node = dev_to_node(&vdev->dev);
 
 	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_MQ,
 				   struct virtio_blk_config, num_queues,
@@ -491,7 +492,8 @@ static int init_vq(struct virtio_blk *vblk)
 
 	num_vqs = min_t(unsigned int, nr_cpu_ids, num_vqs);
 
-	vblk->vqs = kmalloc_array(num_vqs, sizeof(*vblk->vqs), GFP_KERNEL);
+	vblk->vqs = kmalloc_array_node(num_vqs, sizeof(*vblk->vqs),
+				       GFP_KERNEL, node);
 	if (!vblk->vqs)
 		return -ENOMEM;
 
@@ -683,6 +685,7 @@ module_param_named(queue_depth, virtblk_queue_depth, uint, 0444);
 
 static int virtblk_probe(struct virtio_device *vdev)
 {
+	int node = dev_to_node(&vdev->dev);
 	struct virtio_blk *vblk;
 	struct request_queue *q;
 	int err, index;
@@ -714,7 +717,7 @@ static int virtblk_probe(struct virtio_device *vdev)
 
 	/* We need an extra sg elements at head and tail. */
 	sg_elems += 2;
-	vdev->priv = vblk = kmalloc(sizeof(*vblk), GFP_KERNEL);
+	vdev->priv = vblk = kmalloc_node(sizeof(*vblk), GFP_KERNEL, node);
 	if (!vblk) {
 		err = -ENOMEM;
 		goto out_free_index;
-- 
2.26.2



* Re: [RFC 0/3] virtio: NUMA-aware memory allocation
  2020-06-25 13:57 [RFC 0/3] virtio: NUMA-aware memory allocation Stefan Hajnoczi
                   ` (2 preceding siblings ...)
  2020-06-25 13:57 ` [RFC 3/3] virtio-blk: " Stefan Hajnoczi
@ 2020-06-28  6:34 ` Jason Wang
  2020-06-29  9:26   ` Stefan Hajnoczi
  3 siblings, 1 reply; 8+ messages in thread
From: Jason Wang @ 2020-06-28  6:34 UTC
  To: Stefan Hajnoczi, kvm; +Cc: virtualization, Michael S. Tsirkin


On 2020/6/25 9:57 PM, Stefan Hajnoczi wrote:
> These patches are not ready to be merged because I was unable to measure a
> performance improvement. I'm publishing them so they are archived in case
> someone picks up this work again in the future.
>
> The goal of these patches is to allocate virtqueues and driver state from the
> device's NUMA node for optimal memory access latency. Only guests with a vNUMA
> topology and virtio devices spread across vNUMA nodes benefit from this.  In
> other cases the memory placement is fine and we don't need to take NUMA into
> account inside the guest.
>
> These patches could be extended to virtio_net.ko and other devices in the
> future. I only tested virtio_blk.ko.
>
> The benchmark configuration was designed to trigger worst-case NUMA placement:
>   * Physical NVMe storage controller on host NUMA node 0
>   * IOThread pinned to host NUMA node 0
>   * virtio-blk-pci device in vNUMA node 1
>   * vCPU 0 on host NUMA node 1 and vCPU 1 on host NUMA node 0
>   * vCPU 0 in vNUMA node 0 and vCPU 1 in vNUMA node 1
>
> The intent is to have .probe() code run on vCPU 0 in vNUMA node 0 (host NUMA
> node 1) so that memory is allocated in the wrong NUMA node for the
> virtio-blk-pci device. Applying these patches fixes memory placement so that
> virtqueues and driver state are allocated in vNUMA node 1, where the
> virtio-blk-pci device is located.
>
> The fio 4KB randread benchmark results do not show a significant improvement:
>
> Name                  IOPS   Error
> virtio-blk        42373.79 ± 0.54%
> virtio-blk-numa   42517.07 ± 0.79%


I remember I did something similar in vhost by using page_to_nid() on
the descriptor ring, and I got little improvement, much as shown here.
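
Roughly, the idea there was to read the NUMA node back from the page
backing the ring and allocate related state on that node. A sketch of
the concept only, not the actual vhost patch (it assumes the ring is a
direct-mapped kernel virtual address, and the helper name is made up):

    #include <linux/mm.h>
    #include <linux/slab.h>

    /* Place per-ring state on the NUMA node of the page that backs
     * the ring itself, rather than on the current CPU's node.
     */
    static void *alloc_state_near_ring(void *ring, size_t size)
    {
            int node = page_to_nid(virt_to_page(ring));

            return kmalloc_node(size, GFP_KERNEL, node);
    }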

Michael reminded me that this was probably because all the data was
already cached. So I doubt the test puts sufficient stress on the
cache ...

Thanks


>
> Stefan Hajnoczi (3):
>    virtio-pci: use NUMA-aware memory allocation in probe
>    virtio_ring: use NUMA-aware memory allocation in probe
>    virtio-blk: use NUMA-aware memory allocation in probe
>
>   include/linux/gfp.h                |  2 +-
>   drivers/block/virtio_blk.c         |  7 +++++--
>   drivers/virtio/virtio_pci_common.c | 16 ++++++++++++----
>   drivers/virtio/virtio_ring.c       | 26 +++++++++++++++++---------
>   mm/page_alloc.c                    |  2 +-
>   5 files changed, 36 insertions(+), 17 deletions(-)
>
> -- 
> 2.26.2
>



* Re: [RFC 0/3] virtio: NUMA-aware memory allocation
  2020-06-28  6:34 ` [RFC 0/3] virtio: NUMA-aware memory allocation Jason Wang
@ 2020-06-29  9:26   ` Stefan Hajnoczi
  2020-06-29 15:28     ` Michael S. Tsirkin
  0 siblings, 1 reply; 8+ messages in thread
From: Stefan Hajnoczi @ 2020-06-29  9:26 UTC
  To: Jason Wang; +Cc: Stefan Hajnoczi, kvm, Michael S. Tsirkin, virtualization

On Sun, Jun 28, 2020 at 02:34:37PM +0800, Jason Wang wrote:
> 
> On 2020/6/25 9:57 PM, Stefan Hajnoczi wrote:
> > These patches are not ready to be merged because I was unable to measure a
> > performance improvement. I'm publishing them so they are archived in case
> > someone picks up this work again in the future.
> > 
> > The goal of these patches is to allocate virtqueues and driver state from the
> > device's NUMA node for optimal memory access latency. Only guests with a vNUMA
> > topology and virtio devices spread across vNUMA nodes benefit from this.  In
> > other cases the memory placement is fine and we don't need to take NUMA into
> > account inside the guest.
> > 
> > These patches could be extended to virtio_net.ko and other devices in the
> > future. I only tested virtio_blk.ko.
> > 
> > The benchmark configuration was designed to trigger worst-case NUMA placement:
> >   * Physical NVMe storage controller on host NUMA node 0
> >   * IOThread pinned to host NUMA node 0
> >   * virtio-blk-pci device in vNUMA node 1
> >   * vCPU 0 on host NUMA node 1 and vCPU 1 on host NUMA node 0
> >   * vCPU 0 in vNUMA node 0 and vCPU 1 in vNUMA node 1
> > 
> > The intent is to have .probe() code run on vCPU 0 in vNUMA node 0 (host NUMA
> > node 1) so that memory is allocated in the wrong NUMA node for the
> > virtio-blk-pci device. Applying these patches fixes memory placement so that
> > virtqueues and driver state are allocated in vNUMA node 1, where the
> > virtio-blk-pci device is located.
> > 
> > The fio 4KB randread benchmark results do not show a significant improvement:
> > 
> > Name                  IOPS   Error
> > virtio-blk        42373.79 ± 0.54%
> > virtio-blk-numa   42517.07 ± 0.79%
> 
> 
> I remember I did something similar in vhost by using page_to_nid() on
> the descriptor ring, and I got little improvement, much as shown here.
> 
> Michael reminded me that this was probably because all the data was
> already cached. So I doubt the test puts sufficient stress on the
> cache ...

Yes, that sounds likely. If there's no real-world performance
improvement then I'm happy to leave these patches unmerged.

Stefan


* Re: [RFC 0/3] virtio: NUMA-aware memory allocation
  2020-06-29  9:26   ` Stefan Hajnoczi
@ 2020-06-29 15:28     ` Michael S. Tsirkin
  2020-06-30  8:47       ` Stefan Hajnoczi
  0 siblings, 1 reply; 8+ messages in thread
From: Michael S. Tsirkin @ 2020-06-29 15:28 UTC
  To: Stefan Hajnoczi; +Cc: Jason Wang, Stefan Hajnoczi, kvm, virtualization

On Mon, Jun 29, 2020 at 10:26:46AM +0100, Stefan Hajnoczi wrote:
> On Sun, Jun 28, 2020 at 02:34:37PM +0800, Jason Wang wrote:
> > 
> > On 2020/6/25 9:57 PM, Stefan Hajnoczi wrote:
> > > These patches are not ready to be merged because I was unable to measure a
> > > performance improvement. I'm publishing them so they are archived in case
> > > someone picks up this work again in the future.
> > > 
> > > The goal of these patches is to allocate virtqueues and driver state from the
> > > device's NUMA node for optimal memory access latency. Only guests with a vNUMA
> > > topology and virtio devices spread across vNUMA nodes benefit from this.  In
> > > other cases the memory placement is fine and we don't need to take NUMA into
> > > account inside the guest.
> > > 
> > > These patches could be extended to virtio_net.ko and other devices in the
> > > future. I only tested virtio_blk.ko.
> > > 
> > > The benchmark configuration was designed to trigger worst-case NUMA placement:
> > >   * Physical NVMe storage controller on host NUMA node 0

It's possible that NUMA is not such a big deal for NVMe.
It's also possible that the BIOS misconfigures ACPI and reports NUMA
placement incorrectly.
I think the best thing to try is to use a ramdisk on a specific NUMA
node.

> > >   * IOThread pinned to host NUMA node 0
> > >   * virtio-blk-pci device in vNUMA node 1
> > >   * vCPU 0 on host NUMA node 1 and vCPU 1 on host NUMA node 0
> > >   * vCPU 0 in vNUMA node 0 and vCPU 1 in vNUMA node 1
> > > 
> > > The intent is to have .probe() code run on vCPU 0 in vNUMA node 0 (host NUMA
> > > node 1) so that memory is allocated in the wrong NUMA node for the
> > > virtio-blk-pci device. Applying these patches fixes memory placement so that
> > > virtqueues and driver state are allocated in vNUMA node 1, where the
> > > virtio-blk-pci device is located.
> > > 
> > > The fio 4KB randread benchmark results do not show a significant improvement:
> > > 
> > > Name                  IOPS   Error
> > > virtio-blk        42373.79 ± 0.54%
> > > virtio-blk-numa   42517.07 ± 0.79%
> > 
> > 
> > I remember I did something similar in vhost by using page_to_nid() on
> > the descriptor ring, and I got little improvement, much as shown here.
> > 
> > Michael reminded me that this was probably because all the data was
> > already cached. So I doubt the test puts sufficient stress on the
> > cache ...
> 
> Yes, that sounds likely. If there's no real-world performance
> improvement then I'm happy to leave these patches unmerged.
> 
> Stefan


Well, that was for vhost, though. This is virtio, which is different.
Isn't there some benchmark that puts pressure on the CPU cache?


I kind of feel there should be a difference, and the fact that there
isn't means there's some other bottleneck somewhere. It might be worth
figuring out.

-- 
MST



* Re: [RFC 0/3] virtio: NUMA-aware memory allocation
  2020-06-29 15:28     ` Michael S. Tsirkin
@ 2020-06-30  8:47       ` Stefan Hajnoczi
  0 siblings, 0 replies; 8+ messages in thread
From: Stefan Hajnoczi @ 2020-06-30  8:47 UTC
  To: Michael S. Tsirkin; +Cc: Stefan Hajnoczi, Jason Wang, kvm, virtualization

On Mon, Jun 29, 2020 at 11:28:41AM -0400, Michael S. Tsirkin wrote:
> On Mon, Jun 29, 2020 at 10:26:46AM +0100, Stefan Hajnoczi wrote:
> > On Sun, Jun 28, 2020 at 02:34:37PM +0800, Jason Wang wrote:
> > > 
> > > On 2020/6/25 9:57 PM, Stefan Hajnoczi wrote:
> > > > These patches are not ready to be merged because I was unable to measure a
> > > > performance improvement. I'm publishing them so they are archived in case
> > > > someone picks up this work again in the future.
> > > > 
> > > > The goal of these patches is to allocate virtqueues and driver state from the
> > > > device's NUMA node for optimal memory access latency. Only guests with a vNUMA
> > > > topology and virtio devices spread across vNUMA nodes benefit from this.  In
> > > > other cases the memory placement is fine and we don't need to take NUMA into
> > > > account inside the guest.
> > > > 
> > > > These patches could be extended to virtio_net.ko and other devices in the
> > > > future. I only tested virtio_blk.ko.
> > > > 
> > > > The benchmark configuration was designed to trigger worst-case NUMA placement:
> > > >   * Physical NVMe storage controller on host NUMA node 0
> 
> It's possible that NUMA is not such a big deal for NVMe.
> It's also possible that the BIOS misconfigures ACPI and reports NUMA
> placement incorrectly.
> I think the best thing to try is to use a ramdisk on a specific NUMA
> node.

Using a ramdisk is an interesting idea, thanks.

Stefan


