* [PATCH 1/2] virtio: introduce virtio_dev_to_node helper
@ 2021-09-26 14:55 Max Gurtovoy
  2021-09-26 14:55 ` [PATCH 2/2] virtio-blk: set NUMA affinity for a tagset Max Gurtovoy
                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Max Gurtovoy @ 2021-09-26 14:55 UTC (permalink / raw)
  To: mst, virtualization, kvm, stefanha
  Cc: oren, nitzanc, israelr, hch, linux-block, axboe, Max Gurtovoy

Also expose the numa_node field as a sysfs attribute. Virtio device
drivers will now be able to allocate memory that is node-local to the
device. This significantly helps performance and is often done in
other drivers such as NVMe, for example.
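
For illustration only (a sketch, not part of this patch; "priv" stands for a
hypothetical driver-private structure), a probe routine could then do:

  /* allocate driver state on the device's NUMA node */
  priv = kzalloc_node(sizeof(*priv), GFP_KERNEL,
                      virtio_dev_to_node(vdev));
  if (!priv)
          return -ENOMEM;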

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---
 drivers/virtio/virtio.c | 10 ++++++++++
 include/linux/virtio.h  | 13 +++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 588e02fb91d3..bdbd76c5c58c 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -60,12 +60,22 @@ static ssize_t features_show(struct device *_d,
 }
 static DEVICE_ATTR_RO(features);
 
+static ssize_t numa_node_show(struct device *_d,
+			      struct device_attribute *attr, char *buf)
+{
+	struct virtio_device *vdev = dev_to_virtio(_d);
+
+	return sysfs_emit(buf, "%d\n", virtio_dev_to_node(vdev));
+}
+static DEVICE_ATTR_RO(numa_node);
+
 static struct attribute *virtio_dev_attrs[] = {
 	&dev_attr_device.attr,
 	&dev_attr_vendor.attr,
 	&dev_attr_status.attr,
 	&dev_attr_modalias.attr,
 	&dev_attr_features.attr,
+	&dev_attr_numa_node.attr,
 	NULL,
 };
 ATTRIBUTE_GROUPS(virtio_dev);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 41edbc01ffa4..05b586ac71d1 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -125,6 +125,19 @@ static inline struct virtio_device *dev_to_virtio(struct device *_dev)
 	return container_of(_dev, struct virtio_device, dev);
 }
 
+/**
+ * virtio_dev_to_node - return the NUMA node for a given virtio device
+ * @vdev:	device to get the NUMA node for.
+ */
+static inline int virtio_dev_to_node(struct virtio_device *vdev)
+{
+	struct device *parent = vdev->dev.parent;
+
+	if (!parent)
+		return NUMA_NO_NODE;
+	return dev_to_node(parent);
+}
+
 void virtio_add_status(struct virtio_device *dev, unsigned int status);
 int register_virtio_device(struct virtio_device *dev);
 void unregister_virtio_device(struct virtio_device *dev);
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/2] virtio-blk: set NUMA affinity for a tagset
  2021-09-26 14:55 [PATCH 1/2] virtio: introduce virtio_dev_to_node helper Max Gurtovoy
@ 2021-09-26 14:55 ` Max Gurtovoy
  2021-09-27  8:09     ` Stefan Hajnoczi
  2021-09-27 11:34     ` Leon Romanovsky
  2021-09-27  8:02   ` Stefan Hajnoczi
  2021-09-27  9:31   ` Michael S. Tsirkin
  2 siblings, 2 replies; 28+ messages in thread
From: Max Gurtovoy @ 2021-09-26 14:55 UTC (permalink / raw)
  To: mst, virtualization, kvm, stefanha
  Cc: oren, nitzanc, israelr, hch, linux-block, axboe, Max Gurtovoy

To optimize performance, set the affinity of the block device tagset
according to the virtio device affinity.
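
Roughly speaking (a simplified sketch, not exact kernel code), blk-mq then
uses this hint as the fallback node for its per-queue tag and request
allocations, along these lines:

  /* sketch: hctx_node is the hardware queue's own node, if it has one */
  node = hctx_node != NUMA_NO_NODE ? hctx_node : set->numa_node;
  tags = kzalloc_node(sizeof(*tags), GFP_KERNEL, node);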

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---
 drivers/block/virtio_blk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 9b3bd083b411..1c68c3e0ebf9 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -774,7 +774,7 @@ static int virtblk_probe(struct virtio_device *vdev)
 	memset(&vblk->tag_set, 0, sizeof(vblk->tag_set));
 	vblk->tag_set.ops = &virtio_mq_ops;
 	vblk->tag_set.queue_depth = queue_depth;
-	vblk->tag_set.numa_node = NUMA_NO_NODE;
+	vblk->tag_set.numa_node = virtio_dev_to_node(vdev);
 	vblk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
 	vblk->tag_set.cmd_size =
 		sizeof(struct virtblk_req) +
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] virtio: introduce virtio_dev_to_node helper
  2021-09-26 14:55 [PATCH 1/2] virtio: introduce virtio_dev_to_node helper Max Gurtovoy
@ 2021-09-27  8:02   ` Stefan Hajnoczi
  2021-09-27  8:02   ` Stefan Hajnoczi
  2021-09-27  9:31   ` Michael S. Tsirkin
  2 siblings, 0 replies; 28+ messages in thread
From: Stefan Hajnoczi @ 2021-09-27  8:02 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: mst, virtualization, kvm, oren, nitzanc, israelr, hch,
	linux-block, axboe

On Sun, Sep 26, 2021 at 05:55:17PM +0300, Max Gurtovoy wrote:
> Also expose numa_node field as a sysfs attribute. Now virtio device
> drivers will be able to allocate memory that is node-local to the
> device. This significantly helps performance and it's oftenly used in
> other drivers such as NVMe, for example.
> 
> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> ---
>  drivers/virtio/virtio.c | 10 ++++++++++
>  include/linux/virtio.h  | 13 +++++++++++++
>  2 files changed, 23 insertions(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] virtio-blk: set NUMA affinity for a tagset
  2021-09-26 14:55 ` [PATCH 2/2] virtio-blk: set NUMA affinity for a tagset Max Gurtovoy
@ 2021-09-27  8:09     ` Stefan Hajnoczi
  2021-09-27 11:34     ` Leon Romanovsky
  1 sibling, 0 replies; 28+ messages in thread
From: Stefan Hajnoczi @ 2021-09-27  8:09 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: mst, virtualization, kvm, oren, nitzanc, israelr, hch,
	linux-block, axboe

On Sun, Sep 26, 2021 at 05:55:18PM +0300, Max Gurtovoy wrote:
> To optimize performance, set the affinity of the block device tagset
> according to the virtio device affinity.
> 
> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> ---
>  drivers/block/virtio_blk.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 9b3bd083b411..1c68c3e0ebf9 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -774,7 +774,7 @@ static int virtblk_probe(struct virtio_device *vdev)
>  	memset(&vblk->tag_set, 0, sizeof(vblk->tag_set));
>  	vblk->tag_set.ops = &virtio_mq_ops;
>  	vblk->tag_set.queue_depth = queue_depth;
> -	vblk->tag_set.numa_node = NUMA_NO_NODE;
> +	vblk->tag_set.numa_node = virtio_dev_to_node(vdev);
>  	vblk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
>  	vblk->tag_set.cmd_size =
>  		sizeof(struct virtblk_req) +

I implemented NUMA affinity in the past and could not demonstrate a
performance improvement:
https://lists.linuxfoundation.org/pipermail/virtualization/2020-June/048248.html

The pathological case is when a guest with vNUMA has the virtio-blk-pci
device on the "wrong" host NUMA node. Then memory accesses have to cross
NUMA nodes. Still, it didn't seem to matter.

Please share your benchmark results. If you haven't collected data yet
you could even combine our patches to see if it helps. Thanks!

Stefan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] virtio: introduce virtio_dev_to_node helper
  2021-09-26 14:55 [PATCH 1/2] virtio: introduce virtio_dev_to_node helper Max Gurtovoy
@ 2021-09-27  9:31   ` Michael S. Tsirkin
  2021-09-27  8:02   ` Stefan Hajnoczi
  2021-09-27  9:31   ` Michael S. Tsirkin
  2 siblings, 0 replies; 28+ messages in thread
From: Michael S. Tsirkin @ 2021-09-27  9:31 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: virtualization, kvm, stefanha, oren, nitzanc, israelr, hch,
	linux-block, axboe

On Sun, Sep 26, 2021 at 05:55:17PM +0300, Max Gurtovoy wrote:
> Also expose numa_node field as a sysfs attribute. Now virtio device
> drivers will be able to allocate memory that is node-local to the
> device. This significantly helps performance and it's oftenly used in
> other drivers such as NVMe, for example.
> 
> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>

If you have to respin this, it is better to split it into
two patches: one adding the helper and one adding the sysfs attribute.


> ---
>  drivers/virtio/virtio.c | 10 ++++++++++
>  include/linux/virtio.h  | 13 +++++++++++++
>  2 files changed, 23 insertions(+)
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 588e02fb91d3..bdbd76c5c58c 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -60,12 +60,22 @@ static ssize_t features_show(struct device *_d,
>  }
>  static DEVICE_ATTR_RO(features);
>  
> +static ssize_t numa_node_show(struct device *_d,
> +			      struct device_attribute *attr, char *buf)
> +{
> +	struct virtio_device *vdev = dev_to_virtio(_d);
> +
> +	return sysfs_emit(buf, "%d\n", virtio_dev_to_node(vdev));
> +}
> +static DEVICE_ATTR_RO(numa_node);
> +
>  static struct attribute *virtio_dev_attrs[] = {
>  	&dev_attr_device.attr,
>  	&dev_attr_vendor.attr,
>  	&dev_attr_status.attr,
>  	&dev_attr_modalias.attr,
>  	&dev_attr_features.attr,
> +	&dev_attr_numa_node.attr,
>  	NULL,
>  };
>  ATTRIBUTE_GROUPS(virtio_dev);
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 41edbc01ffa4..05b586ac71d1 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -125,6 +125,19 @@ static inline struct virtio_device *dev_to_virtio(struct device *_dev)
>  	return container_of(_dev, struct virtio_device, dev);
>  }
>  
> +/**
> + * virtio_dev_to_node - return the NUMA node for a given virtio device
> + * @vdev:	device to get the NUMA node for.
> + */
> +static inline int virtio_dev_to_node(struct virtio_device *vdev)
> +{
> +	struct device *parent = vdev->dev.parent;
> +
> +	if (!parent)
> +		return NUMA_NO_NODE;
> +	return dev_to_node(parent);
> +}
> +
>  void virtio_add_status(struct virtio_device *dev, unsigned int status);
>  int register_virtio_device(struct virtio_device *dev);
>  void unregister_virtio_device(struct virtio_device *dev);
> -- 
> 2.18.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] virtio-blk: set NUMA affinity for a tagset
  2021-09-26 14:55 ` [PATCH 2/2] virtio-blk: set NUMA affinity for a tagset Max Gurtovoy
@ 2021-09-27 11:34     ` Leon Romanovsky
  2021-09-27 11:34     ` Leon Romanovsky
  1 sibling, 0 replies; 28+ messages in thread
From: Leon Romanovsky @ 2021-09-27 11:34 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: mst, virtualization, kvm, stefanha, oren, nitzanc, israelr, hch,
	linux-block, axboe

On Sun, Sep 26, 2021 at 05:55:18PM +0300, Max Gurtovoy wrote:
> To optimize performance, set the affinity of the block device tagset
> according to the virtio device affinity.
> 
> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> ---
>  drivers/block/virtio_blk.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 9b3bd083b411..1c68c3e0ebf9 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -774,7 +774,7 @@ static int virtblk_probe(struct virtio_device *vdev)
>  	memset(&vblk->tag_set, 0, sizeof(vblk->tag_set));
>  	vblk->tag_set.ops = &virtio_mq_ops;
>  	vblk->tag_set.queue_depth = queue_depth;
> -	vblk->tag_set.numa_node = NUMA_NO_NODE;
> +	vblk->tag_set.numa_node = virtio_dev_to_node(vdev);

I'm afraid that by doing this, you will increase the chances of seeing OOM,
because with NUMA_NO_NODE, MM will try to allocate memory in the whole system,
while in the latter mode only on a specific NUMA node, which can be depleted.

Thanks

>  	vblk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
>  	vblk->tag_set.cmd_size =
>  		sizeof(struct virtblk_req) +
> -- 
> 2.18.1
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] virtio-blk: set NUMA affinity for a tagset
  2021-09-27 11:34     ` Leon Romanovsky
  (?)
@ 2021-09-27 17:25     ` Max Gurtovoy
  2021-09-27 18:23         ` Leon Romanovsky
  -1 siblings, 1 reply; 28+ messages in thread
From: Max Gurtovoy @ 2021-09-27 17:25 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: mst, virtualization, kvm, stefanha, oren, nitzanc, israelr, hch,
	linux-block, axboe


On 9/27/2021 2:34 PM, Leon Romanovsky wrote:
> On Sun, Sep 26, 2021 at 05:55:18PM +0300, Max Gurtovoy wrote:
>> To optimize performance, set the affinity of the block device tagset
>> according to the virtio device affinity.
>>
>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
>> ---
>>   drivers/block/virtio_blk.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>> index 9b3bd083b411..1c68c3e0ebf9 100644
>> --- a/drivers/block/virtio_blk.c
>> +++ b/drivers/block/virtio_blk.c
>> @@ -774,7 +774,7 @@ static int virtblk_probe(struct virtio_device *vdev)
>>   	memset(&vblk->tag_set, 0, sizeof(vblk->tag_set));
>>   	vblk->tag_set.ops = &virtio_mq_ops;
>>   	vblk->tag_set.queue_depth = queue_depth;
>> -	vblk->tag_set.numa_node = NUMA_NO_NODE;
>> +	vblk->tag_set.numa_node = virtio_dev_to_node(vdev);
> I afraid that by doing it, you will increase chances to see OOM, because
> in NUMA_NO_NODE, MM will try allocate memory in whole system, while in
> the latter mode only on specific NUMA which can be depleted.

This is a common methodology we use in the block layer and in the NVMe
subsystem, and we are not afraid of the OOM issue you raised.

This is not new, and I guess that the kernel MM handles (or should
handle) the fallback you raised.
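
For what it's worth, as a rough illustration: a node passed to kzalloc_node()
with plain GFP_KERNEL is only a preference, and the allocator may fall back to
other nodes unless __GFP_THISNODE is set, e.g.:

  /* illustration: preferred node vs. hard node binding */
  p = kzalloc_node(size, GFP_KERNEL, nid);                  /* may fall back */
  q = kzalloc_node(size, GFP_KERNEL | __GFP_THISNODE, nid); /* restricted to nid */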

Anyway, if we're doing this in NVMe, I don't see a reason to be afraid of
doing it in virtio-blk.
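
For reference, the NVMe PCI driver sets the same hint roughly like this
(paraphrased, not an exact quote of the kernel source):

  /* paraphrased: NVMe places its tag set on the PCI device's node */
  dev->tagset.numa_node = dev_to_node(dev->dev);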

Also, I sent a patch a few weeks ago that decreases the memory consumption
of virtio-blk, so I guess we'll be just fine.

>
> Thanks
>
>>   	vblk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
>>   	vblk->tag_set.cmd_size =
>>   		sizeof(struct virtblk_req) +
>> -- 
>> 2.18.1
>>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] virtio-blk: set NUMA affinity for a tagset
  2021-09-27  8:09     ` Stefan Hajnoczi
  (?)
@ 2021-09-27 17:39     ` Max Gurtovoy
  2021-09-28  6:47         ` Stefan Hajnoczi
  -1 siblings, 1 reply; 28+ messages in thread
From: Max Gurtovoy @ 2021-09-27 17:39 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: mst, virtualization, kvm, oren, nitzanc, israelr, hch,
	linux-block, axboe


On 9/27/2021 11:09 AM, Stefan Hajnoczi wrote:
> On Sun, Sep 26, 2021 at 05:55:18PM +0300, Max Gurtovoy wrote:
>> To optimize performance, set the affinity of the block device tagset
>> according to the virtio device affinity.
>>
>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
>> ---
>>   drivers/block/virtio_blk.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>> index 9b3bd083b411..1c68c3e0ebf9 100644
>> --- a/drivers/block/virtio_blk.c
>> +++ b/drivers/block/virtio_blk.c
>> @@ -774,7 +774,7 @@ static int virtblk_probe(struct virtio_device *vdev)
>>   	memset(&vblk->tag_set, 0, sizeof(vblk->tag_set));
>>   	vblk->tag_set.ops = &virtio_mq_ops;
>>   	vblk->tag_set.queue_depth = queue_depth;
>> -	vblk->tag_set.numa_node = NUMA_NO_NODE;
>> +	vblk->tag_set.numa_node = virtio_dev_to_node(vdev);
>>   	vblk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
>>   	vblk->tag_set.cmd_size =
>>   		sizeof(struct virtblk_req) +
> I implemented NUMA affinity in the past and could not demonstrate a
> performance improvement:
> https://lists.linuxfoundation.org/pipermail/virtualization/2020-June/048248.html
>
> The pathological case is when a guest with vNUMA has the virtio-blk-pci
> device on the "wrong" host NUMA node. Then memory accesses should cross
> NUMA nodes. Still, it didn't seem to matter.

I think the reason you didn't see any improvement is that you didn't
use the right device for the node query. See my patch 1/2.

I can try integrating these patches into my series and fixing that.

BTW, we might not see a big improvement because of other bottlenecks, but
this is a known perf optimization that we use often in block storage drivers.


>
> Please share your benchmark results. If you haven't collected data yet
> you could even combine our patches to see if it helps. Thanks!
>
> Stefan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] virtio-blk: set NUMA affinity for a tagset
  2021-09-27 17:25     ` Max Gurtovoy
@ 2021-09-27 18:23         ` Leon Romanovsky
  0 siblings, 0 replies; 28+ messages in thread
From: Leon Romanovsky @ 2021-09-27 18:23 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: mst, virtualization, kvm, stefanha, oren, nitzanc, israelr, hch,
	linux-block, axboe

On Mon, Sep 27, 2021 at 08:25:09PM +0300, Max Gurtovoy wrote:
> 
> On 9/27/2021 2:34 PM, Leon Romanovsky wrote:
> > On Sun, Sep 26, 2021 at 05:55:18PM +0300, Max Gurtovoy wrote:
> > > To optimize performance, set the affinity of the block device tagset
> > > according to the virtio device affinity.
> > > 
> > > Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> > > ---
> > >   drivers/block/virtio_blk.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > > index 9b3bd083b411..1c68c3e0ebf9 100644
> > > --- a/drivers/block/virtio_blk.c
> > > +++ b/drivers/block/virtio_blk.c
> > > @@ -774,7 +774,7 @@ static int virtblk_probe(struct virtio_device *vdev)
> > >   	memset(&vblk->tag_set, 0, sizeof(vblk->tag_set));
> > >   	vblk->tag_set.ops = &virtio_mq_ops;
> > >   	vblk->tag_set.queue_depth = queue_depth;
> > > -	vblk->tag_set.numa_node = NUMA_NO_NODE;
> > > +	vblk->tag_set.numa_node = virtio_dev_to_node(vdev);
> > I afraid that by doing it, you will increase chances to see OOM, because
> > in NUMA_NO_NODE, MM will try allocate memory in whole system, while in
> > the latter mode only on specific NUMA which can be depleted.
> 
> This is a common methodology we use in the block layer and in NVMe subsystem
> and we don't afraid of the OOM issue you raised.

There are many reasons for that, but we are talking about virtio here
and not about NVMe.

> 
> This is not new and I guess that the kernel MM will (or should) be handling
> the fallback you raised.

I'm afraid that it is not. Can you point me to the place where such a
fallback is implemented?

> 
> Anyway, if we're doing this in NVMe I don't see a reason to afraid doing it
> in virtio-blk.

Still, it would be nice to have some empirical data to support this copy/paste.

There are too many myths related to optimizations, so it will finally be
good to get some supporting data.

Thanks

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] virtio-blk: set NUMA affinity for a tagset
  2021-09-27 17:39     ` Max Gurtovoy
@ 2021-09-28  6:47         ` Stefan Hajnoczi
  0 siblings, 0 replies; 28+ messages in thread
From: Stefan Hajnoczi @ 2021-09-28  6:47 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: mst, virtualization, kvm, oren, nitzanc, israelr, hch,
	linux-block, axboe

On Mon, Sep 27, 2021 at 08:39:30PM +0300, Max Gurtovoy wrote:
> 
> On 9/27/2021 11:09 AM, Stefan Hajnoczi wrote:
> > On Sun, Sep 26, 2021 at 05:55:18PM +0300, Max Gurtovoy wrote:
> > > To optimize performance, set the affinity of the block device tagset
> > > according to the virtio device affinity.
> > > 
> > > Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> > > ---
> > >   drivers/block/virtio_blk.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > > index 9b3bd083b411..1c68c3e0ebf9 100644
> > > --- a/drivers/block/virtio_blk.c
> > > +++ b/drivers/block/virtio_blk.c
> > > @@ -774,7 +774,7 @@ static int virtblk_probe(struct virtio_device *vdev)
> > >   	memset(&vblk->tag_set, 0, sizeof(vblk->tag_set));
> > >   	vblk->tag_set.ops = &virtio_mq_ops;
> > >   	vblk->tag_set.queue_depth = queue_depth;
> > > -	vblk->tag_set.numa_node = NUMA_NO_NODE;
> > > +	vblk->tag_set.numa_node = virtio_dev_to_node(vdev);
> > >   	vblk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
> > >   	vblk->tag_set.cmd_size =
> > >   		sizeof(struct virtblk_req) +
> > I implemented NUMA affinity in the past and could not demonstrate a
> > performance improvement:
> > https://lists.linuxfoundation.org/pipermail/virtualization/2020-June/048248.html
> > 
> > The pathological case is when a guest with vNUMA has the virtio-blk-pci
> > device on the "wrong" host NUMA node. Then memory accesses should cross
> > NUMA nodes. Still, it didn't seem to matter.
> 
> I think the reason you didn't see any improvement is since you didn't use
> the right device for the node query. See my patch 1/2.

That doesn't seem to be the case. Please see
drivers/base/core.c:device_add():

  /* use parent numa_node */
  if (parent && (dev_to_node(dev) == NUMA_NO_NODE))
          set_dev_node(dev, dev_to_node(parent));

IMO it's cleaner to use dev_to_node(&vdev->dev) than to directly access
the parent.

Have I missed something?
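
In other words, the helper could arguably be reduced to a sketch like this,
relying on the driver core propagating the parent's node:

  static inline int virtio_dev_to_node(struct virtio_device *vdev)
  {
          /* the core already copied the parent's node at device_add() time */
          return dev_to_node(&vdev->dev);
  }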

> 
> I can try integrating these patches in my series and fix it.
> 
> BTW, we might not see a big improvement because of other bottlenecks but
> this is known perf optimization we use often in block storage drivers.

Let's see benchmark results. Otherwise this is just dead code that adds
complexity.

Stefan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] virtio-blk: set NUMA affinity for a tagset
  2021-09-27 18:23         ` Leon Romanovsky
  (?)
@ 2021-09-28 15:59         ` Max Gurtovoy
  2021-09-28 16:27             ` Leon Romanovsky
  -1 siblings, 1 reply; 28+ messages in thread
From: Max Gurtovoy @ 2021-09-28 15:59 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: mst, virtualization, kvm, stefanha, oren, nitzanc, israelr, hch,
	linux-block, axboe, Yaron Gepstein, Jason Gunthorpe


On 9/27/2021 9:23 PM, Leon Romanovsky wrote:
> On Mon, Sep 27, 2021 at 08:25:09PM +0300, Max Gurtovoy wrote:
>> On 9/27/2021 2:34 PM, Leon Romanovsky wrote:
>>> On Sun, Sep 26, 2021 at 05:55:18PM +0300, Max Gurtovoy wrote:
>>>> To optimize performance, set the affinity of the block device tagset
>>>> according to the virtio device affinity.
>>>>
>>>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
>>>> ---
>>>>    drivers/block/virtio_blk.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>>>> index 9b3bd083b411..1c68c3e0ebf9 100644
>>>> --- a/drivers/block/virtio_blk.c
>>>> +++ b/drivers/block/virtio_blk.c
>>>> @@ -774,7 +774,7 @@ static int virtblk_probe(struct virtio_device *vdev)
>>>>    	memset(&vblk->tag_set, 0, sizeof(vblk->tag_set));
>>>>    	vblk->tag_set.ops = &virtio_mq_ops;
>>>>    	vblk->tag_set.queue_depth = queue_depth;
>>>> -	vblk->tag_set.numa_node = NUMA_NO_NODE;
>>>> +	vblk->tag_set.numa_node = virtio_dev_to_node(vdev);
>>> I afraid that by doing it, you will increase chances to see OOM, because
>>> in NUMA_NO_NODE, MM will try allocate memory in whole system, while in
>>> the latter mode only on specific NUMA which can be depleted.
>> This is a common methodology we use in the block layer and in NVMe subsystem
>> and we don't afraid of the OOM issue you raised.
> There are many reasons for that, but we are talking about virtio here
> and not about NVMe.

OK, what reasons?


>
>> This is not new and I guess that the kernel MM will (or should) be handling
>> the fallback you raised.
> I afraid that it is not. Can you point me to the place where such
> fallback is implemented?
>
>> Anyway, if we're doing this in NVMe I don't see a reason to afraid doing it
>> in virtio-blk.
> Still, it is nice to have some empirical data to support this copy/paste.

I'm not sure what you meant by that, but this was just an
example, as everyone else saw. No copy/paste.

Anyhow, taking good ideas from other Linux subsystems is common, and I
don't see a problem with doing so.

I'll let the storage experts comment on that and the maintainers
decide whether they would like to take this optimization, among the other
optimizations I've sent to this subsystem.

I'll send a v2 in a few days.

Thanks.

>
> There are too many myths related to optimizations, so finally it will be
> good to get some supportive data.
>
> Thanks

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] virtio: introduce virtio_dev_to_node helper
  2021-09-27  9:31   ` Michael S. Tsirkin
  (?)
@ 2021-09-28 16:14   ` Max Gurtovoy
  -1 siblings, 0 replies; 28+ messages in thread
From: Max Gurtovoy @ 2021-09-28 16:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, kvm, stefanha, oren, nitzanc, israelr, hch,
	linux-block, axboe


On 9/27/2021 12:31 PM, Michael S. Tsirkin wrote:
> On Sun, Sep 26, 2021 at 05:55:17PM +0300, Max Gurtovoy wrote:
>> Also expose numa_node field as a sysfs attribute. Now virtio device
>> drivers will be able to allocate memory that is node-local to the
>> device. This significantly helps performance and it's oftenly used in
>> other drivers such as NVMe, for example.
>>
>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> If you have to respin this, it is better to split this in
> two patches, one with the helper one adding a sysfs attribute.

It's not a problem, but it will cause the first commit to include a
helper that is not used anywhere.

I'm not sure that is preferable, but I can do it.

>
>
>> ---
>>   drivers/virtio/virtio.c | 10 ++++++++++
>>   include/linux/virtio.h  | 13 +++++++++++++
>>   2 files changed, 23 insertions(+)
>>
>> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
>> index 588e02fb91d3..bdbd76c5c58c 100644
>> --- a/drivers/virtio/virtio.c
>> +++ b/drivers/virtio/virtio.c
>> @@ -60,12 +60,22 @@ static ssize_t features_show(struct device *_d,
>>   }
>>   static DEVICE_ATTR_RO(features);
>>   
>> +static ssize_t numa_node_show(struct device *_d,
>> +			      struct device_attribute *attr, char *buf)
>> +{
>> +	struct virtio_device *vdev = dev_to_virtio(_d);
>> +
>> +	return sysfs_emit(buf, "%d\n", virtio_dev_to_node(vdev));
>> +}
>> +static DEVICE_ATTR_RO(numa_node);
>> +
>>   static struct attribute *virtio_dev_attrs[] = {
>>   	&dev_attr_device.attr,
>>   	&dev_attr_vendor.attr,
>>   	&dev_attr_status.attr,
>>   	&dev_attr_modalias.attr,
>>   	&dev_attr_features.attr,
>> +	&dev_attr_numa_node.attr,
>>   	NULL,
>>   };
>>   ATTRIBUTE_GROUPS(virtio_dev);
>> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
>> index 41edbc01ffa4..05b586ac71d1 100644
>> --- a/include/linux/virtio.h
>> +++ b/include/linux/virtio.h
>> @@ -125,6 +125,19 @@ static inline struct virtio_device *dev_to_virtio(struct device *_dev)
>>   	return container_of(_dev, struct virtio_device, dev);
>>   }
>>   
>> +/**
>> + * virtio_dev_to_node - return the NUMA node for a given virtio device
>> + * @vdev:	device to get the NUMA node for.
>> + */
>> +static inline int virtio_dev_to_node(struct virtio_device *vdev)
>> +{
>> +	struct device *parent = vdev->dev.parent;
>> +
>> +	if (!parent)
>> +		return NUMA_NO_NODE;
>> +	return dev_to_node(parent);
>> +}
>> +
>>   void virtio_add_status(struct virtio_device *dev, unsigned int status);
>>   int register_virtio_device(struct virtio_device *dev);
>>   void unregister_virtio_device(struct virtio_device *dev);
>> -- 
>> 2.18.1

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] virtio-blk: set NUMA affinity for a tagset
  2021-09-28 15:59         ` Max Gurtovoy
@ 2021-09-28 16:27             ` Leon Romanovsky
  0 siblings, 0 replies; 28+ messages in thread
From: Leon Romanovsky @ 2021-09-28 16:27 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: mst, virtualization, kvm, stefanha, oren, nitzanc, israelr, hch,
	linux-block, axboe, Yaron Gepstein, Jason Gunthorpe

On Tue, Sep 28, 2021 at 06:59:15PM +0300, Max Gurtovoy wrote:
> 
> On 9/27/2021 9:23 PM, Leon Romanovsky wrote:
> > On Mon, Sep 27, 2021 at 08:25:09PM +0300, Max Gurtovoy wrote:
> > > On 9/27/2021 2:34 PM, Leon Romanovsky wrote:
> > > > On Sun, Sep 26, 2021 at 05:55:18PM +0300, Max Gurtovoy wrote:
> > > > > To optimize performance, set the affinity of the block device tagset
> > > > > according to the virtio device affinity.
> > > > > 
> > > > > Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> > > > > ---
> > > > >    drivers/block/virtio_blk.c | 2 +-
> > > > >    1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > > > > index 9b3bd083b411..1c68c3e0ebf9 100644
> > > > > --- a/drivers/block/virtio_blk.c
> > > > > +++ b/drivers/block/virtio_blk.c
> > > > > @@ -774,7 +774,7 @@ static int virtblk_probe(struct virtio_device *vdev)
> > > > >    	memset(&vblk->tag_set, 0, sizeof(vblk->tag_set));
> > > > >    	vblk->tag_set.ops = &virtio_mq_ops;
> > > > >    	vblk->tag_set.queue_depth = queue_depth;
> > > > > -	vblk->tag_set.numa_node = NUMA_NO_NODE;
> > > > > +	vblk->tag_set.numa_node = virtio_dev_to_node(vdev);
> > > > I afraid that by doing it, you will increase chances to see OOM, because
> > > > in NUMA_NO_NODE, MM will try allocate memory in whole system, while in
> > > > the latter mode only on specific NUMA which can be depleted.
> > > This is a common methodology we use in the block layer and in NVMe subsystem
> > > and we don't afraid of the OOM issue you raised.
> > There are many reasons for that, but we are talking about virtio here
> > and not about NVMe.
> 
> Ok. what reasons ?

For example, NVMe devices are physical devices that rely on DMA operations,
PCI connectivity, etc. to operate. Such systems can indeed benefit from
NUMA locality hints. In the end, these devices are physically connected
to that NUMA node.

In our case, virtio-blk is a software interface that doesn't have all
these limitations. On the contrary, a virtio-blk device can be created on one
CPU and later moved to be close to the QEMU process, which can run on another
NUMA node.

Also, this patch increases the chances of hitting OOM by a factor of the
number of NUMA nodes. Before your patch, virtio_blk could allocate from X
memory; after your patch it will be X divided by the number of NUMA nodes.

In addition, it has every chance of even hurting performance.

So yes, post a v2, but as Stefan and I asked, please provide supporting
performance results, because the fact that something was done for another
subsystem doesn't mean that it is applicable here.

Thanks

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] virtio-blk: set NUMA affinity for a tagset
  2021-09-28 16:27             ` Leon Romanovsky
  (?)
@ 2021-09-28 23:28             ` Max Gurtovoy
  2021-09-29  6:50                 ` Leon Romanovsky
  -1 siblings, 1 reply; 28+ messages in thread
From: Max Gurtovoy @ 2021-09-28 23:28 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: mst, virtualization, kvm, stefanha, oren, nitzanc, israelr, hch,
	linux-block, axboe, Yaron Gepstein, Jason Gunthorpe


On 9/28/2021 7:27 PM, Leon Romanovsky wrote:
> On Tue, Sep 28, 2021 at 06:59:15PM +0300, Max Gurtovoy wrote:
>> On 9/27/2021 9:23 PM, Leon Romanovsky wrote:
>>> On Mon, Sep 27, 2021 at 08:25:09PM +0300, Max Gurtovoy wrote:
>>>> On 9/27/2021 2:34 PM, Leon Romanovsky wrote:
>>>>> On Sun, Sep 26, 2021 at 05:55:18PM +0300, Max Gurtovoy wrote:
>>>>>> To optimize performance, set the affinity of the block device tagset
>>>>>> according to the virtio device affinity.
>>>>>>
>>>>>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
>>>>>> ---
>>>>>>     drivers/block/virtio_blk.c | 2 +-
>>>>>>     1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>>>>>> index 9b3bd083b411..1c68c3e0ebf9 100644
>>>>>> --- a/drivers/block/virtio_blk.c
>>>>>> +++ b/drivers/block/virtio_blk.c
>>>>>> @@ -774,7 +774,7 @@ static int virtblk_probe(struct virtio_device *vdev)
>>>>>>     	memset(&vblk->tag_set, 0, sizeof(vblk->tag_set));
>>>>>>     	vblk->tag_set.ops = &virtio_mq_ops;
>>>>>>     	vblk->tag_set.queue_depth = queue_depth;
>>>>>> -	vblk->tag_set.numa_node = NUMA_NO_NODE;
>>>>>> +	vblk->tag_set.numa_node = virtio_dev_to_node(vdev);
>>>>> I afraid that by doing it, you will increase chances to see OOM, because
>>>>> in NUMA_NO_NODE, MM will try allocate memory in whole system, while in
>>>>> the latter mode only on specific NUMA which can be depleted.
>>>> This is a common methodology we use in the block layer and in NVMe subsystem
>>>> and we don't afraid of the OOM issue you raised.
>>> There are many reasons for that, but we are talking about virtio here
>>> and not about NVMe.
>> Ok. what reasons ?
> For example, NVMe are physical devices that rely on DMA operations,
> PCI connectivity e.t.c to operate. Such systems indeed can benefit from
> NUMA locality hints. At the end, these devices are physically connected
> to that NUMA node.

FYI, virtio devices are also physical devices that have a PCI interface and
rely on DMA operations.

From the virtio spec: "Virtio devices use normal bus mechanisms of
interrupts and DMA which should be familiar to any device driver author".

Also, we develop virtio HW at NVIDIA for blk and net devices with our
SNAP technology.

These devices are connected to the host via the PCI bus.

We also support SR-IOV.

The same is true for paravirt devices that are emulated by QEMU; the guest
still sees them as PCI devices.

>
> In our case, virtio-blk is a software interface that doesn't have all
> these limitations. On the contrary, the virtio-blk can be created on one
> CPU and moved later to be close to the QEMU which can run on another NUMA
> node.

Not at all. Virtio is a HW interface.

I don't understand what you are saying here.

>
> Also this patch increases chances to get OOM by factor of NUMA nodes.

This is common practice in Linux for storage drivers. Why does it
bother you at all?

I already decreased the memory footprint of virtio-blk devices.


> Before your patch, the virtio_blk can allocate from X memory, after your
> patch it will be X/NUMB_NUMA_NODES.

So go ahead and change the whole block layer if it bothers you so much.

Also, please change the NVMe subsystem when you do it.

And let's see what the community will say.

> In addition, it has all chances to even hurt performance.
>
> So yes, post v2, but as Stefan and I asked, please provide supportive
> performance results, because what was done for another subsystem doesn't
> mean that it will be applicable here.

I will measure the perf, but even if we don't see an improvement, since this
might not be the bottleneck, these changes should be merged because this is
the way the block layer is optimized.

This is a micro-optimization that is also commonly used in other subsystems.
And none of your above reasons (PCI, SW device, DMA) is true.

A virtio-blk device is in 99% of cases a PCI device (paravirt or real HW),
exactly like any other PCI device you are familiar with.

It's physically connected to some slot, and it has a BAR, MMIO,
configuration space, etc.

Thanks.

>
> Thanks

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] virtio-blk: set NUMA affinity for a tagset
  2021-09-28 23:28             ` Max Gurtovoy
@ 2021-09-29  6:50                 ` Leon Romanovsky
  0 siblings, 0 replies; 28+ messages in thread
From: Leon Romanovsky @ 2021-09-29  6:50 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: mst, virtualization, kvm, stefanha, oren, nitzanc, israelr, hch,
	linux-block, axboe, Yaron Gepstein, Jason Gunthorpe

On Wed, Sep 29, 2021 at 02:28:08AM +0300, Max Gurtovoy wrote:
> 
> On 9/28/2021 7:27 PM, Leon Romanovsky wrote:
> > On Tue, Sep 28, 2021 at 06:59:15PM +0300, Max Gurtovoy wrote:
> > > On 9/27/2021 9:23 PM, Leon Romanovsky wrote:
> > > > On Mon, Sep 27, 2021 at 08:25:09PM +0300, Max Gurtovoy wrote:
> > > > > On 9/27/2021 2:34 PM, Leon Romanovsky wrote:
> > > > > > On Sun, Sep 26, 2021 at 05:55:18PM +0300, Max Gurtovoy wrote:
> > > > > > > To optimize performance, set the affinity of the block device tagset
> > > > > > > according to the virtio device affinity.
> > > > > > > 
> > > > > > > Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> > > > > > > ---
> > > > > > >     drivers/block/virtio_blk.c | 2 +-
> > > > > > >     1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > 
> > > > > > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > > > > > > index 9b3bd083b411..1c68c3e0ebf9 100644
> > > > > > > --- a/drivers/block/virtio_blk.c
> > > > > > > +++ b/drivers/block/virtio_blk.c
> > > > > > > @@ -774,7 +774,7 @@ static int virtblk_probe(struct virtio_device *vdev)
> > > > > > >     	memset(&vblk->tag_set, 0, sizeof(vblk->tag_set));
> > > > > > >     	vblk->tag_set.ops = &virtio_mq_ops;
> > > > > > >     	vblk->tag_set.queue_depth = queue_depth;
> > > > > > > -	vblk->tag_set.numa_node = NUMA_NO_NODE;
> > > > > > > +	vblk->tag_set.numa_node = virtio_dev_to_node(vdev);
> > > > > > I afraid that by doing it, you will increase chances to see OOM, because
> > > > > > in NUMA_NO_NODE, MM will try allocate memory in whole system, while in
> > > > > > the latter mode only on specific NUMA which can be depleted.
> > > > > This is a common methodology we use in the block layer and in NVMe subsystem
> > > > > and we don't afraid of the OOM issue you raised.
> > > > There are many reasons for that, but we are talking about virtio here
> > > > and not about NVMe.
> > > Ok. what reasons ?
> > For example, NVMe are physical devices that rely on DMA operations,
> > PCI connectivity e.t.c to operate. Such systems indeed can benefit from
> > NUMA locality hints. At the end, these devices are physically connected
> > to that NUMA node.
> 
> FYI Virtio devices are also physical devices that have PCI interface and
> rely on DMA operations.
> 
> from virtio spec: "Virtio devices use normal bus mechanisms of interrupts
> and DMA which should be familiar
> to any device driver author".

Yes, this is how a bus in Linux is implemented; there is nothing new here.

> 
> Also we develop virtio HW at NVIDIA for blk and net devices with our SNAP
> technology.
> 
> These devices are connected via PCI bus to the host.

How is all this related to the general virtio-blk implementation?

> 
> We also support SRIOV.
> 
> Same it true also for paravirt devices that are emulated by QEMU but still
> the guest sees them as PCI devices.

Yes, the key word here is "emulated".

> 
> > 
> > In our case, virtio-blk is a software interface that doesn't have all
> > these limitations. On the contrary, the virtio-blk can be created on one
> > CPU and moved later to be close to the QEMU which can run on another NUMA
> > node.
> 
> Not at all. virtio is HW interface.

Virtio devices are para-virtualized devices that are represented as HW
interfaces in the guest OS. They don't need to be real devices in the
hypervisor, which is my (and probably most of the world's) use case.

My QEMU command line contains something like this: "-drive file=IMAGE.img,if=virtio".

> 
> I don't understand what are you saying here ?
> 
> > 
> > Also this patch increases chances to get OOM by factor of NUMA nodes.
> 
> This is common practice in Linux for storage drivers. Why does it bothers
> you at all ?

Do I need a reason to ask for clarification of a publicly posted patch
on an open mailing list?

I use virtio and care about it.

> 
> I already decreased the memory footprint for virtio blk devices.

As I wrote before, you decreased it by several KB, but with this patch you
limited the available memory far more drastically.

> 
> 
> > Before your patch, virtio_blk could allocate from X memory; after your
> > patch it will be X divided by the number of NUMA nodes.
> 
> So go ahead and change the whole block layer if it bothers you so much.
> 
> Also please change the NVMe subsystem when you do it.

I suggest a less radical approach: don't take patches without a proven
benefit.

We are in 2021; let's rely on the NUMA node policy.

> 
> And let's see what the community will say.

Stefan asked you for performance data too. I'm not alone here.

> 
> > In addition, it has every chance of even hurting performance.
> > 
> > So yes, post a v2, but as Stefan and I asked, please provide supporting
> > performance results, because what was done for another subsystem doesn't
> > mean that it will be applicable here.
> 
> I will measure the performance, but even if we don't see an improvement
> because this might not be the bottleneck, these changes should be merged,
> since this is the way the block layer is optimized.

That is not an acceptance criterion for merging patches.

> 
> This is a micro-optimization that is commonly used in other subsystems as
> well. And none of your reasons above (PCI, SW device, DMA) is true.

Every subsystem is different; in some it makes sense, in others it doesn't.

We (RDMA) had a very long discussion (together with perf data) and a heavily
tailored test to measure the influence of per-node allocations, and guess what?
We didn't see any performance advantage.

https://lore.kernel.org/linux-rdma/c34a864803f9bbd33d3f856a6ba2dd595ab708a7.1620729033.git.leonro@nvidia.com/

> 
> A virtio-blk device is, in 99% of cases, a PCI device (paravirt or real HW),
> exactly like any other PCI device you are familiar with.
> 
> It's physically connected to some slot; it has a BAR, MMIO, configuration
> space, etc.

In the general case, that is far from true.

> 
> Thanks.
> 
> > 
> > Thanks
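
To make the trade-off argued above concrete, the tag-set node hint ends up
steering allocations roughly along this pattern (an illustrative sketch only,
not the actual blk-mq code; the function name here is made up):

  /*
   * With NUMA_NO_NODE the allocator may place the memory on any node;
   * with a specific node it prefers that node's memory - which is both
   * the locality win and the per-node memory pressure being debated here.
   */
  static void *example_alloc_per_queue_data(struct blk_mq_tag_set *set,
                                            size_t size)
  {
          return kzalloc_node(size, GFP_KERNEL, set->numa_node);
  }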

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] virtio-blk: set NUMA affinity for a tagset
  2021-09-29  6:50                 ` Leon Romanovsky
  (?)
@ 2021-09-29  9:48                 ` Max Gurtovoy
  -1 siblings, 0 replies; 28+ messages in thread
From: Max Gurtovoy @ 2021-09-29  9:48 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: mst, virtualization, kvm, stefanha, oren, nitzanc, israelr, hch,
	linux-block, axboe, Yaron Gepstein, Jason Gunthorpe


On 9/29/2021 9:50 AM, Leon Romanovsky wrote:
> On Wed, Sep 29, 2021 at 02:28:08AM +0300, Max Gurtovoy wrote:
>> On 9/28/2021 7:27 PM, Leon Romanovsky wrote:
>>> On Tue, Sep 28, 2021 at 06:59:15PM +0300, Max Gurtovoy wrote:
>>>> On 9/27/2021 9:23 PM, Leon Romanovsky wrote:
>>>>> On Mon, Sep 27, 2021 at 08:25:09PM +0300, Max Gurtovoy wrote:
>>>>>> On 9/27/2021 2:34 PM, Leon Romanovsky wrote:
>>>>>>> On Sun, Sep 26, 2021 at 05:55:18PM +0300, Max Gurtovoy wrote:
>>>>>>>> To optimize performance, set the affinity of the block device tagset
>>>>>>>> according to the virtio device affinity.
>>>>>>>>
>>>>>>>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
>>>>>>>> ---
>>>>>>>>      drivers/block/virtio_blk.c | 2 +-
>>>>>>>>      1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>>>>>>>> index 9b3bd083b411..1c68c3e0ebf9 100644
>>>>>>>> --- a/drivers/block/virtio_blk.c
>>>>>>>> +++ b/drivers/block/virtio_blk.c
>>>>>>>> @@ -774,7 +774,7 @@ static int virtblk_probe(struct virtio_device *vdev)
>>>>>>>>      	memset(&vblk->tag_set, 0, sizeof(vblk->tag_set));
>>>>>>>>      	vblk->tag_set.ops = &virtio_mq_ops;
>>>>>>>>      	vblk->tag_set.queue_depth = queue_depth;
>>>>>>>> -	vblk->tag_set.numa_node = NUMA_NO_NODE;
>>>>>>>> +	vblk->tag_set.numa_node = virtio_dev_to_node(vdev);
>>>>>>> I afraid that by doing it, you will increase chances to see OOM, because
>>>>>>> in NUMA_NO_NODE, MM will try allocate memory in whole system, while in
>>>>>>> the latter mode only on specific NUMA which can be depleted.
>>>>>> This is a common methodology we use in the block layer and in NVMe subsystem
>>>>>> and we don't afraid of the OOM issue you raised.
>>>>> There are many reasons for that, but we are talking about virtio here
>>>>> and not about NVMe.
>>>> Ok. what reasons ?
>>> For example, NVMe are physical devices that rely on DMA operations,
>>> PCI connectivity e.t.c to operate. Such systems indeed can benefit from
>>> NUMA locality hints. At the end, these devices are physically connected
>>> to that NUMA node.
>> FYI Virtio devices are also physical devices that have PCI interface and
>> rely on DMA operations.
>>
>> from virtio spec: "Virtio devices use normal bus mechanisms of interrupts
>> and DMA which should be familiar
>> to any device driver author".
> Yes, this is how bus in Linux is implemented, there is nothing new here.

So why did you say that virtio is not a PCI device with DMA capabilities?

>
>> Also we develop virtio HW at NVIDIA for blk and net devices with our SNAP
>> technology.
>>
>> These devices are connected via PCI bus to the host.
> How all these related to general virtio-blk implementation?

They use the same driver.

We develop HW virtio devices for bare-metal clouds and also for virtualized
clouds that use the SR-IOV feature of the (real) PF.

>
>> We also support SRIOV.
>>
>> Same it true also for paravirt devices that are emulated by QEMU but still
>> the guest sees them as PCI devices.
> Yes, the key word here - "emulated".

It doesn't matter. The guest kernel doesn't know whether it's a paravirt
device or a real NVIDIA HW virtio SNAP device.

And FYI, a guest can also have two NUMA nodes and can benefit from this patch.

>
>>> In our case, virtio-blk is a software interface that doesn't have all
>>> these limitations. On the contrary, the virtio-blk can be created on one
>>> CPU and moved later to be close to the QEMU which can run on another NUMA
>>> node.
>> Not at all. virtio is HW interface.
> Virtio are para-virtualized devices that are represented as HW interfaces
> in the guest OS. They are not needed to be real devices in the hypervisor,
> which is my (and probably most of the world) use case.

Again, the kernel doesn't care or know whether it's a paravirt device or not,
and it shouldn't care.

This patch is for the kernel driver, not for QEMU.

>
> My QEMU command line contains something like that: "-drive file=IMAGE.img,if=virtio"

This is one option.

For an NVIDIA HW device, you pass the virtio device through exactly as you
pass an mlx5 device: using vfio + vfio_pci.


>
>> I don't understand what are you saying here ?
>>
>>> Also this patch increases chances to get OOM by factor of NUMA nodes.
>> This is common practice in Linux for storage drivers. Why does it bothers
>> you at all ?
> Do I need a reason to ask for a clarification for publicly posted patch
> in open mailing list?
>
> I use virtio and care about it.

I meant, why don't you want to change the entire block layer and the NVMe
subsystem?

Why does only this patch bother you?

>
>> I already decreased the memory footprint for virtio blk devices.
> As I wrote before, you decreased by several KB, but by this patch you
> limited available memory in magnitudes.
>
>>
>>> Before your patch, the virtio_blk can allocate from X memory, after your
>>> patch it will be X/NUMB_NUMA_NODES.
>> So go ahead and change all the block layer if it bothers you so much.
>>
>> Also please change the NVMe subsystem when you do it.
> I suggest less radical approach - don't take patches without proven
> benefit.
>
> We are in 2021, let's rely on NUMA node policy.

I'm trying to add a NUMA policy here. Exactly.


>
>> And lets see what the community will say.
> Stephen asked you for performance data too. I'm not alone here.


I said I'll have a V2.

I would also like to hear the opinion of the block maintainers, like Jens
and Christoph, regarding NUMA affinity for block drivers.

>>> In addition, it has all chances to even hurt performance.
>>>
>>> So yes, post v2, but as Stefan and I asked, please provide supportive
>>> performance results, because what was done for another subsystem doesn't
>>> mean that it will be applicable here.
>> I will measure the perf but even if we wont see an improvement since it
>> might not be the bottleneck, this changes should be merged since this is the
>> way the block layer is optimized.
> This is not acceptance criteria to merge patches.
>
>> This is a micro optimization that commonly used also in other subsystem. And
>> non of your above reasons (PCI, SW device, DMA) is true.
> Every subsystem is different, in some it makes sense, in others it doesn't.

But you were wrong in saying that a virtio device is not a PCI HW device
that uses DMA.

Do you understand the solution now?

>
> We (RDMA) had very long discussion (together with perf data) and heavily tailored
> test to measure influence of per-node allocations and guess what? We didn't see
> any performance advantage.
>
> https://lore.kernel.org/linux-rdma/c34a864803f9bbd33d3f856a6ba2dd595ab708a7.1620729033.git.leonro@nvidia.com/

So go ahead and change the whole kernel or the block layer.

As you said, for the RDMA subsystem it might not be a good idea. I don't
want to discuss RDMA considerations in this thread.

Let's talk storage and virtio.

>
>> Virtio blk device is in 99% a PCI device (paravirt or real HW) exactly like
>> any other PCI device you are familiar with.
>>
>> It's connected physically to some slot, it has a BAR, MMIO, configuration
>> space, etc..
> In general case, it is far from being true.

It's exactly true.

But let's let MST and Stefan comment.

>
>> Thanks.
>>
>>> Thanks

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] virtio-blk: set NUMA affinity for a tagset
  2021-09-28  6:47         ` Stefan Hajnoczi
  (?)
@ 2021-09-29 15:07         ` Max Gurtovoy
  2021-09-30 13:16             ` Stefan Hajnoczi
  -1 siblings, 1 reply; 28+ messages in thread
From: Max Gurtovoy @ 2021-09-29 15:07 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: mst, virtualization, kvm, oren, nitzanc, israelr, hch,
	linux-block, axboe


On 9/28/2021 9:47 AM, Stefan Hajnoczi wrote:
> On Mon, Sep 27, 2021 at 08:39:30PM +0300, Max Gurtovoy wrote:
>> On 9/27/2021 11:09 AM, Stefan Hajnoczi wrote:
>>> On Sun, Sep 26, 2021 at 05:55:18PM +0300, Max Gurtovoy wrote:
>>>> To optimize performance, set the affinity of the block device tagset
>>>> according to the virtio device affinity.
>>>>
>>>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
>>>> ---
>>>>    drivers/block/virtio_blk.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>>>> index 9b3bd083b411..1c68c3e0ebf9 100644
>>>> --- a/drivers/block/virtio_blk.c
>>>> +++ b/drivers/block/virtio_blk.c
>>>> @@ -774,7 +774,7 @@ static int virtblk_probe(struct virtio_device *vdev)
>>>>    	memset(&vblk->tag_set, 0, sizeof(vblk->tag_set));
>>>>    	vblk->tag_set.ops = &virtio_mq_ops;
>>>>    	vblk->tag_set.queue_depth = queue_depth;
>>>> -	vblk->tag_set.numa_node = NUMA_NO_NODE;
>>>> +	vblk->tag_set.numa_node = virtio_dev_to_node(vdev);
>>>>    	vblk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
>>>>    	vblk->tag_set.cmd_size =
>>>>    		sizeof(struct virtblk_req) +
>>> I implemented NUMA affinity in the past and could not demonstrate a
>>> performance improvement:
>>> https://lists.linuxfoundation.org/pipermail/virtualization/2020-June/048248.html
>>>
>>> The pathological case is when a guest with vNUMA has the virtio-blk-pci
>>> device on the "wrong" host NUMA node. Then memory accesses should cross
>>> NUMA nodes. Still, it didn't seem to matter.
>> I think the reason you didn't see any improvement is since you didn't use
>> the right device for the node query. See my patch 1/2.
> That doesn't seem to be the case. Please see
> drivers/base/core.c:device_add():
>
>    /* use parent numa_node */
>    if (parent && (dev_to_node(dev) == NUMA_NO_NODE))
>            set_dev_node(dev, dev_to_node(parent));
>
> IMO it's cleaner to use dev_to_node(&vdev->dev) than to directly access
> the parent.
>
> Have I missed something?

But dev_to_node(dev) is 0 IMO.

Who set it to NUMA_NO_NODE?

>
>> I can try integrating these patches in my series and fix it.
>>
>> BTW, we might not see a big improvement because of other bottlenecks but
>> this is known perf optimization we use often in block storage drivers.
> Let's see benchmark results. Otherwise this is just dead code that adds
> complexity.
>
> Stefan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] virtio-blk: set NUMA affinity for a tagset
  2021-09-29 15:07         ` Max Gurtovoy
@ 2021-09-30 13:16             ` Stefan Hajnoczi
  0 siblings, 0 replies; 28+ messages in thread
From: Stefan Hajnoczi @ 2021-09-30 13:16 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: mst, virtualization, kvm, oren, nitzanc, israelr, hch,
	linux-block, axboe


On Wed, Sep 29, 2021 at 06:07:52PM +0300, Max Gurtovoy wrote:
> 
> On 9/28/2021 9:47 AM, Stefan Hajnoczi wrote:
> > On Mon, Sep 27, 2021 at 08:39:30PM +0300, Max Gurtovoy wrote:
> > > On 9/27/2021 11:09 AM, Stefan Hajnoczi wrote:
> > > > On Sun, Sep 26, 2021 at 05:55:18PM +0300, Max Gurtovoy wrote:
> > > > > To optimize performance, set the affinity of the block device tagset
> > > > > according to the virtio device affinity.
> > > > > 
> > > > > Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> > > > > ---
> > > > >    drivers/block/virtio_blk.c | 2 +-
> > > > >    1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > > > > index 9b3bd083b411..1c68c3e0ebf9 100644
> > > > > --- a/drivers/block/virtio_blk.c
> > > > > +++ b/drivers/block/virtio_blk.c
> > > > > @@ -774,7 +774,7 @@ static int virtblk_probe(struct virtio_device *vdev)
> > > > >    	memset(&vblk->tag_set, 0, sizeof(vblk->tag_set));
> > > > >    	vblk->tag_set.ops = &virtio_mq_ops;
> > > > >    	vblk->tag_set.queue_depth = queue_depth;
> > > > > -	vblk->tag_set.numa_node = NUMA_NO_NODE;
> > > > > +	vblk->tag_set.numa_node = virtio_dev_to_node(vdev);
> > > > >    	vblk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
> > > > >    	vblk->tag_set.cmd_size =
> > > > >    		sizeof(struct virtblk_req) +
> > > > I implemented NUMA affinity in the past and could not demonstrate a
> > > > performance improvement:
> > > > https://lists.linuxfoundation.org/pipermail/virtualization/2020-June/048248.html
> > > > 
> > > > The pathological case is when a guest with vNUMA has the virtio-blk-pci
> > > > device on the "wrong" host NUMA node. Then memory accesses should cross
> > > > NUMA nodes. Still, it didn't seem to matter.
> > > I think the reason you didn't see any improvement is since you didn't use
> > > the right device for the node query. See my patch 1/2.
> > That doesn't seem to be the case. Please see
> > drivers/base/core.c:device_add():
> > 
> >    /* use parent numa_node */
> >    if (parent && (dev_to_node(dev) == NUMA_NO_NODE))
> >            set_dev_node(dev, dev_to_node(parent));
> > 
> > IMO it's cleaner to use dev_to_node(&vdev->dev) than to directly access
> > the parent.
> > 
> > Have I missed something?
> 
> but dev_to_node(dev) is 0 IMO.
> 
> who set it to NUMA_NO_NODE ?

drivers/virtio/virtio.c:register_virtio_device():

  device_initialize(&dev->dev);

drivers/base/core.c:device_initialize():

  set_dev_node(dev, -1);
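
So by the time virtblk_probe() runs, device_add() has already copied the
parent's node in, as quoted above. A minimal sketch of the simpler form being
suggested here (an illustration, not code from the posted series):

  static inline int virtio_dev_to_node(struct virtio_device *vdev)
  {
          /* after device_add(), this already reflects the parent's node */
          return dev_to_node(&vdev->dev);
  }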

Stefan


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] virtio-blk: set NUMA affinity for a tagset
  2021-09-30 13:16             ` Stefan Hajnoczi
  (?)
@ 2021-09-30 13:24             ` Max Gurtovoy
  -1 siblings, 0 replies; 28+ messages in thread
From: Max Gurtovoy @ 2021-09-30 13:24 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: mst, virtualization, kvm, oren, nitzanc, israelr, hch,
	linux-block, axboe


On 9/30/2021 4:16 PM, Stefan Hajnoczi wrote:
> On Wed, Sep 29, 2021 at 06:07:52PM +0300, Max Gurtovoy wrote:
>> On 9/28/2021 9:47 AM, Stefan Hajnoczi wrote:
>>> On Mon, Sep 27, 2021 at 08:39:30PM +0300, Max Gurtovoy wrote:
>>>> On 9/27/2021 11:09 AM, Stefan Hajnoczi wrote:
>>>>> On Sun, Sep 26, 2021 at 05:55:18PM +0300, Max Gurtovoy wrote:
>>>>>> To optimize performance, set the affinity of the block device tagset
>>>>>> according to the virtio device affinity.
>>>>>>
>>>>>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
>>>>>> ---
>>>>>>     drivers/block/virtio_blk.c | 2 +-
>>>>>>     1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>>>>>> index 9b3bd083b411..1c68c3e0ebf9 100644
>>>>>> --- a/drivers/block/virtio_blk.c
>>>>>> +++ b/drivers/block/virtio_blk.c
>>>>>> @@ -774,7 +774,7 @@ static int virtblk_probe(struct virtio_device *vdev)
>>>>>>     	memset(&vblk->tag_set, 0, sizeof(vblk->tag_set));
>>>>>>     	vblk->tag_set.ops = &virtio_mq_ops;
>>>>>>     	vblk->tag_set.queue_depth = queue_depth;
>>>>>> -	vblk->tag_set.numa_node = NUMA_NO_NODE;
>>>>>> +	vblk->tag_set.numa_node = virtio_dev_to_node(vdev);
>>>>>>     	vblk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
>>>>>>     	vblk->tag_set.cmd_size =
>>>>>>     		sizeof(struct virtblk_req) +
>>>>> I implemented NUMA affinity in the past and could not demonstrate a
>>>>> performance improvement:
>>>>> https://lists.linuxfoundation.org/pipermail/virtualization/2020-June/048248.html
>>>>>
>>>>> The pathological case is when a guest with vNUMA has the virtio-blk-pci
>>>>> device on the "wrong" host NUMA node. Then memory accesses should cross
>>>>> NUMA nodes. Still, it didn't seem to matter.
>>>> I think the reason you didn't see any improvement is since you didn't use
>>>> the right device for the node query. See my patch 1/2.
>>> That doesn't seem to be the case. Please see
>>> drivers/base/core.c:device_add():
>>>
>>>     /* use parent numa_node */
>>>     if (parent && (dev_to_node(dev) == NUMA_NO_NODE))
>>>             set_dev_node(dev, dev_to_node(parent));
>>>
>>> IMO it's cleaner to use dev_to_node(&vdev->dev) than to directly access
>>> the parent.
>>>
>>> Have I missed something?
>> but dev_to_node(dev) is 0 IMO.
>>
>> who set it to NUMA_NO_NODE ?
> drivers/virtio/virtio.c:register_virtio_device():
>
>    device_initialize(&dev->dev);
>
> drivers/base/core.c:device_initialize():
>
>    set_dev_node(dev, -1);

Ohh, I was searching for NUMA_NO_NODE. I guess the initial commit from
Christoph 15 years ago was made before this macro was added.

I'll send a patch to fix it.
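
Presumably a cleanup along these lines in drivers/base/core.c:device_initialize()
(a sketch of what is implied here, not necessarily the patch that was actually
sent; NUMA_NO_NODE is defined as -1, so the behaviour is unchanged):

  -	set_dev_node(dev, -1);
  +	set_dev_node(dev, NUMA_NO_NODE);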

I hope I'll have a system to check your patches next week.

>
> Stefan

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2021-09-30 13:25 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-26 14:55 [PATCH 1/2] virtio: introduce virtio_dev_to_node helper Max Gurtovoy
2021-09-26 14:55 ` [PATCH 2/2] virtio-blk: set NUMA affinity for a tagset Max Gurtovoy
2021-09-27  8:09   ` Stefan Hajnoczi
2021-09-27  8:09     ` Stefan Hajnoczi
2021-09-27 17:39     ` Max Gurtovoy
2021-09-28  6:47       ` Stefan Hajnoczi
2021-09-28  6:47         ` Stefan Hajnoczi
2021-09-29 15:07         ` Max Gurtovoy
2021-09-30 13:16           ` Stefan Hajnoczi
2021-09-30 13:16             ` Stefan Hajnoczi
2021-09-30 13:24             ` Max Gurtovoy
2021-09-27 11:34   ` Leon Romanovsky
2021-09-27 11:34     ` Leon Romanovsky
2021-09-27 17:25     ` Max Gurtovoy
2021-09-27 18:23       ` Leon Romanovsky
2021-09-27 18:23         ` Leon Romanovsky
2021-09-28 15:59         ` Max Gurtovoy
2021-09-28 16:27           ` Leon Romanovsky
2021-09-28 16:27             ` Leon Romanovsky
2021-09-28 23:28             ` Max Gurtovoy
2021-09-29  6:50               ` Leon Romanovsky
2021-09-29  6:50                 ` Leon Romanovsky
2021-09-29  9:48                 ` Max Gurtovoy
2021-09-27  8:02 ` [PATCH 1/2] virtio: introduce virtio_dev_to_node helper Stefan Hajnoczi
2021-09-27  8:02   ` Stefan Hajnoczi
2021-09-27  9:31 ` Michael S. Tsirkin
2021-09-27  9:31   ` Michael S. Tsirkin
2021-09-28 16:14   ` Max Gurtovoy
