* [PATCH 0/2] Fix failure to hot add memory
@ 2021-01-28 13:41 Eli Cohen
  2021-01-28 13:41 ` [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue Eli Cohen
  2021-01-28 13:41 ` [PATCH 2/2] vdpa/mlx5: Restore the hardware used index after change map Eli Cohen
  0 siblings, 2 replies; 27+ messages in thread
From: Eli Cohen @ 2021-01-28 13:41 UTC
  To: mst, jasowang; +Cc: virtualization, netdev, linux-kernel, lulu, elic

Hi Michael,
The following two patches fix a failure to update the hardware with the
current used index. The failure breaks hot adding memory to the guest,
since hot add triggers a memory map update, which in turn tears down and
re-creates the hardware resources.

The first patch just removes unnecessary code. The second one is the
actual fix.
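
For context, the flow both patches touch looks roughly like this (a
condensed sketch of the change-map path in
drivers/vdpa/mlx5/net/mlx5_vnet.c; error handling and the DRIVER_OK
check are omitted):

	static int mlx5_vdpa_change_map(struct mlx5_vdpa_net *ndev,
					struct vhost_iotlb *iotlb)
	{
		suspend_vqs(ndev);		/* stop the hardware virtqueues */
		save_channels_info(ndev);	/* snapshot per-VQ state, incl. indices */
		teardown_driver(ndev);		/* destroy the hardware objects */
		mlx5_vdpa_destroy_mr(&ndev->mvdev);	  /* drop the old memory key */
		mlx5_vdpa_create_mr(&ndev->mvdev, iotlb); /* register the new map */
		restore_channels_info(ndev);	/* feed the snapshot back into mvq */
		return setup_driver(ndev);	/* re-create the objects from it */
	}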

Eli Cohen (2):
  vdpa/mlx5: Avoid unnecessary query virtqueue
  vdpa/mlx5: Restore the hardware used index after change map

 drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

-- 
2.29.2



* [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue
  2021-01-28 13:41 [PATCH 0/2] Fix failure to hot add memory Eli Cohen
@ 2021-01-28 13:41 ` Eli Cohen
  2021-01-29  3:48   ` Jason Wang
  2021-02-01 18:51   ` Si-Wei Liu
  2021-01-28 13:41 ` [PATCH 2/2] vdpa/mlx5: Restore the hardware used index after change map Eli Cohen
  1 sibling, 2 replies; 27+ messages in thread
From: Eli Cohen @ 2021-01-28 13:41 UTC
  To: mst, jasowang; +Cc: virtualization, netdev, linux-kernel, lulu, elic

suspend_vq should only suspend the VQ and not save the current available
index. Saving is done when a change of map occurs, when the driver calls
save_channel_info().

Signed-off-by: Eli Cohen <elic@nvidia.com>
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 88dde3455bfd..549ded074ff3 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1148,8 +1148,6 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
 
 static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
 {
-	struct mlx5_virtq_attr attr;
-
 	if (!mvq->initialized)
 		return;
 
@@ -1158,12 +1156,6 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
 
 	if (modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
 		mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");
-
-	if (query_virtqueue(ndev, mvq, &attr)) {
-		mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue\n");
-		return;
-	}
-	mvq->avail_idx = attr.available_index;
 }
 
 static void suspend_vqs(struct mlx5_vdpa_net *ndev)
-- 
2.29.2



* [PATCH 2/2] vdpa/mlx5: Restore the hardware used index after change map
  2021-01-28 13:41 [PATCH 0/2] Fix failure to hot add memory Eli Cohen
  2021-01-28 13:41 ` [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue Eli Cohen
@ 2021-01-28 13:41 ` Eli Cohen
  2021-01-29  3:49   ` Jason Wang
  1 sibling, 1 reply; 27+ messages in thread
From: Eli Cohen @ 2021-01-28 13:41 UTC
  To: mst, jasowang; +Cc: virtualization, netdev, linux-kernel, lulu, elic

When a change of memory map occurs, the hardware resources are destroyed
and then re-created with the new memory map. In such a case, we need to
restore the hardware available and used indices. The driver failed to
restore the used index, which this patch adds.

Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Eli Cohen <elic@nvidia.com>
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 549ded074ff3..3fc8588cecae 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -87,6 +87,7 @@ struct mlx5_vq_restore_info {
 	u64 device_addr;
 	u64 driver_addr;
 	u16 avail_index;
+	u16 used_index;
 	bool ready;
 	struct vdpa_callback cb;
 	bool restore;
@@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue {
 	u32 virtq_id;
 	struct mlx5_vdpa_net *ndev;
 	u16 avail_idx;
+	u16 used_idx;
 	int fw_state;
 
 	/* keep last in the struct */
@@ -804,6 +806,7 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque
 
 	obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, obj_context);
 	MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, mvq->avail_idx);
+	MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, mvq->used_idx);
 	MLX5_SET(virtio_net_q_object, obj_context, queue_feature_bit_mask_12_3,
 		 get_features_12_3(ndev->mvdev.actual_features));
 	vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, virtio_q_context);
@@ -1022,6 +1025,7 @@ static int connect_qps(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
 struct mlx5_virtq_attr {
 	u8 state;
 	u16 available_index;
+	u16 used_index;
 };
 
 static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq,
@@ -1052,6 +1056,7 @@ static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueu
 	memset(attr, 0, sizeof(*attr));
 	attr->state = MLX5_GET(virtio_net_q_object, obj_context, state);
 	attr->available_index = MLX5_GET(virtio_net_q_object, obj_context, hw_available_index);
+	attr->used_index = MLX5_GET(virtio_net_q_object, obj_context, hw_used_index);
 	kfree(out);
 	return 0;
 
@@ -1602,6 +1607,7 @@ static int save_channel_info(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqu
 		return err;
 
 	ri->avail_index = attr.available_index;
+	ri->used_index = attr.used_index;
 	ri->ready = mvq->ready;
 	ri->num_ent = mvq->num_ent;
 	ri->desc_addr = mvq->desc_addr;
@@ -1646,6 +1652,7 @@ static void restore_channels_info(struct mlx5_vdpa_net *ndev)
 			continue;
 
 		mvq->avail_idx = ri->avail_index;
+		mvq->used_idx = ri->used_index;
 		mvq->ready = ri->ready;
 		mvq->num_ent = ri->num_ent;
 		mvq->desc_addr = ri->desc_addr;
-- 
2.29.2



* Re: [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue
  2021-01-28 13:41 ` [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue Eli Cohen
@ 2021-01-29  3:48   ` Jason Wang
  2021-02-01 18:51   ` Si-Wei Liu
  1 sibling, 0 replies; 27+ messages in thread
From: Jason Wang @ 2021-01-29  3:48 UTC
  To: Eli Cohen, mst; +Cc: virtualization, netdev, linux-kernel, lulu


On 2021/1/28 9:41 PM, Eli Cohen wrote:
> suspend_vq should only suspend the VQ and not save the current available
> index. Saving is done when a change of map occurs, when the driver calls
> save_channel_info().
>
> Signed-off-by: Eli Cohen <elic@nvidia.com>


Acked-by: Jason Wang <jasowang@redhat.com>


> ---
>   drivers/vdpa/mlx5/net/mlx5_vnet.c | 8 --------
>   1 file changed, 8 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 88dde3455bfd..549ded074ff3 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -1148,8 +1148,6 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
>   
>   static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
>   {
> -	struct mlx5_virtq_attr attr;
> -
>   	if (!mvq->initialized)
>   		return;
>   
> @@ -1158,12 +1156,6 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
>   
>   	if (modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
>   		mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");
> -
> -	if (query_virtqueue(ndev, mvq, &attr)) {
> -		mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue\n");
> -		return;
> -	}
> -	mvq->avail_idx = attr.available_index;
>   }
>   
>   static void suspend_vqs(struct mlx5_vdpa_net *ndev)



* Re: [PATCH 2/2] vdpa/mlx5: Restore the hardware used index after change map
  2021-01-28 13:41 ` [PATCH 2/2] vdpa/mlx5: Restore the hardware used index after change map Eli Cohen
@ 2021-01-29  3:49   ` Jason Wang
  2021-01-31 18:55     ` Eli Cohen
  0 siblings, 1 reply; 27+ messages in thread
From: Jason Wang @ 2021-01-29  3:49 UTC
  To: Eli Cohen, mst; +Cc: virtualization, netdev, linux-kernel, lulu


On 2021/1/28 9:41 PM, Eli Cohen wrote:
> When a change of memory map occurs, the hardware resources are destroyed
> and then re-created with the new memory map. In such a case, we need to
> restore the hardware available and used indices. The driver failed to
> restore the used index, which this patch adds.
>
> Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
> Signed-off-by: Eli Cohen <elic@nvidia.com>


A question. Does this mean after a vq is suspended, the hw used index is 
not equal to vq used index?

Thanks


> ---
>   drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 +++++++
>   1 file changed, 7 insertions(+)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 549ded074ff3..3fc8588cecae 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -87,6 +87,7 @@ struct mlx5_vq_restore_info {
>   	u64 device_addr;
>   	u64 driver_addr;
>   	u16 avail_index;
> +	u16 used_index;
>   	bool ready;
>   	struct vdpa_callback cb;
>   	bool restore;
> @@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue {
>   	u32 virtq_id;
>   	struct mlx5_vdpa_net *ndev;
>   	u16 avail_idx;
> +	u16 used_idx;
>   	int fw_state;
>   
>   	/* keep last in the struct */
> @@ -804,6 +806,7 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque
>   
>   	obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, obj_context);
>   	MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, mvq->avail_idx);
> +	MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, mvq->used_idx);
>   	MLX5_SET(virtio_net_q_object, obj_context, queue_feature_bit_mask_12_3,
>   		 get_features_12_3(ndev->mvdev.actual_features));
>   	vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, virtio_q_context);
> @@ -1022,6 +1025,7 @@ static int connect_qps(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
>   struct mlx5_virtq_attr {
>   	u8 state;
>   	u16 available_index;
> +	u16 used_index;
>   };
>   
>   static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq,
> @@ -1052,6 +1056,7 @@ static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueu
>   	memset(attr, 0, sizeof(*attr));
>   	attr->state = MLX5_GET(virtio_net_q_object, obj_context, state);
>   	attr->available_index = MLX5_GET(virtio_net_q_object, obj_context, hw_available_index);
> +	attr->used_index = MLX5_GET(virtio_net_q_object, obj_context, hw_used_index);
>   	kfree(out);
>   	return 0;
>   
> @@ -1602,6 +1607,7 @@ static int save_channel_info(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqu
>   		return err;
>   
>   	ri->avail_index = attr.available_index;
> +	ri->used_index = attr.used_index;
>   	ri->ready = mvq->ready;
>   	ri->num_ent = mvq->num_ent;
>   	ri->desc_addr = mvq->desc_addr;
> @@ -1646,6 +1652,7 @@ static void restore_channels_info(struct mlx5_vdpa_net *ndev)
>   			continue;
>   
>   		mvq->avail_idx = ri->avail_index;
> +		mvq->used_idx = ri->used_index;
>   		mvq->ready = ri->ready;
>   		mvq->num_ent = ri->num_ent;
>   		mvq->desc_addr = ri->desc_addr;



* Re: [PATCH 2/2] vdpa/mlx5: Restore the hardware used index after change map
  2021-01-29  3:49   ` Jason Wang
@ 2021-01-31 18:55     ` Eli Cohen
  2021-02-01  3:36       ` Jason Wang
  0 siblings, 1 reply; 27+ messages in thread
From: Eli Cohen @ 2021-01-31 18:55 UTC
  To: Jason Wang; +Cc: mst, virtualization, netdev, linux-kernel, lulu

On Fri, Jan 29, 2021 at 11:49:45AM +0800, Jason Wang wrote:
> 
> On 2021/1/28 9:41 PM, Eli Cohen wrote:
> > When a change of memory map occurs, the hardware resources are destroyed
> > and then re-created with the new memory map. In such a case, we need to
> > restore the hardware available and used indices. The driver failed to
> > restore the used index, which this patch adds.
> >
> > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
> > Signed-off-by: Eli Cohen <elic@nvidia.com>
> 
> 
> A question. Does this mean after a vq is suspended, the hw used index is not
> equal to vq used index?

Surely there is just one "Used index" for a VQ. What I was trying to say
is that after the VQ is suspended, I read the used index by querying the
hardware. The read result is the used index that the hardware wrote to
memory. After I create the new hardware object, I need to tell it
what is the used index (and the available index) as a way to sync it
with the existing VQ.

This sync is especially important when a change of map occurs while the
VQ was already used (hence the indices are likely to be non zero). This
can be triggered by hot adding memory after the VQs have been used. 
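
In code terms, the handoff is roughly the following (a simplified
sketch using the driver's own functions, error handling omitted; after
this patch the used index is carried along with the available index):

	struct mlx5_virtq_attr attr;

	/* the VQ is suspended; read back the indices the hardware last wrote */
	query_virtqueue(ndev, mvq, &attr);
	mvq->avail_idx = attr.available_index;
	mvq->used_idx = attr.used_index;

	/* ... the memory map changes and the old object is destroyed ... */

	/* the new object starts from the saved state instead of from zero */
	create_virtqueue(ndev, mvq);	/* programs hw_available_index/hw_used_index */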

> 
> Thanks
> 
> 
> > ---
> >   drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 +++++++
> >   1 file changed, 7 insertions(+)
> > 
> > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > index 549ded074ff3..3fc8588cecae 100644
> > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > @@ -87,6 +87,7 @@ struct mlx5_vq_restore_info {
> >   	u64 device_addr;
> >   	u64 driver_addr;
> >   	u16 avail_index;
> > +	u16 used_index;
> >   	bool ready;
> >   	struct vdpa_callback cb;
> >   	bool restore;
> > @@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue {
> >   	u32 virtq_id;
> >   	struct mlx5_vdpa_net *ndev;
> >   	u16 avail_idx;
> > +	u16 used_idx;
> >   	int fw_state;
> >   	/* keep last in the struct */
> > @@ -804,6 +806,7 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque
> >   	obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, obj_context);
> >   	MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, mvq->avail_idx);
> > +	MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, mvq->used_idx);
> >   	MLX5_SET(virtio_net_q_object, obj_context, queue_feature_bit_mask_12_3,
> >   		 get_features_12_3(ndev->mvdev.actual_features));
> >   	vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, virtio_q_context);
> > @@ -1022,6 +1025,7 @@ static int connect_qps(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
> >   struct mlx5_virtq_attr {
> >   	u8 state;
> >   	u16 available_index;
> > +	u16 used_index;
> >   };
> >   static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq,
> > @@ -1052,6 +1056,7 @@ static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueu
> >   	memset(attr, 0, sizeof(*attr));
> >   	attr->state = MLX5_GET(virtio_net_q_object, obj_context, state);
> >   	attr->available_index = MLX5_GET(virtio_net_q_object, obj_context, hw_available_index);
> > +	attr->used_index = MLX5_GET(virtio_net_q_object, obj_context, hw_used_index);
> >   	kfree(out);
> >   	return 0;
> > @@ -1602,6 +1607,7 @@ static int save_channel_info(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqu
> >   		return err;
> >   	ri->avail_index = attr.available_index;
> > +	ri->used_index = attr.used_index;
> >   	ri->ready = mvq->ready;
> >   	ri->num_ent = mvq->num_ent;
> >   	ri->desc_addr = mvq->desc_addr;
> > @@ -1646,6 +1652,7 @@ static void restore_channels_info(struct mlx5_vdpa_net *ndev)
> >   			continue;
> >   		mvq->avail_idx = ri->avail_index;
> > +		mvq->used_idx = ri->used_index;
> >   		mvq->ready = ri->ready;
> >   		mvq->num_ent = ri->num_ent;
> >   		mvq->desc_addr = ri->desc_addr;
> 


* Re: [PATCH 2/2] vdpa/mlx5: Restore the hardware used index after change map
  2021-01-31 18:55     ` Eli Cohen
@ 2021-02-01  3:36       ` Jason Wang
  2021-02-01  5:52         ` Eli Cohen
  0 siblings, 1 reply; 27+ messages in thread
From: Jason Wang @ 2021-02-01  3:36 UTC
  To: Eli Cohen; +Cc: mst, virtualization, netdev, linux-kernel, lulu


On 2021/2/1 2:55 AM, Eli Cohen wrote:
> On Fri, Jan 29, 2021 at 11:49:45AM +0800, Jason Wang wrote:
>> On 2021/1/28 9:41 PM, Eli Cohen wrote:
>>> When a change of memory map occurs, the hardware resources are destroyed
>>> and then re-created with the new memory map. In such a case, we need to
>>> restore the hardware available and used indices. The driver failed to
>>> restore the used index, which this patch adds.
>>>
>>> Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
>>> Signed-off-by: Eli Cohen <elic@nvidia.com>
>>
>> A question. Does this mean after a vq is suspended, the hw used index is not
>> equal to vq used index?
> Surely there is just one "Used index" for a VQ. What I was trying to say
> is that after the VQ is suspended, I read the used index by querying the
> hardware. The read result is the used index that the hardware wrote to
> memory.


Just to make sure I understand here. So it looks to me we have two indices.
The first is the used index which is stored in the memory/virtqueue, the 
second is the one that is stored by the device.


>   After I create the new hardware object, I need to tell it
> what is the used index (and the available index) as a way to sync it
> with the existing VQ.


For the avail index I understand that the hardware index is not synced
with the avail index stored in the memory/virtqueue. The question is
about the used index: if the hardware one is not synced with the one in
the virtqueue, it means that after the vq is suspended, some requests
are not completed by the hardware (e.g. the buffers were not put on the
used ring).

This may have implications for live migration: those uncompleted
requests would need to be migrated to the destination and resubmitted
to the device. That does not look easy.

Thanks


>
> This sync is especially important when a change of map occurs while the
> VQ was already used (hence the indices are likely to be non zero). This
> can be triggered by hot adding memory after the VQs have been used.
>
>> Thanks
>>
>>
>>> ---
>>>    drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 +++++++
>>>    1 file changed, 7 insertions(+)
>>>
>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>> index 549ded074ff3..3fc8588cecae 100644
>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>> @@ -87,6 +87,7 @@ struct mlx5_vq_restore_info {
>>>    	u64 device_addr;
>>>    	u64 driver_addr;
>>>    	u16 avail_index;
>>> +	u16 used_index;
>>>    	bool ready;
>>>    	struct vdpa_callback cb;
>>>    	bool restore;
>>> @@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue {
>>>    	u32 virtq_id;
>>>    	struct mlx5_vdpa_net *ndev;
>>>    	u16 avail_idx;
>>> +	u16 used_idx;
>>>    	int fw_state;
>>>    	/* keep last in the struct */
>>> @@ -804,6 +806,7 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque
>>>    	obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, obj_context);
>>>    	MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, mvq->avail_idx);
>>> +	MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, mvq->used_idx);
>>>    	MLX5_SET(virtio_net_q_object, obj_context, queue_feature_bit_mask_12_3,
>>>    		 get_features_12_3(ndev->mvdev.actual_features));
>>>    	vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, virtio_q_context);
>>> @@ -1022,6 +1025,7 @@ static int connect_qps(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
>>>    struct mlx5_virtq_attr {
>>>    	u8 state;
>>>    	u16 available_index;
>>> +	u16 used_index;
>>>    };
>>>    static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq,
>>> @@ -1052,6 +1056,7 @@ static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueu
>>>    	memset(attr, 0, sizeof(*attr));
>>>    	attr->state = MLX5_GET(virtio_net_q_object, obj_context, state);
>>>    	attr->available_index = MLX5_GET(virtio_net_q_object, obj_context, hw_available_index);
>>> +	attr->used_index = MLX5_GET(virtio_net_q_object, obj_context, hw_used_index);
>>>    	kfree(out);
>>>    	return 0;
>>> @@ -1602,6 +1607,7 @@ static int save_channel_info(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqu
>>>    		return err;
>>>    	ri->avail_index = attr.available_index;
>>> +	ri->used_index = attr.used_index;
>>>    	ri->ready = mvq->ready;
>>>    	ri->num_ent = mvq->num_ent;
>>>    	ri->desc_addr = mvq->desc_addr;
>>> @@ -1646,6 +1652,7 @@ static void restore_channels_info(struct mlx5_vdpa_net *ndev)
>>>    			continue;
>>>    		mvq->avail_idx = ri->avail_index;
>>> +		mvq->used_idx = ri->used_index;
>>>    		mvq->ready = ri->ready;
>>>    		mvq->num_ent = ri->num_ent;
>>>    		mvq->desc_addr = ri->desc_addr;



* Re: [PATCH 2/2] vdpa/mlx5: Restore the hardware used index after change map
  2021-02-01  3:36       ` Jason Wang
@ 2021-02-01  5:52         ` Eli Cohen
  2021-02-01  6:00           ` Jason Wang
  0 siblings, 1 reply; 27+ messages in thread
From: Eli Cohen @ 2021-02-01  5:52 UTC
  To: Jason Wang; +Cc: mst, virtualization, netdev, linux-kernel, lulu

On Mon, Feb 01, 2021 at 11:36:23AM +0800, Jason Wang wrote:
> 
> On 2021/2/1 2:55 AM, Eli Cohen wrote:
> > On Fri, Jan 29, 2021 at 11:49:45AM +0800, Jason Wang wrote:
> > > > > On 2021/1/28 9:41 PM, Eli Cohen wrote:
> > > > > > When a change of memory map occurs, the hardware resources are destroyed
> > > > > > and then re-created with the new memory map. In such a case, we need to
> > > > > > restore the hardware available and used indices. The driver failed to
> > > > > > restore the used index, which this patch adds.
> > > > > >
> > > > > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
> > > > Signed-off-by: Eli Cohen <elic@nvidia.com>
> > > 
> > > A question. Does this mean after a vq is suspended, the hw used index is not
> > > equal to vq used index?
> > Surely there is just one "Used index" for a VQ. What I was trying to say
> > is that after the VQ is suspended, I read the used index by querying the
> > hardware. The read result is the used index that the hardware wrote to
> > memory.
> 
> 
> > Just to make sure I understand here. So it looks to me we have two indices. The
> first is the used index which is stored in the memory/virtqueue, the second
> is the one that is stored by the device.
> 

These are the structures defined in the virtio spec, in 2.6.6 for the
available ring and 2.6.8 for the used ring. As you know, the available
ring is written to by the driver and read by the device. The opposite
happens for the used ring.
The reason I need to restore the last known indices is for the new
hardware objects to sync on the last state and take over from there.
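
For reference, these are the split ring layouts (as declared in
include/uapi/linux/virtio_ring.h):

	struct vring_avail {			/* spec 2.6.6, written by the driver */
		__virtio16 flags;
		__virtio16 idx;			/* the available index */
		__virtio16 ring[];
	};

	struct vring_used_elem {
		__virtio32 id;			/* head of the used descriptor chain */
		__virtio32 len;			/* bytes written into the buffer */
	};

	struct vring_used {			/* spec 2.6.8, written by the device */
		__virtio16 flags;
		__virtio16 idx;			/* the used index */
		struct vring_used_elem ring[];
	};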

> 
> >   After I create the new hardware object, I need to tell it
> > what is the used index (and the available index) as a way to sync it
> > with the existing VQ.
> 
> 
> For avail index I understand that the hardware index is not synced with the
> avail index stored in the memory/virtqueue. The question is used index, if
> the hardware one is not synced with the one in the virtqueue. It means after
> vq is suspended,  some requests is not completed by the hardware (e.g the
> buffer were not put to used ring).
> 
> This may have implications to live migration, it means those unaccomplished
> requests needs to be migrated to the destination and resubmitted to the
> device. This looks not easy.
> 
> Thanks
> 
> 
> > 
> > This sync is especially important when a change of map occurs while the
> > VQ was already used (hence the indices are likely to be non zero). This
> > can be triggered by hot adding memory after the VQs have been used.
> > 
> > > Thanks
> > > 
> > > 
> > > > ---
> > > >    drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 +++++++
> > > >    1 file changed, 7 insertions(+)
> > > > 
> > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > index 549ded074ff3..3fc8588cecae 100644
> > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > @@ -87,6 +87,7 @@ struct mlx5_vq_restore_info {
> > > >    	u64 device_addr;
> > > >    	u64 driver_addr;
> > > >    	u16 avail_index;
> > > > +	u16 used_index;
> > > >    	bool ready;
> > > >    	struct vdpa_callback cb;
> > > >    	bool restore;
> > > > @@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue {
> > > >    	u32 virtq_id;
> > > >    	struct mlx5_vdpa_net *ndev;
> > > >    	u16 avail_idx;
> > > > +	u16 used_idx;
> > > >    	int fw_state;
> > > >    	/* keep last in the struct */
> > > > @@ -804,6 +806,7 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque
> > > >    	obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, obj_context);
> > > >    	MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, mvq->avail_idx);
> > > > +	MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, mvq->used_idx);
> > > >    	MLX5_SET(virtio_net_q_object, obj_context, queue_feature_bit_mask_12_3,
> > > >    		 get_features_12_3(ndev->mvdev.actual_features));
> > > >    	vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, virtio_q_context);
> > > > @@ -1022,6 +1025,7 @@ static int connect_qps(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
> > > >    struct mlx5_virtq_attr {
> > > >    	u8 state;
> > > >    	u16 available_index;
> > > > +	u16 used_index;
> > > >    };
> > > >    static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq,
> > > > @@ -1052,6 +1056,7 @@ static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueu
> > > >    	memset(attr, 0, sizeof(*attr));
> > > >    	attr->state = MLX5_GET(virtio_net_q_object, obj_context, state);
> > > >    	attr->available_index = MLX5_GET(virtio_net_q_object, obj_context, hw_available_index);
> > > > +	attr->used_index = MLX5_GET(virtio_net_q_object, obj_context, hw_used_index);
> > > >    	kfree(out);
> > > >    	return 0;
> > > > @@ -1602,6 +1607,7 @@ static int save_channel_info(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqu
> > > >    		return err;
> > > >    	ri->avail_index = attr.available_index;
> > > > +	ri->used_index = attr.used_index;
> > > >    	ri->ready = mvq->ready;
> > > >    	ri->num_ent = mvq->num_ent;
> > > >    	ri->desc_addr = mvq->desc_addr;
> > > > @@ -1646,6 +1652,7 @@ static void restore_channels_info(struct mlx5_vdpa_net *ndev)
> > > >    			continue;
> > > >    		mvq->avail_idx = ri->avail_index;
> > > > +		mvq->used_idx = ri->used_index;
> > > >    		mvq->ready = ri->ready;
> > > >    		mvq->num_ent = ri->num_ent;
> > > >    		mvq->desc_addr = ri->desc_addr;
> 


* Re: [PATCH 2/2] vdpa/mlx5: Restore the hardware used index after change map
  2021-02-01  5:52         ` Eli Cohen
@ 2021-02-01  6:00           ` Jason Wang
  2021-02-01  6:38             ` Eli Cohen
  0 siblings, 1 reply; 27+ messages in thread
From: Jason Wang @ 2021-02-01  6:00 UTC
  To: Eli Cohen; +Cc: mst, virtualization, netdev, linux-kernel, lulu


On 2021/2/1 1:52 PM, Eli Cohen wrote:
> On Mon, Feb 01, 2021 at 11:36:23AM +0800, Jason Wang wrote:
>> On 2021/2/1 2:55 AM, Eli Cohen wrote:
>>> On Fri, Jan 29, 2021 at 11:49:45AM +0800, Jason Wang wrote:
>>>> On 2021/1/28 9:41 PM, Eli Cohen wrote:
>>>>> When a change of memory map occurs, the hardware resources are destroyed
>>>>> and then re-created with the new memory map. In such a case, we need to
>>>>> restore the hardware available and used indices. The driver failed to
>>>>> restore the used index, which this patch adds.
>>>>>
>>>>> Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
>>>>> Signed-off-by: Eli Cohen <elic@nvidia.com>
>>>> A question. Does this mean after a vq is suspended, the hw used index is not
>>>> equal to vq used index?
>>> Surely there is just one "Used index" for a VQ. What I was trying to say
>>> is that after the VQ is suspended, I read the used index by querying the
>>> hardware. The read result is the used index that the hardware wrote to
>>> memory.
>>
>> Just to make sure I understand here. So it looks to me we have two indices. The
>> first is the used index which is stored in the memory/virtqueue, the second
>> is the one that is stored by the device.
>>
> These are the structures defined in the virtio spec, in 2.6.6 for the
> available ring and 2.6.8 for the used ring. As you know, the available
> ring is written to by the driver and read by the device. The opposite
> happens for the used ring.


Right, so the used index is written by the device. And the device should
have an internal used index value that it uses to write to the used
ring. And this code is used to sync the device's internal used index, if
I understand this correctly.
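
Something like this, in pseudo-C (my assumption about a generic
split-ring device, not necessarily what the mlx5 firmware does):

	/* complete one buffer: publish the used element first, then
	 * advance the used index from the device-internal counter */
	vq->used->ring[internal_used_idx % vq->num] =
		(struct vring_used_elem){ .id = head, .len = written };
	wmb();					/* element visible before index */
	vq->used->idx = ++internal_used_idx;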


> The reason I need to restore the last known indices is for the new
> hardware objects to sync on the last state and take over from there.


Right, after the vq is suspended, the questions are:

1) might the hardware's internal used index not be the same as the used
index in the virtqueue?

or

2) can we simply sync the virtqueue's used index to the hardware's used 
index?

Thanks


>
>>>    After I create the new hardware object, I need to tell it
>>> what is the used index (and the available index) as a way to sync it
>>> with the existing VQ.
>>
>> For the avail index I understand that the hardware index is not synced with
>> the avail index stored in the memory/virtqueue. The question is about the
>> used index: if the hardware one is not synced with the one in the virtqueue,
>> it means that after the vq is suspended, some requests are not completed by
>> the hardware (e.g. the buffers were not put on the used ring).
>>
>> This may have implications for live migration: those uncompleted requests
>> would need to be migrated to the destination and resubmitted to the device.
>> That does not look easy.
>>
>> Thanks
>>
>>
>>> This sync is especially important when a change of map occurs while the
>>> VQ was already used (hence the indices are likely to be non zero). This
>>> can be triggered by hot adding memory after the VQs have been used.
>>>
>>>> Thanks
>>>>
>>>>
>>>>> ---
>>>>>     drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 +++++++
>>>>>     1 file changed, 7 insertions(+)
>>>>>
>>>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>> index 549ded074ff3..3fc8588cecae 100644
>>>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>> @@ -87,6 +87,7 @@ struct mlx5_vq_restore_info {
>>>>>     	u64 device_addr;
>>>>>     	u64 driver_addr;
>>>>>     	u16 avail_index;
>>>>> +	u16 used_index;
>>>>>     	bool ready;
>>>>>     	struct vdpa_callback cb;
>>>>>     	bool restore;
>>>>> @@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue {
>>>>>     	u32 virtq_id;
>>>>>     	struct mlx5_vdpa_net *ndev;
>>>>>     	u16 avail_idx;
>>>>> +	u16 used_idx;
>>>>>     	int fw_state;
>>>>>     	/* keep last in the struct */
>>>>> @@ -804,6 +806,7 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque
>>>>>     	obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, obj_context);
>>>>>     	MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, mvq->avail_idx);
>>>>> +	MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, mvq->used_idx);
>>>>>     	MLX5_SET(virtio_net_q_object, obj_context, queue_feature_bit_mask_12_3,
>>>>>     		 get_features_12_3(ndev->mvdev.actual_features));
>>>>>     	vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, virtio_q_context);
>>>>> @@ -1022,6 +1025,7 @@ static int connect_qps(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
>>>>>     struct mlx5_virtq_attr {
>>>>>     	u8 state;
>>>>>     	u16 available_index;
>>>>> +	u16 used_index;
>>>>>     };
>>>>>     static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq,
>>>>> @@ -1052,6 +1056,7 @@ static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueu
>>>>>     	memset(attr, 0, sizeof(*attr));
>>>>>     	attr->state = MLX5_GET(virtio_net_q_object, obj_context, state);
>>>>>     	attr->available_index = MLX5_GET(virtio_net_q_object, obj_context, hw_available_index);
>>>>> +	attr->used_index = MLX5_GET(virtio_net_q_object, obj_context, hw_used_index);
>>>>>     	kfree(out);
>>>>>     	return 0;
>>>>> @@ -1602,6 +1607,7 @@ static int save_channel_info(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqu
>>>>>     		return err;
>>>>>     	ri->avail_index = attr.available_index;
>>>>> +	ri->used_index = attr.used_index;
>>>>>     	ri->ready = mvq->ready;
>>>>>     	ri->num_ent = mvq->num_ent;
>>>>>     	ri->desc_addr = mvq->desc_addr;
>>>>> @@ -1646,6 +1652,7 @@ static void restore_channels_info(struct mlx5_vdpa_net *ndev)
>>>>>     			continue;
>>>>>     		mvq->avail_idx = ri->avail_index;
>>>>> +		mvq->used_idx = ri->used_index;
>>>>>     		mvq->ready = ri->ready;
>>>>>     		mvq->num_ent = ri->num_ent;
>>>>>     		mvq->desc_addr = ri->desc_addr;



* Re: [PATCH 2/2] vdpa/mlx5: Restore the hardware used index after change map
  2021-02-01  6:00           ` Jason Wang
@ 2021-02-01  6:38             ` Eli Cohen
  2021-02-01  7:23               ` Jason Wang
  0 siblings, 1 reply; 27+ messages in thread
From: Eli Cohen @ 2021-02-01  6:38 UTC
  To: Jason Wang; +Cc: mst, virtualization, netdev, linux-kernel, lulu

On Mon, Feb 01, 2021 at 02:00:35PM +0800, Jason Wang wrote:
> 
> On 2021/2/1 1:52 PM, Eli Cohen wrote:
> > On Mon, Feb 01, 2021 at 11:36:23AM +0800, Jason Wang wrote:
> > > On 2021/2/1 2:55 AM, Eli Cohen wrote:
> > > > On Fri, Jan 29, 2021 at 11:49:45AM +0800, Jason Wang wrote:
> > > > > On 2021/1/28 9:41 PM, Eli Cohen wrote:
> > > > > > When a change of memory map occurs, the hardware resources are destroyed
> > > > > > and then re-created with the new memory map. In such a case, we need to
> > > > > > restore the hardware available and used indices. The driver failed to
> > > > > > restore the used index, which this patch adds.
> > > > > >
> > > > > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
> > > > > > Signed-off-by: Eli Cohen <elic@nvidia.com>
> > > > > A question. Does this mean after a vq is suspended, the hw used index is not
> > > > > equal to vq used index?
> > > > Surely there is just one "Used index" for a VQ. What I was trying to say
> > > > is that after the VQ is suspended, I read the used index by querying the
> > > > hardware. The read result is the used index that the hardware wrote to
> > > > memory.
> > > 
> > > Just to make sure I understand here. So it looks to me we have two indices. The
> > > first is the used index which is stored in the memory/virtqueue, the second
> > > is the one that is stored by the device.
> > > 
> > These are the structures defined in the virtio spec, in 2.6.6 for the
> > available ring and 2.6.8 for the used ring. As you know, the available
> > ring is written to by the driver and read by the device. The opposite
> > happens for the used ring.
> 
> 
> Right, so the used index is written by the device. And the device should have
> an internal used index value that it uses to write to the used ring. And this
> code is used to sync the device's internal used index, if I understand this
> correctly.
> 
> 
> > The reason I need to restore the last known indices is for the new
> > hardware objects to sync on the last state and take over from there.
> 
> 
> Right, after the vq is suspended, the questions are:
>
> 1) might the hardware's internal used index not be the same as the used index
> in the virtqueue?
> 

Generally the answer is no because the hardware is the only one writing
it. New objects start up with the initial value configured to them upon
creation. This value was zero before this change.
You could argue that since the hardware has access to virtqueue memory,
it could just read the value from there but it does not.

> or
> 
> 2) can we simply sync the virtqueue's used index to the hardware's used
> index?
> 
Theoretically it could be done, but that's not how the hardware works.
One reason is that it is not supposed to read from that area. But this
is really a hardware implementation detail.
> > 
> > > >    After I create the new hardware object, I need to tell it
> > > > what is the used index (and the available index) as a way to sync it
> > > > with the existing VQ.
> > > 
> > > For the avail index I understand that the hardware index is not synced with
> > > the avail index stored in the memory/virtqueue. The question is about the
> > > used index: if the hardware one is not synced with the one in the virtqueue,
> > > it means that after the vq is suspended, some requests are not completed by
> > > the hardware (e.g. the buffers were not put on the used ring).
> > > 
> > > This may have implications for live migration: those uncompleted requests
> > > would need to be migrated to the destination and resubmitted to the device.
> > > That does not look easy.
> > > 
> > > Thanks
> > > 
> > > 
> > > > This sync is especially important when a change of map occurs while the
> > > > VQ was already used (hence the indices are likely to be non zero). This
> > > > can be triggered by hot adding memory after the VQs have been used.
> > > > 
> > > > > Thanks
> > > > > 
> > > > > 
> > > > > > ---
> > > > > >     drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 +++++++
> > > > > >     1 file changed, 7 insertions(+)
> > > > > > 
> > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > index 549ded074ff3..3fc8588cecae 100644
> > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > @@ -87,6 +87,7 @@ struct mlx5_vq_restore_info {
> > > > > >     	u64 device_addr;
> > > > > >     	u64 driver_addr;
> > > > > >     	u16 avail_index;
> > > > > > +	u16 used_index;
> > > > > >     	bool ready;
> > > > > >     	struct vdpa_callback cb;
> > > > > >     	bool restore;
> > > > > > @@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue {
> > > > > >     	u32 virtq_id;
> > > > > >     	struct mlx5_vdpa_net *ndev;
> > > > > >     	u16 avail_idx;
> > > > > > +	u16 used_idx;
> > > > > >     	int fw_state;
> > > > > >     	/* keep last in the struct */
> > > > > > @@ -804,6 +806,7 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque
> > > > > >     	obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, obj_context);
> > > > > >     	MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, mvq->avail_idx);
> > > > > > +	MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, mvq->used_idx);
> > > > > >     	MLX5_SET(virtio_net_q_object, obj_context, queue_feature_bit_mask_12_3,
> > > > > >     		 get_features_12_3(ndev->mvdev.actual_features));
> > > > > >     	vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, virtio_q_context);
> > > > > > @@ -1022,6 +1025,7 @@ static int connect_qps(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
> > > > > >     struct mlx5_virtq_attr {
> > > > > >     	u8 state;
> > > > > >     	u16 available_index;
> > > > > > +	u16 used_index;
> > > > > >     };
> > > > > >     static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq,
> > > > > > @@ -1052,6 +1056,7 @@ static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueu
> > > > > >     	memset(attr, 0, sizeof(*attr));
> > > > > >     	attr->state = MLX5_GET(virtio_net_q_object, obj_context, state);
> > > > > >     	attr->available_index = MLX5_GET(virtio_net_q_object, obj_context, hw_available_index);
> > > > > > +	attr->used_index = MLX5_GET(virtio_net_q_object, obj_context, hw_used_index);
> > > > > >     	kfree(out);
> > > > > >     	return 0;
> > > > > > @@ -1602,6 +1607,7 @@ static int save_channel_info(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqu
> > > > > >     		return err;
> > > > > >     	ri->avail_index = attr.available_index;
> > > > > > +	ri->used_index = attr.used_index;
> > > > > >     	ri->ready = mvq->ready;
> > > > > >     	ri->num_ent = mvq->num_ent;
> > > > > >     	ri->desc_addr = mvq->desc_addr;
> > > > > > @@ -1646,6 +1652,7 @@ static void restore_channels_info(struct mlx5_vdpa_net *ndev)
> > > > > >     			continue;
> > > > > >     		mvq->avail_idx = ri->avail_index;
> > > > > > +		mvq->used_idx = ri->used_index;
> > > > > >     		mvq->ready = ri->ready;
> > > > > >     		mvq->num_ent = ri->num_ent;
> > > > > >     		mvq->desc_addr = ri->desc_addr;
> 


* Re: [PATCH 2/2] vdpa/mlx5: Restore the hardware used index after change map
  2021-02-01  6:38             ` Eli Cohen
@ 2021-02-01  7:23               ` Jason Wang
  0 siblings, 0 replies; 27+ messages in thread
From: Jason Wang @ 2021-02-01  7:23 UTC
  To: Eli Cohen; +Cc: mst, virtualization, netdev, linux-kernel, lulu


On 2021/2/1 2:38 PM, Eli Cohen wrote:
> On Mon, Feb 01, 2021 at 02:00:35PM +0800, Jason Wang wrote:
>> On 2021/2/1 1:52 PM, Eli Cohen wrote:
>>> On Mon, Feb 01, 2021 at 11:36:23AM +0800, Jason Wang wrote:
>>>> On 2021/2/1 2:55 AM, Eli Cohen wrote:
>>>>> On Fri, Jan 29, 2021 at 11:49:45AM +0800, Jason Wang wrote:
>>>>>> On 2021/1/28 9:41 PM, Eli Cohen wrote:
>>>>>>> When a change of memory map occurs, the hardware resources are destroyed
>>>>>>> and then re-created with the new memory map. In such a case, we need to
>>>>>>> restore the hardware available and used indices. The driver failed to
>>>>>>> restore the used index, which this patch adds.
>>>>>>>
>>>>>>> Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
>>>>>>> Signed-off-by: Eli Cohen <elic@nvidia.com>
>>>>>> A question. Does this mean after a vq is suspended, the hw used index is not
>>>>>> equal to vq used index?
>>>>> Surely there is just one "Used index" for a VQ. What I was trying to say
>>>>> is that after the VQ is suspended, I read the used index by querying the
>>>>> hardware. The read result is the used index that the hardware wrote to
>>>>> memory.
>>>> Just to make sure I understand here. So it looks to me we have two indices. The
>>>> first is the used index which is stored in the memory/virtqueue, the second
>>>> is the one that is stored by the device.
>>>>
>>> These are the structures defined in the virtio spec, in 2.6.6 for the
>>> available ring and 2.6.8 for the used ring. As you know, the available
>>> ring is written to by the driver and read by the device. The opposite
>>> happens for the used ring.
>>
>> Right, so the used index is written by the device. And the device should have
>> an internal used index value that it uses to write to the used ring. And this
>> code is used to sync the device's internal used index, if I understand this
>> correctly.
>>
>>
>>> The reason I need to restore the last known indices is for the new
>>> hardware objects to sync on the last state and take over from there.
>>
>> Right, after the vq is suspended, the questions are:
>>
>> 1) might the hardware's internal used index not be the same as the used index
>> in the virtqueue?
>>
> Generally the answer is no because the hardware is the only one writing
> it. New objects start up with the initial value configured to them upon
> creation. This value was zero before this change.
> You could argue that since the hardware has access to virtqueue memory,
> it could just read the value from there but it does not.


I see.


>
>> or
>>
>> 2) can we simply sync the virtqueue's used index to the hardware's used
>> index?
>>
> Theoretically it could be done, but that's not how the hardware works.
> One reason is that it is not supposed to read from that area. But this
> is really a hardware implementation detail.


I get you now.

So

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks


>>>>>     After I create the new hardware object, I need to tell it
>>>>> what is the used index (and the available index) as a way to sync it
>>>>> with the existing VQ.
>>>> For the avail index I understand that the hardware index is not synced with
>>>> the avail index stored in the memory/virtqueue. The question is about the
>>>> used index: if the hardware one is not synced with the one in the virtqueue,
>>>> it means that after the vq is suspended, some requests are not completed by
>>>> the hardware (e.g. the buffers were not put on the used ring).
>>>>
>>>> This may have implications for live migration: those uncompleted requests
>>>> would need to be migrated to the destination and resubmitted to the device.
>>>> That does not look easy.
>>>>
>>>> Thanks
>>>>
>>>>
>>>>> This sync is especially important when a change of map occurs while the
>>>>> VQ was already used (hence the indices are likely to be non zero). This
>>>>> can be triggered by hot adding memory after the VQs have been used.
>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>> ---
>>>>>>>      drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 +++++++
>>>>>>>      1 file changed, 7 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>> index 549ded074ff3..3fc8588cecae 100644
>>>>>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>> @@ -87,6 +87,7 @@ struct mlx5_vq_restore_info {
>>>>>>>      	u64 device_addr;
>>>>>>>      	u64 driver_addr;
>>>>>>>      	u16 avail_index;
>>>>>>> +	u16 used_index;
>>>>>>>      	bool ready;
>>>>>>>      	struct vdpa_callback cb;
>>>>>>>      	bool restore;
>>>>>>> @@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue {
>>>>>>>      	u32 virtq_id;
>>>>>>>      	struct mlx5_vdpa_net *ndev;
>>>>>>>      	u16 avail_idx;
>>>>>>> +	u16 used_idx;
>>>>>>>      	int fw_state;
>>>>>>>      	/* keep last in the struct */
>>>>>>> @@ -804,6 +806,7 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque
>>>>>>>      	obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, obj_context);
>>>>>>>      	MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, mvq->avail_idx);
>>>>>>> +	MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, mvq->used_idx);
>>>>>>>      	MLX5_SET(virtio_net_q_object, obj_context, queue_feature_bit_mask_12_3,
>>>>>>>      		 get_features_12_3(ndev->mvdev.actual_features));
>>>>>>>      	vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, virtio_q_context);
>>>>>>> @@ -1022,6 +1025,7 @@ static int connect_qps(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
>>>>>>>      struct mlx5_virtq_attr {
>>>>>>>      	u8 state;
>>>>>>>      	u16 available_index;
>>>>>>> +	u16 used_index;
>>>>>>>      };
>>>>>>>      static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq,
>>>>>>> @@ -1052,6 +1056,7 @@ static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueu
>>>>>>>      	memset(attr, 0, sizeof(*attr));
>>>>>>>      	attr->state = MLX5_GET(virtio_net_q_object, obj_context, state);
>>>>>>>      	attr->available_index = MLX5_GET(virtio_net_q_object, obj_context, hw_available_index);
>>>>>>> +	attr->used_index = MLX5_GET(virtio_net_q_object, obj_context, hw_used_index);
>>>>>>>      	kfree(out);
>>>>>>>      	return 0;
>>>>>>> @@ -1602,6 +1607,7 @@ static int save_channel_info(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqu
>>>>>>>      		return err;
>>>>>>>      	ri->avail_index = attr.available_index;
>>>>>>> +	ri->used_index = attr.used_index;
>>>>>>>      	ri->ready = mvq->ready;
>>>>>>>      	ri->num_ent = mvq->num_ent;
>>>>>>>      	ri->desc_addr = mvq->desc_addr;
>>>>>>> @@ -1646,6 +1652,7 @@ static void restore_channels_info(struct mlx5_vdpa_net *ndev)
>>>>>>>      			continue;
>>>>>>>      		mvq->avail_idx = ri->avail_index;
>>>>>>> +		mvq->used_idx = ri->used_index;
>>>>>>>      		mvq->ready = ri->ready;
>>>>>>>      		mvq->num_ent = ri->num_ent;
>>>>>>>      		mvq->desc_addr = ri->desc_addr;



* Re: [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue
  2021-01-28 13:41 ` [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue Eli Cohen
  2021-01-29  3:48   ` Jason Wang
@ 2021-02-01 18:51   ` Si-Wei Liu
  2021-02-01 19:17     ` Si-Wei Liu
  1 sibling, 1 reply; 27+ messages in thread
From: Si-Wei Liu @ 2021-02-01 18:51 UTC
  To: Eli Cohen; +Cc: mst, jasowang, virtualization, netdev, linux-kernel, lulu

On Thu, Jan 28, 2021 at 5:46 AM Eli Cohen <elic@nvidia.com> wrote:
>
> suspend_vq should only suspend the VQ and not save the current available
> index. Saving is done when a change of map occurs, when the driver calls
> save_channel_info().

Hmmm, suspend_vq() is also called by teardown_vq(), the latter of
which doesn't save the available index as save_channel_info() doesn't
get called in that path at all. How does it handle the case that
get_vq_state() is called from userspace (e.g. QEMU) while the
hardware VQ object was torn down, but userspace still wants to access
the queue index?

Refer to https://lore.kernel.org/netdev/1601583511-15138-1-git-send-email-si-wei.liu@oracle.com/

vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)

QEMU will complain with the above warning while VM is being rebooted
or shut down.

Looks to me either the kernel driver should cover this requirement, or
the userspace has to bear the burden of saving the index and not call
into the kernel if the VQ is destroyed.
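
Covering it in the kernel could look something like the sketch below
(hypothetical, not what this series does): cache the last hardware
state in the mvq before the object is destroyed, so a later
get_vq_state() still has something to report.

	static void teardown_vq(struct mlx5_vdpa_net *ndev,
				struct mlx5_vdpa_virtqueue *mvq)
	{
		struct mlx5_virtq_attr attr;

		if (!mvq->initialized)
			return;

		suspend_vq(ndev, mvq);
		/* hypothetical: snapshot the index before the object goes away */
		if (!query_virtqueue(ndev, mvq, &attr))
			mvq->avail_idx = attr.available_index;
		destroy_virtqueue(ndev, mvq);	/* plus CQ/QP teardown */
		mvq->initialized = false;
	}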

-Siwei


>
> Signed-off-by: Eli Cohen <elic@nvidia.com>
> ---
>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 8 --------
>  1 file changed, 8 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 88dde3455bfd..549ded074ff3 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -1148,8 +1148,6 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
>
>  static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
>  {
> -       struct mlx5_virtq_attr attr;
> -
>         if (!mvq->initialized)
>                 return;
>
> @@ -1158,12 +1156,6 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
>
>         if (modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
>                 mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");
> -
> -       if (query_virtqueue(ndev, mvq, &attr)) {
> -               mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue\n");
> -               return;
> -       }
> -       mvq->avail_idx = attr.available_index;
>  }
>
>  static void suspend_vqs(struct mlx5_vdpa_net *ndev)
> --
> 2.29.2
>


* Re: [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue
  2021-02-01 18:51   ` Si-Wei Liu
@ 2021-02-01 19:17     ` Si-Wei Liu
  2021-02-02  3:12       ` Jason Wang
  0 siblings, 1 reply; 27+ messages in thread
From: Si-Wei Liu @ 2021-02-01 19:17 UTC
  To: Eli Cohen
  Cc: mst, jasowang, virtualization, netdev, linux-kernel, lulu, Si-Wei Liu

On Mon, Feb 1, 2021 at 10:51 AM Si-Wei Liu <siwliu.kernel@gmail.com> wrote:
>
> On Thu, Jan 28, 2021 at 5:46 AM Eli Cohen <elic@nvidia.com> wrote:
> >
> > suspend_vq should only suspend the VQ and not save the current available
> > index. Saving is done when a change of map occurs, when the driver calls
> > save_channel_info().
>
> Hmmm, suspend_vq() is also called by teardown_vq(), the latter of
> which doesn't save the available index as save_channel_info() doesn't
> get called in that path at all. How does it handle the case that
> get_vq_state() is called from userspace (e.g. QEMU) while the
> hardware VQ object was torn down, but userspace still wants to access
> the queue index?
>
> Refer to https://lore.kernel.org/netdev/1601583511-15138-1-git-send-email-si-wei.liu@oracle.com/
>
> vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
> vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
>
> QEMU will complain with the above warning while VM is being rebooted
> or shut down.
>
> Looks to me either the kernel driver should cover this requirement, or
> the userspace has to bear the burden of saving the index and not call
> into the kernel if the VQ is destroyed.

Actually, userspace has no insight into whether the virt queue will be
destroyed when it just changes the device status via set_status().
Looking at the other vdpa driver in tree, i.e. ifcvf, it doesn't behave
like that. Hence this still looks to me to be Mellanox specific, an
mlx5_vdpa implementation detail that shouldn't be exposed to userspace.
>
> -Siwei
>
>
> >
> > Signed-off-by: Eli Cohen <elic@nvidia.com>
> > ---
> >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 8 --------
> >  1 file changed, 8 deletions(-)
> >
> > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > index 88dde3455bfd..549ded074ff3 100644
> > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > @@ -1148,8 +1148,6 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> >
> >  static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> >  {
> > -       struct mlx5_virtq_attr attr;
> > -
> >         if (!mvq->initialized)
> >                 return;
> >
> > @@ -1158,12 +1156,6 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
> >
> >         if (modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
> >                 mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");
> > -
> > -       if (query_virtqueue(ndev, mvq, &attr)) {
> > -               mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue\n");
> > -               return;
> > -       }
> > -       mvq->avail_idx = attr.available_index;
> >  }
> >
> >  static void suspend_vqs(struct mlx5_vdpa_net *ndev)
> > --
> > 2.29.2
> >


* Re: [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue
  2021-02-01 19:17     ` Si-Wei Liu
@ 2021-02-02  3:12       ` Jason Wang
  2021-02-02  4:15         ` Si-Wei Liu
  2021-02-02  6:57         ` Eli Cohen
  0 siblings, 2 replies; 27+ messages in thread
From: Jason Wang @ 2021-02-02  3:12 UTC
  To: Si-Wei Liu, Eli Cohen
  Cc: mst, virtualization, netdev, linux-kernel, lulu, Si-Wei Liu


On 2021/2/2 3:17 AM, Si-Wei Liu wrote:
> On Mon, Feb 1, 2021 at 10:51 AM Si-Wei Liu <siwliu.kernel@gmail.com> wrote:
>> On Thu, Jan 28, 2021 at 5:46 AM Eli Cohen <elic@nvidia.com> wrote:
>>> suspend_vq should only suspend the VQ and not save the current available
>>> index. Saving is done when a change of map occurs, when the driver calls
>>> save_channel_info().
>> Hmmm, suspend_vq() is also called by teardown_vq(), the latter of
>> which doesn't save the available index as save_channel_info() doesn't
>> get called in that path at all. How does it handle the case that
>> get_vq_state() is called from userspace (e.g. QEMU) while the
>> hardware VQ object was torn down, but userspace still wants to access
>> the queue index?
>>
>> Refer to https://lore.kernel.org/netdev/1601583511-15138-1-git-send-email-si-wei.liu@oracle.com/
>>
>> vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
>> vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
>>
>> QEMU will complain with the above warning while VM is being rebooted
>> or shut down.
>>
>> Looks to me either the kernel driver should cover this requirement, or
>> the userspace has to bear the burden of saving the index and not call
>> into the kernel if the VQ is destroyed.
> Actually, userspace has no insight into whether the virt queue will be
> destroyed when it just changes the device status via set_status().
> Looking at the other vdpa driver in tree, i.e. ifcvf, it doesn't behave
> like that. Hence this still looks to me to be Mellanox specific, an
> mlx5_vdpa implementation detail that shouldn't be exposed to userspace.


So I think we can simply drop this patch?

Thanks


>> -Siwei
>>
>>
>>> Signed-off-by: Eli Cohen <elic@nvidia.com>
>>> ---
>>>   drivers/vdpa/mlx5/net/mlx5_vnet.c | 8 --------
>>>   1 file changed, 8 deletions(-)
>>>
>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>> index 88dde3455bfd..549ded074ff3 100644
>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>> @@ -1148,8 +1148,6 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
>>>
>>>   static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
>>>   {
>>> -       struct mlx5_virtq_attr attr;
>>> -
>>>          if (!mvq->initialized)
>>>                  return;
>>>
>>> @@ -1158,12 +1156,6 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
>>>
>>>          if (modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
>>>                  mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");
>>> -
>>> -       if (query_virtqueue(ndev, mvq, &attr)) {
>>> -               mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue\n");
>>> -               return;
>>> -       }
>>> -       mvq->avail_idx = attr.available_index;
>>>   }
>>>
>>>   static void suspend_vqs(struct mlx5_vdpa_net *ndev)
>>> --
>>> 2.29.2
>>>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue
  2021-02-02  3:12       ` Jason Wang
@ 2021-02-02  4:15         ` Si-Wei Liu
  2021-02-02  6:02           ` Jason Wang
  2021-02-02  7:00           ` Eli Cohen
  2021-02-02  6:57         ` Eli Cohen
  1 sibling, 2 replies; 27+ messages in thread
From: Si-Wei Liu @ 2021-02-02  4:15 UTC (permalink / raw)
  To: Jason Wang
  Cc: Eli Cohen, mst, virtualization, netdev, linux-kernel, lulu, Si-Wei Liu

On Mon, Feb 1, 2021 at 7:13 PM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2021/2/2 3:17 AM, Si-Wei Liu wrote:
> > On Mon, Feb 1, 2021 at 10:51 AM Si-Wei Liu <siwliu.kernel@gmail.com> wrote:
> >> On Thu, Jan 28, 2021 at 5:46 AM Eli Cohen <elic@nvidia.com> wrote:
> >>> suspend_vq should only suspend the VQ and not save the current available
> >>> index. Saving is done when a change of map occurs, at which point the
> >>> driver calls save_channel_info().
> >> Hmmm, suspend_vq() is also called by teardown_vq(), the latter of
> >> which doesn't save the available index as save_channel_info() doesn't
> >> get called in that path at all. How does it handle the case that
> >> get_vq_state() is called from userspace (e.g. QEMU) while the
> >> hardware VQ object was torn down, but userspace still wants to access
> >> the queue index?
> >>
> >> Refer to https://lore.kernel.org/netdev/1601583511-15138-1-git-send-email-si-wei.liu@oracle.com/
> >>
> >> vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
> >> vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
> >>
> >> QEMU will complain with the above warning while VM is being rebooted
> >> or shut down.
> >>
> >> Looks to me either the kernel driver should cover this requirement, or
> >> the userspace has to bear the burden in saving the index and not call
> >> into kernel if VQ is destroyed.
> > Actually, the userspace doesn't have insight into whether the virt queue
> > will be destroyed when just changing the device status via set_status().
> > Looking at the other vdpa driver in tree, i.e. ifcvf, it doesn't behave
> > like that. Hence this still looks to me to be a Mellanox-specific
> > mlx5_vdpa implementation detail that shouldn't be exposed to userspace.
>
>
> So I think we can simply drop this patch?

Yep, I think so. To be honest I don't know why it has anything to do
with the memory hotplug issue.

-Siwei

>
> Thanks
>
>
> >> -Siwei
> >>
> >>
> >>> Signed-off-by: Eli Cohen <elic@nvidia.com>
> >>> ---
> >>>   drivers/vdpa/mlx5/net/mlx5_vnet.c | 8 --------
> >>>   1 file changed, 8 deletions(-)
> >>>
> >>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> >>> index 88dde3455bfd..549ded074ff3 100644
> >>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> >>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> >>> @@ -1148,8 +1148,6 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> >>>
> >>>   static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> >>>   {
> >>> -       struct mlx5_virtq_attr attr;
> >>> -
> >>>          if (!mvq->initialized)
> >>>                  return;
> >>>
> >>> @@ -1158,12 +1156,6 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
> >>>
> >>>          if (modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
> >>>                  mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");
> >>> -
> >>> -       if (query_virtqueue(ndev, mvq, &attr)) {
> >>> -               mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue\n");
> >>> -               return;
> >>> -       }
> >>> -       mvq->avail_idx = attr.available_index;
> >>>   }
> >>>
> >>>   static void suspend_vqs(struct mlx5_vdpa_net *ndev)
> >>> --
> >>> 2.29.2
> >>>
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue
  2021-02-02  4:15         ` Si-Wei Liu
@ 2021-02-02  6:02           ` Jason Wang
  2021-02-02  7:06             ` Eli Cohen
  2021-02-02  7:00           ` Eli Cohen
  1 sibling, 1 reply; 27+ messages in thread
From: Jason Wang @ 2021-02-02  6:02 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: Eli Cohen, mst, virtualization, netdev, linux-kernel, lulu, Si-Wei Liu


On 2021/2/2 12:15 PM, Si-Wei Liu wrote:
> On Mon, Feb 1, 2021 at 7:13 PM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2021/2/2 3:17 AM, Si-Wei Liu wrote:
>>> On Mon, Feb 1, 2021 at 10:51 AM Si-Wei Liu <siwliu.kernel@gmail.com> wrote:
>>>> On Thu, Jan 28, 2021 at 5:46 AM Eli Cohen <elic@nvidia.com> wrote:
>>>>> suspend_vq should only suspend the VQ and not save the current available
>>>>> index. Saving is done when a change of map occurs, at which point the
>>>>> driver calls save_channel_info().
>>>> Hmmm, suspend_vq() is also called by teardown_vq(), the latter of
>>>> which doesn't save the available index as save_channel_info() doesn't
>>>> get called in that path at all. How does it handle the case that
>>>> get_vq_state() is called from userspace (e.g. QEMU) while the
>>>> hardware VQ object was torn down, but userspace still wants to access
>>>> the queue index?
>>>>
>>>> Refer to https://lore.kernel.org/netdev/1601583511-15138-1-git-send-email-si-wei.liu@oracle.com/
>>>>
>>>> vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
>>>> vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
>>>>
>>>> QEMU will complain with the above warning while VM is being rebooted
>>>> or shut down.
>>>>
>>>> Looks to me either the kernel driver should cover this requirement, or
>>>> the userspace has to bear the burden in saving the index and not call
>>>> into kernel if VQ is destroyed.
>>> Actually, the userspace doesn't have insight into whether the virt queue
>>> will be destroyed when just changing the device status via set_status().
>>> Looking at the other vdpa driver in tree, i.e. ifcvf, it doesn't behave
>>> like that. Hence this still looks to me to be a Mellanox-specific
>>> mlx5_vdpa implementation detail that shouldn't be exposed to userspace.
>>
>> So I think we can simply drop this patch?
> Yep, I think so. To be honest I don't know why it has anything to do
> with the memory hotplug issue.


Eli may know more, my understanding is that, during memory hotplug, qemu
needs to propagate new memory mappings via set_map(). For Mellanox, it
means it needs to rebuild memory keys, so the virtqueue needs to be
suspended.
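
A rough sketch of the shape of that callback, for readers following
along (the set_map() signature is the one in include/linux/vdpa.h, but
the ndev type and the suspend/rebuild/resume helpers are illustrative
placeholders, not any driver's actual code):

/* Sketch: invoked by the vhost-vdpa core with the full, updated iotlb,
 * e.g. after qemu hot adds memory to the guest. */
static int example_set_map(struct vdpa_device *vdev,
                           struct vhost_iotlb *iotlb)
{
        struct example_net *ndev = to_example_net(vdev);
        int err;

        suspend_vqs(ndev);               /* stop HW DMA on the old map */
        err = rebuild_mkey(ndev, iotlb); /* memory key for the new map */
        if (err)
                return err;
        return resume_vqs(ndev);         /* re-create VQs on the new key */
}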

Thanks


>
> -Siwei
>
>> Thanks
>>
>>
>>>> -Siwei
>>>>
>>>>
>>>>> Signed-off-by: Eli Cohen <elic@nvidia.com>
>>>>> ---
>>>>>    drivers/vdpa/mlx5/net/mlx5_vnet.c | 8 --------
>>>>>    1 file changed, 8 deletions(-)
>>>>>
>>>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>> index 88dde3455bfd..549ded074ff3 100644
>>>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>> @@ -1148,8 +1148,6 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
>>>>>
>>>>>    static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
>>>>>    {
>>>>> -       struct mlx5_virtq_attr attr;
>>>>> -
>>>>>           if (!mvq->initialized)
>>>>>                   return;
>>>>>
>>>>> @@ -1158,12 +1156,6 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
>>>>>
>>>>>           if (modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
>>>>>                   mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");
>>>>> -
>>>>> -       if (query_virtqueue(ndev, mvq, &attr)) {
>>>>> -               mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue\n");
>>>>> -               return;
>>>>> -       }
>>>>> -       mvq->avail_idx = attr.available_index;
>>>>>    }
>>>>>
>>>>>    static void suspend_vqs(struct mlx5_vdpa_net *ndev)
>>>>> --
>>>>> 2.29.2
>>>>>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue
  2021-02-02  3:12       ` Jason Wang
  2021-02-02  4:15         ` Si-Wei Liu
@ 2021-02-02  6:57         ` Eli Cohen
  1 sibling, 0 replies; 27+ messages in thread
From: Eli Cohen @ 2021-02-02  6:57 UTC (permalink / raw)
  To: Jason Wang
  Cc: Si-Wei Liu, mst, virtualization, netdev, linux-kernel, lulu, Si-Wei Liu

On Tue, Feb 02, 2021 at 11:12:51AM +0800, Jason Wang wrote:
> 
> On 2021/2/2 3:17 AM, Si-Wei Liu wrote:
> > On Mon, Feb 1, 2021 at 10:51 AM Si-Wei Liu <siwliu.kernel@gmail.com> wrote:
> > > On Thu, Jan 28, 2021 at 5:46 AM Eli Cohen <elic@nvidia.com> wrote:
> > > > suspend_vq should only suspend the VQ and not save the current available
> > > > index. Saving is done when a change of map occurs, at which point the
> > > > driver calls save_channel_info().
> > > Hmmm, suspend_vq() is also called by teardown_vq(), the latter of
> > > which doesn't save the available index as save_channel_info() doesn't
> > > get called in that path at all. How does it handle the case that
> > > get_vq_state() is called from userspace (e.g. QEMU) while the
> > > hardware VQ object was torn down, but userspace still wants to access
> > > the queue index?
> > > 
> > > Refer to https://lore.kernel.org/netdev/1601583511-15138-1-git-send-email-si-wei.liu@oracle.com/
> > > 
> > > vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
> > > vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
> > > 
> > > QEMU will complain with the above warning while VM is being rebooted
> > > or shut down.
> > > 
> > > Looks to me either the kernel driver should cover this requirement, or
> > > the userspace has to bear the burden in saving the index and not call
> > > into kernel if VQ is destroyed.
> > Actually, the userspace doesn't have insight into whether the virt queue
> > will be destroyed when just changing the device status via set_status().
> > Looking at the other vdpa driver in tree, i.e. ifcvf, it doesn't behave
> > like that. Hence this still looks to me to be a Mellanox-specific
> > mlx5_vdpa implementation detail that shouldn't be exposed to userspace.
> 
> 
> So I think we can simply drop this patch?
> 

Yes, I agree. Let's just avoid it.

> Thanks
> 
> 
> > > -Siwei
> > > 
> > > 
> > > > Signed-off-by: Eli Cohen <elic@nvidia.com>
> > > > ---
> > > >   drivers/vdpa/mlx5/net/mlx5_vnet.c | 8 --------
> > > >   1 file changed, 8 deletions(-)
> > > > 
> > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > index 88dde3455bfd..549ded074ff3 100644
> > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > @@ -1148,8 +1148,6 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> > > > 
> > > >   static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> > > >   {
> > > > -       struct mlx5_virtq_attr attr;
> > > > -
> > > >          if (!mvq->initialized)
> > > >                  return;
> > > > 
> > > > @@ -1158,12 +1156,6 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
> > > > 
> > > >          if (modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
> > > >                  mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");
> > > > -
> > > > -       if (query_virtqueue(ndev, mvq, &attr)) {
> > > > -               mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue\n");
> > > > -               return;
> > > > -       }
> > > > -       mvq->avail_idx = attr.available_index;
> > > >   }
> > > > 
> > > >   static void suspend_vqs(struct mlx5_vdpa_net *ndev)
> > > > --
> > > > 2.29.2
> > > > 
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue
  2021-02-02  4:15         ` Si-Wei Liu
  2021-02-02  6:02           ` Jason Wang
@ 2021-02-02  7:00           ` Eli Cohen
  2021-02-02 14:06             ` Michael S. Tsirkin
  1 sibling, 1 reply; 27+ messages in thread
From: Eli Cohen @ 2021-02-02  7:00 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: Jason Wang, mst, virtualization, netdev, linux-kernel, lulu, Si-Wei Liu

On Mon, Feb 01, 2021 at 08:15:29PM -0800, Si-Wei Liu wrote:
> On Mon, Feb 1, 2021 at 7:13 PM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > On 2021/2/2 3:17 AM, Si-Wei Liu wrote:
> > > On Mon, Feb 1, 2021 at 10:51 AM Si-Wei Liu <siwliu.kernel@gmail.com> wrote:
> > >> On Thu, Jan 28, 2021 at 5:46 AM Eli Cohen <elic@nvidia.com> wrote:
> > >>> suspend_vq should only suspend the VQ and not save the current available
> > >>> index. Saving is done when a change of map occurs, at which point the
> > >>> driver calls save_channel_info().
> > >> Hmmm, suspend_vq() is also called by teardown_vq(), the latter of
> > >> which doesn't save the available index as save_channel_info() doesn't
> > >> get called in that path at all. How does it handle the case that
> > >> get_vq_state() is called from userspace (e.g. QEMU) while the
> > >> hardware VQ object was torn down, but userspace still wants to access
> > >> the queue index?
> > >>
> > >> Refer to https://lore.kernel.org/netdev/1601583511-15138-1-git-send-email-si-wei.liu@oracle.com/
> > >>
> > >> vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
> > >> vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
> > >>
> > >> QEMU will complain with the above warning while VM is being rebooted
> > >> or shut down.
> > >>
> > >> Looks to me either the kernel driver should cover this requirement, or
> > >> the userspace has to bear the burden in saving the index and not call
> > >> into kernel if VQ is destroyed.
> > > Actually, the userspace doesn't have insight into whether the virt queue
> > > will be destroyed when just changing the device status via set_status().
> > > Looking at the other vdpa driver in tree, i.e. ifcvf, it doesn't behave
> > > like that. Hence this still looks to me to be a Mellanox-specific
> > > mlx5_vdpa implementation detail that shouldn't be exposed to userspace.
> >
> >
> > So I think we can simply drop this patch?
> 
> Yep, I think so. To be honest I don't know why it has anything to do
> with the memory hotplug issue.

No relation. That's why I put them in two different patches. Only the
second one is the fix as I stated in the cover letter.

Anyway, let's just take the second patch.

Michael, do you need me to send PATCH 2 again as a single patch or can
you just take it?


> 
> -Siwei
> 
> >
> > Thanks
> >
> >
> > >> -Siwei
> > >>
> > >>
> > >>> Signed-off-by: Eli Cohen <elic@nvidia.com>
> > >>> ---
> > >>>   drivers/vdpa/mlx5/net/mlx5_vnet.c | 8 --------
> > >>>   1 file changed, 8 deletions(-)
> > >>>
> > >>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > >>> index 88dde3455bfd..549ded074ff3 100644
> > >>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > >>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > >>> @@ -1148,8 +1148,6 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> > >>>
> > >>>   static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> > >>>   {
> > >>> -       struct mlx5_virtq_attr attr;
> > >>> -
> > >>>          if (!mvq->initialized)
> > >>>                  return;
> > >>>
> > >>> @@ -1158,12 +1156,6 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
> > >>>
> > >>>          if (modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
> > >>>                  mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");
> > >>> -
> > >>> -       if (query_virtqueue(ndev, mvq, &attr)) {
> > >>> -               mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue\n");
> > >>> -               return;
> > >>> -       }
> > >>> -       mvq->avail_idx = attr.available_index;
> > >>>   }
> > >>>
> > >>>   static void suspend_vqs(struct mlx5_vdpa_net *ndev)
> > >>> --
> > >>> 2.29.2
> > >>>
> >

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue
  2021-02-02  6:02           ` Jason Wang
@ 2021-02-02  7:06             ` Eli Cohen
  2021-02-02  8:38               ` Si-Wei Liu
  0 siblings, 1 reply; 27+ messages in thread
From: Eli Cohen @ 2021-02-02  7:06 UTC (permalink / raw)
  To: Jason Wang
  Cc: Si-Wei Liu, mst, virtualization, netdev, linux-kernel, lulu, Si-Wei Liu

On Tue, Feb 02, 2021 at 02:02:25PM +0800, Jason Wang wrote:
> 
> On 2021/2/2 12:15 PM, Si-Wei Liu wrote:
> > On Mon, Feb 1, 2021 at 7:13 PM Jason Wang <jasowang@redhat.com> wrote:
> > > 
> > > On 2021/2/2 3:17 AM, Si-Wei Liu wrote:
> > > > On Mon, Feb 1, 2021 at 10:51 AM Si-Wei Liu <siwliu.kernel@gmail.com> wrote:
> > > > > On Thu, Jan 28, 2021 at 5:46 AM Eli Cohen <elic@nvidia.com> wrote:
> > > > > > suspend_vq should only suspend the VQ and not save the current available
> > > > > > index. Saving is done when a change of map occurs, at which point the
> > > > > > driver calls save_channel_info().
> > > > > Hmmm, suspend_vq() is also called by teardown_vq(), the latter of
> > > > > which doesn't save the available index as save_channel_info() doesn't
> > > > > get called in that path at all. How does it handle the case that
> > > > > get_vq_state() is called from userspace (e.g. QEMU) while the
> > > > > hardware VQ object was torn down, but userspace still wants to access
> > > > > the queue index?
> > > > > 
> > > > > Refer to https://lore.kernel.org/netdev/1601583511-15138-1-git-send-email-si-wei.liu@oracle.com/
> > > > > 
> > > > > vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
> > > > > vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
> > > > > 
> > > > > QEMU will complain with the above warning while VM is being rebooted
> > > > > or shut down.
> > > > > 
> > > > > Looks to me either the kernel driver should cover this requirement, or
> > > > > the userspace has to bear the burden in saving the index and not call
> > > > > into kernel if VQ is destroyed.
> > > > Actually, the userspace doesn't have insight into whether the virt queue
> > > > will be destroyed when just changing the device status via set_status().
> > > > Looking at the other vdpa driver in tree, i.e. ifcvf, it doesn't behave
> > > > like that. Hence this still looks to me to be a Mellanox-specific
> > > > mlx5_vdpa implementation detail that shouldn't be exposed to userspace.
> > > 
> > > So I think we can simply drop this patch?
> > Yep, I think so. To be honest I don't know why it has anything to do
> > with the memory hotplug issue.
> 
> 
> Eli may know more, my understanding is that, during memory hotplug, qemu
> needs to propagate new memory mappings via set_map(). For Mellanox, it means
> it needs to rebuild memory keys, so the virtqueue needs to be suspended.
> 

I think Siwei was asking why the first patch was related to the hotplug
issue.

But you're correct. When memory is added, I get a new memory map. This
requires me to build a new memory key object which covers the new memory
map. Since the virtqueue objects are referencing this memory key, I need
to destroy them and build new ones referencing the new memory key.
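
Roughly, the order of operations on a map change looks like this (a
simplified sketch; the helper names are approximations, not the exact
driver code):

static int change_map_sketch(struct mlx5_vdpa_net *ndev,
                             struct vhost_iotlb *iotlb)
{
        int err;

        suspend_vqs(ndev);            /* VQs stop using the old key */
        save_channels_info(ndev);     /* stash per-VQ state and indices */
        teardown_vq_resources(ndev);  /* destroy objects holding the key */
        destroy_memory_key(ndev);

        err = create_memory_key(ndev, iotlb); /* covers the new map */
        if (err)
                return err;

        restore_channels_info(ndev);     /* feed the saved state back */
        return setup_vq_resources(ndev); /* new VQ objects on the new key */
}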

> Thanks
> 
> 
> > 
> > -Siwei
> > 
> > > Thanks
> > > 
> > > 
> > > > > -Siwei
> > > > > 
> > > > > 
> > > > > > Signed-off-by: Eli Cohen <elic@nvidia.com>
> > > > > > ---
> > > > > >    drivers/vdpa/mlx5/net/mlx5_vnet.c | 8 --------
> > > > > >    1 file changed, 8 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > index 88dde3455bfd..549ded074ff3 100644
> > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > @@ -1148,8 +1148,6 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> > > > > > 
> > > > > >    static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> > > > > >    {
> > > > > > -       struct mlx5_virtq_attr attr;
> > > > > > -
> > > > > >           if (!mvq->initialized)
> > > > > >                   return;
> > > > > > 
> > > > > > @@ -1158,12 +1156,6 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
> > > > > > 
> > > > > >           if (modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
> > > > > >                   mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");
> > > > > > -
> > > > > > -       if (query_virtqueue(ndev, mvq, &attr)) {
> > > > > > -               mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue\n");
> > > > > > -               return;
> > > > > > -       }
> > > > > > -       mvq->avail_idx = attr.available_index;
> > > > > >    }
> > > > > > 
> > > > > >    static void suspend_vqs(struct mlx5_vdpa_net *ndev)
> > > > > > --
> > > > > > 2.29.2
> > > > > > 
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue
  2021-02-02  7:06             ` Eli Cohen
@ 2021-02-02  8:38               ` Si-Wei Liu
  2021-02-02  9:22                 ` Eli Cohen
  0 siblings, 1 reply; 27+ messages in thread
From: Si-Wei Liu @ 2021-02-02  8:38 UTC (permalink / raw)
  To: Eli Cohen
  Cc: Jason Wang, mst, virtualization, netdev, linux-kernel, lulu, Si-Wei Liu

Thanks Eli and Jason for clarifications. See inline.

On Mon, Feb 1, 2021 at 11:06 PM Eli Cohen <elic@nvidia.com> wrote:
>
> On Tue, Feb 02, 2021 at 02:02:25PM +0800, Jason Wang wrote:
> >
> > On 2021/2/2 12:15 PM, Si-Wei Liu wrote:
> > > On Mon, Feb 1, 2021 at 7:13 PM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On 2021/2/2 3:17 AM, Si-Wei Liu wrote:
> > > > > On Mon, Feb 1, 2021 at 10:51 AM Si-Wei Liu <siwliu.kernel@gmail.com> wrote:
> > > > > > On Thu, Jan 28, 2021 at 5:46 AM Eli Cohen <elic@nvidia.com> wrote:
> > > > > > > suspend_vq should only suspend the VQ and not save the current available
> > > > > > > index. Saving is done when a change of map occurs, at which point the
> > > > > > > driver calls save_channel_info().
> > > > > > Hmmm, suspend_vq() is also called by teardown_vq(), the latter of
> > > > > > which doesn't save the available index as save_channel_info() doesn't
> > > > > > get called in that path at all. How does it handle the case that
> > > > > > get_vq_state() is called from userspace (e.g. QEMU) while the
> > > > > > hardware VQ object was torn down, but userspace still wants to access
> > > > > > the queue index?
> > > > > >
> > > > > > Refer to https://lore.kernel.org/netdev/1601583511-15138-1-git-send-email-si-wei.liu@oracle.com/
> > > > > >
> > > > > > vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
> > > > > > vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
> > > > > >
> > > > > > QEMU will complain with the above warning while VM is being rebooted
> > > > > > or shut down.
> > > > > >
> > > > > > Looks to me either the kernel driver should cover this requirement, or
> > > > > > the userspace has to bear the burden in saving the index and not call
> > > > > > into kernel if VQ is destroyed.
> > > > > Actually, the userspace doesn't have insight into whether the virt queue
> > > > > will be destroyed when just changing the device status via set_status().
> > > > > Looking at the other vdpa driver in tree, i.e. ifcvf, it doesn't behave
> > > > > like that. Hence this still looks to me to be a Mellanox-specific
> > > > > mlx5_vdpa implementation detail that shouldn't be exposed to userspace.
> > > >
> > > > So I think we can simply drop this patch?
> > > Yep, I think so. To be honest I don't know why it has anything to do
> > > with the memory hotplug issue.
> >
> >
> > Eli may know more, my understanding is that, during memory hotplug, qemu
> > needs to propagate new memory mappings via set_map(). For Mellanox, it means
> > it needs to rebuild memory keys, so the virtqueue needs to be suspended.
> >
>
> I think Siwei was asking why the first patch was related to the hotplug
> issue.

I was thinking about how consistency is assured when saving/restoring
this h/w avail_index against the one in the virtq memory, particularly
in the .region_add()/.region_del() context (e.g. the hotplug case). The
problem is I don't see an explicit memory barrier when the guest thread
updates the avail_index, so how does the device make sure the h/w
avail_index is up to date while the guest may race with updating the
virtq's avail_index in the meanwhile? Maybe I completely missed
something obvious?

Thanks,
-Siwei

>
> But you're correct. When memory is added, I get a new memory map. This
> requires me to build a new memory key object which covers the new memory
> map. Since the virtqueue objects are referencing this memory key, I need
> to destroy them and build new ones referencing the new memory key.
>
> > Thanks
> >
> >
> > >
> > > -Siwei
> > >
> > > > Thanks
> > > >
> > > >
> > > > > > -Siwei
> > > > > >
> > > > > >
> > > > > > > Signed-off-by: Eli Cohen <elic@nvidia.com>
> > > > > > > ---
> > > > > > >    drivers/vdpa/mlx5/net/mlx5_vnet.c | 8 --------
> > > > > > >    1 file changed, 8 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > index 88dde3455bfd..549ded074ff3 100644
> > > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > @@ -1148,8 +1148,6 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> > > > > > >
> > > > > > >    static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> > > > > > >    {
> > > > > > > -       struct mlx5_virtq_attr attr;
> > > > > > > -
> > > > > > >           if (!mvq->initialized)
> > > > > > >                   return;
> > > > > > >
> > > > > > > @@ -1158,12 +1156,6 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
> > > > > > >
> > > > > > >           if (modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
> > > > > > >                   mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");
> > > > > > > -
> > > > > > > -       if (query_virtqueue(ndev, mvq, &attr)) {
> > > > > > > -               mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue\n");
> > > > > > > -               return;
> > > > > > > -       }
> > > > > > > -       mvq->avail_idx = attr.available_index;
> > > > > > >    }
> > > > > > >
> > > > > > >    static void suspend_vqs(struct mlx5_vdpa_net *ndev)
> > > > > > > --
> > > > > > > 2.29.2
> > > > > > >
> >

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue
  2021-02-02  8:38               ` Si-Wei Liu
@ 2021-02-02  9:22                 ` Eli Cohen
  2021-02-02 17:54                   ` Si-Wei Liu
  0 siblings, 1 reply; 27+ messages in thread
From: Eli Cohen @ 2021-02-02  9:22 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: Jason Wang, mst, virtualization, netdev, linux-kernel, lulu, Si-Wei Liu

On Tue, Feb 02, 2021 at 12:38:51AM -0800, Si-Wei Liu wrote:
> Thanks Eli and Jason for clarifications. See inline.
> 
> On Mon, Feb 1, 2021 at 11:06 PM Eli Cohen <elic@nvidia.com> wrote:
> >
> > On Tue, Feb 02, 2021 at 02:02:25PM +0800, Jason Wang wrote:
> > >
> > > On 2021/2/2 12:15 PM, Si-Wei Liu wrote:
> > > > On Mon, Feb 1, 2021 at 7:13 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > > On 2021/2/2 3:17 AM, Si-Wei Liu wrote:
> > > > > > On Mon, Feb 1, 2021 at 10:51 AM Si-Wei Liu <siwliu.kernel@gmail.com> wrote:
> > > > > > > On Thu, Jan 28, 2021 at 5:46 AM Eli Cohen <elic@nvidia.com> wrote:
> > > > > > > > suspend_vq should only suspend the VQ and not save the current available
> > > > > > > > index. Saving is done when a change of map occurs, at which point the
> > > > > > > > driver calls save_channel_info().
> > > > > > > Hmmm, suspend_vq() is also called by teardown_vq(), the latter of
> > > > > > > which doesn't save the available index as save_channel_info() doesn't
> > > > > > > get called in that path at all. How does it handle the case that
> > > > > > > get_vq_state() is called from userspace (e.g. QEMU) while the
> > > > > > > hardware VQ object was torn down, but userspace still wants to access
> > > > > > > the queue index?
> > > > > > >
> > > > > > > Refer to https://lore.kernel.org/netdev/1601583511-15138-1-git-send-email-si-wei.liu@oracle.com/
> > > > > > >
> > > > > > > vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
> > > > > > > vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
> > > > > > >
> > > > > > > QEMU will complain with the above warning while VM is being rebooted
> > > > > > > or shut down.
> > > > > > >
> > > > > > > Looks to me either the kernel driver should cover this requirement, or
> > > > > > > the userspace has to bear the burden in saving the index and not call
> > > > > > > into kernel if VQ is destroyed.
> > > > > > Actually, the userspace doesn't have insight into whether the virt queue
> > > > > > will be destroyed when just changing the device status via set_status().
> > > > > > Looking at the other vdpa driver in tree, i.e. ifcvf, it doesn't behave
> > > > > > like that. Hence this still looks to me to be a Mellanox-specific
> > > > > > mlx5_vdpa implementation detail that shouldn't be exposed to userspace.
> > > > >
> > > > > So I think we can simply drop this patch?
> > > > Yep, I think so. To be honest I don't know why it has anything to do
> > > > with the memory hotplug issue.
> > >
> > >
> > > Eli may know more, my understanding is that, during memory hotplug, qemu
> > > needs to propagate new memory mappings via set_map(). For Mellanox, it means
> > > it needs to rebuild memory keys, so the virtqueue needs to be suspended.
> > >
> >
> > I think Siwei was asking why the first patch was related to the hotplug
> > issue.
> 
> I was thinking about how consistency is assured when saving/restoring
> this h/w avail_index against the one in the virtq memory, particularly
> in the .region_add()/.region_del() context (e.g. the hotplug case). The
> problem is I don't see an explicit memory barrier when the guest thread
> updates the avail_index, so how does the device make sure the h/w
> avail_index is up to date while the guest may race with updating the
> virtq's avail_index in the meanwhile? Maybe I completely missed
> something obvious?

If you're asking about synchronization upon hot plug of memory, the
hardware always goes to read the available index from memory when a new
hardware object is associated with a virtqueue. You can argue then that
you don't need to restore the available index, and you may be right, but
this is the correct interface to the firmware.
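
In code terms, when the new virtqueue object is created, the previously
saved indices are simply written into its context, along these lines
(simplified; the field names approximate the driver's and may not match
the code exactly):

        void *obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in,
                                         obj_context);

        /* Seed the new hardware object with the indices saved before
         * teardown; firmware continues the queue from these values. */
        MLX5_SET(virtio_net_q_object, obj_context, hw_available_index,
                 mvq->avail_idx);
        MLX5_SET(virtio_net_q_object, obj_context, hw_used_index,
                 mvq->used_idx);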


If you're asking generally how sync is assured when the guest updates
the available index, can you please send a pointer to the code where you
see the update without a memory barrier?

> 
> Thanks,
> -Siwei
> 
> >
> > But you're correct. When memory is added, I get a new memory map. This
> > requires me to build a new memory key object which covers the new memory
> > map. Since the virtqueue objects are referencing this memory key, I need
> > to destroy them and build new ones referencing the new memory key.
> >
> > > Thanks
> > >
> > >
> > > >
> > > > -Siwei
> > > >
> > > > > Thanks
> > > > >
> > > > >
> > > > > > > -Siwei
> > > > > > >
> > > > > > >
> > > > > > > > Signed-off-by: Eli Cohen <elic@nvidia.com>
> > > > > > > > ---
> > > > > > > >    drivers/vdpa/mlx5/net/mlx5_vnet.c | 8 --------
> > > > > > > >    1 file changed, 8 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > index 88dde3455bfd..549ded074ff3 100644
> > > > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > @@ -1148,8 +1148,6 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> > > > > > > >
> > > > > > > >    static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> > > > > > > >    {
> > > > > > > > -       struct mlx5_virtq_attr attr;
> > > > > > > > -
> > > > > > > >           if (!mvq->initialized)
> > > > > > > >                   return;
> > > > > > > >
> > > > > > > > @@ -1158,12 +1156,6 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
> > > > > > > >
> > > > > > > >           if (modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
> > > > > > > >                   mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");
> > > > > > > > -
> > > > > > > > -       if (query_virtqueue(ndev, mvq, &attr)) {
> > > > > > > > -               mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue\n");
> > > > > > > > -               return;
> > > > > > > > -       }
> > > > > > > > -       mvq->avail_idx = attr.available_index;
> > > > > > > >    }
> > > > > > > >
> > > > > > > >    static void suspend_vqs(struct mlx5_vdpa_net *ndev)
> > > > > > > > --
> > > > > > > > 2.29.2
> > > > > > > >
> > >

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue
  2021-02-02  7:00           ` Eli Cohen
@ 2021-02-02 14:06             ` Michael S. Tsirkin
  0 siblings, 0 replies; 27+ messages in thread
From: Michael S. Tsirkin @ 2021-02-02 14:06 UTC (permalink / raw)
  To: Eli Cohen
  Cc: Si-Wei Liu, Jason Wang, virtualization, netdev, linux-kernel,
	lulu, Si-Wei Liu

On Tue, Feb 02, 2021 at 09:00:55AM +0200, Eli Cohen wrote:
> On Mon, Feb 01, 2021 at 08:15:29PM -0800, Si-Wei Liu wrote:
> > On Mon, Feb 1, 2021 at 7:13 PM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > >
> > > On 2021/2/2 3:17 AM, Si-Wei Liu wrote:
> > > > On Mon, Feb 1, 2021 at 10:51 AM Si-Wei Liu <siwliu.kernel@gmail.com> wrote:
> > > >> On Thu, Jan 28, 2021 at 5:46 AM Eli Cohen <elic@nvidia.com> wrote:
> > > >>> suspend_vq should only suspend the VQ and not save the current available
> > > >>> index. Saving is done when a change of map occurs, at which point the
> > > >>> driver calls save_channel_info().
> > > >> Hmmm, suspend_vq() is also called by teardown_vq(), the latter of
> > > >> which doesn't save the available index as save_channel_info() doesn't
> > > >> get called in that path at all. How does it handle the case that
> > > >> get_vq_state() is called from userspace (e.g. QEMU) while the
> > > >> hardware VQ object was torn down, but userspace still wants to access
> > > >> the queue index?
> > > >>
> > > >> Refer to https://lore.kernel.org/netdev/1601583511-15138-1-git-send-email-si-wei.liu@oracle.com/
> > > >>
> > > >> vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
> > > >> vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
> > > >>
> > > >> QEMU will complain with the above warning while VM is being rebooted
> > > >> or shut down.
> > > >>
> > > >> Looks to me either the kernel driver should cover this requirement, or
> > > >> the userspace has to bear the burden in saving the index and not call
> > > >> into kernel if VQ is destroyed.
> > > > Actually, the userspace doesn't have insight into whether the virt queue
> > > > will be destroyed when just changing the device status via set_status().
> > > > Looking at the other vdpa driver in tree, i.e. ifcvf, it doesn't behave
> > > > like that. Hence this still looks to me to be a Mellanox-specific
> > > > mlx5_vdpa implementation detail that shouldn't be exposed to userspace.
> > >
> > >
> > > So I think we can simply drop this patch?
> > 
> > Yep, I think so. To be honest I don't know why it has anything to do
> > with the memory hotplug issue.
> 
> No relation. That's why I put them in two different patches. Only the
> second one is the fix as I stated in the cover letter.
> 
> Anyway, let's just take the second patch.
> 
> Michael, do you need me to send PATCH 2 again as a single patch or can
> you just take it?

Pls post fixes separately. Thanks!

> 
> > 
> > -Siwei
> > 
> > >
> > > Thanks
> > >
> > >
> > > >> -Siwei
> > > >>
> > > >>
> > > >>> Signed-off-by: Eli Cohen <elic@nvidia.com>
> > > >>> ---
> > > >>>   drivers/vdpa/mlx5/net/mlx5_vnet.c | 8 --------
> > > >>>   1 file changed, 8 deletions(-)
> > > >>>
> > > >>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > >>> index 88dde3455bfd..549ded074ff3 100644
> > > >>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > >>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > >>> @@ -1148,8 +1148,6 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> > > >>>
> > > >>>   static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> > > >>>   {
> > > >>> -       struct mlx5_virtq_attr attr;
> > > >>> -
> > > >>>          if (!mvq->initialized)
> > > >>>                  return;
> > > >>>
> > > >>> @@ -1158,12 +1156,6 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
> > > >>>
> > > >>>          if (modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
> > > >>>                  mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");
> > > >>> -
> > > >>> -       if (query_virtqueue(ndev, mvq, &attr)) {
> > > >>> -               mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue\n");
> > > >>> -               return;
> > > >>> -       }
> > > >>> -       mvq->avail_idx = attr.available_index;
> > > >>>   }
> > > >>>
> > > >>>   static void suspend_vqs(struct mlx5_vdpa_net *ndev)
> > > >>> --
> > > >>> 2.29.2
> > > >>>
> > >


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue
  2021-02-02  9:22                 ` Eli Cohen
@ 2021-02-02 17:54                   ` Si-Wei Liu
  2021-02-03  5:16                     ` Jason Wang
  0 siblings, 1 reply; 27+ messages in thread
From: Si-Wei Liu @ 2021-02-02 17:54 UTC (permalink / raw)
  To: Eli Cohen
  Cc: Jason Wang, mst, virtualization, netdev, linux-kernel, lulu, Si-Wei Liu

On Tue, Feb 2, 2021 at 1:23 AM Eli Cohen <elic@nvidia.com> wrote:
>
> On Tue, Feb 02, 2021 at 12:38:51AM -0800, Si-Wei Liu wrote:
> > Thanks Eli and Jason for clarifications. See inline.
> >
> > On Mon, Feb 1, 2021 at 11:06 PM Eli Cohen <elic@nvidia.com> wrote:
> > >
> > > On Tue, Feb 02, 2021 at 02:02:25PM +0800, Jason Wang wrote:
> > > >
> > > > On 2021/2/2 12:15 PM, Si-Wei Liu wrote:
> > > > > On Mon, Feb 1, 2021 at 7:13 PM Jason Wang <jasowang@redhat.com> wrote:
> > > > > >
> > > > > > On 2021/2/2 3:17 AM, Si-Wei Liu wrote:
> > > > > > > On Mon, Feb 1, 2021 at 10:51 AM Si-Wei Liu <siwliu.kernel@gmail.com> wrote:
> > > > > > > > On Thu, Jan 28, 2021 at 5:46 AM Eli Cohen <elic@nvidia.com> wrote:
> > > > > > > > > suspend_vq should only suspend the VQ and not save the current available
> > > > > > > > > index. Saving is done when a change of map occurs, at which point the
> > > > > > > > > driver calls save_channel_info().
> > > > > > > > Hmmm, suspend_vq() is also called by teardown_vq(), the latter of
> > > > > > > > which doesn't save the available index as save_channel_info() doesn't
> > > > > > > > get called in that path at all. How does it handle the case that
> > > > > > > > get_vq_state() is called from userspace (e.g. QEMU) while the
> > > > > > > > hardware VQ object was torn down, but userspace still wants to access
> > > > > > > > the queue index?
> > > > > > > >
> > > > > > > > Refer to https://lore.kernel.org/netdev/1601583511-15138-1-git-send-email-si-wei.liu@oracle.com/
> > > > > > > >
> > > > > > > > vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
> > > > > > > > vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
> > > > > > > >
> > > > > > > > QEMU will complain with the above warning while VM is being rebooted
> > > > > > > > or shut down.
> > > > > > > >
> > > > > > > > Looks to me either the kernel driver should cover this requirement, or
> > > > > > > > the userspace has to bear the burden in saving the index and not call
> > > > > > > > into kernel if VQ is destroyed.
> > > > > > > Actually, the userspace doesn't have insight into whether the virt queue
> > > > > > > will be destroyed when just changing the device status via set_status().
> > > > > > > Looking at the other vdpa driver in tree, i.e. ifcvf, it doesn't behave
> > > > > > > like that. Hence this still looks to me to be a Mellanox-specific
> > > > > > > mlx5_vdpa implementation detail that shouldn't be exposed to userspace.
> > > > > >
> > > > > > So I think we can simply drop this patch?
> > > > > Yep, I think so. To be honest I don't know why it has anything to do
> > > > > with the memory hotplug issue.
> > > >
> > > >
> > > > Eli may know more, my understanding is that, during memory hotplug, qemu
> > > > needs to propagate new memory mappings via set_map(). For Mellanox, it means
> > > > it needs to rebuild memory keys, so the virtqueue needs to be suspended.
> > > >
> > >
> > > I think Siwei was asking why the first patch was related to the hotplug
> > > issue.
> >
> > I was thinking about how consistency is assured when saving/restoring
> > this h/w avail_index against the one in the virtq memory, particularly
> > in the .region_add()/.region_del() context (e.g. the hotplug case). The
> > problem is I don't see an explicit memory barrier when the guest thread
> > updates the avail_index, so how does the device make sure the h/w
> > avail_index is up to date while the guest may race with updating the
> > virtq's avail_index in the meanwhile? Maybe I completely missed
> > something obvious?
>
> If you're asking about synchronization upon hot plug of memory, the
> hardware always goes to read the available index from memory when a new
> hardware object is associated with a virtqueue. You can argue then that
> you don't need to restore the available index, and you may be right, but
> this is the correct interface to the firmware.
>
>
> If you're asking generally how sync is assured when the guest updates
> the available index, can you please send a pointer to the code where you
> see the update without a memory barrier?

This is a snippet of virtqueue_add_split() where avail_index gets
updated by the guest:

        /* Put entry in available array (but don't update avail->idx until they
         * do sync). */
        avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1);
        vq->split.vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head);

        /* Descriptors and available array need to be set before we expose the
         * new available array entries. */
        virtio_wmb(vq->weak_barriers);
        vq->split.avail_idx_shadow++;
        vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
                                                vq->split.avail_idx_shadow);
        vq->num_added++;

There's a memory barrier to make sure the updates to the descriptor and
available ring are seen before the write to avail->idx, but there seems
to be no guarantee that this update is flushed to memory immediately,
either before or after the mlx5-vdpa is suspended and gets the old
avail_index value stashed somewhere. In this case, how does the hardware
ensure consistency between the guest virtio and host mlx5-vdpa? Or does
it completely rely on the guest to update the avail_index once the next
buffer is available, so that the index will be in sync again?

Thanks,
-Siwei

>
> >
> > Thanks,
> > -Siwei
> >
> > >
> > > But you're correct. When memory is added, I get a new memory map. This
> > > requires me to build a new memory key object which covers the new memory
> > > map. Since the virtqueue objects are referencing this memory key, I need
> > > to destroy them and build new ones referencing the new memory key.
> > >
> > > > Thanks
> > > >
> > > >
> > > > >
> > > > > -Siwei
> > > > >
> > > > > > Thanks
> > > > > >
> > > > > >
> > > > > > > > -Siwei
> > > > > > > >
> > > > > > > >
> > > > > > > > > Signed-off-by: Eli Cohen <elic@nvidia.com>
> > > > > > > > > ---
> > > > > > > > >    drivers/vdpa/mlx5/net/mlx5_vnet.c | 8 --------
> > > > > > > > >    1 file changed, 8 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > index 88dde3455bfd..549ded074ff3 100644
> > > > > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > @@ -1148,8 +1148,6 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> > > > > > > > >
> > > > > > > > >    static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> > > > > > > > >    {
> > > > > > > > > -       struct mlx5_virtq_attr attr;
> > > > > > > > > -
> > > > > > > > >           if (!mvq->initialized)
> > > > > > > > >                   return;
> > > > > > > > >
> > > > > > > > > @@ -1158,12 +1156,6 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
> > > > > > > > >
> > > > > > > > >           if (modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
> > > > > > > > >                   mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");
> > > > > > > > > -
> > > > > > > > > -       if (query_virtqueue(ndev, mvq, &attr)) {
> > > > > > > > > -               mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue\n");
> > > > > > > > > -               return;
> > > > > > > > > -       }
> > > > > > > > > -       mvq->avail_idx = attr.available_index;
> > > > > > > > >    }
> > > > > > > > >
> > > > > > > > >    static void suspend_vqs(struct mlx5_vdpa_net *ndev)
> > > > > > > > > --
> > > > > > > > > 2.29.2
> > > > > > > > >
> > > >

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue
  2021-02-02 17:54                   ` Si-Wei Liu
@ 2021-02-03  5:16                     ` Jason Wang
  2021-02-03 23:19                       ` Si-Wei Liu
  0 siblings, 1 reply; 27+ messages in thread
From: Jason Wang @ 2021-02-03  5:16 UTC (permalink / raw)
  To: Si-Wei Liu, Eli Cohen
  Cc: mst, virtualization, netdev, linux-kernel, lulu, Si-Wei Liu


On 2021/2/3 1:54 AM, Si-Wei Liu wrote:
> On Tue, Feb 2, 2021 at 1:23 AM Eli Cohen <elic@nvidia.com> wrote:
>> On Tue, Feb 02, 2021 at 12:38:51AM -0800, Si-Wei Liu wrote:
>>> Thanks Eli and Jason for clarifications. See inline.
>>>
>>> On Mon, Feb 1, 2021 at 11:06 PM Eli Cohen <elic@nvidia.com> wrote:
>>>> On Tue, Feb 02, 2021 at 02:02:25PM +0800, Jason Wang wrote:
>>>>> On 2021/2/2 12:15 PM, Si-Wei Liu wrote:
>>>>>> On Mon, Feb 1, 2021 at 7:13 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>>>> On 2021/2/2 3:17 AM, Si-Wei Liu wrote:
>>>>>>>> On Mon, Feb 1, 2021 at 10:51 AM Si-Wei Liu <siwliu.kernel@gmail.com> wrote:
>>>>>>>>> On Thu, Jan 28, 2021 at 5:46 AM Eli Cohen <elic@nvidia.com> wrote:
>>>>>>>>>> suspend_vq should only suspend the VQ and not save the current available
>>>>>>>>>> index. Saving is done when a change of map occurs, at which point the
>>>>>>>>>> driver calls save_channel_info().
>>>>>>>>> Hmmm, suspend_vq() is also called by teardown_vq(), the latter of
>>>>>>>>> which doesn't save the available index as save_channel_info() doesn't
>>>>>>>>> get called in that path at all. How does it handle the case that
>>>>>>>>> get_vq_state() is called from userspace (e.g. QEMU) while the
>>>>>>>>> hardware VQ object was torn down, but userspace still wants to access
>>>>>>>>> the queue index?
>>>>>>>>>
>>>>>>>>> Refer to https://lore.kernel.org/netdev/1601583511-15138-1-git-send-email-si-wei.liu@oracle.com/
>>>>>>>>>
>>>>>>>>> vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
>>>>>>>>> vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
>>>>>>>>>
>>>>>>>>> QEMU will complain with the above warning while VM is being rebooted
>>>>>>>>> or shut down.
>>>>>>>>>
>>>>>>>>> Looks to me either the kernel driver should cover this requirement, or
>>>>>>>>> the userspace has to bear the burden in saving the index and not call
>>>>>>>>> into kernel if VQ is destroyed.
>>>>>>>> Actually, the userspace doesn't have insight into whether the virt queue
>>>>>>>> will be destroyed when just changing the device status via set_status().
>>>>>>>> Looking at the other vdpa driver in tree, i.e. ifcvf, it doesn't behave
>>>>>>>> like that. Hence this still looks to me to be a Mellanox-specific
>>>>>>>> mlx5_vdpa implementation detail that shouldn't be exposed to userspace.
>>>>>>> So I think we can simply drop this patch?
>>>>>> Yep, I think so. To be honest I don't know why it has anything to do
>>>>>> with the memory hotplug issue.
>>>>>
>>>>> Eli may know more, my understanding is that, during memory hotplug, qemu
>>>>> needs to propagate new memory mappings via set_map(). For Mellanox, it means
>>>>> it needs to rebuild memory keys, so the virtqueue needs to be suspended.
>>>>>
>>>> I think Siwei was asking why the first patch was related to the hotplug
>>>> issue.
>>> I was thinking about how consistency is assured when saving/restoring
>>> this h/w avail_index against the one in the virtq memory, particularly
>>> in the .region_add()/.region_del() context (e.g. the hotplug case). The
>>> problem is I don't see an explicit memory barrier when the guest thread
>>> updates the avail_index, so how does the device make sure the h/w
>>> avail_index is up to date while the guest may race with updating the
>>> virtq's avail_index in the meanwhile? Maybe I completely missed
>>> something obvious?
>>
>> If you're asking about synchronization upon hot plug of memory, the
>> hardware always goes to read the available index from memory when a new
>> hardware object is associated with a virtqueue. You can argue then that
>> you don't need to restore the available index, and you may be right, but
>> this is the correct interface to the firmware.
>>
>>
>> If you're asking generally how sync is assured when the guest updates
>> the available index, can you please send a pointer to the code where you
>> see the update without a memory barrier?
> This is a snippet of virtqueue_add_split() where avail_index gets
> updated by the guest:
>
>          /* Put entry in available array (but don't update avail->idx until they
>           * do sync). */
>          avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1);
>          vq->split.vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head);
>
>          /* Descriptors and available array need to be set before we expose the
>           * new available array entries. */
>          virtio_wmb(vq->weak_barriers);
>          vq->split.avail_idx_shadow++;
>          vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
>                                                  vq->split.avail_idx_shadow);
>          vq->num_added++;
>
> There's a memory barrier to make sure the updates to the descriptor and
> available ring are seen before the write to avail->idx, but there seems
> to be no guarantee that this update is flushed to memory immediately,
> either before or after the mlx5-vdpa is suspended and gets the old
> avail_index value stashed somewhere. In this case, how does the hardware
> ensure consistency between the guest virtio and host mlx5-vdpa? Or does
> it completely rely on the guest to update the avail_index once the next
> buffer is available, so that the index will be in sync again?


I'm not sure I get the question, but notice that the driver should check
and notify the virtqueue when the device wants a notification. So there's
a virtio_mb(), e.g. in:

static bool virtqueue_kick_prepare_split(struct virtqueue *_vq)
{
     struct vring_virtqueue *vq = to_vvq(_vq);
     u16 new, old;
     bool needs_kick;

     START_USE(vq);
     /* We need to expose available array entries before checking avail
      * event. */
     virtio_mb(vq->weak_barriers);

     old = vq->split.avail_idx_shadow - vq->num_added;
     new = vq->split.avail_idx_shadow;
     vq->num_added = 0;

(See the comment above virtio_mb().) So the avail idx is guaranteed to
be committed to memory (the cache hierarchy) before the notification
check. I think we sync through this.
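
Schematically, the ordering contract the two sides rely on (just the
invariant, not code from either side):

/*
 * Guest (producer)              Device (consumer)
 *
 * write avail->ring[i]
 * virtio_wmb()                  read avail->idx
 * write avail->idx              virtio_rmb()
 *                               read avail->ring entries up to idx
 *
 * The wmb/rmb pairing guarantees the device never observes an
 * avail->idx that points at ring entries it cannot also observe.
 */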

Thanks


>
> Thanks,
> -Siwei
>
>>> Thanks,
>>> -Siwei
>>>
>>>> But you're correct. When memory is added, I get a new memory map. This
>>>> requires me to build a new memory key object which covers the new memory
>>>> map. Since the virtqueue objects are referencing this memory key, I need
>>>> to destroy them and build new ones referencing the new memory key.
>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>> -Siwei
>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>
>>>>>>>>> -Siwei
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Signed-off-by: Eli Cohen <elic@nvidia.com>
>>>>>>>>>> ---
>>>>>>>>>>     drivers/vdpa/mlx5/net/mlx5_vnet.c | 8 --------
>>>>>>>>>>     1 file changed, 8 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>>> index 88dde3455bfd..549ded074ff3 100644
>>>>>>>>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>>> @@ -1148,8 +1148,6 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
>>>>>>>>>>
>>>>>>>>>>     static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
>>>>>>>>>>     {
>>>>>>>>>> -       struct mlx5_virtq_attr attr;
>>>>>>>>>> -
>>>>>>>>>>            if (!mvq->initialized)
>>>>>>>>>>                    return;
>>>>>>>>>>
>>>>>>>>>> @@ -1158,12 +1156,6 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
>>>>>>>>>>
>>>>>>>>>>            if (modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
>>>>>>>>>>                    mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");
>>>>>>>>>> -
>>>>>>>>>> -       if (query_virtqueue(ndev, mvq, &attr)) {
>>>>>>>>>> -               mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue\n");
>>>>>>>>>> -               return;
>>>>>>>>>> -       }
>>>>>>>>>> -       mvq->avail_idx = attr.available_index;
>>>>>>>>>>     }
>>>>>>>>>>
>>>>>>>>>>     static void suspend_vqs(struct mlx5_vdpa_net *ndev)
>>>>>>>>>> --
>>>>>>>>>> 2.29.2
>>>>>>>>>>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue
  2021-02-03  5:16                     ` Jason Wang
@ 2021-02-03 23:19                       ` Si-Wei Liu
  2021-02-04  2:57                         ` Jason Wang
  2021-02-04  6:53                         ` Eli Cohen
  0 siblings, 2 replies; 27+ messages in thread
From: Si-Wei Liu @ 2021-02-03 23:19 UTC (permalink / raw)
  To: Jason Wang
  Cc: Eli Cohen, mst, virtualization, netdev, linux-kernel, lulu, Si-Wei Liu

On Tue, Feb 2, 2021 at 9:16 PM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2021/2/3 上午1:54, Si-Wei Liu wrote:
> > On Tue, Feb 2, 2021 at 1:23 AM Eli Cohen <elic@nvidia.com> wrote:
> >> On Tue, Feb 02, 2021 at 12:38:51AM -0800, Si-Wei Liu wrote:
> >>> Thanks Eli and Jason for clarifications. See inline.
> >>>
> >>> On Mon, Feb 1, 2021 at 11:06 PM Eli Cohen <elic@nvidia.com> wrote:
> >>>> On Tue, Feb 02, 2021 at 02:02:25PM +0800, Jason Wang wrote:
> >>>>> On 2021/2/2 下午12:15, Si-Wei Liu wrote:
> >>>>>> On Mon, Feb 1, 2021 at 7:13 PM Jason Wang <jasowang@redhat.com> wrote:
> >>>>>>> On 2021/2/2 上午3:17, Si-Wei Liu wrote:
> >>>>>>>> On Mon, Feb 1, 2021 at 10:51 AM Si-Wei Liu <siwliu.kernel@gmail.com> wrote:
> >>>>>>>>> On Thu, Jan 28, 2021 at 5:46 AM Eli Cohen <elic@nvidia.com> wrote:
> >>>>>>>>>> suspend_vq should only suspend the VQ and not save the current available
> >>>>>>>>>> index. This is done when a change of map occurs when the driver calls
> >>>>>>>>>> save_channel_info().
> >>>>>>>>> Hmmm, suspend_vq() is also called by teardown_vq(), the latter of
> >>>>>>>>> which doesn't save the available index as save_channel_info() doesn't
> >>>>>>>>> get called in that path at all. How does it handle the case that
> >>>>>>>>> get_vq_state() is called from userspace (e.g. QEMU) while the
> >>>>>>>>> hardware VQ object was torn down, but userspace still wants to access
> >>>>>>>>> the queue index?
> >>>>>>>>>
> >>>>>>>>> Refer to https://lore.kernel.org/netdev/1601583511-15138-1-git-send-email-si-wei.liu@oracle.com/
> >>>>>>>>>
> >>>>>>>>> vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
> >>>>>>>>> vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
> >>>>>>>>>
> >>>>>>>>> QEMU will complain with the above warning while VM is being rebooted
> >>>>>>>>> or shut down.
> >>>>>>>>>
> >>>>>>>>> Looks to me either the kernel driver should cover this requirement, or
> >>>>>>>>> the userspace has to bear the burden in saving the index and not call
> >>>>>>>>> into kernel if VQ is destroyed.
> >>>>>>>> Actually, the userspace doesn't have insight into whether the virt queue
> >>>>>>>> will be destroyed if just changing the device status via set_status().
> >>>>>>>> Looking at other vdpa driver in tree i.e. ifcvf it doesn't behave like
> >>>>>>>> so. Hence this still looks to me to be Mellanox specifics and
> >>>>>>>> mlx5_vdpa implementation detail that shouldn't expose to userspace.
> >>>>>>> So I think we can simply drop this patch?
> >>>>>> Yep, I think so. To be honest I don't know why it has anything to do
> >>>>>> with the memory hotplug issue.
> >>>>>
> >>>>> Eli may know more, but my understanding is that, during memory hotplug, qemu
> >>>>> needs to propagate new memory mappings via set_map(). For mellanox, it means
> >>>>> it needs to rebuild memory keys, so the virtqueue needs to be suspended.
> >>>>>
> >>>> I think Siwei was asking why the first patch was related to the hotplug
> >>>> issue.
> >>> I was thinking how consistency is assured when saving/restoring this
> >>> h/w avail_index against the one in the virtq memory, particularly in
> >>> the .region_add()/.region_del() context (e.g. the hotplug case). The problem
> >>> is I don't see an explicit memory barrier when the guest thread updates the
> >>> avail_index, so how does the device make sure the h/w avail_index is
> >>> up to date while the guest may race with updating the virtq's avail_index
> >>> in the meantime? Maybe I completely missed something obvious?
> >> If you're asking about synchronization upon hot plug of memory, the
> >> hardware always goes to read the available index from memory when a new
> >> hardware object is associated with a virtqueue. You can argue then that
> >> you don't need to restore the available index and you may be right, but
> >> this is the current interface to the firmware.
> >>
> >>
> >> If you're asking on generally how sync is assured when the guest updates
> >> the available index, can you please send a pointer to the code where you
> >> see the update without a memory barrier?
> > This is a snippet of virtqueue_add_split() where avail_index gets
> > updated by guest:
> >
> >          /* Put entry in available array (but don't update avail->idx until they
> >           * do sync). */
> >          avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1);
> >          vq->split.vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head);
> >
> >          /* Descriptors and available array need to be set before we expose the
> >           * new available array entries. */
> >          virtio_wmb(vq->weak_barriers);
> >          vq->split.avail_idx_shadow++;
> >          vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
> >                                                  vq->split.avail_idx_shadow);
> >          vq->num_added++;
> >
> > > There's a memory barrier to make sure the update to the descriptor and
> > > available ring is seen before writing to avail->idx, but there
> > > seems to be no guarantee that this update is flushed to memory
> > > immediately, either before or after the mlx5-vdpa is suspended and gets
> > > the old avail_index value stashed somewhere. In this case, how does
> > > the hardware ensure the consistency between the guest virtio and host
> > > mlx5-vdpa? Or does it completely rely on the guest to update the avail_index
> > > once the next buffer is available, so that the index will be in sync
> > > again?
>
>
> I'm not sure I get the question, but notice that the driver should check
> and notify the virtqueue when the device wants a notification. So there's a
> virtio_mb() e.g. in:
>
> static bool virtqueue_kick_prepare_split(struct virtqueue *_vq)
> {
>      struct vring_virtqueue *vq = to_vvq(_vq);
>      u16 new, old;
>      bool needs_kick;
>
>      START_USE(vq);
>      /* We need to expose available array entries before checking avail
>       * event. */
>      virtio_mb(vq->weak_barriers);
>
>      old = vq->split.avail_idx_shadow - vq->num_added;
>      new = vq->split.avail_idx_shadow;
>      vq->num_added = 0;
>
> (See the comment above virtio_mb()). So the avail idx is guaranteed to
> be committed to memory (cache hierarchy) before the check of the
> notification. I think we sync through this.

Thanks for pointing it out! Indeed, the avail index is synced before
kicking the device. However, even so I think the race might still be
possible, see below:


  mlx5_vdpa device                       virtio driver
-------------------------------------------------------------------------
                                  virtqueue_add_split
                                    (bumped up avail_idx1 to
                                    avail_idx2, but update
                                    not yet committed to mem)

(hot plug memory...)
mlx5_vdpa_set_map
  mlx5_vdpa_change_map
    suspend_vqs
      suspend_vq
        (avail_idx1 was saved)
    save_channels_info
    :
    :                             virtqueue_kick_prepare_split
    :                               (avail_idx2 committed to memory)
    restore_channels_info
    setup_driver
      ...
        create_virtqueue
          (saved avail_idx1
          getting restored;
          whereas current
          avail_idx2 in
          memory is ignored)
:
:
                                   virtqueue_notify
                                     vp_notify
mlx5_vdpa_kick_vq
  (device processing up to
  avail_idx1 rather than
  avail_idx2?)


According to Eli, the vdpa firmware does not load the latest value,
i.e. avail_idx2, from memory when re-creating the virtqueue object.
Instead it starts with a stale avail_idx1 that was saved in the mlx5_vdpa
driver's private structure before the memory update is made. That is the
way the current firmware interface is designed. The thing I'm not
sure about, though, is whether the firmware/device will resync and get to the
latest avail_idx2 from memory while processing the kick request in
mlx5_vdpa_kick_vq? If not, a few avail entries will be missing from the
kick, which could cause odd behavior or an implicit failure.
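
(To make the concern concrete with made-up numbers: say avail_idx1 = 5
was saved at suspend time, and the guest has meanwhile published buffers
up to avail_idx2 = 8. If the recreated virtqueue object only trusts the
configured value 5 and never re-reads avail->idx from the ring, buffers
5..7 would sit unprocessed until the guest bumps the index again.)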

But if the firmware will resync on kick_vq (and get_vq_state), I think
I completely lost the point of saving and restoring avail_idx when
changing the memory map...

Thanks,
-Siwei

>
> Thanks
>
>
> >
> > Thanks,
> > -Siwei
> >
> >>> Thanks,
> >>> -Siwei
> >>>
> >>>> But you're correct. When memory is added, I get a new memory map. This
> >>>> requires me to build a new memory key object which covers the new memory
> >>>> map. Since the virtqueue objects are referencing this memory key, I need
> >>>> to destroy them and build new ones referencing the new memory key.
> >>>>
> >>>>> Thanks
> >>>>>
> >>>>>
> >>>>>> -Siwei
> >>>>>>
> >>>>>>> Thanks
> >>>>>>>
> >>>>>>>
> >>>>>>>>> -Siwei
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> Signed-off-by: Eli Cohen <elic@nvidia.com>
> >>>>>>>>>> ---
> >>>>>>>>>>     drivers/vdpa/mlx5/net/mlx5_vnet.c | 8 --------
> >>>>>>>>>>     1 file changed, 8 deletions(-)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> >>>>>>>>>> index 88dde3455bfd..549ded074ff3 100644
> >>>>>>>>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> >>>>>>>>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> >>>>>>>>>> @@ -1148,8 +1148,6 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> >>>>>>>>>>
> >>>>>>>>>>     static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> >>>>>>>>>>     {
> >>>>>>>>>> -       struct mlx5_virtq_attr attr;
> >>>>>>>>>> -
> >>>>>>>>>>            if (!mvq->initialized)
> >>>>>>>>>>                    return;
> >>>>>>>>>>
> >>>>>>>>>> @@ -1158,12 +1156,6 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
> >>>>>>>>>>
> >>>>>>>>>>            if (modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
> >>>>>>>>>>                    mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");
> >>>>>>>>>> -
> >>>>>>>>>> -       if (query_virtqueue(ndev, mvq, &attr)) {
> >>>>>>>>>> -               mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue\n");
> >>>>>>>>>> -               return;
> >>>>>>>>>> -       }
> >>>>>>>>>> -       mvq->avail_idx = attr.available_index;
> >>>>>>>>>>     }
> >>>>>>>>>>
> >>>>>>>>>>     static void suspend_vqs(struct mlx5_vdpa_net *ndev)
> >>>>>>>>>> --
> >>>>>>>>>> 2.29.2
> >>>>>>>>>>
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue
  2021-02-03 23:19                       ` Si-Wei Liu
@ 2021-02-04  2:57                         ` Jason Wang
  2021-02-04  6:53                         ` Eli Cohen
  1 sibling, 0 replies; 27+ messages in thread
From: Jason Wang @ 2021-02-04  2:57 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: Eli Cohen, mst, virtualization, netdev, linux-kernel, lulu, Si-Wei Liu


On 2021/2/4 上午7:19, Si-Wei Liu wrote:
> On Tue, Feb 2, 2021 at 9:16 PM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2021/2/3 上午1:54, Si-Wei Liu wrote:
>>> On Tue, Feb 2, 2021 at 1:23 AM Eli Cohen <elic@nvidia.com> wrote:
>>>> On Tue, Feb 02, 2021 at 12:38:51AM -0800, Si-Wei Liu wrote:
>>>>> Thanks Eli and Jason for clarifications. See inline.
>>>>>
>>>>> On Mon, Feb 1, 2021 at 11:06 PM Eli Cohen <elic@nvidia.com> wrote:
>>>>>> On Tue, Feb 02, 2021 at 02:02:25PM +0800, Jason Wang wrote:
>>>>>>> On 2021/2/2 下午12:15, Si-Wei Liu wrote:
>>>>>>>> On Mon, Feb 1, 2021 at 7:13 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>>>>>> On 2021/2/2 上午3:17, Si-Wei Liu wrote:
>>>>>>>>>> On Mon, Feb 1, 2021 at 10:51 AM Si-Wei Liu <siwliu.kernel@gmail.com> wrote:
>>>>>>>>>>> On Thu, Jan 28, 2021 at 5:46 AM Eli Cohen <elic@nvidia.com> wrote:
>>>>>>>>>>>> suspend_vq should only suspend the VQ and not save the current available
>>>>>>>>>>>> index. This is done when a change of map occurs when the driver calls
>>>>>>>>>>>> save_channel_info().
>>>>>>>>>>> Hmmm, suspend_vq() is also called by teardown_vq(), the latter of
>>>>>>>>>>> which doesn't save the available index as save_channel_info() doesn't
>>>>>>>>>>> get called in that path at all. How does it handle the case that
>>>>>>>>>>> get_vq_state() is called from userspace (e.g. QEMU) while the
>>>>>>>>>>> hardware VQ object was torn down, but userspace still wants to access
>>>>>>>>>>> the queue index?
>>>>>>>>>>>
>>>>>>>>>>> Refer to https://lore.kernel.org/netdev/1601583511-15138-1-git-send-email-si-wei.liu@oracle.com/
>>>>>>>>>>>
>>>>>>>>>>> vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
>>>>>>>>>>> vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
>>>>>>>>>>>
>>>>>>>>>>> QEMU will complain with the above warning while VM is being rebooted
>>>>>>>>>>> or shut down.
>>>>>>>>>>>
>>>>>>>>>>> Looks to me either the kernel driver should cover this requirement, or
>>>>>>>>>>> the userspace has to bear the burden in saving the index and not call
>>>>>>>>>>> into kernel if VQ is destroyed.
>>>>>>>>>> Actually, the userspace doesn't have insight into whether the virt queue
>>>>>>>>>> will be destroyed if just changing the device status via set_status().
>>>>>>>>>> Looking at other vdpa driver in tree i.e. ifcvf it doesn't behave like
>>>>>>>>>> so. Hence this still looks to me to be Mellanox specifics and
>>>>>>>>>> mlx5_vdpa implementation detail that shouldn't expose to userspace.
>>>>>>>>> So I think we can simply drop this patch?
>>>>>>>> Yep, I think so. To be honest I don't know why it has anything to do
>>>>>>>> with the memory hotplug issue.
>>>>>>> Eli may know more, but my understanding is that, during memory hotplug, qemu
>>>>>>> needs to propagate new memory mappings via set_map(). For mellanox, it means
>>>>>>> it needs to rebuild memory keys, so the virtqueue needs to be suspended.
>>>>>>>
>>>>>> I think Siwei was asking why the first patch was related to the hotplug
>>>>>> issue.
>>>>> I was thinking how consistency is assured when saving/restoring this
>>>>> h/w avail_index against the one in the virtq memory, particularly in
>>>>> the .region_add()/.region_del() context (e.g. the hotplug case). The problem
>>>>> is I don't see an explicit memory barrier when the guest thread updates the
>>>>> avail_index, so how does the device make sure the h/w avail_index is
>>>>> up to date while the guest may race with updating the virtq's avail_index
>>>>> in the meantime? Maybe I completely missed something obvious?
>>>> If you're asking about synchronization upon hot plug of memory, the
>>>> hardware always goes to read the available index from memory when a new
>>>> hardware object is associated with a virtqueue. You can argue then that
>>>> you don't need to restore the available index and you may be right, but
>>>> this is the current interface to the firmware.
>>>>
>>>>
>>>> If you're asking on generally how sync is assured when the guest updates
>>>> the available index, can you please send a pointer to the code where you
>>>> see the update without a memory barrier?
>>> This is a snippet of virtqueue_add_split() where avail_index gets
>>> updated by guest:
>>>
>>>           /* Put entry in available array (but don't update avail->idx until they
>>>            * do sync). */
>>>           avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1);
>>>           vq->split.vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head);
>>>
>>>           /* Descriptors and available array need to be set before we expose the
>>>            * new available array entries. */
>>>           virtio_wmb(vq->weak_barriers);
>>>           vq->split.avail_idx_shadow++;
>>>           vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
>>>                                                   vq->split.avail_idx_shadow);
>>>           vq->num_added++;
>>>
>>> There's a memory barrier to make sure the update to the descriptor and
>>> available ring is seen before writing to avail->idx, but there
>>> seems to be no guarantee that this update is flushed to memory
>>> immediately, either before or after the mlx5-vdpa is suspended and gets
>>> the old avail_index value stashed somewhere. In this case, how does
>>> the hardware ensure the consistency between the guest virtio and host
>>> mlx5-vdpa? Or does it completely rely on the guest to update the avail_index
>>> once the next buffer is available, so that the index will be in sync
>>> again?
>>
>> I'm not sure I get the question, but notice that the driver should check
>> and notify the virtqueue when the device wants a notification. So there's a
>> virtio_mb() e.g. in:
>>
>> static bool virtqueue_kick_prepare_split(struct virtqueue *_vq)
>> {
>>       struct vring_virtqueue *vq = to_vvq(_vq);
>>       u16 new, old;
>>       bool needs_kick;
>>
>>       START_USE(vq);
>>       /* We need to expose available array entries before checking avail
>>        * event. */
>>       virtio_mb(vq->weak_barriers);
>>
>>       old = vq->split.avail_idx_shadow - vq->num_added;
>>       new = vq->split.avail_idx_shadow;
>>       vq->num_added = 0;
>>
>> (See the comment above virtio_mb()). So the avail idx is guaranteed to
>> be committed to memory (cache hierarchy) before the check of the
>> notification. I think we sync through this.
> Thanks for pointing it out! Indeed, the avail index is synced before
> kicking the device. However, even so I think the race might still be
> possible, see below:
>
>
>    mlx5_vdpa device                       virtio driver
> -------------------------------------------------------------------------
>                                    virtqueue_add_split
>                                      (bumped up avail_idx1 to
>                                      avail_idx2, but update
>                                      not yet committed to mem)
>
> (hot plug memory...)
> mlx5_vdpa_set_map
>    mlx5_vdpa_change_map
>      suspend_vqs
>        suspend_vq
>          (avail_idx1 was saved)
>      save_channels_info
>      :
>      :                             virtqueue_kick_prepare_split
>      :                               (avail_idx2 committed to memory)
>      restore_channels_info
>      setup_driver
>        ...
>          create_virtqueue
>            (saved avail_idx1
>            getting restored;
>            whereas current
>            avail_idx2 in
>            memory is ignored)


The index is not ignored; the device is expected to read descriptors 
from avail_idx1 until avail_idx2.


> :
> :
>                                     virtqueue_notify
>                                       vp_notify
> mlx5_vdpa_kick_vq
>    (device processing up to
>    avail_idx1 rather than
>    avail_idx2?)
>
>
> According to Eli, the vdpa firmware does not load the latest value,
> i.e. avail_idx2 from memory when re-creating the virtqueue object.


During the sync it's not needed, but after the device starts to work, it 
needs to read descriptors up to the available index.


> Instead it starts with a stale avail_idx1 that was saved in the mlx5_vdpa
> driver's private structure before the memory update is made. That is the
> way the current firmware interface is designed. The thing I'm not
> sure about, though, is whether the firmware/device will resync and get to the
> latest avail_idx2 from memory while processing the kick request in
> mlx5_vdpa_kick_vq?


I think the point is that the device must have an internal avail index, 
and the device is only expected to sync this internal avail index.
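
Roughly, in device-side pseudocode (illustrative only -- these names are
made up, not any real firmware interface):

    /* on virtqueue (re)creation: seed the internal index from the
     * driver-supplied state -- avail_idx1 in your diagram */
    dev->internal_avail = cfg->available_index;

    /* on each kick (or while polling): catch up to whatever the
     * guest has published since, e.g. avail_idx2 */
    while (dev->internal_avail != le16_to_cpu(ring->avail->idx))
            process_descriptor(ring, dev->internal_avail++);

So the entries between avail_idx1 and avail_idx2 are not lost; they are
picked up the next time the device syncs its internal index against
avail->idx in memory.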


> If not, a few avail entries will be missing from the
> kick, which could cause odd behavior or an implicit failure.
>
> But if the firmware will resync on kick_vq (and get_vq_state), I think
> I completely lost the point of saving and restoring avail_idx when
> changing the memory map...


The memory map change is usually transparent to the guest driver and 
visible only to the device. Maybe you can refer to the vhost or qemu 
code for the device logic.

Thanks


>
> Thanks,
> -Siwei
>
>> Thanks
>>
>>
>>> Thanks,
>>> -Siwei
>>>
>>>>> Thanks,
>>>>> -Siwei
>>>>>
>>>>>> But you're correct. When memory is added, I get a new memory map. This
>>>>>> requires me to build a new memory key object which covers the new memory
>>>>>> map. Since the virtqueue objects are referencing this memory key, I need
>>>>>> to destroy them and build new ones referencing the new memory key.
>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>
>>>>>>>> -Siwei
>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>> -Siwei
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Eli Cohen <elic@nvidia.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>>      drivers/vdpa/mlx5/net/mlx5_vnet.c | 8 --------
>>>>>>>>>>>>      1 file changed, 8 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>>>>> index 88dde3455bfd..549ded074ff3 100644
>>>>>>>>>>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>>>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>>>>> @@ -1148,8 +1148,6 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
>>>>>>>>>>>>
>>>>>>>>>>>>      static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
>>>>>>>>>>>>      {
>>>>>>>>>>>> -       struct mlx5_virtq_attr attr;
>>>>>>>>>>>> -
>>>>>>>>>>>>             if (!mvq->initialized)
>>>>>>>>>>>>                     return;
>>>>>>>>>>>>
>>>>>>>>>>>> @@ -1158,12 +1156,6 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
>>>>>>>>>>>>
>>>>>>>>>>>>             if (modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
>>>>>>>>>>>>                     mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");
>>>>>>>>>>>> -
>>>>>>>>>>>> -       if (query_virtqueue(ndev, mvq, &attr)) {
>>>>>>>>>>>> -               mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue\n");
>>>>>>>>>>>> -               return;
>>>>>>>>>>>> -       }
>>>>>>>>>>>> -       mvq->avail_idx = attr.available_index;
>>>>>>>>>>>>      }
>>>>>>>>>>>>
>>>>>>>>>>>>      static void suspend_vqs(struct mlx5_vdpa_net *ndev)
>>>>>>>>>>>> --
>>>>>>>>>>>> 2.29.2
>>>>>>>>>>>>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue
  2021-02-03 23:19                       ` Si-Wei Liu
  2021-02-04  2:57                         ` Jason Wang
@ 2021-02-04  6:53                         ` Eli Cohen
  1 sibling, 0 replies; 27+ messages in thread
From: Eli Cohen @ 2021-02-04  6:53 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: Jason Wang, mst, virtualization, netdev, linux-kernel, lulu, Si-Wei Liu

On Wed, Feb 03, 2021 at 03:19:40PM -0800, Si-Wei Liu wrote:
> On Tue, Feb 2, 2021 at 9:16 PM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > On 2021/2/3 上午1:54, Si-Wei Liu wrote:
> > > On Tue, Feb 2, 2021 at 1:23 AM Eli Cohen <elic@nvidia.com> wrote:
> > >> On Tue, Feb 02, 2021 at 12:38:51AM -0800, Si-Wei Liu wrote:
> > >>> Thanks Eli and Jason for clarifications. See inline.
> > >>>
> > >>> On Mon, Feb 1, 2021 at 11:06 PM Eli Cohen <elic@nvidia.com> wrote:
> > >>>> On Tue, Feb 02, 2021 at 02:02:25PM +0800, Jason Wang wrote:
> > >>>>> On 2021/2/2 下午12:15, Si-Wei Liu wrote:
> > >>>>>> On Mon, Feb 1, 2021 at 7:13 PM Jason Wang <jasowang@redhat.com> wrote:
> > >>>>>>> On 2021/2/2 上午3:17, Si-Wei Liu wrote:
> > >>>>>>>> On Mon, Feb 1, 2021 at 10:51 AM Si-Wei Liu <siwliu.kernel@gmail.com> wrote:
> > >>>>>>>>> On Thu, Jan 28, 2021 at 5:46 AM Eli Cohen <elic@nvidia.com> wrote:
> > >>>>>>>>>> suspend_vq should only suspend the VQ and not save the current available
> > >>>>>>>>>> index. This is done when a change of map occurs when the driver calls
> > >>>>>>>>>> save_channel_info().
> > >>>>>>>>> Hmmm, suspend_vq() is also called by teardown_vq(), the latter of
> > >>>>>>>>> which doesn't save the available index as save_channel_info() doesn't
> > >>>>>>>>> get called in that path at all. How does it handle the case that
> > >>>>>>>>> get_vq_state() is called from userspace (e.g. QEMU) while the
> > >>>>>>>>> hardware VQ object was torn down, but userspace still wants to access
> > >>>>>>>>> the queue index?
> > >>>>>>>>>
> > >>>>>>>>> Refer to https://lore.kernel.org/netdev/1601583511-15138-1-git-send-email-si-wei.liu@oracle.com/
> > >>>>>>>>>
> > >>>>>>>>> vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
> > >>>>>>>>> vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
> > >>>>>>>>>
> > >>>>>>>>> QEMU will complain with the above warning while VM is being rebooted
> > >>>>>>>>> or shut down.
> > >>>>>>>>>
> > >>>>>>>>> Looks to me either the kernel driver should cover this requirement, or
> > >>>>>>>>> the userspace has to bear the burden in saving the index and not call
> > >>>>>>>>> into kernel if VQ is destroyed.
> > >>>>>>>> Actually, the userspace doesn't have insight into whether the virt queue
> > >>>>>>>> will be destroyed if just changing the device status via set_status().
> > >>>>>>>> Looking at other vdpa driver in tree i.e. ifcvf it doesn't behave like
> > >>>>>>>> so. Hence this still looks to me to be Mellanox specifics and
> > >>>>>>>> mlx5_vdpa implementation detail that shouldn't expose to userspace.
> > >>>>>>> So I think we can simply drop this patch?
> > >>>>>> Yep, I think so. To be honest I don't know why it has anything to do
> > >>>>>> with the memory hotplug issue.
> > >>>>>
> > >>>>> Eli may know more, but my understanding is that, during memory hotplug, qemu
> > >>>>> needs to propagate new memory mappings via set_map(). For mellanox, it means
> > >>>>> it needs to rebuild memory keys, so the virtqueue needs to be suspended.
> > >>>>>
> > >>>> I think Siwei was asking why the first patch was related to the hotplug
> > >>>> issue.
> > >>> I was thinking how consistency is assured when saving/restoring this
> > >>> h/w avail_index against the one in the virtq memory, particularly in
> > >>> the .region_add()/.region_del() context (e.g. the hotplug case). The problem
> > >>> is I don't see an explicit memory barrier when the guest thread updates the
> > >>> avail_index, so how does the device make sure the h/w avail_index is
> > >>> up to date while the guest may race with updating the virtq's avail_index
> > >>> in the meantime? Maybe I completely missed something obvious?
> > >> If you're asking about synchronization upon hot plug of memory, the
> > >> hardware always goes to read the available index from memory when a new
> > >> hardware object is associated with a virtqueue. You can argue then that
> > >> you don't need to restore the available index and you may be right, but
> > >> this is the current interface to the firmware.
> > >>
> > >>
> > >> If you're asking on generally how sync is assured when the guest updates
> > >> the available index, can you please send a pointer to the code where you
> > >> see the update without a memory barrier?
> > > This is a snippet of virtqueue_add_split() where avail_index gets
> > > updated by guest:
> > >
> > >          /* Put entry in available array (but don't update avail->idx until they
> > >           * do sync). */
> > >          avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1);
> > >          vq->split.vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head);
> > >
> > >          /* Descriptors and available array need to be set before we expose the
> > >           * new available array entries. */
> > >          virtio_wmb(vq->weak_barriers);
> > >          vq->split.avail_idx_shadow++;
> > >          vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
> > >                                                  vq->split.avail_idx_shadow);
> > >          vq->num_added++;
> > >
> > > There's a memory barrier to make sure the update to the descriptor and
> > > available ring is seen before writing to avail->idx, but there
> > > seems to be no guarantee that this update is flushed to memory
> > > immediately, either before or after the mlx5-vdpa is suspended and gets
> > > the old avail_index value stashed somewhere. In this case, how does
> > > the hardware ensure the consistency between the guest virtio and host
> > > mlx5-vdpa? Or does it completely rely on the guest to update the avail_index
> > > once the next buffer is available, so that the index will be in sync
> > > again?
> >
> >
> > I'm not sure I get the question, but notice that the driver should check
> > and notify the virtqueue when the device wants a notification. So there's a
> > virtio_mb() e.g. in:
> >
> > static bool virtqueue_kick_prepare_split(struct virtqueue *_vq)
> > {
> >      struct vring_virtqueue *vq = to_vvq(_vq);
> >      u16 new, old;
> >      bool needs_kick;
> >
> >      START_USE(vq);
> >      /* We need to expose available array entries before checking avail
> >       * event. */
> >      virtio_mb(vq->weak_barriers);
> >
> >      old = vq->split.avail_idx_shadow - vq->num_added;
> >      new = vq->split.avail_idx_shadow;
> >      vq->num_added = 0;
> >
> > (See the comment above virtio_mb()). So the avail idx is guaranteed to
> > be committed to memory (cache hierarchy) before the check of the
> > notification. I think we sync through this.
> 
> Thanks for pointing it out! Indeed, the avail index is synced before
> kicking the device. However, even so I think the race might still be
> possible, see below:
> 
> 
>   mlx5_vdpa device                       virtio driver
> -------------------------------------------------------------------------
>                                   virtqueue_add_split
>                                     (bumped up avail_idx1 to
>                                     avail_idx2, but update
>                                     not yet committed to mem)
> 
> (hot plug memory...)
> mlx5_vdpa_set_map
>   mlx5_vdpa_change_map
>     suspend_vqs
>       suspend_vq
>         (avail_idx1 was saved)
>     save_channels_info
>     :
>     :                             virtqueue_kick_prepare_split
>     :                               (avail_idx2 committed to memory)
>     restore_channels_info
>     setup_driver
>       ...
>         create_virtqueue
>           (saved avail_idx1
>           getting restored;
>           whereas current
>           avail_idx2 in
>           memory is ignored)
> :
> :
>                                    virtqueue_notify
>                                      vp_notify
> mlx5_vdpa_kick_vq
>   (device processing up to
>   avail_idx1 rather than
>   avail_idx2?)
> 
> 
> According to Eli, the vdpa firmware does not load the latest value,
> i.e. avail_idx2 from memory when re-creating the virtqueue object.

I said exactly the opposite. The hardware always goes to read the
available index from memory. The requirement to configure it when
creating a new object is still a requirement defined by the interface, so
I must not violate interface requirements.
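
For reference, the change-map path looks roughly like this (a simplified
sketch of the flow discussed in this thread; exact names in mlx5_vnet.c
may differ slightly and error handling is omitted):

    /* mlx5_vdpa_change_map(), simplified */
    suspend_vqs(ndev);            /* firmware stops consuming          */
    save_channels_info(ndev);     /* stash per-VQ state, incl. indices */
    teardown_driver(ndev);
    mlx5_vdpa_destroy_mr(&ndev->mvdev);
    mlx5_vdpa_create_mr(&ndev->mvdev, iotlb);  /* new memory key       */
    restore_channels_info(ndev);  /* feed the saved state back         */
    setup_driver(ndev);           /* create_virtqueue() is configured
                                   * with the saved index; the hardware
                                   * then reads avail->idx from memory
                                   * as it processes the ring          */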

> Instead it starts with a stale avail_idx1 that was saved in the mlx5_vdpa
> driver's private structure before the memory update is made. That is the
> way the current firmware interface is designed. The thing I'm not
> sure about, though, is whether the firmware/device will resync and get to the
> latest avail_idx2 from memory while processing the kick request in
> mlx5_vdpa_kick_vq? If not, a few avail entries will be missing from the
> kick, which could cause odd behavior or an implicit failure.
> 
> But if the firmware will resync on kick_vq (and get_vq_state), I think
> I completely lost the point of saving and restoring avail_idx when
> changing the memory map...
> 
> Thanks,
> -Siwei
> 
> >
> > Thanks
> >
> >
> > >
> > > Thanks,
> > > -Siwei
> > >
> > >>> Thanks,
> > >>> -Siwei
> > >>>
> > >>>> But you're correct. When memory is added, I get a new memory map. This
> > >>>> requires me to build a new memory key object which covers the new memory
> > >>>> map. Since the virtqueue objects are referencing this memory key, I need
> > >>>>>> to destroy them and build new ones referencing the new memory key.
> > >>>>
> > >>>>> Thanks
> > >>>>>
> > >>>>>
> > >>>>>> -Siwei
> > >>>>>>
> > >>>>>>> Thanks
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>>> -Siwei
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>> Signed-off-by: Eli Cohen <elic@nvidia.com>
> > >>>>>>>>>> ---
> > >>>>>>>>>>     drivers/vdpa/mlx5/net/mlx5_vnet.c | 8 --------
> > >>>>>>>>>>     1 file changed, 8 deletions(-)
> > >>>>>>>>>>
> > >>>>>>>>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > >>>>>>>>>> index 88dde3455bfd..549ded074ff3 100644
> > >>>>>>>>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > >>>>>>>>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > >>>>>>>>>> @@ -1148,8 +1148,6 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> > >>>>>>>>>>
> > >>>>>>>>>>     static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> > >>>>>>>>>>     {
> > >>>>>>>>>> -       struct mlx5_virtq_attr attr;
> > >>>>>>>>>> -
> > >>>>>>>>>>            if (!mvq->initialized)
> > >>>>>>>>>>                    return;
> > >>>>>>>>>>
> > >>>>>>>>>> @@ -1158,12 +1156,6 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
> > >>>>>>>>>>
> > >>>>>>>>>>            if (modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
> > >>>>>>>>>>                    mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");
> > >>>>>>>>>> -
> > >>>>>>>>>> -       if (query_virtqueue(ndev, mvq, &attr)) {
> > >>>>>>>>>> -               mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue\n");
> > >>>>>>>>>> -               return;
> > >>>>>>>>>> -       }
> > >>>>>>>>>> -       mvq->avail_idx = attr.available_index;
> > >>>>>>>>>>     }
> > >>>>>>>>>>
> > >>>>>>>>>>     static void suspend_vqs(struct mlx5_vdpa_net *ndev)
> > >>>>>>>>>> --
> > >>>>>>>>>> 2.29.2
> > >>>>>>>>>>
> >

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2021-02-04  6:54 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-28 13:41 [PATCH 0/2] Fix failure to hot add memory Eli Cohen
2021-01-28 13:41 ` [PATCH 1/2] vdpa/mlx5: Avoid unnecessary query virtqueue Eli Cohen
2021-01-29  3:48   ` Jason Wang
2021-02-01 18:51   ` Si-Wei Liu
2021-02-01 19:17     ` Si-Wei Liu
2021-02-02  3:12       ` Jason Wang
2021-02-02  4:15         ` Si-Wei Liu
2021-02-02  6:02           ` Jason Wang
2021-02-02  7:06             ` Eli Cohen
2021-02-02  8:38               ` Si-Wei Liu
2021-02-02  9:22                 ` Eli Cohen
2021-02-02 17:54                   ` Si-Wei Liu
2021-02-03  5:16                     ` Jason Wang
2021-02-03 23:19                       ` Si-Wei Liu
2021-02-04  2:57                         ` Jason Wang
2021-02-04  6:53                         ` Eli Cohen
2021-02-02  7:00           ` Eli Cohen
2021-02-02 14:06             ` Michael S. Tsirkin
2021-02-02  6:57         ` Eli Cohen
2021-01-28 13:41 ` [PATCH 2/2] vdpa/mlx5: Restore the hardware used index after change map Eli Cohen
2021-01-29  3:49   ` Jason Wang
2021-01-31 18:55     ` Eli Cohen
2021-02-01  3:36       ` Jason Wang
2021-02-01  5:52         ` Eli Cohen
2021-02-01  6:00           ` Jason Wang
2021-02-01  6:38             ` Eli Cohen
2021-02-01  7:23               ` Jason Wang
