From: Jason Wang <jasowang@redhat.com>
To: Eli Cohen <elic@nvidia.com>
Cc: Si-Wei Liu <si-wei.liu@oracle.com>,
	mst@redhat.com, virtualization@lists.linux-foundation.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	lulu@redhat.com
Subject: Re: [PATCH v1] vdpa/mlx5: Restore the hardware used index after change map
Date: Mon, 8 Feb 2021 17:04:27 +0800
Message-ID: <0d592ed0-3cea-cfb0-9b7b-9d2755da3f12@redhat.com>
In-Reply-To: <20210208063736.GA166546@mtl-vdi-166.wap.labs.mlnx>


On 2021/2/8 2:37 PM, Eli Cohen wrote:
> On Mon, Feb 08, 2021 at 12:27:18PM +0800, Jason Wang wrote:
>> On 2021/2/6 7:07 AM, Si-Wei Liu wrote:
>>>
>>> On 2/3/2021 11:36 PM, Eli Cohen wrote:
>>>> When a change of memory map occurs, the hardware resources are destroyed
>>>> and then re-created again with the new memory map. In such case, we need
>>>> to restore the hardware available and used indices. The driver failed to
>>>> restore the used index which is added here.
>>>>
>>>> Also, since the driver also fails to reset the available and used
>>>> indices upon device reset, fix this here to avoid regression caused by
>>>> the fact that used index may not be zero upon device reset.
>>>>
>>>> Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5
>>>> devices")
>>>> Signed-off-by: Eli Cohen <elic@nvidia.com>
>>>> ---
>>>> v0 -> v1:
>>>> Clear indices upon device reset
>>>>
>>>>    drivers/vdpa/mlx5/net/mlx5_vnet.c | 18 ++++++++++++++++++
>>>>    1 file changed, 18 insertions(+)
>>>>
>>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>> index 88dde3455bfd..b5fe6d2ad22f 100644
>>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>> @@ -87,6 +87,7 @@ struct mlx5_vq_restore_info {
>>>>        u64 device_addr;
>>>>        u64 driver_addr;
>>>>        u16 avail_index;
>>>> +    u16 used_index;
>>>>        bool ready;
>>>>        struct vdpa_callback cb;
>>>>        bool restore;
>>>> @@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue {
>>>>        u32 virtq_id;
>>>>        struct mlx5_vdpa_net *ndev;
>>>>        u16 avail_idx;
>>>> +    u16 used_idx;
>>>>        int fw_state;
>>>>          /* keep last in the struct */
>>>> @@ -804,6 +806,7 @@ static int create_virtqueue(struct mlx5_vdpa_net
>>>> *ndev, struct mlx5_vdpa_virtque
>>>>          obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in,
>>>> obj_context);
>>>>        MLX5_SET(virtio_net_q_object, obj_context, hw_available_index,
>>>> mvq->avail_idx);
>>>> +    MLX5_SET(virtio_net_q_object, obj_context, hw_used_index,
>>>> mvq->used_idx);
>>>>        MLX5_SET(virtio_net_q_object, obj_context,
>>>> queue_feature_bit_mask_12_3,
>>>>             get_features_12_3(ndev->mvdev.actual_features));
>>>>        vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context,
>>>> virtio_q_context);
>>>> @@ -1022,6 +1025,7 @@ static int connect_qps(struct mlx5_vdpa_net
>>>> *ndev, struct mlx5_vdpa_virtqueue *m
>>>>    struct mlx5_virtq_attr {
>>>>        u8 state;
>>>>        u16 available_index;
>>>> +    u16 used_index;
>>>>    };
>>>>      static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct
>>>> mlx5_vdpa_virtqueue *mvq,
>>>> @@ -1052,6 +1056,7 @@ static int query_virtqueue(struct
>>>> mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueu
>>>>        memset(attr, 0, sizeof(*attr));
>>>>        attr->state = MLX5_GET(virtio_net_q_object, obj_context, state);
>>>>        attr->available_index = MLX5_GET(virtio_net_q_object,
>>>> obj_context, hw_available_index);
>>>> +    attr->used_index = MLX5_GET(virtio_net_q_object, obj_context,
>>>> hw_used_index);
>>>>        kfree(out);
>>>>        return 0;
>>>>    @@ -1535,6 +1540,16 @@ static void teardown_virtqueues(struct
>>>> mlx5_vdpa_net *ndev)
>>>>        }
>>>>    }
>>>>    +static void clear_virtqueues(struct mlx5_vdpa_net *ndev)
>>>> +{
>>>> +    int i;
>>>> +
>>>> +    for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) {
>>>> +        ndev->vqs[i].avail_idx = 0;
>>>> +        ndev->vqs[i].used_idx = 0;
>>>> +    }
>>>> +}
>>>> +
>>>>    /* TODO: cross-endian support */
>>>>    static inline bool mlx5_vdpa_is_little_endian(struct mlx5_vdpa_dev
>>>> *mvdev)
>>>>    {
>>>> @@ -1610,6 +1625,7 @@ static int save_channel_info(struct
>>>> mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqu
>>>>            return err;
>>>>          ri->avail_index = attr.available_index;
>>>> +    ri->used_index = attr.used_index;
>>>>        ri->ready = mvq->ready;
>>>>        ri->num_ent = mvq->num_ent;
>>>>        ri->desc_addr = mvq->desc_addr;
>>>> @@ -1654,6 +1670,7 @@ static void restore_channels_info(struct
>>>> mlx5_vdpa_net *ndev)
>>>>                continue;
>>>>              mvq->avail_idx = ri->avail_index;
>>>> +        mvq->used_idx = ri->used_index;
>>>>            mvq->ready = ri->ready;
>>>>            mvq->num_ent = ri->num_ent;
>>>>            mvq->desc_addr = ri->desc_addr;
>>>> @@ -1768,6 +1785,7 @@ static void mlx5_vdpa_set_status(struct
>>>> vdpa_device *vdev, u8 status)
>>>>        if (!status) {
>>>>            mlx5_vdpa_info(mvdev, "performing device reset\n");
>>>>            teardown_driver(ndev);
>>>> +        clear_virtqueues(ndev);
>>> The clearing looks fine at first glance, as it aligns with the other
>>> state cleanups floating around at the same place. However, the thing is
>>> get_vq_state() is supposed to be called right after to get sync'ed with
>>> the latest internal avail_index from device while vq is stopped. The
>>> index was saved in the driver software at vq suspension, but before the
>>> virtq object is destroyed. We shouldn't clear the avail_index too early.
>>
>> Good point.
>>
>> There's a limitation in the virtio spec and the vDPA framework: we cannot
>> simply differentiate device suspending from device reset.
>>
> Are you talking about live migration where you reset the device but
> still want to know how far it progressed in order to continue from the
> same place in the new VM?


Yes. So if we want to support live migration, we need:

in src node:
1) suspend the device
2) get last_avail_idx via get_vq_state()

in the dst node:
3) set last_avail_idx via set_vq_state()
4) resume the device

So you can see, step 2 requires the device/driver not to forget the 
last_avail_idx.

The annoying thing is that, in the virtio spec there's no definition of 
device suspending. So we reuse set_status(0) right now for vq 
suspending. Then if we forget last_avail_idx in set_status(0), it will 
break the assumption of step 2).
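
To make the dependency concrete, here is a rough toy model of steps 1) to 4).
It is only a sketch under stated assumptions: none of this is mlx5_vdpa or
vDPA framework code, every toy_* name is made up for illustration, and the
clear_on_reset flag simply models a driver that zeroes its indices inside
set_status(0).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model of one virtqueue; not the mlx5_vdpa structures. */
struct toy_vq {
    uint16_t last_avail_idx; /* how far the device got in the avail ring */
    bool running;
};

/* Stand-in for set_status(0), which today doubles as "suspend".
 * clear_indices models a driver that zeroes its indices on reset. */
static void toy_set_status_zero(struct toy_vq *vq, bool clear_indices)
{
    vq->running = false;
    if (clear_indices)
        vq->last_avail_idx = 0;
}

/* Stand-ins for get_vq_state() and set_vq_state(). */
static uint16_t toy_get_vq_state(const struct toy_vq *vq)
{
    return vq->last_avail_idx;
}

static void toy_set_vq_state(struct toy_vq *vq, uint16_t idx)
{
    vq->last_avail_idx = idx;
}

static void toy_resume(struct toy_vq *vq)
{
    vq->running = true;
}

/* Runs steps 1) to 4) and returns the index the destination resumes from. */
static uint16_t toy_migrate(struct toy_vq *src, struct toy_vq *dst,
                            bool clear_on_reset)
{
    uint16_t idx;

    toy_set_status_zero(src, clear_on_reset); /* 1) suspend the device */
    idx = toy_get_vq_state(src);              /* 2) get last_avail_idx */
    toy_set_vq_state(dst, idx);               /* 3) set last_avail_idx */
    toy_resume(dst);                          /* 4) resume the device */
    return dst->last_avail_idx;
}
```

With clear_on_reset false, the destination resumes from the source's index.
With it true, step 2) reads back zero and the destination restarts from the
wrong place, which is the breakage described above.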


>
>> Need to think about that. I suggested a new state in [1]; the issue is that
>> people don't like the asynchronous API that it introduces.
>>
>>
>>> Possibly it can be postponed to where VIRTIO_CONFIG_S_DRIVER_OK gets set
>>> again, i.e. right before the setup_driver() in mlx5_vdpa_set_status()?
>>
>> Looks like a good workaround.


Rethinking this, it won't work for step 4) if we reuse S_DRIVER_OK
for resuming.
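
A toy sketch of why, again under loud assumptions: this is not the mlx5_vdpa
code, the toy_* names are hypothetical, and the "postponed clear" branch is
only my reading of the proposed workaround (keep the indices across
set_status(0) and zero them when DRIVER_OK is set again).

```c
#include <stdint.h>

#define TOY_S_DRIVER_OK 4 /* same value as VIRTIO_CONFIG_S_DRIVER_OK */

/* Hypothetical toy device; not the mlx5_vdpa structures. */
struct toy_dev {
    uint8_t status;
    uint16_t last_avail_idx;
};

/* Stand-in for set_vq_state(). */
static void toy_dev_set_vq_state(struct toy_dev *d, uint16_t idx)
{
    d->last_avail_idx = idx;
}

/* Models the workaround: indices survive set_status(0) and are cleared
 * only on the transition into DRIVER_OK. */
static void toy_dev_set_status(struct toy_dev *d, uint8_t status)
{
    if ((status & TOY_S_DRIVER_OK) && !(d->status & TOY_S_DRIVER_OK))
        d->last_avail_idx = 0; /* postponed clear */
    d->status = status;
}

/* Destination side of the migration, with DRIVER_OK reused as "resume". */
static uint16_t toy_dev_resume_on_dst(struct toy_dev *dst, uint16_t migrated_idx)
{
    toy_dev_set_vq_state(dst, migrated_idx);  /* step 3) */
    toy_dev_set_status(dst, TOY_S_DRIVER_OK); /* step 4) */
    return dst->last_avail_idx;
}
```

Whatever index step 3) installs, the clear in step 4) wipes it, so the
destination starts from zero instead of the migrated last_avail_idx.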

The cleanest way is to add the feature to the virtio spec and implement
it in the driver.

Thanks



>>
>> Thanks
>>
>>
>>> -Siwei
>>
>> [1]
>> https://lists.oasis-open.org/archives/virtio-comment/202012/msg00029.html
>>
>>
>>>> mlx5_vdpa_destroy_mr(&ndev->mvdev);
>>>>            ndev->mvdev.status = 0;
>>>>            ndev->mvdev.mlx_features = 0;

