linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] vdpa/mlx5: Add support for doorbell bypassing
@ 2021-04-21 10:41 Eli Cohen
  2021-04-22  2:37 ` Jason Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Eli Cohen @ 2021-04-21 10:41 UTC (permalink / raw)
  To: mst, jasowang; +Cc: virtualization, linux-kernel, elic

Implement mlx5_get_vq_notification() to return the doorbell address.
Size is set to one system page as required.

Signed-off-by: Eli Cohen <elic@nvidia.com>
---
 drivers/vdpa/mlx5/core/mlx5_vdpa.h | 1 +
 drivers/vdpa/mlx5/core/resources.c | 1 +
 drivers/vdpa/mlx5/net/mlx5_vnet.c  | 6 ++++++
 3 files changed, 8 insertions(+)

diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index b6cc53ba980c..49de62cda598 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -41,6 +41,7 @@ struct mlx5_vdpa_resources {
 	u32 pdn;
 	struct mlx5_uars_page *uar;
 	void __iomem *kick_addr;
+	u64 phys_kick_addr;
 	u16 uid;
 	u32 null_mkey;
 	bool valid;
diff --git a/drivers/vdpa/mlx5/core/resources.c b/drivers/vdpa/mlx5/core/resources.c
index 6521cbd0f5c2..665f8fc1710f 100644
--- a/drivers/vdpa/mlx5/core/resources.c
+++ b/drivers/vdpa/mlx5/core/resources.c
@@ -247,6 +247,7 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
 		goto err_key;
 
 	kick_addr = mdev->bar_addr + offset;
+	res->phys_kick_addr = kick_addr;
 
 	res->kick_addr = ioremap(kick_addr, PAGE_SIZE);
 	if (!res->kick_addr) {
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 10c5fef3c020..680751074d2a 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1865,8 +1865,14 @@ static void mlx5_vdpa_free(struct vdpa_device *vdev)
 
 static struct vdpa_notification_area mlx5_get_vq_notification(struct vdpa_device *vdev, u16 idx)
 {
+	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
 	struct vdpa_notification_area ret = {};
+	struct mlx5_vdpa_net *ndev;
+
+	ndev = to_mlx5_vdpa_ndev(mvdev);
 
+	ret.addr = (phys_addr_t)ndev->mvdev.res.phys_kick_addr;
+	ret.size = PAGE_SIZE;
 	return ret;
 }
 
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] vdpa/mlx5: Add support for doorbell bypassing
  2021-04-21 10:41 [PATCH] vdpa/mlx5: Add support for doorbell bypassing Eli Cohen
@ 2021-04-22  2:37 ` Jason Wang
  2021-04-22  3:27   ` Mika Penttilä
  2021-04-22  6:03   ` Eli Cohen
  0 siblings, 2 replies; 19+ messages in thread
From: Jason Wang @ 2021-04-22  2:37 UTC (permalink / raw)
  To: Eli Cohen, mst; +Cc: virtualization, linux-kernel


在 2021/4/21 下午6:41, Eli Cohen 写道:
> Implement mlx5_get_vq_notification() to return the doorbell address.
> Size is set to one system page as required.
>
> Signed-off-by: Eli Cohen <elic@nvidia.com>
> ---
>   drivers/vdpa/mlx5/core/mlx5_vdpa.h | 1 +
>   drivers/vdpa/mlx5/core/resources.c | 1 +
>   drivers/vdpa/mlx5/net/mlx5_vnet.c  | 6 ++++++
>   3 files changed, 8 insertions(+)
>
> diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> index b6cc53ba980c..49de62cda598 100644
> --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> @@ -41,6 +41,7 @@ struct mlx5_vdpa_resources {
>   	u32 pdn;
>   	struct mlx5_uars_page *uar;
>   	void __iomem *kick_addr;
> +	u64 phys_kick_addr;
>   	u16 uid;
>   	u32 null_mkey;
>   	bool valid;
> diff --git a/drivers/vdpa/mlx5/core/resources.c b/drivers/vdpa/mlx5/core/resources.c
> index 6521cbd0f5c2..665f8fc1710f 100644
> --- a/drivers/vdpa/mlx5/core/resources.c
> +++ b/drivers/vdpa/mlx5/core/resources.c
> @@ -247,6 +247,7 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
>   		goto err_key;
>   
>   	kick_addr = mdev->bar_addr + offset;
> +	res->phys_kick_addr = kick_addr;
>   
>   	res->kick_addr = ioremap(kick_addr, PAGE_SIZE);
>   	if (!res->kick_addr) {
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 10c5fef3c020..680751074d2a 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -1865,8 +1865,14 @@ static void mlx5_vdpa_free(struct vdpa_device *vdev)
>   
>   static struct vdpa_notification_area mlx5_get_vq_notification(struct vdpa_device *vdev, u16 idx)
>   {
> +	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>   	struct vdpa_notification_area ret = {};
> +	struct mlx5_vdpa_net *ndev;
> +
> +	ndev = to_mlx5_vdpa_ndev(mvdev);
>   
> +	ret.addr = (phys_addr_t)ndev->mvdev.res.phys_kick_addr;
> +	ret.size = PAGE_SIZE;


Note that the page will be mapped in to guest, so it's only safe if the 
doorbeel exclusively own the page. This means if there're other 
registers in the page, we can not let the doorbell bypass to work.

So this is suspicious at least in the case of subfunction where we 
calculate the bar length in mlx5_sf_dev_table_create() as:

table->sf_bar_length = 1 << (MLX5_CAP_GEN(dev, log_min_sf_size) + 12);

It looks to me this can only work for the arch with PAGE_SIZE = 4096, 
otherwise we can map more into the userspace(guest).

Thanks



>   	return ret;
>   }
>   


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] vdpa/mlx5: Add support for doorbell bypassing
  2021-04-22  2:37 ` Jason Wang
@ 2021-04-22  3:27   ` Mika Penttilä
  2021-04-22  4:52     ` Jason Wang
  2021-04-22  6:03   ` Eli Cohen
  1 sibling, 1 reply; 19+ messages in thread
From: Mika Penttilä @ 2021-04-22  3:27 UTC (permalink / raw)
  To: Jason Wang, Eli Cohen, mst; +Cc: virtualization, linux-kernel



On 22.4.2021 5.37, Jason Wang wrote:
>
> 在 2021/4/21 下午6:41, Eli Cohen 写道:
>> Implement mlx5_get_vq_notification() to return the doorbell address.
>> Size is set to one system page as required.
>>
>> Signed-off-by: Eli Cohen <elic@nvidia.com>
>> ---
>>   drivers/vdpa/mlx5/core/mlx5_vdpa.h | 1 +
>>   drivers/vdpa/mlx5/core/resources.c | 1 +
>>   drivers/vdpa/mlx5/net/mlx5_vnet.c  | 6 ++++++
>>   3 files changed, 8 insertions(+)
>>
>> diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h 
>> b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>> index b6cc53ba980c..49de62cda598 100644
>> --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>> +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>> @@ -41,6 +41,7 @@ struct mlx5_vdpa_resources {
>>       u32 pdn;
>>       struct mlx5_uars_page *uar;
>>       void __iomem *kick_addr;
>> +    u64 phys_kick_addr;
>>       u16 uid;
>>       u32 null_mkey;
>>       bool valid;
>> diff --git a/drivers/vdpa/mlx5/core/resources.c 
>> b/drivers/vdpa/mlx5/core/resources.c
>> index 6521cbd0f5c2..665f8fc1710f 100644
>> --- a/drivers/vdpa/mlx5/core/resources.c
>> +++ b/drivers/vdpa/mlx5/core/resources.c
>> @@ -247,6 +247,7 @@ int mlx5_vdpa_alloc_resources(struct 
>> mlx5_vdpa_dev *mvdev)
>>           goto err_key;
>>         kick_addr = mdev->bar_addr + offset;
>> +    res->phys_kick_addr = kick_addr;
>>         res->kick_addr = ioremap(kick_addr, PAGE_SIZE);
>>       if (!res->kick_addr) {
>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
>> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> index 10c5fef3c020..680751074d2a 100644
>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> @@ -1865,8 +1865,14 @@ static void mlx5_vdpa_free(struct vdpa_device 
>> *vdev)
>>     static struct vdpa_notification_area 
>> mlx5_get_vq_notification(struct vdpa_device *vdev, u16 idx)
>>   {
>> +    struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>>       struct vdpa_notification_area ret = {};
>> +    struct mlx5_vdpa_net *ndev;
>> +
>> +    ndev = to_mlx5_vdpa_ndev(mvdev);
>>   +    ret.addr = (phys_addr_t)ndev->mvdev.res.phys_kick_addr;
>> +    ret.size = PAGE_SIZE;
>
>
> Note that the page will be mapped in to guest, so it's only safe if 
> the doorbeel exclusively own the page. This means if there're other 
> registers in the page, we can not let the doorbell bypass to work.
>
> So this is suspicious at least in the case of subfunction where we 
> calculate the bar length in mlx5_sf_dev_table_create() as:
>
> table->sf_bar_length = 1 << (MLX5_CAP_GEN(dev, log_min_sf_size) + 12);
>
> It looks to me this can only work for the arch with PAGE_SIZE = 4096, 
> otherwise we can map more into the userspace(guest).
>
> Thanks
>
>
Is there support as of today (in qemu  or elsewhere) to use this mmap 
doorbell instead of the traditional kick.

--Mika



>
>>       return ret;
>>   }
>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] vdpa/mlx5: Add support for doorbell bypassing
  2021-04-22  3:27   ` Mika Penttilä
@ 2021-04-22  4:52     ` Jason Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Jason Wang @ 2021-04-22  4:52 UTC (permalink / raw)
  To: Mika Penttilä, Eli Cohen, mst; +Cc: virtualization, linux-kernel


在 2021/4/22 上午11:27, Mika Penttilä 写道:
>
>
> On 22.4.2021 5.37, Jason Wang wrote:
>>
>> 在 2021/4/21 下午6:41, Eli Cohen 写道:
>>> Implement mlx5_get_vq_notification() to return the doorbell address.
>>> Size is set to one system page as required.
>>>
>>> Signed-off-by: Eli Cohen <elic@nvidia.com>
>>> ---
>>>   drivers/vdpa/mlx5/core/mlx5_vdpa.h | 1 +
>>>   drivers/vdpa/mlx5/core/resources.c | 1 +
>>>   drivers/vdpa/mlx5/net/mlx5_vnet.c  | 6 ++++++
>>>   3 files changed, 8 insertions(+)
>>>
>>> diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h 
>>> b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>> index b6cc53ba980c..49de62cda598 100644
>>> --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>> +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>> @@ -41,6 +41,7 @@ struct mlx5_vdpa_resources {
>>>       u32 pdn;
>>>       struct mlx5_uars_page *uar;
>>>       void __iomem *kick_addr;
>>> +    u64 phys_kick_addr;
>>>       u16 uid;
>>>       u32 null_mkey;
>>>       bool valid;
>>> diff --git a/drivers/vdpa/mlx5/core/resources.c 
>>> b/drivers/vdpa/mlx5/core/resources.c
>>> index 6521cbd0f5c2..665f8fc1710f 100644
>>> --- a/drivers/vdpa/mlx5/core/resources.c
>>> +++ b/drivers/vdpa/mlx5/core/resources.c
>>> @@ -247,6 +247,7 @@ int mlx5_vdpa_alloc_resources(struct 
>>> mlx5_vdpa_dev *mvdev)
>>>           goto err_key;
>>>         kick_addr = mdev->bar_addr + offset;
>>> +    res->phys_kick_addr = kick_addr;
>>>         res->kick_addr = ioremap(kick_addr, PAGE_SIZE);
>>>       if (!res->kick_addr) {
>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
>>> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>> index 10c5fef3c020..680751074d2a 100644
>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>> @@ -1865,8 +1865,14 @@ static void mlx5_vdpa_free(struct vdpa_device 
>>> *vdev)
>>>     static struct vdpa_notification_area 
>>> mlx5_get_vq_notification(struct vdpa_device *vdev, u16 idx)
>>>   {
>>> +    struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>>>       struct vdpa_notification_area ret = {};
>>> +    struct mlx5_vdpa_net *ndev;
>>> +
>>> +    ndev = to_mlx5_vdpa_ndev(mvdev);
>>>   +    ret.addr = (phys_addr_t)ndev->mvdev.res.phys_kick_addr;
>>> +    ret.size = PAGE_SIZE;
>>
>>
>> Note that the page will be mapped in to guest, so it's only safe if 
>> the doorbeel exclusively own the page. This means if there're other 
>> registers in the page, we can not let the doorbell bypass to work.
>>
>> So this is suspicious at least in the case of subfunction where we 
>> calculate the bar length in mlx5_sf_dev_table_create() as:
>>
>> table->sf_bar_length = 1 << (MLX5_CAP_GEN(dev, log_min_sf_size) + 12);
>>
>> It looks to me this can only work for the arch with PAGE_SIZE = 4096, 
>> otherwise we can map more into the userspace(guest).
>>
>> Thanks
>>
>>
> Is there support as of today (in qemu  or elsewhere) to use this mmap 
> doorbell instead of the traditional kick.


Vhost-user had already had this support in Qemu (see set_host_notifier_mr).

And I've posted patches to support this for qemu[1] and vp_vdpa[2] driver.

Thanks

[1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg798935.html

[2] 
https://lore.kernel.org/virtualization/20210415073147.19331-5-jasowang@redhat.com/T/


>
> --Mika
>
>
>
>>
>>>       return ret;
>>>   }
>>
>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] vdpa/mlx5: Add support for doorbell bypassing
  2021-04-22  2:37 ` Jason Wang
  2021-04-22  3:27   ` Mika Penttilä
@ 2021-04-22  6:03   ` Eli Cohen
  2021-04-22  8:07     ` Eli Cohen
  1 sibling, 1 reply; 19+ messages in thread
From: Eli Cohen @ 2021-04-22  6:03 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, virtualization, linux-kernel

On Thu, Apr 22, 2021 at 10:37:38AM +0800, Jason Wang wrote:
> 
> 在 2021/4/21 下午6:41, Eli Cohen 写道:
> > Implement mlx5_get_vq_notification() to return the doorbell address.
> > Size is set to one system page as required.
> > 
> > Signed-off-by: Eli Cohen <elic@nvidia.com>
> > ---
> >   drivers/vdpa/mlx5/core/mlx5_vdpa.h | 1 +
> >   drivers/vdpa/mlx5/core/resources.c | 1 +
> >   drivers/vdpa/mlx5/net/mlx5_vnet.c  | 6 ++++++
> >   3 files changed, 8 insertions(+)
> > 
> > diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> > index b6cc53ba980c..49de62cda598 100644
> > --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> > +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> > @@ -41,6 +41,7 @@ struct mlx5_vdpa_resources {
> >   	u32 pdn;
> >   	struct mlx5_uars_page *uar;
> >   	void __iomem *kick_addr;
> > +	u64 phys_kick_addr;
> >   	u16 uid;
> >   	u32 null_mkey;
> >   	bool valid;
> > diff --git a/drivers/vdpa/mlx5/core/resources.c b/drivers/vdpa/mlx5/core/resources.c
> > index 6521cbd0f5c2..665f8fc1710f 100644
> > --- a/drivers/vdpa/mlx5/core/resources.c
> > +++ b/drivers/vdpa/mlx5/core/resources.c
> > @@ -247,6 +247,7 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
> >   		goto err_key;
> >   	kick_addr = mdev->bar_addr + offset;
> > +	res->phys_kick_addr = kick_addr;
> >   	res->kick_addr = ioremap(kick_addr, PAGE_SIZE);
> >   	if (!res->kick_addr) {
> > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > index 10c5fef3c020..680751074d2a 100644
> > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > @@ -1865,8 +1865,14 @@ static void mlx5_vdpa_free(struct vdpa_device *vdev)
> >   static struct vdpa_notification_area mlx5_get_vq_notification(struct vdpa_device *vdev, u16 idx)
> >   {
> > +	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> >   	struct vdpa_notification_area ret = {};
> > +	struct mlx5_vdpa_net *ndev;
> > +
> > +	ndev = to_mlx5_vdpa_ndev(mvdev);
> > +	ret.addr = (phys_addr_t)ndev->mvdev.res.phys_kick_addr;
> > +	ret.size = PAGE_SIZE;
> 
> 
> Note that the page will be mapped in to guest, so it's only safe if the
> doorbeel exclusively own the page. This means if there're other registers in
> the page, we can not let the doorbell bypass to work.
> 
> So this is suspicious at least in the case of subfunction where we calculate
> the bar length in mlx5_sf_dev_table_create() as:
> 
> table->sf_bar_length = 1 << (MLX5_CAP_GEN(dev, log_min_sf_size) + 12);
> 
> It looks to me this can only work for the arch with PAGE_SIZE = 4096,
> otherwise we can map more into the userspace(guest).
> 

Correct, so I guess I should return here 4096.

I also think that the check in vhost_vdpa_mmap() should verify that the
returned size is not smaller than PAGE_SIZE because the returned address
might just be aligned to PAGE_SIZE. I think this should be enoght but
maybe also use the same logic in vhost_vdpa_fault().

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] vdpa/mlx5: Add support for doorbell bypassing
  2021-04-22  6:03   ` Eli Cohen
@ 2021-04-22  8:07     ` Eli Cohen
  2021-04-22  8:21       ` Jason Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Eli Cohen @ 2021-04-22  8:07 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, virtualization, linux-kernel

On Thu, Apr 22, 2021 at 09:03:58AM +0300, Eli Cohen wrote:
> On Thu, Apr 22, 2021 at 10:37:38AM +0800, Jason Wang wrote:
> > 
> > 在 2021/4/21 下午6:41, Eli Cohen 写道:
> > > Implement mlx5_get_vq_notification() to return the doorbell address.
> > > Size is set to one system page as required.
> > > 
> > > Signed-off-by: Eli Cohen <elic@nvidia.com>
> > > ---
> > >   drivers/vdpa/mlx5/core/mlx5_vdpa.h | 1 +
> > >   drivers/vdpa/mlx5/core/resources.c | 1 +
> > >   drivers/vdpa/mlx5/net/mlx5_vnet.c  | 6 ++++++
> > >   3 files changed, 8 insertions(+)
> > > 
> > > diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> > > index b6cc53ba980c..49de62cda598 100644
> > > --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> > > +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> > > @@ -41,6 +41,7 @@ struct mlx5_vdpa_resources {
> > >   	u32 pdn;
> > >   	struct mlx5_uars_page *uar;
> > >   	void __iomem *kick_addr;
> > > +	u64 phys_kick_addr;
> > >   	u16 uid;
> > >   	u32 null_mkey;
> > >   	bool valid;
> > > diff --git a/drivers/vdpa/mlx5/core/resources.c b/drivers/vdpa/mlx5/core/resources.c
> > > index 6521cbd0f5c2..665f8fc1710f 100644
> > > --- a/drivers/vdpa/mlx5/core/resources.c
> > > +++ b/drivers/vdpa/mlx5/core/resources.c
> > > @@ -247,6 +247,7 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
> > >   		goto err_key;
> > >   	kick_addr = mdev->bar_addr + offset;
> > > +	res->phys_kick_addr = kick_addr;
> > >   	res->kick_addr = ioremap(kick_addr, PAGE_SIZE);
> > >   	if (!res->kick_addr) {
> > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > index 10c5fef3c020..680751074d2a 100644
> > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > @@ -1865,8 +1865,14 @@ static void mlx5_vdpa_free(struct vdpa_device *vdev)
> > >   static struct vdpa_notification_area mlx5_get_vq_notification(struct vdpa_device *vdev, u16 idx)
> > >   {
> > > +	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> > >   	struct vdpa_notification_area ret = {};
> > > +	struct mlx5_vdpa_net *ndev;
> > > +
> > > +	ndev = to_mlx5_vdpa_ndev(mvdev);
> > > +	ret.addr = (phys_addr_t)ndev->mvdev.res.phys_kick_addr;
> > > +	ret.size = PAGE_SIZE;
> > 
> > 
> > Note that the page will be mapped in to guest, so it's only safe if the
> > doorbeel exclusively own the page. This means if there're other registers in
> > the page, we can not let the doorbell bypass to work.
> > 
> > So this is suspicious at least in the case of subfunction where we calculate
> > the bar length in mlx5_sf_dev_table_create() as:
> > 
> > table->sf_bar_length = 1 << (MLX5_CAP_GEN(dev, log_min_sf_size) + 12);
> > 
> > It looks to me this can only work for the arch with PAGE_SIZE = 4096,
> > otherwise we can map more into the userspace(guest).
> > 
> 
> Correct, so I guess I should return here 4096.
> 
> I also think that the check in vhost_vdpa_mmap() should verify that the
> returned size is not smaller than PAGE_SIZE because the returned address

Actually I think it's ok since you verify the size equals vma->vm_end -
vma->vm_start which must be at least PAGE_SIZE.

> might just be aligned to PAGE_SIZE. I think this should be enoght but
> maybe also use the same logic in vhost_vdpa_fault().

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] vdpa/mlx5: Add support for doorbell bypassing
  2021-04-22  8:07     ` Eli Cohen
@ 2021-04-22  8:21       ` Jason Wang
  2021-04-22  8:39         ` Eli Cohen
  0 siblings, 1 reply; 19+ messages in thread
From: Jason Wang @ 2021-04-22  8:21 UTC (permalink / raw)
  To: Eli Cohen; +Cc: mst, virtualization, linux-kernel


在 2021/4/22 下午4:07, Eli Cohen 写道:
> On Thu, Apr 22, 2021 at 09:03:58AM +0300, Eli Cohen wrote:
>> On Thu, Apr 22, 2021 at 10:37:38AM +0800, Jason Wang wrote:
>>> 在 2021/4/21 下午6:41, Eli Cohen 写道:
>>>> Implement mlx5_get_vq_notification() to return the doorbell address.
>>>> Size is set to one system page as required.
>>>>
>>>> Signed-off-by: Eli Cohen <elic@nvidia.com>
>>>> ---
>>>>    drivers/vdpa/mlx5/core/mlx5_vdpa.h | 1 +
>>>>    drivers/vdpa/mlx5/core/resources.c | 1 +
>>>>    drivers/vdpa/mlx5/net/mlx5_vnet.c  | 6 ++++++
>>>>    3 files changed, 8 insertions(+)
>>>>
>>>> diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>>> index b6cc53ba980c..49de62cda598 100644
>>>> --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>>> +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>>> @@ -41,6 +41,7 @@ struct mlx5_vdpa_resources {
>>>>    	u32 pdn;
>>>>    	struct mlx5_uars_page *uar;
>>>>    	void __iomem *kick_addr;
>>>> +	u64 phys_kick_addr;
>>>>    	u16 uid;
>>>>    	u32 null_mkey;
>>>>    	bool valid;
>>>> diff --git a/drivers/vdpa/mlx5/core/resources.c b/drivers/vdpa/mlx5/core/resources.c
>>>> index 6521cbd0f5c2..665f8fc1710f 100644
>>>> --- a/drivers/vdpa/mlx5/core/resources.c
>>>> +++ b/drivers/vdpa/mlx5/core/resources.c
>>>> @@ -247,6 +247,7 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
>>>>    		goto err_key;
>>>>    	kick_addr = mdev->bar_addr + offset;
>>>> +	res->phys_kick_addr = kick_addr;
>>>>    	res->kick_addr = ioremap(kick_addr, PAGE_SIZE);
>>>>    	if (!res->kick_addr) {
>>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>> index 10c5fef3c020..680751074d2a 100644
>>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>> @@ -1865,8 +1865,14 @@ static void mlx5_vdpa_free(struct vdpa_device *vdev)
>>>>    static struct vdpa_notification_area mlx5_get_vq_notification(struct vdpa_device *vdev, u16 idx)
>>>>    {
>>>> +	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>>>>    	struct vdpa_notification_area ret = {};
>>>> +	struct mlx5_vdpa_net *ndev;
>>>> +
>>>> +	ndev = to_mlx5_vdpa_ndev(mvdev);
>>>> +	ret.addr = (phys_addr_t)ndev->mvdev.res.phys_kick_addr;
>>>> +	ret.size = PAGE_SIZE;
>>>
>>> Note that the page will be mapped in to guest, so it's only safe if the
>>> doorbeel exclusively own the page. This means if there're other registers in
>>> the page, we can not let the doorbell bypass to work.
>>>
>>> So this is suspicious at least in the case of subfunction where we calculate
>>> the bar length in mlx5_sf_dev_table_create() as:
>>>
>>> table->sf_bar_length = 1 << (MLX5_CAP_GEN(dev, log_min_sf_size) + 12);
>>>
>>> It looks to me this can only work for the arch with PAGE_SIZE = 4096,
>>> otherwise we can map more into the userspace(guest).
>>>
>> Correct, so I guess I should return here 4096.


I'm not quite sure but since the calculation of the sf_bar_length is 
doen via a shift of 12, it might be correct.

And please double check if the doorbell own the page exclusively.


>>
>> I also think that the check in vhost_vdpa_mmap() should verify that the
>> returned size is not smaller than PAGE_SIZE because the returned address
> Actually I think it's ok since you verify the size equals vma->vm_end -
> vma->vm_start which must be at least PAGE_SIZE.


Yes.

Thanks


>
>> might just be aligned to PAGE_SIZE. I think this should be enoght but
>> maybe also use the same logic in vhost_vdpa_fault().


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] vdpa/mlx5: Add support for doorbell bypassing
  2021-04-22  8:21       ` Jason Wang
@ 2021-04-22  8:39         ` Eli Cohen
  2021-04-22  8:59           ` Jason Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Eli Cohen @ 2021-04-22  8:39 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, virtualization, linux-kernel

On Thu, Apr 22, 2021 at 04:21:45PM +0800, Jason Wang wrote:
> 
> 在 2021/4/22 下午4:07, Eli Cohen 写道:
> > On Thu, Apr 22, 2021 at 09:03:58AM +0300, Eli Cohen wrote:
> > > On Thu, Apr 22, 2021 at 10:37:38AM +0800, Jason Wang wrote:
> > > > 在 2021/4/21 下午6:41, Eli Cohen 写道:
> > > > > Implement mlx5_get_vq_notification() to return the doorbell address.
> > > > > Size is set to one system page as required.
> > > > > 
> > > > > Signed-off-by: Eli Cohen <elic@nvidia.com>
> > > > > ---
> > > > >    drivers/vdpa/mlx5/core/mlx5_vdpa.h | 1 +
> > > > >    drivers/vdpa/mlx5/core/resources.c | 1 +
> > > > >    drivers/vdpa/mlx5/net/mlx5_vnet.c  | 6 ++++++
> > > > >    3 files changed, 8 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> > > > > index b6cc53ba980c..49de62cda598 100644
> > > > > --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> > > > > +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> > > > > @@ -41,6 +41,7 @@ struct mlx5_vdpa_resources {
> > > > >    	u32 pdn;
> > > > >    	struct mlx5_uars_page *uar;
> > > > >    	void __iomem *kick_addr;
> > > > > +	u64 phys_kick_addr;
> > > > >    	u16 uid;
> > > > >    	u32 null_mkey;
> > > > >    	bool valid;
> > > > > diff --git a/drivers/vdpa/mlx5/core/resources.c b/drivers/vdpa/mlx5/core/resources.c
> > > > > index 6521cbd0f5c2..665f8fc1710f 100644
> > > > > --- a/drivers/vdpa/mlx5/core/resources.c
> > > > > +++ b/drivers/vdpa/mlx5/core/resources.c
> > > > > @@ -247,6 +247,7 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
> > > > >    		goto err_key;
> > > > >    	kick_addr = mdev->bar_addr + offset;
> > > > > +	res->phys_kick_addr = kick_addr;
> > > > >    	res->kick_addr = ioremap(kick_addr, PAGE_SIZE);
> > > > >    	if (!res->kick_addr) {
> > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > index 10c5fef3c020..680751074d2a 100644
> > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > @@ -1865,8 +1865,14 @@ static void mlx5_vdpa_free(struct vdpa_device *vdev)
> > > > >    static struct vdpa_notification_area mlx5_get_vq_notification(struct vdpa_device *vdev, u16 idx)
> > > > >    {
> > > > > +	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> > > > >    	struct vdpa_notification_area ret = {};
> > > > > +	struct mlx5_vdpa_net *ndev;
> > > > > +
> > > > > +	ndev = to_mlx5_vdpa_ndev(mvdev);
> > > > > +	ret.addr = (phys_addr_t)ndev->mvdev.res.phys_kick_addr;
> > > > > +	ret.size = PAGE_SIZE;
> > > > 
> > > > Note that the page will be mapped in to guest, so it's only safe if the
> > > > doorbeel exclusively own the page. This means if there're other registers in
> > > > the page, we can not let the doorbell bypass to work.
> > > > 
> > > > So this is suspicious at least in the case of subfunction where we calculate
> > > > the bar length in mlx5_sf_dev_table_create() as:
> > > > 
> > > > table->sf_bar_length = 1 << (MLX5_CAP_GEN(dev, log_min_sf_size) + 12);
> > > > 
> > > > It looks to me this can only work for the arch with PAGE_SIZE = 4096,
> > > > otherwise we can map more into the userspace(guest).
> > > > 
> > > Correct, so I guess I should return here 4096.
> 
> 
> I'm not quite sure but since the calculation of the sf_bar_length is doen
> via a shift of 12, it might be correct.
> 
> And please double check if the doorbell own the page exclusively.

I am checking if it is safe to map the any part of the SF's BAR to
userspace without harming other functions. If this is true, I will check
if I can return PAGE_SIZE without compromising security. I think we may
need to extend struct vdpa_notification_area to contain another field
offset which indicates the offset from addr where the actual doorbell
resides.
> 
> 
> > > 
> > > I also think that the check in vhost_vdpa_mmap() should verify that the
> > > returned size is not smaller than PAGE_SIZE because the returned address
> > Actually I think it's ok since you verify the size equals vma->vm_end -
> > vma->vm_start which must be at least PAGE_SIZE.
> 
> 
> Yes.
> 
> Thanks
> 
> 
> > 
> > > might just be aligned to PAGE_SIZE. I think this should be enoght but
> > > maybe also use the same logic in vhost_vdpa_fault().
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] vdpa/mlx5: Add support for doorbell bypassing
  2021-04-22  8:39         ` Eli Cohen
@ 2021-04-22  8:59           ` Jason Wang
  2021-04-25 13:25             ` Eli Cohen
  2021-04-29 10:00             ` Eli Cohen
  0 siblings, 2 replies; 19+ messages in thread
From: Jason Wang @ 2021-04-22  8:59 UTC (permalink / raw)
  To: Eli Cohen; +Cc: mst, virtualization, linux-kernel, Zhu, Lingshan


在 2021/4/22 下午4:39, Eli Cohen 写道:
> On Thu, Apr 22, 2021 at 04:21:45PM +0800, Jason Wang wrote:
>> 在 2021/4/22 下午4:07, Eli Cohen 写道:
>>> On Thu, Apr 22, 2021 at 09:03:58AM +0300, Eli Cohen wrote:
>>>> On Thu, Apr 22, 2021 at 10:37:38AM +0800, Jason Wang wrote:
>>>>> 在 2021/4/21 下午6:41, Eli Cohen 写道:
>>>>>> Implement mlx5_get_vq_notification() to return the doorbell address.
>>>>>> Size is set to one system page as required.
>>>>>>
>>>>>> Signed-off-by: Eli Cohen <elic@nvidia.com>
>>>>>> ---
>>>>>>     drivers/vdpa/mlx5/core/mlx5_vdpa.h | 1 +
>>>>>>     drivers/vdpa/mlx5/core/resources.c | 1 +
>>>>>>     drivers/vdpa/mlx5/net/mlx5_vnet.c  | 6 ++++++
>>>>>>     3 files changed, 8 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>>>>> index b6cc53ba980c..49de62cda598 100644
>>>>>> --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>>>>> +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>>>>> @@ -41,6 +41,7 @@ struct mlx5_vdpa_resources {
>>>>>>     	u32 pdn;
>>>>>>     	struct mlx5_uars_page *uar;
>>>>>>     	void __iomem *kick_addr;
>>>>>> +	u64 phys_kick_addr;
>>>>>>     	u16 uid;
>>>>>>     	u32 null_mkey;
>>>>>>     	bool valid;
>>>>>> diff --git a/drivers/vdpa/mlx5/core/resources.c b/drivers/vdpa/mlx5/core/resources.c
>>>>>> index 6521cbd0f5c2..665f8fc1710f 100644
>>>>>> --- a/drivers/vdpa/mlx5/core/resources.c
>>>>>> +++ b/drivers/vdpa/mlx5/core/resources.c
>>>>>> @@ -247,6 +247,7 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
>>>>>>     		goto err_key;
>>>>>>     	kick_addr = mdev->bar_addr + offset;
>>>>>> +	res->phys_kick_addr = kick_addr;
>>>>>>     	res->kick_addr = ioremap(kick_addr, PAGE_SIZE);
>>>>>>     	if (!res->kick_addr) {
>>>>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>> index 10c5fef3c020..680751074d2a 100644
>>>>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>> @@ -1865,8 +1865,14 @@ static void mlx5_vdpa_free(struct vdpa_device *vdev)
>>>>>>     static struct vdpa_notification_area mlx5_get_vq_notification(struct vdpa_device *vdev, u16 idx)
>>>>>>     {
>>>>>> +	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>>>>>>     	struct vdpa_notification_area ret = {};
>>>>>> +	struct mlx5_vdpa_net *ndev;
>>>>>> +
>>>>>> +	ndev = to_mlx5_vdpa_ndev(mvdev);
>>>>>> +	ret.addr = (phys_addr_t)ndev->mvdev.res.phys_kick_addr;
>>>>>> +	ret.size = PAGE_SIZE;
>>>>> Note that the page will be mapped in to guest, so it's only safe if the
>>>>> doorbeel exclusively own the page. This means if there're other registers in
>>>>> the page, we can not let the doorbell bypass to work.
>>>>>
>>>>> So this is suspicious at least in the case of subfunction where we calculate
>>>>> the bar length in mlx5_sf_dev_table_create() as:
>>>>>
>>>>> table->sf_bar_length = 1 << (MLX5_CAP_GEN(dev, log_min_sf_size) + 12);
>>>>>
>>>>> It looks to me this can only work for the arch with PAGE_SIZE = 4096,
>>>>> otherwise we can map more into the userspace(guest).
>>>>>
>>>> Correct, so I guess I should return here 4096.
>>
>> I'm not quite sure but since the calculation of the sf_bar_length is doen
>> via a shift of 12, it might be correct.
>>
>> And please double check if the doorbell own the page exclusively.
> I am checking if it is safe to map the any part of the SF's BAR to
> userspace without harming other functions. If this is true, I will check
> if I can return PAGE_SIZE without compromising security.


It's usally not safe and a layer violation if other registers are placed 
at the same page.


>   I think we may
> need to extend struct vdpa_notification_area to contain another field
> offset which indicates the offset from addr where the actual doorbell
> resides.


The movitiaton of the current design is to be fit seamless into how Qemu 
model doorbell layouts currently:

1) page-per-vq, each vq has its own page aligned doorbell
2) 2 bytes doorbell, each vq has its own 2 byte aligend doorbell

Only 1) is support in vhost-vDPA (and vhost-user) since it's rather 
simple and secure (page aligned) to be modelled and implemented via mmap().

Exporting a complex layout is possbile but requires careful design.

Actually, we had antoher option

3) shared doorbell: all virtqueue shares a single page aligned doorbell

This is not yet supported by Qemu.

Thanks


>>
>>>> I also think that the check in vhost_vdpa_mmap() should verify that the
>>>> returned size is not smaller than PAGE_SIZE because the returned address
>>> Actually I think it's ok since you verify the size equals vma->vm_end -
>>> vma->vm_start which must be at least PAGE_SIZE.
>>
>> Yes.
>>
>> Thanks
>>
>>
>>>> might just be aligned to PAGE_SIZE. I think this should be enoght but
>>>> maybe also use the same logic in vhost_vdpa_fault().


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] vdpa/mlx5: Add support for doorbell bypassing
  2021-04-22  8:59           ` Jason Wang
@ 2021-04-25 13:25             ` Eli Cohen
  2021-04-26  2:35               ` Jason Wang
  2021-04-29 10:00             ` Eli Cohen
  1 sibling, 1 reply; 19+ messages in thread
From: Eli Cohen @ 2021-04-25 13:25 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, virtualization, linux-kernel, Zhu, Lingshan

On Thu, Apr 22, 2021 at 04:59:11PM +0800, Jason Wang wrote:
> 
> 在 2021/4/22 下午4:39, Eli Cohen 写道:
> > On Thu, Apr 22, 2021 at 04:21:45PM +0800, Jason Wang wrote:
> > > 在 2021/4/22 下午4:07, Eli Cohen 写道:
> > > > On Thu, Apr 22, 2021 at 09:03:58AM +0300, Eli Cohen wrote:
> > > > > On Thu, Apr 22, 2021 at 10:37:38AM +0800, Jason Wang wrote:
> > > > > > 在 2021/4/21 下午6:41, Eli Cohen 写道:
> > > > > > > Implement mlx5_get_vq_notification() to return the doorbell address.
> > > > > > > Size is set to one system page as required.
> > > > > > > 
> > > > > > > Signed-off-by: Eli Cohen <elic@nvidia.com>
> > > > > > > ---
> > > > > > >     drivers/vdpa/mlx5/core/mlx5_vdpa.h | 1 +
> > > > > > >     drivers/vdpa/mlx5/core/resources.c | 1 +
> > > > > > >     drivers/vdpa/mlx5/net/mlx5_vnet.c  | 6 ++++++
> > > > > > >     3 files changed, 8 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> > > > > > > index b6cc53ba980c..49de62cda598 100644
> > > > > > > --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> > > > > > > +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> > > > > > > @@ -41,6 +41,7 @@ struct mlx5_vdpa_resources {
> > > > > > >     	u32 pdn;
> > > > > > >     	struct mlx5_uars_page *uar;
> > > > > > >     	void __iomem *kick_addr;
> > > > > > > +	u64 phys_kick_addr;
> > > > > > >     	u16 uid;
> > > > > > >     	u32 null_mkey;
> > > > > > >     	bool valid;
> > > > > > > diff --git a/drivers/vdpa/mlx5/core/resources.c b/drivers/vdpa/mlx5/core/resources.c
> > > > > > > index 6521cbd0f5c2..665f8fc1710f 100644
> > > > > > > --- a/drivers/vdpa/mlx5/core/resources.c
> > > > > > > +++ b/drivers/vdpa/mlx5/core/resources.c
> > > > > > > @@ -247,6 +247,7 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
> > > > > > >     		goto err_key;
> > > > > > >     	kick_addr = mdev->bar_addr + offset;
> > > > > > > +	res->phys_kick_addr = kick_addr;
> > > > > > >     	res->kick_addr = ioremap(kick_addr, PAGE_SIZE);
> > > > > > >     	if (!res->kick_addr) {
> > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > index 10c5fef3c020..680751074d2a 100644
> > > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > @@ -1865,8 +1865,14 @@ static void mlx5_vdpa_free(struct vdpa_device *vdev)
> > > > > > >     static struct vdpa_notification_area mlx5_get_vq_notification(struct vdpa_device *vdev, u16 idx)
> > > > > > >     {
> > > > > > > +	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> > > > > > >     	struct vdpa_notification_area ret = {};
> > > > > > > +	struct mlx5_vdpa_net *ndev;
> > > > > > > +
> > > > > > > +	ndev = to_mlx5_vdpa_ndev(mvdev);
> > > > > > > +	ret.addr = (phys_addr_t)ndev->mvdev.res.phys_kick_addr;
> > > > > > > +	ret.size = PAGE_SIZE;
> > > > > > Note that the page will be mapped in to guest, so it's only safe if the
> > > > > > doorbeel exclusively own the page. This means if there're other registers in
> > > > > > the page, we can not let the doorbell bypass to work.
> > > > > > 
> > > > > > So this is suspicious at least in the case of subfunction where we calculate
> > > > > > the bar length in mlx5_sf_dev_table_create() as:
> > > > > > 
> > > > > > table->sf_bar_length = 1 << (MLX5_CAP_GEN(dev, log_min_sf_size) + 12);
> > > > > > 
> > > > > > It looks to me this can only work for the arch with PAGE_SIZE = 4096,
> > > > > > otherwise we can map more into the userspace(guest).
> > > > > > 
> > > > > Correct, so I guess I should return here 4096.
> > > 
> > > I'm not quite sure but since the calculation of the sf_bar_length is doen
> > > via a shift of 12, it might be correct.
> > > 
> > > And please double check if the doorbell own the page exclusively.
> > I am checking if it is safe to map the any part of the SF's BAR to
> > userspace without harming other functions. If this is true, I will check
> > if I can return PAGE_SIZE without compromising security.
> 
> 
> It's usally not safe and a layer violation if other registers are placed at
> the same page.
> 
> 
> >   I think we may
> > need to extend struct vdpa_notification_area to contain another field
> > offset which indicates the offset from addr where the actual doorbell
> > resides.
> 
> 
> The movitiaton of the current design is to be fit seamless into how Qemu
> model doorbell layouts currently:
> 
> 1) page-per-vq, each vq has its own page aligned doorbell
> 2) 2 bytes doorbell, each vq has its own 2 byte aligend doorbell
> 
> Only 1) is support in vhost-vDPA (and vhost-user) since it's rather simple
> and secure (page aligned) to be modelled and implemented via mmap().
> 
> Exporting a complex layout is possbile but requires careful design.
> 
> Actually, we had antoher option
> 
> 3) shared doorbell: all virtqueue shares a single page aligned doorbell
> 

This nearly matches we have in ConnectX devices. All the doorbells are
located at the same place. For 4K page size atchitectures it is aligned
to the start of the page. For larger page sizes it is not aligned.
If we don't allow to some offset within the page, it means that direct
doorbells will not work for 64K page size archs over ConnectX.

> This is not yet supported by Qemu.
> 
> Thanks
> 
> 
> > > 
> > > > > I also think that the check in vhost_vdpa_mmap() should verify that the
> > > > > returned size is not smaller than PAGE_SIZE because the returned address
> > > > Actually I think it's ok since you verify the size equals vma->vm_end -
> > > > vma->vm_start which must be at least PAGE_SIZE.
> > > 
> > > Yes.
> > > 
> > > Thanks
> > > 
> > > 
> > > > > might just be aligned to PAGE_SIZE. I think this should be enoght but
> > > > > maybe also use the same logic in vhost_vdpa_fault().
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] vdpa/mlx5: Add support for doorbell bypassing
  2021-04-25 13:25             ` Eli Cohen
@ 2021-04-26  2:35               ` Jason Wang
  2021-04-28 20:53                 ` Si-Wei Liu
  0 siblings, 1 reply; 19+ messages in thread
From: Jason Wang @ 2021-04-26  2:35 UTC (permalink / raw)
  To: Eli Cohen; +Cc: mst, virtualization, linux-kernel, Zhu, Lingshan


在 2021/4/25 下午9:25, Eli Cohen 写道:
> On Thu, Apr 22, 2021 at 04:59:11PM +0800, Jason Wang wrote:
>> 在 2021/4/22 下午4:39, Eli Cohen 写道:
>>> On Thu, Apr 22, 2021 at 04:21:45PM +0800, Jason Wang wrote:
>>>> 在 2021/4/22 下午4:07, Eli Cohen 写道:
>>>>> On Thu, Apr 22, 2021 at 09:03:58AM +0300, Eli Cohen wrote:
>>>>>> On Thu, Apr 22, 2021 at 10:37:38AM +0800, Jason Wang wrote:
>>>>>>> 在 2021/4/21 下午6:41, Eli Cohen 写道:
>>>>>>>> Implement mlx5_get_vq_notification() to return the doorbell address.
>>>>>>>> Size is set to one system page as required.
>>>>>>>>
>>>>>>>> Signed-off-by: Eli Cohen <elic@nvidia.com>
>>>>>>>> ---
>>>>>>>>      drivers/vdpa/mlx5/core/mlx5_vdpa.h | 1 +
>>>>>>>>      drivers/vdpa/mlx5/core/resources.c | 1 +
>>>>>>>>      drivers/vdpa/mlx5/net/mlx5_vnet.c  | 6 ++++++
>>>>>>>>      3 files changed, 8 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>>>>>>> index b6cc53ba980c..49de62cda598 100644
>>>>>>>> --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>>>>>>> +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>>>>>>> @@ -41,6 +41,7 @@ struct mlx5_vdpa_resources {
>>>>>>>>      	u32 pdn;
>>>>>>>>      	struct mlx5_uars_page *uar;
>>>>>>>>      	void __iomem *kick_addr;
>>>>>>>> +	u64 phys_kick_addr;
>>>>>>>>      	u16 uid;
>>>>>>>>      	u32 null_mkey;
>>>>>>>>      	bool valid;
>>>>>>>> diff --git a/drivers/vdpa/mlx5/core/resources.c b/drivers/vdpa/mlx5/core/resources.c
>>>>>>>> index 6521cbd0f5c2..665f8fc1710f 100644
>>>>>>>> --- a/drivers/vdpa/mlx5/core/resources.c
>>>>>>>> +++ b/drivers/vdpa/mlx5/core/resources.c
>>>>>>>> @@ -247,6 +247,7 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
>>>>>>>>      		goto err_key;
>>>>>>>>      	kick_addr = mdev->bar_addr + offset;
>>>>>>>> +	res->phys_kick_addr = kick_addr;
>>>>>>>>      	res->kick_addr = ioremap(kick_addr, PAGE_SIZE);
>>>>>>>>      	if (!res->kick_addr) {
>>>>>>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>> index 10c5fef3c020..680751074d2a 100644
>>>>>>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>> @@ -1865,8 +1865,14 @@ static void mlx5_vdpa_free(struct vdpa_device *vdev)
>>>>>>>>      static struct vdpa_notification_area mlx5_get_vq_notification(struct vdpa_device *vdev, u16 idx)
>>>>>>>>      {
>>>>>>>> +	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>>>>>>>>      	struct vdpa_notification_area ret = {};
>>>>>>>> +	struct mlx5_vdpa_net *ndev;
>>>>>>>> +
>>>>>>>> +	ndev = to_mlx5_vdpa_ndev(mvdev);
>>>>>>>> +	ret.addr = (phys_addr_t)ndev->mvdev.res.phys_kick_addr;
>>>>>>>> +	ret.size = PAGE_SIZE;
>>>>>>> Note that the page will be mapped in to guest, so it's only safe if the
>>>>>>> doorbeel exclusively own the page. This means if there're other registers in
>>>>>>> the page, we can not let the doorbell bypass to work.
>>>>>>>
>>>>>>> So this is suspicious at least in the case of subfunction where we calculate
>>>>>>> the bar length in mlx5_sf_dev_table_create() as:
>>>>>>>
>>>>>>> table->sf_bar_length = 1 << (MLX5_CAP_GEN(dev, log_min_sf_size) + 12);
>>>>>>>
>>>>>>> It looks to me this can only work for the arch with PAGE_SIZE = 4096,
>>>>>>> otherwise we can map more into the userspace(guest).
>>>>>>>
>>>>>> Correct, so I guess I should return here 4096.
>>>> I'm not quite sure but since the calculation of the sf_bar_length is doen
>>>> via a shift of 12, it might be correct.
>>>>
>>>> And please double check if the doorbell own the page exclusively.
>>> I am checking if it is safe to map the any part of the SF's BAR to
>>> userspace without harming other functions. If this is true, I will check
>>> if I can return PAGE_SIZE without compromising security.
>>
>> It's usally not safe and a layer violation if other registers are placed at
>> the same page.
>>
>>
>>>    I think we may
>>> need to extend struct vdpa_notification_area to contain another field
>>> offset which indicates the offset from addr where the actual doorbell
>>> resides.
>>
>> The movitiaton of the current design is to be fit seamless into how Qemu
>> model doorbell layouts currently:
>>
>> 1) page-per-vq, each vq has its own page aligned doorbell
>> 2) 2 bytes doorbell, each vq has its own 2 byte aligend doorbell
>>
>> Only 1) is support in vhost-vDPA (and vhost-user) since it's rather simple
>> and secure (page aligned) to be modelled and implemented via mmap().
>>
>> Exporting a complex layout is possbile but requires careful design.
>>
>> Actually, we had antoher option
>>
>> 3) shared doorbell: all virtqueue shares a single page aligned doorbell
>>
> This nearly matches we have in ConnectX devices. All the doorbells are
> located at the same place. For 4K page size atchitectures it is aligned
> to the start of the page. For larger page sizes it is not aligned.
> If we don't allow to some offset within the page, it means that direct
> doorbells will not work for 64K page size archs over ConnectX.


Right, just to clarify. This can still be model by the current 
page-per-vq model. It means the doorbell will be mapped into different 
pages for each virtqueue by Qemu. So from the view of Qemu or guest, 
each virtqueue has its own doorbell in this case.


>
>> This is not yet supported by Qemu.


For "not supported" I meant present this (doorbells sharing) layout to 
guest.

Thanks


>>
>> Thanks
>>
>>
>>>>>> I also think that the check in vhost_vdpa_mmap() should verify that the
>>>>>> returned size is not smaller than PAGE_SIZE because the returned address
>>>>> Actually I think it's ok since you verify the size equals vma->vm_end -
>>>>> vma->vm_start which must be at least PAGE_SIZE.
>>>> Yes.
>>>>
>>>> Thanks
>>>>
>>>>
>>>>>> might just be aligned to PAGE_SIZE. I think this should be enoght but
>>>>>> maybe also use the same logic in vhost_vdpa_fault().


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] vdpa/mlx5: Add support for doorbell bypassing
  2021-04-26  2:35               ` Jason Wang
@ 2021-04-28 20:53                 ` Si-Wei Liu
  2021-04-29  2:43                   ` Jason Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Si-Wei Liu @ 2021-04-28 20:53 UTC (permalink / raw)
  To: Jason Wang; +Cc: Eli Cohen, mst, virtualization, linux-kernel, Zhu, Lingshan

On Sun, Apr 25, 2021 at 7:38 PM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2021/4/25 下午9:25, Eli Cohen 写道:
> > On Thu, Apr 22, 2021 at 04:59:11PM +0800, Jason Wang wrote:
> >> 在 2021/4/22 下午4:39, Eli Cohen 写道:
> >>> On Thu, Apr 22, 2021 at 04:21:45PM +0800, Jason Wang wrote:
> >>>> 在 2021/4/22 下午4:07, Eli Cohen 写道:
> >>>>> On Thu, Apr 22, 2021 at 09:03:58AM +0300, Eli Cohen wrote:
> >>>>>> On Thu, Apr 22, 2021 at 10:37:38AM +0800, Jason Wang wrote:
> >>>>>>> 在 2021/4/21 下午6:41, Eli Cohen 写道:
> >>>>>>>> Implement mlx5_get_vq_notification() to return the doorbell address.
> >>>>>>>> Size is set to one system page as required.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Eli Cohen <elic@nvidia.com>
> >>>>>>>> ---
> >>>>>>>>      drivers/vdpa/mlx5/core/mlx5_vdpa.h | 1 +
> >>>>>>>>      drivers/vdpa/mlx5/core/resources.c | 1 +
> >>>>>>>>      drivers/vdpa/mlx5/net/mlx5_vnet.c  | 6 ++++++
> >>>>>>>>      3 files changed, 8 insertions(+)
> >>>>>>>>
> >>>>>>>> diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> >>>>>>>> index b6cc53ba980c..49de62cda598 100644
> >>>>>>>> --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> >>>>>>>> +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> >>>>>>>> @@ -41,6 +41,7 @@ struct mlx5_vdpa_resources {
> >>>>>>>>        u32 pdn;
> >>>>>>>>        struct mlx5_uars_page *uar;
> >>>>>>>>        void __iomem *kick_addr;
> >>>>>>>> +      u64 phys_kick_addr;
> >>>>>>>>        u16 uid;
> >>>>>>>>        u32 null_mkey;
> >>>>>>>>        bool valid;
> >>>>>>>> diff --git a/drivers/vdpa/mlx5/core/resources.c b/drivers/vdpa/mlx5/core/resources.c
> >>>>>>>> index 6521cbd0f5c2..665f8fc1710f 100644
> >>>>>>>> --- a/drivers/vdpa/mlx5/core/resources.c
> >>>>>>>> +++ b/drivers/vdpa/mlx5/core/resources.c
> >>>>>>>> @@ -247,6 +247,7 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
> >>>>>>>>                goto err_key;
> >>>>>>>>        kick_addr = mdev->bar_addr + offset;
> >>>>>>>> +      res->phys_kick_addr = kick_addr;
> >>>>>>>>        res->kick_addr = ioremap(kick_addr, PAGE_SIZE);
> >>>>>>>>        if (!res->kick_addr) {
> >>>>>>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> >>>>>>>> index 10c5fef3c020..680751074d2a 100644
> >>>>>>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> >>>>>>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> >>>>>>>> @@ -1865,8 +1865,14 @@ static void mlx5_vdpa_free(struct vdpa_device *vdev)
> >>>>>>>>      static struct vdpa_notification_area mlx5_get_vq_notification(struct vdpa_device *vdev, u16 idx)
> >>>>>>>>      {
> >>>>>>>> +      struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> >>>>>>>>        struct vdpa_notification_area ret = {};
> >>>>>>>> +      struct mlx5_vdpa_net *ndev;
> >>>>>>>> +
> >>>>>>>> +      ndev = to_mlx5_vdpa_ndev(mvdev);
> >>>>>>>> +      ret.addr = (phys_addr_t)ndev->mvdev.res.phys_kick_addr;
> >>>>>>>> +      ret.size = PAGE_SIZE;
> >>>>>>> Note that the page will be mapped in to guest, so it's only safe if the
> >>>>>>> doorbeel exclusively own the page. This means if there're other registers in
> >>>>>>> the page, we can not let the doorbell bypass to work.
> >>>>>>>
> >>>>>>> So this is suspicious at least in the case of subfunction where we calculate
> >>>>>>> the bar length in mlx5_sf_dev_table_create() as:
> >>>>>>>
> >>>>>>> table->sf_bar_length = 1 << (MLX5_CAP_GEN(dev, log_min_sf_size) + 12);
> >>>>>>>
> >>>>>>> It looks to me this can only work for the arch with PAGE_SIZE = 4096,
> >>>>>>> otherwise we can map more into the userspace(guest).
> >>>>>>>
> >>>>>> Correct, so I guess I should return here 4096.
> >>>> I'm not quite sure but since the calculation of the sf_bar_length is doen
> >>>> via a shift of 12, it might be correct.
> >>>>
> >>>> And please double check if the doorbell own the page exclusively.
> >>> I am checking if it is safe to map the any part of the SF's BAR to
> >>> userspace without harming other functions. If this is true, I will check
> >>> if I can return PAGE_SIZE without compromising security.
> >>
> >> It's usally not safe and a layer violation if other registers are placed at
> >> the same page.
> >>
> >>
> >>>    I think we may
> >>> need to extend struct vdpa_notification_area to contain another field
> >>> offset which indicates the offset from addr where the actual doorbell
> >>> resides.
> >>
> >> The movitiaton of the current design is to be fit seamless into how Qemu
> >> model doorbell layouts currently:
> >>
> >> 1) page-per-vq, each vq has its own page aligned doorbell
> >> 2) 2 bytes doorbell, each vq has its own 2 byte aligend doorbell
> >>
> >> Only 1) is support in vhost-vDPA (and vhost-user) since it's rather simple
> >> and secure (page aligned) to be modelled and implemented via mmap().
> >>
> >> Exporting a complex layout is possbile but requires careful design.
> >>
> >> Actually, we had antoher option
> >>
> >> 3) shared doorbell: all virtqueue shares a single page aligned doorbell
> >>
> > This nearly matches we have in ConnectX devices. All the doorbells are
> > located at the same place. For 4K page size atchitectures it is aligned
> > to the start of the page. For larger page sizes it is not aligned.
> > If we don't allow to some offset within the page, it means that direct
> > doorbells will not work for 64K page size archs over ConnectX.
>
>
> Right, just to clarify. This can still be model by the current
> page-per-vq model. It means the doorbell will be mapped into different
> pages for each virtqueue by Qemu. So from the view of Qemu or guest,
> each virtqueue has its own doorbell in this case.

So this is the proposed model for mlx5 vdpa with doorbell per-instance
(rather than per-vq), assuming the exclusive ownership of mapped
doorbell page?

>
>
> >
> >> This is not yet supported by Qemu.
>
>
> For "not supported" I meant present this (doorbells sharing) layout to
> guest.

So it means this new layout perhaps will have to introduce new virtio
features to guest thus not compatible with eixsting driver?

-Siwei
>
> Thanks
>
>
> >>
> >> Thanks
> >>
> >>
> >>>>>> I also think that the check in vhost_vdpa_mmap() should verify that the
> >>>>>> returned size is not smaller than PAGE_SIZE because the returned address
> >>>>> Actually I think it's ok since you verify the size equals vma->vm_end -
> >>>>> vma->vm_start which must be at least PAGE_SIZE.
> >>>> Yes.
> >>>>
> >>>> Thanks
> >>>>
> >>>>
> >>>>>> might just be aligned to PAGE_SIZE. I think this should be enoght but
> >>>>>> maybe also use the same logic in vhost_vdpa_fault().
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] vdpa/mlx5: Add support for doorbell bypassing
  2021-04-28 20:53                 ` Si-Wei Liu
@ 2021-04-29  2:43                   ` Jason Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Jason Wang @ 2021-04-29  2:43 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: Eli Cohen, mst, virtualization, linux-kernel, Zhu, Lingshan


在 2021/4/29 上午4:53, Si-Wei Liu 写道:
> On Sun, Apr 25, 2021 at 7:38 PM Jason Wang <jasowang@redhat.com> wrote:
>>
>> 在 2021/4/25 下午9:25, Eli Cohen 写道:
>>> On Thu, Apr 22, 2021 at 04:59:11PM +0800, Jason Wang wrote:
>>>> 在 2021/4/22 下午4:39, Eli Cohen 写道:
>>>>> On Thu, Apr 22, 2021 at 04:21:45PM +0800, Jason Wang wrote:
>>>>>> 在 2021/4/22 下午4:07, Eli Cohen 写道:
>>>>>>> On Thu, Apr 22, 2021 at 09:03:58AM +0300, Eli Cohen wrote:
>>>>>>>> On Thu, Apr 22, 2021 at 10:37:38AM +0800, Jason Wang wrote:
>>>>>>>>> 在 2021/4/21 下午6:41, Eli Cohen 写道:
>>>>>>>>>> Implement mlx5_get_vq_notification() to return the doorbell address.
>>>>>>>>>> Size is set to one system page as required.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Eli Cohen <elic@nvidia.com>
>>>>>>>>>> ---
>>>>>>>>>>       drivers/vdpa/mlx5/core/mlx5_vdpa.h | 1 +
>>>>>>>>>>       drivers/vdpa/mlx5/core/resources.c | 1 +
>>>>>>>>>>       drivers/vdpa/mlx5/net/mlx5_vnet.c  | 6 ++++++
>>>>>>>>>>       3 files changed, 8 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>>>>>>>>> index b6cc53ba980c..49de62cda598 100644
>>>>>>>>>> --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>>>>>>>>> +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>>>>>>>>> @@ -41,6 +41,7 @@ struct mlx5_vdpa_resources {
>>>>>>>>>>         u32 pdn;
>>>>>>>>>>         struct mlx5_uars_page *uar;
>>>>>>>>>>         void __iomem *kick_addr;
>>>>>>>>>> +      u64 phys_kick_addr;
>>>>>>>>>>         u16 uid;
>>>>>>>>>>         u32 null_mkey;
>>>>>>>>>>         bool valid;
>>>>>>>>>> diff --git a/drivers/vdpa/mlx5/core/resources.c b/drivers/vdpa/mlx5/core/resources.c
>>>>>>>>>> index 6521cbd0f5c2..665f8fc1710f 100644
>>>>>>>>>> --- a/drivers/vdpa/mlx5/core/resources.c
>>>>>>>>>> +++ b/drivers/vdpa/mlx5/core/resources.c
>>>>>>>>>> @@ -247,6 +247,7 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
>>>>>>>>>>                 goto err_key;
>>>>>>>>>>         kick_addr = mdev->bar_addr + offset;
>>>>>>>>>> +      res->phys_kick_addr = kick_addr;
>>>>>>>>>>         res->kick_addr = ioremap(kick_addr, PAGE_SIZE);
>>>>>>>>>>         if (!res->kick_addr) {
>>>>>>>>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>>> index 10c5fef3c020..680751074d2a 100644
>>>>>>>>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>>> @@ -1865,8 +1865,14 @@ static void mlx5_vdpa_free(struct vdpa_device *vdev)
>>>>>>>>>>       static struct vdpa_notification_area mlx5_get_vq_notification(struct vdpa_device *vdev, u16 idx)
>>>>>>>>>>       {
>>>>>>>>>> +      struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>>>>>>>>>>         struct vdpa_notification_area ret = {};
>>>>>>>>>> +      struct mlx5_vdpa_net *ndev;
>>>>>>>>>> +
>>>>>>>>>> +      ndev = to_mlx5_vdpa_ndev(mvdev);
>>>>>>>>>> +      ret.addr = (phys_addr_t)ndev->mvdev.res.phys_kick_addr;
>>>>>>>>>> +      ret.size = PAGE_SIZE;
>>>>>>>>> Note that the page will be mapped in to guest, so it's only safe if the
>>>>>>>>> doorbeel exclusively own the page. This means if there're other registers in
>>>>>>>>> the page, we can not let the doorbell bypass to work.
>>>>>>>>>
>>>>>>>>> So this is suspicious at least in the case of subfunction where we calculate
>>>>>>>>> the bar length in mlx5_sf_dev_table_create() as:
>>>>>>>>>
>>>>>>>>> table->sf_bar_length = 1 << (MLX5_CAP_GEN(dev, log_min_sf_size) + 12);
>>>>>>>>>
>>>>>>>>> It looks to me this can only work for the arch with PAGE_SIZE = 4096,
>>>>>>>>> otherwise we can map more into the userspace(guest).
>>>>>>>>>
>>>>>>>> Correct, so I guess I should return here 4096.
>>>>>> I'm not quite sure but since the calculation of the sf_bar_length is doen
>>>>>> via a shift of 12, it might be correct.
>>>>>>
>>>>>> And please double check if the doorbell own the page exclusively.
>>>>> I am checking if it is safe to map the any part of the SF's BAR to
>>>>> userspace without harming other functions. If this is true, I will check
>>>>> if I can return PAGE_SIZE without compromising security.
>>>> It's usally not safe and a layer violation if other registers are placed at
>>>> the same page.
>>>>
>>>>
>>>>>     I think we may
>>>>> need to extend struct vdpa_notification_area to contain another field
>>>>> offset which indicates the offset from addr where the actual doorbell
>>>>> resides.
>>>> The movitiaton of the current design is to be fit seamless into how Qemu
>>>> model doorbell layouts currently:
>>>>
>>>> 1) page-per-vq, each vq has its own page aligned doorbell
>>>> 2) 2 bytes doorbell, each vq has its own 2 byte aligend doorbell
>>>>
>>>> Only 1) is support in vhost-vDPA (and vhost-user) since it's rather simple
>>>> and secure (page aligned) to be modelled and implemented via mmap().
>>>>
>>>> Exporting a complex layout is possbile but requires careful design.
>>>>
>>>> Actually, we had antoher option
>>>>
>>>> 3) shared doorbell: all virtqueue shares a single page aligned doorbell
>>>>
>>> This nearly matches we have in ConnectX devices. All the doorbells are
>>> located at the same place. For 4K page size atchitectures it is aligned
>>> to the start of the page. For larger page sizes it is not aligned.
>>> If we don't allow to some offset within the page, it means that direct
>>> doorbells will not work for 64K page size archs over ConnectX.
>>
>> Right, just to clarify. This can still be model by the current
>> page-per-vq model. It means the doorbell will be mapped into different
>> pages for each virtqueue by Qemu. So from the view of Qemu or guest,
>> each virtqueue has its own doorbell in this case.
> So this is the proposed model for mlx5 vdpa with doorbell per-instance
> (rather than per-vq), assuming the exclusive ownership of mapped
> doorbell page?


So let me try to explain here:

Let's assume the mlx5 vDPA has a exclusive page which is used for 
doorbell for all the virtqueues.

This fits for the page-per-vq model of Qemu:

1) The virtio-pci devices emulated by Qemu provides a per virtqueue 
doorbell (one page per vq)
2) Qemu use mmap() for map the doorbell separately
3) The vhost-vDPA prepare the page, and in this case, it always provide 
the same page for userspace/Qemu
4) So vhost-vDPA actually map the single physical page into multiple 
virtual pages and use those virtual pages for per-vq-doorbell with the 
help of KVM
5) The device can still distinguish the virtqueue index since it is 
required as the value wrote to the doorbell

So it should work like a charm. From qemu point of view, it's 
page-per-vq, but at the physical device level, it's the shared doorbell.


>
>>
>>>> This is not yet supported by Qemu.
>>
>> For "not supported" I meant present this (doorbells sharing) layout to
>> guest.
> So it means this new layout perhaps will have to introduce new virtio
> features to guest thus not compatible with eixsting driver?


Consider page-per-vq model works, I don't think we need to introduce a 
userspace noticeable shared doorbell model. The details were hidden 
perfectly with the help of vhost-vDPA and it provides extra flexibility. 
The mapping is done per vq via:

mmap(NULL, vq_index << PAGE_SIZE);

Userspace can decide to:

1) only map part of the virtqueue, this is very important for the case 
when Qemu want to intercept the virtqueue transactions when we want to 
implement control vq or shadow vq
2) devices can selectively provide doorbell mapping based on the vq index

Exporting other layout is possible but it will require a careful design 
of the uAPI and will complicate the management stuff (migration 
compatibility especially).

Thanks


>
> -Siwei
>> Thanks
>>
>>
>>>> Thanks
>>>>
>>>>
>>>>>>>> I also think that the check in vhost_vdpa_mmap() should verify that the
>>>>>>>> returned size is not smaller than PAGE_SIZE because the returned address
>>>>>>> Actually I think it's ok since you verify the size equals vma->vm_end -
>>>>>>> vma->vm_start which must be at least PAGE_SIZE.
>>>>>> Yes.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>>> might just be aligned to PAGE_SIZE. I think this should be enoght but
>>>>>>>> maybe also use the same logic in vhost_vdpa_fault().


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] vdpa/mlx5: Add support for doorbell bypassing
  2021-04-22  8:59           ` Jason Wang
  2021-04-25 13:25             ` Eli Cohen
@ 2021-04-29 10:00             ` Eli Cohen
  2021-04-30  4:40               ` Jason Wang
  1 sibling, 1 reply; 19+ messages in thread
From: Eli Cohen @ 2021-04-29 10:00 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, virtualization, linux-kernel, Zhu, Lingshan

On Thu, Apr 22, 2021 at 04:59:11PM +0800, Jason Wang wrote:
> 
> 在 2021/4/22 下午4:39, Eli Cohen 写道:
> > On Thu, Apr 22, 2021 at 04:21:45PM +0800, Jason Wang wrote:
> > > 在 2021/4/22 下午4:07, Eli Cohen 写道:
> > > > On Thu, Apr 22, 2021 at 09:03:58AM +0300, Eli Cohen wrote:
> > > > > On Thu, Apr 22, 2021 at 10:37:38AM +0800, Jason Wang wrote:
> > > > > > 在 2021/4/21 下午6:41, Eli Cohen 写道:
> > > > > > > Implement mlx5_get_vq_notification() to return the doorbell address.
> > > > > > > Size is set to one system page as required.
> > > > > > > 
> > > > > > > Signed-off-by: Eli Cohen <elic@nvidia.com>
> > > > > > > ---
> > > > > > >     drivers/vdpa/mlx5/core/mlx5_vdpa.h | 1 +
> > > > > > >     drivers/vdpa/mlx5/core/resources.c | 1 +
> > > > > > >     drivers/vdpa/mlx5/net/mlx5_vnet.c  | 6 ++++++
> > > > > > >     3 files changed, 8 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> > > > > > > index b6cc53ba980c..49de62cda598 100644
> > > > > > > --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> > > > > > > +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> > > > > > > @@ -41,6 +41,7 @@ struct mlx5_vdpa_resources {
> > > > > > >     	u32 pdn;
> > > > > > >     	struct mlx5_uars_page *uar;
> > > > > > >     	void __iomem *kick_addr;
> > > > > > > +	u64 phys_kick_addr;
> > > > > > >     	u16 uid;
> > > > > > >     	u32 null_mkey;
> > > > > > >     	bool valid;
> > > > > > > diff --git a/drivers/vdpa/mlx5/core/resources.c b/drivers/vdpa/mlx5/core/resources.c
> > > > > > > index 6521cbd0f5c2..665f8fc1710f 100644
> > > > > > > --- a/drivers/vdpa/mlx5/core/resources.c
> > > > > > > +++ b/drivers/vdpa/mlx5/core/resources.c
> > > > > > > @@ -247,6 +247,7 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
> > > > > > >     		goto err_key;
> > > > > > >     	kick_addr = mdev->bar_addr + offset;
> > > > > > > +	res->phys_kick_addr = kick_addr;
> > > > > > >     	res->kick_addr = ioremap(kick_addr, PAGE_SIZE);
> > > > > > >     	if (!res->kick_addr) {
> > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > index 10c5fef3c020..680751074d2a 100644
> > > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > @@ -1865,8 +1865,14 @@ static void mlx5_vdpa_free(struct vdpa_device *vdev)
> > > > > > >     static struct vdpa_notification_area mlx5_get_vq_notification(struct vdpa_device *vdev, u16 idx)
> > > > > > >     {
> > > > > > > +	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> > > > > > >     	struct vdpa_notification_area ret = {};
> > > > > > > +	struct mlx5_vdpa_net *ndev;
> > > > > > > +
> > > > > > > +	ndev = to_mlx5_vdpa_ndev(mvdev);
> > > > > > > +	ret.addr = (phys_addr_t)ndev->mvdev.res.phys_kick_addr;
> > > > > > > +	ret.size = PAGE_SIZE;
> > > > > > Note that the page will be mapped in to guest, so it's only safe if the
> > > > > > doorbeel exclusively own the page. This means if there're other registers in
> > > > > > the page, we can not let the doorbell bypass to work.
> > > > > > 
> > > > > > So this is suspicious at least in the case of subfunction where we calculate
> > > > > > the bar length in mlx5_sf_dev_table_create() as:
> > > > > > 
> > > > > > table->sf_bar_length = 1 << (MLX5_CAP_GEN(dev, log_min_sf_size) + 12);
> > > > > > 
> > > > > > It looks to me this can only work for the arch with PAGE_SIZE = 4096,
> > > > > > otherwise we can map more into the userspace(guest).
> > > > > > 
> > > > > Correct, so I guess I should return here 4096.
> > > 
> > > I'm not quite sure but since the calculation of the sf_bar_length is doen
> > > via a shift of 12, it might be correct.
> > > 
> > > And please double check if the doorbell own the page exclusively.
> > I am checking if it is safe to map the any part of the SF's BAR to
> > userspace without harming other functions. If this is true, I will check
> > if I can return PAGE_SIZE without compromising security.
> 
> 
> It's usally not safe and a layer violation if other registers are placed at
> the same page.
> 
> 
> >   I think we may
> > need to extend struct vdpa_notification_area to contain another field
> > offset which indicates the offset from addr where the actual doorbell
> > resides.
> 
> 
> The movitiaton of the current design is to be fit seamless into how Qemu
> model doorbell layouts currently:
> 
> 1) page-per-vq, each vq has its own page aligned doorbell
> 2) 2 bytes doorbell, each vq has its own 2 byte aligend doorbell
> 
> Only 1) is support in vhost-vDPA (and vhost-user) since it's rather simple
> and secure (page aligned) to be modelled and implemented via mmap().
> 
> Exporting a complex layout is possbile but requires careful design.
> 
> Actually, we had antoher option
> 
> 3) shared doorbell: all virtqueue shares a single page aligned doorbell

I am not sure how this could solve the problem of 64KB archs.
The point is that in ConnectX devices, the virtio queue objects doorbell
is aligned to 4K. For larger system page sizes, the doorbell may not be
aligned to a system page.
So it seems not too complex to introduce offset within the page.

BTW, for now, I am going to send another patch that makes sure page
boundaries are not vilated. It requires some support from mlx5_core
which is currently being reviewed internally.

> 
> This is not yet supported by Qemu.
> 
> Thanks
> 
> 
> > > 
> > > > > I also think that the check in vhost_vdpa_mmap() should verify that the
> > > > > returned size is not smaller than PAGE_SIZE because the returned address
> > > > Actually I think it's ok since you verify the size equals vma->vm_end -
> > > > vma->vm_start which must be at least PAGE_SIZE.
> > > 
> > > Yes.
> > > 
> > > Thanks
> > > 
> > > 
> > > > > might just be aligned to PAGE_SIZE. I think this should be enoght but
> > > > > maybe also use the same logic in vhost_vdpa_fault().
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] vdpa/mlx5: Add support for doorbell bypassing
  2021-04-29 10:00             ` Eli Cohen
@ 2021-04-30  4:40               ` Jason Wang
  2021-04-30  6:31                 ` Zhu, Lingshan
  0 siblings, 1 reply; 19+ messages in thread
From: Jason Wang @ 2021-04-30  4:40 UTC (permalink / raw)
  To: Eli Cohen; +Cc: mst, virtualization, linux-kernel, Zhu, Lingshan


在 2021/4/29 下午6:00, Eli Cohen 写道:
> On Thu, Apr 22, 2021 at 04:59:11PM +0800, Jason Wang wrote:
>> 在 2021/4/22 下午4:39, Eli Cohen 写道:
>>> On Thu, Apr 22, 2021 at 04:21:45PM +0800, Jason Wang wrote:
>>>> 在 2021/4/22 下午4:07, Eli Cohen 写道:
>>>>> On Thu, Apr 22, 2021 at 09:03:58AM +0300, Eli Cohen wrote:
>>>>>> On Thu, Apr 22, 2021 at 10:37:38AM +0800, Jason Wang wrote:
>>>>>>> 在 2021/4/21 下午6:41, Eli Cohen 写道:
>>>>>>>> Implement mlx5_get_vq_notification() to return the doorbell address.
>>>>>>>> Size is set to one system page as required.
>>>>>>>>
>>>>>>>> Signed-off-by: Eli Cohen <elic@nvidia.com>
>>>>>>>> ---
>>>>>>>>      drivers/vdpa/mlx5/core/mlx5_vdpa.h | 1 +
>>>>>>>>      drivers/vdpa/mlx5/core/resources.c | 1 +
>>>>>>>>      drivers/vdpa/mlx5/net/mlx5_vnet.c  | 6 ++++++
>>>>>>>>      3 files changed, 8 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>>>>>>> index b6cc53ba980c..49de62cda598 100644
>>>>>>>> --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>>>>>>> +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>>>>>>> @@ -41,6 +41,7 @@ struct mlx5_vdpa_resources {
>>>>>>>>      	u32 pdn;
>>>>>>>>      	struct mlx5_uars_page *uar;
>>>>>>>>      	void __iomem *kick_addr;
>>>>>>>> +	u64 phys_kick_addr;
>>>>>>>>      	u16 uid;
>>>>>>>>      	u32 null_mkey;
>>>>>>>>      	bool valid;
>>>>>>>> diff --git a/drivers/vdpa/mlx5/core/resources.c b/drivers/vdpa/mlx5/core/resources.c
>>>>>>>> index 6521cbd0f5c2..665f8fc1710f 100644
>>>>>>>> --- a/drivers/vdpa/mlx5/core/resources.c
>>>>>>>> +++ b/drivers/vdpa/mlx5/core/resources.c
>>>>>>>> @@ -247,6 +247,7 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
>>>>>>>>      		goto err_key;
>>>>>>>>      	kick_addr = mdev->bar_addr + offset;
>>>>>>>> +	res->phys_kick_addr = kick_addr;
>>>>>>>>      	res->kick_addr = ioremap(kick_addr, PAGE_SIZE);
>>>>>>>>      	if (!res->kick_addr) {
>>>>>>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>> index 10c5fef3c020..680751074d2a 100644
>>>>>>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>> @@ -1865,8 +1865,14 @@ static void mlx5_vdpa_free(struct vdpa_device *vdev)
>>>>>>>>      static struct vdpa_notification_area mlx5_get_vq_notification(struct vdpa_device *vdev, u16 idx)
>>>>>>>>      {
>>>>>>>> +	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>>>>>>>>      	struct vdpa_notification_area ret = {};
>>>>>>>> +	struct mlx5_vdpa_net *ndev;
>>>>>>>> +
>>>>>>>> +	ndev = to_mlx5_vdpa_ndev(mvdev);
>>>>>>>> +	ret.addr = (phys_addr_t)ndev->mvdev.res.phys_kick_addr;
>>>>>>>> +	ret.size = PAGE_SIZE;
>>>>>>> Note that the page will be mapped in to guest, so it's only safe if the
>>>>>>> doorbeel exclusively own the page. This means if there're other registers in
>>>>>>> the page, we can not let the doorbell bypass to work.
>>>>>>>
>>>>>>> So this is suspicious at least in the case of subfunction where we calculate
>>>>>>> the bar length in mlx5_sf_dev_table_create() as:
>>>>>>>
>>>>>>> table->sf_bar_length = 1 << (MLX5_CAP_GEN(dev, log_min_sf_size) + 12);
>>>>>>>
>>>>>>> It looks to me this can only work for the arch with PAGE_SIZE = 4096,
>>>>>>> otherwise we can map more into the userspace(guest).
>>>>>>>
>>>>>> Correct, so I guess I should return here 4096.
>>>> I'm not quite sure but since the calculation of the sf_bar_length is doen
>>>> via a shift of 12, it might be correct.
>>>>
>>>> And please double check if the doorbell own the page exclusively.
>>> I am checking if it is safe to map the any part of the SF's BAR to
>>> userspace without harming other functions. If this is true, I will check
>>> if I can return PAGE_SIZE without compromising security.
>>
>> It's usally not safe and a layer violation if other registers are placed at
>> the same page.
>>
>>
>>>    I think we may
>>> need to extend struct vdpa_notification_area to contain another field
>>> offset which indicates the offset from addr where the actual doorbell
>>> resides.
>>
>> The movitiaton of the current design is to be fit seamless into how Qemu
>> model doorbell layouts currently:
>>
>> 1) page-per-vq, each vq has its own page aligned doorbell
>> 2) 2 bytes doorbell, each vq has its own 2 byte aligend doorbell
>>
>> Only 1) is support in vhost-vDPA (and vhost-user) since it's rather simple
>> and secure (page aligned) to be modelled and implemented via mmap().
>>
>> Exporting a complex layout is possbile but requires careful design.
>>
>> Actually, we had antoher option
>>
>> 3) shared doorbell: all virtqueue shares a single page aligned doorbell
> I am not sure how this could solve the problem of 64KB archs.
> The point is that in ConnectX devices, the virtio queue objects doorbell
> is aligned to 4K. For larger system page sizes, the doorbell may not be
> aligned to a system page.
> So it seems not too complex to introduce offset within the page.


Three major issues:

1) single mmap() works at page level, it means we need map 64K to guest 
and we can only do this safely if no other registers are placed into the 
same page
2) new uAPI to let the userspace know the offset
3) how to model them with the virtio-pci in Qemu, and this may introduce 
burdens for management (need some changes in the qemu command line) to 
deal with the migration compatibility

So consider the complexity, we can just stick to the current code. That 
means mmap() will fail and qemu will keep using the eventfd based kick.


>
> BTW, for now, I am going to send another patch that makes sure page
> boundaries are not vilated. It requires some support from mlx5_core
> which is currently being reviewed internally.


Sure.

Thanks


>
>> This is not yet supported by Qemu.
>>
>> Thanks
>>
>>
>>>>>> I also think that the check in vhost_vdpa_mmap() should verify that the
>>>>>> returned size is not smaller than PAGE_SIZE because the returned address
>>>>> Actually I think it's ok since you verify the size equals vma->vm_end -
>>>>> vma->vm_start which must be at least PAGE_SIZE.
>>>> Yes.
>>>>
>>>> Thanks
>>>>
>>>>
>>>>>> might just be aligned to PAGE_SIZE. I think this should be enoght but
>>>>>> maybe also use the same logic in vhost_vdpa_fault().


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] vdpa/mlx5: Add support for doorbell bypassing
  2021-04-30  4:40               ` Jason Wang
@ 2021-04-30  6:31                 ` Zhu, Lingshan
  2021-04-30  7:03                   ` Jason Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Zhu, Lingshan @ 2021-04-30  6:31 UTC (permalink / raw)
  To: Jason Wang, Eli Cohen; +Cc: mst, virtualization, linux-kernel



On 4/30/2021 12:40 PM, Jason Wang wrote:
>
> 在 2021/4/29 下午6:00, Eli Cohen 写道:
>> On Thu, Apr 22, 2021 at 04:59:11PM +0800, Jason Wang wrote:
>>> 在 2021/4/22 下午4:39, Eli Cohen 写道:
>>>> On Thu, Apr 22, 2021 at 04:21:45PM +0800, Jason Wang wrote:
>>>>> 在 2021/4/22 下午4:07, Eli Cohen 写道:
>>>>>> On Thu, Apr 22, 2021 at 09:03:58AM +0300, Eli Cohen wrote:
>>>>>>> On Thu, Apr 22, 2021 at 10:37:38AM +0800, Jason Wang wrote:
>>>>>>>> 在 2021/4/21 下午6:41, Eli Cohen 写道:
>>>>>>>>> Implement mlx5_get_vq_notification() to return the doorbell 
>>>>>>>>> address.
>>>>>>>>> Size is set to one system page as required.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Eli Cohen <elic@nvidia.com>
>>>>>>>>> ---
>>>>>>>>>      drivers/vdpa/mlx5/core/mlx5_vdpa.h | 1 +
>>>>>>>>>      drivers/vdpa/mlx5/core/resources.c | 1 +
>>>>>>>>>      drivers/vdpa/mlx5/net/mlx5_vnet.c  | 6 ++++++
>>>>>>>>>      3 files changed, 8 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h 
>>>>>>>>> b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>>>>>>>> index b6cc53ba980c..49de62cda598 100644
>>>>>>>>> --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>>>>>>>> +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>>>>>>>> @@ -41,6 +41,7 @@ struct mlx5_vdpa_resources {
>>>>>>>>>          u32 pdn;
>>>>>>>>>          struct mlx5_uars_page *uar;
>>>>>>>>>          void __iomem *kick_addr;
>>>>>>>>> +    u64 phys_kick_addr;
>>>>>>>>>          u16 uid;
>>>>>>>>>          u32 null_mkey;
>>>>>>>>>          bool valid;
>>>>>>>>> diff --git a/drivers/vdpa/mlx5/core/resources.c 
>>>>>>>>> b/drivers/vdpa/mlx5/core/resources.c
>>>>>>>>> index 6521cbd0f5c2..665f8fc1710f 100644
>>>>>>>>> --- a/drivers/vdpa/mlx5/core/resources.c
>>>>>>>>> +++ b/drivers/vdpa/mlx5/core/resources.c
>>>>>>>>> @@ -247,6 +247,7 @@ int mlx5_vdpa_alloc_resources(struct 
>>>>>>>>> mlx5_vdpa_dev *mvdev)
>>>>>>>>>              goto err_key;
>>>>>>>>>          kick_addr = mdev->bar_addr + offset;
>>>>>>>>> +    res->phys_kick_addr = kick_addr;
>>>>>>>>>          res->kick_addr = ioremap(kick_addr, PAGE_SIZE);
>>>>>>>>>          if (!res->kick_addr) {
>>>>>>>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
>>>>>>>>> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>> index 10c5fef3c020..680751074d2a 100644
>>>>>>>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>> @@ -1865,8 +1865,14 @@ static void mlx5_vdpa_free(struct 
>>>>>>>>> vdpa_device *vdev)
>>>>>>>>>      static struct vdpa_notification_area 
>>>>>>>>> mlx5_get_vq_notification(struct vdpa_device *vdev, u16 idx)
>>>>>>>>>      {
>>>>>>>>> +    struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>>>>>>>>>          struct vdpa_notification_area ret = {};
>>>>>>>>> +    struct mlx5_vdpa_net *ndev;
>>>>>>>>> +
>>>>>>>>> +    ndev = to_mlx5_vdpa_ndev(mvdev);
>>>>>>>>> +    ret.addr = (phys_addr_t)ndev->mvdev.res.phys_kick_addr;
>>>>>>>>> +    ret.size = PAGE_SIZE;
>>>>>>>> Note that the page will be mapped in to guest, so it's only 
>>>>>>>> safe if the
>>>>>>>> doorbeel exclusively own the page. This means if there're other 
>>>>>>>> registers in
>>>>>>>> the page, we can not let the doorbell bypass to work.
>>>>>>>>
>>>>>>>> So this is suspicious at least in the case of subfunction where 
>>>>>>>> we calculate
>>>>>>>> the bar length in mlx5_sf_dev_table_create() as:
>>>>>>>>
>>>>>>>> table->sf_bar_length = 1 << (MLX5_CAP_GEN(dev, log_min_sf_size) 
>>>>>>>> + 12);
>>>>>>>>
>>>>>>>> It looks to me this can only work for the arch with PAGE_SIZE = 
>>>>>>>> 4096,
>>>>>>>> otherwise we can map more into the userspace(guest).
>>>>>>>>
>>>>>>> Correct, so I guess I should return here 4096.
>>>>> I'm not quite sure but since the calculation of the sf_bar_length 
>>>>> is doen
>>>>> via a shift of 12, it might be correct.
>>>>>
>>>>> And please double check if the doorbell own the page exclusively.
>>>> I am checking if it is safe to map the any part of the SF's BAR to
>>>> userspace without harming other functions. If this is true, I will 
>>>> check
>>>> if I can return PAGE_SIZE without compromising security.
>>>
>>> It's usally not safe and a layer violation if other registers are 
>>> placed at
>>> the same page.
>>>
>>>
>>>>    I think we may
>>>> need to extend struct vdpa_notification_area to contain another field
>>>> offset which indicates the offset from addr where the actual doorbell
>>>> resides.
>>>
>>> The movitiaton of the current design is to be fit seamless into how 
>>> Qemu
>>> model doorbell layouts currently:
>>>
>>> 1) page-per-vq, each vq has its own page aligned doorbell
>>> 2) 2 bytes doorbell, each vq has its own 2 byte aligend doorbell
>>>
>>> Only 1) is support in vhost-vDPA (and vhost-user) since it's rather 
>>> simple
>>> and secure (page aligned) to be modelled and implemented via mmap().
>>>
>>> Exporting a complex layout is possbile but requires careful design.
>>>
>>> Actually, we had antoher option
>>>
>>> 3) shared doorbell: all virtqueue shares a single page aligned doorbell
>> I am not sure how this could solve the problem of 64KB archs.
>> The point is that in ConnectX devices, the virtio queue objects doorbell
>> is aligned to 4K. For larger system page sizes, the doorbell may not be
>> aligned to a system page.
>> So it seems not too complex to introduce offset within the page.
>
>
> Three major issues:
>
> 1) single mmap() works at page level, it means we need map 64K to 
> guest and we can only do this safely if no other registers are placed 
> into the same page
> 2) new uAPI to let the userspace know the offset
> 3) how to model them with the virtio-pci in Qemu, and this may 
> introduce burdens for management (need some changes in the qemu 
> command line) to deal with the migration compatibility
>
> So consider the complexity, we can just stick to the current code. 
> That means mmap() will fail and qemu will keep using the eventfd based 
> kick.
There is another case, mmap() works at page level, page size is at least 
4K. Consider if a device has a bar containing the shared doorbell page 
at its last 4K space. In this bar layout, map a arch.page_size=64K page 
to usersapce would lead to fatal errors.
I think we can assign the actual size of the doorbell area size to 
vdpa_notification.size than arch.page_size to avoid such issues. Then 
upper layers like vhost_vdpa should check whether this size can work 
with the machine arch and its alignment, if not, should fail over to use 
eventfd.
Then do we still need a uAPI tell the offset within the page?

Thanks
Zhu Lingshan
>
>
>
>>
>> BTW, for now, I am going to send another patch that makes sure page
>> boundaries are not vilated. It requires some support from mlx5_core
>> which is currently being reviewed internally.
>
>
> Sure.
>
> Thanks
>
>
>>
>>> This is not yet supported by Qemu.
>>>
>>> Thanks
>>>
>>>
>>>>>>> I also think that the check in vhost_vdpa_mmap() should verify 
>>>>>>> that the
>>>>>>> returned size is not smaller than PAGE_SIZE because the returned 
>>>>>>> address
>>>>>> Actually I think it's ok since you verify the size equals 
>>>>>> vma->vm_end -
>>>>>> vma->vm_start which must be at least PAGE_SIZE.
>>>>> Yes.
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>>> might just be aligned to PAGE_SIZE. I think this should be 
>>>>>>> enoght but
>>>>>>> maybe also use the same logic in vhost_vdpa_fault().
>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] vdpa/mlx5: Add support for doorbell bypassing
  2021-04-30  6:31                 ` Zhu, Lingshan
@ 2021-04-30  7:03                   ` Jason Wang
  2021-04-30  9:25                     ` Zhu, Lingshan
  0 siblings, 1 reply; 19+ messages in thread
From: Jason Wang @ 2021-04-30  7:03 UTC (permalink / raw)
  To: Zhu, Lingshan, Eli Cohen; +Cc: mst, virtualization, linux-kernel


在 2021/4/30 下午2:31, Zhu, Lingshan 写道:
>
>
> On 4/30/2021 12:40 PM, Jason Wang wrote:
>>
>> 在 2021/4/29 下午6:00, Eli Cohen 写道:
>>> On Thu, Apr 22, 2021 at 04:59:11PM +0800, Jason Wang wrote:
>>>> 在 2021/4/22 下午4:39, Eli Cohen 写道:
>>>>> On Thu, Apr 22, 2021 at 04:21:45PM +0800, Jason Wang wrote:
>>>>>> 在 2021/4/22 下午4:07, Eli Cohen 写道:
>>>>>>> On Thu, Apr 22, 2021 at 09:03:58AM +0300, Eli Cohen wrote:
>>>>>>>> On Thu, Apr 22, 2021 at 10:37:38AM +0800, Jason Wang wrote:
>>>>>>>>> 在 2021/4/21 下午6:41, Eli Cohen 写道:
>>>>>>>>>> Implement mlx5_get_vq_notification() to return the doorbell 
>>>>>>>>>> address.
>>>>>>>>>> Size is set to one system page as required.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Eli Cohen <elic@nvidia.com>
>>>>>>>>>> ---
>>>>>>>>>>      drivers/vdpa/mlx5/core/mlx5_vdpa.h | 1 +
>>>>>>>>>>      drivers/vdpa/mlx5/core/resources.c | 1 +
>>>>>>>>>>      drivers/vdpa/mlx5/net/mlx5_vnet.c  | 6 ++++++
>>>>>>>>>>      3 files changed, 8 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h 
>>>>>>>>>> b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>>>>>>>>> index b6cc53ba980c..49de62cda598 100644
>>>>>>>>>> --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>>>>>>>>> +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>>>>>>>>> @@ -41,6 +41,7 @@ struct mlx5_vdpa_resources {
>>>>>>>>>>          u32 pdn;
>>>>>>>>>>          struct mlx5_uars_page *uar;
>>>>>>>>>>          void __iomem *kick_addr;
>>>>>>>>>> +    u64 phys_kick_addr;
>>>>>>>>>>          u16 uid;
>>>>>>>>>>          u32 null_mkey;
>>>>>>>>>>          bool valid;
>>>>>>>>>> diff --git a/drivers/vdpa/mlx5/core/resources.c 
>>>>>>>>>> b/drivers/vdpa/mlx5/core/resources.c
>>>>>>>>>> index 6521cbd0f5c2..665f8fc1710f 100644
>>>>>>>>>> --- a/drivers/vdpa/mlx5/core/resources.c
>>>>>>>>>> +++ b/drivers/vdpa/mlx5/core/resources.c
>>>>>>>>>> @@ -247,6 +247,7 @@ int mlx5_vdpa_alloc_resources(struct 
>>>>>>>>>> mlx5_vdpa_dev *mvdev)
>>>>>>>>>>              goto err_key;
>>>>>>>>>>          kick_addr = mdev->bar_addr + offset;
>>>>>>>>>> +    res->phys_kick_addr = kick_addr;
>>>>>>>>>>          res->kick_addr = ioremap(kick_addr, PAGE_SIZE);
>>>>>>>>>>          if (!res->kick_addr) {
>>>>>>>>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
>>>>>>>>>> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>>> index 10c5fef3c020..680751074d2a 100644
>>>>>>>>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>>> @@ -1865,8 +1865,14 @@ static void mlx5_vdpa_free(struct 
>>>>>>>>>> vdpa_device *vdev)
>>>>>>>>>>      static struct vdpa_notification_area 
>>>>>>>>>> mlx5_get_vq_notification(struct vdpa_device *vdev, u16 idx)
>>>>>>>>>>      {
>>>>>>>>>> +    struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>>>>>>>>>>          struct vdpa_notification_area ret = {};
>>>>>>>>>> +    struct mlx5_vdpa_net *ndev;
>>>>>>>>>> +
>>>>>>>>>> +    ndev = to_mlx5_vdpa_ndev(mvdev);
>>>>>>>>>> +    ret.addr = (phys_addr_t)ndev->mvdev.res.phys_kick_addr;
>>>>>>>>>> +    ret.size = PAGE_SIZE;
>>>>>>>>> Note that the page will be mapped in to guest, so it's only 
>>>>>>>>> safe if the
>>>>>>>>> doorbeel exclusively own the page. This means if there're 
>>>>>>>>> other registers in
>>>>>>>>> the page, we can not let the doorbell bypass to work.
>>>>>>>>>
>>>>>>>>> So this is suspicious at least in the case of subfunction 
>>>>>>>>> where we calculate
>>>>>>>>> the bar length in mlx5_sf_dev_table_create() as:
>>>>>>>>>
>>>>>>>>> table->sf_bar_length = 1 << (MLX5_CAP_GEN(dev, 
>>>>>>>>> log_min_sf_size) + 12);
>>>>>>>>>
>>>>>>>>> It looks to me this can only work for the arch with PAGE_SIZE 
>>>>>>>>> = 4096,
>>>>>>>>> otherwise we can map more into the userspace(guest).
>>>>>>>>>
>>>>>>>> Correct, so I guess I should return here 4096.
>>>>>> I'm not quite sure but since the calculation of the sf_bar_length 
>>>>>> is doen
>>>>>> via a shift of 12, it might be correct.
>>>>>>
>>>>>> And please double check if the doorbell own the page exclusively.
>>>>> I am checking if it is safe to map the any part of the SF's BAR to
>>>>> userspace without harming other functions. If this is true, I will 
>>>>> check
>>>>> if I can return PAGE_SIZE without compromising security.
>>>>
>>>> It's usally not safe and a layer violation if other registers are 
>>>> placed at
>>>> the same page.
>>>>
>>>>
>>>>>    I think we may
>>>>> need to extend struct vdpa_notification_area to contain another field
>>>>> offset which indicates the offset from addr where the actual doorbell
>>>>> resides.
>>>>
>>>> The movitiaton of the current design is to be fit seamless into how 
>>>> Qemu
>>>> model doorbell layouts currently:
>>>>
>>>> 1) page-per-vq, each vq has its own page aligned doorbell
>>>> 2) 2 bytes doorbell, each vq has its own 2 byte aligend doorbell
>>>>
>>>> Only 1) is support in vhost-vDPA (and vhost-user) since it's rather 
>>>> simple
>>>> and secure (page aligned) to be modelled and implemented via mmap().
>>>>
>>>> Exporting a complex layout is possbile but requires careful design.
>>>>
>>>> Actually, we had antoher option
>>>>
>>>> 3) shared doorbell: all virtqueue shares a single page aligned 
>>>> doorbell
>>> I am not sure how this could solve the problem of 64KB archs.
>>> The point is that in ConnectX devices, the virtio queue objects 
>>> doorbell
>>> is aligned to 4K. For larger system page sizes, the doorbell may not be
>>> aligned to a system page.
>>> So it seems not too complex to introduce offset within the page.
>>
>>
>> Three major issues:
>>
>> 1) single mmap() works at page level, it means we need map 64K to 
>> guest and we can only do this safely if no other registers are placed 
>> into the same page
>> 2) new uAPI to let the userspace know the offset
>> 3) how to model them with the virtio-pci in Qemu, and this may 
>> introduce burdens for management (need some changes in the qemu 
>> command line) to deal with the migration compatibility
>>
>> So consider the complexity, we can just stick to the current code. 
>> That means mmap() will fail and qemu will keep using the eventfd 
>> based kick.
> There is another case, mmap() works at page level, page size is at 
> least 4K. Consider if a device has a bar containing the shared 
> doorbell page at its last 4K space. In this bar layout, map a 
> arch.page_size=64K page to usersapce would lead to fatal errors.


Why it's a fatal error? Userspace should survive from mmap() errors and 
keep using the kickfd.


> I think we can assign the actual size of the doorbell area size to 
> vdpa_notification.size than arch.page_size to avoid such issues. Then 
> upper layers like vhost_vdpa should check whether this size can work 
> with the machine arch and its alignment, if not, should fail over to 
> use eventfd.


Isn't this how get_vet_notification() designed and implemented right 
now? What parent need is just to report the doorbell size, it's the bus 
driver (vhost-vDPA) to decide if and how it is used.

Thanks


> Then do we still need a uAPI tell the offset within the page?
>
> Thanks
> Zhu Lingshan
>>
>>
>>
>>>
>>> BTW, for now, I am going to send another patch that makes sure page
>>> boundaries are not vilated. It requires some support from mlx5_core
>>> which is currently being reviewed internally.
>>
>>
>> Sure.
>>
>> Thanks
>>
>>
>>>
>>>> This is not yet supported by Qemu.
>>>>
>>>> Thanks
>>>>
>>>>
>>>>>>>> I also think that the check in vhost_vdpa_mmap() should verify 
>>>>>>>> that the
>>>>>>>> returned size is not smaller than PAGE_SIZE because the 
>>>>>>>> returned address
>>>>>>> Actually I think it's ok since you verify the size equals 
>>>>>>> vma->vm_end -
>>>>>>> vma->vm_start which must be at least PAGE_SIZE.
>>>>>> Yes.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>>> might just be aligned to PAGE_SIZE. I think this should be 
>>>>>>>> enoght but
>>>>>>>> maybe also use the same logic in vhost_vdpa_fault().
>>
>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] vdpa/mlx5: Add support for doorbell bypassing
  2021-04-30  7:03                   ` Jason Wang
@ 2021-04-30  9:25                     ` Zhu, Lingshan
  2021-04-30 13:49                       ` Jason Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Zhu, Lingshan @ 2021-04-30  9:25 UTC (permalink / raw)
  To: Jason Wang, Eli Cohen; +Cc: mst, virtualization, linux-kernel



On 4/30/2021 3:03 PM, Jason Wang wrote:
>
> 在 2021/4/30 下午2:31, Zhu, Lingshan 写道:
>>
>>
>> On 4/30/2021 12:40 PM, Jason Wang wrote:
>>>
>>> 在 2021/4/29 下午6:00, Eli Cohen 写道:
>>>> On Thu, Apr 22, 2021 at 04:59:11PM +0800, Jason Wang wrote:
>>>>> 在 2021/4/22 下午4:39, Eli Cohen 写道:
>>>>>> On Thu, Apr 22, 2021 at 04:21:45PM +0800, Jason Wang wrote:
>>>>>>> 在 2021/4/22 下午4:07, Eli Cohen 写道:
>>>>>>>> On Thu, Apr 22, 2021 at 09:03:58AM +0300, Eli Cohen wrote:
>>>>>>>>> On Thu, Apr 22, 2021 at 10:37:38AM +0800, Jason Wang wrote:
>>>>>>>>>> 在 2021/4/21 下午6:41, Eli Cohen 写道:
>>>>>>>>>>> Implement mlx5_get_vq_notification() to return the doorbell 
>>>>>>>>>>> address.
>>>>>>>>>>> Size is set to one system page as required.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Eli Cohen <elic@nvidia.com>
>>>>>>>>>>> ---
>>>>>>>>>>>      drivers/vdpa/mlx5/core/mlx5_vdpa.h | 1 +
>>>>>>>>>>>      drivers/vdpa/mlx5/core/resources.c | 1 +
>>>>>>>>>>>      drivers/vdpa/mlx5/net/mlx5_vnet.c  | 6 ++++++
>>>>>>>>>>>      3 files changed, 8 insertions(+)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h 
>>>>>>>>>>> b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>>>>>>>>>> index b6cc53ba980c..49de62cda598 100644
>>>>>>>>>>> --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>>>>>>>>>> +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>>>>>>>>>> @@ -41,6 +41,7 @@ struct mlx5_vdpa_resources {
>>>>>>>>>>>          u32 pdn;
>>>>>>>>>>>          struct mlx5_uars_page *uar;
>>>>>>>>>>>          void __iomem *kick_addr;
>>>>>>>>>>> +    u64 phys_kick_addr;
>>>>>>>>>>>          u16 uid;
>>>>>>>>>>>          u32 null_mkey;
>>>>>>>>>>>          bool valid;
>>>>>>>>>>> diff --git a/drivers/vdpa/mlx5/core/resources.c 
>>>>>>>>>>> b/drivers/vdpa/mlx5/core/resources.c
>>>>>>>>>>> index 6521cbd0f5c2..665f8fc1710f 100644
>>>>>>>>>>> --- a/drivers/vdpa/mlx5/core/resources.c
>>>>>>>>>>> +++ b/drivers/vdpa/mlx5/core/resources.c
>>>>>>>>>>> @@ -247,6 +247,7 @@ int mlx5_vdpa_alloc_resources(struct 
>>>>>>>>>>> mlx5_vdpa_dev *mvdev)
>>>>>>>>>>>              goto err_key;
>>>>>>>>>>>          kick_addr = mdev->bar_addr + offset;
>>>>>>>>>>> +    res->phys_kick_addr = kick_addr;
>>>>>>>>>>>          res->kick_addr = ioremap(kick_addr, PAGE_SIZE);
>>>>>>>>>>>          if (!res->kick_addr) {
>>>>>>>>>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
>>>>>>>>>>> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>>>> index 10c5fef3c020..680751074d2a 100644
>>>>>>>>>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>>>> @@ -1865,8 +1865,14 @@ static void mlx5_vdpa_free(struct 
>>>>>>>>>>> vdpa_device *vdev)
>>>>>>>>>>>      static struct vdpa_notification_area 
>>>>>>>>>>> mlx5_get_vq_notification(struct vdpa_device *vdev, u16 idx)
>>>>>>>>>>>      {
>>>>>>>>>>> +    struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>>>>>>>>>>>          struct vdpa_notification_area ret = {};
>>>>>>>>>>> +    struct mlx5_vdpa_net *ndev;
>>>>>>>>>>> +
>>>>>>>>>>> +    ndev = to_mlx5_vdpa_ndev(mvdev);
>>>>>>>>>>> +    ret.addr = (phys_addr_t)ndev->mvdev.res.phys_kick_addr;
>>>>>>>>>>> +    ret.size = PAGE_SIZE;
>>>>>>>>>> Note that the page will be mapped in to guest, so it's only 
>>>>>>>>>> safe if the
>>>>>>>>>> doorbeel exclusively own the page. This means if there're 
>>>>>>>>>> other registers in
>>>>>>>>>> the page, we can not let the doorbell bypass to work.
>>>>>>>>>>
>>>>>>>>>> So this is suspicious at least in the case of subfunction 
>>>>>>>>>> where we calculate
>>>>>>>>>> the bar length in mlx5_sf_dev_table_create() as:
>>>>>>>>>>
>>>>>>>>>> table->sf_bar_length = 1 << (MLX5_CAP_GEN(dev, 
>>>>>>>>>> log_min_sf_size) + 12);
>>>>>>>>>>
>>>>>>>>>> It looks to me this can only work for the arch with PAGE_SIZE 
>>>>>>>>>> = 4096,
>>>>>>>>>> otherwise we can map more into the userspace(guest).
>>>>>>>>>>
>>>>>>>>> Correct, so I guess I should return here 4096.
>>>>>>> I'm not quite sure but since the calculation of the 
>>>>>>> sf_bar_length is doen
>>>>>>> via a shift of 12, it might be correct.
>>>>>>>
>>>>>>> And please double check if the doorbell own the page exclusively.
>>>>>> I am checking if it is safe to map the any part of the SF's BAR to
>>>>>> userspace without harming other functions. If this is true, I 
>>>>>> will check
>>>>>> if I can return PAGE_SIZE without compromising security.
>>>>>
>>>>> It's usally not safe and a layer violation if other registers are 
>>>>> placed at
>>>>> the same page.
>>>>>
>>>>>
>>>>>>    I think we may
>>>>>> need to extend struct vdpa_notification_area to contain another 
>>>>>> field
>>>>>> offset which indicates the offset from addr where the actual 
>>>>>> doorbell
>>>>>> resides.
>>>>>
>>>>> The movitiaton of the current design is to be fit seamless into 
>>>>> how Qemu
>>>>> model doorbell layouts currently:
>>>>>
>>>>> 1) page-per-vq, each vq has its own page aligned doorbell
>>>>> 2) 2 bytes doorbell, each vq has its own 2 byte aligend doorbell
>>>>>
>>>>> Only 1) is support in vhost-vDPA (and vhost-user) since it's 
>>>>> rather simple
>>>>> and secure (page aligned) to be modelled and implemented via mmap().
>>>>>
>>>>> Exporting a complex layout is possbile but requires careful design.
>>>>>
>>>>> Actually, we had antoher option
>>>>>
>>>>> 3) shared doorbell: all virtqueue shares a single page aligned 
>>>>> doorbell
>>>> I am not sure how this could solve the problem of 64KB archs.
>>>> The point is that in ConnectX devices, the virtio queue objects 
>>>> doorbell
>>>> is aligned to 4K. For larger system page sizes, the doorbell may 
>>>> not be
>>>> aligned to a system page.
>>>> So it seems not too complex to introduce offset within the page.
>>>
>>>
>>> Three major issues:
>>>
>>> 1) single mmap() works at page level, it means we need map 64K to 
>>> guest and we can only do this safely if no other registers are 
>>> placed into the same page
>>> 2) new uAPI to let the userspace know the offset
>>> 3) how to model them with the virtio-pci in Qemu, and this may 
>>> introduce burdens for management (need some changes in the qemu 
>>> command line) to deal with the migration compatibility
>>>
>>> So consider the complexity, we can just stick to the current code. 
>>> That means mmap() will fail and qemu will keep using the eventfd 
>>> based kick.
>> There is another case, mmap() works at page level, page size is at 
>> least 4K. Consider if a device has a bar containing the shared 
>> doorbell page at its last 4K space. In this bar layout, map a 
>> arch.page_size=64K page to usersapce would lead to fatal errors.
>
>
> Why it's a fatal error? Userspace should survive from mmap() errors 
> and keep using the kickfd.
I mean vhost-vdpa should not only check the alignment, also need to 
check whether the doorbell size no less than a arch.page_size. If the 
doorbell placed at the last 4K in bar, map 64k page could be an error.

Thanks
>
>
>> I think we can assign the actual size of the doorbell area size to 
>> vdpa_notification.size than arch.page_size to avoid such issues. Then 
>> upper layers like vhost_vdpa should check whether this size can work 
>> with the machine arch and its alignment, if not, should fail over to 
>> use eventfd.
>
>
> Isn't this how get_vet_notification() designed and implemented right 
> now? What parent need is just to report the doorbell size, it's the 
> bus driver (vhost-vDPA) to decide if and how it is used.
>
> Thanks
>
>
>> Then do we still need a uAPI tell the offset within the page?
>>
>> Thanks
>> Zhu Lingshan
>>>
>>>
>>>
>>>>
>>>> BTW, for now, I am going to send another patch that makes sure page
>>>> boundaries are not vilated. It requires some support from mlx5_core
>>>> which is currently being reviewed internally.
>>>
>>>
>>> Sure.
>>>
>>> Thanks
>>>
>>>
>>>>
>>>>> This is not yet supported by Qemu.
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>>>>> I also think that the check in vhost_vdpa_mmap() should verify 
>>>>>>>>> that the
>>>>>>>>> returned size is not smaller than PAGE_SIZE because the 
>>>>>>>>> returned address
>>>>>>>> Actually I think it's ok since you verify the size equals 
>>>>>>>> vma->vm_end -
>>>>>>>> vma->vm_start which must be at least PAGE_SIZE.
>>>>>>> Yes.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>
>>>>>>>>> might just be aligned to PAGE_SIZE. I think this should be 
>>>>>>>>> enoght but
>>>>>>>>> maybe also use the same logic in vhost_vdpa_fault().
>>>
>>
>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] vdpa/mlx5: Add support for doorbell bypassing
  2021-04-30  9:25                     ` Zhu, Lingshan
@ 2021-04-30 13:49                       ` Jason Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Jason Wang @ 2021-04-30 13:49 UTC (permalink / raw)
  To: Zhu, Lingshan, Eli Cohen; +Cc: mst, virtualization, linux-kernel


在 2021/4/30 下午5:25, Zhu, Lingshan 写道:
>
>
> On 4/30/2021 3:03 PM, Jason Wang wrote:
>>
>> 在 2021/4/30 下午2:31, Zhu, Lingshan 写道:
>>>
>>>
>>> On 4/30/2021 12:40 PM, Jason Wang wrote:
>>>>
>>>> 在 2021/4/29 下午6:00, Eli Cohen 写道:
>>>>> On Thu, Apr 22, 2021 at 04:59:11PM +0800, Jason Wang wrote:
>>>>>> 在 2021/4/22 下午4:39, Eli Cohen 写道:
>>>>>>> On Thu, Apr 22, 2021 at 04:21:45PM +0800, Jason Wang wrote:
>>>>>>>> 在 2021/4/22 下午4:07, Eli Cohen 写道:
>>>>>>>>> On Thu, Apr 22, 2021 at 09:03:58AM +0300, Eli Cohen wrote:
>>>>>>>>>> On Thu, Apr 22, 2021 at 10:37:38AM +0800, Jason Wang wrote:
>>>>>>>>>>> 在 2021/4/21 下午6:41, Eli Cohen 写道:
>>>>>>>>>>>> Implement mlx5_get_vq_notification() to return the doorbell 
>>>>>>>>>>>> address.
>>>>>>>>>>>> Size is set to one system page as required.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Eli Cohen <elic@nvidia.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>>      drivers/vdpa/mlx5/core/mlx5_vdpa.h | 1+
>>>>>>>>>>>>      drivers/vdpa/mlx5/core/resources.c | 1+
>>>>>>>>>>>>      drivers/vdpa/mlx5/net/mlx5_vnet.c  | 6 ++++++
>>>>>>>>>>>>      3 files changed, 8 insertions(+)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h 
>>>>>>>>>>>> b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>>>>>>>>>>> index b6cc53ba980c..49de62cda598 100644
>>>>>>>>>>>> --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>>>>>>>>>>> +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
>>>>>>>>>>>> @@ -41,6 +41,7 @@ struct mlx5_vdpa_resources {
>>>>>>>>>>>>          u32 pdn;
>>>>>>>>>>>>          struct mlx5_uars_page *uar;
>>>>>>>>>>>>          void __iomem *kick_addr;
>>>>>>>>>>>> +    u64 phys_kick_addr;
>>>>>>>>>>>>          u16 uid;
>>>>>>>>>>>>          u32 null_mkey;
>>>>>>>>>>>>          bool valid;
>>>>>>>>>>>> diff --git a/drivers/vdpa/mlx5/core/resources.c 
>>>>>>>>>>>> b/drivers/vdpa/mlx5/core/resources.c
>>>>>>>>>>>> index 6521cbd0f5c2..665f8fc1710f 100644
>>>>>>>>>>>> --- a/drivers/vdpa/mlx5/core/resources.c
>>>>>>>>>>>> +++ b/drivers/vdpa/mlx5/core/resources.c
>>>>>>>>>>>> @@ -247,6 +247,7 @@ int mlx5_vdpa_alloc_resources(struct 
>>>>>>>>>>>> mlx5_vdpa_dev *mvdev)
>>>>>>>>>>>>              goto err_key;
>>>>>>>>>>>>          kick_addr = mdev->bar_addr + offset;
>>>>>>>>>>>> +    res->phys_kick_addr = kick_addr;
>>>>>>>>>>>>          res->kick_addr= ioremap(kick_addr, PAGE_SIZE);
>>>>>>>>>>>>          if (!res->kick_addr) {
>>>>>>>>>>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
>>>>>>>>>>>> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>>>>> index 10c5fef3c020..680751074d2a 100644
>>>>>>>>>>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>>>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>>>>>> @@ -1865,8 +1865,14 @@ static void mlx5_vdpa_free(struct 
>>>>>>>>>>>> vdpa_device *vdev)
>>>>>>>>>>>>      static struct vdpa_notification_area 
>>>>>>>>>>>> mlx5_get_vq_notification(struct vdpa_device *vdev, u16 idx)
>>>>>>>>>>>>      {
>>>>>>>>>>>> +    struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>>>>>>>>>>>>          struct vdpa_notification_area ret = {};
>>>>>>>>>>>> +    struct mlx5_vdpa_net *ndev;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    ndev = to_mlx5_vdpa_ndev(mvdev);
>>>>>>>>>>>> +    ret.addr = (phys_addr_t)ndev->mvdev.res.phys_kick_addr;
>>>>>>>>>>>> +    ret.size = PAGE_SIZE;
>>>>>>>>>>> Note that the page will be mapped in to guest, so it's only 
>>>>>>>>>>> safe if the
>>>>>>>>>>> doorbeel exclusively own the page. This means if there're 
>>>>>>>>>>> other registers in
>>>>>>>>>>> the page, we can not let the doorbell bypass to work.
>>>>>>>>>>>
>>>>>>>>>>> So this is suspicious at least in the case of subfunction 
>>>>>>>>>>> where we calculate
>>>>>>>>>>> the bar length in mlx5_sf_dev_table_create() as:
>>>>>>>>>>>
>>>>>>>>>>> table->sf_bar_length = 1 << (MLX5_CAP_GEN(dev, 
>>>>>>>>>>> log_min_sf_size) + 12);
>>>>>>>>>>>
>>>>>>>>>>> It looks to me this can only work for the arch with 
>>>>>>>>>>> PAGE_SIZE = 4096,
>>>>>>>>>>> otherwise we can map more into the userspace(guest).
>>>>>>>>>>>
>>>>>>>>>> Correct, so I guess I should return here 4096.
>>>>>>>> I'm not quite sure but since the calculation of the 
>>>>>>>> sf_bar_length is doen
>>>>>>>> via a shift of 12, it might be correct.
>>>>>>>>
>>>>>>>> And please double check if the doorbell own the page exclusively.
>>>>>>> I am checking if it is safe to map the any part of the SF's BAR to
>>>>>>> userspace without harming other functions. If this is true, I 
>>>>>>> will check
>>>>>>> if I can return PAGE_SIZE without compromising security.
>>>>>>
>>>>>> It's usally not safe and a layer violation if other registers are 
>>>>>> placed at
>>>>>> the same page.
>>>>>>
>>>>>>
>>>>>>>    I think we may
>>>>>>> need to extend struct vdpa_notification_area to contain another 
>>>>>>> field
>>>>>>> offset which indicates the offset from addr where the actual 
>>>>>>> doorbell
>>>>>>> resides.
>>>>>>
>>>>>> The movitiaton of the current design is to be fit seamless into 
>>>>>> how Qemu
>>>>>> model doorbell layouts currently:
>>>>>>
>>>>>> 1) page-per-vq, each vq has its own page aligned doorbell
>>>>>> 2) 2 bytes doorbell, each vq has its own 2 byte aligend doorbell
>>>>>>
>>>>>> Only 1) is support in vhost-vDPA (and vhost-user) since it's 
>>>>>> rather simple
>>>>>> and secure (page aligned) to be modelled and implemented via mmap().
>>>>>>
>>>>>> Exporting a complex layout is possbile but requires careful design.
>>>>>>
>>>>>> Actually, we had antoher option
>>>>>>
>>>>>> 3) shared doorbell: all virtqueue shares a single page aligned 
>>>>>> doorbell
>>>>> I am not sure how this could solve the problem of 64KB archs.
>>>>> The point is that in ConnectX devices, the virtio queue objects 
>>>>> doorbell
>>>>> is aligned to 4K. For larger system page sizes, the doorbell may 
>>>>> not be
>>>>> aligned to a system page.
>>>>> So it seems not too complex to introduce offset within the page.
>>>>
>>>>
>>>> Three major issues:
>>>>
>>>> 1) single mmap() works at page level, it means we need map 64K to 
>>>> guest and we can only do this safely if no other registers are 
>>>> placed into the same page
>>>> 2) new uAPI to let the userspace know the offset
>>>> 3) how to model them with the virtio-pci in Qemu, and this may 
>>>> introduce burdens for management (need some changes in the qemu 
>>>> command line) to deal with the migration compatibility
>>>>
>>>> So consider the complexity, we can just stick to the current code. 
>>>> That means mmap() will fail and qemu will keep using the eventfd 
>>>> based kick.
>>> There is another case, mmap() works at page level, page size is at 
>>> least 4K. Consider if a device has a bar containing the shared 
>>> doorbell page at its last 4K space. In this bar layout, map a 
>>> arch.page_size=64K page to usersapce would lead to fatal errors.
>>
>>
>> Why it's a fatal error? Userspace should survive from mmap() errors 
>> and keep using the kickfd.
> I mean vhost-vdpa should not only check the alignment, also need to 
> check whether the doorbell size no less than a arch.page_size. 


The code has already did this, isn't it?

         if (vma->vm_end - vma->vm_start != PAGE_SIZE)
                 return -EINVAL;

...

         notify = ops->get_vq_notification(vdpa, index);
         if (notify.addr & (PAGE_SIZE - 1))
                 return -EINVAL;
         if (vma->vm_end - vma->vm_start != notify.size)
                 return -ENOTSUPP;


> If the doorbell placed at the last 4K in bar, map 64k page could be an 
> error.


mmap() will fail in this case.

Thanks


>
> Thanks
>>
>>
>>> I think we can assign the actual size of the doorbell area size to 
>>> vdpa_notification.size than arch.page_size to avoid such issues. 
>>> Then upper layers like vhost_vdpa should check whether this size can 
>>> work with the machine arch and its alignment, if not, should fail 
>>> over to use eventfd.
>>
>>
>> Isn't this how get_vet_notification() designed and implemented right 
>> now? What parent need is just to report the doorbell size, it's the 
>> bus driver (vhost-vDPA) to decide if and how it is used.
>>
>> Thanks
>>
>>
>>> Then do we still need a uAPI tell the offset within the page?
>>>
>>> Thanks
>>> Zhu Lingshan
>>>>
>>>>
>>>>
>>>>>
>>>>> BTW, for now, I am going to send another patch that makes sure page
>>>>> boundaries are not vilated. It requires some support from mlx5_core
>>>>> which is currently being reviewed internally.
>>>>
>>>>
>>>> Sure.
>>>>
>>>> Thanks
>>>>
>>>>
>>>>>
>>>>>> This is not yet supported by Qemu.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>>>>> I also think that the check in vhost_vdpa_mmap() should 
>>>>>>>>>> verify that the
>>>>>>>>>> returned size is not smaller than PAGE_SIZE because the 
>>>>>>>>>> returned address
>>>>>>>>> Actually I think it's ok since you verify the size equals 
>>>>>>>>> vma->vm_end -
>>>>>>>>> vma->vm_start which must be at least PAGE_SIZE.
>>>>>>>> Yes.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>>
>>>>>>>>>> might just be aligned to PAGE_SIZE. I think this should be 
>>>>>>>>>> enoght but
>>>>>>>>>> maybe also use the same logic in vhost_vdpa_fault().
>>>>
>>>
>>
>


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-04-30 13:50 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-21 10:41 [PATCH] vdpa/mlx5: Add support for doorbell bypassing Eli Cohen
2021-04-22  2:37 ` Jason Wang
2021-04-22  3:27   ` Mika Penttilä
2021-04-22  4:52     ` Jason Wang
2021-04-22  6:03   ` Eli Cohen
2021-04-22  8:07     ` Eli Cohen
2021-04-22  8:21       ` Jason Wang
2021-04-22  8:39         ` Eli Cohen
2021-04-22  8:59           ` Jason Wang
2021-04-25 13:25             ` Eli Cohen
2021-04-26  2:35               ` Jason Wang
2021-04-28 20:53                 ` Si-Wei Liu
2021-04-29  2:43                   ` Jason Wang
2021-04-29 10:00             ` Eli Cohen
2021-04-30  4:40               ` Jason Wang
2021-04-30  6:31                 ` Zhu, Lingshan
2021-04-30  7:03                   ` Jason Wang
2021-04-30  9:25                     ` Zhu, Lingshan
2021-04-30 13:49                       ` Jason Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).