All of lore.kernel.org
 help / color / mirror / Atom feed
* RE: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature
@ 2019-10-31 10:46 Vitaly Mireyno
  2019-11-01  4:09 ` Michael S. Tsirkin
  0 siblings, 1 reply; 28+ messages in thread
From: Vitaly Mireyno @ 2019-10-31 10:46 UTC (permalink / raw)
  To: Jason Wang, virtio-comment; +Cc: Michael S. Tsirkin

The feature is limited to receive buffers only.
A driver decides on receive buffer length. The only limitation is that this length has to be the same for all receive virtqueue buffers.
The driver configures receive buffer length to the device during device initialization, and the device reads it and may use it for optimal operation. 
No changes for transmit buffers.

-----Original Message-----
From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-open.org> On Behalf Of Jason Wang
Sent: Thursday, 31 October, 2019 12:15
To: Vitaly Mireyno <vmireyno@marvell.com>; virtio-comment@lists.oasis-open.org
Cc: Michael S. Tsirkin <mst@redhat.com>
Subject: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature


On 2019/10/31 下午5:23, Vitaly Mireyno wrote:
> Some devices benefit from working with receive buffers of a predefined constant length.
> Add a feature for drivers that allocate equal-sized receive buffers, and devices that benefit from predefined receive buffers length.
>
> Signed-off-by: Vitaly Mireyno <vmireyno@marvell.com>
> ---
>   content.tex | 29 +++++++++++++++++++++++++++--
>   1 file changed, 27 insertions(+), 2 deletions(-)


Is there any other networking device that has this feature?


>
> diff --git a/content.tex b/content.tex index b1ea9b9..c9e67c8 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -2811,6 +2811,10 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
>   \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>       channel.
>   
> +\item[VIRTIO_NET_F_CONST_RXBUF_LEN(58)] Driver allocates all receive buffers
> +    of the same constant length. Device benefits from working with
> +    receive buffers of equal length.
> +
>   \item[VIRTIO_NET_F_RSC_EXT(61)] Device can process duplicated ACKs
>       and report number of coalesced segments and duplicated ACKs
>   
> @@ -2854,8 +2858,8 @@ \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network
>   \subsection{Device configuration layout}\label{sec:Device Types / Network Device / Device configuration layout}
>   \label{sec:Device Types / Block Device / Feature bits / Device 
> configuration layout}
>   
> -Three driver-read-only configuration fields are currently defined. 
> The \field{mac} address field -always exists (though is only valid if 
> VIRTIO_NET_F_MAC is set), and
> +The driver-read-only \field{mac} address field always exists (though 
> +is only valid if VIRTIO_NET_F_MAC is set), and
>   \field{status} only exists if VIRTIO_NET_F_STATUS is set. Two
>   read-only bits (for the driver) are currently defined for the status field:
>   VIRTIO_NET_S_LINK_UP and VIRTIO_NET_S_ANNOUNCE.
> @@ -2875,12 +2879,17 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
>   VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the driver to
>   use.
>   
> +The device-read-only field \field{rx_buf_len} only exists if


Should be driver-read-only.


> +VIRTIO_NET_F_CONST_RXBUF_LEN is negotiated. This field specifies the 
> +receive buffers length.
> +
>   \begin{lstlisting}
>   struct virtio_net_config {
>           u8 mac[6];
>           le16 status;
>           le16 max_virtqueue_pairs;
>           le16 mtu;
> +        le32 rx_buf_len;
>   };
>   \end{lstlisting}
>   
> @@ -2933,6 +2942,13 @@ \subsection{Device configuration 
> layout}\label{sec:Device Types / Network Device
>   
>   A driver SHOULD negotiate the VIRTIO_NET_F_STANDBY feature if the device offers it.
>   
> +A driver SHOULD accept the VIRTIO_NET_F_CONST_RXBUF_LEN feature if offered.
> +
> +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been negotiated, the 
> +driver MUST set \field{rx_buf_len}.


I think it's device that set the field?


> +
> +A driver MUST NOT modify \field{rx_buf_len} once it has been set.
> +
>   \subsubsection{Legacy Interface: Device configuration layout}\label{sec:Device Types / Network Device / Device configuration layout / Legacy Interface: Device configuration layout}
>   \label{sec:Device Types / Block Device / Feature bits / Device configuration layout / Legacy Interface: Device configuration layout}
>   When using the legacy interface, transitional devices and drivers @@ 
> -3241,6 +3257,11 @@ \subsubsection{Setting Up Receive Buffers}\label{sec:Device Types / Network Devi
>   If VIRTIO_NET_F_MQ is negotiated, each of receiveq1\ldots receiveqN
>   that will be used SHOULD be populated with receive buffers.
>   
> +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been negotiated, the 
> +driver MUST initialize all receive virtqueue descriptors \field{len} 
> +field with the value configured in \field{rx_buf_len} device 
> +configuration field, and allocate receive buffers accordingly.
> +
>   \devicenormative{\paragraph}{Setting Up Receive Buffers}{Device 
> Types / Network Device / Device Operation / Setting Up Receive 
> Buffers}
>   
>   The device MUST set \field{num_buffers} to the number of descriptors 
> used to @@ -3396,6 +3417,10 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>   checksum (in case of multiple encapsulated protocols, one level
>   of checksums is validated).
>   
> +If VIRTIO_NET_F_CONST_RXBUF_LEN has been negotiated, the device MAY 
> +use \field{rx_buf_len} as a buffer length (instead of reading it from 
> +virtqueue descriptor \field{len} field).


Is this safe? What if driver submit a small buffer, then device can read more than what is allowed?

Thanks


> +
>   \drivernormative{\paragraph}{Processing of Incoming
>   Packets}{Device Types / Network Device / Device Operation /
>   Processing of Incoming Packets}
> --


This publicly archived list offers a means to provide input to theOASIS Virtual I/O Device (VIRTIO) TC.In order to verify user consent to the Feedback License terms andto minimize spam in the list archive, subscription is requiredbefore posting.Subscribe: virtio-comment-subscribe@lists.oasis-open.orgUnsubscribe: virtio-comment-unsubscribe@lists.oasis-open.orgList help: virtio-comment-help@lists.oasis-open.orgList archive: https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.oasis-2Dopen.org_archives_virtio-2Dcomment_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=ZvYuXs3LuhkeJxW_zoADv1L4RisDyGH_vp7scO91w9s&e= Feedback License: https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-2Dopen.org_who_ipr_feedback-5Flicense.pdf&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=7Hf-O4ss-rLdNJsAbIOg5ruheBcfAYFdXWuks_HJfFQ&e= List Guidelines: https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-2Dopen.org_policies-2Dguidelines_mailing-2Dlists&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=7oMwlBy4fdYwjT2_f9BTwHRm2GFtIYL-x8e4C0jlrpM&e= Committee: https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-2Dopen.org_committees_virtio_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=-K2akZ06PY7r4TquQY4T4UWdZ6JyAsBDFZIEZFNsWHk&e= Join OASIS: https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-2Dopen.org_join_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=lR1SlzXT71tjw7DDuD6XaHefJQSTBCAvRcdp45ureB8&e= 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature
  2019-10-31 10:46 [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature Vitaly Mireyno
@ 2019-11-01  4:09 ` Michael S. Tsirkin
  2019-11-01  7:22   ` Jason Wang
  0 siblings, 1 reply; 28+ messages in thread
From: Michael S. Tsirkin @ 2019-11-01  4:09 UTC (permalink / raw)
  To: Vitaly Mireyno; +Cc: Jason Wang, virtio-comment

On Thu, Oct 31, 2019 at 10:46:55AM +0000, Vitaly Mireyno wrote:
> The feature is limited to receive buffers only.
> A driver decides on receive buffer length. The only limitation is that this length has to be the same for all receive virtqueue buffers.
> The driver configures receive buffer length to the device during device initialization, and the device reads it and may use it for optimal operation. 
> No changes for transmit buffers.
> 
> -----Original Message-----
> From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-open.org> On Behalf Of Jason Wang
> Sent: Thursday, 31 October, 2019 12:15
> To: Vitaly Mireyno <vmireyno@marvell.com>; virtio-comment@lists.oasis-open.org
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Subject: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature
> 
> 
> On 2019/10/31 下午5:23, Vitaly Mireyno wrote:
> > Some devices benefit from working with receive buffers of a predefined constant length.
> > Add a feature for drivers that allocate equal-sized receive buffers, and devices that benefit from predefined receive buffers length.
> >
> > Signed-off-by: Vitaly Mireyno <vmireyno@marvell.com>
> > ---
> >   content.tex | 29 +++++++++++++++++++++++++++--
> >   1 file changed, 27 insertions(+), 2 deletions(-)
> 
> 
> Is there any other networking device that has this feature?
> 
> 
> >
> > diff --git a/content.tex b/content.tex index b1ea9b9..c9e67c8 100644
> > --- a/content.tex
> > +++ b/content.tex
> > @@ -2811,6 +2811,10 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
> >   \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
> >       channel.
> >   
> > +\item[VIRTIO_NET_F_CONST_RXBUF_LEN(58)] Driver allocates all receive buffers
> > +    of the same constant length. Device benefits from working with
> > +    receive buffers of equal length.
> > +

Problem is, I don't think linux will use this for skbs since it breaks
buffer accounting. This is because it is important to make skbs
as small as you can. So even if you see "device would benefit"
there is no way to balance this with the benefit to linux.
How do you know which benefit is bigger?

You also never explained how does device benefit. My guess is this
allows device to calculate the # of descriptors to fetch
that are needed for a packet. Right?
Assuming this, I think a rough estimate should be enough.
If device fetches too much it can discard extra, if it does not
fetch enough it can fetch extra.

Let us not forget that express packets are 256 bytes so
up to 16 descriptors fit in a packet, there is no
benefit most of the time in knowing whether e.g. 1 or 2
descriptors are needed.


Let us not forget these are buffers, not descriptors.

So device does not really know the exact # of descriptors:
current drivers do 1 descriptor per buffer but nothing
prevents more.


Thoughts?


> >   \item[VIRTIO_NET_F_RSC_EXT(61)] Device can process duplicated ACKs
> >       and report number of coalesced segments and duplicated ACKs
> >   
> > @@ -2854,8 +2858,8 @@ \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network
> >   \subsection{Device configuration layout}\label{sec:Device Types / Network Device / Device configuration layout}
> >   \label{sec:Device Types / Block Device / Feature bits / Device 
> > configuration layout}
> >   
> > -Three driver-read-only configuration fields are currently defined. 
> > The \field{mac} address field -always exists (though is only valid if 
> > VIRTIO_NET_F_MAC is set), and
> > +The driver-read-only \field{mac} address field always exists (though 
> > +is only valid if VIRTIO_NET_F_MAC is set), and
> >   \field{status} only exists if VIRTIO_NET_F_STATUS is set. Two
> >   read-only bits (for the driver) are currently defined for the status field:
> >   VIRTIO_NET_S_LINK_UP and VIRTIO_NET_S_ANNOUNCE.
> > @@ -2875,12 +2879,17 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
> >   VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the driver to
> >   use.
> >   
> > +The device-read-only field \field{rx_buf_len} only exists if
> 
> 
> Should be driver-read-only.
> 
> 
> > +VIRTIO_NET_F_CONST_RXBUF_LEN is negotiated. This field specifies the 
> > +receive buffers length.
> > +
> >   \begin{lstlisting}
> >   struct virtio_net_config {
> >           u8 mac[6];
> >           le16 status;
> >           le16 max_virtqueue_pairs;
> >           le16 mtu;
> > +        le32 rx_buf_len;
> >   };
> >   \end{lstlisting}
> >   
> > @@ -2933,6 +2942,13 @@ \subsection{Device configuration 
> > layout}\label{sec:Device Types / Network Device
> >   
> >   A driver SHOULD negotiate the VIRTIO_NET_F_STANDBY feature if the device offers it.
> >   
> > +A driver SHOULD accept the VIRTIO_NET_F_CONST_RXBUF_LEN feature if offered.
> > +
> > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been negotiated, the 
> > +driver MUST set \field{rx_buf_len}.
> 
> 
> I think it's device that set the field?

Makes more sense for the driver, but if you want this set
e.g. before buffers are added, you must say so.

> 
> > +
> > +A driver MUST NOT modify \field{rx_buf_len} once it has been set.
> > +


This seems very unflexible. I can see how e.g. XDP would benefit from
big buffers while skbs benefit from small buffers.

This calls for ability to change this.

> >   \subsubsection{Legacy Interface: Device configuration layout}\label{sec:Device Types / Network Device / Device configuration layout / Legacy Interface: Device configuration layout}
> >   \label{sec:Device Types / Block Device / Feature bits / Device configuration layout / Legacy Interface: Device configuration layout}
> >   When using the legacy interface, transitional devices and drivers @@ 
> > -3241,6 +3257,11 @@ \subsubsection{Setting Up Receive Buffers}\label{sec:Device Types / Network Devi
> >   If VIRTIO_NET_F_MQ is negotiated, each of receiveq1\ldots receiveqN
> >   that will be used SHOULD be populated with receive buffers.
> >   
> > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been negotiated, the 
> > +driver MUST initialize all receive virtqueue descriptors \field{len} 
> > +field with the value configured in \field{rx_buf_len} device 
> > +configuration field, and allocate receive buffers accordingly.
> > +
> >   \devicenormative{\paragraph}{Setting Up Receive Buffers}{Device 
> > Types / Network Device / Device Operation / Setting Up Receive 
> > Buffers}
> >   
> >   The device MUST set \field{num_buffers} to the number of descriptors 
> > used to @@ -3396,6 +3417,10 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> >   checksum (in case of multiple encapsulated protocols, one level
> >   of checksums is validated).
> >   
> > +If VIRTIO_NET_F_CONST_RXBUF_LEN has been negotiated, the device MAY 
> > +use \field{rx_buf_len} as a buffer length (instead of reading it from 
> > +virtqueue descriptor \field{len} field).
> 
> 
> Is this safe? What if driver submit a small buffer, then device can read more than what is allowed?
> 
> Thanks
> 
> 
> > +
> >   \drivernormative{\paragraph}{Processing of Incoming
> >   Packets}{Device Types / Network Device / Device Operation /
> >   Processing of Incoming Packets}
> > --
> 
> 
> This publicly archived list offers a means to provide input to theOASIS Virtual I/O Device (VIRTIO) TC.In order to verify user consent to the Feedback License terms andto minimize spam in the list archive, subscription is requiredbefore posting.Subscribe: virtio-comment-subscribe@lists.oasis-open.orgUnsubscribe: virtio-comment-unsubscribe@lists.oasis-open.orgList help: virtio-comment-help@lists.oasis-open.orgList archive: https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.oasis-2Dopen.org_archives_virtio-2Dcomment_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=ZvYuXs3LuhkeJxW_zoADv1L4RisDyGH_vp7scO91w9s&e= Feedback License: https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-2Dopen.org_who_ipr_feedback-5Flicense.pdf&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=7Hf-O4ss-rLdNJsAbIOg5ruheBcfAYFdXWuks_HJfFQ&e= List Guidelines: https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-2Dopen.org_policies-2Dguidelines_mailing-2Dlists&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=7oMwlBy4fdYwjT2_f9BTwHRm2GFtIYL-x8e4C0jlrpM&e= Committee: https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-2Dopen.org_committees_virtio_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=-K2akZ06PY7r4TquQY4T4UWdZ6JyAsBDFZIEZFNsWHk&e= Join OASIS: https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-2Dopen.org_join_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=lR1SlzXT71tjw7DDuD6XaHefJQSTBCAvRcdp45ureB8&e= 
> 

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature
  2019-11-01  4:09 ` Michael S. Tsirkin
@ 2019-11-01  7:22   ` Jason Wang
  2019-11-05 18:51     ` Michael S. Tsirkin
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Wang @ 2019-11-01  7:22 UTC (permalink / raw)
  To: Michael S. Tsirkin, Vitaly Mireyno; +Cc: virtio-comment


On 2019/11/1 下午12:09, Michael S. Tsirkin wrote:
> On Thu, Oct 31, 2019 at 10:46:55AM +0000, Vitaly Mireyno wrote:
>> The feature is limited to receive buffers only.
>> A driver decides on receive buffer length. The only limitation is that this length has to be the same for all receive virtqueue buffers.
>> The driver configures receive buffer length to the device during device initialization, and the device reads it and may use it for optimal operation.
>> No changes for transmit buffers.
>>
>> -----Original Message-----
>> From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-open.org> On Behalf Of Jason Wang
>> Sent: Thursday, 31 October, 2019 12:15
>> To: Vitaly Mireyno <vmireyno@marvell.com>; virtio-comment@lists.oasis-open.org
>> Cc: Michael S. Tsirkin <mst@redhat.com>
>> Subject: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature
>>
>>
>> On 2019/10/31 下午5:23, Vitaly Mireyno wrote:
>>> Some devices benefit from working with receive buffers of a predefined constant length.
>>> Add a feature for drivers that allocate equal-sized receive buffers, and devices that benefit from predefined receive buffers length.
>>>
>>> Signed-off-by: Vitaly Mireyno <vmireyno@marvell.com>
>>> ---
>>>    content.tex | 29 +++++++++++++++++++++++++++--
>>>    1 file changed, 27 insertions(+), 2 deletions(-)
>>
>> Is there any other networking device that has this feature?
>>
>>
>>> diff --git a/content.tex b/content.tex index b1ea9b9..c9e67c8 100644
>>> --- a/content.tex
>>> +++ b/content.tex
>>> @@ -2811,6 +2811,10 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
>>>    \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>>>        channel.
>>>    
>>> +\item[VIRTIO_NET_F_CONST_RXBUF_LEN(58)] Driver allocates all receive buffers
>>> +    of the same constant length. Device benefits from working with
>>> +    receive buffers of equal length.
>>> +
> Problem is, I don't think linux will use this for skbs since it breaks
> buffer accounting. This is because it is important to make skbs
> as small as you can. So even if you see "device would benefit"
> there is no way to balance this with the benefit to linux.
> How do you know which benefit is bigger?


I guess the idea is e.g for Linux driver, it can refuse this feature.


>
> You also never explained how does device benefit. My guess is this
> allows device to calculate the # of descriptors to fetch
> that are needed for a packet. Right?
> Assuming this, I think a rough estimate should be enough.
> If device fetches too much it can discard extra, if it does not
> fetch enough it can fetch extra.
>
> Let us not forget that express packets are 256 bytes so
> up to 16 descriptors fit in a packet, there is no
> benefit most of the time in knowing whether e.g. 1 or 2
> descriptors are needed.
>
>
> Let us not forget these are buffers, not descriptors.


I guess maybe the initial motivation is constant descriptor length.


>
> So device does not really know the exact # of descriptors:
> current drivers do 1 descriptor per buffer but nothing
> prevents more.
>
>
> Thoughts?
>
>
>>>    \item[VIRTIO_NET_F_RSC_EXT(61)] Device can process duplicated ACKs
>>>        and report number of coalesced segments and duplicated ACKs
>>>    
>>> @@ -2854,8 +2858,8 @@ \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network
>>>    \subsection{Device configuration layout}\label{sec:Device Types / Network Device / Device configuration layout}
>>>    \label{sec:Device Types / Block Device / Feature bits / Device
>>> configuration layout}
>>>    
>>> -Three driver-read-only configuration fields are currently defined.
>>> The \field{mac} address field -always exists (though is only valid if
>>> VIRTIO_NET_F_MAC is set), and
>>> +The driver-read-only \field{mac} address field always exists (though
>>> +is only valid if VIRTIO_NET_F_MAC is set), and
>>>    \field{status} only exists if VIRTIO_NET_F_STATUS is set. Two
>>>    read-only bits (for the driver) are currently defined for the status field:
>>>    VIRTIO_NET_S_LINK_UP and VIRTIO_NET_S_ANNOUNCE.
>>> @@ -2875,12 +2879,17 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
>>>    VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the driver to
>>>    use.
>>>    
>>> +The device-read-only field \field{rx_buf_len} only exists if
>>
>> Should be driver-read-only.
>>
>>
>>> +VIRTIO_NET_F_CONST_RXBUF_LEN is negotiated. This field specifies the
>>> +receive buffers length.
>>> +
>>>    \begin{lstlisting}
>>>    struct virtio_net_config {
>>>            u8 mac[6];
>>>            le16 status;
>>>            le16 max_virtqueue_pairs;
>>>            le16 mtu;
>>> +        le32 rx_buf_len;
>>>    };
>>>    \end{lstlisting}
>>>    
>>> @@ -2933,6 +2942,13 @@ \subsection{Device configuration
>>> layout}\label{sec:Device Types / Network Device
>>>    
>>>    A driver SHOULD negotiate the VIRTIO_NET_F_STANDBY feature if the device offers it.
>>>    
>>> +A driver SHOULD accept the VIRTIO_NET_F_CONST_RXBUF_LEN feature if offered.
>>> +
>>> +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been negotiated, the
>>> +driver MUST set \field{rx_buf_len}.
>>
>> I think it's device that set the field?
> Makes more sense for the driver, but if you want this set
> e.g. before buffers are added, you must say so.
>
>>> +
>>> +A driver MUST NOT modify \field{rx_buf_len} once it has been set.
>>> +
>
> This seems very unflexible. I can see how e.g. XDP would benefit from
> big buffers while skbs benefit from small buffers.
>
> This calls for ability to change this.


Yes, but it requires non trivial cleanups for the old length and place 
them with new ones.

Thanks


>
>>>    \subsubsection{Legacy Interface: Device configuration layout}\label{sec:Device Types / Network Device / Device configuration layout / Legacy Interface: Device configuration layout}
>>>    \label{sec:Device Types / Block Device / Feature bits / Device configuration layout / Legacy Interface: Device configuration layout}
>>>    When using the legacy interface, transitional devices and drivers @@
>>> -3241,6 +3257,11 @@ \subsubsection{Setting Up Receive Buffers}\label{sec:Device Types / Network Devi
>>>    If VIRTIO_NET_F_MQ is negotiated, each of receiveq1\ldots receiveqN
>>>    that will be used SHOULD be populated with receive buffers.
>>>    
>>> +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been negotiated, the
>>> +driver MUST initialize all receive virtqueue descriptors \field{len}
>>> +field with the value configured in \field{rx_buf_len} device
>>> +configuration field, and allocate receive buffers accordingly.
>>> +
>>>    \devicenormative{\paragraph}{Setting Up Receive Buffers}{Device
>>> Types / Network Device / Device Operation / Setting Up Receive
>>> Buffers}
>>>    
>>>    The device MUST set \field{num_buffers} to the number of descriptors
>>> used to @@ -3396,6 +3417,10 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>>>    checksum (in case of multiple encapsulated protocols, one level
>>>    of checksums is validated).
>>>    
>>> +If VIRTIO_NET_F_CONST_RXBUF_LEN has been negotiated, the device MAY
>>> +use \field{rx_buf_len} as a buffer length (instead of reading it from
>>> +virtqueue descriptor \field{len} field).
>>
>> Is this safe? What if driver submit a small buffer, then device can read more than what is allowed?
>>
>> Thanks
>>
>>
>>> +
>>>    \drivernormative{\paragraph}{Processing of Incoming
>>>    Packets}{Device Types / Network Device / Device Operation /
>>>    Processing of Incoming Packets}
>>> --
>>
>> This publicly archived list offers a means to provide input to theOASIS Virtual I/O Device (VIRTIO) TC.In order to verify user consent to the Feedback License terms andto minimize spam in the list archive, subscription is requiredbefore posting.Subscribe: virtio-comment-subscribe@lists.oasis-open.orgUnsubscribe: virtio-comment-unsubscribe@lists.oasis-open.orgList help: virtio-comment-help@lists.oasis-open.orgList archive: https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.oasis-2Dopen.org_archives_virtio-2Dcomment_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=ZvYuXs3LuhkeJxW_zoADv1L4RisDyGH_vp7scO91w9s&e= Feedback License: https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-2Dopen.org_who_ipr_feedback-5Flicense.pdf&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=7Hf-O4ss-rLdNJsAbIOg5ruheBcfAYFdXWuks_HJfFQ&e= List Guidelines: https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-2Dopen.org_policies-2Dguidelines_mailing-2Dlists&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=7oMwlBy4fdYwjT2_f9BTwHRm2GFtIYL-x8e4C0jlrpM&e= Committee: https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-2Dopen.org_committees_virtio_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=-K2akZ06PY7r4TquQY4T4UWdZ6JyAsBDFZIEZFNsWHk&e= Join OASIS: https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-2Dopen.org_join_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=lR1SlzXT71tjw7DDuD6XaHefJQSTBCAvRcdp45ureB8&e=
>>
> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
>
> In order to verify user consent to the Feedback License terms and
> to minimize spam in the list archive, subscription is required
> before posting.
>
> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> List help: virtio-comment-help@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/
>


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature
  2019-11-01  7:22   ` Jason Wang
@ 2019-11-05 18:51     ` Michael S. Tsirkin
  2019-11-10 14:13       ` [virtio-comment] RE: [EXT] " Vitaly Mireyno
  0 siblings, 1 reply; 28+ messages in thread
From: Michael S. Tsirkin @ 2019-11-05 18:51 UTC (permalink / raw)
  To: Jason Wang; +Cc: Vitaly Mireyno, virtio-comment

On Fri, Nov 01, 2019 at 03:22:26PM +0800, Jason Wang wrote:
> 
> On 2019/11/1 下午12:09, Michael S. Tsirkin wrote:
> > On Thu, Oct 31, 2019 at 10:46:55AM +0000, Vitaly Mireyno wrote:
> > > The feature is limited to receive buffers only.
> > > A driver decides on receive buffer length. The only limitation is that this length has to be the same for all receive virtqueue buffers.
> > > The driver configures receive buffer length to the device during device initialization, and the device reads it and may use it for optimal operation.
> > > No changes for transmit buffers.
> > > 
> > > -----Original Message-----
> > > From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-open.org> On Behalf Of Jason Wang
> > > Sent: Thursday, 31 October, 2019 12:15
> > > To: Vitaly Mireyno <vmireyno@marvell.com>; virtio-comment@lists.oasis-open.org
> > > Cc: Michael S. Tsirkin <mst@redhat.com>
> > > Subject: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature
> > > 
> > > 
> > > On 2019/10/31 下午5:23, Vitaly Mireyno wrote:
> > > > Some devices benefit from working with receive buffers of a predefined constant length.
> > > > Add a feature for drivers that allocate equal-sized receive buffers, and devices that benefit from predefined receive buffers length.
> > > > 
> > > > Signed-off-by: Vitaly Mireyno <vmireyno@marvell.com>
> > > > ---
> > > >    content.tex | 29 +++++++++++++++++++++++++++--
> > > >    1 file changed, 27 insertions(+), 2 deletions(-)
> > > 
> > > Is there any other networking device that has this feature?
> > > 
> > > 
> > > > diff --git a/content.tex b/content.tex index b1ea9b9..c9e67c8 100644
> > > > --- a/content.tex
> > > > +++ b/content.tex
> > > > @@ -2811,6 +2811,10 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
> > > >    \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
> > > >        channel.
> > > > +\item[VIRTIO_NET_F_CONST_RXBUF_LEN(58)] Driver allocates all receive buffers
> > > > +    of the same constant length. Device benefits from working with
> > > > +    receive buffers of equal length.
> > > > +
> > Problem is, I don't think linux will use this for skbs since it breaks
> > buffer accounting. This is because it is important to make skbs
> > as small as you can. So even if you see "device would benefit"
> > there is no way to balance this with the benefit to linux.
> > How do you know which benefit is bigger?
> 
> 
> I guess the idea is e.g for Linux driver, it can refuse this feature.

Okay. What uses it?

> 
> > 
> > You also never explained how does device benefit. My guess is this
> > allows device to calculate the # of descriptors to fetch
> > that are needed for a packet. Right?
> > Assuming this, I think a rough estimate should be enough.
> > If device fetches too much it can discard extra, if it does not
> > fetch enough it can fetch extra.
> > 
> > Let us not forget that express packets are 256 bytes so
> > up to 16 descriptors fit in a packet, there is no
> > benefit most of the time in knowing whether e.g. 1 or 2
> > descriptors are needed.
> > 
> > 
> > Let us not forget these are buffers, not descriptors.
> 
> 
> I guess maybe the initial motivation is constant descriptor length.


That conflicts with requirement framing is up to driver.


> 
> > 
> > So device does not really know the exact # of descriptors:
> > current drivers do 1 descriptor per buffer but nothing
> > prevents more.
> > 
> > 
> > Thoughts?
> > 
> > 
> > > >    \item[VIRTIO_NET_F_RSC_EXT(61)] Device can process duplicated ACKs
> > > >        and report number of coalesced segments and duplicated ACKs
> > > > @@ -2854,8 +2858,8 @@ \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network
> > > >    \subsection{Device configuration layout}\label{sec:Device Types / Network Device / Device configuration layout}
> > > >    \label{sec:Device Types / Block Device / Feature bits / Device
> > > > configuration layout}
> > > > -Three driver-read-only configuration fields are currently defined.
> > > > The \field{mac} address field -always exists (though is only valid if
> > > > VIRTIO_NET_F_MAC is set), and
> > > > +The driver-read-only \field{mac} address field always exists (though
> > > > +is only valid if VIRTIO_NET_F_MAC is set), and
> > > >    \field{status} only exists if VIRTIO_NET_F_STATUS is set. Two
> > > >    read-only bits (for the driver) are currently defined for the status field:
> > > >    VIRTIO_NET_S_LINK_UP and VIRTIO_NET_S_ANNOUNCE.
> > > > @@ -2875,12 +2879,17 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
> > > >    VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the driver to
> > > >    use.
> > > > +The device-read-only field \field{rx_buf_len} only exists if
> > > 
> > > Should be driver-read-only.
> > > 
> > > 
> > > > +VIRTIO_NET_F_CONST_RXBUF_LEN is negotiated. This field specifies the
> > > > +receive buffers length.
> > > > +
> > > >    \begin{lstlisting}
> > > >    struct virtio_net_config {
> > > >            u8 mac[6];
> > > >            le16 status;
> > > >            le16 max_virtqueue_pairs;
> > > >            le16 mtu;
> > > > +        le32 rx_buf_len;
> > > >    };
> > > >    \end{lstlisting}
> > > > @@ -2933,6 +2942,13 @@ \subsection{Device configuration
> > > > layout}\label{sec:Device Types / Network Device
> > > >    A driver SHOULD negotiate the VIRTIO_NET_F_STANDBY feature if the device offers it.
> > > > +A driver SHOULD accept the VIRTIO_NET_F_CONST_RXBUF_LEN feature if offered.
> > > > +
> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been negotiated, the
> > > > +driver MUST set \field{rx_buf_len}.
> > > 
> > > I think it's device that set the field?
> > Makes more sense for the driver, but if you want this set
> > e.g. before buffers are added, you must say so.
> > 
> > > > +
> > > > +A driver MUST NOT modify \field{rx_buf_len} once it has been set.
> > > > +
> > 
> > This seems very unflexible. I can see how e.g. XDP would benefit from
> > big buffers while skbs benefit from small buffers.
> > 
> > This calls for ability to change this.
> 
> 
> Yes, but it requires non trivial cleanups for the old length and place them
> with new ones.
> 
> Thanks


Let's see:
1	- making buffer smaller: just update config space,
	  then make new buffers smaller

2	- making buffers bigger: add larger buffers,
	once all small ones are consumed update config space

2 is tricky I agree. Thoughts?



> 
> > 
> > > >    \subsubsection{Legacy Interface: Device configuration layout}\label{sec:Device Types / Network Device / Device configuration layout / Legacy Interface: Device configuration layout}
> > > >    \label{sec:Device Types / Block Device / Feature bits / Device configuration layout / Legacy Interface: Device configuration layout}
> > > >    When using the legacy interface, transitional devices and drivers @@
> > > > -3241,6 +3257,11 @@ \subsubsection{Setting Up Receive Buffers}\label{sec:Device Types / Network Devi
> > > >    If VIRTIO_NET_F_MQ is negotiated, each of receiveq1\ldots receiveqN
> > > >    that will be used SHOULD be populated with receive buffers.
> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been negotiated, the
> > > > +driver MUST initialize all receive virtqueue descriptors \field{len}
> > > > +field with the value configured in \field{rx_buf_len} device
> > > > +configuration field, and allocate receive buffers accordingly.
> > > > +
> > > >    \devicenormative{\paragraph}{Setting Up Receive Buffers}{Device
> > > > Types / Network Device / Device Operation / Setting Up Receive
> > > > Buffers}
> > > >    The device MUST set \field{num_buffers} to the number of descriptors
> > > > used to @@ -3396,6 +3417,10 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> > > >    checksum (in case of multiple encapsulated protocols, one level
> > > >    of checksums is validated).
> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN has been negotiated, the device MAY
> > > > +use \field{rx_buf_len} as a buffer length (instead of reading it from
> > > > +virtqueue descriptor \field{len} field).
> > > 
> > > Is this safe? What if driver submit a small buffer, then device can read more than what is allowed?
> > > 
> > > Thanks
> > > 
> > > 
> > > > +
> > > >    \drivernormative{\paragraph}{Processing of Incoming
> > > >    Packets}{Device Types / Network Device / Device Operation /
> > > >    Processing of Incoming Packets}
> > > > --
> > > 
> > > This publicly archived list offers a means to provide input to theOASIS Virtual I/O Device (VIRTIO) TC.In order to verify user consent to the Feedback License terms andto minimize spam in the list archive, subscription is requiredbefore posting.Subscribe: virtio-comment-subscribe@lists.oasis-open.orgUnsubscribe: virtio-comment-unsubscribe@lists.oasis-open.orgList help: virtio-comment-help@lists.oasis-open.orgList archive: https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.oasis-2Dopen.org_archives_virtio-2Dcomment_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=ZvYuXs3LuhkeJxW_zoADv1L4RisDyGH_vp7scO91w9s&e= Feedback License: https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-2Dopen.org_who_ipr_feedback-5Flicense.pdf&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=7Hf-O4ss-rLdNJsAbIOg5ruheBcfAYFdXWuks_HJfFQ&e= List Guidelines: https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-2Dopen.org_policies-2Dguidelines_mailing-2Dlists&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=7oMwlBy4fdYwjT2_f9BTwHRm2GFtIYL-x8e4C0jlrpM&e= Committee: https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-2Dopen.org_committees_virtio_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=-K2akZ06PY7r4TquQY4T4UWdZ6JyAsBDFZIEZFNsWHk&e= Join OASIS: https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-2Dopen.org_join_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=lR1SlzXT71tjw7DDuD6XaHefJQSTBCAvRcdp45ureB8&e=
> > > 
> > This publicly archived list offers a means to provide input to the
> > OASIS Virtual I/O Device (VIRTIO) TC.
> > 
> > In order to verify user consent to the Feedback License terms and
> > to minimize spam in the list archive, subscription is required
> > before posting.
> > 
> > Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> > List help: virtio-comment-help@lists.oasis-open.org
> > List archive: https://lists.oasis-open.org/archives/virtio-comment/
> > Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> > List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> > Committee: https://www.oasis-open.org/committees/virtio/
> > Join OASIS: https://www.oasis-open.org/join/
> > 

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [virtio-comment] RE: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature
  2019-11-05 18:51     ` Michael S. Tsirkin
@ 2019-11-10 14:13       ` Vitaly Mireyno
  2019-11-10 15:34         ` [virtio-comment] " Michael S. Tsirkin
  0 siblings, 1 reply; 28+ messages in thread
From: Vitaly Mireyno @ 2019-11-10 14:13 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang; +Cc: virtio-comment


>-----Original Message-----
>From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-
>open.org> On Behalf Of Michael S. Tsirkin
>Sent: Tuesday, 5 November, 2019 20:52
>To: Jason Wang <jasowang@redhat.com>
>Cc: Vitaly Mireyno <vmireyno@marvell.com>; virtio-comment@lists.oasis-
>open.org
>Subject: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized
>receive buffers feature
>
>External Email
>
>----------------------------------------------------------------------
>On Fri, Nov 01, 2019 at 03:22:26PM +0800, Jason Wang wrote:
>>
>> On 2019/11/1 下午12:09, Michael S. Tsirkin wrote:
>> > On Thu, Oct 31, 2019 at 10:46:55AM +0000, Vitaly Mireyno wrote:
>> > > The feature is limited to receive buffers only.
>> > > A driver decides on receive buffer length. The only limitation is that this
>length has to be the same for all receive virtqueue buffers.
>> > > The driver configures receive buffer length to the device during device
>initialization, and the device reads it and may use it for optimal operation.
>> > > No changes for transmit buffers.
>> > >
>> > > -----Original Message-----
>> > > From: virtio-comment@lists.oasis-open.org
>> > > <virtio-comment@lists.oasis-open.org> On Behalf Of Jason Wang
>> > > Sent: Thursday, 31 October, 2019 12:15
>> > > To: Vitaly Mireyno <vmireyno@marvell.com>;
>> > > virtio-comment@lists.oasis-open.org
>> > > Cc: Michael S. Tsirkin <mst@redhat.com>
>> > > Subject: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized
>> > > receive buffers feature
>> > >
>> > >
>> > > On 2019/10/31 下午5:23, Vitaly Mireyno wrote:
>> > > > Some devices benefit from working with receive buffers of a
>predefined constant length.
>> > > > Add a feature for drivers that allocate equal-sized receive buffers, and
>devices that benefit from predefined receive buffers length.
>> > > >
>> > > > Signed-off-by: Vitaly Mireyno <vmireyno@marvell.com>
>> > > > ---
>> > > >    content.tex | 29 +++++++++++++++++++++++++++--
>> > > >    1 file changed, 27 insertions(+), 2 deletions(-)
>> > >
>> > > Is there any other networking device that has this feature?
>> > >
>> > >
>> > > > diff --git a/content.tex b/content.tex index b1ea9b9..c9e67c8
>> > > > 100644
>> > > > --- a/content.tex
>> > > > +++ b/content.tex
>> > > > @@ -2811,6 +2811,10 @@ \subsection{Feature bits}\label{sec:Device
>Types / Network Device / Feature bits
>> > > >    \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address
>through control
>> > > >        channel.
>> > > > +\item[VIRTIO_NET_F_CONST_RXBUF_LEN(58)] Driver allocates all
>receive buffers
>> > > > +    of the same constant length. Device benefits from working with
>> > > > +    receive buffers of equal length.
>> > > > +
>> > Problem is, I don't think linux will use this for skbs since it
>> > breaks buffer accounting. This is because it is important to make
>> > skbs as small as you can. So even if you see "device would benefit"
>> > there is no way to balance this with the benefit to linux.
>> > How do you know which benefit is bigger?
>>
>>
>> I guess the idea is e.g for Linux driver, it can refuse this feature.
>
>Okay. What uses it?
>
>>
>> >
>> > You also never explained how does device benefit. My guess is this
>> > allows device to calculate the # of descriptors to fetch that are
>> > needed for a packet. Right?
>> > Assuming this, I think a rough estimate should be enough.
>> > If device fetches too much it can discard extra, if it does not
>> > fetch enough it can fetch extra.
>> >
>> > Let us not forget that express packets are 256 bytes so up to 16
>> > descriptors fit in a packet, there is no benefit most of the time in
>> > knowing whether e.g. 1 or 2 descriptors are needed.
>> >
>> >
>> > Let us not forget these are buffers, not descriptors.
>>
>>
>> I guess maybe the initial motivation is constant descriptor length.
>
>
>That conflicts with requirement framing is up to driver.
>
>
I was using the wrong terminology. The feature is about constant *descriptor* length. In other words, the value that driver puts in Packed Virtqueue "Element Length" field (or 'len' field in the 'pvirtq_desc' struct).
Does it make more sense now, or you still see an issue with Linux driver?

The motivation is to allow the device to calculate the exact number of descriptors to consume, before fetching the descriptors. This is beneficial for devices for which overconsuming or underconsuming descriptors come at a high cost.
>>
>> >
>> > So device does not really know the exact # of descriptors:
>> > current drivers do 1 descriptor per buffer but nothing prevents
>> > more.
>> >
>> >
>> > Thoughts?
>> >
>> >
>> > > >    \item[VIRTIO_NET_F_RSC_EXT(61)] Device can process duplicated
>ACKs
>> > > >        and report number of coalesced segments and duplicated
>> > > > ACKs @@ -2854,8 +2858,8 @@ \subsubsection{Legacy Interface:
>Feature bits}\label{sec:Device Types / Network
>> > > >    \subsection{Device configuration layout}\label{sec:Device Types /
>Network Device / Device configuration layout}
>> > > >    \label{sec:Device Types / Block Device / Feature bits /
>> > > > Device configuration layout} -Three driver-read-only
>> > > > configuration fields are currently defined.
>> > > > The \field{mac} address field -always exists (though is only
>> > > > valid if VIRTIO_NET_F_MAC is set), and
>> > > > +The driver-read-only \field{mac} address field always exists
>> > > > +(though is only valid if VIRTIO_NET_F_MAC is set), and
>> > > >    \field{status} only exists if VIRTIO_NET_F_STATUS is set. Two
>> > > >    read-only bits (for the driver) are currently defined for the status
>field:
>> > > >    VIRTIO_NET_S_LINK_UP and VIRTIO_NET_S_ANNOUNCE.
>> > > > @@ -2875,12 +2879,17 @@ \subsection{Device configuration
>layout}\label{sec:Device Types / Network Device
>> > > >    VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for
>the driver to
>> > > >    use.
>> > > > +The device-read-only field \field{rx_buf_len} only exists if
>> > >
>> > > Should be driver-read-only.
>> > >
>> > >
>> > > > +VIRTIO_NET_F_CONST_RXBUF_LEN is negotiated. This field
>> > > > +specifies the receive buffers length.
>> > > > +
>> > > >    \begin{lstlisting}
>> > > >    struct virtio_net_config {
>> > > >            u8 mac[6];
>> > > >            le16 status;
>> > > >            le16 max_virtqueue_pairs;
>> > > >            le16 mtu;
>> > > > +        le32 rx_buf_len;
>> > > >    };
>> > > >    \end{lstlisting}
>> > > > @@ -2933,6 +2942,13 @@ \subsection{Device configuration
>> > > > layout}\label{sec:Device Types / Network Device
>> > > >    A driver SHOULD negotiate the VIRTIO_NET_F_STANDBY feature if
>the device offers it.
>> > > > +A driver SHOULD accept the VIRTIO_NET_F_CONST_RXBUF_LEN
>feature if offered.
>> > > > +
>> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been negotiated,
>> > > > +the driver MUST set \field{rx_buf_len}.
>> > >
>> > > I think it's device that set the field?
>> > Makes more sense for the driver, but if you want this set e.g.
>> > before buffers are added, you must say so.
>> >
>> > > > +
>> > > > +A driver MUST NOT modify \field{rx_buf_len} once it has been set.
>> > > > +
>> >
>> > This seems very unflexible. I can see how e.g. XDP would benefit
>> > from big buffers while skbs benefit from small buffers.
>> >
>> > This calls for ability to change this.
>>
>>
>> Yes, but it requires non trivial cleanups for the old length and place
>> them with new ones.
>>
>> Thanks
>
>
>Let's see:
>1	- making buffer smaller: just update config space,
>	  then make new buffers smaller
>
>2	- making buffers bigger: add larger buffers,
>	once all small ones are consumed update config space
>
>2 is tricky I agree. Thoughts?
>
>
Agree. It's doable, provided that the driver will follow the update procedure.
>
>>
>> >
>> > > >    \subsubsection{Legacy Interface: Device configuration
>layout}\label{sec:Device Types / Network Device / Device configuration layout
>/ Legacy Interface: Device configuration layout}
>> > > >    \label{sec:Device Types / Block Device / Feature bits / Device
>configuration layout / Legacy Interface: Device configuration layout}
>> > > >    When using the legacy interface, transitional devices and
>> > > > drivers @@
>> > > > -3241,6 +3257,11 @@ \subsubsection{Setting Up Receive
>Buffers}\label{sec:Device Types / Network Devi
>> > > >    If VIRTIO_NET_F_MQ is negotiated, each of receiveq1\ldots
>receiveqN
>> > > >    that will be used SHOULD be populated with receive buffers.
>> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been negotiated,
>> > > > +the driver MUST initialize all receive virtqueue descriptors
>> > > > +\field{len} field with the value configured in
>> > > > +\field{rx_buf_len} device configuration field, and allocate receive
>buffers accordingly.
>> > > > +
>> > > >    \devicenormative{\paragraph}{Setting Up Receive
>> > > > Buffers}{Device Types / Network Device / Device Operation /
>> > > > Setting Up Receive Buffers}
>> > > >    The device MUST set \field{num_buffers} to the number of
>> > > > descriptors used to @@ -3396,6 +3417,10 @@
>\subsubsection{Processing of Incoming Packets}\label{sec:Device Types /
>Network
>> > > >    checksum (in case of multiple encapsulated protocols, one level
>> > > >    of checksums is validated).
>> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN has been negotiated, the
>device
>> > > > +MAY use \field{rx_buf_len} as a buffer length (instead of
>> > > > +reading it from virtqueue descriptor \field{len} field).
>> > >
>> > > Is this safe? What if driver submit a small buffer, then device can read
>more than what is allowed?
>> > >
>> > > Thanks
>> > >
>> > >
>> > > > +
>> > > >    \drivernormative{\paragraph}{Processing of Incoming
>> > > >    Packets}{Device Types / Network Device / Device Operation /
>> > > >    Processing of Incoming Packets}
>> > > > --
>> > >
>> > > This publicly archived list offers a means to provide input to
>> > > theOASIS Virtual I/O Device (VIRTIO) TC.In order to verify user
>> > > consent to the Feedback License terms andto minimize spam in the
>> > > list archive, subscription is requiredbefore posting.Subscribe:
>> > > virtio-comment-subscribe@lists.oasis-open.orgUnsubscribe:
>> > > virtio-comment-unsubscribe@lists.oasis-open.orgList help:
>> > > virtio-comment-help@lists.oasis-open.orgList archive:
>> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.oasis-2
>> > > Dopen.org_archives_virtio-
>2Dcomment_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7
>> > >
>xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg
>5F7
>> > >
>_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=ZvYuXs3LuhkeJxW_zoADv1L4RisDyG
>H_v
>> > > p7scO91w9s&e= Feedback License:
>> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
>2Do
>> > > pen.org_who_ipr_feedback-
>5Flicense.pdf&d=DwIFaQ&c=nKjWec2b6R0mOyPa
>> > >
>z7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3L
>Rg5
>> > > F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=7Hf-O4ss-
>rLdNJsAbIOg5ruheBcfAYF
>> > > dXWuks_HJfFQ&e= List Guidelines:
>> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
>2Do
>> > > pen.org_policies-2Dguidelines_mailing-
>2Dlists&d=DwIFaQ&c=nKjWec2b6
>> > >
>R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=
>KkXM
>> > >
>gr3LRg5F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=7oMwlBy4fdYwjT2_f9BTw
>HRm
>> > > 2GFtIYL-x8e4C0jlrpM&e= Committee:
>> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
>2Do
>> > >
>pen.org_committees_virtio_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=l
>DH
>> > >
>J2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6uH
>CHA
>> > > 2n_Tn6303zez2N4VkSfU&s=-
>K2akZ06PY7r4TquQY4T4UWdZ6JyAsBDFZIEZFNsWHk
>> > > &e= Join OASIS:
>> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
>2Do
>> > >
>pen.org_join_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lq
>qs
>> > >
>ArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6uHCHA2n_Tn6303
>zez2
>> > > N4VkSfU&s=lR1SlzXT71tjw7DDuD6XaHefJQSTBCAvRcdp45ureB8&e=
>> > >
>> > This publicly archived list offers a means to provide input to the
>> > OASIS Virtual I/O Device (VIRTIO) TC.
>> >
>> > In order to verify user consent to the Feedback License terms and to
>> > minimize spam in the list archive, subscription is required before
>> > posting.
>> >
>> > Subscribe: virtio-comment-subscribe@lists.oasis-open.org
>> > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
>> > List help: virtio-comment-help@lists.oasis-open.org
>> > List archive:
>> > https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.oasis-2Do
>> > pen.org_archives_virtio-
>2Dcomment_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ
>> >
>&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1R
>Td67
>> >
>ln4UY3Qz2I8T0_t4KwDaM0Ow&s=JjvHrIJI6KbQVxPovjCYXudvXtlvyYp3u7l56tc
>gZ
>> > SU&e= Feedback License:
>> > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
>2Dope
>> > n.org_who_ipr_feedback-
>5Flicense.pdf&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xt
>> >
>fQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY
>1RTd
>> > 67ln4UY3Qz2I8T0_t4KwDaM0Ow&s=F-
>1nqbZJyAZ3cYJgplGpxopp7wXt26y2UR97jmR
>> > ZSEQ&e= List Guidelines:
>> > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
>2Dope
>> > n.org_policies-2Dguidelines_mailing-
>2Dlists&d=DwIFaQ&c=nKjWec2b6R0mO
>> >
>yPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfj
>Tktm
>> >
>5RY1RTd67ln4UY3Qz2I8T0_t4KwDaM0Ow&s=0JXzYENjagm_cdHWwpZZ9ApvP
>qBLKHhg
>> > ifl_rDNqEP0&e=
>> > Committee:
>> > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
>2Dope
>> >
>n.org_committees_virtio_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ
>2FW
>> >
>52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3
>Qz2I
>> > 8T0_t4KwDaM0Ow&s=C1VTV-Oeidqxe7sJ-
>ctzZXEHto03VJEKs9BqAt2qObc&e=
>> > Join OASIS:
>> > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
>2Dope
>> >
>n.org_join_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqs
>ArgF
>> >
>Rdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Qz2I8T0_t4Kw
>DaM0O
>> > w&s=SNZGlaiIO4-mL1Ct2MIUCBntMVTf8JUglBIv_RGloCY&e=
>> >
>
>This publicly archived list offers a means to provide input to theOASIS Virtual
>I/O Device (VIRTIO) TC.In order to verify user consent to the Feedback License
>terms andto minimize spam in the list archive, subscription is requiredbefore
>posting.Subscribe: virtio-comment-subscribe@lists.oasis-
>open.orgUnsubscribe: virtio-comment-unsubscribe@lists.oasis-open.orgList
>help: virtio-comment-help@lists.oasis-open.orgList archive:
>https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.oasis-
>2Dopen.org_archives_virtio-
>2Dcomment_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lq
>qsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Qz2I8T0
>_t4KwDaM0Ow&s=JjvHrIJI6KbQVxPovjCYXudvXtlvyYp3u7l56tcgZSU&e=
>Feedback License: https://urldefense.proofpoint.com/v2/url?u=https-
>3A__www.oasis-2Dopen.org_who_ipr_feedback-
>5Flicense.pdf&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqq
>sArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Qz2I8T0_
>t4KwDaM0Ow&s=F-1nqbZJyAZ3cYJgplGpxopp7wXt26y2UR97jmRZSEQ&e=
>List Guidelines: https://urldefense.proofpoint.com/v2/url?u=https-
>3A__www.oasis-2Dopen.org_policies-2Dguidelines_mailing-
>2Dlists&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFR
>dcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Qz2I8T0_t4KwD
>aM0Ow&s=0JXzYENjagm_cdHWwpZZ9ApvPqBLKHhgifl_rDNqEP0&e=
>Committee: https://urldefense.proofpoint.com/v2/url?u=https-
>3A__www.oasis-
>2Dopen.org_committees_virtio_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&
>r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RT
>d67ln4UY3Qz2I8T0_t4KwDaM0Ow&s=C1VTV-Oeidqxe7sJ-
>ctzZXEHto03VJEKs9BqAt2qObc&e= Join OASIS:
>https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
>2Dopen.org_join_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52o
>J3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Qz2I
>8T0_t4KwDaM0Ow&s=SNZGlaiIO4-
>mL1Ct2MIUCBntMVTf8JUglBIv_RGloCY&e=


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature
  2019-11-10 14:13       ` [virtio-comment] RE: [EXT] " Vitaly Mireyno
@ 2019-11-10 15:34         ` Michael S. Tsirkin
  2019-11-11 13:58           ` [virtio-comment] " Vitaly Mireyno
  0 siblings, 1 reply; 28+ messages in thread
From: Michael S. Tsirkin @ 2019-11-10 15:34 UTC (permalink / raw)
  To: Vitaly Mireyno; +Cc: Jason Wang, virtio-comment

On Sun, Nov 10, 2019 at 02:13:14PM +0000, Vitaly Mireyno wrote:
> 
> >-----Original Message-----
> >From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-
> >open.org> On Behalf Of Michael S. Tsirkin
> >Sent: Tuesday, 5 November, 2019 20:52
> >To: Jason Wang <jasowang@redhat.com>
> >Cc: Vitaly Mireyno <vmireyno@marvell.com>; virtio-comment@lists.oasis-
> >open.org
> >Subject: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized
> >receive buffers feature
> >
> >External Email
> >
> >----------------------------------------------------------------------
> >On Fri, Nov 01, 2019 at 03:22:26PM +0800, Jason Wang wrote:
> >>
> >> On 2019/11/1 下午12:09, Michael S. Tsirkin wrote:
> >> > On Thu, Oct 31, 2019 at 10:46:55AM +0000, Vitaly Mireyno wrote:
> >> > > The feature is limited to receive buffers only.
> >> > > A driver decides on receive buffer length. The only limitation is that this
> >length has to be the same for all receive virtqueue buffers.
> >> > > The driver configures receive buffer length to the device during device
> >initialization, and the device reads it and may use it for optimal operation.
> >> > > No changes for transmit buffers.
> >> > >
> >> > > -----Original Message-----
> >> > > From: virtio-comment@lists.oasis-open.org
> >> > > <virtio-comment@lists.oasis-open.org> On Behalf Of Jason Wang
> >> > > Sent: Thursday, 31 October, 2019 12:15
> >> > > To: Vitaly Mireyno <vmireyno@marvell.com>;
> >> > > virtio-comment@lists.oasis-open.org
> >> > > Cc: Michael S. Tsirkin <mst@redhat.com>
> >> > > Subject: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized
> >> > > receive buffers feature
> >> > >
> >> > >
> >> > > On 2019/10/31 下午5:23, Vitaly Mireyno wrote:
> >> > > > Some devices benefit from working with receive buffers of a
> >predefined constant length.
> >> > > > Add a feature for drivers that allocate equal-sized receive buffers, and
> >devices that benefit from predefined receive buffers length.
> >> > > >
> >> > > > Signed-off-by: Vitaly Mireyno <vmireyno@marvell.com>
> >> > > > ---
> >> > > >    content.tex | 29 +++++++++++++++++++++++++++--
> >> > > >    1 file changed, 27 insertions(+), 2 deletions(-)
> >> > >
> >> > > Is there any other networking device that has this feature?
> >> > >
> >> > >
> >> > > > diff --git a/content.tex b/content.tex index b1ea9b9..c9e67c8
> >> > > > 100644
> >> > > > --- a/content.tex
> >> > > > +++ b/content.tex
> >> > > > @@ -2811,6 +2811,10 @@ \subsection{Feature bits}\label{sec:Device
> >Types / Network Device / Feature bits
> >> > > >    \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address
> >through control
> >> > > >        channel.
> >> > > > +\item[VIRTIO_NET_F_CONST_RXBUF_LEN(58)] Driver allocates all
> >receive buffers
> >> > > > +    of the same constant length. Device benefits from working with
> >> > > > +    receive buffers of equal length.
> >> > > > +
> >> > Problem is, I don't think linux will use this for skbs since it
> >> > breaks buffer accounting. This is because it is important to make
> >> > skbs as small as you can. So even if you see "device would benefit"
> >> > there is no way to balance this with the benefit to linux.
> >> > How do you know which benefit is bigger?
> >>
> >>
> >> I guess the idea is e.g for Linux driver, it can refuse this feature.
> >
> >Okay. What uses it?
> >
> >>
> >> >
> >> > You also never explained how does device benefit. My guess is this
> >> > allows device to calculate the # of descriptors to fetch that are
> >> > needed for a packet. Right?
> >> > Assuming this, I think a rough estimate should be enough.
> >> > If device fetches too much it can discard extra, if it does not
> >> > fetch enough it can fetch extra.
> >> >
> >> > Let us not forget that express packets are 256 bytes so up to 16
> >> > descriptors fit in a packet, there is no benefit most of the time in
> >> > knowing whether e.g. 1 or 2 descriptors are needed.
> >> >
> >> >
> >> > Let us not forget these are buffers, not descriptors.
> >>
> >>
> >> I guess maybe the initial motivation is constant descriptor length.
> >
> >
> >That conflicts with requirement framing is up to driver.
> >
> >
> I was using the wrong terminology. The feature is about constant *descriptor* length. In other words, the value that driver puts in Packed Virtqueue "Element Length" field (or 'len' field in the 'pvirtq_desc' struct).

OK so this conflicts with "2.6.4 Message Framing" which requires that
drivers can split buffers into as many descriptors as they like.
Right?

> Does it make more sense now, or you still see an issue with Linux driver?

I think there's an issue with the flexible framing requirements
and there's an issue with Linux driver.

> The motivation is to allow the device to calculate the exact number of descriptors to consume, before fetching the descriptors. This is beneficial for devices for which overconsuming or underconsuming descriptors come at a high cost.

So I guess we can agree getting the # of descriptors *exactly* right
isn't all that important. Right? My question is how does the driver
balance the device versus Linux needs?

One idea is if we assume this is best effort, does not have to be
exact, then how about just having the device assume descriptor
sizes stay more or less constant and use that to estimate the
amount? If it under/over estimates, things just go a bit slower.
This way driver can adjust the sizes and device will react
automatically, with time.


> >>
> >> >
> >> > So device does not really know the exact # of descriptors:
> >> > current drivers do 1 descriptor per buffer but nothing prevents
> >> > more.
> >> >
> >> >
> >> > Thoughts?
> >> >
> >> >
> >> > > >    \item[VIRTIO_NET_F_RSC_EXT(61)] Device can process duplicated
> >ACKs
> >> > > >        and report number of coalesced segments and duplicated
> >> > > > ACKs @@ -2854,8 +2858,8 @@ \subsubsection{Legacy Interface:
> >Feature bits}\label{sec:Device Types / Network
> >> > > >    \subsection{Device configuration layout}\label{sec:Device Types /
> >Network Device / Device configuration layout}
> >> > > >    \label{sec:Device Types / Block Device / Feature bits /
> >> > > > Device configuration layout} -Three driver-read-only
> >> > > > configuration fields are currently defined.
> >> > > > The \field{mac} address field -always exists (though is only
> >> > > > valid if VIRTIO_NET_F_MAC is set), and
> >> > > > +The driver-read-only \field{mac} address field always exists
> >> > > > +(though is only valid if VIRTIO_NET_F_MAC is set), and
> >> > > >    \field{status} only exists if VIRTIO_NET_F_STATUS is set. Two
> >> > > >    read-only bits (for the driver) are currently defined for the status
> >field:
> >> > > >    VIRTIO_NET_S_LINK_UP and VIRTIO_NET_S_ANNOUNCE.
> >> > > > @@ -2875,12 +2879,17 @@ \subsection{Device configuration
> >layout}\label{sec:Device Types / Network Device
> >> > > >    VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for
> >the driver to
> >> > > >    use.
> >> > > > +The device-read-only field \field{rx_buf_len} only exists if
> >> > >
> >> > > Should be driver-read-only.
> >> > >
> >> > >
> >> > > > +VIRTIO_NET_F_CONST_RXBUF_LEN is negotiated. This field
> >> > > > +specifies the receive buffers length.
> >> > > > +
> >> > > >    \begin{lstlisting}
> >> > > >    struct virtio_net_config {
> >> > > >            u8 mac[6];
> >> > > >            le16 status;
> >> > > >            le16 max_virtqueue_pairs;
> >> > > >            le16 mtu;
> >> > > > +        le32 rx_buf_len;
> >> > > >    };
> >> > > >    \end{lstlisting}
> >> > > > @@ -2933,6 +2942,13 @@ \subsection{Device configuration
> >> > > > layout}\label{sec:Device Types / Network Device
> >> > > >    A driver SHOULD negotiate the VIRTIO_NET_F_STANDBY feature if
> >the device offers it.
> >> > > > +A driver SHOULD accept the VIRTIO_NET_F_CONST_RXBUF_LEN
> >feature if offered.
> >> > > > +
> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been negotiated,
> >> > > > +the driver MUST set \field{rx_buf_len}.
> >> > >
> >> > > I think it's device that set the field?
> >> > Makes more sense for the driver, but if you want this set e.g.
> >> > before buffers are added, you must say so.
> >> >
> >> > > > +
> >> > > > +A driver MUST NOT modify \field{rx_buf_len} once it has been set.
> >> > > > +
> >> >
> >> > This seems very unflexible. I can see how e.g. XDP would benefit
> >> > from big buffers while skbs benefit from small buffers.
> >> >
> >> > This calls for ability to change this.
> >>
> >>
> >> Yes, but it requires non trivial cleanups for the old length and place
> >> them with new ones.
> >>
> >> Thanks
> >
> >
> >Let's see:
> >1	- making buffer smaller: just update config space,
> >	  then make new buffers smaller
> >
> >2	- making buffers bigger: add larger buffers,
> >	once all small ones are consumed update config space
> >
> >2 is tricky I agree. Thoughts?
> >
> >
> Agree. It's doable, provided that the driver will follow the update procedure.
> >
> >>
> >> >
> >> > > >    \subsubsection{Legacy Interface: Device configuration
> >layout}\label{sec:Device Types / Network Device / Device configuration layout
> >/ Legacy Interface: Device configuration layout}
> >> > > >    \label{sec:Device Types / Block Device / Feature bits / Device
> >configuration layout / Legacy Interface: Device configuration layout}
> >> > > >    When using the legacy interface, transitional devices and
> >> > > > drivers @@
> >> > > > -3241,6 +3257,11 @@ \subsubsection{Setting Up Receive
> >Buffers}\label{sec:Device Types / Network Devi
> >> > > >    If VIRTIO_NET_F_MQ is negotiated, each of receiveq1\ldots
> >receiveqN
> >> > > >    that will be used SHOULD be populated with receive buffers.
> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been negotiated,
> >> > > > +the driver MUST initialize all receive virtqueue descriptors
> >> > > > +\field{len} field with the value configured in
> >> > > > +\field{rx_buf_len} device configuration field, and allocate receive
> >buffers accordingly.
> >> > > > +
> >> > > >    \devicenormative{\paragraph}{Setting Up Receive
> >> > > > Buffers}{Device Types / Network Device / Device Operation /
> >> > > > Setting Up Receive Buffers}
> >> > > >    The device MUST set \field{num_buffers} to the number of
> >> > > > descriptors used to @@ -3396,6 +3417,10 @@
> >\subsubsection{Processing of Incoming Packets}\label{sec:Device Types /
> >Network
> >> > > >    checksum (in case of multiple encapsulated protocols, one level
> >> > > >    of checksums is validated).
> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN has been negotiated, the
> >device
> >> > > > +MAY use \field{rx_buf_len} as a buffer length (instead of
> >> > > > +reading it from virtqueue descriptor \field{len} field).
> >> > >
> >> > > Is this safe? What if driver submit a small buffer, then device can read
> >more than what is allowed?
> >> > >
> >> > > Thanks
> >> > >
> >> > >
> >> > > > +
> >> > > >    \drivernormative{\paragraph}{Processing of Incoming
> >> > > >    Packets}{Device Types / Network Device / Device Operation /
> >> > > >    Processing of Incoming Packets}
> >> > > > --
> >> > >
> >> > > This publicly archived list offers a means to provide input to
> >> > > theOASIS Virtual I/O Device (VIRTIO) TC.In order to verify user
> >> > > consent to the Feedback License terms andto minimize spam in the
> >> > > list archive, subscription is requiredbefore posting.Subscribe:
> >> > > virtio-comment-subscribe@lists.oasis-open.orgUnsubscribe:
> >> > > virtio-comment-unsubscribe@lists.oasis-open.orgList help:
> >> > > virtio-comment-help@lists.oasis-open.orgList archive:
> >> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.oasis-2
> >> > > Dopen.org_archives_virtio-
> >2Dcomment_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7
> >> > >
> >xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg
> >5F7
> >> > >
> >_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=ZvYuXs3LuhkeJxW_zoADv1L4RisDyG
> >H_v
> >> > > p7scO91w9s&e= Feedback License:
> >> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
> >2Do
> >> > > pen.org_who_ipr_feedback-
> >5Flicense.pdf&d=DwIFaQ&c=nKjWec2b6R0mOyPa
> >> > >
> >z7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3L
> >Rg5
> >> > > F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=7Hf-O4ss-
> >rLdNJsAbIOg5ruheBcfAYF
> >> > > dXWuks_HJfFQ&e= List Guidelines:
> >> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
> >2Do
> >> > > pen.org_policies-2Dguidelines_mailing-
> >2Dlists&d=DwIFaQ&c=nKjWec2b6
> >> > >
> >R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=
> >KkXM
> >> > >
> >gr3LRg5F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=7oMwlBy4fdYwjT2_f9BTw
> >HRm
> >> > > 2GFtIYL-x8e4C0jlrpM&e= Committee:
> >> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
> >2Do
> >> > >
> >pen.org_committees_virtio_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=l
> >DH
> >> > >
> >J2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6uH
> >CHA
> >> > > 2n_Tn6303zez2N4VkSfU&s=-
> >K2akZ06PY7r4TquQY4T4UWdZ6JyAsBDFZIEZFNsWHk
> >> > > &e= Join OASIS:
> >> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
> >2Do
> >> > >
> >pen.org_join_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lq
> >qs
> >> > >
> >ArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6uHCHA2n_Tn6303
> >zez2
> >> > > N4VkSfU&s=lR1SlzXT71tjw7DDuD6XaHefJQSTBCAvRcdp45ureB8&e=
> >> > >
> >> > This publicly archived list offers a means to provide input to the
> >> > OASIS Virtual I/O Device (VIRTIO) TC.
> >> >
> >> > In order to verify user consent to the Feedback License terms and to
> >> > minimize spam in the list archive, subscription is required before
> >> > posting.
> >> >
> >> > Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> >> > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> >> > List help: virtio-comment-help@lists.oasis-open.org
> >> > List archive:
> >> > https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.oasis-2Do
> >> > pen.org_archives_virtio-
> >2Dcomment_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ
> >> >
> >&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1R
> >Td67
> >> >
> >ln4UY3Qz2I8T0_t4KwDaM0Ow&s=JjvHrIJI6KbQVxPovjCYXudvXtlvyYp3u7l56tc
> >gZ
> >> > SU&e= Feedback License:
> >> > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
> >2Dope
> >> > n.org_who_ipr_feedback-
> >5Flicense.pdf&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xt
> >> >
> >fQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY
> >1RTd
> >> > 67ln4UY3Qz2I8T0_t4KwDaM0Ow&s=F-
> >1nqbZJyAZ3cYJgplGpxopp7wXt26y2UR97jmR
> >> > ZSEQ&e= List Guidelines:
> >> > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
> >2Dope
> >> > n.org_policies-2Dguidelines_mailing-
> >2Dlists&d=DwIFaQ&c=nKjWec2b6R0mO
> >> >
> >yPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfj
> >Tktm
> >> >
> >5RY1RTd67ln4UY3Qz2I8T0_t4KwDaM0Ow&s=0JXzYENjagm_cdHWwpZZ9ApvP
> >qBLKHhg
> >> > ifl_rDNqEP0&e=
> >> > Committee:
> >> > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
> >2Dope
> >> >
> >n.org_committees_virtio_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ
> >2FW
> >> >
> >52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3
> >Qz2I
> >> > 8T0_t4KwDaM0Ow&s=C1VTV-Oeidqxe7sJ-
> >ctzZXEHto03VJEKs9BqAt2qObc&e=
> >> > Join OASIS:
> >> > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
> >2Dope
> >> >
> >n.org_join_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqs
> >ArgF
> >> >
> >Rdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Qz2I8T0_t4Kw
> >DaM0O
> >> > w&s=SNZGlaiIO4-mL1Ct2MIUCBntMVTf8JUglBIv_RGloCY&e=
> >> >
> >
> >This publicly archived list offers a means to provide input to theOASIS Virtual
> >I/O Device (VIRTIO) TC.In order to verify user consent to the Feedback License
> >terms andto minimize spam in the list archive, subscription is requiredbefore
> >posting.Subscribe: virtio-comment-subscribe@lists.oasis-
> >open.orgUnsubscribe: virtio-comment-unsubscribe@lists.oasis-open.orgList
> >help: virtio-comment-help@lists.oasis-open.orgList archive:
> >https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.oasis-
> >2Dopen.org_archives_virtio-
> >2Dcomment_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lq
> >qsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Qz2I8T0
> >_t4KwDaM0Ow&s=JjvHrIJI6KbQVxPovjCYXudvXtlvyYp3u7l56tcgZSU&e=
> >Feedback License: https://urldefense.proofpoint.com/v2/url?u=https-
> >3A__www.oasis-2Dopen.org_who_ipr_feedback-
> >5Flicense.pdf&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqq
> >sArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Qz2I8T0_
> >t4KwDaM0Ow&s=F-1nqbZJyAZ3cYJgplGpxopp7wXt26y2UR97jmRZSEQ&e=
> >List Guidelines: https://urldefense.proofpoint.com/v2/url?u=https-
> >3A__www.oasis-2Dopen.org_policies-2Dguidelines_mailing-
> >2Dlists&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFR
> >dcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Qz2I8T0_t4KwD
> >aM0Ow&s=0JXzYENjagm_cdHWwpZZ9ApvPqBLKHhgifl_rDNqEP0&e=
> >Committee: https://urldefense.proofpoint.com/v2/url?u=https-
> >3A__www.oasis-
> >2Dopen.org_committees_virtio_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&
> >r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RT
> >d67ln4UY3Qz2I8T0_t4KwDaM0Ow&s=C1VTV-Oeidqxe7sJ-
> >ctzZXEHto03VJEKs9BqAt2qObc&e= Join OASIS:
> >https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
> >2Dopen.org_join_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52o
> >J3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Qz2I
> >8T0_t4KwDaM0Ow&s=SNZGlaiIO4-
> >mL1Ct2MIUCBntMVTf8JUglBIv_RGloCY&e=
> 


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [virtio-comment] RE: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature
  2019-11-10 15:34         ` [virtio-comment] " Michael S. Tsirkin
@ 2019-11-11 13:58           ` Vitaly Mireyno
  2019-11-11 15:05             ` [virtio-comment] " Michael S. Tsirkin
  0 siblings, 1 reply; 28+ messages in thread
From: Vitaly Mireyno @ 2019-11-11 13:58 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Jason Wang, virtio-comment



>-----Original Message-----
>From: Michael S. Tsirkin <mst@redhat.com>
>Sent: Sunday, 10 November, 2019 17:35
>To: Vitaly Mireyno <vmireyno@marvell.com>
>Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
>open.org
>Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-
>sized receive buffers feature
>
>On Sun, Nov 10, 2019 at 02:13:14PM +0000, Vitaly Mireyno wrote:
>>
>> >-----Original Message-----
>> >From: virtio-comment@lists.oasis-open.org
>> ><virtio-comment@lists.oasis- open.org> On Behalf Of Michael S.
>> >Tsirkin
>> >Sent: Tuesday, 5 November, 2019 20:52
>> >To: Jason Wang <jasowang@redhat.com>
>> >Cc: Vitaly Mireyno <vmireyno@marvell.com>;
>> >virtio-comment@lists.oasis- open.org
>> >Subject: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add
>> >equal-sized receive buffers feature
>> >
>> >External Email
>> >
>> >---------------------------------------------------------------------
>> >- On Fri, Nov 01, 2019 at 03:22:26PM +0800, Jason Wang wrote:
>> >>
>> >> On 2019/11/1 下午12:09, Michael S. Tsirkin wrote:
>> >> > On Thu, Oct 31, 2019 at 10:46:55AM +0000, Vitaly Mireyno wrote:
>> >> > > The feature is limited to receive buffers only.
>> >> > > A driver decides on receive buffer length. The only limitation
>> >> > > is that this
>> >length has to be the same for all receive virtqueue buffers.
>> >> > > The driver configures receive buffer length to the device
>> >> > > during device
>> >initialization, and the device reads it and may use it for optimal operation.
>> >> > > No changes for transmit buffers.
>> >> > >
>> >> > > -----Original Message-----
>> >> > > From: virtio-comment@lists.oasis-open.org
>> >> > > <virtio-comment@lists.oasis-open.org> On Behalf Of Jason Wang
>> >> > > Sent: Thursday, 31 October, 2019 12:15
>> >> > > To: Vitaly Mireyno <vmireyno@marvell.com>;
>> >> > > virtio-comment@lists.oasis-open.org
>> >> > > Cc: Michael S. Tsirkin <mst@redhat.com>
>> >> > > Subject: [virtio-comment] Re: [PATCH] virtio-net: Add
>> >> > > equal-sized receive buffers feature
>> >> > >
>> >> > >
>> >> > > On 2019/10/31 下午5:23, Vitaly Mireyno wrote:
>> >> > > > Some devices benefit from working with receive buffers of a
>> >predefined constant length.
>> >> > > > Add a feature for drivers that allocate equal-sized receive
>> >> > > > buffers, and
>> >devices that benefit from predefined receive buffers length.
>> >> > > >
>> >> > > > Signed-off-by: Vitaly Mireyno <vmireyno@marvell.com>
>> >> > > > ---
>> >> > > >    content.tex | 29 +++++++++++++++++++++++++++--
>> >> > > >    1 file changed, 27 insertions(+), 2 deletions(-)
>> >> > >
>> >> > > Is there any other networking device that has this feature?
>> >> > >
>> >> > >
>> >> > > > diff --git a/content.tex b/content.tex index b1ea9b9..c9e67c8
>> >> > > > 100644
>> >> > > > --- a/content.tex
>> >> > > > +++ b/content.tex
>> >> > > > @@ -2811,6 +2811,10 @@ \subsection{Feature
>> >> > > > bits}\label{sec:Device
>> >Types / Network Device / Feature bits
>> >> > > >    \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address
>> >through control
>> >> > > >        channel.
>> >> > > > +\item[VIRTIO_NET_F_CONST_RXBUF_LEN(58)] Driver allocates all
>> >receive buffers
>> >> > > > +    of the same constant length. Device benefits from working with
>> >> > > > +    receive buffers of equal length.
>> >> > > > +
>> >> > Problem is, I don't think linux will use this for skbs since it
>> >> > breaks buffer accounting. This is because it is important to make
>> >> > skbs as small as you can. So even if you see "device would benefit"
>> >> > there is no way to balance this with the benefit to linux.
>> >> > How do you know which benefit is bigger?
>> >>
>> >>
>> >> I guess the idea is e.g for Linux driver, it can refuse this feature.
>> >
>> >Okay. What uses it?
>> >
>> >>
>> >> >
>> >> > You also never explained how does device benefit. My guess is
>> >> > this allows device to calculate the # of descriptors to fetch
>> >> > that are needed for a packet. Right?
>> >> > Assuming this, I think a rough estimate should be enough.
>> >> > If device fetches too much it can discard extra, if it does not
>> >> > fetch enough it can fetch extra.
>> >> >
>> >> > Let us not forget that express packets are 256 bytes so up to 16
>> >> > descriptors fit in a packet, there is no benefit most of the time
>> >> > in knowing whether e.g. 1 or 2 descriptors are needed.
>> >> >
>> >> >
>> >> > Let us not forget these are buffers, not descriptors.
>> >>
>> >>
>> >> I guess maybe the initial motivation is constant descriptor length.
>> >
>> >
>> >That conflicts with requirement framing is up to driver.
>> >
>> >
>> I was using the wrong terminology. The feature is about constant
>*descriptor* length. In other words, the value that driver puts in Packed
>Virtqueue "Element Length" field (or 'len' field in the 'pvirtq_desc' struct).
>
>OK so this conflicts with "2.6.4 Message Framing" which requires that drivers
>can split buffers into as many descriptors as they like.
>Right?
>
>> Does it make more sense now, or you still see an issue with Linux driver?
>
>I think there's an issue with the flexible framing requirements and there's an
>issue with Linux driver.
>
>> The motivation is to allow the device to calculate the exact number of
>descriptors to consume, before fetching the descriptors. This is beneficial for
>devices for which overconsuming or underconsuming descriptors come at a
>high cost.
>
>So I guess we can agree getting the # of descriptors *exactly* right isn't all
>that important. Right? My question is how does the driver balance the device
>versus Linux needs?
>
>One idea is if we assume this is best effort, does not have to be exact, then
>how about just having the device assume descriptor sizes stay more or less
>constant and use that to estimate the amount? If it under/over estimates,
>things just go a bit slower.
>This way driver can adjust the sizes and device will react automatically, with
>time.
>
>

I see that "2.6.4 Message Framing" allows a full flexibility of descriptor lengths, and I presume it's applicable to Packet Virtqueues as well, though defined under Split Virtqueues section. The example talks about transmit virtqueue, and it makes perfect sense.
However, wouldn't a typical driver place equal-sized *receive* descriptors anyway? So if a device can benefit from it, the driver might as well negotiate this new feature. This could be especially relevant for devices, for which adjusting the number of used descriptors is impractical after descriptors have already been fetched.
I agree that if this requirement conflicts with specific driver needs, it will not be negotiated, and the device will either underperform in certain scenarios, or will not come up at all.

Could you specify what issue do you see with the Linux driver?

>> >>
>> >> >
>> >> > So device does not really know the exact # of descriptors:
>> >> > current drivers do 1 descriptor per buffer but nothing prevents
>> >> > more.
>> >> >
>> >> >
>> >> > Thoughts?
>> >> >
>> >> >
>> >> > > >    \item[VIRTIO_NET_F_RSC_EXT(61)] Device can process
>> >> > > > duplicated
>> >ACKs
>> >> > > >        and report number of coalesced segments and duplicated
>> >> > > > ACKs @@ -2854,8 +2858,8 @@ \subsubsection{Legacy Interface:
>> >Feature bits}\label{sec:Device Types / Network
>> >> > > >    \subsection{Device configuration layout}\label{sec:Device
>> >> > > > Types /
>> >Network Device / Device configuration layout}
>> >> > > >    \label{sec:Device Types / Block Device / Feature bits /
>> >> > > > Device configuration layout} -Three driver-read-only
>> >> > > > configuration fields are currently defined.
>> >> > > > The \field{mac} address field -always exists (though is only
>> >> > > > valid if VIRTIO_NET_F_MAC is set), and
>> >> > > > +The driver-read-only \field{mac} address field always exists
>> >> > > > +(though is only valid if VIRTIO_NET_F_MAC is set), and
>> >> > > >    \field{status} only exists if VIRTIO_NET_F_STATUS is set. Two
>> >> > > >    read-only bits (for the driver) are currently defined for
>> >> > > > the status
>> >field:
>> >> > > >    VIRTIO_NET_S_LINK_UP and VIRTIO_NET_S_ANNOUNCE.
>> >> > > > @@ -2875,12 +2879,17 @@ \subsection{Device configuration
>> >layout}\label{sec:Device Types / Network Device
>> >> > > >    VIRTIO_NET_F_MTU is set. This field specifies the maximum
>> >> > > > MTU for
>> >the driver to
>> >> > > >    use.
>> >> > > > +The device-read-only field \field{rx_buf_len} only exists if
>> >> > >
>> >> > > Should be driver-read-only.
>> >> > >
>> >> > >
>> >> > > > +VIRTIO_NET_F_CONST_RXBUF_LEN is negotiated. This field
>> >> > > > +specifies the receive buffers length.
>> >> > > > +
>> >> > > >    \begin{lstlisting}
>> >> > > >    struct virtio_net_config {
>> >> > > >            u8 mac[6];
>> >> > > >            le16 status;
>> >> > > >            le16 max_virtqueue_pairs;
>> >> > > >            le16 mtu;
>> >> > > > +        le32 rx_buf_len;
>> >> > > >    };
>> >> > > >    \end{lstlisting}
>> >> > > > @@ -2933,6 +2942,13 @@ \subsection{Device configuration
>> >> > > > layout}\label{sec:Device Types / Network Device
>> >> > > >    A driver SHOULD negotiate the VIRTIO_NET_F_STANDBY feature
>> >> > > > if
>> >the device offers it.
>> >> > > > +A driver SHOULD accept the VIRTIO_NET_F_CONST_RXBUF_LEN
>> >feature if offered.
>> >> > > > +
>> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been
>negotiated,
>> >> > > > +the driver MUST set \field{rx_buf_len}.
>> >> > >
>> >> > > I think it's device that set the field?
>> >> > Makes more sense for the driver, but if you want this set e.g.
>> >> > before buffers are added, you must say so.
>> >> >
>> >> > > > +
>> >> > > > +A driver MUST NOT modify \field{rx_buf_len} once it has been
>set.
>> >> > > > +
>> >> >
>> >> > This seems very unflexible. I can see how e.g. XDP would benefit
>> >> > from big buffers while skbs benefit from small buffers.
>> >> >
>> >> > This calls for ability to change this.
>> >>
>> >>
>> >> Yes, but it requires non trivial cleanups for the old length and
>> >> place them with new ones.
>> >>
>> >> Thanks
>> >
>> >
>> >Let's see:
>> >1	- making buffer smaller: just update config space,
>> >	  then make new buffers smaller
>> >
>> >2	- making buffers bigger: add larger buffers,
>> >	once all small ones are consumed update config space
>> >
>> >2 is tricky I agree. Thoughts?
>> >
>> >
>> Agree. It's doable, provided that the driver will follow the update
>procedure.
>> >
>> >>
>> >> >
>> >> > > >    \subsubsection{Legacy Interface: Device configuration
>> >layout}\label{sec:Device Types / Network Device / Device
>> >configuration layout / Legacy Interface: Device configuration layout}
>> >> > > >    \label{sec:Device Types / Block Device / Feature bits /
>> >> > > > Device
>> >configuration layout / Legacy Interface: Device configuration layout}
>> >> > > >    When using the legacy interface, transitional devices and
>> >> > > > drivers @@
>> >> > > > -3241,6 +3257,11 @@ \subsubsection{Setting Up Receive
>> >Buffers}\label{sec:Device Types / Network Devi
>> >> > > >    If VIRTIO_NET_F_MQ is negotiated, each of receiveq1\ldots
>> >receiveqN
>> >> > > >    that will be used SHOULD be populated with receive buffers.
>> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been
>negotiated,
>> >> > > > +the driver MUST initialize all receive virtqueue descriptors
>> >> > > > +\field{len} field with the value configured in
>> >> > > > +\field{rx_buf_len} device configuration field, and allocate
>> >> > > > +receive
>> >buffers accordingly.
>> >> > > > +
>> >> > > >    \devicenormative{\paragraph}{Setting Up Receive
>> >> > > > Buffers}{Device Types / Network Device / Device Operation /
>> >> > > > Setting Up Receive Buffers}
>> >> > > >    The device MUST set \field{num_buffers} to the number of
>> >> > > > descriptors used to @@ -3396,6 +3417,10 @@
>> >\subsubsection{Processing of Incoming Packets}\label{sec:Device Types
>> >/ Network
>> >> > > >    checksum (in case of multiple encapsulated protocols, one level
>> >> > > >    of checksums is validated).
>> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN has been negotiated, the
>> >device
>> >> > > > +MAY use \field{rx_buf_len} as a buffer length (instead of
>> >> > > > +reading it from virtqueue descriptor \field{len} field).
>> >> > >
>> >> > > Is this safe? What if driver submit a small buffer, then device
>> >> > > can read
>> >more than what is allowed?
>> >> > >
>> >> > > Thanks
>> >> > >
>> >> > >
>> >> > > > +
>> >> > > >    \drivernormative{\paragraph}{Processing of Incoming
>> >> > > >    Packets}{Device Types / Network Device / Device Operation /
>> >> > > >    Processing of Incoming Packets}
>> >> > > > --
>> >> > >
>> >> > > This publicly archived list offers a means to provide input to
>> >> > > theOASIS Virtual I/O Device (VIRTIO) TC.In order to verify user
>> >> > > consent to the Feedback License terms andto minimize spam in
>> >> > > the list archive, subscription is requiredbefore posting.Subscribe:
>> >> > > virtio-comment-subscribe@lists.oasis-open.orgUnsubscribe:
>> >> > > virtio-comment-unsubscribe@lists.oasis-open.orgList help:
>> >> > > virtio-comment-help@lists.oasis-open.orgList archive:
>> >> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.oasi
>> >> > > s-2
>> >> > > Dopen.org_archives_virtio-
>> >2Dcomment_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7
>> >> > >
>>
>>xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LR
>g
>> >5F7
>> >> > >
>>
>>_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=ZvYuXs3LuhkeJxW_zoADv1L4RisDy
>G
>> >H_v
>> >> > > p7scO91w9s&e= Feedback License:
>> >> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
>> >2Do
>> >> > > pen.org_who_ipr_feedback-
>> >5Flicense.pdf&d=DwIFaQ&c=nKjWec2b6R0mOyPa
>> >> > >
>>
>>z7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3
>L
>> >Rg5
>> >> > > F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=7Hf-O4ss-
>> >rLdNJsAbIOg5ruheBcfAYF
>> >> > > dXWuks_HJfFQ&e= List Guidelines:
>> >> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
>> >2Do
>> >> > > pen.org_policies-2Dguidelines_mailing-
>> >2Dlists&d=DwIFaQ&c=nKjWec2b6
>> >> > >
>>
>>R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m
>=
>> >KkXM
>> >> > >
>>
>>gr3LRg5F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=7oMwlBy4fdYwjT2_f9BT
>w
>> >HRm
>> >> > > 2GFtIYL-x8e4C0jlrpM&e= Committee:
>> >> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
>> >2Do
>> >> > >
>>
>>pen.org_committees_virtio_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=l
>> >DH
>> >> > >
>>
>>J2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6u
>H
>> >CHA
>> >> > > 2n_Tn6303zez2N4VkSfU&s=-
>> >K2akZ06PY7r4TquQY4T4UWdZ6JyAsBDFZIEZFNsWHk
>> >> > > &e= Join OASIS:
>> >> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
>> >2Do
>> >> > >
>>
>>pen.org_join_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3l
>q
>> >qs
>> >> > >
>>
>>ArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6uHCHA2n_Tn630
>3
>> >zez2
>> >> > > N4VkSfU&s=lR1SlzXT71tjw7DDuD6XaHefJQSTBCAvRcdp45ureB8&e=
>> >> > >
>> >> > This publicly archived list offers a means to provide input to
>> >> > the OASIS Virtual I/O Device (VIRTIO) TC.
>> >> >
>> >> > In order to verify user consent to the Feedback License terms and
>> >> > to minimize spam in the list archive, subscription is required
>> >> > before posting.
>> >> >
>> >> > Subscribe: virtio-comment-subscribe@lists.oasis-open.org
>> >> > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
>> >> > List help: virtio-comment-help@lists.oasis-open.org
>> >> > List archive:
>> >> > https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.oasis-
>> >> > 2Do
>> >> > pen.org_archives_virtio-
>> >2Dcomment_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ
>> >> >
>>
>>&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1
>R
>> >Td67
>> >> >
>>
>>ln4UY3Qz2I8T0_t4KwDaM0Ow&s=JjvHrIJI6KbQVxPovjCYXudvXtlvyYp3u7l56t
>c
>> >gZ
>> >> > SU&e= Feedback License:
>> >> > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
>> >2Dope
>> >> > n.org_who_ipr_feedback-
>> >5Flicense.pdf&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xt
>> >> >
>>
>>fQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5
>RY
>> >1RTd
>> >> > 67ln4UY3Qz2I8T0_t4KwDaM0Ow&s=F-
>> >1nqbZJyAZ3cYJgplGpxopp7wXt26y2UR97jmR
>> >> > ZSEQ&e= List Guidelines:
>> >> > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
>> >2Dope
>> >> > n.org_policies-2Dguidelines_mailing-
>> >2Dlists&d=DwIFaQ&c=nKjWec2b6R0mO
>> >> >
>>
>>yPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcf
>j
>> >Tktm
>> >> >
>>
>>5RY1RTd67ln4UY3Qz2I8T0_t4KwDaM0Ow&s=0JXzYENjagm_cdHWwpZZ9Apv
>P
>> >qBLKHhg
>> >> > ifl_rDNqEP0&e=
>> >> > Committee:
>> >> > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
>> >2Dope
>> >> >
>>
>>n.org_committees_virtio_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lD
>HJ
>> >2FW
>> >> >
>>
>>52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY
>3
>> >Qz2I
>> >> > 8T0_t4KwDaM0Ow&s=C1VTV-Oeidqxe7sJ-
>> >ctzZXEHto03VJEKs9BqAt2qObc&e=
>> >> > Join OASIS:
>> >> > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
>> >2Dope
>> >> >
>>
>>n.org_join_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqq
>s
>> >ArgF
>> >> >
>>
>>Rdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Qz2I8T0_t4K
>w
>> >DaM0O
>> >> > w&s=SNZGlaiIO4-mL1Ct2MIUCBntMVTf8JUglBIv_RGloCY&e=
>> >> >
>> >
>> >This publicly archived list offers a means to provide input to
>> >theOASIS Virtual I/O Device (VIRTIO) TC.In order to verify user
>> >consent to the Feedback License terms andto minimize spam in the list
>> >archive, subscription is requiredbefore
>> >posting.Subscribe: virtio-comment-subscribe@lists.oasis-
>> >open.orgUnsubscribe:
>> >virtio-comment-unsubscribe@lists.oasis-open.orgList
>> >help: virtio-comment-help@lists.oasis-open.orgList archive:
>> >https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.oasis-
>> >2Dopen.org_archives_virtio-
>>
>>2Dcomment_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3l
>q
>>
>>qsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Qz2I8T
>0
>> >_t4KwDaM0Ow&s=JjvHrIJI6KbQVxPovjCYXudvXtlvyYp3u7l56tcgZSU&e=
>> >Feedback License: https://urldefense.proofpoint.com/v2/url?u=https-
>> >3A__www.oasis-2Dopen.org_who_ipr_feedback-
>>
>>5Flicense.pdf&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lq
>q
>>
>>sArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Qz2I8T0
>_
>> >t4KwDaM0Ow&s=F-1nqbZJyAZ3cYJgplGpxopp7wXt26y2UR97jmRZSEQ&e=
>> >List Guidelines: https://urldefense.proofpoint.com/v2/url?u=https-
>> >3A__www.oasis-2Dopen.org_policies-2Dguidelines_mailing-
>>
>>2Dlists&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgF
>R
>>
>>dcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Qz2I8T0_t4Kw
>D
>> >aM0Ow&s=0JXzYENjagm_cdHWwpZZ9ApvPqBLKHhgifl_rDNqEP0&e=
>> >Committee: https://urldefense.proofpoint.com/v2/url?u=https-
>> >3A__www.oasis-
>>
>>2Dopen.org_committees_virtio_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ
>&
>>
>>r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1R
>T
>> >d67ln4UY3Qz2I8T0_t4KwDaM0Ow&s=C1VTV-Oeidqxe7sJ-
>> >ctzZXEHto03VJEKs9BqAt2qObc&e= Join OASIS:
>> >https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
>>
>>2Dopen.org_join_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52
>o
>>
>>J3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Qz
>2I
>> >8T0_t4KwDaM0Ow&s=SNZGlaiIO4-
>> >mL1Ct2MIUCBntMVTf8JUglBIv_RGloCY&e=
>>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature
  2019-11-11 13:58           ` [virtio-comment] " Vitaly Mireyno
@ 2019-11-11 15:05             ` Michael S. Tsirkin
  2019-11-14 15:49               ` [virtio-comment] " Vitaly Mireyno
  0 siblings, 1 reply; 28+ messages in thread
From: Michael S. Tsirkin @ 2019-11-11 15:05 UTC (permalink / raw)
  To: Vitaly Mireyno; +Cc: Jason Wang, virtio-comment

On Mon, Nov 11, 2019 at 01:58:51PM +0000, Vitaly Mireyno wrote:
> 
> 
> >-----Original Message-----
> >From: Michael S. Tsirkin <mst@redhat.com>
> >Sent: Sunday, 10 November, 2019 17:35
> >To: Vitaly Mireyno <vmireyno@marvell.com>
> >Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
> >open.org
> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-
> >sized receive buffers feature
> >
> >On Sun, Nov 10, 2019 at 02:13:14PM +0000, Vitaly Mireyno wrote:
> >>
> >> >-----Original Message-----
> >> >From: virtio-comment@lists.oasis-open.org
> >> ><virtio-comment@lists.oasis- open.org> On Behalf Of Michael S.
> >> >Tsirkin
> >> >Sent: Tuesday, 5 November, 2019 20:52
> >> >To: Jason Wang <jasowang@redhat.com>
> >> >Cc: Vitaly Mireyno <vmireyno@marvell.com>;
> >> >virtio-comment@lists.oasis- open.org
> >> >Subject: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add
> >> >equal-sized receive buffers feature
> >> >
> >> >External Email
> >> >
> >> >---------------------------------------------------------------------
> >> >- On Fri, Nov 01, 2019 at 03:22:26PM +0800, Jason Wang wrote:
> >> >>
> >> >> On 2019/11/1 下午12:09, Michael S. Tsirkin wrote:
> >> >> > On Thu, Oct 31, 2019 at 10:46:55AM +0000, Vitaly Mireyno wrote:
> >> >> > > The feature is limited to receive buffers only.
> >> >> > > A driver decides on receive buffer length. The only limitation
> >> >> > > is that this
> >> >length has to be the same for all receive virtqueue buffers.
> >> >> > > The driver configures receive buffer length to the device
> >> >> > > during device
> >> >initialization, and the device reads it and may use it for optimal operation.
> >> >> > > No changes for transmit buffers.
> >> >> > >
> >> >> > > -----Original Message-----
> >> >> > > From: virtio-comment@lists.oasis-open.org
> >> >> > > <virtio-comment@lists.oasis-open.org> On Behalf Of Jason Wang
> >> >> > > Sent: Thursday, 31 October, 2019 12:15
> >> >> > > To: Vitaly Mireyno <vmireyno@marvell.com>;
> >> >> > > virtio-comment@lists.oasis-open.org
> >> >> > > Cc: Michael S. Tsirkin <mst@redhat.com>
> >> >> > > Subject: [virtio-comment] Re: [PATCH] virtio-net: Add
> >> >> > > equal-sized receive buffers feature
> >> >> > >
> >> >> > >
> >> >> > > On 2019/10/31 下午5:23, Vitaly Mireyno wrote:
> >> >> > > > Some devices benefit from working with receive buffers of a
> >> >predefined constant length.
> >> >> > > > Add a feature for drivers that allocate equal-sized receive
> >> >> > > > buffers, and
> >> >devices that benefit from predefined receive buffers length.
> >> >> > > >
> >> >> > > > Signed-off-by: Vitaly Mireyno <vmireyno@marvell.com>
> >> >> > > > ---
> >> >> > > >    content.tex | 29 +++++++++++++++++++++++++++--
> >> >> > > >    1 file changed, 27 insertions(+), 2 deletions(-)
> >> >> > >
> >> >> > > Is there any other networking device that has this feature?
> >> >> > >
> >> >> > >
> >> >> > > > diff --git a/content.tex b/content.tex index b1ea9b9..c9e67c8
> >> >> > > > 100644
> >> >> > > > --- a/content.tex
> >> >> > > > +++ b/content.tex
> >> >> > > > @@ -2811,6 +2811,10 @@ \subsection{Feature
> >> >> > > > bits}\label{sec:Device
> >> >Types / Network Device / Feature bits
> >> >> > > >    \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address
> >> >through control
> >> >> > > >        channel.
> >> >> > > > +\item[VIRTIO_NET_F_CONST_RXBUF_LEN(58)] Driver allocates all
> >> >receive buffers
> >> >> > > > +    of the same constant length. Device benefits from working with
> >> >> > > > +    receive buffers of equal length.
> >> >> > > > +
> >> >> > Problem is, I don't think linux will use this for skbs since it
> >> >> > breaks buffer accounting. This is because it is important to make
> >> >> > skbs as small as you can. So even if you see "device would benefit"
> >> >> > there is no way to balance this with the benefit to linux.
> >> >> > How do you know which benefit is bigger?
> >> >>
> >> >>
> >> >> I guess the idea is e.g for Linux driver, it can refuse this feature.
> >> >
> >> >Okay. What uses it?
> >> >
> >> >>
> >> >> >
> >> >> > You also never explained how does device benefit. My guess is
> >> >> > this allows device to calculate the # of descriptors to fetch
> >> >> > that are needed for a packet. Right?
> >> >> > Assuming this, I think a rough estimate should be enough.
> >> >> > If device fetches too much it can discard extra, if it does not
> >> >> > fetch enough it can fetch extra.
> >> >> >
> >> >> > Let us not forget that express packets are 256 bytes so up to 16
> >> >> > descriptors fit in a packet, there is no benefit most of the time
> >> >> > in knowing whether e.g. 1 or 2 descriptors are needed.
> >> >> >
> >> >> >
> >> >> > Let us not forget these are buffers, not descriptors.
> >> >>
> >> >>
> >> >> I guess maybe the initial motivation is constant descriptor length.
> >> >
> >> >
> >> >That conflicts with requirement framing is up to driver.
> >> >
> >> >
> >> I was using the wrong terminology. The feature is about constant
> >*descriptor* length. In other words, the value that driver puts in Packed
> >Virtqueue "Element Length" field (or 'len' field in the 'pvirtq_desc' struct).
> >
> >OK so this conflicts with "2.6.4 Message Framing" which requires that drivers
> >can split buffers into as many descriptors as they like.
> >Right?
> >
> >> Does it make more sense now, or you still see an issue with Linux driver?
> >
> >I think there's an issue with the flexible framing requirements and there's an
> >issue with Linux driver.
> >
> >> The motivation is to allow the device to calculate the exact number of
> >descriptors to consume, before fetching the descriptors. This is beneficial for
> >devices for which overconsuming or underconsuming descriptors come at a
> >high cost.
> >
> >So I guess we can agree getting the # of descriptors *exactly* right isn't all
> >that important. Right? My question is how does the driver balance the device
> >versus Linux needs?
> >
> >One idea is if we assume this is best effort, does not have to be exact, then
> >how about just having the device assume descriptor sizes stay more or less
> >constant and use that to estimate the amount? If it under/over estimates,
> >things just go a bit slower.
> >This way driver can adjust the sizes and device will react automatically, with
> >time.
> >
> >
> 
> I see that "2.6.4 Message Framing" allows a full flexibility of
> descriptor lengths, and I presume it's applicable to Packet Virtqueues
> as well, though defined under Split Virtqueues section.

That's a good point. Probably makes sense to move it out to
a common section, right?

> The example
> talks about transmit virtqueue, and it makes perfect sense.
> However, wouldn't a typical driver place equal-sized *receive* descriptors anyway? So if a device can benefit from it, the driver might as well negotiate this new feature.

Having buffer size jump around wildly doesn't seem too useful, I agree.
OTOH being able to adjust it gradually has been in the past demontrated
to help performance measureably.

> This could be especially relevant for devices, for which adjusting the number of used descriptors is impractical after descriptors have already been fetched.
> I agree that if this requirement conflicts with specific driver needs, it will not be negotiated, and the device will either underperform in certain scenarios, or will not come up at all.

Right so that makes it a challenge. Device says it prefers fixed
buffers. Is that preference more or less important than ability
to efficiently support workloads such as TCP small queues?
Driver has no idea and I suspect neither does the device.
So I don't see how will a Linux driver know that it should enable this,
neither how will device know it's okay to just refuse features_ok.


On the other hand, current drivers already have logic that tries to
change buffer sizes in a smooth way.  So simply caching the last buffer
size and assuming all the others will be exactly the same will go a long
way towards limiting how much does device need to fetch.  This does
imply extra logic on the device to recover if things change and the
first read did not fetch enough buffers, but then it would be required
anyway if as you say above the failure is not catastrophic.


The feature thus only seems useful for small, feature-restrained devices
- which are prepared to sacrifice some performance to cut costs, and
which can't recover at all. Is this what you are trying to do?

> Could you specify what issue do you see with the Linux driver?

See logic around struct ewma.

The first relevant commit is I guess
commit ab7db91705e95ed1bba1304388936fccfa58c992
    virtio-net: auto-tune mergeable rx buffer size for improved performance

this claims a gain of about 10% with large packets which
isn't earth shattering but also not something we can
ignore completely. And I suspect it can be bigger with
smaller packets.

> >> >>
> >> >> >
> >> >> > So device does not really know the exact # of descriptors:
> >> >> > current drivers do 1 descriptor per buffer but nothing prevents
> >> >> > more.
> >> >> >
> >> >> >
> >> >> > Thoughts?
> >> >> >
> >> >> >
> >> >> > > >    \item[VIRTIO_NET_F_RSC_EXT(61)] Device can process
> >> >> > > > duplicated
> >> >ACKs
> >> >> > > >        and report number of coalesced segments and duplicated
> >> >> > > > ACKs @@ -2854,8 +2858,8 @@ \subsubsection{Legacy Interface:
> >> >Feature bits}\label{sec:Device Types / Network
> >> >> > > >    \subsection{Device configuration layout}\label{sec:Device
> >> >> > > > Types /
> >> >Network Device / Device configuration layout}
> >> >> > > >    \label{sec:Device Types / Block Device / Feature bits /
> >> >> > > > Device configuration layout} -Three driver-read-only
> >> >> > > > configuration fields are currently defined.
> >> >> > > > The \field{mac} address field -always exists (though is only
> >> >> > > > valid if VIRTIO_NET_F_MAC is set), and
> >> >> > > > +The driver-read-only \field{mac} address field always exists
> >> >> > > > +(though is only valid if VIRTIO_NET_F_MAC is set), and
> >> >> > > >    \field{status} only exists if VIRTIO_NET_F_STATUS is set. Two
> >> >> > > >    read-only bits (for the driver) are currently defined for
> >> >> > > > the status
> >> >field:
> >> >> > > >    VIRTIO_NET_S_LINK_UP and VIRTIO_NET_S_ANNOUNCE.
> >> >> > > > @@ -2875,12 +2879,17 @@ \subsection{Device configuration
> >> >layout}\label{sec:Device Types / Network Device
> >> >> > > >    VIRTIO_NET_F_MTU is set. This field specifies the maximum
> >> >> > > > MTU for
> >> >the driver to
> >> >> > > >    use.
> >> >> > > > +The device-read-only field \field{rx_buf_len} only exists if
> >> >> > >
> >> >> > > Should be driver-read-only.
> >> >> > >
> >> >> > >
> >> >> > > > +VIRTIO_NET_F_CONST_RXBUF_LEN is negotiated. This field
> >> >> > > > +specifies the receive buffers length.
> >> >> > > > +
> >> >> > > >    \begin{lstlisting}
> >> >> > > >    struct virtio_net_config {
> >> >> > > >            u8 mac[6];
> >> >> > > >            le16 status;
> >> >> > > >            le16 max_virtqueue_pairs;
> >> >> > > >            le16 mtu;
> >> >> > > > +        le32 rx_buf_len;
> >> >> > > >    };
> >> >> > > >    \end{lstlisting}
> >> >> > > > @@ -2933,6 +2942,13 @@ \subsection{Device configuration
> >> >> > > > layout}\label{sec:Device Types / Network Device
> >> >> > > >    A driver SHOULD negotiate the VIRTIO_NET_F_STANDBY feature
> >> >> > > > if
> >> >the device offers it.
> >> >> > > > +A driver SHOULD accept the VIRTIO_NET_F_CONST_RXBUF_LEN
> >> >feature if offered.
> >> >> > > > +
> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been
> >negotiated,
> >> >> > > > +the driver MUST set \field{rx_buf_len}.
> >> >> > >
> >> >> > > I think it's device that set the field?
> >> >> > Makes more sense for the driver, but if you want this set e.g.
> >> >> > before buffers are added, you must say so.
> >> >> >
> >> >> > > > +
> >> >> > > > +A driver MUST NOT modify \field{rx_buf_len} once it has been
> >set.
> >> >> > > > +
> >> >> >
> >> >> > This seems very unflexible. I can see how e.g. XDP would benefit
> >> >> > from big buffers while skbs benefit from small buffers.
> >> >> >
> >> >> > This calls for ability to change this.
> >> >>
> >> >>
> >> >> Yes, but it requires non trivial cleanups for the old length and
> >> >> place them with new ones.
> >> >>
> >> >> Thanks
> >> >
> >> >
> >> >Let's see:
> >> >1	- making buffer smaller: just update config space,
> >> >	  then make new buffers smaller
> >> >
> >> >2	- making buffers bigger: add larger buffers,
> >> >	once all small ones are consumed update config space
> >> >
> >> >2 is tricky I agree. Thoughts?
> >> >
> >> >
> >> Agree. It's doable, provided that the driver will follow the update
> >procedure.
> >> >
> >> >>
> >> >> >
> >> >> > > >    \subsubsection{Legacy Interface: Device configuration
> >> >layout}\label{sec:Device Types / Network Device / Device
> >> >configuration layout / Legacy Interface: Device configuration layout}
> >> >> > > >    \label{sec:Device Types / Block Device / Feature bits /
> >> >> > > > Device
> >> >configuration layout / Legacy Interface: Device configuration layout}
> >> >> > > >    When using the legacy interface, transitional devices and
> >> >> > > > drivers @@
> >> >> > > > -3241,6 +3257,11 @@ \subsubsection{Setting Up Receive
> >> >Buffers}\label{sec:Device Types / Network Devi
> >> >> > > >    If VIRTIO_NET_F_MQ is negotiated, each of receiveq1\ldots
> >> >receiveqN
> >> >> > > >    that will be used SHOULD be populated with receive buffers.
> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been
> >negotiated,
> >> >> > > > +the driver MUST initialize all receive virtqueue descriptors
> >> >> > > > +\field{len} field with the value configured in
> >> >> > > > +\field{rx_buf_len} device configuration field, and allocate
> >> >> > > > +receive
> >> >buffers accordingly.
> >> >> > > > +
> >> >> > > >    \devicenormative{\paragraph}{Setting Up Receive
> >> >> > > > Buffers}{Device Types / Network Device / Device Operation /
> >> >> > > > Setting Up Receive Buffers}
> >> >> > > >    The device MUST set \field{num_buffers} to the number of
> >> >> > > > descriptors used to @@ -3396,6 +3417,10 @@
> >> >\subsubsection{Processing of Incoming Packets}\label{sec:Device Types
> >> >/ Network
> >> >> > > >    checksum (in case of multiple encapsulated protocols, one level
> >> >> > > >    of checksums is validated).
> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN has been negotiated, the
> >> >device
> >> >> > > > +MAY use \field{rx_buf_len} as a buffer length (instead of
> >> >> > > > +reading it from virtqueue descriptor \field{len} field).
> >> >> > >
> >> >> > > Is this safe? What if driver submit a small buffer, then device
> >> >> > > can read
> >> >more than what is allowed?
> >> >> > >
> >> >> > > Thanks
> >> >> > >
> >> >> > >
> >> >> > > > +
> >> >> > > >    \drivernormative{\paragraph}{Processing of Incoming
> >> >> > > >    Packets}{Device Types / Network Device / Device Operation /
> >> >> > > >    Processing of Incoming Packets}
> >> >> > > > --
> >> >> > >
> >> >> > > This publicly archived list offers a means to provide input to
> >> >> > > theOASIS Virtual I/O Device (VIRTIO) TC.In order to verify user
> >> >> > > consent to the Feedback License terms andto minimize spam in
> >> >> > > the list archive, subscription is requiredbefore posting.Subscribe:
> >> >> > > virtio-comment-subscribe@lists.oasis-open.orgUnsubscribe:
> >> >> > > virtio-comment-unsubscribe@lists.oasis-open.orgList help:
> >> >> > > virtio-comment-help@lists.oasis-open.orgList archive:
> >> >> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.oasi
> >> >> > > s-2
> >> >> > > Dopen.org_archives_virtio-
> >> >2Dcomment_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7
> >> >> > >
> >>
> >>xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LR
> >g
> >> >5F7
> >> >> > >
> >>
> >>_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=ZvYuXs3LuhkeJxW_zoADv1L4RisDy
> >G
> >> >H_v
> >> >> > > p7scO91w9s&e= Feedback License:
> >> >> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
> >> >2Do
> >> >> > > pen.org_who_ipr_feedback-
> >> >5Flicense.pdf&d=DwIFaQ&c=nKjWec2b6R0mOyPa
> >> >> > >
> >>
> >>z7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3
> >L
> >> >Rg5
> >> >> > > F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=7Hf-O4ss-
> >> >rLdNJsAbIOg5ruheBcfAYF
> >> >> > > dXWuks_HJfFQ&e= List Guidelines:
> >> >> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
> >> >2Do
> >> >> > > pen.org_policies-2Dguidelines_mailing-
> >> >2Dlists&d=DwIFaQ&c=nKjWec2b6
> >> >> > >
> >>
> >>R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m
> >=
> >> >KkXM
> >> >> > >
> >>
> >>gr3LRg5F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=7oMwlBy4fdYwjT2_f9BT
> >w
> >> >HRm
> >> >> > > 2GFtIYL-x8e4C0jlrpM&e= Committee:
> >> >> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
> >> >2Do
> >> >> > >
> >>
> >>pen.org_committees_virtio_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=l
> >> >DH
> >> >> > >
> >>
> >>J2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6u
> >H
> >> >CHA
> >> >> > > 2n_Tn6303zez2N4VkSfU&s=-
> >> >K2akZ06PY7r4TquQY4T4UWdZ6JyAsBDFZIEZFNsWHk
> >> >> > > &e= Join OASIS:
> >> >> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
> >> >2Do
> >> >> > >
> >>
> >>pen.org_join_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3l
> >q
> >> >qs
> >> >> > >
> >>
> >>ArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6uHCHA2n_Tn630
> >3
> >> >zez2
> >> >> > > N4VkSfU&s=lR1SlzXT71tjw7DDuD6XaHefJQSTBCAvRcdp45ureB8&e=
> >> >> > >
> >> >> > This publicly archived list offers a means to provide input to
> >> >> > the OASIS Virtual I/O Device (VIRTIO) TC.
> >> >> >
> >> >> > In order to verify user consent to the Feedback License terms and
> >> >> > to minimize spam in the list archive, subscription is required
> >> >> > before posting.
> >> >> >
> >> >> > Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> >> >> > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> >> >> > List help: virtio-comment-help@lists.oasis-open.org
> >> >> > List archive:
> >> >> > https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.oasis-
> >> >> > 2Do
> >> >> > pen.org_archives_virtio-
> >> >2Dcomment_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ
> >> >> >
> >>
> >>&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1
> >R
> >> >Td67
> >> >> >
> >>
> >>ln4UY3Qz2I8T0_t4KwDaM0Ow&s=JjvHrIJI6KbQVxPovjCYXudvXtlvyYp3u7l56t
> >c
> >> >gZ
> >> >> > SU&e= Feedback License:
> >> >> > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
> >> >2Dope
> >> >> > n.org_who_ipr_feedback-
> >> >5Flicense.pdf&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xt
> >> >> >
> >>
> >>fQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5
> >RY
> >> >1RTd
> >> >> > 67ln4UY3Qz2I8T0_t4KwDaM0Ow&s=F-
> >> >1nqbZJyAZ3cYJgplGpxopp7wXt26y2UR97jmR
> >> >> > ZSEQ&e= List Guidelines:
> >> >> > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
> >> >2Dope
> >> >> > n.org_policies-2Dguidelines_mailing-
> >> >2Dlists&d=DwIFaQ&c=nKjWec2b6R0mO
> >> >> >
> >>
> >>yPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcf
> >j
> >> >Tktm
> >> >> >
> >>
> >>5RY1RTd67ln4UY3Qz2I8T0_t4KwDaM0Ow&s=0JXzYENjagm_cdHWwpZZ9Apv
> >P
> >> >qBLKHhg
> >> >> > ifl_rDNqEP0&e=
> >> >> > Committee:
> >> >> > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
> >> >2Dope
> >> >> >
> >>
> >>n.org_committees_virtio_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lD
> >HJ
> >> >2FW
> >> >> >
> >>
> >>52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY
> >3
> >> >Qz2I
> >> >> > 8T0_t4KwDaM0Ow&s=C1VTV-Oeidqxe7sJ-
> >> >ctzZXEHto03VJEKs9BqAt2qObc&e=
> >> >> > Join OASIS:
> >> >> > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
> >> >2Dope
> >> >> >
> >>
> >>n.org_join_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqq
> >s
> >> >ArgF
> >> >> >
> >>
> >>Rdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Qz2I8T0_t4K
> >w
> >> >DaM0O
> >> >> > w&s=SNZGlaiIO4-mL1Ct2MIUCBntMVTf8JUglBIv_RGloCY&e=
> >> >> >
> >> >
> >> >This publicly archived list offers a means to provide input to
> >> >theOASIS Virtual I/O Device (VIRTIO) TC.In order to verify user
> >> >consent to the Feedback License terms andto minimize spam in the list
> >> >archive, subscription is requiredbefore
> >> >posting.Subscribe: virtio-comment-subscribe@lists.oasis-
> >> >open.orgUnsubscribe:
> >> >virtio-comment-unsubscribe@lists.oasis-open.orgList
> >> >help: virtio-comment-help@lists.oasis-open.orgList archive:
> >> >https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.oasis-
> >> >2Dopen.org_archives_virtio-
> >>
> >>2Dcomment_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3l
> >q
> >>
> >>qsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Qz2I8T
> >0
> >> >_t4KwDaM0Ow&s=JjvHrIJI6KbQVxPovjCYXudvXtlvyYp3u7l56tcgZSU&e=
> >> >Feedback License: https://urldefense.proofpoint.com/v2/url?u=https-
> >> >3A__www.oasis-2Dopen.org_who_ipr_feedback-
> >>
> >>5Flicense.pdf&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lq
> >q
> >>
> >>sArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Qz2I8T0
> >_
> >> >t4KwDaM0Ow&s=F-1nqbZJyAZ3cYJgplGpxopp7wXt26y2UR97jmRZSEQ&e=
> >> >List Guidelines: https://urldefense.proofpoint.com/v2/url?u=https-
> >> >3A__www.oasis-2Dopen.org_policies-2Dguidelines_mailing-
> >>
> >>2Dlists&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgF
> >R
> >>
> >>dcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Qz2I8T0_t4Kw
> >D
> >> >aM0Ow&s=0JXzYENjagm_cdHWwpZZ9ApvPqBLKHhgifl_rDNqEP0&e=
> >> >Committee: https://urldefense.proofpoint.com/v2/url?u=https-
> >> >3A__www.oasis-
> >>
> >>2Dopen.org_committees_virtio_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ
> >&
> >>
> >>r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1R
> >T
> >> >d67ln4UY3Qz2I8T0_t4KwDaM0Ow&s=C1VTV-Oeidqxe7sJ-
> >> >ctzZXEHto03VJEKs9BqAt2qObc&e= Join OASIS:
> >> >https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
> >>
> >>2Dopen.org_join_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52
> >o
> >>
> >>J3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Qz
> >2I
> >> >8T0_t4KwDaM0Ow&s=SNZGlaiIO4-
> >> >mL1Ct2MIUCBntMVTf8JUglBIv_RGloCY&e=
> >>
> 


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [virtio-comment] RE: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature
  2019-11-11 15:05             ` [virtio-comment] " Michael S. Tsirkin
@ 2019-11-14 15:49               ` Vitaly Mireyno
  2019-11-20 13:23                 ` [virtio-comment] " Michael S. Tsirkin
  0 siblings, 1 reply; 28+ messages in thread
From: Vitaly Mireyno @ 2019-11-14 15:49 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Jason Wang, virtio-comment



>-----Original Message-----
>From: Michael S. Tsirkin <mst@redhat.com>
>Sent: Monday, 11 November, 2019 17:05
>To: Vitaly Mireyno <vmireyno@marvell.com>
>Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
>open.org
>Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-
>sized receive buffers feature
>
>On Mon, Nov 11, 2019 at 01:58:51PM +0000, Vitaly Mireyno wrote:
>>
>>
>> >-----Original Message-----
>> >From: Michael S. Tsirkin <mst@redhat.com>
>> >Sent: Sunday, 10 November, 2019 17:35
>> >To: Vitaly Mireyno <vmireyno@marvell.com>
>> >Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
>> >open.org
>> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add
>> >equal- sized receive buffers feature
>> >
>> >On Sun, Nov 10, 2019 at 02:13:14PM +0000, Vitaly Mireyno wrote:
>> >>
>> >> >-----Original Message-----
>> >> >From: virtio-comment@lists.oasis-open.org
>> >> ><virtio-comment@lists.oasis- open.org> On Behalf Of Michael S.
>> >> >Tsirkin
>> >> >Sent: Tuesday, 5 November, 2019 20:52
>> >> >To: Jason Wang <jasowang@redhat.com>
>> >> >Cc: Vitaly Mireyno <vmireyno@marvell.com>;
>> >> >virtio-comment@lists.oasis- open.org
>> >> >Subject: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add
>> >> >equal-sized receive buffers feature
>> >> >
>> >> >External Email
>> >> >
>> >> >------------------------------------------------------------------
>> >> >---
>> >> >- On Fri, Nov 01, 2019 at 03:22:26PM +0800, Jason Wang wrote:
>> >> >>
>> >> >> On 2019/11/1 下午12:09, Michael S. Tsirkin wrote:
>> >> >> > On Thu, Oct 31, 2019 at 10:46:55AM +0000, Vitaly Mireyno wrote:
>> >> >> > > The feature is limited to receive buffers only.
>> >> >> > > A driver decides on receive buffer length. The only
>> >> >> > > limitation is that this
>> >> >length has to be the same for all receive virtqueue buffers.
>> >> >> > > The driver configures receive buffer length to the device
>> >> >> > > during device
>> >> >initialization, and the device reads it and may use it for optimal
>operation.
>> >> >> > > No changes for transmit buffers.
>> >> >> > >
>> >> >> > > -----Original Message-----
>> >> >> > > From: virtio-comment@lists.oasis-open.org
>> >> >> > > <virtio-comment@lists.oasis-open.org> On Behalf Of Jason
>> >> >> > > Wang
>> >> >> > > Sent: Thursday, 31 October, 2019 12:15
>> >> >> > > To: Vitaly Mireyno <vmireyno@marvell.com>;
>> >> >> > > virtio-comment@lists.oasis-open.org
>> >> >> > > Cc: Michael S. Tsirkin <mst@redhat.com>
>> >> >> > > Subject: [virtio-comment] Re: [PATCH] virtio-net: Add
>> >> >> > > equal-sized receive buffers feature
>> >> >> > >
>> >> >> > >
>> >> >> > > On 2019/10/31 下午5:23, Vitaly Mireyno wrote:
>> >> >> > > > Some devices benefit from working with receive buffers of
>> >> >> > > > a
>> >> >predefined constant length.
>> >> >> > > > Add a feature for drivers that allocate equal-sized
>> >> >> > > > receive buffers, and
>> >> >devices that benefit from predefined receive buffers length.
>> >> >> > > >
>> >> >> > > > Signed-off-by: Vitaly Mireyno <vmireyno@marvell.com>
>> >> >> > > > ---
>> >> >> > > >    content.tex | 29 +++++++++++++++++++++++++++--
>> >> >> > > >    1 file changed, 27 insertions(+), 2 deletions(-)
>> >> >> > >
>> >> >> > > Is there any other networking device that has this feature?
>> >> >> > >
>> >> >> > >
>> >> >> > > > diff --git a/content.tex b/content.tex index
>> >> >> > > > b1ea9b9..c9e67c8
>> >> >> > > > 100644
>> >> >> > > > --- a/content.tex
>> >> >> > > > +++ b/content.tex
>> >> >> > > > @@ -2811,6 +2811,10 @@ \subsection{Feature
>> >> >> > > > bits}\label{sec:Device
>> >> >Types / Network Device / Feature bits
>> >> >> > > >    \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address
>> >> >through control
>> >> >> > > >        channel.
>> >> >> > > > +\item[VIRTIO_NET_F_CONST_RXBUF_LEN(58)] Driver allocates
>> >> >> > > > +all
>> >> >receive buffers
>> >> >> > > > +    of the same constant length. Device benefits from working
>with
>> >> >> > > > +    receive buffers of equal length.
>> >> >> > > > +
>> >> >> > Problem is, I don't think linux will use this for skbs since
>> >> >> > it breaks buffer accounting. This is because it is important
>> >> >> > to make skbs as small as you can. So even if you see "device would
>benefit"
>> >> >> > there is no way to balance this with the benefit to linux.
>> >> >> > How do you know which benefit is bigger?
>> >> >>
>> >> >>
>> >> >> I guess the idea is e.g for Linux driver, it can refuse this feature.
>> >> >
>> >> >Okay. What uses it?
>> >> >
>> >> >>
>> >> >> >
>> >> >> > You also never explained how does device benefit. My guess is
>> >> >> > this allows device to calculate the # of descriptors to fetch
>> >> >> > that are needed for a packet. Right?
>> >> >> > Assuming this, I think a rough estimate should be enough.
>> >> >> > If device fetches too much it can discard extra, if it does
>> >> >> > not fetch enough it can fetch extra.
>> >> >> >
>> >> >> > Let us not forget that express packets are 256 bytes so up to
>> >> >> > 16 descriptors fit in a packet, there is no benefit most of
>> >> >> > the time in knowing whether e.g. 1 or 2 descriptors are needed.
>> >> >> >
>> >> >> >
>> >> >> > Let us not forget these are buffers, not descriptors.
>> >> >>
>> >> >>
>> >> >> I guess maybe the initial motivation is constant descriptor length.
>> >> >
>> >> >
>> >> >That conflicts with requirement framing is up to driver.
>> >> >
>> >> >
>> >> I was using the wrong terminology. The feature is about constant
>> >*descriptor* length. In other words, the value that driver puts in
>> >Packed Virtqueue "Element Length" field (or 'len' field in the 'pvirtq_desc'
>struct).
>> >
>> >OK so this conflicts with "2.6.4 Message Framing" which requires that
>> >drivers can split buffers into as many descriptors as they like.
>> >Right?
>> >
>> >> Does it make more sense now, or you still see an issue with Linux driver?
>> >
>> >I think there's an issue with the flexible framing requirements and
>> >there's an issue with Linux driver.
>> >
>> >> The motivation is to allow the device to calculate the exact number
>> >> of
>> >descriptors to consume, before fetching the descriptors. This is
>> >beneficial for devices for which overconsuming or underconsuming
>> >descriptors come at a high cost.
>> >
>> >So I guess we can agree getting the # of descriptors *exactly* right
>> >isn't all that important. Right? My question is how does the driver
>> >balance the device versus Linux needs?
>> >
>> >One idea is if we assume this is best effort, does not have to be
>> >exact, then how about just having the device assume descriptor sizes
>> >stay more or less constant and use that to estimate the amount? If it
>> >under/over estimates, things just go a bit slower.
>> >This way driver can adjust the sizes and device will react
>> >automatically, with time.
>> >
>> >
>>
>> I see that "2.6.4 Message Framing" allows a full flexibility of
>> descriptor lengths, and I presume it's applicable to Packet Virtqueues
>> as well, though defined under Split Virtqueues section.
>
>That's a good point. Probably makes sense to move it out to a common
>section, right?
>
>> The example
>> talks about transmit virtqueue, and it makes perfect sense.
>> However, wouldn't a typical driver place equal-sized *receive* descriptors
>anyway? So if a device can benefit from it, the driver might as well negotiate
>this new feature.
>
>Having buffer size jump around wildly doesn't seem too useful, I agree.
>OTOH being able to adjust it gradually has been in the past demontrated to
>help performance measureably.
>
>> This could be especially relevant for devices, for which adjusting the number
>of used descriptors is impractical after descriptors have already been fetched.
>> I agree that if this requirement conflicts with specific driver needs, it will not
>be negotiated, and the device will either underperform in certain scenarios, or
>will not come up at all.
>
>Right so that makes it a challenge. Device says it prefers fixed buffers. Is that
>preference more or less important than ability to efficiently support
>workloads such as TCP small queues?
>Driver has no idea and I suspect neither does the device.
>So I don't see how will a Linux driver know that it should enable this, neither
>how will device know it's okay to just refuse features_ok.
>
>
>On the other hand, current drivers already have logic that tries to change
>buffer sizes in a smooth way.  So simply caching the last buffer size and
>assuming all the others will be exactly the same will go a long way towards
>limiting how much does device need to fetch.  This does imply extra logic on
>the device to recover if things change and the first read did not fetch enough
>buffers, but then it would be required anyway if as you say above the failure
>is not catastrophic.
>
>
>The feature thus only seems useful for small, feature-restrained devices
>- which are prepared to sacrifice some performance to cut costs, and which
>can't recover at all. Is this what you are trying to do?
>

The intention is to enable devices with this specific limitation to offer virtio offload.
The feature can be redefined such that it would only be offered by devices that are unable to handle dynamically changing descriptor lengths. How does that sound?

>> Could you specify what issue do you see with the Linux driver?
>
>See logic around struct ewma.
>
>The first relevant commit is I guess
>commit ab7db91705e95ed1bba1304388936fccfa58c992
>    virtio-net: auto-tune mergeable rx buffer size for improved performance
>
>this claims a gain of about 10% with large packets which isn't earth shattering
>but also not something we can ignore completely. And I suspect it can be
>bigger with smaller packets.
>
>> >> >>
>> >> >> >
>> >> >> > So device does not really know the exact # of descriptors:
>> >> >> > current drivers do 1 descriptor per buffer but nothing
>> >> >> > prevents more.
>> >> >> >
>> >> >> >
>> >> >> > Thoughts?
>> >> >> >
>> >> >> >
>> >> >> > > >    \item[VIRTIO_NET_F_RSC_EXT(61)] Device can process
>> >> >> > > > duplicated
>> >> >ACKs
>> >> >> > > >        and report number of coalesced segments and
>> >> >> > > > duplicated ACKs @@ -2854,8 +2858,8 @@
>\subsubsection{Legacy Interface:
>> >> >Feature bits}\label{sec:Device Types / Network
>> >> >> > > >    \subsection{Device configuration
>> >> >> > > > layout}\label{sec:Device Types /
>> >> >Network Device / Device configuration layout}
>> >> >> > > >    \label{sec:Device Types / Block Device / Feature bits /
>> >> >> > > > Device configuration layout} -Three driver-read-only
>> >> >> > > > configuration fields are currently defined.
>> >> >> > > > The \field{mac} address field -always exists (though is
>> >> >> > > > only valid if VIRTIO_NET_F_MAC is set), and
>> >> >> > > > +The driver-read-only \field{mac} address field always
>> >> >> > > > +exists (though is only valid if VIRTIO_NET_F_MAC is set),
>> >> >> > > > +and
>> >> >> > > >    \field{status} only exists if VIRTIO_NET_F_STATUS is set. Two
>> >> >> > > >    read-only bits (for the driver) are currently defined
>> >> >> > > > for the status
>> >> >field:
>> >> >> > > >    VIRTIO_NET_S_LINK_UP and VIRTIO_NET_S_ANNOUNCE.
>> >> >> > > > @@ -2875,12 +2879,17 @@ \subsection{Device configuration
>> >> >layout}\label{sec:Device Types / Network Device
>> >> >> > > >    VIRTIO_NET_F_MTU is set. This field specifies the
>> >> >> > > > maximum MTU for
>> >> >the driver to
>> >> >> > > >    use.
>> >> >> > > > +The device-read-only field \field{rx_buf_len} only exists
>> >> >> > > > +if
>> >> >> > >
>> >> >> > > Should be driver-read-only.
>> >> >> > >
>> >> >> > >
>> >> >> > > > +VIRTIO_NET_F_CONST_RXBUF_LEN is negotiated. This field
>> >> >> > > > +specifies the receive buffers length.
>> >> >> > > > +
>> >> >> > > >    \begin{lstlisting}
>> >> >> > > >    struct virtio_net_config {
>> >> >> > > >            u8 mac[6];
>> >> >> > > >            le16 status;
>> >> >> > > >            le16 max_virtqueue_pairs;
>> >> >> > > >            le16 mtu;
>> >> >> > > > +        le32 rx_buf_len;
>> >> >> > > >    };
>> >> >> > > >    \end{lstlisting}
>> >> >> > > > @@ -2933,6 +2942,13 @@ \subsection{Device configuration
>> >> >> > > > layout}\label{sec:Device Types / Network Device
>> >> >> > > >    A driver SHOULD negotiate the VIRTIO_NET_F_STANDBY
>> >> >> > > > feature if
>> >> >the device offers it.
>> >> >> > > > +A driver SHOULD accept the
>VIRTIO_NET_F_CONST_RXBUF_LEN
>> >> >feature if offered.
>> >> >> > > > +
>> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been
>> >negotiated,
>> >> >> > > > +the driver MUST set \field{rx_buf_len}.
>> >> >> > >
>> >> >> > > I think it's device that set the field?
>> >> >> > Makes more sense for the driver, but if you want this set e.g.
>> >> >> > before buffers are added, you must say so.
>> >> >> >
>> >> >> > > > +
>> >> >> > > > +A driver MUST NOT modify \field{rx_buf_len} once it has
>> >> >> > > > +been
>> >set.
>> >> >> > > > +
>> >> >> >
>> >> >> > This seems very unflexible. I can see how e.g. XDP would
>> >> >> > benefit from big buffers while skbs benefit from small buffers.
>> >> >> >
>> >> >> > This calls for ability to change this.
>> >> >>
>> >> >>
>> >> >> Yes, but it requires non trivial cleanups for the old length and
>> >> >> place them with new ones.
>> >> >>
>> >> >> Thanks
>> >> >
>> >> >
>> >> >Let's see:
>> >> >1	- making buffer smaller: just update config space,
>> >> >	  then make new buffers smaller
>> >> >
>> >> >2	- making buffers bigger: add larger buffers,
>> >> >	once all small ones are consumed update config space
>> >> >
>> >> >2 is tricky I agree. Thoughts?
>> >> >
>> >> >
>> >> Agree. It's doable, provided that the driver will follow the update
>> >procedure.
>> >> >
>> >> >>
>> >> >> >
>> >> >> > > >    \subsubsection{Legacy Interface: Device configuration
>> >> >layout}\label{sec:Device Types / Network Device / Device
>> >> >configuration layout / Legacy Interface: Device configuration
>> >> >layout}
>> >> >> > > >    \label{sec:Device Types / Block Device / Feature bits /
>> >> >> > > > Device
>> >> >configuration layout / Legacy Interface: Device configuration
>> >> >layout}
>> >> >> > > >    When using the legacy interface, transitional devices
>> >> >> > > > and drivers @@
>> >> >> > > > -3241,6 +3257,11 @@ \subsubsection{Setting Up Receive
>> >> >Buffers}\label{sec:Device Types / Network Devi
>> >> >> > > >    If VIRTIO_NET_F_MQ is negotiated, each of
>> >> >> > > > receiveq1\ldots
>> >> >receiveqN
>> >> >> > > >    that will be used SHOULD be populated with receive buffers.
>> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been
>> >negotiated,
>> >> >> > > > +the driver MUST initialize all receive virtqueue
>> >> >> > > > +descriptors \field{len} field with the value configured
>> >> >> > > > +in \field{rx_buf_len} device configuration field, and
>> >> >> > > > +allocate receive
>> >> >buffers accordingly.
>> >> >> > > > +
>> >> >> > > >    \devicenormative{\paragraph}{Setting Up Receive
>> >> >> > > > Buffers}{Device Types / Network Device / Device Operation
>> >> >> > > > / Setting Up Receive Buffers}
>> >> >> > > >    The device MUST set \field{num_buffers} to the number
>> >> >> > > > of descriptors used to @@ -3396,6 +3417,10 @@
>> >> >\subsubsection{Processing of Incoming Packets}\label{sec:Device
>> >> >Types / Network
>> >> >> > > >    checksum (in case of multiple encapsulated protocols, one
>level
>> >> >> > > >    of checksums is validated).
>> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN has been negotiated,
>the
>> >> >device
>> >> >> > > > +MAY use \field{rx_buf_len} as a buffer length (instead of
>> >> >> > > > +reading it from virtqueue descriptor \field{len} field).
>> >> >> > >
>> >> >> > > Is this safe? What if driver submit a small buffer, then
>> >> >> > > device can read
>> >> >more than what is allowed?
>> >> >> > >
>> >> >> > > Thanks
>> >> >> > >
>> >> >> > >
>> >> >> > > > +
>> >> >> > > >    \drivernormative{\paragraph}{Processing of Incoming
>> >> >> > > >    Packets}{Device Types / Network Device / Device Operation /
>> >> >> > > >    Processing of Incoming Packets}
>> >> >> > > > --
>> >> >> > >
>> >> >> > > This publicly archived list offers a means to provide input
>> >> >> > > to theOASIS Virtual I/O Device (VIRTIO) TC.In order to
>> >> >> > > verify user consent to the Feedback License terms andto
>> >> >> > > minimize spam in the list archive, subscription is requiredbefore
>posting.Subscribe:
>> >> >> > > virtio-comment-subscribe@lists.oasis-open.orgUnsubscribe:
>> >> >> > > virtio-comment-unsubscribe@lists.oasis-open.orgList help:
>> >> >> > > virtio-comment-help@lists.oasis-open.orgList archive:
>> >> >> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.o
>> >> >> > > asi
>> >> >> > > s-2
>> >> >> > > Dopen.org_archives_virtio-
>> >> >2Dcomment_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7
>> >> >> > >
>> >>
>>
>>>xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3L
>R
>> >g
>> >> >5F7
>> >> >> > >
>> >>
>>
>>>_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=ZvYuXs3LuhkeJxW_zoADv1L4RisD
>y
>> >G
>> >> >H_v
>> >> >> > > p7scO91w9s&e= Feedback License:
>> >> >> > > https://urldefense.proofpoint.com/v2/url?u=https-
>3A__www.oas
>> >> >> > > is-
>> >> >2Do
>> >> >> > > pen.org_who_ipr_feedback-
>> >> >5Flicense.pdf&d=DwIFaQ&c=nKjWec2b6R0mOyPa
>> >> >> > >
>> >>
>>
>>>z7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr
>3
>> >L
>> >> >Rg5
>> >> >> > > F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=7Hf-O4ss-
>> >> >rLdNJsAbIOg5ruheBcfAYF
>> >> >> > > dXWuks_HJfFQ&e= List Guidelines:
>> >> >> > > https://urldefense.proofpoint.com/v2/url?u=https-
>3A__www.oas
>> >> >> > > is-
>> >> >2Do
>> >> >> > > pen.org_policies-2Dguidelines_mailing-
>> >> >2Dlists&d=DwIFaQ&c=nKjWec2b6
>> >> >> > >
>> >>
>>
>>>R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&
>m
>> >=
>> >> >KkXM
>> >> >> > >
>> >>
>>
>>>gr3LRg5F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=7oMwlBy4fdYwjT2_f9BT
>> >w
>> >> >HRm
>> >> >> > > 2GFtIYL-x8e4C0jlrpM&e= Committee:
>> >> >> > > https://urldefense.proofpoint.com/v2/url?u=https-
>3A__www.oas
>> >> >> > > is-
>> >> >2Do
>> >> >> > >
>> >>
>>
>>>pen.org_committees_virtio_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r
>=l
>> >> >DH
>> >> >> > >
>> >>
>>
>>>J2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6
>u
>> >H
>> >> >CHA
>> >> >> > > 2n_Tn6303zez2N4VkSfU&s=-
>> >> >K2akZ06PY7r4TquQY4T4UWdZ6JyAsBDFZIEZFNsWHk
>> >> >> > > &e= Join OASIS:
>> >> >> > > https://urldefense.proofpoint.com/v2/url?u=https-
>3A__www.oas
>> >> >> > > is-
>> >> >2Do
>> >> >> > >
>> >>
>>
>>>pen.org_join_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ
>3l
>> >q
>> >> >qs
>> >> >> > >
>> >>
>>
>>>ArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6uHCHA2n_Tn6
>30
>> >3
>> >> >zez2
>> >> >> > >
>N4VkSfU&s=lR1SlzXT71tjw7DDuD6XaHefJQSTBCAvRcdp45ureB8&e=
>> >> >> > >
>> >> >> > This publicly archived list offers a means to provide input to
>> >> >> > the OASIS Virtual I/O Device (VIRTIO) TC.
>> >> >> >
>> >> >> > In order to verify user consent to the Feedback License terms
>> >> >> > and to minimize spam in the list archive, subscription is
>> >> >> > required before posting.
>> >> >> >
>> >> >> > Subscribe: virtio-comment-subscribe@lists.oasis-open.org
>> >> >> > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
>> >> >> > List help: virtio-comment-help@lists.oasis-open.org
>> >> >> > List archive:
>> >> >> > https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.oas
>> >> >> > is-
>> >> >> > 2Do
>> >> >> > pen.org_archives_virtio-
>> >> >2Dcomment_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ
>> >> >> >
>> >>
>>
>>>&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5R
>Y1
>> >R
>> >> >Td67
>> >> >> >
>> >>
>>
>>>ln4UY3Qz2I8T0_t4KwDaM0Ow&s=JjvHrIJI6KbQVxPovjCYXudvXtlvyYp3u7l56
>t
>> >c
>> >> >gZ
>> >> >> > SU&e= Feedback License:
>> >> >> > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis
>> >> >> > -
>> >> >2Dope
>> >> >> > n.org_who_ipr_feedback-
>> >> >5Flicense.pdf&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xt
>> >> >> >
>> >>
>>
>>>fQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm
>5
>> >RY
>> >> >1RTd
>> >> >> > 67ln4UY3Qz2I8T0_t4KwDaM0Ow&s=F-
>> >> >1nqbZJyAZ3cYJgplGpxopp7wXt26y2UR97jmR
>> >> >> > ZSEQ&e= List Guidelines:
>> >> >> > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis
>> >> >> > -
>> >> >2Dope
>> >> >> > n.org_policies-2Dguidelines_mailing-
>> >> >2Dlists&d=DwIFaQ&c=nKjWec2b6R0mO
>> >> >> >
>> >>
>>
>>>yPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_v
>cf
>> >j
>> >> >Tktm
>> >> >> >
>> >>
>>
>>>5RY1RTd67ln4UY3Qz2I8T0_t4KwDaM0Ow&s=0JXzYENjagm_cdHWwpZZ9Ap
>v
>> >P
>> >> >qBLKHhg
>> >> >> > ifl_rDNqEP0&e=
>> >> >> > Committee:
>> >> >> > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis
>> >> >> > -
>> >> >2Dope
>> >> >> >
>> >>
>>
>>>n.org_committees_virtio_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=l
>D
>> >HJ
>> >> >2FW
>> >> >> >
>> >>
>>
>>>52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4U
>Y
>> >3
>> >> >Qz2I
>> >> >> > 8T0_t4KwDaM0Ow&s=C1VTV-Oeidqxe7sJ-
>> >> >ctzZXEHto03VJEKs9BqAt2qObc&e=
>> >> >> > Join OASIS:
>> >> >> > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis
>> >> >> > -
>> >> >2Dope
>> >> >> >
>> >>
>>
>>>n.org_join_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lq
>q
>> >s
>> >> >ArgF
>> >> >> >
>> >>
>>
>>>Rdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Qz2I8T0_t4K
>> >w
>> >> >DaM0O
>> >> >> > w&s=SNZGlaiIO4-mL1Ct2MIUCBntMVTf8JUglBIv_RGloCY&e=
>> >> >> >
>> >> >
>> >> >This publicly archived list offers a means to provide input to
>> >> >theOASIS Virtual I/O Device (VIRTIO) TC.In order to verify user
>> >> >consent to the Feedback License terms andto minimize spam in the
>> >> >list archive, subscription is requiredbefore
>> >> >posting.Subscribe: virtio-comment-subscribe@lists.oasis-
>> >> >open.orgUnsubscribe:
>> >> >virtio-comment-unsubscribe@lists.oasis-open.orgList
>> >> >help: virtio-comment-help@lists.oasis-open.orgList archive:
>> >> >https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.oasis-
>> >> >2Dopen.org_archives_virtio-
>> >>
>>
>>>2Dcomment_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3
>l
>> >q
>> >>
>>
>>>qsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Qz2I8
>T
>> >0
>> >> >_t4KwDaM0Ow&s=JjvHrIJI6KbQVxPovjCYXudvXtlvyYp3u7l56tcgZSU&e=
>> >> >Feedback License:
>> >> >https://urldefense.proofpoint.com/v2/url?u=https-
>> >> >3A__www.oasis-2Dopen.org_who_ipr_feedback-
>> >>
>>
>>>5Flicense.pdf&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3l
>q
>> >q
>> >>
>>
>>>sArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Qz2I8T
>0
>> >_
>> >> >t4KwDaM0Ow&s=F-
>1nqbZJyAZ3cYJgplGpxopp7wXt26y2UR97jmRZSEQ&e=
>> >> >List Guidelines: https://urldefense.proofpoint.com/v2/url?u=https-
>> >> >3A__www.oasis-2Dopen.org_policies-2Dguidelines_mailing-
>> >>
>>
>>>2Dlists&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsAr
>gF
>> >R
>> >>
>>
>>>dcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Qz2I8T0_t4K
>w
>> >D
>> >> >aM0Ow&s=0JXzYENjagm_cdHWwpZZ9ApvPqBLKHhgifl_rDNqEP0&e=
>> >> >Committee: https://urldefense.proofpoint.com/v2/url?u=https-
>> >> >3A__www.oasis-
>> >>
>>
>>>2Dopen.org_committees_virtio_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtf
>Q
>> >&
>> >>
>>
>>>r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1
>R
>> >T
>> >> >d67ln4UY3Qz2I8T0_t4KwDaM0Ow&s=C1VTV-Oeidqxe7sJ-
>> >> >ctzZXEHto03VJEKs9BqAt2qObc&e= Join OASIS:
>> >> >https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
>> >>
>>
>>>2Dopen.org_join_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW5
>2
>> >o
>> >>
>>
>>>J3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Q
>z
>> >2I
>> >> >8T0_t4KwDaM0Ow&s=SNZGlaiIO4-
>> >> >mL1Ct2MIUCBntMVTf8JUglBIv_RGloCY&e=
>> >>
>>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature
  2019-11-14 15:49               ` [virtio-comment] " Vitaly Mireyno
@ 2019-11-20 13:23                 ` Michael S. Tsirkin
  2019-11-24 15:02                   ` [virtio-comment] " Vitaly Mireyno
  0 siblings, 1 reply; 28+ messages in thread
From: Michael S. Tsirkin @ 2019-11-20 13:23 UTC (permalink / raw)
  To: Vitaly Mireyno; +Cc: Jason Wang, virtio-comment

On Thu, Nov 14, 2019 at 03:49:59PM +0000, Vitaly Mireyno wrote:
> 
> 
> >-----Original Message-----
> >From: Michael S. Tsirkin <mst@redhat.com>
> >Sent: Monday, 11 November, 2019 17:05
> >To: Vitaly Mireyno <vmireyno@marvell.com>
> >Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
> >open.org
> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-
> >sized receive buffers feature
> >
> >On Mon, Nov 11, 2019 at 01:58:51PM +0000, Vitaly Mireyno wrote:
> >>
> >>
> >> >-----Original Message-----
> >> >From: Michael S. Tsirkin <mst@redhat.com>
> >> >Sent: Sunday, 10 November, 2019 17:35
> >> >To: Vitaly Mireyno <vmireyno@marvell.com>
> >> >Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
> >> >open.org
> >> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add
> >> >equal- sized receive buffers feature
> >> >
> >> >On Sun, Nov 10, 2019 at 02:13:14PM +0000, Vitaly Mireyno wrote:
> >> >>
> >> >> >-----Original Message-----
> >> >> >From: virtio-comment@lists.oasis-open.org
> >> >> ><virtio-comment@lists.oasis- open.org> On Behalf Of Michael S.
> >> >> >Tsirkin
> >> >> >Sent: Tuesday, 5 November, 2019 20:52
> >> >> >To: Jason Wang <jasowang@redhat.com>
> >> >> >Cc: Vitaly Mireyno <vmireyno@marvell.com>;
> >> >> >virtio-comment@lists.oasis- open.org
> >> >> >Subject: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add
> >> >> >equal-sized receive buffers feature
> >> >> >
> >> >> >External Email
> >> >> >
> >> >> >------------------------------------------------------------------
> >> >> >---
> >> >> >- On Fri, Nov 01, 2019 at 03:22:26PM +0800, Jason Wang wrote:
> >> >> >>
> >> >> >> On 2019/11/1 下午12:09, Michael S. Tsirkin wrote:
> >> >> >> > On Thu, Oct 31, 2019 at 10:46:55AM +0000, Vitaly Mireyno wrote:
> >> >> >> > > The feature is limited to receive buffers only.
> >> >> >> > > A driver decides on receive buffer length. The only
> >> >> >> > > limitation is that this
> >> >> >length has to be the same for all receive virtqueue buffers.
> >> >> >> > > The driver configures receive buffer length to the device
> >> >> >> > > during device
> >> >> >initialization, and the device reads it and may use it for optimal
> >operation.
> >> >> >> > > No changes for transmit buffers.
> >> >> >> > >
> >> >> >> > > -----Original Message-----
> >> >> >> > > From: virtio-comment@lists.oasis-open.org
> >> >> >> > > <virtio-comment@lists.oasis-open.org> On Behalf Of Jason
> >> >> >> > > Wang
> >> >> >> > > Sent: Thursday, 31 October, 2019 12:15
> >> >> >> > > To: Vitaly Mireyno <vmireyno@marvell.com>;
> >> >> >> > > virtio-comment@lists.oasis-open.org
> >> >> >> > > Cc: Michael S. Tsirkin <mst@redhat.com>
> >> >> >> > > Subject: [virtio-comment] Re: [PATCH] virtio-net: Add
> >> >> >> > > equal-sized receive buffers feature
> >> >> >> > >
> >> >> >> > >
> >> >> >> > > On 2019/10/31 下午5:23, Vitaly Mireyno wrote:
> >> >> >> > > > Some devices benefit from working with receive buffers of
> >> >> >> > > > a
> >> >> >predefined constant length.
> >> >> >> > > > Add a feature for drivers that allocate equal-sized
> >> >> >> > > > receive buffers, and
> >> >> >devices that benefit from predefined receive buffers length.
> >> >> >> > > >
> >> >> >> > > > Signed-off-by: Vitaly Mireyno <vmireyno@marvell.com>
> >> >> >> > > > ---
> >> >> >> > > >    content.tex | 29 +++++++++++++++++++++++++++--
> >> >> >> > > >    1 file changed, 27 insertions(+), 2 deletions(-)
> >> >> >> > >
> >> >> >> > > Is there any other networking device that has this feature?
> >> >> >> > >
> >> >> >> > >
> >> >> >> > > > diff --git a/content.tex b/content.tex index
> >> >> >> > > > b1ea9b9..c9e67c8
> >> >> >> > > > 100644
> >> >> >> > > > --- a/content.tex
> >> >> >> > > > +++ b/content.tex
> >> >> >> > > > @@ -2811,6 +2811,10 @@ \subsection{Feature
> >> >> >> > > > bits}\label{sec:Device
> >> >> >Types / Network Device / Feature bits
> >> >> >> > > >    \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address
> >> >> >through control
> >> >> >> > > >        channel.
> >> >> >> > > > +\item[VIRTIO_NET_F_CONST_RXBUF_LEN(58)] Driver allocates
> >> >> >> > > > +all
> >> >> >receive buffers
> >> >> >> > > > +    of the same constant length. Device benefits from working
> >with
> >> >> >> > > > +    receive buffers of equal length.
> >> >> >> > > > +
> >> >> >> > Problem is, I don't think linux will use this for skbs since
> >> >> >> > it breaks buffer accounting. This is because it is important
> >> >> >> > to make skbs as small as you can. So even if you see "device would
> >benefit"
> >> >> >> > there is no way to balance this with the benefit to linux.
> >> >> >> > How do you know which benefit is bigger?
> >> >> >>
> >> >> >>
> >> >> >> I guess the idea is e.g for Linux driver, it can refuse this feature.
> >> >> >
> >> >> >Okay. What uses it?
> >> >> >
> >> >> >>
> >> >> >> >
> >> >> >> > You also never explained how does device benefit. My guess is
> >> >> >> > this allows device to calculate the # of descriptors to fetch
> >> >> >> > that are needed for a packet. Right?
> >> >> >> > Assuming this, I think a rough estimate should be enough.
> >> >> >> > If device fetches too much it can discard extra, if it does
> >> >> >> > not fetch enough it can fetch extra.
> >> >> >> >
> >> >> >> > Let us not forget that express packets are 256 bytes so up to
> >> >> >> > 16 descriptors fit in a packet, there is no benefit most of
> >> >> >> > the time in knowing whether e.g. 1 or 2 descriptors are needed.
> >> >> >> >
> >> >> >> >
> >> >> >> > Let us not forget these are buffers, not descriptors.
> >> >> >>
> >> >> >>
> >> >> >> I guess maybe the initial motivation is constant descriptor length.
> >> >> >
> >> >> >
> >> >> >That conflicts with requirement framing is up to driver.
> >> >> >
> >> >> >
> >> >> I was using the wrong terminology. The feature is about constant
> >> >*descriptor* length. In other words, the value that driver puts in
> >> >Packed Virtqueue "Element Length" field (or 'len' field in the 'pvirtq_desc'
> >struct).
> >> >
> >> >OK so this conflicts with "2.6.4 Message Framing" which requires that
> >> >drivers can split buffers into as many descriptors as they like.
> >> >Right?
> >> >
> >> >> Does it make more sense now, or you still see an issue with Linux driver?
> >> >
> >> >I think there's an issue with the flexible framing requirements and
> >> >there's an issue with Linux driver.
> >> >
> >> >> The motivation is to allow the device to calculate the exact number
> >> >> of
> >> >descriptors to consume, before fetching the descriptors. This is
> >> >beneficial for devices for which overconsuming or underconsuming
> >> >descriptors come at a high cost.
> >> >
> >> >So I guess we can agree getting the # of descriptors *exactly* right
> >> >isn't all that important. Right? My question is how does the driver
> >> >balance the device versus Linux needs?
> >> >
> >> >One idea is if we assume this is best effort, does not have to be
> >> >exact, then how about just having the device assume descriptor sizes
> >> >stay more or less constant and use that to estimate the amount? If it
> >> >under/over estimates, things just go a bit slower.
> >> >This way driver can adjust the sizes and device will react
> >> >automatically, with time.
> >> >
> >> >
> >>
> >> I see that "2.6.4 Message Framing" allows a full flexibility of
> >> descriptor lengths, and I presume it's applicable to Packet Virtqueues
> >> as well, though defined under Split Virtqueues section.
> >
> >That's a good point. Probably makes sense to move it out to a common
> >section, right?
> >
> >> The example
> >> talks about transmit virtqueue, and it makes perfect sense.
> >> However, wouldn't a typical driver place equal-sized *receive* descriptors
> >anyway? So if a device can benefit from it, the driver might as well negotiate
> >this new feature.
> >
> >Having buffer size jump around wildly doesn't seem too useful, I agree.
> >OTOH being able to adjust it gradually has been in the past demontrated to
> >help performance measureably.
> >
> >> This could be especially relevant for devices, for which adjusting the number
> >of used descriptors is impractical after descriptors have already been fetched.
> >> I agree that if this requirement conflicts with specific driver needs, it will not
> >be negotiated, and the device will either underperform in certain scenarios, or
> >will not come up at all.
> >
> >Right so that makes it a challenge. Device says it prefers fixed buffers. Is that
> >preference more or less important than ability to efficiently support
> >workloads such as TCP small queues?
> >Driver has no idea and I suspect neither does the device.
> >So I don't see how will a Linux driver know that it should enable this, neither
> >how will device know it's okay to just refuse features_ok.
> >
> >
> >On the other hand, current drivers already have logic that tries to change
> >buffer sizes in a smooth way.  So simply caching the last buffer size and
> >assuming all the others will be exactly the same will go a long way towards
> >limiting how much does device need to fetch.  This does imply extra logic on
> >the device to recover if things change and the first read did not fetch enough
> >buffers, but then it would be required anyway if as you say above the failure
> >is not catastrophic.
> >
> >
> >The feature thus only seems useful for small, feature-restrained devices
> >- which are prepared to sacrifice some performance to cut costs, and which
> >can't recover at all. Is this what you are trying to do?
> >
> 
> The intention is to enable devices with this specific limitation to
> offer virtio offload.  The feature can be redefined such that it would
> only be offered by devices that are unable to handle dynamically
> changing descriptor lengths. How does that sound?

So it makes more sense when mergeable buffers are disabled (since then
buffers are practically all same size).


With that in mind, I have an idea: scsi and block already have max sg field.
How about we add a writeable max sg field, maybe even make it
programmable per vq?

Thus driver tells device what is it's max s/g value for rx. Worst case
device fetches a bit more than it needs. Discarding extra shouldn't be
expensive. This looks like it will help even smart devices. What do you
think?

This nicely avoids conflicting with the flexible framing requirement.


> >> Could you specify what issue do you see with the Linux driver?
> >
> >See logic around struct ewma.
> >
> >The first relevant commit is I guess
> >commit ab7db91705e95ed1bba1304388936fccfa58c992
> >    virtio-net: auto-tune mergeable rx buffer size for improved performance
> >
> >this claims a gain of about 10% with large packets which isn't earth shattering
> >but also not something we can ignore completely. And I suspect it can be
> >bigger with smaller packets.
> >
> >> >> >>
> >> >> >> >
> >> >> >> > So device does not really know the exact # of descriptors:
> >> >> >> > current drivers do 1 descriptor per buffer but nothing
> >> >> >> > prevents more.
> >> >> >> >
> >> >> >> >
> >> >> >> > Thoughts?
> >> >> >> >
> >> >> >> >
> >> >> >> > > >    \item[VIRTIO_NET_F_RSC_EXT(61)] Device can process
> >> >> >> > > > duplicated
> >> >> >ACKs
> >> >> >> > > >        and report number of coalesced segments and
> >> >> >> > > > duplicated ACKs @@ -2854,8 +2858,8 @@
> >\subsubsection{Legacy Interface:
> >> >> >Feature bits}\label{sec:Device Types / Network
> >> >> >> > > >    \subsection{Device configuration
> >> >> >> > > > layout}\label{sec:Device Types /
> >> >> >Network Device / Device configuration layout}
> >> >> >> > > >    \label{sec:Device Types / Block Device / Feature bits /
> >> >> >> > > > Device configuration layout} -Three driver-read-only
> >> >> >> > > > configuration fields are currently defined.
> >> >> >> > > > The \field{mac} address field -always exists (though is
> >> >> >> > > > only valid if VIRTIO_NET_F_MAC is set), and
> >> >> >> > > > +The driver-read-only \field{mac} address field always
> >> >> >> > > > +exists (though is only valid if VIRTIO_NET_F_MAC is set),
> >> >> >> > > > +and
> >> >> >> > > >    \field{status} only exists if VIRTIO_NET_F_STATUS is set. Two
> >> >> >> > > >    read-only bits (for the driver) are currently defined
> >> >> >> > > > for the status
> >> >> >field:
> >> >> >> > > >    VIRTIO_NET_S_LINK_UP and VIRTIO_NET_S_ANNOUNCE.
> >> >> >> > > > @@ -2875,12 +2879,17 @@ \subsection{Device configuration
> >> >> >layout}\label{sec:Device Types / Network Device
> >> >> >> > > >    VIRTIO_NET_F_MTU is set. This field specifies the
> >> >> >> > > > maximum MTU for
> >> >> >the driver to
> >> >> >> > > >    use.
> >> >> >> > > > +The device-read-only field \field{rx_buf_len} only exists
> >> >> >> > > > +if
> >> >> >> > >
> >> >> >> > > Should be driver-read-only.
> >> >> >> > >
> >> >> >> > >
> >> >> >> > > > +VIRTIO_NET_F_CONST_RXBUF_LEN is negotiated. This field
> >> >> >> > > > +specifies the receive buffers length.
> >> >> >> > > > +
> >> >> >> > > >    \begin{lstlisting}
> >> >> >> > > >    struct virtio_net_config {
> >> >> >> > > >            u8 mac[6];
> >> >> >> > > >            le16 status;
> >> >> >> > > >            le16 max_virtqueue_pairs;
> >> >> >> > > >            le16 mtu;
> >> >> >> > > > +        le32 rx_buf_len;
> >> >> >> > > >    };
> >> >> >> > > >    \end{lstlisting}
> >> >> >> > > > @@ -2933,6 +2942,13 @@ \subsection{Device configuration
> >> >> >> > > > layout}\label{sec:Device Types / Network Device
> >> >> >> > > >    A driver SHOULD negotiate the VIRTIO_NET_F_STANDBY
> >> >> >> > > > feature if
> >> >> >the device offers it.
> >> >> >> > > > +A driver SHOULD accept the
> >VIRTIO_NET_F_CONST_RXBUF_LEN
> >> >> >feature if offered.
> >> >> >> > > > +
> >> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been
> >> >negotiated,
> >> >> >> > > > +the driver MUST set \field{rx_buf_len}.
> >> >> >> > >
> >> >> >> > > I think it's device that set the field?
> >> >> >> > Makes more sense for the driver, but if you want this set e.g.
> >> >> >> > before buffers are added, you must say so.
> >> >> >> >
> >> >> >> > > > +
> >> >> >> > > > +A driver MUST NOT modify \field{rx_buf_len} once it has
> >> >> >> > > > +been
> >> >set.
> >> >> >> > > > +
> >> >> >> >
> >> >> >> > This seems very unflexible. I can see how e.g. XDP would
> >> >> >> > benefit from big buffers while skbs benefit from small buffers.
> >> >> >> >
> >> >> >> > This calls for ability to change this.
> >> >> >>
> >> >> >>
> >> >> >> Yes, but it requires non trivial cleanups for the old length and
> >> >> >> place them with new ones.
> >> >> >>
> >> >> >> Thanks
> >> >> >
> >> >> >
> >> >> >Let's see:
> >> >> >1	- making buffer smaller: just update config space,
> >> >> >	  then make new buffers smaller
> >> >> >
> >> >> >2	- making buffers bigger: add larger buffers,
> >> >> >	once all small ones are consumed update config space
> >> >> >
> >> >> >2 is tricky I agree. Thoughts?
> >> >> >
> >> >> >
> >> >> Agree. It's doable, provided that the driver will follow the update
> >> >procedure.
> >> >> >
> >> >> >>
> >> >> >> >
> >> >> >> > > >    \subsubsection{Legacy Interface: Device configuration
> >> >> >layout}\label{sec:Device Types / Network Device / Device
> >> >> >configuration layout / Legacy Interface: Device configuration
> >> >> >layout}
> >> >> >> > > >    \label{sec:Device Types / Block Device / Feature bits /
> >> >> >> > > > Device
> >> >> >configuration layout / Legacy Interface: Device configuration
> >> >> >layout}
> >> >> >> > > >    When using the legacy interface, transitional devices
> >> >> >> > > > and drivers @@
> >> >> >> > > > -3241,6 +3257,11 @@ \subsubsection{Setting Up Receive
> >> >> >Buffers}\label{sec:Device Types / Network Devi
> >> >> >> > > >    If VIRTIO_NET_F_MQ is negotiated, each of
> >> >> >> > > > receiveq1\ldots
> >> >> >receiveqN
> >> >> >> > > >    that will be used SHOULD be populated with receive buffers.
> >> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been
> >> >negotiated,
> >> >> >> > > > +the driver MUST initialize all receive virtqueue
> >> >> >> > > > +descriptors \field{len} field with the value configured
> >> >> >> > > > +in \field{rx_buf_len} device configuration field, and
> >> >> >> > > > +allocate receive
> >> >> >buffers accordingly.
> >> >> >> > > > +
> >> >> >> > > >    \devicenormative{\paragraph}{Setting Up Receive
> >> >> >> > > > Buffers}{Device Types / Network Device / Device Operation
> >> >> >> > > > / Setting Up Receive Buffers}
> >> >> >> > > >    The device MUST set \field{num_buffers} to the number
> >> >> >> > > > of descriptors used to @@ -3396,6 +3417,10 @@
> >> >> >\subsubsection{Processing of Incoming Packets}\label{sec:Device
> >> >> >Types / Network
> >> >> >> > > >    checksum (in case of multiple encapsulated protocols, one
> >level
> >> >> >> > > >    of checksums is validated).
> >> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN has been negotiated,
> >the
> >> >> >device
> >> >> >> > > > +MAY use \field{rx_buf_len} as a buffer length (instead of
> >> >> >> > > > +reading it from virtqueue descriptor \field{len} field).
> >> >> >> > >
> >> >> >> > > Is this safe? What if driver submit a small buffer, then
> >> >> >> > > device can read
> >> >> >more than what is allowed?
> >> >> >> > >
> >> >> >> > > Thanks
> >> >> >> > >
> >> >> >> > >
> >> >> >> > > > +
> >> >> >> > > >    \drivernormative{\paragraph}{Processing of Incoming
> >> >> >> > > >    Packets}{Device Types / Network Device / Device Operation /
> >> >> >> > > >    Processing of Incoming Packets}
> >> >> >> > > > --
> >> >> >> > >
> >> >> >> > > This publicly archived list offers a means to provide input
> >> >> >> > > to theOASIS Virtual I/O Device (VIRTIO) TC.In order to
> >> >> >> > > verify user consent to the Feedback License terms andto
> >> >> >> > > minimize spam in the list archive, subscription is requiredbefore
> >posting.Subscribe:
> >> >> >> > > virtio-comment-subscribe@lists.oasis-open.orgUnsubscribe:
> >> >> >> > > virtio-comment-unsubscribe@lists.oasis-open.orgList help:
> >> >> >> > > virtio-comment-help@lists.oasis-open.orgList archive:
> >> >> >> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.o
> >> >> >> > > asi
> >> >> >> > > s-2
> >> >> >> > > Dopen.org_archives_virtio-
> >> >> >2Dcomment_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7
> >> >> >> > >
> >> >>
> >>
> >>>xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3L
> >R
> >> >g
> >> >> >5F7
> >> >> >> > >
> >> >>
> >>
> >>>_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=ZvYuXs3LuhkeJxW_zoADv1L4RisD
> >y
> >> >G
> >> >> >H_v
> >> >> >> > > p7scO91w9s&e= Feedback License:
> >> >> >> > > https://urldefense.proofpoint.com/v2/url?u=https-
> >3A__www.oas
> >> >> >> > > is-
> >> >> >2Do
> >> >> >> > > pen.org_who_ipr_feedback-
> >> >> >5Flicense.pdf&d=DwIFaQ&c=nKjWec2b6R0mOyPa
> >> >> >> > >
> >> >>
> >>
> >>>z7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr
> >3
> >> >L
> >> >> >Rg5
> >> >> >> > > F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=7Hf-O4ss-
> >> >> >rLdNJsAbIOg5ruheBcfAYF
> >> >> >> > > dXWuks_HJfFQ&e= List Guidelines:
> >> >> >> > > https://urldefense.proofpoint.com/v2/url?u=https-
> >3A__www.oas
> >> >> >> > > is-
> >> >> >2Do
> >> >> >> > > pen.org_policies-2Dguidelines_mailing-
> >> >> >2Dlists&d=DwIFaQ&c=nKjWec2b6
> >> >> >> > >
> >> >>
> >>
> >>>R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&
> >m
> >> >=
> >> >> >KkXM
> >> >> >> > >
> >> >>
> >>
> >>>gr3LRg5F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=7oMwlBy4fdYwjT2_f9BT
> >> >w
> >> >> >HRm
> >> >> >> > > 2GFtIYL-x8e4C0jlrpM&e= Committee:
> >> >> >> > > https://urldefense.proofpoint.com/v2/url?u=https-
> >3A__www.oas
> >> >> >> > > is-
> >> >> >2Do
> >> >> >> > >
> >> >>
> >>
> >>>pen.org_committees_virtio_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r
> >=l
> >> >> >DH
> >> >> >> > >
> >> >>
> >>
> >>>J2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6
> >u
> >> >H
> >> >> >CHA
> >> >> >> > > 2n_Tn6303zez2N4VkSfU&s=-
> >> >> >K2akZ06PY7r4TquQY4T4UWdZ6JyAsBDFZIEZFNsWHk
> >> >> >> > > &e= Join OASIS:
> >> >> >> > > https://urldefense.proofpoint.com/v2/url?u=https-
> >3A__www.oas
> >> >> >> > > is-
> >> >> >2Do
> >> >> >> > >
> >> >>
> >>
> >>>pen.org_join_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ
> >3l
> >> >q
> >> >> >qs
> >> >> >> > >
> >> >>
> >>
> >>>ArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6uHCHA2n_Tn6
> >30
> >> >3
> >> >> >zez2
> >> >> >> > >
> >N4VkSfU&s=lR1SlzXT71tjw7DDuD6XaHefJQSTBCAvRcdp45ureB8&e=
> >> >> >> > >
> >> >> >> > This publicly archived list offers a means to provide input to
> >> >> >> > the OASIS Virtual I/O Device (VIRTIO) TC.
> >> >> >> >
> >> >> >> > In order to verify user consent to the Feedback License terms
> >> >> >> > and to minimize spam in the list archive, subscription is
> >> >> >> > required before posting.
> >> >> >> >
> >> >> >> > Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> >> >> >> > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> >> >> >> > List help: virtio-comment-help@lists.oasis-open.org
> >> >> >> > List archive:
> >> >> >> > https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.oas
> >> >> >> > is-
> >> >> >> > 2Do
> >> >> >> > pen.org_archives_virtio-
> >> >> >2Dcomment_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ
> >> >> >> >
> >> >>
> >>
> >>>&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5R
> >Y1
> >> >R
> >> >> >Td67
> >> >> >> >
> >> >>
> >>
> >>>ln4UY3Qz2I8T0_t4KwDaM0Ow&s=JjvHrIJI6KbQVxPovjCYXudvXtlvyYp3u7l56
> >t
> >> >c
> >> >> >gZ
> >> >> >> > SU&e= Feedback License:
> >> >> >> > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis
> >> >> >> > -
> >> >> >2Dope
> >> >> >> > n.org_who_ipr_feedback-
> >> >> >5Flicense.pdf&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xt
> >> >> >> >
> >> >>
> >>
> >>>fQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm
> >5
> >> >RY
> >> >> >1RTd
> >> >> >> > 67ln4UY3Qz2I8T0_t4KwDaM0Ow&s=F-
> >> >> >1nqbZJyAZ3cYJgplGpxopp7wXt26y2UR97jmR
> >> >> >> > ZSEQ&e= List Guidelines:
> >> >> >> > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis
> >> >> >> > -
> >> >> >2Dope
> >> >> >> > n.org_policies-2Dguidelines_mailing-
> >> >> >2Dlists&d=DwIFaQ&c=nKjWec2b6R0mO
> >> >> >> >
> >> >>
> >>
> >>>yPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_v
> >cf
> >> >j
> >> >> >Tktm
> >> >> >> >
> >> >>
> >>
> >>>5RY1RTd67ln4UY3Qz2I8T0_t4KwDaM0Ow&s=0JXzYENjagm_cdHWwpZZ9Ap
> >v
> >> >P
> >> >> >qBLKHhg
> >> >> >> > ifl_rDNqEP0&e=
> >> >> >> > Committee:
> >> >> >> > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis
> >> >> >> > -
> >> >> >2Dope
> >> >> >> >
> >> >>
> >>
> >>>n.org_committees_virtio_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=l
> >D
> >> >HJ
> >> >> >2FW
> >> >> >> >
> >> >>
> >>
> >>>52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4U
> >Y
> >> >3
> >> >> >Qz2I
> >> >> >> > 8T0_t4KwDaM0Ow&s=C1VTV-Oeidqxe7sJ-
> >> >> >ctzZXEHto03VJEKs9BqAt2qObc&e=
> >> >> >> > Join OASIS:
> >> >> >> > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis
> >> >> >> > -
> >> >> >2Dope
> >> >> >> >
> >> >>
> >>
> >>>n.org_join_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lq
> >q
> >> >s
> >> >> >ArgF
> >> >> >> >
> >> >>
> >>
> >>>Rdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Qz2I8T0_t4K
> >> >w
> >> >> >DaM0O
> >> >> >> > w&s=SNZGlaiIO4-mL1Ct2MIUCBntMVTf8JUglBIv_RGloCY&e=
> >> >> >> >
> >> >> >
> >> >> >This publicly archived list offers a means to provide input to
> >> >> >theOASIS Virtual I/O Device (VIRTIO) TC.In order to verify user
> >> >> >consent to the Feedback License terms andto minimize spam in the
> >> >> >list archive, subscription is requiredbefore
> >> >> >posting.Subscribe: virtio-comment-subscribe@lists.oasis-
> >> >> >open.orgUnsubscribe:
> >> >> >virtio-comment-unsubscribe@lists.oasis-open.orgList
> >> >> >help: virtio-comment-help@lists.oasis-open.orgList archive:
> >> >> >https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.oasis-
> >> >> >2Dopen.org_archives_virtio-
> >> >>
> >>
> >>>2Dcomment_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3
> >l
> >> >q
> >> >>
> >>
> >>>qsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Qz2I8
> >T
> >> >0
> >> >> >_t4KwDaM0Ow&s=JjvHrIJI6KbQVxPovjCYXudvXtlvyYp3u7l56tcgZSU&e=
> >> >> >Feedback License:
> >> >> >https://urldefense.proofpoint.com/v2/url?u=https-
> >> >> >3A__www.oasis-2Dopen.org_who_ipr_feedback-
> >> >>
> >>
> >>>5Flicense.pdf&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3l
> >q
> >> >q
> >> >>
> >>
> >>>sArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Qz2I8T
> >0
> >> >_
> >> >> >t4KwDaM0Ow&s=F-
> >1nqbZJyAZ3cYJgplGpxopp7wXt26y2UR97jmRZSEQ&e=
> >> >> >List Guidelines: https://urldefense.proofpoint.com/v2/url?u=https-
> >> >> >3A__www.oasis-2Dopen.org_policies-2Dguidelines_mailing-
> >> >>
> >>
> >>>2Dlists&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsAr
> >gF
> >> >R
> >> >>
> >>
> >>>dcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Qz2I8T0_t4K
> >w
> >> >D
> >> >> >aM0Ow&s=0JXzYENjagm_cdHWwpZZ9ApvPqBLKHhgifl_rDNqEP0&e=
> >> >> >Committee: https://urldefense.proofpoint.com/v2/url?u=https-
> >> >> >3A__www.oasis-
> >> >>
> >>
> >>>2Dopen.org_committees_virtio_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtf
> >Q
> >> >&
> >> >>
> >>
> >>>r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1
> >R
> >> >T
> >> >> >d67ln4UY3Qz2I8T0_t4KwDaM0Ow&s=C1VTV-Oeidqxe7sJ-
> >> >> >ctzZXEHto03VJEKs9BqAt2qObc&e= Join OASIS:
> >> >> >https://urldefense.proofpoint.com/v2/url?u=https-3A__www.oasis-
> >> >>
> >>
> >>>2Dopen.org_join_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW5
> >2
> >> >o
> >> >>
> >>
> >>>J3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Q
> >z
> >> >2I
> >> >> >8T0_t4KwDaM0Ow&s=SNZGlaiIO4-
> >> >> >mL1Ct2MIUCBntMVTf8JUglBIv_RGloCY&e=
> >> >>
> >>
> 


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [virtio-comment] RE: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature
  2019-11-20 13:23                 ` [virtio-comment] " Michael S. Tsirkin
@ 2019-11-24 15:02                   ` Vitaly Mireyno
  2019-11-24 15:30                     ` [virtio-comment] " Michael S. Tsirkin
  0 siblings, 1 reply; 28+ messages in thread
From: Vitaly Mireyno @ 2019-11-24 15:02 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Jason Wang, virtio-comment



>-----Original Message-----
>From: Michael S. Tsirkin <mst@redhat.com>
>Sent: Wednesday, 20 November, 2019 15:23
>To: Vitaly Mireyno <vmireyno@marvell.com>
>Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
>open.org
>Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-
>sized receive buffers feature
>
>On Thu, Nov 14, 2019 at 03:49:59PM +0000, Vitaly Mireyno wrote:
>>
>>
>> >-----Original Message-----
>> >From: Michael S. Tsirkin <mst@redhat.com>
>> >Sent: Monday, 11 November, 2019 17:05
>> >To: Vitaly Mireyno <vmireyno@marvell.com>
>> >Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
>> >open.org
>> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add
>> >equal- sized receive buffers feature
>> >
>> >On Mon, Nov 11, 2019 at 01:58:51PM +0000, Vitaly Mireyno wrote:
>> >>
>> >>
>> >> >-----Original Message-----
>> >> >From: Michael S. Tsirkin <mst@redhat.com>
>> >> >Sent: Sunday, 10 November, 2019 17:35
>> >> >To: Vitaly Mireyno <vmireyno@marvell.com>
>> >> >Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
>> >> >open.org
>> >> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net:
>> >> >Add
>> >> >equal- sized receive buffers feature
>> >> >
>> >> >On Sun, Nov 10, 2019 at 02:13:14PM +0000, Vitaly Mireyno wrote:
>> >> >>
>> >> >> >-----Original Message-----
>> >> >> >From: virtio-comment@lists.oasis-open.org
>> >> >> ><virtio-comment@lists.oasis- open.org> On Behalf Of Michael S.
>> >> >> >Tsirkin
>> >> >> >Sent: Tuesday, 5 November, 2019 20:52
>> >> >> >To: Jason Wang <jasowang@redhat.com>
>> >> >> >Cc: Vitaly Mireyno <vmireyno@marvell.com>;
>> >> >> >virtio-comment@lists.oasis- open.org
>> >> >> >Subject: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add
>> >> >> >equal-sized receive buffers feature
>> >> >> >
>> >> >> >External Email
>> >> >> >
>> >> >> >---------------------------------------------------------------
>> >> >> >---
>> >> >> >---
>> >> >> >- On Fri, Nov 01, 2019 at 03:22:26PM +0800, Jason Wang wrote:
>> >> >> >>
>> >> >> >> On 2019/11/1 下午12:09, Michael S. Tsirkin wrote:
>> >> >> >> > On Thu, Oct 31, 2019 at 10:46:55AM +0000, Vitaly Mireyno wrote:
>> >> >> >> > > The feature is limited to receive buffers only.
>> >> >> >> > > A driver decides on receive buffer length. The only
>> >> >> >> > > limitation is that this
>> >> >> >length has to be the same for all receive virtqueue buffers.
>> >> >> >> > > The driver configures receive buffer length to the device
>> >> >> >> > > during device
>> >> >> >initialization, and the device reads it and may use it for
>> >> >> >optimal
>> >operation.
>> >> >> >> > > No changes for transmit buffers.
>> >> >> >> > >
>> >> >> >> > > -----Original Message-----
>> >> >> >> > > From: virtio-comment@lists.oasis-open.org
>> >> >> >> > > <virtio-comment@lists.oasis-open.org> On Behalf Of Jason
>> >> >> >> > > Wang
>> >> >> >> > > Sent: Thursday, 31 October, 2019 12:15
>> >> >> >> > > To: Vitaly Mireyno <vmireyno@marvell.com>;
>> >> >> >> > > virtio-comment@lists.oasis-open.org
>> >> >> >> > > Cc: Michael S. Tsirkin <mst@redhat.com>
>> >> >> >> > > Subject: [virtio-comment] Re: [PATCH] virtio-net: Add
>> >> >> >> > > equal-sized receive buffers feature
>> >> >> >> > >
>> >> >> >> > >
>> >> >> >> > > On 2019/10/31 下午5:23, Vitaly Mireyno wrote:
>> >> >> >> > > > Some devices benefit from working with receive buffers
>> >> >> >> > > > of a
>> >> >> >predefined constant length.
>> >> >> >> > > > Add a feature for drivers that allocate equal-sized
>> >> >> >> > > > receive buffers, and
>> >> >> >devices that benefit from predefined receive buffers length.
>> >> >> >> > > >
>> >> >> >> > > > Signed-off-by: Vitaly Mireyno <vmireyno@marvell.com>
>> >> >> >> > > > ---
>> >> >> >> > > >    content.tex | 29 +++++++++++++++++++++++++++--
>> >> >> >> > > >    1 file changed, 27 insertions(+), 2 deletions(-)
>> >> >> >> > >
>> >> >> >> > > Is there any other networking device that has this feature?
>> >> >> >> > >
>> >> >> >> > >
>> >> >> >> > > > diff --git a/content.tex b/content.tex index
>> >> >> >> > > > b1ea9b9..c9e67c8
>> >> >> >> > > > 100644
>> >> >> >> > > > --- a/content.tex
>> >> >> >> > > > +++ b/content.tex
>> >> >> >> > > > @@ -2811,6 +2811,10 @@ \subsection{Feature
>> >> >> >> > > > bits}\label{sec:Device
>> >> >> >Types / Network Device / Feature bits
>> >> >> >> > > >    \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC
>> >> >> >> > > > address
>> >> >> >through control
>> >> >> >> > > >        channel.
>> >> >> >> > > > +\item[VIRTIO_NET_F_CONST_RXBUF_LEN(58)] Driver
>> >> >> >> > > > +allocates all
>> >> >> >receive buffers
>> >> >> >> > > > +    of the same constant length. Device benefits from
>> >> >> >> > > > + working
>> >with
>> >> >> >> > > > +    receive buffers of equal length.
>> >> >> >> > > > +
>> >> >> >> > Problem is, I don't think linux will use this for skbs
>> >> >> >> > since it breaks buffer accounting. This is because it is
>> >> >> >> > important to make skbs as small as you can. So even if you
>> >> >> >> > see "device would
>> >benefit"
>> >> >> >> > there is no way to balance this with the benefit to linux.
>> >> >> >> > How do you know which benefit is bigger?
>> >> >> >>
>> >> >> >>
>> >> >> >> I guess the idea is e.g for Linux driver, it can refuse this feature.
>> >> >> >
>> >> >> >Okay. What uses it?
>> >> >> >
>> >> >> >>
>> >> >> >> >
>> >> >> >> > You also never explained how does device benefit. My guess
>> >> >> >> > is this allows device to calculate the # of descriptors to
>> >> >> >> > fetch that are needed for a packet. Right?
>> >> >> >> > Assuming this, I think a rough estimate should be enough.
>> >> >> >> > If device fetches too much it can discard extra, if it does
>> >> >> >> > not fetch enough it can fetch extra.
>> >> >> >> >
>> >> >> >> > Let us not forget that express packets are 256 bytes so up
>> >> >> >> > to
>> >> >> >> > 16 descriptors fit in a packet, there is no benefit most of
>> >> >> >> > the time in knowing whether e.g. 1 or 2 descriptors are needed.
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > Let us not forget these are buffers, not descriptors.
>> >> >> >>
>> >> >> >>
>> >> >> >> I guess maybe the initial motivation is constant descriptor length.
>> >> >> >
>> >> >> >
>> >> >> >That conflicts with requirement framing is up to driver.
>> >> >> >
>> >> >> >
>> >> >> I was using the wrong terminology. The feature is about constant
>> >> >*descriptor* length. In other words, the value that driver puts in
>> >> >Packed Virtqueue "Element Length" field (or 'len' field in the
>'pvirtq_desc'
>> >struct).
>> >> >
>> >> >OK so this conflicts with "2.6.4 Message Framing" which requires
>> >> >that drivers can split buffers into as many descriptors as they like.
>> >> >Right?
>> >> >
>> >> >> Does it make more sense now, or you still see an issue with Linux
>driver?
>> >> >
>> >> >I think there's an issue with the flexible framing requirements
>> >> >and there's an issue with Linux driver.
>> >> >
>> >> >> The motivation is to allow the device to calculate the exact
>> >> >> number of
>> >> >descriptors to consume, before fetching the descriptors. This is
>> >> >beneficial for devices for which overconsuming or underconsuming
>> >> >descriptors come at a high cost.
>> >> >
>> >> >So I guess we can agree getting the # of descriptors *exactly*
>> >> >right isn't all that important. Right? My question is how does the
>> >> >driver balance the device versus Linux needs?
>> >> >
>> >> >One idea is if we assume this is best effort, does not have to be
>> >> >exact, then how about just having the device assume descriptor
>> >> >sizes stay more or less constant and use that to estimate the
>> >> >amount? If it under/over estimates, things just go a bit slower.
>> >> >This way driver can adjust the sizes and device will react
>> >> >automatically, with time.
>> >> >
>> >> >
>> >>
>> >> I see that "2.6.4 Message Framing" allows a full flexibility of
>> >> descriptor lengths, and I presume it's applicable to Packet
>> >> Virtqueues as well, though defined under Split Virtqueues section.
>> >
>> >That's a good point. Probably makes sense to move it out to a common
>> >section, right?
>> >
>> >> The example
>> >> talks about transmit virtqueue, and it makes perfect sense.
>> >> However, wouldn't a typical driver place equal-sized *receive*
>> >> descriptors
>> >anyway? So if a device can benefit from it, the driver might as well
>> >negotiate this new feature.
>> >
>> >Having buffer size jump around wildly doesn't seem too useful, I agree.
>> >OTOH being able to adjust it gradually has been in the past
>> >demontrated to help performance measureably.
>> >
>> >> This could be especially relevant for devices, for which adjusting
>> >> the number
>> >of used descriptors is impractical after descriptors have already been
>fetched.
>> >> I agree that if this requirement conflicts with specific driver
>> >> needs, it will not
>> >be negotiated, and the device will either underperform in certain
>> >scenarios, or will not come up at all.
>> >
>> >Right so that makes it a challenge. Device says it prefers fixed
>> >buffers. Is that preference more or less important than ability to
>> >efficiently support workloads such as TCP small queues?
>> >Driver has no idea and I suspect neither does the device.
>> >So I don't see how will a Linux driver know that it should enable
>> >this, neither how will device know it's okay to just refuse features_ok.
>> >
>> >
>> >On the other hand, current drivers already have logic that tries to
>> >change buffer sizes in a smooth way.  So simply caching the last
>> >buffer size and assuming all the others will be exactly the same will
>> >go a long way towards limiting how much does device need to fetch.
>> >This does imply extra logic on the device to recover if things change
>> >and the first read did not fetch enough buffers, but then it would be
>> >required anyway if as you say above the failure is not catastrophic.
>> >
>> >
>> >The feature thus only seems useful for small, feature-restrained
>> >devices
>> >- which are prepared to sacrifice some performance to cut costs, and
>> >which can't recover at all. Is this what you are trying to do?
>> >
>>
>> The intention is to enable devices with this specific limitation to
>> offer virtio offload.  The feature can be redefined such that it would
>> only be offered by devices that are unable to handle dynamically
>> changing descriptor lengths. How does that sound?
>
>So it makes more sense when mergeable buffers are disabled (since then
>buffers are practically all same size).
>

Actually, mergeable buffers are not a problem. They could be enabled, as long as each descriptor is the same length. So implementations that prefer optimizing memory utilization over jumbo frame performance can negotiate VIRTIO_NET_F_MRG_RXBUF and allocate smaller buffers.

>
>With that in mind, I have an idea: scsi and block already have max sg field.
>How about we add a writeable max sg field, maybe even make it
>programmable per vq?
>
>Thus driver tells device what is it's max s/g value for rx. Worst case device
>fetches a bit more than it needs. Discarding extra shouldn't be expensive. This
>looks like it will help even smart devices. What do you think?
>
>This nicely avoids conflicting with the flexible framing requirement.
>

My intention was to avoid any descriptor length variations, for devices that unable fetching or discarding extra descriptors. (If in the pipelined HW processing the decision on number of descriptors is made in the early stage, and it cannot be undone in a later stage).

Defining max s/g sounds like an interesting feature by itself.

>
>> >> Could you specify what issue do you see with the Linux driver?
>> >
>> >See logic around struct ewma.
>> >
>> >The first relevant commit is I guess
>> >commit ab7db91705e95ed1bba1304388936fccfa58c992
>> >    virtio-net: auto-tune mergeable rx buffer size for improved
>> >performance
>> >
>> >this claims a gain of about 10% with large packets which isn't earth
>> >shattering but also not something we can ignore completely. And I
>> >suspect it can be bigger with smaller packets.
>> >
>> >> >> >>
>> >> >> >> >
>> >> >> >> > So device does not really know the exact # of descriptors:
>> >> >> >> > current drivers do 1 descriptor per buffer but nothing
>> >> >> >> > prevents more.
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > Thoughts?
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > > >    \item[VIRTIO_NET_F_RSC_EXT(61)] Device can process
>> >> >> >> > > > duplicated
>> >> >> >ACKs
>> >> >> >> > > >        and report number of coalesced segments and
>> >> >> >> > > > duplicated ACKs @@ -2854,8 +2858,8 @@
>> >\subsubsection{Legacy Interface:
>> >> >> >Feature bits}\label{sec:Device Types / Network
>> >> >> >> > > >    \subsection{Device configuration
>> >> >> >> > > > layout}\label{sec:Device Types /
>> >> >> >Network Device / Device configuration layout}
>> >> >> >> > > >    \label{sec:Device Types / Block Device / Feature
>> >> >> >> > > > bits / Device configuration layout} -Three
>> >> >> >> > > > driver-read-only configuration fields are currently defined.
>> >> >> >> > > > The \field{mac} address field -always exists (though is
>> >> >> >> > > > only valid if VIRTIO_NET_F_MAC is set), and
>> >> >> >> > > > +The driver-read-only \field{mac} address field always
>> >> >> >> > > > +exists (though is only valid if VIRTIO_NET_F_MAC is
>> >> >> >> > > > +set), and
>> >> >> >> > > >    \field{status} only exists if VIRTIO_NET_F_STATUS is set.
>Two
>> >> >> >> > > >    read-only bits (for the driver) are currently
>> >> >> >> > > > defined for the status
>> >> >> >field:
>> >> >> >> > > >    VIRTIO_NET_S_LINK_UP and VIRTIO_NET_S_ANNOUNCE.
>> >> >> >> > > > @@ -2875,12 +2879,17 @@ \subsection{Device
>> >> >> >> > > > configuration
>> >> >> >layout}\label{sec:Device Types / Network Device
>> >> >> >> > > >    VIRTIO_NET_F_MTU is set. This field specifies the
>> >> >> >> > > > maximum MTU for
>> >> >> >the driver to
>> >> >> >> > > >    use.
>> >> >> >> > > > +The device-read-only field \field{rx_buf_len} only
>> >> >> >> > > > +exists if
>> >> >> >> > >
>> >> >> >> > > Should be driver-read-only.
>> >> >> >> > >
>> >> >> >> > >
>> >> >> >> > > > +VIRTIO_NET_F_CONST_RXBUF_LEN is negotiated. This field
>> >> >> >> > > > +specifies the receive buffers length.
>> >> >> >> > > > +
>> >> >> >> > > >    \begin{lstlisting}
>> >> >> >> > > >    struct virtio_net_config {
>> >> >> >> > > >            u8 mac[6];
>> >> >> >> > > >            le16 status;
>> >> >> >> > > >            le16 max_virtqueue_pairs;
>> >> >> >> > > >            le16 mtu;
>> >> >> >> > > > +        le32 rx_buf_len;
>> >> >> >> > > >    };
>> >> >> >> > > >    \end{lstlisting}
>> >> >> >> > > > @@ -2933,6 +2942,13 @@ \subsection{Device configuration
>> >> >> >> > > > layout}\label{sec:Device Types / Network Device
>> >> >> >> > > >    A driver SHOULD negotiate the VIRTIO_NET_F_STANDBY
>> >> >> >> > > > feature if
>> >> >> >the device offers it.
>> >> >> >> > > > +A driver SHOULD accept the
>> >VIRTIO_NET_F_CONST_RXBUF_LEN
>> >> >> >feature if offered.
>> >> >> >> > > > +
>> >> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been
>> >> >negotiated,
>> >> >> >> > > > +the driver MUST set \field{rx_buf_len}.
>> >> >> >> > >
>> >> >> >> > > I think it's device that set the field?
>> >> >> >> > Makes more sense for the driver, but if you want this set e.g.
>> >> >> >> > before buffers are added, you must say so.
>> >> >> >> >
>> >> >> >> > > > +
>> >> >> >> > > > +A driver MUST NOT modify \field{rx_buf_len} once it
>> >> >> >> > > > +has been
>> >> >set.
>> >> >> >> > > > +
>> >> >> >> >
>> >> >> >> > This seems very unflexible. I can see how e.g. XDP would
>> >> >> >> > benefit from big buffers while skbs benefit from small buffers.
>> >> >> >> >
>> >> >> >> > This calls for ability to change this.
>> >> >> >>
>> >> >> >>
>> >> >> >> Yes, but it requires non trivial cleanups for the old length
>> >> >> >> and place them with new ones.
>> >> >> >>
>> >> >> >> Thanks
>> >> >> >
>> >> >> >
>> >> >> >Let's see:
>> >> >> >1	- making buffer smaller: just update config space,
>> >> >> >	  then make new buffers smaller
>> >> >> >
>> >> >> >2	- making buffers bigger: add larger buffers,
>> >> >> >	once all small ones are consumed update config space
>> >> >> >
>> >> >> >2 is tricky I agree. Thoughts?
>> >> >> >
>> >> >> >
>> >> >> Agree. It's doable, provided that the driver will follow the
>> >> >> update
>> >> >procedure.
>> >> >> >
>> >> >> >>
>> >> >> >> >
>> >> >> >> > > >    \subsubsection{Legacy Interface: Device
>> >> >> >> > > > configuration
>> >> >> >layout}\label{sec:Device Types / Network Device / Device
>> >> >> >configuration layout / Legacy Interface: Device configuration
>> >> >> >layout}
>> >> >> >> > > >    \label{sec:Device Types / Block Device / Feature
>> >> >> >> > > > bits / Device
>> >> >> >configuration layout / Legacy Interface: Device configuration
>> >> >> >layout}
>> >> >> >> > > >    When using the legacy interface, transitional
>> >> >> >> > > > devices and drivers @@
>> >> >> >> > > > -3241,6 +3257,11 @@ \subsubsection{Setting Up Receive
>> >> >> >Buffers}\label{sec:Device Types / Network Devi
>> >> >> >> > > >    If VIRTIO_NET_F_MQ is negotiated, each of
>> >> >> >> > > > receiveq1\ldots
>> >> >> >receiveqN
>> >> >> >> > > >    that will be used SHOULD be populated with receive
>buffers.
>> >> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been
>> >> >negotiated,
>> >> >> >> > > > +the driver MUST initialize all receive virtqueue
>> >> >> >> > > > +descriptors \field{len} field with the value
>> >> >> >> > > > +configured in \field{rx_buf_len} device configuration
>> >> >> >> > > > +field, and allocate receive
>> >> >> >buffers accordingly.
>> >> >> >> > > > +
>> >> >> >> > > >    \devicenormative{\paragraph}{Setting Up Receive
>> >> >> >> > > > Buffers}{Device Types / Network Device / Device
>> >> >> >> > > > Operation / Setting Up Receive Buffers}
>> >> >> >> > > >    The device MUST set \field{num_buffers} to the
>> >> >> >> > > > number of descriptors used to @@ -3396,6 +3417,10 @@
>> >> >> >\subsubsection{Processing of Incoming Packets}\label{sec:Device
>> >> >> >Types / Network
>> >> >> >> > > >    checksum (in case of multiple encapsulated
>> >> >> >> > > > protocols, one
>> >level
>> >> >> >> > > >    of checksums is validated).
>> >> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN has been negotiated,
>> >the
>> >> >> >device
>> >> >> >> > > > +MAY use \field{rx_buf_len} as a buffer length (instead
>> >> >> >> > > > +of reading it from virtqueue descriptor \field{len} field).
>> >> >> >> > >
>> >> >> >> > > Is this safe? What if driver submit a small buffer, then
>> >> >> >> > > device can read
>> >> >> >more than what is allowed?
>> >> >> >> > >
>> >> >> >> > > Thanks
>> >> >> >> > >
>> >> >> >> > >
>> >> >> >> > > > +
>> >> >> >> > > >    \drivernormative{\paragraph}{Processing of Incoming
>> >> >> >> > > >    Packets}{Device Types / Network Device / Device Operation
>/
>> >> >> >> > > >    Processing of Incoming Packets}
>> >> >> >> > > > --
>> >> >> >> > >
>> >> >> >> > > This publicly archived list offers a means to provide
>> >> >> >> > > input to theOASIS Virtual I/O Device (VIRTIO) TC.In order
>> >> >> >> > > to verify user consent to the Feedback License terms
>> >> >> >> > > andto minimize spam in the list archive, subscription is
>> >> >> >> > > requiredbefore
>> >posting.Subscribe:
>> >> >> >> > > virtio-comment-subscribe@lists.oasis-open.orgUnsubscribe:
>> >> >> >> > > virtio-comment-unsubscribe@lists.oasis-open.orgList help:
>> >> >> >> > > virtio-comment-help@lists.oasis-open.orgList archive:
>> >> >> >> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__list
>> >> >> >> > > s.o
>> >> >> >> > > asi
>> >> >> >> > > s-2
>> >> >> >> > > Dopen.org_archives_virtio-
>> >> >> >2Dcomment_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7
>> >> >> >> > >
>> >> >>
>> >>
>>
>>>>xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3
>L
>> >R
>> >> >g
>> >> >> >5F7
>> >> >> >> > >
>> >> >>
>> >>
>>
>>>>_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=ZvYuXs3LuhkeJxW_zoADv1L4Ris
>D
>> >y
>> >> >G
>> >> >> >H_v
>> >> >> >> > > p7scO91w9s&e= Feedback License:
>> >> >> >> > > https://urldefense.proofpoint.com/v2/url?u=https-
>> >3A__www.oas
>> >> >> >> > > is-
>> >> >> >2Do
>> >> >> >> > > pen.org_who_ipr_feedback-
>> >> >> >5Flicense.pdf&d=DwIFaQ&c=nKjWec2b6R0mOyPa
>> >> >> >> > >
>> >> >>
>> >>
>>
>>>>z7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXM
>gr
>> >3
>> >> >L
>> >> >> >Rg5
>> >> >> >> > > F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=7Hf-O4ss-
>> >> >> >rLdNJsAbIOg5ruheBcfAYF
>> >> >> >> > > dXWuks_HJfFQ&e= List Guidelines:
>> >> >> >> > > https://urldefense.proofpoint.com/v2/url?u=https-
>> >3A__www.oas
>> >> >> >> > > is-
>> >> >> >2Do
>> >> >> >> > > pen.org_policies-2Dguidelines_mailing-
>> >> >> >2Dlists&d=DwIFaQ&c=nKjWec2b6
>> >> >> >> > >
>> >> >>
>> >>
>>
>>>>R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&
>> >m
>> >> >=
>> >> >> >KkXM
>> >> >> >> > >
>> >> >>
>> >>
>>
>>>>gr3LRg5F7_7jB6uHCHA2n_Tn6303zez2N4VkSfU&s=7oMwlBy4fdYwjT2_f9B
>T
>> >> >w
>> >> >> >HRm
>> >> >> >> > > 2GFtIYL-x8e4C0jlrpM&e= Committee:
>> >> >> >> > > https://urldefense.proofpoint.com/v2/url?u=https-
>> >3A__www.oas
>> >> >> >> > > is-
>> >> >> >2Do
>> >> >> >> > >
>> >> >>
>> >>
>>
>>>>pen.org_committees_virtio_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&
>r
>> >=l
>> >> >> >DH
>> >> >> >> > >
>> >> >>
>> >>
>>
>>>>J2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB
>6
>> >u
>> >> >H
>> >> >> >CHA
>> >> >> >> > > 2n_Tn6303zez2N4VkSfU&s=-
>> >> >> >K2akZ06PY7r4TquQY4T4UWdZ6JyAsBDFZIEZFNsWHk
>> >> >> >> > > &e= Join OASIS:
>> >> >> >> > > https://urldefense.proofpoint.com/v2/url?u=https-
>> >3A__www.oas
>> >> >> >> > > is-
>> >> >> >2Do
>> >> >> >> > >
>> >> >>
>> >>
>>
>>>>pen.org_join_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52o
>J
>> >3l
>> >> >q
>> >> >> >qs
>> >> >> >> > >
>> >> >>
>> >>
>>
>>>>ArgFRdcevq01tbLQAw4A_NO7xgI&m=KkXMgr3LRg5F7_7jB6uHCHA2n_Tn
>6
>> >30
>> >> >3
>> >> >> >zez2
>> >> >> >> > >
>> >N4VkSfU&s=lR1SlzXT71tjw7DDuD6XaHefJQSTBCAvRcdp45ureB8&e=
>> >> >> >> > >
>> >> >> >> > This publicly archived list offers a means to provide input
>> >> >> >> > to the OASIS Virtual I/O Device (VIRTIO) TC.
>> >> >> >> >
>> >> >> >> > In order to verify user consent to the Feedback License
>> >> >> >> > terms and to minimize spam in the list archive,
>> >> >> >> > subscription is required before posting.
>> >> >> >> >
>> >> >> >> > Subscribe: virtio-comment-subscribe@lists.oasis-open.org
>> >> >> >> > Unsubscribe:
>> >> >> >> > virtio-comment-unsubscribe@lists.oasis-open.org
>> >> >> >> > List help: virtio-comment-help@lists.oasis-open.org
>> >> >> >> > List archive:
>> >> >> >> > https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.
>> >> >> >> > oas
>> >> >> >> > is-
>> >> >> >> > 2Do
>> >> >> >> > pen.org_archives_virtio-
>> >> >> >2Dcomment_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ
>> >> >> >> >
>> >> >>
>> >>
>>
>>>>&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5
>R
>> >Y1
>> >> >R
>> >> >> >Td67
>> >> >> >> >
>> >> >>
>> >>
>>
>>>>ln4UY3Qz2I8T0_t4KwDaM0Ow&s=JjvHrIJI6KbQVxPovjCYXudvXtlvyYp3u7l5
>6
>> >t
>> >> >c
>> >> >> >gZ
>> >> >> >> > SU&e= Feedback License:
>> >> >> >> > https://urldefense.proofpoint.com/v2/url?u=https-
>3A__www.oa
>> >> >> >> > sis
>> >> >> >> > -
>> >> >> >2Dope
>> >> >> >> > n.org_who_ipr_feedback-
>> >> >> >5Flicense.pdf&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xt
>> >> >> >> >
>> >> >>
>> >>
>>
>>>>fQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTkt
>m
>> >5
>> >> >RY
>> >> >> >1RTd
>> >> >> >> > 67ln4UY3Qz2I8T0_t4KwDaM0Ow&s=F-
>> >> >> >1nqbZJyAZ3cYJgplGpxopp7wXt26y2UR97jmR
>> >> >> >> > ZSEQ&e= List Guidelines:
>> >> >> >> > https://urldefense.proofpoint.com/v2/url?u=https-
>3A__www.oa
>> >> >> >> > sis
>> >> >> >> > -
>> >> >> >2Dope
>> >> >> >> > n.org_policies-2Dguidelines_mailing-
>> >> >> >2Dlists&d=DwIFaQ&c=nKjWec2b6R0mO
>> >> >> >> >
>> >> >>
>> >>
>>
>>>>yPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_
>v
>> >cf
>> >> >j
>> >> >> >Tktm
>> >> >> >> >
>> >> >>
>> >>
>>
>>>>5RY1RTd67ln4UY3Qz2I8T0_t4KwDaM0Ow&s=0JXzYENjagm_cdHWwpZZ9A
>p
>> >v
>> >> >P
>> >> >> >qBLKHhg
>> >> >> >> > ifl_rDNqEP0&e=
>> >> >> >> > Committee:
>> >> >> >> > https://urldefense.proofpoint.com/v2/url?u=https-
>3A__www.oa
>> >> >> >> > sis
>> >> >> >> > -
>> >> >> >2Dope
>> >> >> >> >
>> >> >>
>> >>
>>
>>>>n.org_committees_virtio_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=l
>> >D
>> >> >HJ
>> >> >> >2FW
>> >> >> >> >
>> >> >>
>> >>
>>
>>>>52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4
>U
>> >Y
>> >> >3
>> >> >> >Qz2I
>> >> >> >> > 8T0_t4KwDaM0Ow&s=C1VTV-Oeidqxe7sJ-
>> >> >> >ctzZXEHto03VJEKs9BqAt2qObc&e=
>> >> >> >> > Join OASIS:
>> >> >> >> > https://urldefense.proofpoint.com/v2/url?u=https-
>3A__www.oa
>> >> >> >> > sis
>> >> >> >> > -
>> >> >> >2Dope
>> >> >> >> >
>> >> >>
>> >>
>>
>>>>n.org_join_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3l
>q
>> >q
>> >> >s
>> >> >> >ArgF
>> >> >> >> >
>> >> >>
>> >>
>>
>>>>Rdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Qz2I8T0_t4
>K
>> >> >w
>> >> >> >DaM0O
>> >> >> >> > w&s=SNZGlaiIO4-mL1Ct2MIUCBntMVTf8JUglBIv_RGloCY&e=
>> >> >> >> >
>> >> >> >
>> >> >> >This publicly archived list offers a means to provide input to
>> >> >> >theOASIS Virtual I/O Device (VIRTIO) TC.In order to verify user
>> >> >> >consent to the Feedback License terms andto minimize spam in
>> >> >> >the list archive, subscription is requiredbefore
>> >> >> >posting.Subscribe: virtio-comment-subscribe@lists.oasis-
>> >> >> >open.orgUnsubscribe:
>> >> >> >virtio-comment-unsubscribe@lists.oasis-open.orgList
>> >> >> >help: virtio-comment-help@lists.oasis-open.orgList archive:
>> >> >> >https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.oasi
>> >> >> >s-
>> >> >> >2Dopen.org_archives_virtio-
>> >> >>
>> >>
>>
>>>>2Dcomment_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ
>3
>> >l
>> >> >q
>> >> >>
>> >>
>>
>>>>qsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Qz2
>I8
>> >T
>> >> >0
>> >> >>
>>_t4KwDaM0Ow&s=JjvHrIJI6KbQVxPovjCYXudvXtlvyYp3u7l56tcgZSU&e=
>> >> >> >Feedback License:
>> >> >> >https://urldefense.proofpoint.com/v2/url?u=https-
>> >> >> >3A__www.oasis-2Dopen.org_who_ipr_feedback-
>> >> >>
>> >>
>>
>>>>5Flicense.pdf&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ
>3l
>> >q
>> >> >q
>> >> >>
>> >>
>>
>>>>sArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Qz2I8
>T
>> >0
>> >> >_
>> >> >> >t4KwDaM0Ow&s=F-
>> >1nqbZJyAZ3cYJgplGpxopp7wXt26y2UR97jmRZSEQ&e=
>> >> >> >List Guidelines:
>> >> >> >https://urldefense.proofpoint.com/v2/url?u=https-
>> >> >> >3A__www.oasis-2Dopen.org_policies-2Dguidelines_mailing-
>> >> >>
>> >>
>>
>>>>2Dlists&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsA
>r
>> >gF
>> >> >R
>> >> >>
>> >>
>>
>>>>dcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3Qz2I8T0_t4K
>> >w
>> >> >D
>> >> >>
>>aM0Ow&s=0JXzYENjagm_cdHWwpZZ9ApvPqBLKHhgifl_rDNqEP0&e=
>> >> >> >Committee: https://urldefense.proofpoint.com/v2/url?u=https-
>> >> >> >3A__www.oasis-
>> >> >>
>> >>
>>
>>>>2Dopen.org_committees_virtio_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xt
>f
>> >Q
>> >> >&
>> >> >>
>> >>
>>
>>>>r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY
>1
>> >R
>> >> >T
>> >> >> >d67ln4UY3Qz2I8T0_t4KwDaM0Ow&s=C1VTV-Oeidqxe7sJ-
>> >> >> >ctzZXEHto03VJEKs9BqAt2qObc&e= Join OASIS:
>> >> >> >https://urldefense.proofpoint.com/v2/url?u=https-
>3A__www.oasis-
>> >> >>
>> >>
>>
>>>>2Dopen.org_join_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2F
>W5
>> >2
>> >> >o
>> >> >>
>> >>
>>
>>>>J3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=B_vcfjTktm5RY1RTd67ln4UY3
>Q
>> >z
>> >> >2I
>> >> >> >8T0_t4KwDaM0Ow&s=SNZGlaiIO4-
>> >> >> >mL1Ct2MIUCBntMVTf8JUglBIv_RGloCY&e=
>> >> >>
>> >>
>>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature
  2019-11-24 15:02                   ` [virtio-comment] " Vitaly Mireyno
@ 2019-11-24 15:30                     ` Michael S. Tsirkin
  2019-11-25  3:31                       ` Jason Wang
  2019-11-26 17:27                       ` Vitaly Mireyno
  0 siblings, 2 replies; 28+ messages in thread
From: Michael S. Tsirkin @ 2019-11-24 15:30 UTC (permalink / raw)
  To: Vitaly Mireyno; +Cc: Jason Wang, virtio-comment

On Sun, Nov 24, 2019 at 03:02:05PM +0000, Vitaly Mireyno wrote:
> 
> 
> >-----Original Message-----
> >From: Michael S. Tsirkin <mst@redhat.com>
> >Sent: Wednesday, 20 November, 2019 15:23
> >To: Vitaly Mireyno <vmireyno@marvell.com>
> >Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
> >open.org
> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-
> >sized receive buffers feature
> >
> >On Thu, Nov 14, 2019 at 03:49:59PM +0000, Vitaly Mireyno wrote:
> >>
> >>
> >> >-----Original Message-----
> >> >From: Michael S. Tsirkin <mst@redhat.com>
> >> >Sent: Monday, 11 November, 2019 17:05
> >> >To: Vitaly Mireyno <vmireyno@marvell.com>
> >> >Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
> >> >open.org
> >> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add
> >> >equal- sized receive buffers feature
> >> >
> >> >On Mon, Nov 11, 2019 at 01:58:51PM +0000, Vitaly Mireyno wrote:
> >> >>
> >> >>
> >> >> >-----Original Message-----
> >> >> >From: Michael S. Tsirkin <mst@redhat.com>
> >> >> >Sent: Sunday, 10 November, 2019 17:35
> >> >> >To: Vitaly Mireyno <vmireyno@marvell.com>
> >> >> >Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
> >> >> >open.org
> >> >> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net:
> >> >> >Add
> >> >> >equal- sized receive buffers feature
> >> >> >
> >> >> >On Sun, Nov 10, 2019 at 02:13:14PM +0000, Vitaly Mireyno wrote:
> >> >> >>
> >> >> >> >-----Original Message-----
> >> >> >> >From: virtio-comment@lists.oasis-open.org
> >> >> >> ><virtio-comment@lists.oasis- open.org> On Behalf Of Michael S.
> >> >> >> >Tsirkin
> >> >> >> >Sent: Tuesday, 5 November, 2019 20:52
> >> >> >> >To: Jason Wang <jasowang@redhat.com>
> >> >> >> >Cc: Vitaly Mireyno <vmireyno@marvell.com>;
> >> >> >> >virtio-comment@lists.oasis- open.org
> >> >> >> >Subject: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add
> >> >> >> >equal-sized receive buffers feature
> >> >> >> >
> >> >> >> >External Email
> >> >> >> >
> >> >> >> >---------------------------------------------------------------
> >> >> >> >---
> >> >> >> >---
> >> >> >> >- On Fri, Nov 01, 2019 at 03:22:26PM +0800, Jason Wang wrote:
> >> >> >> >>
> >> >> >> >> On 2019/11/1 下午12:09, Michael S. Tsirkin wrote:
> >> >> >> >> > On Thu, Oct 31, 2019 at 10:46:55AM +0000, Vitaly Mireyno wrote:
> >> >> >> >> > > The feature is limited to receive buffers only.
> >> >> >> >> > > A driver decides on receive buffer length. The only
> >> >> >> >> > > limitation is that this
> >> >> >> >length has to be the same for all receive virtqueue buffers.
> >> >> >> >> > > The driver configures receive buffer length to the device
> >> >> >> >> > > during device
> >> >> >> >initialization, and the device reads it and may use it for
> >> >> >> >optimal
> >> >operation.
> >> >> >> >> > > No changes for transmit buffers.
> >> >> >> >> > >
> >> >> >> >> > > -----Original Message-----
> >> >> >> >> > > From: virtio-comment@lists.oasis-open.org
> >> >> >> >> > > <virtio-comment@lists.oasis-open.org> On Behalf Of Jason
> >> >> >> >> > > Wang
> >> >> >> >> > > Sent: Thursday, 31 October, 2019 12:15
> >> >> >> >> > > To: Vitaly Mireyno <vmireyno@marvell.com>;
> >> >> >> >> > > virtio-comment@lists.oasis-open.org
> >> >> >> >> > > Cc: Michael S. Tsirkin <mst@redhat.com>
> >> >> >> >> > > Subject: [virtio-comment] Re: [PATCH] virtio-net: Add
> >> >> >> >> > > equal-sized receive buffers feature
> >> >> >> >> > >
> >> >> >> >> > >
> >> >> >> >> > > On 2019/10/31 下午5:23, Vitaly Mireyno wrote:
> >> >> >> >> > > > Some devices benefit from working with receive buffers
> >> >> >> >> > > > of a
> >> >> >> >predefined constant length.
> >> >> >> >> > > > Add a feature for drivers that allocate equal-sized
> >> >> >> >> > > > receive buffers, and
> >> >> >> >devices that benefit from predefined receive buffers length.
> >> >> >> >> > > >
> >> >> >> >> > > > Signed-off-by: Vitaly Mireyno <vmireyno@marvell.com>
> >> >> >> >> > > > ---
> >> >> >> >> > > >    content.tex | 29 +++++++++++++++++++++++++++--
> >> >> >> >> > > >    1 file changed, 27 insertions(+), 2 deletions(-)
> >> >> >> >> > >
> >> >> >> >> > > Is there any other networking device that has this feature?
> >> >> >> >> > >
> >> >> >> >> > >
> >> >> >> >> > > > diff --git a/content.tex b/content.tex index
> >> >> >> >> > > > b1ea9b9..c9e67c8
> >> >> >> >> > > > 100644
> >> >> >> >> > > > --- a/content.tex
> >> >> >> >> > > > +++ b/content.tex
> >> >> >> >> > > > @@ -2811,6 +2811,10 @@ \subsection{Feature
> >> >> >> >> > > > bits}\label{sec:Device
> >> >> >> >Types / Network Device / Feature bits
> >> >> >> >> > > >    \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC
> >> >> >> >> > > > address
> >> >> >> >through control
> >> >> >> >> > > >        channel.
> >> >> >> >> > > > +\item[VIRTIO_NET_F_CONST_RXBUF_LEN(58)] Driver
> >> >> >> >> > > > +allocates all
> >> >> >> >receive buffers
> >> >> >> >> > > > +    of the same constant length. Device benefits from
> >> >> >> >> > > > + working
> >> >with
> >> >> >> >> > > > +    receive buffers of equal length.
> >> >> >> >> > > > +
> >> >> >> >> > Problem is, I don't think linux will use this for skbs
> >> >> >> >> > since it breaks buffer accounting. This is because it is
> >> >> >> >> > important to make skbs as small as you can. So even if you
> >> >> >> >> > see "device would
> >> >benefit"
> >> >> >> >> > there is no way to balance this with the benefit to linux.
> >> >> >> >> > How do you know which benefit is bigger?
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> I guess the idea is e.g for Linux driver, it can refuse this feature.
> >> >> >> >
> >> >> >> >Okay. What uses it?
> >> >> >> >
> >> >> >> >>
> >> >> >> >> >
> >> >> >> >> > You also never explained how does device benefit. My guess
> >> >> >> >> > is this allows device to calculate the # of descriptors to
> >> >> >> >> > fetch that are needed for a packet. Right?
> >> >> >> >> > Assuming this, I think a rough estimate should be enough.
> >> >> >> >> > If device fetches too much it can discard extra, if it does
> >> >> >> >> > not fetch enough it can fetch extra.
> >> >> >> >> >
> >> >> >> >> > Let us not forget that express packets are 256 bytes so up
> >> >> >> >> > to
> >> >> >> >> > 16 descriptors fit in a packet, there is no benefit most of
> >> >> >> >> > the time in knowing whether e.g. 1 or 2 descriptors are needed.
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> > Let us not forget these are buffers, not descriptors.
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> I guess maybe the initial motivation is constant descriptor length.
> >> >> >> >
> >> >> >> >
> >> >> >> >That conflicts with requirement framing is up to driver.
> >> >> >> >
> >> >> >> >
> >> >> >> I was using the wrong terminology. The feature is about constant
> >> >> >*descriptor* length. In other words, the value that driver puts in
> >> >> >Packed Virtqueue "Element Length" field (or 'len' field in the
> >'pvirtq_desc'
> >> >struct).
> >> >> >
> >> >> >OK so this conflicts with "2.6.4 Message Framing" which requires
> >> >> >that drivers can split buffers into as many descriptors as they like.
> >> >> >Right?
> >> >> >
> >> >> >> Does it make more sense now, or you still see an issue with Linux
> >driver?
> >> >> >
> >> >> >I think there's an issue with the flexible framing requirements
> >> >> >and there's an issue with Linux driver.
> >> >> >
> >> >> >> The motivation is to allow the device to calculate the exact
> >> >> >> number of
> >> >> >descriptors to consume, before fetching the descriptors. This is
> >> >> >beneficial for devices for which overconsuming or underconsuming
> >> >> >descriptors come at a high cost.
> >> >> >
> >> >> >So I guess we can agree getting the # of descriptors *exactly*
> >> >> >right isn't all that important. Right? My question is how does the
> >> >> >driver balance the device versus Linux needs?
> >> >> >
> >> >> >One idea is if we assume this is best effort, does not have to be
> >> >> >exact, then how about just having the device assume descriptor
> >> >> >sizes stay more or less constant and use that to estimate the
> >> >> >amount? If it under/over estimates, things just go a bit slower.
> >> >> >This way driver can adjust the sizes and device will react
> >> >> >automatically, with time.
> >> >> >
> >> >> >
> >> >>
> >> >> I see that "2.6.4 Message Framing" allows a full flexibility of
> >> >> descriptor lengths, and I presume it's applicable to Packet
> >> >> Virtqueues as well, though defined under Split Virtqueues section.
> >> >
> >> >That's a good point. Probably makes sense to move it out to a common
> >> >section, right?
> >> >
> >> >> The example
> >> >> talks about transmit virtqueue, and it makes perfect sense.
> >> >> However, wouldn't a typical driver place equal-sized *receive*
> >> >> descriptors
> >> >anyway? So if a device can benefit from it, the driver might as well
> >> >negotiate this new feature.
> >> >
> >> >Having buffer size jump around wildly doesn't seem too useful, I agree.
> >> >OTOH being able to adjust it gradually has been in the past
> >> >demontrated to help performance measureably.
> >> >
> >> >> This could be especially relevant for devices, for which adjusting
> >> >> the number
> >> >of used descriptors is impractical after descriptors have already been
> >fetched.
> >> >> I agree that if this requirement conflicts with specific driver
> >> >> needs, it will not
> >> >be negotiated, and the device will either underperform in certain
> >> >scenarios, or will not come up at all.
> >> >
> >> >Right so that makes it a challenge. Device says it prefers fixed
> >> >buffers. Is that preference more or less important than ability to
> >> >efficiently support workloads such as TCP small queues?
> >> >Driver has no idea and I suspect neither does the device.
> >> >So I don't see how will a Linux driver know that it should enable
> >> >this, neither how will device know it's okay to just refuse features_ok.
> >> >
> >> >
> >> >On the other hand, current drivers already have logic that tries to
> >> >change buffer sizes in a smooth way.  So simply caching the last
> >> >buffer size and assuming all the others will be exactly the same will
> >> >go a long way towards limiting how much does device need to fetch.
> >> >This does imply extra logic on the device to recover if things change
> >> >and the first read did not fetch enough buffers, but then it would be
> >> >required anyway if as you say above the failure is not catastrophic.
> >> >
> >> >
> >> >The feature thus only seems useful for small, feature-restrained
> >> >devices
> >> >- which are prepared to sacrifice some performance to cut costs, and
> >> >which can't recover at all. Is this what you are trying to do?
> >> >
> >>
> >> The intention is to enable devices with this specific limitation to
> >> offer virtio offload.  The feature can be redefined such that it would
> >> only be offered by devices that are unable to handle dynamically
> >> changing descriptor lengths. How does that sound?
> >
> >So it makes more sense when mergeable buffers are disabled (since then
> >buffers are practically all same size).
> >
> 
> Actually, mergeable buffers are not a problem. They could be enabled,
> as long as each descriptor is the same length. So implementations that
> prefer optimizing memory utilization over jumbo frame performance can
> negotiate VIRTIO_NET_F_MRG_RXBUF and allocate smaller buffers.

So my point was, without VIRTIO_NET_F_MRG_RXBUF, all buffers are same
length anyway. So if we are talking about cheap simple hardware, I guess
it can just not have VIRTIO_NET_F_MRG_RXBUF and be done with it?
If device does care about memory utilization then I think
it needs to give driver the flexibility it needs/uses.
No?

Again I can see how we might want to disallow crazy setups with e.g. 1
byte per buffer. That's just abuse, no guest does that anyway. So asking
e.g. for a minimal buffer size sounds very reasonable.  But an option that
disables functionality that a popular guest uses needs a lot of
documentation to help device writers figure out whether they want that
option or not, and I'd worry that even with documentation will be
misunderstood even if we write it. When do you enable this?
When you don't care about performance? When you don't care about Linux?

> >
> >With that in mind, I have an idea: scsi and block already have max sg field.
> >How about we add a writeable max sg field, maybe even make it
> >programmable per vq?
> >
> >Thus driver tells device what is it's max s/g value for rx. Worst case device
> >fetches a bit more than it needs. Discarding extra shouldn't be expensive. This
> >looks like it will help even smart devices. What do you think?
> >
> >This nicely avoids conflicting with the flexible framing requirement.
> >
> 
> My intention was to avoid any descriptor length variations, for
> devices that unable fetching or discarding extra descriptors. (If in
> the pipelined HW processing the decision on number of descriptors is
> made in the early stage, and it cannot be undone in a later stage).

Frankly discarding unused descriptors looks to me like something that
shouldn't have a high cost in the hardware.  I can see how trying to
predict descriptor length, and fetching extra if not enough was fetched
can have a high cost, so to me extensions to remove the guesswork
from the device have value.  However a lot of effort went into trying to
reduce e.g. number of pci express packets needed to fetch descriptors.
Each packet fetches 256 bytes anyway, does it not?  Not having an
ability to use that information seems like an obvious waste, and that
means ability to keep some fetched descriptors around even if they are
not used for a given packet. Again just my $.02.

>
> Defining max s/g sounds like an interesting feature by itself.

But assuming we have max RX s/g, I guess hardware can set max s/g = 1?
Then since with !VIRTIO_NET_F_MRG_RXBUF all buffers are forced to
be same length.

> 
> >
> >> >> Could you specify what issue do you see with the Linux driver?
> >> >
> >> >See logic around struct ewma.
> >> >
> >> >The first relevant commit is I guess
> >> >commit ab7db91705e95ed1bba1304388936fccfa58c992
> >> >    virtio-net: auto-tune mergeable rx buffer size for improved
> >> >performance
> >> >
> >> >this claims a gain of about 10% with large packets which isn't earth
> >> >shattering but also not something we can ignore completely. And I
> >> >suspect it can be bigger with smaller packets.
> >> >
> >> >> >> >>
> >> >> >> >> >
> >> >> >> >> > So device does not really know the exact # of descriptors:
> >> >> >> >> > current drivers do 1 descriptor per buffer but nothing
> >> >> >> >> > prevents more.
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> > Thoughts?
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> > > >    \item[VIRTIO_NET_F_RSC_EXT(61)] Device can process
> >> >> >> >> > > > duplicated
> >> >> >> >ACKs
> >> >> >> >> > > >        and report number of coalesced segments and
> >> >> >> >> > > > duplicated ACKs @@ -2854,8 +2858,8 @@
> >> >\subsubsection{Legacy Interface:
> >> >> >> >Feature bits}\label{sec:Device Types / Network
> >> >> >> >> > > >    \subsection{Device configuration
> >> >> >> >> > > > layout}\label{sec:Device Types /
> >> >> >> >Network Device / Device configuration layout}
> >> >> >> >> > > >    \label{sec:Device Types / Block Device / Feature
> >> >> >> >> > > > bits / Device configuration layout} -Three
> >> >> >> >> > > > driver-read-only configuration fields are currently defined.
> >> >> >> >> > > > The \field{mac} address field -always exists (though is
> >> >> >> >> > > > only valid if VIRTIO_NET_F_MAC is set), and
> >> >> >> >> > > > +The driver-read-only \field{mac} address field always
> >> >> >> >> > > > +exists (though is only valid if VIRTIO_NET_F_MAC is
> >> >> >> >> > > > +set), and
> >> >> >> >> > > >    \field{status} only exists if VIRTIO_NET_F_STATUS is set.
> >Two
> >> >> >> >> > > >    read-only bits (for the driver) are currently
> >> >> >> >> > > > defined for the status
> >> >> >> >field:
> >> >> >> >> > > >    VIRTIO_NET_S_LINK_UP and VIRTIO_NET_S_ANNOUNCE.
> >> >> >> >> > > > @@ -2875,12 +2879,17 @@ \subsection{Device
> >> >> >> >> > > > configuration
> >> >> >> >layout}\label{sec:Device Types / Network Device
> >> >> >> >> > > >    VIRTIO_NET_F_MTU is set. This field specifies the
> >> >> >> >> > > > maximum MTU for
> >> >> >> >the driver to
> >> >> >> >> > > >    use.
> >> >> >> >> > > > +The device-read-only field \field{rx_buf_len} only
> >> >> >> >> > > > +exists if
> >> >> >> >> > >
> >> >> >> >> > > Should be driver-read-only.
> >> >> >> >> > >
> >> >> >> >> > >
> >> >> >> >> > > > +VIRTIO_NET_F_CONST_RXBUF_LEN is negotiated. This field
> >> >> >> >> > > > +specifies the receive buffers length.
> >> >> >> >> > > > +
> >> >> >> >> > > >    \begin{lstlisting}
> >> >> >> >> > > >    struct virtio_net_config {
> >> >> >> >> > > >            u8 mac[6];
> >> >> >> >> > > >            le16 status;
> >> >> >> >> > > >            le16 max_virtqueue_pairs;
> >> >> >> >> > > >            le16 mtu;
> >> >> >> >> > > > +        le32 rx_buf_len;
> >> >> >> >> > > >    };
> >> >> >> >> > > >    \end{lstlisting}
> >> >> >> >> > > > @@ -2933,6 +2942,13 @@ \subsection{Device configuration
> >> >> >> >> > > > layout}\label{sec:Device Types / Network Device
> >> >> >> >> > > >    A driver SHOULD negotiate the VIRTIO_NET_F_STANDBY
> >> >> >> >> > > > feature if
> >> >> >> >the device offers it.
> >> >> >> >> > > > +A driver SHOULD accept the
> >> >VIRTIO_NET_F_CONST_RXBUF_LEN
> >> >> >> >feature if offered.
> >> >> >> >> > > > +
> >> >> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been
> >> >> >negotiated,
> >> >> >> >> > > > +the driver MUST set \field{rx_buf_len}.
> >> >> >> >> > >
> >> >> >> >> > > I think it's device that set the field?
> >> >> >> >> > Makes more sense for the driver, but if you want this set e.g.
> >> >> >> >> > before buffers are added, you must say so.
> >> >> >> >> >
> >> >> >> >> > > > +
> >> >> >> >> > > > +A driver MUST NOT modify \field{rx_buf_len} once it
> >> >> >> >> > > > +has been
> >> >> >set.
> >> >> >> >> > > > +
> >> >> >> >> >
> >> >> >> >> > This seems very unflexible. I can see how e.g. XDP would
> >> >> >> >> > benefit from big buffers while skbs benefit from small buffers.
> >> >> >> >> >
> >> >> >> >> > This calls for ability to change this.
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> Yes, but it requires non trivial cleanups for the old length
> >> >> >> >> and place them with new ones.
> >> >> >> >>
> >> >> >> >> Thanks
> >> >> >> >
> >> >> >> >
> >> >> >> >Let's see:
> >> >> >> >1	- making buffer smaller: just update config space,
> >> >> >> >	  then make new buffers smaller
> >> >> >> >
> >> >> >> >2	- making buffers bigger: add larger buffers,
> >> >> >> >	once all small ones are consumed update config space
> >> >> >> >
> >> >> >> >2 is tricky I agree. Thoughts?
> >> >> >> >
> >> >> >> >
> >> >> >> Agree. It's doable, provided that the driver will follow the
> >> >> >> update
> >> >> >procedure.
> >> >> >> >
> >> >> >> >>
> >> >> >> >> >
> >> >> >> >> > > >    \subsubsection{Legacy Interface: Device
> >> >> >> >> > > > configuration
> >> >> >> >layout}\label{sec:Device Types / Network Device / Device
> >> >> >> >configuration layout / Legacy Interface: Device configuration
> >> >> >> >layout}
> >> >> >> >> > > >    \label{sec:Device Types / Block Device / Feature
> >> >> >> >> > > > bits / Device
> >> >> >> >configuration layout / Legacy Interface: Device configuration
> >> >> >> >layout}
> >> >> >> >> > > >    When using the legacy interface, transitional
> >> >> >> >> > > > devices and drivers @@
> >> >> >> >> > > > -3241,6 +3257,11 @@ \subsubsection{Setting Up Receive
> >> >> >> >Buffers}\label{sec:Device Types / Network Devi
> >> >> >> >> > > >    If VIRTIO_NET_F_MQ is negotiated, each of
> >> >> >> >> > > > receiveq1\ldots
> >> >> >> >receiveqN
> >> >> >> >> > > >    that will be used SHOULD be populated with receive
> >buffers.
> >> >> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been
> >> >> >negotiated,
> >> >> >> >> > > > +the driver MUST initialize all receive virtqueue
> >> >> >> >> > > > +descriptors \field{len} field with the value
> >> >> >> >> > > > +configured in \field{rx_buf_len} device configuration
> >> >> >> >> > > > +field, and allocate receive
> >> >> >> >buffers accordingly.
> >> >> >> >> > > > +
> >> >> >> >> > > >    \devicenormative{\paragraph}{Setting Up Receive
> >> >> >> >> > > > Buffers}{Device Types / Network Device / Device
> >> >> >> >> > > > Operation / Setting Up Receive Buffers}
> >> >> >> >> > > >    The device MUST set \field{num_buffers} to the
> >> >> >> >> > > > number of descriptors used to @@ -3396,6 +3417,10 @@
> >> >> >> >\subsubsection{Processing of Incoming Packets}\label{sec:Device
> >> >> >> >Types / Network
> >> >> >> >> > > >    checksum (in case of multiple encapsulated
> >> >> >> >> > > > protocols, one
> >> >level
> >> >> >> >> > > >    of checksums is validated).
> >> >> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN has been negotiated,
> >> >the
> >> >> >> >device
> >> >> >> >> > > > +MAY use \field{rx_buf_len} as a buffer length (instead
> >> >> >> >> > > > +of reading it from virtqueue descriptor \field{len} field).
> >> >> >> >> > >
> >> >> >> >> > > Is this safe? What if driver submit a small buffer, then
> >> >> >> >> > > device can read
> >> >> >> >more than what is allowed?
> >> >> >> >> > >
> >> >> >> >> > > Thanks
> >> >> >> >> > >
> >> >> >> >> > >
> >> >> >> >> > > > +
> >> >> >> >> > > >    \drivernormative{\paragraph}{Processing of Incoming
> >> >> >> >> > > >    Packets}{Device Types / Network Device / Device Operation
> >/
> >> >> >> >> > > >    Processing of Incoming Packets}


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature
  2019-11-24 15:30                     ` [virtio-comment] " Michael S. Tsirkin
@ 2019-11-25  3:31                       ` Jason Wang
  2019-11-25  7:33                         ` Michael S. Tsirkin
  2019-11-26 17:27                       ` Vitaly Mireyno
  1 sibling, 1 reply; 28+ messages in thread
From: Jason Wang @ 2019-11-25  3:31 UTC (permalink / raw)
  To: Michael S. Tsirkin, Vitaly Mireyno; +Cc: virtio-comment


On 2019/11/24 下午11:30, Michael S. Tsirkin wrote:
> On Sun, Nov 24, 2019 at 03:02:05PM +0000, Vitaly Mireyno wrote:
>>
>>> -----Original Message-----
>>> From: Michael S. Tsirkin <mst@redhat.com>
>>> Sent: Wednesday, 20 November, 2019 15:23
>>> To: Vitaly Mireyno <vmireyno@marvell.com>
>>> Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
>>> open.org
>>> Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-
>>> sized receive buffers feature
>>>
>>> On Thu, Nov 14, 2019 at 03:49:59PM +0000, Vitaly Mireyno wrote:
>>>>
>>>>> -----Original Message-----
>>>>> From: Michael S. Tsirkin <mst@redhat.com>
>>>>> Sent: Monday, 11 November, 2019 17:05
>>>>> To: Vitaly Mireyno <vmireyno@marvell.com>
>>>>> Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
>>>>> open.org
>>>>> Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add
>>>>> equal- sized receive buffers feature
>>>>>
>>>>> On Mon, Nov 11, 2019 at 01:58:51PM +0000, Vitaly Mireyno wrote:
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Michael S. Tsirkin <mst@redhat.com>
>>>>>>> Sent: Sunday, 10 November, 2019 17:35
>>>>>>> To: Vitaly Mireyno <vmireyno@marvell.com>
>>>>>>> Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
>>>>>>> open.org
>>>>>>> Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net:
>>>>>>> Add
>>>>>>> equal- sized receive buffers feature
>>>>>>>
>>>>>>> On Sun, Nov 10, 2019 at 02:13:14PM +0000, Vitaly Mireyno wrote:
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: virtio-comment@lists.oasis-open.org
>>>>>>>>> <virtio-comment@lists.oasis- open.org> On Behalf Of Michael S.
>>>>>>>>> Tsirkin
>>>>>>>>> Sent: Tuesday, 5 November, 2019 20:52
>>>>>>>>> To: Jason Wang <jasowang@redhat.com>
>>>>>>>>> Cc: Vitaly Mireyno <vmireyno@marvell.com>;
>>>>>>>>> virtio-comment@lists.oasis- open.org
>>>>>>>>> Subject: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add
>>>>>>>>> equal-sized receive buffers feature
>>>>>>>>>
>>>>>>>>> External Email
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>> ---
>>>>>>>>> ---
>>>>>>>>> - On Fri, Nov 01, 2019 at 03:22:26PM +0800, Jason Wang wrote:
>>>>>>>>>> On 2019/11/1 下午12:09, Michael S. Tsirkin wrote:
>>>>>>>>>>> On Thu, Oct 31, 2019 at 10:46:55AM +0000, Vitaly Mireyno wrote:
>>>>>>>>>>>> The feature is limited to receive buffers only.
>>>>>>>>>>>> A driver decides on receive buffer length. The only
>>>>>>>>>>>> limitation is that this
>>>>>>>>> length has to be the same for all receive virtqueue buffers.
>>>>>>>>>>>> The driver configures receive buffer length to the device
>>>>>>>>>>>> during device
>>>>>>>>> initialization, and the device reads it and may use it for
>>>>>>>>> optimal
>>>>> operation.
>>>>>>>>>>>> No changes for transmit buffers.
>>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: virtio-comment@lists.oasis-open.org
>>>>>>>>>>>> <virtio-comment@lists.oasis-open.org> On Behalf Of Jason
>>>>>>>>>>>> Wang
>>>>>>>>>>>> Sent: Thursday, 31 October, 2019 12:15
>>>>>>>>>>>> To: Vitaly Mireyno <vmireyno@marvell.com>;
>>>>>>>>>>>> virtio-comment@lists.oasis-open.org
>>>>>>>>>>>> Cc: Michael S. Tsirkin <mst@redhat.com>
>>>>>>>>>>>> Subject: [virtio-comment] Re: [PATCH] virtio-net: Add
>>>>>>>>>>>> equal-sized receive buffers feature
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 2019/10/31 下午5:23, Vitaly Mireyno wrote:
>>>>>>>>>>>>> Some devices benefit from working with receive buffers
>>>>>>>>>>>>> of a
>>>>>>>>> predefined constant length.
>>>>>>>>>>>>> Add a feature for drivers that allocate equal-sized
>>>>>>>>>>>>> receive buffers, and
>>>>>>>>> devices that benefit from predefined receive buffers length.
>>>>>>>>>>>>> Signed-off-by: Vitaly Mireyno <vmireyno@marvell.com>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>>     content.tex | 29 +++++++++++++++++++++++++++--
>>>>>>>>>>>>>     1 file changed, 27 insertions(+), 2 deletions(-)
>>>>>>>>>>>> Is there any other networking device that has this feature?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/content.tex b/content.tex index
>>>>>>>>>>>>> b1ea9b9..c9e67c8
>>>>>>>>>>>>> 100644
>>>>>>>>>>>>> --- a/content.tex
>>>>>>>>>>>>> +++ b/content.tex
>>>>>>>>>>>>> @@ -2811,6 +2811,10 @@ \subsection{Feature
>>>>>>>>>>>>> bits}\label{sec:Device
>>>>>>>>> Types / Network Device / Feature bits
>>>>>>>>>>>>>     \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC
>>>>>>>>>>>>> address
>>>>>>>>> through control
>>>>>>>>>>>>>         channel.
>>>>>>>>>>>>> +\item[VIRTIO_NET_F_CONST_RXBUF_LEN(58)] Driver
>>>>>>>>>>>>> +allocates all
>>>>>>>>> receive buffers
>>>>>>>>>>>>> +    of the same constant length. Device benefits from
>>>>>>>>>>>>> + working
>>>>> with
>>>>>>>>>>>>> +    receive buffers of equal length.
>>>>>>>>>>>>> +
>>>>>>>>>>> Problem is, I don't think linux will use this for skbs
>>>>>>>>>>> since it breaks buffer accounting. This is because it is
>>>>>>>>>>> important to make skbs as small as you can. So even if you
>>>>>>>>>>> see "device would
>>>>> benefit"
>>>>>>>>>>> there is no way to balance this with the benefit to linux.
>>>>>>>>>>> How do you know which benefit is bigger?
>>>>>>>>>>
>>>>>>>>>> I guess the idea is e.g for Linux driver, it can refuse this feature.
>>>>>>>>> Okay. What uses it?
>>>>>>>>>
>>>>>>>>>>> You also never explained how does device benefit. My guess
>>>>>>>>>>> is this allows device to calculate the # of descriptors to
>>>>>>>>>>> fetch that are needed for a packet. Right?
>>>>>>>>>>> Assuming this, I think a rough estimate should be enough.
>>>>>>>>>>> If device fetches too much it can discard extra, if it does
>>>>>>>>>>> not fetch enough it can fetch extra.
>>>>>>>>>>>
>>>>>>>>>>> Let us not forget that express packets are 256 bytes so up
>>>>>>>>>>> to
>>>>>>>>>>> 16 descriptors fit in a packet, there is no benefit most of
>>>>>>>>>>> the time in knowing whether e.g. 1 or 2 descriptors are needed.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Let us not forget these are buffers, not descriptors.
>>>>>>>>>>
>>>>>>>>>> I guess maybe the initial motivation is constant descriptor length.
>>>>>>>>>
>>>>>>>>> That conflicts with requirement framing is up to driver.
>>>>>>>>>
>>>>>>>>>
>>>>>>>> I was using the wrong terminology. The feature is about constant
>>>>>>> *descriptor* length. In other words, the value that driver puts in
>>>>>>> Packed Virtqueue "Element Length" field (or 'len' field in the
>>> 'pvirtq_desc'
>>>>> struct).
>>>>>>> OK so this conflicts with "2.6.4 Message Framing" which requires
>>>>>>> that drivers can split buffers into as many descriptors as they like.
>>>>>>> Right?
>>>>>>>
>>>>>>>> Does it make more sense now, or you still see an issue with Linux
>>> driver?
>>>>>>> I think there's an issue with the flexible framing requirements
>>>>>>> and there's an issue with Linux driver.
>>>>>>>
>>>>>>>> The motivation is to allow the device to calculate the exact
>>>>>>>> number of
>>>>>>> descriptors to consume, before fetching the descriptors. This is
>>>>>>> beneficial for devices for which overconsuming or underconsuming
>>>>>>> descriptors come at a high cost.
>>>>>>>
>>>>>>> So I guess we can agree getting the # of descriptors *exactly*
>>>>>>> right isn't all that important. Right? My question is how does the
>>>>>>> driver balance the device versus Linux needs?
>>>>>>>
>>>>>>> One idea is if we assume this is best effort, does not have to be
>>>>>>> exact, then how about just having the device assume descriptor
>>>>>>> sizes stay more or less constant and use that to estimate the
>>>>>>> amount? If it under/over estimates, things just go a bit slower.
>>>>>>> This way driver can adjust the sizes and device will react
>>>>>>> automatically, with time.
>>>>>>>
>>>>>>>
>>>>>> I see that "2.6.4 Message Framing" allows a full flexibility of
>>>>>> descriptor lengths, and I presume it's applicable to Packet
>>>>>> Virtqueues as well, though defined under Split Virtqueues section.
>>>>> That's a good point. Probably makes sense to move it out to a common
>>>>> section, right?
>>>>>
>>>>>> The example
>>>>>> talks about transmit virtqueue, and it makes perfect sense.
>>>>>> However, wouldn't a typical driver place equal-sized *receive*
>>>>>> descriptors
>>>>> anyway? So if a device can benefit from it, the driver might as well
>>>>> negotiate this new feature.
>>>>>
>>>>> Having buffer size jump around wildly doesn't seem too useful, I agree.
>>>>> OTOH being able to adjust it gradually has been in the past
>>>>> demontrated to help performance measureably.
>>>>>
>>>>>> This could be especially relevant for devices, for which adjusting
>>>>>> the number
>>>>> of used descriptors is impractical after descriptors have already been
>>> fetched.
>>>>>> I agree that if this requirement conflicts with specific driver
>>>>>> needs, it will not
>>>>> be negotiated, and the device will either underperform in certain
>>>>> scenarios, or will not come up at all.
>>>>>
>>>>> Right so that makes it a challenge. Device says it prefers fixed
>>>>> buffers. Is that preference more or less important than ability to
>>>>> efficiently support workloads such as TCP small queues?
>>>>> Driver has no idea and I suspect neither does the device.
>>>>> So I don't see how will a Linux driver know that it should enable
>>>>> this, neither how will device know it's okay to just refuse features_ok.
>>>>>
>>>>>
>>>>> On the other hand, current drivers already have logic that tries to
>>>>> change buffer sizes in a smooth way.  So simply caching the last
>>>>> buffer size and assuming all the others will be exactly the same will
>>>>> go a long way towards limiting how much does device need to fetch.
>>>>> This does imply extra logic on the device to recover if things change
>>>>> and the first read did not fetch enough buffers, but then it would be
>>>>> required anyway if as you say above the failure is not catastrophic.
>>>>>
>>>>>
>>>>> The feature thus only seems useful for small, feature-restrained
>>>>> devices
>>>>> - which are prepared to sacrifice some performance to cut costs, and
>>>>> which can't recover at all. Is this what you are trying to do?
>>>>>
>>>> The intention is to enable devices with this specific limitation to
>>>> offer virtio offload.  The feature can be redefined such that it would
>>>> only be offered by devices that are unable to handle dynamically
>>>> changing descriptor lengths. How does that sound?
>>> So it makes more sense when mergeable buffers are disabled (since then
>>> buffers are practically all same size).
>>>
>> Actually, mergeable buffers are not a problem. They could be enabled,
>> as long as each descriptor is the same length. So implementations that
>> prefer optimizing memory utilization over jumbo frame performance can
>> negotiate VIRTIO_NET_F_MRG_RXBUF and allocate smaller buffers.
> So my point was, without VIRTIO_NET_F_MRG_RXBUF, all buffers are same
> length anyway. So if we are talking about cheap simple hardware, I guess
> it can just not have VIRTIO_NET_F_MRG_RXBUF and be done with it?
> If device does care about memory utilization then I think
> it needs to give driver the flexibility it needs/uses.
> No?
>
> Again I can see how we might want to disallow crazy setups with e.g. 1
> byte per buffer. That's just abuse, no guest does that anyway. So asking
> e.g. for a minimal buffer size sounds very reasonable.


One question here is that, the minimal buffer size should depends on 
various factors. E.g the ring size. Consider a 256 entries ring, the 
minimal size should be 64K/256=256 ...


> But an option that
> disables functionality that a popular guest uses needs a lot of
> documentation to help device writers figure out whether they want that
> option or not, and I'd worry that even with documentation will be
> misunderstood even if we write it. When do you enable this?
> When you don't care about performance? When you don't care about Linux?


It looks to me the feature proposal here is something related to the 
descriptor pre fetching. In the case of packed virtqueue. Having 
avail/producer index may help more or less here?

Thanks


>
>>> With that in mind, I have an idea: scsi and block already have max sg field.
>>> How about we add a writeable max sg field, maybe even make it
>>> programmable per vq?
>>>
>>> Thus driver tells device what is it's max s/g value for rx. Worst case device
>>> fetches a bit more than it needs. Discarding extra shouldn't be expensive. This
>>> looks like it will help even smart devices. What do you think?
>>>
>>> This nicely avoids conflicting with the flexible framing requirement.
>>>
>> My intention was to avoid any descriptor length variations, for
>> devices that unable fetching or discarding extra descriptors. (If in
>> the pipelined HW processing the decision on number of descriptors is
>> made in the early stage, and it cannot be undone in a later stage).
> Frankly discarding unused descriptors looks to me like something that
> shouldn't have a high cost in the hardware.  I can see how trying to
> predict descriptor length, and fetching extra if not enough was fetched
> can have a high cost, so to me extensions to remove the guesswork
> from the device have value.  However a lot of effort went into trying to
> reduce e.g. number of pci express packets needed to fetch descriptors.
> Each packet fetches 256 bytes anyway, does it not?  Not having an
> ability to use that information seems like an obvious waste, and that
> means ability to keep some fetched descriptors around even if they are
> not used for a given packet. Again just my $.02.
>
>> Defining max s/g sounds like an interesting feature by itself.
> But assuming we have max RX s/g, I guess hardware can set max s/g = 1?
> Then since with !VIRTIO_NET_F_MRG_RXBUF all buffers are forced to
> be same length.
>
>>>>>> Could you specify what issue do you see with the Linux driver?
>>>>> See logic around struct ewma.
>>>>>
>>>>> The first relevant commit is I guess
>>>>> commit ab7db91705e95ed1bba1304388936fccfa58c992
>>>>>     virtio-net: auto-tune mergeable rx buffer size for improved
>>>>> performance
>>>>>
>>>>> this claims a gain of about 10% with large packets which isn't earth
>>>>> shattering but also not something we can ignore completely. And I
>>>>> suspect it can be bigger with smaller packets.
>>>>>
>>>>>>>>>>> So device does not really know the exact # of descriptors:
>>>>>>>>>>> current drivers do 1 descriptor per buffer but nothing
>>>>>>>>>>> prevents more.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thoughts?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>     \item[VIRTIO_NET_F_RSC_EXT(61)] Device can process
>>>>>>>>>>>>> duplicated
>>>>>>>>> ACKs
>>>>>>>>>>>>>         and report number of coalesced segments and
>>>>>>>>>>>>> duplicated ACKs @@ -2854,8 +2858,8 @@
>>>>> \subsubsection{Legacy Interface:
>>>>>>>>> Feature bits}\label{sec:Device Types / Network
>>>>>>>>>>>>>     \subsection{Device configuration
>>>>>>>>>>>>> layout}\label{sec:Device Types /
>>>>>>>>> Network Device / Device configuration layout}
>>>>>>>>>>>>>     \label{sec:Device Types / Block Device / Feature
>>>>>>>>>>>>> bits / Device configuration layout} -Three
>>>>>>>>>>>>> driver-read-only configuration fields are currently defined.
>>>>>>>>>>>>> The \field{mac} address field -always exists (though is
>>>>>>>>>>>>> only valid if VIRTIO_NET_F_MAC is set), and
>>>>>>>>>>>>> +The driver-read-only \field{mac} address field always
>>>>>>>>>>>>> +exists (though is only valid if VIRTIO_NET_F_MAC is
>>>>>>>>>>>>> +set), and
>>>>>>>>>>>>>     \field{status} only exists if VIRTIO_NET_F_STATUS is set.
>>> Two
>>>>>>>>>>>>>     read-only bits (for the driver) are currently
>>>>>>>>>>>>> defined for the status
>>>>>>>>> field:
>>>>>>>>>>>>>     VIRTIO_NET_S_LINK_UP and VIRTIO_NET_S_ANNOUNCE.
>>>>>>>>>>>>> @@ -2875,12 +2879,17 @@ \subsection{Device
>>>>>>>>>>>>> configuration
>>>>>>>>> layout}\label{sec:Device Types / Network Device
>>>>>>>>>>>>>     VIRTIO_NET_F_MTU is set. This field specifies the
>>>>>>>>>>>>> maximum MTU for
>>>>>>>>> the driver to
>>>>>>>>>>>>>     use.
>>>>>>>>>>>>> +The device-read-only field \field{rx_buf_len} only
>>>>>>>>>>>>> +exists if
>>>>>>>>>>>> Should be driver-read-only.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> +VIRTIO_NET_F_CONST_RXBUF_LEN is negotiated. This field
>>>>>>>>>>>>> +specifies the receive buffers length.
>>>>>>>>>>>>> +
>>>>>>>>>>>>>     \begin{lstlisting}
>>>>>>>>>>>>>     struct virtio_net_config {
>>>>>>>>>>>>>             u8 mac[6];
>>>>>>>>>>>>>             le16 status;
>>>>>>>>>>>>>             le16 max_virtqueue_pairs;
>>>>>>>>>>>>>             le16 mtu;
>>>>>>>>>>>>> +        le32 rx_buf_len;
>>>>>>>>>>>>>     };
>>>>>>>>>>>>>     \end{lstlisting}
>>>>>>>>>>>>> @@ -2933,6 +2942,13 @@ \subsection{Device configuration
>>>>>>>>>>>>> layout}\label{sec:Device Types / Network Device
>>>>>>>>>>>>>     A driver SHOULD negotiate the VIRTIO_NET_F_STANDBY
>>>>>>>>>>>>> feature if
>>>>>>>>> the device offers it.
>>>>>>>>>>>>> +A driver SHOULD accept the
>>>>> VIRTIO_NET_F_CONST_RXBUF_LEN
>>>>>>>>> feature if offered.
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been
>>>>>>> negotiated,
>>>>>>>>>>>>> +the driver MUST set \field{rx_buf_len}.
>>>>>>>>>>>> I think it's device that set the field?
>>>>>>>>>>> Makes more sense for the driver, but if you want this set e.g.
>>>>>>>>>>> before buffers are added, you must say so.
>>>>>>>>>>>
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +A driver MUST NOT modify \field{rx_buf_len} once it
>>>>>>>>>>>>> +has been
>>>>>>> set.
>>>>>>>>>>>>> +
>>>>>>>>>>> This seems very unflexible. I can see how e.g. XDP would
>>>>>>>>>>> benefit from big buffers while skbs benefit from small buffers.
>>>>>>>>>>>
>>>>>>>>>>> This calls for ability to change this.
>>>>>>>>>>
>>>>>>>>>> Yes, but it requires non trivial cleanups for the old length
>>>>>>>>>> and place them with new ones.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> Let's see:
>>>>>>>>> 1	- making buffer smaller: just update config space,
>>>>>>>>> 	  then make new buffers smaller
>>>>>>>>>
>>>>>>>>> 2	- making buffers bigger: add larger buffers,
>>>>>>>>> 	once all small ones are consumed update config space
>>>>>>>>>
>>>>>>>>> 2 is tricky I agree. Thoughts?
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Agree. It's doable, provided that the driver will follow the
>>>>>>>> update
>>>>>>> procedure.
>>>>>>>>>>>>>     \subsubsection{Legacy Interface: Device
>>>>>>>>>>>>> configuration
>>>>>>>>> layout}\label{sec:Device Types / Network Device / Device
>>>>>>>>> configuration layout / Legacy Interface: Device configuration
>>>>>>>>> layout}
>>>>>>>>>>>>>     \label{sec:Device Types / Block Device / Feature
>>>>>>>>>>>>> bits / Device
>>>>>>>>> configuration layout / Legacy Interface: Device configuration
>>>>>>>>> layout}
>>>>>>>>>>>>>     When using the legacy interface, transitional
>>>>>>>>>>>>> devices and drivers @@
>>>>>>>>>>>>> -3241,6 +3257,11 @@ \subsubsection{Setting Up Receive
>>>>>>>>> Buffers}\label{sec:Device Types / Network Devi
>>>>>>>>>>>>>     If VIRTIO_NET_F_MQ is negotiated, each of
>>>>>>>>>>>>> receiveq1\ldots
>>>>>>>>> receiveqN
>>>>>>>>>>>>>     that will be used SHOULD be populated with receive
>>> buffers.
>>>>>>>>>>>>> +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been
>>>>>>> negotiated,
>>>>>>>>>>>>> +the driver MUST initialize all receive virtqueue
>>>>>>>>>>>>> +descriptors \field{len} field with the value
>>>>>>>>>>>>> +configured in \field{rx_buf_len} device configuration
>>>>>>>>>>>>> +field, and allocate receive
>>>>>>>>> buffers accordingly.
>>>>>>>>>>>>> +
>>>>>>>>>>>>>     \devicenormative{\paragraph}{Setting Up Receive
>>>>>>>>>>>>> Buffers}{Device Types / Network Device / Device
>>>>>>>>>>>>> Operation / Setting Up Receive Buffers}
>>>>>>>>>>>>>     The device MUST set \field{num_buffers} to the
>>>>>>>>>>>>> number of descriptors used to @@ -3396,6 +3417,10 @@
>>>>>>>>> \subsubsection{Processing of Incoming Packets}\label{sec:Device
>>>>>>>>> Types / Network
>>>>>>>>>>>>>     checksum (in case of multiple encapsulated
>>>>>>>>>>>>> protocols, one
>>>>> level
>>>>>>>>>>>>>     of checksums is validated).
>>>>>>>>>>>>> +If VIRTIO_NET_F_CONST_RXBUF_LEN has been negotiated,
>>>>> the
>>>>>>>>> device
>>>>>>>>>>>>> +MAY use \field{rx_buf_len} as a buffer length (instead
>>>>>>>>>>>>> +of reading it from virtqueue descriptor \field{len} field).
>>>>>>>>>>>> Is this safe? What if driver submit a small buffer, then
>>>>>>>>>>>> device can read
>>>>>>>>> more than what is allowed?
>>>>>>>>>>>> Thanks
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> +
>>>>>>>>>>>>>     \drivernormative{\paragraph}{Processing of Incoming
>>>>>>>>>>>>>     Packets}{Device Types / Network Device / Device Operation
>>> /
>>>>>>>>>>>>>     Processing of Incoming Packets}


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature
  2019-11-25  3:31                       ` Jason Wang
@ 2019-11-25  7:33                         ` Michael S. Tsirkin
  2019-11-25  8:07                           ` Jason Wang
  0 siblings, 1 reply; 28+ messages in thread
From: Michael S. Tsirkin @ 2019-11-25  7:33 UTC (permalink / raw)
  To: Jason Wang; +Cc: Vitaly Mireyno, virtio-comment

On Mon, Nov 25, 2019 at 11:31:09AM +0800, Jason Wang wrote:
> 
> On 2019/11/24 下午11:30, Michael S. Tsirkin wrote:
> > On Sun, Nov 24, 2019 at 03:02:05PM +0000, Vitaly Mireyno wrote:
> > > 
> > > > -----Original Message-----
> > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > Sent: Wednesday, 20 November, 2019 15:23
> > > > To: Vitaly Mireyno <vmireyno@marvell.com>
> > > > Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
> > > > open.org
> > > > Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-
> > > > sized receive buffers feature
> > > > 
> > > > On Thu, Nov 14, 2019 at 03:49:59PM +0000, Vitaly Mireyno wrote:
> > > > > 
> > > > > > -----Original Message-----
> > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > Sent: Monday, 11 November, 2019 17:05
> > > > > > To: Vitaly Mireyno <vmireyno@marvell.com>
> > > > > > Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
> > > > > > open.org
> > > > > > Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add
> > > > > > equal- sized receive buffers feature
> > > > > > 
> > > > > > On Mon, Nov 11, 2019 at 01:58:51PM +0000, Vitaly Mireyno wrote:
> > > > > > > 
> > > > > > > > -----Original Message-----
> > > > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > > Sent: Sunday, 10 November, 2019 17:35
> > > > > > > > To: Vitaly Mireyno <vmireyno@marvell.com>
> > > > > > > > Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
> > > > > > > > open.org
> > > > > > > > Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net:
> > > > > > > > Add
> > > > > > > > equal- sized receive buffers feature
> > > > > > > > 
> > > > > > > > On Sun, Nov 10, 2019 at 02:13:14PM +0000, Vitaly Mireyno wrote:
> > > > > > > > > > -----Original Message-----
> > > > > > > > > > From: virtio-comment@lists.oasis-open.org
> > > > > > > > > > <virtio-comment@lists.oasis- open.org> On Behalf Of Michael S.
> > > > > > > > > > Tsirkin
> > > > > > > > > > Sent: Tuesday, 5 November, 2019 20:52
> > > > > > > > > > To: Jason Wang <jasowang@redhat.com>
> > > > > > > > > > Cc: Vitaly Mireyno <vmireyno@marvell.com>;
> > > > > > > > > > virtio-comment@lists.oasis- open.org
> > > > > > > > > > Subject: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add
> > > > > > > > > > equal-sized receive buffers feature
> > > > > > > > > > 
> > > > > > > > > > External Email
> > > > > > > > > > 
> > > > > > > > > > ---------------------------------------------------------------
> > > > > > > > > > ---
> > > > > > > > > > ---
> > > > > > > > > > - On Fri, Nov 01, 2019 at 03:22:26PM +0800, Jason Wang wrote:
> > > > > > > > > > > On 2019/11/1 下午12:09, Michael S. Tsirkin wrote:
> > > > > > > > > > > > On Thu, Oct 31, 2019 at 10:46:55AM +0000, Vitaly Mireyno wrote:
> > > > > > > > > > > > > The feature is limited to receive buffers only.
> > > > > > > > > > > > > A driver decides on receive buffer length. The only
> > > > > > > > > > > > > limitation is that this
> > > > > > > > > > length has to be the same for all receive virtqueue buffers.
> > > > > > > > > > > > > The driver configures receive buffer length to the device
> > > > > > > > > > > > > during device
> > > > > > > > > > initialization, and the device reads it and may use it for
> > > > > > > > > > optimal
> > > > > > operation.
> > > > > > > > > > > > > No changes for transmit buffers.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > -----Original Message-----
> > > > > > > > > > > > > From: virtio-comment@lists.oasis-open.org
> > > > > > > > > > > > > <virtio-comment@lists.oasis-open.org> On Behalf Of Jason
> > > > > > > > > > > > > Wang
> > > > > > > > > > > > > Sent: Thursday, 31 October, 2019 12:15
> > > > > > > > > > > > > To: Vitaly Mireyno <vmireyno@marvell.com>;
> > > > > > > > > > > > > virtio-comment@lists.oasis-open.org
> > > > > > > > > > > > > Cc: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > > > > > > > Subject: [virtio-comment] Re: [PATCH] virtio-net: Add
> > > > > > > > > > > > > equal-sized receive buffers feature
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > On 2019/10/31 下午5:23, Vitaly Mireyno wrote:
> > > > > > > > > > > > > > Some devices benefit from working with receive buffers
> > > > > > > > > > > > > > of a
> > > > > > > > > > predefined constant length.
> > > > > > > > > > > > > > Add a feature for drivers that allocate equal-sized
> > > > > > > > > > > > > > receive buffers, and
> > > > > > > > > > devices that benefit from predefined receive buffers length.
> > > > > > > > > > > > > > Signed-off-by: Vitaly Mireyno <vmireyno@marvell.com>
> > > > > > > > > > > > > > ---
> > > > > > > > > > > > > >     content.tex | 29 +++++++++++++++++++++++++++--
> > > > > > > > > > > > > >     1 file changed, 27 insertions(+), 2 deletions(-)
> > > > > > > > > > > > > Is there any other networking device that has this feature?
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > diff --git a/content.tex b/content.tex index
> > > > > > > > > > > > > > b1ea9b9..c9e67c8
> > > > > > > > > > > > > > 100644
> > > > > > > > > > > > > > --- a/content.tex
> > > > > > > > > > > > > > +++ b/content.tex
> > > > > > > > > > > > > > @@ -2811,6 +2811,10 @@ \subsection{Feature
> > > > > > > > > > > > > > bits}\label{sec:Device
> > > > > > > > > > Types / Network Device / Feature bits
> > > > > > > > > > > > > >     \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC
> > > > > > > > > > > > > > address
> > > > > > > > > > through control
> > > > > > > > > > > > > >         channel.
> > > > > > > > > > > > > > +\item[VIRTIO_NET_F_CONST_RXBUF_LEN(58)] Driver
> > > > > > > > > > > > > > +allocates all
> > > > > > > > > > receive buffers
> > > > > > > > > > > > > > +    of the same constant length. Device benefits from
> > > > > > > > > > > > > > + working
> > > > > > with
> > > > > > > > > > > > > > +    receive buffers of equal length.
> > > > > > > > > > > > > > +
> > > > > > > > > > > > Problem is, I don't think linux will use this for skbs
> > > > > > > > > > > > since it breaks buffer accounting. This is because it is
> > > > > > > > > > > > important to make skbs as small as you can. So even if you
> > > > > > > > > > > > see "device would
> > > > > > benefit"
> > > > > > > > > > > > there is no way to balance this with the benefit to linux.
> > > > > > > > > > > > How do you know which benefit is bigger?
> > > > > > > > > > > 
> > > > > > > > > > > I guess the idea is e.g for Linux driver, it can refuse this feature.
> > > > > > > > > > Okay. What uses it?
> > > > > > > > > > 
> > > > > > > > > > > > You also never explained how does device benefit. My guess
> > > > > > > > > > > > is this allows device to calculate the # of descriptors to
> > > > > > > > > > > > fetch that are needed for a packet. Right?
> > > > > > > > > > > > Assuming this, I think a rough estimate should be enough.
> > > > > > > > > > > > If device fetches too much it can discard extra, if it does
> > > > > > > > > > > > not fetch enough it can fetch extra.
> > > > > > > > > > > > 
> > > > > > > > > > > > Let us not forget that express packets are 256 bytes so up
> > > > > > > > > > > > to
> > > > > > > > > > > > 16 descriptors fit in a packet, there is no benefit most of
> > > > > > > > > > > > the time in knowing whether e.g. 1 or 2 descriptors are needed.
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > Let us not forget these are buffers, not descriptors.
> > > > > > > > > > > 
> > > > > > > > > > > I guess maybe the initial motivation is constant descriptor length.
> > > > > > > > > > 
> > > > > > > > > > That conflicts with requirement framing is up to driver.
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > I was using the wrong terminology. The feature is about constant
> > > > > > > > *descriptor* length. In other words, the value that driver puts in
> > > > > > > > Packed Virtqueue "Element Length" field (or 'len' field in the
> > > > 'pvirtq_desc'
> > > > > > struct).
> > > > > > > > OK so this conflicts with "2.6.4 Message Framing" which requires
> > > > > > > > that drivers can split buffers into as many descriptors as they like.
> > > > > > > > Right?
> > > > > > > > 
> > > > > > > > > Does it make more sense now, or you still see an issue with Linux
> > > > driver?
> > > > > > > > I think there's an issue with the flexible framing requirements
> > > > > > > > and there's an issue with Linux driver.
> > > > > > > > 
> > > > > > > > > The motivation is to allow the device to calculate the exact
> > > > > > > > > number of
> > > > > > > > descriptors to consume, before fetching the descriptors. This is
> > > > > > > > beneficial for devices for which overconsuming or underconsuming
> > > > > > > > descriptors come at a high cost.
> > > > > > > > 
> > > > > > > > So I guess we can agree getting the # of descriptors *exactly*
> > > > > > > > right isn't all that important. Right? My question is how does the
> > > > > > > > driver balance the device versus Linux needs?
> > > > > > > > 
> > > > > > > > One idea is if we assume this is best effort, does not have to be
> > > > > > > > exact, then how about just having the device assume descriptor
> > > > > > > > sizes stay more or less constant and use that to estimate the
> > > > > > > > amount? If it under/over estimates, things just go a bit slower.
> > > > > > > > This way driver can adjust the sizes and device will react
> > > > > > > > automatically, with time.
> > > > > > > > 
> > > > > > > > 
> > > > > > > I see that "2.6.4 Message Framing" allows a full flexibility of
> > > > > > > descriptor lengths, and I presume it's applicable to Packet
> > > > > > > Virtqueues as well, though defined under Split Virtqueues section.
> > > > > > That's a good point. Probably makes sense to move it out to a common
> > > > > > section, right?
> > > > > > 
> > > > > > > The example
> > > > > > > talks about transmit virtqueue, and it makes perfect sense.
> > > > > > > However, wouldn't a typical driver place equal-sized *receive*
> > > > > > > descriptors
> > > > > > anyway? So if a device can benefit from it, the driver might as well
> > > > > > negotiate this new feature.
> > > > > > 
> > > > > > Having buffer size jump around wildly doesn't seem too useful, I agree.
> > > > > > OTOH being able to adjust it gradually has been in the past
> > > > > > demontrated to help performance measureably.
> > > > > > 
> > > > > > > This could be especially relevant for devices, for which adjusting
> > > > > > > the number
> > > > > > of used descriptors is impractical after descriptors have already been
> > > > fetched.
> > > > > > > I agree that if this requirement conflicts with specific driver
> > > > > > > needs, it will not
> > > > > > be negotiated, and the device will either underperform in certain
> > > > > > scenarios, or will not come up at all.
> > > > > > 
> > > > > > Right so that makes it a challenge. Device says it prefers fixed
> > > > > > buffers. Is that preference more or less important than ability to
> > > > > > efficiently support workloads such as TCP small queues?
> > > > > > Driver has no idea and I suspect neither does the device.
> > > > > > So I don't see how will a Linux driver know that it should enable
> > > > > > this, neither how will device know it's okay to just refuse features_ok.
> > > > > > 
> > > > > > 
> > > > > > On the other hand, current drivers already have logic that tries to
> > > > > > change buffer sizes in a smooth way.  So simply caching the last
> > > > > > buffer size and assuming all the others will be exactly the same will
> > > > > > go a long way towards limiting how much does device need to fetch.
> > > > > > This does imply extra logic on the device to recover if things change
> > > > > > and the first read did not fetch enough buffers, but then it would be
> > > > > > required anyway if as you say above the failure is not catastrophic.
> > > > > > 
> > > > > > 
> > > > > > The feature thus only seems useful for small, feature-restrained
> > > > > > devices
> > > > > > - which are prepared to sacrifice some performance to cut costs, and
> > > > > > which can't recover at all. Is this what you are trying to do?
> > > > > > 
> > > > > The intention is to enable devices with this specific limitation to
> > > > > offer virtio offload.  The feature can be redefined such that it would
> > > > > only be offered by devices that are unable to handle dynamically
> > > > > changing descriptor lengths. How does that sound?
> > > > So it makes more sense when mergeable buffers are disabled (since then
> > > > buffers are practically all same size).
> > > > 
> > > Actually, mergeable buffers are not a problem. They could be enabled,
> > > as long as each descriptor is the same length. So implementations that
> > > prefer optimizing memory utilization over jumbo frame performance can
> > > negotiate VIRTIO_NET_F_MRG_RXBUF and allocate smaller buffers.
> > So my point was, without VIRTIO_NET_F_MRG_RXBUF, all buffers are same
> > length anyway. So if we are talking about cheap simple hardware, I guess
> > it can just not have VIRTIO_NET_F_MRG_RXBUF and be done with it?
> > If device does care about memory utilization then I think
> > it needs to give driver the flexibility it needs/uses.
> > No?
> > 
> > Again I can see how we might want to disallow crazy setups with e.g. 1
> > byte per buffer. That's just abuse, no guest does that anyway. So asking
> > e.g. for a minimal buffer size sounds very reasonable.
> 
> 
> One question here is that, the minimal buffer size should depends on various
> factors. E.g the ring size. Consider a 256 entries ring, the minimal size
> should be 64K/256=256 ...

I guess you are right. We can make this driver programmable I guess?
Basically pass min_buf_len to the device.


> 
> > But an option that
> > disables functionality that a popular guest uses needs a lot of
> > documentation to help device writers figure out whether they want that
> > option or not, and I'd worry that even with documentation will be
> > misunderstood even if we write it. When do you enable this?
> > When you don't care about performance? When you don't care about Linux?
> 
> 
> It looks to me the feature proposal here is something related to the
> descriptor pre fetching. In the case of packed virtqueue. Having
> avail/producer index may help more or less here?
> 
> Thanks

Isn't this addressed by VIRTIO_F_NOTIFICATION_DATA?
At least that's one of the things it's supposed to do ...

> 
> > 
> > > > With that in mind, I have an idea: scsi and block already have max sg field.
> > > > How about we add a writeable max sg field, maybe even make it
> > > > programmable per vq?
> > > > 
> > > > Thus driver tells device what is it's max s/g value for rx. Worst case device
> > > > fetches a bit more than it needs. Discarding extra shouldn't be expensive. This
> > > > looks like it will help even smart devices. What do you think?
> > > > 
> > > > This nicely avoids conflicting with the flexible framing requirement.
> > > > 
> > > My intention was to avoid any descriptor length variations, for
> > > devices that unable fetching or discarding extra descriptors. (If in
> > > the pipelined HW processing the decision on number of descriptors is
> > > made in the early stage, and it cannot be undone in a later stage).
> > Frankly discarding unused descriptors looks to me like something that
> > shouldn't have a high cost in the hardware.  I can see how trying to
> > predict descriptor length, and fetching extra if not enough was fetched
> > can have a high cost, so to me extensions to remove the guesswork
> > from the device have value.  However a lot of effort went into trying to
> > reduce e.g. number of pci express packets needed to fetch descriptors.
> > Each packet fetches 256 bytes anyway, does it not?  Not having an
> > ability to use that information seems like an obvious waste, and that
> > means ability to keep some fetched descriptors around even if they are
> > not used for a given packet. Again just my $.02.
> > 
> > > Defining max s/g sounds like an interesting feature by itself.
> > But assuming we have max RX s/g, I guess hardware can set max s/g = 1?
> > Then since with !VIRTIO_NET_F_MRG_RXBUF all buffers are forced to
> > be same length.
> > 
> > > > > > > Could you specify what issue do you see with the Linux driver?
> > > > > > See logic around struct ewma.
> > > > > > 
> > > > > > The first relevant commit is I guess
> > > > > > commit ab7db91705e95ed1bba1304388936fccfa58c992
> > > > > >     virtio-net: auto-tune mergeable rx buffer size for improved
> > > > > > performance
> > > > > > 
> > > > > > this claims a gain of about 10% with large packets which isn't earth
> > > > > > shattering but also not something we can ignore completely. And I
> > > > > > suspect it can be bigger with smaller packets.
> > > > > > 
> > > > > > > > > > > > So device does not really know the exact # of descriptors:
> > > > > > > > > > > > current drivers do 1 descriptor per buffer but nothing
> > > > > > > > > > > > prevents more.
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > Thoughts?
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > > >     \item[VIRTIO_NET_F_RSC_EXT(61)] Device can process
> > > > > > > > > > > > > > duplicated
> > > > > > > > > > ACKs
> > > > > > > > > > > > > >         and report number of coalesced segments and
> > > > > > > > > > > > > > duplicated ACKs @@ -2854,8 +2858,8 @@
> > > > > > \subsubsection{Legacy Interface:
> > > > > > > > > > Feature bits}\label{sec:Device Types / Network
> > > > > > > > > > > > > >     \subsection{Device configuration
> > > > > > > > > > > > > > layout}\label{sec:Device Types /
> > > > > > > > > > Network Device / Device configuration layout}
> > > > > > > > > > > > > >     \label{sec:Device Types / Block Device / Feature
> > > > > > > > > > > > > > bits / Device configuration layout} -Three
> > > > > > > > > > > > > > driver-read-only configuration fields are currently defined.
> > > > > > > > > > > > > > The \field{mac} address field -always exists (though is
> > > > > > > > > > > > > > only valid if VIRTIO_NET_F_MAC is set), and
> > > > > > > > > > > > > > +The driver-read-only \field{mac} address field always
> > > > > > > > > > > > > > +exists (though is only valid if VIRTIO_NET_F_MAC is
> > > > > > > > > > > > > > +set), and
> > > > > > > > > > > > > >     \field{status} only exists if VIRTIO_NET_F_STATUS is set.
> > > > Two
> > > > > > > > > > > > > >     read-only bits (for the driver) are currently
> > > > > > > > > > > > > > defined for the status
> > > > > > > > > > field:
> > > > > > > > > > > > > >     VIRTIO_NET_S_LINK_UP and VIRTIO_NET_S_ANNOUNCE.
> > > > > > > > > > > > > > @@ -2875,12 +2879,17 @@ \subsection{Device
> > > > > > > > > > > > > > configuration
> > > > > > > > > > layout}\label{sec:Device Types / Network Device
> > > > > > > > > > > > > >     VIRTIO_NET_F_MTU is set. This field specifies the
> > > > > > > > > > > > > > maximum MTU for
> > > > > > > > > > the driver to
> > > > > > > > > > > > > >     use.
> > > > > > > > > > > > > > +The device-read-only field \field{rx_buf_len} only
> > > > > > > > > > > > > > +exists if
> > > > > > > > > > > > > Should be driver-read-only.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > +VIRTIO_NET_F_CONST_RXBUF_LEN is negotiated. This field
> > > > > > > > > > > > > > +specifies the receive buffers length.
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > >     \begin{lstlisting}
> > > > > > > > > > > > > >     struct virtio_net_config {
> > > > > > > > > > > > > >             u8 mac[6];
> > > > > > > > > > > > > >             le16 status;
> > > > > > > > > > > > > >             le16 max_virtqueue_pairs;
> > > > > > > > > > > > > >             le16 mtu;
> > > > > > > > > > > > > > +        le32 rx_buf_len;
> > > > > > > > > > > > > >     };
> > > > > > > > > > > > > >     \end{lstlisting}
> > > > > > > > > > > > > > @@ -2933,6 +2942,13 @@ \subsection{Device configuration
> > > > > > > > > > > > > > layout}\label{sec:Device Types / Network Device
> > > > > > > > > > > > > >     A driver SHOULD negotiate the VIRTIO_NET_F_STANDBY
> > > > > > > > > > > > > > feature if
> > > > > > > > > > the device offers it.
> > > > > > > > > > > > > > +A driver SHOULD accept the
> > > > > > VIRTIO_NET_F_CONST_RXBUF_LEN
> > > > > > > > > > feature if offered.
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been
> > > > > > > > negotiated,
> > > > > > > > > > > > > > +the driver MUST set \field{rx_buf_len}.
> > > > > > > > > > > > > I think it's device that set the field?
> > > > > > > > > > > > Makes more sense for the driver, but if you want this set e.g.
> > > > > > > > > > > > before buffers are added, you must say so.
> > > > > > > > > > > > 
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +A driver MUST NOT modify \field{rx_buf_len} once it
> > > > > > > > > > > > > > +has been
> > > > > > > > set.
> > > > > > > > > > > > > > +
> > > > > > > > > > > > This seems very unflexible. I can see how e.g. XDP would
> > > > > > > > > > > > benefit from big buffers while skbs benefit from small buffers.
> > > > > > > > > > > > 
> > > > > > > > > > > > This calls for ability to change this.
> > > > > > > > > > > 
> > > > > > > > > > > Yes, but it requires non trivial cleanups for the old length
> > > > > > > > > > > and place them with new ones.
> > > > > > > > > > > 
> > > > > > > > > > > Thanks
> > > > > > > > > > 
> > > > > > > > > > Let's see:
> > > > > > > > > > 1	- making buffer smaller: just update config space,
> > > > > > > > > > 	  then make new buffers smaller
> > > > > > > > > > 
> > > > > > > > > > 2	- making buffers bigger: add larger buffers,
> > > > > > > > > > 	once all small ones are consumed update config space
> > > > > > > > > > 
> > > > > > > > > > 2 is tricky I agree. Thoughts?
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > Agree. It's doable, provided that the driver will follow the
> > > > > > > > > update
> > > > > > > > procedure.
> > > > > > > > > > > > > >     \subsubsection{Legacy Interface: Device
> > > > > > > > > > > > > > configuration
> > > > > > > > > > layout}\label{sec:Device Types / Network Device / Device
> > > > > > > > > > configuration layout / Legacy Interface: Device configuration
> > > > > > > > > > layout}
> > > > > > > > > > > > > >     \label{sec:Device Types / Block Device / Feature
> > > > > > > > > > > > > > bits / Device
> > > > > > > > > > configuration layout / Legacy Interface: Device configuration
> > > > > > > > > > layout}
> > > > > > > > > > > > > >     When using the legacy interface, transitional
> > > > > > > > > > > > > > devices and drivers @@
> > > > > > > > > > > > > > -3241,6 +3257,11 @@ \subsubsection{Setting Up Receive
> > > > > > > > > > Buffers}\label{sec:Device Types / Network Devi
> > > > > > > > > > > > > >     If VIRTIO_NET_F_MQ is negotiated, each of
> > > > > > > > > > > > > > receiveq1\ldots
> > > > > > > > > > receiveqN
> > > > > > > > > > > > > >     that will be used SHOULD be populated with receive
> > > > buffers.
> > > > > > > > > > > > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been
> > > > > > > > negotiated,
> > > > > > > > > > > > > > +the driver MUST initialize all receive virtqueue
> > > > > > > > > > > > > > +descriptors \field{len} field with the value
> > > > > > > > > > > > > > +configured in \field{rx_buf_len} device configuration
> > > > > > > > > > > > > > +field, and allocate receive
> > > > > > > > > > buffers accordingly.
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > >     \devicenormative{\paragraph}{Setting Up Receive
> > > > > > > > > > > > > > Buffers}{Device Types / Network Device / Device
> > > > > > > > > > > > > > Operation / Setting Up Receive Buffers}
> > > > > > > > > > > > > >     The device MUST set \field{num_buffers} to the
> > > > > > > > > > > > > > number of descriptors used to @@ -3396,6 +3417,10 @@
> > > > > > > > > > \subsubsection{Processing of Incoming Packets}\label{sec:Device
> > > > > > > > > > Types / Network
> > > > > > > > > > > > > >     checksum (in case of multiple encapsulated
> > > > > > > > > > > > > > protocols, one
> > > > > > level
> > > > > > > > > > > > > >     of checksums is validated).
> > > > > > > > > > > > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN has been negotiated,
> > > > > > the
> > > > > > > > > > device
> > > > > > > > > > > > > > +MAY use \field{rx_buf_len} as a buffer length (instead
> > > > > > > > > > > > > > +of reading it from virtqueue descriptor \field{len} field).
> > > > > > > > > > > > > Is this safe? What if driver submit a small buffer, then
> > > > > > > > > > > > > device can read
> > > > > > > > > > more than what is allowed?
> > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > >     \drivernormative{\paragraph}{Processing of Incoming
> > > > > > > > > > > > > >     Packets}{Device Types / Network Device / Device Operation
> > > > /
> > > > > > > > > > > > > >     Processing of Incoming Packets}


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature
  2019-11-25  7:33                         ` Michael S. Tsirkin
@ 2019-11-25  8:07                           ` Jason Wang
  2019-11-25  8:18                             ` Michael S. Tsirkin
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Wang @ 2019-11-25  8:07 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Vitaly Mireyno, virtio-comment


On 2019/11/25 下午3:33, Michael S. Tsirkin wrote:
> On Mon, Nov 25, 2019 at 11:31:09AM +0800, Jason Wang wrote:
>> On 2019/11/24 下午11:30, Michael S. Tsirkin wrote:
>>> On Sun, Nov 24, 2019 at 03:02:05PM +0000, Vitaly Mireyno wrote:
>>>>> -----Original Message-----
>>>>> From: Michael S. Tsirkin <mst@redhat.com>
>>>>> Sent: Wednesday, 20 November, 2019 15:23
>>>>> To: Vitaly Mireyno <vmireyno@marvell.com>
>>>>> Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
>>>>> open.org
>>>>> Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-
>>>>> sized receive buffers feature
>>>>>
>>>>> On Thu, Nov 14, 2019 at 03:49:59PM +0000, Vitaly Mireyno wrote:
>>>>>>> -----Original Message-----
>>>>>>> From: Michael S. Tsirkin <mst@redhat.com>
>>>>>>> Sent: Monday, 11 November, 2019 17:05
>>>>>>> To: Vitaly Mireyno <vmireyno@marvell.com>
>>>>>>> Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
>>>>>>> open.org
>>>>>>> Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add
>>>>>>> equal- sized receive buffers feature
>>>>>>>
>>>>>>> On Mon, Nov 11, 2019 at 01:58:51PM +0000, Vitaly Mireyno wrote:
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Michael S. Tsirkin <mst@redhat.com>
>>>>>>>>> Sent: Sunday, 10 November, 2019 17:35
>>>>>>>>> To: Vitaly Mireyno <vmireyno@marvell.com>
>>>>>>>>> Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
>>>>>>>>> open.org
>>>>>>>>> Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net:
>>>>>>>>> Add
>>>>>>>>> equal- sized receive buffers feature
>>>>>>>>>
>>>>>>>>> On Sun, Nov 10, 2019 at 02:13:14PM +0000, Vitaly Mireyno wrote:
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: virtio-comment@lists.oasis-open.org
>>>>>>>>>>> <virtio-comment@lists.oasis- open.org> On Behalf Of Michael S.
>>>>>>>>>>> Tsirkin
>>>>>>>>>>> Sent: Tuesday, 5 November, 2019 20:52
>>>>>>>>>>> To: Jason Wang <jasowang@redhat.com>
>>>>>>>>>>> Cc: Vitaly Mireyno <vmireyno@marvell.com>;
>>>>>>>>>>> virtio-comment@lists.oasis- open.org
>>>>>>>>>>> Subject: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add
>>>>>>>>>>> equal-sized receive buffers feature
>>>>>>>>>>>
>>>>>>>>>>> External Email
>>>>>>>>>>>
>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>> ---
>>>>>>>>>>> ---
>>>>>>>>>>> - On Fri, Nov 01, 2019 at 03:22:26PM +0800, Jason Wang wrote:
>>>>>>>>>>>> On 2019/11/1 下午12:09, Michael S. Tsirkin wrote:
>>>>>>>>>>>>> On Thu, Oct 31, 2019 at 10:46:55AM +0000, Vitaly Mireyno wrote:
>>>>>>>>>>>>>> The feature is limited to receive buffers only.
>>>>>>>>>>>>>> A driver decides on receive buffer length. The only
>>>>>>>>>>>>>> limitation is that this
>>>>>>>>>>> length has to be the same for all receive virtqueue buffers.
>>>>>>>>>>>>>> The driver configures receive buffer length to the device
>>>>>>>>>>>>>> during device
>>>>>>>>>>> initialization, and the device reads it and may use it for
>>>>>>>>>>> optimal
>>>>>>> operation.
>>>>>>>>>>>>>> No changes for transmit buffers.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>> From: virtio-comment@lists.oasis-open.org
>>>>>>>>>>>>>> <virtio-comment@lists.oasis-open.org> On Behalf Of Jason
>>>>>>>>>>>>>> Wang
>>>>>>>>>>>>>> Sent: Thursday, 31 October, 2019 12:15
>>>>>>>>>>>>>> To: Vitaly Mireyno <vmireyno@marvell.com>;
>>>>>>>>>>>>>> virtio-comment@lists.oasis-open.org
>>>>>>>>>>>>>> Cc: Michael S. Tsirkin <mst@redhat.com>
>>>>>>>>>>>>>> Subject: [virtio-comment] Re: [PATCH] virtio-net: Add
>>>>>>>>>>>>>> equal-sized receive buffers feature
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2019/10/31 下午5:23, Vitaly Mireyno wrote:
>>>>>>>>>>>>>>> Some devices benefit from working with receive buffers
>>>>>>>>>>>>>>> of a
>>>>>>>>>>> predefined constant length.
>>>>>>>>>>>>>>> Add a feature for drivers that allocate equal-sized
>>>>>>>>>>>>>>> receive buffers, and
>>>>>>>>>>> devices that benefit from predefined receive buffers length.
>>>>>>>>>>>>>>> Signed-off-by: Vitaly Mireyno <vmireyno@marvell.com>
>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>      content.tex | 29 +++++++++++++++++++++++++++--
>>>>>>>>>>>>>>>      1 file changed, 27 insertions(+), 2 deletions(-)
>>>>>>>>>>>>>> Is there any other networking device that has this feature?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> diff --git a/content.tex b/content.tex index
>>>>>>>>>>>>>>> b1ea9b9..c9e67c8
>>>>>>>>>>>>>>> 100644
>>>>>>>>>>>>>>> --- a/content.tex
>>>>>>>>>>>>>>> +++ b/content.tex
>>>>>>>>>>>>>>> @@ -2811,6 +2811,10 @@ \subsection{Feature
>>>>>>>>>>>>>>> bits}\label{sec:Device
>>>>>>>>>>> Types / Network Device / Feature bits
>>>>>>>>>>>>>>>      \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC
>>>>>>>>>>>>>>> address
>>>>>>>>>>> through control
>>>>>>>>>>>>>>>          channel.
>>>>>>>>>>>>>>> +\item[VIRTIO_NET_F_CONST_RXBUF_LEN(58)] Driver
>>>>>>>>>>>>>>> +allocates all
>>>>>>>>>>> receive buffers
>>>>>>>>>>>>>>> +    of the same constant length. Device benefits from
>>>>>>>>>>>>>>> + working
>>>>>>> with
>>>>>>>>>>>>>>> +    receive buffers of equal length.
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>> Problem is, I don't think linux will use this for skbs
>>>>>>>>>>>>> since it breaks buffer accounting. This is because it is
>>>>>>>>>>>>> important to make skbs as small as you can. So even if you
>>>>>>>>>>>>> see "device would
>>>>>>> benefit"
>>>>>>>>>>>>> there is no way to balance this with the benefit to linux.
>>>>>>>>>>>>> How do you know which benefit is bigger?
>>>>>>>>>>>> I guess the idea is e.g for Linux driver, it can refuse this feature.
>>>>>>>>>>> Okay. What uses it?
>>>>>>>>>>>
>>>>>>>>>>>>> You also never explained how does device benefit. My guess
>>>>>>>>>>>>> is this allows device to calculate the # of descriptors to
>>>>>>>>>>>>> fetch that are needed for a packet. Right?
>>>>>>>>>>>>> Assuming this, I think a rough estimate should be enough.
>>>>>>>>>>>>> If device fetches too much it can discard extra, if it does
>>>>>>>>>>>>> not fetch enough it can fetch extra.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Let us not forget that express packets are 256 bytes so up
>>>>>>>>>>>>> to
>>>>>>>>>>>>> 16 descriptors fit in a packet, there is no benefit most of
>>>>>>>>>>>>> the time in knowing whether e.g. 1 or 2 descriptors are needed.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Let us not forget these are buffers, not descriptors.
>>>>>>>>>>>> I guess maybe the initial motivation is constant descriptor length.
>>>>>>>>>>> That conflicts with requirement framing is up to driver.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> I was using the wrong terminology. The feature is about constant
>>>>>>>>> *descriptor* length. In other words, the value that driver puts in
>>>>>>>>> Packed Virtqueue "Element Length" field (or 'len' field in the
>>>>> 'pvirtq_desc'
>>>>>>> struct).
>>>>>>>>> OK so this conflicts with "2.6.4 Message Framing" which requires
>>>>>>>>> that drivers can split buffers into as many descriptors as they like.
>>>>>>>>> Right?
>>>>>>>>>
>>>>>>>>>> Does it make more sense now, or you still see an issue with Linux
>>>>> driver?
>>>>>>>>> I think there's an issue with the flexible framing requirements
>>>>>>>>> and there's an issue with Linux driver.
>>>>>>>>>
>>>>>>>>>> The motivation is to allow the device to calculate the exact
>>>>>>>>>> number of
>>>>>>>>> descriptors to consume, before fetching the descriptors. This is
>>>>>>>>> beneficial for devices for which overconsuming or underconsuming
>>>>>>>>> descriptors come at a high cost.
>>>>>>>>>
>>>>>>>>> So I guess we can agree getting the # of descriptors *exactly*
>>>>>>>>> right isn't all that important. Right? My question is how does the
>>>>>>>>> driver balance the device versus Linux needs?
>>>>>>>>>
>>>>>>>>> One idea is if we assume this is best effort, does not have to be
>>>>>>>>> exact, then how about just having the device assume descriptor
>>>>>>>>> sizes stay more or less constant and use that to estimate the
>>>>>>>>> amount? If it under/over estimates, things just go a bit slower.
>>>>>>>>> This way driver can adjust the sizes and device will react
>>>>>>>>> automatically, with time.
>>>>>>>>>
>>>>>>>>>
>>>>>>>> I see that "2.6.4 Message Framing" allows a full flexibility of
>>>>>>>> descriptor lengths, and I presume it's applicable to Packet
>>>>>>>> Virtqueues as well, though defined under Split Virtqueues section.
>>>>>>> That's a good point. Probably makes sense to move it out to a common
>>>>>>> section, right?
>>>>>>>
>>>>>>>> The example
>>>>>>>> talks about transmit virtqueue, and it makes perfect sense.
>>>>>>>> However, wouldn't a typical driver place equal-sized *receive*
>>>>>>>> descriptors
>>>>>>> anyway? So if a device can benefit from it, the driver might as well
>>>>>>> negotiate this new feature.
>>>>>>>
>>>>>>> Having buffer size jump around wildly doesn't seem too useful, I agree.
>>>>>>> OTOH being able to adjust it gradually has been in the past
>>>>>>> demontrated to help performance measureably.
>>>>>>>
>>>>>>>> This could be especially relevant for devices, for which adjusting
>>>>>>>> the number
>>>>>>> of used descriptors is impractical after descriptors have already been
>>>>> fetched.
>>>>>>>> I agree that if this requirement conflicts with specific driver
>>>>>>>> needs, it will not
>>>>>>> be negotiated, and the device will either underperform in certain
>>>>>>> scenarios, or will not come up at all.
>>>>>>>
>>>>>>> Right so that makes it a challenge. Device says it prefers fixed
>>>>>>> buffers. Is that preference more or less important than ability to
>>>>>>> efficiently support workloads such as TCP small queues?
>>>>>>> Driver has no idea and I suspect neither does the device.
>>>>>>> So I don't see how will a Linux driver know that it should enable
>>>>>>> this, neither how will device know it's okay to just refuse features_ok.
>>>>>>>
>>>>>>>
>>>>>>> On the other hand, current drivers already have logic that tries to
>>>>>>> change buffer sizes in a smooth way.  So simply caching the last
>>>>>>> buffer size and assuming all the others will be exactly the same will
>>>>>>> go a long way towards limiting how much does device need to fetch.
>>>>>>> This does imply extra logic on the device to recover if things change
>>>>>>> and the first read did not fetch enough buffers, but then it would be
>>>>>>> required anyway if as you say above the failure is not catastrophic.
>>>>>>>
>>>>>>>
>>>>>>> The feature thus only seems useful for small, feature-restrained
>>>>>>> devices
>>>>>>> - which are prepared to sacrifice some performance to cut costs, and
>>>>>>> which can't recover at all. Is this what you are trying to do?
>>>>>>>
>>>>>> The intention is to enable devices with this specific limitation to
>>>>>> offer virtio offload.  The feature can be redefined such that it would
>>>>>> only be offered by devices that are unable to handle dynamically
>>>>>> changing descriptor lengths. How does that sound?
>>>>> So it makes more sense when mergeable buffers are disabled (since then
>>>>> buffers are practically all same size).
>>>>>
>>>> Actually, mergeable buffers are not a problem. They could be enabled,
>>>> as long as each descriptor is the same length. So implementations that
>>>> prefer optimizing memory utilization over jumbo frame performance can
>>>> negotiate VIRTIO_NET_F_MRG_RXBUF and allocate smaller buffers.
>>> So my point was, without VIRTIO_NET_F_MRG_RXBUF, all buffers are same
>>> length anyway. So if we are talking about cheap simple hardware, I guess
>>> it can just not have VIRTIO_NET_F_MRG_RXBUF and be done with it?
>>> If device does care about memory utilization then I think
>>> it needs to give driver the flexibility it needs/uses.
>>> No?
>>>
>>> Again I can see how we might want to disallow crazy setups with e.g. 1
>>> byte per buffer. That's just abuse, no guest does that anyway. So asking
>>> e.g. for a minimal buffer size sounds very reasonable.
>>
>> One question here is that, the minimal buffer size should depends on various
>> factors. E.g the ring size. Consider a 256 entries ring, the minimal size
>> should be 64K/256=256 ...
> I guess you are right. We can make this driver programmable I guess?
> Basically pass min_buf_len to the device.


Then it still have the chance to program the min_buf_len to 1?


>
>
>>> But an option that
>>> disables functionality that a popular guest uses needs a lot of
>>> documentation to help device writers figure out whether they want that
>>> option or not, and I'd worry that even with documentation will be
>>> misunderstood even if we write it. When do you enable this?
>>> When you don't care about performance? When you don't care about Linux?
>>
>> It looks to me the feature proposal here is something related to the
>> descriptor pre fetching. In the case of packed virtqueue. Having
>> avail/producer index may help more or less here?
>>
>> Thanks
> Isn't this addressed by VIRTIO_F_NOTIFICATION_DATA?
> At least that's one of the things it's supposed to do ...


Device can choose to disable notification, they usually happen on heavy 
workload which suppresses the effort of notification data.

Thanks


>>>>> With that in mind, I have an idea: scsi and block already have max sg field.
>>>>> How about we add a writeable max sg field, maybe even make it
>>>>> programmable per vq?
>>>>>
>>>>> Thus driver tells device what is it's max s/g value for rx. Worst case device
>>>>> fetches a bit more than it needs. Discarding extra shouldn't be expensive. This
>>>>> looks like it will help even smart devices. What do you think?
>>>>>
>>>>> This nicely avoids conflicting with the flexible framing requirement.
>>>>>
>>>> My intention was to avoid any descriptor length variations, for
>>>> devices that unable fetching or discarding extra descriptors. (If in
>>>> the pipelined HW processing the decision on number of descriptors is
>>>> made in the early stage, and it cannot be undone in a later stage).
>>> Frankly discarding unused descriptors looks to me like something that
>>> shouldn't have a high cost in the hardware.  I can see how trying to
>>> predict descriptor length, and fetching extra if not enough was fetched
>>> can have a high cost, so to me extensions to remove the guesswork
>>> from the device have value.  However a lot of effort went into trying to
>>> reduce e.g. number of pci express packets needed to fetch descriptors.
>>> Each packet fetches 256 bytes anyway, does it not?  Not having an
>>> ability to use that information seems like an obvious waste, and that
>>> means ability to keep some fetched descriptors around even if they are
>>> not used for a given packet. Again just my $.02.
>>>
>>>> Defining max s/g sounds like an interesting feature by itself.
>>> But assuming we have max RX s/g, I guess hardware can set max s/g = 1?
>>> Then since with !VIRTIO_NET_F_MRG_RXBUF all buffers are forced to
>>> be same length.
>>>
>>>>>>>> Could you specify what issue do you see with the Linux driver?
>>>>>>> See logic around struct ewma.
>>>>>>>
>>>>>>> The first relevant commit is I guess
>>>>>>> commit ab7db91705e95ed1bba1304388936fccfa58c992
>>>>>>>      virtio-net: auto-tune mergeable rx buffer size for improved
>>>>>>> performance
>>>>>>>
>>>>>>> this claims a gain of about 10% with large packets which isn't earth
>>>>>>> shattering but also not something we can ignore completely. And I
>>>>>>> suspect it can be bigger with smaller packets.
>>>>>>>
>>>>>>>>>>>>> So device does not really know the exact # of descriptors:
>>>>>>>>>>>>> current drivers do 1 descriptor per buffer but nothing
>>>>>>>>>>>>> prevents more.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>      \item[VIRTIO_NET_F_RSC_EXT(61)] Device can process
>>>>>>>>>>>>>>> duplicated
>>>>>>>>>>> ACKs
>>>>>>>>>>>>>>>          and report number of coalesced segments and
>>>>>>>>>>>>>>> duplicated ACKs @@ -2854,8 +2858,8 @@
>>>>>>> \subsubsection{Legacy Interface:
>>>>>>>>>>> Feature bits}\label{sec:Device Types / Network
>>>>>>>>>>>>>>>      \subsection{Device configuration
>>>>>>>>>>>>>>> layout}\label{sec:Device Types /
>>>>>>>>>>> Network Device / Device configuration layout}
>>>>>>>>>>>>>>>      \label{sec:Device Types / Block Device / Feature
>>>>>>>>>>>>>>> bits / Device configuration layout} -Three
>>>>>>>>>>>>>>> driver-read-only configuration fields are currently defined.
>>>>>>>>>>>>>>> The \field{mac} address field -always exists (though is
>>>>>>>>>>>>>>> only valid if VIRTIO_NET_F_MAC is set), and
>>>>>>>>>>>>>>> +The driver-read-only \field{mac} address field always
>>>>>>>>>>>>>>> +exists (though is only valid if VIRTIO_NET_F_MAC is
>>>>>>>>>>>>>>> +set), and
>>>>>>>>>>>>>>>      \field{status} only exists if VIRTIO_NET_F_STATUS is set.
>>>>> Two
>>>>>>>>>>>>>>>      read-only bits (for the driver) are currently
>>>>>>>>>>>>>>> defined for the status
>>>>>>>>>>> field:
>>>>>>>>>>>>>>>      VIRTIO_NET_S_LINK_UP and VIRTIO_NET_S_ANNOUNCE.
>>>>>>>>>>>>>>> @@ -2875,12 +2879,17 @@ \subsection{Device
>>>>>>>>>>>>>>> configuration
>>>>>>>>>>> layout}\label{sec:Device Types / Network Device
>>>>>>>>>>>>>>>      VIRTIO_NET_F_MTU is set. This field specifies the
>>>>>>>>>>>>>>> maximum MTU for
>>>>>>>>>>> the driver to
>>>>>>>>>>>>>>>      use.
>>>>>>>>>>>>>>> +The device-read-only field \field{rx_buf_len} only
>>>>>>>>>>>>>>> +exists if
>>>>>>>>>>>>>> Should be driver-read-only.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +VIRTIO_NET_F_CONST_RXBUF_LEN is negotiated. This field
>>>>>>>>>>>>>>> +specifies the receive buffers length.
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>      \begin{lstlisting}
>>>>>>>>>>>>>>>      struct virtio_net_config {
>>>>>>>>>>>>>>>              u8 mac[6];
>>>>>>>>>>>>>>>              le16 status;
>>>>>>>>>>>>>>>              le16 max_virtqueue_pairs;
>>>>>>>>>>>>>>>              le16 mtu;
>>>>>>>>>>>>>>> +        le32 rx_buf_len;
>>>>>>>>>>>>>>>      };
>>>>>>>>>>>>>>>      \end{lstlisting}
>>>>>>>>>>>>>>> @@ -2933,6 +2942,13 @@ \subsection{Device configuration
>>>>>>>>>>>>>>> layout}\label{sec:Device Types / Network Device
>>>>>>>>>>>>>>>      A driver SHOULD negotiate the VIRTIO_NET_F_STANDBY
>>>>>>>>>>>>>>> feature if
>>>>>>>>>>> the device offers it.
>>>>>>>>>>>>>>> +A driver SHOULD accept the
>>>>>>> VIRTIO_NET_F_CONST_RXBUF_LEN
>>>>>>>>>>> feature if offered.
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been
>>>>>>>>> negotiated,
>>>>>>>>>>>>>>> +the driver MUST set \field{rx_buf_len}.
>>>>>>>>>>>>>> I think it's device that set the field?
>>>>>>>>>>>>> Makes more sense for the driver, but if you want this set e.g.
>>>>>>>>>>>>> before buffers are added, you must say so.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +A driver MUST NOT modify \field{rx_buf_len} once it
>>>>>>>>>>>>>>> +has been
>>>>>>>>> set.
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>> This seems very unflexible. I can see how e.g. XDP would
>>>>>>>>>>>>> benefit from big buffers while skbs benefit from small buffers.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This calls for ability to change this.
>>>>>>>>>>>> Yes, but it requires non trivial cleanups for the old length
>>>>>>>>>>>> and place them with new ones.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>> Let's see:
>>>>>>>>>>> 1	- making buffer smaller: just update config space,
>>>>>>>>>>> 	  then make new buffers smaller
>>>>>>>>>>>
>>>>>>>>>>> 2	- making buffers bigger: add larger buffers,
>>>>>>>>>>> 	once all small ones are consumed update config space
>>>>>>>>>>>
>>>>>>>>>>> 2 is tricky I agree. Thoughts?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> Agree. It's doable, provided that the driver will follow the
>>>>>>>>>> update
>>>>>>>>> procedure.
>>>>>>>>>>>>>>>      \subsubsection{Legacy Interface: Device
>>>>>>>>>>>>>>> configuration
>>>>>>>>>>> layout}\label{sec:Device Types / Network Device / Device
>>>>>>>>>>> configuration layout / Legacy Interface: Device configuration
>>>>>>>>>>> layout}
>>>>>>>>>>>>>>>      \label{sec:Device Types / Block Device / Feature
>>>>>>>>>>>>>>> bits / Device
>>>>>>>>>>> configuration layout / Legacy Interface: Device configuration
>>>>>>>>>>> layout}
>>>>>>>>>>>>>>>      When using the legacy interface, transitional
>>>>>>>>>>>>>>> devices and drivers @@
>>>>>>>>>>>>>>> -3241,6 +3257,11 @@ \subsubsection{Setting Up Receive
>>>>>>>>>>> Buffers}\label{sec:Device Types / Network Devi
>>>>>>>>>>>>>>>      If VIRTIO_NET_F_MQ is negotiated, each of
>>>>>>>>>>>>>>> receiveq1\ldots
>>>>>>>>>>> receiveqN
>>>>>>>>>>>>>>>      that will be used SHOULD be populated with receive
>>>>> buffers.
>>>>>>>>>>>>>>> +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been
>>>>>>>>> negotiated,
>>>>>>>>>>>>>>> +the driver MUST initialize all receive virtqueue
>>>>>>>>>>>>>>> +descriptors \field{len} field with the value
>>>>>>>>>>>>>>> +configured in \field{rx_buf_len} device configuration
>>>>>>>>>>>>>>> +field, and allocate receive
>>>>>>>>>>> buffers accordingly.
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>      \devicenormative{\paragraph}{Setting Up Receive
>>>>>>>>>>>>>>> Buffers}{Device Types / Network Device / Device
>>>>>>>>>>>>>>> Operation / Setting Up Receive Buffers}
>>>>>>>>>>>>>>>      The device MUST set \field{num_buffers} to the
>>>>>>>>>>>>>>> number of descriptors used to @@ -3396,6 +3417,10 @@
>>>>>>>>>>> \subsubsection{Processing of Incoming Packets}\label{sec:Device
>>>>>>>>>>> Types / Network
>>>>>>>>>>>>>>>      checksum (in case of multiple encapsulated
>>>>>>>>>>>>>>> protocols, one
>>>>>>> level
>>>>>>>>>>>>>>>      of checksums is validated).
>>>>>>>>>>>>>>> +If VIRTIO_NET_F_CONST_RXBUF_LEN has been negotiated,
>>>>>>> the
>>>>>>>>>>> device
>>>>>>>>>>>>>>> +MAY use \field{rx_buf_len} as a buffer length (instead
>>>>>>>>>>>>>>> +of reading it from virtqueue descriptor \field{len} field).
>>>>>>>>>>>>>> Is this safe? What if driver submit a small buffer, then
>>>>>>>>>>>>>> device can read
>>>>>>>>>>> more than what is allowed?
>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>      \drivernormative{\paragraph}{Processing of Incoming
>>>>>>>>>>>>>>>      Packets}{Device Types / Network Device / Device Operation
>>>>> /
>>>>>>>>>>>>>>>      Processing of Incoming Packets}
>
> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
>
> In order to verify user consent to the Feedback License terms and
> to minimize spam in the list archive, subscription is required
> before posting.
>
> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> List help: virtio-comment-help@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/
>


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature
  2019-11-25  8:07                           ` Jason Wang
@ 2019-11-25  8:18                             ` Michael S. Tsirkin
  2019-11-25  8:22                               ` Jason Wang
  0 siblings, 1 reply; 28+ messages in thread
From: Michael S. Tsirkin @ 2019-11-25  8:18 UTC (permalink / raw)
  To: Jason Wang; +Cc: Vitaly Mireyno, virtio-comment

On Mon, Nov 25, 2019 at 04:07:05PM +0800, Jason Wang wrote:
> 
> On 2019/11/25 下午3:33, Michael S. Tsirkin wrote:
> > On Mon, Nov 25, 2019 at 11:31:09AM +0800, Jason Wang wrote:
> > > On 2019/11/24 下午11:30, Michael S. Tsirkin wrote:
> > > > On Sun, Nov 24, 2019 at 03:02:05PM +0000, Vitaly Mireyno wrote:
> > > > > > -----Original Message-----
> > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > Sent: Wednesday, 20 November, 2019 15:23
> > > > > > To: Vitaly Mireyno <vmireyno@marvell.com>
> > > > > > Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
> > > > > > open.org
> > > > > > Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-
> > > > > > sized receive buffers feature
> > > > > > 
> > > > > > On Thu, Nov 14, 2019 at 03:49:59PM +0000, Vitaly Mireyno wrote:
> > > > > > > > -----Original Message-----
> > > > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > > Sent: Monday, 11 November, 2019 17:05
> > > > > > > > To: Vitaly Mireyno <vmireyno@marvell.com>
> > > > > > > > Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
> > > > > > > > open.org
> > > > > > > > Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add
> > > > > > > > equal- sized receive buffers feature
> > > > > > > > 
> > > > > > > > On Mon, Nov 11, 2019 at 01:58:51PM +0000, Vitaly Mireyno wrote:
> > > > > > > > > > -----Original Message-----
> > > > > > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > > > > Sent: Sunday, 10 November, 2019 17:35
> > > > > > > > > > To: Vitaly Mireyno <vmireyno@marvell.com>
> > > > > > > > > > Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
> > > > > > > > > > open.org
> > > > > > > > > > Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net:
> > > > > > > > > > Add
> > > > > > > > > > equal- sized receive buffers feature
> > > > > > > > > > 
> > > > > > > > > > On Sun, Nov 10, 2019 at 02:13:14PM +0000, Vitaly Mireyno wrote:
> > > > > > > > > > > > -----Original Message-----
> > > > > > > > > > > > From: virtio-comment@lists.oasis-open.org
> > > > > > > > > > > > <virtio-comment@lists.oasis- open.org> On Behalf Of Michael S.
> > > > > > > > > > > > Tsirkin
> > > > > > > > > > > > Sent: Tuesday, 5 November, 2019 20:52
> > > > > > > > > > > > To: Jason Wang <jasowang@redhat.com>
> > > > > > > > > > > > Cc: Vitaly Mireyno <vmireyno@marvell.com>;
> > > > > > > > > > > > virtio-comment@lists.oasis- open.org
> > > > > > > > > > > > Subject: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add
> > > > > > > > > > > > equal-sized receive buffers feature
> > > > > > > > > > > > 
> > > > > > > > > > > > External Email
> > > > > > > > > > > > 
> > > > > > > > > > > > ---------------------------------------------------------------
> > > > > > > > > > > > ---
> > > > > > > > > > > > ---
> > > > > > > > > > > > - On Fri, Nov 01, 2019 at 03:22:26PM +0800, Jason Wang wrote:
> > > > > > > > > > > > > On 2019/11/1 下午12:09, Michael S. Tsirkin wrote:
> > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 10:46:55AM +0000, Vitaly Mireyno wrote:
> > > > > > > > > > > > > > > The feature is limited to receive buffers only.
> > > > > > > > > > > > > > > A driver decides on receive buffer length. The only
> > > > > > > > > > > > > > > limitation is that this
> > > > > > > > > > > > length has to be the same for all receive virtqueue buffers.
> > > > > > > > > > > > > > > The driver configures receive buffer length to the device
> > > > > > > > > > > > > > > during device
> > > > > > > > > > > > initialization, and the device reads it and may use it for
> > > > > > > > > > > > optimal
> > > > > > > > operation.
> > > > > > > > > > > > > > > No changes for transmit buffers.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > -----Original Message-----
> > > > > > > > > > > > > > > From: virtio-comment@lists.oasis-open.org
> > > > > > > > > > > > > > > <virtio-comment@lists.oasis-open.org> On Behalf Of Jason
> > > > > > > > > > > > > > > Wang
> > > > > > > > > > > > > > > Sent: Thursday, 31 October, 2019 12:15
> > > > > > > > > > > > > > > To: Vitaly Mireyno <vmireyno@marvell.com>;
> > > > > > > > > > > > > > > virtio-comment@lists.oasis-open.org
> > > > > > > > > > > > > > > Cc: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > > > > > > > > > Subject: [virtio-comment] Re: [PATCH] virtio-net: Add
> > > > > > > > > > > > > > > equal-sized receive buffers feature
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > On 2019/10/31 下午5:23, Vitaly Mireyno wrote:
> > > > > > > > > > > > > > > > Some devices benefit from working with receive buffers
> > > > > > > > > > > > > > > > of a
> > > > > > > > > > > > predefined constant length.
> > > > > > > > > > > > > > > > Add a feature for drivers that allocate equal-sized
> > > > > > > > > > > > > > > > receive buffers, and
> > > > > > > > > > > > devices that benefit from predefined receive buffers length.
> > > > > > > > > > > > > > > > Signed-off-by: Vitaly Mireyno <vmireyno@marvell.com>
> > > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > > >      content.tex | 29 +++++++++++++++++++++++++++--
> > > > > > > > > > > > > > > >      1 file changed, 27 insertions(+), 2 deletions(-)
> > > > > > > > > > > > > > > Is there any other networking device that has this feature?
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > diff --git a/content.tex b/content.tex index
> > > > > > > > > > > > > > > > b1ea9b9..c9e67c8
> > > > > > > > > > > > > > > > 100644
> > > > > > > > > > > > > > > > --- a/content.tex
> > > > > > > > > > > > > > > > +++ b/content.tex
> > > > > > > > > > > > > > > > @@ -2811,6 +2811,10 @@ \subsection{Feature
> > > > > > > > > > > > > > > > bits}\label{sec:Device
> > > > > > > > > > > > Types / Network Device / Feature bits
> > > > > > > > > > > > > > > >      \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC
> > > > > > > > > > > > > > > > address
> > > > > > > > > > > > through control
> > > > > > > > > > > > > > > >          channel.
> > > > > > > > > > > > > > > > +\item[VIRTIO_NET_F_CONST_RXBUF_LEN(58)] Driver
> > > > > > > > > > > > > > > > +allocates all
> > > > > > > > > > > > receive buffers
> > > > > > > > > > > > > > > > +    of the same constant length. Device benefits from
> > > > > > > > > > > > > > > > + working
> > > > > > > > with
> > > > > > > > > > > > > > > > +    receive buffers of equal length.
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > Problem is, I don't think linux will use this for skbs
> > > > > > > > > > > > > > since it breaks buffer accounting. This is because it is
> > > > > > > > > > > > > > important to make skbs as small as you can. So even if you
> > > > > > > > > > > > > > see "device would
> > > > > > > > benefit"
> > > > > > > > > > > > > > there is no way to balance this with the benefit to linux.
> > > > > > > > > > > > > > How do you know which benefit is bigger?
> > > > > > > > > > > > > I guess the idea is e.g for Linux driver, it can refuse this feature.
> > > > > > > > > > > > Okay. What uses it?
> > > > > > > > > > > > 
> > > > > > > > > > > > > > You also never explained how does device benefit. My guess
> > > > > > > > > > > > > > is this allows device to calculate the # of descriptors to
> > > > > > > > > > > > > > fetch that are needed for a packet. Right?
> > > > > > > > > > > > > > Assuming this, I think a rough estimate should be enough.
> > > > > > > > > > > > > > If device fetches too much it can discard extra, if it does
> > > > > > > > > > > > > > not fetch enough it can fetch extra.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Let us not forget that express packets are 256 bytes so up
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > 16 descriptors fit in a packet, there is no benefit most of
> > > > > > > > > > > > > > the time in knowing whether e.g. 1 or 2 descriptors are needed.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Let us not forget these are buffers, not descriptors.
> > > > > > > > > > > > > I guess maybe the initial motivation is constant descriptor length.
> > > > > > > > > > > > That conflicts with requirement framing is up to driver.
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > I was using the wrong terminology. The feature is about constant
> > > > > > > > > > *descriptor* length. In other words, the value that driver puts in
> > > > > > > > > > Packed Virtqueue "Element Length" field (or 'len' field in the
> > > > > > 'pvirtq_desc'
> > > > > > > > struct).
> > > > > > > > > > OK so this conflicts with "2.6.4 Message Framing" which requires
> > > > > > > > > > that drivers can split buffers into as many descriptors as they like.
> > > > > > > > > > Right?
> > > > > > > > > > 
> > > > > > > > > > > Does it make more sense now, or you still see an issue with Linux
> > > > > > driver?
> > > > > > > > > > I think there's an issue with the flexible framing requirements
> > > > > > > > > > and there's an issue with Linux driver.
> > > > > > > > > > 
> > > > > > > > > > > The motivation is to allow the device to calculate the exact
> > > > > > > > > > > number of
> > > > > > > > > > descriptors to consume, before fetching the descriptors. This is
> > > > > > > > > > beneficial for devices for which overconsuming or underconsuming
> > > > > > > > > > descriptors come at a high cost.
> > > > > > > > > > 
> > > > > > > > > > So I guess we can agree getting the # of descriptors *exactly*
> > > > > > > > > > right isn't all that important. Right? My question is how does the
> > > > > > > > > > driver balance the device versus Linux needs?
> > > > > > > > > > 
> > > > > > > > > > One idea is if we assume this is best effort, does not have to be
> > > > > > > > > > exact, then how about just having the device assume descriptor
> > > > > > > > > > sizes stay more or less constant and use that to estimate the
> > > > > > > > > > amount? If it under/over estimates, things just go a bit slower.
> > > > > > > > > > This way driver can adjust the sizes and device will react
> > > > > > > > > > automatically, with time.
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > I see that "2.6.4 Message Framing" allows a full flexibility of
> > > > > > > > > descriptor lengths, and I presume it's applicable to Packet
> > > > > > > > > Virtqueues as well, though defined under Split Virtqueues section.
> > > > > > > > That's a good point. Probably makes sense to move it out to a common
> > > > > > > > section, right?
> > > > > > > > 
> > > > > > > > > The example
> > > > > > > > > talks about transmit virtqueue, and it makes perfect sense.
> > > > > > > > > However, wouldn't a typical driver place equal-sized *receive*
> > > > > > > > > descriptors
> > > > > > > > anyway? So if a device can benefit from it, the driver might as well
> > > > > > > > negotiate this new feature.
> > > > > > > > 
> > > > > > > > Having buffer size jump around wildly doesn't seem too useful, I agree.
> > > > > > > > OTOH being able to adjust it gradually has been in the past
> > > > > > > > demontrated to help performance measureably.
> > > > > > > > 
> > > > > > > > > This could be especially relevant for devices, for which adjusting
> > > > > > > > > the number
> > > > > > > > of used descriptors is impractical after descriptors have already been
> > > > > > fetched.
> > > > > > > > > I agree that if this requirement conflicts with specific driver
> > > > > > > > > needs, it will not
> > > > > > > > be negotiated, and the device will either underperform in certain
> > > > > > > > scenarios, or will not come up at all.
> > > > > > > > 
> > > > > > > > Right so that makes it a challenge. Device says it prefers fixed
> > > > > > > > buffers. Is that preference more or less important than ability to
> > > > > > > > efficiently support workloads such as TCP small queues?
> > > > > > > > Driver has no idea and I suspect neither does the device.
> > > > > > > > So I don't see how will a Linux driver know that it should enable
> > > > > > > > this, neither how will device know it's okay to just refuse features_ok.
> > > > > > > > 
> > > > > > > > 
> > > > > > > > On the other hand, current drivers already have logic that tries to
> > > > > > > > change buffer sizes in a smooth way.  So simply caching the last
> > > > > > > > buffer size and assuming all the others will be exactly the same will
> > > > > > > > go a long way towards limiting how much does device need to fetch.
> > > > > > > > This does imply extra logic on the device to recover if things change
> > > > > > > > and the first read did not fetch enough buffers, but then it would be
> > > > > > > > required anyway if as you say above the failure is not catastrophic.
> > > > > > > > 
> > > > > > > > 
> > > > > > > > The feature thus only seems useful for small, feature-restrained
> > > > > > > > devices
> > > > > > > > - which are prepared to sacrifice some performance to cut costs, and
> > > > > > > > which can't recover at all. Is this what you are trying to do?
> > > > > > > > 
> > > > > > > The intention is to enable devices with this specific limitation to
> > > > > > > offer virtio offload.  The feature can be redefined such that it would
> > > > > > > only be offered by devices that are unable to handle dynamically
> > > > > > > changing descriptor lengths. How does that sound?
> > > > > > So it makes more sense when mergeable buffers are disabled (since then
> > > > > > buffers are practically all same size).
> > > > > > 
> > > > > Actually, mergeable buffers are not a problem. They could be enabled,
> > > > > as long as each descriptor is the same length. So implementations that
> > > > > prefer optimizing memory utilization over jumbo frame performance can
> > > > > negotiate VIRTIO_NET_F_MRG_RXBUF and allocate smaller buffers.
> > > > So my point was, without VIRTIO_NET_F_MRG_RXBUF, all buffers are same
> > > > length anyway. So if we are talking about cheap simple hardware, I guess
> > > > it can just not have VIRTIO_NET_F_MRG_RXBUF and be done with it?
> > > > If device does care about memory utilization then I think
> > > > it needs to give driver the flexibility it needs/uses.
> > > > No?
> > > > 
> > > > Again I can see how we might want to disallow crazy setups with e.g. 1
> > > > byte per buffer. That's just abuse, no guest does that anyway. So asking
> > > > e.g. for a minimal buffer size sounds very reasonable.
> > > 
> > > One question here is that, the minimal buffer size should depends on various
> > > factors. E.g the ring size. Consider a 256 entries ring, the minimal size
> > > should be 64K/256=256 ...
> > I guess you are right. We can make this driver programmable I guess?
> > Basically pass min_buf_len to the device.
> 
> 
> Then it still have the chance to program the min_buf_len to 1?

If it wants to - yes - presumably device won't work as well,
fetching all kind of extra data e.g. unnecessary descriptors.

> 
> > 
> > 
> > > > But an option that
> > > > disables functionality that a popular guest uses needs a lot of
> > > > documentation to help device writers figure out whether they want that
> > > > option or not, and I'd worry that even with documentation will be
> > > > misunderstood even if we write it. When do you enable this?
> > > > When you don't care about performance? When you don't care about Linux?
> > > 
> > > It looks to me the feature proposal here is something related to the
> > > descriptor pre fetching. In the case of packed virtqueue. Having
> > > avail/producer index may help more or less here?
> > > 
> > > Thanks
> > Isn't this addressed by VIRTIO_F_NOTIFICATION_DATA?
> > At least that's one of the things it's supposed to do ...
> 
> 
> Device can choose to disable notification, they usually happen on heavy
> workload which suppresses the effort of notification data.
> 
> Thanks

Well prefetch is good for latency but presumably
less important for a streaming workload.

> 
> > > > > > With that in mind, I have an idea: scsi and block already have max sg field.
> > > > > > How about we add a writeable max sg field, maybe even make it
> > > > > > programmable per vq?
> > > > > > 
> > > > > > Thus driver tells device what is it's max s/g value for rx. Worst case device
> > > > > > fetches a bit more than it needs. Discarding extra shouldn't be expensive. This
> > > > > > looks like it will help even smart devices. What do you think?
> > > > > > 
> > > > > > This nicely avoids conflicting with the flexible framing requirement.
> > > > > > 
> > > > > My intention was to avoid any descriptor length variations, for
> > > > > devices that unable fetching or discarding extra descriptors. (If in
> > > > > the pipelined HW processing the decision on number of descriptors is
> > > > > made in the early stage, and it cannot be undone in a later stage).
> > > > Frankly discarding unused descriptors looks to me like something that
> > > > shouldn't have a high cost in the hardware.  I can see how trying to
> > > > predict descriptor length, and fetching extra if not enough was fetched
> > > > can have a high cost, so to me extensions to remove the guesswork
> > > > from the device have value.  However a lot of effort went into trying to
> > > > reduce e.g. number of pci express packets needed to fetch descriptors.
> > > > Each packet fetches 256 bytes anyway, does it not?  Not having an
> > > > ability to use that information seems like an obvious waste, and that
> > > > means ability to keep some fetched descriptors around even if they are
> > > > not used for a given packet. Again just my $.02.
> > > > 
> > > > > Defining max s/g sounds like an interesting feature by itself.
> > > > But assuming we have max RX s/g, I guess hardware can set max s/g = 1?
> > > > Then since with !VIRTIO_NET_F_MRG_RXBUF all buffers are forced to
> > > > be same length.
> > > > 
> > > > > > > > > Could you specify what issue do you see with the Linux driver?
> > > > > > > > See logic around struct ewma.
> > > > > > > > 
> > > > > > > > The first relevant commit is I guess
> > > > > > > > commit ab7db91705e95ed1bba1304388936fccfa58c992
> > > > > > > >      virtio-net: auto-tune mergeable rx buffer size for improved
> > > > > > > > performance
> > > > > > > > 
> > > > > > > > this claims a gain of about 10% with large packets which isn't earth
> > > > > > > > shattering but also not something we can ignore completely. And I
> > > > > > > > suspect it can be bigger with smaller packets.
> > > > > > > > 
> > > > > > > > > > > > > > So device does not really know the exact # of descriptors:
> > > > > > > > > > > > > > current drivers do 1 descriptor per buffer but nothing
> > > > > > > > > > > > > > prevents more.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Thoughts?
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > >      \item[VIRTIO_NET_F_RSC_EXT(61)] Device can process
> > > > > > > > > > > > > > > > duplicated
> > > > > > > > > > > > ACKs
> > > > > > > > > > > > > > > >          and report number of coalesced segments and
> > > > > > > > > > > > > > > > duplicated ACKs @@ -2854,8 +2858,8 @@
> > > > > > > > \subsubsection{Legacy Interface:
> > > > > > > > > > > > Feature bits}\label{sec:Device Types / Network
> > > > > > > > > > > > > > > >      \subsection{Device configuration
> > > > > > > > > > > > > > > > layout}\label{sec:Device Types /
> > > > > > > > > > > > Network Device / Device configuration layout}
> > > > > > > > > > > > > > > >      \label{sec:Device Types / Block Device / Feature
> > > > > > > > > > > > > > > > bits / Device configuration layout} -Three
> > > > > > > > > > > > > > > > driver-read-only configuration fields are currently defined.
> > > > > > > > > > > > > > > > The \field{mac} address field -always exists (though is
> > > > > > > > > > > > > > > > only valid if VIRTIO_NET_F_MAC is set), and
> > > > > > > > > > > > > > > > +The driver-read-only \field{mac} address field always
> > > > > > > > > > > > > > > > +exists (though is only valid if VIRTIO_NET_F_MAC is
> > > > > > > > > > > > > > > > +set), and
> > > > > > > > > > > > > > > >      \field{status} only exists if VIRTIO_NET_F_STATUS is set.
> > > > > > Two
> > > > > > > > > > > > > > > >      read-only bits (for the driver) are currently
> > > > > > > > > > > > > > > > defined for the status
> > > > > > > > > > > > field:
> > > > > > > > > > > > > > > >      VIRTIO_NET_S_LINK_UP and VIRTIO_NET_S_ANNOUNCE.
> > > > > > > > > > > > > > > > @@ -2875,12 +2879,17 @@ \subsection{Device
> > > > > > > > > > > > > > > > configuration
> > > > > > > > > > > > layout}\label{sec:Device Types / Network Device
> > > > > > > > > > > > > > > >      VIRTIO_NET_F_MTU is set. This field specifies the
> > > > > > > > > > > > > > > > maximum MTU for
> > > > > > > > > > > > the driver to
> > > > > > > > > > > > > > > >      use.
> > > > > > > > > > > > > > > > +The device-read-only field \field{rx_buf_len} only
> > > > > > > > > > > > > > > > +exists if
> > > > > > > > > > > > > > > Should be driver-read-only.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > +VIRTIO_NET_F_CONST_RXBUF_LEN is negotiated. This field
> > > > > > > > > > > > > > > > +specifies the receive buffers length.
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > >      \begin{lstlisting}
> > > > > > > > > > > > > > > >      struct virtio_net_config {
> > > > > > > > > > > > > > > >              u8 mac[6];
> > > > > > > > > > > > > > > >              le16 status;
> > > > > > > > > > > > > > > >              le16 max_virtqueue_pairs;
> > > > > > > > > > > > > > > >              le16 mtu;
> > > > > > > > > > > > > > > > +        le32 rx_buf_len;
> > > > > > > > > > > > > > > >      };
> > > > > > > > > > > > > > > >      \end{lstlisting}
> > > > > > > > > > > > > > > > @@ -2933,6 +2942,13 @@ \subsection{Device configuration
> > > > > > > > > > > > > > > > layout}\label{sec:Device Types / Network Device
> > > > > > > > > > > > > > > >      A driver SHOULD negotiate the VIRTIO_NET_F_STANDBY
> > > > > > > > > > > > > > > > feature if
> > > > > > > > > > > > the device offers it.
> > > > > > > > > > > > > > > > +A driver SHOULD accept the
> > > > > > > > VIRTIO_NET_F_CONST_RXBUF_LEN
> > > > > > > > > > > > feature if offered.
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been
> > > > > > > > > > negotiated,
> > > > > > > > > > > > > > > > +the driver MUST set \field{rx_buf_len}.
> > > > > > > > > > > > > > > I think it's device that set the field?
> > > > > > > > > > > > > > Makes more sense for the driver, but if you want this set e.g.
> > > > > > > > > > > > > > before buffers are added, you must say so.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +A driver MUST NOT modify \field{rx_buf_len} once it
> > > > > > > > > > > > > > > > +has been
> > > > > > > > > > set.
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > This seems very unflexible. I can see how e.g. XDP would
> > > > > > > > > > > > > > benefit from big buffers while skbs benefit from small buffers.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > This calls for ability to change this.
> > > > > > > > > > > > > Yes, but it requires non trivial cleanups for the old length
> > > > > > > > > > > > > and place them with new ones.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Thanks
> > > > > > > > > > > > Let's see:
> > > > > > > > > > > > 1	- making buffer smaller: just update config space,
> > > > > > > > > > > > 	  then make new buffers smaller
> > > > > > > > > > > > 
> > > > > > > > > > > > 2	- making buffers bigger: add larger buffers,
> > > > > > > > > > > > 	once all small ones are consumed update config space
> > > > > > > > > > > > 
> > > > > > > > > > > > 2 is tricky I agree. Thoughts?
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > Agree. It's doable, provided that the driver will follow the
> > > > > > > > > > > update
> > > > > > > > > > procedure.
> > > > > > > > > > > > > > > >      \subsubsection{Legacy Interface: Device
> > > > > > > > > > > > > > > > configuration
> > > > > > > > > > > > layout}\label{sec:Device Types / Network Device / Device
> > > > > > > > > > > > configuration layout / Legacy Interface: Device configuration
> > > > > > > > > > > > layout}
> > > > > > > > > > > > > > > >      \label{sec:Device Types / Block Device / Feature
> > > > > > > > > > > > > > > > bits / Device
> > > > > > > > > > > > configuration layout / Legacy Interface: Device configuration
> > > > > > > > > > > > layout}
> > > > > > > > > > > > > > > >      When using the legacy interface, transitional
> > > > > > > > > > > > > > > > devices and drivers @@
> > > > > > > > > > > > > > > > -3241,6 +3257,11 @@ \subsubsection{Setting Up Receive
> > > > > > > > > > > > Buffers}\label{sec:Device Types / Network Devi
> > > > > > > > > > > > > > > >      If VIRTIO_NET_F_MQ is negotiated, each of
> > > > > > > > > > > > > > > > receiveq1\ldots
> > > > > > > > > > > > receiveqN
> > > > > > > > > > > > > > > >      that will be used SHOULD be populated with receive
> > > > > > buffers.
> > > > > > > > > > > > > > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been
> > > > > > > > > > negotiated,
> > > > > > > > > > > > > > > > +the driver MUST initialize all receive virtqueue
> > > > > > > > > > > > > > > > +descriptors \field{len} field with the value
> > > > > > > > > > > > > > > > +configured in \field{rx_buf_len} device configuration
> > > > > > > > > > > > > > > > +field, and allocate receive
> > > > > > > > > > > > buffers accordingly.
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > >      \devicenormative{\paragraph}{Setting Up Receive
> > > > > > > > > > > > > > > > Buffers}{Device Types / Network Device / Device
> > > > > > > > > > > > > > > > Operation / Setting Up Receive Buffers}
> > > > > > > > > > > > > > > >      The device MUST set \field{num_buffers} to the
> > > > > > > > > > > > > > > > number of descriptors used to @@ -3396,6 +3417,10 @@
> > > > > > > > > > > > \subsubsection{Processing of Incoming Packets}\label{sec:Device
> > > > > > > > > > > > Types / Network
> > > > > > > > > > > > > > > >      checksum (in case of multiple encapsulated
> > > > > > > > > > > > > > > > protocols, one
> > > > > > > > level
> > > > > > > > > > > > > > > >      of checksums is validated).
> > > > > > > > > > > > > > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN has been negotiated,
> > > > > > > > the
> > > > > > > > > > > > device
> > > > > > > > > > > > > > > > +MAY use \field{rx_buf_len} as a buffer length (instead
> > > > > > > > > > > > > > > > +of reading it from virtqueue descriptor \field{len} field).
> > > > > > > > > > > > > > > Is this safe? What if driver submit a small buffer, then
> > > > > > > > > > > > > > > device can read
> > > > > > > > > > > > more than what is allowed?
> > > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > >      \drivernormative{\paragraph}{Processing of Incoming
> > > > > > > > > > > > > > > >      Packets}{Device Types / Network Device / Device Operation
> > > > > > /
> > > > > > > > > > > > > > > >      Processing of Incoming Packets}
> > 
> > This publicly archived list offers a means to provide input to the
> > OASIS Virtual I/O Device (VIRTIO) TC.
> > 
> > In order to verify user consent to the Feedback License terms and
> > to minimize spam in the list archive, subscription is required
> > before posting.
> > 
> > Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> > List help: virtio-comment-help@lists.oasis-open.org
> > List archive: https://lists.oasis-open.org/archives/virtio-comment/
> > Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> > List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> > Committee: https://www.oasis-open.org/committees/virtio/
> > Join OASIS: https://www.oasis-open.org/join/
> > 


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature
  2019-11-25  8:18                             ` Michael S. Tsirkin
@ 2019-11-25  8:22                               ` Jason Wang
  2019-11-25  8:30                                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Wang @ 2019-11-25  8:22 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Vitaly Mireyno, virtio-comment


On 2019/11/25 下午4:18, Michael S. Tsirkin wrote:
>>>>> Again I can see how we might want to disallow crazy setups with e.g. 1
>>>>> byte per buffer. That's just abuse, no guest does that anyway. So asking
>>>>> e.g. for a minimal buffer size sounds very reasonable.
>>>> One question here is that, the minimal buffer size should depends on various
>>>> factors. E.g the ring size. Consider a 256 entries ring, the minimal size
>>>> should be 64K/256=256 ...
>>> I guess you are right. We can make this driver programmable I guess?
>>> Basically pass min_buf_len to the device.
>> Then it still have the chance to program the min_buf_len to 1?
> If it wants to - yes - presumably device won't work as well,
> fetching all kind of extra data e.g. unnecessary descriptors.
>

Technically, device should still work for small packet and drop large 
packet.

Vhost has a bug that will stall RX in the case.

Will post a fix.

Thanks


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature
  2019-11-25  8:22                               ` Jason Wang
@ 2019-11-25  8:30                                 ` Michael S. Tsirkin
  2019-11-25  9:46                                   ` Jason Wang
  0 siblings, 1 reply; 28+ messages in thread
From: Michael S. Tsirkin @ 2019-11-25  8:30 UTC (permalink / raw)
  To: Jason Wang; +Cc: Vitaly Mireyno, virtio-comment

On Mon, Nov 25, 2019 at 04:22:47PM +0800, Jason Wang wrote:
> 
> On 2019/11/25 下午4:18, Michael S. Tsirkin wrote:
> > > > > > Again I can see how we might want to disallow crazy setups with e.g. 1
> > > > > > byte per buffer. That's just abuse, no guest does that anyway. So asking
> > > > > > e.g. for a minimal buffer size sounds very reasonable.
> > > > > One question here is that, the minimal buffer size should depends on various
> > > > > factors. E.g the ring size. Consider a 256 entries ring, the minimal size
> > > > > should be 64K/256=256 ...
> > > > I guess you are right. We can make this driver programmable I guess?
> > > > Basically pass min_buf_len to the device.
> > > Then it still have the chance to program the min_buf_len to 1?
> > If it wants to - yes - presumably device won't work as well,
> > fetching all kind of extra data e.g. unnecessary descriptors.
> > 
> 
> Technically, device should still work for small packet and drop large
> packet.

I am not sure I agree. How is this different from just not being
sufficient space in the RX ring? We certainly do not want to
drop packets just because there's not enough buffers -
it's a temporary condition.


> Vhost has a bug that will stall RX in the case.

I think we want to spec this first then.

> Will post a fix.
> 
> Thanks


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature
  2019-11-25  8:30                                 ` Michael S. Tsirkin
@ 2019-11-25  9:46                                   ` Jason Wang
  0 siblings, 0 replies; 28+ messages in thread
From: Jason Wang @ 2019-11-25  9:46 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Vitaly Mireyno, virtio-comment


On 2019/11/25 下午4:30, Michael S. Tsirkin wrote:
> On Mon, Nov 25, 2019 at 04:22:47PM +0800, Jason Wang wrote:
>> On 2019/11/25 下午4:18, Michael S. Tsirkin wrote:
>>>>>>> Again I can see how we might want to disallow crazy setups with e.g. 1
>>>>>>> byte per buffer. That's just abuse, no guest does that anyway. So asking
>>>>>>> e.g. for a minimal buffer size sounds very reasonable.
>>>>>> One question here is that, the minimal buffer size should depends on various
>>>>>> factors. E.g the ring size. Consider a 256 entries ring, the minimal size
>>>>>> should be 64K/256=256 ...
>>>>> I guess you are right. We can make this driver programmable I guess?
>>>>> Basically pass min_buf_len to the device.
>>>> Then it still have the chance to program the min_buf_len to 1?
>>> If it wants to - yes - presumably device won't work as well,
>>> fetching all kind of extra data e.g. unnecessary descriptors.
>>>
>> Technically, device should still work for small packet and drop large
>> packet.
> I am not sure I agree. How is this different from just not being
> sufficient space in the RX ring? We certainly do not want to
> drop packets just because there's not enough buffers -
> it's a temporary condition.


I think not? Spec doesn't forbid e.g 1 byte buffer. So when a large 
packet come, if we don't drop packet. Device may think no available 
buffer but driver may think all buffers are available.


>
>
>> Vhost has a bug that will stall RX in the case.
> I think we want to spec this first then.


I agree the clarification in the spec, but it wold be too late for 
existed drivers?

Thanks


>
>> Will post a fix.
>>
>> Thanks
>
> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
>
> In order to verify user consent to the Feedback License terms and
> to minimize spam in the list archive, subscription is required
> before posting.
>
> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> List help: virtio-comment-help@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/
>


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature
  2019-11-24 15:30                     ` [virtio-comment] " Michael S. Tsirkin
  2019-11-25  3:31                       ` Jason Wang
@ 2019-11-26 17:27                       ` Vitaly Mireyno
  2019-11-27 13:51                         ` Michael S. Tsirkin
  1 sibling, 1 reply; 28+ messages in thread
From: Vitaly Mireyno @ 2019-11-26 17:27 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Jason Wang, virtio-comment



>-----Original Message-----
>From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-
>open.org> On Behalf Of Michael S. Tsirkin
>Sent: Sunday, 24 November, 2019 17:30
>To: Vitaly Mireyno <vmireyno@marvell.com>
>Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
>open.org
>Subject: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-
>net: Add equal-sized receive buffers feature
>
>On Sun, Nov 24, 2019 at 03:02:05PM +0000, Vitaly Mireyno wrote:
>>
>>
>> >-----Original Message-----
>> >From: Michael S. Tsirkin <mst@redhat.com>
>> >Sent: Wednesday, 20 November, 2019 15:23
>> >To: Vitaly Mireyno <vmireyno@marvell.com>
>> >Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
>> >open.org
>> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add
>> >equal- sized receive buffers feature
>> >
>> >On Thu, Nov 14, 2019 at 03:49:59PM +0000, Vitaly Mireyno wrote:
>> >>
>> >>
>> >> >-----Original Message-----
>> >> >From: Michael S. Tsirkin <mst@redhat.com>
>> >> >Sent: Monday, 11 November, 2019 17:05
>> >> >To: Vitaly Mireyno <vmireyno@marvell.com>
>> >> >Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
>> >> >open.org
>> >> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net:
>> >> >Add
>> >> >equal- sized receive buffers feature
>> >> >
>> >> >On Mon, Nov 11, 2019 at 01:58:51PM +0000, Vitaly Mireyno wrote:
>> >> >>
>> >> >>
>> >> >> >-----Original Message-----
>> >> >> >From: Michael S. Tsirkin <mst@redhat.com>
>> >> >> >Sent: Sunday, 10 November, 2019 17:35
>> >> >> >To: Vitaly Mireyno <vmireyno@marvell.com>
>> >> >> >Cc: Jason Wang <jasowang@redhat.com>;
>> >> >> >virtio-comment@lists.oasis- open.org
>> >> >> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net:
>> >> >> >Add
>> >> >> >equal- sized receive buffers feature
>> >> >> >
>> >> >> >On Sun, Nov 10, 2019 at 02:13:14PM +0000, Vitaly Mireyno wrote:
>> >> >> >>
>> >> >> >> >-----Original Message-----
>> >> >> >> >From: virtio-comment@lists.oasis-open.org
>> >> >> >> ><virtio-comment@lists.oasis- open.org> On Behalf Of Michael S.
>> >> >> >> >Tsirkin
>> >> >> >> >Sent: Tuesday, 5 November, 2019 20:52
>> >> >> >> >To: Jason Wang <jasowang@redhat.com>
>> >> >> >> >Cc: Vitaly Mireyno <vmireyno@marvell.com>;
>> >> >> >> >virtio-comment@lists.oasis- open.org
>> >> >> >> >Subject: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net:
>> >> >> >> >Add equal-sized receive buffers feature
>> >> >> >> >
>> >> >> >> >External Email
>> >> >> >> >
>> >> >> >> >------------------------------------------------------------
>> >> >> >> >---
>> >> >> >> >---
>> >> >> >> >---
>> >> >> >> >- On Fri, Nov 01, 2019 at 03:22:26PM +0800, Jason Wang wrote:
>> >> >> >> >>
>> >> >> >> >> On 2019/11/1 下午12:09, Michael S. Tsirkin wrote:
>> >> >> >> >> > On Thu, Oct 31, 2019 at 10:46:55AM +0000, Vitaly Mireyno
>wrote:
>> >> >> >> >> > > The feature is limited to receive buffers only.
>> >> >> >> >> > > A driver decides on receive buffer length. The only
>> >> >> >> >> > > limitation is that this
>> >> >> >> >length has to be the same for all receive virtqueue buffers.
>> >> >> >> >> > > The driver configures receive buffer length to the
>> >> >> >> >> > > device during device
>> >> >> >> >initialization, and the device reads it and may use it for
>> >> >> >> >optimal
>> >> >operation.
>> >> >> >> >> > > No changes for transmit buffers.
>> >> >> >> >> > >
>> >> >> >> >> > > -----Original Message-----
>> >> >> >> >> > > From: virtio-comment@lists.oasis-open.org
>> >> >> >> >> > > <virtio-comment@lists.oasis-open.org> On Behalf Of
>> >> >> >> >> > > Jason Wang
>> >> >> >> >> > > Sent: Thursday, 31 October, 2019 12:15
>> >> >> >> >> > > To: Vitaly Mireyno <vmireyno@marvell.com>;
>> >> >> >> >> > > virtio-comment@lists.oasis-open.org
>> >> >> >> >> > > Cc: Michael S. Tsirkin <mst@redhat.com>
>> >> >> >> >> > > Subject: [virtio-comment] Re: [PATCH] virtio-net: Add
>> >> >> >> >> > > equal-sized receive buffers feature
>> >> >> >> >> > >
>> >> >> >> >> > >
>> >> >> >> >> > > On 2019/10/31 下午5:23, Vitaly Mireyno wrote:
>> >> >> >> >> > > > Some devices benefit from working with receive
>> >> >> >> >> > > > buffers of a
>> >> >> >> >predefined constant length.
>> >> >> >> >> > > > Add a feature for drivers that allocate equal-sized
>> >> >> >> >> > > > receive buffers, and
>> >> >> >> >devices that benefit from predefined receive buffers length.
>> >> >> >> >> > > >
>> >> >> >> >> > > > Signed-off-by: Vitaly Mireyno <vmireyno@marvell.com>
>> >> >> >> >> > > > ---
>> >> >> >> >> > > >    content.tex | 29 +++++++++++++++++++++++++++--
>> >> >> >> >> > > >    1 file changed, 27 insertions(+), 2 deletions(-)
>> >> >> >> >> > >
>> >> >> >> >> > > Is there any other networking device that has this feature?
>> >> >> >> >> > >
>> >> >> >> >> > >
>> >> >> >> >> > > > diff --git a/content.tex b/content.tex index
>> >> >> >> >> > > > b1ea9b9..c9e67c8
>> >> >> >> >> > > > 100644
>> >> >> >> >> > > > --- a/content.tex
>> >> >> >> >> > > > +++ b/content.tex
>> >> >> >> >> > > > @@ -2811,6 +2811,10 @@ \subsection{Feature
>> >> >> >> >> > > > bits}\label{sec:Device
>> >> >> >> >Types / Network Device / Feature bits
>> >> >> >> >> > > >    \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC
>> >> >> >> >> > > > address
>> >> >> >> >through control
>> >> >> >> >> > > >        channel.
>> >> >> >> >> > > > +\item[VIRTIO_NET_F_CONST_RXBUF_LEN(58)] Driver
>> >> >> >> >> > > > +allocates all
>> >> >> >> >receive buffers
>> >> >> >> >> > > > +    of the same constant length. Device benefits
>> >> >> >> >> > > > + from working
>> >> >with
>> >> >> >> >> > > > +    receive buffers of equal length.
>> >> >> >> >> > > > +
>> >> >> >> >> > Problem is, I don't think linux will use this for skbs
>> >> >> >> >> > since it breaks buffer accounting. This is because it is
>> >> >> >> >> > important to make skbs as small as you can. So even if
>> >> >> >> >> > you see "device would
>> >> >benefit"
>> >> >> >> >> > there is no way to balance this with the benefit to linux.
>> >> >> >> >> > How do you know which benefit is bigger?
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> I guess the idea is e.g for Linux driver, it can refuse this feature.
>> >> >> >> >
>> >> >> >> >Okay. What uses it?
>> >> >> >> >
>> >> >> >> >>
>> >> >> >> >> >
>> >> >> >> >> > You also never explained how does device benefit. My
>> >> >> >> >> > guess is this allows device to calculate the # of
>> >> >> >> >> > descriptors to fetch that are needed for a packet. Right?
>> >> >> >> >> > Assuming this, I think a rough estimate should be enough.
>> >> >> >> >> > If device fetches too much it can discard extra, if it
>> >> >> >> >> > does not fetch enough it can fetch extra.
>> >> >> >> >> >
>> >> >> >> >> > Let us not forget that express packets are 256 bytes so
>> >> >> >> >> > up to
>> >> >> >> >> > 16 descriptors fit in a packet, there is no benefit most
>> >> >> >> >> > of the time in knowing whether e.g. 1 or 2 descriptors are
>needed.
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> > Let us not forget these are buffers, not descriptors.
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> I guess maybe the initial motivation is constant descriptor
>length.
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >That conflicts with requirement framing is up to driver.
>> >> >> >> >
>> >> >> >> >
>> >> >> >> I was using the wrong terminology. The feature is about
>> >> >> >> constant
>> >> >> >*descriptor* length. In other words, the value that driver puts
>> >> >> >in Packed Virtqueue "Element Length" field (or 'len' field in
>> >> >> >the
>> >'pvirtq_desc'
>> >> >struct).
>> >> >> >
>> >> >> >OK so this conflicts with "2.6.4 Message Framing" which
>> >> >> >requires that drivers can split buffers into as many descriptors as
>they like.
>> >> >> >Right?
>> >> >> >
>> >> >> >> Does it make more sense now, or you still see an issue with
>> >> >> >> Linux
>> >driver?
>> >> >> >
>> >> >> >I think there's an issue with the flexible framing requirements
>> >> >> >and there's an issue with Linux driver.
>> >> >> >
>> >> >> >> The motivation is to allow the device to calculate the exact
>> >> >> >> number of
>> >> >> >descriptors to consume, before fetching the descriptors. This
>> >> >> >is beneficial for devices for which overconsuming or
>> >> >> >underconsuming descriptors come at a high cost.
>> >> >> >
>> >> >> >So I guess we can agree getting the # of descriptors *exactly*
>> >> >> >right isn't all that important. Right? My question is how does
>> >> >> >the driver balance the device versus Linux needs?
>> >> >> >
>> >> >> >One idea is if we assume this is best effort, does not have to
>> >> >> >be exact, then how about just having the device assume
>> >> >> >descriptor sizes stay more or less constant and use that to
>> >> >> >estimate the amount? If it under/over estimates, things just go a bit
>slower.
>> >> >> >This way driver can adjust the sizes and device will react
>> >> >> >automatically, with time.
>> >> >> >
>> >> >> >
>> >> >>
>> >> >> I see that "2.6.4 Message Framing" allows a full flexibility of
>> >> >> descriptor lengths, and I presume it's applicable to Packet
>> >> >> Virtqueues as well, though defined under Split Virtqueues section.
>> >> >
>> >> >That's a good point. Probably makes sense to move it out to a
>> >> >common section, right?
>> >> >
>> >> >> The example
>> >> >> talks about transmit virtqueue, and it makes perfect sense.
>> >> >> However, wouldn't a typical driver place equal-sized *receive*
>> >> >> descriptors
>> >> >anyway? So if a device can benefit from it, the driver might as
>> >> >well negotiate this new feature.
>> >> >
>> >> >Having buffer size jump around wildly doesn't seem too useful, I agree.
>> >> >OTOH being able to adjust it gradually has been in the past
>> >> >demontrated to help performance measureably.
>> >> >
>> >> >> This could be especially relevant for devices, for which
>> >> >> adjusting the number
>> >> >of used descriptors is impractical after descriptors have already
>> >> >been
>> >fetched.
>> >> >> I agree that if this requirement conflicts with specific driver
>> >> >> needs, it will not
>> >> >be negotiated, and the device will either underperform in certain
>> >> >scenarios, or will not come up at all.
>> >> >
>> >> >Right so that makes it a challenge. Device says it prefers fixed
>> >> >buffers. Is that preference more or less important than ability to
>> >> >efficiently support workloads such as TCP small queues?
>> >> >Driver has no idea and I suspect neither does the device.
>> >> >So I don't see how will a Linux driver know that it should enable
>> >> >this, neither how will device know it's okay to just refuse features_ok.
>> >> >
>> >> >
>> >> >On the other hand, current drivers already have logic that tries
>> >> >to change buffer sizes in a smooth way.  So simply caching the
>> >> >last buffer size and assuming all the others will be exactly the
>> >> >same will go a long way towards limiting how much does device need to
>fetch.
>> >> >This does imply extra logic on the device to recover if things
>> >> >change and the first read did not fetch enough buffers, but then
>> >> >it would be required anyway if as you say above the failure is not
>catastrophic.
>> >> >
>> >> >
>> >> >The feature thus only seems useful for small, feature-restrained
>> >> >devices
>> >> >- which are prepared to sacrifice some performance to cut costs,
>> >> >and which can't recover at all. Is this what you are trying to do?
>> >> >
>> >>
>> >> The intention is to enable devices with this specific limitation to
>> >> offer virtio offload.  The feature can be redefined such that it
>> >> would only be offered by devices that are unable to handle
>> >> dynamically changing descriptor lengths. How does that sound?
>> >
>> >So it makes more sense when mergeable buffers are disabled (since
>> >then buffers are practically all same size).
>> >
>>
>> Actually, mergeable buffers are not a problem. They could be enabled,
>> as long as each descriptor is the same length. So implementations that
>> prefer optimizing memory utilization over jumbo frame performance can
>> negotiate VIRTIO_NET_F_MRG_RXBUF and allocate smaller buffers.
>
>So my point was, without VIRTIO_NET_F_MRG_RXBUF, all buffers are same
>length anyway. So if we are talking about cheap simple hardware, I guess it
>can just not have VIRTIO_NET_F_MRG_RXBUF and be done with it?
>If device does care about memory utilization then I think it needs to give
>driver the flexibility it needs/uses.
>No?
>

It's not a matter of device complexity, but a HW architecture, which could be complex, but have this specific limitation.
I see at least two reasons to support and negotiate VIRTIO_NET_F_MRG_RXBUF, while keeping equal-sized descriptor limitation:
 * GRO
 * With jumbo packets, if the throughput is capped by the port bandwidth, and not by the device/driver per-packet performance, it makes sense to optimize memory utilization by allocating small buffers, without sacrificing throughput performance.


>Again I can see how we might want to disallow crazy setups with e.g. 1 byte
>per buffer. That's just abuse, no guest does that anyway. So asking e.g. for a
>minimal buffer size sounds very reasonable.  But an option that disables
>functionality that a popular guest uses needs a lot of documentation to help
>device writers figure out whether they want that option or not, and I'd worry
>that even with documentation will be misunderstood even if we write it.
>When do you enable this?
>When you don't care about performance? When you don't care about Linux?
>

I understand that there are guest-side optimizations that require flexibility in descriptors length. I can propose the following simple logic:
Device - advertise "equal-sized descriptor" feature only if the device is absolutely unable to operate otherwise. The device will not set FEATURES_OK unless the driver negotiates this feature.
Driver - if device advertises "equal-sized descriptor" feature - if possible, give up on the flexible descriptors length optimizations. If not - give up on the device.


>> >
>> >With that in mind, I have an idea: scsi and block already have max sg field.
>> >How about we add a writeable max sg field, maybe even make it
>> >programmable per vq?
>> >
>> >Thus driver tells device what is it's max s/g value for rx. Worst
>> >case device fetches a bit more than it needs. Discarding extra
>> >shouldn't be expensive. This looks like it will help even smart devices. What
>do you think?
>> >
>> >This nicely avoids conflicting with the flexible framing requirement.
>> >
>>
>> My intention was to avoid any descriptor length variations, for
>> devices that unable fetching or discarding extra descriptors. (If in
>> the pipelined HW processing the decision on number of descriptors is
>> made in the early stage, and it cannot be undone in a later stage).
>
>Frankly discarding unused descriptors looks to me like something that
>shouldn't have a high cost in the hardware.  I can see how trying to predict
>descriptor length, and fetching extra if not enough was fetched can have a
>high cost, so to me extensions to remove the guesswork from the device
>have value.  However a lot of effort went into trying to reduce e.g. number of
>pci express packets needed to fetch descriptors.
>Each packet fetches 256 bytes anyway, does it not?  Not having an ability to
>use that information seems like an obvious waste, and that means ability to
>keep some fetched descriptors around even if they are not used for a given
>packet. Again just my $.02.
>

In packed virtqueue, discarding unused descriptors (and buffers associated with them) can indeed be easy, but reusing them for the next packet is complicated (or impossible).
I agree that descriptors are being (pre)fetched with the maximum efficiency (in terms of PCIe bandwidth), and cached in the device. But the decision to fetch is being made according to the number of left cached-in descriptors and the expected number of descriptors that will be used by the packet.
If the expected number of descriptors is larger than the actual one, the next fetch decision will be taken too early, and there will be no way to reuse excess cached descriptors, and they will have to be discarded.
Even if it's possible to skip descriptors in the packed virtqueue (is it?), it's surely inefficient.


>>
>> Defining max s/g sounds like an interesting feature by itself.
>
>But assuming we have max RX s/g, I guess hardware can set max s/g = 1?
>Then since with !VIRTIO_NET_F_MRG_RXBUF all buffers are forced to be
>same length.
>
>>
>> >
>> >> >> Could you specify what issue do you see with the Linux driver?
>> >> >
>> >> >See logic around struct ewma.
>> >> >
>> >> >The first relevant commit is I guess commit
>> >> >ab7db91705e95ed1bba1304388936fccfa58c992
>> >> >    virtio-net: auto-tune mergeable rx buffer size for improved
>> >> >performance
>> >> >
>> >> >this claims a gain of about 10% with large packets which isn't
>> >> >earth shattering but also not something we can ignore completely.
>> >> >And I suspect it can be bigger with smaller packets.
>> >> >
>> >> >> >> >>
>> >> >> >> >> >
>> >> >> >> >> > So device does not really know the exact # of descriptors:
>> >> >> >> >> > current drivers do 1 descriptor per buffer but nothing
>> >> >> >> >> > prevents more.
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> > Thoughts?
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> > > >    \item[VIRTIO_NET_F_RSC_EXT(61)] Device can
>> >> >> >> >> > > > process duplicated
>> >> >> >> >ACKs
>> >> >> >> >> > > >        and report number of coalesced segments and
>> >> >> >> >> > > > duplicated ACKs @@ -2854,8 +2858,8 @@
>> >> >\subsubsection{Legacy Interface:
>> >> >> >> >Feature bits}\label{sec:Device Types / Network
>> >> >> >> >> > > >    \subsection{Device configuration
>> >> >> >> >> > > > layout}\label{sec:Device Types /
>> >> >> >> >Network Device / Device configuration layout}
>> >> >> >> >> > > >    \label{sec:Device Types / Block Device / Feature
>> >> >> >> >> > > > bits / Device configuration layout} -Three
>> >> >> >> >> > > > driver-read-only configuration fields are currently defined.
>> >> >> >> >> > > > The \field{mac} address field -always exists (though
>> >> >> >> >> > > > is only valid if VIRTIO_NET_F_MAC is set), and
>> >> >> >> >> > > > +The driver-read-only \field{mac} address field
>> >> >> >> >> > > > +always exists (though is only valid if
>> >> >> >> >> > > > +VIRTIO_NET_F_MAC is set), and
>> >> >> >> >> > > >    \field{status} only exists if VIRTIO_NET_F_STATUS is set.
>> >Two
>> >> >> >> >> > > >    read-only bits (for the driver) are currently
>> >> >> >> >> > > > defined for the status
>> >> >> >> >field:
>> >> >> >> >> > > >    VIRTIO_NET_S_LINK_UP and
>VIRTIO_NET_S_ANNOUNCE.
>> >> >> >> >> > > > @@ -2875,12 +2879,17 @@ \subsection{Device
>> >> >> >> >> > > > configuration
>> >> >> >> >layout}\label{sec:Device Types / Network Device
>> >> >> >> >> > > >    VIRTIO_NET_F_MTU is set. This field specifies the
>> >> >> >> >> > > > maximum MTU for
>> >> >> >> >the driver to
>> >> >> >> >> > > >    use.
>> >> >> >> >> > > > +The device-read-only field \field{rx_buf_len} only
>> >> >> >> >> > > > +exists if
>> >> >> >> >> > >
>> >> >> >> >> > > Should be driver-read-only.
>> >> >> >> >> > >
>> >> >> >> >> > >
>> >> >> >> >> > > > +VIRTIO_NET_F_CONST_RXBUF_LEN is negotiated. This
>> >> >> >> >> > > > +field specifies the receive buffers length.
>> >> >> >> >> > > > +
>> >> >> >> >> > > >    \begin{lstlisting}
>> >> >> >> >> > > >    struct virtio_net_config {
>> >> >> >> >> > > >            u8 mac[6];
>> >> >> >> >> > > >            le16 status;
>> >> >> >> >> > > >            le16 max_virtqueue_pairs;
>> >> >> >> >> > > >            le16 mtu;
>> >> >> >> >> > > > +        le32 rx_buf_len;
>> >> >> >> >> > > >    };
>> >> >> >> >> > > >    \end{lstlisting}
>> >> >> >> >> > > > @@ -2933,6 +2942,13 @@ \subsection{Device
>> >> >> >> >> > > > configuration layout}\label{sec:Device Types / Network
>Device
>> >> >> >> >> > > >    A driver SHOULD negotiate the
>> >> >> >> >> > > > VIRTIO_NET_F_STANDBY feature if
>> >> >> >> >the device offers it.
>> >> >> >> >> > > > +A driver SHOULD accept the
>> >> >VIRTIO_NET_F_CONST_RXBUF_LEN
>> >> >> >> >feature if offered.
>> >> >> >> >> > > > +
>> >> >> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been
>> >> >> >negotiated,
>> >> >> >> >> > > > +the driver MUST set \field{rx_buf_len}.
>> >> >> >> >> > >
>> >> >> >> >> > > I think it's device that set the field?
>> >> >> >> >> > Makes more sense for the driver, but if you want this set e.g.
>> >> >> >> >> > before buffers are added, you must say so.
>> >> >> >> >> >
>> >> >> >> >> > > > +
>> >> >> >> >> > > > +A driver MUST NOT modify \field{rx_buf_len} once it
>> >> >> >> >> > > > +has been
>> >> >> >set.
>> >> >> >> >> > > > +
>> >> >> >> >> >
>> >> >> >> >> > This seems very unflexible. I can see how e.g. XDP would
>> >> >> >> >> > benefit from big buffers while skbs benefit from small
>buffers.
>> >> >> >> >> >
>> >> >> >> >> > This calls for ability to change this.
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> Yes, but it requires non trivial cleanups for the old
>> >> >> >> >> length and place them with new ones.
>> >> >> >> >>
>> >> >> >> >> Thanks
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >Let's see:
>> >> >> >> >1	- making buffer smaller: just update config space,
>> >> >> >> >	  then make new buffers smaller
>> >> >> >> >
>> >> >> >> >2	- making buffers bigger: add larger buffers,
>> >> >> >> >	once all small ones are consumed update config space
>> >> >> >> >
>> >> >> >> >2 is tricky I agree. Thoughts?
>> >> >> >> >
>> >> >> >> >
>> >> >> >> Agree. It's doable, provided that the driver will follow the
>> >> >> >> update
>> >> >> >procedure.
>> >> >> >> >
>> >> >> >> >>
>> >> >> >> >> >
>> >> >> >> >> > > >    \subsubsection{Legacy Interface: Device
>> >> >> >> >> > > > configuration
>> >> >> >> >layout}\label{sec:Device Types / Network Device / Device
>> >> >> >> >configuration layout / Legacy Interface: Device
>> >> >> >> >configuration layout}
>> >> >> >> >> > > >    \label{sec:Device Types / Block Device / Feature
>> >> >> >> >> > > > bits / Device
>> >> >> >> >configuration layout / Legacy Interface: Device
>> >> >> >> >configuration layout}
>> >> >> >> >> > > >    When using the legacy interface, transitional
>> >> >> >> >> > > > devices and drivers @@
>> >> >> >> >> > > > -3241,6 +3257,11 @@ \subsubsection{Setting Up
>> >> >> >> >> > > > Receive
>> >> >> >> >Buffers}\label{sec:Device Types / Network Devi
>> >> >> >> >> > > >    If VIRTIO_NET_F_MQ is negotiated, each of
>> >> >> >> >> > > > receiveq1\ldots
>> >> >> >> >receiveqN
>> >> >> >> >> > > >    that will be used SHOULD be populated with
>> >> >> >> >> > > > receive
>> >buffers.
>> >> >> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been
>> >> >> >negotiated,
>> >> >> >> >> > > > +the driver MUST initialize all receive virtqueue
>> >> >> >> >> > > > +descriptors \field{len} field with the value
>> >> >> >> >> > > > +configured in \field{rx_buf_len} device
>> >> >> >> >> > > > +configuration field, and allocate receive
>> >> >> >> >buffers accordingly.
>> >> >> >> >> > > > +
>> >> >> >> >> > > >    \devicenormative{\paragraph}{Setting Up Receive
>> >> >> >> >> > > > Buffers}{Device Types / Network Device / Device
>> >> >> >> >> > > > Operation / Setting Up Receive Buffers}
>> >> >> >> >> > > >    The device MUST set \field{num_buffers} to the
>> >> >> >> >> > > > number of descriptors used to @@ -3396,6 +3417,10 @@
>> >> >> >> >\subsubsection{Processing of Incoming
>> >> >> >> >Packets}\label{sec:Device Types / Network
>> >> >> >> >> > > >    checksum (in case of multiple encapsulated
>> >> >> >> >> > > > protocols, one
>> >> >level
>> >> >> >> >> > > >    of checksums is validated).
>> >> >> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN has been
>> >> >> >> >> > > > +negotiated,
>> >> >the
>> >> >> >> >device
>> >> >> >> >> > > > +MAY use \field{rx_buf_len} as a buffer length
>> >> >> >> >> > > > +(instead of reading it from virtqueue descriptor
>\field{len} field).
>> >> >> >> >> > >
>> >> >> >> >> > > Is this safe? What if driver submit a small buffer,
>> >> >> >> >> > > then device can read
>> >> >> >> >more than what is allowed?
>> >> >> >> >> > >
>> >> >> >> >> > > Thanks
>> >> >> >> >> > >
>> >> >> >> >> > >
>> >> >> >> >> > > > +
>> >> >> >> >> > > >    \drivernormative{\paragraph}{Processing of Incoming
>> >> >> >> >> > > >    Packets}{Device Types / Network Device / Device
>> >> >> >> >> > > > Operation
>> >/
>> >> >> >> >> > > >    Processing of Incoming Packets}
>
>
>This publicly archived list offers a means to provide input to the OASIS Virtual
>I/O Device (VIRTIO) TC.
>
>In order to verify user consent to the Feedback License terms and to minimize
>spam in the list archive, subscription is required before posting.
>
>Subscribe: virtio-comment-subscribe@lists.oasis-open.org
>Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
>List help: virtio-comment-help@lists.oasis-open.org
>List archive: https://urldefense.proofpoint.com/v2/url?u=https-
>3A__lists.oasis-2Dopen.org_archives_virtio-
>2Dcomment_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lq
>qsArgFRdcevq01tbLQAw4A_NO7xgI&m=gIIx9_eEGj-aDaM6Z-
>42yWtI9MnZcqZ2Gw7KCN7EgCg&s=-
>JICLquqUnNye7tilUS67AFv7opngsKEl5L75acB64U&e=
>Feedback License: https://urldefense.proofpoint.com/v2/url?u=https-
>3A__www.oasis-2Dopen.org_who_ipr_feedback-
>5Flicense.pdf&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqq
>sArgFRdcevq01tbLQAw4A_NO7xgI&m=gIIx9_eEGj-aDaM6Z-
>42yWtI9MnZcqZ2Gw7KCN7EgCg&s=4e2kWmSdPAtGMXBTHwfgNE_KOZdDUR
>Wsji73HWdVF3A&e=
>List Guidelines: https://urldefense.proofpoint.com/v2/url?u=https-
>3A__www.oasis-2Dopen.org_policies-2Dguidelines_mailing-
>2Dlists&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFR
>dcevq01tbLQAw4A_NO7xgI&m=gIIx9_eEGj-aDaM6Z-
>42yWtI9MnZcqZ2Gw7KCN7EgCg&s=i-dQn5G-
>auoFtCxN8Y2PN8UccM1ezgcrsT2A8T1H8wE&e=
>Committee: https://urldefense.proofpoint.com/v2/url?u=https-
>3A__www.oasis-
>2Dopen.org_committees_virtio_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&
>r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=gIIx9_eEGj-
>aDaM6Z-42yWtI9MnZcqZ2Gw7KCN7EgCg&s=kcv-jA-_-JC3v64-_r5iP-
>XQ9rhyaeOKvrHJNHwZrLc&e=
>Join OASIS: https://urldefense.proofpoint.com/v2/url?u=https-
>3A__www.oasis-
>2Dopen.org_join_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52o
>J3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=gIIx9_eEGj-aDaM6Z-
>42yWtI9MnZcqZ2Gw7KCN7EgCg&s=cKbLeI_5Fu9G7aybE5u51yISB0eRer6BvCxr
>wgd5HS4&e=


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature
  2019-11-26 17:27                       ` Vitaly Mireyno
@ 2019-11-27 13:51                         ` Michael S. Tsirkin
  2019-12-01 10:22                           ` Vitaly Mireyno
  0 siblings, 1 reply; 28+ messages in thread
From: Michael S. Tsirkin @ 2019-11-27 13:51 UTC (permalink / raw)
  To: Vitaly Mireyno; +Cc: Jason Wang, virtio-comment

On Tue, Nov 26, 2019 at 05:27:31PM +0000, Vitaly Mireyno wrote:
> 
> 
> >-----Original Message-----
> >From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-
> >open.org> On Behalf Of Michael S. Tsirkin
> >Sent: Sunday, 24 November, 2019 17:30
> >To: Vitaly Mireyno <vmireyno@marvell.com>
> >Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
> >open.org
> >Subject: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-
> >net: Add equal-sized receive buffers feature
> >
> >On Sun, Nov 24, 2019 at 03:02:05PM +0000, Vitaly Mireyno wrote:
> >>
> >>
> >> >-----Original Message-----
> >> >From: Michael S. Tsirkin <mst@redhat.com>
> >> >Sent: Wednesday, 20 November, 2019 15:23
> >> >To: Vitaly Mireyno <vmireyno@marvell.com>
> >> >Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
> >> >open.org
> >> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add
> >> >equal- sized receive buffers feature
> >> >
> >> >On Thu, Nov 14, 2019 at 03:49:59PM +0000, Vitaly Mireyno wrote:
> >> >>
> >> >>
> >> >> >-----Original Message-----
> >> >> >From: Michael S. Tsirkin <mst@redhat.com>
> >> >> >Sent: Monday, 11 November, 2019 17:05
> >> >> >To: Vitaly Mireyno <vmireyno@marvell.com>
> >> >> >Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
> >> >> >open.org
> >> >> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net:
> >> >> >Add
> >> >> >equal- sized receive buffers feature
> >> >> >
> >> >> >On Mon, Nov 11, 2019 at 01:58:51PM +0000, Vitaly Mireyno wrote:
> >> >> >>
> >> >> >>
> >> >> >> >-----Original Message-----
> >> >> >> >From: Michael S. Tsirkin <mst@redhat.com>
> >> >> >> >Sent: Sunday, 10 November, 2019 17:35
> >> >> >> >To: Vitaly Mireyno <vmireyno@marvell.com>
> >> >> >> >Cc: Jason Wang <jasowang@redhat.com>;
> >> >> >> >virtio-comment@lists.oasis- open.org
> >> >> >> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net:
> >> >> >> >Add
> >> >> >> >equal- sized receive buffers feature
> >> >> >> >
> >> >> >> >On Sun, Nov 10, 2019 at 02:13:14PM +0000, Vitaly Mireyno wrote:
> >> >> >> >>
> >> >> >> >> >-----Original Message-----
> >> >> >> >> >From: virtio-comment@lists.oasis-open.org
> >> >> >> >> ><virtio-comment@lists.oasis- open.org> On Behalf Of Michael S.
> >> >> >> >> >Tsirkin
> >> >> >> >> >Sent: Tuesday, 5 November, 2019 20:52
> >> >> >> >> >To: Jason Wang <jasowang@redhat.com>
> >> >> >> >> >Cc: Vitaly Mireyno <vmireyno@marvell.com>;
> >> >> >> >> >virtio-comment@lists.oasis- open.org
> >> >> >> >> >Subject: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net:
> >> >> >> >> >Add equal-sized receive buffers feature
> >> >> >> >> >
> >> >> >> >> >External Email
> >> >> >> >> >
> >> >> >> >> >------------------------------------------------------------
> >> >> >> >> >---
> >> >> >> >> >---
> >> >> >> >> >---
> >> >> >> >> >- On Fri, Nov 01, 2019 at 03:22:26PM +0800, Jason Wang wrote:
> >> >> >> >> >>
> >> >> >> >> >> On 2019/11/1 下午12:09, Michael S. Tsirkin wrote:
> >> >> >> >> >> > On Thu, Oct 31, 2019 at 10:46:55AM +0000, Vitaly Mireyno
> >wrote:
> >> >> >> >> >> > > The feature is limited to receive buffers only.
> >> >> >> >> >> > > A driver decides on receive buffer length. The only
> >> >> >> >> >> > > limitation is that this
> >> >> >> >> >length has to be the same for all receive virtqueue buffers.
> >> >> >> >> >> > > The driver configures receive buffer length to the
> >> >> >> >> >> > > device during device
> >> >> >> >> >initialization, and the device reads it and may use it for
> >> >> >> >> >optimal
> >> >> >operation.
> >> >> >> >> >> > > No changes for transmit buffers.
> >> >> >> >> >> > >
> >> >> >> >> >> > > -----Original Message-----
> >> >> >> >> >> > > From: virtio-comment@lists.oasis-open.org
> >> >> >> >> >> > > <virtio-comment@lists.oasis-open.org> On Behalf Of
> >> >> >> >> >> > > Jason Wang
> >> >> >> >> >> > > Sent: Thursday, 31 October, 2019 12:15
> >> >> >> >> >> > > To: Vitaly Mireyno <vmireyno@marvell.com>;
> >> >> >> >> >> > > virtio-comment@lists.oasis-open.org
> >> >> >> >> >> > > Cc: Michael S. Tsirkin <mst@redhat.com>
> >> >> >> >> >> > > Subject: [virtio-comment] Re: [PATCH] virtio-net: Add
> >> >> >> >> >> > > equal-sized receive buffers feature
> >> >> >> >> >> > >
> >> >> >> >> >> > >
> >> >> >> >> >> > > On 2019/10/31 下午5:23, Vitaly Mireyno wrote:
> >> >> >> >> >> > > > Some devices benefit from working with receive
> >> >> >> >> >> > > > buffers of a
> >> >> >> >> >predefined constant length.
> >> >> >> >> >> > > > Add a feature for drivers that allocate equal-sized
> >> >> >> >> >> > > > receive buffers, and
> >> >> >> >> >devices that benefit from predefined receive buffers length.
> >> >> >> >> >> > > >
> >> >> >> >> >> > > > Signed-off-by: Vitaly Mireyno <vmireyno@marvell.com>
> >> >> >> >> >> > > > ---
> >> >> >> >> >> > > >    content.tex | 29 +++++++++++++++++++++++++++--
> >> >> >> >> >> > > >    1 file changed, 27 insertions(+), 2 deletions(-)
> >> >> >> >> >> > >
> >> >> >> >> >> > > Is there any other networking device that has this feature?
> >> >> >> >> >> > >
> >> >> >> >> >> > >
> >> >> >> >> >> > > > diff --git a/content.tex b/content.tex index
> >> >> >> >> >> > > > b1ea9b9..c9e67c8
> >> >> >> >> >> > > > 100644
> >> >> >> >> >> > > > --- a/content.tex
> >> >> >> >> >> > > > +++ b/content.tex
> >> >> >> >> >> > > > @@ -2811,6 +2811,10 @@ \subsection{Feature
> >> >> >> >> >> > > > bits}\label{sec:Device
> >> >> >> >> >Types / Network Device / Feature bits
> >> >> >> >> >> > > >    \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC
> >> >> >> >> >> > > > address
> >> >> >> >> >through control
> >> >> >> >> >> > > >        channel.
> >> >> >> >> >> > > > +\item[VIRTIO_NET_F_CONST_RXBUF_LEN(58)] Driver
> >> >> >> >> >> > > > +allocates all
> >> >> >> >> >receive buffers
> >> >> >> >> >> > > > +    of the same constant length. Device benefits
> >> >> >> >> >> > > > + from working
> >> >> >with
> >> >> >> >> >> > > > +    receive buffers of equal length.
> >> >> >> >> >> > > > +
> >> >> >> >> >> > Problem is, I don't think linux will use this for skbs
> >> >> >> >> >> > since it breaks buffer accounting. This is because it is
> >> >> >> >> >> > important to make skbs as small as you can. So even if
> >> >> >> >> >> > you see "device would
> >> >> >benefit"
> >> >> >> >> >> > there is no way to balance this with the benefit to linux.
> >> >> >> >> >> > How do you know which benefit is bigger?
> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >> >> I guess the idea is e.g for Linux driver, it can refuse this feature.
> >> >> >> >> >
> >> >> >> >> >Okay. What uses it?
> >> >> >> >> >
> >> >> >> >> >>
> >> >> >> >> >> >
> >> >> >> >> >> > You also never explained how does device benefit. My
> >> >> >> >> >> > guess is this allows device to calculate the # of
> >> >> >> >> >> > descriptors to fetch that are needed for a packet. Right?
> >> >> >> >> >> > Assuming this, I think a rough estimate should be enough.
> >> >> >> >> >> > If device fetches too much it can discard extra, if it
> >> >> >> >> >> > does not fetch enough it can fetch extra.
> >> >> >> >> >> >
> >> >> >> >> >> > Let us not forget that express packets are 256 bytes so
> >> >> >> >> >> > up to
> >> >> >> >> >> > 16 descriptors fit in a packet, there is no benefit most
> >> >> >> >> >> > of the time in knowing whether e.g. 1 or 2 descriptors are
> >needed.
> >> >> >> >> >> >
> >> >> >> >> >> >
> >> >> >> >> >> > Let us not forget these are buffers, not descriptors.
> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >> >> I guess maybe the initial motivation is constant descriptor
> >length.
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> >That conflicts with requirement framing is up to driver.
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> I was using the wrong terminology. The feature is about
> >> >> >> >> constant
> >> >> >> >*descriptor* length. In other words, the value that driver puts
> >> >> >> >in Packed Virtqueue "Element Length" field (or 'len' field in
> >> >> >> >the
> >> >'pvirtq_desc'
> >> >> >struct).
> >> >> >> >
> >> >> >> >OK so this conflicts with "2.6.4 Message Framing" which
> >> >> >> >requires that drivers can split buffers into as many descriptors as
> >they like.
> >> >> >> >Right?
> >> >> >> >
> >> >> >> >> Does it make more sense now, or you still see an issue with
> >> >> >> >> Linux
> >> >driver?
> >> >> >> >
> >> >> >> >I think there's an issue with the flexible framing requirements
> >> >> >> >and there's an issue with Linux driver.
> >> >> >> >
> >> >> >> >> The motivation is to allow the device to calculate the exact
> >> >> >> >> number of
> >> >> >> >descriptors to consume, before fetching the descriptors. This
> >> >> >> >is beneficial for devices for which overconsuming or
> >> >> >> >underconsuming descriptors come at a high cost.
> >> >> >> >
> >> >> >> >So I guess we can agree getting the # of descriptors *exactly*
> >> >> >> >right isn't all that important. Right? My question is how does
> >> >> >> >the driver balance the device versus Linux needs?
> >> >> >> >
> >> >> >> >One idea is if we assume this is best effort, does not have to
> >> >> >> >be exact, then how about just having the device assume
> >> >> >> >descriptor sizes stay more or less constant and use that to
> >> >> >> >estimate the amount? If it under/over estimates, things just go a bit
> >slower.
> >> >> >> >This way driver can adjust the sizes and device will react
> >> >> >> >automatically, with time.
> >> >> >> >
> >> >> >> >
> >> >> >>
> >> >> >> I see that "2.6.4 Message Framing" allows a full flexibility of
> >> >> >> descriptor lengths, and I presume it's applicable to Packet
> >> >> >> Virtqueues as well, though defined under Split Virtqueues section.
> >> >> >
> >> >> >That's a good point. Probably makes sense to move it out to a
> >> >> >common section, right?
> >> >> >
> >> >> >> The example
> >> >> >> talks about transmit virtqueue, and it makes perfect sense.
> >> >> >> However, wouldn't a typical driver place equal-sized *receive*
> >> >> >> descriptors
> >> >> >anyway? So if a device can benefit from it, the driver might as
> >> >> >well negotiate this new feature.
> >> >> >
> >> >> >Having buffer size jump around wildly doesn't seem too useful, I agree.
> >> >> >OTOH being able to adjust it gradually has been in the past
> >> >> >demontrated to help performance measureably.
> >> >> >
> >> >> >> This could be especially relevant for devices, for which
> >> >> >> adjusting the number
> >> >> >of used descriptors is impractical after descriptors have already
> >> >> >been
> >> >fetched.
> >> >> >> I agree that if this requirement conflicts with specific driver
> >> >> >> needs, it will not
> >> >> >be negotiated, and the device will either underperform in certain
> >> >> >scenarios, or will not come up at all.
> >> >> >
> >> >> >Right so that makes it a challenge. Device says it prefers fixed
> >> >> >buffers. Is that preference more or less important than ability to
> >> >> >efficiently support workloads such as TCP small queues?
> >> >> >Driver has no idea and I suspect neither does the device.
> >> >> >So I don't see how will a Linux driver know that it should enable
> >> >> >this, neither how will device know it's okay to just refuse features_ok.
> >> >> >
> >> >> >
> >> >> >On the other hand, current drivers already have logic that tries
> >> >> >to change buffer sizes in a smooth way.  So simply caching the
> >> >> >last buffer size and assuming all the others will be exactly the
> >> >> >same will go a long way towards limiting how much does device need to
> >fetch.
> >> >> >This does imply extra logic on the device to recover if things
> >> >> >change and the first read did not fetch enough buffers, but then
> >> >> >it would be required anyway if as you say above the failure is not
> >catastrophic.
> >> >> >
> >> >> >
> >> >> >The feature thus only seems useful for small, feature-restrained
> >> >> >devices
> >> >> >- which are prepared to sacrifice some performance to cut costs,
> >> >> >and which can't recover at all. Is this what you are trying to do?
> >> >> >
> >> >>
> >> >> The intention is to enable devices with this specific limitation to
> >> >> offer virtio offload.  The feature can be redefined such that it
> >> >> would only be offered by devices that are unable to handle
> >> >> dynamically changing descriptor lengths. How does that sound?
> >> >
> >> >So it makes more sense when mergeable buffers are disabled (since
> >> >then buffers are practically all same size).
> >> >
> >>
> >> Actually, mergeable buffers are not a problem. They could be enabled,
> >> as long as each descriptor is the same length. So implementations that
> >> prefer optimizing memory utilization over jumbo frame performance can
> >> negotiate VIRTIO_NET_F_MRG_RXBUF and allocate smaller buffers.
> >
> >So my point was, without VIRTIO_NET_F_MRG_RXBUF, all buffers are same
> >length anyway. So if we are talking about cheap simple hardware, I guess it
> >can just not have VIRTIO_NET_F_MRG_RXBUF and be done with it?
> >If device does care about memory utilization then I think it needs to give
> >driver the flexibility it needs/uses.
> >No?
> >
> 
> It's not a matter of device complexity, but a HW architecture, which could be complex, but have this specific limitation.

So - don't do it then?

> I see at least two reasons to support and negotiate VIRTIO_NET_F_MRG_RXBUF, while keeping equal-sized descriptor limitation:
>  * GRO

You mean LRO?

>  * With jumbo packets, if the throughput is capped by the port bandwidth, and not by the device/driver per-packet performance, it makes sense to optimize memory utilization by allocating small buffers, without sacrificing throughput performance.

So we are back to square one, if driver cares about memory
utilization with 1K packets vs 9K buffers, why not with
100byte packets vs 1K buffers? Looks like exactly the same
tradeoff.


> 
> >Again I can see how we might want to disallow crazy setups with e.g. 1 byte
> >per buffer. That's just abuse, no guest does that anyway. So asking e.g. for a
> >minimal buffer size sounds very reasonable.  But an option that disables
> >functionality that a popular guest uses needs a lot of documentation to help
> >device writers figure out whether they want that option or not, and I'd worry
> >that even with documentation will be misunderstood even if we write it.
> >When do you enable this?
> >When you don't care about performance? When you don't care about Linux?
> >
> 
> I understand that there are guest-side optimizations that require flexibility in descriptors length. I can propose the following simple logic:
> Device - advertise "equal-sized descriptor" feature only if the device is absolutely unable to operate otherwise. The device will not set FEATURES_OK unless the driver negotiates this feature.
> Driver - if device advertises "equal-sized descriptor" feature - if possible, give up on the flexible descriptors length optimizations. If not - give up on the device.

Yes, that's possible. Looks like a rather limited feature, and
i'd rather we focused on something more widely applicable, but
with enough disclamers that devices SHOULD NOT set this bit we
can maybe do that.


> 
> >> >
> >> >With that in mind, I have an idea: scsi and block already have max sg field.
> >> >How about we add a writeable max sg field, maybe even make it
> >> >programmable per vq?
> >> >
> >> >Thus driver tells device what is it's max s/g value for rx. Worst
> >> >case device fetches a bit more than it needs. Discarding extra
> >> >shouldn't be expensive. This looks like it will help even smart devices. What
> >do you think?
> >> >
> >> >This nicely avoids conflicting with the flexible framing requirement.
> >> >
> >>
> >> My intention was to avoid any descriptor length variations, for
> >> devices that unable fetching or discarding extra descriptors. (If in
> >> the pipelined HW processing the decision on number of descriptors is
> >> made in the early stage, and it cannot be undone in a later stage).
> >
> >Frankly discarding unused descriptors looks to me like something that
> >shouldn't have a high cost in the hardware.  I can see how trying to predict
> >descriptor length, and fetching extra if not enough was fetched can have a
> >high cost, so to me extensions to remove the guesswork from the device
> >have value.  However a lot of effort went into trying to reduce e.g. number of
> >pci express packets needed to fetch descriptors.
> >Each packet fetches 256 bytes anyway, does it not?  Not having an ability to
> >use that information seems like an obvious waste, and that means ability to
> >keep some fetched descriptors around even if they are not used for a given
> >packet. Again just my $.02.
> >
> 
> In packed virtqueue, discarding unused descriptors (and buffers associated with them) can indeed be easy, but reusing them for the next packet is complicated (or impossible).
> I agree that descriptors are being (pre)fetched with the maximum efficiency (in terms of PCIe bandwidth), and cached in the device. But the decision to fetch is being made according to the number of left cached-in descriptors and the expected number of descriptors that will be used by the packet.
> If the expected number of descriptors is larger than the actual one, the next fetch decision will be taken too early, and there will be no way to reuse excess cached descriptors, and they will have to be discarded.

> Even if it's possible to skip descriptors in the packed virtqueue (is it?), it's surely inefficient.

OK I think I understand what you are doing. Device is getting
buffers 1,2,3 for packet 1, it is meanwhile receiving packet
2 and decides to get buffers 4,5 for packet 2.
At this point it finally gets buffers 1-3 and figures out
that buffers 3 was not needed, but possibly it already
started writing packet 2 into buffers 4.
What to do about buffers 3 now?

Is above a good example?

If yes then it looks like you are unaware that Descriptors can be used
out of order, with split or packed ring, with no issues.  Looks
like exactly what you need to address this issue?
So device will simply proceed with marking buffers 1,2,4,5
as used, and store buffers 3 in some kind of internal
memory and use it for the next packet.

This is exactly the kind of thing out of order was designed for.

Does this answer the question?

> 
> >>
> >> Defining max s/g sounds like an interesting feature by itself.
> >
> >But assuming we have max RX s/g, I guess hardware can set max s/g = 1?
> >Then since with !VIRTIO_NET_F_MRG_RXBUF all buffers are forced to be
> >same length.
> >
> >>
> >> >
> >> >> >> Could you specify what issue do you see with the Linux driver?
> >> >> >
> >> >> >See logic around struct ewma.
> >> >> >
> >> >> >The first relevant commit is I guess commit
> >> >> >ab7db91705e95ed1bba1304388936fccfa58c992
> >> >> >    virtio-net: auto-tune mergeable rx buffer size for improved
> >> >> >performance
> >> >> >
> >> >> >this claims a gain of about 10% with large packets which isn't
> >> >> >earth shattering but also not something we can ignore completely.
> >> >> >And I suspect it can be bigger with smaller packets.
> >> >> >
> >> >> >> >> >>
> >> >> >> >> >> >
> >> >> >> >> >> > So device does not really know the exact # of descriptors:
> >> >> >> >> >> > current drivers do 1 descriptor per buffer but nothing
> >> >> >> >> >> > prevents more.
> >> >> >> >> >> >
> >> >> >> >> >> >
> >> >> >> >> >> > Thoughts?
> >> >> >> >> >> >
> >> >> >> >> >> >
> >> >> >> >> >> > > >    \item[VIRTIO_NET_F_RSC_EXT(61)] Device can
> >> >> >> >> >> > > > process duplicated
> >> >> >> >> >ACKs
> >> >> >> >> >> > > >        and report number of coalesced segments and
> >> >> >> >> >> > > > duplicated ACKs @@ -2854,8 +2858,8 @@
> >> >> >\subsubsection{Legacy Interface:
> >> >> >> >> >Feature bits}\label{sec:Device Types / Network
> >> >> >> >> >> > > >    \subsection{Device configuration
> >> >> >> >> >> > > > layout}\label{sec:Device Types /
> >> >> >> >> >Network Device / Device configuration layout}
> >> >> >> >> >> > > >    \label{sec:Device Types / Block Device / Feature
> >> >> >> >> >> > > > bits / Device configuration layout} -Three
> >> >> >> >> >> > > > driver-read-only configuration fields are currently defined.
> >> >> >> >> >> > > > The \field{mac} address field -always exists (though
> >> >> >> >> >> > > > is only valid if VIRTIO_NET_F_MAC is set), and
> >> >> >> >> >> > > > +The driver-read-only \field{mac} address field
> >> >> >> >> >> > > > +always exists (though is only valid if
> >> >> >> >> >> > > > +VIRTIO_NET_F_MAC is set), and
> >> >> >> >> >> > > >    \field{status} only exists if VIRTIO_NET_F_STATUS is set.
> >> >Two
> >> >> >> >> >> > > >    read-only bits (for the driver) are currently
> >> >> >> >> >> > > > defined for the status
> >> >> >> >> >field:
> >> >> >> >> >> > > >    VIRTIO_NET_S_LINK_UP and
> >VIRTIO_NET_S_ANNOUNCE.
> >> >> >> >> >> > > > @@ -2875,12 +2879,17 @@ \subsection{Device
> >> >> >> >> >> > > > configuration
> >> >> >> >> >layout}\label{sec:Device Types / Network Device
> >> >> >> >> >> > > >    VIRTIO_NET_F_MTU is set. This field specifies the
> >> >> >> >> >> > > > maximum MTU for
> >> >> >> >> >the driver to
> >> >> >> >> >> > > >    use.
> >> >> >> >> >> > > > +The device-read-only field \field{rx_buf_len} only
> >> >> >> >> >> > > > +exists if
> >> >> >> >> >> > >
> >> >> >> >> >> > > Should be driver-read-only.
> >> >> >> >> >> > >
> >> >> >> >> >> > >
> >> >> >> >> >> > > > +VIRTIO_NET_F_CONST_RXBUF_LEN is negotiated. This
> >> >> >> >> >> > > > +field specifies the receive buffers length.
> >> >> >> >> >> > > > +
> >> >> >> >> >> > > >    \begin{lstlisting}
> >> >> >> >> >> > > >    struct virtio_net_config {
> >> >> >> >> >> > > >            u8 mac[6];
> >> >> >> >> >> > > >            le16 status;
> >> >> >> >> >> > > >            le16 max_virtqueue_pairs;
> >> >> >> >> >> > > >            le16 mtu;
> >> >> >> >> >> > > > +        le32 rx_buf_len;
> >> >> >> >> >> > > >    };
> >> >> >> >> >> > > >    \end{lstlisting}
> >> >> >> >> >> > > > @@ -2933,6 +2942,13 @@ \subsection{Device
> >> >> >> >> >> > > > configuration layout}\label{sec:Device Types / Network
> >Device
> >> >> >> >> >> > > >    A driver SHOULD negotiate the
> >> >> >> >> >> > > > VIRTIO_NET_F_STANDBY feature if
> >> >> >> >> >the device offers it.
> >> >> >> >> >> > > > +A driver SHOULD accept the
> >> >> >VIRTIO_NET_F_CONST_RXBUF_LEN
> >> >> >> >> >feature if offered.
> >> >> >> >> >> > > > +
> >> >> >> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been
> >> >> >> >negotiated,
> >> >> >> >> >> > > > +the driver MUST set \field{rx_buf_len}.
> >> >> >> >> >> > >
> >> >> >> >> >> > > I think it's device that set the field?
> >> >> >> >> >> > Makes more sense for the driver, but if you want this set e.g.
> >> >> >> >> >> > before buffers are added, you must say so.
> >> >> >> >> >> >
> >> >> >> >> >> > > > +
> >> >> >> >> >> > > > +A driver MUST NOT modify \field{rx_buf_len} once it
> >> >> >> >> >> > > > +has been
> >> >> >> >set.
> >> >> >> >> >> > > > +
> >> >> >> >> >> >
> >> >> >> >> >> > This seems very unflexible. I can see how e.g. XDP would
> >> >> >> >> >> > benefit from big buffers while skbs benefit from small
> >buffers.
> >> >> >> >> >> >
> >> >> >> >> >> > This calls for ability to change this.
> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >> >> Yes, but it requires non trivial cleanups for the old
> >> >> >> >> >> length and place them with new ones.
> >> >> >> >> >>
> >> >> >> >> >> Thanks
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> >Let's see:
> >> >> >> >> >1	- making buffer smaller: just update config space,
> >> >> >> >> >	  then make new buffers smaller
> >> >> >> >> >
> >> >> >> >> >2	- making buffers bigger: add larger buffers,
> >> >> >> >> >	once all small ones are consumed update config space
> >> >> >> >> >
> >> >> >> >> >2 is tricky I agree. Thoughts?
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> Agree. It's doable, provided that the driver will follow the
> >> >> >> >> update
> >> >> >> >procedure.
> >> >> >> >> >
> >> >> >> >> >>
> >> >> >> >> >> >
> >> >> >> >> >> > > >    \subsubsection{Legacy Interface: Device
> >> >> >> >> >> > > > configuration
> >> >> >> >> >layout}\label{sec:Device Types / Network Device / Device
> >> >> >> >> >configuration layout / Legacy Interface: Device
> >> >> >> >> >configuration layout}
> >> >> >> >> >> > > >    \label{sec:Device Types / Block Device / Feature
> >> >> >> >> >> > > > bits / Device
> >> >> >> >> >configuration layout / Legacy Interface: Device
> >> >> >> >> >configuration layout}
> >> >> >> >> >> > > >    When using the legacy interface, transitional
> >> >> >> >> >> > > > devices and drivers @@
> >> >> >> >> >> > > > -3241,6 +3257,11 @@ \subsubsection{Setting Up
> >> >> >> >> >> > > > Receive
> >> >> >> >> >Buffers}\label{sec:Device Types / Network Devi
> >> >> >> >> >> > > >    If VIRTIO_NET_F_MQ is negotiated, each of
> >> >> >> >> >> > > > receiveq1\ldots
> >> >> >> >> >receiveqN
> >> >> >> >> >> > > >    that will be used SHOULD be populated with
> >> >> >> >> >> > > > receive
> >> >buffers.
> >> >> >> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has been
> >> >> >> >negotiated,
> >> >> >> >> >> > > > +the driver MUST initialize all receive virtqueue
> >> >> >> >> >> > > > +descriptors \field{len} field with the value
> >> >> >> >> >> > > > +configured in \field{rx_buf_len} device
> >> >> >> >> >> > > > +configuration field, and allocate receive
> >> >> >> >> >buffers accordingly.
> >> >> >> >> >> > > > +
> >> >> >> >> >> > > >    \devicenormative{\paragraph}{Setting Up Receive
> >> >> >> >> >> > > > Buffers}{Device Types / Network Device / Device
> >> >> >> >> >> > > > Operation / Setting Up Receive Buffers}
> >> >> >> >> >> > > >    The device MUST set \field{num_buffers} to the
> >> >> >> >> >> > > > number of descriptors used to @@ -3396,6 +3417,10 @@
> >> >> >> >> >\subsubsection{Processing of Incoming
> >> >> >> >> >Packets}\label{sec:Device Types / Network
> >> >> >> >> >> > > >    checksum (in case of multiple encapsulated
> >> >> >> >> >> > > > protocols, one
> >> >> >level
> >> >> >> >> >> > > >    of checksums is validated).
> >> >> >> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN has been
> >> >> >> >> >> > > > +negotiated,
> >> >> >the
> >> >> >> >> >device
> >> >> >> >> >> > > > +MAY use \field{rx_buf_len} as a buffer length
> >> >> >> >> >> > > > +(instead of reading it from virtqueue descriptor
> >\field{len} field).
> >> >> >> >> >> > >
> >> >> >> >> >> > > Is this safe? What if driver submit a small buffer,
> >> >> >> >> >> > > then device can read
> >> >> >> >> >more than what is allowed?
> >> >> >> >> >> > >
> >> >> >> >> >> > > Thanks
> >> >> >> >> >> > >
> >> >> >> >> >> > >
> >> >> >> >> >> > > > +
> >> >> >> >> >> > > >    \drivernormative{\paragraph}{Processing of Incoming
> >> >> >> >> >> > > >    Packets}{Device Types / Network Device / Device
> >> >> >> >> >> > > > Operation
> >> >/
> >> >> >> >> >> > > >    Processing of Incoming Packets}
> >
> >
> >This publicly archived list offers a means to provide input to the OASIS Virtual
> >I/O Device (VIRTIO) TC.
> >
> >In order to verify user consent to the Feedback License terms and to minimize
> >spam in the list archive, subscription is required before posting.
> >
> >Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> >Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> >List help: virtio-comment-help@lists.oasis-open.org
> >List archive: https://urldefense.proofpoint.com/v2/url?u=https-
> >3A__lists.oasis-2Dopen.org_archives_virtio-
> >2Dcomment_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lq
> >qsArgFRdcevq01tbLQAw4A_NO7xgI&m=gIIx9_eEGj-aDaM6Z-
> >42yWtI9MnZcqZ2Gw7KCN7EgCg&s=-
> >JICLquqUnNye7tilUS67AFv7opngsKEl5L75acB64U&e=
> >Feedback License: https://urldefense.proofpoint.com/v2/url?u=https-
> >3A__www.oasis-2Dopen.org_who_ipr_feedback-
> >5Flicense.pdf&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqq
> >sArgFRdcevq01tbLQAw4A_NO7xgI&m=gIIx9_eEGj-aDaM6Z-
> >42yWtI9MnZcqZ2Gw7KCN7EgCg&s=4e2kWmSdPAtGMXBTHwfgNE_KOZdDUR
> >Wsji73HWdVF3A&e=
> >List Guidelines: https://urldefense.proofpoint.com/v2/url?u=https-
> >3A__www.oasis-2Dopen.org_policies-2Dguidelines_mailing-
> >2Dlists&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgFR
> >dcevq01tbLQAw4A_NO7xgI&m=gIIx9_eEGj-aDaM6Z-
> >42yWtI9MnZcqZ2Gw7KCN7EgCg&s=i-dQn5G-
> >auoFtCxN8Y2PN8UccM1ezgcrsT2A8T1H8wE&e=
> >Committee: https://urldefense.proofpoint.com/v2/url?u=https-
> >3A__www.oasis-
> >2Dopen.org_committees_virtio_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&
> >r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=gIIx9_eEGj-
> >aDaM6Z-42yWtI9MnZcqZ2Gw7KCN7EgCg&s=kcv-jA-_-JC3v64-_r5iP-
> >XQ9rhyaeOKvrHJNHwZrLc&e=
> >Join OASIS: https://urldefense.proofpoint.com/v2/url?u=https-
> >3A__www.oasis-
> >2Dopen.org_join_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52o
> >J3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=gIIx9_eEGj-aDaM6Z-
> >42yWtI9MnZcqZ2Gw7KCN7EgCg&s=cKbLeI_5Fu9G7aybE5u51yISB0eRer6BvCxr
> >wgd5HS4&e=
> 


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature
  2019-11-27 13:51                         ` Michael S. Tsirkin
@ 2019-12-01 10:22                           ` Vitaly Mireyno
  2019-12-01 13:07                             ` Michael S. Tsirkin
  0 siblings, 1 reply; 28+ messages in thread
From: Vitaly Mireyno @ 2019-12-01 10:22 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Jason Wang, virtio-comment



>-----Original Message-----
>From: Michael S. Tsirkin <mst@redhat.com>
>Sent: Wednesday, 27 November, 2019 15:51
>To: Vitaly Mireyno <vmireyno@marvell.com>
>Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
>open.org
>Subject: Re: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH]
>virtio-net: Add equal-sized receive buffers feature
>
>On Tue, Nov 26, 2019 at 05:27:31PM +0000, Vitaly Mireyno wrote:
>>
>>
>> >-----Original Message-----
>> >From: virtio-comment@lists.oasis-open.org
>> ><virtio-comment@lists.oasis- open.org> On Behalf Of Michael S.
>> >Tsirkin
>> >Sent: Sunday, 24 November, 2019 17:30
>> >To: Vitaly Mireyno <vmireyno@marvell.com>
>> >Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
>> >open.org
>> >Subject: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH]
>> >virtio-
>> >net: Add equal-sized receive buffers feature
>> >
>> >On Sun, Nov 24, 2019 at 03:02:05PM +0000, Vitaly Mireyno wrote:
>> >>
>> >>
>> >> >-----Original Message-----
>> >> >From: Michael S. Tsirkin <mst@redhat.com>
>> >> >Sent: Wednesday, 20 November, 2019 15:23
>> >> >To: Vitaly Mireyno <vmireyno@marvell.com>
>> >> >Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
>> >> >open.org
>> >> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net:
>> >> >Add
>> >> >equal- sized receive buffers feature
>> >> >
>> >> >On Thu, Nov 14, 2019 at 03:49:59PM +0000, Vitaly Mireyno wrote:
>> >> >>
>> >> >>
>> >> >> >-----Original Message-----
>> >> >> >From: Michael S. Tsirkin <mst@redhat.com>
>> >> >> >Sent: Monday, 11 November, 2019 17:05
>> >> >> >To: Vitaly Mireyno <vmireyno@marvell.com>
>> >> >> >Cc: Jason Wang <jasowang@redhat.com>;
>> >> >> >virtio-comment@lists.oasis- open.org
>> >> >> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net:
>> >> >> >Add
>> >> >> >equal- sized receive buffers feature
>> >> >> >
>> >> >> >On Mon, Nov 11, 2019 at 01:58:51PM +0000, Vitaly Mireyno wrote:
>> >> >> >>
>> >> >> >>
>> >> >> >> >-----Original Message-----
>> >> >> >> >From: Michael S. Tsirkin <mst@redhat.com>
>> >> >> >> >Sent: Sunday, 10 November, 2019 17:35
>> >> >> >> >To: Vitaly Mireyno <vmireyno@marvell.com>
>> >> >> >> >Cc: Jason Wang <jasowang@redhat.com>;
>> >> >> >> >virtio-comment@lists.oasis- open.org
>> >> >> >> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net:
>> >> >> >> >Add
>> >> >> >> >equal- sized receive buffers feature
>> >> >> >> >
>> >> >> >> >On Sun, Nov 10, 2019 at 02:13:14PM +0000, Vitaly Mireyno wrote:
>> >> >> >> >>
>> >> >> >> >> >-----Original Message-----
>> >> >> >> >> >From: virtio-comment@lists.oasis-open.org
>> >> >> >> >> ><virtio-comment@lists.oasis- open.org> On Behalf Of Michael
>S.
>> >> >> >> >> >Tsirkin
>> >> >> >> >> >Sent: Tuesday, 5 November, 2019 20:52
>> >> >> >> >> >To: Jason Wang <jasowang@redhat.com>
>> >> >> >> >> >Cc: Vitaly Mireyno <vmireyno@marvell.com>;
>> >> >> >> >> >virtio-comment@lists.oasis- open.org
>> >> >> >> >> >Subject: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net:
>> >> >> >> >> >Add equal-sized receive buffers feature
>> >> >> >> >> >
>> >> >> >> >> >External Email
>> >> >> >> >> >
>> >> >> >> >> >---------------------------------------------------------
>> >> >> >> >> >---
>> >> >> >> >> >---
>> >> >> >> >> >---
>> >> >> >> >> >---
>> >> >> >> >> >- On Fri, Nov 01, 2019 at 03:22:26PM +0800, Jason Wang wrote:
>> >> >> >> >> >>
>> >> >> >> >> >> On 2019/11/1 下午12:09, Michael S. Tsirkin wrote:
>> >> >> >> >> >> > On Thu, Oct 31, 2019 at 10:46:55AM +0000, Vitaly
>> >> >> >> >> >> > Mireyno
>> >wrote:
>> >> >> >> >> >> > > The feature is limited to receive buffers only.
>> >> >> >> >> >> > > A driver decides on receive buffer length. The only
>> >> >> >> >> >> > > limitation is that this
>> >> >> >> >> >length has to be the same for all receive virtqueue buffers.
>> >> >> >> >> >> > > The driver configures receive buffer length to the
>> >> >> >> >> >> > > device during device
>> >> >> >> >> >initialization, and the device reads it and may use it
>> >> >> >> >> >for optimal
>> >> >> >operation.
>> >> >> >> >> >> > > No changes for transmit buffers.
>> >> >> >> >> >> > >
>> >> >> >> >> >> > > -----Original Message-----
>> >> >> >> >> >> > > From: virtio-comment@lists.oasis-open.org
>> >> >> >> >> >> > > <virtio-comment@lists.oasis-open.org> On Behalf Of
>> >> >> >> >> >> > > Jason Wang
>> >> >> >> >> >> > > Sent: Thursday, 31 October, 2019 12:15
>> >> >> >> >> >> > > To: Vitaly Mireyno <vmireyno@marvell.com>;
>> >> >> >> >> >> > > virtio-comment@lists.oasis-open.org
>> >> >> >> >> >> > > Cc: Michael S. Tsirkin <mst@redhat.com>
>> >> >> >> >> >> > > Subject: [virtio-comment] Re: [PATCH] virtio-net:
>> >> >> >> >> >> > > Add equal-sized receive buffers feature
>> >> >> >> >> >> > >
>> >> >> >> >> >> > >
>> >> >> >> >> >> > > On 2019/10/31 下午5:23, Vitaly Mireyno wrote:
>> >> >> >> >> >> > > > Some devices benefit from working with receive
>> >> >> >> >> >> > > > buffers of a
>> >> >> >> >> >predefined constant length.
>> >> >> >> >> >> > > > Add a feature for drivers that allocate
>> >> >> >> >> >> > > > equal-sized receive buffers, and
>> >> >> >> >> >devices that benefit from predefined receive buffers length.
>> >> >> >> >> >> > > >
>> >> >> >> >> >> > > > Signed-off-by: Vitaly Mireyno
>> >> >> >> >> >> > > > <vmireyno@marvell.com>
>> >> >> >> >> >> > > > ---
>> >> >> >> >> >> > > >    content.tex | 29 +++++++++++++++++++++++++++--
>> >> >> >> >> >> > > >    1 file changed, 27 insertions(+), 2
>> >> >> >> >> >> > > > deletions(-)
>> >> >> >> >> >> > >
>> >> >> >> >> >> > > Is there any other networking device that has this
>feature?
>> >> >> >> >> >> > >
>> >> >> >> >> >> > >
>> >> >> >> >> >> > > > diff --git a/content.tex b/content.tex index
>> >> >> >> >> >> > > > b1ea9b9..c9e67c8
>> >> >> >> >> >> > > > 100644
>> >> >> >> >> >> > > > --- a/content.tex
>> >> >> >> >> >> > > > +++ b/content.tex
>> >> >> >> >> >> > > > @@ -2811,6 +2811,10 @@ \subsection{Feature
>> >> >> >> >> >> > > > bits}\label{sec:Device
>> >> >> >> >> >Types / Network Device / Feature bits
>> >> >> >> >> >> > > >    \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set
>MAC
>> >> >> >> >> >> > > > address
>> >> >> >> >> >through control
>> >> >> >> >> >> > > >        channel.
>> >> >> >> >> >> > > > +\item[VIRTIO_NET_F_CONST_RXBUF_LEN(58)] Driver
>> >> >> >> >> >> > > > +allocates all
>> >> >> >> >> >receive buffers
>> >> >> >> >> >> > > > +    of the same constant length. Device benefits
>> >> >> >> >> >> > > > + from working
>> >> >> >with
>> >> >> >> >> >> > > > +    receive buffers of equal length.
>> >> >> >> >> >> > > > +
>> >> >> >> >> >> > Problem is, I don't think linux will use this for
>> >> >> >> >> >> > skbs since it breaks buffer accounting. This is
>> >> >> >> >> >> > because it is important to make skbs as small as you
>> >> >> >> >> >> > can. So even if you see "device would
>> >> >> >benefit"
>> >> >> >> >> >> > there is no way to balance this with the benefit to linux.
>> >> >> >> >> >> > How do you know which benefit is bigger?
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >> I guess the idea is e.g for Linux driver, it can refuse this
>feature.
>> >> >> >> >> >
>> >> >> >> >> >Okay. What uses it?
>> >> >> >> >> >
>> >> >> >> >> >>
>> >> >> >> >> >> >
>> >> >> >> >> >> > You also never explained how does device benefit. My
>> >> >> >> >> >> > guess is this allows device to calculate the # of
>> >> >> >> >> >> > descriptors to fetch that are needed for a packet. Right?
>> >> >> >> >> >> > Assuming this, I think a rough estimate should be enough.
>> >> >> >> >> >> > If device fetches too much it can discard extra, if
>> >> >> >> >> >> > it does not fetch enough it can fetch extra.
>> >> >> >> >> >> >
>> >> >> >> >> >> > Let us not forget that express packets are 256 bytes
>> >> >> >> >> >> > so up to
>> >> >> >> >> >> > 16 descriptors fit in a packet, there is no benefit
>> >> >> >> >> >> > most of the time in knowing whether e.g. 1 or 2
>> >> >> >> >> >> > descriptors are
>> >needed.
>> >> >> >> >> >> >
>> >> >> >> >> >> >
>> >> >> >> >> >> > Let us not forget these are buffers, not descriptors.
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >> I guess maybe the initial motivation is constant
>> >> >> >> >> >> descriptor
>> >length.
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> >That conflicts with requirement framing is up to driver.
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> I was using the wrong terminology. The feature is about
>> >> >> >> >> constant
>> >> >> >> >*descriptor* length. In other words, the value that driver
>> >> >> >> >puts in Packed Virtqueue "Element Length" field (or 'len'
>> >> >> >> >field in the
>> >> >'pvirtq_desc'
>> >> >> >struct).
>> >> >> >> >
>> >> >> >> >OK so this conflicts with "2.6.4 Message Framing" which
>> >> >> >> >requires that drivers can split buffers into as many
>> >> >> >> >descriptors as
>> >they like.
>> >> >> >> >Right?
>> >> >> >> >
>> >> >> >> >> Does it make more sense now, or you still see an issue
>> >> >> >> >> with Linux
>> >> >driver?
>> >> >> >> >
>> >> >> >> >I think there's an issue with the flexible framing
>> >> >> >> >requirements and there's an issue with Linux driver.
>> >> >> >> >
>> >> >> >> >> The motivation is to allow the device to calculate the
>> >> >> >> >> exact number of
>> >> >> >> >descriptors to consume, before fetching the descriptors.
>> >> >> >> >This is beneficial for devices for which overconsuming or
>> >> >> >> >underconsuming descriptors come at a high cost.
>> >> >> >> >
>> >> >> >> >So I guess we can agree getting the # of descriptors
>> >> >> >> >*exactly* right isn't all that important. Right? My question
>> >> >> >> >is how does the driver balance the device versus Linux needs?
>> >> >> >> >
>> >> >> >> >One idea is if we assume this is best effort, does not have
>> >> >> >> >to be exact, then how about just having the device assume
>> >> >> >> >descriptor sizes stay more or less constant and use that to
>> >> >> >> >estimate the amount? If it under/over estimates, things just
>> >> >> >> >go a bit
>> >slower.
>> >> >> >> >This way driver can adjust the sizes and device will react
>> >> >> >> >automatically, with time.
>> >> >> >> >
>> >> >> >> >
>> >> >> >>
>> >> >> >> I see that "2.6.4 Message Framing" allows a full flexibility
>> >> >> >> of descriptor lengths, and I presume it's applicable to
>> >> >> >> Packet Virtqueues as well, though defined under Split Virtqueues
>section.
>> >> >> >
>> >> >> >That's a good point. Probably makes sense to move it out to a
>> >> >> >common section, right?
>> >> >> >
>> >> >> >> The example
>> >> >> >> talks about transmit virtqueue, and it makes perfect sense.
>> >> >> >> However, wouldn't a typical driver place equal-sized
>> >> >> >> *receive* descriptors
>> >> >> >anyway? So if a device can benefit from it, the driver might as
>> >> >> >well negotiate this new feature.
>> >> >> >
>> >> >> >Having buffer size jump around wildly doesn't seem too useful, I
>agree.
>> >> >> >OTOH being able to adjust it gradually has been in the past
>> >> >> >demontrated to help performance measureably.
>> >> >> >
>> >> >> >> This could be especially relevant for devices, for which
>> >> >> >> adjusting the number
>> >> >> >of used descriptors is impractical after descriptors have
>> >> >> >already been
>> >> >fetched.
>> >> >> >> I agree that if this requirement conflicts with specific
>> >> >> >> driver needs, it will not
>> >> >> >be negotiated, and the device will either underperform in
>> >> >> >certain scenarios, or will not come up at all.
>> >> >> >
>> >> >> >Right so that makes it a challenge. Device says it prefers
>> >> >> >fixed buffers. Is that preference more or less important than
>> >> >> >ability to efficiently support workloads such as TCP small queues?
>> >> >> >Driver has no idea and I suspect neither does the device.
>> >> >> >So I don't see how will a Linux driver know that it should
>> >> >> >enable this, neither how will device know it's okay to just refuse
>features_ok.
>> >> >> >
>> >> >> >
>> >> >> >On the other hand, current drivers already have logic that
>> >> >> >tries to change buffer sizes in a smooth way.  So simply
>> >> >> >caching the last buffer size and assuming all the others will
>> >> >> >be exactly the same will go a long way towards limiting how
>> >> >> >much does device need to
>> >fetch.
>> >> >> >This does imply extra logic on the device to recover if things
>> >> >> >change and the first read did not fetch enough buffers, but
>> >> >> >then it would be required anyway if as you say above the
>> >> >> >failure is not
>> >catastrophic.
>> >> >> >
>> >> >> >
>> >> >> >The feature thus only seems useful for small,
>> >> >> >feature-restrained devices
>> >> >> >- which are prepared to sacrifice some performance to cut
>> >> >> >costs, and which can't recover at all. Is this what you are trying to do?
>> >> >> >
>> >> >>
>> >> >> The intention is to enable devices with this specific limitation
>> >> >> to offer virtio offload.  The feature can be redefined such that
>> >> >> it would only be offered by devices that are unable to handle
>> >> >> dynamically changing descriptor lengths. How does that sound?
>> >> >
>> >> >So it makes more sense when mergeable buffers are disabled (since
>> >> >then buffers are practically all same size).
>> >> >
>> >>
>> >> Actually, mergeable buffers are not a problem. They could be
>> >> enabled, as long as each descriptor is the same length. So
>> >> implementations that prefer optimizing memory utilization over
>> >> jumbo frame performance can negotiate VIRTIO_NET_F_MRG_RXBUF
>and allocate smaller buffers.
>> >
>> >So my point was, without VIRTIO_NET_F_MRG_RXBUF, all buffers are
>same
>> >length anyway. So if we are talking about cheap simple hardware, I
>> >guess it can just not have VIRTIO_NET_F_MRG_RXBUF and be done with
>it?
>> >If device does care about memory utilization then I think it needs to
>> >give driver the flexibility it needs/uses.
>> >No?
>> >
>>
>> It's not a matter of device complexity, but a HW architecture, which could be
>complex, but have this specific limitation.
>
>So - don't do it then?
>
>> I see at least two reasons to support and negotiate
>VIRTIO_NET_F_MRG_RXBUF, while keeping equal-sized descriptor limitation:
>>  * GRO
>
>You mean LRO?
>
>>  * With jumbo packets, if the throughput is capped by the port bandwidth,
>and not by the device/driver per-packet performance, it makes sense to
>optimize memory utilization by allocating small buffers, without sacrificing
>throughput performance.
>
>So we are back to square one, if driver cares about memory utilization with 1K
>packets vs 9K buffers, why not with 100byte packets vs 1K buffers? Looks like
>exactly the same tradeoff.
>
>
>>
>> >Again I can see how we might want to disallow crazy setups with e.g.
>> >1 byte per buffer. That's just abuse, no guest does that anyway. So
>> >asking e.g. for a minimal buffer size sounds very reasonable.  But an
>> >option that disables functionality that a popular guest uses needs a
>> >lot of documentation to help device writers figure out whether they
>> >want that option or not, and I'd worry that even with documentation will
>be misunderstood even if we write it.
>> >When do you enable this?
>> >When you don't care about performance? When you don't care about
>Linux?
>> >
>>
>> I understand that there are guest-side optimizations that require flexibility
>in descriptors length. I can propose the following simple logic:
>> Device - advertise "equal-sized descriptor" feature only if the device is
>absolutely unable to operate otherwise. The device will not set FEATURES_OK
>unless the driver negotiates this feature.
>> Driver - if device advertises "equal-sized descriptor" feature - if possible,
>give up on the flexible descriptors length optimizations. If not - give up on the
>device.
>
>Yes, that's possible. Looks like a rather limited feature, and i'd rather we
>focused on something more widely applicable, but with enough disclamers
>that devices SHOULD NOT set this bit we can maybe do that.
>

I agree that this feature is more of a limitation declaration, rather than an enhancement, but let me emphasize that its only purpose is to make more *existing* HW devices be virtio compatible. 
This feature will allow such HW devices to offer virtio offload with VIRTIO_NET_F_MRG_RXBUF capability for GRO/LRO offload.
New device designs should definitely avoid employing this feature.

>
>>
>> >> >
>> >> >With that in mind, I have an idea: scsi and block already have max sg
>field.
>> >> >How about we add a writeable max sg field, maybe even make it
>> >> >programmable per vq?
>> >> >
>> >> >Thus driver tells device what is it's max s/g value for rx. Worst
>> >> >case device fetches a bit more than it needs. Discarding extra
>> >> >shouldn't be expensive. This looks like it will help even smart
>> >> >devices. What
>> >do you think?
>> >> >
>> >> >This nicely avoids conflicting with the flexible framing requirement.
>> >> >
>> >>
>> >> My intention was to avoid any descriptor length variations, for
>> >> devices that unable fetching or discarding extra descriptors. (If
>> >> in the pipelined HW processing the decision on number of
>> >> descriptors is made in the early stage, and it cannot be undone in a later
>stage).
>> >
>> >Frankly discarding unused descriptors looks to me like something that
>> >shouldn't have a high cost in the hardware.  I can see how trying to
>> >predict descriptor length, and fetching extra if not enough was
>> >fetched can have a high cost, so to me extensions to remove the
>> >guesswork from the device have value.  However a lot of effort went
>> >into trying to reduce e.g. number of pci express packets needed to fetch
>descriptors.
>> >Each packet fetches 256 bytes anyway, does it not?  Not having an
>> >ability to use that information seems like an obvious waste, and that
>> >means ability to keep some fetched descriptors around even if they
>> >are not used for a given packet. Again just my $.02.
>> >
>>
>> In packed virtqueue, discarding unused descriptors (and buffers associated
>with them) can indeed be easy, but reusing them for the next packet is
>complicated (or impossible).
>> I agree that descriptors are being (pre)fetched with the maximum efficiency
>(in terms of PCIe bandwidth), and cached in the device. But the decision to
>fetch is being made according to the number of left cached-in descriptors and
>the expected number of descriptors that will be used by the packet.
>> If the expected number of descriptors is larger than the actual one, the next
>fetch decision will be taken too early, and there will be no way to reuse excess
>cached descriptors, and they will have to be discarded.
>
>> Even if it's possible to skip descriptors in the packed virtqueue (is it?), it's
>surely inefficient.
>
>OK I think I understand what you are doing. Device is getting buffers 1,2,3 for
>packet 1, it is meanwhile receiving packet
>2 and decides to get buffers 4,5 for packet 2.
>At this point it finally gets buffers 1-3 and figures out that buffers 3 was not
>needed, but possibly it already started writing packet 2 into buffers 4.
>What to do about buffers 3 now?
>
>Is above a good example?
>
>If yes then it looks like you are unaware that Descriptors can be used out of
>order, with split or packed ring, with no issues.  Looks like exactly what you
>need to address this issue?
>So device will simply proceed with marking buffers 1,2,4,5 as used, and store
>buffers 3 in some kind of internal memory and use it for the next packet.
>
>This is exactly the kind of thing out of order was designed for.
>
>Does this answer the question?
>

The example is good, and the suggested solution is clear. However if we're talking about HW device that is designed to process packets/buffers in order, this solution could not be applicable.

>>
>> >>
>> >> Defining max s/g sounds like an interesting feature by itself.
>> >
>> >But assuming we have max RX s/g, I guess hardware can set max s/g = 1?
>> >Then since with !VIRTIO_NET_F_MRG_RXBUF all buffers are forced to be
>> >same length.
>> >
>> >>
>> >> >
>> >> >> >> Could you specify what issue do you see with the Linux driver?
>> >> >> >
>> >> >> >See logic around struct ewma.
>> >> >> >
>> >> >> >The first relevant commit is I guess commit
>> >> >> >ab7db91705e95ed1bba1304388936fccfa58c992
>> >> >> >    virtio-net: auto-tune mergeable rx buffer size for improved
>> >> >> >performance
>> >> >> >
>> >> >> >this claims a gain of about 10% with large packets which isn't
>> >> >> >earth shattering but also not something we can ignore completely.
>> >> >> >And I suspect it can be bigger with smaller packets.
>> >> >> >
>> >> >> >> >> >>
>> >> >> >> >> >> >
>> >> >> >> >> >> > So device does not really know the exact # of descriptors:
>> >> >> >> >> >> > current drivers do 1 descriptor per buffer but
>> >> >> >> >> >> > nothing prevents more.
>> >> >> >> >> >> >
>> >> >> >> >> >> >
>> >> >> >> >> >> > Thoughts?
>> >> >> >> >> >> >
>> >> >> >> >> >> >
>> >> >> >> >> >> > > >    \item[VIRTIO_NET_F_RSC_EXT(61)] Device can
>> >> >> >> >> >> > > > process duplicated
>> >> >> >> >> >ACKs
>> >> >> >> >> >> > > >        and report number of coalesced segments
>> >> >> >> >> >> > > > and duplicated ACKs @@ -2854,8 +2858,8 @@
>> >> >> >\subsubsection{Legacy Interface:
>> >> >> >> >> >Feature bits}\label{sec:Device Types / Network
>> >> >> >> >> >> > > >    \subsection{Device configuration
>> >> >> >> >> >> > > > layout}\label{sec:Device Types /
>> >> >> >> >> >Network Device / Device configuration layout}
>> >> >> >> >> >> > > >    \label{sec:Device Types / Block Device /
>> >> >> >> >> >> > > > Feature bits / Device configuration layout}
>> >> >> >> >> >> > > > -Three driver-read-only configuration fields are
>currently defined.
>> >> >> >> >> >> > > > The \field{mac} address field -always exists
>> >> >> >> >> >> > > > (though is only valid if VIRTIO_NET_F_MAC is
>> >> >> >> >> >> > > > set), and
>> >> >> >> >> >> > > > +The driver-read-only \field{mac} address field
>> >> >> >> >> >> > > > +always exists (though is only valid if
>> >> >> >> >> >> > > > +VIRTIO_NET_F_MAC is set), and
>> >> >> >> >> >> > > >    \field{status} only exists if VIRTIO_NET_F_STATUS is
>set.
>> >> >Two
>> >> >> >> >> >> > > >    read-only bits (for the driver) are currently
>> >> >> >> >> >> > > > defined for the status
>> >> >> >> >> >field:
>> >> >> >> >> >> > > >    VIRTIO_NET_S_LINK_UP and
>> >VIRTIO_NET_S_ANNOUNCE.
>> >> >> >> >> >> > > > @@ -2875,12 +2879,17 @@ \subsection{Device
>> >> >> >> >> >> > > > configuration
>> >> >> >> >> >layout}\label{sec:Device Types / Network Device
>> >> >> >> >> >> > > >    VIRTIO_NET_F_MTU is set. This field specifies
>> >> >> >> >> >> > > > the maximum MTU for
>> >> >> >> >> >the driver to
>> >> >> >> >> >> > > >    use.
>> >> >> >> >> >> > > > +The device-read-only field \field{rx_buf_len}
>> >> >> >> >> >> > > > +only exists if
>> >> >> >> >> >> > >
>> >> >> >> >> >> > > Should be driver-read-only.
>> >> >> >> >> >> > >
>> >> >> >> >> >> > >
>> >> >> >> >> >> > > > +VIRTIO_NET_F_CONST_RXBUF_LEN is negotiated.
>This
>> >> >> >> >> >> > > > +field specifies the receive buffers length.
>> >> >> >> >> >> > > > +
>> >> >> >> >> >> > > >    \begin{lstlisting}
>> >> >> >> >> >> > > >    struct virtio_net_config {
>> >> >> >> >> >> > > >            u8 mac[6];
>> >> >> >> >> >> > > >            le16 status;
>> >> >> >> >> >> > > >            le16 max_virtqueue_pairs;
>> >> >> >> >> >> > > >            le16 mtu;
>> >> >> >> >> >> > > > +        le32 rx_buf_len;
>> >> >> >> >> >> > > >    };
>> >> >> >> >> >> > > >    \end{lstlisting} @@ -2933,6 +2942,13 @@
>> >> >> >> >> >> > > > \subsection{Device configuration
>> >> >> >> >> >> > > > layout}\label{sec:Device Types / Network
>> >Device
>> >> >> >> >> >> > > >    A driver SHOULD negotiate the
>> >> >> >> >> >> > > > VIRTIO_NET_F_STANDBY feature if
>> >> >> >> >> >the device offers it.
>> >> >> >> >> >> > > > +A driver SHOULD accept the
>> >> >> >VIRTIO_NET_F_CONST_RXBUF_LEN
>> >> >> >> >> >feature if offered.
>> >> >> >> >> >> > > > +
>> >> >> >> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has
>been
>> >> >> >> >negotiated,
>> >> >> >> >> >> > > > +the driver MUST set \field{rx_buf_len}.
>> >> >> >> >> >> > >
>> >> >> >> >> >> > > I think it's device that set the field?
>> >> >> >> >> >> > Makes more sense for the driver, but if you want this set
>e.g.
>> >> >> >> >> >> > before buffers are added, you must say so.
>> >> >> >> >> >> >
>> >> >> >> >> >> > > > +
>> >> >> >> >> >> > > > +A driver MUST NOT modify \field{rx_buf_len} once
>> >> >> >> >> >> > > > +it has been
>> >> >> >> >set.
>> >> >> >> >> >> > > > +
>> >> >> >> >> >> >
>> >> >> >> >> >> > This seems very unflexible. I can see how e.g. XDP
>> >> >> >> >> >> > would benefit from big buffers while skbs benefit
>> >> >> >> >> >> > from small
>> >buffers.
>> >> >> >> >> >> >
>> >> >> >> >> >> > This calls for ability to change this.
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >> Yes, but it requires non trivial cleanups for the old
>> >> >> >> >> >> length and place them with new ones.
>> >> >> >> >> >>
>> >> >> >> >> >> Thanks
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> >Let's see:
>> >> >> >> >> >1	- making buffer smaller: just update config space,
>> >> >> >> >> >	  then make new buffers smaller
>> >> >> >> >> >
>> >> >> >> >> >2	- making buffers bigger: add larger buffers,
>> >> >> >> >> >	once all small ones are consumed update config space
>> >> >> >> >> >
>> >> >> >> >> >2 is tricky I agree. Thoughts?
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> Agree. It's doable, provided that the driver will follow
>> >> >> >> >> the update
>> >> >> >> >procedure.
>> >> >> >> >> >
>> >> >> >> >> >>
>> >> >> >> >> >> >
>> >> >> >> >> >> > > >    \subsubsection{Legacy Interface: Device
>> >> >> >> >> >> > > > configuration
>> >> >> >> >> >layout}\label{sec:Device Types / Network Device / Device
>> >> >> >> >> >configuration layout / Legacy Interface: Device
>> >> >> >> >> >configuration layout}
>> >> >> >> >> >> > > >    \label{sec:Device Types / Block Device /
>> >> >> >> >> >> > > > Feature bits / Device
>> >> >> >> >> >configuration layout / Legacy Interface: Device
>> >> >> >> >> >configuration layout}
>> >> >> >> >> >> > > >    When using the legacy interface, transitional
>> >> >> >> >> >> > > > devices and drivers @@
>> >> >> >> >> >> > > > -3241,6 +3257,11 @@ \subsubsection{Setting Up
>> >> >> >> >> >> > > > Receive
>> >> >> >> >> >Buffers}\label{sec:Device Types / Network Devi
>> >> >> >> >> >> > > >    If VIRTIO_NET_F_MQ is negotiated, each of
>> >> >> >> >> >> > > > receiveq1\ldots
>> >> >> >> >> >receiveqN
>> >> >> >> >> >> > > >    that will be used SHOULD be populated with
>> >> >> >> >> >> > > > receive
>> >> >buffers.
>> >> >> >> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has
>been
>> >> >> >> >negotiated,
>> >> >> >> >> >> > > > +the driver MUST initialize all receive virtqueue
>> >> >> >> >> >> > > > +descriptors \field{len} field with the value
>> >> >> >> >> >> > > > +configured in \field{rx_buf_len} device
>> >> >> >> >> >> > > > +configuration field, and allocate receive
>> >> >> >> >> >buffers accordingly.
>> >> >> >> >> >> > > > +
>> >> >> >> >> >> > > >    \devicenormative{\paragraph}{Setting Up
>> >> >> >> >> >> > > > Receive Buffers}{Device Types / Network Device /
>> >> >> >> >> >> > > > Device Operation / Setting Up Receive Buffers}
>> >> >> >> >> >> > > >    The device MUST set \field{num_buffers} to the
>> >> >> >> >> >> > > > number of descriptors used to @@ -3396,6 +3417,10
>> >> >> >> >> >> > > > @@
>> >> >> >> >> >\subsubsection{Processing of Incoming
>> >> >> >> >> >Packets}\label{sec:Device Types / Network
>> >> >> >> >> >> > > >    checksum (in case of multiple encapsulated
>> >> >> >> >> >> > > > protocols, one
>> >> >> >level
>> >> >> >> >> >> > > >    of checksums is validated).
>> >> >> >> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN has been
>> >> >> >> >> >> > > > +negotiated,
>> >> >> >the
>> >> >> >> >> >device
>> >> >> >> >> >> > > > +MAY use \field{rx_buf_len} as a buffer length
>> >> >> >> >> >> > > > +(instead of reading it from virtqueue descriptor
>> >\field{len} field).
>> >> >> >> >> >> > >
>> >> >> >> >> >> > > Is this safe? What if driver submit a small buffer,
>> >> >> >> >> >> > > then device can read
>> >> >> >> >> >more than what is allowed?
>> >> >> >> >> >> > >
>> >> >> >> >> >> > > Thanks
>> >> >> >> >> >> > >
>> >> >> >> >> >> > >
>> >> >> >> >> >> > > > +
>> >> >> >> >> >> > > >    \drivernormative{\paragraph}{Processing of Incoming
>> >> >> >> >> >> > > >    Packets}{Device Types / Network Device /
>> >> >> >> >> >> > > > Device Operation
>> >> >/
>> >> >> >> >> >> > > >    Processing of Incoming Packets}
>> >
>> >
>> >This publicly archived list offers a means to provide input to the
>> >OASIS Virtual I/O Device (VIRTIO) TC.
>> >
>> >In order to verify user consent to the Feedback License terms and to
>> >minimize spam in the list archive, subscription is required before posting.
>> >
>> >Subscribe: virtio-comment-subscribe@lists.oasis-open.org
>> >Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
>> >List help: virtio-comment-help@lists.oasis-open.org
>> >List archive: https://urldefense.proofpoint.com/v2/url?u=https-
>> >3A__lists.oasis-2Dopen.org_archives_virtio-
>>
>>2Dcomment_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3l
>q
>> >qsArgFRdcevq01tbLQAw4A_NO7xgI&m=gIIx9_eEGj-aDaM6Z-
>> >42yWtI9MnZcqZ2Gw7KCN7EgCg&s=-
>> >JICLquqUnNye7tilUS67AFv7opngsKEl5L75acB64U&e=
>> >Feedback License: https://urldefense.proofpoint.com/v2/url?u=https-
>> >3A__www.oasis-2Dopen.org_who_ipr_feedback-
>>
>>5Flicense.pdf&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lq
>q
>> >sArgFRdcevq01tbLQAw4A_NO7xgI&m=gIIx9_eEGj-aDaM6Z-
>>
>>42yWtI9MnZcqZ2Gw7KCN7EgCg&s=4e2kWmSdPAtGMXBTHwfgNE_KOZdDU
>R
>> >Wsji73HWdVF3A&e=
>> >List Guidelines: https://urldefense.proofpoint.com/v2/url?u=https-
>> >3A__www.oasis-2Dopen.org_policies-2Dguidelines_mailing-
>>
>>2Dlists&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgF
>R
>> >dcevq01tbLQAw4A_NO7xgI&m=gIIx9_eEGj-aDaM6Z-
>> >42yWtI9MnZcqZ2Gw7KCN7EgCg&s=i-dQn5G-
>> >auoFtCxN8Y2PN8UccM1ezgcrsT2A8T1H8wE&e=
>> >Committee: https://urldefense.proofpoint.com/v2/url?u=https-
>> >3A__www.oasis-
>>
>>2Dopen.org_committees_virtio_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ
>&
>> >r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=gIIx9_eEGj-
>> >aDaM6Z-42yWtI9MnZcqZ2Gw7KCN7EgCg&s=kcv-jA-_-JC3v64-_r5iP-
>> >XQ9rhyaeOKvrHJNHwZrLc&e=
>> >Join OASIS: https://urldefense.proofpoint.com/v2/url?u=https-
>> >3A__www.oasis-
>>
>>2Dopen.org_join_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52
>o
>> >J3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=gIIx9_eEGj-aDaM6Z-
>>
>>42yWtI9MnZcqZ2Gw7KCN7EgCg&s=cKbLeI_5Fu9G7aybE5u51yISB0eRer6BvC
>xr
>> >wgd5HS4&e=
>>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature
  2019-12-01 10:22                           ` Vitaly Mireyno
@ 2019-12-01 13:07                             ` Michael S. Tsirkin
  2019-12-01 21:43                               ` Michael S. Tsirkin
  0 siblings, 1 reply; 28+ messages in thread
From: Michael S. Tsirkin @ 2019-12-01 13:07 UTC (permalink / raw)
  To: Vitaly Mireyno; +Cc: Jason Wang, virtio-comment

On Sun, Dec 01, 2019 at 10:22:17AM +0000, Vitaly Mireyno wrote:
> 
> 
> >-----Original Message-----
> >From: Michael S. Tsirkin <mst@redhat.com>
> >Sent: Wednesday, 27 November, 2019 15:51
> >To: Vitaly Mireyno <vmireyno@marvell.com>
> >Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
> >open.org
> >Subject: Re: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH]
> >virtio-net: Add equal-sized receive buffers feature
> >
> >On Tue, Nov 26, 2019 at 05:27:31PM +0000, Vitaly Mireyno wrote:
> >>
> >>
> >> >-----Original Message-----
> >> >From: virtio-comment@lists.oasis-open.org
> >> ><virtio-comment@lists.oasis- open.org> On Behalf Of Michael S.
> >> >Tsirkin
> >> >Sent: Sunday, 24 November, 2019 17:30
> >> >To: Vitaly Mireyno <vmireyno@marvell.com>
> >> >Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
> >> >open.org
> >> >Subject: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH]
> >> >virtio-
> >> >net: Add equal-sized receive buffers feature
> >> >
> >> >On Sun, Nov 24, 2019 at 03:02:05PM +0000, Vitaly Mireyno wrote:
> >> >>
> >> >>
> >> >> >-----Original Message-----
> >> >> >From: Michael S. Tsirkin <mst@redhat.com>
> >> >> >Sent: Wednesday, 20 November, 2019 15:23
> >> >> >To: Vitaly Mireyno <vmireyno@marvell.com>
> >> >> >Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
> >> >> >open.org
> >> >> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net:
> >> >> >Add
> >> >> >equal- sized receive buffers feature
> >> >> >
> >> >> >On Thu, Nov 14, 2019 at 03:49:59PM +0000, Vitaly Mireyno wrote:
> >> >> >>
> >> >> >>
> >> >> >> >-----Original Message-----
> >> >> >> >From: Michael S. Tsirkin <mst@redhat.com>
> >> >> >> >Sent: Monday, 11 November, 2019 17:05
> >> >> >> >To: Vitaly Mireyno <vmireyno@marvell.com>
> >> >> >> >Cc: Jason Wang <jasowang@redhat.com>;
> >> >> >> >virtio-comment@lists.oasis- open.org
> >> >> >> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net:
> >> >> >> >Add
> >> >> >> >equal- sized receive buffers feature
> >> >> >> >
> >> >> >> >On Mon, Nov 11, 2019 at 01:58:51PM +0000, Vitaly Mireyno wrote:
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> >-----Original Message-----
> >> >> >> >> >From: Michael S. Tsirkin <mst@redhat.com>
> >> >> >> >> >Sent: Sunday, 10 November, 2019 17:35
> >> >> >> >> >To: Vitaly Mireyno <vmireyno@marvell.com>
> >> >> >> >> >Cc: Jason Wang <jasowang@redhat.com>;
> >> >> >> >> >virtio-comment@lists.oasis- open.org
> >> >> >> >> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net:
> >> >> >> >> >Add
> >> >> >> >> >equal- sized receive buffers feature
> >> >> >> >> >
> >> >> >> >> >On Sun, Nov 10, 2019 at 02:13:14PM +0000, Vitaly Mireyno wrote:
> >> >> >> >> >>
> >> >> >> >> >> >-----Original Message-----
> >> >> >> >> >> >From: virtio-comment@lists.oasis-open.org
> >> >> >> >> >> ><virtio-comment@lists.oasis- open.org> On Behalf Of Michael
> >S.
> >> >> >> >> >> >Tsirkin
> >> >> >> >> >> >Sent: Tuesday, 5 November, 2019 20:52
> >> >> >> >> >> >To: Jason Wang <jasowang@redhat.com>
> >> >> >> >> >> >Cc: Vitaly Mireyno <vmireyno@marvell.com>;
> >> >> >> >> >> >virtio-comment@lists.oasis- open.org
> >> >> >> >> >> >Subject: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net:
> >> >> >> >> >> >Add equal-sized receive buffers feature
> >> >> >> >> >> >
> >> >> >> >> >> >External Email
> >> >> >> >> >> >
> >> >> >> >> >> >---------------------------------------------------------
> >> >> >> >> >> >---
> >> >> >> >> >> >---
> >> >> >> >> >> >---
> >> >> >> >> >> >---
> >> >> >> >> >> >- On Fri, Nov 01, 2019 at 03:22:26PM +0800, Jason Wang wrote:
> >> >> >> >> >> >>
> >> >> >> >> >> >> On 2019/11/1 下午12:09, Michael S. Tsirkin wrote:
> >> >> >> >> >> >> > On Thu, Oct 31, 2019 at 10:46:55AM +0000, Vitaly
> >> >> >> >> >> >> > Mireyno
> >> >wrote:
> >> >> >> >> >> >> > > The feature is limited to receive buffers only.
> >> >> >> >> >> >> > > A driver decides on receive buffer length. The only
> >> >> >> >> >> >> > > limitation is that this
> >> >> >> >> >> >length has to be the same for all receive virtqueue buffers.
> >> >> >> >> >> >> > > The driver configures receive buffer length to the
> >> >> >> >> >> >> > > device during device
> >> >> >> >> >> >initialization, and the device reads it and may use it
> >> >> >> >> >> >for optimal
> >> >> >> >operation.
> >> >> >> >> >> >> > > No changes for transmit buffers.
> >> >> >> >> >> >> > >
> >> >> >> >> >> >> > > -----Original Message-----
> >> >> >> >> >> >> > > From: virtio-comment@lists.oasis-open.org
> >> >> >> >> >> >> > > <virtio-comment@lists.oasis-open.org> On Behalf Of
> >> >> >> >> >> >> > > Jason Wang
> >> >> >> >> >> >> > > Sent: Thursday, 31 October, 2019 12:15
> >> >> >> >> >> >> > > To: Vitaly Mireyno <vmireyno@marvell.com>;
> >> >> >> >> >> >> > > virtio-comment@lists.oasis-open.org
> >> >> >> >> >> >> > > Cc: Michael S. Tsirkin <mst@redhat.com>
> >> >> >> >> >> >> > > Subject: [virtio-comment] Re: [PATCH] virtio-net:
> >> >> >> >> >> >> > > Add equal-sized receive buffers feature
> >> >> >> >> >> >> > >
> >> >> >> >> >> >> > >
> >> >> >> >> >> >> > > On 2019/10/31 下午5:23, Vitaly Mireyno wrote:
> >> >> >> >> >> >> > > > Some devices benefit from working with receive
> >> >> >> >> >> >> > > > buffers of a
> >> >> >> >> >> >predefined constant length.
> >> >> >> >> >> >> > > > Add a feature for drivers that allocate
> >> >> >> >> >> >> > > > equal-sized receive buffers, and
> >> >> >> >> >> >devices that benefit from predefined receive buffers length.
> >> >> >> >> >> >> > > >
> >> >> >> >> >> >> > > > Signed-off-by: Vitaly Mireyno
> >> >> >> >> >> >> > > > <vmireyno@marvell.com>
> >> >> >> >> >> >> > > > ---
> >> >> >> >> >> >> > > >    content.tex | 29 +++++++++++++++++++++++++++--
> >> >> >> >> >> >> > > >    1 file changed, 27 insertions(+), 2
> >> >> >> >> >> >> > > > deletions(-)
> >> >> >> >> >> >> > >
> >> >> >> >> >> >> > > Is there any other networking device that has this
> >feature?
> >> >> >> >> >> >> > >
> >> >> >> >> >> >> > >
> >> >> >> >> >> >> > > > diff --git a/content.tex b/content.tex index
> >> >> >> >> >> >> > > > b1ea9b9..c9e67c8
> >> >> >> >> >> >> > > > 100644
> >> >> >> >> >> >> > > > --- a/content.tex
> >> >> >> >> >> >> > > > +++ b/content.tex
> >> >> >> >> >> >> > > > @@ -2811,6 +2811,10 @@ \subsection{Feature
> >> >> >> >> >> >> > > > bits}\label{sec:Device
> >> >> >> >> >> >Types / Network Device / Feature bits
> >> >> >> >> >> >> > > >    \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set
> >MAC
> >> >> >> >> >> >> > > > address
> >> >> >> >> >> >through control
> >> >> >> >> >> >> > > >        channel.
> >> >> >> >> >> >> > > > +\item[VIRTIO_NET_F_CONST_RXBUF_LEN(58)] Driver
> >> >> >> >> >> >> > > > +allocates all
> >> >> >> >> >> >receive buffers
> >> >> >> >> >> >> > > > +    of the same constant length. Device benefits
> >> >> >> >> >> >> > > > + from working
> >> >> >> >with
> >> >> >> >> >> >> > > > +    receive buffers of equal length.
> >> >> >> >> >> >> > > > +
> >> >> >> >> >> >> > Problem is, I don't think linux will use this for
> >> >> >> >> >> >> > skbs since it breaks buffer accounting. This is
> >> >> >> >> >> >> > because it is important to make skbs as small as you
> >> >> >> >> >> >> > can. So even if you see "device would
> >> >> >> >benefit"
> >> >> >> >> >> >> > there is no way to balance this with the benefit to linux.
> >> >> >> >> >> >> > How do you know which benefit is bigger?
> >> >> >> >> >> >>
> >> >> >> >> >> >>
> >> >> >> >> >> >> I guess the idea is e.g for Linux driver, it can refuse this
> >feature.
> >> >> >> >> >> >
> >> >> >> >> >> >Okay. What uses it?
> >> >> >> >> >> >
> >> >> >> >> >> >>
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > You also never explained how does device benefit. My
> >> >> >> >> >> >> > guess is this allows device to calculate the # of
> >> >> >> >> >> >> > descriptors to fetch that are needed for a packet. Right?
> >> >> >> >> >> >> > Assuming this, I think a rough estimate should be enough.
> >> >> >> >> >> >> > If device fetches too much it can discard extra, if
> >> >> >> >> >> >> > it does not fetch enough it can fetch extra.
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > Let us not forget that express packets are 256 bytes
> >> >> >> >> >> >> > so up to
> >> >> >> >> >> >> > 16 descriptors fit in a packet, there is no benefit
> >> >> >> >> >> >> > most of the time in knowing whether e.g. 1 or 2
> >> >> >> >> >> >> > descriptors are
> >> >needed.
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > Let us not forget these are buffers, not descriptors.
> >> >> >> >> >> >>
> >> >> >> >> >> >>
> >> >> >> >> >> >> I guess maybe the initial motivation is constant
> >> >> >> >> >> >> descriptor
> >> >length.
> >> >> >> >> >> >
> >> >> >> >> >> >
> >> >> >> >> >> >That conflicts with requirement framing is up to driver.
> >> >> >> >> >> >
> >> >> >> >> >> >
> >> >> >> >> >> I was using the wrong terminology. The feature is about
> >> >> >> >> >> constant
> >> >> >> >> >*descriptor* length. In other words, the value that driver
> >> >> >> >> >puts in Packed Virtqueue "Element Length" field (or 'len'
> >> >> >> >> >field in the
> >> >> >'pvirtq_desc'
> >> >> >> >struct).
> >> >> >> >> >
> >> >> >> >> >OK so this conflicts with "2.6.4 Message Framing" which
> >> >> >> >> >requires that drivers can split buffers into as many
> >> >> >> >> >descriptors as
> >> >they like.
> >> >> >> >> >Right?
> >> >> >> >> >
> >> >> >> >> >> Does it make more sense now, or you still see an issue
> >> >> >> >> >> with Linux
> >> >> >driver?
> >> >> >> >> >
> >> >> >> >> >I think there's an issue with the flexible framing
> >> >> >> >> >requirements and there's an issue with Linux driver.
> >> >> >> >> >
> >> >> >> >> >> The motivation is to allow the device to calculate the
> >> >> >> >> >> exact number of
> >> >> >> >> >descriptors to consume, before fetching the descriptors.
> >> >> >> >> >This is beneficial for devices for which overconsuming or
> >> >> >> >> >underconsuming descriptors come at a high cost.
> >> >> >> >> >
> >> >> >> >> >So I guess we can agree getting the # of descriptors
> >> >> >> >> >*exactly* right isn't all that important. Right? My question
> >> >> >> >> >is how does the driver balance the device versus Linux needs?
> >> >> >> >> >
> >> >> >> >> >One idea is if we assume this is best effort, does not have
> >> >> >> >> >to be exact, then how about just having the device assume
> >> >> >> >> >descriptor sizes stay more or less constant and use that to
> >> >> >> >> >estimate the amount? If it under/over estimates, things just
> >> >> >> >> >go a bit
> >> >slower.
> >> >> >> >> >This way driver can adjust the sizes and device will react
> >> >> >> >> >automatically, with time.
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >>
> >> >> >> >> I see that "2.6.4 Message Framing" allows a full flexibility
> >> >> >> >> of descriptor lengths, and I presume it's applicable to
> >> >> >> >> Packet Virtqueues as well, though defined under Split Virtqueues
> >section.
> >> >> >> >
> >> >> >> >That's a good point. Probably makes sense to move it out to a
> >> >> >> >common section, right?
> >> >> >> >
> >> >> >> >> The example
> >> >> >> >> talks about transmit virtqueue, and it makes perfect sense.
> >> >> >> >> However, wouldn't a typical driver place equal-sized
> >> >> >> >> *receive* descriptors
> >> >> >> >anyway? So if a device can benefit from it, the driver might as
> >> >> >> >well negotiate this new feature.
> >> >> >> >
> >> >> >> >Having buffer size jump around wildly doesn't seem too useful, I
> >agree.
> >> >> >> >OTOH being able to adjust it gradually has been in the past
> >> >> >> >demontrated to help performance measureably.
> >> >> >> >
> >> >> >> >> This could be especially relevant for devices, for which
> >> >> >> >> adjusting the number
> >> >> >> >of used descriptors is impractical after descriptors have
> >> >> >> >already been
> >> >> >fetched.
> >> >> >> >> I agree that if this requirement conflicts with specific
> >> >> >> >> driver needs, it will not
> >> >> >> >be negotiated, and the device will either underperform in
> >> >> >> >certain scenarios, or will not come up at all.
> >> >> >> >
> >> >> >> >Right so that makes it a challenge. Device says it prefers
> >> >> >> >fixed buffers. Is that preference more or less important than
> >> >> >> >ability to efficiently support workloads such as TCP small queues?
> >> >> >> >Driver has no idea and I suspect neither does the device.
> >> >> >> >So I don't see how will a Linux driver know that it should
> >> >> >> >enable this, neither how will device know it's okay to just refuse
> >features_ok.
> >> >> >> >
> >> >> >> >
> >> >> >> >On the other hand, current drivers already have logic that
> >> >> >> >tries to change buffer sizes in a smooth way.  So simply
> >> >> >> >caching the last buffer size and assuming all the others will
> >> >> >> >be exactly the same will go a long way towards limiting how
> >> >> >> >much does device need to
> >> >fetch.
> >> >> >> >This does imply extra logic on the device to recover if things
> >> >> >> >change and the first read did not fetch enough buffers, but
> >> >> >> >then it would be required anyway if as you say above the
> >> >> >> >failure is not
> >> >catastrophic.
> >> >> >> >
> >> >> >> >
> >> >> >> >The feature thus only seems useful for small,
> >> >> >> >feature-restrained devices
> >> >> >> >- which are prepared to sacrifice some performance to cut
> >> >> >> >costs, and which can't recover at all. Is this what you are trying to do?
> >> >> >> >
> >> >> >>
> >> >> >> The intention is to enable devices with this specific limitation
> >> >> >> to offer virtio offload.  The feature can be redefined such that
> >> >> >> it would only be offered by devices that are unable to handle
> >> >> >> dynamically changing descriptor lengths. How does that sound?
> >> >> >
> >> >> >So it makes more sense when mergeable buffers are disabled (since
> >> >> >then buffers are practically all same size).
> >> >> >
> >> >>
> >> >> Actually, mergeable buffers are not a problem. They could be
> >> >> enabled, as long as each descriptor is the same length. So
> >> >> implementations that prefer optimizing memory utilization over
> >> >> jumbo frame performance can negotiate VIRTIO_NET_F_MRG_RXBUF
> >and allocate smaller buffers.
> >> >
> >> >So my point was, without VIRTIO_NET_F_MRG_RXBUF, all buffers are
> >same
> >> >length anyway. So if we are talking about cheap simple hardware, I
> >> >guess it can just not have VIRTIO_NET_F_MRG_RXBUF and be done with
> >it?
> >> >If device does care about memory utilization then I think it needs to
> >> >give driver the flexibility it needs/uses.
> >> >No?
> >> >
> >>
> >> It's not a matter of device complexity, but a HW architecture, which could be
> >complex, but have this specific limitation.
> >
> >So - don't do it then?
> >
> >> I see at least two reasons to support and negotiate
> >VIRTIO_NET_F_MRG_RXBUF, while keeping equal-sized descriptor limitation:
> >>  * GRO
> >
> >You mean LRO?
> >
> >>  * With jumbo packets, if the throughput is capped by the port bandwidth,
> >and not by the device/driver per-packet performance, it makes sense to
> >optimize memory utilization by allocating small buffers, without sacrificing
> >throughput performance.
> >
> >So we are back to square one, if driver cares about memory utilization with 1K
> >packets vs 9K buffers, why not with 100byte packets vs 1K buffers? Looks like
> >exactly the same tradeoff.
> >
> >
> >>
> >> >Again I can see how we might want to disallow crazy setups with e.g.
> >> >1 byte per buffer. That's just abuse, no guest does that anyway. So
> >> >asking e.g. for a minimal buffer size sounds very reasonable.  But an
> >> >option that disables functionality that a popular guest uses needs a
> >> >lot of documentation to help device writers figure out whether they
> >> >want that option or not, and I'd worry that even with documentation will
> >be misunderstood even if we write it.
> >> >When do you enable this?
> >> >When you don't care about performance? When you don't care about
> >Linux?
> >> >
> >>
> >> I understand that there are guest-side optimizations that require flexibility
> >in descriptors length. I can propose the following simple logic:
> >> Device - advertise "equal-sized descriptor" feature only if the device is
> >absolutely unable to operate otherwise. The device will not set FEATURES_OK
> >unless the driver negotiates this feature.
> >> Driver - if device advertises "equal-sized descriptor" feature - if possible,
> >give up on the flexible descriptors length optimizations. If not - give up on the
> >device.
> >
> >Yes, that's possible. Looks like a rather limited feature, and i'd rather we
> >focused on something more widely applicable, but with enough disclamers
> >that devices SHOULD NOT set this bit we can maybe do that.
> >
> 
> I agree that this feature is more of a limitation declaration, rather than an enhancement, but let me emphasize that its only purpose is to make more *existing* HW devices be virtio compatible. 
> This feature will allow such HW devices to offer virtio offload with VIRTIO_NET_F_MRG_RXBUF capability for GRO/LRO offload.
> New device designs should definitely avoid employing this feature.

I understand, my question is: can you find a way to make this more
generally useful? Something along the lines of the min buffer size
suggestion which would avoid creating a special case in the
driver? Any idea?


> >
> >>
> >> >> >
> >> >> >With that in mind, I have an idea: scsi and block already have max sg
> >field.
> >> >> >How about we add a writeable max sg field, maybe even make it
> >> >> >programmable per vq?
> >> >> >
> >> >> >Thus driver tells device what is it's max s/g value for rx. Worst
> >> >> >case device fetches a bit more than it needs. Discarding extra
> >> >> >shouldn't be expensive. This looks like it will help even smart
> >> >> >devices. What
> >> >do you think?
> >> >> >
> >> >> >This nicely avoids conflicting with the flexible framing requirement.
> >> >> >
> >> >>
> >> >> My intention was to avoid any descriptor length variations, for
> >> >> devices that unable fetching or discarding extra descriptors. (If
> >> >> in the pipelined HW processing the decision on number of
> >> >> descriptors is made in the early stage, and it cannot be undone in a later
> >stage).
> >> >
> >> >Frankly discarding unused descriptors looks to me like something that
> >> >shouldn't have a high cost in the hardware.  I can see how trying to
> >> >predict descriptor length, and fetching extra if not enough was
> >> >fetched can have a high cost, so to me extensions to remove the
> >> >guesswork from the device have value.  However a lot of effort went
> >> >into trying to reduce e.g. number of pci express packets needed to fetch
> >descriptors.
> >> >Each packet fetches 256 bytes anyway, does it not?  Not having an
> >> >ability to use that information seems like an obvious waste, and that
> >> >means ability to keep some fetched descriptors around even if they
> >> >are not used for a given packet. Again just my $.02.
> >> >
> >>
> >> In packed virtqueue, discarding unused descriptors (and buffers associated
> >with them) can indeed be easy, but reusing them for the next packet is
> >complicated (or impossible).
> >> I agree that descriptors are being (pre)fetched with the maximum efficiency
> >(in terms of PCIe bandwidth), and cached in the device. But the decision to
> >fetch is being made according to the number of left cached-in descriptors and
> >the expected number of descriptors that will be used by the packet.
> >> If the expected number of descriptors is larger than the actual one, the next
> >fetch decision will be taken too early, and there will be no way to reuse excess
> >cached descriptors, and they will have to be discarded.
> >
> >> Even if it's possible to skip descriptors in the packed virtqueue (is it?), it's
> >surely inefficient.
> >
> >OK I think I understand what you are doing. Device is getting buffers 1,2,3 for
> >packet 1, it is meanwhile receiving packet
> >2 and decides to get buffers 4,5 for packet 2.
> >At this point it finally gets buffers 1-3 and figures out that buffers 3 was not
> >needed, but possibly it already started writing packet 2 into buffers 4.
> >What to do about buffers 3 now?
> >
> >Is above a good example?
> >
> >If yes then it looks like you are unaware that Descriptors can be used out of
> >order, with split or packed ring, with no issues.  Looks like exactly what you
> >need to address this issue?
> >So device will simply proceed with marking buffers 1,2,4,5 as used, and store
> >buffers 3 in some kind of internal memory and use it for the next packet.
> >
> >This is exactly the kind of thing out of order was designed for.
> >
> >Does this answer the question?
> >
> 
> The example is good, and the suggested solution is clear. However if we're talking about HW device that is designed to process packets/buffers in order, this solution could not be applicable.



> >>
> >> >>
> >> >> Defining max s/g sounds like an interesting feature by itself.
> >> >
> >> >But assuming we have max RX s/g, I guess hardware can set max s/g = 1?
> >> >Then since with !VIRTIO_NET_F_MRG_RXBUF all buffers are forced to be
> >> >same length.
> >> >
> >> >>
> >> >> >
> >> >> >> >> Could you specify what issue do you see with the Linux driver?
> >> >> >> >
> >> >> >> >See logic around struct ewma.
> >> >> >> >
> >> >> >> >The first relevant commit is I guess commit
> >> >> >> >ab7db91705e95ed1bba1304388936fccfa58c992
> >> >> >> >    virtio-net: auto-tune mergeable rx buffer size for improved
> >> >> >> >performance
> >> >> >> >
> >> >> >> >this claims a gain of about 10% with large packets which isn't
> >> >> >> >earth shattering but also not something we can ignore completely.
> >> >> >> >And I suspect it can be bigger with smaller packets.
> >> >> >> >
> >> >> >> >> >> >>
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > So device does not really know the exact # of descriptors:
> >> >> >> >> >> >> > current drivers do 1 descriptor per buffer but
> >> >> >> >> >> >> > nothing prevents more.
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > Thoughts?
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > > >    \item[VIRTIO_NET_F_RSC_EXT(61)] Device can
> >> >> >> >> >> >> > > > process duplicated
> >> >> >> >> >> >ACKs
> >> >> >> >> >> >> > > >        and report number of coalesced segments
> >> >> >> >> >> >> > > > and duplicated ACKs @@ -2854,8 +2858,8 @@
> >> >> >> >\subsubsection{Legacy Interface:
> >> >> >> >> >> >Feature bits}\label{sec:Device Types / Network
> >> >> >> >> >> >> > > >    \subsection{Device configuration
> >> >> >> >> >> >> > > > layout}\label{sec:Device Types /
> >> >> >> >> >> >Network Device / Device configuration layout}
> >> >> >> >> >> >> > > >    \label{sec:Device Types / Block Device /
> >> >> >> >> >> >> > > > Feature bits / Device configuration layout}
> >> >> >> >> >> >> > > > -Three driver-read-only configuration fields are
> >currently defined.
> >> >> >> >> >> >> > > > The \field{mac} address field -always exists
> >> >> >> >> >> >> > > > (though is only valid if VIRTIO_NET_F_MAC is
> >> >> >> >> >> >> > > > set), and
> >> >> >> >> >> >> > > > +The driver-read-only \field{mac} address field
> >> >> >> >> >> >> > > > +always exists (though is only valid if
> >> >> >> >> >> >> > > > +VIRTIO_NET_F_MAC is set), and
> >> >> >> >> >> >> > > >    \field{status} only exists if VIRTIO_NET_F_STATUS is
> >set.
> >> >> >Two
> >> >> >> >> >> >> > > >    read-only bits (for the driver) are currently
> >> >> >> >> >> >> > > > defined for the status
> >> >> >> >> >> >field:
> >> >> >> >> >> >> > > >    VIRTIO_NET_S_LINK_UP and
> >> >VIRTIO_NET_S_ANNOUNCE.
> >> >> >> >> >> >> > > > @@ -2875,12 +2879,17 @@ \subsection{Device
> >> >> >> >> >> >> > > > configuration
> >> >> >> >> >> >layout}\label{sec:Device Types / Network Device
> >> >> >> >> >> >> > > >    VIRTIO_NET_F_MTU is set. This field specifies
> >> >> >> >> >> >> > > > the maximum MTU for
> >> >> >> >> >> >the driver to
> >> >> >> >> >> >> > > >    use.
> >> >> >> >> >> >> > > > +The device-read-only field \field{rx_buf_len}
> >> >> >> >> >> >> > > > +only exists if
> >> >> >> >> >> >> > >
> >> >> >> >> >> >> > > Should be driver-read-only.
> >> >> >> >> >> >> > >
> >> >> >> >> >> >> > >
> >> >> >> >> >> >> > > > +VIRTIO_NET_F_CONST_RXBUF_LEN is negotiated.
> >This
> >> >> >> >> >> >> > > > +field specifies the receive buffers length.
> >> >> >> >> >> >> > > > +
> >> >> >> >> >> >> > > >    \begin{lstlisting}
> >> >> >> >> >> >> > > >    struct virtio_net_config {
> >> >> >> >> >> >> > > >            u8 mac[6];
> >> >> >> >> >> >> > > >            le16 status;
> >> >> >> >> >> >> > > >            le16 max_virtqueue_pairs;
> >> >> >> >> >> >> > > >            le16 mtu;
> >> >> >> >> >> >> > > > +        le32 rx_buf_len;
> >> >> >> >> >> >> > > >    };
> >> >> >> >> >> >> > > >    \end{lstlisting} @@ -2933,6 +2942,13 @@
> >> >> >> >> >> >> > > > \subsection{Device configuration
> >> >> >> >> >> >> > > > layout}\label{sec:Device Types / Network
> >> >Device
> >> >> >> >> >> >> > > >    A driver SHOULD negotiate the
> >> >> >> >> >> >> > > > VIRTIO_NET_F_STANDBY feature if
> >> >> >> >> >> >the device offers it.
> >> >> >> >> >> >> > > > +A driver SHOULD accept the
> >> >> >> >VIRTIO_NET_F_CONST_RXBUF_LEN
> >> >> >> >> >> >feature if offered.
> >> >> >> >> >> >> > > > +
> >> >> >> >> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has
> >been
> >> >> >> >> >negotiated,
> >> >> >> >> >> >> > > > +the driver MUST set \field{rx_buf_len}.
> >> >> >> >> >> >> > >
> >> >> >> >> >> >> > > I think it's device that set the field?
> >> >> >> >> >> >> > Makes more sense for the driver, but if you want this set
> >e.g.
> >> >> >> >> >> >> > before buffers are added, you must say so.
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > > > +
> >> >> >> >> >> >> > > > +A driver MUST NOT modify \field{rx_buf_len} once
> >> >> >> >> >> >> > > > +it has been
> >> >> >> >> >set.
> >> >> >> >> >> >> > > > +
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > This seems very unflexible. I can see how e.g. XDP
> >> >> >> >> >> >> > would benefit from big buffers while skbs benefit
> >> >> >> >> >> >> > from small
> >> >buffers.
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > This calls for ability to change this.
> >> >> >> >> >> >>
> >> >> >> >> >> >>
> >> >> >> >> >> >> Yes, but it requires non trivial cleanups for the old
> >> >> >> >> >> >> length and place them with new ones.
> >> >> >> >> >> >>
> >> >> >> >> >> >> Thanks
> >> >> >> >> >> >
> >> >> >> >> >> >
> >> >> >> >> >> >Let's see:
> >> >> >> >> >> >1	- making buffer smaller: just update config space,
> >> >> >> >> >> >	  then make new buffers smaller
> >> >> >> >> >> >
> >> >> >> >> >> >2	- making buffers bigger: add larger buffers,
> >> >> >> >> >> >	once all small ones are consumed update config space
> >> >> >> >> >> >
> >> >> >> >> >> >2 is tricky I agree. Thoughts?
> >> >> >> >> >> >
> >> >> >> >> >> >
> >> >> >> >> >> Agree. It's doable, provided that the driver will follow
> >> >> >> >> >> the update
> >> >> >> >> >procedure.
> >> >> >> >> >> >
> >> >> >> >> >> >>
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > > >    \subsubsection{Legacy Interface: Device
> >> >> >> >> >> >> > > > configuration
> >> >> >> >> >> >layout}\label{sec:Device Types / Network Device / Device
> >> >> >> >> >> >configuration layout / Legacy Interface: Device
> >> >> >> >> >> >configuration layout}
> >> >> >> >> >> >> > > >    \label{sec:Device Types / Block Device /
> >> >> >> >> >> >> > > > Feature bits / Device
> >> >> >> >> >> >configuration layout / Legacy Interface: Device
> >> >> >> >> >> >configuration layout}
> >> >> >> >> >> >> > > >    When using the legacy interface, transitional
> >> >> >> >> >> >> > > > devices and drivers @@
> >> >> >> >> >> >> > > > -3241,6 +3257,11 @@ \subsubsection{Setting Up
> >> >> >> >> >> >> > > > Receive
> >> >> >> >> >> >Buffers}\label{sec:Device Types / Network Devi
> >> >> >> >> >> >> > > >    If VIRTIO_NET_F_MQ is negotiated, each of
> >> >> >> >> >> >> > > > receiveq1\ldots
> >> >> >> >> >> >receiveqN
> >> >> >> >> >> >> > > >    that will be used SHOULD be populated with
> >> >> >> >> >> >> > > > receive
> >> >> >buffers.
> >> >> >> >> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has
> >been
> >> >> >> >> >negotiated,
> >> >> >> >> >> >> > > > +the driver MUST initialize all receive virtqueue
> >> >> >> >> >> >> > > > +descriptors \field{len} field with the value
> >> >> >> >> >> >> > > > +configured in \field{rx_buf_len} device
> >> >> >> >> >> >> > > > +configuration field, and allocate receive
> >> >> >> >> >> >buffers accordingly.
> >> >> >> >> >> >> > > > +
> >> >> >> >> >> >> > > >    \devicenormative{\paragraph}{Setting Up
> >> >> >> >> >> >> > > > Receive Buffers}{Device Types / Network Device /
> >> >> >> >> >> >> > > > Device Operation / Setting Up Receive Buffers}
> >> >> >> >> >> >> > > >    The device MUST set \field{num_buffers} to the
> >> >> >> >> >> >> > > > number of descriptors used to @@ -3396,6 +3417,10
> >> >> >> >> >> >> > > > @@
> >> >> >> >> >> >\subsubsection{Processing of Incoming
> >> >> >> >> >> >Packets}\label{sec:Device Types / Network
> >> >> >> >> >> >> > > >    checksum (in case of multiple encapsulated
> >> >> >> >> >> >> > > > protocols, one
> >> >> >> >level
> >> >> >> >> >> >> > > >    of checksums is validated).
> >> >> >> >> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN has been
> >> >> >> >> >> >> > > > +negotiated,
> >> >> >> >the
> >> >> >> >> >> >device
> >> >> >> >> >> >> > > > +MAY use \field{rx_buf_len} as a buffer length
> >> >> >> >> >> >> > > > +(instead of reading it from virtqueue descriptor
> >> >\field{len} field).
> >> >> >> >> >> >> > >
> >> >> >> >> >> >> > > Is this safe? What if driver submit a small buffer,
> >> >> >> >> >> >> > > then device can read
> >> >> >> >> >> >more than what is allowed?
> >> >> >> >> >> >> > >
> >> >> >> >> >> >> > > Thanks
> >> >> >> >> >> >> > >
> >> >> >> >> >> >> > >
> >> >> >> >> >> >> > > > +
> >> >> >> >> >> >> > > >    \drivernormative{\paragraph}{Processing of Incoming
> >> >> >> >> >> >> > > >    Packets}{Device Types / Network Device /
> >> >> >> >> >> >> > > > Device Operation
> >> >> >/
> >> >> >> >> >> >> > > >    Processing of Incoming Packets}
> >> >
> >> >
> >> >This publicly archived list offers a means to provide input to the
> >> >OASIS Virtual I/O Device (VIRTIO) TC.
> >> >
> >> >In order to verify user consent to the Feedback License terms and to
> >> >minimize spam in the list archive, subscription is required before posting.
> >> >
> >> >Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> >> >Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> >> >List help: virtio-comment-help@lists.oasis-open.org
> >> >List archive: https://urldefense.proofpoint.com/v2/url?u=https-
> >> >3A__lists.oasis-2Dopen.org_archives_virtio-
> >>
> >>2Dcomment_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3l
> >q
> >> >qsArgFRdcevq01tbLQAw4A_NO7xgI&m=gIIx9_eEGj-aDaM6Z-
> >> >42yWtI9MnZcqZ2Gw7KCN7EgCg&s=-
> >> >JICLquqUnNye7tilUS67AFv7opngsKEl5L75acB64U&e=
> >> >Feedback License: https://urldefense.proofpoint.com/v2/url?u=https-
> >> >3A__www.oasis-2Dopen.org_who_ipr_feedback-
> >>
> >>5Flicense.pdf&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lq
> >q
> >> >sArgFRdcevq01tbLQAw4A_NO7xgI&m=gIIx9_eEGj-aDaM6Z-
> >>
> >>42yWtI9MnZcqZ2Gw7KCN7EgCg&s=4e2kWmSdPAtGMXBTHwfgNE_KOZdDU
> >R
> >> >Wsji73HWdVF3A&e=
> >> >List Guidelines: https://urldefense.proofpoint.com/v2/url?u=https-
> >> >3A__www.oasis-2Dopen.org_policies-2Dguidelines_mailing-
> >>
> >>2Dlists&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgF
> >R
> >> >dcevq01tbLQAw4A_NO7xgI&m=gIIx9_eEGj-aDaM6Z-
> >> >42yWtI9MnZcqZ2Gw7KCN7EgCg&s=i-dQn5G-
> >> >auoFtCxN8Y2PN8UccM1ezgcrsT2A8T1H8wE&e=
> >> >Committee: https://urldefense.proofpoint.com/v2/url?u=https-
> >> >3A__www.oasis-
> >>
> >>2Dopen.org_committees_virtio_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ
> >&
> >> >r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=gIIx9_eEGj-
> >> >aDaM6Z-42yWtI9MnZcqZ2Gw7KCN7EgCg&s=kcv-jA-_-JC3v64-_r5iP-
> >> >XQ9rhyaeOKvrHJNHwZrLc&e=
> >> >Join OASIS: https://urldefense.proofpoint.com/v2/url?u=https-
> >> >3A__www.oasis-
> >>
> >>2Dopen.org_join_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52
> >o
> >> >J3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=gIIx9_eEGj-aDaM6Z-
> >>
> >>42yWtI9MnZcqZ2Gw7KCN7EgCg&s=cKbLeI_5Fu9G7aybE5u51yISB0eRer6BvC
> >xr
> >> >wgd5HS4&e=
> >>
> 


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature
  2019-12-01 13:07                             ` Michael S. Tsirkin
@ 2019-12-01 21:43                               ` Michael S. Tsirkin
  2019-12-02 15:09                                 ` Vitaly Mireyno
  0 siblings, 1 reply; 28+ messages in thread
From: Michael S. Tsirkin @ 2019-12-01 21:43 UTC (permalink / raw)
  To: Vitaly Mireyno; +Cc: Jason Wang, virtio-comment

On Sun, Dec 01, 2019 at 08:07:55AM -0500, Michael S. Tsirkin wrote:
> On Sun, Dec 01, 2019 at 10:22:17AM +0000, Vitaly Mireyno wrote:
> > 
> > 
> > >-----Original Message-----
> > >From: Michael S. Tsirkin <mst@redhat.com>
> > >Sent: Wednesday, 27 November, 2019 15:51
> > >To: Vitaly Mireyno <vmireyno@marvell.com>
> > >Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
> > >open.org
> > >Subject: Re: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH]
> > >virtio-net: Add equal-sized receive buffers feature
> > >
> > >On Tue, Nov 26, 2019 at 05:27:31PM +0000, Vitaly Mireyno wrote:
> > >>
> > >>
> > >> >-----Original Message-----
> > >> >From: virtio-comment@lists.oasis-open.org
> > >> ><virtio-comment@lists.oasis- open.org> On Behalf Of Michael S.
> > >> >Tsirkin
> > >> >Sent: Sunday, 24 November, 2019 17:30
> > >> >To: Vitaly Mireyno <vmireyno@marvell.com>
> > >> >Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
> > >> >open.org
> > >> >Subject: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH]
> > >> >virtio-
> > >> >net: Add equal-sized receive buffers feature
> > >> >
> > >> >On Sun, Nov 24, 2019 at 03:02:05PM +0000, Vitaly Mireyno wrote:
> > >> >>
> > >> >>
> > >> >> >-----Original Message-----
> > >> >> >From: Michael S. Tsirkin <mst@redhat.com>
> > >> >> >Sent: Wednesday, 20 November, 2019 15:23
> > >> >> >To: Vitaly Mireyno <vmireyno@marvell.com>
> > >> >> >Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
> > >> >> >open.org
> > >> >> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net:
> > >> >> >Add
> > >> >> >equal- sized receive buffers feature
> > >> >> >
> > >> >> >On Thu, Nov 14, 2019 at 03:49:59PM +0000, Vitaly Mireyno wrote:
> > >> >> >>
> > >> >> >>
> > >> >> >> >-----Original Message-----
> > >> >> >> >From: Michael S. Tsirkin <mst@redhat.com>
> > >> >> >> >Sent: Monday, 11 November, 2019 17:05
> > >> >> >> >To: Vitaly Mireyno <vmireyno@marvell.com>
> > >> >> >> >Cc: Jason Wang <jasowang@redhat.com>;
> > >> >> >> >virtio-comment@lists.oasis- open.org
> > >> >> >> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net:
> > >> >> >> >Add
> > >> >> >> >equal- sized receive buffers feature
> > >> >> >> >
> > >> >> >> >On Mon, Nov 11, 2019 at 01:58:51PM +0000, Vitaly Mireyno wrote:
> > >> >> >> >>
> > >> >> >> >>
> > >> >> >> >> >-----Original Message-----
> > >> >> >> >> >From: Michael S. Tsirkin <mst@redhat.com>
> > >> >> >> >> >Sent: Sunday, 10 November, 2019 17:35
> > >> >> >> >> >To: Vitaly Mireyno <vmireyno@marvell.com>
> > >> >> >> >> >Cc: Jason Wang <jasowang@redhat.com>;
> > >> >> >> >> >virtio-comment@lists.oasis- open.org
> > >> >> >> >> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net:
> > >> >> >> >> >Add
> > >> >> >> >> >equal- sized receive buffers feature
> > >> >> >> >> >
> > >> >> >> >> >On Sun, Nov 10, 2019 at 02:13:14PM +0000, Vitaly Mireyno wrote:
> > >> >> >> >> >>
> > >> >> >> >> >> >-----Original Message-----
> > >> >> >> >> >> >From: virtio-comment@lists.oasis-open.org
> > >> >> >> >> >> ><virtio-comment@lists.oasis- open.org> On Behalf Of Michael
> > >S.
> > >> >> >> >> >> >Tsirkin
> > >> >> >> >> >> >Sent: Tuesday, 5 November, 2019 20:52
> > >> >> >> >> >> >To: Jason Wang <jasowang@redhat.com>
> > >> >> >> >> >> >Cc: Vitaly Mireyno <vmireyno@marvell.com>;
> > >> >> >> >> >> >virtio-comment@lists.oasis- open.org
> > >> >> >> >> >> >Subject: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net:
> > >> >> >> >> >> >Add equal-sized receive buffers feature
> > >> >> >> >> >> >
> > >> >> >> >> >> >External Email
> > >> >> >> >> >> >
> > >> >> >> >> >> >---------------------------------------------------------
> > >> >> >> >> >> >---
> > >> >> >> >> >> >---
> > >> >> >> >> >> >---
> > >> >> >> >> >> >---
> > >> >> >> >> >> >- On Fri, Nov 01, 2019 at 03:22:26PM +0800, Jason Wang wrote:
> > >> >> >> >> >> >>
> > >> >> >> >> >> >> On 2019/11/1 下午12:09, Michael S. Tsirkin wrote:
> > >> >> >> >> >> >> > On Thu, Oct 31, 2019 at 10:46:55AM +0000, Vitaly
> > >> >> >> >> >> >> > Mireyno
> > >> >wrote:
> > >> >> >> >> >> >> > > The feature is limited to receive buffers only.
> > >> >> >> >> >> >> > > A driver decides on receive buffer length. The only
> > >> >> >> >> >> >> > > limitation is that this
> > >> >> >> >> >> >length has to be the same for all receive virtqueue buffers.
> > >> >> >> >> >> >> > > The driver configures receive buffer length to the
> > >> >> >> >> >> >> > > device during device
> > >> >> >> >> >> >initialization, and the device reads it and may use it
> > >> >> >> >> >> >for optimal
> > >> >> >> >operation.
> > >> >> >> >> >> >> > > No changes for transmit buffers.
> > >> >> >> >> >> >> > >
> > >> >> >> >> >> >> > > -----Original Message-----
> > >> >> >> >> >> >> > > From: virtio-comment@lists.oasis-open.org
> > >> >> >> >> >> >> > > <virtio-comment@lists.oasis-open.org> On Behalf Of
> > >> >> >> >> >> >> > > Jason Wang
> > >> >> >> >> >> >> > > Sent: Thursday, 31 October, 2019 12:15
> > >> >> >> >> >> >> > > To: Vitaly Mireyno <vmireyno@marvell.com>;
> > >> >> >> >> >> >> > > virtio-comment@lists.oasis-open.org
> > >> >> >> >> >> >> > > Cc: Michael S. Tsirkin <mst@redhat.com>
> > >> >> >> >> >> >> > > Subject: [virtio-comment] Re: [PATCH] virtio-net:
> > >> >> >> >> >> >> > > Add equal-sized receive buffers feature
> > >> >> >> >> >> >> > >
> > >> >> >> >> >> >> > >
> > >> >> >> >> >> >> > > On 2019/10/31 下午5:23, Vitaly Mireyno wrote:
> > >> >> >> >> >> >> > > > Some devices benefit from working with receive
> > >> >> >> >> >> >> > > > buffers of a
> > >> >> >> >> >> >predefined constant length.
> > >> >> >> >> >> >> > > > Add a feature for drivers that allocate
> > >> >> >> >> >> >> > > > equal-sized receive buffers, and
> > >> >> >> >> >> >devices that benefit from predefined receive buffers length.
> > >> >> >> >> >> >> > > >
> > >> >> >> >> >> >> > > > Signed-off-by: Vitaly Mireyno
> > >> >> >> >> >> >> > > > <vmireyno@marvell.com>
> > >> >> >> >> >> >> > > > ---
> > >> >> >> >> >> >> > > >    content.tex | 29 +++++++++++++++++++++++++++--
> > >> >> >> >> >> >> > > >    1 file changed, 27 insertions(+), 2
> > >> >> >> >> >> >> > > > deletions(-)
> > >> >> >> >> >> >> > >
> > >> >> >> >> >> >> > > Is there any other networking device that has this
> > >feature?
> > >> >> >> >> >> >> > >
> > >> >> >> >> >> >> > >
> > >> >> >> >> >> >> > > > diff --git a/content.tex b/content.tex index
> > >> >> >> >> >> >> > > > b1ea9b9..c9e67c8
> > >> >> >> >> >> >> > > > 100644
> > >> >> >> >> >> >> > > > --- a/content.tex
> > >> >> >> >> >> >> > > > +++ b/content.tex
> > >> >> >> >> >> >> > > > @@ -2811,6 +2811,10 @@ \subsection{Feature
> > >> >> >> >> >> >> > > > bits}\label{sec:Device
> > >> >> >> >> >> >Types / Network Device / Feature bits
> > >> >> >> >> >> >> > > >    \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set
> > >MAC
> > >> >> >> >> >> >> > > > address
> > >> >> >> >> >> >through control
> > >> >> >> >> >> >> > > >        channel.
> > >> >> >> >> >> >> > > > +\item[VIRTIO_NET_F_CONST_RXBUF_LEN(58)] Driver
> > >> >> >> >> >> >> > > > +allocates all
> > >> >> >> >> >> >receive buffers
> > >> >> >> >> >> >> > > > +    of the same constant length. Device benefits
> > >> >> >> >> >> >> > > > + from working
> > >> >> >> >with
> > >> >> >> >> >> >> > > > +    receive buffers of equal length.
> > >> >> >> >> >> >> > > > +
> > >> >> >> >> >> >> > Problem is, I don't think linux will use this for
> > >> >> >> >> >> >> > skbs since it breaks buffer accounting. This is
> > >> >> >> >> >> >> > because it is important to make skbs as small as you
> > >> >> >> >> >> >> > can. So even if you see "device would
> > >> >> >> >benefit"
> > >> >> >> >> >> >> > there is no way to balance this with the benefit to linux.
> > >> >> >> >> >> >> > How do you know which benefit is bigger?
> > >> >> >> >> >> >>
> > >> >> >> >> >> >>
> > >> >> >> >> >> >> I guess the idea is e.g for Linux driver, it can refuse this
> > >feature.
> > >> >> >> >> >> >
> > >> >> >> >> >> >Okay. What uses it?
> > >> >> >> >> >> >
> > >> >> >> >> >> >>
> > >> >> >> >> >> >> >
> > >> >> >> >> >> >> > You also never explained how does device benefit. My
> > >> >> >> >> >> >> > guess is this allows device to calculate the # of
> > >> >> >> >> >> >> > descriptors to fetch that are needed for a packet. Right?
> > >> >> >> >> >> >> > Assuming this, I think a rough estimate should be enough.
> > >> >> >> >> >> >> > If device fetches too much it can discard extra, if
> > >> >> >> >> >> >> > it does not fetch enough it can fetch extra.
> > >> >> >> >> >> >> >
> > >> >> >> >> >> >> > Let us not forget that express packets are 256 bytes
> > >> >> >> >> >> >> > so up to
> > >> >> >> >> >> >> > 16 descriptors fit in a packet, there is no benefit
> > >> >> >> >> >> >> > most of the time in knowing whether e.g. 1 or 2
> > >> >> >> >> >> >> > descriptors are
> > >> >needed.
> > >> >> >> >> >> >> >
> > >> >> >> >> >> >> >
> > >> >> >> >> >> >> > Let us not forget these are buffers, not descriptors.
> > >> >> >> >> >> >>
> > >> >> >> >> >> >>
> > >> >> >> >> >> >> I guess maybe the initial motivation is constant
> > >> >> >> >> >> >> descriptor
> > >> >length.
> > >> >> >> >> >> >
> > >> >> >> >> >> >
> > >> >> >> >> >> >That conflicts with requirement framing is up to driver.
> > >> >> >> >> >> >
> > >> >> >> >> >> >
> > >> >> >> >> >> I was using the wrong terminology. The feature is about
> > >> >> >> >> >> constant
> > >> >> >> >> >*descriptor* length. In other words, the value that driver
> > >> >> >> >> >puts in Packed Virtqueue "Element Length" field (or 'len'
> > >> >> >> >> >field in the
> > >> >> >'pvirtq_desc'
> > >> >> >> >struct).
> > >> >> >> >> >
> > >> >> >> >> >OK so this conflicts with "2.6.4 Message Framing" which
> > >> >> >> >> >requires that drivers can split buffers into as many
> > >> >> >> >> >descriptors as
> > >> >they like.
> > >> >> >> >> >Right?
> > >> >> >> >> >
> > >> >> >> >> >> Does it make more sense now, or you still see an issue
> > >> >> >> >> >> with Linux
> > >> >> >driver?
> > >> >> >> >> >
> > >> >> >> >> >I think there's an issue with the flexible framing
> > >> >> >> >> >requirements and there's an issue with Linux driver.
> > >> >> >> >> >
> > >> >> >> >> >> The motivation is to allow the device to calculate the
> > >> >> >> >> >> exact number of
> > >> >> >> >> >descriptors to consume, before fetching the descriptors.
> > >> >> >> >> >This is beneficial for devices for which overconsuming or
> > >> >> >> >> >underconsuming descriptors come at a high cost.
> > >> >> >> >> >
> > >> >> >> >> >So I guess we can agree getting the # of descriptors
> > >> >> >> >> >*exactly* right isn't all that important. Right? My question
> > >> >> >> >> >is how does the driver balance the device versus Linux needs?
> > >> >> >> >> >
> > >> >> >> >> >One idea is if we assume this is best effort, does not have
> > >> >> >> >> >to be exact, then how about just having the device assume
> > >> >> >> >> >descriptor sizes stay more or less constant and use that to
> > >> >> >> >> >estimate the amount? If it under/over estimates, things just
> > >> >> >> >> >go a bit
> > >> >slower.
> > >> >> >> >> >This way driver can adjust the sizes and device will react
> > >> >> >> >> >automatically, with time.
> > >> >> >> >> >
> > >> >> >> >> >
> > >> >> >> >>
> > >> >> >> >> I see that "2.6.4 Message Framing" allows a full flexibility
> > >> >> >> >> of descriptor lengths, and I presume it's applicable to
> > >> >> >> >> Packet Virtqueues as well, though defined under Split Virtqueues
> > >section.
> > >> >> >> >
> > >> >> >> >That's a good point. Probably makes sense to move it out to a
> > >> >> >> >common section, right?
> > >> >> >> >
> > >> >> >> >> The example
> > >> >> >> >> talks about transmit virtqueue, and it makes perfect sense.
> > >> >> >> >> However, wouldn't a typical driver place equal-sized
> > >> >> >> >> *receive* descriptors
> > >> >> >> >anyway? So if a device can benefit from it, the driver might as
> > >> >> >> >well negotiate this new feature.
> > >> >> >> >
> > >> >> >> >Having buffer size jump around wildly doesn't seem too useful, I
> > >agree.
> > >> >> >> >OTOH being able to adjust it gradually has been in the past
> > >> >> >> >demontrated to help performance measureably.
> > >> >> >> >
> > >> >> >> >> This could be especially relevant for devices, for which
> > >> >> >> >> adjusting the number
> > >> >> >> >of used descriptors is impractical after descriptors have
> > >> >> >> >already been
> > >> >> >fetched.
> > >> >> >> >> I agree that if this requirement conflicts with specific
> > >> >> >> >> driver needs, it will not
> > >> >> >> >be negotiated, and the device will either underperform in
> > >> >> >> >certain scenarios, or will not come up at all.
> > >> >> >> >
> > >> >> >> >Right so that makes it a challenge. Device says it prefers
> > >> >> >> >fixed buffers. Is that preference more or less important than
> > >> >> >> >ability to efficiently support workloads such as TCP small queues?
> > >> >> >> >Driver has no idea and I suspect neither does the device.
> > >> >> >> >So I don't see how will a Linux driver know that it should
> > >> >> >> >enable this, neither how will device know it's okay to just refuse
> > >features_ok.
> > >> >> >> >
> > >> >> >> >
> > >> >> >> >On the other hand, current drivers already have logic that
> > >> >> >> >tries to change buffer sizes in a smooth way.  So simply
> > >> >> >> >caching the last buffer size and assuming all the others will
> > >> >> >> >be exactly the same will go a long way towards limiting how
> > >> >> >> >much does device need to
> > >> >fetch.
> > >> >> >> >This does imply extra logic on the device to recover if things
> > >> >> >> >change and the first read did not fetch enough buffers, but
> > >> >> >> >then it would be required anyway if as you say above the
> > >> >> >> >failure is not
> > >> >catastrophic.
> > >> >> >> >
> > >> >> >> >
> > >> >> >> >The feature thus only seems useful for small,
> > >> >> >> >feature-restrained devices
> > >> >> >> >- which are prepared to sacrifice some performance to cut
> > >> >> >> >costs, and which can't recover at all. Is this what you are trying to do?
> > >> >> >> >
> > >> >> >>
> > >> >> >> The intention is to enable devices with this specific limitation
> > >> >> >> to offer virtio offload.  The feature can be redefined such that
> > >> >> >> it would only be offered by devices that are unable to handle
> > >> >> >> dynamically changing descriptor lengths. How does that sound?
> > >> >> >
> > >> >> >So it makes more sense when mergeable buffers are disabled (since
> > >> >> >then buffers are practically all same size).
> > >> >> >
> > >> >>
> > >> >> Actually, mergeable buffers are not a problem. They could be
> > >> >> enabled, as long as each descriptor is the same length. So
> > >> >> implementations that prefer optimizing memory utilization over
> > >> >> jumbo frame performance can negotiate VIRTIO_NET_F_MRG_RXBUF
> > >and allocate smaller buffers.
> > >> >
> > >> >So my point was, without VIRTIO_NET_F_MRG_RXBUF, all buffers are
> > >same
> > >> >length anyway. So if we are talking about cheap simple hardware, I
> > >> >guess it can just not have VIRTIO_NET_F_MRG_RXBUF and be done with
> > >it?
> > >> >If device does care about memory utilization then I think it needs to
> > >> >give driver the flexibility it needs/uses.
> > >> >No?
> > >> >
> > >>
> > >> It's not a matter of device complexity, but a HW architecture, which could be
> > >complex, but have this specific limitation.
> > >
> > >So - don't do it then?
> > >
> > >> I see at least two reasons to support and negotiate
> > >VIRTIO_NET_F_MRG_RXBUF, while keeping equal-sized descriptor limitation:
> > >>  * GRO
> > >
> > >You mean LRO?
> > >
> > >>  * With jumbo packets, if the throughput is capped by the port bandwidth,
> > >and not by the device/driver per-packet performance, it makes sense to
> > >optimize memory utilization by allocating small buffers, without sacrificing
> > >throughput performance.
> > >
> > >So we are back to square one, if driver cares about memory utilization with 1K
> > >packets vs 9K buffers, why not with 100byte packets vs 1K buffers? Looks like
> > >exactly the same tradeoff.
> > >
> > >
> > >>
> > >> >Again I can see how we might want to disallow crazy setups with e.g.
> > >> >1 byte per buffer. That's just abuse, no guest does that anyway. So
> > >> >asking e.g. for a minimal buffer size sounds very reasonable.  But an
> > >> >option that disables functionality that a popular guest uses needs a
> > >> >lot of documentation to help device writers figure out whether they
> > >> >want that option or not, and I'd worry that even with documentation will
> > >be misunderstood even if we write it.
> > >> >When do you enable this?
> > >> >When you don't care about performance? When you don't care about
> > >Linux?
> > >> >
> > >>
> > >> I understand that there are guest-side optimizations that require flexibility
> > >in descriptors length. I can propose the following simple logic:
> > >> Device - advertise "equal-sized descriptor" feature only if the device is
> > >absolutely unable to operate otherwise. The device will not set FEATURES_OK
> > >unless the driver negotiates this feature.
> > >> Driver - if device advertises "equal-sized descriptor" feature - if possible,
> > >give up on the flexible descriptors length optimizations. If not - give up on the
> > >device.
> > >
> > >Yes, that's possible. Looks like a rather limited feature, and i'd rather we
> > >focused on something more widely applicable, but with enough disclamers
> > >that devices SHOULD NOT set this bit we can maybe do that.
> > >
> > 
> > I agree that this feature is more of a limitation declaration, rather than an enhancement, but let me emphasize that its only purpose is to make more *existing* HW devices be virtio compatible. 
> > This feature will allow such HW devices to offer virtio offload with VIRTIO_NET_F_MRG_RXBUF capability for GRO/LRO offload.
> > New device designs should definitely avoid employing this feature.
> 
> I understand, my question is: can you find a way to make this more
> generally useful? Something along the lines of the min buffer size
> suggestion which would avoid creating a special case in the
> driver? Any idea?

as an example of a vague idea, exposing max rx s/g, min buffer + max
buffer will allow device to force this from device side.

Is that good? If not, how does driver decide on a good fixed buffer size?

> 
> > >
> > >>
> > >> >> >
> > >> >> >With that in mind, I have an idea: scsi and block already have max sg
> > >field.
> > >> >> >How about we add a writeable max sg field, maybe even make it
> > >> >> >programmable per vq?
> > >> >> >
> > >> >> >Thus driver tells device what is it's max s/g value for rx. Worst
> > >> >> >case device fetches a bit more than it needs. Discarding extra
> > >> >> >shouldn't be expensive. This looks like it will help even smart
> > >> >> >devices. What
> > >> >do you think?
> > >> >> >
> > >> >> >This nicely avoids conflicting with the flexible framing requirement.
> > >> >> >
> > >> >>
> > >> >> My intention was to avoid any descriptor length variations, for
> > >> >> devices that unable fetching or discarding extra descriptors. (If
> > >> >> in the pipelined HW processing the decision on number of
> > >> >> descriptors is made in the early stage, and it cannot be undone in a later
> > >stage).
> > >> >
> > >> >Frankly discarding unused descriptors looks to me like something that
> > >> >shouldn't have a high cost in the hardware.  I can see how trying to
> > >> >predict descriptor length, and fetching extra if not enough was
> > >> >fetched can have a high cost, so to me extensions to remove the
> > >> >guesswork from the device have value.  However a lot of effort went
> > >> >into trying to reduce e.g. number of pci express packets needed to fetch
> > >descriptors.
> > >> >Each packet fetches 256 bytes anyway, does it not?  Not having an
> > >> >ability to use that information seems like an obvious waste, and that
> > >> >means ability to keep some fetched descriptors around even if they
> > >> >are not used for a given packet. Again just my $.02.
> > >> >
> > >>
> > >> In packed virtqueue, discarding unused descriptors (and buffers associated
> > >with them) can indeed be easy, but reusing them for the next packet is
> > >complicated (or impossible).
> > >> I agree that descriptors are being (pre)fetched with the maximum efficiency
> > >(in terms of PCIe bandwidth), and cached in the device. But the decision to
> > >fetch is being made according to the number of left cached-in descriptors and
> > >the expected number of descriptors that will be used by the packet.
> > >> If the expected number of descriptors is larger than the actual one, the next
> > >fetch decision will be taken too early, and there will be no way to reuse excess
> > >cached descriptors, and they will have to be discarded.
> > >
> > >> Even if it's possible to skip descriptors in the packed virtqueue (is it?), it's
> > >surely inefficient.
> > >
> > >OK I think I understand what you are doing. Device is getting buffers 1,2,3 for
> > >packet 1, it is meanwhile receiving packet
> > >2 and decides to get buffers 4,5 for packet 2.
> > >At this point it finally gets buffers 1-3 and figures out that buffers 3 was not
> > >needed, but possibly it already started writing packet 2 into buffers 4.
> > >What to do about buffers 3 now?
> > >
> > >Is above a good example?
> > >
> > >If yes then it looks like you are unaware that Descriptors can be used out of
> > >order, with split or packed ring, with no issues.  Looks like exactly what you
> > >need to address this issue?
> > >So device will simply proceed with marking buffers 1,2,4,5 as used, and store
> > >buffers 3 in some kind of internal memory and use it for the next packet.
> > >
> > >This is exactly the kind of thing out of order was designed for.
> > >
> > >Does this answer the question?
> > >
> > 
> > The example is good, and the suggested solution is clear. However if we're talking about HW device that is designed to process packets/buffers in order, this solution could not be applicable.
> 
> 
> 
> > >>
> > >> >>
> > >> >> Defining max s/g sounds like an interesting feature by itself.
> > >> >
> > >> >But assuming we have max RX s/g, I guess hardware can set max s/g = 1?
> > >> >Then since with !VIRTIO_NET_F_MRG_RXBUF all buffers are forced to be
> > >> >same length.
> > >> >
> > >> >>
> > >> >> >
> > >> >> >> >> Could you specify what issue do you see with the Linux driver?
> > >> >> >> >
> > >> >> >> >See logic around struct ewma.
> > >> >> >> >
> > >> >> >> >The first relevant commit is I guess commit
> > >> >> >> >ab7db91705e95ed1bba1304388936fccfa58c992
> > >> >> >> >    virtio-net: auto-tune mergeable rx buffer size for improved
> > >> >> >> >performance
> > >> >> >> >
> > >> >> >> >this claims a gain of about 10% with large packets which isn't
> > >> >> >> >earth shattering but also not something we can ignore completely.
> > >> >> >> >And I suspect it can be bigger with smaller packets.
> > >> >> >> >
> > >> >> >> >> >> >>
> > >> >> >> >> >> >> >
> > >> >> >> >> >> >> > So device does not really know the exact # of descriptors:
> > >> >> >> >> >> >> > current drivers do 1 descriptor per buffer but
> > >> >> >> >> >> >> > nothing prevents more.
> > >> >> >> >> >> >> >
> > >> >> >> >> >> >> >
> > >> >> >> >> >> >> > Thoughts?
> > >> >> >> >> >> >> >
> > >> >> >> >> >> >> >
> > >> >> >> >> >> >> > > >    \item[VIRTIO_NET_F_RSC_EXT(61)] Device can
> > >> >> >> >> >> >> > > > process duplicated
> > >> >> >> >> >> >ACKs
> > >> >> >> >> >> >> > > >        and report number of coalesced segments
> > >> >> >> >> >> >> > > > and duplicated ACKs @@ -2854,8 +2858,8 @@
> > >> >> >> >\subsubsection{Legacy Interface:
> > >> >> >> >> >> >Feature bits}\label{sec:Device Types / Network
> > >> >> >> >> >> >> > > >    \subsection{Device configuration
> > >> >> >> >> >> >> > > > layout}\label{sec:Device Types /
> > >> >> >> >> >> >Network Device / Device configuration layout}
> > >> >> >> >> >> >> > > >    \label{sec:Device Types / Block Device /
> > >> >> >> >> >> >> > > > Feature bits / Device configuration layout}
> > >> >> >> >> >> >> > > > -Three driver-read-only configuration fields are
> > >currently defined.
> > >> >> >> >> >> >> > > > The \field{mac} address field -always exists
> > >> >> >> >> >> >> > > > (though is only valid if VIRTIO_NET_F_MAC is
> > >> >> >> >> >> >> > > > set), and
> > >> >> >> >> >> >> > > > +The driver-read-only \field{mac} address field
> > >> >> >> >> >> >> > > > +always exists (though is only valid if
> > >> >> >> >> >> >> > > > +VIRTIO_NET_F_MAC is set), and
> > >> >> >> >> >> >> > > >    \field{status} only exists if VIRTIO_NET_F_STATUS is
> > >set.
> > >> >> >Two
> > >> >> >> >> >> >> > > >    read-only bits (for the driver) are currently
> > >> >> >> >> >> >> > > > defined for the status
> > >> >> >> >> >> >field:
> > >> >> >> >> >> >> > > >    VIRTIO_NET_S_LINK_UP and
> > >> >VIRTIO_NET_S_ANNOUNCE.
> > >> >> >> >> >> >> > > > @@ -2875,12 +2879,17 @@ \subsection{Device
> > >> >> >> >> >> >> > > > configuration
> > >> >> >> >> >> >layout}\label{sec:Device Types / Network Device
> > >> >> >> >> >> >> > > >    VIRTIO_NET_F_MTU is set. This field specifies
> > >> >> >> >> >> >> > > > the maximum MTU for
> > >> >> >> >> >> >the driver to
> > >> >> >> >> >> >> > > >    use.
> > >> >> >> >> >> >> > > > +The device-read-only field \field{rx_buf_len}
> > >> >> >> >> >> >> > > > +only exists if
> > >> >> >> >> >> >> > >
> > >> >> >> >> >> >> > > Should be driver-read-only.
> > >> >> >> >> >> >> > >
> > >> >> >> >> >> >> > >
> > >> >> >> >> >> >> > > > +VIRTIO_NET_F_CONST_RXBUF_LEN is negotiated.
> > >This
> > >> >> >> >> >> >> > > > +field specifies the receive buffers length.
> > >> >> >> >> >> >> > > > +
> > >> >> >> >> >> >> > > >    \begin{lstlisting}
> > >> >> >> >> >> >> > > >    struct virtio_net_config {
> > >> >> >> >> >> >> > > >            u8 mac[6];
> > >> >> >> >> >> >> > > >            le16 status;
> > >> >> >> >> >> >> > > >            le16 max_virtqueue_pairs;
> > >> >> >> >> >> >> > > >            le16 mtu;
> > >> >> >> >> >> >> > > > +        le32 rx_buf_len;
> > >> >> >> >> >> >> > > >    };
> > >> >> >> >> >> >> > > >    \end{lstlisting} @@ -2933,6 +2942,13 @@
> > >> >> >> >> >> >> > > > \subsection{Device configuration
> > >> >> >> >> >> >> > > > layout}\label{sec:Device Types / Network
> > >> >Device
> > >> >> >> >> >> >> > > >    A driver SHOULD negotiate the
> > >> >> >> >> >> >> > > > VIRTIO_NET_F_STANDBY feature if
> > >> >> >> >> >> >the device offers it.
> > >> >> >> >> >> >> > > > +A driver SHOULD accept the
> > >> >> >> >VIRTIO_NET_F_CONST_RXBUF_LEN
> > >> >> >> >> >> >feature if offered.
> > >> >> >> >> >> >> > > > +
> > >> >> >> >> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has
> > >been
> > >> >> >> >> >negotiated,
> > >> >> >> >> >> >> > > > +the driver MUST set \field{rx_buf_len}.
> > >> >> >> >> >> >> > >
> > >> >> >> >> >> >> > > I think it's device that set the field?
> > >> >> >> >> >> >> > Makes more sense for the driver, but if you want this set
> > >e.g.
> > >> >> >> >> >> >> > before buffers are added, you must say so.
> > >> >> >> >> >> >> >
> > >> >> >> >> >> >> > > > +
> > >> >> >> >> >> >> > > > +A driver MUST NOT modify \field{rx_buf_len} once
> > >> >> >> >> >> >> > > > +it has been
> > >> >> >> >> >set.
> > >> >> >> >> >> >> > > > +
> > >> >> >> >> >> >> >
> > >> >> >> >> >> >> > This seems very unflexible. I can see how e.g. XDP
> > >> >> >> >> >> >> > would benefit from big buffers while skbs benefit
> > >> >> >> >> >> >> > from small
> > >> >buffers.
> > >> >> >> >> >> >> >
> > >> >> >> >> >> >> > This calls for ability to change this.
> > >> >> >> >> >> >>
> > >> >> >> >> >> >>
> > >> >> >> >> >> >> Yes, but it requires non trivial cleanups for the old
> > >> >> >> >> >> >> length and place them with new ones.
> > >> >> >> >> >> >>
> > >> >> >> >> >> >> Thanks
> > >> >> >> >> >> >
> > >> >> >> >> >> >
> > >> >> >> >> >> >Let's see:
> > >> >> >> >> >> >1	- making buffer smaller: just update config space,
> > >> >> >> >> >> >	  then make new buffers smaller
> > >> >> >> >> >> >
> > >> >> >> >> >> >2	- making buffers bigger: add larger buffers,
> > >> >> >> >> >> >	once all small ones are consumed update config space
> > >> >> >> >> >> >
> > >> >> >> >> >> >2 is tricky I agree. Thoughts?
> > >> >> >> >> >> >
> > >> >> >> >> >> >
> > >> >> >> >> >> Agree. It's doable, provided that the driver will follow
> > >> >> >> >> >> the update
> > >> >> >> >> >procedure.
> > >> >> >> >> >> >
> > >> >> >> >> >> >>
> > >> >> >> >> >> >> >
> > >> >> >> >> >> >> > > >    \subsubsection{Legacy Interface: Device
> > >> >> >> >> >> >> > > > configuration
> > >> >> >> >> >> >layout}\label{sec:Device Types / Network Device / Device
> > >> >> >> >> >> >configuration layout / Legacy Interface: Device
> > >> >> >> >> >> >configuration layout}
> > >> >> >> >> >> >> > > >    \label{sec:Device Types / Block Device /
> > >> >> >> >> >> >> > > > Feature bits / Device
> > >> >> >> >> >> >configuration layout / Legacy Interface: Device
> > >> >> >> >> >> >configuration layout}
> > >> >> >> >> >> >> > > >    When using the legacy interface, transitional
> > >> >> >> >> >> >> > > > devices and drivers @@
> > >> >> >> >> >> >> > > > -3241,6 +3257,11 @@ \subsubsection{Setting Up
> > >> >> >> >> >> >> > > > Receive
> > >> >> >> >> >> >Buffers}\label{sec:Device Types / Network Devi
> > >> >> >> >> >> >> > > >    If VIRTIO_NET_F_MQ is negotiated, each of
> > >> >> >> >> >> >> > > > receiveq1\ldots
> > >> >> >> >> >> >receiveqN
> > >> >> >> >> >> >> > > >    that will be used SHOULD be populated with
> > >> >> >> >> >> >> > > > receive
> > >> >> >buffers.
> > >> >> >> >> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has
> > >been
> > >> >> >> >> >negotiated,
> > >> >> >> >> >> >> > > > +the driver MUST initialize all receive virtqueue
> > >> >> >> >> >> >> > > > +descriptors \field{len} field with the value
> > >> >> >> >> >> >> > > > +configured in \field{rx_buf_len} device
> > >> >> >> >> >> >> > > > +configuration field, and allocate receive
> > >> >> >> >> >> >buffers accordingly.
> > >> >> >> >> >> >> > > > +
> > >> >> >> >> >> >> > > >    \devicenormative{\paragraph}{Setting Up
> > >> >> >> >> >> >> > > > Receive Buffers}{Device Types / Network Device /
> > >> >> >> >> >> >> > > > Device Operation / Setting Up Receive Buffers}
> > >> >> >> >> >> >> > > >    The device MUST set \field{num_buffers} to the
> > >> >> >> >> >> >> > > > number of descriptors used to @@ -3396,6 +3417,10
> > >> >> >> >> >> >> > > > @@
> > >> >> >> >> >> >\subsubsection{Processing of Incoming
> > >> >> >> >> >> >Packets}\label{sec:Device Types / Network
> > >> >> >> >> >> >> > > >    checksum (in case of multiple encapsulated
> > >> >> >> >> >> >> > > > protocols, one
> > >> >> >> >level
> > >> >> >> >> >> >> > > >    of checksums is validated).
> > >> >> >> >> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN has been
> > >> >> >> >> >> >> > > > +negotiated,
> > >> >> >> >the
> > >> >> >> >> >> >device
> > >> >> >> >> >> >> > > > +MAY use \field{rx_buf_len} as a buffer length
> > >> >> >> >> >> >> > > > +(instead of reading it from virtqueue descriptor
> > >> >\field{len} field).
> > >> >> >> >> >> >> > >
> > >> >> >> >> >> >> > > Is this safe? What if driver submit a small buffer,
> > >> >> >> >> >> >> > > then device can read
> > >> >> >> >> >> >more than what is allowed?
> > >> >> >> >> >> >> > >
> > >> >> >> >> >> >> > > Thanks
> > >> >> >> >> >> >> > >
> > >> >> >> >> >> >> > >
> > >> >> >> >> >> >> > > > +
> > >> >> >> >> >> >> > > >    \drivernormative{\paragraph}{Processing of Incoming
> > >> >> >> >> >> >> > > >    Packets}{Device Types / Network Device /
> > >> >> >> >> >> >> > > > Device Operation
> > >> >> >/
> > >> >> >> >> >> >> > > >    Processing of Incoming Packets}
> > >> >
> > >> >
> > >> >This publicly archived list offers a means to provide input to the
> > >> >OASIS Virtual I/O Device (VIRTIO) TC.
> > >> >
> > >> >In order to verify user consent to the Feedback License terms and to
> > >> >minimize spam in the list archive, subscription is required before posting.
> > >> >
> > >> >Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> > >> >Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> > >> >List help: virtio-comment-help@lists.oasis-open.org
> > >> >List archive: https://urldefense.proofpoint.com/v2/url?u=https-
> > >> >3A__lists.oasis-2Dopen.org_archives_virtio-
> > >>
> > >>2Dcomment_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3l
> > >q
> > >> >qsArgFRdcevq01tbLQAw4A_NO7xgI&m=gIIx9_eEGj-aDaM6Z-
> > >> >42yWtI9MnZcqZ2Gw7KCN7EgCg&s=-
> > >> >JICLquqUnNye7tilUS67AFv7opngsKEl5L75acB64U&e=
> > >> >Feedback License: https://urldefense.proofpoint.com/v2/url?u=https-
> > >> >3A__www.oasis-2Dopen.org_who_ipr_feedback-
> > >>
> > >>5Flicense.pdf&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lq
> > >q
> > >> >sArgFRdcevq01tbLQAw4A_NO7xgI&m=gIIx9_eEGj-aDaM6Z-
> > >>
> > >>42yWtI9MnZcqZ2Gw7KCN7EgCg&s=4e2kWmSdPAtGMXBTHwfgNE_KOZdDU
> > >R
> > >> >Wsji73HWdVF3A&e=
> > >> >List Guidelines: https://urldefense.proofpoint.com/v2/url?u=https-
> > >> >3A__www.oasis-2Dopen.org_policies-2Dguidelines_mailing-
> > >>
> > >>2Dlists&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsArgF
> > >R
> > >> >dcevq01tbLQAw4A_NO7xgI&m=gIIx9_eEGj-aDaM6Z-
> > >> >42yWtI9MnZcqZ2Gw7KCN7EgCg&s=i-dQn5G-
> > >> >auoFtCxN8Y2PN8UccM1ezgcrsT2A8T1H8wE&e=
> > >> >Committee: https://urldefense.proofpoint.com/v2/url?u=https-
> > >> >3A__www.oasis-
> > >>
> > >>2Dopen.org_committees_virtio_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ
> > >&
> > >> >r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=gIIx9_eEGj-
> > >> >aDaM6Z-42yWtI9MnZcqZ2Gw7KCN7EgCg&s=kcv-jA-_-JC3v64-_r5iP-
> > >> >XQ9rhyaeOKvrHJNHwZrLc&e=
> > >> >Join OASIS: https://urldefense.proofpoint.com/v2/url?u=https-
> > >> >3A__www.oasis-
> > >>
> > >>2Dopen.org_join_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52
> > >o
> > >> >J3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=gIIx9_eEGj-aDaM6Z-
> > >>
> > >>42yWtI9MnZcqZ2Gw7KCN7EgCg&s=cKbLeI_5Fu9G7aybE5u51yISB0eRer6BvC
> > >xr
> > >> >wgd5HS4&e=
> > >>
> > 



This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature
  2019-12-01 21:43                               ` Michael S. Tsirkin
@ 2019-12-02 15:09                                 ` Vitaly Mireyno
  2019-12-02 15:23                                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 28+ messages in thread
From: Vitaly Mireyno @ 2019-12-02 15:09 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Jason Wang, virtio-comment



>-----Original Message-----
>From: Michael S. Tsirkin <mst@redhat.com>
>Sent: Sunday, 1 December, 2019 23:44
>To: Vitaly Mireyno <vmireyno@marvell.com>
>Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
>open.org
>Subject: Re: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH]
>virtio-net: Add equal-sized receive buffers feature
>
>On Sun, Dec 01, 2019 at 08:07:55AM -0500, Michael S. Tsirkin wrote:
>> On Sun, Dec 01, 2019 at 10:22:17AM +0000, Vitaly Mireyno wrote:
>> >
>> >
>> > >-----Original Message-----
>> > >From: Michael S. Tsirkin <mst@redhat.com>
>> > >Sent: Wednesday, 27 November, 2019 15:51
>> > >To: Vitaly Mireyno <vmireyno@marvell.com>
>> > >Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
>> > >open.org
>> > >Subject: Re: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re:
>> > >[PATCH]
>> > >virtio-net: Add equal-sized receive buffers feature
>> > >
>> > >On Tue, Nov 26, 2019 at 05:27:31PM +0000, Vitaly Mireyno wrote:
>> > >>
>> > >>
>> > >> >-----Original Message-----
>> > >> >From: virtio-comment@lists.oasis-open.org
>> > >> ><virtio-comment@lists.oasis- open.org> On Behalf Of Michael S.
>> > >> >Tsirkin
>> > >> >Sent: Sunday, 24 November, 2019 17:30
>> > >> >To: Vitaly Mireyno <vmireyno@marvell.com>
>> > >> >Cc: Jason Wang <jasowang@redhat.com>;
>> > >> >virtio-comment@lists.oasis- open.org
>> > >> >Subject: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re:
>> > >> >[PATCH]
>> > >> >virtio-
>> > >> >net: Add equal-sized receive buffers feature
>> > >> >
>> > >> >On Sun, Nov 24, 2019 at 03:02:05PM +0000, Vitaly Mireyno wrote:
>> > >> >>
>> > >> >>
>> > >> >> >-----Original Message-----
>> > >> >> >From: Michael S. Tsirkin <mst@redhat.com>
>> > >> >> >Sent: Wednesday, 20 November, 2019 15:23
>> > >> >> >To: Vitaly Mireyno <vmireyno@marvell.com>
>> > >> >> >Cc: Jason Wang <jasowang@redhat.com>;
>> > >> >> >virtio-comment@lists.oasis- open.org
>> > >> >> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net:
>> > >> >> >Add
>> > >> >> >equal- sized receive buffers feature
>> > >> >> >
>> > >> >> >On Thu, Nov 14, 2019 at 03:49:59PM +0000, Vitaly Mireyno wrote:
>> > >> >> >>
>> > >> >> >>
>> > >> >> >> >-----Original Message-----
>> > >> >> >> >From: Michael S. Tsirkin <mst@redhat.com>
>> > >> >> >> >Sent: Monday, 11 November, 2019 17:05
>> > >> >> >> >To: Vitaly Mireyno <vmireyno@marvell.com>
>> > >> >> >> >Cc: Jason Wang <jasowang@redhat.com>;
>> > >> >> >> >virtio-comment@lists.oasis- open.org
>> > >> >> >> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net:
>> > >> >> >> >Add
>> > >> >> >> >equal- sized receive buffers feature
>> > >> >> >> >
>> > >> >> >> >On Mon, Nov 11, 2019 at 01:58:51PM +0000, Vitaly Mireyno
>wrote:
>> > >> >> >> >>
>> > >> >> >> >>
>> > >> >> >> >> >-----Original Message-----
>> > >> >> >> >> >From: Michael S. Tsirkin <mst@redhat.com>
>> > >> >> >> >> >Sent: Sunday, 10 November, 2019 17:35
>> > >> >> >> >> >To: Vitaly Mireyno <vmireyno@marvell.com>
>> > >> >> >> >> >Cc: Jason Wang <jasowang@redhat.com>;
>> > >> >> >> >> >virtio-comment@lists.oasis- open.org
>> > >> >> >> >> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-
>net:
>> > >> >> >> >> >Add
>> > >> >> >> >> >equal- sized receive buffers feature
>> > >> >> >> >> >
>> > >> >> >> >> >On Sun, Nov 10, 2019 at 02:13:14PM +0000, Vitaly Mireyno
>wrote:
>> > >> >> >> >> >>
>> > >> >> >> >> >> >-----Original Message-----
>> > >> >> >> >> >> >From: virtio-comment@lists.oasis-open.org
>> > >> >> >> >> >> ><virtio-comment@lists.oasis- open.org> On Behalf Of
>> > >> >> >> >> >> >Michael
>> > >S.
>> > >> >> >> >> >> >Tsirkin
>> > >> >> >> >> >> >Sent: Tuesday, 5 November, 2019 20:52
>> > >> >> >> >> >> >To: Jason Wang <jasowang@redhat.com>
>> > >> >> >> >> >> >Cc: Vitaly Mireyno <vmireyno@marvell.com>;
>> > >> >> >> >> >> >virtio-comment@lists.oasis- open.org
>> > >> >> >> >> >> >Subject: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-
>net:
>> > >> >> >> >> >> >Add equal-sized receive buffers feature
>> > >> >> >> >> >> >
>> > >> >> >> >> >> >External Email
>> > >> >> >> >> >> >
>> > >> >> >> >> >> >----------------------------------------------------
>> > >> >> >> >> >> >-----
>> > >> >> >> >> >> >---
>> > >> >> >> >> >> >---
>> > >> >> >> >> >> >---
>> > >> >> >> >> >> >---
>> > >> >> >> >> >> >- On Fri, Nov 01, 2019 at 03:22:26PM +0800, Jason Wang
>wrote:
>> > >> >> >> >> >> >>
>> > >> >> >> >> >> >> On 2019/11/1 下午12:09, Michael S. Tsirkin wrote:
>> > >> >> >> >> >> >> > On Thu, Oct 31, 2019 at 10:46:55AM +0000, Vitaly
>> > >> >> >> >> >> >> > Mireyno
>> > >> >wrote:
>> > >> >> >> >> >> >> > > The feature is limited to receive buffers only.
>> > >> >> >> >> >> >> > > A driver decides on receive buffer length. The
>> > >> >> >> >> >> >> > > only limitation is that this
>> > >> >> >> >> >> >length has to be the same for all receive virtqueue
>buffers.
>> > >> >> >> >> >> >> > > The driver configures receive buffer length to
>> > >> >> >> >> >> >> > > the device during device
>> > >> >> >> >> >> >initialization, and the device reads it and may use
>> > >> >> >> >> >> >it for optimal
>> > >> >> >> >operation.
>> > >> >> >> >> >> >> > > No changes for transmit buffers.
>> > >> >> >> >> >> >> > >
>> > >> >> >> >> >> >> > > -----Original Message-----
>> > >> >> >> >> >> >> > > From: virtio-comment@lists.oasis-open.org
>> > >> >> >> >> >> >> > > <virtio-comment@lists.oasis-open.org> On
>> > >> >> >> >> >> >> > > Behalf Of Jason Wang
>> > >> >> >> >> >> >> > > Sent: Thursday, 31 October, 2019 12:15
>> > >> >> >> >> >> >> > > To: Vitaly Mireyno <vmireyno@marvell.com>;
>> > >> >> >> >> >> >> > > virtio-comment@lists.oasis-open.org
>> > >> >> >> >> >> >> > > Cc: Michael S. Tsirkin <mst@redhat.com>
>> > >> >> >> >> >> >> > > Subject: [virtio-comment] Re: [PATCH] virtio-net:
>> > >> >> >> >> >> >> > > Add equal-sized receive buffers feature
>> > >> >> >> >> >> >> > >
>> > >> >> >> >> >> >> > >
>> > >> >> >> >> >> >> > > On 2019/10/31 下午5:23, Vitaly Mireyno wrote:
>> > >> >> >> >> >> >> > > > Some devices benefit from working with
>> > >> >> >> >> >> >> > > > receive buffers of a
>> > >> >> >> >> >> >predefined constant length.
>> > >> >> >> >> >> >> > > > Add a feature for drivers that allocate
>> > >> >> >> >> >> >> > > > equal-sized receive buffers, and
>> > >> >> >> >> >> >devices that benefit from predefined receive buffers
>length.
>> > >> >> >> >> >> >> > > >
>> > >> >> >> >> >> >> > > > Signed-off-by: Vitaly Mireyno
>> > >> >> >> >> >> >> > > > <vmireyno@marvell.com>
>> > >> >> >> >> >> >> > > > ---
>> > >> >> >> >> >> >> > > >    content.tex | 29
>+++++++++++++++++++++++++++--
>> > >> >> >> >> >> >> > > >    1 file changed, 27 insertions(+), 2
>> > >> >> >> >> >> >> > > > deletions(-)
>> > >> >> >> >> >> >> > >
>> > >> >> >> >> >> >> > > Is there any other networking device that has
>> > >> >> >> >> >> >> > > this
>> > >feature?
>> > >> >> >> >> >> >> > >
>> > >> >> >> >> >> >> > >
>> > >> >> >> >> >> >> > > > diff --git a/content.tex b/content.tex index
>> > >> >> >> >> >> >> > > > b1ea9b9..c9e67c8
>> > >> >> >> >> >> >> > > > 100644
>> > >> >> >> >> >> >> > > > --- a/content.tex
>> > >> >> >> >> >> >> > > > +++ b/content.tex
>> > >> >> >> >> >> >> > > > @@ -2811,6 +2811,10 @@ \subsection{Feature
>> > >> >> >> >> >> >> > > > bits}\label{sec:Device
>> > >> >> >> >> >> >Types / Network Device / Feature bits
>> > >> >> >> >> >> >> > > >    \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set
>> > >MAC
>> > >> >> >> >> >> >> > > > address
>> > >> >> >> >> >> >through control
>> > >> >> >> >> >> >> > > >        channel.
>> > >> >> >> >> >> >> > > > +\item[VIRTIO_NET_F_CONST_RXBUF_LEN(58)]
>> > >> >> >> >> >> >> > > > +Driver allocates all
>> > >> >> >> >> >> >receive buffers
>> > >> >> >> >> >> >> > > > +    of the same constant length. Device
>> > >> >> >> >> >> >> > > > + benefits from working
>> > >> >> >> >with
>> > >> >> >> >> >> >> > > > +    receive buffers of equal length.
>> > >> >> >> >> >> >> > > > +
>> > >> >> >> >> >> >> > Problem is, I don't think linux will use this
>> > >> >> >> >> >> >> > for skbs since it breaks buffer accounting. This
>> > >> >> >> >> >> >> > is because it is important to make skbs as small
>> > >> >> >> >> >> >> > as you can. So even if you see "device would
>> > >> >> >> >benefit"
>> > >> >> >> >> >> >> > there is no way to balance this with the benefit to
>linux.
>> > >> >> >> >> >> >> > How do you know which benefit is bigger?
>> > >> >> >> >> >> >>
>> > >> >> >> >> >> >>
>> > >> >> >> >> >> >> I guess the idea is e.g for Linux driver, it can
>> > >> >> >> >> >> >> refuse this
>> > >feature.
>> > >> >> >> >> >> >
>> > >> >> >> >> >> >Okay. What uses it?
>> > >> >> >> >> >> >
>> > >> >> >> >> >> >>
>> > >> >> >> >> >> >> >
>> > >> >> >> >> >> >> > You also never explained how does device
>> > >> >> >> >> >> >> > benefit. My guess is this allows device to
>> > >> >> >> >> >> >> > calculate the # of descriptors to fetch that are needed
>for a packet. Right?
>> > >> >> >> >> >> >> > Assuming this, I think a rough estimate should be
>enough.
>> > >> >> >> >> >> >> > If device fetches too much it can discard extra,
>> > >> >> >> >> >> >> > if it does not fetch enough it can fetch extra.
>> > >> >> >> >> >> >> >
>> > >> >> >> >> >> >> > Let us not forget that express packets are 256
>> > >> >> >> >> >> >> > bytes so up to
>> > >> >> >> >> >> >> > 16 descriptors fit in a packet, there is no
>> > >> >> >> >> >> >> > benefit most of the time in knowing whether e.g.
>> > >> >> >> >> >> >> > 1 or 2 descriptors are
>> > >> >needed.
>> > >> >> >> >> >> >> >
>> > >> >> >> >> >> >> >
>> > >> >> >> >> >> >> > Let us not forget these are buffers, not descriptors.
>> > >> >> >> >> >> >>
>> > >> >> >> >> >> >>
>> > >> >> >> >> >> >> I guess maybe the initial motivation is constant
>> > >> >> >> >> >> >> descriptor
>> > >> >length.
>> > >> >> >> >> >> >
>> > >> >> >> >> >> >
>> > >> >> >> >> >> >That conflicts with requirement framing is up to driver.
>> > >> >> >> >> >> >
>> > >> >> >> >> >> >
>> > >> >> >> >> >> I was using the wrong terminology. The feature is
>> > >> >> >> >> >> about constant
>> > >> >> >> >> >*descriptor* length. In other words, the value that
>> > >> >> >> >> >driver puts in Packed Virtqueue "Element Length" field (or
>'len'
>> > >> >> >> >> >field in the
>> > >> >> >'pvirtq_desc'
>> > >> >> >> >struct).
>> > >> >> >> >> >
>> > >> >> >> >> >OK so this conflicts with "2.6.4 Message Framing" which
>> > >> >> >> >> >requires that drivers can split buffers into as many
>> > >> >> >> >> >descriptors as
>> > >> >they like.
>> > >> >> >> >> >Right?
>> > >> >> >> >> >
>> > >> >> >> >> >> Does it make more sense now, or you still see an
>> > >> >> >> >> >> issue with Linux
>> > >> >> >driver?
>> > >> >> >> >> >
>> > >> >> >> >> >I think there's an issue with the flexible framing
>> > >> >> >> >> >requirements and there's an issue with Linux driver.
>> > >> >> >> >> >
>> > >> >> >> >> >> The motivation is to allow the device to calculate
>> > >> >> >> >> >> the exact number of
>> > >> >> >> >> >descriptors to consume, before fetching the descriptors.
>> > >> >> >> >> >This is beneficial for devices for which overconsuming
>> > >> >> >> >> >or underconsuming descriptors come at a high cost.
>> > >> >> >> >> >
>> > >> >> >> >> >So I guess we can agree getting the # of descriptors
>> > >> >> >> >> >*exactly* right isn't all that important. Right? My
>> > >> >> >> >> >question is how does the driver balance the device versus
>Linux needs?
>> > >> >> >> >> >
>> > >> >> >> >> >One idea is if we assume this is best effort, does not
>> > >> >> >> >> >have to be exact, then how about just having the device
>> > >> >> >> >> >assume descriptor sizes stay more or less constant and
>> > >> >> >> >> >use that to estimate the amount? If it under/over
>> > >> >> >> >> >estimates, things just go a bit
>> > >> >slower.
>> > >> >> >> >> >This way driver can adjust the sizes and device will
>> > >> >> >> >> >react automatically, with time.
>> > >> >> >> >> >
>> > >> >> >> >> >
>> > >> >> >> >>
>> > >> >> >> >> I see that "2.6.4 Message Framing" allows a full
>> > >> >> >> >> flexibility of descriptor lengths, and I presume it's
>> > >> >> >> >> applicable to Packet Virtqueues as well, though defined
>> > >> >> >> >> under Split Virtqueues
>> > >section.
>> > >> >> >> >
>> > >> >> >> >That's a good point. Probably makes sense to move it out
>> > >> >> >> >to a common section, right?
>> > >> >> >> >
>> > >> >> >> >> The example
>> > >> >> >> >> talks about transmit virtqueue, and it makes perfect sense.
>> > >> >> >> >> However, wouldn't a typical driver place equal-sized
>> > >> >> >> >> *receive* descriptors
>> > >> >> >> >anyway? So if a device can benefit from it, the driver
>> > >> >> >> >might as well negotiate this new feature.
>> > >> >> >> >
>> > >> >> >> >Having buffer size jump around wildly doesn't seem too
>> > >> >> >> >useful, I
>> > >agree.
>> > >> >> >> >OTOH being able to adjust it gradually has been in the
>> > >> >> >> >past demontrated to help performance measureably.
>> > >> >> >> >
>> > >> >> >> >> This could be especially relevant for devices, for which
>> > >> >> >> >> adjusting the number
>> > >> >> >> >of used descriptors is impractical after descriptors have
>> > >> >> >> >already been
>> > >> >> >fetched.
>> > >> >> >> >> I agree that if this requirement conflicts with specific
>> > >> >> >> >> driver needs, it will not
>> > >> >> >> >be negotiated, and the device will either underperform in
>> > >> >> >> >certain scenarios, or will not come up at all.
>> > >> >> >> >
>> > >> >> >> >Right so that makes it a challenge. Device says it prefers
>> > >> >> >> >fixed buffers. Is that preference more or less important
>> > >> >> >> >than ability to efficiently support workloads such as TCP small
>queues?
>> > >> >> >> >Driver has no idea and I suspect neither does the device.
>> > >> >> >> >So I don't see how will a Linux driver know that it should
>> > >> >> >> >enable this, neither how will device know it's okay to
>> > >> >> >> >just refuse
>> > >features_ok.
>> > >> >> >> >
>> > >> >> >> >
>> > >> >> >> >On the other hand, current drivers already have logic that
>> > >> >> >> >tries to change buffer sizes in a smooth way.  So simply
>> > >> >> >> >caching the last buffer size and assuming all the others
>> > >> >> >> >will be exactly the same will go a long way towards
>> > >> >> >> >limiting how much does device need to
>> > >> >fetch.
>> > >> >> >> >This does imply extra logic on the device to recover if
>> > >> >> >> >things change and the first read did not fetch enough
>> > >> >> >> >buffers, but then it would be required anyway if as you
>> > >> >> >> >say above the failure is not
>> > >> >catastrophic.
>> > >> >> >> >
>> > >> >> >> >
>> > >> >> >> >The feature thus only seems useful for small,
>> > >> >> >> >feature-restrained devices
>> > >> >> >> >- which are prepared to sacrifice some performance to cut
>> > >> >> >> >costs, and which can't recover at all. Is this what you are trying
>to do?
>> > >> >> >> >
>> > >> >> >>
>> > >> >> >> The intention is to enable devices with this specific
>> > >> >> >> limitation to offer virtio offload.  The feature can be
>> > >> >> >> redefined such that it would only be offered by devices
>> > >> >> >> that are unable to handle dynamically changing descriptor
>lengths. How does that sound?
>> > >> >> >
>> > >> >> >So it makes more sense when mergeable buffers are disabled
>> > >> >> >(since then buffers are practically all same size).
>> > >> >> >
>> > >> >>
>> > >> >> Actually, mergeable buffers are not a problem. They could be
>> > >> >> enabled, as long as each descriptor is the same length. So
>> > >> >> implementations that prefer optimizing memory utilization over
>> > >> >> jumbo frame performance can negotiate
>VIRTIO_NET_F_MRG_RXBUF
>> > >and allocate smaller buffers.
>> > >> >
>> > >> >So my point was, without VIRTIO_NET_F_MRG_RXBUF, all buffers are
>> > >same
>> > >> >length anyway. So if we are talking about cheap simple hardware,
>> > >> >I guess it can just not have VIRTIO_NET_F_MRG_RXBUF and be done
>> > >> >with
>> > >it?
>> > >> >If device does care about memory utilization then I think it
>> > >> >needs to give driver the flexibility it needs/uses.
>> > >> >No?
>> > >> >
>> > >>
>> > >> It's not a matter of device complexity, but a HW architecture,
>> > >> which could be
>> > >complex, but have this specific limitation.
>> > >
>> > >So - don't do it then?
>> > >
>> > >> I see at least two reasons to support and negotiate
>> > >VIRTIO_NET_F_MRG_RXBUF, while keeping equal-sized descriptor
>limitation:
>> > >>  * GRO
>> > >
>> > >You mean LRO?
>> > >
>> > >>  * With jumbo packets, if the throughput is capped by the port
>> > >> bandwidth,
>> > >and not by the device/driver per-packet performance, it makes sense
>> > >to optimize memory utilization by allocating small buffers, without
>> > >sacrificing throughput performance.
>> > >
>> > >So we are back to square one, if driver cares about memory
>> > >utilization with 1K packets vs 9K buffers, why not with 100byte
>> > >packets vs 1K buffers? Looks like exactly the same tradeoff.
>> > >
>> > >
>> > >>
>> > >> >Again I can see how we might want to disallow crazy setups with e.g.
>> > >> >1 byte per buffer. That's just abuse, no guest does that anyway.
>> > >> >So asking e.g. for a minimal buffer size sounds very reasonable.
>> > >> >But an option that disables functionality that a popular guest
>> > >> >uses needs a lot of documentation to help device writers figure
>> > >> >out whether they want that option or not, and I'd worry that
>> > >> >even with documentation will
>> > >be misunderstood even if we write it.
>> > >> >When do you enable this?
>> > >> >When you don't care about performance? When you don't care about
>> > >Linux?
>> > >> >
>> > >>
>> > >> I understand that there are guest-side optimizations that require
>> > >> flexibility
>> > >in descriptors length. I can propose the following simple logic:
>> > >> Device - advertise "equal-sized descriptor" feature only if the
>> > >> device is
>> > >absolutely unable to operate otherwise. The device will not set
>> > >FEATURES_OK unless the driver negotiates this feature.
>> > >> Driver - if device advertises "equal-sized descriptor" feature -
>> > >> if possible,
>> > >give up on the flexible descriptors length optimizations. If not -
>> > >give up on the device.
>> > >
>> > >Yes, that's possible. Looks like a rather limited feature, and i'd
>> > >rather we focused on something more widely applicable, but with
>> > >enough disclamers that devices SHOULD NOT set this bit we can maybe
>do that.
>> > >
>> >
>> > I agree that this feature is more of a limitation declaration, rather than an
>enhancement, but let me emphasize that its only purpose is to make more
>*existing* HW devices be virtio compatible.
>> > This feature will allow such HW devices to offer virtio offload with
>VIRTIO_NET_F_MRG_RXBUF capability for GRO/LRO offload.
>> > New device designs should definitely avoid employing this feature.
>>
>> I understand, my question is: can you find a way to make this more
>> generally useful? Something along the lines of the min buffer size
>> suggestion which would avoid creating a special case in the driver?
>> Any idea?
>
>as an example of a vague idea, exposing max rx s/g, min buffer + max buffer
>will allow device to force this from device side.
>
>Is that good? If not, how does driver decide on a good fixed buffer size?
>

Setting max=min=fixed_size by the device will work, but this seems too restrictive, as we still may want to enable the driver to select its buffer size.
I guess driver can select the fixed buffer size based on the MTU.

What if device will request max/min buffer size ratio, and driver will set min buffer size? This can solve the fixed size issue, without forcing a specific size.
Along with the max s/g, maybe it can also help avoiding rx buffer size abuse by the driver (i.e. setting it too low).

>>
>> > >
>> > >>
>> > >> >> >
>> > >> >> >With that in mind, I have an idea: scsi and block already
>> > >> >> >have max sg
>> > >field.
>> > >> >> >How about we add a writeable max sg field, maybe even make it
>> > >> >> >programmable per vq?
>> > >> >> >
>> > >> >> >Thus driver tells device what is it's max s/g value for rx.
>> > >> >> >Worst case device fetches a bit more than it needs.
>> > >> >> >Discarding extra shouldn't be expensive. This looks like it
>> > >> >> >will help even smart devices. What
>> > >> >do you think?
>> > >> >> >
>> > >> >> >This nicely avoids conflicting with the flexible framing requirement.
>> > >> >> >
>> > >> >>
>> > >> >> My intention was to avoid any descriptor length variations,
>> > >> >> for devices that unable fetching or discarding extra
>> > >> >> descriptors. (If in the pipelined HW processing the decision
>> > >> >> on number of descriptors is made in the early stage, and it
>> > >> >> cannot be undone in a later
>> > >stage).
>> > >> >
>> > >> >Frankly discarding unused descriptors looks to me like something
>> > >> >that shouldn't have a high cost in the hardware.  I can see how
>> > >> >trying to predict descriptor length, and fetching extra if not
>> > >> >enough was fetched can have a high cost, so to me extensions to
>> > >> >remove the guesswork from the device have value.  However a lot
>> > >> >of effort went into trying to reduce e.g. number of pci express
>> > >> >packets needed to fetch
>> > >descriptors.
>> > >> >Each packet fetches 256 bytes anyway, does it not?  Not having
>> > >> >an ability to use that information seems like an obvious waste,
>> > >> >and that means ability to keep some fetched descriptors around
>> > >> >even if they are not used for a given packet. Again just my $.02.
>> > >> >
>> > >>
>> > >> In packed virtqueue, discarding unused descriptors (and buffers
>> > >> associated
>> > >with them) can indeed be easy, but reusing them for the next packet
>> > >is complicated (or impossible).
>> > >> I agree that descriptors are being (pre)fetched with the maximum
>> > >> efficiency
>> > >(in terms of PCIe bandwidth), and cached in the device. But the
>> > >decision to fetch is being made according to the number of left
>> > >cached-in descriptors and the expected number of descriptors that will
>be used by the packet.
>> > >> If the expected number of descriptors is larger than the actual
>> > >> one, the next
>> > >fetch decision will be taken too early, and there will be no way to
>> > >reuse excess cached descriptors, and they will have to be discarded.
>> > >
>> > >> Even if it's possible to skip descriptors in the packed virtqueue
>> > >> (is it?), it's
>> > >surely inefficient.
>> > >
>> > >OK I think I understand what you are doing. Device is getting
>> > >buffers 1,2,3 for packet 1, it is meanwhile receiving packet
>> > >2 and decides to get buffers 4,5 for packet 2.
>> > >At this point it finally gets buffers 1-3 and figures out that
>> > >buffers 3 was not needed, but possibly it already started writing packet 2
>into buffers 4.
>> > >What to do about buffers 3 now?
>> > >
>> > >Is above a good example?
>> > >
>> > >If yes then it looks like you are unaware that Descriptors can be
>> > >used out of order, with split or packed ring, with no issues.
>> > >Looks like exactly what you need to address this issue?
>> > >So device will simply proceed with marking buffers 1,2,4,5 as used,
>> > >and store buffers 3 in some kind of internal memory and use it for the
>next packet.
>> > >
>> > >This is exactly the kind of thing out of order was designed for.
>> > >
>> > >Does this answer the question?
>> > >
>> >
>> > The example is good, and the suggested solution is clear. However if we're
>talking about HW device that is designed to process packets/buffers in order,
>this solution could not be applicable.
>>
>>
>>
>> > >>
>> > >> >>
>> > >> >> Defining max s/g sounds like an interesting feature by itself.
>> > >> >
>> > >> >But assuming we have max RX s/g, I guess hardware can set max s/g =
>1?
>> > >> >Then since with !VIRTIO_NET_F_MRG_RXBUF all buffers are forced
>> > >> >to be same length.
>> > >> >
>> > >> >>
>> > >> >> >
>> > >> >> >> >> Could you specify what issue do you see with the Linux
>driver?
>> > >> >> >> >
>> > >> >> >> >See logic around struct ewma.
>> > >> >> >> >
>> > >> >> >> >The first relevant commit is I guess commit
>> > >> >> >> >ab7db91705e95ed1bba1304388936fccfa58c992
>> > >> >> >> >    virtio-net: auto-tune mergeable rx buffer size for
>> > >> >> >> >improved performance
>> > >> >> >> >
>> > >> >> >> >this claims a gain of about 10% with large packets which
>> > >> >> >> >isn't earth shattering but also not something we can ignore
>completely.
>> > >> >> >> >And I suspect it can be bigger with smaller packets.
>> > >> >> >> >
>> > >> >> >> >> >> >>
>> > >> >> >> >> >> >> >
>> > >> >> >> >> >> >> > So device does not really know the exact # of
>descriptors:
>> > >> >> >> >> >> >> > current drivers do 1 descriptor per buffer but
>> > >> >> >> >> >> >> > nothing prevents more.
>> > >> >> >> >> >> >> >
>> > >> >> >> >> >> >> >
>> > >> >> >> >> >> >> > Thoughts?
>> > >> >> >> >> >> >> >
>> > >> >> >> >> >> >> >
>> > >> >> >> >> >> >> > > >    \item[VIRTIO_NET_F_RSC_EXT(61)] Device
>> > >> >> >> >> >> >> > > > can process duplicated
>> > >> >> >> >> >> >ACKs
>> > >> >> >> >> >> >> > > >        and report number of coalesced
>> > >> >> >> >> >> >> > > > segments and duplicated ACKs @@ -2854,8
>> > >> >> >> >> >> >> > > > +2858,8 @@
>> > >> >> >> >\subsubsection{Legacy Interface:
>> > >> >> >> >> >> >Feature bits}\label{sec:Device Types / Network
>> > >> >> >> >> >> >> > > >    \subsection{Device configuration
>> > >> >> >> >> >> >> > > > layout}\label{sec:Device Types /
>> > >> >> >> >> >> >Network Device / Device configuration layout}
>> > >> >> >> >> >> >> > > >    \label{sec:Device Types / Block Device /
>> > >> >> >> >> >> >> > > > Feature bits / Device configuration layout}
>> > >> >> >> >> >> >> > > > -Three driver-read-only configuration fields
>> > >> >> >> >> >> >> > > > are
>> > >currently defined.
>> > >> >> >> >> >> >> > > > The \field{mac} address field -always exists
>> > >> >> >> >> >> >> > > > (though is only valid if VIRTIO_NET_F_MAC is
>> > >> >> >> >> >> >> > > > set), and
>> > >> >> >> >> >> >> > > > +The driver-read-only \field{mac} address
>> > >> >> >> >> >> >> > > > +field always exists (though is only valid
>> > >> >> >> >> >> >> > > > +if VIRTIO_NET_F_MAC is set), and
>> > >> >> >> >> >> >> > > >    \field{status} only exists if
>> > >> >> >> >> >> >> > > > VIRTIO_NET_F_STATUS is
>> > >set.
>> > >> >> >Two
>> > >> >> >> >> >> >> > > >    read-only bits (for the driver) are
>> > >> >> >> >> >> >> > > > currently defined for the status
>> > >> >> >> >> >> >field:
>> > >> >> >> >> >> >> > > >    VIRTIO_NET_S_LINK_UP and
>> > >> >VIRTIO_NET_S_ANNOUNCE.
>> > >> >> >> >> >> >> > > > @@ -2875,12 +2879,17 @@ \subsection{Device
>> > >> >> >> >> >> >> > > > configuration
>> > >> >> >> >> >> >layout}\label{sec:Device Types / Network Device
>> > >> >> >> >> >> >> > > >    VIRTIO_NET_F_MTU is set. This field
>> > >> >> >> >> >> >> > > > specifies the maximum MTU for
>> > >> >> >> >> >> >the driver to
>> > >> >> >> >> >> >> > > >    use.
>> > >> >> >> >> >> >> > > > +The device-read-only field
>> > >> >> >> >> >> >> > > > +\field{rx_buf_len} only exists if
>> > >> >> >> >> >> >> > >
>> > >> >> >> >> >> >> > > Should be driver-read-only.
>> > >> >> >> >> >> >> > >
>> > >> >> >> >> >> >> > >
>> > >> >> >> >> >> >> > > > +VIRTIO_NET_F_CONST_RXBUF_LEN is
>negotiated.
>> > >This
>> > >> >> >> >> >> >> > > > +field specifies the receive buffers length.
>> > >> >> >> >> >> >> > > > +
>> > >> >> >> >> >> >> > > >    \begin{lstlisting}
>> > >> >> >> >> >> >> > > >    struct virtio_net_config {
>> > >> >> >> >> >> >> > > >            u8 mac[6];
>> > >> >> >> >> >> >> > > >            le16 status;
>> > >> >> >> >> >> >> > > >            le16 max_virtqueue_pairs;
>> > >> >> >> >> >> >> > > >            le16 mtu;
>> > >> >> >> >> >> >> > > > +        le32 rx_buf_len;
>> > >> >> >> >> >> >> > > >    };
>> > >> >> >> >> >> >> > > >    \end{lstlisting} @@ -2933,6 +2942,13 @@
>> > >> >> >> >> >> >> > > > \subsection{Device configuration
>> > >> >> >> >> >> >> > > > layout}\label{sec:Device Types / Network
>> > >> >Device
>> > >> >> >> >> >> >> > > >    A driver SHOULD negotiate the
>> > >> >> >> >> >> >> > > > VIRTIO_NET_F_STANDBY feature if
>> > >> >> >> >> >> >the device offers it.
>> > >> >> >> >> >> >> > > > +A driver SHOULD accept the
>> > >> >> >> >VIRTIO_NET_F_CONST_RXBUF_LEN
>> > >> >> >> >> >> >feature if offered.
>> > >> >> >> >> >> >> > > > +
>> > >> >> >> >> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature
>has
>> > >been
>> > >> >> >> >> >negotiated,
>> > >> >> >> >> >> >> > > > +the driver MUST set \field{rx_buf_len}.
>> > >> >> >> >> >> >> > >
>> > >> >> >> >> >> >> > > I think it's device that set the field?
>> > >> >> >> >> >> >> > Makes more sense for the driver, but if you want
>> > >> >> >> >> >> >> > this set
>> > >e.g.
>> > >> >> >> >> >> >> > before buffers are added, you must say so.
>> > >> >> >> >> >> >> >
>> > >> >> >> >> >> >> > > > +
>> > >> >> >> >> >> >> > > > +A driver MUST NOT modify \field{rx_buf_len}
>> > >> >> >> >> >> >> > > > +once it has been
>> > >> >> >> >> >set.
>> > >> >> >> >> >> >> > > > +
>> > >> >> >> >> >> >> >
>> > >> >> >> >> >> >> > This seems very unflexible. I can see how e.g.
>> > >> >> >> >> >> >> > XDP would benefit from big buffers while skbs
>> > >> >> >> >> >> >> > benefit from small
>> > >> >buffers.
>> > >> >> >> >> >> >> >
>> > >> >> >> >> >> >> > This calls for ability to change this.
>> > >> >> >> >> >> >>
>> > >> >> >> >> >> >>
>> > >> >> >> >> >> >> Yes, but it requires non trivial cleanups for the
>> > >> >> >> >> >> >> old length and place them with new ones.
>> > >> >> >> >> >> >>
>> > >> >> >> >> >> >> Thanks
>> > >> >> >> >> >> >
>> > >> >> >> >> >> >
>> > >> >> >> >> >> >Let's see:
>> > >> >> >> >> >> >1	- making buffer smaller: just update config space,
>> > >> >> >> >> >> >	  then make new buffers smaller
>> > >> >> >> >> >> >
>> > >> >> >> >> >> >2	- making buffers bigger: add larger buffers,
>> > >> >> >> >> >> >	once all small ones are consumed update config
>> > >> >> >> >> >> >space
>> > >> >> >> >> >> >
>> > >> >> >> >> >> >2 is tricky I agree. Thoughts?
>> > >> >> >> >> >> >
>> > >> >> >> >> >> >
>> > >> >> >> >> >> Agree. It's doable, provided that the driver will
>> > >> >> >> >> >> follow the update
>> > >> >> >> >> >procedure.
>> > >> >> >> >> >> >
>> > >> >> >> >> >> >>
>> > >> >> >> >> >> >> >
>> > >> >> >> >> >> >> > > >    \subsubsection{Legacy Interface: Device
>> > >> >> >> >> >> >> > > > configuration
>> > >> >> >> >> >> >layout}\label{sec:Device Types / Network Device /
>> > >> >> >> >> >> >Device configuration layout / Legacy Interface:
>> > >> >> >> >> >> >Device configuration layout}
>> > >> >> >> >> >> >> > > >    \label{sec:Device Types / Block Device /
>> > >> >> >> >> >> >> > > > Feature bits / Device
>> > >> >> >> >> >> >configuration layout / Legacy Interface: Device
>> > >> >> >> >> >> >configuration layout}
>> > >> >> >> >> >> >> > > >    When using the legacy interface,
>> > >> >> >> >> >> >> > > > transitional devices and drivers @@
>> > >> >> >> >> >> >> > > > -3241,6 +3257,11 @@ \subsubsection{Setting
>> > >> >> >> >> >> >> > > > Up Receive
>> > >> >> >> >> >> >Buffers}\label{sec:Device Types / Network Devi
>> > >> >> >> >> >> >> > > >    If VIRTIO_NET_F_MQ is negotiated, each of
>> > >> >> >> >> >> >> > > > receiveq1\ldots
>> > >> >> >> >> >> >receiveqN
>> > >> >> >> >> >> >> > > >    that will be used SHOULD be populated
>> > >> >> >> >> >> >> > > > with receive
>> > >> >> >buffers.
>> > >> >> >> >> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature
>has
>> > >been
>> > >> >> >> >> >negotiated,
>> > >> >> >> >> >> >> > > > +the driver MUST initialize all receive
>> > >> >> >> >> >> >> > > > +virtqueue descriptors \field{len} field
>> > >> >> >> >> >> >> > > > +with the value configured in
>> > >> >> >> >> >> >> > > > +\field{rx_buf_len} device configuration
>> > >> >> >> >> >> >> > > > +field, and allocate receive
>> > >> >> >> >> >> >buffers accordingly.
>> > >> >> >> >> >> >> > > > +
>> > >> >> >> >> >> >> > > >    \devicenormative{\paragraph}{Setting Up
>> > >> >> >> >> >> >> > > > Receive Buffers}{Device Types / Network
>> > >> >> >> >> >> >> > > > Device / Device Operation / Setting Up Receive
>Buffers}
>> > >> >> >> >> >> >> > > >    The device MUST set \field{num_buffers}
>> > >> >> >> >> >> >> > > > to the number of descriptors used to @@
>> > >> >> >> >> >> >> > > > -3396,6 +3417,10 @@
>> > >> >> >> >> >> >\subsubsection{Processing of Incoming
>> > >> >> >> >> >> >Packets}\label{sec:Device Types / Network
>> > >> >> >> >> >> >> > > >    checksum (in case of multiple
>> > >> >> >> >> >> >> > > > encapsulated protocols, one
>> > >> >> >> >level
>> > >> >> >> >> >> >> > > >    of checksums is validated).
>> > >> >> >> >> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN has been
>> > >> >> >> >> >> >> > > > +negotiated,
>> > >> >> >> >the
>> > >> >> >> >> >> >device
>> > >> >> >> >> >> >> > > > +MAY use \field{rx_buf_len} as a buffer
>> > >> >> >> >> >> >> > > > +length (instead of reading it from
>> > >> >> >> >> >> >> > > > +virtqueue descriptor
>> > >> >\field{len} field).
>> > >> >> >> >> >> >> > >
>> > >> >> >> >> >> >> > > Is this safe? What if driver submit a small
>> > >> >> >> >> >> >> > > buffer, then device can read
>> > >> >> >> >> >> >more than what is allowed?
>> > >> >> >> >> >> >> > >
>> > >> >> >> >> >> >> > > Thanks
>> > >> >> >> >> >> >> > >
>> > >> >> >> >> >> >> > >
>> > >> >> >> >> >> >> > > > +
>> > >> >> >> >> >> >> > > >    \drivernormative{\paragraph}{Processing of
>Incoming
>> > >> >> >> >> >> >> > > >    Packets}{Device Types / Network Device /
>> > >> >> >> >> >> >> > > > Device Operation
>> > >> >> >/
>> > >> >> >> >> >> >> > > >    Processing of Incoming Packets}
>> > >> >
>> > >> >
>> > >> >This publicly archived list offers a means to provide input to
>> > >> >the OASIS Virtual I/O Device (VIRTIO) TC.
>> > >> >
>> > >> >In order to verify user consent to the Feedback License terms
>> > >> >and to minimize spam in the list archive, subscription is required
>before posting.
>> > >> >
>> > >> >Subscribe: virtio-comment-subscribe@lists.oasis-open.org
>> > >> >Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
>> > >> >List help: virtio-comment-help@lists.oasis-open.org
>> > >> >List archive: https://urldefense.proofpoint.com/v2/url?u=https-
>> > >> >3A__lists.oasis-2Dopen.org_archives_virtio-
>> > >>
>> >
>>>2Dcomment_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3
>l
>> > >q
>> > >> >qsArgFRdcevq01tbLQAw4A_NO7xgI&m=gIIx9_eEGj-aDaM6Z-
>> > >> >42yWtI9MnZcqZ2Gw7KCN7EgCg&s=-
>> > >> >JICLquqUnNye7tilUS67AFv7opngsKEl5L75acB64U&e=
>> > >> >Feedback License:
>> > >> >https://urldefense.proofpoint.com/v2/url?u=https-
>> > >> >3A__www.oasis-2Dopen.org_who_ipr_feedback-
>> > >>
>> >
>>>5Flicense.pdf&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3l
>q
>> > >q
>> > >> >sArgFRdcevq01tbLQAw4A_NO7xgI&m=gIIx9_eEGj-aDaM6Z-
>> > >>
>> >
>>>42yWtI9MnZcqZ2Gw7KCN7EgCg&s=4e2kWmSdPAtGMXBTHwfgNE_KOZdD
>U
>> > >R
>> > >> >Wsji73HWdVF3A&e=
>> > >> >List Guidelines:
>> > >> >https://urldefense.proofpoint.com/v2/url?u=https-
>> > >> >3A__www.oasis-2Dopen.org_policies-2Dguidelines_mailing-
>> > >>
>> >
>>>2Dlists&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW52oJ3lqqsAr
>gF
>> > >R
>> > >> >dcevq01tbLQAw4A_NO7xgI&m=gIIx9_eEGj-aDaM6Z-
>> > >> >42yWtI9MnZcqZ2Gw7KCN7EgCg&s=i-dQn5G-
>> > >> >auoFtCxN8Y2PN8UccM1ezgcrsT2A8T1H8wE&e=
>> > >> >Committee: https://urldefense.proofpoint.com/v2/url?u=https-
>> > >> >3A__www.oasis-
>> > >>
>> >
>>>2Dopen.org_committees_virtio_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtf
>Q
>> > >&
>> > >>
>>r=lDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=gIIx9_eEGj-
>> > >> >aDaM6Z-42yWtI9MnZcqZ2Gw7KCN7EgCg&s=kcv-jA-_-JC3v64-_r5iP-
>> > >> >XQ9rhyaeOKvrHJNHwZrLc&e=
>> > >> >Join OASIS: https://urldefense.proofpoint.com/v2/url?u=https-
>> > >> >3A__www.oasis-
>> > >>
>> >
>>>2Dopen.org_join_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=lDHJ2FW5
>2
>> > >o
>> > >> >J3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=gIIx9_eEGj-aDaM6Z-
>> > >>
>> >
>>>42yWtI9MnZcqZ2Gw7KCN7EgCg&s=cKbLeI_5Fu9G7aybE5u51yISB0eRer6Bv
>C
>> > >xr
>> > >> >wgd5HS4&e=
>> > >>
>> >


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature
  2019-12-02 15:09                                 ` Vitaly Mireyno
@ 2019-12-02 15:23                                   ` Michael S. Tsirkin
  2019-12-02 16:16                                     ` Vitaly Mireyno
  0 siblings, 1 reply; 28+ messages in thread
From: Michael S. Tsirkin @ 2019-12-02 15:23 UTC (permalink / raw)
  To: Vitaly Mireyno; +Cc: Jason Wang, virtio-comment

On Mon, Dec 02, 2019 at 03:09:14PM +0000, Vitaly Mireyno wrote:
> 
> 
> >-----Original Message-----
> >From: Michael S. Tsirkin <mst@redhat.com>
> >Sent: Sunday, 1 December, 2019 23:44
> >To: Vitaly Mireyno <vmireyno@marvell.com>
> >Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
> >open.org
> >Subject: Re: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH]
> >virtio-net: Add equal-sized receive buffers feature
> >
> >On Sun, Dec 01, 2019 at 08:07:55AM -0500, Michael S. Tsirkin wrote:
> >> On Sun, Dec 01, 2019 at 10:22:17AM +0000, Vitaly Mireyno wrote:
> >> >
> >> >
> >> > >-----Original Message-----
> >> > >From: Michael S. Tsirkin <mst@redhat.com>
> >> > >Sent: Wednesday, 27 November, 2019 15:51
> >> > >To: Vitaly Mireyno <vmireyno@marvell.com>
> >> > >Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
> >> > >open.org
> >> > >Subject: Re: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re:
> >> > >[PATCH]
> >> > >virtio-net: Add equal-sized receive buffers feature
> >> > >
> >> > >On Tue, Nov 26, 2019 at 05:27:31PM +0000, Vitaly Mireyno wrote:
> >> > >>
> >> > >>
> >> > >> >-----Original Message-----
> >> > >> >From: virtio-comment@lists.oasis-open.org
> >> > >> ><virtio-comment@lists.oasis- open.org> On Behalf Of Michael S.
> >> > >> >Tsirkin
> >> > >> >Sent: Sunday, 24 November, 2019 17:30
> >> > >> >To: Vitaly Mireyno <vmireyno@marvell.com>
> >> > >> >Cc: Jason Wang <jasowang@redhat.com>;
> >> > >> >virtio-comment@lists.oasis- open.org
> >> > >> >Subject: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re:
> >> > >> >[PATCH]
> >> > >> >virtio-
> >> > >> >net: Add equal-sized receive buffers feature
> >> > >> >
> >> > >> >On Sun, Nov 24, 2019 at 03:02:05PM +0000, Vitaly Mireyno wrote:
> >> > >> >>
> >> > >> >>
> >> > >> >> >-----Original Message-----
> >> > >> >> >From: Michael S. Tsirkin <mst@redhat.com>
> >> > >> >> >Sent: Wednesday, 20 November, 2019 15:23
> >> > >> >> >To: Vitaly Mireyno <vmireyno@marvell.com>
> >> > >> >> >Cc: Jason Wang <jasowang@redhat.com>;
> >> > >> >> >virtio-comment@lists.oasis- open.org
> >> > >> >> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net:
> >> > >> >> >Add
> >> > >> >> >equal- sized receive buffers feature
> >> > >> >> >
> >> > >> >> >On Thu, Nov 14, 2019 at 03:49:59PM +0000, Vitaly Mireyno wrote:
> >> > >> >> >>
> >> > >> >> >>
> >> > >> >> >> >-----Original Message-----
> >> > >> >> >> >From: Michael S. Tsirkin <mst@redhat.com>
> >> > >> >> >> >Sent: Monday, 11 November, 2019 17:05
> >> > >> >> >> >To: Vitaly Mireyno <vmireyno@marvell.com>
> >> > >> >> >> >Cc: Jason Wang <jasowang@redhat.com>;
> >> > >> >> >> >virtio-comment@lists.oasis- open.org
> >> > >> >> >> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net:
> >> > >> >> >> >Add
> >> > >> >> >> >equal- sized receive buffers feature
> >> > >> >> >> >
> >> > >> >> >> >On Mon, Nov 11, 2019 at 01:58:51PM +0000, Vitaly Mireyno
> >wrote:
> >> > >> >> >> >>
> >> > >> >> >> >>
> >> > >> >> >> >> >-----Original Message-----
> >> > >> >> >> >> >From: Michael S. Tsirkin <mst@redhat.com>
> >> > >> >> >> >> >Sent: Sunday, 10 November, 2019 17:35
> >> > >> >> >> >> >To: Vitaly Mireyno <vmireyno@marvell.com>
> >> > >> >> >> >> >Cc: Jason Wang <jasowang@redhat.com>;
> >> > >> >> >> >> >virtio-comment@lists.oasis- open.org
> >> > >> >> >> >> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-
> >net:
> >> > >> >> >> >> >Add
> >> > >> >> >> >> >equal- sized receive buffers feature
> >> > >> >> >> >> >
> >> > >> >> >> >> >On Sun, Nov 10, 2019 at 02:13:14PM +0000, Vitaly Mireyno
> >wrote:
> >> > >> >> >> >> >>
> >> > >> >> >> >> >> >-----Original Message-----
> >> > >> >> >> >> >> >From: virtio-comment@lists.oasis-open.org
> >> > >> >> >> >> >> ><virtio-comment@lists.oasis- open.org> On Behalf Of
> >> > >> >> >> >> >> >Michael
> >> > >S.
> >> > >> >> >> >> >> >Tsirkin
> >> > >> >> >> >> >> >Sent: Tuesday, 5 November, 2019 20:52
> >> > >> >> >> >> >> >To: Jason Wang <jasowang@redhat.com>
> >> > >> >> >> >> >> >Cc: Vitaly Mireyno <vmireyno@marvell.com>;
> >> > >> >> >> >> >> >virtio-comment@lists.oasis- open.org
> >> > >> >> >> >> >> >Subject: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-
> >net:
> >> > >> >> >> >> >> >Add equal-sized receive buffers feature
> >> > >> >> >> >> >> >
> >> > >> >> >> >> >> >External Email
> >> > >> >> >> >> >> >
> >> > >> >> >> >> >> >----------------------------------------------------
> >> > >> >> >> >> >> >-----
> >> > >> >> >> >> >> >---
> >> > >> >> >> >> >> >---
> >> > >> >> >> >> >> >---
> >> > >> >> >> >> >> >---
> >> > >> >> >> >> >> >- On Fri, Nov 01, 2019 at 03:22:26PM +0800, Jason Wang
> >wrote:
> >> > >> >> >> >> >> >>
> >> > >> >> >> >> >> >> On 2019/11/1 下午12:09, Michael S. Tsirkin wrote:
> >> > >> >> >> >> >> >> > On Thu, Oct 31, 2019 at 10:46:55AM +0000, Vitaly
> >> > >> >> >> >> >> >> > Mireyno
> >> > >> >wrote:
> >> > >> >> >> >> >> >> > > The feature is limited to receive buffers only.
> >> > >> >> >> >> >> >> > > A driver decides on receive buffer length. The
> >> > >> >> >> >> >> >> > > only limitation is that this
> >> > >> >> >> >> >> >length has to be the same for all receive virtqueue
> >buffers.
> >> > >> >> >> >> >> >> > > The driver configures receive buffer length to
> >> > >> >> >> >> >> >> > > the device during device
> >> > >> >> >> >> >> >initialization, and the device reads it and may use
> >> > >> >> >> >> >> >it for optimal
> >> > >> >> >> >operation.
> >> > >> >> >> >> >> >> > > No changes for transmit buffers.
> >> > >> >> >> >> >> >> > >
> >> > >> >> >> >> >> >> > > -----Original Message-----
> >> > >> >> >> >> >> >> > > From: virtio-comment@lists.oasis-open.org
> >> > >> >> >> >> >> >> > > <virtio-comment@lists.oasis-open.org> On
> >> > >> >> >> >> >> >> > > Behalf Of Jason Wang
> >> > >> >> >> >> >> >> > > Sent: Thursday, 31 October, 2019 12:15
> >> > >> >> >> >> >> >> > > To: Vitaly Mireyno <vmireyno@marvell.com>;
> >> > >> >> >> >> >> >> > > virtio-comment@lists.oasis-open.org
> >> > >> >> >> >> >> >> > > Cc: Michael S. Tsirkin <mst@redhat.com>
> >> > >> >> >> >> >> >> > > Subject: [virtio-comment] Re: [PATCH] virtio-net:
> >> > >> >> >> >> >> >> > > Add equal-sized receive buffers feature
> >> > >> >> >> >> >> >> > >
> >> > >> >> >> >> >> >> > >
> >> > >> >> >> >> >> >> > > On 2019/10/31 下午5:23, Vitaly Mireyno wrote:
> >> > >> >> >> >> >> >> > > > Some devices benefit from working with
> >> > >> >> >> >> >> >> > > > receive buffers of a
> >> > >> >> >> >> >> >predefined constant length.
> >> > >> >> >> >> >> >> > > > Add a feature for drivers that allocate
> >> > >> >> >> >> >> >> > > > equal-sized receive buffers, and
> >> > >> >> >> >> >> >devices that benefit from predefined receive buffers
> >length.
> >> > >> >> >> >> >> >> > > >
> >> > >> >> >> >> >> >> > > > Signed-off-by: Vitaly Mireyno
> >> > >> >> >> >> >> >> > > > <vmireyno@marvell.com>
> >> > >> >> >> >> >> >> > > > ---
> >> > >> >> >> >> >> >> > > >    content.tex | 29
> >+++++++++++++++++++++++++++--
> >> > >> >> >> >> >> >> > > >    1 file changed, 27 insertions(+), 2
> >> > >> >> >> >> >> >> > > > deletions(-)
> >> > >> >> >> >> >> >> > >
> >> > >> >> >> >> >> >> > > Is there any other networking device that has
> >> > >> >> >> >> >> >> > > this
> >> > >feature?
> >> > >> >> >> >> >> >> > >
> >> > >> >> >> >> >> >> > >
> >> > >> >> >> >> >> >> > > > diff --git a/content.tex b/content.tex index
> >> > >> >> >> >> >> >> > > > b1ea9b9..c9e67c8
> >> > >> >> >> >> >> >> > > > 100644
> >> > >> >> >> >> >> >> > > > --- a/content.tex
> >> > >> >> >> >> >> >> > > > +++ b/content.tex
> >> > >> >> >> >> >> >> > > > @@ -2811,6 +2811,10 @@ \subsection{Feature
> >> > >> >> >> >> >> >> > > > bits}\label{sec:Device
> >> > >> >> >> >> >> >Types / Network Device / Feature bits
> >> > >> >> >> >> >> >> > > >    \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set
> >> > >MAC
> >> > >> >> >> >> >> >> > > > address
> >> > >> >> >> >> >> >through control
> >> > >> >> >> >> >> >> > > >        channel.
> >> > >> >> >> >> >> >> > > > +\item[VIRTIO_NET_F_CONST_RXBUF_LEN(58)]
> >> > >> >> >> >> >> >> > > > +Driver allocates all
> >> > >> >> >> >> >> >receive buffers
> >> > >> >> >> >> >> >> > > > +    of the same constant length. Device
> >> > >> >> >> >> >> >> > > > + benefits from working
> >> > >> >> >> >with
> >> > >> >> >> >> >> >> > > > +    receive buffers of equal length.
> >> > >> >> >> >> >> >> > > > +
> >> > >> >> >> >> >> >> > Problem is, I don't think linux will use this
> >> > >> >> >> >> >> >> > for skbs since it breaks buffer accounting. This
> >> > >> >> >> >> >> >> > is because it is important to make skbs as small
> >> > >> >> >> >> >> >> > as you can. So even if you see "device would
> >> > >> >> >> >benefit"
> >> > >> >> >> >> >> >> > there is no way to balance this with the benefit to
> >linux.
> >> > >> >> >> >> >> >> > How do you know which benefit is bigger?
> >> > >> >> >> >> >> >>
> >> > >> >> >> >> >> >>
> >> > >> >> >> >> >> >> I guess the idea is e.g for Linux driver, it can
> >> > >> >> >> >> >> >> refuse this
> >> > >feature.
> >> > >> >> >> >> >> >
> >> > >> >> >> >> >> >Okay. What uses it?
> >> > >> >> >> >> >> >
> >> > >> >> >> >> >> >>
> >> > >> >> >> >> >> >> >
> >> > >> >> >> >> >> >> > You also never explained how does device
> >> > >> >> >> >> >> >> > benefit. My guess is this allows device to
> >> > >> >> >> >> >> >> > calculate the # of descriptors to fetch that are needed
> >for a packet. Right?
> >> > >> >> >> >> >> >> > Assuming this, I think a rough estimate should be
> >enough.
> >> > >> >> >> >> >> >> > If device fetches too much it can discard extra,
> >> > >> >> >> >> >> >> > if it does not fetch enough it can fetch extra.
> >> > >> >> >> >> >> >> >
> >> > >> >> >> >> >> >> > Let us not forget that express packets are 256
> >> > >> >> >> >> >> >> > bytes so up to
> >> > >> >> >> >> >> >> > 16 descriptors fit in a packet, there is no
> >> > >> >> >> >> >> >> > benefit most of the time in knowing whether e.g.
> >> > >> >> >> >> >> >> > 1 or 2 descriptors are
> >> > >> >needed.
> >> > >> >> >> >> >> >> >
> >> > >> >> >> >> >> >> >
> >> > >> >> >> >> >> >> > Let us not forget these are buffers, not descriptors.
> >> > >> >> >> >> >> >>
> >> > >> >> >> >> >> >>
> >> > >> >> >> >> >> >> I guess maybe the initial motivation is constant
> >> > >> >> >> >> >> >> descriptor
> >> > >> >length.
> >> > >> >> >> >> >> >
> >> > >> >> >> >> >> >
> >> > >> >> >> >> >> >That conflicts with requirement framing is up to driver.
> >> > >> >> >> >> >> >
> >> > >> >> >> >> >> >
> >> > >> >> >> >> >> I was using the wrong terminology. The feature is
> >> > >> >> >> >> >> about constant
> >> > >> >> >> >> >*descriptor* length. In other words, the value that
> >> > >> >> >> >> >driver puts in Packed Virtqueue "Element Length" field (or
> >'len'
> >> > >> >> >> >> >field in the
> >> > >> >> >'pvirtq_desc'
> >> > >> >> >> >struct).
> >> > >> >> >> >> >
> >> > >> >> >> >> >OK so this conflicts with "2.6.4 Message Framing" which
> >> > >> >> >> >> >requires that drivers can split buffers into as many
> >> > >> >> >> >> >descriptors as
> >> > >> >they like.
> >> > >> >> >> >> >Right?
> >> > >> >> >> >> >
> >> > >> >> >> >> >> Does it make more sense now, or you still see an
> >> > >> >> >> >> >> issue with Linux
> >> > >> >> >driver?
> >> > >> >> >> >> >
> >> > >> >> >> >> >I think there's an issue with the flexible framing
> >> > >> >> >> >> >requirements and there's an issue with Linux driver.
> >> > >> >> >> >> >
> >> > >> >> >> >> >> The motivation is to allow the device to calculate
> >> > >> >> >> >> >> the exact number of
> >> > >> >> >> >> >descriptors to consume, before fetching the descriptors.
> >> > >> >> >> >> >This is beneficial for devices for which overconsuming
> >> > >> >> >> >> >or underconsuming descriptors come at a high cost.
> >> > >> >> >> >> >
> >> > >> >> >> >> >So I guess we can agree getting the # of descriptors
> >> > >> >> >> >> >*exactly* right isn't all that important. Right? My
> >> > >> >> >> >> >question is how does the driver balance the device versus
> >Linux needs?
> >> > >> >> >> >> >
> >> > >> >> >> >> >One idea is if we assume this is best effort, does not
> >> > >> >> >> >> >have to be exact, then how about just having the device
> >> > >> >> >> >> >assume descriptor sizes stay more or less constant and
> >> > >> >> >> >> >use that to estimate the amount? If it under/over
> >> > >> >> >> >> >estimates, things just go a bit
> >> > >> >slower.
> >> > >> >> >> >> >This way driver can adjust the sizes and device will
> >> > >> >> >> >> >react automatically, with time.
> >> > >> >> >> >> >
> >> > >> >> >> >> >
> >> > >> >> >> >>
> >> > >> >> >> >> I see that "2.6.4 Message Framing" allows a full
> >> > >> >> >> >> flexibility of descriptor lengths, and I presume it's
> >> > >> >> >> >> applicable to Packet Virtqueues as well, though defined
> >> > >> >> >> >> under Split Virtqueues
> >> > >section.
> >> > >> >> >> >
> >> > >> >> >> >That's a good point. Probably makes sense to move it out
> >> > >> >> >> >to a common section, right?
> >> > >> >> >> >
> >> > >> >> >> >> The example
> >> > >> >> >> >> talks about transmit virtqueue, and it makes perfect sense.
> >> > >> >> >> >> However, wouldn't a typical driver place equal-sized
> >> > >> >> >> >> *receive* descriptors
> >> > >> >> >> >anyway? So if a device can benefit from it, the driver
> >> > >> >> >> >might as well negotiate this new feature.
> >> > >> >> >> >
> >> > >> >> >> >Having buffer size jump around wildly doesn't seem too
> >> > >> >> >> >useful, I
> >> > >agree.
> >> > >> >> >> >OTOH being able to adjust it gradually has been in the
> >> > >> >> >> >past demontrated to help performance measureably.
> >> > >> >> >> >
> >> > >> >> >> >> This could be especially relevant for devices, for which
> >> > >> >> >> >> adjusting the number
> >> > >> >> >> >of used descriptors is impractical after descriptors have
> >> > >> >> >> >already been
> >> > >> >> >fetched.
> >> > >> >> >> >> I agree that if this requirement conflicts with specific
> >> > >> >> >> >> driver needs, it will not
> >> > >> >> >> >be negotiated, and the device will either underperform in
> >> > >> >> >> >certain scenarios, or will not come up at all.
> >> > >> >> >> >
> >> > >> >> >> >Right so that makes it a challenge. Device says it prefers
> >> > >> >> >> >fixed buffers. Is that preference more or less important
> >> > >> >> >> >than ability to efficiently support workloads such as TCP small
> >queues?
> >> > >> >> >> >Driver has no idea and I suspect neither does the device.
> >> > >> >> >> >So I don't see how will a Linux driver know that it should
> >> > >> >> >> >enable this, neither how will device know it's okay to
> >> > >> >> >> >just refuse
> >> > >features_ok.
> >> > >> >> >> >
> >> > >> >> >> >
> >> > >> >> >> >On the other hand, current drivers already have logic that
> >> > >> >> >> >tries to change buffer sizes in a smooth way.  So simply
> >> > >> >> >> >caching the last buffer size and assuming all the others
> >> > >> >> >> >will be exactly the same will go a long way towards
> >> > >> >> >> >limiting how much does device need to
> >> > >> >fetch.
> >> > >> >> >> >This does imply extra logic on the device to recover if
> >> > >> >> >> >things change and the first read did not fetch enough
> >> > >> >> >> >buffers, but then it would be required anyway if as you
> >> > >> >> >> >say above the failure is not
> >> > >> >catastrophic.
> >> > >> >> >> >
> >> > >> >> >> >
> >> > >> >> >> >The feature thus only seems useful for small,
> >> > >> >> >> >feature-restrained devices
> >> > >> >> >> >- which are prepared to sacrifice some performance to cut
> >> > >> >> >> >costs, and which can't recover at all. Is this what you are trying
> >to do?
> >> > >> >> >> >
> >> > >> >> >>
> >> > >> >> >> The intention is to enable devices with this specific
> >> > >> >> >> limitation to offer virtio offload.  The feature can be
> >> > >> >> >> redefined such that it would only be offered by devices
> >> > >> >> >> that are unable to handle dynamically changing descriptor
> >lengths. How does that sound?
> >> > >> >> >
> >> > >> >> >So it makes more sense when mergeable buffers are disabled
> >> > >> >> >(since then buffers are practically all same size).
> >> > >> >> >
> >> > >> >>
> >> > >> >> Actually, mergeable buffers are not a problem. They could be
> >> > >> >> enabled, as long as each descriptor is the same length. So
> >> > >> >> implementations that prefer optimizing memory utilization over
> >> > >> >> jumbo frame performance can negotiate
> >VIRTIO_NET_F_MRG_RXBUF
> >> > >and allocate smaller buffers.
> >> > >> >
> >> > >> >So my point was, without VIRTIO_NET_F_MRG_RXBUF, all buffers are
> >> > >same
> >> > >> >length anyway. So if we are talking about cheap simple hardware,
> >> > >> >I guess it can just not have VIRTIO_NET_F_MRG_RXBUF and be done
> >> > >> >with
> >> > >it?
> >> > >> >If device does care about memory utilization then I think it
> >> > >> >needs to give driver the flexibility it needs/uses.
> >> > >> >No?
> >> > >> >
> >> > >>
> >> > >> It's not a matter of device complexity, but a HW architecture,
> >> > >> which could be
> >> > >complex, but have this specific limitation.
> >> > >
> >> > >So - don't do it then?
> >> > >
> >> > >> I see at least two reasons to support and negotiate
> >> > >VIRTIO_NET_F_MRG_RXBUF, while keeping equal-sized descriptor
> >limitation:
> >> > >>  * GRO
> >> > >
> >> > >You mean LRO?
> >> > >
> >> > >>  * With jumbo packets, if the throughput is capped by the port
> >> > >> bandwidth,
> >> > >and not by the device/driver per-packet performance, it makes sense
> >> > >to optimize memory utilization by allocating small buffers, without
> >> > >sacrificing throughput performance.
> >> > >
> >> > >So we are back to square one, if driver cares about memory
> >> > >utilization with 1K packets vs 9K buffers, why not with 100byte
> >> > >packets vs 1K buffers? Looks like exactly the same tradeoff.
> >> > >
> >> > >
> >> > >>
> >> > >> >Again I can see how we might want to disallow crazy setups with e.g.
> >> > >> >1 byte per buffer. That's just abuse, no guest does that anyway.
> >> > >> >So asking e.g. for a minimal buffer size sounds very reasonable.
> >> > >> >But an option that disables functionality that a popular guest
> >> > >> >uses needs a lot of documentation to help device writers figure
> >> > >> >out whether they want that option or not, and I'd worry that
> >> > >> >even with documentation will
> >> > >be misunderstood even if we write it.
> >> > >> >When do you enable this?
> >> > >> >When you don't care about performance? When you don't care about
> >> > >Linux?
> >> > >> >
> >> > >>
> >> > >> I understand that there are guest-side optimizations that require
> >> > >> flexibility
> >> > >in descriptors length. I can propose the following simple logic:
> >> > >> Device - advertise "equal-sized descriptor" feature only if the
> >> > >> device is
> >> > >absolutely unable to operate otherwise. The device will not set
> >> > >FEATURES_OK unless the driver negotiates this feature.
> >> > >> Driver - if device advertises "equal-sized descriptor" feature -
> >> > >> if possible,
> >> > >give up on the flexible descriptors length optimizations. If not -
> >> > >give up on the device.
> >> > >
> >> > >Yes, that's possible. Looks like a rather limited feature, and i'd
> >> > >rather we focused on something more widely applicable, but with
> >> > >enough disclamers that devices SHOULD NOT set this bit we can maybe
> >do that.
> >> > >
> >> >
> >> > I agree that this feature is more of a limitation declaration, rather than an
> >enhancement, but let me emphasize that its only purpose is to make more
> >*existing* HW devices be virtio compatible.
> >> > This feature will allow such HW devices to offer virtio offload with
> >VIRTIO_NET_F_MRG_RXBUF capability for GRO/LRO offload.
> >> > New device designs should definitely avoid employing this feature.
> >>
> >> I understand, my question is: can you find a way to make this more
> >> generally useful? Something along the lines of the min buffer size
> >> suggestion which would avoid creating a special case in the driver?
> >> Any idea?
> >
> >as an example of a vague idea, exposing max rx s/g, min buffer + max buffer
> >will allow device to force this from device side.
> >
> >Is that good? If not, how does driver decide on a good fixed buffer size?
> >
> 
> Setting max=min=fixed_size by the device will work, but this seems too restrictive, as we still may want to enable the driver to select its buffer size.
> I guess driver can select the fixed buffer size based on the MTU.


Would that be based on max MTU field then?
I note that that's read-only, so just starting with that
and sending it back to the device doesn't change much ...

> What if device will request max/min buffer size ratio, and driver will set min buffer size? This can solve the fixed size issue, without forcing a specific size.
> Along with the max s/g, maybe it can also help avoiding rx buffer size abuse by the driver (i.e. setting it too low).

That sounds reasonably generic, too.

-- 
MST


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature
  2019-12-02 15:23                                   ` Michael S. Tsirkin
@ 2019-12-02 16:16                                     ` Vitaly Mireyno
  2019-12-02 16:47                                       ` Michael S. Tsirkin
  0 siblings, 1 reply; 28+ messages in thread
From: Vitaly Mireyno @ 2019-12-02 16:16 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Jason Wang, virtio-comment



>-----Original Message-----
>From: Michael S. Tsirkin <mst@redhat.com>
>Sent: Monday, 2 December, 2019 17:23
>To: Vitaly Mireyno <vmireyno@marvell.com>
>Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
>open.org
>Subject: Re: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH]
>virtio-net: Add equal-sized receive buffers feature
>
>On Mon, Dec 02, 2019 at 03:09:14PM +0000, Vitaly Mireyno wrote:
>>
>>
>> >-----Original Message-----
>> >From: Michael S. Tsirkin <mst@redhat.com>
>> >Sent: Sunday, 1 December, 2019 23:44
>> >To: Vitaly Mireyno <vmireyno@marvell.com>
>> >Cc: Jason Wang <jasowang@redhat.com>; virtio-comment@lists.oasis-
>> >open.org
>> >Subject: Re: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re:
>> >[PATCH]
>> >virtio-net: Add equal-sized receive buffers feature
>> >
>> >On Sun, Dec 01, 2019 at 08:07:55AM -0500, Michael S. Tsirkin wrote:
>> >> On Sun, Dec 01, 2019 at 10:22:17AM +0000, Vitaly Mireyno wrote:
>> >> >
>> >> >
>> >> > >-----Original Message-----
>> >> > >From: Michael S. Tsirkin <mst@redhat.com>
>> >> > >Sent: Wednesday, 27 November, 2019 15:51
>> >> > >To: Vitaly Mireyno <vmireyno@marvell.com>
>> >> > >Cc: Jason Wang <jasowang@redhat.com>;
>> >> > >virtio-comment@lists.oasis- open.org
>> >> > >Subject: Re: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re:
>> >> > >[PATCH]
>> >> > >virtio-net: Add equal-sized receive buffers feature
>> >> > >
>> >> > >On Tue, Nov 26, 2019 at 05:27:31PM +0000, Vitaly Mireyno wrote:
>> >> > >>
>> >> > >>
>> >> > >> >-----Original Message-----
>> >> > >> >From: virtio-comment@lists.oasis-open.org
>> >> > >> ><virtio-comment@lists.oasis- open.org> On Behalf Of Michael S.
>> >> > >> >Tsirkin
>> >> > >> >Sent: Sunday, 24 November, 2019 17:30
>> >> > >> >To: Vitaly Mireyno <vmireyno@marvell.com>
>> >> > >> >Cc: Jason Wang <jasowang@redhat.com>;
>> >> > >> >virtio-comment@lists.oasis- open.org
>> >> > >> >Subject: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re:
>> >> > >> >[PATCH]
>> >> > >> >virtio-
>> >> > >> >net: Add equal-sized receive buffers feature
>> >> > >> >
>> >> > >> >On Sun, Nov 24, 2019 at 03:02:05PM +0000, Vitaly Mireyno wrote:
>> >> > >> >>
>> >> > >> >>
>> >> > >> >> >-----Original Message-----
>> >> > >> >> >From: Michael S. Tsirkin <mst@redhat.com>
>> >> > >> >> >Sent: Wednesday, 20 November, 2019 15:23
>> >> > >> >> >To: Vitaly Mireyno <vmireyno@marvell.com>
>> >> > >> >> >Cc: Jason Wang <jasowang@redhat.com>;
>> >> > >> >> >virtio-comment@lists.oasis- open.org
>> >> > >> >> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net:
>> >> > >> >> >Add
>> >> > >> >> >equal- sized receive buffers feature
>> >> > >> >> >
>> >> > >> >> >On Thu, Nov 14, 2019 at 03:49:59PM +0000, Vitaly Mireyno
>wrote:
>> >> > >> >> >>
>> >> > >> >> >>
>> >> > >> >> >> >-----Original Message-----
>> >> > >> >> >> >From: Michael S. Tsirkin <mst@redhat.com>
>> >> > >> >> >> >Sent: Monday, 11 November, 2019 17:05
>> >> > >> >> >> >To: Vitaly Mireyno <vmireyno@marvell.com>
>> >> > >> >> >> >Cc: Jason Wang <jasowang@redhat.com>;
>> >> > >> >> >> >virtio-comment@lists.oasis- open.org
>> >> > >> >> >> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-
>net:
>> >> > >> >> >> >Add
>> >> > >> >> >> >equal- sized receive buffers feature
>> >> > >> >> >> >
>> >> > >> >> >> >On Mon, Nov 11, 2019 at 01:58:51PM +0000, Vitaly
>> >> > >> >> >> >Mireyno
>> >wrote:
>> >> > >> >> >> >>
>> >> > >> >> >> >>
>> >> > >> >> >> >> >-----Original Message-----
>> >> > >> >> >> >> >From: Michael S. Tsirkin <mst@redhat.com>
>> >> > >> >> >> >> >Sent: Sunday, 10 November, 2019 17:35
>> >> > >> >> >> >> >To: Vitaly Mireyno <vmireyno@marvell.com>
>> >> > >> >> >> >> >Cc: Jason Wang <jasowang@redhat.com>;
>> >> > >> >> >> >> >virtio-comment@lists.oasis- open.org
>> >> > >> >> >> >> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH]
>> >> > >> >> >> >> >virtio-
>> >net:
>> >> > >> >> >> >> >Add
>> >> > >> >> >> >> >equal- sized receive buffers feature
>> >> > >> >> >> >> >
>> >> > >> >> >> >> >On Sun, Nov 10, 2019 at 02:13:14PM +0000, Vitaly
>> >> > >> >> >> >> >Mireyno
>> >wrote:
>> >> > >> >> >> >> >>
>> >> > >> >> >> >> >> >-----Original Message-----
>> >> > >> >> >> >> >> >From: virtio-comment@lists.oasis-open.org
>> >> > >> >> >> >> >> ><virtio-comment@lists.oasis- open.org> On Behalf
>> >> > >> >> >> >> >> >Of Michael
>> >> > >S.
>> >> > >> >> >> >> >> >Tsirkin
>> >> > >> >> >> >> >> >Sent: Tuesday, 5 November, 2019 20:52
>> >> > >> >> >> >> >> >To: Jason Wang <jasowang@redhat.com>
>> >> > >> >> >> >> >> >Cc: Vitaly Mireyno <vmireyno@marvell.com>;
>> >> > >> >> >> >> >> >virtio-comment@lists.oasis- open.org
>> >> > >> >> >> >> >> >Subject: [EXT] Re: [virtio-comment] Re: [PATCH]
>> >> > >> >> >> >> >> >virtio-
>> >net:
>> >> > >> >> >> >> >> >Add equal-sized receive buffers feature
>> >> > >> >> >> >> >> >
>> >> > >> >> >> >> >> >External Email
>> >> > >> >> >> >> >> >
>> >> > >> >> >> >> >> >-------------------------------------------------
>> >> > >> >> >> >> >> >---
>> >> > >> >> >> >> >> >-----
>> >> > >> >> >> >> >> >---
>> >> > >> >> >> >> >> >---
>> >> > >> >> >> >> >> >---
>> >> > >> >> >> >> >> >---
>> >> > >> >> >> >> >> >- On Fri, Nov 01, 2019 at 03:22:26PM +0800, Jason
>> >> > >> >> >> >> >> >Wang
>> >wrote:
>> >> > >> >> >> >> >> >>
>> >> > >> >> >> >> >> >> On 2019/11/1 下午12:09, Michael S. Tsirkin wrote:
>> >> > >> >> >> >> >> >> > On Thu, Oct 31, 2019 at 10:46:55AM +0000,
>> >> > >> >> >> >> >> >> > Vitaly Mireyno
>> >> > >> >wrote:
>> >> > >> >> >> >> >> >> > > The feature is limited to receive buffers only.
>> >> > >> >> >> >> >> >> > > A driver decides on receive buffer length.
>> >> > >> >> >> >> >> >> > > The only limitation is that this
>> >> > >> >> >> >> >> >length has to be the same for all receive
>> >> > >> >> >> >> >> >virtqueue
>> >buffers.
>> >> > >> >> >> >> >> >> > > The driver configures receive buffer length
>> >> > >> >> >> >> >> >> > > to the device during device
>> >> > >> >> >> >> >> >initialization, and the device reads it and may
>> >> > >> >> >> >> >> >use it for optimal
>> >> > >> >> >> >operation.
>> >> > >> >> >> >> >> >> > > No changes for transmit buffers.
>> >> > >> >> >> >> >> >> > >
>> >> > >> >> >> >> >> >> > > -----Original Message-----
>> >> > >> >> >> >> >> >> > > From: virtio-comment@lists.oasis-open.org
>> >> > >> >> >> >> >> >> > > <virtio-comment@lists.oasis-open.org> On
>> >> > >> >> >> >> >> >> > > Behalf Of Jason Wang
>> >> > >> >> >> >> >> >> > > Sent: Thursday, 31 October, 2019 12:15
>> >> > >> >> >> >> >> >> > > To: Vitaly Mireyno <vmireyno@marvell.com>;
>> >> > >> >> >> >> >> >> > > virtio-comment@lists.oasis-open.org
>> >> > >> >> >> >> >> >> > > Cc: Michael S. Tsirkin <mst@redhat.com>
>> >> > >> >> >> >> >> >> > > Subject: [virtio-comment] Re: [PATCH] virtio-net:
>> >> > >> >> >> >> >> >> > > Add equal-sized receive buffers feature
>> >> > >> >> >> >> >> >> > >
>> >> > >> >> >> >> >> >> > >
>> >> > >> >> >> >> >> >> > > On 2019/10/31 下午5:23, Vitaly Mireyno wrote:
>> >> > >> >> >> >> >> >> > > > Some devices benefit from working with
>> >> > >> >> >> >> >> >> > > > receive buffers of a
>> >> > >> >> >> >> >> >predefined constant length.
>> >> > >> >> >> >> >> >> > > > Add a feature for drivers that allocate
>> >> > >> >> >> >> >> >> > > > equal-sized receive buffers, and
>> >> > >> >> >> >> >> >devices that benefit from predefined receive
>> >> > >> >> >> >> >> >buffers
>> >length.
>> >> > >> >> >> >> >> >> > > >
>> >> > >> >> >> >> >> >> > > > Signed-off-by: Vitaly Mireyno
>> >> > >> >> >> >> >> >> > > > <vmireyno@marvell.com>
>> >> > >> >> >> >> >> >> > > > ---
>> >> > >> >> >> >> >> >> > > >    content.tex | 29
>> >+++++++++++++++++++++++++++--
>> >> > >> >> >> >> >> >> > > >    1 file changed, 27 insertions(+), 2
>> >> > >> >> >> >> >> >> > > > deletions(-)
>> >> > >> >> >> >> >> >> > >
>> >> > >> >> >> >> >> >> > > Is there any other networking device that
>> >> > >> >> >> >> >> >> > > has this
>> >> > >feature?
>> >> > >> >> >> >> >> >> > >
>> >> > >> >> >> >> >> >> > >
>> >> > >> >> >> >> >> >> > > > diff --git a/content.tex b/content.tex
>> >> > >> >> >> >> >> >> > > > index
>> >> > >> >> >> >> >> >> > > > b1ea9b9..c9e67c8
>> >> > >> >> >> >> >> >> > > > 100644
>> >> > >> >> >> >> >> >> > > > --- a/content.tex
>> >> > >> >> >> >> >> >> > > > +++ b/content.tex
>> >> > >> >> >> >> >> >> > > > @@ -2811,6 +2811,10 @@
>> >> > >> >> >> >> >> >> > > > \subsection{Feature
>> >> > >> >> >> >> >> >> > > > bits}\label{sec:Device
>> >> > >> >> >> >> >> >Types / Network Device / Feature bits
>> >> > >> >> >> >> >> >> > > >    \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)]
>> >> > >> >> >> >> >> >> > > > Set
>> >> > >MAC
>> >> > >> >> >> >> >> >> > > > address
>> >> > >> >> >> >> >> >through control
>> >> > >> >> >> >> >> >> > > >        channel.
>> >> > >> >> >> >> >> >> > > >
>+\item[VIRTIO_NET_F_CONST_RXBUF_LEN(58)]
>> >> > >> >> >> >> >> >> > > > +Driver allocates all
>> >> > >> >> >> >> >> >receive buffers
>> >> > >> >> >> >> >> >> > > > +    of the same constant length. Device
>> >> > >> >> >> >> >> >> > > > + benefits from working
>> >> > >> >> >> >with
>> >> > >> >> >> >> >> >> > > > +    receive buffers of equal length.
>> >> > >> >> >> >> >> >> > > > +
>> >> > >> >> >> >> >> >> > Problem is, I don't think linux will use this
>> >> > >> >> >> >> >> >> > for skbs since it breaks buffer accounting.
>> >> > >> >> >> >> >> >> > This is because it is important to make skbs
>> >> > >> >> >> >> >> >> > as small as you can. So even if you see
>> >> > >> >> >> >> >> >> > "device would
>> >> > >> >> >> >benefit"
>> >> > >> >> >> >> >> >> > there is no way to balance this with the
>> >> > >> >> >> >> >> >> > benefit to
>> >linux.
>> >> > >> >> >> >> >> >> > How do you know which benefit is bigger?
>> >> > >> >> >> >> >> >>
>> >> > >> >> >> >> >> >>
>> >> > >> >> >> >> >> >> I guess the idea is e.g for Linux driver, it
>> >> > >> >> >> >> >> >> can refuse this
>> >> > >feature.
>> >> > >> >> >> >> >> >
>> >> > >> >> >> >> >> >Okay. What uses it?
>> >> > >> >> >> >> >> >
>> >> > >> >> >> >> >> >>
>> >> > >> >> >> >> >> >> >
>> >> > >> >> >> >> >> >> > You also never explained how does device
>> >> > >> >> >> >> >> >> > benefit. My guess is this allows device to
>> >> > >> >> >> >> >> >> > calculate the # of descriptors to fetch that
>> >> > >> >> >> >> >> >> > are needed
>> >for a packet. Right?
>> >> > >> >> >> >> >> >> > Assuming this, I think a rough estimate
>> >> > >> >> >> >> >> >> > should be
>> >enough.
>> >> > >> >> >> >> >> >> > If device fetches too much it can discard
>> >> > >> >> >> >> >> >> > extra, if it does not fetch enough it can fetch extra.
>> >> > >> >> >> >> >> >> >
>> >> > >> >> >> >> >> >> > Let us not forget that express packets are
>> >> > >> >> >> >> >> >> > 256 bytes so up to
>> >> > >> >> >> >> >> >> > 16 descriptors fit in a packet, there is no
>> >> > >> >> >> >> >> >> > benefit most of the time in knowing whether e.g.
>> >> > >> >> >> >> >> >> > 1 or 2 descriptors are
>> >> > >> >needed.
>> >> > >> >> >> >> >> >> >
>> >> > >> >> >> >> >> >> >
>> >> > >> >> >> >> >> >> > Let us not forget these are buffers, not
>descriptors.
>> >> > >> >> >> >> >> >>
>> >> > >> >> >> >> >> >>
>> >> > >> >> >> >> >> >> I guess maybe the initial motivation is
>> >> > >> >> >> >> >> >> constant descriptor
>> >> > >> >length.
>> >> > >> >> >> >> >> >
>> >> > >> >> >> >> >> >
>> >> > >> >> >> >> >> >That conflicts with requirement framing is up to driver.
>> >> > >> >> >> >> >> >
>> >> > >> >> >> >> >> >
>> >> > >> >> >> >> >> I was using the wrong terminology. The feature is
>> >> > >> >> >> >> >> about constant
>> >> > >> >> >> >> >*descriptor* length. In other words, the value that
>> >> > >> >> >> >> >driver puts in Packed Virtqueue "Element Length"
>> >> > >> >> >> >> >field (or
>> >'len'
>> >> > >> >> >> >> >field in the
>> >> > >> >> >'pvirtq_desc'
>> >> > >> >> >> >struct).
>> >> > >> >> >> >> >
>> >> > >> >> >> >> >OK so this conflicts with "2.6.4 Message Framing"
>> >> > >> >> >> >> >which requires that drivers can split buffers into
>> >> > >> >> >> >> >as many descriptors as
>> >> > >> >they like.
>> >> > >> >> >> >> >Right?
>> >> > >> >> >> >> >
>> >> > >> >> >> >> >> Does it make more sense now, or you still see an
>> >> > >> >> >> >> >> issue with Linux
>> >> > >> >> >driver?
>> >> > >> >> >> >> >
>> >> > >> >> >> >> >I think there's an issue with the flexible framing
>> >> > >> >> >> >> >requirements and there's an issue with Linux driver.
>> >> > >> >> >> >> >
>> >> > >> >> >> >> >> The motivation is to allow the device to calculate
>> >> > >> >> >> >> >> the exact number of
>> >> > >> >> >> >> >descriptors to consume, before fetching the descriptors.
>> >> > >> >> >> >> >This is beneficial for devices for which
>> >> > >> >> >> >> >overconsuming or underconsuming descriptors come at a
>high cost.
>> >> > >> >> >> >> >
>> >> > >> >> >> >> >So I guess we can agree getting the # of descriptors
>> >> > >> >> >> >> >*exactly* right isn't all that important. Right? My
>> >> > >> >> >> >> >question is how does the driver balance the device
>> >> > >> >> >> >> >versus
>> >Linux needs?
>> >> > >> >> >> >> >
>> >> > >> >> >> >> >One idea is if we assume this is best effort, does
>> >> > >> >> >> >> >not have to be exact, then how about just having the
>> >> > >> >> >> >> >device assume descriptor sizes stay more or less
>> >> > >> >> >> >> >constant and use that to estimate the amount? If it
>> >> > >> >> >> >> >under/over estimates, things just go a bit
>> >> > >> >slower.
>> >> > >> >> >> >> >This way driver can adjust the sizes and device will
>> >> > >> >> >> >> >react automatically, with time.
>> >> > >> >> >> >> >
>> >> > >> >> >> >> >
>> >> > >> >> >> >>
>> >> > >> >> >> >> I see that "2.6.4 Message Framing" allows a full
>> >> > >> >> >> >> flexibility of descriptor lengths, and I presume it's
>> >> > >> >> >> >> applicable to Packet Virtqueues as well, though
>> >> > >> >> >> >> defined under Split Virtqueues
>> >> > >section.
>> >> > >> >> >> >
>> >> > >> >> >> >That's a good point. Probably makes sense to move it
>> >> > >> >> >> >out to a common section, right?
>> >> > >> >> >> >
>> >> > >> >> >> >> The example
>> >> > >> >> >> >> talks about transmit virtqueue, and it makes perfect sense.
>> >> > >> >> >> >> However, wouldn't a typical driver place equal-sized
>> >> > >> >> >> >> *receive* descriptors
>> >> > >> >> >> >anyway? So if a device can benefit from it, the driver
>> >> > >> >> >> >might as well negotiate this new feature.
>> >> > >> >> >> >
>> >> > >> >> >> >Having buffer size jump around wildly doesn't seem too
>> >> > >> >> >> >useful, I
>> >> > >agree.
>> >> > >> >> >> >OTOH being able to adjust it gradually has been in the
>> >> > >> >> >> >past demontrated to help performance measureably.
>> >> > >> >> >> >
>> >> > >> >> >> >> This could be especially relevant for devices, for
>> >> > >> >> >> >> which adjusting the number
>> >> > >> >> >> >of used descriptors is impractical after descriptors
>> >> > >> >> >> >have already been
>> >> > >> >> >fetched.
>> >> > >> >> >> >> I agree that if this requirement conflicts with
>> >> > >> >> >> >> specific driver needs, it will not
>> >> > >> >> >> >be negotiated, and the device will either underperform
>> >> > >> >> >> >in certain scenarios, or will not come up at all.
>> >> > >> >> >> >
>> >> > >> >> >> >Right so that makes it a challenge. Device says it
>> >> > >> >> >> >prefers fixed buffers. Is that preference more or less
>> >> > >> >> >> >important than ability to efficiently support workloads
>> >> > >> >> >> >such as TCP small
>> >queues?
>> >> > >> >> >> >Driver has no idea and I suspect neither does the device.
>> >> > >> >> >> >So I don't see how will a Linux driver know that it
>> >> > >> >> >> >should enable this, neither how will device know it's
>> >> > >> >> >> >okay to just refuse
>> >> > >features_ok.
>> >> > >> >> >> >
>> >> > >> >> >> >
>> >> > >> >> >> >On the other hand, current drivers already have logic
>> >> > >> >> >> >that tries to change buffer sizes in a smooth way.  So
>> >> > >> >> >> >simply caching the last buffer size and assuming all
>> >> > >> >> >> >the others will be exactly the same will go a long way
>> >> > >> >> >> >towards limiting how much does device need to
>> >> > >> >fetch.
>> >> > >> >> >> >This does imply extra logic on the device to recover if
>> >> > >> >> >> >things change and the first read did not fetch enough
>> >> > >> >> >> >buffers, but then it would be required anyway if as you
>> >> > >> >> >> >say above the failure is not
>> >> > >> >catastrophic.
>> >> > >> >> >> >
>> >> > >> >> >> >
>> >> > >> >> >> >The feature thus only seems useful for small,
>> >> > >> >> >> >feature-restrained devices
>> >> > >> >> >> >- which are prepared to sacrifice some performance to
>> >> > >> >> >> >cut costs, and which can't recover at all. Is this what
>> >> > >> >> >> >you are trying
>> >to do?
>> >> > >> >> >> >
>> >> > >> >> >>
>> >> > >> >> >> The intention is to enable devices with this specific
>> >> > >> >> >> limitation to offer virtio offload.  The feature can be
>> >> > >> >> >> redefined such that it would only be offered by devices
>> >> > >> >> >> that are unable to handle dynamically changing
>> >> > >> >> >> descriptor
>> >lengths. How does that sound?
>> >> > >> >> >
>> >> > >> >> >So it makes more sense when mergeable buffers are disabled
>> >> > >> >> >(since then buffers are practically all same size).
>> >> > >> >> >
>> >> > >> >>
>> >> > >> >> Actually, mergeable buffers are not a problem. They could
>> >> > >> >> be enabled, as long as each descriptor is the same length.
>> >> > >> >> So implementations that prefer optimizing memory
>> >> > >> >> utilization over jumbo frame performance can negotiate
>> >VIRTIO_NET_F_MRG_RXBUF
>> >> > >and allocate smaller buffers.
>> >> > >> >
>> >> > >> >So my point was, without VIRTIO_NET_F_MRG_RXBUF, all buffers
>> >> > >> >are
>> >> > >same
>> >> > >> >length anyway. So if we are talking about cheap simple
>> >> > >> >hardware, I guess it can just not have VIRTIO_NET_F_MRG_RXBUF
>> >> > >> >and be done with
>> >> > >it?
>> >> > >> >If device does care about memory utilization then I think it
>> >> > >> >needs to give driver the flexibility it needs/uses.
>> >> > >> >No?
>> >> > >> >
>> >> > >>
>> >> > >> It's not a matter of device complexity, but a HW architecture,
>> >> > >> which could be
>> >> > >complex, but have this specific limitation.
>> >> > >
>> >> > >So - don't do it then?
>> >> > >
>> >> > >> I see at least two reasons to support and negotiate
>> >> > >VIRTIO_NET_F_MRG_RXBUF, while keeping equal-sized descriptor
>> >limitation:
>> >> > >>  * GRO
>> >> > >
>> >> > >You mean LRO?
>> >> > >
>> >> > >>  * With jumbo packets, if the throughput is capped by the port
>> >> > >> bandwidth,
>> >> > >and not by the device/driver per-packet performance, it makes
>> >> > >sense to optimize memory utilization by allocating small
>> >> > >buffers, without sacrificing throughput performance.
>> >> > >
>> >> > >So we are back to square one, if driver cares about memory
>> >> > >utilization with 1K packets vs 9K buffers, why not with 100byte
>> >> > >packets vs 1K buffers? Looks like exactly the same tradeoff.
>> >> > >
>> >> > >
>> >> > >>
>> >> > >> >Again I can see how we might want to disallow crazy setups with
>e.g.
>> >> > >> >1 byte per buffer. That's just abuse, no guest does that anyway.
>> >> > >> >So asking e.g. for a minimal buffer size sounds very reasonable.
>> >> > >> >But an option that disables functionality that a popular
>> >> > >> >guest uses needs a lot of documentation to help device
>> >> > >> >writers figure out whether they want that option or not, and
>> >> > >> >I'd worry that even with documentation will
>> >> > >be misunderstood even if we write it.
>> >> > >> >When do you enable this?
>> >> > >> >When you don't care about performance? When you don't care
>> >> > >> >about
>> >> > >Linux?
>> >> > >> >
>> >> > >>
>> >> > >> I understand that there are guest-side optimizations that
>> >> > >> require flexibility
>> >> > >in descriptors length. I can propose the following simple logic:
>> >> > >> Device - advertise "equal-sized descriptor" feature only if
>> >> > >> the device is
>> >> > >absolutely unable to operate otherwise. The device will not set
>> >> > >FEATURES_OK unless the driver negotiates this feature.
>> >> > >> Driver - if device advertises "equal-sized descriptor" feature
>> >> > >> - if possible,
>> >> > >give up on the flexible descriptors length optimizations. If not
>> >> > >- give up on the device.
>> >> > >
>> >> > >Yes, that's possible. Looks like a rather limited feature, and
>> >> > >i'd rather we focused on something more widely applicable, but
>> >> > >with enough disclamers that devices SHOULD NOT set this bit we
>> >> > >can maybe
>> >do that.
>> >> > >
>> >> >
>> >> > I agree that this feature is more of a limitation declaration,
>> >> > rather than an
>> >enhancement, but let me emphasize that its only purpose is to make
>> >more
>> >*existing* HW devices be virtio compatible.
>> >> > This feature will allow such HW devices to offer virtio offload
>> >> > with
>> >VIRTIO_NET_F_MRG_RXBUF capability for GRO/LRO offload.
>> >> > New device designs should definitely avoid employing this feature.
>> >>
>> >> I understand, my question is: can you find a way to make this more
>> >> generally useful? Something along the lines of the min buffer size
>> >> suggestion which would avoid creating a special case in the driver?
>> >> Any idea?
>> >
>> >as an example of a vague idea, exposing max rx s/g, min buffer + max
>> >buffer will allow device to force this from device side.
>> >
>> >Is that good? If not, how does driver decide on a good fixed buffer size?
>> >
>>
>> Setting max=min=fixed_size by the device will work, but this seems too
>restrictive, as we still may want to enable the driver to select its buffer size.
>> I guess driver can select the fixed buffer size based on the MTU.
>
>
>Would that be based on max MTU field then?
>I note that that's read-only, so just starting with that and sending it back to the
>device doesn't change much ...
>

I had in mind the actual interface MTU, and not the maximum supported MTU (given the device provided one).
Perhaps it's a different discussion, but in this sense I find the 'mtu' field a bit confusing. Device advertises its maximum supported MTU, but then it's being used as an actual MTU (which can be much smaller). Shouldn't device enforce the actual MTU on TX and RX packets?
I also noticed that the spec mentions the maximum RX packet size is strictly 1514 bytes (w/o TSO).

>> What if device will request max/min buffer size ratio, and driver will set min
>buffer size? This can solve the fixed size issue, without forcing a specific size.
>> Along with the max s/g, maybe it can also help avoiding rx buffer size abuse
>by the driver (i.e. setting it too low).
>
>That sounds reasonably generic, too.
>

Ok, I will formulate a new patch.

>--
>MST


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature
  2019-12-02 16:16                                     ` Vitaly Mireyno
@ 2019-12-02 16:47                                       ` Michael S. Tsirkin
  0 siblings, 0 replies; 28+ messages in thread
From: Michael S. Tsirkin @ 2019-12-02 16:47 UTC (permalink / raw)
  To: Vitaly Mireyno; +Cc: Jason Wang, virtio-comment

On Mon, Dec 02, 2019 at 04:16:46PM +0000, Vitaly Mireyno wrote:
> >> >as an example of a vague idea, exposing max rx s/g, min buffer + max
> >> >buffer will allow device to force this from device side.
> >> >
> >> >Is that good? If not, how does driver decide on a good fixed buffer size?
> >> >
> >>
> >> Setting max=min=fixed_size by the device will work, but this seems too
> >restrictive, as we still may want to enable the driver to select its buffer size.
> >> I guess driver can select the fixed buffer size based on the MTU.
> >
> >
> >Would that be based on max MTU field then?
> >I note that that's read-only, so just starting with that and sending it back to the
> >device doesn't change much ...
> >
> 
> I had in mind the actual interface MTU, and not the maximum supported
> MTU (given the device provided one).  Perhaps it's a different
> discussion, but in this sense I find the 'mtu' field a bit confusing.
> Device advertises its maximum supported MTU, but then it's being used
> as an actual MTU (which can be much smaller).

This was sufficient for what it was used for (encapsulation).


> Shouldn't device enforce
> the actual MTU on TX and RX packets?

We need to think of a set of usecases to address though. It's also
tricky wrt things like migration.

>  I also noticed that the spec
> mentions the maximum RX packet size is strictly 1514 bytes (w/o TSO).

It might make sense to extend that for jumbo frames etc. And, I suspect
it doesn't match what drivers/devices actually do in all cases.
Would be great if someone looked at this some more.

> >> What if device will request max/min buffer size ratio, and driver will set min
> >buffer size? This can solve the fixed size issue, without forcing a specific size.
> >> Along with the max s/g, maybe it can also help avoiding rx buffer size abuse
> >by the driver (i.e. setting it too low).
> >
> >That sounds reasonably generic, too.
> >
> 
> Ok, I will formulate a new patch.
> 
> >--
> >MST
> 


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2019-12-02 16:47 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-31 10:46 [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature Vitaly Mireyno
2019-11-01  4:09 ` Michael S. Tsirkin
2019-11-01  7:22   ` Jason Wang
2019-11-05 18:51     ` Michael S. Tsirkin
2019-11-10 14:13       ` [virtio-comment] RE: [EXT] " Vitaly Mireyno
2019-11-10 15:34         ` [virtio-comment] " Michael S. Tsirkin
2019-11-11 13:58           ` [virtio-comment] " Vitaly Mireyno
2019-11-11 15:05             ` [virtio-comment] " Michael S. Tsirkin
2019-11-14 15:49               ` [virtio-comment] " Vitaly Mireyno
2019-11-20 13:23                 ` [virtio-comment] " Michael S. Tsirkin
2019-11-24 15:02                   ` [virtio-comment] " Vitaly Mireyno
2019-11-24 15:30                     ` [virtio-comment] " Michael S. Tsirkin
2019-11-25  3:31                       ` Jason Wang
2019-11-25  7:33                         ` Michael S. Tsirkin
2019-11-25  8:07                           ` Jason Wang
2019-11-25  8:18                             ` Michael S. Tsirkin
2019-11-25  8:22                               ` Jason Wang
2019-11-25  8:30                                 ` Michael S. Tsirkin
2019-11-25  9:46                                   ` Jason Wang
2019-11-26 17:27                       ` Vitaly Mireyno
2019-11-27 13:51                         ` Michael S. Tsirkin
2019-12-01 10:22                           ` Vitaly Mireyno
2019-12-01 13:07                             ` Michael S. Tsirkin
2019-12-01 21:43                               ` Michael S. Tsirkin
2019-12-02 15:09                                 ` Vitaly Mireyno
2019-12-02 15:23                                   ` Michael S. Tsirkin
2019-12-02 16:16                                     ` Vitaly Mireyno
2019-12-02 16:47                                       ` Michael S. Tsirkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.