linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
@ 2015-09-16 15:50 Vitaly Kuznetsov
  2015-09-16 16:08 ` Haiyang Zhang
  0 siblings, 1 reply; 17+ messages in thread
From: Vitaly Kuznetsov @ 2015-09-16 15:50 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, linux-kernel, K. Y. Srinivasan, Haiyang Zhang,
	Jason Wang

Commit b08cc79155fc26d0d112b1470d1ece5034651a4b ("hv_netvsc: Eliminate
 memory allocation in the packet send path") introduced skb headroom
request for Hyper-V netvsc driver:

       max_needed_headroom = sizeof(struct hv_netvsc_packet) +
                               sizeof(struct rndis_message) +
                               NDIS_VLAN_PPI_SIZE + NDIS_CSUM_PPI_SIZE +
                               NDIS_LSO_PPI_SIZE + NDIS_HASH_PPI_SIZE;
       ...
       net->needed_headroom = max_needed_headroom;

max_needed_headroom is 220 bytes, it significantly exceeds the
LL_MAX_HEADER setting. This causes each skb to be cloned on send path,
e.g. for IPv4 case we fall into the following clause
(ip_finish_output2()):

if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
    ...
    skb2 = skb_realloc_headroom(skb, LL_RESERVED_SPACE(dev));
    ...
}

leading to a significant performance regression. Increase LL_MAX_HEADER
to make it suitable for netvsc, make it 224 to be 16-aligned.
Alternatively we could (partially) revert the commit which introduced skb
headroom request restoring manual memory allocation on transmit path.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 include/linux/netdevice.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 88a0069..7233790 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -132,7 +132,9 @@ static inline bool dev_xmit_complete(int rc)
  *	used.
  */
 
-#if defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
+#if IS_ENABLED(CONFIG_HYPERV_NET)
+# define LL_MAX_HEADER 224
+#elif defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
 # if defined(CONFIG_MAC80211_MESH)
 #  define LL_MAX_HEADER 128
 # else
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
  2015-09-16 15:50 [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V Vitaly Kuznetsov
@ 2015-09-16 16:08 ` Haiyang Zhang
  2015-09-16 16:25   ` David Laight
  0 siblings, 1 reply; 17+ messages in thread
From: Haiyang Zhang @ 2015-09-16 16:08 UTC (permalink / raw)
  To: Vitaly Kuznetsov, netdev
  Cc: David S. Miller, linux-kernel, KY Srinivasan, Jason Wang



> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> Sent: Wednesday, September 16, 2015 11:50 AM
> To: netdev@vger.kernel.org
> Cc: David S. Miller <davem@davemloft.net>; linux-kernel@vger.kernel.org;
> KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; Jason Wang <jasowang@redhat.com>
> Subject: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
> 
> Commit b08cc79155fc26d0d112b1470d1ece5034651a4b ("hv_netvsc: Eliminate
>  memory allocation in the packet send path") introduced skb headroom
> request for Hyper-V netvsc driver:
> 
>        max_needed_headroom = sizeof(struct hv_netvsc_packet) +
>                                sizeof(struct rndis_message) +
>                                NDIS_VLAN_PPI_SIZE + NDIS_CSUM_PPI_SIZE +
>                                NDIS_LSO_PPI_SIZE + NDIS_HASH_PPI_SIZE;
>        ...
>        net->needed_headroom = max_needed_headroom;
> 
> max_needed_headroom is 220 bytes, it significantly exceeds the
> LL_MAX_HEADER setting. This causes each skb to be cloned on send path,
> e.g. for IPv4 case we fall into the following clause
> (ip_finish_output2()):
> 
> if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
>     ...
>     skb2 = skb_realloc_headroom(skb, LL_RESERVED_SPACE(dev));
>     ...
> }
> 
> leading to a significant performance regression. Increase LL_MAX_HEADER
> to make it suitable for netvsc, make it 224 to be 16-aligned.
> Alternatively we could (partially) revert the commit which introduced
> skb
> headroom request restoring manual memory allocation on transmit path.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  include/linux/netdevice.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 88a0069..7233790 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -132,7 +132,9 @@ static inline bool dev_xmit_complete(int rc)
>   *	used.
>   */
> 
> -#if defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
> +#if IS_ENABLED(CONFIG_HYPERV_NET)
> +# define LL_MAX_HEADER 224
> +#elif defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
>  # if defined(CONFIG_MAC80211_MESH)
>  #  define LL_MAX_HEADER 128
>  # else

Thanks for the patch.
To avoid we forget to update that 224 number when we add more things
into netvsc header, I suggest that we define a macro in netdevice.h such 
as:
#define HVNETVSC_MAX_HEADER 224
#define LL_MAX_HEADER HVNETVSC_MAX_HEADER

And, put a note in netvsc code saying the header reservation shouldn't 
exceed HVNETVSC_MAX_HEADER, or you need to update HVNETVSC_MAX_HEADER.

Thanks,
- Haiyang





^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
  2015-09-16 16:08 ` Haiyang Zhang
@ 2015-09-16 16:25   ` David Laight
  2015-09-16 17:26     ` Alexander Duyck
                       ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: David Laight @ 2015-09-16 16:25 UTC (permalink / raw)
  To: 'Haiyang Zhang', Vitaly Kuznetsov, netdev
  Cc: David S. Miller, linux-kernel, KY Srinivasan, Jason Wang

From: Haiyang Zhang
> Sent: 16 September 2015 17:09
> > -----Original Message-----
> > From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> > Sent: Wednesday, September 16, 2015 11:50 AM
> > To: netdev@vger.kernel.org
> > Cc: David S. Miller <davem@davemloft.net>; linux-kernel@vger.kernel.org;
> > KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> > <haiyangz@microsoft.com>; Jason Wang <jasowang@redhat.com>
> > Subject: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
> >
> > Commit b08cc79155fc26d0d112b1470d1ece5034651a4b ("hv_netvsc: Eliminate
> >  memory allocation in the packet send path") introduced skb headroom
> > request for Hyper-V netvsc driver:
> >
> >        max_needed_headroom = sizeof(struct hv_netvsc_packet) +
> >                                sizeof(struct rndis_message) +
> >                                NDIS_VLAN_PPI_SIZE + NDIS_CSUM_PPI_SIZE +
> >                                NDIS_LSO_PPI_SIZE + NDIS_HASH_PPI_SIZE;
> >        ...
> >        net->needed_headroom = max_needed_headroom;
> >
> > max_needed_headroom is 220 bytes, it significantly exceeds the
> > LL_MAX_HEADER setting. This causes each skb to be cloned on send path,
> > e.g. for IPv4 case we fall into the following clause
> > (ip_finish_output2()):
> >
> > if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
> >     ...
> >     skb2 = skb_realloc_headroom(skb, LL_RESERVED_SPACE(dev));
> >     ...
> > }
> >
> > leading to a significant performance regression. Increase LL_MAX_HEADER
> > to make it suitable for netvsc, make it 224 to be 16-aligned.
> > Alternatively we could (partially) revert the commit which introduced
> > skb
> > headroom request restoring manual memory allocation on transmit path.
> >
> > Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> > ---
> >  include/linux/netdevice.h | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 88a0069..7233790 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -132,7 +132,9 @@ static inline bool dev_xmit_complete(int rc)
> >   *	used.
> >   */
> >
> > -#if defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
> > +#if IS_ENABLED(CONFIG_HYPERV_NET)
> > +# define LL_MAX_HEADER 224
> > +#elif defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
> >  # if defined(CONFIG_MAC80211_MESH)
> >  #  define LL_MAX_HEADER 128
> >  # else
> 
> Thanks for the patch.
> To avoid we forget to update that 224 number when we add more things
> into netvsc header, I suggest that we define a macro in netdevice.h such
> as:
> #define HVNETVSC_MAX_HEADER 224
> #define LL_MAX_HEADER HVNETVSC_MAX_HEADER
> 
> And, put a note in netvsc code saying the header reservation shouldn't
> exceed HVNETVSC_MAX_HEADER, or you need to update HVNETVSC_MAX_HEADER.

Am I right in thinking this is adding an extra 96 unused bytes to the front
of almost all skb just so that hyper-v can make its link level header
contiguous with whatever follows (IP header ?).

Doesn't sound ideal.

	David


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
  2015-09-16 16:25   ` David Laight
@ 2015-09-16 17:26     ` Alexander Duyck
  2015-09-16 17:55     ` KY Srinivasan
  2015-09-16 17:59     ` David Miller
  2 siblings, 0 replies; 17+ messages in thread
From: Alexander Duyck @ 2015-09-16 17:26 UTC (permalink / raw)
  To: David Laight, 'Haiyang Zhang', Vitaly Kuznetsov, netdev
  Cc: David S. Miller, linux-kernel, KY Srinivasan, Jason Wang

On 09/16/2015 09:25 AM, David Laight wrote:
> From: Haiyang Zhang
>> Sent: 16 September 2015 17:09
>>> -----Original Message-----
>>> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
>>> Sent: Wednesday, September 16, 2015 11:50 AM
>>> To: netdev@vger.kernel.org
>>> Cc: David S. Miller <davem@davemloft.net>; linux-kernel@vger.kernel.org;
>>> KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
>>> <haiyangz@microsoft.com>; Jason Wang <jasowang@redhat.com>
>>> Subject: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
>>>
>>> Commit b08cc79155fc26d0d112b1470d1ece5034651a4b ("hv_netvsc: Eliminate
>>>   memory allocation in the packet send path") introduced skb headroom
>>> request for Hyper-V netvsc driver:
>>>
>>>         max_needed_headroom = sizeof(struct hv_netvsc_packet) +
>>>                                 sizeof(struct rndis_message) +
>>>                                 NDIS_VLAN_PPI_SIZE + NDIS_CSUM_PPI_SIZE +
>>>                                 NDIS_LSO_PPI_SIZE + NDIS_HASH_PPI_SIZE;
>>>         ...
>>>         net->needed_headroom = max_needed_headroom;
>>>
>>> max_needed_headroom is 220 bytes, it significantly exceeds the
>>> LL_MAX_HEADER setting. This causes each skb to be cloned on send path,
>>> e.g. for IPv4 case we fall into the following clause
>>> (ip_finish_output2()):
>>>
>>> if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
>>>      ...
>>>      skb2 = skb_realloc_headroom(skb, LL_RESERVED_SPACE(dev));
>>>      ...
>>> }
>>>
>>> leading to a significant performance regression. Increase LL_MAX_HEADER
>>> to make it suitable for netvsc, make it 224 to be 16-aligned.
>>> Alternatively we could (partially) revert the commit which introduced
>>> skb
>>> headroom request restoring manual memory allocation on transmit path.
>>>
>>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>>> ---
>>>   include/linux/netdevice.h | 4 +++-
>>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>> index 88a0069..7233790 100644
>>> --- a/include/linux/netdevice.h
>>> +++ b/include/linux/netdevice.h
>>> @@ -132,7 +132,9 @@ static inline bool dev_xmit_complete(int rc)
>>>    *	used.
>>>    */
>>>
>>> -#if defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
>>> +#if IS_ENABLED(CONFIG_HYPERV_NET)
>>> +# define LL_MAX_HEADER 224
>>> +#elif defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
>>>   # if defined(CONFIG_MAC80211_MESH)
>>>   #  define LL_MAX_HEADER 128
>>>   # else
>> Thanks for the patch.
>> To avoid we forget to update that 224 number when we add more things
>> into netvsc header, I suggest that we define a macro in netdevice.h such
>> as:
>> #define HVNETVSC_MAX_HEADER 224
>> #define LL_MAX_HEADER HVNETVSC_MAX_HEADER
>>
>> And, put a note in netvsc code saying the header reservation shouldn't
>> exceed HVNETVSC_MAX_HEADER, or you need to update HVNETVSC_MAX_HEADER.
> Am I right in thinking this is adding an extra 96 unused bytes to the front
> of almost all skb just so that hyper-v can make its link level header
> contiguous with whatever follows (IP header ?).
>
> Doesn't sound ideal.

Actually it will probably add closer to 128 unused bytes per frame since 
an skb allocation is rounded off to the nearest cache line in terms of size.

Is there a reason why this buffer info and descriptor like thing even 
needs to be in the packet header?  From what I can tell it was being 
allocated separately before.  Why not just maintain a ring of them if 
you need to keep them around and want to avoid having to reallocate them 
for the next transmit?  Also I don't know if the layout  matters but it 
probably wouldn't hurt to go through with pahole and clean the 
hv_netvsc_packet structure up.  You could probably save yourself 
something like 20+ bytes just doing that.

- Alex

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
  2015-09-16 16:25   ` David Laight
  2015-09-16 17:26     ` Alexander Duyck
@ 2015-09-16 17:55     ` KY Srinivasan
  2015-09-16 21:38       ` Alexander Duyck
  2015-09-16 17:59     ` David Miller
  2 siblings, 1 reply; 17+ messages in thread
From: KY Srinivasan @ 2015-09-16 17:55 UTC (permalink / raw)
  To: David Laight, Haiyang Zhang, Vitaly Kuznetsov, netdev
  Cc: David S. Miller, linux-kernel, Jason Wang



> -----Original Message-----
> From: David Laight [mailto:David.Laight@ACULAB.COM]
> Sent: Wednesday, September 16, 2015 9:25 AM
> To: Haiyang Zhang <haiyangz@microsoft.com>; Vitaly Kuznetsov
> <vkuznets@redhat.com>; netdev@vger.kernel.org
> Cc: David S. Miller <davem@davemloft.net>; linux-kernel@vger.kernel.org;
> KY Srinivasan <kys@microsoft.com>; Jason Wang <jasowang@redhat.com>
> Subject: RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-
> V
> 
> From: Haiyang Zhang
> > Sent: 16 September 2015 17:09
> > > -----Original Message-----
> > > From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> > > Sent: Wednesday, September 16, 2015 11:50 AM
> > > To: netdev@vger.kernel.org
> > > Cc: David S. Miller <davem@davemloft.net>; linux-
> kernel@vger.kernel.org;
> > > KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> > > <haiyangz@microsoft.com>; Jason Wang <jasowang@redhat.com>
> > > Subject: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-
> V
> > >
> > > Commit b08cc79155fc26d0d112b1470d1ece5034651a4b ("hv_netvsc:
> Eliminate
> > >  memory allocation in the packet send path") introduced skb headroom
> > > request for Hyper-V netvsc driver:
> > >
> > >        max_needed_headroom = sizeof(struct hv_netvsc_packet) +
> > >                                sizeof(struct rndis_message) +
> > >                                NDIS_VLAN_PPI_SIZE + NDIS_CSUM_PPI_SIZE +
> > >                                NDIS_LSO_PPI_SIZE + NDIS_HASH_PPI_SIZE;
> > >        ...
> > >        net->needed_headroom = max_needed_headroom;
> > >
> > > max_needed_headroom is 220 bytes, it significantly exceeds the
> > > LL_MAX_HEADER setting. This causes each skb to be cloned on send
> path,
> > > e.g. for IPv4 case we fall into the following clause
> > > (ip_finish_output2()):
> > >
> > > if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
> > >     ...
> > >     skb2 = skb_realloc_headroom(skb, LL_RESERVED_SPACE(dev));
> > >     ...
> > > }
> > >
> > > leading to a significant performance regression. Increase
> LL_MAX_HEADER
> > > to make it suitable for netvsc, make it 224 to be 16-aligned.
> > > Alternatively we could (partially) revert the commit which introduced
> > > skb
> > > headroom request restoring manual memory allocation on transmit path.
> > >
> > > Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> > > ---
> > >  include/linux/netdevice.h | 4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > index 88a0069..7233790 100644
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -132,7 +132,9 @@ static inline bool dev_xmit_complete(int rc)
> > >   *	used.
> > >   */
> > >
> > > -#if defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
> > > +#if IS_ENABLED(CONFIG_HYPERV_NET)
> > > +# define LL_MAX_HEADER 224
> > > +#elif defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
> > >  # if defined(CONFIG_MAC80211_MESH)
> > >  #  define LL_MAX_HEADER 128
> > >  # else
> >
> > Thanks for the patch.
> > To avoid we forget to update that 224 number when we add more things
> > into netvsc header, I suggest that we define a macro in netdevice.h such
> > as:
> > #define HVNETVSC_MAX_HEADER 224
> > #define LL_MAX_HEADER HVNETVSC_MAX_HEADER
> >
> > And, put a note in netvsc code saying the header reservation shouldn't
> > exceed HVNETVSC_MAX_HEADER, or you need to update
> HVNETVSC_MAX_HEADER.
> 
> Am I right in thinking this is adding an extra 96 unused bytes to the front
> of almost all skb just so that hyper-v can make its link level header
> contiguous with whatever follows (IP header ?).
> 
> Doesn't sound ideal.

Remote NDIS is the protocol used to send packets from the guest to the host. Every packet
needs to be decorated with the RNDIS header and the maximum room needed for the RNDIS
header is the hreadroom we want.

K. Y
> 
> 	David


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
  2015-09-16 16:25   ` David Laight
  2015-09-16 17:26     ` Alexander Duyck
  2015-09-16 17:55     ` KY Srinivasan
@ 2015-09-16 17:59     ` David Miller
  2015-09-17  9:03       ` Vitaly Kuznetsov
  2 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2015-09-16 17:59 UTC (permalink / raw)
  To: David.Laight; +Cc: haiyangz, vkuznets, netdev, linux-kernel, kys, jasowang

From: David Laight <David.Laight@ACULAB.COM>
Date: Wed, 16 Sep 2015 16:25:03 +0000

> Am I right in thinking this is adding an extra 96 unused bytes to the front
> of almost all skb just so that hyper-v can make its link level header
> contiguous with whatever follows (IP header ?).
> 
> Doesn't sound ideal.

Agreed, this is rediculous, and the entire stack will incur this cost
just because hyperv is enabled in the kernel config.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
  2015-09-16 17:55     ` KY Srinivasan
@ 2015-09-16 21:38       ` Alexander Duyck
  2015-09-16 22:57         ` KY Srinivasan
  0 siblings, 1 reply; 17+ messages in thread
From: Alexander Duyck @ 2015-09-16 21:38 UTC (permalink / raw)
  To: KY Srinivasan, David Laight, Haiyang Zhang, Vitaly Kuznetsov, netdev
  Cc: David S. Miller, linux-kernel, Jason Wang

On 09/16/2015 10:55 AM, KY Srinivasan wrote:
>
>> -----Original Message-----
>> From: David Laight [mailto:David.Laight@ACULAB.COM]
>> Sent: Wednesday, September 16, 2015 9:25 AM
>> To: Haiyang Zhang <haiyangz@microsoft.com>; Vitaly Kuznetsov
>> <vkuznets@redhat.com>; netdev@vger.kernel.org
>> Cc: David S. Miller <davem@davemloft.net>; linux-kernel@vger.kernel.org;
>> KY Srinivasan <kys@microsoft.com>; Jason Wang <jasowang@redhat.com>
>> Subject: RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-
>> V
>>
>> From: Haiyang Zhang
>>> Sent: 16 September 2015 17:09
>>>> -----Original Message-----
>>>> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
>>>> Sent: Wednesday, September 16, 2015 11:50 AM
>>>> To: netdev@vger.kernel.org
>>>> Cc: David S. Miller <davem@davemloft.net>; linux-
>> kernel@vger.kernel.org;
>>>> KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
>>>> <haiyangz@microsoft.com>; Jason Wang <jasowang@redhat.com>
>>>> Subject: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-
>> V
>>>> Commit b08cc79155fc26d0d112b1470d1ece5034651a4b ("hv_netvsc:
>> Eliminate
>>>>   memory allocation in the packet send path") introduced skb headroom
>>>> request for Hyper-V netvsc driver:
>>>>
>>>>         max_needed_headroom = sizeof(struct hv_netvsc_packet) +
>>>>                                 sizeof(struct rndis_message) +
>>>>                                 NDIS_VLAN_PPI_SIZE + NDIS_CSUM_PPI_SIZE +
>>>>                                 NDIS_LSO_PPI_SIZE + NDIS_HASH_PPI_SIZE;
>>>>         ...
>>>>         net->needed_headroom = max_needed_headroom;
>>>>
>>>> max_needed_headroom is 220 bytes, it significantly exceeds the
>>>> LL_MAX_HEADER setting. This causes each skb to be cloned on send
>> path,
>>>> e.g. for IPv4 case we fall into the following clause
>>>> (ip_finish_output2()):
>>>>
>>>> if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
>>>>      ...
>>>>      skb2 = skb_realloc_headroom(skb, LL_RESERVED_SPACE(dev));
>>>>      ...
>>>> }
>>>>
>>>> leading to a significant performance regression. Increase
>> LL_MAX_HEADER
>>>> to make it suitable for netvsc, make it 224 to be 16-aligned.
>>>> Alternatively we could (partially) revert the commit which introduced
>>>> skb
>>>> headroom request restoring manual memory allocation on transmit path.
>>>>
>>>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>>>> ---
>>>>   include/linux/netdevice.h | 4 +++-
>>>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>> index 88a0069..7233790 100644
>>>> --- a/include/linux/netdevice.h
>>>> +++ b/include/linux/netdevice.h
>>>> @@ -132,7 +132,9 @@ static inline bool dev_xmit_complete(int rc)
>>>>    *	used.
>>>>    */
>>>>
>>>> -#if defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
>>>> +#if IS_ENABLED(CONFIG_HYPERV_NET)
>>>> +# define LL_MAX_HEADER 224
>>>> +#elif defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
>>>>   # if defined(CONFIG_MAC80211_MESH)
>>>>   #  define LL_MAX_HEADER 128
>>>>   # else
>>> Thanks for the patch.
>>> To avoid we forget to update that 224 number when we add more things
>>> into netvsc header, I suggest that we define a macro in netdevice.h such
>>> as:
>>> #define HVNETVSC_MAX_HEADER 224
>>> #define LL_MAX_HEADER HVNETVSC_MAX_HEADER
>>>
>>> And, put a note in netvsc code saying the header reservation shouldn't
>>> exceed HVNETVSC_MAX_HEADER, or you need to update
>> HVNETVSC_MAX_HEADER.
>>
>> Am I right in thinking this is adding an extra 96 unused bytes to the front
>> of almost all skb just so that hyper-v can make its link level header
>> contiguous with whatever follows (IP header ?).
>>
>> Doesn't sound ideal.
> Remote NDIS is the protocol used to send packets from the guest to the host. Every packet
> needs to be decorated with the RNDIS header and the maximum room needed for the RNDIS
> header is the hreadroom we want.

I think we get that.  The question is does the Remote NDIS header and 
packet info actually need to be a part of the header data?  I would 
argue that it probably doesn't.

So for example in netvsc_start_xmit it looks like you are calling 
init_page_array in order to populate a set of page buffers, but the 
first buffer for the Remote NDIS protocol is populated as a separate 
page and offset.  As such it doesn't seem like it necessarily needs to 
be a part of the header data but could be maintained perhaps in a 
separate ring buffer, or perhaps just be a separate page that you break 
up to use for each header.

- Alex

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
  2015-09-16 21:38       ` Alexander Duyck
@ 2015-09-16 22:57         ` KY Srinivasan
  2015-09-16 23:48           ` Alexander Duyck
  2015-09-17  8:38           ` David Laight
  0 siblings, 2 replies; 17+ messages in thread
From: KY Srinivasan @ 2015-09-16 22:57 UTC (permalink / raw)
  To: Alexander Duyck, David Laight, Haiyang Zhang, Vitaly Kuznetsov, netdev
  Cc: David S. Miller, linux-kernel, Jason Wang



> -----Original Message-----
> From: Alexander Duyck [mailto:alexander.duyck@gmail.com]
> Sent: Wednesday, September 16, 2015 2:39 PM
> To: KY Srinivasan <kys@microsoft.com>; David Laight
> <David.Laight@ACULAB.COM>; Haiyang Zhang <haiyangz@microsoft.com>;
> Vitaly Kuznetsov <vkuznets@redhat.com>; netdev@vger.kernel.org
> Cc: David S. Miller <davem@davemloft.net>; linux-kernel@vger.kernel.org;
> Jason Wang <jasowang@redhat.com>
> Subject: Re: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
> 
> On 09/16/2015 10:55 AM, KY Srinivasan wrote:
> >
> >> -----Original Message-----
> >> From: David Laight [mailto:David.Laight@ACULAB.COM]
> >> Sent: Wednesday, September 16, 2015 9:25 AM
> >> To: Haiyang Zhang <haiyangz@microsoft.com>; Vitaly Kuznetsov
> >> <vkuznets@redhat.com>; netdev@vger.kernel.org
> >> Cc: David S. Miller <davem@davemloft.net>; linux-kernel@vger.kernel.org;
> >> KY Srinivasan <kys@microsoft.com>; Jason Wang <jasowang@redhat.com>
> >> Subject: RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-
> >> V
> >>
> >> From: Haiyang Zhang
> >>> Sent: 16 September 2015 17:09
> >>>> -----Original Message-----
> >>>> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> >>>> Sent: Wednesday, September 16, 2015 11:50 AM
> >>>> To: netdev@vger.kernel.org
> >>>> Cc: David S. Miller <davem@davemloft.net>; linux-
> >> kernel@vger.kernel.org;
> >>>> KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> >>>> <haiyangz@microsoft.com>; Jason Wang <jasowang@redhat.com>
> >>>> Subject: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-
> >> V
> >>>> Commit b08cc79155fc26d0d112b1470d1ece5034651a4b ("hv_netvsc:
> >> Eliminate
> >>>>   memory allocation in the packet send path") introduced skb headroom
> >>>> request for Hyper-V netvsc driver:
> >>>>
> >>>>         max_needed_headroom = sizeof(struct hv_netvsc_packet) +
> >>>>                                 sizeof(struct rndis_message) +
> >>>>                                 NDIS_VLAN_PPI_SIZE + NDIS_CSUM_PPI_SIZE +
> >>>>                                 NDIS_LSO_PPI_SIZE + NDIS_HASH_PPI_SIZE;
> >>>>         ...
> >>>>         net->needed_headroom = max_needed_headroom;
> >>>>
> >>>> max_needed_headroom is 220 bytes, it significantly exceeds the
> >>>> LL_MAX_HEADER setting. This causes each skb to be cloned on send
> >> path,
> >>>> e.g. for IPv4 case we fall into the following clause
> >>>> (ip_finish_output2()):
> >>>>
> >>>> if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
> >>>>      ...
> >>>>      skb2 = skb_realloc_headroom(skb, LL_RESERVED_SPACE(dev));
> >>>>      ...
> >>>> }
> >>>>
> >>>> leading to a significant performance regression. Increase
> >> LL_MAX_HEADER
> >>>> to make it suitable for netvsc, make it 224 to be 16-aligned.
> >>>> Alternatively we could (partially) revert the commit which introduced
> >>>> skb
> >>>> headroom request restoring manual memory allocation on transmit path.
> >>>>
> >>>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> >>>> ---
> >>>>   include/linux/netdevice.h | 4 +++-
> >>>>   1 file changed, 3 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> >>>> index 88a0069..7233790 100644
> >>>> --- a/include/linux/netdevice.h
> >>>> +++ b/include/linux/netdevice.h
> >>>> @@ -132,7 +132,9 @@ static inline bool dev_xmit_complete(int rc)
> >>>>    *	used.
> >>>>    */
> >>>>
> >>>> -#if defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
> >>>> +#if IS_ENABLED(CONFIG_HYPERV_NET)
> >>>> +# define LL_MAX_HEADER 224
> >>>> +#elif defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
> >>>>   # if defined(CONFIG_MAC80211_MESH)
> >>>>   #  define LL_MAX_HEADER 128
> >>>>   # else
> >>> Thanks for the patch.
> >>> To avoid we forget to update that 224 number when we add more things
> >>> into netvsc header, I suggest that we define a macro in netdevice.h such
> >>> as:
> >>> #define HVNETVSC_MAX_HEADER 224
> >>> #define LL_MAX_HEADER HVNETVSC_MAX_HEADER
> >>>
> >>> And, put a note in netvsc code saying the header reservation shouldn't
> >>> exceed HVNETVSC_MAX_HEADER, or you need to update
> >> HVNETVSC_MAX_HEADER.
> >>
> >> Am I right in thinking this is adding an extra 96 unused bytes to the front
> >> of almost all skb just so that hyper-v can make its link level header
> >> contiguous with whatever follows (IP header ?).
> >>
> >> Doesn't sound ideal.
> > Remote NDIS is the protocol used to send packets from the guest to the host.
> Every packet
> > needs to be decorated with the RNDIS header and the maximum room needed
> for the RNDIS
> > header is the hreadroom we want.
> 
> I think we get that.  The question is does the Remote NDIS header and
> packet info actually need to be a part of the header data?  I would
> argue that it probably doesn't.
> 
> So for example in netvsc_start_xmit it looks like you are calling
> init_page_array in order to populate a set of page buffers, but the
> first buffer for the Remote NDIS protocol is populated as a separate
> page and offset.  As such it doesn't seem like it necessarily needs to
> be a part of the header data but could be maintained perhaps in a
> separate ring buffer, or perhaps just be a separate page that you break
> up to use for each header.

You are right; the rndis header can be built as a separate fragment and sent.
Indeed this is what we were doing earlier - on the outgoing path we would allocate
memory for the rndis header. My goal was to avoid this allocation on every packet being
sent and I decided to use the headroom instead. If we can completely avoid all memory
allocation for rndis header, it makes a significant perf difference:

Throughput as measured by iperf on a 40G interface (VM to VM on two nodes) in Gbps.
Scenario #A: LL_MAX_HEADER =128 [no change], needed_headroom = 220 [no change]
Scenario #B: LL_MAX_HEADER =224, needed_headroom = 220 [no change]

Conn#   #A                     #B          
1              6.9                  8.2  
2              13.2                14.9        
4              17.6                16.6        
8              24.1                26.9        
16           24.0                 31.5        
32           24.5                 33.6        
64           31.6                 31.5        
128         29.6                 30.3        

Column A is the existing code where we end up having to allocate more headroom and column B is with
Vitaly's patch. I will experiment with a light-weight allocator for the rndis header.

Regards,

K. Y





> 
> - Alex

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
  2015-09-16 22:57         ` KY Srinivasan
@ 2015-09-16 23:48           ` Alexander Duyck
  2015-09-16 23:58             ` KY Srinivasan
  2015-09-17  8:38           ` David Laight
  1 sibling, 1 reply; 17+ messages in thread
From: Alexander Duyck @ 2015-09-16 23:48 UTC (permalink / raw)
  To: KY Srinivasan, David Laight, Haiyang Zhang, Vitaly Kuznetsov, netdev
  Cc: David S. Miller, linux-kernel, Jason Wang

On 09/16/2015 03:57 PM, KY Srinivasan wrote:
>
>> -----Original Message-----
>> From: Alexander Duyck [mailto:alexander.duyck@gmail.com]
>> Sent: Wednesday, September 16, 2015 2:39 PM
>> To: KY Srinivasan <kys@microsoft.com>; David Laight
>> <David.Laight@ACULAB.COM>; Haiyang Zhang <haiyangz@microsoft.com>;
>> Vitaly Kuznetsov <vkuznets@redhat.com>; netdev@vger.kernel.org
>> Cc: David S. Miller <davem@davemloft.net>; linux-kernel@vger.kernel.org;
>> Jason Wang <jasowang@redhat.com>
>> Subject: Re: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
>>
>> On 09/16/2015 10:55 AM, KY Srinivasan wrote:
>>>> -----Original Message-----
>>>> From: David Laight [mailto:David.Laight@ACULAB.COM]
>>>> Sent: Wednesday, September 16, 2015 9:25 AM
>>>> To: Haiyang Zhang <haiyangz@microsoft.com>; Vitaly Kuznetsov
>>>> <vkuznets@redhat.com>; netdev@vger.kernel.org
>>>> Cc: David S. Miller <davem@davemloft.net>; linux-kernel@vger.kernel.org;
>>>> KY Srinivasan <kys@microsoft.com>; Jason Wang <jasowang@redhat.com>
>>>> Subject: RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-
>>>> V
>>>>
>>>> From: Haiyang Zhang
>>>>> Sent: 16 September 2015 17:09
>>>>>> -----Original Message-----
>>>>>> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
>>>>>> Sent: Wednesday, September 16, 2015 11:50 AM
>>>>>> To: netdev@vger.kernel.org
>>>>>> Cc: David S. Miller <davem@davemloft.net>; linux-
>>>> kernel@vger.kernel.org;
>>>>>> KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
>>>>>> <haiyangz@microsoft.com>; Jason Wang <jasowang@redhat.com>
>>>>>> Subject: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-
>>>> V
>>>>>> Commit b08cc79155fc26d0d112b1470d1ece5034651a4b ("hv_netvsc:
>>>> Eliminate
>>>>>>    memory allocation in the packet send path") introduced skb headroom
>>>>>> request for Hyper-V netvsc driver:
>>>>>>
>>>>>>          max_needed_headroom = sizeof(struct hv_netvsc_packet) +
>>>>>>                                  sizeof(struct rndis_message) +
>>>>>>                                  NDIS_VLAN_PPI_SIZE + NDIS_CSUM_PPI_SIZE +
>>>>>>                                  NDIS_LSO_PPI_SIZE + NDIS_HASH_PPI_SIZE;
>>>>>>          ...
>>>>>>          net->needed_headroom = max_needed_headroom;
>>>>>>
>>>>>> max_needed_headroom is 220 bytes, it significantly exceeds the
>>>>>> LL_MAX_HEADER setting. This causes each skb to be cloned on send
>>>> path,
>>>>>> e.g. for IPv4 case we fall into the following clause
>>>>>> (ip_finish_output2()):
>>>>>>
>>>>>> if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
>>>>>>       ...
>>>>>>       skb2 = skb_realloc_headroom(skb, LL_RESERVED_SPACE(dev));
>>>>>>       ...
>>>>>> }
>>>>>>
>>>>>> leading to a significant performance regression. Increase
>>>> LL_MAX_HEADER
>>>>>> to make it suitable for netvsc, make it 224 to be 16-aligned.
>>>>>> Alternatively we could (partially) revert the commit which introduced
>>>>>> skb
>>>>>> headroom request restoring manual memory allocation on transmit path.
>>>>>>
>>>>>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>>>>>> ---
>>>>>>    include/linux/netdevice.h | 4 +++-
>>>>>>    1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>>>> index 88a0069..7233790 100644
>>>>>> --- a/include/linux/netdevice.h
>>>>>> +++ b/include/linux/netdevice.h
>>>>>> @@ -132,7 +132,9 @@ static inline bool dev_xmit_complete(int rc)
>>>>>>     *	used.
>>>>>>     */
>>>>>>
>>>>>> -#if defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
>>>>>> +#if IS_ENABLED(CONFIG_HYPERV_NET)
>>>>>> +# define LL_MAX_HEADER 224
>>>>>> +#elif defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
>>>>>>    # if defined(CONFIG_MAC80211_MESH)
>>>>>>    #  define LL_MAX_HEADER 128
>>>>>>    # else
>>>>> Thanks for the patch.
>>>>> To avoid we forget to update that 224 number when we add more things
>>>>> into netvsc header, I suggest that we define a macro in netdevice.h such
>>>>> as:
>>>>> #define HVNETVSC_MAX_HEADER 224
>>>>> #define LL_MAX_HEADER HVNETVSC_MAX_HEADER
>>>>>
>>>>> And, put a note in netvsc code saying the header reservation shouldn't
>>>>> exceed HVNETVSC_MAX_HEADER, or you need to update
>>>> HVNETVSC_MAX_HEADER.
>>>>
>>>> Am I right in thinking this is adding an extra 96 unused bytes to the front
>>>> of almost all skb just so that hyper-v can make its link level header
>>>> contiguous with whatever follows (IP header ?).
>>>>
>>>> Doesn't sound ideal.
>>> Remote NDIS is the protocol used to send packets from the guest to the host.
>> Every packet
>>> needs to be decorated with the RNDIS header and the maximum room needed
>> for the RNDIS
>>> header is the hreadroom we want.
>> I think we get that.  The question is does the Remote NDIS header and
>> packet info actually need to be a part of the header data?  I would
>> argue that it probably doesn't.
>>
>> So for example in netvsc_start_xmit it looks like you are calling
>> init_page_array in order to populate a set of page buffers, but the
>> first buffer for the Remote NDIS protocol is populated as a separate
>> page and offset.  As such it doesn't seem like it necessarily needs to
>> be a part of the header data but could be maintained perhaps in a
>> separate ring buffer, or perhaps just be a separate page that you break
>> up to use for each header.
> You are right; the rndis header can be built as a separate fragment and sent.
> Indeed this is what we were doing earlier - on the outgoing path we would allocate
> memory for the rndis header. My goal was to avoid this allocation on every packet being
> sent and I decided to use the headroom instead. If we can completely avoid all memory
> allocation for rndis header, it makes a significant perf difference:
>
> Throughput as measured by iperf on a 40G interface (VM to VM on two nodes) in Gbps.
> Scenario #A: LL_MAX_HEADER =128 [no change], needed_headroom = 220 [no change]
> Scenario #B: LL_MAX_HEADER =224, needed_headroom = 220 [no change]
>
> Conn#   #A                     #B
> 1              6.9                  8.2
> 2              13.2                14.9
> 4              17.6                16.6
> 8              24.1                26.9
> 16           24.0                 31.5
> 32           24.5                 33.6
> 64           31.6                 31.5
> 128         29.6                 30.3
>
> Column A is the existing code where we end up having to allocate more headroom and column B is with
> Vitaly's patch. I will experiment with a light-weight allocator for the rndis header.
>
> Regards,
>
> K. Y

I get the performance implications, but that is increasing the memory 
footprint for every driver in the system that is holding any outstanding 
transmit buffers.  Also it will likely have a negative impact on 
transmit performance as it increases the truesize of every outgoing buffer.

The other thing I don't get is why hv_netvsc_packet was being included 
in this allocation as well.  It seems like it is just metadata that is 
used for each outgoing frame.  Odds are you could probably make it a 
separate allocation as well, however if nothing else you should probably 
look at rearranging the structure to fill the holes as it looks like you 
have about 16 bytes of wasted space due to the arrangement of 32b and 
64b values.  Fixing that would allow you to reduce your needed_headroom 
which may also help to improve things.

- Alex


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
  2015-09-16 23:48           ` Alexander Duyck
@ 2015-09-16 23:58             ` KY Srinivasan
  0 siblings, 0 replies; 17+ messages in thread
From: KY Srinivasan @ 2015-09-16 23:58 UTC (permalink / raw)
  To: Alexander Duyck, David Laight, Haiyang Zhang, Vitaly Kuznetsov, netdev
  Cc: David S. Miller, linux-kernel, Jason Wang



> -----Original Message-----
> From: Alexander Duyck [mailto:alexander.duyck@gmail.com]
> Sent: Wednesday, September 16, 2015 4:49 PM
> To: KY Srinivasan <kys@microsoft.com>; David Laight
> <David.Laight@ACULAB.COM>; Haiyang Zhang <haiyangz@microsoft.com>;
> Vitaly Kuznetsov <vkuznets@redhat.com>; netdev@vger.kernel.org
> Cc: David S. Miller <davem@davemloft.net>; linux-kernel@vger.kernel.org;
> Jason Wang <jasowang@redhat.com>
> Subject: Re: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
> 
> On 09/16/2015 03:57 PM, KY Srinivasan wrote:
> >
> >> -----Original Message-----
> >> From: Alexander Duyck [mailto:alexander.duyck@gmail.com]
> >> Sent: Wednesday, September 16, 2015 2:39 PM
> >> To: KY Srinivasan <kys@microsoft.com>; David Laight
> >> <David.Laight@ACULAB.COM>; Haiyang Zhang <haiyangz@microsoft.com>;
> >> Vitaly Kuznetsov <vkuznets@redhat.com>; netdev@vger.kernel.org
> >> Cc: David S. Miller <davem@davemloft.net>; linux-kernel@vger.kernel.org;
> >> Jason Wang <jasowang@redhat.com>
> >> Subject: Re: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-
> V
> >>
> >> On 09/16/2015 10:55 AM, KY Srinivasan wrote:
> >>>> -----Original Message-----
> >>>> From: David Laight [mailto:David.Laight@ACULAB.COM]
> >>>> Sent: Wednesday, September 16, 2015 9:25 AM
> >>>> To: Haiyang Zhang <haiyangz@microsoft.com>; Vitaly Kuznetsov
> >>>> <vkuznets@redhat.com>; netdev@vger.kernel.org
> >>>> Cc: David S. Miller <davem@davemloft.net>; linux-
> kernel@vger.kernel.org;
> >>>> KY Srinivasan <kys@microsoft.com>; Jason Wang
> <jasowang@redhat.com>
> >>>> Subject: RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for
> Hyper-
> >>>> V
> >>>>
> >>>> From: Haiyang Zhang
> >>>>> Sent: 16 September 2015 17:09
> >>>>>> -----Original Message-----
> >>>>>> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> >>>>>> Sent: Wednesday, September 16, 2015 11:50 AM
> >>>>>> To: netdev@vger.kernel.org
> >>>>>> Cc: David S. Miller <davem@davemloft.net>; linux-
> >>>> kernel@vger.kernel.org;
> >>>>>> KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> >>>>>> <haiyangz@microsoft.com>; Jason Wang <jasowang@redhat.com>
> >>>>>> Subject: [PATCH net-next RFC] net: increase LL_MAX_HEADER for
> Hyper-
> >>>> V
> >>>>>> Commit b08cc79155fc26d0d112b1470d1ece5034651a4b
> ("hv_netvsc:
> >>>> Eliminate
> >>>>>>    memory allocation in the packet send path") introduced skb
> headroom
> >>>>>> request for Hyper-V netvsc driver:
> >>>>>>
> >>>>>>          max_needed_headroom = sizeof(struct hv_netvsc_packet) +
> >>>>>>                                  sizeof(struct rndis_message) +
> >>>>>>                                  NDIS_VLAN_PPI_SIZE + NDIS_CSUM_PPI_SIZE +
> >>>>>>                                  NDIS_LSO_PPI_SIZE + NDIS_HASH_PPI_SIZE;
> >>>>>>          ...
> >>>>>>          net->needed_headroom = max_needed_headroom;
> >>>>>>
> >>>>>> max_needed_headroom is 220 bytes, it significantly exceeds the
> >>>>>> LL_MAX_HEADER setting. This causes each skb to be cloned on send
> >>>> path,
> >>>>>> e.g. for IPv4 case we fall into the following clause
> >>>>>> (ip_finish_output2()):
> >>>>>>
> >>>>>> if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
> >>>>>>       ...
> >>>>>>       skb2 = skb_realloc_headroom(skb, LL_RESERVED_SPACE(dev));
> >>>>>>       ...
> >>>>>> }
> >>>>>>
> >>>>>> leading to a significant performance regression. Increase
> >>>> LL_MAX_HEADER
> >>>>>> to make it suitable for netvsc, make it 224 to be 16-aligned.
> >>>>>> Alternatively we could (partially) revert the commit which introduced
> >>>>>> skb
> >>>>>> headroom request restoring manual memory allocation on transmit
> path.
> >>>>>>
> >>>>>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> >>>>>> ---
> >>>>>>    include/linux/netdevice.h | 4 +++-
> >>>>>>    1 file changed, 3 insertions(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> >>>>>> index 88a0069..7233790 100644
> >>>>>> --- a/include/linux/netdevice.h
> >>>>>> +++ b/include/linux/netdevice.h
> >>>>>> @@ -132,7 +132,9 @@ static inline bool dev_xmit_complete(int rc)
> >>>>>>     *	used.
> >>>>>>     */
> >>>>>>
> >>>>>> -#if defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
> >>>>>> +#if IS_ENABLED(CONFIG_HYPERV_NET)
> >>>>>> +# define LL_MAX_HEADER 224
> >>>>>> +#elif defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
> >>>>>>    # if defined(CONFIG_MAC80211_MESH)
> >>>>>>    #  define LL_MAX_HEADER 128
> >>>>>>    # else
> >>>>> Thanks for the patch.
> >>>>> To avoid we forget to update that 224 number when we add more things
> >>>>> into netvsc header, I suggest that we define a macro in netdevice.h such
> >>>>> as:
> >>>>> #define HVNETVSC_MAX_HEADER 224
> >>>>> #define LL_MAX_HEADER HVNETVSC_MAX_HEADER
> >>>>>
> >>>>> And, put a note in netvsc code saying the header reservation shouldn't
> >>>>> exceed HVNETVSC_MAX_HEADER, or you need to update
> >>>> HVNETVSC_MAX_HEADER.
> >>>>
> >>>> Am I right in thinking this is adding an extra 96 unused bytes to the front
> >>>> of almost all skb just so that hyper-v can make its link level header
> >>>> contiguous with whatever follows (IP header ?).
> >>>>
> >>>> Doesn't sound ideal.
> >>> Remote NDIS is the protocol used to send packets from the guest to the
> host.
> >> Every packet
> >>> needs to be decorated with the RNDIS header and the maximum room
> needed
> >> for the RNDIS
> >>> header is the hreadroom we want.
> >> I think we get that.  The question is does the Remote NDIS header and
> >> packet info actually need to be a part of the header data?  I would
> >> argue that it probably doesn't.
> >>
> >> So for example in netvsc_start_xmit it looks like you are calling
> >> init_page_array in order to populate a set of page buffers, but the
> >> first buffer for the Remote NDIS protocol is populated as a separate
> >> page and offset.  As such it doesn't seem like it necessarily needs to
> >> be a part of the header data but could be maintained perhaps in a
> >> separate ring buffer, or perhaps just be a separate page that you break
> >> up to use for each header.
> > You are right; the rndis header can be built as a separate fragment and sent.
> > Indeed this is what we were doing earlier - on the outgoing path we would
> allocate
> > memory for the rndis header. My goal was to avoid this allocation on every
> packet being
> > sent and I decided to use the headroom instead. If we can completely avoid all
> memory
> > allocation for rndis header, it makes a significant perf difference:
> >
> > Throughput as measured by iperf on a 40G interface (VM to VM on two
> nodes) in Gbps.
> > Scenario #A: LL_MAX_HEADER =128 [no change], needed_headroom = 220
> [no change]
> > Scenario #B: LL_MAX_HEADER =224, needed_headroom = 220 [no change]
> >
> > Conn#   #A                     #B
> > 1              6.9                  8.2
> > 2              13.2                14.9
> > 4              17.6                16.6
> > 8              24.1                26.9
> > 16           24.0                 31.5
> > 32           24.5                 33.6
> > 64           31.6                 31.5
> > 128         29.6                 30.3
> >
> > Column A is the existing code where we end up having to allocate more
> headroom and column B is with
> > Vitaly's patch. I will experiment with a light-weight allocator for the rndis
> header.
> >
> > Regards,
> >
> > K. Y
> 
> I get the performance implications, but that is increasing the memory
> footprint for every driver in the system that is holding any outstanding
> transmit buffers.  Also it will likely have a negative impact on
> transmit performance as it increases the truesize of every outgoing buffer.
> 
> The other thing I don't get is why hv_netvsc_packet was being included
> in this allocation as well.  It seems like it is just metadata that is
> used for each outgoing frame.  Odds are you could probably make it a
> separate allocation as well, however if nothing else you should probably
> look at rearranging the structure to fill the holes as it looks like you
> have about 16 bytes of wasted space due to the arrangement of 32b and
> 64b values.  Fixing that would allow you to reduce your needed_headroom
> which may also help to improve things.

Agreed; I will address this as well.

K. Y
> 
> - Alex


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
  2015-09-16 22:57         ` KY Srinivasan
  2015-09-16 23:48           ` Alexander Duyck
@ 2015-09-17  8:38           ` David Laight
  2015-09-17 15:14             ` KY Srinivasan
  1 sibling, 1 reply; 17+ messages in thread
From: David Laight @ 2015-09-17  8:38 UTC (permalink / raw)
  To: 'KY Srinivasan',
	Alexander Duyck, Haiyang Zhang, Vitaly Kuznetsov, netdev
  Cc: David S. Miller, linux-kernel, Jason Wang

From: KY Srinivasan
> Sent: 16 September 2015 23:58
...
> > I think we get that.  The question is does the Remote NDIS header and
> > packet info actually need to be a part of the header data?  I would
> > argue that it probably doesn't.
> >
> > So for example in netvsc_start_xmit it looks like you are calling
> > init_page_array in order to populate a set of page buffers, but the
> > first buffer for the Remote NDIS protocol is populated as a separate
> > page and offset.  As such it doesn't seem like it necessarily needs to
> > be a part of the header data but could be maintained perhaps in a
> > separate ring buffer, or perhaps just be a separate page that you break
> > up to use for each header.
> 
> You are right; the rndis header can be built as a separate fragment and sent.
> Indeed this is what we were doing earlier - on the outgoing path we would allocate
> memory for the rndis header. My goal was to avoid this allocation on every packet being
> sent and I decided to use the headroom instead. If we can completely avoid all memory
> allocation for rndis header, it makes a significant perf difference:
...


So just preallocate the header space as a fixed buffer for each ring entry
(or tx frame).

If you allocate a fixed buffer for each ring entry you may find there are
performance gains from copying small fragments into the buffer instead
of doing whatever mapping operations are required.

	David


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
  2015-09-16 17:59     ` David Miller
@ 2015-09-17  9:03       ` Vitaly Kuznetsov
  0 siblings, 0 replies; 17+ messages in thread
From: Vitaly Kuznetsov @ 2015-09-17  9:03 UTC (permalink / raw)
  To: David Miller; +Cc: David.Laight, haiyangz, netdev, linux-kernel, kys, jasowang

David Miller <davem@davemloft.net> writes:

> From: David Laight <David.Laight@ACULAB.COM>
> Date: Wed, 16 Sep 2015 16:25:03 +0000
>
>> Am I right in thinking this is adding an extra 96 unused bytes to the front
>> of almost all skb just so that hyper-v can make its link level header
>> contiguous with whatever follows (IP header ?).
>> 
>> Doesn't sound ideal.
>
> Agreed, this is rediculous, and the entire stack will incur this cost
> just because hyperv is enabled in the kernel config.

That's what 'RFC' in the subject was about :-)

We already have a precedent of increasing LL_MAX_HEADER globaly because
of a config option (CONFIG_MAC80211_MESH) but Hyper-V needs more.

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
  2015-09-17  8:38           ` David Laight
@ 2015-09-17 15:14             ` KY Srinivasan
  2015-09-17 18:52               ` David Miller
  0 siblings, 1 reply; 17+ messages in thread
From: KY Srinivasan @ 2015-09-17 15:14 UTC (permalink / raw)
  To: David Laight, Alexander Duyck, Haiyang Zhang, Vitaly Kuznetsov, netdev
  Cc: David S. Miller, linux-kernel, Jason Wang



> -----Original Message-----
> From: David Laight [mailto:David.Laight@ACULAB.COM]
> Sent: Thursday, September 17, 2015 1:38 AM
> To: KY Srinivasan <kys@microsoft.com>; Alexander Duyck
> <alexander.duyck@gmail.com>; Haiyang Zhang <haiyangz@microsoft.com>;
> Vitaly Kuznetsov <vkuznets@redhat.com>; netdev@vger.kernel.org
> Cc: David S. Miller <davem@davemloft.net>; linux-kernel@vger.kernel.org;
> Jason Wang <jasowang@redhat.com>
> Subject: RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
> 
> From: KY Srinivasan
> > Sent: 16 September 2015 23:58
> ...
> > > I think we get that.  The question is does the Remote NDIS header and
> > > packet info actually need to be a part of the header data?  I would
> > > argue that it probably doesn't.
> > >
> > > So for example in netvsc_start_xmit it looks like you are calling
> > > init_page_array in order to populate a set of page buffers, but the
> > > first buffer for the Remote NDIS protocol is populated as a separate
> > > page and offset.  As such it doesn't seem like it necessarily needs to
> > > be a part of the header data but could be maintained perhaps in a
> > > separate ring buffer, or perhaps just be a separate page that you break
> > > up to use for each header.
> >
> > You are right; the rndis header can be built as a separate fragment and sent.
> > Indeed this is what we were doing earlier - on the outgoing path we would
> allocate
> > memory for the rndis header. My goal was to avoid this allocation on every
> packet being
> > sent and I decided to use the headroom instead. If we can completely avoid all
> memory
> > allocation for rndis header, it makes a significant perf difference:
> ...
> 
> 
> So just preallocate the header space as a fixed buffer for each ring entry
> (or tx frame).
> 
> If you allocate a fixed buffer for each ring entry you may find there are
> performance gains from copying small fragments into the buffer instead
> of doing whatever mapping operations are required.
> 
> 	David

Yes; I could do that. My original goal of asking for additional head room was to avoid having
any allocation in the transmit path. I did not realize that all I had done was push the allocation to a
different spot since the head room I was asking was greater than the default head room on skb allocation.

I think I can achieve my original goal of not having any allocation in the send path by carefully using the
memory available in the skb:

1. I am going to separately handle the rndis header and this can be packed in the default headroom
available in the skb.
2. I will use the scratch area in the skb to stash away the state that needs to persist. This is the state needed
to cleanup the guest state after we get the send_complete packet.

Regards,

K. Y 
 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
  2015-09-17 15:14             ` KY Srinivasan
@ 2015-09-17 18:52               ` David Miller
  2015-09-17 19:52                 ` KY Srinivasan
  0 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2015-09-17 18:52 UTC (permalink / raw)
  To: kys
  Cc: David.Laight, alexander.duyck, haiyangz, vkuznets, netdev,
	linux-kernel, jasowang

From: KY Srinivasan <kys@microsoft.com>
Date: Thu, 17 Sep 2015 15:14:05 +0000

> I think I can achieve my original goal of not having any allocation
> in the send path by carefully using the memory available in the skb:

Please stop flat-out ignoring David L.'s suggestion.

Have a pre-cooked ring of buffers for these descriptors that you can
point the chip at.  No per-packet allocation is necessary at all.

If you play games with SKBs you will get burned.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
  2015-09-17 18:52               ` David Miller
@ 2015-09-17 19:52                 ` KY Srinivasan
  2015-09-17 20:10                   ` David Miller
  0 siblings, 1 reply; 17+ messages in thread
From: KY Srinivasan @ 2015-09-17 19:52 UTC (permalink / raw)
  To: David Miller
  Cc: David.Laight, alexander.duyck, Haiyang Zhang, vkuznets, netdev,
	linux-kernel, jasowang



> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Thursday, September 17, 2015 11:52 AM
> To: KY Srinivasan <kys@microsoft.com>
> Cc: David.Laight@ACULAB.COM; alexander.duyck@gmail.com; Haiyang Zhang
> <haiyangz@microsoft.com>; vkuznets@redhat.com; netdev@vger.kernel.org;
> linux-kernel@vger.kernel.org; jasowang@redhat.com
> Subject: Re: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
> 
> From: KY Srinivasan <kys@microsoft.com>
> Date: Thu, 17 Sep 2015 15:14:05 +0000
> 
> > I think I can achieve my original goal of not having any allocation
> > in the send path by carefully using the memory available in the skb:
> 
> Please stop flat-out ignoring David L.'s suggestion.

I am sorry; I did not mean to convey that impression.

> 
> Have a pre-cooked ring of buffers for these descriptors that you can
> point the chip at.  No per-packet allocation is necessary at all.

Even if I had a ring of buffers, I would still need to manage the life cycle
of these buffers - selecting an unused one on the transmit path and marking
it used (atomically). Once the transmit completes (as indicated by the transmit complete
callback) this buffer needs to be marked free. I can certainly make these operations
efficient and  lock-free, but they are still at some level an allocation/free
operation albeit potentially more efficient than having the kernel allocate the memory.
 
> 
> If you play games with SKBs you will get burned.

I will implement Dave L's suggestion. However, I am curious as to why you would consider
my proposed usage of the skb headroom and the control buffer area in skb as non-standard
usage.

Regards,

K. Y 


  

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
  2015-09-17 19:52                 ` KY Srinivasan
@ 2015-09-17 20:10                   ` David Miller
  2015-09-17 21:16                     ` KY Srinivasan
  0 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2015-09-17 20:10 UTC (permalink / raw)
  To: kys
  Cc: David.Laight, alexander.duyck, haiyangz, vkuznets, netdev,
	linux-kernel, jasowang

From: KY Srinivasan <kys@microsoft.com>
Date: Thu, 17 Sep 2015 19:52:01 +0000

> 
> 
>> -----Original Message-----
>> Have a pre-cooked ring of buffers for these descriptors that you can
>> point the chip at.  No per-packet allocation is necessary at all.
> 
> Even if I had a ring of buffers, I would still need to manage the life cycle
> of these buffers - selecting an unused one on the transmit path and marking
> it used (atomically).

Have one per TX ring entry, then the lifetime matches the lifetime of the
TX entry itself and therefore you need do nothing.

That's the whole idea.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
  2015-09-17 20:10                   ` David Miller
@ 2015-09-17 21:16                     ` KY Srinivasan
  0 siblings, 0 replies; 17+ messages in thread
From: KY Srinivasan @ 2015-09-17 21:16 UTC (permalink / raw)
  To: David Miller
  Cc: David.Laight, alexander.duyck, Haiyang Zhang, vkuznets, netdev,
	linux-kernel, jasowang



> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Thursday, September 17, 2015 1:11 PM
> To: KY Srinivasan <kys@microsoft.com>
> Cc: David.Laight@ACULAB.COM; alexander.duyck@gmail.com; Haiyang Zhang
> <haiyangz@microsoft.com>; vkuznets@redhat.com; netdev@vger.kernel.org;
> linux-kernel@vger.kernel.org; jasowang@redhat.com
> Subject: Re: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
> 
> From: KY Srinivasan <kys@microsoft.com>
> Date: Thu, 17 Sep 2015 19:52:01 +0000
> 
> >
> >
> >> -----Original Message-----
> >> Have a pre-cooked ring of buffers for these descriptors that you can
> >> point the chip at.  No per-packet allocation is necessary at all.
> >
> > Even if I had a ring of buffers, I would still need to manage the life cycle
> > of these buffers - selecting an unused one on the transmit path and marking
> > it used (atomically).
> 
> Have one per TX ring entry, then the lifetime matches the lifetime of the
> TX entry itself and therefore you need do nothing.

Yes, I understand. Unfortunately, the ring buffer used on Hyper-V to send the packets to the host is not
managed as a traditional TX ring entries - it is not fixed size and a given packet can wrap around and lastly, I think
the management of space on the ring buffer is not tied to the act of completing the send operation. That is why
we have an explicit "send complete" message.

I am working on moving the model to more closely match the hardware model but it will take some time.
For now, I will implement a very light weight mechanism for managing the additional memory needed.

Regards,

K. Y
> 
> That's the whole idea.


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2015-09-17 21:17 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-16 15:50 [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V Vitaly Kuznetsov
2015-09-16 16:08 ` Haiyang Zhang
2015-09-16 16:25   ` David Laight
2015-09-16 17:26     ` Alexander Duyck
2015-09-16 17:55     ` KY Srinivasan
2015-09-16 21:38       ` Alexander Duyck
2015-09-16 22:57         ` KY Srinivasan
2015-09-16 23:48           ` Alexander Duyck
2015-09-16 23:58             ` KY Srinivasan
2015-09-17  8:38           ` David Laight
2015-09-17 15:14             ` KY Srinivasan
2015-09-17 18:52               ` David Miller
2015-09-17 19:52                 ` KY Srinivasan
2015-09-17 20:10                   ` David Miller
2015-09-17 21:16                     ` KY Srinivasan
2015-09-16 17:59     ` David Miller
2015-09-17  9:03       ` Vitaly Kuznetsov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).