From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753373AbbIPX6R (ORCPT ); Wed, 16 Sep 2015 19:58:17 -0400 Received: from mail-bn1on0147.outbound.protection.outlook.com ([157.56.110.147]:5426 "EHLO na01-bn1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1752818AbbIPX6P convert rfc822-to-8bit (ORCPT ); Wed, 16 Sep 2015 19:58:15 -0400 X-Greylist: delayed 3534 seconds by postgrey-1.27 at vger.kernel.org; Wed, 16 Sep 2015 19:58:15 EDT From: KY Srinivasan To: Alexander Duyck , David Laight , Haiyang Zhang , "Vitaly Kuznetsov" , "netdev@vger.kernel.org" CC: "David S. Miller" , "linux-kernel@vger.kernel.org" , Jason Wang Subject: RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V Thread-Topic: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V Thread-Index: AQHQ8Jdrp1Mkwb76akCmi3unx8nMnp4/UsgAgAAEhICAABfyIIAAP8MAgAAEDQCAACA7AIAAAlrw Date: Wed, 16 Sep 2015 23:58:12 +0000 Message-ID: References: <1442418625-11805-1-git-send-email-vkuznets@redhat.com> <063D6719AE5E284EB5DD2968C1650D6D1CB9800B@AcuExch.aculab.com> <55F9E172.2040008@gmail.com> <55F9FFE2.3030803@gmail.com> In-Reply-To: <55F9FFE2.3030803@gmail.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: spf=none (sender IP is ) smtp.mailfrom=kys@microsoft.com; x-originating-ip: [2601:600:8c01:121d:95bf:6e55:443c:f514] x-microsoft-exchange-diagnostics: 1;BY2PR0301MB0776;5:DbQUFxIZY/SIK0rbqWhCPgk7yr1C4WDo/l6PxRkv9legAODOfJncutrQ/8/XgvvXAzfDqxWzYrjCfqPTnamqfE1bVvcAs1zLugseWqFZWJ+SQHeVyCUvZA9zVqbgpISX7FvkGxqL083eaCTB+LyXgQ==;24:uT9QXOKM+zqn2xaf/fxXZ/LCZwZLVaHzeQxzR1mFg3bxLgk05H6JAVrSe2m44ZOj4HhAg1r9wi0fd7RiiZeeHA/yxyVXkp0PsDrUnhzRJMQ=;20:oXrZdYwIYXzwRweIu2CWtkjkP2G8YJxOlEDgTQSypsdj+0gGycoBAmcWpzL002uc7BfHss04bYey9OTjHEwAIg== x-microsoft-antispam: UriScan:;BCL:0;PCL:0;RULEID:;SRVR:BY2PR0301MB0776; x-microsoft-antispam-prvs: x-exchange-antispam-report-test: UriScan:(108003899814671); x-exchange-antispam-report-cfa-test: BCL:0;PCL:0;RULEID:(61425019)(601004)(2401001)(8121501046)(520058)(520078)(5005006)(520075)(3002001)(61426019)(61427019);SRVR:BY2PR0301MB0776;BCL:0;PCL:0;RULEID:;SRVR:BY2PR0301MB0776; x-forefront-prvs: 07013D7479 x-forefront-antispam-report: SFV:NSPM;SFS:(10019020)(6009001)(13464003)(479174004)(24454002)(377454003)(51914003)(199003)(189002)(76176999)(50986999)(5005710100001)(97736004)(189998001)(74316001)(2900100001)(87936001)(10400500002)(10090500001)(40100003)(86362001)(5004730100002)(1511001)(8990500004)(2501003)(106356001)(5001770100001)(575784001)(101416001)(93886004)(10290500002)(122556002)(106116001)(33656002)(19580395003)(102836002)(5003600100002)(5007970100001)(46102003)(19580405001)(99286002)(54356999)(5001860100001)(5001920100001)(110136002)(2561002)(5001960100002)(62966003)(2950100001)(4001540100001)(86612001)(92566002)(5001830100001)(2421001)(5002640100001)(105586002)(64706001)(76576001)(77156002)(81156007)(68736005)(77096005)(3826002);DIR:OUT;SFP:1102;SCL:1;SRVR:BY2PR0301MB0776;H:BY2PR0301MB1654.namprd03.prod.outlook.com;FPR:;SPF:None;PTR:InfoNoRecords;A:1;MX:1;LANG:en; Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 8BIT MIME-Version: 1.0 X-OriginatorOrg: microsoft.com X-MS-Exchange-CrossTenant-originalarrivaltime: 16 Sep 2015 23:58:12.2120 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 72f988bf-86f1-41af-91ab-2d7cd011db47 X-MS-Exchange-Transport-CrossTenantHeadersStamped: BY2PR0301MB0776 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > -----Original Message----- > From: Alexander Duyck [mailto:alexander.duyck@gmail.com] > Sent: Wednesday, September 16, 2015 4:49 PM > To: KY Srinivasan ; David Laight > ; Haiyang Zhang ; > Vitaly Kuznetsov ; netdev@vger.kernel.org > Cc: David S. Miller ; linux-kernel@vger.kernel.org; > Jason Wang > Subject: Re: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V > > On 09/16/2015 03:57 PM, KY Srinivasan wrote: > > > >> -----Original Message----- > >> From: Alexander Duyck [mailto:alexander.duyck@gmail.com] > >> Sent: Wednesday, September 16, 2015 2:39 PM > >> To: KY Srinivasan ; David Laight > >> ; Haiyang Zhang ; > >> Vitaly Kuznetsov ; netdev@vger.kernel.org > >> Cc: David S. Miller ; linux-kernel@vger.kernel.org; > >> Jason Wang > >> Subject: Re: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper- > V > >> > >> On 09/16/2015 10:55 AM, KY Srinivasan wrote: > >>>> -----Original Message----- > >>>> From: David Laight [mailto:David.Laight@ACULAB.COM] > >>>> Sent: Wednesday, September 16, 2015 9:25 AM > >>>> To: Haiyang Zhang ; Vitaly Kuznetsov > >>>> ; netdev@vger.kernel.org > >>>> Cc: David S. Miller ; linux- > kernel@vger.kernel.org; > >>>> KY Srinivasan ; Jason Wang > > >>>> Subject: RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for > Hyper- > >>>> V > >>>> > >>>> From: Haiyang Zhang > >>>>> Sent: 16 September 2015 17:09 > >>>>>> -----Original Message----- > >>>>>> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com] > >>>>>> Sent: Wednesday, September 16, 2015 11:50 AM > >>>>>> To: netdev@vger.kernel.org > >>>>>> Cc: David S. Miller ; linux- > >>>> kernel@vger.kernel.org; > >>>>>> KY Srinivasan ; Haiyang Zhang > >>>>>> ; Jason Wang > >>>>>> Subject: [PATCH net-next RFC] net: increase LL_MAX_HEADER for > Hyper- > >>>> V > >>>>>> Commit b08cc79155fc26d0d112b1470d1ece5034651a4b > ("hv_netvsc: > >>>> Eliminate > >>>>>> memory allocation in the packet send path") introduced skb > headroom > >>>>>> request for Hyper-V netvsc driver: > >>>>>> > >>>>>> max_needed_headroom = sizeof(struct hv_netvsc_packet) + > >>>>>> sizeof(struct rndis_message) + > >>>>>> NDIS_VLAN_PPI_SIZE + NDIS_CSUM_PPI_SIZE + > >>>>>> NDIS_LSO_PPI_SIZE + NDIS_HASH_PPI_SIZE; > >>>>>> ... > >>>>>> net->needed_headroom = max_needed_headroom; > >>>>>> > >>>>>> max_needed_headroom is 220 bytes, it significantly exceeds the > >>>>>> LL_MAX_HEADER setting. This causes each skb to be cloned on send > >>>> path, > >>>>>> e.g. for IPv4 case we fall into the following clause > >>>>>> (ip_finish_output2()): > >>>>>> > >>>>>> if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) { > >>>>>> ... > >>>>>> skb2 = skb_realloc_headroom(skb, LL_RESERVED_SPACE(dev)); > >>>>>> ... > >>>>>> } > >>>>>> > >>>>>> leading to a significant performance regression. Increase > >>>> LL_MAX_HEADER > >>>>>> to make it suitable for netvsc, make it 224 to be 16-aligned. > >>>>>> Alternatively we could (partially) revert the commit which introduced > >>>>>> skb > >>>>>> headroom request restoring manual memory allocation on transmit > path. > >>>>>> > >>>>>> Signed-off-by: Vitaly Kuznetsov > >>>>>> --- > >>>>>> include/linux/netdevice.h | 4 +++- > >>>>>> 1 file changed, 3 insertions(+), 1 deletion(-) > >>>>>> > >>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > >>>>>> index 88a0069..7233790 100644 > >>>>>> --- a/include/linux/netdevice.h > >>>>>> +++ b/include/linux/netdevice.h > >>>>>> @@ -132,7 +132,9 @@ static inline bool dev_xmit_complete(int rc) > >>>>>> * used. > >>>>>> */ > >>>>>> > >>>>>> -#if defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25) > >>>>>> +#if IS_ENABLED(CONFIG_HYPERV_NET) > >>>>>> +# define LL_MAX_HEADER 224 > >>>>>> +#elif defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25) > >>>>>> # if defined(CONFIG_MAC80211_MESH) > >>>>>> # define LL_MAX_HEADER 128 > >>>>>> # else > >>>>> Thanks for the patch. > >>>>> To avoid we forget to update that 224 number when we add more things > >>>>> into netvsc header, I suggest that we define a macro in netdevice.h such > >>>>> as: > >>>>> #define HVNETVSC_MAX_HEADER 224 > >>>>> #define LL_MAX_HEADER HVNETVSC_MAX_HEADER > >>>>> > >>>>> And, put a note in netvsc code saying the header reservation shouldn't > >>>>> exceed HVNETVSC_MAX_HEADER, or you need to update > >>>> HVNETVSC_MAX_HEADER. > >>>> > >>>> Am I right in thinking this is adding an extra 96 unused bytes to the front > >>>> of almost all skb just so that hyper-v can make its link level header > >>>> contiguous with whatever follows (IP header ?). > >>>> > >>>> Doesn't sound ideal. > >>> Remote NDIS is the protocol used to send packets from the guest to the > host. > >> Every packet > >>> needs to be decorated with the RNDIS header and the maximum room > needed > >> for the RNDIS > >>> header is the hreadroom we want. > >> I think we get that. The question is does the Remote NDIS header and > >> packet info actually need to be a part of the header data? I would > >> argue that it probably doesn't. > >> > >> So for example in netvsc_start_xmit it looks like you are calling > >> init_page_array in order to populate a set of page buffers, but the > >> first buffer for the Remote NDIS protocol is populated as a separate > >> page and offset. As such it doesn't seem like it necessarily needs to > >> be a part of the header data but could be maintained perhaps in a > >> separate ring buffer, or perhaps just be a separate page that you break > >> up to use for each header. > > You are right; the rndis header can be built as a separate fragment and sent. > > Indeed this is what we were doing earlier - on the outgoing path we would > allocate > > memory for the rndis header. My goal was to avoid this allocation on every > packet being > > sent and I decided to use the headroom instead. If we can completely avoid all > memory > > allocation for rndis header, it makes a significant perf difference: > > > > Throughput as measured by iperf on a 40G interface (VM to VM on two > nodes) in Gbps. > > Scenario #A: LL_MAX_HEADER =128 [no change], needed_headroom = 220 > [no change] > > Scenario #B: LL_MAX_HEADER =224, needed_headroom = 220 [no change] > > > > Conn# #A #B > > 1 6.9 8.2 > > 2 13.2 14.9 > > 4 17.6 16.6 > > 8 24.1 26.9 > > 16 24.0 31.5 > > 32 24.5 33.6 > > 64 31.6 31.5 > > 128 29.6 30.3 > > > > Column A is the existing code where we end up having to allocate more > headroom and column B is with > > Vitaly's patch. I will experiment with a light-weight allocator for the rndis > header. > > > > Regards, > > > > K. Y > > I get the performance implications, but that is increasing the memory > footprint for every driver in the system that is holding any outstanding > transmit buffers. Also it will likely have a negative impact on > transmit performance as it increases the truesize of every outgoing buffer. > > The other thing I don't get is why hv_netvsc_packet was being included > in this allocation as well. It seems like it is just metadata that is > used for each outgoing frame. Odds are you could probably make it a > separate allocation as well, however if nothing else you should probably > look at rearranging the structure to fill the holes as it looks like you > have about 16 bytes of wasted space due to the arrangement of 32b and > 64b values. Fixing that would allow you to reduce your needed_headroom > which may also help to improve things. Agreed; I will address this as well. K. Y > > - Alex