From: Jakub Kicinski
Subject: Re: [net-next 1/6] net/dcb: Add dcbnl buffer attribute
Date: Wed, 23 May 2018 13:13:44 -0700
Message-ID: <20180523131344.27e17299@cakuba>
References: <20180521210502.11082-1-saeedm@mellanox.com>
 <20180521210502.11082-2-saeedm@mellanox.com>
 <20180521222026.4f54f479@cakuba>
 <20180523094331.GC3046@nanopsycho>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: Jiri Pirko, Saeed Mahameed, "David S. Miller",
 netdev@vger.kernel.org, Huy Nguyen, Or Gerlitz
To: John Fastabend

On Wed, 23 May 2018 06:52:33 -0700, John Fastabend wrote:
> On 05/23/2018 02:43 AM, Jiri Pirko wrote:
> > Tue, May 22, 2018 at 07:20:26AM CEST, jakub.kicinski@netronome.com wrote:
> >> On Mon, 21 May 2018 14:04:57 -0700, Saeed Mahameed wrote:
> >>> From: Huy Nguyen
> >>>
> >>> In this patch, we add a dcbnl buffer attribute to allow the user
> >>> to change the NIC's buffer configuration, such as the
> >>> priority-to-buffer mapping and the size of each individual buffer.
> >>>
> >>> This attribute, combined with the pfc attribute, allows an
> >>> advanced user to fine-tune the QoS settings for a specific
> >>> priority queue. For example, the user can dedicate a buffer to one
> >>> or more priorities, or give a larger buffer to certain priorities.
> >>>
> >>> We present a use case scenario where the dcbnl buffer attribute,
> >>> configured by an advanced user, helps reduce the latency of
> >>> messages of different sizes.
> >>>
> >>> Scenario description:
> >>> On ConnectX-5, we run latency-sensitive traffic with small/medium
> >>> message sizes ranging from 64B to 256KB, and bandwidth-sensitive
> >>> traffic with large message sizes of 512KB and 1MB. We group the
> >>> small, medium, and large message sizes into their own pfc-enabled
> >>> priorities as follows:
> >>> Priorities 1 & 2 (64B, 256B and 1KB)
> >>> Priorities 3 & 4 (4KB, 8KB, 16KB, 64KB, 128KB and 256KB)
> >>> Priorities 5 & 6 (512KB and 1MB)
> >>>
> >>> By default, ConnectX-5 maps all pfc-enabled priorities to a single
> >>> lossless buffer of fixed size, 50% of the total available buffer
> >>> space. The other 50% is assigned to the lossy buffer. Using the
> >>> dcbnl buffer attribute, we create three equal-size lossless
> >>> buffers. Each buffer has 25% of the total available buffer space.
> >>> Thus, the lossy buffer size is reduced to 25%. The
> >>> priority-to-lossless-buffer mappings are set as follows:
> >>> Priorities 1 & 2 on lossless buffer #1
> >>> Priorities 3 & 4 on lossless buffer #2
> >>> Priorities 5 & 6 on lossless buffer #3
> >>>
> >>> We observe improvements in latency for the small and medium
> >>> message sizes as follows. Note that bandwidth for the large
> >>> message sizes is reduced, but the total bandwidth remains the
> >>> same.
> >>> 256B message size (42% latency reduction)
> >>> 4K message size (21% latency reduction)
> >>> 64K message size (16% latency reduction)
> >>>
> >>> Signed-off-by: Huy Nguyen
> >>> Signed-off-by: Saeed Mahameed
> >>
> >> On a cursory look this bears a lot of resemblance to the devlink
> >> shared buffer configuration ABI. Did you look into using that?
> >>
> >> Just to be clear, the devlink shared buffer ABIs don't require
> >> representors and "switchdev mode".
> >
> > If the CX5 buffer they are trying to utilize here is per port and
> > not a shared one, it would seem OK to me to not have it in
> > "devlink sb".

What I meant is that it may be shared between VFs and PF contexts.
But if it's purely an ingress per-prio FIFO without any advanced
configuration capabilities, then perhaps this API is a better match.
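For concreteness, my reading of the commit message is that the
scenario above boils down to filling in something like the sketch
below. To be clear, the names and layout here are my guesses from the
description, not necessarily the actual uAPI in the patch:

#include <linux/types.h>

/* Sketch of what the new attribute appears to carry, per the commit
 * message: a priority-to-buffer map plus per-buffer sizes.  Names and
 * layout are assumptions.
 */
#define SKETCH_MAX_TCS		8	/* 802.1Q priorities 0..7 */
#define SKETCH_MAX_BUFFERS	8

struct buffer_attr_sketch {
	__u8  prio2buffer[SKETCH_MAX_TCS];	/* priority -> buffer index */
	__u32 buffer_size[SKETCH_MAX_BUFFERS];	/* size of each buffer, bytes */
};

/* The three-lossless-buffer scenario from the commit message, assuming
 * buffer 0 stays the lossy buffer and the pool splits into 25% chunks:
 */
static void fill_example(struct buffer_attr_sketch *b, __u32 pool)
{
	b->prio2buffer[1] = b->prio2buffer[2] = 1;	/* prios 1 & 2 */
	b->prio2buffer[3] = b->prio2buffer[4] = 2;	/* prios 3 & 4 */
	b->prio2buffer[5] = b->prio2buffer[6] = 3;	/* prios 5 & 6 */

	b->buffer_size[0] = pool / 4;	/* lossy, shrunk to 25% */
	b->buffer_size[1] = pool / 4;	/* lossless #1 */
	b->buffer_size[2] = pool / 4;	/* lossless #2 */
	b->buffer_size[3] = pool / 4;	/* lossless #3 */
}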
> +1 I think it's probably reasonable to let devlink manage the global
> (device layer) buffers and then have dcbnl partition the buffer up
> further per netdev. Notice there is already a partitioning of the
> buffers happening when DCB is enabled and/or parameters are changed.
> So giving explicit control over this seems OK to me.

Okay, thanks for the discussion! :)

> It would be nice, though, if the API gave us some hint about the
> max/min/stride of allowed values. Could the get API return these
> along with the current value? Presumably the allowed max size could
> also change as devlink changes how the global buffer is divided up.
>
> The argument against this API is that it doesn't have anything to do
> with the 802.1Q standard, but that is fine IMO.
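FWIW, to make that concrete, the get side could report capability
fields along these lines. This is purely hypothetical, just to
illustrate the idea; none of these fields exist in the patch or in
dcbnl today:

#include <linux/types.h>

/* Hypothetical sketch only: read-only fields a GET reply could carry
 * so userspace knows what a subsequent SET will accept.  All names
 * are made up for illustration.
 */
struct buffer_caps_sketch {
	__u32 min_size;		/* smallest per-buffer size accepted */
	__u32 max_size;		/* largest size; could shrink when devlink
				 * repartitions the global pool */
	__u32 size_stride;	/* HW granularity; sizes get rounded to a
				 * multiple of this */
};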