From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jakub Kicinski
Subject: Re: [net-next 1/6] net/dcb: Add dcbnl buffer attribute
Date: Tue, 22 May 2018 11:32:02 -0700
Message-ID: <20180522113202.54c43440@cakuba>
References: <20180521210502.11082-1-saeedm@mellanox.com>
 <20180521210502.11082-2-saeedm@mellanox.com>
 <20180521222026.4f54f479@cakuba>
 <5b0d2137-8c70-ceb4-6965-fef2c75c4c24@mellanox.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: Saeed Mahameed, "David S. Miller", netdev@vger.kernel.org,
 Jiri Pirko, Or Gerlitz, Parav Pandit
To: Huy Nguyen
Return-path:
Received: from mail-qt0-f193.google.com ([209.85.216.193]:33016 "EHLO
 mail-qt0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1751351AbeEVScH (ORCPT ); Tue, 22 May 2018 14:32:07 -0400
Received: by mail-qt0-f193.google.com with SMTP id e8-v6so24752934qth.0
 for ; Tue, 22 May 2018 11:32:06 -0700 (PDT)
In-Reply-To: <5b0d2137-8c70-ceb4-6965-fef2c75c4c24@mellanox.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Tue, 22 May 2018 10:36:17 -0500, Huy Nguyen wrote:
> On 5/22/2018 12:20 AM, Jakub Kicinski wrote:
> > On Mon, 21 May 2018 14:04:57 -0700, Saeed Mahameed wrote:
> >> From: Huy Nguyen
> >>
> >> In this patch, we add dcbnl buffer attribute to allow user
> >> change the NIC's buffer configuration such as priority
> >> to buffer mapping and buffer size of individual buffer.
> >>
> >> This attribute combined with pfc attribute allows advanced user to
> >> fine tune the qos setting for specific priority queue. For example,
> >> user can give dedicated buffer for one or more priorities or user
> >> can give large buffer to certain priorities.
> >>
> >> We present a use case scenario where dcbnl buffer attribute configured
> >> by advanced user helps reduce the latency of messages of different sizes.
> >>
> >> Scenarios description:
> >> On ConnectX-5, we run latency sensitive traffic with
> >> small/medium message sizes ranging from 64B to 256KB and bandwidth sensitive
> >> traffic with large message sizes 512KB and 1MB. We group small, medium,
> >> and large message sizes to their own pfc enabled priorities as follows.
> >>   Priorities 1 & 2 (64B, 256B and 1KB)
> >>   Priorities 3 & 4 (4KB, 8KB, 16KB, 64KB, 128KB and 256KB)
> >>   Priorities 5 & 6 (512KB and 1MB)
> >>
> >> By default, ConnectX-5 maps all pfc enabled priorities to a single
> >> lossless fixed buffer size of 50% of total available buffer space. The
> >> other 50% is assigned to lossy buffer. Using dcbnl buffer attribute,
> >> we create three equal size lossless buffers. Each buffer has 25% of total
> >> available buffer space. Thus, the lossy buffer size reduces to 25%. Priority
> >> to lossless buffer mappings are set as follows.
> >>   Priorities 1 & 2 on lossless buffer #1
> >>   Priorities 3 & 4 on lossless buffer #2
> >>   Priorities 5 & 6 on lossless buffer #3
> >>
> >> We observe improvements in latency for small and medium message sizes
> >> as follows. Please note that the large message sizes bandwidth performance is
> >> reduced but the total bandwidth remains the same.
> >>   256B message size (42% latency reduction)
> >>   4K message size (21% latency reduction)
> >>   64K message size (16% latency reduction)
> >>
> >> Signed-off-by: Huy Nguyen
> >> Signed-off-by: Saeed Mahameed
> > On a cursory look this bears a lot of resemblance to devlink shared
> > buffer configuration ABI.  Did you look into using that?
> >
> > Just to be clear devlink shared buffer ABIs don't require representors
> > and "switchdev mode".
> > .
> [HQN] Dear Jakub, there are several reasons that devlink shared buffer
> ABI cannot be used:
> 1. The devlink shared buffer ABI is written based on the switch cli
> which you can find out more
> from this link https://community.mellanox.com/docs/DOC-2558.

Devlink API accommodates requirements of simpler (SwitchX2?) and more
advanced schemes (present in Spectrum).  The simpler/basic static
threshold configuration is exactly what you are doing here, AFAIU.

> 2. The dcbnl interfaces have been used for QoS settings.

QoS settings != shared buffer configuration.

> In NIC, the buffer configuration are tied to priority (ETS PFC).

Some customers use DCB, a lot (most?) of them don't.  I don't think the
"this is a logical extension of a commonly used API" argument really
stands here.

> The buffer configuration are not tied to port like switch.

It's tied to a port and TCs, you just have one port but still have 8
TCs exactly like a switch...

> 3. Shared buffer, alpha, threshold are switch specific terms.

IDK how talking about alpha is relevant, it's just one threshold type
the API supports.  As far as shared buffer and threshold go, I don't
know if these are switch terms (or how "switch" differs from "NIC" at
that level) - I personally find carving shared buffer into pools very
intuitive.

Could you give examples of commands/configs one can use with your new
ABI?  How does one query the total size of the buffer to be carved?
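
To make the question concrete, here is a rough sketch of the carving
described in the quoted commit message: three equal lossless buffers of
25% each, with the remainder left as the lossy buffer, plus the
priority-to-buffer mapping.  This is only an illustration.  The names
(BufferLayout, carve), the use of index 0 for the lossy buffer, and the
1 MiB total are assumptions made for the example, not taken from the
patch; the real total size is exactly what the question above asks
about.

```python
from dataclasses import dataclass
from typing import List, Sequence

NUM_PRIORITIES = 8  # IEEE 802.1Q priorities 0..7


@dataclass
class BufferLayout:
    """Hypothetical layout, not the structure defined by the patch."""
    total_size: int          # bytes available for carving (assumed known)
    buffer_size: List[int]   # bytes per buffer; index 0 is the lossy buffer
    prio2buffer: List[int]   # buffer index for each priority 0..7


def carve(total_size: int,
          lossless_groups: Sequence[Sequence[int]],
          share_pct: int) -> BufferLayout:
    """Give each priority group its own lossless buffer of share_pct
    percent of the total; whatever is left becomes the lossy buffer."""
    per_buffer = total_size * share_pct // 100
    sizes = [0] + [per_buffer] * len(lossless_groups)
    lossy = total_size - sum(sizes)
    if lossy < 0:
        raise ValueError("lossless buffers exceed the total buffer size")
    sizes[0] = lossy

    # Priorities not listed in any group stay on the lossy buffer (0).
    prio2buffer = [0] * NUM_PRIORITIES
    for index, group in enumerate(lossless_groups, start=1):
        for prio in group:
            prio2buffer[prio] = index
    return BufferLayout(total_size, sizes, prio2buffer)


# Scenario from the commit message, with an assumed 1 MiB total.
layout = carve(1024 * 1024, [(1, 2), (3, 4), (5, 6)], 25)
print(layout.buffer_size)
print(layout.prio2buffer)
```

With these inputs priorities 0 and 7 remain on the lossy buffer, which
ends up the same size as each lossless one, matching the 25% figure in
the commit message.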