netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Huy Nguyen <huyn@mellanox.com>
To: Jakub Kicinski <jakub.kicinski@netronome.com>,
	Saeed Mahameed <saeedm@mellanox.com>,
	"David S. Miller" <davem@davemloft.net>,
	netdev@vger.kernel.org, Jiri Pirko <jiri@resnulli.us>,
	Or Gerlitz <gerlitz.or@gmail.com>,
	Parav Pandit <parav@mellanox.com>
Subject: Re: [net-next 1/6] net/dcb: Add dcbnl buffer attribute
Date: Tue, 22 May 2018 20:01:21 -0500	[thread overview]
Message-ID: <1576e986-6e04-63f3-3569-a105493929a6@mellanox.com> (raw)
In-Reply-To: <20180522113202.54c43440@cakuba>

Dear Jakub, PSB.

On 5/22/2018 1:32 PM, Jakub Kicinski wrote:
> On Tue, 22 May 2018 10:36:17 -0500, Huy Nguyen wrote:
>> On 5/22/2018 12:20 AM, Jakub Kicinski wrote:
>>> On Mon, 21 May 2018 14:04:57 -0700, Saeed Mahameed wrote:
>>>> From: Huy Nguyen <huyn@mellanox.com>
>>>>
>>>> In this patch, we add dcbnl buffer attribute to allow user
>>>> change the NIC's buffer configuration such as priority
>>>> to buffer mapping and buffer size of individual buffer.
>>>>
>>>> This attribute combined with pfc attribute allows advance user to
>>>> fine tune the qos setting for specific priority queue. For example,
>>>> user can give dedicated buffer for one or more prirorities or user
>>>> can give large buffer to certain priorities.
>>>>
>>>> We present an use case scenario where dcbnl buffer attribute configured
>>>> by advance user helps reduce the latency of messages of different sizes.
>>>>
>>>> Scenarios description:
>>>> On ConnectX-5, we run latency sensitive traffic with
>>>> small/medium message sizes ranging from 64B to 256KB and bandwidth sensitive
>>>> traffic with large messages sizes 512KB and 1MB. We group small, medium,
>>>> and large message sizes to their own pfc enables priorities as follow.
>>>>     Priorities 1 & 2 (64B, 256B and 1KB)
>>>>     Priorities 3 & 4 (4KB, 8KB, 16KB, 64KB, 128KB and 256KB)
>>>>     Priorities 5 & 6 (512KB and 1MB)
>>>>
>>>> By default, ConnectX-5 maps all pfc enabled priorities to a single
>>>> lossless fixed buffer size of 50% of total available buffer space. The
>>>> other 50% is assigned to lossy buffer. Using dcbnl buffer attribute,
>>>> we create three equal size lossless buffers. Each buffer has 25% of total
>>>> available buffer space. Thus, the lossy buffer size reduces to 25%. Priority
>>>> to lossless  buffer mappings are set as follow.
>>>>     Priorities 1 & 2 on lossless buffer #1
>>>>     Priorities 3 & 4 on lossless buffer #2
>>>>     Priorities 5 & 6 on lossless buffer #3
>>>>
>>>> We observe improvements in latency for small and medium message sizes
>>>> as follows. Please note that the large message sizes bandwidth performance is
>>>> reduced but the total bandwidth remains the same.
>>>>     256B message size (42 % latency reduction)
>>>>     4K message size (21% latency reduction)
>>>>     64K message size (16% latency reduction)
>>>>
>>>> Signed-off-by: Huy Nguyen <huyn@mellanox.com>
>>>> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
>>> On a cursory look this bares a lot of resemblance to devlink shared
>>> buffer configuration ABI.  Did you look into using that?
>>>
>>> Just to be clear devlink shared buffer ABIs don't require representors
>>> and "switchdev mode".
>>> .
>> [HQN] Dear Jakub, there are several reasons that devlink shared buffer
>> ABI cannot be used:
>> 1. The devlink shared buffer ABI is written based on the switch cli
>> which you can find out more
>> from this link https://community.mellanox.com/docs/DOC-2558.
> Devlink API accommodates requirements of simpler (SwitchX2?) and more
> advanced schemes (present in Spectrum).  The simpler/basic static
> threshold configurations is exactly what you are doing here, AFAIU.
[HQN] Devlink API is tailored specifically for switch. We don't 
configure threshold configuration
explicitly. It is done via PFC. Once PFC is enabled on priority, 
threshold is setup based on our
proprietary formula that were tested rigorously for performance.
>> 2. The dcbnl interfaces have been used for QoS settings.
> QoS settings != shared buffer configuration.
[HQN] I think we have different definition about "shared buffer". Please 
refer to this below switch cli link.
It explained in detail what is the "shared buffer" in switch means.
Our NIC does not have "shared buffer" supported.
https://community.mellanox.com/docs/DOC-2591

>
>> In NIC, the  buffer configuration are tied to priority (ETS PFC).
> Some customers use DCB, a lot (most?) of them don't.  I don't think the
> "this is a logical extension of a commonly used API" really stands here.
[HQN] DCBNL are being actively used. The whole point of this patch
is to tie buffer configuration with IEEE's priority and is IEEE's PFC 
configuration.

Ambitious future is to have the switch configure the NIC's buffer size 
and buffer mapping
via TLV packet and this DCBNL interface. But we won't go too far here.
>
>> The buffer configuration are not tied to port like switch.
> It's tied to a port and TCs, you just have one port but still have 8
> TCs exactly like a switch...
[HQN] No. Our buffer ties to priority not to TCs.
>> 3. Shared buffer, alpha, threshold are switch specific terms.
> IDK how talking about alpha is relevant, it's just one threshold type
> the API supports.  As far as shared buffer and threshold I don't know
> if these are switch terms (or how "switch" differs from "NIC" at that
> level) - I personally find carving shared buffer into pools very
> intuitive.
[HQN] Yes, I understand your point too. The NIC's buffer shares some 
characteristics with the switch's
buffer settings. But this DCB buffer setting is to improve the 
performance and work together with the
PFC setting. We would like to keep all the qos setting under DCB Netlink 
as they are designed
to be this way.

>
> Could you give examples of commands/configs one can use with your new
> ABI?
[HQN] The plan is to add the support in lldptool once the kernel code is 
accepted. To test the kernel code,
I am using small python scripts that works on top of the netlink library.
It will be like this format which is similar to other options in lldptool
     priority2buffer: 0,2,5,7,1,2,3,6 maps priorities 0,1,2,3,4,5,6,7 to 
buffer 0,2,5,7,1,2,3,6
     buffer_size: 87296,87296,0,87296,0,0,0,0 set receive buffer size 
for buffer 0,1,2,3,4,5,6,7 respectively
>    How does one query the total size of the buffer to be carved?
[HQN] This is not necessary. If the total size is too big, error will be 
return via DCB netlink interface.
>

  reply	other threads:[~2018-05-23  1:01 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-21 21:04 [pull request][net-next 0/6] Mellanox, mlx5e updates 2018-05-19 Saeed Mahameed
2018-05-21 21:04 ` [net-next 1/6] net/dcb: Add dcbnl buffer attribute Saeed Mahameed
2018-05-22  5:20   ` Jakub Kicinski
2018-05-22 15:36     ` Huy Nguyen
2018-05-22 18:32       ` Jakub Kicinski
2018-05-23  1:01         ` Huy Nguyen [this message]
2018-05-23  6:15           ` Or Gerlitz
2018-05-23  9:23           ` Jakub Kicinski
2018-05-23  9:33             ` Jiri Pirko
2018-05-23 15:08             ` Huy Nguyen
2018-05-23 15:27             ` Huy Nguyen
2018-05-24 17:13             ` Ido Schimmel
2018-05-23  9:43     ` Jiri Pirko
2018-05-23 13:52       ` John Fastabend
2018-05-23 15:37         ` Huy Nguyen
2018-05-23 16:03           ` John Fastabend
2018-05-23 20:28             ` Jakub Kicinski
2018-05-24 14:37             ` Huy Nguyen
2018-05-23 20:13         ` Jakub Kicinski
2018-05-23 20:19   ` Jakub Kicinski
2018-05-24 14:11     ` Huy Nguyen
2018-05-21 21:04 ` [net-next 2/6] net/mlx5: Add pbmc and pptb in the port_access_reg_cap_mask Saeed Mahameed
2018-05-22 10:19   ` Or Gerlitz
2018-05-22 10:21     ` Or Gerlitz
2018-05-22 16:01       ` Saeed Mahameed
2018-05-24 21:21         ` Or Gerlitz
2018-05-24 21:28           ` Saeed Mahameed
2018-05-21 21:04 ` [net-next 3/6] net/mlx5e: Move port speed code from en_ethtool.c to en/port.c Saeed Mahameed
2018-05-21 21:05 ` [net-next 4/6] net/mlx5e: PPTB and PBMC register firmware command support Saeed Mahameed
2018-05-21 21:05 ` [net-next 5/6] net/mlx5e: Receive buffer configuration Saeed Mahameed
2018-05-21 21:05 ` [net-next 6/6] net/mlx5e: Receive buffer support for DCBX Saeed Mahameed
2018-05-22 19:38 ` [pull request][net-next 0/6] Mellanox, mlx5e updates 2018-05-19 David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1576e986-6e04-63f3-3569-a105493929a6@mellanox.com \
    --to=huyn@mellanox.com \
    --cc=davem@davemloft.net \
    --cc=gerlitz.or@gmail.com \
    --cc=jakub.kicinski@netronome.com \
    --cc=jiri@resnulli.us \
    --cc=netdev@vger.kernel.org \
    --cc=parav@mellanox.com \
    --cc=saeedm@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).