From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Yossi Kuperman <yossiku@mellanox.com>,
	"netdev\@vger.kernel.org" <netdev@vger.kernel.org>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>,
	Jiri Pirko <jiri@mellanox.com>, Rony Efraim <ronye@mellanox.com>,
	Maxim Mikityanskiy <maximmi@mellanox.com>,
	John Fastabend <john.fastabend@gmail.com>,
	Eran Ben Elisha <eranbe@mellanox.com>
Subject: Re: [RFC] Hierarchical QoS Hardware Offload (HTB)
Date: Sat, 01 Feb 2020 17:48:26 +0100
Message-ID: <87y2tmckyt.fsf@toke.dk>
In-Reply-To: <FC053E80-74C9-4884-92F1-4DBEB5F0C81A@mellanox.com>

Yossi Kuperman <yossiku@mellanox.com> writes:

> Following is an outline briefly describing our plans towards offloading HTB functionality.
>
> HTB qdisc allows you to use one physical link to simulate several
> slower links. This is done by configuring a hierarchical QoS tree;
> each tree node corresponds to a class. Filters are used to classify
> flows to different classes. HTB is quite flexible and versatile, but
> it comes with a cost. HTB does not scale and consumes considerable CPU
> and memory. Our aim is to offload HTB functionality to hardware and
> provide the user with the flexibility and the conventional tools
> offered by TC subsystem, while scaling to thousands of traffic classes
> and maintaining wire-speed performance. 
>
> Mellanox hardware can support hierarchical rate-limiting;
> rate-limiting is done per hardware queue. In our proposed solution,
> flow classification takes place in software. By moving the
> classification to clsact egress hook, which is thread-safe and does
> not require locking, we avoid the contention induced by the single
> qdisc lock. Furthermore, clsact filters are performed before the
> net-device’s TX queue is selected, allowing the driver a chance to
> translate the class to the appropriate hardware queue. Please note
> that the user will need to configure the filters slightly differently:
> apply them to the clsact rather than to the HTB itself, and set the
> priority to the desired class-id.
>
> For example, the following two filters are equivalent:
> 	1. tc filter add dev eth0 parent 1:0 protocol ip flower dst_port 80 classid 1:10
> 	2. tc filter add dev eth0 egress protocol ip flower dst_port 80 action skbedit priority 1:10
>
> Note: supporting the above filter requires no code changes to the
> upstream kernel or to the iproute2 package.
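
For readers unfamiliar with the handle format used in the two filters
above: a tc class-id such as 1:10 is a 32-bit value with a 16-bit major
and a 16-bit minor part, both written in hex. The sketch below shows the
value that "skbedit priority 1:10" would place in skb->priority under
that layout. The helper name classid_to_priority is hypothetical and is
not part of iproute2 or the kernel.

```python
def classid_to_priority(classid: str) -> int:
    """Pack a tc 'major:minor' handle (hex fields) into a 32-bit value.

    Hypothetical helper for illustration only; it mirrors the usual
    tc handle layout (major in the upper 16 bits, minor in the lower 16).
    """
    major_s, _, minor_s = classid.partition(":")
    # An empty field (e.g. "1:") is treated as 0, as in "1:" == "1:0".
    major = int(major_s, 16) if major_s else 0
    minor = int(minor_s, 16) if minor_s else 0
    if not (0 <= major <= 0xFFFF and 0 <= minor <= 0xFFFF):
        raise ValueError(f"class-id field out of range: {classid!r}")
    return (major << 16) | minor


# Class 1:10 from the example filters above.
print(hex(classid_to_priority("1:10")))  # prints 0x10010
```

Note that the minor part is hex, so 1:10 is minor 0x10 (decimal 16),
not decimal 10.
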
>
> Furthermore, the most concerning aspect of the current HTB
> implementation is its lack of support for multi-queue. All of the
> net-device’s TX queues point to the same HTB instance, resulting in
> high spin-lock contention. This contention might negate the overall
> performance gains expected from introducing the offload in the first
> place. We should modify HTB to present itself as mq qdisc does. By
> default, mq qdisc allocates a simple fifo qdisc per TX queue exposed
> by the lower layer device. This applies only when hardware offload is
> configured; otherwise, HTB behaves as usual. There is no HTB code
> along the data-path; the only overhead compared to regular traffic is
> the classification taking place at clsact. Please note that this
> design implies full offload, with no fallback to software; it is not
> trivial to partially offload the hierarchical tree considering
> borrowing between siblings anyway.
>
>
> To summarize: for each HTB leaf-class the driver will allocate a
> special queue and match it with a corresponding net-device TX queue
> (increase real_num_tx_queues). A unique fifo qdisc will be attached to
> any such TX queue. Classification will still take place in software,
> but at the clsact egress hook instead. This way we can scale to
> thousands of classes while maintaining wire-speed performance and
> reducing CPU overhead.
>
> Any feedback will be much appreciated.

Other than echoing Dave's concern around baking FIFO semantics into
hardware, maybe also consider whether implementing the required
functionality using EDT-based semantics instead might be better? I.e.,
something like this:
https://netdevconf.info/0x14/session.html?talk-replacing-HTB-with-EDT-and-BPF
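
As a rough illustration of what EDT-based semantics mean here: instead
of holding packets in a per-class queue, each packet is stamped with an
earliest departure time derived from the class rate, and a later stage
holds it until that time. The sketch below is only a single-rate pacer
under that assumption; the class name EdtPacer and its fields are made
up for illustration, and it does not model the HTB hierarchy or
borrowing between siblings.

```python
NSEC_PER_SEC = 1_000_000_000


class EdtPacer:
    """Hypothetical single-class pacer that assigns departure times."""

    def __init__(self, rate_bps: int) -> None:
        self.rate_bps = rate_bps  # configured class rate, bits per second
        self.next_tx_ns = 0       # earliest time the next packet may leave

    def stamp(self, now_ns: int, pkt_len_bytes: int) -> int:
        """Return the earliest departure time (ns) for this packet."""
        # A packet never departs in the past, and never before the
        # previous packet of this class has "finished" at the class rate.
        departure_ns = max(now_ns, self.next_tx_ns)
        # Time this packet occupies the link at the class rate.
        tx_time_ns = pkt_len_bytes * 8 * NSEC_PER_SEC // self.rate_bps
        self.next_tx_ns = departure_ns + tx_time_ns
        return departure_ns


# Two back-to-back 1000-byte packets at 8 Mbit/s: the second one is
# pushed out by the 1 ms the first one takes at that rate.
pacer = EdtPacer(rate_bps=8_000_000)
first = pacer.stamp(now_ns=0, pkt_len_bytes=1000)
second = pacer.stamp(now_ns=0, pkt_len_bytes=1000)
print(first, second)  # prints 0 1000000
```

The only per-class state is one timestamp, which is part of what makes
this model attractive compared to a queue per class.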

-Toke


Thread overview: 6+ messages
2020-01-30 16:20 [RFC] Hierarchical QoS Hardware Offload (HTB) Yossi Kuperman
2020-01-31  1:47 ` Dave Taht
2020-01-31 21:42   ` Dave Taht
2020-02-14 11:33   ` Yossi Kuperman
2020-02-01 16:48 ` Toke Høiland-Jørgensen [this message]
     [not found]   ` <bbafbd41-2a3b-3abd-e57c-18175a7c9e3f@mellanox.com>
2020-02-14 11:14     ` Toke Høiland-Jørgensen
