From: Jamal Hadi Salim <jhs@mojatatu.com>
To: Vladimir Oltean <vladimir.oltean@nxp.com>
Cc: netdev@vger.kernel.org, "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Jiri Pirko <jiri@resnulli.us>,
	Vinicius Costa Gomes <vinicius.gomes@intel.com>,
	linux-kernel@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
	Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>,
	Peilin Ye <yepeilin.cs@gmail.com>,
	Pedro Tammela <pctammela@mojatatu.com>
Subject: Re: [PATCH RESEND net-next 0/5] Improve the taprio qdisc's relationship with its children
Date: Tue, 6 Jun 2023 11:39:32 -0400
Message-ID: <CAM0EoM=qG9sDjM=F6D=i=h3dKXwQXAt1wui8W_EXDJi2tijRnw@mail.gmail.com>
In-Reply-To: <20230605165353.tnjrwa7gbcp4qhim@skbuf>

Hi Vladimir,

On Mon, Jun 5, 2023 at 12:53 PM Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
>
> Hi Jamal,
>
> On Mon, Jun 05, 2023 at 11:44:17AM -0400, Jamal Hadi Salim wrote:
> > I haven't been following - but if you show me sample intended tc
> > configs for both s/w and hardware offloads, I can comment.
>
> There is not much difference in usage between the 2 modes. IMO the software
> data path logic is only a simulation for demonstrative purposes of what the
> shaper is intended to do. If hardware offload is available, it is always
> preferable. Otherwise, I'm not sure if anyone uses the pure software
> scheduling mode (also without txtime assist) for a real life use case.
>

Thanks for the sample. Understood on the software twin being useful for
simulation (our standard rule is that you need to have a software twin)
- it's great that you conform to that.

I took a cursory glance at the existing kernel code to get better
context (I will make more time later). Essentially this qdisc is
classful and is capable of hardware offload. My initial comments are
unrelated to the patchset (all this Klingon DCB-style configuration,
like "flags 0x2", is still magic to me).
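
For my own notes - and please correct me if I misread the code - the
flags seem to select taprio's data path roughly as below. These are
illustrative commands only (device name and parameters borrowed from
your script), not something I have run:

    # Pure software scheduling: no flags, gates driven by clockid.
    tc qdisc replace dev eno0 parent root taprio \
            num_tc 8 map 0 1 2 3 4 5 6 7 \
            queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
            base-time 200 sched-entry S ff 60000 \
            clockid CLOCK_TAI

    # txtime-assist: flags 0x1, the gate schedule is translated into
    # per-skb launch times (normally combined with etf children).
    tc qdisc replace dev eno0 parent root taprio \
            num_tc 8 map 0 1 2 3 4 5 6 7 \
            queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
            base-time 200 sched-entry S ff 60000 \
            flags 0x1 txtime-delay 200000 clockid CLOCK_TAI

    # Full offload: flags 0x2, the schedule is programmed into the NIC.
    tc qdisc replace dev eno0 parent root taprio \
            num_tc 8 map 0 1 2 3 4 5 6 7 \
            queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
            base-time 200 sched-entry S ff 60000 \
            flags 0x2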

1) Some details become confusing with regard to offload vs non-offload;
e.g. class grafting (taprio_graft()) de/activates the device, but that
seems needed only for offload. Would it not be better to keep those
paths separate, e.g. graft_offload() vs graft_software()? We really
need a generic document on how to write qdisc code consistently (I
started working on one but never completed it - if there is a
volunteer, I would be happy to work with them to complete it).
2) It seems that, like mqprio, this qdisc can only be a root qdisc, and
that you don't want the children replaced with other qdisc types, i.e.
the children are always pfifo? For example, is it possible or intended
to replace 8001:x with bfifo, or even to change the pfifo queue size?
(See the example commands after point 3.)
3) The offload intention seems really to be to bypass enqueue() and go
straight to the driver's xmit() for the specific DMA ring the skb is
mapped to, except when the driver says it is busy and refuses to stash
the skb in the ring, in which case you have to requeue to the
appropriate child qdisc/class. I am not sure how that would work here -
mqprio gets away with it by not defining any of the en/de/requeue()
callbacks - but likely it is the lack of requeue that makes it work.
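
To make 2) and 3) concrete, the kind of commands I have in mind are
below (untested; interface and handles taken from your script):

    # 2) Try swapping a child for a different qdisc type, or changing
    #    the fifo limit.
    tc qdisc replace dev eno0 parent 8001:8 bfifo limit 300000
    tc qdisc replace dev eno0 parent 8001:7 pfifo limit 50

    # 3) See whether the offload path ever falls back to the children,
    #    e.g. by watching the backlog and requeues counters.
    tc -s qdisc show dev eno0
    tc -s class show dev eno0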

I will read the other thread you pointed to when I get a moment.

cheers,
jamal

> I was working with something like this for testing the code paths affected
> by these changes:
>
> #!/bin/bash
>
> add_taprio()
> {
>         local offload=$1
>         local extra_flags
>
>         case $offload in
>         true)
>                 extra_flags="flags 0x2"
>                 ;;
>         false)
>                 extra_flags="clockid CLOCK_TAI"
>                 ;;
>         esac
>
>         tc qdisc replace dev eno0 handle 8001: parent root stab overhead 24 taprio \
>                 num_tc 8 \
>                 map 0 1 2 3 4 5 6 7 \
>                 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
>                 max-sdu 0 0 0 0 0 200 0 0 \
>                 base-time 200 \
>                 sched-entry S 80 20000 \
>                 sched-entry S a0 20000 \
>                 sched-entry S 5f 60000 \
>                 $extra_flags
> }
>
> add_cbs()
> {
>         local offload=$1
>         local extra_flags
>
>         case $offload in
>         true)
>                 extra_flags="offload 1"
>                 ;;
>         false)
>                 extra_flags=""
>                 ;;
>         esac
>
>         max_frame_size=1500
>         data_rate_kbps=20000
>         port_transmit_rate_kbps=1000000
>         idleslope=$data_rate_kbps
>         sendslope=$(($idleslope - $port_transmit_rate_kbps))
>         locredit=$(($max_frame_size * $sendslope / $port_transmit_rate_kbps))
>         hicredit=$(($max_frame_size * $idleslope / $port_transmit_rate_kbps))
>         tc qdisc replace dev eno0 parent 8001:8 cbs \
>                 idleslope $idleslope \
>                 sendslope $sendslope \
>                 hicredit $hicredit \
>                 locredit $locredit \
>                 $extra_flags
> }
>
> # this should always fail
> add_second_taprio()
> {
>         tc qdisc replace dev eno0 parent 8001:7 taprio \
>                 num_tc 8 \
>                 map 0 1 2 3 4 5 6 7 \
>                 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
>                 max-sdu 0 0 0 0 0 200 0 0 \
>                 base-time 200 \
>                 sched-entry S 80 20000 \
>                 sched-entry S a0 20000 \
>                 sched-entry S 5f 60000 \
>                 clockid CLOCK_TAI
> }
>
> ip link set eno0 up
>
> echo "Offload:"
> add_taprio true
> add_cbs true
> add_second_taprio
> mausezahn eno0 -t ip -b 00:04:9f:05:f6:27 -c 100 -p 60
> sleep 5
> tc -s class show dev eno0
> tc qdisc del dev eno0 root
>
> echo "Software:"
> add_taprio false
> add_cbs false
> add_second_taprio
> mausezahn eno0 -t ip -b 00:04:9f:05:f6:27 -c 100 -p 60
> sleep 5
> tc -s class show dev eno0
> tc qdisc del dev eno0 root
>
> > In my cursory look I assumed you wanted to go along the path of mqprio,
> > where nothing much happens in the s/w datapath other than requeues
> > when the tx hardware path is busy (notice it is missing the
> > enqueue/dequeue ops). In that case the hardware selection is
> > essentially of a DMA ring based on skb tags. It seems you took it up
> > a notch by in fact having a choice between a pure s/w path and an
> > offload path.
>
> Yes. Actually the original taprio design always had the enqueue()/dequeue()
> ops involved in the data path, then commit 13511704f8d7 ("net: taprio
> offload: enforce qdisc to netdev queue mapping") retrofitted the mqprio
> model when using the "flags 0x2" argument.


> If you have time to read, the discussion behind that redesign was here:
> https://lore.kernel.org/netdev/20210511171829.17181-1-yannick.vignon@oss.nxp.com/
