netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Florian Fainelli <f.fainelli@gmail.com>
To: Andrew Lunn <andrew@lunn.ch>, jiri@resnulli.us, jhs@mojatatu.com
Cc: netdev@vger.kernel.org, davem@davemloft.net,
	xiyou.wangcong@gmail.com, vivien.didelot@savoirfairelinux.com
Subject: Re: [RFC net-next 0/8] net: dsa: Multi-queue awareness
Date: Fri, 1 Sep 2017 09:46:26 -0700	[thread overview]
Message-ID: <a0b9ab3a-45d8-3371-901b-2249a26b9192@gmail.com> (raw)
In-Reply-To: <20170901132921.GV22289@lunn.ch>

On 09/01/2017 06:29 AM, Andrew Lunn wrote:
>> I suppose that you could somehow use TC to influence how the traffic
>> from host to CPU works, but without a "CPU" port representor the
>> question is how do we get that done? If we used "eth0" we need to
>> callback into the switch driver for programming..
> 
> We need to compare how the different switches work with respect to
> QoS. Marvell switches do a lot of the classification on the ingress
> port where it determines what queue the frame should be placed in on
> the egress port. The egress port then schedules its queues.
> 
> This does not map to TC too well.
> 
>> Regarding the last patch in this series, what I would ideally to replace
>> it with is something along the lines of:
>>
>> tc bind dev sw0p0 queue 0 dev eth0 queue 16
> 
> Why do you need this? sw0p0 has 8 queues? So i assume you use TC on
> sw0p0 to place frames into these queues? The queue then gets passed
> transparently down through the conduit interface and then used by the
> tagger. I don't see why you need eth0 here? We try our best to avoid
> eth0 wherever possible, it causes confusion. So i would prefer not to
> have to use eth0 with TC commands.

Well, if you read through patch 8 maybe this is explained. The dynamic
queue selection is working fine through the use of the DSA network
device's queue being passed to the Broadcom tag. If there was just I
would agree with you, but here is the catch below.

We also have this unique (AFAICT) hardware feature called Advanced
Congestion Buffering (ACB) where the CPU Ethernet MAC can receive
congestion information from the switch queues directly, but out of band
from specifically added HW logic and signals. This is not using pause
frames.

This is useful for instance when your CPU is linking at 1Gbits/sec (or
more) internally with the switch, but you have connected external hosts
that are only 10/100 capable. When you push 1Gbits/sec of traffic
towards such hosts, you would get severe packet loss, unless you have
pause frames enabled. The problem with Pause frames within the SF2
switch design is that they are not per-flow though, so you can't resolve
from the pause frames which egress queue to backpressure. With ACB
though, you get end-to-end backpressure between say, Port 8 and Port 0.
In order for this to work though, you need the CPU MAC to be inspecting
or rather receiving congestion notification from the switch port and
queues directly.

This is why I need to define a mapping between switch (port P,queue Q)
and CPU MAC queues (Q'). In the first generation HW, we have up to 4
ports exposed, each with 8 queues, and we have 32 CPU queues, 1:1
mapping is possible. On 2nd generation hardware, same number of ports
and queues per port, but only 16 queues, so a 2:1 mapping is possible only.

If I do not establish the mapping, several problems can occur but the
most severe is that congestion notification logic gets its congestion
information from the wrong port and queue, so it can be permanently
backpressuring the CPU queue based on e.g: a disabled port. This results
in the transmit queue being disabled, and so you get the netdev watchdog
to kick in and scream.

As you can see from the last patch, I used a notifier to receive
information when DSA slave network devices are added, from which I
extract their port number and set up an internal mapping to the CPU
queues, but this is creating a layering violation since I have now the
CPU driver extracting DSA specific net_device information.

The idea behind exposing a command like the proposed "tc bind" is to let
users define the mapping as they see fit. 1:1 or 2:1 mapping is fine as
a default, but may not be satisfactory for some use cases.

Hope this helps understand the bigger picture ;)
-- 
Florian

  reply	other threads:[~2017-09-01 16:46 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-31  0:18 [RFC net-next 0/8] net: dsa: Multi-queue awareness Florian Fainelli
2017-08-31  0:18 ` [RFC net-next 1/8] net: dsa: Allow switch drivers to indicate number of RX/TX queues Florian Fainelli
2017-08-31 23:44   ` Andrew Lunn
2017-09-01  4:00     ` Florian Fainelli
2017-08-31  0:18 ` [RFC net-next 2/8] net: dsa: tag_brcm: Set output queue from skb queue mapping Florian Fainelli
2017-08-31  0:18 ` [RFC net-next 3/8] net: dsa: bcm_sf2: Advertise number of egress queues Florian Fainelli
2017-08-31  0:18 ` [RFC net-next 4/8] net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping Florian Fainelli
2017-08-31  0:18 ` [RFC net-next 5/8] net: dsa: bcm_sf2: Fix number of CFP entries for BCM7278 Florian Fainelli
2017-08-31  0:18 ` [RFC net-next 6/8] net: dsa: Expose dsa_slave_dev_check and dsa_slave_dev_port_num Florian Fainelli
2017-08-31  0:18 ` [RFC net-next 7/8] net: dsa: tag_brcm: Indicate to master netdevice port + queue Florian Fainelli
2017-08-31  0:18 ` [RFC net-next 8/8] net: systemport: Establish DSA network device queue mapping Florian Fainelli
2017-09-01  0:05 ` [RFC net-next 0/8] net: dsa: Multi-queue awareness Andrew Lunn
2017-09-01  4:10   ` Florian Fainelli
2017-09-01 13:29     ` Andrew Lunn
2017-09-01 16:46       ` Florian Fainelli [this message]
2017-09-01 17:55         ` Andrew Lunn
2017-09-01 18:27           ` Florian Fainelli
2017-09-01 18:50             ` Andrew Lunn
2017-09-01 19:21               ` Florian Fainelli
2017-09-01 19:44                 ` Andrew Lunn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a0b9ab3a-45d8-3371-901b-2249a26b9192@gmail.com \
    --to=f.fainelli@gmail.com \
    --cc=andrew@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=netdev@vger.kernel.org \
    --cc=vivien.didelot@savoirfairelinux.com \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).