linux-kernel.vger.kernel.org archive mirror
From: Florian Fainelli <f.fainelli@gmail.com>
To: Alexander Lobakin <alobakin@dlink.ru>
Cc: Andrew Lunn <andrew@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Muciri Gatimu <muciri@openmesh.com>,
	Shashidhar Lakkavalli <shashidhar.lakkavalli@openmesh.com>,
	John Crispin <john@phrozen.org>,
	Vivien Didelot <vivien.didelot@gmail.com>,
	Stanislav Fomichev <sdf@google.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Song Liu <songliubraving@fb.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Matteo Croce <mcroce@redhat.com>,
	Jakub Sitnicki <jakub@cloudflare.com>,
	Eric Dumazet <edumazet@google.com>,
	Paul Blakey <paulb@mellanox.com>,
	Yoshiki Komachi <komachi.yoshiki@gmail.com>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH net] net: dsa: fix flow dissection on Tx path
Date: Fri, 6 Dec 2019 10:05:20 -0800
Message-ID: <30eb10a3-ad94-5b9b-5cea-f4b50c2f0e42@gmail.com>
In-Reply-To: <5e108291220110b10fdf0c88f8894fdb@dlink.ru>

On 12/5/19 11:37 PM, Alexander Lobakin wrote:
> Florian Fainelli wrote 06.12.2019 06:32:
>> On 12/5/2019 6:58 AM, Alexander Lobakin wrote:
>>> Andrew Lunn wrote 05.12.2019 17:01:
>>>>> Hi,
>>>>>
>>>>> > What I'm missing here is an explanation of why the flow dissector
>>>>> > is called here if the protocol is already set. It suggests there is
>>>>> > a case where the protocol is not correctly set, and we do need to
>>>>> > look into the frame?
>>>>>
>>>>> If we have a device with multiple Tx queues, but XPS is not configured
>>>>> or we are running on a uniprocessor system, then the networking core
>>>>> selects the Tx queue based on the flow, to utilize as many Tx queues
>>>>> as possible without reordering frames.
>>>>> This selection happens in net/core/dev.c:skb_tx_hash() as:
>>>>>
>>>>> reciprocal_scale(skb_get_hash(skb), qcount)
>>>>>
>>>>> where 'qcount' is the total number of Tx queues on the network device.
>>>>>
>>>>> If the skb has not been hashed prior to this line, then skb_get_hash()
>>>>> will call the flow dissector to generate a new hash. That's why flow
>>>>> dissection can occur on the Tx path.
>>>>
>>>>
>>>> Hi Alexander
>>>>
>>>> So it looks like you are now skipping this hash, which in your
>>>> testing gives better results, because the protocol is already set
>>>> correctly. But are there cases where the protocol is not set
>>>> correctly and we really do need to look into the frame?
>>>
>>> Actually no, I'm not skipping the entire hashing. I'm only skipping
>>> the tag_ops->flow_dissect() call (a helper that only alters the
>>> network offset and replaces the fake ETH_P_XDSA with the actual
>>> protocol) on the Tx path, because there it only breaks the flow
>>> dissection logic. All skbs are still processed and hashed by the
>>> generic code that runs after that call.
>>>
>>>> How about when an outer header has just been removed? The frame was
>>>> received on a GRE tunnel, the GRE header has just been removed, and
>>>> now the frame is on its way out? Is the protocol still GRE, and we
>>>> should look into the frame to determine if it is IPv4, ARP etc?
>>>>
>>>> Your patch looks to improve things for the cases you have tested, but
>>>> I'm wondering if there are other use cases where we really do need to
>>>> look into the frame? In which case, your fix is doing the wrong thing.
>>>> Should we be extending the tagger to handle the TX case as well as the
>>>> RX case?
>>>
>>> We really have two options: don't call tag_ops->flow_dissect() on Tx
>>> (this patch), or extend the tagger callbacks to handle the Tx path too.
>>> I used both of these for several months each and couldn't find a case
>>> where the first one was worse than the second.
>>> I mean, there _might_ be such cases in theory, and if they appear we
>>> should extend our taggers. But for now I don't see the necessity to do
>>> this, as the generic flow dissection logic works as expected after this
>>> patch and is completely broken without it.
>>> And remember that we have the reverse logic on Tx: all skbs are first
>>> queued on the slave netdevice and only then on the master/CPU port.
>>>
>>> It would be nice to see what other people think about it anyway.
>>
>> Your patch seems appropriate to me, and quite frankly I am not sure why
>> flow dissection on RX is done at the DSA master device level, where we
>> have not parsed the DSA tag yet, instead of at the DSA slave network
>> device level. It seems to me that if the DSA master has N RX queues, we
>> should be creating the DSA slave devices with the same number of RX
>> queues and perform RPS there against a standard Ethernet frame (sans
>> DSA tag).
>>
>> For TX the story is a little different because we can have multiqueue
>> DSA slave network devices in order to steer traffic towards particular
>> switch queues and we could do XPS there that way.
>>
>> What do you think?
> 
> Hi Florian,
> 
> First of all, thank you for the "Reviewed-by"!
> 
> I agree with you that all the network stack processing should be
> performed on standard frames without CPU tags and on the corresponding
> slave netdevices. So I think we really should think about extending the
> DSA core code to create slaves with at least as many Rx queues as the
> master device has. With this done we could remove the .flow_dissect()
> callback from DSA taggers entirely and simplify the traffic flow.

Indeed.

> 
> Also, if we get back to Tx processing, the number of Tx queues on
> slaves should ideally be equal to the number of queues on the switch
> itself. Maybe we should then apply this rule to Rx queues too, i.e.
> create slaves with the number of Rx queues that the switch has?

Yes, I would offer the same configuration knob we have today with TX
queues for RX queues.

> 
> (for example, I'm currently working with switches that have 8 Rx queues
> and 8 Tx queues, but their Ethernet controllers / CPU ports have only
> 4/4)

Yes, unfortunately that is not uncommon. We have a similar set-up with
BCM7278, which has 16 TX queues for its DSA master, and we have 4 switch
ports with 8 TX queues per port. What I did is basically clamp the
number of DSA slave device TX queues to get a 1:1 mapping, and that
seems acceptable to the users :)
-- 
Florian

  reply	other threads:[~2019-12-06 18:05 UTC|newest]

Thread overview: 13+ messages
2019-12-05 10:02 [PATCH net] net: dsa: fix flow dissection on Tx path Alexander Lobakin
2019-12-05 12:58 ` Andrew Lunn
2019-12-05 13:34   ` Alexander Lobakin
2019-12-05 14:01     ` Andrew Lunn
2019-12-05 14:58       ` Alexander Lobakin
2019-12-06  3:32         ` Florian Fainelli
2019-12-06  7:37           ` Alexander Lobakin
2019-12-06 18:05             ` Florian Fainelli [this message]
2019-12-06  3:28 ` Florian Fainelli
2019-12-06 15:06   ` Alexander Lobakin
2019-12-06 19:32 ` Rainer Sickinger
2019-12-07  4:19 ` David Miller
2019-12-07  8:10   ` Alexander Lobakin
