From: Paul Blakey <paulb@nvidia.com>
To: Marcelo Leitner <mleitner@redhat.com>,
	Ilya Maximets <i.maximets@ovn.org>
Cc: netdev@vger.kernel.org, Saeed Mahameed <saeedm@nvidia.com>,
	Paolo Abeni <pabeni@redhat.com>, Jakub Kicinski <kuba@kernel.org>,
	Eric Dumazet <edumazet@google.com>,
	Jamal Hadi Salim <jhs@mojatatu.com>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	"David S. Miller" <davem@davemloft.net>,
	Oz Shlomo <ozsh@nvidia.com>, Jiri Pirko <jiri@nvidia.com>,
	Roi Dayan <roid@nvidia.com>, Vlad Buslov <vladbu@nvidia.com>
Subject: Re: [PATCH net-next v8 0/7] net/sched: cls_api: Support hardware miss to tc action
Date: Wed, 8 Feb 2023 10:41:39 +0200
Message-ID: <a2f19534-9752-845c-9b8a-3aa75b5f3706@nvidia.com>
In-Reply-To: <CALnP8ZaEFnd=N_oFar+8hBF=XukRis92cnW4KBtywxnO4u9=zQ@mail.gmail.com>



On 07/02/2023 07:03, Marcelo Leitner wrote:
> On Tue, Feb 07, 2023 at 01:20:55AM +0100, Ilya Maximets wrote:
>> On 2/6/23 18:14, Paul Blakey wrote:
>>>
>>>
>>> On 06/02/2023 14:34, Ilya Maximets wrote:
>>>> On 2/5/23 16:49, Paul Blakey wrote:
>>>>> Hi,
>>>>>
>>>>> This series adds support for hardware miss to instruct tc to continue execution
>>>>> in a specific tc action instance on a filter's action list. The mlx5 driver patch
>>>>> (besides the refactors) shows its usage instead of using just chain restore.
>>>>>
>>>>> Currently a filter's action list must be executed all together or
>>>>> not at all, as drivers are only able to tell tc to continue executing from a
>>>>> specific tc chain, and not a specific filter/action.
>>>>>
>>>>> This is troublesome with regards to action CT, where new connections should
>>>>> be sent to software (via tc chain restore), and established connections can
>>>>> be handled in hardware.
>>>>>
>>>>> Checking for new connections is done when executing the ct action in hardware
>>>>> (by checking the packet's tuple against known established tuples).
>>>>> But if there is a packet modification (pedit) action before action CT and the
>>>>> checked tuple is a new connection, hardware will need to revert the previous
>>>>> packet modifications before sending it back to software so it can
>>>>> re-match the same tc filter in software and re-execute its CT action.
>>>>>
>>>>> The following is an example configuration of stateless nat
>>>>> on the mlx5 driver that isn't supported before this patchset:
>>>>>
>>>>>    #Setup corresponding mlx5 VFs in namespaces
>>>>>    $ ip netns add ns0
>>>>>    $ ip netns add ns1
>>>>>    $ ip link set dev enp8s0f0v0 netns ns0
>>>>>    $ ip netns exec ns0 ifconfig enp8s0f0v0 1.1.1.1/24 up
>>>>>    $ ip link set dev enp8s0f0v1 netns ns1
>>>>>    $ ip netns exec ns1 ifconfig enp8s0f0v1 1.1.1.2/24 up
>>>>>
>>>>>    #Setup tc arp and ct rules on mlx5 VF representors
>>>>>    $ tc qdisc add dev enp8s0f0_0 ingress
>>>>>    $ tc qdisc add dev enp8s0f0_1 ingress
>>>>>    $ ifconfig enp8s0f0_0 up
>>>>>    $ ifconfig enp8s0f0_1 up
>>>>>
>>>>>    #Original side
>>>>>    $ tc filter add dev enp8s0f0_0 ingress chain 0 proto ip flower \
>>>>>       ct_state -trk ip_proto tcp dst_port 8888 \
>>>>>         action pedit ex munge tcp dport set 5001 pipe \
>>>>>         action csum ip tcp pipe \
>>>>>         action ct pipe \
>>>>>         action goto chain 1
>>>>>    $ tc filter add dev enp8s0f0_0 ingress chain 1 proto ip flower \
>>>>>       ct_state +trk+est \
>>>>>         action mirred egress redirect dev enp8s0f0_1
>>>>>    $ tc filter add dev enp8s0f0_0 ingress chain 1 proto ip flower \
>>>>>       ct_state +trk+new \
>>>>>         action ct commit pipe \
>>>>>         action mirred egress redirect dev enp8s0f0_1
>>>>>    $ tc filter add dev enp8s0f0_0 ingress chain 0 proto arp flower \
>>>>>         action mirred egress redirect dev enp8s0f0_1
>>>>>
>>>>>    #Reply side
>>>>>    $ tc filter add dev enp8s0f0_1 ingress chain 0 proto arp flower \
>>>>>         action mirred egress redirect dev enp8s0f0_0
>>>>>    $ tc filter add dev enp8s0f0_1 ingress chain 0 proto ip flower \
>>>>>       ct_state -trk ip_proto tcp \
>>>>>         action ct pipe \
>>>>>         action pedit ex munge tcp sport set 8888 pipe \
>>>>>         action csum ip tcp pipe \
>>>>>         action mirred egress redirect dev enp8s0f0_0
>>>>>
>>>>>    #Run traffic
>>>>>    $ ip netns exec ns1 iperf -s -p 5001&
>>>>>    $ sleep 2 #wait for iperf to fully open
>>>>>    $ ip netns exec ns0 iperf -c 1.1.1.2 -p 8888
>>>>>
>>>>>    #dump tc filter stats on enp8s0f0_0 chain 0 rule and see hardware packets:
>>>>>    $ tc -s filter show dev enp8s0f0_0 ingress chain 0 proto ip | grep "hardware.*pkt"
>>>>>           Sent hardware 9310116832 bytes 6149672 pkt
>>>>>           Sent hardware 9310116832 bytes 6149672 pkt
>>>>>           Sent hardware 9310116832 bytes 6149672 pkt
>>>>>
>>>>> A new connection executing the first filter in hardware will first rewrite
>>>>> the dst port to the new port, and then the ct action is executed.
>>>>> Because this is a new connection, hardware will need to send this back
>>>>> to software, on chain 0, to execute the first filter again in software.
>>>>> The dst port needs to be reverted, otherwise it won't re-match the old
>>>>> dst port in the first filter. Because of that, the mlx5 driver currently
>>>>> rejects offloading the above action ct rule.
>>>>>
>>>>> This series adds support for partial offload of a filter's action list,
>>>>> letting tc software continue processing in the specific action instance
>>>>> where hardware left off (in the above case after the "action pedit ex munge tcp
>>>>> dport..." of the first rule), allowing support for scenarios such as the above.
>>>>
>>>>
>>>> Hi, Paul.  Not sure if this was discussed before, but don't we also need
>>>> a new TCA_CLS_FLAGS_IN_HW_PARTIAL flag or something like this?
>>>>
>>>> Currently the in_hw/not_in_hw flags are reported per filter, i.e. these
>>>> flags are not per-action.  This may cause confusion among users, if flows
>>>> are reported as in_hw, while they are actually partially or even mostly
>>>> processed in SW.
>>>>
>>>> What do you think?
>>>>
>>>> Best regards, Ilya Maximets.
>>>
>>> I think it's a good idea, and I'm fine with proposing something like this in a
>>> different series, as this isn't a new problem from this series and existed before
>>> it, at least with CT rules.
>>
>> Hmm, I didn't realize the issue already exists.
> 
> Maintainers: please give me up to Friday to review this patchset.
> 
> Disclaimer: I had missed this patchset, and I didn't even read it yet.
> 
> I don't follow. Can someone please rephrase the issue?
> AFAICT, it is not that the NIC is offloading half of the action list
> and never executing a part of it. Instead, for established connections
> the rule will work fully offloaded. While for misses in the CT action,
> it will simply trigger a miss, like it already does today.

You got it right, and like you said, it was like this before, so it's not
strictly related to this series and could be in a different patchset.
I thought that the (extra) flag would mean that the rule can miss, compared to
other rule/action combinations that will never miss because they
don't need sw support.

> 
>>
>>>
>>> So how about I'll propose it in a different series and we continue with this first?
> 
> So I'm not sure either on what's the idea here.
> 
> Thanks,
> Marcelo
> 
>>
>> Sounds fine to me.  Thanks!
>>
>> Best regards, Ilya Maximets.
>>
> 
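
As a reading aid for the cover letter quoted above, here is a minimal Python
sketch of the "miss to action" idea. It is a toy model, not kernel or mlx5
code: the names Packet, ACTIONS, run_actions and process are made up for
illustration, and the action list only mirrors the pedit/csum/ct part of the
first example filter. It assumes hardware misses exactly at the ct action
when the connection is not yet established, and that software then resumes
at that same action instead of re-matching the filter from the start.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    dport: int
    # Records which engine ("hw" or "sw") ran each action, in order.
    trace: list = field(default_factory=list)


def set_dport(pkt):
    # Stands in for "pedit ex munge tcp dport set 5001".
    pkt.dport = 5001


def noop(pkt):
    # csum and ct have no visible effect in this toy model.
    pass


# Hypothetical action list modelled on the first filter in the example.
ACTIONS = [("pedit", set_dport), ("csum", noop), ("ct", noop)]


def run_actions(pkt, start, engine, established):
    """Run ACTIONS[start:] on one engine.

    Returns the index of the action that missed, or None if every
    remaining action was executed.
    """
    for index in range(start, len(ACTIONS)):
        name, apply = ACTIONS[index]
        # Assumption: only "hardware" misses, and only on ct for a
        # connection that is not established yet.
        if engine == "hw" and name == "ct" and not established:
            return index
        apply(pkt)
        pkt.trace.append(f"{engine}:{name}")
    return None


def process(pkt, established):
    # Hardware goes first; on a miss, software continues at the missed
    # action, so the earlier pedit does not need to be reverted.
    miss = run_actions(pkt, 0, "hw", established)
    if miss is not None:
        run_actions(pkt, miss, "sw", established)
    return pkt
```

For a new connection, process(Packet(8888), established=False) leaves the
rewritten dport in place and only the ct step runs in "software". For an
established connection, every step runs in "hardware".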
