All of lore.kernel.org
 help / color / mirror / Atom feed
From: Daniel Borkmann <daniel@iogearbox.net>
To: Paolo Abeni <pabeni@redhat.com>,
	Tonghao Zhang <xiangxia.m.yue@gmail.com>
Cc: Linux Kernel Network Developers <netdev@vger.kernel.org>,
	Jamal Hadi Salim <jhs@mojatatu.com>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Jiri Pirko <jiri@resnulli.us>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Jonathan Lemon <jonathan.lemon@gmail.com>,
	Eric Dumazet <edumazet@google.com>,
	Alexander Lobakin <alobakin@pm.me>,
	Talal Ahmad <talalahmad@google.com>,
	Kevin Hao <haokexin@gmail.com>,
	Alexei Starovoitov <ast@kernel.org>,
	bpf@vger.kernel.org
Subject: Re: [net-next v10 1/2] net: sched: use queue_mapping to pick tx queue
Date: Fri, 18 Mar 2022 14:36:25 +0100	[thread overview]
Message-ID: <f61e4a34-e7e5-198b-dde6-816654775b21@iogearbox.net> (raw)
In-Reply-To: <7d4b0c51460dec351bbbaf9be85c4a25cb6cec4f.camel@redhat.com>

On 3/17/22 9:20 AM, Paolo Abeni wrote:
> On Tue, 2022-03-15 at 20:48 +0800, Tonghao Zhang wrote:
>> On Tue, Mar 15, 2022 at 5:59 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>>> On 3/14/22 3:15 PM, xiangxia.m.yue@gmail.com wrote:
>>> [...]
>>>>    include/linux/netdevice.h |  3 +++
>>>>    include/linux/rtnetlink.h |  1 +
>>>>    net/core/dev.c            | 31 +++++++++++++++++++++++++++++--
>>>>    net/sched/act_skbedit.c   |  6 +++++-
>>>>    4 files changed, 38 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>> index 0d994710b335..f33fb2d6712a 100644
>>>> --- a/include/linux/netdevice.h
>>>> +++ b/include/linux/netdevice.h
>>>> @@ -3065,6 +3065,9 @@ struct softnet_data {
>>>>        struct {
>>>>                u16 recursion;
>>>>                u8  more;
>>>> +#ifdef CONFIG_NET_EGRESS
>>>> +             u8  skip_txqueue;
>>>> +#endif
>>>>        } xmit;
>>>>    #ifdef CONFIG_RPS
>>>>        /* input_queue_head should be written by cpu owning this struct,
>>>> diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
>>>> index 7f970b16da3a..ae2c6a3cec5d 100644
>>>> --- a/include/linux/rtnetlink.h
>>>> +++ b/include/linux/rtnetlink.h
>>>> @@ -100,6 +100,7 @@ void net_dec_ingress_queue(void);
>>>>    #ifdef CONFIG_NET_EGRESS
>>>>    void net_inc_egress_queue(void);
>>>>    void net_dec_egress_queue(void);
>>>> +void netdev_xmit_skip_txqueue(bool skip);
>>>>    #endif
>>>>
>>>>    void rtnetlink_init(void);
>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>> index 75bab5b0dbae..8e83b7099977 100644
>>>> --- a/net/core/dev.c
>>>> +++ b/net/core/dev.c
>>>> @@ -3908,6 +3908,25 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
>>>>
>>>>        return skb;
>>>>    }
>>>> +
>>>> +static struct netdev_queue *
>>>> +netdev_tx_queue_mapping(struct net_device *dev, struct sk_buff *skb)
>>>> +{
>>>> +     int qm = skb_get_queue_mapping(skb);
>>>> +
>>>> +     return netdev_get_tx_queue(dev, netdev_cap_txqueue(dev, qm));
>>>> +}
>>>> +
>>>> +static bool netdev_xmit_txqueue_skipped(void)
>>>> +{
>>>> +     return __this_cpu_read(softnet_data.xmit.skip_txqueue);
>>>> +}
>>>> +
>>>> +void netdev_xmit_skip_txqueue(bool skip)
>>>> +{
>>>> +     __this_cpu_write(softnet_data.xmit.skip_txqueue, skip);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(netdev_xmit_skip_txqueue);
>>>>    #endif /* CONFIG_NET_EGRESS */
>>>>
>>>>    #ifdef CONFIG_XPS
>>>> @@ -4078,7 +4097,7 @@ struct netdev_queue *netdev_core_pick_tx(struct net_device *dev,
>>>>    static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
>>>>    {
>>>>        struct net_device *dev = skb->dev;
>>>> -     struct netdev_queue *txq;
>>>> +     struct netdev_queue *txq = NULL;
>>>>        struct Qdisc *q;
>>>>        int rc = -ENOMEM;
>>>>        bool again = false;
>>>> @@ -4106,11 +4125,17 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
>>>>                        if (!skb)
>>>>                                goto out;
>>>>                }
>>>> +
>>>> +             netdev_xmit_skip_txqueue(false);
>>>> +
>>>>                nf_skip_egress(skb, true);
>>>>                skb = sch_handle_egress(skb, &rc, dev);
>>>>                if (!skb)
>>>>                        goto out;
>>>>                nf_skip_egress(skb, false);
>>>> +
>>>> +             if (netdev_xmit_txqueue_skipped())
>>>> +                     txq = netdev_tx_queue_mapping(dev, skb);
>>>>        }
>>>>    #endif
>>>>        /* If device/qdisc don't need skb->dst, release it right now while
>>>> @@ -4121,7 +4146,9 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
>>>>        else
>>>>                skb_dst_force(skb);
>>>>
>>>> -     txq = netdev_core_pick_tx(dev, skb, sb_dev);
>>>> +     if (likely(!txq))
>>>
>>> nit: Drop likely(). If the feature is used from sch_handle_egress(), then this would always be the case.
>> Hi Daniel
>> I think in most case, we don't use skbedit queue_mapping in the
>> sch_handle_egress() , so I add likely in fast path.

Yeah, but then let branch predictor do its work ? We can still change and drop the
likely() once we add support for BPF though..

>>>> +             txq = netdev_core_pick_tx(dev, skb, sb_dev);
>>>> +
>>>>        q = rcu_dereference_bh(txq->qdisc);
>>>
>>> How will the `netdev_xmit_skip_txqueue(true)` be usable from BPF side (see bpf_convert_ctx_access() ->
>>> queue_mapping)?
>> Good questions, In other patch, I introduce the
>> bpf_netdev_skip_txqueue, so we can use netdev_xmit_skip_txqueue in bpf
>> side

Yeah, that bpf_netdev_skip_txqueue() won't fly. It's basically a helper doing quirk for
an implementation detail (aka calling netdev_xmit_skip_txqueue()). Was hoping you have
something better we could use along with the context rewrite of __sk_buff's queue_mapping,
but worst case we need to rework a bit for BPF. :/

Thanks,
Daniel

  reply	other threads:[~2022-03-18 13:36 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-14 14:15 [net-next v10 0/2] net: sched: allow user to select txqueue xiangxia.m.yue
2022-03-14 14:15 ` [net-next v10 1/2] net: sched: use queue_mapping to pick tx queue xiangxia.m.yue
2022-03-14 16:38   ` Jamal Hadi Salim
2022-03-14 21:59   ` Daniel Borkmann
2022-03-15 12:48     ` Tonghao Zhang
2022-03-17  8:20       ` Paolo Abeni
2022-03-18 13:36         ` Daniel Borkmann [this message]
2022-03-19 13:40           ` Tonghao Zhang
2022-03-14 14:15 ` [net-next v10 2/2] net: sched: support hash selecting " xiangxia.m.yue
2022-03-14 16:38   ` Jamal Hadi Salim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f61e4a34-e7e5-198b-dde6-816654775b21@iogearbox.net \
    --to=daniel@iogearbox.net \
    --cc=alobakin@pm.me \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=haokexin@gmail.com \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=jonathan.lemon@gmail.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=talalahmad@google.com \
    --cc=xiangxia.m.yue@gmail.com \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.