From: Yafang Shao <laoar.shao@gmail.com>
To: Eric Dumazet <edumazet@google.com>
Cc: Peilin Ye <yepeilin.cs@gmail.com>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	David Ahern <dsahern@kernel.org>,
	Jamal Hadi Salim <jhs@mojatatu.com>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Jiri Pirko <jiri@resnulli.us>,
	Peilin Ye <peilin.ye@bytedance.com>,
	netdev <netdev@vger.kernel.org>,
	"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Cong Wang <cong.wang@bytedance.com>,
	Stephen Hemminger <stephen@networkplumber.org>,
	Dave Taht <dave.taht@gmail.com>
Subject: Re: [PATCH RFC v2 net-next 0/5] net: Qdisc backpressure infrastructure
Date: Tue, 30 Aug 2022 10:28:01 +0800	[thread overview]
Message-ID: <CALOAHbDQJY-YeOHnLLrZxyR6Xv957qBe+JH4Mq4YQtBB9AM8zQ@mail.gmail.com> (raw)
In-Reply-To: <CANn89iJsOHK1qgudpfFW9poC4NRBZiob-ynTOuRBkuJTw6FaJw@mail.gmail.com>

On Tue, Aug 23, 2022 at 1:02 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Mon, Aug 22, 2022 at 2:10 AM Peilin Ye <yepeilin.cs@gmail.com> wrote:
> >
> > From: Peilin Ye <peilin.ye@bytedance.com>
> >
> > Hi all,
> >
> > Currently, sockets (especially UDP ones) can have a lot of their
> > packets dropped at TC egress when rate limited by shaper Qdiscs like
> > HTB.  This patch series tries to solve this by introducing a Qdisc
> > backpressure mechanism.
> >
> > RFC v1 [1] used a throttle & unthrottle approach, which introduced several
> > issues, including a thundering herd problem and a socket reference count
> > issue [2].  This RFC v2 uses a different approach to avoid those issues:
> >
> >   1. When a shaper Qdisc drops a packet that belongs to a local socket due
> >      to TC egress congestion, we make part of the socket's sndbuf
> >      temporarily unavailable, so it sends slower.
> >
> >   2. Later, when TC egress becomes idle again, we gradually recover the
> >      socket's sndbuf back to normal.  Patch 2 implements this step using a
> >      timer for UDP sockets.
> >
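
(If I am reading steps 1 and 2 correctly, the idea is roughly the following
kernel-style sketch.  This is not code from the patches; the helper names
below are invented for illustration.)

/* Step 1: a shaper Qdisc drops an skb that belongs to a local socket. */
static void qdisc_backpressure_sketch(struct sk_buff *skb)
{
	struct sock *sk = skb->sk;

	if (!sk || !sk_fullsock(sk))
		return;

	/* Make part of the send buffer unavailable (e.g. halve it, with a
	 * floor), so the socket sends more slowly. */
	WRITE_ONCE(sk->sk_sndbuf,
		   max_t(int, READ_ONCE(sk->sk_sndbuf) >> 1, SOCK_MIN_SNDBUF));
}

/* Step 2: once TC egress is idle again, a timer (for UDP, per patch 2)
 * gradually gives the send buffer back, up to its original size. */
static void udp_backpressure_recover_sketch(struct sock *sk, int orig_sndbuf)
{
	int cur = READ_ONCE(sk->sk_sndbuf);

	if (cur < orig_sndbuf)
		WRITE_ONCE(sk->sk_sndbuf, min(cur + (cur >> 2), orig_sndbuf));
}
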
> > The thundering herd problem is avoided, since we no longer wake up all
> > throttled sockets at the same time in qdisc_watchdog().  The socket
> > reference count issue is also avoided, since we no longer maintain a
> > socket list on the Qdisc.
> >
> > Performance is better than in RFC v1.  There is one concern about
> > fairness between flows for the TBF Qdisc, which could be solved by
> > using an SFQ inner Qdisc.
> >
> > Please see the individual patches for details and numbers.  Any comments,
> > suggestions would be much appreciated.  Thanks!
> >
> > [1] https://lore.kernel.org/netdev/cover.1651800598.git.peilin.ye@bytedance.com/
> > [2] https://lore.kernel.org/netdev/20220506133111.1d4bebf3@hermes.local/
> >
> > Peilin Ye (5):
> >   net: Introduce Qdisc backpressure infrastructure
> >   net/udp: Implement Qdisc backpressure algorithm
> >   net/sched: sch_tbf: Use Qdisc backpressure infrastructure
> >   net/sched: sch_htb: Use Qdisc backpressure infrastructure
> >   net/sched: sch_cbq: Use Qdisc backpressure infrastructure
> >
>
> I think the whole idea is wrong.
>
> Packet schedulers can be remote (offloaded, or on another box).
>
> The idea of going back to the socket level from a packet scheduler should
> really be a last resort.
>
> The issue of UDP sockets being able to flood a network is a tough one; I
> am not sure the core networking stack should pretend it can solve it.
>
> Note that FQ-based packet schedulers can also help already.

We encountered a similar issue when using (fq + edt-bpf) to rate-limit UDP
packets, because of the qdisc buffer limit.
If the qdisc buffer limit is too small, UDP packets will be dropped in the
qdisc layer.  But the sender doesn't know that the packets have been
dropped, so it continues to send packets, and thus more and more packets
are dropped there.  IOW, the qdisc becomes a bottleneck before the
bandwidth limit is reached.
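
A minimal user-space illustration of that lack of feedback (hypothetical
destination address, error handling trimmed, loops forever to model a
flooding sender): sendto() keeps succeeding as long as the socket's sndbuf
has room, even while a shaper qdisc on the egress device silently drops
the datagrams, so the application never learns that it should slow down.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	struct sockaddr_in dst = {
		.sin_family = AF_INET,
		.sin_port   = htons(9000),		/* arbitrary example port */
	};
	char buf[1400] = { 0 };

	inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);	/* TEST-NET-1 */

	for (;;) {
		/* Succeeds whether or not the qdisc later drops the packet. */
		if (sendto(fd, buf, sizeof(buf), 0,
			   (struct sockaddr *)&dst, sizeof(dst)) < 0)
			break;	/* rarely hit; drops at the qdisc are silent */
	}
	close(fd);
	return 0;
}
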
We work around this issue by enlarging the buffer limit and flow_limit
(the proper values can be calculated from net.ipv4.udp_mem and
net.core.wmem_default).  But obviously this is not a perfect solution,
because net.ipv4.udp_mem or net.core.wmem_default may be changed
dynamically.
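
Just to make the sizing concrete, here is a rough back-of-the-envelope
sketch of that calculation.  The numbers (wmem_default, packet size,
number of busy sockets) are assumptions for illustration only:

#include <stdio.h>

int main(void)
{
	long wmem_default = 4 * 1024 * 1024;	/* net.core.wmem_default (assumed) */
	long pkt_size     = 1400;		/* bytes per datagram (assumed) */
	long busy_socks   = 16;			/* concurrent UDP senders (assumed) */

	/* One socket can have roughly wmem_default bytes in flight, so the
	 * per-flow packet budget at the qdisc needs to be at least: */
	long flow_limit = wmem_default / pkt_size;

	/* ... and the overall qdisc limit needs to cover all busy senders: */
	long limit = flow_limit * busy_socks;

	printf("flow_limit >= %ld packets, limit >= %ld packets\n",
	       flow_limit, limit);
	return 0;
}
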
We have also thought about a solution that builds a connection between UDP
memory and the qdisc limit, but we are not sure whether it is a good idea
either.

-- 
Regards
Yafang


Thread overview: 29+ messages
2022-05-06 19:43 [PATCH RFC v1 net-next 0/4] net: Qdisc backpressure infrastructure Peilin Ye
2022-05-06 19:44 ` [PATCH RFC v1 net-next 1/4] net: Introduce " Peilin Ye
2022-05-06 20:31   ` Stephen Hemminger
2022-05-06 23:34     ` Peilin Ye
2022-05-09  7:53   ` Dave Taht
2022-05-10  2:23     ` Peilin Ye
2022-05-06 19:44 ` [PATCH RFC v1 net-next 2/4] net/sched: sch_tbf: Use " Peilin Ye
2022-05-06 19:45 ` [PATCH RFC v1 net-next 3/4] net/sched: sch_htb: " Peilin Ye
2022-05-06 19:45 ` [PATCH RFC v1 net-next 4/4] net/sched: sch_cbq: " Peilin Ye
2022-05-10  3:26 ` [PATCH RFC v1 net-next 0/4] net: " Eric Dumazet
2022-05-10 23:03   ` Peilin Ye
2022-05-10 23:27     ` Peilin Ye
2022-08-22  9:10 ` [PATCH RFC v2 net-next 0/5] " Peilin Ye
2022-08-22  9:11   ` [PATCH RFC v2 net-next 1/5] net: Introduce " Peilin Ye
2022-08-22  9:12   ` [PATCH RFC v2 net-next 2/5] net/udp: Implement Qdisc backpressure algorithm Peilin Ye
2022-08-31 10:42     ` Hillf Danton
2022-08-22  9:12   ` [PATCH RFC v2 net-next 3/5] net/sched: sch_tbf: Use Qdisc backpressure infrastructure Peilin Ye
2022-08-22  9:12   ` [PATCH RFC v2 net-next 4/5] net/sched: sch_htb: " Peilin Ye
2022-08-22  9:12   ` [PATCH RFC v2 net-next 5/5] net/sched: sch_cbq: " Peilin Ye
2022-08-22 16:17   ` [PATCH RFC v2 net-next 0/5] net: " Jakub Kicinski
2022-08-29 16:53     ` Cong Wang
2022-08-30  0:21       ` Jakub Kicinski
2022-09-19 17:00         ` Cong Wang
2022-08-22 16:22   ` Eric Dumazet
2022-08-29 16:47     ` Cong Wang
2022-08-29 16:53       ` Eric Dumazet
2022-09-19 17:06         ` Cong Wang
2022-08-30  2:28     ` Yafang Shao [this message]
2022-09-19 17:04       ` Cong Wang
