linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 net-next] net: snmp: add statistics for tcp small queue check
@ 2021-11-28  6:01 menglong8.dong
  2021-11-29 13:10 ` patchwork-bot+netdevbpf
  2021-11-29 15:57 ` Jakub Kicinski
  0 siblings, 2 replies; 8+ messages in thread
From: menglong8.dong @ 2021-11-28  6:01 UTC (permalink / raw)
  To: kuba
  Cc: davem, yoshfuji, dsahern, edumazet, imagedong, ycheng, kuniyu,
	linux-kernel, netdev

From: Menglong Dong <imagedong@tencent.com>

Once tcp small queue check failed in tcp_small_queue_check(), the
throughput of tcp will be limited, and it's hard to distinguish
whether it is out of tcp congestion control.

Add statistics of LINUX_MIB_TCPSMALLQUEUEFAILURE for this scene.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
---
v2:
- use NET_INC_STATS() instead of __NET_INC_STATS()
---
 include/uapi/linux/snmp.h | 1 +
 net/ipv4/proc.c           | 1 +
 net/ipv4/tcp_output.c     | 5 ++++-
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index 904909d020e2..e32ec6932e82 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -292,6 +292,7 @@ enum
 	LINUX_MIB_TCPDSACKIGNOREDDUBIOUS,	/* TCPDSACKIgnoredDubious */
 	LINUX_MIB_TCPMIGRATEREQSUCCESS,		/* TCPMigrateReqSuccess */
 	LINUX_MIB_TCPMIGRATEREQFAILURE,		/* TCPMigrateReqFailure */
+	LINUX_MIB_TCPSMALLQUEUEFAILURE,		/* TCPSmallQueueFailure */
 	__LINUX_MIB_MAX
 };
 
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index f30273afb539..43b7a77cd6b4 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -297,6 +297,7 @@ static const struct snmp_mib snmp4_net_list[] = {
 	SNMP_MIB_ITEM("TCPDSACKIgnoredDubious", LINUX_MIB_TCPDSACKIGNOREDDUBIOUS),
 	SNMP_MIB_ITEM("TCPMigrateReqSuccess", LINUX_MIB_TCPMIGRATEREQSUCCESS),
 	SNMP_MIB_ITEM("TCPMigrateReqFailure", LINUX_MIB_TCPMIGRATEREQFAILURE),
+	SNMP_MIB_ITEM("TCPSmallQueueFailure", LINUX_MIB_TCPSMALLQUEUEFAILURE),
 	SNMP_MIB_SENTINEL
 };
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 2e6e5a70168e..835a556a597a 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2524,8 +2524,11 @@ static bool tcp_small_queue_check(struct sock *sk, const struct sk_buff *skb,
 		 * test again the condition.
 		 */
 		smp_mb__after_atomic();
-		if (refcount_read(&sk->sk_wmem_alloc) > limit)
+		if (refcount_read(&sk->sk_wmem_alloc) > limit) {
+			NET_INC_STATS(sock_net(sk),
+				      LINUX_MIB_TCPSMALLQUEUEFAILURE);
 			return true;
+		}
 	}
 	return false;
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 net-next] net: snmp: add statistics for tcp small queue check
  2021-11-28  6:01 [PATCH v2 net-next] net: snmp: add statistics for tcp small queue check menglong8.dong
@ 2021-11-29 13:10 ` patchwork-bot+netdevbpf
  2021-11-29 15:57 ` Jakub Kicinski
  1 sibling, 0 replies; 8+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-11-29 13:10 UTC (permalink / raw)
  To: Menglong Dong
  Cc: kuba, davem, yoshfuji, dsahern, edumazet, imagedong, ycheng,
	kuniyu, linux-kernel, netdev

Hello:

This patch was applied to netdev/net-next.git (master)
by David S. Miller <davem@davemloft.net>:

On Sun, 28 Nov 2021 14:01:02 +0800 you wrote:
> From: Menglong Dong <imagedong@tencent.com>
> 
> Once tcp small queue check failed in tcp_small_queue_check(), the
> throughput of tcp will be limited, and it's hard to distinguish
> whether it is out of tcp congestion control.
> 
> Add statistics of LINUX_MIB_TCPSMALLQUEUEFAILURE for this scene.
> 
> [...]

Here is the summary with links:
  - [v2,net-next] net: snmp: add statistics for tcp small queue check
    https://git.kernel.org/netdev/net-next/c/aeeecb889165

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 net-next] net: snmp: add statistics for tcp small queue check
  2021-11-28  6:01 [PATCH v2 net-next] net: snmp: add statistics for tcp small queue check menglong8.dong
  2021-11-29 13:10 ` patchwork-bot+netdevbpf
@ 2021-11-29 15:57 ` Jakub Kicinski
  2021-11-29 16:37   ` Eric Dumazet
  2021-11-30 14:36   ` Menglong Dong
  1 sibling, 2 replies; 8+ messages in thread
From: Jakub Kicinski @ 2021-11-29 15:57 UTC (permalink / raw)
  To: menglong8.dong
  Cc: davem, yoshfuji, dsahern, edumazet, imagedong, ycheng, kuniyu,
	linux-kernel, netdev

On Sun, 28 Nov 2021 14:01:02 +0800 menglong8.dong@gmail.com wrote:
> Once tcp small queue check failed in tcp_small_queue_check(), the
> throughput of tcp will be limited, and it's hard to distinguish
> whether it is out of tcp congestion control.
> 
> Add statistics of LINUX_MIB_TCPSMALLQUEUEFAILURE for this scene.

Isn't this going to trigger all the time and alarm users because of the
"Failure" in the TCPSmallQueueFailure name?  Isn't it perfectly fine
for TCP to bake full TSQ amount of data and have it paced out onto the
wire? What's your link speed?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 net-next] net: snmp: add statistics for tcp small queue check
  2021-11-29 15:57 ` Jakub Kicinski
@ 2021-11-29 16:37   ` Eric Dumazet
  2021-11-30 14:36   ` Menglong Dong
  1 sibling, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2021-11-29 16:37 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: menglong8.dong, davem, yoshfuji, dsahern, imagedong, ycheng,
	kuniyu, linux-kernel, netdev

On Mon, Nov 29, 2021 at 7:57 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sun, 28 Nov 2021 14:01:02 +0800 menglong8.dong@gmail.com wrote:
> > Once tcp small queue check failed in tcp_small_queue_check(), the
> > throughput of tcp will be limited, and it's hard to distinguish
> > whether it is out of tcp congestion control.
> >
> > Add statistics of LINUX_MIB_TCPSMALLQUEUEFAILURE for this scene.
>
> Isn't this going to trigger all the time and alarm users because of the
> "Failure" in the TCPSmallQueueFailure name?  Isn't it perfectly fine
> for TCP to bake full TSQ amount of data and have it paced out onto the
> wire? What's your link speed?

Yes, I would be curious to have some instructions on how this new SNMP
variable can be used,
in a concrete case.

Like, how getting these SNMP values can translate to an action, giving
more throughput ?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 net-next] net: snmp: add statistics for tcp small queue check
  2021-11-29 15:57 ` Jakub Kicinski
  2021-11-29 16:37   ` Eric Dumazet
@ 2021-11-30 14:36   ` Menglong Dong
  2021-11-30 15:23     ` Jakub Kicinski
  1 sibling, 1 reply; 8+ messages in thread
From: Menglong Dong @ 2021-11-30 14:36 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David Miller, Hideaki YOSHIFUJI, dsahern, Eric Dumazet,
	Menglong Dong, Yuchung Cheng, kuniyu, LKML, netdev

On Mon, Nov 29, 2021 at 11:57 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sun, 28 Nov 2021 14:01:02 +0800 menglong8.dong@gmail.com wrote:
> > Once tcp small queue check failed in tcp_small_queue_check(), the
> > throughput of tcp will be limited, and it's hard to distinguish
> > whether it is out of tcp congestion control.
> >
> > Add statistics of LINUX_MIB_TCPSMALLQUEUEFAILURE for this scene.
>
> Isn't this going to trigger all the time and alarm users because of the
> "Failure" in the TCPSmallQueueFailure name?  Isn't it perfectly fine
> for TCP to bake full TSQ amount of data and have it paced out onto the
> wire? What's your link speed?

Well, it's a little complex. In my case, there is a guest in kvm, and virtio_net
is used with napi_tx enabled.

With napi_tx enabled, skb won't be orphaned after it is passed to virtio_net,
until it is released. The point is that the sending interrupt of
virtio_net will be
turned off and the skb can't be released until the next net_rx interrupt comes.
So, wmem_alloc can't decrease on time, and the bandwidth is limited. When
this happens, the bandwidth can decrease from 500M to 10M.

In fact, this issue of uapi_tx is fixed in this commit:
https://lore.kernel.org/lkml/20210719144949.935298466@linuxfoundation.org/

I added this statistics to monitor the sending failure (may be called
sending delay)
caused by qdisc and net_device. When something happen, maybe users can
raise ‘/proc/sys/net/ipv4/tcp_pacing_ss_ratio’ to get better bandwidth.

Thanks!
Menglong Dong

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 net-next] net: snmp: add statistics for tcp small queue check
  2021-11-30 14:36   ` Menglong Dong
@ 2021-11-30 15:23     ` Jakub Kicinski
  2021-11-30 15:55       ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2021-11-30 15:23 UTC (permalink / raw)
  To: Menglong Dong
  Cc: David Miller, Hideaki YOSHIFUJI, dsahern, Eric Dumazet,
	Menglong Dong, Yuchung Cheng, kuniyu, LKML, netdev

On Tue, 30 Nov 2021 22:36:59 +0800 Menglong Dong wrote:
> On Mon, Nov 29, 2021 at 11:57 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Sun, 28 Nov 2021 14:01:02 +0800 menglong8.dong@gmail.com wrote:  
> > > Once tcp small queue check failed in tcp_small_queue_check(), the
> > > throughput of tcp will be limited, and it's hard to distinguish
> > > whether it is out of tcp congestion control.
> > >
> > > Add statistics of LINUX_MIB_TCPSMALLQUEUEFAILURE for this scene.  
> >
> > Isn't this going to trigger all the time and alarm users because of the
> > "Failure" in the TCPSmallQueueFailure name?  Isn't it perfectly fine
> > for TCP to bake full TSQ amount of data and have it paced out onto the
> > wire? What's your link speed?  
> 
> Well, it's a little complex. In my case, there is a guest in kvm, and virtio_net
> is used with napi_tx enabled.
> 
> With napi_tx enabled, skb won't be orphaned after it is passed to virtio_net,
> until it is released. The point is that the sending interrupt of
> virtio_net will be
> turned off and the skb can't be released until the next net_rx interrupt comes.
> So, wmem_alloc can't decrease on time, and the bandwidth is limited. When
> this happens, the bandwidth can decrease from 500M to 10M.
> 
> In fact, this issue of uapi_tx is fixed in this commit:
> https://lore.kernel.org/lkml/20210719144949.935298466@linuxfoundation.org/
> 
> I added this statistics to monitor the sending failure (may be called
> sending delay)
> caused by qdisc and net_device. When something happen, maybe users can
> raise ‘/proc/sys/net/ipv4/tcp_pacing_ss_ratio’ to get better bandwidth.

Sounds very second-order and particular to a buggy driver :/
Let's see what Eric says but I vote revert.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 net-next] net: snmp: add statistics for tcp small queue check
  2021-11-30 15:23     ` Jakub Kicinski
@ 2021-11-30 15:55       ` Eric Dumazet
  2021-12-01  2:05         ` Menglong Dong
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2021-11-30 15:55 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Menglong Dong, David Miller, Hideaki YOSHIFUJI, dsahern,
	Menglong Dong, Yuchung Cheng, kuniyu, LKML, netdev

On Tue, Nov 30, 2021 at 7:23 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 30 Nov 2021 22:36:59 +0800 Menglong Dong wrote:
> > On Mon, Nov 29, 2021 at 11:57 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > > On Sun, 28 Nov 2021 14:01:02 +0800 menglong8.dong@gmail.com wrote:
> > > > Once tcp small queue check failed in tcp_small_queue_check(), the
> > > > throughput of tcp will be limited, and it's hard to distinguish
> > > > whether it is out of tcp congestion control.
> > > >
> > > > Add statistics of LINUX_MIB_TCPSMALLQUEUEFAILURE for this scene.
> > >
> > > Isn't this going to trigger all the time and alarm users because of the
> > > "Failure" in the TCPSmallQueueFailure name?  Isn't it perfectly fine
> > > for TCP to bake full TSQ amount of data and have it paced out onto the
> > > wire? What's your link speed?
> >
> > Well, it's a little complex. In my case, there is a guest in kvm, and virtio_net
> > is used with napi_tx enabled.
> >
> > With napi_tx enabled, skb won't be orphaned after it is passed to virtio_net,
> > until it is released. The point is that the sending interrupt of
> > virtio_net will be
> > turned off and the skb can't be released until the next net_rx interrupt comes.
> > So, wmem_alloc can't decrease on time, and the bandwidth is limited. When
> > this happens, the bandwidth can decrease from 500M to 10M.
> >
> > In fact, this issue of uapi_tx is fixed in this commit:
> > https://lore.kernel.org/lkml/20210719144949.935298466@linuxfoundation.org/
> >
> > I added this statistics to monitor the sending failure (may be called
> > sending delay)
> > caused by qdisc and net_device. When something happen, maybe users can
> > raise ‘/proc/sys/net/ipv4/tcp_pacing_ss_ratio’ to get better bandwidth.
>
> Sounds very second-order and particular to a buggy driver :/
> Let's see what Eric says but I vote revert.

I did some tests yesterday, using one high speed TCP_STREAM (~90Gbit),
and got plenty of increments when using pfifo_fast qdisc.
Yet seed was nominal (bottleneck is the copyout() cost at receiver)

I got few counter increments when qdisc is fq as expected
(because of commit c73e5807e4f6fc6d "tcp: tsq: no longer use
limit_output_bytes for paced flows")


So this new SNMP counter is not a proxy for the kind of problems that a buggy
driver would trigger.

I also suggest we revert this patch.

Thanks !

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 net-next] net: snmp: add statistics for tcp small queue check
  2021-11-30 15:55       ` Eric Dumazet
@ 2021-12-01  2:05         ` Menglong Dong
  0 siblings, 0 replies; 8+ messages in thread
From: Menglong Dong @ 2021-12-01  2:05 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jakub Kicinski, David Miller, Hideaki YOSHIFUJI, dsahern,
	Menglong Dong, Yuchung Cheng, kuniyu, LKML, netdev

On Tue, Nov 30, 2021 at 11:56 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Nov 30, 2021 at 7:23 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Tue, 30 Nov 2021 22:36:59 +0800 Menglong Dong wrote:
> > > On Mon, Nov 29, 2021 at 11:57 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > >
> > > > On Sun, 28 Nov 2021 14:01:02 +0800 menglong8.dong@gmail.com wrote:
> > > > > Once tcp small queue check failed in tcp_small_queue_check(), the
> > > > > throughput of tcp will be limited, and it's hard to distinguish
> > > > > whether it is out of tcp congestion control.
> > > > >
> > > > > Add statistics of LINUX_MIB_TCPSMALLQUEUEFAILURE for this scene.
> > > >
> > > > Isn't this going to trigger all the time and alarm users because of the
> > > > "Failure" in the TCPSmallQueueFailure name?  Isn't it perfectly fine
> > > > for TCP to bake full TSQ amount of data and have it paced out onto the
> > > > wire? What's your link speed?
> > >
> > > Well, it's a little complex. In my case, there is a guest in kvm, and virtio_net
> > > is used with napi_tx enabled.
> > >
> > > With napi_tx enabled, skb won't be orphaned after it is passed to virtio_net,
> > > until it is released. The point is that the sending interrupt of
> > > virtio_net will be
> > > turned off and the skb can't be released until the next net_rx interrupt comes.
> > > So, wmem_alloc can't decrease on time, and the bandwidth is limited. When
> > > this happens, the bandwidth can decrease from 500M to 10M.
> > >
> > > In fact, this issue of uapi_tx is fixed in this commit:
> > > https://lore.kernel.org/lkml/20210719144949.935298466@linuxfoundation.org/
> > >
> > > I added this statistics to monitor the sending failure (may be called
> > > sending delay)
> > > caused by qdisc and net_device. When something happen, maybe users can
> > > raise ‘/proc/sys/net/ipv4/tcp_pacing_ss_ratio’ to get better bandwidth.
> >
> > Sounds very second-order and particular to a buggy driver :/
> > Let's see what Eric says but I vote revert.
>
> I did some tests yesterday, using one high speed TCP_STREAM (~90Gbit),
> and got plenty of increments when using pfifo_fast qdisc.
> Yet seed was nominal (bottleneck is the copyout() cost at receiver)
>
> I got few counter increments when qdisc is fq as expected
> (because of commit c73e5807e4f6fc6d "tcp: tsq: no longer use
> limit_output_bytes for paced flows")
>
>
> So this new SNMP counter is not a proxy for the kind of problems that a buggy
> driver would trigger.
>
> I also suggest we revert this patch.

Seems this SNMP is indeed not suitable....I vote to revert too.

(Seems Jakub already do the revert, thanks~)

In fact, I'm a little curious that this patch is applied directly. I
used to receive
emails with the message 'applied'.

>
> Thanks !

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-12-01  2:07 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-28  6:01 [PATCH v2 net-next] net: snmp: add statistics for tcp small queue check menglong8.dong
2021-11-29 13:10 ` patchwork-bot+netdevbpf
2021-11-29 15:57 ` Jakub Kicinski
2021-11-29 16:37   ` Eric Dumazet
2021-11-30 14:36   ` Menglong Dong
2021-11-30 15:23     ` Jakub Kicinski
2021-11-30 15:55       ` Eric Dumazet
2021-12-01  2:05         ` Menglong Dong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).