All of lore.kernel.org
 help / color / mirror / Atom feed
* A bug in team driver
@ 2016-10-04 22:30 Alex Sidorenko
  2016-10-05  3:08 ` Eric Dumazet
  0 siblings, 1 reply; 5+ messages in thread
From: Alex Sidorenko @ 2016-10-04 22:30 UTC (permalink / raw)
  To: netdev

The problem was found on RHEL7.2 but is still present in the latest upstream kernel (according to visual sources inspection).

While using roundrobin runner we have noticed that after sending on team0 about 2.1 billion packets we started seeing 50% packet drop on team0 
(according to 'netstat -i'). This number suggested 'signed int' overflow and indeed, inspecting the sources I have noticed the following in 

drivers/net/team/team_mode_roundrobin.c
---------------------------------------
struct rr_priv {
	unsigned int sent_packets;                        <--------- unsigned int
};

static struct rr_priv *rr_priv(struct team *team)
{
	return (struct rr_priv *) &team->mode_priv;
}

static bool rr_transmit(struct team *team, struct sk_buff *skb)
{
	struct team_port *port;
	int port_index;

	port_index = team_num_to_port_index(team,
					    rr_priv(team)->sent_packets++);
---


we have 'unsigned int sent_packets' but we call team_num_to_port_index where 'num' is 'int'

include/linux/if_team.h
-----------------------
static inline int team_num_to_port_index(struct team *team, int num)    <-- signed int
{
	int en_port_count = ACCESS_ONCE(team->en_port_count);

	if (unlikely(!en_port_count))
		return 0;
	return num % en_port_count;
}


As soon as sent_packets becomes larger than MAXINT (=2**31-1), team_num_to_port_index() can return negative number as num becomes negative and remainder 
(num % en_port_count) is either 0 or negative. This leads to looking up incorrect hash-bucket and dropping packets.

We have easily duplicated this in roundrobin mode with two ports. After reaching 2**31 packets sent on team0 every second packet was dropped.

Rebuilding the kernel after changing 

team_num_to_port_index(struct team *team, int num) -> team_num_to_port_index(struct team *team, unsigned int num)

and running the test again does not show packet drop anymore.

The same subroutine is used in team_mode_loadbalance.c:lb_hash_select_tx_port but we pass 'unsigned char hash' to team_num_to_port_index(), so there should be no overflow. I did not test that mode in my tests.

Regards,
Alex



-- 

------------------------------------------------------------------
Alex Sidorenko	email: asid@hpe.com
ERT  Linux 	Hewlett-Packard Enterprise (Canada)
------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: A bug in team driver
  2016-10-04 22:30 A bug in team driver Alex Sidorenko
@ 2016-10-05  3:08 ` Eric Dumazet
  2016-10-05 13:06   ` [PATCH net] Fixing a bug in team driver due to incorrect 'unsigned int' to 'int' conversion Alex Sidorenko
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2016-10-05  3:08 UTC (permalink / raw)
  To: Alex Sidorenko; +Cc: netdev

On Tue, 2016-10-04 at 18:30 -0400, Alex Sidorenko wrote:
> The problem was found on RHEL7.2 but is still present in the latest
> upstream kernel (according to visual sources inspection).
> 
> While using roundrobin runner we have noticed that after sending on
> team0 about 2.1 billion packets we started seeing 50% packet drop on
> team0 
> (according to 'netstat -i'). This number suggested 'signed int'
> overflow and indeed, inspecting the sources I have noticed the
> following in 
> 
> drivers/net/team/team_mode_roundrobin.c
> ---------------------------------------
> struct rr_priv {
> 	unsigned int sent_packets;                        <--------- unsigned
> int
> };
> 
> static struct rr_priv *rr_priv(struct team *team)
> {
> 	return (struct rr_priv *) &team->mode_priv;
> }
> 
> static bool rr_transmit(struct team *team, struct sk_buff *skb)
> {
> 	struct team_port *port;
> 	int port_index;
> 
> 	port_index = team_num_to_port_index(team,
> 					    rr_priv(team)->sent_packets++);
> ---
> 
> 
> we have 'unsigned int sent_packets' but we call team_num_to_port_index
> where 'num' is 'int'
> 
> include/linux/if_team.h
> -----------------------
> static inline int team_num_to_port_index(struct team *team, int num)
> <-- signed int
> {
> 	int en_port_count = ACCESS_ONCE(team->en_port_count);
> 
> 	if (unlikely(!en_port_count))
> 		return 0;
> 	return num % en_port_count;
> }
> 
> 
> As soon as sent_packets becomes larger than MAXINT (=2**31-1),
> team_num_to_port_index() can return negative number as num becomes
> negative and remainder 
> (num % en_port_count) is either 0 or negative. This leads to looking
> up incorrect hash-bucket and dropping packets.
> 
> We have easily duplicated this in roundrobin mode with two ports.
> After reaching 2**31 packets sent on team0 every second packet was
> dropped.
> 
> Rebuilding the kernel after changing 
> 
> team_num_to_port_index(struct team *team, int num) ->
> team_num_to_port_index(struct team *team, unsigned int num)
> 
> and running the test again does not show packet drop anymore.
> 
> The same subroutine is used in
> team_mode_loadbalance.c:lb_hash_select_tx_port but we pass 'unsigned
> char hash' to team_num_to_port_index(), so there should be no
> overflow. I did not test that mode in my tests.

Good catch ! Can you send an official patch to fix this ?

Thanks.

> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH net] Fixing a bug in team driver due to incorrect 'unsigned  int' to 'int' conversion
  2016-10-05  3:08 ` Eric Dumazet
@ 2016-10-05 13:06   ` Alex Sidorenko
  2016-10-05 19:37     ` Eric Dumazet
  2016-10-06  5:19     ` David Miller
  0 siblings, 2 replies; 5+ messages in thread
From: Alex Sidorenko @ 2016-10-05 13:06 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

Roundrobin runner of team driver uses 'unsigned int' variable to count the number of sent_packets.
Later it is passed to a subroutine team_num_to_port_index(struct team *team, int num) as
'num' and when we reach MAXINT (2**31-1), 'num' becomes negative.

This leads to using incorrect hash-bucket for port lookup and as a result, packets are dropped. The fix
consists of changing 'int num' to 'unsigned int num'. Testing of a fixed kernel shows that there
is no packet drop anymore.


Signed-off-by: Alex Sidorenko <alexandre.sidorenko@hpe.com>

---
 include/linux/if_team.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/if_team.h b/include/linux/if_team.h
index 174f43f..c05216a 100644
--- a/include/linux/if_team.h
+++ b/include/linux/if_team.h
@@ -245,7 +245,7 @@ static inline struct team_port *team_get_port_by_index(struct team *team,
        return NULL;
 }
 
-static inline int team_num_to_port_index(struct team *team, int num)
+static inline int team_num_to_port_index(struct team *team, unsigned int num)
 {
        int en_port_count = ACCESS_ONCE(team->en_port_count);
 
-- 
2.7.4
 


-- 

------------------------------------------------------------------
Alex Sidorenko	email: asid@hpe.com
ERT  Linux 	Hewlett-Packard Enterprise (Canada)
------------------------------------------------------------------

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH net] Fixing a bug in team driver due to incorrect 'unsigned  int' to 'int' conversion
  2016-10-05 13:06   ` [PATCH net] Fixing a bug in team driver due to incorrect 'unsigned int' to 'int' conversion Alex Sidorenko
@ 2016-10-05 19:37     ` Eric Dumazet
  2016-10-06  5:19     ` David Miller
  1 sibling, 0 replies; 5+ messages in thread
From: Eric Dumazet @ 2016-10-05 19:37 UTC (permalink / raw)
  To: Alex Sidorenko; +Cc: netdev

On Wed, 2016-10-05 at 09:06 -0400, Alex Sidorenko wrote:
> Roundrobin runner of team driver uses 'unsigned int' variable to count the number of sent_packets.
> Later it is passed to a subroutine team_num_to_port_index(struct team *team, int num) as
> 'num' and when we reach MAXINT (2**31-1), 'num' becomes negative.
> 
> This leads to using incorrect hash-bucket for port lookup and as a result, packets are dropped. The fix
> consists of changing 'int num' to 'unsigned int num'. Testing of a fixed kernel shows that there
> is no packet drop anymore.
> 
> 
> Signed-off-by: Alex Sidorenko <alexandre.sidorenko@hpe.com>

Note that lines in your changelog are longer than the norm 
( Documentation/SubmittingPatches around line 619 )

 - The body of the explanation, line wrapped at 75 columns, which will
   be copied to the permanent changelog to describe this patch.

Otherwise, patch looks, welcome to the club Alex !

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net] Fixing a bug in team driver due to incorrect 'unsigned int' to 'int' conversion
  2016-10-05 13:06   ` [PATCH net] Fixing a bug in team driver due to incorrect 'unsigned int' to 'int' conversion Alex Sidorenko
  2016-10-05 19:37     ` Eric Dumazet
@ 2016-10-06  5:19     ` David Miller
  1 sibling, 0 replies; 5+ messages in thread
From: David Miller @ 2016-10-06  5:19 UTC (permalink / raw)
  To: alexandre.sidorenko; +Cc: eric.dumazet, netdev

From: Alex Sidorenko <alexandre.sidorenko@hpe.com>
Date: Wed, 05 Oct 2016 09:06:04 -0400

> Roundrobin runner of team driver uses 'unsigned int' variable to count the number of sent_packets.
> Later it is passed to a subroutine team_num_to_port_index(struct team *team, int num) as
> 'num' and when we reach MAXINT (2**31-1), 'num' becomes negative.
> 
> This leads to using incorrect hash-bucket for port lookup and as a result, packets are dropped. The fix
> consists of changing 'int num' to 'unsigned int num'. Testing of a fixed kernel shows that there
> is no packet drop anymore.
> 
> 
> Signed-off-by: Alex Sidorenko <alexandre.sidorenko@hpe.com>

This patch has been corrupted by your email client, for example it
has transformed TAB charactes into spaces.

Please fix this up, email a test patch to yourself, and only resubmit
this patch to the mailing list when you are able to successfully apply
the test patch you send to yourself.

Thanks.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-10-06  5:19 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-04 22:30 A bug in team driver Alex Sidorenko
2016-10-05  3:08 ` Eric Dumazet
2016-10-05 13:06   ` [PATCH net] Fixing a bug in team driver due to incorrect 'unsigned int' to 'int' conversion Alex Sidorenko
2016-10-05 19:37     ` Eric Dumazet
2016-10-06  5:19     ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.