linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH net] packets: Always register packet sk in the same order
@ 2019-03-14 12:24 Maxime Chevallier
  2019-03-14 15:24 ` Willem de Bruijn
  0 siblings, 1 reply; 4+ messages in thread
From: Maxime Chevallier @ 2019-03-14 12:24 UTC (permalink / raw)
  To: linux-kernel, netdev, David S . Miller
  Cc: Maxime Chevallier, Willem de Bruijn, Eric Dumazet,
	Antoine Tenart, Thomas Petazzoni

When using fanouts with AF_PACKET, the demux functions such as
fanout_demux_cpu will return an index in the fanout socket array, which
corresponds to the selected socket.

The ordering of this array depends on the order the sockets were added
to a given fanout group, so for FANOUT_CPU this means sockets are bound
to cpus in the order they are configured, which is OK.

However, when stopping then restarting the interface these sockets are
bound to, the sockets are reassigned to the fanout group in the reverse
order, due to the fact that they were inserted at the head of the
interface's AF_PACKET socket list.

This means that traffic that was directed to the first socket in the
fanout group is now directed to the last one after an interface restart.

In the case of FANOUT_CPU, traffic from CPU0 will be directed to the
socket that used to receive traffic from the last CPU after an interface
restart.

This commit introduces a helper to add a socket at the tail of a list,
then uses it to register AF_PACKET sockets.

Fixes: 808f5114a920 ("packet: convert socket list to RCU (v3)")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
---

Hi All,

I stumbled upon this issue when using FANOUT_CPU and came-up with this
patch, but I'm not sure that (a) this is really a bug (although this
behaviour is at least misleading) and (b) this is the correct fix,
so any input on this is welcome.

Also David, I'm not sure about the Fixes tag, from what I see, this
behaviour has always been there.

Thanks,

Maxime

 include/net/sock.h     | 6 ++++++
 net/packet/af_packet.c | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 328cb7cb7b0b..8de5ee258b93 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -710,6 +710,12 @@ static inline void sk_add_node_rcu(struct sock *sk, struct hlist_head *list)
 		hlist_add_head_rcu(&sk->sk_node, list);
 }
 
+static inline void sk_add_node_tail_rcu(struct sock *sk, struct hlist_head *list)
+{
+	sock_hold(sk);
+	hlist_add_tail_rcu(&sk->sk_node, list);
+}
+
 static inline void __sk_nulls_add_node_rcu(struct sock *sk, struct hlist_nulls_head *list)
 {
 	hlist_nulls_add_head_rcu(&sk->sk_nulls_node, list);
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 8376bc1c1508..8754d7c93b84 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -3243,7 +3243,7 @@ static int packet_create(struct net *net, struct socket *sock, int protocol,
 	}
 
 	mutex_lock(&net->packet.sklist_lock);
-	sk_add_node_rcu(sk, &net->packet.sklist);
+	sk_add_node_tail_rcu(sk, &net->packet.sklist);
 	mutex_unlock(&net->packet.sklist_lock);
 
 	preempt_disable();
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH net] packets: Always register packet sk in the same order
  2019-03-14 12:24 [RFC PATCH net] packets: Always register packet sk in the same order Maxime Chevallier
@ 2019-03-14 15:24 ` Willem de Bruijn
  2019-03-15  9:03   ` Maxime Chevallier
  0 siblings, 1 reply; 4+ messages in thread
From: Willem de Bruijn @ 2019-03-14 15:24 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: LKML, Network Development, David S . Miller, Willem de Bruijn,
	Eric Dumazet, Antoine Tenart, Thomas Petazzoni

On Thu, Mar 14, 2019 at 8:27 AM Maxime Chevallier
<maxime.chevallier@bootlin.com> wrote:
>
> When using fanouts with AF_PACKET, the demux functions such as
> fanout_demux_cpu will return an index in the fanout socket array, which
> corresponds to the selected socket.
>
> The ordering of this array depends on the order the sockets were added
> to a given fanout group, so for FANOUT_CPU this means sockets are bound
> to cpus in the order they are configured, which is OK.
>
> However, when stopping then restarting the interface these sockets are
> bound to, the sockets are reassigned to the fanout group in the reverse
> order, due to the fact that they were inserted at the head of the
> interface's AF_PACKET socket list.
>
> This means that traffic that was directed to the first socket in the
> fanout group is now directed to the last one after an interface restart.
>
> In the case of FANOUT_CPU, traffic from CPU0 will be directed to the
> socket that used to receive traffic from the last CPU after an interface
> restart.

The above assumes that sockets are added to a fanout group in the
order in which they are created. That is a reasonable assumption,
though not strictly required. This can be worked around in userspace
by inserting in the inverse order. Still, good to fix.

It does change the order of output of proc and diag, which is an
unfortunately side effect. But insertion order is less surprising and
I don't see a simple option to only traverse the sklist in inverse
order on packet_notifier NETDEV_UP (which would avoid this).

> This commit introduces a helper to add a socket at the tail of a list,
> then uses it to register AF_PACKET sockets.
>
> Fixes: 808f5114a920 ("packet: convert socket list to RCU (v3)")

Before this patch, insertion with sk_add_node is also at the head.
This does not seem like the right commit? This behavior has probably
existed as long as fanout.


> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
> ---
>
> Hi All,
>
> I stumbled upon this issue when using FANOUT_CPU and came-up with this
> patch, but I'm not sure that (a) this is really a bug (although this
> behaviour is at least misleading) and (b) this is the correct fix,
> so any input on this is welcome.
>
> Also David, I'm not sure about the Fixes tag, from what I see, this
> behaviour has always been there.
>
> Thanks,
>
> Maxime
>
>  include/net/sock.h     | 6 ++++++
>  net/packet/af_packet.c | 2 +-
>  2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 328cb7cb7b0b..8de5ee258b93 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -710,6 +710,12 @@ static inline void sk_add_node_rcu(struct sock *sk, struct hlist_head *list)
>                 hlist_add_head_rcu(&sk->sk_node, list);
>  }
>
> +static inline void sk_add_node_tail_rcu(struct sock *sk, struct hlist_head *list)
> +{
> +       sock_hold(sk);
> +       hlist_add_tail_rcu(&sk->sk_node, list);
> +}
> +
>  static inline void __sk_nulls_add_node_rcu(struct sock *sk, struct hlist_nulls_head *list)
>  {
>         hlist_nulls_add_head_rcu(&sk->sk_nulls_node, list);
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index 8376bc1c1508..8754d7c93b84 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -3243,7 +3243,7 @@ static int packet_create(struct net *net, struct socket *sock, int protocol,
>         }
>
>         mutex_lock(&net->packet.sklist_lock);
> -       sk_add_node_rcu(sk, &net->packet.sklist);
> +       sk_add_node_tail_rcu(sk, &net->packet.sklist);
>         mutex_unlock(&net->packet.sklist_lock);
>
>         preempt_disable();
> --
> 2.20.1
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH net] packets: Always register packet sk in the same order
  2019-03-14 15:24 ` Willem de Bruijn
@ 2019-03-15  9:03   ` Maxime Chevallier
  2019-03-15 14:13     ` Willem de Bruijn
  0 siblings, 1 reply; 4+ messages in thread
From: Maxime Chevallier @ 2019-03-15  9:03 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: LKML, Network Development, David S . Miller, Willem de Bruijn,
	Eric Dumazet, Antoine Tenart, Thomas Petazzoni

Hi Willem,

On Thu, 14 Mar 2019 11:24:54 -0400
Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:

>On Thu, Mar 14, 2019 at 8:27 AM Maxime Chevallier
><maxime.chevallier@bootlin.com> wrote:
>>
>> When using fanouts with AF_PACKET, the demux functions such as
>> fanout_demux_cpu will return an index in the fanout socket array, which
>> corresponds to the selected socket.
>>
>> The ordering of this array depends on the order the sockets were added
>> to a given fanout group, so for FANOUT_CPU this means sockets are bound
>> to cpus in the order they are configured, which is OK.
>>
>> However, when stopping then restarting the interface these sockets are
>> bound to, the sockets are reassigned to the fanout group in the reverse
>> order, due to the fact that they were inserted at the head of the
>> interface's AF_PACKET socket list.
>>
>> This means that traffic that was directed to the first socket in the
>> fanout group is now directed to the last one after an interface restart.
>>
>> In the case of FANOUT_CPU, traffic from CPU0 will be directed to the
>> socket that used to receive traffic from the last CPU after an interface
>> restart.  
>
>The above assumes that sockets are added to a fanout group in the
>order in which they are created. That is a reasonable assumption,
>though not strictly required. This can be worked around in userspace
>by inserting in the inverse order. Still, good to fix.
>
>It does change the order of output of proc and diag, which is an
>unfortunately side effect. But insertion order is less surprising and
>I don't see a simple option to only traverse the sklist in inverse
>order on packet_notifier NETDEV_UP (which would avoid this).

Thanks for the review ! Clearly the solution proposed by this patch has
side-effects, and as you said I don't see an easy way to do that in
another manner.

>> This commit introduces a helper to add a socket at the tail of a list,
>> then uses it to register AF_PACKET sockets.
>>
>> Fixes: 808f5114a920 ("packet: convert socket list to RCU (v3)")  
>
>Before this patch, insertion with sk_add_node is also at the head.
>This does not seem like the right commit? This behavior has probably
>existed as long as fanout.

Yes I agree, I'm not even sure if this patch should be targeted to -net
or -net-next. If this is OK after reviews, I can resend it without RFC
and without this misleading Fixes tag.

Thanks,

Maxime

>
>> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
>> ---
>>
>> Hi All,
>>
>> I stumbled upon this issue when using FANOUT_CPU and came-up with this
>> patch, but I'm not sure that (a) this is really a bug (although this
>> behaviour is at least misleading) and (b) this is the correct fix,
>> so any input on this is welcome.
>>
>> Also David, I'm not sure about the Fixes tag, from what I see, this
>> behaviour has always been there.
>>
>> Thanks,
>>
>> Maxime
>>
>>  include/net/sock.h     | 6 ++++++
>>  net/packet/af_packet.c | 2 +-
>>  2 files changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/net/sock.h b/include/net/sock.h
>> index 328cb7cb7b0b..8de5ee258b93 100644
>> --- a/include/net/sock.h
>> +++ b/include/net/sock.h
>> @@ -710,6 +710,12 @@ static inline void sk_add_node_rcu(struct sock *sk, struct hlist_head *list)
>>                 hlist_add_head_rcu(&sk->sk_node, list);
>>  }
>>
>> +static inline void sk_add_node_tail_rcu(struct sock *sk, struct hlist_head *list)
>> +{
>> +       sock_hold(sk);
>> +       hlist_add_tail_rcu(&sk->sk_node, list);
>> +}
>> +
>>  static inline void __sk_nulls_add_node_rcu(struct sock *sk, struct hlist_nulls_head *list)
>>  {
>>         hlist_nulls_add_head_rcu(&sk->sk_nulls_node, list);
>> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
>> index 8376bc1c1508..8754d7c93b84 100644
>> --- a/net/packet/af_packet.c
>> +++ b/net/packet/af_packet.c
>> @@ -3243,7 +3243,7 @@ static int packet_create(struct net *net, struct socket *sock, int protocol,
>>         }
>>
>>         mutex_lock(&net->packet.sklist_lock);
>> -       sk_add_node_rcu(sk, &net->packet.sklist);
>> +       sk_add_node_tail_rcu(sk, &net->packet.sklist);
>>         mutex_unlock(&net->packet.sklist_lock);
>>
>>         preempt_disable();
>> --
>> 2.20.1
>>  



-- 
Maxime Chevallier, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH net] packets: Always register packet sk in the same order
  2019-03-15  9:03   ` Maxime Chevallier
@ 2019-03-15 14:13     ` Willem de Bruijn
  0 siblings, 0 replies; 4+ messages in thread
From: Willem de Bruijn @ 2019-03-15 14:13 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: LKML, Network Development, David S . Miller, Willem de Bruijn,
	Eric Dumazet, Antoine Tenart, Thomas Petazzoni

On Fri, Mar 15, 2019 at 5:03 AM Maxime Chevallier
<maxime.chevallier@bootlin.com> wrote:
>
> Hi Willem,
>
> On Thu, 14 Mar 2019 11:24:54 -0400
> Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
>
> >On Thu, Mar 14, 2019 at 8:27 AM Maxime Chevallier
> ><maxime.chevallier@bootlin.com> wrote:
> >>
> >> When using fanouts with AF_PACKET, the demux functions such as
> >> fanout_demux_cpu will return an index in the fanout socket array, which
> >> corresponds to the selected socket.
> >>
> >> The ordering of this array depends on the order the sockets were added
> >> to a given fanout group, so for FANOUT_CPU this means sockets are bound
> >> to cpus in the order they are configured, which is OK.
> >>
> >> However, when stopping then restarting the interface these sockets are
> >> bound to, the sockets are reassigned to the fanout group in the reverse
> >> order, due to the fact that they were inserted at the head of the
> >> interface's AF_PACKET socket list.
> >>
> >> This means that traffic that was directed to the first socket in the
> >> fanout group is now directed to the last one after an interface restart.
> >>
> >> In the case of FANOUT_CPU, traffic from CPU0 will be directed to the
> >> socket that used to receive traffic from the last CPU after an interface
> >> restart.
> >
> >The above assumes that sockets are added to a fanout group in the
> >order in which they are created. That is a reasonable assumption,
> >though not strictly required. This can be worked around in userspace
> >by inserting in the inverse order. Still, good to fix.
> >
> >It does change the order of output of proc and diag, which is an
> >unfortunately side effect. But insertion order is less surprising and
> >I don't see a simple option to only traverse the sklist in inverse
> >order on packet_notifier NETDEV_UP (which would avoid this).
>
> Thanks for the review ! Clearly the solution proposed by this patch has
> side-effects, and as you said I don't see an easy way to do that in
> another manner.
>
> >> This commit introduces a helper to add a socket at the tail of a list,
> >> then uses it to register AF_PACKET sockets.
> >>
> >> Fixes: 808f5114a920 ("packet: convert socket list to RCU (v3)")
> >
> >Before this patch, insertion with sk_add_node is also at the head.
> >This does not seem like the right commit? This behavior has probably
> >existed as long as fanout.
>
> Yes I agree, I'm not even sure if this patch should be targeted to -net
> or -net-next. If this is OK after reviews, I can resend it without RFC
> and without this misleading Fixes tag.

Sounds good.

Since current behavior can split packets of the same flow across
multiple sockets, even for fanout_demux_hash, I think this warrants
net. Then the Fixes tag would be dc99f600698d ("packet: Add fanout
support.").

Please mention the reordering of proc and diag output in the commit message.

Thanks

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-03-15 14:14 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-14 12:24 [RFC PATCH net] packets: Always register packet sk in the same order Maxime Chevallier
2019-03-14 15:24 ` Willem de Bruijn
2019-03-15  9:03   ` Maxime Chevallier
2019-03-15 14:13     ` Willem de Bruijn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).