All of lore.kernel.org
 help / color / mirror / Atom feed
* A data race between fanout_demux_rollover() and __fanout_unlink()
@ 2021-04-13 20:51 Gong, Sishuai
  2021-04-13 22:55 ` Xie He
  0 siblings, 1 reply; 5+ messages in thread
From: Gong, Sishuai @ 2021-04-13 20:51 UTC (permalink / raw)
  To: xie.he.0141, eyal.birger, yonatanlinik; +Cc: netdev

Hi,

We found a data race in linux-5.12-rc3 between af_packet.c functions fanout_demux_rollover() and __fanout_unlink() and we are able to reproduce it under x86. 

When the two functions are running together, __fanout_unlink() will grab a lock and modify some attribute of packet_fanout variable, but fanout_demux_rollover() may or may not see this update depending on different interleavings, as shown in below.

Currently, we didn’t find any explicit errors due to this data race. But in fanout_demux_rollover(), we noticed that the data-racing variable is involved in the later operation, which might be a concern.

------------------------------------------
Execution interleaving

Thread 1							Thread 2

__fanout_unlink()						fanout_demux_rollover()
spin_lock(&f->lock);
									po = pkt_sk(f->arr[idx]);
									// po is a out-of-date value
f->arr[i] = f->arr[f->num_members - 1];
spin_unlock(&f->lock);



Thanks,
Sishuai


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: A data race between fanout_demux_rollover() and __fanout_unlink()
  2021-04-13 20:51 A data race between fanout_demux_rollover() and __fanout_unlink() Gong, Sishuai
@ 2021-04-13 22:55 ` Xie He
  2021-04-13 23:27   ` Willem de Bruijn
  0 siblings, 1 reply; 5+ messages in thread
From: Xie He @ 2021-04-13 22:55 UTC (permalink / raw)
  To: eyal.birger, yonatanlinik, netdev, Gong, Sishuai,
	Willem de Bruijn, David S. Miller, Jakub Kicinski

On Tue, Apr 13, 2021 at 1:51 PM Gong, Sishuai <sishuai@purdue.edu> wrote:
>
> Hi,
>
> We found a data race in linux-5.12-rc3 between af_packet.c functions fanout_demux_rollover() and __fanout_unlink() and we are able to reproduce it under x86.
>
> When the two functions are running together, __fanout_unlink() will grab a lock and modify some attribute of packet_fanout variable, but fanout_demux_rollover() may or may not see this update depending on different interleavings, as shown in below.
>
> Currently, we didn’t find any explicit errors due to this data race. But in fanout_demux_rollover(), we noticed that the data-racing variable is involved in the later operation, which might be a concern.
>
> ------------------------------------------
> Execution interleaving
>
> Thread 1                                                        Thread 2
>
> __fanout_unlink()                                               fanout_demux_rollover()
> spin_lock(&f->lock);
>                                                                         po = pkt_sk(f->arr[idx]);
>                                                                         // po is a out-of-date value
> f->arr[i] = f->arr[f->num_members - 1];
> spin_unlock(&f->lock);
>
>
>
> Thanks,
> Sishuai

CC'ing more people.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: A data race between fanout_demux_rollover() and __fanout_unlink()
  2021-04-13 22:55 ` Xie He
@ 2021-04-13 23:27   ` Willem de Bruijn
  2021-04-14 16:52     ` Eric Dumazet
  0 siblings, 1 reply; 5+ messages in thread
From: Willem de Bruijn @ 2021-04-13 23:27 UTC (permalink / raw)
  To: Xie He
  Cc: eyal.birger, yonatanlinik, netdev, Gong, Sishuai,
	David S. Miller, Jakub Kicinski

On Tue, Apr 13, 2021 at 6:55 PM Xie He <xie.he.0141@gmail.com> wrote:
>
> On Tue, Apr 13, 2021 at 1:51 PM Gong, Sishuai <sishuai@purdue.edu> wrote:
> >
> > Hi,
> >
> > We found a data race in linux-5.12-rc3 between af_packet.c functions fanout_demux_rollover() and __fanout_unlink() and we are able to reproduce it under x86.
> >
> > When the two functions are running together, __fanout_unlink() will grab a lock and modify some attribute of packet_fanout variable, but fanout_demux_rollover() may or may not see this update depending on different interleavings, as shown in below.
> >
> > Currently, we didn’t find any explicit errors due to this data race. But in fanout_demux_rollover(), we noticed that the data-racing variable is involved in the later operation, which might be a concern.
> >
> > ------------------------------------------
> > Execution interleaving
> >
> > Thread 1                                                        Thread 2
> >
> > __fanout_unlink()                                               fanout_demux_rollover()
> > spin_lock(&f->lock);
> >                                                                         po = pkt_sk(f->arr[idx]);
> >                                                                         // po is a out-of-date value
> > f->arr[i] = f->arr[f->num_members - 1];
> > spin_unlock(&f->lock);
> >
> >
> >
> > Thanks,
> > Sishuai
>
> CC'ing more people.

__fanout_unlink removes a socket from the fanout group, but ensures
that the socket is not destroyed until after no datapath can refer to
it anymore, through a call to synchronize_net.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: A data race between fanout_demux_rollover() and __fanout_unlink()
  2021-04-13 23:27   ` Willem de Bruijn
@ 2021-04-14 16:52     ` Eric Dumazet
  2021-04-14 17:52       ` Eric Dumazet
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2021-04-14 16:52 UTC (permalink / raw)
  To: Willem de Bruijn, Xie He
  Cc: eyal.birger, yonatanlinik, netdev, Gong, Sishuai,
	David S. Miller, Jakub Kicinski



On 4/14/21 1:27 AM, Willem de Bruijn wrote:
> On Tue, Apr 13, 2021 at 6:55 PM Xie He <xie.he.0141@gmail.com> wrote:
>>
>> On Tue, Apr 13, 2021 at 1:51 PM Gong, Sishuai <sishuai@purdue.edu> wrote:
>>>
>>> Hi,
>>>
>>> We found a data race in linux-5.12-rc3 between af_packet.c functions fanout_demux_rollover() and __fanout_unlink() and we are able to reproduce it under x86.
>>>
>>> When the two functions are running together, __fanout_unlink() will grab a lock and modify some attribute of packet_fanout variable, but fanout_demux_rollover() may or may not see this update depending on different interleavings, as shown in below.
>>>
>>> Currently, we didn’t find any explicit errors due to this data race. But in fanout_demux_rollover(), we noticed that the data-racing variable is involved in the later operation, which might be a concern.
>>>
>>> ------------------------------------------
>>> Execution interleaving
>>>
>>> Thread 1                                                        Thread 2
>>>
>>> __fanout_unlink()                                               fanout_demux_rollover()
>>> spin_lock(&f->lock);
>>>                                                                         po = pkt_sk(f->arr[idx]);
>>>                                                                         // po is a out-of-date value
>>> f->arr[i] = f->arr[f->num_members - 1];
>>> spin_unlock(&f->lock);
>>>
>>>
>>>
>>> Thanks,
>>> Sishuai
>>
>> CC'ing more people.
> 
> __fanout_unlink removes a socket from the fanout group, but ensures
> that the socket is not destroyed until after no datapath can refer to
> it anymore, through a call to synchronize_net.
> 

Right, but there is a data race.

Compiler might implement 

f->arr[i] = f->arr[f->num_members - 1];

(And po = pkt_sk(f->arr[idx]);

Using one-byte-at-a-time load/stores, yes crazy, but oh well.

We should use READ_ONCE()/WRITE_ONCE() at very minimum,
and rcu_dereference()/rcu_assign_pointer() since we clearly rely on standard RCU rules.




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: A data race between fanout_demux_rollover() and __fanout_unlink()
  2021-04-14 16:52     ` Eric Dumazet
@ 2021-04-14 17:52       ` Eric Dumazet
  0 siblings, 0 replies; 5+ messages in thread
From: Eric Dumazet @ 2021-04-14 17:52 UTC (permalink / raw)
  To: Willem de Bruijn, Xie He
  Cc: eyal.birger, yonatanlinik, netdev, Gong, Sishuai,
	David S. Miller, Jakub Kicinski



On 4/14/21 6:52 PM, Eric Dumazet wrote:
> 
> 
> On 4/14/21 1:27 AM, Willem de Bruijn wrote:
>> On Tue, Apr 13, 2021 at 6:55 PM Xie He <xie.he.0141@gmail.com> wrote:
>>>
>>> On Tue, Apr 13, 2021 at 1:51 PM Gong, Sishuai <sishuai@purdue.edu> wrote:
>>>>
>>>> Hi,
>>>>
>>>> We found a data race in linux-5.12-rc3 between af_packet.c functions fanout_demux_rollover() and __fanout_unlink() and we are able to reproduce it under x86.
>>>>
>>>> When the two functions are running together, __fanout_unlink() will grab a lock and modify some attribute of packet_fanout variable, but fanout_demux_rollover() may or may not see this update depending on different interleavings, as shown in below.
>>>>
>>>> Currently, we didn’t find any explicit errors due to this data race. But in fanout_demux_rollover(), we noticed that the data-racing variable is involved in the later operation, which might be a concern.
>>>>
>>>> ------------------------------------------
>>>> Execution interleaving
>>>>
>>>> Thread 1                                                        Thread 2
>>>>
>>>> __fanout_unlink()                                               fanout_demux_rollover()
>>>> spin_lock(&f->lock);
>>>>                                                                         po = pkt_sk(f->arr[idx]);
>>>>                                                                         // po is a out-of-date value
>>>> f->arr[i] = f->arr[f->num_members - 1];
>>>> spin_unlock(&f->lock);
>>>>
>>>>
>>>>
>>>> Thanks,
>>>> Sishuai
>>>
>>> CC'ing more people.
>>
>> __fanout_unlink removes a socket from the fanout group, but ensures
>> that the socket is not destroyed until after no datapath can refer to
>> it anymore, through a call to synchronize_net.
>>
> 
> Right, but there is a data race.
> 
> Compiler might implement 
> 
> f->arr[i] = f->arr[f->num_members - 1];
> 
> (And po = pkt_sk(f->arr[idx]);
> 
> Using one-byte-at-a-time load/stores, yes crazy, but oh well.
> 
> We should use READ_ONCE()/WRITE_ONCE() at very minimum,
> and rcu_dereference()/rcu_assign_pointer() since we clearly rely on standard RCU rules.
> 
> 
> 

I will test something like :

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 118d585337d72f10cd31ec5ca7c55b508fc18baf..ba96db1880eae89febf77ba6ff943b054cd268d7 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1359,7 +1359,7 @@ static unsigned int fanout_demux_rollover(struct packet_fanout *f,
        struct packet_sock *po, *po_next, *po_skip = NULL;
        unsigned int i, j, room = ROOM_NONE;
 
-       po = pkt_sk(f->arr[idx]);
+       po = pkt_sk(rcu_dereference(f->arr[idx]));
 
        if (try_self) {
                room = packet_rcv_has_room(po, skb);
@@ -1371,7 +1371,7 @@ static unsigned int fanout_demux_rollover(struct packet_fanout *f,
 
        i = j = min_t(int, po->rollover->sock, num - 1);
        do {
-               po_next = pkt_sk(f->arr[i]);
+               po_next = pkt_sk(rcu_dereference(f->arr[i]));
                if (po_next != po_skip && !READ_ONCE(po_next->pressure) &&
                    packet_rcv_has_room(po_next, skb) == ROOM_NORMAL) {
                        if (i != j)
@@ -1466,7 +1466,7 @@ static int packet_rcv_fanout(struct sk_buff *skb, struct net_device *dev,
        if (fanout_has_flag(f, PACKET_FANOUT_FLAG_ROLLOVER))
                idx = fanout_demux_rollover(f, skb, idx, true, num);
 
-       po = pkt_sk(f->arr[idx]);
+       po = pkt_sk(rcu_dereference(f->arr[idx]));
        return po->prot_hook.func(skb, dev, &po->prot_hook, orig_dev);
 }
 
@@ -1480,7 +1480,7 @@ static void __fanout_link(struct sock *sk, struct packet_sock *po)
        struct packet_fanout *f = po->fanout;
 
        spin_lock(&f->lock);
-       f->arr[f->num_members] = sk;
+       rcu_assign_pointer(f->arr[f->num_members], sk);
        smp_wmb();
        f->num_members++;
        if (f->num_members == 1)
@@ -1495,11 +1495,14 @@ static void __fanout_unlink(struct sock *sk, struct packet_sock *po)
 
        spin_lock(&f->lock);
        for (i = 0; i < f->num_members; i++) {
-               if (f->arr[i] == sk)
+               if (rcu_dereference_protected(f->arr[i],
+                                             lockdep_is_held(&f->lock)) == sk)
                        break;
        }
        BUG_ON(i >= f->num_members);
-       f->arr[i] = f->arr[f->num_members - 1];
+       rcu_assign_pointer(f->arr[i],
+                          rcu_dereference_protected(f->arr[f->num_members - 1],
+                                                    lockdep_is_held(&f->lock)));
        f->num_members--;
        if (f->num_members == 0)
                __dev_remove_pack(&f->prot_hook);
diff --git a/net/packet/internal.h b/net/packet/internal.h
index 5f61e59ebbffaa25a8fdfe31f79211fe6a755c51..48af35b1aed2565267c0288e013e23ff51f2fcac 100644
--- a/net/packet/internal.h
+++ b/net/packet/internal.h
@@ -94,7 +94,7 @@ struct packet_fanout {
        spinlock_t              lock;
        refcount_t              sk_ref;
        struct packet_type      prot_hook ____cacheline_aligned_in_smp;
-       struct sock             *arr[];
+       struct sock     __rcu   *arr[];
 };
 
 struct packet_rollover {


^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-04-14 17:52 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-13 20:51 A data race between fanout_demux_rollover() and __fanout_unlink() Gong, Sishuai
2021-04-13 22:55 ` Xie He
2021-04-13 23:27   ` Willem de Bruijn
2021-04-14 16:52     ` Eric Dumazet
2021-04-14 17:52       ` Eric Dumazet

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.