* Re: [PATCH] net: fix a data race when get vlan device
@ 2021-04-17 12:33 zhudi (J)
2021-04-19 2:26 ` Yunsheng Lin
0 siblings, 1 reply; 7+ messages in thread
From: zhudi (J) @ 2021-04-17 12:33 UTC (permalink / raw)
To: linyunsheng, davem, kuba; +Cc: netdev, Chenxiang (EulerOS)
> On 2021/4/16 11:27, zhudi (J) wrote:
> >> dependencyOn 2021/4/15 11:35, zhudi wrote:
> >>> From: Di Zhu <zhudi21@huawei.com>
> >>>
> >>> We encountered a crash: in the packet receiving process, we got an
> >>> illegal VLAN device address, but the VLAN device address saved in
> >>> vmcore is correct. After checking the code, we found a possible data
> >>> competition:
> >>> CPU 0: CPU 1:
> >>> (RCU read lock) (RTNL lock)
> >>> vlan_do_receive() register_vlan_dev()
> >>> vlan_find_dev()
> >>>
> >>> ->__vlan_group_get_device() ->vlan_group_prealloc_vid()
> >>>
> >>> In vlan_group_prealloc_vid(), We need to make sure that kzalloc is
> >>> executed before assigning a value to vlan devices array, otherwise we
> >>
> >> As my understanding, there is a dependency between calling kzalloc() and
> >> assigning the address(returned from kzalloc()) to vg->vlan_devices_arrays,
> >> CPU and compiler can see the dependency, why can't it handling the
> >> dependency before adding the smp_wmb()?
> >>
> >> See CONTROL DEPENDENCIES section in Documentation/memory-
> >> barriers.txt:
> >>
> >> However, stores are not speculated. This means that ordering -is-
> provided
> >> for load-store control dependencies, as in the following example:
> >>
> >> q = READ_ONCE(a);
> >> if (q) {
> >> WRITE_ONCE(b, 1);
> >> }
> >>
> >
> > Maybe I didn't make it clear. This memory isolation is to ensure the order
> of
> > memset(object, 0, size) in kzalloc() operations and the subsequent array
> assignment statements.
> >
> > kzalloc()
> > ->memset(object, 0, size)
> >
> > smp_wmb()
> >
> > vg->vlan_devices_arrays[pidx][vidx] = array;
> >
> > Because __vlan_group_get_device() function depends on this order
>
> Thanks for clarify, it would be good to mention this in the
> commit log too.
OK, I'll change it. Thank you for your advice.
>
> Also, __vlan_group_get_device() is used in the data path, it would
> be to avoid the barrier op too. Maybe using rcu to avoid the barrier
> if the __vlan_group_get_device() is already protected by rcu_lock.
Using the netperf command for testing on x86, there is no difference in performance:
# netperf -H 112.113.0.12 -l 20 -t TCP_STREAM
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 112.113.0.12 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
131072 16384 16384 20.00 9386.03
After patch:
# netperf -H 112.113.0.12 -l 20 -t TCP_STREAM
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 112.113.0.12 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
131072 16384 16384 20.00 9386.41
The same is true for UDP stream test
>
> >
> >>
> >>
> >>> may get a wrong address from the hardware cache on another cpu.
> >>>
> >>> So fix it by adding memory barrier instruction to ensure the order of
> >>> memory operations.
> >>>
> >>> Signed-off-by: Di Zhu <zhudi21@huawei.com>
> >>> ---
> >>> net/8021q/vlan.c | 2 ++
> >>> net/8021q/vlan.h | 3 +++
> >>> 2 files changed, 5 insertions(+)
> >>>
> >>> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c index
> >>> 8b644113715e..4f541e05cd3f 100644
> >>> --- a/net/8021q/vlan.c
> >>> +++ b/net/8021q/vlan.c
> >>> @@ -71,6 +71,8 @@ static int vlan_group_prealloc_vid(struct
> vlan_group
> >> *vg,
> >>> if (array == NULL)
> >>> return -ENOBUFS;
> >>>
> >>> + smp_wmb();
> >>> +
> >>> vg->vlan_devices_arrays[pidx][vidx] = array;
> >>> return 0;
> >>> }
> >>> diff --git a/net/8021q/vlan.h b/net/8021q/vlan.h index
> >>> 953405362795..7408fda084d3 100644
> >>> --- a/net/8021q/vlan.h
> >>> +++ b/net/8021q/vlan.h
> >>> @@ -57,6 +57,9 @@ static inline struct net_device
> >>> *__vlan_group_get_device(struct vlan_group *vg,
> >>>
> >>> array = vg->vlan_devices_arrays[pidx]
> >>> [vlan_id /
> >> VLAN_GROUP_ARRAY_PART_LEN];
> >>> +
> >>> + smp_rmb();
> >>> +
> >>> return array ? array[vlan_id % VLAN_GROUP_ARRAY_PART_LEN] :
> >> NULL; }
> >>>
> >>>
> >
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] net: fix a data race when get vlan device
2021-04-17 12:33 [PATCH] net: fix a data race when get vlan device zhudi (J)
@ 2021-04-19 2:26 ` Yunsheng Lin
0 siblings, 0 replies; 7+ messages in thread
From: Yunsheng Lin @ 2021-04-19 2:26 UTC (permalink / raw)
To: zhudi (J), davem, kuba; +Cc: netdev, Chenxiang (EulerOS)
On 2021/4/17 20:33, zhudi (J) wrote:
>> On 2021/4/16 11:27, zhudi (J) wrote:
>>>> dependencyOn 2021/4/15 11:35, zhudi wrote:
>>>>> From: Di Zhu <zhudi21@huawei.com>
>>>>>
>>>>> We encountered a crash: in the packet receiving process, we got an
>>>>> illegal VLAN device address, but the VLAN device address saved in
>>>>> vmcore is correct. After checking the code, we found a possible data
>>>>> competition:
>>>>> CPU 0: CPU 1:
>>>>> (RCU read lock) (RTNL lock)
>>>>> vlan_do_receive() register_vlan_dev()
>>>>> vlan_find_dev()
>>>>>
>>>>> ->__vlan_group_get_device() ->vlan_group_prealloc_vid()
>>>>>
>>>>> In vlan_group_prealloc_vid(), We need to make sure that kzalloc is
>>>>> executed before assigning a value to vlan devices array, otherwise we
>>>>
>>>> As my understanding, there is a dependency between calling kzalloc() and
>>>> assigning the address(returned from kzalloc()) to vg->vlan_devices_arrays,
>>>> CPU and compiler can see the dependency, why can't it handling the
>>>> dependency before adding the smp_wmb()?
>>>>
>>>> See CONTROL DEPENDENCIES section in Documentation/memory-
>>>> barriers.txt:
>>>>
>>>> However, stores are not speculated. This means that ordering -is-
>> provided
>>>> for load-store control dependencies, as in the following example:
>>>>
>>>> q = READ_ONCE(a);
>>>> if (q) {
>>>> WRITE_ONCE(b, 1);
>>>> }
>>>>
>>>
>>> Maybe I didn't make it clear. This memory isolation is to ensure the order
>> of
>>> memset(object, 0, size) in kzalloc() operations and the subsequent array
>> assignment statements.
>>>
>>> kzalloc()
>>> ->memset(object, 0, size)
>>>
>>> smp_wmb()
>>>
>>> vg->vlan_devices_arrays[pidx][vidx] = array;
>>>
>>> Because __vlan_group_get_device() function depends on this order
>>
>
>> Thanks for clarify, it would be good to mention this in the
>> commit log too.
>
> OK, I'll change it. Thank you for your advice.
>
>>
>> Also, __vlan_group_get_device() is used in the data path, it would
>> be to avoid the barrier op too. Maybe using rcu to avoid the barrier
>> if the __vlan_group_get_device() is already protected by rcu_lock.
>
> Using the netperf command for testing on x86, there is no difference in performance:
This may make sense for x86 because x86 has a strong order memory
model, which has smp_rmb() as compiler barrier, as my understanding.
How about the weak order memory model CPU? such as arm64, which has
smp_rmb() as 'dmb' instruction.
Also the cpu usage may need to be looked at if data rate is
at line speed.
>
> # netperf -H 112.113.0.12 -l 20 -t TCP_STREAM
> MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 112.113.0.12 () port 0 AF_INET
> Recv Send Send
> Socket Socket Message Elapsed
> Size Size Size Time Throughput
> bytes bytes bytes secs. 10^6bits/sec
>
> 131072 16384 16384 20.00 9386.03
>
> After patch:
>
> # netperf -H 112.113.0.12 -l 20 -t TCP_STREAM
> MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 112.113.0.12 () port 0 AF_INET
> Recv Send Send
> Socket Socket Message Elapsed
> Size Size Size Time Throughput
> bytes bytes bytes secs. 10^6bits/sec
>
> 131072 16384 16384 20.00 9386.41
>
> The same is true for UDP stream test
>
>>
>>>
>>>>
>>>>
>>>>> may get a wrong address from the hardware cache on another cpu.
>>>>>
>>>>> So fix it by adding memory barrier instruction to ensure the order of
>>>>> memory operations.
>>>>>
>>>>> Signed-off-by: Di Zhu <zhudi21@huawei.com>
>>>>> ---
>>>>> net/8021q/vlan.c | 2 ++
>>>>> net/8021q/vlan.h | 3 +++
>>>>> 2 files changed, 5 insertions(+)
>>>>>
>>>>> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c index
>>>>> 8b644113715e..4f541e05cd3f 100644
>>>>> --- a/net/8021q/vlan.c
>>>>> +++ b/net/8021q/vlan.c
>>>>> @@ -71,6 +71,8 @@ static int vlan_group_prealloc_vid(struct
>> vlan_group
>>>> *vg,
>>>>> if (array == NULL)
>>>>> return -ENOBUFS;
>>>>>
>>>>> + smp_wmb();
>>>>> +
>>>>> vg->vlan_devices_arrays[pidx][vidx] = array;
>>>>> return 0;
>>>>> }
>>>>> diff --git a/net/8021q/vlan.h b/net/8021q/vlan.h index
>>>>> 953405362795..7408fda084d3 100644
>>>>> --- a/net/8021q/vlan.h
>>>>> +++ b/net/8021q/vlan.h
>>>>> @@ -57,6 +57,9 @@ static inline struct net_device
>>>>> *__vlan_group_get_device(struct vlan_group *vg,
>>>>>
>>>>> array = vg->vlan_devices_arrays[pidx]
>>>>> [vlan_id /
>>>> VLAN_GROUP_ARRAY_PART_LEN];
>>>>> +
>>>>> + smp_rmb();
>>>>> +
>>>>> return array ? array[vlan_id % VLAN_GROUP_ARRAY_PART_LEN] :
>>>> NULL; }
>>>>>
>>>>>
>>>
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] net: fix a data race when get vlan device
@ 2021-04-19 13:13 zhudi (J)
0 siblings, 0 replies; 7+ messages in thread
From: zhudi (J) @ 2021-04-19 13:13 UTC (permalink / raw)
To: linyunsheng, davem, kuba; +Cc: netdev, Chenxiang (EulerOS)
> On 2021/4/17 20:33, zhudi (J) wrote:
> >> On 2021/4/16 11:27, zhudi (J) wrote:
> >>>> dependencyOn 2021/4/15 11:35, zhudi wrote:
> >>>>> From: Di Zhu <zhudi21@huawei.com>
> >>>>>
> >>>>> We encountered a crash: in the packet receiving process, we got an
> >>>>> illegal VLAN device address, but the VLAN device address saved in
> >>>>> vmcore is correct. After checking the code, we found a possible
> >>>>> data
> >>>>> competition:
> >>>>> CPU 0: CPU 1:
> >>>>> (RCU read lock) (RTNL lock)
> >>>>> vlan_do_receive() register_vlan_dev()
> >>>>> vlan_find_dev()
> >>>>>
> >>>>> ->__vlan_group_get_device() ->vlan_group_prealloc_vid()
> >>>>>
> >>>>> In vlan_group_prealloc_vid(), We need to make sure that kzalloc is
> >>>>> executed before assigning a value to vlan devices array, otherwise
> >>>>> we
> >>>>
> >>>> As my understanding, there is a dependency between calling
> >>>> kzalloc() and assigning the address(returned from kzalloc()) to
> >>>> vg->vlan_devices_arrays, CPU and compiler can see the dependency,
> >>>> why can't it handling the dependency before adding the smp_wmb()?
> >>>>
> >>>> See CONTROL DEPENDENCIES section in Documentation/memory-
> >>>> barriers.txt:
> >>>>
> >>>> However, stores are not speculated. This means that ordering -is-
> >> provided
> >>>> for load-store control dependencies, as in the following example:
> >>>>
> >>>> q = READ_ONCE(a);
> >>>> if (q) {
> >>>> WRITE_ONCE(b, 1);
> >>>> }
> >>>>
> >>>
> >>> Maybe I didn't make it clear. This memory isolation is to ensure
> >>> the order
> >> of
> >>> memset(object, 0, size) in kzalloc() operations and the subsequent
> >>> array
> >> assignment statements.
> >>>
> >>> kzalloc()
> >>> ->memset(object, 0, size)
> >>>
> >>> smp_wmb()
> >>>
> >>> vg->vlan_devices_arrays[pidx][vidx] = array;
> >>>
> >>> Because __vlan_group_get_device() function depends on this order
> >>
> >
> >> Thanks for clarify, it would be good to mention this in the commit
> >> log too.
> >
> > OK, I'll change it. Thank you for your advice.
> >
> >>
> >> Also, __vlan_group_get_device() is used in the data path, it would be
> >> to avoid the barrier op too. Maybe using rcu to avoid the barrier if
> >> the __vlan_group_get_device() is already protected by rcu_lock.
> >
> > Using the netperf command for testing on x86, there is no difference in
> performance:
>
> This may make sense for x86 because x86 has a strong order memory model,
> which has smp_rmb() as compiler barrier, as my understanding.
>
> How about the weak order memory model CPU? such as arm64, which has
> smp_rmb() as 'dmb' instruction.
The test result on Arm is the same as that on X86, no matter whether it is patched or not,
there is basically no difference in performance.
The smp_rmb() semantically does not ensure the completion of any of the
memory accesses for which it ensures relative order, So the performance impact
should be minimal
> Also the cpu usage may need to be looked at if data rate is at line speed.
>
> >
> > # netperf -H 112.113.0.12 -l 20 -t TCP_STREAM MIGRATED TCP STREAM
> TEST
> > from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 112.113.0.12 () port 0 AF_INET
> > Recv Send Send
> > Socket Socket Message Elapsed
> > Size Size Size Time Throughput
> > bytes bytes bytes secs. 10^6bits/sec
> >
> > 131072 16384 16384 20.00 9386.03
> >
> > After patch:
> >
> > # netperf -H 112.113.0.12 -l 20 -t TCP_STREAM MIGRATED TCP STREAM
> > TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 112.113.0.12 () port 0 AF_INET
> > Recv Send Send
> > Socket Socket Message Elapsed
> > Size Size Size Time Throughput
> > bytes bytes bytes secs. 10^6bits/sec
> >
> > 131072 16384 16384 20.00 9386.41
> >
> > The same is true for UDP stream test
> >
> >>
> >>>
> >>>>
> >>>>
> >>>>> may get a wrong address from the hardware cache on another cpu.
> >>>>>
> >>>>> So fix it by adding memory barrier instruction to ensure the order
> >>>>> of memory operations.
> >>>>>
> >>>>> Signed-off-by: Di Zhu <zhudi21@huawei.com>
> >>>>> ---
> >>>>> net/8021q/vlan.c | 2 ++
> >>>>> net/8021q/vlan.h | 3 +++
> >>>>> 2 files changed, 5 insertions(+)
> >>>>>
> >>>>> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c index
> >>>>> 8b644113715e..4f541e05cd3f 100644
> >>>>> --- a/net/8021q/vlan.c
> >>>>> +++ b/net/8021q/vlan.c
> >>>>> @@ -71,6 +71,8 @@ static int vlan_group_prealloc_vid(struct
> >> vlan_group
> >>>> *vg,
> >>>>> if (array == NULL)
> >>>>> return -ENOBUFS;
> >>>>>
> >>>>> + smp_wmb();
> >>>>> +
> >>>>> vg->vlan_devices_arrays[pidx][vidx] = array;
> >>>>> return 0;
> >>>>> }
> >>>>> diff --git a/net/8021q/vlan.h b/net/8021q/vlan.h index
> >>>>> 953405362795..7408fda084d3 100644
> >>>>> --- a/net/8021q/vlan.h
> >>>>> +++ b/net/8021q/vlan.h
> >>>>> @@ -57,6 +57,9 @@ static inline struct net_device
> >>>>> *__vlan_group_get_device(struct vlan_group *vg,
> >>>>>
> >>>>> array = vg->vlan_devices_arrays[pidx]
> >>>>> [vlan_id /
> >>>> VLAN_GROUP_ARRAY_PART_LEN];
> >>>>> +
> >>>>> + smp_rmb();
> >>>>> +
> >>>>> return array ? array[vlan_id % VLAN_GROUP_ARRAY_PART_LEN] :
> >>>> NULL; }
> >>>>>
> >>>>>
> >>>
> >
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] net: fix a data race when get vlan device
2021-04-16 3:27 zhudi (J)
@ 2021-04-16 3:56 ` Yunsheng Lin
0 siblings, 0 replies; 7+ messages in thread
From: Yunsheng Lin @ 2021-04-16 3:56 UTC (permalink / raw)
To: zhudi (J), davem, kuba; +Cc: netdev, Chenxiang (EulerOS)
On 2021/4/16 11:27, zhudi (J) wrote:
>> dependencyOn 2021/4/15 11:35, zhudi wrote:
>>> From: Di Zhu <zhudi21@huawei.com>
>>>
>>> We encountered a crash: in the packet receiving process, we got an
>>> illegal VLAN device address, but the VLAN device address saved in
>>> vmcore is correct. After checking the code, we found a possible data
>>> competition:
>>> CPU 0: CPU 1:
>>> (RCU read lock) (RTNL lock)
>>> vlan_do_receive() register_vlan_dev()
>>> vlan_find_dev()
>>>
>>> ->__vlan_group_get_device() ->vlan_group_prealloc_vid()
>>>
>>> In vlan_group_prealloc_vid(), We need to make sure that kzalloc is
>>> executed before assigning a value to vlan devices array, otherwise we
>>
>> As my understanding, there is a dependency between calling kzalloc() and
>> assigning the address(returned from kzalloc()) to vg->vlan_devices_arrays,
>> CPU and compiler can see the dependency, why can't it handling the
>> dependency before adding the smp_wmb()?
>>
>> See CONTROL DEPENDENCIES section in Documentation/memory-
>> barriers.txt:
>>
>> However, stores are not speculated. This means that ordering -is- provided
>> for load-store control dependencies, as in the following example:
>>
>> q = READ_ONCE(a);
>> if (q) {
>> WRITE_ONCE(b, 1);
>> }
>>
>
> Maybe I didn't make it clear. This memory isolation is to ensure the order of
> memset(object, 0, size) in kzalloc() operations and the subsequent array assignment statements.
>
> kzalloc()
> ->memset(object, 0, size)
>
> smp_wmb()
>
> vg->vlan_devices_arrays[pidx][vidx] = array;
>
> Because __vlan_group_get_device() function depends on this order
Thanks for clarify, it would be good to mention this in the
commit log too.
Also, __vlan_group_get_device() is used in the data path, it would
be to avoid the barrier op too. Maybe using rcu to avoid the barrier
if the __vlan_group_get_device() is already protected by rcu_lock.
>
>>
>>
>>> may get a wrong address from the hardware cache on another cpu.
>>>
>>> So fix it by adding memory barrier instruction to ensure the order of
>>> memory operations.
>>>
>>> Signed-off-by: Di Zhu <zhudi21@huawei.com>
>>> ---
>>> net/8021q/vlan.c | 2 ++
>>> net/8021q/vlan.h | 3 +++
>>> 2 files changed, 5 insertions(+)
>>>
>>> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c index
>>> 8b644113715e..4f541e05cd3f 100644
>>> --- a/net/8021q/vlan.c
>>> +++ b/net/8021q/vlan.c
>>> @@ -71,6 +71,8 @@ static int vlan_group_prealloc_vid(struct vlan_group
>> *vg,
>>> if (array == NULL)
>>> return -ENOBUFS;
>>>
>>> + smp_wmb();
>>> +
>>> vg->vlan_devices_arrays[pidx][vidx] = array;
>>> return 0;
>>> }
>>> diff --git a/net/8021q/vlan.h b/net/8021q/vlan.h index
>>> 953405362795..7408fda084d3 100644
>>> --- a/net/8021q/vlan.h
>>> +++ b/net/8021q/vlan.h
>>> @@ -57,6 +57,9 @@ static inline struct net_device
>>> *__vlan_group_get_device(struct vlan_group *vg,
>>>
>>> array = vg->vlan_devices_arrays[pidx]
>>> [vlan_id /
>> VLAN_GROUP_ARRAY_PART_LEN];
>>> +
>>> + smp_rmb();
>>> +
>>> return array ? array[vlan_id % VLAN_GROUP_ARRAY_PART_LEN] :
>> NULL; }
>>>
>>>
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] net: fix a data race when get vlan device
@ 2021-04-16 3:27 zhudi (J)
2021-04-16 3:56 ` Yunsheng Lin
0 siblings, 1 reply; 7+ messages in thread
From: zhudi (J) @ 2021-04-16 3:27 UTC (permalink / raw)
To: linyunsheng, davem, kuba; +Cc: netdev, Chenxiang (EulerOS)
> dependencyOn 2021/4/15 11:35, zhudi wrote:
> > From: Di Zhu <zhudi21@huawei.com>
> >
> > We encountered a crash: in the packet receiving process, we got an
> > illegal VLAN device address, but the VLAN device address saved in
> > vmcore is correct. After checking the code, we found a possible data
> > competition:
> > CPU 0: CPU 1:
> > (RCU read lock) (RTNL lock)
> > vlan_do_receive() register_vlan_dev()
> > vlan_find_dev()
> >
> > ->__vlan_group_get_device() ->vlan_group_prealloc_vid()
> >
> > In vlan_group_prealloc_vid(), We need to make sure that kzalloc is
> > executed before assigning a value to vlan devices array, otherwise we
>
> As my understanding, there is a dependency between calling kzalloc() and
> assigning the address(returned from kzalloc()) to vg->vlan_devices_arrays,
> CPU and compiler can see the dependency, why can't it handling the
> dependency before adding the smp_wmb()?
>
> See CONTROL DEPENDENCIES section in Documentation/memory-
> barriers.txt:
>
> However, stores are not speculated. This means that ordering -is- provided
> for load-store control dependencies, as in the following example:
>
> q = READ_ONCE(a);
> if (q) {
> WRITE_ONCE(b, 1);
> }
>
Maybe I didn't make it clear. This memory isolation is to ensure the order of
memset(object, 0, size) in kzalloc() operations and the subsequent array assignment statements.
kzalloc()
->memset(object, 0, size)
smp_wmb()
vg->vlan_devices_arrays[pidx][vidx] = array;
Because __vlan_group_get_device() function depends on this order
>
>
> > may get a wrong address from the hardware cache on another cpu.
> >
> > So fix it by adding memory barrier instruction to ensure the order of
> > memory operations.
> >
> > Signed-off-by: Di Zhu <zhudi21@huawei.com>
> > ---
> > net/8021q/vlan.c | 2 ++
> > net/8021q/vlan.h | 3 +++
> > 2 files changed, 5 insertions(+)
> >
> > diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c index
> > 8b644113715e..4f541e05cd3f 100644
> > --- a/net/8021q/vlan.c
> > +++ b/net/8021q/vlan.c
> > @@ -71,6 +71,8 @@ static int vlan_group_prealloc_vid(struct vlan_group
> *vg,
> > if (array == NULL)
> > return -ENOBUFS;
> >
> > + smp_wmb();
> > +
> > vg->vlan_devices_arrays[pidx][vidx] = array;
> > return 0;
> > }
> > diff --git a/net/8021q/vlan.h b/net/8021q/vlan.h index
> > 953405362795..7408fda084d3 100644
> > --- a/net/8021q/vlan.h
> > +++ b/net/8021q/vlan.h
> > @@ -57,6 +57,9 @@ static inline struct net_device
> > *__vlan_group_get_device(struct vlan_group *vg,
> >
> > array = vg->vlan_devices_arrays[pidx]
> > [vlan_id /
> VLAN_GROUP_ARRAY_PART_LEN];
> > +
> > + smp_rmb();
> > +
> > return array ? array[vlan_id % VLAN_GROUP_ARRAY_PART_LEN] :
> NULL; }
> >
> >
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] net: fix a data race when get vlan device
2021-04-15 3:35 zhudi
@ 2021-04-16 2:26 ` Yunsheng Lin
0 siblings, 0 replies; 7+ messages in thread
From: Yunsheng Lin @ 2021-04-16 2:26 UTC (permalink / raw)
To: zhudi, davem, kuba; +Cc: netdev, rose.chen
dependencyOn 2021/4/15 11:35, zhudi wrote:
> From: Di Zhu <zhudi21@huawei.com>
>
> We encountered a crash: in the packet receiving process, we got an
> illegal VLAN device address, but the VLAN device address saved in vmcore
> is correct. After checking the code, we found a possible data
> competition:
> CPU 0: CPU 1:
> (RCU read lock) (RTNL lock)
> vlan_do_receive() register_vlan_dev()
> vlan_find_dev()
>
> ->__vlan_group_get_device() ->vlan_group_prealloc_vid()
>
> In vlan_group_prealloc_vid(), We need to make sure that kzalloc is
> executed before assigning a value to vlan devices array, otherwise we
As my understanding, there is a dependency between calling kzalloc() and
assigning the address(returned from kzalloc()) to vg->vlan_devices_arrays,
CPU and compiler can see the dependency, why can't it handling the
dependency before adding the smp_wmb()?
See CONTROL DEPENDENCIES section in Documentation/memory-barriers.txt:
However, stores are not speculated. This means that ordering -is- provided
for load-store control dependencies, as in the following example:
q = READ_ONCE(a);
if (q) {
WRITE_ONCE(b, 1);
}
> may get a wrong address from the hardware cache on another cpu.
>
> So fix it by adding memory barrier instruction to ensure the order
> of memory operations.
>
> Signed-off-by: Di Zhu <zhudi21@huawei.com>
> ---
> net/8021q/vlan.c | 2 ++
> net/8021q/vlan.h | 3 +++
> 2 files changed, 5 insertions(+)
>
> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
> index 8b644113715e..4f541e05cd3f 100644
> --- a/net/8021q/vlan.c
> +++ b/net/8021q/vlan.c
> @@ -71,6 +71,8 @@ static int vlan_group_prealloc_vid(struct vlan_group *vg,
> if (array == NULL)
> return -ENOBUFS;
>
> + smp_wmb();
> +
> vg->vlan_devices_arrays[pidx][vidx] = array;
> return 0;
> }
> diff --git a/net/8021q/vlan.h b/net/8021q/vlan.h
> index 953405362795..7408fda084d3 100644
> --- a/net/8021q/vlan.h
> +++ b/net/8021q/vlan.h
> @@ -57,6 +57,9 @@ static inline struct net_device *__vlan_group_get_device(struct vlan_group *vg,
>
> array = vg->vlan_devices_arrays[pidx]
> [vlan_id / VLAN_GROUP_ARRAY_PART_LEN];
> +
> + smp_rmb();
> +
> return array ? array[vlan_id % VLAN_GROUP_ARRAY_PART_LEN] : NULL;
> }
>
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH] net: fix a data race when get vlan device
@ 2021-04-15 3:35 zhudi
2021-04-16 2:26 ` Yunsheng Lin
0 siblings, 1 reply; 7+ messages in thread
From: zhudi @ 2021-04-15 3:35 UTC (permalink / raw)
To: davem, kuba; +Cc: netdev, zhudi21, rose.chen
From: Di Zhu <zhudi21@huawei.com>
We encountered a crash: in the packet receiving process, we got an
illegal VLAN device address, but the VLAN device address saved in vmcore
is correct. After checking the code, we found a possible data
competition:
CPU 0: CPU 1:
(RCU read lock) (RTNL lock)
vlan_do_receive() register_vlan_dev()
vlan_find_dev()
->__vlan_group_get_device() ->vlan_group_prealloc_vid()
In vlan_group_prealloc_vid(), We need to make sure that kzalloc is
executed before assigning a value to vlan devices array, otherwise we
may get a wrong address from the hardware cache on another cpu.
So fix it by adding memory barrier instruction to ensure the order
of memory operations.
Signed-off-by: Di Zhu <zhudi21@huawei.com>
---
net/8021q/vlan.c | 2 ++
net/8021q/vlan.h | 3 +++
2 files changed, 5 insertions(+)
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 8b644113715e..4f541e05cd3f 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -71,6 +71,8 @@ static int vlan_group_prealloc_vid(struct vlan_group *vg,
if (array == NULL)
return -ENOBUFS;
+ smp_wmb();
+
vg->vlan_devices_arrays[pidx][vidx] = array;
return 0;
}
diff --git a/net/8021q/vlan.h b/net/8021q/vlan.h
index 953405362795..7408fda084d3 100644
--- a/net/8021q/vlan.h
+++ b/net/8021q/vlan.h
@@ -57,6 +57,9 @@ static inline struct net_device *__vlan_group_get_device(struct vlan_group *vg,
array = vg->vlan_devices_arrays[pidx]
[vlan_id / VLAN_GROUP_ARRAY_PART_LEN];
+
+ smp_rmb();
+
return array ? array[vlan_id % VLAN_GROUP_ARRAY_PART_LEN] : NULL;
}
--
2.23.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
end of thread, other threads:[~2021-04-19 13:16 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-17 12:33 [PATCH] net: fix a data race when get vlan device zhudi (J)
2021-04-19 2:26 ` Yunsheng Lin
-- strict thread matches above, loose matches on Subject: below --
2021-04-19 13:13 zhudi (J)
2021-04-16 3:27 zhudi (J)
2021-04-16 3:56 ` Yunsheng Lin
2021-04-15 3:35 zhudi
2021-04-16 2:26 ` Yunsheng Lin
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.