* Problem: BUG_ON hit in ppp_pernet() when re-connect after changing shared key on LAC
@ 2016-07-05 2:50 Matt Bennett
2016-07-05 17:59 ` Cong Wang
0 siblings, 1 reply; 5+ messages in thread
From: Matt Bennett @ 2016-07-05 2:50 UTC (permalink / raw)
To: linux-ppp, netdev; +Cc: g.nault
Hi,
I am producing the attached bug trace when testing PPP connections.
Specifically the steps I am doing are:
1. Configure PPP client and LAC with shared key and wait for client to
negotiate an IP address.
2. Change the shared key on the LAC.
3. Bring the PPP client interface down and up to make it reconnect.
4. Repeat down/up until bug occurs.
Using printk I have confirmed that ppp_pernet() is called from
ppp_connect_channel() when the BUG occurs (i.e. pch->chan_net is NULL).
This behavior appears to have been introduced in commit 1f461dc ("ppp:
take reference on channels netns").
Thanks,
Matt
-------
Kernel bug detected[#1]:
CPU: 0 PID: 1796 Comm: pppd Tainted: P O 4.4.6-at1 #1
task: 800000004cf9a610 ti: 80000000009b8000 task.ti: 80000000009b8000
$ 0 : 0000000000000000 0000000000000001 0000000000000000 0000000000000001
$ 4 : 8000000005174560 8000000005174560 8000000005175b30 00000000048d0550
$ 8 : 0000000004830000 0000000000000000 0000000000005fd8 0000000004900000
$12 : ffffffff80000000 8000000004900000 0000000000000014 0000000000000000
$16 : 0000000000000001 80000000008ec480 0000000010069dc4 fffffffffffffff2
$20 : 8000000004840000 80000000008ec4f8 000000001008e708 0000000010010000
$24 : 0000000004900000 0000000004900000
$28 : 80000000009b8000 80000000009bbd00 800000004c777d80 800000000438ade8
Hi : 0000000000000000
Lo : 09cd1da35f400000
epc : 800000000438a5c8 ppp_ioctl+0x868/0x1098
ra : 800000000438ade8 ppp_ioctl+0x1088/0x1098
Status: 10009ce3 KX SX UX KERNEL EXL IE
Cause : 00800034 (ExcCode 0d)
PrId : 000d9602 (Cavium Octeon III)
Modules linked in: jitterentropy_rng echainiv drbg linux_user_bde(PO)
linux_kernel_bde(PO) platform_driver(O) ipifwd(PO
)
Process pppd (pid: 1796, threadinfo€000000009b8000,
task€0000004cf9a610, tls\00000ffee40b700)
Stack : 0000000000000001 800000004cf9aa00 800000000480f1f8 80000000048d4600
ffffffff80000000 8000000005188600 0000000010020000
8000000004086dc0
0000000010069dc4 800000004c777d80 800000004f90bba8
000000000000000b
0000000010069dc4 ffffffff8004743a 000000001008e708
0000000010010000
0000000010020000 800000000414e318 0000000000000001
800000004cf9a610
8000000005188600 8000000001440190 800000000480df00
80000000015126a0
80000000015126a0 80000000046792ac 80000000010ef080
0000000000000004
800000004c777d80 800000004c777d80 0000000010070000
800000004c777d80
000000000000000b 800000000414e930 0000000000000011
0000000010044438
000000000000000b ffffffffffffffff 000000001003eb80
000000001003ec10
...
Call Trace:
[<800000000438a5c8>] ppp_ioctl+0x868/0x1098
[<800000000414e318>] do_vfs_ioctl+0x98/0x620
[<800000000414e930>] SyS_ioctl+0x90/0xd0
[<8000000004035e80>] syscall_common+0x44/0x68
Code: de2200a0 10400202 0000182d <00030336> 3c038000 3c080481
dc421420 64630000 0003183c
---[ end trace 72203e44575f38a6 ]---
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Problem: BUG_ON hit in ppp_pernet() when re-connect after changing shared key on LAC
2016-07-05 2:50 Problem: BUG_ON hit in ppp_pernet() when re-connect after changing shared key on LAC Matt Bennett
@ 2016-07-05 17:59 ` Cong Wang
2016-07-05 20:36 ` Cong Wang
0 siblings, 1 reply; 5+ messages in thread
From: Cong Wang @ 2016-07-05 17:59 UTC (permalink / raw)
To: Matt Bennett; +Cc: linux-ppp, netdev, g.nault
On Mon, Jul 4, 2016 at 7:50 PM, Matt Bennett
<Matt.Bennett@alliedtelesis.co.nz> wrote:
> Using printk I have confirmed that ppp_pernet() is called from
> ppp_connect_channel() when the BUG occurs (i.e. pch->chan_net is NULL).
>
> This behavior appears to have been introduced in commit 1f461dc ("ppp:
> take reference on channels netns").
We have some race condition here, where a parallel ppp_unregister_channel()
could happen while we are in ppp_connect_channel().
We need some synchronization for them. I am not sure what is the right lock
here since ppp locking looks crazy.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Problem: BUG_ON hit in ppp_pernet() when re-connect after changing shared key on LAC
2016-07-05 17:59 ` Cong Wang
@ 2016-07-05 20:36 ` Cong Wang
2016-07-06 0:05 ` Matt Bennett
0 siblings, 1 reply; 5+ messages in thread
From: Cong Wang @ 2016-07-05 20:36 UTC (permalink / raw)
To: Matt Bennett; +Cc: linux-ppp, netdev, g.nault
[-- Attachment #1: Type: text/plain, Size: 736 bytes --]
On Tue, Jul 5, 2016 at 10:59 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Mon, Jul 4, 2016 at 7:50 PM, Matt Bennett
> <Matt.Bennett@alliedtelesis.co.nz> wrote:
>> Using printk I have confirmed that ppp_pernet() is called from
>> ppp_connect_channel() when the BUG occurs (i.e. pch->chan_net is NULL).
>>
>> This behavior appears to have been introduced in commit 1f461dc ("ppp:
>> take reference on channels netns").
>
> We have some race condition here, where a parallel ppp_unregister_channel()
> could happen while we are in ppp_connect_channel().
>
> We need some synchronization for them. I am not sure what is the right lock
> here since ppp locking looks crazy.
Matt, could you try if the attached patch helps?
Thanks!
[-- Attachment #2: ppp.diff --]
[-- Type: text/plain, Size: 738 bytes --]
diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index 8dedafa..07f0e49 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -2601,8 +2601,6 @@ ppp_unregister_channel(struct ppp_channel *chan)
spin_lock_bh(&pn->all_channels_lock);
list_del(&pch->list);
spin_unlock_bh(&pn->all_channels_lock);
- put_net(pch->chan_net);
- pch->chan_net = NULL;
pch->file.dead = 1;
wake_up_interruptible(&pch->file.rwait);
@@ -3136,6 +3134,11 @@ ppp_disconnect_channel(struct channel *pch)
*/
static void ppp_destroy_channel(struct channel *pch)
{
+ if (pch->chan_net) {
+ put_net(pch->chan_net);
+ pch->chan_net = NULL;
+ }
+
atomic_dec(&channel_count);
if (!pch->file.dead) {
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: Problem: BUG_ON hit in ppp_pernet() when re-connect after changing shared key on LAC
2016-07-05 20:36 ` Cong Wang
@ 2016-07-06 0:05 ` Matt Bennett
2016-07-06 2:02 ` Cong Wang
0 siblings, 1 reply; 5+ messages in thread
From: Matt Bennett @ 2016-07-06 0:05 UTC (permalink / raw)
To: Cong Wang; +Cc: linux-ppp, netdev, g.nault, g.nault
On 07/06/2016 08:37 AM, Cong Wang wrote:
> On Tue, Jul 5, 2016 at 10:59 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> On Mon, Jul 4, 2016 at 7:50 PM, Matt Bennett
>> <Matt.Bennett@alliedtelesis.co.nz> wrote:
>>> Using printk I have confirmed that ppp_pernet() is called from
>>> ppp_connect_channel() when the BUG occurs (i.e. pch->chan_net is NULL).
>>>
>>> This behavior appears to have been introduced in commit 1f461dc ("ppp:
>>> take reference on channels netns").
>>
>> We have some race condition here, where a parallel ppp_unregister_channel()
>> could happen while we are in ppp_connect_channel().
>>
>> We need some synchronization for them. I am not sure what is the right lock
>> here since ppp locking looks crazy.
>
> Matt, could you try if the attached patch helps?
>
> Thanks!
>
I have given that patch a good amount of testing and the BUG_ON() no
longer is hit. Whether that is the best fix or not I am unsure?
Either way, the following comment in ppp_unregister_channel() seems
incorrect to me and should probably be deleted unless it is fixed?
/*
* This ensures that we have returned from any calls into the
* the channel's start_xmit or ioctl routine before we proceed.
*/
It appears mutex_lock(&ppp_mutex) what locks ppp_ioctl. ppp_xmit uses
ppp_xmit_lock(ppp) in ppp_xmit_process.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Problem: BUG_ON hit in ppp_pernet() when re-connect after changing shared key on LAC
2016-07-06 0:05 ` Matt Bennett
@ 2016-07-06 2:02 ` Cong Wang
0 siblings, 0 replies; 5+ messages in thread
From: Cong Wang @ 2016-07-06 2:02 UTC (permalink / raw)
To: Matt Bennett; +Cc: linux-ppp, netdev, g.nault
On Tue, Jul 5, 2016 at 5:05 PM, Matt Bennett
<Matt.Bennett@alliedtelesis.co.nz> wrote:
> On 07/06/2016 08:37 AM, Cong Wang wrote:
>> On Tue, Jul 5, 2016 at 10:59 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>> On Mon, Jul 4, 2016 at 7:50 PM, Matt Bennett
>>> <Matt.Bennett@alliedtelesis.co.nz> wrote:
>>>> Using printk I have confirmed that ppp_pernet() is called from
>>>> ppp_connect_channel() when the BUG occurs (i.e. pch->chan_net is NULL).
>>>>
>>>> This behavior appears to have been introduced in commit 1f461dc ("ppp:
>>>> take reference on channels netns").
>>>
>>> We have some race condition here, where a parallel ppp_unregister_channel()
>>> could happen while we are in ppp_connect_channel().
>>>
>>> We need some synchronization for them. I am not sure what is the right lock
>>> here since ppp locking looks crazy.
>>
>> Matt, could you try if the attached patch helps?
>>
>> Thanks!
>>
> I have given that patch a good amount of testing and the BUG_ON() no
> longer is hit. Whether that is the best fix or not I am unsure?
At least my patch makes the net refcnt sync with pch life-time:
we grab a net refcnt when we allocate a pch, and release it when
we are going to destroy a pch. Makes sense to you?
>
> Either way, the following comment in ppp_unregister_channel() seems
> incorrect to me and should probably be deleted unless it is fixed?
>
> /*
> * This ensures that we have returned from any calls into the
> * the channel's start_xmit or ioctl routine before we proceed.
> */
This comment is pretty old, I think it refers to the pch->ppp
check in ppp_connect_channel().
Thanks.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2016-07-06 2:02 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-05 2:50 Problem: BUG_ON hit in ppp_pernet() when re-connect after changing shared key on LAC Matt Bennett
2016-07-05 17:59 ` Cong Wang
2016-07-05 20:36 ` Cong Wang
2016-07-06 0:05 ` Matt Bennett
2016-07-06 2:02 ` Cong Wang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).