linux-ppp.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Problem: BUG_ON hit in ppp_pernet() when re-connect after changing shared key on LAC
@ 2016-07-05  2:50 Matt Bennett
  2016-07-05 17:59 ` Cong Wang
  0 siblings, 1 reply; 5+ messages in thread
From: Matt Bennett @ 2016-07-05  2:50 UTC (permalink / raw)
  To: linux-ppp, netdev; +Cc: g.nault

Hi,

I am producing the attached bug trace when testing PPP connections. 
Specifically the steps I am doing are:

1. Configure PPP client and LAC with shared key and wait for client to 
negotiate an IP address.

2. Change the shared key on the LAC.

3. Bring the PPP client interface down and up to make it reconnect.

4. Repeat down/up until bug occurs.

Using printk I have confirmed that ppp_pernet() is called from 
ppp_connect_channel() when the BUG occurs (i.e. pch->chan_net is NULL).

This behavior appears to have been introduced in commit 1f461dc ("ppp: 
take reference on channels netns").

Thanks,
Matt

-------

Kernel bug detected[#1]:
CPU: 0 PID: 1796 Comm: pppd Tainted: P           O    4.4.6-at1 #1
task: 800000004cf9a610 ti: 80000000009b8000 task.ti: 80000000009b8000
$ 0   : 0000000000000000 0000000000000001 0000000000000000 0000000000000001
$ 4   : 8000000005174560 8000000005174560 8000000005175b30 00000000048d0550
$ 8   : 0000000004830000 0000000000000000 0000000000005fd8 0000000004900000
$12   : ffffffff80000000 8000000004900000 0000000000000014 0000000000000000
$16   : 0000000000000001 80000000008ec480 0000000010069dc4 fffffffffffffff2
$20   : 8000000004840000 80000000008ec4f8 000000001008e708 0000000010010000
$24   : 0000000004900000 0000000004900000
$28   : 80000000009b8000 80000000009bbd00 800000004c777d80 800000000438ade8
Hi    : 0000000000000000
Lo    : 09cd1da35f400000
epc   : 800000000438a5c8 ppp_ioctl+0x868/0x1098
ra    : 800000000438ade8 ppp_ioctl+0x1088/0x1098
Status: 10009ce3        KX SX UX KERNEL EXL IE
Cause : 00800034 (ExcCode 0d)
PrId  : 000d9602 (Cavium Octeon III)
Modules linked in: jitterentropy_rng echainiv drbg linux_user_bde(PO) 
linux_kernel_bde(PO) platform_driver(O) ipifwd(PO
)
Process pppd (pid: 1796, threadinfo€000000009b8000, 
task€0000004cf9a610, tls\00000ffee40b700)
Stack : 0000000000000001 800000004cf9aa00 800000000480f1f8 80000000048d4600
           ffffffff80000000 8000000005188600 0000000010020000 
8000000004086dc0
           0000000010069dc4 800000004c777d80 800000004f90bba8 
000000000000000b
           0000000010069dc4 ffffffff8004743a 000000001008e708 
0000000010010000
           0000000010020000 800000000414e318 0000000000000001 
800000004cf9a610
           8000000005188600 8000000001440190 800000000480df00 
80000000015126a0
           80000000015126a0 80000000046792ac 80000000010ef080 
0000000000000004
           800000004c777d80 800000004c777d80 0000000010070000 
800000004c777d80
           000000000000000b 800000000414e930 0000000000000011 
0000000010044438
           000000000000000b ffffffffffffffff 000000001003eb80 
000000001003ec10
           ...
Call Trace:
[<800000000438a5c8>] ppp_ioctl+0x868/0x1098
[<800000000414e318>] do_vfs_ioctl+0x98/0x620
[<800000000414e930>] SyS_ioctl+0x90/0xd0
[<8000000004035e80>] syscall_common+0x44/0x68


Code: de2200a0  10400202  0000182d <00030336> 3c038000  3c080481 
dc421420  64630000  0003183c
---[ end trace 72203e44575f38a6 ]---

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Problem: BUG_ON hit in ppp_pernet() when re-connect after changing shared key on LAC
  2016-07-05  2:50 Problem: BUG_ON hit in ppp_pernet() when re-connect after changing shared key on LAC Matt Bennett
@ 2016-07-05 17:59 ` Cong Wang
  2016-07-05 20:36   ` Cong Wang
  0 siblings, 1 reply; 5+ messages in thread
From: Cong Wang @ 2016-07-05 17:59 UTC (permalink / raw)
  To: Matt Bennett; +Cc: linux-ppp, netdev, g.nault

On Mon, Jul 4, 2016 at 7:50 PM, Matt Bennett
<Matt.Bennett@alliedtelesis.co.nz> wrote:
> Using printk I have confirmed that ppp_pernet() is called from
> ppp_connect_channel() when the BUG occurs (i.e. pch->chan_net is NULL).
>
> This behavior appears to have been introduced in commit 1f461dc ("ppp:
> take reference on channels netns").

We have some race condition here, where a parallel ppp_unregister_channel()
could happen while we are in ppp_connect_channel().

We need some synchronization for them. I am not sure what is the right lock
here since ppp locking looks crazy.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Problem: BUG_ON hit in ppp_pernet() when re-connect after changing shared key on LAC
  2016-07-05 17:59 ` Cong Wang
@ 2016-07-05 20:36   ` Cong Wang
  2016-07-06  0:05     ` Matt Bennett
  0 siblings, 1 reply; 5+ messages in thread
From: Cong Wang @ 2016-07-05 20:36 UTC (permalink / raw)
  To: Matt Bennett; +Cc: linux-ppp, netdev, g.nault

[-- Attachment #1: Type: text/plain, Size: 736 bytes --]

On Tue, Jul 5, 2016 at 10:59 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Mon, Jul 4, 2016 at 7:50 PM, Matt Bennett
> <Matt.Bennett@alliedtelesis.co.nz> wrote:
>> Using printk I have confirmed that ppp_pernet() is called from
>> ppp_connect_channel() when the BUG occurs (i.e. pch->chan_net is NULL).
>>
>> This behavior appears to have been introduced in commit 1f461dc ("ppp:
>> take reference on channels netns").
>
> We have some race condition here, where a parallel ppp_unregister_channel()
> could happen while we are in ppp_connect_channel().
>
> We need some synchronization for them. I am not sure what is the right lock
> here since ppp locking looks crazy.

Matt, could you try if the attached patch helps?

Thanks!

[-- Attachment #2: ppp.diff --]
[-- Type: text/plain, Size: 738 bytes --]

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index 8dedafa..07f0e49 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -2601,8 +2601,6 @@ ppp_unregister_channel(struct ppp_channel *chan)
 	spin_lock_bh(&pn->all_channels_lock);
 	list_del(&pch->list);
 	spin_unlock_bh(&pn->all_channels_lock);
-	put_net(pch->chan_net);
-	pch->chan_net = NULL;
 
 	pch->file.dead = 1;
 	wake_up_interruptible(&pch->file.rwait);
@@ -3136,6 +3134,11 @@ ppp_disconnect_channel(struct channel *pch)
  */
 static void ppp_destroy_channel(struct channel *pch)
 {
+	if (pch->chan_net) {
+		put_net(pch->chan_net);
+		pch->chan_net = NULL;
+	}
+
 	atomic_dec(&channel_count);
 
 	if (!pch->file.dead) {

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: Problem: BUG_ON hit in ppp_pernet() when re-connect after changing shared key on LAC
  2016-07-05 20:36   ` Cong Wang
@ 2016-07-06  0:05     ` Matt Bennett
  2016-07-06  2:02       ` Cong Wang
  0 siblings, 1 reply; 5+ messages in thread
From: Matt Bennett @ 2016-07-06  0:05 UTC (permalink / raw)
  To: Cong Wang; +Cc: linux-ppp, netdev, g.nault, g.nault

On 07/06/2016 08:37 AM, Cong Wang wrote:
> On Tue, Jul 5, 2016 at 10:59 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> On Mon, Jul 4, 2016 at 7:50 PM, Matt Bennett
>> <Matt.Bennett@alliedtelesis.co.nz> wrote:
>>> Using printk I have confirmed that ppp_pernet() is called from
>>> ppp_connect_channel() when the BUG occurs (i.e. pch->chan_net is NULL).
>>>
>>> This behavior appears to have been introduced in commit 1f461dc ("ppp:
>>> take reference on channels netns").
>>
>> We have some race condition here, where a parallel ppp_unregister_channel()
>> could happen while we are in ppp_connect_channel().
>>
>> We need some synchronization for them. I am not sure what is the right lock
>> here since ppp locking looks crazy.
>
> Matt, could you try if the attached patch helps?
>
> Thanks!
>
I have given that patch a good amount of testing and the BUG_ON() no 
longer is hit. Whether that is the best fix or not I am unsure?

Either way, the following comment in ppp_unregister_channel() seems 
incorrect to me and should probably be deleted unless it is fixed?

/*
  * This ensures that we have returned from any calls into the
  * the channel's start_xmit or ioctl routine before we proceed.
  */

It appears mutex_lock(&ppp_mutex) what locks ppp_ioctl. ppp_xmit uses 
ppp_xmit_lock(ppp) in ppp_xmit_process.




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Problem: BUG_ON hit in ppp_pernet() when re-connect after changing shared key on LAC
  2016-07-06  0:05     ` Matt Bennett
@ 2016-07-06  2:02       ` Cong Wang
  0 siblings, 0 replies; 5+ messages in thread
From: Cong Wang @ 2016-07-06  2:02 UTC (permalink / raw)
  To: Matt Bennett; +Cc: linux-ppp, netdev, g.nault

On Tue, Jul 5, 2016 at 5:05 PM, Matt Bennett
<Matt.Bennett@alliedtelesis.co.nz> wrote:
> On 07/06/2016 08:37 AM, Cong Wang wrote:
>> On Tue, Jul 5, 2016 at 10:59 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>> On Mon, Jul 4, 2016 at 7:50 PM, Matt Bennett
>>> <Matt.Bennett@alliedtelesis.co.nz> wrote:
>>>> Using printk I have confirmed that ppp_pernet() is called from
>>>> ppp_connect_channel() when the BUG occurs (i.e. pch->chan_net is NULL).
>>>>
>>>> This behavior appears to have been introduced in commit 1f461dc ("ppp:
>>>> take reference on channels netns").
>>>
>>> We have some race condition here, where a parallel ppp_unregister_channel()
>>> could happen while we are in ppp_connect_channel().
>>>
>>> We need some synchronization for them. I am not sure what is the right lock
>>> here since ppp locking looks crazy.
>>
>> Matt, could you try if the attached patch helps?
>>
>> Thanks!
>>
> I have given that patch a good amount of testing and the BUG_ON() no
> longer is hit. Whether that is the best fix or not I am unsure?

At least my patch makes the net refcnt sync with pch life-time:
we grab a net refcnt when we allocate a pch, and release it when
we are going to destroy a pch. Makes sense to you?

>
> Either way, the following comment in ppp_unregister_channel() seems
> incorrect to me and should probably be deleted unless it is fixed?
>
> /*
>   * This ensures that we have returned from any calls into the
>   * the channel's start_xmit or ioctl routine before we proceed.
>   */

This comment is pretty old, I think it refers to the pch->ppp
check in ppp_connect_channel().

Thanks.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-07-06  2:02 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-05  2:50 Problem: BUG_ON hit in ppp_pernet() when re-connect after changing shared key on LAC Matt Bennett
2016-07-05 17:59 ` Cong Wang
2016-07-05 20:36   ` Cong Wang
2016-07-06  0:05     ` Matt Bennett
2016-07-06  2:02       ` Cong Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).