linux-kernel.vger.kernel.org archive mirror
* Urgent: BUG: PPP ioctl Transport endpoint is not connected
@ 2020-12-09 14:47 Martin Zaharinov
  2020-12-09 16:40 ` Guillaume Nault
  0 siblings, 1 reply; 10+ messages in thread
From: Martin Zaharinov @ 2020-12-09 14:47 UTC
  To: linux-kernel@vger.kernel.org; +Cc: Eric Dumazet, netdev

Hi all,

I have a problem with the latest kernel release.
It seems to be related to this earlier problem:


https://www.mail-archive.com/search?l=netdev@vger.kernel.org&q=subject:%22Re%5C%3A+ppp%5C%2Fpppoe%2C+still+panic+4.15.3+in+ppp_push%22&o=newest&f=1

I have had the same problem since kernel 5.6, and I still have it now with kernel 5.9.13.


With 5.9.13 there are no longer any crashes in dimes, but at some point the accel-ppp service stops and becomes defunct, and the log fills with many lines like these:


error: vlan608: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
error: vlan617: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
error: vlan679: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
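For context: ENOTCONN is the errno whose glibc strerror() text is "Transport endpoint is not connected", so the log lines above most likely come from a PPPIOCCONNECT ioctl() on a /dev/ppp channel file failing with that error. In 5.9, ppp_connect_channel() returns -ENOTCONN from the check quoted later in this message when the channel has already been unregistered. Below is a minimal user-space sketch of the usual attach-then-connect sequence; it is not accel-ppp's actual code, connect_channel_fd() and attach_and_connect() are hypothetical helper names, and the channel index is assumed to come from PPPIOCGCHAN on the PPPoE socket.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/ppp-ioctl.h>

/* Connect the channel attached to 'chan_fd' to PPP unit 'unit'.
 * Returns 0 on success or -errno on failure. */
int connect_channel_fd(int chan_fd, int unit)
{
	if (ioctl(chan_fd, PPPIOCCONNECT, &unit) < 0) {
		int err = errno;
		/* With glibc, strerror(ENOTCONN) is
		 * "Transport endpoint is not connected". */
		fprintf(stderr, "ioctl(PPPIOCCONNECT): %s\n", strerror(err));
		return -err;
	}
	return 0;
}

/* Hypothetical helper: open /dev/ppp, attach the file to channel
 * 'chan_index', then connect that channel to unit 'unit'.
 * Returns the channel fd on success or -errno on failure. */
int attach_and_connect(int chan_index, int unit)
{
	int fd = open("/dev/ppp", O_RDWR);
	if (fd < 0)
		return -errno;

	if (ioctl(fd, PPPIOCATTCHAN, &chan_index) < 0) {
		int err = errno;
		close(fd);
		return -err;
	}

	int ret = connect_channel_fd(fd, unit);
	if (ret < 0) {
		close(fd);	/* e.g. -ENOTCONN: channel already unregistered */
		return ret;
	}
	return fd;
}
```

In this sketch, -ENOTCONN from attach_and_connect() means the channel was unregistered (for example, the PPPoE session went away) before the connect ran; whether to retry or tear the session down is a policy choice left to the daemon.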

At some point the number of connected users jumps to double or triple the normal count; after that the service becomes defunct, and I have to wait for all sessions to drop before it can start again.

I talked to the accel-ppp team, and they said this is a kernel-related problem and that it does not occur if we go back to kernel 4.14.

The problem appeared after kernel 4.15, and there is no solution so far.

Please help me find the problem.

In the thread linked above, I saw that this change was made to ppp_generic.c:

ppp_lock(ppp);
spin_lock_bh(&pch->downl);
if (!pch->chan) {
        /* Don't connect unregistered channels */
        spin_unlock_bh(&pch->downl);
        ppp_unlock(ppp);
        ret = -ENOTCONN;
        goto outl;
}
spin_unlock_bh(&pch->downl);


But this fix only prevents the error from being displayed and the system from freezing.
The underlying problem remains, and it is serious.
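The check above takes the unit lock (ppp_lock()) first and the channel's downl spinlock second, and bails out with -ENOTCONN if pch->chan is NULL, i.e. if the underlying channel has already been unregistered. A simplified single-process model of that pattern follows, with pthread mutexes standing in for the kernel's BH spinlocks; the model_* names are hypothetical, and the real ppp_connect_channel() also holds pn->all_ppp_mutex and pch->upl around this section. Deadlock freedom depends on every path acquiring the two locks in the same order; the order in which they are released does not by itself cause a deadlock.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ppp. */
struct model_ppp {
	pthread_mutex_t lock;		/* plays the role of ppp_lock() */
	int n_channels;
};

/* Hypothetical stand-in for struct channel. */
struct model_channel {
	pthread_mutex_t downl;		/* plays the role of pch->downl */
	void *chan;			/* NULL once unregistered */
	struct model_ppp *ppp;		/* unit this channel is connected to */
};

/* Model of the -ENOTCONN check in ppp_connect_channel(). */
int model_connect_channel(struct model_channel *pch, struct model_ppp *ppp)
{
	pthread_mutex_lock(&ppp->lock);		/* outer lock first */
	pthread_mutex_lock(&pch->downl);	/* inner lock second */
	if (pch->chan == NULL) {
		/* Don't connect unregistered channels. Releasing inner then
		 * outer is conventional; releasing in the other order is not
		 * a deadlock risk as long as acquisition order is consistent. */
		pthread_mutex_unlock(&pch->downl);
		pthread_mutex_unlock(&ppp->lock);
		return -ENOTCONN;
	}
	pthread_mutex_unlock(&pch->downl);

	pch->ppp = ppp;
	ppp->n_channels++;
	pthread_mutex_unlock(&ppp->lock);
	return 0;
}
```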


Please help me fix it.





* Re: Urgent: BUG: PPP ioctl Transport endpoint is not connected
  2020-12-09 14:47 Urgent: BUG: PPP ioctl Transport endpoint is not connected Martin Zaharinov
@ 2020-12-09 16:40 ` Guillaume Nault
  2020-12-09 16:57   ` Martin Zaharinov
  0 siblings, 1 reply; 10+ messages in thread
From: Guillaume Nault @ 2020-12-09 16:40 UTC
  To: Martin Zaharinov; +Cc: linux-kernel@vger.kernel.org, Eric Dumazet, netdev

On Wed, Dec 09, 2020 at 04:47:52PM +0200, Martin Zaharinov wrote:
> [...]
>
> I talked to the accel-ppp team, and they said this is a kernel-related problem and that it does not occur if we go back to kernel 4.14.
> 
> The problem appeared after kernel 4.15, and there is no solution so far.

I'm sorry, I don't understand.
Do you mean that v4.14 worked fine (no crash, no ioctl() error)?
Did the problem start appearing in v4.15? Or did v4.15 work and the
problem appeared in v4.16?

> [...]
>
> But this fix only prevents the error from being displayed and the system from freezing.
> The underlying problem remains, and it is serious.

Do you use accel-ppp's unit-cache option? Does the problem go away if
you stop using it?




* Re: Urgent: BUG: PPP ioctl Transport endpoint is not connected
  2020-12-09 16:40 ` Guillaume Nault
@ 2020-12-09 16:57   ` Martin Zaharinov
  2020-12-09 17:29     ` Martin Zaharinov
  2020-12-09 18:10     ` Guillaume Nault
  0 siblings, 2 replies; 10+ messages in thread
From: Martin Zaharinov @ 2020-12-09 16:57 UTC
  To: Guillaume Nault; +Cc: linux-kernel@vger.kernel.org, Eric Dumazet, netdev

Hi Guillaume,



> On 9 Dec 2020, at 18:40, Guillaume Nault <gnault@redhat.com> wrote:
> 
> [...]
> 
> I'm sorry, I don't understand.
> Do you mean that v4.14 worked fine (no crash, no ioctl() error)?
> Did the problem start appearing in v4.15? Or did v4.15 work and the
> problem appeared in v4.16?

In the Telegram group I talked to Sergey and Dimka, and they told me the problem appeared with the changes from 4.14 to 4.15.
Sergey wrote: "as I know, there was a similar issue in kernel 4.15 so maybe it is still not fixed"

I have no way to test with the old 4.14.x kernels; I no longer have support for them.


> 
>> [...]
> 
> Do you use accel-ppp's unit-cache option? Does the problem go away if
> you stop using it?
> 

No, I don't use unit-cache. If I enable unit-cache, accel-ppp becomes defunct in the same way, but users connect and disconnect faster.

The problem is the same with and without unit-cache.
Only with this patch applied do I no longer see errors in dimes, but that is not a solution.
In our network there is a customer with power-cut problems. When 600 users drop and reconnect, that alone is normal, but at that moment the kernel locks up and I start seeing this:
sessions:
  starting: 4235
  active: 3882
  finishing: 378

The problem is that the "starting" sessions are not real users; this server normally has about 4k customers.

I use pppd_compat.

Any ideas?

Martin



* Re: Urgent: BUG: PPP ioctl Transport endpoint is not connected
  2020-12-09 16:57   ` Martin Zaharinov
@ 2020-12-09 17:29     ` Martin Zaharinov
  2020-12-09 18:10     ` Guillaume Nault
  1 sibling, 0 replies; 10+ messages in thread
From: Martin Zaharinov @ 2020-12-09 17:29 UTC
  To: Guillaume Nault; +Cc: linux-kernel@vger.kernel.org, Eric Dumazet, netdev

I made a diff of drivers/net/ppp/ppp_generic.c between Linux 4.14.211 and 4.15.

The changes are:

atomic_t/atomic_inc() is replaced by refcount_t/refcount_inc(),

and in another part of ppp_generic.c the kfree_skb() calls are removed.

You can see the diff below:


--- linux-4.14.211/drivers/net/ppp/ppp_generic.c	2020-12-08 09:17:35.000000000 +0000
+++ linux-4.15/drivers/net/ppp/ppp_generic.c	2018-01-28 21:20:33.000000000 +0000
@@ -51,6 +51,7 @@
 #include <asm/unaligned.h>
 #include <net/slhc_vj.h>
 #include <linux/atomic.h>
+#include <linux/refcount.h>

 #include <linux/nsproxy.h>
 #include <net/net_namespace.h>
@@ -84,7 +85,7 @@ struct ppp_file {
 	struct sk_buff_head xq;		/* pppd transmit queue */
 	struct sk_buff_head rq;		/* receive queue for pppd */
 	wait_queue_head_t rwait;	/* for poll on reading /dev/ppp */
-	atomic_t	refcnt;		/* # refs (incl /dev/ppp attached) */
+	refcount_t	refcnt;		/* # refs (incl /dev/ppp attached) */
 	int		hdrlen;		/* space to leave for headers */
 	int		index;		/* interface unit / channel number */
 	int		dead;		/* unit/channel has been shut down */
@@ -256,7 +257,7 @@ struct ppp_net {
 /* Prototypes. */
 static int ppp_unattached_ioctl(struct net *net, struct ppp_file *pf,
 			struct file *file, unsigned int cmd, unsigned long arg);
-static void ppp_xmit_process(struct ppp *ppp, struct sk_buff *skb);
+static void ppp_xmit_process(struct ppp *ppp);
 static void ppp_send_frame(struct ppp *ppp, struct sk_buff *skb);
 static void ppp_push(struct ppp *ppp);
 static void ppp_channel_push(struct channel *pch);
@@ -389,7 +390,7 @@ static int ppp_open(struct inode *inode,
 	/*
 	 * This could (should?) be enforced by the permissions on /dev/ppp.
 	 */
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(file->f_cred->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 	return 0;
 }
@@ -408,7 +409,7 @@ static int ppp_release(struct inode *unu
 				unregister_netdevice(ppp->dev);
 			rtnl_unlock();
 		}
-		if (atomic_dec_and_test(&pf->refcnt)) {
+		if (refcount_dec_and_test(&pf->refcnt)) {
 			switch (pf->kind) {
 			case INTERFACE:
 				ppp_destroy_interface(PF_TO_PPP(pf));
@@ -512,12 +513,13 @@ static ssize_t ppp_write(struct file *fi
 		goto out;
 	}

+	skb_queue_tail(&pf->xq, skb);
+
 	switch (pf->kind) {
 	case INTERFACE:
-		ppp_xmit_process(PF_TO_PPP(pf), skb);
+		ppp_xmit_process(PF_TO_PPP(pf));
 		break;
 	case CHANNEL:
-		skb_queue_tail(&pf->xq, skb);
 		ppp_channel_push(PF_TO_CHANNEL(pf));
 		break;
 	}
@@ -880,7 +882,7 @@ static int ppp_unattached_ioctl(struct n
 		mutex_lock(&pn->all_ppp_mutex);
 		ppp = ppp_find_unit(pn, unit);
 		if (ppp) {
-			atomic_inc(&ppp->file.refcnt);
+			refcount_inc(&ppp->file.refcnt);
 			file->private_data = &ppp->file;
 			err = 0;
 		}
@@ -895,7 +897,7 @@ static int ppp_unattached_ioctl(struct n
 		spin_lock_bh(&pn->all_channels_lock);
 		chan = ppp_find_channel(pn, unit);
 		if (chan) {
-			atomic_inc(&chan->file.refcnt);
+			refcount_inc(&chan->file.refcnt);
 			file->private_data = &chan->file;
 			err = 0;
 		}
@@ -960,6 +962,8 @@ static __net_exit void ppp_exit_net(stru

 	mutex_destroy(&pn->all_ppp_mutex);
 	idr_destroy(&pn->units_idr);
+	WARN_ON_ONCE(!list_empty(&pn->all_channels));
+	WARN_ON_ONCE(!list_empty(&pn->new_channels));
 }

 static struct pernet_operations ppp_net_ops = {
@@ -1263,8 +1267,8 @@ ppp_start_xmit(struct sk_buff *skb, stru
 	put_unaligned_be16(proto, pp);

 	skb_scrub_packet(skb, !net_eq(ppp->ppp_net, dev_net(dev)));
-	ppp_xmit_process(ppp, skb);
-
+	skb_queue_tail(&ppp->file.xq, skb);
+	ppp_xmit_process(ppp);
 	return NETDEV_TX_OK;

  outf:
@@ -1349,7 +1353,7 @@ static int ppp_dev_init(struct net_devic
 	 * that ppp_destroy_interface() won't run before the device gets
 	 * unregistered.
 	 */
-	atomic_inc(&ppp->file.refcnt);
+	refcount_inc(&ppp->file.refcnt);

 	return 0;
 }
@@ -1378,7 +1382,7 @@ static void ppp_dev_priv_destructor(stru
 	struct ppp *ppp;

 	ppp = netdev_priv(dev);
-	if (atomic_dec_and_test(&ppp->file.refcnt))
+	if (refcount_dec_and_test(&ppp->file.refcnt))
 		ppp_destroy_interface(ppp);
 }

@@ -1416,14 +1420,13 @@ static void ppp_setup(struct net_device
  */

 /* Called to do any work queued up on the transmit side that can now be done */
-static void __ppp_xmit_process(struct ppp *ppp, struct sk_buff *skb)
+static void __ppp_xmit_process(struct ppp *ppp)
 {
+	struct sk_buff *skb;
+
 	ppp_xmit_lock(ppp);
 	if (!ppp->closing) {
 		ppp_push(ppp);
-
-		if (skb)
-			skb_queue_tail(&ppp->file.xq, skb);
 		while (!ppp->xmit_pending &&
 		       (skb = skb_dequeue(&ppp->file.xq)))
 			ppp_send_frame(ppp, skb);
@@ -1433,13 +1436,11 @@ static void __ppp_xmit_process(struct pp
 			netif_wake_queue(ppp->dev);
 		else
 			netif_stop_queue(ppp->dev);
-	} else {
-		kfree_skb(skb);
 	}
 	ppp_xmit_unlock(ppp);
 }

-static void ppp_xmit_process(struct ppp *ppp, struct sk_buff *skb)
+static void ppp_xmit_process(struct ppp *ppp)
 {
 	local_bh_disable();

@@ -1447,7 +1448,7 @@ static void ppp_xmit_process(struct ppp
 		goto err;

 	(*this_cpu_ptr(ppp->xmit_recursion))++;
-	__ppp_xmit_process(ppp, skb);
+	__ppp_xmit_process(ppp);
 	(*this_cpu_ptr(ppp->xmit_recursion))--;

 	local_bh_enable();
@@ -1457,8 +1458,6 @@ static void ppp_xmit_process(struct ppp
 err:
 	local_bh_enable();

-	kfree_skb(skb);
-
 	if (net_ratelimit())
 		netdev_err(ppp->dev, "recursion detected\n");
 }
@@ -1943,7 +1942,7 @@ static void __ppp_channel_push(struct ch
 	if (skb_queue_empty(&pch->file.xq)) {
 		ppp = pch->ppp;
 		if (ppp)
-			__ppp_xmit_process(ppp, NULL);
+			__ppp_xmit_process(ppp);
 	}
 }

@@ -2682,7 +2681,7 @@ ppp_unregister_channel(struct ppp_channe

 	pch->file.dead = 1;
 	wake_up_interruptible(&pch->file.rwait);
-	if (atomic_dec_and_test(&pch->file.refcnt))
+	if (refcount_dec_and_test(&pch->file.refcnt))
 		ppp_destroy_channel(pch);
 }

@@ -3052,7 +3051,7 @@ init_ppp_file(struct ppp_file *pf, int k
 	pf->kind = kind;
 	skb_queue_head_init(&pf->xq);
 	skb_queue_head_init(&pf->rq);
-	atomic_set(&pf->refcnt, 1);
+	refcount_set(&pf->refcnt, 1);
 	init_waitqueue_head(&pf->rwait);
 }

@@ -3162,15 +3161,6 @@ ppp_connect_channel(struct channel *pch,
 		goto outl;

 	ppp_lock(ppp);
-	spin_lock_bh(&pch->downl);
-	if (!pch->chan) {
-		/* Don't connect unregistered channels */
-		spin_unlock_bh(&pch->downl);
-		ppp_unlock(ppp);
-		ret = -ENOTCONN;
-		goto outl;
-	}
-	spin_unlock_bh(&pch->downl);
 	if (pch->file.hdrlen > ppp->file.hdrlen)
 		ppp->file.hdrlen = pch->file.hdrlen;
 	hdrlen = pch->file.hdrlen + 2;	/* for protocol bytes */
@@ -3179,7 +3169,7 @@ ppp_connect_channel(struct channel *pch,
 	list_add_tail(&pch->clist, &ppp->channels);
 	++ppp->n_channels;
 	pch->ppp = ppp;
-	atomic_inc(&ppp->file.refcnt);
+	refcount_inc(&ppp->file.refcnt);
 	ppp_unlock(ppp);
 	ret = 0;

@@ -3210,7 +3200,7 @@ ppp_disconnect_channel(struct channel *p
 		if (--ppp->n_channels == 0)
 			wake_up_interruptible(&ppp->file.rwait);
 		ppp_unlock(ppp);
-		if (atomic_dec_and_test(&ppp->file.refcnt))
+		if (refcount_dec_and_test(&ppp->file.refcnt))
 			ppp_destroy_interface(ppp);
 		err = 0;
 	}
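In the diff above, the 4.14.211 side passes the skb into ppp_xmit_process() and frees it with kfree_skb() both when the unit is closing and when the per-CPU xmit_recursion counter shows re-entry ("recursion detected"); the 4.15 side instead queues the skb on ppp->file.xq before calling ppp_xmit_process(ppp) and leaves it on the queue on those paths. A simplified single-threaded model of the 4.14.211 behavior is sketched below; struct unit, xmit_process() and loop_back() are hypothetical, a thread-local counter stands in for the per-CPU one, and free() stands in for kfree_skb().

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical packet and PPP-unit stand-ins. */
struct pkt { int id; };

struct unit {
	int closing;	/* unit is being torn down */
	int sent;	/* packets handed to the channel */
	int dropped;	/* packets freed instead of sent */
	/* Channel transmit hook; it may re-enter xmit_process(). */
	void (*chan_xmit)(struct unit *u, struct pkt *p);
};

/* Stand-in for the per-CPU ppp->xmit_recursion counter. */
static _Thread_local int xmit_recursion;

void xmit_process(struct unit *u, struct pkt *p);

static void xmit_process_locked(struct unit *u, struct pkt *p)
{
	if (u->closing) {
		free(p);		/* like kfree_skb() in the closing case */
		u->dropped++;
		return;
	}
	if (u->chan_xmit)
		u->chan_xmit(u, p);
	free(p);			/* the model's channel consumes the packet */
	u->sent++;
}

void xmit_process(struct unit *u, struct pkt *p)
{
	if (xmit_recursion > 0) {
		/* Re-entered on the same thread: free the packet right away
		 * (4.14.211 behavior) instead of keeping it queued. */
		free(p);
		u->dropped++;
		fprintf(stderr, "recursion detected\n");
		return;
	}
	xmit_recursion++;
	xmit_process_locked(u, p);
	xmit_recursion--;
}

/* Hypothetical channel whose transmit path sends a new packet back into
 * the same unit, e.g. through a routing loop. */
void loop_back(struct unit *u, struct pkt *p)
{
	(void)p;
	xmit_process(u, calloc(1, sizeof(struct pkt)));
}
```

With loop_back() as the channel hook, one call to xmit_process() sends the outer packet and drops the re-entrant one, and the counter is back at zero afterwards.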

> On 9 Dec 2020, at 18:57, Martin Zaharinov <micron10@gmail.com> wrote:
> [...]



* Re: Urgent: BUG: PPP ioctl Transport endpoint is not connected
  2020-12-09 16:57   ` Martin Zaharinov
  2020-12-09 17:29     ` Martin Zaharinov
@ 2020-12-09 18:10     ` Guillaume Nault
  2020-12-09 19:12       ` Martin Zaharinov
                         ` (2 more replies)
  1 sibling, 3 replies; 10+ messages in thread
From: Guillaume Nault @ 2020-12-09 18:10 UTC
  To: Martin Zaharinov; +Cc: linux-kernel@vger.kernel.org, Eric Dumazet, netdev

On Wed, Dec 09, 2020 at 06:57:44PM +0200, Martin Zaharinov wrote:
> > On 9 Dec 2020, at 18:40, Guillaume Nault <gnault@redhat.com> wrote:
> > On Wed, Dec 09, 2020 at 04:47:52PM +0200, Martin Zaharinov wrote:
> >> [...]
> > 
> > I'm sorry, I don't understand.
> > Do you mean that v4.14 worked fine (no crash, no ioctl() error)?
> > Did the problem start appearing in v4.15? Or did v4.15 work and the
> > problem appeared in v4.16?
> 
> In the Telegram group I talked to Sergey and Dimka, and they told me the problem appeared with the changes from 4.14 to 4.15.
> Sergey wrote: "as I know, there was a similar issue in kernel 4.15 so maybe it is still not fixed"

Ok, but what is your experience? Do you have a kernel version where
accel-ppp reports no ioctl() error and doesn't crash the kernel?

There weren't many changes between 4.14 and 4.15 for PPP.
The only PPP patch I can see that might have been risky is commit
0171c4183559 ("ppp: unlock all_ppp_mutex before registering device").

> I have no way to test with the old 4.14.x kernels; I no longer have support for them.
> 
> 
> > 
> >> [...]
> > 
> > Do you use accel-ppp's unit-cache option? Does the problem go away if
> > you stop using it?
> > 
> 
> No, I don't use unit-cache. If I enable unit-cache, accel-ppp becomes defunct in the same way, but users connect and disconnect faster.
> 
> The problem is the same with and without unit-cache.
> Only with this patch applied do I no longer see errors in dimes, but that is not a solution.

Sorry, what's "in dimes"?
Do you mean that reverting commit 77f840e3e5f0 ("ppp: prevent
unregistered channels from connecting to PPP units") fixes your problem?

> In our network there is a customer with power-cut problems. When 600 users drop and reconnect, that alone is normal, but at that moment the kernel locks up and I start seeing this:
> sessions:
>   starting: 4235
>   active: 3882
>   finishing: 378
> The problem is that the "starting" sessions are not real users; this server normally has about 4k customers.

What type of session is it? L2TP, PPPoE, PPTP?




* Re: Urgent: BUG: PPP ioctl Transport endpoint is not connected
  2020-12-09 18:10     ` Guillaume Nault
@ 2020-12-09 19:12       ` Martin Zaharinov
  2020-12-14 17:09         ` Guillaume Nault
  2020-12-10  7:06       ` Martin Zaharinov
  2020-12-10  7:16       ` Martin Zaharinov
  2 siblings, 1 reply; 10+ messages in thread
From: Martin Zaharinov @ 2020-12-09 19:12 UTC
  To: Guillaume Nault; +Cc: linux-kernel@vger.kernel.org, Eric Dumazet, netdev



> On 9 Dec 2020, at 20:10, Guillaume Nault <gnault@redhat.com> wrote:
> 
> On Wed, Dec 09, 2020 at 06:57:44PM +0200, Martin Zaharinov wrote:
>> [...]
> 
> Ok, but what is your experience? Do you have a kernel version where
> accel-ppp reports no ioctl() error and doesn't crash the kernel?
According to Sergey and Dimka, versions up to 4.14 don't have this problem.
Only versions from 4.15.0 onward have it, and the patch only stops the crash log from appearing in dmesg and the system from freezing.


> 
> There weren't many changes between 4.14 and 4.15 for PPP.
> The only PPP patch I can see that might have been risky is commit
> 0171c4183559 ("ppp: unlock all_ppp_mutex before registering device").

The changes I see are the atomic_t to refcount_t conversion and the kfree_skb() removal.
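The atomic_t to refcount_t conversion that shows up throughout the diff is a hardening change: for correctly balanced code refcount_t counts exactly like the old atomic_t, and it only behaves differently on overflow or underflow, where it is designed to saturate and warn instead of wrapping around (the exact checks depend on kernel version and configuration). A simplified user-space model with C11 atomics is sketched below; the model_* names, the saturation value and the increment-from-zero check are assumptions for illustration, not the kernel's implementation.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed saturation value for the model: once here, the counter stays
 * pinned, so the object is leaked instead of freed while still in use. */
#define MODEL_REFCOUNT_SATURATED (INT_MIN / 2)

typedef struct { atomic_int refs; } model_refcount_t;

static void model_saturate(model_refcount_t *r, const char *why)
{
	atomic_store(&r->refs, MODEL_REFCOUNT_SATURATED);
	fprintf(stderr, "model refcount: %s\n", why);
}

void model_refcount_set(model_refcount_t *r, int n)
{
	atomic_store(&r->refs, n);
}

void model_refcount_inc(model_refcount_t *r)
{
	int old = atomic_fetch_add(&r->refs, 1);

	if (old == 0)				/* object may already be freed */
		model_saturate(r, "increment from zero");
	else if (old < 0 || old == INT_MAX)	/* saturated or overflowing */
		model_saturate(r, "overflow");
}

/* Returns true when the caller dropped the last reference. */
bool model_refcount_dec_and_test(model_refcount_t *r)
{
	int old = atomic_fetch_sub(&r->refs, 1);

	if (old == 1)
		return true;
	if (old <= 0)				/* underflow or already saturated */
		model_saturate(r, "underflow");
	return false;
}
```

With a plain atomic_t, the same increment from zero would silently bring the object back to a count of 1; in the model it is pinned at the saturation value instead.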
> 
>> [...]
>> Only with this patch applied do I no longer see errors in dimes, but that is not a solution.
> 
> Sorry, what's "in dimes"?
> Do you mean that reverting commit 77f840e3e5f0 ("ppp: prevent
> unregistered channels from connecting to PPP units") fixes your problem?
Sorry, that was a typo: I meant dmesg.

I haven't made any changes to the PPP part of kernel 5.9.13; I use a clean build for PPP.


> 
>> In our network there is a customer with power-cut problems. When 600 users drop and reconnect, that alone is normal, but at that moment the kernel locks up and I start seeing this:
>> sessions:
>>  starting: 4235
>>  active: 3882
>>  finishing: 378
>> The problem is that the "starting" sessions are not real users; this server normally has about 4k customers.
> 
> What type of session is it? L2TP, PPPoE, PPTP?
> 
The sessions are PPPoE only, with RADIUS authentication.




* Re: Urgent: BUG: PPP ioctl Transport endpoint is not connected
  2020-12-09 18:10     ` Guillaume Nault
  2020-12-09 19:12       ` Martin Zaharinov
@ 2020-12-10  7:06       ` Martin Zaharinov
  2020-12-10  7:16       ` Martin Zaharinov
  2 siblings, 0 replies; 10+ messages in thread
From: Martin Zaharinov @ 2020-12-10  7:06 UTC
  To: Guillaume Nault; +Cc: linux-kernel@vger.kernel.org, Eric Dumazet, netdev



> On 9 Dec 2020, at 20:10, Guillaume Nault <gnault@redhat.com> wrote:
> 
> On Wed, Dec 09, 2020 at 06:57:44PM +0200, Martin Zaharinov wrote:
>> [...]
> 
> Ok, but what is your experience? Do you have a kernel version where
> accel-ppp reports no ioctl() error and doesn't crash the kernel?
> 
> There weren't many changes between 4.14 and 4.15 for PPP.
> The only PPP patch I can see that might have been risky is commit
> 0171c4183559 ("ppp: unlock all_ppp_mutex before registering device").

Maybe, or it is some other bug in PPP, but how can I debug it or find a fix?


> 
>> [...]
>> Only with this patch applied do I no longer see errors in dimes, but that is not a solution.
> 
> Sorry, what's "in dimes"?
> Do you mean that reverting commit 77f840e3e5f0 ("ppp: prevent
> unregistered channels from connecting to PPP units") fixes your problem?


Maybe not: if I revert it, the system shows a crash report and freezes.



> 
>> In network have customer what have power cut problem, when drop 600 user and back Is normal but in this moment kernel is locking and start to make this : 
>> sessions:
>>  starting: 4235
>>  active: 3882
>>  finishing: 378
>> The problem is that the starting sessions are not real users; the normal user count on this server is ~4k customers.
> 
> What type of session is it? L2TP, PPPoE, PPTP?
> 
>> I use pppd_compat.
>> 
>> Any ideas?
>> 
>>>> 
>>>> Please help fix it.
>> Martin


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Urgent: BUG: PPP ioctl Transport endpoint is not connected
  2020-12-09 18:10     ` Guillaume Nault
  2020-12-09 19:12       ` Martin Zaharinov
  2020-12-10  7:06       ` Martin Zaharinov
@ 2020-12-10  7:16       ` Martin Zaharinov
  2020-12-14 16:44         ` Guillaume Nault
  2 siblings, 1 reply; 10+ messages in thread
From: Martin Zaharinov @ 2020-12-10  7:16 UTC (permalink / raw)
  To: Guillaume Nault; +Cc: linux-kernel@vger kernel. org, Eric Dumazet, netdev

And one more thing.
In another mailing-list thread I see that you sent a patch to Denys Fedoryshchenko. This patch is:

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index 255a5def56e9..2acf4b0eabd1 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -3161,6 +3161,15 @@ ppp_connect_channel(struct channel *pch, int unit)
 		goto outl;
 
 	ppp_lock(ppp);
+	spin_lock_bh(&pch->downl);
+	if (!pch->chan) {
+		/* Don't connect unregistered channels */
+		ppp_unlock(ppp);
+		spin_unlock_bh(&pch->downl);
+		ret = -ENOTCONN;
+		goto outl;
+	}
+	spin_unlock_bh(&pch->downl);
 	if (pch->file.hdrlen > ppp->file.hdrlen)
 		ppp->file.hdrlen = pch->file.hdrlen;
 	hdrlen = pch->file.hdrlen + 2;	/* for protocol bytes */

But in the official stable kernel tree, ppp_generic.c has this:

	spin_lock_bh(&pch->downl);
	if (!pch->chan) {
		/* Don't connect unregistered channels */
		spin_unlock_bh(&pch->downl);
		ppp_unlock(ppp);
		ret = -ENOTCONN;
		goto outl;
	}
	spin_unlock_bh(&pch->downl);



Is it normal to unlock ppp after spin_unlock?
Shouldn't it be as you wrote it?
In your patch, the order is:

+	ppp_unlock(ppp);
+	spin_unlock_bh(&pch->downl);

But in the stable kernel it is:

	spin_unlock_bh(&pch->downl);
	ppp_unlock(ppp);


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: Urgent: BUG: PPP ioctl Transport endpoint is not connected
  2020-12-10  7:16       ` Martin Zaharinov
@ 2020-12-14 16:44         ` Guillaume Nault
  0 siblings, 0 replies; 10+ messages in thread
From: Guillaume Nault @ 2020-12-14 16:44 UTC (permalink / raw)
  To: Martin Zaharinov; +Cc: linux-kernel@vger kernel. org, Eric Dumazet, netdev

On Thu, Dec 10, 2020 at 09:16:24AM +0200, Martin Zaharinov wrote:
> And one more thing.
> In another mailing-list thread I see that you sent a patch to Denys Fedoryshchenko. This patch is:
> 
> diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
> index 255a5def56e9..2acf4b0eabd1 100644
> --- a/drivers/net/ppp/ppp_generic.c
> +++ b/drivers/net/ppp/ppp_generic.c
> @@ -3161,6 +3161,15 @@ ppp_connect_channel(struct channel *pch, int unit)
>  		goto outl;
>  
>  	ppp_lock(ppp);
> +	spin_lock_bh(&pch->downl);
> +	if (!pch->chan) {
> +		/* Don't connect unregistered channels */
> +		ppp_unlock(ppp);
> +		spin_unlock_bh(&pch->downl);
> +		ret = -ENOTCONN;
> +		goto outl;
> +	}
> +	spin_unlock_bh(&pch->downl);
>  	if (pch->file.hdrlen > ppp->file.hdrlen)
>  		ppp->file.hdrlen = pch->file.hdrlen;
>  	hdrlen = pch->file.hdrlen + 2;	/* for protocol bytes */

This was a quick untested patch that I sent to help debugging Denys'
problem. It has a lock inversion problem that I fixed before I formally
submitted it upstream. I even warned about it in the original thread:
https://lore.kernel.org/netdev/20180302174328.GD1413@alphalink.fr/

> But in the official stable kernel tree, ppp_generic.c has this:
> 
> 	spin_lock_bh(&pch->downl);
> 	if (!pch->chan) {
> 		/* Don't connect unregistered channels */
> 		spin_unlock_bh(&pch->downl);
> 		ppp_unlock(ppp);
> 		ret = -ENOTCONN;
> 		goto outl;
> 	}
> 	spin_unlock_bh(&pch->downl);

This one is correct.
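For readers following along, here is a minimal userspace model of that check. It is a sketch under stated assumptions: pthread mutexes stand in for ppp_lock() and pch->downl, and struct model_ppp, struct model_channel and model_connect_channel() are hypothetical names, not kernel APIs. It only illustrates the control flow: an unregistered channel (chan == NULL) is refused with -ENOTCONN, which is what userspace reports as "Transport endpoint is not connected" from ioctl(PPPIOCCONNECT), and the inner lock is dropped before the outer one.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the kernel's struct ppp / struct channel. */
struct model_ppp {
	pthread_mutex_t lock;   /* plays the role of ppp_lock()/ppp_unlock() */
	int hdrlen;
};

struct model_channel {
	pthread_mutex_t downl;  /* plays the role of pch->downl */
	void *chan;             /* NULL once the channel has been unregistered */
	int hdrlen;
};

/* Same shape as the stable-tree check: outer lock, then inner lock;
 * refuse unregistered channels with -ENOTCONN, releasing the locks in
 * reverse order of acquisition (inner first, then outer). */
static int model_connect_channel(struct model_ppp *ppp,
				 struct model_channel *pch)
{
	assert(ppp != NULL && pch != NULL);

	pthread_mutex_lock(&ppp->lock);
	pthread_mutex_lock(&pch->downl);
	if (!pch->chan) {
		/* Don't connect unregistered channels */
		pthread_mutex_unlock(&pch->downl);
		pthread_mutex_unlock(&ppp->lock);
		return -ENOTCONN;
	}
	pthread_mutex_unlock(&pch->downl);

	/* Channel is still registered: update the unit under the outer lock. */
	if (pch->hdrlen > ppp->hdrlen)
		ppp->hdrlen = pch->hdrlen;
	pthread_mutex_unlock(&ppp->lock);
	return 0;
}
```

Calling it with a channel whose chan is NULL returns -ENOTCONN and leaves ppp->hdrlen untouched; a following call with a registered channel does not block, because both mutexes were released on the error path.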

> Is it normal to unlock ppp after spin_unlock?
> Shouldn't it be as you wrote it?
> In your patch, the order is:
> 
> +	ppp_unlock(ppp);
> +	spin_unlock_bh(&pch->downl);

No, nested locks have to be released in the reverse order they were
acquired.
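To make that rule concrete, here is a small userspace lock-order checker. It is a hypothetical debugging aid, not lockdep, and struct lock_tracker, track_acquire() and track_release() are made-up names. It keeps a stack of held lock names and flags any release that is not of the most recently acquired lock. For simplicity ppp_lock(ppp) is modeled as a single lock named "ppp". Replaying the error path of the quick patch trips the check, while the stable-tree order does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_HELD 8

/* Hypothetical tracker: a stack of currently held lock names. */
struct lock_tracker {
	const char *held[MAX_HELD];
	size_t depth;
	bool violation;   /* set when a release is not LIFO */
};

static void track_acquire(struct lock_tracker *t, const char *name)
{
	assert(t->depth < MAX_HELD);
	t->held[t->depth++] = name;
}

static void track_release(struct lock_tracker *t, const char *name)
{
	assert(t->depth > 0);
	if (strcmp(t->held[t->depth - 1], name) == 0) {
		t->depth--;            /* releasing the innermost lock: OK */
		return;
	}
	/* Not the innermost lock: record the violation, then remove the
	 * entry from the middle of the stack so later releases still match. */
	t->violation = true;
	for (size_t i = 0; i + 1 < t->depth; i++) {
		if (strcmp(t->held[i], name) == 0) {
			memmove(&t->held[i], &t->held[i + 1],
				(t->depth - i - 1) * sizeof(t->held[0]));
			t->depth--;
			return;
		}
	}
	assert(!"released a lock that is not held");
}

/* Error path of the quick test patch: outer lock released first. */
static bool quick_patch_order_ok(void)
{
	struct lock_tracker t = { .depth = 0, .violation = false };

	track_acquire(&t, "ppp");          /* ppp_lock(ppp) */
	track_acquire(&t, "pch->downl");   /* spin_lock_bh(&pch->downl) */
	track_release(&t, "ppp");          /* ppp_unlock(ppp) */
	track_release(&t, "pch->downl");   /* spin_unlock_bh(&pch->downl) */
	return !t.violation && t.depth == 0;
}

/* Error path as in the stable tree: inner lock released first. */
static bool stable_order_ok(void)
{
	struct lock_tracker t = { .depth = 0, .violation = false };

	track_acquire(&t, "ppp");
	track_acquire(&t, "pch->downl");
	track_release(&t, "pch->downl");
	track_release(&t, "ppp");
	return !t.violation && t.depth == 0;
}
```

Here quick_patch_order_ok() returns false and stable_order_ok() returns true.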

> But in the stable kernel it is:
> 
> 	spin_unlock_bh(&pch->downl);
> 	ppp_unlock(ppp);

This is correct, and has been correctly backported to 4.14-stable.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Urgent: BUG: PPP ioctl Transport endpoint is not connected
  2020-12-09 19:12       ` Martin Zaharinov
@ 2020-12-14 17:09         ` Guillaume Nault
  0 siblings, 0 replies; 10+ messages in thread
From: Guillaume Nault @ 2020-12-14 17:09 UTC (permalink / raw)
  To: Martin Zaharinov; +Cc: linux-kernel@vger kernel. org, Eric Dumazet, netdev

On Wed, Dec 09, 2020 at 09:12:18PM +0200, Martin Zaharinov wrote:
> 
> 
> > On 9 Dec 2020, at 20:10, Guillaume Nault <gnault@redhat.com> wrote:
> > 
> > On Wed, Dec 09, 2020 at 06:57:44PM +0200, Martin Zaharinov wrote:
> >>> On 9 Dec 2020, at 18:40, Guillaume Nault <gnault@redhat.com> wrote:
> >>> On Wed, Dec 09, 2020 at 04:47:52PM +0200, Martin Zaharinov wrote:
> >>>> Hi All
> >>>> 
> >>>> I have a problem with the latest kernel release.
> >>>> The problem is based on this earlier problem:
> >>>> 
> >>>> 
> >>>> https://www.mail-archive.com/search?l=netdev@vger.kernel.org&q=subject:%22Re%5C%3A+ppp%5C%2Fpppoe%2C+still+panic+4.15.3+in+ppp_push%22&o=newest&f=1
> >>>> 
> >>>> I have had the same problem since kernel 5.6; now I use kernel 5.9.13 and have the same problem.
> >>>> 
> >>>> 
> >>>> In kernel 5.9.13 I no longer have any crashes in dimes, but at some point the accel service stops (defunct) and the log has many lines like these:
> >>>> 
> >>>> 
> >>>> error: vlan608: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
> >>>> error: vlan617: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
> >>>> error: vlan679: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
> >>>> 
> >>>> At some point the number of connected users jumps to double or triple; after that the service goes defunct and I have to wait for all sessions to drop before it starts again.
> >>>> 
> >>>> I talked with the accel-ppp team and they said this is a kernel-related problem, and that kernel 4.14 does not have it.
> >>>> 
> >>>> The problem appeared after kernel 4.15, and there is no solution so far.
> >>> 
> >>> I'm sorry, I don't understand.
> >>> Do you mean that v4.14 worked fine (no crash, no ioctl() error)?
> >>> Did the problem start appearing in v4.15? Or did v4.15 work and the
> >>> problem appeared in v4.16?
> >> 
> >> In a Telegram group I talked with Sergey and Dimka, and they told me the problem appeared after the changes from 4.14 to 4.15.
> >> Sergey wrote this: "As far as I know, there was a similar issue in kernel 4.15, so maybe it is still not fixed."
> > 
> > Ok, but what is your experience? Do you have a kernel version where
> > accel-ppp reports no ioctl() error and doesn't crash the kernel?
> As reported by Sergey and Dimka, versions 4.14 and earlier don’t have this problem;
> only versions from 4.15.0 onward have it, and the patch only keeps the crash log out of dimes and stops the system from freezing.

If they know about some regressions, please tell them to report them
(either to the list or directly to me). Because I'm not aware of
anything that broke with 4.15.

> > 
> > There weren't many changes between 4.14 and 4.15 for PPP.
> > The only PPP patch I can see that might have been risky is commit
> > 0171c4183559 ("ppp: unlock all_ppp_mutex before registering device").
> 
> As far as I can tell, the relevant changes are the atomic and skb kfree ones.
> > 
> >> I don’t have the option to test with the old 4.14.xxx kernels; I don’t have support for them.
> >> 
> >> 
> >>> 
> >>>> Please help find the problem.
> >>>> 
> >>>> In the linked thread, I see that changes were made to ppp_generic.c:
> >>>> 
> >>>> ppp_lock(ppp);
> >>>>       spin_lock_bh(&pch->downl);
> >>>>       if (!pch->chan) {
> >>>>               /* Don't connect unregistered channels */
> >>>>               spin_unlock_bh(&pch->downl);
> >>>>               ppp_unlock(ppp);
> >>>>               ret = -ENOTCONN;
> >>>>               goto outl;
> >>>>       }
> >>>>       spin_unlock_bh(&pch->downl);
> >>>> 
> >>>> 
> >>>> But this fix only stops the error from being displayed and the system from freezing.
> >>>> The problem remains, and it is too big.
> >>> 
> >>> Do you use accel-ppp's unit-cache option? Does the problem go away if
> >>> you stop using it?
> >>> 
> >> 
> >> No, I don’t use unit-cache. If I set unit-cache, accel-ppp goes defunct the same way, but users connect and disconnect faster.
> >> 
> >> The problem is the same with and without unit-cache.
> >> Only since this patch do I no longer see the error in dimes, but this is not a solution.
> > 
> > Sorry, what's "in dimes"?
> > Do you mean that reverting commit 77f840e3e5f0 ("ppp: prevent
> > unregistered channels from connecting to PPP units") fixes your problem?
> Sorry, that was a typo: I meant dmesg*.
> 
> I haven’t made any changes to the PPP part of kernel 5.9.13; I use a clean build for PPP.
> > 
> >> In the network there is a customer with a power-cut problem. When 600 users drop and come back, things are normal, but at that moment the kernel locks up and starts producing this:
> >> sessions:
> >>  starting: 4235
> >>  active: 3882
> >>  finishing: 378
> >> The problem is that the starting sessions are not real users; the normal user count on this server is ~4k customers.
> > 
> > What type of session is it? L2TP, PPPoE, PPTP?
> > 
> The sessions are PPPoE only, with RADIUS auth.
> 
> >> I use pppd_compat.
> >> 
> >> Any ideas?
> >> 
> >>>> 
> >>>> Please help fix it.
> >> Martin
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-12-14 17:11 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-09 14:47 Urgent: BUG: PPP ioctl Transport endpoint is not connected Martin Zaharinov
2020-12-09 16:40 ` Guillaume Nault
2020-12-09 16:57   ` Martin Zaharinov
2020-12-09 17:29     ` Martin Zaharinov
2020-12-09 18:10     ` Guillaume Nault
2020-12-09 19:12       ` Martin Zaharinov
2020-12-14 17:09         ` Guillaume Nault
2020-12-10  7:06       ` Martin Zaharinov
2020-12-10  7:16       ` Martin Zaharinov
2020-12-14 16:44         ` Guillaume Nault

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).