Date: Mon, 14 Dec 2020 18:09:46 +0100
From: Guillaume Nault
To: Martin Zaharinov
Cc: "linux-kernel@vger kernel. org", Eric Dumazet, netdev@vger.kernel.org
Subject: Re: Urgent: BUG: PPP ioctl Transport endpoint is not connected
Message-ID: <20201214170946.GB8350@linux.home>
References: <83C781EB-5D66-426E-A216-E1B846A3EC8A@gmail.com> <20201209164013.GA21199@linux.home> <1E49F9F8-0325-439E-B200-17C8CB6A3CBE@gmail.com> <20201209181033.GB21199@linux.home>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Dec 09, 2020 at 09:12:18PM +0200, Martin Zaharinov wrote:
>
>
> > On 9 Dec 2020, at 20:10, Guillaume Nault wrote:
> >
> > On Wed, Dec 09, 2020 at 06:57:44PM +0200, Martin Zaharinov wrote:
> >>> On 9 Dec 2020, at 18:40, Guillaume Nault wrote:
> >>> On Wed, Dec 09, 2020 at 04:47:52PM +0200, Martin Zaharinov wrote:
> >>>> Hi All
> >>>>
> >>>> I have problem with latest kernel release
> >>>> And the problem is base on this late problem :
> >>>>
> >>>>
> >>>> https://www.mail-archive.com/search?l=netdev@vger.kernel.org&q=subject:%22Re%5C%3A+ppp%5C%2Fpppoe%2C+still+panic+4.15.3+in+ppp_push%22&o=newest&f=1
> >>>>
> >>>> I have same problem in kernel 5.6 > now I use kernel 5.9.13 and have same problem.
> >>>>
> >>>>
> >>>> In kernel 5.9.13 now don’t have any crashes in dimes but in one moment accel service stop with defunct and in log have many of this line :
> >>>>
> >>>>
> >>>> error: vlan608: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
> >>>> error: vlan617: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
> >>>> error: vlan679: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
> >>>>
> >>>> In one moment connected user bump double or triple and after that service defunct and need wait to drop all session to start .
> >>>>
> >>>> I talk with accel-ppp team and they said this is kernel related problem and to back to kernel 4.14 there is not this problem.
> >>>>
> >>>> Problem is come after kernel 4.15 > and not have solution to this moment.
> >>>
> >>> I'm sorry, I don't understand.
> >>> Do you mean that v4.14 worked fine (no crash, no ioctl() error)?
> >>> Did the problem start appearing in v4.15? Or did v4.15 work and the
> >>> problem appeared in v4.16?
> >>
> >> In Telegram group I talk with Sergey and Dimka and told my the problem is come after changes from 4.14 to 4.15
> >> Sergey write this : "as I know, there was a similar issue in kernel 4.15 so maybe it is still not fixed"
> >
> > Ok, but what is your experience? Do you have a kernel version where
> > accel-ppp reports no ioctl() error and doesn't crash the kernel?
> Reported by Sergey and Dimka version 4.14 < don’t have this problem,
> Only version after 4.15.0 > have problem and with patch only fix to don’t put crash log in dimes and freeze system.

If they know about some regressions, please tell them to report them
(either to the list or directly to me). Because I'm not aware of
anything that broke with 4.15.

> >
> > There wasn't a lot of changes between 4.14 and 4.15 for PPP.
> > The only PPP patch I can see that might have been risky is commit
> > 0171c4183559 ("ppp: unlock all_ppp_mutex before registering device").
> For my changes is a atomic and skb kfree .
> >
> >> I don’t have options to test with this old kernel 4.14.xxx i don’t have support for them.
> >>
> >>
> >>>
> >>>> Please help to find the problem.
> >>>>
> >>>> Last time in link I see is make changes in ppp_generic.c
> >>>>
> >>>> ppp_lock(ppp);
> >>>> spin_lock_bh(&pch->downl);
> >>>> if (!pch->chan) {
> >>>>         /* Don't connect unregistered channels */
> >>>>         spin_unlock_bh(&pch->downl);
> >>>>         ppp_unlock(ppp);
> >>>>         ret = -ENOTCONN;
> >>>>         goto outl;
> >>>> }
> >>>> spin_unlock_bh(&pch->downl);
> >>>>
> >>>>
> >>>> But this fix only to don’t display error and freeze system
> >>>> The problem is stay and is to big.
> >>>
> >>> Do you use accel-ppp's unit-cache option?
> >>> Does the problem go away if you stop using it?
> >>>
> >>
> >> No I don’t use unit-cache , if I set unit-cache accel-ppp defunct same but user Is connect and disconnect more fast.
> >>
> >> The problem is same with unit and without .
> >> Only after this patch I don’t see error in dimes but this is not solution.
> >
> > Sorry, what's "in dimes"?
> > Do you mean that reverting commit 77f840e3e5f0 ("ppp: prevent
> > unregistered channels from connecting to PPP units") fixes your problem?
> Sorry text correct in dmesg*
>
> I don’t make any changes of ppp part of kernel 5.9.13 I use clean build for ppp .
> >
> >> In network have customer what have power cut problem, when drop 600 user and back Is normal but in this moment kernel is locking and start to make this :
> >> sessions:
> >> starting: 4235
> >> active: 3882
> >> finishing: 378
> >> The problem is starting session is not real user normal user in this server is ~4k customers .
> >
> > What type of session is it? L2TP, PPPoE, PPTP?
> >
> Session is PPPoE only with radius auth
> >
> >> I use pppd_compat .
> >>
> >> Any idea ?
> >>
> >>>>
> >>>> Please help to fix.
> >> Martin
>
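
For reference, a minimal userspace sketch of how the check quoted above relates to the log lines: the quoted ppp_generic.c hunk returns -ENOTCONN when pch->chan is NULL, and on glibc strerror(ENOTCONN) is the "Transport endpoint is not connected" text seen in the accel-ppp log. The names model_channel, model_connect_channel and model_error_text are hypothetical and exist only for this illustration; this is not the kernel code itself and it says nothing about why the channel ends up unregistered.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for the kernel's channel structure:
 * only the field that the quoted check inspects is modelled. */
struct model_channel {
    void *chan;   /* NULL once the channel has been unregistered */
};

/* Mirrors the quoted check: refuse to connect an unregistered channel.
 * Returns 0 on success or a negative errno value, kernel style. */
static int model_connect_channel(const struct model_channel *pch)
{
    assert(pch != NULL);
    if (!pch->chan)
        return -ENOTCONN;   /* Don't connect unregistered channels */
    return 0;
}

/* Text a userspace daemon would typically log for a failed call:
 * the negative return value is turned back into an errno string. */
static const char *model_error_text(int ret)
{
    return ret < 0 ? strerror(-ret) : "ok";
}
```

Calling model_connect_channel() on a channel whose chan field is NULL yields -ENOTCONN, and model_error_text() on that value gives the string the C library associates with ENOTCONN, which is what the daemon prints after a failed ioctl(PPPIOCCONNECT).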