linux-ppp.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* PPPoE Modem hangup after random time - how to debug?
@ 2020-04-22 17:45 David Balažic
  2020-04-22 19:51 ` Michael Richardson
                   ` (42 more replies)
  0 siblings, 43 replies; 44+ messages in thread
From: David Balažic @ 2020-04-22 17:45 UTC (permalink / raw)
  To: linux-ppp

Hi!

I have a router running openwrt connected to a GPON ONT, running a
PPPoE connection (hw details below).

The problem is, that after some random time one to 20 hours, the
connection breaks. Then in about a minute it reconnects (successfully,
each time).
This happens almost every day.

I enabled the debug option, but there is nothing useful in the logs:

** there is nothing related before this line **
Tue Apr 21 10:04:32 2020 daemon.notice pppd[6515]: Modem hangup
Tue Apr 21 10:04:32 2020 daemon.info pppd[6515]: Connect time 1094.4 minutes.
Tue Apr 21 10:04:32 2020 daemon.info pppd[6515]: Sent 612360704 bytes,
received 3423392867 bytes.
Tue Apr 21 10:04:32 2020 daemon.notice netifd: Network device
'pppoe-wan' link is down
Tue Apr 21 10:04:32 2020 daemon.debug pppd[6515]: Script
/lib/netifd/ppp-down started (pid 7655)
Tue Apr 21 10:04:32 2020 daemon.notice pppd[6515]: Connection terminated.
Tue Apr 21 10:04:32 2020 daemon.debug pppd[6515]: Send PPPOE Discovery
V1T1 PADT session 0x1 length 28
Tue Apr 21 10:04:32 2020 daemon.debug pppd[6515]:  dst
a4:xx:xx:xx:xx:44  src c4:xx:xx:xx:xx:ed
Tue Apr 21 10:04:32 2020 daemon.debug pppd[6515]:  [host-uniq  00 00
19 73] [AC-cookie  ed 89 xx xx xx xx xx xx xx xx xx xx xx xx xx 75]
Tue Apr 21 10:04:32 2020 daemon.info pppd[6515]: Sent PADT
Tue Apr 21 10:04:32 2020 daemon.debug pppd[6515]: Waiting for 1 child
processes...
Tue Apr 21 10:04:32 2020 daemon.debug pppd[6515]:   script
/lib/netifd/ppp-down, pid 7655
Tue Apr 21 10:04:32 2020 daemon.debug pppd[6515]: Script
/lib/netifd/ppp-down finished (pid 7655), status = 0x1
Tue Apr 21 10:04:32 2020 daemon.info pppd[6515]: Exit.


The command line is:

 /usr/sbin/pppd nodetach ipparam wan ifname pppoe-wan
lcp-echo-interval 1 lcp-echo-failure 5 lcp-echo-adaptive +ipv6 set
AUTOIPV6=1 nodefaultroute usepeerdns maxfail 1 user XXXXXX password
YYYYYY ip-up-script /lib/netifd/ppp-up ipv6-up-script
/lib/netifd/ppp6-up ip-down-script /lib/netifd/ppp-down
ipv6-down-script /lib/netifd/ppp-down mtu 1492 mru 1492 plugin
rp-pppoe.so nic-eth1.3902

(the PPPoE goes over VLAN 3902, pppd version is  2.4.7)

I wrote about this on the openwrt forum [1], not much progress unfortunately.

Any ides? How to proceed? Sniff packets?

What does normally prompt pppd to to print "Modem hangup" ?

Thanks in advance,
David

[1] https://forum.openwrt.org/t/pppoe-disconnects-every-few-hours/61239

Setup details:

ONT: Innbox G21 (provided by ISP)
router: Netgear WNDR 3700v2 running OpenWRT 19.7.2
ISP provided router (not in use): Ubee TVW625  - I did not see such
disconnections when using this router (I ran a script calling ping in
a loop, for days, logging output, not problem except a lost (single)
packet every now and then)

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
@ 2020-04-22 19:51 ` Michael Richardson
  2020-04-22 22:00 ` James Carlson
                   ` (41 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Michael Richardson @ 2020-04-22 19:51 UTC (permalink / raw)
  To: linux-ppp

[-- Attachment #1: Type: text/plain, Size: 1117 bytes --]


Hi, I found that some PPPoE BMS do not emit IPCP echo responses if there is
also traffic.  Since I have a static IPv4/v6, and there is DNS requests and
script kiddies, there is *ALWAYS* half a megabit/s of traffic, which is
probably atypical for many residential users who get random IPs.
I.e. they have some sort of "lcp-echo-adaptive" going on, but not in a good
way.

I had to force lcp-echo-interval to 0, which was a bit difficult to get right
in the OpenWRT UI.  I had tried to increase echo-failure to 30, but that
didn't help consistently.

The other problem is that Bell Canada only answers PADI's once every five
minutes.  It's total BS.

Modem Hangup would normally be because the DCD signal went down, but on
PPPoE, that can't happen, I think.  I did see that when LCP failed, but I
also had debugging that told me it had failed.  maybe that's just off for
you.

--
]               Never tell me the odds!                 | ipv6 mesh networks [
]   Michael Richardson, Sandelman Software Works        |    IoT architect   [
]     mcr@sandelman.ca  http://www.sandelman.ca/        |   ruby on rails    [


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 487 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
  2020-04-22 19:51 ` Michael Richardson
@ 2020-04-22 22:00 ` James Carlson
  2020-04-23  0:08 ` Guillaume Nault
                   ` (40 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: James Carlson @ 2020-04-22 22:00 UTC (permalink / raw)
  To: linux-ppp

On 2020-04-22 13:45, David Balažic wrote:
> Hi!
> 
> I have a router running openwrt connected to a GPON ONT, running a
> PPPoE connection (hw details below).

This looks to be (possibly) a PPPoE failure, though there's nothing
interesting in the log messages provided.  If there were an interesting
log message, it would almost certainly come somewhere (perhaps even
*minutes*) before that initial "Modem hangup" message.

The definitive test would be to run a packet capture on the Ethernet
interface itself (*not* on the PPP interface) with something like
Wireshark and determine what happens in the lead-up to the failure.  I'd
expect the system is just getting a stray PADT from the peer, if it's
some kind of PPPoE problem.

If it isn't, then possibly it's something else.  I think that the last
time I looked at the PPPoE implementation on Linux it was a bit hokey --
it ran PPP over a pty pair and then decoded the framing in user space
and wrote it back out over Ethernet using PPPoE.  I hope it's not still
like that, as I haven't looked at it in years, but it may well be.  An
internal error in that logic could also cause a "hangup" message,
although hopefully along with some kind of system log about a core file
as well.

(The implementation on other platforms, such as Solaris, was a bit more
solid.)

-- 
James Carlson         42.703N 71.076W         <carlsonj@workingcode.com>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
  2020-04-22 19:51 ` Michael Richardson
  2020-04-22 22:00 ` James Carlson
@ 2020-04-23  0:08 ` Guillaume Nault
  2020-04-23  0:41 ` Michael Richardson
                   ` (39 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Guillaume Nault @ 2020-04-23  0:08 UTC (permalink / raw)
  To: linux-ppp

On Wed, Apr 22, 2020 at 06:00:52PM -0400, James Carlson wrote:
> On 2020-04-22 13:45, David Balažic wrote:
> > Hi!
> > 
> > I have a router running openwrt connected to a GPON ONT, running a
> > PPPoE connection (hw details below).
> 
> This looks to be (possibly) a PPPoE failure, though there's nothing
> interesting in the log messages provided.  If there were an interesting
> log message, it would almost certainly come somewhere (perhaps even
> *minutes*) before that initial "Modem hangup" message.
> 
> The definitive test would be to run a packet capture on the Ethernet
> interface itself (*not* on the PPP interface) with something like
> Wireshark and determine what happens in the lead-up to the failure.  I'd
> expect the system is just getting a stray PADT from the peer, if it's
> some kind of PPPoE problem.
> 
I'd also suspect that the peer just sent a PADT. That can be
intentional. Some providers proactively force reconnections
every x days. Though 20 hours looks like a really low timeout.
You can capture PPPoE discovery packets with
"tcpdump -i nic-eth1.3902 pppoed".

Another possibility is that the network interface is modified
(interface passed down, MTU or the MAC address changed) while the
PPPoE session is established.

> If it isn't, then possibly it's something else.  I think that the last
> time I looked at the PPPoE implementation on Linux it was a bit hokey --
> it ran PPP over a pty pair and then decoded the framing in user space
> and wrote it back out over Ethernet using PPPoE.  I hope it's not still
> like that, as I haven't looked at it in years, but it may well be.  An
> internal error in that logic could also cause a "hangup" message,
> although hopefully along with some kind of system log about a core file
> as well.
> 
I didn't know about that implementation. PPPoE has been implemented
using PPPOX sockets since at least the beginning of the git history.

> (The implementation on other platforms, such as Solaris, was a bit more
> solid.)
> 
> -- 
> James Carlson         42.703N 71.076W         <carlsonj@workingcode.com>
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (2 preceding siblings ...)
  2020-04-23  0:08 ` Guillaume Nault
@ 2020-04-23  0:41 ` Michael Richardson
  2020-04-23  2:01 ` James Carlson
                   ` (38 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Michael Richardson @ 2020-04-23  0:41 UTC (permalink / raw)
  To: linux-ppp

[-- Attachment #1: Type: text/plain, Size: 1271 bytes --]


James Carlson <carlsonj@workingcode.com> wrote:
    > If it isn't, then possibly it's something else.  I think that the last
    > time I looked at the PPPoE implementation on Linux it was a bit hokey --
    > it ran PPP over a pty pair and then decoded the framing in user space
    > and wrote it back out over Ethernet using PPPoE.  I hope it's not still
    > like that, as I haven't looked at it in years, but it may well be.  An
    > internal error in that logic could also cause a "hangup" message,
    > although hopefully along with some kind of system log about a core file
    > as well.

No, that's not how PPPoE has worked for at least 15 years.
There is a PPPoE socket and the IP packets do not leave the kernel.
I have built BMS systems that handle thousands of sessions on 8-core
systems.

We had to move to 64-bit kernels because we needed such a lot of control
structures that we'd run out 32-bit kernel memory long before we ran out of
CPU.  (this was for oversubscribed rather slow last miles, not VDSL2)

--
]               Never tell me the odds!                 | ipv6 mesh networks [
]   Michael Richardson, Sandelman Software Works        |    IoT architect   [
]     mcr@sandelman.ca  http://www.sandelman.ca/        |   ruby on rails    [




[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 487 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (3 preceding siblings ...)
  2020-04-23  0:41 ` Michael Richardson
@ 2020-04-23  2:01 ` James Carlson
  2020-04-23 10:37 ` David Balažic
                   ` (37 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: James Carlson @ 2020-04-23  2:01 UTC (permalink / raw)
  To: linux-ppp

On 4/22/20 8:41 PM, Michael Richardson wrote:
> 
> James Carlson <carlsonj@workingcode.com> wrote:
>      > If it isn't, then possibly it's something else.  I think that the last
>      > time I looked at the PPPoE implementation on Linux it was a bit hokey --
>      > it ran PPP over a pty pair and then decoded the framing in user space
>      > and wrote it back out over Ethernet using PPPoE.  I hope it's not still
>      > like that, as I haven't looked at it in years, but it may well be.  An
>      > internal error in that logic could also cause a "hangup" message,
>      > although hopefully along with some kind of system log about a core file
>      > as well.
> 
> No, that's not how PPPoE has worked for at least 15 years.
> There is a PPPoE socket and the IP packets do not leave the kernel.
> I have built BMS systems that handle thousands of sessions on 8-core
> systems.

OK; thanks for the correction, and it's good to hear.  The ahdlc hacks 
I've seen were just inherently unstable.

That makes it ever more likely that this is just the provider's doing.

-- 
James Carlson         42.703N 71.076W         <carlsonj@workingcode.com>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (4 preceding siblings ...)
  2020-04-23  2:01 ` James Carlson
@ 2020-04-23 10:37 ` David Balažic
  2020-04-23 11:12 ` David Balažic
                   ` (36 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: David Balažic @ 2020-04-23 10:37 UTC (permalink / raw)
  To: linux-ppp

Some more info:

 - the connect time is random:
# logread  | grep "Connect time"
Wed Apr 22 16:42:18 2020 daemon.info pppd[7960]: Connect time 1742.8 minutes.
Wed Apr 22 19:13:03 2020 daemon.info pppd[9563]: Connect time 150.7 minutes.
Thu Apr 23 09:38:31 2020 daemon.info pppd[9884]: Connect time 865.5 minutes.



 - the assigned IP address is different on each connection


 - logs
On Thu, 23 Apr 2020 at 00:00, James Carlson <carlsonj@workingcode.com> wrote:
>
> On 2020-04-22 13:45, David Balažic wrote:
> > Hi!
> >
> > I have a router running openwrt connected to a GPON ONT, running a
> > PPPoE connection (hw details below).
>
> This looks to be (possibly) a PPPoE failure, though there's nothing
> interesting in the log messages provided.  If there were an interesting
> log message, it would almost certainly come somewhere (perhaps even
> *minutes*) before that initial "Modem hangup" message.

Nothing there:

Thu Apr 23 08:59:02 2020 daemon.info hostapd: wlan0: STA
58:XX:XX:XX:XX:b9 IEEE 802.11: deauthenticated due to inactivity
(timer DEAUTH/REMOVE)
Thu Apr 23 09:38:31 2020 daemon.notice pppd[9884]: Modem hangup

I'll try packet capture next.

Regards,
David

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (5 preceding siblings ...)
  2020-04-23 10:37 ` David Balažic
@ 2020-04-23 11:12 ` David Balažic
  2020-04-23 12:13 ` David Balažic
                   ` (35 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: David Balažic @ 2020-04-23 11:12 UTC (permalink / raw)
  To: linux-ppp

One more thing I noticed:

the packet count for eth1 is much higher than for eth1.3902:


eth1      Link encap:Ethernet  HWaddr C4:XX:XX:XX:XX:ED
          inet6 addr: fe80::c6xx:xxff:fexx:xxed/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:23147175 errors:0 dropped:0 overruns:3 frame:0
          TX packets:12334230 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3553425618 (3.3 GiB)  TX bytes:4237688514 (3.9 GiB)
          Interrupt:5

eth1.3902 Link encap:Ethernet  HWaddr C4:XX:XX:XX:XX:ED
          inet6 addr: fe80::c6xx:xxff:fexx:xxed/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:682037 errors:0 dropped:0 overruns:0 frame:0
          TX packets:91218 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:373700490 (356.3 MiB)  TX bytes:26176360 (24.9 MiB)

As if the eth1.3902 is somehow reset every now and then. I'll keep my eye on it.


Also: the IPv6 address (the part derived from hwaddr) starts with c6
instead of c4 like the actual hw address. Is that normal?



On Thu, 23 Apr 2020 at 12:37, David Balažic <xerces9@gmail.com> wrote:
>
> Some more info:
>
>  - the connect time is random:
> # logread  | grep "Connect time"
> Wed Apr 22 16:42:18 2020 daemon.info pppd[7960]: Connect time 1742.8 minutes.
> Wed Apr 22 19:13:03 2020 daemon.info pppd[9563]: Connect time 150.7 minutes.
> Thu Apr 23 09:38:31 2020 daemon.info pppd[9884]: Connect time 865.5 minutes.
>
>
>
>  - the assigned IP address is different on each connection
>
>
>  - logs
> On Thu, 23 Apr 2020 at 00:00, James Carlson <carlsonj@workingcode.com> wrote:
> >
> > On 2020-04-22 13:45, David Balažic wrote:
> > > Hi!
> > >
> > > I have a router running openwrt connected to a GPON ONT, running a
> > > PPPoE connection (hw details below).
> >
> > This looks to be (possibly) a PPPoE failure, though there's nothing
> > interesting in the log messages provided.  If there were an interesting
> > log message, it would almost certainly come somewhere (perhaps even
> > *minutes*) before that initial "Modem hangup" message.
>
> Nothing there:
>
> Thu Apr 23 08:59:02 2020 daemon.info hostapd: wlan0: STA
> 58:XX:XX:XX:XX:b9 IEEE 802.11: deauthenticated due to inactivity
> (timer DEAUTH/REMOVE)
> Thu Apr 23 09:38:31 2020 daemon.notice pppd[9884]: Modem hangup
>
> I'll try packet capture next.
>
> Regards,
> David

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (6 preceding siblings ...)
  2020-04-23 11:12 ` David Balažic
@ 2020-04-23 12:13 ` David Balažic
  2020-04-23 15:59 ` Michael Richardson
                   ` (34 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: David Balažic @ 2020-04-23 12:13 UTC (permalink / raw)
  To: linux-ppp

Oh, i'm "lucky", shortly after starting "tcpdump -i eth1.3902 pppoed"
there was another disconnect:

Thu Apr 23 11:44:13 2020 daemon.notice pppd[10756]: Modem hangup

tcpdump output:

listening on eth1.3902, link-type EN10MB (Ethernet), capture size 262144 bytes
11:44:09.749322 PPPoE PADO [Service-Name] [AC-Name "SIMB_TABOR_BNG1"]
[Host-Uniq 0x44************long_number******************************AA]
[AC-Cookie ".5b************v"]
11:44:09.754297 PPPoE PADS [ses 0x1] [Service-Name] [Host-Uniq
0x44*******************long_number*********************************AA]
11:44:13.115312 PPPoE PADT [ses 0x1]
11:44:13.126422 PPPoE PADT [ses 0x1] [Host-Uniq 0x00002A04] [AC-Cookie
0xED****************************75]
tcpdump: pcap_loop: The interface went down
4 packets captured
22 packets received by filter
0 packets dropped by kernel


The ifconfig packet counter for the VLAN interface eth1.3902 is reset,
compare to values in quested message below, this is th eoutpu a few
minutes after the reconnect:

eth1      Link encap:Ethernet  HWaddr C4:3D:C7:90:CE:ED
          inet6 addr: fe80::c63d:c7ff:fe90:ceed/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:24193753 errors:0 dropped:0 overruns:3 frame:0
          TX packets:12513809 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:4171215264 (3.8 GiB)  TX bytes:2485973 (2.3 MiB)
          Interrupt:5

eth1.3902 Link encap:Ethernet  HWaddr C4:3D:C7:90:CE:ED
          inet6 addr: fe80::c63d:c7ff:fe90:ceed/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:255930 errors:0 dropped:0 overruns:0 frame:0
          TX packets:42015 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:126494373 (120.6 MiB)  TX bytes:25853728 (24.6 MiB)


PS: The IPv6 address is according to standards, ignore that question.



On Thu, 23 Apr 2020 at 13:12, David Balažic <xerces9@gmail.com> wrote:
>
> One more thing I noticed:
>
> the packet count for eth1 is much higher than for eth1.3902:
>
>
> eth1      Link encap:Ethernet  HWaddr C4:XX:XX:XX:XX:ED
>           inet6 addr: fe80::c6xx:xxff:fexx:xxed/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:23147175 errors:0 dropped:0 overruns:3 frame:0
>           TX packets:12334230 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:3553425618 (3.3 GiB)  TX bytes:4237688514 (3.9 GiB)
>           Interrupt:5
>
> eth1.3902 Link encap:Ethernet  HWaddr C4:XX:XX:XX:XX:ED
>           inet6 addr: fe80::c6xx:xxff:fexx:xxed/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:682037 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:91218 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:373700490 (356.3 MiB)  TX bytes:26176360 (24.9 MiB)
>
> As if the eth1.3902 is somehow reset every now and then. I'll keep my eye on it.
>
>
> Also: the IPv6 address (the part derived from hwaddr) starts with c6
> instead of c4 like the actual hw address. Is that normal?
>
>
>
> On Thu, 23 Apr 2020 at 12:37, David Balažic <xerces9@gmail.com> wrote:
> >
> > Some more info:
> >
> >  - the connect time is random:
> > # logread  | grep "Connect time"
> > Wed Apr 22 16:42:18 2020 daemon.info pppd[7960]: Connect time 1742.8 minutes.
> > Wed Apr 22 19:13:03 2020 daemon.info pppd[9563]: Connect time 150.7 minutes.
> > Thu Apr 23 09:38:31 2020 daemon.info pppd[9884]: Connect time 865.5 minutes.
> >
> >
> >
> >  - the assigned IP address is different on each connection
> >
> >
> >  - logs
> > On Thu, 23 Apr 2020 at 00:00, James Carlson <carlsonj@workingcode.com> wrote:
> > >
> > > On 2020-04-22 13:45, David Balažic wrote:
> > > > Hi!
> > > >
> > > > I have a router running openwrt connected to a GPON ONT, running a
> > > > PPPoE connection (hw details below).
> > >
> > > This looks to be (possibly) a PPPoE failure, though there's nothing
> > > interesting in the log messages provided.  If there were an interesting
> > > log message, it would almost certainly come somewhere (perhaps even
> > > *minutes*) before that initial "Modem hangup" message.
> >
> > Nothing there:
> >
> > Thu Apr 23 08:59:02 2020 daemon.info hostapd: wlan0: STA
> > 58:XX:XX:XX:XX:b9 IEEE 802.11: deauthenticated due to inactivity
> > (timer DEAUTH/REMOVE)
> > Thu Apr 23 09:38:31 2020 daemon.notice pppd[9884]: Modem hangup
> >
> > I'll try packet capture next.
> >
> > Regards,
> > David

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (7 preceding siblings ...)
  2020-04-23 12:13 ` David Balažic
@ 2020-04-23 15:59 ` Michael Richardson
  2020-04-23 16:01 ` Michael Richardson
                   ` (33 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Michael Richardson @ 2020-04-23 15:59 UTC (permalink / raw)
  To: linux-ppp


David Balažic <xerces9@gmail.com> wrote:
    > Also: the IPv6 address (the part derived from hwaddr) starts with c6
    > instead of c4 like the actual hw address. Is that normal?

There is a bit in the IPv6 IID (bit 2), which is the L/G bit, and says if the
IID is considered globally unique.

--
]               Never tell me the odds!                 | ipv6 mesh networks [
]   Michael Richardson, Sandelman Software Works        |    IoT architect   [
]     mcr@sandelman.ca  http://www.sandelman.ca/        |   ruby on rails    [

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (8 preceding siblings ...)
  2020-04-23 15:59 ` Michael Richardson
@ 2020-04-23 16:01 ` Michael Richardson
  2020-04-24 13:47 ` David Balažic
                   ` (32 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Michael Richardson @ 2020-04-23 16:01 UTC (permalink / raw)
  To: linux-ppp


David Balažic <xerces9@gmail.com> wrote:
    > tcpdump: pcap_loop: The interface went down
    > 4 packets captured
    > 22 packets received by filter
    > 0 packets dropped by kernel

    > The ifconfig packet counter for the VLAN interface eth1.3902 is reset,
    > compare to values in quested message below, this is th eoutpu a few
    > minutes after the reconnect:

.. your counter reset suggests that somehow your VLAN interface is being
killed and restarted.
That's really weird.

This is openwrt?

--
]               Never tell me the odds!                 | ipv6 mesh networks [
]   Michael Richardson, Sandelman Software Works        |    IoT architect   [
]     mcr@sandelman.ca  http://www.sandelman.ca/        |   ruby on rails    [

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (9 preceding siblings ...)
  2020-04-23 16:01 ` Michael Richardson
@ 2020-04-24 13:47 ` David Balažic
  2020-04-24 14:02 ` David Balažic
                   ` (31 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: David Balažic @ 2020-04-24 13:47 UTC (permalink / raw)
  To: linux-ppp

On Thu, 23 Apr 2020 at 21:36, David Balažic <xerces9@gmail.com> wrote:
>
> OK, more data:
> I started tcpdump with the -e option as suggested on openwrt forum:
> tcpdump -e -v -i eth1.3902 pppoed
>
> on disconnect, this was logged (times are UTC):
>
>
> tcpdump: listening on eth1.3902, link-type EN10MB (Ethernet), capture
> size 262144 bytes
> 18:23:09.204991 a4:7b:2c:9e:c7:44 (oui Unknown) > 44:4e:6d:fd:c7:39
> (oui Unknown), ethertype PPPoE D (0x8863), length 97: PPPoE PADO
> [Service-Name] [AC-Name "SIMB_TABOR_BNG1"] [Host-Uniq
> 0x44************long_number******************************AA]
> [AC-Cookie ".5b************v"]

Isnt this strange? The dst addr is not my router or any other known.
Also the Host-Uniq value is different.
As if is traffic meant for someone else.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (10 preceding siblings ...)
  2020-04-24 13:47 ` David Balažic
@ 2020-04-24 14:02 ` David Balažic
  2020-04-24 14:26 ` James Carlson
                   ` (30 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: David Balažic @ 2020-04-24 14:02 UTC (permalink / raw)
  To: linux-ppp

Sorry for another post, but I just noticed I had a disconnect again,
this time I captured the LCP traffic with: tcpdump -evpni eth1.3902
'not ether[0x14:2] = 0x21 and not ether[0x14:2] = 0x57'


10:42:01.795885 c4:xx:xx:xx:xx:ed > a4:xx:xx:xx:xx:44, ethertype PPPoE
S (0x8864), length 30: PPPoE  [ses 0x1] LCP (0xc021), length 10: LCP,
Echo-Request (0x09), id 20, length 10
        encoded length 8 (=Option(s) length 4)
          Magic-Num 0x277f6119
10:42:01.807239 a4:xx:xx:xx:xx:44 > c4:xx:xx:xx:xx:ed, ethertype PPPoE
S (0x8864), length 56: PPPoE  [ses 0x1] LCP (0xc021), length 10: LCP,
Echo-Reply (0x0a), id 20, length 10
        encoded length 8 (=Option(s) length 4)
          Magic-Num 0x631aa5be
10:42:04.799255 c4:xx:xx:xx:xx:ed > a4:xx:xx:xx:xx:44, ethertype PPPoE
S (0x8864), length 30: PPPoE  [ses 0x1] LCP (0xc021), length 10: LCP,
Echo-Request (0x09), id 21, length 10
        encoded length 8 (=Option(s) length 4)
          Magic-Num 0x277f6119
10:42:04.809234 a4:xx:xx:xx:xx:44 > c4:xx:xx:xx:xx:ed, ethertype PPPoE
S (0x8864), length 56: PPPoE  [ses 0x1] LCP (0xc021), length 10: LCP,
Echo-Reply (0x0a), id 21, length 10
        encoded length 8 (=Option(s) length 4)
          Magic-Num 0x631aa5be
10:42:06.801533 c4:xx:xx:xx:xx:ed > a4:xx:xx:xx:xx:44, ethertype PPPoE
S (0x8864), length 30: PPPoE  [ses 0x1] LCP (0xc021), length 10: LCP,
Echo-Request (0x09), id 22, length 10
        encoded length 8 (=Option(s) length 4)
          Magic-Num 0x277f6119
10:42:06.807236 a4:xx:xx:xx:xx:44 > c4:xx:xx:xx:xx:ed, ethertype PPPoE
S (0x8864), length 56: PPPoE  [ses 0x1] LCP (0xc021), length 10: LCP,
Echo-Reply (0x0a), id 22, length 10
        encoded length 8 (=Option(s) length 4)
          Magic-Num 0x631aa5be
10:42:08.803817 c4:xx:xx:xx:xx:ed > a4:xx:xx:xx:xx:44, ethertype PPPoE
S (0x8864), length 30: PPPoE  [ses 0x1] LCP (0xc021), length 10: LCP,
Echo-Request (0x09), id 23, length 10
        encoded length 8 (=Option(s) length 4)
          Magic-Num 0x277f6119
10:42:08.817232 a4:xx:xx:xx:xx:44 > c4:xx:xx:xx:xx:ed, ethertype PPPoE
S (0x8864), length 56: PPPoE  [ses 0x1] LCP (0xc021), length 10: LCP,
Echo-Reply (0x0a), id 23, length 10
        encoded length 8 (=Option(s) length 4)
          Magic-Num 0x631aa5be
tcpdump: pcap_loop: The interface went down
52753 packets captured
52760 packets received by filter
0 packets dropped by kernel

ppp log:


Fri Apr 24 10:42:32 2020 daemon.notice pppd[11986]: Modem hangup
Fri Apr 24 10:42:32 2020 daemon.info pppd[11986]: Connect time 979.3 minutes.
Fri Apr 24 10:42:32 2020 daemon.info pppd[11986]: Sent 325423487
bytes, received 1643504692 bytes.

It seems echo requests come more often towards the end, almost every
second. Earlier they are also irregular, but come at 30-60 second
intervals.



On Fri, 24 Apr 2020 at 15:47, David Balažic <xerces9@gmail.com> wrote:
>
> On Thu, 23 Apr 2020 at 21:36, David Balažic <xerces9@gmail.com> wrote:
> >
> > OK, more data:
> > I started tcpdump with the -e option as suggested on openwrt forum:
> > tcpdump -e -v -i eth1.3902 pppoed
> >
> > on disconnect, this was logged (times are UTC):
> >
> >
> > tcpdump: listening on eth1.3902, link-type EN10MB (Ethernet), capture
> > size 262144 bytes
> > 18:23:09.204991 a4:7b:2c:9e:c7:44 (oui Unknown) > 44:4e:6d:fd:c7:39
> > (oui Unknown), ethertype PPPoE D (0x8863), length 97: PPPoE PADO
> > [Service-Name] [AC-Name "SIMB_TABOR_BNG1"] [Host-Uniq
> > 0x44************long_number******************************AA]
> > [AC-Cookie ".5b************v"]
>
> Isnt this strange? The dst addr is not my router or any other known.
> Also the Host-Uniq value is different.
> As if is traffic meant for someone else.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (11 preceding siblings ...)
  2020-04-24 14:02 ` David Balažic
@ 2020-04-24 14:26 ` James Carlson
  2020-04-24 15:44 ` Guillaume Nault
                   ` (29 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: James Carlson @ 2020-04-24 14:26 UTC (permalink / raw)
  To: linux-ppp

On 2020-04-24 10:02, David Balažic wrote:
> ppp log:
> 
> 
> Fri Apr 24 10:42:32 2020 daemon.notice pppd[11986]: Modem hangup
> Fri Apr 24 10:42:32 2020 daemon.info pppd[11986]: Connect time 979.3 minutes.
> Fri Apr 24 10:42:32 2020 daemon.info pppd[11986]: Sent 325423487
> bytes, received 1643504692 bytes.

Truncated logs aren't so helpful.  This tells us nothing about what's
going on.

> It seems echo requests come more often towards the end, almost every
> second. Earlier they are also irregular, but come at 30-60 second
> intervals.

LCP Echo-Request messages are an optional feature.  They're controlled
by the lcp-echo-interval and lcp-echo-failure options.  By default, pppd
won't send these, but will respond if the other side sends them.

The idea is that if you send a request, and the other side fails to
respond, then, regardless of what the hardware is telling you, the link
itself is dead, and can be torn down.

However, in the exchange you've shown here, every request has a
response, and all of the requests and responses appear to be in good
order (each side is attaching its own LCP magic number option, which is
the right thing to see, and the ID numbers line up for request/reply).

There's nothing out of what you've sent that indicates (to me, at least)
any sort of problem, at least within PPP itself.

-- 
James Carlson         42.703N 71.076W         <carlsonj@workingcode.com>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (12 preceding siblings ...)
  2020-04-24 14:26 ` James Carlson
@ 2020-04-24 15:44 ` Guillaume Nault
  2020-04-24 15:54 ` Guillaume Nault
                   ` (28 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Guillaume Nault @ 2020-04-24 15:44 UTC (permalink / raw)
  To: linux-ppp

On Fri, Apr 24, 2020 at 03:47:34PM +0200, David Balažic wrote:
> On Thu, 23 Apr 2020 at 21:36, David Balažic <xerces9@gmail.com> wrote:
> >
> > OK, more data:
> > I started tcpdump with the -e option as suggested on openwrt forum:
> > tcpdump -e -v -i eth1.3902 pppoed
> >
> > on disconnect, this was logged (times are UTC):
> >
> >
> > tcpdump: listening on eth1.3902, link-type EN10MB (Ethernet), capture
> > size 262144 bytes
> > 18:23:09.204991 a4:7b:2c:9e:c7:44 (oui Unknown) > 44:4e:6d:fd:c7:39
> > (oui Unknown), ethertype PPPoE D (0x8863), length 97: PPPoE PADO
> > [Service-Name] [AC-Name "SIMB_TABOR_BNG1"] [Host-Uniq
> > 0x44************long_number******************************AA]
> > [AC-Cookie ".5b************v"]
> 
> Isnt this strange? The dst addr is not my router or any other known.
> Also the Host-Uniq value is different.
> As if is traffic meant for someone else.
> 
You're probably not supposed to receive this frame, but a PADO
shouldn't disconnect you. If that was a PADT, then yes, it'd be a good
suspect.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (13 preceding siblings ...)
  2020-04-24 15:44 ` Guillaume Nault
@ 2020-04-24 15:54 ` Guillaume Nault
  2020-04-26  1:38 ` Michael Richardson
                   ` (27 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Guillaume Nault @ 2020-04-24 15:54 UTC (permalink / raw)
  To: linux-ppp

On Thu, Apr 23, 2020 at 02:13:14PM +0200, David Balažic wrote:
> Oh, i'm "lucky", shortly after starting "tcpdump -i eth1.3902 pppoed"
> there was another disconnect:
> 
> Thu Apr 23 11:44:13 2020 daemon.notice pppd[10756]: Modem hangup
> 
> tcpdump output:
> 
> listening on eth1.3902, link-type EN10MB (Ethernet), capture size 262144 bytes
> 11:44:09.749322 PPPoE PADO [Service-Name] [AC-Name "SIMB_TABOR_BNG1"]
> [Host-Uniq 0x44************long_number******************************AA]
> [AC-Cookie ".5b************v"]
> 11:44:09.754297 PPPoE PADS [ses 0x1] [Service-Name] [Host-Uniq
> 0x44*******************long_number*********************************AA]
> 11:44:13.115312 PPPoE PADT [ses 0x1]
> 11:44:13.126422 PPPoE PADT [ses 0x1] [Host-Uniq 0x00002A04] [AC-Cookie
> 0xED****************************75]
> tcpdump: pcap_loop: The interface went down
> 4 packets captured
> 22 packets received by filter
> 0 packets dropped by kernel
> 
> 
> The ifconfig packet counter for the VLAN interface eth1.3902 is reset,
> compare to values in quested message below, this is th eoutpu a few
> minutes after the reconnect:
> 
> eth1      Link encap:Ethernet  HWaddr C4:3D:C7:90:CE:ED
>           inet6 addr: fe80::c63d:c7ff:fe90:ceed/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:24193753 errors:0 dropped:0 overruns:3 frame:0
>           TX packets:12513809 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:4171215264 (3.8 GiB)  TX bytes:2485973 (2.3 MiB)
>           Interrupt:5
> 
> eth1.3902 Link encap:Ethernet  HWaddr C4:3D:C7:90:CE:ED
>           inet6 addr: fe80::c63d:c7ff:fe90:ceed/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:255930 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:42015 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:126494373 (120.6 MiB)  TX bytes:25853728 (24.6 MiB)
> 
You can monitor the evolution of you network devices with rtmon:
$ rtmon file ./some/file.log link

Then read it with ip monitor:
$ ip monitor file ./some/file.log

That should tell if eth1.3902 is ever passed down or even deleted.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (14 preceding siblings ...)
  2020-04-24 15:54 ` Guillaume Nault
@ 2020-04-26  1:38 ` Michael Richardson
  2020-04-26 12:44 ` David Balažic
                   ` (26 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Michael Richardson @ 2020-04-26  1:38 UTC (permalink / raw)
  To: linux-ppp

[-- Attachment #1: Type: text/plain, Size: 1607 bytes --]


David Balažic <xerces9@gmail.com> wrote:
    >> OK, more data:
    >> I started tcpdump with the -e option as suggested on openwrt forum:
    >> tcpdump -e -v -i eth1.3902 pppoed
    >>
    >> on disconnect, this was logged (times are UTC):
    >>
    >>
    >> tcpdump: listening on eth1.3902, link-type EN10MB (Ethernet), capture
    >> size 262144 bytes
    >> 18:23:09.204991 a4:7b:2c:9e:c7:44 (oui Unknown) > 44:4e:6d:fd:c7:39
    >> (oui Unknown), ethertype PPPoE D (0x8863), length 97: PPPoE PADO
    >> [Service-Name] [AC-Name "SIMB_TABOR_BNG1"] [Host-Uniq
    >> 0x44************long_number******************************AA]
    >> [AC-Cookie ".5b************v"]

    > Isnt this strange? The dst addr is not my router or any other known.
    > Also the Host-Uniq value is different.
    > As if is traffic meant for someone else.

oh, I understand.
thank you for noting this.
So, it's as if you are getting a PPPoE message for someone else, and when
that happens, the interface is dying.
Is it always the same ethernet address?

Do you have ebtables?  Could you arrange to filter out packets like that?
That would point to there being some kernel bug.

Clearly, your ISP has some other bug that they are sending stuff down the
wrong pipe, but that could just be hash collisions that they assume are
"harmless".

--
]               Never tell me the odds!                 | ipv6 mesh networks [
]   Michael Richardson, Sandelman Software Works        |    IoT architect   [
]     mcr@sandelman.ca  http://www.sandelman.ca/        |   ruby on rails    [




[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 487 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (15 preceding siblings ...)
  2020-04-26  1:38 ` Michael Richardson
@ 2020-04-26 12:44 ` David Balažic
  2020-04-26 12:48 ` David Balažic
                   ` (25 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: David Balažic @ 2020-04-26 12:44 UTC (permalink / raw)
  To: linux-ppp

On Sun, 26 Apr 2020 at 03:34, Michael Richardson <mcr@sandelman.ca> wrote:
>
> David Balažic <xerces9@gmail.com> wrote:
>     > OK, more data:
>     > I started tcpdump with the -e option as suggested on openwrt forum:
>     > tcpdump -e -v -i eth1.3902 pppoed
>
> tcpdump isn't going to be able to see what happens after the interface goes
> down.
>         tcpdump -e -v -i eth1 vlan 3902 pppoed
>
> ought to work slightly better, but due to the way vlans are evaluated, might
> not work properly.  {as maintainer, it's a major PITA...}

tcpdump -e -v -i eth1 vlan 3902 and pppoed

seems to work, it started it and wait for results. (just "tcpdump -e
-v -i eth1 vlan 3902" shows a lot of packets)

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (16 preceding siblings ...)
  2020-04-26 12:44 ` David Balažic
@ 2020-04-26 12:48 ` David Balažic
  2020-04-26 19:03 ` David Balažic
                   ` (24 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: David Balažic @ 2020-04-26 12:48 UTC (permalink / raw)
  To: linux-ppp

On Sun, 26 Apr 2020 at 03:38, Michael Richardson <mcr@sandelman.ca> wrote:
>
>
> David Balažic <xerces9@gmail.com> wrote:
>     >> OK, more data:
>     >> I started tcpdump with the -e option as suggested on openwrt forum:
>     >> tcpdump -e -v -i eth1.3902 pppoed
>     >>
>     >> on disconnect, this was logged (times are UTC):
>     >>
>     >>
>     >> tcpdump: listening on eth1.3902, link-type EN10MB (Ethernet), capture
>     >> size 262144 bytes
>     >> 18:23:09.204991 a4:7b:2c:9e:c7:44 (oui Unknown) > 44:4e:6d:fd:c7:39
>     >> (oui Unknown), ethertype PPPoE D (0x8863), length 97: PPPoE PADO
>     >> [Service-Name] [AC-Name "SIMB_TABOR_BNG1"] [Host-Uniq
>     >> 0x44************long_number******************************AA]
>     >> [AC-Cookie ".5b************v"]
>
>     > Isnt this strange? The dst addr is not my router or any other known.
>     > Also the Host-Uniq value is different.
>     > As if is traffic meant for someone else.
>
> oh, I understand.
> thank you for noting this.
> So, it's as if you are getting a PPPoE message for someone else, and when
> that happens, the interface is dying.
> Is it always the same ethernet address?

Yes,the same.

> Do you have ebtables?  Could you arrange to filter out packets like that?
> That would point to there being some kernel bug.
>
> Clearly, your ISP has some other bug that they are sending stuff down the
> wrong pipe, but that could just be hash collisions that they assume are
> "harmless".

Those packets don't seem to be a problem. In the last connection, I
received 15 of them and no reconnect.

As suggested on OpenWRT, I removed a IPv6 wan device config that was
running DHCPv6 on eth1.

Then he connection was up for 45 hours before dropping. Could be a
coincidence...

I posted the logs at
https://forum.openwrt.org/t/pppoe-disconnects-every-few-hours/61239/22
(gmail wraps lines...)

Regards,
David

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (17 preceding siblings ...)
  2020-04-26 12:48 ` David Balažic
@ 2020-04-26 19:03 ` David Balažic
  2020-04-27  2:14 ` Michael Richardson
                   ` (23 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: David Balažic @ 2020-04-26 19:03 UTC (permalink / raw)
  To: linux-ppp

OK, next round.

First:
> Do you have ebtables?  Could you arrange to filter out packets like that?
> That would point to there being some kernel bug.
They seems to "arrive" only if the interface is put into promiscuous mode.
That is tcpdump  captures them unless I add the -p option.
I will recheck this to be sure.


The current disconnect:

 - output of  tcpdump -e -v -i eth1 vlan 3902 and  pppoed:

tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size
262144 bytes
17:34:34.805424 a4:7b:2c:9e:c7:44 (oui Unknown) > 44:4e:6d:fd:c7:39
(oui Unknown), ethertype 802.1Q (0x8100), length 60: vlan 3902, p 1,
ethertype PPPoE D, PPPoE PADT [ses 0x1]
17:34:34.815673 c4:3d:c7:90:ce:ed (oui Unknown) > a4:7b:2c:9e:c7:44
(oui Unknown), ethertype 802.1Q (0x8100), length 52: vlan 3902, p 0,
ethertype PPPoE D, PPPoE PADT [ses 0x1] [Host-Uniq 0x00003003]
[AC-Cookie 0xED****************************75]
tcpdump: pcap_loop: The interface went down
2 packets captured
2 packets received by filter
0 packets dropped by kernel

system logs:  logread | grep -v -E '(hostapd|dnsmasq)'


Sun Apr 26 14:40:26 2020 authpriv.info dropbear[22495]: Exit (root):
Disconnect received
Sun Apr 26 17:34:34 2020 daemon.notice pppd[12291]: Modem hangup
Sun Apr 26 17:34:34 2020 daemon.info pppd[12291]: Connect time 606.7 minutes.
Sun Apr 26 17:34:34 2020 daemon.info pppd[12291]: Sent 105754253
bytes, received 999814297 bytes.
Sun Apr 26 17:34:34 2020 daemon.notice netifd: Network device
'pppoe-wan' link is down
Sun Apr 26 17:34:34 2020 daemon.debug pppd[12291]: Script
/lib/netifd/ppp-down started (pid 20230)
Sun Apr 26 17:34:34 2020 daemon.notice pppd[12291]: Connection terminated.
Sun Apr 26 17:34:34 2020 daemon.debug pppd[12291]: Send PPPOE
Discovery V1T1 PADT session 0x1 length 28
Sun Apr 26 17:34:34 2020 daemon.debug pppd[12291]:  dst
a4:XX:XX:XX:XX:44  src c4:XX:XX:XX:XX:ed
Sun Apr 26 17:34:34 2020 daemon.debug pppd[12291]:  [host-uniq  00 00
30 03] [AC-cookie  ed XY XY XY XY XY XY XY XY XY XY XY XY XY XY 75]
Sun Apr 26 17:34:34 2020 daemon.info pppd[12291]: Sent PADT
Sun Apr 26 17:34:34 2020 daemon.debug pppd[12291]: Waiting for 1 child
processes...
Sun Apr 26 17:34:34 2020 daemon.debug pppd[12291]:   script
/lib/netifd/ppp-down, pid 20230
Sun Apr 26 17:34:34 2020 daemon.notice netifd: Interface 'wan' has
lost the connection
Sun Apr 26 17:34:34 2020 daemon.debug pppd[12291]: Script
/lib/netifd/ppp-down finished (pid 20230), status = 0x1
Sun Apr 26 17:34:34 2020 daemon.info pppd[12291]: Exit.
Sun Apr 26 17:34:35 2020 daemon.notice netifd: Interface 'wan' is now down
Sun Apr 26 17:34:35 2020 kern.info kernel: [605071.537261] eth1: link down
Sun Apr 26 17:34:35 2020 daemon.notice netifd: Interface 'wan' is disabled
Sun Apr 26 17:34:35 2020 kern.info kernel: [605071.567649] IPv6:
ADDRCONF(NETDEV_UP): eth1: link is not ready
Sun Apr 26 17:34:35 2020 kern.info kernel: [605071.574985] eth1: link
up (1000Mbps/Full duplex)
Sun Apr 26 17:34:35 2020 kern.info kernel: [605071.579700] IPv6:
ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
Sun Apr 26 17:34:35 2020 daemon.notice netifd: Interface 'wan' is enabled
Sun Apr 26 17:34:35 2020 daemon.notice netifd: Interface 'wan' is setting up now
Sun Apr 26 17:34:35 2020 kern.info kernel: [605071.645168] device eth1
left promiscuous mode
Sun Apr 26 17:34:35 2020 daemon.err insmod: module is already loaded - slhc
Sun Apr 26 17:34:35 2020 daemon.err insmod: module is already loaded -
ppp_generic
Sun Apr 26 17:34:35 2020 daemon.err insmod: module is already loaded - pppox
Sun Apr 26 17:34:35 2020 daemon.err insmod: module is already loaded - pppoe
Sun Apr 26 17:34:35 2020 daemon.info pppd[20289]: Plugin rp-pppoe.so loaded.
Sun Apr 26 17:34:35 2020 daemon.info pppd[20289]: RP-PPPoE plugin
version 3.8p compiled against pppd 2.4.7
Sun Apr 26 17:34:35 2020 daemon.notice pppd[20289]: pppd 2.4.7 started
by root, uid 0
Sun Apr 26 17:34:35 2020 daemon.debug pppd[20289]: Send PPPOE
Discovery V1T1 PADI session 0x0 length 12
Sun Apr 26 17:34:35 2020 daemon.debug pppd[20289]:  dst
ff:ff:ff:ff:ff:ff  src c4:XX:XX:XX:XX:ed
Sun Apr 26 17:34:35 2020 daemon.debug pppd[20289]:  [service-name]
[host-uniq  00 00 4f 41]
Sun Apr 26 17:34:35 2020 daemon.debug pppd[20289]: Recv PPPOE
Discovery V1T1 PADO session 0x0 length 51
Sun Apr 26 17:34:35 2020 daemon.debug pppd[20289]:  dst
c4:XX:XX:XX:XX:ed  src a4:XX:XX:XX:XX:44
Sun Apr 26 17:34:35 2020 daemon.debug pppd[20289]:  [service-name]
[AC-name SIMB_TABOR_BNG1] [host-uniq  00 00 4f 41] [AC-cookie  ed XY
XY XY XY XY XY XY XY XY XY XY XY XY XY 75]
Sun Apr 26 17:34:35 2020 daemon.debug pppd[20289]: Send PPPOE
Discovery V1T1 PADR session 0x0 length 32
Sun Apr 26 17:34:35 2020 daemon.debug pppd[20289]:  dst
a4:XX:XX:XX:XX:44  src c4:XX:XX:XX:XX:ed
Sun Apr 26 17:34:35 2020 daemon.debug pppd[20289]:  [service-name]
[host-uniq  00 00 4f 41] [AC-cookie  ed XY XY XY XY XY XY XY XY XY XY
XY XY XY XY 75]
Sun Apr 26 17:34:35 2020 daemon.debug pppd[20289]: Recv PPPOE
Discovery V1T1 PADS session 0x1 length 12
Sun Apr 26 17:34:35 2020 daemon.debug pppd[20289]:  dst
c4:XX:XX:XX:XX:ed  src a4:XX:XX:XX:XX:44
Sun Apr 26 17:34:35 2020 daemon.debug pppd[20289]:  [service-name]
[host-uniq  00 00 4f 41]
Sun Apr 26 17:34:35 2020 daemon.debug pppd[20289]: PADS: Service-Name: ''
Sun Apr 26 17:34:35 2020 daemon.info pppd[20289]: PPP session is 1
Sun Apr 26 17:34:35 2020 daemon.warn pppd[20289]: Connected to
a4:XX:XX:XX:XX:44 via interface eth1.3902
Sun Apr 26 17:34:35 2020 daemon.debug pppd[20289]: using channel 14
Sun Apr 26 17:34:35 2020 kern.info kernel: [605072.063661] pppoe-wan:
renamed from ppp0
Sun Apr 26 17:34:35 2020 daemon.info pppd[20289]: Renamed interface
ppp0 to pppoe-wan
Sun Apr 26 17:34:35 2020 daemon.info pppd[20289]: Using interface pppoe-wan
Sun Apr 26 17:34:35 2020 daemon.notice pppd[20289]: Connect: pppoe-wan
<--> eth1.3902
Sun Apr 26 17:34:35 2020 daemon.debug pppd[20289]: sent [LCP ConfReq
id=0x1 <mru 1492> <magic 0x5182441e>]
Sun Apr 26 17:34:35 2020 daemon.debug pppd[20289]: rcvd [LCP ConfAck
id=0x1 <mru 1492> <magic 0x5182441e>]
Sun Apr 26 17:34:38 2020 daemon.debug pppd[20289]: sent [LCP ConfReq
id=0x1 <mru 1492> <magic 0x5182441e>]
Sun Apr 26 17:34:38 2020 daemon.debug pppd[20289]: rcvd [LCP ConfAck
id=0x1 <mru 1492> <magic 0x5182441e>]
Sun Apr 26 17:34:38 2020 daemon.debug pppd[20289]: rcvd [LCP ConfReq
id=0xc5 <mru 1492> <auth chap MD5> <magic 0x1ec8e8c8>]
Sun Apr 26 17:34:38 2020 daemon.debug pppd[20289]: sent [LCP ConfAck
id=0xc5 <mru 1492> <auth chap MD5> <magic 0x1ec8e8c8>]
Sun Apr 26 17:34:38 2020 daemon.debug pppd[20289]: sent [LCP EchoReq
id=0x0 magic=0x5182441e]
Sun Apr 26 17:34:38 2020 daemon.debug pppd[20289]: rcvd [CHAP
Challenge id=0x1
<f*********************************************************************50>,
name = "SIMB_TABOR_BNG1"]
Sun Apr 26 17:34:38 2020 daemon.debug pppd[20289]: sent [CHAP Response
id=0x1 <83****************************c4>, name = "username"]
Sun Apr 26 17:34:38 2020 daemon.debug pppd[20289]: rcvd [LCP EchoRep
id=0x0 magic=0x1ec8e8c8]
Sun Apr 26 17:34:38 2020 daemon.debug pppd[20289]: rcvd [CHAP Success
id=0x1 "CHAP authentication success"]
Sun Apr 26 17:34:38 2020 daemon.info pppd[20289]: CHAP authentication
succeeded: CHAP authentication success
Sun Apr 26 17:34:38 2020 daemon.notice pppd[20289]: CHAP
authentication succeeded
Sun Apr 26 17:34:38 2020 daemon.notice pppd[20289]: peer from calling
number a4:XX:XX:XX:XX:44 authorized
Sun Apr 26 17:34:39 2020 daemon.debug pppd[20289]: sent [IPCP ConfReq
id=0x1 <addr 0.0.0.0> <ms-dns1 0.0.0.0> <ms-dns2 0.0.0.0>]
Sun Apr 26 17:34:39 2020 daemon.debug pppd[20289]: sent [IPV6CP
ConfReq id=0x1 <addr fe80::d035:60ed:928e:f741>]
Sun Apr 26 17:34:39 2020 daemon.debug pppd[20289]: rcvd [IPCP ConfReq
id=0xf4 <addr 94.127.WW.ZZ>]
Sun Apr 26 17:34:39 2020 daemon.debug pppd[20289]: sent [IPCP ConfAck
id=0xf4 <addr 94.127.WW.ZZ>]
Sun Apr 26 17:34:39 2020 daemon.debug pppd[20289]: rcvd [IPCP ConfNak
id=0x1 <addr 46.164.XX.YY> <ms-dns1 212.18.32.10> <ms-dns2
212.18.32.12>]
Sun Apr 26 17:34:39 2020 daemon.debug pppd[20289]: sent [IPCP ConfReq
id=0x2 <addr 46.164.XX.YY> <ms-dns1 212.18.32.10> <ms-dns2
212.18.32.12>]
Sun Apr 26 17:34:39 2020 daemon.debug pppd[20289]: rcvd [IPCP ConfAck
id=0x2 <addr 46.164.XX.YY> <ms-dns1 212.18.32.10> <ms-dns2
212.18.32.12>]
Sun Apr 26 17:34:39 2020 daemon.notice pppd[20289]: local  IP address
46.164.XX.YY
Sun Apr 26 17:34:39 2020 daemon.notice pppd[20289]: remote IP address
94.127.WW.ZZ
Sun Apr 26 17:34:39 2020 daemon.notice pppd[20289]: primary   DNS
address 212.18.32.10
Sun Apr 26 17:34:39 2020 daemon.notice pppd[20289]: secondary DNS
address 212.18.32.12
Sun Apr 26 17:34:39 2020 daemon.debug pppd[20289]: Script
/lib/netifd/ppp-up started (pid 20355)
Sun Apr 26 17:34:39 2020 daemon.notice netifd: Network device
'pppoe-wan' link is up
Sun Apr 26 17:34:39 2020 daemon.notice netifd: Interface 'wan' is now up
Sun Apr 26 17:34:39 2020 daemon.debug pppd[20289]: Script
/lib/netifd/ppp-up finished (pid 20355), status = 0x1
Sun Apr 26 17:34:39 2020 user.notice firewall: Reloading firewall due
to ifup of wan (pppoe-wan)
Sun Apr 26 17:34:42 2020 daemon.debug pppd[20289]: sent [IPV6CP
ConfReq id=0x1 <addr fe80::d035:60ed:928e:f741>]
Sun Apr 26 17:34:45 2020 daemon.debug pppd[20289]: sent [IPV6CP
ConfReq id=0x1 <addr fe80::d035:60ed:928e:f741>]
Sun Apr 26 17:34:48 2020 daemon.debug pppd[20289]: sent [IPV6CP
ConfReq id=0x1 <addr fe80::d035:60ed:928e:f741>]
Sun Apr 26 17:34:51 2020 daemon.debug pppd[20289]: sent [IPV6CP
ConfReq id=0x1 <addr fe80::d035:60ed:928e:f741>]
Sun Apr 26 17:34:54 2020 daemon.debug pppd[20289]: sent [IPV6CP
ConfReq id=0x1 <addr fe80::d035:60ed:928e:f741>]
Sun Apr 26 17:34:57 2020 daemon.debug pppd[20289]: sent [IPV6CP
ConfReq id=0x1 <addr fe80::d035:60ed:928e:f741>]
Sun Apr 26 17:35:00 2020 daemon.debug pppd[20289]: sent [IPV6CP
ConfReq id=0x1 <addr fe80::d035:60ed:928e:f741>]
Sun Apr 26 17:35:03 2020 daemon.debug pppd[20289]: sent [IPV6CP
ConfReq id=0x1 <addr fe80::d035:60ed:928e:f741>]
Sun Apr 26 17:35:06 2020 daemon.debug pppd[20289]: sent [IPV6CP
ConfReq id=0x1 <addr fe80::d035:60ed:928e:f741>]
Sun Apr 26 17:35:09 2020 daemon.warn pppd[20289]: IPV6CP: timeout
sending Config-Requests


The strange part is in the tcpdump there is a PADT sent to an
"unknown" MAC and my pppd responds. At least that is how I see it.

You think NOT putting the interface into promiscuous mode (done by
tcpdump) would prevent this?
Anyway, now I startted tcpdump with the -p option:  tcpdump -e -v -p
-i eth1 vlan 3902 and pppoed


On Sun, 26 Apr 2020 at 14:48, David Balažic <xerces9@gmail.com> wrote:
>
> On Sun, 26 Apr 2020 at 03:38, Michael Richardson <mcr@sandelman.ca> wrote:
> >
> >
> > David Balažic <xerces9@gmail.com> wrote:
> >     >> OK, more data:
> >     >> I started tcpdump with the -e option as suggested on openwrt forum:
> >     >> tcpdump -e -v -i eth1.3902 pppoed
> >     >>
> >     >> on disconnect, this was logged (times are UTC):
> >     >>
> >     >>
> >     >> tcpdump: listening on eth1.3902, link-type EN10MB (Ethernet), capture
> >     >> size 262144 bytes
> >     >> 18:23:09.204991 a4:7b:2c:9e:c7:44 (oui Unknown) > 44:4e:6d:fd:c7:39
> >     >> (oui Unknown), ethertype PPPoE D (0x8863), length 97: PPPoE PADO
> >     >> [Service-Name] [AC-Name "SIMB_TABOR_BNG1"] [Host-Uniq
> >     >> 0x44************long_number******************************AA]
> >     >> [AC-Cookie ".5b************v"]
> >
> >     > Isnt this strange? The dst addr is not my router or any other known.
> >     > Also the Host-Uniq value is different.
> >     > As if is traffic meant for someone else.
> >
> > oh, I understand.
> > thank you for noting this.
> > So, it's as if you are getting a PPPoE message for someone else, and when
> > that happens, the interface is dying.
> > Is it always the same ethernet address?
>
> Yes,the same.
>
> > Do you have ebtables?  Could you arrange to filter out packets like that?
> > That would point to there being some kernel bug.
> >
> > Clearly, your ISP has some other bug that they are sending stuff down the
> > wrong pipe, but that could just be hash collisions that they assume are
> > "harmless".
>
> Those packets don't seem to be a problem. In the last connection, I
> received 15 of them and no reconnect.
>
> As suggested on OpenWRT, I removed a IPv6 wan device config that was
> running DHCPv6 on eth1.
>
> Then he connection was up for 45 hours before dropping. Could be a
> coincidence...
>
> I posted the logs at
> https://forum.openwrt.org/t/pppoe-disconnects-every-few-hours/61239/22
> (gmail wraps lines...)
>
> Regards,
> David

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (18 preceding siblings ...)
  2020-04-26 19:03 ` David Balažic
@ 2020-04-27  2:14 ` Michael Richardson
  2020-04-27  9:59 ` David Balažic
                   ` (22 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Michael Richardson @ 2020-04-27  2:14 UTC (permalink / raw)
  To: linux-ppp

[-- Attachment #1: Type: text/plain, Size: 1826 bytes --]


David Balažic <xerces9@gmail.com> wrote:
    > Sun Apr 26 17:34:57 2020 daemon.debug pppd[20289]: sent [IPV6CP
    > ConfReq id=0x1 <addr fe80::d035:60ed:928e:f741>]
    > Sun Apr 26 17:35:00 2020 daemon.debug pppd[20289]: sent [IPV6CP
    > ConfReq id=0x1 <addr fe80::d035:60ed:928e:f741>]
    > Sun Apr 26 17:35:03 2020 daemon.debug pppd[20289]: sent [IPV6CP
    > ConfReq id=0x1 <addr fe80::d035:60ed:928e:f741>]
    > Sun Apr 26 17:35:06 2020 daemon.debug pppd[20289]: sent [IPV6CP
    > ConfReq id=0x1 <addr fe80::d035:60ed:928e:f741>]
    > Sun Apr 26 17:35:09 2020 daemon.warn pppd[20289]: IPV6CP: timeout
    > sending Config-Requests

Could this be the reason for the hangup?
pppd gets tired of no IPv6, decides it should hangup?

    > The strange part is in the tcpdump there is a PADT sent to an
    > "unknown" MAC and my pppd responds. At least that is how I see it.

    > You think NOT putting the interface into promiscuous mode (done by
    > tcpdump) would prevent this?
    > Anyway, now I startted tcpdump with the -p option:  tcpdump -e -v -p
    > -i eth1 vlan 3902 and pppoed

It could be that promisc mode (no -p) means that the PADT makes something
break, different than what you are investigating.  -p avoid promisc mode, so
would avoid seeing that packet.

You mention in another thread that you were trying to do DHCPv6 on a
different (non-PPPoE) interface.  I don't see how that would matter unless
the failure caused netifd to decide to retry it all.

It seems that you ought to try the noipv6 option to pppd.


--
]               Never tell me the odds!                 | ipv6 mesh networks [
]   Michael Richardson, Sandelman Software Works        |    IoT architect   [
]     mcr@sandelman.ca  http://www.sandelman.ca/        |   ruby on rails    [


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 487 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (19 preceding siblings ...)
  2020-04-27  2:14 ` Michael Richardson
@ 2020-04-27  9:59 ` David Balažic
  2020-04-27 13:43 ` James Carlson
                   ` (21 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: David Balažic @ 2020-04-27  9:59 UTC (permalink / raw)
  To: linux-ppp

On Mon, 27 Apr 2020 at 04:14, Michael Richardson <mcr@sandelman.ca> wrote:
>
>
> David Balažic <xerces9@gmail.com> wrote:
>     > Sun Apr 26 17:34:57 2020 daemon.debug pppd[20289]: sent [IPV6CP
>     > ConfReq id=0x1 <addr fe80::d035:60ed:928e:f741>]
>     > Sun Apr 26 17:35:00 2020 daemon.debug pppd[20289]: sent [IPV6CP
>     > ConfReq id=0x1 <addr fe80::d035:60ed:928e:f741>]
>     > Sun Apr 26 17:35:03 2020 daemon.debug pppd[20289]: sent [IPV6CP
>     > ConfReq id=0x1 <addr fe80::d035:60ed:928e:f741>]
>     > Sun Apr 26 17:35:06 2020 daemon.debug pppd[20289]: sent [IPV6CP
>     > ConfReq id=0x1 <addr fe80::d035:60ed:928e:f741>]
>     > Sun Apr 26 17:35:09 2020 daemon.warn pppd[20289]: IPV6CP: timeout
>     > sending Config-Requests
>
> Could this be the reason for the hangup?
> pppd gets tired of no IPv6, decides it should hangup?

These lines are logged in the 30 seconds after the connection is
established. Later they never show up. It was like that for all
connections thus far.

>     > The strange part is in the tcpdump there is a PADT sent to an
>     > "unknown" MAC and my pppd responds. At least that is how I see it.
>
>     > You think NOT putting the interface into promiscuous mode (done by
>     > tcpdump) would prevent this?
>     > Anyway, now I startted tcpdump with the -p option:  tcpdump -e -v -p
>     > -i eth1 vlan 3902 and pppoed
>
> It could be that promisc mode (no -p) means that the PADT makes something
> break, different than what you are investigating.  -p avoid promisc mode, so
> would avoid seeing that packet.
>
> You mention in another thread that you were trying to do DHCPv6 on a
> different (non-PPPoE) interface.  I don't see how that would matter unless
> the failure caused netifd to decide to retry it all.
>
> It seems that you ought to try the noipv6 option to pppd.

I removed it (from the system config, the file /etc/config/network ).
I also removed that "other interface", so there is no IPv6 stuff left.
The new command line is:
/usr/sbin/pppd nodetach ipparam wan ifname pppoe-wan lcp-echo-interval
1 lcp-echo-failure 5 lcp-echo-adaptive nodefaultroute usepeerdns
maxfail 1 user YYYYYY password XXXXX ip-up-script /lib/netifd/ppp-up
ip-down-script /lib/netifd/ppp-down mtu 1492 mru 1492 plugin
rp-pppoe.so nic-eth1.3902

For the record: The new connection with new settings was started on
Mon Apr 27 09:47:48 (UTC)

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (20 preceding siblings ...)
  2020-04-27  9:59 ` David Balažic
@ 2020-04-27 13:43 ` James Carlson
  2020-04-27 20:08 ` Guillaume Nault
                   ` (20 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: James Carlson @ 2020-04-27 13:43 UTC (permalink / raw)
  To: linux-ppp

On 2020-04-26 15:03, David Balažic wrote:
> tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size
> 262144 bytes
> 17:34:34.805424 a4:7b:2c:9e:c7:44 (oui Unknown) > 44:4e:6d:fd:c7:39
> (oui Unknown), ethertype 802.1Q (0x8100), length 60: vlan 3902, p 1,
> ethertype PPPoE D, PPPoE PADT [ses 0x1]
> 17:34:34.815673 c4:3d:c7:90:ce:ed (oui Unknown) > a4:7b:2c:9e:c7:44
> (oui Unknown), ethertype 802.1Q (0x8100), length 52: vlan 3902, p 0,
> ethertype PPPoE D, PPPoE PADT [ses 0x1] [Host-Uniq 0x00003003]
> [AC-Cookie 0xED****************************75]
> tcpdump: pcap_loop: The interface went down
[...]
> Sun Apr 26 17:34:35 2020 daemon.debug pppd[20289]:  dst
> c4:XX:XX:XX:XX:ed  src a4:XX:XX:XX:XX:44

That's really good evidence right there.  It certainly looks like the
Roaring Penguin PPPoE kernel driver is blindly accepting a bad PADT
packet meant for someone else.  Your peer probably shouldn't have sent
this down the line to you, but that's another matter.

I don't maintain the RP software, but I have a guess.  Either something
on the system is keeping the Ethernet driver itself in promiscuous mode,
or some defect in the driver allows non-matching non-multicast
destinations to be sent up the receive pipe.

Either way, it's usually responsibility of the receiving protocol
software after ethertype demux (in this case, the PPPoE implementation,
and NOT the PPP code) to validate that the addresses are correct when
necessary.  It's not the driver's problem.

I think the PPPoE code may be missing a check here.  This looks like a
bug to me.

> Sun Apr 26 17:35:06 2020 daemon.debug pppd[20289]: sent [IPV6CP
> ConfReq id=0x1 <addr fe80::d035:60ed:928e:f741>]
> Sun Apr 26 17:35:09 2020 daemon.warn pppd[20289]: IPV6CP: timeout
> sending Config-Requests

I don't think this is causing the problem, but it's interesting evidence
nonetheless.  If the peer you're talking to actually implemented PPP in
a reasonable manner, it would send LCP Protocol-Reject in response to
your attempt to negotiate IPV6CP rather than just ignoring you.  The
peer isn't healthy.  Not much you can do about it at this point (other
than disabling IPv6 on your side with "noipv6"), but still interesting.

> The strange part is in the tcpdump there is a PADT sent to an
> "unknown" MAC and my pppd responds. At least that is how I see it.

Yes, EXCEPT that pppd isn't involved at all.  PADT is part of the PPPoE
protocol, not PPP.  pppd doesn't know anything about this.  It's the
rp-pppoe software that's responding improperly.  As I suggested
previously in a private message, I think you should discuss the problem
with the maintainers of the PPPoE stuff.

-- 
James Carlson         42.703N 71.076W         <carlsonj@workingcode.com>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (21 preceding siblings ...)
  2020-04-27 13:43 ` James Carlson
@ 2020-04-27 20:08 ` Guillaume Nault
  2020-04-27 20:30 ` Guillaume Nault
                   ` (19 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Guillaume Nault @ 2020-04-27 20:08 UTC (permalink / raw)
  To: linux-ppp

On Mon, Apr 27, 2020 at 09:43:06AM -0400, James Carlson wrote:
> On 2020-04-26 15:03, David Balažic wrote:
> > tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size
> > 262144 bytes
> > 17:34:34.805424 a4:7b:2c:9e:c7:44 (oui Unknown) > 44:4e:6d:fd:c7:39
> > (oui Unknown), ethertype 802.1Q (0x8100), length 60: vlan 3902, p 1,
> > ethertype PPPoE D, PPPoE PADT [ses 0x1]
> > 17:34:34.815673 c4:3d:c7:90:ce:ed (oui Unknown) > a4:7b:2c:9e:c7:44
> > (oui Unknown), ethertype 802.1Q (0x8100), length 52: vlan 3902, p 0,
> > ethertype PPPoE D, PPPoE PADT [ses 0x1] [Host-Uniq 0x00003003]
> > [AC-Cookie 0xED****************************75]
> > tcpdump: pcap_loop: The interface went down
> [...]
> > Sun Apr 26 17:34:35 2020 daemon.debug pppd[20289]:  dst
> > c4:XX:XX:XX:XX:ed  src a4:XX:XX:XX:XX:44
> 
> That's really good evidence right there.  It certainly looks like the
> Roaring Penguin PPPoE kernel driver is blindly accepting a bad PADT
> packet meant for someone else.  Your peer probably shouldn't have sent
> this down the line to you, but that's another matter.
> 
From what I remember of the rp-pppoe and pppd code, once the PPPoE
session is established, rp-pppoe gives control to pppd.
Therefore it has no chance to receive a PADT, since it can't read the
file descriptor used for PPPoE discovery anymore.

That means that the kernel needs to have some ugly code to snoop PADT
messages and kill the affected channel, so that pppd terminates without
waiting for an LCP time out.

What's certain, is that the kernel will shut down the PPP channel
attached to a PPPoE session that receives a PADT. Matching is done on
the source MAC address, the input interface and the session Id.
Delivering the PADT to the interface follows the normal rules. So, in
non-promiscuous mode, a non-multicast frame must have the correct
destination MAC address. In promiscuous mode, all frames are accepted.

Note that, as I already said in a previous message, actions on the
network interface used by PPPoE can also unregister the PPP channel
(passing the interface down, changing the MTU, etc.). That's why I
proposed to check the evolution of the PPPoE interface with rtmon.

> I don't maintain the RP software, but I have a guess.  Either something
> on the system is keeping the Ethernet driver itself in promiscuous mode,
> or some defect in the driver allows non-matching non-multicast
> destinations to be sent up the receive pipe.
> 
I guess it was just because of the concurrent tcpdump running without
the -p option.

> Either way, it's usually responsibility of the receiving protocol
> software after ethertype demux (in this case, the PPPoE implementation,
> and NOT the PPP code) to validate that the addresses are correct when
> necessary.  It's not the driver's problem.
> 
> I think the PPPoE code may be missing a check here.  This looks like a
> bug to me.
> 
> > Sun Apr 26 17:35:06 2020 daemon.debug pppd[20289]: sent [IPV6CP
> > ConfReq id=0x1 <addr fe80::d035:60ed:928e:f741>]
> > Sun Apr 26 17:35:09 2020 daemon.warn pppd[20289]: IPV6CP: timeout
> > sending Config-Requests
> 
> I don't think this is causing the problem, but it's interesting evidence
> nonetheless.  If the peer you're talking to actually implemented PPP in
> a reasonable manner, it would send LCP Protocol-Reject in response to
> your attempt to negotiate IPV6CP rather than just ignoring you.  The
> peer isn't healthy.  Not much you can do about it at this point (other
> than disabling IPv6 on your side with "noipv6"), but still interesting.
> 
> > The strange part is in the tcpdump there is a PADT sent to an
> > "unknown" MAC and my pppd responds. At least that is how I see it.
> 
> Yes, EXCEPT that pppd isn't involved at all.  PADT is part of the PPPoE
> protocol, not PPP.  pppd doesn't know anything about this.  It's the
> rp-pppoe software that's responding improperly.  As I suggested
> previously in a private message, I think you should discuss the problem
> with the maintainers of the PPPoE stuff.
> 
> -- 
> James Carlson         42.703N 71.076W         <carlsonj@workingcode.com>
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (22 preceding siblings ...)
  2020-04-27 20:08 ` Guillaume Nault
@ 2020-04-27 20:30 ` Guillaume Nault
  2020-04-27 23:34 ` David Balažic
                   ` (18 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Guillaume Nault @ 2020-04-27 20:30 UTC (permalink / raw)
  To: linux-ppp

On Mon, Apr 27, 2020 at 11:59:25AM +0200, David Balažic wrote:
> On Mon, 27 Apr 2020 at 04:14, Michael Richardson <mcr@sandelman.ca> wrote:
> >
> >
> > David Balažic <xerces9@gmail.com> wrote:
> >     > Sun Apr 26 17:34:57 2020 daemon.debug pppd[20289]: sent [IPV6CP
> >     > ConfReq id=0x1 <addr fe80::d035:60ed:928e:f741>]
> >     > Sun Apr 26 17:35:00 2020 daemon.debug pppd[20289]: sent [IPV6CP
> >     > ConfReq id=0x1 <addr fe80::d035:60ed:928e:f741>]
> >     > Sun Apr 26 17:35:03 2020 daemon.debug pppd[20289]: sent [IPV6CP
> >     > ConfReq id=0x1 <addr fe80::d035:60ed:928e:f741>]
> >     > Sun Apr 26 17:35:06 2020 daemon.debug pppd[20289]: sent [IPV6CP
> >     > ConfReq id=0x1 <addr fe80::d035:60ed:928e:f741>]
> >     > Sun Apr 26 17:35:09 2020 daemon.warn pppd[20289]: IPV6CP: timeout
> >     > sending Config-Requests
> >
> > Could this be the reason for the hangup?
> > pppd gets tired of no IPv6, decides it should hangup?
> 
> These lines are logged in the 30 seconds after the connection is
> established. Later they never show up. It was like that for all
> connections thus far.
> 
Your provider doesn't support IPv6. As James pointed out, it isn't even
capable of reporting that fact. So your device keeps trying for a few
seconds. Nothing to worry about (appart that you won't have IPv6
connectivity of course).

> >     > The strange part is in the tcpdump there is a PADT sent to an
> >     > "unknown" MAC and my pppd responds. At least that is how I see it.
> >
> >     > You think NOT putting the interface into promiscuous mode (done by
> >     > tcpdump) would prevent this?
> >     > Anyway, now I startted tcpdump with the -p option:  tcpdump -e -v -p
> >     > -i eth1 vlan 3902 and pppoed
> >
> > It could be that promisc mode (no -p) means that the PADT makes something
> > break, different than what you are investigating.  -p avoid promisc mode, so
> > would avoid seeing that packet.
> >
> > You mention in another thread that you were trying to do DHCPv6 on a
> > different (non-PPPoE) interface.  I don't see how that would matter unless
> > the failure caused netifd to decide to retry it all.
> >
> > It seems that you ought to try the noipv6 option to pppd.
> 
> I removed it (from the system config, the file /etc/config/network ).
> I also removed that "other interface", so there is no IPv6 stuff left.
> The new command line is:
> /usr/sbin/pppd nodetach ipparam wan ifname pppoe-wan lcp-echo-interval
> 1 lcp-echo-failure 5 lcp-echo-adaptive nodefaultroute usepeerdns
> maxfail 1 user YYYYYY password XXXXX ip-up-script /lib/netifd/ppp-up
> ip-down-script /lib/netifd/ppp-down mtu 1492 mru 1492 plugin
> rp-pppoe.so nic-eth1.3902
> 
> For the record: The new connection with new settings was started on
> Mon Apr 27 09:47:48 (UTC)
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (23 preceding siblings ...)
  2020-04-27 20:30 ` Guillaume Nault
@ 2020-04-27 23:34 ` David Balažic
  2020-04-28 10:18 ` David Balažic
                   ` (17 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: David Balažic @ 2020-04-27 23:34 UTC (permalink / raw)
  To: linux-ppp

On Fri, 24 Apr 2020 at 17:55, Guillaume Nault <gnault@redhat.com> wrote:
> You can monitor the evolution of you network devices with rtmon:
> $ rtmon file ./some/file.log link
>
> Then read it with ip monitor:
> $ ip monitor file ./some/file.log
>
> That should tell if eth1.3902 is ever passed down or even deleted.

rtmon does not seem to be available on openwrt. It is part of iproute2, right?

Are there any alternatives?
(until I find or compile it myself...)

Regards,
David

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (24 preceding siblings ...)
  2020-04-27 23:34 ` David Balažic
@ 2020-04-28 10:18 ` David Balažic
  2020-04-28 11:00 ` Guillaume Nault
                   ` (16 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: David Balažic @ 2020-04-28 10:18 UTC (permalink / raw)
  To: linux-ppp

On Tue, 28 Apr 2020 at 01:34, David Balažic <xerces9@gmail.com> wrote:
>
> On Fri, 24 Apr 2020 at 17:55, Guillaume Nault <gnault@redhat.com> wrote:
> > You can monitor the evolution of you network devices with rtmon:
> > $ rtmon file ./some/file.log link
> >
> > Then read it with ip monitor:
> > $ ip monitor file ./some/file.log
> >
> > That should tell if eth1.3902 is ever passed down or even deleted.
>
> rtmon does not seem to be available on openwrt. It is part of iproute2, right?
>
> Are there any alternatives?
> (until I find or compile it myself...)

OK, I found 'ip', but not 'rtmon'.

Is running 'ip monitor link' equivalent?
I'll start that...

Regards,
David

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (25 preceding siblings ...)
  2020-04-28 10:18 ` David Balažic
@ 2020-04-28 11:00 ` Guillaume Nault
  2020-04-29 11:58 ` David Balažic
                   ` (15 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Guillaume Nault @ 2020-04-28 11:00 UTC (permalink / raw)
  To: linux-ppp

On Tue, Apr 28, 2020 at 12:18:38PM +0200, David Balažic wrote:
> On Tue, 28 Apr 2020 at 01:34, David Balažic <xerces9@gmail.com> wrote:
> >
> > On Fri, 24 Apr 2020 at 17:55, Guillaume Nault <gnault@redhat.com> wrote:
> > > You can monitor the evolution of you network devices with rtmon:
> > > $ rtmon file ./some/file.log link
> > >
> > > Then read it with ip monitor:
> > > $ ip monitor file ./some/file.log
> > >
> > > That should tell if eth1.3902 is ever passed down or even deleted.
> >
> > rtmon does not seem to be available on openwrt. It is part of iproute2, right?
> >
> > Are there any alternatives?
> > (until I find or compile it myself...)
> 
Sorry for late reply.
Yes, rtmon is part of iproute2.

> OK, I found 'ip', but not 'rtmon'.
> 
> Is running 'ip monitor link' equivalent?
> I'll start that...
> 
Yes. Both listen to the kernel netlink messages that are sent when
network interfaces are modified.

> Regards,
> David
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (26 preceding siblings ...)
  2020-04-28 11:00 ` Guillaume Nault
@ 2020-04-29 11:58 ` David Balažic
  2020-05-03 10:31 ` David Balažic
                   ` (14 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: David Balažic @ 2020-04-29 11:58 UTC (permalink / raw)
  To: linux-ppp

Next disconnect:

NOTE: c4:XX:XX:XX:XX:ed  is my wan interfaces MAC address

tcpdump -e -v -i eth1 vlan 3902 and pppoed
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size
262144 bytes
** several PADO and PADS packet sent from server/AC to
44:4e:6d:fd:c7:39 before this, during the day **
05:03:44.348738 a4:7b:2c:9e:c7:44 (oui Unknown) > 00:0f:94:ef:64:e0
(oui Unknown), ethertype 802.1Q (0x8100), length 75: vlan 3902, p 1,
ethertype PPPoE D, PPPoE PADO [Service-Name] [AC-Name
"SIMB_TABOR_BNG1"] [Host-Uniq 0x960B0000] [AC-Cookie
0xDCC3248D61A5D76C7C1DD4138431B17C]
06:40:27.668095 a4:7b:2c:9e:c7:44 (oui Unknown) > 44:4e:6d:fd:c7:39
(oui Unknown), ethertype 802.1Q (0x8100), length 101: vlan 3902, p 1,
ethertype PPPoE D, PPPoE PADO [Service-Name] [AC-Name
"SIMB_TABOR_BNG1"] [Host-Uniq
0x444E6DFDC739AAAAAAAA92000000AAAAAAAA4621426F78202020AAAAAAAA]
[AC-Cookie ".5b.u...~.F.mlKv"]
06:40:27.674078 a4:7b:2c:9e:c7:44 (oui Unknown) > 44:4e:6d:fd:c7:39
(oui Unknown), ethertype 802.1Q (0x8100), length 62: vlan 3902, p 1,
ethertype PPPoE D, PPPoE PADS [ses 0x1] [Service-Name] [Host-Uniq
0x444E6DFDC739AAAAAAAA92000000AAAAAAAA4621426F78202020AAAAAAAA]
06:40:30.552119 a4:7b:2c:9e:c7:44 (oui Unknown) > 44:4e:6d:fd:c7:39
(oui Unknown), ethertype 802.1Q (0x8100), length 60: vlan 3902, p 1,
ethertype PPPoE D, PPPoE PADT [ses 0x1]
06:40:31.158067 c4:XX:XX:XX:XX:ed (oui Unknown) > a4:7b:2c:9e:c7:44
(oui Unknown), ethertype 802.1Q (0x8100), length 52: vlan 3902, p 0,
ethertype PPPoE D, PPPoE PADT [ses 0x1] [Host-Uniq 0x000057CF]
[AC-Cookie 0xED891159D8A73577F9B098A2E3E36875]
tcpdump: pcap_loop: The interface went down
12 packets captured
12 packets received by filter
0 packets dropped by kernel


ip monitor link

[2020-04-29 06:40:30] 47: pppoe-wan: <POINTOPOINT,MULTICAST,NOARP> mtu
1492 qdisc fq_codel state DOWN group default
[2020-04-29 06:40:30]     link/ppp
[2020-04-29 06:40:31] Deleted 47: pppoe-wan:
<POINTOPOINT,MULTICAST,NOARP> mtu 1492 qdisc noop state DOWN group
default
[2020-04-29 06:40:31]     link/ppp
[2020-04-29 06:40:31] 46: eth1.3902@eth1: <BROADCAST,MULTICAST> mtu
1500 qdisc noqueue state DOWN group default
[2020-04-29 06:40:31]     link/ether c4:XX:XX:XX:XX:ed brd ff:ff:ff:ff:ff:ff
[2020-04-29 06:40:31] 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu
1500 qdisc fq_codel state UP group default
[2020-04-29 06:40:31]     link/ether c4:XX:XX:XX:XX:ed brd ff:ff:ff:ff:ff:ff
[2020-04-29 06:40:31] Deleted 46: eth1.3902@eth1:
<BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
[2020-04-29 06:40:31]     link/ether c4:XX:XX:XX:XX:ed brd ff:ff:ff:ff:ff:ff
[2020-04-29 06:40:31] 3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc
fq_codel state DOWN group default
[2020-04-29 06:40:31]     link/ether c4:XX:XX:XX:XX:ed brd ff:ff:ff:ff:ff:ff
[2020-04-29 06:40:31] 3: eth1: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc
fq_codel state UP group default
[2020-04-29 06:40:31]     link/ether c4:XX:XX:XX:XX:ed brd ff:ff:ff:ff:ff:ff
[2020-04-29 06:40:31] 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu
1500 qdisc fq_codel state UP group default
[2020-04-29 06:40:31]     link/ether c4:XX:XX:XX:XX:ed brd ff:ff:ff:ff:ff:ff
[2020-04-29 06:40:31] 48: eth1.3902@eth1: <BROADCAST,MULTICAST> mtu
1500 qdisc noop state DOWN group default
[2020-04-29 06:40:31]     link/ether c4:XX:XX:XX:XX:ed brd ff:ff:ff:ff:ff:ff
[2020-04-29 06:40:31] 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu
1500 qdisc fq_codel state UP group default
[2020-04-29 06:40:31]     link/ether c4:XX:XX:XX:XX:ed brd ff:ff:ff:ff:ff:ff
[2020-04-29 06:40:31] 48: eth1.3902@eth1:
<BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
group default
[2020-04-29 06:40:31]     link/ether c4:XX:XX:XX:XX:ed brd ff:ff:ff:ff:ff:ff
[2020-04-29 06:40:31] 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu
1500 qdisc fq_codel state UP group default
[2020-04-29 06:40:31]     link/ether c4:XX:XX:XX:XX:ed brd ff:ff:ff:ff:ff:ff
[2020-04-29 06:40:31] 49: ppp0: <POINTOPOINT,MULTICAST,NOARP> mtu 1500
qdisc noop state DOWN group default
[2020-04-29 06:40:31]     link/ppp
[2020-04-29 06:40:31] 49: pppoe-wan: <POINTOPOINT,MULTICAST,NOARP> mtu
1500 qdisc noop state DOWN group default
[2020-04-29 06:40:31]     link/ppp
[2020-04-29 06:40:32] 48: eth1.3902@eth1:
<BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
group default
[2020-04-29 06:40:32]     link/ether c4:XX:XX:XX:XX:ed brd ff:ff:ff:ff:ff:ff
[2020-04-29 06:40:34] 49: pppoe-wan: <POINTOPOINT,MULTICAST,NOARP> mtu
1492 qdisc noop state DOWN group default
[2020-04-29 06:40:34]     link/ppp
[2020-04-29 06:40:35] 49: pppoe-wan:
<POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1492 qdisc fq_codel
state UNKNOWN group default
[2020-04-29 06:40:35]     link/ppp


syslog:

Wed Apr 29 02:36:07 2020 daemon.info dnsmasq-dhcp[1565]:
DHCPREQUEST(br-lan) 192.168.1.197 XX:XX:XX:XX:XX:XX
Wed Apr 29 02:36:07 2020 daemon.info dnsmasq-dhcp[1565]:
DHCPACK(br-lan) 192.168.1.197 XX:XX:XX:XX:XX:XX ffoo
Wed Apr 29 05:21:53 2020 daemon.info dnsmasq-dhcp[1565]:
DHCPREQUEST(br-lan) 192.168.1.111 XX:XX:XX:XX:XX:XX
Wed Apr 29 05:21:53 2020 daemon.info dnsmasq-dhcp[1565]:
DHCPACK(br-lan) 192.168.1.111 XX:XX:XX:XX:XX:XX bar
Wed Apr 29 05:53:34 2020 daemon.info dnsmasq-dhcp[1565]:
DHCPREQUEST(br-lan) 192.168.1.217 XX:XX:XX:XX:XX:XX
Wed Apr 29 05:53:34 2020 daemon.info dnsmasq-dhcp[1565]:
DHCPACK(br-lan) 192.168.1.217 XX:XX:XX:XX:XX:XX baz
Wed Apr 29 06:01:46 2020 daemon.info dnsmasq-dhcp[1565]:
DHCPREQUEST(br-lan) 192.168.1.225 XX:XX:XX:XX:XX:XX
Wed Apr 29 06:01:46 2020 daemon.info dnsmasq-dhcp[1565]:
DHCPACK(br-lan) 192.168.1.225 XX:XX:XX:XX:XX:XX foobar
Wed Apr 29 06:36:11 2020 daemon.info dnsmasq-dhcp[1565]:
DHCPREQUEST(br-lan) 192.168.1.193 XX:XX:XX:XX:XX:XX
Wed Apr 29 06:36:11 2020 daemon.info dnsmasq-dhcp[1565]:
DHCPACK(br-lan) 192.168.1.193 XX:XX:XX:XX:XX:XX barfoo
Wed Apr 29 06:40:30 2020 daemon.notice pppd[22479]: Modem hangup
Wed Apr 29 06:40:30 2020 daemon.info pppd[22479]: Connect time 2351.5 minutes.
Wed Apr 29 06:40:30 2020 daemon.info pppd[22479]: Sent 524878898
bytes, received 2649683636 bytes.
Wed Apr 29 06:40:30 2020 daemon.notice netifd: Network device
'pppoe-wan' link is down
Wed Apr 29 06:40:30 2020 daemon.debug pppd[22479]: Script
/lib/netifd/ppp-down started (pid 9434)
Wed Apr 29 06:40:31 2020 daemon.notice pppd[22479]: Connection terminated.
Wed Apr 29 06:40:31 2020 daemon.debug pppd[22479]: Send PPPOE
Discovery V1T1 PADT session 0x1 length 28
Wed Apr 29 06:40:31 2020 daemon.debug pppd[22479]:  dst
a4:7b:2c:9e:c7:44  src c4:XX:XX:XX:XX:ed
Wed Apr 29 06:40:31 2020 daemon.debug pppd[22479]:  [host-uniq  00 00
57 cf] [AC-cookie  ed 89 11 59 d8 a7 35 77 f9 b0 98 a2 e3 e3 68 75]
Wed Apr 29 06:40:31 2020 daemon.info pppd[22479]: Sent PADT
Wed Apr 29 06:40:31 2020 daemon.debug pppd[22479]: Waiting for 1 child
processes...
Wed Apr 29 06:40:31 2020 daemon.debug pppd[22479]:   script
/lib/netifd/ppp-down, pid 9434
Wed Apr 29 06:40:31 2020 daemon.notice netifd: Interface 'wan' has
lost the connection
Wed Apr 29 06:40:31 2020 daemon.warn dnsmasq[1565]: no servers found
in /tmp/resolv.conf.auto, will retry
Wed Apr 29 06:40:31 2020 daemon.debug pppd[22479]: Script
/lib/netifd/ppp-down finished (pid 9434), status = 0x1
Wed Apr 29 06:40:31 2020 daemon.info pppd[22479]: Exit.
Wed Apr 29 06:40:31 2020 daemon.notice netifd: Interface 'wan' is now down
Wed Apr 29 06:40:31 2020 kern.info kernel: [825025.766137] IPv6:
ADDRCONF(NETDEV_UP): eth1.3902: link is not ready
Wed Apr 29 06:40:31 2020 kern.info kernel: [825025.781197] eth1: link down
Wed Apr 29 06:40:31 2020 daemon.notice netifd: Interface 'wan' is disabled
Wed Apr 29 06:40:31 2020 kern.info kernel: [825025.815043] IPv6:
ADDRCONF(NETDEV_UP): eth1: link is not ready
Wed Apr 29 06:40:31 2020 kern.info kernel: [825025.822423] eth1: link
up (1000Mbps/Full duplex)
Wed Apr 29 06:40:31 2020 kern.info kernel: [825025.844083] IPv6:
ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
Wed Apr 29 06:40:31 2020 daemon.notice netifd: Interface 'wan' is enabled
Wed Apr 29 06:40:31 2020 daemon.notice netifd: Interface 'wan' is setting up now
Wed Apr 29 06:40:31 2020 kern.info kernel: [825026.010950] device eth1
left promiscuous mode
Wed Apr 29 06:40:31 2020 daemon.err insmod: module is already loaded - slhc
Wed Apr 29 06:40:31 2020 daemon.err insmod: module is already loaded -
ppp_generic
Wed Apr 29 06:40:31 2020 daemon.err insmod: module is already loaded - pppox
Wed Apr 29 06:40:31 2020 daemon.err insmod: module is already loaded - pppoe
Wed Apr 29 06:40:31 2020 daemon.info pppd[9495]: Plugin rp-pppoe.so loaded.
Wed Apr 29 06:40:31 2020 daemon.info pppd[9495]: RP-PPPoE plugin
version 3.8p compiled against pppd 2.4.7
Wed Apr 29 06:40:31 2020 daemon.notice pppd[9495]: pppd 2.4.7 started
by root, uid 0
Wed Apr 29 06:40:31 2020 daemon.debug pppd[9495]: Send PPPOE Discovery
V1T1 PADI session 0x0 length 12
Wed Apr 29 06:40:31 2020 daemon.debug pppd[9495]:  dst
ff:ff:ff:ff:ff:ff  src c4:XX:XX:XX:XX:ed
Wed Apr 29 06:40:31 2020 daemon.debug pppd[9495]:  [service-name]
[host-uniq  00 00 25 17]
Wed Apr 29 06:40:31 2020 daemon.debug pppd[9495]: Recv PPPOE Discovery
V1T1 PADO session 0x0 length 51
Wed Apr 29 06:40:31 2020 daemon.debug pppd[9495]:  dst
c4:XX:XX:XX:XX:ed  src a4:7b:2c:9e:c7:44
Wed Apr 29 06:40:31 2020 daemon.debug pppd[9495]:  [service-name]
[AC-name SIMB_TABOR_BNG1] [host-uniq  00 00 25 17] [AC-cookie  ed 89
11 59 d8 a7 35 77 f9 b0 98 a2 e3 e3 68 75]
Wed Apr 29 06:40:31 2020 daemon.debug pppd[9495]: Send PPPOE Discovery
V1T1 PADR session 0x0 length 32
Wed Apr 29 06:40:31 2020 daemon.debug pppd[9495]:  dst
a4:7b:2c:9e:c7:44  src c4:XX:XX:XX:XX:ed
Wed Apr 29 06:40:31 2020 daemon.debug pppd[9495]:  [service-name]
[host-uniq  00 00 25 17] [AC-cookie  ed 89 11 59 d8 a7 35 77 f9 b0 98
a2 e3 e3 68 75]
Wed Apr 29 06:40:31 2020 daemon.debug pppd[9495]: Recv PPPOE Discovery
V1T1 PADS session 0x1 length 12
Wed Apr 29 06:40:31 2020 daemon.debug pppd[9495]:  dst
c4:XX:XX:XX:XX:ed  src a4:7b:2c:9e:c7:44
Wed Apr 29 06:40:31 2020 daemon.debug pppd[9495]:  [service-name]
[host-uniq  00 00 25 17]
Wed Apr 29 06:40:31 2020 daemon.debug pppd[9495]: PADS: Service-Name: ''
Wed Apr 29 06:40:31 2020 daemon.info pppd[9495]: PPP session is 1
Wed Apr 29 06:40:31 2020 daemon.warn pppd[9495]: Connected to
a4:7b:2c:9e:c7:44 via interface eth1.3902
Wed Apr 29 06:40:31 2020 daemon.debug pppd[9495]: using channel 20
Wed Apr 29 06:40:31 2020 kern.info kernel: [825026.248151] pppoe-wan:
renamed from ppp0
Wed Apr 29 06:40:31 2020 daemon.info pppd[9495]: Renamed interface
ppp0 to pppoe-wan
Wed Apr 29 06:40:31 2020 daemon.info pppd[9495]: Using interface pppoe-wan
Wed Apr 29 06:40:31 2020 daemon.notice pppd[9495]: Connect: pppoe-wan
<--> eth1.3902
Wed Apr 29 06:40:31 2020 daemon.debug pppd[9495]: sent [LCP ConfReq
id=0x1 <mru 1492> <magic 0x5455b2ef>]
Wed Apr 29 06:40:31 2020 daemon.debug pppd[9495]: rcvd [LCP ConfAck
id=0x1 <mru 1492> <magic 0x5455b2ef>]
Wed Apr 29 06:40:34 2020 daemon.debug pppd[9495]: rcvd [LCP ConfReq
id=0x22 <mru 1492> <auth chap MD5> <magic 0x3b0f63c8>]
Wed Apr 29 06:40:34 2020 daemon.debug pppd[9495]: sent [LCP ConfAck
id=0x22 <mru 1492> <auth chap MD5> <magic 0x3b0f63c8>]
Wed Apr 29 06:40:34 2020 daemon.debug pppd[9495]: sent [LCP EchoReq
id=0x0 magic=0x5455b2ef]
Wed Apr 29 06:40:34 2020 daemon.debug pppd[9495]: rcvd [CHAP Challenge
id=0x1 <5f38**********************************************5423>, name
= "SIMB_TABOR_BNG1"]
Wed Apr 29 06:40:34 2020 daemon.debug pppd[9495]: sent [CHAP Response
id=0x1 <c02**********************e9f>, name = "username"]
Wed Apr 29 06:40:34 2020 daemon.debug pppd[9495]: rcvd [LCP EchoRep
id=0x0 magic=0x3b0f63c8]
Wed Apr 29 06:40:34 2020 daemon.debug pppd[9495]: rcvd [CHAP Success
id=0x1 "CHAP authentication success"]
Wed Apr 29 06:40:34 2020 daemon.info pppd[9495]: CHAP authentication
succeeded: CHAP authentication success
Wed Apr 29 06:40:34 2020 daemon.notice pppd[9495]: CHAP authentication succeeded
Wed Apr 29 06:40:34 2020 daemon.notice pppd[9495]: peer from calling
number A4:7B:2C:9E:C7:44 authorized
Wed Apr 29 06:40:35 2020 daemon.debug pppd[9495]: sent [IPCP ConfReq
id=0x1 <addr 0.0.0.0> <ms-dns1 0.0.0.0> <ms-dns2 0.0.0.0>]
Wed Apr 29 06:40:35 2020 daemon.debug pppd[9495]: rcvd [IPCP ConfReq
id=0x2d <addr 94.127.30.0>]
Wed Apr 29 06:40:35 2020 daemon.debug pppd[9495]: sent [IPCP ConfAck
id=0x2d <addr 94.127.30.0>]
Wed Apr 29 06:40:35 2020 daemon.debug pppd[9495]: rcvd [IPCP ConfNak
id=0x1 <addr 146.X.Y.Z> <ms-dns1 212.18.32.10> <ms-dns2 212.18.32.12>]
Wed Apr 29 06:40:35 2020 daemon.debug pppd[9495]: sent [IPCP ConfReq
id=0x2 <addr 146.X.Y.Z> <ms-dns1 212.18.32.10> <ms-dns2 212.18.32.12>]
Wed Apr 29 06:40:35 2020 daemon.debug pppd[9495]: rcvd [IPCP ConfAck
id=0x2 <addr 146.X.Y.Z> <ms-dns1 212.18.32.10> <ms-dns2 212.18.32.12>]
Wed Apr 29 06:40:35 2020 daemon.notice pppd[9495]: local  IP address 146.X.Y.Z
Wed Apr 29 06:40:35 2020 daemon.notice pppd[9495]: remote IP address 94.127.30.0
Wed Apr 29 06:40:35 2020 daemon.notice pppd[9495]: primary   DNS
address 212.18.32.10
Wed Apr 29 06:40:35 2020 daemon.notice pppd[9495]: secondary DNS
address 212.18.32.12
Wed Apr 29 06:40:35 2020 daemon.debug pppd[9495]: Script
/lib/netifd/ppp-up started (pid 9560)
Wed Apr 29 06:40:35 2020 daemon.notice netifd: Network device
'pppoe-wan' link is up
Wed Apr 29 06:40:35 2020 daemon.notice netifd: Interface 'wan' is now up
Wed Apr 29 06:40:35 2020 daemon.info dnsmasq[1565]: reading
/tmp/resolv.conf.auto
Wed Apr 29 06:40:35 2020 daemon.info dnsmasq[1565]: using local
addresses only for domain test
Wed Apr 29 06:40:35 2020 daemon.info dnsmasq[1565]: using local
addresses only for domain onion
Wed Apr 29 06:40:35 2020 daemon.info dnsmasq[1565]: using local
addresses only for domain localhost
Wed Apr 29 06:40:35 2020 daemon.info dnsmasq[1565]: using local
addresses only for domain local
Wed Apr 29 06:40:35 2020 daemon.info dnsmasq[1565]: using local
addresses only for domain invalid
Wed Apr 29 06:40:35 2020 daemon.info dnsmasq[1565]: using local
addresses only for domain bind
Wed Apr 29 06:40:35 2020 daemon.info dnsmasq[1565]: using local
addresses only for domain lan
Wed Apr 29 06:40:35 2020 daemon.info dnsmasq[1565]: using nameserver
212.18.32.10#53
Wed Apr 29 06:40:35 2020 daemon.info dnsmasq[1565]: using nameserver
212.18.32.12#53
Wed Apr 29 06:40:35 2020 daemon.debug pppd[9495]: Script
/lib/netifd/ppp-up finished (pid 9560), status = 0x1
Wed Apr 29 06:40:36 2020 user.notice firewall: Reloading firewall due
to ifup of wan (pppoe-wan)
Wed Apr 29 08:07:42 2020 daemon.info dnsmasq-dhcp[1565]:
DHCPREQUEST(br-lan) 192.168.1.197 XX:XX:XX:XX:XX:XX


Regards,
David

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (27 preceding siblings ...)
  2020-04-29 11:58 ` David Balažic
@ 2020-05-03 10:31 ` David Balažic
  2020-05-04 12:27 ` Guillaume Nault
                   ` (13 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: David Balažic @ 2020-05-03 10:31 UTC (permalink / raw)
  To: linux-ppp

On Wed, 22 Apr 2020 at 19:45, David Balažic <xerces9@gmail.com> wrote:
>
> Hi!
>
> I have a router running openwrt connected to a GPON ONT, running a
> PPPoE connection (hw details below).
>
> The problem is, that after some random time one to 20 hours, the
> connection breaks. Then in about a minute it reconnects (successfully,
> each time).

Mystery solved.

I added a debug output to kernel pppoe.c:

pr_info("pppoe: pppoe_disc_rcv /my patch/  PADT received, sid=%d, SRC:
%02x:%02x:%02x:%02x:%02x:%02x, DST: %02x:%02x:%02x:%02x:%02x:%02x\n",
ph->sid,
eth_hdr(skb)->h_source[0],
eth_hdr(skb)->h_source[1],
eth_hdr(skb)->h_source[2],
eth_hdr(skb)->h_source[3],
eth_hdr(skb)->h_source[4],
eth_hdr(skb)->h_source[5],
eth_hdr(skb)->h_dest[0],
eth_hdr(skb)->h_dest[1],
eth_hdr(skb)->h_dest[2],
eth_hdr(skb)->h_dest[3],
eth_hdr(skb)->h_dest[4],
eth_hdr(skb)->h_dest[5]
);

When the problem happens, it prints:

pppoe: pppoe_disc_rcv /my patch/  PADT received, sid=1, SRC:
a4:7b:2c:9e:c7:44, DST: 44:4e:6d:fd:c7:39

My HWaddr is c4:xxxxxx:ed

After that, the pppoe module closes the connection.

So the problem was as already suspected, that pppoe does not properly
check if the received PADT packet belongs to its ppp session.

I'll patch it up soon (if someone does not beat me to it).

Regards,
David

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (28 preceding siblings ...)
  2020-05-03 10:31 ` David Balažic
@ 2020-05-04 12:27 ` Guillaume Nault
  2020-05-04 13:01 ` Guillaume Nault
                   ` (12 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Guillaume Nault @ 2020-05-04 12:27 UTC (permalink / raw)
  To: linux-ppp

On Wed, Apr 29, 2020 at 01:58:37PM +0200, David Balažic wrote:
> Next disconnect:
> 
> NOTE: c4:XX:XX:XX:XX:ed  is my wan interfaces MAC address
> 
> tcpdump -e -v -i eth1 vlan 3902 and pppoed
> 
This puts the parent interface in promiscuous mode.

> tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size
> 262144 bytes
> ** several PADO and PADS packet sent from server/AC to
> 44:4e:6d:fd:c7:39 before this, during the day **
> 05:03:44.348738 a4:7b:2c:9e:c7:44 (oui Unknown) > 00:0f:94:ef:64:e0
> (oui Unknown), ethertype 802.1Q (0x8100), length 75: vlan 3902, p 1,
> ethertype PPPoE D, PPPoE PADO [Service-Name] [AC-Name
> "SIMB_TABOR_BNG1"] [Host-Uniq 0x960B0000] [AC-Cookie
> 0xDCC3248D61A5D76C7C1DD4138431B17C]
> 06:40:27.668095 a4:7b:2c:9e:c7:44 (oui Unknown) > 44:4e:6d:fd:c7:39
> (oui Unknown), ethertype 802.1Q (0x8100), length 101: vlan 3902, p 1,
> ethertype PPPoE D, PPPoE PADO [Service-Name] [AC-Name
> "SIMB_TABOR_BNG1"] [Host-Uniq
> 0x444E6DFDC739AAAAAAAA92000000AAAAAAAA4621426F78202020AAAAAAAA]
> [AC-Cookie ".5b.u...~.F.mlKv"]
> 06:40:27.674078 a4:7b:2c:9e:c7:44 (oui Unknown) > 44:4e:6d:fd:c7:39
> (oui Unknown), ethertype 802.1Q (0x8100), length 62: vlan 3902, p 1,
> ethertype PPPoE D, PPPoE PADS [ses 0x1] [Service-Name] [Host-Uniq
> 0x444E6DFDC739AAAAAAAA92000000AAAAAAAA4621426F78202020AAAAAAAA]

> 06:40:30.552119 a4:7b:2c:9e:c7:44 (oui Unknown) > 44:4e:6d:fd:c7:39
> (oui Unknown), ethertype 802.1Q (0x8100), length 60: vlan 3902, p 1,
> ethertype PPPoE D, PPPoE PADT [ses 0x1]
> 
Because the interface is in promiscuous mode, this PADT is accepted,
even though the destination MAC address is different from the one of
your interface. That's certainly the reason why you got disconnected
(assuming your PPPoE session Id. was 1).

Also, please avoid wrapping your log lines, that makes them close to
unreadable.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (29 preceding siblings ...)
  2020-05-04 12:27 ` Guillaume Nault
@ 2020-05-04 13:01 ` Guillaume Nault
  2020-05-04 15:54 ` David Balažic
                   ` (11 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Guillaume Nault @ 2020-05-04 13:01 UTC (permalink / raw)
  To: linux-ppp

On Sun, May 03, 2020 at 12:31:51PM +0200, David Balažic wrote:
> On Wed, 22 Apr 2020 at 19:45, David Balažic <xerces9@gmail.com> wrote:
> >
> > Hi!
> >
> > I have a router running openwrt connected to a GPON ONT, running a
> > PPPoE connection (hw details below).
> >
> > The problem is, that after some random time one to 20 hours, the
> > connection breaks. Then in about a minute it reconnects (successfully,
> > each time).
> 
> Mystery solved.
> 
> I added a debug output to kernel pppoe.c:
> 
> pr_info("pppoe: pppoe_disc_rcv /my patch/  PADT received, sid=%d, SRC:
> %02x:%02x:%02x:%02x:%02x:%02x, DST: %02x:%02x:%02x:%02x:%02x:%02x\n",
> ph->sid,
> eth_hdr(skb)->h_source[0],
> eth_hdr(skb)->h_source[1],
> eth_hdr(skb)->h_source[2],
> eth_hdr(skb)->h_source[3],
> eth_hdr(skb)->h_source[4],
> eth_hdr(skb)->h_source[5],
> eth_hdr(skb)->h_dest[0],
> eth_hdr(skb)->h_dest[1],
> eth_hdr(skb)->h_dest[2],
> eth_hdr(skb)->h_dest[3],
> eth_hdr(skb)->h_dest[4],
> eth_hdr(skb)->h_dest[5]
> );
>
You can use "%pM" for printing MAC addresses. Also, it'd be interesting
to have information about promisc mode:
  "dev %s, flags: %#x, promiscuity %u",
  dev->name, dev->flags, dev->promiscuity,

> When the problem happens, it prints:
> 
> pppoe: pppoe_disc_rcv /my patch/  PADT received, sid=1, SRC:
> a4:7b:2c:9e:c7:44, DST: 44:4e:6d:fd:c7:39
> 
> My HWaddr is c4:xxxxxx:ed
> 
> After that, the pppoe module closes the connection.
> 
> So the problem was as already suspected, that pppoe does not properly
> check if the received PADT packet belongs to its ppp session.
> 
Given the logs you posted previously, it looks like the frame was
accepted because your interface was in promiscuous mode.

> I'll patch it up soon (if someone does not beat me to it).
> 
Please verify that the problem still happens when the device is _not_
in promisc mode.

> Regards,
> David
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (30 preceding siblings ...)
  2020-05-04 13:01 ` Guillaume Nault
@ 2020-05-04 15:54 ` David Balažic
  2020-05-04 16:36 ` David Balažic
                   ` (10 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: David Balažic @ 2020-05-04 15:54 UTC (permalink / raw)
  To: linux-ppp

On Mon, 4 May 2020 at 14:27, Guillaume Nault <gnault@redhat.com> wrote:
>
> Also, please avoid wrapping your log lines, that makes them close to
> unreadable.

Sorry about that, it is gmail.com doing it. (a long time know issue
they refuse to fix).

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (31 preceding siblings ...)
  2020-05-04 15:54 ` David Balažic
@ 2020-05-04 16:36 ` David Balažic
  2020-05-04 18:11 ` Guillaume Nault
                   ` (9 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: David Balažic @ 2020-05-04 16:36 UTC (permalink / raw)
  To: linux-ppp

On Mon, 4 May 2020 at 15:01, Guillaume Nault <gnault@redhat.com> wrote:
> You can use "%pM" for printing MAC addresses. Also, it'd be interesting
> to have information about promisc mode:
>   "dev %s, flags: %#x, promiscuity %u",
>   dev->name, dev->flags, dev->promiscuity,

"ifdown wan" triggers a PADT response from the server, and then this
code prints:

 dev eth1.3902, flags: 0x1003, promiscuity 0

I'll report later values printed when a stray PADT appears.

Regards,
David

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (32 preceding siblings ...)
  2020-05-04 16:36 ` David Balažic
@ 2020-05-04 18:11 ` Guillaume Nault
  2020-05-04 18:43 ` David Balažic
                   ` (8 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Guillaume Nault @ 2020-05-04 18:11 UTC (permalink / raw)
  To: linux-ppp

On Mon, May 04, 2020 at 06:36:48PM +0200, David Balažic wrote:
> On Mon, 4 May 2020 at 15:01, Guillaume Nault <gnault@redhat.com> wrote:
> > You can use "%pM" for printing MAC addresses. Also, it'd be interesting
> > to have information about promisc mode:
> >   "dev %s, flags: %#x, promiscuity %u",
> >   dev->name, dev->flags, dev->promiscuity,
> 
> "ifdown wan" triggers a PADT response from the server, and then this
> code prints:
> 
>  dev eth1.3902, flags: 0x1003, promiscuity 0
> 
I'm not sure what "ifdown wan" is supposed to do. Assuming "wan" is the
name of your ppp interface (wasn't it "pppoe-wan" in your previous
logs?) and that ifdown works like on Debian, then it makes sense that
the peer disconnects the PPPoE session by sending a PADT. At least now
the device isn't in promisc mode.

> I'll report later values printed when a stray PADT appears.
> 
Okay, but please keep printing the destination MAC address of the
packet. I was providing the flags/promiscuity string just to get extra
information, not to replace your original log.

Also, what's the driver of your NIC (ethtool -i eth1)?

> Regards,
> David
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (33 preceding siblings ...)
  2020-05-04 18:11 ` Guillaume Nault
@ 2020-05-04 18:43 ` David Balažic
  2020-05-06  9:52 ` David Balažic
                   ` (7 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: David Balažic @ 2020-05-04 18:43 UTC (permalink / raw)
  To: linux-ppp

On Mon, 4 May 2020 at 20:12, Guillaume Nault <gnault@redhat.com> wrote:
>
> On Mon, May 04, 2020 at 06:36:48PM +0200, David Balažic wrote:
> > On Mon, 4 May 2020 at 15:01, Guillaume Nault <gnault@redhat.com> wrote:
> > > You can use "%pM" for printing MAC addresses. Also, it'd be interesting
> > > to have information about promisc mode:
> > >   "dev %s, flags: %#x, promiscuity %u",
> > >   dev->name, dev->flags, dev->promiscuity,
> >
> > "ifdown wan" triggers a PADT response from the server, and then this
> > code prints:
> >
> >  dev eth1.3902, flags: 0x1003, promiscuity 0
> >
> I'm not sure what "ifdown wan" is supposed to do. Assuming "wan" is the
> name of your ppp interface (wasn't it "pppoe-wan" in your previous
> logs?) and that ifdown works like on Debian, then it makes sense that
> the peer disconnects the PPPoE session by sending a PADT. At least now
> the device isn't in promisc mode.

Yes, "wan" is the interface alias. ifdown "closes" it... sends a PADT
to the server, server replies with PADT...

> > I'll report later values printed when a stray PADT appears.
> >
> Okay, but please keep printing the destination MAC address of the
> packet. I was providing the flags/promiscuity string just to get extra
> information, not to replace your original log.
>
> Also, what's the driver of your NIC (ethtool -i eth1)?

# ethtool  -i eth1
driver: ag71xx

Regards,
David

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (34 preceding siblings ...)
  2020-05-04 18:43 ` David Balažic
@ 2020-05-06  9:52 ` David Balažic
  2020-05-06 10:34 ` David Balažic
                   ` (6 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: David Balažic @ 2020-05-06  9:52 UTC (permalink / raw)
  To: linux-ppp

On Mon, 4 May 2020 at 20:12, Guillaume Nault <gnault@redhat.com> wrote:
>
> On Mon, May 04, 2020 at 06:36:48PM +0200, David Balažic wrote:
> > On Mon, 4 May 2020 at 15:01, Guillaume Nault <gnault@redhat.com> wrote:
> > > You can use "%pM" for printing MAC addresses. Also, it'd be interesting
> > > to have information about promisc mode:
> > >   "dev %s, flags: %#x, promiscuity %u",
> > >   dev->name, dev->flags, dev->promiscuity,
> >
>
> > I'll report later values printed when a stray PADT appears.
> >
> Okay, but please keep printing the destination MAC address of the
> packet. I was providing the flags/promiscuity string just to get extra
> information, not to replace your original log.

This was logged now:

(all at May  6 05:34:50 2020 UTC)
pppoe_disc_rcv PADT received, sid=1, SRC: a4:7b:2c:9e:c7:44, DST:
44:4e:6d:fd:c7:39
pppoe_disc_rcv PADT received, own hw addr: c4:XX:XX:XX:XX:ed
      dev eth1.3902, flags: 0x1003, promiscuity 0
pppoe_disc_rcv PADT received, not four our address, ignoring

(the last line is from my fix, the connection is now not dropped when
the PADT is not for us; works fine, my connection stays up and
working)

I'll clean up and post the patch later.

Regards,
David

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (35 preceding siblings ...)
  2020-05-06  9:52 ` David Balažic
@ 2020-05-06 10:34 ` David Balažic
  2020-05-06 14:26 ` Guillaume Nault
                   ` (5 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: David Balažic @ 2020-05-06 10:34 UTC (permalink / raw)
  To: linux-ppp

On Wed, 6 May 2020 at 11:52, David Balažic <xerces9@gmail.com> wrote:

> I'll clean up and post the patch later.

Patch and description at https://bugzilla.kernel.org/show_bug.cgi?id=207597

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (36 preceding siblings ...)
  2020-05-06 10:34 ` David Balažic
@ 2020-05-06 14:26 ` Guillaume Nault
  2020-05-06 15:02 ` Guillaume Nault
                   ` (4 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Guillaume Nault @ 2020-05-06 14:26 UTC (permalink / raw)
  To: linux-ppp

On Wed, May 06, 2020 at 11:52:48AM +0200, David Balažic wrote:
> On Mon, 4 May 2020 at 20:12, Guillaume Nault <gnault@redhat.com> wrote:
> >
> > On Mon, May 04, 2020 at 06:36:48PM +0200, David Balažic wrote:
> > > On Mon, 4 May 2020 at 15:01, Guillaume Nault <gnault@redhat.com> wrote:
> > > > You can use "%pM" for printing MAC addresses. Also, it'd be interesting
> > > > to have information about promisc mode:
> > > >   "dev %s, flags: %#x, promiscuity %u",
> > > >   dev->name, dev->flags, dev->promiscuity,
> > >
> >
> > > I'll report later values printed when a stray PADT appears.
> > >
> > Okay, but please keep printing the destination MAC address of the
> > packet. I was providing the flags/promiscuity string just to get extra
> > information, not to replace your original log.
> 
> This was logged now:
> 
> (all at May  6 05:34:50 2020 UTC)
> pppoe_disc_rcv PADT received, sid=1, SRC: a4:7b:2c:9e:c7:44, DST:
> 44:4e:6d:fd:c7:39
> pppoe_disc_rcv PADT received, own hw addr: c4:XX:XX:XX:XX:ed
>       dev eth1.3902, flags: 0x1003, promiscuity 0
> pppoe_disc_rcv PADT received, not four our address, ignoring
> 
> (the last line is from my fix, the connection is now not dropped when
> the PADT is not for us; works fine, my connection stays up and
> working)
> 
Looks like a more fundamental issue. This frame shouldn't have been
accepted in the first place. Can you also print the packet class
("... pkt_type %u", ..., skb->pkt_type)?

Testing the destination MAC here is likely to just paper over the
problem.

> I'll clean up and post the patch later.
> 
> Regards,
> David
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (37 preceding siblings ...)
  2020-05-06 14:26 ` Guillaume Nault
@ 2020-05-06 15:02 ` Guillaume Nault
  2020-05-06 15:15 ` James Carlson
                   ` (3 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: Guillaume Nault @ 2020-05-06 15:02 UTC (permalink / raw)
  To: linux-ppp

On Mon, May 04, 2020 at 08:43:26PM +0200, David Balažic wrote:
> On Mon, 4 May 2020 at 20:12, Guillaume Nault <gnault@redhat.com> wrote:
> >
> > On Mon, May 04, 2020 at 06:36:48PM +0200, David Balažic wrote:
> > > On Mon, 4 May 2020 at 15:01, Guillaume Nault <gnault@redhat.com> wrote:
> > > > You can use "%pM" for printing MAC addresses. Also, it'd be interesting
> > > > to have information about promisc mode:
> > > >   "dev %s, flags: %#x, promiscuity %u",
> > > >   dev->name, dev->flags, dev->promiscuity,
> > >
> > > "ifdown wan" triggers a PADT response from the server, and then this
> > > code prints:
> > >
> > >  dev eth1.3902, flags: 0x1003, promiscuity 0
> > >
> > I'm not sure what "ifdown wan" is supposed to do. Assuming "wan" is the
> > name of your ppp interface (wasn't it "pppoe-wan" in your previous
> > logs?) and that ifdown works like on Debian, then it makes sense that
> > the peer disconnects the PPPoE session by sending a PADT. At least now
> > the device isn't in promisc mode.
> 
> Yes, "wan" is the interface alias. ifdown "closes" it... sends a PADT
> to the server, server replies with PADT...
> 
Thanks.

> > > I'll report later values printed when a stray PADT appears.
> > >
> > Okay, but please keep printing the destination MAC address of the
> > packet. I was providing the flags/promiscuity string just to get extra
> > information, not to replace your original log.
> >
> > Also, what's the driver of your NIC (ethtool -i eth1)?
> 
> # ethtool  -i eth1
> driver: ag71xx
> 
Ok, can you connect this interface directly to a box under your control
and use this box to ping the ag71xx interface using an invalid
destination MAC address?

Assuming your other box runs Linux and the IP address assigned to the
ag71xx NIC is 192.0.2.1, the test would look like this:

$ ping 192.0.2.1    # Basic test. Also makes Linux get the peer dst MAC

$ ip neighbour list # Show the peer dst MAC
192.0.2.1 dev ethX lladdr aa:bb:cc:dd:ee:ff REACHABLE

$ ip neighbour replace 192.0.2.1 dev ethX lladdr aa:bb:cc:dd:ee:f0

$ ping 192.0.2.1   # Try to ping again, this time using a wrong dst MAC

> Regards,
> David
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (38 preceding siblings ...)
  2020-05-06 15:02 ` Guillaume Nault
@ 2020-05-06 15:15 ` James Carlson
  2020-05-06 15:39 ` David Balažic
                   ` (2 subsequent siblings)
  42 siblings, 0 replies; 44+ messages in thread
From: James Carlson @ 2020-05-06 15:15 UTC (permalink / raw)
  To: linux-ppp

On 2020-05-06 10:26, Guillaume Nault wrote:
> On Wed, May 06, 2020 at 11:52:48AM +0200, David Balažic wrote:
>> (the last line is from my fix, the connection is now not dropped when
>> the PADT is not for us; works fine, my connection stays up and
>> working)
>>
> Looks like a more fundamental issue. This frame shouldn't have been
> accepted in the first place. Can you also print the packet class
> ("... pkt_type %u", ..., skb->pkt_type)?
> 
> Testing the destination MAC here is likely to just paper over the
> problem.

Having either sketchy (e.g., CRC hash filter based) or just absent
hardware filtering of unicast addresses isn't an unknown property for
some Ethernet devices.  Relying on exact match from the underlying
hardware seems like a bad idea.  (Some, for instance, can either receive
"all multicast" or can filter using the low N bits of the CRC to index
into a 2^N-bit hash table, but can't really exclude a unicast packet
that happens to match the low N bits of a selectively enabled multicast
address.  See, for example, the PowerQUICC implementation.  But there
are quite a few that work like this.  It wouldn't surprise me in the
least that 'ar71xx' [Atheros/Qualcomm?] is like this.)

I agree that having to test the MAC address right here is a bit lame and
that the driver itself ought to assist the hardware in doing the Right
Thing.

-- 
James Carlson         42.703N 71.076W         <carlsonj@workingcode.com>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (39 preceding siblings ...)
  2020-05-06 15:15 ` James Carlson
@ 2020-05-06 15:39 ` David Balažic
  2020-05-06 15:54 ` David Balažic
  2020-05-06 20:20 ` Guillaume Nault
  42 siblings, 0 replies; 44+ messages in thread
From: David Balažic @ 2020-05-06 15:39 UTC (permalink / raw)
  To: linux-ppp

On Wed, 6 May 2020 at 16:26, Guillaume Nault <gnault@redhat.com> wrote:
>
> On Wed, May 06, 2020 at 11:52:48AM +0200, David Balažic wrote:
> > On Mon, 4 May 2020 at 20:12, Guillaume Nault <gnault@redhat.com> wrote:
> > >
> > > On Mon, May 04, 2020 at 06:36:48PM +0200, David Balažic wrote:
> > > > On Mon, 4 May 2020 at 15:01, Guillaume Nault <gnault@redhat.com> wrote:
> > > > > You can use "%pM" for printing MAC addresses. Also, it'd be interesting
> > > > > to have information about promisc mode:
> > > > >   "dev %s, flags: %#x, promiscuity %u",
> > > > >   dev->name, dev->flags, dev->promiscuity,
> > > >
> > >
> > > > I'll report later values printed when a stray PADT appears.
> > > >
> > > Okay, but please keep printing the destination MAC address of the
> > > packet. I was providing the flags/promiscuity string just to get extra
> > > information, not to replace your original log.
> >
> > This was logged now:
> >
> > (all at May  6 05:34:50 2020 UTC)
> > pppoe_disc_rcv PADT received, sid=1, SRC: a4:7b:2c:9e:c7:44, DST:
> > 44:4e:6d:fd:c7:39
> > pppoe_disc_rcv PADT received, own hw addr: c4:XX:XX:XX:XX:ed
> >       dev eth1.3902, flags: 0x1003, promiscuity 0
> > pppoe_disc_rcv PADT received, not four our address, ignoring
> >
> > (the last line is from my fix, the connection is now not dropped when
> > the PADT is not for us; works fine, my connection stays up and
> > working)
> >
> Looks like a more fundamental issue. This frame shouldn't have been
> accepted in the first place. Can you also print the packet class
> ("... pkt_type %u", ..., skb->pkt_type)?

After fixing it, the priority of this issue really dropped on my task list.
Maybe someone has a similar hardware and can test?

It is an Atheros AR7161 SoC. (full HW info on the openwrt.org pages,
look for Netgear WNDR3700v2)

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (40 preceding siblings ...)
  2020-05-06 15:39 ` David Balažic
@ 2020-05-06 15:54 ` David Balažic
  2020-05-06 20:20 ` Guillaume Nault
  42 siblings, 0 replies; 44+ messages in thread
From: David Balažic @ 2020-05-06 15:54 UTC (permalink / raw)
  To: linux-ppp

On Wed, 6 May 2020 at 17:02, Guillaume Nault <gnault@redhat.com> wrote:
>
> On Mon, May 04, 2020 at 08:43:26PM +0200, David Balažic wrote:
> > On Mon, 4 May 2020 at 20:12, Guillaume Nault <gnault@redhat.com> wrote:
> > >
> > > Also, what's the driver of your NIC (ethtool -i eth1)?
> >
> > # ethtool  -i eth1
> > driver: ag71xx

This seems to be a driver from the openwrt project :
https://git.openwrt.org/?p=openwrt/openwrt.git;a=tree;f=target/linux/ar71xx/files/drivers/net/ethernet/atheros/ag71xx;h=3ced45e853aca454e8f745a686834ea1cce259f2;hb=33732f4a9c17921b782167a0dcaba9703d4e6753
(hopefully still readable)

The official kernel has a driver for this hardware since kernel 5.3.x
(and not before).
(linux-x.y.z/drivers/net/ethernet/atheros/ag71xx.c)

Maybe that one is better and openwrt will use it eventually.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: PPPoE Modem hangup after random time - how to debug?
  2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
                   ` (41 preceding siblings ...)
  2020-05-06 15:54 ` David Balažic
@ 2020-05-06 20:20 ` Guillaume Nault
  42 siblings, 0 replies; 44+ messages in thread
From: Guillaume Nault @ 2020-05-06 20:20 UTC (permalink / raw)
  To: linux-ppp

On Wed, May 06, 2020 at 11:15:46AM -0400, James Carlson wrote:
> On 2020-05-06 10:26, Guillaume Nault wrote:
> > On Wed, May 06, 2020 at 11:52:48AM +0200, David Balažic wrote:
> >> (the last line is from my fix, the connection is now not dropped when
> >> the PADT is not for us; works fine, my connection stays up and
> >> working)
> >>
> > Looks like a more fundamental issue. This frame shouldn't have been
> > accepted in the first place. Can you also print the packet class
> > ("... pkt_type %u", ..., skb->pkt_type)?
> > 
> > Testing the destination MAC here is likely to just paper over the
> > problem.
> 
> Having either sketchy (e.g., CRC hash filter based) or just absent
> hardware filtering of unicast addresses isn't an unknown property for
> some Ethernet devices.  Relying on exact match from the underlying
> hardware seems like a bad idea.  (Some, for instance, can either receive
> "all multicast" or can filter using the low N bits of the CRC to index
> into a 2^N-bit hash table, but can't really exclude a unicast packet
> that happens to match the low N bits of a selectively enabled multicast
> address.  See, for example, the PowerQUICC implementation.  But there
> are quite a few that work like this.  It wouldn't surprise me in the
> least that 'ar71xx' [Atheros/Qualcomm?] is like this.)
> 
Yes, but that's just a limitation of the chip. I'm not a driver guy,
but I'd expect the driver to do whatever is need, in software, to fix
such limitations.

> I agree that having to test the MAC address right here is a bit lame and
> that the driver itself ought to assist the hardware in doing the Right
> Thing.
> 
Yes, we shouldn't have to check the destination MAC address in all
ethertype handlers. The Ethernet layer has to do its job.

But looking at the openwrt code, it seems that the driver properly
calls eth_type_trans(). So skb->pkt_type should be correctly set and we
could use it to know if the frame was for us. It'd still be good to
verify that skb->pkt_type = PACKET_OTHERHOST when the original problem
happens.

Testing if ->pkt_type = PACKET_HOST in the PPPoE discovery handler
would be a good thing anyway, to avoid processing packets just because
someone started tcpdump. That might also "fix" David's problem as a
side effect.

> -- 
> James Carlson         42.703N 71.076W         <carlsonj@workingcode.com>
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2020-05-06 20:20 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-22 17:45 PPPoE Modem hangup after random time - how to debug? David Balažic
2020-04-22 19:51 ` Michael Richardson
2020-04-22 22:00 ` James Carlson
2020-04-23  0:08 ` Guillaume Nault
2020-04-23  0:41 ` Michael Richardson
2020-04-23  2:01 ` James Carlson
2020-04-23 10:37 ` David Balažic
2020-04-23 11:12 ` David Balažic
2020-04-23 12:13 ` David Balažic
2020-04-23 15:59 ` Michael Richardson
2020-04-23 16:01 ` Michael Richardson
2020-04-24 13:47 ` David Balažic
2020-04-24 14:02 ` David Balažic
2020-04-24 14:26 ` James Carlson
2020-04-24 15:44 ` Guillaume Nault
2020-04-24 15:54 ` Guillaume Nault
2020-04-26  1:38 ` Michael Richardson
2020-04-26 12:44 ` David Balažic
2020-04-26 12:48 ` David Balažic
2020-04-26 19:03 ` David Balažic
2020-04-27  2:14 ` Michael Richardson
2020-04-27  9:59 ` David Balažic
2020-04-27 13:43 ` James Carlson
2020-04-27 20:08 ` Guillaume Nault
2020-04-27 20:30 ` Guillaume Nault
2020-04-27 23:34 ` David Balažic
2020-04-28 10:18 ` David Balažic
2020-04-28 11:00 ` Guillaume Nault
2020-04-29 11:58 ` David Balažic
2020-05-03 10:31 ` David Balažic
2020-05-04 12:27 ` Guillaume Nault
2020-05-04 13:01 ` Guillaume Nault
2020-05-04 15:54 ` David Balažic
2020-05-04 16:36 ` David Balažic
2020-05-04 18:11 ` Guillaume Nault
2020-05-04 18:43 ` David Balažic
2020-05-06  9:52 ` David Balažic
2020-05-06 10:34 ` David Balažic
2020-05-06 14:26 ` Guillaume Nault
2020-05-06 15:02 ` Guillaume Nault
2020-05-06 15:15 ` James Carlson
2020-05-06 15:39 ` David Balažic
2020-05-06 15:54 ` David Balažic
2020-05-06 20:20 ` Guillaume Nault

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).