* Re: Can only connect  to RMS gateway once
@ 2016-06-03 20:16 Basil Gunn
  2016-06-03 23:45 ` David Ranch
  2016-06-04 20:43 ` Basil Gunn
  0 siblings, 2 replies; 10+ messages in thread
From: Basil Gunn @ 2016-06-03 20:16 UTC (permalink / raw)
  To: linux-hams

> Have you tried to disable smp (in grub, boot the kernel with the
> cmdline option nosmp)? Did then the problem still occur?

I disabled SMP, and the problem of the socket remaining open after
disconnect still occurs with kernels >= 4.2.

cat /boot/cmdline.txt
dwc_otg.lpm_enable=0 console=serial0,115200 console=tty1
root=/dev/mmcblk0p2 rootfstype=ext4 elevator=deadline fsck.repair=yes
maxcpus=0 rootwait
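
A quick way to double-check that a command line like the one above really disables SMP is sketched below. It relies on the kernel parameter documentation, which describes maxcpus=0 as equivalent to nosmp. The function name cmdline_disables_smp is made up for this illustration; on a running system the string would typically come from /proc/cmdline.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if a kernel command line contains
 * a token that disables SMP ("nosmp" or "maxcpus=0"). */
bool cmdline_disables_smp(const char *cmdline)
{
    const char *p = cmdline;

    while (*p) {
        /* Skip separators, then measure the next token. */
        p += strspn(p, " \t\n");
        size_t len = strcspn(p, " \t\n");

        if ((len == 5 && strncmp(p, "nosmp", 5) == 0) ||
            (len == 9 && strncmp(p, "maxcpus=0", 9) == 0))
            return true;

        p += len;
    }
    return false;
}
```

Matching whole tokens rather than substrings avoids treating something like maxcpus=04 as a match.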


I built a 4.1.21 kernel from the raspbian repo
https://github.com/raspberrypi/linux/tree/rpi-4.1.y

and the problem does NOT exist with that kernel.

I built a 4.2.8 kernel from the raspbian repo
https://github.com/raspberrypi/linux/tree/rpi-4.2.y

and the problem DOES exist with that kernel.
Showing connection listening after final disconnect.

Active AX.25 sockets
Dest       Source     Device  State        Vr/Vs    Send-Q  Recv-Q
N7NIX-0    N7NIX-11   ax0     LISTENING    001/004  0       0
*          N7NIX-11   ax0     LISTENING    000/000  0       0


/Basil n7nix


* Re: Can only connect to RMS gateway once
  2016-06-03 20:16 Can only connect to RMS gateway once Basil Gunn
@ 2016-06-03 23:45 ` David Ranch
  2016-06-04 20:43 ` Basil Gunn
  1 sibling, 0 replies; 10+ messages in thread
From: David Ranch @ 2016-06-03 23:45 UTC (permalink / raw)
  To: Basil Gunn, linux-hams

Hey Basil,

Thanks for digging into this.  Few things to check:

GOOD - the 4.1.21 kernel was released on 4/6/16
BAD - 4.2.8 kernel was released on 12/16/15

Can you try a vanilla kernel with:
    - 4.2.7, released on 12/9/15 to see if that's OK
    - 4.1.25, released on 5/23/16 to see if that's ok?  (a worst-case
4.1.x test)

If 4.1.25 is bad:
    - Try 4.1.14, released on 12/9/15 to see if that's ok?

If that's ok, try:
    - 4.1.15 released on 12/15/15
    - 4.1.16 released on 1/23/16
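
The test plan above is essentially a manual bisection between a known-good and a known-bad release. A minimal sketch of choosing the next release to build follows. It assumes the candidate releases are kept in one ordered list and that the bug, once introduced, stays in every later entry; that assumption is shaky across separate stable branches such as 4.1.y and 4.2.y. The function name next_bisect_index is hypothetical.

```c
/* Hypothetical helper for a manual bisection over an ordered release list.
 * last_good: index of the newest release known to work.
 * first_bad: index of the oldest release known to fail.
 * Returns the index to build and test next, or -1 when the two are
 * adjacent, in which case first_bad is the first failing release. */
long next_bisect_index(long last_good, long first_bad)
{
    if (first_bad - last_good <= 1)
        return -1;
    return last_good + (first_bad - last_good) / 2;
}
```

After each test, the tested index replaces last_good or first_bad depending on the result, and the function is called again until it returns -1.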

--David




On 06/03/2016 01:16 PM, Basil Gunn wrote:
>> Have you tried to disable smp (in grub, boot the kernel with the
>> cmdline option nosmp)? Did then the problem still occur?
> I disabled SMP and the problem of socket remains open after disconnect
> still occurs with kernels >= 4.2.
>
> cat /boot/cmdline.txt
> dwc_otg.lpm_enable=0 console=serial0,115200 console=tty1
> root=/dev/mmcblk0p2 rootfstype=ext4 elevator=deadline fsck.repair=yes
> maxcpus=0 rootwait
>
>
> I built a 4.1.21 kernel from the raspbian repo
> https://github.com/raspberrypi/linux/tree/rpi-4.1.y
>
> and the problem does NOT exist with that kernel.
>
> I built a 4.2.8 kernel from the raspbian repo
> https://github.com/raspberrypi/linux/tree/rpi-4.2.y
>
> and the problem DOES exist with that kernel.
> Showing connection listening after final disconnect.
>
> Active AX.25 sockets
> Dest       Source     Device  State        Vr/Vs    Send-Q  Recv-Q
> N7NIX-0    N7NIX-11   ax0     LISTENING    001/004  0       0
> *          N7NIX-11   ax0     LISTENING    000/000  0       0



* Re: Can only connect  to RMS gateway once
  2016-06-03 20:16 Can only connect to RMS gateway once Basil Gunn
  2016-06-03 23:45 ` David Ranch
@ 2016-06-04 20:43 ` Basil Gunn
  2016-06-04 20:57   ` David Ranch
  1 sibling, 1 reply; 10+ messages in thread
From: Basil Gunn @ 2016-06-04 20:43 UTC (permalink / raw)
  To: linux-hams

This isn't a final solution but the problem is in:

 sock_set_flag(sk, SOCK_DESTROY);

in routine ax25_release() in file net/ax25/af_ax25.c which does what it
is supposed to do in kernel 4.1.21 but NOT in kernels 4.2.8 & above. It
should destroy & free the socket when disconnecting.

For my 4.2.8 kernel, if I add this after the sock_set_flag() call in
ax25_release(), then the connection is released after disconnect & I can
reconnect again.

    release_sock(sk);
    ax25_disconnect(ax25, 0);
    lock_sock(sk);
    ax25_destroy_socket(ax25);

From the af_ax25 code in the 4.1.21 kernel, it expects
sock_set_flag(sk, SOCK_DESTROY); to eventually result in calls to:
 ax25_destroy_socket
 ax25_free_sock


> /Basil n7nix



* Re: Can only connect to RMS gateway once
  2016-06-04 20:43 ` Basil Gunn
@ 2016-06-04 20:57   ` David Ranch
  2016-06-04 21:32     ` Basil Gunn
  0 siblings, 1 reply; 10+ messages in thread
From: David Ranch @ 2016-06-04 20:57 UTC (permalink / raw)
  To: Basil Gunn, linux-hams; +Cc: davem


+ David Miller for comments


I see a change on June 25, 2015, and a few others on that file that seem 
like they could be the issue:

https://github.com/torvalds/linux/commits/master/net/ax25/af_ax25.c


--David


On 06/04/2016 01:43 PM, Basil Gunn wrote:
> This isn't a final solution but the problem is in:
>
>   sock_set_flag(sk, SOCK_DESTROY);
>
> in routine ax25_release() in file net/ax25/af_ax25.c which does what it
> is supposed to do in kernel 4.1.21 but NOT in kernels 4.2.8 & above. It
> should destroy & free the socket when disconnecting.
>
> For my 4.2.8 kernel If I add this after the sock_set_flag() call in
> ax25_release() then the connection is released after disconnect & I can
> reconnect again.
>
>      release_sock(sk);
>      ax25_disconnect(ax25, 0);
>      lock_sock(sk);
>      ax25_destroy_socket(ax25);
>
> From the af_ax25 code in the 4.1.21 kernel, it expects sock_set_flag(sk, SOCK_DESTROY); to
>   ax25_destroy_socket
>   ax25_free_sock
>
>
>> /Basil n7nix
> --
> To unsubscribe from this list: send the line "unsubscribe linux-hams" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



* Re: Can only connect to RMS gateway once
  2016-06-04 20:57   ` David Ranch
@ 2016-06-04 21:32     ` Basil Gunn
  2016-06-05 23:46       ` Basil Gunn
  0 siblings, 1 reply; 10+ messages in thread
From: Basil Gunn @ 2016-06-04 21:32 UTC (permalink / raw)
  To: linux-hams; +Cc: David Ranch

It looks like the SOCK_DESTROY flag should be serviced off the
ax25_ds_timer or ax25_std_timer but that's not happening.
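
One way a flag can be set and still never be serviced by a timer is if the handler only looks at it in certain link states. The userspace sketch below models that idea only; it is not the kernel code. The enum, the function name heartbeat_would_destroy, and the also_check_state_2 switch are all assumptions made for illustration.

```c
#include <stdbool.h>

/* Stand-in link states for this model; the names only echo the
 * kernel's AX25_STATE_* constants. */
enum model_state {
    MODEL_STATE_0,
    MODEL_STATE_1,
    MODEL_STATE_2,
    MODEL_STATE_3
};

/* Returns true if a heartbeat expiry in this model would destroy the
 * socket. The flag is always inspected in state 0, and in state 2 only
 * when also_check_state_2 is set. */
bool heartbeat_would_destroy(enum model_state state, bool destroy_flag,
                             bool also_check_state_2)
{
    bool flag_is_checked =
        state == MODEL_STATE_0 ||
        (state == MODEL_STATE_2 && also_check_state_2);

    return flag_is_checked && destroy_flag;
}
```

In this model a socket that is marked for destruction while the link sits in state 2 is left alone unless the handler is extended to look at that state as well.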

/Basil

On Sat, 4 Jun 2016 13:57:17 -0700
David Ranch <linux-hams@trinnet.net> wrote:

>
> + David Miller for comments
>
>
> I see a change on June 25, 2015, and a few others on that file that
> seem like they could be the issue:
>
> https://github.com/torvalds/linux/commits/master/net/ax25/af_ax25.c
>
>
> --David
>
>
> On 06/04/2016 01:43 PM, Basil Gunn wrote:
> > This isn't a final solution but the problem is in:
> >
> >   sock_set_flag(sk, SOCK_DESTROY);
> >
> > in routine ax25_release() in file net/ax25/af_ax25.c which does
> > what it is supposed to do in kernel 4.1.21 but NOT in kernels 4.2.8
> > & above. It should destroy & free the socket when disconnecting.
> >
> > For my 4.2.8 kernel If I add this after the sock_set_flag() call in
> > ax25_release() then the connection is released after disconnect & I
> > can reconnect again.
> >
> >      release_sock(sk);
> >      ax25_disconnect(ax25, 0);
> >      lock_sock(sk);
> >      ax25_destroy_socket(ax25);
> >
> > From the af_ax25 code in the 4.1.21 kernel, it expects
> > sock_set_flag(sk, SOCK_DESTROY); to
> >   ax25_destroy_socket
> >   ax25_free_sock
> >
> >> /Basil n7nix


* Re: Can only connect to RMS gateway once
  2016-06-04 21:32     ` Basil Gunn
@ 2016-06-05 23:46       ` Basil Gunn
  0 siblings, 0 replies; 10+ messages in thread
From: Basil Gunn @ 2016-06-05 23:46 UTC (permalink / raw)
  To: linux-hams; +Cc: David Ranch, Jeremy McDermond

Below is a unified diff of the 4 files I changed in kernel 4.2.y from
the Raspbian repo.  At least for my case this fixed the problem of an
open socket not getting closed & preventing any more connections from
the original connection callsign. In newer kernels (after 4.1.21)
an ax25_disconnect was occurring and stopping the heartbeat timer. The
socket free code runs off the heartbeat timer.
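
The interaction described above can be modelled in a few lines of userspace C. This is a simplified sketch, not kernel code: struct hb_sock and the hb_* functions are hypothetical stand-ins, and the guarded parameter mimics the idea of leaving the heartbeat running when the socket is already marked for destruction.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a socket with a heartbeat timer. */
struct hb_sock {
    bool destroy_flag;    /* models SOCK_DESTROY being set */
    bool heartbeat_armed; /* models a running heartbeat timer */
    bool freed;           /* models the socket having been freed */
};

/* Models a disconnect. Unguarded, it always stops the heartbeat;
 * guarded, it leaves the heartbeat running for a marked socket. */
void hb_disconnect(struct hb_sock *sk, bool guarded)
{
    if (!guarded || !sk->destroy_flag)
        sk->heartbeat_armed = false;
}

/* Models one heartbeat expiry: frees a marked socket, but only if the
 * timer is still armed. */
void hb_tick(struct hb_sock *sk)
{
    if (sk->heartbeat_armed && sk->destroy_flag) {
        sk->freed = true;
        sk->heartbeat_armed = false;
    }
}

/* Runs release -> disconnect -> one heartbeat expiry and reports
 * whether the socket ended up freed. */
bool hb_close_sequence(bool guarded)
{
    struct hb_sock sk = {
        .destroy_flag = false,
        .heartbeat_armed = true,
        .freed = false
    };

    sk.destroy_flag = true; /* release only marks the socket */
    hb_disconnect(&sk, guarded);
    hb_tick(&sk);
    return sk.freed;
}
```

In this model hb_close_sequence(false) leaves the socket allocated because nothing is left to service the flag, while hb_close_sequence(true) frees it on the next expiry.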

/Basil Gunn n7nix


diff -Nau ax25/af_ax25.c /home/gunn/projects/rpi/rpi-4.2.y.dev/net/ax25/af_ax25.c
--- ax25/af_ax25.c	2016-03-07 06:13:41.000000000 -0800
+++ /home/gunn/projects/rpi/rpi-4.2.y.dev/net/ax25/af_ax25.c	2016-06-05 15:08:29.475604719 -0700
@@ -973,7 +973,9 @@
 			release_sock(sk);
 			ax25_disconnect(ax25, 0);
 			lock_sock(sk);
-			ax25_destroy_socket(ax25);
+			if(!sock_flag(ax25->sk, SOCK_DESTROY)) {
+				ax25_destroy_socket(ax25);
+			}
 			break;

 		case AX25_STATE_3:
diff -Nau ax25/ax25_ds_timer.c /home/gunn/projects/rpi/rpi-4.2.y.dev/net/ax25/ax25_ds_timer.c
--- ax25/ax25_ds_timer.c	2016-03-07 06:13:41.000000000 -0800
+++ /home/gunn/projects/rpi/rpi-4.2.y.dev/net/ax25/ax25_ds_timer.c	2016-06-05 16:19:00.023254345 -0700
@@ -101,7 +101,8 @@

 	switch (ax25->state) {

-	case AX25_STATE_0:
+		case AX25_STATE_0:
+		case AX25_STATE_2:
 		/* Magic here: If we listen() and a new link dies before it
 		   is accepted() it isn't 'dead' so doesn't get removed. */
 		if (!sk || sock_flag(sk, SOCK_DESTROY) ||
@@ -112,6 +113,8 @@
 				ax25_destroy_socket(ax25);
 				bh_unlock_sock(sk);
 				sock_put(sk);
+				ax25_cb_put(sk_to_ax25(sk));
+
 			} else
 				ax25_destroy_socket(ax25);
 			return;
diff -Nau ax25/ax25_std_timer.c /home/gunn/projects/rpi/rpi-4.2.y.dev/net/ax25/ax25_std_timer.c
--- ax25/ax25_std_timer.c	2016-03-07 06:13:41.000000000 -0800
+++ /home/gunn/projects/rpi/rpi-4.2.y.dev/net/ax25/ax25_std_timer.c	2016-06-05 16:18:15.334583540 -0700
@@ -38,6 +38,7 @@

 	switch (ax25->state) {
 	case AX25_STATE_0:
+	case AX25_STATE_2:
 		/* Magic here: If we listen() and a new link dies before it
 		   is accepted() it isn't 'dead' so doesn't get removed. */
 		if (!sk || sock_flag(sk, SOCK_DESTROY) ||
@@ -48,6 +49,7 @@
 				ax25_destroy_socket(ax25);
 				bh_unlock_sock(sk);
 				sock_put(sk);
+				ax25_cb_put(sk_to_ax25(sk));
 			} else
 				ax25_destroy_socket(ax25);
 			return;
@@ -144,7 +146,9 @@
 	case AX25_STATE_2:
 		if (ax25->n2count == ax25->n2) {
 			ax25_send_control(ax25, AX25_DISC, AX25_POLLON, AX25_COMMAND);
-			ax25_disconnect(ax25, ETIMEDOUT);
+
+			if(!sock_flag(ax25->sk, SOCK_DESTROY))
+				ax25_disconnect(ax25, ETIMEDOUT);
 			return;
 		} else {
 			ax25->n2count++;
diff -Nau ax25/ax25_subr.c /home/gunn/projects/rpi/rpi-4.2.y.dev/net/ax25/ax25_subr.c
--- ax25/ax25_subr.c	2016-03-07 06:13:41.000000000 -0800
+++ /home/gunn/projects/rpi/rpi-4.2.y.dev/net/ax25/ax25_subr.c	2016-06-05 14:06:39.424828134 -0700
@@ -262,9 +262,10 @@

 void ax25_disconnect(ax25_cb *ax25, int reason)
 {
-	ax25_clear_queues(ax25);

-	ax25_stop_heartbeat(ax25);
+	ax25_clear_queues(ax25);
+	if(!sock_flag(ax25->sk, SOCK_DESTROY))
+		ax25_stop_heartbeat(ax25);
 	ax25_stop_t1timer(ax25);
 	ax25_stop_t2timer(ax25);
 	ax25_stop_t3timer(ax25);

---------------------------------------

On Sat, 4 Jun 2016 14:32:14 -0700
Basil Gunn <basil@pacabunga.com> wrote:

> It looks like the SOCK_DESTROY flag should be serviced off the
> ax25_ds_timer or ax25_std_timer but that's not happening.
>
> /Basil


* Re: Can only connect to RMS gateway once
  2016-06-03  8:19   ` Thomas Osterried
@ 2016-06-03 15:52     ` David Ranch
  0 siblings, 0 replies; 10+ messages in thread
From: David Ranch @ 2016-06-03 15:52 UTC (permalink / raw)
  To: Thomas Osterried, Linux Hams; +Cc: Basil Gunn, Ralf Bächle DL5RB


Hey Thomas,

I followed up with Greg Kroah-Hartman who has been very helpful in the 
past for some of my kernel contributions.  He had the following to say:

--
-------- Forwarded Message --------
Subject: Re: Fwd: Re: Can only connect to RMS gateway once - AX.25 stack 
issues in recent kernel versions..
Date: Fri, 3 Jun 2016 08:45:23 -0700
From: Greg Kroah-Hartman <greg@com>
To: David Ranch <linux-hams@net>

On Fri, Jun 03, 2016 at 08:39:39AM -0700, David Ranch wrote:
 >
 > [Resend to move past your email bot]
 >
 > Hey Greg,
 >
 > I know you're a busy guy in the world of everything Linux but I was curious
 > if you can help direct some resources (people time) towards the AX.25 stack.
 > There are a few issues that have crept into the kernel here due to its
 > ongoing cleanup efforts and though patches have been offered, they weren't
 > committed into Git.

I don't see where the patches were sent, do you have pointers to them?
What subsystem were they for? And why were they rejected?

And if you need/want help with this, please post on the driverdevel
mailing list (for the staging tree, the address is in the MAINTAINERS
file), there are lots of people there looking for things to help out
with.

thanks,

greg k-h
--

Can you find your previous patches and any other troubleshooting details
you've recorded (SMP issues, etc.) and put them into an easy-to-follow email?
With that, I'd be happy to cheerlead this effort to Greg and the
driverdevel group to see if we can get some help here.

--David
KI6ZHD



On 06/03/2016 01:19 AM, Thomas Osterried wrote:
>
>> Am 03.06.2016 um 02:01 schrieb David Ranch <linux-hams@trinnet.net>:
>>
>>
>> Hey Basil,
>>
>> Good to hear from you.. hope all is well.
>>
>> Yes.. it's been reported and Thomas verified it but I haven't heard of any fixes yet ( I did send out a prod last month but no response)
>
> In another thread (Message-Id: <56FAD2CA.5060707@trinnet.net> you answered my question, that those machines are running with an smp-kernel.
> Have you tried to disable smp (in grub, boot the kernel with the cmdline option nosmp)?
> Did then the problem still occur?
>
> Those bugs are very hard to trace, because you cannot really provoke them; they occur suddenly.
>
> With kernel ax25 on smp machines I have discovered other severe bugs (ax25 data corruption), that also needs to be fixed.
>
> Imho, our greatest problem is that there too few kernel ax25 developers around in our ham community.
>
> In the meantime, I encourage to disable SMP to minimize the problems with kernel ax25.
>
>
> Also look at my posting Message-Id: <20160215140035.GF24276@x-berg.in-berlin.de> from 2016-02-15. There was no response on the list, and nothing got into the kernel
> (as far as I can see - my approach is to look at https://kernel.googlesource.com/pub/scm/linux/kernel/git/stable/linux-stable/+/master/drivers/net/hamradio ; perhaps I'm wrong with that ).
>
> And on 2016-02-17 I asked for submitting my patch to mkiss.c that fixes a race condition that leads to kernel panic (!!!!): when the kernel ax25 stack sends data to the interface right after you plugged off your usb-serial-adapter.
> It took me hours to discover, test and submit that, but nothing happens.
> (David, it was in my mail to you with Message-Id: <9E57F6D5-9BFC-4C64-B2E4-7C332C502D01@osterried.de> )
>
>
> Thus, those problems are discussed here year after year, again and again, and periodically people spend time to develop fixes others have already done (but never made it into the mainline kernel).
>
> I'm very frustrated in that, and in a review of my past efforts I simply have to say now "sorry, I cannot help".
>
>
> vy 73,
> 	- Thomas  dl9sau
>
>
>> --David
>> KI6ZHD
>>
>>
>>
>> -------- Forwarded Message --------
>> Subject: 	Re: AX.25 / ax25d socket close issue on Ubuntu 14.04 but not on 12.04
>> Date: 	Tue, 29 Mar 2016 09:00:37 +0200
>> From: 	Thomas Osterried <thomas@de>
>> To: 	David Ranch <dranch@net>
>> CC: 	Ralf Bächle DL5RB <ralf@org>, Bernard, f6bvp <f6bvp@fr>
>>
>>
>>
>>> Am 28.03.2016 um 22:21 schrieb David Ranch <dranch@net>:
>>>
>>> Hey Ralf, Thomas, Bernard,
>>>
>>> I've been helping a user here who is running the LinuxRMS gateway on his Ubuntu 14.04 machine and when the remote station terminates the session, it leaves an AX.25 session on his computer *forever*.. never times out:
>>>
>>> Active AX.25 sockets
>>> Dest       Source     Device  State        Vr/Vs    Send-Q  Recv-Q
>>> WA7FPV-0   WA7FPV-10  ax0     LISTENING    001/003  0       0
>>>
>>> He built up an Ubuntu 12.04 machine with the same LinuxRMS/ax25d service and this does NOT happen.  He then sent me the below strace.  Any thoughts on where this issue is coming from?
>>
>> Hello David,
>>
>> just for a quick answer (I'm on journey): it's coming from a kernel bug in the ax25 part.
>> You already have Cc'ed Ralf <dl5rb>.
>> If I remember correctly, he spoke some weeks ago also about this issue.
>> I also know of those problems, which are very rare.
>>
>> My question is: does it happen on SMP (multiprocessor-machine)?
>>
>> vy 73,
>> 	- Thomas  dl9sau
>>
>>>
>>> --David
>>>
>>>
>>>
>>> -------- Forwarded Message --------
>>> Subject:	Re: AX.25 Help...
>>> Date:	Mon, 28 Mar 2016 12:52:25 -0700
>>> From:	Josh Gibbs <gibbsjj@com>
>>> To:	David Ranch <dranch@net>
>>>
>>> Confirmed that starting Direwolf on the Ubuntu 14 box with your script made no difference. Socket still hangs up. I connected to the rmsgw process with strace, and then sent the bye command:
>>>
>>> select(5, [0 4], NULL, NULL, NULL)      = 1 (in [0])
>>> read(0, "b\r", 8192)                    = 2
>>> write(4, "b\r", 2)                      = 2
>>> read(0, 0x8058180, 8192)                = -1 EAGAIN (Resource temporarily unavailable)
>>> select(5, [0 4], NULL, NULL, NULL)      = 1 (in [4])
>>> recv(4, "D", 1, MSG_PEEK|MSG_DONTWAIT)  = 1
>>> recv(4, "Disconnecting...\r", 8192, 0)  = 17
>>> write(1, "Disconnecting...\r", 17)      = 17
>>> recv(4, 0x8058180, 8192, 0)             = -1 EAGAIN (Resource temporarily unavailable)
>>> select(5, [0 4], NULL, NULL, NULL)      = 1 (in [4])
>>> recv(4, "", 1, MSG_PEEK|MSG_DONTWAIT)   = 0
>>> time(NULL)                              = 1459193715
>>> send(3, "<134>Mar 28 12:35:15 rmsgw[1417]"..., 85, MSG_NOSIGNAL) = 85
>>> write(1, "; INFO: Connection closed by CMS"..., 51) = 51
>>> rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
>>> rt_sigaction(SIGCHLD, NULL, {SIG_IGN, [], 0}, 8) = 0
>>> nanosleep({1, 0}, 0xbfad3bac)           = 0
>>> rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
>>> close(4)                                = 0
>>> time(NULL)                              = 1459193716
>>> write(1, "; Sent: 81 Bytes / Received: 2 B"..., 61) = 61
>>> write(1, "; W7AUX de WA7FPV-10 SK\n", 24) = 24
>>> time(NULL)                              = 1459193716
>>> time(NULL)                              = 1459193716
>>> send(3, "<133>Mar 28 12:35:16 rmsgw[1417]"..., 84, MSG_NOSIGNAL) = 84
>>> close(4)                                = -1 EBADF (Bad file descriptor)
>>> exit_group(0)                           = ?
>>> +++ exited with 0 +++
>>>
>>> I'm thinking that close(4) near the end is supposed to close the socket, but is resulting in -1 EBADF (Bad file descriptor).
>>>
>>> I'm going to have a look in the code when I have more time to poke at this, but for now I at least have a working RMS Gateway on the Ubuntu 12 box! Appreciate all your help with this. I will let you know when I get to the root of it all, if you are interested!
>>>
>>> -Josh


* Re: Can only connect to RMS gateway once
  2016-06-03  0:01 ` David Ranch
@ 2016-06-03  8:19   ` Thomas Osterried
  2016-06-03 15:52     ` David Ranch
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Osterried @ 2016-06-03  8:19 UTC (permalink / raw)
  To: David Ranch; +Cc: Basil Gunn, Linux Hams, Ralf Bächle DL5RB


> Am 03.06.2016 um 02:01 schrieb David Ranch <linux-hams@trinnet.net>:
> 
> 
> Hey Basil,
> 
> Good to hear from you.. hope all is well.
> 
> Yes.. it's been reported and Thomas verified it but I haven't heard of any fixes yet ( I did send out a prod last month but no response)

In another thread (Message-Id: <56FAD2CA.5060707@trinnet.net>) you answered my question: those machines are running an SMP kernel.
Have you tried to disable SMP (in grub, boot the kernel with the cmdline option nosmp)?
Did the problem still occur then?

Those bugs are very hard to trace, because you cannot really provoke them; they occur suddenly.

With kernel ax25 on SMP machines I have discovered other severe bugs (ax25 data corruption) that also need to be fixed.

Imho, our greatest problem is that there are too few kernel ax25 developers around in our ham community.

In the meantime, I encourage disabling SMP to minimize the problems with kernel ax25.


Also look at my posting Message-Id: <20160215140035.GF24276@x-berg.in-berlin.de> from 2016-02-15. There was no response on the list, and nothing got into the kernel
(as far as I can see - my approach is to look at https://kernel.googlesource.com/pub/scm/linux/kernel/git/stable/linux-stable/+/master/drivers/net/hamradio ; perhaps I'm wrong with that ).

And on 2016-02-17 I asked for my patch to mkiss.c to be submitted; it fixes a race condition that leads to a kernel panic (!!!!) when the kernel ax25 stack sends data to the interface right after you have unplugged your usb-serial-adapter.
It took me hours to discover, test and submit that, but nothing happened.
(David, it was in my mail to you with Message-Id: <9E57F6D5-9BFC-4C64-B2E4-7C332C502D01@osterried.de> )


Thus, those problems are discussed here year after year, again and again, and periodically people spend time developing fixes that others have already written (but that never made it into the mainline kernel).

I'm very frustrated by that, and in a review of my past efforts I simply have to say now "sorry, I cannot help".


vy 73,
	- Thomas  dl9sau


> --David
> KI6ZHD
> 
> 
> 
> -------- Forwarded Message --------
> Subject: 	Re: AX.25 / ax25d socket close issue on Ubuntu 14.04 but not on 12.04
> Date: 	Tue, 29 Mar 2016 09:00:37 +0200
> From: 	Thomas Osterried <thomas@de>
> To: 	David Ranch <dranch@net>
> CC: 	Ralf Bächle DL5RB <ralf@org>, Bernard, f6bvp <f6bvp@fr>
> 
> 
> 
>> Am 28.03.2016 um 22:21 schrieb David Ranch <dranch@net>:
>> 
>> Hey Ralf, Thomas, Bernard,
>> 
>> I've been helping a user here who is running the LinuxRMS gateway on his Ubuntu 14.04 machine and when the remote station terminates the session, it leaves an AX.25 session on his computer *forever*.. never times out:
>> 
>> Active AX.25 sockets
>> Dest       Source     Device  State        Vr/Vs    Send-Q  Recv-Q
>> WA7FPV-0   WA7FPV-10  ax0     LISTENING    001/003  0       0
>> 
>> He built up an Ubuntu 12.04 machine with the same LinuxRMS/ax25d service and this does NOT happen.  He then sent me the below strace.  Any thoughts on where this issue is coming from?
> 
> Hello David,
> 
> just for a quick answer (I'm on journey): it's coming from a kernel bug in the ax25 part.
> You already have Cc'ed Ralf <dl5rb>.
> If I remember correctly, he spoke some weeks ago also about this issue.
> I also know of those problems, which are very rare.
> 
> My question is: does it happen on SMP (multiprocessor-machine)?
> 
> vy 73,
> 	- Thomas  dl9sau
> 
>> 
>> --David
>> 
>> 
>> 
>> -------- Forwarded Message --------
>> Subject:	Re: AX.25 Help...
>> Date:	Mon, 28 Mar 2016 12:52:25 -0700
>> From:	Josh Gibbs <gibbsjj@com>
>> To:	David Ranch <dranch@net>
>> 
>> Confirmed that starting Direwolf on the Ubuntu 14 box with your script made no difference. Socket still hangs up. I connected to the rmsgw process with strace, and then sent the bye command:
>> 
>> select(5, [0 4], NULL, NULL, NULL)      = 1 (in [0])
>> read(0, "b\r", 8192)                    = 2
>> write(4, "b\r", 2)                      = 2
>> read(0, 0x8058180, 8192)                = -1 EAGAIN (Resource temporarily unavailable)
>> select(5, [0 4], NULL, NULL, NULL)      = 1 (in [4])
>> recv(4, "D", 1, MSG_PEEK|MSG_DONTWAIT)  = 1
>> recv(4, "Disconnecting...\r", 8192, 0)  = 17
>> write(1, "Disconnecting...\r", 17)      = 17
>> recv(4, 0x8058180, 8192, 0)             = -1 EAGAIN (Resource temporarily unavailable)
>> select(5, [0 4], NULL, NULL, NULL)      = 1 (in [4])
>> recv(4, "", 1, MSG_PEEK|MSG_DONTWAIT)   = 0
>> time(NULL)                              = 1459193715
>> send(3, "<134>Mar 28 12:35:15 rmsgw[1417]"..., 85, MSG_NOSIGNAL) = 85
>> write(1, "; INFO: Connection closed by CMS"..., 51) = 51
>> rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
>> rt_sigaction(SIGCHLD, NULL, {SIG_IGN, [], 0}, 8) = 0
>> nanosleep({1, 0}, 0xbfad3bac)           = 0
>> rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
>> close(4)                                = 0
>> time(NULL)                              = 1459193716
>> write(1, "; Sent: 81 Bytes / Received: 2 B"..., 61) = 61
>> write(1, "; W7AUX de WA7FPV-10 SK\n", 24) = 24
>> time(NULL)                              = 1459193716
>> time(NULL)                              = 1459193716
>> send(3, "<133>Mar 28 12:35:16 rmsgw[1417]"..., 84, MSG_NOSIGNAL) = 84
>> close(4)                                = -1 EBADF (Bad file descriptor)
>> exit_group(0)                           = ?
>> +++ exited with 0 +++
>> 
>> I'm thinking that close(4) near the end is supposed to close the socket, but is resulting in -1 EBADF (Bad file descriptor).
>> 
>> I'm going to have a look in the code when I have more time to poke at this, but for now I at least have a working RMS Gateway on the Ubuntu 12 box! Appreciate all your help with this. I will let you know when I get to the root of it all, if you are interested!
>> 
>> -Josh
>> 
>> 
> 
> 



* Re: Can only connect to RMS gateway once
  2016-06-02 19:46 Basil Gunn
@ 2016-06-03  0:01 ` David Ranch
  2016-06-03  8:19   ` Thomas Osterried
  0 siblings, 1 reply; 10+ messages in thread
From: David Ranch @ 2016-06-03  0:01 UTC (permalink / raw)
  To: Basil Gunn, Linux Hams; +Cc: Thomas Osterried


Hey Basil,

Good to hear from you.. hope all is well.

Yes.. it's been reported and Thomas verified it but I haven't heard of 
any fixes yet ( I did send out a prod last month but no response)

--David
KI6ZHD



-------- Forwarded Message --------
Subject: 	Re: AX.25 / ax25d socket close issue on Ubuntu 14.04 but not 
on 12.04
Date: 	Tue, 29 Mar 2016 09:00:37 +0200
From: 	Thomas Osterried <thomas@de>
To: 	David Ranch <dranch@net>
CC: 	Ralf Bächle DL5RB <ralf@org>, Bernard, f6bvp <f6bvp@fr>



> Am 28.03.2016 um 22:21 schrieb David Ranch <dranch@net>:
>
> Hey Ralf, Thomas, Bernard,
>
> I've been helping a user here who is running the LinuxRMS gateway on his Ubuntu 14.04 machine and when the remote station terminates the session, it leaves an AX.25 session on his computer *forever*.. never times out:
>
> Active AX.25 sockets
> Dest       Source     Device  State        Vr/Vs    Send-Q  Recv-Q
> WA7FPV-0   WA7FPV-10  ax0     LISTENING    001/003  0       0
>
> He built up an Ubuntu 12.04 machine with the same LinuxRMS/ax25d service and this does NOT happen.  He then sent me the below strace.  Any thoughts on where this issue is coming from?

Hello David,

just for a quick answer (I'm on journey): it's coming from a kernel bug in the ax25 part.
You already have Cc'ed Ralf <dl5rb>.
If I remember correctly, he spoke some weeks ago also about this issue.
I also know of those problems, which are very rare.

My question is: does it happen on SMP (multiprocessor-machine)?

vy 73,
	- Thomas  dl9sau

>
> --David
>
>
>
> -------- Forwarded Message --------
> Subject:	Re: AX.25 Help...
> Date:	Mon, 28 Mar 2016 12:52:25 -0700
> From:	Josh Gibbs <gibbsjj@com>
> To:	David Ranch <dranch@net>
>
> Confirmed that starting Direwolf on the Ubuntu 14 box with your script made no difference. Socket still hangs up. I connected to the rmsgw process with strace, and then sent the bye command:
>
> select(5, [0 4], NULL, NULL, NULL)      = 1 (in [0])
> read(0, "b\r", 8192)                    = 2
> write(4, "b\r", 2)                      = 2
> read(0, 0x8058180, 8192)                = -1 EAGAIN (Resource temporarily unavailable)
> select(5, [0 4], NULL, NULL, NULL)      = 1 (in [4])
> recv(4, "D", 1, MSG_PEEK|MSG_DONTWAIT)  = 1
> recv(4, "Disconnecting...\r", 8192, 0)  = 17
> write(1, "Disconnecting...\r", 17)      = 17
> recv(4, 0x8058180, 8192, 0)             = -1 EAGAIN (Resource temporarily unavailable)
> select(5, [0 4], NULL, NULL, NULL)      = 1 (in [4])
> recv(4, "", 1, MSG_PEEK|MSG_DONTWAIT)   = 0
> time(NULL)                              = 1459193715
> send(3, "<134>Mar 28 12:35:15 rmsgw[1417]"..., 85, MSG_NOSIGNAL) = 85
> write(1, "; INFO: Connection closed by CMS"..., 51) = 51
> rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
> rt_sigaction(SIGCHLD, NULL, {SIG_IGN, [], 0}, 8) = 0
> nanosleep({1, 0}, 0xbfad3bac)           = 0
> rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> close(4)                                = 0
> time(NULL)                              = 1459193716
> write(1, "; Sent: 81 Bytes / Received: 2 B"..., 61) = 61
> write(1, "; W7AUX de WA7FPV-10 SK\n", 24) = 24
> time(NULL)                              = 1459193716
> time(NULL)                              = 1459193716
> send(3, "<133>Mar 28 12:35:16 rmsgw[1417]"..., 84, MSG_NOSIGNAL) = 84
> close(4)                                = -1 EBADF (Bad file descriptor)
> exit_group(0)                           = ?
> +++ exited with 0 +++
>
> I'm thinking that close(4) near the end is supposed to close the socket, but is resulting in -1 EBADF (Bad file descriptor).
>
> I'm going to have a look in the code when I have more time to poke at this, but for now I at least have a working RMS Gateway on the Ubuntu 12 box! Appreciate all your help with this. I will let you know when I get to the root of it all, if you are interested!
>
> -Josh
>
>



* Can only connect to RMS gateway once
@ 2016-06-02 19:46 Basil Gunn
  2016-06-03  0:01 ` David Ranch
  0 siblings, 1 reply; 10+ messages in thread
From: Basil Gunn @ 2016-06-02 19:46 UTC (permalink / raw)
  To: Linux Hams

This problem was originally described on the Linux RMS-Gateway
group in March.

I've done a little investigation on the 'Can only connect to gateway
once' problem. RMS Gateway is definitely broken when running on
kernels after 4.1.6 or so. My 4.1.6 kernel didn't exhibit symptoms but
as mentioned previously a 4.2.0 kernel didn't work. My 4.4.10 kernel
doesn't work either and I'm assuming the problem persists in the
latest kernels.

Using 'netstat --ax25' this is the expected response. You can see the
connection in operation between N7NIX-0 & N7NIX-11.

Active AX.25 sockets
Dest       Source     Device  State        Vr/Vs    Send-Q  Recv-Q
N7NIX-0    N7NIX-11   ax0     ESTABLISHED  000/001  2112    0
*          N7NIX-11   ax0     LISTENING    000/000  0       0

After the connection ends, the socket for N7NIX-0 should go away as shown
below:

Active AX.25 sockets
Dest       Source     Device  State        Vr/Vs    Send-Q  Recv-Q
*          N7NIX-11   ax0     LISTENING    000/000  0       0

On kernels newer than 4.1.6 the N7NIX-0 socket connection lingers
around and N7NIX can no longer connect to N7NIX-11.

Active AX.25 sockets
Dest       Source     Device  State        Vr/Vs    Send-Q  Recv-Q
*          N7NIX-11   ax0     LISTENING    000/000  0       0
N7NIX-0    N7NIX-11   ???     LISTENING    001/004  0       0
N7NIX-0    N7NIX-11   ???     LISTENING    001/004  0       0
N7NIX-0    N7NIX-11   ???     LISTENING    001/004  0       0
N7NIX-0    N7NIX-11   ???     LISTENING    001/004  0       0

If I 'ifconfig down' the device, kill kissattach and reattach I can
connect once, again.  There have been many changes in the net/ax25
files since kernel version 4.1.6 and I'm guessing that it is something
that has changed there that is causing problems for RMS Gateway or
ax25d.
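
The symptom in the listings above is a row whose Dest is a specific callsign while its State is LISTENING; the normal listener has "*" as Dest. A small sketch of spotting such rows in 'netstat --ax25' output follows. The function name is_lingering_ax25_row is made up here, and the column layout is assumed to match the listings in this message.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns true for a netstat --ax25 row that has a
 * concrete destination callsign but is reported as LISTENING. Header
 * and title lines fail the checks and yield false. */
bool is_lingering_ax25_row(const char *line)
{
    char dest[32], source[32], device[32], state[32];

    /* Columns: Dest Source Device State ... (remaining columns ignored). */
    if (sscanf(line, "%31s %31s %31s %31s", dest, source, device, state) != 4)
        return false;

    return strcmp(dest, "*") != 0 && strcmp(state, "LISTENING") == 0;
}
```

Counting the rows for which this returns true gives a rough measure of how many stale sockets have piled up since the interface was last brought up.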

/Basil n7nix

