linux-serial.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* N_HDLC line discipline: Race condition
@ 2024-04-24 21:31 Dianne Skoll
  2024-04-25 18:01 ` Dianne Skoll
  0 siblings, 1 reply; 9+ messages in thread
From: Dianne Skoll @ 2024-04-24 21:31 UTC (permalink / raw)
  To: linux-serial, linux-kernel; +Cc: Greg Kroah-Hartman, Jiri Slaby

[-- Attachment #1: Type: text/plain, Size: 2272 bytes --]

Hi,

I'm reposting here (originally emailed GregKH and Jiri Slaby directly
as they are listed as tty maintainers) as well as responding to a
reply from Greg.

Original mail:

Some people have been reporting bugs using synchronous PPP with the
rp-pppoe user-space program.  See for example
https://github.com/dfskoll/rp-pppoe/issues/32

I've narrowed this down to the N_HDLC line discipline sometimes
concatenating two packets on the write side into a single read on the
other side.  I have attached a proof-of-concept program illustrating
the problem.

If you run:

./test_n_hdlc

with no arguments, then the program makes two writes in quick succession
to the tty followed by a read.  It generally only takes one or two tries
on my computer before I see both writes being combined in a single read.

On the other hand, if you run:

./test_n_hdlc foo

with a single argument, then the program sleeps for 0.1s between writes,
and I never see them being combined in a single read, even after 20 tries.

I'm running mainline kernel 6.8.7 on amd64; the processor model name
per cpuinfo is: AMD Ryzen Threadripper 3970X 32-Core Processor

Regards,

Dianne.

Greg's reply with my responses inline:

> What is wrong with that?  Does the N_HDLC line discipline somewhere
> state that this is not possible to happen?  Normal write combining
> happens for other tty ldiscs.  Userspace should be able to handle
> this, unless again, the N_HDLC ldisc somehow says that this should
> never happen.

> Ok, I looked at the comments at the top of the the ldisc, and it says
> this should not happen, so something is odd, I agree.

Right.  It's done that way so a PPP 'pty' helper always gets exactly one
frame when it reads from the tty file dscriptor.

> Did this change recently?  Or has this always been the case?  Meaning
> did something in the kernel change to cause this to break?

I haven't used this mode of PPP in over a decade, so I know it used to work,
but no longer does.  I'm sorry I can't narrow down more precisely as to
when it stopped working.

> Also, please cc: the linux-serial list for tty issues, doing stuff in
> private is generally not a good idea.

OK, sorry about that... I misread the bug reporting instructions and
forgot to Cc the lists.

Regards,

Dianne.

[-- Attachment #2: test_n_hdlc.tar.gz --]
[-- Type: application/gzip, Size: 1085 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: N_HDLC line discipline: Race condition
  2024-04-24 21:31 N_HDLC line discipline: Race condition Dianne Skoll
@ 2024-04-25 18:01 ` Dianne Skoll
  2024-05-15 10:33   ` Jiri Slaby
  2024-05-21 10:47   ` Jiri Slaby
  0 siblings, 2 replies; 9+ messages in thread
From: Dianne Skoll @ 2024-04-25 18:01 UTC (permalink / raw)
  To: linux-serial, linux-kernel; +Cc: Greg Kroah-Hartman, Jiri Slaby

Hi,

I have (somewhat) narrowed down when the kernel bug appeared by installing
Debian 10, 11 and 12 in KVM virtual machines.

The bug is NOT present in Debian 10, kernel version 4.19.67.

The bug IS present in Debian 11, kernel version 5.10.209

The bug IS present in Debian 12, kernel version 6.1.85

So I guess it was introduced sometime between 4.19.67 and 5.10.209.  I'll
take a look to see if I can do a git bisect.

[To recap, the bug is that the N_HDLC line discipline sometimes
coalesces two write()s so you get them both back in a single read()
which is contrary to what it's supposed to do... preserve the write
boundaries as individual frames.]

Regards,

Dianne

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: N_HDLC line discipline: Race condition
  2024-04-25 18:01 ` Dianne Skoll
@ 2024-05-15 10:33   ` Jiri Slaby
  2024-05-15 13:42     ` Dianne Skoll
  2024-05-21 10:47   ` Jiri Slaby
  1 sibling, 1 reply; 9+ messages in thread
From: Jiri Slaby @ 2024-05-15 10:33 UTC (permalink / raw)
  To: Dianne Skoll, linux-serial, linux-kernel; +Cc: Greg Kroah-Hartman

Hi,

On 25. 04. 24, 20:01, Dianne Skoll wrote:
> I have (somewhat) narrowed down when the kernel bug appeared by installing
> Debian 10, 11 and 12 in KVM virtual machines.
> 
> The bug is NOT present in Debian 10, kernel version 4.19.67.

I can repro even with 4.19:
posix_openpt = 3
grantpt(3)  = 0
unlockpt(3) = 0
ptsname(3) = /dev/pts/2
open(/dev/pts/2) = 4
ioctl(3, TIOCSETD) = 0
ioctl(4, TIOCSETD) = 0
write(3, Hello , 6) = 6
write(3, world, 5) = 5
read(4, buf, 2048) = 6
buf = |Hello |
read(4, buf, 2048) = 5
buf = |world|
read(4, buf, 2048) = -1
write(3, Hello , 6) = 6
write(3, world, 5) = 5
read(4, buf, 2048) = 11
buf = |Hello world|
HAHA!  Try #2: The two writes were combined in a single read!


Could you recheck?

> The bug IS present in Debian 11, kernel version 5.10.209
> 
> The bug IS present in Debian 12, kernel version 6.1.85
> 
> So I guess it was introduced sometime between 4.19.67 and 5.10.209.  I'll
> take a look to see if I can do a git bisect.

Were you able to do so?

thanks,
-- 
js
suse labs


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: N_HDLC line discipline: Race condition
  2024-05-15 10:33   ` Jiri Slaby
@ 2024-05-15 13:42     ` Dianne Skoll
  2024-05-15 15:44       ` Dianne Skoll
  0 siblings, 1 reply; 9+ messages in thread
From: Dianne Skoll @ 2024-05-15 13:42 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: linux-serial, linux-kernel, Greg Kroah-Hartman

On Wed, 15 May 2024 12:33:24 +0200
Jiri Slaby <jirislaby@kernel.org> wrote:

> I can repro even with 4.19:

Huh!  I managed to make it happen on Debian's 4.19.67
kernel.  I had to redirect stdout/stderr to a file because printing to the
terminal slowed down the writes enough to prevent it from happening for me.

I guess it's hardware-dependent because I can't reproduce it on kernel
6.6 on a Raspberry Pi.

> > So I guess it was introduced sometime between 4.19.67 and 5.10.209.
> >  I'll take a look to see if I can do a git bisect.

> Were you able to do so?

Unfortunately not; the older kernels failed to compile on my machine and
since I'm not a kernel developer and don't actually use any software that
relies on N_HDLC, I gave up... sorry.

Regards,

Dianne.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: N_HDLC line discipline: Race condition
  2024-05-15 13:42     ` Dianne Skoll
@ 2024-05-15 15:44       ` Dianne Skoll
  0 siblings, 0 replies; 9+ messages in thread
From: Dianne Skoll @ 2024-05-15 15:44 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: linux-serial, linux-kernel, Greg Kroah-Hartman

On Wed, 15 May 2024 09:42:52 -0400
Dianne Skoll <dianne@skoll.ca> wrote:

> I guess it's hardware-dependent because I can't reproduce it on kernel
> 6.6 on a Raspberry Pi.

Please ignore the above.  The Pi kernel ships without N_HDLC enabled,
so the test results were wrong.

Regards,

Dianne.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: N_HDLC line discipline: Race condition
  2024-04-25 18:01 ` Dianne Skoll
  2024-05-15 10:33   ` Jiri Slaby
@ 2024-05-21 10:47   ` Jiri Slaby
  2024-05-21 14:15     ` Dianne Skoll
  1 sibling, 1 reply; 9+ messages in thread
From: Jiri Slaby @ 2024-05-21 10:47 UTC (permalink / raw)
  To: Dianne Skoll, linux-serial, linux-kernel; +Cc: Greg Kroah-Hartman

On 25. 04. 24, 20:01, Dianne Skoll wrote:
> Hi,
> 
> I have (somewhat) narrowed down when the kernel bug appeared by installing
> Debian 10, 11 and 12 in KVM virtual machines.
> 
> The bug is NOT present in Debian 10, kernel version 4.19.67.
> 
> The bug IS present in Debian 11, kernel version 5.10.209
> 
> The bug IS present in Debian 12, kernel version 6.1.85
> 
> So I guess it was introduced sometime between 4.19.67 and 5.10.209.  I'll
> take a look to see if I can do a git bisect.
> 
> [To recap, the bug is that the N_HDLC line discipline sometimes
> coalesces two write()s so you get them both back in a single read()
> which is contrary to what it's supposed to do... preserve the write
> boundaries as individual frames.]

I believe it is a correct behavior after all. As you use pty for 
testing, the "framing" is lost during the pty-to-pty pass on the flush 
to ldisc path (receive_buf()).

[ T1056] n_hdlc_send_frames: ptm2 sending frame 0000000081e69927, count=6
[ T1056]        frame 0000000081e69927 completed
[ T1056] n_hdlc_send_frames: ptm2 sending frame 00000000576db119, count=5
[ T1056]        frame 00000000576db119 completed
[  T123] n_hdlc_tty_receive: pts2 buf=00000000a616a2be count=11
[ T1056] n_hdlc_tty_read: pts2 rbuf=00000000a616a2be 
kbuf=000000004abc3c35 offset=0 ret=11

thanks,
-- 
js
suse labs


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: N_HDLC line discipline: Race condition
  2024-05-21 10:47   ` Jiri Slaby
@ 2024-05-21 14:15     ` Dianne Skoll
  2024-05-23  8:01       ` Jiri Slaby
  0 siblings, 1 reply; 9+ messages in thread
From: Dianne Skoll @ 2024-05-21 14:15 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: linux-serial, linux-kernel, Greg Kroah-Hartman

On Tue, 21 May 2024 12:47:00 +0200
Jiri Slaby <jirislaby@kernel.org> wrote:

> I believe it is a correct behavior after all. As you use pty for 
> testing, the "framing" is lost during the pty-to-pty pass on the
> flush to ldisc path (receive_buf()).

That might be what's happening, but I don't think it matches the documentation
in n_hdlc.c.  If you read the comment block near the top of the file,
it says this:

 * All HDLC data is frame oriented which means:
 *
 * 1. tty write calls represent one complete transmit frame of data
 *    The device driver should accept the complete frame or none of
 *    the frame (busy) in the write method. Each write call should have
 *    a byte count in the range of 2-65535 bytes (2 is min HDLC frame
 *    with 1 addr byte and 1 ctrl byte). The max byte count of 65535
 *    should include any crc bytes required. For example, when using
 *    CCITT CRC32, 4 crc bytes are required, so the maximum size frame
 *    the application may transmit is limited to 65531 bytes. For CCITT
 *    CRC16, the maximum application frame size would be 65533.
 *
 *
 * 2. receive callbacks from the device driver represents
 *    one received frame. The device driver should bypass
 *    the tty flip buffer and call the line discipline receive
 *    callback directly to avoid fragmenting or concatenating
 *    multiple frames into a single receive callback.
 *
 *    The HDLC line discipline queues the receive frames in separate
 *    buffers so complete receive frames can be returned by the
 *    tty read calls.
 *
 * 3. tty read calls returns an entire frame of data or nothing.
[...] */

Point 2 says that the driver should avoid fragmenting frames, or concatenating
frames into a single receive callback.  Doesn't this imply that frame
boundaries should be preserved when you read() data, which happens reliably
when you add a small delay between write()'s?

Regards,

Dianne.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: N_HDLC line discipline: Race condition
  2024-05-21 14:15     ` Dianne Skoll
@ 2024-05-23  8:01       ` Jiri Slaby
  2024-05-23 12:44         ` Dianne Skoll
  0 siblings, 1 reply; 9+ messages in thread
From: Jiri Slaby @ 2024-05-23  8:01 UTC (permalink / raw)
  To: Dianne Skoll; +Cc: linux-serial, linux-kernel, Greg Kroah-Hartman

On 21. 05. 24, 16:15, Dianne Skoll wrote:
> On Tue, 21 May 2024 12:47:00 +0200
> Jiri Slaby <jirislaby@kernel.org> wrote:
> 
>> I believe it is a correct behavior after all. As you use pty for
>> testing, the "framing" is lost during the pty-to-pty pass on the
>> flush to ldisc path (receive_buf()).
> 
> That might be what's happening, but I don't think it matches the documentation
> in n_hdlc.c.  If you read the comment block near the top of the file,
> it says this:
> 
>   * All HDLC data is frame oriented which means:
>   *
>   * 1. tty write calls represent one complete transmit frame of data
>   *    The device driver should accept the complete frame or none of
>   *    the frame (busy) in the write method. Each write call should have
>   *    a byte count in the range of 2-65535 bytes (2 is min HDLC frame
>   *    with 1 addr byte and 1 ctrl byte). The max byte count of 65535
>   *    should include any crc bytes required. For example, when using
>   *    CCITT CRC32, 4 crc bytes are required, so the maximum size frame
>   *    the application may transmit is limited to 65531 bytes. For CCITT
>   *    CRC16, the maximum application frame size would be 65533.
>   *
>   *
>   * 2. receive callbacks from the device driver represents
>   *    one received frame. The device driver should bypass
>   *    the tty flip buffer and call the line discipline receive
>   *    callback directly to avoid fragmenting or concatenating
>   *    multiple frames into a single receive callback.
>   *
>   *    The HDLC line discipline queues the receive frames in separate
>   *    buffers so complete receive frames can be returned by the
>   *    tty read calls.
>   *
>   * 3. tty read calls returns an entire frame of data or nothing.
> [...] */
> 
> Point 2 says that the driver should avoid fragmenting frames, or concatenating
> frames into a single receive callback.  Doesn't this imply that frame
> boundaries should be preserved when you read() data, which happens reliably
> when you add a small delay between write()'s?

The driver definitely behaves as described. If the ldisc is used on a 
real HW. ptys are a different story -- it's not guaranteed there. Does 
it make sense to use nhdlc on a pty pair? I believe not.

thanks,
-- 
js
suse labs


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: N_HDLC line discipline: Race condition
  2024-05-23  8:01       ` Jiri Slaby
@ 2024-05-23 12:44         ` Dianne Skoll
  0 siblings, 0 replies; 9+ messages in thread
From: Dianne Skoll @ 2024-05-23 12:44 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: linux-serial, linux-kernel, Greg Kroah-Hartman

On Thu, 23 May 2024 10:01:51 +0200
Jiri Slaby <jirislaby@kernel.org> wrote:

> The driver definitely behaves as described. If the ldisc is used on a
> real HW. ptys are a different story -- it's not guaranteed there.
> Does it make sense to use nhdlc on a pty pair? I believe not.

OK.  Well, the use case was as follows:  pppd has an option called `pty`
that makes it execute an arbitrary program, connect to its standard
input and output via a pty pair, and send and receive PPP frames over that
pty pair.

rp-pppoe (https://dianne.skoll.ca/projects/rp-pppoe/) includes a
program called `pppoe` designed to be on the other end of that pty
pair and receive/transmit the PPP frames via PPPoE.  pppd includes a
`sync` option, and pppoe a `-s` option that enable N_HDLC on the pty.
This lets pppoe just read/write a frame at a time without worrying
about PPP framing bytes or dealing with PPP escape characters, which
reduces the CPU overhead of pppoe.

Now, the Linux kernel has had built-in support for PPPoE for many
years, and I was thinking of dropping userspace PPPoE support, but I
heard from a user who wants to keep it.  ucLinux, it seems, does not
support dlopen(), so this user can't use pppd's `plugin` option to
load the kernel-mode PPPoE support module and has to keep using
user-space PPPoE.

Sync support was added to pppoe decades ago and either it worked well,
nobody used it, or nobody reported a bug until recently.  But anyway,
if the consensus is that N_HDLC shouldn't be used on a pty pair, I'm
fine with that.  Perhaps a comment in the source file and a note in
the N_HDLC documentation would be good, and then I'll just remove
support for `sync` and `-s` from pppoe, since it can't be guaranteed
to work correctly.

Regards,

Dianne.


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-05-23 12:44 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-24 21:31 N_HDLC line discipline: Race condition Dianne Skoll
2024-04-25 18:01 ` Dianne Skoll
2024-05-15 10:33   ` Jiri Slaby
2024-05-15 13:42     ` Dianne Skoll
2024-05-15 15:44       ` Dianne Skoll
2024-05-21 10:47   ` Jiri Slaby
2024-05-21 14:15     ` Dianne Skoll
2024-05-23  8:01       ` Jiri Slaby
2024-05-23 12:44         ` Dianne Skoll

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).