All of lore.kernel.org
 help / color / mirror / Atom feed
* please re-send [RFC PATCH] can: isotp: fix poll() to not report false positive EPOLLOUT events
@ 2023-02-28 20:42 Oliver Hartkopp
  2023-02-28 20:49 ` Michal Sojka
  2023-02-28 20:50 ` [PATCH] can: isotp: fix poll() to not report false positive EPOLLOUT events Michal Sojka
  0 siblings, 2 replies; 17+ messages in thread
From: Oliver Hartkopp @ 2023-02-28 20:42 UTC (permalink / raw)
  To: Michal Sojka; +Cc: linux-can

Hi Michal,

I'm currently testing some isotp traffic with your 'poll' patch but 
unfortunately I did not see this mail in my in-box:

https://lore.kernel.org/linux-can/20230224010659.48420-1-michal.sojka@cvut.cz/T/#u

It is visible on the mail archive but obviously something went wrong on 
my mail server side ¯\_(ツ)_/¯

Can you please re-send your patch, so that I can comment inside?

Many thanks,
Oliver

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: please re-send [RFC PATCH] can: isotp: fix poll() to not report false positive EPOLLOUT events
  2023-02-28 20:42 please re-send [RFC PATCH] can: isotp: fix poll() to not report false positive EPOLLOUT events Oliver Hartkopp
@ 2023-02-28 20:49 ` Michal Sojka
  2023-02-28 20:51   ` Oliver Hartkopp
  2023-03-01 18:59   ` Oliver Hartkopp
  2023-02-28 20:50 ` [PATCH] can: isotp: fix poll() to not report false positive EPOLLOUT events Michal Sojka
  1 sibling, 2 replies; 17+ messages in thread
From: Michal Sojka @ 2023-02-28 20:49 UTC (permalink / raw)
  To: Oliver Hartkopp; +Cc: linux-can

Hi Oliver,

On Tue, Feb 28 2023, Oliver Hartkopp wrote:
> Hi Michal,
>
> I'm currently testing some isotp traffic with your 'poll' patch but 
> unfortunately I did not see this mail in my in-box:
>
> https://lore.kernel.org/linux-can/20230224010659.48420-1-michal.sojka@cvut.cz/T/#u
>
> It is visible on the mail archive but obviously something went wrong on 
> my mail server side ¯\_(ツ)_/¯
>
> Can you please re-send your patch, so that I can comment inside?

resending in a minute.

-Michal

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] can: isotp: fix poll() to not report false positive EPOLLOUT events
  2023-02-28 20:42 please re-send [RFC PATCH] can: isotp: fix poll() to not report false positive EPOLLOUT events Oliver Hartkopp
  2023-02-28 20:49 ` Michal Sojka
@ 2023-02-28 20:50 ` Michal Sojka
  1 sibling, 0 replies; 17+ messages in thread
From: Michal Sojka @ 2023-02-28 20:50 UTC (permalink / raw)
  To: Oliver Hartkopp, Michal Sojka; +Cc: linux-can, Jakub Jira

When using select/poll/epoll() with a non-blocking ISOTP socket to
wait for when non-blocking write is possible, false EPOLLOUT event is
sometimes returned. This can happen at least after sending a message
which must be split to multiple CAN frames.

The reason is that isotp_sendmsg() returns -EAGAIN when tx.state is
not equal to ISOTP_IDLE and this behavior is not reflected in
datagram_poll(), which is used in isotp_ops.

This is fixed by introducing ISOTP-specific poll function, which
suppresses the EPOLLOUT events in that case.

Below is a program that can trigger the problem on a vcan interface.
When running the program as:

    ./isotp-poll-test -s 123 -d 321 -o

it starts sending ISOTP messages that include increasing ASCII
numbers. poll() is used to wait before next transmission.

With current mainline Linux, once the message length is greater than 7
bytes, write() returns -EAGAIN and the program terminates. This should
not happen, because the previous poll() reported that the write()
would not block.

After applying this patch, the above command doesn't fail - if one
runs some ISOTP reader such as:

    isotprecv -l -s 321 -d 123 vcan0

This test program can also show another problem. When running:

    ./isotp-poll-test -s 321 -d 123 -i -a

and then in another terminal:

    ./isotp-poll-test -s 123 -d 321 -o

The first program receives the messages and uses the counter values to
check for lost messages. After a random number of iterations a lost
message is always detected. I believe that ISOTP should be reliable
protocol, at least on vcan, shouldn't it?

Anyway, this patch doesn't try to address this problem.

 --8<---------------cut here---------------start------------->8---
 #include <net/if.h>
 #include <sys/socket.h>
 #include <sys/types.h>

 #include <fcntl.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
 #include <poll.h>
 #include <stdbool.h>
 #include <err.h>

 #include <net/if.h>
 #include <sys/ioctl.h>
 #include <sys/socket.h>
 #include <sys/types.h>

 #include <linux/can.h>
 #include <linux/can/isotp.h>

 #define CHECK(expr) ({ int ret = (expr); if (ret == -1) err(EXIT_FAILURE, "%s", #expr); ret; })

 int main(int argc, char *argv[])
 {
     int sock;
     struct sockaddr_can addr;
     char opt;
     bool in = false, out = false;
     bool validate_seq = false;
     int buf_size = 0;
     unsigned cnt = 1, max_msgs = 0;

     /* These default can be overridden with -s and -d */
     addr.can_addr.tp.tx_id = 0x123;
     addr.can_addr.tp.rx_id = 0x321;

     while ((opt = getopt(argc, argv, "ac:d:ios:")) != -1) {
         switch (opt) {
         case 'a':
             validate_seq = true;
             break;
         case 'c':
             max_msgs = atol(optarg);
             break;
         case 'i':
             in = true;
             break;
         case 'o':
             out = true;
             break;
         case 's':
             addr.can_addr.tp.tx_id = strtoul(optarg, NULL, 16);
             if (strlen(optarg) > 7)
                 addr.can_addr.tp.tx_id |= CAN_EFF_FLAG;
             break;
         case 'd':
             addr.can_addr.tp.rx_id = strtoul(optarg, NULL, 16);
             if (strlen(optarg) > 7)
                 addr.can_addr.tp.rx_id |= CAN_EFF_FLAG;
             break;
         default: /* '?' */
             err(EXIT_FAILURE, "Usage: %s [-i] [-o]", argv[0]);
         }
     }

     sock = CHECK(socket(PF_CAN, SOCK_DGRAM, CAN_ISOTP));

     const char *ifname = "vcan0";
     addr.can_family = AF_CAN;
     addr.can_ifindex = if_nametoindex(ifname);
     if (!addr.can_ifindex)
         err(EXIT_FAILURE, "%s", ifname);

     CHECK(bind(sock, (struct sockaddr *)&addr, sizeof(addr)));

     int flags = CHECK(fcntl(sock, F_GETFL, 0));
     CHECK(fcntl(sock, F_SETFL, flags | O_NONBLOCK));

     struct pollfd pollfd = {
         .fd = sock,
         .events = ((in ? POLLIN : 0) | ((out & !in) ? POLLOUT : 0))
     };

     do {
         char buf[100];
         int ret;

         CHECK(poll(&pollfd, 1, -1)); /* Wait with infinite timeout */

         if (pollfd.revents & POLLIN) {
             buf_size = CHECK(read(sock, buf, sizeof(buf) - 1));
             printf("#%u: Read %d bytes\n", cnt, buf_size);
             if (validate_seq) {
                 unsigned cnt_rcvd = 0;
                 buf[buf_size] = 0;
                 sscanf(buf, "Hello%u", &cnt_rcvd);
                 if (cnt != cnt_rcvd)
                     errx(EXIT_FAILURE, "Lost messages. Expected: #%u, received #%u", cnt, cnt_rcvd);
             }
             if (out)
                 pollfd.events |= POLLOUT; /* Start writing only after reception of data */
         }
         if (pollfd.revents & POLLOUT) {
             if (!in) {
                 char str[200];
                 sprintf(str, "Hello%u", cnt);
                 ret = CHECK(write(sock, str, strlen(str)));
             } else {
                 ret = CHECK(write(sock, buf, buf_size));
             }
             printf("#%u: Wrote %d bytes\n", cnt, ret);
         }
     } while (cnt++ < max_msgs || max_msgs == 0);

     return 0;
 }
 --8<---------------cut here---------------end--------------->8---

Signed-off-by: Michal Sojka <michal.sojka@cvut.cz>
Reported-by: Jakub Jira <jirajak2@fel.cvut.cz>
---
 net/can/isotp.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/net/can/isotp.c b/net/can/isotp.c
index 9bc344851704..d205359eaabd 100644
--- a/net/can/isotp.c
+++ b/net/can/isotp.c
@@ -1608,6 +1608,19 @@ static int isotp_init(struct sock *sk)
 	return 0;
 }
 
+static __poll_t isotp_poll(struct file *file, struct socket *sock, poll_table *wait)
+{
+	__poll_t mask = datagram_poll(file, sock, wait);
+	struct sock *sk = sock->sk;
+	struct isotp_sock *so = isotp_sk(sk);
+
+	/* Check for false positives due to TX state */
+	if ((mask & EPOLLWRNORM) && (so->tx.state != ISOTP_IDLE))
+		mask &= ~(EPOLLOUT | EPOLLWRNORM);
+
+	return mask;
+}
+
 static int isotp_sock_no_ioctlcmd(struct socket *sock, unsigned int cmd,
 				  unsigned long arg)
 {
@@ -1623,7 +1636,7 @@ static const struct proto_ops isotp_ops = {
 	.socketpair = sock_no_socketpair,
 	.accept = sock_no_accept,
 	.getname = isotp_getname,
-	.poll = datagram_poll,
+	.poll = isotp_poll,
 	.ioctl = isotp_sock_no_ioctlcmd,
 	.gettstamp = sock_gettstamp,
 	.listen = sock_no_listen,
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: please re-send [RFC PATCH] can: isotp: fix poll() to not report false positive EPOLLOUT events
  2023-02-28 20:49 ` Michal Sojka
@ 2023-02-28 20:51   ` Oliver Hartkopp
  2023-03-01 18:59   ` Oliver Hartkopp
  1 sibling, 0 replies; 17+ messages in thread
From: Oliver Hartkopp @ 2023-02-28 20:51 UTC (permalink / raw)
  To: Michal Sojka; +Cc: linux-can

On 28.02.23 21:49, Michal Sojka wrote:

>> Can you please re-send your patch, so that I can comment inside?
> 
> resending in a minute.

No hurry! I think I'm done for today anyway :-D

Best regards,
Oliver

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: please re-send [RFC PATCH] can: isotp: fix poll() to not report false positive EPOLLOUT events
  2023-02-28 20:49 ` Michal Sojka
  2023-02-28 20:51   ` Oliver Hartkopp
@ 2023-03-01 18:59   ` Oliver Hartkopp
  2023-03-01 20:55     ` Oliver Hartkopp
  2023-03-02  9:20     ` Michal Sojka
  1 sibling, 2 replies; 17+ messages in thread
From: Oliver Hartkopp @ 2023-03-01 18:59 UTC (permalink / raw)
  To: Michal Sojka, Jakub Jira; +Cc: linux-can

Hi Michal,

just copied the text to answer your patch ...

On 28.02.23 21:49, Michal Sojka wrote:

 > When using select/poll/epoll() with a non-blocking ISOTP socket to
 > wait for when non-blocking write is possible, false EPOLLOUT event is
 > sometimes returned. This can happen at least after sending a message
 > which must be split to multiple CAN frames.
 >
 > The reason is that isotp_sendmsg() returns -EAGAIN when tx.state is
 > not equal to ISOTP_IDLE and this behavior is not reflected in
 > datagram_poll(), which is used in isotp_ops.
 >
 > This is fixed by introducing ISOTP-specific poll function, which
 > suppresses the EPOLLOUT events in that case.


Good improvement!

 > Below is a program that can trigger the problem on a vcan interface.
 > When running the program as:
 >
 >      ./isotp-poll-test -s 123 -d 321 -o
 >
 > it starts sending ISOTP messages that include increasing ASCII
 > numbers. poll() is used to wait before next transmission.
 >
 > With current mainline Linux, once the message length is greater than 7
 > bytes, write() returns -EAGAIN and the program terminates. This should
 > not happen, because the previous poll() reported that the write()
 > would not block.
 >
 > After applying this patch, the above command doesn't fail - if one
 > runs some ISOTP reader such as:
 >
 >      isotprecv -l -s 321 -d 123 vcan0
 >

Yes, I can confirm that the above setup works - but not reliable:

 > This test program can also show another problem. When running:
 >
 >      ./isotp-poll-test -s 321 -d 123 -i -a
 >
 > and then in another terminal:
 >
 >      ./isotp-poll-test -s 123 -d 321 -o
 >
 > The first program receives the messages and uses the counter values to
 > check for lost messages. After a random number of iterations a lost
 > message is always detected. I believe that ISOTP should be reliable
 > protocol, at least on vcan, shouldn't it?

The problem seems to occur with the new introduced POLL feature.

When using

	./isotp-poll-test -s 123 -d 321 -o

together with

	isotprecv -l -s 321 -d 123 vcan0

the transmission of isotp PDUs stops at some point
#8179 #6204 #1787 #373 #69321 etc

I can not see problems or drops when using

./isotpsend vcan0 -s 123 -d 321 -D 9 -li

as data producer where I added a counter:

diff --git a/isotpsend.c b/isotpsend.c
index deac601..815c254 100644
--- a/isotpsend.c
+++ b/isotpsend.c
@@ -96,10 +96,11 @@ int main(int argc, char **argv)
      useconds_t usecs = 0; /* wait before sending the PDU */
      __u32 force_tx_stmin = 0;
      unsigned char buf[BUFSIZE];
      int buflen = 0;
      int datalen = 0;
+    u_int32_t datainc = 0;
      int retval = 0;

      addr.can_addr.tp.tx_id = addr.can_addr.tp.rx_id = NO_CAN_ID;

      while ((opt = getopt(argc, argv, "s:d:x:p:P:t:f:D:l:g:bSCL:?")) != 
-1) {
@@ -295,10 +296,13 @@ int main(int argc, char **argv)

  loop:
      if (usecs)
             usleep(usecs);

+    memcpy(&buf[datalen - 4], &datainc, 4);
+    datainc++;
+
      retval = write(s, buf, buflen);
      if (retval < 0) {
             perror("write");
             return retval;
      }


Maybe the 'poll' patch needs to check for something else. I will take a 
look at it.

Many thanks,
Oliver


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: please re-send [RFC PATCH] can: isotp: fix poll() to not report false positive EPOLLOUT events
  2023-03-01 18:59   ` Oliver Hartkopp
@ 2023-03-01 20:55     ` Oliver Hartkopp
  2023-03-01 23:17       ` Michal Sojka
  2023-03-02  9:20     ` Michal Sojka
  1 sibling, 1 reply; 17+ messages in thread
From: Oliver Hartkopp @ 2023-03-01 20:55 UTC (permalink / raw)
  To: Michal Sojka, Jakub Jira; +Cc: linux-can

I added the sockopt CAN_ISOTP_WAIT_TX_DONE in isotp-poll-test.c which 
fixes the problem:

(..)

struct sockaddr_can addr;
static struct can_isotp_options opts = { .flags = CAN_ISOTP_WAIT_TX_DONE};

(..)

sock = CHECK(socket(PF_CAN, SOCK_DGRAM, CAN_ISOTP));

CHECK(setsockopt(sock, SOL_CAN_ISOTP, CAN_ISOTP_OPTS, &opts, sizeof(opts)));

(..)

The sockopt uses a wait queue and returns from the write() syscall when 
tx.state becomes ISOTP_IDLE.

Using the isotp socket without CAN_ISOTP_WAIT_TX_DONE turned out to be 
not such good idea. But this results from the original API which had 
some kind of "fire-and-forget" mode.

Today the tx.state is set back to ISOTP_IDLE in isotp_rcv_echo() - and 
with this short 9 byte PDU the interaction with the receiving entity is 
really fast on vcan's. Maybe faster than the the write syscall returns.

I don't know what happens to the poll notification then.

Best regards,
Oliver


On 01.03.23 19:59, Oliver Hartkopp wrote:
> Hi Michal,
> 
> just copied the text to answer your patch ...
> 
> On 28.02.23 21:49, Michal Sojka wrote:
> 
>  > When using select/poll/epoll() with a non-blocking ISOTP socket to
>  > wait for when non-blocking write is possible, false EPOLLOUT event is
>  > sometimes returned. This can happen at least after sending a message
>  > which must be split to multiple CAN frames.
>  >
>  > The reason is that isotp_sendmsg() returns -EAGAIN when tx.state is
>  > not equal to ISOTP_IDLE and this behavior is not reflected in
>  > datagram_poll(), which is used in isotp_ops.
>  >
>  > This is fixed by introducing ISOTP-specific poll function, which
>  > suppresses the EPOLLOUT events in that case.
> 
> 
> Good improvement!
> 
>  > Below is a program that can trigger the problem on a vcan interface.
>  > When running the program as:
>  >
>  >      ./isotp-poll-test -s 123 -d 321 -o
>  >
>  > it starts sending ISOTP messages that include increasing ASCII
>  > numbers. poll() is used to wait before next transmission.
>  >
>  > With current mainline Linux, once the message length is greater than 7
>  > bytes, write() returns -EAGAIN and the program terminates. This should
>  > not happen, because the previous poll() reported that the write()
>  > would not block.
>  >
>  > After applying this patch, the above command doesn't fail - if one
>  > runs some ISOTP reader such as:
>  >
>  >      isotprecv -l -s 321 -d 123 vcan0
>  >
> 
> Yes, I can confirm that the above setup works - but not reliable:
> 
>  > This test program can also show another problem. When running:
>  >
>  >      ./isotp-poll-test -s 321 -d 123 -i -a
>  >
>  > and then in another terminal:
>  >
>  >      ./isotp-poll-test -s 123 -d 321 -o
>  >
>  > The first program receives the messages and uses the counter values to
>  > check for lost messages. After a random number of iterations a lost
>  > message is always detected. I believe that ISOTP should be reliable
>  > protocol, at least on vcan, shouldn't it?
> 
> The problem seems to occur with the new introduced POLL feature.
> 
> When using
> 
>      ./isotp-poll-test -s 123 -d 321 -o
> 
> together with
> 
>      isotprecv -l -s 321 -d 123 vcan0
> 
> the transmission of isotp PDUs stops at some point
> #8179 #6204 #1787 #373 #69321 etc
> 
> I can not see problems or drops when using
> 
> ./isotpsend vcan0 -s 123 -d 321 -D 9 -li
> 
> as data producer where I added a counter:
> 
> diff --git a/isotpsend.c b/isotpsend.c
> index deac601..815c254 100644
> --- a/isotpsend.c
> +++ b/isotpsend.c
> @@ -96,10 +96,11 @@ int main(int argc, char **argv)
>       useconds_t usecs = 0; /* wait before sending the PDU */
>       __u32 force_tx_stmin = 0;
>       unsigned char buf[BUFSIZE];
>       int buflen = 0;
>       int datalen = 0;
> +    u_int32_t datainc = 0;
>       int retval = 0;
> 
>       addr.can_addr.tp.tx_id = addr.can_addr.tp.rx_id = NO_CAN_ID;
> 
>       while ((opt = getopt(argc, argv, "s:d:x:p:P:t:f:D:l:g:bSCL:?")) != 
> -1) {
> @@ -295,10 +296,13 @@ int main(int argc, char **argv)
> 
>   loop:
>       if (usecs)
>              usleep(usecs);
> 
> +    memcpy(&buf[datalen - 4], &datainc, 4);
> +    datainc++;
> +
>       retval = write(s, buf, buflen);
>       if (retval < 0) {
>              perror("write");
>              return retval;
>       }
> 
> 
> Maybe the 'poll' patch needs to check for something else. I will take a 
> look at it.
> 
> Many thanks,
> Oliver
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: please re-send [RFC PATCH] can: isotp: fix poll() to not report false positive EPOLLOUT events
  2023-03-01 20:55     ` Oliver Hartkopp
@ 2023-03-01 23:17       ` Michal Sojka
  0 siblings, 0 replies; 17+ messages in thread
From: Michal Sojka @ 2023-03-01 23:17 UTC (permalink / raw)
  To: Oliver Hartkopp, Jakub Jira; +Cc: linux-can

On Wed, Mar 01 2023, Oliver Hartkopp wrote:
> I added the sockopt CAN_ISOTP_WAIT_TX_DONE in isotp-poll-test.c which 
> fixes the problem:

I saw CAN_ISOTP_WAIT_TX_DONE but didn't want to use it, because the
whole point of (e)poll/select is to wait only in poll() and have write()
execute without any blocking. In a sense, CAN_ISOTP_WAIT_TX_DONE is
incompatible with O_NONBLOCK sockets.

> The sockopt uses a wait queue and returns from the write() syscall when 
> tx.state becomes ISOTP_IDLE.

That should be the point of poll() too: Waiting until the socket is
ready for the next write(). As I understand it, the poll()
implementation is "connected" to all relevant wait queues and whenever a
wait queue is woken up, protocol's poll method is called. It checks if
the wake really made the socket writable (or readable) and if so, the
poll() syscall returns.

When thinking about that, I guess I know where is the problem.
Currently, we call datagram_poll(), which checks only sock->wq.wait
queue and not the isotp wait queue so->wait.

Tomorrow, I'll try to look at how to wait also for so->wait.

> Using the isotp socket without CAN_ISOTP_WAIT_TX_DONE turned out to be 
> not such good idea. But this results from the original API which had 
> some kind of "fire-and-forget" mode.
>
> Today the tx.state is set back to ISOTP_IDLE in isotp_rcv_echo() - and 
> with this short 9 byte PDU the interaction with the receiving entity is 
> really fast on vcan's. Maybe faster than the the write syscall
> returns.

If it is faster, then we wouldn't see write() returning EAGAIN in our
poll() tests (without CAN_ISOTP_WAIT_TX_DONE).

Best regards,
-Michal

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: please re-send [RFC PATCH] can: isotp: fix poll() to not report false positive EPOLLOUT events
  2023-03-01 18:59   ` Oliver Hartkopp
  2023-03-01 20:55     ` Oliver Hartkopp
@ 2023-03-02  9:20     ` Michal Sojka
  2023-03-02  9:28       ` [RFC PATCH v2] can: isotp: fix poll() to not report false " Michal Sojka
  2023-03-02 21:00       ` please re-send [RFC PATCH] can: isotp: fix poll() to not report false positive " Oliver Hartkopp
  1 sibling, 2 replies; 17+ messages in thread
From: Michal Sojka @ 2023-03-02  9:20 UTC (permalink / raw)
  To: Oliver Hartkopp, Jakub Jira; +Cc: linux-can

Hi Oliver,

On Wed, Mar 01 2023, Oliver Hartkopp wrote:
> I can not see problems or drops when using
>
> ./isotpsend vcan0 -s 123 -d 321 -D 9 -li
>
> as data producer where I added a counter:

Really? I tried to modify my test program to also support blocking
operation (without poll()) and setting CAN_ISOTP_WAIT_TX_DONE. It's
available at https://github.com/wentasah/isotp-poll-test/blob/fd095b2242c49dc5d3e36faf5ac9f4f47fd002c7/isotp-poll-test.c.

With it, I detect lost message in mainline Linux 6.2, but sometimes, it
takes quite long time (a minute) before it happens.

What happens is well visible with strace. When I run the receiver as (-b
stands for blocking operation without poll()):

    strace ./isotp-poll-test -s 321 -d 123 -i -a -b -q

and then the sender (-w means to use CAN_ISOTP_WAIT_TX_DONE):

    strace ./isotp-poll-test -s 123 -d 321 -o -w -b -q

The output of the receiver is:

    read(3, "Hello122813", 99)              = 11
    read(3, "Hello122814", 99)              = 11
    read(3, "Hello122815", 99)              = 11
    read(3, "Hello122816", 99)              = 11
    read(3, "Hello122817", 99)              = 11
    read(3, "Hello122847", 99)              = 11
    write(2, "isotp-poll-test: ", 17isotp-poll-test: )       = 17
    write(2, "Lost messages. Expected: #122818"..., 50Lost messages. Expected: #122818, received #122847) = 50
    write(2, "\n", 1
    )                       = 1
    exit_group(1)                           = ?

Sender's output looks as follows:

    write(3, "Hello122813", 11)             = 11
    write(3, "Hello122814", 11)             = 11
    write(3, "Hello122815", 11)             = 11
    write(3, "Hello122816", 11)             = 11
    write(3, "Hello122817", 11)             = 11
    write(3, "Hello122818", 11)             = 11
    write(3, "Hello122819", 11)             = 11
    write(3, "Hello122820", 11)             = 11
    ...
    write(3, "Hello123116", 11)             = 11
    write(3, "Hello123117", 11)             = 11
    write(3, "Hello123118", 11)             = 11
    write(3, "Hello123119", 11)             = -1 ECOMM (Communication error on send)
    write(2, "isotp-poll-test: ", 17isotp-poll-test: )       = 17
    write(2, "write(sock, str, strlen(str))", 29write(sock, str, strlen(str))) = 29
    write(2, ": Communication error on send\n", 30: Communication error on send
    ) = 30
    exit_group(1)                           = ?

So my impression is that this does not relate to poll() implementation
changes.

I also tried to update my poll patch to take into account the so->wait
wait queue. But the result is the same as without it. I will send it in
a separate mail.

Best regards,
-Michal

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [RFC PATCH v2] can: isotp: fix poll() to not report false EPOLLOUT events
  2023-03-02  9:20     ` Michal Sojka
@ 2023-03-02  9:28       ` Michal Sojka
  2023-03-24 17:50         ` Marc Kleine-Budde
  2023-03-30 17:30         ` Oliver Hartkopp
  2023-03-02 21:00       ` please re-send [RFC PATCH] can: isotp: fix poll() to not report false positive " Oliver Hartkopp
  1 sibling, 2 replies; 17+ messages in thread
From: Michal Sojka @ 2023-03-02  9:28 UTC (permalink / raw)
  To: Oliver Hartkopp; +Cc: linux-can, Michal Sojka, Jakub Jira

When using select/poll/epoll() with a non-blocking ISOTP socket to
wait for when non-blocking write is possible, false EPOLLOUT event is
sometimes returned. This can happen at least after sending a message
which must be split to multiple CAN frames.

The reason is that isotp_sendmsg() returns -EAGAIN when tx.state is
not equal to ISOTP_IDLE and this behavior is not reflected in
datagram_poll(), which is used in isotp_ops.

This is fixed by introducing ISOTP-specific poll function, which
suppresses the EPOLLOUT events in that case.

This can be tested with a program available at
https://github.com/wentasah/isotp-poll-test/blob/fd095b2242c49dc5d3e36faf5ac9f4f47fd002c7/isotp-poll-test.c
which can trigger the problem on a vcan interface. When running the
program as:

    ./isotp-poll-test -s 123 -d 321 -o

it starts sending ISOTP messages that include increasing ASCII
numbers. poll() is used to wait before the next transmission.

With current mainline Linux, once the message length is greater than 7
bytes, write() returns -EAGAIN and the program terminates. This should
not happen, because the previous poll() reported that the write()
would not block.

After applying this patch, the above command doesn't fail - if one
runs some ISOTP reader such as:

    isotprecv -l -s 321 -d 123 vcan0

This test program can also show another problem. When running:

    ./isotp-poll-test -s 321 -d 123 -i -a

and then in another terminal:

    ./isotp-poll-test -s 123 -d 321 -o

The first program receives the messages and uses the counter values to
check for lost messages. After a random number of iterations a lost
message is always detected. I believe that ISOTP should be reliable
protocol, at least on vcan, shouldn't it?

Anyway, this patch doesn't try to address this problem.

Signed-off-by: Michal Sojka <michal.sojka@cvut.cz>
Reported-by: Jakub Jira <jirajak2@fel.cvut.cz>

---
Changelog:

v2: Added waiting for isotp-specific wait queue: poll_wait(file, &so->wait, wait).
---
 net/can/isotp.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/net/can/isotp.c b/net/can/isotp.c
index 9bc344851704..ec163e36ac53 100644
--- a/net/can/isotp.c
+++ b/net/can/isotp.c
@@ -1608,6 +1608,21 @@ static int isotp_init(struct sock *sk)
 	return 0;
 }
 
+static __poll_t isotp_poll(struct file *file, struct socket *sock, poll_table *wait)
+{
+	struct sock *sk = sock->sk;
+	struct isotp_sock *so = isotp_sk(sk);
+
+	__poll_t mask = datagram_poll(file, sock, wait);
+	poll_wait(file, &so->wait, wait);
+
+	/* Check for false positives due to TX state */
+	if ((mask & EPOLLWRNORM) && (so->tx.state != ISOTP_IDLE))
+		mask &= ~(EPOLLOUT | EPOLLWRNORM);
+
+	return mask;
+}
+
 static int isotp_sock_no_ioctlcmd(struct socket *sock, unsigned int cmd,
 				  unsigned long arg)
 {
@@ -1623,7 +1638,7 @@ static const struct proto_ops isotp_ops = {
 	.socketpair = sock_no_socketpair,
 	.accept = sock_no_accept,
 	.getname = isotp_getname,
-	.poll = datagram_poll,
+	.poll = isotp_poll,
 	.ioctl = isotp_sock_no_ioctlcmd,
 	.gettstamp = sock_gettstamp,
 	.listen = sock_no_listen,
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: please re-send [RFC PATCH] can: isotp: fix poll() to not report false positive EPOLLOUT events
  2023-03-02  9:20     ` Michal Sojka
  2023-03-02  9:28       ` [RFC PATCH v2] can: isotp: fix poll() to not report false " Michal Sojka
@ 2023-03-02 21:00       ` Oliver Hartkopp
  2023-03-03  7:13         ` Michal Sojka
  2023-03-03 17:26         ` Lost packets on ISO-TP and vcan Michal Sojka
  1 sibling, 2 replies; 17+ messages in thread
From: Oliver Hartkopp @ 2023-03-02 21:00 UTC (permalink / raw)
  To: Michal Sojka, Jakub Jira; +Cc: linux-can

Hi Michal,

On 02.03.23 10:20, Michal Sojka wrote:

> On Wed, Mar 01 2023, Oliver Hartkopp wrote:
>> I can not see problems or drops when using
>>
>> ./isotpsend vcan0 -s 123 -d 321 -D 9 -li
>>
>> as data producer where I added a counter:
> 
> Really? 

Yes ;-)

> I tried to modify my test program to also support blocking
> operation (without poll()) and setting CAN_ISOTP_WAIT_TX_DONE. It's
> available at https://github.com/wentasah/isotp-poll-test/blob/fd095b2242c49dc5d3e36faf5ac9f4f47fd002c7/isotp-poll-test.c.

Cloned it.

> With it, I detect lost message in mainline Linux 6.2, but sometimes, it
> takes quite long time (a minute) before it happens.

Hm. Everything fine here with 6.2.0-12944-ge492250d5252.

> What happens is well visible with strace. When I run the receiver as (-b
> stands for blocking operation without poll()):
> 
>      strace ./isotp-poll-test -s 321 -d 123 -i -a -b -q
> 
> and then the sender (-w means to use CAN_ISOTP_WAIT_TX_DONE):
> 
>      strace ./isotp-poll-test -s 123 -d 321 -o -w -b -q

Running these two (without strace) for 15 mins now:

Hello20160713

and counting ...


> The output of the receiver is:
> 
>      read(3, "Hello122813", 99)              = 11
>      read(3, "Hello122814", 99)              = 11
>      read(3, "Hello122815", 99)              = 11
>      read(3, "Hello122816", 99)              = 11
>      read(3, "Hello122817", 99)              = 11
>      read(3, "Hello122847", 99)              = 11
>      write(2, "isotp-poll-test: ", 17isotp-poll-test: )       = 17
>      write(2, "Lost messages. Expected: #122818"..., 50Lost messages. Expected: #122818, received #122847) = 50
>      write(2, "\n", 1
>      )                       = 1
>      exit_group(1)                           = ?
> 
> Sender's output looks as follows:
> 
>      write(3, "Hello122813", 11)             = 11
>      write(3, "Hello122814", 11)             = 11
>      write(3, "Hello122815", 11)             = 11
>      write(3, "Hello122816", 11)             = 11
>      write(3, "Hello122817", 11)             = 11
>      write(3, "Hello122818", 11)             = 11
>      write(3, "Hello122819", 11)             = 11
>      write(3, "Hello122820", 11)             = 11
>      ...
>      write(3, "Hello123116", 11)             = 11
>      write(3, "Hello123117", 11)             = 11
>      write(3, "Hello123118", 11)             = 11
>      write(3, "Hello123119", 11)             = -1 ECOMM (Communication error on send)
>      write(2, "isotp-poll-test: ", 17isotp-poll-test: )       = 17
>      write(2, "write(sock, str, strlen(str))", 29write(sock, str, strlen(str))) = 29
>      write(2, ": Communication error on send\n", 30: Communication error on send
>      ) = 30
>      exit_group(1)                           = ?
> 
> So my impression is that this does not relate to poll() implementation
> changes.

With strace too.

strace ./isotp-poll-test -s 123 -d 321 -o -b -q

strace ./isotp-poll-test -s 321 -d 123 -i -a -b -q

No problems with your V2 patch either.

> I also tried to update my poll patch to take into account the so->wait
> wait queue. But the result is the same as without it. I will send it in
> a separate mail.

I only get a problem when removing '-b'

strace ./isotp-poll-test -s 321 -d 123 -i -a -q

poll([{fd=3, events=POLLIN}], 1, -1)    = 1 ([{fd=3, revents=POLLIN}])
read(3, "Hello716", 99)                 = 8
write(2, "isotp-poll-test: ", 17isotp-poll-test: )       = 17
write(2, "Lost messages. Expected: #715, r"..., 44Lost messages. 
Expected: #715, received #716) = 44
write(2, "\n", 1
)                       = 1
exit_group(1)                           = ?
+++ exited with 1 +++

Hm.

Best regards,
Oliver

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: please re-send [RFC PATCH] can: isotp: fix poll() to not report false positive EPOLLOUT events
  2023-03-02 21:00       ` please re-send [RFC PATCH] can: isotp: fix poll() to not report false positive " Oliver Hartkopp
@ 2023-03-03  7:13         ` Michal Sojka
  2023-03-03 17:26         ` Lost packets on ISO-TP and vcan Michal Sojka
  1 sibling, 0 replies; 17+ messages in thread
From: Michal Sojka @ 2023-03-03  7:13 UTC (permalink / raw)
  To: Oliver Hartkopp, Jakub Jira; +Cc: linux-can

On Thu, Mar 02 2023, Oliver Hartkopp wrote:
> Hi Michal,
>
> On 02.03.23 10:20, Michal Sojka wrote:
>
>> On Wed, Mar 01 2023, Oliver Hartkopp wrote:
>>> I can not see problems or drops when using
>>>
>>> ./isotpsend vcan0 -s 123 -d 321 -D 9 -li
>>>
>>> as data producer where I added a counter:
>> 
>> Really? 
>
> Yes ;-)

You're right. I have just forgotten that I added the poll patch to my
default kernel. Thanks for testing.

I'll continue investigating it. It's not clear to me how can poll
implementation influence behaviour of blocking I/O.

Best regards,
-Michal

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Lost packets on ISO-TP and vcan
  2023-03-02 21:00       ` please re-send [RFC PATCH] can: isotp: fix poll() to not report false positive " Oliver Hartkopp
  2023-03-03  7:13         ` Michal Sojka
@ 2023-03-03 17:26         ` Michal Sojka
  2023-03-30 17:24           ` Oliver Hartkopp
  1 sibling, 1 reply; 17+ messages in thread
From: Michal Sojka @ 2023-03-03 17:26 UTC (permalink / raw)
  To: Oliver Hartkopp, Jakub Jira; +Cc: linux-can

Hi Oliver,

On Fri, Mar 03 2023, Michal Sojka wrote:
>>> On Wed, Mar 01 2023, Oliver Hartkopp wrote:
>>>> I can not see problems or drops when using
>>>>
>>>> ./isotpsend vcan0 -s 123 -d 321 -D 9 -li
>>>>
>>>> as data producer where I added a counter:
>>> 
>>> Really? 
>>
>> Yes ;-)
>
> You're right. I have just forgotten that I added the poll patch to my
> default kernel. Thanks for testing.

It seems I was just lucky in the morning. Now, I can reproduce the
packet loss again even without my patch. And my student Jakub (Cc) too,
but with kernel 5.15.

On Thu, Mar 02 2023, Oliver Hartkopp wrote:
> Hm. Everything fine here with 6.2.0-12944-ge492250d5252.

I tried the exact same commit. When the system running the test is
mostly idle, I have to wait many minutes to loose the packet. But when I
run the test simultaneously with Linux kernel compilation on all CPUs
(make -j$(nproc)), the packet is lost almost immediately.

I'm still testing with:

- receiver: ./isotp-poll-test -s 321 -d 123 -i -a -b -q
- sender:   ./isotp-poll-test -s 123 -d 321 -o -w -b

-Michal

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH v2] can: isotp: fix poll() to not report false EPOLLOUT events
  2023-03-02  9:28       ` [RFC PATCH v2] can: isotp: fix poll() to not report false " Michal Sojka
@ 2023-03-24 17:50         ` Marc Kleine-Budde
  2023-03-26 11:38           ` Oliver Hartkopp
  2023-03-30 17:30         ` Oliver Hartkopp
  1 sibling, 1 reply; 17+ messages in thread
From: Marc Kleine-Budde @ 2023-03-24 17:50 UTC (permalink / raw)
  To: Michal Sojka, Oliver Hartkopp; +Cc: linux-can, Jakub Jira

[-- Attachment #1: Type: text/plain, Size: 3776 bytes --]

Oliver, what about this patch?

Marc

On 02.03.2023 10:28:12, Michal Sojka wrote:
> When using select/poll/epoll() with a non-blocking ISOTP socket to
> wait for when non-blocking write is possible, false EPOLLOUT event is
> sometimes returned. This can happen at least after sending a message
> which must be split to multiple CAN frames.
> 
> The reason is that isotp_sendmsg() returns -EAGAIN when tx.state is
> not equal to ISOTP_IDLE and this behavior is not reflected in
> datagram_poll(), which is used in isotp_ops.
> 
> This is fixed by introducing ISOTP-specific poll function, which
> suppresses the EPOLLOUT events in that case.
> 
> This can be tested with a program available at
> https://github.com/wentasah/isotp-poll-test/blob/fd095b2242c49dc5d3e36faf5ac9f4f47fd002c7/isotp-poll-test.c
> which can trigger the problem on a vcan interface. When running the
> program as:
> 
>     ./isotp-poll-test -s 123 -d 321 -o
> 
> it starts sending ISOTP messages that include increasing ASCII
> numbers. poll() is used to wait before the next transmission.
> 
> With current mainline Linux, once the message length is greater than 7
> bytes, write() returns -EAGAIN and the program terminates. This should
> not happen, because the previous poll() reported that the write()
> would not block.
> 
> After applying this patch, the above command doesn't fail - if one
> runs some ISOTP reader such as:
> 
>     isotprecv -l -s 321 -d 123 vcan0
> 
> This test program can also show another problem. When running:
> 
>     ./isotp-poll-test -s 321 -d 123 -i -a
> 
> and then in another terminal:
> 
>     ./isotp-poll-test -s 123 -d 321 -o
> 
> The first program receives the messages and uses the counter values to
> check for lost messages. After a random number of iterations a lost
> message is always detected. I believe that ISOTP should be reliable
> protocol, at least on vcan, shouldn't it?
> 
> Anyway, this patch doesn't try to address this problem.
> 
> Signed-off-by: Michal Sojka <michal.sojka@cvut.cz>
> Reported-by: Jakub Jira <jirajak2@fel.cvut.cz>
> 
> ---
> Changelog:
> 
> v2: Added waiting for isotp-specific wait queue: poll_wait(file, &so->wait, wait).
> ---
>  net/can/isotp.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/net/can/isotp.c b/net/can/isotp.c
> index 9bc344851704..ec163e36ac53 100644
> --- a/net/can/isotp.c
> +++ b/net/can/isotp.c
> @@ -1608,6 +1608,21 @@ static int isotp_init(struct sock *sk)
>  	return 0;
>  }
>  
> +static __poll_t isotp_poll(struct file *file, struct socket *sock, poll_table *wait)
> +{
> +	struct sock *sk = sock->sk;
> +	struct isotp_sock *so = isotp_sk(sk);
> +
> +	__poll_t mask = datagram_poll(file, sock, wait);
> +	poll_wait(file, &so->wait, wait);
> +
> +	/* Check for false positives due to TX state */
> +	if ((mask & EPOLLWRNORM) && (so->tx.state != ISOTP_IDLE))
> +		mask &= ~(EPOLLOUT | EPOLLWRNORM);
> +
> +	return mask;
> +}
> +
>  static int isotp_sock_no_ioctlcmd(struct socket *sock, unsigned int cmd,
>  				  unsigned long arg)
>  {
> @@ -1623,7 +1638,7 @@ static const struct proto_ops isotp_ops = {
>  	.socketpair = sock_no_socketpair,
>  	.accept = sock_no_accept,
>  	.getname = isotp_getname,
> -	.poll = datagram_poll,
> +	.poll = isotp_poll,
>  	.ioctl = isotp_sock_no_ioctlcmd,
>  	.gettstamp = sock_gettstamp,
>  	.listen = sock_no_listen,
> -- 
> 2.39.2
> 
> 

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung Nürnberg              | Phone: +49-5121-206917-129  |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH v2] can: isotp: fix poll() to not report false EPOLLOUT events
  2023-03-24 17:50         ` Marc Kleine-Budde
@ 2023-03-26 11:38           ` Oliver Hartkopp
  0 siblings, 0 replies; 17+ messages in thread
From: Oliver Hartkopp @ 2023-03-26 11:38 UTC (permalink / raw)
  To: Marc Kleine-Budde, Michal Sojka; +Cc: linux-can, Jakub Jira

Hi Marc,

On 24.03.23 18:50, Marc Kleine-Budde wrote:
> Oliver, what about this patch?
> 

the latest discussion on this topic was this answer from Michal ...

https://lore.kernel.org/linux-can/87mt4tu4ow.fsf@steelpick.2x.cz/T/#u

... which somehow ended there.

After getting and answering this mail today:

https://lore.kernel.org/linux-can/31c4a218-ee1b-4b64-59b6-ba5ef6ecce3c@hartkopp.net/T/#t

... I would try the V2 patch for poll() function after applying the 
lock_sock() patch suggested there.

Best regards,
Oliver

> 
> On 02.03.2023 10:28:12, Michal Sojka wrote:
>> When using select/poll/epoll() with a non-blocking ISOTP socket to
>> wait for when non-blocking write is possible, false EPOLLOUT event is
>> sometimes returned. This can happen at least after sending a message
>> which must be split to multiple CAN frames.
>>
>> The reason is that isotp_sendmsg() returns -EAGAIN when tx.state is
>> not equal to ISOTP_IDLE and this behavior is not reflected in
>> datagram_poll(), which is used in isotp_ops.
>>
>> This is fixed by introducing ISOTP-specific poll function, which
>> suppresses the EPOLLOUT events in that case.
>>
>> This can be tested with a program available at
>> https://github.com/wentasah/isotp-poll-test/blob/fd095b2242c49dc5d3e36faf5ac9f4f47fd002c7/isotp-poll-test.c
>> which can trigger the problem on a vcan interface. When running the
>> program as:
>>
>>      ./isotp-poll-test -s 123 -d 321 -o
>>
>> it starts sending ISOTP messages that include increasing ASCII
>> numbers. poll() is used to wait before the next transmission.
>>
>> With current mainline Linux, once the message length is greater than 7
>> bytes, write() returns -EAGAIN and the program terminates. This should
>> not happen, because the previous poll() reported that the write()
>> would not block.
>>
>> After applying this patch, the above command doesn't fail - if one
>> runs some ISOTP reader such as:
>>
>>      isotprecv -l -s 321 -d 123 vcan0
>>
>> This test program can also show another problem. When running:
>>
>>      ./isotp-poll-test -s 321 -d 123 -i -a
>>
>> and then in another terminal:
>>
>>      ./isotp-poll-test -s 123 -d 321 -o
>>
>> The first program receives the messages and uses the counter values to
>> check for lost messages. After a random number of iterations a lost
>> message is always detected. I believe that ISOTP should be reliable
>> protocol, at least on vcan, shouldn't it?
>>
>> Anyway, this patch doesn't try to address this problem.
>>
>> Signed-off-by: Michal Sojka <michal.sojka@cvut.cz>
>> Reported-by: Jakub Jira <jirajak2@fel.cvut.cz>
>>
>> ---
>> Changelog:
>>
>> v2: Added waiting for isotp-specific wait queue: poll_wait(file, &so->wait, wait).
>> ---
>>   net/can/isotp.c | 17 ++++++++++++++++-
>>   1 file changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/can/isotp.c b/net/can/isotp.c
>> index 9bc344851704..ec163e36ac53 100644
>> --- a/net/can/isotp.c
>> +++ b/net/can/isotp.c
>> @@ -1608,6 +1608,21 @@ static int isotp_init(struct sock *sk)
>>   	return 0;
>>   }
>>   
>> +static __poll_t isotp_poll(struct file *file, struct socket *sock, poll_table *wait)
>> +{
>> +	struct sock *sk = sock->sk;
>> +	struct isotp_sock *so = isotp_sk(sk);
>> +
>> +	__poll_t mask = datagram_poll(file, sock, wait);
>> +	poll_wait(file, &so->wait, wait);
>> +
>> +	/* Check for false positives due to TX state */
>> +	if ((mask & EPOLLWRNORM) && (so->tx.state != ISOTP_IDLE))
>> +		mask &= ~(EPOLLOUT | EPOLLWRNORM);
>> +
>> +	return mask;
>> +}
>> +
>>   static int isotp_sock_no_ioctlcmd(struct socket *sock, unsigned int cmd,
>>   				  unsigned long arg)
>>   {
>> @@ -1623,7 +1638,7 @@ static const struct proto_ops isotp_ops = {
>>   	.socketpair = sock_no_socketpair,
>>   	.accept = sock_no_accept,
>>   	.getname = isotp_getname,
>> -	.poll = datagram_poll,
>> +	.poll = isotp_poll,
>>   	.ioctl = isotp_sock_no_ioctlcmd,
>>   	.gettstamp = sock_gettstamp,
>>   	.listen = sock_no_listen,
>> -- 
>> 2.39.2
>>
>>
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Lost packets on ISO-TP and vcan
  2023-03-03 17:26         ` Lost packets on ISO-TP and vcan Michal Sojka
@ 2023-03-30 17:24           ` Oliver Hartkopp
  0 siblings, 0 replies; 17+ messages in thread
From: Oliver Hartkopp @ 2023-03-30 17:24 UTC (permalink / raw)
  To: Michal Sojka, Jakub Jira, Marc Kleine-Budde; +Cc: linux-can

Hi all,

On 03.03.23 18:26, Michal Sojka wrote:

> I tried the exact same commit. When the system running the test is
> mostly idle, I have to wait many minutes to loose the packet. But when I
> run the test simultaneously with Linux kernel compilation on all CPUs
> (make -j$(nproc)), the packet is lost almost immediately.
> 
> I'm still testing with:
> 
> - receiver: ./isotp-poll-test -s 321 -d 123 -i -a -b -q
> - sender:   ./isotp-poll-test -s 123 -d 321 -o -w -b
> 

I've performed some more tests.

The poll V2 patch seems to do the job!

The problem with the kernel compilation is caused by scheduling in the 
way that the receiving process is not fast enough to grab the PDUs from 
the socket until the socket receive buffer gets an overflow.

To prove this idea I added the possibility to detect dropped PDUs in the 
recvmsg() syscall of isotp.c, see:

https://lore.kernel.org/linux-can/20230330170248.62342-1-socketcan@hartkopp.net/T/#u

Additionally the isotp-pool-test was extended to retrieve the 
SOCK_RXQ_OVFL information.

And then "compiling the kernel" the drop counter triggered every time.

So I added the option the increase the sockets receive buffer and with

./isotp-poll-test -s 321 -d 123 -i -a -b -r 20000000

(as root)

the drops were gone - even when compiling the kernel.

See https://github.com/hartkopp/isotp-poll-test for the extensions of 
isotp-pool-test.c

@Michal/Jakub: Feel free to try this out.

Thanks & best regards,
Oliver

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH v2] can: isotp: fix poll() to not report false EPOLLOUT events
  2023-03-02  9:28       ` [RFC PATCH v2] can: isotp: fix poll() to not report false " Michal Sojka
  2023-03-24 17:50         ` Marc Kleine-Budde
@ 2023-03-30 17:30         ` Oliver Hartkopp
  2023-03-31 10:17           ` Michal Sojka
  1 sibling, 1 reply; 17+ messages in thread
From: Oliver Hartkopp @ 2023-03-30 17:30 UTC (permalink / raw)
  To: Michal Sojka, Jakub Jira; +Cc: linux-can



On 02.03.23 10:28, Michal Sojka wrote:
> When using select/poll/epoll() with a non-blocking ISOTP socket to
> wait for when non-blocking write is possible, false EPOLLOUT event is
> sometimes returned. This can happen at least after sending a message
> which must be split to multiple CAN frames.
> 
> The reason is that isotp_sendmsg() returns -EAGAIN when tx.state is
> not equal to ISOTP_IDLE and this behavior is not reflected in
> datagram_poll(), which is used in isotp_ops.
> 
> This is fixed by introducing ISOTP-specific poll function, which
> suppresses the EPOLLOUT events in that case.
> 
> This can be tested with a program available at
> https://github.com/wentasah/isotp-poll-test/blob/fd095b2242c49dc5d3e36faf5ac9f4f47fd002c7/isotp-poll-test.c
> which can trigger the problem on a vcan interface. When running the
> program as:
> 
>      ./isotp-poll-test -s 123 -d 321 -o
> 
> it starts sending ISOTP messages that include increasing ASCII
> numbers. poll() is used to wait before the next transmission.
> 
> With current mainline Linux, once the message length is greater than 7
> bytes, write() returns -EAGAIN and the program terminates. This should
> not happen, because the previous poll() reported that the write()
> would not block.
> 
> After applying this patch, the above command doesn't fail - if one
> runs some ISOTP reader such as:
> 
>      isotprecv -l -s 321 -d 123 vcan0
> 
> This test program can also show another problem. When running:
> 
>      ./isotp-poll-test -s 321 -d 123 -i -a
> 
> and then in another terminal:
> 
>      ./isotp-poll-test -s 123 -d 321 -o
> 
> The first program receives the messages and uses the counter values to
> check for lost messages. After a random number of iterations a lost
> message is always detected. I believe that ISOTP should be reliable
> protocol, at least on vcan, shouldn't it?
> 
> Anyway, this patch doesn't try to address this problem.
> 
> Signed-off-by: Michal Sojka <michal.sojka@cvut.cz>
> Reported-by: Jakub Jira <jirajak2@fel.cvut.cz>

Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>

@Michal: Would you like to rework the commit message or is it just ok to 
cut the message after

"This can be tested with a program available at ..."

?

Best regards,
Oliver

> ---
> Changelog:
> 
> v2: Added waiting for isotp-specific wait queue: poll_wait(file, &so->wait, wait).
> ---
>   net/can/isotp.c | 17 ++++++++++++++++-
>   1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/net/can/isotp.c b/net/can/isotp.c
> index 9bc344851704..ec163e36ac53 100644
> --- a/net/can/isotp.c
> +++ b/net/can/isotp.c
> @@ -1608,6 +1608,21 @@ static int isotp_init(struct sock *sk)
>   	return 0;
>   }
>   
> +static __poll_t isotp_poll(struct file *file, struct socket *sock, poll_table *wait)
> +{
> +	struct sock *sk = sock->sk;
> +	struct isotp_sock *so = isotp_sk(sk);
> +
> +	__poll_t mask = datagram_poll(file, sock, wait);
> +	poll_wait(file, &so->wait, wait);
> +
> +	/* Check for false positives due to TX state */
> +	if ((mask & EPOLLWRNORM) && (so->tx.state != ISOTP_IDLE))
> +		mask &= ~(EPOLLOUT | EPOLLWRNORM);
> +
> +	return mask;
> +}
> +
>   static int isotp_sock_no_ioctlcmd(struct socket *sock, unsigned int cmd,
>   				  unsigned long arg)
>   {
> @@ -1623,7 +1638,7 @@ static const struct proto_ops isotp_ops = {
>   	.socketpair = sock_no_socketpair,
>   	.accept = sock_no_accept,
>   	.getname = isotp_getname,
> -	.poll = datagram_poll,
> +	.poll = isotp_poll,
>   	.ioctl = isotp_sock_no_ioctlcmd,
>   	.gettstamp = sock_gettstamp,
>   	.listen = sock_no_listen,

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH v2] can: isotp: fix poll() to not report false EPOLLOUT events
  2023-03-30 17:30         ` Oliver Hartkopp
@ 2023-03-31 10:17           ` Michal Sojka
  0 siblings, 0 replies; 17+ messages in thread
From: Michal Sojka @ 2023-03-31 10:17 UTC (permalink / raw)
  To: Oliver Hartkopp, Jakub Jira; +Cc: linux-can

On Thu, Mar 30 2023, Oliver Hartkopp wrote:
> Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
>
> @Michal: Would you like to rework the commit message or is it just ok to 
> cut the message after
>
> "This can be tested with a program available at ..."
>
> ?

Will do. I'll send v3 in a while.

-Michal

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-03-31 10:21 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-28 20:42 please re-send [RFC PATCH] can: isotp: fix poll() to not report false positive EPOLLOUT events Oliver Hartkopp
2023-02-28 20:49 ` Michal Sojka
2023-02-28 20:51   ` Oliver Hartkopp
2023-03-01 18:59   ` Oliver Hartkopp
2023-03-01 20:55     ` Oliver Hartkopp
2023-03-01 23:17       ` Michal Sojka
2023-03-02  9:20     ` Michal Sojka
2023-03-02  9:28       ` [RFC PATCH v2] can: isotp: fix poll() to not report false " Michal Sojka
2023-03-24 17:50         ` Marc Kleine-Budde
2023-03-26 11:38           ` Oliver Hartkopp
2023-03-30 17:30         ` Oliver Hartkopp
2023-03-31 10:17           ` Michal Sojka
2023-03-02 21:00       ` please re-send [RFC PATCH] can: isotp: fix poll() to not report false positive " Oliver Hartkopp
2023-03-03  7:13         ` Michal Sojka
2023-03-03 17:26         ` Lost packets on ISO-TP and vcan Michal Sojka
2023-03-30 17:24           ` Oliver Hartkopp
2023-02-28 20:50 ` [PATCH] can: isotp: fix poll() to not report false positive EPOLLOUT events Michal Sojka

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.