All of lore.kernel.org
 help / color / mirror / Atom feed
* CAN: TX frames marked as RX after the sending socket is closed
@ 2021-05-10 14:23 Torin Cooper-Bennun
  2021-05-10 15:35 ` Marc Kleine-Budde
  2021-05-10 18:18 ` Marc Kleine-Budde
  0 siblings, 2 replies; 6+ messages in thread
From: Torin Cooper-Bennun @ 2021-05-10 14:23 UTC (permalink / raw)
  To: linux-can, netdev

Hi,

Scenario: I open a raw CAN socket, queue a bunch of frames for TX, then
close the socket as soon as possible. For the duration of the test, I
have another socket open listening for all frames (candump).

After the sending socket has been closed, and there are still frames in
the queue yet to be transmitted, I find candump reporting the remainder
of my sent frames as RX rather than TX.

For example, I send 1,000 8-byte classical CAN frames, immediately close
the socket and log the time at which I did so.

My application reports the socket closed:

| Socket closed at 15:02:45.987278

My candump log shows:

| (2021-05-10 15:02:45.327724)  can0  TX - -  000  [08]  EE EE EE EE EE EE EE EE
| (2021-05-10 15:02:45.329578)  can0  TX - -  001  [08]  EE EE EE EE EE EE EE EE
| (2021-05-10 15:02:45.330493)  can0  TX - -  002  [08]  EE EE EE EE EE EE EE EE
| (2021-05-10 15:02:45.331341)  can0  TX - -  003  [08]  EE EE EE EE EE EE EE EE
| (2021-05-10 15:02:45.332264)  can0  TX - -  004  [08]  EE EE EE EE EE EE EE EE
| (2021-05-10 15:02:45.333148)  can0  TX - -  005  [08]  EE EE EE EE EE EE EE EE
| (2021-05-10 15:02:45.334115)  can0  TX - -  006  [08]  EE EE EE EE EE EE EE EE
| (2021-05-10 15:02:45.335061)  can0  TX - -  007  [08]  EE EE EE EE EE EE EE EE
| (2021-05-10 15:02:45.336021)  can0  TX - -  008  [08]  EE EE EE EE EE EE EE EE
| (2021-05-10 15:02:45.336951)  can0  TX - -  009  [08]  EE EE EE EE EE EE EE EE
| 
| .... snip ....
|
| (2021-05-10 15:02:46.089177)  can0  TX - -  399  [08]  EE EE EE EE EE EE EE EE
| (2021-05-10 15:02:46.090001)  can0  TX - -  39A  [08]  EE EE EE EE EE EE EE EE
| (2021-05-10 15:02:46.090852)  can0  TX - -  39B  [08]  EE EE EE EE EE EE EE EE
| (2021-05-10 15:02:46.091735)  can0  TX - -  39C  [08]  EE EE EE EE EE EE EE EE
| (2021-05-10 15:02:46.092483)  can0  TX - -  39D  [08]  EE EE EE EE EE EE EE EE
| (2021-05-10 15:02:46.093313)  can0  RX - -  39E  [08]  EE EE EE EE EE EE EE EE <----- !!!!!
| (2021-05-10 15:02:46.094091)  can0  RX - -  39F  [08]  EE EE EE EE EE EE EE EE
| (2021-05-10 15:02:46.094931)  can0  RX - -  3A0  [08]  EE EE EE EE EE EE EE EE
| (2021-05-10 15:02:46.095774)  can0  RX - -  3A1  [08]  EE EE EE EE EE EE EE EE
| (2021-05-10 15:02:46.096513)  can0  RX - -  3A2  [08]  EE EE EE EE EE EE EE EE
|
| .... snip ....
|
| (2021-05-10 15:02:46.143287)  can0  RX - -  3DE  [08]  EE EE EE EE EE EE EE EE
| (2021-05-10 15:02:46.144046)  can0  RX - -  3DF  [08]  EE EE EE EE EE EE EE EE
| (2021-05-10 15:02:46.144808)  can0  RX - -  3E0  [08]  EE EE EE EE EE EE EE EE
| (2021-05-10 15:02:46.145570)  can0  RX - -  3E1  [08]  EE EE EE EE EE EE EE EE
| (2021-05-10 15:02:46.146357)  can0  RX - -  3E2  [08]  EE EE EE EE EE EE EE EE
| (2021-05-10 15:02:46.147117)  can0  RX - -  3E3  [08]  EE EE EE EE EE EE EE EE
| (2021-05-10 15:02:46.147876)  can0  RX - -  3E4  [08]  EE EE EE EE EE EE EE EE
| (2021-05-10 15:02:46.148635)  can0  RX - -  3E5  [08]  EE EE EE EE EE EE EE EE
| (2021-05-10 15:02:46.149395)  can0  RX - -  3E6  [08]  EE EE EE EE EE EE EE EE
| (2021-05-10 15:02:46.150161)  can0  RX - -  3E7  [08]  EE EE EE EE EE EE EE EE

Why?

candump.c prints 'RX' if the received frame has no MSG_DONTROUTE flag.

|	if (msg.msg_flags & MSG_DONTROUTE)
|		printf ("  TX %s", extra_m_info[frame.flags & 3]);
|	else
|		printf ("  RX %s", extra_m_info[frame.flags & 3]);

In turn, MSG_DONTROUTE is set in net/can/raw.c: raw_rcv():

|	/* add CAN specific message flags for raw_recvmsg() */
|	pflags = raw_flags(skb);
|	*pflags = 0;
|	if (oskb->sk)
|		*pflags |= MSG_DONTROUTE;
|	if (oskb->sk == sk)
|		*pflags |= MSG_CONFIRM;

So, I'm guessing, some 100 ms after my application begins to request
that the socket be closed, the socket's pointer becomes NULL in further
TX skbs in the queue, so the raw CAN layer can no longer differentiate
these skbs as TX. (Sorry if my pathways are a bit mixed up.)

Seems broken to me - is this known behaviour?

Test setup:
 - kernel: v5.13-rc1 with some RPi-specific patches
 - hardware: RPi CM4 with TCAN4550 (so, m_can driver in peripheral mode)

--
Regards,

Torin Cooper-Bennun
Software Engineer | maxiluxsystems.com


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: CAN: TX frames marked as RX after the sending socket is closed
  2021-05-10 14:23 CAN: TX frames marked as RX after the sending socket is closed Torin Cooper-Bennun
@ 2021-05-10 15:35 ` Marc Kleine-Budde
  2021-05-11  9:20   ` Torin Cooper-Bennun
  2021-05-10 18:18 ` Marc Kleine-Budde
  1 sibling, 1 reply; 6+ messages in thread
From: Marc Kleine-Budde @ 2021-05-10 15:35 UTC (permalink / raw)
  To: Torin Cooper-Bennun; +Cc: linux-can, netdev

[-- Attachment #1: Type: text/plain, Size: 4770 bytes --]

On 10.05.2021 15:23:02, Torin Cooper-Bennun wrote:
> Scenario: I open a raw CAN socket, queue a bunch of frames for TX, then
> close the socket as soon as possible. For the duration of the test, I
> have another socket open listening for all frames (candump).
> 
> After the sending socket has been closed, and there are still frames in
> the queue yet to be transmitted, I find candump reporting the remainder
> of my sent frames as RX rather than TX.
> 
> For example, I send 1,000 8-byte classical CAN frames, immediately close
> the socket and log the time at which I did so.

Can you provide the program to reproduce the issue?
Have you increased the CAN interface's txqueuelen?

> My application reports the socket closed:
> 
> | Socket closed at 15:02:45.987278
> 
> My candump log shows:
> 
> | (2021-05-10 15:02:45.327724)  can0  TX - -  000  [08]  EE EE EE EE EE EE EE EE
> | (2021-05-10 15:02:45.329578)  can0  TX - -  001  [08]  EE EE EE EE EE EE EE EE
> | (2021-05-10 15:02:45.330493)  can0  TX - -  002  [08]  EE EE EE EE EE EE EE EE
> | (2021-05-10 15:02:45.331341)  can0  TX - -  003  [08]  EE EE EE EE EE EE EE EE
> | (2021-05-10 15:02:45.332264)  can0  TX - -  004  [08]  EE EE EE EE EE EE EE EE
> | (2021-05-10 15:02:45.333148)  can0  TX - -  005  [08]  EE EE EE EE EE EE EE EE
> | (2021-05-10 15:02:45.334115)  can0  TX - -  006  [08]  EE EE EE EE EE EE EE EE
> | (2021-05-10 15:02:45.335061)  can0  TX - -  007  [08]  EE EE EE EE EE EE EE EE
> | (2021-05-10 15:02:45.336021)  can0  TX - -  008  [08]  EE EE EE EE EE EE EE EE
> | (2021-05-10 15:02:45.336951)  can0  TX - -  009  [08]  EE EE EE EE EE EE EE EE
> | 
> | .... snip ....
> |
> | (2021-05-10 15:02:46.089177)  can0  TX - -  399  [08]  EE EE EE EE EE EE EE EE
> | (2021-05-10 15:02:46.090001)  can0  TX - -  39A  [08]  EE EE EE EE EE EE EE EE
> | (2021-05-10 15:02:46.090852)  can0  TX - -  39B  [08]  EE EE EE EE EE EE EE EE
> | (2021-05-10 15:02:46.091735)  can0  TX - -  39C  [08]  EE EE EE EE EE EE EE EE
> | (2021-05-10 15:02:46.092483)  can0  TX - -  39D  [08]  EE EE EE EE EE EE EE EE
> | (2021-05-10 15:02:46.093313)  can0  RX - -  39E  [08]  EE EE EE EE EE EE EE EE <----- !!!!!
> | (2021-05-10 15:02:46.094091)  can0  RX - -  39F  [08]  EE EE EE EE EE EE EE EE
> | (2021-05-10 15:02:46.094931)  can0  RX - -  3A0  [08]  EE EE EE EE EE EE EE EE
> | (2021-05-10 15:02:46.095774)  can0  RX - -  3A1  [08]  EE EE EE EE EE EE EE EE
> | (2021-05-10 15:02:46.096513)  can0  RX - -  3A2  [08]  EE EE EE EE EE EE EE EE
> |
> | .... snip ....
> |
> | (2021-05-10 15:02:46.143287)  can0  RX - -  3DE  [08]  EE EE EE EE EE EE EE EE
> | (2021-05-10 15:02:46.144046)  can0  RX - -  3DF  [08]  EE EE EE EE EE EE EE EE
> | (2021-05-10 15:02:46.144808)  can0  RX - -  3E0  [08]  EE EE EE EE EE EE EE EE
> | (2021-05-10 15:02:46.145570)  can0  RX - -  3E1  [08]  EE EE EE EE EE EE EE EE
> | (2021-05-10 15:02:46.146357)  can0  RX - -  3E2  [08]  EE EE EE EE EE EE EE EE
> | (2021-05-10 15:02:46.147117)  can0  RX - -  3E3  [08]  EE EE EE EE EE EE EE EE
> | (2021-05-10 15:02:46.147876)  can0  RX - -  3E4  [08]  EE EE EE EE EE EE EE EE
> | (2021-05-10 15:02:46.148635)  can0  RX - -  3E5  [08]  EE EE EE EE EE EE EE EE
> | (2021-05-10 15:02:46.149395)  can0  RX - -  3E6  [08]  EE EE EE EE EE EE EE EE
> | (2021-05-10 15:02:46.150161)  can0  RX - -  3E7  [08]  EE EE EE EE EE EE EE EE
> 
> Why?
> 
> candump.c prints 'RX' if the received frame has no MSG_DONTROUTE flag.
> 
> |	if (msg.msg_flags & MSG_DONTROUTE)
> |		printf ("  TX %s", extra_m_info[frame.flags & 3]);
> |	else
> |		printf ("  RX %s", extra_m_info[frame.flags & 3]);
> 
> In turn, MSG_DONTROUTE is set in net/can/raw.c: raw_rcv():
> 
> |	/* add CAN specific message flags for raw_recvmsg() */
> |	pflags = raw_flags(skb);
> |	*pflags = 0;
> |	if (oskb->sk)
> |		*pflags |= MSG_DONTROUTE;
> |	if (oskb->sk == sk)
> |		*pflags |= MSG_CONFIRM;
> 
> So, I'm guessing, some 100 ms after my application begins to request
> that the socket be closed, the socket's pointer becomes NULL in further
> TX skbs in the queue, so the raw CAN layer can no longer differentiate
> these skbs as TX. (Sorry if my pathways are a bit mixed up.)
> 
> Seems broken to me - is this known behaviour?

Looks like a unknown bug to me!

> Test setup:
>  - kernel: v5.13-rc1 with some RPi-specific patches
>  - hardware: RPi CM4 with TCAN4550 (so, m_can driver in peripheral mode)

regards,
Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: CAN: TX frames marked as RX after the sending socket is closed
  2021-05-10 14:23 CAN: TX frames marked as RX after the sending socket is closed Torin Cooper-Bennun
  2021-05-10 15:35 ` Marc Kleine-Budde
@ 2021-05-10 18:18 ` Marc Kleine-Budde
  2021-05-11  9:28   ` Torin Cooper-Bennun
  1 sibling, 1 reply; 6+ messages in thread
From: Marc Kleine-Budde @ 2021-05-10 18:18 UTC (permalink / raw)
  To: Torin Cooper-Bennun; +Cc: linux-can, netdev

[-- Attachment #1: Type: text/plain, Size: 3572 bytes --]

On 10.05.2021 15:23:02, Torin Cooper-Bennun wrote:
> Why?
> 
> candump.c prints 'RX' if the received frame has no MSG_DONTROUTE flag.
> 
> |	if (msg.msg_flags & MSG_DONTROUTE)
> |		printf ("  TX %s", extra_m_info[frame.flags & 3]);
> |	else
> |		printf ("  RX %s", extra_m_info[frame.flags & 3]);
> 
> In turn, MSG_DONTROUTE is set in net/can/raw.c: raw_rcv():
> 
> |	/* add CAN specific message flags for raw_recvmsg() */
> |	pflags = raw_flags(skb);
> |	*pflags = 0;
> |	if (oskb->sk)
> |		*pflags |= MSG_DONTROUTE;
> |	if (oskb->sk == sk)
> |		*pflags |= MSG_CONFIRM;

Without testing, I think you're right here, the MSG_DONTROUTE isn't set
here anymore.

> So, I'm guessing, some 100 ms after my application begins to request
> that the socket be closed, the socket's pointer becomes NULL in further
> TX skbs in the queue, so the raw CAN layer can no longer differentiate
> these skbs as TX. (Sorry if my pathways are a bit mixed up.)

I have a git feeling that I've found the problem. Can you revert
e940e0895a82 ("can: skb: can_skb_set_owner(): fix ref counting if socket
was closed before setting skb ownership") and check if that fixes your
problem? This might trigger the problem described in the patch:

| WARNING: CPU: 0 PID: 280 at lib/refcount.c:25 refcount_warn_saturate+0x114/0x134
| refcount_t: addition on 0; use-after-free.
| Modules linked in: coda_vpu(E) v4l2_jpeg(E) videobuf2_vmalloc(E) imx_vdoa(E)
| CPU: 0 PID: 280 Comm: test_can.sh Tainted: G            E     5.11.0-04577-gf8ff6603c617 #203
| Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
| Backtrace:
| [<80bafea4>] (dump_backtrace) from [<80bb0280>] (show_stack+0x20/0x24) r7:00000000 r6:600f0113 r5:00000000 r4:81441220
| [<80bb0260>] (show_stack) from [<80bb593c>] (dump_stack+0xa0/0xc8)
| [<80bb589c>] (dump_stack) from [<8012b268>] (__warn+0xd4/0x114) r9:00000019 r8:80f4a8c2 r7:83e4150c r6:00000000 r5:00000009 r4:80528f90
| [<8012b194>] (__warn) from [<80bb09c4>] (warn_slowpath_fmt+0x88/0xc8) r9:83f26400 r8:80f4a8d1 r7:00000009 r6:80528f90 r5:00000019 r4:80f4a8c2
| [<80bb0940>] (warn_slowpath_fmt) from [<80528f90>] (refcount_warn_saturate+0x114/0x134) r8:00000000 r7:00000000 r6:82b44000 r5:834e5600 r4:83f4d540
| [<80528e7c>] (refcount_warn_saturate) from [<8079a4c8>] (__refcount_add.constprop.0+0x4c/0x50)
| [<8079a47c>] (__refcount_add.constprop.0) from [<8079a57c>] (can_put_echo_skb+0xb0/0x13c)
| [<8079a4cc>] (can_put_echo_skb) from [<8079ba98>] (flexcan_start_xmit+0x1c4/0x230) r9:00000010 r8:83f48610 r7:0fdc0000 r6:0c080000 r5:82b44000 r4:834e5600
| [<8079b8d4>] (flexcan_start_xmit) from [<80969078>] (netdev_start_xmit+0x44/0x70) r9:814c0ba0 r8:80c8790c r7:00000000 r6:834e5600 r5:82b44000 r4:82ab1f00
| [<80969034>] (netdev_start_xmit) from [<809725a4>] (dev_hard_start_xmit+0x19c/0x318) r9:814c0ba0 r8:00000000 r7:82ab1f00 r6:82b44000 r5:00000000 r4:834e5600
| [<80972408>] (dev_hard_start_xmit) from [<809c6584>] (sch_direct_xmit+0xcc/0x264) r10:834e5600 r9:00000000 r8:00000000 r7:82b44000 r6:82ab1f00 r5:834e5600 r4:83f27400
| [<809c64b8>] (sch_direct_xmit) from [<809c6c0c>] (__qdisc_run+0x4f0/0x534)

Can you give me feedback if
1. the revert "fixes" your problem
2. the revert triggers the above backtrace

regards,
Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: CAN: TX frames marked as RX after the sending socket is closed
  2021-05-10 15:35 ` Marc Kleine-Budde
@ 2021-05-11  9:20   ` Torin Cooper-Bennun
  0 siblings, 0 replies; 6+ messages in thread
From: Torin Cooper-Bennun @ 2021-05-11  9:20 UTC (permalink / raw)
  To: Marc Kleine-Budde; +Cc: linux-can, netdev

[-- Attachment #1: Type: text/plain, Size: 333 bytes --]

On Mon, May 10, 2021 at 05:35:40PM +0200, Marc Kleine-Budde wrote:
> Can you provide the program to reproduce the issue?

See attached below this email (tx only, rx done with candump).

> Have you increased the CAN interface's txqueuelen?

Yup, qlen is 1000.

--
Regards,

Torin Cooper-Bennun
Software Engineer | maxiluxsystems.com


[-- Attachment #2: tx_rx_bug.c --]
[-- Type: text/x-csrc, Size: 1042 bytes --]

#include <errno.h>
#include <linux/can.h>
#include <net/if.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	const char *if_name = "can0";
	int sockfd;
	struct ifreq req = {};
	struct sockaddr_can sockaddr = {};
	struct can_frame frame = {};
	struct timeval tv;
	char timestr[16];

	sockfd = socket(AF_CAN, SOCK_RAW, CAN_RAW);

	strncpy(req.ifr_name, if_name, IF_NAMESIZE);
	ioctl(sockfd, SIOCGIFINDEX, &req);

	sockaddr.can_family = AF_CAN;
	sockaddr.can_ifindex = req.ifr_ifindex;

	bind(sockfd, (const struct sockaddr *) &sockaddr, sizeof(sockaddr));

	frame.can_dlc = 8;
	memset(frame.data, 0xee, 8);
	for (int i = 0; i < 1000; ++i)
	{
		frame.can_id = (i & 0x7ff);
		send(sockfd, &frame, sizeof(frame), 0);
	}

	close(sockfd);

	gettimeofday(&tv, NULL);
	strftime(timestr, sizeof(timestr), "%H:%M:%S", localtime(&tv.tv_sec));
	printf("Socket closed at %s.%06ld\n", timestr, tv.tv_usec);

	return 0;
}

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: CAN: TX frames marked as RX after the sending socket is closed
  2021-05-10 18:18 ` Marc Kleine-Budde
@ 2021-05-11  9:28   ` Torin Cooper-Bennun
  2021-05-11 10:01     ` Marc Kleine-Budde
  0 siblings, 1 reply; 6+ messages in thread
From: Torin Cooper-Bennun @ 2021-05-11  9:28 UTC (permalink / raw)
  To: Marc Kleine-Budde; +Cc: linux-can, netdev

On Mon, May 10, 2021 at 08:18:07PM +0200, Marc Kleine-Budde wrote:
> I have a git feeling that I've found the problem. Can you revert
> e940e0895a82 ("can: skb: can_skb_set_owner(): fix ref counting if socket
> was closed before setting skb ownership") and check if that fixes your
> problem? This might trigger the problem described in the patch:
> 
> | WARNING: CPU: 0 PID: 280 at lib/refcount.c:25 refcount_warn_saturate+0x114/0x134
> | refcount_t: addition on 0; use-after-free.
> | Modules linked in: coda_vpu(E) v4l2_jpeg(E) videobuf2_vmalloc(E) imx_vdoa(E)
> | CPU: 0 PID: 280 Comm: test_can.sh Tainted: G            E     5.11.0-04577-gf8ff6603c617 #203
> | Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> | Backtrace:
> | [<80bafea4>] (dump_backtrace) from [<80bb0280>] (show_stack+0x20/0x24) r7:00000000 r6:600f0113 r5:00000000 r4:81441220
> | [<80bb0260>] (show_stack) from [<80bb593c>] (dump_stack+0xa0/0xc8)
> | [<80bb589c>] (dump_stack) from [<8012b268>] (__warn+0xd4/0x114) r9:00000019 r8:80f4a8c2 r7:83e4150c r6:00000000 r5:00000009 r4:80528f90
> | [<8012b194>] (__warn) from [<80bb09c4>] (warn_slowpath_fmt+0x88/0xc8) r9:83f26400 r8:80f4a8d1 r7:00000009 r6:80528f90 r5:00000019 r4:80f4a8c2
> | [<80bb0940>] (warn_slowpath_fmt) from [<80528f90>] (refcount_warn_saturate+0x114/0x134) r8:00000000 r7:00000000 r6:82b44000 r5:834e5600 r4:83f4d540
> | [<80528e7c>] (refcount_warn_saturate) from [<8079a4c8>] (__refcount_add.constprop.0+0x4c/0x50)
> | [<8079a47c>] (__refcount_add.constprop.0) from [<8079a57c>] (can_put_echo_skb+0xb0/0x13c)
> | [<8079a4cc>] (can_put_echo_skb) from [<8079ba98>] (flexcan_start_xmit+0x1c4/0x230) r9:00000010 r8:83f48610 r7:0fdc0000 r6:0c080000 r5:82b44000 r4:834e5600
> | [<8079b8d4>] (flexcan_start_xmit) from [<80969078>] (netdev_start_xmit+0x44/0x70) r9:814c0ba0 r8:80c8790c r7:00000000 r6:834e5600 r5:82b44000 r4:82ab1f00
> | [<80969034>] (netdev_start_xmit) from [<809725a4>] (dev_hard_start_xmit+0x19c/0x318) r9:814c0ba0 r8:00000000 r7:82ab1f00 r6:82b44000 r5:00000000 r4:834e5600
> | [<80972408>] (dev_hard_start_xmit) from [<809c6584>] (sch_direct_xmit+0xcc/0x264) r10:834e5600 r9:00000000 r8:00000000 r7:82b44000 r6:82ab1f00 r5:834e5600 r4:83f27400
> | [<809c64b8>] (sch_direct_xmit) from [<809c6c0c>] (__qdisc_run+0x4f0/0x534)
> 
> Can you give me feedback if
> 1. the revert "fixes" your problem
> 2. the revert triggers the above backtrace

Always trust your git, it seems... I can confirm this revert both
'fixes' the problem and triggers that backtrace originating from
m_can_tx_handler. I got two of those backtraces during the run, and
sandwiched between them a backtrace from the rx path:

| WARNING: CPU: 2 PID: 22 at lib/refcount.c:28 refcount_warn_saturate+0x13c/0x174
| refcount_t: underflow; use-after-free.
| Modules linked in: can_raw can sha256_generic cfg80211 rfkill 8021q garp stp llc tcan4x5x m_can can_dev spidev v3d raspberrypi_hwmon vc4 gpu_sched cec i2c_bcm2835 bcm2835_isp(C) drm_kms_helper spi_bcm2835 bcm2835_codec(C) v4l2_mem2mem bcm2835_v4l2(C) bcm2835_mmal_vchiq(C) videobuf2_vmalloc videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videobuf2_common videodev snd_bcm2835(C) mc vc_sm_cma(C) snd_soc_core snd_compress snd_pcm_dmaengine rpivid_mem snd_pcm snd_timer snd syscopyarea sysfillrect sysimgblt fb_sys_fops nvmem_rmem uio_pdrv_genirq uio i2c_dev drm drm_panel_orientation_quirks backlight fuse ip_tables x_tables ipv6
| CPU: 2 PID: 22 Comm: ksoftirqd/2 Tainted: G        WC        5.13.0-rc1-v7l+ #1
| Hardware name: BCM2711
| Backtrace: 
| [<c0bdc4bc>] (dump_backtrace) from [<c0bdc830>] (show_stack+0x20/0x24)
|  r7:ffffffff r6:00000000 r5:60000013 r4:c12e85a4
| [<c0bdc810>] (show_stack) from [<c0be0eec>] (dump_stack+0xc4/0xf0)
| [<c0be0e28>] (dump_stack) from [<c0221194>] (__warn+0xfc/0x158)
|  r9:ef41b540 r8:00000009 r7:0000001c r6:00000009 r5:c075c064 r4:c0e6bfc8
| [<c0221098>] (__warn) from [<c0bdd004>] (warn_slowpath_fmt+0xa4/0xe4)
|  r7:c075c064 r6:0000001c r5:c0e6bfc8 r4:c0e6c004
| [<c0bdcf64>] (warn_slowpath_fmt) from [<c075c064>] (refcount_warn_saturate+0x13c/0x174)
|  r8:00000001 r7:c1323d40 r6:c3f90000 r5:c3eb4e40 r4:c37de240
| [<c075bf28>] (refcount_warn_saturate) from [<c0a3ea80>] (sock_efree+0x50/0x90)
| [<c0a3ea30>] (sock_efree) from [<c0a470a8>] (skb_release_head_state+0x50/0xe4)
| [<c0a47058>] (skb_release_head_state) from [<c0a48fe4>] (consume_skb+0x38/0xe0)
|  r5:c3eb4e40 r4:c37de240
| [<c0a48fac>] (consume_skb) from [<bf32a920>] (can_receive+0xc8/0xf4 [can])
|  r5:c3eb4e40 r4:c37de240
| [<bf32a858>] (can_receive [can]) from [<bf32aa50>] (can_rcv+0x44/0xc0 [can])
|  r9:ef41b540 r8:c3f906a0 r7:c3f90680 r6:00000040 r5:bf32aa0c r4:c37de240
| [<bf32aa0c>] (can_rcv [can]) from [<c0a655ac>] (__netif_receive_skb_one_core+0x68/0x90)
|  r5:bf32aa0c r4:c3f90000
| [<c0a65544>] (__netif_receive_skb_one_core) from [<c0a65620>] (__netif_receive_skb+0x20/0x70)
|  r5:00000001 r4:c37de240
| [<c0a65600>] (__netif_receive_skb) from [<c0a656b0>] (netif_receive_skb+0x40/0x180)
|  r5:00000001 r4:c37de240
| [<c0a65670>] (netif_receive_skb) from [<bf2ae9dc>] (can_rx_offload_napi_poll+0x58/0xb4 [can_dev])
|  r4:c3f90000
| [<bf2ae984>] (can_rx_offload_napi_poll [can_dev]) from [<c0a66d74>] (__napi_poll+0x38/0x1dc)

--
Regards,

Torin Cooper-Bennun
Software Engineer | maxiluxsystems.com


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: CAN: TX frames marked as RX after the sending socket is closed
  2021-05-11  9:28   ` Torin Cooper-Bennun
@ 2021-05-11 10:01     ` Marc Kleine-Budde
  0 siblings, 0 replies; 6+ messages in thread
From: Marc Kleine-Budde @ 2021-05-11 10:01 UTC (permalink / raw)
  To: Torin Cooper-Bennun; +Cc: linux-can, netdev

[-- Attachment #1: Type: text/plain, Size: 3360 bytes --]

On 11.05.2021 10:28:28, Torin Cooper-Bennun wrote:
> On Mon, May 10, 2021 at 08:18:07PM +0200, Marc Kleine-Budde wrote:
> > I have a git feeling that I've found the problem. Can you revert
> > e940e0895a82 ("can: skb: can_skb_set_owner(): fix ref counting if socket
> > was closed before setting skb ownership") and check if that fixes your
> > problem? This might trigger the problem described in the patch:
> > 
> > | WARNING: CPU: 0 PID: 280 at lib/refcount.c:25 refcount_warn_saturate+0x114/0x134
> > | refcount_t: addition on 0; use-after-free.
> > | Modules linked in: coda_vpu(E) v4l2_jpeg(E) videobuf2_vmalloc(E) imx_vdoa(E)
> > | CPU: 0 PID: 280 Comm: test_can.sh Tainted: G            E     5.11.0-04577-gf8ff6603c617 #203
> > | Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> > | Backtrace:
> > | [<80bafea4>] (dump_backtrace) from [<80bb0280>] (show_stack+0x20/0x24) r7:00000000 r6:600f0113 r5:00000000 r4:81441220
> > | [<80bb0260>] (show_stack) from [<80bb593c>] (dump_stack+0xa0/0xc8)
> > | [<80bb589c>] (dump_stack) from [<8012b268>] (__warn+0xd4/0x114) r9:00000019 r8:80f4a8c2 r7:83e4150c r6:00000000 r5:00000009 r4:80528f90
> > | [<8012b194>] (__warn) from [<80bb09c4>] (warn_slowpath_fmt+0x88/0xc8) r9:83f26400 r8:80f4a8d1 r7:00000009 r6:80528f90 r5:00000019 r4:80f4a8c2
> > | [<80bb0940>] (warn_slowpath_fmt) from [<80528f90>] (refcount_warn_saturate+0x114/0x134) r8:00000000 r7:00000000 r6:82b44000 r5:834e5600 r4:83f4d540
> > | [<80528e7c>] (refcount_warn_saturate) from [<8079a4c8>] (__refcount_add.constprop.0+0x4c/0x50)
> > | [<8079a47c>] (__refcount_add.constprop.0) from [<8079a57c>] (can_put_echo_skb+0xb0/0x13c)
> > | [<8079a4cc>] (can_put_echo_skb) from [<8079ba98>] (flexcan_start_xmit+0x1c4/0x230) r9:00000010 r8:83f48610 r7:0fdc0000 r6:0c080000 r5:82b44000 r4:834e5600
> > | [<8079b8d4>] (flexcan_start_xmit) from [<80969078>] (netdev_start_xmit+0x44/0x70) r9:814c0ba0 r8:80c8790c r7:00000000 r6:834e5600 r5:82b44000 r4:82ab1f00
> > | [<80969034>] (netdev_start_xmit) from [<809725a4>] (dev_hard_start_xmit+0x19c/0x318) r9:814c0ba0 r8:00000000 r7:82ab1f00 r6:82b44000 r5:00000000 r4:834e5600
> > | [<80972408>] (dev_hard_start_xmit) from [<809c6584>] (sch_direct_xmit+0xcc/0x264) r10:834e5600 r9:00000000 r8:00000000 r7:82b44000 r6:82ab1f00 r5:834e5600 r4:83f27400
> > | [<809c64b8>] (sch_direct_xmit) from [<809c6c0c>] (__qdisc_run+0x4f0/0x534)
> > 
> > Can you give me feedback if
> > 1. the revert "fixes" your problem
> > 2. the revert triggers the above backtrace
> 
> Always trust your git, it seems... I can confirm this revert both
> 'fixes' the problem and triggers that backtrace originating from
> m_can_tx_handler.

\o/

> I got two of those backtraces during the run, and
> sandwiched between them a backtrace from the rx path:
> 
> | WARNING: CPU: 2 PID: 22 at lib/refcount.c:28 refcount_warn_saturate+0x13c/0x174
> | refcount_t: underflow; use-after-free.

Now please test if
https://lore.kernel.org/linux-can/20210510182038.1528631-1-mkl@pengutronix.de/
fixes your problem.

Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-05-11 10:01 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-10 14:23 CAN: TX frames marked as RX after the sending socket is closed Torin Cooper-Bennun
2021-05-10 15:35 ` Marc Kleine-Budde
2021-05-11  9:20   ` Torin Cooper-Bennun
2021-05-10 18:18 ` Marc Kleine-Budde
2021-05-11  9:28   ` Torin Cooper-Bennun
2021-05-11 10:01     ` Marc Kleine-Budde

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.