RE: [PATCH] net: can: Increase tx queue length

From: Appana Durga Kedareswara Rao <appanad@xilinx.com>
To: "Oliver Hartkopp" <socketcan@hartkopp.net>,
	"Dave Taht" <dave@taht.net>,
	"Toke Høiland-Jørgensen" <toke@redhat.com>,
	"Andre Naujoks" <nautsch2@gmail.com>,
	"wg@grandegger.com" <wg@grandegger.com>,
	"mkl@pengutronix.de" <mkl@pengutronix.de>,
	"davem@davemloft.net" <davem@davemloft.net>
Cc: "linux-can@vger.kernel.org" <linux-can@vger.kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: RE: [PATCH] net: can: Increase tx queue length
Date: Fri, 15 Mar 2019 10:04:27 +0000
Message-ID: <DM5PR02MB2187A422962182382358B916DC440@DM5PR02MB2187.namprd02.prod.outlook.com>
In-Reply-To: <9adf2fbb-7b0b-6821-98fb-2ddcdf5c0edd@hartkopp.net>

Hi All,

<Snip> 
> Hi all,
> 
> On 3/10/19 6:07 AM, Dave Taht wrote:
> > Toke Høiland-Jørgensen <toke@redhat.com> writes:
> >
> >> Appana Durga Kedareswara Rao <appanad@xilinx.com> writes:
> >>
> >>> Hi Andre,
> >>>
> >>> <Snip>
> >>>>
> >>>> On 3/9/19 3:07 PM, Appana Durga Kedareswara rao wrote:
> >>>>> While stress testing the CAN interface on the Xilinx AXI CAN in
> >>>>> loopback mode, I got the message "write: no buffer space available".
> >>>>> Increasing the device tx queue length resolved this issue.
> >>>>
> >>>> No need to patch the kernel:
> >>>>
> >>>> $ ip link set <dev-name> txqueuelen 500
> >>>>
> >>>> does the same thing.
> >>>
> >>> Thanks for the review...
> >>> Agreed, but it is not an out-of-the-box solution, right?
> >>> Do you have any idea why the tx queue length for SocketCAN devices
> >>> is 10, whereas for other network devices (e.g. Ethernet) it is 1000?
> >>
> >> Probably because you don't generally want a long queue adding latency
> >> on a CAN interface? The default 1000 is already way too much even for
> >> an Ethernet device in a lot of cases.
> >>
> >> If you get "out of buffer" errors it means your application is
> >> sending things faster than the receiver (or device) can handle them.
> >> If you solve this by increasing the queue length you are just
> >> papering over the underlying issue, and trading latency for fewer
> >> errors. This tradeoff
> >> *may* be appropriate for your particular application, but I can
> >> imagine it would not be appropriate as a default. Keeping the buffer
> >> size small allows errors to propagate up to the application, which
> >> can then back off, or do something smarter, as appropriate.
> >>
> >> I don't know anything about the actual discussions going on when the
> >> defaults were set, but I can imagine something along the lines of the
> >> above was probably a part of it :)
> >>
> >> -Toke
> >
> > In a related discussion, loud and often difficult, over here on the
> > can bus,
> >
> > https://github.com/systemd/systemd/issues/9194#issuecomment-469403685
> >
> > we found that applying fq_codel as the default qdisc via sysctl was a
> > bad idea on systems with at least one model of CAN device.
> >
> > If you scroll back on the bug, a good description of what the CAN
> > subsystem expects from the qdisc is therein: it mandates an in-order
> > FIFO qdisc or no queue at all. The CAN protocol expects each packet to
> > be either transmitted successfully or rejected and, if rejected, passes
> > the error up to userspace and is supposed to stop and wait for further
> > input.
> >
> > As this was the first serious bug ever reported against using fq_codel
> > as the default in 5+ years of systemd and 7 of OpenWrt deployment, I've
> > been taking it very seriously. It's worse than just systemd: OpenWrt
> > patches out pfifo_fast entirely. pfifo_fast is the wrong qdisc; the
> > right choices are noqueue and possibly pfifo.
> >
> > However, the vcan device exposes noqueue, and so far the only
> > misbehaving device has been one (an 8Devices SocketCAN USB2CAN) whose
> > driver did not do this.
> >
> > This was just corrected with a simple change:
> >
> > static int usb_8dev_probe(struct usb_interface *intf,
> > 			 const struct usb_device_id *id)
> > {
> >       ...
> >       netdev->netdev_ops = &usb_8dev_netdev_ops;
> >
> >       netdev->flags |= IFF_ECHO; /* we support local echo */
> > +    netdev->priv_flags |= IFF_NO_QUEUE;
> >       ...
> > }
> >
> > and successfully tested on that bug report.
> >
> > So at the moment, my thought is that all CAN devices should default to
> > noqueue, if they do not already. I think pfifo_fast with a qlen of any
> > size is the wrong thing, but I still don't know enough about what other
> > CAN devices do or did to be certain.
> >
> 
> Having about 10 elements in a CAN driver tx queue makes it possible to work
> with queueing disciplines
> (http://rtime.felk.cvut.cz/can/socketcan-qdisc-final.pdf) while still
> maintaining nearly real-time behaviour for outgoing traffic.
> 
> When the CAN interface is not able to cope with the (intended) outgoing
> traffic load, the applications should get instant feedback about it.
> 
> There is a difference between running CAN applications in the real world and
> doing performance tests, where it makes sense to increase the tx-queue-len to
> e.g. 1000 and dump 1000 frames into the driver to check the hardware
> performance.
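To put the latency argument in numbers, the sketch below estimates how long a full tx queue takes to drain onto the bus. It assumes classic CAN (not CAN FD), 11-bit identifiers, an otherwise idle bus (no arbitration losses, no error frames) and, purely as an example, 500 kbit/s; the function names are hypothetical. The frame length is 47 + 8*dlc bits including the 3-bit interframe space, and the optional worst-case stuffing term is the usual bound of one stuff bit per 4 bits over the 34 + 8*dlc stuffable bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bits on the wire for one classic CAN data frame with an 11-bit
 * identifier and `dlc` data bytes (0..8), including the 3-bit
 * interframe space. Worst-case bit stuffing (optional) adds
 * floor((34 + 8*dlc - 1) / 4) bits over SOF..CRC.
 */
unsigned can_frame_bits(unsigned dlc, int worst_case_stuffing)
{
    assert(dlc <= 8);
    unsigned bits = 47 + 8 * dlc;
    if (worst_case_stuffing)
        bits += (34 + 8 * dlc - 1) / 4;
    return bits;
}

/*
 * Microseconds needed to send `frames` back-to-back frames of
 * `bits_per_frame` bits at `bitrate` bit/s, assuming the bus is
 * otherwise idle.
 */
uint64_t queue_drain_us(unsigned frames, unsigned bits_per_frame,
                        uint32_t bitrate)
{
    assert(bitrate > 0);
    return (uint64_t)frames * bits_per_frame * 1000000u / bitrate;
}
```

Under these assumptions an 8-byte frame is 111 bits (135 with worst-case stuffing), so at 500 kbit/s a full queue of 10 frames is about 2.2 ms of backlog, while a queue of 1000 frames is about 222 ms, which is why the larger value only makes sense for throughput tests.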

Thanks Oliver, Martin, Andre, Toke and Dave for your input.
So, to conclude: the default txqueuelen of 10 is ideal for real-time CAN traffic,
and for stress/performance tests the user needs to increase txqueuelen manually based on their requirements.
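For a stress test that has to raise the queue length itself rather than running `ip link set <dev-name> txqueuelen ...`, one option is the long-standing SIOCSIFTXQLEN interface ioctl. The sketch below is a minimal illustration under that assumption: the helper names are hypothetical, changing the value needs CAP_NET_ADMIN, and the negative-length check simply mirrors what the kernel itself rejects.

```c
#define _DEFAULT_SOURCE /* struct ifreq in glibc's <net/if.h> */
#include <assert.h>
#include <errno.h>
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Run SIOCGIFTXQLEN or SIOCSIFTXQLEN on `ifname`; any socket is a valid handle. */
static int txqlen_ioctl(const char *ifname, unsigned long req, int *qlen)
{
    struct ifreq ifr;
    int fd, ret, saved;

    assert(ifname != NULL && qlen != NULL);
    if (strlen(ifname) >= IFNAMSIZ) { /* name would not fit in ifr_name */
        errno = EINVAL;
        return -1;
    }
    memset(&ifr, 0, sizeof(ifr));
    strcpy(ifr.ifr_name, ifname);     /* length checked above */
    ifr.ifr_qlen = *qlen;

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    ret = ioctl(fd, req, &ifr);
    saved = errno;                    /* keep the ioctl's errno across close() */
    close(fd);
    errno = saved;
    if (ret < 0)
        return -1;
    *qlen = ifr.ifr_qlen;
    return 0;
}

/* Current tx queue length of `ifname`, or -1 with errno set. */
int get_txqueuelen(const char *ifname)
{
    int qlen = 0;
    return txqlen_ioctl(ifname, SIOCGIFTXQLEN, &qlen) < 0 ? -1 : qlen;
}

/* Same effect as `ip link set <dev> txqueuelen <qlen>`; needs CAP_NET_ADMIN. */
int set_txqueuelen(const char *ifname, int qlen)
{
    if (qlen < 0) {
        errno = EINVAL;
        return -1;
    }
    return txqlen_ioctl(ifname, SIOCSIFTXQLEN, &qlen);
}
```

A test harness could save `get_txqueuelen("can0")`, call `set_txqueuelen("can0", 1000)` before flooding the bus, and write the saved value back afterwards so normal traffic keeps the short default queue.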

Please correct me if my understanding is wrong. 

Regards,
Kedar.

> 
> Best regards,
> Oliver
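Toke's point earlier in the thread, that a short queue lets the "no buffer space available" error (ENOBUFS) reach the application so it can back off, could look roughly like the sketch below. It assumes a CAN_RAW socket that is already opened and bound (not shown) and a frame that is normally a `struct can_frame` from <linux/can.h>; the helper names and the retry policy (doubling delay, cap, attempt limit) are illustrative choices, not SocketCAN behaviour. A real application might instead drop or reprioritise frames.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>
#include <unistd.h>

struct can_tx_ctx {       /* hypothetical helper type */
    int fd;               /* bound CAN_RAW socket */
    const void *frame;    /* e.g. a struct can_frame */
    size_t len;           /* e.g. sizeof(struct can_frame) */
};

/* One write() of the whole frame; returns 0 or a positive errno value. */
int can_tx(void *arg)
{
    const struct can_tx_ctx *c = arg;
    ssize_t n = write(c->fd, c->frame, c->len);
    if (n < 0)
        return errno;
    return (size_t)n == c->len ? 0 : EIO;
}

/*
 * Call tx() until it succeeds, retrying only on ENOBUFS (queue full) and
 * sleeping base_us, 2*base_us, 4*base_us, ... between attempts; the delay
 * stops doubling once it reaches 100 ms. Returns 0 on success, ENOBUFS if
 * the queue is still full after max_attempts, or any other error as-is.
 */
int send_with_backoff(int (*tx)(void *), void *ctx,
                      unsigned max_attempts, unsigned base_us)
{
    unsigned delay_us = base_us;

    assert(tx != NULL && max_attempts > 0);
    for (unsigned attempt = 1; ; attempt++) {
        int err = tx(ctx);
        if (err != ENOBUFS)
            return err;      /* success (0) or a non-retryable error */
        if (attempt == max_attempts)
            return ENOBUFS;  /* give up and let the caller decide */
        struct timespec ts = {
            .tv_sec = delay_us / 1000000u,
            .tv_nsec = (long)(delay_us % 1000000u) * 1000L,
        };
        nanosleep(&ts, NULL);
        if (delay_us < 100000u)
            delay_us *= 2;
    }
}
```

Returning ENOBUFS after a bounded number of attempts, instead of retrying forever, keeps the decision about stale frames with the application, in line with the "instant feedback" argument above.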