linux-kselftest.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* selftest/net: so_txtime.sh fails intermittently - read Resource temporarily unavailable
@ 2019-11-14  8:47 Naresh Kamboju
  2019-11-15 16:21 ` Willem de Bruijn
  0 siblings, 1 reply; 4+ messages in thread
From: Naresh Kamboju @ 2019-11-14  8:47 UTC (permalink / raw)
  To: open list:KERNEL SELFTEST FRAMEWORK, Netdev
  Cc: Shuah Khan, Anders Roxell, lkft-triage, Willem de Bruijn,
	David S. Miller

selftests net so_txtime.sh fails intermittently on multiple boards and
linux next and mainline.


# selftests net so_txtime.sh
net: so_txtime.sh_ #
#
: _ #
# SO_TXTIME ipv6 clock monotonic
ipv6: clock_monotonic #
# payloada delay452 expected0 (us)
delay452: expected0_(us) #
#
: _ #
# SO_TXTIME ipv4 clock monotonic
ipv4: clock_monotonic #
# payloada delay97 expected0 (us)
delay97: expected0_(us) #
#
: _ #
# SO_TXTIME ipv6 clock monotonic
ipv6: clock_monotonic #
# payloada delay168 expected0 (us)
delay168: expected0_(us) #
#
: _ #
# SO_TXTIME ipv4 clock monotonic
ipv4: clock_monotonic #
# payloada delay90 expected0 (us)
delay90: expected0_(us) #
#
: _ #
# SO_TXTIME ipv6 clock monotonic
ipv6: clock_monotonic #
# payloada delay10146 expected10000 (us)
delay10146: expected10000_(us) #
#
: _ #
# SO_TXTIME ipv4 clock monotonic
ipv4: clock_monotonic #
# payloada delay10146 expected10000 (us)
delay10146: expected10000_(us) #
#
: _ #
# SO_TXTIME ipv6 clock monotonic
ipv6: clock_monotonic #
# payloada delay10261 expected10000 (us)
delay10261: expected10000_(us) #
# payloadb delay20211 expected20000 (us)
delay20211: expected20000_(us) #
#
: _ #
# SO_TXTIME ipv4 clock monotonic
ipv4: clock_monotonic #
# payloada delay10186 expected10000 (us)
delay10186: expected10000_(us) #
# payloadb delay20387 expected20000 (us)
delay20387: expected20000_(us) #
#
: _ #
# SO_TXTIME ipv6 clock monotonic
ipv6: clock_monotonic #
# payloadb delay20238 expected20000 (us)
delay20238: expected20000_(us) #
# payloada delay20294 expected20000 (us)
delay20294: expected20000_(us) #
#
: _ #
# SO_TXTIME ipv4 clock monotonic
ipv4: clock_monotonic #
# payloadb delay20258 expected20000 (us)
delay20258: expected20000_(us) #
# payloada delay20313 expected20000 (us)
delay20313: expected20000_(us) #
#
: _ #
# SO_TXTIME ipv6 clock tai
ipv6: clock_tai #
# ./so_txtime read Resource temporarily unavailable
read: Resource_temporarily #
#
: _ #
# SO_TXTIME ipv6 clock tai
ipv6: clock_tai #
# ./so_txtime read Resource temporarily unavailable
read: Resource_temporarily #
#
: _ #
# SO_TXTIME ipv6 clock tai
ipv6: clock_tai #
# payloada delay10097 expected10000 (us)
delay10097: expected10000_(us) #
#
: _ #
# SO_TXTIME ipv4 clock tai
ipv4: clock_tai #
# payloada delay9957 expected10000 (us)
delay9957: expected10000_(us) #
#
: _ #
# SO_TXTIME ipv6 clock tai
ipv6: clock_tai #
# payloada delay10382 expected10000 (us)
delay10382: expected10000_(us) #
# ./so_txtime read Resource temporarily unavailable
read: Resource_temporarily #
[FAIL] 24 selftests net so_txtime.sh # exit=1
selftests: net_so_txtime.sh [FAIL]

Test run full log,
https://lkft.validation.linaro.org/scheduler/job/1010545#L1234

Test results comparison link,
https://qa-reports.linaro.org/lkft/linux-next-oe/tests/kselftest/net_so_txtime.sh
https://qa-reports.linaro.org/lkft/linux-mainline-oe/tests/kselftest/net_so_txtime.sh

Best regards
Naresh Kamboju

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: selftest/net: so_txtime.sh fails intermittently - read Resource temporarily unavailable
  2019-11-14  8:47 selftest/net: so_txtime.sh fails intermittently - read Resource temporarily unavailable Naresh Kamboju
@ 2019-11-15 16:21 ` Willem de Bruijn
  2019-11-20  6:32   ` Naresh Kamboju
  0 siblings, 1 reply; 4+ messages in thread
From: Willem de Bruijn @ 2019-11-15 16:21 UTC (permalink / raw)
  To: Naresh Kamboju
  Cc: open list:KERNEL SELFTEST FRAMEWORK, Netdev, Shuah Khan,
	Anders Roxell, lkft-triage, David S. Miller

On Thu, Nov 14, 2019 at 3:47 AM Naresh Kamboju
<naresh.kamboju@linaro.org> wrote:
>
> selftests net so_txtime.sh fails intermittently on multiple boards and
> linux next and mainline.

This is a time based test, so that is definitely possible. I had to
trade off sensitivity to variance against total running time.

Current tests schedule delivery in the future at 10 msec timescale. It
succeeds if dequeue happens at programmed departure time += 2 msec of
variance (cfg_variance_us).

Note that statements of this kind are not errors and are printed every time:

> # SO_TXTIME ipv6 clock monotonic
> ipv6: clock_monotonic #
> # payloada delay452 expected0 (us)
> delay452: expected0_(us) #

This seems like an error with clockid CLOCK_TAI, which means with qdisc ETF.

> # SO_TXTIME ipv6 clock tai
> ipv6: clock_tai #
> # ./so_txtime read Resource temporarily unavailable
> read: Resource_temporarily #
> #
> : _ #
> # SO_TXTIME ipv6 clock tai
> ipv6: clock_tai #
> # ./so_txtime read Resource temporarily unavailable
> read: Resource_temporarily #

Let me check a few other runs on the dashboard, too.

> [FAIL] 24 selftests net so_txtime.sh # exit=1
> selftests: net_so_txtime.sh [FAIL]
>
> Test run full log,
> https://lkft.validation.linaro.org/scheduler/job/1010545#L1234
>
> Test results comparison link,
> https://qa-reports.linaro.org/lkft/linux-next-oe/tests/kselftest/net_so_txtime.sh
> https://qa-reports.linaro.org/lkft/linux-mainline-oe/tests/kselftest/net_so_txtime.sh

This appears to have been flaky from the start, particularly on qemu_arm.

Looking at a few runs..

failing runs exceeds bounds:
https://lkft.validation.linaro.org/scheduler/job/1006586
https://lkft.validation.linaro.org/scheduler/job/1010686
https://lkft.validation.linaro.org/scheduler/job/1010630

"
delay22049: expected20000_(us) #
# ./so_txtime exceeds variance (2000 us)
"

"
delay13700: expected10000_(us) #
# ./so_txtime exceeds variance (2000 us)
"
"
delay29722: expected20000_(us) #
# ./so_txtime exceeds variance (2000 us)
"

These are easy to suppress, by just increasing cfg_variance_us and
optionally also increasing the delivery time scale.

failing run hit the "read: Resource temporarily unavailable" on TAI,
like this report
https://lkft.validation.linaro.org/scheduler/job/1008681

It is not absence of CONFIG_NET_SCH_ETF. That is compiled in (as
module) in these runs, according to the kernel config linked in the
dashboard.

The recv call must return EAGAIN because it reaches the SO_RCVTIMEO
timeout, set to 100 msec. So the packet was lost. I don't immediately
have an explanation for this. Will try to run my own qemu-arm
instance.

Naresh, when you mention "multiple boards" are there specific
microarchitectural details of the hosts that I should take into
account aside from the qemu-arm virtualized environment itself?

passing run detects missing ETF and skips those:
https://lkft.validation.linaro.org/scheduler/job/1006511

That is peculiar, as the dashboard for that run also shows it as available.

Very nice dashboard, btw!

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: selftest/net: so_txtime.sh fails intermittently - read Resource temporarily unavailable
  2019-11-15 16:21 ` Willem de Bruijn
@ 2019-11-20  6:32   ` Naresh Kamboju
  2019-11-21  0:34     ` Willem de Bruijn
  0 siblings, 1 reply; 4+ messages in thread
From: Naresh Kamboju @ 2019-11-20  6:32 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: open list:KERNEL SELFTEST FRAMEWORK, Netdev, Shuah Khan,
	Anders Roxell, lkft-triage, David S. Miller

On Fri, 15 Nov 2019 at 21:52, Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Thu, Nov 14, 2019 at 3:47 AM Naresh Kamboju

> This appears to have been flaky from the start, particularly on qemu_arm.

This is because of emulating 2 CPU.
I am gonna change this to emulate 4 CPU for qemu_arm.

>
> Looking at a few runs..
>
> failing runs exceeds bounds:
> https://lkft.validation.linaro.org/scheduler/job/1006586
...
> delay29722: expected20000_(us) #
> # ./so_txtime exceeds variance (2000 us)
> "
> These are easy to suppress, by just increasing cfg_variance_us and
> optionally also increasing the delivery time scale.

Alright !
The variance is 2000.
static int cfg_variance_us = 2000

> Naresh, when you mention "multiple boards" are there specific
> microarchitectural details of the hosts that I should take into
> account aside from the qemu-arm virtualized environment itself?

The easy to reproduce way is running 32-bit kernel and rootfs on
x86_64 machine.

arm32 bit beagleboard x15 device.

qemu-arm command line,
qemu-system-aarch64 -cpu host,aarch64=off -machine virt-2.10,accel=kvm
-nographic -net nic,model=virtio,macaddr=BA:DD:AD:CC:09:02 -net tap -m
2048 -monitor none -kernel zImage --append "console=ttyAMA0
root=/dev/vda rw" -drive
format=raw,file=rpb-console-image-lkft-am57xx-evm-20191112073604-644.rootfs.ext4,if=virtio
-m 4096 -smp 2 -nographic

> Very nice dashboard, btw!

Thanks for your valuable feedback. Great to hear this :-)

- Naresh

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: selftest/net: so_txtime.sh fails intermittently - read Resource temporarily unavailable
  2019-11-20  6:32   ` Naresh Kamboju
@ 2019-11-21  0:34     ` Willem de Bruijn
  0 siblings, 0 replies; 4+ messages in thread
From: Willem de Bruijn @ 2019-11-21  0:34 UTC (permalink / raw)
  To: Naresh Kamboju
  Cc: open list:KERNEL SELFTEST FRAMEWORK, Netdev, Shuah Khan,
	Anders Roxell, lkft-triage, David S. Miller,
	jesus.sanchez-palencia

On Wed, Nov 20, 2019 at 1:33 AM Naresh Kamboju
<naresh.kamboju@linaro.org> wrote:
>
> On Fri, 15 Nov 2019 at 21:52, Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > On Thu, Nov 14, 2019 at 3:47 AM Naresh Kamboju
>
> > This appears to have been flaky from the start, particularly on qemu_arm.
>
> This is because of emulating 2 CPU.
> I am gonna change this to emulate 4 CPU for qemu_arm.
>
> >
> > Looking at a few runs..
> >
> > failing runs exceeds bounds:
> > https://lkft.validation.linaro.org/scheduler/job/1006586
> ...
> > delay29722: expected20000_(us) #
> > # ./so_txtime exceeds variance (2000 us)
> > "
> > These are easy to suppress, by just increasing cfg_variance_us and
> > optionally also increasing the delivery time scale.
>
> Alright !
> The variance is 2000.
> static int cfg_variance_us = 2000
>
> > Naresh, when you mention "multiple boards" are there specific
> > microarchitectural details of the hosts that I should take into
> > account aside from the qemu-arm virtualized environment itself?
>
> The easy to reproduce way is running 32-bit kernel and rootfs on
> x86_64 machine.

Thanks. As soon as I disabled kvm acceleration, it proved also easy to
reproduce on an x86_64 guest inside an x86_64 host.

> # ./so_txtime read Resource temporarily unavailable
> read: Resource_temporarily #

This occurs due to sch_etf dropping the packet on dequeue in
etf_dequeue_timesortedlist because of dequeue time is after the
scheduled delivery time.

There is some inevitable delay and jitter in scheduling the dequeue
timer. The q->delta argument to ETF enables scheduling ahead of the
deadline. Unfortunately, in this virtualized environment even the
current setting in so_txtime.sh of 200 us is proves too short. It
already seemed high to me at the time.

Doubling to 400 usec and also doubling cfg_variance_us to 4000 greatly
reduces -if not fully solves- the failure rate for me.

This type of drop is also reported through the socket error queue. To
avoid ending up with wholly meaningless time bounds, we can retry on
these known failures as long as failure rate is already low.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-11-21  0:34 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-14  8:47 selftest/net: so_txtime.sh fails intermittently - read Resource temporarily unavailable Naresh Kamboju
2019-11-15 16:21 ` Willem de Bruijn
2019-11-20  6:32   ` Naresh Kamboju
2019-11-21  0:34     ` Willem de Bruijn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).