From: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
To: netdev@vger.kernel.org, linux-kselftest@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Shuah Khan <shuah@kernel.org>, Eric Dumazet <edumazet@google.com>,
	linux-kernel@vger.kernel.org
Subject: Re: BUG: selftest/net/tun: Hang in unregister_netdevice
Date: Tue, 14 Mar 2023 17:00:47 +0100
Message-ID: <27769d34-521c-f0ef-b6c2-6bd452e4f9bf@alu.unizg.hr>
In-Reply-To: <d7a64812-73db-feb2-e6d6-e1d8c09a6fed@alu.unizg.hr>

On 3/14/23 14:52, Mirsad Todorovac wrote:
> On 3/14/23 12:45, Mirsad Todorovac wrote:
>> Hi, all!
>>
>> After running tools/testing/selftests/net/tun, there seems to be some kind of hang
>> in test "FAIL  tun.reattach_delete_close" or "FAIL  tun.reattach_close_delete".
>>
>> Two tests exit by timeout, but the processes left are unkillable, even with kill -9 PID:
>>
>> [root@pc-mtodorov linux_torvalds]# ps -ef | grep tun
>> root        1140       1  0 12:16 ?        00:00:00 /bin/bash /usr/sbin/ksmtuned
>> root        1333       1  0 12:16 ?        00:00:01 /usr/libexec/platform-python -Es /usr/sbin/tuned -l -P
>> root        3930    2309  0 12:20 pts/1    00:00:00 tools/testing/selftests/net/tun
>> root        3952    2309  0 12:21 pts/1    00:00:00 tools/testing/selftests/net/tun
>> root        4056    3765  0 12:25 pts/1    00:00:00 grep --color=auto tun
>> [root@pc-mtodorov linux_torvalds]# kill -9 3930 3952
>> [root@pc-mtodorov linux_torvalds]# ps -ef | grep tun
>> root        1140       1  0 12:16 ?        00:00:00 /bin/bash /usr/sbin/ksmtuned
>> root        1333       1  0 12:16 ?        00:00:01 /usr/libexec/platform-python -Es /usr/sbin/tuned -l -P
>> root        3930    2309  0 12:20 pts/1    00:00:00 tools/testing/selftests/net/tun
>> root        3952    2309  0 12:21 pts/1    00:00:00 tools/testing/selftests/net/tun
>> root        4060    3765  0 12:25 pts/1    00:00:00 grep --color=auto tun
>> [root@pc-mtodorov linux_torvalds]#
>>
>> The kernel seems to be stuck in a loop, filling the log with the following
>> message about every 10 seconds until reboot. The reboot itself also waits a
>> very long time for the situation to time out, which apparently never happens.
>>
>> Mar 14 11:54:09 pc-mtodorov kernel: unregister_netdevice: waiting for tap0 to become free. Usage count = 3
>> Mar 14 11:54:19 pc-mtodorov kernel: unregister_netdevice: waiting for tap0 to become free. Usage count = 3
>> Mar 14 11:54:29 pc-mtodorov kernel: unregister_netdevice: waiting for tap0 to become free. Usage count = 3
>> Mar 14 11:54:40 pc-mtodorov kernel: unregister_netdevice: waiting for tap0 to become free. Usage count = 3
>> Mar 14 11:54:50 pc-mtodorov kernel: unregister_netdevice: waiting for tap0 to become free. Usage count = 3
>>
>> The platform is kernel 6.3.0-rc2 on AlmaLinux 8.7 and a LENOVO_MT_10TX_BU_Lenovo_FM_V530S-07ICB
>> (lshw output attached).
>>
>> The .config is here:
>>
>> https://domac.alu.hr/~mtodorov/linux/selftests/net-tun/config-6.3.0-rc2-mg-andy-devres-00006-gfc89d7fb499b
>>
>> Basically, it is a vanilla Torvalds tree kernel with MGLRU, KMEMLEAK, and CONFIG_DEBUG_KOBJECT enabled,
>> plus a devres patch.
>>
>> Please find the strace of the net/tun run attached.
>>
>> I am available for additional diagnostics.
> 
> Hi, again!
> 
> I've been busy while waiting for a reply, so I wondered how a vanilla kernel
> would fare in the test, considering that I've been testing a number of patches
> lately.
> 
> I did a fresh git clone from the repo and, whoa.
> 
> Surprisingly, the test with CONFIG_DEBUG_KOBJECT turned off passes:
> 
> [root@pc-mtodorov linux_torvalds]# tools/testing/selftests/net/tun
> TAP version 13
> 1..5
> # Starting 5 tests from 1 test cases.
> #  RUN           tun.delete_detach_close ...
> #            OK  tun.delete_detach_close
> ok 1 tun.delete_detach_close
> #  RUN           tun.detach_delete_close ...
> #            OK  tun.detach_delete_close
> ok 2 tun.detach_delete_close
> #  RUN           tun.detach_close_delete ...
> #            OK  tun.detach_close_delete
> ok 3 tun.detach_close_delete
> #  RUN           tun.reattach_delete_close ...
> #            OK  tun.reattach_delete_close
> ok 4 tun.reattach_delete_close
> #  RUN           tun.reattach_close_delete ...
> #            OK  tun.reattach_close_delete
> ok 5 tun.reattach_close_delete
> # PASSED: 5 / 5 tests passed.
> # Totals: pass:5 fail:0 xfail:0 xpass:0 skip:0 error:0
> [root@pc-mtodorov linux_torvalds]#
> 
> So, no hanging processes that cannot be killed now.
> 
> If you think it is worth exploring the lockup that occurs with
> CONFIG_DEBUG_KOBJECT=y, I will rebuild once again with these options turned on,
> to clear any doubts.

Confirmed.

With the sole difference of:

[marvin@pc-mtodorov linux_torvalds]$ grep KOBJECT /boot/config-6.3.0-rc2-vanilla-00006-gfc89d7fb499b
CONFIG_DEBUG_KOBJECT=y
CONFIG_DEBUG_KOBJECT_RELEASE=y
# CONFIG_SAMPLE_KOBJECT is not set
[marvin@pc-mtodorov linux_torvalds]$

we again get unkillable processes:

[root@pc-mtodorov linux_torvalds]# ps -ef | grep tun
root        1157       1  0 16:44 ?        00:00:00 /bin/bash /usr/sbin/ksmtuned
root        1331       1  0 16:44 ?        00:00:01 /usr/libexec/platform-python -Es /usr/sbin/tuned -l -P
root        3479    2315  0 16:45 pts/1    00:00:00 tools/testing/selftests/net/tun
root        3512    2315  0 16:45 pts/1    00:00:00 tools/testing/selftests/net/tun
root        4091    3364  0 16:49 pts/1    00:00:00 grep --color=auto tun
[root@pc-mtodorov linux_torvalds]# kill -9 3479 3512
[root@pc-mtodorov linux_torvalds]# ps -ef | grep tun
root        1157       1  0 16:44 ?        00:00:00 /bin/bash /usr/sbin/ksmtuned
root        1331       1  0 16:44 ?        00:00:01 /usr/libexec/platform-python -Es /usr/sbin/tuned -l -P
root        3479    2315  0 16:45 pts/1    00:00:00 tools/testing/selftests/net/tun
root        3512    2315  0 16:45 pts/1    00:00:00 tools/testing/selftests/net/tun
root        4095    3364  0 16:50 pts/1    00:00:00 grep --color=auto tun
[root@pc-mtodorov linux_torvalds]#
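
A process that survives kill -9 is typically sitting in uninterruptible sleep
(state D) inside the kernel. The state of PIDs 3479 and 3512 was not captured
above, so this is an assumption. A minimal Python sketch for checking it,
assuming the usual "State:" line in /proc/<pid>/status; task_state() and
is_uninterruptible() are hypothetical helper names, not part of the selftest:

```python
from pathlib import Path


def task_state(status_text: str) -> str:
    """Return the one-letter state code from /proc/<pid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            # The line looks like "State:\tD (disk sleep)".
            return line.split(":", 1)[1].split()[0]
    raise ValueError("no State: line found")


def is_uninterruptible(pid: int) -> bool:
    """True if the task is currently reported in state D."""
    text = Path(f"/proc/{pid}/status").read_text()
    return task_state(text) == "D"


if __name__ == "__main__":
    # Example content only; on the affected machine one would call
    # is_uninterruptible(3479) and is_uninterruptible(3512) instead.
    example = "Name:\ttun\nState:\tD (disk sleep)\nTgid:\t3479\n"
    print(task_state(example))
```

If the tasks do turn out to be in state D, reading /proc/<pid>/stack as root
(where the kernel provides it) would show where they are blocked.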

The kernel command line (/proc/cmdline) may also be relevant:

[root@pc-mtodorov linux_torvalds]# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt5)/vmlinuz-6.3.0-rc2-vanilla-00006-gfc89d7fb499b root=/dev/mapper/almalinux_desktop--mtodorov-root ro 
crashkernel=auto resume=/dev/mapper/almalinux_desktop--mtodorov-swap rd.lvm.lv=almalinux_desktop-mtodorov/root 
rd.lvm.lv=almalinux_desktop-mtodorov/swap loglevel=7 i915.alpha_support=1 debug devres.log=1
[root@pc-mtodorov linux_torvalds]#

After a while, the kernel messages start looping:

  kernel:unregister_netdevice: waiting for tap0 to become free. Usage count = 3

Message from syslogd@pc-mtodorov at Mar 14 16:57:15 ...
  kernel:unregister_netdevice: waiting for tap0 to become free. Usage count = 3

Message from syslogd@pc-mtodorov at Mar 14 16:57:24 ...
  kernel:unregister_netdevice: waiting for tap0 to become free. Usage count = 3

Message from syslogd@pc-mtodorov at Mar 14 16:57:26 ...
  kernel:unregister_netdevice: waiting for tap0 to become free. Usage count = 3

This hangs the processes until a very late stage of shutdown.
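
To summarize such a log, the repeated warnings can be grouped by device and
usage count. A minimal Python sketch, assuming the message format shown above;
stuck_devices() is a hypothetical helper name:

```python
import re
from collections import Counter

# Matches the warning text quoted above, with or without a syslog prefix.
PATTERN = re.compile(
    r"unregister_netdevice: waiting for (?P<dev>\S+) to become free\. "
    r"Usage count = (?P<count>\d+)"
)


def stuck_devices(log_lines):
    """Map (device, usage count) to the number of times it was logged."""
    seen = Counter()
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            seen[(match.group("dev"), int(match.group("count")))] += 1
    return seen


if __name__ == "__main__":
    sample = [
        "  kernel:unregister_netdevice: waiting for tap0 to become free. Usage count = 3",
        "Message from syslogd@pc-mtodorov at Mar 14 16:57:15 ...",
        "  kernel:unregister_netdevice: waiting for tap0 to become free. Usage count = 3",
    ]
    for (dev, count), hits in stuck_devices(sample).items():
        print(f"{dev}: usage count {count}, logged {hits} times")
```

A usage count that stays at the same value across all messages, as it does
here, would be consistent with a reference that is never dropped rather than
with a slow teardown, but that is only a guess at this point.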

I can confirm that CONFIG_DEBUG_{KOBJECT,KOBJECT_RELEASE}=y were the only changes
to .config in between builds.
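
The claim above can be checked mechanically by comparing the two config files.
A minimal Python sketch, assuming the usual .config syntax ("CONFIG_X=y" and
"# CONFIG_X is not set"); parse_config() and config_diff() are hypothetical
names, and the "old" sample text is an assumed stand-in for the config of the
passing build, which is not shown in this message:

```python
NOT_SET = " is not set"


def parse_config(text):
    """Return {option: value}; options marked 'is not set' map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(NOT_SET):
            options[line[2:-len(NOT_SET)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def config_diff(old_text, new_text):
    """Return {option: (old value, new value)} for options that differ.

    An option missing from one file is treated as 'n' (an assumption).
    """
    old, new = parse_config(old_text), parse_config(new_text)
    return {
        name: (old.get(name, "n"), new.get(name, "n"))
        for name in sorted(old.keys() | new.keys())
        if old.get(name, "n") != new.get(name, "n")
    }


if __name__ == "__main__":
    # Assumed content for the passing build (not taken from the report).
    old = "# CONFIG_DEBUG_KOBJECT is not set\n# CONFIG_SAMPLE_KOBJECT is not set\n"
    # Content matching the grep output of the failing build above.
    new = (
        "CONFIG_DEBUG_KOBJECT=y\n"
        "CONFIG_DEBUG_KOBJECT_RELEASE=y\n"
        "# CONFIG_SAMPLE_KOBJECT is not set\n"
    )
    for name, (before, after) in config_diff(old, new).items():
        print(f"{name}: {before} -> {after}")
```

The kernel tree's scripts/diffconfig serves the same purpose on real files.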

Best regards,
Mirsad

-- 
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu

System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
