* [sparc64] possible circular locking / deadlock
@ 2019-06-17 14:46 ` Anatoly Pugachev
  0 siblings, 0 replies; 12+ messages in thread
From: Anatoly Pugachev @ 2019-06-17 14:46 UTC (permalink / raw)
  To: Sparc kernel list, netfilter-devel; +Cc: debian-sparc, coreteam

Hello!

I'm getting the following kernel trace on boot of a current git kernel, with rc.local containing:

ipset create sshguard4 hash:net
iptables -A INPUT -p tcp --dport 22 -m set --match-set sshguard4 src -j DROP

current git kernel:

$ uname -a
Linux ttip 5.2.0-rc5 #981 SMP Mon Jun 17 09:52:04 MSK 2019 sparc64 GNU/Linux
linux-2.6$ git desc
v5.2-rc5


$ dmesg
<cut>
[   10.356388] Adding 787176k swap on /dev/vdiska4.  Priority:-2 extents:1 across:787176k FS
[   10.471900] EXT4-fs (vdiska1): mounting ext3 file system using the ext4 subsystem
[   10.487226] EXT4-fs (vdiska1): mounted filesystem with ordered data mode. Opts: (null)
[   11.158102] random: crng init done
[   11.158155] random: 7 urandom warning(s) missed due to ratelimiting

[   11.697866] ======================================================
[   11.697875] WARNING: possible circular locking dependency detected
[   11.697886] 5.2.0-rc5 #981 Not tainted
[   11.697894] ------------------------------------------------------
[   11.697902] iptables/732 is trying to acquire lock:
[   11.697913] 000000004f61aa56 (&table[i].mutex){+.+.}, at: nfnl_lock+0x24/0x40 [nfnetlink]
[   11.697937]
               but task is already holding lock:
[   11.697946] 000000000d652829 (&net->nft.commit_mutex){+.+.}, at: nf_tables_valid_genid+0x18/0x60 [nf_tables]
[   11.697973]
               which lock already depends on the new lock.

[   11.697983]
               the existing dependency chain (in reverse order) is:
[   11.697992]
               -> #1 (&net->nft.commit_mutex){+.+.}:
[   11.698012]        __mutex_lock+0x48/0x920
[   11.698021]        mutex_lock_nested+0x1c/0x40
[   11.698033]        nf_tables_valid_genid+0x18/0x60 [nf_tables]
[   11.698043]        nfnetlink_rcv_batch+0x24c/0x620 [nfnetlink]
[   11.698053]        nfnetlink_rcv+0x110/0x140 [nfnetlink]
[   11.698067]        netlink_unicast+0x12c/0x1e0
[   11.698076]        netlink_sendmsg+0x324/0x360
[   11.698091]        sock_sendmsg+0x34/0x80
[   11.698099]        ___sys_sendmsg+0x228/0x240
[   11.698108]        __sys_sendmsg+0x4c/0x80
[   11.698116]        sys_sendmsg+0x18/0x40
[   11.698131]        linux_sparc_syscall+0x34/0x44
[   11.698138]
               -> #0 (&table[i].mutex){+.+.}:
[   11.698157]        lock_acquire+0x1a4/0x1c0
[   11.698165]        __mutex_lock+0x48/0x920
[   11.698173]        mutex_lock_nested+0x1c/0x40
[   11.698181]        nfnl_lock+0x24/0x40 [nfnetlink]
[   11.698196]        ip_set_nfnl_get_byindex+0x19c/0x280 [ip_set]
[   11.698207]        set_match_v1_checkentry+0x14/0xc0 [xt_set]
[   11.698222]        xt_check_match+0x238/0x260 [x_tables]
[   11.698234]        __nft_match_init+0x160/0x180 [nft_compat]
[   11.698244]        nft_match_init+0x18/0x40 [nft_compat]
[   11.698256]        nf_tables_newrule+0x57c/0x7a0 [nf_tables]
[   11.698266]        nfnetlink_rcv_batch+0x3f8/0x620 [nfnetlink]
[   11.698275]        nfnetlink_rcv+0x110/0x140 [nfnetlink]
[   11.698284]        netlink_unicast+0x12c/0x1e0
[   11.698292]        netlink_sendmsg+0x324/0x360
[   11.698300]        sock_sendmsg+0x34/0x80
[   11.698309]        ___sys_sendmsg+0x228/0x240
[   11.698317]        __sys_sendmsg+0x4c/0x80
[   11.698325]        sys_sendmsg+0x18/0x40
[   11.698334]        linux_sparc_syscall+0x34/0x44
[   11.698340]
               other info that might help us debug this:

[   11.698351]  Possible unsafe locking scenario:

[   11.698359]        CPU0                    CPU1
[   11.698366]        ----                    ----
[   11.698372]   lock(&net->nft.commit_mutex);
[   11.698381]                                lock(&table[i].mutex);
[   11.698390]                                lock(&net->nft.commit_mutex);
[   11.698400]   lock(&table[i].mutex);
[   11.698408]
                *** DEADLOCK ***

[   11.698418] 1 lock held by iptables/732:
[   11.698424]  #0: 000000000d652829 (&net->nft.commit_mutex){+.+.}, at: nf_tables_valid_genid+0x18/0x60 [nf_tables]
[   11.698444]
               stack backtrace:
[   11.698454] CPU: 6 PID: 732 Comm: iptables Not tainted 5.2.0-rc5 #981
[   11.698463] Call Trace:
[   11.698471]  [00000000004cfde0] print_circular_bug+0x2e0/0x320
[   11.698480]  [00000000004d4bd8] __lock_acquire+0x1d38/0x2900
[   11.698489]  [00000000004d6084] lock_acquire+0x1a4/0x1c0
[   11.698498]  [0000000000a06508] __mutex_lock+0x48/0x920
[   11.698506]  [0000000000a06dfc] mutex_lock_nested+0x1c/0x40
[   11.698516]  [000000001071c024] nfnl_lock+0x24/0x40 [nfnetlink]
[   11.698527]  [00000000107568dc] ip_set_nfnl_get_byindex+0x19c/0x280 [ip_set]
[   11.698537]  [000000001078e5d4] set_match_v1_checkentry+0x14/0xc0 [xt_set]
[   11.698549]  [0000000010310ed8] xt_check_match+0x238/0x260 [x_tables]
[   11.698559]  [000000001077cc00] __nft_match_init+0x160/0x180 [nft_compat]
[   11.698569]  [000000001077ccb8] nft_match_init+0x18/0x40 [nft_compat]
[   11.698582]  [0000000010731c3c] nf_tables_newrule+0x57c/0x7a0 [nf_tables]
[   11.698592]  [000000001071d238] nfnetlink_rcv_batch+0x3f8/0x620 [nfnetlink]
[   11.698602]  [000000001071d570] nfnetlink_rcv+0x110/0x140 [nfnetlink]
[   11.698611]  [000000000093e82c] netlink_unicast+0x12c/0x1e0
[   11.698620]  [000000000093f484] netlink_sendmsg+0x324/0x360



The full kernel configuration file, as well as the full dmesg output, is
available at https://github.com/mator/sparc64-dmesg/

system info:

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/sparc64-linux-gnu/8/lto-wrapper
Target: sparc64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian
8.3.0-7' --with-bugurl=file:///usr/share/doc/gcc-8/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++ --prefix=/usr
--with-gcc-major-version-only --program-suffix=-8
--program-prefix=sparc64-linux-gnu- --enable-shared
--enable-linker-build-id --libexecdir=/usr/lib
--without-included-gettext --enable-threads=posix --libdir=/usr/lib
--enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new
--enable-gnu-unique-object --disable-libquadmath
--disable-libquadmath-support --enable-plugin --enable-default-pie
--with-system-zlib --disable-libphobos --enable-objc-gc=auto
--enable-multiarch --disable-werror --with-cpu-32=ultrasparc
--enable-targets=all --with-long-double-128 --enable-multilib
--enable-checking=release --build=sparc64-linux-gnu
--host=sparc64-linux-gnu --target=sparc64-linux-gnu
Thread model: posix
gcc version 8.3.0 (Debian 8.3.0-7)

# ldconfig -V
ldconfig (Debian GLIBC 2.28-10) 2.28

# ld -V
GNU ld (GNU Binutils for Debian) 2.31.1

PS: I wasn't able to trace which kernel version introduced this
possible deadlock, but I tried versions from the top git tag v5.2-rc1
down to kernel version 4.13.

* Re: [netfilter-core] [sparc64] possible circular locking / deadlock
  2019-06-17 14:46 ` Anatoly Pugachev
@ 2019-06-17 15:02   ` Jozsef Kadlecsik
  -1 siblings, 0 replies; 12+ messages in thread
From: Jozsef Kadlecsik @ 2019-06-17 15:02 UTC (permalink / raw)
  To: Anatoly Pugachev
  Cc: Sparc kernel list, netfilter-devel, coreteam, debian-sparc

Hi,

On Mon, 17 Jun 2019, Anatoly Pugachev wrote:

> Getting the following git kernel trace on boot with rc.local having :
> 
> ipset create sshguard4 hash:net
> iptables -A INPUT -p tcp --dport 22 -m set --match-set sshguard4 src -j DROP

Despite the "iptables" command name, this must be the nftables compat backend.
 
> current git kernel:
> 
> $ uname -a
> Linux ttip 5.2.0-rc5 #981 SMP Mon Jun 17 09:52:04 MSK 2019 sparc64 GNU/Linux
> linux-2.6$ git desc
> v5.2-rc5
> 
> 
> $ dmesg
> <cut>
> [   10.356388] Adding 787176k swap on /dev/vdiska4.  Priority:-2
> extents:1 across:787176k FS
> [   10.471900] EXT4-fs (vdiska1): mounting ext3 file system using the
> ext4 subsystem
> [   10.487226] EXT4-fs (vdiska1): mounted filesystem with ordered data
> mode. Opts: (null)
> [   11.158102] random: crng init done
> [   11.158155] random: 7 urandom warning(s) missed due to ratelimiting
> 
> [   11.697866] ======================================================
> [   11.697875] WARNING: possible circular locking dependency detected
> [   11.697886] 5.2.0-rc5 #981 Not tainted
> [   11.697894] ------------------------------------------------------
> [   11.697902] iptables/732 is trying to acquire lock:
> [   11.697913] 000000004f61aa56 (&table[i].mutex){+.+.}, at:
> nfnl_lock+0x24/0x40 [nfnetlink]
> [   11.697937]
>                but task is already holding lock:
> [   11.697946] 000000000d652829 (&net->nft.commit_mutex){+.+.}, at:
> nf_tables_valid_genid+0x18/0x60 [nf_tables]
> [   11.697973]
>                which lock already depends on the new lock.
> 
> [   11.697983]
>                the existing dependency chain (in reverse order) is:
> [   11.697992]
>                -> #1 (&net->nft.commit_mutex){+.+.}:
> [   11.698012]        __mutex_lock+0x48/0x920
> [   11.698021]        mutex_lock_nested+0x1c/0x40
> [   11.698033]        nf_tables_valid_genid+0x18/0x60 [nf_tables]
> [   11.698043]        nfnetlink_rcv_batch+0x24c/0x620 [nfnetlink]
> [   11.698053]        nfnetlink_rcv+0x110/0x140 [nfnetlink]
> [   11.698067]        netlink_unicast+0x12c/0x1e0
> [   11.698076]        netlink_sendmsg+0x324/0x360
> [   11.698091]        sock_sendmsg+0x34/0x80
> [   11.698099]        ___sys_sendmsg+0x228/0x240
> [   11.698108]        __sys_sendmsg+0x4c/0x80
> [   11.698116]        sys_sendmsg+0x18/0x40
> [   11.698131]        linux_sparc_syscall+0x34/0x44
> [   11.698138]
>                -> #0 (&table[i].mutex){+.+.}:
> [   11.698157]        lock_acquire+0x1a4/0x1c0
> [   11.698165]        __mutex_lock+0x48/0x920
> [   11.698173]        mutex_lock_nested+0x1c/0x40
> [   11.698181]        nfnl_lock+0x24/0x40 [nfnetlink]
> [   11.698196]        ip_set_nfnl_get_byindex+0x19c/0x280 [ip_set]
> [   11.698207]        set_match_v1_checkentry+0x14/0xc0 [xt_set]

set_match_v1_checkentry() from ipset has always assumed that it is called
via the old xtables/setsockopt interface. Thus it calls
ip_set_nfnl_get_byindex(), which then calls
nfnl_lock(NFNL_SUBSYS_IPSET). That is where the circular dependency comes from.
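
For readers unfamiliar with how a lock validator arrives at this kind of
warning, here is a minimal userspace sketch in Python. It is a toy model and
not lockdep's actual implementation: the LockOrderGraph class and its
acquire() method are hypothetical names made up for illustration. It records
"B was taken while A was held" edges and refuses an acquisition that would
close a cycle, using the two lock names from the report.

```python
from collections import defaultdict


class LockOrderGraph:
    """Toy lock-order validator (hypothetical; not how lockdep is coded)."""

    def __init__(self):
        # edges[a] holds every lock that was acquired while `a` was held.
        self.edges = defaultdict(set)

    def _reachable(self, start, target):
        """Return True if `target` can be reached from `start` via recorded edges."""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges[node])
        return False

    def acquire(self, held, new):
        """Record `new` being taken while `held` is held.

        Returns False, recording nothing, if the new edge would close a cycle.
        """
        if self._reachable(new, held):
            return False
        self.edges[held].add(new)
        return True


NFNL = "&table[i].mutex"
COMMIT = "&net->nft.commit_mutex"

graph = LockOrderGraph()
# Order already recorded by the validator (the CPU1 column of the report).
first = graph.acquire(held=NFNL, new=COMMIT)
# Order attempted by iptables/732 (the CPU0 column of the report).
second = graph.acquire(held=COMMIT, new=NFNL)
print(first, second)
```

The first call is accepted because no ordering exists yet. The second is
rejected because commit_mutex is already reachable from the nfnl mutex, which
is the situation the "possible circular locking dependency" warning describes.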

I suppose the only solution is to check whether the mutex is already held
or not. Until I send the patch, the only way to avoid the issue is to use
the old legacy xtables interface.

Best regards,
Jozsef

> [   11.698222]        xt_check_match+0x238/0x260 [x_tables]
> [   11.698234]        __nft_match_init+0x160/0x180 [nft_compat]
> [   11.698244]        nft_match_init+0x18/0x40 [nft_compat]
> [   11.698256]        nf_tables_newrule+0x57c/0x7a0 [nf_tables]
> [   11.698266]        nfnetlink_rcv_batch+0x3f8/0x620 [nfnetlink]
> [   11.698275]        nfnetlink_rcv+0x110/0x140 [nfnetlink]
> [   11.698284]        netlink_unicast+0x12c/0x1e0
> [   11.698292]        netlink_sendmsg+0x324/0x360
> [   11.698300]        sock_sendmsg+0x34/0x80
> [   11.698309]        ___sys_sendmsg+0x228/0x240
> [   11.698317]        __sys_sendmsg+0x4c/0x80
> [   11.698325]        sys_sendmsg+0x18/0x40
> [   11.698334]        linux_sparc_syscall+0x34/0x44
> [   11.698340]
>                other info that might help us debug this:
> 
> [   11.698351]  Possible unsafe locking scenario:
> 
> [   11.698359]        CPU0                    CPU1
> [   11.698366]        ----                    ----
> [   11.698372]   lock(&net->nft.commit_mutex);
> [   11.698381]                                lock(&table[i].mutex);
> [   11.698390]                                lock(&net->nft.commit_mutex);
> [   11.698400]   lock(&table[i].mutex);
> [   11.698408]
>                 *** DEADLOCK ***
> 
> [   11.698418] 1 lock held by iptables/732:
> [   11.698424]  #0: 000000000d652829 (&net->nft.commit_mutex){+.+.},
> at: nf_tables_valid_genid+0x18/0x60 [nf_tables]
> [   11.698444]
>                stack backtrace:
> [   11.698454] CPU: 6 PID: 732 Comm: iptables Not tainted 5.2.0-rc5 #981
> [   11.698463] Call Trace:
> [   11.698471]  [00000000004cfde0] print_circular_bug+0x2e0/0x320
> [   11.698480]  [00000000004d4bd8] __lock_acquire+0x1d38/0x2900
> [   11.698489]  [00000000004d6084] lock_acquire+0x1a4/0x1c0
> [   11.698498]  [0000000000a06508] __mutex_lock+0x48/0x920
> [   11.698506]  [0000000000a06dfc] mutex_lock_nested+0x1c/0x40
> [   11.698516]  [000000001071c024] nfnl_lock+0x24/0x40 [nfnetlink]
> [   11.698527]  [00000000107568dc] ip_set_nfnl_get_byindex+0x19c/0x280 [ip_set]
> [   11.698537]  [000000001078e5d4] set_match_v1_checkentry+0x14/0xc0 [xt_set]
> [   11.698549]  [0000000010310ed8] xt_check_match+0x238/0x260 [x_tables]
> [   11.698559]  [000000001077cc00] __nft_match_init+0x160/0x180 [nft_compat]
> [   11.698569]  [000000001077ccb8] nft_match_init+0x18/0x40 [nft_compat]
> [   11.698582]  [0000000010731c3c] nf_tables_newrule+0x57c/0x7a0 [nf_tables]
> [   11.698592]  [000000001071d238] nfnetlink_rcv_batch+0x3f8/0x620 [nfnetlink]
> [   11.698602]  [000000001071d570] nfnetlink_rcv+0x110/0x140 [nfnetlink]
> [   11.698611]  [000000000093e82c] netlink_unicast+0x12c/0x1e0
> [   11.698620]  [000000000093f484] netlink_sendmsg+0x324/0x360
> 
> 
> 
> The full kernel configuration file, as well as the full dmesg output, is
> available at https://github.com/mator/sparc64-dmesg/
> 
> system info:
> 
> $ gcc -v
> Using built-in specs.
> COLLECT_GCC=gcc
> COLLECT_LTO_WRAPPER=/usr/lib/gcc/sparc64-linux-gnu/8/lto-wrapper
> Target: sparc64-linux-gnu
> Configured with: ../src/configure -v --with-pkgversion='Debian
> 8.3.0-7' --with-bugurl=file:///usr/share/doc/gcc-8/README.Bugs
> --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++ --prefix=/usr
> --with-gcc-major-version-only --program-suffix=-8
> --program-prefix=sparc64-linux-gnu- --enable-shared
> --enable-linker-build-id --libexecdir=/usr/lib
> --without-included-gettext --enable-threads=posix --libdir=/usr/lib
> --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
> --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new
> --enable-gnu-unique-object --disable-libquadmath
> --disable-libquadmath-support --enable-plugin --enable-default-pie
> --with-system-zlib --disable-libphobos --enable-objc-gc=auto
> --enable-multiarch --disable-werror --with-cpu-32=ultrasparc
> --enable-targets=all --with-long-double-128 --enable-multilib
> --enable-checking=release --build=sparc64-linux-gnu
> --host=sparc64-linux-gnu --target=sparc64-linux-gnu
> Thread model: posix
> gcc version 8.3.0 (Debian 8.3.0-7)
> 
> # ldconfig -V
> ldconfig (Debian GLIBC 2.28-10) 2.28
> 
> # ld -V
> GNU ld (GNU Binutils for Debian) 2.31.1
> 
> PS: i wasn't able to trace which kernel version introduced this
> possible deadlock... but tried (from top git tag v5.2-rc1 to bottom)
> up to 4.13 kernel version...
> 
> 

-
E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
          H-1525 Budapest 114, POB. 49, Hungary

* Re: [netfilter-core] [sparc64] possible circular locking / deadlock
  2019-06-17 15:02   ` Jozsef Kadlecsik
@ 2019-06-17 15:06     ` Pablo Neira Ayuso
  -1 siblings, 0 replies; 12+ messages in thread
From: Pablo Neira Ayuso @ 2019-06-17 15:06 UTC (permalink / raw)
  To: Jozsef Kadlecsik
  Cc: Anatoly Pugachev, Sparc kernel list, netfilter-devel, coreteam,
	debian-sparc

Hi Jozsef,

On Mon, Jun 17, 2019 at 05:02:51PM +0200, Jozsef Kadlecsik wrote:
> Hi,
> 
> On Mon, 17 Jun 2019, Anatoly Pugachev wrote:
> 
> > Getting the following git kernel trace on boot with rc.local having :
> > 
> > ipset create sshguard4 hash:net
> > iptables -A INPUT -p tcp --dport 22 -m set --match-set sshguard4 src -j DROP
> 
> In spite of "iptables", it must be the nftables compat backend.
>  
> > current git kernel:
> > 
> > $ uname -a
> > Linux ttip 5.2.0-rc5 #981 SMP Mon Jun 17 09:52:04 MSK 2019 sparc64 GNU/Linux
> > linux-2.6$ git desc
> > v5.2-rc5
> > 
> > 
> > $ dmesg
> > <cut>
> > [   10.356388] Adding 787176k swap on /dev/vdiska4.  Priority:-2
> > extents:1 across:787176k FS
> > [   10.471900] EXT4-fs (vdiska1): mounting ext3 file system using the
> > ext4 subsystem
> > [   10.487226] EXT4-fs (vdiska1): mounted filesystem with ordered data
> > mode. Opts: (null)
> > [   11.158102] random: crng init done
> > [   11.158155] random: 7 urandom warning(s) missed due to ratelimiting
> > 
> > [   11.697866] ======================================================
> > [   11.697875] WARNING: possible circular locking dependency detected
> > [   11.697886] 5.2.0-rc5 #981 Not tainted
> > [   11.697894] ------------------------------------------------------
> > [   11.697902] iptables/732 is trying to acquire lock:
> > [   11.697913] 000000004f61aa56 (&table[i].mutex){+.+.}, at:
> > nfnl_lock+0x24/0x40 [nfnetlink]
> > [   11.697937]
> >                but task is already holding lock:
> > [   11.697946] 000000000d652829 (&net->nft.commit_mutex){+.+.}, at:
> > nf_tables_valid_genid+0x18/0x60 [nf_tables]
> > [   11.697973]
> >                which lock already depends on the new lock.
> > 
> > [   11.697983]
> >                the existing dependency chain (in reverse order) is:
> > [   11.697992]
> >                -> #1 (&net->nft.commit_mutex){+.+.}:
> > [   11.698012]        __mutex_lock+0x48/0x920
> > [   11.698021]        mutex_lock_nested+0x1c/0x40
> > [   11.698033]        nf_tables_valid_genid+0x18/0x60 [nf_tables]
> > [   11.698043]        nfnetlink_rcv_batch+0x24c/0x620 [nfnetlink]
> > [   11.698053]        nfnetlink_rcv+0x110/0x140 [nfnetlink]
> > [   11.698067]        netlink_unicast+0x12c/0x1e0
> > [   11.698076]        netlink_sendmsg+0x324/0x360
> > [   11.698091]        sock_sendmsg+0x34/0x80
> > [   11.698099]        ___sys_sendmsg+0x228/0x240
> > [   11.698108]        __sys_sendmsg+0x4c/0x80
> > [   11.698116]        sys_sendmsg+0x18/0x40
> > [   11.698131]        linux_sparc_syscall+0x34/0x44
> > [   11.698138]
> >                -> #0 (&table[i].mutex){+.+.}:
> > [   11.698157]        lock_acquire+0x1a4/0x1c0
> > [   11.698165]        __mutex_lock+0x48/0x920
> > [   11.698173]        mutex_lock_nested+0x1c/0x40
> > [   11.698181]        nfnl_lock+0x24/0x40 [nfnetlink]
> > [   11.698196]        ip_set_nfnl_get_byindex+0x19c/0x280 [ip_set]
> > [   11.698207]        set_match_v1_checkentry+0x14/0xc0 [xt_set]
> 
> set_match_v1_checkentry() from ipset always assumed that it's called via 
> the old xtables/setsockopt interface. Thus it calls 
> ip_set_nfnl_get_byindex() which then calls 
> nfnl_lock(NFNL_SUBSYS_IPSET). Here comes the circular dependency.
> 
> I suppose the only solution is to check whether the mutex is already held 
> or not. Until I send the patch, the only way to avoid the issue is to use 
> the old legacy xtables interface.

There's par->nft_compat in xt_tgchk_param that allows you to know if
you are in the context of the xt over nft infrastructure.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [netfilter-core] [sparc64] possible circular locking / deadlock
  2019-06-17 15:06     ` Pablo Neira Ayuso
@ 2019-06-17 18:17       ` Jozsef Kadlecsik
  -1 siblings, 0 replies; 12+ messages in thread
From: Jozsef Kadlecsik @ 2019-06-17 18:17 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Anatoly Pugachev, Sparc kernel list, netfilter-devel, coreteam,
	debian-sparc

Hi Pablo,

On Mon, 17 Jun 2019, Pablo Neira Ayuso wrote:

> On Mon, Jun 17, 2019 at 05:02:51PM +0200, Jozsef Kadlecsik wrote:
> > Hi,
> > 
> > On Mon, 17 Jun 2019, Anatoly Pugachev wrote:
> > 
> > > Getting the following git kernel trace on boot with rc.local having :
> > > 
> > > ipset create sshguard4 hash:net
> > > iptables -A INPUT -p tcp --dport 22 -m set --match-set sshguard4 src -j DROP
> > 
> > In spite of "iptables", it must be the nftables compat backend.
> >  
> > > current git kernel:
> > > 
> > > $ uname -a
> > > Linux ttip 5.2.0-rc5 #981 SMP Mon Jun 17 09:52:04 MSK 2019 sparc64 GNU/Linux
> > > linux-2.6$ git desc
> > > v5.2-rc5
> > > 
> > > 
> > > $ dmesg
> > > <cut>
> > > [   10.356388] Adding 787176k swap on /dev/vdiska4.  Priority:-2
> > > extents:1 across:787176k FS
> > > [   10.471900] EXT4-fs (vdiska1): mounting ext3 file system using the
> > > ext4 subsystem
> > > [   10.487226] EXT4-fs (vdiska1): mounted filesystem with ordered data
> > > mode. Opts: (null)
> > > [   11.158102] random: crng init done
> > > [   11.158155] random: 7 urandom warning(s) missed due to ratelimiting
> > > 
> > > [   11.697866] ======================================================
> > > [   11.697875] WARNING: possible circular locking dependency detected
> > > [   11.697886] 5.2.0-rc5 #981 Not tainted
> > > [   11.697894] ------------------------------------------------------
> > > [   11.697902] iptables/732 is trying to acquire lock:
> > > [   11.697913] 000000004f61aa56 (&table[i].mutex){+.+.}, at:
> > > nfnl_lock+0x24/0x40 [nfnetlink]
> > > [   11.697937]
> > >                but task is already holding lock:
> > > [   11.697946] 000000000d652829 (&net->nft.commit_mutex){+.+.}, at:
> > > nf_tables_valid_genid+0x18/0x60 [nf_tables]
> > > [   11.697973]
> > >                which lock already depends on the new lock.
> > > 
> > > [   11.697983]
> > >                the existing dependency chain (in reverse order) is:
> > > [   11.697992]
> > >                -> #1 (&net->nft.commit_mutex){+.+.}:
> > > [   11.698012]        __mutex_lock+0x48/0x920
> > > [   11.698021]        mutex_lock_nested+0x1c/0x40
> > > [   11.698033]        nf_tables_valid_genid+0x18/0x60 [nf_tables]
> > > [   11.698043]        nfnetlink_rcv_batch+0x24c/0x620 [nfnetlink]
> > > [   11.698053]        nfnetlink_rcv+0x110/0x140 [nfnetlink]
> > > [   11.698067]        netlink_unicast+0x12c/0x1e0
> > > [   11.698076]        netlink_sendmsg+0x324/0x360
> > > [   11.698091]        sock_sendmsg+0x34/0x80
> > > [   11.698099]        ___sys_sendmsg+0x228/0x240
> > > [   11.698108]        __sys_sendmsg+0x4c/0x80
> > > [   11.698116]        sys_sendmsg+0x18/0x40
> > > [   11.698131]        linux_sparc_syscall+0x34/0x44
> > > [   11.698138]
> > >                -> #0 (&table[i].mutex){+.+.}:
> > > [   11.698157]        lock_acquire+0x1a4/0x1c0
> > > [   11.698165]        __mutex_lock+0x48/0x920
> > > [   11.698173]        mutex_lock_nested+0x1c/0x40
> > > [   11.698181]        nfnl_lock+0x24/0x40 [nfnetlink]
> > > [   11.698196]        ip_set_nfnl_get_byindex+0x19c/0x280 [ip_set]
> > > [   11.698207]        set_match_v1_checkentry+0x14/0xc0 [xt_set]
> > 
> > set_match_v1_checkentry() from ipset always assumed that it's called via 
> > the old xtables/setsockopt interface. Thus it calls 
> > ip_set_nfnl_get_byindex() which then calls 
> > nfnl_lock(NFNL_SUBSYS_IPSET). Here comes the circular dependency.
> > 
> > I suppose the only solution is to check whether the mutex is already held 
> > or not. Until I send the patch, the only way to avoid the issue is to use 
> > the old legacy xtables interface.
> 
> There's par->nft_compat in xt_tgchk_param that allows you to know if you 
> are in the context of the xt over nft infrastructure.

Great, thank you! That's better than checking the mutex!

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [netfilter-core] [sparc64] possible circular locking / deadlock
  2019-06-17 15:02   ` Jozsef Kadlecsik
@ 2019-06-17 20:11     ` Florian Westphal
  -1 siblings, 0 replies; 12+ messages in thread
From: Florian Westphal @ 2019-06-17 20:11 UTC (permalink / raw)
  To: Jozsef Kadlecsik
  Cc: Anatoly Pugachev, Sparc kernel list, netfilter-devel, coreteam,
	debian-sparc

Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> wrote:
> >                -> #0 (&table[i].mutex){+.+.}:
> > [   11.698157]        lock_acquire+0x1a4/0x1c0
> > [   11.698165]        __mutex_lock+0x48/0x920
> > [   11.698173]        mutex_lock_nested+0x1c/0x40
> > [   11.698181]        nfnl_lock+0x24/0x40 [nfnetlink]
> > [   11.698196]        ip_set_nfnl_get_byindex+0x19c/0x280 [ip_set]
> > [   11.698207]        set_match_v1_checkentry+0x14/0xc0 [xt_set]
> 
> set_match_v1_checkentry() from ipset always assumed that it's called via 
> the old xtables/setsockopt interface. Thus it calls 
> ip_set_nfnl_get_byindex() which then calls 
> nfnl_lock(NFNL_SUBSYS_IPSET). Here comes the circular dependency.

But isn't it a false positive?

> > [   11.698359]        CPU0                    CPU1
> > [   11.698366]        ----                    ----
> > [   11.698372]   lock(&net->nft.commit_mutex);
> > [   11.698381]                                lock(&table[i].mutex);
> > [   11.698390]                                lock(&net->nft.commit_mutex);
> > [   11.698400]   lock(&table[i].mutex);
> > [   11.698408]

AFAICS CPU0 takes the ipset subsys mutex after taking the nftables
transaction mutex (via checkentry of ipset match), while CPU1 took the
nftables subsys mutex and then the nftables transaction mutex.

The only reason why this splat is generated is because nftables and
ipset subsys mutexes are currently the same from lockdep pov.

It looks like we need to extend nfnetlink to place the subsystem mutexes
in different lockdep classes.
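
To illustrate the point about lockdep classes, here is a minimal user-space
sketch. It is a toy model, not kernel code and not lockdep's real
implementation: it only records "lock class B was taken while class A was
held" edges and reports when a new edge would close a cycle. All names in it
(dep_graph, acquire_while_holding, replay, the CLS_* constants) are
hypothetical and exist only for this illustration. The order of the two
replayed paths follows the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model of a class-level lock ordering check (hypothetical, user space). */
#define MAX_CLASSES 8

struct dep_graph {
	/* edge[a][b] is true if class b was acquired while class a was held */
	bool edge[MAX_CLASSES][MAX_CLASSES];
};

static void graph_init(struct dep_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Depth-first search: can 'to' be reached from 'from' via recorded edges? */
static bool reachable(const struct dep_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < MAX_CLASSES; next++)
		if (g->edge[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	return false;
}

/*
 * Record that class 'taken' is acquired while class 'held' is held.
 * Returns true if 'held' is already reachable from 'taken', i.e. the
 * new dependency would close a cycle and a warning would be printed.
 */
static bool acquire_while_holding(struct dep_graph *g, int held, int taken)
{
	bool seen[MAX_CLASSES] = { false };

	assert(held >= 0 && held < MAX_CLASSES);
	assert(taken >= 0 && taken < MAX_CLASSES);
	if (reachable(g, taken, held, seen))
		return true;
	g->edge[held][taken] = true;
	return false;
}

/* Assumed class ids: one shared subsys class versus one class per subsystem. */
enum { CLS_COMMIT, CLS_SUBSYS_SHARED, CLS_SUBSYS_NFT, CLS_SUBSYS_IPSET };

/* Replays both paths from the report; returns true if a cycle is flagged. */
static bool replay(int nft_subsys_class, int ipset_subsys_class)
{
	struct dep_graph g;
	bool warn = false;

	graph_init(&g);
	/* nfnetlink batch path: nftables subsys mutex, then commit_mutex */
	warn |= acquire_while_holding(&g, nft_subsys_class, CLS_COMMIT);
	/* xt_set checkentry path: commit_mutex, then ipset subsys mutex */
	warn |= acquire_while_holding(&g, CLS_COMMIT, ipset_subsys_class);
	return warn;
}
```

In this model, replay(CLS_SUBSYS_SHARED, CLS_SUBSYS_SHARED) flags a cycle,
which matches the splat, while replay(CLS_SUBSYS_NFT, CLS_SUBSYS_IPSET) does
not, because the two subsystem mutexes no longer collapse into one node of
the dependency graph.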

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [netfilter-core] [sparc64] possible circular locking / deadlock
  2019-06-17 20:11     ` Florian Westphal
@ 2019-06-17 20:32       ` Jozsef Kadlecsik
  -1 siblings, 0 replies; 12+ messages in thread
From: Jozsef Kadlecsik @ 2019-06-17 20:32 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Anatoly Pugachev, Sparc kernel list, netfilter-devel, coreteam,
	debian-sparc

Hi Florian,

On Mon, 17 Jun 2019, Florian Westphal wrote:

> Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> wrote:
> > >                -> #0 (&table[i].mutex){+.+.}:
> > > [   11.698157]        lock_acquire+0x1a4/0x1c0
> > > [   11.698165]        __mutex_lock+0x48/0x920
> > > [   11.698173]        mutex_lock_nested+0x1c/0x40
> > > [   11.698181]        nfnl_lock+0x24/0x40 [nfnetlink]
> > > [   11.698196]        ip_set_nfnl_get_byindex+0x19c/0x280 [ip_set]
> > > [   11.698207]        set_match_v1_checkentry+0x14/0xc0 [xt_set]
> > 
> > set_match_v1_checkentry() from ipset always assumed that it's called via 
> > the old xtables/setsockopt interface. Thus it calls 
> > ip_set_nfnl_get_byindex() which then calls 
> > nfnl_lock(NFNL_SUBSYS_IPSET). Here comes the circular dependency.
> 
> But isn't it a false positive?
> 
> > > [   11.698359]        CPU0                    CPU1
> > > [   11.698366]        ----                    ----
> > > [   11.698372]   lock(&net->nft.commit_mutex);
> > > [   11.698381]                                lock(&table[i].mutex);
> > > [   11.698390]                                lock(&net->nft.commit_mutex);
> > > [   11.698400]   lock(&table[i].mutex);
> > > [   11.698408]
> 
> AFAICS CPU0 takes the ipset subsys mutex after taking the nftables 
> transaction mutex (via checkentry of ipset match), while CPU1 took the 
> nftables subsys mutex and then the nftables transaction mutex.
> 
> The only reason why this splat is generated is because nftables and 
> ipset subsys mutexes are currently the same from lockdep pov.
> 
> It looks like we need to extend nfnetlink to place the subsystem mutexes 
> in different lockdep classes.

That would be nicer! 

Otherwise I'd need "struct xt_mtdtor_param" and "struct xt_tgdtor_param" 
to be extended with "bool nft_compat" to handle all required calls from 
the ip_set module.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-06-17 20:32 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-17 14:46 [sparc64] possible circular locking / deadlock Anatoly Pugachev
2019-06-17 14:46 ` Anatoly Pugachev
2019-06-17 15:02 ` [netfilter-core] " Jozsef Kadlecsik
2019-06-17 15:02   ` Jozsef Kadlecsik
2019-06-17 15:06   ` Pablo Neira Ayuso
2019-06-17 15:06     ` Pablo Neira Ayuso
2019-06-17 18:17     ` Jozsef Kadlecsik
2019-06-17 18:17       ` Jozsef Kadlecsik
2019-06-17 20:11   ` Florian Westphal
2019-06-17 20:11     ` Florian Westphal
2019-06-17 20:32     ` Jozsef Kadlecsik
2019-06-17 20:32       ` Jozsef Kadlecsik
