* [PATCH v2 net] ipv4: update fib_info_cnt under spinlock protection
@ 2022-01-16 9:02 Eric Dumazet
2022-01-16 9:06 ` Ido Schimmel
` (3 more replies)
0 siblings, 4 replies; 7+ messages in thread
From: Eric Dumazet @ 2022-01-16 9:02 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski
Cc: netdev, Eric Dumazet, Eric Dumazet, syzbot, David Laight,
Ido Schimmel, Jiri Pirko
From: Eric Dumazet <edumazet@google.com>
In the past, free_fib_info() was supposed to be called
under RTNL protection.
This eventually was no longer the case.
Instead of enforcing RTNL it seems we simply can
move fib_info_cnt changes to occur when fib_info_lock
is held.
v2: David Laight suggested to update fib_info_cnt
only when an entry is added/deleted to/from the hash table,
as fib_info_cnt is used to make sure hash table size
is optimal.
BUG: KCSAN: data-race in fib_create_info / free_fib_info
write to 0xffffffff86e243a0 of 4 bytes by task 26429 on cpu 0:
fib_create_info+0xe78/0x3440 net/ipv4/fib_semantics.c:1428
fib_table_insert+0x148/0x10c0 net/ipv4/fib_trie.c:1224
fib_magic+0x195/0x1e0 net/ipv4/fib_frontend.c:1087
fib_add_ifaddr+0xd0/0x2e0 net/ipv4/fib_frontend.c:1109
fib_netdev_event+0x178/0x510 net/ipv4/fib_frontend.c:1466
notifier_call_chain kernel/notifier.c:83 [inline]
raw_notifier_call_chain+0x53/0xb0 kernel/notifier.c:391
__dev_notify_flags+0x1d3/0x3b0
dev_change_flags+0xa2/0xc0 net/core/dev.c:8872
do_setlink+0x810/0x2410 net/core/rtnetlink.c:2719
rtnl_group_changelink net/core/rtnetlink.c:3242 [inline]
__rtnl_newlink net/core/rtnetlink.c:3396 [inline]
rtnl_newlink+0xb10/0x13b0 net/core/rtnetlink.c:3506
rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5571
netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2496
rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5589
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x5fc/0x6c0 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x726/0x840 net/netlink/af_netlink.c:1921
sock_sendmsg_nosec net/socket.c:704 [inline]
sock_sendmsg net/socket.c:724 [inline]
____sys_sendmsg+0x39a/0x510 net/socket.c:2409
___sys_sendmsg net/socket.c:2463 [inline]
__sys_sendmsg+0x195/0x230 net/socket.c:2492
__do_sys_sendmsg net/socket.c:2501 [inline]
__se_sys_sendmsg net/socket.c:2499 [inline]
__x64_sys_sendmsg+0x42/0x50 net/socket.c:2499
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffffffff86e243a0 of 4 bytes by task 31505 on cpu 1:
free_fib_info+0x35/0x80 net/ipv4/fib_semantics.c:252
fib_info_put include/net/ip_fib.h:575 [inline]
nsim_fib4_rt_destroy drivers/net/netdevsim/fib.c:294 [inline]
nsim_fib4_rt_replace drivers/net/netdevsim/fib.c:403 [inline]
nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:431 [inline]
nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline]
nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline]
nsim_fib_event_work+0x15ca/0x2cf0 drivers/net/netdevsim/fib.c:1477
process_one_work+0x3fc/0x980 kernel/workqueue.c:2298
process_scheduled_works kernel/workqueue.c:2361 [inline]
worker_thread+0x7df/0xa70 kernel/workqueue.c:2447
kthread+0x2c7/0x2e0 kernel/kthread.c:327
ret_from_fork+0x1f/0x30
value changed: 0x00000d2d -> 0x00000d2e
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 31505 Comm: kworker/1:21 Not tainted 5.16.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events nsim_fib_event_work
Fixes: 48bb9eb47b27 ("netdevsim: fib: Add dummy implementation for FIB offload")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Ido Schimmel <idosch@mellanox.com>
Cc: Jiri Pirko <jiri@mellanox.com>
---
net/ipv4/fib_semantics.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 828de171708f599b56f63715514c0259c7cb08a2..45619c005b8dddd7ccd5c7029efa4ed69b6ce1de 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -249,7 +249,6 @@ void free_fib_info(struct fib_info *fi)
pr_warn("Freeing alive fib_info %p\n", fi);
return;
}
- fib_info_cnt--;
call_rcu(&fi->rcu, free_fib_info_rcu);
}
@@ -260,6 +259,10 @@ void fib_release_info(struct fib_info *fi)
spin_lock_bh(&fib_info_lock);
if (fi && refcount_dec_and_test(&fi->fib_treeref)) {
hlist_del(&fi->fib_hash);
+
+ /* Paired with READ_ONCE() in fib_create_info(). */
+ WRITE_ONCE(fib_info_cnt, fib_info_cnt - 1);
+
if (fi->fib_prefsrc)
hlist_del(&fi->fib_lhash);
if (fi->nh) {
@@ -1430,7 +1433,9 @@ struct fib_info *fib_create_info(struct fib_config *cfg,
#endif
err = -ENOBUFS;
- if (fib_info_cnt >= fib_info_hash_size) {
+
+ /* Paired with WRITE_ONCE() in fib_release_info() */
+ if (READ_ONCE(fib_info_cnt) >= fib_info_hash_size) {
unsigned int new_size = fib_info_hash_size << 1;
struct hlist_head *new_info_hash;
struct hlist_head *new_laddrhash;
@@ -1462,7 +1467,6 @@ struct fib_info *fib_create_info(struct fib_config *cfg,
return ERR_PTR(err);
}
- fib_info_cnt++;
fi->fib_net = net;
fi->fib_protocol = cfg->fc_protocol;
fi->fib_scope = cfg->fc_scope;
@@ -1591,6 +1595,7 @@ struct fib_info *fib_create_info(struct fib_config *cfg,
refcount_set(&fi->fib_treeref, 1);
refcount_set(&fi->fib_clntref, 1);
spin_lock_bh(&fib_info_lock);
+ fib_info_cnt++;
hlist_add_head(&fi->fib_hash,
&fib_info_hash[fib_info_hashfn(fi)]);
if (fi->fib_prefsrc) {
--
2.34.1.703.g22d0c6ccf7-goog
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v2 net] ipv4: update fib_info_cnt under spinlock protection
2022-01-16 9:02 [PATCH v2 net] ipv4: update fib_info_cnt under spinlock protection Eric Dumazet
@ 2022-01-16 9:06 ` Ido Schimmel
2022-01-16 12:30 ` patchwork-bot+netdevbpf
` (2 subsequent siblings)
3 siblings, 0 replies; 7+ messages in thread
From: Ido Schimmel @ 2022-01-16 9:06 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, netdev, Eric Dumazet, syzbot,
David Laight, Ido Schimmel, Jiri Pirko
On Sun, Jan 16, 2022 at 01:02:20AM -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> In the past, free_fib_info() was supposed to be called
> under RTNL protection.
>
> This eventually was no longer the case.
>
> Instead of enforcing RTNL it seems we simply can
> move fib_info_cnt changes to occur when fib_info_lock
> is held.
>
> v2: David Laight suggested to update fib_info_cnt
> only when an entry is added/deleted to/from the hash table,
> as fib_info_cnt is used to make sure hash table size
> is optimal.
[...]
> Fixes: 48bb9eb47b27 ("netdevsim: fib: Add dummy implementation for FIB offload")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: syzbot <syzkaller@googlegroups.com>
> Cc: David Laight <David.Laight@ACULAB.COM>
> Cc: Ido Schimmel <idosch@mellanox.com>
> Cc: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2 net] ipv4: update fib_info_cnt under spinlock protection
2022-01-16 9:02 [PATCH v2 net] ipv4: update fib_info_cnt under spinlock protection Eric Dumazet
2022-01-16 9:06 ` Ido Schimmel
@ 2022-01-16 12:30 ` patchwork-bot+netdevbpf
2022-01-17 3:23 ` David Laight
2022-01-17 9:20 ` David Laight
3 siblings, 0 replies; 7+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-01-16 12:30 UTC (permalink / raw)
To: Eric Dumazet
Cc: davem, kuba, netdev, edumazet, syzkaller, David.Laight, idosch, jiri
Hello:
This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:
On Sun, 16 Jan 2022 01:02:20 -0800 you wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> In the past, free_fib_info() was supposed to be called
> under RTNL protection.
>
> This eventually was no longer the case.
>
> [...]
Here is the summary with links:
- [v2,net] ipv4: update fib_info_cnt under spinlock protection
https://git.kernel.org/netdev/net/c/0a6e6b3c7db6
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [PATCH v2 net] ipv4: update fib_info_cnt under spinlock protection
2022-01-16 9:02 [PATCH v2 net] ipv4: update fib_info_cnt under spinlock protection Eric Dumazet
2022-01-16 9:06 ` Ido Schimmel
2022-01-16 12:30 ` patchwork-bot+netdevbpf
@ 2022-01-17 3:23 ` David Laight
2022-01-17 9:48 ` Eric Dumazet
2022-01-17 9:20 ` David Laight
3 siblings, 1 reply; 7+ messages in thread
From: David Laight @ 2022-01-17 3:23 UTC (permalink / raw)
To: 'Eric Dumazet', David S . Miller, Jakub Kicinski
Cc: netdev, Eric Dumazet, syzbot, Ido Schimmel, Jiri Pirko
From: Eric Dumazet
> Sent: 16 January 2022 09:02
>
> In the past, free_fib_info() was supposed to be called
> under RTNL protection.
>
> This eventually was no longer the case.
>
> Instead of enforcing RTNL it seems we simply can
> move fib_info_cnt changes to occur when fib_info_lock
> is held.
>
> v2: David Laight suggested to update fib_info_cnt
> only when an entry is added/deleted to/from the hash table,
> as fib_info_cnt is used to make sure hash table size
> is optimal.
Already applied, but
acked-by: David Laight
...
If you are going to add READ_ONCE() markers then one on
'fib_info_hash_size' would be much more appropriate since
this value is used twice.
> err = -ENOBUFS;
> - if (fib_info_cnt >= fib_info_hash_size) {
> +
> + /* Paired with WRITE_ONCE() in fib_release_info() */
> + if (READ_ONCE(fib_info_cnt) >= fib_info_hash_size) {
> unsigned int new_size = fib_info_hash_size << 1;
> struct hlist_head *new_info_hash;
> struct hlist_head *new_laddrhash;
> @@ -1462,7 +1467,6 @@ struct fib_info *fib_create_info(struct fib_config *cfg,
If is also possible for two (or many) threads to decide to
increase the hash table size at the same time.
The code that moves the items to the new hash tables should
probably discard the new tables is they aren't larger than
the existing ones.
The copy does look safe - just a waste of time.
It is also technically possible (but very unlikely) that the table
will get shrunk!
It will grow again on the next allocate.
But this is a different bug.
I also though Linus said that the WRITE_ONCE() weren't needed
here because the kernel basically assumes the compiler isn't
stupid enough to do 'write tearing' on word sized items
(or just write zero before every write).
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [PATCH v2 net] ipv4: update fib_info_cnt under spinlock protection
2022-01-16 9:02 [PATCH v2 net] ipv4: update fib_info_cnt under spinlock protection Eric Dumazet
` (2 preceding siblings ...)
2022-01-17 3:23 ` David Laight
@ 2022-01-17 9:20 ` David Laight
2022-01-17 9:45 ` Eric Dumazet
3 siblings, 1 reply; 7+ messages in thread
From: David Laight @ 2022-01-17 9:20 UTC (permalink / raw)
To: 'Eric Dumazet', David S . Miller, Jakub Kicinski
Cc: netdev, Eric Dumazet, syzbot, Ido Schimmel, Jiri Pirko
From: Eric Dumazet
> Sent: 16 January 2022 09:02
>
> From: Eric Dumazet <edumazet@google.com>
>
> In the past, free_fib_info() was supposed to be called
> under RTNL protection.
>
> This eventually was no longer the case.
>
> Instead of enforcing RTNL it seems we simply can
> move fib_info_cnt changes to occur when fib_info_lock
> is held.
This probably ought to be a stable candidate.
If an increment is lost due to the unlocked inc/dec it is
possible for the count to wrap to -1 if all fib are deleted.
That will cause the table size to be doubled on every
allocate (until the count goes +ve again).
Losing a decrement is less of a problem.
You'd need to lose a lot of them for any ill effect.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2 net] ipv4: update fib_info_cnt under spinlock protection
2022-01-17 9:20 ` David Laight
@ 2022-01-17 9:45 ` Eric Dumazet
0 siblings, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2022-01-17 9:45 UTC (permalink / raw)
To: David Laight
Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev, syzbot,
Ido Schimmel, Jiri Pirko
On Mon, Jan 17, 2022 at 1:20 AM David Laight <David.Laight@aculab.com> wrote:
> This probably ought to be a stable candidate.
>
> If an increment is lost due to the unlocked inc/dec it is
> possible for the count to wrap to -1 if all fib are deleted.
> That will cause the table size to be doubled on every
> allocate (until the count goes +ve again).
>
> Losing a decrement is less of a problem.
> You'd need to lose a lot of them for any ill effect.
>
You are rephrasing the syzbot report, and the Fixes: tag I added in
the changelog.
hint: netdev maintainers handle the stable backports based on the
Fixes: tag, and their
own judgement.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2 net] ipv4: update fib_info_cnt under spinlock protection
2022-01-17 3:23 ` David Laight
@ 2022-01-17 9:48 ` Eric Dumazet
0 siblings, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2022-01-17 9:48 UTC (permalink / raw)
To: David Laight
Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev, syzbot,
Ido Schimmel, Jiri Pirko
On Sun, Jan 16, 2022 at 7:23 PM David Laight <David.Laight@aculab.com> wrote:
>
> From: Eric Dumazet
> > Sent: 16 January 2022 09:02
> >
> > In the past, free_fib_info() was supposed to be called
> > under RTNL protection.
> >
> > This eventually was no longer the case.
> >
> > Instead of enforcing RTNL it seems we simply can
> > move fib_info_cnt changes to occur when fib_info_lock
> > is held.
> >
> > v2: David Laight suggested to update fib_info_cnt
> > only when an entry is added/deleted to/from the hash table,
> > as fib_info_cnt is used to make sure hash table size
> > is optimal.
>
> Already applied, but
> acked-by: David Laight
>
> ...
> If you are going to add READ_ONCE() markers then one on
> 'fib_info_hash_size' would be much more appropriate since
> this value is used twice.
>
> > err = -ENOBUFS;
> > - if (fib_info_cnt >= fib_info_hash_size) {
> > +
> > + /* Paired with WRITE_ONCE() in fib_release_info() */
> > + if (READ_ONCE(fib_info_cnt) >= fib_info_hash_size) {
> > unsigned int new_size = fib_info_hash_size << 1;
> > struct hlist_head *new_info_hash;
> > struct hlist_head *new_laddrhash;
> > @@ -1462,7 +1467,6 @@ struct fib_info *fib_create_info(struct fib_config *cfg,
>
> If is also possible for two (or many) threads to decide to
> increase the hash table size at the same time.
>
> The code that moves the items to the new hash tables should
> probably discard the new tables is they aren't larger than
> the existing ones.
> The copy does look safe - just a waste of time.
>
> It is also technically possible (but very unlikely) that the table
> will get shrunk!
> It will grow again on the next allocate.
>
> But this is a different bug.
>
There is no bug.
fib_create_info() is called with RTNL held.
> I also though Linus said that the WRITE_ONCE() weren't needed
> here because the kernel basically assumes the compiler isn't
> stupid enough to do 'write tearing' on word sized items
> (or just write zero before every write).
>
That is not true. READ_ONCE()/WRITE_ONCE() have their own purpose,
we can not assume a compiler will follow arbitrary rules about word sizes.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2022-01-17 9:48 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-16 9:02 [PATCH v2 net] ipv4: update fib_info_cnt under spinlock protection Eric Dumazet
2022-01-16 9:06 ` Ido Schimmel
2022-01-16 12:30 ` patchwork-bot+netdevbpf
2022-01-17 3:23 ` David Laight
2022-01-17 9:48 ` Eric Dumazet
2022-01-17 9:20 ` David Laight
2022-01-17 9:45 ` Eric Dumazet
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.