* [PATCH net-next 0/6] inet: no longer use RTNL to protect inet_dump_ifaddr()
@ 2024-02-29 11:40 Eric Dumazet
2024-02-29 11:40 ` [PATCH net-next 1/6] inet: annotate data-races around ifa->ifa_tstamp and ifa->ifa_cstamp Eric Dumazet
` (7 more replies)
0 siblings, 8 replies; 14+ messages in thread
From: Eric Dumazet @ 2024-02-29 11:40 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Jiri Pirko, David Ahern, netdev, Florian Westphal, eric.dumazet,
Eric Dumazet
This series convert inet so that a dump of addresses (ip -4 addr)
no longer requires RTNL.
Eric Dumazet (6):
inet: annotate data-races around ifa->ifa_tstamp and ifa->ifa_cstamp
inet: annotate data-races around ifa->ifa_valid_lft
inet: annotate data-races around ifa->ifa_preferred_lft
inet: annotate data-races around ifa->ifa_flags
inet: prepare inet_base_seq() to run without RTNL
inet: use xa_array iterator to implement inet_dump_ifaddr()
net/core/dev.c | 5 +-
net/ipv4/devinet.c | 166 +++++++++++++++++++++------------------------
2 files changed, 79 insertions(+), 92 deletions(-)
--
2.44.0.278.ge034bb2e1d-goog
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH net-next 1/6] inet: annotate data-races around ifa->ifa_tstamp and ifa->ifa_cstamp
2024-02-29 11:40 [PATCH net-next 0/6] inet: no longer use RTNL to protect inet_dump_ifaddr() Eric Dumazet
@ 2024-02-29 11:40 ` Eric Dumazet
2024-02-29 11:40 ` [PATCH net-next 2/6] inet: annotate data-races around ifa->ifa_valid_lft Eric Dumazet
` (6 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Eric Dumazet @ 2024-02-29 11:40 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Jiri Pirko, David Ahern, netdev, Florian Westphal, eric.dumazet,
Eric Dumazet
ifa->ifa_tstamp can be read locklessly.
Add appropriate READ_ONCE()/WRITE_ONCE() annotations.
Do the same for ifa->ifa_cstamp to prepare upcoming changes.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/ipv4/devinet.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index af741af61830aeb695e7e75608515547dade8f39..1316046d5f28955376091d9e02ab4594e19fbd09 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -713,13 +713,14 @@ static void check_lifetime(struct work_struct *work)
rcu_read_lock();
hlist_for_each_entry_rcu(ifa, &inet_addr_lst[i], hash) {
- unsigned long age;
+ unsigned long age, tstamp;
if (ifa->ifa_flags & IFA_F_PERMANENT)
continue;
+ tstamp = READ_ONCE(ifa->ifa_tstamp);
/* We try to batch several events at once. */
- age = (now - ifa->ifa_tstamp +
+ age = (now - tstamp +
ADDRCONF_TIMER_FUZZ_MINUS) / HZ;
if (ifa->ifa_valid_lft != INFINITY_LIFE_TIME &&
@@ -729,17 +730,17 @@ static void check_lifetime(struct work_struct *work)
INFINITY_LIFE_TIME) {
continue;
} else if (age >= ifa->ifa_preferred_lft) {
- if (time_before(ifa->ifa_tstamp +
+ if (time_before(tstamp +
ifa->ifa_valid_lft * HZ, next))
- next = ifa->ifa_tstamp +
+ next = tstamp +
ifa->ifa_valid_lft * HZ;
if (!(ifa->ifa_flags & IFA_F_DEPRECATED))
change_needed = true;
- } else if (time_before(ifa->ifa_tstamp +
+ } else if (time_before(tstamp +
ifa->ifa_preferred_lft * HZ,
next)) {
- next = ifa->ifa_tstamp +
+ next = tstamp +
ifa->ifa_preferred_lft * HZ;
}
}
@@ -819,9 +820,9 @@ static void set_ifa_lifetime(struct in_ifaddr *ifa, __u32 valid_lft,
ifa->ifa_flags |= IFA_F_DEPRECATED;
ifa->ifa_preferred_lft = timeout;
}
- ifa->ifa_tstamp = jiffies;
+ WRITE_ONCE(ifa->ifa_tstamp, jiffies);
if (!ifa->ifa_cstamp)
- ifa->ifa_cstamp = ifa->ifa_tstamp;
+ WRITE_ONCE(ifa->ifa_cstamp, ifa->ifa_tstamp);
}
static struct in_ifaddr *rtm_to_ifaddr(struct net *net, struct nlmsghdr *nlh,
@@ -1676,6 +1677,7 @@ static int inet_fill_ifaddr(struct sk_buff *skb, struct in_ifaddr *ifa,
{
struct ifaddrmsg *ifm;
struct nlmsghdr *nlh;
+ unsigned long tstamp;
u32 preferred, valid;
nlh = nlmsg_put(skb, args->portid, args->seq, args->event, sizeof(*ifm),
@@ -1694,11 +1696,12 @@ static int inet_fill_ifaddr(struct sk_buff *skb, struct in_ifaddr *ifa,
nla_put_s32(skb, IFA_TARGET_NETNSID, args->netnsid))
goto nla_put_failure;
+ tstamp = READ_ONCE(ifa->ifa_tstamp);
if (!(ifm->ifa_flags & IFA_F_PERMANENT)) {
preferred = ifa->ifa_preferred_lft;
valid = ifa->ifa_valid_lft;
if (preferred != INFINITY_LIFE_TIME) {
- long tval = (jiffies - ifa->ifa_tstamp) / HZ;
+ long tval = (jiffies - tstamp) / HZ;
if (preferred > tval)
preferred -= tval;
@@ -1728,7 +1731,7 @@ static int inet_fill_ifaddr(struct sk_buff *skb, struct in_ifaddr *ifa,
nla_put_u32(skb, IFA_FLAGS, ifa->ifa_flags) ||
(ifa->ifa_rt_priority &&
nla_put_u32(skb, IFA_RT_PRIORITY, ifa->ifa_rt_priority)) ||
- put_cacheinfo(skb, ifa->ifa_cstamp, ifa->ifa_tstamp,
+ put_cacheinfo(skb, READ_ONCE(ifa->ifa_cstamp), tstamp,
preferred, valid))
goto nla_put_failure;
--
2.44.0.278.ge034bb2e1d-goog
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH net-next 2/6] inet: annotate data-races around ifa->ifa_valid_lft
2024-02-29 11:40 [PATCH net-next 0/6] inet: no longer use RTNL to protect inet_dump_ifaddr() Eric Dumazet
2024-02-29 11:40 ` [PATCH net-next 1/6] inet: annotate data-races around ifa->ifa_tstamp and ifa->ifa_cstamp Eric Dumazet
@ 2024-02-29 11:40 ` Eric Dumazet
2024-02-29 11:40 ` [PATCH net-next 3/6] inet: annotate data-races around ifa->ifa_preferred_lft Eric Dumazet
` (5 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Eric Dumazet @ 2024-02-29 11:40 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Jiri Pirko, David Ahern, netdev, Florian Westphal, eric.dumazet,
Eric Dumazet
ifa->ifa_valid_lft can be read locklessly.
Add appropriate READ_ONCE()/WRITE_ONCE() annotations.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/ipv4/devinet.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 1316046d5f28955376091d9e02ab4594e19fbd09..99f3e7c57d36ff028edadd4efd66d996ddc5d9b4 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -714,26 +714,26 @@ static void check_lifetime(struct work_struct *work)
rcu_read_lock();
hlist_for_each_entry_rcu(ifa, &inet_addr_lst[i], hash) {
unsigned long age, tstamp;
+ u32 valid_lft;
if (ifa->ifa_flags & IFA_F_PERMANENT)
continue;
+ valid_lft = READ_ONCE(ifa->ifa_valid_lft);
tstamp = READ_ONCE(ifa->ifa_tstamp);
/* We try to batch several events at once. */
age = (now - tstamp +
ADDRCONF_TIMER_FUZZ_MINUS) / HZ;
- if (ifa->ifa_valid_lft != INFINITY_LIFE_TIME &&
- age >= ifa->ifa_valid_lft) {
+ if (valid_lft != INFINITY_LIFE_TIME &&
+ age >= valid_lft) {
change_needed = true;
} else if (ifa->ifa_preferred_lft ==
INFINITY_LIFE_TIME) {
continue;
} else if (age >= ifa->ifa_preferred_lft) {
- if (time_before(tstamp +
- ifa->ifa_valid_lft * HZ, next))
- next = tstamp +
- ifa->ifa_valid_lft * HZ;
+ if (time_before(tstamp + valid_lft * HZ, next))
+ next = tstamp + valid_lft * HZ;
if (!(ifa->ifa_flags & IFA_F_DEPRECATED))
change_needed = true;
@@ -810,7 +810,7 @@ static void set_ifa_lifetime(struct in_ifaddr *ifa, __u32 valid_lft,
timeout = addrconf_timeout_fixup(valid_lft, HZ);
if (addrconf_finite_timeout(timeout))
- ifa->ifa_valid_lft = timeout;
+ WRITE_ONCE(ifa->ifa_valid_lft, timeout);
else
ifa->ifa_flags |= IFA_F_PERMANENT;
@@ -1699,7 +1699,7 @@ static int inet_fill_ifaddr(struct sk_buff *skb, struct in_ifaddr *ifa,
tstamp = READ_ONCE(ifa->ifa_tstamp);
if (!(ifm->ifa_flags & IFA_F_PERMANENT)) {
preferred = ifa->ifa_preferred_lft;
- valid = ifa->ifa_valid_lft;
+ valid = READ_ONCE(ifa->ifa_valid_lft);
if (preferred != INFINITY_LIFE_TIME) {
long tval = (jiffies - tstamp) / HZ;
--
2.44.0.278.ge034bb2e1d-goog
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH net-next 3/6] inet: annotate data-races around ifa->ifa_preferred_lft
2024-02-29 11:40 [PATCH net-next 0/6] inet: no longer use RTNL to protect inet_dump_ifaddr() Eric Dumazet
2024-02-29 11:40 ` [PATCH net-next 1/6] inet: annotate data-races around ifa->ifa_tstamp and ifa->ifa_cstamp Eric Dumazet
2024-02-29 11:40 ` [PATCH net-next 2/6] inet: annotate data-races around ifa->ifa_valid_lft Eric Dumazet
@ 2024-02-29 11:40 ` Eric Dumazet
2024-02-29 11:40 ` [PATCH net-next 4/6] inet: annotate data-races around ifa->ifa_flags Eric Dumazet
` (4 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Eric Dumazet @ 2024-02-29 11:40 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Jiri Pirko, David Ahern, netdev, Florian Westphal, eric.dumazet,
Eric Dumazet
ifa->ifa_preferred_lft can be read locklessly.
Add appropriate READ_ONCE()/WRITE_ONCE() annotations.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/ipv4/devinet.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 99f3e7c57d36ff028edadd4efd66d996ddc5d9b4..368fb56e1f1b2e3b7888f611a3f113f3bd5fc2af 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -714,11 +714,13 @@ static void check_lifetime(struct work_struct *work)
rcu_read_lock();
hlist_for_each_entry_rcu(ifa, &inet_addr_lst[i], hash) {
unsigned long age, tstamp;
+ u32 preferred_lft;
u32 valid_lft;
if (ifa->ifa_flags & IFA_F_PERMANENT)
continue;
+ preferred_lft = READ_ONCE(ifa->ifa_preferred_lft);
valid_lft = READ_ONCE(ifa->ifa_valid_lft);
tstamp = READ_ONCE(ifa->ifa_tstamp);
/* We try to batch several events at once. */
@@ -728,20 +730,18 @@ static void check_lifetime(struct work_struct *work)
if (valid_lft != INFINITY_LIFE_TIME &&
age >= valid_lft) {
change_needed = true;
- } else if (ifa->ifa_preferred_lft ==
+ } else if (preferred_lft ==
INFINITY_LIFE_TIME) {
continue;
- } else if (age >= ifa->ifa_preferred_lft) {
+ } else if (age >= preferred_lft) {
if (time_before(tstamp + valid_lft * HZ, next))
next = tstamp + valid_lft * HZ;
if (!(ifa->ifa_flags & IFA_F_DEPRECATED))
change_needed = true;
- } else if (time_before(tstamp +
- ifa->ifa_preferred_lft * HZ,
+ } else if (time_before(tstamp + preferred_lft * HZ,
next)) {
- next = tstamp +
- ifa->ifa_preferred_lft * HZ;
+ next = tstamp + preferred_lft * HZ;
}
}
rcu_read_unlock();
@@ -818,7 +818,7 @@ static void set_ifa_lifetime(struct in_ifaddr *ifa, __u32 valid_lft,
if (addrconf_finite_timeout(timeout)) {
if (timeout == 0)
ifa->ifa_flags |= IFA_F_DEPRECATED;
- ifa->ifa_preferred_lft = timeout;
+ WRITE_ONCE(ifa->ifa_preferred_lft, timeout);
}
WRITE_ONCE(ifa->ifa_tstamp, jiffies);
if (!ifa->ifa_cstamp)
@@ -1698,7 +1698,7 @@ static int inet_fill_ifaddr(struct sk_buff *skb, struct in_ifaddr *ifa,
tstamp = READ_ONCE(ifa->ifa_tstamp);
if (!(ifm->ifa_flags & IFA_F_PERMANENT)) {
- preferred = ifa->ifa_preferred_lft;
+ preferred = READ_ONCE(ifa->ifa_preferred_lft);
valid = READ_ONCE(ifa->ifa_valid_lft);
if (preferred != INFINITY_LIFE_TIME) {
long tval = (jiffies - tstamp) / HZ;
--
2.44.0.278.ge034bb2e1d-goog
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH net-next 4/6] inet: annotate data-races around ifa->ifa_flags
2024-02-29 11:40 [PATCH net-next 0/6] inet: no longer use RTNL to protect inet_dump_ifaddr() Eric Dumazet
` (2 preceding siblings ...)
2024-02-29 11:40 ` [PATCH net-next 3/6] inet: annotate data-races around ifa->ifa_preferred_lft Eric Dumazet
@ 2024-02-29 11:40 ` Eric Dumazet
2024-02-29 11:40 ` [PATCH net-next 5/6] inet: prepare inet_base_seq() to run without RTNL Eric Dumazet
` (3 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Eric Dumazet @ 2024-02-29 11:40 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Jiri Pirko, David Ahern, netdev, Florian Westphal, eric.dumazet,
Eric Dumazet
ifa->ifa_flags can be read locklessly.
Add appropriate READ_ONCE()/WRITE_ONCE() annotations.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/ipv4/devinet.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 368fb56e1f1b2e3b7888f611a3f113f3bd5fc2af..550b775cbbf3c140c66e224c69996df7051b3d36 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -716,8 +716,10 @@ static void check_lifetime(struct work_struct *work)
unsigned long age, tstamp;
u32 preferred_lft;
u32 valid_lft;
+ u32 flags;
- if (ifa->ifa_flags & IFA_F_PERMANENT)
+ flags = READ_ONCE(ifa->ifa_flags);
+ if (flags & IFA_F_PERMANENT)
continue;
preferred_lft = READ_ONCE(ifa->ifa_preferred_lft);
@@ -737,7 +739,7 @@ static void check_lifetime(struct work_struct *work)
if (time_before(tstamp + valid_lft * HZ, next))
next = tstamp + valid_lft * HZ;
- if (!(ifa->ifa_flags & IFA_F_DEPRECATED))
+ if (!(flags & IFA_F_DEPRECATED))
change_needed = true;
} else if (time_before(tstamp + preferred_lft * HZ,
next)) {
@@ -805,21 +807,23 @@ static void set_ifa_lifetime(struct in_ifaddr *ifa, __u32 valid_lft,
__u32 prefered_lft)
{
unsigned long timeout;
+ u32 flags;
- ifa->ifa_flags &= ~(IFA_F_PERMANENT | IFA_F_DEPRECATED);
+ flags = ifa->ifa_flags & ~(IFA_F_PERMANENT | IFA_F_DEPRECATED);
timeout = addrconf_timeout_fixup(valid_lft, HZ);
if (addrconf_finite_timeout(timeout))
WRITE_ONCE(ifa->ifa_valid_lft, timeout);
else
- ifa->ifa_flags |= IFA_F_PERMANENT;
+ flags |= IFA_F_PERMANENT;
timeout = addrconf_timeout_fixup(prefered_lft, HZ);
if (addrconf_finite_timeout(timeout)) {
if (timeout == 0)
- ifa->ifa_flags |= IFA_F_DEPRECATED;
+ flags |= IFA_F_DEPRECATED;
WRITE_ONCE(ifa->ifa_preferred_lft, timeout);
}
+ WRITE_ONCE(ifa->ifa_flags, flags);
WRITE_ONCE(ifa->ifa_tstamp, jiffies);
if (!ifa->ifa_cstamp)
WRITE_ONCE(ifa->ifa_cstamp, ifa->ifa_tstamp);
@@ -1313,7 +1317,7 @@ static __be32 in_dev_select_addr(const struct in_device *in_dev,
const struct in_ifaddr *ifa;
in_dev_for_each_ifa_rcu(ifa, in_dev) {
- if (ifa->ifa_flags & IFA_F_SECONDARY)
+ if (READ_ONCE(ifa->ifa_flags) & IFA_F_SECONDARY)
continue;
if (ifa->ifa_scope != RT_SCOPE_LINK &&
ifa->ifa_scope <= scope)
@@ -1341,7 +1345,7 @@ __be32 inet_select_addr(const struct net_device *dev, __be32 dst, int scope)
localnet_scope = RT_SCOPE_LINK;
in_dev_for_each_ifa_rcu(ifa, in_dev) {
- if (ifa->ifa_flags & IFA_F_SECONDARY)
+ if (READ_ONCE(ifa->ifa_flags) & IFA_F_SECONDARY)
continue;
if (min(ifa->ifa_scope, localnet_scope) > scope)
continue;
@@ -1688,7 +1692,7 @@ static int inet_fill_ifaddr(struct sk_buff *skb, struct in_ifaddr *ifa,
ifm = nlmsg_data(nlh);
ifm->ifa_family = AF_INET;
ifm->ifa_prefixlen = ifa->ifa_prefixlen;
- ifm->ifa_flags = ifa->ifa_flags;
+ ifm->ifa_flags = READ_ONCE(ifa->ifa_flags);
ifm->ifa_scope = ifa->ifa_scope;
ifm->ifa_index = ifa->ifa_dev->dev->ifindex;
@@ -1728,7 +1732,7 @@ static int inet_fill_ifaddr(struct sk_buff *skb, struct in_ifaddr *ifa,
nla_put_string(skb, IFA_LABEL, ifa->ifa_label)) ||
(ifa->ifa_proto &&
nla_put_u8(skb, IFA_PROTO, ifa->ifa_proto)) ||
- nla_put_u32(skb, IFA_FLAGS, ifa->ifa_flags) ||
+ nla_put_u32(skb, IFA_FLAGS, ifm->ifa_flags) ||
(ifa->ifa_rt_priority &&
nla_put_u32(skb, IFA_RT_PRIORITY, ifa->ifa_rt_priority)) ||
put_cacheinfo(skb, READ_ONCE(ifa->ifa_cstamp), tstamp,
--
2.44.0.278.ge034bb2e1d-goog
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH net-next 5/6] inet: prepare inet_base_seq() to run without RTNL
2024-02-29 11:40 [PATCH net-next 0/6] inet: no longer use RTNL to protect inet_dump_ifaddr() Eric Dumazet
` (3 preceding siblings ...)
2024-02-29 11:40 ` [PATCH net-next 4/6] inet: annotate data-races around ifa->ifa_flags Eric Dumazet
@ 2024-02-29 11:40 ` Eric Dumazet
2024-02-29 11:40 ` [PATCH net-next 6/6] inet: use xa_array iterator to implement inet_dump_ifaddr() Eric Dumazet
` (2 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Eric Dumazet @ 2024-02-29 11:40 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Jiri Pirko, David Ahern, netdev, Florian Westphal, eric.dumazet,
Eric Dumazet
In the following patch, inet_base_seq() will no longer be called
with RTNL held.
Add READ_ONCE()/WRITE_ONCE() annotations in dev_base_seq_inc()
and inet_base_seq().
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/core/dev.c | 5 +++--
net/ipv4/devinet.c | 2 +-
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 053fac78305c7322b894ceb07a925f7e64ed70aa..873e095e141db3e631e51763435ddafbf0d280af 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -180,8 +180,9 @@ static DECLARE_RWSEM(devnet_rename_sem);
static inline void dev_base_seq_inc(struct net *net)
{
- while (++net->dev_base_seq == 0)
- ;
+ unsigned int val = net->dev_base_seq + 1;
+
+ WRITE_ONCE(net->dev_base_seq, val ?: 1);
}
static inline struct hlist_head *dev_name_hash(struct net *net, const char *name)
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 550b775cbbf3c140c66e224c69996df7051b3d36..2afe78dfc3c2f6c0394925f1c35532a2dfd26d71 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1837,7 +1837,7 @@ static int in_dev_dump_addr(struct in_device *in_dev, struct sk_buff *skb,
static u32 inet_base_seq(const struct net *net)
{
u32 res = atomic_read(&net->ipv4.dev_addr_genid) +
- net->dev_base_seq;
+ READ_ONCE(net->dev_base_seq);
/* Must not return 0 (see nl_dump_check_consistent()).
* Chose a value far away from 0.
--
2.44.0.278.ge034bb2e1d-goog
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH net-next 6/6] inet: use xa_array iterator to implement inet_dump_ifaddr()
2024-02-29 11:40 [PATCH net-next 0/6] inet: no longer use RTNL to protect inet_dump_ifaddr() Eric Dumazet
` (4 preceding siblings ...)
2024-02-29 11:40 ` [PATCH net-next 5/6] inet: prepare inet_base_seq() to run without RTNL Eric Dumazet
@ 2024-02-29 11:40 ` Eric Dumazet
2024-02-29 15:37 ` Jakub Kicinski
2024-02-29 14:18 ` [PATCH net-next 0/6] inet: no longer use RTNL to protect inet_dump_ifaddr() Jiri Pirko
2024-03-01 11:20 ` patchwork-bot+netdevbpf
7 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2024-02-29 11:40 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Jiri Pirko, David Ahern, netdev, Florian Westphal, eric.dumazet,
Eric Dumazet
1) inet_dump_ifaddr() can can run under RCU protection
instead of RTNL.
2) properly return 0 at the end of a dump, avoiding an
an extra recvmsg() system call.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/ipv4/devinet.c | 95 ++++++++++++++++++----------------------------
1 file changed, 37 insertions(+), 58 deletions(-)
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 2afe78dfc3c2f6c0394925f1c35532a2dfd26d71..4daa8124f247c256c4f8c1ff29ac621570af0755 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1676,7 +1676,7 @@ static int put_cacheinfo(struct sk_buff *skb, unsigned long cstamp,
return nla_put(skb, IFA_CACHEINFO, sizeof(ci), &ci);
}
-static int inet_fill_ifaddr(struct sk_buff *skb, struct in_ifaddr *ifa,
+static int inet_fill_ifaddr(struct sk_buff *skb, const struct in_ifaddr *ifa,
struct inet_fill_args *args)
{
struct ifaddrmsg *ifm;
@@ -1805,15 +1805,15 @@ static int inet_valid_dump_ifaddr_req(const struct nlmsghdr *nlh,
}
static int in_dev_dump_addr(struct in_device *in_dev, struct sk_buff *skb,
- struct netlink_callback *cb, int s_ip_idx,
+ struct netlink_callback *cb, int *s_ip_idx,
struct inet_fill_args *fillargs)
{
struct in_ifaddr *ifa;
int ip_idx = 0;
int err;
- in_dev_for_each_ifa_rtnl(ifa, in_dev) {
- if (ip_idx < s_ip_idx) {
+ in_dev_for_each_ifa_rcu(ifa, in_dev) {
+ if (ip_idx < *s_ip_idx) {
ip_idx++;
continue;
}
@@ -1825,9 +1825,9 @@ static int in_dev_dump_addr(struct in_device *in_dev, struct sk_buff *skb,
ip_idx++;
}
err = 0;
-
+ ip_idx = 0;
done:
- cb->args[2] = ip_idx;
+ *s_ip_idx = ip_idx;
return err;
}
@@ -1859,75 +1859,53 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
};
struct net *net = sock_net(skb->sk);
struct net *tgt_net = net;
- int h, s_h;
- int idx, s_idx;
- int s_ip_idx;
- struct net_device *dev;
+ struct {
+ unsigned long ifindex;
+ int ip_idx;
+ } *ctx = (void *)cb->ctx;
struct in_device *in_dev;
- struct hlist_head *head;
+ struct net_device *dev;
int err = 0;
- s_h = cb->args[0];
- s_idx = idx = cb->args[1];
- s_ip_idx = cb->args[2];
-
+ rcu_read_lock();
if (cb->strict_check) {
err = inet_valid_dump_ifaddr_req(nlh, &fillargs, &tgt_net,
skb->sk, cb);
if (err < 0)
- goto put_tgt_net;
+ goto done;
- err = 0;
if (fillargs.ifindex) {
- dev = __dev_get_by_index(tgt_net, fillargs.ifindex);
- if (!dev) {
- err = -ENODEV;
- goto put_tgt_net;
- }
-
- in_dev = __in_dev_get_rtnl(dev);
- if (in_dev) {
- err = in_dev_dump_addr(in_dev, skb, cb, s_ip_idx,
- &fillargs);
- }
- goto put_tgt_net;
- }
- }
-
- for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) {
- idx = 0;
- head = &tgt_net->dev_index_head[h];
- rcu_read_lock();
- cb->seq = inet_base_seq(tgt_net);
- hlist_for_each_entry_rcu(dev, head, index_hlist) {
- if (idx < s_idx)
- goto cont;
- if (h > s_h || idx > s_idx)
- s_ip_idx = 0;
+ err = -ENODEV;
+ dev = dev_get_by_index_rcu(tgt_net, fillargs.ifindex);
+ if (!dev)
+ goto done;
in_dev = __in_dev_get_rcu(dev);
if (!in_dev)
- goto cont;
-
- err = in_dev_dump_addr(in_dev, skb, cb, s_ip_idx,
- &fillargs);
- if (err < 0) {
- rcu_read_unlock();
goto done;
- }
-cont:
- idx++;
+ err = in_dev_dump_addr(in_dev, skb, cb, &ctx->ip_idx,
+ &fillargs);
+ goto done;
}
- rcu_read_unlock();
}
+ cb->seq = inet_base_seq(tgt_net);
+
+ for_each_netdev_dump(net, dev, ctx->ifindex) {
+ in_dev = __in_dev_get_rcu(dev);
+ if (!in_dev)
+ continue;
+ err = in_dev_dump_addr(in_dev, skb, cb, &ctx->ip_idx,
+ &fillargs);
+ if (err < 0)
+ goto done;
+ }
done:
- cb->args[0] = h;
- cb->args[1] = idx;
-put_tgt_net:
+ if (err < 0 && likely(skb->len))
+ err = skb->len;
if (fillargs.netnsid >= 0)
put_net(tgt_net);
-
- return skb->len ? : err;
+ rcu_read_unlock();
+ return err;
}
static void rtmsg_ifa(int event, struct in_ifaddr *ifa, struct nlmsghdr *nlh,
@@ -2818,7 +2796,8 @@ void __init devinet_init(void)
rtnl_register(PF_INET, RTM_NEWADDR, inet_rtm_newaddr, NULL, 0);
rtnl_register(PF_INET, RTM_DELADDR, inet_rtm_deladdr, NULL, 0);
- rtnl_register(PF_INET, RTM_GETADDR, NULL, inet_dump_ifaddr, 0);
+ rtnl_register(PF_INET, RTM_GETADDR, NULL, inet_dump_ifaddr,
+ RTNL_FLAG_DUMP_UNLOCKED);
rtnl_register(PF_INET, RTM_GETNETCONF, inet_netconf_get_devconf,
inet_netconf_dump_devconf,
RTNL_FLAG_DOIT_UNLOCKED | RTNL_FLAG_DUMP_UNLOCKED);
--
2.44.0.278.ge034bb2e1d-goog
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH net-next 0/6] inet: no longer use RTNL to protect inet_dump_ifaddr()
2024-02-29 11:40 [PATCH net-next 0/6] inet: no longer use RTNL to protect inet_dump_ifaddr() Eric Dumazet
` (5 preceding siblings ...)
2024-02-29 11:40 ` [PATCH net-next 6/6] inet: use xa_array iterator to implement inet_dump_ifaddr() Eric Dumazet
@ 2024-02-29 14:18 ` Jiri Pirko
2024-03-01 11:20 ` patchwork-bot+netdevbpf
7 siblings, 0 replies; 14+ messages in thread
From: Jiri Pirko @ 2024-02-29 14:18 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Jiri Pirko,
David Ahern, netdev, Florian Westphal, eric.dumazet
Thu, Feb 29, 2024 at 12:40:10PM CET, edumazet@google.com wrote:
>This series convert inet so that a dump of addresses (ip -4 addr)
>no longer requires RTNL.
>
>Eric Dumazet (6):
> inet: annotate data-races around ifa->ifa_tstamp and ifa->ifa_cstamp
> inet: annotate data-races around ifa->ifa_valid_lft
> inet: annotate data-races around ifa->ifa_preferred_lft
> inet: annotate data-races around ifa->ifa_flags
> inet: prepare inet_base_seq() to run without RTNL
> inet: use xa_array iterator to implement inet_dump_ifaddr()
>
> net/core/dev.c | 5 +-
> net/ipv4/devinet.c | 166 +++++++++++++++++++++------------------------
> 2 files changed, 79 insertions(+), 92 deletions(-)
Looks fine to me.
set-
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net-next 6/6] inet: use xa_array iterator to implement inet_dump_ifaddr()
2024-02-29 11:40 ` [PATCH net-next 6/6] inet: use xa_array iterator to implement inet_dump_ifaddr() Eric Dumazet
@ 2024-02-29 15:37 ` Jakub Kicinski
2024-02-29 15:45 ` Eric Dumazet
0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2024-02-29 15:37 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Paolo Abeni, Jiri Pirko, David Ahern, netdev,
Florian Westphal, eric.dumazet
On Thu, 29 Feb 2024 11:40:16 +0000 Eric Dumazet wrote:
> + if (err < 0 && likely(skb->len))
> + err = skb->len;
I think Ido may have commented on one of your early series, but if we
set err to skb->len we'll have to do an extra empty message to terminate
the dump.
You basically only want to return skb->len when message has
overflown, so the somewhat idiomatic way to do this is:
err = (err == -EMSGSIZE) ? skb->len : err;
Assuming err can't be set to some weird positive value.
IDK if you want to do this in future patches or it's risky, but I have
the itch to tell you every time I see a conversion which doesn't follow
this pattern :)
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net-next 6/6] inet: use xa_array iterator to implement inet_dump_ifaddr()
2024-02-29 15:37 ` Jakub Kicinski
@ 2024-02-29 15:45 ` Eric Dumazet
2024-02-29 15:50 ` Eric Dumazet
0 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2024-02-29 15:45 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David S . Miller, Paolo Abeni, Jiri Pirko, David Ahern, netdev,
Florian Westphal, eric.dumazet
On Thu, Feb 29, 2024 at 4:37 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 29 Feb 2024 11:40:16 +0000 Eric Dumazet wrote:
> > + if (err < 0 && likely(skb->len))
> > + err = skb->len;
>
> I think Ido may have commented on one of your early series, but if we
> set err to skb->len we'll have to do an extra empty message to terminate
> the dump.
>
> You basically only want to return skb->len when message has
> overflown, so the somewhat idiomatic way to do this is:
>
> err = (err == -EMSGSIZE) ? skb->len : err;
>
> Assuming err can't be set to some weird positive value.
>
> IDK if you want to do this in future patches or it's risky, but I have
> the itch to tell you every time I see a conversion which doesn't follow
> this pattern :)
This totally makes sense.
I will send a followup patch to fix all these in one go, if this is ok
with you ?
Thanks.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net-next 6/6] inet: use xa_array iterator to implement inet_dump_ifaddr()
2024-02-29 15:45 ` Eric Dumazet
@ 2024-02-29 15:50 ` Eric Dumazet
2024-02-29 16:21 ` Jakub Kicinski
0 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2024-02-29 15:50 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David S . Miller, Paolo Abeni, Jiri Pirko, David Ahern, netdev,
Florian Westphal, eric.dumazet
On Thu, Feb 29, 2024 at 4:45 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Feb 29, 2024 at 4:37 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Thu, 29 Feb 2024 11:40:16 +0000 Eric Dumazet wrote:
> > > + if (err < 0 && likely(skb->len))
> > > + err = skb->len;
> >
> > I think Ido may have commented on one of your early series, but if we
> > set err to skb->len we'll have to do an extra empty message to terminate
> > the dump.
> >
> > You basically only want to return skb->len when message has
> > overflown, so the somewhat idiomatic way to do this is:
> >
> > err = (err == -EMSGSIZE) ? skb->len : err;
This would set err to zero if skb is empty at this point.
I guess a more correct action would be:
if (err == -EMSGSIZE && likely(skb->len))
err = skb->len;
> >
> > Assuming err can't be set to some weird positive value.
> >
> > IDK if you want to do this in future patches or it's risky, but I have
> > the itch to tell you every time I see a conversion which doesn't follow
> > this pattern :)
>
> This totally makes sense.
>
> I will send a followup patch to fix all these in one go, if this is ok
> with you ?
>
> Thanks.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net-next 6/6] inet: use xa_array iterator to implement inet_dump_ifaddr()
2024-02-29 15:50 ` Eric Dumazet
@ 2024-02-29 16:21 ` Jakub Kicinski
2024-02-29 16:26 ` Eric Dumazet
0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2024-02-29 16:21 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Paolo Abeni, Jiri Pirko, David Ahern, netdev,
Florian Westphal, eric.dumazet
On Thu, 29 Feb 2024 16:50:45 +0100 Eric Dumazet wrote:
> > > You basically only want to return skb->len when message has
> > > overflown, so the somewhat idiomatic way to do this is:
> > >
> > > err = (err == -EMSGSIZE) ? skb->len : err;
>
> This would set err to zero if skb is empty at this point.
>
> I guess a more correct action would be:
>
> if (err == -EMSGSIZE && likely(skb->len))
> err = skb->len;
Ugh, fair point.
We should probably move the EMSGSIZE handling to the core, this is
getting too complicated for average humans.. Like this?
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 765921cf1194..ce27003b90a8 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2264,6 +2264,8 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
if (extra_mutex)
mutex_lock(extra_mutex);
nlk->dump_done_errno = cb->dump(skb, cb);
+ if (nlk->dump_done_errno == -EMSGSIZE && skb->len)
+ nlk->dump_done_errno = skb->len;
if (extra_mutex)
mutex_unlock(extra_mutex);
> > >
> > > Assuming err can't be set to some weird positive value.
> > >
> > > IDK if you want to do this in future patches or it's risky, but I have
> > > the itch to tell you every time I see a conversion which doesn't follow
> > > this pattern :)
> >
> > This totally makes sense.
> >
> > I will send a followup patch to fix all these in one go, if this is ok
> > with you ?
Definitely not a blocker
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH net-next 6/6] inet: use xa_array iterator to implement inet_dump_ifaddr()
2024-02-29 16:21 ` Jakub Kicinski
@ 2024-02-29 16:26 ` Eric Dumazet
0 siblings, 0 replies; 14+ messages in thread
From: Eric Dumazet @ 2024-02-29 16:26 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David S . Miller, Paolo Abeni, Jiri Pirko, David Ahern, netdev,
Florian Westphal, eric.dumazet
On Thu, Feb 29, 2024 at 5:21 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 29 Feb 2024 16:50:45 +0100 Eric Dumazet wrote:
> > > > You basically only want to return skb->len when message has
> > > > overflown, so the somewhat idiomatic way to do this is:
> > > >
> > > > err = (err == -EMSGSIZE) ? skb->len : err;
> >
> > This would set err to zero if skb is empty at this point.
> >
> > I guess a more correct action would be:
> >
> > if (err == -EMSGSIZE && likely(skb->len))
> > err = skb->len;
>
> Ugh, fair point.
> We should probably move the EMSGSIZE handling to the core, this is
> getting too complicated for average humans.. Like this?
>
Now you are talking ! Yep, definitely nicer.
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index 765921cf1194..ce27003b90a8 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -2264,6 +2264,8 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
> if (extra_mutex)
> mutex_lock(extra_mutex);
> nlk->dump_done_errno = cb->dump(skb, cb);
> + if (nlk->dump_done_errno == -EMSGSIZE && skb->len)
> + nlk->dump_done_errno = skb->len;
> if (extra_mutex)
> mutex_unlock(extra_mutex);
>
> > > >
> > > > Assuming err can't be set to some weird positive value.
> > > >
> > > > IDK if you want to do this in future patches or it's risky, but I have
> > > > the itch to tell you every time I see a conversion which doesn't follow
> > > > this pattern :)
> > >
> > > This totally makes sense.
> > >
> > > I will send a followup patch to fix all these in one go, if this is ok
> > > with you ?
>
> Definitely not a blocker
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net-next 0/6] inet: no longer use RTNL to protect inet_dump_ifaddr()
2024-02-29 11:40 [PATCH net-next 0/6] inet: no longer use RTNL to protect inet_dump_ifaddr() Eric Dumazet
` (6 preceding siblings ...)
2024-02-29 14:18 ` [PATCH net-next 0/6] inet: no longer use RTNL to protect inet_dump_ifaddr() Jiri Pirko
@ 2024-03-01 11:20 ` patchwork-bot+netdevbpf
7 siblings, 0 replies; 14+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-03-01 11:20 UTC (permalink / raw)
To: Eric Dumazet; +Cc: davem, kuba, pabeni, jiri, dsahern, netdev, fw, eric.dumazet
Hello:
This series was applied to netdev/net-next.git (main)
by David S. Miller <davem@davemloft.net>:
On Thu, 29 Feb 2024 11:40:10 +0000 you wrote:
> This series convert inet so that a dump of addresses (ip -4 addr)
> no longer requires RTNL.
>
> Eric Dumazet (6):
> inet: annotate data-races around ifa->ifa_tstamp and ifa->ifa_cstamp
> inet: annotate data-races around ifa->ifa_valid_lft
> inet: annotate data-races around ifa->ifa_preferred_lft
> inet: annotate data-races around ifa->ifa_flags
> inet: prepare inet_base_seq() to run without RTNL
> inet: use xa_array iterator to implement inet_dump_ifaddr()
>
> [...]
Here is the summary with links:
- [net-next,1/6] inet: annotate data-races around ifa->ifa_tstamp and ifa->ifa_cstamp
https://git.kernel.org/netdev/net-next/c/3cd3e72ccb3a
- [net-next,2/6] inet: annotate data-races around ifa->ifa_valid_lft
https://git.kernel.org/netdev/net-next/c/a5fcf74d80be
- [net-next,3/6] inet: annotate data-races around ifa->ifa_preferred_lft
https://git.kernel.org/netdev/net-next/c/9f6fa3c4e722
- [net-next,4/6] inet: annotate data-races around ifa->ifa_flags
https://git.kernel.org/netdev/net-next/c/3ddc2231c810
- [net-next,5/6] inet: prepare inet_base_seq() to run without RTNL
https://git.kernel.org/netdev/net-next/c/590e92cdc835
- [net-next,6/6] inet: use xa_array iterator to implement inet_dump_ifaddr()
https://git.kernel.org/netdev/net-next/c/cdb2f80f1c10
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2024-03-01 11:20 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-29 11:40 [PATCH net-next 0/6] inet: no longer use RTNL to protect inet_dump_ifaddr() Eric Dumazet
2024-02-29 11:40 ` [PATCH net-next 1/6] inet: annotate data-races around ifa->ifa_tstamp and ifa->ifa_cstamp Eric Dumazet
2024-02-29 11:40 ` [PATCH net-next 2/6] inet: annotate data-races around ifa->ifa_valid_lft Eric Dumazet
2024-02-29 11:40 ` [PATCH net-next 3/6] inet: annotate data-races around ifa->ifa_preferred_lft Eric Dumazet
2024-02-29 11:40 ` [PATCH net-next 4/6] inet: annotate data-races around ifa->ifa_flags Eric Dumazet
2024-02-29 11:40 ` [PATCH net-next 5/6] inet: prepare inet_base_seq() to run without RTNL Eric Dumazet
2024-02-29 11:40 ` [PATCH net-next 6/6] inet: use xa_array iterator to implement inet_dump_ifaddr() Eric Dumazet
2024-02-29 15:37 ` Jakub Kicinski
2024-02-29 15:45 ` Eric Dumazet
2024-02-29 15:50 ` Eric Dumazet
2024-02-29 16:21 ` Jakub Kicinski
2024-02-29 16:26 ` Eric Dumazet
2024-02-29 14:18 ` [PATCH net-next 0/6] inet: no longer use RTNL to protect inet_dump_ifaddr() Jiri Pirko
2024-03-01 11:20 ` patchwork-bot+netdevbpf
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.