* [bisected] e341694e3eb5 netlink_lookup() rcu conversion causes latencies
@ 2014-10-11 8:36 Heiko Carstens
2014-10-11 19:32 ` Eric Dumazet
0 siblings, 1 reply; 9+ messages in thread
From: Heiko Carstens @ 2014-10-11 8:36 UTC (permalink / raw)
To: Thomas Graf, Nikolay Aleksandrov, David S. Miller
Cc: netdev, linux-kernel, Ursula Braun
Hi all,
it just came to my attention that commit e341694e3eb5
"netlink: Convert netlink_lookup() to use RCU protected hash table"
causes network latencies for me on s390.
The testcase is quite simple and 100% reproducible on s390:
Simply log in via ssh to a remote system which has the above-mentioned
patch applied. Any action like pressing return now has significant
latencies. Or in other words, working via such a connection becomes
a pain ;)
I haven't debugged it, however I assume the problem is that a) the
commit introduces a synchronize_net() call and b) s390 kernels
usually get compiled with CONFIG_HZ_100 while most other architectures
use CONFIG_HZ_1000.
If I change the kernel config to CONFIG_HZ_1000 the problem goes away,
however I don't consider this a fix...
Another reason why this hasn't been observed on x86 may or may not be
that we haven't implemented CONFIG_HAVE_CONTEXT_TRACKING on s390 (yet).
But that's just guessing...
* Re: [bisected] e341694e3eb5 netlink_lookup() rcu conversion causes latencies
2014-10-11 8:36 [bisected] e341694e3eb5 netlink_lookup() rcu conversion causes latencies Heiko Carstens
@ 2014-10-11 19:32 ` Eric Dumazet
2014-10-11 22:25 ` Thomas Graf
0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2014-10-11 19:32 UTC (permalink / raw)
To: Heiko Carstens, Sasha Levin, paulmck
Cc: Thomas Graf, Nikolay Aleksandrov, David S. Miller, netdev,
linux-kernel, Ursula Braun
On Sat, 2014-10-11 at 10:36 +0200, Heiko Carstens wrote:
> Hi all,
>
> it just came to my attention that commit e341694e3eb5
> "netlink: Convert netlink_lookup() to use RCU protected hash table"
> causes network latencies for me on s390.
>
> The testcase is quite simple and 100% reproducible on s390:
>
> Simply log in via ssh to a remote system which has the above-mentioned
> patch applied. Any action like pressing return now has significant
> latencies. Or in other words, working via such a connection becomes
> a pain ;)
>
> I haven't debugged it, however I assume the problem is that a) the
> commit introduces a synchronize_net() call and b) s390 kernels
> usually get compiled with CONFIG_HZ_100 while most other architectures
> use CONFIG_HZ_1000.
> If I change the kernel config to CONFIG_HZ_1000 the problem goes away,
> however I don't consider this a fix...
>
> Another reason why this hasn't been observed on x86 may or may not be
> that we haven't implemented CONFIG_HAVE_CONTEXT_TRACKING on s390 (yet).
> But that's just guessing...
CC Paul and Sasha
* Re: [bisected] e341694e3eb5 netlink_lookup() rcu conversion causes latencies
2014-10-11 19:32 ` Eric Dumazet
@ 2014-10-11 22:25 ` Thomas Graf
2014-10-11 23:08 ` David Miller
2014-10-20 8:21 ` Heiko Carstens
0 siblings, 2 replies; 9+ messages in thread
From: Thomas Graf @ 2014-10-11 22:25 UTC (permalink / raw)
To: Eric Dumazet
Cc: Heiko Carstens, Sasha Levin, paulmck, Nikolay Aleksandrov,
David S. Miller, netdev, linux-kernel, Ursula Braun
On 10/11/14 at 12:32pm, Eric Dumazet wrote:
> On Sat, 2014-10-11 at 10:36 +0200, Heiko Carstens wrote:
> > Hi all,
> >
> > it just came to my attention that commit e341694e3eb5
> > "netlink: Convert netlink_lookup() to use RCU protected hash table"
> > causes network latencies for me on s390.
> >
> > The testcase is quite simple and 100% reproducible on s390:
> >
> > Simply log in via ssh to a remote system which has the above-mentioned
> > patch applied. Any action like pressing return now has significant
> > latencies. Or in other words, working via such a connection becomes
> > a pain ;)
> >
> > I haven't debugged it, however I assume the problem is that a) the
> > commit introduces a synchronize_net() call and b) s390 kernels
> > usually get compiled with CONFIG_HZ_100 while most other architectures
> > use CONFIG_HZ_1000.
> > If I change the kernel config to CONFIG_HZ_1000 the problem goes away,
> > however I don't consider this a fix...
> >
> > Another reason why this hasn't been observed on x86 may or may not be
> > that we haven't implemented CONFIG_HAVE_CONTEXT_TRACKING on s390 (yet).
> > But that's just guessing...
>
> CC Paul and Sasha
I think the issue here is obvious and a fix is on the way to move
the insertion and removal to a worker to no longer require the
synchronize_rcu().
What bothers me is that the synchronize_rcu() should only occur
on expand/shrink and not for every table update. The default table
size is 64.
* Re: [bisected] e341694e3eb5 netlink_lookup() rcu conversion causes latencies
2014-10-11 22:25 ` Thomas Graf
@ 2014-10-11 23:08 ` David Miller
2014-10-20 8:21 ` Heiko Carstens
1 sibling, 0 replies; 9+ messages in thread
From: David Miller @ 2014-10-11 23:08 UTC (permalink / raw)
To: tgraf
Cc: eric.dumazet, heiko.carstens, sasha.levin, paulmck, nikolay,
netdev, linux-kernel, braunu
From: Thomas Graf <tgraf@suug.ch>
Date: Sat, 11 Oct 2014 23:25:14 +0100
> I think the issue here is obvious and a fix is on the way to move
> the insertion and removal to a worker to no longer require the
> synchronize_rcu().
>
> What bothers me is that the synchronize_rcu() should only occur
> on expand/shrink and not for every table update. The default table
> size is 64.
Not true, every netlink socket release incurs a synchronize_net()
now, because we added such a call to netlink_release().
I specifically brought this up as a possible problem when the
changes went in...
* Re: [bisected] e341694e3eb5 netlink_lookup() rcu conversion causes latencies
2014-10-11 22:25 ` Thomas Graf
2014-10-11 23:08 ` David Miller
@ 2014-10-20 8:21 ` Heiko Carstens
2014-10-20 19:53 ` [PATCH] netlink: Re-add locking to netlink_lookup() and seq walker Thomas Graf
1 sibling, 1 reply; 9+ messages in thread
From: Heiko Carstens @ 2014-10-20 8:21 UTC (permalink / raw)
To: Thomas Graf
Cc: Eric Dumazet, Sasha Levin, paulmck, Nikolay Aleksandrov,
David S. Miller, netdev, linux-kernel, Ursula Braun
On Sat, Oct 11, 2014 at 11:25:14PM +0100, Thomas Graf wrote:
> On 10/11/14 at 12:32pm, Eric Dumazet wrote:
> > On Sat, 2014-10-11 at 10:36 +0200, Heiko Carstens wrote:
> > > Hi all,
> > >
> > > it just came to my attention that commit e341694e3eb5
> > > "netlink: Convert netlink_lookup() to use RCU protected hash table"
> > > causes network latencies for me on s390.
> > >
> > > The testcase is quite simple and 100% reproducible on s390:
> > >
> > > Simply log in via ssh to a remote system which has the above-mentioned
> > > patch applied. Any action like pressing return now has significant
> > > latencies. Or in other words, working via such a connection becomes
> > > a pain ;)
> > >
> > > I haven't debugged it, however I assume the problem is that a) the
> > > commit introduces a synchronize_net() call and b) s390 kernels
> > > usually get compiled with CONFIG_HZ_100 while most other architectures
> > > use CONFIG_HZ_1000.
> > > If I change the kernel config to CONFIG_HZ_1000 the problem goes away,
> > > however I don't consider this a fix...
> > >
> > > Another reason why this hasn't been observed on x86 may or may not be
> > > that we haven't implemented CONFIG_HAVE_CONTEXT_TRACKING on s390 (yet).
> > > But that's just guessing...
> >
> > CC Paul and Sasha
>
> I think the issue here is obvious and a fix is on the way to move
> the insertion and removal to a worker to no longer require the
> synchronize_rcu().
>
> What bothers me is that the synchronize_rcu() should only occur
> on expand/shrink and not for every table update. The default table
> size is 64.
*ping* ... is there already any patch available?
* [PATCH] netlink: Re-add locking to netlink_lookup() and seq walker
2014-10-20 8:21 ` Heiko Carstens
@ 2014-10-20 19:53 ` Thomas Graf
2014-10-21 7:04 ` Heiko Carstens
0 siblings, 1 reply; 9+ messages in thread
From: Thomas Graf @ 2014-10-20 19:53 UTC (permalink / raw)
To: Heiko Carstens
Cc: Eric Dumazet, Sasha Levin, paulmck, Nikolay Aleksandrov,
David S. Miller, netdev, linux-kernel, Ursula Braun
Heiko,
Can you test the following patch:
The synchronize_rcu() in netlink_release() introduces unacceptable
latency. Reintroduce minimal lookup so we can drop the
synchronize_rcu() until socket destruction has been RCUfied.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
net/netlink/af_netlink.c | 37 +++++++++++++++++++++++++------------
1 file changed, 25 insertions(+), 12 deletions(-)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 7a186e7..f1de72d 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -96,6 +96,14 @@ static DECLARE_WAIT_QUEUE_HEAD(nl_table_wait);
static int netlink_dump(struct sock *sk);
static void netlink_skb_destructor(struct sk_buff *skb);
+/* nl_table locking explained:
+ * Lookup and traversal are protected with nl_sk_hash_lock or nl_table_lock
+ * combined with an RCU read-side lock. Insertion and removal are protected
+ * with nl_sk_hash_lock while using RCU list modification primitives and may
+ * run in parallel to nl_table_lock protected lookups. Destruction of the
+ * Netlink socket may only occur *after* nl_table_lock has been acquired
+ * either during or after the socket has been removed from the list.
+ */
DEFINE_RWLOCK(nl_table_lock);
EXPORT_SYMBOL_GPL(nl_table_lock);
static atomic_t nl_table_users = ATOMIC_INIT(0);
@@ -109,10 +117,10 @@ EXPORT_SYMBOL_GPL(nl_sk_hash_lock);
static int lockdep_nl_sk_hash_is_held(void)
{
#ifdef CONFIG_LOCKDEP
- return (debug_locks) ? lockdep_is_held(&nl_sk_hash_lock) : 1;
-#else
- return 1;
+ if (debug_locks)
+ return lockdep_is_held(&nl_sk_hash_lock) || lockdep_is_held(&nl_table_lock);
#endif
+ return 1;
}
static ATOMIC_NOTIFIER_HEAD(netlink_chain);
@@ -1028,11 +1036,13 @@ static struct sock *netlink_lookup(struct net *net, int protocol, u32 portid)
struct netlink_table *table = &nl_table[protocol];
struct sock *sk;
+ read_lock(&nl_table_lock);
rcu_read_lock();
sk = __netlink_lookup(table, portid, net);
if (sk)
sock_hold(sk);
rcu_read_unlock();
+ read_unlock(&nl_table_lock);
return sk;
}
@@ -1257,9 +1267,6 @@ static int netlink_release(struct socket *sock)
}
netlink_table_ungrab();
- /* Wait for readers to complete */
- synchronize_net();
-
kfree(nlk->groups);
nlk->groups = NULL;
@@ -1281,6 +1288,7 @@ static int netlink_autobind(struct socket *sock)
retry:
cond_resched();
+ netlink_table_grab();
rcu_read_lock();
if (__netlink_lookup(table, portid, net)) {
/* Bind collision, search negative portid values. */
@@ -1288,9 +1296,11 @@ retry:
if (rover > -4097)
rover = -4097;
rcu_read_unlock();
+ netlink_table_ungrab();
goto retry;
}
rcu_read_unlock();
+ netlink_table_ungrab();
err = netlink_insert(sk, net, portid);
if (err == -EADDRINUSE)
@@ -2921,14 +2931,16 @@ static struct sock *netlink_seq_socket_idx(struct seq_file *seq, loff_t pos)
}
static void *netlink_seq_start(struct seq_file *seq, loff_t *pos)
- __acquires(RCU)
+ __acquires(nl_table_lock) __acquires(RCU)
{
+ read_lock(&nl_table_lock);
rcu_read_lock();
return *pos ? netlink_seq_socket_idx(seq, *pos - 1) : SEQ_START_TOKEN;
}
static void *netlink_seq_next(struct seq_file *seq, void *v, loff_t *pos)
{
+ struct rhashtable *ht;
struct netlink_sock *nlk;
struct nl_seq_iter *iter;
struct net *net;
@@ -2943,19 +2955,19 @@ static void *netlink_seq_next(struct seq_file *seq, void *v, loff_t *pos)
iter = seq->private;
nlk = v;
- rht_for_each_entry_rcu(nlk, nlk->node.next, node)
+ i = iter->link;
+ ht = &nl_table[i].hash;
+ rht_for_each_entry(nlk, nlk->node.next, ht, node)
if (net_eq(sock_net((struct sock *)nlk), net))
return nlk;
- i = iter->link;
j = iter->hash_idx + 1;
do {
- struct rhashtable *ht = &nl_table[i].hash;
const struct bucket_table *tbl = rht_dereference_rcu(ht->tbl, ht);
for (; j < tbl->size; j++) {
- rht_for_each_entry_rcu(nlk, tbl->buckets[j], node) {
+ rht_for_each_entry(nlk, tbl->buckets[j], ht, node) {
if (net_eq(sock_net((struct sock *)nlk), net)) {
iter->link = i;
iter->hash_idx = j;
@@ -2971,9 +2983,10 @@ static void *netlink_seq_next(struct seq_file *seq, void *v, loff_t *pos)
}
static void netlink_seq_stop(struct seq_file *seq, void *v)
- __releases(RCU)
+ __releases(RCU) __releases(nl_table_lock)
{
rcu_read_unlock();
+ read_unlock(&nl_table_lock);
}
--
1.9.3
* Re: [PATCH] netlink: Re-add locking to netlink_lookup() and seq walker
2014-10-20 19:53 ` [PATCH] netlink: Re-add locking to netlink_lookup() and seq walker Thomas Graf
@ 2014-10-21 7:04 ` Heiko Carstens
0 siblings, 0 replies; 9+ messages in thread
From: Heiko Carstens @ 2014-10-21 7:04 UTC (permalink / raw)
To: Thomas Graf
Cc: Eric Dumazet, Sasha Levin, paulmck, Nikolay Aleksandrov,
David S. Miller, netdev, linux-kernel, Ursula Braun
On Mon, Oct 20, 2014 at 08:53:55PM +0100, Thomas Graf wrote:
> Heiko,
>
> Can you test the following patch:
>
> The synchronize_rcu() in netlink_release() introduces unacceptable
> latency. Reintroduce minimal lookup so we can drop the
> synchronize_rcu() until socket destruction has been RCUfied.
>
> Signed-off-by: Thomas Graf <tgraf@suug.ch>
> ---
> net/netlink/af_netlink.c | 37 +++++++++++++++++++++++++------------
> 1 file changed, 25 insertions(+), 12 deletions(-)
Thanks a lot! Your patch fixes the issue for me.
Reported-and-tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
* [PATCH] netlink: Re-add locking to netlink_lookup() and seq walker
@ 2014-10-21 20:05 Thomas Graf
2014-10-22 1:36 ` David Miller
0 siblings, 1 reply; 9+ messages in thread
From: Thomas Graf @ 2014-10-21 20:05 UTC (permalink / raw)
To: davem; +Cc: netdev, linux-kernel, David S. Miller, Eric Dumazet
The synchronize_rcu() in netlink_release() introduces unacceptable
latency. Reintroduce minimal lookup so we can drop the
synchronize_rcu() until socket destruction has been RCUfied.
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com>
Reported-and-tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
net/netlink/af_netlink.c | 37 +++++++++++++++++++++++++------------
1 file changed, 25 insertions(+), 12 deletions(-)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 7a186e7..f1de72d 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -96,6 +96,14 @@ static DECLARE_WAIT_QUEUE_HEAD(nl_table_wait);
static int netlink_dump(struct sock *sk);
static void netlink_skb_destructor(struct sk_buff *skb);
+/* nl_table locking explained:
+ * Lookup and traversal are protected with nl_sk_hash_lock or nl_table_lock
+ * combined with an RCU read-side lock. Insertion and removal are protected
+ * with nl_sk_hash_lock while using RCU list modification primitives and may
+ * run in parallel to nl_table_lock protected lookups. Destruction of the
+ * Netlink socket may only occur *after* nl_table_lock has been acquired
+ * either during or after the socket has been removed from the list.
+ */
DEFINE_RWLOCK(nl_table_lock);
EXPORT_SYMBOL_GPL(nl_table_lock);
static atomic_t nl_table_users = ATOMIC_INIT(0);
@@ -109,10 +117,10 @@ EXPORT_SYMBOL_GPL(nl_sk_hash_lock);
static int lockdep_nl_sk_hash_is_held(void)
{
#ifdef CONFIG_LOCKDEP
- return (debug_locks) ? lockdep_is_held(&nl_sk_hash_lock) : 1;
-#else
- return 1;
+ if (debug_locks)
+ return lockdep_is_held(&nl_sk_hash_lock) || lockdep_is_held(&nl_table_lock);
#endif
+ return 1;
}
static ATOMIC_NOTIFIER_HEAD(netlink_chain);
@@ -1028,11 +1036,13 @@ static struct sock *netlink_lookup(struct net *net, int protocol, u32 portid)
struct netlink_table *table = &nl_table[protocol];
struct sock *sk;
+ read_lock(&nl_table_lock);
rcu_read_lock();
sk = __netlink_lookup(table, portid, net);
if (sk)
sock_hold(sk);
rcu_read_unlock();
+ read_unlock(&nl_table_lock);
return sk;
}
@@ -1257,9 +1267,6 @@ static int netlink_release(struct socket *sock)
}
netlink_table_ungrab();
- /* Wait for readers to complete */
- synchronize_net();
-
kfree(nlk->groups);
nlk->groups = NULL;
@@ -1281,6 +1288,7 @@ static int netlink_autobind(struct socket *sock)
retry:
cond_resched();
+ netlink_table_grab();
rcu_read_lock();
if (__netlink_lookup(table, portid, net)) {
/* Bind collision, search negative portid values. */
@@ -1288,9 +1296,11 @@ retry:
if (rover > -4097)
rover = -4097;
rcu_read_unlock();
+ netlink_table_ungrab();
goto retry;
}
rcu_read_unlock();
+ netlink_table_ungrab();
err = netlink_insert(sk, net, portid);
if (err == -EADDRINUSE)
@@ -2921,14 +2931,16 @@ static struct sock *netlink_seq_socket_idx(struct seq_file *seq, loff_t pos)
}
static void *netlink_seq_start(struct seq_file *seq, loff_t *pos)
- __acquires(RCU)
+ __acquires(nl_table_lock) __acquires(RCU)
{
+ read_lock(&nl_table_lock);
rcu_read_lock();
return *pos ? netlink_seq_socket_idx(seq, *pos - 1) : SEQ_START_TOKEN;
}
static void *netlink_seq_next(struct seq_file *seq, void *v, loff_t *pos)
{
+ struct rhashtable *ht;
struct netlink_sock *nlk;
struct nl_seq_iter *iter;
struct net *net;
@@ -2943,19 +2955,19 @@ static void *netlink_seq_next(struct seq_file *seq, void *v, loff_t *pos)
iter = seq->private;
nlk = v;
- rht_for_each_entry_rcu(nlk, nlk->node.next, node)
+ i = iter->link;
+ ht = &nl_table[i].hash;
+ rht_for_each_entry(nlk, nlk->node.next, ht, node)
if (net_eq(sock_net((struct sock *)nlk), net))
return nlk;
- i = iter->link;
j = iter->hash_idx + 1;
do {
- struct rhashtable *ht = &nl_table[i].hash;
const struct bucket_table *tbl = rht_dereference_rcu(ht->tbl, ht);
for (; j < tbl->size; j++) {
- rht_for_each_entry_rcu(nlk, tbl->buckets[j], node) {
+ rht_for_each_entry(nlk, tbl->buckets[j], ht, node) {
if (net_eq(sock_net((struct sock *)nlk), net)) {
iter->link = i;
iter->hash_idx = j;
@@ -2971,9 +2983,10 @@ static void *netlink_seq_next(struct seq_file *seq, void *v, loff_t *pos)
}
static void netlink_seq_stop(struct seq_file *seq, void *v)
- __releases(RCU)
+ __releases(RCU) __releases(nl_table_lock)
{
rcu_read_unlock();
+ read_unlock(&nl_table_lock);
}
--
1.9.3
* Re: [PATCH] netlink: Re-add locking to netlink_lookup() and seq walker
2014-10-21 20:05 Thomas Graf
@ 2014-10-22 1:36 ` David Miller
0 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2014-10-22 1:36 UTC (permalink / raw)
To: tgraf; +Cc: davem, netdev, linux-kernel, eric.dumazet
From: Thomas Graf <tgraf@suug.ch>
Date: Tue, 21 Oct 2014 22:05:38 +0200
> The synchronize_rcu() in netlink_release() introduces unacceptable
> latency. Reintroduce minimal lookup so we can drop the
> synchronize_rcu() until socket destruction has been RCUfied.
>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com>
> Reported-and-tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
> Signed-off-by: Thomas Graf <tgraf@suug.ch>
Applied and queued up for -stable.
Please use an appropriate "Fixes:" tag in the future; this is
especially important for an issue as serious as this one.
Thanks.