All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vladimir Oltean <olteanv@gmail.com>
To: "David S . Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org, "Andrew Lunn" <andrew@lunn.ch>,
	"Florian Fainelli" <f.fainelli@gmail.com>,
	"Paul Gortmaker" <paul.gortmaker@windriver.com>,
	"Pablo Neira Ayuso" <pablo@netfilter.org>,
	"Jiri Benc" <jbenc@redhat.com>,
	"Cong Wang" <xiyou.wangcong@gmail.com>,
	"Jamal Hadi Salim" <jhs@mojatatu.com>,
	"Stephen Hemminger" <stephen@networkplumber.org>,
	"Eric Dumazet" <edumazet@google.com>,
	"George McCollister" <george.mccollister@gmail.com>,
	"Oleksij Rempel" <o.rempel@pengutronix.de>,
	"Jay Vosburgh" <j.vosburgh@gmail.com>,
	"Veaceslav Falico" <vfalico@gmail.com>,
	"Andy Gospodarek" <andy@greyhouse.net>,
	"Arnd Bergmann" <arnd@arndb.de>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Andrii Nakryiko" <andriin@fb.com>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Taehee Yoo" <ap420073@gmail.com>,
	"Jiri Pirko" <jiri@mellanox.com>,
	"Björn Töpel" <bjorn.topel@intel.com>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Christian Brauner" <christian.brauner@ubuntu.com>,
	"Florian Westphal" <fw@strlen.de>,
	linux-s390@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
	linux-parisc@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-usb@vger.kernel.org, dev@openvswitch.org
Subject: [RFC PATCH v2 net-next 01/12] net: mark dev_base_lock for deprecation
Date: Tue,  5 Jan 2021 20:58:51 +0200	[thread overview]
Message-ID: <20210105185902.3922928-2-olteanv@gmail.com> (raw)
In-Reply-To: <20210105185902.3922928-1-olteanv@gmail.com>

From: Vladimir Oltean <vladimir.oltean@nxp.com>

There is a movement to eliminate the usage of dev_base_lock, which
exists since as far as I could track the kernel history down (the
"7a2deb329241 Import changeset" commit from the bitkeeper branch).

The dev_base_lock approach has multiple issues:
- It is global and not per netns.
- Its meaning has mutated over the years and the vast majority of
  current users is using it to protect against changes of network device
  state, as opposed to changes of the interface list.
- It is overlapping with other protection mechanisms introduced in the
  meantime, which have higher performance for readers, like RCU
  introduced in 2009 by Eric Dumazet for kernel 2.6.

The old comment that I just deleted made this distinction: writers
protect only against readers by holding dev_base_lock, whereas they need
to protect against other writers by holding the RTNL mutex. This is
slightly confusing/incorrect, since a rwlock_t cannot have more than one
concurrently running writer, so that explanation does not justify why
the RTNL mutex would be necessary.

Instead, Stephen Hemminger makes this clarification here:
https://lore.kernel.org/netdev/20201129211230.4d704931@hermes.local/T/#u

| There are really two different domains being covered by locks here.
|
| The first area is change of state of network devices. This has traditionally
| been covered by RTNL because there are places that depend on coordinating
| state between multiple devices. RTNL is too big and held too long but getting
| rid of it is hard because there are corner cases (like state changes from userspace
| for VPN devices).
|
| The other area is code that wants to do read access to look at list of devices.
| These pure readers can/should be converted to RCU by now. Writers should hold RTNL.

This patch edits the comment for dev_base_lock, minimizing its role in
the protection of network interface lists, and clarifies that it is has
other purposes as well.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/core/dev.c | 33 ++++++++++++++++-----------------
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index a46334906c94..2aa613d22318 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -169,23 +169,22 @@ static int call_netdevice_notifiers_extack(unsigned long val,
 static struct napi_struct *napi_by_id(unsigned int napi_id);
 
 /*
- * The @dev_base_head list is protected by @dev_base_lock and the rtnl
- * semaphore.
- *
- * Pure readers hold dev_base_lock for reading, or rcu_read_lock()
- *
- * Writers must hold the rtnl semaphore while they loop through the
- * dev_base_head list, and hold dev_base_lock for writing when they do the
- * actual updates.  This allows pure readers to access the list even
- * while a writer is preparing to update it.
- *
- * To put it another way, dev_base_lock is held for writing only to
- * protect against pure readers; the rtnl semaphore provides the
- * protection against other writers.
- *
- * See, for example usages, register_netdevice() and
- * unregister_netdevice(), which must be called with the rtnl
- * semaphore held.
+ * The network interface list of a netns (@net->dev_base_head) and the hash
+ * tables per ifindex (@net->dev_index_head) and per name (@net->dev_name_head)
+ * are protected using the following rules:
+ *
+ * Pure readers should hold rcu_read_lock() which should protect them against
+ * concurrent changes to the interface lists made by the writers. Pure writers
+ * must serialize by holding the RTNL mutex while they loop through the list
+ * and make changes to it.
+ *
+ * It is also possible to hold the global rwlock_t @dev_base_lock for
+ * protection (holding its read side as an alternative to rcu_read_lock, and
+ * its write side as an alternative to the RTNL mutex), however this should not
+ * be done in new code, since it is deprecated and pending removal.
+ *
+ * One other role of @dev_base_lock is to protect against changes in the
+ * operational state of a network interface.
  */
 DEFINE_RWLOCK(dev_base_lock);
 EXPORT_SYMBOL(dev_base_lock);
-- 
2.25.1


WARNING: multiple messages have this Message-ID (diff)
From: Vladimir Oltean <olteanv@gmail.com>
To: intel-wired-lan@osuosl.org
Subject: [Intel-wired-lan] [RFC PATCH v2 net-next 01/12] net: mark dev_base_lock for deprecation
Date: Tue,  5 Jan 2021 20:58:51 +0200	[thread overview]
Message-ID: <20210105185902.3922928-2-olteanv@gmail.com> (raw)
In-Reply-To: <20210105185902.3922928-1-olteanv@gmail.com>

From: Vladimir Oltean <vladimir.oltean@nxp.com>

There is a movement to eliminate the usage of dev_base_lock, which
exists since as far as I could track the kernel history down (the
"7a2deb329241 Import changeset" commit from the bitkeeper branch).

The dev_base_lock approach has multiple issues:
- It is global and not per netns.
- Its meaning has mutated over the years and the vast majority of
  current users is using it to protect against changes of network device
  state, as opposed to changes of the interface list.
- It is overlapping with other protection mechanisms introduced in the
  meantime, which have higher performance for readers, like RCU
  introduced in 2009 by Eric Dumazet for kernel 2.6.

The old comment that I just deleted made this distinction: writers
protect only against readers by holding dev_base_lock, whereas they need
to protect against other writers by holding the RTNL mutex. This is
slightly confusing/incorrect, since a rwlock_t cannot have more than one
concurrently running writer, so that explanation does not justify why
the RTNL mutex would be necessary.

Instead, Stephen Hemminger makes this clarification here:
https://lore.kernel.org/netdev/20201129211230.4d704931 at hermes.local/T/#u

| There are really two different domains being covered by locks here.
|
| The first area is change of state of network devices. This has traditionally
| been covered by RTNL because there are places that depend on coordinating
| state between multiple devices. RTNL is too big and held too long but getting
| rid of it is hard because there are corner cases (like state changes from userspace
| for VPN devices).
|
| The other area is code that wants to do read access to look at list of devices.
| These pure readers can/should be converted to RCU by now. Writers should hold RTNL.

This patch edits the comment for dev_base_lock, minimizing its role in
the protection of network interface lists, and clarifies that it is has
other purposes as well.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/core/dev.c | 33 ++++++++++++++++-----------------
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index a46334906c94..2aa613d22318 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -169,23 +169,22 @@ static int call_netdevice_notifiers_extack(unsigned long val,
 static struct napi_struct *napi_by_id(unsigned int napi_id);
 
 /*
- * The @dev_base_head list is protected by @dev_base_lock and the rtnl
- * semaphore.
- *
- * Pure readers hold dev_base_lock for reading, or rcu_read_lock()
- *
- * Writers must hold the rtnl semaphore while they loop through the
- * dev_base_head list, and hold dev_base_lock for writing when they do the
- * actual updates.  This allows pure readers to access the list even
- * while a writer is preparing to update it.
- *
- * To put it another way, dev_base_lock is held for writing only to
- * protect against pure readers; the rtnl semaphore provides the
- * protection against other writers.
- *
- * See, for example usages, register_netdevice() and
- * unregister_netdevice(), which must be called with the rtnl
- * semaphore held.
+ * The network interface list of a netns (@net->dev_base_head) and the hash
+ * tables per ifindex (@net->dev_index_head) and per name (@net->dev_name_head)
+ * are protected using the following rules:
+ *
+ * Pure readers should hold rcu_read_lock() which should protect them against
+ * concurrent changes to the interface lists made by the writers. Pure writers
+ * must serialize by holding the RTNL mutex while they loop through the list
+ * and make changes to it.
+ *
+ * It is also possible to hold the global rwlock_t @dev_base_lock for
+ * protection (holding its read side as an alternative to rcu_read_lock, and
+ * its write side as an alternative to the RTNL mutex), however this should not
+ * be done in new code, since it is deprecated and pending removal.
+ *
+ * One other role of @dev_base_lock is to protect against changes in the
+ * operational state of a network interface.
  */
 DEFINE_RWLOCK(dev_base_lock);
 EXPORT_SYMBOL(dev_base_lock);
-- 
2.25.1


  reply	other threads:[~2021-01-05 19:02 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-05 18:58 [RFC PATCH v2 net-next 00/12] Make .ndo_get_stats64 sleepable Vladimir Oltean
2021-01-05 18:58 ` [Intel-wired-lan] " Vladimir Oltean
2021-01-05 18:58 ` Vladimir Oltean [this message]
2021-01-05 18:58   ` [Intel-wired-lan] [RFC PATCH v2 net-next 01/12] net: mark dev_base_lock for deprecation Vladimir Oltean
2021-01-05 18:58 ` [RFC PATCH v2 net-next 02/12] net: introduce a mutex for the netns interface lists Vladimir Oltean
2021-01-05 18:58   ` [Intel-wired-lan] " Vladimir Oltean
2021-01-05 18:58 ` [RFC PATCH v2 net-next 03/12] net: procfs: hold netif_lists_lock when retrieving device statistics Vladimir Oltean
2021-01-05 18:58   ` [Intel-wired-lan] " Vladimir Oltean
2021-01-05 18:58 ` [RFC PATCH v2 net-next 04/12] net: sysfs: don't hold dev_base_lock while " Vladimir Oltean
2021-01-05 18:58   ` [Intel-wired-lan] " Vladimir Oltean
2021-01-05 18:58 ` [RFC PATCH v2 net-next 05/12] s390/appldata_net_sum: hold the netdev lists lock when " Vladimir Oltean
2021-01-05 18:58   ` [Intel-wired-lan] " Vladimir Oltean
2021-01-05 18:58 ` [RFC PATCH v2 net-next 06/12] parisc/led: reindent the code that gathers " Vladimir Oltean
2021-01-05 18:58   ` [Intel-wired-lan] " Vladimir Oltean
2021-01-05 18:58 ` [RFC PATCH v2 net-next 07/12] parisc/led: hold the netdev lists lock when retrieving " Vladimir Oltean
2021-01-05 18:58   ` [Intel-wired-lan] " Vladimir Oltean
2021-01-05 18:58 ` [RFC PATCH v2 net-next 08/12] net: make dev_get_stats return void Vladimir Oltean
2021-01-05 18:58   ` [Intel-wired-lan] " Vladimir Oltean
2021-01-05 18:58 ` [RFC PATCH v2 net-next 09/12] net: net_failover: ensure .ndo_get_stats64 can sleep Vladimir Oltean
2021-01-05 18:58   ` [Intel-wired-lan] " Vladimir Oltean
2021-01-05 18:59 ` [RFC PATCH v2 net-next 10/12] net: bonding: " Vladimir Oltean
2021-01-05 18:59   ` [Intel-wired-lan] " Vladimir Oltean
2021-01-05 18:59 ` [RFC PATCH v2 net-next 11/12] net: mark ndo_get_stats64 as being able to sleep Vladimir Oltean
2021-01-05 18:59   ` [Intel-wired-lan] " Vladimir Oltean
2021-01-05 18:59 ` [RFC PATCH v2 net-next 12/12] net: remove obsolete comments about ndo_get_stats64 context from eth drivers Vladimir Oltean
2021-01-05 18:59   ` [Intel-wired-lan] " Vladimir Oltean
2021-01-06 13:45 ` [RFC PATCH v2 net-next 00/12] Make .ndo_get_stats64 sleepable Vladimir Oltean
2021-01-06 13:45   ` [Intel-wired-lan] " Vladimir Oltean
2021-01-09  1:26   ` Jacob Keller
2021-01-09  1:26     ` [Intel-wired-lan] " Jacob Keller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210105185902.3922928-2-olteanv@gmail.com \
    --to=olteanv@gmail.com \
    --cc=andrew@lunn.ch \
    --cc=andriin@fb.com \
    --cc=andy@greyhouse.net \
    --cc=ap420073@gmail.com \
    --cc=arnd@arndb.de \
    --cc=ast@kernel.org \
    --cc=bjorn.topel@intel.com \
    --cc=christian.brauner@ubuntu.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=dev@openvswitch.org \
    --cc=edumazet@google.com \
    --cc=f.fainelli@gmail.com \
    --cc=fw@strlen.de \
    --cc=george.mccollister@gmail.com \
    --cc=intel-wired-lan@lists.osuosl.org \
    --cc=j.vosburgh@gmail.com \
    --cc=jbenc@redhat.com \
    --cc=jhs@mojatatu.com \
    --cc=jiri@mellanox.com \
    --cc=kuba@kernel.org \
    --cc=linux-parisc@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=o.rempel@pengutronix.de \
    --cc=pabeni@redhat.com \
    --cc=pablo@netfilter.org \
    --cc=paul.gortmaker@windriver.com \
    --cc=stephen@networkplumber.org \
    --cc=vfalico@gmail.com \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.