From: Vladimir Oltean <olteanv@gmail.com>
To: netdev@vger.kernel.org, Jakub Kicinski <kuba@kernel.org>,
	Paul Gortmaker <paul.gortmaker@windriver.com>,
	Stephen Hemminger <stephen@networkplumber.org>,
	Eric Dumazet <edumazet@google.com>, Jiri Benc <jbenc@redhat.com>,
	Or Gerlitz <ogerlitz@mellanox.com>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Andrew Lunn <andrew@lunn.ch>, Florian Fainelli <f.fainelli@gmail.com>
Subject: Re: Correct usage of dev_base_lock in 2020
Date: Sun, 29 Nov 2020 22:58:17 +0200
Message-ID: <20201129205817.hti2l4hm2fbp2iwy@skbuf>
In-Reply-To: <20201129182435.jgqfjbekqmmtaief@skbuf>

[ resent, had forgotten to copy the list ]

Hi,

net/core/dev.c has this to say about the locking rules around the network
interface lists (dev_base_head, and I can only assume that it also applies to
the per-ifindex hash table dev_index_head and the per-name hash table
dev_name_head):

/*
 * The @dev_base_head list is protected by @dev_base_lock and the rtnl
 * semaphore.
 *
 * Pure readers hold dev_base_lock for reading, or rcu_read_lock()
 *
 * Writers must hold the rtnl semaphore while they loop through the
 * dev_base_head list, and hold dev_base_lock for writing when they do the
 * actual updates.  This allows pure readers to access the list even
 * while a writer is preparing to update it.
 *
 * To put it another way, dev_base_lock is held for writing only to
 * protect against pure readers; the rtnl semaphore provides the
 * protection against other writers.
 *
 * See, for example usages, register_netdevice() and
 * unregister_netdevice(), which must be called with the rtnl
 * semaphore held.
 */

However, as of today, most if not all the read-side accessors of the network
interface lists have been converted to run under rcu_read_lock. As Eric explains,

commit fb699dfd426a189fe33b91586c15176a75c8aed0
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date:   Mon Oct 19 19:18:49 2009 +0000

    net: Introduce dev_get_by_index_rcu()

    Some workloads hit dev_base_lock rwlock pretty hard.
    We can use RCU lookups to avoid touching this rwlock.

    netdevices are already freed after a RCU grace period, so this patch
    adds no penalty at device dismantle time.

    dev_ifname() converted to dev_get_by_index_rcu()

    Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

A lot of work has been put into eliminating the dev_base_lock rwlock
completely, as Stephen explained here:

[PATCH 00/10] netdev: get rid of read_lock(&dev_base_lock) usages
https://www.spinics.net/lists/netdev/msg112264.html

However, its use has not been completely eliminated. The lock is still there
and, even more confusingly, so is that comment in net/core/dev.c. The
remaining users of dev_base_lock that I can find are all oddballs:

- The debugfs for mac80211, in net/mac80211/debugfs_netdev.c, holds the read
  side when printing some interface properties (good luck disentangling the
  code and figuring out which ones, though). What is that read-side actually
  protecting against?

- HSR, in net/hsr/hsr_device.c (called from hsr_netdev_notify on NETDEV_UP,
  NETDEV_DOWN and NETDEV_CHANGE), takes the write side of the lock when
  modifying the RFC 2863 operstate of the interface. Why?
  In fact, accessing the RFC 2863 operstate is the most widespread use of
  dev_base_lock in the kernel today. I could only find this truncated
  discussion in the archives:
    Re: Issue 0 WAS (Re: Oustanding issues WAS(IRe: Consensus? WAS(RFC 2863)
    https://www.mail-archive.com/netdev@vger.kernel.org/msg03632.html
  and it said:

    > be transitioned to up/dormant etc. So an ethernet driver doesnt know it
    > needs to go from detecting peer link is up to next being authenticated
    > in the case of 802.1x. It just calls netif_carrier_on which checks
    > link_mode to decide on transition.

    we could protect operstate with a spinlock_irqsave() and then change it either
    from netif_[carrier|dormant]_on/off() or userspace-supplicant. However, I'm
    not feeling good about it. Look at rtnetlink_fill_ifinfo(), it is able to
    query a consistent snapshot of all interface settings as long as locking with
    dev_base_lock and rtnl is obeyed. __LINK_STATE flags are already an
    exemption, and I don't want operstate to be another. That's why I chose
    setting it from linkwatch in process context, and I really think this is the
    correct approach.

- rfc2863_policy() in net/core/link_watch.c seems to be the major writer that
  holds this lock in 2020, together with do_setlink() and set_operstate() from
  net/core/rtnetlink.c. Has the lock been repurposed over the years, and
  should we update its name accordingly?

- This usage from netdev_show() in net/core/net-sysfs.c just looks random to
  me, maybe somebody can explain:

	read_lock(&dev_base_lock);
	if (dev_isalive(ndev))
		ret = (*format)(ndev, buf);
	read_unlock(&dev_base_lock);

- This also looks like nonsense to me, maybe somebody can explain.
  drivers/infiniband/hw/mlx4/main.c, function mlx4_ib_update_qps():

	read_lock(&dev_base_lock);
	new_smac = mlx4_mac_to_u64(dev->dev_addr);
	read_unlock(&dev_base_lock);

  where mlx4_mac_to_u64 does:

static inline u64 mlx4_mac_to_u64(u8 *addr)
{
	u64 mac = 0;
	int i;

	for (i = 0; i < ETH_ALEN; i++) {
		mac <<= 8;
		mac |= addr[i];
	}
	return mac;
}

  basically a duplicate of ether_addr_to_u64. So I can only assume that the
  dev_base_lock was taken to protect against what, against changes to
  dev->dev_addr? :)
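
To make the "duplicate" claim concrete, here is a small userspace sketch that
puts the two helpers side by side. The mlx4 helper is copied from the snippet
quoted above, with u8/u64 spelled as stdint types and a const qualifier added.
The ether_addr_to_u64() body is written from memory of
include/linux/etherdevice.h, so treat it as an assumption rather than a
verbatim copy.

```c
#include <stdint.h>

#define ETH_ALEN 6	/* octets in an Ethernet address, as in the kernel */

/* Userspace copy of the mlx4 helper quoted above. */
static inline uint64_t mlx4_mac_to_u64(const uint8_t *addr)
{
	uint64_t mac = 0;
	int i;

	for (i = 0; i < ETH_ALEN; i++) {
		mac <<= 8;
		mac |= addr[i];
	}
	return mac;
}

/* Userspace rendition of ether_addr_to_u64() (assumed, not verbatim). */
static inline uint64_t ether_addr_to_u64(const uint8_t *addr)
{
	uint64_t u = 0;
	int i;

	/* Same big-endian accumulation: the first octet ends up most significant. */
	for (i = 0; i < ETH_ALEN; i++)
		u = u << 8 | addr[i];

	return u;
}
```

Both functions only read six bytes from the pointer they are given and fold
them into an integer. Neither of them touches any netdev list, which is why
holding dev_base_lock around the call looks so odd.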

So it's clear that dev_base_lock needs to be at least renamed, if not removed
outright, and that at least some of its current uses should go away. But it's
not clear what to rename it to.

Thanks,
-Vladimir
