From: Kai-Heng Feng <kai.heng.feng@canonical.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>,
"open list:NETWORKING DRIVERS" <netdev@vger.kernel.org>,
open list <linux-kernel@vger.kernel.org>,
"moderated list:INTEL ETHERNET DRIVERS"
<intel-wired-lan@lists.osuosl.org>,
"David S. Miller" <davem@davemloft.net>
Subject: Re: [Intel-wired-lan] [PATCH] igb: Use a sperate mutex insead of rtnl_lock()
Date: Fri, 27 Mar 2020 01:16:39 +0800 [thread overview]
Message-ID: <98E86E5A-4EE9-4CB5-81CF-49C3E74C3AE6@canonical.com> (raw)
In-Reply-To: <CAKgT0UfFnXcSSsXvxk8+xiZvyzDh+8V-9bCT-z5U+MEVoAVKLw@mail.gmail.com>
Hi Alexander,
> On Mar 27, 2020, at 00:27, Alexander Duyck <alexander.duyck@gmail.com> wrote:
>
> On Thu, Mar 26, 2020 at 3:39 AM Kai-Heng Feng
> <kai.heng.feng@canonical.com <mailto:kai.heng.feng@canonical.com>> wrote:
>>
>> Commit 9474933caf21 ("igb: close/suspend race in netif_device_detach")
>> fixed race condition between close and power management ops by using
>> rtnl_lock().
>>
>> This fix is a preparation for next patch, to prevent a dead lock under
>> rtnl_lock() when calling runtime resume routine.
>>
>> However, we can't use device_lock() in igb_close() because when module
>> is getting removed, the lock is already held for igb_remove(), and
>> igb_close() gets called during unregistering the netdev, hence causing a
>> deadlock. So let's introduce a new mutex so we don't cause a deadlock
>> with driver core or netdev core.
>>
>> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
>
> So this description doesn't make much sense to me. You describe the
> use of the device_lock() in igb_close() but it isn't used there.
Sorry I forgot to add a revision number.
It was used by previous version and Aaron found a regression when device_lock() is used.
> In addition it seems like you are arbitrarily moving code that was
> wrapped in the rtnl_lock out of it. I'm not entirely sure that is safe
> since there are calls within many of these functions that assume the
> rtnl_lock is held and changing that is likely going to introduce more
> issues.
The reason why rtnl lock needs to be removed is because of the following patch:
https://lore.kernel.org/lkml/20200207101005.4454-2-kai.heng.feng@canonical.com/
Ethtools helpers already held rtnl_lock, so to prevent a deadlock, my idea is to use another lock to solve what "igb: close/suspend race in netif_device_detach" originally tried to fix.
>
>
>
>> ---
>> drivers/net/ethernet/intel/igb/igb_main.c | 19 +++++++++++++------
>> 1 file changed, 13 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
>> index b46bff8fe056..dc7ed5dd216b 100644
>> --- a/drivers/net/ethernet/intel/igb/igb_main.c
>> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
>> @@ -288,6 +288,8 @@ static const struct igb_reg_info igb_reg_info_tbl[] = {
>> {}
>> };
>>
>> +static DEFINE_MUTEX(igb_mutex);
>> +
>> /* igb_regdump - register printout routine */
>> static void igb_regdump(struct e1000_hw *hw, struct igb_reg_info *reginfo)
>> {
>> @@ -4026,9 +4028,14 @@ static int __igb_close(struct net_device *netdev, bool suspending)
>>
>> int igb_close(struct net_device *netdev)
>> {
>> + int err = 0;
>> +
>> + mutex_lock(&igb_mutex);
>> if (netif_device_present(netdev) || netdev->dismantle)
>> - return __igb_close(netdev, false);
>> - return 0;
>> + err = __igb_close(netdev, false);
>> + mutex_unlock(&igb_mutex);
>> +
>> + return err;
>> }
>>
>
> Okay, so I am guessing the problem has something to do with the
> addition of the netdev->dismantle test here and the fact that it is
> bypassing the present check for the hotplug remove case?
Please see the rationale above.
>
> So it looks like nobody ever really reviewed commit 888f22931478
> ("igb: Free IRQs when device is hotplugged"). What I would recommend
> is reverting it and instead we fix the remaining pieces that need to
> be addressed in igb so it more closely matches what we have in e1000e
> after commit a7023819404a ("e1000e: Use rtnl_lock to prevent race
> conditions between net and pci/pm"). From what I can tell the only
> pieces that are really missing is to update igb_io_error_detected so
> that in addition to igb_down it will call igb_free_irq, and then in
> addition we should be wrapping most of the code in that function with
> an rtnl_lock since it is detaching a device and making modifications
> to it.
In addition to that, igb_shutdown() indirectly calls igb_close() when netdev unregistering the device.
My "only scratch the surface" approach is because I don't have a reproducer for commit "igb: close/suspend race in netif_device_detach", and I am afraid of breaking it.
Kai-Heng
next prev parent reply other threads:[~2020-03-26 17:16 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-03-26 10:39 [PATCH] igb: Use a sperate mutex insead of rtnl_lock() Kai-Heng Feng
2020-03-26 10:41 ` Kai-Heng Feng
2020-03-26 11:16 ` [Intel-wired-lan] " Paul Menzel
2020-03-26 16:27 ` Alexander Duyck
2020-03-26 17:16 ` Kai-Heng Feng [this message]
2020-03-26 20:55 ` Alexander Duyck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=98E86E5A-4EE9-4CB5-81CF-49C3E74C3AE6@canonical.com \
--to=kai.heng.feng@canonical.com \
--cc=alexander.duyck@gmail.com \
--cc=davem@davemloft.net \
--cc=intel-wired-lan@lists.osuosl.org \
--cc=jeffrey.t.kirsher@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).