From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.4 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6894AC4320A for ; Mon, 16 Aug 2021 13:16:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4CB3760F46 for ; Mon, 16 Aug 2021 13:16:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238969AbhHPNQp (ORCPT ); Mon, 16 Aug 2021 09:16:45 -0400 Received: from mail.kernel.org ([198.145.29.99]:58216 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237150AbhHPNJL (ORCPT ); Mon, 16 Aug 2021 09:09:11 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id 5047D6113D; Mon, 16 Aug 2021 13:08:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1629119281; bh=4YvZByB9nPGrmEp4OvMsfBVOnrJYSppjoNbvz70+u/Y=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=dOvnPqZ9gIWy90m9B3WrRbv06iAnqxD9LMirrTOsrAd82tOXzbPdfq6I8iz6wfXnM uZ+LTv3K2Vwf2B3SHDiAlQPEDpYm8bSVASI6IsFLqO0Gx1TrNMrZVHAMSCZpokViIs GzzDw6tu2gSFby7okuzK2ZMPTPwHaLgHdTfcwKkk= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Heiner Kallweit , Geert Uytterhoeven , Florian Fainelli , Willy Tarreau , Jakub Kicinski , Sasha Levin Subject: [PATCH 5.10 61/96] net: linkwatch: fix failure to restore device state across suspend/resume Date: Mon, 16 Aug 2021 15:02:11 +0200 Message-Id: <20210816125436.982294079@linuxfoundation.org> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20210816125434.948010115@linuxfoundation.org> References: <20210816125434.948010115@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Willy Tarreau [ Upstream commit 6922110d152e56d7569616b45a1f02876cf3eb9f ] After migrating my laptop from 4.19-LTS to 5.4-LTS a while ago I noticed that my Ethernet port to which a bond and a VLAN interface are attached appeared to remain up after resuming from suspend with the cable unplugged (and that problem still persists with 5.10-LTS). It happens that the following happens: - the network driver (e1000e here) prepares to suspend, calls e1000e_down() which calls netif_carrier_off() to signal that the link is going down. - netif_carrier_off() adds a link_watch event to the list of events for this device - the device is completely stopped. - the machine suspends - the cable is unplugged and the machine brought to another location - the machine is resumed - the queued linkwatch events are processed for the device - the device doesn't yet have the __LINK_STATE_PRESENT bit and its events are silently dropped - the device is resumed with its link down - the upper VLAN and bond interfaces are never notified that the link had been turned down and remain up - the only way to provoke a change is to physically connect the machine to a port and possibly unplug it. The state after resume looks like this: $ ip -br li | egrep 'bond|eth' bond0 UP e8:6a:64:64:64:64 eth0 DOWN e8:6a:64:64:64:64 eth0.2@eth0 UP e8:6a:64:64:64:64 Placing an explicit call to netdev_state_change() either in the suspend or the resume code in the NIC driver worked around this but the solution is not satisfying. The issue in fact really is in link_watch that loses events while it ought not to. It happens that the test for the device being present was added by commit 124eee3f6955 ("net: linkwatch: add check for netdevice being present to linkwatch_do_dev") in 4.20 to avoid an access to devices that are not present. Instead of dropping events, this patch proceeds slightly differently by postponing their handling so that they happen after the device is fully resumed. Fixes: 124eee3f6955 ("net: linkwatch: add check for netdevice being present to linkwatch_do_dev") Link: https://lists.openwall.net/netdev/2018/03/15/62 Cc: Heiner Kallweit Cc: Geert Uytterhoeven Cc: Florian Fainelli Signed-off-by: Willy Tarreau Link: https://lore.kernel.org/r/20210809160628.22623-1-w@1wt.eu Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- net/core/link_watch.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/net/core/link_watch.c b/net/core/link_watch.c index 75431ca9300f..1a455847da54 100644 --- a/net/core/link_watch.c +++ b/net/core/link_watch.c @@ -158,7 +158,7 @@ static void linkwatch_do_dev(struct net_device *dev) clear_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state); rfc2863_policy(dev); - if (dev->flags & IFF_UP && netif_device_present(dev)) { + if (dev->flags & IFF_UP) { if (netif_carrier_ok(dev)) dev_activate(dev); else @@ -204,7 +204,8 @@ static void __linkwatch_run_queue(int urgent_only) dev = list_first_entry(&wrk, struct net_device, link_watch_list); list_del_init(&dev->link_watch_list); - if (urgent_only && !linkwatch_urgent_event(dev)) { + if (!netif_device_present(dev) || + (urgent_only && !linkwatch_urgent_event(dev))) { list_add_tail(&dev->link_watch_list, &lweventlist); continue; } -- 2.30.2