From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-14.3 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AF3F0C48BD1 for ; Wed, 9 Jun 2021 14:29:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 953BC613AD for ; Wed, 9 Jun 2021 14:29:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237099AbhFIObi (ORCPT ); Wed, 9 Jun 2021 10:31:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38720 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234837AbhFIObg (ORCPT ); Wed, 9 Jun 2021 10:31:36 -0400 Received: from mail-wr1-x431.google.com (mail-wr1-x431.google.com [IPv6:2a00:1450:4864:20::431]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F3101C061574; Wed, 9 Jun 2021 07:29:28 -0700 (PDT) Received: by mail-wr1-x431.google.com with SMTP id f2so25690308wri.11; Wed, 09 Jun 2021 07:29:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=to:cc:references:from:subject:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=9+GD4T1tScerJbWjL85fcVxnAqoIkKZwgXPkR/S+YuE=; b=o/wso2j5go226epNBQU6pzqBT3E0W7049exXHzlA3qgENc1eyXkrrx7yDg/ToYIvzB tKwFX1uxVjlD3n9ZAdQBndqx9v0nisY5sAUa+N/djQog2GefJN2fk19Gxk02hy+5kVCe +VCufOnXC7UEKOwj3GeMI1SPKAhRebVj2wP/WPBopO1LLHPIwpnZWOsehZvAbD9cE1mA Cvigtx7RfK9WoVTH/PhC1NF860wlTFQ9DWquvQnLam/zjIvmBx3s1ZKf+ZAm+Qi5E8/m VEmFJfz5phkOyBAOGG46WHSLWLvLC9Rx9Gif1Uzx4VQAG4OP5KBI1uEw1aRrvp9PIzoe XjHg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:to:cc:references:from:subject:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=9+GD4T1tScerJbWjL85fcVxnAqoIkKZwgXPkR/S+YuE=; b=EBZZGvnuRUs7mUEJRZtcPCPeK2hG/RTGc4135AeGHpMeUkBZagnKebIMHXoKFpDwRW KpKniLa/JTlk8ZCKubJFvw59Cg+WNCVmM0Qft2W8KS8vwuA6+JKyew2Ecm7gdr6Z00mW Qw5OjADSukep4LNxITypMscKcUhwGZHO6lQcv8Ytrlk6r9c5fq7eXJz6axObptnU7rmJ VpYFnJeKlD/x3HGWE55tsYrrbEoPysgTniPHB/+qaxxl3ty4Zk3Pf7MZk0eGbZA9jtOv /KwBr/NWS1j/Ee+Ii0uqOC7zAoOUObV4tpOv7TmEPJ1TE6aDxHa2Spqg9r1B+CZeZYfA NnJQ== X-Gm-Message-State: AOAM532Eu0BGmydgDikDIueSloP/Nv8UQLzz0v6e1Ke7SycAruavX6Hj D1qoAIvEIoEhhNddhZximY2lYlgjvBo= X-Google-Smtp-Source: ABdhPJx4CcHBB6Mc+0Z+JYOuwu9mbXF7yzNQdsv5Wjp558OXEWFLRnI9QIPuQqM2uRPrK+Tc/fQEjA== X-Received: by 2002:adf:f690:: with SMTP id v16mr62142wrp.247.1623248967343; Wed, 09 Jun 2021 07:29:27 -0700 (PDT) Received: from ?IPv6:2003:ea:8f29:3800:a9b1:fb24:eb4f:c8ac? (p200300ea8f293800a9b1fb24eb4fc8ac.dip0.t-ipconnect.de. [2003:ea:8f29:3800:a9b1:fb24:eb4f:c8ac]) by smtp.googlemail.com with ESMTPSA id o3sm146298wrc.0.2021.06.09.07.29.24 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 09 Jun 2021 07:29:26 -0700 (PDT) To: Koba Ko Cc: "David S. Miller" , Jakub Kicinski , netdev@vger.kernel.org, Linux Kernel Mailing List References: <20210608032207.2923574-1-koba.ko@canonical.com> <84eb168e-58ff-0350-74e2-c55249eb258c@gmail.com> <7a36c032-38fa-6aa6-fa0f-c3664850d8ea@gmail.com> <4508fbcb-f8ec-3805-4aa3-eeea4975de31@gmail.com> From: Heiner Kallweit Subject: Re: [PATCH] [v2] r8169: Use PHY_POLL when RTL8106E enable ASPM Message-ID: <51990dff-79f8-ec25-873d-e5d6be2f923c@gmail.com> Date: Wed, 9 Jun 2021 16:29:21 +0200 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 09.06.2021 03:47, Koba Ko wrote: > On Wed, Jun 9, 2021 at 4:58 AM Heiner Kallweit wrote: >> >> On 08.06.2021 16:17, Koba Ko wrote: >>> On Tue, Jun 8, 2021 at 9:45 PM Heiner Kallweit wrote: >>>> >>>> On 08.06.2021 12:43, Koba Ko wrote: >>>>> On Tue, Jun 8, 2021 at 4:00 PM Heiner Kallweit wrote: >>>>>> >>>>>> On 08.06.2021 05:22, Koba Ko wrote: >>>>>>> For RTL8106E, it's a Fast-ethernet chip. >>>>>>> If ASPM is enabled, the link chang interrupt wouldn't be triggered >>>>>>> immediately and must wait a very long time to get link change interrupt. >>>>>>> Even the link change interrupt isn't triggered, the phy link is already >>>>>>> established. >>>>>>> >>>>>>> Use PHY_POLL to watch the status of phy link and disable >>>>>>> the link change interrupt when ASPM is enabled on RTL8106E. >>>>>>> >>>>>>> v2: Instead use PHY_POLL and identify 8106E by RTL_GIGA_MAC_VER_39. >>>>>>> >>>>>> >>>>>> Still the issue description doesn't convince me that it's a hw bug >>>>>> with the respective chip version. What has been stated so far: >>>>>> >>>>>> 1. (and most important) Issue doesn't occur in mainline because ASPM >>>>>> is disabled in mainline for r8169. Issue occurs only with a >>>>>> downstream kernel with ASPM enabled for r8169. >>>>> >>>>> mainline kernel and enable L1, the issue is also observed. >>>>> >>>> Yes, but enabling L1 via sysfs is at own risk. >>> >>> but we could have a workaround if hw have an aspm issue. >>> >>>> >>>>>> 2. Issue occurs only with ASPM L1.1 not disabled, even though this chip >>>>>> version doesn't support L1 sub-states. Just L0s/L1 don't trigger >>>>>> the issue. >>>>>> The NIC doesn't announce L1.1 support, therefore PCI core won't >>>>>> enable L1 sub-states on the PCIe link between NIC and upstream >>>>>> PCI bridge. >>>>> >>>>> More precisely, when L1 is enabled, the issue would be triggered. >>>>> For RTL8106E, >>>>> 1. Only disable L0s, pcie_aspm_enabled return 1, issue is triggered. >>>>> 2. Only disable L1_1, pcie_aspm_enabled return 1, issue is triggered. >>>>> >>>>> 3. Only disable L1, pcie_aspm_enabled return 0, issue is not triggered. >>>>> >>>>>> >>>>>> 3. Issue occurs only with a GBit-capable link partner. 100MBit link >>>>>> partners are fine. Not clear whether issue occurs with a specific >>>>>> Gbit link partner only or with GBit-capable link partners in general. >>>>>> >>>>>> 4. Only link-up interrupt is affected. Not link-down and not interrupts >>>>>> triggered by other interrupt sources. >>>>>> >>>>>> 5. Realtek couldn't confirm that there's such a hw bug on RTL8106e. >>>>>> >>>>>> One thing that hasn't been asked yet: >>>>>> Does issue occur always if you re-plug the cable? Or only on boot? >>>>>> I'm asking because in the dmesg log you attached to the bugzilla issue >>>>>> the following looks totally ok. >>>>>> >>>>>> [ 61.651643] r8169 0000:01:00.0 enp1s0: Link is Down >>>>>> [ 63.720015] r8169 0000:01:00.0 enp1s0: Link is Up - 100Mbps/Full - flow control rx/tx >>>>>> [ 66.685499] r8169 0000:01:00.0 enp1s0: Link is Down >>>>> >>>>> Once the link is up, >>>>> 1. If cable is unplug&plug immediately, you wouldn't see the issue. >>>>> 2. Unplug cable and wait a long time (~1Mins), then plug the cable, >>>>> the issue appears again. >>>>> >>>> This sounds runtime-pm-related. After 10s the NIC runtime-suspends, >>>> and once the cable is re-plugged a PME is triggered that lets the >>>> PCI core return the PCIe link from D3hot to D0. >>>> If you re-plug the cable after such a longer time, do you see the >>>> PCIe PME immediately in /proc/interrupts? >>> >>> I don't know which irq number is for RTL8106e PME, but check all PME >>> in /proc/interrupt, >>> >>> There's no interrupt increase on all PME entries and rtl8106e irq entry(irq 32). >>> >>>> >>>> And if you set /sys/class/net//power/control to "on", does the >>>> issue still occur? >>> >>> Yes, after echo "on" to /sys/class/net//power/control and >>> replugging the cable, the issue is still caught. >>>> >>>>>> >>>>>>> Signed-off-by: Koba Ko >>>>>>> --- >>>>>>> drivers/net/ethernet/realtek/r8169_main.c | 21 +++++++++++++++++++-- >>>>>>> 1 file changed, 19 insertions(+), 2 deletions(-) >>>>>>> >>>>>>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c >>>>>>> index 2c89cde7da1e..a59cbaef2839 100644 >>>>>>> --- a/drivers/net/ethernet/realtek/r8169_main.c >>>>>>> +++ b/drivers/net/ethernet/realtek/r8169_main.c >>>>>>> @@ -4914,6 +4914,19 @@ static const struct dev_pm_ops rtl8169_pm_ops = { >>>>>>> >>>>>>> #endif /* CONFIG_PM */ >>>>>>> >>>>>>> +static int rtl_phy_poll_quirk(struct rtl8169_private *tp) >>>>>>> +{ >>>>>>> + struct pci_dev *pdev = tp->pci_dev; >>>>>>> + >>>>>>> + if (!pcie_aspm_enabled(pdev)) >>>>>> >>>>>> That's the wrong call. According to what you said earlier you want to >>>>>> check for L1 sub-states, not for ASPM in general. >>>>> >>>>> As per described above, that's why use pcie_aspm_enabled here. >>>>> >>>>>> >>>>>>> + return 0; >>>>>>> + >>>>>>> + if (tp->mac_version == RTL_GIGA_MAC_VER_39) >>>>>>> + return 1; >>>>>>> + >>>>>>> + return 0; >>>>>>> +} >>>>>>> + >>>>>>> static void rtl_wol_shutdown_quirk(struct rtl8169_private *tp) >>>>>>> { >>>>>>> /* WoL fails with 8168b when the receiver is disabled. */ >>>>>>> @@ -4991,7 +5004,10 @@ static const struct net_device_ops rtl_netdev_ops = { >>>>>>> >>>>>>> static void rtl_set_irq_mask(struct rtl8169_private *tp) >>>>>>> { >>>>>>> - tp->irq_mask = RxOK | RxErr | TxOK | TxErr | LinkChg; >>>>>>> + tp->irq_mask = RxOK | RxErr | TxOK | TxErr; >>>>>>> + >>>>>>> + if (!rtl_phy_poll_quirk(tp)) >>>>>>> + tp->irq_mask |= LinkChg; >>>>>>> >>>>>>> if (tp->mac_version <= RTL_GIGA_MAC_VER_06) >>>>>>> tp->irq_mask |= SYSErr | RxOverflow | RxFIFOOver; >>>>>>> @@ -5085,7 +5101,8 @@ static int r8169_mdio_register(struct rtl8169_private *tp) >>>>>>> new_bus->name = "r8169"; >>>>>>> new_bus->priv = tp; >>>>>>> new_bus->parent = &pdev->dev; >>>>>>> - new_bus->irq[0] = PHY_MAC_INTERRUPT; >>>>>>> + new_bus->irq[0] = >>>>>>> + (rtl_phy_poll_quirk(tp) ? PHY_POLL : PHY_MAC_INTERRUPT); >>>>>>> snprintf(new_bus->id, MII_BUS_ID_SIZE, "r8169-%x", pci_dev_id(pdev)); >>>>>>> >>>>>>> new_bus->read = r8169_mdio_read_reg; >>>>>>> >>>>>> >>>> >> >> The r8101 vendor driver applies a special setting for RTL8106e if ASPM is enabled. >> Not sure whether it's related but it's worth a try. Could you please check whether >> the following makes a difference? > > After applying this patch, it can't help relieve the issue. > The 8101 vendor driver also use polling method to watch the link change event. > Tried to disable polling method and enable LinkChg Irq, but it also > can't get LinkChg interrupt. > OK, thanks for testing. As it is now, rtl_phy_poll_quirk() would always return false because ASPM is disabled on driver load. Changing ASPM settings can be done later via sysfs if supported by device and upstream bridge. In addition: - If you want to return a bool, declare the return value as bool. - You shouldn't have to disable the LinkChg interrupt. The interrupt just triggers a phylib state machine run what may result in a faster link-up signal even when using polling. By the way: All Realtek vendor drivers (r8101, r8168, r8125) don't use the LinkChg interrupt. However I don't think this is because of known hw issues, more likely it's because they don't use phylib and thought polling is the easier approach. At least for r8169 I haven't heard yet of any complain regarding the LinkChg interrupt. >> >> diff --git a/drivers/net/ethernet/realtek/r8169_phy_config.c b/drivers/net/ethernet/realtek/r8169_phy_config.c >> index 50f0f621b..60014b9c4 100644 >> --- a/drivers/net/ethernet/realtek/r8169_phy_config.c >> +++ b/drivers/net/ethernet/realtek/r8169_phy_config.c >> @@ -1153,6 +1153,7 @@ static void rtl8106e_hw_phy_config(struct rtl8169_private *tp, >> r8169_apply_firmware(tp); >> >> rtl_writephy_batch(phydev, phy_reg_init); >> + phy_write(phydev, 0x18, 0x8310); >> } >> >> static void rtl8125_legacy_force_mode(struct phy_device *phydev) >> -- >> 2.32.0 >> >>