* Intel I350 regression 5.10 -> 5.14 ("The NVM Checksum Is Not Valid") [8086:1521] @ 2021-10-04 13:06 Andreas K. Huettel 2021-10-04 14:48 ` [Intel-wired-lan] " Jakub Kicinski 0 siblings, 1 reply; 35+ messages in thread From: Andreas K. Huettel @ 2021-10-04 13:06 UTC (permalink / raw) To: netdev [-- Attachment #1: Type: text/plain, Size: 1952 bytes --] Dear all, I hope this is the right place to ask, if not please advise me where to go. I have a new Dell machine with both an Intel on-board ethernet controller ([8086:15f9]) and an additional 2-port extension card ([8086:1521]). The second adaptor, a "DeLock PCIe 2xGBit", worked fine as far as I could see with Linux 5.10.59, but fails to initialize with Linux 5.14.9. dilfridge ~ # lspci -nn [...] 01:00.0 Ethernet controller [0200]: Intel Corporation I350 Gigabit Network Connection [8086:1521] (rev ff) 01:00.1 Ethernet controller [0200]: Intel Corporation I350 Gigabit Network Connection [8086:1521] (rev ff) [...] dilfridge ~ # dmesg|grep igb [ 2.069286] igb: Intel(R) Gigabit Ethernet Network Driver [ 2.069288] igb: Copyright (c) 2007-2014 Intel Corporation. [ 2.069305] igb 0000:01:00.0: can't change power state from D3cold to D0 (config space inaccessible) [ 2.069624] igb 0000:01:00.0 0000:01:00.0 (uninitialized): PCIe link lost [ 2.386659] igb 0000:01:00.0: PHY reset is blocked due to SOL/IDER session. [ 4.115500] igb 0000:01:00.0: The NVM Checksum Is Not Valid [ 4.133807] igb: probe of 0000:01:00.0 failed with error -5 [ 4.133820] igb 0000:01:00.1: can't change power state from D3cold to D0 (config space inaccessible) [ 4.134072] igb 0000:01:00.1 0000:01:00.1 (uninitialized): PCIe link lost [ 4.451602] igb 0000:01:00.1: PHY reset is blocked due to SOL/IDER session. [ 6.180123] igb 0000:01:00.1: The NVM Checksum Is Not Valid [ 6.188631] igb: probe of 0000:01:00.1 failed with error -5 Any advice on how to proceed? Willing to test patches and provide additional debug info. Thanks, Andreas -- PD Dr. Andreas K. 
Huettel Institute for Experimental and Applied Physics University of Regensburg 93040 Regensburg Germany tel. +49 151 241 67748 (mobile) tel. +49 941 943 1618 (office) e-mail andreas.huettel@ur.de http://www.akhuettel.de/ http://www.physik.uni-r.de/forschung/huettel/ [-- Attachment #2: This is a digitally signed message part. --] [-- Type: application/pgp-signature, Size: 981 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Intel I350 regression 5.10 -> 5.14 ("The NVM Checksum Is Not Valid") [8086:1521] 2021-10-04 13:06 Intel I350 regression 5.10 -> 5.14 ("The NVM Checksum Is Not Valid") [8086:1521] Andreas K. Huettel @ 2021-10-04 14:48 ` Jakub Kicinski 0 siblings, 0 replies; 35+ messages in thread From: Jakub Kicinski @ 2021-10-04 14:48 UTC (permalink / raw) To: Andreas K. Huettel; +Cc: netdev, intel-wired-lan, Sasha Neftin On Mon, 04 Oct 2021 15:06:31 +0200 Andreas K. Huettel wrote: > Dear all, > > I hope this is the right place to ask, if not please advise me where to go. Adding intel-wired-lan@lists.osuosl.org and Sasha as well. > I have a new Dell machine with both an Intel on-board ethernet controller > ([8086:15f9]) and an additional 2-port extension card ([8086:1521]). > > The second adaptor, a "DeLock PCIe 2xGBit", worked fine as far as I could > see with Linux 5.10.59, but fails to initialize with Linux 5.14.9. > > dilfridge ~ # lspci -nn > [...] > 01:00.0 Ethernet controller [0200]: Intel Corporation I350 Gigabit Network Connection [8086:1521] (rev ff) > 01:00.1 Ethernet controller [0200]: Intel Corporation I350 Gigabit Network Connection [8086:1521] (rev ff) > [...] > > dilfridge ~ # dmesg|grep igb > [ 2.069286] igb: Intel(R) Gigabit Ethernet Network Driver > [ 2.069288] igb: Copyright (c) 2007-2014 Intel Corporation. > [ 2.069305] igb 0000:01:00.0: can't change power state from D3cold to D0 (config space inaccessible) > [ 2.069624] igb 0000:01:00.0 0000:01:00.0 (uninitialized): PCIe link lost > [ 2.386659] igb 0000:01:00.0: PHY reset is blocked due to SOL/IDER session. > [ 4.115500] igb 0000:01:00.0: The NVM Checksum Is Not Valid > [ 4.133807] igb: probe of 0000:01:00.0 failed with error -5 > [ 4.133820] igb 0000:01:00.1: can't change power state from D3cold to D0 (config space inaccessible) > [ 4.134072] igb 0000:01:00.1 0000:01:00.1 (uninitialized): PCIe link lost > [ 4.451602] igb 0000:01:00.1: PHY reset is blocked due to SOL/IDER session. 
> [ 6.180123] igb 0000:01:00.1: The NVM Checksum Is Not Valid > [ 6.188631] igb: probe of 0000:01:00.1 failed with error -5 > > Any advice on how to proceed? Willing to test patches and provide additional debug info. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [Intel-wired-lan] Intel I350 regression 5.10 -> 5.14 ("The NVM Checksum Is Not Valid") [8086:1521] 2021-10-04 14:48 ` [Intel-wired-lan] " Jakub Kicinski @ 2021-10-04 23:39 ` Hisashi T Fujinaka -1 siblings, 0 replies; 35+ messages in thread From: Hisashi T Fujinaka @ 2021-10-04 23:39 UTC (permalink / raw) To: Jakub Kicinski; +Cc: Andreas K. Huettel, netdev, intel-wired-lan On Mon, 4 Oct 2021, Jakub Kicinski wrote: > On Mon, 04 Oct 2021 15:06:31 +0200 Andreas K. Huettel wrote: >> Dear all, >> >> I hope this is the right place to ask, if not please advise me where to go. > > Adding intel-wired-lan@lists.osuosl.org and Sasha as well. > >> I have a new Dell machine with both an Intel on-board ethernet controller >> ([8086:15f9]) and an additional 2-port extension card ([8086:1521]). >> >> The second adaptor, a "DeLock PCIe 2xGBit", worked fine as far as I could >> see with Linux 5.10.59, but fails to initialize with Linux 5.14.9. >> >> dilfridge ~ # lspci -nn >> [...] >> 01:00.0 Ethernet controller [0200]: Intel Corporation I350 Gigabit Network Connection [8086:1521] (rev ff) >> 01:00.1 Ethernet controller [0200]: Intel Corporation I350 Gigabit Network Connection [8086:1521] (rev ff) >> [...] >> >> dilfridge ~ # dmesg|grep igb >> [ 2.069286] igb: Intel(R) Gigabit Ethernet Network Driver >> [ 2.069288] igb: Copyright (c) 2007-2014 Intel Corporation. >> [ 2.069305] igb 0000:01:00.0: can't change power state from D3cold to D0 (config space inaccessible) >> [ 2.069624] igb 0000:01:00.0 0000:01:00.0 (uninitialized): PCIe link lost >> [ 2.386659] igb 0000:01:00.0: PHY reset is blocked due to SOL/IDER session. 
>> [ 4.115500] igb 0000:01:00.0: The NVM Checksum Is Not Valid >> [ 4.133807] igb: probe of 0000:01:00.0 failed with error -5 >> [ 4.133820] igb 0000:01:00.1: can't change power state from D3cold to D0 (config space inaccessible) >> [ 4.134072] igb 0000:01:00.1 0000:01:00.1 (uninitialized): PCIe link lost >> [ 4.451602] igb 0000:01:00.1: PHY reset is blocked due to SOL/IDER session. >> [ 6.180123] igb 0000:01:00.1: The NVM Checksum Is Not Valid >> [ 6.188631] igb: probe of 0000:01:00.1 failed with error -5 >> >> Any advice on how to proceed? Willing to test patches and provide additional debug info. Sorry to reply from a non-Intel account. I would suggest first contacting Dell, and then contacting DeLock. This sounds like an issue with motherboard firmware and most of what I can help with would be with the driver. I think the issues are probably before things get to the driver. Todd Fujinaka <todd.fujinaka@intel.com> ^ permalink raw reply [flat|nested] 35+ messages in thread
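The point that the trouble starts before the driver can do useful work is visible in the ordering of the quoted log. As a minimal sketch (the regular expression and the helper name `first_message_per_device` are illustrative assumptions, not part of any kernel tooling), the following picks out the first per-device igb message for each PCI function. For both ports it is the D3cold power-state failure, not the NVM checksum error named in the subject.

```python
import re

# Matches per-device igb lines such as
#   "[ 2.069305] igb 0000:01:00.0: can't change power state ..."
#   "[ 2.069624] igb 0000:01:00.0 0000:01:00.0 (uninitialized): PCIe link lost"
# Driver-wide lines ("igb: probe of ...") are skipped on purpose.
IGB_LINE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+igb\s+"
    r"([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"
    r"(?:\s+\S+\s+\(uninitialized\))?:\s+(.*)"
)


def first_message_per_device(log_text):
    """Map each PCI address to its earliest (timestamp, message) pair."""
    first = {}
    for line in log_text.splitlines():
        match = IGB_LINE.search(line)
        if match is None:
            continue
        timestamp, device, message = match.groups()
        # setdefault keeps only the first message seen for each device.
        first.setdefault(device, (float(timestamp), message))
    return first


# Log lines copied from the report above.
SAMPLE_LOG = """\
[ 2.069286] igb: Intel(R) Gigabit Ethernet Network Driver
[ 2.069305] igb 0000:01:00.0: can't change power state from D3cold to D0 (config space inaccessible)
[ 2.069624] igb 0000:01:00.0 0000:01:00.0 (uninitialized): PCIe link lost
[ 4.115500] igb 0000:01:00.0: The NVM Checksum Is Not Valid
[ 4.133807] igb: probe of 0000:01:00.0 failed with error -5
[ 4.133820] igb 0000:01:00.1: can't change power state from D3cold to D0 (config space inaccessible)
[ 6.180123] igb 0000:01:00.1: The NVM Checksum Is Not Valid
"""

for device, (timestamp, message) in first_message_per_device(SAMPLE_LOG).items():
    print(f"{device} @ {timestamp:.6f}: {message}")
```

The same helper can be fed the output of `dmesg` from the working kernel to see which messages are new.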
* Re: [EXT] Re: [Intel-wired-lan] Intel I350 regression 5.10 -> 5.14 ("The NVM Checksum Is Not Valid") [8086:1521] 2021-10-04 23:39 ` Hisashi T Fujinaka @ 2021-10-05 0:12 ` Andreas K. Huettel -1 siblings, 0 replies; 35+ messages in thread From: Andreas K. Huettel @ 2021-10-05 0:12 UTC (permalink / raw) To: Jakub Kicinski, Hisashi T Fujinaka; +Cc: netdev, intel-wired-lan [-- Attachment #1: Type: text/plain, Size: 561 bytes --] > >> > >> Any advice on how to proceed? Willing to test patches and provide > >> additional debug info. > Sorry to reply from a non-Intel account. I would suggest first > contacting Dell, and then contacting DeLock. This sounds like an > issue with motherboard firmware and most of what I can help with > would be with the driver. I think the issues are probably before > things get to the driver. Ouch. OK. Can you think of any temporary workaround? (Other than downgrading to 5.10 again, which I can't since it fails at the graphics (i915) modesetting...) [-- Attachment #2: This is a digitally signed message part. --] [-- Type: application/pgp-signature, Size: 981 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [EXT] Re: [Intel-wired-lan] Intel I350 regression 5.10 -> 5.14 ("The NVM Checksum Is Not Valid") [8086:1521] 2021-10-05 0:12 ` [Intel-wired-lan] [EXT] " Andreas K. Huettel @ 2021-10-05 0:21 ` Hisashi T Fujinaka -1 siblings, 0 replies; 35+ messages in thread From: Hisashi T Fujinaka @ 2021-10-05 0:21 UTC (permalink / raw) To: Andreas K. Huettel; +Cc: Jakub Kicinski, netdev, intel-wired-lan On Tue, 5 Oct 2021, Andreas K. Huettel wrote: >>>> >>>> Any advice on how to proceed? Willing to test patches and provide >>>> additional debug info. >> Sorry to reply from a non-Intel account. I would suggest first >> contacting Dell, and then contacting DeLock. This sounds like an >> issue with motherboard firmware and most of what I can help with >> would be with the driver. I think the issues are probably before >> things get to the driver. > > Ouch. OK. Can you think of any temporary workaround? > > (Other than downgrading to 5.10 again, which I can't since it fails > at the graphics (i915) modesetting...) This is completely unofficial because I don't really work on client systems, but I'd try different NICs, different slots, and the BIOS settings. You also might try support@intel.com because they're much more used to client system support. Todd Fujinaka todd.fujinaka@intel.com ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [Intel-wired-lan] Intel I350 regression 5.10 -> 5.14 ("The NVM Checksum Is Not Valid") [8086:1521] 2021-10-04 23:39 ` Hisashi T Fujinaka @ 2021-10-05 6:50 ` Sasha Neftin -1 siblings, 0 replies; 35+ messages in thread From: Sasha Neftin @ 2021-10-05 6:50 UTC (permalink / raw) To: Hisashi T Fujinaka, Jakub Kicinski Cc: netdev, intel-wired-lan, Neftin, Sasha, Nguyen, Anthony L On 10/5/2021 02:39, Hisashi T Fujinaka wrote: > On Mon, 4 Oct 2021, Jakub Kicinski wrote: > >> On Mon, 04 Oct 2021 15:06:31 +0200 Andreas K. Huettel wrote: >>> Dear all, >>> >>> I hope this is the right place to ask, if not please advise me where >>> to go. >> >> Adding intel-wired-lan@lists.osuosl.org and Sasha as well. >> >>> I have a new Dell machine with both an Intel on-board ethernet >>> controller >>> ([8086:15f9]) and an additional 2-port extension card ([8086:1521]). >>> >>> The second adaptor, a "DeLock PCIe 2xGBit", worked fine as far as I >>> could >>> see with Linux 5.10.59, but fails to initialize with Linux 5.14.9. >>> >>> dilfridge ~ # lspci -nn >>> [...] >>> 01:00.0 Ethernet controller [0200]: Intel Corporation I350 Gigabit >>> Network Connection [8086:1521] (rev ff) >>> 01:00.1 Ethernet controller [0200]: Intel Corporation I350 Gigabit >>> Network Connection [8086:1521] (rev ff) >>> [...] >>> >>> dilfridge ~ # dmesg|grep igb >>> [ 2.069286] igb: Intel(R) Gigabit Ethernet Network Driver >>> [ 2.069288] igb: Copyright (c) 2007-2014 Intel Corporation. >>> [ 2.069305] igb 0000:01:00.0: can't change power state from D3cold >>> to D0 (config space inaccessible) >>> [ 2.069624] igb 0000:01:00.0 0000:01:00.0 (uninitialized): PCIe >>> link lost >>> [ 2.386659] igb 0000:01:00.0: PHY reset is blocked due to SOL/IDER >>> session. 
>>> [ 4.115500] igb 0000:01:00.0: The NVM Checksum Is Not Valid >>> [ 4.133807] igb: probe of 0000:01:00.0 failed with error -5 >>> [ 4.133820] igb 0000:01:00.1: can't change power state from D3cold >>> to D0 (config space inaccessible) >>> [ 4.134072] igb 0000:01:00.1 0000:01:00.1 (uninitialized): PCIe >>> link lost >>> [ 4.451602] igb 0000:01:00.1: PHY reset is blocked due to SOL/IDER >>> session. >>> [ 6.180123] igb 0000:01:00.1: The NVM Checksum Is Not Valid >>> [ 6.188631] igb: probe of 0000:01:00.1 failed with error -5 >>> >>> Any advice on how to proceed? Willing to test patches and provide >>> additional debug info. > > Sorry to reply from a non-Intel account. I would suggest first > contacting Dell, and then contacting DeLock. This sounds like an issue > with motherboard firmware and most of what I can help with would be with > the driver. I think the issues are probably before things get to the > driver. > Agreed. The driver starts to work only once the PCIe link is in L0. Please check with Dell/DeLock what the PCIe link status is and whether the enumeration process finished properly (you will probably need a PCIe sniffer). > Todd Fujinaka <todd.fujinaka@intel.com> ^ permalink raw reply [flat|nested] 35+ messages in thread
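Short of a PCIe sniffer, some of the link and power information asked for here can be read from sysfs on the affected machine. The following is a rough sketch under stated assumptions: the helper names `config_all_ones` and `probe_device` are made up for illustration, and the `power_state`, `current_link_speed` and `current_link_width` attributes may be absent on some kernels or devices, in which case the sketch reports `None`. A config space that reads back as all 0xff bytes usually means the device is not responding, which is consistent with the "(rev ff)" shown by lspci in the report.

```python
from pathlib import Path

SYSFS_PCI = Path("/sys/bus/pci/devices")
TEXT_ATTRS = ("power_state", "current_link_speed", "current_link_width")


def config_all_ones(config_bytes):
    """True if every byte read from config space is 0xff (device not answering)."""
    return len(config_bytes) > 0 and all(b == 0xFF for b in config_bytes)


def probe_device(bdf, root=SYSFS_PCI):
    """Collect link/power attributes for one PCI function, e.g. '0000:01:00.0'.

    Attributes that cannot be read (missing file, no permission) become None.
    """
    dev = Path(root) / bdf
    info = {}
    for name in TEXT_ATTRS:
        try:
            info[name] = (dev / name).read_text().strip()
        except OSError:
            info[name] = None
    try:
        # The first 64 bytes are the standard config header.
        info["config_all_ones"] = config_all_ones((dev / "config").read_bytes()[:64])
    except OSError:
        info["config_all_ones"] = None
    return info


if __name__ == "__main__":
    # PCI addresses taken from the lspci output in the report.
    for bdf in ("0000:01:00.0", "0000:01:00.1"):
        print(bdf, probe_device(bdf))
```

Comparing this output between the working and the failing kernel would show whether the link is already down before igb probes the device.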
* Re: [Intel-wired-lan] Intel I350 regression 5.10 -> 5.14 ("The NVM Checksum Is Not Valid") [8086:1521] 2021-10-05 6:50 ` Sasha Neftin @ 2021-10-05 9:40 ` Paul Menzel -1 siblings, 0 replies; 35+ messages in thread From: Paul Menzel @ 2021-10-05 9:40 UTC (permalink / raw) To: Sasha Neftin Cc: netdev, intel-wired-lan, Hisashi T Fujinaka, Jakub Kicinski, Andreas K. Huettel Dear Sasha, Am 05.10.21 um 08:50 schrieb Sasha Neftin: > On 10/5/2021 02:39, Hisashi T Fujinaka wrote: >> On Mon, 4 Oct 2021, Jakub Kicinski wrote: >> >>> On Mon, 04 Oct 2021 15:06:31 +0200 Andreas K. Huettel wrote: >>>> Dear all, >>>> >>>> I hope this is the right place to ask, if not please advise me where >>>> to go. >>> >>> Adding intel-wired-lan@lists.osuosl.org and Sasha as well. >>> >>>> I have a new Dell machine with both an Intel on-board ethernet >>>> controller >>>> ([8086:15f9]) and an additional 2-port extension card ([8086:1521]). >>>> >>>> The second adaptor, a "DeLock PCIe 2xGBit", worked fine as far as I >>>> could >>>> see with Linux 5.10.59, but fails to initialize with Linux 5.14.9. >>>> >>>> dilfridge ~ # lspci -nn >>>> [...] >>>> 01:00.0 Ethernet controller [0200]: Intel Corporation I350 Gigabit Network Connection [8086:1521] (rev ff) >>>> 01:00.1 Ethernet controller [0200]: Intel Corporation I350 Gigabit Network Connection [8086:1521] (rev ff) >>>> [...] >>>> >>>> dilfridge ~ # dmesg|grep igb >>>> [ 2.069286] igb: Intel(R) Gigabit Ethernet Network Driver >>>> [ 2.069288] igb: Copyright (c) 2007-2014 Intel Corporation. >>>> [ 2.069305] igb 0000:01:00.0: can't change power state from D3cold to D0 (config space inaccessible) >>>> [ 2.069624] igb 0000:01:00.0 0000:01:00.0 (uninitialized): PCIe link lost >>>> [ 2.386659] igb 0000:01:00.0: PHY reset is blocked due to SOL/IDER session. 
>>>> [ 4.115500] igb 0000:01:00.0: The NVM Checksum Is Not Valid >>>> [ 4.133807] igb: probe of 0000:01:00.0 failed with error -5 >>>> [ 4.133820] igb 0000:01:00.1: can't change power state from D3cold to D0 (config space inaccessible) >>>> [ 4.134072] igb 0000:01:00.1 0000:01:00.1 (uninitialized): PCIe link lost >>>> [ 4.451602] igb 0000:01:00.1: PHY reset is blocked due to SOL/IDER session. >>>> [ 6.180123] igb 0000:01:00.1: The NVM Checksum Is Not Valid >>>> [ 6.188631] igb: probe of 0000:01:00.1 failed with error -5 >>>> >>>> Any advice on how to proceed? Willing to test patches and provide >>>> additional debug info. >> >> Sorry to reply from a non-Intel account. I would suggest first >> contacting Dell, and then contacting DeLock. This sounds like an issue >> with motherboard firmware and most of what I can help with would be with >> the driver. I think the issues are probably before things get to the >> driver. >> > Agreed. The driver starts to work only once the PCIe link is in L0. Please check > with Dell/DeLock what the PCIe link status is and whether the enumeration process > finished properly (you will probably need a PCIe sniffer). Of course, it’d be great to fix potential firmware bugs, but to suggest to a consumer to work with Dell to fix the problem is unfortunately not a realistic solution if Andreas does not own thousands of the problematic systems. Linux has a no-regression policy, meaning when userspace/hardware with an older Linux kernel worked, then it *has to* work with a new version too. So besides fixing the firmware/system, it’s as important to find the commit introducing the regression and fix it. Kind regards, Paul ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [Intel-wired-lan] Intel I350 regression 5.10 -> 5.14 ("The NVM Checksum Is Not Valid") [8086:1521] 2021-10-05 9:40 ` Paul Menzel @ 2021-10-05 18:20 ` Hisashi T Fujinaka -1 siblings, 0 replies; 35+ messages in thread From: Hisashi T Fujinaka @ 2021-10-05 18:20 UTC (permalink / raw) To: Paul Menzel Cc: Sasha Neftin, netdev, intel-wired-lan, Jakub Kicinski, Andreas K. Huettel On Tue, 5 Oct 2021, Paul Menzel wrote: > Linux has a no-regression policy, meaning when userspace/hardware with an > older Linux kernel worked, then it *has to* work with a new version too. So > besides fixing the firmware/system, it’s as important to find the commit > introducing the regression and fix it. I think you're looking at the wrong driver. igb is fairly stable and we haven't been poking at it much. Most of the changes have been from the community. Sasha is committing to igc, not igb. In any case, we don't have the hardware (motherboard or NIC) and any bisection will have to be done by the issue submitter. Todd Fujinaka todd.fujinaka@intel.com ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [Intel-wired-lan] Intel I350 regression 5.10 -> 5.14 ("The NVM Checksum Is Not Valid") [8086:1521] 2021-10-04 14:48 ` [Intel-wired-lan] " Jakub Kicinski @ 2021-10-05 9:34 ` Paul Menzel -1 siblings, 0 replies; 35+ messages in thread From: Paul Menzel @ 2021-10-05 9:34 UTC (permalink / raw) To: Andreas K. Huettel; +Cc: netdev, intel-wired-lan, Jakub Kicinski Dear Andreas, Am 04.10.21 um 16:48 schrieb Jakub Kicinski: > On Mon, 04 Oct 2021 15:06:31 +0200 Andreas K. Huettel wrote: >> I hope this is the right place to ask, if not please advise me where to go. > > Adding intel-wired-lan@lists.osuosl.org and Sasha as well. > >> I have a new Dell machine with both an Intel on-board ethernet controller >> ([8086:15f9]) and an additional 2-port extension card ([8086:1521]). >> >> The second adaptor, a "DeLock PCIe 2xGBit", worked fine as far as I could >> see with Linux 5.10.59, but fails to initialize with Linux 5.14.9. >> >> dilfridge ~ # lspci -nn >> [...] >> 01:00.0 Ethernet controller [0200]: Intel Corporation I350 Gigabit Network Connection [8086:1521] (rev ff) >> 01:00.1 Ethernet controller [0200]: Intel Corporation I350 Gigabit Network Connection [8086:1521] (rev ff) >> [...] >> >> dilfridge ~ # dmesg|grep igb >> [ 2.069286] igb: Intel(R) Gigabit Ethernet Network Driver >> [ 2.069288] igb: Copyright (c) 2007-2014 Intel Corporation. >> [ 2.069305] igb 0000:01:00.0: can't change power state from D3cold to D0 (config space inaccessible) >> [ 2.069624] igb 0000:01:00.0 0000:01:00.0 (uninitialized): PCIe link lost >> [ 2.386659] igb 0000:01:00.0: PHY reset is blocked due to SOL/IDER session. 
>> [ 4.115500] igb 0000:01:00.0: The NVM Checksum Is Not Valid
>> [ 4.133807] igb: probe of 0000:01:00.0 failed with error -5
>> [ 4.133820] igb 0000:01:00.1: can't change power state from D3cold to D0 (config space inaccessible)
>> [ 4.134072] igb 0000:01:00.1 0000:01:00.1 (uninitialized): PCIe link lost
>> [ 4.451602] igb 0000:01:00.1: PHY reset is blocked due to SOL/IDER session.
>> [ 6.180123] igb 0000:01:00.1: The NVM Checksum Is Not Valid
>> [ 6.188631] igb: probe of 0000:01:00.1 failed with error -5

What messages are new compared to the working Linux 5.10.59?

>> Any advice on how to proceed? Willing to test patches and provide additional debug info.

Without any ideas about the issue, please bisect the issue to find the commit introducing the regression, so it can be reverted/fixed to not violate Linux’ no-regression policy.

Kind regards,

Paul
* Re: [EXT] Re: [Intel-wired-lan] Intel I350 regression 5.10 -> 5.14 ("The NVM Checksum Is Not Valid") [8086:1521] 2021-10-05 9:34 ` Paul Menzel @ 2021-10-05 13:43 ` Andreas K. Huettel -1 siblings, 0 replies; 35+ messages in thread From: Andreas K. Huettel @ 2021-10-05 13:43 UTC (permalink / raw) To: Paul Menzel; +Cc: netdev, intel-wired-lan, Jakub Kicinski [-- Attachment #1: Type: text/plain, Size: 4730 bytes --] > > What messages are new compared to the working Linux 5.10.59? > I've uploaded the full boot logs to https://dev.gentoo.org/~dilfridge/igb/ (both in a version with and without timestamps, for easy diff). * I can't see anything that immediately points to the igb device (like a PCI id etc.) before the module is loaded. * The main difference between the logs is many unrelated (?) i915 warnings in 5.10.59 because of the nonfunctional graphics. The messages easily identifiable are: huettel@pinacolada ~/tmp $ cat kernel-messages-5.10.59.txt |grep igb Oct 5 15:11:18 dilfridge kernel: [ 2.090675] igb: Intel(R) Gigabit Ethernet Network Driver Oct 5 15:11:18 dilfridge kernel: [ 2.090676] igb: Copyright (c) 2007-2014 Intel Corporation. 
Oct 5 15:11:18 dilfridge kernel: [ 2.090728] igb 0000:01:00.0: enabling device (0000 -> 0002) Oct 5 15:11:18 dilfridge kernel: [ 2.094438] Modules linked in: igb(+) i915(+) iosf_mbi acpi_pad efivarfs Oct 5 15:11:18 dilfridge kernel: [ 2.097287] Modules linked in: igb(+) i915(+) iosf_mbi acpi_pad efivarfs Oct 5 15:11:18 dilfridge kernel: [ 2.098492] Modules linked in: igb(+) i915(+) iosf_mbi acpi_pad efivarfs Oct 5 15:11:18 dilfridge kernel: [ 2.098787] Modules linked in: igb(+) i915(+) iosf_mbi acpi_pad efivarfs Oct 5 15:11:18 dilfridge kernel: [ 2.173386] igb 0000:01:00.0: added PHC on eth0 Oct 5 15:11:18 dilfridge kernel: [ 2.173391] igb 0000:01:00.0: Intel(R) Gigabit Ethernet Network Connection Oct 5 15:11:18 dilfridge kernel: [ 2.173395] igb 0000:01:00.0: eth0: (PCIe:5.0Gb/s:Width x4) 6c:b3:11:23:d4:4c Oct 5 15:11:18 dilfridge kernel: [ 2.173991] igb 0000:01:00.0: eth0: PBA No: H47819-001 Oct 5 15:11:18 dilfridge kernel: [ 2.173994] igb 0000:01:00.0: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s) Oct 5 15:11:18 dilfridge kernel: [ 2.174199] igb 0000:01:00.1: enabling device (0000 -> 0002) Oct 5 15:11:18 dilfridge kernel: [ 2.261029] igb 0000:01:00.1: added PHC on eth1 Oct 5 15:11:18 dilfridge kernel: [ 2.261034] igb 0000:01:00.1: Intel(R) Gigabit Ethernet Network Connection Oct 5 15:11:18 dilfridge kernel: [ 2.261038] igb 0000:01:00.1: eth1: (PCIe:5.0Gb/s:Width x4) 6c:b3:11:23:d4:4d Oct 5 15:11:18 dilfridge kernel: [ 2.261772] igb 0000:01:00.1: eth1: PBA No: H47819-001 Oct 5 15:11:18 dilfridge kernel: [ 2.261776] igb 0000:01:00.1: Using MSI-X interrupts. 
8 rx queue(s), 8 tx queue(s) Oct 5 15:11:18 dilfridge kernel: [ 2.265376] igb 0000:01:00.1 enp1s0f1: renamed from eth1 Oct 5 15:11:18 dilfridge kernel: [ 2.282514] igb 0000:01:00.0 enp1s0f0: renamed from eth0 Oct 5 15:11:31 dilfridge kernel: [ 17.585202] igb 0000:01:00.0 enp1s0f0: igb: enp1s0f0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX huettel@pinacolada ~/tmp $ cat kernel-messages-5.14.9.txt |grep igb Oct 5 02:38:31 dilfridge kernel: [ 2.108606] igb: Intel(R) Gigabit Ethernet Network Driver Oct 5 02:38:31 dilfridge kernel: [ 2.108608] igb: Copyright (c) 2007-2014 Intel Corporation. Oct 5 02:38:31 dilfridge kernel: [ 2.108622] igb 0000:01:00.0: can't change power state from D3cold to D0 (config space inaccessible) Oct 5 02:38:31 dilfridge kernel: [ 2.108918] igb 0000:01:00.0 0000:01:00.0 (uninitialized): PCIe link lost Oct 5 02:38:31 dilfridge kernel: [ 2.418724] igb 0000:01:00.0: PHY reset is blocked due to SOL/IDER session. Oct 5 02:38:31 dilfridge kernel: [ 4.148163] igb 0000:01:00.0: The NVM Checksum Is Not Valid Oct 5 02:38:31 dilfridge kernel: [ 4.154891] igb: probe of 0000:01:00.0 failed with error -5 Oct 5 02:38:31 dilfridge kernel: [ 4.154904] igb 0000:01:00.1: can't change power state from D3cold to D0 (config space inaccessible) Oct 5 02:38:31 dilfridge kernel: [ 4.155146] igb 0000:01:00.1 0000:01:00.1 (uninitialized): PCIe link lost Oct 5 02:38:31 dilfridge kernel: [ 4.466904] igb 0000:01:00.1: PHY reset is blocked due to SOL/IDER session. Oct 5 02:38:31 dilfridge kernel: [ 6.195528] igb 0000:01:00.1: The NVM Checksum Is Not Valid Oct 5 02:38:31 dilfridge kernel: [ 6.200863] igb: probe of 0000:01:00.1 failed with error -5 > >> Any advice on how to proceed? Willing to test patches and provide additional debug info. > > Without any ideas about the issue, please bisect the issue to find the > commit introducing the regression, so it can be reverted/fixed to not > violate Linux’ no-regression policy. 
I'll start going through kernel versions (and later revisions) end of the week.

Thanks a lot,
Andreas

--
PD Dr. Andreas K. Huettel
Institute for Experimental and Applied Physics
University of Regensburg
93040 Regensburg
Germany
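The log comparison described in the message above can be sketched roughly as follows. This is only an illustration, assuming syslog lines of the form `Oct  5 15:11:18 dilfridge kernel: [    2.090675] igb: ...` as quoted in the thread; the helper name `igb_messages` and the output file names are made up for the example.

```shell
#!/bin/sh
# Normalize kernel log lines so that two boots can be compared with diff.
# Assumption: each line carries a syslog prefix ending in "kernel: " followed
# by a bracketed kernel timestamp, as in the excerpts quoted in this thread.
# Other syslog formats would need a different pattern.
igb_messages() {
    # Keep only lines mentioning igb, then drop everything up to and
    # including the "[ seconds.micros]" kernel timestamp.
    grep 'igb' "$1" | sed -E 's/^.*kernel: \[ *[0-9]+\.[0-9]+\] //'
}

# Hypothetical usage with the two log files named in the message:
#   igb_messages kernel-messages-5.10.59.txt > igb-5.10.59.txt
#   igb_messages kernel-messages-5.14.9.txt  > igb-5.14.9.txt
#   diff -u igb-5.10.59.txt igb-5.14.9.txt
```

With the timestamps removed, the diff should reduce to the lines that actually differ between the two boots, such as "enabling device (0000 -> 0002)" on 5.10.59 versus "can't change power state from D3cold to D0" on 5.14.9.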
* Re: [EXT] Re: [Intel-wired-lan] Intel I350 regression 5.10 -> 5.14 ("The NVM Checksum Is Not Valid") [8086:1521] 2021-10-05 13:43 ` [Intel-wired-lan] [EXT] " Andreas K. Huettel @ 2021-10-05 22:27 ` Jesse Brandeburg -1 siblings, 0 replies; 35+ messages in thread From: Jesse Brandeburg @ 2021-10-05 22:27 UTC (permalink / raw) To: Andreas K. Huettel, Paul Menzel; +Cc: netdev, intel-wired-lan, Jakub Kicinski On 10/5/2021 6:43 AM, Andreas K. Huettel wrote: >> >> What messages are new compared to the working Linux 5.10.59? >> > > I've uploaded the full boot logs to https://dev.gentoo.org/~dilfridge/igb/ > (both in a version with and without timestamps, for easy diff). > > * I can't see anything that immediately points to the igb device (like a PCI id etc.) before the module is loaded. > * The main difference between the logs is many unrelated (?) i915 warnings in 5.10.59 because of the nonfunctional graphics. > > The messages easily identifiable are: > > huettel@pinacolada ~/tmp $ cat kernel-messages-5.10.59.txt |grep igb > Oct 5 15:11:18 dilfridge kernel: [ 2.090675] igb: Intel(R) Gigabit Ethernet Network Driver > Oct 5 15:11:18 dilfridge kernel: [ 2.090676] igb: Copyright (c) 2007-2014 Intel Corporation. > Oct 5 15:11:18 dilfridge kernel: [ 2.090728] igb 0000:01:00.0: enabling device (0000 -> 0002) This line is missing below, it indicates that the kernel couldn't or didn't power up the PCIe for some reason. We're looking for something like ACPI or PCI patches (possibly PCI-Power management) to be the culprit here. 
> Oct 5 15:11:18 dilfridge kernel: [ 2.094438] Modules linked in: igb(+) i915(+) iosf_mbi acpi_pad efivarfs > Oct 5 15:11:18 dilfridge kernel: [ 2.097287] Modules linked in: igb(+) i915(+) iosf_mbi acpi_pad efivarfs > Oct 5 15:11:18 dilfridge kernel: [ 2.098492] Modules linked in: igb(+) i915(+) iosf_mbi acpi_pad efivarfs > Oct 5 15:11:18 dilfridge kernel: [ 2.098787] Modules linked in: igb(+) i915(+) iosf_mbi acpi_pad efivarfs > Oct 5 15:11:18 dilfridge kernel: [ 2.173386] igb 0000:01:00.0: added PHC on eth0 > Oct 5 15:11:18 dilfridge kernel: [ 2.173391] igb 0000:01:00.0: Intel(R) Gigabit Ethernet Network Connection > Oct 5 15:11:18 dilfridge kernel: [ 2.173395] igb 0000:01:00.0: eth0: (PCIe:5.0Gb/s:Width x4) 6c:b3:11:23:d4:4c > Oct 5 15:11:18 dilfridge kernel: [ 2.173991] igb 0000:01:00.0: eth0: PBA No: H47819-001 > Oct 5 15:11:18 dilfridge kernel: [ 2.173994] igb 0000:01:00.0: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s) > Oct 5 15:11:18 dilfridge kernel: [ 2.174199] igb 0000:01:00.1: enabling device (0000 -> 0002) > Oct 5 15:11:18 dilfridge kernel: [ 2.261029] igb 0000:01:00.1: added PHC on eth1 > Oct 5 15:11:18 dilfridge kernel: [ 2.261034] igb 0000:01:00.1: Intel(R) Gigabit Ethernet Network Connection > Oct 5 15:11:18 dilfridge kernel: [ 2.261038] igb 0000:01:00.1: eth1: (PCIe:5.0Gb/s:Width x4) 6c:b3:11:23:d4:4d > Oct 5 15:11:18 dilfridge kernel: [ 2.261772] igb 0000:01:00.1: eth1: PBA No: H47819-001 > Oct 5 15:11:18 dilfridge kernel: [ 2.261776] igb 0000:01:00.1: Using MSI-X interrupts. 
8 rx queue(s), 8 tx queue(s) > Oct 5 15:11:18 dilfridge kernel: [ 2.265376] igb 0000:01:00.1 enp1s0f1: renamed from eth1 > Oct 5 15:11:18 dilfridge kernel: [ 2.282514] igb 0000:01:00.0 enp1s0f0: renamed from eth0 > Oct 5 15:11:31 dilfridge kernel: [ 17.585202] igb 0000:01:00.0 enp1s0f0: igb: enp1s0f0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX > > huettel@pinacolada ~/tmp $ cat kernel-messages-5.14.9.txt |grep igb > Oct 5 02:38:31 dilfridge kernel: [ 2.108606] igb: Intel(R) Gigabit Ethernet Network Driver > Oct 5 02:38:31 dilfridge kernel: [ 2.108608] igb: Copyright (c) 2007-2014 Intel Corporation. > Oct 5 02:38:31 dilfridge kernel: [ 2.108622] igb 0000:01:00.0: can't change power state from D3cold to D0 (config space inaccessible) This is really the only message that matters. It indicates the config space is inaccessible, and from the system/kernel's perspective, the device is unplugged or not responding, or in a PCIe power state. > Oct 5 02:38:31 dilfridge kernel: [ 2.108918] igb 0000:01:00.0 0000:01:00.0 (uninitialized): PCIe link lost > Oct 5 02:38:31 dilfridge kernel: [ 2.418724] igb 0000:01:00.0: PHY reset is blocked due to SOL/IDER session. > Oct 5 02:38:31 dilfridge kernel: [ 4.148163] igb 0000:01:00.0: The NVM Checksum Is Not Valid > Oct 5 02:38:31 dilfridge kernel: [ 4.154891] igb: probe of 0000:01:00.0 failed with error -5 > Oct 5 02:38:31 dilfridge kernel: [ 4.154904] igb 0000:01:00.1: can't change power state from D3cold to D0 (config space inaccessible) > Oct 5 02:38:31 dilfridge kernel: [ 4.155146] igb 0000:01:00.1 0000:01:00.1 (uninitialized): PCIe link lost > Oct 5 02:38:31 dilfridge kernel: [ 4.466904] igb 0000:01:00.1: PHY reset is blocked due to SOL/IDER session. > Oct 5 02:38:31 dilfridge kernel: [ 6.195528] igb 0000:01:00.1: The NVM Checksum Is Not Valid > Oct 5 02:38:31 dilfridge kernel: [ 6.200863] igb: probe of 0000:01:00.1 failed with error -5 > > >>>> Any advice on how to proceed? 
Willing to test patches and provide additional debug info. >> >> Without any ideas about the issue, please bisect the issue to find the >> commit introducing the regression, so it can be reverted/fixed to not >> violate Linux’ no-regression policy. > > I'll start going through kernel versions (and later revisions) end of the week.

Thank you for helping the community figure out what is up here. I don't believe that it is a driver bug/change that broke things, but anything is possible. :-)

Given what I saw above, I wonder if you should try to boot with pcie_aspm=off.

The best option is a bisect using git, but it will help to narrow things down to a couple of different kernel versions if that is the only option you have.
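When trying the boot-parameter suggestion above, it may help to confirm after the reboot that the parameter actually reached the kernel. A minimal sketch, assuming the spelling `pcie_aspm=off` used in the kernel's parameter documentation; the helper name `has_boot_param` is hypothetical.

```shell
#!/bin/sh
# Report whether a given boot parameter appears on a kernel command line.
#   $1: command-line string (for example the contents of /proc/cmdline)
#   $2: parameter to look for, matched as a whole space-separated word
has_boot_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage after rebooting with the parameter added in the boot loader:
#   has_boot_param "$(cat /proc/cmdline)" pcie_aspm=off \
#       && echo "pcie_aspm=off is on the kernel command line"
```

This only shows that the parameter was passed, not that it changes the D3cold behaviour seen in the logs; that still has to be judged from the igb messages after boot.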
* Re: [EXT] Re: [Intel-wired-lan] Intel I350 regression 5.10 -> 5.14 ("The NVM Checksum Is Not Valid") [8086:1521] 2021-10-05 22:27 ` [Intel-wired-lan] [EXT] " Jesse Brandeburg @ 2021-10-12 16:34 ` Andreas K. Huettel -1 siblings, 0 replies; 35+ messages in thread From: Andreas K. Huettel @ 2021-10-12 16:34 UTC (permalink / raw) To: Paul Menzel, Jesse Brandeburg; +Cc: netdev, intel-wired-lan, Jakub Kicinski > > The messages easily identifiable are: > > > > huettel@pinacolada ~/tmp $ cat kernel-messages-5.10.59.txt |grep igb > > Oct 5 15:11:18 dilfridge kernel: [ 2.090675] igb: Intel(R) Gigabit Ethernet Network Driver > > Oct 5 15:11:18 dilfridge kernel: [ 2.090676] igb: Copyright (c) 2007-2014 Intel Corporation. > > Oct 5 15:11:18 dilfridge kernel: [ 2.090728] igb 0000:01:00.0: enabling device (0000 -> 0002) > > This line is missing below, it indicates that the kernel couldn't or > didn't power up the PCIe for some reason. We're looking for something > like ACPI or PCI patches (possibly PCI-Power management) to be the > culprit here. > So I did a git bisect from linux-v5.10 (good) to linux-v5.14.11 (bad). The result was: dilfridge /usr/src/linux-git # git bisect bad 6381195ad7d06ef979528c7452f3ff93659f86b1 is the first bad commit commit 6381195ad7d06ef979528c7452f3ff93659f86b1 Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Date: Mon May 24 17:26:16 2021 +0200 ACPI: power: Rework turning off unused power resources [...] I tried naive reverting of this commit on top of 5.14.11. That applies nearly cleanly, and after a reboot the additional ethernet interfaces show up with their MAC in the boot messages. (Not knowing how safe that experiment was, I did not go further than single mode and immediately rebooted into 5.10 afterwards.) -- PD Dr. Andreas K. 
Huettel Institute for Experimental and Applied Physics University of Regensburg 93040 Regensburg Germany e-mail andreas.huettel@ur.de http://www.akhuettel.de/ http://www.physik.uni-r.de/forschung/huettel/
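For readers who want to repeat a bisect like the one reported above, the good/bad decision at each step can be sketched as a small helper. This is an illustration only: the function name `classify_boot` is made up, the matched strings are the igb messages quoted in this thread, and the tag names `v5.10` and `v5.14.11` assume a tree that contains the stable releases.

```shell
#!/bin/sh
# Classify one boot for a manual bisect, based on igb kernel messages.
# Reads a kernel log on stdin and prints "good", "bad" or "skip".
# Assumption: the message texts match the ones quoted in this thread.
classify_boot() {
    log=$(cat)
    case "$log" in
        *"igb: probe of"*"failed with error"*) echo bad ;;
        *"igb "*"added PHC on eth"*)           echo good ;;
        *)                                     echo skip ;;
    esac
}

# Bisect workflow (left as comments: it needs a kernel tree and reboots):
#   git bisect start
#   git bisect bad  v5.14.11
#   git bisect good v5.10
#   # build, install and boot the commit that git checks out, then:
#   git bisect "$(dmesg | classify_boot)"
#   # repeat until git prints "<sha> is the first bad commit", then:
#   git bisect reset
#   # optional test revert on top of the bad release:
#   git revert 6381195ad7d06ef979528c7452f3ff93659f86b1
```

As noted in the message, the revert applied only "nearly cleanly" on 5.14.11, so some manual conflict resolution should be expected at that last step.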
* Re: [EXT] Re: [Intel-wired-lan] Intel I350 regression 5.10 -> 5.14 ("The NVM Checksum Is Not Valid") [8086:1521] 2021-10-12 16:34 ` [Intel-wired-lan] [EXT] " Andreas K. Huettel @ 2021-10-12 17:42 ` Paul Menzel -1 siblings, 0 replies; 35+ messages in thread From: Paul Menzel @ 2021-10-12 17:42 UTC (permalink / raw) To: Andreas K. Huettel, Jesse Brandeburg Cc: netdev, intel-wired-lan, Jakub Kicinski, Rafael J. Wysocki, Len Brown, ACPI Devel Maling List, Rafael J. Wysocki [Cc: +ACPI maintainers] Am 12.10.21 um 18:34 schrieb Andreas K. Huettel: >>> The messages easily identifiable are: >>> >>> huettel@pinacolada ~/tmp $ cat kernel-messages-5.10.59.txt |grep igb >>> Oct 5 15:11:18 dilfridge kernel: [ 2.090675] igb: Intel(R) Gigabit Ethernet Network Driver >>> Oct 5 15:11:18 dilfridge kernel: [ 2.090676] igb: Copyright (c) 2007-2014 Intel Corporation. >>> Oct 5 15:11:18 dilfridge kernel: [ 2.090728] igb 0000:01:00.0: enabling device (0000 -> 0002) >> >> This line is missing below, it indicates that the kernel couldn't or >> didn't power up the PCIe for some reason. We're looking for something >> like ACPI or PCI patches (possibly PCI-Power management) to be the >> culprit here. > > So I did a git bisect from linux-v5.10 (good) to linux-v5.14.11 (bad). > > The result was: > > dilfridge /usr/src/linux-git # git bisect bad > 6381195ad7d06ef979528c7452f3ff93659f86b1 is the first bad commit > commit 6381195ad7d06ef979528c7452f3ff93659f86b1 > Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > Date: Mon May 24 17:26:16 2021 +0200 > > ACPI: power: Rework turning off unused power resources > [...] > > I tried naive reverting of this commit on top of 5.14.11. That applies nearly cleanly, > and after a reboot the additional ethernet interfaces show up with their MAC in the > boot messages. > > (Not knowing how safe that experiment was, I did not go further than single mode and > immediately rebooted into 5.10 afterwards.) 
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [EXT] Re: [Intel-wired-lan] Intel I350 regression 5.10 -> 5.14 ("The NVM Checksum Is Not Valid") [8086:1521]
  2021-10-12 17:42 ` [Intel-wired-lan] [EXT] " Paul Menzel
@ 2021-10-12 17:58 ` Rafael J. Wysocki
  -1 siblings, 0 replies; 35+ messages in thread
From: Rafael J. Wysocki @ 2021-10-12 17:58 UTC (permalink / raw)
  To: Paul Menzel
  Cc: Andreas K. Huettel, Jesse Brandeburg, netdev, intel-wired-lan, Jakub Kicinski, Rafael J. Wysocki, Len Brown, ACPI Devel Maling List, Rafael J. Wysocki

On Tue, Oct 12, 2021 at 7:42 PM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
>
> [Cc: +ACPI maintainers]
>
> On 12.10.21 at 18:34, Andreas K. Huettel wrote:
> >>> The messages easily identifiable are:
> >>>
> >>> huettel@pinacolada ~/tmp $ cat kernel-messages-5.10.59.txt |grep igb
> >>> Oct 5 15:11:18 dilfridge kernel: [ 2.090675] igb: Intel(R) Gigabit Ethernet Network Driver
> >>> Oct 5 15:11:18 dilfridge kernel: [ 2.090676] igb: Copyright (c) 2007-2014 Intel Corporation.
> >>> Oct 5 15:11:18 dilfridge kernel: [ 2.090728] igb 0000:01:00.0: enabling device (0000 -> 0002)
> >>
> >> This line is missing below, it indicates that the kernel couldn't or
> >> didn't power up the PCIe for some reason. We're looking for something
> >> like ACPI or PCI patches (possibly PCI-Power management) to be the
> >> culprit here.
> >
> > So I did a git bisect from linux-v5.10 (good) to linux-v5.14.11 (bad).
> >
> > The result was:
> >
> > dilfridge /usr/src/linux-git # git bisect bad
> > 6381195ad7d06ef979528c7452f3ff93659f86b1 is the first bad commit
> > commit 6381195ad7d06ef979528c7452f3ff93659f86b1
> > Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Date:   Mon May 24 17:26:16 2021 +0200
> >
> >     ACPI: power: Rework turning off unused power resources
> > [...]
> >
> > I tried naive reverting of this commit on top of 5.14.11. That applies nearly cleanly,
> > and after a reboot the additional ethernet interfaces show up with their MAC in the
> > boot messages.
> >
> > (Not knowing how safe that experiment was, I did not go further than single mode and
> > immediately rebooted into 5.10 afterwards.)

Reverting this is rather not an option, because the code before it was
a one-off fix of an earlier issue, but it should be fixable given some
more information.

Basically, I need a boot log from both the good and bad cases and the
acpidump output from the affected machine.

^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [EXT] Re: [Intel-wired-lan] Intel I350 regression 5.10 -> 5.14 ("The NVM Checksum Is Not Valid") [8086:1521]
  2021-10-12 17:58 ` [Intel-wired-lan] [EXT] " Rafael J. Wysocki
@ 2021-10-12 19:28 ` Andreas K. Huettel
  -1 siblings, 0 replies; 35+ messages in thread
From: Andreas K. Huettel @ 2021-10-12 19:28 UTC (permalink / raw)
  To: Paul Menzel, Rafael J. Wysocki
  Cc: Jesse Brandeburg, netdev, intel-wired-lan, Jakub Kicinski, Rafael J. Wysocki, Len Brown, ACPI Devel Maling List, Rafael J. Wysocki

[-- Attachment #1: Type: text/plain, Size: 2525 bytes --]

On Tuesday, 12 October 2021, 19:58:47 CEST, Rafael J. Wysocki wrote:
> On Tue, Oct 12, 2021 at 7:42 PM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
> >
> > [Cc: +ACPI maintainers]
> >
> > On 12.10.21 at 18:34, Andreas K. Huettel wrote:
> > >>> The messages easily identifiable are:
> > >>>
> > >>> huettel@pinacolada ~/tmp $ cat kernel-messages-5.10.59.txt |grep igb
> > >>> Oct 5 15:11:18 dilfridge kernel: [ 2.090675] igb: Intel(R) Gigabit Ethernet Network Driver
> > >>> Oct 5 15:11:18 dilfridge kernel: [ 2.090676] igb: Copyright (c) 2007-2014 Intel Corporation.
> > >>> Oct 5 15:11:18 dilfridge kernel: [ 2.090728] igb 0000:01:00.0: enabling device (0000 -> 0002)
> > >>
> > >> This line is missing below, it indicates that the kernel couldn't or
> > >> didn't power up the PCIe for some reason. We're looking for something
> > >> like ACPI or PCI patches (possibly PCI-Power management) to be the
> > >> culprit here.
> > >
> > > So I did a git bisect from linux-v5.10 (good) to linux-v5.14.11 (bad).
> > >
> > > The result was:
> > >
> > > dilfridge /usr/src/linux-git # git bisect bad
> > > 6381195ad7d06ef979528c7452f3ff93659f86b1 is the first bad commit
> > > commit 6381195ad7d06ef979528c7452f3ff93659f86b1
> > > Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > Date:   Mon May 24 17:26:16 2021 +0200
> > >
> > >     ACPI: power: Rework turning off unused power resources
> > > [...]
> > >
> > > I tried naive reverting of this commit on top of 5.14.11. That applies nearly cleanly,
> > > and after a reboot the additional ethernet interfaces show up with their MAC in the
> > > boot messages.
> > >
> > > (Not knowing how safe that experiment was, I did not go further than single mode and
> > > immediately rebooted into 5.10 afterwards.)
>
> Reverting this is rather not an option, because the code before it was
> a one-off fix of an earlier issue, but it should be fixable given some
> more information.
>
> Basically, I need a boot log from both the good and bad cases and the
> acpidump output from the affected machine.
>

https://dev.gentoo.org/~dilfridge/igb/
^ Should all be here now.

5.10 -> "good" log (the errors are caused by missing support for my i915 graphics and hopefully unrelated)
5.14 -> "bad" log

Thank you for looking at this. If you need anything else, just ask.

Andreas

-- 
PD Dr. Andreas K. Huettel
Institute for Experimental and Applied Physics
University of Regensburg
93040 Regensburg
Germany

e-mail andreas.huettel@ur.de
http://www.akhuettel.de/

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 981 bytes --]

^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [EXT] Re: [Intel-wired-lan] Intel I350 regression 5.10 -> 5.14 ("The NVM Checksum Is Not Valid") [8086:1521]
  2021-10-12 19:28 ` [Intel-wired-lan] [EXT] " Andreas K. Huettel
@ 2021-10-14 12:09 ` Rafael J. Wysocki
  -1 siblings, 0 replies; 35+ messages in thread
From: Rafael J. Wysocki @ 2021-10-14 12:09 UTC (permalink / raw)
  To: Andreas K. Huettel
  Cc: Paul Menzel, Rafael J. Wysocki, Jesse Brandeburg, netdev, intel-wired-lan, Jakub Kicinski, Rafael J. Wysocki, Len Brown, ACPI Devel Maling List

[-- Attachment #1: Type: text/plain, Size: 2596 bytes --]

On Tue, Oct 12, 2021 at 9:36 PM Andreas K. Huettel <andreas.huettel@ur.de> wrote:
>
> On Tuesday, 12 October 2021, 19:58:47 CEST, Rafael J. Wysocki wrote:
> > On Tue, Oct 12, 2021 at 7:42 PM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
> > >
> > > [Cc: +ACPI maintainers]
> > >
> > > On 12.10.21 at 18:34, Andreas K. Huettel wrote:
> > > >>> The messages easily identifiable are:
> > > >>>
> > > >>> huettel@pinacolada ~/tmp $ cat kernel-messages-5.10.59.txt |grep igb
> > > >>> Oct 5 15:11:18 dilfridge kernel: [ 2.090675] igb: Intel(R) Gigabit Ethernet Network Driver
> > > >>> Oct 5 15:11:18 dilfridge kernel: [ 2.090676] igb: Copyright (c) 2007-2014 Intel Corporation.
> > > >>> Oct 5 15:11:18 dilfridge kernel: [ 2.090728] igb 0000:01:00.0: enabling device (0000 -> 0002)
> > > >>
> > > >> This line is missing below, it indicates that the kernel couldn't or
> > > >> didn't power up the PCIe for some reason. We're looking for something
> > > >> like ACPI or PCI patches (possibly PCI-Power management) to be the
> > > >> culprit here.
> > > >
> > > > So I did a git bisect from linux-v5.10 (good) to linux-v5.14.11 (bad).
> > > >
> > > > The result was:
> > > >
> > > > dilfridge /usr/src/linux-git # git bisect bad
> > > > 6381195ad7d06ef979528c7452f3ff93659f86b1 is the first bad commit
> > > > commit 6381195ad7d06ef979528c7452f3ff93659f86b1
> > > > Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > Date:   Mon May 24 17:26:16 2021 +0200
> > > >
> > > >     ACPI: power: Rework turning off unused power resources
> > > > [...]
> > > >
> > > > I tried naive reverting of this commit on top of 5.14.11. That applies nearly cleanly,
> > > > and after a reboot the additional ethernet interfaces show up with their MAC in the
> > > > boot messages.
> > > >
> > > > (Not knowing how safe that experiment was, I did not go further than single mode and
> > > > immediately rebooted into 5.10 afterwards.)
> >
> > Reverting this is rather not an option, because the code before it was
> > a one-off fix of an earlier issue, but it should be fixable given some
> > more information.
> >
> > Basically, I need a boot log from both the good and bad cases and the
> > acpidump output from the affected machine.
> >
>
> https://dev.gentoo.org/~dilfridge/igb/
>
> ^ Should all be here now.
>
> 5.10 -> "good" log (the errors are caused by missing support for my i915 graphics and hopefully unrelated)
> 5.14 -> "bad" log
>
> Thank you for looking at this. If you need anything else, just ask.

You're welcome.

Please test the attached patch and let me know if it helps.

[-- Attachment #2: acpi-power-turn-off-fixup.patch --]
[-- Type: text/x-patch, Size: 909 bytes --]

---
 drivers/acpi/power.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

Index: linux-pm/drivers/acpi/power.c
===================================================================
--- linux-pm.orig/drivers/acpi/power.c
+++ linux-pm/drivers/acpi/power.c
@@ -1035,13 +1035,8 @@ void acpi_turn_off_unused_power_resource
 	list_for_each_entry_reverse(resource, &acpi_power_resource_list, list_node) {
 		mutex_lock(&resource->resource_lock);
 
-		/*
-		 * Turn off power resources in an unknown state too, because the
-		 * platform firmware on some system expects the OS to turn off
-		 * power resources without any users unconditionally.
-		 */
 		if (!resource->ref_count &&
-		    resource->state != ACPI_POWER_RESOURCE_STATE_OFF) {
+		    resource->state == ACPI_POWER_RESOURCE_STATE_ON) {
 			acpi_handle_debug(resource->device.handle, "Turning OFF\n");
 			__acpi_power_off(resource);
 		}

^ permalink raw reply [flat|nested] 35+ messages in thread
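The effect of the one-line condition change in the patch above can be tried outside the kernel. What follows is a minimal user-space sketch, not kernel code: the enum, the struct and the function names are hypothetical stand-ins invented for illustration, and the numeric state values are assumptions. It only models the two versions of the `if` condition to show that they differ for exactly one case, an unused power resource whose state is unknown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-ins for the kernel's power resource states.
 * The numeric values are assumptions made for this sketch only.
 */
enum res_state {
	RES_STATE_OFF     = 0x00,
	RES_STATE_ON      = 0x01,
	RES_STATE_UNKNOWN = 0xFF
};

/* Simplified model of a power resource: a name, a user count, a state. */
struct power_resource {
	const char *name;
	unsigned int ref_count;
	enum res_state state;
};

/* Condition before the patch: unused and not known to be OFF. */
static bool old_should_turn_off(const struct power_resource *r)
{
	return !r->ref_count && r->state != RES_STATE_OFF;
}

/* Condition after the patch: unused and known to be ON. */
static bool new_should_turn_off(const struct power_resource *r)
{
	return !r->ref_count && r->state == RES_STATE_ON;
}

/* Print both decisions for each combination of usage and state. */
static void print_comparison(void)
{
	const struct power_resource cases[] = {
		{ "unused-on",      0, RES_STATE_ON },
		{ "unused-off",     0, RES_STATE_OFF },
		{ "unused-unknown", 0, RES_STATE_UNKNOWN },
		{ "in-use-on",      1, RES_STATE_ON },
	};
	size_t i;

	for (i = 0; i < sizeof(cases) / sizeof(cases[0]); i++)
		printf("%-15s old=%d new=%d\n", cases[i].name,
		       old_should_turn_off(&cases[i]),
		       new_should_turn_off(&cases[i]));
}
```

Calling `print_comparison()` from a `main()` shows the two conditions agreeing on every row except "unused-unknown", which the old condition turns off and the new one leaves alone. Whether that is the exact mechanism behind the failure reported in this thread is not established by the sketch; it only illustrates what the changed comparison does.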
* Re: [EXT] Re: [Intel-wired-lan] Intel I350 regression 5.10 -> 5.14 ("The NVM Checksum Is Not Valid") [8086:1521]
  2021-10-14 12:09 ` [Intel-wired-lan] [EXT] " Rafael J. Wysocki
@ 2021-10-15 14:00 ` Andreas K. Huettel
  -1 siblings, 0 replies; 35+ messages in thread
From: Andreas K. Huettel @ 2021-10-15 14:00 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Paul Menzel, Rafael J. Wysocki, Jesse Brandeburg, netdev, intel-wired-lan, Jakub Kicinski, Rafael J. Wysocki, Len Brown, ACPI Devel Maling List

[-- Attachment #1: Type: text/plain, Size: 3710 bytes --]

On Thursday, 14 October 2021, 14:09:39 CEST, Rafael J. Wysocki wrote:
> > > > >>> huettel@pinacolada ~/tmp $ cat kernel-messages-5.10.59.txt |grep igb
> > > > >>> Oct 5 15:11:18 dilfridge kernel: [ 2.090675] igb: Intel(R) Gigabit Ethernet Network Driver
> > > > >>> Oct 5 15:11:18 dilfridge kernel: [ 2.090676] igb: Copyright (c) 2007-2014 Intel Corporation.
> > > > >>> Oct 5 15:11:18 dilfridge kernel: [ 2.090728] igb 0000:01:00.0: enabling device (0000 -> 0002)
> > > > >>
> > > > >> This line is missing below, it indicates that the kernel couldn't or
> > > > >> didn't power up the PCIe for some reason. We're looking for something
> > > > >> like ACPI or PCI patches (possibly PCI-Power management) to be the
> > > > >> culprit here.
> > > > >
> > > > > So I did a git bisect from linux-v5.10 (good) to linux-v5.14.11 (bad).
> > > > >
> > > > > The result was:
> > > > >
> > > > > dilfridge /usr/src/linux-git # git bisect bad
> > > > > 6381195ad7d06ef979528c7452f3ff93659f86b1 is the first bad commit
> > > > > commit 6381195ad7d06ef979528c7452f3ff93659f86b1
> > > > > Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > > Date:   Mon May 24 17:26:16 2021 +0200
> > > > >
> > > > >     ACPI: power: Rework turning off unused power resources
> > > > > [...]
> > > > >
> > > > > I tried naive reverting of this commit on top of 5.14.11. That applies nearly cleanly,
> > > > > and after a reboot the additional ethernet interfaces show up with their MAC in the
> > > > > boot messages.
> > > > >
> > > > > (Not knowing how safe that experiment was, I did not go further than single mode and
> > > > > immediately rebooted into 5.10 afterwards.)
> > >
> > > Reverting this is rather not an option, because the code before it was
> > > a one-off fix of an earlier issue, but it should be fixable given some
> > > more information.
> > >
> > > Basically, I need a boot log from both the good and bad cases and the
> > > acpidump output from the affected machine.
> > >
> >
> > https://dev.gentoo.org/~dilfridge/igb/
> >
> > ^ Should all be here now.
> >
> > 5.10 -> "good" log (the errors are caused by missing support for my i915 graphics and hopefully unrelated)
> > 5.14 -> "bad" log
> >
> > Thank you for looking at this. If you need anything else, just ask.
>
> You're welcome.
>
> Please test the attached patch and let me know if it helps.
>

It helps (*); the second ethernet adaptor is initialized, and works normally as far as I can see.

(*) The debug output line following the if-condition apparently changed in the meantime, so I had
to apply the change in the if-condition "manually".

igb: Intel(R) Gigabit Ethernet Network Driver
igb: Copyright (c) 2007-2014 Intel Corporation.
igb 0000:01:00.0: enabling device (0000 -> 0002)
igb 0000:01:00.0: added PHC on eth1
igb 0000:01:00.0: Intel(R) Gigabit Ethernet Network Connection
igb 0000:01:00.0: eth1: (PCIe:5.0Gb/s:Width x4) 6c:b3:11:23:d4:4c
igb 0000:01:00.0: eth1: PBA No: H47819-001
igb 0000:01:00.0: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s)
igb 0000:01:00.1: enabling device (0000 -> 0002)
igb 0000:01:00.1: added PHC on eth2
igb 0000:01:00.1: Intel(R) Gigabit Ethernet Network Connection
igb 0000:01:00.1: eth2: (PCIe:5.0Gb/s:Width x4) 6c:b3:11:23:d4:4d
igb 0000:01:00.1: eth2: PBA No: H47819-001
igb 0000:01:00.1: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s)

The full boot log is at https://dev.gentoo.org/~dilfridge/igb/ as 5.14.11-*.txt

Thanks,
Andreas

-- 
PD Dr. Andreas K. Huettel
Institute for Experimental and Applied Physics
University of Regensburg
93040 Regensburg
Germany

e-mail andreas.huettel@ur.de
http://www.akhuettel.de/
http://www.physik.uni-r.de/forschung/huettel/

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 981 bytes --]

^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [EXT] Re: [Intel-wired-lan] Intel I350 regression 5.10 -> 5.14 ("The NVM Checksum Is Not Valid") [8086:1521]
  2021-10-15 14:00 ` [Intel-wired-lan] [EXT] " Andreas K. Huettel
@ 2021-10-15 18:42 ` Rafael J. Wysocki
  -1 siblings, 0 replies; 35+ messages in thread
From: Rafael J. Wysocki @ 2021-10-15 18:42 UTC
To: Andreas K. Huettel
Cc: Rafael J. Wysocki, Paul Menzel, Jesse Brandeburg, netdev,
    intel-wired-lan, Jakub Kicinski, Rafael J. Wysocki, Len Brown,
    ACPI Devel Maling List

On Fri, Oct 15, 2021 at 4:01 PM Andreas K. Huettel <andreas.huettel@ur.de> wrote:
>
> On Thursday, 14 October 2021, 14:09:39 CEST, Rafael J. Wysocki wrote:
> > > > > >>> huettel@pinacolada ~/tmp $ cat kernel-messages-5.10.59.txt |grep igb
> > > > > >>> Oct 5 15:11:18 dilfridge kernel: [ 2.090675] igb: Intel(R) Gigabit Ethernet Network Driver
> > > > > >>> Oct 5 15:11:18 dilfridge kernel: [ 2.090676] igb: Copyright (c) 2007-2014 Intel Corporation.
> > > > > >>> Oct 5 15:11:18 dilfridge kernel: [ 2.090728] igb 0000:01:00.0: enabling device (0000 -> 0002)
> > > > > >>
> > > > > >> This line is missing below, it indicates that the kernel couldn't or
> > > > > >> didn't power up the PCIe for some reason. We're looking for something
> > > > > >> like ACPI or PCI patches (possibly PCI-Power management) to be the
> > > > > >> culprit here.
> > > > > >
> > > > > > So I did a git bisect from linux-v5.10 (good) to linux-v5.14.11 (bad).
> > > > > >
> > > > > > The result was:
> > > > > >
> > > > > > dilfridge /usr/src/linux-git # git bisect bad
> > > > > > 6381195ad7d06ef979528c7452f3ff93659f86b1 is the first bad commit
> > > > > > commit 6381195ad7d06ef979528c7452f3ff93659f86b1
> > > > > > Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > > > Date: Mon May 24 17:26:16 2021 +0200
> > > > > >
> > > > > > ACPI: power: Rework turning off unused power resources
> > > > > > [...]
> > > > > >
> > > > > > I tried naive reverting of this commit on top of 5.14.11. That applies nearly cleanly,
> > > > > > and after a reboot the additional ethernet interfaces show up with their MAC in the
> > > > > > boot messages.
> > > > > >
> > > > > > (Not knowing how safe that experiment was, I did not go further than single mode and
> > > > > > immediately rebooted into 5.10 afterwards.)
> > > >
> > > > Reverting this is rather not an option, because the code before it was
> > > > a one-off fix of an earlier issue, but it should be fixable given some
> > > > more information.
> > > >
> > > > Basically, I need a boot log from both the good and bad cases and the
> > > > acpidump output from the affected machine.
> > > >
> > >
> > > https://dev.gentoo.org/~dilfridge/igb/
> > >
> > > ^ Should all be here now.
> > >
> > > 5.10 -> "good" log (the errors are caused by missing support for my i915 graphics and hopefully unrelated)
> > > 5.14 -> "bad" log
> > >
> > > Thank you for looking at this. If you need anything else, just ask.
> >
> > You're welcome.
> >
> > Please test the attached patch and let me know if it helps.
> >
>
> It helps (*); the second ethernet adaptor is initialized, and works normally as far as I can see.
>
> (*) The debug output line following the if-condition apparently changed in the meantime, so I had
> to apply the change in the if-condition "manually".
>
> igb: Intel(R) Gigabit Ethernet Network Driver
> igb: Copyright (c) 2007-2014 Intel Corporation.
> igb 0000:01:00.0: enabling device (0000 -> 0002)
> igb 0000:01:00.0: added PHC on eth1
> igb 0000:01:00.0: Intel(R) Gigabit Ethernet Network Connection
> igb 0000:01:00.0: eth1: (PCIe:5.0Gb/s:Width x4) 6c:b3:11:23:d4:4c
> igb 0000:01:00.0: eth1: PBA No: H47819-001
> igb 0000:01:00.0: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s)
> igb 0000:01:00.1: enabling device (0000 -> 0002)
> igb 0000:01:00.1: added PHC on eth2
> igb 0000:01:00.1: Intel(R) Gigabit Ethernet Network Connection
> igb 0000:01:00.1: eth2: (PCIe:5.0Gb/s:Width x4) 6c:b3:11:23:d4:4d
> igb 0000:01:00.1: eth2: PBA No: H47819-001
> igb 0000:01:00.1: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s)
>
> The full boot log is at https://dev.gentoo.org/~dilfridge/igb/ as 5.14.11-*.txt

Thank you!

I've added a changelog to it and resent along with another patch to
test for you:

https://lore.kernel.org/linux-acpi/21226252.EfDdHjke4D@kreacher/
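
Editorial note: the diagnosis quoted in this message rests on one log line. On the working kernel, igb prints "enabling device (0000 -> 0002)" for each port; on the failing boot that line is absent and the log instead shows "can't change power state from D3cold to D0", "PCIe link lost" and "The NVM Checksum Is Not Valid". A minimal sketch of that comparison, assuming the kernel log has been saved as text. The function name `classify_igb_probe` and the marker lists are illustrative only and are not part of any kernel tooling:

```python
import re

# Substrings copied from the boot logs quoted in this thread.
ENABLED_MARKER = "enabling device"
FAILURE_MARKERS = (
    "can't change power state from D3cold to D0",
    "PCIe link lost",
    "The NVM Checksum Is Not Valid",
    "failed with error",
)

# Matches a PCI address such as 0000:01:00.0 after "igb " or "igb: probe of ".
PCI_ADDR = re.compile(
    r"igb(?:: probe of)? ([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"
)


def classify_igb_probe(log_text):
    """Return {pci_address: "ok" | "failed" | "unknown"} for igb devices."""
    status = {}
    for line in log_text.splitlines():
        match = PCI_ADDR.search(line)
        if not match:
            continue  # driver banner lines carry no PCI address
        addr = match.group(1)
        status.setdefault(addr, "unknown")
        if any(marker in line for marker in FAILURE_MARKERS):
            status[addr] = "failed"  # a failure marker always wins
        elif ENABLED_MARKER in line and status[addr] != "failed":
            status[addr] = "ok"
    return status
```

Run against the "good" and "bad" boot logs, a port that reports "failed" on one kernel and "ok" on the other is a candidate for the power-up problem discussed here; the check only pattern-matches log text and says nothing about the underlying ACPI cause.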