* Re: [Intel-wired-lan] igc kernel module crashes on new hardware (Intel Ethernet I225-V)
       [not found] <CAPAtJa_o5q-sU+AD=G3y43H_5pBKnOZTQGXM99uszPXNkn8Z9A@mail.gmail.com>
@ 2022-11-01  0:05 ` Jakub Kicinski
  2022-11-01 16:20   ` Neftin, Sasha
  0 siblings, 1 reply; 29+ messages in thread
From: Jakub Kicinski @ 2022-11-01  0:05 UTC (permalink / raw)
  To: Ivan Smirnov; +Cc: intel-wired-lan

CC: intel-wired

On Sun, 30 Oct 2022 14:44:57 -0600 Ivan Smirnov wrote:
> Hi folks,
> 
> I found your commits on the linux kernel igc
> <https://github.com/torvalds/linux/commits/master/drivers/net/ethernet/intel/igc>
> folder. There appears to be a bug with the igc kernel module on Intel
> I225-V chips.
> 
> Specifically, the probe fails at startup with error: "igc: probe of
> 0000:06:00.0 failed with error -13". When it does load, it crashes after a
> few hours with error "igc failed to read reg 0xc030".
> 
> There are several affected users posting on
> https://www.reddit.com/r/buildapc/comments/xypn1m/network_card_intel_ethernet_controller_i225v_igc/
> with more details.
> 
> Could I help you debug this? This problem has been reproduced on the
> following setups:
> 
> 1. Asus TUF-GAMING-Z690-PLUS-WIFI-D4
> <https://www.asus.com/motherboards-components/motherboards/tuf-gaming/tuf-gaming-z690-plus-wifi-d4/>
> on Arch Linux, kernel 6.0.2-arch1-1
> 2. rog strix x670e-e gaming wifi
> <https://rog.asus.com/us/motherboards/rog-strix/rog-strix-x670e-e-gaming-wifi-model/>
> on Proxmox 7, as well as Ubuntu Linux (kernel 5.19, I believe)
> 
> I'm happy to load any debug modules or provide additional logs as per
> your request.
> 
> Thank you
> 
> 
> 
> 
> --
> Ivan Smirnov
> https://ivans.io/ | https://blog.ivansmirnov.name/
> https://www.linkedin.com/in/ismirnov |
> *https://ivansmirnov.name/ <https://ivansmirnov.name/>*
> *https://github.com/issmirnov <https://ivansmirnov.name/>*

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan


* Re: [Intel-wired-lan] igc kernel module crashes on new hardware (Intel Ethernet I225-V)
  2022-11-01  0:05 ` [Intel-wired-lan] igc kernel module crashes on new hardware (Intel Ethernet I225-V) Jakub Kicinski
@ 2022-11-01 16:20   ` Neftin, Sasha
  2022-11-02 16:54     ` Ivan Smirnov
  0 siblings, 1 reply; 29+ messages in thread
From: Neftin, Sasha @ 2022-11-01 16:20 UTC (permalink / raw)
  To: Jakub Kicinski, Ivan Smirnov, Avivi, Amir, naamax.meir,
	Fuxbrumer, Devora
  Cc: Ruinskiy, Dima, intel-wired-lan

On 11/1/2022 02:05, Jakub Kicinski wrote:
> CC: intel-wired
> 
> On Sun, 30 Oct 2022 14:44:57 -0600 Ivan Smirnov wrote:
>> Hi folks,
>>
>> I found your commits on the linux kernel igc
>> <https://github.com/torvalds/linux/commits/master/drivers/net/ethernet/intel/igc>
>> folder. There appears to be a bug with the igc kernel module on Intel
>> I225-V chips.
>>
>> Specifically, the probe fails at startup with error: "igc: probe of
>> 0000:06:00.0 failed with error -13". When it does load, it crashes after a
>> few hours with error "igc failed to read reg 0xc030".
>>
Could you provide the output of dmesg -w -T | grep -i igc from the boot
stage, and of ethtool -i?
I've cc'd our PAE expert Amir, who could also take a look at this problem.
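
For anyone gathering the output requested above, here is a minimal sketch in
shell. It assumes util-linux dmesg and ethtool are installed and that the
script has permission to read the kernel log. The function names and the
interface name enp6s0 are hypothetical, chosen only for illustration. The
sketch drops -w, because that flag keeps dmesg following new messages and
would never return in a one-shot capture.

```shell
#!/bin/sh
# Sketch only: helper names below are illustrative, not part of any tool.

# Keep only lines mentioning igc, case-insensitively (as in `grep -i igc`).
filter_igc() {
    grep -i 'igc'
}

# Print a small report for one interface: igc kernel messages with
# human-readable timestamps (-T), then driver info from ethtool -i.
collect_igc_report() {
    iface="$1"
    echo "== dmesg -T | grep -i igc =="
    dmesg -T 2>/dev/null | filter_igc \
        || echo "(no igc lines found, or dmesg not readable)"
    echo "== ethtool -i $iface =="
    ethtool -i "$iface" 2>/dev/null \
        || echo "(ethtool missing, or no interface named $iface)"
}

# Example (enp6s0 is a hypothetical interface name; check `ip link`):
#   collect_igc_report enp6s0 > igc-report.txt
```

Capturing to a file right after boot, and again after a failure, should make
the two states easier to compare.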



* Re: [Intel-wired-lan] igc kernel module crashes on new hardware (Intel Ethernet I225-V)
  2022-11-01 16:20   ` Neftin, Sasha
@ 2022-11-02 16:54     ` Ivan Smirnov
  2022-11-02 17:53       ` Ivan Smirnov
  0 siblings, 1 reply; 29+ messages in thread
From: Ivan Smirnov @ 2022-11-02 16:54 UTC (permalink / raw)
  To: Neftin, Sasha
  Cc: Fuxbrumer, Devora, Ruinskiy, Dima, Jakub Kicinski,
	intel-wired-lan, Avivi, Amir



Hi folks,

As usual, the computers know when the experts join the chat... I haven't
been able to reproduce the issue for the past few days. Yay for stability,
boo for debugging.

I posted on the reddit thread
<https://www.reddit.com/r/buildapc/comments/xypn1m/network_card_intel_ethernet_controller_i225v_igc/>
asking other users to post their output. I'll do my best to keep an eye out
for this issue and get you the logs ASAP once I repro the crash.

Thank you for your responsiveness - will keep you posted!

Best,
- Ivan
--
Ivan Smirnov
https://ivans.io/ | https://blog.ivansmirnov.name/
https://www.linkedin.com/in/ismirnov |
*https://ivansmirnov.name/ <https://ivansmirnov.name/>*
*https://github.com/issmirnov <https://ivansmirnov.name/>*




* Re: [Intel-wired-lan] igc kernel module crashes on new hardware (Intel Ethernet I225-V)
  2022-11-02 16:54     ` Ivan Smirnov
@ 2022-11-02 17:53       ` Ivan Smirnov
  2022-11-10 11:44         ` Ivan Smirnov
  0 siblings, 1 reply; 29+ messages in thread
From: Ivan Smirnov @ 2022-11-02 17:53 UTC (permalink / raw)
  To: Neftin, Sasha
  Cc: Fuxbrumer, Devora, Ruinskiy, Dima, Jakub Kicinski,
	intel-wired-lan, Avivi, Amir



Here is the gist from one reddit user:
https://gist.github.com/DarkArc/50ffca5fc343e2ff8166bc81d3ff8335

Here are my gists (crash free for now):
https://gist.github.com/issmirnov/b9ac74d232e1865ae849a3e64dce2afe
--
Ivan Smirnov
https://ivans.io/ | https://blog.ivansmirnov.name/
https://www.linkedin.com/in/ismirnov |
*https://ivansmirnov.name/ <https://ivansmirnov.name/>*
*https://github.com/issmirnov <https://ivansmirnov.name/>*




* Re: [Intel-wired-lan] igc kernel module crashes on new hardware (Intel Ethernet I225-V)
  2022-11-02 17:53       ` Ivan Smirnov
@ 2022-11-10 11:44         ` Ivan Smirnov
  2022-11-16 22:23           ` Ivan Smirnov
  0 siblings, 1 reply; 29+ messages in thread
From: Ivan Smirnov @ 2022-11-10 11:44 UTC (permalink / raw)
  To: Neftin, Sasha
  Cc: Fuxbrumer, Devora, intel-wired-lan, Jakub Kicinski, Ruinskiy,
	Dima, Avivi, Amir



Some more data from another user. Do you have any preliminary findings you
could share back with the community?

Same issue, been struggling with it for the last month or so, on both
Ubuntu and Arch Linux. I have a dual-boot system with Windows 11, and did
not notice any issues with ethernet or wifi on Windows. So this indeed
seems like a firmware issue, particularly in igc, not the adapter itself.

Running on Arch Linux kernel 6.0.7, same motherboard as in your post.

https://gist.github.com/LilDojd/2f030ecc5c5b6f8c3285725adfb8c456




--
Ivan Smirnov
https://ivans.io/ | https://blog.ivansmirnov.name/
https://www.linkedin.com/in/ismirnov |
*https://ivansmirnov.name/ <https://ivansmirnov.name/>*
*https://github.com/issmirnov <https://ivansmirnov.name/>*


* Re: [Intel-wired-lan] igc kernel module crashes on new hardware (Intel Ethernet I225-V)
  2022-11-10 11:44         ` Ivan Smirnov
@ 2022-11-16 22:23           ` Ivan Smirnov
  2022-11-18 22:43               ` Conor Dooley
  2022-11-23 11:47             ` Ruinskiy, Dima
  0 siblings, 2 replies; 29+ messages in thread
From: Ivan Smirnov @ 2022-11-16 22:23 UTC (permalink / raw)
  To: Neftin, Sasha
  Cc: Fuxbrumer, Devora, intel-wired-lan, Jakub Kicinski, Ruinskiy,
	Dima, Avivi, Amir



Hi folks,

Is there any update for the community? More and more folks are asking. We
are all techies and happy to help debug.

Thank you kindly,
- Ivan

-- 
--
Ivan Smirnov
https://ivans.io/ | https://blog.ivansmirnov.name/
https://www.linkedin.com/in/ismirnov |
*https://ivansmirnov.name/ <https://ivansmirnov.name/>*
*https://github.com/issmirnov <https://ivansmirnov.name/>*


* Re: [Intel-wired-lan] igc kernel module crashes on new hardware (Intel Ethernet I225-V)
  2022-11-16 22:23           ` Ivan Smirnov
@ 2022-11-18 22:43               ` Conor Dooley
  2022-11-23 11:47             ` Ruinskiy, Dima
  1 sibling, 0 replies; 29+ messages in thread
From: Conor Dooley @ 2022-11-18 22:43 UTC (permalink / raw)
  To: Ivan Smirnov
  Cc: Neftin, Sasha, Fuxbrumer, Devora, intel-wired-lan,
	Jakub Kicinski, Ruinskiy, Dima, Avivi, Amir, regressions

Hey,

On Wed, Nov 16, 2022 at 02:23:57PM -0800, Ivan Smirnov wrote:
> Hi folks,
> 
> Is there any update for the community? More and more folks are asking. We
> are all techies and happy to help debug.

Vested interest since I am suffering from the same issue (X670E-F
Gaming), but is it okay to add this to regzbot? Not sure whether it
counts as a regression or not since it's new hw with the existing driver,
but this seems to be falling through the cracks without a response for
several weeks.

Thanks,
Conor.



> >>>> >> <
> >>>> https://rog.asus.com/us/motherboards/rog-strix/rog-strix-x670e-e-gaming-wifi-model/
> >>>> >
> >>>> >> on
> >>>> >> Proxmox 7, as well as Ubuntu Linux (kernel 5.19, I believe)
> >>>> >>
> >>>> >> I'm happy to load any debug modules or provide additional logs as per
> >>>> >> your request.
> >>>> >>
> >>>> >> Thank you
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >> --
> >>>> >> Ivan Smirnov
> >>>> >> https://ivans.io/ | https://blog.ivansmirnov.name/
> >>>> >> https://www.linkedin.com/in/ismirnov |
> >>>> >> *https://ivansmirnov.name/ <https://ivansmirnov.name/>*
> >>>> >> *https://github.com/issmirnov <https://ivansmirnov.name/>*
> >>>> >
> >>>>
> >>>> --
> > --
> > Ivan Smirnov
> > https://ivans.io/ | https://blog.ivansmirnov.name/
> > https://www.linkedin.com/in/ismirnov |
> > *https://ivansmirnov.name/ <https://ivansmirnov.name/>*
> > *https://github.com/issmirnov <https://ivansmirnov.name/>*
> >
> -- 
> --
> Ivan Smirnov
> https://ivans.io/ | https://blog.ivansmirnov.name/
> https://www.linkedin.com/in/ismirnov |
> *https://ivansmirnov.name/ <https://ivansmirnov.name/>*
> *https://github.com/issmirnov <https://ivansmirnov.name/>*


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-wired-lan] igc kernel module crashes on new hardware (Intel Ethernet I225-V)
  2022-11-18 22:43               ` Conor Dooley
@ 2022-11-18 22:54                 ` Jakub Kicinski
  -1 siblings, 0 replies; 29+ messages in thread
From: Jakub Kicinski @ 2022-11-18 22:54 UTC (permalink / raw)
  To: Conor Dooley
  Cc: Ivan Smirnov, Neftin, Sasha, Fuxbrumer, Devora, intel-wired-lan,
	Ruinskiy, Dima, Avivi, Amir, regressions

On Fri, 18 Nov 2022 22:43:29 +0000 Conor Dooley wrote:
> > Is there any update for the community? More and more folks are asking. We
> > are all techies and happy to help debug.  
> 
> Vested interest since I am suffering from the same issue (X670E-F
> Gaming), but is it okay to add this to regzbot? Not sure whether it
> counts as a regression or not since it's new hw with the existing driver,
> but this seems to be falling through the cracks without a response for
> several weeks.

Dunno, Thorsten will decide. The line has to be drawn somewhere
on "vendor doesn't care about Linux support" vs "we broke uAPI".
This is the kind of situation I was alluding to in my line of
questioning at the maintainer summit: https://lwn.net/Articles/908324/

Finding a kernel release which does not suffer from the problem
would certainly strengthen your case.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-wired-lan] igc kernel module crashes on new hardware (Intel Ethernet I225-V)
  2022-11-18 22:54                 ` Jakub Kicinski
@ 2022-11-18 23:21                   ` Conor Dooley
  -1 siblings, 0 replies; 29+ messages in thread
From: Conor Dooley @ 2022-11-18 23:21 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Ivan Smirnov, Neftin, Sasha, Fuxbrumer, Devora, intel-wired-lan,
	Ruinskiy, Dima, Avivi, Amir, regressions

On Fri, Nov 18, 2022 at 02:54:43PM -0800, Jakub Kicinski wrote:
> On Fri, 18 Nov 2022 22:43:29 +0000 Conor Dooley wrote:
> > > Is there any update for the community? More and more folks are asking. We
> > > are all techies and happy to help debug.  
> > 
> > Vested interest since I am suffering from the same issue (X670E-F
> > Gaming), but is it okay to add this to regzbot? Not sure whether it
> > counts as a regression or not since it's new hw with the existing driver,
> > but this seems to be falling through the cracks without a response for
> > several weeks.
> 
> Dunno, Thorsten's will decide. The line has to be drawn somewhere
> on "vendor doesn't care about Linux support" vs "we broke uAPI".
> This is the kind of situation I was alluding to in my line of
> questioning at the maintainer summit: https://lwn.net/Articles/908324/

Yeah & it is /regression/ tracking which I don't (or rather didn't)
consider this situation to be. I'm generally a little unsure as to when
I should trigger regzbot in general:
- immediately when I find something?
- only if it goes a while with nothing constructive?
- is it okay to use it outside of "this used to work and now doesn't"?

Either way, I did some more googling and found this reddit thread:
https://www.reddit.com/r/intel/comments/lqb4km/for_people_having_i225v_connection_issues/

That's being reported against Windows & I dunno if the dude is using
firmware and driver interchangeably etc. But the disabling power saving
etc sounds oddly like the issue we have here, since that was a proposed
workaround in Ivan's 2022 reddit thread.

Supposedly I am on firmware-version 1082:8770, but /I/ have no idea
how that corresponds to Windows versioning. That may lend some credence
to your assertion about firmware being the source of many issues.

> Finding a kernel release which does not suffer from the problem
> would certainly strengthen your case.

Aye, likely to be a little difficult to do a meaningful bisection for
me at least, since the motherboard I have with the problem is an AM5
one for the new Zen4 stuff. I'm not an x86 person, so not entirely
sure when that support landed. I may do some poking tomorrow..


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-wired-lan] igc kernel module crashes on new hardware (Intel Ethernet I225-V)
  2022-11-18 23:21                   ` Conor Dooley
@ 2022-11-19 18:06                     ` Neftin, Sasha
  -1 siblings, 0 replies; 29+ messages in thread
From: Neftin, Sasha @ 2022-11-19 18:06 UTC (permalink / raw)
  To: Conor Dooley, Jakub Kicinski
  Cc: Fuxbrumer, Devora, regressions, Meir, NaamaX, intel-wired-lan,
	Ivan Smirnov, Ruinskiy,  Dima, Avivi, Amir

On 11/19/2022 01:21, Conor Dooley wrote:
> On Fri, Nov 18, 2022 at 02:54:43PM -0800, Jakub Kicinski wrote:
>> On Fri, 18 Nov 2022 22:43:29 +0000 Conor Dooley wrote:
>>>> Is there any update for the community? More and more folks are asking. We
>>>> are all techies and happy to help debug.
>>>
>>> Vested interest since I am suffering from the same issue (X670E-F
>>> Gaming), but is it okay to add this to regzbot? Not sure whether it
>>> counts as a regression or not since it's new hw with the existing driver,
>>> but this seems to be falling through the cracks without a response for
>>> several weeks.
>>
>> Dunno, Thorsten's will decide. The line has to be drawn somewhere
>> on "vendor doesn't care about Linux support" vs "we broke uAPI".
>> This is the kind of situation I was alluding to in my line of
>> questioning at the maintainer summit: https://lwn.net/Articles/908324/
> 
> Yeah & it is /regression/ tracking which I don't (or rather didn't)
> consider this situation to be. I'm generally a little unsure as to when
> I should trigger regzbot in general:
> - immediately when I find something?
> - only if it goes a while with nothing constructive?
> - is it okay to use it outside of "this used to work and now doesnt"?
> 
> Either way, but I did some more googling and found this reddit thread:
> https://www.reddit.com/r/intel/comments/lqb4km/for_people_having_i225v_connection_issues/
> 
> That's being reported against windows & I dunno if the dude is using
> firmware and driver interchangeably etc. But the disabling power saving
> etc sounds oddly like the issue we have here, since that was a proposed
> workaround in Ivan's 2022 reddit thread.
> 
> Supposedly I am on firmware-version 1082:8770, but /I/ I have no idea
> how that corresponds to windows versioning. That may lend some credence
> to your assertion about firmware being the source of many issues.
> 
>> Finding a kernel release which does not suffer from the problem
>> would certainly strengthen your case.
> 
> Aye, likely to be a little difficult to do a meaningful bisection for
> me at least, since the motherboard I have with the problem is an AM5
> one for the new Zen4 stuff. I'm not an x86 person, so not entirely
> sure when that support landed. I may do some poking tomorrow..
> 
I do not think we can resolve this problem on this forum.
Ivan's early report included the netdev error "PCIe link lost, device
now detached". Since the PCIe link drops unexpectedly, it could lead to
many problems (not only crashes).
Before you go to SW/FW bisection (changing the FW (NVM), going back to an
older kernel version), please contact your board vendor (ASUS). Why does
the PCIe link drop? Possible causes: a circuit problem on the board, or
the system performing power management flows without stopping the driver.

"failed to read reg 0xc030" is just a symptom; it happens after the PCIe
link is lost.

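The ordering described above can be checked mechanically against a saved
kernel log. What follows is a minimal sketch in Python, not anything taken
from the igc driver: the function names and the two sample lines are made up
for illustration, and the only assumption about the log is that it contains
the two message fragments quoted in this thread.

```python
import re

# Message fragments as quoted in this thread; matching is deliberately loose.
LINK_LOST = "PCIe link lost"
READ_FAIL = re.compile(r"failed to read reg (0x[0-9a-fA-F]+)", re.IGNORECASE)


def scan_log(lines):
    """Return (index of first link-loss line or None, [(index, register), ...])."""
    link_lost_at = None
    read_failures = []
    for index, line in enumerate(lines):
        if link_lost_at is None and LINK_LOST in line:
            link_lost_at = index
        match = READ_FAIL.search(line)
        if match:
            read_failures.append((index, match.group(1)))
    return link_lost_at, read_failures


def read_failures_follow_link_loss(lines):
    """True if every register read failure appears after a link-loss message."""
    link_lost_at, read_failures = scan_log(lines)
    if link_lost_at is None or not read_failures:
        return False
    return all(index > link_lost_at for index, _ in read_failures)


# Hypothetical sample built only from the two messages quoted in this thread;
# real dmesg lines carry timestamps and device prefixes that are omitted here.
SAMPLE = [
    "igc: PCIe link lost, device now detached",
    "igc: failed to read reg 0xc030",
]

if __name__ == "__main__":
    print(read_failures_follow_link_loss(SAMPLE))
```

To try it on a real machine, saving the log with something like
"dmesg -T > igc.log" and passing the file's lines to
read_failures_follow_link_loss() should be enough. A False result on a log
that does contain read failures would mean the link-loss message was not
seen first in that log.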


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-wired-lan] igc kernel module crashes on new hardware (Intel Ethernet I225-V)
  2022-11-18 23:21                   ` Conor Dooley
@ 2022-11-20 10:32                     ` Thorsten Leemhuis
  -1 siblings, 0 replies; 29+ messages in thread
From: Thorsten Leemhuis @ 2022-11-20 10:32 UTC (permalink / raw)
  To: Conor Dooley, Jakub Kicinski
  Cc: Fuxbrumer, Devora, regressions, intel-wired-lan, Ivan Smirnov,
	Ruinskiy, Dima, Avivi, Amir

On 19.11.22 00:21, Conor Dooley wrote:
> On Fri, Nov 18, 2022 at 02:54:43PM -0800, Jakub Kicinski wrote:
>> On Fri, 18 Nov 2022 22:43:29 +0000 Conor Dooley wrote:
>>>> Is there any update for the community? More and more folks are asking. We
>>>> are all techies and happy to help debug.  
>>>
>>> Vested interest since I am suffering from the same issue (X670E-F
>>> Gaming), but is it okay to add this to regzbot? Not sure whether it
>>> counts as a regression or not since it's new hw with the existing driver,
>>> but this seems to be falling through the cracks without a response for
>>> several weeks.
>>
>> Dunno, Thorsten's will decide. The line has to be drawn somewhere
>> on "vendor doesn't care about Linux support" vs "we broke uAPI".
>> This is the kind of situation I was alluding to in my line of
>> questioning at the maintainer summit: https://lwn.net/Articles/908324/
> 
> Yeah & it is /regression/ tracking which I don't (or rather didn't)
> consider this situation to be.

Yeah, looks like this is not something that looks track-worthy for
regzbot -- at least for now. Maybe one day it makes sense to use an
improved regzbot for bug reports as well, but I'd like to focus on
establishing regression tracking properly first, which still requires a
lot of work.

> I'm generally a little unsure as to when
> I should trigger regzbot in general:
> - immediately when I find something?

Yes, ideally, as documented here:
https://docs.kernel.org/admin-guide/reporting-regressions.html

> - only if it goes a while with nothing constructive?

But that is fine as well. But FWIW, we all don't want bureaucracy. Even
I don't add each and every regression I see to the tracking yet.

> - is it okay to use it outside of "this used to work and now doesnt"?

Guess I should clarify in the above doc that this is unwanted.

Ciao, Thorsten

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-wired-lan] igc kernel module crashes on new hardware (Intel Ethernet I225-V)
  2022-11-20 10:32                     ` Thorsten Leemhuis
@ 2022-11-20 18:40                       ` Conor Dooley
  -1 siblings, 0 replies; 29+ messages in thread
From: Conor Dooley @ 2022-11-20 18:40 UTC (permalink / raw)
  To: Thorsten Leemhuis
  Cc: Jakub Kicinski, Ivan Smirnov, Neftin, Sasha, Fuxbrumer, Devora,
	intel-wired-lan, Ruinskiy, Dima, Avivi, Amir, regressions

On Sun, Nov 20, 2022 at 11:32:36AM +0100, Thorsten Leemhuis wrote:
> On 19.11.22 00:21, Conor Dooley wrote:
> > On Fri, Nov 18, 2022 at 02:54:43PM -0800, Jakub Kicinski wrote:
> >> On Fri, 18 Nov 2022 22:43:29 +0000 Conor Dooley wrote:
> >>>> Is there any update for the community? More and more folks are asking. We
> >>>> are all techies and happy to help debug.  
> >>>
> >>> Vested interest since I am suffering from the same issue (X670E-F
> >>> Gaming), but is it okay to add this to regzbot? Not sure whether it
> >>> counts as a regression or not since it's new hw with the existing driver,
> >>> but this seems to be falling through the cracks without a response for
> >>> several weeks.
> >>
> >> Dunno, Thorsten's will decide. The line has to be drawn somewhere
> >> on "vendor doesn't care about Linux support" vs "we broke uAPI".
> >> This is the kind of situation I was alluding to in my line of
> >> questioning at the maintainer summit: https://lwn.net/Articles/908324/
> > 
> > Yeah & it is /regression/ tracking which I don't (or rather didn't)
> > consider this situation to be.
> 
> Yeah, looks like this is not something that look track-worthy for
> regzbot -- at least for now, maybe it one day makes sense to use and
> improved regzbot for bug reports as well, but I'd like to focus on
> establishing regression tracking properly first, which still requires a
> lot of work.
> 
> > I'm generally a little unsure as to when
> > I should trigger regzbot in general:
> > - immediately when I find something?
> 
> Yes, ideally, as documented here:
> https://docs.kernel.org/admin-guide/reporting-regressions.html
> 
> > - only if it goes a while with nothing constructive?
> 
> But that is fine as well. But FWIW, we all don't want bureaucracy. Even
> I don't add each and every regression I see to the tracking yet.
> 
> > - is it okay to use it outside of "this used to work and now doesnt"?
> 
> Guess I should clarify that this is unwanted in above doc.

Right. I wasn't sure if it was okay to use it for "this never worked"
type of issues. Thanks Thorsten!


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-wired-lan] igc kernel module crashes on new hardware (Intel Ethernet I225-V)
  2022-11-19 18:06                     ` Neftin, Sasha
@ 2022-11-20 19:55                       ` Conor Dooley
  -1 siblings, 0 replies; 29+ messages in thread
From: Conor Dooley @ 2022-11-20 19:55 UTC (permalink / raw)
  To: Neftin, Sasha
  Cc: Jakub Kicinski, Ivan Smirnov, Fuxbrumer, Devora, intel-wired-lan,
	Ruinskiy, Dima, Avivi, Amir, regressions, Lifshits, Vitaly,
	naamax.meir, Meir, NaamaX

On Sat, Nov 19, 2022 at 08:06:05PM +0200, Neftin, Sasha wrote:
> On 11/19/2022 01:21, Conor Dooley wrote:
> > On Fri, Nov 18, 2022 at 02:54:43PM -0800, Jakub Kicinski wrote:
> > > On Fri, 18 Nov 2022 22:43:29 +0000 Conor Dooley wrote:
> > > > > Is there any update for the community? More and more folks are asking. We
> > > > > are all techies and happy to help debug.
> > > > 
> > > > Vested interest since I am suffering from the same issue (X670E-F
> > > > Gaming), but is it okay to add this to regzbot? Not sure whether it
> > > > counts as a regression or not since it's new hw with the existing driver,
> > > > but this seems to be falling through the cracks without a response for
> > > > several weeks.
> > > 
> > > Dunno, Thorsten's will decide. The line has to be drawn somewhere
> > > on "vendor doesn't care about Linux support" vs "we broke uAPI".
> > > This is the kind of situation I was alluding to in my line of
> > > questioning at the maintainer summit: https://lwn.net/Articles/908324/
> > 
> > Yeah & it is /regression/ tracking which I don't (or rather didn't)
> > consider this situation to be. I'm generally a little unsure as to when
> > I should trigger regzbot in general:
> > - immediately when I find something?
> > - only if it goes a while with nothing constructive?
> > - is it okay to use it outside of "this used to work and now doesnt"?
> > 
> > Either way, but I did some more googling and found this reddit thread:
> > https://www.reddit.com/r/intel/comments/lqb4km/for_people_having_i225v_connection_issues/
> > 
> > That's being reported against windows & I dunno if the dude is using
> > firmware and driver interchangeably etc. But the disabling power saving
> > etc sounds oddly like the issue we have here, since that was a proposed
> > workaround in Ivan's 2022 reddit thread.
> > 
> > Supposedly I am on firmware-version 1082:8770, but /I/ I have no idea
> > how that corresponds to windows versioning. That may lend some credence
> > to your assertion about firmware being the source of many issues.
> > 
> > > Finding a kernel release which does not suffer from the problem
> > > would certainly strengthen your case.
> > 
> > Aye, likely to be a little difficult to do a meaningful bisection for
> > me at least, since the motherboard I have with the problem is an AM5
> > one for the new Zen4 stuff. I'm not an x86 person, so not entirely
> > sure when that support landed. I may do some poking tomorrow..
> > 
> I do not think we can resolve this problem on this forum.
> Ivan's early report to netdev included the error "PCIe link lost, device
> now detached". Since the PCIe link drops unexpectedly, it could lead to many
> problems (not only crashes).

Hmm, I'll take a look at what mine spits out next time it dies, but I
would imagine that you're correct and I see it too.

> Before you go to SW/FW bisection (changing the FW (NVM), going back to an
> older kernel version), please contact your board vendor (ASUS). Why does
> the PCIe link drop?

I dunno, I suppose it just entered a lower power state!

> A circuit problem on the board? The system performing power management
> flows without stopping the driver?

My GPU and other PCI devices are returning from lower power modes properly.
I wonder what's different about this specific device. As I said, I'm not too
familiar with x86 stuff - is there someone from AMD worth poking? The
output from lspci is a wall of AMD bridges w/ endpoints mixed in.

Doing a cursory look at other x670 stuff - the non-asus ones that I
looked at are not using Intel ethernet.

> "failed to read reg 0xc030" (just a symptom) happens after the PCIe link is lost.

Per 47e16692b26b ("igb/igc: warn when fatal read failure happens"), it
looks as though this is not a *new* problem though as you guys have seen
this while testing.
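
For catching both messages the next time it dies, something along these
lines should do. It is only a rough sketch: the function name is made up
for illustration, and the two patterns are just the strings quoted in this
thread, so other igc messages will not match.

```shell
#!/bin/sh
# Filter kernel log text (read from stdin) for the two igc failure messages
# quoted in this thread. The function name is illustrative only.
igc_failure_lines() {
    grep -i -E 'PCIe link lost|failed to read reg'
}

# Typical use on a live system (may need root to read the kernel log):
#   dmesg -T | igc_failure_lines
#   journalctl -k -b -1 | igc_failure_lines   # previous boot, needs a persistent journal
```

If the "PCIe link lost" line shows up before the register read failure,
that would match the ordering described above.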

I've got a 1 G NIC, I like my dev machine to "just work" so I'll probably
throw that in and see how far that gets me. IIRC it's an igb one so will
at least make for a datapoint.

Thanks,
Conor.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-wired-lan] igc kernel module crashes on new hardware (Intel Ethernet I225-V)
  2022-11-16 22:23           ` Ivan Smirnov
  2022-11-18 22:43               ` Conor Dooley
@ 2022-11-23 11:47             ` Ruinskiy, Dima
  2022-11-24  6:20               ` Ivan Smirnov
  1 sibling, 1 reply; 29+ messages in thread
From: Ruinskiy, Dima @ 2022-11-23 11:47 UTC (permalink / raw)
  To: Ivan Smirnov, Neftin, Sasha
  Cc: Fuxbrumer, Devora, intel-wired-lan, Jakub Kicinski, Avivi, Amir

I have looked at this thread and the other threads referenced from it.

I see multiple users reporting issues with the I225 and its Linux 
driver, on a specific series of ASUS motherboards (X670), and at least 
one report of a similar issue with a different ASUS board (Z690).

The problem looks like the device 'disappears' from the bus, and becomes 
inaccessible to the driver. If it happens early - the driver will not 
load, if it happens later - it may fail with sporadic access errors.
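
One way to check from userspace whether the device still answers on the
bus is to read the start of its PCI configuration space through sysfs.
The following is a minimal sketch, not a diagnostic tool: the function
name and the overridable sysfs root are assumptions made for illustration,
and 0000:06:00.0 is simply the address from the original report. A device
that has dropped off the bus typically reads back as all ones.

```shell
#!/bin/sh
# Report whether a PCI function still answers configuration reads.
#   $1: PCI address, e.g. 0000:06:00.0
#   $2: sysfs root (optional, defaults to /sys; overridable for testing)
pci_presence() {
    cfg="${2:-/sys}/bus/pci/devices/$1/config"
    if [ ! -r "$cfg" ]; then
        # No such device node: never enumerated, or already removed.
        echo "not-enumerated"
        return
    fi
    # The first 4 bytes of config space hold the vendor ID and device ID.
    id=$(od -An -tx1 -N4 "$cfg" | tr -d ' \n')
    if [ "$id" = "ffffffff" ]; then
        # All ones is what a read usually returns when nothing responds.
        echo "not-responding"
    else
        echo "present"
    fi
}

# Example: pci_presence 0000:06:00.0
```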

There are some reports of partially working workarounds (i.e., some 
users claim it solved their issues, while for others it did not help) - 
that have to do with tweaking various PCIe power management settings. I 
can see the connection here, because PCIe power management is not 
trivial, and depends on a combination of hardware, firmware, BIOS, OS 
and driver factors. When there is a problem somewhere - it can manifest 
exactly like what has been reported here.
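
For anyone comparing notes on the power management settings mentioned
above, the active PCIe ASPM policy can be read from sysfs. This is only a
sketch under assumptions: the function name is made up, the file exists
only on kernels built with ASPM support, and I am assuming the reported
tweaks are of the kind named in the comments. It is not a recommended fix.

```shell
#!/bin/sh
# Print the active ASPM policy, i.e. the entry shown in [brackets].
#   $1: policy file (optional; overridable for testing)
active_aspm_policy() {
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p' \
        "${1:-/sys/module/pcie_aspm/parameters/policy}"
}

# Per-device link state (full output usually needs root):
#   lspci -vv -s 0000:06:00.0 | grep -i aspm
#
# Existing kernel command-line parameters that touch the same area:
#   pcie_aspm=off   pcie_aspm.policy=performance   pcie_port_pm=off
# Whether any of them helps on a given board has not been established here.
```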

The user will see that the driver is crashing, but that does not 
necessarily mean that the problem is in the driver. It may be a bug in 
any other component, or an interoperability issue. A fix/workaround may 
also be implemented in any of the involved modules, depending on the 
root cause and the complexity.

We, the igc driver maintainers, are unable to offer any software patch 
for the problem at this point, because the issue has not been 
root-caused, as far as I know. We have not seen this problem during our 
in-house testing, and since it has been reported, have not been able to 
reproduce it on any of our test setups.

The I225 network device is a "LAN on motherboard" solution. While the 
chip, the firmware and the driver are provided by Intel, the motherboard 
vendor is the one that controls the layout, the electrical 
interconnects, the BIOS, and the specific FW version that is flashed to 
the chip.
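
Since the flashed FW version differs per board, it is worth including in
any report to the board vendor. A minimal sketch for pulling it out of the
ethtool -i output requested earlier in this thread; the function name and
the interface name in the usage comment are placeholders, not taken from
any report.

```shell
#!/bin/sh
# Extract the firmware-version field from `ethtool -i` output on stdin.
fw_version() {
    sed -n 's/^firmware-version:[[:space:]]*//p'
}

# Example (enp6s0 is a hypothetical interface name):
#   ethtool -i enp6s0 | fw_version
```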

The fact that many such reports are coming recently from specific ASUS 
boards, and not from other vendors with I225 solutions, would lead me to 
first check in ASUS's direction. ASUS may be able to address this issue 
with a range of solutions - hardware replacement, BIOS update, I225 
firmware rollout, or instructions on how to tweak the settings - 
depending on the cause they will determine. The more reports they get 
from their customers, the more likely they will figure it out.

For instance - a recent report from adam.lamarz on the Bugzilla thread 
https://bugzilla.kernel.org/show_bug.cgi?id=216652#c15
indicates that there is some hope the issue can be alleviated with a 
BIOS update and a tweak to the kernel settings.

Is it _possible_ that in the end there will be some patch in the igc 
driver for this issue, together or independently from other components? Yes.

Can we offer such a patch based on what we know so far? No, because we 
have not been able to reproduce the issue in-house, and have also not 
received any communication about it from ASUS (who, I expect, have their 
own validation and test procedures before they roll out their hardware 
to end users).

I understand this is not the definitive answer to the problem that we 
may all want to see, but this is what I have at the moment.

--Dima

On 17/11/2022 0:23, Ivan Smirnov wrote:
> Hi folks,
> 
> Is there any update for the community? More and more folks are asking. 
> We are all techies and happy to help debug.
> 
> Thank you kindly,
> - Ivan
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-wired-lan] igc kernel module crashes on new hardware (Intel Ethernet I225-V)
  2022-11-23 11:47             ` Ruinskiy, Dima
@ 2022-11-24  6:20               ` Ivan Smirnov
  2022-11-24 13:55                 ` Ruinskiy, Dima
  0 siblings, 1 reply; 29+ messages in thread
From: Ivan Smirnov @ 2022-11-24  6:20 UTC (permalink / raw)
  To: Ruinskiy, Dima
  Cc: Fuxbrumer, Devora, Jakub Kicinski, intel-wired-lan, Avivi, Amir


Hi Dima,

Thank you for the measured response. Do I have your permission to post it
on the reddit thread to inform other users? We can all reach out to ASUS
and hopefully get this escalated past the usual tier-1 support tarpit.

Thanks,
- Ivan
--
Ivan Smirnov
https://ivans.io/ | https://blog.ivansmirnov.name/
https://www.linkedin.com/in/ismirnov |
*https://ivansmirnov.name/ <https://ivansmirnov.name/>*
*https://github.com/issmirnov <https://ivansmirnov.name/>*




^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-wired-lan] igc kernel module crashes on new hardware (Intel Ethernet I225-V)
  2022-11-24  6:20               ` Ivan Smirnov
@ 2022-11-24 13:55                 ` Ruinskiy, Dima
  0 siblings, 0 replies; 29+ messages in thread
From: Ruinskiy, Dima @ 2022-11-24 13:55 UTC (permalink / raw)
  To: Ivan Smirnov
  Cc: Fuxbrumer, Devora, Jakub Kicinski, intel-wired-lan, Avivi, Amir

Hi Ivan,

Certainly. Anything posted on the mailing list is public anyway. 
Whatever can expedite the dissemination of information is welcome.

--Dima

On 24/11/2022 8:20, Ivan Smirnov wrote:
> Hi Dima,
> 
> Thank you for the measured response. Do I have your permission to post 
> it on the reddit thread to inform other users? We can all reach out to 
> ASUS and hopefully get this escalated past the usual tier-1 support tarpit.
> 
> Thanks,
> - Ivan
> --
> Ivan Smirnov
> https://ivans.io/ <https://ivans.io/> | https://blog.ivansmirnov.name/ 
> <https://blog.ivansmirnov.name/>
> https://www.linkedin.com/in/ismirnov 
> <https://www.linkedin.com/in/ismirnov> | _https://ivansmirnov.name/
> <https://ivansmirnov.name/>_
> _https://github.com/issmirnov <https://ivansmirnov.name/>_
> 
> 
> On Wed, Nov 23, 2022 at 4:47 AM Ruinskiy, Dima <dima.ruinskiy@intel.com 
> <mailto:dima.ruinskiy@intel.com>> wrote:
> 
>     I have looked at this thread and the other threads referenced from it.
> 
>     I see multiple users reporting issues with the I225 and its Linux
>     driver, on a specific series of ASUS motherboards (X670), and at least
>     one report of a similar issue with a different ASUS board (Z690).
> 
>     The problem looks like the device 'disappears' from the bus, and becomes
>     inaccessible to the driver. If it happens early - the driver will not
>     load, if it happens later - it may fail with sporadic access errors.
> 
>     There are some reports of partially working workarounds (i.e., some
>     users claim it solved their issues, while for others it did not help) -
>     that have to do with tweaking various PCIe power management settings. I
>     can see the connection here, because PCIe power management is not
>     trivial, and depends on a combination of hardware, firmware, BIOS, OS
>     and driver factors. When there is a problem somewhere - it can manifest
>     exactly like what has been reported here.
> 
>     The user will see that the driver is crashing, but that does not
>     necessarily mean that the problem is in the driver. It may be a bug in
>     any other component, or an interoperability issue. A fix/workaround may
>     also be implemented in any of the involved modules, depending on the
>     root cause and the complexity.
> 
>     We, the igc driver maintainers, are unable to offer any software patch
>     for the problem at this point, because the issue has not been
>     root-caused, as far as I know. We have not seen this problem during our
>     in-house testing, and since it has been reported, have not been able to
>     reproduce it on any of our test setups.
> 
>     The I225 network device is a "LAN on motherboard" solution. While the
>     chip, the firmware and the driver are provided by Intel, the motherboard
>     vendor is the one that controls the layout, the electrical
>     interconnects, the BIOS, and the specific FW version that is flashed to
>     the chip.
> 
>     The fact that many such reports are coming recently from specific ASUS
>     boards, and not from other vendors with I225 solutions, would lead me to
>     first check in ASUS's direction. ASUS may be able to address this issue
>     with a range of solutions - hardware replacement, BIOS update, I225
>     firmware rollout, or instructions on how to tweak the settings -
>     depending on the cause they will determine. The more reports they get
>     from their customers, the more likely they will figure it out.
> 
>     For instance - a recent report from adam.lamarz on the Bugzilla thread
>     https://bugzilla.kernel.org/show_bug.cgi?id=216652#c15
>     indicates that there is some hope the issue can be alleviated with a
>     BIOS update and a tweak to the kernel settings.
> 
>     Is it _possible_ that in the end there will be some patch in the igc
>     driver for this issue, together or independently from other components?
>     Yes.
> 
>     Can we offer such a patch based on what we know so far? No, because we
>     have not been able to reproduce the issue in-house, and have also not
>     received any communication about it from ASUS (who, I expect, have their
>     own validation and test procedures, before they roll-out their hardware
>     to the end-users).
> 
>     I understand this is not the definitive answer to the problem that we
>     may all want to see, but this is what I have at the moment.
> 
>     --Dima
> 
>     On 17/11/2022 0:23, Ivan Smirnov wrote:
>      > Hi folks,
>      >
>      > Is there any update for the community? More and more folks are asking.
>      > We are all techies and happy to help debug.
>      >
>      > Thank you kindly,
>      > - Ivan
>      >
>      > On Thu, Nov 10, 2022 at 03:44 Ivan Smirnov <isgsmirnov@gmail.com> wrote:
>      >
>      >     Some more data from another user. Do you guys have any preliminary
>      >     investigation you could share back with the community?
>      >
>      >     Same issue, been struggling with it for a last month or so: both
>      >     with Ubuntu and Arch Linux. I have a dual-boot system with Windows
>      >     11, and did not notice any issues with ethernet or wifi on Windows.
>      >     So this indeed seems like a firmware issue, particularly in igc. Not
>      >     the adapter itself
>      >
>      >     Running on Arch Linux kernel 6.0.7, same motherboard as in your post
>      >
>      >     https://gist.github.com/LilDojd/2f030ecc5c5b6f8c3285725adfb8c456
>      >
>      >     On Thu, Nov 3, 2022 at 05:53 Ivan Smirnov <isgsmirnov@gmail.com> wrote:
>      >
>      >         Here is the gist from one reddit user:
>      >         https://gist.github.com/DarkArc/50ffca5fc343e2ff8166bc81d3ff8335
>      >
>      >         Here are my gists (crash free for now):
>      >         https://gist.github.com/issmirnov/b9ac74d232e1865ae849a3e64dce2afe
>      >
>      >         --
>      >         Ivan Smirnov
>      >         https://ivans.io/ | https://blog.ivansmirnov.name/
>      >         https://www.linkedin.com/in/ismirnov | https://ivansmirnov.name/
>      >         https://github.com/issmirnov
>      >
>      >         On Wed, Nov 2, 2022 at 10:54 AM Ivan Smirnov
>      >         <isgsmirnov@gmail.com> wrote:
>      >
>      >             Hi folks,
>      >
>      >             As usual, the computers know when the experts join the
>      >             chat... I haven't been able to reproduce the issue for the
>      >             past few days. Yay for stability, boo for debugging.
>      >
>      >             I posted on the reddit thread
>      >             <https://www.reddit.com/r/buildapc/comments/xypn1m/network_card_intel_ethernet_controller_i225v_igc/>
>      >             asking other users to post their output. I'll do my best to
>      >             keep an eye out for this issue and get you the logs ASAP once I
>      >             repro the crash.
>      >
>      >             Thank you for your responsiveness - will keep you posted!
>      >
>      >             Best,
>      >             - Ivan
>      >             --
>      >             Ivan Smirnov
>      >             https://ivans.io/ | https://blog.ivansmirnov.name/
>      >             https://www.linkedin.com/in/ismirnov | https://ivansmirnov.name/
>      >             https://github.com/issmirnov
>      >
>      >             On Tue, Nov 1, 2022 at 10:21 AM Neftin, Sasha
>      >             <sasha.neftin@intel.com> wrote:
>      >
>      >                 On 11/1/2022 02:05, Jakub Kicinski wrote:
>      >                  > CC: intel-wired
>      >                  >
>      >                  > On Sun, 30 Oct 2022 14:44:57 -0600 Ivan Smirnov wrote:
>      >                  >> Hi folks,
>      >                  >>
>      >                  >> I found your commits on the linux kernel igc
>      >                  >> <https://github.com/torvalds/linux/commits/master/drivers/net/ethernet/intel/igc>
>      >                  >> folder. There appears to be a bug with the igc kernel module on Intel
>      >                  >> I225-V chips.
>      >                  >>
>      >                  >> Specifically, the probe fails at startup with error: "igc: probe of
>      >                  >> 0000:06:00.0 failed with error -13". When it does load, it crashes after a
>      >                  >> few hours with error "igc failed to read reg 0xc030".
>      >                  >>
>      >                 Could you provide dmesg -w -T | grep -i igc on the boot
>      >                 stage? ethtool -i?
>      >                 I've cc'd our PAE expert Amir who also could try to look
>      >                 at this problem.
>      >
>      >                  >> There are several affected users posting on
>      >                  >> https://www.reddit.com/r/buildapc/comments/xypn1m/network_card_intel_ethernet_controller_i225v_igc/
>      >                  >> with more details.
>      >                  >>
>      >                  >> Could I help you debug this? This problem has been reproduced on the
>      >                  >> following setups:
>      >                  >>
>      >                  >> 1. Asus TUF-GAMING-Z690-PLUS-WIFI-D4
>      >                  >> <https://www.asus.com/motherboards-components/motherboards/tuf-gaming/tuf-gaming-z690-plus-wifi-d4/>
>      >                  >> on
>      >                  >> Arch Linux, kernel 6.0.2-arch1-1
>      >                  >> 2. rog strix x670e-e gaming wifi
>      >                  >> <https://rog.asus.com/us/motherboards/rog-strix/rog-strix-x670e-e-gaming-wifi-model/>
>      >                  >> on
>      >                  >> Proxmox 7, as well as Ubuntu Linux (kernel 5.19, I believe)
>      >                  >>
>      >                  >> I'm happy to load any debug modules or provide additional logs as per
>      >                  >> your request.
>      >                  >>
>      >                  >> Thank you
>      >                  >>
>      >                  >> --
>      >                  >> Ivan Smirnov
>      >                  >> https://ivans.io/ | https://blog.ivansmirnov.name/
>      >                  >> https://www.linkedin.com/in/ismirnov | https://ivansmirnov.name/
>      >                  >> https://github.com/issmirnov
>      >                  >
>      >
>      >     --
>      >     Ivan Smirnov
>      >     https://ivans.io/ | https://blog.ivansmirnov.name/
>      >     https://www.linkedin.com/in/ismirnov | https://ivansmirnov.name/
>      >     https://github.com/issmirnov
>      >
>      > --
>      > Ivan Smirnov
>      > https://ivans.io/ | https://blog.ivansmirnov.name/
>      > https://www.linkedin.com/in/ismirnov | https://ivansmirnov.name/
>      > https://github.com/issmirnov
> 

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-wired-lan] igc kernel module crashes on new hardware (Intel Ethernet I225-V)
  2022-11-20 19:55                       ` Conor Dooley
@ 2022-12-21 17:30                         ` Conor Dooley
  -1 siblings, 0 replies; 29+ messages in thread
From: Conor Dooley @ 2022-12-21 17:30 UTC (permalink / raw)
  To: Neftin, Sasha
  Cc: Jakub Kicinski, Ivan Smirnov, Fuxbrumer, Devora, intel-wired-lan,
	Ruinskiy, Dima, Avivi, Amir, regressions, Lifshits, Vitaly,
	naamax.meir, Meir, NaamaX

[-- Attachment #1: Type: text/plain, Size: 4620 bytes --]

On Sun, Nov 20, 2022 at 07:55:09PM +0000, Conor Dooley wrote:
> On Sat, Nov 19, 2022 at 08:06:05PM +0200, Neftin, Sasha wrote:
> > On 11/19/2022 01:21, Conor Dooley wrote:
> > > On Fri, Nov 18, 2022 at 02:54:43PM -0800, Jakub Kicinski wrote:
> > > > On Fri, 18 Nov 2022 22:43:29 +0000 Conor Dooley wrote:
> > > > > > Is there any update for the community? More and more folks are asking. We
> > > > > > are all techies and happy to help debug.
> > > > > 
> > > > > Vested interest since I am suffering from the same issue (X670E-F
> > > > > Gaming), but is it okay to add this to regzbot? Not sure whether it
> > > > > counts as a regression or not since it's new hw with the existing driver,
> > > > > but this seems to be falling through the cracks without a response for
> > > > > several weeks.
> > > > 
> > > > Dunno, Thorsten will decide. The line has to be drawn somewhere
> > > > on "vendor doesn't care about Linux support" vs "we broke uAPI".
> > > > This is the kind of situation I was alluding to in my line of
> > > > questioning at the maintainer summit: https://lwn.net/Articles/908324/
> > > 
> > > Yeah & it is /regression/ tracking which I don't (or rather didn't)
> > > consider this situation to be. I'm generally a little unsure as to when
> > > I should trigger regzbot in general:
> > > - immediately when I find something?
> > > - only if it goes a while with nothing constructive?
> > > - is it okay to use it outside of "this used to work and now doesn't"?
> > > 
> > > Either way, I did some more googling and found this reddit thread:
> > > https://www.reddit.com/r/intel/comments/lqb4km/for_people_having_i225v_connection_issues/
> > > 
> > > That's being reported against windows & I dunno if the dude is using
> > > firmware and driver interchangeably etc. But the disabling power saving
> > > etc sounds oddly like the issue we have here, since that was a proposed
> > > workaround in Ivan's 2022 reddit thread.
> > > 
> > > Supposedly I am on firmware-version 1082:8770, but /I/ have no idea
> > > how that corresponds to windows versioning. That may lend some credence
> > > to your assertion about firmware being the source of many issues.
> > > 
> > > > Finding a kernel release which does not suffer from the problem
> > > > would certainly strengthen your case.
> > > 
> > > Aye, likely to be a little difficult to do a meaningful bisection for
> > > me at least, since the motherboard I have with the problem is an AM5
> > > one for the new Zen4 stuff. I'm not an x86 person, so not entirely
> > > sure when that support landed. I may do some poking tomorrow..
> > > 
> > I do not think we can resolve this problem on this forum.
> > Ivan's early report showed the netdev error "PCIe link lost, device
> > now detached". Since the PCIe link drops unexpectedly, it could lead to many
> > problems (not only crashes).
> 
> Hmm, I'll take a look at what mine spits out next time it dies, but I
> would imagine that you're correct and I see it too.

It does in fact say that, but interestingly only this peripheral has any
issues. My GPUs etc have no problem at all.
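
For anyone collecting logs for this thread, the three symptom strings quoted so far can be picked out of a saved kernel log with a few lines of Python. This is only a rough sketch: the patterns are built from the wording quoted in these mails, the function and dictionary names are made up for illustration, and real dmesg output may carry extra prefixes or slightly different wording.

```python
import re

# Patterns based on the messages quoted in this thread. They are an
# illustrative assumption, not an exhaustive list of igc failure messages.
SYMPTOMS = {
    "probe_failed": re.compile(r"igc: probe of (\S+) failed with error (-?\d+)"),
    "link_lost": re.compile(r"PCIe link lost, device now detached"),
    "reg_read_failed": re.compile(r"failed to read reg (0x[0-9a-f]+)", re.IGNORECASE),
}


def classify_igc_log(text):
    """Return (line_number, symptom_name) pairs for every matching log line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SYMPTOMS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


if __name__ == "__main__":
    # Hypothetical sample built from the strings quoted in the thread.
    sample = (
        "igc: probe of 0000:06:00.0 failed with error -13\n"
        "PCIe link lost, device now detached\n"
        "igc failed to read reg 0xc030\n"
    )
    for lineno, name in classify_igc_log(sample):
        print(lineno, name)
```

Feeding it the saved output of dmesg shows whether the link-lost message precedes the register read failures, which is the ordering Sasha describes above.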

> > Before you go to SW/FW bisection (change FW(NVM), go back with a kernel
> > version) - please contact your board vendor (ASUS). Why does the PCIe link drop?
> 
> I dunno, I suppose it just entered a lower power state!
> 
> > Possible causes: a circuit problem on the board, or the system performing
> > power management flows without stopping the driver.
> 
> My GPU and other PCI devices are returning from lower power modes properly.
> I wonder what's different about this specific device. As I said, not too
> familiar with x86 stuff - is there someone from AMD worth poking as the
> output from lspci is a wall of AMD bridges w/ endpoints mixed in.
> 
> Doing a cursory look at other x670 stuff - the non-asus ones that I
> looked at are not using Intel ethernet.
> 
> > "failed to read reg 0xc030" (just a symptom) happens after the PCIe link is lost.
> 
> Per 47e16692b26b ("igb/igc: warn when fatal read failure happens"), it
> looks as though this is not a *new* problem though as you guys have seen
> this while testing.
> 
> I've got a 1 G NIC, I like my dev machine to "just work" so I'll probably
> throw that in and see how far that gets me. IIRC it's an igb one so will
> at least make for a datapoint.

FWIW I gave up on the igc driver and am using my 1 G NIC instead, couldn't be
bothered with the disruption. I'll give the BIOS stuff mentioned
elsewhere a go over Christmas now that v6.1.1 exists and see if that
helps. Hopefully it does!

Conor.


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-wired-lan] igc kernel module crashes on new hardware (Intel Ethernet I225-V)
  2022-12-21 17:30                         ` Conor Dooley
@ 2022-12-31 15:02                           ` Conor Dooley
  -1 siblings, 0 replies; 29+ messages in thread
From: Conor Dooley @ 2022-12-31 15:02 UTC (permalink / raw)
  To: Neftin, Sasha
  Cc: Jakub Kicinski, Ivan Smirnov, Fuxbrumer, Devora, intel-wired-lan,
	Ruinskiy, Dima, Avivi, Amir, regressions, Lifshits, Vitaly,
	naamax.meir, Meir, NaamaX, helgaas

[-- Attachment #1: Type: text/plain, Size: 5909 bytes --]

On Wed, Dec 21, 2022 at 05:30:54PM +0000, Conor Dooley wrote:
> On Sun, Nov 20, 2022 at 07:55:09PM +0000, Conor Dooley wrote:
> > On Sat, Nov 19, 2022 at 08:06:05PM +0200, Neftin, Sasha wrote:
> > > On 11/19/2022 01:21, Conor Dooley wrote:
> > > > On Fri, Nov 18, 2022 at 02:54:43PM -0800, Jakub Kicinski wrote:
> > > > > On Fri, 18 Nov 2022 22:43:29 +0000 Conor Dooley wrote:
> > > > > > > Is there any update for the community? More and more folks are asking. We
> > > > > > > are all techies and happy to help debug.
> > > > > > 
> > > > > > Vested interest since I am suffering from the same issue (X670E-F
> > > > > > Gaming), but is it okay to add this to regzbot? Not sure whether it
> > > > > > counts as a regression or not since it's new hw with the existing driver,
> > > > > > but this seems to be falling through the cracks without a response for
> > > > > > several weeks.
> > > > > 
> > > > > Dunno, Thorsten will decide. The line has to be drawn somewhere
> > > > > on "vendor doesn't care about Linux support" vs "we broke uAPI".
> > > > > This is the kind of situation I was alluding to in my line of
> > > > > questioning at the maintainer summit: https://lwn.net/Articles/908324/
> > > > 
> > > > Yeah & it is /regression/ tracking which I don't (or rather didn't)
> > > > consider this situation to be. I'm generally a little unsure as to when
> > > > I should trigger regzbot in general:
> > > > - immediately when I find something?
> > > > - only if it goes a while with nothing constructive?
> > > > - is it okay to use it outside of "this used to work and now doesn't"?
> > > > 
> > > > Either way, I did some more googling and found this reddit thread:
> > > > https://www.reddit.com/r/intel/comments/lqb4km/for_people_having_i225v_connection_issues/
> > > > 
> > > > That's being reported against windows & I dunno if the dude is using
> > > > firmware and driver interchangeably etc. But the disabling power saving
> > > > etc sounds oddly like the issue we have here, since that was a proposed
> > > > workaround in Ivan's 2022 reddit thread.
> > > > 
> > > > Supposedly I am on firmware-version 1082:8770, but /I/ have no idea
> > > > how that corresponds to windows versioning. That may lend some credence
> > > > to your assertion about firmware being the source of many issues.
> > > > 
> > > > > Finding a kernel release which does not suffer from the problem
> > > > > would certainly strengthen your case.
> > > > 
> > > > Aye, likely to be a little difficult to do a meaningful bisection for
> > > > me at least, since the motherboard I have with the problem is an AM5
> > > > one for the new Zen4 stuff. I'm not an x86 person, so not entirely
> > > > sure when that support landed. I may do some poking tomorrow..
> > > > 
> > > I do not think we can resolve this problem on this forum.
> > > Ivan's early report showed the netdev error "PCIe link lost, device
> > > now detached". Since the PCIe link drops unexpectedly, it could lead to many
> > > problems (not only crashes).
> > 
> > Hmm, I'll take a look at what mine spits out next time it dies, but I
> > would imagine that you're correct and I see it too.
> 
> It does in fact say that, but interestingly only this peripheral has any
> issues. My GPUs etc have no problem at all.
> 
> > > Before you go to SW/FW bisection (change FW(NVM), go back with a kernel
> > > version) - please contact your board vendor (ASUS). Why does the PCIe link drop?
> > 
> > I dunno, I suppose it just entered a lower power state!
> > 
> > > Possible causes: a circuit problem on the board, or the system performing
> > > power management flows without stopping the driver.
> > 
> > My GPU and other PCI devices are returning from lower power modes properly.
> > I wonder what's different about this specific device. As I said, not too
> > familiar with x86 stuff - is there someone from AMD worth poking as the
> > output from lspci is a wall of AMD bridges w/ endpoints mixed in.
> > 
> > Doing a cursory look at other x670 stuff - the non-asus ones that I
> > looked at are not using Intel ethernet.
> > 
> > > "failed to read reg 0xc030" (just a symptom) happens after the PCIe link is lost.
> > 
> > Per 47e16692b26b ("igb/igc: warn when fatal read failure happens"), it
> > looks as though this is not a *new* problem though as you guys have seen
> > this while testing.
> > 
> > I've got a 1 G NIC, I like my dev machine to "just work" so I'll probably
> > throw that in and see how far that gets me. IIRC it's an igb one so will
> > at least make for a datapoint.
> 
> FWIW I gave up on the igc driver and am using my 1 G NIC instead, couldn't be
> bothered with the disruption. I'll give the BIOS stuff mentioned
> elsewhere a go over Christmas now that v6.1.1 exists and see if that
> helps. Hopefully it does!

Hallo, me again...

I didn't actually give the BIOS stuff a go in the end. I figured that
changing everything at once would likely not be a good idea - but what I
did do was try v6.1.1 & have now been running for 50-something hours
without any issues while using the igc iface.

Wholly unscientific of course, but I had noticed this thread:
https://lore.kernel.org/all/20221226225045.GA400369@bhelgaas/
and that commit c01163dbd1b8 ("PCI/PM: Always disable PTM for all devices
during suspend") was not part of the v6.0.y kernels I was running but
*is* in v6.1.y, which was my impetus for trying the kernel upgrade.

I checked v6.0.16-rc2 and that commit does not appear to have been
backported yet.
Perhaps some of the other "victims" in this thread who have not yet
tried changing BIOS etc, could give v6.1.y a go & see if they still have
issues.
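
For anyone wanting to repeat the backport check above: a stable backport
is a cherry-pick with its own hash, so looking for c01163dbd1b8 directly
in a v6.0.y history will not find it. Stable commits normally cite the
mainline hash in the message body instead ("commit <sha> upstream." or
"[ Upstream commit <sha> ]"). The following is only a minimal sketch of
that search, not necessarily how the check was done here; the function
names, the repository path and the revision range are all illustrative
assumptions.

```python
import re
import subprocess

# Matches the two usual ways a stable commit cites its mainline original.
UPSTREAM_RE = re.compile(
    r"commit ([0-9a-f]{12,40}) upstream|\[ Upstream commit ([0-9a-f]{12,40}) \]",
    re.IGNORECASE,
)


def references_upstream(message: str, short_sha: str) -> bool:
    """Return True if a commit message cites a mainline hash starting with short_sha."""
    for match in UPSTREAM_RE.finditer(message):
        cited = match.group(1) or match.group(2)
        if cited.lower().startswith(short_sha.lower()):
            return True
    return False


def is_backported(repo: str, rev_range: str, short_sha: str) -> bool:
    """Scan every commit message in rev_range of a local clone (hypothetical helper)."""
    # %B is the raw message body; %x00 adds a NUL so messages can be split safely.
    log = subprocess.run(
        ["git", "-C", repo, "log", "--format=%B%x00", rev_range],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return any(references_upstream(msg, short_sha) for msg in log.split("\0"))
```

Assuming a clone of the stable tree at a path of your choosing, a call
such as is_backported("/path/to/linux", "v6.0..linux-6.0.y",
"c01163dbd1b8") would report whether any v6.0.y commit cites that
mainline hash.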

I may backport the aforementioned patch myself and see how it does, but
someone else trying v6.1.y & not seeing the iface dying would certainly
help with motivation :)

Thanks,
Conor.




* Re: [Intel-wired-lan] igc kernel module crashes on new hardware (Intel Ethernet I225-V)
  2022-12-31 15:02                           ` Conor Dooley
@ 2023-01-02 11:09                             ` Conor Dooley
  -1 siblings, 0 replies; 29+ messages in thread
From: Conor Dooley @ 2023-01-02 11:09 UTC (permalink / raw)
  To: Neftin, Sasha
  Cc: Jakub Kicinski, Ivan Smirnov, Fuxbrumer, Devora, intel-wired-lan,
	Ruinskiy, Dima, Avivi, Amir, regressions, Lifshits, Vitaly,
	naamax.meir, Meir, NaamaX, helgaas

[-- Attachment #1: Type: text/plain, Size: 545 bytes --]

On Sat, Dec 31, 2022 at 03:02:57PM +0000, Conor Dooley wrote:

> I didn't actually give the bios stuff a go in the end. I figured that
> changing everything at once would likely not be a good idea - but what I
> did do was try v6.1.1 & have now been running for 50-something hours
> without any issues while using the igc iface.

Bah, it died last night at about the 90 hour mark. Still an order of
magnitude longer than I had got it to run continuously before, but
not fixed :(

I'll give the bios a go I suppose, sorry for the noise!




end of thread

Thread overview: 29+ messages
     [not found] <CAPAtJa_o5q-sU+AD=G3y43H_5pBKnOZTQGXM99uszPXNkn8Z9A@mail.gmail.com>
2022-11-01  0:05 ` [Intel-wired-lan] igc kernel module crashes on new hardware (Intel Ethernet I225-V) Jakub Kicinski
2022-11-01 16:20   ` Neftin, Sasha
2022-11-02 16:54     ` Ivan Smirnov
2022-11-02 17:53       ` Ivan Smirnov
2022-11-10 11:44         ` Ivan Smirnov
2022-11-16 22:23           ` Ivan Smirnov
2022-11-18 22:43             ` Conor Dooley
2022-11-18 22:43               ` Conor Dooley
2022-11-18 22:54               ` Jakub Kicinski
2022-11-18 22:54                 ` Jakub Kicinski
2022-11-18 23:21                 ` Conor Dooley
2022-11-18 23:21                   ` Conor Dooley
2022-11-19 18:06                   ` Neftin, Sasha
2022-11-19 18:06                     ` Neftin, Sasha
2022-11-20 19:55                     ` Conor Dooley
2022-11-20 19:55                       ` Conor Dooley
2022-12-21 17:30                       ` Conor Dooley
2022-12-21 17:30                         ` Conor Dooley
2022-12-31 15:02                         ` Conor Dooley
2022-12-31 15:02                           ` Conor Dooley
2023-01-02 11:09                           ` Conor Dooley
2023-01-02 11:09                             ` Conor Dooley
2022-11-20 10:32                   ` Thorsten Leemhuis
2022-11-20 10:32                     ` Thorsten Leemhuis
2022-11-20 18:40                     ` Conor Dooley
2022-11-20 18:40                       ` Conor Dooley
2022-11-23 11:47             ` Ruinskiy, Dima
2022-11-24  6:20               ` Ivan Smirnov
2022-11-24 13:55                 ` Ruinskiy, Dima
