regressions.lists.linux.dev archive mirror
* Bug 215689 - e1000e: regression for I219-V for kernel 5.14 onwards
@ 2022-03-24 10:37 Thorsten Leemhuis
  2022-03-24 15:09 ` Neftin, Sasha
  2022-04-19 15:33 ` Bug 215689 - e1000e: regression for I219-V for kernel 5.14 onwards #forregzbot Thorsten Leemhuis
  0 siblings, 2 replies; 7+ messages in thread
From: Thorsten Leemhuis @ 2022-03-24 10:37 UTC
  To: Sasha Neftin, Tony Nguyen, Jesse Brandeburg
  Cc: regressions, intel-wired-lan, James

Hi, this is your Linux kernel regression tracker.

I noticed a regression report in bugzilla.kernel.org that afaics nobody
has acted upon since it was reported about a week ago. That's why I decided
to forward it to the lists and add a few relevant people to the CC. To quote
from https://bugzilla.kernel.org/show_bug.cgi?id=215689 :

> Description James 2022-03-15 13:45:38 UTC
> 
> I run Arch Linux on an Intel NUC 8i3BEH which has the following network card:
> 
> 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (6) I219-V (rev 30)
>         DeviceName:  LAN
>         Subsystem: Intel Corporation Device 2074
>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0
>         Interrupt: pin A routed to IRQ 135
>         Region 0: Memory at c0b00000 (32-bit, non-prefetchable) [size=128K]
>         Capabilities: [c8] Power Management version 3
>                 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>                 Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
>         Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>                 Address: 00000000fee003d8  Data: 0000
>         Kernel driver in use: e1000e
>         Kernel modules: e1000e
> 
> I found a major regression in the last few kernel versions which causes several odd issues. Most notably, I use the machine to stream live TV via TVheadend and found this to be unusable (the picture freezes and the sound breaks up very badly, with continuity errors in the TVheadend logfile).
> 
> I found that the issue was introduced in the 5.14 kernel, and I eventually got round to performing a git bisect, which landed on the following commit:
> 
> 44a13a5: e1000e: Fix the max snoop/no-snoop latency for 10M 
> 
> Indeed, if I revert this single commit then the problem is resolved.
> 
> To help diagnose the issue I applied the following patch to capture the values of the lat_enc, max_ltr_enc vs lat_enc_d, max_ltr_enc_d variables:
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
> index d60e2016d..f4e5ffbcd 100644
> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
> @@ -1012,6 +1012,7 @@ static s32 e1000_platform_pm_pch_lpt(struct e1000_hw *hw, bool link)
>         u16 max_ltr_enc_d = 0;  /* maximum LTR decoded by platform */
>         u16 lat_enc_d = 0;      /* latency decoded */
>         u16 lat_enc = 0;        /* latency encoded */
> +       struct e1000_adapter *adapter = hw->adapter;
> 
>         if (link) {
>                 u16 speed, duplex, scale = 0;
> @@ -1074,6 +1075,9 @@ static s32 e1000_platform_pm_pch_lpt(struct e1000_hw *hw, bool link)
>                                  ((max_ltr_enc & E1000_LTRV_SCALE_MASK)
>                                  >> E1000_LTRV_SCALE_SHIFT)));
> 
> +               e_info("e1000e: lat_enc=%d max_ltr_enc=%d", lat_enc, max_ltr_enc);
> +               e_info("e1000e: lat_enc_d=%u max_ltr_enc_d=%u", lat_enc_d, max_ltr_enc_d);
> +
>                 if (lat_enc_d > max_ltr_enc_d)
>                         lat_enc = max_ltr_enc;
>         }
> 
> With this in place I see the following in dmesg:
> 
> [    3.241215] e1000e: Intel(R) PRO/1000 Network Driver
> [    3.241217] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
> [    3.243382] e1000e 0000:00:1f.6: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
> [    3.749009] e1000e 0000:00:1f.6 0000:00:1f.6 (uninitialized): registered PHC clock
> [    3.824751] e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width x1) 94:c6:91:ae:b3:7b
> [    3.824765] e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network Connection
> [    3.824849] e1000e 0000:00:1f.6 eth0: MAC: 13, PHY: 12, PBA No: FFFFFF-0FF
> [    6.949327] e1000e 0000:00:1f.6 eth0: e1000e: lat_enc=2233 max_ltr_enc=4099
> [    6.949331] e1000e 0000:00:1f.6 eth0: e1000e: lat_enc_d=58368 max_ltr_enc_d=0
> [    6.951165] e1000e 0000:00:1f.6 eth0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> 
> Notice that lat_enc_d=58368 and max_ltr_enc_d=0 !
> 
> lat_enc_d is greater than max_ltr_enc_d so it's setting snoop latency to max_ltr_enc (i.e. 4099) where it would have previously been set to 2233 in this particular example. This seems to be where the problem lies.
> 
> Prior to commit 44a13a5:
> 
> if (lat_enc > max_ltr_enc)
>   lat_enc = max_ltr_enc;
> 
> After commit 44a13a5:
> 
> if (lat_enc_d > max_ltr_enc_d)
>   lat_enc = max_ltr_enc;
> 
> 
> I'm not sure whether it was intended for this new code to take effect for an I219 since the commit message on 44a13a5 indicates it was aimed at I217/I218. Seems strange that max_ltr_enc_d is getting set to 0?
> 
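
The difference between the two conditions quoted above can be reproduced
outside the kernel. The following is only a minimal standalone sketch, not
driver code: the function names clamp_before() and clamp_after() are
hypothetical, and the inputs used below are the values from the dmesg
output in the report.

```c
#include <stdint.h>

/* Condition before commit 44a13a5: compares the encoded values directly. */
uint16_t clamp_before(uint16_t lat_enc, uint16_t max_ltr_enc)
{
	if (lat_enc > max_ltr_enc)
		lat_enc = max_ltr_enc;
	return lat_enc;
}

/*
 * Condition after commit 44a13a5: compares the decoded values, but the
 * value written back is still the encoded maximum.
 */
uint16_t clamp_after(uint16_t lat_enc, uint16_t max_ltr_enc,
		     uint16_t lat_enc_d, uint16_t max_ltr_enc_d)
{
	if (lat_enc_d > max_ltr_enc_d)
		lat_enc = max_ltr_enc;
	return lat_enc;
}
```

With the reported numbers, clamp_before(2233, 4099) leaves lat_enc at 2233,
while clamp_after(2233, 4099, 58368, 0) replaces it with 4099, which matches
the behaviour change described in the report.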

BTW, that commit is from Sasha Neftin.

Could somebody take a look into this? Or was this discussed somewhere
else already? Or even fixed?

Anyway, to get this tracked:

#regzbot introduced: 44a13a5d99c71bf9e1676d9e51679daf4d7b3d73
#regzbot from: James <jahutchinson99@googlemail.com>
#regzbot title: net: e1000e: instabilities on I219-V for kernel 5.14 onwards
#regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=215689

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I'm getting a lot of
reports on my table. I can only look briefly into most of them and lack
knowledge about most of the areas they concern. I thus unfortunately
will sometimes get things wrong or miss something important. I hope
that's not the case here; if you think it is, don't hesitate to tell me
in a public reply, it's in everyone's interest to set the public record
straight.

-- 
Additional information about regzbot:

If you want to know more about regzbot, check out its web-interface, the
getting started guide, and the reference documentation:

https://linux-regtracking.leemhuis.info/regzbot/
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md

The last two documents will explain how you can interact with regzbot
yourself if you want to.

Hint for reporters: when reporting a regression it's in your interest to
CC the regression list and tell regzbot about the issue, as that ensures
the regression makes it onto the radar of the Linux kernel's regression
tracker and your report won't fall through the cracks unnoticed.

Hint for developers: you normally don't need to care about regzbot once
it's involved. Fix the issue as you normally would, just remember to
include 'Link:' tags in the patch description pointing to all reports
about the issue. This has been expected from developers even before
regzbot showed up for reasons explained in
'Documentation/process/submitting-patches.rst' and
'Documentation/process/5.Posting.rst'.


* Re: Bug 215689 - e1000e: regression for I219-V for kernel 5.14 onwards
  2022-03-24 10:37 Bug 215689 - e1000e: regression for I219-V for kernel 5.14 onwards Thorsten Leemhuis
@ 2022-03-24 15:09 ` Neftin, Sasha
  2022-03-24 19:36   ` Neftin, Sasha
  2022-04-19 15:33 ` Bug 215689 - e1000e: regression for I219-V for kernel 5.14 onwards #forregzbot Thorsten Leemhuis
  1 sibling, 1 reply; 7+ messages in thread
From: Neftin, Sasha @ 2022-03-24 15:09 UTC
  To: Thorsten Leemhuis, Tony Nguyen, Jesse Brandeburg, Fuxbrumer,
	Devora, Ruinskiy, Dima, naamax.meir, Neftin, Sasha
  Cc: regressions, intel-wired-lan, James

On 3/24/2022 12:37, Thorsten Leemhuis wrote:
> Hi, this is your Linux kernel regression tracker.
> 
> I noticed a regression report in bugzilla.kernel.org that afaics nobody
> has acted upon since it was reported about a week ago. That's why I decided
> to forward it to the lists and add a few relevant people to the CC. To quote
> from https://bugzilla.kernel.org/show_bug.cgi?id=215689 :
> 
>> Description James 2022-03-15 13:45:38 UTC
>>
>> I run Arch Linux on an Intel NUC 8i3BEH which has the following network card:
>>
>> 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (6) I219-V (rev 30)
>>          DeviceName:  LAN
>>          Subsystem: Intel Corporation Device 2074
>>          Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>>          Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>          Latency: 0
>>          Interrupt: pin A routed to IRQ 135
>>          Region 0: Memory at c0b00000 (32-bit, non-prefetchable) [size=128K]
>>          Capabilities: [c8] Power Management version 3
>>                  Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>>                  Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
>>          Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>                  Address: 00000000fee003d8  Data: 0000
>>          Kernel driver in use: e1000e
>>          Kernel modules: e1000e
>>
>> I found a major regression in the last few kernel versions which causes several odd issues. Most notably, I use the machine to stream live TV via TVheadend and found this to be unusable (the picture freezes and the sound breaks up very badly, with continuity errors in the TVheadend logfile).
>>
>> I found that the issue was introduced in the 5.14 kernel, and I eventually got round to performing a git bisect, which landed on the following commit:
>>
>> 44a13a5: e1000e: Fix the max snoop/no-snoop latency for 10M
>>
>> Indeed, if I revert this single commit then the problem is resolved.
>>
>> To help diagnose the issue I applied the following patch to capture the values of the lat_enc, max_ltr_enc vs lat_enc_d, max_ltr_enc_d variables:
>>
>> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>> index d60e2016d..f4e5ffbcd 100644
>> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>> @@ -1012,6 +1012,7 @@ static s32 e1000_platform_pm_pch_lpt(struct e1000_hw *hw, bool link)
>>          u16 max_ltr_enc_d = 0;  /* maximum LTR decoded by platform */
>>          u16 lat_enc_d = 0;      /* latency decoded */
>>          u16 lat_enc = 0;        /* latency encoded */
>> +       struct e1000_adapter *adapter = hw->adapter;
>>
>>          if (link) {
>>                  u16 speed, duplex, scale = 0;
>> @@ -1074,6 +1075,9 @@ static s32 e1000_platform_pm_pch_lpt(struct e1000_hw *hw, bool link)
>>                                   ((max_ltr_enc & E1000_LTRV_SCALE_MASK)
>>                                   >> E1000_LTRV_SCALE_SHIFT)));
>>
>> +               e_info("e1000e: lat_enc=%d max_ltr_enc=%d", lat_enc, max_ltr_enc);
>> +               e_info("e1000e: lat_enc_d=%u max_ltr_enc_d=%u", lat_enc_d, max_ltr_enc_d);
>> +
>>                  if (lat_enc_d > max_ltr_enc_d)
>>                          lat_enc = max_ltr_enc;
>>          }
>>
>> With this in place I see the following in dmesg:
>>
>> [    3.241215] e1000e: Intel(R) PRO/1000 Network Driver
>> [    3.241217] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
>> [    3.243382] e1000e 0000:00:1f.6: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
>> [    3.749009] e1000e 0000:00:1f.6 0000:00:1f.6 (uninitialized): registered PHC clock
>> [    3.824751] e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width x1) 94:c6:91:ae:b3:7b
>> [    3.824765] e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network Connection
>> [    3.824849] e1000e 0000:00:1f.6 eth0: MAC: 13, PHY: 12, PBA No: FFFFFF-0FF
>> [    6.949327] e1000e 0000:00:1f.6 eth0: e1000e: lat_enc=2233 max_ltr_enc=4099
>> [    6.949331] e1000e 0000:00:1f.6 eth0: e1000e: lat_enc_d=58368 max_ltr_enc_d=0
>> [    6.951165] e1000e 0000:00:1f.6 eth0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
>>
>> Notice that lat_enc_d=58368 and max_ltr_enc_d=0 !
>>
>> lat_enc_d is greater than max_ltr_enc_d so it's setting snoop latency to max_ltr_enc (i.e. 4099) where it would have previously been set to 2233 in this particular example. This seems to be where the problem lies.
>>
>> Prior to commit 44a13a5:
>>
>> if (lat_enc > max_ltr_enc)
>>    lat_enc = max_ltr_enc;
>>
>> After commit 44a13a5:
>>
>> if (lat_enc_d > max_ltr_enc_d)
>>    lat_enc = max_ltr_enc;
>>
>>
>> I'm not sure whether it was intended for this new code to take effect for an I219 since the commit message on 44a13a5 indicates it was aimed at I217/I218. Seems strange that max_ltr_enc_d is getting set to 0?
>>
> 
> BTW, that commit is from Sasha Neftin.
Hello Thorsten,
I would have expected the following decoded values (1G link):
lat_enc: 0x000008b9 => lat_enc_d: 189440 (1024*185)
max_ltr_enc: 0x00001003 => max_ltr_enc_d: 3145728 (1048576*3)

scale 0 - 1
scale 1 - 32
scale 2 - 1024
scale 3 - 32768
scale 4 - 1048576 (ns)

I've split the calculation into its two parts:
e_info("e1000e: 1* max_ltr_enc_d: %d\n",
        max_ltr_enc & E1000_LTRV_VALUE_MASK);
e_info("e1000e: 2* max_ltr_enc_d: %d\n",
        (1U << (E1000_LTRV_SCALE_FACTOR *
        ((max_ltr_enc & E1000_LTRV_SCALE_MASK)
        >> E1000_LTRV_SCALE_SHIFT))));
I would expect:
1* max_ltr_enc_d (value): 3
2* max_ltr_enc_d (scale): 1048576
and so: value * scale
1048576*3 = 3145728ns
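
The value * scale arithmetic above can be checked outside the driver. The
following is a minimal standalone sketch, not the e1000e code: the LTRV_*
masks and the function names are assumptions chosen so that the results
match the numbers quoted in this thread (value in bits 9:0, scale in bits
12:10, each scale step multiplying by 32). Scale fields above 5 are treated
as invalid here, which is also an assumption of the sketch.

```c
#include <stdint.h>

/* Assumed field layout, chosen to match the numbers quoted in this thread. */
#define LTRV_VALUE_MASK   0x03FFu  /* bits 9:0: latency value */
#define LTRV_SCALE_MASK   0x1C00u  /* bits 12:10: latency scale */
#define LTRV_SCALE_SHIFT  10
#define LTRV_SCALE_FACTOR 5        /* each scale step multiplies by 2^5 */
#define LTRV_SCALE_MAX    5        /* larger scale fields treated as invalid */

/* Part 1: the 10-bit latency value. */
uint32_t ltr_value(uint16_t enc)
{
	return enc & LTRV_VALUE_MASK;
}

/* Part 2: the multiplier in ns (1, 32, 1024, 32768, 1048576, 33554432). */
uint32_t ltr_scale_ns(uint16_t enc)
{
	uint32_t scale = (enc & LTRV_SCALE_MASK) >> LTRV_SCALE_SHIFT;

	if (scale > LTRV_SCALE_MAX)
		return 0;
	return 1u << (LTRV_SCALE_FACTOR * scale);
}

/* value * scale, kept in 64 bits because 1023 * 2^25 does not fit in 32. */
uint64_t ltr_decode_ns(uint16_t enc)
{
	return (uint64_t)ltr_value(enc) * ltr_scale_ns(enc);
}
```

Under these assumptions, ltr_decode_ns(0x1003) gives 3 * 1048576 = 3145728
and ltr_decode_ns(0x08b9) gives 185 * 1024 = 189440, as expected above.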

Please, let's check it. (I am wondering whether I over-calculate it.)
Thanks,
Sasha
> 
> Could somebody take a look into this? Or was this discussed somewhere
> else already? Or even fixed?
> 
> Anyway, to get this tracked:
> 
> #regzbot introduced: 44a13a5d99c71bf9e1676d9e51679daf4d7b3d73
> #regzbot from: James <jahutchinson99@googlemail.com>
> #regzbot title: net: e1000e: instabilities on I219-V for kernel 5.14 onwards
> #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=215689
> 
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> 
> P.S.: As the Linux kernel's regression tracker I'm getting a lot of
> reports on my table. I can only look briefly into most of them and lack
> knowledge about most of the areas they concern. I thus unfortunately
> will sometimes get things wrong or miss something important. I hope
> that's not the case here; if you think it is, don't hesitate to tell me
> in a public reply, it's in everyone's interest to set the public record
> straight.
> 



* Re: Bug 215689 - e1000e: regression for I219-V for kernel 5.14 onwards
  2022-03-24 15:09 ` Neftin, Sasha
@ 2022-03-24 19:36   ` Neftin, Sasha
  2022-04-10  8:21     ` Thorsten Leemhuis
  0 siblings, 1 reply; 7+ messages in thread
From: Neftin, Sasha @ 2022-03-24 19:36 UTC
  To: Thorsten Leemhuis, Tony Nguyen, Jesse Brandeburg, Fuxbrumer,
	Devora, Ruinskiy, Dima, naamax.meir
  Cc: regressions, intel-wired-lan, James, Neftin, Sasha

On 3/24/2022 17:09, Neftin, Sasha wrote:
> On 3/24/2022 12:37, Thorsten Leemhuis wrote:
>> Hi, this is your Linux kernel regression tracker.
>>
>> I noticed a regression report in bugzilla.kernel.org that afaics nobody
>> has acted upon since it was reported about a week ago. That's why I decided
>> to forward it to the lists and add a few relevant people to the CC. To quote
>> from https://bugzilla.kernel.org/show_bug.cgi?id=215689 :
>>
>>> Description James 2022-03-15 13:45:38 UTC
>>>
>>> I run Arch Linux on an Intel NUC 8i3BEH which has the following 
>>> network card:
>>>
>>> 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection 
>>> (6) I219-V (rev 30)
>>>          DeviceName:  LAN
>>>          Subsystem: Intel Corporation Device 2074
>>>          Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- 
>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>>          Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast 
>>> >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>          Latency: 0
>>>          Interrupt: pin A routed to IRQ 135
>>>          Region 0: Memory at c0b00000 (32-bit, non-prefetchable) 
>>> [size=128K]
>>>          Capabilities: [c8] Power Management version 3
>>>                  Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA 
>>> PME(D0+,D1-,D2-,D3hot+,D3cold+)
>>>                  Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
>>>          Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>>                  Address: 00000000fee003d8  Data: 0000
>>>          Kernel driver in use: e1000e
>>>          Kernel modules: e1000e
>>>
>>> I found a major regression in the last few kernel versions which
>>> causes several odd issues. Most notably, I use the machine to stream
>>> live TV via TVheadend and found this to be unusable (the picture
>>> freezes and the sound breaks up very badly, with continuity errors in
>>> the TVheadend logfile).
>>>
>>> I found that the issue was introduced in the 5.14 kernel, and I
>>> eventually got round to performing a git bisect, which landed on the
>>> following commit:
>>>
>>> 44a13a5: e1000e: Fix the max snoop/no-snoop latency for 10M
>>>
>>> Indeed, if I revert this single commit then the problem is resolved.
>>>
>>> To help diagnose the issue I applied the following patch to capture 
>>> the values of the lat_enc, max_ltr_enc vs lat_enc_d, max_ltr_enc_d 
>>> variables:
>>>
>>> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c 
>>> b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>> index d60e2016d..f4e5ffbcd 100644
>>> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>> @@ -1012,6 +1012,7 @@ static s32 e1000_platform_pm_pch_lpt(struct 
>>> e1000_hw *hw, bool link)
>>>          u16 max_ltr_enc_d = 0;  /* maximum LTR decoded by platform */
>>>          u16 lat_enc_d = 0;      /* latency decoded */
>>>          u16 lat_enc = 0;        /* latency encoded */
>>> +       struct e1000_adapter *adapter = hw->adapter;
>>>
>>>          if (link) {
>>>                  u16 speed, duplex, scale = 0;
>>> @@ -1074,6 +1075,9 @@ static s32 e1000_platform_pm_pch_lpt(struct 
>>> e1000_hw *hw, bool link)
>>>                                   ((max_ltr_enc & E1000_LTRV_SCALE_MASK)
>>>                                   >> E1000_LTRV_SCALE_SHIFT)));
>>>
>>> +               e_info("e1000e: lat_enc=%d max_ltr_enc=%d", lat_enc, 
>>> max_ltr_enc);
>>> +               e_info("e1000e: lat_enc_d=%u max_ltr_enc_d=%u", 
>>> lat_enc_d, max_ltr_enc_d);
>>> +
>>>                  if (lat_enc_d > max_ltr_enc_d)
>>>                          lat_enc = max_ltr_enc;
>>>          }
>>>
>>> With this in place I see the following in dmesg:
>>>
>>> [    3.241215] e1000e: Intel(R) PRO/1000 Network Driver
>>> [    3.241217] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
>>> [    3.243382] e1000e 0000:00:1f.6: Interrupt Throttling Rate 
>>> (ints/sec) set to dynamic conservative mode
>>> [    3.749009] e1000e 0000:00:1f.6 0000:00:1f.6 (uninitialized): 
>>> registered PHC clock
>>> [    3.824751] e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width 
>>> x1) 94:c6:91:ae:b3:7b
>>> [    3.824765] e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network 
>>> Connection
>>> [    3.824849] e1000e 0000:00:1f.6 eth0: MAC: 13, PHY: 12, PBA No: 
>>> FFFFFF-0FF
>>> [    6.949327] e1000e 0000:00:1f.6 eth0: e1000e: lat_enc=2233 
>>> max_ltr_enc=4099
>>> [    6.949331] e1000e 0000:00:1f.6 eth0: e1000e: lat_enc_d=58368 
>>> max_ltr_enc_d=0
>>> [    6.951165] e1000e 0000:00:1f.6 eth0: NIC Link is Up 1000 Mbps 
>>> Full Duplex, Flow Control: Rx/Tx
>>>
>>> Notice that lat_enc_d=58368 and max_ltr_enc_d=0 !
>>>
>>> lat_enc_d is greater than max_ltr_enc_d so it's setting snoop latency 
>>> to max_ltr_enc (i.e. 4099) where it would have previously been set to 
>>> 2233 in this particular example. This seems to be where the problem 
>>> lies.
>>>
>>> Prior to commit 44a13a5:
>>>
>>> if (lat_enc > max_ltr_enc)
>>>    lat_enc = max_ltr_enc;
>>>
>>> After commit 44a13a5:
>>>
>>> if (lat_enc_d > max_ltr_enc_d)
>>>    lat_enc = max_ltr_enc;
>>>
>>>
>>> I'm not sure whether it was intended for this new code to take effect 
>>> for an I219 since the commit message on 44a13a5 indicates it was 
>>> aimed at I217/I218. Seems strange that max_ltr_enc_d is getting set 
>>> to 0?
>>>
>>
>> BTW, that commit is from Sasha Neftin.
> Hello Thorsten,
> I would have expected the following decoded values (1G link):
> lat_enc: 0x000008b9 => lat_enc_d: 189440 (1024*185)
> max_ltr_enc: 0x00001003 => max_ltr_enc_d: 3145728 (1048576*3)
> 
> scale 0 - 1
> scale 1 - 32
> scale 2 - 1024
> scale 3 - 32768
> scale 4 - 1048576 (ns)
> 
> I've split the calculation into its two parts:
> e_info("e1000e: 1* max_ltr_enc_d: %d\n",
>         max_ltr_enc & E1000_LTRV_VALUE_MASK);
> e_info("e1000e: 2* max_ltr_enc_d: %d\n",
>         (1U << (E1000_LTRV_SCALE_FACTOR *
>         ((max_ltr_enc & E1000_LTRV_SCALE_MASK)
>         >> E1000_LTRV_SCALE_SHIFT))));
> I would expect:
> 1* max_ltr_enc_d (value): 3
> 2* max_ltr_enc_d (scale): 1048576
> and so: value * scale
> 1048576*3 = 3145728ns
> 
> Please, let's check it. (I am wondering whether I over-calculate it.)
> Thanks,
> Sasha
ok. Overflow... Instead of
+       u16 max_ltr_enc_d = 0;  /* maximum LTR decoded by platform */
+       u16 lat_enc_d = 0;      /* latency decoded */

Should be:
+       u32 max_ltr_enc_d = 0;  /* maximum LTR decoded by platform */
+       u32 lat_enc_d = 0;      /* latency decoded */
I will prepare a patch to address this overflow and add some e_dbg output
for the calculation.
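
The effect of the variable width can be shown with a few lines of
standalone C. This is only a sketch under the assumption that the decoded
latencies are the ones computed earlier in this thread (189440 ns and
3145728 ns); the function names are hypothetical and do not exist in the
driver.

```c
#include <stdint.h>

/* What a u16 variable keeps of a decoded latency: only the low 16 bits. */
uint16_t store_u16(uint32_t decoded_ns)
{
	return (uint16_t)decoded_ns;
}

/* The comparison as it behaves when both decoded values are held in u16. */
int exceeds_max_u16(uint32_t lat_ns, uint32_t max_ns)
{
	return store_u16(lat_ns) > store_u16(max_ns);
}

/* The same comparison with both decoded values held in u32. */
int exceeds_max_u32(uint32_t lat_ns, uint32_t max_ns)
{
	return lat_ns > max_ns;
}
```

store_u16(189440) yields 58368 and store_u16(3145728) yields 0, which are
the lat_enc_d and max_ltr_enc_d values from the reporter's dmesg output.
With u16 storage the latency therefore appears to exceed the maximum; with
u32 storage it does not, and lat_enc is left alone.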

sudo cat /sys/kernel/debug/pmc_core/ltr_show
SOUTHPORT_A        LTR: RAW: 0x0          Non-Snoop(ns): 0        Snoop(ns): 0
SOUTHPORT_B        LTR: RAW: 0x0          Non-Snoop(ns): 0        Snoop(ns): 0
SATA               LTR: RAW: 0x900f       Non-Snoop(ns): 0        Snoop(ns): 15728640
GIGABIT_ETHERNET   LTR: RAW: 0x88b988b9   Non-Snoop(ns): 189440   Snoop(ns): 189440
XHCI               LTR: RAW: 0x891a       Non-Snoop(ns): 0        Snoop(ns): 288768
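
The RAW column of the ltr_show output above can be decoded by hand with the
same value * scale rule. The sketch below is not the pmc_core code. It
assumes that the snoop latency sits in the low 16 bits and the non-snoop
latency in the high 16 bits, that bit 15 of each half is a "requirement"
bit which must be set for the latency to count, and that scale fields above
5 are invalid; these assumptions were picked because they reproduce the
numbers printed above. The function names are hypothetical.

```c
#include <stdint.h>

/*
 * Decode one 16-bit LTR half (assumed layout):
 * bit 15 = requirement, bits 12:10 = scale, bits 9:0 = value.
 */
uint64_t ltr_half_ns(uint16_t half)
{
	uint32_t value, scale;

	if (!(half & 0x8000u))
		return 0;	/* no latency requirement reported */

	value = half & 0x03FFu;
	scale = (half >> 10) & 0x7u;
	if (scale > 5)
		return 0;	/* treated as invalid in this sketch */

	return (uint64_t)value << (5 * scale);
}

/* Snoop latency: assumed to be the low 16 bits of the raw register. */
uint64_t ltr_snoop_ns(uint32_t raw)
{
	return ltr_half_ns((uint16_t)(raw & 0xFFFFu));
}

/* Non-snoop latency: assumed to be the high 16 bits of the raw register. */
uint64_t ltr_nonsnoop_ns(uint32_t raw)
{
	return ltr_half_ns((uint16_t)(raw >> 16));
}
```

For GIGABIT_ETHERNET, ltr_snoop_ns(0x88b988b9) and
ltr_nonsnoop_ns(0x88b988b9) both give 189440, so the platform sees the
latency that lat_enc=0x8b9 encodes rather than the clamped maximum.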

>>
>> Could somebody take a look into this? Or was this discussed somewhere
>> else already? Or even fixed?
>>
>> Anyway, to get this tracked:
>>
>> #regzbot introduced: 44a13a5d99c71bf9e1676d9e51679daf4d7b3d73
>> #regzbot from: James <jahutchinson99@googlemail.com>
>> #regzbot title: net: e1000e: instabilities on I219-V for kernel 5.14 
>> onwards
>> #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=215689
>>
>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>>
>> P.S.: As the Linux kernel's regression tracker I'm getting a lot of
>> reports on my table. I can only look briefly into most of them and lack
>> knowledge about most of the areas they concern. I thus unfortunately
>> will sometimes get things wrong or miss something important. I hope
>> that's not the case here; if you think it is, don't hesitate to tell me
>> in a public reply, it's in everyone's interest to set the public record
>> straight.
>>
> 



* Re: Bug 215689 - e1000e: regression for I219-V for kernel 5.14 onwards
  2022-03-24 19:36   ` Neftin, Sasha
@ 2022-04-10  8:21     ` Thorsten Leemhuis
  2022-04-10  9:26       ` Neftin, Sasha
  0 siblings, 1 reply; 7+ messages in thread
From: Thorsten Leemhuis @ 2022-04-10  8:21 UTC
  To: Neftin, Sasha, Tony Nguyen, Jesse Brandeburg, Fuxbrumer, Devora,
	Ruinskiy, Dima, naamax.meir
  Cc: regressions, intel-wired-lan, James

Hi, this is your Linux kernel regression tracker. Top-posting for once,
to make this easily accessible to everyone.

Hey Sasha and e1000e developers, what's up there? Two and a half weeks
ago it seemed the root cause for this regression was found and a
proposed patch to fix it was added to the bugzilla ticket and even
tested by the reporter. But since then nothing happened afaics. What's
up here? Or did I miss something?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I'm getting a lot of
reports on my table. I can only look briefly into most of them and lack
knowledge about most of the areas they concern. I thus unfortunately
will sometimes get things wrong or miss something important. I hope
that's not the case here; if you think it is, don't hesitate to tell me
in a public reply, it's in everyone's interest to set the public record
straight.

#regzbot poke

On 24.03.22 20:36, Neftin, Sasha wrote:
> On 3/24/2022 17:09, Neftin, Sasha wrote:
>> On 3/24/2022 12:37, Thorsten Leemhuis wrote:
>>> Hi, this is your Linux kernel regression tracker.
>>>
>>> I noticed a regression report in bugzilla.kernel.org that afaics nobody
>>> has acted upon since it was reported about a week ago. That's why I decided
>>> to forward it to the lists and add a few relevant people to the CC. To quote
>>> from https://bugzilla.kernel.org/show_bug.cgi?id=215689 :
>>>
>>>> Description James 2022-03-15 13:45:38 UTC
>>>>
>>>> I run Arch Linux on an Intel NUC 8i3BEH which has the following
>>>> network card:
>>>>
>>>> 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection
>>>> (6) I219-V (rev 30)
>>>>          DeviceName:  LAN
>>>>          Subsystem: Intel Corporation Device 2074
>>>>          Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>>>          Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast
>>>> >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>>          Latency: 0
>>>>          Interrupt: pin A routed to IRQ 135
>>>>          Region 0: Memory at c0b00000 (32-bit, non-prefetchable)
>>>> [size=128K]
>>>>          Capabilities: [c8] Power Management version 3
>>>>                  Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
>>>> PME(D0+,D1-,D2-,D3hot+,D3cold+)
>>>>                  Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
>>>>          Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>>>                  Address: 00000000fee003d8  Data: 0000
>>>>          Kernel driver in use: e1000e
>>>>          Kernel modules: e1000e
>>>>
>>>> I found a major regression in the last few kernel versions which
>>>> causes several odd issues. Most notably, I use the machine to stream
>>>> live TV via TVheadend and found this to be unusable (the picture
>>>> freezes and the sound breaks up very badly, with continuity errors in
>>>> the TVheadend logfile).
>>>>
>>>> I found that the issue was introduced in the 5.14 kernel, and I
>>>> eventually got round to performing a git bisect, which landed on the
>>>> following commit:
>>>>
>>>> 44a13a5: e1000e: Fix the max snoop/no-snoop latency for 10M
>>>>
>>>> Indeed, if I revert this single commit then the problem is resolved.
>>>>
>>>> To help diagnose the issue I applied the following patch to capture
>>>> the values of the lat_enc, max_ltr_enc vs lat_enc_d, max_ltr_enc_d
>>>> variables:
>>>>
>>>> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>> b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>> index d60e2016d..f4e5ffbcd 100644
>>>> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>> @@ -1012,6 +1012,7 @@ static s32 e1000_platform_pm_pch_lpt(struct
>>>> e1000_hw *hw, bool link)
>>>>          u16 max_ltr_enc_d = 0;  /* maximum LTR decoded by platform */
>>>>          u16 lat_enc_d = 0;      /* latency decoded */
>>>>          u16 lat_enc = 0;        /* latency encoded */
>>>> +       struct e1000_adapter *adapter = hw->adapter;
>>>>
>>>>          if (link) {
>>>>                  u16 speed, duplex, scale = 0;
>>>> @@ -1074,6 +1075,9 @@ static s32 e1000_platform_pm_pch_lpt(struct
>>>> e1000_hw *hw, bool link)
>>>>                                   ((max_ltr_enc &
>>>> E1000_LTRV_SCALE_MASK)
>>>>                                   >> E1000_LTRV_SCALE_SHIFT)));
>>>>
>>>> +               e_info("e1000e: lat_enc=%d max_ltr_enc=%d", lat_enc,
>>>> max_ltr_enc);
>>>> +               e_info("e1000e: lat_enc_d=%u max_ltr_enc_d=%u",
>>>> lat_enc_d, max_ltr_enc_d);
>>>> +
>>>>                  if (lat_enc_d > max_ltr_enc_d)
>>>>                          lat_enc = max_ltr_enc;
>>>>          }
>>>>
>>>> With this in place I see the following in dmesg:
>>>>
>>>> [    3.241215] e1000e: Intel(R) PRO/1000 Network Driver
>>>> [    3.241217] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
>>>> [    3.243382] e1000e 0000:00:1f.6: Interrupt Throttling Rate
>>>> (ints/sec) set to dynamic conservative mode
>>>> [    3.749009] e1000e 0000:00:1f.6 0000:00:1f.6 (uninitialized):
>>>> registered PHC clock
>>>> [    3.824751] e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width
>>>> x1) 94:c6:91:ae:b3:7b
>>>> [    3.824765] e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network
>>>> Connection
>>>> [    3.824849] e1000e 0000:00:1f.6 eth0: MAC: 13, PHY: 12, PBA No:
>>>> FFFFFF-0FF
>>>> [    6.949327] e1000e 0000:00:1f.6 eth0: e1000e: lat_enc=2233
>>>> max_ltr_enc=4099
>>>> [    6.949331] e1000e 0000:00:1f.6 eth0: e1000e: lat_enc_d=58368
>>>> max_ltr_enc_d=0
>>>> [    6.951165] e1000e 0000:00:1f.6 eth0: NIC Link is Up 1000 Mbps
>>>> Full Duplex, Flow Control: Rx/Tx
>>>>
>>>> Notice that lat_enc_d=58368 and max_ltr_enc_d=0 !
>>>>
>>>> lat_enc_d is greater than max_ltr_enc_d so it's setting snoop
>>>> latency to max_ltr_enc (i.e. 4099) where it would have previously
>>>> been set to 2233 in this particular example. This seems to be where
>>>> the problem lies.
>>>>
>>>> Prior to commit 44a13a5:
>>>>
>>>> if (lat_enc > max_ltr_enc)
>>>>    lat_enc = max_ltr_enc;
>>>>
>>>> After commit 44a13a5:
>>>>
>>>> if (lat_enc_d > max_ltr_enc_d)
>>>>    lat_enc = max_ltr_enc;
>>>>
>>>>
>>>> I'm not sure whether it was intended for this new code to take
>>>> effect for an I219 since the commit message on 44a13a5 indicates it
>>>> was aimed at I217/I218. Seems strange that max_ltr_enc_d is getting
>>>> set to 0?
>>>>
>>>
>>> BTW, that commit is from Sasha Neftin.
>> Hello Thorsten,
>> I would have expected the following decoded values (1G link):
>> lat_enc: 0x000008b9 => lat_enc_d: 189440 (1024*185)
>> max_ltr_enc: 0x00001003 => max_ltr_enc_d: 3145728 (1048576*3)
>>
>> scale 0 - 1
>> scale 1 - 32
>> scale 2 - 1024
>> scale 3 - 32768
>> scale 4 - 1048576 (ns)
>>
>> I've split the calculation into its two parts:
>> e_info("e1000e: 1* max_ltr_enc_d: %d\n",
>>         max_ltr_enc & E1000_LTRV_VALUE_MASK);
>> e_info("e1000e: 2* max_ltr_enc_d: %d\n",
>>         (1U << (E1000_LTRV_SCALE_FACTOR *
>>         ((max_ltr_enc & E1000_LTRV_SCALE_MASK)
>>         >> E1000_LTRV_SCALE_SHIFT))));
>> I would expect:
>> 1* max_ltr_enc_d (value): 3
>> 2* max_ltr_enc_d (scale): 1048576
>> and so: value * scale
>> 1048576*3 = 3145728ns
>>
>> Please, let's check it. (I am wondering whether I over-calculate it.)
>> Thanks,
>> Sasha
> ok. Overflow... Instead of
> +       u16 max_ltr_enc_d = 0;  /* maximum LTR decoded by platform */
> +       u16 lat_enc_d = 0;      /* latency decoded */
> 
> Should be:
> +       u32 max_ltr_enc_d = 0;  /* maximum LTR decoded by platform */
> +       u32 lat_enc_d = 0;      /* latency decoded */
> I will prepare a patch to address this overflow and add some e_dbg output
> for the calculation.
> 
> sudo cat /sys/kernel/debug/pmc_core/ltr_show
> SOUTHPORT_A                         LTR: RAW: 0x0 Non-Snoop(ns):
> 0                   Snoop(ns): 0
> SOUTHPORT_B                         LTR: RAW: 0x0 Non-Snoop(ns):
> 0                   Snoop(ns): 0
> SATA                                LTR: RAW: 0x900f Non-Snoop(ns):
> 0                   Snoop(ns): 15728640
> GIGABIT_ETHERNET                    LTR: RAW: 0x88b988b9 Non-Snoop(ns):
> 189440              Snoop(ns): 189440
> XHCI                                LTR: RAW: 0x891a Non-Snoop(ns):
> 0                   Snoop(ns): 288768
> 
>>>
>>> Could somebody take a look into this? Or was this discussed somewhere
>>> else already? Or even fixed?
>>>
>>> Anyway, to get this tracked:
>>>
>>> #regzbot introduced: 44a13a5d99c71bf9e1676d9e51679daf4d7b3d73
>>> #regzbot from: James <jahutchinson99@googlemail.com>
>>> #regzbot title: net: e1000e: instabilities on I219-V for kernel 5.14
>>> onwards
>>> #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=215689
>>>
>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>>>
>>> P.S.: As the Linux kernel's regression tracker I'm getting a lot of
>>> reports on my table. I can only look briefly into most of them and lack
>>> knowledge about most of the areas they concern. I thus unfortunately
>>> will sometimes get things wrong or miss something important. I hope
>>> that's not the case here; if you think it is, don't hesitate to tell me
>>> in a public reply, it's in everyone's interest to set the public record
>>> straight.
>>>
>>
> 
> 
> 
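
A minimal sketch of the overflow quoted above, in plain C rather than
driver code. The mask and shift values are assumptions chosen to match the
scale table in the thread (one scale step = 2^5), and the helper names are
hypothetical. It shows that storing the decoded latency in 16 bits
reproduces the lat_enc_d=58368 and max_ltr_enc_d=0 values seen in dmesg.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field layout, consistent with the scale table in the thread:
 * bits 9:0 hold the value, bits 12:10 hold the scale. */
#define LTRV_VALUE_MASK   0x03FFu
#define LTRV_SCALE_MASK   0x1C00u
#define LTRV_SCALE_SHIFT  10
#define LTRV_SCALE_FACTOR 5

/* Decode an encoded LTR value to nanoseconds without truncation. */
static uint64_t ltr_decode_ns(uint16_t enc)
{
    uint64_t value = enc & LTRV_VALUE_MASK;
    unsigned int scale = (enc & LTRV_SCALE_MASK) >> LTRV_SCALE_SHIFT;

    return value << (LTRV_SCALE_FACTOR * scale);
}

/* Same decode, but stored in 16 bits as the u16 variables did. */
static uint16_t ltr_decode_ns_u16(uint16_t enc)
{
    return (uint16_t)ltr_decode_ns(enc); /* keeps only the low 16 bits */
}

/* Compare against the numbers reported in the thread. */
static void ltr_check_thread_values(void)
{
    assert(ltr_decode_ns(0x08b9) == 189440u);     /* 185 * 1024 */
    assert(ltr_decode_ns(0x1003) == 3145728u);    /* 3 * 1048576 */
    assert(ltr_decode_ns_u16(0x08b9) == 58368u);  /* 189440 mod 65536 */
    assert(ltr_decode_ns_u16(0x1003) == 0u);      /* 3145728 = 48 * 65536 */
}
```

Calling ltr_check_thread_values() from a small main() is enough to confirm
the arithmetic; both decoded values fit comfortably in 32 bits.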


* Re: Bug 215689 - e1000e: regression for I219-V for kernel 5.14 onwards
  2022-04-10  8:21     ` Thorsten Leemhuis
@ 2022-04-10  9:26       ` Neftin, Sasha
  2022-04-10  9:47         ` Thorsten Leemhuis
  0 siblings, 1 reply; 7+ messages in thread
From: Neftin, Sasha @ 2022-04-10  9:26 UTC (permalink / raw)
  To: Thorsten Leemhuis, Tony Nguyen, Jesse Brandeburg, Fuxbrumer,
	Devora, Ruinskiy, Dima, naamax.meir
  Cc: regressions, intel-wired-lan, James, Neftin, Sasha

On 4/10/2022 11:21, Thorsten Leemhuis wrote:
> Hi, this is your Linux kernel regression tracker. Top-posting for once,
> to make this easily accessible to everyone.
> 
> Hey Sasha and e1000e developers, what's up there? Two and a half weeks
> ago it seemed the root cause for this regression was found and a
> proposed patch to fix it was added to the bugzilla ticket and even
> tested by the reporter. But since then nothing happened afaics. What's
> up here? Or did I miss something?
Hello Thorsten,
Patch submitted to the IWL:
https://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue.git/commit/?h=dev-queue&id=7dd121b8d5735780b6a70db735d44b3e5b856130
> 
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> 
> P.S.: As the Linux kernel's regression tracker I'm getting a lot of
> reports on my table. I can only look briefly into most of them and lack
> knowledge about most of the areas they concern. I thus unfortunately
> will sometimes get things wrong or miss something important. I hope
> that's not the case here; if you think it is, don't hesitate to tell me
> in a public reply, it's in everyone's interest to set the public record
> straight.
> 
> #regzbot poke
> 
> On 24.03.22 20:36, Neftin, Sasha wrote:
>> On 3/24/2022 17:09, Neftin, Sasha wrote:
>>> On 3/24/2022 12:37, Thorsten Leemhuis wrote:
>>>> Hi, this is your Linux kernel regression tracker.
>>>>
>>>> I noticed a regression report in bugzilla.kernel.org that afaics nobody
>>>> acted upon since it was reported about a week ago, that's why I decided
>>>> to forward it to the lists and a few relevant people to the CC. To quote
>>>> from https://bugzilla.kernel.org/show_bug.cgi?id=215689 :
>>>>
>>>>> [reply] [−] Description James 2022-03-15 13:45:38 UTC
>>>>>
>>>>> I run Arch linux on an Intel NUC 8i3BEH which has the following
>>>>> network card:
>>>>>
>>>>> 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection
>>>>> (6) I219-V (rev 30)
>>>>>           DeviceName:  LAN
>>>>>           Subsystem: Intel Corporation Device 2074
>>>>>           Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>>>>           Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast
>>>>>> TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>>>           Latency: 0
>>>>>           Interrupt: pin A routed to IRQ 135
>>>>>           Region 0: Memory at c0b00000 (32-bit, non-prefetchable)
>>>>> [size=128K]
>>>>>           Capabilities: [c8] Power Management version 3
>>>>>                   Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
>>>>> PME(D0+,D1-,D2-,D3hot+,D3cold+)
>>>>>                   Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
>>>>>           Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>>>>                   Address: 00000000fee003d8  Data: 0000
>>>>>           Kernel driver in use: e1000e
>>>>>           Kernel modules: e1000e
>>>>>
>>>>> I found a major regression since the previous few kernel versions
>>>>> which causes several odd issues, most noteably I use the machine to
>>>>> stream live tv via TVheadend and was finding this to be unusable
>>>>> (picture freezes and sound breaks up very badly with continuity
>>>>> errors in the TVheadend logfile).
>>>>>
>>>>> I found the issue was introduced since the 5.14 kernel, and have
>>>>> eventually got round to performing a git bisect, which landed upon
>>>>> the following commit:
>>>>>
>>>>> 44a13a5: e1000e: Fix the max snoop/no-snoop latency for 10M
>>>>>
>>>>> Indeed, if I revert this single commit then the problem is resolved.
>>>>>
>>>>> To help diagnose the issue I applied the following patch to capture
>>>>> the values of the lat_enc, max_ltr_enc vs lat_enc_d, max_ltr_enc_d
>>>>> variables:
>>>>>
>>>>> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>> b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>> index d60e2016d..f4e5ffbcd 100644
>>>>> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>> @@ -1012,6 +1012,7 @@ static s32 e1000_platform_pm_pch_lpt(struct
>>>>> e1000_hw *hw, bool link)
>>>>>           u16 max_ltr_enc_d = 0;  /* maximum LTR decoded by platform */
>>>>>           u16 lat_enc_d = 0;      /* latency decoded */
>>>>>           u16 lat_enc = 0;        /* latency encoded */
>>>>> +       struct e1000_adapter *adapter = hw->adapter;
>>>>>
>>>>>           if (link) {
>>>>>                   u16 speed, duplex, scale = 0;
>>>>> @@ -1074,6 +1075,9 @@ static s32 e1000_platform_pm_pch_lpt(struct
>>>>> e1000_hw *hw, bool link)
>>>>>                                    ((max_ltr_enc &
>>>>> E1000_LTRV_SCALE_MASK)
>>>>>                                    >> E1000_LTRV_SCALE_SHIFT)));
>>>>>
>>>>> +               e_info("e1000e: lat_enc=%d max_ltr_enc=%d", lat_enc,
>>>>> max_ltr_enc);
>>>>> +               e_info("e1000e: lat_enc_d=%u max_ltr_enc_d=%u",
>>>>> lat_enc_d, max_ltr_enc_d);
>>>>> +
>>>>>                   if (lat_enc_d > max_ltr_enc_d)
>>>>>                           lat_enc = max_ltr_enc;
>>>>>           }
>>>>>
>>>>> With this in place I see the following in dmesg:
>>>>>
>>>>> [    3.241215] e1000e: Intel(R) PRO/1000 Network Driver
>>>>> [    3.241217] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
>>>>> [    3.243382] e1000e 0000:00:1f.6: Interrupt Throttling Rate
>>>>> (ints/sec) set to dynamic conservative mode
>>>>> [    3.749009] e1000e 0000:00:1f.6 0000:00:1f.6 (uninitialized):
>>>>> registered PHC clock
>>>>> [    3.824751] e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width
>>>>> x1) 94:c6:91:ae:b3:7b
>>>>> [    3.824765] e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network
>>>>> Connection
>>>>> [    3.824849] e1000e 0000:00:1f.6 eth0: MAC: 13, PHY: 12, PBA No:
>>>>> FFFFFF-0FF
>>>>> [    6.949327] e1000e 0000:00:1f.6 eth0: e1000e: lat_enc=2233
>>>>> max_ltr_enc=4099
>>>>> [    6.949331] e1000e 0000:00:1f.6 eth0: e1000e: lat_enc_d=58368
>>>>> max_ltr_enc_d=0
>>>>> [    6.951165] e1000e 0000:00:1f.6 eth0: NIC Link is Up 1000 Mbps
>>>>> Full Duplex, Flow Control: Rx/Tx
>>>>>
>>>>> Notice that lat_enc_d=58368 and max_ltr_enc_d=0 !
>>>>>
>>>>> lat_enc_d is greater than max_ltr_enc_d so it's setting snoop
>>>>> latency to max_ltr_enc (i.e. 4099) where it would have previously
>>>>> been set to 2233 in this particular example. This seems to be where
>>>>> the problem lies.
>>>>>
>>>>> Prior to commit 44a13a5:
>>>>>
>>>>> if (lat_enc > max_ltr_enc)
>>>>>     lat_enc = max_ltr_enc;
>>>>>
>>>>> After commit 44a13a5:
>>>>>
>>>>> if (lat_enc_d > max_ltr_enc_d)
>>>>>     lat_enc = max_ltr_enc;
>>>>>
>>>>>
>>>>> I'm not sure whether it was intended for this new code to take
>>>>> effect for an I219 since the commit message on 44a13a5 indicates it
>>>>> was aimed at I217/I218. Seems strange that max_ltr_enc_d is getting
>>>>> set to 0?
>>>>>
>>>>
>>>> BTW, that commit is from Sasha Neftin.
>>> Hello Thorsten,
>>> I've expected follow decoded values (link 1G)
>>> lat_enc: 0x000008b9 => lat_enc_d: 189440 (1024*185)
>>> max_ltr_enc: 0x00001003 => max_ltr_enc_d: 3145728 (1048576*3)
>>>
>>> scale 0 - 1
>>> scale 1 - 32
>>> scale 2 - 1024
>>> scale 3 - 32768
>>> scale 4 - 1048576 (nano s)
>>>
>>> I've separated calculate:
>>> e_info("e1000e: 1* max_ltr_enc_d: %d\n",
>>>          max_ltr_enc & E1000_LTRV_VALUE_MASK);
>>> e_info("e1000e: 2* max_ltr_enc_d: %d\n",
>>>          (1U << (E1000_LTRV_SCALE_FACTOR *
>>>          ((max_ltr_enc & E1000_LTRV_SCALE_MASK)
>>>          >> E1000_LTRV_SCALE_SHIFT))));
>>> I would expect:
>>> 1* max_ltr_enc_d (value): 3
>>> 2* max_ltr_enc_d (scale): 1048576
>>> and so: value * scale
>>> 1048576*3 = 3145728ns
>>>
>>> Please, let's check it. (I am wondering if over-calculate it)
>>> Thanks,
>>> Sasha
>> ok. Overflow... Instead of
>> +       u16 max_ltr_enc_d = 0;  /* maximum LTR decoded by platform */
>> +       u16 lat_enc_d = 0;      /* latency decoded */
>>
>> Should be:
>> +       u32 max_ltr_enc_d = 0;  /* maximum LTR decoded by platform */
>> +       u32 lat_enc_d = 0;      /* latency decoded */
>> I will process the patch address this overflow and some e_dbg to
>> eliminate calculation.
>>
>> sudo cat /sys/kernel/debug/pmc_core/ltr_show
>> SOUTHPORT_A                         LTR: RAW: 0x0 Non-Snoop(ns):
>> 0                   Snoop(ns): 0
>> SOUTHPORT_B                         LTR: RAW: 0x0 Non-Snoop(ns):
>> 0                   Snoop(ns): 0
>> SATA                                LTR: RAW: 0x900f Non-Snoop(ns):
>> 0                   Snoop(ns): 15728640
>> GIGABIT_ETHERNET                    LTR: RAW: 0x88b988b9 Non-Snoop(ns):
>> 189440              Snoop(ns): 189440
>> XHCI                                LTR: RAW: 0x891a Non-Snoop(ns):
>> 0                   Snoop(ns): 288768
>>
>>>>
>>>> Could somebody take a look into this? Or was this discussed somewhere
>>>> else already? Or even fixed?
>>>>
>>>> Anyway, to get this tracked:
>>>>
>>>> #regzbot introduced: 44a13a5d99c71bf9e1676d9e51679daf4d7b3d73
>>>> #regzbot from: James <jahutchinson99@googlemail.com>
>>>> #regzbot title: net: e1000e: instabilities on I219-V for kernel 5.14
>>>> onwards
>>>> #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=215689
>>>>
>>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>>>>
>>>> P.S.: As the Linux kernel's regression tracker I'm getting a lot of
>>>> reports on my table. I can only look briefly into most of them and lack
>>>> knowledge about most of the areas they concern. I thus unfortunately
>>>> will sometimes get things wrong or miss something important. I hope
>>>> that's not the case here; if you think it is, don't hesitate to tell me
>>>> in a public reply, it's in everyone's interest to set the public record
>>>> straight.
>>>>
>>>
>>
>>
>>
Sasha
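
The ltr_show dump quoted above can be cross-checked with a short C sketch.
The layout is an assumption inferred from the quoted numbers only: the low
16 bits of RAW are treated as the snoop half and the high 16 bits as the
non-snoop half, each with a 10-bit value, a 3-bit scale and a requirement
flag in bit 15. The names are hypothetical and this is not the pmc_core
implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of one 16-bit LTR half (inferred from the thread). */
#define LTR_VALUE_MASK  0x03FFu
#define LTR_SCALE_MASK  0x1C00u
#define LTR_SCALE_SHIFT 10
#define LTR_REQ_BIT     0x8000u

struct ltr_pair {
    uint64_t snoop_ns;
    uint64_t non_snoop_ns;
};

/* Decode one 16-bit half; a half without the requirement bit reads as 0. */
static uint64_t ltr_half_ns(uint16_t half)
{
    uint64_t value = half & LTR_VALUE_MASK;
    unsigned int scale = (half & LTR_SCALE_MASK) >> LTR_SCALE_SHIFT;

    if (!(half & LTR_REQ_BIT))
        return 0;
    return value << (5 * scale);
}

/* Split a 32-bit RAW value into its snoop and non-snoop latencies. */
static struct ltr_pair ltr_raw_decode(uint32_t raw)
{
    struct ltr_pair p;

    p.snoop_ns = ltr_half_ns((uint16_t)(raw & 0xFFFFu));
    p.non_snoop_ns = ltr_half_ns((uint16_t)(raw >> 16));
    return p;
}

/* Compare against the GIGABIT_ETHERNET row quoted in the thread. */
static void ltr_check_ethernet_row(void)
{
    struct ltr_pair p = ltr_raw_decode(0x88b988b9u);

    assert(p.snoop_ns == 189440u);
    assert(p.non_snoop_ns == 189440u);
}
```

Under these assumptions the SATA (0x900f) and XHCI (0x891a) rows decode to
the same snoop values that ltr_show printed.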


* Re: Bug 215689 - e1000e: regression for I219-V for kernel 5.14 onwards
  2022-04-10  9:26       ` Neftin, Sasha
@ 2022-04-10  9:47         ` Thorsten Leemhuis
  0 siblings, 0 replies; 7+ messages in thread
From: Thorsten Leemhuis @ 2022-04-10  9:47 UTC (permalink / raw)
  To: Neftin, Sasha, Tony Nguyen, Jesse Brandeburg, Fuxbrumer, Devora,
	Ruinskiy, Dima, naamax.meir
  Cc: regressions, intel-wired-lan, James

On 10.04.22 11:26, Neftin, Sasha wrote:
> On 4/10/2022 11:21, Thorsten Leemhuis wrote:
>> Hi, this is your Linux kernel regression tracker. Top-posting for once,
>> to make this easily accessible to everyone.
>>
>> Hey Sasha and e1000e developers, what's up there? Two and a half weeks
>> ago it seemed the root cause for this regression was found and a
>> proposed patch to fix it was added to the bugzilla ticket and even
>> tested by the reporter. But since then nothing happened afaics. What's
>> up here? Or did I miss something?
> Hello Thorsten,
> Patch submitted to the IWL:
> https://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue.git/commit/?h=dev-queue&id=7dd121b8d5735780b6a70db735d44b3e5b856130

Ahh, great, many thx. That's hard to find for outsiders such as other
people that run into this problem (IOW: mentioning it in the bugzilla
ticket would have been nice). Guess I at least might have found it if
intel-wired-lan was archived on lore. Anyway:

I have to wonder: why is this in the "next"-queue? That patch doesn't
look really dangerous and it's fixing a regression from v5.14, so why
not merge it this cycle? Ohh, and an explicit tag to get it backported to
stable quickly might be good as well afaics:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html

Ciao, Thorsten

>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>>
>> P.S.: As the Linux kernel's regression tracker I'm getting a lot of
>> reports on my table. I can only look briefly into most of them and lack
>> knowledge about most of the areas they concern. I thus unfortunately
>> will sometimes get things wrong or miss something important. I hope
>> that's not the case here; if you think it is, don't hesitate to tell me
>> in a public reply, it's in everyone's interest to set the public record
>> straight.
>>
>> #regzbot poke
>>
>> On 24.03.22 20:36, Neftin, Sasha wrote:
>>> On 3/24/2022 17:09, Neftin, Sasha wrote:
>>>> On 3/24/2022 12:37, Thorsten Leemhuis wrote:
>>>>> Hi, this is your Linux kernel regression tracker.
>>>>>
>>>>> I noticed a regression report in bugzilla.kernel.org that afaics
>>>>> nobody
>>>>> acted upon since it was reported about a week ago, that's why I
>>>>> decided
>>>>> to forward it to the lists and a few relevant people to the CC. To
>>>>> quote
>>>>> from https://bugzilla.kernel.org/show_bug.cgi?id=215689 :
>>>>>
>>>>>> [reply] [−] Description James 2022-03-15 13:45:38 UTC
>>>>>>
>>>>>> I run Arch linux on an Intel NUC 8i3BEH which has the following
>>>>>> network card:
>>>>>>
>>>>>> 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection
>>>>>> (6) I219-V (rev 30)
>>>>>>           DeviceName:  LAN
>>>>>>           Subsystem: Intel Corporation Device 2074
>>>>>>           Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>>>>>           Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast
>>>>>>> TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>>>>           Latency: 0
>>>>>>           Interrupt: pin A routed to IRQ 135
>>>>>>           Region 0: Memory at c0b00000 (32-bit, non-prefetchable)
>>>>>> [size=128K]
>>>>>>           Capabilities: [c8] Power Management version 3
>>>>>>                   Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
>>>>>> PME(D0+,D1-,D2-,D3hot+,D3cold+)
>>>>>>                   Status: D0 NoSoftRst+ PME-Enable- DSel=0
>>>>>> DScale=1 PME-
>>>>>>           Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>>>>>                   Address: 00000000fee003d8  Data: 0000
>>>>>>           Kernel driver in use: e1000e
>>>>>>           Kernel modules: e1000e
>>>>>>
>>>>>> I found a major regression since the previous few kernel versions
>>>>>> which causes several odd issues, most noteably I use the machine to
>>>>>> stream live tv via TVheadend and was finding this to be unusable
>>>>>> (picture freezes and sound breaks up very badly with continuity
>>>>>> errors in the TVheadend logfile).
>>>>>>
>>>>>> I found the issue was introduced since the 5.14 kernel, and have
>>>>>> eventually got round to performing a git bisect, which landed upon
>>>>>> the following commit:
>>>>>>
>>>>>> 44a13a5: e1000e: Fix the max snoop/no-snoop latency for 10M
>>>>>>
>>>>>> Indeed, if I revert this single commit then the problem is resolved.
>>>>>>
>>>>>> To help diagnose the issue I applied the following patch to capture
>>>>>> the values of the lat_enc, max_ltr_enc vs lat_enc_d, max_ltr_enc_d
>>>>>> variables:
>>>>>>
>>>>>> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>>> b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>>> index d60e2016d..f4e5ffbcd 100644
>>>>>> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>>> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>>> @@ -1012,6 +1012,7 @@ static s32 e1000_platform_pm_pch_lpt(struct
>>>>>> e1000_hw *hw, bool link)
>>>>>>           u16 max_ltr_enc_d = 0;  /* maximum LTR decoded by
>>>>>> platform */
>>>>>>           u16 lat_enc_d = 0;      /* latency decoded */
>>>>>>           u16 lat_enc = 0;        /* latency encoded */
>>>>>> +       struct e1000_adapter *adapter = hw->adapter;
>>>>>>
>>>>>>           if (link) {
>>>>>>                   u16 speed, duplex, scale = 0;
>>>>>> @@ -1074,6 +1075,9 @@ static s32 e1000_platform_pm_pch_lpt(struct
>>>>>> e1000_hw *hw, bool link)
>>>>>>                                    ((max_ltr_enc &
>>>>>> E1000_LTRV_SCALE_MASK)
>>>>>>                                    >> E1000_LTRV_SCALE_SHIFT)));
>>>>>>
>>>>>> +               e_info("e1000e: lat_enc=%d max_ltr_enc=%d", lat_enc,
>>>>>> max_ltr_enc);
>>>>>> +               e_info("e1000e: lat_enc_d=%u max_ltr_enc_d=%u",
>>>>>> lat_enc_d, max_ltr_enc_d);
>>>>>> +
>>>>>>                   if (lat_enc_d > max_ltr_enc_d)
>>>>>>                           lat_enc = max_ltr_enc;
>>>>>>           }
>>>>>>
>>>>>> With this in place I see the following in dmesg:
>>>>>>
>>>>>> [    3.241215] e1000e: Intel(R) PRO/1000 Network Driver
>>>>>> [    3.241217] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
>>>>>> [    3.243382] e1000e 0000:00:1f.6: Interrupt Throttling Rate
>>>>>> (ints/sec) set to dynamic conservative mode
>>>>>> [    3.749009] e1000e 0000:00:1f.6 0000:00:1f.6 (uninitialized):
>>>>>> registered PHC clock
>>>>>> [    3.824751] e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width
>>>>>> x1) 94:c6:91:ae:b3:7b
>>>>>> [    3.824765] e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network
>>>>>> Connection
>>>>>> [    3.824849] e1000e 0000:00:1f.6 eth0: MAC: 13, PHY: 12, PBA No:
>>>>>> FFFFFF-0FF
>>>>>> [    6.949327] e1000e 0000:00:1f.6 eth0: e1000e: lat_enc=2233
>>>>>> max_ltr_enc=4099
>>>>>> [    6.949331] e1000e 0000:00:1f.6 eth0: e1000e: lat_enc_d=58368
>>>>>> max_ltr_enc_d=0
>>>>>> [    6.951165] e1000e 0000:00:1f.6 eth0: NIC Link is Up 1000 Mbps
>>>>>> Full Duplex, Flow Control: Rx/Tx
>>>>>>
>>>>>> Notice that lat_enc_d=58368 and max_ltr_enc_d=0 !
>>>>>>
>>>>>> lat_enc_d is greater than max_ltr_enc_d so it's setting snoop
>>>>>> latency to max_ltr_enc (i.e. 4099) where it would have previously
>>>>>> been set to 2233 in this particular example. This seems to be where
>>>>>> the problem lies.
>>>>>>
>>>>>> Prior to commit 44a13a5:
>>>>>>
>>>>>> if (lat_enc > max_ltr_enc)
>>>>>>     lat_enc = max_ltr_enc;
>>>>>>
>>>>>> After commit 44a13a5:
>>>>>>
>>>>>> if (lat_enc_d > max_ltr_enc_d)
>>>>>>     lat_enc = max_ltr_enc;
>>>>>>
>>>>>>
>>>>>> I'm not sure whether it was intended for this new code to take
>>>>>> effect for an I219 since the commit message on 44a13a5 indicates it
>>>>>> was aimed at I217/I218. Seems strange that max_ltr_enc_d is getting
>>>>>> set to 0?
>>>>>>
>>>>>
>>>>> BTW, that commit is from Sasha Neftin.
>>>> Hello Thorsten,
>>>> I've expected follow decoded values (link 1G)
>>>> lat_enc: 0x000008b9 => lat_enc_d: 189440 (1024*185)
>>>> max_ltr_enc: 0x00001003 => max_ltr_enc_d: 3145728 (1048576*3)
>>>>
>>>> scale 0 - 1
>>>> scale 1 - 32
>>>> scale 2 - 1024
>>>> scale 3 - 32768
>>>> scale 4 - 1048576 (nano s)
>>>>
>>>> I've separated calculate:
>>>> e_info("e1000e: 1* max_ltr_enc_d: %d\n",
>>>>          max_ltr_enc & E1000_LTRV_VALUE_MASK);
>>>> e_info("e1000e: 2* max_ltr_enc_d: %d\n",
>>>>          (1U << (E1000_LTRV_SCALE_FACTOR *
>>>>          ((max_ltr_enc & E1000_LTRV_SCALE_MASK)
>>>>          >> E1000_LTRV_SCALE_SHIFT))));
>>>> I would expect:
>>>> 1* max_ltr_enc_d (value): 3
>>>> 2* max_ltr_enc_d (scale): 1048576
>>>> and so: value * scale
>>>> 1048576*3 = 3145728ns
>>>>
>>>> Please, let's check it. (I am wondering if over-calculate it)
>>>> Thanks,
>>>> Sasha
>>> ok. Overflow... Instead of
>>> +       u16 max_ltr_enc_d = 0;  /* maximum LTR decoded by platform */
>>> +       u16 lat_enc_d = 0;      /* latency decoded */
>>>
>>> Should be:
>>> +       u32 max_ltr_enc_d = 0;  /* maximum LTR decoded by platform */
>>> +       u32 lat_enc_d = 0;      /* latency decoded */
>>> I will process the patch address this overflow and some e_dbg to
>>> eliminate calculation.
>>>
>>> sudo cat /sys/kernel/debug/pmc_core/ltr_show
>>> SOUTHPORT_A                         LTR: RAW: 0x0 Non-Snoop(ns):
>>> 0                   Snoop(ns): 0
>>> SOUTHPORT_B                         LTR: RAW: 0x0 Non-Snoop(ns):
>>> 0                   Snoop(ns): 0
>>> SATA                                LTR: RAW: 0x900f Non-Snoop(ns):
>>> 0                   Snoop(ns): 15728640
>>> GIGABIT_ETHERNET                    LTR: RAW: 0x88b988b9 Non-Snoop(ns):
>>> 189440              Snoop(ns): 189440
>>> XHCI                                LTR: RAW: 0x891a Non-Snoop(ns):
>>> 0                   Snoop(ns): 288768
>>>
>>>>>
>>>>> Could somebody take a look into this? Or was this discussed somewhere
>>>>> else already? Or even fixed?
>>>>>
>>>>> Anyway, to get this tracked:
>>>>>
>>>>> #regzbot introduced: 44a13a5d99c71bf9e1676d9e51679daf4d7b3d73
>>>>> #regzbot from: James <jahutchinson99@googlemail.com>
>>>>> #regzbot title: net: e1000e: instabilities on I219-V for kernel 5.14
>>>>> onwards
>>>>> #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=215689
>>>>>
>>>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker'
>>>>> hat)
>>>>>
>>>>> P.S.: As the Linux kernel's regression tracker I'm getting a lot of
>>>>> reports on my table. I can only look briefly into most of them and
>>>>> lack
>>>>> knowledge about most of the areas they concern. I thus unfortunately
>>>>> will sometimes get things wrong or miss something important. I hope
>>>>> that's not the case here; if you think it is, don't hesitate to
>>>>> tell me
>>>>> in a public reply, it's in everyone's interest to set the public
>>>>> record
>>>>> straight.
>>>>>
>>>>
>>>
>>>
>>>
> Sasha
> 


* Re: Bug 215689 - e1000e: regression for I219-V for kernel 5.14 onwards #forregzbot
  2022-03-24 10:37 Bug 215689 - e1000e: regression for I219-V for kernel 5.14 onwards Thorsten Leemhuis
  2022-03-24 15:09 ` Neftin, Sasha
@ 2022-04-19 15:33 ` Thorsten Leemhuis
  1 sibling, 0 replies; 7+ messages in thread
From: Thorsten Leemhuis @ 2022-04-19 15:33 UTC (permalink / raw)
  To: regressions

TWIMC: this mail is primarily sent for documentation purposes and for
regzbot, my Linux kernel regression tracking bot. These mails usually
contain '#forregzbot' in the subject, to make them easy to spot and filter.

#regzbot fixed-by: 04ebaa1cfddae5f240

On 24.03.22 11:37, Thorsten Leemhuis wrote:
> Hi, this is your Linux kernel regression tracker.
> 
> I noticed a regression report in bugzilla.kernel.org that afaics nobody
> acted upon since it was reported about a week ago, that's why I decided
> to forward it to the lists and a few relevant people to the CC. To quote
> from https://bugzilla.kernel.org/show_bug.cgi?id=215689 :
> 
>> [reply] [−] Description James 2022-03-15 13:45:38 UTC
>>
>> I run Arch linux on an Intel NUC 8i3BEH which has the following network card:
>>
>> 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (6) I219-V (rev 30)
>>         DeviceName:  LAN
>>         Subsystem: Intel Corporation Device 2074
>>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>         Latency: 0
>>         Interrupt: pin A routed to IRQ 135
>>         Region 0: Memory at c0b00000 (32-bit, non-prefetchable) [size=128K]
>>         Capabilities: [c8] Power Management version 3
>>                 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>>                 Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
>>         Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>                 Address: 00000000fee003d8  Data: 0000
>>         Kernel driver in use: e1000e
>>         Kernel modules: e1000e
>>
>> I found a major regression since the previous few kernel versions which causes several odd issues, most noteably I use the machine to stream live tv via TVheadend and was finding this to be unusable (picture freezes and sound breaks up very badly with continuity errors in the TVheadend logfile).
>>
>> I found the issue was introduced since the 5.14 kernel, and have eventually got round to performing a git bisect, which landed upon the following commit:
>>
>> 44a13a5: e1000e: Fix the max snoop/no-snoop latency for 10M 
>>
>> Indeed, if I revert this single commit then the problem is resolved.
>>
>> To help diagnose the issue I applied the following patch to capture the values of the lat_enc, max_ltr_enc vs lat_enc_d, max_ltr_enc_d variables:
>>
>> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>> index d60e2016d..f4e5ffbcd 100644
>> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>> @@ -1012,6 +1012,7 @@ static s32 e1000_platform_pm_pch_lpt(struct e1000_hw *hw, bool link)
>>         u16 max_ltr_enc_d = 0;  /* maximum LTR decoded by platform */
>>         u16 lat_enc_d = 0;      /* latency decoded */
>>         u16 lat_enc = 0;        /* latency encoded */
>> +       struct e1000_adapter *adapter = hw->adapter;
>>
>>         if (link) {
>>                 u16 speed, duplex, scale = 0;
>> @@ -1074,6 +1075,9 @@ static s32 e1000_platform_pm_pch_lpt(struct e1000_hw *hw, bool link)
>>                                  ((max_ltr_enc & E1000_LTRV_SCALE_MASK)
>>                                  >> E1000_LTRV_SCALE_SHIFT)));
>>
>> +               e_info("e1000e: lat_enc=%d max_ltr_enc=%d", lat_enc, max_ltr_enc);
>> +               e_info("e1000e: lat_enc_d=%u max_ltr_enc_d=%u", lat_enc_d, max_ltr_enc_d);
>> +
>>                 if (lat_enc_d > max_ltr_enc_d)
>>                         lat_enc = max_ltr_enc;
>>         }
>>
>> With this in place I see the following in dmesg:
>>
>> [    3.241215] e1000e: Intel(R) PRO/1000 Network Driver
>> [    3.241217] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
>> [    3.243382] e1000e 0000:00:1f.6: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
>> [    3.749009] e1000e 0000:00:1f.6 0000:00:1f.6 (uninitialized): registered PHC clock
>> [    3.824751] e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width x1) 94:c6:91:ae:b3:7b
>> [    3.824765] e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network Connection
>> [    3.824849] e1000e 0000:00:1f.6 eth0: MAC: 13, PHY: 12, PBA No: FFFFFF-0FF
>> [    6.949327] e1000e 0000:00:1f.6 eth0: e1000e: lat_enc=2233 max_ltr_enc=4099
>> [    6.949331] e1000e 0000:00:1f.6 eth0: e1000e: lat_enc_d=58368 max_ltr_enc_d=0
>> [    6.951165] e1000e 0000:00:1f.6 eth0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
>>
>> Notice that lat_enc_d=58368 and max_ltr_enc_d=0 !
>>
>> lat_enc_d is greater than max_ltr_enc_d so it's setting snoop latency to max_ltr_enc (i.e. 4099) where it would have previously been set to 2233 in this particular example. This seems to be where the problem lies.
>>
>> Prior to commit 44a13a5:
>>
>> if (lat_enc > max_ltr_enc)
>>   lat_enc = max_ltr_enc;
>>
>> After commit 44a13a5:
>>
>> if (lat_enc_d > max_ltr_enc_d)
>>   lat_enc = max_ltr_enc;
>>
>>
>> I'm not sure whether it was intended for this new code to take effect for an I219 since the commit message on 44a13a5 indicates it was aimed at I217/I218. Seems strange that max_ltr_enc_d is getting set to 0?
>>
> 
> BTW, that commit is from Sasha Neftin.
> 
> Could somebody take a look into this? Or was this discussed somewhere
> else already? Or even fixed?
> 
> Anyway, to get this tracked:
> 
> #regzbot introduced: 44a13a5d99c71bf9e1676d9e51679daf4d7b3d73
> #regzbot from: James <jahutchinson99@googlemail.com>
> #regzbot title: net: e1000e: instabilities on I219-V for kernel 5.14 onwards
> #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=215689
> 
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> 
> P.S.: As the Linux kernel's regression tracker I'm getting a lot of
> reports on my table. I can only look briefly into most of them and lack
> knowledge about most of the areas they concern. I thus unfortunately
> will sometimes get things wrong or miss something important. I hope
> that's not the case here; if you think it is, don't hesitate to tell me
> in a public reply, it's in everyone's interest to set the public record
> straight.
> 
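
The before/after comparison quoted above can be sketched in plain C to see
which encoded value ends up selected for lat_enc=2233 and max_ltr_enc=4099.
This is an illustration, not the driver code: the field layout is assumed
from the scale table given elsewhere in this thread, the function names
are hypothetical, and the u32 variant models the type change proposed
there.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding: bits 9:0 value, bits 12:10 scale, one step = 2^5. */
static uint64_t ltr_decode_ns(uint16_t enc)
{
    uint64_t value = enc & 0x03FFu;
    unsigned int scale = (enc & 0x1C00u) >> 10;

    return value << (5 * scale);
}

/* Check as quoted for the code prior to commit 44a13a5. */
static uint16_t clamp_encoded(uint16_t lat_enc, uint16_t max_ltr_enc)
{
    return lat_enc > max_ltr_enc ? max_ltr_enc : lat_enc;
}

/* Check after commit 44a13a5, decoded values truncated to 16 bits. */
static uint16_t clamp_decoded_u16(uint16_t lat_enc, uint16_t max_ltr_enc)
{
    uint16_t lat_enc_d = (uint16_t)ltr_decode_ns(lat_enc);
    uint16_t max_ltr_enc_d = (uint16_t)ltr_decode_ns(max_ltr_enc);

    return lat_enc_d > max_ltr_enc_d ? max_ltr_enc : lat_enc;
}

/* Same check with the decoded values held in 32 bits. */
static uint16_t clamp_decoded_u32(uint16_t lat_enc, uint16_t max_ltr_enc)
{
    uint32_t lat_enc_d = (uint32_t)ltr_decode_ns(lat_enc);
    uint32_t max_ltr_enc_d = (uint32_t)ltr_decode_ns(max_ltr_enc);

    return lat_enc_d > max_ltr_enc_d ? max_ltr_enc : lat_enc;
}

/* Values taken from the dmesg output in the report. */
static void clamp_check_reported_values(void)
{
    assert(clamp_encoded(2233, 4099) == 2233);     /* old behaviour */
    assert(clamp_decoded_u16(2233, 4099) == 4099); /* regression */
    assert(clamp_decoded_u32(2233, 4099) == 2233); /* wider types */
}
```

With 16-bit storage the maximum decodes to 0, so any non-zero latency is
replaced by max_ltr_enc; with 32-bit storage the original 2233 is kept.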


Thread overview: 7+ messages
2022-03-24 10:37 Bug 215689 - e1000e: regression for I219-V for kernel 5.14 onwards Thorsten Leemhuis
2022-03-24 15:09 ` Neftin, Sasha
2022-03-24 19:36   ` Neftin, Sasha
2022-04-10  8:21     ` Thorsten Leemhuis
2022-04-10  9:26       ` Neftin, Sasha
2022-04-10  9:47         ` Thorsten Leemhuis
2022-04-19 15:33 ` Bug 215689 - e1000e: regression for I219-V for kernel 5.14 onwards #forregzbot Thorsten Leemhuis
