* [Intel-wired-lan] [e1000e] Linux 4.9: unable to send packets after link recovery with patched driver
From: Gavin Lambert @ 2019-07-11 6:50 UTC
To: intel-wired-lan

This might be a bit of a tricky question, but I'm not really sure where else to ask. Please cc me on any replies or I might overlook them.

I'm using a system with an e1000e network driver which has been patched to bypass the regular Linux network stack (because it can get called from a Xenomai RT context, among other reasons -- although in my case I'm not doing that). The complete source for the patched version of the code can be found here:

https://github.com/ribalda/ethercat/blob/master/devices/e1000e/netdev-4.9-ethercat.c

(There are some minor changes to other files, but the majority of changes are only to this file. You can see just the changes at https://gist.github.com/uecasm/5e36a15bda6ffd53079344fc443dcc5f/revisions .)

It was originally based on the in-kernel e1000e driver as of Linux 4.9.65. (I'm not the person who originally made the patches, but I am the person who rebased them to kernel 4.9 and I'm the one trying to maintain them for newer kernel versions. Though I'm also not the person who made that github repo.)

On a Debian system with kernel linux-image-4.9.0-4-rt-amd64 (4.9.65) installed, this works perfectly. It also works perfectly with linux-image-4.9.0-8-rt-amd64 (4.9.110).

However, with kernel linux-image-4.9.0-9-rt-amd64 (4.9.168) installed (and no other changes to the system other than building the patched e1000e module against this kernel's headers), something weird happens when the driver is running in its alternate "ecdev" mode.

Specifically, when the module is initially loaded, it works as expected and can send/receive without problems. When link is removed (by disconnecting the Ethernet cable), it detects this as expected. When link is restored, it detects this and reports it but is then unable to actually send any packets. (Note: to send packets the external code calls the "ndo_start_xmit" operation directly, and to receive packets it calls "ec_poll". Also note that it won't receive a packet unless it sends one first, due to the way that the network it's connected to works, so I can't tell if receives work or not when sends don't work.) Unloading and reloading the module fixes this, even if the link is initially down and then reconnected after the module is reloaded. (So perhaps the problem is something it does at the link-loss event?)

Occasionally, it does manage to survive one or two replugs before getting into the problem state. But once there, no amount of replugging appears to recover it; only reloading the module.

I do know that when it's in the failure state (not actually sending packets), e1000_xmit_frame continues to get all the way to the bottom and return NETDEV_TX_OK.

Note that the e1000e code being used is still the code as shown in the link above, not the code as exists in Linux 4.9.168. I did try rebasing the ethercat patches onto the new driver version, but this didn't seem to change the behavior.

Also note that the bad behavior was observed on an I219-V and an I219-LM, but does not appear to happen with an 82571EB (these are the only devices I have handy at the moment). The problem also doesn't occur when using the unpatched driver from 4.9.168 as a standard Linux network driver.

Obviously, something the patches are doing is causing problems, but it seems odd that the issue only occurs with certain hardware and with certain kernel versions. Any ideas on what could be the cause and solution (or how to narrow it down further)? I can easily make changes to the driver code; it's a lot harder to try kernel versions between the two above, however, but I might be able to do that too.

* [Intel-wired-lan] [e1000e] Linux 4.9: unable to send packets after link recovery with patched driver
From: Gavin Lambert @ 2019-07-12 3:23 UTC
To: intel-wired-lan

On 2019-07-11 18:50, I wrote:
> This might be a bit of a tricky question, but I'm not really sure
> where else to ask. Please cc me on any replies or I might overlook
> them.
>
> I'm using a system with an e1000e network driver which has been
> patched to bypass the regular Linux network stack (because it can get
> called from a Xenomai RT context, among other reasons -- although in
> my case I'm not doing that). The complete source for the patched
> version of the code can be found here:
>
> https://github.com/ribalda/ethercat/blob/master/devices/e1000e/netdev-4.9-ethercat.c
> (There are some minor changes to other files, but the majority of
> changes are only to this file. You can see just the changes at
> https://gist.github.com/uecasm/5e36a15bda6ffd53079344fc443dcc5f/revisions
> .)
>
> It was originally based on the in-kernel e1000e driver as of Linux
> 4.9.65. (I'm not the person who originally made the patches, but I am
> the person who rebased them to kernel 4.9 and I'm the one trying to
> maintain them for newer kernel versions. Though I'm also not the
> person who made that github repo.)
>
> On a Debian system with kernel linux-image-4.9.0-4-rt-amd64 (4.9.65)
> installed, this works perfectly. It also works perfectly with
> linux-image-4.9.0-8-rt-amd64 (4.9.110).
>
> However, with kernel linux-image-4.9.0-9-rt-amd64 (4.9.168) installed
> (and no other changes to the system other than building the patched
> e1000e module against this kernel's headers), something weird happens
> when the driver is running in its alternate "ecdev" mode.
>
> Specifically, when the module is initially loaded, it works as
> expected and can send/receive without problems. When link is removed
> (by disconnecting the Ethernet cable), it detects this as expected.
> When link is restored, it detects this and reports it but is then
> unable to actually send any packets. (Note: to send packets the
> external code calls the "ndo_start_xmit" operation directly, and to
> receive packets it calls "ec_poll". Also note that it won't receive a
> packet unless it sends one first, due to the way that the network it's
> connected to works, so I can't tell if receives work or not when sends
> don't work.) Unloading and reloading the module fixes this, even if
> the link is initially down and then reconnected after the module is
> reloaded. (So perhaps the problem is something it does at the
> link-loss event?)
>
> Occasionally, it does manage to survive one or two replugs before
> getting into the problem state. But once there, no amount of
> replugging appears to recover it; only reloading the module.
>
> I do know that when it's in the failure state (not actually sending
> packets), e1000_xmit_frame continues to get all the way to the bottom
> and return NETDEV_TX_OK.
>
> Note that the e1000e code being used is still the code as shown in the
> link above, not the code as exists in Linux 4.9.168. I did try
> rebasing the ethercat patches onto the new driver version, but this
> didn't seem to change the behavior.
>
> Also note that the bad behavior was observed on an I219-V and an
> I219-LM, but does not appear to happen with an 82571EB (these are the
> only devices I have handy at the moment). The problem also doesn't
> occur when using the unpatched driver from 4.9.168 as a standard Linux
> network driver.
>
> Obviously, something the patches are doing is causing problems, but it
> seems odd that the issue only occurs with certain hardware and with
> certain kernel versions. Any ideas on what could be the cause and
> solution (or how to narrow it down further)? I can easily make
> changes to the driver code; it's a lot harder to try kernel versions
> between the two above, however, but I might be able to do that too.

(I wouldn't normally quote that much, but I haven't seen this message appear on the mailing list yet, so I'm not sure if it got through or not.)

Another data point: on linux-image-4.9.0-8-rt-amd64 (4.9.110), which works ok with the code previously given, if I apply the attached patch (which is the rebase to bring the base driver up to date with 4.9.168) then the same problem occurs.

So *either* applying this patch or updating to 4.9.168 without applying this patch introduces the problem.

Making the further change below to the code fixes the problem in 4.9.110, but not in 4.9.168:

--- a/netdev-4.9-ethercat.c
+++ b/netdev-4.9-ethercat.c
@@ -5407,7 +5407,7 @@ static void e1000_watchdog_task(struct w
 			 * reset the controller to flush the Tx packet buffers.
 			 */
 			if ((adapter->flags & FLAG_RX_NEEDS_RESTART) ||
-			    e1000_desc_unused(tx_ring) + 1 < tx_ring->count)
+			    (!adapter->ecdev && e1000_desc_unused(tx_ring) + 1 < tx_ring->count))
 				adapter->flags |= FLAG_RESTART_NOW;
 			else
 				pm_schedule_suspend(netdev->dev.parent,

Since this was mostly just a rebase error (you can see a similar change in the old location of this code), I'm not sure if this helps narrow down the source of the problem between 4.9.110 and 4.9.168 or not. I'm still looking for ideas for that.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: e1000e_problem.diff
Type: text/x-diff
Size: 7603 bytes
Desc: not available
URL: <http://lists.osuosl.org/pipermail/intel-wired-lan/attachments/20190712/081f7e46/attachment-0001.diff>

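The one-line change in the message above is easier to follow with the arithmetic spelled out. The sketch below is a plain Python model, not driver code. `desc_unused` mirrors what the in-kernel `e1000_desc_unused()` helper computes as far as I recall it, and `restart_on_link_down` mirrors the patched condition from the hunk; both function names and all parameter names are made up for illustration.

```python
def desc_unused(count, next_to_clean, next_to_use):
    """Free descriptors in a ring; one slot is always kept empty."""
    if next_to_clean > next_to_use:
        return next_to_clean - next_to_use - 1
    return count + next_to_clean - next_to_use - 1


def restart_on_link_down(rx_needs_restart, ecdev, count,
                         next_to_clean, next_to_use):
    """Model of the patched test that decides whether to set FLAG_RESTART_NOW."""
    # "unused + 1 < count" is true exactly when some Tx work is still queued.
    tx_in_flight = desc_unused(count, next_to_clean, next_to_use) + 1 < count
    return rx_needs_restart or (not ecdev and tx_in_flight)


if __name__ == "__main__":
    # 256-entry ring with 5 descriptors still queued when the link drops.
    for ecdev in (False, True):
        decision = restart_on_link_down(False, ecdev, 256, 0, 5)
        print("ecdev =", ecdev, "-> restart =", decision)
```

In this model the only effect of the change is that, in ecdev mode, queued Tx work on link loss no longer requests a controller reset; the `FLAG_RX_NEEDS_RESTART` path is untouched.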
* [Intel-wired-lan] [e1000e] Linux 4.9: unable to send packets after link recovery with patched driver
From: Gavin Lambert @ 2019-07-18 8:06 UTC
To: intel-wired-lan

On 2019-07-12 15:23, I wrote:
> On 2019-07-11 18:50, I wrote:
>> On a Debian system with kernel linux-image-4.9.0-4-rt-amd64 (4.9.65)
>> installed, this works perfectly. It also works perfectly with
>> linux-image-4.9.0-8-rt-amd64 (4.9.110).
>>
>> However, with kernel linux-image-4.9.0-9-rt-amd64 (4.9.168) installed
>> (and no other changes to the system other than building the patched
>> e1000e module against this kernel's headers), something weird happens
>> when the driver is running in its alternate "ecdev" mode.
[...]
> Since this was mostly just a rebase error (you can see a similar
> change in the old location of this code), I'm not sure if this helps
> narrow down the source of the problem between 4.9.110 and 4.9.168 or
> not. I'm still looking for ideas for that.

Using this kernel tree:
https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/log/?h=v4.9-rt&ofs=3120

I've identified that the code at tag v4.9.126 is "good" and the code at tag v4.9.127 is "bad".

I've done a bisect (twice, from different starting points) and both times settled on this commit as the one which introduced the problem I'm experiencing:

commit c0b809985a7a418fcc3361c239ae79250245282d (refs/bisect/bad)
Author: Tomas Winkler <tomas.winkler@intel.com>
Date:   Tue Jan 2 12:01:41 2018 +0200

    mei: me: allow runtime pm for platform with D0i3

    commit cc365dcf0e56271bedf3de95f88922abe248e951 upstream.

    From the pci power documentation:
    "The driver itself should not call pm_runtime_allow(), though. Instead,
    it should let user space or some platform-specific code do that (user space
    can do it via sysfs as stated above)..."

    However, the S0ix residency cannot be reached without MEI device getting
    into low power state. Hence, for mei devices that support D0i3, it's better
    to make runtime power management mandatory and not rely on the system
    integration such as udev rules.
    This policy cannot be applied globally as some older platforms
    were found to have broken power management.

    Cc: <stable@vger.kernel.org> v4.13+
    Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
    Reviewed-by: Alexander Usyskin <alexander.usyskin@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

It is reproducible every time; if I build at the parent commit (3d3432580911) then the driver works, and if I add the commit above then it fails.

However it's unclear to me how this is affecting my modified e1000e driver in this way, except that it is perhaps power management related?

Since it appears to be a pm_runtime-related thing, just as an experiment I did try commenting out every single call to pm_runtime* functions in netdev.c, but this did not resolve the problem. Ditto for anything with the word "suspend" in it. I also tried adding e_info() logging calls to most places that used pm_ calls other than pm_runtime_get/put (and in particular, in all of the pm_ops callbacks), and none of them were hit during the problem events.

And even when it's not working, if I `cat` various things in `/sys/bus/pci/.../power/` on the adapter device, it appears to all be non-suspended, which makes me doubt that it really is a PM issue, unless I'm just looking in the wrong places.

Any ideas?

* [Intel-wired-lan] [e1000e] Linux 4.9: unable to send packets after link recovery with patched driver
From: Paul Menzel @ 2019-07-18 8:22 UTC
To: intel-wired-lan

[private answer]

Dear Gavin,

Your messages were delivered to the list subscribers.

On 18.07.19 10:06, Gavin Lambert wrote:
> [...]
> I've done a bisect (twice, from different starting points) and both
> times settled on this commit as the one which introduced the problem I'm
> experiencing:
>
> commit c0b809985a7a418fcc3361c239ae79250245282d (refs/bisect/bad)
> Author: Tomas Winkler <tomas.winkler@intel.com>
> Date:   Tue Jan 2 12:01:41 2018 +0200
>
>     mei: me: allow runtime pm for platform with D0i3
>
>     commit cc365dcf0e56271bedf3de95f88922abe248e951 upstream.
> [...]

This commit was added in v4.16-rc1.

> It is reproducible every time; if I build at the parent commit
> (3d3432580911) then the driver works, and if I add the commit above then
> it fails.
> [...]

If you found a faulty commit, please CC the commit authors, reviewers, and subsystem maintainers and maybe even the regression address.

If you have time, please check with Linux master tree to see if a commit fixing this has been added or you still need to revert it.

Kind regards,

Paul

* [Intel-wired-lan] [e1000e] Linux 4.9: unable to send packets after link recovery with patched driver
From: Neftin, Sasha @ 2019-07-18 8:24 UTC
To: intel-wired-lan

On 7/18/2019 11:06, Gavin Lambert wrote:
> [...]
> I've done a bisect (twice, from different starting points) and both
> times settled on this commit as the one which introduced the problem I'm
> experiencing:
>
> commit c0b809985a7a418fcc3361c239ae79250245282d (refs/bisect/bad)
> Author: Tomas Winkler <tomas.winkler@intel.com>
> Date:   Tue Jan 2 12:01:41 2018 +0200
>
>     mei: me: allow runtime pm for platform with D0i3
>
>     commit cc365dcf0e56271bedf3de95f88922abe248e951 upstream.
> [...]
> And even when it's not working, if I `cat` various things in
> `/sys/bus/pci/.../power/` on the adapter device, it appears to all be
> non-suspended, which makes me doubt that it really is a PM issue, unless
> I'm just looking in the wrong places.
>
> Any ideas?

Please, refer to the commit def4ec6dce393e2136b62a05712f35a7fa5f5e56 on the Jeff Kirsher's next-queue:
https://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue.git/commit/drivers/net/ethernet/intel/e1000e?id=def4ec6dce393e2136b62a05712f35a7fa5f5e56

We are working to push this patch to upstream.

Thanks,
Sasha

* [Intel-wired-lan] [e1000e] Linux 4.9: unable to send packets after link recovery with patched driver
From: Gavin Lambert @ 2019-07-19 0:40 UTC
To: intel-wired-lan

On 2019-07-18 20:24, Neftin, Sasha wrote:
> Please, refer to the commit def4ec6dce393e2136b62a05712f35a7fa5f5e56
> on the Jeff Kirsher's next-queue:
> https://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue.git/commit/drivers/net/ethernet/intel/e1000e?id=def4ec6dce393e2136b62a05712f35a7fa5f5e56
>
> We are working to push this patch to upstream.

Thanks, that does sound identical to my symptoms. However I tried applying this patch to my driver in 4.9 and it does not resolve the problem. Are some additional patches required as well?

FWIW, I added some extra logging around the new code. I can confirm that it does execute on link regain but doesn't actually enter the loop in my problem case. The pcim_state is 0x00080083 at the time. So the e1000_phy_hw_reset is never actually called. If I try changing it to call that unconditionally, then it can't successfully establish a link in the first place.

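The reported value 0x00080083 can be decoded by hand to see why the loop is skipped. The sketch below is a small user-space helper, not driver code; the bit masks are the `E1000_STATUS_*` values as I recall them from the e1000e `defines.h`, so treat them as assumptions and check them against the tree being built. The table and function names are made up for illustration.

```python
# Assumed E1000_STATUS_* masks (from memory of e1000e defines.h).
STATUS_BITS = {
    "FD": 0x00000001,                 # full duplex
    "LU": 0x00000002,                 # link up
    "SPEED_1000": 0x00000080,         # 1000 Mb/s
    "GIO_MASTER_ENABLE": 0x00080000,  # master request status
    "PCIM_STATE": 0x40000000,         # bit the suggested patch tests
}


def decode_status(value):
    """Return the names of the known STATUS bits that are set in value."""
    return [name for name, mask in STATUS_BITS.items()
            if (value & mask) == mask]


def unknown_bits(value):
    """Return any set bits that the table above does not cover."""
    known = 0
    for mask in STATUS_BITS.values():
        known |= mask
    return value & ~known


if __name__ == "__main__":
    status = 0x00080083  # value logged at link regain in the message above
    print(decode_status(status))
    print(hex(unknown_bits(status)))
```

Under these assumed masks the value reads as link up, full duplex, 1000 Mb/s, with the PCIm state bit clear, which is consistent with the observation that `e1000_phy_hw_reset` is never reached.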
* [Intel-wired-lan] [e1000e] Linux 4.9: unable to send packets after link recovery with patched driver
From: Gavin Lambert @ 2019-07-19 1:02 UTC
To: intel-wired-lan

On 2019-07-19 12:40, I wrote:
> FWIW, I added some extra logging around the new code. I can confirm
> that it does execute on link regain but doesn't actually enter the
> loop in my problem case. The pcim_state is 0x00080083 at the time.
> So the e1000_phy_hw_reset is never actually called. If I try changing
> it to call that unconditionally, then it can't successfully establish
> a link in the first place.

I added a call to e1000e_dump at the point of link regain, in hopes that it might shed more light.

On startup, when it does successfully link and send/receive packets:

0000:00:1f.6: Register Dump
 Register Name   Value
 CTRL            58180240
 STATUS          00080083
 CTRL_EXT        995a1027
 ICR             00000000
 RCTL            04008002
 RDLEN           00001000
 RDH             00000000
 RDT             000000f0
 RDTR            00000000
 RXDCTL[0-1]     00010000 00010000
 ERT             00000000
 RDBAL           6061c000
 RDBAH           00000002
 RDFH            00000000
 RDFT            00000000
 RDFHS           00000000
 RDFTS           00000000
 RDFPC           00000000
 TCTL            3103f0f8
 TDBAL           5e8a0000
 TDBAH           00000002
 TDLEN           00001000
 TDH             00000000
 TDT             00000000
 TIDV            00000008
 TXDCTL[0-1]     0141001f 0141001f
 TADV            00000020
 TARC[0-1]       3d800403 45000403
 TDFH            00000d00
 TDFT            00000d00
 TDFHS           00000d00
 TDFTS           00000d00
 TDFPC           00000000
ecm0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx

On disconnecting and reconnecting the cable, when it does get link but then can't actually send any packets:

0000:00:1f.6: Register Dump
 Register Name   Value
 CTRL            58180240
 STATUS          00080083
 CTRL_EXT        995a1027
 ICR             00000000
 RCTL            04008002
 RDLEN           00001000
 RDH             000000d1
 RDT             000000c0
 RDTR            00000000
 RXDCTL[0-1]     00010000 00010000
 ERT             00000000
 RDBAL           6061c000
 RDBAH           00000002
 RDFH            00000582
 RDFT            00000582
 RDFHS           00000582
 RDFTS           00000582
 RDFPC           00000000
 TCTL            3103f0fa
 TDBAL           5e8a0000
 TDBAH           00000002
 TDLEN           00001000
 TDH             00000050
 TDT             0000003d
 TIDV            00000008
 TXDCTL[0-1]     0141001f 0141001f
 TADV            00000020
 TARC[0-1]       3d800403 45000403
 TDFH            00000f0a
 TDFT            00000f1c
 TDFHS           00000f0a
 TDFTS           00000f0a
 TDFPC           00000000
ecm0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx

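Comparing the two dumps by eye is error-prone, so a small script can list only the registers that changed. The sketch below is an illustration rather than an existing tool: it assumes the dump is available as whitespace-separated `NAME value [value]` tokens, as in this message, and the function names are made up. The register values are copied from the two dumps above.

```python
import re

HEX8 = re.compile(r"^[0-9a-f]{8}$")

GOOD = """
CTRL 58180240 STATUS 00080083 CTRL_EXT 995a1027 ICR 00000000
RCTL 04008002 RDLEN 00001000 RDH 00000000 RDT 000000f0 RDTR 00000000
RXDCTL[0-1] 00010000 00010000 ERT 00000000 RDBAL 6061c000 RDBAH 00000002
RDFH 00000000 RDFT 00000000 RDFHS 00000000 RDFTS 00000000 RDFPC 00000000
TCTL 3103f0f8 TDBAL 5e8a0000 TDBAH 00000002 TDLEN 00001000
TDH 00000000 TDT 00000000 TIDV 00000008 TXDCTL[0-1] 0141001f 0141001f
TADV 00000020 TARC[0-1] 3d800403 45000403 TDFH 00000d00 TDFT 00000d00
TDFHS 00000d00 TDFTS 00000d00 TDFPC 00000000
"""

BAD = """
CTRL 58180240 STATUS 00080083 CTRL_EXT 995a1027 ICR 00000000
RCTL 04008002 RDLEN 00001000 RDH 000000d1 RDT 000000c0 RDTR 00000000
RXDCTL[0-1] 00010000 00010000 ERT 00000000 RDBAL 6061c000 RDBAH 00000002
RDFH 00000582 RDFT 00000582 RDFHS 00000582 RDFTS 00000582 RDFPC 00000000
TCTL 3103f0fa TDBAL 5e8a0000 TDBAH 00000002 TDLEN 00001000
TDH 00000050 TDT 0000003d TIDV 00000008 TXDCTL[0-1] 0141001f 0141001f
TADV 00000020 TARC[0-1] 3d800403 45000403 TDFH 00000f0a TDFT 00000f1c
TDFHS 00000f0a TDFTS 00000f0a TDFPC 00000000
"""


def parse_dump(text):
    """Map each register name to the tuple of 8-digit hex values after it."""
    regs = {}
    name = None
    for tok in text.split():
        if HEX8.match(tok):
            if name is not None:
                regs[name].append(tok)
        else:
            name = tok
            regs[name] = []
    return {key: tuple(vals) for key, vals in regs.items()}


def diff_dumps(before, after):
    """Return {name: (before, after)} for registers whose values differ."""
    return {name: (before[name], after[name])
            for name in before
            if name in after and before[name] != after[name]}


if __name__ == "__main__":
    for name, (old, new) in diff_dumps(parse_dump(GOOD), parse_dump(BAD)).items():
        print(f"{name:12s} {' '.join(old)} -> {' '.join(new)}")
```

Only ring pointers, FIFO pointers and TCTL differ between the two states. TCTL differs in a single bit (0x2), which is the transmit-enable bit as far as I recall the e1000e definitions.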
* [Intel-wired-lan] [e1000e] Linux 4.9: unable to send packets after link recovery with patched driver
From: Gavin Lambert @ 2019-08-20 2:15 UTC
To: intel-wired-lan

On 2019-07-19 13:02, I wrote:
> On 2019-07-19 12:40, I wrote:
>> FWIW, I added some extra logging around the new code. I can confirm
>> that it does execute on link regain but doesn't actually enter the
>> loop in my problem case. The pcim_state is 0x00080083 at the time.
>> So the e1000_phy_hw_reset is never actually called. If I try changing
>> it to call that unconditionally, then it can't successfully establish
>> a link in the first place.
>
> I added a call to e1000e_dump at the point of link regain, in hopes
> that it might shed more light.
[register dumps clipped]

Does anyone have any ideas about this? Either towards further investigation or to a possible resolution?

This is at the point of hardware internals now, so I have no idea how to proceed in either area.

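One thing that can be read off the clipped dumps without deep hardware knowledge is the Tx ring position. The sketch below assumes 16-byte Tx descriptors and the usual head/tail convention (hardware owns the descriptors from TDH up to, but not including, TDT); both are assumptions to verify against the controller documentation, and the function names are made up. The register values come from the failing-state dump earlier in the thread.

```python
TX_DESC_BYTES = 16  # assumed size of one Tx descriptor


def ring_entries(tdlen):
    """Number of descriptors in a ring whose byte length is tdlen."""
    return tdlen // TX_DESC_BYTES


def hw_owned(tdh, tdt, count):
    """Descriptors between head and tail, allowing for wrap-around."""
    return (tdt - tdh) % count


if __name__ == "__main__":
    count = ring_entries(0x00001000)                # TDLEN in both dumps
    working = hw_owned(0x00000000, 0x00000000, count)
    failing = hw_owned(0x00000050, 0x0000003d, count)
    print("ring entries:", count)
    print("pending in working state:", working)
    print("pending in failing state:", failing)
```

In the failing dump the head is ahead of the tail, so under this convention the hardware would see most of the ring as pending. The arithmetic cannot say whether that is the cause of the stall or a side effect of it, but it may be a useful place to add logging.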
* [Intel-wired-lan] [e1000e] Linux 4.9: unable to send packets after link recovery with patched driver 2019-08-20 2:15 ` Gavin Lambert @ 2019-09-03 7:56 ` Gavin Lambert 2019-09-03 8:35 ` Paul Menzel 0 siblings, 1 reply; 18+ messages in thread From: Gavin Lambert @ 2019-09-03 7:56 UTC (permalink / raw) To: intel-wired-lan On 2019-08-20 14:15, I wrote: > Does anyone have any ideas about this? Either towards further > investigation or to a possible resolution? > > This is at the point of hardware internals now, so I have no idea how > to proceed in either area. To recap (plus some new info): 1. I am using a kernel module which uses the code from the e1000e driver to communicate with the hardware without actually registering it as a Linux netdev. (This is partly because it can get used in a Xenomai context outside of Linux itself, although I'm not doing that myself.) This historically works fine. 2. On certain Linux versions, I encountered an issue where disconnecting the network cable and reconnecting it almost always results in not being able to send any packets. (I cannot determine if receiving packets works in this case, as the network design will not receive packets unless some are sent first.) Restarting the driver (rmmod+modprobe) does recover from this case (until the next link loss), but simply replugging the cable never does. 3. The problem was observed with both I219-V and I219-LM (on motherboard), but was *not* observed with 82571EB (PCIE). The problem was not observed with a motherboard igb-based I211. I suspect the issue is limited to motherboard-based e1000e adapters. (Or perhaps there's something different about how the IGBs are internally connected.) 4. The problem does not occur when the e1000e driver is registered "normally" as a Linux netdev. 5. The problem was introduced by "mei: me: allow runtime pm for platform with D0i3" (which has been backported to 4.4+, as far as I can tell). 
Excluding this commit reliably resolves the issue and including it
reliably breaks it.

6. Applying the previously suggested patch
https://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue.git/commit/drivers/net/ethernet/intel/e1000e?id=def4ec6dce393e2136b62a05712f35a7fa5f5e56
has no effect; the E1000_STATUS_PCIM_STATE bit is not set when the issue
occurs.

7. Given the content of the change in #5, I assumed that the problem was
power-management related, perhaps a side effect of the e1000e driver not
being registered as a netdev. (So perhaps something thinks that no
devices are in use and turns something off?)

8. I've previously posted register dumps from an e1000e in both the
"normal" and "link up but not transmitting" states. They seemed very
similar, but as I'm not familiar with the register meanings I may have
overlooked something significant. (Note that the dumps were captured
inside the watchdog task, when it detects link up but before it sets
E1000_TCTL_EN.)

9. I enabled debug logging in the mei driver; it logs a couple of
runtime_idles and then a runtime_suspend during system startup. (I
added a log to runtime_resume that is missing in the driver source, but
it appears this does not get called in my scenario.) Note that the
e1000e driver is still working OK after this, at least at first.

10. "cat /sys/devices/pci0000:00/0000:00:16.0/power/runtime_status"
    => "suspended"
    "cat /sys/devices/pci0000:00/0000:00:16.0/mei/mei0/power/runtime_status"
    => "unsupported"
    "cat /sys/devices/pci0000:00/0000:00:1f.0/power/runtime_status"
    => "active"
    "cat /sys/devices/pci0000:00/0000:00:1f.6/power/runtime_status"
    => "active" (this is the actual NIC)
    These don't change between the working and non-working states.
    (It's possible that some other device does, but I haven't found it yet.)

11. I did try forcing the above to unsuspend, but this did not recover
from the e1000e issue.

12. I also tried calling e1000e_reset on link-down. This produces
different register output on link-up, but doesn't recover from the issue.

13. I also tried recompiling the kernel with CONFIG_PM disabled (no
power management). This *does* resolve the problem (but is a very big
hammer).

14. Possibly also of interest is that if I do *both* #12 and #13, the
problem remains (suggesting #12 was counter-productive).

FYI the hardware on one of the test machines is as follows:
    00:00.0 Host bridge: Intel Corporation Device 591f (rev 05)
    00:01.0 PCI bridge: Intel Corporation Skylake PCIe Controller (x16) (rev 05)
    00:02.0 VGA compatible controller: Intel Corporation Device 5912 (rev 04)
    00:08.0 System peripheral: Intel Corporation Skylake Gaussian Mixture Model
    00:14.0 USB controller: Intel Corporation Sunrise Point-H USB 3.0 xHCI Controller (rev 31)
    00:14.2 Signal processing controller: Intel Corporation Sunrise Point-H Thermal subsystem (rev 31)
    00:15.0 Signal processing controller: Intel Corporation Sunrise Point-H Serial IO I2C Controller #0 (rev 31)
    00:15.1 Signal processing controller: Intel Corporation Sunrise Point-H Serial IO I2C Controller #1 (rev 31)
    00:16.0 Communication controller: Intel Corporation Sunrise Point-H CSME HECI #1 (rev 31)
    00:17.0 SATA controller: Intel Corporation Sunrise Point-H SATA controller [AHCI mode] (rev 31)
    00:1b.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Root Port #19 (rev f1)
    00:1b.3 PCI bridge: Intel Corporation Sunrise Point-H PCI Root Port #20 (rev f1)
    00:1c.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #5 (rev f1)
    00:1d.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #11 (rev f1)
    00:1e.0 Signal processing controller: Intel Corporation Sunrise Point-H Serial IO UART #0 (rev 31)
    00:1f.0 ISA bridge: Intel Corporation Sunrise Point-H LPC Controller (rev 31)
    00:1f.2 Memory controller: Intel Corporation Sunrise Point-H PMC (rev 31)
    00:1f.4 SMBus: Intel Corporation Sunrise Point-H SMBus (rev 31)
    00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-LM (rev 31)
    02:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
    03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
    05:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)

I'm happy to add any code instrumentation or make any other changes
needed to locate and resolve the problem, and I can readily reproduce it
-- I'm just at a complete loss as to where to start looking, and am
still hoping for some suggestions in that regard.

If there's anywhere (or anyone) else better for me to talk to about this
issue, please let me know that too.
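
[Editorial sketch, not part of the original message.] The runtime-PM checks
in item 10 can be scripted so the same devices are sampled before and after
a replug. The following is a minimal sketch under stated assumptions: it
reads the standard `power/runtime_status` and `power/control` sysfs
attributes through `/sys/bus/pci/devices`, and the function name
`runtime_pm_state`, the `DEVICES` table, and its labels are hypothetical
names chosen here; the PCI addresses come from the lspci listing above.

```python
from pathlib import Path

# PCI addresses from the lspci listing in the report; labels are informal.
DEVICES = {
    "0000:00:16.0": "CSME HECI #1 (mei)",
    "0000:00:1f.0": "LPC controller",
    "0000:00:1f.6": "I219-LM (e1000e)",
}


def runtime_pm_state(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Return (runtime_status, control) for one PCI device.

    Each element is None if the attribute cannot be read, for example
    because the device is absent or the kernel was built without CONFIG_PM.
    """
    power = Path(sysfs_root) / bdf / "power"

    def read(name):
        try:
            return (power / name).read_text().strip()
        except OSError:
            return None

    return read("runtime_status"), read("control")


if __name__ == "__main__":
    for bdf, label in DEVICES.items():
        status, control = runtime_pm_state(bdf)
        print(f"{bdf} {label}: runtime_status={status} control={control}")
```

Running this once in the working state and once in the failed state gives
two listings that can be compared directly. Writing `on` to a device's
`power/control` attribute is the documented way to keep it from
runtime-suspending; whether that is the method used in item 11 is not
stated in the report.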
* [Intel-wired-lan] [e1000e] Linux 4.9: unable to send packets after link recovery with patched driver
From: Paul Menzel @ 2019-09-03 8:35 UTC
To: intel-wired-lan

Dear Gavin,

Thank you for following up on this.

On 03.09.19 09:56, Gavin Lambert wrote:
> On 2019-08-20 14:15, I wrote:
>> Does anyone have any ideas about this? Either towards further
>> investigation or to a possible resolution?

[...]

> 5. The problem was introduced by "mei: me: allow runtime pm for platform
> with D0i3" (which has been backported to 4.4+, as far as I can tell).
> Excluding this commit reliably resolves the issue and including it
> reliably breaks it.

The commit hash in the master branch is
cc365dcf0e56271bedf3de95f88922abe248e951, and it has been there since
v4.16-rc1.

It is strange that it is in 4.4 and 4.9, as it was only tagged for v4.13+.

[...]

> If there's anywhere (or anyone) else better for me to talk to about this
> issue, please let me know that too.

It is not clear to me whether this is still reproducible on Linux 5.3-rc7
(or Linus' master branch).

If it is, this is definitely a regression, and the commits need to be
reverted due to Linux' no-regression policy.

Kind regards,

Paul
* [Intel-wired-lan] [e1000e] Linux 4.9: unable to send packets after link recovery with patched driver
From: Greg Kroah-Hartman @ 2019-09-03 9:20 UTC
To: intel-wired-lan

On Tue, Sep 03, 2019 at 10:35:30AM +0200, Paul Menzel wrote:
> On 03.09.19 09:56, Gavin Lambert wrote:

[...]

> > 5. The problem was introduced by "mei: me: allow runtime pm for platform
> > with D0i3" (which has been backported to 4.4+, as far as I can tell).
> > Excluding this commit reliably resolves the issue and including it
> > reliably breaks it.
>
> The commit hash in the master branch is
> cc365dcf0e56271bedf3de95f88922abe248e951 and is there since v4.16-rc1.
>
> Strange, that it is in 4.4 and 4.9, as it was only tagged for v4.13+.

[...]

> It is not clear to me, if this is still reproducible on Linux 5.3-rc7 (or
> Linus' master branch).
>
> If it is, this is definitely a regression, and the commits need to be
> reverted due to Linux' no regression policy.

So I should revert this from 4.4.y and 4.9.y?

thanks,

greg k-h
* [Intel-wired-lan] [e1000e] Linux 4.9: unable to send packets after link recovery with patched driver
From: Winkler, Tomas @ 2019-09-03 9:28 UTC
To: intel-wired-lan

> On Tue, Sep 03, 2019 at 10:35:30AM +0200, Paul Menzel wrote:
> > On 03.09.19 09:56, Gavin Lambert wrote:

[...]

> > > 5. The problem was introduced by "mei: me: allow runtime pm for
> > > platform with D0i3" (which has been backported to 4.4+, as far as
> > > I can tell).
> > > Excluding this commit reliably resolves the issue and including it
> > > reliably breaks it.
> >
> > The commit hash in the master branch is
> > cc365dcf0e56271bedf3de95f88922abe248e951 and is there since v4.16-rc1.
> >
> > Strange, that it is in 4.4 and 4.9, as it was only tagged for v4.13+.

[...]

> > It is not clear to me, if this is still reproducible on Linux 5.3-rc7
> > (or Linus' master branch).
> >
> > If it is, this is definitely a regression, and the commits need to be
> > reverted due to Linux' no regression policy.
>
> So I should revert this from 4.4.y and 4.9.y?

The issue is not in the mei driver; it is in the e1000 driver. To the best
of my knowledge there should be a fix. Vitaly, can it please be backported
to older kernels?

Thanks
Tomas
* [Intel-wired-lan] [e1000e] Linux 4.9: unable to send packets after link recovery with patched driver 2019-09-03 9:28 ` Winkler, Tomas @ 2019-09-03 9:39 ` Paul Menzel 2019-09-03 11:00 ` Gavin Lambert 0 siblings, 1 reply; 18+ messages in thread From: Paul Menzel @ 2019-09-03 9:39 UTC (permalink / raw) To: intel-wired-lan Dear Tomas, On 2019-09-03 11:28, Winkler, Tomas wrote: >> On Tue, Sep 03, 2019 at 10:35:30AM +0200, Paul Menzel wrote: >>> On 03.09.19 09:56, Gavin Lambert wrote: >>>> On 2019-08-20 14:15, I wrote: >>>>> Does anyone have any ideas about this?? Either towards further >>>>> investigation or to a possible resolution? >>>>> >>>>> This is at the point of hardware internals now, so I have no idea >>>>> how to proceed in either area. >>>> >>>> To recap (plus some new info): >>>> >>>> 1. I am using a kernel module which uses the code from the e1000e >>>> driver to communicate with the hardware without actually registering >>>> it as a Linux netdev.? (This is partly because it can get used in a >>>> Xenomai context outside of Linux itself, although I'm not doing that >>>> myself.) This historically works fine. >>>> >>>> 2. On certain Linux versions, I encountered an issue where >>>> disconnecting the network cable and reconnecting it almost always >>>> results in not being able to send any packets.? (I cannot determine >>>> if receiving packets works in this case, as the network design will >>>> not receive packets unless some are sent first.)? Restarting the >>>> driver (rmmod+modprobe) does recover from this case (until the next >>>> link loss), but simply replugging the cable never does. >>>> >>>> 3. The problem was observed with both I219-V and I219-LM (on >>>> motherboard), but was *not* observed with 82571EB (PCIE).? The >>>> problem was not observed with a motherboard igb-based I211.? I >>>> suspect the issue is limited to motherboard-based e1000e adapters. 
>>>> (Or perhaps there's something different about how the IGBs are >>>> internally connected.) >>>> >>>> 4. The problem does not occur when the e1000e driver is registered >>>> "normally" as a Linux netdev. >>>> >>>> 5. The problem was introduced by "mei: me: allow runtime pm for >>>> platform with D0i3" (which has been backported to 4.4+, as far as I can >> tell). >>>> Excluding this commit reliably resolves the issue and including it >>>> reliably breaks it. >>> >>> The commit hash in the master branch is >>> cc365dcf0e56271bedf3de95f88922abe248e951 and is there since v4.16-rc1. >>> >>> Strange, that it is in 4.4 and 4.9, as it was only tagged for v4.13+. >>> >>>> 6. Applying the previously suggested patch >>>> https://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue. >>>> git/commit/drivers/net/ethernet/intel/e1000e?id=def4ec6dce393e2136b6 >>>> 2a05712f35a7fa5f5e56 has no effect; the E1000_STATUS_PCIM_STATE bit >>>> is not set when the issue occurs. >>>> >>>> 7. Given the content of the change in #5, I assumed that the problem >>>> was power-management related, perhaps a side effect of the e1000e >>>> driver not being registered as a netdev.? (So perhaps something >>>> thinks that no devices are in use and turns something off?) >>>> >>>> 8. I've previously posted register dumps from an e1000e in both the >>>> "normal" and "link up but not transmitting" states.? They seemed >>>> very similar, but as I'm not familiar with the register meanings I >>>> may have overlooked something significant.? (Note that the dumps >>>> were captured inside the watchdog task, when it detects link up but >>>> before it sets >>>> E1000_TCTL_EN.) >>>> >>>> 9. I enabled debug logging in the mei driver; it logs a couple of >>>> runtime_idles and then a runtime_suspend during system startup.? (I >>>> added a log to runtime_resume that is missing in the driver source, >>>> but it appears this does not get called in my scenario.)? 
Note that >>>> the e1000e driver is still working ok after this.. at least at first. >>>> >>>> 10. "cat /sys/bus/devices/pci0000:00/0000:00:16.0/power/runtime_status" >>>> => "suspended" >>>> ??? "cat >>>> >> /sys/bus/devices/pci0000:00/0000:00:16.0/mei/mei0/power/runtime_status" >>>> => "unsupported" >>>> ??? "cat /sys/bus/devices/pci0000:00/0000:00:1f.0/power/runtime_status" >>>> => "active" >>>> ??? "cat /sys/bus/devices/pci0000:00/0000:00:1f.6/power/runtime_status" >>>> => "active" (this is the actual NIC) >>>> ??? These don't change between the working and non-working states. >>>> (It's possible that some other device does, but I haven't found it >>>> yet.) >>>> >>>> 11. I did try forcing the above to unsuspend, but this did not >>>> recover from the e1000e issue. >>>> >>>> 12. I also tried calling e1000e_reset on link-down.? This produces >>>> different register output on link-up, but doesn't recover from the >>>> issue. >>>> >>>> 13. I also tried recompiling the kernel with CONFIG_PM disabled (no >>>> power management).? This *does* resolve the problem (but is a very >>>> big hammer). >>>> >>>> 14. Possibly also of interest is that if I do *both* #12 and #13, >>>> the problem remains (suggesting #12 was counter-productive). >>>> >>>> FYI the hardware on one of the test machines is as follows: >>>> ??? 00:00.0 Host bridge: Intel Corporation Device 591f (rev 05) >>>> ??? 00:01.0 PCI bridge: Intel Corporation Skylake PCIe Controller >>>> (x16) (rev 05) >>>> ??? 00:02.0 VGA compatible controller: Intel Corporation Device >>>> 5912 (rev 04) >>>> ??? 00:08.0 System peripheral: Intel Corporation Skylake Gaussian >>>> Mixture Model >>>> ??? 00:14.0 USB controller: Intel Corporation Sunrise Point-H USB >>>> 3.0 xHCI Controller (rev 31) >>>> ??? 00:14.2 Signal processing controller: Intel Corporation Sunrise >>>> Point-H Thermal subsystem (rev 31) >>>> ??? 
00:15.0 Signal processing controller: Intel Corporation Sunrise >>>> Point-H Serial IO I2C Controller #0 (rev 31) >>>> ??? 00:15.1 Signal processing controller: Intel Corporation Sunrise >>>> Point-H Serial IO I2C Controller #1 (rev 31) >>>> ??? 00:16.0 Communication controller: Intel Corporation Sunrise >>>> Point-H CSME HECI #1 (rev 31) >>>> ??? 00:17.0 SATA controller: Intel Corporation Sunrise Point-H SATA >>>> controller [AHCI mode] (rev 31) >>>> ??? 00:1b.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Root >>>> Port #19 (rev f1) >>>> ??? 00:1b.3 PCI bridge: Intel Corporation Sunrise Point-H PCI Root >>>> Port #20 (rev f1) >>>> ??? 00:1c.0 PCI bridge: Intel Corporation Sunrise Point-H PCI >>>> Express Root Port #5 (rev f1) >>>> ??? 00:1d.0 PCI bridge: Intel Corporation Sunrise Point-H PCI >>>> Express Root Port #11 (rev f1) >>>> ??? 00:1e.0 Signal processing controller: Intel Corporation Sunrise >>>> Point-H Serial IO UART #0 (rev 31) >>>> ??? 00:1f.0 ISA bridge: Intel Corporation Sunrise Point-H LPC >>>> Controller (rev 31) >>>> ??? 00:1f.2 Memory controller: Intel Corporation Sunrise Point-H >>>> PMC (rev 31) >>>> ??? 00:1f.4 SMBus: Intel Corporation Sunrise Point-H SMBus (rev 31) >>>> ??? 00:1f.6 Ethernet controller: Intel Corporation Ethernet >>>> Connection (2) I219-LM (rev 31) >>>> ??? 02:00.0 Ethernet controller: Intel Corporation I211 Gigabit >>>> Network Connection (rev 03) >>>> ??? 03:00.0 Ethernet controller: Intel Corporation I211 Gigabit >>>> Network Connection (rev 03) >>>> ??? 05:00.0 Ethernet controller: Intel Corporation I211 Gigabit >>>> Network Connection (rev 03) (Tomas, your MUA wrapped the lines messing up the formatting.) >>>> I'm happy to add any code instrumentation or make any other changes >>>> needed to locate and resolve the problem, and I can readily >>>> reproduce it >>>> -- I'm just at a complete loss as to where to start looking, and am >>>> still hoping for some suggestions in that regard. 
>>>>
>>>> If there's anywhere (or anyone) else better for me to talk to about
>>>> this issue, please let me know that too.
>>>
>>> It is not clear to me, if this is still reproducible on Linux 5.3-rc7
>>> (or Linus' master branch).
>>>
>>> If it is, this is a definitely regression, and the commits need to be
>>> reverted due to Linux' no regression policy.
>>
>> So I should revert this from 4.4.y and 4.9.y?
>
> The issue is not in mei driver, it is in e1000 driver, I my best
> knowledge there should be fix, please Vitaly can it be backported to
> older kernels?

Tomas, backporting the commit supposedly fixing this, does *not* help.
Also, it does not matter for the no regression policy.

Let's wait until Gavin can confirm if it is happening with Linux
5.3-rc7.

Kind regards,

Paul

^ permalink raw reply [flat|nested] 18+ messages in thread
* [Intel-wired-lan] [e1000e] Linux 4.9: unable to send packets after link recovery with patched driver 2019-09-03 9:39 ` Paul Menzel @ 2019-09-03 11:00 ` Gavin Lambert 2019-09-04 10:06 ` Winkler, Tomas 0 siblings, 1 reply; 18+ messages in thread From: Gavin Lambert @ 2019-09-03 11:00 UTC (permalink / raw) To: intel-wired-lan On 2019-09-03 21:39, Paul Menzel wrote: > Dear Tomas, > > On 2019-09-03 11:28, Winkler, Tomas wrote: > >>> On Tue, Sep 03, 2019 at 10:35:30AM +0200, Paul Menzel wrote: > >>>> On 03.09.19 09:56, Gavin Lambert wrote: >>>>> On 2019-08-20 14:15, I wrote: >>>>>> Does anyone have any ideas about this?? Either towards further >>>>>> investigation or to a possible resolution? >>>>>> >>>>>> This is at the point of hardware internals now, so I have no idea >>>>>> how to proceed in either area. >>>>> >>>>> To recap (plus some new info): >>>>> >>>>> 1. I am using a kernel module which uses the code from the e1000e >>>>> driver to communicate with the hardware without actually >>>>> registering >>>>> it as a Linux netdev.? (This is partly because it can get used in a >>>>> Xenomai context outside of Linux itself, although I'm not doing >>>>> that >>>>> myself.) This historically works fine. >>>>> >>>>> 2. On certain Linux versions, I encountered an issue where >>>>> disconnecting the network cable and reconnecting it almost always >>>>> results in not being able to send any packets.? (I cannot determine >>>>> if receiving packets works in this case, as the network design will >>>>> not receive packets unless some are sent first.)? Restarting the >>>>> driver (rmmod+modprobe) does recover from this case (until the next >>>>> link loss), but simply replugging the cable never does. >>>>> >>>>> 3. The problem was observed with both I219-V and I219-LM (on >>>>> motherboard), but was *not* observed with 82571EB (PCIE).? The >>>>> problem was not observed with a motherboard igb-based I211.? I >>>>> suspect the issue is limited to motherboard-based e1000e adapters. 
>>>>> (Or perhaps there's something different about how the IGBs are >>>>> internally connected.) >>>>> >>>>> 4. The problem does not occur when the e1000e driver is registered >>>>> "normally" as a Linux netdev. >>>>> >>>>> 5. The problem was introduced by "mei: me: allow runtime pm for >>>>> platform with D0i3" (which has been backported to 4.4+, as far as I >>>>> can >>> tell). >>>>> Excluding this commit reliably resolves the issue and including it >>>>> reliably breaks it. >>>> >>>> The commit hash in the master branch is >>>> cc365dcf0e56271bedf3de95f88922abe248e951 and is there since >>>> v4.16-rc1. >>>> >>>> Strange, that it is in 4.4 and 4.9, as it was only tagged for >>>> v4.13+. >>>> >>>>> 6. Applying the previously suggested patch >>>>> https://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue. >>>>> git/commit/drivers/net/ethernet/intel/e1000e?id=def4ec6dce393e2136b6 >>>>> 2a05712f35a7fa5f5e56 has no effect; the E1000_STATUS_PCIM_STATE bit >>>>> is not set when the issue occurs. >>>>> >>>>> 7. Given the content of the change in #5, I assumed that the >>>>> problem >>>>> was power-management related, perhaps a side effect of the e1000e >>>>> driver not being registered as a netdev.? (So perhaps something >>>>> thinks that no devices are in use and turns something off?) >>>>> >>>>> 8. I've previously posted register dumps from an e1000e in both the >>>>> "normal" and "link up but not transmitting" states.? They seemed >>>>> very similar, but as I'm not familiar with the register meanings I >>>>> may have overlooked something significant.? (Note that the dumps >>>>> were captured inside the watchdog task, when it detects link up but >>>>> before it sets >>>>> E1000_TCTL_EN.) >>>>> >>>>> 9. I enabled debug logging in the mei driver; it logs a couple of >>>>> runtime_idles and then a runtime_suspend during system startup.? 
(I >>>>> added a log to runtime_resume that is missing in the driver source, >>>>> but it appears this does not get called in my scenario.)? Note that >>>>> the e1000e driver is still working ok after this.. at least at >>>>> first. >>>>> >>>>> 10. "cat >>>>> /sys/bus/devices/pci0000:00/0000:00:16.0/power/runtime_status" >>>>> => "suspended" >>>>> ??? "cat >>>>> >>> /sys/bus/devices/pci0000:00/0000:00:16.0/mei/mei0/power/runtime_status" >>>>> => "unsupported" >>>>> ??? "cat >>>>> /sys/bus/devices/pci0000:00/0000:00:1f.0/power/runtime_status" >>>>> => "active" >>>>> ??? "cat >>>>> /sys/bus/devices/pci0000:00/0000:00:1f.6/power/runtime_status" >>>>> => "active" (this is the actual NIC) >>>>> ??? These don't change between the working and non-working states. >>>>> (It's possible that some other device does, but I haven't found it >>>>> yet.) >>>>> >>>>> 11. I did try forcing the above to unsuspend, but this did not >>>>> recover from the e1000e issue. >>>>> >>>>> 12. I also tried calling e1000e_reset on link-down.? This produces >>>>> different register output on link-up, but doesn't recover from the >>>>> issue. >>>>> >>>>> 13. I also tried recompiling the kernel with CONFIG_PM disabled (no >>>>> power management).? This *does* resolve the problem (but is a very >>>>> big hammer). >>>>> >>>>> 14. Possibly also of interest is that if I do *both* #12 and #13, >>>>> the problem remains (suggesting #12 was counter-productive). >>>>> >>>>> FYI the hardware on one of the test machines is as follows: >>>>> ??? 00:00.0 Host bridge: Intel Corporation Device 591f (rev 05) >>>>> ??? 00:01.0 PCI bridge: Intel Corporation Skylake PCIe Controller >>>>> (x16) (rev 05) >>>>> ??? 00:02.0 VGA compatible controller: Intel Corporation Device >>>>> 5912 (rev 04) >>>>> ??? 00:08.0 System peripheral: Intel Corporation Skylake Gaussian >>>>> Mixture Model >>>>> ??? 00:14.0 USB controller: Intel Corporation Sunrise Point-H USB >>>>> 3.0 xHCI Controller (rev 31) >>>>> ??? 
00:14.2 Signal processing controller: Intel Corporation >>>>> Sunrise >>>>> Point-H Thermal subsystem (rev 31) >>>>> ??? 00:15.0 Signal processing controller: Intel Corporation >>>>> Sunrise >>>>> Point-H Serial IO I2C Controller #0 (rev 31) >>>>> ??? 00:15.1 Signal processing controller: Intel Corporation >>>>> Sunrise >>>>> Point-H Serial IO I2C Controller #1 (rev 31) >>>>> ??? 00:16.0 Communication controller: Intel Corporation Sunrise >>>>> Point-H CSME HECI #1 (rev 31) >>>>> ??? 00:17.0 SATA controller: Intel Corporation Sunrise Point-H >>>>> SATA >>>>> controller [AHCI mode] (rev 31) >>>>> ??? 00:1b.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Root >>>>> Port #19 (rev f1) >>>>> ??? 00:1b.3 PCI bridge: Intel Corporation Sunrise Point-H PCI Root >>>>> Port #20 (rev f1) >>>>> ??? 00:1c.0 PCI bridge: Intel Corporation Sunrise Point-H PCI >>>>> Express Root Port #5 (rev f1) >>>>> ??? 00:1d.0 PCI bridge: Intel Corporation Sunrise Point-H PCI >>>>> Express Root Port #11 (rev f1) >>>>> ??? 00:1e.0 Signal processing controller: Intel Corporation >>>>> Sunrise >>>>> Point-H Serial IO UART #0 (rev 31) >>>>> ??? 00:1f.0 ISA bridge: Intel Corporation Sunrise Point-H LPC >>>>> Controller (rev 31) >>>>> ??? 00:1f.2 Memory controller: Intel Corporation Sunrise Point-H >>>>> PMC (rev 31) >>>>> ??? 00:1f.4 SMBus: Intel Corporation Sunrise Point-H SMBus (rev >>>>> 31) >>>>> ??? 00:1f.6 Ethernet controller: Intel Corporation Ethernet >>>>> Connection (2) I219-LM (rev 31) >>>>> ??? 02:00.0 Ethernet controller: Intel Corporation I211 Gigabit >>>>> Network Connection (rev 03) >>>>> ??? 03:00.0 Ethernet controller: Intel Corporation I211 Gigabit >>>>> Network Connection (rev 03) >>>>> ??? 05:00.0 Ethernet controller: Intel Corporation I211 Gigabit >>>>> Network Connection (rev 03) > > (Tomas, your MUA wrapped the lines messing up the formatting.) 
>
>>>>> I'm happy to add any code instrumentation or make any other changes
>>>>> needed to locate and resolve the problem, and I can readily
>>>>> reproduce it -- I'm just at a complete loss as to where to start
>>>>> looking, and am still hoping for some suggestions in that regard.
>>>>>
>>>>> If there's anywhere (or anyone) else better for me to talk to about
>>>>> this issue, please let me know that too.
>>>>
>>>> It is not clear to me, if this is still reproducible on Linux
>>>> 5.3-rc7 (or Linus' master branch).
>>>>
>>>> If it is, this is a definitely regression, and the commits need to
>>>> be reverted due to Linux' no regression policy.
>>>
>>> So I should revert this from 4.4.y and 4.9.y?
>>
>> The issue is not in mei driver, it is in e1000 driver, I my best
>> knowledge there should be fix, please Vitaly can it be backported to
>> older kernels?
>
> Tomas, backporting the commit supposedly fixing this, does *not* help.
> Also, it does not matter for the no regression policy.
>
> Let's wait until Gavin can confirm if it is happening with Linux
> 5.3-rc7.

As noted above (and in a prior email), the problem doesn't occur when
using the driver "normally" within Linux.  The triggering environment is
where the driver init/send/receive code is being executed directly
*without* being registered as a Linux netdev.

It is likely that the "real problem" is some side effect of this, such
as something checking if a child device is in use or powered down but
it's not registered.

My environment is currently based on this tree:

> Using this kernel tree:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/log/?h=v4.9-rt&ofs=3120
>
> I've identified that the code at tag v4.9.126 is "good" and the
> code at tag v4.9.127 is "bad".

(I then narrowed it down to that specific commit.)

To reiterate, there is probably no problem with standard usage of the
drivers as part of Linux.

But in this particular non-standard edge-case usage, there seems to be
some unfortunate interaction between the mei driver power management
change and link-loss in onboard e1000e, and I'm trying to figure out the
cause and hopefully a fix/workaround (or at least one less serious than
disabling power management entirely).

Some more context from my original email:

> I'm using a system with an e1000e network driver which has been patched
> to bypass the regular Linux network stack (because it can get called
> from a Xenomai RT context, among other reasons -- although in my case
> I'm not doing that).  The complete source for the patched version of
> the code can be found here:
>
> https://github.com/ribalda/ethercat/blob/master/devices/e1000e/netdev-4.9-ethercat.c
> (There are some minor changes to other files, but the majority of
> changes are only to this file.  You can see just the changes at
> https://gist.github.com/uecasm/5e36a15bda6ffd53079344fc443dcc5f/revisions
> .)
>
> It was originally based on the in-kernel e1000e driver as of Linux
> 4.9.65.  (I'm not the person who originally made the patches, but I am
> the person who rebased them to kernel 4.9 and I'm the one trying to
> maintain them for newer kernel versions.  Though I'm also not the
> person who made that github repo.)

^ permalink raw reply [flat|nested] 18+ messages in thread
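The runtime_status checks from item 10 of the recap can be automated, which may help with finding "some other device" whose state changes. What follows is only a minimal sketch and not part of the e1000e, mei, or EtherCAT code: it reads power/runtime_status for every PCI device so that a snapshot taken in the working state can be compared with one taken after the link-loss failure. It assumes the standard /sys/bus/pci/devices view of the devices; the function names are hypothetical.

```python
from pathlib import Path


def snapshot_runtime_status(pci_root="/sys/bus/pci/devices"):
    """Map each PCI address to the content of its power/runtime_status file.

    Returns an empty dict if pci_root does not exist. Devices whose status
    file cannot be read are reported as "unreadable".
    """
    root = Path(pci_root)
    if not root.is_dir():
        return {}
    statuses = {}
    for dev in sorted(root.iterdir()):
        status_file = dev / "power" / "runtime_status"
        try:
            statuses[dev.name] = status_file.read_text().strip()
        except OSError:
            statuses[dev.name] = "unreadable"
    return statuses


def diff_snapshots(before, after):
    """Return {address: (old, new)} for every device whose status changed."""
    changed = {}
    for addr in sorted(set(before) | set(after)):
        old = before.get(addr, "absent")
        new = after.get(addr, "absent")
        if old != new:
            changed[addr] = (old, new)
    return changed


if __name__ == "__main__":
    # Print the current state, one device per line, e.g. for saving to a
    # file in the working state and again in the failed state.
    for addr, status in snapshot_runtime_status().items():
        print(addr, status)
```

One way to use it would be to call snapshot_runtime_status() once while transmit still works and once after the replug failure, then pass both results to diff_snapshots(). An empty result would agree with the observation that these values do not change between the two states.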
* [Intel-wired-lan] [e1000e] Linux 4.9: unable to send packets after link recovery with patched driver 2019-09-03 11:00 ` Gavin Lambert @ 2019-09-04 10:06 ` Winkler, Tomas 2019-09-04 11:08 ` Gavin Lambert 0 siblings, 1 reply; 18+ messages in thread From: Winkler, Tomas @ 2019-09-04 10:06 UTC (permalink / raw) To: intel-wired-lan > > On 2019-09-03 21:39, Paul Menzel wrote: > > Dear Tomas, > > > > On 2019-09-03 11:28, Winkler, Tomas wrote: > > > >>> On Tue, Sep 03, 2019 at 10:35:30AM +0200, Paul Menzel wrote: > > > >>>> On 03.09.19 09:56, Gavin Lambert wrote: > >>>>> On 2019-08-20 14:15, I wrote: > >>>>>> Does anyone have any ideas about this?? Either towards further > >>>>>> investigation or to a possible resolution? > >>>>>> > >>>>>> This is at the point of hardware internals now, so I have no idea > >>>>>> how to proceed in either area. > >>>>> > >>>>> To recap (plus some new info): > >>>>> > >>>>> 1. I am using a kernel module which uses the code from the e1000e > >>>>> driver to communicate with the hardware without actually > >>>>> registering it as a Linux netdev.? (This is partly because it can > >>>>> get used in a Xenomai context outside of Linux itself, although > >>>>> I'm not doing that > >>>>> myself.) This historically works fine. > >>>>> > >>>>> 2. On certain Linux versions, I encountered an issue where > >>>>> disconnecting the network cable and reconnecting it almost always > >>>>> results in not being able to send any packets.? (I cannot > >>>>> determine if receiving packets works in this case, as the network > >>>>> design will not receive packets unless some are sent first.) > >>>>> Restarting the driver (rmmod+modprobe) does recover from this case > >>>>> (until the next link loss), but simply replugging the cable never does. > >>>>> > >>>>> 3. The problem was observed with both I219-V and I219-LM (on > >>>>> motherboard), but was *not* observed with 82571EB (PCIE).? The > >>>>> problem was not observed with a motherboard igb-based I211.? 
I > >>>>> suspect the issue is limited to motherboard-based e1000e adapters. > >>>>> (Or perhaps there's something different about how the IGBs are > >>>>> internally connected.) > >>>>> > >>>>> 4. The problem does not occur when the e1000e driver is registered > >>>>> "normally" as a Linux netdev. > >>>>> > >>>>> 5. The problem was introduced by "mei: me: allow runtime pm for > >>>>> platform with D0i3" (which has been backported to 4.4+, as far as > >>>>> I can > >>> tell). > >>>>> Excluding this commit reliably resolves the issue and including it > >>>>> reliably breaks it. > >>>> > >>>> The commit hash in the master branch is > >>>> cc365dcf0e56271bedf3de95f88922abe248e951 and is there since > >>>> v4.16-rc1. > >>>> > >>>> Strange, that it is in 4.4 and 4.9, as it was only tagged for > >>>> v4.13+. > >>>> > >>>>> 6. Applying the previously suggested patch > >>>>> https://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue. > >>>>> > git/commit/drivers/net/ethernet/intel/e1000e?id=def4ec6dce393e2136 > >>>>> b6 > >>>>> 2a05712f35a7fa5f5e56 has no effect; the E1000_STATUS_PCIM_STATE > >>>>> bit is not set when the issue occurs. > >>>>> > >>>>> 7. Given the content of the change in #5, I assumed that the > >>>>> problem was power-management related, perhaps a side effect of the > >>>>> e1000e driver not being registered as a netdev.? (So perhaps > >>>>> something thinks that no devices are in use and turns something > >>>>> off?) > >>>>> > >>>>> 8. I've previously posted register dumps from an e1000e in both > >>>>> the "normal" and "link up but not transmitting" states.? They > >>>>> seemed very similar, but as I'm not familiar with the register > >>>>> meanings I may have overlooked something significant.? (Note that > >>>>> the dumps were captured inside the watchdog task, when it detects > >>>>> link up but before it sets > >>>>> E1000_TCTL_EN.) > >>>>> > >>>>> 9. 
I enabled debug logging in the mei driver; it logs a couple of > >>>>> runtime_idles and then a runtime_suspend during system startup. > >>>>> (I added a log to runtime_resume that is missing in the driver > >>>>> source, but it appears this does not get called in my scenario.) > >>>>> Note that the e1000e driver is still working ok after this.. at > >>>>> least at first. > >>>>> > >>>>> 10. "cat > >>>>> /sys/bus/devices/pci0000:00/0000:00:16.0/power/runtime_status" > >>>>> => "suspended" > >>>>> ??? "cat > >>>>> > >>> > /sys/bus/devices/pci0000:00/0000:00:16.0/mei/mei0/power/runtime_status" > >>>>> => "unsupported" > >>>>> ??? "cat > >>>>> /sys/bus/devices/pci0000:00/0000:00:1f.0/power/runtime_status" > >>>>> => "active" > >>>>> ??? "cat > >>>>> /sys/bus/devices/pci0000:00/0000:00:1f.6/power/runtime_status" > >>>>> => "active" (this is the actual NIC) > >>>>> ??? These don't change between the working and non-working states. > >>>>> (It's possible that some other device does, but I haven't found it > >>>>> yet.) > >>>>> > >>>>> 11. I did try forcing the above to unsuspend, but this did not > >>>>> recover from the e1000e issue. > >>>>> > >>>>> 12. I also tried calling e1000e_reset on link-down.? This produces > >>>>> different register output on link-up, but doesn't recover from the > >>>>> issue. > >>>>> > >>>>> 13. I also tried recompiling the kernel with CONFIG_PM disabled > >>>>> (no power management).? This *does* resolve the problem (but is a > >>>>> very big hammer). > >>>>> > >>>>> 14. Possibly also of interest is that if I do *both* #12 and #13, > >>>>> the problem remains (suggesting #12 was counter-productive). > >>>>> > >>>>> FYI the hardware on one of the test machines is as follows: > >>>>> ??? 00:00.0 Host bridge: Intel Corporation Device 591f (rev 05) > >>>>> ??? 00:01.0 PCI bridge: Intel Corporation Skylake PCIe Controller > >>>>> (x16) (rev 05) > >>>>> ??? 
00:02.0 VGA compatible controller: Intel Corporation Device > >>>>> 5912 (rev 04) > >>>>> ??? 00:08.0 System peripheral: Intel Corporation Skylake Gaussian > >>>>> Mixture Model > >>>>> ??? 00:14.0 USB controller: Intel Corporation Sunrise Point-H USB > >>>>> 3.0 xHCI Controller (rev 31) > >>>>> ??? 00:14.2 Signal processing controller: Intel Corporation > >>>>> Sunrise Point-H Thermal subsystem (rev 31) > >>>>> ??? 00:15.0 Signal processing controller: Intel Corporation > >>>>> Sunrise Point-H Serial IO I2C Controller #0 (rev 31) > >>>>> ??? 00:15.1 Signal processing controller: Intel Corporation > >>>>> Sunrise Point-H Serial IO I2C Controller #1 (rev 31) > >>>>> ??? 00:16.0 Communication controller: Intel Corporation Sunrise > >>>>> Point-H CSME HECI #1 (rev 31) > >>>>> ??? 00:17.0 SATA controller: Intel Corporation Sunrise Point-H > >>>>> SATA controller [AHCI mode] (rev 31) > >>>>> ??? 00:1b.0 PCI bridge: Intel Corporation Sunrise Point-H PCI > >>>>> Root Port #19 (rev f1) > >>>>> ??? 00:1b.3 PCI bridge: Intel Corporation Sunrise Point-H PCI > >>>>> Root Port #20 (rev f1) > >>>>> ??? 00:1c.0 PCI bridge: Intel Corporation Sunrise Point-H PCI > >>>>> Express Root Port #5 (rev f1) > >>>>> ??? 00:1d.0 PCI bridge: Intel Corporation Sunrise Point-H PCI > >>>>> Express Root Port #11 (rev f1) > >>>>> ??? 00:1e.0 Signal processing controller: Intel Corporation > >>>>> Sunrise Point-H Serial IO UART #0 (rev 31) > >>>>> ??? 00:1f.0 ISA bridge: Intel Corporation Sunrise Point-H LPC > >>>>> Controller (rev 31) > >>>>> ??? 00:1f.2 Memory controller: Intel Corporation Sunrise Point-H > >>>>> PMC (rev 31) > >>>>> ??? 00:1f.4 SMBus: Intel Corporation Sunrise Point-H SMBus (rev > >>>>> 31) > >>>>> ??? 00:1f.6 Ethernet controller: Intel Corporation Ethernet > >>>>> Connection (2) I219-LM (rev 31) > >>>>> ??? 02:00.0 Ethernet controller: Intel Corporation I211 Gigabit > >>>>> Network Connection (rev 03) > >>>>> ??? 
03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
> >>>>>     05:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
> >
> > (Tomas, your MUA wrapped the lines messing up the formatting.)

Sorry, it's Outlook.

> >
> >>>>> I'm happy to add any code instrumentation or make any other
> >>>>> changes needed to locate and resolve the problem, and I can
> >>>>> readily reproduce it -- I'm just at a complete loss as to where
> >>>>> to start looking, and am still hoping for some suggestions in
> >>>>> that regard.
> >>>>>
> >>>>> If there's anywhere (or anyone) else better for me to talk to
> >>>>> about this issue, please let me know that too.
> >>>>
> >>>> It is not clear to me, if this is still reproducible on Linux
> >>>> 5.3-rc7 (or Linus' master branch).
> >>>>
> >>>> If it is, this is a definitely regression, and the commits need to
> >>>> be reverted due to Linux' no regression policy.
> >>>
> >>> So I should revert this from 4.4.y and 4.9.y?
> >>
> >> The issue is not in mei driver, it is in e1000 driver, I my best
> >> knowledge there should be fix, please Vitaly can it be backported to
> >> older kernels?
> >
> > Tomas, backporting the commit supposedly fixing this, does *not* help.

I hope that Vitaly can address that.

> > Also, it does not matter for the no regression policy.

There are power consumption implications if you revert this commit for
everyone, while the issue is present only on some platforms.
You can still disable runtime power management via sysfs, and
permanently using a udev rule on your particular system,
e.g. ATTR{../../power/control}="on"

> >
> > Let's wait until Gavin can confirm if it is happening with Linux
> > 5.3-rc7.
>
> As noted above (and in a prior email), the problem doesn't occur when using
> the driver "normally" within Linux.  The triggering environment is where the
> driver init/send/receive code is being executed directly
> *without* being registered as a Linux netdev.
>
> It is likely that the "real problem" is some side effect of this, such as
> something checking if a child device is in use or powered down but it's not
> registered.
>
> My environment is currently based on this tree:
>
> > Using this kernel tree:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/log/?h=v4.9-rt&ofs=3120
> >
> > I've identified that the code at tag v4.9.126 is "good" and the code
> > at tag v4.9.127 is "bad".
> (I then narrowed it down to that specific commit.)
>
> To reiterate, there is probably no problem with standard usage of the
> drivers as part of Linux.
>
> But in this particular non-standard-edge-case-usage, there seems to be some
> unfortunate interaction between the mei driver power management change
> and link-loss in onboard e1000e, and I'm trying to figure out the cause and
> hopefully a fix/workaround (or at least one less serious than disabling power
> management entirely).

This is some underlying issue; I don't think you will be able to resolve
it yourself. The e1000 guys should provide the fix.
Unfortunately, I cannot really fix this issue from the mei side.

>
> Some more context from my original email:
> > I'm using a system with an e1000e network driver which has been
> > patched to bypass the regular Linux network stack (because it can get
> > called from a Xenomai RT context, among other reasons -- although in
> > my case I'm not doing that).  The complete source for the patched
> > version of the code can be found here:
> >
> > https://github.com/ribalda/ethercat/blob/master/devices/e1000e/netdev-4.9-ethercat.c
> > (There are some minor changes to other files, but the
> > majority of changes are only to this file.  You can see just the
> > changes at
> > https://gist.github.com/uecasm/5e36a15bda6ffd53079344fc443dcc5f/revisions
> > .)
> >
> > It was originally based on the in-kernel e1000e driver as of Linux
> > 4.9.65.  (I'm not the person who originally made the patches, but I am
> > the person who rebased them to kernel 4.9 and I'm the one trying to
> > maintain them for newer kernel versions.  Though I'm also not the
> > person who made that github repo.)

Once it is resolved, you will eventually need to incorporate the e1000
fix into your code base as well.
For now, the easiest workaround is to disable power management on mei
from outside on affected platforms.

Tomas

^ permalink raw reply [flat|nested] 18+ messages in thread
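The sysfs side of the workaround suggested above can be sketched as follows. This is a minimal illustration under stated assumptions: that the mei device is the HECI controller at 0000:00:16.0 from the lspci listing (not confirmed in this thread), and that its power/control attribute is reachable under /sys/bus/pci/devices. The function name is hypothetical. Writing "on" asks the kernel to keep the device out of runtime suspend, and "auto" hands control back to runtime PM. Whether doing this before a link loss avoids the transmit failure is not established here; item 11 of the recap reports that un-suspending after the failure did not recover it.

```python
from pathlib import Path


def set_power_control(pci_addr, mode="on", pci_root="/sys/bus/pci/devices"):
    """Write `mode` to <pci_root>/<pci_addr>/power/control.

    mode must be "on" (keep the device runtime-active) or "auto" (allow
    runtime PM). Returns the previous value so the caller can restore it.
    Writing to the real sysfs tree requires root privileges.
    """
    if mode not in ("on", "auto"):
        raise ValueError("mode must be 'on' or 'auto', got %r" % (mode,))
    control = Path(pci_root) / pci_addr / "power" / "control"
    previous = control.read_text().strip()
    if previous != mode:
        control.write_text(mode + "\n")
    return previous
```

For example, set_power_control("0000:00:16.0") run as root shortly after boot would cover the pre-emptive case; the returned value ("auto" on a default system) can be passed back in later to undo the change.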
* [Intel-wired-lan] [e1000e] Linux 4.9: unable to send packets after link recovery with patched driver 2019-09-04 10:06 ` Winkler, Tomas @ 2019-09-04 11:08 ` Gavin Lambert 2019-09-04 12:31 ` Lifshits, Vitaly 2019-09-05 3:59 ` Gavin Lambert 0 siblings, 2 replies; 18+ messages in thread From: Gavin Lambert @ 2019-09-04 11:08 UTC (permalink / raw) To: intel-wired-lan On 2019-09-04 22:06, Winkler, Tomas wrote: >> >> On 2019-09-03 21:39, Paul Menzel wrote: >> > Dear Tomas, >> > >> > On 2019-09-03 11:28, Winkler, Tomas wrote: >> > >> >>> On Tue, Sep 03, 2019 at 10:35:30AM +0200, Paul Menzel wrote: >> > >> >>>> On 03.09.19 09:56, Gavin Lambert wrote: >> >>>>> On 2019-08-20 14:15, I wrote: >> >>>>>> Does anyone have any ideas about this?? Either towards further >> >>>>>> investigation or to a possible resolution? >> >>>>>> >> >>>>>> This is at the point of hardware internals now, so I have no idea >> >>>>>> how to proceed in either area. >> >>>>> >> >>>>> To recap (plus some new info): >> >>>>> >> >>>>> 1. I am using a kernel module which uses the code from the e1000e >> >>>>> driver to communicate with the hardware without actually >> >>>>> registering it as a Linux netdev.? (This is partly because it can >> >>>>> get used in a Xenomai context outside of Linux itself, although >> >>>>> I'm not doing that >> >>>>> myself.) This historically works fine. >> >>>>> >> >>>>> 2. On certain Linux versions, I encountered an issue where >> >>>>> disconnecting the network cable and reconnecting it almost always >> >>>>> results in not being able to send any packets.? (I cannot >> >>>>> determine if receiving packets works in this case, as the network >> >>>>> design will not receive packets unless some are sent first.) >> >>>>> Restarting the driver (rmmod+modprobe) does recover from this case >> >>>>> (until the next link loss), but simply replugging the cable never does. >> >>>>> >> >>>>> 3. 
The problem was observed with both I219-V and I219-LM (on >> >>>>> motherboard), but was *not* observed with 82571EB (PCIE).? The >> >>>>> problem was not observed with a motherboard igb-based I211.? I >> >>>>> suspect the issue is limited to motherboard-based e1000e adapters. >> >>>>> (Or perhaps there's something different about how the IGBs are >> >>>>> internally connected.) >> >>>>> >> >>>>> 4. The problem does not occur when the e1000e driver is registered >> >>>>> "normally" as a Linux netdev. >> >>>>> >> >>>>> 5. The problem was introduced by "mei: me: allow runtime pm for >> >>>>> platform with D0i3" (which has been backported to 4.4+, as far as >> >>>>> I can tell). >> >>>>> Excluding this commit reliably resolves the issue and including it >> >>>>> reliably breaks it. >> >>>> >> >>>> The commit hash in the master branch is >> >>>> cc365dcf0e56271bedf3de95f88922abe248e951 and is there since >> >>>> v4.16-rc1. >> >>>> >> >>>> Strange, that it is in 4.4 and 4.9, as it was only tagged for >> >>>> v4.13+. >> >>>> >> >>>>> 6. Applying the previously suggested patch >> >>>>> https://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue.git/commit/drivers/net/ethernet/intel/e1000e?id=def4ec6dce393e2136b62a05712f35a7fa5f5e56 >> >>>>> has no effect; the E1000_STATUS_PCIM_STATE >> >>>>> bit is not set when the issue occurs. >> >>>>> >> >>>>> 7. Given the content of the change in #5, I assumed that the >> >>>>> problem was power-management related, perhaps a side effect of the >> >>>>> e1000e driver not being registered as a netdev.? (So perhaps >> >>>>> something thinks that no devices are in use and turns something >> >>>>> off?) >> >>>>> >> >>>>> 8. I've previously posted register dumps from an e1000e in both >> >>>>> the "normal" and "link up but not transmitting" states.? They >> >>>>> seemed very similar, but as I'm not familiar with the register >> >>>>> meanings I may have overlooked something significant.? 
>> >>>>> (Note that the dumps were captured inside the watchdog task, when it
>> >>>>> detects link up but before it sets E1000_TCTL_EN.)
>> >>>>>
>> >>>>> 9. I enabled debug logging in the mei driver; it logs a couple of
>> >>>>> runtime_idles and then a runtime_suspend during system startup.
>> >>>>> (I added a log to runtime_resume that is missing in the driver
>> >>>>> source, but it appears this does not get called in my scenario.)
>> >>>>> Note that the e1000e driver is still working OK after this, at
>> >>>>> least at first.
>> >>>>>
>> >>>>> 10. "cat /sys/devices/pci0000:00/0000:00:16.0/power/runtime_status"
>> >>>>>     => "suspended"
>> >>>>>     "cat /sys/devices/pci0000:00/0000:00:16.0/mei/mei0/power/runtime_status"
>> >>>>>     => "unsupported"
>> >>>>>     "cat /sys/devices/pci0000:00/0000:00:1f.0/power/runtime_status"
>> >>>>>     => "active"
>> >>>>>     "cat /sys/devices/pci0000:00/0000:00:1f.6/power/runtime_status"
>> >>>>>     => "active" (this is the actual NIC)
>> >>>>>     These don't change between the working and non-working states.
>> >>>>>     (It's possible that some other device does, but I haven't found
>> >>>>>     it yet.)
>> >>>>>
>> >>>>> 11. I did try forcing the above to unsuspend, but this did not
>> >>>>> recover from the e1000e issue.
>> >>>>>
>> >>>>> 12. I also tried calling e1000e_reset on link-down. This produces
>> >>>>> different register output on link-up, but doesn't recover from the
>> >>>>> issue.
>> >>>>>
>> >>>>> 13. I also tried recompiling the kernel with CONFIG_PM disabled
>> >>>>> (no power management). This *does* resolve the problem (but is a
>> >>>>> very big hammer).
>> >>>>>
>> >>>>> 14. Possibly also of interest is that if I do *both* #12 and #13,
>> >>>>> the problem remains (suggesting #12 was counter-productive).
>> >>>>>
>> >>>>> FYI the hardware on one of the test machines is as follows:
>> >>>>>     00:00.0 Host bridge: Intel Corporation Device 591f (rev 05)
>> >>>>>     00:01.0 PCI bridge: Intel Corporation Skylake PCIe Controller (x16) (rev 05)
>> >>>>>     00:02.0 VGA compatible controller: Intel Corporation Device 5912 (rev 04)
>> >>>>>     00:08.0 System peripheral: Intel Corporation Skylake Gaussian Mixture Model
>> >>>>>     00:14.0 USB controller: Intel Corporation Sunrise Point-H USB 3.0 xHCI Controller (rev 31)
>> >>>>>     00:14.2 Signal processing controller: Intel Corporation Sunrise Point-H Thermal subsystem (rev 31)
>> >>>>>     00:15.0 Signal processing controller: Intel Corporation Sunrise Point-H Serial IO I2C Controller #0 (rev 31)
>> >>>>>     00:15.1 Signal processing controller: Intel Corporation Sunrise Point-H Serial IO I2C Controller #1 (rev 31)
>> >>>>>     00:16.0 Communication controller: Intel Corporation Sunrise Point-H CSME HECI #1 (rev 31)
>> >>>>>     00:17.0 SATA controller: Intel Corporation Sunrise Point-H SATA controller [AHCI mode] (rev 31)
>> >>>>>     00:1b.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Root Port #19 (rev f1)
>> >>>>>     00:1b.3 PCI bridge: Intel Corporation Sunrise Point-H PCI Root Port #20 (rev f1)
>> >>>>>     00:1c.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #5 (rev f1)
>> >>>>>     00:1d.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #11 (rev f1)
>> >>>>>     00:1e.0 Signal processing controller: Intel Corporation Sunrise Point-H Serial IO UART #0 (rev 31)
>> >>>>>     00:1f.0 ISA bridge: Intel Corporation Sunrise Point-H LPC Controller (rev 31)
>> >>>>>     00:1f.2 Memory controller: Intel Corporation Sunrise Point-H PMC (rev 31)
>> >>>>>     00:1f.4 SMBus: Intel Corporation Sunrise Point-H SMBus (rev 31)
>> >>>>>     00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-LM (rev 31)
>> >>>>>     02:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
>> >>>>>     03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
>> >>>>>     05:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
>> >
>> > (Tomas, your MUA wrapped the lines, messing up the formatting.)
>
> Sorry, it's Outlook.
>
>> >
>> >>>>> I'm happy to add any code instrumentation or make any other
>> >>>>> changes needed to locate and resolve the problem, and I can
>> >>>>> readily reproduce it -- I'm just at a complete loss as to where
>> >>>>> to start looking, and am still hoping for some suggestions in
>> >>>>> that regard.
>> >>>>>
>> >>>>> If there's anywhere (or anyone) else better for me to talk to
>> >>>>> about this issue, please let me know that too.
>> >>>>
>> >>>> It is not clear to me if this is still reproducible on Linux
>> >>>> 5.3-rc7 (or Linus' master branch).
>> >>>>
>> >>>> If it is, this is definitely a regression, and the commits need to
>> >>>> be reverted due to Linux' no regression policy.
>> >>>
>> >>> So I should revert this from 4.4.y and 4.9.y?
>> >>
>> >> The issue is not in the mei driver, it is in the e1000 driver. To my
>> >> best knowledge there should be a fix; Vitaly, can it be backported to
>> >> older kernels?
>> >
>> > Tomas, backporting the commit supposedly fixing this does *not* help.
>
> I hope that Vitaly can address that.
>
>> > Also, it does not matter for the no regression policy.
>
> There are power consumption implications if you revert this commit for
> everyone, while the issue is present only on some platforms.

I wouldn't suggest reverting that change, at least not solely on my
account (unless it's affecting more people).
It's not only me using this code but it's still a very niche case, and
outside of "normal" Linux usage.

Although it seems a little odd that it ended up in 4.4 and 4.9 when the
commit said it was intended for 4.13+. But I don't know how those things
work.

(Though in a way this was good for me -- it would have been a lot harder
to run into this issue when switching from 4.9 to 4.19 [which would have
been the next step] rather than from 4.9.110 to 4.9.168 [which is what
actually happened].)

> You can still disable runtime power management via sysfs and
> permanently using a udev rule on your particular system.
> e.g. ATTR{../../power/control}="on"

I'll do some more testing on this tomorrow, but I do recall trying
setting power/control to "on" (via sysfs) for the device:

    00:16.0 Communication controller: Intel Corporation Sunrise Point-H CSME HECI #1 (rev 31)

which was the one that I noticed was suspended. Is this the mei device?

In any case when I tried it before it didn't seem to help, but I think
this was after link-down and things had already failed. I'll try testing
a few more cases, including doing it pre-emptively.

>> > Let's wait until Gavin can confirm if it is happening with Linux
>> > 5.3-rc7.
>>
>> As noted above (and in a prior email), the problem doesn't occur when
>> using the driver "normally" within Linux. The triggering environment is
>> where the driver init/send/receive code is being executed directly
>> *without* being registered as a Linux netdev.
>>
>> It is likely that the "real problem" is some side effect of this, such
>> as something checking if a child device is in use or powered down but
>> it's not registered.
>>
>> My environment is currently based on this tree:
>>
>> > Using this kernel tree:
>> >
>> > https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/log/?h=v4.9-rt&ofs=3120
>> >
>> > I've identified that the code at tag v4.9.126 is "good" and the code
>> > at tag v4.9.127 is "bad".
>> (I then narrowed it down to that specific commit.)
>>
>> To reiterate, there is probably no problem with standard usage of the
>> drivers as part of Linux.
>>
>> But in this particular non-standard-edge-case-usage, there seems to be
>> some unfortunate interaction between the mei driver power management
>> change and link-loss in onboard e1000e, and I'm trying to figure out the
>> cause and hopefully a fix/workaround (or at least one less serious than
>> disabling power management entirely).
> This is some underlying issue. I don't think you will be able to
> resolve it yourself; the e1000 guys should provide the fix.
> Unfortunately I cannot really fix this issue from the mei side.
>
>>
>> Some more context from my original email:
>> > I'm using a system with an e1000e network driver which has been
>> > patched to bypass the regular Linux network stack (because it can get
>> > called from a Xenomai RT context, among other reasons -- although in
>> > my case I'm not doing that). The complete source for the patched
>> > version of the code can be found here:
>> >
>> > https://github.com/ribalda/ethercat/blob/master/devices/e1000e/netdev-4.9-ethercat.c
>> > (There are some minor changes to other files, but the
>> > majority of changes are only to this file. You can see just the
>> > changes at
>> > https://gist.github.com/uecasm/5e36a15bda6ffd53079344fc443dcc5f/revisions .)
>> >
>> > It was originally based on the in-kernel e1000e driver as of Linux
>> > 4.9.65. (I'm not the person who originally made the patches, but I am
>> > the person who rebased them to kernel 4.9 and I'm the one trying to
>> > maintain them for newer kernel versions. Though I'm also not the
>> > person who made that github repo.)
>
> You will eventually need to incorporate the e1000 fix, once resolved,
> into your code base as well.
> For now the easiest workaround is to disable power management on mei
> from outside on affected platforms.

Yeah, I'm hoping that the eventual solution will be a code change to the
e1000e driver. The way the distribution is structured it's very easy to
apply a fix there and much much harder to apply one at any other point.
Though userspace rule changes are also feasible.
* [Intel-wired-lan] [e1000e] Linux 4.9: unable to send packets after link recovery with patched driver
  2019-09-04 11:08 ` Gavin Lambert
@ 2019-09-04 12:31 ` Lifshits, Vitaly
  2019-09-05  3:59 ` Gavin Lambert
  1 sibling, 0 replies; 18+ messages in thread
From: Lifshits, Vitaly @ 2019-09-04 12:31 UTC
To: intel-wired-lan

On 9/4/2019 14:08, Gavin Lambert wrote:
> On 2019-09-04 22:06, Winkler, Tomas wrote:
>>>
>>> On 2019-09-03 21:39, Paul Menzel wrote:
>>> > Dear Tomas,
>>> >
>>> > On 2019-09-03 11:28, Winkler, Tomas wrote:
>>> >
>>> >>> On Tue, Sep 03, 2019 at 10:35:30AM +0200, Paul Menzel wrote:
>>> >
>>> >>>> On 03.09.19 09:56, Gavin Lambert wrote:
>>> >>>>> On 2019-08-20 14:15, I wrote:
>>> >>>>>> Does anyone have any ideas about this? Either towards further
>>> >>>>>> investigation or to a possible resolution?
>>> >>>>>>
>>> >>>>>> This is at the point of hardware internals now, so I have no
>>> >>>>>> idea how to proceed in either area.
>>> >>>>>
>>> >>>>> To recap (plus some new info):
>>> >>>>>
>>> >>>>> 1. I am using a kernel module which uses the code from the e1000e
>>> >>>>> driver to communicate with the hardware without actually
>>> >>>>> registering it as a Linux netdev. (This is partly because it can
>>> >>>>> get used in a Xenomai context outside of Linux itself, although
>>> >>>>> I'm not doing that myself.) This historically works fine.
>>> >>>>>
>>> >>>>> 2. On certain Linux versions, I encountered an issue where
>>> >>>>> disconnecting the network cable and reconnecting it almost always
>>> >>>>> results in not being able to send any packets. (I cannot
>>> >>>>> determine if receiving packets works in this case, as the network
>>> >>>>> design will not receive packets unless some are sent first.)
>>> >>>>> Restarting the driver (rmmod+modprobe) does recover from this
>>> >>>>> case (until the next link loss), but simply replugging the cable
>>> >>>>> never does.
>>> >>>>>
>>> >>>>> 3. The problem was observed with both I219-V and I219-LM (on
>>> >>>>> motherboard), but was *not* observed with 82571EB (PCIE). The
>>> >>>>> problem was not observed with a motherboard igb-based I211. I
>>> >>>>> suspect the issue is limited to motherboard-based e1000e
>>> >>>>> adapters. (Or perhaps there's something different about how the
>>> >>>>> IGBs are internally connected.)
>>> >>>>>
>>> >>>>> 4. The problem does not occur when the e1000e driver is
>>> >>>>> registered "normally" as a Linux netdev.
>>> >>>>>
>>> >>>>> 5. The problem was introduced by "mei: me: allow runtime pm for
>>> >>>>> platform with D0i3" (which has been backported to 4.4+, as far as
>>> >>>>> I can tell).
>>> >>>>> Excluding this commit reliably resolves the issue and including
>>> >>>>> it reliably breaks it.
>>> >>>>
>>> >>>> The commit hash in the master branch is
>>> >>>> cc365dcf0e56271bedf3de95f88922abe248e951 and is there since
>>> >>>> v4.16-rc1.
>>> >>>>
>>> >>>> Strange that it is in 4.4 and 4.9, as it was only tagged for
>>> >>>> v4.13+.
>>> >>>>
>>> >>>>> 6. Applying the previously suggested patch
>>> >>>>> https://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue.git/commit/drivers/net/ethernet/intel/e1000e?id=def4ec6dce393e2136b62a05712f35a7fa5f5e56
>>> >>>>> has no effect; the E1000_STATUS_PCIM_STATE bit is not set when
>>> >>>>> the issue occurs.
>>> >>>>>
>>> >>>>> 7. Given the content of the change in #5, I assumed that the
>>> >>>>> problem was power-management related, perhaps a side effect of
>>> >>>>> the e1000e driver not being registered as a netdev. (So perhaps
>>> >>>>> something thinks that no devices are in use and turns something
>>> >>>>> off?)
>>> >>>>>
>>> >>>>> 8. I've previously posted register dumps from an e1000e in both
>>> >>>>> the "normal" and "link up but not transmitting" states. They
>>> >>>>> seemed very similar, but as I'm not familiar with the register
>>> >>>>> meanings I may have overlooked something significant. (Note that
>>> >>>>> the dumps were captured inside the watchdog task, when it detects
>>> >>>>> link up but before it sets E1000_TCTL_EN.)
>>> >>>>>
>>> >>>>> 9. I enabled debug logging in the mei driver; it logs a couple of
>>> >>>>> runtime_idles and then a runtime_suspend during system startup.
>>> >>>>> (I added a log to runtime_resume that is missing in the driver
>>> >>>>> source, but it appears this does not get called in my scenario.)
>>> >>>>> Note that the e1000e driver is still working OK after this, at
>>> >>>>> least at first.
>>> >>>>>
>>> >>>>> 10. "cat /sys/devices/pci0000:00/0000:00:16.0/power/runtime_status"
>>> >>>>>     => "suspended"
>>> >>>>>     "cat /sys/devices/pci0000:00/0000:00:16.0/mei/mei0/power/runtime_status"
>>> >>>>>     => "unsupported"
>>> >>>>>     "cat /sys/devices/pci0000:00/0000:00:1f.0/power/runtime_status"
>>> >>>>>     => "active"
>>> >>>>>     "cat /sys/devices/pci0000:00/0000:00:1f.6/power/runtime_status"
>>> >>>>>     => "active" (this is the actual NIC)
>>> >>>>>     These don't change between the working and non-working states.
>>> >>>>>     (It's possible that some other device does, but I haven't
>>> >>>>>     found it yet.)
>>> >>>>>
>>> >>>>> 11. I did try forcing the above to unsuspend, but this did not
>>> >>>>> recover from the e1000e issue.
>>> >>>>>
>>> >>>>> 12. I also tried calling e1000e_reset on link-down. This produces
>>> >>>>> different register output on link-up, but doesn't recover from
>>> >>>>> the issue.
>>> >>>>>
>>> >>>>> 13. I also tried recompiling the kernel with CONFIG_PM disabled
>>> >>>>> (no power management). This *does* resolve the problem (but is a
>>> >>>>> very big hammer).
>>> >>>>>
>>> >>>>> 14. Possibly also of interest is that if I do *both* #12 and #13,
>>> >>>>> the problem remains (suggesting #12 was counter-productive).
>>> >>>>>
>>> >>>>> FYI the hardware on one of the test machines is as follows:
>>> >>>>>     00:00.0 Host bridge: Intel Corporation Device 591f (rev 05)
>>> >>>>>     00:01.0 PCI bridge: Intel Corporation Skylake PCIe Controller (x16) (rev 05)
>>> >>>>>     00:02.0 VGA compatible controller: Intel Corporation Device 5912 (rev 04)
>>> >>>>>     00:08.0 System peripheral: Intel Corporation Skylake Gaussian Mixture Model
>>> >>>>>     00:14.0 USB controller: Intel Corporation Sunrise Point-H USB 3.0 xHCI Controller (rev 31)
>>> >>>>>     00:14.2 Signal processing controller: Intel Corporation Sunrise Point-H Thermal subsystem (rev 31)
>>> >>>>>     00:15.0 Signal processing controller: Intel Corporation Sunrise Point-H Serial IO I2C Controller #0 (rev 31)
>>> >>>>>     00:15.1 Signal processing controller: Intel Corporation Sunrise Point-H Serial IO I2C Controller #1 (rev 31)
>>> >>>>>     00:16.0 Communication controller: Intel Corporation Sunrise Point-H CSME HECI #1 (rev 31)
>>> >>>>>     00:17.0 SATA controller: Intel Corporation Sunrise Point-H SATA controller [AHCI mode] (rev 31)
>>> >>>>>     00:1b.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Root Port #19 (rev f1)
>>> >>>>>     00:1b.3 PCI bridge: Intel Corporation Sunrise Point-H PCI Root Port #20 (rev f1)
>>> >>>>>     00:1c.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #5 (rev f1)
>>> >>>>>     00:1d.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #11 (rev f1)
>>> >>>>>     00:1e.0 Signal processing controller: Intel Corporation Sunrise Point-H Serial IO UART #0 (rev 31)
>>> >>>>>     00:1f.0 ISA bridge: Intel Corporation Sunrise Point-H LPC Controller (rev 31)
>>> >>>>>     00:1f.2 Memory controller: Intel Corporation Sunrise Point-H PMC (rev 31)
>>> >>>>>     00:1f.4 SMBus: Intel Corporation Sunrise Point-H SMBus (rev 31)
>>> >>>>>     00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-LM (rev 31)
>>> >>>>>     02:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
>>> >>>>>     03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
>>> >>>>>     05:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
>>> >
>>> > (Tomas, your MUA wrapped the lines, messing up the formatting.)
>>
>> Sorry, it's Outlook.
>>
>>> >
>>> >>>>> I'm happy to add any code instrumentation or make any other
>>> >>>>> changes needed to locate and resolve the problem, and I can
>>> >>>>> readily reproduce it -- I'm just at a complete loss as to where
>>> >>>>> to start looking, and am still hoping for some suggestions in
>>> >>>>> that regard.
>>> >>>>>
>>> >>>>> If there's anywhere (or anyone) else better for me to talk to
>>> >>>>> about this issue, please let me know that too.
>>> >>>>
>>> >>>> It is not clear to me if this is still reproducible on Linux
>>> >>>> 5.3-rc7 (or Linus' master branch).
>>> >>>>
>>> >>>> If it is, this is definitely a regression, and the commits need to
>>> >>>> be reverted due to Linux' no regression policy.
>>> >>>
>>> >>> So I should revert this from 4.4.y and 4.9.y?
>>> >>
>>> >> The issue is not in the mei driver, it is in the e1000 driver. To my
>>> >> best knowledge there should be a fix; Vitaly, can it be backported to
>>> >> older kernels?
>>> >
>>> > Tomas, backporting the commit supposedly fixing this does *not* help.
>>
>> I hope that Vitaly can address that.

As far as I can see, it's not the same issue we had in the upstream
driver when the mei commit was added. Backporting this commit is not
possible and will not help.

>>
>>> > Also, it does not matter for the no regression policy.
>>
>> There are power consumption implications if you revert this commit for
>> everyone, while the issue is present only on some platforms.
>
> I wouldn't suggest reverting that change, at least not solely on my
> account (unless it's affecting more people). It's not only me using
> this code but it's still a very niche case, and outside of "normal"
> Linux usage.
>
> Although it seems a little odd that it ended up in 4.4 and 4.9 when
> the commit said it was intended for 4.13+. But I don't know how those
> things work.
>
> (Though in a way this was good for me -- it would have been a lot
> harder to run into this issue when switching from 4.9 to 4.19 [which
> would have been the next step] rather than from 4.9.110 to 4.9.168
> [which is what actually happened].)
>
>> You can still disable runtime power management via sysfs and
>> permanently using a udev rule on your particular system.
>> e.g. ATTR{../../power/control}="on"
>
> I'll do some more testing on this tomorrow, but I do recall trying
> setting power/control to "on" (via sysfs) for the device:
>
>     00:16.0 Communication controller: Intel Corporation Sunrise Point-H CSME HECI #1 (rev 31)
>
> which was the one that I noticed was suspended. Is this the mei device?
>
> In any case when I tried it before it didn't seem to help, but I think
> this was after link-down and things had already failed. I'll try
> testing a few more cases, including doing it pre-emptively.

I suggest testing this on kernel 5.3-rc7 as Paul advised, as the bug
wasn't reproduced on the kernel.

>
>>> > Let's wait until Gavin can confirm if it is happening with Linux
>>> > 5.3-rc7.
>>>
>>> As noted above (and in a prior email), the problem doesn't occur
>>> when using the driver "normally" within Linux.
>>> The triggering environment is where the
>>> driver init/send/receive code is being executed directly
>>> *without* being registered as a Linux netdev.
>>>
>>> It is likely that the "real problem" is some side effect of this,
>>> such as something checking if a child device is in use or powered
>>> down but it's not registered.
>>>
>>> My environment is currently based on this tree:
>>>
>>> > Using this kernel tree:
>>> >
>>> > https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/log/?h=v4.9-rt&ofs=3120
>>> >
>>> > I've identified that the code at tag v4.9.126 is "good" and the code
>>> > at tag v4.9.127 is "bad".
>>> (I then narrowed it down to that specific commit.)
>>>
>>> To reiterate, there is probably no problem with standard usage of the
>>> drivers as part of Linux.
>>>
>>> But in this particular non-standard-edge-case-usage, there seems to
>>> be some unfortunate interaction between the mei driver power
>>> management change and link-loss in onboard e1000e, and I'm trying to
>>> figure out the cause and hopefully a fix/workaround (or at least one
>>> less serious than disabling power management entirely).
>> This is some underlying issue. I don't think you will be able to
>> resolve it yourself; the e1000 guys should provide the fix.
>> Unfortunately I cannot really fix this issue from the mei side.
>>
>>>
>>> Some more context from my original email:
>>> > I'm using a system with an e1000e network driver which has been
>>> > patched to bypass the regular Linux network stack (because it can get
>>> > called from a Xenomai RT context, among other reasons -- although in
>>> > my case I'm not doing that). The complete source for the patched
>>> > version of the code can be found here:
>>> >
>>> > https://github.com/ribalda/ethercat/blob/master/devices/e1000e/netdev-4.9-ethercat.c
>>> > (There are some minor changes to other files, but the
>>> > majority of changes are only to this file. You can see just the
>>> > changes at
>>> > https://gist.github.com/uecasm/5e36a15bda6ffd53079344fc443dcc5f/revisions .)
>>> >
>>> > It was originally based on the in-kernel e1000e driver as of Linux
>>> > 4.9.65. (I'm not the person who originally made the patches, but I
>>> > am the person who rebased them to kernel 4.9 and I'm the one trying
>>> > to maintain them for newer kernel versions. Though I'm also not the
>>> > person who made that github repo.)
>>
>> You will eventually need to incorporate the e1000 fix, once resolved,
>> into your code base as well.
>> For now the easiest workaround is to disable power management on mei
>> from outside on affected platforms.
>
> Yeah, I'm hoping that the eventual solution will be a code change to
> the e1000e driver. The way the distribution is structured it's very
> easy to apply a fix there and much much harder to apply one at any
> other point. Though userspace rule changes are also feasible.

Please try our out-of-tree (OOT) driver, which can be found at:
https://sourceforge.net/projects/e1000/files/e1000e%20stable/3.5.1/

Also, please open a ticket for this issue on that SourceForge page.
* [Intel-wired-lan] [e1000e] Linux 4.9: unable to send packets after link recovery with patched driver
  2019-09-04 11:08 ` Gavin Lambert
  2019-09-04 12:31 ` Lifshits, Vitaly
@ 2019-09-05  3:59 ` Gavin Lambert
  1 sibling, 0 replies; 18+ messages in thread
From: Gavin Lambert @ 2019-09-05 3:59 UTC
To: intel-wired-lan

On 2019-09-04 23:08, I wrote:
> On 2019-09-04 22:06, Winkler, Tomas wrote:
>> You can still disable runtime power management via sysfs and
>> permanently using a udev rule on your particular system.
>> e.g. ATTR{../../power/control}="on"
>
> I'll do some more testing on this tomorrow, but I do recall trying
> setting power/control to "on" (via sysfs) for the device:
>
>     00:16.0 Communication controller: Intel Corporation Sunrise Point-H CSME HECI #1 (rev 31)
>
> which was the one that I noticed was suspended. Is this the mei
> device?
>
> In any case when I tried it before it didn't seem to help, but I think
> this was after link-down and things had already failed. I'll try
> testing a few more cases, including doing it pre-emptively.

Good news: while forcing the mei_me device to "on" does not recover from
the problem once it has happened, it does appear to prevent the problem
from happening. (I assume this is because it effectively reverts the
problem commit without any actual code changes.)

I was able to apply this at boot by creating
/etc/udev/rules.d/20-mei.rules with the following content:

    ACTION=="add",KERNEL=="mei0",ATTR{../../power/control}="on"

(Initially I tried to match on DRIVER=="mei_me" to avoid the parent
attribute reference, but this didn't trigger on boot. The above seems to
work though.)

This is probably a sufficient workaround for now to keep me happy. Is
there anything else you wanted me to test while I have the system handy?

(FWIW, I did previously verify that the original problem also occurs in
Linux 4.19, though I don't recall the precise version at the moment.)
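
For anyone wanting to script the check rather than use `cat`, here is a minimal sketch of the sysfs side of the workaround discussed above. It assumes the standard runtime PM attributes `power/runtime_status` and `power/control` under `/sys/bus/pci/devices/<address>/`; the helper names `runtime_status` and `forbid_runtime_suspend` are made up for illustration and are not part of any driver or library. Writing `power/control` needs root, and as noted in the thread this only seems to help when done before the first link loss.

```python
from pathlib import Path

# Standard sysfs directory holding one entry per PCI device (assumption:
# a normally mounted sysfs on a Linux system).
SYSFS_PCI = Path("/sys/bus/pci/devices")


def runtime_status(address: str, root: Path = SYSFS_PCI) -> str:
    """Return the runtime PM status of a PCI device, e.g. "active" or "suspended"."""
    return (root / address / "power" / "runtime_status").read_text().strip()


def forbid_runtime_suspend(address: str, root: Path = SYSFS_PCI) -> None:
    """Write "on" to power/control so the device is kept out of runtime suspend."""
    (root / address / "power" / "control").write_text("on\n")
```

On the test machine from the lspci listing, the mei_me device would be addressed as `0000:00:16.0`, so `runtime_status("0000:00:16.0")` corresponds to the first `cat` in item 10 and `forbid_runtime_suspend("0000:00:16.0")` to the manual write of "on". The udev rule remains the better choice for a permanent setting, since it is applied when the device is added.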
end of thread, other threads:[~2019-09-05 3:59 UTC]

Thread overview: 18+ messages
2019-07-11  6:50 [Intel-wired-lan] [e1000e] Linux 4.9: unable to send packets after link recovery with patched driver Gavin Lambert
2019-07-12  3:23 ` Gavin Lambert
2019-07-18  8:06 ` Gavin Lambert
2019-07-18  8:22 ` Paul Menzel
2019-07-18  8:24 ` Neftin, Sasha
2019-07-19  0:40 ` Gavin Lambert
2019-07-19  1:02 ` Gavin Lambert
2019-08-20  2:15 ` Gavin Lambert
2019-09-03  7:56 ` Gavin Lambert
2019-09-03  8:35 ` Paul Menzel
2019-09-03  9:20 ` Greg Kroah-Hartman
2019-09-03  9:28 ` Winkler, Tomas
2019-09-03  9:39 ` Paul Menzel
2019-09-03 11:00 ` Gavin Lambert
2019-09-04 10:06 ` Winkler, Tomas
2019-09-04 11:08 ` Gavin Lambert
2019-09-04 12:31 ` Lifshits, Vitaly
2019-09-05  3:59 ` Gavin Lambert