* Re: [BUG] iwlwifi: card unusable after firmware crash
From: Coelho, Luciano @ 2020-12-09 20:02 UTC
To: Grumbach, Emmanuel, kuba
Cc: linux-wireless, Goodstein, Mordechay, Berg, Johannes,
linux-kernel, rsalvaterra
Hi Jakub et al,
On Wed, 2020-12-09 at 09:13 -0800, Jakub Kicinski wrote:
> On Tue, 8 Dec 2020 23:17:48 +0000 Rui Salvaterra wrote:
> > Hi, Luca,
> >
> > On Tue, 8 Dec 2020 at 16:27, Coelho, Luciano <luciano.coelho@intel.com> wrote:
> > > On Tue, 2020-12-08 at 11:27 +0000, Rui Salvaterra wrote:
> > > >
> > > > [ 3174.003910] iwlwifi 0000:02:00.0: RF_KILL bit toggled to disable radio.
> > > > [ 3174.003913] iwlwifi 0000:02:00.0: reporting RF_KILL (radio disabled)
> > >
> > > It looks like your machine is reporting RF-Kill to the WiFi device.
> >
> > Yes, that's an artifact of how I tested: I rebooted the router, the
> > Wi-Fi interface disassociated and the dmesg was clean. However, after
> > the router came up, the laptop didn't reconnect (and the connection
> > had completely disappeared from nmtui). Afterwards, I did the rfkill
> > cycle you see, and only then I got the register dump.
> >
> > > There seems to be some sort of race there that is causing us to still
> > > try to communicate with the device (and thus you see the transaction
> > > failed dump), but that will obviously fail when RF-Kill is enabled.
> >
> > I'm not sure about that, the card was already dead before the rfkill cycle.
>
> Any luck figuring this out, Luca? If this is a 5.10 regression we need
> to let Linus know tomorrow, so the time is ticking :(
I just checked all the commits in iwlwifi between v5.9 and v5.10 and I
don't see anything that could affect how RF-Kill works.
Emmanuel, do you remember anything that went in that could have
affected RF-Kill?
--
Cheers,
Luca.
* Re: [BUG] iwlwifi: card unusable after firmware crash
From: Rui Salvaterra @ 2020-12-09 20:14 UTC
To: Jakub Kicinski
Cc: Coelho, Luciano, Goodstein, Mordechay, Berg, Johannes,
linux-wireless, linux-kernel
Hi, guys,
On Wed, 9 Dec 2020 at 17:13, Jakub Kicinski <kuba@kernel.org> wrote:
>
> Any luck figuring this out, Luca? If this is a 5.10 regression we need
> to let Linus know tomorrow, so the time is ticking :(
I don't have the possibility to test other kernels at the moment, but
I will do so in a few days (at least to find a working version to
bisect). Meanwhile, I don't know if this is relevant or not, but I'm
using WPA3 PSK.
Thanks,
Rui
* Re: [BUG] iwlwifi: card unusable after firmware crash
From: Emmanuel Grumbach @ 2020-12-09 20:32 UTC
To: Jakub Kicinski
Cc: Coelho, Luciano, Rui Salvaterra, Goodstein, Mordechay, Berg,
Johannes, linux-wireless, linux-kernel
On Wed, Dec 9, 2020 at 7:19 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 8 Dec 2020 23:17:48 +0000 Rui Salvaterra wrote:
> > Hi, Luca,
> >
> > On Tue, 8 Dec 2020 at 16:27, Coelho, Luciano <luciano.coelho@intel.com> wrote:
> > > On Tue, 2020-12-08 at 11:27 +0000, Rui Salvaterra wrote:
> > > >
> > > > [ 3174.003910] iwlwifi 0000:02:00.0: RF_KILL bit toggled to disable radio.
> > > > [ 3174.003913] iwlwifi 0000:02:00.0: reporting RF_KILL (radio disabled)
> > >
> > > It looks like your machine is reporting RF-Kill to the WiFi device.
> >
> > Yes, that's an artifact of how I tested: I rebooted the router, the
> > Wi-Fi interface disassociated and the dmesg was clean. However, after
> > the router came up, the laptop didn't reconnect (and the connection
> > had completely disappeared from nmtui). Afterwards, I did the rfkill
> > cycle you see, and only then I got the register dump.
> >
> > > There seems to be some sort of race there that is causing us to still
> > > try to communicate with the device (and thus you see the transaction
> > > failed dump), but that will obviously fail when RF-Kill is enabled.
> >
> > I'm not sure about that, the card was already dead before the rfkill cycle.
>
> Any luck figuring this out, Luca? If this is a 5.10 regression we need
> to let Linus know tomorrow, so the time is ticking :(
Rui, I looked at the register dump and looks like you're using AMT on
your system?
Can you confirm?
* Re: [BUG] iwlwifi: card unusable after firmware crash
From: Emmanuel Grumbach @ 2020-12-09 20:40 UTC
To: Jakub Kicinski
Cc: Coelho, Luciano, Rui Salvaterra, Goodstein, Mordechay, Berg,
Johannes, linux-wireless, linux-kernel
On Wed, Dec 9, 2020 at 10:32 PM Emmanuel Grumbach <egrumbach@gmail.com> wrote:
>
> On Wed, Dec 9, 2020 at 7:19 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Tue, 8 Dec 2020 23:17:48 +0000 Rui Salvaterra wrote:
> > > Hi, Luca,
> > >
> > > On Tue, 8 Dec 2020 at 16:27, Coelho, Luciano <luciano.coelho@intel.com> wrote:
> > > > On Tue, 2020-12-08 at 11:27 +0000, Rui Salvaterra wrote:
> > > > >
> > > > > [ 3174.003910] iwlwifi 0000:02:00.0: RF_KILL bit toggled to disable radio.
> > > > > [ 3174.003913] iwlwifi 0000:02:00.0: reporting RF_KILL (radio disabled)
> > > >
> > > > It looks like your machine is reporting RF-Kill to the WiFi device.
> > >
> > > Yes, that's an artifact of how I tested: I rebooted the router, the
> > > Wi-Fi interface disassociated and the dmesg was clean. However, after
> > > the router came up, the laptop didn't reconnect (and the connection
> > > had completely disappeared from nmtui). Afterwards, I did the rfkill
> > > cycle you see, and only then I got the register dump.
> > >
> > > > There seems to be some sort of race there that is causing us to still
> > > > try to communicate with the device (and thus you see the transaction
> > > > failed dump), but that will obviously fail when RF-Kill is enabled.
> > >
> > > I'm not sure about that, the card was already dead before the rfkill cycle.
> >
> > Any luck figuring this out, Luca? If this is a 5.10 regression we need
> > to let Linus know tomorrow, so the time is ticking :(
>
> Rui, I looked at the register dump and looks like you're using AMT on
> your system?
> Can you confirm?
Besides, don't you get a stack dump in the vicinity of this register
dump? That'd be helpful to see.
* Re: [BUG] iwlwifi: card unusable after firmware crash
From: Rui Salvaterra @ 2020-12-09 20:41 UTC
To: Emmanuel Grumbach
Cc: Jakub Kicinski, Coelho, Luciano, Goodstein, Mordechay, Berg,
Johannes, linux-wireless, linux-kernel
Hi, again,
On Wed, 9 Dec 2020 at 20:40, Emmanuel Grumbach <egrumbach@gmail.com> wrote:
>
> Besides, don't you get a stack dump in the vicinity of this register
> dump? That'd be helpful to see.
Nope. No stack trace at all. Only the register dump.
Thanks,
Rui
* Re: [BUG] iwlwifi: card unusable after firmware crash
From: Rui Salvaterra @ 2020-12-09 20:40 UTC
To: Emmanuel Grumbach
Cc: Jakub Kicinski, Coelho, Luciano, Goodstein, Mordechay, Berg,
Johannes, linux-wireless, linux-kernel
Hi, Emmanuel,
On Wed, 9 Dec 2020 at 20:32, Emmanuel Grumbach <egrumbach@gmail.com> wrote:
>
> Rui, I looked at the register dump and looks like you're using AMT on
> your system?
> Can you confirm?
AMT? You mean Intel Active Management? Heavens, no, not that I know
of! This is a personal laptop (Lenovo B51-80). (And I'd personally
kill the ME with fire, if I could.)
Thanks,
Rui
* Re: [BUG] iwlwifi: card unusable after firmware crash
From: Emmanuel Grumbach @ 2020-12-09 20:47 UTC
To: Rui Salvaterra
Cc: Jakub Kicinski, Coelho, Luciano, Goodstein, Mordechay, Berg,
Johannes, linux-wireless, linux-kernel
On Wed, Dec 9, 2020 at 10:40 PM Rui Salvaterra <rsalvaterra@gmail.com> wrote:
>
> Hi, Emmanuel,
>
> On Wed, 9 Dec 2020 at 20:32, Emmanuel Grumbach <egrumbach@gmail.com> wrote:
> >
> > Rui, I looked at the register dump and looks like you're using AMT on
> > your system?
> > Can you confirm?
>
> AMT? You mean Intel Active Management? Heavens, no, not that I know
> of! This is a personal laptop (Lenovo B51-80). (And I'd personally
> kill the ME with fire, if I could.)
Yes, I mean that thing. No VPRO sticker on the laptop?
Weird... So apparently I was wrong about the register value.
>
> Thanks,
> Rui
* Re: [BUG] iwlwifi: card unusable after firmware crash
From: Emmanuel Grumbach @ 2020-12-09 21:07 UTC
To: Rui Salvaterra
Cc: Jakub Kicinski, Coelho, Luciano, Goodstein, Mordechay, Berg,
Johannes, linux-wireless, linux-kernel
On Wed, Dec 9, 2020 at 10:47 PM Emmanuel Grumbach <egrumbach@gmail.com> wrote:
>
> On Wed, Dec 9, 2020 at 10:40 PM Rui Salvaterra <rsalvaterra@gmail.com> wrote:
> >
> > Hi, Emmanuel,
> >
> > On Wed, 9 Dec 2020 at 20:32, Emmanuel Grumbach <egrumbach@gmail.com> wrote:
> > >
> > > Rui, I looked at the register dump and looks like you're using AMT on
> > > your system?
> > > Can you confirm?
> >
> > AMT? You mean Intel Active Management? Heavens, no, not that I know
> > of! This is a personal laptop (Lenovo B51-80). (And I'd personally
> > kill the ME with fire, if I could.)
>
> Yes, I mean that thing. No VPRO sticker on the laptop?
> Weird... So apparently I was wrong about the register value.
Indeed, the bit uses reverse logic, so we can put that aside.
Frankly, I have no clue. You can try our backport tree to bisect;
that should be easier.
What I see here is that your GP_CNTRL value is 0x080003d8.
#define CSR_GP_CNTRL_REG_FLAG_HW_RF_KILL_SW (0x08000000)
which makes sense since, apparently, HW RF-Kill was asserted.
#define CSR_GP_CNTRL_REG_FLAG_GOING_TO_SLEEP (0x00000010)
which means that the device is going to sleep... And that's the problem:
iwl_trans_pcie_grab_nic_access:
        ret = iwl_poll_bit(trans, CSR_GP_CNTRL,
                           CSR_GP_CNTRL_REG_VAL_MAC_ACCESS_EN,
                           (CSR_GP_CNTRL_REG_FLAG_MAC_CLOCK_READY |
                            CSR_GP_CNTRL_REG_FLAG_GOING_TO_SLEEP), 15000);
        if (unlikely(ret < 0)) {
                u32 cntrl = iwl_read32(trans, CSR_GP_CNTRL);
                WARN_ONCE(1,
                          "Timeout waiting for hardware access (CSR_GP_CNTRL 0x%08x)\n",
                          cntrl);
but I'd expect the splat in your log...
Or maybe you can't load the firmware?
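To make the decode above concrete, here is a minimal standalone sketch
(userspace C, not driver code) that checks the dumped value against the
flags. The RF-Kill and GOING_TO_SLEEP values are the ones quoted above. The
0x00000001 values for MAC_CLOCK_READY and MAC_ACCESS_EN, the helper names,
and the assumption that the poll succeeds when (value & mask) == (bits & mask)
are assumptions for illustration and should be checked against iwl-csr.h and
iwl_poll_bit():

```c
#include <stdbool.h>
#include <stdint.h>

/* Values quoted in this message. */
#define CSR_GP_CNTRL_REG_FLAG_HW_RF_KILL_SW   (0x08000000)
#define CSR_GP_CNTRL_REG_FLAG_GOING_TO_SLEEP  (0x00000010)

/* Assumed values, not quoted in this thread; verify against iwl-csr.h. */
#define CSR_GP_CNTRL_REG_FLAG_MAC_CLOCK_READY (0x00000001)
#define CSR_GP_CNTRL_REG_VAL_MAC_ACCESS_EN    (0x00000001)

/* Hypothetical helper: is the HW RF-Kill flag set in a GP_CNTRL value? */
static bool gp_cntrl_rfkill_flag_set(uint32_t val)
{
	return (val & CSR_GP_CNTRL_REG_FLAG_HW_RF_KILL_SW) != 0;
}

/* Hypothetical helper: does the value say the device is going to sleep? */
static bool gp_cntrl_going_to_sleep(uint32_t val)
{
	return (val & CSR_GP_CNTRL_REG_FLAG_GOING_TO_SLEEP) != 0;
}

/*
 * Hypothetical helper modelling the condition the poll waits for, under
 * the assumption that it compares the masked register value with the
 * masked expected bits: clock ready must be set and going-to-sleep clear.
 */
static bool gp_cntrl_access_ready(uint32_t val)
{
	const uint32_t bits = CSR_GP_CNTRL_REG_VAL_MAC_ACCESS_EN;
	const uint32_t mask = CSR_GP_CNTRL_REG_FLAG_MAC_CLOCK_READY |
			      CSR_GP_CNTRL_REG_FLAG_GOING_TO_SLEEP;

	return (val & mask) == (bits & mask);
}
```

With the dumped value 0x080003d8, both the RF-Kill and GOING_TO_SLEEP flags
are set and, under those assumptions, the poll condition is not met, which
would be consistent with hardware access never being granted.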
Can you try this:
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
index 2fffbbc8462f..748300752630 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
@@ -2121,6 +2121,7 @@ static bool iwl_trans_pcie_grab_nic_access(struct iwl_trans *trans,
          * track nic_access anyway.
          */
         __release(&trans_pcie->reg_lock);
+        mdelay(1);
         return true;
 }
If that helps, then... I'd have no clue why it helps, but this
specific device has caused us trouble before, such as bad timing after
grab_nic_access.
>
> >
> > Thanks,
> > Rui
* Re: [BUG] iwlwifi: card unusable after firmware crash
From: Rui Salvaterra @ 2020-12-09 21:16 UTC
To: Emmanuel Grumbach
Cc: Jakub Kicinski, Coelho, Luciano, Goodstein, Mordechay, Berg,
Johannes, linux-wireless, linux-kernel
Hi again, Emmanuel,
On Wed, 9 Dec 2020 at 21:07, Emmanuel Grumbach <egrumbach@gmail.com> wrote:
>
> Indeed, the bit is reverse logic. So we can put that aside.
> Frankly, I have no clue. You can try our backport tree to bisect,
> should be easier..
> What I see here is that your GP_CTRL value is 080003d8
>
> #define CSR_GP_CNTRL_REG_FLAG_HW_RF_KILL_SW (0x08000000)
> which makes sense since, apparently, HW RF-Kill was asserted.
> #define CSR_GP_CNTRL_REG_FLAG_GOING_TO_SLEEP (0x00000010)
> Which means that the device is going to sleep... And that's the problem:
>
> iwl_trans_pcie_grab_nic_access:
>         ret = iwl_poll_bit(trans, CSR_GP_CNTRL,
>                            CSR_GP_CNTRL_REG_VAL_MAC_ACCESS_EN,
>                            (CSR_GP_CNTRL_REG_FLAG_MAC_CLOCK_READY |
>                             CSR_GP_CNTRL_REG_FLAG_GOING_TO_SLEEP), 15000);
>         if (unlikely(ret < 0)) {
>                 u32 cntrl = iwl_read32(trans, CSR_GP_CNTRL);
>
>                 WARN_ONCE(1,
>                           "Timeout waiting for hardware access (CSR_GP_CNTRL 0x%08x)\n",
>                           cntrl);
>
> but I'd expect the splat in your log...
> Or maybe you can't load the firmware?
Well, my kernel doesn't have any modules, it's all built-in. The
firmware is obviously loading fine, otherwise the card wouldn't work,
but yeah, that WARN_ONCE hasn't triggered at all.
> Can you try this:
> diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
> index 2fffbbc8462f..748300752630 100644
> --- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
> +++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
> @@ -2121,6 +2121,7 @@ static bool iwl_trans_pcie_grab_nic_access(struct iwl_trans *trans,
>           * track nic_access anyway.
>           */
>          __release(&trans_pcie->reg_lock);
> +        mdelay(1);
>          return true;
>  }
>
> If that helps, then... I'd have no clue why it helps, but this
> specific device caused us trouble like bad timing after
> grab_nic_access..
I'll give it a spin. Nasty hack, but if it works, it works. :)
Thanks,
Rui
* Re: [BUG] iwlwifi: card unusable after firmware crash
From: Rui Salvaterra @ 2020-12-10 16:21 UTC
To: Emmanuel Grumbach
Cc: Jakub Kicinski, Coelho, Luciano, Goodstein, Mordechay, Berg,
Johannes, linux-wireless, linux-kernel
Hi, again,
I haven't tested any patch or bisected, but I have another data point.
I built and tested Linux 5.8.18, with the same firmware, and it is
working correctly. I reduced the test case to just rfkilling the
connection, which showed the register dump immediately (before that I
was using the airplane toggle on the keyboard, which isn't working
correctly, it disables and immediately reenables the radio, for some
unfathomable reason).
So, now I'm inclined to believe this is some sort of race condition
between rfkill and pending transactions.
Thanks,
Rui
* Re: [BUG] iwlwifi: card unusable after firmware crash
From: Emmanuel Grumbach @ 2020-12-10 18:57 UTC
To: Rui Salvaterra
Cc: Jakub Kicinski, Coelho, Luciano, Goodstein, Mordechay, Berg,
Johannes, linux-wireless, linux-kernel
Hi,
> Hi, again,
>
> I haven't tested any patch or bisected, but I have another data point.
> I built and tested Linux 5.8.18, with the same firmware, and it is
> working correctly. I reduced the test case to just rfkilling the
> connection, which showed the register dump immediately (before that I
> was using the airplane toggle on the keyboard, which isn't working
> correctly, it disables and immediately reenables the radio, for some
> unfathomable reason).
> So, now I'm inclined to believe this is some sort of race condition
> between rfkill and pending transactions.
Which also means it's not a regression.
You can add a dump_stack() in the function that dumps the registers to
get a clue.