linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Jonas Dreßler" <verdre@v0yd.nl>
To: Brian Norris <briannorris@chromium.org>
Cc: "Andy Shevchenko" <andy.shevchenko@gmail.com>,
	"Amitkumar Karwar" <amitkarwar@gmail.com>,
	"Ganapathi Bhat" <ganapathi017@gmail.com>,
	"Xinming Hu" <huxinming820@gmail.com>,
	"Kalle Valo" <kvalo@codeaurora.org>,
	"David S. Miller" <davem@davemloft.net>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Tsuchiya Yuto" <kitakar@gmail.com>,
	linux-wireless <linux-wireless@vger.kernel.org>,
	netdev@vger.kernel.org,
	"Linux Kernel" <linux-kernel@vger.kernel.org>,
	linux-pci <linux-pci@vger.kernel.org>,
	"Maximilian Luz" <luzmaximilian@gmail.com>,
	"Andy Shevchenko" <andriy.shevchenko@linux.intel.com>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Pali Rohár" <pali@kernel.org>
Subject: Re: [PATCH 1/2] mwifiex: Use non-posted PCI register writes
Date: Thu, 23 Sep 2021 17:28:24 +0200	[thread overview]
Message-ID: <e4cbf804-c374-79a3-53ac-8a0fbd8f75b8@v0yd.nl> (raw)
In-Reply-To: <bd64c142-93d0-c348-834c-34ed80c460f9@v0yd.nl>

On 9/22/21 2:50 PM, Jonas Dreßler wrote:
> On 9/20/21 7:48 PM, Brian Norris wrote:
>> On Sat, Sep 18, 2021 at 12:37 AM Jonas Dreßler <verdre@v0yd.nl> wrote:
>>> Thanks for the pointer to that commit Brian, it turns out this is
>>> actually the change that causes the "Firmware wakeup failed" issues that
>>> I'm trying to fix with the second patch here.
>>
>> Huh. That's interesting, although I guess it makes some sense given
>> your theory of "dropped writes". FWIW, this strategy (post a single
>> write, then wait for wakeup) is the same used by some other
>> chips/drivers too (e.g., ath10k/pci), although in those cases card
>> wakeup is much much faster. But if the bus was dropping writes
>> somehow, those strategies would fail too.
>>
>>> Also my approach is a lot messier than just reverting
>>> 062e008a6e83e7c4da7df0a9c6aefdbc849e2bb3 and also appears to be blocking
>>> even longer...
>>
>> For the record, in case you're talking about my data ("blocking even
>> longer"): I was only testing patch 1. Patch 2 isn't really relevant to
>> my particular systems (Rockchip RK3399 + Marvell 8997/PCIe), because
>> (a) I'm pretty sure my system isn't "dropping" any reads or writes
>> (b) all my delay is in the read-back; the Rockchip PCIe bus is waiting
>> indefinitely for the card to wake up, instead of timing out and
>> reporting all-1's like many x86 systems appear to do (I've tested
>> this).
>>
>> So, the 6ms delay is entirely sitting in the ioread32(), not a delay 
>> loop.
>>
>> I haven't yet tried your version 2 (which avoids the blocking read to
>> wake up; good!), but it sounds like in theory it could solve your
>> problem while avoiding 6ms delays for me. I intend to test your v2
>> this week.
>>
> 
> With "blocking even longer" I meant that (on my system) the delay-loop 
> blocks even longer than waking up the card via mwifiex_read_reg() (both 
> are in the orders of milliseconds). And given that in certain cases the 
> card wakeup (or a write getting through to the card, I have no idea) can 
> take extremely long, I'd feel more confident going with the 
> mwifiex_read_reg() method to wake up the card.
> 
> Anyway, you know what's even weirder with all this: I've been testing 
> the first commit of patch v2 (so just the single read-back instead of 
> the big hammer) together with 062e008a6e83e7c4da7df0a9c6aefdbc849e2bb3 
> reverted for a good week now and haven't seen any wakeup failure yet. 
> Otoh I'm fairly sure the big hammer with reading back every write wasn't 
> enough to fix the wakeup failures, otherwise I wouldn't even have 
> started working on the second commit.
> 
> So that would mean there's a difference between writing and then reading 
> back vs only reading to wake up the card: Only the latter fixes the 
> wakeup failures.
> 
>>> Does anyone have an idea what could be the reason for the posted write
>>> not going through, or could that also be a potential firmware bug in the
>>> chip?
>>
>> I have no clue about that. That does sound downright horrible, but so
>> are many things when dealing with this family of hardware/firmware.
>> I'm not sure how to prove out whether this is a host bus problem, or
>> an endpoint/firmware problem, other than perhaps trying the same
>> module/firmware on another system, if that's possible.
>>
>> Anyway, to reiterate: I'm not fundamentally opposed to v2 (pending a
>> test run here), even if it is a bit ugly and perhaps not 100%
>> understood.
>>
> 
> I'm not 100% sure about all this yet, I think I'm gonna try to confirm 
> my older findings once again now and then we'll see. FTR, would you be 
> fine with using the mwifiex_read_reg() method to wake up the card and 
> somehow quirking your system to use write_reg()?
> 
>> Brian
>>
> 

Okay, so I finally managed to find my exact reproducer for the bug again:

1) Make sure wifi powersaving is enabled (iw dev wlp1s0 set power_save on)
2) Connect to any wifi network (makes firmware go into wifi powersaving 
mode, not deep sleep)
3) Make sure bluetooth is turned off (to ensure the firmware actually 
enters powersave mode and doesn't keep the radio active doing bluetooth 
stuff)
4) To confirm that wifi powersaving is entered ping a device on the LAN, 
pings should be a few ms higher than without powersaving
5) Run "while true; do iwconfig; sleep 0.0001; done", this wakes and 
suspends the firmware extremely often
6) Wait until things explode, for me it consistently takes <5 minutes

Using this reproducer I was able to clear things up a bit:

- There still are wakeup failures when using (only) mwifiex_read_reg() 
to wake the card, so there's no weird difference between waking up using 
read vs write+read-back

- Just calling mwifiex_write_reg() once and then blocking until the card 
wakes up using my delay-loop doesn't fix the issue, it's actually 
writing multiple times that fixes the issue

These observations sound a lot like writes (and even reads) are actually 
being dropped, don't they?

  reply	other threads:[~2021-09-23 15:28 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-30 12:37 [PATCH 0/2] mwifiex: Work around firmware bugs on 88W8897 chip Jonas Dreßler
2021-08-30 12:37 ` [PATCH 1/2] mwifiex: Use non-posted PCI register writes Jonas Dreßler
2021-08-30 12:49   ` Andy Shevchenko
2021-09-01 14:01     ` Jonas Dreßler
2021-09-01 15:47       ` Andy Shevchenko
2021-09-01 15:51       ` Pali Rohár
2021-09-01 16:51         ` Heiner Kallweit
2021-09-01 17:07           ` Johannes Berg
2021-09-01 19:07             ` Heiner Kallweit
2021-09-01 22:41             ` Bjorn Helgaas
2021-09-02 14:05               ` Bjorn Helgaas
2021-09-01 19:40   ` Brian Norris
2021-09-01 20:40     ` Andy Shevchenko
2021-09-01 21:04       ` Brian Norris
2021-09-01 21:07         ` Brian Norris
2021-09-18  7:37           ` Jonas Dreßler
2021-09-20 17:48             ` Brian Norris
2021-09-22 12:50               ` Jonas Dreßler
2021-09-23 15:28                 ` Jonas Dreßler [this message]
2021-09-23 19:37                   ` Andy Shevchenko
2021-09-23 19:41                   ` Andy Shevchenko
2021-09-23 20:22                     ` Pali Rohár
2021-09-30 15:38                       ` Jonas Dreßler
2021-09-30 15:42                         ` Pali Rohár
2021-09-30 16:14                           ` Jonas Dreßler
2021-09-30 16:19                             ` Pali Rohár
2021-09-30 16:22                               ` Jonas Dreßler
2021-09-30 16:39                                 ` Pali Rohár
2021-08-30 12:37 ` [PATCH 2/2] mwifiex: Try waking the firmware until we get an interrupt Jonas Dreßler
2021-08-30 12:51   ` Andy Shevchenko
2021-08-30 12:55     ` Andy Shevchenko
2021-09-25 17:32 ` [PATCH 0/2] mwifiex: Work around firmware bugs on 88W8897 chip Pali Rohár

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e4cbf804-c374-79a3-53ac-8a0fbd8f75b8@v0yd.nl \
    --to=verdre@v0yd.nl \
    --cc=amitkarwar@gmail.com \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=andy.shevchenko@gmail.com \
    --cc=bhelgaas@google.com \
    --cc=briannorris@chromium.org \
    --cc=davem@davemloft.net \
    --cc=ganapathi017@gmail.com \
    --cc=huxinming820@gmail.com \
    --cc=kitakar@gmail.com \
    --cc=kuba@kernel.org \
    --cc=kvalo@codeaurora.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-wireless@vger.kernel.org \
    --cc=luzmaximilian@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=pali@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).