From: James Cameron <quozl@laptop.org>
To: Larry Finger <Larry.Finger@lwfinger.net>
Cc: linux-wireless@vger.kernel.org, Ping-Ke Shih <pkshih@realtek.com>
Subject: Re: rtl8821ae keep alive not set, connection lost
Date: Thu, 1 Feb 2018 17:22:02 +1100
Message-ID: <20180201062202.GH917@us.netrek.org>
In-Reply-To: <a9a1ff03-d12f-7437-e7bc-5f01b195f753@lwfinger.net>

On Wed, Jan 31, 2018 at 11:06:12AM -0600, Larry Finger wrote:
> On 09/12/2017 05:09 PM, James Cameron wrote:
> >Summary: 40b368af4b75 ("rtlwifi: Fix alignment issues") breaks
> >rtl8821ae keep alive, causing "Connection to AP lost" and deauth,
> >but why?
> >
> >Wireless connection is lost after a few seconds or minutes, on
> >every OLPC NL3 laptop with rtl8821ae, with any stable kernel after
> >4.10.1, and any kernel with 40b368af4b75.
> >
> >dmesg contains
> >
> >   wlp2s0: Connection to AP 2c:b0:5d:a6:86:eb lost
> >
> >iw event shows
> >
> >   wlp2s0: del station 2c:b0:5d:a6:86:eb
> >   wlp2s0 (phy #0): deauth 74:c6:3b:09:b5:0d -> 2c:b0:5d:a6:86:eb reason 4: Disassociated due to inactivity
> >   wlp2s0 (phy #0): disconnected (local request)
> >
> >The workaround is to bounce the link, then reconnect:
> >
> >   ip link set wlp2s0 down
> >   ip link set wlp2s0 up
> >   iw dev wlp2s0 connect qz
> >
> >A nearby monitor host captures a deauthentication packet sent by
> >the device.
> >
> >Bisection showed cause is 40b368af4b75 ("rtlwifi: Fix alignment
> >issues") which changes the width of DBI register read.
> >
> >On the face of it, 40b368af4b75 looks correct, especially compared
> >against same function in rtl8723be.
> >
> >I've no idea why reverting fixes the problem.  I'm hoping someone
> >here might speculate and suggest ways to test.
> >
> >As keep alive is set through this path, my guess is that keep alive
> >is not being set in the device.  Or perhaps reading 16 bits
> >perturbs another register.  Is there a way to test?
> >
> >http://dev.laptop.org/~quozl/z/1drtGD.txt dmesg of 4.13
> >
> >http://dev.laptop.org/~quozl/z/1drt7c.txt dmesg with 4.13 and
> >revert of 40b368af4b75
> 
> James,
> 
> I'm afraid we need to revisit this problem again. Changing that
> 8-bit read to a 16-bit version causes an unaligned memory reference
> on AARCH64, thus we will need to re-revert. To prevent problems on
> systems such as yours, PK plans to turn off ASPM capability and
> backdoor on certain platforms that will be listed in a quirks table.
> Please report the output of 'dmidecode -t system' for your affected
> system(s).

Thanks for letting me know.

We made three production runs, and I'm waiting to get hold of the
dmidecode output for two of them.  This may take some weeks; we have
to find stock and ship it, or ask our contract manufacturer (CM)
whether they have kept data or units.

I have dmidecode output for one production run.

http://dev.laptop.org/~quozl/z/1eh7JF.txt (my unit nl3-e)

I have dmidecode output for the prototypes, but their DMI data has
clearly been programmed badly.  We did not ask our CM for Windows
compatibility, so they may have had no step to verify the data.  We
also went through several iterations to get serial numbers assigned,
so the data I have does not have good provenance.

http://dev.laptop.org/~quozl/z/1eh7EE.txt (my unit nl3-c)
http://dev.laptop.org/~quozl/z/1eh7EV.txt (my unit nl3-d)
http://dev.laptop.org/~quozl/z/1eh7He.txt (my unit nl3-a)
http://dev.laptop.org/~quozl/z/1eh8DR.txt (my unit nl3-b)

> We hope you will be able to test any proposed patches.

Yes, can do.

I've just tested v4.15.

However, I'm concerned about your plan to use quirks:

1.  turning off ASPM may reduce run time on battery; if the reduction
is significant, then across several thousand laptops it could exhaust
generator fuel or solar power budgets.  Can the power impact be
quantified?

2.  why not keep ASPM enabled, and use the 8-bit read when quirked,
on x86_64, or whenever the platform is not AARCH64?

3.  why not find the underlying problem?  PK works at the same company
as the device firmware engineers, so it should be possible for them to
find out why the 16-bit access causes the device firmware to hang.  We
drew a blank trying to reach firmware engineers through our CM and
module maker; perhaps we were not large or noisy enough.

4.  it's not just me; others have reported similar problems, so won't
re-reverting affect them?  They haven't engaged in the process as
thoroughly, and may not be in the quirks table.  You also reproduced
the problem on different hardware.

> Thanks,
> 
> Larry

-- 
James Cameron
http://quozl.netrek.org/


Thread overview: 19+ messages
2017-09-12 22:09 rtl8821ae keep alive not set, connection lost James Cameron
2017-09-13 15:01 ` Larry Finger
2017-09-13 21:46   ` James Cameron
2017-09-14  0:39     ` Larry Finger
2017-09-14  9:27       ` James Cameron
2017-09-19  9:42         ` James Cameron
2017-09-20  9:36           ` James Cameron
2017-09-20 21:48             ` Larry Finger
2017-09-20 23:22               ` James Cameron
2017-09-21  8:07                 ` James Cameron
2017-09-21 14:40                   ` Larry Finger
2017-09-22  5:35                     ` James Cameron
2018-01-31 17:06 ` Larry Finger
2018-02-01  6:22   ` James Cameron [this message]
2018-02-02  7:50     ` Pkshih
2018-02-02 20:13       ` Larry Finger
2018-02-03  4:45         ` Pkshih
2018-02-04 18:18           ` Larry Finger
2018-02-02 20:27       ` Larry Finger
