From: Ben Greear <greearb@candelatech.com>
To: Rajkumar Manoharan <rmanohar@codeaurora.org>
Cc: "Manoharan, Rajkumar" <rmanohar@qti.qualcomm.com>,
	"Valo, Kalle" <kvalo@qca.qualcomm.com>,
	ath10k@lists.infradead.org, linux-wireless@vger.kernel.org,
	mike@fireburn.co.uk
Subject: Re: Bug 119151 - [regression] ath10k no longer authenitcates and freezes system
Date: Thu, 2 Jun 2016 11:02:47 -0700
Message-ID: <575074C7.50009@candelatech.com>
In-Reply-To: <6c6f208f9abc81cb262d763f6f6d684d@codeaurora.org>

On 06/02/2016 10:41 AM, Rajkumar Manoharan wrote:
> On 2016-06-02 22:53, Ben Greear wrote:
>> On 06/02/2016 10:03 AM, Manoharan, Rajkumar wrote:
>>> On Thursday, June 2, 2016 8:51 PM, Ben Greear <greearb@candelatech.com> wrote:
>>>> On 06/02/2016 07:24 AM, Valo, Kalle wrote:
>>>>> Kalle Valo <kvalo@qca.qualcomm.com> writes:
>>>>>
>>>>>> there's a regression in ath10k:
>>>>>>
>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=119151
>>>>>>
>>>>>> Reporter bisected it to this:
>>>>>>
>>>>>> 5c86d97bcc1d42ce7f75685a61be4dad34ee8183 is the first bad commit
>>>>>> commit 5c86d97bcc1d42ce7f75685a61be4dad34ee8183
>>>>>> Author: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
>>>>>> Date:   Tue Mar 22 17:22:19 2016 +0530
>>>>>>
>>>>>> ath10k: combine txrx and replenish task
>>>>>>
> [...]
>
>>>> I found a lot of problems with this code as well, and the 5 patches
>>>> starting from the URL below fixed the issues for me.
>>>>
>>> Ben,
>>>
>>> Can you please explain the sort of issues you have observed with this change?
>>
>> I imported a bunch of upstream patches at once, so not sure exactly what commit
>> caused it.  And, this was about 2 months ago...  Upon review, I'm not sure I
>> even have the patch this particular bug was bisected to, so maybe that is some
>> other issue.
>>
> Please keep track of buggy commits and report them ASAP.

I posted to the list at the time.  When I was debugging this, there
were so many conflicting issues that it was hard to find a single
regression point.

>> But, the problems I saw were deadlocks and memory corruption.  A lot of it was
>> because I was debugging new firmware at the time, so peer creation was
>> sometimes failing, and things like that.  The error handling in ath10k for
>> this was faulty and racy.  We have not seen any performance regressions,
>> but we mostly run on very powerful CPUs.
>>
>> Please take a look at those 5 patches.  A good review would be much appreciated,
>> and by reading them you will better be able to see the problems I was hitting
>> and trying to fix.
>>
> The two patches below are critical, and I have already shared my feedback.
>
> https://patchwork.kernel.org/patch/8727841/
> https://patchwork.kernel.org/patch/9073471/
>
> Others are LGTM.

Not sure what LGTM means.

This one fixes memory corruption:
http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=blobdiff;f=drivers/net/wireless/ath/ath10k/htt_tx.c;h=58e88d392fb56a65304db17d11a9eaf0b0397dc7;hp=07b960e9704f509b3dddf1e45730e76a4c39e51e;hb=fddb6661a0f5772853fbb9feb7232f325d5f74c5;hpb=ed1757f8345064181664e4a62e2b917e694a665e

This one fixes use-after-free memory bugs:
http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=blobdiff;f=drivers/net/wireless/ath/ath10k/mac.c;h=5e5cc9c6c1d82524b9b77a7c6d2c1341c5268732;hp=8783119b9ba84e0ddb292d521e6513bf7d68a40b;hb=5ae13cea64004afc673ecc22cd70ac51179168c6;hpb=fddb6661a0f5772853fbb9feb7232f325d5f74c5

As does this one:
http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=blobdiff;f=drivers/net/wireless/ath/ath10k/mac.c;h=020dd25752224d9786da37a6dfd10a69e646b138;hp=5e5cc9c6c1d82524b9b77a7c6d2c1341c5268732;hb=c4b9566416a5e7b8d4c446d1bad34aabcbeff9f5;hpb=9bd9c11c1a2e61261c268ac2b6d791d4f6b6fe26

>
>> In case you want to look at the full context of those patches, you can find
>> them here (around 24 patches down from the top...)
>>
> Quite a big list :)
>
>> http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=summary
>>
>> For now, I am sticking with 4.4 + what I pulled in, but will rebase
>> against upstream someday soon-ish and then we can start testing it
>> all over again :)
>>
> Will go through the list. Better to post them publicly if you have not already.

Many of these patches are related to features only in my firmware.  The
~20-patch patch bomb was a start at adding some of the hopefully less
controversial support.  If I can ever get that upstream, then I will pick
off another set of patches and try to get them ready for upstream.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

