linux-kernel.vger.kernel.org archive mirror
* [3.3-rc2+] Thousands of ath9k warnings on dmesg before laptop froze
From: Carlos R. Mafra @ 2012-02-06  0:29 UTC
  To: LKML; +Cc: Luis R. Rodriguez, ath9k-devel


I'm testing the latest kernel 3.3.0-rc2+ I pulled from git 
this morning.

My laptop just froze, and when I rebooted I noticed that /var/log/messages
contained 48 thousand (!) warnings from ath9k, logged over the previous
few hours. I'm pasting the first one:


 ------------[ cut here ]------------
 WARNING: at /home/mafra/linux-2.6/drivers/net/wireless/ath/ath9k/rc.c:697 ath_rc_get_highest_rix+0x156/0x210 [ath9k]()
 Hardware name: VPCEB4X1E
 Modules linked in: dvb_usb_dib0700 dib0090 dib7000p dib7000m dib0070 dib8000 dib3000mc dibx000_common af_packet snd_pcm_oss snd_mixer_oss microcode arc4 sr_mod cdrom sg ath9k mac80211 uvcvideo ath9k_common ath9k_hw videobuf2_core videobuf2_vmalloc videobuf2_memops ath i2c_i801 cfg80211 sony_laptop rfkill radeon ttm drm_kms_helper drm i2c_algo_bit
 Pid: 48, comm: kworker/u:6 Not tainted 3.3.0-rc2+ #6
 Call Trace:
  [<ffffffff81032f2a>] warn_slowpath_common+0x7a/0xb0
  [<ffffffff81032f75>] warn_slowpath_null+0x15/0x20
  [<ffffffffa0298eb6>] ath_rc_get_highest_rix+0x156/0x210 [ath9k]
  [<ffffffffa0298ff3>] ath_get_rate+0x83/0x480 [ath9k]
  [<ffffffffa024f006>] rate_control_get_rate+0x86/0x160 [mac80211]
  [<ffffffffa025a112>] invoke_tx_handlers+0x7b2/0x1420 [mac80211]
  [<ffffffff81061a00>] ? load_balance+0x90/0x8a0
  [<ffffffffa025af22>] ieee80211_tx+0x82/0xf0 [mac80211]
  [<ffffffffa025b030>] ieee80211_xmit+0xa0/0x100 [mac80211]
  [<ffffffffa025c23a>] ieee80211_tx_skb_tid+0x5a/0x70 [mac80211]
  [<ffffffffa0246f56>] ieee80211_send_nullfunc+0x56/0x70 [mac80211]
  [<ffffffffa0248460>] ieee80211_dynamic_ps_enable_work+0x240/0x310 [mac80211]
  [<ffffffffa0248220>] ? ieee80211_dynamic_ps_disable_work+0x50/0x50 [mac80211]
  [<ffffffff81048d67>] process_one_work+0x117/0x390
  [<ffffffff81049899>] worker_thread+0x169/0x350
  [<ffffffff81049730>] ? manage_workers+0x240/0x240
  [<ffffffff8104f096>] kthread+0x96/0xa0
  [<ffffffff814f7714>] kernel_thread_helper+0x4/0x10
  [<ffffffff8104f000>] ? flush_kthread_worker+0xb0/0xb0
  [<ffffffff814f7710>] ? gs_change+0xb/0xb
 ---[ end trace 7a4892c93f508f3a ]---
 ------------[ cut here ]------------


I must say that until the laptop froze, I hadn't noticed anything unusual
enough to make me check dmesg. Sometimes the Internet connection was slow,
but I assumed that was unrelated to the kernel.

Is there any other information I should provide, or is the above warning enough?




* Re: [ath9k-devel] [3.3-rc2+] Thousands of ath9k warnings on dmesg before laptop froze
From: Pavel Roskin @ 2012-02-06 22:57 UTC
  To: Carlos R. Mafra; +Cc: LKML, Luis R. Rodriguez, ath9k-devel

[-- Attachment #1: Type: text/plain, Size: 2004 bytes --]

On Mon, 6 Feb 2012 00:29:07 +0000
"Carlos R. Mafra" <crmafra@gmail.com> wrote:

> 
> I'm testing the latest kernel 3.3.0-rc2+ I pulled from git 
> this morning.
> 
> My laptop just froze, and when I rebooted I noticed
> that /var/log/messages contained 48 thousand (!) warnings coming from
> ath9k since a few hours ago. I'm pasting the first one:

> 
>  ------------[ cut here ]------------
>  WARNING:
> at /home/mafra/linux-2.6/drivers/net/wireless/ath/ath9k/rc.c:697
> ath_rc_get_highest_rix+0x156/0x210 [ath9k]() Hardware name: VPCEB4X1E

I believe I found a solution for this today.  Please see this bug
tracker: https://bugzilla.redhat.com/show_bug.cgi?id=768639

While Fedora users report a warning, I've seen panic reports on the
list.  It's a memory corruption bug, so it can manifest in different
ways.  Please test the latest patch (attached).

Here is my comment on the patch:

This patch is based on my analysis of printk() output I added to the
ath9k driver.  I didn't have a chance to test the patch, so testing
would be greatly appreciated.

The corruption must be happening in ath_debug_stat_rc(), which is given
the result of ath_rc_get_rateindex().  ath_rc_get_rateindex() can
return -1, which causes ath_debug_stat_rc() to increment
rcstats[-1].success, the value that lies 16 bytes before rcstats in
struct ath_rate_priv.  On 64-bit systems, that happens to be
rate_table.  Once the rate_table pointer is incremented, all data read
through it becomes invalid, which leads to the warning.  On 32-bit
systems, the corruption should happen in neg_ht_rates.

The -1 value of idx in struct ieee80211_tx_rate is described in
include/net/mac80211.h. I don't know why -1 ends up there or how to
reproduce the problem reliably. But -1 can be there, and ath9k has no
checks for it.

The patch introduces two protections: ath_rc_get_rateindex() never
returns a negative value and ath_debug_stat_rc() checks the array
bounds.

It may not be good enough for the kernel, but it may be good enough for
Fedora.

-- 
Regards,
Pavel Roskin

[-- Attachment #2: 01-rix-check.patch --]
[-- Type: text/x-patch, Size: 1280 bytes --]

Prevent memory corruption in ath9k rate control algorithm

From: Pavel Roskin <proski@gnu.org>

Check final_rate in ath_debug_stat_rc().  Don't return negative values
from ath_rc_get_rateindex(); callers don't expect them.

Signed-off-by: Pavel Roskin <proski@gnu.org>
---

 drivers/net/wireless/ath/ath9k/rc.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)


diff --git a/drivers/net/wireless/ath/ath9k/rc.c b/drivers/net/wireless/ath/ath9k/rc.c
index 635b592..afe22f4 100644
--- a/drivers/net/wireless/ath/ath9k/rc.c
+++ b/drivers/net/wireless/ath/ath9k/rc.c
@@ -385,6 +385,11 @@ static int ath_rc_get_rateindex(const struct ath_rate_table *rate_table,
 	int rix = 0, i = 0;
 	static const int mcs_rix_off[] = { 7, 15, 20, 21, 22, 23 };
 
+	if (rate->idx < 0) {
+		printk(KERN_ERR "%s: rate->idx = %d\n", __func__, rate->idx);
+		return 0;
+	}
+
 	if (!(rate->flags & IEEE80211_TX_RC_MCS))
 		return rate->idx;
 
@@ -1324,6 +1329,11 @@ static void ath_debug_stat_rc(struct ath_rate_priv *rc, int final_rate)
 {
 	struct ath_rc_stats *stats;
 
+	if (final_rate < 0 || final_rate >= RATE_TABLE_SIZE) {
+		printk(KERN_ERR "%s: invalid final_rate: %d\n", __func__,
+		       final_rate);
+		return;
+	}
 	stats = &rc->rcstats[final_rate];
 	stats->success++;
 }


* Re: [ath9k-devel] [3.3-rc2+] Thousands of ath9k warnings on dmesg before laptop froze
From: Carlos R. Mafra @ 2012-02-07 12:23 UTC
  To: Pavel Roskin; +Cc: LKML, Luis R. Rodriguez, ath9k-devel

On Mon,  6 Feb 2012 at 17:57:02 -0500, Pavel Roskin wrote:
> I believe I found a solution for this today.  Please see this bug
> tracker: https://bugzilla.redhat.com/show_bug.cgi?id=768639
> 
> While Fedora users report a warning, I've seen panic reports in the
> list.  It's a memory corruption bug, so it can manifest in different
> ways.  Please test the latest patch (attached).


Thanks for the link to the bugzilla and for the attached patch. 

I'm currently testing it, and so far so good.






* Re: [ath9k-devel] [3.3-rc2+] Thousands of ath9k warnings on dmesg before laptop froze
From: Calvin Owens @ 2012-02-08 22:29 UTC
  To: Carlos R. Mafra; +Cc: Pavel Roskin, LKML, Luis R. Rodriguez, ath9k-devel

I've been having this issue on x86_64 with 3.3-rc2+ as well. I got a
panic a couple of days ago, but I haven't been able to reproduce it.

I can, however, reproduce the WARN_ON()s on my laptop by pinging an
external server at a high rate (a 0.1 s interval works well) while walking
in circles around the 2nd floor of a nearby library. (It's kind of
hilarious, but that's the only way I seem to be able to trigger it.)

I can confirm that Pavel's patch resolves the problem. Feel free to add:
Tested-by: Calvin Owens <jcalvinowens@gmail.com>

Thanks,
Calvin
