linux-wireless.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Toke Høiland-Jørgensen" <toke@toke.dk>
To: Peter Seiderer <ps.report@gmx.net>
Cc: Kalle Valo <kvalo@kernel.org>,
	linux-wireless@vger.kernel.org,
	Bagas Sanjaya <bagasdotme@gmail.com>,
	Johannes Berg <johannes@sipsolutions.net>
Subject: Re: [PATCH v5.18] ath9k: Properly clear TX status area before reporting to mac80211
Date: Mon, 04 Apr 2022 18:03:48 +0200	[thread overview]
Message-ID: <87o81gesyj.fsf@toke.dk> (raw)
In-Reply-To: <20220404130453.5ec6e045@gmx.net>

Peter Seiderer <ps.report@gmx.net> writes:

> Hello Toke,
>
> On Sat, 2 Apr 2022 18:19:10 +0200, Peter Seiderer <ps.report@gmx.net> wrote:
>
>> Hello Toke,
>> 
>> On Sat, 02 Apr 2022 17:11:54 +0200, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>> 
>> > Peter Seiderer <ps.report@gmx.net> writes:
>> >   
>> > > Hello Toke,
>> > >
>> > > On Fri, 1 Apr 2022 19:26:57 +0200, Peter Seiderer <ps.report@gmx.net> wrote:
>> > >    
>> > >> Hello Toke,
>> > >> 
>> > >> On Wed, 30 Mar 2022 18:44:09 +0200, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>> > >>     
>> > >> > The ath9k driver was not properly clearing the status area in the
>> > >> > ieee80211_tx_info struct before reporting TX status to mac80211. Instead,
>> > >> > it was manually filling in fields, which meant that fields introduced later
>> > >> > were left as-is.
>> > >> > 
>> > >> > Conveniently, mac80211 actually provides a helper to zero out the status
>> > >> > area, so use that to make sure we zero everything.
>> > >> > 
>> > >> > The last commit touching the driver function writing the status information
>> > >> > seems to have actually been fixing an issue that was also caused by the
>> > >> > area being uninitialised; but it only added clearing of a single field
>> > >> > instead of the whole struct. That is now redundant, though, so revert that
>> > >> > commit and use it as a convenient Fixes tag.
>> > >> > 
>> > >> > Fixes: cc591d77aba1 ("ath9k: Make sure to zero status.tx_time before reporting TX status")
>> > >> > Reported-by: Bagas Sanjaya <bagasdotme@gmail.com>
>> > >> > Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>> > >> > ---
>> > >> >  drivers/net/wireless/ath/ath9k/xmit.c | 5 ++---
>> > >> >  1 file changed, 2 insertions(+), 3 deletions(-)
>> > >> > 
>> > >> > diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
>> > >> > index d0caf1de2bde..cbcf96ac303e 100644
>> > >> > --- a/drivers/net/wireless/ath/ath9k/xmit.c
>> > >> > +++ b/drivers/net/wireless/ath/ath9k/xmit.c
>> > >> > @@ -2553,6 +2553,8 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
>> > >> >  	struct ath_hw *ah = sc->sc_ah;
>> > >> >  	u8 i, tx_rateindex;
>> > >> >  
>> > >> > +	ieee80211_tx_info_clear_status(tx_info);
>> > >> > +      
>> > >> 
>> > >> As this also clears the status.rates[].count, see include/net/mac80211.h:
>> > >> 
>> > >> 1195 static inline void
>> > >> 1196 ieee80211_tx_info_clear_status(struct ieee80211_tx_info *info)
>> > >> 1197 {
>> > >> 1198         int i;
>> > >> 1199         
>> > >> 1200         BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
>> > >> 1201                      offsetof(struct ieee80211_tx_info, control.rates));
>> > >> 1202         BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
>> > >> 1203                      offsetof(struct ieee80211_tx_info, driver_rates));
>> > >> 1204         BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) != 8)     ;       
>> > >> 1205         /* clear the rate counts */
>> > >> 1206         for (i = 0; i < IEEE80211_TX_MAX_RATES; i++)
>> > >> 1207                 info->status.rates[i].count = 0;
>> > >> 1208         memset_after(&info->status, 0, rates);
>> > >> 1209 }
>> > >> 
>> > >> I would have expected some lines added to restore the count (for the
>> > >> rates with index < tx_rateindex), e.g. as done in
>> > >> drivers/net/wireless/ath/ath5k/base.c:
>> > >> 
>> > >> 1731         ieee80211_tx_info_clear_status(info);
>> > >> 1732 
>> > >> 1733         for (i = 0; i < ts->ts_final_idx; i++) {
>> > >> 1734                 struct ieee80211_tx_rate *r =
>> > >> 1735                         &info->status.rates[i];
>> > >> 1736 
>> > >> 1737                 r->count = tries[i];
>> > >> 1738         }
>> > >> 
>> > >> In drivers/net/wireless/ath/ath9k/xmit.c this is only done/changed for
>> > >> the tx_rateindex index (which is often zero in case the first suggested rate
>> > >> succeeds, but in noisy environment is sometimes > 0)...
>> > >>     
>> > >> >  	if (txok)
>> > >> >  		tx_info->status.ack_signal = ts->ts_rssi;
>> > >> >  
>> > >> > @@ -2595,9 +2597,6 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
>> > >> >  	}
>> > >> >  
>> > >> >  	tx_info->status.rates[)].count = ts->ts_longretry + 1;
>> > >> > -
>> > >> > -	/* we report airtime in ath_tx_count_airtime(), don't report twice */
>> > >> > -	tx_info->status.tx_time = 0;
>> > >> >  }
>> > >> >  
>> > >> >  static void ath_tx_processq(struct ath_softc *sc, struct ath_txq *txq)      
>> > >> 
>> > >> 
>> > >> And from drivers/net/wireless/ath/ath9k/xmit.c:
>> > >> 
>> > >> 2592         for (i = tx_rateindex + 1; i < hw->max_rates; i++) {
>> > >> 2593                 tx_info->status.rates[i].count = 0;
>> > >> 2594                 tx_info->status.rates[i].idx = -1;
>> > >> 2595         }
>> > >> 
>> > >> Line 2593 can be deleted as status.rates[].count is already zeroed through the
>> > >> ieee80211_tx_info_clear_status() call...
>> > >> 
>> > >> And it should be sufficient to do:
>> > >> 
>> > >> 	if (tx_rateindex + 1 < hw->max_rates)
>> > >> 		tx_info->status.rates[tx_rateindex + 1].idx = -1;
>> > >> 
>> > >> Regards,
>> > >> Peter    
>> > >
>> > > And one additional problem found with your patch applied (to 5.16.18), the log
>> > > get spammed by:
>> > >
>> > > [  445.489139] ------------[ cut here ]------------
>> > > [  445.489150] WARNING: CPU: 0 PID: 8630 at drivers/net/wireless/ath/ath9k/xmit.c:174 ath_tx_complete+0x190/0x1a0 [ath9k]
>> > > [  445.489178] Modules linked in: ath9k ath9k_common ath9k_hw mac80211 ppp_async ppp_generic slhc xt_u32 xt_TCPMSS xt_tcpmss xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter iptable_filter ip6table_mangle ip6_tables cls_u32 sch_prio qmi_wwan cdc_wdm usbnet qcserial usb_wwan gpio_pca953x xt_tcpudp xt_mark iptable_mangle ip_tables x_tables dummy tqmx86_wdt watchdog i2c_machxo2 gpio_tqmx86 libarc4 ath tqmx86 mfd_core ftdi_sio cfg80211 usbserial nct7802 regmap_i2c [last unloaded: mac80211]
>> > > [  445.489298] CPU: 0 PID: 8630 Comm: modprobe Tainted: G        W         5.16.18+ #2
>> > > [  445.489307] Hardware name: TQ-Group TQMxE38M X64/Type2 - Board Product Name, BIOS TQMxE38M.5.4.48.0028.12 01/31/2017
>> > > [  445.489311] EIP: ath_tx_complete+0x190/0x1a0 [ath9k]
>> > > [  445.489328] Code: ff 75 e8 68 d7 a1 2c f8 89 55 ec e8 d5 4f e3 ff 8b 45 f0 8b 55 ec 83 c4 10 8b 80 58 0a 00 00 e9 a8 fe ff ff 8d b6 00 00 00 00 <0f> 0b c7 42 68 00 00 00 00 e9 1d ff ff ff 66 90 55 89 e5 57 56 89
>> > > [  445.489336] EAX: 00000000 EBX: c2e67900 ECX: c23cc040 EDX: c235a980
>> > > [  445.489342] ESI: c235a888 EDI: c2359860 EBP: c1089e18 ESP: c1089dfc
>> > > [  445.489348] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010297
>> > > [  445.489355] CR0: 80050033 CR2: b7f81000 CR3: 02ec7000 CR4: 001006d0
>> > > [  445.489362] Call Trace:
>> > > [  445.489366]  <SOFTIRQ>
>> > > [  445.489372]  ath_tx_complete_buf+0x100/0x130 [ath9k]
>> > > [  445.489390]  ath_tx_process_buffer+0x187/0xb40 [ath9k]
>> > > [  445.489407]  ? resched_curr+0x1d/0xc0
>> > > [  445.489419]  ? check_preempt_wakeup+0x115/0x250
>> > > [  445.489429]  ? task_fork_fair+0xc0/0x170
>> > > [  445.489436]  ? put_prev_task_fair+0x40/0x40
>> > > [  445.489444]  ? check_preempt_curr+0x62/0x70
>> > > [  445.489452]  ? __local_bh_enable_ip+0x33/0x80
>> > > [  445.489460]  ? _raw_spin_unlock_bh+0x13/0x20
>> > > [  445.489468]  ? ath_txq_unlock_complete+0x5a/0x100 [ath9k]
>> > > [  445.489485]  ? _raw_spin_unlock_irqrestore+0x16/0x30
>> > > [  445.489492]  ? try_to_wake_up+0x7c/0x550
>> > > [  445.489499]  ath_tx_tasklet+0x1de/0x2e0 [ath9k]
>> > > [  445.489519]  ath9k_tasklet+0x22f/0x330 [ath9k]
>> > > [  445.489535]  tasklet_action_common.constprop.0+0x89/0xb0
>> > > [  445.489544]  tasklet_action+0x21/0x30
>> > > [  445.489551]  __do_softirq+0xc5/0x28b
>> > > [  445.489559]  ? __entry_text_end+0x4/0x4
>> > > [  445.489567]  call_on_stack+0x40/0x50
>> > > [  445.489576]  </SOFTIRQ>
>> > > [  445.489580]  ? irq_exit_rcu+0x92/0x100
>> > > [  445.489587]  ? common_interrupt+0x27/0x40
>> > > [  445.489595]  ? asm_common_interrupt+0x102/0x140
>> > > [  445.489605] ---[ end trace 5c176b666255bca1 ]---
>> > > [  445.607206] ------------[ cut here ]------------
>> > >
>> > >
>> > > Could be fixed by the following additional change:
>> > >
>> > > --- a/drivers/net/wireless/ath/ath9k/xmit.c
>> > > +++ b/drivers/net/wireless/ath/ath9k/xmit.c
>> > > @@ -2511,9 +2511,12 @@ static void ath_tx_rc_status(struct ath_softc *sc, struct ath_buf *bf,
>> > >         struct ieee80211_hw *hw = sc->hw;
>> > >         struct ath_hw *ah = sc->sc_ah;
>> > >         u8 i, tx_rateindex;
>> > > +       bool is_valid_ack_signal = tx_info->status.is_valid_ack_signal;
>> > >
>> > >         ieee80211_tx_info_clear_status(tx_info);
>> > >
>> > > +       tx_info->status.is_valid_ack_signal = is_valid_ack_signal;
>> > > +
>> > >         if (txok)    
>> > 
>> > That doesn't seem right, zeroing is_valid_ack_signal was the whole point
>> > of the first patch... Something seems off in that backtrace; could you
>> > try running it through ./scripts/decode_stacktrace.sh please?  
>> 
>> Will do (and re-run the test with an fresh compiled kernel as CONFIG_DEBUG_KERNEL
>> was not set - but need to wait until next week as I need physical access to
>> the test system for kernel update)...
>
> Same with an fresh build (and with stacktrace run through decode_stacktrace.sh), but
> did some more investigations, added an debug print of tx_info->status.is_valid_ack_signal
> in ath_tx_rc_status() and an corresponding one in ath_txq_skb_done() printing out
> fi->txq, with the additional
>
> 	tx_info->status.is_valid_ack_signal = is_valid_ack_signal;
>
> line added showed the following output
>
> [  839.543175] XXX - ath_tx_rc_status() - is_valid_ack_signal: 2
> [  839.543195] XXX - ath_txq_skb_done() txq: 2
> [  839.617898] XXX - ath_tx_rc_status() - is_valid_ack_signal: 2
> [  839.617961] XXX - ath_txq_skb_done() txq: 2
> [  840.024845] XXX - ath_tx_rc_status() - is_valid_ack_signal: 2
> [  840.024863] XXX - ath_txq_skb_done() txq: 2
> [...]
>
> and without the following:
>
> [  924.614745] XXX - ath_tx_rc_status() - is_valid_ack_signal: 2
> [  924.614762] XXX - ath_txq_skb_done() txq: 0
> [  924.614805] ------------[ cut here ]------------
> [  924.614809] WARNING: CPU: 0 PID: 0 at drivers/net/wireless/ath/ath9k/xmit.c:176 ath_txq_skb_done.constprop.0+0x28/0x38 [ath9k]
> [...]
>
>
> Seems ieee80211_tx_info_clear_status() clears/destroys the dual use of the
> struct ieee80211_tx_info.status vs. ieee80211_tx_info.rate_driver_data...
>
> See drivers/net/wireless/ath/ath9k/xmit.c
>
>  140 static struct ath_frame_info *get_frame_info(struct sk_buff *skb)
>  141 {
>  142         struct ieee80211_tx_info *tx_info = IEEE80211_SKB_CB(skb);
>  143         BUILD_BUG_ON(sizeof(struct ath_frame_info) >
>  144                      sizeof(tx_info->rate_driver_data));
>  145         return (struct ath_frame_info *) &tx_info->rate_driver_data[0];
>  146 }
>
>
>  205 static void ath_txq_skb_done(struct ath_softc *sc, struct ath_txq *txq,
>  206                              struct sk_buff *skb)
>  207 {
>  208         struct ath_frame_info *fi = get_frame_info(skb);
>  209         int q = fi->txq;
>  210 
>  211         if (q < 0)
>  212                 return;
>  213 
>  214         txq = sc->tx.txq_map[q];
>  215         if (WARN_ON(--txq->pending_frames < 0))
>  216                 txq->pending_frames = 0;
>  217 
>  218 }
>
> Seems it only worked by chance (?) before...

Yikes, yeah, that's no good! Also, this comment above
ieee80211_tx_info_clear_status() is completely misleading:

 * NOTE: You can only use this function if you do NOT use
 *	 info->driver_data! Use info->rate_driver_data
 *	 instead if you need only the less space that allows.

(I read that, and figured that the current use was OK, but nope, guess
not!)

The ath_frame_info struct is small enough to fit into the
status_driver_data field, though, so guess we could just move it there.
However, ieee80211_tx_info_clear_status() also clears that AFAICT, so
we'll need to save & restore it? That seems a bit odd, is this
intentional Johannes?

-Toke

      reply	other threads:[~2022-04-04 21:16 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-30 16:44 [PATCH v5.18] ath9k: Properly clear TX status area before reporting to mac80211 Toke Høiland-Jørgensen
2022-03-31  5:06 ` Bagas Sanjaya
2022-03-31  5:18   ` Bagas Sanjaya
2022-03-31  5:36   ` Kalle Valo
2022-03-31  8:35     ` Toke Høiland-Jørgensen
2022-03-31  9:31       ` Kalle Valo
2022-04-01  6:37 ` [v5.18] " Kalle Valo
2022-04-01 17:26 ` [PATCH v5.18] " Peter Seiderer
2022-04-02 12:00   ` Toke Høiland-Jørgensen
2022-04-02 14:33   ` Peter Seiderer
2022-04-02 15:11     ` Toke Høiland-Jørgensen
2022-04-02 16:19       ` Peter Seiderer
2022-04-04 11:04         ` Peter Seiderer
2022-04-04 16:03           ` Toke Høiland-Jørgensen [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87o81gesyj.fsf@toke.dk \
    --to=toke@toke.dk \
    --cc=bagasdotme@gmail.com \
    --cc=johannes@sipsolutions.net \
    --cc=kvalo@kernel.org \
    --cc=linux-wireless@vger.kernel.org \
    --cc=ps.report@gmx.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).