* ath10k performance, master branch from 20160407
From: Roman Yeryomin @ 2016-04-08 14:44 UTC
To: ath10k

Hello!

I saw that the performance patches were committed, so I decided to give
them a try (using a 4.1 kernel and backports).
The results are quite disappointing: TCP download (client pov) dropped
from 750Mbps to ~550Mbps, and UDP shows completely weird behaviour: when
generating 900Mbps it gives 30Mbps max, and when generating 300Mbps it
gives 250Mbps. Before (latest official backports release from January) I
was able to get 900Mbps.
Hardware is basically ap152 + qca988x 3x3.
When running perf top I see that fq_codel_drop eats a lot of CPU.
Here is the output when running the iperf3 UDP test:

    45.78%  [kernel]       [k] fq_codel_drop
     3.05%  [kernel]       [k] ag71xx_poll
     2.18%  [kernel]       [k] skb_release_data
     2.01%  [kernel]       [k] r4k_dma_cache_inv
     1.73%  [kernel]       [k] eth_type_trans
     1.24%  [kernel]       [k] build_skb
     1.20%  [mac80211]     [k] ieee80211_tx_dequeue
     1.03%  [kernel]       [k] __delay
     0.98%  [kernel]       [k] fq_codel_enqueue
     0.94%  [kernel]       [k] __netif_receive_skb_core
     0.93%  [kernel]       [k] skb_release_head_state
     0.88%  [ath10k_core]  [k] ath10k_htt_tx
     0.87%  [kernel]       [k] __dev_queue_xmit
     0.84%  [mac80211]     [k] ieee80211_tx_status
     0.81%  [kernel]       [k] __build_skb
     0.80%  [mac80211]     [k] __ieee80211_subif_start_xmit
     0.77%  [kernel]       [k] br_handle_frame_finish
     0.75%  [kernel]       [k] __qdisc_run
     0.73%  [kernel]       [k] skb_recycler_consume
     0.72%  [kernel]       [k] kfree_skb
     0.72%  [kernel]       [k] get_page_from_freelist
     0.69%  [kernel]       [k] br_fdb_update
     0.69%  [kernel]       [k] br_handle_frame
     0.67%  [kernel]       [k] __copy_user_common
     0.66%  [kernel]       [k] __skb_flow_dissect
     0.65%  [ath10k_core]  [k] ath10k_txrx_tx_unref
     0.60%  [kernel]       [k] kmem_cache_alloc
     0.60%  [mac80211]     [k] sta_addr_hash
     0.56%  [kernel]       [k] fq_codel_dequeue
     0.53%  [kernel]       [k] __local_bh_enable_ip
     0.50%  [kernel]       [k] __br_fdb_get

What could be the reason?
I've seen there are some patches from Michal which touch fq_codel; would
those help or not?

Regards,
Roman

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k
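To put a number on how much of the profile the qdisc accounts for, the perf top listing from the report can be summed per symbol prefix. The following is only a sketch: the function names `parse_perf_top` and `share_for_prefix` are made up for illustration, and the embedded sample is a subset of the rows quoted in the report, not a fresh measurement.

```python
import re

# Subset of the perf top rows quoted in the report (illustration only).
PERF_TOP = """\
45.78% [kernel] [k] fq_codel_drop
3.05% [kernel] [k] ag71xx_poll
1.20% [mac80211] [k] ieee80211_tx_dequeue
0.98% [kernel] [k] fq_codel_enqueue
0.88% [ath10k_core] [k] ath10k_htt_tx
0.56% [kernel] [k] fq_codel_dequeue
"""

# One row looks like: "<percent>% [<module>] [<k|.>] <symbol>"
LINE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s+\[([^\]]+)\]\s+\[\w\]\s+(\S+)\s*$")


def parse_perf_top(text):
    """Return a list of (percent, module, symbol) tuples; skip unparsable lines."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            rows.append((float(match.group(1)), match.group(2), match.group(3)))
    return rows


def share_for_prefix(rows, prefix):
    """Sum the percentages of all symbols whose name starts with `prefix`."""
    return round(sum(pct for pct, _module, sym in rows if sym.startswith(prefix)), 2)


rows = parse_perf_top(PERF_TOP)
fq_share = share_for_prefix(rows, "fq_codel_")
print(f"fq_codel_* share of samples: {fq_share}%")
```

With the three fq_codel symbols from the report this adds up to roughly 47% of the samples, nearly all of it in fq_codel_drop, which is what makes the drop path the first suspect.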
* Re: ath10k performance, master branch from 20160407
From: Manoharan, Rajkumar @ 2016-04-08 15:34 UTC
To: Roman Yeryomin, ath10k, Rajkumar Manoharan

Roman,

Which backports version are you using? I don't see any codel changes in
ath.git/wireless-drivers.git.
I hope you are using the same firmware.

-Rajkumar
* Re: ath10k performance, master branch from 20160407
From: Roman Yeryomin @ 2016-04-08 16:00 UTC
To: Manoharan, Rajkumar; +Cc: Rajkumar Manoharan, ath10k

Rajkumar,

I took backports from
git://git.kernel.org/pub/scm/linux/kernel/git/backports/backports.git,
took the latest ath tree from
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git, generated
backports-output based on the ath master branch, and refreshed the
openwrt patches.
After that I saw a big performance degradation. Am I doing something wrong?

Regards,
Roman
* RE: ath10k performance, master branch from 20160407
From: Manoharan, Rajkumar @ 2016-04-08 16:41 UTC
To: Roman Yeryomin; +Cc: Rajkumar Manoharan, ath10k

That should be fine. Is codel running only with the latest backports? Are
there any openwrt changes to configure codel? Can you please try resetting
the master branch to an older commit and validate?

-Rajkumar
* Re: ath10k performance, master branch from 20160407
From: Roman Yeryomin @ 2016-04-08 17:19 UTC
To: Manoharan, Rajkumar; +Cc: Rajkumar Manoharan, ath10k

The latest released backports (compat-wireless, 20160110) has codel
enabled (CPTCFG_NET_SCH_FQ_CODEL=y), and there are no openwrt patches
or special configuration for codel. And it runs OK.
How old a commit do you want me to try?

Regards,
Roman
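Since the discussion hinges on whether CPTCFG_NET_SCH_FQ_CODEL is really set in a given build, it may help to check the generated backports .config rather than rely on memory. This is a minimal sketch, assuming the file follows the usual Kconfig layout (`SYMBOL=value` lines and `# SYMBOL is not set` comments); the helper name `kconfig_symbol_state` and the sample text are hypothetical, not taken from any real build.

```python
def kconfig_symbol_state(config_text, symbol):
    """Return the value of `symbol` (e.g. 'y' or 'm'), or None if unset or absent.

    Assumes Kconfig-style text: 'SYMBOL=value' or '# SYMBOL is not set'.
    """
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == f"# {symbol} is not set":
            return None
        if line.startswith(symbol + "="):
            return line.split("=", 1)[1]
    return None


# Hypothetical sample, only to show the expected shape of the file.
SAMPLE_CONFIG = """\
CPTCFG_MAC80211=m
CPTCFG_ATH10K=m
CPTCFG_NET_SCH_FQ_CODEL=y
"""

state = kconfig_symbol_state(SAMPLE_CONFIG, "CPTCFG_NET_SCH_FQ_CODEL")
print("CPTCFG_NET_SCH_FQ_CODEL:", state if state is not None else "not set")
```

In practice the text would be read from the .config of the backports build under test, so that the "with codel" and "without codel" runs can be confirmed to differ in that one symbol.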
* Re: ath10k performance, master branch from 20160407
From: Manoharan, Rajkumar @ 2016-04-09 4:02 UTC
To: Roman Yeryomin; +Cc: Rajkumar Manoharan, ath10k

Roman,

I need your help to bisect the regression point. Can you try without
CPTCFG_NET_SCH_FQ_CODEL?
If that does not help, try reverting the commits below, which are the major
changes in the data path.
Instead of regenerating backports, apply the revert commits on top of your
backports.

ath10k: combine txrx and replenish task
ath10k: reuse copy engine 5 (htt rx) descriptors
ath10k: cleanup copy engine receive next completion
ath10k: register ath10k_htt_htc_t2h_msg_handler
ath10k: speedup htt rx descriptor processing for rx_ind
ath10k: cleanup amsdu processing for rx indication
ath10k: remove unused fw_desc processing
ath10k: copy tx fetch indication message
ath10k: speedup htt rx descriptor processing for tx completion
ath10k: fix null deref if device crashes early
ath10k: fix pull-push tx threshold handling
ath10k: fix tx hang
ath10k: move mgmt descriptor limit handle under mgmt_tx
ath10k: change htt tx desc/qcache peer limit config
ath10k: fix HTT Tx CE ring size
ath10k: implement push-pull tx
ath10k: keep track of queue depth per txq
ath10k: store txq in skb_cb
ath10k: implement updating shared htt txq state
ath10k: implement wake_tx_queue
ath10k: add new htt message generation/parsing logic
ath10k: add fast peer_map lookup
ath10k: maintain peer_id for each sta and vif
ath10k: refactor tx pending management
ath10k: unify txpath decision
ath10k: refactor tx code

-Rajkumar
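The bisection requested in this message does not need one build per commit. As a rough sketch, a binary search over the 26 listed subjects needs at most five build-and-measure rounds. Everything below is an illustration under stated assumptions: the list is taken to be newest-first (as in git log output), the state before the oldest commit is assumed good and the tip bad, and the "culprit" used in the demo is purely hypothetical, chosen only to exercise the search. In a real run `is_bad` would mean "build backports at this commit, run the iperf3 test, and compare throughput".

```python
# Commit subjects as listed in the message, assumed newest first.
COMMITS_NEWEST_FIRST = [
    "ath10k: combine txrx and replenish task",
    "ath10k: reuse copy engine 5 (htt rx) descriptors",
    "ath10k: cleanup copy engine receive next completion",
    "ath10k: register ath10k_htt_htc_t2h_msg_handler",
    "ath10k: speedup htt rx descriptor processing for rx_ind",
    "ath10k: cleanup amsdu processing for rx indication",
    "ath10k: remove unused fw_desc processing",
    "ath10k: copy tx fetch indication message",
    "ath10k: speedup htt rx descriptor processing for tx completion",
    "ath10k: fix null deref if device crashes early",
    "ath10k: fix pull-push tx threshold handling",
    "ath10k: fix tx hang",
    "ath10k: move mgmt descriptor limit handle under mgmt_tx",
    "ath10k: change htt tx desc/qcache peer limit config",
    "ath10k: fix HTT Tx CE ring size",
    "ath10k: implement push-pull tx",
    "ath10k: keep track of queue depth per txq",
    "ath10k: store txq in skb_cb",
    "ath10k: implement updating shared htt txq state",
    "ath10k: implement wake_tx_queue",
    "ath10k: add new htt message generation/parsing logic",
    "ath10k: add fast peer_map lookup",
    "ath10k: maintain peer_id for each sta and vif",
    "ath10k: refactor tx pending management",
    "ath10k: unify txpath decision",
    "ath10k: refactor tx code",
]


def first_bad(commits_oldest_first, is_bad):
    """Binary-search for the first commit where is_bad(commit) is True.

    Assumes the last commit is bad and that, once a commit is bad, every
    later commit is bad too. Returns (commit, number_of_tests).
    """
    lo, hi = 0, len(commits_oldest_first) - 1
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(commits_oldest_first[mid]):
            hi = mid          # regression is at mid or earlier
        else:
            lo = mid + 1      # regression was introduced after mid
    return commits_oldest_first[lo], tests


oldest_first = list(reversed(COMMITS_NEWEST_FIRST))

# HYPOTHETICAL culprit, for demonstration only; the thread does not identify one.
demo_culprit = "ath10k: implement wake_tx_queue"
demo_index = oldest_first.index(demo_culprit)

found, builds = first_bad(oldest_first, lambda c: oldest_first.index(c) >= demo_index)
print(f"first bad: {found!r} after {builds} test builds")
```

The same halving is what `git bisect` automates on a real tree; the sketch only shows why the list of 26 candidates is manageable even when each test means regenerating backports and flashing an image.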
* Re: ath10k performance, master branch from 20160407
From: Roman Yeryomin @ 2016-04-13 12:44 UTC
To: Manoharan, Rajkumar; +Cc: Rajkumar Manoharan, ath10k

Rajkumar, sorry for the delay, I will try today.
* Re: ath10k performance, master branch from 20160407 2016-04-09 4:02 ` Manoharan, Rajkumar 2016-04-13 12:44 ` Roman Yeryomin @ 2016-04-17 9:28 ` Roman Yeryomin 2016-04-17 15:06 ` Manoharan, Rajkumar 1 sibling, 1 reply; 23+ messages in thread From: Roman Yeryomin @ 2016-04-17 9:28 UTC (permalink / raw) To: Manoharan, Rajkumar; +Cc: Rajkumar Manoharan, ath10k Rajkumar, Somehow unseting CPTCFG_NET_SCH_FQ_CODEL didn't change anything and the patches you listed didn't revert cleanly, I gave up on 3rd dependent patch somewhere in the middle and just reset master to 89ef41bfaa46f24a14b776f1cd78c0e0b39e54ce, which is the last commit just before "ath10k: refactor tx code", and generated new backports. The result is that it has same performance as before. But I guess it is not a very good test as there were many changes to mac80211 too. So what do you want me to try next? Maybe you could provide a more precise list to revert? Regards, Roman On 9 April 2016 at 07:02, Manoharan, Rajkumar <rmanohar@qti.qualcomm.com> wrote: > Roman, > > Need your help to bisect regression point. Can you try w/o CPTCFG_NET_SCH_FQ_CODEL? > If it does not help, try reverting below commits which are major changes in data path. > Instead of generating backports, apply revert commit on top your backports. 
>
> ath10k: combine txrx and replenish task
> ath10k: reuse copy engine 5 (htt rx) descriptors
> ath10k: cleanup copy engine receive next completion
> ath10k: register ath10k_htt_htc_t2h_msg_handler
> ath10k: speedup htt rx descriptor processing for rx_ind
> ath10k: cleanup amsdu processing for rx indication
> ath10k: remove unused fw_desc processing
> ath10k: copy tx fetch indication message
> ath10k: speedup htt rx descriptor processing for tx completion
> ath10k: fix null deref if device crashes early
> ath10k: fix pull-push tx threshold handling
> ath10k: fix tx hang
> ath10k: move mgmt descriptor limit handle under mgmt_tx
> ath10k: change htt tx desc/qcache peer limit config
> ath10k: fix HTT Tx CE ring size
> ath10k: implement push-pull tx
> ath10k: keep track of queue depth per txq
> ath10k: store txq in skb_cb
> ath10k: implement updating shared htt txq state
> ath10k: implement wake_tx_queue
> ath10k: add new htt message generation/parsing logic
> ath10k: add fast peer_map lookup
> ath10k: maintain peer_id for each sta and vif
> ath10k: refactor tx pending management
> ath10k: unify txpath decision
> ath10k: refactor tx code
>
> -Rajkumar
> ________________________________________
> From: Roman Yeryomin <leroi.lists@gmail.com>
> Sent: Friday, April 8, 2016 10:49 PM
> To: Manoharan, Rajkumar
> Cc: ath10k@lists.infradead.org; Rajkumar Manoharan
> Subject: Re: ath10k performance, master branch from 20160407
>
> Latest backports (compat-wireless) released (20160110) has codel
> enabled (CPTCFG_NET_SCH_FQ_CODEL=y) and there are no openwrt patches
> or special configuration for codel. And it runs ok.
> How old commit do you want me to try?
>
> Regards,
> Roman
>
> On 8 April 2016 at 19:41, Manoharan, Rajkumar <rmanohar@qti.qualcomm.com> wrote:
>> That should be fine. Is codel running only for latest backports? Are there any openwrt changes to configure codel? Can you plz try to reset master branch to older commit and validate?
>>
>> -Rajkumar
>> ________________________________________
>> From: Roman Yeryomin [leroi.lists@gmail.com]
>> Sent: Friday, April 8, 2016 9:30 PM
>> To: Manoharan, Rajkumar
>> Cc: ath10k@lists.infradead.org; Rajkumar Manoharan
>> Subject: Re: ath10k performance, master branch from 20160407
>>
>> Rajkumar,
>>
>> I took backports from
>> git://git.kernel.org/pub/scm/linux/kernel/git/backports/backports.git,
>> took latest ath tree from
>> git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git, generated
>> backports-output based on ath master branch, refreshed openwrt
>> patches.
>> And saw big performance degradation. Am I doing something wrong?
>>
>> Regards,
>> Roman
>>
>> On 8 April 2016 at 18:34, Manoharan, Rajkumar <rmanohar@qti.qualcomm.com> wrote:
>>> Roman,
>>>
>>> Which backports version are you using? I don't see codel changes in ath.git/wireless-drivers.git.
>>> Hope you are using same firmware.
>>>
>>> -Rajkumar
>>> ________________________________________
>>> From: ath10k <ath10k-bounces@lists.infradead.org> on behalf of Roman Yeryomin <leroi.lists@gmail.com>
>>> Sent: Friday, April 8, 2016 8:14 PM
>>> To: ath10k@lists.infradead.org
>>> Subject: ath10k performance, master branch from 20160407
>>>
>>> Hello!
>>>
>>> I've seen performance patches were commited so I've decided to give it
>>> a try (using 4.1 kernel and backports).
>>> The results are quite disappointing: TCP download (client pov) dropped
>>> from 750Mbps to ~550 and UDP shows completely weird behavour - if
>>> generating 900Mbps it gives 30Mbps max, if generating 300Mbps it gives
>>> 250Mbps, before (latest official backports release from January) I was
>>> able to get 900Mbps.
>>> Hardware is basically ap152 + qca988x 3x3.
>>> When running perf top I see that fq_codel_drop eats a lot of cpu.
>>> Here is the output when running iperf3 UDP test:
>>>
>>> 45.78% [kernel] [k] fq_codel_drop
>>> 3.05% [kernel] [k] ag71xx_poll
>>> 2.18% [kernel] [k] skb_release_data
>>> 2.01% [kernel] [k] r4k_dma_cache_inv
>>> 1.73% [kernel] [k] eth_type_trans
>>> 1.24% [kernel] [k] build_skb
>>> 1.20% [mac80211] [k] ieee80211_tx_dequeue
>>> 1.03% [kernel] [k] __delay
>>> 0.98% [kernel] [k] fq_codel_enqueue
>>> 0.94% [kernel] [k] __netif_receive_skb_core
>>> 0.93% [kernel] [k] skb_release_head_state
>>> 0.88% [ath10k_core] [k] ath10k_htt_tx
>>> 0.87% [kernel] [k] __dev_queue_xmit
>>> 0.84% [mac80211] [k] ieee80211_tx_status
>>> 0.81% [kernel] [k] __build_skb
>>> 0.80% [mac80211] [k] __ieee80211_subif_start_xmit
>>> 0.77% [kernel] [k] br_handle_frame_finish
>>> 0.75% [kernel] [k] __qdisc_run
>>> 0.73% [kernel] [k] skb_recycler_consume
>>> 0.72% [kernel] [k] kfree_skb
>>> 0.72% [kernel] [k] get_page_from_freelist
>>> 0.69% [kernel] [k] br_fdb_update
>>> 0.69% [kernel] [k] br_handle_frame
>>> 0.67% [kernel] [k] __copy_user_common
>>> 0.66% [kernel] [k] __skb_flow_dissect
>>> 0.65% [ath10k_core] [k] ath10k_txrx_tx_unref
>>> 0.60% [kernel] [k] kmem_cache_alloc
>>> 0.60% [mac80211] [k] sta_addr_hash
>>> 0.56% [kernel] [k] fq_codel_dequeue
>>> 0.53% [kernel] [k] __local_bh_enable_ip
>>> 0.50% [kernel] [k] __br_fdb_get
>>>
>>> What could be the reason?
>>> I've seen there are some patches from Michal which touch fq_codel,
>>> would those help or not?
>>>
>>>
>>> Regards,
>>> Roman
>>>
>>> _______________________________________________
>>> ath10k mailing list
>>> ath10k@lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/ath10k

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

^ permalink raw reply [flat|nested] 23+ messages in thread
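As an aside to the perf top output quoted in this thread: the combined share of the fq_codel symbols can be tallied with a short script. This is only a sketch. The sample text is an excerpt of the rows reported in the original message, and the helper names `parse_perf_top` and `share_by_prefix` are hypothetical, made up for this illustration; they are not part of perf or any ath10k tooling.

```python
import re

# Excerpt of the "perf top" rows reported in the thread (not the full table).
PERF_TOP = """\
45.78% [kernel] [k] fq_codel_drop
3.05% [kernel] [k] ag71xx_poll
2.18% [kernel] [k] skb_release_data
1.20% [mac80211] [k] ieee80211_tx_dequeue
0.98% [kernel] [k] fq_codel_enqueue
0.88% [ath10k_core] [k] ath10k_htt_tx
0.56% [kernel] [k] fq_codel_dequeue
"""

# One row looks like: "<percent>% [<module>] [<k|.>] <symbol>"
LINE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s+\[(\S+)\]\s+\[\S\]\s+(\S+)\s*$")


def parse_perf_top(text):
    """Return a list of (percent, module, symbol) tuples; skip other lines."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            rows.append((float(match.group(1)), match.group(2), match.group(3)))
    return rows


def share_by_prefix(rows, prefix):
    """Sum the CPU share of all symbols whose name starts with `prefix`."""
    return sum(percent for percent, _module, symbol in rows
               if symbol.startswith(prefix))


if __name__ == "__main__":
    rows = parse_perf_top(PERF_TOP)
    print(f"fq_codel_* share: {share_by_prefix(rows, 'fq_codel_'):.2f}%")
```

On the excerpt above this prints a share of 47.32%, i.e. fq_codel_drop alone accounts for nearly all of the fq_codel time, which matches the "45% cpu" figure discussed in the replies.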
* Re: ath10k performance, master branch from 20160407 2016-04-17 9:28 ` Roman Yeryomin @ 2016-04-17 15:06 ` Manoharan, Rajkumar 2016-04-17 23:03 ` Roman Yeryomin 0 siblings, 1 reply; 23+ messages in thread
From: Manoharan, Rajkumar @ 2016-04-17 15:06 UTC (permalink / raw)
To: Roman Yeryomin; +Cc: Rajkumar Manoharan, ath10k

Roman,

Hmm.. I just listed ath10k changes alone. So there might be some dependencies.

In your earlier mail fq_codel_drop was consuming 45% cpu. Have you observed any
improvement after switching off NET_SCH_FQ_CODEL? Did CPU usage go down?

Please try to revert the commit "ath10k: combine txrx and replenish task" alone. If you still
see the same behavior (lower numbers), reset the master branch to "ath10k: fix pull-push tx
threshold handling" and generate backports.

Please make sure that codel is always switched off until the regression point is root-caused.

-Rajkumar

________________________________________
From: Roman Yeryomin <leroi.lists@gmail.com>
Sent: Sunday, April 17, 2016 2:58 PM
To: Manoharan, Rajkumar
Cc: ath10k@lists.infradead.org; Rajkumar Manoharan
Subject: Re: ath10k performance, master branch from 20160407

Rajkumar,

Somehow unseting CPTCFG_NET_SCH_FQ_CODEL didn't change anything and
the patches you listed didn't revert cleanly, I gave up on 3rd
dependent patch somewhere in the middle and just reset master to
89ef41bfaa46f24a14b776f1cd78c0e0b39e54ce, which is the last commit
just before "ath10k: refactor tx code", and generated new backports.
The result is that it has same performance as before. But I guess it
is not a very good test as there were many changes to mac80211 too.

So what do you want me to try next? Maybe you could provide a more
precise list to revert?

Regards,
Roman

On 9 April 2016 at 07:02, Manoharan, Rajkumar <rmanohar@qti.qualcomm.com> wrote:
> Roman,
>
> Need your help to bisect regression point. Can you try w/o CPTCFG_NET_SCH_FQ_CODEL?
* Re: ath10k performance, master branch from 20160407 2016-04-17 15:06 ` Manoharan, Rajkumar @ 2016-04-17 23:03 ` Roman Yeryomin 2016-04-18 13:00 ` Roman Yeryomin 2016-04-20 9:03 ` Michal Kazior 0 siblings, 2 replies; 23+ messages in thread
From: Roman Yeryomin @ 2016-04-17 23:03 UTC (permalink / raw)
To: Manoharan, Rajkumar; +Cc: Rajkumar Manoharan, ath10k

Rajkumar,

ok, I've ended up resolving (seems to be trivial) conflicts in revert
list you provided (see comments inlined).
Performance restored and codel symbols are gone from perf top.
Will try reverting "ath10k: combine txrx and replenish task" alone and
then, if that doesn't help, resetting reverts by patch sets.

Regards,
Roman

On 17 April 2016 at 18:06, Manoharan, Rajkumar <rmanohar@qti.qualcomm.com> wrote:
> Roman,
>
> Hmm.. I just listed ath10k changes alone. So there might be some dependencies.

there were ath10k conflicts, please see below

> In your earlier mail fq_codel_drop was consuming 45% cpu. Have you observed any
> improvement after switching off NET_SCH_FQ_CODEL? Had CPU usage gone down?

CPU usage didn't go down after simply turning off
CPTCFG_NET_SCH_FQ_CODEL under compat wireless (and yes, I verified it
was off in the config after recompilation).
But still I'm not sure it's really off. Turning it off both in kernel
config and compat-wireless doesn't seem to have effect. I didn't dig
deeper into this but it looks I didn't find a correct way to turn it
off completely.

Not sure if I stated it correctly: after resetting to
89ef41bfaa46f24a14b776f1cd78c0e0b39e54ce I got same (good enough)
performance as with latest compat-wireless release (20160110).

> Please try to revert the commit "ath10k: combine txrx and replenish task" alone. If you still
> see same behavior (lower numbers), reset master branch to till "ath10k: fix pull-push tx
> threshold handling" and generate backports.
>
> Please make sure that codel is switched off always until regression point is root caused.
>
> -Rajkumar
>
> ________________________________________
> From: Roman Yeryomin <leroi.lists@gmail.com>
> Sent: Sunday, April 17, 2016 2:58 PM
> To: Manoharan, Rajkumar
> Cc: ath10k@lists.infradead.org; Rajkumar Manoharan
> Subject: Re: ath10k performance, master branch from 20160407
>
> Rajkumar,
>
> Somehow unseting CPTCFG_NET_SCH_FQ_CODEL didn't change anything and
> the patches you listed didn't revert cleanly, I gave up on 3rd
> dependent patch somewhere in the middle and just reset master to
> 89ef41bfaa46f24a14b776f1cd78c0e0b39e54ce, which is the last commit
> just before "ath10k: refactor tx code", and generated new backports.
> The result is that it has same performance as before. But I guess it
> is not a very good test as there were many changes to mac80211 too.
>
> So what do you want me to try next? Maybe you could provide a more
> precise list to revert?
>
>
> Regards,
> Roman
>
> On 9 April 2016 at 07:02, Manoharan, Rajkumar <rmanohar@qti.qualcomm.com> wrote:
>> Roman,
>>
>> Need your help to bisect regression point. Can you try w/o CPTCFG_NET_SCH_FQ_CODEL?
>> If it does not help, try reverting below commits which are major changes in data path.
>> Instead of generating backports, apply revert commit on top your backports.
>>
>> ath10k: combine txrx and replenish task
>> ath10k: reuse copy engine 5 (htt rx) descriptors
>> ath10k: cleanup copy engine receive next completion
>> ath10k: register ath10k_htt_htc_t2h_msg_handler
>> ath10k: speedup htt rx descriptor processing for rx_ind

this depends on 689de38e37179c6f524dd003e1dae92042f8f5cd

>> ath10k: cleanup amsdu processing for rx indication
>> ath10k: remove unused fw_desc processing
>> ath10k: copy tx fetch indication message
>> ath10k: speedup htt rx descriptor processing for tx completion
>> ath10k: fix null deref if device crashes early
>> ath10k: fix pull-push tx threshold handling
>> ath10k: fix tx hang
>> ath10k: move mgmt descriptor limit handle under mgmt_tx

error: could not revert cac0855... ath10k: move mgmt descriptor limit handle under mgmt_tx
Not even sure why it fails here, pretty trivial to resolve but still...

>> ath10k: change htt tx desc/qcache peer limit config

error: could not revert 99ad1cb... ath10k: change htt tx desc/qcache peer limit config
ook, resolved, hope correctly

>> ath10k: fix HTT Tx CE ring size
>> ath10k: implement push-pull tx
>> ath10k: keep track of queue depth per txq
>> ath10k: store txq in skb_cb
>> ath10k: implement updating shared htt txq state
>> ath10k: implement wake_tx_queue

depends on 9d71d47eed20f34620e54e29bcc90f959d5873b8 and
750eeed89cf3c466df302e4707491b015531e26c
all three fail to revert cleanly

>> ath10k: add new htt message generation/parsing logic

fails to revert cleanly

>> ath10k: add fast peer_map lookup
>> ath10k: maintain peer_id for each sta and vif
>> ath10k: refactor tx pending management
>> ath10k: unify txpath decision
>> ath10k: refactor tx code
>>
>> -Rajkumar
>> ________________________________________
>> From: Roman Yeryomin <leroi.lists@gmail.com>
>> Sent: Friday, April 8, 2016 10:49 PM
>> To: Manoharan, Rajkumar
>> Cc: ath10k@lists.infradead.org; Rajkumar Manoharan
>> Subject: Re: ath10k performance, master branch from 20160407
>>
>> Latest backports (compat-wireless) released (20160110) has codel
>> enabled (CPTCFG_NET_SCH_FQ_CODEL=y) and there are no openwrt patches
>> or special configuration for codel. And it runs ok.
>> How old commit do you want me to try?
>>
>> Regards,
>> Roman
>>
>> On 8 April 2016 at 19:41, Manoharan, Rajkumar <rmanohar@qti.qualcomm.com> wrote:
>>> That should be fine. Is codel running only for latest backports? Are there any openwrt changes to configure codel? Can you plz try to reset master branch to older commit and validate?
>>>
>>> -Rajkumar
>>> ________________________________________
>>> From: Roman Yeryomin [leroi.lists@gmail.com]
>>> Sent: Friday, April 8, 2016 9:30 PM
>>> To: Manoharan, Rajkumar
>>> Cc: ath10k@lists.infradead.org; Rajkumar Manoharan
>>> Subject: Re: ath10k performance, master branch from 20160407
>>>
>>> Rajkumar,
>>>
>>> I took backports from
>>> git://git.kernel.org/pub/scm/linux/kernel/git/backports/backports.git,
>>> took latest ath tree from
>>> git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git, generated
>>> backports-output based on ath master branch, refreshed openwrt
>>> patches.
>>> And saw big performance degradation. Am I doing something wrong?
>>>
>>> Regards,
>>> Roman
>>>
>>> On 8 April 2016 at 18:34, Manoharan, Rajkumar <rmanohar@qti.qualcomm.com> wrote:
>>>> Roman,
>>>>
>>>> Which backports version are you using? I don't see codel changes in ath.git/wireless-drivers.git.
>>>> Hope you are using same firmware.
>>>>
>>>> -Rajkumar
>>>> ________________________________________
>>>> From: ath10k <ath10k-bounces@lists.infradead.org> on behalf of Roman Yeryomin <leroi.lists@gmail.com>
>>>> Sent: Friday, April 8, 2016 8:14 PM
>>>> To: ath10k@lists.infradead.org
>>>> Subject: ath10k performance, master branch from 20160407
>>>>
>>>> Hello!
>>>>
>>>> I've seen performance patches were commited so I've decided to give it
>>>> a try (using 4.1 kernel and backports).
* Re: ath10k performance, master branch from 20160407 2016-04-17 23:03 ` Roman Yeryomin @ 2016-04-18 13:00 ` Roman Yeryomin 2016-04-19 5:28 ` Michal Kazior 2016-04-20 9:03 ` Michal Kazior 1 sibling, 1 reply; 23+ messages in thread
From: Roman Yeryomin @ 2016-04-18 13:00 UTC (permalink / raw)
To: Manoharan, Rajkumar; +Cc: Rajkumar Manoharan, ath10k, michal.kazior

So it looks like Michal's patch set "ath10k: implement push-pull tx
model" introduced this regression - after restoring it from reverts
fq_codel_drop is hungry again.
Any ideas how to fix?

Regards,
Roman

On 18 April 2016 at 02:03, Roman Yeryomin <leroi.lists@gmail.com> wrote:
> Rajkumar,
>
> ok, I've ended up resolving (seems to be trivial) conflicts in revert
> list you provided (see comments inlined).
> Performance restored and codel symbols are gone from perf top.
> Will try reverting "ath10k: combine txrx and replenish task" alone and
> then, if that doesn't help, resetting reverts by patch sets.
>
> Regards,
> Roman
>
> On 17 April 2016 at 18:06, Manoharan, Rajkumar
> <rmanohar@qti.qualcomm.com> wrote:
>> Roman,
>>
>> Hmm.. I just listed ath10k changes alone. So there might be some dependencies.
>
> there were ath10k conflicts, please see below
>
>> In your earlier mail fq_codel_drop was consuming 45% cpu. Have you observed any
>> improvement after switching off NET_SCH_FQ_CODEL? Had CPU usage gone down?
>
> CPU usage didn't go down after simply turning off
> CPTCFG_NET_SCH_FQ_CODEL under compat wireless (and yes, I verified it
> was off in the config after recompilation).
> But still I'm not sure it's really off. Turning it off both in kernel
> config and compat-wireless doesn't seem to have effect. I didn't dig
> deeper into this but it looks I didn't find a correct way to turn it
> off completely.
>
> Not sure if I stated it correctly: after resetting to
> 89ef41bfaa46f24a14b776f1cd78c0e0b39e54ce I got same (good enough)
> performance as with latest compat-wireless release (20160110).
* Re: ath10k performance, master branch from 20160407 2016-04-18 13:00 ` Roman Yeryomin @ 2016-04-19 5:28 ` Michal Kazior 2016-04-19 7:31 ` Roman Yeryomin 2016-04-22 17:02 ` Roman Yeryomin 0 siblings, 2 replies; 23+ messages in thread
From: Michal Kazior @ 2016-04-19 5:28 UTC (permalink / raw)
To: Roman Yeryomin; +Cc: Manoharan, Rajkumar, ath10k, Rajkumar Manoharan

On 18 April 2016 at 15:00, Roman Yeryomin <leroi.lists@gmail.com> wrote:
> So it looks like Michal's patch set "ath10k: implement push-pull tx
> model" introduced this regression - after restoring it from reverts
> fq_codel_drop is hungry again.
> Any ideas how to fix?

If my hunch is right there's no easy (and proper) fix for that now.

One of the patchset patches (ath10k: implement wake_tx_queue) starts
to use mac80211 software queuing. This introduces extra induced
latency and I'm guessing it results in fill-in-then-drain sequences in
some cases which end up being long enough to make fq_codel_drop do more
work than normal.

This is required for other changes and MU-MIMO performance
improvements so this patch can't be removed.

I guess you could try forcing fq_codel to use a different target time,
e.g. 20ms (instead of the default 5). You can do this using the `tc`
command like so:

  tc qdisc replace dev wlan0 parent :1 fq_codel limit 1024 target 20ms
  tc qdisc replace dev wlan0 parent :2 fq_codel limit 1024 target 20ms
  tc qdisc replace dev wlan0 parent :3 fq_codel limit 1024 target 20ms
  tc qdisc replace dev wlan0 parent :4 fq_codel limit 1024 target 20ms

You might also want to try `pfifo` instead of `fq_codel` for
comparison as well.

Michał

>
> Regards,
> Roman
>
> On 18 April 2016 at 02:03, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>> Rajkumar,
>>
>> ok, I've ended up resolving (seems to be trivial) conflicts in revert
>> list you provided (see comments inlined).
>> Performance restored and codel symbols are gone from perf top.
>> Will try reverting "ath10k: combine txrx and replenish task" alone and >> then, if that doesn't help, resetting reverts by patch sets. >> >> Regards, >> Roman >> >> On 17 April 2016 at 18:06, Manoharan, Rajkumar >> <rmanohar@qti.qualcomm.com> wrote: >>> Roman, >>> >>> Hmm.. I just listed ath10k changes alone. So there might be some dependencies. >> >> there were ath10k conflicts, please see below >> >>> In your earlier mail fq_codel_drop was consuming 45% cpu. Have you observed any >>> improvement after switching off NET_SCH_FQ_CODEL? Had CPU usage gone down? >> >> CPU usage didn't go down after simply turning off >> CPTCFG_NET_SCH_FQ_CODEL under compat wireless (and yes, I verified it >> was off in the config after recompilation). >> But still I'm not sure it's really off. Turning it off both in kernel >> config and compat-wireless doesn't seem to have effect. I didn't dig >> deeper into this but it looks I didn't find a correct way to turn it >> off completely. >> >> Not sure if I stated it correctly: after resetting to >> 89ef41bfaa46f24a14b776f1cd78c0e0b39e54ce I got same (good enough) >> performance as with latest compat-wireless release (20160110). >> >>> Please try to revert the commit "ath10k: combine txrx and replenish task" alone. If you still >>> see same behavior (lower numbers), reset master branch to till "ath10k: fix pull-push tx >>> threshold handling" and generate backports. >>> >>> Please make sure that codel is switched off always until regression point is root caused. 
>>> >>> -Rajkumar >>> >>> ________________________________________ >>> From: Roman Yeryomin <leroi.lists@gmail.com> >>> Sent: Sunday, April 17, 2016 2:58 PM >>> To: Manoharan, Rajkumar >>> Cc: ath10k@lists.infradead.org; Rajkumar Manoharan >>> Subject: Re: ath10k performance, master branch from 20160407 >>> >>> Rajkumar, >>> >>> Somehow unseting CPTCFG_NET_SCH_FQ_CODEL didn't change anything and >>> the patches you listed didn't revert cleanly, I gave up on 3rd >>> dependent patch somewhere in the middle and just reset master to >>> 89ef41bfaa46f24a14b776f1cd78c0e0b39e54ce, which is the last commit >>> just before "ath10k: refactor tx code", and generated new backports. >>> The result is that it has same performance as before. But I guess it >>> is not a very good test as there were many changes to mac80211 too. >>> >>> So what do you want me to try next? Maybe you could provide a more >>> precise list to revert? >>> >>> >>> Regards, >>> Roman >>> >>> On 9 April 2016 at 07:02, Manoharan, Rajkumar <rmanohar@qti.qualcomm.com> wrote: >>>> Roman, >>>> >>>> Need your help to bisect regression point. Can you try w/o CPTCFG_NET_SCH_FQ_CODEL? >>>> If it does not help, try reverting below commits which are major changes in data path. >>>> Instead of generating backports, apply revert commit on top your backports. 
>>>> >>>> ath10k: combine txrx and replenish task >>>> ath10k: reuse copy engine 5 (htt rx) descriptors >>>> ath10k: cleanup copy engine receive next completion >>>> ath10k: register ath10k_htt_htc_t2h_msg_handler >>>> ath10k: speedup htt rx descriptor processing for rx_ind >> >> this depends on 689de38e37179c6f524dd003e1dae92042f8f5cd >> >>>> ath10k: cleanup amsdu processing for rx indication >>>> ath10k: remove unused fw_desc processing >>>> ath10k: copy tx fetch indication message >>>> ath10k: speedup htt rx descriptor processing for tx completion >>>> ath10k: fix null deref if device crashes early >>>> ath10k: fix pull-push tx threshold handling >>>> ath10k: fix tx hang >>>> ath10k: move mgmt descriptor limit handle under mgmt_tx >> >> error: could not revert cac0855... ath10k: move mgmt descriptor limit >> handle under mgmt_tx >> Not even sure why it fails here, pretty trivial to resolve but still... >> >>>> ath10k: change htt tx desc/qcache peer limit config >> >> error: could not revert 99ad1cb... 
ath10k: change htt tx desc/qcache >> peer limit config >> ook, resolved, hope correctly >> >>>> ath10k: fix HTT Tx CE ring size >>>> ath10k: implement push-pull tx >>>> ath10k: keep track of queue depth per txq >>>> ath10k: store txq in skb_cb >>>> ath10k: implement updating shared htt txq state >>>> ath10k: implement wake_tx_queue >> >> depends on 9d71d47eed20f34620e54e29bcc90f959d5873b8 and >> 750eeed89cf3c466df302e4707491b015531e26c >> all three fail to revert cleanly >> >>>> ath10k: add new htt message generation/parsing logic >> >> fails to revert cleanly >> >>>> ath10k: add fast peer_map lookup >>>> ath10k: maintain peer_id for each sta and vif >>>> ath10k: refactor tx pending management >>>> ath10k: unify txpath decision >>>> ath10k: refactor tx code >>>> >>>> -Rajkumar >>>> ________________________________________ >>>> From: Roman Yeryomin <leroi.lists@gmail.com> >>>> Sent: Friday, April 8, 2016 10:49 PM >>>> To: Manoharan, Rajkumar >>>> Cc: ath10k@lists.infradead.org; Rajkumar Manoharan >>>> Subject: Re: ath10k performance, master branch from 20160407 >>>> >>>> Latest backports (compat-wireless) released (20160110) has codel >>>> enabled (CPTCFG_NET_SCH_FQ_CODEL=y) and there are no openwrt patches >>>> or special configuration for codel. And it runs ok. >>>> How old commit do you want me to try? >>>> >>>> Regards, >>>> Roman >>>> >>>> On 8 April 2016 at 19:41, Manoharan, Rajkumar <rmanohar@qti.qualcomm.com> wrote: >>>>> That should be fine. Is codel running only for latest backports? Are there any openwrt changes to configure codel? Can you plz try to reset master branch to older commit and validate? 
>>>>> >>>>> -Rajkumar >>>>> ________________________________________ >>>>> From: Roman Yeryomin [leroi.lists@gmail.com] >>>>> Sent: Friday, April 8, 2016 9:30 PM >>>>> To: Manoharan, Rajkumar >>>>> Cc: ath10k@lists.infradead.org; Rajkumar Manoharan >>>>> Subject: Re: ath10k performance, master branch from 20160407 >>>>> >>>>> Rajkumar, >>>>> >>>>> I took backports from >>>>> git://git.kernel.org/pub/scm/linux/kernel/git/backports/backports.git, >>>>> took latest ath tree from >>>>> git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git, generated >>>>> backports-output based on ath master branch, refreshed openwrt >>>>> patches. >>>>> And saw big performance degradation. Am I doing something wrong? >>>>> >>>>> Regards, >>>>> Roman >>>>> >>>>> On 8 April 2016 at 18:34, Manoharan, Rajkumar <rmanohar@qti.qualcomm.com> wrote: >>>>>> Roman, >>>>>> >>>>>> Which backports version are you using? I don't see codel changes in ath.git/wireless-drivers.git. >>>>>> Hope you are using same firmware. >>>>>> >>>>>> -Rajkumar >>>>>> ________________________________________ >>>>>> From: ath10k <ath10k-bounces@lists.infradead.org> on behalf of Roman Yeryomin <leroi.lists@gmail.com> >>>>>> Sent: Friday, April 8, 2016 8:14 PM >>>>>> To: ath10k@lists.infradead.org >>>>>> Subject: ath10k performance, master branch from 20160407 >>>>>> >>>>>> Hello! >>>>>> >>>>>> I've seen performance patches were commited so I've decided to give it >>>>>> a try (using 4.1 kernel and backports). >>>>>> The results are quite disappointing: TCP download (client pov) dropped >>>>>> from 750Mbps to ~550 and UDP shows completely weird behavour - if >>>>>> generating 900Mbps it gives 30Mbps max, if generating 300Mbps it gives >>>>>> 250Mbps, before (latest official backports release from January) I was >>>>>> able to get 900Mbps. >>>>>> Hardware is basically ap152 + qca988x 3x3. >>>>>> When running perf top I see that fq_codel_drop eats a lot of cpu. 
>>>>>> Here is the output when running iperf3 UDP test: >>>>>> >>>>>> 45.78% [kernel] [k] fq_codel_drop >>>>>> 3.05% [kernel] [k] ag71xx_poll >>>>>> 2.18% [kernel] [k] skb_release_data >>>>>> 2.01% [kernel] [k] r4k_dma_cache_inv >>>>>> 1.73% [kernel] [k] eth_type_trans >>>>>> 1.24% [kernel] [k] build_skb >>>>>> 1.20% [mac80211] [k] ieee80211_tx_dequeue >>>>>> 1.03% [kernel] [k] __delay >>>>>> 0.98% [kernel] [k] fq_codel_enqueue >>>>>> 0.94% [kernel] [k] __netif_receive_skb_core >>>>>> 0.93% [kernel] [k] skb_release_head_state >>>>>> 0.88% [ath10k_core] [k] ath10k_htt_tx >>>>>> 0.87% [kernel] [k] __dev_queue_xmit >>>>>> 0.84% [mac80211] [k] ieee80211_tx_status >>>>>> 0.81% [kernel] [k] __build_skb >>>>>> 0.80% [mac80211] [k] __ieee80211_subif_start_xmit >>>>>> 0.77% [kernel] [k] br_handle_frame_finish >>>>>> 0.75% [kernel] [k] __qdisc_run >>>>>> 0.73% [kernel] [k] skb_recycler_consume >>>>>> 0.72% [kernel] [k] kfree_skb >>>>>> 0.72% [kernel] [k] get_page_from_freelist >>>>>> 0.69% [kernel] [k] br_fdb_update >>>>>> 0.69% [kernel] [k] br_handle_frame >>>>>> 0.67% [kernel] [k] __copy_user_common >>>>>> 0.66% [kernel] [k] __skb_flow_dissect >>>>>> 0.65% [ath10k_core] [k] ath10k_txrx_tx_unref >>>>>> 0.60% [kernel] [k] kmem_cache_alloc >>>>>> 0.60% [mac80211] [k] sta_addr_hash >>>>>> 0.56% [kernel] [k] fq_codel_dequeue >>>>>> 0.53% [kernel] [k] __local_bh_enable_ip >>>>>> 0.50% [kernel] [k] __br_fdb_get >>>>>> >>>>>> What could be the reason? >>>>>> I've seen there are some patches from Michal which touch fq_codel, >>>>>> would those help or not? >>>>>> >>>>>> >>>>>> Regards, >>>>>> Roman >>>>>> >>>>>> _______________________________________________ >>>>>> ath10k mailing list >>>>>> ath10k@lists.infradead.org >>>>>> http://lists.infradead.org/mailman/listinfo/ath10k _______________________________________________ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: ath10k performance, master branch from 20160407 2016-04-19 5:28 ` Michal Kazior @ 2016-04-19 7:31 ` Roman Yeryomin 2016-04-19 7:43 ` Michal Kazior 2016-04-22 17:02 ` Roman Yeryomin 1 sibling, 1 reply; 23+ messages in thread From: Roman Yeryomin @ 2016-04-19 7:31 UTC (permalink / raw) To: Michal Kazior; +Cc: Manoharan, Rajkumar, ath10k, Rajkumar Manoharan On 19 April 2016 at 08:28, Michal Kazior <michal.kazior@tieto.com> wrote: > On 18 April 2016 at 15:00, Roman Yeryomin <leroi.lists@gmail.com> wrote: >> So it looks like Michal's patch set "ath10k: implement push-pull tx >> model" introduced this regression - after restoring it from reverts >> fq_codel_drop is hungry again. >> Any ideas how to fix? > > If my hunch is right there's no easy (and proper) fix for that now. > > One of the patchset patches (ath10k: implement wake_tx_queue) starts > to use mac80211 software queuing. This introduces extra induced > latency and I'm guessing it results in fill-in-then-drain sequences in > some cases which end up being long enough to make fq_codel_drop more > work than normal. > > This is required for other changes and MU-MIMO performance > improvements so this patch can't be removed. But qca988x doesn't support MU-MIMO, AFAIK. Can this be made chip dependent? > I guess you could try forcing fq_codel to use different target time, > e.g. 20ms (instead of the default 5). You can do this using `tc` > command like so: > > tc qdisc replace dev wlan0 parent :1 fq_codel limit 1024 target 20ms > tc qdisc replace dev wlan0 parent :2 fq_codel limit 1024 target 20ms > tc qdisc replace dev wlan0 parent :3 fq_codel limit 1024 target 20ms > tc qdisc replace dev wlan0 parent :4 fq_codel limit 1024 target 20ms > > You might also want to try `pfifo` instead of `fq_codel` for comparison as well. Will try. 
Regards, Roman
* Re: ath10k performance, master branch from 20160407 2016-04-19 7:31 ` Roman Yeryomin @ 2016-04-19 7:43 ` Michal Kazior 2016-04-19 15:35 ` Valo, Kalle 2016-04-22 17:03 ` Roman Yeryomin 0 siblings, 2 replies; 23+ messages in thread From: Michal Kazior @ 2016-04-19 7:43 UTC (permalink / raw) To: Roman Yeryomin; +Cc: Manoharan, Rajkumar, ath10k, Rajkumar Manoharan On 19 April 2016 at 09:31, Roman Yeryomin <leroi.lists@gmail.com> wrote: > On 19 April 2016 at 08:28, Michal Kazior <michal.kazior@tieto.com> wrote: >> On 18 April 2016 at 15:00, Roman Yeryomin <leroi.lists@gmail.com> wrote: >>> So it looks like Michal's patch set "ath10k: implement push-pull tx >>> model" introduced this regression - after restoring it from reverts >>> fq_codel_drop is hungry again. >>> Any ideas how to fix? >> >> If my hunch is right there's no easy (and proper) fix for that now. >> >> One of the patchset patches (ath10k: implement wake_tx_queue) starts >> to use mac80211 software queuing. This introduces extra induced >> latency and I'm guessing it results in fill-in-then-drain sequences in >> some cases which end up being long enough to make fq_codel_drop more >> work than normal. >> >> This is required for other changes and MU-MIMO performance >> improvements so this patch can't be removed. > > But qca988x doesn't support MU-MIMO, AFAIK. Correct. > Can this be made chip dependent? I guess it could but it'd arguably make the driver more complex and harder to maintain. What we want is a long-term fix, not a short-term one. The long-term fix is a work-in-progress which aims at killing bufferbloat in general [1][2]. This should, by proxy, improve everything. [1]: https://www.spinics.net/lists/linux-wireless/msg149776.html [2]: https://www.spinics.net/lists/linux-wireless/msg148714.html [3]: https://www.spinics.net/lists/linux-wireless/msg149039.html You can try out patchset from [1] (and maybe [3] as well) to see if it helps you (assuming you have spare time to play around). 
Michał
* Re: ath10k performance, master branch from 20160407
From: Valo, Kalle @ 2016-04-19 15:35 UTC
To: michal.kazior
Cc: Roman Yeryomin, Manoharan, Rajkumar, ath10k, Rajkumar Manoharan

Michal Kazior <michal.kazior@tieto.com> writes:

> On 19 April 2016 at 09:31, Roman Yeryomin <leroi.lists@gmail.com> wrote:
>> On 19 April 2016 at 08:28, Michal Kazior <michal.kazior@tieto.com> wrote:
>>
>>> If my hunch is right there's no easy (and proper) fix for that now.
>>>
>>> One of the patchset patches (ath10k: implement wake_tx_queue) starts
>>> to use mac80211 software queuing. This introduces extra induced
>>> latency and I'm guessing it results in fill-in-then-drain sequences in
>>> some cases which end up being long enough to make fq_codel_drop more
>>> work than normal.
>>>
>>> This is required for other changes and MU-MIMO performance
>>> improvements so this patch can't be removed.
>>
>> But qca988x doesn't support MU-MIMO, AFAIK.
>
> Correct.
>
>
>> Can this be made chip dependent?
>
> I guess it could but it'd arguably make the driver more complex and
> harder to maintain. What we want is a long-term fix, not a short-term
> one.

But we should never go backwards and TCP dropping from 750 Mbps to ~550
Mbps is a huge drop, so this is not ok. We have to do something to fix
this, be it reverting the wake_tx_queue support, somehow disabling it by
default or something.

-- 
Kalle Valo
* Re: ath10k performance, master branch from 20160407 2016-04-19 15:35 ` Valo, Kalle @ 2016-04-22 17:05 ` Roman Yeryomin 2016-05-09 12:26 ` Michal Kazior 0 siblings, 1 reply; 23+ messages in thread From: Roman Yeryomin @ 2016-04-22 17:05 UTC (permalink / raw) To: Valo, Kalle Cc: Rajkumar Manoharan, michal.kazior, ath10k, Manoharan, Rajkumar On 19 April 2016 at 18:35, Valo, Kalle <kvalo@qca.qualcomm.com> wrote: > Michal Kazior <michal.kazior@tieto.com> writes: > >> On 19 April 2016 at 09:31, Roman Yeryomin <leroi.lists@gmail.com> wrote: >>> On 19 April 2016 at 08:28, Michal Kazior <michal.kazior@tieto.com> wrote: >>> >>>> If my hunch is right there's no easy (and proper) fix for that now. >>>> >>>> One of the patchset patches (ath10k: implement wake_tx_queue) starts >>>> to use mac80211 software queuing. This introduces extra induced >>>> latency and I'm guessing it results in fill-in-then-drain sequences in >>>> some cases which end up being long enough to make fq_codel_drop more >>>> work than normal. >>>> >>>> This is required for other changes and MU-MIMO performance >>>> improvements so this patch can't be removed. >>> >>> But qca988x doesn't support MU-MIMO, AFAIK. >> >> Correct. >> >> >>> Can this be made chip dependent? >> >> I guess it could but it'd arguably make the driver more complex and >> harder to maintain. What we want is a long-term fix, not a short-term >> one. > > But we should never go backwards and TCP dropping from 750 Mbps to ~550 > Mbps is a huge drop, so this is not ok. We have to do something to fix > this, be it reverting the wake_tx_queue support, somehow disabling it by > default or something. I would agree with Kalle here. This looks like very serious regression. But I'm afraid I can only help with testing here. Regards, Roman _______________________________________________ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: ath10k performance, master branch from 20160407 2016-04-22 17:05 ` Roman Yeryomin @ 2016-05-09 12:26 ` Michal Kazior 2016-05-15 22:59 ` Roman Yeryomin 0 siblings, 1 reply; 23+ messages in thread From: Michal Kazior @ 2016-05-09 12:26 UTC (permalink / raw) To: Roman Yeryomin Cc: Valo, Kalle, Manoharan, Rajkumar, ath10k, Rajkumar Manoharan Hi Roman, On 22 April 2016 at 19:05, Roman Yeryomin <leroi.lists@gmail.com> wrote: > On 19 April 2016 at 18:35, Valo, Kalle <kvalo@qca.qualcomm.com> wrote: >> Michal Kazior <michal.kazior@tieto.com> writes: >> >>> On 19 April 2016 at 09:31, Roman Yeryomin <leroi.lists@gmail.com> wrote: >>>> On 19 April 2016 at 08:28, Michal Kazior <michal.kazior@tieto.com> wrote: >>>> >>>>> If my hunch is right there's no easy (and proper) fix for that now. >>>>> >>>>> One of the patchset patches (ath10k: implement wake_tx_queue) starts >>>>> to use mac80211 software queuing. This introduces extra induced >>>>> latency and I'm guessing it results in fill-in-then-drain sequences in >>>>> some cases which end up being long enough to make fq_codel_drop more >>>>> work than normal. >>>>> >>>>> This is required for other changes and MU-MIMO performance >>>>> improvements so this patch can't be removed. >>>> >>>> But qca988x doesn't support MU-MIMO, AFAIK. >>> >>> Correct. >>> >>> >>>> Can this be made chip dependent? >>> >>> I guess it could but it'd arguably make the driver more complex and >>> harder to maintain. What we want is a long-term fix, not a short-term >>> one. >> >> But we should never go backwards and TCP dropping from 750 Mbps to ~550 >> Mbps is a huge drop, so this is not ok. We have to do something to fix >> this, be it reverting the wake_tx_queue support, somehow disabling it by >> default or something. > > I would agree with Kalle here. This looks like very serious regression. > But I'm afraid I can only help with testing here. Can you give the following patch a try, please? 
I didn't get to reproduce your problem on a real AP135/AP152 board and instead tried to simulate a slow uni-proc system via KVM and cooling_device in sysfs. The patch does improve things in this synthetic setup for me. http://lists.infradead.org/pipermail/ath10k/2016-May/007526.html Michał _______________________________________________ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: ath10k performance, master branch from 20160407 2016-05-09 12:26 ` Michal Kazior @ 2016-05-15 22:59 ` Roman Yeryomin 2016-05-16 3:57 ` Rajkumar Manoharan 0 siblings, 1 reply; 23+ messages in thread From: Roman Yeryomin @ 2016-05-15 22:59 UTC (permalink / raw) To: Michal Kazior Cc: Valo, Kalle, Manoharan, Rajkumar, ath10k, Rajkumar Manoharan On 9 May 2016 at 15:26, Michal Kazior <michal.kazior@tieto.com> wrote: > Hi Roman, > > On 22 April 2016 at 19:05, Roman Yeryomin <leroi.lists@gmail.com> wrote: >> On 19 April 2016 at 18:35, Valo, Kalle <kvalo@qca.qualcomm.com> wrote: >>> Michal Kazior <michal.kazior@tieto.com> writes: >>> >>>> On 19 April 2016 at 09:31, Roman Yeryomin <leroi.lists@gmail.com> wrote: >>>>> On 19 April 2016 at 08:28, Michal Kazior <michal.kazior@tieto.com> wrote: >>>>> >>>>>> If my hunch is right there's no easy (and proper) fix for that now. >>>>>> >>>>>> One of the patchset patches (ath10k: implement wake_tx_queue) starts >>>>>> to use mac80211 software queuing. This introduces extra induced >>>>>> latency and I'm guessing it results in fill-in-then-drain sequences in >>>>>> some cases which end up being long enough to make fq_codel_drop more >>>>>> work than normal. >>>>>> >>>>>> This is required for other changes and MU-MIMO performance >>>>>> improvements so this patch can't be removed. >>>>> >>>>> But qca988x doesn't support MU-MIMO, AFAIK. >>>> >>>> Correct. >>>> >>>> >>>>> Can this be made chip dependent? >>>> >>>> I guess it could but it'd arguably make the driver more complex and >>>> harder to maintain. What we want is a long-term fix, not a short-term >>>> one. >>> >>> But we should never go backwards and TCP dropping from 750 Mbps to ~550 >>> Mbps is a huge drop, so this is not ok. We have to do something to fix >>> this, be it reverting the wake_tx_queue support, somehow disabling it by >>> default or something. >> >> I would agree with Kalle here. This looks like very serious regression. 
>> But I'm afraid I can only help with testing here. > > Can you give the following patch a try, please? I didn't get to > reproduce your problem on a real AP135/AP152 board and instead tried > to simulate a slow uni-proc system via KVM and cooling_device in > sysfs. The patch does improve things in this synthetic setup for me. > > http://lists.infradead.org/pipermail/ath10k/2016-May/007526.html > Unfortunately doesn't seem to make any difference at all (really, if there is, it's less than 10Mbps). Please see this thread also: https://lists.openwrt.org/pipermail/openwrt-devel/2016-May/041445.html That is with your and Eric's patch applied. Regards, Roman _______________________________________________ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: ath10k performance, master branch from 20160407
From: Rajkumar Manoharan @ 2016-05-16 3:57 UTC
To: Roman Yeryomin
Cc: Valo, Kalle, Michal Kazior, ath10k, Manoharan, Rajkumar

On 2016-05-16 04:29, Roman Yeryomin wrote:
> On 9 May 2016 at 15:26, Michal Kazior <michal.kazior@tieto.com> wrote:
>> Hi Roman,
>>
>> On 22 April 2016 at 19:05, Roman Yeryomin <leroi.lists@gmail.com>
>> wrote:
>>> On 19 April 2016 at 18:35, Valo, Kalle <kvalo@qca.qualcomm.com>
>>> wrote:
>>>> Michal Kazior <michal.kazior@tieto.com> writes:
>>>>
>>>>> On 19 April 2016 at 09:31, Roman Yeryomin <leroi.lists@gmail.com>
>>>>> wrote:
>>>>>> On 19 April 2016 at 08:28, Michal Kazior <michal.kazior@tieto.com>
>>>>>> wrote:
>>>>>>
>>>>>>> If my hunch is right there's no easy (and proper) fix for that
>>>>>>> now.
>>>>>>>
>>>>>>> One of the patchset patches (ath10k: implement wake_tx_queue)
>>>>>>> starts
>>>>>>> to use mac80211 software queuing. This introduces extra induced
>>>>>>> latency and I'm guessing it results in fill-in-then-drain
>>>>>>> sequences in
>>>>>>> some cases which end up being long enough to make fq_codel_drop
>>>>>>> more
>>>>>>> work than normal.
>>>>>>>
>>>>>>> This is required for other changes and MU-MIMO performance
>>>>>>> improvements so this patch can't be removed.
>>>>>>
>>>>>> But qca988x doesn't support MU-MIMO, AFAIK.
>>>>>
>>>>> Correct.
>>>>>
>>>>>
>>>>>> Can this be made chip dependent?
>>>>>
>>>>> I guess it could but it'd arguably make the driver more complex and
>>>>> harder to maintain. What we want is a long-term fix, not a
>>>>> short-term
>>>>> one.
>>>>
>>>> But we should never go backwards and TCP dropping from 750 Mbps to
>>>> ~550
>>>> Mbps is a huge drop, so this is not ok. We have to do something to
>>>> fix
>>>> this, be it reverting the wake_tx_queue support, somehow disabling
>>>> it by
>>>> default or something.
>>>
>>> I would agree with Kalle here. This looks like very serious
>>> regression.
>>> But I'm afraid I can only help with testing here.
>>
>> Can you give the following patch a try, please? I didn't get to
>> reproduce your problem on a real AP135/AP152 board and instead tried
>> to simulate a slow uni-proc system via KVM and cooling_device in
>> sysfs. The patch does improve things in this synthetic setup for me.
>>
>> http://lists.infradead.org/pipermail/ath10k/2016-May/007526.html
>>
>
> Unfortunately doesn't seem to make any difference at all (really, if
> there is, it's less than 10Mbps).
> Please see this thread also:
> https://lists.openwrt.org/pipermail/openwrt-devel/2016-May/041445.html
> That is with your and Eric's patch applied.
>

Roman,

Can you please try without registering the wake_tx_queue callback? Software
queuing is needed for devices that support peer-flow-control.

diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index 6829a08638b2..5df904169ded 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -7313,7 +7313,6 @@ ath10k_mac_op_switch_vif_chanctx(struct ieee80211_hw *hw,
 
 static const struct ieee80211_ops ath10k_ops = {
 	.tx = ath10k_mac_op_tx,
-	.wake_tx_queue = ath10k_mac_op_wake_tx_queue,
 	.start = ath10k_start,
 	.stop = ath10k_stop,
 	.config = ath10k_config,

-Rajkumar
* Re: ath10k performance, master branch from 20160407 2016-04-19 7:43 ` Michal Kazior 2016-04-19 15:35 ` Valo, Kalle @ 2016-04-22 17:03 ` Roman Yeryomin 1 sibling, 0 replies; 23+ messages in thread From: Roman Yeryomin @ 2016-04-22 17:03 UTC (permalink / raw) To: Michal Kazior; +Cc: Manoharan, Rajkumar, ath10k, Rajkumar Manoharan On 19 April 2016 at 10:43, Michal Kazior <michal.kazior@tieto.com> wrote: > On 19 April 2016 at 09:31, Roman Yeryomin <leroi.lists@gmail.com> wrote: >> On 19 April 2016 at 08:28, Michal Kazior <michal.kazior@tieto.com> wrote: >>> On 18 April 2016 at 15:00, Roman Yeryomin <leroi.lists@gmail.com> wrote: >>>> So it looks like Michal's patch set "ath10k: implement push-pull tx >>>> model" introduced this regression - after restoring it from reverts >>>> fq_codel_drop is hungry again. >>>> Any ideas how to fix? >>> >>> If my hunch is right there's no easy (and proper) fix for that now. >>> >>> One of the patchset patches (ath10k: implement wake_tx_queue) starts >>> to use mac80211 software queuing. This introduces extra induced >>> latency and I'm guessing it results in fill-in-then-drain sequences in >>> some cases which end up being long enough to make fq_codel_drop more >>> work than normal. >>> >>> This is required for other changes and MU-MIMO performance >>> improvements so this patch can't be removed. >> >> But qca988x doesn't support MU-MIMO, AFAIK. > > Correct. > > >> Can this be made chip dependent? > > I guess it could but it'd arguably make the driver more complex and > harder to maintain. What we want is a long-term fix, not a short-term > one. > > The long-term fix is a work-in-progress which aims at killing > bufferbloat in general [1][2]. This should, by proxy, improve > everything. 
> > [1]: https://www.spinics.net/lists/linux-wireless/msg149776.html > [2]: https://www.spinics.net/lists/linux-wireless/msg148714.html > [3]: https://www.spinics.net/lists/linux-wireless/msg149039.html > > You can try out patchset from [1] (and maybe [3] as well) to see if it > helps you (assuming you have spare time to play around). Will try. Regards, Roman _______________________________________________ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: ath10k performance, master branch from 20160407 2016-04-19 5:28 ` Michal Kazior 2016-04-19 7:31 ` Roman Yeryomin @ 2016-04-22 17:02 ` Roman Yeryomin 1 sibling, 0 replies; 23+ messages in thread From: Roman Yeryomin @ 2016-04-22 17:02 UTC (permalink / raw) To: Michal Kazior; +Cc: Manoharan, Rajkumar, ath10k, Rajkumar Manoharan On 19 April 2016 at 08:28, Michal Kazior <michal.kazior@tieto.com> wrote: > On 18 April 2016 at 15:00, Roman Yeryomin <leroi.lists@gmail.com> wrote: >> So it looks like Michal's patch set "ath10k: implement push-pull tx >> model" introduced this regression - after restoring it from reverts >> fq_codel_drop is hungry again. >> Any ideas how to fix? > > If my hunch is right there's no easy (and proper) fix for that now. > > One of the patchset patches (ath10k: implement wake_tx_queue) starts > to use mac80211 software queuing. This introduces extra induced > latency and I'm guessing it results in fill-in-then-drain sequences in > some cases which end up being long enough to make fq_codel_drop more > work than normal. > > This is required for other changes and MU-MIMO performance > improvements so this patch can't be removed. > > I guess you could try forcing fq_codel to use different target time, > e.g. 20ms (instead of the default 5). 
You can do this using `tc` > command like so: > > tc qdisc replace dev wlan0 parent :1 fq_codel limit 1024 target 20ms > tc qdisc replace dev wlan0 parent :2 fq_codel limit 1024 target 20ms > tc qdisc replace dev wlan0 parent :3 fq_codel limit 1024 target 20ms > tc qdisc replace dev wlan0 parent :4 fq_codel limit 1024 target 20ms this didn't change anything qdisc mq 0: dev wlan0 root qdisc fq_codel 8001: dev wlan0 parent :1 limit 1024p flows 1024 quantum 1514 target 20.0ms interval 100.0ms ecn qdisc fq_codel 8002: dev wlan0 parent :2 limit 1024p flows 1024 quantum 1514 target 20.0ms interval 100.0ms ecn qdisc fq_codel 8003: dev wlan0 parent :3 limit 1024p flows 1024 quantum 1514 target 20.0ms interval 100.0ms ecn qdisc fq_codel 8004: dev wlan0 parent :4 limit 1024p flows 1024 quantum 1514 target 20.0ms interval 100.0ms ecn > You might also want to try `pfifo` instead of `fq_codel` for comparison as well. and this did, but for UDP only: 450Mbps instead of 30, TCP remained on 550Mbps Regards, Roman _______________________________________________ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k ^ permalink raw reply [flat|nested] 23+ messages in thread
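The four `tc` invocations quoted in this message differ only in the parent handle, and the `pfifo` comparison needs the same four handles again, so the command lines can be generated rather than typed by hand. What follows is a minimal sketch and was not posted in the thread: the helper name `build_tc_commands` and its parameters are hypothetical, it only assembles argument lists and never runs `tc`, and the `pfifo` variant assumes that qdisc accepts a packet `limit` and simply reuses the 1024 value from the fq_codel example.

```python
def build_tc_commands(dev="wlan0", qdisc="fq_codel", limit=1024,
                      target=None, parents=(1, 2, 3, 4)):
    """Return one `tc qdisc replace` argument list per mq child queue.

    Hypothetical helper: it mirrors the commands quoted in the thread
    and does not execute anything.
    """
    commands = []
    for parent in parents:
        argv = ["tc", "qdisc", "replace", "dev", dev,
                "parent", f":{parent}", qdisc, "limit", str(limit)]
        # `target` is only added for fq_codel; other qdiscs such as
        # pfifo are left with just the limit.
        if qdisc == "fq_codel" and target is not None:
            argv += ["target", target]
        commands.append(argv)
    return commands


if __name__ == "__main__":
    # fq_codel with the relaxed 20ms target suggested by Michal
    for argv in build_tc_commands(target="20ms"):
        print(" ".join(argv))
    # plain pfifo for comparison
    for argv in build_tc_commands(qdisc="pfifo"):
        print(" ".join(argv))
```

Each returned list could be handed to `subprocess.run` as root on the access point; keeping construction separate from execution makes it easy to review the commands before they touch the interface.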
* Re: ath10k performance, master branch from 20160407
From: Michal Kazior @ 2016-04-20 9:03 UTC
To: Roman Yeryomin
Cc: ath10k, Manoharan, Rajkumar, Rajkumar Manoharan

On 18 April 2016 at 01:03, Roman Yeryomin <leroi.lists@gmail.com> wrote:
[...]
> CPU usage didn't go down after simply turning off
> CPTCFG_NET_SCH_FQ_CODEL under compat wireless (and yes, I verified it
> was off in the config after recompilation).
> But still I'm not sure it's really off. Turning it off both in kernel
> config and compat-wireless doesn't seem to have effect. I didn't dig
> deeper into this but it looks I didn't find a correct way to turn it
> off completely.

You can check this using `tc qdisc` command. It'll tell you what kind
of qdiscs sit on each interface.

Michał
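To make the `tc qdisc` check less error-prone, the output can be parsed and the qdisc kind and target per queue compared against what was intended. The sketch below is an illustration only, not something from the thread: the function name `parse_tc_qdisc` and the regular expression are assumptions about the usual one-qdisc-per-line layout, and the sample text is the listing Roman posted on 2016-04-22 after applying the 20ms target.

```python
import re

# One line of `tc qdisc show` output, for example:
#   qdisc fq_codel 8001: dev wlan0 parent :1 limit 1024p ... target 20.0ms ...
QDISC_RE = re.compile(
    r"^qdisc\s+(?P<kind>\S+)\s+(?P<handle>\S+)\s+dev\s+(?P<dev>\S+)\s*"
    r"(?P<rest>.*)$"
)


def parse_tc_qdisc(output):
    """Return a list of dicts, one per qdisc line found in `output`."""
    entries = []
    for line in output.splitlines():
        match = QDISC_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a qdisc line
        rest = match.group("rest").split()
        entry = {
            "kind": match.group("kind"),
            "handle": match.group("handle"),
            "dev": match.group("dev"),
            "parent": "root" if "root" in rest else None,
            "target": None,
        }
        # Walk adjacent token pairs to pick up "parent :N" and "target X".
        for key, value in zip(rest, rest[1:]):
            if key == "parent":
                entry["parent"] = value
            elif key == "target":
                entry["target"] = value
        entries.append(entry)
    return entries


# Sample copied from the listing Roman posted in this thread.
SAMPLE = """\
qdisc mq 0: dev wlan0 root
qdisc fq_codel 8001: dev wlan0 parent :1 limit 1024p flows 1024 quantum 1514 target 20.0ms interval 100.0ms ecn
qdisc fq_codel 8002: dev wlan0 parent :2 limit 1024p flows 1024 quantum 1514 target 20.0ms interval 100.0ms ecn
qdisc fq_codel 8003: dev wlan0 parent :3 limit 1024p flows 1024 quantum 1514 target 20.0ms interval 100.0ms ecn
qdisc fq_codel 8004: dev wlan0 parent :4 limit 1024p flows 1024 quantum 1514 target 20.0ms interval 100.0ms ecn
"""

if __name__ == "__main__":
    for entry in parse_tc_qdisc(SAMPLE):
        print(entry["dev"], entry["parent"], entry["kind"], entry["target"])
```

If any entry for the wireless interface still reports `fq_codel` after the config option was disabled, the qdisc is coming from somewhere other than the compat-wireless build option, which matches the behaviour Roman describes.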
* Re: ath10k performance, master branch from 20160407
  2016-04-08 16:00 ` Roman Yeryomin
  2016-04-08 16:41 ` Manoharan, Rajkumar
@ 2016-04-12 10:16 ` Xue Liu
  1 sibling, 0 replies; 23+ messages in thread
From: Xue Liu @ 2016-04-12 10:16 UTC
  To: Roman Yeryomin, Manoharan, Rajkumar; +Cc: ath10k, Rajkumar Manoharan

Hello Roman,

I am also working on ath10k testing with Armada 388 + QCA9880 and
OpenWrt trunk (compat-wireless 20160110).

On 08/04/16 18:00, Roman Yeryomin wrote:
> Rajkumar,
>
> I took backports from
> git://git.kernel.org/pub/scm/linux/kernel/git/backports/backports.git,
> took latest ath tree from
> git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git, generated
> backports-output based on ath master branch, refreshed openwrt
> patches.

Can you share how you refreshed the OpenWrt patches to include the latest
patches from ath10k? I also got backports-output from the ath10k kernel,
but I don't know how to use it.

Thank you.

Regards,
Xue Liu

> And saw big performance degradation. Am I doing something wrong?
>
> Regards,
> Roman
>
> On 8 April 2016 at 18:34, Manoharan, Rajkumar <rmanohar@qti.qualcomm.com> wrote:
>> Roman,
>>
>> Which backports version are you using? I don't see codel changes in ath.git/wireless-drivers.git.
>> Hope you are using same firmware.
>>
>> -Rajkumar
>> [...]
end of thread, other threads:[~2016-05-16 3:58 UTC | newest]

Thread overview: 23+ messages
2016-04-08 14:44 ath10k performance, master branch from 20160407 Roman Yeryomin
2016-04-08 15:34 ` Manoharan, Rajkumar
2016-04-08 16:00 ` Roman Yeryomin
2016-04-08 16:41 ` Manoharan, Rajkumar
2016-04-08 17:19 ` Roman Yeryomin
2016-04-09 4:02 ` Manoharan, Rajkumar
2016-04-13 12:44 ` Roman Yeryomin
2016-04-17 9:28 ` Roman Yeryomin
2016-04-17 15:06 ` Manoharan, Rajkumar
2016-04-17 23:03 ` Roman Yeryomin
2016-04-18 13:00 ` Roman Yeryomin
2016-04-19 5:28 ` Michal Kazior
2016-04-19 7:31 ` Roman Yeryomin
2016-04-19 7:43 ` Michal Kazior
2016-04-19 15:35 ` Valo, Kalle
2016-04-22 17:05 ` Roman Yeryomin
2016-05-09 12:26 ` Michal Kazior
2016-05-15 22:59 ` Roman Yeryomin
2016-05-16 3:57 ` Rajkumar Manoharan
2016-04-22 17:03 ` Roman Yeryomin
2016-04-22 17:02 ` Roman Yeryomin
2016-04-20 9:03 ` Michal Kazior
2016-04-12 10:16 ` Xue Liu