* vchiq: Performance regression since 5.18-rc1
From: Stefan Wahren @ 2022-05-21 23:22 UTC
To: Marcelo Tosatti, Andrew Morton, Nicolas Saenz Julienne
Cc: Borislav Petkov, Minchan Kim, Matthew Wilcox, Mel Gorman, Juri Lelli,
    Thomas Gleixner, Sebastian Andrzej Siewior, Paul E. McKenney,
    linux-kernel, linux-mm, Linux ARM, Phil Elwell, regressions

Hi,

while testing the staging/vc04_services/interface/vchiq_arm driver with my
Raspberry Pi 3 B+ (multi_v7_defconfig) I noticed a huge performance
regression since

[ff042f4a9b050895a42cae893cc01fa2ca81b95c] mm: lru_cache_disable: replace
work queue synchronization with synchronize_rcu

Usually I run "vchiq_test -f 1" to check that the driver is still working [1].

Before commit:

real 0m1,500s
user 0m0,068s
sys 0m0,846s

After commit:

real 7m11,449s
user 0m2,049s
sys 0m0,023s

Best regards

[1] - https://github.com/raspberrypi/userland

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

* Re: vchiq: Performance regression since 5.18-rc1
From: Paul E. McKenney @ 2022-05-21 23:46 UTC
To: Stefan Wahren
Cc: Marcelo Tosatti, Andrew Morton, Nicolas Saenz Julienne, Borislav Petkov,
    Minchan Kim, Matthew Wilcox, Mel Gorman, Juri Lelli, Thomas Gleixner,
    Sebastian Andrzej Siewior, linux-kernel, linux-mm, Linux ARM,
    Phil Elwell, regressions, riel, viro

On Sun, May 22, 2022 at 01:22:00AM +0200, Stefan Wahren wrote:
> Hi,
>
> while testing the staging/vc04_services/interface/vchiq_arm driver with my
> Raspberry Pi 3 B+ (multi_v7_defconfig) I noticed a huge performance
> regression since [ff042f4a9b050895a42cae893cc01fa2ca81b95c] mm:
> lru_cache_disable: replace work queue synchronization with synchronize_rcu
>
> Usually I run "vchiq_test -f 1" to check that the driver is still working [1].
>
> Before commit:
>
> real 0m1,500s
> user 0m0,068s
> sys 0m0,846s
>
> After commit:
>
> real 7m11,449s
> user 0m2,049s
> sys 0m0,023s
>
> Best regards
>
> [1] - https://github.com/raspberrypi/userland

Please feel free to try the patch shown below.  Or the pair of patches
from Rik here:

https://lore.kernel.org/lkml/20220218183114.2867528-2-riel@surriel.com/
https://lore.kernel.org/lkml/20220218183114.2867528-3-riel@surriel.com/

There is work ongoing to produce something better, but ongoing slowly.
Especially my part of that work.

							Thanx, Paul

------------------------------------------------------------------------

From paulmck@kernel.org Mon Feb 14 11:05:49 2022
Date: Mon, 14 Feb 2022 11:05:49 -0800
From: "Paul E. McKenney" <paulmck@kernel.org>
To: clm@fb.com
Cc: riel@surriel.com, viro@zeniv.linux.org.uk, linux-kernel@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH RFC fs/namespace] Make kern_unmount() use synchronize_rcu_expedited()
Message-ID: <20220214190549.GA2815154@paulmck-ThinkPad-P17-Gen-1>
Reply-To: paulmck@kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Status: RO
Content-Length: 1036
Lines: 32

Experimental.  Not for inclusion.  Yet, anyway.

Freeing large numbers of namespaces in quick succession can result in
a bottleneck on the synchronize_rcu() invoked from kern_unmount().
This patch applies the synchronize_rcu_expedited() hammer to allow
further testing and fault isolation.

Hey, at least there was no need to change the comment!  ;-)

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <linux-fsdevel@vger.kernel.org>
Cc: <linux-kernel@vger.kernel.org>
Not-yet-signed-off-by: Paul E. McKenney <paulmck@kernel.org>

---

 namespace.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 40b994a29e90d..79c50ad0ade5b 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4389,7 +4389,7 @@ void kern_unmount(struct vfsmount *mnt)
 	/* release long term mount so mount point can be released */
 	if (!IS_ERR_OR_NULL(mnt)) {
 		real_mount(mnt)->mnt_ns = NULL;
-		synchronize_rcu();	/* yecchhh... */
+		synchronize_rcu_expedited();	/* yecchhh... */
 		mntput(mnt);
 	}
 }

* Re: vchiq: Performance regression since 5.18-rc1
From: Stefan Wahren @ 2022-05-22 15:11 UTC
To: paulmck
Cc: Marcelo Tosatti, Andrew Morton, Nicolas Saenz Julienne, Borislav Petkov,
    Minchan Kim, Matthew Wilcox, Mel Gorman, Juri Lelli, Thomas Gleixner,
    Sebastian Andrzej Siewior, linux-kernel, linux-mm, Linux ARM,
    Phil Elwell, regressions, riel, viro

Hi Paul,

On 22.05.22 at 01:46, Paul E. McKenney wrote:
> Please feel free to try the patch shown below.  Or the pair of patches
> from Rik here:
>
> https://lore.kernel.org/lkml/20220218183114.2867528-2-riel@surriel.com/
> https://lore.kernel.org/lkml/20220218183114.2867528-3-riel@surriel.com/

I tried your patch and Rik's patches, but in both cases vchiq_test runs 7
minutes instead of ~ 1 second.

Best regards

* Re: vchiq: Performance regression since 5.18-rc1
From: Paul E. McKenney @ 2022-05-23 4:48 UTC
To: Stefan Wahren
Cc: Marcelo Tosatti, Andrew Morton, Nicolas Saenz Julienne, Borislav Petkov,
    Minchan Kim, Matthew Wilcox, Mel Gorman, Juri Lelli, Thomas Gleixner,
    Sebastian Andrzej Siewior, linux-kernel, linux-mm, Linux ARM,
    Phil Elwell, regressions, riel, viro

On Sun, May 22, 2022 at 05:11:36PM +0200, Stefan Wahren wrote:
> I tried your patch and Rik's patches but in both cases vchiq_test runs 7
> minutes instead of ~ 1 second.

That is surprising.  Do you boot with rcupdate.rcu_normal=1?  That would
nullify my patch, but I would expect that Rik's patch would still provide
increased performance even in that case.

Could you please characterize where the slowdown is occurring?

							Thanx, Paul

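For anyone reproducing this, the rcu_normal question above can be answered by looking at the command line the kernel booted with. The following is only a minimal sketch: the helper name has_rcu_normal is made up for illustration, and it inspects the boot command line only, so it will not notice a value changed at run time.

```shell
#!/bin/sh
# has_rcu_normal CMDLINE: succeed if the given kernel command line contains
# the exact parameter rcupdate.rcu_normal=1.
# (Hypothetical helper, not part of any existing tool.)
has_rcu_normal() {
    # Pad with spaces so the parameter only matches as a whole word.
    case " $1 " in
        *" rcupdate.rcu_normal=1 "*) return 0 ;;
    esac
    return 1
}

# On the target board, check the command line that was actually used.
if [ -r /proc/cmdline ]; then
    if has_rcu_normal "$(cat /proc/cmdline)"; then
        echo "rcupdate.rcu_normal=1 is set (this would nullify the expedited patch)"
    else
        echo "rcupdate.rcu_normal=1 is not on the kernel command line"
    fi
fi
```

The function takes the command line as a string rather than reading /proc/cmdline itself, so it can also be pointed at a saved boot log from another machine.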
* Re: vchiq: Performance regression since 5.18-rc1
From: Stefan Wahren @ 2022-05-23 6:19 UTC
To: paulmck, Phil Elwell
Cc: Marcelo Tosatti, Andrew Morton, Nicolas Saenz Julienne, Borislav Petkov,
    Minchan Kim, Mel Gorman, Juri Lelli, Thomas Gleixner,
    Sebastian Andrzej Siewior, linux-kernel, linux-mm, Linux ARM,
    regressions, riel, viro

Hi Paul,

On 23.05.22 at 06:48, Paul E. McKenney wrote:
> On Sun, May 22, 2022 at 05:11:36PM +0200, Stefan Wahren wrote:
>> I tried your patch and Rik's patches but in both cases vchiq_test runs 7
>> minutes instead of ~ 1 second.
> That is surprising.  Do you boot with rcupdate.rcu_normal=1?

No, not explicitly.

> That would
> nullify my patch, but I would expect that Rik's patch would still provide
> increased performance even in that case.

I will retest with a fresh SD card image.

> Could you please characterize where the slowdown is occurring?

Unfortunately I don't have deep insight into the driver or the vchiq_test
tool, just a user's view.

Do you think an strace would be a good starting point?

@Phil: Any advice on how to analyse this issue?

* Re: vchiq: Performance regression since 5.18-rc1
From: Phil Elwell @ 2022-05-23 9:29 UTC
To: Stefan Wahren, paulmck
Cc: Marcelo Tosatti, Andrew Morton, Nicolas Saenz Julienne, Borislav Petkov,
    Minchan Kim, Mel Gorman, Juri Lelli, Thomas Gleixner,
    Sebastian Andrzej Siewior, linux-kernel, linux-mm, Linux ARM,
    regressions, riel, viro

Hi Stefan,

On 23/05/2022 07:19, Stefan Wahren wrote:
>> Could you please characterize where the slowdown is occurring?
>
> Unfortunately I don't have deep insight into the driver or the vchiq_test
> tool, just a user's view.
>
> Do you think an strace would be a good starting point?
>
> @Phil: Any advice on how to analyse this issue?

Sending many small control packets:

   vchiq_test -c 1 10000

essentially tests interrupt latency. Using a small number of large bulk
transfers:

   vchiq_test -b 10000 1

becomes a test of how long it takes to lock down pages. It also tests DMA
transfer speeds, but since the DMA is run by the firmware (which you aren't
changing), I think you can rule that out.

You may also find it helpful to include "force_turbo=1" in config.txt for
more predictable results.

By the way, running our 5.18-rc7-based branch on a 3B+ I'm not seeing any
performance problems:

pi@raspberrypi:~$ time vchiq_test -f 1
Functional test - iters:1
======== iteration 1 ========
Testing bulk transfer for alignment.
Testing bulk transfer at PAGE_SIZE.

real    0m0.512s
user    0m0.042s
sys     0m0.165s

Phil

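The three vchiq_test invocations mentioned in this thread (functional, control packets, bulk transfers) can be timed side by side to see which path carries the slowdown. This is a rough sketch under stated assumptions: time_cmd is a made-up helper, the labels are only descriptive, and the resolution is whole seconds, which is coarse but enough to tell roughly one second from seven minutes.

```shell
#!/bin/sh
# time_cmd LABEL CMD [ARGS...]: run CMD once, discard its output, and print
# the elapsed wall-clock time in whole seconds plus the exit status.
# (Hypothetical helper, for illustration only.)
time_cmd() {
    label=$1
    shift
    status=0
    start=$(date +%s)
    "$@" >/dev/null 2>&1 || status=$?
    end=$(date +%s)
    echo "$label: $((end - start))s (exit status $status)"
}

# Only run the tests where the tool from the userland repository is installed.
if command -v vchiq_test >/dev/null 2>&1; then
    time_cmd "functional"              vchiq_test -f 1
    time_cmd "control (irq latency)"   vchiq_test -c 1 10000
    time_cmd "bulk (page lock-down)"   vchiq_test -b 10000 1
fi
```

Running the same script on a kernel before and after the suspect commit gives three comparable pairs of numbers instead of one.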
* Re: vchiq: Performance regression since 5.18-rc1
From: Stefan Wahren @ 2022-05-23 10:48 UTC
To: Phil Elwell, paulmck
Cc: Marcelo Tosatti, Andrew Morton, Nicolas Saenz Julienne, Borislav Petkov,
    Minchan Kim, Mel Gorman, Juri Lelli, Thomas Gleixner,
    Sebastian Andrzej Siewior, linux-kernel, linux-mm, Linux ARM,
    regressions, riel, viro

Hi Phil,

On 23.05.22 at 11:29, Phil Elwell wrote:
> Sending many small control packets:
>
>    vchiq_test -c 1 10000
>
> essentially tests interrupt latency. Using a small number of large
> bulk transfers:
>
>    vchiq_test -b 10000 1
>
> becomes a test of how long it takes to lock down pages. It also tests
> DMA transfer speeds, but since the DMA is run by the firmware (which
> you aren't changing), I think you can rule that out.

Thanks, I will try.

> You may also find it helpful to include "force_turbo=1" in config.txt
> for more predictable results.
>
> By the way, running our 5.18-rc7-based branch on a 3B+ I'm not seeing
> any performance problems:

I assume you are using arm/bcm2709_defconfig and not arm/multi_v7_defconfig
as I am?

* Re: vchiq: Performance regression since 5.18-rc1
From: Phil Elwell @ 2022-05-23 11:01 UTC
To: Stefan Wahren, paulmck
Cc: Marcelo Tosatti, Andrew Morton, Nicolas Saenz Julienne, Borislav Petkov,
    Minchan Kim, Mel Gorman, Juri Lelli, Thomas Gleixner,
    Sebastian Andrzej Siewior, linux-kernel, linux-mm, Linux ARM,
    regressions, riel, viro

Hi Stefan,

On 23/05/2022 11:48, Stefan Wahren wrote:
>> By the way, running our 5.18-rc7-based branch on a 3B+ I'm not seeing any
>> performance problems:
> I assume you are using arm/bcm2709_defconfig and not arm/multi_v7_defconfig
> as I am?

That's correct. Simply switching to multi_v7_defconfig breaks vchiq
completely, presumably because it doesn't define CONFIG_BCM2835_VCHIQ.

Phil

* Re: vchiq: Performance regression since 5.18-rc1
From: Stefan Wahren @ 2022-05-23 11:15 UTC
To: Phil Elwell, paulmck
Cc: Marcelo Tosatti, Andrew Morton, Nicolas Saenz Julienne, Borislav Petkov,
    Minchan Kim, Mel Gorman, Juri Lelli, Thomas Gleixner,
    Sebastian Andrzej Siewior, linux-kernel, linux-mm, Linux ARM,
    regressions, riel, viro

Hi Phil,

On 23.05.22 at 13:01, Phil Elwell wrote:
>> I assume you are using arm/bcm2709_defconfig and not
>> arm/multi_v7_defconfig as I am?
>
> That's correct. Simply switching to multi_v7_defconfig breaks vchiq
> completely, presumably because it doesn't define CONFIG_BCM2835_VCHIQ.

Sorry, I forgot to mention that I enable VCHIQ as a module on top of
multi_v7_defconfig.

* Re: vchiq: Performance regression since 5.18-rc1
  2022-05-23 11:15 ` Stefan Wahren
@ 2022-05-23 11:22 ` Phil Elwell
  0 siblings, 0 replies; 19+ messages in thread
From: Phil Elwell @ 2022-05-23 11:22 UTC
To: Stefan Wahren, paulmck
Cc: Marcelo Tosatti, Andrew Morton, Nicolas Saenz Julienne,
    Borislav Petkov, Minchan Kim, Mel Gorman, Juri Lelli,
    Thomas Gleixner, Sebastian Andrzej Siewior, linux-kernel, linux-mm,
    Linux ARM, regressions, riel, viro

On 23/05/2022 12:15, Stefan Wahren wrote:
> Hi Phil,
>
> On 23.05.22 at 13:01, Phil Elwell wrote:
>> Hi Stefan,
>>
>> On 23/05/2022 11:48, Stefan Wahren wrote:
>>> Hi Phil,
>>>
>>> On 23.05.22 at 11:29, Phil Elwell wrote:
>>>> Hi Stefan,
>>>>
>>>> On 23/05/2022 07:19, Stefan Wahren wrote:
>>>>> Hi Paul,
>>>>>
>>>>> On 23.05.22 at 06:48, Paul E. McKenney wrote:
>>>>>> On Sun, May 22, 2022 at 05:11:36PM +0200, Stefan Wahren wrote:
>>>>>>> Hi Paul,
>>>>>>>
>>>>>>> On 22.05.22 at 01:46, Paul E. McKenney wrote:
>>>>>>>> On Sun, May 22, 2022 at 01:22:00AM +0200, Stefan Wahren wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> while testing the staging/vc04_services/interface/vchiq_arm driver with my
>>>>>>>>> Raspberry Pi 3 B+ (multi_v7_defconfig) i noticed a huge performance
>>>>>>>>> regression since [ff042f4a9b050895a42cae893cc01fa2ca81b95c] mm:
>>>>>>>>> lru_cache_disable: replace work queue synchronization with synchronize_rcu
>>>>>>>>>
>>>>>>>>> Usually i run "vchiq_test -f 1" to see the driver is still working [1].
>>>>>>>>>
>>>>>>>>> Before commit:
>>>>>>>>>
>>>>>>>>> real 0m1,500s
>>>>>>>>> user 0m0,068s
>>>>>>>>> sys 0m0,846s
>>>>>>>>>
>>>>>>>>> After commit:
>>>>>>>>>
>>>>>>>>> real 7m11,449s
>>>>>>>>> user 0m2,049s
>>>>>>>>> sys 0m0,023s
>>>>>>>>>
>>>>>>>>> Best regards
>>>>>>>>>
>>>>>>>>> [1] - https://github.com/raspberrypi/userland
>>>>>>>> Please feel free to try the patch shown below. Or the pair of patches
>>>>>>>> from Rik here:
>>>>>>>>
>>>>>>>> https://lore.kernel.org/lkml/20220218183114.2867528-2-riel@surriel.com/
>>>>>>>> https://lore.kernel.org/lkml/20220218183114.2867528-3-riel@surriel.com/
>>>>>>> I tried your patch and Rik's patches but in both cases vchiq_test runs 7
>>>>>>> minutes instead of ~ 1 second.
>>>>>> That is surprising. Do you boot with rcupdate.rcu_normal=1?
>>>>> No, not explicit.
>>>>>> That would
>>>>>> nullify my patch, but I would expect that Rik's patch would still provide
>>>>>> increased performance even in that case.
>>>>> I will retest with a fresh SD card image.
>>>>>>
>>>>>> Could you please characterize where the slowdown is occurring?
>>>>>
>>>>> Unfortunately i don't have a deep insight into driver and vchiq_test tool.
>>>>> Just a user view.
>>>>>
>>>>> Do you think an strace would be a good starting point?
>>>>>
>>>>> @Phil Any advices to analyse this issue?
>>>>
>>>> Sending many small control packets:
>>>>
>>>> vchiq_test -c 1 10000
>>>>
>>>> essentially tests interrupt latency. Using a small number of large bulk
>>>> transfers:
>>>>
>>>> vchiq_test -b 10000 1
>>>>
>>>> becomes a test of how long it takes to lock down pages. It also tests DMA
>>>> transfer speeds, but since the DMA is run by the firmware (which you aren't
>>>> changing), I think you can rule that.
>>> Thanks i will try.
>>>>
>>>> You may also find it helpful to include "force_turbo=1" in config.txt for
>>>> more predictable results.
>>>>
>>>> By the way, running our 5.18-rc7-based branch on a 3B+ I'm not seeing any
>>>> performance problems:
>>> I assume you are using arm/bcm2709_defconfig and not arm/multi_v7_defconfig
>>> as me?
>>
>> That's correct. Simply switching to multi_v7_defconfig breaks vchiq
>> completely, presumably because it doesn't define CONFIG_BCM2835_VCHIQ.
> Sorry, I forgot to mention that I enable VCHIQ as a module on top of
> multi_v7_defconfig.

Downstream tree with multi_v7_defconfig + CONFIG_BCM2835_VCHIQ:

pi@raspberrypi:~$ time vchiq_test -f 1
Functional test - iters:1
======== iteration 1 ========
Testing bulk transfer for alignment.
Testing bulk transfer at PAGE_SIZE.

real 0m0.566s
user 0m0.037s
sys 0m0.166s

Phil
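
Note: to follow up on Paul's request to characterize the slowdown, the
two invocations Phil suggests ("vchiq_test -c 1 10000" for interrupt
latency, "vchiq_test -b 10000 1" for page lock-down) could be timed from
one script so that good and bad kernels are compared on the same numbers.
The following is only a sketch under assumptions: Python 3 on the Pi and
vchiq_test in PATH. The names time_command and TESTS are made up for this
illustration and are not part of the userland tools.

```python
import resource
import shutil
import subprocess
import time


def time_command(argv):
    """Run argv and return (real, user, sys) in seconds, like the shell's `time`."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(argv, check=False)  # waits for the child to exit
    real = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user = after.ru_utime - before.ru_utime
    system = after.ru_stime - before.ru_stime
    return real, user, system


# The two invocations quoted in the thread; the labels are illustrative.
TESTS = {
    "control packets (interrupt latency)": ["vchiq_test", "-c", "1", "10000"],
    "bulk transfers (page lock-down)": ["vchiq_test", "-b", "10000", "1"],
}


def main():
    if shutil.which("vchiq_test") is None:
        print("vchiq_test not found in PATH; nothing to measure")
        return
    for label, argv in TESTS.items():
        real, user, system = time_command(argv)
        print(f"{label}: real {real:.3f}s user {user:.3f}s sys {system:.3f}s")


if __name__ == "__main__":
    main()
```

A large gap between real and user+sys in only one of the two tests would
point at the code path that test exercises.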
* Re: vchiq: Performance regression since 5.18-rc1
  2022-05-21 23:22 vchiq: Performance regression since 5.18-rc1 Stefan Wahren
  2022-05-21 23:46 ` Paul E. McKenney
@ 2022-05-23 7:09 ` Sebastian Andrzej Siewior
  2022-05-25 13:56 ` Marcelo Tosatti
  2022-05-23 9:28 ` Thorsten Leemhuis
  2 siblings, 1 reply; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-05-23 7:09 UTC
To: Stefan Wahren
Cc: Marcelo Tosatti, Andrew Morton, Nicolas Saenz Julienne,
    Borislav Petkov, Minchan Kim, Matthew Wilcox, Mel Gorman, Juri Lelli,
    Thomas Gleixner, Paul E. McKenney, linux-kernel, linux-mm, Linux ARM,
    Phil Elwell, regressions

On 2022-05-22 01:22:00 [+0200], Stefan Wahren wrote:
> Hi,
Hi,

> while testing the staging/vc04_services/interface/vchiq_arm driver with my
> Raspberry Pi 3 B+ (multi_v7_defconfig) i noticed a huge performance
> regression since [ff042f4a9b050895a42cae893cc01fa2ca81b95c] mm:
> lru_cache_disable: replace work queue synchronization with synchronize_rcu
>
> Usually i run "vchiq_test -f 1" to see the driver is still working [1].

What about
  https://lore.kernel.org/all/YmrWK%2FKoU1zrAxPI@fuller.cnet/

Sebastian
* Re: vchiq: Performance regression since 5.18-rc1
  2022-05-23 7:09 ` Sebastian Andrzej Siewior
@ 2022-05-25 13:56 ` Marcelo Tosatti
  2022-05-25 14:07 ` Stefan Wahren
  2022-05-30 9:54 ` Stefan Wahren
  0 siblings, 2 replies; 19+ messages in thread
From: Marcelo Tosatti @ 2022-05-25 13:56 UTC
To: Sebastian Andrzej Siewior, Stefan Wahren
Cc: Stefan Wahren, Andrew Morton, Nicolas Saenz Julienne,
    Borislav Petkov, Minchan Kim, Matthew Wilcox, Mel Gorman, Juri Lelli,
    Thomas Gleixner, Paul E. McKenney, linux-kernel, linux-mm, Linux ARM,
    Phil Elwell, regressions

On Mon, May 23, 2022 at 09:09:07AM +0200, Sebastian Andrzej Siewior wrote:
> On 2022-05-22 01:22:00 [+0200], Stefan Wahren wrote:
> > Hi,
> Hi,
>
> > while testing the staging/vc04_services/interface/vchiq_arm driver with my
> > Raspberry Pi 3 B+ (multi_v7_defconfig) i noticed a huge performance
> > regression since [ff042f4a9b050895a42cae893cc01fa2ca81b95c] mm:
> > lru_cache_disable: replace work queue synchronization with synchronize_rcu
> >
> > Usually i run "vchiq_test -f 1" to see the driver is still working [1].
>
> What about
> https://lore.kernel.org/all/YmrWK%2FKoU1zrAxPI@fuller.cnet/
>
> Sebastian

Stefan,

Can you please try the patch above ?
* Re: vchiq: Performance regression since 5.18-rc1
  2022-05-25 13:56 ` Marcelo Tosatti
@ 2022-05-25 14:07 ` Stefan Wahren
  2022-05-25 14:26 ` Sebastian Andrzej Siewior
  2022-05-25 15:37 ` Marcelo Tosatti
  2022-05-30 9:54 ` Stefan Wahren
  1 sibling, 2 replies; 19+ messages in thread
From: Stefan Wahren @ 2022-05-25 14:07 UTC
To: Marcelo Tosatti, Sebastian Andrzej Siewior
Cc: Andrew Morton, Nicolas Saenz Julienne, Borislav Petkov, Minchan Kim,
    Matthew Wilcox, Mel Gorman, Juri Lelli, Thomas Gleixner,
    Paul E. McKenney, linux-kernel, linux-mm, Linux ARM, Phil Elwell,
    regressions

Hi Marcelo,

On 25.05.22 at 15:56, Marcelo Tosatti wrote:
> On Mon, May 23, 2022 at 09:09:07AM +0200, Sebastian Andrzej Siewior wrote:
>> On 2022-05-22 01:22:00 [+0200], Stefan Wahren wrote:
>>> Hi,
>> Hi,
>>
>>> while testing the staging/vc04_services/interface/vchiq_arm driver with my
>>> Raspberry Pi 3 B+ (multi_v7_defconfig) i noticed a huge performance
>>> regression since [ff042f4a9b050895a42cae893cc01fa2ca81b95c] mm:
>>> lru_cache_disable: replace work queue synchronization with synchronize_rcu
>>>
>>> Usually i run "vchiq_test -f 1" to see the driver is still working [1].
>> What about
>> https://lore.kernel.org/all/YmrWK%2FKoU1zrAxPI@fuller.cnet/
>>
>> Sebastian
> Stefan,
>
> Can you please try the patch above ?

This was the same as Paul sent. I think I need more time for
investigation; maybe there is an issue with the application.

All I noticed so far is that in the good case the CPU usage is around
60 % and higher, while in the bad case the CPU is almost idle. Also, the
issue is not reproducible with arm64/defconfig.
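
Note: the CPU usage observation above can be cross-checked against the
`time` output from the original report by dividing user+sys by real. The
following is a minimal sketch; parse_time and cpu_share are illustrative
helper names, and the input strings are the figures from the report,
which use a comma as the decimal separator.

```python
import re


def parse_time(value):
    """Convert a `time` duration such as '7m11,449s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)[.,](\d+)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    minutes, seconds, fraction = match.groups()
    return int(minutes) * 60 + float(f"{seconds}.{fraction}")


def cpu_share(real, user, system):
    """Fraction of wall-clock time the process spent on a CPU."""
    return (parse_time(user) + parse_time(system)) / parse_time(real)


# Figures from the report: before and after commit ff042f4a9b05.
before = cpu_share("0m1,500s", "0m0,068s", "0m0,846s")
after = cpu_share("7m11,449s", "0m2,049s", "0m0,023s")

print(f"before: {before:.1%}")  # about 61 % of wall time on CPU
print(f"after:  {after:.1%}")   # well under 1 %, i.e. mostly waiting
```

The "before" run spends roughly 60 % of its wall time on the CPU, while
the "after" run spends almost all of its seven minutes waiting, which
matches the description of a nearly idle CPU in the bad case.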
* Re: vchiq: Performance regression since 5.18-rc1
  2022-05-25 14:07 ` Stefan Wahren
@ 2022-05-25 14:26 ` Sebastian Andrzej Siewior
  2022-05-25 15:02 ` Paul E. McKenney
  2022-05-25 15:37 ` Marcelo Tosatti
  1 sibling, 1 reply; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-05-25 14:26 UTC
To: Stefan Wahren
Cc: Marcelo Tosatti, Andrew Morton, Nicolas Saenz Julienne,
    Borislav Petkov, Minchan Kim, Matthew Wilcox, Mel Gorman, Juri Lelli,
    Thomas Gleixner, Paul E. McKenney, linux-kernel, linux-mm, Linux ARM,
    Phil Elwell, regressions

On 2022-05-25 16:07:47 [+0200], Stefan Wahren wrote:
> this was the same as Paul send. I think i need more time for investigation,
> maybe there is an issue with the application.

I haven't seen Paul referring to *that* patch. He pointed to some fs/
related changes.

Sebastian
* Re: vchiq: Performance regression since 5.18-rc1
  2022-05-25 14:26 ` Sebastian Andrzej Siewior
@ 2022-05-25 15:02 ` Paul E. McKenney
  0 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2022-05-25 15:02 UTC
To: Sebastian Andrzej Siewior
Cc: Stefan Wahren, Marcelo Tosatti, Andrew Morton,
    Nicolas Saenz Julienne, Borislav Petkov, Minchan Kim, Matthew Wilcox,
    Mel Gorman, Juri Lelli, Thomas Gleixner, linux-kernel, linux-mm,
    Linux ARM, Phil Elwell, regressions

On Wed, May 25, 2022 at 04:26:27PM +0200, Sebastian Andrzej Siewior wrote:
> On 2022-05-25 16:07:47 [+0200], Stefan Wahren wrote:
> > this was the same as Paul send. I think i need more time for investigation,
> > maybe there is an issue with the application.
>
> I haven't seen Paul referring to *that* patch. He pointed to some fs/
> related changes.

True! Both patches changed from a synchronize_rcu() to a
synchronize_rcu_expedited(), but different instances of
synchronize_rcu().

Thanx, Paul
* Re: vchiq: Performance regression since 5.18-rc1
  2022-05-25 14:07 ` Stefan Wahren
  2022-05-25 14:26 ` Sebastian Andrzej Siewior
@ 2022-05-25 15:37 ` Marcelo Tosatti
  2022-05-29 22:47 ` Stefan Wahren
  1 sibling, 1 reply; 19+ messages in thread
From: Marcelo Tosatti @ 2022-05-25 15:37 UTC
To: Stefan Wahren
Cc: Sebastian Andrzej Siewior, Andrew Morton, Nicolas Saenz Julienne,
    Borislav Petkov, Minchan Kim, Matthew Wilcox, Mel Gorman, Juri Lelli,
    Thomas Gleixner, Paul E. McKenney, linux-kernel, linux-mm, Linux ARM,
    Phil Elwell, regressions

On Wed, May 25, 2022 at 04:07:47PM +0200, Stefan Wahren wrote:
> Hi Marcelo,
>
> On 25.05.22 at 15:56, Marcelo Tosatti wrote:
> > On Mon, May 23, 2022 at 09:09:07AM +0200, Sebastian Andrzej Siewior wrote:
> > > On 2022-05-22 01:22:00 [+0200], Stefan Wahren wrote:
> > > > Hi,
> > > Hi,
> > >
> > > > while testing the staging/vc04_services/interface/vchiq_arm driver with my
> > > > Raspberry Pi 3 B+ (multi_v7_defconfig) i noticed a huge performance
> > > > regression since [ff042f4a9b050895a42cae893cc01fa2ca81b95c] mm:
> > > > lru_cache_disable: replace work queue synchronization with synchronize_rcu
> > > >
> > > > Usually i run "vchiq_test -f 1" to see the driver is still working [1].
> > > What about
> > > https://lore.kernel.org/all/YmrWK%2FKoU1zrAxPI@fuller.cnet/
> > >
> > > Sebastian
> > Stefan,
> >
> > Can you please try the patch above ?
>
> this was the same as Paul send. I think i need more time for investigation,
> maybe there is an issue with the application.

To clarify: they are not the same patches.

>
> All i noticed so far is that in good case the CPU usage is around ~ 60 % and
> higher, while in bad case the CPU is almost idle. Also the issue is not
> reproducible with arm64/defconfig.
* Re: vchiq: Performance regression since 5.18-rc1
  2022-05-25 15:37 ` Marcelo Tosatti
@ 2022-05-29 22:47 ` Stefan Wahren
  0 siblings, 0 replies; 19+ messages in thread
From: Stefan Wahren @ 2022-05-29 22:47 UTC
To: Marcelo Tosatti
Cc: Sebastian Andrzej Siewior, Andrew Morton, Nicolas Saenz Julienne,
    Borislav Petkov, Minchan Kim, Matthew Wilcox, Mel Gorman, Juri Lelli,
    Thomas Gleixner, Paul E. McKenney, linux-kernel, linux-mm, Linux ARM,
    Phil Elwell, regressions

On 25.05.22 at 17:37, Marcelo Tosatti wrote:
> On Wed, May 25, 2022 at 04:07:47PM +0200, Stefan Wahren wrote:
>> Hi Marcelo,
>>
>> On 25.05.22 at 15:56, Marcelo Tosatti wrote:
>>> On Mon, May 23, 2022 at 09:09:07AM +0200, Sebastian Andrzej Siewior wrote:
>>>> On 2022-05-22 01:22:00 [+0200], Stefan Wahren wrote:
>>>>> Hi,
>>>> Hi,
>>>>
>>>>> while testing the staging/vc04_services/interface/vchiq_arm driver with my
>>>>> Raspberry Pi 3 B+ (multi_v7_defconfig) i noticed a huge performance
>>>>> regression since [ff042f4a9b050895a42cae893cc01fa2ca81b95c] mm:
>>>>> lru_cache_disable: replace work queue synchronization with synchronize_rcu
>>>>>
>>>>> Usually i run "vchiq_test -f 1" to see the driver is still working [1].
>>>> What about
>>>> https://lore.kernel.org/all/YmrWK%2FKoU1zrAxPI@fuller.cnet/
>>>>
>>>> Sebastian
>>> Stefan,
>>>
>>> Can you please try the patch above ?
>> this was the same as Paul send. I think i need more time for investigation,
>> maybe there is an issue with the application.
> To clarify: they are not the same patches.

Thanks for pointing that out. I will test it ASAP.

>
>> All i noticed so far is that in good case the CPU usage is around ~ 60 % and
>> higher, while in bad case the CPU is almost idle. Also the issue is not
>> reproducible with arm64/defconfig.
* Re: vchiq: Performance regression since 5.18-rc1
  2022-05-25 13:56 ` Marcelo Tosatti
  2022-05-25 14:07 ` Stefan Wahren
@ 2022-05-30 9:54 ` Stefan Wahren
  1 sibling, 0 replies; 19+ messages in thread
From: Stefan Wahren @ 2022-05-30 9:54 UTC
To: Marcelo Tosatti, Sebastian Andrzej Siewior
Cc: Andrew Morton, Nicolas Saenz Julienne, Borislav Petkov, Minchan Kim,
    Matthew Wilcox, Mel Gorman, Juri Lelli, Thomas Gleixner,
    Paul E. McKenney, linux-kernel, linux-mm, Linux ARM, Phil Elwell,
    regressions

Hi Marcelo, hi Sebastian,

On 25.05.22 at 15:56, Marcelo Tosatti wrote:
> On Mon, May 23, 2022 at 09:09:07AM +0200, Sebastian Andrzej Siewior wrote:
>> On 2022-05-22 01:22:00 [+0200], Stefan Wahren wrote:
>>> Hi,
>> Hi,
>>
>>> while testing the staging/vc04_services/interface/vchiq_arm driver with my
>>> Raspberry Pi 3 B+ (multi_v7_defconfig) i noticed a huge performance
>>> regression since [ff042f4a9b050895a42cae893cc01fa2ca81b95c] mm:
>>> lru_cache_disable: replace work queue synchronization with synchronize_rcu
>>>
>>> Usually i run "vchiq_test -f 1" to see the driver is still working [1].
>> What about
>> https://lore.kernel.org/all/YmrWK%2FKoU1zrAxPI@fuller.cnet/
>>
>> Sebastian
> Stefan,
>
> Can you please try the patch above ?

This patch fixes the regression. Great!

Best regards
* Re: vchiq: Performance regression since 5.18-rc1
  2022-05-21 23:22 vchiq: Performance regression since 5.18-rc1 Stefan Wahren
  2022-05-21 23:46 ` Paul E. McKenney
  2022-05-23 7:09 ` Sebastian Andrzej Siewior
@ 2022-05-23 9:28 ` Thorsten Leemhuis
  2 siblings, 0 replies; 19+ messages in thread
From: Thorsten Leemhuis @ 2022-05-23 9:28 UTC
To: Stefan Wahren, Marcelo Tosatti, Andrew Morton, Nicolas Saenz Julienne
Cc: Borislav Petkov, Minchan Kim, Matthew Wilcox, Mel Gorman, Juri Lelli,
    Thomas Gleixner, Sebastian Andrzej Siewior, Paul E. McKenney,
    linux-kernel, linux-mm, Linux ARM, Phil Elwell, regressions

[TLDR: I'm adding this regression report to the list of tracked
regressions; all text from me you find below is based on a few template
paragraphs you might have encountered already in similar form.]

Hi, this is your Linux kernel regression tracker.

On 22.05.22 01:22, Stefan Wahren wrote:
>
> while testing the staging/vc04_services/interface/vchiq_arm driver with
> my Raspberry Pi 3 B+ (multi_v7_defconfig) i noticed a huge performance
> regression since [ff042f4a9b050895a42cae893cc01fa2ca81b95c] mm:
> lru_cache_disable: replace work queue synchronization with synchronize_rcu
>
> Usually i run "vchiq_test -f 1" to see the driver is still working [1].
>
> Before commit:
>
> real 0m1,500s
> user 0m0,068s
> sys 0m0,846s
>
> After commit:
>
> real 7m11,449s
> user 0m2,049s
> sys 0m0,023s

Thanks for the report.

To be sure the issue below doesn't fall through the cracks unnoticed,
I'm adding it to regzbot, my Linux kernel regression tracking bot:

#regzbot ^introduced ff042f4a9b050895a42cae893cc01fa2ca81b95
#regzbot title mm: vchiq_test runs 7 minutes instead of ~ 1 second.
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply -- ideally with also
telling regzbot about it, as explained here:
https://linux-regtracking.leemhuis.info/tracked-regression/

Reminder for developers: When fixing the issue, add 'Link:' tags
pointing to the report (the mail this one replied to), as the kernel's
documentation calls for; the page above explains why this is important
for tracked regressions.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.