From: Mikulas Patocka <mpatocka@redhat.com>
To: Alexey Brodkin <Alexey.Brodkin@synopsys.com>
Cc: "linux-fbdev@vger.kernel.org" <linux-fbdev@vger.kernel.org>,
	"ladis@linux-mips.org" <ladis@linux-mips.org>,
	"b.zolnierkie@samsung.com" <b.zolnierkie@samsung.com>,
	"bernie@plugable.com" <bernie@plugable.com>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"airlied@redhat.com" <airlied@redhat.com>
Subject: Re: [PATCH 08/21] udl-kms: avoid prefetch
Date: Wed, 06 Jun 2018 15:46:17 +0000
Message-ID: <alpine.LRH.2.02.1806061138080.7464@file01.intranet.prod.int.rdu2.redhat.com>
In-Reply-To: <fad12efd7a0f93e1dd6de5e7bf0c45832e6d74ea.camel@synopsys.com>



On Wed, 6 Jun 2018, Alexey Brodkin wrote:

> Hi Mikulas,
> 
> On Tue, 2018-06-05 at 11:30 -0400, Mikulas Patocka wrote:
> > 
> > On Tue, 5 Jun 2018, Alexey Brodkin wrote:
> > 
> > > Hi Mikulas,
> > > 
> > > On Sun, 2018-06-03 at 16:41 +0200, Mikulas Patocka wrote:
> > > > Modern processors can detect linear memory accesses and prefetch data
> > > > automatically, so there's no need to use prefetch.
> > > 
> > > Not each and every CPU that's capable of running Linux has prefetch
> > > functionality :)
> > > 
> > > Still, read on...
> > > 
> > > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > > > 
> > > > ---
> > > >  drivers/gpu/drm/udl/udl_transfer.c |    7 -------
> > > >  1 file changed, 7 deletions(-)
> > > > 
> > > > Index: linux-4.16.12/drivers/gpu/drm/udl/udl_transfer.c
> > > > ===================================================================
> > > > --- linux-4.16.12.orig/drivers/gpu/drm/udl/udl_transfer.c	2018-05-31 14:48:12.000000000 +0200
> > > > +++ linux-4.16.12/drivers/gpu/drm/udl/udl_transfer.c	2018-05-31 14:48:12.000000000 +0200
> > > > @@ -13,7 +13,6 @@
> > > >  #include <linux/module.h>
> > > >  #include <linux/slab.h>
> > > >  #include <linux/fb.h>
> > > > -#include <linux/prefetch.h>
> > > >  #include <asm/unaligned.h>
> > > >  
> > > >  #include <drm/drmP.h>
> > > > @@ -51,9 +50,6 @@ static int udl_trim_hline(const u8 *bbac
> > > >  	int start = width;
> > > >  	int end = width;
> > > >  
> > > > -	prefetch((void *) front);
> > > > -	prefetch((void *) back);
> > > 
> > > AFAIK the prefetcher fetches new data according to a known history, i.e. based on the previously
> > > used access pattern it tries to get the next batch of data.
> > > 
> > > But the code above is at the very beginning of the data processing routine, where the
> > > prefetcher doesn't yet have any history to know what and where to prefetch.
> > > 
> > > So I'd say this particular usage is good.
> > > At least those prefetches shouldn't hurt, because typically each would
> > > be just one instruction where the CPU supports prefetch, or nothing if the
> > > CPU/compiler doesn't support it.
> > 
> > See this post https://lwn.net/Articles/444336/ where they measured that
> > prefetch hurts performance. Prefetch shouldn't be used unless you have
> > proof that it improves performance.
> > 
> > The problem is that the prefetch instruction causes stalls in the pipeline
> > when it encounters a TLB miss, whereas the automatic prefetcher doesn't.
> 
> Wow, thanks for the link.
> I didn't know about that subtle issue with prefetch instructions on ARM and x86.
> 
> So OK, in the case of UDL these prefetches don't make much sense anyway, I guess, and there's
> something worse still. See what I got from a WandBoard Quad running the kmscube [1] application
> with the help of the perf utility:
> --------------------------->8-------------------------
> # Overhead  Command  Shared Object            Symbol 
> # ........  .......  .......................  ........................................
> #
>     92.93%  kmscube  [kernel.kallsyms]        [k] udl_render_hline
>      2.51%  kmscube  [kernel.kallsyms]        [k] __divsi3
>      0.33%  kmscube  [kernel.kallsyms]        [k] _raw_spin_unlock_irqrestore
>      0.22%  kmscube  [kernel.kallsyms]        [k] lock_acquire
>      0.19%  kmscube  [kernel.kallsyms]        [k] _raw_spin_unlock_irq
>      0.17%  kmscube  [kernel.kallsyms]        [k] udl_handle_damage
>      0.12%  kmscube  [kernel.kallsyms]        [k] v7_dma_clean_range
>      0.11%  kmscube  [kernel.kallsyms]        [k] l2c210_clean_range
>      0.06%  kmscube  [kernel.kallsyms]        [k] __memzero
> --------------------------->8-------------------------
> 
> That said, the bottleneck is not even USB 2.0 but the
> computations in udl_render_hline().
> 
> 
> [1] https://cgit.freedesktop.org/mesa/kmscube/
> 
> -Alexey

Try this patch 
http://people.redhat.com/~mpatocka/patches/kernel/udl/udlkms-avoid-division.patch

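It is doing a lot of divisions, and the WandBoard has a Cortex-A9, which doesn't
have a division instruction, so every division becomes a call to a software
routine such as the __divsi3 that shows up in your profile.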
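For illustration, a minimal sketch of the usual way to get rid of such divisions, assuming the bytes-per-pixel value is always a power of two (2 or 4): keep log2(bpp) around and shift instead of dividing. The helpers pixel_index_div, pixel_index_shift and check_shift_matches_division are hypothetical names, and this is not necessarily what the linked patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: byte offset -> pixel index using a division.
 * On a CPU without a divide instruction (e.g. Cortex-A9), dividing by a
 * runtime value is compiled into a call to a software division routine. */
static uint32_t pixel_index_div(uint32_t byte_offset, uint32_t bpp)
{
	return byte_offset / bpp;
}

/* Hypothetical helper: same result with a single shift.
 * Assumption: bpp == 1u << log_bpp (2 or 4 bytes per pixel). */
static uint32_t pixel_index_shift(uint32_t byte_offset, unsigned int log_bpp)
{
	return byte_offset >> log_bpp;
}

/* Assert that both helpers agree for 2- and 4-byte pixels
 * over all byte offsets below max_offset. */
static void check_shift_matches_division(uint32_t max_offset)
{
	for (unsigned int log_bpp = 1; log_bpp <= 2; log_bpp++) {
		uint32_t bpp = 1u << log_bpp;

		for (uint32_t off = 0; off < max_offset; off++)
			assert(pixel_index_div(off, bpp) ==
			       pixel_index_shift(off, log_bpp));
	}
}
```

The shift is only valid because the divisor is restricted to powers of two; for an arbitrary divisor, the division would instead have to be hoisted out of the per-pixel loop.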

BTW, the framebuffer UDL driver (not the modesetting driver) has
performance counters in sysfs. Their location depends on the system; you
can find them with: find /sys -name "*metrics*"

The file "metrics_reset" resets the counters, so you can measure whether the
prefetch instructions improve performance or not.
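To try the same question outside the kernel, a minimal user-space sketch loosely modelled on the trimming step of udl_trim_hline() from the patch context: it finds the first and last words that differ between the previous (back) and new (front) scanline, with the two explicit prefetch hints that the patch removes switchable at run time. trim_hline and its parameters are hypothetical; __builtin_prefetch is the GCC/Clang builtin, which is also what the kernel's generic prefetch() falls back to when an architecture does not provide its own.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical user-space analogue of the trimming step in udl_trim_hline():
 * compare the previous frame (back) with the new one (front) word by word
 * and report the changed span [*start, *end). Returns the number of
 * identical words that would not need to be sent. use_prefetch toggles the
 * explicit prefetch hints, so both variants can be timed on the same CPU.
 */
static size_t trim_hline(const unsigned long *back, const unsigned long *front,
			 size_t width, int use_prefetch,
			 size_t *start, size_t *end)
{
	size_t s = 0, e = width;

	assert(start != NULL && end != NULL);

	if (use_prefetch) {
		/* Hints only: the CPU is free to ignore them. */
		__builtin_prefetch(front);
		__builtin_prefetch(back);
	}

	/* Skip the identical prefix. */
	while (s < width && back[s] == front[s])
		s++;

	if (s == width) {
		/* Whole line unchanged: empty span, everything identical. */
		*start = *end = width;
		return width;
	}

	/* Skip the identical suffix; back[s] != front[s] stops the scan at s + 1. */
	while (e > s + 1 && back[e - 1] == front[e - 1])
		e--;

	*start = s;
	*end = e;
	return s + (width - e);
}
```

Timing many calls with use_prefetch set to 0 and to 1 on the target board is the kind of measurement that would show whether the hints help there; the result is CPU-specific and should not be generalized from one machine.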

Mikulas
