All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sergey Organov <sorganov@gmail.com>
To: "Uwe Kleine-König" <u.kleine-koenig@pengutronix.de>
Cc: "Fabio Estevam" <festevam@gmail.com>,
	"Pengutronix Kernel Team" <kernel@pengutronix.de>,
	"Richard Genoud" <richard.genoud@gmail.com>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Sascha Hauer" <s.hauer@pengutronix.de>,
	"NXP Linux Team" <linux-imx@nxp.com>,
	linux-serial@vger.kernel.org, "Shawn Guo" <shawnguo@kernel.org>,
	"Tim Harvey" <tharvey@gateworks.com>,
	"Jiri Slaby" <jirislaby@kernel.org>,
	"Tomasz Moń" <tomasz.mon@camlingroup.com>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH 7/8] serial: imx: use readl() to optimize FIFO reading loop
Date: Wed, 18 Jan 2023 18:40:17 +0300	[thread overview]
Message-ID: <87ilh3zw9q.fsf@osv.gnss.ru> (raw)
In-Reply-To: <20230117212702.vvwe3rqjedivqbhn@pengutronix.de> ("Uwe =?utf-8?Q?Kleine-K=C3=B6nig=22's?= message of "Tue, 17 Jan 2023 22:27:02 +0100")

Uwe Kleine-König <u.kleine-koenig@pengutronix.de> writes:

> Hello Sergey,
>
> On Tue, Jan 17, 2023 at 04:22:51PM +0300, Sergey Organov wrote:
>> Uwe Kleine-König <u.kleine-koenig@pengutronix.de> writes:
>> > On Fri, Jan 13, 2023 at 09:43:33PM +0300, Sergey Organov wrote:
>> >> Use readl() instead of heavier imx_uart_readl() in the Rx ISR, as we know
>> >> we read registers that must not be cached.
>> >> 
>> >> Signed-off-by: Sergey Organov <sorganov@gmail.com>
>> >> ---
>> >>  drivers/tty/serial/imx.c | 5 +++--
>> >>  1 file changed, 3 insertions(+), 2 deletions(-)
>> >> 
>> >> diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
>> >> index be00362b8b67..f4236e8995fa 100644
>> >> --- a/drivers/tty/serial/imx.c
>> >> +++ b/drivers/tty/serial/imx.c
>> >> @@ -890,14 +890,15 @@ static irqreturn_t __imx_uart_rxint(int irq, void *dev_id)
>> >>  	struct imx_port *sport = dev_id;
>> >>  	unsigned int rx, flg;
>> >>  	struct tty_port *port = &sport->port.state->port;
>> >> +	typeof(sport->port.membase) membase = sport->port.membase;
>> >>  	u32 usr2;
>> >>  
>> >>  	/* If we received something, check for 0xff flood */
>> >> -	usr2 = imx_uart_readl(sport, USR2);
>> >> +	usr2 = readl(membase + USR2);
>> >>  	if (usr2 & USR2_RDR)
>> >>  		imx_uart_check_flood(sport, usr2);
>> >>  
>> >> -	while ((rx = imx_uart_readl(sport, URXD0)) & URXD_CHARRDY) {
>> >> +	while ((rx = readl(membase + URXD0)) & URXD_CHARRDY) {
>> >>  		flg = TTY_NORMAL;
>> >>  		sport->port.icount.rx++;
>> >
>> > One of the motivations to introduce imx_uart_readl was to have a single
>> > place to add a debug output to be able to inspect what the driver is
>> > doing.
>> >
>> > I wonder where your need for higher speed comes from and if the compiler
>> > really generates more effective code with your change.
>> 
>> Mostly it's because I'm obviously slowing things down a bit with the
>> patch to fight the flood, so I feel obliged to get things back on par
>> with the origin. Then, higher speed, let alone the time spent with
>> interrupts disabled and/or spinlocks taken, is always one of generic
>> goals for me.
>> 
>> As for the generated code, with this patch I don't aim to affect code
>> generation, I rather avoid execution of part of existing code while
>> being on the most critical path. It should be quite obvious that not
>> executing some code is at least not slower than executing it.
>
> That's true, but I think it doesn't apply here.

Well, "at least not slower" still applies ;-)

>
> I would expect that the compiler "sees" for the call
>
> 	imx_uart_readl(sport, USR2)
>
> that the 2nd argument is constant and that for that value of offset the
> call is equivalent to readl(sport->port.membase + offset);
>
> So I doubt you're making anything quicker here.

Yep, it's nice compiler is clever enough to optimize-out the switch for
constant argument, though I still typically prefer to avoid over-relying
on optimizations. That said, I now tend to agree with your POV in this
particular case.

>
> I tried the following patch on mainline (that is without the preceding
> patches in this series):
>
> diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
> index 757825edb0cd..cfc2f7057345 100644
> --- a/drivers/tty/serial/imx.c
> +++ b/drivers/tty/serial/imx.c
> @@ -807,7 +807,7 @@ static irqreturn_t __imx_uart_rxint(int irq, void *dev_id)
>  	unsigned int rx, flg, ignored = 0;
>  	struct tty_port *port = &sport->port.state->port;
>  
> -	while (imx_uart_readl(sport, USR2) & USR2_RDR) {
> +	while (readl(sport->port.membase + USR2) & USR2_RDR) {
>  		u32 usr2;
>  
>  		flg = TTY_NORMAL;
>
> and the resulting code didn't change at all. For a bigger change (i.e.
> adding a variable for sport->port.membase and replacing two
> imx_uart_readl) the code changed quite a bit (it got 28 bytes bigger for
> imx_v6_v7_defconfig) and in the short time I tried I couldn't judge if
> the resulting code is better or not.
>
> So a change that explicitly doesn't execute the code that the compiler
> optimizes away anyhow isn't a win. Together with the fact that your
> patch makes register access use different idioms and so makes it harder
> to understand for a human I'd say the net benefit of your patch is
> negative.

OK, you convinced me to drop it.

>
>> > Please either drop the patch from your series or provide the differences
>> > the compiler produces and a benchmark.
>> 
>> If your only objection against this patch is the desire to keep a single
>> place to add debug output, I'll be happy to tune the resulting code to
>> still have one.
>
> I don't see the need to optimize it.
>
>> That said, before we make a decision, could you please tell why register
>> shadows that the imx_uart_readl/writel are dealing with are needed in
>> the first place? It looks like all the registers that are shadowed are
>> readable as well. What's going on here, and if it happens to be a
>> speed-up, do we have any benchmarks?
>
> Not sure I did benchmarks back then, probably not. The main motivation
> was really to have that single access function. So I admit being guilty
> to have implemented an optimization without hard numbers just assuming
> that access to (cached) RAM is quicker than the register space.

Well, even if it is quicker, we still spend time writing to both RAM and
register, and then there is no gain for the data Tx/Rx registers that
aren't cached, yet are on most critical paths.

So, if this is just caching and doesn't change behavior, I'd suggest to
get rid of the shadowing altogether, making code simpler to follow.
Besides, if it were so, I'd had no temptation to replace the
imx_uart_readl() with raw readl().

Thanks,
-- Sergey

WARNING: multiple messages have this Message-ID (diff)
From: Sergey Organov <sorganov@gmail.com>
To: "Uwe Kleine-König" <u.kleine-koenig@pengutronix.de>
Cc: "Fabio Estevam" <festevam@gmail.com>,
	"Pengutronix Kernel Team" <kernel@pengutronix.de>,
	"Richard Genoud" <richard.genoud@gmail.com>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Sascha Hauer" <s.hauer@pengutronix.de>,
	"NXP Linux Team" <linux-imx@nxp.com>,
	linux-serial@vger.kernel.org, "Shawn Guo" <shawnguo@kernel.org>,
	"Tim Harvey" <tharvey@gateworks.com>,
	"Jiri Slaby" <jirislaby@kernel.org>,
	"Tomasz Moń" <tomasz.mon@camlingroup.com>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH 7/8] serial: imx: use readl() to optimize FIFO reading loop
Date: Wed, 18 Jan 2023 18:40:17 +0300	[thread overview]
Message-ID: <87ilh3zw9q.fsf@osv.gnss.ru> (raw)
In-Reply-To: <20230117212702.vvwe3rqjedivqbhn@pengutronix.de> ("Uwe =?utf-8?Q?Kleine-K=C3=B6nig=22's?= message of "Tue, 17 Jan 2023 22:27:02 +0100")

Uwe Kleine-König <u.kleine-koenig@pengutronix.de> writes:

> Hello Sergey,
>
> On Tue, Jan 17, 2023 at 04:22:51PM +0300, Sergey Organov wrote:
>> Uwe Kleine-König <u.kleine-koenig@pengutronix.de> writes:
>> > On Fri, Jan 13, 2023 at 09:43:33PM +0300, Sergey Organov wrote:
>> >> Use readl() instead of heavier imx_uart_readl() in the Rx ISR, as we know
>> >> we read registers that must not be cached.
>> >> 
>> >> Signed-off-by: Sergey Organov <sorganov@gmail.com>
>> >> ---
>> >>  drivers/tty/serial/imx.c | 5 +++--
>> >>  1 file changed, 3 insertions(+), 2 deletions(-)
>> >> 
>> >> diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
>> >> index be00362b8b67..f4236e8995fa 100644
>> >> --- a/drivers/tty/serial/imx.c
>> >> +++ b/drivers/tty/serial/imx.c
>> >> @@ -890,14 +890,15 @@ static irqreturn_t __imx_uart_rxint(int irq, void *dev_id)
>> >>  	struct imx_port *sport = dev_id;
>> >>  	unsigned int rx, flg;
>> >>  	struct tty_port *port = &sport->port.state->port;
>> >> +	typeof(sport->port.membase) membase = sport->port.membase;
>> >>  	u32 usr2;
>> >>  
>> >>  	/* If we received something, check for 0xff flood */
>> >> -	usr2 = imx_uart_readl(sport, USR2);
>> >> +	usr2 = readl(membase + USR2);
>> >>  	if (usr2 & USR2_RDR)
>> >>  		imx_uart_check_flood(sport, usr2);
>> >>  
>> >> -	while ((rx = imx_uart_readl(sport, URXD0)) & URXD_CHARRDY) {
>> >> +	while ((rx = readl(membase + URXD0)) & URXD_CHARRDY) {
>> >>  		flg = TTY_NORMAL;
>> >>  		sport->port.icount.rx++;
>> >
>> > One of the motivations to introduce imx_uart_readl was to have a single
>> > place to add a debug output to be able to inspect what the driver is
>> > doing.
>> >
>> > I wonder where your need for higher speed comes from and if the compiler
>> > really generates more effective code with your change.
>> 
>> Mostly it's because I'm obviously slowing things down a bit with the
>> patch to fight the flood, so I feel obliged to get things back on par
>> with the origin. Then, higher speed, let alone the time spent with
>> interrupts disabled and/or spinlocks taken, is always one of generic
>> goals for me.
>> 
>> As for the generated code, with this patch I don't aim to affect code
>> generation, I rather avoid execution of part of existing code while
>> being on the most critical path. It should be quite obvious that not
>> executing some code is at least not slower than executing it.
>
> That's true, but I think it doesn't apply here.

Well, "at least not slower" still applies ;-)

>
> I would expect that the compiler "sees" for the call
>
> 	imx_uart_readl(sport, USR2)
>
> that the 2nd argument is constant and that for that value of offset the
> call is equivalent to readl(sport->port.membase + offset);
>
> So I doubt you're making anything quicker here.

Yep, it's nice compiler is clever enough to optimize-out the switch for
constant argument, though I still typically prefer to avoid over-relying
on optimizations. That said, I now tend to agree with your POV in this
particular case.

>
> I tried the following patch on mainline (that is without the preceding
> patches in this series):
>
> diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
> index 757825edb0cd..cfc2f7057345 100644
> --- a/drivers/tty/serial/imx.c
> +++ b/drivers/tty/serial/imx.c
> @@ -807,7 +807,7 @@ static irqreturn_t __imx_uart_rxint(int irq, void *dev_id)
>  	unsigned int rx, flg, ignored = 0;
>  	struct tty_port *port = &sport->port.state->port;
>  
> -	while (imx_uart_readl(sport, USR2) & USR2_RDR) {
> +	while (readl(sport->port.membase + USR2) & USR2_RDR) {
>  		u32 usr2;
>  
>  		flg = TTY_NORMAL;
>
> and the resulting code didn't change at all. For a bigger change (i.e.
> adding a variable for sport->port.membase and replacing two
> imx_uart_readl) the code changed quite a bit (it got 28 bytes bigger for
> imx_v6_v7_defconfig) and in the short time I tried I couldn't judge if
> the resulting code is better or not.
>
> So a change that explicitly doesn't execute the code that the compiler
> optimizes away anyhow isn't a win. Together with the fact that your
> patch makes register access use different idioms and so makes it harder
> to understand for a human I'd say the net benefit of your patch is
> negative.

OK, you convinced me to drop it.

>
>> > Please either drop the patch from your series or provide the differences
>> > the compiler produces and a benchmark.
>> 
>> If your only objection against this patch is the desire to keep a single
>> place to add debug output, I'll be happy to tune the resulting code to
>> still have one.
>
> I don't see the need to optimize it.
>
>> That said, before we make a decision, could you please tell why register
>> shadows that the imx_uart_readl/writel are dealing with are needed in
>> the first place? It looks like all the registers that are shadowed are
>> readable as well. What's going on here, and if it happens to be a
>> speed-up, do we have any benchmarks?
>
> Not sure I did benchmarks back then, probably not. The main motivation
> was really to have that single access function. So I admit being guilty
> to have implemented an optimization without hard numbers just assuming
> that access to (cached) RAM is quicker than the register space.

Well, even if it is quicker, we still spend time writing to both RAM and
register, and then there is no gain for the data Tx/Rx registers that
aren't cached, yet are on most critical paths.

So, if this is just caching and doesn't change behavior, I'd suggest to
get rid of the shadowing altogether, making code simpler to follow.
Besides, if it were so, I'd had no temptation to replace the
imx_uart_readl() with raw readl().

Thanks,
-- Sergey

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2023-01-18 15:40 UTC|newest]

Thread overview: 107+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <87bko4e65y.fsf@osv.gnss.ru>
2022-12-15 21:38 ` serial: imx: sudden rx flood: hardware bug? Fabio Estevam
2022-12-15 22:58   ` Fabio Estevam
2022-12-16 17:08   ` Sergey Organov
2023-01-13 18:43 ` [PATCH 0/8] serial: imx: work-around for hardware RX flood, and then isr improvements Sergey Organov
2023-01-13 18:43   ` Sergey Organov
2023-01-13 18:43   ` [PATCH 1/8] serial: imx: factor-out common code to imx_uart_soft_reset() Sergey Organov
2023-01-13 18:43     ` Sergey Organov
2023-01-16 10:30     ` Ilpo Järvinen
2023-01-16 10:30       ` Ilpo Järvinen
2023-01-17 13:42       ` Sergey Organov
2023-01-17 13:42         ` Sergey Organov
2023-01-13 18:43   ` [PATCH 2/8] serial: imx: work-around for hardware RX flood Sergey Organov
2023-01-13 18:43     ` Sergey Organov
2023-01-13 18:43   ` [PATCH 3/8] serial: imx: do not sysrq broken chars Sergey Organov
2023-01-13 18:43     ` Sergey Organov
2023-01-16 10:41     ` Ilpo Järvinen
2023-01-16 10:41       ` Ilpo Järvinen
2023-01-16 15:24     ` Johan Hovold
2023-01-16 15:24       ` Johan Hovold
2023-01-17 17:35       ` Sergey Organov
2023-01-17 17:35         ` Sergey Organov
2023-01-18  8:06         ` Johan Hovold
2023-01-18  8:06           ` Johan Hovold
2023-01-13 18:43   ` [PATCH 4/8] serial: imx: do not break from FIFO reading loop prematurely Sergey Organov
2023-01-13 18:43     ` Sergey Organov
2023-01-13 18:43   ` [PATCH 5/8] serial: imx: remove redundant USR2 read from FIFO reading loop Sergey Organov
2023-01-13 18:43     ` Sergey Organov
2023-01-16 10:50     ` Ilpo Järvinen
2023-01-16 10:50       ` Ilpo Järvinen
2023-01-13 18:43   ` [PATCH 6/8] serial: imx: stop using USR2 in " Sergey Organov
2023-01-13 18:43     ` Sergey Organov
2023-01-16 10:54     ` Ilpo Järvinen
2023-01-16 10:54       ` Ilpo Järvinen
2023-01-17 13:30       ` Sergey Organov
2023-01-17 13:30         ` Sergey Organov
2023-01-13 18:43   ` [PATCH 7/8] serial: imx: use readl() to optimize " Sergey Organov
2023-01-13 18:43     ` Sergey Organov
2023-01-16 11:03     ` Ilpo Järvinen
2023-01-16 11:03       ` Ilpo Järvinen
2023-01-17 17:43       ` Sergey Organov
2023-01-17 17:43         ` Sergey Organov
2023-01-18  8:24         ` Ilpo Järvinen
2023-01-18  8:24           ` Ilpo Järvinen
2023-01-18 15:43           ` Sergey Organov
2023-01-18 15:43             ` Sergey Organov
2023-01-18 19:29             ` Ilpo Järvinen
2023-01-18 19:29               ` Ilpo Järvinen
2023-01-17 10:20     ` Sherry Sun
2023-01-17 10:20       ` Sherry Sun
2023-01-17 17:48       ` Sergey Organov
2023-01-17 17:48         ` Sergey Organov
2023-01-17 11:32     ` Uwe Kleine-König
2023-01-17 11:32       ` Uwe Kleine-König
2023-01-17 13:22       ` Sergey Organov
2023-01-17 13:22         ` Sergey Organov
2023-01-17 21:27         ` Uwe Kleine-König
2023-01-17 21:27           ` Uwe Kleine-König
2023-01-18 15:40           ` Sergey Organov [this message]
2023-01-18 15:40             ` Sergey Organov
2023-01-19  7:01             ` Uwe Kleine-König
2023-01-19  7:01               ` Uwe Kleine-König
2023-01-21 18:04               ` Sergey Organov
2023-01-21 18:04                 ` Sergey Organov
2023-01-13 18:43   ` [PATCH 8/8] serial: imx: refine local variables in rxint() Sergey Organov
2023-01-13 18:43     ` Sergey Organov
2023-01-21 15:36 ` [PATCH v1 0/7] serial: imx: work-around for hardware RX flood, and then isr improvements Sergey Organov
2023-01-21 15:36   ` Sergey Organov
2023-01-21 15:36   ` [PATCH v1 1/7] serial: imx: factor-out common code to imx_uart_soft_reset() Sergey Organov
2023-01-21 15:36     ` Sergey Organov
2023-01-21 15:36   ` [PATCH v1 2/7] serial: imx: work-around for hardware RX flood Sergey Organov
2023-01-21 15:36     ` Sergey Organov
2023-01-21 15:36   ` [PATCH v1 3/7] serial: imx: do not sysrq broken chars Sergey Organov
2023-01-21 15:36     ` Sergey Organov
2023-01-21 16:12     ` Stefan Wahren
2023-01-21 16:12       ` Stefan Wahren
2023-01-21 21:04       ` Sergey Organov
2023-01-21 21:04         ` Sergey Organov
2023-01-23 12:38         ` Ilpo Järvinen
2023-01-23 12:38           ` Ilpo Järvinen
2023-01-23 16:34           ` Sergey Organov
2023-01-23 16:34             ` Sergey Organov
2023-01-21 15:36   ` [PATCH v1 4/7] serial: imx: do not break from FIFO reading loop prematurely Sergey Organov
2023-01-21 15:36     ` Sergey Organov
2023-01-21 15:36   ` [PATCH v1 5/7] serial: imx: remove redundant USR2 read from FIFO reading loop Sergey Organov
2023-01-21 15:36     ` Sergey Organov
2023-01-21 15:36   ` [PATCH v1 6/7] serial: imx: stop using USR2 in " Sergey Organov
2023-01-21 15:36     ` Sergey Organov
2023-01-21 15:36   ` [PATCH v1 7/7] serial: imx: refine local variables in rxint() Sergey Organov
2023-01-21 15:36     ` Sergey Organov
2023-02-01 14:16 ` [PATCH RESEND] serial: imx: get rid of registers shadowing Sergey Organov
2023-02-01 14:16   ` Sergey Organov
2023-02-01 14:26 ` [PATCH v1 RESEND 0/7] serial: imx: work-around for hardware RX flood, and then isr improvements Sergey Organov
2023-02-01 14:26   ` Sergey Organov
2023-02-01 14:26   ` [PATCH v1 1/7] serial: imx: factor-out common code to imx_uart_soft_reset() Sergey Organov
2023-02-01 14:26     ` Sergey Organov
2023-02-01 14:26   ` [PATCH v1 2/7] serial: imx: work-around for hardware RX flood Sergey Organov
2023-02-01 14:26     ` Sergey Organov
2023-02-01 14:26   ` [PATCH v1 3/7] serial: imx: do not sysrq broken chars Sergey Organov
2023-02-01 14:26     ` Sergey Organov
2023-02-01 14:26   ` [PATCH v1 4/7] serial: imx: do not break from FIFO reading loop prematurely Sergey Organov
2023-02-01 14:26     ` Sergey Organov
2023-02-01 14:26   ` [PATCH v1 5/7] serial: imx: remove redundant USR2 read from FIFO reading loop Sergey Organov
2023-02-01 14:26     ` Sergey Organov
2023-02-01 14:26   ` [PATCH v1 6/7] serial: imx: stop using USR2 in " Sergey Organov
2023-02-01 14:26     ` Sergey Organov
2023-02-01 14:27   ` [PATCH v1 7/7] serial: imx: refine local variables in rxint() Sergey Organov
2023-02-01 14:27     ` Sergey Organov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87ilh3zw9q.fsf@osv.gnss.ru \
    --to=sorganov@gmail.com \
    --cc=festevam@gmail.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jirislaby@kernel.org \
    --cc=kernel@pengutronix.de \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-imx@nxp.com \
    --cc=linux-serial@vger.kernel.org \
    --cc=richard.genoud@gmail.com \
    --cc=s.hauer@pengutronix.de \
    --cc=shawnguo@kernel.org \
    --cc=tharvey@gateworks.com \
    --cc=tomasz.mon@camlingroup.com \
    --cc=u.kleine-koenig@pengutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.