All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ionela Voinescu <ionela.voinescu@arm.com>
To: Jerome Brunet <jbrunet@baylibre.com>
Cc: Jassi Brar <jassisinghbrar@gmail.com>,
	Kevin Hilman <khilman@baylibre.com>,
	linux-amlogic@lists.infradead.org, Da Xue <da@libre.computer>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mailbox: cancel timer before starting it
Date: Thu, 15 Oct 2020 14:46:28 +0100	[thread overview]
Message-ID: <20201015134628.GA11989@arm.com> (raw)
In-Reply-To: <20200923123916.1115962-1-jbrunet@baylibre.com>

Hi guys,

On Wednesday 23 Sep 2020 at 14:39:16 (+0200), Jerome Brunet wrote:
> If the txdone is done by polling, it is possible for msg_submit() to start
> the timer while txdone_hrtimer() callback is running. If the timer needs
> recheduling, it could already be enqueued by the time hrtimer_forward_now()
> is called, leading hrtimer to loudly complain.
> 
> WARNING: CPU: 3 PID: 74 at kernel/time/hrtimer.c:932 hrtimer_forward+0xc4/0x110
> CPU: 3 PID: 74 Comm: kworker/u8:1 Not tainted 5.9.0-rc2-00236-gd3520067d01c-dirty #5
> Hardware name: Libre Computer AML-S805X-AC (DT)
> Workqueue: events_freezable_power_ thermal_zone_device_check
> pstate: 20000085 (nzCv daIf -PAN -UAO BTYPE=--)
> pc : hrtimer_forward+0xc4/0x110
> lr : txdone_hrtimer+0xf8/0x118
> [...]
> 
> Canceling the timer before starting it ensure that the timer callback is
> not running when the timer is started, solving this race condition.
> 
> Fixes: 0cc67945ea59 ("mailbox: switch to hrtimer for tx_complete polling")
> Reported-by: Da Xue <da@libre.computer>
> Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
> ---
>  drivers/mailbox/mailbox.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
> index 0b821a5b2db8..34f9ab01caef 100644
> --- a/drivers/mailbox/mailbox.c
> +++ b/drivers/mailbox/mailbox.c
> @@ -82,9 +82,13 @@ static void msg_submit(struct mbox_chan *chan)
>  exit:
>  	spin_unlock_irqrestore(&chan->lock, flags);
>  
> -	if (!err && (chan->txdone_method & TXDONE_BY_POLL))
> -		/* kick start the timer immediately to avoid delays */
> +	if (!err && (chan->txdone_method & TXDONE_BY_POLL)) {
> +		/* Disable the timer if already active ... */
> +		hrtimer_cancel(&chan->mbox->poll_hrt);
> +
> +		/* ... and kick start it immediately to avoid delays */
>  		hrtimer_start(&chan->mbox->poll_hrt, 0, HRTIMER_MODE_REL);
> +	}
>  }
>  
>  static void tx_tick(struct mbox_chan *chan, int r)

I've tracked a regression back to this commit. Details to reproduce:


 - HEAD: (linux-next)
   * 62c04453381e  Jerome Brunet   3 weeks ago mailbox: cancel timer before starting it

 - Platform: arm64 Juno R0 and Juno R2 [1]

 - Partial log:
	[    0.000000] Booting Linux on physical CPU 0x0000000100 [0x410fd030]
	[    0.000000] Linux version 5.9.0-rc8-01722-g62c04453381e () (aarch64-none-linux-gnu-gcc (GNU Toolchain for the A-profile Architecture 9.2-2019.12 (arm-9.10)) 9.2.1 20191025, GNU ld (GNU Toolchain for the A-profile Architecture 9.2-2019.12 (arm-9.10)) 2.33.1.20191209) #175 SMP PREEMPT Thu Oct 15 14:17:41 BST 2020
	[    0.000000] Machine model: ARM Juno development board (r0)
	[..]
	[    1.714340] mhu 2b1f0000.mhu: ARM MHU Mailbox registered
	[    1.722768] NET: Registered protocol family 17
	[    1.727364] 9pnet: Installing 9P2000 support
	[    1.731689] Key type dns_resolver registered
	[    1.735474] usb 1-1: new high-speed USB device number 2 using ehci-platform
	[    1.736407] registered taskstats version 1
	[    1.747061] Loading compiled-in X.509 certificates
	[    1.755885] scpi_protocol scpi: SCP Protocol 1.2 Firmware 1.21.0 version
	[    1.770484] cpu cpu0: EM: created perf domain
	[    1.778505] cpu cpu1: EM: created perf domain
	[    1.807449] scpi_clocks scpi:clocks: failed to register clock 'pxlclk'
	[    1.897593] hub 1-1:1.0: USB hub found
	[    1.901656] hub 1-1:1.0: 4 ports detected
	[    2.559453] atkbd serio0: keyboard reset failed on 1c060000.kmi
	[   22.787431] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
	[   22.793536] rcu:     1-...0: (1 ticks this GP) idle=222/1/0x4000000000000002 softirq=63/64 fqs=2626
	[   22.802421]  (detected by 2, t=5255 jiffies, g=-991, q=9)
	[   22.807823] Task dump for CPU 1:
	[   22.811049] task:swapper/1       state:R  running task     stack:    0 pid:    0 ppid:     1 flags:0x0000002a
	[   22.820980] Call trace:
	[   22.823429]  __switch_to+0x138/0x198
	[   23.583444] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 1-... } 5443 jiffies s: 49 root: 0x2/.
	[   23.593995] rcu: blocking rcu_node structures:
	[   23.598449] Task dump for CPU 1:
	[   23.601680] task:swapper/1       state:R  running task     stack:    0 pid:    0 ppid:     1 flags:0x0000002a
	[   23.611619] Call trace:
	[   23.614064]  __switch_to+0x138/0x198
	[   85.807430] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
	[   85.813534] rcu:     1-...0: (1 ticks this GP) idle=222/1/0x4000000000000002 softirq=63/64 fqs=10502
	[   85.822506]  (detected by 2, t=21009 jiffies, g=-991, q=9)
	[   85.827994] Task dump for CPU 1:
	[   85.831220] task:swapper/1       state:R  running task     stack:    0 pid:    0 ppid:     1 flags:0x0000002a
	[   85.841150] Call trace:
	[   85.843596]  __switch_to+0x138/0x198
	[   87.071446] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 1-... } 21315 jiffies s: 49 root: 0x2/.
	[   87.082088] rcu: blocking rcu_node structures:
	[   87.086540] Task dump for CPU 1:
	[   87.089773] task:swapper/1       state:R  running task     stack:    0 pid:    0 ppid:     1 flags:0x0000002a
	[   87.099708] Call trace:
	[   87.102155]  __switch_to+0x138/0x198
	[  148.827442] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
	[  148.833565] rcu:     1-...0: (1 ticks this GP) idle=222/1/0x4000000000000002 softirq=63/64 fqs=18377
	[  148.842543]  (detected by 4, t=36762 jiffies, g=-991, q=9)
	[  148.848037] Task dump for CPU 1:
	[  148.851268] task:swapper/1       state:R  running task     stack:    0 pid:    0 ppid:     1 flags:0x0000002a
	[  148.861207] Call trace:
	[  148.863663]  __switch_to+0x138/0x198
	[  150.559443] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 1-... } 37187 jiffies s: 49 root: 0x2/.
	[  150.570082] rcu: blocking rcu_node structures:
	[  150.574535] Task dump for CPU 1:
	[  150.577767] task:swapper/1       state:R  running task     stack:    0 pid:    0 ppid:     1 flags:0x0000002a
	[  150.587705] Call trace:
	[  150.590151]  __switch_to+0x138/0x198

 - Commit working as expected:
   * 558e4c36ec9f  Krzysztof Kozlowski     7 weeks ago     maiblox: mediatek: Fix handling of platform_get_irq() error


[1] https://developer.arm.com/tools-and-software/development-boards/juno-development-board

Thank you,
Ionela.

> -- 
> 2.25.4
> 
> 
> _______________________________________________
> linux-amlogic mailing list
> linux-amlogic@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-amlogic

WARNING: multiple messages have this Message-ID (diff)
From: Ionela Voinescu <ionela.voinescu@arm.com>
To: Jerome Brunet <jbrunet@baylibre.com>
Cc: Kevin Hilman <khilman@baylibre.com>,
	Jassi Brar <jassisinghbrar@gmail.com>, Da Xue <da@libre.computer>,
	linux-kernel@vger.kernel.org, linux-amlogic@lists.infradead.org
Subject: Re: [PATCH] mailbox: cancel timer before starting it
Date: Thu, 15 Oct 2020 14:46:28 +0100	[thread overview]
Message-ID: <20201015134628.GA11989@arm.com> (raw)
In-Reply-To: <20200923123916.1115962-1-jbrunet@baylibre.com>

Hi guys,

On Wednesday 23 Sep 2020 at 14:39:16 (+0200), Jerome Brunet wrote:
> If the txdone is done by polling, it is possible for msg_submit() to start
> the timer while txdone_hrtimer() callback is running. If the timer needs
> recheduling, it could already be enqueued by the time hrtimer_forward_now()
> is called, leading hrtimer to loudly complain.
> 
> WARNING: CPU: 3 PID: 74 at kernel/time/hrtimer.c:932 hrtimer_forward+0xc4/0x110
> CPU: 3 PID: 74 Comm: kworker/u8:1 Not tainted 5.9.0-rc2-00236-gd3520067d01c-dirty #5
> Hardware name: Libre Computer AML-S805X-AC (DT)
> Workqueue: events_freezable_power_ thermal_zone_device_check
> pstate: 20000085 (nzCv daIf -PAN -UAO BTYPE=--)
> pc : hrtimer_forward+0xc4/0x110
> lr : txdone_hrtimer+0xf8/0x118
> [...]
> 
> Canceling the timer before starting it ensure that the timer callback is
> not running when the timer is started, solving this race condition.
> 
> Fixes: 0cc67945ea59 ("mailbox: switch to hrtimer for tx_complete polling")
> Reported-by: Da Xue <da@libre.computer>
> Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
> ---
>  drivers/mailbox/mailbox.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
> index 0b821a5b2db8..34f9ab01caef 100644
> --- a/drivers/mailbox/mailbox.c
> +++ b/drivers/mailbox/mailbox.c
> @@ -82,9 +82,13 @@ static void msg_submit(struct mbox_chan *chan)
>  exit:
>  	spin_unlock_irqrestore(&chan->lock, flags);
>  
> -	if (!err && (chan->txdone_method & TXDONE_BY_POLL))
> -		/* kick start the timer immediately to avoid delays */
> +	if (!err && (chan->txdone_method & TXDONE_BY_POLL)) {
> +		/* Disable the timer if already active ... */
> +		hrtimer_cancel(&chan->mbox->poll_hrt);
> +
> +		/* ... and kick start it immediately to avoid delays */
>  		hrtimer_start(&chan->mbox->poll_hrt, 0, HRTIMER_MODE_REL);
> +	}
>  }
>  
>  static void tx_tick(struct mbox_chan *chan, int r)

I've tracked a regression back to this commit. Details to reproduce:


 - HEAD: (linux-next)
   * 62c04453381e  Jerome Brunet   3 weeks ago mailbox: cancel timer before starting it

 - Platform: arm64 Juno R0 and Juno R2 [1]

 - Partial log:
	[    0.000000] Booting Linux on physical CPU 0x0000000100 [0x410fd030]
	[    0.000000] Linux version 5.9.0-rc8-01722-g62c04453381e () (aarch64-none-linux-gnu-gcc (GNU Toolchain for the A-profile Architecture 9.2-2019.12 (arm-9.10)) 9.2.1 20191025, GNU ld (GNU Toolchain for the A-profile Architecture 9.2-2019.12 (arm-9.10)) 2.33.1.20191209) #175 SMP PREEMPT Thu Oct 15 14:17:41 BST 2020
	[    0.000000] Machine model: ARM Juno development board (r0)
	[..]
	[    1.714340] mhu 2b1f0000.mhu: ARM MHU Mailbox registered
	[    1.722768] NET: Registered protocol family 17
	[    1.727364] 9pnet: Installing 9P2000 support
	[    1.731689] Key type dns_resolver registered
	[    1.735474] usb 1-1: new high-speed USB device number 2 using ehci-platform
	[    1.736407] registered taskstats version 1
	[    1.747061] Loading compiled-in X.509 certificates
	[    1.755885] scpi_protocol scpi: SCP Protocol 1.2 Firmware 1.21.0 version
	[    1.770484] cpu cpu0: EM: created perf domain
	[    1.778505] cpu cpu1: EM: created perf domain
	[    1.807449] scpi_clocks scpi:clocks: failed to register clock 'pxlclk'
	[    1.897593] hub 1-1:1.0: USB hub found
	[    1.901656] hub 1-1:1.0: 4 ports detected
	[    2.559453] atkbd serio0: keyboard reset failed on 1c060000.kmi
	[   22.787431] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
	[   22.793536] rcu:     1-...0: (1 ticks this GP) idle=222/1/0x4000000000000002 softirq=63/64 fqs=2626
	[   22.802421]  (detected by 2, t=5255 jiffies, g=-991, q=9)
	[   22.807823] Task dump for CPU 1:
	[   22.811049] task:swapper/1       state:R  running task     stack:    0 pid:    0 ppid:     1 flags:0x0000002a
	[   22.820980] Call trace:
	[   22.823429]  __switch_to+0x138/0x198
	[   23.583444] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 1-... } 5443 jiffies s: 49 root: 0x2/.
	[   23.593995] rcu: blocking rcu_node structures:
	[   23.598449] Task dump for CPU 1:
	[   23.601680] task:swapper/1       state:R  running task     stack:    0 pid:    0 ppid:     1 flags:0x0000002a
	[   23.611619] Call trace:
	[   23.614064]  __switch_to+0x138/0x198
	[   85.807430] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
	[   85.813534] rcu:     1-...0: (1 ticks this GP) idle=222/1/0x4000000000000002 softirq=63/64 fqs=10502
	[   85.822506]  (detected by 2, t=21009 jiffies, g=-991, q=9)
	[   85.827994] Task dump for CPU 1:
	[   85.831220] task:swapper/1       state:R  running task     stack:    0 pid:    0 ppid:     1 flags:0x0000002a
	[   85.841150] Call trace:
	[   85.843596]  __switch_to+0x138/0x198
	[   87.071446] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 1-... } 21315 jiffies s: 49 root: 0x2/.
	[   87.082088] rcu: blocking rcu_node structures:
	[   87.086540] Task dump for CPU 1:
	[   87.089773] task:swapper/1       state:R  running task     stack:    0 pid:    0 ppid:     1 flags:0x0000002a
	[   87.099708] Call trace:
	[   87.102155]  __switch_to+0x138/0x198
	[  148.827442] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
	[  148.833565] rcu:     1-...0: (1 ticks this GP) idle=222/1/0x4000000000000002 softirq=63/64 fqs=18377
	[  148.842543]  (detected by 4, t=36762 jiffies, g=-991, q=9)
	[  148.848037] Task dump for CPU 1:
	[  148.851268] task:swapper/1       state:R  running task     stack:    0 pid:    0 ppid:     1 flags:0x0000002a
	[  148.861207] Call trace:
	[  148.863663]  __switch_to+0x138/0x198
	[  150.559443] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 1-... } 37187 jiffies s: 49 root: 0x2/.
	[  150.570082] rcu: blocking rcu_node structures:
	[  150.574535] Task dump for CPU 1:
	[  150.577767] task:swapper/1       state:R  running task     stack:    0 pid:    0 ppid:     1 flags:0x0000002a
	[  150.587705] Call trace:
	[  150.590151]  __switch_to+0x138/0x198

 - Commit working as expected:
   * 558e4c36ec9f  Krzysztof Kozlowski     7 weeks ago     maiblox: mediatek: Fix handling of platform_get_irq() error


[1] https://developer.arm.com/tools-and-software/development-boards/juno-development-board

Thank you,
Ionela.

> -- 
> 2.25.4
> 
> 
> _______________________________________________
> linux-amlogic mailing list
> linux-amlogic@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-amlogic

_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

  parent reply	other threads:[~2020-10-15 13:46 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-23 12:39 [PATCH] mailbox: cancel timer before starting it Jerome Brunet
2020-09-23 12:39 ` Jerome Brunet
2020-09-24 16:50 ` Kevin Hilman
2020-09-24 16:50   ` Kevin Hilman
2020-10-15 13:46 ` Ionela Voinescu [this message]
2020-10-15 13:46   ` Ionela Voinescu
2020-10-15 13:58   ` Jerome Brunet
2020-10-15 13:58     ` Jerome Brunet
2020-10-15 14:29     ` Ionela Voinescu
2020-10-15 14:29       ` Ionela Voinescu
2020-10-16  8:44       ` Sudeep Holla
2020-10-16  8:44         ` Sudeep Holla
2020-10-16  9:02         ` Jerome Brunet
2020-10-16  9:02           ` Jerome Brunet
2020-10-16  9:34           ` Sudeep Holla
2020-10-16  9:34             ` Sudeep Holla
2020-10-16 11:47             ` Sudeep Holla
2020-10-16 11:47               ` Sudeep Holla
2020-10-15 18:45     ` Jassi Brar
2020-10-15 18:45       ` Jassi Brar
2020-10-16  8:52       ` Ionela Voinescu
2020-10-16  8:52         ` Ionela Voinescu
2020-10-16  9:00         ` Jerome Brunet
2020-10-16  9:00           ` Jerome Brunet
2020-10-16 13:52           ` Sudeep Holla
2020-10-16 13:52             ` Sudeep Holla
2020-10-16 17:33           ` Jassi Brar
2020-10-16 17:33             ` Jassi Brar
2020-10-16 18:54             ` Jerome Brunet
2020-10-16 18:54               ` Jerome Brunet
2020-10-16 19:27               ` Jassi Brar
2020-10-16 19:27                 ` Jassi Brar
2022-07-19  7:08                 ` Da Xue
2022-07-19  7:08                   ` Da Xue
2020-10-16 11:52         ` Sudeep Holla
2020-10-16 11:52           ` Sudeep Holla

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201015134628.GA11989@arm.com \
    --to=ionela.voinescu@arm.com \
    --cc=da@libre.computer \
    --cc=jassisinghbrar@gmail.com \
    --cc=jbrunet@baylibre.com \
    --cc=khilman@baylibre.com \
    --cc=linux-amlogic@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.