All of lore.kernel.org
 help / color / mirror / Atom feed
* oops with 4.14-rc8 when opening and closing /dev/watchdog0
@ 2017-11-08 13:26 Rasmus Villemoes
  2017-11-08 18:15 ` Guenter Roeck
  0 siblings, 1 reply; 4+ messages in thread
From: Rasmus Villemoes @ 2017-11-08 13:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-watchdog, Guenter Roeck

Running current master (4.14.0-rc8-00009-gfbc3edf) I can reproduce the
below quite consistently, though there are some variations in the stack
trace. It happens when I start and stop busybox watchdog on
/dev/watchdog0 a few times (sometimes on start, sometimes on stop,
almost always after at most 3 starts/stops). watchdog0 is a gpio
watchdog with the below DT entry.

        gpio-wdt {
                status = "okay";
                compatible = "linux,wdt-gpio";
                hw_margin_ms = <0xfa>;
                hw_algo = "toggle";
                gpios = <0x15 0x19 0x0>;
                always-running;
        };

The 6b6b6b6b suggests some kind of use-after-free, I think.

This is a ARM board based on LS1021A. Unfortunately, I hit this in the
process of starting to use a mainline-based kernel for the board, so I
don't have any previous known-working kernel to start a bisection from.
I'll try to see if I can get a 4.13 one to boot with the same .dtb and
.config, but in the meantime perhaps someone can see something obvious.

Thanks,
Rasmus


Unable to handle kernel paging request at virtual address 6b6b6d3b
pgd = 80003000
[6b6b6d3b] *pgd=80000080005003, *pmd=00000000
Internal error: Oops: 206 [#1] SMP ARM
Modules linked in: bridge stp llc
CPU: 0 PID: 1931 Comm: watchdog Not tainted 4.14.0-rc8-00009-gfbc3edf #1
Hardware name: Freescale LS1021A
task: be4a8d40 task.stack: bd1d4000
PC is at module_put+0x8/0x68
LR is at __fput+0x108/0x1b0
pc : [<8027f090>]    lr : [<802ef898>]    psr: 200d0013
sp : bd1d5f20  ip : 00000000  fp : 00000000
r10: bf0877c8  r9 : be805608  r8 : be04ccd0
r7 : be3f57c8  r6 : 00000008  r5 : bf0877c8  r4 : be3f57c0
r3 : bf2241a0  r2 : 6b6b6b6a  r1 : 00000000  r0 : 6b6b6b6b
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 30c5387d  Table: bd189340  DAC: fffffffd
Process watchdog (pid: 1931, stack limit = 0xbd1d4210)
Stack: (0xbd1d5f20 to 0xbd1d6000)
5f20: 00000000 00000000 0000002b be4a917c be4a8d40 be412b80 81055310
be49897c
5f40: 00000000 0000044c 00000000 80235014 be4a8d40 be498940 bd1d4000
bd1d5f70
5f60: be49897c 80220a10 00000000 be4a9068 be4a8d40 be4a9068 bd80c5c0
000000f8
5f80: 00000000 80220fe8 000f4240 7ed9bccc 0007a154 000000f8 80207544
80220ffc
5fa0: 000f4240 80207380 000f4240 7ed9bccc 00000000 000874fe 00000001
00000000
5fc0: 000f4240 7ed9bccc 0007a154 000000f8 00000f00 00000000 76f89000
00000000
5fe0: 76eb87e0 7ed9b9b4 00021c9c 76eb87f0 600d0010 00000000 00000000
00000000
[<8027f090>] (module_put) from [<00000000>] (  (null))
Code: e3a00001 e12fff1e e3500000 012fff1e (e59021d0)
---[ end trace d8b636b1833a6c9e ]---
Kernel panic - not syncing: Fatal exception
CPU1: stopping
CPU: 1 PID: 64 Comm: kworker/u4:1 Tainted: G      D
4.14.0-rc8-00009-gfbc3edf #1
Hardware name: Freescale LS1021A
Workqueue: events_unbound flush_to_ldisc
[<8020c8e8>] (unwind_backtrace) from [<8020a728>] (show_stack+0x10/0x14)
[<8020a728>] (show_stack) from [<80661704>] (dump_stack+0x7c/0x98)
[<80661704>] (dump_stack) from [<8020bc24>] (handle_IPI+0xdc/0x184)
[<8020bc24>] (handle_IPI) from [<802013ac>] (gic_handle_irq+0x70/0x78)
[<802013ac>] (gic_handle_irq) from [<806776f8>] (__irq_svc+0x58/0x74)
Exception stack(0xbe0d1e58 to 0xbe0d1ea0)
1e40:                                                       00000000
000000fd
1e60: 00000000 00000ff8 be18ca40 00000000 00000000 c08a3000 bdb09c5b
bdb09c5b
1e80: 00000052 00000000 be18cbbc be0d1ea8 802502c0 8045cd2c 60010013
ffffffff
[<806776f8>] (__irq_svc) from [<8045cd2c>]
(n_tty_receive_buf_common+0x804/0x8bc)
[<8045cd2c>] (n_tty_receive_buf_common) from [<8045cdf4>]
(n_tty_receive_buf2+0x10/0x18)
[<8045cdf4>] (n_tty_receive_buf2) from [<8045f374>]
(tty_port_default_receive_buf+0x44/0x54)
[<8045f374>] (tty_port_default_receive_buf) from [<8045ebfc>]
(flush_to_ldisc+0x8c/0xac)
[<8045ebfc>] (flush_to_ldisc) from [<80231224>]
(process_one_work+0x1b0/0x314)
[<80231224>] (process_one_work) from [<80232118>]
(worker_thread+0x2cc/0x424)
[<80232118>] (worker_thread) from [<80236598>] (kthread+0x130/0x148)
[<80236598>] (kthread) from [<80207440>] (ret_from_fork+0x14/0x34)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: oops with 4.14-rc8 when opening and closing /dev/watchdog0
  2017-11-08 13:26 oops with 4.14-rc8 when opening and closing /dev/watchdog0 Rasmus Villemoes
@ 2017-11-08 18:15 ` Guenter Roeck
  2017-11-09 13:39   ` [PATCH] watchdog: gpio_wdt: set WDOG_HW_RUNNING in gpio_wdt_stop Rasmus Villemoes
  0 siblings, 1 reply; 4+ messages in thread
From: Guenter Roeck @ 2017-11-08 18:15 UTC (permalink / raw)
  To: Rasmus Villemoes; +Cc: linux-kernel, linux-watchdog

On Wed, Nov 08, 2017 at 02:26:34PM +0100, Rasmus Villemoes wrote:
> Running current master (4.14.0-rc8-00009-gfbc3edf) I can reproduce the
> below quite consistently, though there are some variations in the stack
> trace. It happens when I start and stop busybox watchdog on
> /dev/watchdog0 a few times (sometimes on start, sometimes on stop,
> almost always after at most 3 starts/stops). watchdog0 is a gpio
> watchdog with the below DT entry.
> 
>         gpio-wdt {
>                 status = "okay";
>                 compatible = "linux,wdt-gpio";
>                 hw_margin_ms = <0xfa>;
>                 hw_algo = "toggle";
>                 gpios = <0x15 0x19 0x0>;
>                 always-running;
>         };
> 
> The 6b6b6b6b suggests some kind of use-after-free, I think.
> 
> This is a ARM board based on LS1021A. Unfortunately, I hit this in the
> process of starting to use a mainline-based kernel for the board, so I
> don't have any previous known-working kernel to start a bisection from.
> I'll try to see if I can get a 4.13 one to boot with the same .dtb and
> .config, but in the meantime perhaps someone can see something obvious.
> 

Please let me know if the following two patches help.

https://patchwork.kernel.org/patch/9970181/
https://patchwork.kernel.org/patch/9970187/

Thanks,
Guenter

> Thanks,
> Rasmus
> 
> 
> Unable to handle kernel paging request at virtual address 6b6b6d3b
> pgd = 80003000
> [6b6b6d3b] *pgd=80000080005003, *pmd=00000000
> Internal error: Oops: 206 [#1] SMP ARM
> Modules linked in: bridge stp llc
> CPU: 0 PID: 1931 Comm: watchdog Not tainted 4.14.0-rc8-00009-gfbc3edf #1
> Hardware name: Freescale LS1021A
> task: be4a8d40 task.stack: bd1d4000
> PC is at module_put+0x8/0x68
> LR is at __fput+0x108/0x1b0
> pc : [<8027f090>]    lr : [<802ef898>]    psr: 200d0013
> sp : bd1d5f20  ip : 00000000  fp : 00000000
> r10: bf0877c8  r9 : be805608  r8 : be04ccd0
> r7 : be3f57c8  r6 : 00000008  r5 : bf0877c8  r4 : be3f57c0
> r3 : bf2241a0  r2 : 6b6b6b6a  r1 : 00000000  r0 : 6b6b6b6b
> Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
> Control: 30c5387d  Table: bd189340  DAC: fffffffd
> Process watchdog (pid: 1931, stack limit = 0xbd1d4210)
> Stack: (0xbd1d5f20 to 0xbd1d6000)
> 5f20: 00000000 00000000 0000002b be4a917c be4a8d40 be412b80 81055310
> be49897c
> 5f40: 00000000 0000044c 00000000 80235014 be4a8d40 be498940 bd1d4000
> bd1d5f70
> 5f60: be49897c 80220a10 00000000 be4a9068 be4a8d40 be4a9068 bd80c5c0
> 000000f8
> 5f80: 00000000 80220fe8 000f4240 7ed9bccc 0007a154 000000f8 80207544
> 80220ffc
> 5fa0: 000f4240 80207380 000f4240 7ed9bccc 00000000 000874fe 00000001
> 00000000
> 5fc0: 000f4240 7ed9bccc 0007a154 000000f8 00000f00 00000000 76f89000
> 00000000
> 5fe0: 76eb87e0 7ed9b9b4 00021c9c 76eb87f0 600d0010 00000000 00000000
> 00000000
> [<8027f090>] (module_put) from [<00000000>] (  (null))
> Code: e3a00001 e12fff1e e3500000 012fff1e (e59021d0)
> ---[ end trace d8b636b1833a6c9e ]---
> Kernel panic - not syncing: Fatal exception
> CPU1: stopping
> CPU: 1 PID: 64 Comm: kworker/u4:1 Tainted: G      D
> 4.14.0-rc8-00009-gfbc3edf #1
> Hardware name: Freescale LS1021A
> Workqueue: events_unbound flush_to_ldisc
> [<8020c8e8>] (unwind_backtrace) from [<8020a728>] (show_stack+0x10/0x14)
> [<8020a728>] (show_stack) from [<80661704>] (dump_stack+0x7c/0x98)
> [<80661704>] (dump_stack) from [<8020bc24>] (handle_IPI+0xdc/0x184)
> [<8020bc24>] (handle_IPI) from [<802013ac>] (gic_handle_irq+0x70/0x78)
> [<802013ac>] (gic_handle_irq) from [<806776f8>] (__irq_svc+0x58/0x74)
> Exception stack(0xbe0d1e58 to 0xbe0d1ea0)
> 1e40:                                                       00000000
> 000000fd
> 1e60: 00000000 00000ff8 be18ca40 00000000 00000000 c08a3000 bdb09c5b
> bdb09c5b
> 1e80: 00000052 00000000 be18cbbc be0d1ea8 802502c0 8045cd2c 60010013
> ffffffff
> [<806776f8>] (__irq_svc) from [<8045cd2c>]
> (n_tty_receive_buf_common+0x804/0x8bc)
> [<8045cd2c>] (n_tty_receive_buf_common) from [<8045cdf4>]
> (n_tty_receive_buf2+0x10/0x18)
> [<8045cdf4>] (n_tty_receive_buf2) from [<8045f374>]
> (tty_port_default_receive_buf+0x44/0x54)
> [<8045f374>] (tty_port_default_receive_buf) from [<8045ebfc>]
> (flush_to_ldisc+0x8c/0xac)
> [<8045ebfc>] (flush_to_ldisc) from [<80231224>]
> (process_one_work+0x1b0/0x314)
> [<80231224>] (process_one_work) from [<80232118>]
> (worker_thread+0x2cc/0x424)
> [<80232118>] (worker_thread) from [<80236598>] (kthread+0x130/0x148)
> [<80236598>] (kthread) from [<80207440>] (ret_from_fork+0x14/0x34)
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH] watchdog: gpio_wdt: set WDOG_HW_RUNNING in gpio_wdt_stop
  2017-11-08 18:15 ` Guenter Roeck
@ 2017-11-09 13:39   ` Rasmus Villemoes
  2017-11-09 16:09     ` Guenter Roeck
  0 siblings, 1 reply; 4+ messages in thread
From: Rasmus Villemoes @ 2017-11-09 13:39 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: linux-watchdog, linux-kernel, Rasmus Villemoes

The first patch above (https://patchwork.kernel.org/patch/9970181/)
makes the oops go away, but it just papers over the problem. The real
problem is that the watchdog core clears WDOG_HW_RUNNING in
watchdog_stop, and the gpio driver fails to set it in its stop
function when it doesn't actually stop it. This means that the core
doesn't know that it now has responsibility for petting the device, in
turn causing the device to reset the system (I hadn't noticed this
because the board I'm working on has that reset logic disabled).

How about this (other drivers may of course have the same problem, I
haven't checked). One might say that ->stop should return an error
when the device can't be stopped, but OTOH this brings parity between
a device without a ->stop method and a GPIO wd that has always-running
set. IOW, I think ->stop should only return an error when an actual
attempt to stop the hardware failed.

From: Rasmus Villemoes <rasmus.villemoes@prevas.dk>

The watchdog framework clears WDOG_HW_RUNNING before calling
->stop. If the driver is unable to stop the device, it is supposed to
set that bit again so that the watchdog core takes care of sending
heart-beats while the device is not open from user-space. Update the
gpio_wdt driver to honour that contract (and get rid of the redundant
clearing of WDOG_HW_RUNNING).

Fixes: 3c10bbde10 ("watchdog: core: Clear WDOG_HW_RUNNING before calling the stop function")
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
---
 drivers/watchdog/gpio_wdt.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/watchdog/gpio_wdt.c b/drivers/watchdog/gpio_wdt.c
index cb66c2f..7a6279d 100644
--- a/drivers/watchdog/gpio_wdt.c
+++ b/drivers/watchdog/gpio_wdt.c
@@ -80,7 +80,8 @@ static int gpio_wdt_stop(struct watchdog_device *wdd)
 
 	if (!priv->always_running) {
 		gpio_wdt_disable(priv);
-		clear_bit(WDOG_HW_RUNNING, &wdd->status);
+	} else {
+		set_bit(WDOG_HW_RUNNING, &wdd->status);
 	}
 
 	return 0;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] watchdog: gpio_wdt: set WDOG_HW_RUNNING in gpio_wdt_stop
  2017-11-09 13:39   ` [PATCH] watchdog: gpio_wdt: set WDOG_HW_RUNNING in gpio_wdt_stop Rasmus Villemoes
@ 2017-11-09 16:09     ` Guenter Roeck
  0 siblings, 0 replies; 4+ messages in thread
From: Guenter Roeck @ 2017-11-09 16:09 UTC (permalink / raw)
  To: Rasmus Villemoes; +Cc: linux-watchdog, linux-kernel

On Thu, Nov 09, 2017 at 02:39:55PM +0100, Rasmus Villemoes wrote:
> The first patch above (https://patchwork.kernel.org/patch/9970181/)
> makes the oops go away, but it just papers over the problem. The real
> problem is that the watchdog core clears WDOG_HW_RUNNING in
> watchdog_stop, and the gpio driver fails to set it in its stop
> function when it doesn't actually stop it. This means that the core
> doesn't know that it now has responsibility for petting the device, in
> turn causing the device to reset the system (I hadn't noticed this
> because the board I'm working on has that reset logic disabled).
> 
> How about this (other drivers may of course have the same problem, I
> haven't checked). One might say that ->stop should return an error
> when the device can't be stopped, but OTOH this brings parity between
> a device without a ->stop method and a GPIO wd that has always-running
> set. IOW, I think ->stop should only return an error when an actual
> attempt to stop the hardware failed.
> 
Agreed.

> From: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
> 
> The watchdog framework clears WDOG_HW_RUNNING before calling
> ->stop. If the driver is unable to stop the device, it is supposed to
> set that bit again so that the watchdog core takes care of sending
> heart-beats while the device is not open from user-space. Update the
> gpio_wdt driver to honour that contract (and get rid of the redundant
> clearing of WDOG_HW_RUNNING).
> 
> Fixes: 3c10bbde10 ("watchdog: core: Clear WDOG_HW_RUNNING before calling the stop function")
> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
> ---
>  drivers/watchdog/gpio_wdt.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/watchdog/gpio_wdt.c b/drivers/watchdog/gpio_wdt.c
> index cb66c2f..7a6279d 100644
> --- a/drivers/watchdog/gpio_wdt.c
> +++ b/drivers/watchdog/gpio_wdt.c
> @@ -80,7 +80,8 @@ static int gpio_wdt_stop(struct watchdog_device *wdd)
>  
>  	if (!priv->always_running) {
>  		gpio_wdt_disable(priv);
> -		clear_bit(WDOG_HW_RUNNING, &wdd->status);
> +	} else {
> +		set_bit(WDOG_HW_RUNNING, &wdd->status);
>  	}

{ } are now unnecessary. otherwise

Reviewed-by: Guenter Roeck <linux@roeck-us.net>

>  
>  	return 0;
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-11-09 16:09 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-08 13:26 oops with 4.14-rc8 when opening and closing /dev/watchdog0 Rasmus Villemoes
2017-11-08 18:15 ` Guenter Roeck
2017-11-09 13:39   ` [PATCH] watchdog: gpio_wdt: set WDOG_HW_RUNNING in gpio_wdt_stop Rasmus Villemoes
2017-11-09 16:09     ` Guenter Roeck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.