* Clock related crashes in v5.4.y-queue
@ 2020-01-02  2:44 Guenter Roeck
  2020-01-02  3:41 ` Guenter Roeck
  2020-01-02 21:01 ` Greg Kroah-Hartman
  0 siblings, 2 replies; 14+ messages in thread
From: Guenter Roeck @ 2020-01-02  2:44 UTC (permalink / raw)
  To: Greg Kroah-Hartman, stable, Stephen Boyd

Hi,

I see a number of crashes in the latest v5.4.y-queue; please see below
for details. The problem bisects to commit 54a311c5d3988d ("clk: Fix memory
leak in clk_unregister()").

The context suggests recovery from a failed driver probe, and it appears
that the memory is released twice. Interestingly, I don't see the problem
in mainline.

I would suggest dropping that patch from the stable queue.

Guenter

---
First traceback is:

[   19.203547] ------------[ cut here ]------------
[   19.204107] WARNING: CPU: 0 PID: 1 at drivers/clk/clk.c:4034 __clk_put+0xfc/0x128
[   19.204275] Modules linked in:
[   19.204634] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.8-rc1-00191-gaf408bc6c96e #1
[   19.204790] Hardware name: Xilinx Zynq Platform
[   19.204994] [<c0313658>] (unwind_backtrace) from [<c030d698>] (show_stack+0x10/0x14)
[   19.205150] [<c030d698>] (show_stack) from [<c1139bdc>] (dump_stack+0xe0/0x10c)
[   19.205278] [<c1139bdc>] (dump_stack) from [<c0349098>] (__warn+0xf4/0x10c)
[   19.205399] [<c0349098>] (__warn) from [<c0349164>] (warn_slowpath_fmt+0xb4/0xbc)
[   19.205522] [<c0349164>] (warn_slowpath_fmt) from [<c0956d14>] (__clk_put+0xfc/0x128)
[   19.205654] [<c0956d14>] (__clk_put) from [<c0b1ea10>] (release_nodes+0x1c4/0x278)
[   19.205780] [<c0b1ea10>] (release_nodes) from [<c0b1a220>] (really_probe+0x108/0x34c)
[   19.205908] [<c0b1a220>] (really_probe) from [<c0b1a5dc>] (driver_probe_device+0x60/0x174)
[   19.206042] [<c0b1a5dc>] (driver_probe_device) from [<c0b1a898>] (device_driver_attach+0x58/0x60)
[   19.206179] [<c0b1a898>] (device_driver_attach) from [<c0b1a924>] (__driver_attach+0x84/0xc0)
[   19.206313] [<c0b1a924>] (__driver_attach) from [<c0b18400>] (bus_for_each_dev+0x78/0xb8)
[   19.206463] [<c0b18400>] (bus_for_each_dev) from [<c0b195e8>] (bus_add_driver+0x164/0x1e8)
[   19.206590] [<c0b195e8>] (bus_add_driver) from [<c0b1b6fc>] (driver_register+0x74/0x108)
[   19.206723] [<c0b1b6fc>] (driver_register) from [<c030315c>] (do_one_initcall+0x8c/0x3bc)
[   19.206857] [<c030315c>] (do_one_initcall) from [<c1a01080>] (kernel_init_freeable+0x14c/0x1e8)
[   19.206992] [<c1a01080>] (kernel_init_freeable) from [<c11547a4>] (kernel_init+0x8/0x118)
[   19.207116] [<c11547a4>] (kernel_init) from [<c03010b4>] (ret_from_fork+0x14/0x20)

followed by:

[   19.209792] 8<--- cut here ---
[   19.209926] Unable to handle kernel paging request at virtual address 6b6b6bb3
[   19.210117] pgd = (ptrval)
[   19.210207] [6b6b6bb3] *pgd=00000000
[   19.210626] Internal error: Oops: 5 [#1] SMP ARM
[   19.210807] Modules linked in:
[   19.210956] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W         5.4.8-rc1-00191-gaf408bc6c96e #1
[   19.211090] Hardware name: Xilinx Zynq Platform
[   19.211200] PC is at __clk_put+0x104/0x128
[   19.211274] LR is at __clk_put+0xfc/0x128
[   19.211349] pc : [<c0956d1c>]    lr : [<c0956d14>]    psr: 60000053
[   19.211446] sp : c7129dd8  ip : 00000000  fp : c59f1680
[   19.211534] r10: c72fb6ac  r9 : c0b1dbd0  r8 : 00000008
[   19.211626] r7 : c7129e04  r6 : c72fb410  r5 : c59f0880  r4 : c59f3180
[   19.211727] r3 : 7a538c1d  r2 : 6b6b6b6b  r1 : 6b6b6b6b  r0 : 00000000
[   19.211885] Flags: nZCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment none
[   19.212022] Control: 10c5387d  Table: 00204059  DAC: 00000051
[   19.212152] Process swapper/0 (pid: 1, stack limit = 0x(ptrval))
[   19.212270] Stack: (0xc7129dd8 to 0xc712a000)
[   19.212391] 9dc0:                                                       c59f1680 c59f0880
[   19.212608] 9de0: c72fb410 c0b1ea10 ffffffed 00000000 c0b1e404 c7128000 c72fb410 a0000053
[   19.212822] 9e00: c72fb68c c59f1c80 c59f1480 7a538c1d 00000001 c241e19c c72fb410 c241e1a0
[   19.213029] 9e20: 00000000 c1d8a1ac 00000000 ffffffed c1b8124c c0b1a220 c72fb410 c1d8a1ac
[   19.213240] 9e40: c1d8a1ac c7128000 c1dc347c 00000007 000001f6 c0b1a5dc c1d8a1ac c1d8a1ac
[   19.213462] 9e60: c7128000 c72fb410 00000000 c1d8a1ac c7128000 c1dc347c 00000007 000001f6
[   19.213683] 9e80: c1b8124c c0b1a898 00000000 c1d8a1ac c72fb410 c0b1a924 00000000 c1d8a1ac
[   19.213899] 9ea0: c0b1a8a0 c0b18400 c70b50d4 c70b50a4 c725d210 7a538c1d c70b50d4 c1d8a1ac
[   19.214115] 9ec0: c59f0280 c1d6dd50 00000000 c0b195e8 c185eb44 c1aab944 00000000 c1d8a1ac
[   19.214343] 9ee0: c1aab944 00000000 c1c08468 c0b1b6fc c1dc46c0 c1aab944 00000000 c030315c
[   19.214555] 9f00: c1959bf0 000001f6 000001f6 c0372600 00000000 c19574b8 c1883c18 00000000
[   19.214783] 9f20: c7128000 c03b3f70 c7128000 c1dd1f00 c1c08468 c1ae7870 c1a00590 00000007
[   19.215001] 9f40: 000001f6 c03d39b8 00000000 7a538c1d c1dc347c c1dd1f00 c1dd1f00 c1ae7850
[   19.215214] 9f60: c1ae7870 c1a00590 00000007 c1a01080 00000006 00000006 00000000 c1a00590
[   19.215429] 9f80: 00000000 00000000 c115479c 00000000 00000000 00000000 00000000 00000000
[   19.215636] 9fa0: 00000000 c11547a4 00000000 c03010b4 00000000 00000000 00000000 00000000
[   19.215843] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   19.216068] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[   19.216255] [<c0956d1c>] (__clk_put) from [<c0b1ea10>] (release_nodes+0x1c4/0x278)
[   19.216376] [<c0b1ea10>] (release_nodes) from [<c0b1a220>] (really_probe+0x108/0x34c)
[   19.216494] [<c0b1a220>] (really_probe) from [<c0b1a5dc>] (driver_probe_device+0x60/0x174)
[   19.216617] [<c0b1a5dc>] (driver_probe_device) from [<c0b1a898>] (device_driver_attach+0x58/0x60)
[   19.216745] [<c0b1a898>] (device_driver_attach) from [<c0b1a924>] (__driver_attach+0x84/0xc0)
[   19.216867] [<c0b1a924>] (__driver_attach) from [<c0b18400>] (bus_for_each_dev+0x78/0xb8)
[   19.216993] [<c0b18400>] (bus_for_each_dev) from [<c0b195e8>] (bus_add_driver+0x164/0x1e8)
[   19.217112] [<c0b195e8>] (bus_add_driver) from [<c0b1b6fc>] (driver_register+0x74/0x108)
[   19.217233] [<c0b1b6fc>] (driver_register) from [<c030315c>] (do_one_initcall+0x8c/0x3bc)
[   19.217358] [<c030315c>] (do_one_initcall) from [<c1a01080>] (kernel_init_freeable+0x14c/0x1e8)
[   19.217500] [<c1a01080>] (kernel_init_freeable) from [<c11547a4>] (kernel_init+0x8/0x118)
[   19.217624] [<c11547a4>] (kernel_init) from [<c03010b4>] (ret_from_fork+0x14/0x20)


* Re: Clock related crashes in v5.4.y-queue
  2020-01-02  2:44 Clock related crashes in v5.4.y-queue Guenter Roeck
@ 2020-01-02  3:41 ` Guenter Roeck
  2020-01-02  7:30   ` Stephen Boyd
  2020-01-02 21:01 ` Greg Kroah-Hartman
  1 sibling, 1 reply; 14+ messages in thread
From: Guenter Roeck @ 2020-01-02  3:41 UTC (permalink / raw)
  To: Greg Kroah-Hartman, stable, Stephen Boyd

On 1/1/20 6:44 PM, Guenter Roeck wrote:
> Hi,
> 
> I see a number of crashes in the latest v5.4.y-queue; please see below
> for details. The problem bisects to commit 54a311c5d3988d ("clk: Fix memory
> leak in clk_unregister()").
> 
> The context suggests recovery from a failed driver probe, and it appears
> that the memory is released twice. Interestingly, I don't see the problem
> in mainline.
> 

The reason for not seeing the crash in mainline is that macb_probe()
doesn't fail there due to unrelated changes in the driver. If I force
macb_probe() to fail in mainline, I see exactly the same crash.

Effectively this means that upstream commit 8247470772be is broken;
it may fix a memory leak in some situations, but it results in UAF
and crashes otherwise.

Stephen, any comments? I must admit that I understand neither the clock
code nor the commit in question; I would have assumed that the call
to __clk_put() would release the clk data structure, not an explicit
call to kfree().

Guenter

> I would suggest dropping that patch from the stable queue.
> 
> Guenter
> 


* Re: Clock related crashes in v5.4.y-queue
  2020-01-02  3:41 ` Guenter Roeck
@ 2020-01-02  7:30   ` Stephen Boyd
  2020-01-02 14:13     ` Guenter Roeck
  0 siblings, 1 reply; 14+ messages in thread
From: Stephen Boyd @ 2020-01-02  7:30 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Guenter Roeck, stable, Kishon Vijay Abraham I
  Cc: linux-clk

(Happy New Year!)

Quoting Guenter Roeck (2020-01-01 19:41:40)
> On 1/1/20 6:44 PM, Guenter Roeck wrote:
> > Hi,
> > 
> > I see a number of crashes in the latest v5.4.y-queue; please see below
> > for details. The problem bisects to commit 54a311c5d3988d ("clk: Fix memory
> > leak in clk_unregister()").
> > 
> > The context suggests recovery from a failed driver probe, and it appears
> > that the memory is released twice. Interestingly, I don't see the problem
> > in mainline.
> > 
> 
> The reason for not seeing the crash in mainline is that macb_probe()
> doesn't fail there due to unrelated changes in the driver. If I force
> macb_probe() to fail in mainline, I see exactly the same crash.
> 
> Effectively this means that upstream commit 8247470772be is broken;
> it may fix a memory leak in some situations, but it results in UAF
> and crashes otherwise.
> 
> Stephen, any comments? I must admit that I understand neither the clock
> code nor the commit in question; I would have assumed that the call
> to __clk_put() would release the clk data structure, not an explicit
> call to kfree().

The clk that the commit from Kishon is freeing is the first "consumer
handle" that we make when a clk is registered. That handle is returned
to anyone who calls clk_register(), or reached when the provider
accesses clk_hw::clk directly, which is not desired but still exists
for historical reasons. It is also used when drivers call
clk_get_parent(), and that API currently fails to reference count or
even create a per-call clk pointer.

The general idea is that each user of clk_get() should get a different
struct clk pointer to use. The problem is we have this semi-internal
struct clk pointer that leaks out of clk_get_parent(), __clk_lookup()
and clk_register().

Maybe someone is calling clk_unregister() twice with the same pointer
obtained through different paths? The macb driver does some questionable
things, like calling clk_register() and then putting the returned
pointer into *tx_clk, but only for the sifive implementation. After that
it does even odder things, like calling clk_unregister() on a clk that
probably shouldn't be unregistered, except on the sifive platforms
that register it. Pretty horrifying that clk_unregister() gives any
consumer the power to destroy a clk from the system!

Can you try this patch? I think by fixing the leak we've discovered more
problems.

----8<----
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 9c767ee252ac..7dce403fd27c 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -4069,7 +4069,7 @@ static int fu540_c000_clk_init(struct platform_device *pdev, struct clk **pclk,
 	mgmt->rate = 0;
 	mgmt->hw.init = &init;
 
-	*tx_clk = clk_register(NULL, &mgmt->hw);
+	*tx_clk = devm_clk_register(&pdev->dev, &mgmt->hw);
 	if (IS_ERR(*tx_clk))
 		return PTR_ERR(*tx_clk);
 
@@ -4397,7 +4397,6 @@ static int macb_probe(struct platform_device *pdev)
 
 err_disable_clocks:
 	clk_disable_unprepare(tx_clk);
-	clk_unregister(tx_clk);
 	clk_disable_unprepare(hclk);
 	clk_disable_unprepare(pclk);
 	clk_disable_unprepare(rx_clk);
@@ -4427,7 +4426,6 @@ static int macb_remove(struct platform_device *pdev)
 		pm_runtime_dont_use_autosuspend(&pdev->dev);
 		if (!pm_runtime_suspended(&pdev->dev)) {
 			clk_disable_unprepare(bp->tx_clk);
-			clk_unregister(bp->tx_clk);
 			clk_disable_unprepare(bp->hclk);
 			clk_disable_unprepare(bp->pclk);
 			clk_disable_unprepare(bp->rx_clk);

> 
> Guenter
> 
> > I would suggest to drop that patch from the stable queue.
> > 
> > Guenter
> > 
> > ---
> > First traceback is:
> > 
> > [   19.203547] ------------[ cut here ]------------
> > [   19.204107] WARNING: CPU: 0 PID: 1 at drivers/clk/clk.c:4034 __clk_put+0xfc/0x128

Presumably this is the 

	WARN_ON_ONCE(IS_ERR(clk))

in __clk_put()? Or is it the exclusive count check that is getting
tricked because of slab poisoning?

I guess we should turn that into some sort of textual warning instead
of a not-so-helpful line number.



* Re: Clock related crashes in v5.4.y-queue
  2020-01-02  7:30   ` Stephen Boyd
@ 2020-01-02 14:13     ` Guenter Roeck
  2020-01-02 14:19       ` Naresh Kamboju
  0 siblings, 1 reply; 14+ messages in thread
From: Guenter Roeck @ 2020-01-02 14:13 UTC (permalink / raw)
  To: Stephen Boyd, Greg Kroah-Hartman, stable, Kishon Vijay Abraham I
  Cc: linux-clk

On 1/1/20 11:30 PM, Stephen Boyd wrote:
> (Happy New Year!)
> 
> Quoting Guenter Roeck (2020-01-01 19:41:40)
>> On 1/1/20 6:44 PM, Guenter Roeck wrote:
>>> Hi,
>>>
>>> I see a number of crashes in the latest v5.4.y-queue; please see below
>>> for details. The problem bisects to commit 54a311c5d3988d ("clk: Fix memory
>>> leak in clk_unregister()").
>>>
>>> The context suggests recovery from a failed driver probe, and it appears
>>> that the memory is released twice. Interestingly, I don't see the problem
>>> in mainline.
>>>
>>
>> The reason for not seeing the crash in mainline is that macb_probe()
>> doesn't fail there due to unrelated changes in the driver. If I force
>> macb_probe() to fail in mainline, I see exactly the same crash.
>>
>> Effectively this means that upstream commit 8247470772be is broken;
>> it may fix a memory leak in some situations, but it results in UAF
>> and crashes otherwise.
>>
>> Stephen, any comments? I must admit that I understand neither the clock
>> code nor the commit in question; I would have assumed that the call
>> to __clk_put() would release the clk data structure, not an explicit
>> call to kfree().
> 
> The clk that the commit from Kishon is freeing is the first "consumer
> handle" that we make when a clk is registered. That is returned to
> anyone that calls clk_register(), or if the provider decides to access
> clk_hw::clk directly, which is not desired but still exists for
> historical reasons. It is also used when drivers call clk_get_parent()
> and that API currently fails to reference count or even create a
> per-call clk pointer.
> 
> The general idea is that each user of clk_get() should get a different
> struct clk pointer to use. The problem is we have this semi-internal
> struct clk pointer that leaks out of clk_get_parent(), __clk_lookup()
> and clk_register().
> 
> Maybe someone is calling clk_unregister() twice with the same pointer
> they got through different ways? The macb driver does some questionable

I think it is clk_unregister() followed by __clk_put(), but yes, it looks
like that is what is happening.

> things, like calling clk_register() and then putting the returned
> pointer into *tx_clk, but only for the sifive implementation. After that
> it does even odder things, like calling clk_unregister() on a clk that
> probably shouldn't be unregistered, except for on the sifive platforms
> that register it. Pretty horrifying that clk_unregister() gives any
> consumer the power to destroy a clk from the system!
> 
> Can you try this patch? I think by fixing the leak we've discovered more
> problems.
> 

Yes, it does.

Tested-by: Guenter Roeck <linux@roeck-us.net>

Thanks,
Guenter

>>> [   19.216068] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
>>> [   19.216255] [<c0956d1c>] (__clk_put) from [<c0b1ea10>] (release_nodes+0x1c4/0x278)
>>> [   19.216376] [<c0b1ea10>] (release_nodes) from [<c0b1a220>] (really_probe+0x108/0x34c)
>>> [   19.216494] [<c0b1a220>] (really_probe) from [<c0b1a5dc>] (driver_probe_device+0x60/0x174)
>>> [   19.216617] [<c0b1a5dc>] (driver_probe_device) from [<c0b1a898>] (device_driver_attach+0x58/0x60)
>>> [   19.216745] [<c0b1a898>] (device_driver_attach) from [<c0b1a924>] (__driver_attach+0x84/0xc0)
>>> [   19.216867] [<c0b1a924>] (__driver_attach) from [<c0b18400>] (bus_for_each_dev+0x78/0xb8)
>>> [   19.216993] [<c0b18400>] (bus_for_each_dev) from [<c0b195e8>] (bus_add_driver+0x164/0x1e8)
>>> [   19.217112] [<c0b195e8>] (bus_add_driver) from [<c0b1b6fc>] (driver_register+0x74/0x108)
>>> [   19.217233] [<c0b1b6fc>] (driver_register) from [<c030315c>] (do_one_initcall+0x8c/0x3bc)
>>> [   19.217358] [<c030315c>] (do_one_initcall) from [<c1a01080>] (kernel_init_freeable+0x14c/0x1e8)
>>> [   19.217500] [<c1a01080>] (kernel_init_freeable) from [<c11547a4>] (kernel_init+0x8/0x118)
>>> [   19.217624] [<c11547a4>] (kernel_init) from [<c03010b4>] (ret_from_fork+0x14/0x20)
>>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Clock related crashes in v5.4.y-queue
  2020-01-02 14:13     ` Guenter Roeck
@ 2020-01-02 14:19       ` Naresh Kamboju
  2020-01-02 14:38         ` Guenter Roeck
  0 siblings, 1 reply; 14+ messages in thread
From: Naresh Kamboju @ 2020-01-02 14:19 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Stephen Boyd, Greg Kroah-Hartman, stable, Kishon Vijay Abraham I,
	linux-clk

Hi Guenter

On Thu, 2 Jan 2020 at 19:45, Guenter Roeck <linux@roeck-us.net> wrote:
>
> On 1/1/20 11:30 PM, Stephen Boyd wrote:
> > (Happy New Year!)
> >
> > Quoting Guenter Roeck (2020-01-01 19:41:40)
> >> On 1/1/20 6:44 PM, Guenter Roeck wrote:
> >>> Hi,
> >>>
> >>> I see a number of crashes in the latest v5.4.y-queue; please see below
> >>> for details. The problem bisects to commit 54a311c5d3988d ("clk: Fix memory
> >>> leak in clk_unregister()").

Please share steps to reproduce this crash.

- Naresh


* Re: Clock related crashes in v5.4.y-queue
  2020-01-02 14:19       ` Naresh Kamboju
@ 2020-01-02 14:38         ` Guenter Roeck
  0 siblings, 0 replies; 14+ messages in thread
From: Guenter Roeck @ 2020-01-02 14:38 UTC (permalink / raw)
  To: Naresh Kamboju
  Cc: Stephen Boyd, Greg Kroah-Hartman, stable, Kishon Vijay Abraham I,
	linux-clk

On 1/2/20 6:19 AM, Naresh Kamboju wrote:
> Hi Guenter
> 
> On Thu, 2 Jan 2020 at 19:45, Guenter Roeck <linux@roeck-us.net> wrote:
>>
>> On 1/1/20 11:30 PM, Stephen Boyd wrote:
>>> (Happy New Year!)
>>>
>>> Quoting Guenter Roeck (2020-01-01 19:41:40)
>>>> On 1/1/20 6:44 PM, Guenter Roeck wrote:
>>>>> Hi,
>>>>>
>>>>> I see a number of crashes in the latest v5.4.y-queue; please see below
>>>>> for details. The problem bisects to commit 54a311c5d3988d ("clk: Fix memory
>>>>> leak in clk_unregister()").
> 
> Please share steps to reproduce this crash.
> 

With multi_v7_defconfig:

qemu-system-arm -M xilinx-zynq-a9 -kernel arch/arm/boot/zImage \
	-no-reboot -initrd rootfs-armv5.cpio \
	-m 128 -serial null \
	--append 'panic=-1 slub_debug=FZPUA rdinit=/sbin/init console=ttyPS0' \
	-dtb arch/arm/boot/dts/zynq-zc702.dtb \
	-nographic -monitor null -serial stdio

initrd from https://github.com/groeck/linux-build-test/blob/master/rootfs/arm/rootfs-armv5.cpio.gz

Guenter


* Re: Clock related crashes in v5.4.y-queue
  2020-01-02  2:44 Clock related crashes in v5.4.y-queue Guenter Roeck
  2020-01-02  3:41 ` Guenter Roeck
@ 2020-01-02 21:01 ` Greg Kroah-Hartman
  2020-01-02 21:28   ` Guenter Roeck
  1 sibling, 1 reply; 14+ messages in thread
From: Greg Kroah-Hartman @ 2020-01-02 21:01 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: stable, Stephen Boyd

On Wed, Jan 01, 2020 at 06:44:08PM -0800, Guenter Roeck wrote:
> Hi,
> 
> I see a number of crashes in the latest v5.4.y-queue; please see below
> for details. The problem bisects to commit 54a311c5d3988d ("clk: Fix memory
> leak in clk_unregister()").
> 
> The context suggests recovery from a failed driver probe, and it appears
> that the memory is released twice. Interestingly, I don't see the problem
> in mainline.
> 
> I would suggest to drop that patch from the stable queue.

That does not look right, as you point out, so I will go drop it now.

The logic of the clk structure lifetimes seems crazy, messing with krefs
and just "knowing" the lifecycle of the other structures seems like a
problem just waiting to happen...

thanks,

greg k-h


* Re: Clock related crashes in v5.4.y-queue
  2020-01-02 21:01 ` Greg Kroah-Hartman
@ 2020-01-02 21:28   ` Guenter Roeck
  2020-01-03  0:40     ` Sasha Levin
  0 siblings, 1 reply; 14+ messages in thread
From: Guenter Roeck @ 2020-01-02 21:28 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: stable, Stephen Boyd

On Thu, Jan 02, 2020 at 10:01:19PM +0100, Greg Kroah-Hartman wrote:
> On Wed, Jan 01, 2020 at 06:44:08PM -0800, Guenter Roeck wrote:
> > Hi,
> > 
> > I see a number of crashes in the latest v5.4.y-queue; please see below
> > for details. The problem bisects to commit 54a311c5d3988d ("clk: Fix memory
> > leak in clk_unregister()").
> > 
> > The context suggests recovery from a failed driver probe, and it appears
> > that the memory is released twice. Interestingly, I don't see the problem
> > in mainline.
> > 
> > I would suggest to drop that patch from the stable queue.
> 
> That does not look right, as you point out, so I will go drop it now.
> 
> The logic of the clk structure lifetimes seems crazy, messing with krefs
> and just "knowing" the lifecycle of the other structures seems like a
> problem just waiting to happen...
> 

I agree. While the patch itself seems to be ok per Stephen's feedback,
we have to assume that there will be more secondary failures in addition
to the one I have discovered. Given that clocks are not normally
unregistered, I don't think fixing the memory leak is important enough
to risk the stability of stable releases.

With all that in mind, I'd rather have this in mainline for a prolonged
period of time before considering it for stable release (if at all).

Thanks,
Guenter


* Re: Clock related crashes in v5.4.y-queue
  2020-01-02 21:28   ` Guenter Roeck
@ 2020-01-03  0:40     ` Sasha Levin
  2020-01-03 14:50       ` Guenter Roeck
  0 siblings, 1 reply; 14+ messages in thread
From: Sasha Levin @ 2020-01-03  0:40 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Greg Kroah-Hartman, stable, Stephen Boyd

On Thu, Jan 02, 2020 at 01:28:37PM -0800, Guenter Roeck wrote:
>On Thu, Jan 02, 2020 at 10:01:19PM +0100, Greg Kroah-Hartman wrote:
>> On Wed, Jan 01, 2020 at 06:44:08PM -0800, Guenter Roeck wrote:
>> > Hi,
>> >
>> > I see a number of crashes in the latest v5.4.y-queue; please see below
>> > for details. The problem bisects to commit 54a311c5d3988d ("clk: Fix memory
>> > leak in clk_unregister()").
>> >
>> > The context suggests recovery from a failed driver probe, and it appears
>> > that the memory is released twice. Interestingly, I don't see the problem
>> > in mainline.
>> >
>> > I would suggest to drop that patch from the stable queue.
>>
>> That does not look right, as you point out, so I will go drop it now.
>>
>> The logic of the clk structure lifetimes seems crazy, messing with krefs
>> and just "knowing" the lifecycle of the other structures seems like a
>> problem just waiting to happen...
>>
>
>I agree. While the patch itself seems to be ok per Stephen's feedback,
>we have to assume that there will be more secondary failures in addition
>to the one I have discovered. Given that clocks are not normally
>unregistered, I don't think fixing the memory leak is important enough
>to risk the stability of stable releases.
>
>With all that in mind, I'd rather have this in mainline for a prolonged
>period of time before considering it for stable release (if at all).

I would very much like to circle back and add both this patch and its
fix to the stable trees at some point in the future.

If the code is good enough for mainline it should be good enough for
stable as well. If it's broken - let's fix it now instead of deferring
this to when people try to upgrade their major kernel versions.

-- 
Thanks,
Sasha


* Re: Clock related crashes in v5.4.y-queue
  2020-01-03  0:40     ` Sasha Levin
@ 2020-01-03 14:50       ` Guenter Roeck
  2020-01-05 16:02         ` Sasha Levin
  0 siblings, 1 reply; 14+ messages in thread
From: Guenter Roeck @ 2020-01-03 14:50 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Greg Kroah-Hartman, stable, Stephen Boyd

On 1/2/20 4:40 PM, Sasha Levin wrote:
> On Thu, Jan 02, 2020 at 01:28:37PM -0800, Guenter Roeck wrote:
>> On Thu, Jan 02, 2020 at 10:01:19PM +0100, Greg Kroah-Hartman wrote:
>>> On Wed, Jan 01, 2020 at 06:44:08PM -0800, Guenter Roeck wrote:
>>> > Hi,
>>> >
>>> > I see a number of crashes in the latest v5.4.y-queue; please see below
>>> > for details. The problem bisects to commit 54a311c5d3988d ("clk: Fix memory
>>> > leak in clk_unregister()").
>>> >
>>> > The context suggests recovery from a failed driver probe, and it appears
>>> > that the memory is released twice. Interestingly, I don't see the problem
>>> > in mainline.
>>> >
>>> > I would suggest to drop that patch from the stable queue.
>>>
>>> That does not look right, as you point out, so I will go drop it now.
>>>
>>> The logic of the clk structure lifetimes seems crazy, messing with krefs
>>> and just "knowing" the lifecycle of the other structures seems like a
>>> problem just waiting to happen...
>>>
>>
>> I agree. While the patch itself seems to be ok per Stephen's feedback,
>> we have to assume that there will be more secondary failures in addition
>> to the one I have discovered. Given that clocks are not normally
>> unregistered, I don't think fixing the memory leak is important enough
>> to risk the stability of stable releases.
>>
>> With all that in mind, I'd rather have this in mainline for a prolonged
>> period of time before considering it for stable release (if at all).
> 
> I would very much like to circle back and add both this patch and its
> fix to the stable trees at some point in the future.
> 
> If the code is good enough for mainline it should be good enough for
> stable as well. If it's broken - let's fix it now instead of deferring
> this to when people try to upgrade their major kernel versions.
> 

This is where we differ strongly, and where I think the Linux community will
have to make a decision sometime soon. If "good enough for mainline" is a
relevant criterion for inclusion of a patch into stable releases, we don't
need stable releases anymore (we are backporting all bugs into those anyway).
Just use mainline.

Really, stable releases should be limited to fixing severe bugs. This is not
a fix for a severe bug, and on top of that it has side effects. True, those
side effects are that it uncovers other bugs, but that just makes it worse.
If we assume that my marginal testing covers, optimistically, 1% of the kernel,
and it discovers one bug, we have the potential of many more bugs littered
throughout the kernel which are now exposed. I really don't want to export
that risk into stable releases.

Guenter


* Re: Clock related crashes in v5.4.y-queue
  2020-01-03 14:50       ` Guenter Roeck
@ 2020-01-05 16:02         ` Sasha Levin
  2020-01-05 16:34           ` Guenter Roeck
  0 siblings, 1 reply; 14+ messages in thread
From: Sasha Levin @ 2020-01-05 16:02 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Greg Kroah-Hartman, stable, Stephen Boyd

On Fri, Jan 03, 2020 at 06:50:45AM -0800, Guenter Roeck wrote:
>On 1/2/20 4:40 PM, Sasha Levin wrote:
>>On Thu, Jan 02, 2020 at 01:28:37PM -0800, Guenter Roeck wrote:
>>>On Thu, Jan 02, 2020 at 10:01:19PM +0100, Greg Kroah-Hartman wrote:
>>>>On Wed, Jan 01, 2020 at 06:44:08PM -0800, Guenter Roeck wrote:
>>>>> Hi,
>>>>>
>>>>> I see a number of crashes in the latest v5.4.y-queue; please see below
>>>>> for details. The problem bisects to commit 54a311c5d3988d ("clk: Fix memory
>>>>> leak in clk_unregister()").
>>>>>
>>>>> The context suggests recovery from a failed driver probe, and it appears
>>>>> that the memory is released twice. Interestingly, I don't see the problem
>>>>> in mainline.
>>>>>
>>>>> I would suggest to drop that patch from the stable queue.
>>>>
>>>>That does not look right, as you point out, so I will go drop it now.
>>>>
>>>>The logic of the clk structure lifetimes seems crazy, messing with krefs
>>>>and just "knowing" the lifecycle of the other structures seems like a
>>>>problem just waiting to happen...
>>>>
>>>
>>>I agree. While the patch itself seems to be ok per Stephen's feedback,
>>>we have to assume that there will be more secondary failures in addition
>>>to the one I have discovered. Given that clocks are not normally
>>>unregistered, I don't think fixing the memory leak is important enough
>>>to risk the stability of stable releases.
>>>
>>>With all that in mind, I'd rather have this in mainline for a prolonged
>>>period of time before considering it for stable release (if at all).
>>
>>I would very much like to circle back and add both this patch and its
>>fix to the stable trees at some point in the future.
>>
>>If the code is good enough for mainline it should be good enough for
>>stable as well. If it's broken - let's fix it now instead of deferring
>>this to when people try to upgrade their major kernel versions.
>>
>
>This is where we differ strongly, and where I think the Linux community will
>have to make a decision sometime soon. If "good enough for mainline" is a
>relevant criterion for inclusion of a patch into stable releases, we don't
>need stable releases anymore (we are backporting all bugs into those anyway).
>Just use mainline.
>
>Really, stable releases should be limited to fixing severe bugs. This is not
>a fix for a severe bug, and on top of that it has side effects. True, those
>side effects are that it uncovers other bugs, but that just makes it worse.
>If we assume that my marginal testing covers, optimistically, 1% of the kernel,
>and it discovers one bug, we have the potential of many more bugs littered
>throughout the kernel which are now exposed. I really don't want to export
>that risk into stable releases.

The assumption here is that fixes introduce fewer bugs than newly
introduced features, so I'd like to think that we're not backporting
*all* bugs :)

It's hard to define "severe" given how widely the kernel is used and all
the weird use cases it has. Something that doesn't look severe might be
very critical in a specific use case. I fear that if we have a strict
definition of "severe", our users will end up carrying more patches
out-of-tree to fix their "less severe" issues, causing fragmentation,
which we really want to avoid.

I actually believe very much in the suggestion you've made in your first
paragraph: I'd love to see LTS and later on -stable kernels go away and
users just use mainline releases. Yes, it's unrealistic now, but I'd
like to think that we're working towards it, thus I want to keep picking
up more patches and develop our (as well as our users') testing muscle
to be able to catch regressions.

-- 
Thanks,
Sasha


* Re: Clock related crashes in v5.4.y-queue
  2020-01-05 16:02         ` Sasha Levin
@ 2020-01-05 16:34           ` Guenter Roeck
  2020-01-05 19:10             ` Sasha Levin
  2020-01-06 13:20             ` Sasha Levin
  0 siblings, 2 replies; 14+ messages in thread
From: Guenter Roeck @ 2020-01-05 16:34 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Greg Kroah-Hartman, stable, Stephen Boyd

On 1/5/20 8:02 AM, Sasha Levin wrote:
> On Fri, Jan 03, 2020 at 06:50:45AM -0800, Guenter Roeck wrote:
>> On 1/2/20 4:40 PM, Sasha Levin wrote:
>>> On Thu, Jan 02, 2020 at 01:28:37PM -0800, Guenter Roeck wrote:
>>>> On Thu, Jan 02, 2020 at 10:01:19PM +0100, Greg Kroah-Hartman wrote:
>>>>> On Wed, Jan 01, 2020 at 06:44:08PM -0800, Guenter Roeck wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I see a number of crashes in the latest v5.4.y-queue; please see below
>>>>>> for details. The problem bisects to commit 54a311c5d3988d ("clk: Fix memory
>>>>>> leak in clk_unregister()").
>>>>>>
>>>>>> The context suggests recovery from a failed driver probe, and it appears
>>>>>> that the memory is released twice. Interestingly, I don't see the problem
>>>>>> in mainline.
>>>>>>
>>>>>> I would suggest to drop that patch from the stable queue.
>>>>>
>>>>> That does not look right, as you point out, so I will go drop it now.
>>>>>
>>>>> The logic of the clk structure lifetimes seems crazy, messing with krefs
>>>>> and just "knowing" the lifecycle of the other structures seems like a
>>>>> problem just waiting to happen...
>>>>>
>>>>
>>>> I agree. While the patch itself seems to be ok per Stephen's feedback,
>>>> we have to assume that there will be more secondary failures in addition
>>>> to the one I have discovered. Given that clocks are not normally
>>>> unregistered, I don't think fixing the memory leak is important enough
>>>> to risk the stability of stable releases.
>>>>
>>>> With all that in mind, I'd rather have this in mainline for a prolonged
>>>> period of time before considering it for stable release (if at all).
>>>
>>> I would very much like to circle back and add both this patch and its
>>> fix to the stable trees at some point in the future.
>>>
>>> If the code is good enough for mainline it should be good enough for
>>> stable as well. If it's broken - let's fix it now instead of deferring
>>> this to when people try to upgrade their major kernel versions.
>>>
>>
>> This is where we differ strongly, and where I think the Linux community will
>> have to make a decision sometime soon. If "good enough for mainline" is a
>> relevant criterion for inclusion of a patch into stable releases, we don't
>> need stable releases anymore (we are backporting all bugs into those anyway).
>> Just use mainline.
>>
>> Really, stable releases should be limited to fixing severe bugs. This is not
>> a fix for a severe bug, and on top of that it has side effects. True, those
>> side effects are that it uncovers other bugs, but that just makes it worse.
>> If we assume that my marginal testing covers, optimistically, 1% of the kernel,
>> and it discovers one bug, we have the potential of many more bugs littered
>> throughout the kernel which are now exposed. I really don't want to export
>> that risk into stable releases.
> 
> The assumption here is that fixes introduce fewer bugs than newly
> introduced features, so I'd like to think that we're not backporting
> *all* bugs :)
> 
> It's hard to define "severe" given how widely the kernel is used and all
> the weird use cases it has. Something that doesn't look severe might be
> very critical in a specific use case. I fear that if we have a strict
> definition of "severe", our users will end up carrying more patches
> out-of-tree to fix their "less severe" issues, causing fragmentation,
> which we really want to avoid.
> 
> I actually believe very much in the suggestion you've made in your first
> paragraph: I'd love to see LTS and later on -stable kernels go away and
> users just use mainline releases. Yes, it's unrealistic now, but I'd
> like to think that we're working towards it, thus I want to keep picking
> up more patches and develop our (as well as our users') testing muscle
> to be able to catch regressions.
> 

The result will be that more users will abandon upstream stable releases.
I already have trouble explaining why that many patches are being backported;
there is quite some internal grumbling about it (along the line of "this
patch should not have been backported").

I think we are going into the absolutely wrong direction. Expecting that
everyone would use mainline is absolutely unrealistic, and backporting
more and more patches to stable branches can only result in destabilizing
stable branches, which in turn will drive people away from it (and _not_
to use mainline). The only reason it wasn't a disaster yet is that we have
better testing now. But we offset that better testing with backporting
more patches, which has the opposite effect. One stabilizes, the other
destabilizes. The end result is the same. Actually, it is worse - the
indiscriminate backporting not only causes unnecessary regressions,
it (again) gives people an argument against merging stable releases.
And, this time I have to admit they are right.

Guenter


* Re: Clock related crashes in v5.4.y-queue
  2020-01-05 16:34           ` Guenter Roeck
@ 2020-01-05 19:10             ` Sasha Levin
  2020-01-06 13:20             ` Sasha Levin
  1 sibling, 0 replies; 14+ messages in thread
From: Sasha Levin @ 2020-01-05 19:10 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Greg Kroah-Hartman, stable, Stephen Boyd

On Sun, Jan 05, 2020 at 08:34:59AM -0800, Guenter Roeck wrote:
>On 1/5/20 8:02 AM, Sasha Levin wrote:
>>On Fri, Jan 03, 2020 at 06:50:45AM -0800, Guenter Roeck wrote:
>>>On 1/2/20 4:40 PM, Sasha Levin wrote:
>>>>On Thu, Jan 02, 2020 at 01:28:37PM -0800, Guenter Roeck wrote:
>>>>>On Thu, Jan 02, 2020 at 10:01:19PM +0100, Greg Kroah-Hartman wrote:
>>>>>>On Wed, Jan 01, 2020 at 06:44:08PM -0800, Guenter Roeck wrote:
>>>>>>>Hi,
>>>>>>>
>>>>>>>I see a number of crashes in the latest v5.4.y-queue; please see below
>>>>>>>for details. The problem bisects to commit 54a311c5d3988d ("clk: Fix memory
>>>>>>>leak in clk_unregister()").
>>>>>>>
>>>>>>>The context suggests recovery from a failed driver probe, and it appears
>>>>>>>that the memory is released twice. Interestingly, I don't see the problem
>>>>>>>in mainline.
>>>>>>>
>>>>>>>I would suggest to drop that patch from the stable queue.
>>>>>>
>>>>>>That does not look right, as you point out, so I will go drop it now.
>>>>>>
>>>>>>The logic of the clk structure lifetimes seems crazy, messing with krefs
>>>>>>and just "knowing" the lifecycle of the other structures seems like a
>>>>>>problem just waiting to happen...
>>>>>>
>>>>>
>>>>>I agree. While the patch itself seems to be ok per Stephen's feedback,
>>>>>we have to assume that there will be more secondary failures in addition
>>>>>to the one I have discovered. Given that clocks are not normally
>>>>>unregistered, I don't think fixing the memory leak is important enough
>>>>>to risk the stability of stable releases.
>>>>>
>>>>>With all that in mind, I'd rather have this in mainline for a prolonged
>>>>>period of time before considering it for stable release (if at all).
>>>>
>>>>I would very much like to circle back and add both this patch and its
>>>>fix to the stable trees at some point in the future.
>>>>
>>>>If the code is good enough for mainline it should be good enough for
>>>>stable as well. If it's broken - let's fix it now instead of deferring
>>>>this to when people try to upgrade their major kernel versions.
>>>>
>>>
>>>This is where we differ strongly, and where I think the Linux community will
>>>have to make a decision sometime soon. If "good enough for mainline" is a
>>>relevant criterion for inclusion of a patch into stable releases, we don't
>>>need stable releases anymore (we are backporting all bugs into those anyway).
>>>Just use mainline.
>>>
>>>Really, stable releases should be limited to fixing severe bugs. This is not
>>>a fix for a severe bug, and on top of that it has side effects. True, those
>>>side effects are that it uncovers other bugs, but that just makes it worse.
>>>If we assume that my marginal testing covers, optimistically, 1% of the kernel,
>>>and it discovers one bug, we have the potential of many more bugs littered
>>>throughout the kernel which are now exposed. I really don't want to export
>>>that risk into stable releases.
>>
>>The assumption here is that fixes introduce fewer bugs than newly
>>introduced features, so I'd like to think that we're not backporting
>>*all* bugs :)
>>
>>It's hard to define "severe" given how widely the kernel is used and all
>>the weird use cases it has. Something that doesn't look severe might be
>>very critical in a specific use case. I fear that if we have a strict
>>definition of "severe", our users will end up carrying more patches
>>out-of-tree to fix their "less severe" issues, causing fragmentation,
>>which we really want to avoid.
>>
>>I actually believe very much in the suggestion you've made in your first
>>paragraph: I'd love to see LTS and later on -stable kernels go away and
>>users just use mainline releases. Yes, it's unrealistic now, but I'd
>>like to think that we're working towards it, thus I want to keep picking
>>up more patches and develop our (as well as our users') testing muscle
>>to be able to catch regressions.
>>
>
>The result will be that more users will abandon upstream stable releases.
>I already have trouble explaining why that many patches are being backported;
>there is quite some internal grumbling about it (along the line of "this
>patch should not have been backported").
>
>I think we are going into the absolutely wrong direction. Expecting that
>everyone would use mainline is absolutely unrealistic, and backporting
>more and more patches to stable branches can only result in destabilizing
>stable branches, which in turn will drive people away from it (and _not_
>to use mainline). The only reason it wasn't a disaster yet is that we have
>better testing now. But we offset that better testing with backporting
>more patches, which has the opposite effect. One stabilizes, the other
>destabilizes. The end result is the same. Actually, it is worse - the
>indiscriminate backporting not only causes unnecessary regressions,
>it (again) gives people an argument against merging stable releases.
>And, this time I have to admit they are right.

I'd like to think that there are more patches that fix bugs than ones
that introduce them, so backporting more patches should reduce the bug
count rather than increase it. I very much agree with your point on
testing allowing us to pull more patches in, but I disagree that pulling
more patches destabilizes the tree - we don't add any new features here,
we only take more fixes which some view as less severe/critical.

Years ago I shared your view of this, and actually had a pet project
that attempted to pick only the "most important" patches out of a stable
tree (see https://lkml.org/lkml/2016/4/11/775). What I realized quickly
when we started using it is that almost every patch matters to someone
and almost every patch ends up having security impact. We could start
taking only what seems to be really important to us, but as humans we
suck at figuring this out.

-- 
Thanks,
Sasha


* Re: Clock related crashes in v5.4.y-queue
  2020-01-05 16:34           ` Guenter Roeck
  2020-01-05 19:10             ` Sasha Levin
@ 2020-01-06 13:20             ` Sasha Levin
  1 sibling, 0 replies; 14+ messages in thread
From: Sasha Levin @ 2020-01-06 13:20 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Greg Kroah-Hartman, stable, Stephen Boyd

On Sun, Jan 05, 2020 at 08:34:59AM -0800, Guenter Roeck wrote:
>I think we are going into the absolutely wrong direction. Expecting that
>everyone would use mainline is absolutely unrealistic, and backporting
>more and more patches to stable branches can only result in destabilizing
>stable branches, which in turn will drive people away from it (and _not_
>to use mainline). The only reason it wasn't a disaster yet is that we have
>better testing now. But we offset that better testing with backporting
>more patches, which has the opposite effect. One stabilizes, the other
>destabilizes. The end result is the same. Actually, it is worse - the
>indiscriminate backporting not only causes unnecessary regressions,
>it (again) gives people an argument against merging stable releases.
>And, this time I have to admit they are right.

Just to get an idea of how the AUTOSEL commits affect the kernel tree I
tried the following:

We have 10468 commits on top of 4.19 in the 4.19 LTS tree:

$ git log --oneline v4.19..stable/linux-4.19.y | wc -l
10468

I've tried to identify how many of them have a "Fixes:" tag pointing to
them, and how many were reverted (using it to identify buggy commits in
the stable tree), ending up with:

$ wc -l fixes
637 fixes

So about 6% of the commits that go in the stable tree have a follow-up
fix or revert. Now, let's see where commits in the 4.19 LTS tree come
from:

$ git log --oneline --grep "Signed-off-by: Sasha Levin <sashal@kernel.org>" v4.19..stable/linux-4.19.y | wc -l
5475
$ git log --invert-grep --oneline --grep "Signed-off-by: Sasha Levin <sashal@kernel.org>" v4.19..stable/linux-4.19.y | wc -l
4993

In general, Greg is the one who picks up commits that are tagged for
stable, security issues, and patches requested by folks on the mailing
list. I'm mostly doing AUTOSEL, other distro trees, and mailing list
(but Greg still does the most of the mailing list work).

Anyway, looks like mostly an equal split between stable tagged commits
and AUTOSEL ones.

Now, looking at the buggy commits again, I checked whether the commit came
via Greg or myself (just using it as a way to differentiate between
stable tagged commits/commits requested by users/etc and commits that
came in using AUTOSEL), and I get:

$ for i in $(cat fixes | awk {'print $2'}); do if [ $(git show $i | grep 'Signed-off-by: Sasha Levin <sashal@kernel.org>' | wc -l) -gt 0 ]; then echo "Sasha"; else echo "Greg"; fi; done | sort | uniq -c
    367 Greg
    270 Sasha

Which seems to show that AUTOSEL commits are actually less buggy than
stable tagged commits. Sure, this analysis isn't perfect, but if we
treat it purely as ballpark figures I think that it's enough to show
that picking up more fixes doesn't contribute to "destabilizing" the
stable trees.

-- 
Thanks,
Sasha


end of thread, other threads:[~2020-01-06 13:20 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
2020-01-02  2:44 Clock related crashes in v5.4.y-queue Guenter Roeck
2020-01-02  3:41 ` Guenter Roeck
2020-01-02  7:30   ` Stephen Boyd
2020-01-02 14:13     ` Guenter Roeck
2020-01-02 14:19       ` Naresh Kamboju
2020-01-02 14:38         ` Guenter Roeck
2020-01-02 21:01 ` Greg Kroah-Hartman
2020-01-02 21:28   ` Guenter Roeck
2020-01-03  0:40     ` Sasha Levin
2020-01-03 14:50       ` Guenter Roeck
2020-01-05 16:02         ` Sasha Levin
2020-01-05 16:34           ` Guenter Roeck
2020-01-05 19:10             ` Sasha Levin
2020-01-06 13:20             ` Sasha Levin
