All of lore.kernel.org
 help / color / mirror / Atom feed
* UBSAN: Undefined behaviour in ../include/linux/percpu_counter.h:137:13
@ 2018-06-07 18:26 Mathieu Malaterre
  2018-06-13  1:42 ` Michael Ellerman
  0 siblings, 1 reply; 5+ messages in thread
From: Mathieu Malaterre @ 2018-06-07 18:26 UTC (permalink / raw)
  To: linuxppc-dev

Hi there,

I have a reproducible UBSAN appearing in dmesg after a while on my G4
(*). Could anyone suggest a way to diagnose the actual root issue here
(or is it just a false positive) ?

Thanks,

(*)
[41877.514338] ================================================================================
[41877.514364] UBSAN: Undefined behaviour in
../include/linux/percpu_counter.h:137:13
[41877.514373] signed integer overflow:
[41877.514378] 9223352809007201260 + 41997676517838 cannot be
represented in type 'long long int'
[41877.514389] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0+ #54
[41877.514394] Call Trace:
[41877.514411] [dffedd30] [c047a5f8] ubsan_epilogue+0x18/0x4c (unreliable)
[41877.514422] [dffedd40] [c047af98] handle_overflow+0xbc/0xdc
[41877.514437] [dffeddc0] [c043aaa8] cfq_completed_request+0x560/0x1234
[41877.514446] [dffede40] [c03f595c] __blk_put_request+0xb0/0x2dc
[41877.514460] [dffede80] [c05aa41c] scsi_end_request+0x19c/0x344
[41877.514469] [dffedeb0] [c05abba0] scsi_io_completion+0x4b4/0x854
[41877.514482] [dffedf10] [c040604c] blk_done_softirq+0xe4/0x1e0
[41877.514496] [dffedf60] [c07eef84] __do_softirq+0x16c/0x5f0
[41877.514508] [dffedfd0] [c0065160] irq_exit+0x110/0x1a8
[41877.514520] [dffedff0] [c001646c] call_do_irq+0x24/0x3c
[41877.514533] [c0ce5e80] [c0009a2c] do_IRQ+0x98/0x1a0
[41877.514541] [c0ce5eb0] [c001b93c] ret_from_except+0x0/0x14
[41877.514549] --- interrupt: 501 at arch_cpu_idle+0x30/0x78
                   LR = arch_cpu_idle+0x30/0x78
[41877.514558] [c0ce5f70] [c0ce4000] 0xc0ce4000 (unreliable)
[41877.514570] [c0ce5f80] [c00a3928] do_idle+0xc4/0x158
[41877.514577] [c0ce5fb0] [c00a3b74] cpu_startup_entry+0x24/0x28
[41877.514585] [c0ce5fc0] [c0988820] start_kernel+0x47c/0x490
[41877.514592] [c0ce5ff0] [00003444] 0x3444
[41877.514597] ================================================================================
[41886.390210] ================================================================================
[41886.390236] UBSAN: Undefined behaviour in
../include/linux/percpu_counter.h:137:13
[41886.390245] signed integer overflow:
[41886.390250] 9223366156262940402 + 42006563339289 cannot be
represented in type 'long long int'
[41886.390260] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0+ #54
[41886.390265] Call Trace:
[41886.390282] [dffedd30] [c047a5f8] ubsan_epilogue+0x18/0x4c (unreliable)
[41886.390293] [dffedd40] [c047af98] handle_overflow+0xbc/0xdc
[41886.390309] [dffeddc0] [c043a8c4] cfq_completed_request+0x37c/0x1234
[41886.390317] [dffede40] [c03f595c] __blk_put_request+0xb0/0x2dc
[41886.390331] [dffede80] [c05aa41c] scsi_end_request+0x19c/0x344
[41886.390340] [dffedeb0] [c05abba0] scsi_io_completion+0x4b4/0x854
[41886.390353] [dffedf10] [c040604c] blk_done_softirq+0xe4/0x1e0
[41886.390367] [dffedf60] [c07eef84] __do_softirq+0x16c/0x5f0
[41886.390379] [dffedfd0] [c0065160] irq_exit+0x110/0x1a8
[41886.390391] [dffedff0] [c001646c] call_do_irq+0x24/0x3c
[41886.390404] [c0ce5e80] [c0009a2c] do_IRQ+0x98/0x1a0
[41886.390411] [c0ce5eb0] [c001b93c] ret_from_except+0x0/0x14
[41886.390420] --- interrupt: 501 at arch_cpu_idle+0x30/0x78
                   LR = arch_cpu_idle+0x30/0x78
[41886.390429] [c0ce5f70] [c0ce4000] 0xc0ce4000 (unreliable)
[41886.390441] [c0ce5f80] [c00a3928] do_idle+0xc4/0x158
[41886.390449] [c0ce5fb0] [c00a3b74] cpu_startup_entry+0x24/0x28
[41886.390457] [c0ce5fc0] [c0988820] start_kernel+0x47c/0x490
[41886.390463] [c0ce5ff0] [00003444] 0x3444
[41886.390468] ================================================================================

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: UBSAN: Undefined behaviour in ../include/linux/percpu_counter.h:137:13
  2018-06-07 18:26 UBSAN: Undefined behaviour in ../include/linux/percpu_counter.h:137:13 Mathieu Malaterre
@ 2018-06-13  1:42 ` Michael Ellerman
  2018-06-13  8:43   ` Mathieu Malaterre
  0 siblings, 1 reply; 5+ messages in thread
From: Michael Ellerman @ 2018-06-13  1:42 UTC (permalink / raw)
  To: Mathieu Malaterre, linuxppc-dev

Mathieu Malaterre <malat@debian.org> writes:

> Hi there,
>
> I have a reproducible UBSAN appearing in dmesg after a while on my G4
> (*). Could anyone suggest a way to diagnose the actual root issue here
> (or is it just a false positive) ?

It looks like a real overflow, I guess the question is why are we seeing it.

The first thing to work out would be what exactly is overflowing.

Is it in here?

	cfqg_stats_update_completion(cfqq->cfqg, rq->start_time_ns,
				     rq->io_start_time_ns, rq->cmd_flags);


If so that would suggest something is taking multiple hours to complete,
which seems unlikely. Is time going backward?

cheers

> (*)
> [41877.514338] ================================================================================
> [41877.514364] UBSAN: Undefined behaviour in
> ../include/linux/percpu_counter.h:137:13
> [41877.514373] signed integer overflow:
> [41877.514378] 9223352809007201260 + 41997676517838 cannot be
> represented in type 'long long int'
> [41877.514389] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0+ #54
> [41877.514394] Call Trace:
> [41877.514411] [dffedd30] [c047a5f8] ubsan_epilogue+0x18/0x4c (unreliable)
> [41877.514422] [dffedd40] [c047af98] handle_overflow+0xbc/0xdc
> [41877.514437] [dffeddc0] [c043aaa8] cfq_completed_request+0x560/0x1234
> [41877.514446] [dffede40] [c03f595c] __blk_put_request+0xb0/0x2dc
> [41877.514460] [dffede80] [c05aa41c] scsi_end_request+0x19c/0x344
> [41877.514469] [dffedeb0] [c05abba0] scsi_io_completion+0x4b4/0x854
> [41877.514482] [dffedf10] [c040604c] blk_done_softirq+0xe4/0x1e0
> [41877.514496] [dffedf60] [c07eef84] __do_softirq+0x16c/0x5f0
> [41877.514508] [dffedfd0] [c0065160] irq_exit+0x110/0x1a8
> [41877.514520] [dffedff0] [c001646c] call_do_irq+0x24/0x3c
> [41877.514533] [c0ce5e80] [c0009a2c] do_IRQ+0x98/0x1a0
> [41877.514541] [c0ce5eb0] [c001b93c] ret_from_except+0x0/0x14
> [41877.514549] --- interrupt: 501 at arch_cpu_idle+0x30/0x78
>                    LR = arch_cpu_idle+0x30/0x78
> [41877.514558] [c0ce5f70] [c0ce4000] 0xc0ce4000 (unreliable)
> [41877.514570] [c0ce5f80] [c00a3928] do_idle+0xc4/0x158
> [41877.514577] [c0ce5fb0] [c00a3b74] cpu_startup_entry+0x24/0x28
> [41877.514585] [c0ce5fc0] [c0988820] start_kernel+0x47c/0x490
> [41877.514592] [c0ce5ff0] [00003444] 0x3444
> [41877.514597] ================================================================================
> [41886.390210] ================================================================================
> [41886.390236] UBSAN: Undefined behaviour in
> ../include/linux/percpu_counter.h:137:13
> [41886.390245] signed integer overflow:
> [41886.390250] 9223366156262940402 + 42006563339289 cannot be
> represented in type 'long long int'
> [41886.390260] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0+ #54
> [41886.390265] Call Trace:
> [41886.390282] [dffedd30] [c047a5f8] ubsan_epilogue+0x18/0x4c (unreliable)
> [41886.390293] [dffedd40] [c047af98] handle_overflow+0xbc/0xdc
> [41886.390309] [dffeddc0] [c043a8c4] cfq_completed_request+0x37c/0x1234
> [41886.390317] [dffede40] [c03f595c] __blk_put_request+0xb0/0x2dc
> [41886.390331] [dffede80] [c05aa41c] scsi_end_request+0x19c/0x344
> [41886.390340] [dffedeb0] [c05abba0] scsi_io_completion+0x4b4/0x854
> [41886.390353] [dffedf10] [c040604c] blk_done_softirq+0xe4/0x1e0
> [41886.390367] [dffedf60] [c07eef84] __do_softirq+0x16c/0x5f0
> [41886.390379] [dffedfd0] [c0065160] irq_exit+0x110/0x1a8
> [41886.390391] [dffedff0] [c001646c] call_do_irq+0x24/0x3c
> [41886.390404] [c0ce5e80] [c0009a2c] do_IRQ+0x98/0x1a0
> [41886.390411] [c0ce5eb0] [c001b93c] ret_from_except+0x0/0x14
> [41886.390420] --- interrupt: 501 at arch_cpu_idle+0x30/0x78
>                    LR = arch_cpu_idle+0x30/0x78
> [41886.390429] [c0ce5f70] [c0ce4000] 0xc0ce4000 (unreliable)
> [41886.390441] [c0ce5f80] [c00a3928] do_idle+0xc4/0x158
> [41886.390449] [c0ce5fb0] [c00a3b74] cpu_startup_entry+0x24/0x28
> [41886.390457] [c0ce5fc0] [c0988820] start_kernel+0x47c/0x490
> [41886.390463] [c0ce5ff0] [00003444] 0x3444
> [41886.390468] ================================================================================

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: UBSAN: Undefined behaviour in ../include/linux/percpu_counter.h:137:13
  2018-06-13  1:42 ` Michael Ellerman
@ 2018-06-13  8:43   ` Mathieu Malaterre
  2018-06-13 11:41     ` Mathieu Malaterre
  0 siblings, 1 reply; 5+ messages in thread
From: Mathieu Malaterre @ 2018-06-13  8:43 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

On Wed, Jun 13, 2018 at 3:43 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Mathieu Malaterre <malat@debian.org> writes:
>
> > Hi there,
> >
> > I have a reproducible UBSAN appearing in dmesg after a while on my G4
> > (*). Could anyone suggest a way to diagnose the actual root issue here
> > (or is it just a false positive) ?
>
> It looks like a real overflow, I guess the question is why are we seeing it.
>
> The first thing to work out would be what exactly is overflowing.
>
> Is it in here?
>
>         cfqg_stats_update_completion(cfqq->cfqg, rq->start_time_ns,
>                                      rq->io_start_time_ns, rq->cmd_flags);
>
>
> If so that would suggest something is taking multiple hours to complete,
> which seems unlikely. Is time going backward?

There is also something suspicious in the kern.log file:

Jun 12 20:09:04 debian kernel: [    5.504182]
================================================================================
Jun 12 20:09:04 debian kernel: [    5.508945] UBSAN: Undefined
behaviour in ../drivers/rtc/rtc-lib.c:87:22
Jun 12 20:09:04 debian kernel: [    5.513658] signed integer overflow:
Jun 12 20:09:04 debian kernel: [    5.518211] 1193024 * 3600 cannot be
represented in type 'int'
Jun 12 20:09:04 debian kernel: [    5.522866] CPU: 0 PID: 1 Comm:
swapper Not tainted 4.17.0+ #1
Jun 12 20:09:04 debian kernel: [    5.527567] Call Trace:
Jun 12 20:09:04 debian kernel: [    5.532200] [df4e7b00] [c0481074]
ubsan_epilogue+0x18/0x4c (unreliable)
Jun 12 20:09:04 debian kernel: [    5.537019] [df4e7b10] [c0481a14]
handle_overflow+0xbc/0xdc
Jun 12 20:09:04 debian kernel: [    5.541832] [df4e7b90] [c060d698]
rtc_time64_to_tm+0x344/0x388
Jun 12 20:09:04 debian kernel: [    5.546655] [df4e7bd0] [c001076c]
rtc_generic_get_time+0x2c/0x40
Jun 12 20:09:04 debian kernel: [    5.551477] [df4e7be0] [c06113d4]
__rtc_read_time+0x70/0x13c
Jun 12 20:09:04 debian kernel: [    5.556288] [df4e7c00] [c061150c]
rtc_read_time+0x6c/0x130
Jun 12 20:09:04 debian kernel: [    5.561088] [df4e7c30] [c061271c]
__rtc_read_alarm+0x34/0x684
Jun 12 20:09:04 debian kernel: [    5.565884] [df4e7ce0] [c060f234]
rtc_device_register+0x88/0x218
Jun 12 20:09:04 debian kernel: [    5.570695] [df4e7d40] [c060f428]
devm_rtc_device_register+0x64/0xc4
Jun 12 20:09:04 debian kernel: [    5.575528] [df4e7d60] [c09d15d4]
generic_rtc_probe+0x50/0x78
Jun 12 20:09:04 debian kernel: [    5.580359] [df4e7d70] [c055e4a4]
platform_drv_probe+0xa8/0x128
Jun 12 20:09:04 debian kernel: [    5.585210] [df4e7d90] [c0559d28]
driver_probe_device+0x354/0x6fc
Jun 12 20:09:04 debian kernel: [    5.590064] [df4e7dd0] [c055a270]
__driver_attach+0x1a0/0x22c
Jun 12 20:09:04 debian kernel: [    5.594917] [df4e7df0] [c0555b70]
bus_for_each_dev+0x84/0xdc
Jun 12 20:09:04 debian kernel: [    5.599750] [df4e7e20] [c0558420]
bus_add_driver+0x188/0x348
Jun 12 20:09:04 debian kernel: [    5.604584] [df4e7e40] [c055b7b4]
driver_register+0xa0/0x18c
Jun 12 20:09:04 debian kernel: [    5.609433] [df4e7e50] [c055e950]
__platform_driver_probe+0x8c/0x198
Jun 12 20:09:04 debian kernel: [    5.614330] [df4e7e70] [c0005800]
do_one_initcall+0x64/0x280
Jun 12 20:09:04 debian kernel: [    5.619237] [df4e7ee0] [c0997c04]
kernel_init_freeable+0x3a4/0x444
Jun 12 20:09:04 debian kernel: [    5.624145] [df4e7f30] [c00049f8]
kernel_init+0x24/0x118
Jun 12 20:09:04 debian kernel: [    5.629029] [df4e7f40] [c001b1c4]
ret_from_kernel_thread+0x14/0x1c
Jun 12 20:09:04 debian kernel: [    5.633878]
================================================================================


Grep-ing all leads to:

$ grep  "cannot be represented" kern.log | colrm 1 45|sort -u
 1193022 * 3600 cannot be represented in type 'int'
 1193024 * 3600 cannot be represented in type 'int'
 1193032 * 3600 cannot be represented in type 'int'
 1193033 * 3600 cannot be represented in type 'int'
 1193034 * 3600 cannot be represented in type 'int'
 1193035 * 3600 cannot be represented in type 'int'

How come tm_hour can store a value of 1193035 ?

> cheers
>
> > (*)
> > [41877.514338] ================================================================================
> > [41877.514364] UBSAN: Undefined behaviour in
> > ../include/linux/percpu_counter.h:137:13
> > [41877.514373] signed integer overflow:
> > [41877.514378] 9223352809007201260 + 41997676517838 cannot be
> > represented in type 'long long int'
> > [41877.514389] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0+ #54
> > [41877.514394] Call Trace:
> > [41877.514411] [dffedd30] [c047a5f8] ubsan_epilogue+0x18/0x4c (unreliable)
> > [41877.514422] [dffedd40] [c047af98] handle_overflow+0xbc/0xdc
> > [41877.514437] [dffeddc0] [c043aaa8] cfq_completed_request+0x560/0x1234
> > [41877.514446] [dffede40] [c03f595c] __blk_put_request+0xb0/0x2dc
> > [41877.514460] [dffede80] [c05aa41c] scsi_end_request+0x19c/0x344
> > [41877.514469] [dffedeb0] [c05abba0] scsi_io_completion+0x4b4/0x854
> > [41877.514482] [dffedf10] [c040604c] blk_done_softirq+0xe4/0x1e0
> > [41877.514496] [dffedf60] [c07eef84] __do_softirq+0x16c/0x5f0
> > [41877.514508] [dffedfd0] [c0065160] irq_exit+0x110/0x1a8
> > [41877.514520] [dffedff0] [c001646c] call_do_irq+0x24/0x3c
> > [41877.514533] [c0ce5e80] [c0009a2c] do_IRQ+0x98/0x1a0
> > [41877.514541] [c0ce5eb0] [c001b93c] ret_from_except+0x0/0x14
> > [41877.514549] --- interrupt: 501 at arch_cpu_idle+0x30/0x78
> >                    LR = arch_cpu_idle+0x30/0x78
> > [41877.514558] [c0ce5f70] [c0ce4000] 0xc0ce4000 (unreliable)
> > [41877.514570] [c0ce5f80] [c00a3928] do_idle+0xc4/0x158
> > [41877.514577] [c0ce5fb0] [c00a3b74] cpu_startup_entry+0x24/0x28
> > [41877.514585] [c0ce5fc0] [c0988820] start_kernel+0x47c/0x490
> > [41877.514592] [c0ce5ff0] [00003444] 0x3444
> > [41877.514597] ================================================================================
> > [41886.390210] ================================================================================
> > [41886.390236] UBSAN: Undefined behaviour in
> > ../include/linux/percpu_counter.h:137:13
> > [41886.390245] signed integer overflow:
> > [41886.390250] 9223366156262940402 + 42006563339289 cannot be
> > represented in type 'long long int'
> > [41886.390260] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0+ #54
> > [41886.390265] Call Trace:
> > [41886.390282] [dffedd30] [c047a5f8] ubsan_epilogue+0x18/0x4c (unreliable)
> > [41886.390293] [dffedd40] [c047af98] handle_overflow+0xbc/0xdc
> > [41886.390309] [dffeddc0] [c043a8c4] cfq_completed_request+0x37c/0x1234
> > [41886.390317] [dffede40] [c03f595c] __blk_put_request+0xb0/0x2dc
> > [41886.390331] [dffede80] [c05aa41c] scsi_end_request+0x19c/0x344
> > [41886.390340] [dffedeb0] [c05abba0] scsi_io_completion+0x4b4/0x854
> > [41886.390353] [dffedf10] [c040604c] blk_done_softirq+0xe4/0x1e0
> > [41886.390367] [dffedf60] [c07eef84] __do_softirq+0x16c/0x5f0
> > [41886.390379] [dffedfd0] [c0065160] irq_exit+0x110/0x1a8
> > [41886.390391] [dffedff0] [c001646c] call_do_irq+0x24/0x3c
> > [41886.390404] [c0ce5e80] [c0009a2c] do_IRQ+0x98/0x1a0
> > [41886.390411] [c0ce5eb0] [c001b93c] ret_from_except+0x0/0x14
> > [41886.390420] --- interrupt: 501 at arch_cpu_idle+0x30/0x78
> >                    LR = arch_cpu_idle+0x30/0x78
> > [41886.390429] [c0ce5f70] [c0ce4000] 0xc0ce4000 (unreliable)
> > [41886.390441] [c0ce5f80] [c00a3928] do_idle+0xc4/0x158
> > [41886.390449] [c0ce5fb0] [c00a3b74] cpu_startup_entry+0x24/0x28
> > [41886.390457] [c0ce5fc0] [c0988820] start_kernel+0x47c/0x490
> > [41886.390463] [c0ce5ff0] [00003444] 0x3444
> > [41886.390468] ================================================================================

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: UBSAN: Undefined behaviour in ../include/linux/percpu_counter.h:137:13
  2018-06-13  8:43   ` Mathieu Malaterre
@ 2018-06-13 11:41     ` Mathieu Malaterre
  2018-06-15 12:34       ` Michael Ellerman
  0 siblings, 1 reply; 5+ messages in thread
From: Mathieu Malaterre @ 2018-06-13 11:41 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

On Wed, Jun 13, 2018 at 10:43 AM Mathieu Malaterre <malat@debian.org> wrote:
>
> On Wed, Jun 13, 2018 at 3:43 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
> >
> > Mathieu Malaterre <malat@debian.org> writes:
> >
> > > Hi there,
> > >
> > > I have a reproducible UBSAN appearing in dmesg after a while on my G4
> > > (*). Could anyone suggest a way to diagnose the actual root issue here
> > > (or is it just a false positive) ?
> >
> > It looks like a real overflow, I guess the question is why are we seeing it.
> >
> > The first thing to work out would be what exactly is overflowing.
> >
> > Is it in here?
> >
> >         cfqg_stats_update_completion(cfqq->cfqg, rq->start_time_ns,
> >                                      rq->io_start_time_ns, rq->cmd_flags);
> >
> >
> > If so that would suggest something is taking multiple hours to complete,
> > which seems unlikely. Is time going backward?
>
> There is also something suspicious in the kern.log file:
>
> Jun 12 20:09:04 debian kernel: [    5.504182]
> ================================================================================
> Jun 12 20:09:04 debian kernel: [    5.508945] UBSAN: Undefined
> behaviour in ../drivers/rtc/rtc-lib.c:87:22
> Jun 12 20:09:04 debian kernel: [    5.513658] signed integer overflow:
> Jun 12 20:09:04 debian kernel: [    5.518211] 1193024 * 3600 cannot be
> represented in type 'int'
> Jun 12 20:09:04 debian kernel: [    5.522866] CPU: 0 PID: 1 Comm:
> swapper Not tainted 4.17.0+ #1
> Jun 12 20:09:04 debian kernel: [    5.527567] Call Trace:
> Jun 12 20:09:04 debian kernel: [    5.532200] [df4e7b00] [c0481074]
> ubsan_epilogue+0x18/0x4c (unreliable)
> Jun 12 20:09:04 debian kernel: [    5.537019] [df4e7b10] [c0481a14]
> handle_overflow+0xbc/0xdc
> Jun 12 20:09:04 debian kernel: [    5.541832] [df4e7b90] [c060d698]
> rtc_time64_to_tm+0x344/0x388
> Jun 12 20:09:04 debian kernel: [    5.546655] [df4e7bd0] [c001076c]
> rtc_generic_get_time+0x2c/0x40
> Jun 12 20:09:04 debian kernel: [    5.551477] [df4e7be0] [c06113d4]
> __rtc_read_time+0x70/0x13c
> Jun 12 20:09:04 debian kernel: [    5.556288] [df4e7c00] [c061150c]
> rtc_read_time+0x6c/0x130
> Jun 12 20:09:04 debian kernel: [    5.561088] [df4e7c30] [c061271c]
> __rtc_read_alarm+0x34/0x684
> Jun 12 20:09:04 debian kernel: [    5.565884] [df4e7ce0] [c060f234]
> rtc_device_register+0x88/0x218
> Jun 12 20:09:04 debian kernel: [    5.570695] [df4e7d40] [c060f428]
> devm_rtc_device_register+0x64/0xc4
> Jun 12 20:09:04 debian kernel: [    5.575528] [df4e7d60] [c09d15d4]
> generic_rtc_probe+0x50/0x78
> Jun 12 20:09:04 debian kernel: [    5.580359] [df4e7d70] [c055e4a4]
> platform_drv_probe+0xa8/0x128
> Jun 12 20:09:04 debian kernel: [    5.585210] [df4e7d90] [c0559d28]
> driver_probe_device+0x354/0x6fc
> Jun 12 20:09:04 debian kernel: [    5.590064] [df4e7dd0] [c055a270]
> __driver_attach+0x1a0/0x22c
> Jun 12 20:09:04 debian kernel: [    5.594917] [df4e7df0] [c0555b70]
> bus_for_each_dev+0x84/0xdc
> Jun 12 20:09:04 debian kernel: [    5.599750] [df4e7e20] [c0558420]
> bus_add_driver+0x188/0x348
> Jun 12 20:09:04 debian kernel: [    5.604584] [df4e7e40] [c055b7b4]
> driver_register+0xa0/0x18c
> Jun 12 20:09:04 debian kernel: [    5.609433] [df4e7e50] [c055e950]
> __platform_driver_probe+0x8c/0x198
> Jun 12 20:09:04 debian kernel: [    5.614330] [df4e7e70] [c0005800]
> do_one_initcall+0x64/0x280
> Jun 12 20:09:04 debian kernel: [    5.619237] [df4e7ee0] [c0997c04]
> kernel_init_freeable+0x3a4/0x444
> Jun 12 20:09:04 debian kernel: [    5.624145] [df4e7f30] [c00049f8]
> kernel_init+0x24/0x118
> Jun 12 20:09:04 debian kernel: [    5.629029] [df4e7f40] [c001b1c4]
> ret_from_kernel_thread+0x14/0x1c
> Jun 12 20:09:04 debian kernel: [    5.633878]
> ================================================================================
>
>
> Grep-ing all leads to:
>
> $ grep  "cannot be represented" kern.log | colrm 1 45|sort -u
>  1193022 * 3600 cannot be represented in type 'int'
>  1193024 * 3600 cannot be represented in type 'int'
>  1193032 * 3600 cannot be represented in type 'int'
>  1193033 * 3600 cannot be represented in type 'int'
>  1193034 * 3600 cannot be represented in type 'int'
>  1193035 * 3600 cannot be represented in type 'int'
>
> How come tm_hour can store a value of 1193035 ?

It appears that I am getting a negative value for  time64_t :

[    5.495470] devices_kset: Moving rtc-generic to end of list
[    5.502584]  mm DBG: TIME64_T -2068738348
[    5.507205] WARNING: CPU: 0 PID: 1 at ../drivers/rtc/rtc-lib.c:60
rtc_time64_to_tm+0x48/0x38c
[    5.511843] Modules linked in:
[    5.516306] CPU: 0 PID: 1 Comm: swapper Not tainted 4.17.0+ #6
[    5.520879] NIP:  c060d498 LR: c060d494 CTR: c00d49dc
[    5.525467] REGS: df4e7ad0 TRAP: 0700   Not tainted  (4.17.0+)
[    5.530115] MSR:  00029032 <EE,ME,IR,DR,RI>  CR: 88008282  XER: 00000000
[    5.534865]
               GPR00: c060d494 df4e7b80 df4e8000 00000019 00000001
0000032e 00000400 00000000
               GPR08: 00000003 00000001 c0aa4b38 00000000 24008882
00000000 c09dfc4c c09dfc28
               GPR16: 00000690 c0aedc6c c0aee7f8 c0aed13c 00000054
c0ae75cc c0aea2a4 00000000
               GPR24: c0d70000 c0a85850 df4e7c5c df4e7c5c df4e7ce8
de519040 df4e7c5c ffffffff
[    5.559346] NIP [c060d498] rtc_time64_to_tm+0x48/0x38c
[    5.564356] LR [c060d494] rtc_time64_to_tm+0x44/0x38c
[    5.569382] Call Trace:
[    5.574354] [df4e7b80] [c060d494] rtc_time64_to_tm+0x44/0x38c (unreliable)
[    5.579525] [df4e7bd0] [c001076c] rtc_generic_get_time+0x2c/0x40
[    5.584689] [df4e7be0] [c06114d4] __rtc_read_time+0x70/0x13c
[    5.589845] [df4e7c00] [c061160c] rtc_read_time+0x6c/0x130
[    5.594993] [df4e7c30] [c061281c] __rtc_read_alarm+0x34/0x684
[    5.600151] [df4e7ce0] [c060f334] rtc_device_register+0x88/0x218
[    5.605343] [df4e7d40] [c060f528] devm_rtc_device_register+0x64/0xc4
[    5.610584] [df4e7d60] [c09d25a8] generic_rtc_probe+0x50/0x78
[    5.615833] [df4e7d70] [c055e5a0] platform_drv_probe+0xa8/0x128
[    5.621106] [df4e7d90] [c0559e24] driver_probe_device+0x354/0x6fc
[    5.626387] [df4e7dd0] [c055a36c] __driver_attach+0x1a0/0x22c
[    5.631663] [df4e7df0] [c0555c6c] bus_for_each_dev+0x84/0xdc
[    5.636853] [df4e7e20] [c055851c] bus_add_driver+0x188/0x348
[    5.641906] [df4e7e40] [c055b8b0] driver_register+0xa0/0x18c
[    5.646850] [df4e7e50] [c055ea4c] __platform_driver_probe+0x8c/0x198
[    5.651841] [df4e7e70] [c0005800] do_one_initcall+0x64/0x280
[    5.656805] [df4e7ee0] [c0998bd8] kernel_init_freeable+0x3a4/0x444
[    5.661752] [df4e7f30] [c00049f8] kernel_init+0x24/0x118
[    5.666596] [df4e7f40] [c001b1c4] ret_from_kernel_thread+0x14/0x1c
[    5.671348] Instruction dump:
[    5.675944] 90010054 bf010030 9181002c 7c7f1b78 7cba2b78 90810018
7c651b78 3c60c093
[    5.680655] 7c862378 38636340 4bac3c1d 57e90ffe <0f090000> 3ca00001
80810018 7fe3fb78
[    5.685436] ---[ end trace 57955f2c4e7598fd ]---
[    5.690118]  mm DBG: SECS 4294904148
[    5.694648]  mm DBG: HOUR 1193028


> > cheers
> >
> > > (*)
> > > [41877.514338] ================================================================================
> > > [41877.514364] UBSAN: Undefined behaviour in
> > > ../include/linux/percpu_counter.h:137:13
> > > [41877.514373] signed integer overflow:
> > > [41877.514378] 9223352809007201260 + 41997676517838 cannot be
> > > represented in type 'long long int'
> > > [41877.514389] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0+ #54
> > > [41877.514394] Call Trace:
> > > [41877.514411] [dffedd30] [c047a5f8] ubsan_epilogue+0x18/0x4c (unreliable)
> > > [41877.514422] [dffedd40] [c047af98] handle_overflow+0xbc/0xdc
> > > [41877.514437] [dffeddc0] [c043aaa8] cfq_completed_request+0x560/0x1234
> > > [41877.514446] [dffede40] [c03f595c] __blk_put_request+0xb0/0x2dc
> > > [41877.514460] [dffede80] [c05aa41c] scsi_end_request+0x19c/0x344
> > > [41877.514469] [dffedeb0] [c05abba0] scsi_io_completion+0x4b4/0x854
> > > [41877.514482] [dffedf10] [c040604c] blk_done_softirq+0xe4/0x1e0
> > > [41877.514496] [dffedf60] [c07eef84] __do_softirq+0x16c/0x5f0
> > > [41877.514508] [dffedfd0] [c0065160] irq_exit+0x110/0x1a8
> > > [41877.514520] [dffedff0] [c001646c] call_do_irq+0x24/0x3c
> > > [41877.514533] [c0ce5e80] [c0009a2c] do_IRQ+0x98/0x1a0
> > > [41877.514541] [c0ce5eb0] [c001b93c] ret_from_except+0x0/0x14
> > > [41877.514549] --- interrupt: 501 at arch_cpu_idle+0x30/0x78
> > >                    LR = arch_cpu_idle+0x30/0x78
> > > [41877.514558] [c0ce5f70] [c0ce4000] 0xc0ce4000 (unreliable)
> > > [41877.514570] [c0ce5f80] [c00a3928] do_idle+0xc4/0x158
> > > [41877.514577] [c0ce5fb0] [c00a3b74] cpu_startup_entry+0x24/0x28
> > > [41877.514585] [c0ce5fc0] [c0988820] start_kernel+0x47c/0x490
> > > [41877.514592] [c0ce5ff0] [00003444] 0x3444
> > > [41877.514597] ================================================================================
> > > [41886.390210] ================================================================================
> > > [41886.390236] UBSAN: Undefined behaviour in
> > > ../include/linux/percpu_counter.h:137:13
> > > [41886.390245] signed integer overflow:
> > > [41886.390250] 9223366156262940402 + 42006563339289 cannot be
> > > represented in type 'long long int'
> > > [41886.390260] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0+ #54
> > > [41886.390265] Call Trace:
> > > [41886.390282] [dffedd30] [c047a5f8] ubsan_epilogue+0x18/0x4c (unreliable)
> > > [41886.390293] [dffedd40] [c047af98] handle_overflow+0xbc/0xdc
> > > [41886.390309] [dffeddc0] [c043a8c4] cfq_completed_request+0x37c/0x1234
> > > [41886.390317] [dffede40] [c03f595c] __blk_put_request+0xb0/0x2dc
> > > [41886.390331] [dffede80] [c05aa41c] scsi_end_request+0x19c/0x344
> > > [41886.390340] [dffedeb0] [c05abba0] scsi_io_completion+0x4b4/0x854
> > > [41886.390353] [dffedf10] [c040604c] blk_done_softirq+0xe4/0x1e0
> > > [41886.390367] [dffedf60] [c07eef84] __do_softirq+0x16c/0x5f0
> > > [41886.390379] [dffedfd0] [c0065160] irq_exit+0x110/0x1a8
> > > [41886.390391] [dffedff0] [c001646c] call_do_irq+0x24/0x3c
> > > [41886.390404] [c0ce5e80] [c0009a2c] do_IRQ+0x98/0x1a0
> > > [41886.390411] [c0ce5eb0] [c001b93c] ret_from_except+0x0/0x14
> > > [41886.390420] --- interrupt: 501 at arch_cpu_idle+0x30/0x78
> > >                    LR = arch_cpu_idle+0x30/0x78
> > > [41886.390429] [c0ce5f70] [c0ce4000] 0xc0ce4000 (unreliable)
> > > [41886.390441] [c0ce5f80] [c00a3928] do_idle+0xc4/0x158
> > > [41886.390449] [c0ce5fb0] [c00a3b74] cpu_startup_entry+0x24/0x28
> > > [41886.390457] [c0ce5fc0] [c0988820] start_kernel+0x47c/0x490
> > > [41886.390463] [c0ce5ff0] [00003444] 0x3444
> > > [41886.390468] ================================================================================

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: UBSAN: Undefined behaviour in ../include/linux/percpu_counter.h:137:13
  2018-06-13 11:41     ` Mathieu Malaterre
@ 2018-06-15 12:34       ` Michael Ellerman
  0 siblings, 0 replies; 5+ messages in thread
From: Michael Ellerman @ 2018-06-15 12:34 UTC (permalink / raw)
  To: Mathieu Malaterre; +Cc: linuxppc-dev

Mathieu Malaterre <malat@debian.org> writes:
> On Wed, Jun 13, 2018 at 10:43 AM Mathieu Malaterre <malat@debian.org> wrote:
>> On Wed, Jun 13, 2018 at 3:43 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
>> > Mathieu Malaterre <malat@debian.org> writes:
>> > > Hi there,
>> > >
>> > > I have a reproducible UBSAN appearing in dmesg after a while on my G4
>> > > (*). Could anyone suggest a way to diagnose the actual root issue here
>> > > (or is it just a false positive) ?
>> >
>> > It looks like a real overflow, I guess the question is why are we seeing it.
>> >
>> > The first thing to work out would be what exactly is overflowing.
>> >
>> > Is it in here?
>> >
>> >         cfqg_stats_update_completion(cfqq->cfqg, rq->start_time_ns,
>> >                                      rq->io_start_time_ns, rq->cmd_flags);
>> >
>> >
>> > If so that would suggest something is taking multiple hours to complete,
>> > which seems unlikely. Is time going backward?
>>
>> There is also something suspicious in the kern.log file:
>>
>> Jun 12 20:09:04 debian kernel: [    5.504182]
>> ================================================================================
>> Jun 12 20:09:04 debian kernel: [    5.508945] UBSAN: Undefined
>> behaviour in ../drivers/rtc/rtc-lib.c:87:22
>> Jun 12 20:09:04 debian kernel: [    5.513658] signed integer overflow:
>> Jun 12 20:09:04 debian kernel: [    5.518211] 1193024 * 3600 cannot be
>> represented in type 'int'
>> Jun 12 20:09:04 debian kernel: [    5.522866] CPU: 0 PID: 1 Comm:
>> swapper Not tainted 4.17.0+ #1
>> Jun 12 20:09:04 debian kernel: [    5.527567] Call Trace:
>> Jun 12 20:09:04 debian kernel: [    5.532200] [df4e7b00] [c0481074]
>> ubsan_epilogue+0x18/0x4c (unreliable)
>> Jun 12 20:09:04 debian kernel: [    5.537019] [df4e7b10] [c0481a14]
>> handle_overflow+0xbc/0xdc
>> Jun 12 20:09:04 debian kernel: [    5.541832] [df4e7b90] [c060d698]
>> rtc_time64_to_tm+0x344/0x388
>> Jun 12 20:09:04 debian kernel: [    5.546655] [df4e7bd0] [c001076c]
>> rtc_generic_get_time+0x2c/0x40
>> Jun 12 20:09:04 debian kernel: [    5.551477] [df4e7be0] [c06113d4]
>> __rtc_read_time+0x70/0x13c
>> Jun 12 20:09:04 debian kernel: [    5.556288] [df4e7c00] [c061150c]
>> rtc_read_time+0x6c/0x130
>> Jun 12 20:09:04 debian kernel: [    5.561088] [df4e7c30] [c061271c]
>> __rtc_read_alarm+0x34/0x684
>> Jun 12 20:09:04 debian kernel: [    5.565884] [df4e7ce0] [c060f234]
>> rtc_device_register+0x88/0x218
>> Jun 12 20:09:04 debian kernel: [    5.570695] [df4e7d40] [c060f428]
>> devm_rtc_device_register+0x64/0xc4
>> Jun 12 20:09:04 debian kernel: [    5.575528] [df4e7d60] [c09d15d4]
>> generic_rtc_probe+0x50/0x78
>> Jun 12 20:09:04 debian kernel: [    5.580359] [df4e7d70] [c055e4a4]
>> platform_drv_probe+0xa8/0x128
>> Jun 12 20:09:04 debian kernel: [    5.585210] [df4e7d90] [c0559d28]
>> driver_probe_device+0x354/0x6fc
>> Jun 12 20:09:04 debian kernel: [    5.590064] [df4e7dd0] [c055a270]
>> __driver_attach+0x1a0/0x22c
>> Jun 12 20:09:04 debian kernel: [    5.594917] [df4e7df0] [c0555b70]
>> bus_for_each_dev+0x84/0xdc
>> Jun 12 20:09:04 debian kernel: [    5.599750] [df4e7e20] [c0558420]
>> bus_add_driver+0x188/0x348
>> Jun 12 20:09:04 debian kernel: [    5.604584] [df4e7e40] [c055b7b4]
>> driver_register+0xa0/0x18c
>> Jun 12 20:09:04 debian kernel: [    5.609433] [df4e7e50] [c055e950]
>> __platform_driver_probe+0x8c/0x198
>> Jun 12 20:09:04 debian kernel: [    5.614330] [df4e7e70] [c0005800]
>> do_one_initcall+0x64/0x280
>> Jun 12 20:09:04 debian kernel: [    5.619237] [df4e7ee0] [c0997c04]
>> kernel_init_freeable+0x3a4/0x444
>> Jun 12 20:09:04 debian kernel: [    5.624145] [df4e7f30] [c00049f8]
>> kernel_init+0x24/0x118
>> Jun 12 20:09:04 debian kernel: [    5.629029] [df4e7f40] [c001b1c4]
>> ret_from_kernel_thread+0x14/0x1c
>> Jun 12 20:09:04 debian kernel: [    5.633878]
>> ================================================================================
>>
>>
>> Grep-ing all leads to:
>>
>> $ grep  "cannot be represented" kern.log | colrm 1 45|sort -u
>>  1193022 * 3600 cannot be represented in type 'int'
>>  1193024 * 3600 cannot be represented in type 'int'
>>  1193032 * 3600 cannot be represented in type 'int'
>>  1193033 * 3600 cannot be represented in type 'int'
>>  1193034 * 3600 cannot be represented in type 'int'
>>  1193035 * 3600 cannot be represented in type 'int'
>>
>> How come tm_hour can store a value of 1193035 ?
>
> It appears that I am getting a negative value for  time64_t :

That sounds bad :)

Thanks for tracking it down.

cheers

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-06-15 12:34 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-07 18:26 UBSAN: Undefined behaviour in ../include/linux/percpu_counter.h:137:13 Mathieu Malaterre
2018-06-13  1:42 ` Michael Ellerman
2018-06-13  8:43   ` Mathieu Malaterre
2018-06-13 11:41     ` Mathieu Malaterre
2018-06-15 12:34       ` Michael Ellerman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.