* linux-next: boot failures with next-20120411
@ 2012-04-11  6:58 Stephen Rothwell
  2012-04-12  2:44 ` Milton Miller
  2012-04-13  2:30   ` Michael Neuling
  0 siblings, 2 replies; 14+ messages in thread
From: Stephen Rothwell @ 2012-04-11  6:58 UTC (permalink / raw)
  To: LKML; +Cc: ppc-dev

[-- Attachment #1: Type: text/plain, Size: 2453 bytes --]

Hi all,

Some (not all) of my PowerPC boot tests have failed like this after
getting into user mode (this one was just after udev started, but others
are after other processes getting going):

Unable to handle kernel paging request for data at address 0xc0000003f9d550
Faulting instruction address: 0xc0000000001b7f40
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=32 NUMA pSeries
Modules linked in: ehea
NIP: c0000000001b7f40 LR: c0000000001b7f14 CTR: c0000000000e04f0
REGS: c0000003f68bf6b0 TRAP: 0300   Not tainted  (3.4.0-rc2-autokern1)
MSR: 800000000280b032 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI>  CR: 24422424  XER: 20000001
SOFTE: 1
CFAR: 000000000000562c
DAR: 00c0000003f9d550, DSISR: 40000000
TASK = c0000003f8818000[3192] 'kdump' THREAD: c0000003f68bc000 CPU: 5
GPR00: 0000000000000000 c0000003f68bf930 c000000000ce1d40 c0000003fe00ec00 
GPR04: 00000000000002d0 0000000000000038 c0000003f8f935e8 c000000000e55280 
GPR08: 0000000000000011 c000000000bcb280 c000000000bcb1e8 000000000028a000 
GPR12: 0000000024422424 c00000000f33bc80 00000fffdd90a770 0000000000081000 
GPR16: c0000003f846c000 000000000de4f7a0 f00000000de4f7a0 0000000000000000 
GPR20: c0000003f8365408 c0000003f8365480 c0000003f8e5d110 0000000000000000 
GPR24: 0000000000000100 c0000003f8365400 c0000000001e5424 00000000000002d0 
GPR28: 0000000000000800 00c0000003f9d550 c000000000c5b718 c0000003fe00ec00 
NIP [c0000000001b7f40] .__kmalloc+0x70/0x230
LR [c0000000001b7f14] .__kmalloc+0x44/0x230
Call Trace:
[c0000003f68bf930] [c0000003f68bf9b0] 0xc0000003f68bf9b0 (unreliable)
[c0000003f68bf9e0] [c0000000001e5424] .alloc_fdmem+0x24/0x70
[c0000003f68bfa60] [c0000000001e54f8] .alloc_fdtable+0x88/0x130
[c0000003f68bfaf0] [c0000000001e5924] .dup_fd+0x384/0x450
[c0000003f68bfbd0] [c00000000009a310] .copy_process+0x880/0x11d0
[c0000003f68bfcd0] [c00000000009aee0] .do_fork+0x70/0x400
[c0000003f68bfdc0] [c0000000000141c4] .sys_clone+0x54/0x70
[c0000003f68bfe30] [c000000000009aa0] .ppc_clone+0x8/0xc
Instruction dump:
4bff9281 2ba30010 7c7f1b78 40dd00f4 e96d0040 e93f0000 7ce95a14 e9070008 
7fa9582a 2fbd0000 41de0054 e81f0022 <7f3d002a> 38000000 886d01f2 980d01f2 
---[ end trace 366fe6c7ced3bfb0 ]---

This did not happen yesterday.  Just wondering if anyone can think of
anything obvious.  Full console log at
http://ozlabs.org/~sfr/next-20120411.log.bz2

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au



* Re: linux-next: boot failures with next-20120411
  2012-04-11  6:58 linux-next: boot failures with next-20120411 Stephen Rothwell
@ 2012-04-12  2:44 ` Milton Miller
  2012-04-13  2:30   ` Michael Neuling
  1 sibling, 0 replies; 14+ messages in thread
From: Milton Miller @ 2012-04-12  2:44 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: ppc-dev, LKML

On Wed, 11 Apr 2012 about 16:58:35 +1000, Stephen Rothwell wrote:
> Hi all,
> 
> Some (not all) of my PowerPC boot tests have failed like this after
> getting into user mode (this one was just after udev started, but others
> are after other processes getting going):
> 
> Unable to handle kernel paging request for data at address 0xc0000003f9d550
> Faulting instruction address: 0xc0000000001b7f40
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=32 NUMA pSeries
> Modules linked in: ehea
> NIP: c0000000001b7f40 LR: c0000000001b7f14 CTR: c0000000000e04f0
> REGS: c0000003f68bf6b0 TRAP: 0300   Not tainted  (3.4.0-rc2-autokern1)
> MSR: 800000000280b032 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI>  CR: 24422424  XER: 20000001
> SOFTE: 1
> CFAR: 000000000000562c
> DAR: 00c0000003f9d550, DSISR: 40000000
> TASK = c0000003f8818000[3192] 'kdump' THREAD: c0000003f68bc000 CPU: 5
> GPR00: 0000000000000000 c0000003f68bf930 c000000000ce1d40 c0000003fe00ec00 
> GPR04: 00000000000002d0 0000000000000038 c0000003f8f935e8 c000000000e55280 
> GPR08: 0000000000000011 c000000000bcb280 c000000000bcb1e8 000000000028a000 
> GPR12: 0000000024422424 c00000000f33bc80 00000fffdd90a770 0000000000081000 
> GPR16: c0000003f846c000 000000000de4f7a0 f00000000de4f7a0 0000000000000000 
> GPR20: c0000003f8365408 c0000003f8365480 c0000003f8e5d110 0000000000000000 
> GPR24: 0000000000000100 c0000003f8365400 c0000000001e5424 00000000000002d0 
> GPR28: 0000000000000800 00c0000003f9d550 c000000000c5b718 c0000003fe00ec00 
> NIP [c0000000001b7f40] .__kmalloc+0x70/0x230
> LR [c0000000001b7f14] .__kmalloc+0x44/0x230
> Call Trace:
> [c0000003f68bf930] [c0000003f68bf9b0] 0xc0000003f68bf9b0 (unreliable)
> [c0000003f68bf9e0] [c0000000001e5424] .alloc_fdmem+0x24/0x70
> [c0000003f68bfa60] [c0000000001e54f8] .alloc_fdtable+0x88/0x130
> [c0000003f68bfaf0] [c0000000001e5924] .dup_fd+0x384/0x450
> [c0000003f68bfbd0] [c00000000009a310] .copy_process+0x880/0x11d0
> [c0000003f68bfcd0] [c00000000009aee0] .do_fork+0x70/0x400
> [c0000003f68bfdc0] [c0000000000141c4] .sys_clone+0x54/0x70
> [c0000003f68bfe30] [c000000000009aa0] .ppc_clone+0x8/0xc
> Instruction dump:
> 4bff9281 2ba30010 7c7f1b78 40dd00f4 e96d0040 e93f0000 7ce95a14 e9070008 
> 7fa9582a 2fbd0000 41de0054 e81f0022 <7f3d002a> 38000000 886d01f2 980d01f2 
> ---[ end trace 366fe6c7ced3bfb0 ]---
> This did not happen yesterday.  Just wondering if anyone can think of
> anything obvious.  Full console log at
> http://ozlabs.org/~sfr/next-20120411.log.bz2

Hi Stephen.

The DAR in the oops shows that the faulting address appears to be
shifted right by 8 bits.  More likely, the address used to load the
register was decremented by one somewhere (big endian).
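
The second reading can be checked with a few lines of C.  This is only
a sketch under stated assumptions: load_be64() and
load_one_byte_early() are made-up helper names, the byte sitting in
front of the stored pointer is assumed to be zero, and the low byte of
the intended pointer is not known from the oops (00 is used here).

```c
#include <assert.h>
#include <stdint.h>

/* Assemble 8 bytes starting at p as a big-endian 64-bit value, which is
 * how a 64-bit load interprets memory on big-endian PowerPC. */
static uint64_t load_be64(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Hypothetical layout: one zero byte, then the pointer stored big-endian.
 * Loading from the start of the buffer models a load from (address - 1). */
static uint64_t load_one_byte_early(uint64_t stored)
{
	unsigned char mem[9];

	mem[0] = 0x00;	/* assumed content of the preceding byte */
	for (int i = 0; i < 8; i++)
		mem[1 + i] = (unsigned char)(stored >> (56 - 8 * i));
	return load_be64(mem);
}

/* Self-check against the DAR from the first oops. */
static void check_dar(void)
{
	assert(load_one_byte_early(0xc0000003f9d55000ULL) ==
	       0x00c0000003f9d550ULL);
}
```

With those assumptions, a load that starts one byte too early yields
the stored pointer shifted right by 8 bits, which matches the DAR (and
GPR29) value 00c0000003f9d550.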

Although all the registers are multiples of 4 in the first dump,
the later oopses in the log seem to confirm that the address is
being decremented, e.g. a put_files_struct DAR of
c0000003f9d547ff in oops #2, and a DAR of 00000000ffffffff in #9,
#12, #14, and #16.

No idea if this is caused by a bad save/restore somewhere or a
decrement of a 32-bit number in memory.
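
To illustrate the 32-bit decrement case, here is a small C sketch.
dec_low32() is a made-up helper, and the "original" pointer used for
oops #2 is a guess, since the undamaged value does not appear in the
log excerpts quoted here.

```c
#include <assert.h>
#include <stdint.h>

/* Decrement only the low 32 bits of a 64-bit value and leave the high
 * half alone, as a stray 32-bit "-1" on the low word would. */
static uint64_t dec_low32(uint64_t v)
{
	uint32_t low = (uint32_t)v - 1u;	/* wraps modulo 2^32 */

	return (v & 0xffffffff00000000ULL) | low;
}

static void check_later_dars(void)
{
	/* NULL hit by a 32-bit decrement: the DAR in #9, #12, #14, #16. */
	assert(dec_low32(0) == 0x00000000ffffffffULL);
	/* A full 64-bit decrement of NULL would look different. */
	assert((uint64_t)0 - 1 == 0xffffffffffffffffULL);
	/* Hypothetical original pointer for oops #2. */
	assert(dec_low32(0xc0000003f9d54800ULL) == 0xc0000003f9d547ffULL);
}
```

Only the NULL case tells the two widths apart: for the oops #2 value, a
64-bit decrement would give the same result.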

Anyone else with a wild -1 on an int, u32 or s32?

milton


* Re: linux-next: boot failures with next-20120411
  2012-04-11  6:58 linux-next: boot failures with next-20120411 Stephen Rothwell
@ 2012-04-13  2:30   ` Michael Neuling
  2012-04-13  2:30   ` Michael Neuling
  1 sibling, 0 replies; 14+ messages in thread
From: Michael Neuling @ 2012-04-13  2:30 UTC (permalink / raw)
  To: Stephen Rothwell, Jiri Slaby, Greg Kroah-Hartman
  Cc: LKML, ppc-dev, linux-next

Stephen Rothwell <sfr@canb.auug.org.au> wrote:

> Hi all,
> 
> Some (not all) of my PowerPC boot tests have failed like this after
> getting into user mode (this one was just after udev started, but others
> are after other processes getting going):
> 
> Unable to handle kernel paging request for data at address 0xc0000003f9d550
> Faulting instruction address: 0xc0000000001b7f40
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=32 NUMA pSeries
> Modules linked in: ehea
> NIP: c0000000001b7f40 LR: c0000000001b7f14 CTR: c0000000000e04f0
> REGS: c0000003f68bf6b0 TRAP: 0300   Not tainted  (3.4.0-rc2-autokern1)
> MSR: 800000000280b032 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI>  CR: 24422424  XER: 20000001
> SOFTE: 1
> CFAR: 000000000000562c
> DAR: 00c0000003f9d550, DSISR: 40000000
> TASK = c0000003f8818000[3192] 'kdump' THREAD: c0000003f68bc000 CPU: 5
> GPR00: 0000000000000000 c0000003f68bf930 c000000000ce1d40 c0000003fe00ec00 
> GPR04: 00000000000002d0 0000000000000038 c0000003f8f935e8 c000000000e55280 
> GPR08: 0000000000000011 c000000000bcb280 c000000000bcb1e8 000000000028a000 
> GPR12: 0000000024422424 c00000000f33bc80 00000fffdd90a770 0000000000081000 
> GPR16: c0000003f846c000 000000000de4f7a0 f00000000de4f7a0 0000000000000000 
> GPR20: c0000003f8365408 c0000003f8365480 c0000003f8e5d110 0000000000000000 
> GPR24: 0000000000000100 c0000003f8365400 c0000000001e5424 00000000000002d0 
> GPR28: 0000000000000800 00c0000003f9d550 c000000000c5b718 c0000003fe00ec00 
> NIP [c0000000001b7f40] .__kmalloc+0x70/0x230
> LR [c0000000001b7f14] .__kmalloc+0x44/0x230
> Call Trace:
> [c0000003f68bf930] [c0000003f68bf9b0] 0xc0000003f68bf9b0 (unreliable)
> [c0000003f68bf9e0] [c0000000001e5424] .alloc_fdmem+0x24/0x70
> [c0000003f68bfa60] [c0000000001e54f8] .alloc_fdtable+0x88/0x130
> [c0000003f68bfaf0] [c0000000001e5924] .dup_fd+0x384/0x450
> [c0000003f68bfbd0] [c00000000009a310] .copy_process+0x880/0x11d0
> [c0000003f68bfcd0] [c00000000009aee0] .do_fork+0x70/0x400
> [c0000003f68bfdc0] [c0000000000141c4] .sys_clone+0x54/0x70
> [c0000003f68bfe30] [c000000000009aa0] .ppc_clone+0x8/0xc
> Instruction dump:
> 4bff9281 2ba30010 7c7f1b78 40dd00f4 e96d0040 e93f0000 7ce95a14 e9070008 
> 7fa9582a 2fbd0000 41de0054 e81f0022 <7f3d002a> 38000000 886d01f2 980d01f2 
> ---[ end trace 366fe6c7ced3bfb0 ]---
> 
> This did not happen yesterday.  Just wondering if anyone can think of
> anything obvious.  Full console log at
> http://ozlabs.org/~sfr/next-20120411.log.bz2

I managed to bisect this down using pseries_defconfig with next-20120412
to this patch:

  commit 85bbc003b24335e253a392f6a9874103b77abb36
  Author: Jiri Slaby <jslaby@suse.cz>
  Date:   Mon Apr 2 13:54:22 2012 +0200

      TTY: HVC, use tty from tty_port

      The driver already used refcounting. So we just switch it to tty_port
      helpers. And switch to tty_port->lock for tty.

      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Reverting this commit (and 0146b6939074ebe14ece3604fd00e7be128a3812
otherwise git barfs) fixes the problem on next-20120412.  

I'm assuming we got the ref count changes wrong somewhere in the patch
but the tty code is beyond me.  Jiri, can you take a look?

Mikey


* Re: linux-next: boot failures with next-20120411
  2012-04-13  2:30   ` Michael Neuling
@ 2012-04-13  2:57     ` Stephen Rothwell
  -1 siblings, 0 replies; 14+ messages in thread
From: Stephen Rothwell @ 2012-04-13  2:57 UTC (permalink / raw)
  To: Michael Neuling; +Cc: Jiri Slaby, Greg Kroah-Hartman, LKML, ppc-dev, linux-next

[-- Attachment #1: Type: text/plain, Size: 661 bytes --]

Hi all,

On Fri, 13 Apr 2012 12:30:11 +1000 Michael Neuling <mikey@neuling.org> wrote:
>
> I managed to bisect this down using pseries_defconfig with next-20120412
> to this patch:
> 
>   commit 85bbc003b24335e253a392f6a9874103b77abb36
>   Author: Jiri Slaby <jslaby@suse.cz>
>   Date:   Mon Apr 2 13:54:22 2012 +0200
> 
>       TTY: HVC, use tty from tty_port

Thanks for that, Mikey.

> Reverting this commit (and 0146b6939074ebe14ece3604fd00e7be128a3812
> otherwise git barfs) fixes the problem on next-20120412.

I will revert those commits from linux-next today.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au



* Re: linux-next: boot failures with next-20120411
  2012-04-13  2:30   ` Michael Neuling
@ 2012-04-13  8:02     ` Jiri Slaby
  -1 siblings, 0 replies; 14+ messages in thread
From: Jiri Slaby @ 2012-04-13  8:02 UTC (permalink / raw)
  To: Michael Neuling
  Cc: Stephen Rothwell, Greg Kroah-Hartman, LKML, ppc-dev, linux-next,
	Jiri Slaby

[-- Attachment #1: Type: text/plain, Size: 3609 bytes --]

On 04/13/2012 04:30 AM, Michael Neuling wrote:
> Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> 
>> Hi all,
>>
>> Some (not all) of my PowerPC boot tests have failed like this after
>> getting into user mode (this one was just after udev started, but others
>> are after other processes getting going):
>>
>> Unable to handle kernel paging request for data at address 0xc0000003f9d550
>> Faulting instruction address: 0xc0000000001b7f40
>> Oops: Kernel access of bad area, sig: 11 [#1]
>> SMP NR_CPUS=32 NUMA pSeries
>> Modules linked in: ehea
>> NIP: c0000000001b7f40 LR: c0000000001b7f14 CTR: c0000000000e04f0
>> REGS: c0000003f68bf6b0 TRAP: 0300   Not tainted  (3.4.0-rc2-autokern1)
>> MSR: 800000000280b032 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI>  CR: 24422424  XER: 20000001
>> SOFTE: 1
>> CFAR: 000000000000562c
>> DAR: 00c0000003f9d550, DSISR: 40000000
>> TASK = c0000003f8818000[3192] 'kdump' THREAD: c0000003f68bc000 CPU: 5
>> GPR00: 0000000000000000 c0000003f68bf930 c000000000ce1d40 c0000003fe00ec00 
>> GPR04: 00000000000002d0 0000000000000038 c0000003f8f935e8 c000000000e55280 
>> GPR08: 0000000000000011 c000000000bcb280 c000000000bcb1e8 000000000028a000 
>> GPR12: 0000000024422424 c00000000f33bc80 00000fffdd90a770 0000000000081000 
>> GPR16: c0000003f846c000 000000000de4f7a0 f00000000de4f7a0 0000000000000000 
>> GPR20: c0000003f8365408 c0000003f8365480 c0000003f8e5d110 0000000000000000 
>> GPR24: 0000000000000100 c0000003f8365400 c0000000001e5424 00000000000002d0 
>> GPR28: 0000000000000800 00c0000003f9d550 c000000000c5b718 c0000003fe00ec00 
>> NIP [c0000000001b7f40] .__kmalloc+0x70/0x230
>> LR [c0000000001b7f14] .__kmalloc+0x44/0x230
>> Call Trace:
>> [c0000003f68bf930] [c0000003f68bf9b0] 0xc0000003f68bf9b0 (unreliable)
>> [c0000003f68bf9e0] [c0000000001e5424] .alloc_fdmem+0x24/0x70
>> [c0000003f68bfa60] [c0000000001e54f8] .alloc_fdtable+0x88/0x130
>> [c0000003f68bfaf0] [c0000000001e5924] .dup_fd+0x384/0x450
>> [c0000003f68bfbd0] [c00000000009a310] .copy_process+0x880/0x11d0
>> [c0000003f68bfcd0] [c00000000009aee0] .do_fork+0x70/0x400
>> [c0000003f68bfdc0] [c0000000000141c4] .sys_clone+0x54/0x70
>> [c0000003f68bfe30] [c000000000009aa0] .ppc_clone+0x8/0xc
>> Instruction dump:
>> 4bff9281 2ba30010 7c7f1b78 40dd00f4 e96d0040 e93f0000 7ce95a14 e9070008 
>> 7fa9582a 2fbd0000 41de0054 e81f0022 <7f3d002a> 38000000 886d01f2 980d01f2 
>> ---[ end trace 366fe6c7ced3bfb0 ]---
>>
>> This did not happen yesterday.  Just wondering if anyone can think of
>> anything obvious.  Full console log at
>> http://ozlabs.org/~sfr/next-20120411.log.bz2
> 
> I managed to bisect this down using pseries_defconfig with next-20120412
> to this patch:
> 
>   commit 85bbc003b24335e253a392f6a9874103b77abb36
>   Author: Jiri Slaby <jslaby@suse.cz>
>   Date:   Mon Apr 2 13:54:22 2012 +0200
> 
>       TTY: HVC, use tty from tty_port
> 
>       The driver already used refcounting. So we just switch it to tty_port
>       helpers. And switch to tty_port->lock for tty.
> 
>       Signed-off-by: Jiri Slaby <jslaby@suse.cz>
>       Cc: linuxppc-dev@lists.ozlabs.org
>       Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> Reverting this commit (and 0146b6939074ebe14ece3604fd00e7be128a3812
> otherwise git barfs) fixes the problem on next-20120412.  
> 
> I'm assuming we got the ref count changes wrong somewhere in the patch
> but the tty code is beyond me.  Jiri, can you take a look?

Yeah, I see. I forgot to remove a couple of tty reference drops. The
reference is dropped by tty_port_tty_set in open/close/hangup now. Does
the attached patch help?
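
For anyone following along, the double drop can be modelled in plain
userspace C.  This is a toy sketch, not the driver code: the toy_*
names are invented, toy_port_set() only mimics the "drop the old
reference, take one on the new tty" behaviour described above, and
"freed" is a flag rather than a real free.

```c
#include <assert.h>
#include <stddef.h>

struct toy_tty  { int refs; int freed; };
struct toy_port { struct toy_tty *tty; };

static struct toy_tty *toy_get(struct toy_tty *t)
{
	if (t)
		t->refs++;
	return t;
}

static void toy_put(struct toy_tty *t)
{
	if (t && --t->refs == 0)
		t->freed = 1;	/* stand-in for actually freeing the object */
}

/* Drop the reference on the old tty, take one on the new tty. */
static void toy_port_set(struct toy_port *p, struct toy_tty *t)
{
	toy_put(p->tty);
	p->tty = toy_get(t);
}

/* One open followed by one close.  Returns 1 if the tty was freed while
 * the caller still held its own reference. */
static int freed_too_early(int extra_put_in_close)
{
	struct toy_tty tty = { .refs = 1, .freed = 0 };	/* caller's ref */
	struct toy_port port = { .tty = NULL };

	toy_port_set(&port, &tty);	/* open:  refs 1 -> 2 */
	toy_port_set(&port, NULL);	/* close: refs 2 -> 1 */
	if (extra_put_in_close)
		toy_put(&tty);		/* leftover put: refs 1 -> 0 */
	return tty.freed;
}

static void check_double_put(void)
{
	assert(freed_too_early(1) == 1);	/* leftover put frees too early */
	assert(freed_too_early(0) == 0);	/* with the put removed */
}
```

In the model, the leftover put releases the object while someone else
still uses it, so any later decrement lands in memory that may have
been reused.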

thanks,
-- 
js
suse labs



[-- Attachment #2: 0001-HVC-fix-refcounting.patch --]
[-- Type: text/x-patch, Size: 1157 bytes --]

From cc51efe721f5aa184e119c52c661a1faf865e492 Mon Sep 17 00:00:00 2001
From: Jiri Slaby <jslaby@suse.cz>
Date: Fri, 13 Apr 2012 10:00:28 +0200
Subject: [PATCH 1/1] HVC: fix refcounting

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 drivers/tty/hvc/hvc_console.c |    3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c
index 6c45cbf..260d4f2 100644
--- a/drivers/tty/hvc/hvc_console.c
+++ b/drivers/tty/hvc/hvc_console.c
@@ -338,7 +338,6 @@ static int hvc_open(struct tty_struct *tty, struct file * filp)
 	 */
 	if (rc) {
 		tty_port_tty_set(&hp->port, NULL);
-		tty_kref_put(tty);
 		tty->driver_data = NULL;
 		tty_port_put(&hp->port);
 		printk(KERN_ERR "hvc_open: request_irq failed with rc %d.\n", rc);
@@ -393,7 +392,6 @@ static void hvc_close(struct tty_struct *tty, struct file * filp)
 		spin_unlock_irqrestore(&hp->port.lock, flags);
 	}
 
-	tty_kref_put(tty);
 	tty_port_put(&hp->port);
 }
 
@@ -433,7 +431,6 @@ static void hvc_hangup(struct tty_struct *tty)
 
 	while(temp_open_count) {
 		--temp_open_count;
-		tty_kref_put(tty);
 		tty_port_put(&hp->port);
 	}
 }
-- 
1.7.9.2



* Re: linux-next: boot failures with next-20120411
  2012-04-13  8:02     ` Jiri Slaby
@ 2012-04-13  8:04       ` Jiri Slaby
  -1 siblings, 0 replies; 14+ messages in thread
From: Jiri Slaby @ 2012-04-13  8:04 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Michael Neuling, Stephen Rothwell, Greg Kroah-Hartman, LKML,
	ppc-dev, linux-next

[-- Attachment #1: Type: text/plain, Size: 3795 bytes --]

On 04/13/2012 10:02 AM, Jiri Slaby wrote:
> On 04/13/2012 04:30 AM, Michael Neuling wrote:
>> Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>>
>>> Hi all,
>>>
>>> Some (not all) of my PowerPC boot tests have failed like this after
>>> getting into user mode (this one was just after udev started, but others
>>> are after other processes getting going):
>>>
>>> Unable to handle kernel paging request for data at address 0xc0000003f9d550
>>> Faulting instruction address: 0xc0000000001b7f40
>>> Oops: Kernel access of bad area, sig: 11 [#1]
>>> SMP NR_CPUS=32 NUMA pSeries
>>> Modules linked in: ehea
>>> NIP: c0000000001b7f40 LR: c0000000001b7f14 CTR: c0000000000e04f0
>>> REGS: c0000003f68bf6b0 TRAP: 0300   Not tainted  (3.4.0-rc2-autokern1)
>>> MSR: 800000000280b032 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI>  CR: 24422424  XER: 20000001
>>> SOFTE: 1
>>> CFAR: 000000000000562c
>>> DAR: 00c0000003f9d550, DSISR: 40000000
>>> TASK = c0000003f8818000[3192] 'kdump' THREAD: c0000003f68bc000 CPU: 5
>>> GPR00: 0000000000000000 c0000003f68bf930 c000000000ce1d40 c0000003fe00ec00 
>>> GPR04: 00000000000002d0 0000000000000038 c0000003f8f935e8 c000000000e55280 
>>> GPR08: 0000000000000011 c000000000bcb280 c000000000bcb1e8 000000000028a000 
>>> GPR12: 0000000024422424 c00000000f33bc80 00000fffdd90a770 0000000000081000 
>>> GPR16: c0000003f846c000 000000000de4f7a0 f00000000de4f7a0 0000000000000000 
>>> GPR20: c0000003f8365408 c0000003f8365480 c0000003f8e5d110 0000000000000000 
>>> GPR24: 0000000000000100 c0000003f8365400 c0000000001e5424 00000000000002d0 
>>> GPR28: 0000000000000800 00c0000003f9d550 c000000000c5b718 c0000003fe00ec00 
>>> NIP [c0000000001b7f40] .__kmalloc+0x70/0x230
>>> LR [c0000000001b7f14] .__kmalloc+0x44/0x230
>>> Call Trace:
>>> [c0000003f68bf930] [c0000003f68bf9b0] 0xc0000003f68bf9b0 (unreliable)
>>> [c0000003f68bf9e0] [c0000000001e5424] .alloc_fdmem+0x24/0x70
>>> [c0000003f68bfa60] [c0000000001e54f8] .alloc_fdtable+0x88/0x130
>>> [c0000003f68bfaf0] [c0000000001e5924] .dup_fd+0x384/0x450
>>> [c0000003f68bfbd0] [c00000000009a310] .copy_process+0x880/0x11d0
>>> [c0000003f68bfcd0] [c00000000009aee0] .do_fork+0x70/0x400
>>> [c0000003f68bfdc0] [c0000000000141c4] .sys_clone+0x54/0x70
>>> [c0000003f68bfe30] [c000000000009aa0] .ppc_clone+0x8/0xc
>>> Instruction dump:
>>> 4bff9281 2ba30010 7c7f1b78 40dd00f4 e96d0040 e93f0000 7ce95a14 e9070008 
>>> 7fa9582a 2fbd0000 41de0054 e81f0022 <7f3d002a> 38000000 886d01f2 980d01f2 
>>> ---[ end trace 366fe6c7ced3bfb0 ]---
>>>
>>> This did not happen yesterday.  Just wondering if anyone can think of
>>> anything obvious.  Full console log at
>>> http://ozlabs.org/~sfr/next-20120411.log.bz2
>>
>> I managed to bisect this down using pseries_defconfig with next-20120412
>> to this patch:
>>
>>   commit 85bbc003b24335e253a392f6a9874103b77abb36
>>   Author: Jiri Slaby <jslaby@suse.cz>
>>   Date:   Mon Apr 2 13:54:22 2012 +0200
>>
>>       TTY: HVC, use tty from tty_port
>>
>>       The driver already used refcounting. So we just switch it to tty_port
>>       helpers. And switch to tty_port->lock for tty.
>>
>>       Signed-off-by: Jiri Slaby <jslaby@suse.cz>
>>       Cc: linuxppc-dev@lists.ozlabs.org
>>       Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>
>> Reverting this commit (and 0146b6939074ebe14ece3604fd00e7be128a3812
>> otherwise git barfs) fixes the problem on next-20120412.  
>>
>> I'm assuming we got the ref count changes wrong somewhere in the patch
>> but the tty code is beyond me.  Jiri, can you take a look?
> 
> Yeah, I see. I forgot to remove a couple of tty reference drops. The
> reference is dropped by tty_port_tty_set in open/close/hangup now. Does
> the attached patch help?

And the patch is incomplete. Now we have a leak. This one should work.
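
A toy model of the leak, for illustration only.  The toy_* names are
invented, and the model assumes the port's reference is dropped only on
the last close; the extra get in the fast-path open is the one this
version of the patch removes.

```c
#include <assert.h>
#include <stddef.h>

struct toy_tty  { int refs; };
struct toy_port { struct toy_tty *tty; int count; };

static void toy_open(struct toy_port *p, struct toy_tty *t, int fastpath_get)
{
	if (p->count++ > 0) {
		if (fastpath_get)
			t->refs++;	/* extra get with no matching put */
		return;
	}
	p->tty = t;			/* first open: port takes its reference */
	t->refs++;
}

static void toy_close(struct toy_port *p)
{
	if (--p->count == 0) {		/* assumed: dropped on last close only */
		p->tty->refs--;
		p->tty = NULL;
	}
}

/* References left over after `opens` opens and as many closes. */
static int leaked_refs(int opens, int fastpath_get)
{
	struct toy_tty tty = { .refs = 0 };
	struct toy_port port = { .tty = NULL, .count = 0 };

	for (int i = 0; i < opens; i++)
		toy_open(&port, &tty, fastpath_get);
	for (int i = 0; i < opens; i++)
		toy_close(&port);
	return tty.refs;
}

static void check_leak(void)
{
	assert(leaked_refs(3, 1) == 2);	/* one leaked ref per fast-path open */
	assert(leaked_refs(3, 0) == 0);	/* balanced once the get is removed */
	assert(leaked_refs(1, 1) == 0);	/* a single open never hits the fast path */
}
```

In this model every open after the first leaves one reference behind
once the puts are gone, which is why the fast-path get has to go too.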

> thanks,
-- 
js
suse labs


[-- Attachment #2: 0001-HVC-fix-refcounting.patch --]
[-- Type: text/x-patch, Size: 1496 bytes --]

From 7a55e2976cb5a47e499a6db335ad30ecac2e621c Mon Sep 17 00:00:00 2001
From: Jiri Slaby <jslaby@suse.cz>
Date: Fri, 13 Apr 2012 10:00:28 +0200
Subject: [PATCH 1/1] HVC: fix refcounting

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 drivers/tty/hvc/hvc_console.c |    5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c
index 6c45cbf..2d691eb 100644
--- a/drivers/tty/hvc/hvc_console.c
+++ b/drivers/tty/hvc/hvc_console.c
@@ -317,8 +317,6 @@ static int hvc_open(struct tty_struct *tty, struct file * filp)
 	/* Check and then increment for fast path open. */
 	if (hp->port.count++ > 0) {
 		spin_unlock_irqrestore(&hp->port.lock, flags);
-		/* FIXME why taking a reference here? */
-		tty_kref_get(tty);
 		hvc_kick();
 		return 0;
 	} /* else count == 0 */
@@ -338,7 +336,6 @@ static int hvc_open(struct tty_struct *tty, struct file * filp)
 	 */
 	if (rc) {
 		tty_port_tty_set(&hp->port, NULL);
-		tty_kref_put(tty);
 		tty->driver_data = NULL;
 		tty_port_put(&hp->port);
 		printk(KERN_ERR "hvc_open: request_irq failed with rc %d.\n", rc);
@@ -393,7 +390,6 @@ static void hvc_close(struct tty_struct *tty, struct file * filp)
 		spin_unlock_irqrestore(&hp->port.lock, flags);
 	}
 
-	tty_kref_put(tty);
 	tty_port_put(&hp->port);
 }
 
@@ -433,7 +429,6 @@ static void hvc_hangup(struct tty_struct *tty)
 
 	while(temp_open_count) {
 		--temp_open_count;
-		tty_kref_put(tty);
 		tty_port_put(&hp->port);
 	}
 }
-- 
1.7.9.2



* Re: linux-next: boot failures with next-20120411
  2012-04-13  8:04       ` Jiri Slaby
@ 2012-04-13  8:09         ` Michael Neuling
  -1 siblings, 0 replies; 14+ messages in thread
From: Michael Neuling @ 2012-04-13  8:09 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Jiri Slaby, Stephen Rothwell, Greg Kroah-Hartman, LKML, ppc-dev,
	linux-next

Jiri Slaby <jslaby@suse.cz> wrote:

> On 04/13/2012 10:02 AM, Jiri Slaby wrote:
> > On 04/13/2012 04:30 AM, Michael Neuling wrote:
> >> Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> >>
> >>> Hi all,
> >>>
> >>> Some (not all) of my PowerPC boot tests have failed like this after
> >>> getting into user mode (this one was just after udev started, but others
> >>> are after other processes getting going):
> >>>
> >>> Unable to handle kernel paging request for data at address 0xc0000003f9d550
> >>> Faulting instruction address: 0xc0000000001b7f40
> >>> Oops: Kernel access of bad area, sig: 11 [#1]
> >>> SMP NR_CPUS=32 NUMA pSeries
> >>> Modules linked in: ehea
> >>> NIP: c0000000001b7f40 LR: c0000000001b7f14 CTR: c0000000000e04f0
> >>> REGS: c0000003f68bf6b0 TRAP: 0300   Not tainted  (3.4.0-rc2-autokern1)
> >>> MSR: 800000000280b032 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI>  CR: 24422424  XER: 20000001
> >>> SOFTE: 1
> >>> CFAR: 000000000000562c
> >>> DAR: 00c0000003f9d550, DSISR: 40000000
> >>> TASK = c0000003f8818000[3192] 'kdump' THREAD: c0000003f68bc000 CPU: 5
> >>> GPR00: 0000000000000000 c0000003f68bf930 c000000000ce1d40 c0000003fe00ec00 
> >>> GPR04: 00000000000002d0 0000000000000038 c0000003f8f935e8 c000000000e55280 
> >>> GPR08: 0000000000000011 c000000000bcb280 c000000000bcb1e8 000000000028a000 
> >>> GPR12: 0000000024422424 c00000000f33bc80 00000fffdd90a770 0000000000081000 
> >>> GPR16: c0000003f846c000 000000000de4f7a0 f00000000de4f7a0 0000000000000000 
> >>> GPR20: c0000003f8365408 c0000003f8365480 c0000003f8e5d110 0000000000000000 
> >>> GPR24: 0000000000000100 c0000003f8365400 c0000000001e5424 00000000000002d0 
> >>> GPR28: 0000000000000800 00c0000003f9d550 c000000000c5b718 c0000003fe00ec00 
> >>> NIP [c0000000001b7f40] .__kmalloc+0x70/0x230
> >>> LR [c0000000001b7f14] .__kmalloc+0x44/0x230
> >>> Call Trace:
> >>> [c0000003f68bf930] [c0000003f68bf9b0] 0xc0000003f68bf9b0 (unreliable)
> >>> [c0000003f68bf9e0] [c0000000001e5424] .alloc_fdmem+0x24/0x70
> >>> [c0000003f68bfa60] [c0000000001e54f8] .alloc_fdtable+0x88/0x130
> >>> [c0000003f68bfaf0] [c0000000001e5924] .dup_fd+0x384/0x450
> >>> [c0000003f68bfbd0] [c00000000009a310] .copy_process+0x880/0x11d0
> >>> [c0000003f68bfcd0] [c00000000009aee0] .do_fork+0x70/0x400
> >>> [c0000003f68bfdc0] [c0000000000141c4] .sys_clone+0x54/0x70
> >>> [c0000003f68bfe30] [c000000000009aa0] .ppc_clone+0x8/0xc
> >>> Instruction dump:
> >>> 4bff9281 2ba30010 7c7f1b78 40dd00f4 e96d0040 e93f0000 7ce95a14 e9070008 
> >>> 7fa9582a 2fbd0000 41de0054 e81f0022 <7f3d002a> 38000000 886d01f2 980d01f2 
> >>> ---[ end trace 366fe6c7ced3bfb0 ]---
> >>>
> >>> This did not happen yesterday.  Just wondering if anyone can think of
> >>> anything obvious.  Full console log at
> >>> http://ozlabs.org/~sfr/next-20120411.log.bz2
> >>
> >> I managed to bisect this down using pseries_defconfig with next-20120412
> >> to this patch:
> >>
> >>   commit 85bbc003b24335e253a392f6a9874103b77abb36
> >>   Author: Jiri Slaby <jslaby@suse.cz>
> >>   Date:   Mon Apr 2 13:54:22 2012 +0200
> >>
> >>       TTY: HVC, use tty from tty_port
> >>
> >>       The driver already used refcounting. So we just switch it to tty_port
> >>       helpers. And switch to tty_port->lock for tty.
> >>
> >>       Signed-off-by: Jiri Slaby <jslaby@suse.cz>
> >>       Cc: linuxppc-dev@lists.ozlabs.org
> >>       Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> >>
> >> Reverting this commit (and 0146b6939074ebe14ece3604fd00e7be128a3812
> >> otherwise git barfs) fixes the problem on next-20120412.  
> >>
> >> I'm assuming we got the ref count changes wrong somewhere in the patch
> >> but the tty code is beyond me.  Jiri, can you take a look?
> > 
> > Yeah, I see. I forgot to remove a couple of tty reference drops. The
> > reference is dropped by tty_port_tty_set in open/close/hangup now. Does
> > the attached patch help?
> 
> And the patch is incomplete. Now we have a leak. This one should work.

Fixes the problem here. Thanks.

Tested-by: Michael Neuling <mikey@neuling.org>


* [PATCH 1/1] TTY: hvc, fix TTY refcounting
  2012-04-13  8:09         ` Michael Neuling
@ 2012-04-13  8:31           ` Jiri Slaby
  -1 siblings, 0 replies; 14+ messages in thread
From: Jiri Slaby @ 2012-04-13  8:31 UTC (permalink / raw)
  To: gregkh; +Cc: linux-kernel, jirislaby, Stephen Rothwell, ppc-dev

A -next commit "TTY: HVC, use tty from tty_port" switched the driver
to use the tty_port helpers for tty refcounting. But it did not remove
the manual tty refcounting from open, close and hangup. So now we are
getting random crashes caused by use-after-free:
Unable to handle kernel paging request for data at address 0xc0000003f9d550
Faulting instruction address: 0xc0000000001b7f40
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP: c0000000001b7f40 LR: c0000000001b7f14 CTR: c0000000000e04f0
...
NIP [c0000000001b7f40] .__kmalloc+0x70/0x230
LR [c0000000001b7f14] .__kmalloc+0x44/0x230
Call Trace:
[c0000003f68bf930] [c0000003f68bf9b0] 0xc0000003f68bf9b0 (unreliable)
[c0000003f68bf9e0] [c0000000001e5424] .alloc_fdmem+0x24/0x70
[c0000003f68bfa60] [c0000000001e54f8] .alloc_fdtable+0x88/0x130
[c0000003f68bfaf0] [c0000000001e5924] .dup_fd+0x384/0x450
[c0000003f68bfbd0] [c00000000009a310] .copy_process+0x880/0x11d0
[c0000003f68bfcd0] [c00000000009aee0] .do_fork+0x70/0x400
[c0000003f68bfdc0] [c0000000000141c4] .sys_clone+0x54/0x70
[c0000003f68bfe30] [c000000000009aa0] .ppc_clone+0x8/0xc

Fix that by completely removing the tty_kref_get/put calls from the
open/close/hangup paths.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reported-and-tested-by: Michael Neuling <mikey@neuling.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: ppc-dev <linuxppc-dev@lists.ozlabs.org>
---
 drivers/tty/hvc/hvc_console.c |    5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c
index 6c45cbf..2d691eb 100644
--- a/drivers/tty/hvc/hvc_console.c
+++ b/drivers/tty/hvc/hvc_console.c
@@ -317,8 +317,6 @@ static int hvc_open(struct tty_struct *tty, struct file * filp)
 	/* Check and then increment for fast path open. */
 	if (hp->port.count++ > 0) {
 		spin_unlock_irqrestore(&hp->port.lock, flags);
-		/* FIXME why taking a reference here? */
-		tty_kref_get(tty);
 		hvc_kick();
 		return 0;
 	} /* else count == 0 */
@@ -338,7 +336,6 @@ static int hvc_open(struct tty_struct *tty, struct file * filp)
 	 */
 	if (rc) {
 		tty_port_tty_set(&hp->port, NULL);
-		tty_kref_put(tty);
 		tty->driver_data = NULL;
 		tty_port_put(&hp->port);
 		printk(KERN_ERR "hvc_open: request_irq failed with rc %d.\n", rc);
@@ -393,7 +390,6 @@ static void hvc_close(struct tty_struct *tty, struct file * filp)
 		spin_unlock_irqrestore(&hp->port.lock, flags);
 	}
 
-	tty_kref_put(tty);
 	tty_port_put(&hp->port);
 }
 
@@ -433,7 +429,6 @@ static void hvc_hangup(struct tty_struct *tty)
 
 	while(temp_open_count) {
 		--temp_open_count;
-		tty_kref_put(tty);
 		tty_port_put(&hp->port);
 	}
 }
-- 
1.7.9.2




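The refcounting imbalance described in the patch above can be illustrated
with a small userspace model. This is only a sketch under stated
assumptions, not the kernel code: struct counted, struct port, port_set()
and sim_close() are hypothetical stand-ins for the tty, the tty_port,
tty_port_tty_set() and hvc_close(). The only behaviour taken from the
thread is that the port helper drops the reference on the old pointer, so
a leftover manual put in the close path drops one reference too many.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted object such as a tty. */
struct counted {
	int refs;	/* outstanding references */
	bool freed;	/* set once the last reference is dropped */
};

static void counted_get(struct counted *c)
{
	assert(!c->freed);	/* taking a reference on a freed object is a bug */
	c->refs++;
}

static void counted_put(struct counted *c)
{
	assert(c->refs > 0);
	if (--c->refs == 0)
		c->freed = true;	/* a real kref would run its release callback here */
}

/* Hypothetical stand-in for a port that caches a counted pointer. */
struct port {
	struct counted *obj;
};

/* Swap the cached pointer: drop the old reference, take a new one. */
static void port_set(struct port *p, struct counted *c)
{
	if (p->obj)
		counted_put(p->obj);
	if (c)
		counted_get(c);
	p->obj = c;
}

/*
 * Model of a close path. With extra_put set it mimics a leftover
 * manual put after the port helper has already dropped the reference.
 */
static void sim_close(struct port *p, struct counted *c, bool extra_put)
{
	port_set(p, NULL);
	if (extra_put)
		counted_put(c);
}
```

In this model, an object that starts with one reference held elsewhere
survives port_set() followed by sim_close(..., false), but reaches zero
references and is marked freed after sim_close(..., true), even though the
other holder still expects it to be valid. That is the shape of the
use-after-free the patch removes.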

end of thread, other threads:[~2012-04-13  8:31 UTC | newest]

Thread overview: 14+ messages
2012-04-11  6:58 linux-next: boot failures with next-20120411 Stephen Rothwell
2012-04-12  2:44 ` Milton Miller
2012-04-13  2:30 ` Michael Neuling
2012-04-13  2:30   ` Michael Neuling
2012-04-13  2:57   ` Stephen Rothwell
2012-04-13  2:57     ` Stephen Rothwell
2012-04-13  8:02   ` Jiri Slaby
2012-04-13  8:02     ` Jiri Slaby
2012-04-13  8:04     ` Jiri Slaby
2012-04-13  8:04       ` Jiri Slaby
2012-04-13  8:09       ` Michael Neuling
2012-04-13  8:09         ` Michael Neuling
2012-04-13  8:31         ` [PATCH 1/1] TTY: hvc, fix TTY refcounting Jiri Slaby
2012-04-13  8:31           ` Jiri Slaby
