Crash (ext3 ) during 2.6.29-rc6 boot

linux-kernel.vger.kernel.org archive mirror

* Crash (ext3 ) during 2.6.29-rc6 boot
@ 2009-02-23  9:46 Sachin P. Sant
  2009-02-23 10:13 ` Andrew Morton
  0 siblings, 1 reply; 24+ messages in thread
From: Sachin P. Sant @ 2009-02-23  9:46 UTC (permalink / raw)
  To: linux-kernel, linux-ext4; +Cc: akpm, Mel Gorman

2.6.29-rc6 bootup on a powerpc box failed with

Unable to handle kernel paging request for data at address 0xc00000003f380000
Faulting instruction address: 0xc000000000039574
cpu 0x1: Vector: 300 (Data Access) at [c00000003baf3020]
    pc: c000000000039574: .memcpy+0x74/0x244
    lr: d00000000244916c: .ext3_xattr_get+0x288/0x2f4 [ext3]
    sp: c00000003baf32a0
   msr: 8000000000009032
   dar: c00000003f380000
 dsisr: 40000000
  current = 0xc00000003e54b010
  paca    = 0xc000000000a53680
    pid   = 1840, comm = readahead
enter ? for help
[link register   ] d00000000244916c .ext3_xattr_get+0x288/0x2f4 [ext3]
[c00000003baf32a0] d000000002449104 .ext3_xattr_get+0x220/0x2f4 [ext3] (unreliable)
[c00000003baf3390] d00000000244a6e8 .ext3_xattr_security_get+0x40/0x5c [ext3]
[c00000003baf3400] c000000000148154 .generic_getxattr+0x74/0x9c
[c00000003baf34a0] c000000000333400 .inode_doinit_with_dentry+0x1c4/0x678
[c00000003baf3560] c00000000032c6b0 .security_d_instantiate+0x50/0x68
[c00000003baf35e0] c00000000013c818 .d_instantiate+0x78/0x9c
[c00000003baf3680] c00000000013ced0 .d_splice_alias+0xf0/0x120
[c00000003baf3720] d00000000243e05c .ext3_lookup+0xec/0x134 [ext3]
[c00000003baf37c0] c000000000131e74 .do_lookup+0x110/0x260
[c00000003baf3880] c000000000134ed0 .__link_path_walk+0xa98/0x1010
[c00000003baf3970] c0000000001354a0 .path_walk+0x58/0xc4
[c00000003baf3a20] c000000000135720 .do_path_lookup+0x138/0x1e4
[c00000003baf3ad0] c00000000013645c .path_lookup_open+0x6c/0xc8
[c00000003baf3b70] c000000000136780 .do_filp_open+0xcc/0x874
[c00000003baf3d10] c0000000001251e0 .do_sys_open+0x80/0x140
[c00000003baf3dc0] c00000000016aaec .compat_sys_open+0x24/0x38
[c00000003baf3e30] c00000000000855c syscall_exit+0x0/0x40
--- Exception: c01 (System Call) at 000000000ff0ef18
SP (ffc6f4b0) is in userspace
1:mon>

The following ext3-related options were enabled in the config:

CONFIG_EXT3_FS=m
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_EXT3_FS_SECURITY=y


Thanks
-Sachin

-- 

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------


* Re: Crash (ext3 ) during 2.6.29-rc6 boot
  2009-02-23  9:46 Crash (ext3 ) during 2.6.29-rc6 boot Sachin P. Sant
@ 2009-02-23 10:13 ` Andrew Morton
  2009-02-23 10:32   ` Paul Mackerras
  2009-02-23 10:48   ` Sachin P. Sant
  0 siblings, 2 replies; 24+ messages in thread
From: Andrew Morton @ 2009-02-23 10:13 UTC (permalink / raw)
  To: Sachin P. Sant
  Cc: linux-kernel, linux-ext4, Mel Gorman, linuxppc-dev, Jan Kara

On Mon, 23 Feb 2009 15:16:05 +0530 "Sachin P. Sant" <sachinp@in.ibm.com> wrote:

> 2.6.29-rc6 bootup on a powerpc box failed with
> 
> Unable to handle kernel paging request for data at address 0xc00000003f380000
> Faulting instruction address: 0xc000000000039574
> cpu 0x1: Vector: 300 (Data Access) at [c00000003baf3020]
>     pc: c000000000039574: .memcpy+0x74/0x244
>     lr: d00000000244916c: .ext3_xattr_get+0x288/0x2f4 [ext3]
>     sp: c00000003baf32a0
>    msr: 8000000000009032
>    dar: c00000003f380000
>  dsisr: 40000000
>   current = 0xc00000003e54b010
>   paca    = 0xc000000000a53680
>     pid   = 1840, comm = readahead
> enter ? for help
> [link register   ] d00000000244916c .ext3_xattr_get+0x288/0x2f4 [ext3]
> [c00000003baf32a0] d000000002449104 .ext3_xattr_get+0x220/0x2f4 [ext3] (unreliable)
> [c00000003baf3390] d00000000244a6e8 .ext3_xattr_security_get+0x40/0x5c [ext3]
> [c00000003baf3400] c000000000148154 .generic_getxattr+0x74/0x9c
> [c00000003baf34a0] c000000000333400 .inode_doinit_with_dentry+0x1c4/0x678
> [c00000003baf3560] c00000000032c6b0 .security_d_instantiate+0x50/0x68
> [c00000003baf35e0] c00000000013c818 .d_instantiate+0x78/0x9c
> [c00000003baf3680] c00000000013ced0 .d_splice_alias+0xf0/0x120
> [c00000003baf3720] d00000000243e05c .ext3_lookup+0xec/0x134 [ext3]
> [c00000003baf37c0] c000000000131e74 .do_lookup+0x110/0x260
> [c00000003baf3880] c000000000134ed0 .__link_path_walk+0xa98/0x1010
> [c00000003baf3970] c0000000001354a0 .path_walk+0x58/0xc4
> [c00000003baf3a20] c000000000135720 .do_path_lookup+0x138/0x1e4
> [c00000003baf3ad0] c00000000013645c .path_lookup_open+0x6c/0xc8
> [c00000003baf3b70] c000000000136780 .do_filp_open+0xcc/0x874
> [c00000003baf3d10] c0000000001251e0 .do_sys_open+0x80/0x140
> [c00000003baf3dc0] c00000000016aaec .compat_sys_open+0x24/0x38
> [c00000003baf3e30] c00000000000855c syscall_exit+0x0/0x40
> --- Exception: c01 (System Call) at 000000000ff0ef18
> SP (ffc6f4b0) is in userspace
> 1:mon>
> 
> The following ext3-related options were enabled in the config:
> 
> CONFIG_EXT3_FS=m
> CONFIG_EXT3_FS_XATTR=y
> CONFIG_EXT3_FS_POSIX_ACL=y
> CONFIG_EXT3_FS_SECURITY=y
> 

hm, I wonder what could have caused that - we haven't altered
fs/ext3/xattr.c in ages.

What is the most recent kernel version you know of which didn't do
this?  Bear in mind that this crash might be triggered by the
current contents of the filesystem, so if possible, please test
some other kernel versions on that disk.

It looks like we died in ext3_xattr_block_get():

		memcpy(buffer, bh->b_data + le16_to_cpu(entry->e_value_offs),
		       size);

Perhaps entry->e_value_offs is no good.  I wonder if the filesystem is
corrupted and this snuck through the defenses.

I also wonder if there is enough info in that trace for a ppc person to
be able to determine whether the faulting address is in the source or
destination of the memcpy() (please)?


* Re: Crash (ext3 ) during 2.6.29-rc6 boot
  2009-02-23 10:13 ` Andrew Morton
@ 2009-02-23 10:32   ` Paul Mackerras
  2009-02-23 10:57     ` Sachin P. Sant
                       ` (2 more replies)
  2009-02-23 10:48   ` Sachin P. Sant
  1 sibling, 3 replies; 24+ messages in thread
From: Paul Mackerras @ 2009-02-23 10:32 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Sachin P. Sant, Mel Gorman, linuxppc-dev, linux-ext4, Jan Kara,
	linux-kernel

Andrew Morton writes:

> It looks like we died in ext3_xattr_block_get():
> 
> 		memcpy(buffer, bh->b_data + le16_to_cpu(entry->e_value_offs),
> 		       size);
> 
> Perhaps entry->e_value_offs is no good.  I wonder if the filesystem is
> corrupted and this snuck through the defenses.
> 
> I also wonder if there is enough info in that trace for a ppc person to
> be able to determine whether the faulting address is in the source or
> destination of the memcpy() (please)?

It appears to have faulted on a load, implicating the source.  The
address being referenced (0xc00000003f380000) doesn't look
outlandish.  I wonder if this kernel has CONFIG_DEBUG_PAGEALLOC turned
on, and what page size is selected?

Paul.

* Re: Crash (ext3 ) during 2.6.29-rc6 boot
  2009-02-23 10:13 ` Andrew Morton
  2009-02-23 10:32   ` Paul Mackerras
@ 2009-02-23 10:48   ` Sachin P. Sant
  2009-02-24 16:14     ` Jan Kara
  1 sibling, 1 reply; 24+ messages in thread
From: Sachin P. Sant @ 2009-02-23 10:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-ext4, Mel Gorman, linuxppc-dev, Jan Kara

Andrew Morton wrote:
> hm, I wonder what could have caused that - we haven't altered
> fs/ext3/xattr.c in ages.
>
> What is the most recent kernel version you know of which didn't do
> this?  Bear in mind that this crash might be triggered by the
> current contents of the filesystem, so if possible, please test
> some other kernel versions on that disk.
>   
I am trying to boot a vanilla kernel on this machine for the first
time. Haven't tried any other kernels. Will give it a try.

> It looks like we died in ext3_xattr_block_get():
>
> 		memcpy(buffer, bh->b_data + le16_to_cpu(entry->e_value_offs),
> 		       size);
>
> Perhaps entry->e_value_offs is no good.  I wonder if the filesystem is
> corrupted and this snuck through the defenses.
>
> I also wonder if there is enough info in that trace for a ppc person to
> be able to determine whether the faulting address is in the source or
> destination of the memcpy() (please)?
>   
Some more information if this could be of any help.

0:mon> di 0xc000000000039574
c000000000039574  e9240008      ld      r9,8(r4)
c000000000039578  409d0010      ble     cr7,c000000000039588    # .memcpy+0x88/0x244
c00000000003957c  79290002      rotldi  r9,r9,32
c000000000039580  91230000      stw     r9,0(r3)
c000000000039584  38630004      addi    r3,r3,4
c000000000039588  409e0010      bne     cr7,c000000000039598    # .memcpy+0x98/0x244
c00000000003958c  79298000      rotldi  r9,r9,16
c000000000039590  b1230000      sth     r9,0(r3)
c000000000039594  38630002      addi    r3,r3,2
c000000000039598  409f000c      bns     cr7,c0000000000395a4    # .memcpy+0xa4/0x244
c00000000003959c  79294000      rotldi  r9,r9,8
c0000000000395a0  99230000      stb     r9,0(r3)
c0000000000395a4  e8610030      ld      r3,48(r1)
c0000000000395a8  4e800020      blr
c0000000000395ac  78a6e8c2      rldicl  r6,r5,61,3
c0000000000395b0  38a5fff0      addi    r5,r5,-16
0:mon> r
R00 = 000000000000e40f   R16 = 00000000100edbc8
R01 = c00000003e59b3e0   R17 = 00000000100b0000
R02 = c0000000009c2110   R18 = 0000000000000005
R03 = c000000044bc90e0   R19 = 00000000fff0d7a8
R04 = c000000039cffff4   R20 = 00000000fff0d708
R05 = 0000000000000003   R21 = 00000000000000ff
R06 = 0000000000000000   R22 = 0000000000000006
R07 = 0000000000000001   R23 = c00000000079ab49
R08 = 723a7573725f743a   R24 = c0000000372fe2a8
R09 = 3a6f626a6563745f   R25 = c000000044bc90c8
R10 = c00000003b250968   R26 = c0000000372fe240
R11 = c000000000039500   R27 = c0000000372fe3b0
R12 = d00000000244c590   R28 = c0000000372c5280
R13 = c000000000a53480   R29 = 000000000000001b
R14 = 00000000100d0000   R30 = d0000000024654d0
R15 = 0000000000000000   R31 = ffffffffffffffde
pc  = c000000000039574 .memcpy+0x74/0x244
lr  = d00000000244916c .ext3_xattr_get+0x288/0x2f4 [ext3]
msr = 8000000000009032   cr  = 4400844b
ctr = 0000000000000000   xer = 0000000000000001   trap =  300
dar = c000000039d00000   dsisr = 40000000
0:mon>

So the other thing I noticed was that this machine was running
a kernel with selinux enabled. I turned off selinux and there
were no issues during bootup. It was a clean boot.

Thanks
-Sachin

-- 

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------


* Re: Crash (ext3 ) during 2.6.29-rc6 boot
  2009-02-23 10:32   ` Paul Mackerras
@ 2009-02-23 10:57     ` Sachin P. Sant
  2009-02-23 15:51     ` Jan Kara
  2009-02-24 18:01     ` Crash (ext3 ) during 2.6.29-rc6 boot Geert Uytterhoeven
  2 siblings, 0 replies; 24+ messages in thread
From: Sachin P. Sant @ 2009-02-23 10:57 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Andrew Morton, Mel Gorman, linuxppc-dev, linux-ext4, Jan Kara,
	linux-kernel

Paul Mackerras wrote:
> It appears to have faulted on a load, implicating the source.  The
> address being referenced (0xc00000003f380000) doesn't look
> outlandish.  I wonder if this kernel has CONFIG_DEBUG_PAGEALLOC turned
> on, and what page size is selected?
Yes CONFIG_DEBUG_PAGEALLOC is enabled and the page size is 64K.

CONFIG_DEBUG_PAGEALLOC=y
CONFIG_PPC_64K_PAGES=y

Thanks
-Sachin


-- 

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------


* Re: Crash (ext3 ) during 2.6.29-rc6 boot
  2009-02-23 10:32   ` Paul Mackerras
  2009-02-23 10:57     ` Sachin P. Sant
@ 2009-02-23 15:51     ` Jan Kara
  2009-02-24  6:38       ` Sachin P. Sant
  2009-02-24 18:01     ` Crash (ext3 ) during 2.6.29-rc6 boot Geert Uytterhoeven
  2 siblings, 1 reply; 24+ messages in thread
From: Jan Kara @ 2009-02-23 15:51 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Andrew Morton, Sachin P. Sant, Mel Gorman, linuxppc-dev,
	linux-ext4, Jan Kara, linux-kernel

> Andrew Morton writes:
> 
> > It looks like we died in ext3_xattr_block_get():
> > 
> > 		memcpy(buffer, bh->b_data + le16_to_cpu(entry->e_value_offs),
> > 		       size);
> > 
> > Perhaps entry->e_value_offs is no good.  I wonder if the filesystem is
> > corrupted and this snuck through the defenses.
> > 
> > I also wonder if there is enough info in that trace for a ppc person to
> > be able to determine whether the faulting address is in the source or
> > destination of the memcpy() (please)?
> 
> It appears to have faulted on a load, implicating the source.  The
> address being referenced (0xc00000003f380000) doesn't look
> outlandish.  I wonder if this kernel has CONFIG_DEBUG_PAGEALLOC turned
> on, and what page size is selected?
  Hmm, OK. But then I'm not sure how that can happen. Obviously, memcpy
somehow got beyond end of the page referenced by bh->b_data. So it means
that le16_to_cpu(entry->e_value_offs) + size > page_size. But
ext3_xattr_find_entry() calls ext3_xattr_check_entry() which in
particular checks whether e_value_offs + e_value_size isn't greater than
bh->b_size. So I see no way how memcpy can get beyond end of the page.
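
The check described above can be sketched as follows. This is a minimal
illustration only, not the code in fs/ext3/xattr.c: the struct and function
names are hypothetical, and the fields are assumed to be already converted
from little-endian to CPU byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two on-disk entry fields the check uses;
 * both are assumed to be in CPU byte order already. */
struct xattr_value_ref {
	uint16_t value_offs;	/* offset of the value within the block */
	uint32_t value_size;	/* length of the value in bytes */
};

/* Returns 1 if the value lies entirely inside a block of block_size bytes,
 * 0 otherwise. The sum is computed in 64 bits so it cannot wrap. */
static int value_fits_in_block(const struct xattr_value_ref *ref,
			       uint64_t block_size)
{
	uint64_t end;

	assert(ref != NULL);
	end = (uint64_t)ref->value_offs + ref->value_size;
	return end <= block_size;
}
```

If an entry passes a check of this shape against bh->b_size, a copy of
exactly value_size bytes starting at value_offs stays inside the block.
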
  Sachin, is the problem reproducible? If yes, can you send us contents
of the page just before the faulting address (i.e., for current fault it
would be 0xc00000003f370000-0xc00000003f37ffff). As far as I can
remember powerpc monitor could dump it.
  BTW, I suppose you use 4KB blocksize on the filesystem, right?

									Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

* Re: Crash (ext3 ) during 2.6.29-rc6 boot
  2009-02-23 15:51     ` Jan Kara
@ 2009-02-24  6:38       ` Sachin P. Sant
  2009-02-24 15:51         ` Jan Kara
  2009-02-25  6:52         ` Mark Nelson
  0 siblings, 2 replies; 24+ messages in thread
From: Sachin P. Sant @ 2009-02-24  6:38 UTC (permalink / raw)
  To: Jan Kara
  Cc: Paul Mackerras, Andrew Morton, Mel Gorman, linuxppc-dev,
	linux-ext4, Jan Kara, linux-kernel

Jan Kara wrote:
>   Hmm, OK. But then I'm not sure how that can happen. Obviously, memcpy
> somehow got beyond end of the page referenced by bh->b_data. So it means
> that le16_to_cpu(entry->e_value_offs) + size > page_size. But
> ext3_xattr_find_entry() calls ext3_xattr_check_entry() which in
> particular checks whether e_value_offs + e_value_size isn't greater than
> bh->b_size. So I see no way how memcpy can get beyond end of the page.
>   Sachin, is the problem reproducible? If yes, can you send us contents
>   
Yes, I am able to recreate this problem easily. As I had mentioned, if the
earlier kernel is booted with selinux enabled and then 2.6.29-rc6 is booted,
I get this crash. But if I specify selinux=0 on the command line, 2.6.29-rc6
boots without any problem.

> of the page just before the faulting address (i.e., for current fault it
> would be 0xc00000003f370000-0xc00000003f37ffff). As far as I can
> remember powerpc monitor could dump it.
>   
Here is the page dump. This time it crashed while accessing address
0xc00000002d670000.

Unable to handle kernel paging request for data at address 0xc00000002d670000
Faulting instruction address: 0xc000000000039574
cpu 0x1: Vector: 300 (Data Access) at [c00000004288b0b0]
    pc: c000000000039574: .memcpy+0x74/0x244
    lr: c0000000001b497c: .ext3_xattr_get+0x288/0x2f4
    sp: c00000004288b330
   msr: 8000000000009032

1:mon> d 0xc00000002d660000
............................... <SNIP> ...............................

c00000002d66efd0 0000000000000000 0000000000000000  |................|
c00000002d66efe0 0000000000000000 0000000000000000  |................|
c00000002d66eff0 0000000000000000 0000000000000000  |................|
c00000002d66f000 000002ea00040000 01000000e200d20a  |................|
c00000002d66f010 0000000000000000 0000000000000000  |................|
c00000002d66f020 0706e40f00000000 1b000000e200d20a  |................|
c00000002d66f030 73656c696e757800 0000000000000000  |selinux.........|
c00000002d66f040 0000000000000000 0000000000000000  |................|
c00000002d66f050 0000000000000000 0000000000000000  |................|
c00000002d66f060 0000000000000000 0000000000000000  |................|

............................... <SNIP> ...............................

c00000002d66ff60 0000000000000000 0000000000000000  |................|
c00000002d66ff70 0000000000000000 0000000000000000  |................|
c00000002d66ff80 0000000000000000 0000000000000000  |................|
c00000002d66ff90 0000000000000000 0000000000000000  |................|
c00000002d66ffa0 0000000000000000 0000000000000000  |................|
c00000002d66ffb0 0000000000000000 0000000000000000  |................|
c00000002d66ffc0 0000000000000000 0000000000000000  |................|
c00000002d66ffd0 0000000000000000 0000000000000000  |................|
c00000002d66ffe0 0000000073797374 656d5f753a6f626a  |....system_u:obj|
c00000002d66fff0 6563745f723a7573 725f743a73300000  |ect_r:usr_t:s0..|
c00000002d670000 **************** ****************  |                |
1:mon> r
R00 = 000000000000e40f   R16 = 000000000000005d
R01 = c00000004288b330   R17 = 0000000000000000
R02 = c0000000009f59b8   R18 = 00000000fffbfe9e
R03 = c000000044aa34a0   R19 = 0000000010042638
R04 = c00000002d66fff4   R20 = 0000000010041610
R05 = 0000000000000003   R21 = 00000000000000ff
R06 = 0000000000000000   R22 = 0000000000000006
R07 = 0000000000000001   R23 = c0000000007d27c1
R08 = 723a7573725f743a   R24 = c00000002c0cd758
R09 = 3a6f626a6563745f   R25 = c000000044aa3488
R10 = c00000000017b43c   R26 = c00000002c0cd6f0
R11 = c00000002d66f020   R27 = c00000002c0cd860
R12 = d0000000023c14b0   R28 = c00000002c0b0840
R13 = c000000000a93680   R29 = 000000000000001b
R14 = 00000000000041ed   R30 = c0000000009880b0
R15 = 0000000010040000   R31 = ffffffffffffffde
pc  = c000000000039574 .memcpy+0x74/0x244
lr  = c0000000001b497c .ext3_xattr_get+0x288/0x2f4
msr = 8000000000009032   cr  = 4400044b
ctr = 0000000000000000   xer = 0000000020000001   trap =  300
dar = c00000002d670000   dsisr = 40000000
1:mon> zr

>   BTW, I suppose you use 4KB blocksize on the filesystem, right?
>   
Yes.

dumpe2fs /dev/sda3 | grep -i "block size" 
dumpe2fs 1.39 (29-May-2006)
Block size:               4096

Thanks
-Sachin

-- 

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------


* Re: Crash (ext3 ) during 2.6.29-rc6 boot
  2009-02-24  6:38       ` Sachin P. Sant
@ 2009-02-24 15:51         ` Jan Kara
  2009-02-25  1:20           ` Mark Nelson
  2009-02-25  6:52         ` Mark Nelson
  1 sibling, 1 reply; 24+ messages in thread
From: Jan Kara @ 2009-02-24 15:51 UTC (permalink / raw)
  To: Sachin P. Sant
  Cc: Paul Mackerras, Andrew Morton, Mel Gorman, linuxppc-dev,
	linux-ext4, Jan Kara, linux-kernel, Mark Nelson

  Hello,

On Tue 24-02-09 12:08:37, Sachin P. Sant wrote:
> Jan Kara wrote:
>>   Hmm, OK. But then I'm not sure how that can happen. Obviously, memcpy
>> somehow got beyond end of the page referenced by bh->b_data. So it means
>> that le16_to_cpu(entry->e_value_offs) + size > page_size. But
>> ext3_xattr_find_entry() calls ext3_xattr_check_entry() which in
>> particular checks whether e_value_offs + e_value_size isn't greater than
>> bh->b_size. So I see no way how memcpy can get beyond end of the page.
>>   Sachin, is the problem reproducible? If yes, can you send us contents
>>   
> Yes, i am able to recreate this problem easily. As i had mentioned if the
> earlier kernel is booted with selinux enabled and then 2.6.29-rc6 is booted
> i get this crash. But if i specify selinux=0 at command line, 2.6.29-rc6 boots
> without any problem.
>
>> of the page just before the faulting address (i.e., for current fault it
>> would be 0xc00000003f370000-0xc00000003f37ffff). As far as I can
>> remember powerpc monitor could dump it.
>>   
> Here is the page dump. This time it crashed while accessing address
> 0xc00000002d670000.
  Thanks for the dump.

> Unable to handle kernel paging request for data at address 0xc00000002d670000
> Faulting instruction address: 0xc000000000039574
> cpu 0x1: Vector: 300 (Data Access) at [c00000004288b0b0]
>    pc: c000000000039574: .memcpy+0x74/0x244
>    lr: c0000000001b497c: .ext3_xattr_get+0x288/0x2f4
>    sp: c00000004288b330
>   msr: 8000000000009032
>
> 1:mon> d 0xc00000002d660000
> ............................... <SNIP> ...............................
>
> c00000002d66efd0 0000000000000000 0000000000000000  |................|
> c00000002d66efe0 0000000000000000 0000000000000000  |................|
> c00000002d66eff0 0000000000000000 0000000000000000  |................|
> c00000002d66f000 000002ea00040000 01000000e200d20a  |................|
> c00000002d66f010 0000000000000000 0000000000000000  |................|
> c00000002d66f020 0706e40f00000000 1b000000e200d20a  |................|
> c00000002d66f030 73656c696e757800 0000000000000000  |selinux.........|
> c00000002d66f040 0000000000000000 0000000000000000  |................|
> c00000002d66f050 0000000000000000 0000000000000000  |................|
> c00000002d66f060 0000000000000000 0000000000000000  |................|
>
> ............................... <SNIP> ...............................
>
> c00000002d66ff60 0000000000000000 0000000000000000  |................|
> c00000002d66ff70 0000000000000000 0000000000000000  |................|
> c00000002d66ff80 0000000000000000 0000000000000000  |................|
> c00000002d66ff90 0000000000000000 0000000000000000  |................|
> c00000002d66ffa0 0000000000000000 0000000000000000  |................|
> c00000002d66ffb0 0000000000000000 0000000000000000  |................|
> c00000002d66ffc0 0000000000000000 0000000000000000  |................|
> c00000002d66ffd0 0000000000000000 0000000000000000  |................|
> c00000002d66ffe0 0000000073797374 656d5f753a6f626a  |....system_u:obj|
> c00000002d66fff0 6563745f723a7573 725f743a73300000  |ect_r:usr_t:s0..|
> c00000002d670000 **************** ****************  |                |
> 1:mon> r
> R00 = 000000000000e40f   R16 = 000000000000005d
> R01 = c00000004288b330   R17 = 0000000000000000
> R02 = c0000000009f59b8   R18 = 00000000fffbfe9e
> R03 = c000000044aa34a0   R19 = 0000000010042638
> R04 = c00000002d66fff4   R20 = 0000000010041610
> R05 = 0000000000000003   R21 = 00000000000000ff
> R06 = 0000000000000000   R22 = 0000000000000006
> R07 = 0000000000000001   R23 = c0000000007d27c1
> R08 = 723a7573725f743a   R24 = c00000002c0cd758
> R09 = 3a6f626a6563745f   R25 = c000000044aa3488
> R10 = c00000000017b43c   R26 = c00000002c0cd6f0
> R11 = c00000002d66f020   R27 = c00000002c0cd860
> R12 = d0000000023c14b0   R28 = c00000002c0b0840
> R13 = c000000000a93680   R29 = 000000000000001b
> R14 = 00000000000041ed   R30 = c0000000009880b0
> R15 = 0000000010040000   R31 = ffffffffffffffde
> pc  = c000000000039574 .memcpy+0x74/0x244
> lr  = c0000000001b497c .ext3_xattr_get+0x288/0x2f4
> msr = 8000000000009032   cr  = 4400044b
> ctr = 0000000000000000   xer = 0000000020000001   trap =  300
> dar = c00000002d670000   dsisr = 40000000
> 1:mon> zr
>
>>   BTW, I suppose you use 4KB blocksize on the filesystem, right?
>>   
> Yes.
>
> dumpe2fs /dev/sda3 | grep -i "block size"
> dumpe2fs 1.39 (29-May-2006)
> Block size:               4096
  OK. The xattr block causing the oops is completely correct. To me it seems
more like some problem in the powerpc memcpy() (I saw some changes went
into it at the end of December) - we call it to copy 27 bytes from
address 0xc00000002d66ffe4, so the copy ends one byte before the end of the
page. Could some of the powerpc guys have a look at whether this could be the
case? I'm not quite fluent in powerpc assembly so it would take me ages ;).
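
For reference, the 16 bytes at 0xc00000002d66f020 in the dump can be decoded
by hand. The sketch below assumes the on-disk entry layout is e_name_len (1
byte), e_name_index (1 byte), e_value_offs (le16), e_value_block (le32),
e_value_size (le32), e_hash (le32); the struct and helper names are made up
for illustration and are not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* The 16 bytes at 0xc00000002d66f020 in the dump, in memory order. */
static const uint8_t entry_bytes[16] = {
	0x07, 0x06, 0xe4, 0x0f, 0x00, 0x00, 0x00, 0x00,
	0x1b, 0x00, 0x00, 0x00, 0xe2, 0x00, 0xd2, 0x0a,
};

/* Hypothetical decoded form of the entry, in CPU byte order. */
struct decoded_entry {
	uint8_t  name_len;
	uint8_t  name_index;
	uint16_t value_offs;
	uint32_t value_block;
	uint32_t value_size;
	uint32_t hash;
};

/* Read a little-endian 16-bit value from a byte buffer. */
static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the fixed 16-byte part of an entry under the assumed layout. */
static struct decoded_entry decode_entry(const uint8_t *p)
{
	struct decoded_entry e;

	assert(p != 0);
	e.name_len    = p[0];
	e.name_index  = p[1];
	e.value_offs  = get_le16(p + 2);
	e.value_block = get_le32(p + 4);
	e.value_size  = get_le32(p + 8);
	e.hash        = get_le32(p + 12);
	return e;
}

/* Offset, within the block, one past the last byte of the value. */
static uint32_t value_end(const struct decoded_entry *e)
{
	return (uint32_t)e->value_offs + e->value_size;
}
```

Under that layout the entry has a 7-byte name ("selinux"), a value offset of
0xfe4 and a value size of 0x1b (27), so the value ends at offset 0xfff inside
the 4096-byte block that starts at 0xc00000002d66f000. That matches a 27-byte
copy starting at 0xc00000002d66ffe4.
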

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

* Re: Crash (ext3 ) during 2.6.29-rc6 boot
  2009-02-23 10:48   ` Sachin P. Sant
@ 2009-02-24 16:14     ` Jan Kara
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Kara @ 2009-02-24 16:14 UTC (permalink / raw)
  To: Sachin P. Sant
  Cc: Andrew Morton, linux-kernel, linux-ext4, Mel Gorman, linuxppc-dev

> Andrew Morton wrote:
> >hm, I wonder what could have caused that - we haven't altered
> >fs/ext3/xattr.c in ages.
> >
> >What is the most recent kernel version you know of which didn't do
> >this?  Bear in mind that this crash might be triggered by the
> >current contents of the filesystem, so if possible, please test
> >some other kernel versions on that disk.
> >  
> I am trying to boot a vanilla kernel on this machine for the first
> time. Haven't tried any other kernels. Will give it a try.
> 
> >It looks like we died in ext3_xattr_block_get():
> >
> >		memcpy(buffer, bh->b_data + le16_to_cpu(entry->e_value_offs),
> >		       size);
> >
> >Perhaps entry->e_value_offs is no good.  I wonder if the filesystem is
> >corrupted and this snuck through the defenses.
> >
> >I also wonder if there is enough info in that trace for a ppc person to
> >be able to determine whether the faulting address is in the source or
> >destination of the memcpy() (please)?
> >  
> Some more information if this could be of any help.
> 
> 0:mon> di 0xc000000000039574
> c000000000039574  e9240008      ld      r9,8(r4)
> c000000000039578  409d0010      ble     cr7,c000000000039588    # .memcpy+0x88/0x244
> c00000000003957c  79290002      rotldi  r9,r9,32
> c000000000039580  91230000      stw     r9,0(r3)
> c000000000039584  38630004      addi    r3,r3,4
> c000000000039588  409e0010      bne     cr7,c000000000039598    # .memcpy+0x98/0x244
> c00000000003958c  79298000      rotldi  r9,r9,16
> c000000000039590  b1230000      sth     r9,0(r3)
> c000000000039594  38630002      addi    r3,r3,2
> c000000000039598  409f000c      bns     cr7,c0000000000395a4    # .memcpy+0xa4/0x244
> c00000000003959c  79294000      rotldi  r9,r9,8
> c0000000000395a0  99230000      stb     r9,0(r3)
> c0000000000395a4  e8610030      ld      r3,48(r1)
> c0000000000395a8  4e800020      blr
> c0000000000395ac  78a6e8c2      rldicl  r6,r5,61,3
> c0000000000395b0  38a5fff0      addi    r5,r5,-16
> 0:mon> r
> R00 = 000000000000e40f   R16 = 00000000100edbc8
> R01 = c00000003e59b3e0   R17 = 00000000100b0000
> R02 = c0000000009c2110   R18 = 0000000000000005
> R03 = c000000044bc90e0   R19 = 00000000fff0d7a8
> R04 = c000000039cffff4   R20 = 00000000fff0d708
> R05 = 0000000000000003   R21 = 00000000000000ff
> R06 = 0000000000000000   R22 = 0000000000000006
> R07 = 0000000000000001   R23 = c00000000079ab49
> R08 = 723a7573725f743a   R24 = c0000000372fe2a8
> R09 = 3a6f626a6563745f   R25 = c000000044bc90c8
> R10 = c00000003b250968   R26 = c0000000372fe240
> R11 = c000000000039500   R27 = c0000000372fe3b0
> R12 = d00000000244c590   R28 = c0000000372c5280
> R13 = c000000000a53480   R29 = 000000000000001b
> R14 = 00000000100d0000   R30 = d0000000024654d0
> R15 = 0000000000000000   R31 = ffffffffffffffde
> pc  = c000000000039574 .memcpy+0x74/0x244
> lr  = d00000000244916c .ext3_xattr_get+0x288/0x2f4 [ext3]
> msr = 8000000000009032   cr  = 4400844b
> ctr = 0000000000000000   xer = 0000000000000001   trap =  300
> dar = c000000039d00000   dsisr = 40000000
> 0:mon>
  Yes, this makes me even more suspicious that memcpy() on powerpc could
be at fault. The instruction (ld r9,8(r4)) is loading the last 8 bytes to copy,
but in fact it should load only 3 bytes in our case because the remaining 5
bytes are not in the range we specified, and thus the larger load can cause
a page fault...
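
The address arithmetic behind this can be checked with a small sketch. It
takes R04 and R05 from the register dump quoted above and assumes the 64K
page size reported earlier in the thread; the type and function names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_64K 0x10000ULL	/* assumes CONFIG_PPC_64K_PAGES=y */

/* First and last byte address touched by one memory access. */
struct access {
	uint64_t first;
	uint64_t last;
};

/* Byte range touched by a load of `width` bytes at base + disp. */
static struct access load_range(uint64_t base, uint64_t disp, unsigned width)
{
	struct access a;

	assert(width > 0);
	a.first = base + disp;
	a.last  = a.first + width - 1;
	return a;
}

/* Returns 1 if the access touches two different pages, 0 otherwise. */
static int crosses_page(struct access a, uint64_t page_size)
{
	return (a.first / page_size) != (a.last / page_size);
}
```

With R04 = 0xc000000039cffff4, the 8-byte load at 8(r4) covers
0xc000000039cffffc through 0xc000000039d00003, which runs into the next 64K
page, and the reported dar is the first byte of that page. A load of only the
3 remaining bytes (R05 = 3) would end at 0xc000000039cffffe and stay within
the page.
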

								Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

* Re: Crash (ext3 ) during 2.6.29-rc6 boot
  2009-02-23 10:32   ` Paul Mackerras
  2009-02-23 10:57     ` Sachin P. Sant
  2009-02-23 15:51     ` Jan Kara
@ 2009-02-24 18:01     ` Geert Uytterhoeven
  2009-02-25  1:27       ` Mark Nelson
  2 siblings, 1 reply; 24+ messages in thread
From: Geert Uytterhoeven @ 2009-02-24 18:01 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Andrew Morton, Jan Kara, Mel Gorman, linux-kernel, linuxppc-dev,
	linux-ext4

On Mon, 23 Feb 2009, Paul Mackerras wrote:
> Andrew Morton writes:
> > It looks like we died in ext3_xattr_block_get():
> > 
> > 		memcpy(buffer, bh->b_data + le16_to_cpu(entry->e_value_offs),
> > 		       size);
> > 
> > Perhaps entry->e_value_offs is no good.  I wonder if the filesystem is
> > corrupted and this snuck through the defenses.
> > 
> > I also wonder if there is enough info in that trace for a ppc person to
> > be able to determine whether the faulting address is in the source or
> > destination of the memcpy() (please)?
> 
> It appears to have faulted on a load, implicating the source.  The
> address being referenced (0xc00000003f380000) doesn't look
> outlandish.  I wonder if this kernel has CONFIG_DEBUG_PAGEALLOC turned
> on, and what page size is selected?

I'm seeing a similar thing on PS3, but not in ext3. During early userspace
setup (udevd), it crashes accessing a 0xc00* address in:

| NIP setup+0x20/0x130
| LR copy_user_page+0x18/0x6c
| Call trace:
| do_wp_page+0x5b4/0x89c
| do_page_fault+0x3a8/0x58c
| handle_page_fault+0x20/0x5c

I have CONFIG_DEBUG_PAGEALLOC=y. If I disable it, the system boots fine.

If needed, I can probably bisect this tomorrow. It definitely didn't happen in
2.6.29-rc5.

With kind regards,

Geert Uytterhoeven
Software Architect

Sony Techsoft Centre Europe
The Corporate Village · Da Vincilaan 7-D1 · B-1935 Zaventem · Belgium

Phone:    +32 (0)2 700 8453
Fax:      +32 (0)2 700 8622
E-mail:   Geert.Uytterhoeven@sonycom.com
Internet: http://www.sony-europe.com/

A division of Sony Europe (Belgium) N.V.
VAT BE 0413.825.160 · RPR Brussels
Fortis · BIC GEBABEBB · IBAN BE41293037680010

* Re: Crash (ext3 ) during 2.6.29-rc6 boot
  2009-02-24 15:51         ` Jan Kara
@ 2009-02-25  1:20           ` Mark Nelson
  0 siblings, 0 replies; 24+ messages in thread
From: Mark Nelson @ 2009-02-25  1:20 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sachin P. Sant, Paul Mackerras, Andrew Morton, Mel Gorman,
	linuxppc-dev, linux-ext4, Jan Kara, linux-kernel, benh

On Wed, 25 Feb 2009 02:51:20 am Jan Kara wrote:
>   Hello,
> 
> On Tue 24-02-09 12:08:37, Sachin P. Sant wrote:
> > Jan Kara wrote:
> >>   Hmm, OK. But then I'm not sure how that can happen. Obviously, memcpy
> >> somehow got beyond end of the page referenced by bh->b_data. So it means
> >> that le16_to_cpu(entry->e_value_offs) + size > page_size. But
> >> ext3_xattr_find_entry() calls ext3_xattr_check_entry() which in
> >> particular checks whether e_value_offs + e_value_size isn't greater than
> >> bh->b_size. So I see no way how memcpy can get beyond end of the page.
> >>   Sachin, is the problem reproducible? If yes, can you send us contents
> >>   
> > Yes, i am able to recreate this problem easily. As i had mentioned if the
> > earlier kernel is booted with selinux enabled and then 2.6.29-rc6 is booted
> > i get this crash. But if i specify selinux=0 at command line, 2.6.29-rc6 boots
> > without any problem.
> >
> >> of the page just before the faulting address (i.e., for current fault it
> >> would be 0xc00000003f370000-0xc00000003f37ffff). As far as I can
> >> remember powerpc monitor could dump it.
> >>   
> > Here is the page dump. This time it crashed while accessing address
> > 0xc00000002d670000.
>   Thanks for the dump.
> 
> > Unable to handle kernel paging request for data at address 0xc00000002d670000
> > Faulting instruction address: 0xc000000000039574
> > cpu 0x1: Vector: 300 (Data Access) at [c00000004288b0b0]
> >    pc: c000000000039574: .memcpy+0x74/0x244
> >    lr: c0000000001b497c: .ext3_xattr_get+0x288/0x2f4
> >    sp: c00000004288b330
> >   msr: 8000000000009032
> >
> > 1:mon> d 0xc00000002d660000
> > ............................... <SNIP> ...............................
> >
> > c00000002d66efd0 0000000000000000 0000000000000000  |................|
> > c00000002d66efe0 0000000000000000 0000000000000000  |................|
> > c00000002d66eff0 0000000000000000 0000000000000000  |................|
> > c00000002d66f000 000002ea00040000 01000000e200d20a  |................|
> > c00000002d66f010 0000000000000000 0000000000000000  |................|
> > c00000002d66f020 0706e40f00000000 1b000000e200d20a  |................|
> > c00000002d66f030 73656c696e757800 0000000000000000  |selinux.........|
> > c00000002d66f040 0000000000000000 0000000000000000  |................|
> > c00000002d66f050 0000000000000000 0000000000000000  |................|
> > c00000002d66f060 0000000000000000 0000000000000000  |................|
> >
> > ............................... <SNIP> ...............................
> >
> > c00000002d66ff60 0000000000000000 0000000000000000  |................|
> > c00000002d66ff70 0000000000000000 0000000000000000  |................|
> > c00000002d66ff80 0000000000000000 0000000000000000  |................|
> > c00000002d66ff90 0000000000000000 0000000000000000  |................|
> > c00000002d66ffa0 0000000000000000 0000000000000000  |................|
> > c00000002d66ffb0 0000000000000000 0000000000000000  |................|
> > c00000002d66ffc0 0000000000000000 0000000000000000  |................|
> > c00000002d66ffd0 0000000000000000 0000000000000000  |................|
> > c00000002d66ffe0 0000000073797374 656d5f753a6f626a  |....system_u:obj|
> > c00000002d66fff0 6563745f723a7573 725f743a73300000  |ect_r:usr_t:s0..|
> > c00000002d670000 **************** ****************  |                |
> > 1:mon> r
> > R00 = 000000000000e40f   R16 = 000000000000005d
> > R01 = c00000004288b330   R17 = 0000000000000000
> > R02 = c0000000009f59b8   R18 = 00000000fffbfe9e
> > R03 = c000000044aa34a0   R19 = 0000000010042638
> > R04 = c00000002d66fff4   R20 = 0000000010041610
> > R05 = 0000000000000003   R21 = 00000000000000ff
> > R06 = 0000000000000000   R22 = 0000000000000006
> > R07 = 0000000000000001   R23 = c0000000007d27c1
> > R08 = 723a7573725f743a   R24 = c00000002c0cd758
> > R09 = 3a6f626a6563745f   R25 = c000000044aa3488
> > R10 = c00000000017b43c   R26 = c00000002c0cd6f0
> > R11 = c00000002d66f020   R27 = c00000002c0cd860
> > R12 = d0000000023c14b0   R28 = c00000002c0b0840
> > R13 = c000000000a93680   R29 = 000000000000001b
> > R14 = 00000000000041ed   R30 = c0000000009880b0
> > R15 = 0000000010040000   R31 = ffffffffffffffde
> > pc  = c000000000039574 .memcpy+0x74/0x244
> > lr  = c0000000001b497c .ext3_xattr_get+0x288/0x2f4
> > msr = 8000000000009032   cr  = 4400044b
> > ctr = 0000000000000000   xer = 0000000020000001   trap =  300
> > dar = c00000002d670000   dsisr = 40000000
> > 1:mon> zr
> >
> >>   BTW, I suppose you use 4KB blocksize on the filesystem, right?
> >>   
> > Yes.
> >
> > dumpe2fs /dev/sda3 | grep -i "block size"
> > dumpe2fs 1.39 (29-May-2006)
> > Block size:               4096
>   OK. The xattr block causing oops is completely correct. To me it seems
> more like some problem in powerpc memcpy() (I saw some changes went
> into it at the end of December) - we call it to copy 27 bytes from
> address 0xc00000002d66ffe4 (so the copy ends one byte before the end of the page).
> Could some of the powerpc guys have a look whether this could be the case?
> I'm not quite fluent in the powerpc assembly so it would take me ages ;).

You're right - it's a problem with the 64bit powerpc memcpy(). And the brown
paper bag is all mine (commit 25d6e2d7c58ddc4a3b614fc5381591c0cfe66556). On
Power6 and Cell we're doing a load double that goes beyond the source size
we were given to copy. I'll see if I can find a nice way of fixing this up,
if not then I'll ask Ben to revert.
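
To make the overread concrete, here is a minimal C sketch (not kernel code)
of the address arithmetic, using R04, R05 and the DAR from the xmon dump
quoted above. It assumes the faulting access is an 8-byte load at offset 8
from r4, and it assumes 64KB pages (inferred from the 0x...660000-0x...66ffff
dump range). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 64KB pages, inferred from the dumped address range. */
#define PAGE_SIZE_ASSUMED 0x10000ULL

/* Hypothetical helper: does a load of `width` bytes at `addr` reach into
 * the page after the one that contains `addr`? */
static bool load_crosses_page(uint64_t addr, unsigned width)
{
	assert(width > 0);
	uint64_t first_page = addr / PAGE_SIZE_ASSUMED;
	uint64_t last_page = (addr + width - 1) / PAGE_SIZE_ASSUMED;
	return first_page != last_page;
}

/* Hypothetical helper: first address of the page following `addr`. */
static uint64_t next_page_start(uint64_t addr)
{
	return (addr / PAGE_SIZE_ASSUMED + 1) * PAGE_SIZE_ASSUMED;
}
```

With r4 = 0xc00000002d66fff4, the load address is 0x...66fffc. Reading a
full doubleword from there reaches 0x...670003, while reading only the 3
bytes left to copy (R05) stays inside the page. The first byte past the page
is 0xc00000002d670000, which matches the reported DAR.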

Sorry about the goose chase!

Mark

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Crash (ext3 ) during 2.6.29-rc6 boot
  2009-02-24 18:01     ` Crash (ext3 ) during 2.6.29-rc6 boot Geert Uytterhoeven
@ 2009-02-25  1:27       ` Mark Nelson
  2009-02-25 10:50         ` Geert Uytterhoeven
  0 siblings, 1 reply; 24+ messages in thread
From: Mark Nelson @ 2009-02-25  1:27 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Geert Uytterhoeven, Paul Mackerras, Jan Kara, Mel Gorman,
	linux-kernel, Andrew Morton, linux-ext4

On Wed, 25 Feb 2009 05:01:59 am Geert Uytterhoeven wrote:
> On Mon, 23 Feb 2009, Paul Mackerras wrote:
> > Andrew Morton writes:
> > > It looks like we died in ext3_xattr_block_get():
> > > 
> > > 		memcpy(buffer, bh->b_data + le16_to_cpu(entry->e_value_offs),
> > > 		       size);
> > > 
> > > Perhaps entry->e_value_offs is no good.  I wonder if the filesystem is
> > > corrupted and this snuck through the defenses.
> > > 
> > > I also wonder if there is enough info in that trace for a ppc person to
> > > be able to determine whether the faulting address is in the source or
> > > destination of the memcpy() (please)?
> > 
> > It appears to have faulted on a load, implicating the source.  The
> > address being referenced (0xc00000003f380000) doesn't look
> > outlandish.  I wonder if this kernel has CONFIG_DEBUG_PAGEALLOC turned
> > on, and what page size is selected?
> 
> I'm seeing a similar thing on PS3, but not in ext3. During early userspace
> setup (udevd), it crashes accessing a 0xc00* address in:
> 
> | NIP setup+0x20/0x130
> | LR copy_user_page+0x18/0x6c
> | Call trace:
> | do_wp_page+0x5b4/0x89c
> | do_page_fault+0x3a8/0x58c
> | handle_page_fault+0x20/0x5c
> 
> I have CONFIG_DEBUG_PAGEALLOC=y. If I disable it, the system boots fine.
> 
> If needed, I can probably bisect this tomorrow. It definitely didn't happen in
> 2.6.29-rc5.

No need to bisect - it was 25d6e2d7c58ddc4a3b614fc5381591c0cfe66556, my
commit that "optimised" 64bit memcpy() for Power6 and Cell.

The bug was in -rc1, but if your copies were 8-byte aligned with respect
to the source the problem wouldn't have been seen... Could this have
been why you didn't see it in -rc5?
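
One way to see why source alignment matters: an 8-byte load from an
8-byte-aligned address cannot straddle a page boundary, because the page
size is a multiple of 8. The C sketch below only illustrates that
arithmetic; it does not model which memcpy() path is taken. The 4KB page
size and the helper names are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: a 4KB page, chosen only to keep the loop small. */
#define PAGE_BYTES 4096u

/* Hypothetical helper: does an 8-byte load at page offset `off` end in
 * the following page? */
static bool ld_crosses(size_t off)
{
	return off / PAGE_BYTES != (off + 7) / PAGE_BYTES;
}

/* Count the offsets within one page at which an 8-byte load would cross
 * into the next page, optionally restricted to 8-byte-aligned offsets. */
static unsigned count_crossing(bool aligned_only)
{
	unsigned n = 0;

	/* The argument relies on the page size being a multiple of 8. */
	assert(PAGE_BYTES % 8 == 0);

	for (size_t off = 0; off < PAGE_BYTES; off++) {
		if (aligned_only && (off % 8) != 0)
			continue;
		if (ld_crosses(off))
			n++;
	}
	return n;
}
```

Only the last seven offsets of the page cross, and none of them is 8-byte
aligned, so an aligned trailing load can read stale bytes but cannot fault
on the next page.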

I'll work on a fix now.

Thanks!

Mark

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Crash (ext3 ) during 2.6.29-rc6 boot
  2009-02-24  6:38       ` Sachin P. Sant
  2009-02-24 15:51         ` Jan Kara
@ 2009-02-25  6:52         ` Mark Nelson
  2009-02-25  9:50           ` Geert Uytterhoeven
                             ` (2 more replies)
  1 sibling, 3 replies; 24+ messages in thread
From: Mark Nelson @ 2009-02-25  6:52 UTC (permalink / raw)
  To: Sachin P. Sant, Geert Uytterhoeven
  Cc: linuxppc-dev, Jan Kara, Jan Kara, Mel Gorman, linux-kernel,
	Paul Mackerras, Andrew Morton, linux-ext4, benh

On Tue, 24 Feb 2009 05:38:37 pm Sachin P. Sant wrote:
> Jan Kara wrote:
> >   Hmm, OK. But then I'm not sure how that can happen. Obviously, memcpy
> > somehow got beyond end of the page referenced by bh->b_data. So it means
> > that le16_to_cpu(entry->e_value_offs) + size > page_size. But
> > ext3_xattr_find_entry() calls ext3_xattr_check_entry() which in
> > particular checks whether e_value_offs + e_value_size isn't greater than
> > bh->b_size. So I see no way how memcpy can get beyond end of the page.
> >   Sachin, is the problem reproducible? If yes, can you send us contents
> >   
> Yes, I am able to recreate this problem easily. As I had mentioned, if the
> earlier kernel is booted with selinux enabled and then 2.6.29-rc6 is booted,
> I get this crash. But if I specify selinux=0 on the command line, 2.6.29-rc6 boots
> without any problem.

Hi Sachin and Geert,

Does the patch below fix the problems you're seeing? If it does I'll send
a properly written up and formatted patch to linuxppc-dev (as well as
another one to fix the same problem in copy_tofrom_user()).

Thanks and sorry again!

Mark

---
 arch/powerpc/lib/memcpy_64.S |   26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

Index: upstream/arch/powerpc/lib/memcpy_64.S
===================================================================
--- upstream.orig/arch/powerpc/lib/memcpy_64.S
+++ upstream/arch/powerpc/lib/memcpy_64.S
@@ -53,18 +53,19 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
 3:	std	r8,8(r3)
 	beq	3f
 	addi	r3,r3,16
-	ld	r9,8(r4)
 .Ldo_tail:
 	bf	cr7*4+1,1f
-	rotldi	r9,r9,32
+	lwz	r9,8(r4)
+	addi	r4,r4,4
 	stw	r9,0(r3)
 	addi	r3,r3,4
 1:	bf	cr7*4+2,2f
-	rotldi	r9,r9,16
+	lhz	r9,8(r4)
+	addi	r4,r4,2
 	sth	r9,0(r3)
 	addi	r3,r3,2
 2:	bf	cr7*4+3,3f
-	rotldi	r9,r9,8
+	lbz	r9,8(r4)
 	stb	r9,0(r3)
 3:	ld	r3,48(r1)	/* return dest pointer */
 	blr
@@ -133,11 +134,24 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
 	cmpwi	cr1,r5,8
 	addi	r3,r3,32
 	sld	r9,r9,r10
-	ble	cr1,.Ldo_tail
+	ble	cr1,6f
 	ld	r0,8(r4)
 	srd	r7,r0,r11
 	or	r9,r7,r9
-	b	.Ldo_tail
+6:
+	bf	cr7*4+1,1f
+	rotldi	r9,r9,32
+	stw	r9,0(r3)
+	addi	r3,r3,4
+1:	bf	cr7*4+2,2f
+	rotldi	r9,r9,16
+	sth	r9,0(r3)
+	addi	r3,r3,2
+2:	bf	cr7*4+3,3f
+	rotldi	r9,r9,8
+	stb	r9,0(r3)
+3:	ld	r3,48(r1)	/* return dest pointer */
+	blr
 
 .Ldst_unaligned:
 	PPC_MTOCRF	0x01,r6		# put #bytes to 8B bdry into cr7

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Crash (ext3 ) during 2.6.29-rc6 boot
  2009-02-25  6:52         ` Mark Nelson
@ 2009-02-25  9:50           ` Geert Uytterhoeven
  2009-02-25 12:10             ` Mark Nelson
  2009-02-25 11:08           ` Sachin P. Sant
  2009-02-25 23:26           ` [PATCH] powerpc: Fix 64bit memcpy() regression Mark Nelson
  2 siblings, 1 reply; 24+ messages in thread
From: Geert Uytterhoeven @ 2009-02-25  9:50 UTC (permalink / raw)
  To: Mark Nelson
  Cc: Sachin P. Sant, Jan Kara, Jan Kara, Mel Gorman, linux-kernel,
	linuxppc-dev, Paul Mackerras, Andrew Morton, linux-ext4

On Wed, 25 Feb 2009, Mark Nelson wrote:
> On Tue, 24 Feb 2009 05:38:37 pm Sachin P. Sant wrote:
> > Jan Kara wrote:
> > >   Hmm, OK. But then I'm not sure how that can happen. Obviously, memcpy
> > > somehow got beyond end of the page referenced by bh->b_data. So it means
> > > that le16_to_cpu(entry->e_value_offs) + size > page_size. But
> > > ext3_xattr_find_entry() calls ext3_xattr_check_entry() which in
> > > particular checks whether e_value_offs + e_value_size isn't greater than
> > > bh->b_size. So I see no way how memcpy can get beyond end of the page.
> > >   Sachin, is the problem reproducible? If yes, can you send us contents
> > >   
> > Yes, I am able to recreate this problem easily. As I had mentioned, if the
> > earlier kernel is booted with selinux enabled and then 2.6.29-rc6 is booted,
> > I get this crash. But if I specify selinux=0 on the command line, 2.6.29-rc6 boots
> > without any problem.
> 
> Hi Sachin and Geert,
> 
> Does the patch below fix the problems you're seeing? If it does I'll send
> a properly written up and formatted patch to linuxppc-dev (as well as
> another one to fix the same problem in copy_tofrom_user()).

Unfortunately not, now it crashes while accessing the memory pointed to by
GPR16, in

NIP: copy_page_range+0x608/0x628
LR:  dup_mm+0x2e4/0x428
Trace: debug_table+0xcc70/0x1afe0 (unreliable)
dup_mm+0x2e4/0x428
copy_process+0x86c/0xf9c
do_fork+0x188/0x39c
sys_clone+0x58/0x70
ppc_clone+0x8/0xc

However, after reverting 25d6e2d7c58ddc4a3b614fc5381591c0cfe66556, I still see
similar problems as above (crash in copy_page_range()).
Which makes me think that
  1. Your new patch fixes the problem introduced by 25d6e2d7,
  2. There's still another issue besides the one introduced by 25d6e2d7.

With kind regards,

Geert Uytterhoeven

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Crash (ext3 ) during 2.6.29-rc6 boot
  2009-02-25  1:27       ` Mark Nelson
@ 2009-02-25 10:50         ` Geert Uytterhoeven
  0 siblings, 0 replies; 24+ messages in thread
From: Geert Uytterhoeven @ 2009-02-25 10:50 UTC (permalink / raw)
  To: Mark Nelson
  Cc: linuxppc-dev, Jan Kara, Mel Gorman, linux-kernel, Paul Mackerras,
	Andrew Morton, linux-ext4

On Wed, 25 Feb 2009, Mark Nelson wrote:
> On Wed, 25 Feb 2009 05:01:59 am Geert Uytterhoeven wrote:
> > On Mon, 23 Feb 2009, Paul Mackerras wrote:
> > > Andrew Morton writes:
> > > > It looks like we died in ext3_xattr_block_get():
> > > > 
> > > > 		memcpy(buffer, bh->b_data + le16_to_cpu(entry->e_value_offs),
> > > > 		       size);
> > > > 
> > > > Perhaps entry->e_value_offs is no good.  I wonder if the filesystem is
> > > > corrupted and this snuck through the defenses.
> > > > 
> > > > I also wonder if there is enough info in that trace for a ppc person to
> > > > be able to determine whether the faulting address is in the source or
> > > > destination of the memcpy() (please)?
> > > 
> > > It appears to have faulted on a load, implicating the source.  The
> > > address being referenced (0xc00000003f380000) doesn't look
> > > outlandish.  I wonder if this kernel has CONFIG_DEBUG_PAGEALLOC turned
> > > on, and what page size is selected?
> > 
> > I'm seeing a similar thing on PS3, but not in ext3. During early userspace
> > setup (udevd), it crashes accessing a 0xc00* address in:
> > 
> > | NIP setup+0x20/0x130
> > | LR copy_user_page+0x18/0x6c
> > | Call trace:
> > | do_wp_page+0x5b4/0x89c
> > | do_page_fault+0x3a8/0x58c
> > | handle_page_fault+0x20/0x5c
> > 
> > I have CONFIG_DEBUG_PAGEALLOC=y. If I disable it, the system boots fine.
> > 
> > If needed, I can probably bisect this tomorrow. It definitely didn't happen in
> > 2.6.29-rc5.
> 
> No need to bisect - it was 25d6e2d7c58ddc4a3b614fc5381591c0cfe66556, my
> commit that "optimised" 64bit memcpy() for Power6 and Cell.
> 
> The bug was in -rc1, but if your copies were 8-byte aligned with respect
> to the source the problem wouldn't have been seen... Could this have
> been why you didn't see it in -rc5?

Hmm... I just started seeing it on older kernels (-rc5+), too...

With kind regards,

Geert Uytterhoeven

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Crash (ext3 ) during 2.6.29-rc6 boot
  2009-02-25  6:52         ` Mark Nelson
  2009-02-25  9:50           ` Geert Uytterhoeven
@ 2009-02-25 11:08           ` Sachin P. Sant
  2009-02-25 12:13             ` Mark Nelson
  2009-02-25 23:26           ` [PATCH] powerpc: Fix 64bit memcpy() regression Mark Nelson
  2 siblings, 1 reply; 24+ messages in thread
From: Sachin P. Sant @ 2009-02-25 11:08 UTC (permalink / raw)
  To: Mark Nelson
  Cc: Geert Uytterhoeven, linuxppc-dev, Jan Kara, Jan Kara, Mel Gorman,
	linux-kernel, Paul Mackerras, Andrew Morton, linux-ext4, benh

Mark Nelson wrote:
> Hi Sachin and Geert,
>
> Does the patch below fix the problems you're seeing? If it does I'll send
> a properly written up and formatted patch to linuxppc-dev (as well as
> another one to fix the same problem in copy_tofrom_user()).
>   
This patch fixes the issue on my side. I tried booting the system a few times
and every single time it came up clean.

Thanks
-Sachin

-- 

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Crash (ext3 ) during 2.6.29-rc6 boot
  2009-02-25  9:50           ` Geert Uytterhoeven
@ 2009-02-25 12:10             ` Mark Nelson
  2009-02-25 13:31               ` Geert Uytterhoeven
  0 siblings, 1 reply; 24+ messages in thread
From: Mark Nelson @ 2009-02-25 12:10 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Geert Uytterhoeven, Jan Kara, Jan Kara, Mel Gorman, linux-kernel,
	Paul Mackerras, Andrew Morton, linux-ext4

On Wed, 25 Feb 2009 08:50:46 pm Geert Uytterhoeven wrote:
> On Wed, 25 Feb 2009, Mark Nelson wrote:
> > On Tue, 24 Feb 2009 05:38:37 pm Sachin P. Sant wrote:
> > > Jan Kara wrote:
> > > >   Hmm, OK. But then I'm not sure how that can happen. Obviously, memcpy
> > > > somehow got beyond end of the page referenced by bh->b_data. So it means
> > > > that le16_to_cpu(entry->e_value_offs) + size > page_size. But
> > > > ext3_xattr_find_entry() calls ext3_xattr_check_entry() which in
> > > > particular checks whether e_value_offs + e_value_size isn't greater than
> > > > bh->b_size. So I see no way how memcpy can get beyond end of the page.
> > > >   Sachin, is the problem reproducible? If yes, can you send us contents
> > > >   
> > > Yes, I am able to recreate this problem easily. As I had mentioned, if the
> > > earlier kernel is booted with selinux enabled and then 2.6.29-rc6 is booted,
> > > I get this crash. But if I specify selinux=0 on the command line, 2.6.29-rc6 boots
> > > without any problem.
> > 
> > Hi Sachin and Geert,
> > 
> > Does the patch below fix the problems you're seeing? If it does I'll send
> > a properly written up and formatted patch to linuxppc-dev (as well as
> > another one to fix the same problem in copy_tofrom_user()).
> 
> Unfortunately not, now it crashes while accessing the memory pointed to by
> GPR16, in
> 
> NIP: copy_page_range+0x608/0x628
> LR:  dup_mm+0x2e4/0x428
> Trace: debug_table+0xcc70/0x1afe0 (unreliable)
> dup_mm+0x2e4/0x428
> copy_process+0x86c/0xf9c
> do_fork+0x188/0x39c
> sys_clone+0x58/0x70
> ppc_clone+0x8/0xc
> 
> However, after reverting 25d6e2d7c58ddc4a3b614fc5381591c0cfe66556, I still see
> similar problems as above (crash in copy_page_range()).
> Which makes me think that
>   1. Your new patch fixes the problem introduced by 25d6e2d7,
>   2. There's still another issue besides the one introduced by 25d6e2d7.

Does the following patch fix the errors you're seeing? (it applies the
same fix as the previous patch but this time to copy_tofrom_user, which
I updated in a4e22f02f5b6518c1484faea1f88d81802b9feac)

Thanks!

Mark

---
 arch/powerpc/lib/copyuser_64.S |   38 +++++++++++++++++++++++++++++++-------
 1 file changed, 31 insertions(+), 7 deletions(-)

Index: upstream/arch/powerpc/lib/copyuser_64.S
===================================================================
--- upstream.orig/arch/powerpc/lib/copyuser_64.S
+++ upstream/arch/powerpc/lib/copyuser_64.S
@@ -62,18 +62,19 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
 72:	std	r8,8(r3)
 	beq+	3f
 	addi	r3,r3,16
-23:	ld	r9,8(r4)
 .Ldo_tail:
 	bf	cr7*4+1,1f
-	rotldi	r9,r9,32
+23:	lwz	r9,8(r4)
+	addi	r4,r4,4
 73:	stw	r9,0(r3)
 	addi	r3,r3,4
 1:	bf	cr7*4+2,2f
-	rotldi	r9,r9,16
+44:	lhz	r9,8(r4)
+	addi	r4,r4,2
 74:	sth	r9,0(r3)
 	addi	r3,r3,2
 2:	bf	cr7*4+3,3f
-	rotldi	r9,r9,8
+45:	lbz	r9,8(r4)
 75:	stb	r9,0(r3)
 3:	li	r3,0
 	blr
@@ -141,11 +142,24 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
 6:	cmpwi	cr1,r5,8
 	addi	r3,r3,32
 	sld	r9,r9,r10
-	ble	cr1,.Ldo_tail
+	ble	cr1,7f
 34:	ld	r0,8(r4)
 	srd	r7,r0,r11
 	or	r9,r7,r9
-	b	.Ldo_tail
+7:
+	bf	cr7*4+1,1f
+	rotldi	r9,r9,32
+94:	stw	r9,0(r3)
+	addi	r3,r3,4
+1:	bf	cr7*4+2,2f
+	rotldi	r9,r9,16
+95:	sth	r9,0(r3)
+	addi	r3,r3,2
+2:	bf	cr7*4+3,3f
+	rotldi	r9,r9,8
+96:	stb	r9,0(r3)
+3:	li	r3,0
+	blr
 
 .Ldst_unaligned:
 	PPC_MTOCRF	0x01,r6		/* put #bytes to 8B bdry into cr7 */
@@ -218,7 +232,6 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
 121:
 132:
 	addi	r3,r3,8
-123:
 134:
 135:
 138:
@@ -226,6 +239,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
 140:
 141:
 142:
+123:
+144:
+145:
 
 /*
  * here we have had a fault on a load and r3 points to the first
@@ -309,6 +325,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
 187:
 188:
 189:	
+194:
+195:
+196:
 1:
 	ld	r6,-24(r1)
 	ld	r5,-8(r1)
@@ -329,7 +348,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
 	.llong	72b,172b
 	.llong	23b,123b
 	.llong	73b,173b
+	.llong	44b,144b
 	.llong	74b,174b
+	.llong	45b,145b
 	.llong	75b,175b
 	.llong	24b,124b
 	.llong	25b,125b
@@ -347,6 +368,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
 	.llong	79b,179b
 	.llong	80b,180b
 	.llong	34b,134b
+	.llong	94b,194b
+	.llong	95b,195b
+	.llong	96b,196b
 	.llong	35b,135b
 	.llong	81b,181b
 	.llong	36b,136b

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Crash (ext3 ) during 2.6.29-rc6 boot
  2009-02-25 11:08           ` Sachin P. Sant
@ 2009-02-25 12:13             ` Mark Nelson
  0 siblings, 0 replies; 24+ messages in thread
From: Mark Nelson @ 2009-02-25 12:13 UTC (permalink / raw)
  To: Sachin P. Sant
  Cc: Geert Uytterhoeven, linuxppc-dev, Jan Kara, Jan Kara, Mel Gorman,
	linux-kernel, Paul Mackerras, Andrew Morton, linux-ext4, benh

On Wed, 25 Feb 2009 10:08:22 pm Sachin P. Sant wrote:
> Mark Nelson wrote:
> > Hi Sachin and Geert,
> >
> > Does the patch below fix the problems you're seeing? If it does I'll send
> > a properly written up and formatted patch to linuxppc-dev (as well as
> > another one to fix the same problem in copy_tofrom_user()).
> >   
> This patch fixes the issue on my side. I tried booting the system a few times
> and every single time it came up clean.

Good to hear. Thanks for testing, Sachin!

Mark

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Crash (ext3 ) during 2.6.29-rc6 boot
  2009-02-25 12:10             ` Mark Nelson
@ 2009-02-25 13:31               ` Geert Uytterhoeven
  2009-02-25 22:45                 ` Mark Nelson
  0 siblings, 1 reply; 24+ messages in thread
From: Geert Uytterhoeven @ 2009-02-25 13:31 UTC (permalink / raw)
  To: Mark Nelson
  Cc: linuxppc-dev, Jan Kara, Jan Kara, Mel Gorman, linux-kernel,
	Paul Mackerras, Andrew Morton, linux-ext4

On Wed, 25 Feb 2009, Mark Nelson wrote:
> On Wed, 25 Feb 2009 08:50:46 pm Geert Uytterhoeven wrote:
> > On Wed, 25 Feb 2009, Mark Nelson wrote:
> > > On Tue, 24 Feb 2009 05:38:37 pm Sachin P. Sant wrote:
> > > > Jan Kara wrote:
> > > > >   Hmm, OK. But then I'm not sure how that can happen. Obviously, memcpy
> > > > > somehow got beyond end of the page referenced by bh->b_data. So it means
> > > > > that le16_to_cpu(entry->e_value_offs) + size > page_size. But
> > > > > ext3_xattr_find_entry() calls ext3_xattr_check_entry() which in
> > > > > particular checks whether e_value_offs + e_value_size isn't greater than
> > > > > bh->b_size. So I see no way how memcpy can get beyond end of the page.
> > > > >   Sachin, is the problem reproducible? If yes, can you send us contents
> > > > >   
> > > > Yes, I am able to recreate this problem easily. As I had mentioned, if the
> > > > earlier kernel is booted with selinux enabled and then 2.6.29-rc6 is booted,
> > > > I get this crash. But if I specify selinux=0 on the command line, 2.6.29-rc6 boots
> > > > without any problem.
> > > 
> > > Hi Sachin and Geert,
> > > 
> > > Does the patch below fix the problems you're seeing? If it does I'll send
> > > a properly written up and formatted patch to linuxppc-dev (as well as
> > > another one to fix the same problem in copy_tofrom_user()).
> > 
> > Unfortunately not, now it crashes while accessing the memory pointed to by
> > GPR16, in
> > 
> > NIP: copy_page_range+0x608/0x628
> > LR:  dup_mm+0x2e4/0x428
> > Trace: debug_table+0xcc70/0x1afe0 (unreliable)
> > dup_mm+0x2e4/0x428
> > copy_process+0x86c/0xf9c
> > do_fork+0x188/0x39c
> > sys_clone+0x58/0x70
> > ppc_clone+0x8/0xc
> > 
> > However, after reverting 25d6e2d7c58ddc4a3b614fc5381591c0cfe66556, I still see
> > similar problems as above (crash in copy_page_range()).
> > Which makes me think that
> >   1. Your new patch fixes the problem introduced by 25d6e2d7,
> >   2. There's still another issue besides the one introduced by 25d6e2d7.
> 
> Does the following patch fix the errors you're seeing? (it applies the
> same fix as the previous patch but this time to copy_tofrom_user, which
> I updated in a4e22f02f5b6518c1484faea1f88d81802b9feac)

Thanks, but I still get crashes in copy_page_range().

With kind regards,

Geert Uytterhoeven

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Crash (ext3 ) during 2.6.29-rc6 boot
  2009-02-25 13:31               ` Geert Uytterhoeven
@ 2009-02-25 22:45                 ` Mark Nelson
  2009-02-25 23:20                   ` Mark Nelson
  0 siblings, 1 reply; 24+ messages in thread
From: Mark Nelson @ 2009-02-25 22:45 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: linuxppc-dev, Jan Kara, Jan Kara, Mel Gorman, linux-kernel,
	Paul Mackerras, Andrew Morton, linux-ext4

On Thu, 26 Feb 2009 12:31:20 am Geert Uytterhoeven wrote:
> On Wed, 25 Feb 2009, Mark Nelson wrote:
> > On Wed, 25 Feb 2009 08:50:46 pm Geert Uytterhoeven wrote:
> > > On Wed, 25 Feb 2009, Mark Nelson wrote:
> > > > On Tue, 24 Feb 2009 05:38:37 pm Sachin P. Sant wrote:
> > > > > Jan Kara wrote:
> > > > > >   Hmm, OK. But then I'm not sure how that can happen. Obviously, memcpy
> > > > > > somehow got beyond end of the page referenced by bh->b_data. So it means
> > > > > > that le16_to_cpu(entry->e_value_offs) + size > page_size. But
> > > > > > ext3_xattr_find_entry() calls ext3_xattr_check_entry() which in
> > > > > > particular checks whether e_value_offs + e_value_size isn't greater than
> > > > > > bh->b_size. So I see no way how memcpy can get beyond end of the page.
> > > > > >   Sachin, is the problem reproducible? If yes, can you send us contents
> > > > > >   
> > > > > Yes, I am able to recreate this problem easily. As I had mentioned, if the
> > > > > earlier kernel is booted with selinux enabled and then 2.6.29-rc6 is booted,
> > > > > I get this crash. But if I specify selinux=0 on the command line, 2.6.29-rc6 boots
> > > > > without any problem.
> > > > 
> > > > Hi Sachin and Geert,
> > > > 
> > > > Does the patch below fix the problems you're seeing? If it does I'll send
> > > > a properly written up and formatted patch to linuxppc-dev (as well as
> > > > another one to fix the same problem in copy_tofrom_user()).
> > > 
> > > Unfortunately not, now it crashes while accessing the memory pointed to by
> > > GPR16, in
> > > 
> > > NIP: copy_page_range+0x608/0x628
> > > LR:  dup_mm+0x2e4/0x428
> > > Trace: debug_table+0xcc70/0x1afe0 (unreliable)
> > > dup_mm+0x2e4/0x428
> > > copy_process+0x86c/0xf9c
> > > do_fork+0x188/0x39c
> > > sys_clone+0x58/0x70
> > > ppc_clone+0x8/0xc
> > > 
> > > However, after reverting 25d6e2d7c58ddc4a3b614fc5381591c0cfe66556, I still see
> > > similar problems as above (crash in copy_page_range()).
> > > Which makes me think that
> > >   1. Your new patch fixes the problem introduced by 25d6e2d7,
> > >   2. There's still another issue besides the one introduced by 25d6e2d7.
> > 
> > Does the following patch fix the errors you're seeing? (it applies the
> > same fix as the previous patch but this time to copy_tofrom_user, which
> > I updated in a4e22f02f5b6518c1484faea1f88d81802b9feac)
> 
> Thanks, but I still get crashes in copy_page_range().
> 

Hmmm... I'm out of ideas for the moment, but thanks for testing anyway!

Mark

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Crash (ext3 ) during 2.6.29-rc6 boot
  2009-02-25 22:45                 ` Mark Nelson
@ 2009-02-25 23:20                   ` Mark Nelson
  2009-02-26 17:40                     ` Geert Uytterhoeven
  0 siblings, 1 reply; 24+ messages in thread
From: Mark Nelson @ 2009-02-25 23:20 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Geert Uytterhoeven, Jan Kara, Jan Kara, Mel Gorman, linux-kernel,
	Paul Mackerras, Andrew Morton, linux-ext4

On Thu, 26 Feb 2009 09:45:41 am Mark Nelson wrote:
> On Thu, 26 Feb 2009 12:31:20 am Geert Uytterhoeven wrote:
> > On Wed, 25 Feb 2009, Mark Nelson wrote:
> > > On Wed, 25 Feb 2009 08:50:46 pm Geert Uytterhoeven wrote:
> > > > On Wed, 25 Feb 2009, Mark Nelson wrote:
> > > > > On Tue, 24 Feb 2009 05:38:37 pm Sachin P. Sant wrote:
> > > > > > Jan Kara wrote:
> > > > > > >   Hmm, OK. But then I'm not sure how that can happen. Obviously, memcpy
> > > > > > > somehow got beyond end of the page referenced by bh->b_data. So it means
> > > > > > > that le16_to_cpu(entry->e_value_offs) + size > page_size. But
> > > > > > > ext3_xattr_find_entry() calls ext3_xattr_check_entry() which in
> > > > > > > particular checks whether e_value_offs + e_value_size isn't greater than
> > > > > > > bh->b_size. So I see no way how memcpy can get beyond end of the page.
> > > > > > >   Sachin, is the problem reproducible? If yes, can you send us contents
> > > > > > >   
> > > > > > Yes, I am able to recreate this problem easily. As I had mentioned, if the
> > > > > > earlier kernel is booted with selinux enabled and then 2.6.29-rc6 is booted,
> > > > > > I get this crash. But if I specify selinux=0 on the command line, 2.6.29-rc6 boots
> > > > > > without any problem.
> > > > > 
> > > > > Hi Sachin and Geert,
> > > > > 
> > > > > Does the patch below fix the problems you're seeing? If it does I'll send
> > > > > a properly written up and formatted patch to linuxppc-dev (as well as
> > > > > another one to fix the same problem in copy_tofrom_user()).
> > > > 
> > > > Unfortunately not, now it crashes while accessing the memory pointed to by
> > > > GPR16, in
> > > > 
> > > > NIP: copy_page_range+0x608/0x628
> > > > LR:  dup_mm+0x2e4/0x428
> > > > Trace: debug_table+0xcc70/0x1afe0 (unreliable)
> > > > dup_mm+0x2e4/0x428
> > > > copy_process+0x86c/0xf9c
> > > > do_fork+0x188/0x39c
> > > > sys_clone+0x58/0x70
> > > > ppc_clone+0x8/0xc
> > > > 
> > > > However, after reverting 25d6e2d7c58ddc4a3b614fc5381591c0cfe66556, I still see
> > > > similar problems as above (crash in copy_page_range()).
> > > > Which makes me think that
> > > >   1. Your new patch fixes the problem introduced by 25d6e2d7,
> > > >   2. There's still another issue besides the one introduced by 25d6e2d7.
> > > 
> > > Does the following patch fix the errors you're seeing? (it applies the
> > > same fix as the previous patch but this time to copy_tofrom_user, which
> > > I updated in a4e22f02f5b6518c1484faea1f88d81802b9feac)
> > 
> > Thanks, but I still get crashes in copy_page_range().
> > 
> 
> Hmmm... I'm out of ideas for the moment, but thanks for testing anyway!
> 
> Mark

If you revert both 25d6e2d7c58ddc4a3b614fc5381591c0cfe66556 and
a4e22f02f5b6518c1484faea1f88d81802b9feac, does it help? You could also
try to revert 57dda6ef5bd5b9e60410477ad29e654097e2cca1 just in case I
need to keep wearing the brown paper bag for a bit longer :)

Thanks!

Mark

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH] powerpc: Fix 64bit memcpy() regression
  2009-02-25  6:52         ` Mark Nelson
  2009-02-25  9:50           ` Geert Uytterhoeven
  2009-02-25 11:08           ` Sachin P. Sant
@ 2009-02-25 23:26           ` Mark Nelson
  2009-02-25 23:46             ` [PATCH] powerpc: Fix 64bit __copy_tofrom_user() regression Mark Nelson
  2 siblings, 1 reply; 24+ messages in thread
From: Mark Nelson @ 2009-02-25 23:26 UTC (permalink / raw)
  To: benh
  Cc: linuxppc-dev, Sachin P. Sant, Geert Uytterhoeven, Jan Kara,
	Jan Kara, Mel Gorman, linux-kernel, Paul Mackerras,
	Andrew Morton, linux-ext4

This fixes a regression introduced by commit
25d6e2d7c58ddc4a3b614fc5381591c0cfe66556 ("powerpc: Update 64bit memcpy()
using CPU_FTR_UNALIGNED_LD_STD").

This commit allowed CPUs that have the CPU_FTR_UNALIGNED_LD_STD CPU
feature bit present to do the memcpy() with unaligned load doubles. But,
along with this came a bug where our final load double would read bytes
beyond a page boundary and into the next (unmapped) page. This was caught
by enabling CONFIG_DEBUG_PAGEALLOC.
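
As a rough illustration of the over-read described above (not code from
the patch), the C sketch below checks whether a load of a given width
touches a second page. The helper name read_crosses_page, the 4 KiB page
size, and the source address used in the usage note are assumptions for
the example; only the faulting address 0xc00000003f380000 comes from the
report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when a load of `width` bytes starting
 * at `addr` touches more than one page. `page_size` is an assumption
 * supplied by the caller; it is not taken from the report.
 */
static bool read_crosses_page(uint64_t addr, uint64_t width, uint64_t page_size)
{
    assert(width >= 1 && page_size >= 1);
    /* Compare the page of the first byte with the page of the last byte. */
    return (addr / page_size) != ((addr + width - 1) / page_size);
}
```

With an assumed source pointer of 0xc00000003f37fffc and 4 bytes left to
copy, an 8-byte load reaches into the page at 0xc00000003f380000, while a
4-byte load stays within the current page.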

The fix was to read only the number of bytes that we need to store rather
than reading a full 8-byte doubleword and storing only a portion of that.
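
The idea can be sketched in C as follows. This is only an illustration of
the approach, not the assembly in the patch: copy_tail is a hypothetical
name, and the fixed-size memcpy() calls stand in for the lwz/stw and
lhz/sth pairs. The remaining length (0 to 7 bytes) selects a 4-byte, a
2-byte and a 1-byte copy, so no byte beyond the requested length is read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the final `rem` (0..7) bytes, reading only
 * the bytes that will be stored. Each bit of `rem` selects one step,
 * in the same way the patched tail tests one cr7 bit per step.
 */
static void copy_tail(unsigned char *dst, const unsigned char *src, size_t rem)
{
    assert(rem < 8);
    if (rem & 4) {              /* 4-byte step (lwz/stw in the patch) */
        memcpy(dst, src, 4);
        dst += 4;
        src += 4;
    }
    if (rem & 2) {              /* 2-byte step (lhz/sth in the patch) */
        memcpy(dst, src, 2);
        dst += 2;
        src += 2;
    }
    if (rem & 1)                /* 1-byte step (lbz/stb in the patch) */
        *dst = *src;
}
```

For rem = 7 all three steps run and exactly 7 bytes are read; for rem = 3
only the 2-byte and 1-byte steps run.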

In order to minimise the amount of existing code touched we use the
original do_tail for the src_unaligned case.

Below is an example of the regression, as reported by Sachin Sant:

Unable to handle kernel paging request for data at address 0xc00000003f380000
Faulting instruction address: 0xc000000000039574
cpu 0x1: Vector: 300 (Data Access) at [c00000003baf3020]
    pc: c000000000039574: .memcpy+0x74/0x244
    lr: d00000000244916c: .ext3_xattr_get+0x288/0x2f4 [ext3]
    sp: c00000003baf32a0
   msr: 8000000000009032
   dar: c00000003f380000
 dsisr: 40000000
  current = 0xc00000003e54b010
  paca    = 0xc000000000a53680
    pid   = 1840, comm = readahead
enter ? for help
[link register   ] d00000000244916c .ext3_xattr_get+0x288/0x2f4 [ext3]
[c00000003baf32a0] d000000002449104 .ext3_xattr_get+0x220/0x2f4 [ext3] (unreliable)
[c00000003baf3390] d00000000244a6e8 .ext3_xattr_security_get+0x40/0x5c [ext3]
[c00000003baf3400] c000000000148154 .generic_getxattr+0x74/0x9c
[c00000003baf34a0] c000000000333400 .inode_doinit_with_dentry+0x1c4/0x678
[c00000003baf3560] c00000000032c6b0 .security_d_instantiate+0x50/0x68
[c00000003baf35e0] c00000000013c818 .d_instantiate+0x78/0x9c
[c00000003baf3680] c00000000013ced0 .d_splice_alias+0xf0/0x120
[c00000003baf3720] d00000000243e05c .ext3_lookup+0xec/0x134 [ext3]
[c00000003baf37c0] c000000000131e74 .do_lookup+0x110/0x260
[c00000003baf3880] c000000000134ed0 .__link_path_walk+0xa98/0x1010
[c00000003baf3970] c0000000001354a0 .path_walk+0x58/0xc4
[c00000003baf3a20] c000000000135720 .do_path_lookup+0x138/0x1e4
[c00000003baf3ad0] c00000000013645c .path_lookup_open+0x6c/0xc8
[c00000003baf3b70] c000000000136780 .do_filp_open+0xcc/0x874
[c00000003baf3d10] c0000000001251e0 .do_sys_open+0x80/0x140
[c00000003baf3dc0] c00000000016aaec .compat_sys_open+0x24/0x38
[c00000003baf3e30] c00000000000855c syscall_exit+0x0/0x40
--- Exception: c01 (System Call) at 000000000ff0ef18
SP (ffc6f4b0) is in userspace
1:mon>

Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Reported-by: Sachin Sant <sachinp@in.ibm.com>
Tested-by: Sachin Sant <sachinp@in.ibm.com>
---
 arch/powerpc/lib/memcpy_64.S |   26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

Index: upstream/arch/powerpc/lib/memcpy_64.S
===================================================================
--- upstream.orig/arch/powerpc/lib/memcpy_64.S
+++ upstream/arch/powerpc/lib/memcpy_64.S
@@ -53,18 +53,19 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
 3:	std	r8,8(r3)
 	beq	3f
 	addi	r3,r3,16
-	ld	r9,8(r4)
 .Ldo_tail:
 	bf	cr7*4+1,1f
-	rotldi	r9,r9,32
+	lwz	r9,8(r4)
+	addi	r4,r4,4
 	stw	r9,0(r3)
 	addi	r3,r3,4
 1:	bf	cr7*4+2,2f
-	rotldi	r9,r9,16
+	lhz	r9,8(r4)
+	addi	r4,r4,2
 	sth	r9,0(r3)
 	addi	r3,r3,2
 2:	bf	cr7*4+3,3f
-	rotldi	r9,r9,8
+	lbz	r9,8(r4)
 	stb	r9,0(r3)
 3:	ld	r3,48(r1)	/* return dest pointer */
 	blr
@@ -133,11 +134,24 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
 	cmpwi	cr1,r5,8
 	addi	r3,r3,32
 	sld	r9,r9,r10
-	ble	cr1,.Ldo_tail
+	ble	cr1,6f
 	ld	r0,8(r4)
 	srd	r7,r0,r11
 	or	r9,r7,r9
-	b	.Ldo_tail
+6:
+	bf	cr7*4+1,1f
+	rotldi	r9,r9,32
+	stw	r9,0(r3)
+	addi	r3,r3,4
+1:	bf	cr7*4+2,2f
+	rotldi	r9,r9,16
+	sth	r9,0(r3)
+	addi	r3,r3,2
+2:	bf	cr7*4+3,3f
+	rotldi	r9,r9,8
+	stb	r9,0(r3)
+3:	ld	r3,48(r1)	/* return dest pointer */
+	blr
 
 .Ldst_unaligned:
 	PPC_MTOCRF	0x01,r6		# put #bytes to 8B bdry into cr7


* [PATCH] powerpc: Fix 64bit __copy_tofrom_user() regression
  2009-02-25 23:26           ` [PATCH] powerpc: Fix 64bit memcpy() regression Mark Nelson
@ 2009-02-25 23:46             ` Mark Nelson
  0 siblings, 0 replies; 24+ messages in thread
From: Mark Nelson @ 2009-02-25 23:46 UTC (permalink / raw)
  To: benh
  Cc: linuxppc-dev, Jan Kara, Jan Kara, Mel Gorman, linux-kernel,
	Paul Mackerras, Geert Uytterhoeven, Andrew Morton, linux-ext4

This fixes a regression introduced by commit
a4e22f02f5b6518c1484faea1f88d81802b9feac ("powerpc: Update 64bit
__copy_tofrom_user() using CPU_FTR_UNALIGNED_LD_STD").

The same bug that existed in the 64bit memcpy() also exists here so fix
it here too. The fix is the same as that applied to memcpy() with the
addition of fixes for the exception handling code required for
__copy_tofrom_user().

This stops us reading beyond the end of the source region we were told
to copy.

Signed-off-by: Mark Nelson <markn@au1.ibm.com>
---
 arch/powerpc/lib/copyuser_64.S |   38 +++++++++++++++++++++++++++++++-------
 1 file changed, 31 insertions(+), 7 deletions(-)

Index: upstream/arch/powerpc/lib/copyuser_64.S
===================================================================
--- upstream.orig/arch/powerpc/lib/copyuser_64.S
+++ upstream/arch/powerpc/lib/copyuser_64.S
@@ -62,18 +62,19 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
 72:	std	r8,8(r3)
 	beq+	3f
 	addi	r3,r3,16
-23:	ld	r9,8(r4)
 .Ldo_tail:
 	bf	cr7*4+1,1f
-	rotldi	r9,r9,32
+23:	lwz	r9,8(r4)
+	addi	r4,r4,4
 73:	stw	r9,0(r3)
 	addi	r3,r3,4
 1:	bf	cr7*4+2,2f
-	rotldi	r9,r9,16
+44:	lhz	r9,8(r4)
+	addi	r4,r4,2
 74:	sth	r9,0(r3)
 	addi	r3,r3,2
 2:	bf	cr7*4+3,3f
-	rotldi	r9,r9,8
+45:	lbz	r9,8(r4)
 75:	stb	r9,0(r3)
 3:	li	r3,0
 	blr
@@ -141,11 +142,24 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
 6:	cmpwi	cr1,r5,8
 	addi	r3,r3,32
 	sld	r9,r9,r10
-	ble	cr1,.Ldo_tail
+	ble	cr1,7f
 34:	ld	r0,8(r4)
 	srd	r7,r0,r11
 	or	r9,r7,r9
-	b	.Ldo_tail
+7:
+	bf	cr7*4+1,1f
+	rotldi	r9,r9,32
+94:	stw	r9,0(r3)
+	addi	r3,r3,4
+1:	bf	cr7*4+2,2f
+	rotldi	r9,r9,16
+95:	sth	r9,0(r3)
+	addi	r3,r3,2
+2:	bf	cr7*4+3,3f
+	rotldi	r9,r9,8
+96:	stb	r9,0(r3)
+3:	li	r3,0
+	blr
 
 .Ldst_unaligned:
 	PPC_MTOCRF	0x01,r6		/* put #bytes to 8B bdry into cr7 */
@@ -218,7 +232,6 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
 121:
 132:
 	addi	r3,r3,8
-123:
 134:
 135:
 138:
@@ -226,6 +239,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
 140:
 141:
 142:
+123:
+144:
+145:
 
 /*
  * here we have had a fault on a load and r3 points to the first
@@ -309,6 +325,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
 187:
 188:
 189:	
+194:
+195:
+196:
 1:
 	ld	r6,-24(r1)
 	ld	r5,-8(r1)
@@ -329,7 +348,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
 	.llong	72b,172b
 	.llong	23b,123b
 	.llong	73b,173b
+	.llong	44b,144b
 	.llong	74b,174b
+	.llong	45b,145b
 	.llong	75b,175b
 	.llong	24b,124b
 	.llong	25b,125b
@@ -347,6 +368,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
 	.llong	79b,179b
 	.llong	80b,180b
 	.llong	34b,134b
+	.llong	94b,194b
+	.llong	95b,195b
+	.llong	96b,196b
 	.llong	35b,135b
 	.llong	81b,181b
 	.llong	36b,136b


* Re: Crash (ext3 ) during 2.6.29-rc6 boot
  2009-02-25 23:20                   ` Mark Nelson
@ 2009-02-26 17:40                     ` Geert Uytterhoeven
  0 siblings, 0 replies; 24+ messages in thread
From: Geert Uytterhoeven @ 2009-02-26 17:40 UTC (permalink / raw)
  To: Mark Nelson
  Cc: linuxppc-dev, Jan Kara, Jan Kara, Mel Gorman, linux-kernel,
	Paul Mackerras, Andrew Morton, linux-ext4

On Thu, 26 Feb 2009, Mark Nelson wrote:
> On Thu, 26 Feb 2009 09:45:41 am Mark Nelson wrote:
> > On Thu, 26 Feb 2009 12:31:20 am Geert Uytterhoeven wrote:
> > > On Wed, 25 Feb 2009, Mark Nelson wrote:
> > > > Does the following patch fix the errors you're seeing? (it applies the
> > > > same fix as the previous patch but this time to copy_tofrom_user, which
> > > > I updated in a4e22f02f5b6518c1484faea1f88d81802b9feac)
> > > 
> > > Thanks, but I still get crashes in copy_page_range().
> > 
> > Hmmm... I'm out of ideas for the moment, but thanks for testing anyway!
> 
> If you revert both 25d6e2d7c58ddc4a3b614fc5381591c0cfe66556 and
> a4e22f02f5b6518c1484faea1f88d81802b9feac, does it help? You could also
> try to revert 57dda6ef5bd5b9e60410477ad29e654097e2cca1 just in case I
> need to keep wearing the brown paper bag for a bit longer :)

Still doesn't help.

However, I noticed I never enabled CONFIG_DEBUG_PAGEALLOC before 2.6.29-rc5.
So far I tried 2.6.2[5-8], and they all crash with CONFIG_DEBUG_PAGEALLOC.
I guess it never actually worked on PS3.

With kind regards,

Geert Uytterhoeven
Software Architect

Sony Techsoft Centre Europe
The Corporate Village · Da Vincilaan 7-D1 · B-1935 Zaventem · Belgium

Phone:    +32 (0)2 700 8453
Fax:      +32 (0)2 700 8622
E-mail:   Geert.Uytterhoeven@sonycom.com
Internet: http://www.sony-europe.com/

A division of Sony Europe (Belgium) N.V.
VAT BE 0413.825.160 · RPR Brussels
Fortis · BIC GEBABEBB · IBAN BE41293037680010


end of thread, other threads:[~2009-02-26 17:40 UTC | newest]

Thread overview: 24+ messages
2009-02-23  9:46 Crash (ext3 ) during 2.6.29-rc6 boot Sachin P. Sant
2009-02-23 10:13 ` Andrew Morton
2009-02-23 10:32   ` Paul Mackerras
2009-02-23 10:57     ` Sachin P. Sant
2009-02-23 15:51     ` Jan Kara
2009-02-24  6:38       ` Sachin P. Sant
2009-02-24 15:51         ` Jan Kara
2009-02-25  1:20           ` Mark Nelson
2009-02-25  6:52         ` Mark Nelson
2009-02-25  9:50           ` Geert Uytterhoeven
2009-02-25 12:10             ` Mark Nelson
2009-02-25 13:31               ` Geert Uytterhoeven
2009-02-25 22:45                 ` Mark Nelson
2009-02-25 23:20                   ` Mark Nelson
2009-02-26 17:40                     ` Geert Uytterhoeven
2009-02-25 11:08           ` Sachin P. Sant
2009-02-25 12:13             ` Mark Nelson
2009-02-25 23:26           ` [PATCH] powerpc: Fix 64bit memcpy() regression Mark Nelson
2009-02-25 23:46             ` [PATCH] powerpc: Fix 64bit __copy_tofrom_user() regression Mark Nelson
2009-02-24 18:01     ` Crash (ext3 ) during 2.6.29-rc6 boot Geert Uytterhoeven
2009-02-25  1:27       ` Mark Nelson
2009-02-25 10:50         ` Geert Uytterhoeven
2009-02-23 10:48   ` Sachin P. Sant
2009-02-24 16:14     ` Jan Kara
