* Oops at xfs_bmbt_get_startoff in SLES 10 2.6.16
@ 2009-03-12 18:50 Kevin Jamieson
  2009-03-12 19:23 ` Eric Sandeen
  0 siblings, 1 reply; 14+ messages in thread
From: Kevin Jamieson @ 2009-03-12 18:50 UTC (permalink / raw)
  To: xfs

Hello,

We have triggered the below oops in XFS several times on the SLES 10 SP2
kernel (2.6.16.60-0.21-smp). We've also seen it on earlier SLES 10 kernels
(2.6.16.21-0.8-smp).

The oops seems to occur with an application that backs up largish (~ 1GB)
files to the XFS file system over NFS, although it is not easily
reproducible (it happens about once a week on our test system). We have
run xfs_repair a few times afterwards, and it did not detect any problems
with the file system.

We will be reporting this to Novell support, of course, but I thought I'd
post it here too in case anyone had any ideas or has seen this before.

Thanks,
Kevin

Mar 11 13:10:26 gn1 kernel: Unable to handle kernel paging request at virtual address 0301c39c
Mar 11 13:10:26 gn1 kernel:  printing eip:
Mar 11 13:10:26 gn1 kernel: f95899ab
Mar 11 13:10:26 gn1 kernel: *pde = 00000000
Mar 11 13:10:26 gn1 kernel: Oops: 0000 [#1]
Mar 11 13:10:26 gn1 kernel: SMP
Mar 11 13:10:26 gn1 kernel: last sysfs file: /devices/pci0000:00/0000:00:06.0/0000:17:00.0/host1/target1:0:3/1:0:3:0/type
Mar 11 13:10:26 gn1 kernel: Modules linked in: nfsd exportfs lockd nfs_acl sunrpc af_packet xfs_quota sg qla2xxx_conf qla2xxx intermodule bnx2 xt_pkttype ipt_LOG xt_limit bonding ip6t_REJECT xt_tcpudp ipt_REJECT xt_state iptable_mangle iptable_nat ip_nat iptable_filter ip6table_mangle ip_conntrack nfnetlink ip_tables ip6table_filter ip6_tables x_tables ipv6 loop xfs_dmapi xfs dmapi dm_mod usbhid hw_random shpchp pci_hotplug uhci_hcd ehci_hcd ide_cd cdrom usbcore e1000 ext3 jbd ata_piix ahci libata edd fan thermal processor firmware_class scsi_transport_fc cciss piix sd_mod scsi_mod ide_disk ide_core
Mar 11 13:10:26 gn1 kernel: CPU:    0
Mar 11 13:10:26 gn1 kernel: EIP:    0060:[<f95899ab>]    Tainted: G     U VLI
Mar 11 13:10:26 gn1 kernel: EFLAGS: 00010202   (2.6.16.60-0.21-smp #1)
Mar 11 13:10:26 gn1 kernel: EIP is at xfs_bmbt_get_startoff+0x2/0x15 [xfs]
Mar 11 13:10:26 gn1 kernel: eax: 0301c398   ebx: ec9fc6d0   ecx: 0301c398   edx: 00001c39
Mar 11 13:10:26 gn1 kernel: esi: ec9fc680   edi: 0301c398   ebp: d9c1fe40   esp: d9c1fe24
Mar 11 13:10:26 gn1 kernel: ds: 007b   es: 007b   ss: 0068
Mar 11 13:10:26 gn1 kernel: Process nfsd (pid: 2178, threadinfo=d9c1e000 task=dfc80910)
Mar 11 13:10:26 gn1 kernel: Stack: <0>f957eb34 ec9fc680 ec9fc680 f651b000 00000000 f95a2f4e 00000000 dfc80910
Mar 11 13:10:26 gn1 kernel:        00100100 ec9fc680 ec9fc680 2f14a000 f95a3045 00000001 d0bcd800 ec9fc680
Mar 11 13:10:26 gn1 kernel:        00000000 0002f14a 00000000 f95bae44 2f14a000 00000000 00000fff 00000000
Mar 11 13:10:26 gn1 kernel: Call Trace:
Mar 11 13:10:26 gn1 kernel:  [<f957eb34>] xfs_bmap_last_offset+0xc0/0xdc [xfs]
Mar 11 13:10:26 gn1 kernel:  [<f95a2f4e>] xfs_file_last_byte+0x1e/0xbc [xfs]
Mar 11 13:10:26 gn1 kernel:  [<f95a3045>] xfs_itruncate_start+0x59/0xa8 [xfs]
Mar 11 13:10:26 gn1 kernel:  [<f95bae44>] xfs_free_eofblocks+0x17a/0x276 [xfs]
Mar 11 13:10:26 gn1 kernel:  [<f95bf921>] xfs_release+0x115/0x175 [xfs]
Mar 11 13:10:26 gn1 kernel:  [<f95c51e7>] xfs_file_release+0x13/0x1a [xfs]
Mar 11 13:10:26 gn1 kernel:  [<c01645cd>] __fput+0x9d/0x170
Mar 11 13:10:26 gn1 kernel:  [<f9b588be>] nfsd_write+0xb6/0xbf [nfsd]
Mar 11 13:10:26 gn1 kernel:  [<f9b5ee1a>] nfsd3_proc_write+0xd0/0xe7 [nfsd]
Mar 11 13:10:26 gn1 kernel:  [<f973c76b>] svcauth_unix_accept+0xe1/0x18e [sunrpc]
Mar 11 13:10:26 gn1 kernel:  [<f9b550cb>] nfsd_dispatch+0xbb/0x16c [nfsd]
Mar 11 13:10:26 gn1 kernel:  [<f9739518>] svc_process+0x388/0x616 [sunrpc]
Mar 11 13:10:26 gn1 kernel:  [<f9b55578>] nfsd+0x197/0x2ff [nfsd]
Mar 11 13:10:26 gn1 kernel:  [<f9b553e1>] nfsd+0x0/0x2ff [nfsd]
Mar 11 13:10:26 gn1 kernel:  [<c0102005>] kernel_thread_helper+0x5/0xb
Mar 11 13:10:26 gn1 kernel: Code: 53 89 c1 8b 00 31 d2 8b 59 0c 8b 49 08 25 ff 01 00 00 89 c2 b8 00 00 00 00 0f ac d9 15 c1 e2 0b 09 c8 c1 eb 15 09 da 5b c3 89 c1 <8b> 51 04 8b 00 81 e2 ff ff ff 7f 0f ac d0 09 c1 ea 09 c3 57 56


* Re: Oops at xfs_bmbt_get_startoff in SLES 10 2.6.16
  2009-03-12 18:50 Oops at xfs_bmbt_get_startoff in SLES 10 2.6.16 Kevin Jamieson
@ 2009-03-12 19:23 ` Eric Sandeen
  2009-03-12 23:13   ` Kevin Jamieson
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Sandeen @ 2009-03-12 19:23 UTC (permalink / raw)
  To: kevin; +Cc: xfs

Kevin Jamieson wrote:
> Hello,
> 
> We have triggered the below oops in XFS several times on the SLES 10 SP2
> kernel (2.6.16.60-0.21-smp). We've also seen it on earlier SLES 10 kernels
> (2.6.16.21-0.8-smp).
> 
> The oops seems to occur with an application that backs up largish (~ 1GB)
> files to the XFS file system over NFS, although it is not easily
> reproducible (it happens about once a week on our test system). We have
> run xfs_repair a few times afterwards, and it did not detect any problems
> with the file system.
> 
> We will be reporting this to Novell support, of course, but I thought I'd
> post it here too in case anyone had any ideas or has seen this before.

For SLES that usually is the best route...

However, http://oss.sgi.com/archives/xfs/2009-02/msg00220.html looks
applicable... don't think it ever got merged though.

perhaps you could test it?

-Eric


* Re: Oops at xfs_bmbt_get_startoff in SLES 10 2.6.16
  2009-03-12 19:23 ` Eric Sandeen
@ 2009-03-12 23:13   ` Kevin Jamieson
  2009-04-22  0:54     ` Kevin Jamieson
  0 siblings, 1 reply; 14+ messages in thread
From: Kevin Jamieson @ 2009-03-12 23:13 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

On Thu, March 12, 2009 12:23 pm, Eric Sandeen wrote:

> For SLES that usually is the best route...
>
> However, http://oss.sgi.com/archives/xfs/2009-02/msg00220.html looks
> applicable... don't think it ever got merged though.
>
> perhaps you could test it?

Thanks, Eric. I will test Lachlan's patch on our system.

Kevin


* Re: Oops at xfs_bmbt_get_startoff in SLES 10 2.6.16
  2009-03-12 23:13   ` Kevin Jamieson
@ 2009-04-22  0:54     ` Kevin Jamieson
  2009-04-22  2:16       ` Eric Sandeen
  0 siblings, 1 reply; 14+ messages in thread
From: Kevin Jamieson @ 2009-04-22  0:54 UTC (permalink / raw)
  To: kevin; +Cc: xfs

On Thu, March 12, 2009 4:13 pm, Kevin Jamieson wrote:
> On Thu, March 12, 2009 12:23 pm, Eric Sandeen wrote:
>
>> For SLES that usually is the best route...
>>
>> However, http://oss.sgi.com/archives/xfs/2009-02/msg00220.html looks
>> applicable... don't think it ever got merged though.
>>
>> perhaps you could test it?
>
> Thanks, Eric. I will test Lachlan's patch on our system.

To follow this up, since applying the patch from the above thread there
have been no recurrences of the issue on our test servers over the past
month.

Regards,
Kevin


* Re: Oops at xfs_bmbt_get_startoff in SLES 10 2.6.16
  2009-04-22  0:54     ` Kevin Jamieson
@ 2009-04-22  2:16       ` Eric Sandeen
  2009-04-22  2:31         ` Felix Blyakher
  2009-04-23  6:18         ` Kevin Jamieson
  0 siblings, 2 replies; 14+ messages in thread
From: Eric Sandeen @ 2009-04-22  2:16 UTC (permalink / raw)
  To: kevin; +Cc: xfs

Kevin Jamieson wrote:
> On Thu, March 12, 2009 4:13 pm, Kevin Jamieson wrote:
>> On Thu, March 12, 2009 12:23 pm, Eric Sandeen wrote:
>>
>>> For SLES that usually is the best route...
>>>
>>> However, http://oss.sgi.com/archives/xfs/2009-02/msg00220.html looks
>>> applicable... don't think it ever got merged though.
>>>
>>> perhaps you could test it?
>> Thanks, Eric. I will test Lachlan's patch on our system.
> 
> To follow this up, since applying the patch from the above thread there
> have been no recurrences of the issue on our test servers over the past
> month.

And you hit it pretty reliably before, right?  Sounds like we need to
give that a pretty strong eyeball and get it merged, perhaps.

-Eric


* Re: Oops at xfs_bmbt_get_startoff in SLES 10 2.6.16
  2009-04-22  2:16       ` Eric Sandeen
@ 2009-04-22  2:31         ` Felix Blyakher
  2009-04-22  2:32           ` Eric Sandeen
  2009-04-23  6:24           ` Kevin Jamieson
  2009-04-23  6:18         ` Kevin Jamieson
  1 sibling, 2 replies; 14+ messages in thread
From: Felix Blyakher @ 2009-04-22  2:31 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs


On Apr 21, 2009, at 9:16 PM, Eric Sandeen wrote:

> Kevin Jamieson wrote:
>> To follow this up, since applying the patch from the above thread there
>> have been no recurrences of the issue on our test servers over the past
>> month.
>
> And you hit it pretty reliably before, right?  Sounds like we need to
> give that a pretty strong eyeball and get it merged, perhaps.

I was looking at this patch too, but I could never reproduce the
problem, even with Lachlan's test program. Kevin, any idea what kind of
I/O load triggered this problem? The patch looks right, but I really
want to prove the problem exists and that the patch addresses it.

Felix


* Re: Oops at xfs_bmbt_get_startoff in SLES 10 2.6.16
  2009-04-22  2:31         ` Felix Blyakher
@ 2009-04-22  2:32           ` Eric Sandeen
  2009-04-22  2:35             ` Felix Blyakher
  2009-04-23  6:24           ` Kevin Jamieson
  1 sibling, 1 reply; 14+ messages in thread
From: Eric Sandeen @ 2009-04-22  2:32 UTC (permalink / raw)
  To: Felix Blyakher; +Cc: xfs

Felix Blyakher wrote:

> I was looking at this patch too, but I could never reproduce the
> problem, even with Lachlan's test program. Kevin, any idea what kind of
> I/O load triggered this problem? The patch looks right, but I really
> want to prove the problem exists and that the patch addresses it.

FWIW I can't reproduce either, with the stated commandline.

Should try it with a 1k blocksize, though - maybe Lachlan tested 4k on
16k page ia64?
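
(For instance, recreating the scratch filesystem with something like
"mkfs.xfs -b size=1024 <device>" -- an illustrative invocation, not the
exact setup anyone tested.)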

-Eric


* Re: Oops at xfs_bmbt_get_startoff in SLES 10 2.6.16
  2009-04-22  2:32           ` Eric Sandeen
@ 2009-04-22  2:35             ` Felix Blyakher
  2009-04-22  2:38               ` Eric Sandeen
  0 siblings, 1 reply; 14+ messages in thread
From: Felix Blyakher @ 2009-04-22  2:35 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs


On Apr 21, 2009, at 9:32 PM, Eric Sandeen wrote:

> FWIW I can't reproduce either, with the stated commandline.
>
> Should try it with a 1k blocksize, though - maybe Lachlan tested 4k on
> 16k page ia64?

That's what I've tested on.

Felix


* Re: Oops at xfs_bmbt_get_startoff in SLES 10 2.6.16
  2009-04-22  2:35             ` Felix Blyakher
@ 2009-04-22  2:38               ` Eric Sandeen
  0 siblings, 0 replies; 14+ messages in thread
From: Eric Sandeen @ 2009-04-22  2:38 UTC (permalink / raw)
  To: Felix Blyakher; +Cc: xfs

Felix Blyakher wrote:
> On Apr 21, 2009, at 9:32 PM, Eric Sandeen wrote:
> 
>> FWIW I can't reproduce either, with the stated commandline.
>>
>> Should try it with a 1k blocksize, though - maybe Lachlan tested 4k on
>> 16k page ia64?
> 
> That's what I've tested on.

Ah, well, I just spoke with Lachlan and he said he tested on x86_64,
4k/4k.  So hrm...

-Eric


* Re: Oops at xfs_bmbt_get_startoff in SLES 10 2.6.16
  2009-04-22  2:16       ` Eric Sandeen
  2009-04-22  2:31         ` Felix Blyakher
@ 2009-04-23  6:18         ` Kevin Jamieson
  1 sibling, 0 replies; 14+ messages in thread
From: Kevin Jamieson @ 2009-04-23  6:18 UTC (permalink / raw)
  To: xfs

Eric Sandeen wrote:

> And you hit it pretty reliably before, right?

Yes, fairly regularly -- about 2-3 times per week under the same workload.

Kevin


* Re: Oops at xfs_bmbt_get_startoff in SLES 10 2.6.16
  2009-04-22  2:31         ` Felix Blyakher
  2009-04-22  2:32           ` Eric Sandeen
@ 2009-04-23  6:24           ` Kevin Jamieson
  1 sibling, 0 replies; 14+ messages in thread
From: Kevin Jamieson @ 2009-04-23  6:24 UTC (permalink / raw)
  To: xfs

Felix Blyakher wrote:

> But I could never reproduce the problem, even with Lachlan's test
> program.

I was not able to reproduce the problem with Lachlan's test program 
either (although I did not experiment much with the parameters).

> Kevin, any idea what kind of io load triggered this problem?

The workload consisted of predominantly (98%) 2-3MB file ingests through 
an NFS share, with a few (2%) 1GB file ingests, where the issue seemed 
to trigger on the larger files.
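
In case it helps with reproduction attempts, here is a rough sketch of a
generator for that kind of mix. The sizes, the 2% ratio, and the file
names are assumptions from the description above, not our actual ingest
application; it writes into the current directory, which would need to
be the NFS-mounted share.

/*
 * Hypothetical load generator: mostly ~2MB files with an occasional
 * ~1GB file, mimicking the ingest mix described above.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define SMALL_SZ (2L * 1024 * 1024)		/* ~2MB ingest */
#define LARGE_SZ (1024L * 1024 * 1024)		/* ~1GB ingest */

static void ingest(const char *path, long size)
{
	static char buf[1 << 20];	/* zero-filled 1MB write chunks */
	int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
	long done;

	if (fd < 0) {
		perror("open");
		exit(1);
	}
	for (done = 0; done < size; done += sizeof(buf))
		if (write(fd, buf, sizeof(buf)) < 0) {
			perror("write");
			exit(1);
		}
	close(fd);	/* last close -> fput() -> xfs_release() */
}

int main(void)
{
	char name[64];
	int i;

	for (i = 0; ; i++) {
		snprintf(name, sizeof(name), "ingest.%d", i);
		/* roughly 2% of the ingests are the large files */
		ingest(name, (i % 50 == 0) ? LARGE_SZ : SMALL_SZ);
	}
	return 0;
}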

Kevin


* Re: Oops at xfs_bmbt_get_startoff in SLES 10 2.6.16
  2009-04-23  9:47   ` Christoph Hellwig
@ 2009-04-24  2:43     ` Lachlan McIlroy
  0 siblings, 0 replies; 14+ messages in thread
From: Lachlan McIlroy @ 2009-04-24  2:43 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Eric Sandeen, xfs


----- "Christoph Hellwig" <hch@infradead.org> wrote:

> Lachlan,
> 
> any chance we could get you to resubmit the minimal locking fix for
> now?

Okay, done.  If people still see problems then we should take the
rest of the patch too (of course I would suggest taking it anyway
but it needs more soak testing).

> We can play with the test program then to find the cause of the other
> corruptions.

I suggest increasing or decreasing the -l argument until you can
reproduce the problem.  Timing it so that the file's data is being
flushed to disk (and therefore delayed allocations being converted
and extents merged) at the same time the file is closed (i.e. when
the fput() is done) can be tricky.

It may be easier to reproduce if another program, running concurrently
with the test program, simply opens and closes the file repeatedly, as
in the sketch below.
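
A minimal sketch of such a helper, assuming the target file already
exists; the path handling and the tight loop are illustrative, not the
actual test harness:

/*
 * Hypothetical open/close looper.  Each close() drops the last
 * reference to its struct file, so every iteration drives
 * fput() -> xfs_release() -> xfs_free_eofblocks() while the test
 * program is still dirtying and flushing the file.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : "testfile";

	for (;;) {
		int fd = open(path, O_RDWR);

		if (fd < 0) {
			perror("open");
			return 1;
		}
		close(fd);
	}
	return 0;
}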


* Re: Oops at xfs_bmbt_get_startoff in SLES 10 2.6.16
  2009-04-22  3:06 ` Lachlan McIlroy
@ 2009-04-23  9:47   ` Christoph Hellwig
  2009-04-24  2:43     ` Lachlan McIlroy
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2009-04-23  9:47 UTC (permalink / raw)
  To: Lachlan McIlroy; +Cc: Eric Sandeen, xfs


Lachlan,

any chance we could get you to resubmit the minimal locking fix for now?

We can play with the test program then to find the cause of the other
corruptions.  We also still have that non-freed attr fork patch that was
somewhat related which we still need to get done :P


* Re: Oops at xfs_bmbt_get_startoff in SLES 10 2.6.16
       [not found] <1416563271.242851240369384712.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com>
@ 2009-04-22  3:06 ` Lachlan McIlroy
  2009-04-23  9:47   ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Lachlan McIlroy @ 2009-04-22  3:06 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

----- "Eric Sandeen" <sandeen@sandeen.net> wrote:

> Ah, well, I just spoke with Lachlan and he said he tested on x86_64,
> 4k/4k.  So hrm...

You'll probably need to tweak the arguments to the test program to
generate the precise scenario to trigger the race.  I remember having
to play around with them until I got it to crash reliably.  It will
depend on how fast your CPUs are, how much of the file is cached before
it is paged to disk, how fast the disks are, etc...

The race requires a thread to be executing xfs_file_last_byte() while
another thread is modifying the file's extent map - in particular
shrinking the extent map by merging extents.
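
To illustrate, here is a hypothetical interleaving consistent with the
call trace in the original report.  The close-path names are taken from
the oops; the writeback side is my assumption, per the description
above, not the actual SLES code path:

/*
 *   nfsd thread (close path)           writeback thread
 *   ------------------------           ----------------
 *   xfs_release()
 *     xfs_free_eofblocks()
 *       xfs_itruncate_start()
 *         xfs_file_last_byte()
 *           xfs_bmap_last_offset()
 *             fetches a pointer into   converts delayed allocations,
 *             the in-core extent       merges adjacent extents, and
 *             list without holding     shrinks/moves the in-core
 *             the ilock                extent list
 *           xfs_bmbt_get_startoff()
 *             dereferences the now-
 *             stale extent pointer
 *             -> oops
 */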


Thread overview: 14+ messages
2009-03-12 18:50 Oops at xfs_bmbt_get_startoff in SLES 10 2.6.16 Kevin Jamieson
2009-03-12 19:23 ` Eric Sandeen
2009-03-12 23:13   ` Kevin Jamieson
2009-04-22  0:54     ` Kevin Jamieson
2009-04-22  2:16       ` Eric Sandeen
2009-04-22  2:31         ` Felix Blyakher
2009-04-22  2:32           ` Eric Sandeen
2009-04-22  2:35             ` Felix Blyakher
2009-04-22  2:38               ` Eric Sandeen
2009-04-23  6:24           ` Kevin Jamieson
2009-04-23  6:18         ` Kevin Jamieson
     [not found] <1416563271.242851240369384712.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com>
2009-04-22  3:06 ` Lachlan McIlroy
2009-04-23  9:47   ` Christoph Hellwig
2009-04-24  2:43     ` Lachlan McIlroy
