* Re: Oops at xfs_bmbt_get_startoff in SLES 10 2.6.16
From: Lachlan McIlroy @ 2009-04-22  3:06 UTC
To: Eric Sandeen; +Cc: xfs

----- "Eric Sandeen" <sandeen@sandeen.net> wrote:
> Felix Blyakher wrote:
> > On Apr 21, 2009, at 9:32 PM, Eric Sandeen wrote:
> >
> >> Felix Blyakher wrote:
> >>> On Apr 21, 2009, at 9:16 PM, Eric Sandeen wrote:
> >>>
> >>>> Kevin Jamieson wrote:
> >>>>> On Thu, March 12, 2009 4:13 pm, Kevin Jamieson wrote:
> >>>>>> On Thu, March 12, 2009 12:23 pm, Eric Sandeen wrote:
> >>>>>>
> >>>>>>> For SLES that usually is the best route...
> >>>>>>>
> >>>>>>> However, http://oss.sgi.com/archives/xfs/2009-02/msg00220.html
> >>>>>>> looks applicable... don't think it ever got merged though.
> >>>>>>>
> >>>>>>> perhaps you could test it?
> >>>>>> Thanks, Eric. I will test Lachlan's patch on our system.
> >>>>> To follow this up, since applying the patch from the above thread
> >>>>> there have been no re-occurrences of the issue on our test
> >>>>> servers over the past month.
> >>>> And you hit it pretty reliably before, right?  Sounds like we
> >>>> need to give that a pretty strong eyeball and get it merged,
> >>>> perhaps.
> >>> I was looking at this patch too.
> >>> But I could never reproduce the problem, even with Lachlan's test
> >>> program. Kevin, any idea what kind of io load triggered this
> >>> problem?  The patch looks right, but I really want to prove the
> >>> problem exists, and the patch addresses it.
> >>>
> >>> Felix
> >>>
> >> FWIW I can't reproduce either, with the stated commandline.
> >>
> >> Should try it with a 1k blocksize, though - maybe Lachlan tested
> >> 4k on 16k page ia64?
> >
> > That's what I've tested on.
>
> Ah, well, I just spoke with Lachlan and he said he tested on x86_64,
> 4k/4k.  So hrm...

You'll probably need to tweak the arguments to the test program to
generate the precise scenario that triggers the race.  I remember having
to play around with them until I got it to crash reliably.  It will
depend on how fast your CPUs are, how much of the file is cached before
it is paged to disk, how fast the disks are, etc...  The race requires a
thread to be executing xfs_file_last_byte() while another thread is
modifying the file's extent map - in particular, shrinking the extent
map by merging extents.

> -Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* Re: Oops at xfs_bmbt_get_startoff in SLES 10 2.6.16
From: Christoph Hellwig @ 2009-04-23  9:47 UTC
To: Lachlan McIlroy; +Cc: Eric Sandeen, xfs

Lachlan,

any chance we could get you to resubmit the minimal locking fix for
now?  We can then play with the test program to track down the cause of
the other corruptions.  We also still have that somewhat related
non-freed attr fork patch that we still need to get done :P

* Re: Oops at xfs_bmbt_get_startoff in SLES 10 2.6.16
From: Lachlan McIlroy @ 2009-04-24  2:43 UTC
To: Christoph Hellwig; +Cc: Eric Sandeen, xfs

----- "Christoph Hellwig" <hch@infradead.org> wrote:
> Lachlan,
>
> any chance we could get you to resubmit the minimal locking fix for
> now?

Okay, done.  If people still see problems then we should take the rest
of the patch too (of course I would suggest taking it anyway, but it
needs more soak testing).

> We can then play with the test program to track down the cause of the
> other corruptions.

I suggest increasing or decreasing the -l argument until you can
reproduce the problem.  Timing it so that the file's data is being
flushed to disk (and therefore delayed allocations being converted and
extents merged) at the same time the file is closed (ie the fput() is
done) can be tricky.  It may be easier to reproduce if another program,
running concurrently with the test program, simply opens and closes the
file repeatedly.

> We also still have that somewhat related non-freed attr fork patch
> that we still need to get done :P

* Oops at xfs_bmbt_get_startoff in SLES 10 2.6.16
From: Kevin Jamieson @ 2009-03-12 18:50 UTC
To: xfs

Hello,

We have triggered the below oops in XFS several times on the SLES 10 SP2
kernel (2.6.16.60-0.21-smp).  We've also seen it on earlier SLES 10
kernels (2.6.16.21-0.8-smp).

The oops seems to occur with an application that backs up largish
(~1GB) files to the XFS file system over NFS, although it is not easily
reproducible (it happens about once a week on our test system).  We have
run xfs_repair a few times afterwards, and it did not detect any
problems with the file system.

We will be reporting this to Novell support, of course, but I thought
I'd post it here too in case anyone had any ideas or has seen this
before.

Thanks,
Kevin

Mar 11 13:10:26 gn1 kernel: Unable to handle kernel paging request at virtual address 0301c39c
Mar 11 13:10:26 gn1 kernel: printing eip:
Mar 11 13:10:26 gn1 kernel: f95899ab
Mar 11 13:10:26 gn1 kernel: *pde = 00000000
Mar 11 13:10:26 gn1 kernel: Oops: 0000 [#1]
Mar 11 13:10:26 gn1 kernel: SMP
Mar 11 13:10:26 gn1 kernel: last sysfs file: /devices/pci0000:00/0000:00:06.0/0000:17:00.0/host1/target1:0:3/1:0:3:0/type
Mar 11 13:10:26 gn1 kernel: Modules linked in: nfsd exportfs lockd nfs_acl sunrpc af_packet xfs_quota sg qla2xxx_conf qla2xxx intermodule bnx2 xt_pkttype ipt_LOG xt_limit bonding ip6t_REJECT xt_tcpudp ipt_REJECT xt_state iptable_mangle iptable_nat ip_nat iptable_filter ip6table_mangle ip_conntrack nfnetlink ip_tables ip6table_filter ip6_tables x_tables ipv6 loop xfs_dmapi xfs dmapi dm_mod usbhid hw_random shpchp pci_hotplug uhci_hcd ehci_hcd ide_cd cdrom usbcore e1000 ext3 jbd ata_piix ahci libata edd fan thermal processor firmware_class scsi_transport_fc cciss piix sd_mod scsi_mod ide_disk ide_core
Mar 11 13:10:26 gn1 kernel: CPU: 0
Mar 11 13:10:26 gn1 kernel: EIP: 0060:[<f95899ab>] Tainted: G U VLI
Mar 11 13:10:26 gn1 kernel: EFLAGS: 00010202 (2.6.16.60-0.21-smp #1)
Mar 11 13:10:26 gn1 kernel: EIP is at xfs_bmbt_get_startoff+0x2/0x15 [xfs]
Mar 11 13:10:26 gn1 kernel: eax: 0301c398 ebx: ec9fc6d0 ecx: 0301c398 edx: 00001c39
Mar 11 13:10:26 gn1 kernel: esi: ec9fc680 edi: 0301c398 ebp: d9c1fe40 esp: d9c1fe24
Mar 11 13:10:26 gn1 kernel: ds: 007b es: 007b ss: 0068
Mar 11 13:10:26 gn1 kernel: Process nfsd (pid: 2178, threadinfo=d9c1e000 task=dfc80910)
Mar 11 13:10:26 gn1 kernel: Stack: <0>f957eb34 ec9fc680 ec9fc680 f651b000 00000000 f95a2f4e 00000000 dfc80910
Mar 11 13:10:26 gn1 kernel:        00100100 ec9fc680 ec9fc680 2f14a000 f95a3045 00000001 d0bcd800 ec9fc680
Mar 11 13:10:26 gn1 kernel:        00000000 0002f14a 00000000 f95bae44 2f14a000 00000000 00000fff 00000000
Mar 11 13:10:26 gn1 kernel: Call Trace:
Mar 11 13:10:26 gn1 kernel: [<f957eb34>] xfs_bmap_last_offset+0xc0/0xdc [xfs]
Mar 11 13:10:26 gn1 kernel: [<f95a2f4e>] xfs_file_last_byte+0x1e/0xbc [xfs]
Mar 11 13:10:26 gn1 kernel: [<f95a3045>] xfs_itruncate_start+0x59/0xa8 [xfs]
Mar 11 13:10:26 gn1 kernel: [<f95bae44>] xfs_free_eofblocks+0x17a/0x276 [xfs]
Mar 11 13:10:26 gn1 kernel: [<f95bf921>] xfs_release+0x115/0x175 [xfs]
Mar 11 13:10:26 gn1 kernel: [<f95c51e7>] xfs_file_release+0x13/0x1a [xfs]
Mar 11 13:10:26 gn1 kernel: [<c01645cd>] __fput+0x9d/0x170
Mar 11 13:10:26 gn1 kernel: [<f9b588be>] nfsd_write+0xb6/0xbf [nfsd]
Mar 11 13:10:26 gn1 kernel: [<f9b5ee1a>] nfsd3_proc_write+0xd0/0xe7 [nfsd]
Mar 11 13:10:26 gn1 kernel: [<f973c76b>] svcauth_unix_accept+0xe1/0x18e [sunrpc]
Mar 11 13:10:26 gn1 kernel: [<f9b550cb>] nfsd_dispatch+0xbb/0x16c [nfsd]
Mar 11 13:10:26 gn1 kernel: [<f9739518>] svc_process+0x388/0x616 [sunrpc]
Mar 11 13:10:26 gn1 kernel: [<f9b55578>] nfsd+0x197/0x2ff [nfsd]
Mar 11 13:10:26 gn1 kernel: [<f9b553e1>] nfsd+0x0/0x2ff [nfsd]
Mar 11 13:10:26 gn1 kernel: [<c0102005>] kernel_thread_helper+0x5/0xb
Mar 11 13:10:26 gn1 kernel: Code: 53 89 c1 8b 00 31 d2 8b 59 0c 8b 49 08 25 ff 01 00 00 89 c2 b8 00 00 00 00 0f ac d9 15 c1 e2 0b 09 c8 c1 eb 15 09 da 5b c3 89 c1 <8b> 51 04 8b 00 81 e2 ff ff ff 7f 0f ac d0 09 c1 ea 09 c3 57 56

* Re: Oops at xfs_bmbt_get_startoff in SLES 10 2.6.16
From: Eric Sandeen @ 2009-03-12 19:23 UTC
To: kevin; +Cc: xfs

Kevin Jamieson wrote:
> Hello,
>
> We have triggered the below oops in XFS several times on the SLES 10
> SP2 kernel (2.6.16.60-0.21-smp).  We've also seen it on earlier SLES
> 10 kernels (2.6.16.21-0.8-smp).
>
> The oops seems to occur with an application that backs up largish
> (~1GB) files to the XFS file system over NFS, although it is not
> easily reproducible (it happens about once a week on our test
> system).  We have run xfs_repair a few times afterwards, and it did
> not detect any problems with the file system.
>
> We will be reporting this to Novell support, of course, but I thought
> I'd post it here too in case anyone had any ideas or has seen this
> before.

For SLES that usually is the best route...

However, http://oss.sgi.com/archives/xfs/2009-02/msg00220.html looks
applicable... don't think it ever got merged though.

perhaps you could test it?
-Eric

> [quoted oops trace trimmed]

* Re: Oops at xfs_bmbt_get_startoff in SLES 10 2.6.16
From: Kevin Jamieson @ 2009-03-12 23:13 UTC
To: Eric Sandeen; +Cc: xfs

On Thu, March 12, 2009 12:23 pm, Eric Sandeen wrote:
> For SLES that usually is the best route...
>
> However, http://oss.sgi.com/archives/xfs/2009-02/msg00220.html looks
> applicable... don't think it ever got merged though.
>
> perhaps you could test it?

Thanks, Eric. I will test Lachlan's patch on our system.

Kevin

* Re: Oops at xfs_bmbt_get_startoff in SLES 10 2.6.16
From: Kevin Jamieson @ 2009-04-22  0:54 UTC
To: kevin; +Cc: xfs

On Thu, March 12, 2009 4:13 pm, Kevin Jamieson wrote:
> On Thu, March 12, 2009 12:23 pm, Eric Sandeen wrote:
>
>> For SLES that usually is the best route...
>>
>> However, http://oss.sgi.com/archives/xfs/2009-02/msg00220.html looks
>> applicable... don't think it ever got merged though.
>>
>> perhaps you could test it?
>
> Thanks, Eric. I will test Lachlan's patch on our system.

To follow this up, since applying the patch from the above thread there
have been no re-occurrences of the issue on our test servers over the
past month.

Regards,
Kevin

* Re: Oops at xfs_bmbt_get_startoff in SLES 10 2.6.16
From: Eric Sandeen @ 2009-04-22  2:16 UTC
To: kevin; +Cc: xfs

Kevin Jamieson wrote:
> On Thu, March 12, 2009 4:13 pm, Kevin Jamieson wrote:
>> On Thu, March 12, 2009 12:23 pm, Eric Sandeen wrote:
>>
>>> For SLES that usually is the best route...
>>>
>>> However, http://oss.sgi.com/archives/xfs/2009-02/msg00220.html looks
>>> applicable... don't think it ever got merged though.
>>>
>>> perhaps you could test it?
>> Thanks, Eric. I will test Lachlan's patch on our system.
>
> To follow this up, since applying the patch from the above thread
> there have been no re-occurrences of the issue on our test servers
> over the past month.

And you hit it pretty reliably before, right?  Sounds like we need to
give that a pretty strong eyeball and get it merged, perhaps.

-Eric

* Re: Oops at xfs_bmbt_get_startoff in SLES 10 2.6.16
From: Felix Blyakher @ 2009-04-22  2:31 UTC
To: Eric Sandeen; +Cc: xfs

On Apr 21, 2009, at 9:16 PM, Eric Sandeen wrote:
> Kevin Jamieson wrote:
>> On Thu, March 12, 2009 4:13 pm, Kevin Jamieson wrote:
>>> [...]
>>> Thanks, Eric. I will test Lachlan's patch on our system.
>>
>> To follow this up, since applying the patch from the above thread
>> there have been no re-occurrences of the issue on our test servers
>> over the past month.
>
> And you hit it pretty reliably before, right?  Sounds like we need to
> give that a pretty strong eyeball and get it merged, perhaps.

I was looking at this patch too.
But I could never reproduce the problem, even with Lachlan's test
program.  Kevin, any idea what kind of io load triggered this problem?
The patch looks right, but I really want to prove the problem
exists, and the patch addresses it.

Felix

* Re: Oops at xfs_bmbt_get_startoff in SLES 10 2.6.16
From: Eric Sandeen @ 2009-04-22  2:32 UTC
To: Felix Blyakher; +Cc: xfs

Felix Blyakher wrote:
> On Apr 21, 2009, at 9:16 PM, Eric Sandeen wrote:
>> [...]
>> And you hit it pretty reliably before, right?  Sounds like we need
>> to give that a pretty strong eyeball and get it merged, perhaps.
>
> I was looking at this patch too.
> But I could never reproduce the problem, even with Lachlan's test
> program.  Kevin, any idea what kind of io load triggered this problem?
> The patch looks right, but I really want to prove the problem
> exists, and the patch addresses it.
>
> Felix

FWIW I can't reproduce either, with the stated commandline.

Should try it with a 1k blocksize, though - maybe Lachlan tested 4k on
16k page ia64?

-Eric

* Re: Oops at xfs_bmbt_get_startoff in SLES 10 2.6.16
From: Felix Blyakher @ 2009-04-22  2:35 UTC
To: Eric Sandeen; +Cc: xfs

On Apr 21, 2009, at 9:32 PM, Eric Sandeen wrote:
> [...]
> FWIW I can't reproduce either, with the stated commandline.
>
> Should try it with a 1k blocksize, though - maybe Lachlan tested 4k
> on 16k page ia64?

That's what I've tested on.

Felix

* Re: Oops at xfs_bmbt_get_startoff in SLES 10 2.6.16
From: Eric Sandeen @ 2009-04-22  2:38 UTC
To: Felix Blyakher; +Cc: xfs

Felix Blyakher wrote:
> On Apr 21, 2009, at 9:32 PM, Eric Sandeen wrote:
>> [...]
>> Should try it with a 1k blocksize, though - maybe Lachlan tested 4k
>> on 16k page ia64?
>
> That's what I've tested on.

Ah, well, I just spoke with Lachlan and he said he tested on x86_64,
4k/4k.  So hrm...

-Eric

* Re: Oops at xfs_bmbt_get_startoff in SLES 10 2.6.16
From: Kevin Jamieson @ 2009-04-23  6:24 UTC
To: xfs

Felix Blyakher wrote:
> But I could never reproduce the problem, even with Lachlan's test
> program.

I was not able to reproduce the problem with Lachlan's test program
either (although I did not experiment much with the parameters).

> Kevin, any idea what kind of io load triggered this problem?

The workload consisted predominantly (98%) of 2-3MB file ingests
through an NFS share, with a few (2%) 1GB file ingests; the issue
seemed to trigger on the larger files.

Kevin

* Re: Oops at xfs_bmbt_get_startoff in SLES 10 2.6.16
From: Kevin Jamieson @ 2009-04-23  6:18 UTC
To: xfs

Eric Sandeen wrote:
> And you hit it pretty reliably before, right?

Yes, fairly regularly -- about 2-3 times per week under the same
workload.

Kevin
