linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* PROBLEM: Oops during iozone over NFS against 2.4.18-rc4 + Trond's NFS Patches + Sistina's LVM2
@ 2002-03-03  2:09 Joachim Breuer
  2002-03-05 20:38 ` Joachim Breuer
  0 siblings, 1 reply; 2+ messages in thread
From: Joachim Breuer @ 2002-03-03  2:09 UTC (permalink / raw)
  To: linux-kernel; +Cc: support


Sistina: I'm NOT a support customer; consider me a voluntary beta
tester of LVM2. You might be interested in this...

[1.] One line summary of the problem:

  During performance test (using iozone) against nfs server with
  kernel 2.4.18-rc4 + linux-2.4.18-NFS_ALL.dif (provided against
  2.4.18-rc2, went in cleanly) from nfs.sourceforge.net + Sistina's
  LVM2 beta1.1, Oops occurs.

[2.] Full description of the problem/report:

  Client and Server running same aforementioned kernel, NFS parameters
  set to rsize==wsize==4096; udp. Exported FS is an ext3 on a LV on
  two PVs (linear/segmented, not striped), one is a complete SCSI disk
  (sda), the other is a partition of an IDE disk (hda5). "./iozone -ac
  -R -n 256m" showed results for write/rewrite/read/reread; showed
  "Error writing block at 219414528", "write: Input/output error",
  "iozone: interrupted" before random read result. Result "never"
  comes up (not in 15+ minutes), iozone appears hung (Ctrl-C won't
  break it, kill -9 does).

  Checking at the server shows oops detailed below, and high load
  (around 8, stays up even after iozone was killed). No processes on
  server show responsible for the load average in "top".

  Same iozone against other NFS server works (more than one iteration,
  this NFS server will be called "reference server" from here on). The
  reference server is different hardware (see next paragraph and
  [7.7]), esp. different ethernet controller. Reference server does
  *NOT* employ LVM; but *does* also use ext3.

  All boxen X86, server is a Celeron 900 (PIII-Core-based); client is
  a Pentium III 850; running identical kernels. Reference server is a
  dual Pentium Pro 200 SMP, using the same kernel modulo SMP turned
  on. More info on hardware available on request.

[3.] Keywords (i.e., modules, networking, kernel):

  NFS (Trond's NFS Patches), LVM, ext3

[4.] Kernel version (from /proc/version):

  Linux version 2.4.18-rc4 (jmbreuer@venus.fo.et.local) (gcc version
  2.96 20000731 (Red Hat Linux 7.1 2.96-98)) #1 Tue Feb 26 18:11:31
  CET 2002

[5.] Output of Oops.. message (if applicable) with symbolic information
     resolved (see Documentation/oops-tracing.txt)

Assertion failure in journal_dirty_data_R79f5b221() at transaction.c:975: "jh->b_transaction == journal->j_committing_transaction"

ksymoops 2.4.3 on i686 2.4.18-rc4.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.18-rc4/ (default)
     -m /boot/kernel/System.map-2.4.18-rc4 (specified)
     -i

invalid operand: 0000
CPU:    0
EIP:    0010:[tulip:__insmod_tulip_O/lib/modules/2.4.18-rc4/kernel/drivers/net/+-896957/96]    Not tainted
EIP:    0010:[<d083c043>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00013296
eax: 00000086   ebx: c8741e3c   ecx: 00000009   edx: ce8c1f64
esi: 00000001   edi: cf774c00   ebp: c8e098c0   esp: cdbfdd0c
ds: 0018   es: 0018   ss: 0018
Process nfsd (pid: 1041, stackpage=cdbfd000)
Stack: d0841b20 d08414d5 d08413d0 000003cf d0841d40 00000000 c8e098c0 cb4344f0 
       00000000 00000000 c2be9860 c8e098c0 00000000 00001000 d0848bf6 c8e098c0 
       c2be9860 00000000 00007883 cf774a00 c0143da5 00003246 cdbfddac 00003246 
Call Trace: [tulip:__insmod_tulip_O/lib/modules/2.4.18-rc4/kernel/drivers/net/+-873696/96] [tulip:__insmod_tulip_O/lib/modules/2.4.18-rc4/kernel/drivers/net
Call Trace: [<d0841b20>] [<d08414d5>] [<d08413d0>] [<d0841d40>] [<d0848bf6>] 
   [<c0143da5>] [<d0840ec7>] [<d0848a2a>] [<d0848e31>] [<d0848be0>] [<d08486a0>] 
   [<c0127f93>] [<c019a5fa>] [<d095904c>] [<c010842c>] [<c010a468>] [<d0852ae0>] 
   [<d095e140>] [<d0964a60>] [<d0955577>] [<d0964a60>] [<d092c8d6>] [<d09643d8>] 
   [<d09643f8>] [<d0955387>] [<c0105726>] [<d09551b0>] 
Code: 0f 0b 83 c4 14 83 7b 08 02 0f 87 ac 00 00 00 8b 4c 24 30 85 

>>EIP; d083c042 <[jbd]journal_dirty_data+82/170>   <=====
Trace; d0841b20 <[jbd].LC4+20/40>
Trace; d08414d4 <[jbd].LC37+a/14>
Trace; d08413d0 <[jbd].LC10+c/1a>
Trace; d0841d40 <[jbd].LC15+0/40>
Trace; d0848bf6 <[ext3]journal_dirty_sync_data+16/60>
Trace; c0143da4 <iget4+44/d0>
Trace; d0840ec6 <[jbd]__jbd_kmalloc+26/80>
Trace; d0848a2a <[ext3]walk_page_buffers+5a/80>
Trace; d0848e30 <[ext3]ext3_commit_write+120/1d0>
Trace; d0848be0 <[ext3]journal_dirty_sync_data+0/60>
Trace; d08486a0 <[ext3]ext3_get_block+0/60>
Trace; c0127f92 <generic_file_write+4c2/6d0>
Trace; c019a5fa <net_rx_action+1ca/2a0>
Trace; d095904c <[nfsd]nfsd_write+14c/2d0>
Trace; c010842c <do_IRQ+9c/b0>
Trace; c010a468 <call_do_IRQ+6/e>
Trace; d0852ae0 <[ext3]ext3_file_operations+0/60>
Trace; d095e140 <[nfsd]nfsd3_proc_write+f0/110>
Trace; d0964a60 <[nfsd]nfsd_procedures3+e0/2c0>
Trace; d0955576 <[nfsd]nfsd_dispatch+b6/190>
Trace; d0964a60 <[nfsd]nfsd_procedures3+e0/2c0>
Trace; d092c8d6 <[sunrpc]svc_process+306/4c0>
Trace; d09643d8 <[nfsd]nfsd_version3+0/10>
Trace; d09643f8 <[nfsd]nfsd_program+0/1c>
Trace; d0955386 <[nfsd]nfsd+1d6/310>
Trace; c0105726 <kernel_thread+26/30>
Trace; d09551b0 <[nfsd]nfsd+0/310>
Code;  d083c042 <[jbd]journal_dirty_data+82/170>
00000000 <_EIP>:
Code;  d083c042 <[jbd]journal_dirty_data+82/170>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  d083c044 <[jbd]journal_dirty_data+84/170>
   2:   83 c4 14                  add    $0x14,%esp
Code;  d083c046 <[jbd]journal_dirty_data+86/170>
   5:   83 7b 08 02               cmpl   $0x2,0x8(%ebx)
Code;  d083c04a <[jbd]journal_dirty_data+8a/170>
   9:   0f 87 ac 00 00 00         ja     bb <_EIP+0xbb> d083c0fc <[jbd]journal_dirty_data+13c/170>
Code;  d083c050 <[jbd]journal_dirty_data+90/170>
   f:   8b 4c 24 30               mov    0x30(%esp,1),%ecx
Code;  d083c054 <[jbd]journal_dirty_data+94/170>
  13:   85 00                     test   %eax,(%eax)


[6.] A small shell script or example program which triggers the
     problem (if possible)

  ./iozone -ac -R -n 256m

  against an ext3 filesystem on an LVM2 beta1.1 through NFS (both
  client and server using Trond's Patches)

[7.] Environment

  If not seperately indicated, only information of the oopsing server
  is shown. Further info also on the other machines on request.

[7.1.] Software (add the output of the ver_linux script here)

Linux terra.fo.et.local 2.4.18-rc4 #1 Tue Feb 26 18:11:31 CET 2002 i686 unknown
 
Gnu C                  2.96
Gnu make               3.79.1
binutils               2.11.90.0.8
util-linux             2.11f
mount                  2.11g
modutils               2.4.6
e2fsprogs              1.23
reiserfsprogs          3.x.0j
pcmcia-cs              3.1.22
PPP                    2.4.1
isdn4k-utils           3.1pre1
Linux C Library        2.2.4
Dynamic linker (ldd)   2.2.4
Procps                 2.0.7
Net-tools              1.60
Console-tools          0.3.3
Sh-utils               2.0.11
Modules Loaded         es1370 gameport soundcore nfs mga agpgart nfsd lockd sunrpc autofs tulip af_packet md ospm_thermal ospm_processor ospm_button ospm_system ospm_busmgr lvm-mod dm-mod sr_mod cdrom sd_mod aic7xxx scsi_mod usb-uhci usbcore rtc unix ext3 jbd ide-disk ide-probe-mod ide-mod

[7.2.] Processor information (from /proc/cpuinfo):

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 8
model name      : Celeron (Coppermine)
stepping        : 10
cpu MHz         : 896.854
cache size      : 128 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips        : 1789.13

[7.3.] Module information (from /proc/modules):

es1370                 25008   0 (autoclean)
gameport                1552   0 (autoclean) [es1370]
soundcore               3664   4 (autoclean) [es1370]
nfs                    74944   1 (autoclean)
mga                    98528   0
agpgart                31072   1
nfsd                   66656   8 (autoclean)
lockd                  47344   1 (autoclean) [nfs nfsd]
sunrpc                 62240   1 (autoclean) [nfs nfsd lockd]
autofs                  9120   3 (autoclean)
tulip                  37456   1
af_packet              11968   0 (autoclean)
md                     44000   0 (unused)
ospm_thermal            5808   0 (unused)
ospm_processor          5616   0 (unused)
ospm_button             3264   0 (unused)
ospm_system             5936   0 (unused)
ospm_busmgr            11616   0 [ospm_thermal ospm_processor ospm_button ospm_system]
lvm-mod                45536   0 (unused)
dm-mod                 24048   3
sr_mod                 12560   0
cdrom                  27872   0 [sr_mod]
sd_mod                 10800   4
aic7xxx               112640   2
scsi_mod               90432   3 [sr_mod sd_mod aic7xxx]
usb-uhci               21344   0 (unused)
usbcore                50112   1 [usb-uhci]
rtc                     5664   0 (autoclean)
unix                   13552  31
ext3                   57776   2
jbd                    36096   2 [ext3]
ide-disk                6736   3
ide-probe-mod           7712   0
ide-mod               148960   3 [ide-disk ide-probe-mod]

[7.4.] Loaded driver and hardware information (/proc/ioports,
/proc/iomem)

0000-001f : dma1
0020-003f : pic1
0040-005f : timer
0060-006f : keyboard
0070-007f : rtc
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
01f0-01f7 : ide0
03c0-03df : vga+
03f6-03f6 : ide0
0cf8-0cff : PCI conf1
5000-500f : Intel Corp. 82820 820 (Camino 2) Chipset SMBus
c000-cfff : PCI Bus #02
  c000-c03f : Ensoniq ES1370 [AudioPCI]
    c000-c03f : es1370
  c400-c4ff : Lite-On Communications Inc LNE100TX
    c400-c4ff : tulip
  c800-c8ff : Adaptec AHA-294x / AIC-7871
    c800-c8ff : aic7xxx
d000-d01f : Intel Corp. 82820 820 (Camino 2) Chipset USB (Hub A)
  d000-d01f : usb-uhci
d800-d81f : Intel Corp. 82820 820 (Camino 2) Chipset USB (Hub B)
  d800-d81f : usb-uhci
f000-f00f : Intel Corp. 82820 820 (Camino 2) Chipset IDE U100
  f000-f007 : ide0
  f008-f00f : ide1

00000000-0009fbff : System RAM
0009fc00-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000d0000-000d27ff : Extension ROM
000f0000-000fffff : System ROM
00100000-0ffeffff : System RAM
  00100000-001df582 : Kernel code
  001df583-00215c4b : Kernel data
0fff0000-0fff2fff : ACPI Non-volatile Storage
0fff3000-0fffffff : ACPI Tables
d0000000-d3ffffff : Intel Corp. 82815 815 Chipset Host Bridge and Memory Controller Hub
d4000000-d7ffffff : PCI Bus #01
  d4000000-d4003fff : Matrox Graphics, Inc. MGA G400 AGP
  d5000000-d57fffff : Matrox Graphics, Inc. MGA G400 AGP
d8000000-d9ffffff : PCI Bus #01
  d8000000-d9ffffff : Matrox Graphics, Inc. MGA G400 AGP
da000000-dbffffff : PCI Bus #02
  db000000-db0000ff : Lite-On Communications Inc LNE100TX
    db000000-db0000ff : tulip
  db001000-db001fff : Adaptec AHA-294x / AIC-7871
    db001000-db001fff : aic7xxx
ffb00000-ffffffff : reserved

[7.5.] PCI information ('lspci -vvv' as root)

00:00.0 Host bridge: Intel Corporation 82815 815 Chipset Host Bridge and Memory Controller Hub (rev 02)
        Subsystem: Intel Corporation 82815 815 Chipset Host Bridge and Memory Controller Hub
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
        Latency: 0
        Region 0: Memory at d0000000 (32-bit, prefetchable) [size=64M]
        Capabilities: [88] #09 [e104]
        Capabilities: [a0] AGP version 2.0
                Status: RQ=31 SBA+ 64bit- FW- Rate=x1,x2
                Command: RQ=0 SBA+ AGP+ 64bit- FW- Rate=x1

00:01.0 PCI bridge: Intel Corporation: Unknown device 1131 (rev 02) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap- 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=32
        I/O behind bridge: 0000f000-00000fff
        Memory behind bridge: d4000000-d7ffffff
        Prefetchable memory behind bridge: d8000000-d9ffffff
        BridgeCtl: Parity- SERR- NoISA+ VGA+ MAbort- >Reset- FastB2B-

00:1e.0 PCI bridge: Intel Corporation 82801BAM PCI (rev 02) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0
        Bus: primary=00, secondary=02, subordinate=02, sec-latency=32
        I/O behind bridge: 0000c000-0000cfff
        Memory behind bridge: da000000-dbffffff
        Prefetchable memory behind bridge: fff00000-000fffff
        BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-

00:1f.0 ISA bridge: Intel Corporation 82801BA ISA Bridge (ICH2) (rev 02)
        Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0

00:1f.1 IDE interface: Intel Corporation 82801BA IDE U100 (rev 02) (prog-if 80 [Master])
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0
        Region 4: I/O ports at f000 [size=16]

00:1f.2 USB Controller: Intel Corporation 82801BA(M) USB (Hub A) (rev 02) (prog-if 00 [UHCI])
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0
        Interrupt: pin D routed to IRQ 9
        Region 4: I/O ports at d000 [size=32]

00:1f.3 SMBus: Intel Corporation 82801BA(M) SMBus (rev 02)
        Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Interrupt: pin B routed to IRQ 9
        Region 4: I/O ports at 5000 [size=16]

00:1f.4 USB Controller: Intel Corporation 82801BA(M) USB (Hub B) (rev 02) (prog-if 00 [UHCI])
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0
        Interrupt: pin C routed to IRQ 9
        Region 4: I/O ports at d800 [size=32]

01:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G400 AGP (rev 82) (prog-if 00 [VGA])
        Subsystem: Matrox Graphics, Inc.: Unknown device 0542
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (4000ns min, 8000ns max), cache line size 08
        Interrupt: pin A routed to IRQ 10
        Region 0: Memory at d8000000 (32-bit, prefetchable) [size=32M]
        Region 1: Memory at d4000000 (32-bit, non-prefetchable) [size=16K]
        Region 2: Memory at d5000000 (32-bit, non-prefetchable) [size=8M]
        Expansion ROM at <unassigned> [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [f0] AGP version 2.0
                Status: RQ=31 SBA+ 64bit- FW- Rate=x1,x2
                Command: RQ=31 SBA+ AGP+ 64bit- FW- Rate=x1

02:01.0 Multimedia audio controller: Ensoniq ES1370 [AudioPCI] (rev 01)
        Subsystem: Unknown device 4942:4c4c
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=slow >TAbort- <TAbort+ <MAbort+ >SERR- <PERR-
        Latency: 32 (3000ns min, 32000ns max)
        Interrupt: pin A routed to IRQ 9
        Region 0: I/O ports at c000 [size=64]

02:03.0 Ethernet controller: Lite-On Communications Inc LNE100TX (rev 20)
        Subsystem: Netgear FA310TX
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32
        Interrupt: pin A routed to IRQ 10
        Region 0: I/O ports at c400 [size=256]
        Region 1: Memory at db000000 (32-bit, non-prefetchable) [size=256]
        Expansion ROM at <unassigned> [disabled] [size=256K]

02:04.0 SCSI storage controller: Adaptec AHA-294x / AIC-7871
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (2000ns min, 2000ns max), cache line size 08
        Interrupt: pin A routed to IRQ 11
        Region 0: I/O ports at c800 [disabled] [size=256]
        Region 1: Memory at db001000 (32-bit, non-prefetchable) [size=4K]
        Expansion ROM at <unassigned> [disabled] [size=32K]

[7.6.] SCSI information (from /proc/scsi/scsi)

Attached devices: 
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: IBM      Model: DORS-32160       Rev: S82C
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 04 Lun: 00
  Vendor: PLEXTOR  Model: CD-ROM PX-32CS   Rev: 1.01
  Type:   CD-ROM                           ANSI SCSI revision: 02

[7.7.] Other information that might be relevant to the problem
       (please look in /proc and include all information that you
       think to be relevant):

Server (the oopsed one):
eth0: Lite-On 82c168 PNIC rev 32 at 0xd0965000, 00:A0:CC:D2:C7:52, IRQ 10.
(NetGear FA-310TX)

Reference server (against which iozone works; doesn't use LVM):
eth0: NatSemi DP8381[56] at 0xd09e8000, 00:02:e3:13:e3:18, IRQ 5.
(Presumably NetGear FA-311TX)

Client (running iozone):
eth0: Lite-On 82c168 PNIC rev 32 at 0xd0a2a000, 00:A0:CC:5A:09:30, IRQ 11.
(Presumably NetGear FA-310TX)


[X.] Other notes, patches, fixes, workarounds:

  My limited understanding and tentative evidence point in the
  direction of LVM being not completely innocent (maybe in conjunction
  with the direct_io patches contained in Trond's NFS), but this is
  pure conjecture. I'll CC: the report to Sistina, anyway.


-- 
"I use emacs, which might be thought of as a thermonuclear
 word processor."
-- Neal Stephenson, "In the beginning... was the command line"

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: PROBLEM: Oops during iozone over NFS against 2.4.18-rc4 + Trond's NFS Patches + Sistina's LVM2
  2002-03-03  2:09 PROBLEM: Oops during iozone over NFS against 2.4.18-rc4 + Trond's NFS Patches + Sistina's LVM2 Joachim Breuer
@ 2002-03-05 20:38 ` Joachim Breuer
  0 siblings, 0 replies; 2+ messages in thread
From: Joachim Breuer @ 2002-03-05 20:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: support, trond.myklebust


I received a patch from Sistina (thanks!) which does indeed fix the
oops occuring on the server. The patch apparently cleans up concurrent
usage of a _private buffer_head data field by ext3 and lvm. It does
*not*, however, alleviate the iozone error - that still occurs with
different semantics: iozone now exits cleanly after the i/o error is
displayed, NFS stays usable.

Trond: Is there anything I can do to get error messages/traces out of
the nfs layer? Or do analysis of the server (as it's fully usable
during/after the iozone run). Syslog (in full *.*) shows
nothing. Kudos for the patch btw - w/out I can't get write performance
better than 1.5MBps, with the patch it's an even 7.something MBps.

The iozone error reported below is 100% reproducible between those two
boxen.

Since the original report I've moved all mentioned machines to 2.4.18
(final, not -rc4) including Sistina's cleanup as mentioned above.

On a half-related note: I'm not subscribed to the nfs list and can't
subscribe to it without bending over backwards (the mail server
doesn't accept the confirmation mail targeted at it from an "dial-up"
(rather: dynamic-ip) account). No, the obvious simple solution is not
really viable.


Joachim Breuer <jmbreuer@gmx.net> writes:

> [1.] One line summary of the problem:
>
>   During performance test (using iozone) against nfs server with
>   kernel 2.4.18-rc4 + linux-2.4.18-NFS_ALL.dif (provided against
>   2.4.18-rc2, went in cleanly) from nfs.sourceforge.net + Sistina's
>   LVM2 beta1.1, Oops occurs.
>
> [2.] Full description of the problem/report:
>
>   Client and Server running same aforementioned kernel, NFS parameters
>   set to rsize==wsize==4096; udp. Exported FS is an ext3 on a LV on
>   two PVs (linear/segmented, not striped), one is a complete SCSI disk
>   (sda), the other is a partition of an IDE disk (hda5). "./iozone -ac
>   -R -n 256m" showed results for write/rewrite/read/reread; showed
>   "Error writing block at 219414528", "write: Input/output error",
>   "iozone: interrupted" before random read result. Result "never"
>   comes up (not in 15+ minutes), iozone appears hung (Ctrl-C won't
>   break it, kill -9 does).
>
>   Checking at the server shows oops detailed below, and high load
>   (around 8, stays up even after iozone was killed). No processes on
>   server show responsible for the load average in "top".
>
>   Same iozone against other NFS server works (more than one iteration,
>   this NFS server will be called "reference server" from here on). The
>   reference server is different hardware (see next paragraph and
>   [7.7]), esp. different ethernet controller. Reference server does
>   *NOT* employ LVM; but *does* also use ext3.
>
>   All boxen X86, server is a Celeron 900 (PIII-Core-based); client is
>   a Pentium III 850; running identical kernels. Reference server is a
>   dual Pentium Pro 200 SMP, using the same kernel modulo SMP turned
>   on. More info on hardware available on request.

[Rest of original problem report, including setup/machine description,
in <m3zo1q8kr2.fsf@venus.fo.et.local> posted on Sun, 03 Mar 2002 03:09:53 +0100]


So long,
   Joe

-- 
"I use emacs, which might be thought of as a thermonuclear
 word processor."
-- Neal Stephenson, "In the beginning... was the command line"

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2002-03-05 20:39 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-03-03  2:09 PROBLEM: Oops during iozone over NFS against 2.4.18-rc4 + Trond's NFS Patches + Sistina's LVM2 Joachim Breuer
2002-03-05 20:38 ` Joachim Breuer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).