Re: 2.6.28.9: EXT3/NFS inodes corruption

* Re: 2.6.28.9: EXT3/NFS inodes corruption
       [not found] <ct4xS-63o-27@gated-at.bofh.it>
@ 2009-07-28 16:40 ` Daniel J Blueman
  2009-07-28 16:45   ` Sylvain Rochet
  0 siblings, 1 reply; 39+ messages in thread
From: Daniel J Blueman @ 2009-07-28 16:40 UTC (permalink / raw)
  To: Sylvain Rochet; +Cc: Linux Kernel

On Apr 20, 5:30 pm, Sylvain Rochet <grada...@gradator.net> wrote:
> Hi,
>
> We(TuxFamily) are having some inodes corruptions on a NFS server.
>
> So, let's start with the facts.
>
> ==== NFS Server
>
> Linux bazooka 2.6.28.9 #1 SMP Mon Mar 30 12:58:22 CEST 2009 x86_64 GNU/Linux
[snip]

Can you do a 'lspci -v' on the server please?

Daniel
--
Daniel J Blueman


* Re: 2.6.28.9: EXT3/NFS inodes corruption
  2009-07-28 16:40 ` 2.6.28.9: EXT3/NFS inodes corruption Daniel J Blueman
@ 2009-07-28 16:45   ` Sylvain Rochet
  2009-08-21 11:05     ` Daniel J Blueman
  0 siblings, 1 reply; 39+ messages in thread
From: Sylvain Rochet @ 2009-07-28 16:45 UTC (permalink / raw)
  To: Daniel J Blueman; +Cc: Linux Kernel


[-- Attachment #1.1: Type: text/plain, Size: 558 bytes --]

Hi,


On Tue, Jul 28, 2009 at 09:40:42AM -0700, Daniel J Blueman wrote:
> On Apr 20, 5:30 pm, Sylvain Rochet <grada...@gradator.net> wrote:
> > Hi,
> >
> > We(TuxFamily) are having some inodes corruptions on a NFS server.
> >
> > So, let's start with the facts.
> >
> > ==== NFS Server
> >
> > Linux bazooka 2.6.28.9 #1 SMP Mon Mar 30 12:58:22 CEST 2009 x86_64 GNU/Linux
> [snip]
> 
> Can you do a 'lspci -v' on the server please?

Of course yes.

I attached the 'lspci -v' of the previous and the current storage 
server.


Sylvain

[-- Attachment #1.2: currentserver --]
[-- Type: text/plain, Size: 8176 bytes --]

00:06.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8111 PCI (rev 07) (prog-if 00 [Normal decode])
	Flags: bus master, 66MHz, medium devsel, latency 123
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=68
	Memory behind bridge: fc900000-fdffffff
	Prefetchable memory behind bridge: fc400000-fc4fffff
	Capabilities: [c0] HyperTransport: Slave or Primary Interface
	Capabilities: [f0] HyperTransport: Interrupt Discovery and Configuration

00:07.0 ISA bridge: Advanced Micro Devices [AMD] AMD-8111 LPC (rev 05)
	Subsystem: Newisys, Inc. Unknown device 0010
	Flags: bus master, 66MHz, medium devsel, latency 0

00:07.1 IDE interface: Advanced Micro Devices [AMD] AMD-8111 IDE (rev 03) (prog-if 8a [Master SecP PriP])
	Subsystem: Newisys, Inc. Unknown device 0010
	Flags: bus master, medium devsel, latency 64
	[virtual] Memory at 000001f0 (32-bit, non-prefetchable) [disabled] [size=8]
	[virtual] Memory at 000003f0 (type 3, non-prefetchable) [disabled] [size=1]
	[virtual] Memory at 00000170 (32-bit, non-prefetchable) [disabled] [size=8]
	[virtual] Memory at 00000370 (type 3, non-prefetchable) [disabled] [size=1]
	I/O ports at 1000 [size=16]

00:07.3 Bridge: Advanced Micro Devices [AMD] AMD-8111 ACPI (rev 05)
	Subsystem: Newisys, Inc. Unknown device 0010
	Flags: medium devsel

00:0a.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12) (prog-if 00 [Normal decode])
	Flags: bus master, 66MHz, medium devsel, latency 64
	Bus: primary=00, secondary=02, subordinate=02, sec-latency=64
	I/O behind bridge: 00002000-00002fff
	Memory behind bridge: fe000000-fe0fffff
	Prefetchable memory behind bridge: 00000000fc500000-00000000fc5fffff
	Capabilities: [a0] PCI-X bridge device
	Capabilities: [b8] HyperTransport: Interrupt Discovery and Configuration
	Capabilities: [c0] HyperTransport: Slave or Primary Interface

00:0a.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01) (prog-if 10 [IO-APIC])
	Subsystem: Newisys, Inc. Unknown device 0010
	Flags: bus master, medium devsel, latency 0
	Memory at fc800000 (64-bit, non-prefetchable) [size=4K]

00:0b.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12) (prog-if 00 [Normal decode])
	Flags: bus master, 66MHz, medium devsel, latency 64
	Bus: primary=00, secondary=03, subordinate=04, sec-latency=64
	I/O behind bridge: 00003000-00003fff
	Memory behind bridge: fe100000-fe1fffff
	Capabilities: [a0] PCI-X bridge device
	Capabilities: [b8] HyperTransport: Interrupt Discovery and Configuration

00:0b.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01) (prog-if 10 [IO-APIC])
	Subsystem: Newisys, Inc. Unknown device 0010
	Flags: bus master, medium devsel, latency 0
	Memory at fc801000 (64-bit, non-prefetchable) [size=4K]

00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
	Flags: fast devsel
	Capabilities: [80] HyperTransport: Host or Secondary Interface
	Capabilities: [a0] HyperTransport: Host or Secondary Interface
	Capabilities: [c0] HyperTransport: Host or Secondary Interface

00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
	Flags: fast devsel

00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
	Flags: fast devsel

00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
	Flags: fast devsel

00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
	Flags: fast devsel
	Capabilities: [80] HyperTransport: Host or Secondary Interface
	Capabilities: [a0] HyperTransport: Host or Secondary Interface
	Capabilities: [c0] HyperTransport: Host or Secondary Interface

00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
	Flags: fast devsel

00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
	Flags: fast devsel

00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
	Flags: fast devsel

01:00.0 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b) (prog-if 10 [OHCI])
	Subsystem: Newisys, Inc. Unknown device 0010
	Flags: bus master, medium devsel, latency 64, IRQ 11
	Memory at fc900000 (32-bit, non-prefetchable) [size=4K]

01:00.1 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b) (prog-if 10 [OHCI])
	Subsystem: Newisys, Inc. Unknown device 0010
	Flags: bus master, medium devsel, latency 64, IRQ 11
	Memory at fc901000 (32-bit, non-prefetchable) [size=4K]

01:05.0 VGA compatible controller: Trident Microsystems Blade 3D PCI/AGP (rev 3a) (prog-if 00 [VGA])
	Subsystem: Newisys, Inc. Unknown device 0010
	Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 10
	Memory at fd800000 (32-bit, non-prefetchable) [size=8M]
	Memory at fc920000 (32-bit, non-prefetchable) [size=128K]
	Memory at fd000000 (32-bit, non-prefetchable) [size=8M]
	[virtual] Expansion ROM at fc400000 [disabled] [size=64K]
	Capabilities: [80] AGP version 1.0
	Capabilities: [90] Power Management version 1

02:02.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703X Gigabit Ethernet (rev 02)
	Subsystem: Newisys, Inc. Unknown device 0010
	Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 25
	Memory at fe000000 (64-bit, non-prefetchable) [size=64K]
	Expansion ROM at <ignored> [disabled]
	Capabilities: [40] PCI-X non-bridge device
	Capabilities: [48] Power Management version 2
	Capabilities: [50] Vital Product Data
	Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Queue=0/3 Enable-

02:03.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703X Gigabit Ethernet (rev 02)
	Subsystem: Newisys, Inc. Unknown device 0010
	Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 26
	Memory at fe010000 (64-bit, non-prefetchable) [size=64K]
	Capabilities: [40] PCI-X non-bridge device
	Capabilities: [48] Power Management version 2
	Capabilities: [50] Vital Product Data
	Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Queue=0/3 Enable-

02:04.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
	Subsystem: Newisys, Inc. Unknown device 0010
	Flags: bus master, 66MHz, medium devsel, latency 72, IRQ 27
	I/O ports at 2000 [size=256]
	Memory at fe030000 (64-bit, non-prefetchable) [size=64K]
	Memory at fe020000 (64-bit, non-prefetchable) [size=64K]
	[virtual] Expansion ROM at fc500000 [disabled] [size=1M]
	Capabilities: [50] Power Management version 2
	Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
	Capabilities: [68] PCI-X non-bridge device

03:01.0 PCI bridge: IBM PCI-X to PCI-X Bridge (rev 02) (prog-if 00 [Normal decode])
	Flags: bus master, 66MHz, medium devsel, latency 64
	Bus: primary=03, secondary=04, subordinate=04, sec-latency=64
	I/O behind bridge: 00003000-00003fff
	Memory behind bridge: fe100000-fe1fffff
	Capabilities: [80] PCI-X bridge device
	Capabilities: [90] Power Management version 2

04:04.0 Fibre Channel: Emulex Corporation LP9802 Fibre Channel Host Adapter (rev 01)
	Subsystem: Emulex Corporation LP9802 Fibre Channel Host Adapter
	Flags: bus master, 66MHz, medium devsel, latency 248, IRQ 28
	Memory at fe100000 (64-bit, non-prefetchable) [size=4K]
	Memory at fe102000 (64-bit, non-prefetchable) [size=256]
	I/O ports at 3000 [size=256]
	Capabilities: [4c] Power Management version 2
	Capabilities: [54] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
	Capabilities: [64] Vital Product Data
	Capabilities: [44] PCI-X non-bridge device

04:05.0 Fibre Channel: Emulex Corporation LP9802 Fibre Channel Host Adapter (rev 01)
	Subsystem: Emulex Corporation LP9802 Fibre Channel Host Adapter
	Flags: bus master, 66MHz, medium devsel, latency 248, IRQ 29
	Memory at fe101000 (64-bit, non-prefetchable) [size=4K]
	Memory at fe102400 (64-bit, non-prefetchable) [size=256]
	I/O ports at 3400 [size=256]
	Capabilities: [4c] Power Management version 2
	Capabilities: [54] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
	Capabilities: [64] Vital Product Data
	Capabilities: [44] PCI-X non-bridge device


[-- Attachment #1.3: previousserver --]
[-- Type: text/plain, Size: 9777 bytes --]

00:00.0 Host bridge: Intel Corporation Unknown device 29f0 (rev 01)
	Subsystem: Super Micro Computer Inc Unknown device d280
	Flags: bus master, fast devsel, latency 0
	Capabilities: [e0] Vendor Specific Information

00:01.0 PCI bridge: Intel Corporation Unknown device 29f1 (rev 01) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=01, subordinate=03, sec-latency=0
	Memory behind bridge: d0100000-d01fffff
	Prefetchable memory behind bridge: 00000000d4000000-00000000d7ffffff
	Capabilities: [88] Subsystem: Super Micro Computer Inc Unknown device d280
	Capabilities: [80] Power Management version 3
	Capabilities: [90] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable+
	Capabilities: [a0] Express Root Port (Slot+) IRQ 0

00:1a.0 USB Controller: Intel Corporation Unknown device 2937 (rev 02) (prog-if 00 [UHCI])
	Subsystem: Super Micro Computer Inc Unknown device d280
	Flags: bus master, medium devsel, latency 0, IRQ 5
	I/O ports at 1820 [size=32]
	Capabilities: [50] #13 [0306]

00:1a.1 USB Controller: Intel Corporation Unknown device 2938 (rev 02) (prog-if 00 [UHCI])
	Subsystem: Super Micro Computer Inc Unknown device d280
	Flags: bus master, medium devsel, latency 0, IRQ 10
	I/O ports at 1840 [size=32]
	Capabilities: [50] #13 [0306]

00:1a.2 USB Controller: Intel Corporation Unknown device 2939 (rev 02) (prog-if 00 [UHCI])
	Subsystem: Super Micro Computer Inc Unknown device d280
	Flags: bus master, medium devsel, latency 0, IRQ 11
	I/O ports at 1860 [size=32]
	Capabilities: [50] #13 [0306]

00:1a.7 USB Controller: Intel Corporation Unknown device 293c (rev 02) (prog-if 20 [EHCI])
	Subsystem: Super Micro Computer Inc Unknown device d280
	Flags: bus master, medium devsel, latency 0, IRQ 11
	Memory at d0000000 (32-bit, non-prefetchable) [size=1K]
	Capabilities: [50] Power Management version 2
	Capabilities: [58] Debug port
	Capabilities: [98] #13 [0306]

00:1c.0 PCI bridge: Intel Corporation Unknown device 2940 (rev 02) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=05, subordinate=05, sec-latency=0
	Capabilities: [40] Express Root Port (Slot+) IRQ 0
	Capabilities: [80] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable+
	Capabilities: [90] Subsystem: Super Micro Computer Inc Unknown device d280
	Capabilities: [a0] Power Management version 2

00:1c.4 PCI bridge: Intel Corporation Unknown device 2948 (rev 02) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=0d, subordinate=0d, sec-latency=0
	I/O behind bridge: 00002000-00002fff
	Memory behind bridge: d0200000-d02fffff
	Capabilities: [40] Express Root Port (Slot+) IRQ 0
	Capabilities: [80] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable+
	Capabilities: [90] Subsystem: Super Micro Computer Inc Unknown device d280
	Capabilities: [a0] Power Management version 2

00:1c.5 PCI bridge: Intel Corporation Unknown device 294a (rev 02) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=0f, subordinate=0f, sec-latency=0
	I/O behind bridge: 00003000-00003fff
	Memory behind bridge: d0300000-d03fffff
	Capabilities: [40] Express Root Port (Slot+) IRQ 0
	Capabilities: [80] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable+
	Capabilities: [90] Subsystem: Super Micro Computer Inc Unknown device d280
	Capabilities: [a0] Power Management version 2

00:1d.0 USB Controller: Intel Corporation Unknown device 2934 (rev 02) (prog-if 00 [UHCI])
	Subsystem: Super Micro Computer Inc Unknown device d280
	Flags: bus master, medium devsel, latency 0, IRQ 7
	I/O ports at 1880 [size=32]
	Capabilities: [50] #13 [0306]

00:1d.1 USB Controller: Intel Corporation Unknown device 2935 (rev 02) (prog-if 00 [UHCI])
	Subsystem: Super Micro Computer Inc Unknown device d280
	Flags: bus master, medium devsel, latency 0, IRQ 10
	I/O ports at 18a0 [size=32]
	Capabilities: [50] #13 [0306]

00:1d.2 USB Controller: Intel Corporation Unknown device 2936 (rev 02) (prog-if 00 [UHCI])
	Subsystem: Super Micro Computer Inc Unknown device d280
	Flags: bus master, medium devsel, latency 0, IRQ 11
	I/O ports at 18c0 [size=32]
	Capabilities: [50] #13 [0306]

00:1d.7 USB Controller: Intel Corporation Unknown device 293a (rev 02) (prog-if 20 [EHCI])
	Subsystem: Super Micro Computer Inc Unknown device d280
	Flags: bus master, medium devsel, latency 0, IRQ 7
	Memory at d0000400 (32-bit, non-prefetchable) [size=1K]
	Capabilities: [50] Power Management version 2
	Capabilities: [58] Debug port
	Capabilities: [98] #13 [0306]

00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92) (prog-if 01 [Subtractive decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=11, subordinate=11, sec-latency=32
	I/O behind bridge: 00004000-00004fff
	Memory behind bridge: d0400000-d04fffff
	Prefetchable memory behind bridge: 00000000d8000000-00000000dfffffff
	Capabilities: [50] Subsystem: Super Micro Computer Inc Unknown device d280

00:1f.0 ISA bridge: Intel Corporation Unknown device 2916 (rev 02)
	Subsystem: Super Micro Computer Inc Unknown device d280
	Flags: bus master, medium devsel, latency 0
	Capabilities: [e0] Vendor Specific Information

00:1f.2 SATA controller: Intel Corporation Unknown device 2922 (rev 02) (prog-if 01 [AHCI 1.0])
	Subsystem: Super Micro Computer Inc Unknown device d280
	Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 76
	I/O ports at 1c50 [size=8]
	I/O ports at 1c44 [size=4]
	I/O ports at 1c48 [size=8]
	I/O ports at 1c40 [size=4]
	I/O ports at 18e0 [size=32]
	Memory at d0000800 (32-bit, non-prefetchable) [size=2K]
	Capabilities: [80] Message Signalled Interrupts: Mask- 64bit- Queue=0/4 Enable+
	Capabilities: [70] Power Management version 3
	Capabilities: [a8] #12 [0010]
	Capabilities: [b0] #13 [0306]

00:1f.3 SMBus: Intel Corporation Unknown device 2930 (rev 02)
	Subsystem: Super Micro Computer Inc Unknown device d280
	Flags: medium devsel, IRQ 10
	Memory at d0001000 (64-bit, non-prefetchable) [disabled] [size=256]
	I/O ports at 1100 [size=32]

00:1f.6 Signal processing controller: Intel Corporation Unknown device 2932 (rev 02)
	Subsystem: Super Micro Computer Inc Unknown device 0000
	Flags: fast devsel, IRQ 5
	Memory at d0002000 (64-bit, non-prefetchable) [size=4K]
	Capabilities: [50] Power Management version 3

01:00.0 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge A (rev 09) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=01, secondary=02, subordinate=02, sec-latency=64
	Capabilities: [44] Express PCI/PCI-X Bridge IRQ 0
	Capabilities: [5c] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
	Capabilities: [6c] Power Management version 2
	Capabilities: [d8] PCI-X bridge device

01:00.1 PIC: Intel Corporation 6700/6702PXH I/OxAPIC Interrupt Controller A (rev 09) (prog-if 20 [IO(X)-APIC])
	Subsystem: Super Micro Computer Inc Unknown device d280
	Flags: bus master, fast devsel, latency 0
	Memory at d0100000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: [44] Express Endpoint IRQ 0
	Capabilities: [6c] Power Management version 2

01:00.2 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge B (rev 09) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=01, secondary=03, subordinate=03, sec-latency=48
	Prefetchable memory behind bridge: 00000000d4000000-00000000d7ffffff
	Capabilities: [44] Express PCI/PCI-X Bridge IRQ 0
	Capabilities: [5c] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
	Capabilities: [6c] Power Management version 2
	Capabilities: [d8] PCI-X bridge device

01:00.3 PIC: Intel Corporation 6700PXH I/OxAPIC Interrupt Controller B (rev 09) (prog-if 20 [IO(X)-APIC])
	Subsystem: Super Micro Computer Inc Unknown device d280
	Flags: bus master, fast devsel, latency 0
	Memory at d0101000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: [44] Express Endpoint IRQ 0
	Capabilities: [6c] Power Management version 2

03:02.0 RAID bus controller: Mylex Corporation AcceleRAID 352/170/160 support Device (rev 02)
	Subsystem: Mylex Corporation AcceleRAID 352 support Device
	Flags: bus master, fast Back2Back, medium devsel, latency 32, IRQ 52
	Memory at d4000000 (32-bit, prefetchable) [size=64M]
	Capabilities: [80] Power Management version 2

0d:00.0 Ethernet controller: Intel Corporation 82573E Gigabit Ethernet Controller (Copper) (rev 03)
	Subsystem: Super Micro Computer Inc Unknown device 108c
	Flags: bus master, fast devsel, latency 0, IRQ 77
	Memory at d0200000 (32-bit, non-prefetchable) [size=128K]
	I/O ports at 2000 [size=32]
	Capabilities: [c8] Power Management version 2
	Capabilities: [d0] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
	Capabilities: [e0] Express Endpoint IRQ 0

0f:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
	Subsystem: Super Micro Computer Inc Unknown device 109a
	Flags: bus master, fast devsel, latency 0, IRQ 78
	Memory at d0300000 (32-bit, non-prefetchable) [size=128K]
	I/O ports at 3000 [size=32]
	Capabilities: [c8] Power Management version 2
	Capabilities: [d0] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
	Capabilities: [e0] Express Endpoint IRQ 0

11:04.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02) (prog-if 00 [VGA])
	Subsystem: Super Micro Computer Inc Unknown device d280
	Flags: bus master, stepping, fast Back2Back, medium devsel, latency 66, IRQ 10
	Memory at d8000000 (32-bit, prefetchable) [size=128M]
	I/O ports at 4000 [size=256]
	Memory at d0400000 (32-bit, non-prefetchable) [size=64K]
	[virtual] Expansion ROM at d0420000 [disabled] [size=128K]
	Capabilities: [50] Power Management version 2



* Re: 2.6.28.9: EXT3/NFS inodes corruption
  2009-07-28 16:45   ` Sylvain Rochet
@ 2009-08-21 11:05     ` Daniel J Blueman
  2009-08-21 14:32       ` Sylvain Rochet
  0 siblings, 1 reply; 39+ messages in thread
From: Daniel J Blueman @ 2009-08-21 11:05 UTC (permalink / raw)
  To: Sylvain Rochet; +Cc: Linux Kernel

Hi Sylvain,

On Tue, Jul 28, 2009 at 5:45 PM, Sylvain Rochet<gradator@gradator.net> wrote:
> On Tue, Jul 28, 2009 at 09:40:42AM -0700, Daniel J Blueman wrote:
>> On Apr 20, 5:30 pm, Sylvain Rochet <grada...@gradator.net> wrote:
>> > Hi,
>> >
>> > We(TuxFamily) are having some inodes corruptions on a NFS server.
>> >
>> > So, let's start with the facts.
>> >
>> > ==== NFS Server
>> >
>> > Linux bazooka 2.6.28.9 #1 SMP Mon Mar 30 12:58:22 CEST 2009 x86_64 GNU/Linux
>> [snip]
>>
>> Can you do a 'lspci -v' on the server please?
>
> Of course yes.
>
> I attached the 'lspci -v' of the previous and the current storage
> server.

The reason I ask: a while ago I was chasing data corruption across the
PCIe bus with some high-performance Quadrics interconnect adapters.
The reproducer involved multiple outstanding main memory read requests
to related addresses, and a small block of data would be returned from
the wrong offset.

In the end, I found the nVidia CK804 (also MCP55) HT->PCIe bridge was
at fault, and later found disk corruption when doing heavy rsyncs to
the network. This was never publicly acknowledged, but I guess it
illustrates the need for some micro-tests to verify data soundness
under duress: it took a day (and petabytes of data) of the production
I/O workload to hit this data corruption, but 3 seconds with the right
reproducer (and it was still non-trivial to catch on a PCIe protocol
analyser).

Sometime I'll develop a stress-test driver for a common SATA or
network controller to drive its DMA engine with I/O patterns to and
from main memory, checking the data integrity every few seconds; this
could be generalised with OpenGL nicely for graphics cards on
workstations I imagine.
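
As a rough userspace illustration of that kind of micro-test (not the driver described above, and not something posted in this thread), the sketch below writes blocks whose contents depend on their own index, syncs them, then re-reads and compares. Because each block encodes its index, data that comes back from the wrong offset shows up as a mismatch. The function names, block sizes, and scratch path are assumptions made for the example. Note that a plain re-read is usually served from the page cache, so exercising the actual device path would additionally need the caches dropped or direct I/O.

```python
import hashlib
import os
import tempfile


def pattern_block(seed, index, size):
    """Return a deterministic block that depends on its own index."""
    out = bytearray()
    counter = 0
    while len(out) < size:
        out += hashlib.sha256(f"{seed}:{index}:{counter}".encode()).digest()
        counter += 1
    return bytes(out[:size])


def write_pattern(path, blocks, block_size, seed=1):
    """Write the patterned blocks and ask the kernel to flush them."""
    with open(path, "wb") as f:
        for i in range(blocks):
            f.write(pattern_block(seed, i, block_size))
        f.flush()
        os.fsync(f.fileno())


def verify_pattern(path, blocks, block_size, seed=1):
    """Re-read the file and return the indices of blocks that do not match."""
    bad = []
    with open(path, "rb") as f:
        for i in range(blocks):
            if f.read(block_size) != pattern_block(seed, i, block_size):
                bad.append(i)
    return bad


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "pattern.bin")  # hypothetical scratch file
        write_pattern(target, blocks=256, block_size=4096)
        print("mismatching blocks:", verify_pattern(target, 256, 4096))
```

Running the write and verify steps in a loop, with a different seed per pass, would approximate the "check every few seconds" idea.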

Daniel
-- 
Daniel J Blueman


* Re: 2.6.28.9: EXT3/NFS inodes corruption
  2009-08-21 11:05     ` Daniel J Blueman
@ 2009-08-21 14:32       ` Sylvain Rochet
  0 siblings, 0 replies; 39+ messages in thread
From: Sylvain Rochet @ 2009-08-21 14:32 UTC (permalink / raw)
  To: Daniel J Blueman; +Cc: Linux Kernel, Sylvain Rochet

[-- Attachment #1: Type: text/plain, Size: 1242 bytes --]

Hi,

On Fri, Aug 21, 2009 at 12:05:10PM +0100, Daniel J Blueman wrote:
> 
> The reason I ask: a while ago I was chasing data corruption across the
> PCIe bus with some high-performance Quadrics interconnect adapters.
> The reproducer involved multiple outstanding main memory read requests
> to related addresses, and a small block of data would be returned from
> the wrong offset.
> 
> In the end, I found the nVidia CK804 (also MCP55) HT->PCIe bridge was
> at fault, and later found disk corruption when doing heavy rsyncs to
> the network. This was never publicly acknowledged, but I guess it
> illustrates the need for some micro-tests to verify data soundness
> under duress: it took a day (and petabytes of data) of the production
> I/O workload to hit this data corruption, but 3 seconds with the right
> reproducer (and it was still non-trivial to catch on a PCIe protocol
> analyser).
> 
> Sometime I'll develop a stress-test driver for a common SATA or
> network controller to drive its DMA engine with I/O patterns to and
> from main memory, checking the data integrity every few seconds; this
> could be generalised with OpenGL nicely for graphics cards on
> workstations I imagine.

Hehe, sounds interesting.

Sylvain


* Re: 2.6.28.9: EXT3/NFS inodes corruption
  2009-08-21  0:00                           ` Simon Kirby
  (?)
  (?)
@ 2009-08-21 10:51                           ` Sylvain Rochet
  -1 siblings, 0 replies; 39+ messages in thread
From: Sylvain Rochet @ 2009-08-21 10:51 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Jan Kara, linux-kernel, linux-ext4, linux-nfs, Al Viro, Sylvain Rochet

[-- Attachment #1: Type: text/plain, Size: 2631 bytes --]

Hi,

On Thu, Aug 20, 2009 at 05:00:35PM -0700, Simon Kirby wrote:
> On Thu, Aug 20, 2009 at 07:19:53PM +0200, Sylvain Rochet wrote:
> 
> > So, everything is fine, but the problem happened only one time on this 
> > server, so we cannot conclude anything after a few weeks. However, 
> > I now have physical access back, so we will switch back to the former 
> > server where the problem happened quite frequently, then we will see!
> 
> Not to derail the thread, but you were definitely seeing the same issues
> with stock 2.6.30.4, right?

Nope, the last issue we had came from 2.6.28.9.

We upgraded to 2.6.30.3 on Jan's advice, then we "upgraded" to
2.6.30.3 with Jan's first patch, which adds some debug output
(0001-ext3-Debug-unlinking-of-inodes.patch). Finally we upgraded to
2.6.30.4 with both Jan's first patch and his second patch
(0001-fs-Make-sure-data-stored-into-inode-is-properly-see.patch), which
adds an smp_mb() in the unlock_new_inode() function.

> We had all sorts of corruption happening for files served via NFS with 
> 2.6.28 and 2.6.29, but everything was magically fixed on 2.6.30 
> (though we needed a lot of fscking).  I never did track down what 
> change fixed it, since it took a while to reproduce.

Same here, everything has been fine since 2.6.30. In a few days we will
switch back to the quad-core server where the corruption happen(ed). We
are now using a dual-Opteron server because we suspected hardware issues
on the quad-core; the corruption happened only once on the dual-Opteron
(which is IMHO sufficient evidence to rule out a hardware issue). I
guess the issue was (or is) kinda SMP related.

And yep, we also spent a long time playing with fsck ;-) Luckily the
corruption only occurs on new files, and new files are mostly caches,
sessions, logs, and such, so fsck used its chainsaw on files that are
not really important.

> Hmm.  I just noticed what seems to be a new occurrence of "deleted inode
> referenced" on a box with 2.6.30.  We saw many when we first upgraded to
> 2.6.30 due to the corruption caused by 2.6.29, but those all occurred
> within a day or so and were fsck'd.  I would have thought the backup
> sweeps would have tripped over that inode way before now...
> 
> Just wondering if you can confirm that the errors you saw with 2.6.30.4
> were not leftover from older kernels.

The few garbage inodes from 2.6.28.9 (and earlier) were pushed to
lost+found to prevent them from being reused. We ran an fsck when we
moved to 2.6.30.4, which fixed everything. We have not had any
corruption yet with 2.6.30.4.

Sylvain


* Re: 2.6.28.9: EXT3/NFS inodes corruption
@ 2009-08-21  0:00                           ` Simon Kirby
  0 siblings, 0 replies; 39+ messages in thread
From: Simon Kirby @ 2009-08-21  0:00 UTC (permalink / raw)
  To: Sylvain Rochet; +Cc: Jan Kara, linux-kernel, linux-ext4, linux-nfs, Al Viro

On Thu, Aug 20, 2009 at 07:19:53PM +0200, Sylvain Rochet wrote:

> So, everything is fine, but the problem happened only one time on this 
> server, so we cannot conclude anything after a few weeks. However, 
> I now have physical access back, so we will switch back to the former 
> server where the problem happened quite frequently, then we will see!

Not to derail the thread, but you were definitely seeing the same issues
with stock 2.6.30.4, right?  We had all sorts of corruption happening for
files served via NFS with 2.6.28 and 2.6.29, but everything was magically
fixed on 2.6.30 (though we needed a lot of fscking).  I never did track
down what change fixed it, since it took a while to reproduce.

Hmm.  I just noticed what seems to be a new occurrence of "deleted inode
referenced" on a box with 2.6.30.  We saw many when we first upgraded to
2.6.30 due to the corruption caused by 2.6.29, but those all occurred
within a day or so and were fsck'd.  I would have thought the backup
sweeps would have tripped over that inode way before now...

Just wondering if you can confirm that the errors you saw with 2.6.30.4
were not leftover from older kernels.
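
One way to approach that question is to tally the "deleted inode referenced" messages per inode number from the kernel log before and after the fsck, and see whether any inode numbers are new. The sketch below is only an illustration: the function name is made up, and the sample log lines in the comment are hypothetical stand-ins rather than output taken from any of the servers in this thread.

```python
import re
from collections import Counter

# Matches the message text quoted in this thread; the trailing inode
# number is optional because the exact log format is an assumption.
PATTERN = re.compile(r"deleted inode referenced:?\s*(\d+)?")


def count_deleted_inode_refs(lines):
    """Count 'deleted inode referenced' messages, keyed by inode number."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1) or "unknown"] += 1
    return counts


def new_inodes(before, after):
    """Inode numbers reported after the fsck that were never seen before it."""
    return sorted(set(after) - set(before))


# Hypothetical usage with made-up log lines:
#   before = count_deleted_inode_refs(open("kern.log.old"))
#   after = count_deleted_inode_refs(open("kern.log"))
#   print(new_inodes(before, after))
```

Inode numbers that only appear after the fsck would point at fresh corruption rather than leftovers, though inode numbers can be reused, so this is a hint and not proof.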

Cheers,

Simon-


* Re: 2.6.28.9: EXT3/NFS inodes corruption
@ 2009-08-20 17:19                         ` Sylvain Rochet
  0 siblings, 0 replies; 39+ messages in thread
From: Sylvain Rochet @ 2009-08-20 17:19 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-kernel, linux-ext4, linux-nfs, Al Viro, Sylvain Rochet

[-- Attachment #1: Type: text/plain, Size: 829 bytes --]

Hi!,

On Thu, Aug 13, 2009 at 12:34:53AM +0200, Jan Kara wrote:
>   Hello,
> 
> On Thu 06-08-09 15:15:56, Sylvain Rochet wrote:
> > > I have one idea what could cause your filesystem corruption, although 
> > > it's a wild guess... Please try attached oneliner.
> > 
> > Running since yesterday.
> 
> Any news after a week of running? How often did the corruption happen
> previously?

Sorry for the late answer, I was lurking at HAR ;-)

So, everything is fine, but the problem happened only once on this
server, so we cannot conclude anything after just a few weeks. However,
I now have physical access again, so we will switch back to the former
server, where the problem happened quite frequently, and then we will see!

By the way, syslogd is happy, eating about 350 MiB of kernel logs a day ;)

Sylvain

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: 2.6.28.9: EXT3/NFS inodes corruption
@ 2009-08-12 22:34                       ` Jan Kara
  0 siblings, 0 replies; 39+ messages in thread
From: Jan Kara @ 2009-08-12 22:34 UTC (permalink / raw)
  To: Sylvain Rochet; +Cc: linux-kernel, linux-ext4, linux-nfs, Al Viro

  Hello,

On Thu 06-08-09 15:15:56, Sylvain Rochet wrote:
> > I have one idea what could cause your filesystem corruption, although 
> > it's a wild guess... Please try attached oneliner.
> 
> Running since yesterday.
  Any news after a week of running? How often did the corruption happen
previously?

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: 2.6.28.9: EXT3/NFS inodes corruption
@ 2009-08-06 17:05                       ` J. Bruce Fields
  0 siblings, 0 replies; 39+ messages in thread
From: J. Bruce Fields @ 2009-08-06 17:05 UTC (permalink / raw)
  To: Sylvain Rochet; +Cc: Jan Kara, linux-kernel, linux-ext4, linux-nfs, Al Viro

On Thu, Aug 06, 2009 at 03:15:56PM +0200, Sylvain Rochet wrote:
> Hi,
> 
> 
> On Wed, Aug 05, 2009 at 12:56:19AM +0200, Jan Kara wrote:
> > 
> > Thanks for testing. So you seem to be really stressing the path where
> > creation of new files / directories fails (probably due to group quota).
> 
> Yes, there are 29 groups over quota out of a total of 4499. Those are mainly
> spammed websites and therefore quite stressed due to the number of attempts
> to add new "data".
> 
> 
> > I have one idea what could cause your filesystem corruption, although 
> > it's a wild guess... Please try attached oneliner.
> 
> Running since yesterday.
> 
> 
> > Also your corruption reminded me that Al Viro has been fixing problems
> > where we could cache one inode twice when a filesystem was mounted over NFS
> > and that could also lead to a filesystem corruption. So I'm adding him to
> > CC just in case he has some idea. BTW Al, what do you think about the
> > problem I describe in the attached patch? I'm not sure if it can cause some
> > real problems but in theory it could...
> 
> Should we upgrade NFS clients as well ?  (now running 2.6.28.9)

The client version shouldn't matter.

--b.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: 2.6.28.9: EXT3/NFS inodes corruption
@ 2009-08-06 13:15                     ` Sylvain Rochet
  0 siblings, 0 replies; 39+ messages in thread
From: Sylvain Rochet @ 2009-08-06 13:15 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-kernel, linux-ext4, linux-nfs, Al Viro

[-- Attachment #1: Type: text/plain, Size: 1063 bytes --]

Hi,


On Wed, Aug 05, 2009 at 12:56:19AM +0200, Jan Kara wrote:
> 
> Thanks for testing. So you seem to be really stressing the path where
> creation of new files / directories fails (probably due to group quota).

Yes, there are 29 groups over quota out of a total of 4499. Those are mainly
spammed websites and therefore quite stressed due to the number of attempts
to add new "data".


> I have one idea what could cause your filesystem corruption, although 
> it's a wild guess... Please try attached oneliner.

Running since yesterday.


> Also your corruption reminded me that Al Viro has been fixing problems
> where we could cache one inode twice when a filesystem was mounted over NFS
> and that could also lead to a filesystem corruption. So I'm adding him to
> CC just in case he has some idea. BTW Al, what do you think about the
> problem I describe in the attached patch? I'm not sure if it can cause some
> real problems but in theory it could...

Should we upgrade the NFS clients as well? (now running 2.6.28.9)


Sylvain

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: 2.6.28.9: EXT3/NFS inodes corruption
@ 2009-08-04 22:56                   ` Jan Kara
  0 siblings, 0 replies; 39+ messages in thread
From: Jan Kara @ 2009-08-04 22:56 UTC (permalink / raw)
  To: Sylvain Rochet; +Cc: Jan Kara, linux-kernel, linux-ext4, linux-nfs, Al Viro

[-- Attachment #1: Type: text/plain, Size: 1703 bytes --]

  Hi,

On Tue 04-08-09 13:15:05, Sylvain Rochet wrote:
> On Tue, Aug 04, 2009 at 12:29:01AM +0200, Jan Kara wrote:
> > 
> >   OK, I've found some time and written the debugging patch. Hopefully it
> > will tell us more. It should output messages to the kernel log if it
> > finds something suspicious - like:
> > No dentry for unlinked inode...
> > Dentry ... for unlinked inode ... has no parent
> > Found directory entry ... for unlinked inode
> > 
> >   When you see such messages in the log, send them to me please. Also
> > attach the System.map file so that I can translate the address where
> > i_nlink was dropped - for that ext3 should be compiled into the kernel
> > (should not be a module). Thanks a lot for testing.
> 
> Patch applied.
> 
> And there is already a lot of output.
> 
> http://edony.tuxfamily.org/~grad/bazooka/System.map-2.6.30.4
> http://edony.tuxfamily.org/~grad/bazooka/config-2.6.30.4
> http://edony.tuxfamily.org/~grad/bazooka/kern.log
  Thanks for testing. So you seem to be really stressing the path where
creation of new files / directories fails (probably due to group quota).  I
have one idea what could cause your filesystem corruption, although it's a
wild guess... Please try attached oneliner.
  Also your corruption reminded me that Al Viro has been fixing problems
where we could cache one inode twice when a filesystem was mounted over NFS
and that could also lead to a filesystem corruption. So I'm adding him to
CC just in case he has some idea. BTW Al, what do you think about the
problem I describe in the attached patch? I'm not sure if it can cause some
real problems but in theory it could...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

[-- Attachment #2: 0001-fs-Make-sure-data-stored-into-inode-is-properly-see.patch --]
[-- Type: text/x-patch, Size: 985 bytes --]

From 78513d3a5628fda0f8d685d732b7bc73bd4c9222 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Wed, 5 Aug 2009 00:42:21 +0200
Subject: [PATCH] fs: Make sure data stored into inode is properly seen before unlocking new inode

In theory it could happen that on one CPU we initialize a new inode but clearing
of I_NEW | I_LOCK gets reordered before some of the initialization. Thus on
another CPU we return not fully uptodate inode from iget_locked().

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/inode.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 901bad1..e9a8e77 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -696,6 +696,7 @@ void unlock_new_inode(struct inode *inode)
 	 * just created it (so there can be no old holders
 	 * that haven't tested I_LOCK).
 	 */
+	smp_mb();
 	WARN_ON((inode->i_state & (I_LOCK|I_NEW)) != (I_LOCK|I_NEW));
 	inode->i_state &= ~(I_LOCK|I_NEW);
 	wake_up_inode(inode);
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: 2.6.28.9: EXT3/NFS inodes corruption
  2009-08-03 22:29               ` Jan Kara
  (?)
  (?)
@ 2009-08-04 11:15               ` Sylvain Rochet
  2009-08-04 22:56                   ` Jan Kara
  -1 siblings, 1 reply; 39+ messages in thread
From: Sylvain Rochet @ 2009-08-04 11:15 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-kernel, linux-ext4, linux-nfs

[-- Attachment #1: Type: text/plain, Size: 915 bytes --]

Hi,


On Tue, Aug 04, 2009 at 12:29:01AM +0200, Jan Kara wrote:
> 
>   OK, I've found some time and written the debugging patch. Hopefully it
> will tell us more. It should output messages to the kernel log if it
> finds something suspicious - like:
> No dentry for unlinked inode...
> Dentry ... for unlinked inode ... has no parent
> Found directory entry ... for unlinked inode
> 
>   When you see such messages in the log, send them to me please. Also
> attach the System.map file so that I can translate the address where
> i_nlink was dropped - for that ext3 should be compiled into the kernel
> (should not be a module). Thanks a lot for testing.

Patch applied.

And there is already a lot of output.

http://edony.tuxfamily.org/~grad/bazooka/System.map-2.6.30.4
http://edony.tuxfamily.org/~grad/bazooka/config-2.6.30.4
http://edony.tuxfamily.org/~grad/bazooka/kern.log


Sylvain

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: 2.6.28.9: EXT3/NFS inodes corruption
  2009-07-29 12:58             ` Jan Kara
@ 2009-08-04 11:02               ` Sylvain Rochet
  0 siblings, 0 replies; 39+ messages in thread
From: Sylvain Rochet @ 2009-08-04 11:02 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-kernel, linux-ext4, linux-nfs

[-- Attachment #1: Type: text/plain, Size: 1127 bytes --]

Hi,


On Wed, Jul 29, 2009 at 02:58:12PM +0200, Jan Kara wrote:
> On Tue 28-07-09 18:41:42, Sylvain Rochet wrote:
> > 
> > Let's move out the corrupted directory ;)
> > 
> > root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# rm -- * .ok 
> > rm: cannot remove `spip%3Farticle19.f8740dca': Input/output error
> > root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cd ..
> > root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache# mv e/ /data/lost+found/wooops
> 
> Actually, leaving that file in the filesystem can potentially lead to
> strange effects because eventually the inode "spip%3Farticle19.f8740dca"
> points to gets reallocated and then you can get e.g. a hardlinked
> directory. On the other hand, having it in lost+found should be safe enough.

This happened a few times in the past: we saw corrupted dentries reappearing with
a new file. The new files had a reference count of 1 (when it obviously should
have been 2 in this case). So the rule is "do not delete corrupted dentries, keep
them safe in lost+found and do not touch them" ;).
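
To make the mismatch concrete, here is a minimal user-space sketch. It is a
toy model, not ext3 code: struct toy_dirent, struct toy_inode and both
functions are hypothetical. It counts how many directory entries point at an
inode number and compares that with the stored link count.

```c
#include <stddef.h>

/* Toy directory entry: a name pointing at an inode number. */
struct toy_dirent {
	const char *name;
	unsigned long ino;
};

/* Toy inode: number, stored link count, and whether it is allocated. */
struct toy_inode {
	unsigned long ino;
	unsigned int nlink;
	int in_use;
};

/* Count the directory entries that reference inode number ino. */
unsigned int toy_count_refs(const struct toy_dirent *ents, size_t n,
			    unsigned long ino)
{
	unsigned int refs = 0;

	for (size_t i = 0; i < n; i++)
		if (ents[i].ino == ino)
			refs++;
	return refs;
}

/*
 * Return 1 if the entries agree with the inode: a free inode must have
 * no entries, an allocated one exactly nlink entries.
 */
int toy_links_consistent(const struct toy_inode *in,
			 const struct toy_dirent *ents, size_t n)
{
	unsigned int refs = toy_count_refs(ents, n, in->ino);

	if (!in->in_use)
		return refs == 0;
	return refs == in->nlink;
}
```

In this model a stale entry that still points at a freed inode is already
inconsistent, and once the inode number is handed out again for a new file
there are two entries for an inode whose link count says 1.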


Sylvain

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: 2.6.28.9: EXT3/NFS inodes corruption
  2009-07-28 21:12             ` J. Bruce Fields
@ 2009-08-04 10:50               ` Sylvain Rochet
  0 siblings, 0 replies; 39+ messages in thread
From: Sylvain Rochet @ 2009-08-04 10:50 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Jan Kara, linux-kernel, linux-ext4, linux-nfs

[-- Attachment #1: Type: text/plain, Size: 1107 bytes --]

Hi,

On Tue, Jul 28, 2009 at 05:12:15PM -0400, J. Bruce Fields wrote:
> On Tue, Jul 28, 2009 at 06:41:42PM +0200, Sylvain Rochet wrote:
> > 
> > [...]
> > May 26 12:34:39 cognac kernel: NFS: Buggy server - nlink == 0!
> > May 26 12:39:43 cognac kernel: NFS: Buggy server - nlink == 0!
> > 
> > This is obviously related to the corruption.
> 
> It might be interesting to know whether the file that we returned to the
> client with nlink 0 was the same that you later saw corruption on; maybe
> adding a printk of the inode number there would help.
> 
> Googling around on that error message, a previous thread:
> 
> 	http://marc.info/?t=107429333300004&r=1&w=4
> 
> seems to conclude it's a bug, but doesn't follow up with a fix.  And I
> don't see any mention of possible filesystem corruption.
> 
> Is NFSv4 involved here?  I wonder if something that might otherwise be
> only a problem for the client could become a problem for the server if
> it attempts to do further operations with an unlinked inode in a
> compound operation that follows a lookup.

NFSv3 here.

Sylvain

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: 2.6.28.9: EXT3/NFS inodes corruption
@ 2009-08-03 22:29               ` Jan Kara
  0 siblings, 0 replies; 39+ messages in thread
From: Jan Kara @ 2009-08-03 22:29 UTC (permalink / raw)
  To: Sylvain Rochet; +Cc: linux-kernel, linux-ext4, linux-nfs

[-- Attachment #1: Type: text/plain, Size: 3322 bytes --]

  Hi,

On Tue 28-07-09 18:41:42, Sylvain Rochet wrote:
> On Tue, Jul 28, 2009 at 03:52:26PM +0200, Jan Kara wrote:
> > On Tue 28-07-09 13:27:15, Sylvain Rochet wrote:
> > > On Mon, Jul 27, 2009 at 05:42:53PM +0200, Jan Kara wrote:
> > > > On Sat 25-07-09 17:17:52, Sylvain Rochet wrote:
> > > > > > 
> > > > > > Can you still see the corruption with 2.6.30 kernel?
> > > > > 
> > > > > Not upgraded yet, we'll give a try.
> > > 
> > > Done, now featuring 2.6.30.3 ;)
> > 
> > OK, drop me an email if you will see corruption also with this kernel.
> 
> Let's move out the corrupted directory ;)
> 
> root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# rm -- * .ok 
> rm: cannot remove `spip%3Farticle19.f8740dca': Input/output error
> root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cd ..
> root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache# mv e/ /data/lost+found/wooops
> 
> > > > This is probably the misleading output from ext3_iget(). It should give
> > > > you EIO in the latest kernel.
> > > 
> > > root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cat spip%3Farticle19.f8740dca 
> > > cat: spip%3Farticle19.f8740dca: Input/output error
> > > 
> > > It makes much more sense now. We thought the problem was around NFS due
> > > to the previous error message; actually, this is probably not the best
> > > path to look at.
> > 
> > Yes, EIO makes more sense. I think the problem is NFS connected anyway
> > though :). But I don't have a clue how it can happen yet. Maybe I can try
> > adding some low-cost debugging checks if you'd be willing to run such a
> > kernel...
> 
> Without any problem, we have 24/7/365 physical access and we don't need 
> to provide high-availability services.
> 
> Anyway, the data hosted aren't that important, there is little or even 
> no need for strict confidentiality, so we will be happy to provide ssh 
> access to whoever would like to look deeper into this issue.
> 
> 
> > I'm adding to CC linux-nfs just in case someone has an idea.
> > 
> > > >   Ah, OK, here's the problem. The directory points to a file which is
> > > > obviously deleted (note the "Links: 0"). All the content of the inode seems
> > > > to indicate that the file was correctly deleted (you might check that the
> > > > corresponding bit in the bitmap is cleared via: "icheck 88541562").
> > > 
> > > root@bazooka:~# debugfs /dev/md10
> > > debugfs 1.40-WIP (14-Nov-2006)
> > > debugfs:  icheck 88541562
> > > Block   Inode number
> > > 88541562        <block not found>
> > 
> > Ah, wrong debugfs command. I should have written:
> > testi <88541562>
> 
> debugfs:  testi <88541562>
> Inode 88541562 is not in use
  OK, I've found some time and written the debugging patch. Hopefully it
will tell us more. It should output messages to the kernel log if it
finds something suspicious - like:
No dentry for unlinked inode...
Dentry ... for unlinked inode ... has no parent
Found directory entry ... for unlinked inode

  When you see such messages in the log, send them to me please. Also
attach the System.map file so that I can translate the address where
i_nlink was dropped - for that ext3 should be compiled into the kernel
(should not be a module). Thanks a lot for testing.

								Honza

-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

[-- Attachment #2: 0001-ext3-Debug-unlinking-of-inodes.patch --]
[-- Type: text/x-patch, Size: 3524 bytes --]

From b32511dbd58c8d9111001a33d253a283943bbf7a Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Tue, 4 Aug 2009 00:17:35 +0200
Subject: [PATCH] ext3: Debug unlinking of inodes

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext3/inode.c    |   28 ++++++++++++++++++++++++++++
 fs/ext3/namei.c    |    2 +-
 include/linux/fs.h |    6 ++++++
 3 files changed, 35 insertions(+), 1 deletions(-)

diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index b49908a..dca30a2 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -2876,6 +2876,9 @@ bad_inode:
 	return ERR_PTR(ret);
 }
 
+struct buffer_head *ext3_find_entry(struct inode *dir,
+                                        struct qstr *entry,
+                                        struct ext3_dir_entry_2 **res_dir);
 /*
  * Post the struct inode info into an on-disk inode location in the
  * buffer-cache.  This gobbles the caller's reference to the
@@ -2892,6 +2895,31 @@ static int ext3_do_update_inode(handle_t *handle,
 	struct buffer_head *bh = iloc->bh;
 	int err = 0, rc, block;
 
+	if (!inode->i_nlink && !inode->i_checked_drop) {
+		struct dentry *dentry;
+		struct ext3_dir_entry_2 *de;
+		struct buffer_head *bh;
+
+		inode->i_checked_drop = 1;
+		if (list_empty(&inode->i_dentry)) {
+			printk("No dentry for unlinked inode %lu\nNlink dropped at 0x%lx\n", inode->i_ino, inode->i_dropped);
+			dump_stack();
+			goto next;
+		}
+		dentry = list_entry(inode->i_dentry.next, struct dentry, d_alias);
+		if (!dentry->d_parent) {
+			printk("Dentry %s for unlinked inode %lu has no parent\nNlink dropped at 0x%lx\n", dentry->d_name.name, inode->i_ino, inode->i_dropped);
+			dump_stack();
+			goto next;
+		}
+		bh = ext3_find_entry(dentry->d_parent->d_inode, &dentry->d_name, &de);
+		if (bh && le32_to_cpu(de->inode) == inode->i_ino) {
+			printk("Found directory entry %s for unlinked inode %lu\nNlink dropped at 0x%lx\n", dentry->d_name.name, inode->i_ino, inode->i_dropped);
+			brelse(bh);
+			dump_stack();
+		}
+	}
+next:
 	/* For fields not not tracking in the in-memory inode,
 	 * initialise them to zero for new inodes. */
 	if (ei->i_state & EXT3_STATE_NEW)
diff --git a/fs/ext3/namei.c b/fs/ext3/namei.c
index 6ff7b97..e66b6c0 100644
--- a/fs/ext3/namei.c
+++ b/fs/ext3/namei.c
@@ -850,7 +850,7 @@ static inline int search_dirblock(struct buffer_head * bh,
  * The returned buffer_head has ->b_count elevated.  The caller is expected
  * to brelse() it when appropriate.
  */
-static struct buffer_head *ext3_find_entry(struct inode *dir,
+struct buffer_head *ext3_find_entry(struct inode *dir,
 					struct qstr *entry,
 					struct ext3_dir_entry_2 **res_dir)
 {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a36ffa5..271c51c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -780,6 +780,8 @@ struct inode {
 	struct posix_acl	*i_acl;
 	struct posix_acl	*i_default_acl;
 #endif
+	unsigned long		i_dropped;
+	int			i_checked_drop;
 	void			*i_private; /* fs or device private pointer */
 };
 
@@ -1693,6 +1695,8 @@ static inline void inode_inc_link_count(struct inode *inode)
 static inline void drop_nlink(struct inode *inode)
 {
 	inode->i_nlink--;
+	inode->i_dropped = _THIS_IP_;
+	inode->i_checked_drop = 0;
 }
 
 /**
@@ -1706,6 +1710,8 @@ static inline void drop_nlink(struct inode *inode)
 static inline void clear_nlink(struct inode *inode)
 {
 	inode->i_nlink = 0;
+	inode->i_dropped = _THIS_IP_;
+	inode->i_checked_drop = 0;
 }
 
 static inline void inode_dec_link_count(struct inode *inode)
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 39+ messages in thread
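
The debugging patch above stores _THIS_IP_ in i_dropped and prints it as
"Nlink dropped at 0x...", and the mail asks for System.map so that this
address can be translated back to a function. A minimal userspace sketch
of that lookup follows. It assumes the usual System.map line layout of
"<hex address> <type> <name>" and that ext3 is built into the kernel, so
its symbols appear in the map. The helper name resolve_addr is
hypothetical and not part of any kernel tool.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: resolve addr to "symbol+0xoffset" using a
 * System.map-style stream whose lines look like
 * "<hex address> <type> <name>".
 * Picks the symbol with the highest start address that is <= addr,
 * so the input does not have to be sorted.
 * Returns 0 on success, -1 if no symbol starts at or below addr.
 */
int resolve_addr(FILE *map, unsigned long long addr,
		 char *out, size_t outlen)
{
	char line[512], name[256], best[256] = "";
	unsigned long long sym, best_addr = 0;
	char type;
	int found = 0;

	while (fgets(line, sizeof(line), map)) {
		/* Skip lines that do not match the expected layout. */
		if (sscanf(line, "%llx %c %255s", &sym, &type, name) != 3)
			continue;
		/* Symbols starting above addr cannot contain it. */
		if (sym > addr)
			continue;
		if (!found || sym >= best_addr) {
			best_addr = sym;
			strcpy(best, name);
			found = 1;
		}
	}
	if (!found)
		return -1;
	snprintf(out, outlen, "%s+0x%llx", best, addr - best_addr);
	return 0;
}
```

A caller would fopen() the System.map that matches the running kernel,
pass the address from the log line, and print the resulting string. The
map must come from exactly the same build, otherwise the translation is
meaningless.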

* Re: 2.6.28.9: EXT3/NFS inodes corruption
  2009-07-28 16:41           ` Sylvain Rochet
  2009-07-28 21:12             ` J. Bruce Fields
@ 2009-07-29 12:58             ` Jan Kara
  2009-08-04 11:02               ` Sylvain Rochet
  2009-08-03 22:29               ` Jan Kara
  2 siblings, 1 reply; 39+ messages in thread
From: Jan Kara @ 2009-07-29 12:58 UTC (permalink / raw)
  To: Sylvain Rochet; +Cc: Jan Kara, linux-kernel, linux-ext4, linux-nfs

On Tue 28-07-09 18:41:42, Sylvain Rochet wrote:
> Hi,
> 
> 
> On Tue, Jul 28, 2009 at 03:52:26PM +0200, Jan Kara wrote:
> > On Tue 28-07-09 13:27:15, Sylvain Rochet wrote:
> > > On Mon, Jul 27, 2009 at 05:42:53PM +0200, Jan Kara wrote:
> > > > On Sat 25-07-09 17:17:52, Sylvain Rochet wrote:
> > > > > > 
> > > > > > Can you still see the corruption with 2.6.30 kernel?
> > > > > 
> > > > > Not upgraded yet, we'll give a try.
> > > 
> > > Done, now featuring 2.6.30.3 ;)
> > 
> > OK, drop me an email if you will see corruption also with this kernel.
> 
> Let's move out the corrupted directory ;)
> 
> root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# rm -- * .ok 
> rm: cannot remove `spip%3Farticle19.f8740dca': Input/output error
> root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cd ..
> root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache# mv e/ /data/lost+found/wooops
  Actually, leaving that file in the filesystem can lead to strange
effects, because the inode that "spip%3Farticle19.f8740dca" points to
will eventually get reallocated and then you can end up with e.g. a
hardlinked directory. On the other hand, having it in lost+found should
be safe enough.

> > > > This is probably the misleading output from ext3_iget(). It should give
> > > > you EIO in the latest kernel.
> > > 
> > > root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cat spip%3Farticle19.f8740dca 
> > > cat: spip%3Farticle19.f8740dca: Input/output error
> > > 
> > > It makes much more sense now. We thought the problem was around NFS due
> > > to the previous error message, actually this is probably not the best
> > > looking path.
> > 
> > Yes, EIO makes more sense. I think the problem is NFS connected anyway
> > though :). But I don't have a clue how it can happen yet. Maybe I can try
> > adding some low-cost debugging checks if you'd be willing to run such
> > kernel...
> 
> Without any problem, we have 24/7/365 physical access and we don't need 
> to provide high-availability services.
  Cool, I'll try to cook up something then.

> Anyway, the data hosted aren't that important, there is little or even 
> no need for strict confidentiality, so we will be happy to provide ssh 
> access to whom would like to look deeper into this issue.
  I don't need to go that far (at least for now) but thanks for the offer.

> > I'm adding to CC linux-nfs just in case someone has an idea.
> > 
> > > >   Ah, OK, here's the problem. The directory points to a file which is
> > > > obviously deleted (note the "Links: 0"). All the content of the inode seems
> > > > to indicate that the file was correctly deleted (you might check that the
> > > > corresponding bit in the bitmap is cleared via: "icheck 88541562").
> > > 
> > > root@bazooka:~# debugfs /dev/md10
> > > debugfs 1.40-WIP (14-Nov-2006)
> > > debugfs:  icheck 88541562
> > > Block   Inode number
> > > 88541562        <block not found>
> > 
> > Ah, wrong debugfs command. I should have written:
> > testi <88541562>
> 
> debugfs:  testi <88541562>
> Inode 88541562 is not in use
  Yes, again this confirms that the inode was just correctly deleted. But
somehow a pointer to it remained in the directory.

> > > >   The question is how it could happen the directory still points to the
> > > > inode. Really strange. It looks as if we've lost a write to the directory
> > > > > but I don't see how. Are there any suspicious kernel messages in this case?
> > > 
> > > There were nothing for a while, but since the reboot there are some 
> > > about this inode: 
> > > 
> > > EXT3-fs error (device md10): ext3_lookup: deleted inode referenced: 88541562
> > 
> > > Yes, that's to be expected given the corruption. Any NFS error messages?
> 
> There are some error messages on NFS clients, however they are quite old.
> 
> Apr 19 15:38:21 gin kernel: NFS: Buggy server - nlink == 0!
> May  3 20:00:52 gin kernel: NFS: Buggy server - nlink == 0!
> May  3 23:24:03 gin kernel: NFS: Buggy server - nlink == 0!
> May  7 11:40:57 gin kernel: NFS: Buggy server - nlink == 0!
> May  7 14:41:02 gin kernel: NFS: Buggy server - nlink == 0!
> May 26 11:10:42 cognac kernel: NFS: Buggy server - nlink == 0!
> May 26 11:13:28 cognac kernel: NFS: Buggy server - nlink == 0!
> May 26 12:34:39 cognac kernel: NFS: Buggy server - nlink == 0!
> May 26 12:39:43 cognac kernel: NFS: Buggy server - nlink == 0!
> 
> This is obviously related to the corruption.
  Yes, this is a consequence of the bug - somebody deleted an inode because
i_nlink dropped down to 0 but the inode was in fact still referenced.

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: 2.6.28.9: EXT3/NFS inodes corruption
  2009-07-28 16:41           ` Sylvain Rochet
@ 2009-07-28 21:12             ` J. Bruce Fields
  2009-08-04 10:50               ` Sylvain Rochet
  2009-07-29 12:58             ` Jan Kara
  2009-08-03 22:29               ` Jan Kara
  2 siblings, 1 reply; 39+ messages in thread
From: J. Bruce Fields @ 2009-07-28 21:12 UTC (permalink / raw)
  To: Sylvain Rochet; +Cc: Jan Kara, linux-kernel, linux-ext4, linux-nfs

On Tue, Jul 28, 2009 at 06:41:42PM +0200, Sylvain Rochet wrote:
> Hi,
> 
> 
> On Tue, Jul 28, 2009 at 03:52:26PM +0200, Jan Kara wrote:
> > On Tue 28-07-09 13:27:15, Sylvain Rochet wrote:
> > > On Mon, Jul 27, 2009 at 05:42:53PM +0200, Jan Kara wrote:
> > > > On Sat 25-07-09 17:17:52, Sylvain Rochet wrote:
> > > > > > 
> > > > > > Can you still see the corruption with 2.6.30 kernel?
> > > > > 
> > > > > Not upgraded yet, we'll give a try.
> > > 
> > > Done, now featuring 2.6.30.3 ;)
> > 
> > OK, drop me an email if you will see corruption also with this kernel.
> 
> Let's move out the corrupted directory ;)
> 
> root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# rm -- * .ok 
> rm: cannot remove `spip%3Farticle19.f8740dca': Input/output error
> root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cd ..
> root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache# mv e/ /data/lost+found/wooops
> 
> 
> > > > This is probably the misleading output from ext3_iget(). It should give
> > > > you EIO in the latest kernel.
> > > 
> > > root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cat spip%3Farticle19.f8740dca 
> > > cat: spip%3Farticle19.f8740dca: Input/output error
> > > 
> > > It makes much more sense now. We thought the problem was around NFS due
> > > to the previous error message, actually this is probably not the best
> > > looking path.
> > 
> > Yes, EIO makes more sense. I think the problem is NFS connected anyway
> > though :). But I don't have a clue how it can happen yet. Maybe I can try
> > adding some low-cost debugging checks if you'd be willing to run such
> > kernel...
> 
> Without any problem, we have 24/7/365 physical access and we don't need 
> to provide high-availability services.
> 
> Anyway, the data hosted aren't that important, there is little or even 
> no need for strict confidentiality, so we will be happy to provide ssh 
> access to whom would like to look deeper into this issue.
> 
> 
> > I'm adding to CC linux-nfs just in case someone has an idea.
> > 
> > > >   Ah, OK, here's the problem. The directory points to a file which is
> > > > obviously deleted (note the "Links: 0"). All the content of the inode seems
> > > > to indicate that the file was correctly deleted (you might check that the
> > > > corresponding bit in the bitmap is cleared via: "icheck 88541562").
> > > 
> > > root@bazooka:~# debugfs /dev/md10
> > > debugfs 1.40-WIP (14-Nov-2006)
> > > debugfs:  icheck 88541562
> > > Block   Inode number
> > > 88541562        <block not found>
> > 
> > Ah, wrong debugfs command. I should have written:
> > testi <88541562>
> 
> debugfs:  testi <88541562>
> Inode 88541562 is not in use
> 
> 
> > > >   The question is how it could happen the directory still points to the
> > > > inode. Really strange. It looks as if we've lost a write to the directory
> > > > > but I don't see how. Are there any suspicious kernel messages in this case?
> > > 
> > > There were nothing for a while, but since the reboot there are some 
> > > about this inode: 
> > > 
> > > EXT3-fs error (device md10): ext3_lookup: deleted inode referenced: 88541562
> > 
> > > Yes, that's to be expected given the corruption. Any NFS error messages?
> 
> There are some error messages on NFS clients, however they are quite old.
> 
> Apr 19 15:38:21 gin kernel: NFS: Buggy server - nlink == 0!
> May  3 20:00:52 gin kernel: NFS: Buggy server - nlink == 0!
> May  3 23:24:03 gin kernel: NFS: Buggy server - nlink == 0!
> May  7 11:40:57 gin kernel: NFS: Buggy server - nlink == 0!
> May  7 14:41:02 gin kernel: NFS: Buggy server - nlink == 0!
> May 26 11:10:42 cognac kernel: NFS: Buggy server - nlink == 0!
> May 26 11:13:28 cognac kernel: NFS: Buggy server - nlink == 0!
> May 26 12:34:39 cognac kernel: NFS: Buggy server - nlink == 0!
> May 26 12:39:43 cognac kernel: NFS: Buggy server - nlink == 0!
> 
> This is obviously related to the corruption.

It might be interesting to know whether the file that we returned to the
client with nlink 0 was the same that you later saw corruption on; maybe
adding a printk of the inode number there would help.

Googling around on that error message, a previous thread:

	http://marc.info/?t=107429333300004&r=1&w=4

seems to conclude it's a bug, but doesn't follow up with a fix.  And I
don't see any mention of possible filesystem corruption.

Is NFSv4 involved here?  I wonder if something that might otherwise be
only a problem for the client could become a problem for the server if
it attempts to do further operations with an unlinked inode in a
compound operation that follows a lookup.

--b.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: 2.6.28.9: EXT3/NFS inodes corruption
  2009-07-28 13:52           ` Jan Kara
  (?)
  (?)
@ 2009-07-28 16:41           ` Sylvain Rochet
  2009-07-28 21:12             ` J. Bruce Fields
                               ` (2 more replies)
  -1 siblings, 3 replies; 39+ messages in thread
From: Sylvain Rochet @ 2009-07-28 16:41 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-kernel, linux-ext4, linux-nfs

[-- Attachment #1: Type: text/plain, Size: 3815 bytes --]

Hi,


On Tue, Jul 28, 2009 at 03:52:26PM +0200, Jan Kara wrote:
> On Tue 28-07-09 13:27:15, Sylvain Rochet wrote:
> > On Mon, Jul 27, 2009 at 05:42:53PM +0200, Jan Kara wrote:
> > > On Sat 25-07-09 17:17:52, Sylvain Rochet wrote:
> > > > > 
> > > > > Can you still see the corruption with 2.6.30 kernel?
> > > > 
> > > > Not upgraded yet, we'll give a try.
> > 
> > Done, now featuring 2.6.30.3 ;)
> 
> OK, drop me an email if you will see corruption also with this kernel.

Let's move out the corrupted directory ;)

root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# rm -- * .ok 
rm: cannot remove `spip%3Farticle19.f8740dca': Input/output error
root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cd ..
root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache# mv e/ /data/lost+found/wooops


> > > This is probably the misleading output from ext3_iget(). It should give
> > > you EIO in the latest kernel.
> > 
> > root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cat spip%3Farticle19.f8740dca 
> > cat: spip%3Farticle19.f8740dca: Input/output error
> > 
> > It makes much more sense now. We thought the problem was around NFS due
> > to the previous error message, actually this is probably not the best
> > looking path.
> 
> Yes, EIO makes more sense. I think the problem is NFS connected anyway
> though :). But I don't have a clue how it can happen yet. Maybe I can try
> adding some low-cost debugging checks if you'd be willing to run such
> kernel...

Without any problem, we have 24/7/365 physical access and we don't need 
to provide high-availability services.

Anyway, the data hosted aren't that important, there is little or even 
no need for strict confidentiality, so we will be happy to provide ssh 
access to whom would like to look deeper into this issue.


> I'm adding to CC linux-nfs just in case someone has an idea.
> 
> > >   Ah, OK, here's the problem. The directory points to a file which is
> > > obviously deleted (note the "Links: 0"). All the content of the inode seems
> > > to indicate that the file was correctly deleted (you might check that the
> > > corresponding bit in the bitmap is cleared via: "icheck 88541562").
> > 
> > root@bazooka:~# debugfs /dev/md10
> > debugfs 1.40-WIP (14-Nov-2006)
> > debugfs:  icheck 88541562
> > Block   Inode number
> > 88541562        <block not found>
> 
> Ah, wrong debugfs command. I should have written:
> testi <88541562>

debugfs:  testi <88541562>
Inode 88541562 is not in use


> > >   The question is how it could happen the directory still points to the
> > > inode. Really strange. It looks as if we've lost a write to the directory
> > > but I don't see how. Are there any suspicious kernel messages in this case?
> > 
> > There were nothing for a while, but since the reboot there are some 
> > about this inode: 
> > 
> > EXT3-fs error (device md10): ext3_lookup: deleted inode referenced: 88541562
> 
> Yes, that's to be expected given the corruption. Any NFS error messages?

There are some error messages on NFS clients, however they are quite old.

Apr 19 15:38:21 gin kernel: NFS: Buggy server - nlink == 0!
May  3 20:00:52 gin kernel: NFS: Buggy server - nlink == 0!
May  3 23:24:03 gin kernel: NFS: Buggy server - nlink == 0!
May  7 11:40:57 gin kernel: NFS: Buggy server - nlink == 0!
May  7 14:41:02 gin kernel: NFS: Buggy server - nlink == 0!
May 26 11:10:42 cognac kernel: NFS: Buggy server - nlink == 0!
May 26 11:13:28 cognac kernel: NFS: Buggy server - nlink == 0!
May 26 12:34:39 cognac kernel: NFS: Buggy server - nlink == 0!
May 26 12:39:43 cognac kernel: NFS: Buggy server - nlink == 0!

This is obviously related to the corruption.



Sylvain

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: 2.6.28.9: EXT3/NFS inodes corruption
@ 2009-07-28 13:52           ` Jan Kara
  0 siblings, 0 replies; 39+ messages in thread
From: Jan Kara @ 2009-07-28 13:52 UTC (permalink / raw)
  To: Sylvain Rochet; +Cc: Jan Kara, linux-kernel, linux-ext4, linux-nfs

On Tue 28-07-09 13:27:15, Sylvain Rochet wrote:
> On Mon, Jul 27, 2009 at 05:42:53PM +0200, Jan Kara wrote:
> > On Sat 25-07-09 17:17:52, Sylvain Rochet wrote:
> > > > 
> > > > Can you still see the corruption with 2.6.30 kernel?
> > > 
> > > Not upgraded yet, we'll give a try.
> 
> Done, now featuring 2.6.30.3 ;)
  OK, drop me an email if you see corruption with this kernel as well.

> > > > If you can still see this problem, could you run: debugfs /dev/md10
> > > > and send output of the command:
> > > > stat <40420228>
> > > > (or whatever the corrupted inode number will be)
> > > > and also:
> > > > dump <40420228> /tmp/corrupted_dir
> > > 
> > > One inode got corrupted recently, here is the output:
> > > 
> > > root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# ls -lai
> > > total 64
> > > 88539836 drwxr-sr-x  2 18804 23084  4096 2009-07-25 07:53 .
> > > 88539821 drwxr-sr-x 20 18804 23084  4096 2008-08-20 10:14 ..
> > > 88541578 -rw-rw-rw-  1 18804 23084   471 2009-07-25 04:55 -inc_forum-10-wa.3cb1921f
> > > 88541465 -rw-rw-rw-  1 18804 23084  6693 2009-07-25 07:53 -inc_rss_item-32-wa.23d91cc2
> > > 88541471 -rw-rw-rw-  1 18804 23084  1625 2009-07-25 07:53 -inc_rubriques-17-wa.f2f152f0
> > > 88541549 -rw-rw-rw-  1 18804 23084  2813 2009-07-25 03:04 INDEX-.edfac52c
> > > 88541366 -rw-rw-rw-  1 18804 23084     0 2008-08-17 20:44 .ok
> > >        ? ?---------  ? ?     ?         ?                ? spip%3Farticle19.f8740dca
> > > 88541671 -rw-rw-rw-  1 18804 23084  5619 2009-07-24 21:07 spip%3Fauteur1.c64f7f7e
> > > 88541460 -rw-rw-rw-  1 18804 23084  5636 2009-07-24 19:30 spip%3Fmot5.f3e9adda
> > > 88540284 -rw-rw-rw-  1 18804 23084  3802 2009-07-25 16:10 spip%3Fpage%3Dforum-30.63b2c1b1
> > > 88541539 -rw-rw-rw-  1 18804 23084 12972 2009-07-25 11:14 spip%3Fpage%3Djquery.cce608b6.gz
> >   OK, so we couldn't stat a directory...
> > 
> > > root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cat spip%3Farticle19.f8740dca
> > > cat: spip%3Farticle19.f8740dca: Stale NFS file handle
> >   This is probably the misleading output from ext3_iget(). It should give
> > you EIO in the latest kernel.
> 
> root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cat spip%3Farticle19.f8740dca 
> cat: spip%3Farticle19.f8740dca: Input/output error
> 
> It makes much more sense now. We thought the problem was around NFS due 
> to the previous error message; actually this is probably not the best 
> path to look at.
  Yes, EIO makes more sense. I think the problem is NFS-related anyway
though :). But I don't have a clue how it can happen yet. Maybe I can try
adding some low-cost debugging checks if you'd be willing to run such a
kernel...
  I'm adding to CC linux-nfs just in case someone has an idea.

> > > root@bazooka:~# debugfs /dev/md10
> > > debugfs 1.40-WIP (14-Nov-2006)
> > > 
> > >     debugfs:  stat <88539836>
> > > 
> > > Inode: 88539836   Type: directory    Mode:  0755   Flags: 0x0   Generation: 791796957
> > > User: 18804   Group: 23084   Size: 4096
> > > File ACL: 0    Directory ACL: 0
> > > Links: 2   Blockcount: 8
> > > Fragment:  Address: 0    Number: 0    Size: 0
> > > ctime: 0x4a6a9dd5 -- Sat Jul 25 07:53:25 2009
> > > atime: 0x4a0de585 -- Fri May 15 23:58:29 2009
> > > mtime: 0x4a6a9dd5 -- Sat Jul 25 07:53:25 2009
> > > Size of extra inode fields: 4
> > > BLOCKS:
> > > (0):177096928
> > > TOTAL: 1
> > > 
> > > 
> > >     debugfs:  ls <88539836>
> > > 
> > >  88539836  (12) .    88539821  (32) ..    88541366  (12) .ok
> > >  88541465  (56) -inc_rss_item-32-wa.23d91cc2
> > >  88541539  (40) spip%3Fpage%3Djquery.cce608b6.gz
> > >  88540284  (40) spip%3Fpage%3Dforum-30.63b2c1b1
> > >  88541460  (28) spip%3Fmot5.f3e9adda
> > >  88541471  (160) -inc_rubriques-17-wa.f2f152f0
> > >  88541549  (24) INDEX-.edfac52c    88541578  (284) -inc_forum-10-wa.3cb1921f
> > >  88541562  (36) spip%3Farticle19.f8740dca
> > >  88541671  (3372) spip%3Fauteur1.c64f7f7e
> >   The directory itself looks fine...
> > 
> > >     debugfs:  stat <88541562>
> > > 
> > > Inode: 88541562   Type: regular    Mode:  0666   Flags: 0x0   Generation: 860068541
> > > User: 18804   Group: 23084   Size: 0
> > > File ACL: 0    Directory ACL: 0
> > > Links: 0   Blockcount: 0
> > > Fragment:  Address: 0    Number: 0    Size: 0
> > > ctime: 0x4a6a8fac -- Sat Jul 25 06:53:00 2009
> > > atime: 0x4a6a612f -- Sat Jul 25 03:34:39 2009
> > > mtime: 0x4a6a8fac -- Sat Jul 25 06:53:00 2009
> > > dtime: 0x4a6a8fac -- Sat Jul 25 06:53:00 2009
> > > Size of extra inode fields: 4
> > > BLOCKS:
> > 
> >   Ah, OK, here's the problem. The directory points to a file which is
> > obviously deleted (note the "Links: 0"). All the content of the inode seems
> > to indicate that the file was correctly deleted (you might check that the
> > corresponding bit in the bitmap is cleared via: "icheck 88541562").
> 
> root@bazooka:~# debugfs /dev/md10
> debugfs 1.40-WIP (14-Nov-2006)
> debugfs:  icheck 88541562
> Block   Inode number
> 88541562        <block not found>
  Ah, wrong debugfs command. I should have written:
testi <88541562>

> >   The question is how it could happen that the directory still points to the
> > inode. Really strange. It looks as if we've lost a write to the directory
> > but I don't see how. Are there any suspicious kernel messages in this case?
> 
> There was nothing for a while, but since the reboot there are some 
> about this inode: 
> 
> EXT3-fs error (device md10): ext3_lookup: deleted inode referenced: 88541562
  Yes, that's to be expected given the corruption. Any NFS error messages?

> > > We'll try.
> > 
> >   It probably won't help. This particular directory had just one block so
> > DIR_INDEX had no effect on it.
> 
> Let's keep dir_index for now, then.
  OK.

> >   OK, so it's probably not a storage device problem. Good to know.
> 
> We also thought about motherboard, CPU, or chassis issues, but 
> everything has been replaced.
> 
> 
> The check of the MD raid6 array always ends happily:
> 
> Jul  5 01:06:01 bazooka kernel: md: data-check of RAID array md10
> Jul  5 01:06:01 bazooka kernel: md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> Jul  5 01:06:01 bazooka kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
> Jul  5 01:06:01 bazooka kernel: md: using 128k window, over a total of 143373888 blocks.
> Jul  5 04:28:28 bazooka kernel: md: md10: data-check done.
> 
> 
> We never saw modification to the data of files themselves, maybe it 
> happened, but we never saw any evidence of that. Of course, due to the 
> modification of the filesystem structure, we saw files replaced by other 
> files ;)

								Honza

-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: 2.6.28.9: EXT3/NFS inodes corruption
  2009-07-27 15:42     ` Jan Kara
@ 2009-07-28 11:27       ` Sylvain Rochet
  2009-07-28 13:52           ` Jan Kara
  0 siblings, 1 reply; 39+ messages in thread
From: Sylvain Rochet @ 2009-07-28 11:27 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-kernel, linux-ext4

[-- Attachment #1: Type: text/plain, Size: 6024 bytes --]

Hi,


On Mon, Jul 27, 2009 at 05:42:53PM +0200, Jan Kara wrote:
> On Sat 25-07-09 17:17:52, Sylvain Rochet wrote:
> > > 
> > > Can you still see the corruption with 2.6.30 kernel?
> > 
> > Not upgraded yet, we'll give a try.

Done, now featuring 2.6.30.3 ;)


> > > If you can still see this problem, could you run: debugfs /dev/md10
> > > and send output of the command:
> > > stat <40420228>
> > > (or whatever the corrupted inode number will be)
> > > and also:
> > > dump <40420228> /tmp/corrupted_dir
> > 
> > One inode got corrupted recently, here is the output:
> > 
> > root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# ls -lai
> > total 64
> > 88539836 drwxr-sr-x  2 18804 23084  4096 2009-07-25 07:53 .
> > 88539821 drwxr-sr-x 20 18804 23084  4096 2008-08-20 10:14 ..
> > 88541578 -rw-rw-rw-  1 18804 23084   471 2009-07-25 04:55 -inc_forum-10-wa.3cb1921f
> > 88541465 -rw-rw-rw-  1 18804 23084  6693 2009-07-25 07:53 -inc_rss_item-32-wa.23d91cc2
> > 88541471 -rw-rw-rw-  1 18804 23084  1625 2009-07-25 07:53 -inc_rubriques-17-wa.f2f152f0
> > 88541549 -rw-rw-rw-  1 18804 23084  2813 2009-07-25 03:04 INDEX-.edfac52c
> > 88541366 -rw-rw-rw-  1 18804 23084     0 2008-08-17 20:44 .ok
> >        ? ?---------  ? ?     ?         ?                ? spip%3Farticle19.f8740dca
> > 88541671 -rw-rw-rw-  1 18804 23084  5619 2009-07-24 21:07 spip%3Fauteur1.c64f7f7e
> > 88541460 -rw-rw-rw-  1 18804 23084  5636 2009-07-24 19:30 spip%3Fmot5.f3e9adda
> > 88540284 -rw-rw-rw-  1 18804 23084  3802 2009-07-25 16:10 spip%3Fpage%3Dforum-30.63b2c1b1
> > 88541539 -rw-rw-rw-  1 18804 23084 12972 2009-07-25 11:14 spip%3Fpage%3Djquery.cce608b6.gz
>   OK, so we couldn't stat a directory...
> 
> > root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cat spip%3Farticle19.f8740dca
> > cat: spip%3Farticle19.f8740dca: Stale NFS file handle
>   This is probably the misleading output from ext3_iget(). It should give
> you EIO in the latest kernel.

root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cat spip%3Farticle19.f8740dca 
cat: spip%3Farticle19.f8740dca: Input/output error

It makes much more sense now. We thought the problem was around NFS due 
to the previous error message; actually this is probably not the best 
path to look at.


> > root@bazooka:~# debugfs /dev/md10
> > debugfs 1.40-WIP (14-Nov-2006)
> > 
> >     debugfs:  stat <88539836>
> > 
> > Inode: 88539836   Type: directory    Mode:  0755   Flags: 0x0   Generation: 791796957
> > User: 18804   Group: 23084   Size: 4096
> > File ACL: 0    Directory ACL: 0
> > Links: 2   Blockcount: 8
> > Fragment:  Address: 0    Number: 0    Size: 0
> > ctime: 0x4a6a9dd5 -- Sat Jul 25 07:53:25 2009
> > atime: 0x4a0de585 -- Fri May 15 23:58:29 2009
> > mtime: 0x4a6a9dd5 -- Sat Jul 25 07:53:25 2009
> > Size of extra inode fields: 4
> > BLOCKS:
> > (0):177096928
> > TOTAL: 1
> > 
> > 
> >     debugfs:  ls <88539836>
> > 
> >  88539836  (12) .    88539821  (32) ..    88541366  (12) .ok
> >  88541465  (56) -inc_rss_item-32-wa.23d91cc2
> >  88541539  (40) spip%3Fpage%3Djquery.cce608b6.gz
> >  88540284  (40) spip%3Fpage%3Dforum-30.63b2c1b1
> >  88541460  (28) spip%3Fmot5.f3e9adda
> >  88541471  (160) -inc_rubriques-17-wa.f2f152f0
> >  88541549  (24) INDEX-.edfac52c    88541578  (284) -inc_forum-10-wa.3cb1921f
> >  88541562  (36) spip%3Farticle19.f8740dca
> >  88541671  (3372) spip%3Fauteur1.c64f7f7e
>   The directory itself looks fine...
> 
> >     debugfs:  stat <88541562>
> > 
> > Inode: 88541562   Type: regular    Mode:  0666   Flags: 0x0   Generation: 860068541
> > User: 18804   Group: 23084   Size: 0
> > File ACL: 0    Directory ACL: 0
> > Links: 0   Blockcount: 0
> > Fragment:  Address: 0    Number: 0    Size: 0
> > ctime: 0x4a6a8fac -- Sat Jul 25 06:53:00 2009
> > atime: 0x4a6a612f -- Sat Jul 25 03:34:39 2009
> > mtime: 0x4a6a8fac -- Sat Jul 25 06:53:00 2009
> > dtime: 0x4a6a8fac -- Sat Jul 25 06:53:00 2009
> > Size of extra inode fields: 4
> > BLOCKS:
> 
>   Ah, OK, here's the problem. The directory points to a file which is
> obviously deleted (note the "Links: 0"). All the content of the inode seems
> to indicate that the file was correctly deleted (you might check that the
> corresponding bit in the bitmap is cleared via: "icheck 88541562").

root@bazooka:~# debugfs /dev/md10
debugfs 1.40-WIP (14-Nov-2006)
debugfs:  icheck 88541562
Block   Inode number
88541562        <block not found>


>   The question is how it could happen that the directory still points to the
> inode. Really strange. It looks as if we've lost a write to the directory
> but I don't see how. Are there any suspicious kernel messages in this case?

There was nothing for a while, but since the reboot there are some 
about this inode: 

EXT3-fs error (device md10): ext3_lookup: deleted inode referenced: 88541562


> > We'll try.
> 
>   It probably won't help. This particular directory had just one block so
> DIR_INDEX had no effect on it.

Let's keep dir_index for now, then.


>   OK, so it's probably not a storage device problem. Good to know.

We also thought about motherboard, CPU, or chassis issues, but 
everything has been replaced.


The check of the MD raid6 array always ends happily:

Jul  5 01:06:01 bazooka kernel: md: data-check of RAID array md10
Jul  5 01:06:01 bazooka kernel: md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
Jul  5 01:06:01 bazooka kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
Jul  5 01:06:01 bazooka kernel: md: using 128k window, over a total of 143373888 blocks.
Jul  5 04:28:28 bazooka kernel: md: md10: data-check done.


We never saw modification to the data of files themselves, maybe it 
happened, but we never saw any evidence of that. Of course, due to the 
modification of the filesystem structure, we saw files replaced by other 
files ;)


Sylvain

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: 2.6.28.9: EXT3/NFS inodes corruption
  2009-07-25 15:17   ` Sylvain Rochet
@ 2009-07-27 15:42     ` Jan Kara
  2009-07-28 11:27       ` Sylvain Rochet
  0 siblings, 1 reply; 39+ messages in thread
From: Jan Kara @ 2009-07-27 15:42 UTC (permalink / raw)
  To: Sylvain Rochet; +Cc: Jan Kara, linux-kernel, linux-ext4

  Hi,

On Sat 25-07-09 17:17:52, Sylvain Rochet wrote:
> Sorry for the late answer, waiting for the problem to happen again ;)
  No problem.

> On Thu, Jul 16, 2009 at 07:27:49PM +0200, Jan Kara wrote:
> >   Hi,
> > 
> > > We(TuxFamily) are having some inodes corruptions on a NFS server.
> > > 
> > > So, let's start with the facts.
> > > 
> > > 
> > > ==== NFS Server
> > > 
> > > Linux bazooka 2.6.28.9 #1 SMP Mon Mar 30 12:58:22 CEST 2009 x86_64 GNU/Linux
> > 
> > Can you still see the corruption with 2.6.30 kernel?
> Not upgraded yet, we'll give a try.
> 
> > If you can still see this problem, could you run: debugfs /dev/md10
> > and send output of the command:
> > stat <40420228>
> > (or whatever the corrupted inode number will be)
> > and also:
> > dump <40420228> /tmp/corrupted_dir
> 
> One inode got corrupted recently, here is the output:
> 
> root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# ls -lai
> total 64
> 88539836 drwxr-sr-x  2 18804 23084  4096 2009-07-25 07:53 .
> 88539821 drwxr-sr-x 20 18804 23084  4096 2008-08-20 10:14 ..
> 88541578 -rw-rw-rw-  1 18804 23084   471 2009-07-25 04:55 -inc_forum-10-wa.3cb1921f
> 88541465 -rw-rw-rw-  1 18804 23084  6693 2009-07-25 07:53 -inc_rss_item-32-wa.23d91cc2
> 88541471 -rw-rw-rw-  1 18804 23084  1625 2009-07-25 07:53 -inc_rubriques-17-wa.f2f152f0
> 88541549 -rw-rw-rw-  1 18804 23084  2813 2009-07-25 03:04 INDEX-.edfac52c
> 88541366 -rw-rw-rw-  1 18804 23084     0 2008-08-17 20:44 .ok
>        ? ?---------  ? ?     ?         ?                ? spip%3Farticle19.f8740dca
> 88541671 -rw-rw-rw-  1 18804 23084  5619 2009-07-24 21:07 spip%3Fauteur1.c64f7f7e
> 88541460 -rw-rw-rw-  1 18804 23084  5636 2009-07-24 19:30 spip%3Fmot5.f3e9adda
> 88540284 -rw-rw-rw-  1 18804 23084  3802 2009-07-25 16:10 spip%3Fpage%3Dforum-30.63b2c1b1
> 88541539 -rw-rw-rw-  1 18804 23084 12972 2009-07-25 11:14 spip%3Fpage%3Djquery.cce608b6.gz
  OK, so we couldn't stat a directory...

> root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cat spip%3Farticle19.f8740dca
> cat: spip%3Farticle19.f8740dca: Stale NFS file handle
  This is probably the misleading output from ext3_iget(). It should give
you EIO in the latest kernel.
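
The `?` columns in the `ls -lai` output above are what ls prints when readdir returns a name but stat() on it fails. A rough Python sketch for finding such entries across a tree could look like the following. The function name `find_unstatable` and the choice of errno values (EIO and ESTALE, the two errors seen in this thread) are assumptions made for illustration; nothing here comes from the kernel or e2fsprogs.

```python
import errno
import os

# Errors reported in this thread for the dangling directory entry:
# ESTALE on 2.6.28.9, EIO on 2.6.30.3.
SUSPECT_ERRNOS = {errno.EIO, errno.ESTALE}


def find_unstatable(root):
    """Return (path, errno name) for names that are listed in a directory
    but cannot be lstat()ed with one of the suspect errors."""
    bad = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                os.lstat(path)
            except OSError as exc:
                if exc.errno in SUSPECT_ERRNOS:
                    bad.append((path, errno.errorcode[exc.errno]))
    return bad
```

Running `find_unstatable("/data/web")` on the server would, under these assumptions, return the paths that show up with `?` in ls, which can then be mapped to inode numbers with debugfs.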

> root@bazooka:~# debugfs /dev/md10
> debugfs 1.40-WIP (14-Nov-2006)
> 
>     debugfs:  stat <88539836>
> 
> Inode: 88539836   Type: directory    Mode:  0755   Flags: 0x0   Generation: 791796957
> User: 18804   Group: 23084   Size: 4096
> File ACL: 0    Directory ACL: 0
> Links: 2   Blockcount: 8
> Fragment:  Address: 0    Number: 0    Size: 0
> ctime: 0x4a6a9dd5 -- Sat Jul 25 07:53:25 2009
> atime: 0x4a0de585 -- Fri May 15 23:58:29 2009
> mtime: 0x4a6a9dd5 -- Sat Jul 25 07:53:25 2009
> Size of extra inode fields: 4
> BLOCKS:
> (0):177096928
> TOTAL: 1
> 
> 
>     debugfs:  ls <88539836>
> 
>  88539836  (12) .    88539821  (32) ..    88541366  (12) .ok
>  88541465  (56) -inc_rss_item-32-wa.23d91cc2
>  88541539  (40) spip%3Fpage%3Djquery.cce608b6.gz
>  88540284  (40) spip%3Fpage%3Dforum-30.63b2c1b1
>  88541460  (28) spip%3Fmot5.f3e9adda
>  88541471  (160) -inc_rubriques-17-wa.f2f152f0
>  88541549  (24) INDEX-.edfac52c    88541578  (284) -inc_forum-10-wa.3cb1921f
>  88541562  (36) spip%3Farticle19.f8740dca
>  88541671  (3372) spip%3Fauteur1.c64f7f7e
  The directory itself looks fine...
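
One quick way to check a single-block directory like this one is to add up the record lengths that `debugfs ls` prints in parentheses: in ext3 the entries of a directory block chain through rec_len and should tile the block exactly. A minimal Python sketch, assuming a 4096-byte block (matching the directory's Size above) and with the listing pasted in as a string; the helper name `rec_len_total` is made up for this illustration:

```python
import re

# Output of "ls <88539836>" from debugfs, pasted verbatim from the report.
LISTING = """
 88539836  (12) .    88539821  (32) ..    88541366  (12) .ok
 88541465  (56) -inc_rss_item-32-wa.23d91cc2
 88541539  (40) spip%3Fpage%3Djquery.cce608b6.gz
 88540284  (40) spip%3Fpage%3Dforum-30.63b2c1b1
 88541460  (28) spip%3Fmot5.f3e9adda
 88541471  (160) -inc_rubriques-17-wa.f2f152f0
 88541549  (24) INDEX-.edfac52c    88541578  (284) -inc_forum-10-wa.3cb1921f
 88541562  (36) spip%3Farticle19.f8740dca
 88541671  (3372) spip%3Fauteur1.c64f7f7e
"""

BLOCK_SIZE = 4096  # assumption: 4 KiB filesystem blocks


def rec_len_total(listing):
    """Sum every "(rec_len)" value found in a debugfs ls listing."""
    return sum(int(n) for n in re.findall(r"\((\d+)\)", listing))


total = rec_len_total(LISTING)
print(total, "OK" if total == BLOCK_SIZE else "MISMATCH")
```

The entries add up to exactly one block, so the block layout is consistent; only the target of one entry is wrong.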

>     debugfs:  stat <88541562>
> 
> Inode: 88541562   Type: regular    Mode:  0666   Flags: 0x0   Generation: 860068541
> User: 18804   Group: 23084   Size: 0
> File ACL: 0    Directory ACL: 0
> Links: 0   Blockcount: 0
> Fragment:  Address: 0    Number: 0    Size: 0
> ctime: 0x4a6a8fac -- Sat Jul 25 06:53:00 2009
> atime: 0x4a6a612f -- Sat Jul 25 03:34:39 2009
> mtime: 0x4a6a8fac -- Sat Jul 25 06:53:00 2009
> dtime: 0x4a6a8fac -- Sat Jul 25 06:53:00 2009
> Size of extra inode fields: 4
> BLOCKS:
  Ah, OK, here's the problem. The directory points to a file which is
obviously deleted (note the "Links: 0"). All the content of the inode seems
to indicate that the file was correctly deleted (you might check that the
corresponding bit in the inode bitmap is cleared via: "testi <88541562>").
  The question is how it could happen that the directory still points to the
inode. Really strange. It looks as if we've lost a write to the directory
but I don't see how. Are there any suspicious kernel messages in this case?

>     debugfs:  dump <88539836> /tmp/corrupted_dir
> 
> (file attached)
> 
> 
> > You might want to try disabling the DIR_INDEX feature and see whether
> > the corruption still occurs...
> 
> We'll try.
  It probably won't help. This particular directory had just one block so
DIR_INDEX had no effect on it.

> > > Keeping inodes into servers' cache seems to prevent the problem to happen.
> > > ( yeah, # while true ; do ionice -c3 find /data -size +0 > /dev/null ; done )
> > 
> > I'd guess just because they don't have to be read from disk where they
> > get corrupted.
> 
> Exactly.
> 
> 
> > Interesting, but it may well be just by the way how these files get
> > created / updated.
> 
> Yes, this is only because of that.
> 
> Additional data that may help, we replaced the storage server to 
> something slower (less number of CPU, less number of cores, ...). We are 
> still getting some corruption but with non-common sense with the former 
> server.
> 
> The data are stored on two storage arrays of disks. The primary one is 
> made of fiber-channel disks used through a simple fiber-channel card, 
> RAID soft with md, raid6. The secondary one is made of SCSI disks used 
> through a RAID-hard card. We got corruption on both, depending on
> the one currently used into production.
  OK, so it's probably not a storage device problem. Good to know.

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

* Re: 2.6.28.9: EXT3/NFS inodes corruption
  2009-07-16 17:27 ` Jan Kara
@ 2009-07-25 15:17   ` Sylvain Rochet
  2009-07-27 15:42     ` Jan Kara
  0 siblings, 1 reply; 39+ messages in thread
From: Sylvain Rochet @ 2009-07-25 15:17 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-kernel


[-- Attachment #1.1: Type: text/plain, Size: 4700 bytes --]

Hi,

Sorry for the late answer, waiting for the problem to happen again ;)


On Thu, Jul 16, 2009 at 07:27:49PM +0200, Jan Kara wrote:
>   Hi,
> 
> > We(TuxFamily) are having some inodes corruptions on a NFS server.
> > 
> > So, let's start with the facts.
> > 
> > 
> > ==== NFS Server
> > 
> > Linux bazooka 2.6.28.9 #1 SMP Mon Mar 30 12:58:22 CEST 2009 x86_64 GNU/Linux
> 
> Can you still see the corruption with 2.6.30 kernel?

Not upgraded yet, we'll give it a try.


> If you can still see this problem, could you run: debugfs /dev/md10
> and send output of the command:
> stat <40420228>
> (or whatever the corrupted inode number will be)
> and also:
> dump <40420228> /tmp/corrupted_dir


One inode got corrupted recently, here is the output:


root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# ls -lai
total 64
88539836 drwxr-sr-x  2 18804 23084  4096 2009-07-25 07:53 .
88539821 drwxr-sr-x 20 18804 23084  4096 2008-08-20 10:14 ..
88541578 -rw-rw-rw-  1 18804 23084   471 2009-07-25 04:55 -inc_forum-10-wa.3cb1921f
88541465 -rw-rw-rw-  1 18804 23084  6693 2009-07-25 07:53 -inc_rss_item-32-wa.23d91cc2
88541471 -rw-rw-rw-  1 18804 23084  1625 2009-07-25 07:53 -inc_rubriques-17-wa.f2f152f0
88541549 -rw-rw-rw-  1 18804 23084  2813 2009-07-25 03:04 INDEX-.edfac52c
88541366 -rw-rw-rw-  1 18804 23084     0 2008-08-17 20:44 .ok
       ? ?---------  ? ?     ?         ?                ? spip%3Farticle19.f8740dca
88541671 -rw-rw-rw-  1 18804 23084  5619 2009-07-24 21:07 spip%3Fauteur1.c64f7f7e
88541460 -rw-rw-rw-  1 18804 23084  5636 2009-07-24 19:30 spip%3Fmot5.f3e9adda
88540284 -rw-rw-rw-  1 18804 23084  3802 2009-07-25 16:10 spip%3Fpage%3Dforum-30.63b2c1b1
88541539 -rw-rw-rw-  1 18804 23084 12972 2009-07-25 11:14 spip%3Fpage%3Djquery.cce608b6.gz


root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cat spip%3Farticle19.f8740dca
cat: spip%3Farticle19.f8740dca: Stale NFS file handle


root@bazooka:~# debugfs /dev/md10
debugfs 1.40-WIP (14-Nov-2006)


    debugfs:  stat <88539836>

Inode: 88539836   Type: directory    Mode:  0755   Flags: 0x0   Generation: 791796957
User: 18804   Group: 23084   Size: 4096
File ACL: 0    Directory ACL: 0
Links: 2   Blockcount: 8
Fragment:  Address: 0    Number: 0    Size: 0
ctime: 0x4a6a9dd5 -- Sat Jul 25 07:53:25 2009
atime: 0x4a0de585 -- Fri May 15 23:58:29 2009
mtime: 0x4a6a9dd5 -- Sat Jul 25 07:53:25 2009
Size of extra inode fields: 4
BLOCKS:
(0):177096928
TOTAL: 1


    debugfs:  ls <88539836>

 88539836  (12) .    88539821  (32) ..    88541366  (12) .ok
 88541465  (56) -inc_rss_item-32-wa.23d91cc2
 88541539  (40) spip%3Fpage%3Djquery.cce608b6.gz
 88540284  (40) spip%3Fpage%3Dforum-30.63b2c1b1
 88541460  (28) spip%3Fmot5.f3e9adda
 88541471  (160) -inc_rubriques-17-wa.f2f152f0
 88541549  (24) INDEX-.edfac52c    88541578  (284) -inc_forum-10-wa.3cb1921f
 88541562  (36) spip%3Farticle19.f8740dca
 88541671  (3372) spip%3Fauteur1.c64f7f7e


    debugfs:  stat <88541562>

Inode: 88541562   Type: regular    Mode:  0666   Flags: 0x0   Generation: 860068541
User: 18804   Group: 23084   Size: 0
File ACL: 0    Directory ACL: 0
Links: 0   Blockcount: 0
Fragment:  Address: 0    Number: 0    Size: 0
ctime: 0x4a6a8fac -- Sat Jul 25 06:53:00 2009
atime: 0x4a6a612f -- Sat Jul 25 03:34:39 2009
mtime: 0x4a6a8fac -- Sat Jul 25 06:53:00 2009
dtime: 0x4a6a8fac -- Sat Jul 25 06:53:00 2009
Size of extra inode fields: 4
BLOCKS:


    debugfs:  dump <88539836> /tmp/corrupted_dir

(file attached)
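
A minimal sketch (not part of the original mail) for inspecting a dumped
directory block such as /tmp/corrupted_dir offline. It assumes the usual
linear ext2/ext3 directory entry layout with the filetype feature (32-bit
inode, 16-bit rec_len, 8-bit name_len, 8-bit file_type, then the name) and
little-endian on-disk byte order. The function name is hypothetical.

```python
import struct


def parse_dir_block(block):
    """Walk the linear directory entries of one ext2/ext3 directory block.

    Returns a list of (inode, rec_len, name) for entries whose inode is
    non-zero, comparable to what `debugfs ls` prints.
    """
    entries = []
    offset = 0
    while offset + 8 <= len(block):
        inode, rec_len, name_len, _file_type = struct.unpack_from(
            "<IHBB", block, offset
        )
        if rec_len < 8 or offset + rec_len > len(block):
            break  # implausible record length: stop instead of looping forever
        if inode != 0:  # inode 0 marks an unused entry
            raw = block[offset + 8 : offset + 8 + name_len]
            entries.append((inode, rec_len, raw.decode("latin-1")))
        offset += rec_len
    return entries
```

Usage would look like `parse_dir_block(open("/tmp/corrupted_dir", "rb").read())`;
the inode numbers it returns can then be cross-checked with `stat <inode>` in
debugfs.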


> You might want to try disabling the DIR_INDEX feature and see whether
> the corruption still occurs...

We'll try.


> > Keeping inodes into servers' cache seems to prevent the problem to happen.
> > ( yeah, # while true ; do ionice -c3 find /data -size +0 > /dev/null ; done )
> 
> I'd guess just because they don't have to be read from disk where they
> get corrupted.

Exactly.


> Interesting, but it may well be just by the way how these files get
> created / updated.

Yes, this is only because of that.



Additional data that may help: we replaced the storage server with 
something slower (fewer CPUs, fewer cores, ...). We are still getting 
some corruption, although the new machine has nothing in common with the 
former server.

The data are stored on two disk arrays. The primary one is made of 
fiber-channel disks attached through a simple fiber-channel card, using 
software RAID with md (raid6). The secondary one is made of SCSI disks 
attached through a hardware RAID card. We got corruption on both, 
depending on which one was in production at the time.


Sylvain

[-- Attachment #1.2: corrupted_dir --]
[-- Type: application/octet-stream, Size: 4096 bytes --]

* Re: 2.6.28.9: EXT3/NFS inodes corruption
  2009-04-20 16:20 Sylvain Rochet
@ 2009-07-16 17:27 ` Jan Kara
  2009-07-25 15:17   ` Sylvain Rochet
  0 siblings, 1 reply; 39+ messages in thread
From: Jan Kara @ 2009-07-16 17:27 UTC (permalink / raw)
  To: Sylvain Rochet; +Cc: linux-kernel

  Hi,

> We(TuxFamily) are having some inodes corruptions on a NFS server.
> 
> So, let's start with the facts.
> 
> 
> ==== NFS Server
> 
> Linux bazooka 2.6.28.9 #1 SMP Mon Mar 30 12:58:22 CEST 2009 x86_64 GNU/Linux
  Can you still see the corruption with 2.6.30 kernel?

...
> /dev/md10 on /data type ext3 (rw,noatime,nodiratime,grpquota,commit=5,data=ordered)
> 
>   ==> We used data=writeback, we fallback to data=ordered,
>       problem's still here
> 
...
> 
> # df -m
> /dev/md10              1378166     87170   1290997   7% /data
  1.3 TB, a large filesystem ;).

> # df -i
> /dev/md10            179224576 3454822 175769754    2% /data
> 
> 
> 
> ==== NFS Clients
> 
> 6x Linux cognac 2.6.28.9-grsec #1 SMP Sun Apr 12 13:06:49 CEST 2009 i686 GNU/Linux
> 5x Linux martini 2.6.28.9-grsec #1 SMP Tue Apr 14 00:01:30 UTC 2009 i686 GNU/Linux
> 2x Linux armagnac 2.6.28.9 #1 SMP Tue Apr 14 08:59:12 CEST 2009 i686 GNU/Linux
> 
> x.x.x.x:/data/... on /data/... type nfs (rw,noexec,nosuid,nodev,async,hard,nfsvers=3,udp,intr,rsize=32768,wsize=32768,timeo=20,addr=x.x.x.x)
> 
>   ==> All NFS exports are mounted this way, sometimes with the 'sync' 
>       option, like web sessions.
>   ==> Those are often mounted from outside of chroots into chroots, 
>       useless detail I think
...

> ==== So, now, going into the problem
> 
> The kernel log is not really nice with us, here on the NFS Server:
> 
> Mar 22 06:47:14 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Unrecognised inode hash code 52
> Mar 22 06:47:14 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Corrupt dir inode 40420228, running e2fsck is recommended.
> Mar 22 06:47:16 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Unrecognised inode hash code 52
> Mar 22 06:47:16 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Corrupt dir inode 40420228, running e2fsck is recommended.
> Mar 22 06:47:19 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Unrecognised inode hash code 52
> Mar 22 06:47:19 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Corrupt dir inode 40420228, running e2fsck is recommended.
> Mar 22 06:47:19 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Unrecognised inode hash code 52
> Mar 22 06:47:19 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Corrupt dir inode 40420228, running e2fsck is recommended.
> Mar 22 06:47:19 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Unrecognised inode hash code 52
> And so on...
  If you can still see this problem, could you run: debugfs /dev/md10
and send output of the command:
stat <40420228>
(or whatever the corrupted inode number will be)
and also:
dump <40420228> /tmp/corrupted_dir

> And more recently...
> Apr  2 22:19:01 bazooka kernel: EXT3-fs warning (device md10): ext3_unlink: Deleting nonexistent file (40780223), 0
> Apr  2 22:19:02 bazooka kernel: EXT3-fs warning (device md10): ext3_unlink: Deleting nonexistent file (40491685), 0
> Apr 11 07:23:02 bazooka kernel: EXT3-fs warning (device md10): ext3_unlink: Deleting nonexistent file (174301379), 0
> Apr 20 08:13:32 bazooka kernel: EXT3-fs warning (device md10): ext3_unlink: Deleting nonexistent file (54942021), 0
> 
> 
> Not much stuff in the kernel log of NFS clients, history is quite lost, 
> but we got some of them:
> 
> ....................: NFS: Buggy server - nlink == 0!
> 
> 
> == Going deeper into the problem
> 
> Something like that is quite common:
> 
> root@bazooka:/data/...# ls -la
> total xxx
> drwxrwx--- 2 xx    xx        4096 2009-04-20 03:48 .
> drwxr-xr-x 7 root  root      4096 2007-01-21 13:15 ..
> -rw-r--r-- 1 root  root         0 2009-04-20 03:48 access.log
> -rw-r--r-- 1 root  root  70784145 2009-04-20 00:11 access.log.0
> -rw-r--r-- 1 root  root   6347007 2009-04-10 00:07 access.log.10.gz
> -rw-r--r-- 1 root  root   6866097 2009-04-09 00:08 access.log.11.gz
> -rw-r--r-- 1 root  root   6410119 2009-04-08 00:07 access.log.12.gz
> -rw-r--r-- 1 root  root   6488274 2009-04-07 00:08 access.log.13.gz
> ?--------- ?    ?     ?         ?                ? access.log.14.gz
> ?--------- ?    ?     ?         ?                ? access.log.15.gz
> ?--------- ?    ?     ?         ?                ? access.log.16.gz
> ?--------- ?    ?     ?         ?                ? access.log.17.gz
> -rw-r--r-- 1 root  root   6950626 2009-04-02 00:07 access.log.18.gz
> ?--------- ?    ?     ?         ?                ? access.log.19.gz
> -rw-r--r-- 1 root  root   6635884 2009-04-19 00:11 access.log.1.gz
> ?--------- ?    ?     ?         ?                ? access.log.20.gz
> ?--------- ?    ?     ?         ?                ? access.log.21.gz
> ?--------- ?    ?     ?         ?                ? access.log.22.gz
> ?--------- ?    ?     ?         ?                ? access.log.23.gz
> ?--------- ?    ?     ?         ?                ? access.log.24.gz
> ?--------- ?    ?     ?         ?                ? access.log.25.gz
> ?--------- ?    ?     ?         ?                ? access.log.26.gz
> -rw-r--r-- 1 root  root   6616546 2009-03-24 00:07 access.log.27.gz
> ?--------- ?    ?     ?         ?                ? access.log.28.gz
> ?--------- ?    ?     ?         ?                ? access.log.29.gz
> -rw-r--r-- 1 root  root   6671875 2009-04-18 00:12 access.log.2.gz
> ?--------- ?    ?     ?         ?                ? access.log.30.gz
> -rw-r--r-- 1 root  root   6347518 2009-04-17 00:10 access.log.3.gz
> -rw-r--r-- 1 root  root   6569714 2009-04-16 00:12 access.log.4.gz
> -rw-r--r-- 1 root  root   7170750 2009-04-15 00:11 access.log.5.gz
> -rw-r--r-- 1 root  root   6676518 2009-04-14 00:12 access.log.6.gz
> -rw-r--r-- 1 root  root   6167458 2009-04-13 00:11 access.log.7.gz
> -rw-r--r-- 1 root  root   5856576 2009-04-12 00:10 access.log.8.gz
> -rw-r--r-- 1 root  root   6644142 2009-04-11 00:07 access.log.9.gz
> 
> 
> root@bazooka:/data/...# cat *      # output filtered, only errors
> cat: access.log.14.gz: Stale NFS file handle
> cat: access.log.15.gz: Stale NFS file handle
> cat: access.log.16.gz: Stale NFS file handle
> cat: access.log.17.gz: Stale NFS file handle
> cat: access.log.19.gz: Stale NFS file handle
> cat: access.log.20.gz: Stale NFS file handle
> cat: access.log.21.gz: Stale NFS file handle
> cat: access.log.22.gz: Stale NFS file handle
> cat: access.log.23.gz: Stale NFS file handle
> cat: access.log.24.gz: Stale NFS file handle
> cat: access.log.25.gz: Stale NFS file handle
> cat: access.log.26.gz: Stale NFS file handle
> cat: access.log.28.gz: Stale NFS file handle
> cat: access.log.29.gz: Stale NFS file handle
> cat: access.log.30.gz: Stale NFS file handle
> 
> 
> "Stale NFS file handle"... on the NFS Server... hummm...
> 
> 
> == Other facts
> 
> fsck.ext3 fixed the filesystem but didn't fix the problem.
> 
> mkfs.ext3 didn't fix the problem either.
  You might want to try disabling the DIR_INDEX feature and see whether
the corruption still occurs...

> It only concerns files which have been recently modified, logs, awstats 
> hashfiles, websites caches, sessions, locks, and such.
> 
> It mainly happens to files which are created on the NFS server itself, 
> but it's not a hard rule.
> 
> Keeping inodes into servers' cache seems to prevent the problem to happen.
> ( yeah, # while true ; do ionice -c3 find /data -size +0 > /dev/null ; done )
  I'd guess just because they don't have to be read from disk where they
get corrupted.

> Hummm, it seems to concern files which are quite near to each others, 
> let's check that:
> 
> Let's build up an inode "database"
> 
> # find /data -printf '%i %p\n' > /root/inodesnumbers
> 
> 
> Let's check how inodes numbers are distributed:
> 
> # cat /root/inodesnumbers | perl -e 'use Data::Dumper; my @pof; while(<>){my ( $inode ) = ( $_ =~ /^(\d+)/ ); my $hop = int($inode/1000000); $pof[$hop]++; }; for (0 .. $#pof) { print $_." = ".($pof[$_]/10000)."%\n" }'
> [... lot of quite unused inodes groups]
> 53 = 3.0371%
> 54 = 26.679%     <= mailboxes
> 55 = 2.7026%
> [... lot of quite unused inodes groups]
> 58 = 1.3262%
> 59 = 27.3211%    <= mailing lists archives
> 60 = 5.5159%
> [... lot of quite unused inodes groups]
> 171 = 0.0631%
> 172 = 0.1063%
> 173 = 27.2895%   <=
> 174 = 44.0623%   <=
> 175 = 45.6783%   <= websites files
> 176 = 45.8247%   <=
> 177 = 36.9376%   <=
> 178 = 6.3294%
> 179 = 0.0442%
> 
> Hummm, all the files are using the same inodes "groups".
>   (groups of a million of inodes)
  Interesting, but it may well be just because of the way these files get
created / updated.

> We use to fix broken folders by moving them to a quarantine folder and 
> by restoring disappeared files from the backup.
> 
> So, let's check corrupted inodes number from the quarantine folder:
> 
> root@bazooka:/data/path/to/rep/of/quarantine/folders# find . -mindepth 1 -maxdepth 1 -printf '%i\n' | sort -n
> 174293418
> 174506030
> 174506056
> 174506073
> 174506081
> 174506733
> 174507694
> 174507708
> 174507888
> 174507985
> 174508077
> 174508083
> 176473056
> 176473062
> 176473064
> 
> Humm... those are quite near to each other 17450... 17647... and are of 
> course in the most used inodes "groups"...
> 
> 
> Open question: are NFS clients can steal inodes numbers from each others ?
> 
> 
> I am not sure whether my bug report is good, feel free to ask questions ;)


										Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

* 2.6.28.9: EXT3/NFS inodes corruption
@ 2009-04-20 16:20 Sylvain Rochet
  2009-07-16 17:27 ` Jan Kara
  0 siblings, 1 reply; 39+ messages in thread
From: Sylvain Rochet @ 2009-04-20 16:20 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 12056 bytes --]

Hi,


We (TuxFamily) are seeing inode corruption on an NFS server.


So, let's start with the facts.


==== NFS Server

Linux bazooka 2.6.28.9 #1 SMP Mon Mar 30 12:58:22 CEST 2009 x86_64 GNU/Linux

root@bazooka:/usr/src# grep EXT3 /boot/config-2.6.28.9 
CONFIG_EXT3_FS=y
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_EXT3_FS_SECURITY=y

root@bazooka:/usr/src# grep NFS /boot/config-2.6.28.9 
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
CONFIG_NFS_V3_ACL=y
# CONFIG_NFS_V4 is not set
CONFIG_NFSD=y
CONFIG_NFSD_V2_ACL=y
CONFIG_NFSD_V3=y
CONFIG_NFSD_V3_ACL=y
# CONFIG_NFSD_V4 is not set
CONFIG_NFS_ACL_SUPPORT=y
CONFIG_NFS_COMMON=y

  ==> We upgraded from 2.6.26.5 to 2.6.28.9, problem's still here


/dev/md10 on /data type ext3 (rw,noatime,nodiratime,grpquota,commit=5,data=ordered)

  ==> We used data=writeback, we fell back to data=ordered,
      problem's still here


# /etc/exports
/data                   *(rw,no_root_squash,async,no_subtree_check)
... and lots of exports of subdirs of /data, exported the same way


NFS-related processes on the NFS server:

root@bazooka:~# ps aux | grep -E '(nfsd]|lockd]|statd|mountd|idmapd|rquotad|portmap)$'
daemon    1226  0.0  0.0   4824   452 ?        Ss   Apr11   0:06 /sbin/portmap
root      1703  0.0  0.0      0     0 ?        S<   01:29   0:09 [lockd]
root      1704  0.3  0.0      0     0 ?        D<   01:29   3:29 [nfsd]
root      1705  0.3  0.0      0     0 ?        S<   01:29   3:34 [nfsd]
root      1706  0.3  0.0      0     0 ?        S<   01:29   3:32 [nfsd]
root      1707  0.3  0.0      0     0 ?        S<   01:29   3:30 [nfsd]
root      1708  0.3  0.0      0     0 ?        D<   01:29   3:43 [nfsd]
root      1709  0.3  0.0      0     0 ?        D<   01:29   3:43 [nfsd]
root      1710  0.3  0.0      0     0 ?        D<   01:29   3:39 [nfsd]
root      1711  0.3  0.0      0     0 ?        D<   01:29   3:42 [nfsd]
root      1715  0.0  0.0   5980   576 ?        Ss   01:29   0:00 /usr/sbin/rpc.mountd
statd     1770  0.0  0.0   8072   648 ?        Ss   Apr11   0:00 /sbin/rpc.statd
root      1776  0.0  0.0  23180   536 ?        Ss   Apr11   0:00 /usr/sbin/rpc.idmapd
root      1785  0.0  0.0   6148   552 ?        Ss   Apr11   0:00 /usr/sbin/rpc.rquotad

  ==> We used to run tens of nfsd daemons, we fell back to 8,
      the default, problem's still here
  ==> There are some 'D' processes because of a running data-check


Block device health:

Apr  3 00:28:20 bazooka kernel: md: data-check of RAID array md10
Apr  3 05:11:59 bazooka kernel: md: md10: data-check done.

Apr  5 01:06:01 bazooka kernel: md: data-check of RAID array md10
Apr  5 05:49:42 bazooka kernel: md: md10: data-check done.

Apr 20 16:27:33 bazooka kernel: md: data-check of RAID array md10

md10 : active raid6 sda[0] sdl[11] sdk[10] sdj[9] sdi[8] sdh[7] sdg[6] sdf[5] sde[4] sdd[3] sdc[2] sdb[1]
      1433738880 blocks level 6, 64k chunk, algorithm 2 [12/12] [UUUUUUUUUUUU]
      [======>..............]  check = 30.1% (43176832/143373888) finish=208.1min speed=8020K/sec

  ==> Everything seems fine


# df -m
/dev/md10              1378166     87170   1290997   7% /data

# df -i
/dev/md10            179224576 3454822 175769754    2% /data



==== NFS Clients

6x Linux cognac 2.6.28.9-grsec #1 SMP Sun Apr 12 13:06:49 CEST 2009 i686 GNU/Linux
5x Linux martini 2.6.28.9-grsec #1 SMP Tue Apr 14 00:01:30 UTC 2009 i686 GNU/Linux
2x Linux armagnac 2.6.28.9 #1 SMP Tue Apr 14 08:59:12 CEST 2009 i686 GNU/Linux

grad@armagnac:~$ grep NFS /boot/config-2.6.28.9 
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
CONFIG_NFS_V3_ACL=y
# CONFIG_NFS_V4 is not set
CONFIG_NFSD=y
CONFIG_NFSD_V2_ACL=y
CONFIG_NFSD_V3=y
CONFIG_NFSD_V3_ACL=y
# CONFIG_NFSD_V4 is not set
CONFIG_NFS_ACL_SUPPORT=y
CONFIG_NFS_COMMON=y

  ==> We upgraded from 2.6.23.16 and 2.6.24.2 (yeah, vmsplice upgrade ;-)
      to 2.6.28.9, problem's still here


x.x.x.x:/data/... on /data/... type nfs (rw,noexec,nosuid,nodev,async,hard,nfsvers=3,udp,intr,rsize=32768,wsize=32768,timeo=20,addr=x.x.x.x)

  ==> All NFS exports are mounted this way, sometimes with the 'sync' 
      option, like web sessions.
  ==> Those are often mounted from outside of chroots into chroots, 
      useless detail I think


NFS-related processes on the NFS clients:

root@cognac:~# ps aux | grep -E '(nfsd]|lockd]|statd|mountd|idmapd|rquotad|portmap)$'
daemon     349  0.0  0.0   1904   536 ?        Ss   Apr12   0:00 /sbin/portmap
statd      360  0.0  0.1   3452  1152 ?        Ss   Apr12   0:00 /sbin/rpc.statd
root      1190  0.0  0.0      0     0 ?        S<   Apr12   0:00 [lockd]



==== So, now, going into the problem

The kernel log is not really nice with us, here on the NFS Server:

Mar 22 06:47:14 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Unrecognised inode hash code 52
Mar 22 06:47:14 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Corrupt dir inode 40420228, running e2fsck is recommended.
Mar 22 06:47:16 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Unrecognised inode hash code 52
Mar 22 06:47:16 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Corrupt dir inode 40420228, running e2fsck is recommended.
Mar 22 06:47:19 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Unrecognised inode hash code 52
Mar 22 06:47:19 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Corrupt dir inode 40420228, running e2fsck is recommended.
Mar 22 06:47:19 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Unrecognised inode hash code 52
Mar 22 06:47:19 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Corrupt dir inode 40420228, running e2fsck is recommended.
Mar 22 06:47:19 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Unrecognised inode hash code 52
And so on...

And more recently...
Apr  2 22:19:01 bazooka kernel: EXT3-fs warning (device md10): ext3_unlink: Deleting nonexistent file (40780223), 0
Apr  2 22:19:02 bazooka kernel: EXT3-fs warning (device md10): ext3_unlink: Deleting nonexistent file (40491685), 0
Apr 11 07:23:02 bazooka kernel: EXT3-fs warning (device md10): ext3_unlink: Deleting nonexistent file (174301379), 0
Apr 20 08:13:32 bazooka kernel: EXT3-fs warning (device md10): ext3_unlink: Deleting nonexistent file (54942021), 0
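
A minimal sketch (not part of the original mail) for collecting the inode
numbers named in warnings like the ones above, so they can be fed to debugfs
later. The two regular expressions only cover the message formats quoted in
this report; the function name and the category labels are illustrative.

```python
import re
from collections import Counter

# Patterns for the two EXT3-fs warning formats quoted in this report.
_PATTERNS = {
    "corrupt_dir": re.compile(r"dx_probe: Corrupt dir inode (\d+)"),
    "unlink_nonexistent": re.compile(
        r"ext3_unlink: Deleting nonexistent file \((\d+)\)"
    ),
}


def inodes_in_warnings(lines):
    """Count (kind, inode) pairs found in kernel log lines."""
    hits = Counter()
    for line in lines:
        for kind, pattern in _PATTERNS.items():
            m = pattern.search(line)
            if m:
                hits[(kind, int(m.group(1)))] += 1
    return hits
```

Called on the lines of a saved kernel log, it returns a counter keyed by
warning kind and inode number.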


Not much stuff in the kernel log of NFS clients, history is quite lost, 
but we got some of them:

....................: NFS: Buggy server - nlink == 0!


== Going deeper into the problem

Something like that is quite common:

root@bazooka:/data/...# ls -la
total xxx
drwxrwx--- 2 xx    xx        4096 2009-04-20 03:48 .
drwxr-xr-x 7 root  root      4096 2007-01-21 13:15 ..
-rw-r--r-- 1 root  root         0 2009-04-20 03:48 access.log
-rw-r--r-- 1 root  root  70784145 2009-04-20 00:11 access.log.0
-rw-r--r-- 1 root  root   6347007 2009-04-10 00:07 access.log.10.gz
-rw-r--r-- 1 root  root   6866097 2009-04-09 00:08 access.log.11.gz
-rw-r--r-- 1 root  root   6410119 2009-04-08 00:07 access.log.12.gz
-rw-r--r-- 1 root  root   6488274 2009-04-07 00:08 access.log.13.gz
?--------- ?    ?     ?         ?                ? access.log.14.gz
?--------- ?    ?     ?         ?                ? access.log.15.gz
?--------- ?    ?     ?         ?                ? access.log.16.gz
?--------- ?    ?     ?         ?                ? access.log.17.gz
-rw-r--r-- 1 root  root   6950626 2009-04-02 00:07 access.log.18.gz
?--------- ?    ?     ?         ?                ? access.log.19.gz
-rw-r--r-- 1 root  root   6635884 2009-04-19 00:11 access.log.1.gz
?--------- ?    ?     ?         ?                ? access.log.20.gz
?--------- ?    ?     ?         ?                ? access.log.21.gz
?--------- ?    ?     ?         ?                ? access.log.22.gz
?--------- ?    ?     ?         ?                ? access.log.23.gz
?--------- ?    ?     ?         ?                ? access.log.24.gz
?--------- ?    ?     ?         ?                ? access.log.25.gz
?--------- ?    ?     ?         ?                ? access.log.26.gz
-rw-r--r-- 1 root  root   6616546 2009-03-24 00:07 access.log.27.gz
?--------- ?    ?     ?         ?                ? access.log.28.gz
?--------- ?    ?     ?         ?                ? access.log.29.gz
-rw-r--r-- 1 root  root   6671875 2009-04-18 00:12 access.log.2.gz
?--------- ?    ?     ?         ?                ? access.log.30.gz
-rw-r--r-- 1 root  root   6347518 2009-04-17 00:10 access.log.3.gz
-rw-r--r-- 1 root  root   6569714 2009-04-16 00:12 access.log.4.gz
-rw-r--r-- 1 root  root   7170750 2009-04-15 00:11 access.log.5.gz
-rw-r--r-- 1 root  root   6676518 2009-04-14 00:12 access.log.6.gz
-rw-r--r-- 1 root  root   6167458 2009-04-13 00:11 access.log.7.gz
-rw-r--r-- 1 root  root   5856576 2009-04-12 00:10 access.log.8.gz
-rw-r--r-- 1 root  root   6644142 2009-04-11 00:07 access.log.9.gz


root@bazooka:/data/...# cat *      # output filtered, only errors
cat: access.log.14.gz: Stale NFS file handle
cat: access.log.15.gz: Stale NFS file handle
cat: access.log.16.gz: Stale NFS file handle
cat: access.log.17.gz: Stale NFS file handle
cat: access.log.19.gz: Stale NFS file handle
cat: access.log.20.gz: Stale NFS file handle
cat: access.log.21.gz: Stale NFS file handle
cat: access.log.22.gz: Stale NFS file handle
cat: access.log.23.gz: Stale NFS file handle
cat: access.log.24.gz: Stale NFS file handle
cat: access.log.25.gz: Stale NFS file handle
cat: access.log.26.gz: Stale NFS file handle
cat: access.log.28.gz: Stale NFS file handle
cat: access.log.29.gz: Stale NFS file handle
cat: access.log.30.gz: Stale NFS file handle
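
A minimal sketch (not part of the original report) of how such entries could
be found automatically instead of reading `ls` output by eye: list a
directory, try to lstat every name, and keep the ones that fail. The function
names are hypothetical; ESTALE is the errno behind "Stale NFS file handle".

```python
import errno
import os


def find_unstatable(directory):
    """Return (name, errno) for entries that are listed but cannot be stat'ed."""
    broken = []
    for name in sorted(os.listdir(directory)):
        try:
            os.lstat(os.path.join(directory, name))
        except OSError as exc:
            broken.append((name, exc.errno))
    return broken


def stale_only(broken):
    """Keep only the names that failed with ESTALE."""
    return [name for name, err in broken if err == errno.ESTALE]
```

On a healthy directory `find_unstatable()` returns an empty list; on a
directory like the one above it would be expected to list the entries that
`ls` shows with question marks.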


"Stale NFS file handle"... on the NFS Server... hummm...


== Other facts

fsck.ext3 fixed the filesystem but didn't fix the problem.

mkfs.ext3 didn't fix the problem either.

It only concerns files which have been recently modified, logs, awstats 
hashfiles, websites caches, sessions, locks, and such.

It mainly happens to files which are created on the NFS server itself, 
but it's not a hard rule.

Keeping inodes in the server's cache seems to prevent the problem from happening.
( yeah, # while true ; do ionice -c3 find /data -size +0 > /dev/null ; done )


Hummm, it seems to concern files which are quite near each other, 
let's check that:

Let's build up an inode "database"

# find /data -printf '%i %p\n' > /root/inodesnumbers


Let's check how inode numbers are distributed:

# cat /root/inodesnumbers | perl -e 'use Data::Dumper; my @pof; while(<>){my ( $inode ) = ( $_ =~ /^(\d+)/ ); my $hop = int($inode/1000000); $pof[$hop]++; }; for (0 .. $#pof) { print $_." = ".($pof[$_]/10000)."%\n" }'
[... lot of quite unused inodes groups]
53 = 3.0371%
54 = 26.679%     <= mailboxes
55 = 2.7026%
[... lot of quite unused inodes groups]
58 = 1.3262%
59 = 27.3211%    <= mailing lists archives
60 = 5.5159%
[... lot of quite unused inodes groups]
171 = 0.0631%
172 = 0.1063%
173 = 27.2895%   <=
174 = 44.0623%   <=
175 = 45.6783%   <= websites files
176 = 45.8247%   <=
177 = 36.9376%   <=
178 = 6.3294%
179 = 0.0442%

Hummm, all the files are using the same inode "groups".
  (groups of a million inodes)
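
For readers who prefer something longer than a one-liner, here is a minimal
Python sketch of the same bucketing (not part of the original report). It
reads lines in the "inode path" format written by the find command above and
reports, per bucket of one million inode numbers, how many are in use as a
percentage of the bucket size. The function names are hypothetical.

```python
from collections import Counter


def inode_histogram(lines, bucket=1_000_000):
    """Count used inodes per bucket of `bucket` consecutive inode numbers."""
    counts = Counter()
    for line in lines:
        parts = line.split(None, 1)
        # Skip blank lines and anything that does not start with a number.
        if parts and parts[0].isdigit():
            counts[int(parts[0]) // bucket] += 1
    return counts


def as_percent(counts, bucket=1_000_000):
    """Convert per-bucket counts to a percentage of the bucket size."""
    return {b: 100.0 * n / bucket for b, n in sorted(counts.items())}
```

Usage would be along the lines of
`as_percent(inode_histogram(open("/root/inodesnumbers")))`.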

We usually fix broken folders by moving them to a quarantine folder and 
by restoring the missing files from backup.

So, let's check the corrupted inode numbers from the quarantine folder:

root@bazooka:/data/path/to/rep/of/quarantine/folders# find . -mindepth 1 -maxdepth 1 -printf '%i\n' | sort -n
174293418
174506030
174506056
174506073
174506081
174506733
174507694
174507708
174507888
174507985
174508077
174508083
176473056
176473062
176473064

Humm... those are quite near to each other 17450... 17647... and are of 
course in the most used inodes "groups"...
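
A minimal sketch (not part of the original report) that maps the quarantined
inode numbers to ext3 block groups rather than to arbitrary buckets of a
million. The inodes-per-group value of 16384 is an assumption: it is
consistent with the 179224576 inodes reported by df -i if the filesystem uses
4 KiB blocks, but the real value should be read from `dumpe2fs -h`
("Inodes per group"). ext2/ext3 inode numbers start at 1, hence the
`inode - 1`.

```python
from collections import Counter

# Assumed value; check "Inodes per group" in `dumpe2fs -h` output.
INODES_PER_GROUP = 16384


def block_group(inode, inodes_per_group=INODES_PER_GROUP):
    """Return the block group that holds the given inode number."""
    return (inode - 1) // inodes_per_group


# Inode numbers from the quarantine folder listing above.
quarantined = [
    174293418, 174506030, 174506056, 174506073, 174506081,
    174506733, 174507694, 174507708, 174507888, 174507985,
    174508077, 174508083, 176473056, 176473062, 176473064,
]

per_group = Counter(block_group(i) for i in quarantined)
for group, count in sorted(per_group.items()):
    print(f"block group {group}: {count} quarantined inode(s)")
```

If the assumed group size is right, inodes that land in the same block group
share the same inode table on disk, which is a tighter notion of "near" than
the million-inode buckets.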


Open question: can NFS clients steal inode numbers from each other?


I am not sure whether my bug report is good, feel free to ask questions ;)

Best regards,
Sylvain


Thread overview: 39+ messages
     [not found] <ct4xS-63o-27@gated-at.bofh.it>
2009-07-28 16:40 ` 2.6.28.9: EXT3/NFS inodes corruption Daniel J Blueman
2009-07-28 16:45   ` Sylvain Rochet
2009-08-21 11:05     ` Daniel J Blueman
2009-08-21 14:32       ` Sylvain Rochet
2009-04-20 16:20 Sylvain Rochet
2009-07-16 17:27 ` Jan Kara
2009-07-25 15:17   ` Sylvain Rochet
2009-07-27 15:42     ` Jan Kara
2009-07-28 11:27       ` Sylvain Rochet
2009-07-28 13:52         ` Jan Kara
2009-07-28 16:41           ` Sylvain Rochet
2009-07-28 21:12             ` J. Bruce Fields
2009-08-04 10:50               ` Sylvain Rochet
2009-07-29 12:58             ` Jan Kara
2009-08-04 11:02               ` Sylvain Rochet
2009-08-03 22:29             ` Jan Kara
2009-08-04 11:15               ` Sylvain Rochet
2009-08-04 22:56                 ` Jan Kara
2009-08-06 13:15                   ` Sylvain Rochet
2009-08-06 17:05                     ` J. Bruce Fields
2009-08-12 22:34                     ` Jan Kara
2009-08-20 17:19                       ` Sylvain Rochet
2009-08-21  0:00                         ` Simon Kirby
2009-08-21 10:51                           ` Sylvain Rochet
