All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG  at fs/jbd2/commit.c:534" from Postfix on ext4
       [not found] <20110301165239.3310.43806.reportbug@support.exmeritus.com>
       [not found] ` <BE4E C1DF-4DFC-4B94-923D-0197B16BD7B4@boeing.com>
@ 2011-03-01 19:26 ` Moffett, Kyle D
  2011-04-03  2:02   ` Ted Ts'o
  1 sibling, 1 reply; 35+ messages in thread
From: Moffett, Kyle D @ 2011-03-01 19:26 UTC (permalink / raw)
  To: 615998
  Cc: Livingston, John A, linux-ext4, linux-kernel, Sachin Sant,
	Aneesh Kumar K.V, Theodore T'so

Whoops, looks like the Debian bug-tracker lost the CC list somehow.  I believe I've got all the CCs re-added, sorry for any duplicate emails.

On Mar 01, 2011, at 11:52, Kyle Moffett wrote:
> Package: linux-2.6
> Version: 2.6.32-30
> Severity: important
>
> I'm getting a repeatable BUG from ext4, which seems to be caused by
> Postfix processing its mail queue.  The specific filesystem block device
> that has the problem seems to be "dm-13", which on this boot is the
> logical volume containing the "/var/spool/postfix" chroot.
>
> This is a completely standard Debian installation running on an Amazon EC2
> instance (x86_64).  The filesystem is mounted in "data=journal" mode.
>
> This crash is *very* repeatable.  It occurs almost every reboot when
> there are more than 1 or 2 queued emails.  I will try re-mounting the
> filesystem in "data=ordered" mode momentarily.
>
> The relevant filesystems are:
>  /dev/mapper/system-root / ext4 rw,noatime,barrier=1,data=ordered 0 0
>  /dev/mapper/system-var /var ext4 rw,noatime,barrier=1,nodelalloc,data=journal 0 0
>  /dev/mapper/system-log /var/log ext4 rw,nosuid,nodev,noatime,barrier=1,data=ordered 0 0
>  /dev/xvda1 /boot ext3 rw,noatime,user_xattr,acl,data=journal 0 0
>  /dev/mapper/db-mail /var/mail ext4 rw,nosuid,nodev,noatime,barrier=1,data=ordered 0 0
>  /dev/mapper/db-postfix /var/spool/postfix ext4 rw,nosuid,nodev,noatime,barrier=1,nodelalloc,data=journal 0 0
>  /dev/mapper/db-smtp /var/lib/postfix ext4 rw,nosuid,nodev,noatime,barrier=1,nodelalloc,data=journal 0 0
>  /dev/mapper/db-smtpcfg /etc/postfix ext4 rw,nosuid,nodev,noatime,barrier=1,nodelalloc,data=journal 0 0
>
> In particular, I note that there was a previous report of a BUG at
> fs/jbd2/commit.c:533 which never seemed to get isolated:
>  http://www.kerneltrap.com/mailarchive/linux-ext4/2009/9/2/6373283
>
> I need to get this system operational again right now, but I'm going to
> take a consistent snapshot of it so I can debug it later.
>
> NOTE:  For followers on the linux-ext4 mailing list, this particular
> system is running the Debian "squeeze" kernel (based on 2.6.32), so it's
> theoretically possible this bug has been fixed upstream since then.
> I didn't have any luck finding such a fix on Google, though.
>
> -- Package-specific info:
> ** Version:
> Linux version 2.6.32-5-xen-amd64 (Debian 2.6.32-30) (ben@decadent.org.uk) (gcc version 4.3.5 (Debian 4.3.5-4) ) #1 SMP Wed Jan 12 05:46:49 UTC 2011
>
> ** Command line:
> root=/dev/mapper/system-root ro
>
> ** Tainted: D (128)
> * Kernel has oopsed before.
>
> ** Kernel log:
> [  118.525038]   alloc irq_desc for 526 on node -1
> [  118.525040]   alloc kstat_irqs on node -1
> [  118.700415] device-mapper: uevent: version 1.0.3
> [  118.700890] device-mapper: ioctl: 4.15.0-ioctl (2009-04-01) initialised: dm-devel@redhat.com
> [  118.925563] EXT4-fs (dm-0): INFO: recovery required on readonly filesystem
> [  118.925580] EXT4-fs (dm-0): write access will be enabled during recovery
> [  118.968700] EXT4-fs (dm-0): orphan cleanup on readonly fs
> [  118.968716] EXT4-fs (dm-0): ext4_orphan_cleanup: deleting unreferenced inode 790044
> [  118.968761] EXT4-fs (dm-0): ext4_orphan_cleanup: deleting unreferenced inode 790012
> [  118.968768] EXT4-fs (dm-0): ext4_orphan_cleanup: deleting unreferenced inode 790011
> [  118.968775] EXT4-fs (dm-0): ext4_orphan_cleanup: deleting unreferenced inode 790010
> [  118.968782] EXT4-fs (dm-0): ext4_orphan_cleanup: deleting unreferenced inode 790009
> [  118.968788] EXT4-fs (dm-0): 5 orphan inodes deleted
> [  118.968794] EXT4-fs (dm-0): recovery complete
> [  118.979150] EXT4-fs (dm-0): mounted filesystem with ordered data mode
> [  119.293543] udev[204]: starting version 164
> [  119.366778] input: PC Speaker as /devices/platform/pcspkr/input/input1
> [  119.436417] Error: Driver 'pcspkr' is already registered, aborting...
> [  124.153241] Adding 4194296k swap on /dev/xvdb1.  Priority:-1 extents:1 across:4194296k SS
> [  125.156599] loop: module loaded
> [  138.650657] EXT4-fs (dm-21): Ignoring delalloc option - requested data journaling mode
> [  138.650959] EXT4-fs (dm-21): mounted filesystem with journalled data mode
> [  138.660092] EXT4-fs (dm-22): mounted filesystem with ordered data mode
> [  138.674436] kjournald starting.  Commit interval 5 seconds
> [  138.675145] EXT3 FS on xvda1, internal journal
> [  138.675155] EXT3-fs: mounted filesystem with journal data mode.
> [  138.728462] EXT4-fs (xvdc): mounted filesystem without journal
> [  138.745406] EXT4-fs (dm-17): mounted filesystem with ordered data mode
> [  138.748531] EXT4-fs (dm-18): mounted filesystem with ordered data mode
> [  138.774667] EXT4-fs (dm-19): mounted filesystem with ordered data mode
> [  138.780834] EXT4-fs (dm-2): Ignoring delalloc option - requested data journaling mode
> [  138.781400] EXT4-fs (dm-2): mounted filesystem with journalled data mode
> [  138.784700] EXT4-fs (dm-1): Ignoring delalloc option - requested data journaling mode
> [  138.784773] EXT4-fs (dm-1): mounted filesystem with journalled data mode
> [  138.790320] EXT4-fs (dm-4): Ignoring delalloc option - requested data journaling mode
> [  138.790871] EXT4-fs (dm-4): mounted filesystem with journalled data mode
> [  138.794955] EXT4-fs (dm-3): Ignoring delalloc option - requested data journaling mode
> [  138.795064] EXT4-fs (dm-3): mounted filesystem with journalled data mode
> [  138.800441] EXT4-fs (dm-14): Ignoring delalloc option - requested data journaling mode
> [  138.801015] EXT4-fs (dm-14): mounted filesystem with journalled data mode
> [  138.804188] EXT4-fs (dm-13): Ignoring delalloc option - requested data journaling mode
> [  138.804263] EXT4-fs (dm-13): mounted filesystem with journalled data mode
> [  138.807812] EXT4-fs (dm-12): Ignoring delalloc option - requested data journaling mode
> [  138.807890] EXT4-fs (dm-12): mounted filesystem with journalled data mode
> [  138.814241] EXT4-fs (dm-16): mounted filesystem with ordered data mode
> [  138.817864] EXT4-fs (dm-15): mounted filesystem with ordered data mode
> [  138.821251] EXT4-fs (dm-9): mounted filesystem with ordered data mode
> [  138.824801] EXT4-fs (dm-7): Ignoring delalloc option - requested data journaling mode
> [  138.824877] EXT4-fs (dm-7): mounted filesystem with journalled data mode
> [  138.828423] EXT4-fs (dm-6): Ignoring delalloc option - requested data journaling mode
> [  138.828495] EXT4-fs (dm-6): mounted filesystem with journalled data mode
> [  138.832295] EXT4-fs (dm-5): Ignoring delalloc option - requested data journaling mode
> [  138.832868] EXT4-fs (dm-5): mounted filesystem with journalled data mode
> [  138.836422] EXT4-fs (dm-8): Ignoring delalloc option - requested data journaling mode
> [  138.836495] EXT4-fs (dm-8): mounted filesystem with journalled data mode
> [  138.840309] EXT4-fs (dm-11): mounted filesystem with ordered data mode
> [  138.844664] EXT4-fs (dm-10): Ignoring delalloc option - requested data journaling mode
> [  138.844735] EXT4-fs (dm-10): mounted filesystem with journalled data mode
> [  138.859715] EXT4-fs (dm-20): mounted filesystem with ordered data mode
> [  150.205516] eth0: no IPv6 routers present
> [  157.443694] ------------[ cut here ]------------
> [  157.443708] kernel BUG at /build/buildd-linux-2.6_2.6.32-30-amd64-d4MbNM/linux-2.6-2.6.32/debian/build/source_amd64_xen/fs/jbd2/commit.c:534!
> [  157.443717] invalid opcode: 0000 [#1] SMP
> [  157.443722] last sysfs file: /sys/devices/virtual/vc/vcsa6/uevent
> [  157.443727] CPU 0
> [  157.443730] Modules linked in: ext3 jbd loop snd_pcm snd_timer snd soundcore evdev snd_page_alloc pcspkr ext4 mbcache jbd2 crc16 dm_mod xen_netfront xen_blkfront
> [  157.443759] Pid: 748, comm: jbd2/dm-13-8 Not tainted 2.6.32-5-xen-amd64 #1
> [  157.443764] RIP: e030:[<ffffffffa00213bc>]  [<ffffffffa00213bc>] jbd2_journal_commit_transaction+0x58b/0x115b [jbd2]
> [  157.443778] RSP: e02b:ffff8801dc2adcf0  EFLAGS: 00010202
> [  157.443782] RAX: 0000000000000009 RBX: ffff8801d7c26870 RCX: 00000000ffff74c8
> [  157.443788] RDX: 00000000ffff74c8 RSI: ffff8801d7c26800 RDI: ffff8801d87a9024
> [  157.443793] RBP: ffff8801d7c26850 R08: ffff8801d87a9058 R09: ffff88000b1df780
> [  157.443798] R10: ffff8801d7aa0048 R11: ffff8801dba1d880 R12: ffff8801d87a9000
> [  157.443803] R13: ffff8801dc2add88 R14: ffff8801d7c26800 R15: 0000000000000000
> [  157.443812] FS:  00007f3afa7d3700(0000) GS:ffff88000b1ca000(0000) knlGS:0000000000000000
> [  157.443818] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  157.443822] CR2: 00007f3afc878268 CR3: 00000001d7b05000 CR4: 0000000000002660
> [  157.443828] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  157.443833] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000000
> [  157.443839] Process jbd2/dm-13-8 (pid: 748, threadinfo ffff8801dc2ac000, task ffff8801db02e2e0)
> [  157.443844] Stack:
> [  157.443847]  ffff8801db02e318 ffff8801dc2add70 ffff8801db02e2e0 ffff8801d7aa38c8
> [  157.443854] <0> 00000024a85fc12e ffff8801d87a9024 ffff8801d87a939c ffff88000b1df780
> [  157.443865] <0> ffff8801d72f3000 ffff8801db02e2e0 0000000800000000 00000441000116e0
> [  157.443877] Call Trace:
> [  157.443886]  [<ffffffff8130b79f>] ? thread_return+0x79/0xe0
> [  157.443893]  [<ffffffff8100e609>] ? __raw_callee_save_xen_irq_disable+0x11/0x1e
> [  157.443901]  [<ffffffff8130cb3a>] ? _spin_unlock_irqrestore+0xd/0xe
> [  157.443908]  [<ffffffff8105b748>] ? try_to_del_timer_sync+0x63/0x6c
> [  157.443917]  [<ffffffffa002788f>] ? kjournald2+0xbe/0x206 [jbd2]
> [  157.443923]  [<ffffffff81065d4a>] ? autoremove_wake_function+0x0/0x2e
> [  157.443930]  [<ffffffffa00277d1>] ? kjournald2+0x0/0x206 [jbd2]
> [  157.443936]  [<ffffffff81065a7d>] ? kthread+0x79/0x81
> [  157.443942]  [<ffffffff81012baa>] ? child_rip+0xa/0x20
> [  157.443947]  [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6
> [  157.443952]  [<ffffffff81012ba0>] ? child_rip+0x0/0x20
> [  157.443956] Code: 00 48 89 4c 24 38 45 31 ed c7 44 24 4c 00 00 00 00 c7 44 24 50 00 00 00 00 c7 44 24 58 ff ff ff ff 48 89 44 24 18 e9 f0 02 00 00 <0f> 0b eb fe 41 f6 04 24 02 74 40 48 8b 13 48 8d 42 02 f0 80 62
> [  157.444009] RIP  [<ffffffffa00213bc>] jbd2_journal_commit_transaction+0x58b/0x115b [jbd2]
> [  157.444017]  RSP <ffff8801dc2adcf0>
> [  157.444035] ---[ end trace 1277bd032d9ae98f ]---
>
> ** Model information
> Amazon EC2 instance
>
> ** Loaded modules:
> Module                  Size  Used by
> ext3                  106502  1
> jbd                    37085  1 ext3
> loop                   11799  0
> snd_pcm                60551  0
> snd_timer              15582  1 snd_pcm
> snd                    46446  2 snd_pcm,snd_timer
> soundcore               4598  1 snd
> evdev                   7352  0
> snd_page_alloc          6249  1 snd_pcm
> pcspkr                  1699  0
> ext4                  289033  24
> mbcache                 5050  2 ext3,ext4
> jbd2                   67015  1 ext4
> crc16                   1319  1 ext4
> dm_mod                 53754  69
> xen_netfront           16257  0
> xen_blkfront            9435  8
>
> ** Network interface configuration:
> ###
> ## WARNING:
> ##   DO NOT MESS WITH THIS FILE, AND ESPECIALLY DO NOT RUN IFDOWN OR IFUP!!!
> ##   THIS MACHINE IS AN AMAZON EC2 VM WITH NO CONSOLE ACCESS!!!
> ###
>
> auto lo
> iface lo inet loopback
>        dns-search border.exmeritus.com exmeritus.com
>
> auto eth0
> iface eth0 inet dhcp
>
>
> ** Network status:
> *** IP interfaces and addresses:
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
>    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>    inet 127.0.0.1/8 scope host lo
>    inet6 ::1/128 scope host
>       valid_lft forever preferred_lft forever
> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
>    link/ether 12:31:3b:0a:94:dc brd ff:ff:ff:ff:ff:ff
>    inet 10.204.151.42/23 brd 10.204.151.255 scope global eth0
>    inet6 fe80::1031:3bff:fe0a:94dc/64 scope link
>       valid_lft forever preferred_lft forever
>
> *** Device statistics:
> Inter-|   Receive                                                |  Transmit
> face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
>    lo:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
>  eth0:  185543    1684    0    0    0     0          0         0   239161    1212    0    0    0     0       0          0
>
> *** Protocol statistics:
> Ip:
>    1653 total packets received
>    1 with invalid addresses
>    0 forwarded
>    0 incoming packets discarded
>    1652 incoming packets delivered
>    1203 requests sent out
> Icmp:
>    9 ICMP messages received
>    0 input ICMP message failed.
>    ICMP input histogram:
>        echo requests: 9
>    9 ICMP messages sent
>    0 ICMP messages failed
>    ICMP output histogram:
>        echo replies: 9
> IcmpMsg:
>        InType8: 9
>        OutType0: 9
> Tcp:
>    26 active connections openings
>    6 passive connection openings
>    7 failed connection attempts
>    7 connection resets received
>    6 connections established
>    1477 segments received
>    1008 segments send out
>    20 segments retransmited
>    0 bad segments received.
>    12 resets sent
> Udp:
>    166 packets received
>    0 packets to unknown port received.
>    0 packet receive errors
>    166 packets sent
> UdpLite:
> TcpExt:
>    2 TCP sockets finished time wait in fast timer
>    22 delayed acks sent
>    4 packets directly queued to recvmsg prequeue.
>    563 packet headers predicted
>    84 acknowledgments not containing data payload received
>    740 predicted acknowledgments
>    6 congestion windows recovered without slow start after partial ack
>    13 other TCP timeouts
>    1 connections reset due to unexpected data
>    2 connections reset due to early user close
>    7 connections aborted due to timeout
> IpExt:
>    InOctets: 184459
>    OutOctets: 221425
>
> *** Device features:
> eth0: 0x50003
> lo: 0x13865
>
> ** PCI devices:
>
> ** USB devices:
> not available
>
>
> -- System Information:
> Debian Release: 6.0
>  APT prefers stable
>  APT policy: (990, 'stable'), (700, 'testing'), (600, 'unstable'), (500, 'squeeze-updates'), (500, 'experimental')
> Architecture: amd64 (x86_64)
>
> Kernel: Linux 2.6.32-5-xen-amd64 (SMP w/2 CPU cores)
> Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
> Shell: /bin/sh linked to /bin/dash
>
> Versions of packages linux-image-2.6.32-5-xen-amd64 depends on:
> ii  debconf [debconf-2.0]         1.5.36.1   Debian configuration management sy
> ii  initramfs-tools               0.98.8     tools for generating an initramfs
> ii  linux-base                    2.6.32-30  Linux image base package
> ii  module-init-tools             3.12-1     tools for managing Linux kernel mo
>
> Versions of packages linux-image-2.6.32-5-xen-amd64 recommends:
> pn  firmware-linux-free           <none>     (no description available)
>
> Versions of packages linux-image-2.6.32-5-xen-amd64 suggests:
> ii  grub-legacy [grub]            0.97-64    GRand Unified Bootloader (Legacy v
> pn  linux-doc-2.6.32              <none>     (no description available)
>
> Versions of packages linux-image-2.6.32-5-xen-amd64 is related to:
> pn  firmware-bnx2                 <none>     (no description available)
> pn  firmware-bnx2x                <none>     (no description available)
> pn  firmware-ipw2x00              <none>     (no description available)
> pn  firmware-ivtv                 <none>     (no description available)
> pn  firmware-iwlwifi              <none>     (no description available)
> pn  firmware-linux                <none>     (no description available)
> pn  firmware-linux-nonfree        <none>     (no description available)
> pn  firmware-qlogic               <none>     (no description available)
> pn  firmware-ralink               <none>     (no description available)
> pn  xen-hypervisor                <none>     (no description available)
>
> -- debconf information:
>  linux-image-2.6.32-5-xen-amd64/postinst/ignoring-do-bootloader-2.6.32-5-xen-amd64:
>  linux-image-2.6.32-5-xen-amd64/prerm/removing-running-kernel-2.6.32-5-xen-amd64: true
>  linux-image-2.6.32-5-xen-amd64/postinst/depmod-error-initrd-2.6.32-5-xen-amd64: false
>  linux-image-2.6.32-5-xen-amd64/postinst/missing-firmware-2.6.32-5-xen-amd64:
>
>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG  at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-03-01 19:26 ` Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4 Moffett, Kyle D
@ 2011-04-03  2:02   ` Ted Ts'o
  2011-04-04 14:24     ` Moffett, Kyle D
  0 siblings, 1 reply; 35+ messages in thread
From: Ted Ts'o @ 2011-04-03  2:02 UTC (permalink / raw)
  To: Moffett, Kyle D
  Cc: 615998, Livingston, John A, linux-ext4, Sachin Sant, Aneesh Kumar K.V

Hi Kyle,

Sorry for not following up sooner.  Are you still able to reproduce
this failure?  If I set up an identical Debian stable instance on
EC-2, am I likely to reproduce it myself?  Do you have a package list
or EC2 base image I can use as a starting point?

Thanks,

						- Ted

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable  "kernel BUG  at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-04-03  2:02   ` Ted Ts'o
@ 2011-04-04 14:24     ` Moffett, Kyle D
  2011-04-04 20:51       ` Moffett, Kyle D
  2011-04-05  0:15       ` Ted Ts'o
  0 siblings, 2 replies; 35+ messages in thread
From: Moffett, Kyle D @ 2011-04-04 14:24 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: 615998, Livingston, John A, linux-ext4, Sachin Sant, Aneesh Kumar K.V

On Apr 02, 2011, at 22:02, Ted Ts'o wrote:
> Sorry for not following up sooner.  Are you still able to reproduce
> this failure?  If I set up an identical Debian stable instance on
> EC-2, am I likely to reproduce it myself?  Do you have a package list
> or EC2 base image I can use as a starting point?

I'll need to check on this.

Unfortunately it was not a trivial process to install Debian "squeeze" onto an EC2 instance; it took a couple ugly Perl scripts, a patched Debian-Installer, and several manual post-install-but-before-reboot steps (like fixing up GRUB 0.99).  One of these days I may get time to update all that to the official "wheezy" release and submit bug reports.

I have an exact image of the failing instance, but it has proprietary data on it and if I stand up an old copy I need to be careful not to actually let it send all the queued emails :-D.

It's probably easier for me to halt email delivery and clone the working instance and try to reproduce from there.  If I recall, the (easily undone) workaround was to remount from "data=journal" to "data=ordered" on a couple filesystems.  It may take a day or two to get this done, though.

If it comes down to it I also have a base image (from "squeeze" as of 9 months ago) that could be made public after updating with new SSH keys.

Thanks again!

Cheers,
Kyle Moffett

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable  "kernel BUG  at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-04-04 14:24     ` Moffett, Kyle D
@ 2011-04-04 20:51       ` Moffett, Kyle D
  2011-04-05  0:15       ` Ted Ts'o
  1 sibling, 0 replies; 35+ messages in thread
From: Moffett, Kyle D @ 2011-04-04 20:51 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: 615998, Livingston, John A, linux-ext4, Sachin Sant, Aneesh Kumar K.V

On Apr 04, 2011, at 10:24, Moffett, Kyle D wrote:
> On Apr 02, 2011, at 22:02, Ted Ts'o wrote:
>> Sorry for not following up sooner.  Are you still able to reproduce
>> this failure?  If I set up an identical Debian stable instance on
>> EC-2, am I likely to reproduce it myself?  Do you have a package list
>> or EC2 base image I can use as a starting point?
> 
> I'll need to check on this.
> 
> Unfortunately it was not a trivial process to install Debian "squeeze" onto an EC2 instance; it took a couple ugly Perl scripts, a patched Debian-Installer, and several manual post-install-but-before-reboot steps (like fixing up GRUB 0.99).  One of these days I may get time to update all that to the official "wheezy" release and submit bug reports.
> 
> I have an exact image of the failing instance, but it has proprietary data on it and if I stand up an old copy I need to be careful not to actually let it send all the queued emails :-D.
> 
> It's probably easier for me to halt email delivery and clone the working instance and try to reproduce from there.  If I recall, the (easily undone) workaround was to remount from "data=journal" to "data=ordered" on a couple filesystems.  It may take a day or two to get this done, though.
> 
> If it comes down to it I also have a base image (from "squeeze" as of 9 months ago) that could be made public after updating with new SSH keys.

Bah...

I went back to the old image that was crashing every boot before and I can't find any way to make it crash at all now...

If I manage to reproduce it again later I'll send you another email.

Cheers,
Kyle Moffett

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG  at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-04-04 14:24     ` Moffett, Kyle D
  2011-04-04 20:51       ` Moffett, Kyle D
@ 2011-04-05  0:15       ` Ted Ts'o
  2011-04-05 15:30         ` Moffett, Kyle D
  1 sibling, 1 reply; 35+ messages in thread
From: Ted Ts'o @ 2011-04-05  0:15 UTC (permalink / raw)
  To: Moffett, Kyle D
  Cc: 615998, Livingston, John A, linux-ext4, Sachin Sant, Aneesh Kumar K.V

On Mon, Apr 04, 2011 at 09:24:28AM -0500, Moffett, Kyle D wrote:
> 
> Unfortunately it was not a trivial process to install Debian
> "squeeze" onto an EC2 instance; it took a couple ugly Perl scripts,
> a patched Debian-Installer, and several manual
> post-install-but-before-reboot steps (like fixing up GRUB 0.99).
> One of these days I may get time to update all that to the official
> "wheezy" release and submit bug reports.

Sigh, I was whoping someone was maintaining semi-official EC2 images
for Debian, much like alestic has been maintaining for Ubuntu.  (Hmm,
actually, he has EC2 images for Lenny and Etch, but unfortunately not
for squeeze.  Sigh....)

> It's probably easier for me to halt email delivery and clone the
> working instance and try to reproduce from there.  If I recall, the
> (easily undone) workaround was to remount from "data=journal" to
> "data=ordered" on a couple filesystems.  It may take a day or two to
> get this done, though.

Couple of questions which might give me some clues: (a) was this a
natively formatted ext4 file system, or a ext3 file system which was
later converted to ext4?  (b) How big are the files/directories
involved?  In particular, how big is the Postfix mail queue directory,
and it is an extent-based directory?  (what does lsattr on the mail
queue directory report) As far as file sizes, does it matter how big
the e-mail messages are, and are there any other database files that
postgress might be touching at the time that you get the OOPS?

I have found a bug in ext4 where we were underestimating how many
journal credits were needed when modifying direct/indirect-mapped
files (which would be seen on ext4 if you had a ext3 file system that
was converted to start using extents; but old, pre-existing
directories wouldn't be converted), which is why I'm asking the
question about whether this was an ext2/ext3 file system which was
converted to use ext4.

I have a patch to fix it, but backporting it into a kernel which will
work with EC2 is not something I've done before.  Can anyone point me
at a web page that gives me the quick cheat sheet?

> If it comes down to it I also have a base image (from "squeeze" as of 9 months ago) that could be made public after updating with new SSH keys. 

If we can reproduce the problem on that base image it would be really
great!  I have an Amazon AWS account; contact me when you have an
image you want to share, if you want to share it just with my AWS
account id, instead of sharing it publically...

      	  	    		   	   	 - Ted

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable  "kernel BUG  at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-04-05  0:15       ` Ted Ts'o
@ 2011-04-05 15:30         ` Moffett, Kyle D
  2011-04-05 19:07           ` Ted Ts'o
  0 siblings, 1 reply; 35+ messages in thread
From: Moffett, Kyle D @ 2011-04-05 15:30 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: 615998, linux-ext4, Sachin Sant, Aneesh Kumar K.V

On Apr 04, 2011, at 20:15, Ted Ts'o wrote:
> On Mon, Apr 04, 2011 at 09:24:28AM -0500, Moffett, Kyle D wrote:
>> 
>> Unfortunately it was not a trivial process to install Debian
>> "squeeze" onto an EC2 instance; it took a couple ugly Perl scripts,
>> a patched Debian-Installer, and several manual
>> post-install-but-before-reboot steps (like fixing up GRUB 0.99).
>> One of these days I may get time to update all that to the official
>> "wheezy" release and submit bug reports.
> 
> Sigh, I was whoping someone was maintaining semi-official EC2 images
> for Debian, much like alestic has been maintaining for Ubuntu.  (Hmm,
> actually, he has EC2 images for Lenny and Etch, but unfortunately not
> for squeeze.  Sigh....)

The Alestic EC2 images (now replaced by official Ubuntu images) use
kernel images formed as AKIs, which means users can't upload their
own.  Prior to a couple of Ubuntu staff getting special permission
to upload kernel images, all the Alestic EC2 images just borrowed
RedHat or Fedora kernels and copied over the modules.

The big problem for Squeeze is that it uses new udev which is not
compatible with those older kernels.

For the Debian-Installer and my Debian images, I use the PV-GRUB
AKI to load a kernel image from my rootfs.

Specifically, one of the Perl scripts builds an S3-based AMI
containing a Debian-Installer kernel and initramfs (using a tweaked
and preseeded D-I build).  It uploads the AMI to my account and
registers it with EC2.

Then another Perl script starts the uploaded AMI and attaches one
or more EBS volumes for the Debian-Instalelr to use.  When you've
completed the install it takes EBS snapshots and creates an
EBS-backed AMI from those.

The scripts use an odd mix of the Net::Amazon::EC2 CPAN module and
shell callouts to the ec2 tools, but they seem to work well enough.

I'm actually using the official Debian Xen kernels for both the
install process and the operational system, but the regular pv_ops
kernels (without extra Xen patches) work fine too.  The only bug I
found so far was a known workaround for old buggy hypervisors:

  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=592428

That one is fixed in the official "squeeze" release.

>> It's probably easier for me to halt email delivery and clone the
>> working instance and try to reproduce from there.  If I recall, the
>> (easily undone) workaround was to remount from "data=journal" to
>> "data=ordered" on a couple filesystems.  It may take a day or two to
>> get this done, though.
> 
> Couple of questions which might give me some clues:
>   (a) was this a natively formatted ext4 file system, or a ext3 file
>       system which was later converted to ext4?

All the filesystems were formatted like this using Debian e2fstools
as of 9 months ago:

  mke2fs -t ext4 -E lazy_itable_init=1 -L db:mail /dev/mapper/db-mail
  tune2fs -i 0 -c 1 -e remount-ro -o acl,user_xattr,journal_data /dev/mapper/db-mail

Ooooh.... could the lazy_itable_init have anything to do with it?

>   (b) How big are the files/directories involved?  In particular,
>       how big is the Postfix mail queue directory, and it is an
>       extent-based directory?  (what does lsattr on the mail queue
>       directory report)

Ok, there's a couple relatively small filesystems:
  /var/spool/postfix (20971520 sectors, 728K used right now)
  /var/lib/postfix (262144 sectors, 26K used right now)
  /var/mail (8380416 sectors, 340K used right now)

As far as I can tell, everything in each filesystem is using
extents (at least I assume that's what this means from lsattr -R):
  -----------------e- .
  -----------------e- ./corrupt
  -----------------e- ./deferred
  [...]

The "/var/spool/postfix" is the Postfix chroot as per the default
Debian configuration.

I should also mention that the EC2 hypervisor does not seem to
support barriers or flushes.  PV-GRUB complains about that very
early during the boot process.

>   (c) As far as file sizes, does it matter how big the e-mail
>       messages are, and are there any other database files that
>       postgress might be touching at the time that you get the OOPS?

I assume you mean "postfix" instead of "postgres" here.   I'm not
entirely sure because I can't reproduce the OOPS anymore, but there
does not seem to be anything in the Postfix directories other than
the individual spooled-mail files (one per email), some libraries,
some PID files, some UNIX-domain sockets, and a couple of static
files in etc/, so I would assume not.  I'm pretty sure that it is
/var/spool/postfix that was crashing.

The emails that were triggering the issue were between 4k and 120k,
but no more than 100-120 stuck emails total.

The SSL session cache files are stored in /var/lib/postfix, which
as I said above is an entirely separate filesystem.


> I have found a bug in ext4 where we were underestimating how many
> journal credits were needed when modifying direct/indirect-mapped
> files (which would be seen on ext4 if you had a ext3 file system that
> was converted to start using extents; but old, pre-existing
> directories wouldn't be converted), which is why I'm asking the
> question about whether this was an ext2/ext3 file system which was
> converted to use ext4.

I'm not entirely clear if this applies to me or not, but I'm more
than happy to try patches.


> I have a patch to fix it, but backporting it into a kernel which will
> work with EC2 is not something I've done before.  Can anyone point me
> at a web page that gives me the quick cheat sheet?

As I said above, I'm just using unmodified Debian "squeeze" kernels.
The EC2 stuff is basically a really old version of Xen, and latest
upstream kernels with paravirt_ops seem to work.  Not sure how much
good it does since I can't seem to reproduce...

I've switched the relevant filesystems back to data=journal mode,
so if you want to send me a patch for 2.6.32 that I can apply to a
Debian kernel I will keep that kernel around and if I see it happen
again I'll check if the patch fixes it.


>> If it comes down to it I also have a base image (from "squeeze" as of 9 months ago) that could be made public after updating with new SSH keys. 
> 
> If we can reproduce the problem on that base image it would be really
> great!  I have an Amazon AWS account; contact me when you have an
> image you want to share, if you want to share it just with my AWS
> account id, instead of sharing it publically...

Well, the base image is essentially a somewhat basic Debian "squeeze"
for EC2 with our SSH public keys and a couple generic customizations
applied.  It does not have Postfix installed or configured, so there
would be some work involved.

I also didn't see any problems with the system at all until the
queue got backed up with ~100-120 stuck emails.  After Postfix tried
and failed to deliver a bunch of emails I would get the OOPS.

Cheers,
Kyle Moffett

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG  at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-04-05 15:30         ` Moffett, Kyle D
@ 2011-04-05 19:07           ` Ted Ts'o
  2011-04-05 19:44             ` Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernelBUG " Moffett, Kyle D
  0 siblings, 1 reply; 35+ messages in thread
From: Ted Ts'o @ 2011-04-05 19:07 UTC (permalink / raw)
  To: Moffett, Kyle D; +Cc: 615998, linux-ext4, Sachin Sant, Aneesh Kumar K.V

On Tue, Apr 05, 2011 at 10:30:11AM -0500, Moffett, Kyle D wrote:
> > Couple of questions which might give me some clues:
> >   (a) was this a natively formatted ext4 file system, or a ext3 file
> >       system which was later converted to ext4?
> 
> All the filesystems were formatted like this using Debian e2fstools
> as of 9 months ago:

Rats.  OK, so the indirect block journal credit bug fix won't help
this bug.

>   mke2fs -t ext4 -E lazy_itable_init=1 -L db:mail /dev/mapper/db-mail
>   tune2fs -i 0 -c 1 -e remount-ro -o acl,user_xattr,journal_data /dev/mapper/db-mail
> 
> Ooooh.... could the lazy_itable_init have anything to do with it?

Shouldn't be, since 2.6.32 doesn't have the lazy inode init support.
That support didn't show up until 2.6.37.

> I've switched the relevant filesystems back to data=journal mode,
> so if you want to send me a patch for 2.6.32 that I can apply to a
> Debian kernel I will keep that kernel around and if I see it happen
> again I'll check if the patch fixes it.

Given that this was a freshly created file system with mke2fs -t ext4,
I doubt the patch would help.

> Well, the base image is essentially a somewhat basic Debian "squeeze"
> for EC2 with our SSH public keys and a couple generic customizations
> applied.  It does not have Postfix installed or configured, so there
> would be some work involved.

Well, if you can share that image in AWS with the ssh keys stripped
out it would save me a bunch of time.  I assume it's not setup to
automatically set ssh keys and pass them back to AWS like the generic
images can?

> I also didn't see any problems with the system at all until the
> queue got backed up with ~100-120 stuck emails.  After Postfix tried
> and failed to deliver a bunch of emails I would get the OOPS.

Yeah, what I'd probably try to do is install postfix and then send a
few hundred messages to foobar@example.com and see if I can repro the OOPS.

Thanks for investigating!

						- Ted

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernelBUG  at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-04-05 19:07           ` Ted Ts'o
@ 2011-04-05 19:44             ` Moffett, Kyle D
       [not found]               ` <20110405230538.GH2832@thunk.org>
  0 siblings, 1 reply; 35+ messages in thread
From: Moffett, Kyle D @ 2011-04-05 19:44 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: 615998, linux-ext4, Sachin Sant, Aneesh Kumar K.V

On Apr 05, 2011, at 15:07, Ted Ts'o wrote:
> On Tue, Apr 05, 2011 at 10:30:11AM -0500, Moffett, Kyle D wrote:
>> Well, the base image is essentially a somewhat basic Debian "squeeze"
>> for EC2 with our SSH public keys and a couple generic customizations
>> applied.  It does not have Postfix installed or configured, so there
>> would be some work involved.
> 
> Well, if you can share that image in AWS with the ssh keys stripped
> out it would save me a bunch of time.  I assume it's not setup to
> automatically set ssh keys and pass them back to AWS like the generic
> images can?

Well, the generic images just download to ~root/.ssh/authorized_keys
from http://169.254.169.254/meta-data/public-keys/0/openssh-key

They don't really "generate" a keypair themselves, that's just what
AWS does and provides the pubkey via that URL.  The 169.254.169.254 is
just the link-local address for some virtualization infrastructure
software.  These days they even let you upload your own pubkey.

Unfortunately our images don't do that download, although the patched
D-I image does do that when initially setting up the network console.

If you send me an SSH key and AWS account number in private email then
I will cook an updated image and share it out to you.


>> I also didn't see any problems with the system at all until the
>> queue got backed up with ~100-120 stuck emails.  After Postfix tried
>> and failed to deliver a bunch of emails I would get the OOPS.
> 
> Yeah, what I'd probably try to do is install postfix and then send a
> few hundred messages to foobar@example.com and see if I can repro the OOPS.

Not sure if it's related, but the particular SMTP error that was
causing things to back up in the first place was by the remote server
rejecting emails via SMTP errors after a *successful* connection.


> Thanks for investigating!

Thanks for debugging!

Cheers,
Kyle Moffett

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Bug#615998: linux-image-2.6.32-5-xen-amd64:  Repeatable "kernel BUG  at fs/jbd2/commit.c:534" from Postfix on ext4
       [not found]                 ` <FD93E462-D97B-411B-BF09-9A64670AC5C2@boeing.com>
@ 2011-06-23 18:32                   ` Moffett, Kyle D
  2011-06-23 20:55                     ` Sean Ryle
  2011-06-23 22:23                     ` Ted Ts'o
  0 siblings, 2 replies; 35+ messages in thread
From: Moffett, Kyle D @ 2011-06-23 18:32 UTC (permalink / raw)
  To: Moffett, Kyle D
  Cc: Ted Ts'o, 615998, linux-ext4, Sachin Sant, Aneesh Kumar K.V

Hello again everyone,

I'm in the middle of doing some software testing on a pre-production
clone of this system using some modified software configurations and a
testing-only data volume, and I've managed to trigger this panic again.

The trigger was exactly the same; I had a bunch of queued emails from
logcheck because my TLS configuration was wrong, then I fixed the TLS
configuration and typed "postqueue -f" to send the queued mail.

Ted, since this new iteration has no customer data, passwords, keys, or
any other private data, I'm going to try to get approval to release an
exact EC2 image of this system for you to test with, including the fake
data volume that I triggered the problem on.

If not I can certainly reproduce it now by stopping email delivery and
generating a lot of fake syslog spam; I can try applying kernel patches
and report what happens.

Hopefully you're still willing to help out tracking down the problem?

Thanks again!

Cheers,
Kyle Moffett

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-06-23 18:32                   ` Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG " Moffett, Kyle D
@ 2011-06-23 20:55                     ` Sean Ryle
  2011-06-23 21:19                       ` Moffett, Kyle D
  2011-06-23 22:23                     ` Ted Ts'o
  1 sibling, 1 reply; 35+ messages in thread
From: Sean Ryle @ 2011-06-23 20:55 UTC (permalink / raw)
  To: Moffett, Kyle D, Ted Ts'o, 615998, linux-ext4

[-- Attachment #1: Type: text/plain, Size: 2026 bytes --]

Maybe I am wrong here, but shouldn't the cast be to (unsigned long) or to
(sector_t)?

Line 534 of commit.c:
                        jbd_debug(4, "JBD: got buffer %llu (%p)\n",
                                (unsigned long long)bh->b_blocknr,
bh->b_data);

Line 64 of buffer_head.h:
         sector_t b_blocknr;             /* start block number */

Lines 137-143 of include/linux/types/h:
#ifdef CONFIG_LBDAF
typedef u64 sector_t;
typedef u64 blkcnt_t;
#else
typedef unsigned long sector_t;
typedef unsigned long blkcnt_t;
#endif

Is it possible he is experiencing the panic due to a bad cast in the call to
jbd_debug() in fs/jbd2/commit.c?  It would seem to me this should be cast to
(sector_t).  Any thoughts?



On Thu, Jun 23, 2011 at 2:32 PM, Moffett, Kyle D
<Kyle.D.Moffett@boeing.com>wrote:

> Hello again everyone,
>
> I'm in the middle of doing some software testing on a pre-production
> clone of this system using some modified software configurations and a
> testing-only data volume, and I've managed to trigger this panic again.
>
> The trigger was exactly the same; I had a bunch of queued emails from
> logcheck because my TLS configuration was wrong, then I fixed the TLS
> configuration and typed "postqueue -f" to send the queued mail.
>
> Ted, since this new iteration has no customer data, passwords, keys, or
> any other private data, I'm going to try to get approval to release an
> exact EC2 image of this system for you to test with, including the fake
> data volume that I triggered the problem on.
>
> If not I can certainly reproduce it now by stopping email delivery and
> generating a lot of fake syslog spam; I can try applying kernel patches
> and report what happens.
>
> Hopefully you're still willing to help out tracking down the problem?
>
> Thanks again!
>
> Cheers,
> Kyle Moffett
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

[-- Attachment #2: Type: text/html, Size: 2637 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-06-23 20:55                     ` Sean Ryle
@ 2011-06-23 21:19                       ` Moffett, Kyle D
  2011-06-24 13:46                         ` Jan Kara
  0 siblings, 1 reply; 35+ messages in thread
From: Moffett, Kyle D @ 2011-06-23 21:19 UTC (permalink / raw)
  To: Sean Ryle; +Cc: Ted Ts'o, 615998, linux-ext4, Sachin Sant, Aneesh Kumar K.V

On Jun 23, 2011, at 16:55, Sean Ryle wrote:
> Maybe I am wrong here, but shouldn't the cast be to (unsigned long) or to (sector_t)?
> 
> Line 534 of commit.c:
>                         jbd_debug(4, "JBD: got buffer %llu (%p)\n",
>                                 (unsigned long long)bh->b_blocknr, bh->b_data);

No, that printk() is fine, the format string says "%llu" so the cast is
unsigned long long.

Besides which, line 534 in the Debian 2.6.32 kernel I am using is this
one:

  J_ASSERT(commit_transaction->t_nr_buffers <=
           commit_transaction->t_outstanding_credits);

If somebody can tell me what information would help to debug this I'd be
more than happy to throw a whole bunch of debug printks under that error
condition and try to trigger the crash with that.

Alternatively I could remove that J_ASSERT() and instead add some debug
further down around the "commit_transaction->t_outstanding_credits--;"
to try to see exactly what IO it's handling when it runs out of credits.

Any ideas?

Cheers,
Kyle Moffett

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Bug#615998: linux-image-2.6.32-5-xen-amd64:  Repeatable "kernel BUG  at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-06-23 18:32                   ` Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG " Moffett, Kyle D
  2011-06-23 20:55                     ` Sean Ryle
@ 2011-06-23 22:23                     ` Ted Ts'o
  1 sibling, 0 replies; 35+ messages in thread
From: Ted Ts'o @ 2011-06-23 22:23 UTC (permalink / raw)
  To: Moffett, Kyle D; +Cc: 615998, linux-ext4, Sachin Sant, Aneesh Kumar K.V

On Thu, Jun 23, 2011 at 01:32:48PM -0500, Moffett, Kyle D wrote:
> 
> Ted, since this new iteration has no customer data, passwords, keys, or
> any other private data, I'm going to try to get approval to release an
> exact EC2 image of this system for you to test with, including the fake
> data volume that I triggered the problem on.

That would be great!  Approximately how big are the images involved?

     	      	      		    	    - Ted

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-06-23 21:19                       ` Moffett, Kyle D
@ 2011-06-24 13:46                         ` Jan Kara
  2011-06-24 16:03                           ` Moffett, Kyle D
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Kara @ 2011-06-24 13:46 UTC (permalink / raw)
  To: Moffett, Kyle D
  Cc: Sean Ryle, Ted Ts'o, 615998, linux-ext4, Sachin Sant,
	Aneesh Kumar K.V

[-- Attachment #1: Type: text/plain, Size: 1797 bytes --]

On Thu 23-06-11 16:19:08, Moffett, Kyle D wrote:
> On Jun 23, 2011, at 16:55, Sean Ryle wrote:
> > Maybe I am wrong here, but shouldn't the cast be to (unsigned long) or to (sector_t)?
> > 
> > Line 534 of commit.c:
> >                         jbd_debug(4, "JBD: got buffer %llu (%p)\n",
> >                                 (unsigned long long)bh->b_blocknr, bh->b_data);
> 
> No, that printk() is fine, the format string says "%llu" so the cast is
> unsigned long long.
> 
> Besides which, line 534 in the Debian 2.6.32 kernel I am using is this
> one:
> 
>   J_ASSERT(commit_transaction->t_nr_buffers <=
>            commit_transaction->t_outstanding_credits);
  Hmm, OK, so we've used more metadata buffers than we told JBD2 to
reserve. I suppose you are not using data=journal mode and the filesystem
was created as ext4 (i.e. not converted from ext3), right? Are you using
quotas?

> If somebody can tell me what information would help to debug this I'd be
> more than happy to throw a whole bunch of debug printks under that error
> condition and try to trigger the crash with that.
> 
> Alternatively I could remove that J_ASSERT() and instead add some debug
> further down around the "commit_transaction->t_outstanding_credits--;"
> to try to see exactly what IO it's handling when it runs out of credits.
  The trouble is that the problem is likely in some journal list shuffling
code because if just some operation wrongly estimated the number of needed
buffers, we'd fail the assertion in jbd2_journal_dirty_metadata():
J_ASSERT_JH(jh, handle->h_buffer_credits > 0);

The patch below might catch the problem closer to the place where it
happens...

Also possibly you can try current kernel whether the bug happens with it or
not.
								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

[-- Attachment #2: jbd2-credits-debug.diff --]
[-- Type: text/x-patch, Size: 1335 bytes --]

diff -rupX /crypted/home/jack/.kerndiffexclude linux-2.6.32-SLE11-SP1/fs/jbd2/transaction.c linux-2.6.32-SLE11-SP1-1-jbd2-credits-bug//fs/jbd2/transaction.c
--- linux-2.6.32-SLE11-SP1/fs/jbd2/transaction.c	2011-06-23 23:01:55.600988795 +0200
+++ linux-2.6.32-SLE11-SP1-1-jbd2-credits-bug//fs/jbd2/transaction.c	2011-06-24 15:43:40.569213743 +0200
@@ -416,6 +416,7 @@ int jbd2_journal_restart(handle_t *handl
 	spin_lock(&journal->j_state_lock);
 	spin_lock(&transaction->t_handle_lock);
 	transaction->t_outstanding_credits -= handle->h_buffer_credits;
+	WARN_ON(transaction->t_outstanding_credits < transaction->t_nr_buffers);
 	transaction->t_updates--;
 
 	if (!transaction->t_updates)
@@ -1317,6 +1318,7 @@ int jbd2_journal_stop(handle_t *handle)
 	spin_lock(&journal->j_state_lock);
 	spin_lock(&transaction->t_handle_lock);
 	transaction->t_outstanding_credits -= handle->h_buffer_credits;
+	WARN_ON(transaction->t_outstanding_credits < transaction->t_nr_buffers);
 	transaction->t_updates--;
 	if (!transaction->t_updates) {
 		wake_up(&journal->j_wait_updates);
@@ -1924,6 +1926,7 @@ void __jbd2_journal_file_buffer(struct j
 		return;
 	case BJ_Metadata:
 		transaction->t_nr_buffers++;
+		WARN_ON(transaction->t_outstanding_credits < transaction->t_nr_buffers);
 		list = &transaction->t_buffers;
 		break;
 	case BJ_Forget:

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-06-24 13:46                         ` Jan Kara
@ 2011-06-24 16:03                           ` Moffett, Kyle D
  2011-06-24 20:02                             ` Jan Kara
  0 siblings, 1 reply; 35+ messages in thread
From: Moffett, Kyle D @ 2011-06-24 16:03 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sean Ryle, Ted Ts'o, 615998, linux-ext4, Sachin Sant,
	Aneesh Kumar K.V

On Jun 24, 2011, at 09:46, Jan Kara wrote:
> On Thu 23-06-11 16:19:08, Moffett, Kyle D wrote:
>> Besides which, line 534 in the Debian 2.6.32 kernel I am using is this
>> one:
>> 
>>  J_ASSERT(commit_transaction->t_nr_buffers <=
>>           commit_transaction->t_outstanding_credits);
> 
>  Hmm, OK, so we've used more metadata buffers than we told JBD2 to
> reserve. I suppose you are not using data=journal mode and the filesystem
> was created as ext4 (i.e. not converted from ext3), right? Are you using
> quotas?

The filesystem *is* using data=journal mode.  If I switch to data=ordered
or data=writeback, the problem goes away.

The filesystems were created as ext4 using the e2fstools in Debian squeeze:
1.41.12, and the kernel package is 2.6.32-5-xen-amd64 (2.6.32-34squeeze1).

The exact commands I used to create the Postfix filesystems were:
  lvcreate -L  5G -n postfix dbnew
  lvcreate -L 32M -n smtp    dbnew
  mke2fs -t ext4 -L db:postfix /dev/dbnew/postfix
  mke2fs -t ext4 -L db:smtp    /dev/dbnew/smtp
  tune2fs -i 0 -c 1 -e remount-ro -o acl,user_xattr,journal_data /dev/dbnew/postfix
  tune2fs -i 0 -c 1 -e remount-ro -o acl,user_xattr,journal_data /dev/dbnew/smtp

Then my fstab has:
  /dev/mapper/dbnew-postfix /var/spool/postfix ext4 noauto,noatime,nosuid,nodev 0 2
  /dev/mapper/dbnew-smtp    /var/lib/postfix   ext4 noauto,noatime,nosuid,nodev 0 2

I don't even think I have the quota tools installed on this system; there
are certainly none configured.

>> If somebody can tell me what information would help to debug this I'd be
>> more than happy to throw a whole bunch of debug printks under that error
>> condition and try to trigger the crash with that.
>> 
>> Alternatively I could remove that J_ASSERT() and instead add some debug
>> further down around the "commit_transaction->t_outstanding_credits--;"
>> to try to see exactly what IO it's handling when it runs out of credits.
> 
>  The trouble is that the problem is likely in some journal list shuffling
> code because if just some operation wrongly estimated the number of needed
> buffers, we'd fail the assertion in jbd2_journal_dirty_metadata():
> J_ASSERT_JH(jh, handle->h_buffer_credits > 0);

Hmm, ok...  I'm also going to turn that failing J_ASSERT() into a WARN_ON()
just to see how much further it gets.  I have an easy script to recreate this
data volume even if it gets totally hosed anyways, so...

> The patch below might catch the problem closer to the place where it
> happens...
> 
> Also possibly you can try current kernel whether the bug happens with it or
> not.

I'm definitely going to try this patch, but I'll also see what I can do about
trying a more recent kernel.

Thanks!

Cheers,
Kyle Moffett

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-06-24 16:03                           ` Moffett, Kyle D
@ 2011-06-24 20:02                             ` Jan Kara
  2011-06-24 20:51                               ` Kyle Moffett
  2011-06-27 11:16                               ` Lukas Czerner
  0 siblings, 2 replies; 35+ messages in thread
From: Jan Kara @ 2011-06-24 20:02 UTC (permalink / raw)
  To: Moffett, Kyle D
  Cc: Jan Kara, Sean Ryle, Ted Ts'o, 615998, linux-ext4,
	Sachin Sant, Aneesh Kumar K.V

On Fri 24-06-11 11:03:52, Moffett, Kyle D wrote:
> On Jun 24, 2011, at 09:46, Jan Kara wrote:
> > On Thu 23-06-11 16:19:08, Moffett, Kyle D wrote:
> >> Besides which, line 534 in the Debian 2.6.32 kernel I am using is this
> >> one:
> >> 
> >>  J_ASSERT(commit_transaction->t_nr_buffers <=
> >>           commit_transaction->t_outstanding_credits);
> > 
> >  Hmm, OK, so we've used more metadata buffers than we told JBD2 to
> > reserve. I suppose you are not using data=journal mode and the filesystem
> > was created as ext4 (i.e. not converted from ext3), right? Are you using
> > quotas?
> 
> The filesystem *is* using data=journal mode.  If I switch to data=ordered
> or data=writeback, the problem goes away.
  Ah, OK. Then bug https://bugzilla.kernel.org/show_bug.cgi?id=34642 is
probably ext3 incarnation of the same problem and it seems it's still
present even in the current kernel - that ext3 assertion triggered even
with 2.6.39 kernel. Frankly data=journal mode is far less tested than the
other two modes especially with ext4, so I'm not sure how good idea is to
use it in production.

> The filesystems were created as ext4 using the e2fstools in Debian squeeze:
> 1.41.12, and the kernel package is 2.6.32-5-xen-amd64 (2.6.32-34squeeze1).
> 
> The exact commands I used to create the Postfix filesystems were:
>   lvcreate -L  5G -n postfix dbnew
>   lvcreate -L 32M -n smtp    dbnew
>   mke2fs -t ext4 -L db:postfix /dev/dbnew/postfix
>   mke2fs -t ext4 -L db:smtp    /dev/dbnew/smtp
>   tune2fs -i 0 -c 1 -e remount-ro -o acl,user_xattr,journal_data /dev/dbnew/postfix
>   tune2fs -i 0 -c 1 -e remount-ro -o acl,user_xattr,journal_data /dev/dbnew/smtp
> 
> Then my fstab has:
>   /dev/mapper/dbnew-postfix /var/spool/postfix ext4 noauto,noatime,nosuid,nodev 0 2
>   /dev/mapper/dbnew-smtp    /var/lib/postfix   ext4 noauto,noatime,nosuid,nodev 0 2
> 
> I don't even think I have the quota tools installed on this system; there
> are certainly none configured.
  OK, thanks.

> >> If somebody can tell me what information would help to debug this I'd be
> >> more than happy to throw a whole bunch of debug printks under that error
> >> condition and try to trigger the crash with that.
> >> 
> >> Alternatively I could remove that J_ASSERT() and instead add some debug
> >> further down around the "commit_transaction->t_outstanding_credits--;"
> >> to try to see exactly what IO it's handling when it runs out of credits.
> > 
> >  The trouble is that the problem is likely in some journal list shuffling
> > code because if just some operation wrongly estimated the number of needed
> > buffers, we'd fail the assertion in jbd2_journal_dirty_metadata():
> > J_ASSERT_JH(jh, handle->h_buffer_credits > 0);
> 
> Hmm, ok...  I'm also going to turn that failing J_ASSERT() into a WARN_ON()
> just to see how much further it gets.  I have an easy script to recreate this
> data volume even if it gets totally hosed anyways, so...
  OK, we'll see what happens.

> > The patch below might catch the problem closer to the place where it
> > happens...
> > 
> > Also possibly you can try current kernel whether the bug happens with it or
> > not.
> 
> I'm definitely going to try this patch, but I'll also see what I can do about
> trying a more recent kernel.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-06-24 20:02                             ` Jan Kara
@ 2011-06-24 20:51                               ` Kyle Moffett
  2011-08-26 21:03                                 ` Moffett, Kyle D
  2011-06-27 11:16                               ` Lukas Czerner
  1 sibling, 1 reply; 35+ messages in thread
From: Kyle Moffett @ 2011-06-24 20:51 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sean Ryle, Ted Ts'o, 615998, linux-ext4, Sachin Sant,
	Aneesh Kumar K.V

On Jun 24, 2011, at 16:02, Jan Kara wrote:
> On Fri 24-06-11 11:03:52, Moffett, Kyle D wrote:
>> On Jun 24, 2011, at 09:46, Jan Kara wrote:
>>> On Thu 23-06-11 16:19:08, Moffett, Kyle D wrote:
>>>> Besides which, line 534 in the Debian 2.6.32 kernel I am using is this
>>>> one:
>>>> 
>>>> J_ASSERT(commit_transaction->t_nr_buffers <=
>>>>          commit_transaction->t_outstanding_credits);
>>> 
>>> Hmm, OK, so we've used more metadata buffers than we told JBD2 to
>>> reserve. I suppose you are not using data=journal mode and the filesystem
>>> was created as ext4 (i.e. not converted from ext3), right? Are you using
>>> quotas?
>> 
>> The filesystem *is* using data=journal mode.  If I switch to data=ordered
>> or data=writeback, the problem goes away.
> 
>  Ah, OK. Then bug https://bugzilla.kernel.org/show_bug.cgi?id=34642 is
> probably ext3 incarnation of the same problem and it seems it's still
> present even in the current kernel - that ext3 assertion triggered even
> with 2.6.39 kernel. Frankly data=journal mode is far less tested than the
> other two modes especially with ext4, so I'm not sure how good idea is to
> use it in production.

Hm... ugh...

I really would *like* data=journal mode to work reliably, especially for
specific read-mostly databases and other such things...  I suppose the
solution is for me to load-test it and help get the bugs fixed!!! :-D


>>> The trouble is that the problem is likely in some journal list shuffling
>>> code because if just some operation wrongly estimated the number of needed
>>> buffers, we'd fail the assertion in jbd2_journal_dirty_metadata():
>>> J_ASSERT_JH(jh, handle->h_buffer_credits > 0);
>> 
>> Hmm, ok...  I'm also going to turn that failing J_ASSERT() into a WARN_ON()
>> just to see how much further it gets.  I have an easy script to recreate this
>> data volume even if it gets totally hosed anyways, so...
>  OK, we'll see what happens.

Ok, status update here:

I applied a modified version of your patch that prints out the values of both
t_outstanding_credits and t_nr_buffers when the assertion triggers.  I replaced
the J_ASSERT() that was failing with the exact same WARN_ON() trigger too.

The end result is that postfix successfully finished delivering all the emails.
Afterwards I unmounted both filesystems and ran "fsck -fy" on them, it reported
no errors at all.

Looking through the log, the filesystem with the issues is the 32MB one mounted
on /var/lib/postfix:
  total 61
  drwxr-x---  3 postfix postfix  1024 Jun 16 21:02 .
  drwxr-xr-x 46 root    root     4096 Jun 20 17:19 ..
  d---------  2 root    root    12288 Jun 16 18:35 lost+found
  -rw-------  1 postfix postfix    33 Jun 24 16:34 master.lock
  -rw-------  1 postfix postfix  1024 Jun 24 16:44 prng_exch
  -rw-------  1 postfix postfix  2048 Jun 24 16:34 smtpd_scache.db
  -rw-------  1 postfix postfix 41984 Jun 24 16:36 smtp_scache.db

In particular, it's the tlsmgr program accessing the smtp_scache file when it
dies.

Full log below.

Cheers,
Kyle Moffett

Jun 24 16:36:05 i-38020f57 kernel: [5369326.385234] transaction->t_outstanding_credits = 8
Jun 24 16:36:05 i-38020f57 kernel: [5369326.385247] transaction->t_nr_buffers = 9
Jun 24 16:36:05 i-38020f57 kernel: [5369326.385251] ------------[ cut here ]------------
Jun 24 16:36:05 i-38020f57 kernel: [5369326.385278] WARNING: at /tmp/kdm-deb-kernel/linux-2.6-2.6.32/debian/build/source_amd64_xen/fs/jbd2/transaction.c:1329 jbd2_journal_stop+0x189/0x25d [jbd2]()
Jun 24 16:36:05 i-38020f57 kernel: [5369326.385287] Modules linked in: ip6table_filter ip6_tables act_police cls_flow cls_fw cls_u32 sch_htb sch_hfsc sch_ingress sch_sfq xt_time xt_connlimit xt_realm iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_tproxy_core xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY ipt_LOG xt_tcpudp xt_state iptable_nat nf_nat nf_conntrac
Jun 24 16:36:05 i-38020f57 kernel: k_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ext3 jbd loop snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr evdev ext4 mbcache jbd2 crc16 dm_mod xen_netfront xen_blkfront
Jun 24 16:36:05 i-38020f57 kernel: [5369326.385440] Pid: 3817, comm: tlsmgr Not tainted 2.6.32-5-xen-amd64 #1
Jun 24 16:36:05 i-38020f57 kernel: [5369326.385445] Call Trace:
Jun 24 16:36:05 i-38020f57 kernel: [5369326.385458]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
Jun 24 16:36:05 i-38020f57 kernel: [5369326.385467]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
Jun 24 16:36:05 i-38020f57 kernel: [5369326.385477]  [<ffffffff8104ef00>] ? warn_slowpath_common+0x77/0xa3
Jun 24 16:36:05 i-38020f57 kernel: [5369326.385486]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
Jun 24 16:36:05 i-38020f57 kernel: [5369326.385505]  [<ffffffffa0074c8e>] ? __ext4_journal_stop+0x63/0x69 [ext4]
Jun 24 16:36:05 i-38020f57 kernel: [5369326.385517]  [<ffffffffa0060949>] ? ext4_journalled_write_end+0x160/0x19a [ext4]
Jun 24 16:36:05 i-38020f57 kernel: [5369326.385633]  [<ffffffffa00857c6>] ? ext4_xattr_get+0x1fa/0x27c [ext4]
Jun 24 16:36:05 i-38020f57 kernel: [5369326.385643]  [<ffffffff810b5e91>] ? generic_file_buffered_write+0x18d/0x278
Jun 24 16:36:05 i-38020f57 kernel: [5369326.385650]  [<ffffffff810b632d>] ? __generic_file_aio_write+0x25f/0x293
Jun 24 16:36:05 i-38020f57 kernel: [5369326.385659]  [<ffffffffa0032d42>] ? jbd2_journal_stop+0x24a/0x25d [jbd2]
Jun 24 16:36:05 i-38020f57 kernel: [5369326.385669]  [<ffffffff8100eccf>] ? xen_restore_fl_direct_end+0x0/0x1
Jun 24 16:36:05 i-38020f57 kernel: [5369326.385674]  [<ffffffff810b63ba>] ? generic_file_aio_write+0x59/0x9f
Jun 24 16:36:05 i-38020f57 kernel: [5369326.385682]  [<ffffffff810efa46>] ? do_sync_write+0xce/0x113
Jun 24 16:36:05 i-38020f57 kernel: [5369326.385694]  [<ffffffffa0074c8e>] ? __ext4_journal_stop+0x63/0x69 [ext4]
Jun 24 16:36:05 i-38020f57 kernel: [5369326.385701]  [<ffffffff81065f06>] ? autoremove_wake_function+0x0/0x2e
Jun 24 16:36:05 i-38020f57 kernel: [5369326.385709]  [<ffffffff81102492>] ? notify_change+0x2b3/0x2c5
Jun 24 16:36:05 i-38020f57 kernel: [5369326.385715]  [<ffffffff810f0398>] ? vfs_write+0xa9/0x102
Jun 24 16:36:05 i-38020f57 kernel: [5369326.385720]  [<ffffffff810f0448>] ? sys_pwrite64+0x57/0x77
Jun 24 16:36:05 i-38020f57 kernel: [5369326.385726]  [<ffffffff810ef04d>] ? sys_ftruncate+0x112/0x11d
Jun 24 16:36:05 i-38020f57 kernel: [5369326.385732]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
Jun 24 16:36:05 i-38020f57 kernel: [5369326.385737] ---[ end trace 2c615eb111c993ca ]---
Jun 24 16:36:05 i-38020f57 kernel: [5369326.385782] transaction->t_outstanding_credits = 8
Jun 24 16:36:05 i-38020f57 kernel: [5369326.385788] transaction->t_nr_buffers = 9
Jun 24 16:36:05 i-38020f57 kernel: [5369326.385793] ------------[ cut here ]------------
Jun 24 16:36:05 i-38020f57 kernel: [5369326.385804] WARNING: at /tmp/kdm-deb-kernel/linux-2.6-2.6.32/debian/build/source_amd64_xen/fs/jbd2/transaction.c:1329 jbd2_journal_stop+0x189/0x25d [jbd2]()
Jun 24 16:36:05 i-38020f57 kernel: [5369326.385815] Modules linked in: ip6table_filter ip6_tables act_police cls_flow cls_fw cls_u32 sch_htb sch_hfsc sch_ingress sch_sfq xt_time xt_connlimit xt_realm iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_tproxy_core xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY ipt_LOG xt_tcpudp xt_state iptable_nat nf_nat nf_conntrac
Jun 24 16:36:05 i-38020f57 kernel: k_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ext3 jbd loop snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr evdev ext4 mbcache jbd2 crc16 dm_mod xen_netfront xen_blkfront
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386041] Pid: 3817, comm: tlsmgr Tainted: G        W  2.6.32-5-xen-amd64 #1
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386046] Call Trace:
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386055]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386064]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386071]  [<ffffffff8104ef00>] ? warn_slowpath_common+0x77/0xa3
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386077]  [<ffffffff8100eccf>] ? xen_restore_fl_direct_end+0x0/0x1
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386087]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386098]  [<ffffffffa0033cd6>] ? jbd2_journal_start+0x96/0xc6 [jbd2]
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386113]  [<ffffffffa0070df6>] ? ext4_force_commit+0xa4/0xb3 [ext4]
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386126]  [<ffffffffa0058a37>] ? ext4_sync_file+0x8f/0x25c [ext4]
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386138]  [<ffffffff8110ccd4>] ? vfs_fsync_range+0x73/0x9e
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386146]  [<ffffffff8110cd7e>] ? do_fsync+0x28/0x39
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386153]  [<ffffffff8110cd9d>] ? sys_fdatasync+0xe/0x13
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386163]  [<ffffffff8130d89a>] ? error_exit+0x2a/0x60
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386170]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386178]  [<ffffffff810092eb>] ? hypercall_page+0x2eb/0x1001
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386185] ---[ end trace 2c615eb111c993cb ]---
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386243] transaction->t_outstanding_credits = 8
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386249] transaction->t_nr_buffers = 9
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386253] ------------[ cut here ]------------
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386263] WARNING: at /tmp/kdm-deb-kernel/linux-2.6-2.6.32/debian/build/source_amd64_xen/fs/jbd2/commit.c:538 jbd2_journal_commit_transaction+0x574/0x118b [jbd2]()
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386311] Modules linked in: ip6table_filter ip6_tables act_police cls_flow cls_fw cls_u32 sch_htb sch_hfsc sch_ingress sch_sfq xt_time xt_connlimit xt_realm iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_tproxy_core xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY ipt_LOG xt_tcpudp xt_state iptable_nat nf_nat nf_conntrac
Jun 24 16:36:05 i-38020f57 kernel: k_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ext3 jbd loop snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr evdev ext4 mbcache jbd2 crc16 dm_mod xen_netfront xen_blkfront
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386452] Pid: 992, comm: jbd2/dm-23-8 Tainted: G        W  2.6.32-5-xen-amd64 #1
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386457] Call Trace:
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386466]  [<ffffffffa0034471>] ? jbd2_journal_commit_transaction+0x574/0x118b [jbd2]
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386475]  [<ffffffffa0034471>] ? jbd2_journal_commit_transaction+0x574/0x118b [jbd2]
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386483]  [<ffffffff8104ef00>] ? warn_slowpath_common+0x77/0xa3
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386492]  [<ffffffffa0034471>] ? jbd2_journal_commit_transaction+0x574/0x118b [jbd2]
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386500]  [<ffffffff8130bf48>] ? thread_return+0x79/0xe0
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386506]  [<ffffffff8100e629>] ? xen_force_evtchn_callback+0x9/0xa
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386512]  [<ffffffff8100ece2>] ? check_events+0x12/0x20
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386519]  [<ffffffff8130d2e2>] ? _spin_unlock_irqrestore+0xd/0xe
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386560]  [<ffffffff8105b8c8>] ? try_to_del_timer_sync+0x63/0x6c
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386572]  [<ffffffffa003a98b>] ? kjournald2+0xbe/0x206 [jbd2]
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386578]  [<ffffffff81065f06>] ? autoremove_wake_function+0x0/0x2e
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386587]  [<ffffffffa003a8cd>] ? kjournald2+0x0/0x206 [jbd2]
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386592]  [<ffffffff81065c39>] ? kthread+0x79/0x81
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386599]  [<ffffffff81012baa>] ? child_rip+0xa/0x20
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386604]  [<ffffffff81011d61>] ? int_ret_from_sys_call+0x7/0x1b
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386610]  [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386616]  [<ffffffff81012ba0>] ? child_rip+0x0/0x20
Jun 24 16:36:05 i-38020f57 kernel: [5369326.386620] ---[ end trace 2c615eb111c993cc ]---
Jun 24 16:36:07 i-38020f57 kernel: [5369328.124954] transaction->t_outstanding_credits = 9
Jun 24 16:36:07 i-38020f57 kernel: [5369328.124967] transaction->t_nr_buffers = 10
Jun 24 16:36:07 i-38020f57 kernel: [5369328.124971] ------------[ cut here ]------------
Jun 24 16:36:07 i-38020f57 kernel: [5369328.124998] WARNING: at /tmp/kdm-deb-kernel/linux-2.6-2.6.32/debian/build/source_amd64_xen/fs/jbd2/transaction.c:1329 jbd2_journal_stop+0x189/0x25d [jbd2]()
Jun 24 16:36:07 i-38020f57 kernel: [5369328.125007] Modules linked in: ip6table_filter ip6_tables act_police cls_flow cls_fw cls_u32 sch_htb sch_hfsc sch_ingress sch_sfq xt_time xt_connlimit xt_realm iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_tproxy_core xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY ipt_LOG xt_tcpudp xt_state iptable_nat nf_nat nf_conntrac
Jun 24 16:36:07 i-38020f57 kernel: k_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ext3 jbd loop snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr evdev ext4 mbcache jbd2 crc16 dm_mod xen_netfront xen_blkfront
Jun 24 16:36:07 i-38020f57 kernel: [5369328.125158] Pid: 3817, comm: tlsmgr Tainted: G        W  2.6.32-5-xen-amd64 #1
Jun 24 16:36:07 i-38020f57 kernel: [5369328.125164] Call Trace:
Jun 24 16:36:07 i-38020f57 kernel: [5369328.125174]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
Jun 24 16:36:07 i-38020f57 kernel: [5369328.125183]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
Jun 24 16:36:07 i-38020f57 kernel: [5369328.125194]  [<ffffffff8104ef00>] ? warn_slowpath_common+0x77/0xa3
Jun 24 16:36:07 i-38020f57 kernel: [5369328.125203]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
Jun 24 16:36:07 i-38020f57 kernel: [5369328.125222]  [<ffffffffa0074c8e>] ? __ext4_journal_stop+0x63/0x69 [ext4]
Jun 24 16:36:07 i-38020f57 kernel: [5369328.125338]  [<ffffffffa0060949>] ? ext4_journalled_write_end+0x160/0x19a [ext4]
Jun 24 16:36:07 i-38020f57 kernel: [5369328.125351]  [<ffffffffa00857c6>] ? ext4_xattr_get+0x1fa/0x27c [ext4]
Jun 24 16:36:07 i-38020f57 kernel: [5369328.125361]  [<ffffffff810b5e91>] ? generic_file_buffered_write+0x18d/0x278
Jun 24 16:36:07 i-38020f57 kernel: [5369328.125368]  [<ffffffff810b632d>] ? __generic_file_aio_write+0x25f/0x293
Jun 24 16:36:07 i-38020f57 kernel: [5369328.125377]  [<ffffffffa0032d42>] ? jbd2_journal_stop+0x24a/0x25d [jbd2]
Jun 24 16:36:07 i-38020f57 kernel: [5369328.125385]  [<ffffffff8100eccf>] ? xen_restore_fl_direct_end+0x0/0x1
Jun 24 16:36:07 i-38020f57 kernel: [5369328.125391]  [<ffffffff810b63ba>] ? generic_file_aio_write+0x59/0x9f
Jun 24 16:36:07 i-38020f57 kernel: [5369328.125398]  [<ffffffff810efa46>] ? do_sync_write+0xce/0x113
Jun 24 16:36:07 i-38020f57 kernel: [5369328.125410]  [<ffffffffa0074c8e>] ? __ext4_journal_stop+0x63/0x69 [ext4]
Jun 24 16:36:07 i-38020f57 kernel: [5369328.125418]  [<ffffffff81065f06>] ? autoremove_wake_function+0x0/0x2e
Jun 24 16:36:07 i-38020f57 kernel: [5369328.125426]  [<ffffffff81102492>] ? notify_change+0x2b3/0x2c5
Jun 24 16:36:07 i-38020f57 kernel: [5369328.125432]  [<ffffffff810f0398>] ? vfs_write+0xa9/0x102
Jun 24 16:36:07 i-38020f57 kernel: [5369328.125437]  [<ffffffff810f0448>] ? sys_pwrite64+0x57/0x77
Jun 24 16:36:07 i-38020f57 kernel: [5369328.125443]  [<ffffffff810ef04d>] ? sys_ftruncate+0x112/0x11d
Jun 24 16:36:07 i-38020f57 kernel: [5369328.125449]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
Jun 24 16:36:07 i-38020f57 kernel: [5369328.125454] ---[ end trace 2c615eb111c993cd ]---
Jun 24 16:36:07 i-38020f57 kernel: [5369328.180777] transaction->t_outstanding_credits = 9
Jun 24 16:36:07 i-38020f57 kernel: [5369328.180787] transaction->t_nr_buffers = 10
Jun 24 16:36:07 i-38020f57 kernel: [5369328.180792] ------------[ cut here ]------------
Jun 24 16:36:07 i-38020f57 kernel: [5369328.180810] WARNING: at /tmp/kdm-deb-kernel/linux-2.6-2.6.32/debian/build/source_amd64_xen/fs/jbd2/transaction.c:1329 jbd2_journal_stop+0x189/0x25d [jbd2]()
Jun 24 16:36:07 i-38020f57 kernel: [5369328.180819] Modules linked in: ip6table_filter ip6_tables act_police cls_flow cls_fw cls_u32 sch_htb sch_hfsc sch_ingress sch_sfq xt_time xt_connlimit xt_realm iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_tproxy_core xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY ipt_LOG xt_tcpudp xt_state iptable_nat nf_nat nf_conntrac
Jun 24 16:36:07 i-38020f57 kernel: k_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ext3 jbd loop snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr evdev ext4 mbcache jbd2 crc16 dm_mod xen_netfront xen_blkfront
Jun 24 16:36:07 i-38020f57 kernel: [5369328.180969] Pid: 3817, comm: tlsmgr Tainted: G        W  2.6.32-5-xen-amd64 #1
Jun 24 16:36:07 i-38020f57 kernel: [5369328.180975] Call Trace:
Jun 24 16:36:07 i-38020f57 kernel: [5369328.180983]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
Jun 24 16:36:07 i-38020f57 kernel: [5369328.180992]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
Jun 24 16:36:07 i-38020f57 kernel: [5369328.180999]  [<ffffffff8104ef00>] ? warn_slowpath_common+0x77/0xa3
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181006]  [<ffffffff8100eccf>] ? xen_restore_fl_direct_end+0x0/0x1
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181015]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181024]  [<ffffffffa0033cd6>] ? jbd2_journal_start+0x96/0xc6 [jbd2]
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181083]  [<ffffffffa0070df6>] ? ext4_force_commit+0xa4/0xb3 [ext4]
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181094]  [<ffffffffa0058a37>] ? ext4_sync_file+0x8f/0x25c [ext4]
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181104]  [<ffffffff8110ccd4>] ? vfs_fsync_range+0x73/0x9e
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181110]  [<ffffffff8110cd7e>] ? do_fsync+0x28/0x39
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181115]  [<ffffffff8110cd9d>] ? sys_fdatasync+0xe/0x13
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181125]  [<ffffffff8130d89a>] ? error_exit+0x2a/0x60
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181131]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181137]  [<ffffffff810092eb>] ? hypercall_page+0x2eb/0x1001
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181142] ---[ end trace 2c615eb111c993ce ]---
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181160] transaction->t_outstanding_credits = 9
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181164] transaction->t_nr_buffers = 10
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181168] ------------[ cut here ]------------
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181176] WARNING: at /tmp/kdm-deb-kernel/linux-2.6-2.6.32/debian/build/source_amd64_xen/fs/jbd2/commit.c:538 jbd2_journal_commit_transaction+0x574/0x118b [jbd2]()
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181185] Modules linked in: ip6table_filter ip6_tables act_police cls_flow cls_fw cls_u32 sch_htb sch_hfsc sch_ingress sch_sfq xt_time xt_connlimit xt_realm iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_tproxy_core xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY ipt_LOG xt_tcpudp xt_state iptable_nat nf_nat nf_conntrac
Jun 24 16:36:07 i-38020f57 kernel: k_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ext3 jbd loop snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr evdev ext4 mbcache jbd2 crc16 dm_mod xen_netfront xen_blkfront
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181368] Pid: 992, comm: jbd2/dm-23-8 Tainted: G        W  2.6.32-5-xen-amd64 #1
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181374] Call Trace:
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181383]  [<ffffffffa0034471>] ? jbd2_journal_commit_transaction+0x574/0x118b [jbd2]
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181393]  [<ffffffffa0034471>] ? jbd2_journal_commit_transaction+0x574/0x118b [jbd2]
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181401]  [<ffffffff8104ef00>] ? warn_slowpath_common+0x77/0xa3
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181409]  [<ffffffffa0034471>] ? jbd2_journal_commit_transaction+0x574/0x118b [jbd2]
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181418]  [<ffffffff8130bf48>] ? thread_return+0x79/0xe0
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181424]  [<ffffffff8100e629>] ? xen_force_evtchn_callback+0x9/0xa
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181430]  [<ffffffff8100ece2>] ? check_events+0x12/0x20
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181436]  [<ffffffff8130d2e2>] ? _spin_unlock_irqrestore+0xd/0xe
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181445]  [<ffffffff8105b8c8>] ? try_to_del_timer_sync+0x63/0x6c
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181455]  [<ffffffffa003a98b>] ? kjournald2+0xbe/0x206 [jbd2]
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181461]  [<ffffffff81065f06>] ? autoremove_wake_function+0x0/0x2e
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181470]  [<ffffffffa003a8cd>] ? kjournald2+0x0/0x206 [jbd2]
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181475]  [<ffffffff81065c39>] ? kthread+0x79/0x81
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181482]  [<ffffffff81012baa>] ? child_rip+0xa/0x20
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181487]  [<ffffffff81011d61>] ? int_ret_from_sys_call+0x7/0x1b
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181493]  [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181499]  [<ffffffff81012ba0>] ? child_rip+0x0/0x20
Jun 24 16:36:07 i-38020f57 kernel: [5369328.181503] ---[ end trace 2c615eb111c993cf ]---



	

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-06-24 20:02                             ` Jan Kara
  2011-06-24 20:51                               ` Kyle Moffett
@ 2011-06-27 11:16                               ` Lukas Czerner
  2011-06-27 11:57                                 ` Amir Goldstein
  2011-06-27 14:02                                 ` Jan Kara
  1 sibling, 2 replies; 35+ messages in thread
From: Lukas Czerner @ 2011-06-27 11:16 UTC (permalink / raw)
  To: Jan Kara
  Cc: Moffett, Kyle D, Sean Ryle, Ted Ts'o, 615998, linux-ext4,
	Sachin Sant, Aneesh Kumar K.V

On Fri, 24 Jun 2011, Jan Kara wrote:

> On Fri 24-06-11 11:03:52, Moffett, Kyle D wrote:
> > On Jun 24, 2011, at 09:46, Jan Kara wrote:
> > > On Thu 23-06-11 16:19:08, Moffett, Kyle D wrote:
> > >> Besides which, line 534 in the Debian 2.6.32 kernel I am using is this
> > >> one:
> > >> 
> > >>  J_ASSERT(commit_transaction->t_nr_buffers <=
> > >>           commit_transaction->t_outstanding_credits);
> > > 
> > >  Hmm, OK, so we've used more metadata buffers than we told JBD2 to
> > > reserve. I suppose you are not using data=journal mode and the filesystem
> > > was created as ext4 (i.e. not converted from ext3), right? Are you using
> > > quotas?
> > 
> > The filesystem *is* using data=journal mode.  If I switch to data=ordered
> > or data=writeback, the problem goes away.
>   Ah, OK. Then bug https://bugzilla.kernel.org/show_bug.cgi?id=34642 is
> probably ext3 incarnation of the same problem and it seems it's still
> present even in the current kernel - that ext3 assertion triggered even
> with 2.6.39 kernel. Frankly data=journal mode is far less tested than the
> other two modes especially with ext4, so I'm not sure how good idea is to
> use it in production.

Hi Jan,

if it is so (and it probably is, since I am not testing this mode as
well:), it would be interesting to find out whether there are many users
of this and if there are not, which is probably the case, deprecate it now,
so we can remove it later. If we are openly suggesting not to use this,
then there is probably no point in having this option in the first
place.

I vaguely remember that Ted said something about removing data=journal
mode, but I do not remember details. Ted ?

Thanks!
-Lukas

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-06-27 11:16                               ` Lukas Czerner
@ 2011-06-27 11:57                                 ` Amir Goldstein
  2011-06-27 14:02                                 ` Jan Kara
  1 sibling, 0 replies; 35+ messages in thread
From: Amir Goldstein @ 2011-06-27 11:57 UTC (permalink / raw)
  To: Lukas Czerner
  Cc: Jan Kara, Moffett, Kyle D, Sean Ryle, Ted Ts'o, 615998,
	linux-ext4, Sachin Sant, Aneesh Kumar K.V

On Mon, Jun 27, 2011 at 2:16 PM, Lukas Czerner <lczerner@redhat.com> wrote:
> On Fri, 24 Jun 2011, Jan Kara wrote:
>
>> On Fri 24-06-11 11:03:52, Moffett, Kyle D wrote:
>> > On Jun 24, 2011, at 09:46, Jan Kara wrote:
>> > > On Thu 23-06-11 16:19:08, Moffett, Kyle D wrote:
>> > >> Besides which, line 534 in the Debian 2.6.32 kernel I am using is this
>> > >> one:
>> > >>
>> > >>  J_ASSERT(commit_transaction->t_nr_buffers <=
>> > >>           commit_transaction->t_outstanding_credits);
>> > >
>> > >  Hmm, OK, so we've used more metadata buffers than we told JBD2 to
>> > > reserve. I suppose you are not using data=journal mode and the filesystem
>> > > was created as ext4 (i.e. not converted from ext3), right? Are you using
>> > > quotas?
>> >
>> > The filesystem *is* using data=journal mode.  If I switch to data=ordered
>> > or data=writeback, the problem goes away.
>>   Ah, OK. Then bug https://bugzilla.kernel.org/show_bug.cgi?id=34642 is
>> probably ext3 incarnation of the same problem and it seems it's still
>> present even in the current kernel - that ext3 assertion triggered even
>> with 2.6.39 kernel. Frankly data=journal mode is far less tested than the
>> other two modes especially with ext4, so I'm not sure how good idea is to
>> use it in production.
>
> Hi Jan,
>
> if it is so (and it probably is, since I am not testing this mode as
> well:), it would be interesting to find out whether there are many users
> of this and if there are not, which is probably the case, deprecate it now,
> so we can remove it later. If we are openly suggesting not to use this,
> then there is probably no point in having this option in the first
> place.
>
> I vaguely remember that Ted said something about removing data=journal
> mode, but I do not remember details. Ted ?
>

I think Ted was plotting about removing data=ordered...

Amir.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-06-27 11:16                               ` Lukas Czerner
  2011-06-27 11:57                                 ` Amir Goldstein
@ 2011-06-27 14:02                                 ` Jan Kara
  2011-06-27 15:30                                   ` Lukas Czerner
  1 sibling, 1 reply; 35+ messages in thread
From: Jan Kara @ 2011-06-27 14:02 UTC (permalink / raw)
  To: Lukas Czerner
  Cc: Jan Kara, Moffett, Kyle D, Sean Ryle, Ted Ts'o, 615998,
	linux-ext4, Sachin Sant, Aneesh Kumar K.V

On Mon 27-06-11 13:16:50, Lukas Czerner wrote:
> On Fri, 24 Jun 2011, Jan Kara wrote:
> 
> > On Fri 24-06-11 11:03:52, Moffett, Kyle D wrote:
> > > On Jun 24, 2011, at 09:46, Jan Kara wrote:
> > > > On Thu 23-06-11 16:19:08, Moffett, Kyle D wrote:
> > > >> Besides which, line 534 in the Debian 2.6.32 kernel I am using is this
> > > >> one:
> > > >> 
> > > >>  J_ASSERT(commit_transaction->t_nr_buffers <=
> > > >>           commit_transaction->t_outstanding_credits);
> > > > 
> > > >  Hmm, OK, so we've used more metadata buffers than we told JBD2 to
> > > > reserve. I suppose you are not using data=journal mode and the filesystem
> > > > was created as ext4 (i.e. not converted from ext3), right? Are you using
> > > > quotas?
> > > 
> > > The filesystem *is* using data=journal mode.  If I switch to data=ordered
> > > or data=writeback, the problem goes away.
> >   Ah, OK. Then bug https://bugzilla.kernel.org/show_bug.cgi?id=34642 is
> > probably ext3 incarnation of the same problem and it seems it's still
> > present even in the current kernel - that ext3 assertion triggered even
> > with 2.6.39 kernel. Frankly data=journal mode is far less tested than the
> > other two modes especially with ext4, so I'm not sure how good idea is to
> > use it in production.
 
  Hi Lukas,

> if it is so (and it probably is, since I am not testing this mode as
> well:), it would be interesting to find out whether there are many users
> of this and if there are not, which is probably the case, deprecate it now,
> so we can remove it later. If we are openly suggesting not to use this,
> then there is probably no point in having this option in the first
> place.
  For about one year I'm trying to look for people using data=journal and
I've found some. So although data=journal users are minority, there are
some. That being said I agree with you we should do something about it
- either state that we want to fully support data=journal - and then we
should really do better with testing it or deprecate it and remove it
(which would save us some complications in the code).

I would be slightly in favor of removing it (code simplicity, less options
to configure for admin, less options to test for us, some users I've come
across actually were not quite sure why they are using it - they just
thought it looks safer).

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-06-27 14:02                                 ` Jan Kara
@ 2011-06-27 15:30                                   ` Lukas Czerner
  2011-06-27 16:01                                     ` Ted Ts'o
  0 siblings, 1 reply; 35+ messages in thread
From: Lukas Czerner @ 2011-06-27 15:30 UTC (permalink / raw)
  To: Jan Kara
  Cc: Lukas Czerner, Moffett, Kyle D, Sean Ryle, Ted Ts'o, 615998,
	linux-ext4, Sachin Sant, Aneesh Kumar K.V

On Mon, 27 Jun 2011, Jan Kara wrote:

> On Mon 27-06-11 13:16:50, Lukas Czerner wrote:
> > On Fri, 24 Jun 2011, Jan Kara wrote:
> > 
> > > On Fri 24-06-11 11:03:52, Moffett, Kyle D wrote:
> > > > On Jun 24, 2011, at 09:46, Jan Kara wrote:
> > > > > On Thu 23-06-11 16:19:08, Moffett, Kyle D wrote:
> > > > >> Besides which, line 534 in the Debian 2.6.32 kernel I am using is this
> > > > >> one:
> > > > >> 
> > > > >>  J_ASSERT(commit_transaction->t_nr_buffers <=
> > > > >>           commit_transaction->t_outstanding_credits);
> > > > > 
> > > > >  Hmm, OK, so we've used more metadata buffers than we told JBD2 to
> > > > > reserve. I suppose you are not using data=journal mode and the filesystem
> > > > > was created as ext4 (i.e. not converted from ext3), right? Are you using
> > > > > quotas?
> > > > 
> > > > The filesystem *is* using data=journal mode.  If I switch to data=ordered
> > > > or data=writeback, the problem goes away.
> > >   Ah, OK. Then bug https://bugzilla.kernel.org/show_bug.cgi?id=34642 is
> > > probably ext3 incarnation of the same problem and it seems it's still
> > > present even in the current kernel - that ext3 assertion triggered even
> > > with 2.6.39 kernel. Frankly data=journal mode is far less tested than the
> > > other two modes especially with ext4, so I'm not sure how good idea is to
> > > use it in production.
>  
>   Hi Lukas,
> 
> > if it is so (and it probably is, since I am not testing this mode as
> > well:), it would be interesting to find out whether there are many users
> > of this and if there are not, which is probably the case, deprecate it now,
> > so we can remove it later. If we are openly suggesting not to use this,
> > then there is probably no point in having this option in the first
> > place.
>   For about one year I'm trying to look for people using data=journal and
> I've found some. So although data=journal users are minority, there are
> some. That being said I agree with you we should do something about it
> - either state that we want to fully support data=journal - and then we
> should really do better with testing it or deprecate it and remove it
> (which would save us some complications in the code).
> 
> I would be slightly in favor of removing it (code simplicity, less options
> to configure for admin, less options to test for us, some users I've come
> across actually were not quite sure why they are using it - they just
> thought it looks safer).
> 
> 								Honza
> 

I completely agree with you. Also I find it really dangerous that the
option which looks much safer is in fact less safe, because just a
minority of people (including developers) are testing it.

Thanks!
-Lukas

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-06-27 15:30                                   ` Lukas Czerner
@ 2011-06-27 16:01                                     ` Ted Ts'o
  2011-06-27 20:27                                       ` Jan Kara
  2011-06-28  4:21                                       ` Moffett, Kyle D
  0 siblings, 2 replies; 35+ messages in thread
From: Ted Ts'o @ 2011-06-27 16:01 UTC (permalink / raw)
  To: Lukas Czerner
  Cc: Jan Kara, Moffett, Kyle D, Sean Ryle, 615998, linux-ext4,
	Sachin Sant, Aneesh Kumar K.V

On Mon, Jun 27, 2011 at 05:30:11PM +0200, Lukas Czerner wrote:
> > I've found some. So although data=journal users are minority, there are
> > some. That being said I agree with you we should do something about it
> > - either state that we want to fully support data=journal - and then we
> > should really do better with testing it or deprecate it and remove it
> > (which would save us some complications in the code).
> > 
> > I would be slightly in favor of removing it (code simplicity, less options
> > to configure for admin, less options to test for us, some users I've come
> > across actually were not quite sure why they are using it - they just
> > thought it looks safer).

Hmm...  FYI, I hope to be able to bring on line automated testing for
ext4 later this summer (there's a testing person at Google is has
signed up to work on setting this up as his 20% project).  The test
matrix that I have him included data=journal, so we will be getting
better testing in the near future.

At least historically, data=journalling was the *simpler* case, and
was the first thing supported by ext4.  (data=ordered required revoke
handling which didn't land for six months or so).  So I'm not really
that convinced that removing really buys us that much code
simplification.

That being siad, it is true that data=journalled isn't necessarily
faster.  For heavy disk-bound workloads, it can be slower.  So I can
imagine adding some documentation that warns people not to use
data=journal unless they really know what they are doing, but at least
personally, I'm a bit reluctant to dispense with a bug report like
this by saying, "oh, that feature should be deprecated".

Regards,

					- Ted

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-06-27 16:01                                     ` Ted Ts'o
@ 2011-06-27 20:27                                       ` Jan Kara
  2011-06-28  4:21                                       ` Moffett, Kyle D
  1 sibling, 0 replies; 35+ messages in thread
From: Jan Kara @ 2011-06-27 20:27 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Lukas Czerner, Jan Kara, Moffett, Kyle D, Sean Ryle, 615998,
	linux-ext4, Sachin Sant, Aneesh Kumar K.V

On Mon 27-06-11 12:01:40, Ted Tso wrote:
> On Mon, Jun 27, 2011 at 05:30:11PM +0200, Lukas Czerner wrote:
> > > I've found some. So although data=journal users are minority, there are
> > > some. That being said I agree with you we should do something about it
> > > - either state that we want to fully support data=journal - and then we
> > > should really do better with testing it or deprecate it and remove it
> > > (which would save us some complications in the code).
> > > 
> > > I would be slightly in favor of removing it (code simplicity, less options
> > > to configure for admin, less options to test for us, some users I've come
> > > across actually were not quite sure why they are using it - they just
> > > thought it looks safer).
> 
> Hmm...  FYI, I hope to be able to bring on line automated testing for
> ext4 later this summer (there's a testing person at Google is has
> signed up to work on setting this up as his 20% project).  The test
> matrix that I have him included data=journal, so we will be getting
> better testing in the near future.
  Good! 

> At least historically, data=journalling was the *simpler* case, and
> was the first thing supported by ext4.  (data=ordered required revoke
> handling which didn't land for six months or so).  So I'm not really
> that convinced that removing really buys us that much code
> simplification.
  It does buy us some reduction in number of variants (e.g. write_begin,
write_end, writepage), we also wouldn't have to care about journalled data
during invalidatepage() or releasepage() calls. 
 
> That being siad, it is true that data=journalled isn't necessarily
> faster.  For heavy disk-bound workloads, it can be slower.  So I can
> imagine adding some documentation that warns people not to use
> data=journal unless they really know what they are doing, but at least
> personally, I'm a bit reluctant to dispense with a bug report like
> this by saying, "oh, that feature should be deprecated".
  No, I didn't want to dispense the bug report - we should definitely fix
the bug. I just remarked that data=journal is currently not well tested and
thus using it in production has its problems.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-06-27 16:01                                     ` Ted Ts'o
  2011-06-27 20:27                                       ` Jan Kara
@ 2011-06-28  4:21                                       ` Moffett, Kyle D
  2011-06-28  9:36                                         ` Jan Kara
  1 sibling, 1 reply; 35+ messages in thread
From: Moffett, Kyle D @ 2011-06-28  4:21 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Lukas Czerner, Jan Kara, Sean Ryle, 615998, linux-ext4,
	Sachin Sant, Aneesh Kumar K.V

On Jun 27, 2011, at 12:01, Ted Ts'o wrote:
> On Mon, Jun 27, 2011 at 05:30:11PM +0200, Lukas Czerner wrote:
>>> I've found some. So although data=journal users are minority, there are
>>> some. That being said I agree with you we should do something about it
>>> - either state that we want to fully support data=journal - and then we
>>> should really do better with testing it or deprecate it and remove it
>>> (which would save us some complications in the code).
>>> 
>>> I would be slightly in favor of removing it (code simplicity, less options
>>> to configure for admin, less options to test for us, some users I've come
>>> across actually were not quite sure why they are using it - they just
>>> thought it looks safer).
> 
> Hmm...  FYI, I hope to be able to bring on line automated testing for
> ext4 later this summer (there's a testing person at Google is has
> signed up to work on setting this up as his 20% project).  The test
> matrix that I have him included data=journal, so we will be getting
> better testing in the near future.
> 
> At least historically, data=journalling was the *simpler* case, and
> was the first thing supported by ext4.  (data=ordered required revoke
> handling which didn't land for six months or so).  So I'm not really
> that convinced that removing really buys us that much code
> simplification.
> 
> That being siad, it is true that data=journalled isn't necessarily
> faster.  For heavy disk-bound workloads, it can be slower.  So I can
> imagine adding some documentation that warns people not to use
> data=journal unless they really know what they are doing, but at least
> personally, I'm a bit reluctant to dispense with a bug report like
> this by saying, "oh, that feature should be deprecated".

I suppose I should chime in here, since I'm the one who (potentially
incorrectly) thinks I should be using data=journalled mode.

My basic impression is that the use of "data=journalled" can help
reduce the risk (slightly) of serious corruption to some kinds of
databases when the application does not provide appropriate syncs
or journalling on its own (IE: such as text-based Wiki database files).

Please correct me if this is horribly horribly wrong:

no journal:
  Nothing is journalled
  + Very fast.
  + Works well for filesystems that are "mkfs"ed on every boot
  - Have to fsck after every reboot

data=writeback:
  Metadata is journalled, data (to allocated extents) may be written
  before or after the metadata is updated with a new file size.
  + Fast (not as fast as unjournalled)
  + No need to "fsck" after a hard power-down
  - A crash or power failure in the middle of a write could leave
    old data on disk at the end of a file.  If security labeling
    such as SELinux is enabled, this could "contaminate" a file with
    data from a deleted file that was at a higher sensitivity.
    Log files (including binary database replication logs) may be
    effectively corrupted as a result.

data=ordered:
  Data appended to a file will be written before the metadata
  extending the length of the file is written, and in certain cases
  the data will be written before file renames (partial ordering),
  but the data itself is unjournalled, and may be only partially
  complete for updates.
  + Does not write data to the media twice
  + A crash or power failure will not leave old uninitialized data
    in files.
  - Data writes to files may only partially complete in the event
    of a crash.  No problems for logfiles, or self-journalled
    application databases, but others may experience partial writes
    in the event of a crash and need recovery.

data=journalled:
  Data and metadata are both journalled, meaning that a given data
  write will either complete or it will never occur, although the
  precise ordering is not guaranteed.  This also implies all of the
  data<=>metadata guarantees of data=ordered.
  + Direct IO data writes are effectively "atomic", resulting in
    less likelihood of data loss for application databases which do
    not do their own journalling.  This means that a power failure
    or system crash will not result in a partially-complete write.
  - Cached writes are not atomic
  + For small cached file writes (of only a few filesystem pages)
    there is a good chance that kernel writeback will queue the
    entire write as a single I/O and it will be "protected" as a
    result.  This helps reduce the chance of serious damage to some
    text-based database files (such as those for some Wikis), but
    is obviously not a guarantee.
  - This writes all data to the block device twice (once to the FS
    journal and once to the data blocks).  This may be especially bad
    for write-limited Flash-backed devices.

Cheers,
Kyle Moffett

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-06-28  4:21                                       ` Moffett, Kyle D
@ 2011-06-28  9:36                                         ` Jan Kara
  2011-06-28 13:58                                           ` Ben Hutchings
                                                             ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Jan Kara @ 2011-06-28  9:36 UTC (permalink / raw)
  To: Moffett, Kyle D
  Cc: Ted Ts'o, Lukas Czerner, Jan Kara, Sean Ryle, 615998,
	linux-ext4, Sachin Sant, Aneesh Kumar K.V

On Mon 27-06-11 23:21:17, Moffett, Kyle D wrote:
> On Jun 27, 2011, at 12:01, Ted Ts'o wrote:
> > On Mon, Jun 27, 2011 at 05:30:11PM +0200, Lukas Czerner wrote:
> >>> I've found some. So although data=journal users are minority, there are
> >>> some. That being said I agree with you we should do something about it
> >>> - either state that we want to fully support data=journal - and then we
> >>> should really do better with testing it or deprecate it and remove it
> >>> (which would save us some complications in the code).
> >>> 
> >>> I would be slightly in favor of removing it (code simplicity, less options
> >>> to configure for admin, less options to test for us, some users I've come
> >>> across actually were not quite sure why they are using it - they just
> >>> thought it looks safer).
> > 
> > Hmm...  FYI, I hope to be able to bring on line automated testing for
> > ext4 later this summer (there's a testing person at Google is has
> > signed up to work on setting this up as his 20% project).  The test
> > matrix that I have him included data=journal, so we will be getting
> > better testing in the near future.
> > 
> > At least historically, data=journalling was the *simpler* case, and
> > was the first thing supported by ext4.  (data=ordered required revoke
> > handling which didn't land for six months or so).  So I'm not really
> > that convinced that removing really buys us that much code
> > simplification.
> > 
> > That being siad, it is true that data=journalled isn't necessarily
> > faster.  For heavy disk-bound workloads, it can be slower.  So I can
> > imagine adding some documentation that warns people not to use
> > data=journal unless they really know what they are doing, but at least
> > personally, I'm a bit reluctant to dispense with a bug report like
> > this by saying, "oh, that feature should be deprecated".
> 
> I suppose I should chime in here, since I'm the one who (potentially
> incorrectly) thinks I should be using data=journalled mode.
> 
> My basic impression is that the use of "data=journalled" can help
> reduce the risk (slightly) of serious corruption to some kinds of
> databases when the application does not provide appropriate syncs
> or journalling on its own (IE: such as text-based Wiki database files).
  It depends on the way such programs update the database files. But
generally yeas, data=journal provides a bit more guarantees than other
journaling modes - see below.

> Please correct me if this is horribly horribly wrong:
> 
> no journal:
>   Nothing is journalled
>   + Very fast.
>   + Works well for filesystems that are "mkfs"ed on every boot
>   - Have to fsck after every reboot
Fsck is needed only after a crash / hard powerdown. Otherwise completely
correct. Plus you always have a possibility of exposing uninitialized
(potentially sensitive) data after a fsck.

Actually, normal desktop might be quite happy with non-journaled filesystem
when fsck is fask enough.

> data=writeback:
>   Metadata is journalled, data (to allocated extents) may be written
>   before or after the metadata is updated with a new file size.
>   + Fast (not as fast as unjournalled)
>   + No need to "fsck" after a hard power-down
>   - A crash or power failure in the middle of a write could leave
>     old data on disk at the end of a file.  If security labeling
>     such as SELinux is enabled, this could "contaminate" a file with
>     data from a deleted file that was at a higher sensitivity.
>     Log files (including binary database replication logs) may be
>     effectively corrupted as a result.
Correct.

> data=ordered:
>   Data appended to a file will be written before the metadata
>   extending the length of the file is written, and in certain cases
>   the data will be written before file renames (partial ordering),
>   but the data itself is unjournalled, and may be only partially
>   complete for updates.
>   + Does not write data to the media twice
>   + A crash or power failure will not leave old uninitialized data
>     in files.
>   - Data writes to files may only partially complete in the event
>     of a crash.  No problems for logfiles, or self-journalled
>     application databases, but others may experience partial writes
>     in the event of a crash and need recovery.
Correct, one should also note that noone guarantees order in which data
hits the disk - i.e. when you do write(f,"a"); write(f,"b"); and these are
overwrites it may happen that "b" is written while "a" is not.

> data=journalled:
>   Data and metadata are both journalled, meaning that a given data
>   write will either complete or it will never occur, although the
>   precise ordering is not guaranteed.  This also implies all of the
>   data<=>metadata guarantees of data=ordered.
>   + Direct IO data writes are effectively "atomic", resulting in
>     less likelihood of data loss for application databases which do
>     not do their own journalling.  This means that a power failure
>     or system crash will not result in a partially-complete write.
Well, direct IO is atomic in data=journal the same way as in data=ordered.
It can happen only half of direct IO write is done when you hit power
button at the right moment - note this holds for overwrites.  Extending
writes or writes to holes are all-or-nothing for ext4 (again both in
data=journal and data=ordered mode).

>   - Cached writes are not atomic
>   + For small cached file writes (of only a few filesystem pages)
>     there is a good chance that kernel writeback will queue the
>     entire write as a single I/O and it will be "protected" as a
>     result.  This helps reduce the chance of serious damage to some
>     text-based database files (such as those for some Wikis), but
>     is obviously not a guarantee.
Page sized and page aligned writes are atomic (in both data=journal and
data=ordered modes). When a write spans multiple pages, there are chances
the writes will be merged in a single transaction but no guarantees as you
properly write.

>   - This writes all data to the block device twice (once to the FS
>     journal and once to the data blocks).  This may be especially bad
>     for write-limited Flash-backed devices.
Correct.

To sum up, the only additional guarantee data=journal offers against
data=ordered is a total ordering of all IO operations. That is, if you do a
sequence of data and metadata operations, then you are guaranteed that
after a crash you will see the filesystem in a state corresponding exactly
to your sequence terminated at some (arbitrary) point. Data writes are
disassembled into page-sized & page-aligned sequence of writes for purpose
of this model...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-06-28  9:36                                         ` Jan Kara
@ 2011-06-28 13:58                                           ` Ben Hutchings
  2011-06-28 14:16                                           ` Ted Ts'o
  2011-06-28 19:30                                           ` Moffett, Kyle D
  2 siblings, 0 replies; 35+ messages in thread
From: Ben Hutchings @ 2011-06-28 13:58 UTC (permalink / raw)
  To: Jan Kara, 615998
  Cc: Moffett, Kyle D, Ted Ts'o, Lukas Czerner, Sean Ryle,
	linux-ext4, Sachin Sant, Aneesh Kumar K.V

[-- Attachment #1: Type: text/plain, Size: 964 bytes --]

On Tue, 2011-06-28 at 11:36 +0200, Jan Kara wrote:
> On Mon 27-06-11 23:21:17, Moffett, Kyle D wrote:
[...]
> > Please correct me if this is horribly horribly wrong:
> > 
> > no journal:
> >   Nothing is journalled
> >   + Very fast.
> >   + Works well for filesystems that are "mkfs"ed on every boot
> >   - Have to fsck after every reboot
> Fsck is needed only after a crash / hard powerdown. Otherwise completely
> correct. Plus you always have a possibility of exposing uninitialized
> (potentially sensitive) data after a fsck.
> 
> Actually, normal desktop might be quite happy with non-journaled filesystem
> when fsck is fask enough.
[...]

With no journal, there is a fair risk that fsck cannot recover the
filesystem automatically (let alone the actual data).  And normal users
should never have to suffer questions from fsck.

Ben.

-- 
Ben Hutchings
In a hierarchy, every employee tends to rise to his level of incompetence.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-06-28  9:36                                         ` Jan Kara
  2011-06-28 13:58                                           ` Ben Hutchings
@ 2011-06-28 14:16                                           ` Ted Ts'o
  2011-06-28 19:36                                             ` Moffett, Kyle D
  2011-06-28 19:30                                           ` Moffett, Kyle D
  2 siblings, 1 reply; 35+ messages in thread
From: Ted Ts'o @ 2011-06-28 14:16 UTC (permalink / raw)
  To: Jan Kara
  Cc: Moffett, Kyle D, Lukas Czerner, Sean Ryle, 615998, linux-ext4,
	Sachin Sant, Aneesh Kumar K.V

> > My basic impression is that the use of "data=journalled" can help
> > reduce the risk (slightly) of serious corruption to some kinds of
> > databases when the application does not provide appropriate syncs
> > or journalling on its own (IE: such as text-based Wiki database files).

Yes, although if the application has index files that have to be
updated at the same time, there is no guarantee that the changes that
survive after a system failure (either a crash or a power fail),
unless the application is doing proper application-level journalling
or some other structured.

> To sum up, the only additional guarantee data=journal offers against
> data=ordered is a total ordering of all IO operations. That is, if you do a
> sequence of data and metadata operations, then you are guaranteed that
> after a crash you will see the filesystem in a state corresponding exactly
> to your sequence terminated at some (arbitrary) point. Data writes are
> disassembled into page-sized & page-aligned sequence of writes for purpose
> of this model...

data=journal can also make the fsync() operation faster, since it will
involver fewer seeks (although it will require a greater write
bandwidth).  Depending on the write bandwidth, you really need to
benchmark things to be sure, though.

						- Ted

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-06-28  9:36                                         ` Jan Kara
  2011-06-28 13:58                                           ` Ben Hutchings
  2011-06-28 14:16                                           ` Ted Ts'o
@ 2011-06-28 19:30                                           ` Moffett, Kyle D
  2011-06-28 22:57                                             ` Jan Kara
  2 siblings, 1 reply; 35+ messages in thread
From: Moffett, Kyle D @ 2011-06-28 19:30 UTC (permalink / raw)
  To: Jan Kara
  Cc: Ted Ts'o, Lukas Czerner, Sean Ryle, 615998, linux-ext4,
	Sachin Sant, Aneesh Kumar K.V

This is really helpful to me, but it's deviated a bit from solving
the original bug.  Based on the last log that I generated showing that
the error occurs in journal_stop(), what else should I be testing?

Further discussion of the exact behavior of data-journalling below:

On Jun 28, 2011, at 05:36, Jan Kara wrote:
> On Mon 27-06-11 23:21:17, Moffett, Kyle D wrote:
>> On Jun 27, 2011, at 12:01, Ted Ts'o wrote:
>>> That being siad, it is true that data=journalled isn't necessarily
>>> faster.  For heavy disk-bound workloads, it can be slower.  So I can
>>> imagine adding some documentation that warns people not to use
>>> data=journal unless they really know what they are doing, but at least
>>> personally, I'm a bit reluctant to dispense with a bug report like
>>> this by saying, "oh, that feature should be deprecated".
>> 
>> I suppose I should chime in here, since I'm the one who (potentially
>> incorrectly) thinks I should be using data=journalled mode.
>> 
>> Please correct me if this is horribly horribly wrong:
>> 
>> [...]
>> 
>> no journal:
>>  Nothing is journalled
>>  + Very fast.
>>  + Works well for filesystems that are "mkfs"ed on every boot
>>  - Have to fsck after every reboot
> 
> Fsck is needed only after a crash / hard powerdown. Otherwise completely
> correct. Plus you always have a possibility of exposing uninitialized
> (potentially sensitive) data after a fsck.

Yes, sorry, I dropped the word "hard" from "hard reboot" while editing... oops.

> Actually, normal desktop might be quite happy with non-journaled filesystem
> when fsck is fask enough.

No, because fsck can occasionally fail on a non-journalled filesystem, and
then the Joe user is sitting there staring at an unhappy console prompt with
a lot of cryptic error messages.

It's also very bad for any kind of embedded or server environment that might
need to come back up headless.


>> data=ordered:
>>  Data appended to a file will be written before the metadata
>>  extending the length of the file is written, and in certain cases
>>  the data will be written before file renames (partial ordering),
>>  but the data itself is unjournalled, and may be only partially
>>  complete for updates.
>>  + Does not write data to the media twice
>>  + A crash or power failure will not leave old uninitialized data
>>    in files.
>>  - Data writes to files may only partially complete in the event
>>    of a crash.  No problems for logfiles, or self-journalled
>>    application databases, but others may experience partial writes
>>    in the event of a crash and need recovery.
> 
> Correct, one should also note that noone guarantees order in which data
> hits the disk - i.e. when you do write(f,"a"); write(f,"b"); and these are
> overwrites it may happen that "b" is written while "a" is not.

Yes, right, I should have mentioned that too.  If a program wants
data-level ordering then it must issue an fsync() or fdatasync().

Just to confirm, an file write in data=ordered mode can be only
partially written during a hard shutdown:
  char a[512] = "aaaaaaaaaaaaaaa"...;
  char b[512] = "bbbbbbbbbbbbbbb"...;
  write(fd, a, 512);
  fsync(fd);
  write(fd, b, 512);  <== Hard poweroff here
  fsync(fd);

The data on disk could contain any mix of "b"s and "a"s, and possibly
even garbage data depending on the operation of the disk firmware,
correct?


>> data=journalled:
>>  Data and metadata are both journalled, meaning that a given data
>>  write will either complete or it will never occur, although the
>>  precise ordering is not guaranteed.  This also implies all of the
>>  data<=>metadata guarantees of data=ordered.
>>  + Direct IO data writes are effectively "atomic", resulting in
>>    less likelihood of data loss for application databases which do
>>    not do their own journalling.  This means that a power failure
>>    or system crash will not result in a partially-complete write.
> 
> Well, direct IO is atomic in data=journal the same way as in data=ordered.
> It can happen only half of direct IO write is done when you hit power
> button at the right moment - note this holds for overwrites.  Extending
> writes or writes to holes are all-or-nothing for ext4 (again both in
> data=journal and data=ordered mode).

My impression of journalled data was that a single-sector write would
be written checksummed into the journal and then later into the actual
filesystem, so it would either complete (IE: journal entry checksum is
OK and it gets replayed after a crash) or it would not (IE: journal
entry does not checksum and therefore the later write never happened
and the entry is not replayed).

Where is my mental model wrong?


>>  - Cached writes are not atomic
>>  + For small cached file writes (of only a few filesystem pages)
>>    there is a good chance that kernel writeback will queue the
>>    entire write as a single I/O and it will be "protected" as a
>>    result.  This helps reduce the chance of serious damage to some
>>    text-based database files (such as those for some Wikis), but
>>    is obviously not a guarantee.
> Page sized and page aligned writes are atomic (in both data=journal and
> data=ordered modes). When a write spans multiple pages, there are chances
> the writes will be merged in a single transaction but no guarantees as you
> properly write.

I don't know that our definitions of "atomic write" are quite the same...

I'm assuming that filesystem "atomic write" means that even if the disk
itself does not guarantee that a single write will either complete or it
will be discarded, then the filesystem will provide that guarantee.

Cheers,
Kyle Moffett

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-06-28 14:16                                           ` Ted Ts'o
@ 2011-06-28 19:36                                             ` Moffett, Kyle D
  0 siblings, 0 replies; 35+ messages in thread
From: Moffett, Kyle D @ 2011-06-28 19:36 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Jan Kara, Lukas Czerner, Sean Ryle, 615998, linux-ext4,
	Sachin Sant, Aneesh Kumar K.V

On Jun 28, 2011, at 10:16, Ted Ts'o wrote:
>>> My basic impression is that the use of "data=journalled" can help
>>> reduce the risk (slightly) of serious corruption to some kinds of
>>> databases when the application does not provide appropriate syncs
>>> or journalling on its own (IE: such as text-based Wiki database files).
> 
> Yes, although if the application has index files that have to be
> updated at the same time, there is no guarantee that the changes that
> survive after a system failure (either a crash or a power fail),
> unless the application is doing proper application-level journalling
> or some other structured.

Manually rebuilding application indexes and clearing out caches is fine;
with a badly written application I'd have to do that anyways.  I just want
to reduce the risk that I actually corrupt data, and it sounds like that's
what data-journalling will help with.

>> To sum up, the only additional guarantee data=journal offers against
>> data=ordered is a total ordering of all IO operations. That is, if you do a
>> sequence of data and metadata operations, then you are guaranteed that
>> after a crash you will see the filesystem in a state corresponding exactly
>> to your sequence terminated at some (arbitrary) point. Data writes are
>> disassembled into page-sized & page-aligned sequence of writes for purpose
>> of this model...
> 
> data=journal can also make the fsync() operation faster, since it will
> involver fewer seeks (although it will require a greater write
> bandwidth).  Depending on the write bandwidth, you really need to
> benchmark things to be sure, though.

Hm, so this would actually be very beneficial for a mail spool directory
then, because mail servers are supposed to fsync each email received in
order to guarantee that it will not be lost before it acknowledges receipt
to the SMTP client.

Thanks again!

Cheers,
Kyle Moffett

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-06-28 19:30                                           ` Moffett, Kyle D
@ 2011-06-28 22:57                                             ` Jan Kara
  2011-06-29  4:22                                               ` Moffett, Kyle D
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Kara @ 2011-06-28 22:57 UTC (permalink / raw)
  To: Moffett, Kyle D
  Cc: Jan Kara, Ted Ts'o, Lukas Czerner, Sean Ryle, 615998,
	linux-ext4, Sachin Sant, Aneesh Kumar K.V

On Tue 28-06-11 14:30:55, Moffett, Kyle D wrote:
> This is really helpful to me, but it's deviated a bit from solving
> the original bug.  Based on the last log that I generated showing that
> the error occurs in journal_stop(), what else should I be testing?
  I was looking at it for a while but so far I have no idea...

> Further discussion of the exact behavior of data-journalling below:
> On Jun 28, 2011, at 05:36, Jan Kara wrote:
> > On Mon 27-06-11 23:21:17, Moffett, Kyle D wrote:
>
> > Actually, normal desktop might be quite happy with non-journaled filesystem
> > when fsck is fask enough.
> 
> No, because fsck can occasionally fail on a non-journalled filesystem, and
> then the Joe user is sitting there staring at an unhappy console prompt with
> a lot of cryptic error messages.
> 
> It's also very bad for any kind of embedded or server environment that might
> need to come back up headless.
  OK, I agree.

> >> data=ordered:
> >>  Data appended to a file will be written before the metadata
> >>  extending the length of the file is written, and in certain cases
> >>  the data will be written before file renames (partial ordering),
> >>  but the data itself is unjournalled, and may be only partially
> >>  complete for updates.
> >>  + Does not write data to the media twice
> >>  + A crash or power failure will not leave old uninitialized data
> >>    in files.
> >>  - Data writes to files may only partially complete in the event
> >>    of a crash.  No problems for logfiles, or self-journalled
> >>    application databases, but others may experience partial writes
> >>    in the event of a crash and need recovery.
> > 
> > Correct, one should also note that noone guarantees order in which data
> > hits the disk - i.e. when you do write(f,"a"); write(f,"b"); and these are
> > overwrites it may happen that "b" is written while "a" is not.
> 
> Yes, right, I should have mentioned that too.  If a program wants
> data-level ordering then it must issue an fsync() or fdatasync().
> 
> Just to confirm, an file write in data=ordered mode can be only
> partially written during a hard shutdown:
>   char a[512] = "aaaaaaaaaaaaaaa"...;
>   char b[512] = "bbbbbbbbbbbbbbb"...;
>   write(fd, a, 512);
>   fsync(fd);
>   write(fd, b, 512);  <== Hard poweroff here
>   fsync(fd);
> 
> The data on disk could contain any mix of "b"s and "a"s, and possibly
> even garbage data depending on the operation of the disk firmware,
> correct?
  Correct. 

> >> data=journalled:
> >>  Data and metadata are both journalled, meaning that a given data
> >>  write will either complete or it will never occur, although the
> >>  precise ordering is not guaranteed.  This also implies all of the
> >>  data<=>metadata guarantees of data=ordered.
> >>  + Direct IO data writes are effectively "atomic", resulting in
> >>    less likelihood of data loss for application databases which do
> >>    not do their own journalling.  This means that a power failure
> >>    or system crash will not result in a partially-complete write.
> > 
> > Well, direct IO is atomic in data=journal the same way as in data=ordered.
> > It can happen only half of direct IO write is done when you hit power
> > button at the right moment - note this holds for overwrites.  Extending
> > writes or writes to holes are all-or-nothing for ext4 (again both in
> > data=journal and data=ordered mode).
> 
> My impression of journalled data was that a single-sector write would
> be written checksummed into the journal and then later into the actual
> filesystem, so it would either complete (IE: journal entry checksum is
> OK and it gets replayed after a crash) or it would not (IE: journal
> entry does not checksum and therefore the later write never happened
> and the entry is not replayed).
  Umm, right. This is true. That's another guarantee of data=journal mode I
didn't think of.

> >>  - Cached writes are not atomic
> >>  + For small cached file writes (of only a few filesystem pages)
> >>    there is a good chance that kernel writeback will queue the
> >>    entire write as a single I/O and it will be "protected" as a
> >>    result.  This helps reduce the chance of serious damage to some
> >>    text-based database files (such as those for some Wikis), but
> >>    is obviously not a guarantee.
> > Page sized and page aligned writes are atomic (in both data=journal and
> > data=ordered modes). When a write spans multiple pages, there are chances
> > the writes will be merged in a single transaction but no guarantees as you
> > properly write.
> 
> I don't know that our definitions of "atomic write" are quite the same...
> 
> I'm assuming that filesystem "atomic write" means that even if the disk
> itself does not guarantee that a single write will either complete or it
> will be discarded, then the filesystem will provide that guarantee.
  OK. There are different levels of "disk does not guarantee atomic writes"
though. E.g. flash disks don't guarantee atomic writes but even more they
actually corrupt unrelated blocks on power failure so any filesystem is
actually screwed on power failure. For standard rotating drives I'd rely on
the drive being able to write a full fs block (4k) although I agree noone
really guarantees this.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-06-28 22:57                                             ` Jan Kara
@ 2011-06-29  4:22                                               ` Moffett, Kyle D
  0 siblings, 0 replies; 35+ messages in thread
From: Moffett, Kyle D @ 2011-06-29  4:22 UTC (permalink / raw)
  To: Jan Kara
  Cc: Ted Ts'o, Lukas Czerner, Sean Ryle, 615998, linux-ext4,
	Sachin Sant, Aneesh Kumar K.V

On Jun 28, 2011, at 18:57, Jan Kara wrote:
> On Tue 28-06-11 14:30:55, Moffett, Kyle D wrote:
>> On Jun 28, 2011, at 05:36, Jan Kara wrote:
>>> Well, direct IO is atomic in data=journal the same way as in data=ordered.
>>> It can happen only half of direct IO write is done when you hit power
>>> button at the right moment - note this holds for overwrites.  Extending
>>> writes or writes to holes are all-or-nothing for ext4 (again both in
>>> data=journal and data=ordered mode).
>> 
>> My impression of journalled data was that a single-sector write would
>> be written checksummed into the journal and then later into the actual
>> filesystem, so it would either complete (IE: journal entry checksum is
>> OK and it gets replayed after a crash) or it would not (IE: journal
>> entry does not checksum and therefore the later write never happened
>> and the entry is not replayed).
> 
> Umm, right. This is true. That's another guarantee of data=journal mode I
> didn't think of.

Ok, that's what I had hoped was the case.  That doesn't help much for
overwrites of variable-length data (EG: text files), but it does help
protect stuff like MySQL MyISAM (which does not do journalling).  It's
probably unnecessary for MySQL InnoDB, which *does* have its own journal.


>>> Page sized and page aligned writes are atomic (in both data=journal and
>>> data=ordered modes). When a write spans multiple pages, there are chances
>>> the writes will be merged in a single transaction but no guarantees as you
>>> properly write.
>> 
>> I don't know that our definitions of "atomic write" are quite the same...
>> 
>> I'm assuming that filesystem "atomic write" means that even if the disk
>> itself does not guarantee that a single write will either complete or it
>> will be discarded, then the filesystem will provide that guarantee.
> 
> OK. There are different levels of "disk does not guarantee atomic writes"
> though. E.g. flash disks don't guarantee atomic writes but even more they
> actually corrupt unrelated blocks on power failure so any filesystem is
> actually screwed on power failure. For standard rotating drives I'd rely on
> the drive being able to write a full fs block (4k) although I agree noone
> really guarantees this.

Well, I've seen a study somewhere that some spinning media actually *can*
tend to corrupt a nearby sector or two during a power failure, depending
on exactly what the input voltage does.  The better ones certainly have
a voltage monitor that automatically cuts power to the heads when it goes
below a critical level.

And the better Flash-based media actually *do* provide atomic write
guarantees due to the wear-levelling and flash-remapping engine.  In
order to protect their mapping table metadata and avoid very large
write amplification they will use a system similar to a log-structured
filesystem to accumulate a bunch of small random writes into one larger
write.  Since they're always writing into empty space and then doing an
atomic metadata update, their writes are always effectively atomic,
even for data.

My informal testing of the Intel X-18M drives seems to indicate that
they work that way.

Cheers,
Kyle Moffett

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-06-24 20:51                               ` Kyle Moffett
@ 2011-08-26 21:03                                 ` Moffett, Kyle D
  2011-08-30 22:12                                   ` Jan Kara
  0 siblings, 1 reply; 35+ messages in thread
From: Moffett, Kyle D @ 2011-08-26 21:03 UTC (permalink / raw)
  To: Moffett, Kyle D
  Cc: Jan Kara, Sean Ryle, Ted Ts'o, 615998, linux-ext4,
	Sachin Sant, Aneesh Kumar K.V

Ping?

Any more ideas for debugging this issue?

I can still trigger it on my VM snapshot very easily, so if you have anything
you think I should test I would be very happy to give it a shot.

On Jun 24, 2011, at 16:51, Kyle Moffett wrote:
> On Jun 24, 2011, at 16:02, Jan Kara wrote:
>> On Fri 24-06-11 11:03:52, Moffett, Kyle D wrote:
>>> On Jun 24, 2011, at 09:46, Jan Kara wrote:
>>>> On Thu 23-06-11 16:19:08, Moffett, Kyle D wrote:
>>>>> Besides which, line 534 in the Debian 2.6.32 kernel I am using is this
>>>>> one:
>>>>>
>>>>> J_ASSERT(commit_transaction->t_nr_buffers <=
>>>>>         commit_transaction->t_outstanding_credits);
>>>>
>>>> The trouble is that the problem is likely in some journal list shuffling
>>>> code because if just some operation wrongly estimated the number of needed
>>>> buffers, we'd fail the assertion in jbd2_journal_dirty_metadata():
>>>> J_ASSERT_JH(jh, handle->h_buffer_credits > 0);
>>>
>>> Hmm, ok...  I'm also going to turn that failing J_ASSERT() into a WARN_ON()
>>> just to see how much further it gets.  I have an easy script to recreate this
>>> data volume even if it gets totally hosed anyways, so...
>>
>> OK, we'll see what happens.
>
> Ok, status update here:
>
> I applied a modified version of your patch that prints out the values of both
> t_outstanding_credits and t_nr_buffers when the assertion triggers.  I replaced
> the J_ASSERT() that was failing with the exact same WARN_ON() trigger too.
>
> The end result is that postfix successfully finished delivering all the emails.
> Afterwards I unmounted both filesystems and ran "fsck -fy" on them, it reported
> no errors at all.
>
> Looking through the log, the filesystem with the issues is the 32MB one mounted
> on /var/lib/postfix:
>  total 61
>  drwxr-x---  3 postfix postfix  1024 Jun 16 21:02 .
>  drwxr-xr-x 46 root    root     4096 Jun 20 17:19 ..
>  d---------  2 root    root    12288 Jun 16 18:35 lost+found
>  -rw-------  1 postfix postfix    33 Jun 24 16:34 master.lock
>  -rw-------  1 postfix postfix  1024 Jun 24 16:44 prng_exch
>  -rw-------  1 postfix postfix  2048 Jun 24 16:34 smtpd_scache.db
>  -rw-------  1 postfix postfix 41984 Jun 24 16:36 smtp_scache.db
>
> In particular, it's the tlsmgr program accessing the smtp_scache file when it
> dies.
>
> Full log below.
>
> Cheers,
> Kyle Moffett
>
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385234] transaction->t_outstanding_credits = 8
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385247] transaction->t_nr_buffers = 9
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385251] ------------[ cut here ]------------
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385278] WARNING: at /tmp/kdm-deb-kernel/linux-2.6-2.6.32/debian/build/source_amd64_xen/fs/jbd2/transaction.c:1329 jbd2_journal_stop+0x189/0x25d [jbd2]()
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385287] Modules linked in: ip6table_filter ip6_tables act_police cls_flow cls_fw cls_u32 sch_htb sch_hfsc sch_ingress sch_sfq xt_time xt_connlimit xt_realm iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_tproxy_core xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG 
 nfnetlink_log xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY ipt_LOG xt_tcpudp xt_state iptable_nat nf_nat nf_conntrac
> Jun 24 16:36:05 i-38020f57 kernel: k_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ext3 jbd loop snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr evdev ext4 mbcache jbd2 crc16 dm_mod xen_netfront xen_blkfront
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385440] Pid: 3817, comm: tlsmgr Not tainted 2.6.32-5-xen-amd64 #1
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385445] Call Trace:
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385458]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385467]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385477]  [<ffffffff8104ef00>] ? warn_slowpath_common+0x77/0xa3
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385486]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385505]  [<ffffffffa0074c8e>] ? __ext4_journal_stop+0x63/0x69 [ext4]
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385517]  [<ffffffffa0060949>] ? ext4_journalled_write_end+0x160/0x19a [ext4]
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385633]  [<ffffffffa00857c6>] ? ext4_xattr_get+0x1fa/0x27c [ext4]
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385643]  [<ffffffff810b5e91>] ? generic_file_buffered_write+0x18d/0x278
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385650]  [<ffffffff810b632d>] ? __generic_file_aio_write+0x25f/0x293
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385659]  [<ffffffffa0032d42>] ? jbd2_journal_stop+0x24a/0x25d [jbd2]
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385669]  [<ffffffff8100eccf>] ? xen_restore_fl_direct_end+0x0/0x1
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385674]  [<ffffffff810b63ba>] ? generic_file_aio_write+0x59/0x9f
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385682]  [<ffffffff810efa46>] ? do_sync_write+0xce/0x113
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385694]  [<ffffffffa0074c8e>] ? __ext4_journal_stop+0x63/0x69 [ext4]
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385701]  [<ffffffff81065f06>] ? autoremove_wake_function+0x0/0x2e
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385709]  [<ffffffff81102492>] ? notify_change+0x2b3/0x2c5
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385715]  [<ffffffff810f0398>] ? vfs_write+0xa9/0x102
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385720]  [<ffffffff810f0448>] ? sys_pwrite64+0x57/0x77
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385726]  [<ffffffff810ef04d>] ? sys_ftruncate+0x112/0x11d
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385732]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385737] ---[ end trace 2c615eb111c993ca ]---
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385782] transaction->t_outstanding_credits = 8
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385788] transaction->t_nr_buffers = 9
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385793] ------------[ cut here ]------------
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385804] WARNING: at /tmp/kdm-deb-kernel/linux-2.6-2.6.32/debian/build/source_amd64_xen/fs/jbd2/transaction.c:1329 jbd2_journal_stop+0x189/0x25d [jbd2]()
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.385815] Modules linked in: ip6table_filter ip6_tables act_police cls_flow cls_fw cls_u32 sch_htb sch_hfsc sch_ingress sch_sfq xt_time xt_connlimit xt_realm iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_tproxy_core xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG 
 nfnetlink_log xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY ipt_LOG xt_tcpudp xt_state iptable_nat nf_nat nf_conntrac
> Jun 24 16:36:05 i-38020f57 kernel: k_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ext3 jbd loop snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr evdev ext4 mbcache jbd2 crc16 dm_mod xen_netfront xen_blkfront
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386041] Pid: 3817, comm: tlsmgr Tainted: G        W  2.6.32-5-xen-amd64 #1
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386046] Call Trace:
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386055]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386064]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386071]  [<ffffffff8104ef00>] ? warn_slowpath_common+0x77/0xa3
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386077]  [<ffffffff8100eccf>] ? xen_restore_fl_direct_end+0x0/0x1
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386087]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386098]  [<ffffffffa0033cd6>] ? jbd2_journal_start+0x96/0xc6 [jbd2]
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386113]  [<ffffffffa0070df6>] ? ext4_force_commit+0xa4/0xb3 [ext4]
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386126]  [<ffffffffa0058a37>] ? ext4_sync_file+0x8f/0x25c [ext4]
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386138]  [<ffffffff8110ccd4>] ? vfs_fsync_range+0x73/0x9e
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386146]  [<ffffffff8110cd7e>] ? do_fsync+0x28/0x39
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386153]  [<ffffffff8110cd9d>] ? sys_fdatasync+0xe/0x13
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386163]  [<ffffffff8130d89a>] ? error_exit+0x2a/0x60
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386170]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386178]  [<ffffffff810092eb>] ? hypercall_page+0x2eb/0x1001
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386185] ---[ end trace 2c615eb111c993cb ]---
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386243] transaction->t_outstanding_credits = 8
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386249] transaction->t_nr_buffers = 9
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386253] ------------[ cut here ]------------
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386263] WARNING: at /tmp/kdm-deb-kernel/linux-2.6-2.6.32/debian/build/source_amd64_xen/fs/jbd2/commit.c:538 jbd2_journal_commit_transaction+0x574/0x118b [jbd2]()
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386311] Modules linked in: ip6table_filter ip6_tables act_police cls_flow cls_fw cls_u32 sch_htb sch_hfsc sch_ingress sch_sfq xt_time xt_connlimit xt_realm iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_tproxy_core xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG 
 nfnetlink_log xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY ipt_LOG xt_tcpudp xt_state iptable_nat nf_nat nf_conntrac
> Jun 24 16:36:05 i-38020f57 kernel: k_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ext3 jbd loop snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr evdev ext4 mbcache jbd2 crc16 dm_mod xen_netfront xen_blkfront
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386452] Pid: 992, comm: jbd2/dm-23-8 Tainted: G        W  2.6.32-5-xen-amd64 #1
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386457] Call Trace:
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386466]  [<ffffffffa0034471>] ? jbd2_journal_commit_transaction+0x574/0x118b [jbd2]
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386475]  [<ffffffffa0034471>] ? jbd2_journal_commit_transaction+0x574/0x118b [jbd2]
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386483]  [<ffffffff8104ef00>] ? warn_slowpath_common+0x77/0xa3
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386492]  [<ffffffffa0034471>] ? jbd2_journal_commit_transaction+0x574/0x118b [jbd2]
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386500]  [<ffffffff8130bf48>] ? thread_return+0x79/0xe0
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386506]  [<ffffffff8100e629>] ? xen_force_evtchn_callback+0x9/0xa
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386512]  [<ffffffff8100ece2>] ? check_events+0x12/0x20
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386519]  [<ffffffff8130d2e2>] ? _spin_unlock_irqrestore+0xd/0xe
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386560]  [<ffffffff8105b8c8>] ? try_to_del_timer_sync+0x63/0x6c
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386572]  [<ffffffffa003a98b>] ? kjournald2+0xbe/0x206 [jbd2]
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386578]  [<ffffffff81065f06>] ? autoremove_wake_function+0x0/0x2e
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386587]  [<ffffffffa003a8cd>] ? kjournald2+0x0/0x206 [jbd2]
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386592]  [<ffffffff81065c39>] ? kthread+0x79/0x81
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386599]  [<ffffffff81012baa>] ? child_rip+0xa/0x20
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386604]  [<ffffffff81011d61>] ? int_ret_from_sys_call+0x7/0x1b
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386610]  [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386616]  [<ffffffff81012ba0>] ? child_rip+0x0/0x20
> Jun 24 16:36:05 i-38020f57 kernel: [5369326.386620] ---[ end trace 2c615eb111c993cc ]---
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.124954] transaction->t_outstanding_credits = 9
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.124967] transaction->t_nr_buffers = 10
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.124971] ------------[ cut here ]------------
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.124998] WARNING: at /tmp/kdm-deb-kernel/linux-2.6-2.6.32/debian/build/source_amd64_xen/fs/jbd2/transaction.c:1329 jbd2_journal_stop+0x189/0x25d [jbd2]()
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.125007] Modules linked in: ip6table_filter ip6_tables act_police cls_flow cls_fw cls_u32 sch_htb sch_hfsc sch_ingress sch_sfq xt_time xt_connlimit xt_realm iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_tproxy_core xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG 
 nfnetlink_log xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY ipt_LOG xt_tcpudp xt_state iptable_nat nf_nat nf_conntrac
> Jun 24 16:36:07 i-38020f57 kernel: k_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ext3 jbd loop snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr evdev ext4 mbcache jbd2 crc16 dm_mod xen_netfront xen_blkfront
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.125158] Pid: 3817, comm: tlsmgr Tainted: G        W  2.6.32-5-xen-amd64 #1
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.125164] Call Trace:
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.125174]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.125183]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.125194]  [<ffffffff8104ef00>] ? warn_slowpath_common+0x77/0xa3
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.125203]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.125222]  [<ffffffffa0074c8e>] ? __ext4_journal_stop+0x63/0x69 [ext4]
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.125338]  [<ffffffffa0060949>] ? ext4_journalled_write_end+0x160/0x19a [ext4]
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.125351]  [<ffffffffa00857c6>] ? ext4_xattr_get+0x1fa/0x27c [ext4]
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.125361]  [<ffffffff810b5e91>] ? generic_file_buffered_write+0x18d/0x278
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.125368]  [<ffffffff810b632d>] ? __generic_file_aio_write+0x25f/0x293
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.125377]  [<ffffffffa0032d42>] ? jbd2_journal_stop+0x24a/0x25d [jbd2]
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.125385]  [<ffffffff8100eccf>] ? xen_restore_fl_direct_end+0x0/0x1
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.125391]  [<ffffffff810b63ba>] ? generic_file_aio_write+0x59/0x9f
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.125398]  [<ffffffff810efa46>] ? do_sync_write+0xce/0x113
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.125410]  [<ffffffffa0074c8e>] ? __ext4_journal_stop+0x63/0x69 [ext4]
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.125418]  [<ffffffff81065f06>] ? autoremove_wake_function+0x0/0x2e
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.125426]  [<ffffffff81102492>] ? notify_change+0x2b3/0x2c5
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.125432]  [<ffffffff810f0398>] ? vfs_write+0xa9/0x102
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.125437]  [<ffffffff810f0448>] ? sys_pwrite64+0x57/0x77
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.125443]  [<ffffffff810ef04d>] ? sys_ftruncate+0x112/0x11d
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.125449]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.125454] ---[ end trace 2c615eb111c993cd ]---
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.180777] transaction->t_outstanding_credits = 9
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.180787] transaction->t_nr_buffers = 10
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.180792] ------------[ cut here ]------------
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.180810] WARNING: at /tmp/kdm-deb-kernel/linux-2.6-2.6.32/debian/build/source_amd64_xen/fs/jbd2/transaction.c:1329 jbd2_journal_stop+0x189/0x25d [jbd2]()
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.180819] Modules linked in: ip6table_filter ip6_tables act_police cls_flow cls_fw cls_u32 sch_htb sch_hfsc sch_ingress sch_sfq xt_time xt_connlimit xt_realm iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_tproxy_core xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG 
 nfnetlink_log xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY ipt_LOG xt_tcpudp xt_state iptable_nat nf_nat nf_conntrac
> Jun 24 16:36:07 i-38020f57 kernel: k_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ext3 jbd loop snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr evdev ext4 mbcache jbd2 crc16 dm_mod xen_netfront xen_blkfront
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.180969] Pid: 3817, comm: tlsmgr Tainted: G        W  2.6.32-5-xen-amd64 #1
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.180975] Call Trace:
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.180983]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.180992]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.180999]  [<ffffffff8104ef00>] ? warn_slowpath_common+0x77/0xa3
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181006]  [<ffffffff8100eccf>] ? xen_restore_fl_direct_end+0x0/0x1
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181015]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181024]  [<ffffffffa0033cd6>] ? jbd2_journal_start+0x96/0xc6 [jbd2]
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181083]  [<ffffffffa0070df6>] ? ext4_force_commit+0xa4/0xb3 [ext4]
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181094]  [<ffffffffa0058a37>] ? ext4_sync_file+0x8f/0x25c [ext4]
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181104]  [<ffffffff8110ccd4>] ? vfs_fsync_range+0x73/0x9e
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181110]  [<ffffffff8110cd7e>] ? do_fsync+0x28/0x39
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181115]  [<ffffffff8110cd9d>] ? sys_fdatasync+0xe/0x13
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181125]  [<ffffffff8130d89a>] ? error_exit+0x2a/0x60
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181131]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181137]  [<ffffffff810092eb>] ? hypercall_page+0x2eb/0x1001
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181142] ---[ end trace 2c615eb111c993ce ]---
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181160] transaction->t_outstanding_credits = 9
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181164] transaction->t_nr_buffers = 10
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181168] ------------[ cut here ]------------
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181176] WARNING: at /tmp/kdm-deb-kernel/linux-2.6-2.6.32/debian/build/source_amd64_xen/fs/jbd2/commit.c:538 jbd2_journal_commit_transaction+0x574/0x118b [jbd2]()
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181185] Modules linked in: ip6table_filter ip6_tables act_police cls_flow cls_fw cls_u32 sch_htb sch_hfsc sch_ingress sch_sfq xt_time xt_connlimit xt_realm iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_tproxy_core xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG 
 nfnetlink_log xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY ipt_LOG xt_tcpudp xt_state iptable_nat nf_nat nf_conntrac
> Jun 24 16:36:07 i-38020f57 kernel: k_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ext3 jbd loop snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr evdev ext4 mbcache jbd2 crc16 dm_mod xen_netfront xen_blkfront
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181368] Pid: 992, comm: jbd2/dm-23-8 Tainted: G        W  2.6.32-5-xen-amd64 #1
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181374] Call Trace:
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181383]  [<ffffffffa0034471>] ? jbd2_journal_commit_transaction+0x574/0x118b [jbd2]
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181393]  [<ffffffffa0034471>] ? jbd2_journal_commit_transaction+0x574/0x118b [jbd2]
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181401]  [<ffffffff8104ef00>] ? warn_slowpath_common+0x77/0xa3
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181409]  [<ffffffffa0034471>] ? jbd2_journal_commit_transaction+0x574/0x118b [jbd2]
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181418]  [<ffffffff8130bf48>] ? thread_return+0x79/0xe0
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181424]  [<ffffffff8100e629>] ? xen_force_evtchn_callback+0x9/0xa
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181430]  [<ffffffff8100ece2>] ? check_events+0x12/0x20
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181436]  [<ffffffff8130d2e2>] ? _spin_unlock_irqrestore+0xd/0xe
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181445]  [<ffffffff8105b8c8>] ? try_to_del_timer_sync+0x63/0x6c
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181455]  [<ffffffffa003a98b>] ? kjournald2+0xbe/0x206 [jbd2]
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181461]  [<ffffffff81065f06>] ? autoremove_wake_function+0x0/0x2e
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181470]  [<ffffffffa003a8cd>] ? kjournald2+0x0/0x206 [jbd2]
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181475]  [<ffffffff81065c39>] ? kthread+0x79/0x81
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181482]  [<ffffffff81012baa>] ? child_rip+0xa/0x20
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181487]  [<ffffffff81011d61>] ? int_ret_from_sys_call+0x7/0x1b
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181493]  [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181499]  [<ffffffff81012ba0>] ? child_rip+0x0/0x20
> Jun 24 16:36:07 i-38020f57 kernel: [5369328.181503] ---[ end trace 2c615eb111c993cf ]---


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-08-26 21:03                                 ` Moffett, Kyle D
@ 2011-08-30 22:12                                   ` Jan Kara
  2011-08-31  0:26                                     ` Moffett, Kyle D
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Kara @ 2011-08-30 22:12 UTC (permalink / raw)
  To: Moffett, Kyle D
  Cc: Jan Kara, Sean Ryle, Ted Ts'o, 615998, linux-ext4,
	Sachin Sant, Aneesh Kumar K.V

  Hi,

On Fri 26-08-11 16:03:32, Moffett, Kyle D wrote:
> Ping?
> 
> Any more ideas for debugging this issue?
  Sorry for not getting to you earlier.

> I can still trigger it on my VM snapshot very easily, so if you have anything
> you think I should test I would be very happy to give it a shot.
  OK, so in the meantime I found a bug in data=journal code which could be
related to your problem. It is fixed by commit
2d859db3e4a82a365572592d57624a5f996ed0ec which is in 3.1-rc1. Have you
tried that or newer kernel as well?

If the problem still is not fixed, I can provide some debugging patch to
you. We spoke with Josef Bacik how errors like yours could happen so I have
some places to watch...

								Honza

> On Jun 24, 2011, at 16:51, Kyle Moffett wrote:
> > On Jun 24, 2011, at 16:02, Jan Kara wrote:
> >> On Fri 24-06-11 11:03:52, Moffett, Kyle D wrote:
> >>> On Jun 24, 2011, at 09:46, Jan Kara wrote:
> >>>> On Thu 23-06-11 16:19:08, Moffett, Kyle D wrote:
> >>>>> Besides which, line 534 in the Debian 2.6.32 kernel I am using is this
> >>>>> one:
> >>>>>
> >>>>> J_ASSERT(commit_transaction->t_nr_buffers <=
> >>>>>         commit_transaction->t_outstanding_credits);
> >>>>
> >>>> The trouble is that the problem is likely in some journal list shuffling
> >>>> code because if just some operation wrongly estimated the number of needed
> >>>> buffers, we'd fail the assertion in jbd2_journal_dirty_metadata():
> >>>> J_ASSERT_JH(jh, handle->h_buffer_credits > 0);
> >>>
> >>> Hmm, ok...  I'm also going to turn that failing J_ASSERT() into a WARN_ON()
> >>> just to see how much further it gets.  I have an easy script to recreate this
> >>> data volume even if it gets totally hosed anyways, so...
> >>
> >> OK, we'll see what happens.
> >
> > Ok, status update here:
> >
> > I applied a modified version of your patch that prints out the values of both
> > t_outstanding_credits and t_nr_buffers when the assertion triggers.  I replaced
> > the J_ASSERT() that was failing with the exact same WARN_ON() trigger too.
> >
> > The end result is that postfix successfully finished delivering all the emails.
> > Afterwards I unmounted both filesystems and ran "fsck -fy" on them, it reported
> > no errors at all.
> >
> > Looking through the log, the filesystem with the issues is the 32MB one mounted
> > on /var/lib/postfix:
> >  total 61
> >  drwxr-x---  3 postfix postfix  1024 Jun 16 21:02 .
> >  drwxr-xr-x 46 root    root     4096 Jun 20 17:19 ..
> >  d---------  2 root    root    12288 Jun 16 18:35 lost+found
> >  -rw-------  1 postfix postfix    33 Jun 24 16:34 master.lock
> >  -rw-------  1 postfix postfix  1024 Jun 24 16:44 prng_exch
> >  -rw-------  1 postfix postfix  2048 Jun 24 16:34 smtpd_scache.db
> >  -rw-------  1 postfix postfix 41984 Jun 24 16:36 smtp_scache.db
> >
> > In particular, it's the tlsmgr program accessing the smtp_scache file when it
> > dies.
> >
> > Full log below.
> >
> > Cheers,
> > Kyle Moffett
> >
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385234] transaction->t_outstanding_credits = 8
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385247] transaction->t_nr_buffers = 9
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385251] ------------[ cut here ]------------
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385278] WARNING: at /tmp/kdm-deb-kernel/linux-2.6-2.6.32/debian/build/source_amd64_xen/fs/jbd2/transaction.c:1329 jbd2_journal_stop+0x189/0x25d [jbd2]()
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385287] Modules linked in: ip6table_filter ip6_tables act_police cls_flow cls_fw cls_u32 sch_htb sch_hfsc sch_ingress sch_sfq xt_time xt_connlimit xt_realm iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_tproxy_core xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLO
 G nfnetlink_log xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY ipt_LOG xt_tcpudp xt_state iptable_nat nf_nat n
> f_conntrac
> > Jun 24 16:36:05 i-38020f57 kernel: k_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ext3 jbd loop snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr evdev ext4 mbcache jbd2 crc16 dm_mod xen_netfront xen_blkfront
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385440] Pid: 3817, comm: tlsmgr Not tainted 2.6.32-5-xen-amd64 #1
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385445] Call Trace:
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385458]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385467]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385477]  [<ffffffff8104ef00>] ? warn_slowpath_common+0x77/0xa3
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385486]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385505]  [<ffffffffa0074c8e>] ? __ext4_journal_stop+0x63/0x69 [ext4]
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385517]  [<ffffffffa0060949>] ? ext4_journalled_write_end+0x160/0x19a [ext4]
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385633]  [<ffffffffa00857c6>] ? ext4_xattr_get+0x1fa/0x27c [ext4]
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385643]  [<ffffffff810b5e91>] ? generic_file_buffered_write+0x18d/0x278
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385650]  [<ffffffff810b632d>] ? __generic_file_aio_write+0x25f/0x293
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385659]  [<ffffffffa0032d42>] ? jbd2_journal_stop+0x24a/0x25d [jbd2]
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385669]  [<ffffffff8100eccf>] ? xen_restore_fl_direct_end+0x0/0x1
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385674]  [<ffffffff810b63ba>] ? generic_file_aio_write+0x59/0x9f
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385682]  [<ffffffff810efa46>] ? do_sync_write+0xce/0x113
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385694]  [<ffffffffa0074c8e>] ? __ext4_journal_stop+0x63/0x69 [ext4]
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385701]  [<ffffffff81065f06>] ? autoremove_wake_function+0x0/0x2e
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385709]  [<ffffffff81102492>] ? notify_change+0x2b3/0x2c5
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385715]  [<ffffffff810f0398>] ? vfs_write+0xa9/0x102
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385720]  [<ffffffff810f0448>] ? sys_pwrite64+0x57/0x77
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385726]  [<ffffffff810ef04d>] ? sys_ftruncate+0x112/0x11d
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385732]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385737] ---[ end trace 2c615eb111c993ca ]---
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385782] transaction->t_outstanding_credits = 8
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385788] transaction->t_nr_buffers = 9
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385793] ------------[ cut here ]------------
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385804] WARNING: at /tmp/kdm-deb-kernel/linux-2.6-2.6.32/debian/build/source_amd64_xen/fs/jbd2/transaction.c:1329 jbd2_journal_stop+0x189/0x25d [jbd2]()
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.385815] Modules linked in: ip6table_filter ip6_tables act_police cls_flow cls_fw cls_u32 sch_htb sch_hfsc sch_ingress sch_sfq xt_time xt_connlimit xt_realm iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_tproxy_core xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLO
 G nfnetlink_log xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY ipt_LOG xt_tcpudp xt_state iptable_nat nf_nat n
> f_conntrac
> > Jun 24 16:36:05 i-38020f57 kernel: k_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ext3 jbd loop snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr evdev ext4 mbcache jbd2 crc16 dm_mod xen_netfront xen_blkfront
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386041] Pid: 3817, comm: tlsmgr Tainted: G        W  2.6.32-5-xen-amd64 #1
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386046] Call Trace:
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386055]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386064]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386071]  [<ffffffff8104ef00>] ? warn_slowpath_common+0x77/0xa3
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386077]  [<ffffffff8100eccf>] ? xen_restore_fl_direct_end+0x0/0x1
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386087]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386098]  [<ffffffffa0033cd6>] ? jbd2_journal_start+0x96/0xc6 [jbd2]
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386113]  [<ffffffffa0070df6>] ? ext4_force_commit+0xa4/0xb3 [ext4]
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386126]  [<ffffffffa0058a37>] ? ext4_sync_file+0x8f/0x25c [ext4]
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386138]  [<ffffffff8110ccd4>] ? vfs_fsync_range+0x73/0x9e
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386146]  [<ffffffff8110cd7e>] ? do_fsync+0x28/0x39
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386153]  [<ffffffff8110cd9d>] ? sys_fdatasync+0xe/0x13
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386163]  [<ffffffff8130d89a>] ? error_exit+0x2a/0x60
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386170]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386178]  [<ffffffff810092eb>] ? hypercall_page+0x2eb/0x1001
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386185] ---[ end trace 2c615eb111c993cb ]---
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386243] transaction->t_outstanding_credits = 8
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386249] transaction->t_nr_buffers = 9
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386253] ------------[ cut here ]------------
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386263] WARNING: at /tmp/kdm-deb-kernel/linux-2.6-2.6.32/debian/build/source_amd64_xen/fs/jbd2/commit.c:538 jbd2_journal_commit_transaction+0x574/0x118b [jbd2]()
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386311] Modules linked in: ip6table_filter ip6_tables act_police cls_flow cls_fw cls_u32 sch_htb sch_hfsc sch_ingress sch_sfq xt_time xt_connlimit xt_realm iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_tproxy_core xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLO
 G nfnetlink_log xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY ipt_LOG xt_tcpudp xt_state iptable_nat nf_nat n
> f_conntrac
> > Jun 24 16:36:05 i-38020f57 kernel: k_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ext3 jbd loop snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr evdev ext4 mbcache jbd2 crc16 dm_mod xen_netfront xen_blkfront
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386452] Pid: 992, comm: jbd2/dm-23-8 Tainted: G        W  2.6.32-5-xen-amd64 #1
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386457] Call Trace:
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386466]  [<ffffffffa0034471>] ? jbd2_journal_commit_transaction+0x574/0x118b [jbd2]
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386475]  [<ffffffffa0034471>] ? jbd2_journal_commit_transaction+0x574/0x118b [jbd2]
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386483]  [<ffffffff8104ef00>] ? warn_slowpath_common+0x77/0xa3
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386492]  [<ffffffffa0034471>] ? jbd2_journal_commit_transaction+0x574/0x118b [jbd2]
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386500]  [<ffffffff8130bf48>] ? thread_return+0x79/0xe0
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386506]  [<ffffffff8100e629>] ? xen_force_evtchn_callback+0x9/0xa
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386512]  [<ffffffff8100ece2>] ? check_events+0x12/0x20
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386519]  [<ffffffff8130d2e2>] ? _spin_unlock_irqrestore+0xd/0xe
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386560]  [<ffffffff8105b8c8>] ? try_to_del_timer_sync+0x63/0x6c
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386572]  [<ffffffffa003a98b>] ? kjournald2+0xbe/0x206 [jbd2]
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386578]  [<ffffffff81065f06>] ? autoremove_wake_function+0x0/0x2e
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386587]  [<ffffffffa003a8cd>] ? kjournald2+0x0/0x206 [jbd2]
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386592]  [<ffffffff81065c39>] ? kthread+0x79/0x81
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386599]  [<ffffffff81012baa>] ? child_rip+0xa/0x20
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386604]  [<ffffffff81011d61>] ? int_ret_from_sys_call+0x7/0x1b
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386610]  [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386616]  [<ffffffff81012ba0>] ? child_rip+0x0/0x20
> > Jun 24 16:36:05 i-38020f57 kernel: [5369326.386620] ---[ end trace 2c615eb111c993cc ]---
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.124954] transaction->t_outstanding_credits = 9
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.124967] transaction->t_nr_buffers = 10
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.124971] ------------[ cut here ]------------
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.124998] WARNING: at /tmp/kdm-deb-kernel/linux-2.6-2.6.32/debian/build/source_amd64_xen/fs/jbd2/transaction.c:1329 jbd2_journal_stop+0x189/0x25d [jbd2]()
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.125007] Modules linked in: ip6table_filter ip6_tables act_police cls_flow cls_fw cls_u32 sch_htb sch_hfsc sch_ingress sch_sfq xt_time xt_connlimit xt_realm iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_tproxy_core xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLO
 G nfnetlink_log xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY ipt_LOG xt_tcpudp xt_state iptable_nat nf_nat n
> f_conntrac
> > Jun 24 16:36:07 i-38020f57 kernel: k_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ext3 jbd loop snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr evdev ext4 mbcache jbd2 crc16 dm_mod xen_netfront xen_blkfront
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.125158] Pid: 3817, comm: tlsmgr Tainted: G        W  2.6.32-5-xen-amd64 #1
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.125164] Call Trace:
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.125174]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.125183]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.125194]  [<ffffffff8104ef00>] ? warn_slowpath_common+0x77/0xa3
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.125203]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.125222]  [<ffffffffa0074c8e>] ? __ext4_journal_stop+0x63/0x69 [ext4]
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.125338]  [<ffffffffa0060949>] ? ext4_journalled_write_end+0x160/0x19a [ext4]
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.125351]  [<ffffffffa00857c6>] ? ext4_xattr_get+0x1fa/0x27c [ext4]
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.125361]  [<ffffffff810b5e91>] ? generic_file_buffered_write+0x18d/0x278
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.125368]  [<ffffffff810b632d>] ? __generic_file_aio_write+0x25f/0x293
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.125377]  [<ffffffffa0032d42>] ? jbd2_journal_stop+0x24a/0x25d [jbd2]
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.125385]  [<ffffffff8100eccf>] ? xen_restore_fl_direct_end+0x0/0x1
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.125391]  [<ffffffff810b63ba>] ? generic_file_aio_write+0x59/0x9f
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.125398]  [<ffffffff810efa46>] ? do_sync_write+0xce/0x113
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.125410]  [<ffffffffa0074c8e>] ? __ext4_journal_stop+0x63/0x69 [ext4]
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.125418]  [<ffffffff81065f06>] ? autoremove_wake_function+0x0/0x2e
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.125426]  [<ffffffff81102492>] ? notify_change+0x2b3/0x2c5
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.125432]  [<ffffffff810f0398>] ? vfs_write+0xa9/0x102
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.125437]  [<ffffffff810f0448>] ? sys_pwrite64+0x57/0x77
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.125443]  [<ffffffff810ef04d>] ? sys_ftruncate+0x112/0x11d
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.125449]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.125454] ---[ end trace 2c615eb111c993cd ]---
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.180777] transaction->t_outstanding_credits = 9
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.180787] transaction->t_nr_buffers = 10
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.180792] ------------[ cut here ]------------
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.180810] WARNING: at /tmp/kdm-deb-kernel/linux-2.6-2.6.32/debian/build/source_amd64_xen/fs/jbd2/transaction.c:1329 jbd2_journal_stop+0x189/0x25d [jbd2]()
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.180819] Modules linked in: ip6table_filter ip6_tables act_police cls_flow cls_fw cls_u32 sch_htb sch_hfsc sch_ingress sch_sfq xt_time xt_connlimit xt_realm iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_tproxy_core xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLO
 G nfnetlink_log xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY ipt_LOG xt_tcpudp xt_state iptable_nat nf_nat n
> f_conntrac
> > Jun 24 16:36:07 i-38020f57 kernel: k_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ext3 jbd loop snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr evdev ext4 mbcache jbd2 crc16 dm_mod xen_netfront xen_blkfront
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.180969] Pid: 3817, comm: tlsmgr Tainted: G        W  2.6.32-5-xen-amd64 #1
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.180975] Call Trace:
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.180983]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.180992]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.180999]  [<ffffffff8104ef00>] ? warn_slowpath_common+0x77/0xa3
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181006]  [<ffffffff8100eccf>] ? xen_restore_fl_direct_end+0x0/0x1
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181015]  [<ffffffffa0032c81>] ? jbd2_journal_stop+0x189/0x25d [jbd2]
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181024]  [<ffffffffa0033cd6>] ? jbd2_journal_start+0x96/0xc6 [jbd2]
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181083]  [<ffffffffa0070df6>] ? ext4_force_commit+0xa4/0xb3 [ext4]
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181094]  [<ffffffffa0058a37>] ? ext4_sync_file+0x8f/0x25c [ext4]
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181104]  [<ffffffff8110ccd4>] ? vfs_fsync_range+0x73/0x9e
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181110]  [<ffffffff8110cd7e>] ? do_fsync+0x28/0x39
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181115]  [<ffffffff8110cd9d>] ? sys_fdatasync+0xe/0x13
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181125]  [<ffffffff8130d89a>] ? error_exit+0x2a/0x60
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181131]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181137]  [<ffffffff810092eb>] ? hypercall_page+0x2eb/0x1001
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181142] ---[ end trace 2c615eb111c993ce ]---
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181160] transaction->t_outstanding_credits = 9
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181164] transaction->t_nr_buffers = 10
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181168] ------------[ cut here ]------------
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181176] WARNING: at /tmp/kdm-deb-kernel/linux-2.6-2.6.32/debian/build/source_amd64_xen/fs/jbd2/commit.c:538 jbd2_journal_commit_transaction+0x574/0x118b [jbd2]()
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181185] Modules linked in: ip6table_filter ip6_tables act_police cls_flow cls_fw cls_u32 sch_htb sch_hfsc sch_ingress sch_sfq xt_time xt_connlimit xt_realm iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_tproxy_core xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLO
 G nfnetlink_log xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY ipt_LOG xt_tcpudp xt_state iptable_nat nf_nat n
> f_conntrac
> > Jun 24 16:36:07 i-38020f57 kernel: k_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables ext3 jbd loop snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr evdev ext4 mbcache jbd2 crc16 dm_mod xen_netfront xen_blkfront
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181368] Pid: 992, comm: jbd2/dm-23-8 Tainted: G        W  2.6.32-5-xen-amd64 #1
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181374] Call Trace:
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181383]  [<ffffffffa0034471>] ? jbd2_journal_commit_transaction+0x574/0x118b [jbd2]
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181393]  [<ffffffffa0034471>] ? jbd2_journal_commit_transaction+0x574/0x118b [jbd2]
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181401]  [<ffffffff8104ef00>] ? warn_slowpath_common+0x77/0xa3
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181409]  [<ffffffffa0034471>] ? jbd2_journal_commit_transaction+0x574/0x118b [jbd2]
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181418]  [<ffffffff8130bf48>] ? thread_return+0x79/0xe0
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181424]  [<ffffffff8100e629>] ? xen_force_evtchn_callback+0x9/0xa
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181430]  [<ffffffff8100ece2>] ? check_events+0x12/0x20
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181436]  [<ffffffff8130d2e2>] ? _spin_unlock_irqrestore+0xd/0xe
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181445]  [<ffffffff8105b8c8>] ? try_to_del_timer_sync+0x63/0x6c
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181455]  [<ffffffffa003a98b>] ? kjournald2+0xbe/0x206 [jbd2]
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181461]  [<ffffffff81065f06>] ? autoremove_wake_function+0x0/0x2e
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181470]  [<ffffffffa003a8cd>] ? kjournald2+0x0/0x206 [jbd2]
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181475]  [<ffffffff81065c39>] ? kthread+0x79/0x81
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181482]  [<ffffffff81012baa>] ? child_rip+0xa/0x20
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181487]  [<ffffffff81011d61>] ? int_ret_from_sys_call+0x7/0x1b
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181493]  [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181499]  [<ffffffff81012ba0>] ? child_rip+0x0/0x20
> > Jun 24 16:36:07 i-38020f57 kernel: [5369328.181503] ---[ end trace 2c615eb111c993cf ]---
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-08-30 22:12                                   ` Jan Kara
@ 2011-08-31  0:26                                     ` Moffett, Kyle D
  2011-09-01 15:17                                       ` Jan Kara
  0 siblings, 1 reply; 35+ messages in thread
From: Moffett, Kyle D @ 2011-08-31  0:26 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sean Ryle, Ted Ts'o, 615998, linux-ext4, Sachin Sant,
	Aneesh Kumar K.V

On Aug 30, 2011, at 18:12, Jan Kara wrote:
> On Fri 26-08-11 16:03:32, Moffett, Kyle D wrote:
>> Ping?
>> 
>> Any more ideas for debugging this issue?
> 
> Sorry for not getting to you earlier.

That's ok, I have a workaround so it's been on my back burner for a while.

>> I can still trigger it on my VM snapshot very easily, so if you have anything
>> you think I should test I would be very happy to give it a shot.
> 
>  OK, so in the meantime I found a bug in data=journal code which could be
> related to your problem. It is fixed by commit
> 2d859db3e4a82a365572592d57624a5f996ed0ec which is in 3.1-rc1. Have you
> tried that or newer kernel as well?
> 
> If the problem still is not fixed, I can provide some debugging patch to
> you. We spoke with Josef Bacik how errors like yours could happen so I have
> some places to watch...

I have not tried anything more recent; I'm actually a bit reluctant to move
away from the Debian squeeze official kernels since I do need the security
updates.

I took a quick look and I can't find that function in 2.6.32, so I assume it
would be a rather nontrivial back-port.  It looks like the relevant code
used to be in ext4_clear_inode somewhere?

Out of curiosity, what would happen in data=journal mode if you unlinked a
file which still had buffers pending?  That case does not seem to be handled
by that commit you mentioned, was it already handled elsewhere?

Thanks again!

Cheers,
Kyle Moffett

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-08-31  0:26                                     ` Moffett, Kyle D
@ 2011-09-01 15:17                                       ` Jan Kara
  2011-12-06 21:26                                         ` Moffett, Kyle D
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Kara @ 2011-09-01 15:17 UTC (permalink / raw)
  To: Moffett, Kyle D
  Cc: Jan Kara, Sean Ryle, Ted Ts'o, 615998, linux-ext4,
	Sachin Sant, Aneesh Kumar K.V

[-- Attachment #1: Type: text/plain, Size: 1635 bytes --]

On Tue 30-08-11 19:26:22, Moffett, Kyle D wrote:
> On Aug 30, 2011, at 18:12, Jan Kara wrote:
> >> I can still trigger it on my VM snapshot very easily, so if you have anything
> >> you think I should test I would be very happy to give it a shot.
> > 
> >  OK, so in the meantime I found a bug in data=journal code which could be
> > related to your problem. It is fixed by commit
> > 2d859db3e4a82a365572592d57624a5f996ed0ec which is in 3.1-rc1. Have you
> > tried that or newer kernel as well?
> > 
> > If the problem still is not fixed, I can provide some debugging patch to
> > you. We spoke with Josef Bacik how errors like yours could happen so I have
> > some places to watch...
> 
> I have not tried anything more recent; I'm actually a bit reluctant to move
> away from the Debian squeeze official kernels since I do need the security
> updates.
> 
> I took a quick look and I can't find that function in 2.6.32, so I assume it
> would be a rather nontrivial back-port.  It looks like the relevant code
> used to be in ext4_clear_inode somewhere?
  It's not that hard - untested patch attached.

> Out of curiosity, what would happen in data=journal mode if you unlinked a
> file which still had buffers pending?  That case does not seem to be handled
> by that commit you mentioned, was it already handled elsewhere?
  Once the file is deleted, it's OK to discard its data after a
transaction doing delete commits. The current code in JBD2 handles this
case fine - the problem was that for not-deleted files we cannot discard
dirty data after a transaction commits ;)

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

[-- Attachment #2: ext4-2.6.32-data-journal-corruption.diff --]
[-- Type: text/x-patch, Size: 3021 bytes --]

diff -rupX /crypted/home/jack/.kerndiffexclude linux-2.6.32.orig//fs/ext4/inode.c linux-2.6.32-ext4_fix//fs/ext4/inode.c
--- linux-2.6.32.orig//fs/ext4/inode.c	2009-12-03 04:51:21.000000000 +0100
+++ linux-2.6.32-ext4_fix//fs/ext4/inode.c	2011-09-01 17:15:39.742361528 +0200
@@ -1794,6 +1794,7 @@ static int ext4_journalled_write_end(str
 	if (new_i_size > inode->i_size)
 		i_size_write(inode, pos+copied);
 	EXT4_I(inode)->i_state |= EXT4_STATE_JDATA;
+	EXT4_I(inode)->i_datasync_tid = handle->h_transaction->t_tid;
 	if (new_i_size > EXT4_I(inode)->i_disksize) {
 		ext4_update_i_disksize(inode, new_i_size);
 		ret2 = ext4_mark_inode_dirty(handle, inode);
@@ -2630,6 +2631,7 @@ static int __ext4_journalled_writepage(s
 				write_end_fn);
 	if (ret == 0)
 		ret = err;
+	EXT4_I(inode)->i_datasync_tid = handle->h_transaction->t_tid;
 	err = ext4_journal_stop(handle);
 	if (!ret)
 		ret = err;
diff -rupX /crypted/home/jack/.kerndiffexclude linux-2.6.32.orig//fs/ext4/super.c linux-2.6.32-ext4_fix//fs/ext4/super.c
--- linux-2.6.32.orig//fs/ext4/super.c	2009-12-03 04:51:21.000000000 +0100
+++ linux-2.6.32-ext4_fix//fs/ext4/super.c	2011-09-01 17:13:55.379363889 +0200
@@ -759,6 +759,40 @@ static void ext4_clear_inode(struct inod
 				       &EXT4_I(inode)->jinode);
 }
 
+static void ext4_drop_inode(struct inode *inode)
+{
+	if (inode->i_nlink) {
+		/*
+		 * When journalling data dirty buffers are tracked only in the
+		 * journal. So although mm thinks everything is clean and
+		 * ready for reaping the inode might still have some pages to
+		 * write in the running transaction or waiting to be
+		 * checkpointed. Thus calling jbd2_journal_invalidatepage()
+		 * (via truncate_inode_pages()) to discard these buffers can
+		 * cause data loss. Also even if we did not discard these
+		 * buffers, we would have no way to find them after the inode
+		 * is reaped and thus user could see stale data if he tries to
+		 * read them before the transaction is checkpointed. So be
+		 * careful and force everything to disk here... We use
+		 * ei->i_datasync_tid to store the newest transaction
+		 * containing inode's data.
+		 *
+		 * Note that directories do not have this problem because they
+		 * don't use page cache.
+		 */
+		if (ext4_should_journal_data(inode) &&
+		    (S_ISLNK(inode->i_mode) || S_ISREG(inode->i_mode))) {
+			journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
+			tid_t commit_tid = EXT4_I(inode)->i_datasync_tid;
+
+			jbd2_log_start_commit(journal, commit_tid);
+			jbd2_log_wait_commit(journal, commit_tid);
+			filemap_write_and_wait(&inode->i_data);
+		}
+	}
+	generic_drop_inode(inode);
+}
+
 static inline void ext4_show_quota_options(struct seq_file *seq,
 					   struct super_block *sb)
 {
@@ -1029,6 +1063,7 @@ static const struct super_operations ext
 	.statfs		= ext4_statfs,
 	.remount_fs	= ext4_remount,
 	.clear_inode	= ext4_clear_inode,
+	.drop_inode	= ext4_drop_inode,
 	.show_options	= ext4_show_options,
 #ifdef CONFIG_QUOTA
 	.quota_read	= ext4_quota_read,

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4
  2011-09-01 15:17                                       ` Jan Kara
@ 2011-12-06 21:26                                         ` Moffett, Kyle D
  0 siblings, 0 replies; 35+ messages in thread
From: Moffett, Kyle D @ 2011-12-06 21:26 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sean Ryle, Ted Ts'o, 615998, linux-ext4, Sachin Sant,
	Aneesh Kumar K.V

Hello again!

I know it's been ages, but I finally got some time to get that patch
tested out and try additional debugging.

On Sep 01, 2011, at 11:17, Jan Kara wrote:
> On Tue 30-08-11 19:26:22, Moffett, Kyle D wrote:
>> On Aug 30, 2011, at 18:12, Jan Kara wrote:
>>>> I can still trigger it on my VM snapshot very easily, so if you have anything
>>>> you think I should test I would be very happy to give it a shot.
>>> 
>>> OK, so in the meantime I found a bug in data=journal code which could be
>>> related to your problem. It is fixed by commit
>>> 2d859db3e4a82a365572592d57624a5f996ed0ec which is in 3.1-rc1. Have you
>>> tried that or newer kernel as well?
>>> 
>>> If the problem still is not fixed, I can provide some debugging patch to
>>> you. We spoke with Josef Bacik how errors like yours could happen so I have
>>> some places to watch...
>> 
>> I have not tried anything more recent; I'm actually a bit reluctant to move
>> away from the Debian squeeze official kernels since I do need the security
>> updates.
>> 
>> I took a quick look and I can't find that function in 2.6.32, so I assume it
>> would be a rather nontrivial back-port.  It looks like the relevant code
>> used to be in ext4_clear_inode somewhere?
> It's not that hard - untested patch attached.

So this applied mostly cleanly (with one minor context-only conflict in
the 2.6.32.17 patch), unfortunately it didn't resolve the problem.
Just as a sanity check, I upgraded to the Debian 3.1.0-1-amd64 kernel,
based on kernel version 3.1.1 and the problem still occurs there too
(additional info at the end of the email).

Looking at the issue again, I don't think it has anything to do with
file deletion at all.

Specifically, there are a grand total of 4 files in that filesystem
(alongside an empty "lost+found" directory):
  master.lock
  prng_exch
  smtpd_scache.db
  smtp_scache.db

As far as I can tell, none of those is ever deleted during normal
operation.

The crash occurs very quickly after starting postfix.  It connects to
the external email server (using TLS) and begins to flush queued mail.

At that point, the "tlsmgr" daemon tries to update the "smtp_scache.db"
file, which is a Berkeley DB about 40k in size.  Somewhere in there,
the Berkeley DB does an fdatasync().

The "fdatasync()" apparently triggers the bad behavior from the "jbd2"
thread, which then oopses in fs/jbd2/commit.c:485 (which appears to be
the same same BUG_ON() as before).

The stack looks something like this:
  jbd_journal_commit_transaction+0x4ea/0x1053 [jbd2]
  kjournald2+0xc0/0x20a [jbd2]
  add_wait_queue+0x3c/0x3c
  commit_timeout+0x5/0x5 [jbd2]
  kthread+0x76/0x7e

Cheers,
Kyle Moffett

--
Curious about my work on the Debian powerpcspe port?
I'm keeping a blog here: http://pureperl.blogspot.com/

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2011-12-06 23:20 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20110301165239.3310.43806.reportbug@support.exmeritus.com>
     [not found] ` <BE4E C1DF-4DFC-4B94-923D-0197B16BD7B4@boeing.com>
2011-03-01 19:26 ` Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4 Moffett, Kyle D
2011-04-03  2:02   ` Ted Ts'o
2011-04-04 14:24     ` Moffett, Kyle D
2011-04-04 20:51       ` Moffett, Kyle D
2011-04-05  0:15       ` Ted Ts'o
2011-04-05 15:30         ` Moffett, Kyle D
2011-04-05 19:07           ` Ted Ts'o
2011-04-05 19:44             ` Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernelBUG " Moffett, Kyle D
     [not found]               ` <20110405230538.GH2832@thunk.org>
     [not found]                 ` <FD93E462-D97B-411B-BF09-9A64670AC5C2@boeing.com>
2011-06-23 18:32                   ` Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG " Moffett, Kyle D
2011-06-23 20:55                     ` Sean Ryle
2011-06-23 21:19                       ` Moffett, Kyle D
2011-06-24 13:46                         ` Jan Kara
2011-06-24 16:03                           ` Moffett, Kyle D
2011-06-24 20:02                             ` Jan Kara
2011-06-24 20:51                               ` Kyle Moffett
2011-08-26 21:03                                 ` Moffett, Kyle D
2011-08-30 22:12                                   ` Jan Kara
2011-08-31  0:26                                     ` Moffett, Kyle D
2011-09-01 15:17                                       ` Jan Kara
2011-12-06 21:26                                         ` Moffett, Kyle D
2011-06-27 11:16                               ` Lukas Czerner
2011-06-27 11:57                                 ` Amir Goldstein
2011-06-27 14:02                                 ` Jan Kara
2011-06-27 15:30                                   ` Lukas Czerner
2011-06-27 16:01                                     ` Ted Ts'o
2011-06-27 20:27                                       ` Jan Kara
2011-06-28  4:21                                       ` Moffett, Kyle D
2011-06-28  9:36                                         ` Jan Kara
2011-06-28 13:58                                           ` Ben Hutchings
2011-06-28 14:16                                           ` Ted Ts'o
2011-06-28 19:36                                             ` Moffett, Kyle D
2011-06-28 19:30                                           ` Moffett, Kyle D
2011-06-28 22:57                                             ` Jan Kara
2011-06-29  4:22                                               ` Moffett, Kyle D
2011-06-23 22:23                     ` Ted Ts'o

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.