linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* 2.6.28 ext4, xen and lvm volume becomes ro after snapshot
@ 2008-12-26 11:06 Andreas Sundstrom
  2008-12-26 14:07 ` Theodore Tso
  0 siblings, 1 reply; 13+ messages in thread
From: Andreas Sundstrom @ 2008-12-26 11:06 UTC (permalink / raw)
  To: linux-kernel

Decided to try out 2.6.28 on a xen VM yesterday.
Used my old 2.6.27 .config as a base and did an "make oldconfig" and
answered any questions that came up.

Everything went fine and the new kernel booted and worked nicely.

Now I wanted to try the new ext4 fs (for no particular reason other than
it being new)
So to start with I tried it without updating the actual fs layout to
ext4, I left it at ext3 and only mounted it as ext4.

That meant adding this to the xen VM config file:
extra = "console=hvc0 rootfstype=ext4"

And also changing to ext4 in /etc/fstab

That also worked fine and I saw that some stuff was being enabled (like
barriers)
But today I noticed that something was wrong with the VM and I noticed
after a while that the root volume was read-only (this VM only has a
root partition).

I noticed the time that the logs were written to last and that was
around the time my backups are being run.

I use a little script that backup this xen VM from xen dom0.

Here is the interesting part of my script:
/usr/sbin/xm sysrq xenfw1 s
/usr/sbin/xm pause xenfw1
sync
/sbin/lvm lvcreate --snapshot --permission rw --size 1G --name xenfw1_s
/dev/3w250g/xenfw1 > /dev/null
/usr/sbin/xm unpause xenfw1
/bin/mount -o ro,noatime /dev/3w250g/xenfw1_s /mnt/snapshots/xenfw1 ||
/bin/rmdir /mnt/snapshots/xenfw1
# Here's where the actual backups take place
/bin/umount /mnt/snapshots/xenfw1 2> /dev/null
/sbin/lvm lvremove --force /dev/3w250g/xenfw1_s > /dev/null

After the "lvcreate --snapshot" if I check within the xen VM (with cat
/proc/mounts) I can see that / changed from rw to ro:
Before snapshot:
/dev/root / ext4 rw,noatime,barrier=1,noextents,data=ordered 0 0
After snapshot:
/dev/root / ext4 ro,noatime,barrier=1,noextents,data=ordered 0 0

It was not possible to do "mount -o remount,rw /":
mount -o remount,rw
/                                                                                      

mount: block device /dev/xvda1 is write-protected, mounting read-only  

I don't see how this could be an expected behaviour so I thought it
might be good to report it here.
I hope I explained the scenario good enough, I can reproduce this if you
want more details (now I'm back on ext3 on this xen VM)

P.S. I'm currently not subscribed so please CC me on any posts regarding
this.

/Andreas Sundstrom

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2.6.28 ext4, xen and lvm volume becomes ro after snapshot
  2008-12-26 11:06 2.6.28 ext4, xen and lvm volume becomes ro after snapshot Andreas Sundstrom
@ 2008-12-26 14:07 ` Theodore Tso
  2008-12-26 15:42   ` Andreas Sundstrom
  0 siblings, 1 reply; 13+ messages in thread
From: Theodore Tso @ 2008-12-26 14:07 UTC (permalink / raw)
  To: Andreas Sundstrom; +Cc: linux-kernel

On Fri, Dec 26, 2008 at 12:06:19PM +0100, Andreas Sundstrom wrote:
> I use a little script that backup this xen VM from xen dom0.
> 
> Here is the interesting part of my script:
> /usr/sbin/xm sysrq xenfw1 s
> /usr/sbin/xm pause xenfw1
> sync
> /sbin/lvm lvcreate --snapshot --permission rw --size 1G --name xenfw1_s
> /dev/3w250g/xenfw1 > /dev/null
> /usr/sbin/xm unpause xenfw1
> /bin/mount -o ro,noatime /dev/3w250g/xenfw1_s /mnt/snapshots/xenfw1 ||
> /bin/rmdir /mnt/snapshots/xenfw1
> # Here's where the actual backups take place
> /bin/umount /mnt/snapshots/xenfw1 2> /dev/null
> /sbin/lvm lvremove --force /dev/3w250g/xenfw1_s > /dev/null

OK, so this is being run on Xen Dom0, which means your doing the
snapshot from Host OS, while the guest OS is suspended, correct?

> After the "lvcreate --snapshot" if I check within the xen VM (with cat
> /proc/mounts) I can see that / changed from rw to ro:
> Before snapshot:
> /dev/root / ext4 rw,noatime,barrier=1,noextents,data=ordered 0 0
> After snapshot:
> /dev/root / ext4 ro,noatime,barrier=1,noextents,data=ordered 0 0

But this was done from the guest OS --- what is /dev/root in the guest
OS?  Am I right in assuming it is the device /dev/3w250g/xenfw1?
Stupid question --- if this is the case, why are you taking the
snapshot from the Host OS?  It won't be a consistent snapshot, given
that it was mounted in the Guest OS, and all you've done in the Guest
OS is to ask it to sync the filesystem.  Given that there's no pause
between sysrq s and the pause, the write operations probably haven't
even been completed before the pause takes place, so the snapshot
probably has a chance of in pretty bad shape as it is.

If the filesystem is being remounted read-only, that tends to indicate
that the filesystem flagged an error for some reason.  So looking at
the dmesg on the Guest OS for any ext4 errors would be the useful
thing to do.

> It was not possible to do "mount -o remount,rw /":
> mount -o remount,rw /                                                 

There are multiple potential reasons for this, but I'm guessing this
was caused by an aborted journal, probably caused by an I/O error when
trying to write to the journal.  This may very well be related to
pausing guest OS in the middle of a write operations that would have
been caused by the sysrq s, but that's a guess; we need more
information about exactly what is going on.

> I hope I explained the scenario good enough, I can reproduce this if you
> want more details (now I'm back on ext3 on this xen VM)

If you could reproduce this and send back the messages from dmesg in
the guest OS, that would be quite helpful.  You might also try adding
a sleep 2 between the sysrq s and the pause xenfw1 commands, and see
if the problem goes away.  That's just a stab in the dark, of course,
but it's a simple enough thing to try.

Regards,

						- Ted

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2.6.28 ext4, xen and lvm volume becomes ro after snapshot
  2008-12-26 14:07 ` Theodore Tso
@ 2008-12-26 15:42   ` Andreas Sundstrom
  2008-12-26 18:21     ` Theodore Tso
  0 siblings, 1 reply; 13+ messages in thread
From: Andreas Sundstrom @ 2008-12-26 15:42 UTC (permalink / raw)
  To: linux-kernel

Theodore Tso wrote:
> On Fri, Dec 26, 2008 at 12:06:19PM +0100, Andreas Sundstrom wrote:
>   
>> I use a little script that backup this xen VM from xen dom0.
>>
>> Here is the interesting part of my script:
>> /usr/sbin/xm sysrq xenfw1 s
>> /usr/sbin/xm pause xenfw1
>> sync
>> /sbin/lvm lvcreate --snapshot --permission rw --size 1G --name xenfw1_s
>> /dev/3w250g/xenfw1 > /dev/null
>> /usr/sbin/xm unpause xenfw1
>> /bin/mount -o ro,noatime /dev/3w250g/xenfw1_s /mnt/snapshots/xenfw1 ||
>> /bin/rmdir /mnt/snapshots/xenfw1
>> # Here's where the actual backups take place
>> /bin/umount /mnt/snapshots/xenfw1 2> /dev/null
>> /sbin/lvm lvremove --force /dev/3w250g/xenfw1_s > /dev/null
>>     
>
> OK, so this is being run on Xen Dom0, which means your doing the
> snapshot from Host OS, while the guest OS is suspended, correct?
>   
Correct
>   
>> After the "lvcreate --snapshot" if I check within the xen VM (with cat
>> /proc/mounts) I can see that / changed from rw to ro:
>> Before snapshot:
>> /dev/root / ext4 rw,noatime,barrier=1,noextents,data=ordered 0 0
>> After snapshot:
>> /dev/root / ext4 ro,noatime,barrier=1,noextents,data=ordered 0 0
>>     
>
> But this was done from the guest OS --- what is /dev/root in the guest
> OS?  Am I right in assuming it is the device /dev/3w250g/xenfw1?
> Stupid question --- if this is the case, why are you taking the
> snapshot from the Host OS?  It won't be a consistent snapshot, given
> that it was mounted in the Guest OS, and all you've done in the Guest
> OS is to ask it to sync the filesystem.  Given that there's no pause
> between sysrq s and the pause, the write operations probably haven't
> even been completed before the pause takes place, so the snapshot
> probably has a chance of in pretty bad shape as it is.
>   
You're probably correct, I already have a "sleep 1" that I missed to
include in my e-mail though.
Let's not analyze the backup method so much, it's not a very important
machine and I mostly want config files to be backed up and they are
pretty consistent at all times anyway.
>> It was not possible to do "mount -o remount,rw /":
>> mount -o remount,rw /                                                 
>>     
>
> There are multiple potential reasons for this, but I'm guessing this
> was caused by an aborted journal, probably caused by an I/O error when
> trying to write to the journal.  This may very well be related to
> pausing guest OS in the middle of a write operations that would have
> been caused by the sysrq s, but that's a guess; we need more
> information about exactly what is going on.
>   
You are correct about the aborted journal but it's not caused by
anythingelse than the lvm snapshot.
> If you could reproduce this and send back the messages from dmesg in
> the guest OS, that would be quite helpful.  You might also try adding
> a sleep 2 between the sysrq s and the pause xenfw1 commands, and see
> if the problem goes away.  That's just a stab in the dark, of course,
> but it's a simple enough thing to try.
>   
Ok, I used another Xen VM for testing now.

While I was testing now I noticed that I even could take the snapshot
while the system is not started and the error would still occur.
No problems with ext3 with the same kernel that I've seen yet.

Here's one way to reproduce:
# lvs | grep xenfw2
  xenfw2          3w250g -wi-a-   1.00G # Notice the last "-" which is
not "o"
  xenfw2_swap     3w250g -wi-a- 196.00M
# lvcreate --snapshot --permission rw --size 1G --name xenfw2_s
/dev/3w250g/xenfw2
  Logical volume "xenfw2_s" created
# lvs | grep xenfw2
  xenfw2          3w250g owi-a-   1.00G
  xenfw2_s        3w250g swi-a-   1.00G xenfw2   0.00
  xenfw2_swap     3w250g -wi-a- 196.00M
# xm create xenfw2 -c # To start the Xen VM with console attached

Here's what I think is the relevant part of the kernel output

[    0.317797] EXT4-fs warning (device xvda1): ext4_fill_super: extents
feature not enabled on this filesystem, use tune2fs.     
[   
0.317822]                                                                                                                   

[    0.326805] EXT4-fs: barriers
enabled                                                                                         

[    0.336488] kjournald2 starting.  Commit interval 5
seconds                                                                   
[    0.336514] EXT4-fs: delayed allocation
enabled                                                                               

[    0.336610] EXT4-fs: mballoc
enabled                                                                                          

[    0.336622] EXT4-fs: mounted filesystem with ordered data
mode.                                                               
[    0.336651] VFS: Mounted root (ext4 filesystem)
readonly.                                                                     

[    0.336880] Freeing unused kernel memory: 280k
freed                                                                          

[    0.337936] Write protecting the kernel text:
2624k                                                                           

[    0.338358] Write protecting the kernel read-only data: 1028k
[snip]
[    7.636849] blkfront: xvda1: write barrier op failed     
[    7.636867] blkfront: xvda1: barriers disabled          
[    7.636877] end_request: I/O error, dev xvda1, sector 4512
[    7.636892] end_request: I/O error, dev xvda1, sector 4512
[    7.636933] Aborting journal on device xvda1:8.          
[    7.637602] ext4_abort called.                           
[    7.637616] EXT4-fs error (device xvda1): ext4_journal_start_sb:
Detected aborted journal
[    7.637633] Remounting filesystem
read-only                                             
[    7.652152] EXT4-fs error (device xvda1) in ext4_reserve_inode_write:
Journal has aborted

Let me know if I can help with more information

/Andreas

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2.6.28 ext4, xen and lvm volume becomes ro after snapshot
  2008-12-26 15:42   ` Andreas Sundstrom
@ 2008-12-26 18:21     ` Theodore Tso
  2008-12-26 18:48       ` Andreas Sundstrom
  0 siblings, 1 reply; 13+ messages in thread
From: Theodore Tso @ 2008-12-26 18:21 UTC (permalink / raw)
  To: Andreas Sundstrom; +Cc: linux-kernel

Hmm... ok, to summarize, you are seeing this problem with 2.6.28 (have
you tried earlier kernel versions) when an LVM volume has a read/only
snapshot in existence and you try to mount the LVM volume using ext4.
If you mount the same LVM volume using ext3, you don't see any
problems.   Is that correct?

Can you try mounting ext3 with barriers enabled?  Booting with the
command linux option rootflags=barriers should do the trick.  If that
fails, then it would indicate that trying to enable barriers in a Xen
guest while the host OS has created a snapshot of volume causes an I/O
error, thus leading errors which you are seeing.

						- Ted

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2.6.28 ext4, xen and lvm volume becomes ro after snapshot
  2008-12-26 18:21     ` Theodore Tso
@ 2008-12-26 18:48       ` Andreas Sundstrom
  2008-12-26 19:33         ` Theodore Tso
  0 siblings, 1 reply; 13+ messages in thread
From: Andreas Sundstrom @ 2008-12-26 18:48 UTC (permalink / raw)
  To: linux-kernel

Theodore Tso wrote:
> Hmm... ok, to summarize, you are seeing this problem with 2.6.28 (have
> you tried earlier kernel versions) when an LVM volume has a read/only
> snapshot in existence and you try to mount the LVM volume using ext4.
> If you mount the same LVM volume using ext3, you don't see any
> problems.   Is that correct?
>   
Correct, and I have not been using barriers with any earlier kernel.
I stumbled upon this because barriers are enabled by default in ext4.
> Can you try mounting ext3 with barriers enabled?  Booting with the
> command linux option rootflags=barriers should do the trick.  If that
> fails, then it would indicate that trying to enable barriers in a Xen
> guest while the host OS has created a snapshot of volume causes an I/O
> error, thus leading errors which you are seeing.
>
> 						- Ted
>   
Yes, I mounted it with ext3 and barrier=1 and could reproduce the problem.
ext3 did not remount the fs ro though, it seems to only disable barriers:

[    7.681759] blkfront: xvda1: write barrier op failed
[    7.681776] blkfront: xvda1: barriers disabled
[    7.681785] end_request: I/O error, dev xvda1, sector 4584
[    7.681800] end_request: I/O error, dev xvda1, sector 4584
[    7.681886] JBD: barrier-based sync failed on xvda1 - disabling barriers

And then I tested with ext4 and barrier=0 and that also works.

I don't know if this is expected behaviour or not or if it's worth
looking into.
I just wanted to report back what I saw. For me personally it's no issue.
I'll be using ext4 without barriers for a while and see how it goes.

But I'm here if you want something tested or a patch verified or anything,
but I guess this might be a Xen issue rather than vanilla kernel stuff.

Thanks for helping out with the narrowing down of the issue

/Andreas

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2.6.28 ext4, xen and lvm volume becomes ro after snapshot
  2008-12-26 18:48       ` Andreas Sundstrom
@ 2008-12-26 19:33         ` Theodore Tso
  2008-12-26 22:00           ` Andreas Sundstrom
  0 siblings, 1 reply; 13+ messages in thread
From: Theodore Tso @ 2008-12-26 19:33 UTC (permalink / raw)
  To: Andreas Sundstrom; +Cc: linux-kernel

On Fri, Dec 26, 2008 at 07:48:22PM +0100, Andreas Sundstrom wrote:
> Yes, I mounted it with ext3 and barrier=1 and could reproduce the problem.
> ext3 did not remount the fs ro though, it seems to only disable barriers:
> 
> [    7.681759] blkfront: xvda1: write barrier op failed
> [    7.681776] blkfront: xvda1: barriers disabled
> [    7.681785] end_request: I/O error, dev xvda1, sector 4584
> [    7.681800] end_request: I/O error, dev xvda1, sector 4584
> [    7.681886] JBD: barrier-based sync failed on xvda1 - disabling barriers
> 
> And then I tested with ext4 and barrier=0 and that also works.

Ext4 has patches which will checks the error returns on writes to the
journal, and will abort the journal in case of I/O failures.  Ext3
should have the same patches, but it's apparently missing one of the
patches, or it's otherwise not noticing the problem.  (You were
testing ext3 on a 2.6.28 kernel, right?)

> But I'm here if you want something tested or a patch verified or anything,
> but I guess this might be a Xen issue rather than vanilla kernel stuff.

Yes, this looks very much like a Xen issue.  What is going on is that
we submit the write with barriers enabled, and if it fails, we try
again without barriers.  I'm guessing that Xen emulation code didn't
notice that we were trying again without barriers, or the Xen
emulation isn't clearing the error flag, but for whatever reason,
we're getting a write failure somewhere else later on, and that's
causing the failures.

What would be really useful to nail down exactly what is going on
would be to patch fs/jbd/journal.c and fs/jbd2/journal.c so that the
line:

u8 journal_enable_debug __read_mostly;

is changed to read:

u8 journal_enable_debug=3 __read_mostly;


and similarly in fs/jbd2/journal.c, change:

u8 jbd2_journal_enable_debug __read_mostly;

to read

u8 jbd2_journal_enable_debug=3 __read_mostly;

That will generate a lot more debugging information, and hopefully we
can see exactly what was going on right before the journal abort, and
why ext4 apparently didn't get the corret error return after the
barrier operation failed.

But yes, this ultimately seems very likely to be a Xen emulation bug.

    	      		       	    	      - Ted







> 
> Thanks for helping out with the narrowing down of the issue
> 
> /Andreas
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2.6.28 ext4, xen and lvm volume becomes ro after snapshot
  2008-12-26 19:33         ` Theodore Tso
@ 2008-12-26 22:00           ` Andreas Sundstrom
  2008-12-27  3:06             ` Theodore Tso
  0 siblings, 1 reply; 13+ messages in thread
From: Andreas Sundstrom @ 2008-12-26 22:00 UTC (permalink / raw)
  To: linux-kernel

Theodore Tso wrote:
> On Fri, Dec 26, 2008 at 07:48:22PM +0100, Andreas Sundstrom wrote:
>> Yes, I mounted it with ext3 and barrier=1 and could reproduce the problem.
>> ext3 did not remount the fs ro though, it seems to only disable barriers:
>>
>> [    7.681759] blkfront: xvda1: write barrier op failed
>> [    7.681776] blkfront: xvda1: barriers disabled
>> [    7.681785] end_request: I/O error, dev xvda1, sector 4584
>> [    7.681800] end_request: I/O error, dev xvda1, sector 4584
>> [    7.681886] JBD: barrier-based sync failed on xvda1 - disabling barriers
>>
>> And then I tested with ext4 and barrier=0 and that also works.
> 
> Ext4 has patches which will checks the error returns on writes to the
> journal, and will abort the journal in case of I/O failures.  Ext3
> should have the same patches, but it's apparently missing one of the
> patches, or it's otherwise not noticing the problem.  (You were
> testing ext3 on a 2.6.28 kernel, right?)

Yes it was the same kernel even.

> 
>> But I'm here if you want something tested or a patch verified or anything,
>> but I guess this might be a Xen issue rather than vanilla kernel stuff.
> 
> Yes, this looks very much like a Xen issue.  What is going on is that
> we submit the write with barriers enabled, and if it fails, we try
> again without barriers.  I'm guessing that Xen emulation code didn't
> notice that we were trying again without barriers, or the Xen
> emulation isn't clearing the error flag, but for whatever reason,
> we're getting a write failure somewhere else later on, and that's
> causing the failures.
> 
> What would be really useful to nail down exactly what is going on
> would be to patch fs/jbd/journal.c and fs/jbd2/journal.c so that the
> line:
> 
> u8 journal_enable_debug __read_mostly;
> 
> is changed to read:
> 
> u8 journal_enable_debug=3 __read_mostly;
> 
> 
> and similarly in fs/jbd2/journal.c, change:
> 
> u8 jbd2_journal_enable_debug __read_mostly;
> 
> to read
> 
> u8 jbd2_journal_enable_debug=3 __read_mostly;
> 
> That will generate a lot more debugging information, and hopefully we
> can see exactly what was going on right before the journal abort, and
> why ext4 apparently didn't get the corret error return after the
> barrier operation failed.
> 
> But yes, this ultimately seems very likely to be a Xen emulation bug.

Unfortunately it didn't compile with the above changes and I'm totally
crap at programming :(

But I enabled debugfs and did
"echo 3 > /sys/kernel/debug/jbd2/jbd2-debug" and reproduced the problem
by taking a snapshot while the system was live.
I hope this had the same effect as your proposed change.

I also put it on this URL to make it easier to read
http://pastebin.com/m7efd8eb3


Dec 26 22:38:57 192.168.20.33 kernel: imklog 3.18.6, log source =
/proc/kmsg started.
Dec 26 22:42:01 192.168.20.33 kernel: [  482.814386]
(fs/jbd2/transaction.c, 111): start_this_handle: New handle c34be000
going live.

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815139]
(fs/jbd2/transaction.c, 1267): jbd2_journal_stop: transaction too old,
requesting commit for handle c34be000

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815147] (fs/jbd2/journal.c,
449): __jbd2_log_start_commit: JBD: requesting commit 1549171/1549170

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815157] (fs/jbd2/journal.c,
546): jbd2_log_wait_commit: JBD: want 1549171, j_commit_sequence=1549170

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815168] (fs/jbd2/journal.c,
195): kjournald2: kjournald2 wakes
Dec 26 22:42:01 192.168.20.33 kernel: [  482.815174] (fs/jbd2/journal.c,
147): kjournald2: commit_sequence=1549170, commit_request=1549171

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815181] (fs/jbd2/journal.c,
150): kjournald2: OK, requests differ

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815188] (fs/jbd2/commit.c,
362): jbd2_journal_commit_transaction: superblock not updated

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815194] (fs/jbd2/commit.c,
374): jbd2_journal_commit_transaction: JBD: starting commit of
transaction 1549171

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815210]
(fs/jbd2/checkpoint.c, 762): __jbd2_journal_drop_transaction: Dropping
transaction 1549170, all done

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815218] (fs/jbd2/commit.c,
447): jbd2_journal_commit_transaction: JBD: commit phase 1

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815226] (fs/jbd2/commit.c,
465): jbd2_journal_commit_transaction: JBD: commit phase 2

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815235] (fs/jbd2/revoke.c,
520): jbd2_journal_write_revoke_records: Wrote 0 revoke records

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815242] (fs/jbd2/commit.c,
477): jbd2_journal_commit_transaction: JBD: commit phase 2

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815249] (fs/jbd2/commit.c,
703): jbd2_journal_commit_transaction: JBD: commit phase 3

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815255] (fs/jbd2/commit.c,
762): jbd2_journal_commit_transaction: JBD: commit phase 4

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815261] (fs/jbd2/commit.c,
792): jbd2_journal_commit_transaction: JBD: commit phase 5

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815369]
(fs/jbd2/transaction.c, 111): start_this_handle: New handle c34be018
going live.

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815389]
(fs/jbd2/transaction.c, 111): start_this_handle: New handle c34be018
going live.

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815415]
(fs/jbd2/transaction.c, 111): start_this_handle: New handle c34be018
going live.

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815425]
(fs/jbd2/transaction.c, 111): start_this_handle: New handle c34be018
going live.

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815490]
(fs/jbd2/transaction.c, 111): start_this_handle: New handle c34be018
going live.

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815500]
(fs/jbd2/transaction.c, 111): start_this_handle: New handle c34be018
going live.

Dec 26 22:42:01 192.168.20.33 kernel: [  482.816304] blkfront: xvda1:
write barrier op failed
Dec 26 22:42:01 192.168.20.33 kernel: [  482.816320] blkfront: xvda1:
barriers disabled
Dec 26 22:42:01 192.168.20.33 kernel: [  482.816330] end_request: I/O
error, dev xvda1, sector 5600
Dec 26 22:42:01 192.168.20.33 kernel: [  482.816342] end_request: I/O
error, dev xvda1, sector 5600
Dec 26 22:42:01 192.168.20.33 kernel: [  482.816380] Aborting journal on
device xvda1:8.
Dec 26 22:42:01 192.168.20.33 kernel: [  482.816389] (fs/jbd2/journal.c,
449): __jbd2_log_start_commit: JBD: requesting commit 1549172/1549170

Dec 26 22:42:01 192.168.20.33 kernel: [  482.816397] (fs/jbd2/journal.c,
1278): jbd2_journal_update_superblock: JBD: updating superblock (start
1, seq 1549157, errno -5)
Dec 26 22:42:01 192.168.20.33 kernel: [  482.821228] EXT4-fs error
(device xvda1) in ext4_reserve_inode_write: Journal has aborted

Dec 26 22:42:01 192.168.20.33 kernel: [  482.836134] (fs/jbd2/commit.c,
812): jbd2_journal_commit_transaction: JBD: commit phase 6

Dec 26 22:42:01 192.168.20.33 kernel: [  482.836142] (fs/jbd2/commit.c,
937): jbd2_journal_commit_transaction: JBD: commit phase 7

Dec 26 22:42:01 192.168.20.33 kernel: [  482.836151]
(fs/jbd2/checkpoint.c, 762): __jbd2_journal_drop_transaction: Dropping
transaction 1549171, all done

Dec 26 22:42:01 192.168.20.33 kernel: [  482.836158] (fs/jbd2/commit.c,
1005): jbd2_journal_commit_transaction: JBD: commit 1549171 complete,
head 1549157
Dec 26 22:42:01 192.168.20.33 kernel: [  482.836169] (fs/jbd2/journal.c,
147): kjournald2: commit_sequence=1549171, commit_request=1549172

Dec 26 22:42:01 192.168.20.33 kernel: [  482.836175] (fs/jbd2/journal.c,
150): kjournald2: OK, requests differ

Dec 26 22:42:01 192.168.20.33 kernel: [  482.836182] (fs/jbd2/commit.c,
362): jbd2_journal_commit_transaction: superblock not updated

Dec 26 22:42:01 192.168.20.33 kernel: [  482.836189] (fs/jbd2/commit.c,
374): jbd2_journal_commit_transaction: JBD: starting commit of
transaction 1549172

Dec 26 22:42:01 192.168.20.33 kernel: [  482.836205] journal commit I/O
error
Dec 26 22:42:01 192.168.20.33 kernel: [  482.836521] ext4_abort called.

Dec 26 22:42:01 192.168.20.33 kernel: [  482.836531] EXT4-fs error
(device xvda1): ext4_journal_start_sb: Detected aborted journal

Dec 26 22:42:01 192.168.20.33 kernel: [  482.836548] Remounting
filesystem read-only
Dec 26 22:42:01 192.168.20.33 kernel: [  482.836685] ext4_da_writepages:
jbd2_start: 1642 pages, ino 81965; err -30

/Andreas

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2.6.28 ext4, xen and lvm volume becomes ro after snapshot
  2008-12-26 22:00           ` Andreas Sundstrom
@ 2008-12-27  3:06             ` Theodore Tso
  2008-12-27  9:19               ` Andreas Sundstrom
  0 siblings, 1 reply; 13+ messages in thread
From: Theodore Tso @ 2008-12-27  3:06 UTC (permalink / raw)
  To: Andreas Sundstrom; +Cc: linux-kernel

On Fri, Dec 26, 2008 at 11:00:11PM +0100, Andreas Sundstrom wrote:
> But I enabled debugfs and did
> "echo 3 > /sys/kernel/debug/jbd2/jbd2-debug" and reproduced the problem
> by taking a snapshot while the system was live.
> I hope this had the same effect as your proposed change.

Thanks, that was helpful.  Can you try applying this patch, and let me
know whether the printk triggers?

What I'm guessing is going on is that on a native kernel, we get the
ENOTSUPP error immediately when we call submit_bh().  However, with
the Xen kernel, we aren't getting the error right away; we're either
getting ENOTSUPP later on, when we call wait_on_buffer().  For ext3,
this doesn't matter, since we call sync_dirty_buffer() which calls
submit_bh() and wait_on_buffer() synchronously.  But ext4 doesn't use
sync_dirty_buffer(), instead calling submit_bh() and wait_on_buffer()
separately.

This patch should be able to confirm whether or not this supposition
is correct.

						- Ted

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index bd1fad0..630196d 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -174,9 +174,16 @@ static int journal_wait_on_commit_record(struct buffer_head *bh)
 
 	clear_buffer_dirty(bh);
 	wait_on_buffer(bh);
+	if (buffer_eopnotsupp(bh)) {
+		printk("jbd2: journal_wait_on_commit_record: eopnotsupp\n");
+		ret = sync_dirty_buffer(bh);
+		printk("jbd2: sync_dirty_buffer returned %d\n", ret);
+	}
 
-	if (unlikely(!buffer_uptodate(bh)))
+	if (unlikely(!buffer_uptodate(bh))) {
+		printk("jbd2: journal_wait_on_commit_record: not uptodate\n");
 		ret = -EIO;
+	}
 	put_bh(bh);            /* One for getblk() */
 	jbd2_journal_put_journal_head(bh2jh(bh));
 

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: 2.6.28 ext4, xen and lvm volume becomes ro after snapshot
  2008-12-27  3:06             ` Theodore Tso
@ 2008-12-27  9:19               ` Andreas Sundstrom
  2009-01-02  3:11                 ` Theodore Tso
  0 siblings, 1 reply; 13+ messages in thread
From: Andreas Sundstrom @ 2008-12-27  9:19 UTC (permalink / raw)
  To: linux-kernel

Theodore Tso wrote:
> On Fri, Dec 26, 2008 at 11:00:11PM +0100, Andreas Sundstrom wrote:
>> But I enabled debugfs and did
>> "echo 3 > /sys/kernel/debug/jbd2/jbd2-debug" and reproduced the problem
>> by taking a snapshot while the system was live.
>> I hope this had the same effect as your proposed change.
> 
> Thanks, that was helpful.  Can you try applying this patch, and let me
> know whether the printk triggers?

No problem

> 
> What I'm guessing is going on is that on a native kernel, we get the
> ENOTSUPP error immediately when we call submit_bh().  However, with
> the Xen kernel, we aren't getting the error right away; we're either
> getting ENOTSUPP later on, when we call wait_on_buffer().  For ext3,
> this doesn't matter, since we call sync_dirty_buffer() which calls
> submit_bh() and wait_on_buffer() synchronously.  But ext4 doesn't use
> sync_dirty_buffer(), instead calling submit_bh() and wait_on_buffer()
> separately.
> 
> This patch should be able to confirm whether or not this supposition
> is correct.
> 
> 						- Ted
> 
> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> index bd1fad0..630196d 100644
> --- a/fs/jbd2/commit.c
> +++ b/fs/jbd2/commit.c
> @@ -174,9 +174,16 @@ static int journal_wait_on_commit_record(struct buffer_head *bh)
>  
>  	clear_buffer_dirty(bh);
>  	wait_on_buffer(bh);
> +	if (buffer_eopnotsupp(bh)) {
> +		printk("jbd2: journal_wait_on_commit_record: eopnotsupp\n");
> +		ret = sync_dirty_buffer(bh);
> +		printk("jbd2: sync_dirty_buffer returned %d\n", ret);
> +	}
>  
> -	if (unlikely(!buffer_uptodate(bh)))
> +	if (unlikely(!buffer_uptodate(bh))) {
> +		printk("jbd2: journal_wait_on_commit_record: not uptodate\n");
>  		ret = -EIO;
> +	}
>  	put_bh(bh);            /* One for getblk() */
>  	jbd2_journal_put_journal_head(bh2jh(bh));
>  

[   44.546636] blkfront: xvda1: write barrier op failed

[   44.546666] blkfront: xvda1: barriers disabled

[   44.546686] end_request: I/O error, dev xvda1, sector 5256

[   44.546710] end_request: I/O error, dev xvda1, sector 5256

[   44.548228] jbd2: journal_wait_on_commit_record: eopnotsupp

[   44.548251] jbd2: sync_dirty_buffer returned 0

[   44.548270] jbd2: journal_wait_on_commit_record: not uptodate

[   44.548293] Aborting journal on device xvda1:8.

More output here http://pastebin.com/m3694a25b

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2.6.28 ext4, xen and lvm volume becomes ro after snapshot
  2008-12-27  9:19               ` Andreas Sundstrom
@ 2009-01-02  3:11                 ` Theodore Tso
  2009-01-02  8:29                   ` Andreas Sundstrom
  0 siblings, 1 reply; 13+ messages in thread
From: Theodore Tso @ 2009-01-02  3:11 UTC (permalink / raw)
  To: Andreas Sundstrom; +Cc: linux-kernel

On Sat, Dec 27, 2008 at 10:19:52AM +0100, Andreas Sundstrom wrote:
> Theodore Tso wrote:
> > On Fri, Dec 26, 2008 at 11:00:11PM +0100, Andreas Sundstrom wrote:
> >> But I enabled debugfs and did
> >> "echo 3 > /sys/kernel/debug/jbd2/jbd2-debug" and reproduced the problem
> >> by taking a snapshot while the system was live.
> >> I hope this had the same effect as your proposed change.
> > 
> > Thanks, that was helpful.  Can you try applying this patch, and let me
> > know whether the printk triggers?
> 
> No problem
> 

Hi Andreas,

Can you try this patch?  This should (hopefully!) allow you to mount
ext4 in a Xen guest without needing to explicitly disable barriers.
If it works, it's what I plan to push to Linus.

         	     	       	      	      	 - Ted

jbd2: Add barrier not supported test to journal_wait_on_commit_record

Xen doesn't report that barriers are not supported until buffer I/O is
reported as completed, instead of when the buffer I/O is submitted.
Add a check and a fallback codepath to journal_wait_on_commit_record()
to detect this case, so that attempts to mount ext4 filesystems on
LVM/devicemapper devices on Xen guests don't blow up with an "Aborting
journal on device XXX"; "Remounting filesystem read-only" error.

Thanks to Andreas Sundstrom for reporting this issue.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index bd1fad0..4b87547 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -25,6 +25,7 @@
 #include <linux/crc32.h>
 #include <linux/writeback.h>
 #include <linux/backing-dev.h>
+#include <linux/bio.h>
 
 /*
  * Default IO end handler for temporary BJ_IO buffer_heads.
@@ -168,12 +169,34 @@ static int journal_submit_commit_record(journal_t *journal,
  * This function along with journal_submit_commit_record
  * allows to write the commit record asynchronously.
  */
-static int journal_wait_on_commit_record(struct buffer_head *bh)
+static int journal_wait_on_commit_record(journal_t *journal, 
+					 struct buffer_head *bh)
 {
 	int ret = 0;
 
+retry:
 	clear_buffer_dirty(bh);
 	wait_on_buffer(bh);
+	if (buffer_eopnotsupp(bh) && (journal->j_flags & JBD2_BARRIER)) {
+		printk(KERN_WARNING
+		       "JBD2: wait_on_commit_record: sync failed on %s - "
+		       "disabling barriers\n", journal->j_devname);
+		spin_lock(&journal->j_state_lock);
+		journal->j_flags &= ~JBD2_BARRIER;
+		spin_unlock(&journal->j_state_lock);
+
+		lock_buffer(bh);
+		clear_buffer_dirty(bh);
+		set_buffer_uptodate(bh);
+		bh->b_end_io = journal_end_buffer_io_sync;
+
+		ret = submit_bh(WRITE_SYNC, bh);
+		if (ret) {
+			unlock_buffer(bh);
+			return ret;
+		}
+		goto retry;
+	}
 
 	if (unlikely(!buffer_uptodate(bh)))
 		ret = -EIO;
@@ -802,7 +825,7 @@ wait_for_iobuf:
 			__jbd2_journal_abort_hard(journal);
 	}
 	if (!err && !is_journal_aborted(journal))
-		err = journal_wait_on_commit_record(cbh);
+		err = journal_wait_on_commit_record(journal, cbh);
 
 	if (err)
 		jbd2_journal_abort(journal, err);

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: 2.6.28 ext4, xen and lvm volume becomes ro after snapshot
  2009-01-02  3:11                 ` Theodore Tso
@ 2009-01-02  8:29                   ` Andreas Sundstrom
  2009-01-02 14:10                     ` Theodore Tso
  0 siblings, 1 reply; 13+ messages in thread
From: Andreas Sundstrom @ 2009-01-02  8:29 UTC (permalink / raw)
  To: linux-kernel

Quoting Theodore Tso <tytso@mit.edu>:

> Can you try this patch?  This should (hopefully!) allow you to mount
> ext4 in a Xen guest without needing to explicitly disable barriers.
> If it works, it's what I plan to push to Linus.
>
>          	     	       	      	      	 - Ted
>
> jbd2: Add barrier not supported test to journal_wait_on_commit_record
>
> Xen doesn't report that barriers are not supported until buffer I/O is
> reported as completed, instead of when the buffer I/O is submitted.
> Add a check and a fallback codepath to journal_wait_on_commit_record()
> to detect this case, so that attempts to mount ext4 filesystems on
> LVM/devicemapper devices on Xen guests don't blow up with an "Aborting
> journal on device XXX"; "Remounting filesystem read-only" error.
>
> Thanks to Andreas Sundstrom for reporting this issue.
>
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> ---
>
> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> index bd1fad0..4b87547 100644
> --- a/fs/jbd2/commit.c
> +++ b/fs/jbd2/commit.c
> @@ -25,6 +25,7 @@
>  #include <linux/crc32.h>
>  #include <linux/writeback.h>
>  #include <linux/backing-dev.h>
> +#include <linux/bio.h>
>
>  /*
>   * Default IO end handler for temporary BJ_IO buffer_heads.
> @@ -168,12 +169,34 @@ static int   
> journal_submit_commit_record(journal_t *journal,
>   * This function along with journal_submit_commit_record
>   * allows to write the commit record asynchronously.
>   */
> -static int journal_wait_on_commit_record(struct buffer_head *bh)
> +static int journal_wait_on_commit_record(journal_t *journal,
> +					 struct buffer_head *bh)
>  {
>  	int ret = 0;
>
> +retry:
>  	clear_buffer_dirty(bh);
>  	wait_on_buffer(bh);
> +	if (buffer_eopnotsupp(bh) && (journal->j_flags & JBD2_BARRIER)) {
> +		printk(KERN_WARNING
> +		       "JBD2: wait_on_commit_record: sync failed on %s - "
> +		       "disabling barriers\n", journal->j_devname);
> +		spin_lock(&journal->j_state_lock);
> +		journal->j_flags &= ~JBD2_BARRIER;
> +		spin_unlock(&journal->j_state_lock);
> +
> +		lock_buffer(bh);
> +		clear_buffer_dirty(bh);
> +		set_buffer_uptodate(bh);
> +		bh->b_end_io = journal_end_buffer_io_sync;
> +
> +		ret = submit_bh(WRITE_SYNC, bh);
> +		if (ret) {
> +			unlock_buffer(bh);
> +			return ret;
> +		}
> +		goto retry;
> +	}
>
>  	if (unlikely(!buffer_uptodate(bh)))
>  		ret = -EIO;
> @@ -802,7 +825,7 @@ wait_for_iobuf:
>  			__jbd2_journal_abort_hard(journal);
>  	}
>  	if (!err && !is_journal_aborted(journal))
> -		err = journal_wait_on_commit_record(cbh);
> +		err = journal_wait_on_commit_record(journal, cbh);
>
>  	if (err)
>  		jbd2_journal_abort(journal, err);
>

Well, the patch didn't work when I did "patch -i ext4.patch -p1" but I  
did it manually and the code compiles and works fine.
Here are the ext4 related kernel messages:

[    0.147721] EXT4-fs warning (device xvda1): ext4_fill_super:  
extents feature not enabled on this filesystem, use tune2fs.
[    0.148744] EXT4-fs: barriers enabled
[    0.158109] kjournald2 starting.  Commit interval 5 seconds
[    0.158131] EXT4-fs: delayed allocation enabled
[    0.158224] EXT4-fs: mballoc enabled
[    0.158235] EXT4-fs: mounted filesystem with ordered data mode.
[    0.158262] VFS: Mounted root (ext4 filesystem) readonly.
[    3.403818] EXT4 FS on xvda1, internal journal on xvda1:8
[    7.800623] blkfront: xvda1: write barrier op failed
[    7.800640] blkfront: xvda1: barriers disabled
[    7.800651] end_request: I/O error, dev xvda1, sector 4384
[    7.800664] end_request: I/O error, dev xvda1, sector 4384
[    7.800690] JBD2: wait_on_commit_record: sync failed on xvda1:8 -  
disabling barriers

# cat /proc/mounts |grep ext4
/dev/root / ext4 rw,noatime,barrier=1,noextents,data=ordered 0 0

/Andreas

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2.6.28 ext4, xen and lvm volume becomes ro after snapshot
  2009-01-02  8:29                   ` Andreas Sundstrom
@ 2009-01-02 14:10                     ` Theodore Tso
  2009-01-03 10:25                       ` Andreas Sundstrom
  0 siblings, 1 reply; 13+ messages in thread
From: Theodore Tso @ 2009-01-02 14:10 UTC (permalink / raw)
  To: Andreas Sundstrom; +Cc: linux-kernel

On Fri, Jan 02, 2009 at 09:29:02AM +0100, Andreas Sundstrom wrote:
> 
> Well, the patch didn't work when I did "patch -i ext4.patch -p1" but I  
> did it manually and the code compiles and works fine.

The patch assumed some whitespace cleanups earlier in the ext4 patch
queue, which is probably why you had to apply the patch by hand.
Thanks for trying it out!!

May I add:

Tested-by: Andreas Sundstrom <sunkan@zappa.cx>

To the patch commit log?  We've been trying trying to give credit to
those people who work with kernel developers to test fixes, but given
that it does add peoples' e-mail addresses, it's something I always
ask permission to do if that person hasn't agreed to this sort of
thing already.

Thanks again,

						- Ted

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2.6.28 ext4, xen and lvm volume becomes ro after snapshot
  2009-01-02 14:10                     ` Theodore Tso
@ 2009-01-03 10:25                       ` Andreas Sundstrom
  0 siblings, 0 replies; 13+ messages in thread
From: Andreas Sundstrom @ 2009-01-03 10:25 UTC (permalink / raw)
  To: linux-kernel

Quoting Theodore Tso <tytso@mit.edu>:

> On Fri, Jan 02, 2009 at 09:29:02AM +0100, Andreas Sundstrom wrote:
>>
>> Well, the patch didn't work when I did "patch -i ext4.patch -p1" but I
>> did it manually and the code compiles and works fine.
>
> The patch assumed some whitespace cleanups earlier in the ext4 patch
> queue, which is probably why you had to apply the patch by hand.
> Thanks for trying it out!!

I see.. no problem..

>
> May I add:
>
> Tested-by: Andreas Sundstrom <sunkan@zappa.cx>
>
> To the patch commit log?  We've been trying trying to give credit to
> those people who work with kernel developers to test fixes, but given
> that it does add peoples' e-mail addresses, it's something I always
> ask permission to do if that person hasn't agreed to this sort of
> thing already.

Yes that is ok

Thank you for looking in to the issue

/Andreas

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2009-01-03 10:25 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-12-26 11:06 2.6.28 ext4, xen and lvm volume becomes ro after snapshot Andreas Sundstrom
2008-12-26 14:07 ` Theodore Tso
2008-12-26 15:42   ` Andreas Sundstrom
2008-12-26 18:21     ` Theodore Tso
2008-12-26 18:48       ` Andreas Sundstrom
2008-12-26 19:33         ` Theodore Tso
2008-12-26 22:00           ` Andreas Sundstrom
2008-12-27  3:06             ` Theodore Tso
2008-12-27  9:19               ` Andreas Sundstrom
2009-01-02  3:11                 ` Theodore Tso
2009-01-02  8:29                   ` Andreas Sundstrom
2009-01-02 14:10                     ` Theodore Tso
2009-01-03 10:25                       ` Andreas Sundstrom

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).