* linux-next regression: IO errors with ext4 and xen-blkfront
@ 2010-10-21  0:04 Jeremy Fitzhardinge
  2010-10-21  0:09   ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2010-10-21  0:04 UTC (permalink / raw)
  To: Jens Axboe, Andreas Dilger, Theodore Ts'o
  Cc: Linux Kernel Mailing List, Xen-devel

 Hi,

When doing some regression testing with Xen on linux-next, I'm finding
that my domains are failing to get through the boot sequence due to IO
errors:

Remounting root filesystem in read-write mode:  EXT4-fs (dm-0): re-mounted. Opts: (null)
[  OK  ]
Mounting local filesystems:  EXT3-fs: barriers not enabled
kjournald starting.  Commit interval 5 seconds
EXT3-fs (xvda1): using internal journal
EXT3-fs (xvda1): mounted filesystem with writeback data mode
SELinux: initialized (dev xvda1, type ext3), uses xattr
SELinux: initialized (dev xenfs, type xenfs), uses genfs_contexts
[  OK  ]
Enabling local filesystem quotas:  [  OK  ]
Enabling /etc/fstab swaps:  Adding 917500k swap on /dev/mapper/vg_f1364-lv_swap.  Priority:-1 extents:1 across:917500k 
[  OK  ]
SELinux: initialized (dev binfmt_misc, type binfmt_misc), uses genfs_contexts
Entering non-interactive startup
Starting monitoring for VG vg_f1364:   2 logical volume(s) in volume group "vg_f1364" monitored
[  OK  ]
ip6tables: Applying firewall rules: [  OK  ]
iptables: Applying firewall rules: [  OK  ]
Bringing up loopback interface:  [  OK  ]
Bringing up interface eth0:  
Determining IP information for eth0... done.
[  OK  ]
Starting auditd: [  OK  ]
end_request: I/O error, dev xvda, sector 0
end_request: I/O error, dev xvda, sector 0
end_request: I/O error, dev xvda, sector 9675936
Aborting journal on device dm-0-8.
Starting portreserve: EXT4-fs error (device dm-0): ext4_journal_start_sb:259: Detected aborted journal
EXT4-fs (dm-0): Remounting filesystem read-only
[  OK  ]
Starting system logger: EXT4-fs (dm-0): error count: 4
EXT4-fs (dm-0): initial error at 1286479997: ext4_journal_start_sb:251
EXT4-fs (dm-0): last error at 1287618175: ext4_journal_start_sb:259


I haven't tried to bisect this yet (which will be awkward because
linux-next has also introduced various Xen boot-crashing bugs), but I
wonder if you have any thoughts about what may be happening here.  I
guess an obvious candidate is the barrier changes in the storage
subsystem, but I still get the same errors if I mount root with barrier=0.

Current linux-2.6 mainline is fine, so the problem is in some of the
patches targeted at the next merge window.

Thanks,
    J

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: linux-next regression: IO errors with ext4 and xen-blkfront
  2010-10-21  0:04 linux-next regression: IO errors with ext4 and xen-blkfront Jeremy Fitzhardinge
@ 2010-10-21  0:09   ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2010-10-21  0:09 UTC (permalink / raw)
  To: Jens Axboe, Andreas Dilger, Theodore Ts'o
  Cc: Linux Kernel Mailing List, Xen-devel

 On 10/20/2010 05:04 PM, Jeremy Fitzhardinge wrote:
>  Hi,
>
> When doing some regression testing with Xen on linux-next, I'm finding
> that my domains are failing to get through the boot sequence due to IO
> errors:
>
> Remounting root filesystem in read-write mode:  EXT4-fs (dm-0): re-mounted. Opts: (null)
> [  OK  ]
> Mounting local filesystems:  EXT3-fs: barriers not enabled
> kjournald starting.  Commit interval 5 seconds
> EXT3-fs (xvda1): using internal journal
> EXT3-fs (xvda1): mounted filesystem with writeback data mode
> SELinux: initialized (dev xvda1, type ext3), uses xattr
> SELinux: initialized (dev xenfs, type xenfs), uses genfs_contexts
> [  OK  ]
> Enabling local filesystem quotas:  [  OK  ]
> Enabling /etc/fstab swaps:  Adding 917500k swap on /dev/mapper/vg_f1364-lv_swap.  Priority:-1 extents:1 across:917500k 
> [  OK  ]
> SELinux: initialized (dev binfmt_misc, type binfmt_misc), uses genfs_contexts
> Entering non-interactive startup
> Starting monitoring for VG vg_f1364:   2 logical volume(s) in volume group "vg_f1364" monitored
> [  OK  ]
> ip6tables: Applying firewall rules: [  OK  ]
> iptables: Applying firewall rules: [  OK  ]
> Bringing up loopback interface:  [  OK  ]
> Bringing up interface eth0:  
> Determining IP information for eth0... done.
> [  OK  ]
> Starting auditd: [  OK  ]
> end_request: I/O error, dev xvda, sector 0
> end_request: I/O error, dev xvda, sector 0
> end_request: I/O error, dev xvda, sector 9675936
> Aborting journal on device dm-0-8.
> Starting portreserve: EXT4-fs error (device dm-0): ext4_journal_start_sb:259: Detected aborted journal
> EXT4-fs (dm-0): Remounting filesystem read-only
> [  OK  ]
> Starting system logger: EXT4-fs (dm-0): error count: 4
> EXT4-fs (dm-0): initial error at 1286479997: ext4_journal_start_sb:251
> EXT4-fs (dm-0): last error at 1287618175: ext4_journal_start_sb:259
>
>
> I haven't tried to bisect this yet (which will be awkward because
> linux-next had also introduced various Xen bootcrashing bugs), but I
> wonder if you have any thoughts about what may be happening here.  I
> guess an obvious candidate is the barrier changes in the storage
> subsystem, but I still get the same errors if I mount root with barrier=0.

Hm.  I get the same errors, but the system boots to login prompt rather
than hanging at that point above, and seems generally happy.  So perhaps
barriers are the key.

> Current linux-2.6 mainline is fine, so the problem is in some of the
> patches targeted at the next merge window.
>


Thanks,
    J

* Re: linux-next regression: IO errors with ext4 and xen-blkfront
  2010-10-21  0:09   ` Jeremy Fitzhardinge
@ 2010-10-22  8:18     ` Jens Axboe
  -1 siblings, 0 replies; 18+ messages in thread
From: Jens Axboe @ 2010-10-22  8:18 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Andreas Dilger, Theodore Ts'o, Linux Kernel Mailing List, Xen-devel

On 2010-10-21 02:09, Jeremy Fitzhardinge wrote:
>  On 10/20/2010 05:04 PM, Jeremy Fitzhardinge wrote:
>>  Hi,
>>
>> When doing some regression testing with Xen on linux-next, I'm finding
>> that my domains are failing to get through the boot sequence due to IO
>> errors:
>>
>> Remounting root filesystem in read-write mode:  EXT4-fs (dm-0): re-mounted. Opts: (null)
>> [  OK  ]
>> Mounting local filesystems:  EXT3-fs: barriers not enabled
>> kjournald starting.  Commit interval 5 seconds
>> EXT3-fs (xvda1): using internal journal
>> EXT3-fs (xvda1): mounted filesystem with writeback data mode
>> SELinux: initialized (dev xvda1, type ext3), uses xattr
>> SELinux: initialized (dev xenfs, type xenfs), uses genfs_contexts
>> [  OK  ]
>> Enabling local filesystem quotas:  [  OK  ]
>> Enabling /etc/fstab swaps:  Adding 917500k swap on /dev/mapper/vg_f1364-lv_swap.  Priority:-1 extents:1 across:917500k 
>> [  OK  ]
>> SELinux: initialized (dev binfmt_misc, type binfmt_misc), uses genfs_contexts
>> Entering non-interactive startup
>> Starting monitoring for VG vg_f1364:   2 logical volume(s) in volume group "vg_f1364" monitored
>> [  OK  ]
>> ip6tables: Applying firewall rules: [  OK  ]
>> iptables: Applying firewall rules: [  OK  ]
>> Bringing up loopback interface:  [  OK  ]
>> Bringing up interface eth0:  
>> Determining IP information for eth0... done.
>> [  OK  ]
>> Starting auditd: [  OK  ]
>> end_request: I/O error, dev xvda, sector 0
>> end_request: I/O error, dev xvda, sector 0
>> end_request: I/O error, dev xvda, sector 9675936
>> Aborting journal on device dm-0-8.
>> Starting portreserve: EXT4-fs error (device dm-0): ext4_journal_start_sb:259: Detected aborted journal
>> EXT4-fs (dm-0): Remounting filesystem read-only
>> [  OK  ]
>> Starting system logger: EXT4-fs (dm-0): error count: 4
>> EXT4-fs (dm-0): initial error at 1286479997: ext4_journal_start_sb:251
>> EXT4-fs (dm-0): last error at 1287618175: ext4_journal_start_sb:259
>>
>>
>> I haven't tried to bisect this yet (which will be awkward because
>> linux-next had also introduced various Xen bootcrashing bugs), but I
>> wonder if you have any thoughts about what may be happening here.  I
>> guess an obvious candidate is the barrier changes in the storage
>> subsystem, but I still get the same errors if I mount root with barrier=0.
> 
> Hm.  I get the same errors, but the system boots to login prompt rather
> than hanging at that point above, and seems generally happy.  So perhaps
> barriers are the key.

To test that theory, can you try and pull the two other main bits of the
pending block patches and see if it works?

git://git.kernel.dk/linux-2.6-block.git for-2.6.37/core
git://git.kernel.dk/linux-2.6-block.git for-2.6.37/drivers

and if that works, then pull

git://git.kernel.dk/linux-2.6-block.git for-2.6.37/barrier

and see how that fares.

-- 
Jens Axboe


* Re: linux-next regression: IO errors with ext4 and xen-blkfront
  2010-10-22  8:18     ` Jens Axboe
  (?)
@ 2010-10-22  8:29     ` Christoph Hellwig
  2010-10-22  8:54       ` Jens Axboe
  2010-10-25 18:26       ` [Xen-devel] " Konrad Rzeszutek Wilk
  -1 siblings, 2 replies; 18+ messages in thread
From: Christoph Hellwig @ 2010-10-22  8:29 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Jeremy Fitzhardinge, Andreas Dilger, Theodore Ts'o,
	Linux Kernel Mailing List, Xen-devel

In the barriers tree Xen claims to support flushes, but it doesn't.
It never handles REQ_FLUSH requests.  Try commenting out the

	blk_queue_flush(info->rq, info->feature_flush);

call and things should improve.  I still need to hear back from the Xen
folks on how to actually implement a cache flush - they only implement
a barrier write primitive, which could never express an empty
cache flush.  Up to current kernels that meant it would implement
barrier writes with content correctly and silently ignore empty barriers,
leading to very interesting data integrity bugs.  From 2.6.37 onwards
it simply won't work anymore at all, which is at least consistent
(modulo the bug of actually claiming to support flushes).



* Re: linux-next regression: IO errors with ext4 and xen-blkfront
  2010-10-22  8:29     ` Christoph Hellwig
@ 2010-10-22  8:54       ` Jens Axboe
  2010-10-22  8:56         ` Christoph Hellwig
  2010-10-25 18:26       ` [Xen-devel] " Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2010-10-22  8:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jeremy Fitzhardinge, Andreas Dilger, Theodore Ts'o,
	Linux Kernel Mailing List, Xen-devel

On 2010-10-22 10:29, Christoph Hellwig wrote:
> In the barriers tree Xen claims to support flushes, but it doesn't.
> It never handles REQ_FLUSH requests.  Try commenting out the
> 
> 	blk_queue_flush(info->rq, info->feature_flush);
> 
> call and things should improve.  I still need to hear back from the Xen
> folks on how to actually implement a cache flush - they only implement
> a barrier write primitive, which could never express an empty
> cache flush.  Up to current kernels that meant it would implement
> barrier writes with content correctly and silently ignore empty barriers,
> leading to very interesting data integrity bugs.  From 2.6.37 onwards
> it simply won't work anymore at all, which is at least consistent
> (modulo the bug of actually claiming to support flushes).

So how about we just disable barriers for Xen atm? I would really really
like to push that branch out as well now, since I'll be travelling for
most of the merge window this time.

-- 
Jens Axboe



* Re: linux-next regression: IO errors with ext4 and xen-blkfront
  2010-10-22  8:54       ` Jens Axboe
@ 2010-10-22  8:56         ` Christoph Hellwig
  2010-10-22  8:57           ` Jens Axboe
  0 siblings, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2010-10-22  8:56 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, Jeremy Fitzhardinge, Andreas Dilger,
	Theodore Ts'o, Linux Kernel Mailing List, Xen-devel

On Fri, Oct 22, 2010 at 10:54:54AM +0200, Jens Axboe wrote:
> So how about we just disable barriers for Xen atm? I would really really
> like to push that branch out as well now, since I'll be travelling for
> most of the merge window this time.

Yes, that's what removing/commenting out that line does.



* Re: linux-next regression: IO errors with ext4 and xen-blkfront
  2010-10-22  8:56         ` Christoph Hellwig
@ 2010-10-22  8:57           ` Jens Axboe
  2010-10-22  9:20             ` Christoph Hellwig
  0 siblings, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2010-10-22  8:57 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jeremy Fitzhardinge, Andreas Dilger, Theodore Ts'o,
	Linux Kernel Mailing List, Xen-devel

On 2010-10-22 10:56, Christoph Hellwig wrote:
> On Fri, Oct 22, 2010 at 10:54:54AM +0200, Jens Axboe wrote:
>> So how about we just disable barriers for Xen atm? I would really really
>> like to push that branch out as well now, since I'll be travelling for
>> most of the merge window this time.
> 
> Yes, that's what removing/commenting out that line does.

Certainly, but I meant in the barrier branch for submission. If
it doesn't do empty flushes to begin with, that should be fixed
up before being enabled in any case.

I'll disable barrier support in xen-blkfront.c for now.

-- 
Jens Axboe



* Re: linux-next regression: IO errors with ext4 and xen-blkfront
  2010-10-22  8:57           ` Jens Axboe
@ 2010-10-22  9:20             ` Christoph Hellwig
  0 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2010-10-22  9:20 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, Jeremy Fitzhardinge, Andreas Dilger,
	Theodore Ts'o, Linux Kernel Mailing List, Xen-devel

On Fri, Oct 22, 2010 at 10:57:40AM +0200, Jens Axboe wrote:
> On 2010-10-22 10:56, Christoph Hellwig wrote:
> > On Fri, Oct 22, 2010 at 10:54:54AM +0200, Jens Axboe wrote:
> >> So how about we just disable barriers for Xen atm? I would really really
> >> like to push that branch out as well now, since I'll be travelling for
> >> most of the merge window this time.
> > 
> > Yes, that's what removing/commenting out that line does.
> 
> Certainly, but I meant in the barrier branch for submission. If
> it doesn't do empty flushes to begin with, that should be fixed
> up before being enabled in any case.

Yes, it should have been disabled long ago.  I had a long discussion with
them about it when they introduced the even more buggy barriers-by-tag
mode for .36, but they simply ignored it.

> I'll disable barrier support in xen-blkfront.c for now.

Thanks.



* Re: [Xen-devel] Re: linux-next regression: IO errors with ext4 and xen-blkfront
  2010-10-22  8:29     ` Christoph Hellwig
  2010-10-22  8:54       ` Jens Axboe
@ 2010-10-25 18:26       ` Konrad Rzeszutek Wilk
  2010-10-25 18:47           ` Christoph Hellwig
  1 sibling, 1 reply; 18+ messages in thread
From: Konrad Rzeszutek Wilk @ 2010-10-25 18:26 UTC (permalink / raw)
  To: Christoph Hellwig, daniel.stodden
  Cc: Jens Axboe, Theodore Ts'o, Jeremy Fitzhardinge, Xen-devel,
	Andreas Dilger, Linux Kernel Mailing List

On Fri, Oct 22, 2010 at 04:29:16AM -0400, Christoph Hellwig wrote:
> In the barriers tree Xen claims to support flushes, but it doesn't.
> It never handles REQ_FLUSH requests.  Try commenting out the
> 
> 	blk_queue_flush(info->rq, info->feature_flush);
> 
> call and things should improve.  I still need to hear back from the Xen
> folks on how to actually implement a cache flush - they only implement

I think we just blindly assume that we would pass the request
to the backend. And if the backend is running under an ancient
version (2.6.18), the behavior would be quite different.

Perhaps we should negotiate with the backend whether it runs
under a kernel with the new barrier support? And if so, then enable
them? If the backend says it has no idea what we are talking about then
disable the barrier support?

How does that sound?

(Adding Daniel to this email thread as he has much more experience
than I do).

Daniel, what about the "use tagged queuing for barriers" patch you wrote
some time ago? Is it applicable to this issue?

> a barrier write primitive, which could never express an empty
> cache flush.  Up to current kernels that meant it would implement
> barrier writes with content correctly and silently ignore empty barriers
> leading to very interesting data integrity bugs.  From 2.6.37 onwards
> it simply won't work anymore at all, which is at least consistent
> (modulo the bug of actually claiming to support flushes).


* Re: [Xen-devel] Re: linux-next regression: IO errors with ext4 and xen-blkfront
  2010-10-25 18:26       ` [Xen-devel] " Konrad Rzeszutek Wilk
@ 2010-10-25 18:47           ` Christoph Hellwig
  0 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2010-10-25 18:47 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Christoph Hellwig, daniel.stodden, Jens Axboe, Theodore Ts'o,
	Jeremy Fitzhardinge, Xen-devel, Andreas Dilger,
	Linux Kernel Mailing List

On Mon, Oct 25, 2010 at 02:26:30PM -0400, Konrad Rzeszutek Wilk wrote:
> I think we just blindly assume that we would pass the request
> to the backend. And if the backend is running under an ancient
> version (2.6.18), the behavior would be quite different.

I don't think this has much to do with the backend.  Xen never
implemented empty barriers correctly.  This has been a bug since day
one, although before no one noticed because the cruft in the old
barrier code made them look like they succeed without them actually
succeeding.  With the new barrier code you do get an error back for
them - and you do get them more often because cache flushes aka
empty barriers are the only thing we send now.

The right fix is to add a cache flush command to the protocol which
will do the right things for all guests.  In fact I read on a netbsd
list that they had to add exactly that command to get their cache flushes
to work, so it must exist in some versions of the backends.


* Re: [Xen-devel] Re: linux-next regression: IO errors with ext4 and xen-blkfront
  2010-10-25 18:47           ` Christoph Hellwig
@ 2010-10-25 19:05             ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 18+ messages in thread
From: Konrad Rzeszutek Wilk @ 2010-10-25 19:05 UTC (permalink / raw)
  To: Christoph Hellwig, daniel.stodden
  Cc: Jens Axboe, Jeremy Fitzhardinge, Xen-devel, Theodore Ts'o,
	Linux Kernel Mailing List, Andreas Dilger, daniel.stodden

On Mon, Oct 25, 2010 at 02:47:56PM -0400, Christoph Hellwig wrote:
> On Mon, Oct 25, 2010 at 02:26:30PM -0400, Konrad Rzeszutek Wilk wrote:
> > I think we just blindly assume that we would pass the request
> > to the backend. And if the backend is running under an ancient
> > version (2.6.18), the behavior would be quite different.
> 
> I don't think this has much to do with the backend.  Xen never
> implemented empty barriers correctly.  This has been a bug since day
> one, although before no one noticed because the cruft in the old
> barrier code made them look like they succeed without them actually
> succeeding.  With the new barrier code you do get an error back for
> them - and you do get them more often because cache flushes aka
> empty barriers are the only thing we send now.
> 
> The right fix is to add a cache flush command to the protocol which
> > will do the right things for all guests.  In fact I read on a netbsd
> > list that they had to add exactly that command to get their cache flushes
> > to work, so it must exist in some versions of the backends.

Ok, thank you for the pointer.

Daniel, you are the resident expert, what do you say?

Jens, for 2.6.37, is the patch disabling write-barrier support
in xen-blkfront the way to go?

Or, if we came up with a patch now, could it potentially make it into
2.6.37-rcX?  (I don't know whether the fix would qualify as a bug fix
or a regression fix, since it looks to require adding a new command.)
And Christoph suggests this has been broken in v2.6.36, v2.6.35, etc.,
which would definitely put it outside the regression definition.

* Re: [Xen-devel] Re: linux-next regression: IO errors with ext4 and xen-blkfront
  2010-10-25 19:05             ` Konrad Rzeszutek Wilk
@ 2010-10-26 12:49               ` Daniel Stodden
  -1 siblings, 0 replies; 18+ messages in thread
From: Daniel Stodden @ 2010-10-26 12:49 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Christoph Hellwig, Jens Axboe, Jeremy Fitzhardinge, Xen-devel,
	Theodore Ts'o, Linux Kernel Mailing List, Andreas Dilger

On Mon, 2010-10-25 at 15:05 -0400, Konrad Rzeszutek Wilk wrote:
> On Mon, Oct 25, 2010 at 02:47:56PM -0400, Christoph Hellwig wrote:
> > On Mon, Oct 25, 2010 at 02:26:30PM -0400, Konrad Rzeszutek Wilk wrote:
> > > I think we just blindly assume that we would pass the request
> > > to the backend. And if the backend is running under an ancient
> > > version (2.6.18), the behavior would be quite different.
> > 
> > I don't think this has much to do with the backend.  Xen never
> > implemented empty barriers correctly.  This has been a bug since day
> > one, although before no one noticed because the cruft in the old
> > barrier code made them look like they succeed without them actually
> > succeeding.  With the new barrier code you do get an error back for
> > them - and you do get them more often because cache flushes aka
> > empty barriers are the only thing we send now.
> > 
> > The right fix is to add a cache flush command to the protocol which
> > will do the right things for all guests.  In fact I read on a netbsd
> > list that they had to add exactly that command to get their cache flushes
> > to work, so it must exist in some versions of the backends.
> 
> Ok, thank you for the pointer.
> 
> Daniel, you are the resident expert, what do you say?
> 
> Jens, for 2.6.37, is the patch disabling write barrier support
> in xen-blkfront the way to do it?

This thread is not just about a single command; it's about two entirely
different models.

Let's try to approach it like this: I don't see the point in adding a
dedicated command for the above. You want the backend to issue a cache
flush. As far as the current ring model is concerned, you can express
this as an empty barrier write, or you can add a dedicated op (which is
an empty request with a fancier name). That's fairly boring.

That's leaving aside bugginess in how Linux drivers / kernel versions
realize this, whether in the front- or backend.

Next, discussions get more entertaining when you go on to redefine your
use of the term 'barrier' to mean 'cache flush'. I think that marked
the end of the previous thread; I've seen discussions like this before.
That is, you remove the ordering constraint, which is exactly what
differentiates barriers from mere cache flushes.

The crux is moving to a model where an ordered write requires a queue
drain by the guest. That's somewhat more low-level and for many disks
more realistic, but it's also awkward for a virtualization layer,
compared to ordered/durable writes. 

One thing it gets you is more latency from stalling the request
stream, plus extra events to kick things off again (OK, not that the
difference is huge).

The more general reason I'd be reluctant to move from barriers to a
caching/flushing/non-ordering disk model is questions like: why would a
frontend even want to know whether a disk is cached, or have to assume so?
Letting the backend alone deal with it is less overhead across different
guest systems, gets enforced in the right place, and avoids a rathole
full of compat headaches later on.

The barrier model is relatively straightforward to implement, even when
it doesn't map to the backend queue anymore. The backend will need to
translate to queue draining and cache flushes as needed by the device
then. That's a state machine, but a small one, and not exactly a new
idea.

Furthermore: If the backend ever gets to start dealing with that entire
cache write durability thing *properly*, we need synchronization across
backend groups sharing a common physical layer anyway, to schedule and
merge barrier points etc. That's a bigger state machine, but derives
from the one above. From there on, any effort spent on trying to
'simplify' things by imposing explicit drain/flush on frontends will
look rather embarrassing.

Unless Xen is just a fancy way to run Linux on Linux on a flat
partition, I'd rather like to see the barrier model stay, blkback fixed,
frontend cache flushes mapped to empty barriers. In the long run, the
simpler model is the least expensive one.

Daniel

> Or if we came up with a patch now would it potentially make it in
> 2.6.37-rcX (I don't know if the fix for this would qualify as a bug
> or regression since it looks to be adding a new command)? And what
> Christoph suggest that this has been in v2.6.36, v2.6.35, etc. so that
> would definitly but it outside the regression definition.



^ permalink raw reply	[flat|nested] 18+ messages in thread


* Re: [Xen-devel] Re: linux-next regression: IO errors in with ext4 and xen-blkfront
  2010-10-26 12:49               ` Daniel Stodden
  (?)
@ 2010-10-27 10:23               ` Christoph Hellwig
  -1 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2010-10-27 10:23 UTC (permalink / raw)
  To: Daniel Stodden
  Cc: Konrad Rzeszutek Wilk, Jens Axboe, Jeremy Fitzhardinge,
	Xen-devel, Linux Kernel Mailing List

I'm really not interested in getting into this flamewar again.

If you want to make Xen block devices work reliably, you need to implement
a cache flush primitive in the driver.  If your cache flush primitive
also enforces ordering, that's fine for data integrity, but it won't
help your performance.

Note that currently the _driver_ does not implement cache flushes
correctly, which is what started this thread and the previous flamewar.
If you can fix it using the existing primitive with just driver changes
that's fine - but according to
http://mail-index.netbsd.org/port-xen/2010/09/24/msg006274.html at least
the NetBSD people didn't think so.

For details on the implementation, refer to the
Documentation/block/writeback_cache_control.txt file in the kernel tree;
for the reasons why we got rid of barriers with their synchronization
semantics, refer to various threads on -fsdevel and lkml during the past
couple of months (search your favourite archive for barriers).


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2010-10-27 10:23 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-21  0:04 linux-next regression: IO errors in with ext4 and xen-blkfront Jeremy Fitzhardinge
2010-10-21  0:09 ` Jeremy Fitzhardinge
2010-10-21  0:09   ` Jeremy Fitzhardinge
2010-10-22  8:18   ` Jens Axboe
2010-10-22  8:18     ` Jens Axboe
2010-10-22  8:29     ` Christoph Hellwig
2010-10-22  8:54       ` Jens Axboe
2010-10-22  8:56         ` Christoph Hellwig
2010-10-22  8:57           ` Jens Axboe
2010-10-22  9:20             ` Christoph Hellwig
2010-10-25 18:26       ` [Xen-devel] " Konrad Rzeszutek Wilk
2010-10-25 18:47         ` Christoph Hellwig
2010-10-25 18:47           ` Christoph Hellwig
2010-10-25 19:05           ` [Xen-devel] " Konrad Rzeszutek Wilk
2010-10-25 19:05             ` Konrad Rzeszutek Wilk
2010-10-26 12:49             ` [Xen-devel] " Daniel Stodden
2010-10-26 12:49               ` Daniel Stodden
2010-10-27 10:23               ` [Xen-devel] " Christoph Hellwig
