* blkfront problem in pvops kernel when barriers enabled
@ 2011-09-04 10:49 Marek Marczykowski
  2011-09-06 16:32 ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 16+ messages in thread
From: Marek Marczykowski @ 2011-09-04 10:49 UTC (permalink / raw)
  To: xen-devel



Hello,

Pvops block frontend (tested vanilla 3.0.3, 3.1rc2, Konrad's testing
branch) produces a lot of I/O errors when barriers are enabled but
cannot be used.

On xenlinux I've got message:
[   15.036921] blkfront: xvdb: empty write barrier op failed
[   15.036936] blkfront: xvdb: barriers disabled

and after that, everything works fine. On pvops - I/O errors.
As backend I've used 2.6.38.3 xenlinux (based on SUSE package) and
3.1rc2 with the same result.

When I disable barriers (patching blkbackend to set feature-barrier=0)
everything works fine with all above versions.
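For reference, what a backend advertises can also be checked from dom0 without patching anything; a sketch (the domid/devid in the paths are made-up examples, not from this report):

```shell
# List block backends to find the real <domid>/<devid> path first:
xenstore-ls /local/domain/0/backend/vbd
# Then read what the backend advertised (example path, adjust to yours):
xenstore-read /local/domain/0/backend/vbd/3/51712/feature-barrier
xenstore-read /local/domain/0/backend/vbd/3/51712/feature-flush-cache
```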

My setup is xen-4.1.1 (if it matters), backends: phy from device-mapper
device and phy from loop device; frontends covered by device-mapper
snapshot, which is set up in domU initramfs.

It looks like some race condition, because when I set up device-mapper in
domU and mount it manually (which causes some delays between steps), it
works fine...

Do you have any idea why it happens? What additional data can I provide to
debug it?

In addition it should be possible to disable barriers without patching
the module... Perhaps some pciback module parameter? Or leave feature-*
xenstore entries alone if present before device initialization.

-- 
Pozdrawiam / Best Regards,
Marek Marczykowski         | RLU #390519
marmarek at mimuw edu pl   | xmpp:marmarek at staszic waw pl



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: blkfront problem in pvops kernel when barriers enabled
  2011-09-04 10:49 blkfront problem in pvops kernel when barriers enabled Marek Marczykowski
@ 2011-09-06 16:32 ` Konrad Rzeszutek Wilk
  2011-09-06 16:55   ` Konrad Rzeszutek Wilk
  2011-09-06 17:16   ` Marek Marczykowski
  0 siblings, 2 replies; 16+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-09-06 16:32 UTC (permalink / raw)
  To: Marek Marczykowski; +Cc: xen-devel

On Sun, Sep 04, 2011 at 12:49:42PM +0200, Marek Marczykowski wrote:
> Hello,
> 
> Pvops block frontend (tested vanilla 3.0.3, 3.1rc2, Konrad's testing
> branch) produces a lot of I/O errors when barriers are enabled but
> cannot be used.
> 
> On xenlinux I've got message:
> [   15.036921] blkfront: xvdb: empty write barrier op failed
> [   15.036936] blkfront: xvdb: barriers disabled
> 
> and after that, everything works fine. On pvops - I/O errors.
> As backend I've used 2.6.38.3 xenlinux (based on SUSE package) and
> 3.1rc2 with the same result.

Hm, and the 'feature-barrier' was enabled in those backends?
That is really bizarre considering that those backends don't actually
support WRITE_BARRIER anymore.

> 
> When I disable barriers (patching blkbackend to set feature-barrier=0)
> everything works fine with all above versions.

Ok, and the patch you sent "[PATCH] Initialize vars in blkfront_connect"
as well?

> 
> My setup is xen-4.1.1 (if it matters), backends: phy from device-mapper
> device and phy from loop device; frontends covered by device-mapper
> snapshot, which is set up in domU initramfs.
> 
> It looks like some race condition, because when I set up device-mapper in
> domU and mount it manually (which causes some delays between steps), it
> works fine...
> 
> Do you have any idea why it happens? What additional data can I provide to
> debug it?
> 
> In addition it should be possible to disable barriers without patching
> the module... Perhaps some pciback module parameter? Or leave feature-*

Not sure why you would touch pciback.. But the barrier should _not_
be enabled in those backends. The 'feature-flush-cache' should be.


> xenstore entries alone if present before device initialization.
> 
> -- 
> Pozdrawiam / Best Regards,
> Marek Marczykowski         | RLU #390519
> marmarek at mimuw edu pl   | xmpp:marmarek at staszic waw pl
> 




* Re: blkfront problem in pvops kernel when barriers enabled
  2011-09-06 16:32 ` Konrad Rzeszutek Wilk
@ 2011-09-06 16:55   ` Konrad Rzeszutek Wilk
  2011-09-06 17:47     ` Marek Marczykowski
  2011-09-06 17:16   ` Marek Marczykowski
  1 sibling, 1 reply; 16+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-09-06 16:55 UTC (permalink / raw)
  To: Marek Marczykowski, JBeulich; +Cc: xen-devel

On Tue, Sep 06, 2011 at 12:32:13PM -0400, Konrad Rzeszutek Wilk wrote:
> On Sun, Sep 04, 2011 at 12:49:42PM +0200, Marek Marczykowski wrote:
> > Hello,
> > 
> > Pvops block frontend (tested vanilla 3.0.3, 3.1rc2, Konrad's testing
> > branch) produces a lot of I/O errors when barriers are enabled but
> > cannot be used.
> > 
> > On xenlinux I've got message:
> > [   15.036921] blkfront: xvdb: empty write barrier op failed
> > [   15.036936] blkfront: xvdb: barriers disabled
> > 
> > and after that, everything works fine. On pvops - I/O errors.
> > As backend I've used 2.6.38.3 xenlinux (based on SUSE package) and
> > 3.1rc2 with the same result.
> 
> Hm, and the 'feature-barrier' was enabled in those backends?
> That is really bizarre considering that those backends don't actually
> support WRITE_BARRIER anymore.

To be exact:
http://lwn.net/Articles/399715/ so around the 2.6.37 era the WRITE_BARRIER
functionality got ripped out.

And the LSF summit in 2010 had more details:
http://lwn.net/Articles/399148/
"That led, eventually, to one of the clearest decisions in the first
day of the summit: barriers, as such, will be no more."

And WRITE_BARRIER != WRITE_FLUSH, so if the SuSE backend is using one
as the other - then there is a bug in there.

In the 3.1-rc2 upstream kernel there should be absolutely no hint
of 'feature-barrier' in the _backend_ code (it is OK for it to be
in the frontend code).

Can you confirm where you got your sources?

P.S.
There should be a backwards-compatible way of implementing the
'feature-barrier' in the block backend of 3.0 and later kernels...
but nobody has stepped up to implement it.

Also, one more thing - are you sure you are using the block backend?
You might be using the QEMU qdisk?


* Re: blkfront problem in pvops kernel when barriers enabled
  2011-09-06 16:32 ` Konrad Rzeszutek Wilk
  2011-09-06 16:55   ` Konrad Rzeszutek Wilk
@ 2011-09-06 17:16   ` Marek Marczykowski
  2011-09-07  1:47     ` Konrad Rzeszutek Wilk
  2011-12-01 19:08     ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 16+ messages in thread
From: Marek Marczykowski @ 2011-09-06 17:16 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel



On 06.09.2011 18:32, Konrad Rzeszutek Wilk wrote:
> On Sun, Sep 04, 2011 at 12:49:42PM +0200, Marek Marczykowski wrote:
>> Hello,
>>
>> Pvops block frontend (tested vanilla 3.0.3, 3.1rc2, Konrad's testing
>> branch) produces a lot of I/O errors when barriers are enabled but
>> cannot be used.
>>
>> On xenlinux I've got message:
>> [   15.036921] blkfront: xvdb: empty write barrier op failed
>> [   15.036936] blkfront: xvdb: barriers disabled
>>
>> and after that, everything works fine. On pvops - I/O errors.
>> As backend I've used 2.6.38.3 xenlinux (based on SUSE package) and
>> 3.1rc2 with the same result.
> 
> Hm, and the 'feature-barrier' was enabled in those backends?
> That is really bizarre considering that those backends don't actually
> support WRITE_BARRIER anymore.

At least in 2.6.38.3 xenlinux  (SUSE). Now I'm not sure if 3.1rc2 also
needed this modification (can't find it now).

>> When I disable barriers (patching blkbackend to set feature-barrier=0)
>> everything works fine with all above versions.
> 
> Ok, and the patch you sent "[PATCH] Initialize vars in blkfront_connect"
> as well?

Yes.
I've noticed now that this patch was needed only on your testing branch
(not vanilla kernel).

>> My setup is xen-4.1.1 (if it matters), backends: phy from device-mapper
>> device and phy from loop device; frontends covered by device-mapper
>> snapshot, which is set up in domU initramfs.
>>
>> It looks like some race condition, because when I set up device-mapper in
>> domU and mount it manually (which causes some delays between steps), it
>> works fine...
>>
>> Do you have any idea why it happens? What additional data can I provide to
>> debug it?
>>
>> In addition it should be possible to disable barriers without patching
>> the module... Perhaps some pciback module parameter? Or leave feature-*
> 
> Not sure why you would touch pciback.. 

I mean blkback of course.

> But the barrier should _not_
> be enabled in those backends. The 'feature-flush-cache' should be.

(on 3.1rc2) Looking to xenstore now there is 'feature-flush-cache=1' and
no 'feature-barrier'. So it is ok.

-- 
Pozdrawiam / Best Regards,
Marek Marczykowski         | RLU #390519
marmarek at mimuw edu pl   | xmpp:marmarek at staszic waw pl



* Re: blkfront problem in pvops kernel when barriers enabled
  2011-09-06 16:55   ` Konrad Rzeszutek Wilk
@ 2011-09-06 17:47     ` Marek Marczykowski
  0 siblings, 0 replies; 16+ messages in thread
From: Marek Marczykowski @ 2011-09-06 17:47 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, JBeulich



On 06.09.2011 18:55, Konrad Rzeszutek Wilk wrote:
> On Tue, Sep 06, 2011 at 12:32:13PM -0400, Konrad Rzeszutek Wilk wrote:
>> On Sun, Sep 04, 2011 at 12:49:42PM +0200, Marek Marczykowski wrote:
>>> Hello,
>>>
>>> Pvops block frontend (tested vanilla 3.0.3, 3.1rc2, Konrad's testing
>>> branch) produces a lot of I/O errors when barriers are enabled but
>>> cannot be used.
>>>
>>> On xenlinux I've got message:
>>> [   15.036921] blkfront: xvdb: empty write barrier op failed
>>> [   15.036936] blkfront: xvdb: barriers disabled
>>>
>>> and after that, everything works fine. On pvops - I/O errors.
>>> As backend I've used 2.6.38.3 xenlinux (based on SUSE package) and
>>> 3.1rc2 with the same result.
>>
>> Hm, and the 'feature-barrier' was enabled in those backends?
>> That is really bizarre considering that those backends don't actually
>> support WRITE_BARRIER anymore.
> 
> To be exact:
> http://lwn.net/Articles/399715/ so around the 2.6.37 era the WRITE_BARRIER
> functionality got ripped out.
> 
> And the LSF summit in 2010 had more details:
> http://lwn.net/Articles/399148/
> "That led, eventually, to one of the clearest decisions in the first
> day of the summit: barriers, as such, will be no more."
> 
> And WRITE_BARRIER != WRITE_FLUSH, so if the SuSE backend is using one
> as the other - then there is a bug in there.

2.6.38.3 OpenSUSE (stable branch) uses feature-barrier and no
feature-flush-cache, so it should work...
http://kernel.opensuse.org/cgit/kernel/tree/drivers/xen/blkback/xenbus.c?h=stable#n208
http://kernel.opensuse.org/cgit/kernel/tree/drivers/xen/blkback/xenbus.c?h=stable#n443

> In the 3.1-rc2 upstream kernel there should be absolutely no hint
> of 'feature-barrier' in the _backend_ code (it is OK for it to be
> in the frontend code).

Ok, it looks like I've mixed up logs from 2.6.38.3 dom0 and 3.1-rc2
dom0. Sorry for that.

> Also, one more thing - are you sure you are using the block backend?
> You might be using the QEMU qdisk?

Yes.

-- 
Pozdrawiam / Best Regards,
Marek Marczykowski         | RLU #390519
marmarek at mimuw edu pl   | xmpp:marmarek at staszic waw pl



* Re: blkfront problem in pvops kernel when barriers enabled
  2011-09-06 17:16   ` Marek Marczykowski
@ 2011-09-07  1:47     ` Konrad Rzeszutek Wilk
  2011-09-07  9:50       ` Jan Beulich
  2011-09-07 17:34       ` Jeremy Fitzhardinge
  2011-12-01 19:08     ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 16+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-09-07  1:47 UTC (permalink / raw)
  To: Marek Marczykowski; +Cc: xen-devel

On Tue, Sep 06, 2011 at 07:16:34PM +0200, Marek Marczykowski wrote:
> On 06.09.2011 18:32, Konrad Rzeszutek Wilk wrote:
> > On Sun, Sep 04, 2011 at 12:49:42PM +0200, Marek Marczykowski wrote:
> >> Hello,
> >>
> >> Pvops block frontend (tested vanilla 3.0.3, 3.1rc2, Konrad's testing
> >> branch) produces a lot of I/O errors when barriers are enabled but
> >> cannot be used.
> >>
> >> On xenlinux I've got message:
> >> [   15.036921] blkfront: xvdb: empty write barrier op failed
> >> [   15.036936] blkfront: xvdb: barriers disabled
> >>
> >> and after that, everything works fine. On pvops - I/O errors.
> >> As backend I've used 2.6.38.3 xenlinux (based on SUSE package) and
> >> 3.1rc2 with the same result.
> > 
> > Hm, and the 'feature-barrier' was enabled in those backends?
> > That is really bizarre considering that those backends don't actually
> > support WRITE_BARRIER anymore.
> 
> At least in 2.6.38.3 xenlinux  (SUSE). Now I'm not sure if 3.1rc2 also
> needed this modification (can't find it now).
> 
> >> When I disable barriers (patching blkbackend to set feature-barrier=0)
> >> everything works fine with all above versions.
> > 
> > Ok, and the patch you sent "[PATCH] Initialize vars in blkfront_connect"
> > as well?
> 
> Yes.
> I've noticed now that this patch was needed only on your testing branch
> (not vanilla kernel).

Oooo. Let me check what went wrong. Perhaps the fix is already applied in
my local tree.
> 
> >> My setup is xen-4.1.1 (if it matters), backends: phy from device-mapper
> >> device and phy from loop device; frontends covered by device-mapper
> >> snapshot, which is set up in domU initramfs.
> >>
> >> It looks like some race condition, because when I set up device-mapper in
> >> domU and mount it manually (which causes some delays between steps), it
> >> works fine...
> >>
> >> Do you have any idea why it happens? What additional data can I provide to
> >> debug it?
> >>
> >> In addition it should be possible to disable barriers without patching
> >> the module... Perhaps some pciback module parameter? Or leave feature-*
> > 
> > Not sure why you would touch pciback.. 
> 
> I mean blkback of course.
> 
> > But the barrier should _not_
> > be enabled in those backends. The 'feature-flush-cache' should be.
> 
> (on 3.1rc2) Looking to xenstore now there is 'feature-flush-cache=1' and
> no 'feature-barrier'. So it is ok.

<scratches head>

I can only think of 2.6.38-3 XenOLinux doing it - and it is a bug
to do it. It really ought to _not_ advertise 'feature-barrier' and
instead advertise 'feature-flush-cache'.


* Re: blkfront problem in pvops kernel when barriers enabled
  2011-09-07  1:47     ` Konrad Rzeszutek Wilk
@ 2011-09-07  9:50       ` Jan Beulich
  2011-09-07 10:19         ` Jan Beulich
  2011-09-07 17:34       ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2011-09-07  9:50 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, Marek Marczykowski

>>> On 07.09.11 at 03:47, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Tue, Sep 06, 2011 at 07:16:34PM +0200, Marek Marczykowski wrote:
>> On 06.09.2011 18:32, Konrad Rzeszutek Wilk wrote:
>> > On Sun, Sep 04, 2011 at 12:49:42PM +0200, Marek Marczykowski wrote:
>> >> Hello,
>> >>
>> >> Pvops block frontend (tested vanilla 3.0.3, 3.1rc2, Konrad's testing
>> >> branch) produces a lot of I/O errors when barriers are enabled but
>> >> cannot be used.
>> >>
>> >> On xenlinux I've got message:
>> >> [   15.036921] blkfront: xvdb: empty write barrier op failed
>> >> [   15.036936] blkfront: xvdb: barriers disabled
>> >>
>> >> and after that, everything works fine. On pvops - I/O errors.
>> >> As backend I've used 2.6.38.3 xenlinux (based on SUSE package) and
>> >> 3.1rc2 with the same result.
>> > 
>> > Hm, and the 'feature-barrier' was enabled in those backends?
>> > That is really bizarre considering that those backends don't actually
>> > support WRITE_BARRIER anymore.
>> 
>> At least in 2.6.38.3 xenlinux  (SUSE). Now I'm not sure if 3.1rc2 also
>> needed this modification (can't find it now).
>> 
>> >> When I disable barriers (patching blkbackend to set feature-barrier=0)
>> >> everything works fine with all above versions.
>> > 
>> > Ok, and the patch you sent "[PATCH] Initialize vars in blkfront_connect"
>> > as well?
>> 
>> Yes.
>> I've noticed now that this patch was needed only on your testing branch
>> (not vanilla kernel).
> 
> Oooo. Let me check what went wrong. Perhaps the fix is already applied in
> my local tree.
>> 
>> >> My setup is xen-4.1.1 (if it matters), backends: phy from device-mapper
>> >> device and phy from loop device; frontends covered by device-mapper
>> >> snapshot, which is set up in domU initramfs.
>> >>
>> >> It looks like some race condition, because when I set up device-mapper in
>> >> domU and mount it manually (which causes some delays between steps), it
>> >> works fine...
>> >>
>> >> Do you have any idea why it happens? What additional data can I provide to
>> >> debug it?
>> >>
>> >> In addition it should be possible to disable barriers without patching
>> >> the module... Perhaps some pciback module parameter? Or leave feature-*
>> > 
>> > Not sure why you would touch pciback.. 
>> 
>> I mean blkback of course.
>> 
>> > But the barrier should _not_
>> > be enabled in those backends. The 'feature-flush-cache' should be.
>> 
>> (on 3.1rc2) Looking to xenstore now there is 'feature-flush-cache=1' and
>> no 'feature-barrier'. So it is ok.
> 
> <scratches head>
> 
> I can only think of 2.6.38-3 XenOLinux doing it - and it is a bug
> to do it. It really ought to _not_ advertise 'feature-barrier' and
> instead advertise 'feature-flush-cache'.

Indeed, I see that I added feature-flush-cache support to the frontend
back then, but neglected to do so for the backend. Partly perhaps
because I'm not much of a (block, network, ...) driver person...

However, what I'm not understanding with dropping feature-barrier
support from the backend - how do you deal with old frontends
wanting to use barriers? I'm currently converting them into
WRITE_FLUSH_FUA operations in the backend as a (hopefully) best
effort approach.

Jan


* Re: blkfront problem in pvops kernel when barriers enabled
  2011-09-07  9:50       ` Jan Beulich
@ 2011-09-07 10:19         ` Jan Beulich
  2011-09-07 17:41           ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2011-09-07 10:19 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, Marek Marczykowski

>>> On 07.09.11 at 11:50, "Jan Beulich" <JBeulich@suse.com> wrote:
>>>> On 07.09.11 at 03:47, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>> On Tue, Sep 06, 2011 at 07:16:34PM +0200, Marek Marczykowski wrote:
>>> On 06.09.2011 18:32, Konrad Rzeszutek Wilk wrote:
>>> > On Sun, Sep 04, 2011 at 12:49:42PM +0200, Marek Marczykowski wrote:
>>> >> Hello,
>>> >>
>>> >> Pvops block frontend (tested vanilla 3.0.3, 3.1rc2, Konrad's testing
>>> >> branch) produces a lot of I/O errors when barriers are enabled but
>>> >> cannot be used.
>>> >>
>>> >> On xenlinux I've got message:
>>> >> [   15.036921] blkfront: xvdb: empty write barrier op failed
>>> >> [   15.036936] blkfront: xvdb: barriers disabled
>>> >>
>>> >> and after that, everything works fine. On pvops - I/O errors.
>>> >> As backend I've used 2.6.38.3 xenlinux (based on SUSE package) and
>>> >> 3.1rc2 with the same result.
>>> > 
>>> > Hm, and the 'feature-barrier' was enabled in those backends?
>>> > That is really bizarre considering that those backends don't actually
>>> > support WRITE_BARRIER anymore.
>>> 
>>> At least in 2.6.38.3 xenlinux  (SUSE). Now I'm not sure if 3.1rc2 also
>>> needed this modification (can't find it now).
>>> 
>>> >> When I disable barriers (patching blkbackend to set feature-barrier=0)
>>> >> everything works fine with all above versions.
>>> > 
>>> > Ok, and the patch you sent "[PATCH] Initialize vars in blkfront_connect"
>>> > as well?
>>> 
>>> Yes.
>>> I've noticed now that this patch was needed only on your testing branch
>>> (not vanilla kernel).
>> 
>> Oooo. Let me check what went wrong. Perhaps the fix is already applied in
>> my local tree.
>>> 
>>> >> My setup is xen-4.1.1 (if it matters), backends: phy from device-mapper
>>> >> device and phy from loop device; frontends covered by device-mapper
>>> >> snapshot, which is set up in domU initramfs.
>>> >>
>>> >> It looks like some race condition, because when I set up device-mapper in
>>> >> domU and mount it manually (which causes some delays between steps), it
>>> >> works fine...
>>> >>
>>> >> Do you have any idea why it happens? What additional data can I provide to
>>> >> debug it?
>>> >>
>>> >> In addition it should be possible to disable barriers without patching
>>> >> the module... Perhaps some pciback module parameter? Or leave feature-*
>>> > 
>>> > Not sure why you would touch pciback.. 
>>> 
>>> I mean blkback of course.
>>> 
>>> > But the barrier should _not_
>>> > be enabled in those backends. The 'feature-flush-cache' should be.
>>> 
>>> (on 3.1rc2) Looking to xenstore now there is 'feature-flush-cache=1' and
>>> no 'feature-barrier'. So it is ok.
>> 
>> <scratches head>
>> 
>> I can only think of 2.6.38-3 XenOLinux doing it - and it is a bug
>> to do it. It really ought to _not_ advertise 'feature-barrier' and
>> instead advertise 'feature-flush-cache'.
> 
> Indeed, I see that I added feature-flush-cache support to the frontend
> back then, but neglected to do so for the backend. Partly perhaps
> because I'm not much of a (block, network, ...) driver person...
> 
> However, what I'm not understanding with dropping feature-barrier
> support from the backend - how do you deal with old frontends
> wanting to use barriers? I'm currently converting them into
> WRITE_FLUSH_FUA operations in the backend as a (hopefully) best
> effort approach.

Also I notice you're using WRITE_ODIRECT - what's the background
of that?

Thanks, Jan


* Re: blkfront problem in pvops kernel when barriers enabled
  2011-09-07  1:47     ` Konrad Rzeszutek Wilk
  2011-09-07  9:50       ` Jan Beulich
@ 2011-09-07 17:34       ` Jeremy Fitzhardinge
  2011-09-07 17:43         ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 16+ messages in thread
From: Jeremy Fitzhardinge @ 2011-09-07 17:34 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, Marek Marczykowski

On 09/06/2011 06:47 PM, Konrad Rzeszutek Wilk wrote:
> (on 3.1rc2) Looking to xenstore now there is 'feature-flush-cache=1' and
> no 'feature-barrier'. So it is ok.
> <scratches head>
>
> I can only think of 2.6.38-3 XenOLinux doing it - and it is a bug
> to do it. It really ought to _not_ advertise 'feature-barrier' and
> instead advertise 'feature-flush-cache'.

Does that mean that older guests which don't understand flush-cache will
be left with no way to force writes to stable storage?  Seems to me that
even if the backend would prefer flush-cache, it should also advertise
barriers.

However, that raises the question of how to express the preferred
mechanism if multiple are available.  You could assume that flush-cache
is always preferred if available, but that's pretty clunky.

    J


* Re: blkfront problem in pvops kernel when barriers enabled
  2011-09-07 10:19         ` Jan Beulich
@ 2011-09-07 17:41           ` Konrad Rzeszutek Wilk
  2011-09-08  8:06             ` Jan Beulich
  0 siblings, 1 reply; 16+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-09-07 17:41 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Marek Marczykowski

> >> <scratches head>
> >> 
> >> I can only think of 2.6.38-3 XenOLinux doing it - and it is a bug
> >> to do it. It really ought to _not_ advertise 'feature-barrier' and
> >> instead advertise 'feature-flush-cache'.
> > 
> > Indeed, I see that I added feature-flush-cache support to the frontend
> > back then, but neglected to do so for the backend. Partly perhaps
> > because I'm not much of a (block, network, ...) driver person...
> > 
> > However, what I'm not understanding with dropping feature-barrier
> > support from the backend - how do you deal with old frontends
> > wanting to use barriers? I'm currently converting them into

Just not supporting them. I know it is incredibly bad to do so - but
I have not had a chance to write the code to emulate the 'feature-barrier'
correctly.

> > WRITE_FLUSH_FUA operations in the backend as a (hopefully) best
> > effort approach.

I am not sure. I need to run blktrace|blkparse to make sure it does the
right thing as compared to a WRITE_BARRIER. Let's ask Christoph Hellwig - he
knows a lot about this.

> 
> Also I notice you're using WRITE_ODIRECT - what's the background
> of that?

Ah, http://git.drbd.org/linux-2.6-drbd.git/?p=linux-2.6-drbd.git;a=commit;h=013c3ca184851078b9c04744efd4d47e52c6ecf8


> 
> Thanks, Jan


* Re: blkfront problem in pvops kernel when barriers enabled
  2011-09-07 17:34       ` Jeremy Fitzhardinge
@ 2011-09-07 17:43         ` Konrad Rzeszutek Wilk
  2011-09-07 18:58           ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 16+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-09-07 17:43 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: xen-devel, Marek Marczykowski

On Wed, Sep 07, 2011 at 10:34:49AM -0700, Jeremy Fitzhardinge wrote:
> On 09/06/2011 06:47 PM, Konrad Rzeszutek Wilk wrote:
> > (on 3.1rc2) Looking to xenstore now there is 'feature-flush-cache=1' and
> > no 'feature-barrier'. So it is ok.
> > <scratches head>
> >
> > I can only think of 2.6.38-3 XenOLinux doing it - and it is a bug
> > to do it. It really ought to _not_ advertise 'feature-barrier' and
> > instead advertise 'feature-flush-cache'.
> 
> Does that mean that older guests which don't understand flush-cache will
> be left with no way to force writes to stable storage?  Seems to me that

Correct.
> even if the backend would prefer flush-cache, it should also advertise
> barriers.

But doing it incorrectly is bad - really bad.

> 
> However, that raises the question of how to express the preferred
> mechanism if multiple are available.  You could assume that flush-cache
> is always preferred if available, but that's pretty clunky.

That is how I did it in the frontend.
> 
>     J


* Re: blkfront problem in pvops kernel when barriers enabled
  2011-09-07 17:43         ` Konrad Rzeszutek Wilk
@ 2011-09-07 18:58           ` Jeremy Fitzhardinge
  2011-09-07 19:31             ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 16+ messages in thread
From: Jeremy Fitzhardinge @ 2011-09-07 18:58 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, Marek Marczykowski

On 09/07/2011 10:43 AM, Konrad Rzeszutek Wilk wrote:
> On Wed, Sep 07, 2011 at 10:34:49AM -0700, Jeremy Fitzhardinge wrote:
>> On 09/06/2011 06:47 PM, Konrad Rzeszutek Wilk wrote:
>>> (on 3.1rc2) Looking to xenstore now there is 'feature-flush-cache=1' and
>>> no 'feature-barrier'. So it is ok.
>>> <scratches head>
>>>
>>> I can only think of 2.6.38-3 XenOLinux doing it - and it is a bug
>>> to do it. It really ought to _not_ advertise 'feature-barrier' and
>>> instead advertise 'feature-flush-cache'.
>> Does that mean that older guests which don't understand flush-cache will
>> be left with no way to force writes to stable storage?  Seems to me that
> Correct.
>> even if the backend would prefer flush-cache, it should also advertise
>> barriers.
> But doing it incorrectly is bad - really bad.

Well, there's "bad performance" and "bad oops we lost data".  If the
backend emulates a barrier by doing a drain, flush, write, drain, flush
then I think that should be safe, but definitely not quick.

>> However, that raises the question of how to express the preferred
>> mechanism if multiple are available.  You could assume that flush-cache
>> is always preferred if available, but that's pretty clunky.
> That is how I did it in the frontend.

OK, how about this for a cheapo idea: make the
feature-barrier/flush-cache files contain a priority: 0 = "do not use",
non-zero = bigger the better?  That way we can have barrier-preferring
backends also support flush.  I suppose.

Really, frontends should also try to make do with whatever the backend
supports, even if it's not the preferred one.

    J


* Re: blkfront problem in pvops kernel when barriers enabled
  2011-09-07 18:58           ` Jeremy Fitzhardinge
@ 2011-09-07 19:31             ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 16+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-09-07 19:31 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: xen-devel, Marek Marczykowski

On Wed, Sep 07, 2011 at 11:58:38AM -0700, Jeremy Fitzhardinge wrote:
> On 09/07/2011 10:43 AM, Konrad Rzeszutek Wilk wrote:
> > On Wed, Sep 07, 2011 at 10:34:49AM -0700, Jeremy Fitzhardinge wrote:
> >> On 09/06/2011 06:47 PM, Konrad Rzeszutek Wilk wrote:
> >>> (on 3.1rc2) Looking to xenstore now there is 'feature-flush-cache=1' and
> >>> no 'feature-barrier'. So it is ok.
> >>> <scratches head>
> >>>
> >>> I can only think of 2.6.38-3 XenOLinux doing it - and it is a bug
> >>> to do it. It really ought to _not_ advertise 'feature-barrier' and
> >>> instead advertise 'feature-flush-cache'.
> >> Does that mean that older guests which don't understand flush-cache will
> >> be left with no way to force writes to stable storage?  Seems to me that
> > Correct.
> >> even if the backend would prefer flush-cache, it should also advertise
> >> barriers.
> > But doing it incorrectly is bad - really bad.
> 
> Well, there's "bad performance" and "bad oops we lost data".  If the
> backend emulates a barrier by doing a drain, flush, write, drain, flush
> then I think that should be safe, but definitely not quick.

Which it looks like we need to do: stop the processing of the ring
buffer and do the sequence of events you mentioned, which would
entail waiting for all of the bio callbacks to finish.

> 
> >> However, that raises the question of how to express the preferred
> >> mechanism if multiple are available.  You could assume that flush-cache
> >> is always preferred if available, but that's pretty clunky.
> > That is how I did it in the frontend.
> 
> OK, how about this for a cheapo idea: make the
> feature-barrier/flush-cache files contain a priority: 0 = "do not use",
> non-zero = bigger the better?  That way we can have barrier-preferring
> backends also support flush.  I suppose.

Well, the "older" backends could emulate 'feature-flush-cache'
.. except it is not really right - but it will do the same thing - stop
and drain the queue (without actually flushing the contents to the disk).
So perhaps the right way to implement this in the "old" backends
is to also send a SYNC along.

I think I am dense today, but the issue I am seeing is with
"old" frontends (RHEL5) and "new" backends (3.0 and higher) -
where there is no 'feature-barrier' support. So they won't do barriers.

> 
> Really, frontends should also try to make do with whatever the backend
> supports, even if its not preferred as well.

Sure. That is how they do it now. If it can do barriers - it will do
BLKIF_OP_BARRIER. If it can do flush, it will do BLKIF_OP_FLUSH.

Either one is used when the frontend gets REQ_FLUSH || REQ_FUA command.

> 
>     J

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: blkfront problem in pvops kernel when barriers enabled
  2011-09-07 17:41           ` Konrad Rzeszutek Wilk
@ 2011-09-08  8:06             ` Jan Beulich
  2011-09-08 13:11               ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2011-09-08  8:06 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, Marek Marczykowski

>>> On 07.09.11 at 19:41, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>> >> <scratches head>
>> >> 
>> >> I can only think of 2.6.38-3 XenOLinux doing it - and it is a bug
>> >> to do it. It really ought to _not_ advertise 'feature-barrier' and
>> >> instead advertise 'feature-flush-cache'.
>> > 
>> > Indeed, I see that I added feature-flush-cache support to the frontend
>> > back then, but neglected to do so for the backend. Partly perhaps
>> > because I'm not much of a (block, network, ...) driver person...
>> > 
>> > However, what I'm not understanding with dropping feature-barrier
>> > support from the backend - how do you deal with old frontends
>> > wanting to use barriers? I'm currently converting them into
> 
> Just not supporting them. I know it is incredibly bad to do so - but
> I have not had a chance to write the code to emulate the 'feature-barrier'
> correctly.
> 
>> > WRITE_FLUSH_FUA operations in the backend as a (hopefully) best
>> > effort approach.
> 
> I am not sure. I need to run blktrace|blkparse to make sure it does the
> > right thing as compared to a WRITE_BARRIER. Let's ask Christoph Hellwig - he
> knows a lot of this.
> 
>> 
>> Also I notice you're using WRITE_ODIRECT - what's the background
>> of that?
> 
> Ah, http://git.drbd.org/linux-2.6-drbd.git/?p=linux-2.6-drbd.git;a=commit;h=013c3ca184851078b9c04744efd4d47e52c6ecf8

Hmm, that seems more like a band-aid than a real solution. What if with
another scheduler (or after some changes to CFQ) REQ_SYNC actually
hurts (as - without any data - I would have expected)? Can't/shouldn't
the use of REQ_SYNC be made at least dependent on the scheduler in
use on the queue?

Jan

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: blkfront problem in pvops kernel when barriers enabled
  2011-09-08  8:06             ` Jan Beulich
@ 2011-09-08 13:11               ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 16+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-09-08 13:11 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Marek Marczykowski

On Thu, Sep 08, 2011 at 09:06:09AM +0100, Jan Beulich wrote:
> >>> On 07.09.11 at 19:41, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> >> >> <scratches head>
> >> >> 
> >> >> I can only think of 2.6.38-3 XenOLinux doing it - and it is a bug
> >> >> to do it. It really ought to _not_ advertise 'feature-barrier' and
> >> >> instead advertise 'feature-flush-cache'.
> >> > 
> >> > Indeed, I see that I added feature-flush-cache support to the frontend
> >> > back then, but neglected to do so for the backend. Partly perhaps
> >> > because I'm not much of a (block, network, ...) driver person...
> >> > 
> >> > However, what I'm not understanding with dropping feature-barrier
> >> > support from the backend - how do you deal with old frontends
> >> > wanting to use barriers? I'm currently converting them into
> > 
> > Just not supporting them. I know it is incredibly bad to do so - but
> > I have not had a chance to write the code to emulate the 'feature-barrier'
> > correctly.
> > 
> >> > WRITE_FLUSH_FUA operations in the backend as a (hopefully) best
> >> > effort approach.
> > 
> > I am not sure. I need to run blktrace|blkparse to make sure it does the
> > > right thing as compared to a WRITE_BARRIER. Let's ask Christoph Hellwig - he
> > knows a lot of this.
> > 
> >> 
> >> Also I notice you're using WRITE_ODIRECT - what's the background
> >> of that?
> > 
> > Ah, http://git.drbd.org/linux-2.6-drbd.git/?p=linux-2.6-drbd.git;a=commit;h=013c3ca184851078b9c04744efd4d47e52c6ecf8
> 
> Hmm, that seems more like a band-aid than a real solution. What if with
> another scheduler (or after some changes to CFQ) REQ_SYNC actually
> hurts (as - without any data - I would have expected)? Can't/shouldn't
> the use of REQ_SYNC be made at least dependent on the scheduler in
> use on the queue?

This is what the header file says about async vs sync:

 *      All IO is handled async in Linux. This is fine for background
 *      writes, but for reads or writes that someone waits for completion
 *      on, we want to notify the block layer and IO scheduler so that they
 *      know about it. That allows them to make better scheduling
 *      decisions. So when the below references 'sync' and 'async', it
 *      is referencing this priority hint.


To make sure I was not shooting myself in the foot, I did the change and
also made sure the other schedulers worked without any regressions in speed.

But keep in mind that this 'WRITE_ODIRECT' behavior is also used
by AIO, and by any userspace application that sticks O_DIRECT on the
open call. So if another scheduler breaks this behavior we are
not the only ones affected.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: blkfront problem in pvops kernel when barriers enabled
  2011-09-06 17:16   ` Marek Marczykowski
  2011-09-07  1:47     ` Konrad Rzeszutek Wilk
@ 2011-12-01 19:08     ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 16+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-12-01 19:08 UTC (permalink / raw)
  To: Marek Marczykowski; +Cc: xen-devel

On Tue, Sep 06, 2011 at 07:16:34PM +0200, Marek Marczykowski wrote:
> On 06.09.2011 18:32, Konrad Rzeszutek Wilk wrote:
> > On Sun, Sep 04, 2011 at 12:49:42PM +0200, Marek Marczykowski wrote:
> >> Hello,
> >>
> >> Pvops block frontend (tested vanilla 3.0.3, 3.1rc2, Konrad's testing
> >> branch) produces a lot of I/O errors when barriers are enabled but
> >> cannot be used.
> >>
> >> On xenlinux I've got message:
> >> [   15.036921] blkfront: xvdb: empty write barrier op failed
> >> [   15.036936] blkfront: xvdb: barriers disabled
> >>
> >> and after that, everything works fine. On pvops - I/O errors.
> >> As backend I've used 2.6.38.3 xenlinux (based on SUSE package) and
> >> 3.1rc2 with same result.
> > 
> > Hm, and the 'feature-barrier' was enabled on in those backends?
> > That is really bizzare considering that those backends don't actually
> > support WRITE_BARRIER anymore.
> 
> At least in 2.6.38.3 xenlinux (SUSE). Now I'm not sure if 3.1rc2 also
> needed this modification (can't find it now).

I think this is resolved now? This patch below should fix the issue
(at least it did for me when I tried 3.0 with a 2.6.32 older backend
with DEBUG enabled):


# HG changeset patch
From: Jan Beulich <jbeulich@novell.com>
# Date 1306409621 -3600
# Node ID 876a5aaac0264cf38cae6581e5714b93ec380aaa
# Parent  aedb712c05cf065e943e15d0f38597c2e80f7982
Subject: xen/blkback: don't fail empty barrier requests

The sector number on empty barrier requests may (will?) be
uninitialized (neither bio_init() nor rq_init() set the respective
fields), which allows for exceeding the actual (virtual) disk's size.

Inspired by Konrad's "When writting barriers set the sector number to
zero...", but instead of zapping the sector number (which is wrong for
non-empty ones) just ignore the sector number when the sector count is
zero.

While at it also add overflow checking to the math in vbd_translate().

Signed-off-by: Jan Beulich <jbeulich@novell.com>

diff -r aedb712c05cf -r 876a5aaac026 drivers/xen/blkback/vbd.c
--- a/drivers/xen/blkback/vbd.c	Thu May 26 08:09:04 2011 +0100
+++ b/drivers/xen/blkback/vbd.c	Thu May 26 12:33:41 2011 +0100
@@ -108,8 +108,14 @@ int vbd_translate(struct phys_req *req, 
 	if ((operation != READ) && vbd->readonly)
 		goto out;
 
-	if (unlikely((req->sector_number + req->nr_sects) > vbd_sz(vbd)))
-		goto out;
+	if (likely(req->nr_sects)) {
+		blkif_sector_t end = req->sector_number + req->nr_sects;
+
+		if (unlikely(end < req->sector_number))
+			goto out;
+		if (unlikely(end > vbd_sz(vbd)))
+			goto out;
+	}
 
 	req->dev  = vbd->pdevice;
 	req->bdev = vbd->bdev;

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2011-12-01 19:08 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-04 10:49 blkfront problem in pvops kernel when barriers enabled Marek Marczykowski
2011-09-06 16:32 ` Konrad Rzeszutek Wilk
2011-09-06 16:55   ` Konrad Rzeszutek Wilk
2011-09-06 17:47     ` Marek Marczykowski
2011-09-06 17:16   ` Marek Marczykowski
2011-09-07  1:47     ` Konrad Rzeszutek Wilk
2011-09-07  9:50       ` Jan Beulich
2011-09-07 10:19         ` Jan Beulich
2011-09-07 17:41           ` Konrad Rzeszutek Wilk
2011-09-08  8:06             ` Jan Beulich
2011-09-08 13:11               ` Konrad Rzeszutek Wilk
2011-09-07 17:34       ` Jeremy Fitzhardinge
2011-09-07 17:43         ` Konrad Rzeszutek Wilk
2011-09-07 18:58           ` Jeremy Fitzhardinge
2011-09-07 19:31             ` Konrad Rzeszutek Wilk
2011-12-01 19:08     ` Konrad Rzeszutek Wilk
