Help with implementing some form of barriers in 3.0 kernels.

All of lore.kernel.org
 help / color / mirror / Atom feed

* Help with implementing some form of barriers in 3.0 kernels.
@ 2011-09-07 17:48 Konrad Rzeszutek Wilk
  2011-09-07 18:17 ` Vivek Goyal
  2011-09-13 10:44 ` Jan Beulich
  0 siblings, 2 replies; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-09-07 17:48 UTC (permalink / raw)
  To: hch; +Cc: Jeremy Fitzhardinge, jbeulich, linux-kernel, JBeulich

Hey Christoph,

I was wondering what you think is the proper way of implementing a
backend to support the 'barrier' type requests? We have this issue were
there are 2.6.36 type guests that still use barriers and we would like
to support them properly. But in 3.0 there are no barriers - hence
the question whether WRITE_fLUSH_FUA would be equal to WRITE_BARRIER?

Or is there some other things that we should take in consideration?

Thanks!

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help with implementing some form of barriers in 3.0 kernels.
  2011-09-07 17:48 Help with implementing some form of barriers in 3.0 kernels Konrad Rzeszutek Wilk
@ 2011-09-07 18:17 ` Vivek Goyal
  2011-09-07 18:27   ` Konrad Rzeszutek Wilk
  2011-09-07 20:16   ` Christoph Hellwig
  2011-09-13 10:44 ` Jan Beulich
  1 sibling, 2 replies; 25+ messages in thread
From: Vivek Goyal @ 2011-09-07 18:17 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: hch, Jeremy Fitzhardinge, jbeulich, linux-kernel, JBeulich

On Wed, Sep 07, 2011 at 01:48:32PM -0400, Konrad Rzeszutek Wilk wrote:
> Hey Christoph,
> 
> I was wondering what you think is the proper way of implementing a
> backend to support the 'barrier' type requests? We have this issue were
> there are 2.6.36 type guests that still use barriers and we would like
> to support them properly. But in 3.0 there are no barriers - hence
> the question whether WRITE_fLUSH_FUA would be equal to WRITE_BARRIER?

I think WRITE_FLUSH_FUA is not same as WRITE_BARRIER. Because it does
not ensure request ordering. A request rq2 which is issued after rq1 (with
WRITE_flush_FUA), can still finish before rq1. In the past WRITE_BARRIER
would not allow that.

So AFAIK, WRITE_flush_fua is not WRITE_BARRIER.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help with implementing some form of barriers in 3.0 kernels.
  2011-09-07 18:17 ` Vivek Goyal
@ 2011-09-07 18:27   ` Konrad Rzeszutek Wilk
  2011-09-07 18:36     ` Vivek Goyal
  2011-09-07 20:16   ` Christoph Hellwig
  1 sibling, 1 reply; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-09-07 18:27 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: hch, Jeremy Fitzhardinge, jbeulich, linux-kernel, JBeulich

On Wed, Sep 07, 2011 at 02:17:40PM -0400, Vivek Goyal wrote:
> On Wed, Sep 07, 2011 at 01:48:32PM -0400, Konrad Rzeszutek Wilk wrote:
> > Hey Christoph,
> > 
> > I was wondering what you think is the proper way of implementing a
> > backend to support the 'barrier' type requests? We have this issue were
> > there are 2.6.36 type guests that still use barriers and we would like
> > to support them properly. But in 3.0 there are no barriers - hence
> > the question whether WRITE_fLUSH_FUA would be equal to WRITE_BARRIER?
> 
> I think WRITE_FLUSH_FUA is not same as WRITE_BARRIER. Because it does
> not ensure request ordering. A request rq2 which is issued after rq1 (with
> WRITE_flush_FUA), can still finish before rq1. In the past WRITE_BARRIER
> would not allow that.
> 
> So AFAIK, WRITE_flush_fua is not WRITE_BARRIER.

Ok, any thoughts on how to emulate it then perhaps? Mark each request after
rq1 with WRITE_FUA? .. But then how long should the _FUA bit be set - perhaps
until the rq1 has completed?

> 
> Thanks
> Vivek

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help with implementing some form of barriers in 3.0 kernels.
  2011-09-07 18:27   ` Konrad Rzeszutek Wilk
@ 2011-09-07 18:36     ` Vivek Goyal
  0 siblings, 0 replies; 25+ messages in thread
From: Vivek Goyal @ 2011-09-07 18:36 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: hch, Jeremy Fitzhardinge, jbeulich, linux-kernel, JBeulich

On Wed, Sep 07, 2011 at 02:27:49PM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Sep 07, 2011 at 02:17:40PM -0400, Vivek Goyal wrote:
> > On Wed, Sep 07, 2011 at 01:48:32PM -0400, Konrad Rzeszutek Wilk wrote:
> > > Hey Christoph,
> > > 
> > > I was wondering what you think is the proper way of implementing a
> > > backend to support the 'barrier' type requests? We have this issue were
> > > there are 2.6.36 type guests that still use barriers and we would like
> > > to support them properly. But in 3.0 there are no barriers - hence
> > > the question whether WRITE_fLUSH_FUA would be equal to WRITE_BARRIER?
> > 
> > I think WRITE_FLUSH_FUA is not same as WRITE_BARRIER. Because it does
> > not ensure request ordering. A request rq2 which is issued after rq1 (with
> > WRITE_flush_FUA), can still finish before rq1. In the past WRITE_BARRIER
> > would not allow that.
> > 
> > So AFAIK, WRITE_flush_fua is not WRITE_BARRIER.
> 
> Ok, any thoughts on how to emulate it then perhaps? Mark each request after
> rq1 with WRITE_FUA? .. But then how long should the _FUA bit be set - perhaps
> until the rq1 has completed?

I think sender of the request need to wait for the completion of rq1
before issuing rq2 for emulating request ordering.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help with implementing some form of barriers in 3.0 kernels.
  2011-09-07 18:17 ` Vivek Goyal
  2011-09-07 18:27   ` Konrad Rzeszutek Wilk
@ 2011-09-07 20:16   ` Christoph Hellwig
  2011-09-07 21:31     ` Vivek Goyal
  2011-09-08  8:02     ` Jan Beulich
  1 sibling, 2 replies; 25+ messages in thread
From: Christoph Hellwig @ 2011-09-07 20:16 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Konrad Rzeszutek Wilk, hch, Jeremy Fitzhardinge, jbeulich,
	linux-kernel, JBeulich

[Hmm, for some reason I never manage to receive Konrads mails directly,
 but only get the replies, or copies via the list]

On Wed, Sep 07, 2011 at 02:17:40PM -0400, Vivek Goyal wrote:
> On Wed, Sep 07, 2011 at 01:48:32PM -0400, Konrad Rzeszutek Wilk wrote:
> > Hey Christoph,
> > 
> > I was wondering what you think is the proper way of implementing a
> > backend to support the 'barrier' type requests? We have this issue were
> > there are 2.6.36 type guests that still use barriers and we would like
> > to support them properly. But in 3.0 there are no barriers - hence
> > the question whether WRITE_fLUSH_FUA would be equal to WRITE_BARRIER?
> 
> I think WRITE_FLUSH_FUA is not same as WRITE_BARRIER. Because it does
> not ensure request ordering. A request rq2 which is issued after rq1 (with
> WRITE_flush_FUA), can still finish before rq1. In the past WRITE_BARRIER
> would not allow that.
> 
> So AFAIK, WRITE_flush_fua is not WRITE_BARRIER.

Indeed.  And while most guests won't care some will.  E.g. reiserfs
which is the standard filesystem in most SuSE guests, which happen to
be fairly popular with Xen.

I'd suggest you look at the pre-2.6.36 barrier implementation and see
if you can move that into xen-blkfront.

For the qemu side doing this is a bit easier as you'll just have to wait
for all pending aio requests to complete.  The current qemu xen disk
code gets thus horribly wrong, though.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help with implementing some form of barriers in 3.0 kernels.
  2011-09-07 20:16   ` Christoph Hellwig
@ 2011-09-07 21:31     ` Vivek Goyal
  2011-09-08  8:22       ` Christoph Hellwig
  2011-09-08  8:02     ` Jan Beulich
  1 sibling, 1 reply; 25+ messages in thread
From: Vivek Goyal @ 2011-09-07 21:31 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Konrad Rzeszutek Wilk, Jeremy Fitzhardinge, jbeulich,
	linux-kernel, JBeulich

On Wed, Sep 07, 2011 at 04:16:20PM -0400, Christoph Hellwig wrote:
> [Hmm, for some reason I never manage to receive Konrads mails directly,
>  but only get the replies, or copies via the list]
> 
> On Wed, Sep 07, 2011 at 02:17:40PM -0400, Vivek Goyal wrote:
> > On Wed, Sep 07, 2011 at 01:48:32PM -0400, Konrad Rzeszutek Wilk wrote:
> > > Hey Christoph,
> > > 
> > > I was wondering what you think is the proper way of implementing a
> > > backend to support the 'barrier' type requests? We have this issue were
> > > there are 2.6.36 type guests that still use barriers and we would like
> > > to support them properly. But in 3.0 there are no barriers - hence
> > > the question whether WRITE_fLUSH_FUA would be equal to WRITE_BARRIER?
> > 
> > I think WRITE_FLUSH_FUA is not same as WRITE_BARRIER. Because it does
> > not ensure request ordering. A request rq2 which is issued after rq1 (with
> > WRITE_flush_FUA), can still finish before rq1. In the past WRITE_BARRIER
> > would not allow that.
> > 
> > So AFAIK, WRITE_flush_fua is not WRITE_BARRIER.
> 
> Indeed.  And while most guests won't care some will.  E.g. reiserfs
> which is the standard filesystem in most SuSE guests, which happen to
> be fairly popular with Xen.
> 
> I'd suggest you look at the pre-2.6.36 barrier implementation and see
> if you can move that into xen-blkfront.
> 
> For the qemu side doing this is a bit easier as you'll just have to wait
> for all pending aio requests to complete.  The current qemu xen disk
> code gets thus horribly wrong, though.

I have a basic question. In old guest why BARRIER handling on request
queue is not sufficient for sequencing and ordering of requests and why
xen-blkfront and qemu have to do something about it.

I am also wondering if virtio-blk have similar issues?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help with implementing some form of barriers in 3.0 kernels.
  2011-09-07 20:16   ` Christoph Hellwig
  2011-09-07 21:31     ` Vivek Goyal
@ 2011-09-08  8:02     ` Jan Beulich
  2011-09-08  8:08       ` Christoph Hellwig
  1 sibling, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2011-09-08  8:02 UTC (permalink / raw)
  To: hch; +Cc: Jeremy Fitzhardinge, Konrad Rzeszutek Wilk, Vivek Goyal, linux-kernel

>>> On 07.09.11 at 22:16, Christoph Hellwig <hch@infradead.org> wrote:
> [Hmm, for some reason I never manage to receive Konrads mails directly,
>  but only get the replies, or copies via the list]
> 
> On Wed, Sep 07, 2011 at 02:17:40PM -0400, Vivek Goyal wrote:
>> On Wed, Sep 07, 2011 at 01:48:32PM -0400, Konrad Rzeszutek Wilk wrote:
>> > Hey Christoph,
>> > 
>> > I was wondering what you think is the proper way of implementing a
>> > backend to support the 'barrier' type requests? We have this issue were
>> > there are 2.6.36 type guests that still use barriers and we would like
>> > to support them properly. But in 3.0 there are no barriers - hence
>> > the question whether WRITE_fLUSH_FUA would be equal to WRITE_BARRIER?
>> 
>> I think WRITE_FLUSH_FUA is not same as WRITE_BARRIER. Because it does
>> not ensure request ordering. A request rq2 which is issued after rq1 (with
>> WRITE_flush_FUA), can still finish before rq1. In the past WRITE_BARRIER
>> would not allow that.
>> 
>> So AFAIK, WRITE_flush_fua is not WRITE_BARRIER.
> 
> Indeed.  And while most guests won't care some will.  E.g. reiserfs
> which is the standard filesystem in most SuSE guests, which happen to
> be fairly popular with Xen.
> 
> I'd suggest you look at the pre-2.6.36 barrier implementation and see
> if you can move that into xen-blkfront.

That would need to be done in the backend, as the expectation cannot
be to patch old frontends to deal with new backends.

Jan

> For the qemu side doing this is a bit easier as you'll just have to wait
> for all pending aio requests to complete.  The current qemu xen disk
> code gets thus horribly wrong, though.




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help with implementing some form of barriers in 3.0 kernels.
  2011-09-08  8:02     ` Jan Beulich
@ 2011-09-08  8:08       ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2011-09-08  8:08 UTC (permalink / raw)
  To: Jan Beulich
  Cc: hch, Jeremy Fitzhardinge, Konrad Rzeszutek Wilk, Vivek Goyal,
	linux-kernel

On Thu, Sep 08, 2011 at 09:02:28AM +0100, Jan Beulich wrote:
> That would need to be done in the backend, as the expectation cannot
> be to patch old frontends to deal with new backends.

Sorry, I meant blkback above.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help with implementing some form of barriers in 3.0 kernels.
  2011-09-07 21:31     ` Vivek Goyal
@ 2011-09-08  8:22       ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2011-09-08  8:22 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Christoph Hellwig, Konrad Rzeszutek Wilk, Jeremy Fitzhardinge,
	jbeulich, linux-kernel, JBeulich

On Wed, Sep 07, 2011 at 05:31:37PM -0400, Vivek Goyal wrote:
> I have a basic question. In old guest why BARRIER handling on request
> queue is not sufficient for sequencing and ordering of requests and why
> xen-blkfront and qemu have to do something about it.
> 
> I am also wondering if virtio-blk have similar issues?

Mainline xen-blkfront was using ORDERED_TAG before, which meant
bypassing the sequencer entirely, as well as not beeing able to
implement plain cache flushes as used by ->fsync.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help with implementing some form of barriers in 3.0 kernels.
  2011-09-07 17:48 Help with implementing some form of barriers in 3.0 kernels Konrad Rzeszutek Wilk
  2011-09-07 18:17 ` Vivek Goyal
@ 2011-09-13 10:44 ` Jan Beulich
  2011-09-14  8:59   ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2011-09-13 10:44 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Jeremy Fitzhardinge, hch, Jan Kara, linux-kernel

>>> On 07.09.11 at 19:48, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> Hey Christoph,
> 
> I was wondering what you think is the proper way of implementing a
> backend to support the 'barrier' type requests? We have this issue were
> there are 2.6.36 type guests that still use barriers and we would like
> to support them properly. But in 3.0 there are no barriers - hence
> the question whether WRITE_fLUSH_FUA would be equal to WRITE_BARRIER?
> 
> Or is there some other things that we should take in consideration?
> 
> Thanks!

Below is what Jan Kara came up with for addressing this - what do
you think?

Jan

From: Jan Kara <jack@suse.cz>
Date: Sun, 11 Sep 2011 14:39:18 +0200
Subject: [PATCH] xen: Add support for old BARRIER requests to xenblk driver

Recent kernels do not support BARRIER operation but only FLUSH operation. But
older xenblk frontends still use the BARRIER operation to achieve data
integrity requirements. So add support for BARRIER operation into xenblk
backend so that all guests do not corrupt their filesystem on host crash.

Signed-off-by: Jan Kara <jack@suse.cz>

--- a/block/elevator.c
+++ b/block/elevator.c
@@ -619,6 +619,7 @@ void elv_drain_elevator(struct request_q
 		       q->elevator->elevator_type->elevator_name, q->nr_sorted);
 	}
 }
+EXPORT_SYMBOL(elv_drain_elevator);
 
 /*
  * Call with queue lock held, interrupts disabled
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -530,6 +530,7 @@ static int dispatch_rw_block_io(struct x
 	struct bio *biolist[BLKIF_MAX_SEGMENTS_PER_REQUEST];
 	int i, nbio = 0;
 	int operation;
+	bool drain = false;
 	struct blk_plug plug;
 
 	switch (req->operation) {
@@ -541,11 +542,12 @@ static int dispatch_rw_block_io(struct x
 		blkif->st_wr_req++;
 		operation = WRITE_ODIRECT;
 		break;
+	case BLKIF_OP_WRITE_BARRIER:
+		drain = true;
 	case BLKIF_OP_FLUSH_DISKCACHE:
 		blkif->st_f_req++;
 		operation = WRITE_FLUSH;
 		break;
-	case BLKIF_OP_WRITE_BARRIER:
 	default:
 		operation = 0; /* make gcc happy */
 		goto fail_response;
@@ -603,6 +605,17 @@ static int dispatch_rw_block_io(struct x
 		}
 	}
 
+	if (drain) {
+		struct request_queue *q = bdev_get_queue(preq.bdev);
+		unsigned long flags;
+
+		/* Emulate the original behavior of write barriers */
+		spin_lock_irqsave(q->queue_lock, flags);
+		elv_drain_elevator(q);
+		__blk_run_queue(q);
+		spin_unlock_irqrestore(q->queue_lock, flags);
+	}
+
 	/*
 	 * If we have failed at this point, we need to undo the M2P override,
 	 * set gnttab_set_unmap_op on all of the grant references and perform




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help with implementing some form of barriers in 3.0 kernels.
  2011-09-13 10:44 ` Jan Beulich
@ 2011-09-14  8:59   ` Konrad Rzeszutek Wilk
  2011-09-14  9:12     ` Jan Beulich
  0 siblings, 1 reply; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-09-14  8:59 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Jeremy Fitzhardinge, hch, Jan Kara, linux-kernel

On Tue, Sep 13, 2011 at 11:44:07AM +0100, Jan Beulich wrote:
> >>> On 07.09.11 at 19:48, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > Hey Christoph,
> > 
> > I was wondering what you think is the proper way of implementing a
> > backend to support the 'barrier' type requests? We have this issue were
> > there are 2.6.36 type guests that still use barriers and we would like
> > to support them properly. But in 3.0 there are no barriers - hence
> > the question whether WRITE_fLUSH_FUA would be equal to WRITE_BARRIER?
> > 
> > Or is there some other things that we should take in consideration?
> > 
> > Thanks!
> 
> Below is what Jan Kara came up with for addressing this - what do
> you think?

It looks like it would do it. I modified it a bit, testing it now.


commit 315c0cf1a5174b9aed494d7903133ce9ac76d6f1
Author: Jan Kara <jack@suse.cz>
Date:   Tue Sep 13 11:44:07 2011 +0100

    xen: Add support for old BARRIER requests to xenblk driver
    
    Recent kernels do not support BARRIER operation but only FLUSH operation. But
    older xenblk frontends still use the BARRIER operation to achieve data
    integrity requirements. So add support for BARRIER operation into xenblk
    backend so that all guests do not corrupt their filesystem on host crash.
    
    Signed-off-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Jan Beulich <JBeulich@suse.com>
    [v1: Added some extra functions, and other cases]
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

diff --git a/block/elevator.c b/block/elevator.c
index a3b64bc..b2143a8 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -616,6 +616,7 @@ void elv_drain_elevator(struct request_queue *q)
 		       q->elevator->elevator_type->elevator_name, q->nr_sorted);
 	}
 }
+EXPORT_SYMBOL(elv_drain_elevator);
 
 /*
  * Call with queue lock held, interrupts disabled
diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 9713d5a..1df773c 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -464,6 +464,11 @@ static void __end_block_io_op(struct pending_req *pending_req, int error)
 		pr_debug(DRV_PFX "flush diskcache op failed, not supported\n");
 		xen_blkbk_flush_diskcache(XBT_NIL, pending_req->blkif->be, 0);
 		pending_req->status = BLKIF_RSP_EOPNOTSUPP;
+	} else if ((pending_req->operation == BLKIF_OP_WRITE_BARRIER) &&
+		 (error == -EOPNOTSUPP)) {
+		pr_debug(DRV_PFX "write barrier op failed, not supported\n");
+		xen_blkbk_barrier(XBT_NIL, pending_req->blkif->be, 0);
+		pending_req->status = BLKIF_RSP_EOPNOTSUPP;
 	} else if (error) {
 		pr_debug(DRV_PFX "Buffer not up-to-date at end of operation,"
 			 " error=%d\n", error);
@@ -590,6 +595,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 	struct bio *biolist[BLKIF_MAX_SEGMENTS_PER_REQUEST];
 	int i, nbio = 0;
 	int operation;
+	bool drain = false;
 	struct blk_plug plug;
 
 	switch (req->operation) {
@@ -601,6 +607,8 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 		blkif->st_wr_req++;
 		operation = WRITE_ODIRECT;
 		break;
+	case BLKIF_OP_WRITE_BARRIER:
+		drain = true;
 	case BLKIF_OP_FLUSH_DISKCACHE:
 		blkif->st_f_req++;
 		operation = WRITE_FLUSH;
@@ -609,7 +617,6 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 		blkif->st_ds_req++;
 		operation = REQ_DISCARD;
 		break;
-	case BLKIF_OP_WRITE_BARRIER:
 	default:
 		operation = 0; /* make gcc happy */
 		goto fail_response;
@@ -668,6 +675,17 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 		}
 	}
 
+	if (drain) {
+		struct request_queue *q = bdev_get_queue(preq.bdev);
+		unsigned long flags;
+
+		/* Emulate the original behavior of write barriers */
+		spin_lock_irqsave(q->queue_lock, flags);
+		elv_drain_elevator(q);
+		__blk_run_queue(q);
+		spin_unlock_irqrestore(q->queue_lock, flags);
+	}
+
 	/*
 	 * If we have failed at this point, we need to undo the M2P override,
 	 * set gnttab_set_unmap_op on all of the grant references and perform
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index bfb532e..42b0e46 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -228,6 +228,8 @@ int xen_blkif_schedule(void *arg);
 
 int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt,
 			      struct backend_info *be, int state);
+int xen_blkbk_barrier(struct xenbus_transaction xbt,
+		      struct backend_info *be, int state);
 
 struct xenbus_device *xen_blkbk_xenbus(struct backend_info *be);
 
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 2b3aef0..b477aee 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -421,6 +421,20 @@ int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt,
 	return err;
 }
 
+int xen_blkbk_barrier(struct xenbus_transaction xbt,
+		      struct backend_info *be, int state)
+{
+	struct xenbus_device *dev = be->dev;
+	int err;
+
+	err = xenbus_printf(xbt, dev->nodename, "feature-barrier",
+			    "%d", state);
+	if (err)
+		xenbus_dev_fatal(dev, err, "writing feature-barrier");
+
+	return err;
+}
+
 int xen_blkbk_discard(struct xenbus_transaction xbt, struct backend_info *be)
 {
 	struct xenbus_device *dev = be->dev;
@@ -706,6 +720,8 @@ again:
 	if (err)
 		goto abort;
 
+	err = xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support);
+
 	err = xen_blkbk_discard(xbt, be);
 
 	err = xenbus_printf(xbt, dev->nodename, "sectors", "%llu",


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: Help with implementing some form of barriers in 3.0 kernels.
  2011-09-14  8:59   ` Konrad Rzeszutek Wilk
@ 2011-09-14  9:12     ` Jan Beulich
  2011-09-14  9:30       ` Konrad Rzeszutek Wilk
  2011-09-14 14:32       ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 25+ messages in thread
From: Jan Beulich @ 2011-09-14  9:12 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Jeremy Fitzhardinge, hch, Jan Kara, linux-kernel

>>> On 14.09.11 at 10:59, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Tue, Sep 13, 2011 at 11:44:07AM +0100, Jan Beulich wrote:
>> >>> On 07.09.11 at 19:48, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>> > Hey Christoph,
>> > 
>> > I was wondering what you think is the proper way of implementing a
>> > backend to support the 'barrier' type requests? We have this issue were
>> > there are 2.6.36 type guests that still use barriers and we would like
>> > to support them properly. But in 3.0 there are no barriers - hence
>> > the question whether WRITE_fLUSH_FUA would be equal to WRITE_BARRIER?
>> > 
>> > Or is there some other things that we should take in consideration?
>> > 
>> > Thanks!
>> 
>> Below is what Jan Kara came up with for addressing this - what do
>> you think?
> 
> It looks like it would do it. I modified it a bit, testing it now.
> 
> 
> commit 315c0cf1a5174b9aed494d7903133ce9ac76d6f1
> Author: Jan Kara <jack@suse.cz>
> Date:   Tue Sep 13 11:44:07 2011 +0100
> 
>     xen: Add support for old BARRIER requests to xenblk driver
>     
>     Recent kernels do not support BARRIER operation but only FLUSH 
> operation. But
>     older xenblk frontends still use the BARRIER operation to achieve data
>     integrity requirements. So add support for BARRIER operation into xenblk
>     backend so that all guests do not corrupt their filesystem on host 
> crash.
>     
>     Signed-off-by: Jan Kara <jack@suse.cz>
>     Signed-off-by: Jan Beulich <JBeulich@suse.com>
>     [v1: Added some extra functions, and other cases]

Ah, yes, of course - Jan probably didn't pay attention because in our
variant of the driver these functions never got removed.

>     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> 
> diff --git a/block/elevator.c b/block/elevator.c
> index a3b64bc..b2143a8 100644
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -616,6 +616,7 @@ void elv_drain_elevator(struct request_queue *q)
>  		       q->elevator->elevator_type->elevator_name, q->nr_sorted);
>  	}
>  }
> +EXPORT_SYMBOL(elv_drain_elevator);

Now, if you modify it anyway, how about making this an
EXPORT_SYMBOL_GPL()?

Jan

>  
>  /*
>   * Call with queue lock held, interrupts disabled
> diff --git a/drivers/block/xen-blkback/blkback.c 
> b/drivers/block/xen-blkback/blkback.c
> index 9713d5a..1df773c 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -464,6 +464,11 @@ static void __end_block_io_op(struct pending_req 
> *pending_req, int error)
>  		pr_debug(DRV_PFX "flush diskcache op failed, not supported\n");
>  		xen_blkbk_flush_diskcache(XBT_NIL, pending_req->blkif->be, 0);
>  		pending_req->status = BLKIF_RSP_EOPNOTSUPP;
> +	} else if ((pending_req->operation == BLKIF_OP_WRITE_BARRIER) &&
> +		 (error == -EOPNOTSUPP)) {
> +		pr_debug(DRV_PFX "write barrier op failed, not supported\n");
> +		xen_blkbk_barrier(XBT_NIL, pending_req->blkif->be, 0);
> +		pending_req->status = BLKIF_RSP_EOPNOTSUPP;
>  	} else if (error) {
>  		pr_debug(DRV_PFX "Buffer not up-to-date at end of operation,"
>  			 " error=%d\n", error);
> @@ -590,6 +595,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
>  	struct bio *biolist[BLKIF_MAX_SEGMENTS_PER_REQUEST];
>  	int i, nbio = 0;
>  	int operation;
> +	bool drain = false;
>  	struct blk_plug plug;
>  
>  	switch (req->operation) {
> @@ -601,6 +607,8 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
>  		blkif->st_wr_req++;
>  		operation = WRITE_ODIRECT;
>  		break;
> +	case BLKIF_OP_WRITE_BARRIER:
> +		drain = true;
>  	case BLKIF_OP_FLUSH_DISKCACHE:
>  		blkif->st_f_req++;
>  		operation = WRITE_FLUSH;
> @@ -609,7 +617,6 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
>  		blkif->st_ds_req++;
>  		operation = REQ_DISCARD;
>  		break;
> -	case BLKIF_OP_WRITE_BARRIER:
>  	default:
>  		operation = 0; /* make gcc happy */
>  		goto fail_response;
> @@ -668,6 +675,17 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
>  		}
>  	}
>  
> +	if (drain) {
> +		struct request_queue *q = bdev_get_queue(preq.bdev);
> +		unsigned long flags;
> +
> +		/* Emulate the original behavior of write barriers */
> +		spin_lock_irqsave(q->queue_lock, flags);
> +		elv_drain_elevator(q);
> +		__blk_run_queue(q);
> +		spin_unlock_irqrestore(q->queue_lock, flags);
> +	}
> +
>  	/*
>  	 * If we have failed at this point, we need to undo the M2P override,
>  	 * set gnttab_set_unmap_op on all of the grant references and perform
> diff --git a/drivers/block/xen-blkback/common.h 
> b/drivers/block/xen-blkback/common.h
> index bfb532e..42b0e46 100644
> --- a/drivers/block/xen-blkback/common.h
> +++ b/drivers/block/xen-blkback/common.h
> @@ -228,6 +228,8 @@ int xen_blkif_schedule(void *arg);
>  
>  int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt,
>  			      struct backend_info *be, int state);
> +int xen_blkbk_barrier(struct xenbus_transaction xbt,
> +		      struct backend_info *be, int state);
>  
>  struct xenbus_device *xen_blkbk_xenbus(struct backend_info *be);
>  
> diff --git a/drivers/block/xen-blkback/xenbus.c 
> b/drivers/block/xen-blkback/xenbus.c
> index 2b3aef0..b477aee 100644
> --- a/drivers/block/xen-blkback/xenbus.c
> +++ b/drivers/block/xen-blkback/xenbus.c
> @@ -421,6 +421,20 @@ int xen_blkbk_flush_diskcache(struct xenbus_transaction 
> xbt,
>  	return err;
>  }
>  
> +int xen_blkbk_barrier(struct xenbus_transaction xbt,
> +		      struct backend_info *be, int state)
> +{
> +	struct xenbus_device *dev = be->dev;
> +	int err;
> +
> +	err = xenbus_printf(xbt, dev->nodename, "feature-barrier",
> +			    "%d", state);
> +	if (err)
> +		xenbus_dev_fatal(dev, err, "writing feature-barrier");
> +
> +	return err;
> +}
> +
>  int xen_blkbk_discard(struct xenbus_transaction xbt, struct backend_info 
> *be)
>  {
>  	struct xenbus_device *dev = be->dev;
> @@ -706,6 +720,8 @@ again:
>  	if (err)
>  		goto abort;
>  
> +	err = xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support);
> +
>  	err = xen_blkbk_discard(xbt, be);
>  
>  	err = xenbus_printf(xbt, dev->nodename, "sectors", "%llu",



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help with implementing some form of barriers in 3.0 kernels.
  2011-09-14  9:12     ` Jan Beulich
@ 2011-09-14  9:30       ` Konrad Rzeszutek Wilk
  2011-09-14 10:15         ` Jan Beulich
  2011-09-14 14:32       ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-09-14  9:30 UTC (permalink / raw)
  To: Jan Beulich, jaxboe; +Cc: Jeremy Fitzhardinge, hch, Jan Kara, linux-kernel

> >> Below is what Jan Kara came up with for addressing this - what do
> >> you think?

Jens,

Could you take a look at the block/elevator patch below?
> > 
> > It looks like it would do it. I modified it a bit, testing it now.
> > 
> > 
> > commit 315c0cf1a5174b9aed494d7903133ce9ac76d6f1
> > Author: Jan Kara <jack@suse.cz>
> > Date:   Tue Sep 13 11:44:07 2011 +0100
> > 
> >     xen: Add support for old BARRIER requests to xenblk driver
> >     
> >     Recent kernels do not support BARRIER operation but only FLUSH 
> > operation. But
> >     older xenblk frontends still use the BARRIER operation to achieve data
> >     integrity requirements. So add support for BARRIER operation into xenblk
> >     backend so that all guests do not corrupt their filesystem on host 
> > crash.
> >     
> >     Signed-off-by: Jan Kara <jack@suse.cz>
> >     Signed-off-by: Jan Beulich <JBeulich@suse.com>
> >     [v1: Added some extra functions, and other cases]
> 
> Ah, yes, of course - Jan probably didn't pay attention because in our
> variant of the driver these functions never got removed.

<nods> Which is OK - but you might want to migrate in the the 'feature-write-flush'
> 
> >     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > 
> > diff --git a/block/elevator.c b/block/elevator.c
> > index a3b64bc..b2143a8 100644
> > --- a/block/elevator.c
> > +++ b/block/elevator.c
> > @@ -616,6 +616,7 @@ void elv_drain_elevator(struct request_queue *q)
> >  		       q->elevator->elevator_type->elevator_name, q->nr_sorted);
> >  	}
> >  }
> > +EXPORT_SYMBOL(elv_drain_elevator);
> 
> Now, if you modify it anyway, how about making this an
> EXPORT_SYMBOL_GPL()?

The rest of that file had EXPORT_SYMBOL.. so I figured I will follow
the same appearance. But I am OK doing it via EXPORT_SYMBOL_GPL. Lets
ask Jens.

> 
> Jan
> 
> >  
> >  /*
> >   * Call with queue lock held, interrupts disabled
> > diff --git a/drivers/block/xen-blkback/blkback.c 
> > b/drivers/block/xen-blkback/blkback.c
> > index 9713d5a..1df773c 100644
> > --- a/drivers/block/xen-blkback/blkback.c
> > +++ b/drivers/block/xen-blkback/blkback.c
> > @@ -464,6 +464,11 @@ static void __end_block_io_op(struct pending_req 
> > *pending_req, int error)
> >  		pr_debug(DRV_PFX "flush diskcache op failed, not supported\n");
> >  		xen_blkbk_flush_diskcache(XBT_NIL, pending_req->blkif->be, 0);
> >  		pending_req->status = BLKIF_RSP_EOPNOTSUPP;
> > +	} else if ((pending_req->operation == BLKIF_OP_WRITE_BARRIER) &&
> > +		 (error == -EOPNOTSUPP)) {
> > +		pr_debug(DRV_PFX "write barrier op failed, not supported\n");
> > +		xen_blkbk_barrier(XBT_NIL, pending_req->blkif->be, 0);
> > +		pending_req->status = BLKIF_RSP_EOPNOTSUPP;
> >  	} else if (error) {
> >  		pr_debug(DRV_PFX "Buffer not up-to-date at end of operation,"
> >  			 " error=%d\n", error);
> > @@ -590,6 +595,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
> >  	struct bio *biolist[BLKIF_MAX_SEGMENTS_PER_REQUEST];
> >  	int i, nbio = 0;
> >  	int operation;
> > +	bool drain = false;
> >  	struct blk_plug plug;
> >  
> >  	switch (req->operation) {
> > @@ -601,6 +607,8 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
> >  		blkif->st_wr_req++;
> >  		operation = WRITE_ODIRECT;
> >  		break;
> > +	case BLKIF_OP_WRITE_BARRIER:
> > +		drain = true;
> >  	case BLKIF_OP_FLUSH_DISKCACHE:
> >  		blkif->st_f_req++;
> >  		operation = WRITE_FLUSH;
> > @@ -609,7 +617,6 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
> >  		blkif->st_ds_req++;
> >  		operation = REQ_DISCARD;
> >  		break;
> > -	case BLKIF_OP_WRITE_BARRIER:
> >  	default:
> >  		operation = 0; /* make gcc happy */
> >  		goto fail_response;
> > @@ -668,6 +675,17 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
> >  		}
> >  	}
> >  
> > +	if (drain) {
> > +		struct request_queue *q = bdev_get_queue(preq.bdev);
> > +		unsigned long flags;
> > +
> > +		/* Emulate the original behavior of write barriers */
> > +		spin_lock_irqsave(q->queue_lock, flags);
> > +		elv_drain_elevator(q);
> > +		__blk_run_queue(q);
> > +		spin_unlock_irqrestore(q->queue_lock, flags);
> > +	}
> > +
> >  	/*
> >  	 * If we have failed at this point, we need to undo the M2P override,
> >  	 * set gnttab_set_unmap_op on all of the grant references and perform
> > diff --git a/drivers/block/xen-blkback/common.h 
> > b/drivers/block/xen-blkback/common.h
> > index bfb532e..42b0e46 100644
> > --- a/drivers/block/xen-blkback/common.h
> > +++ b/drivers/block/xen-blkback/common.h
> > @@ -228,6 +228,8 @@ int xen_blkif_schedule(void *arg);
> >  
> >  int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt,
> >  			      struct backend_info *be, int state);
> > +int xen_blkbk_barrier(struct xenbus_transaction xbt,
> > +		      struct backend_info *be, int state);
> >  
> >  struct xenbus_device *xen_blkbk_xenbus(struct backend_info *be);
> >  
> > diff --git a/drivers/block/xen-blkback/xenbus.c 
> > b/drivers/block/xen-blkback/xenbus.c
> > index 2b3aef0..b477aee 100644
> > --- a/drivers/block/xen-blkback/xenbus.c
> > +++ b/drivers/block/xen-blkback/xenbus.c
> > @@ -421,6 +421,20 @@ int xen_blkbk_flush_diskcache(struct xenbus_transaction 
> > xbt,
> >  	return err;
> >  }
> >  
> > +int xen_blkbk_barrier(struct xenbus_transaction xbt,
> > +		      struct backend_info *be, int state)
> > +{
> > +	struct xenbus_device *dev = be->dev;
> > +	int err;
> > +
> > +	err = xenbus_printf(xbt, dev->nodename, "feature-barrier",
> > +			    "%d", state);
> > +	if (err)
> > +		xenbus_dev_fatal(dev, err, "writing feature-barrier");
> > +
> > +	return err;
> > +}
> > +
> >  int xen_blkbk_discard(struct xenbus_transaction xbt, struct backend_info 
> > *be)
> >  {
> >  	struct xenbus_device *dev = be->dev;
> > @@ -706,6 +720,8 @@ again:
> >  	if (err)
> >  		goto abort;
> >  
> > +	err = xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support);
> > +
> >  	err = xen_blkbk_discard(xbt, be);
> >  
> >  	err = xenbus_printf(xbt, dev->nodename, "sectors", "%llu",
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help with implementing some form of barriers in 3.0 kernels.
  2011-09-14  9:30       ` Konrad Rzeszutek Wilk
@ 2011-09-14 10:15         ` Jan Beulich
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Beulich @ 2011-09-14 10:15 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: jaxboe, Jeremy Fitzhardinge, hch, Jan Kara, linux-kernel

>>> On 14.09.11 at 11:30, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>> >     [v1: Added some extra functions, and other cases]
>> 
>> Ah, yes, of course - Jan probably didn't pay attention because in our
>> variant of the driver these functions never got removed.
> 
> <nods> Which is OK - but you might want to migrate in the the 
> 'feature-write-flush'

We really have both. I just never removed the barrier one.

Jan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help with implementing some form of barriers in 3.0 kernels.
  2011-09-14  9:12     ` Jan Beulich
  2011-09-14  9:30       ` Konrad Rzeszutek Wilk
@ 2011-09-14 14:32       ` Konrad Rzeszutek Wilk
  2011-09-14 15:01         ` [PATCH]: xen/blkback: Add support for old BARRIER requests - 'feature-barrier', was "Help with implementing some form of barriers in 3.0 kernels." Konrad Rzeszutek Wilk
  2011-09-14 15:34         ` Help with implementing some form of barriers in 3.0 kernels Mike Snitzer
  1 sibling, 2 replies; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-09-14 14:32 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Jeremy Fitzhardinge, hch, Jan Kara, linux-kernel

> > +	if (drain) {
> > +		struct request_queue *q = bdev_get_queue(preq.bdev);
> > +		unsigned long flags;
> > +

> > +		/* Emulate the original behavior of write barriers */
> > +		spin_lock_irqsave(q->queue_lock, flags);
> > +		elv_drain_elevator(q);
> > +		__blk_run_queue(q);
> > +		spin_unlock_irqrestore(q->queue_lock, flags);
> > +	}
I also had to add:

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index eaf49d1..20fddbc 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -679,6 +679,10 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 		struct request_queue *q = bdev_get_queue(preq.bdev);
 		unsigned long flags;
 
+		if (!q->elevator) {
+			__end_block_io_op(pending_req, -EOPNOTSUPP);
+			return -EOPNOTSUPP;
+		}
 		/* Emulate the original behavior of write barriers */
 		spin_lock_irqsave(q->queue_lock, flags);
 		elv_drain_elevator(q);
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index 52d8893..722714a 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -157,6 +157,7 @@ struct xen_vbd {
 	/* Cached size parameter. */
 	sector_t		size;
 	bool			flush_support;
+	bool			barrier_support;
 };
 
 struct backend_info;
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index da1e27a..7189ecd 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -384,6 +384,9 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
 	if (q && q->flush_flags)
 		vbd->flush_support = true;
 
+	if (q && q->elevator && q->elevator->ops)
+		vbd->barrier_support = true;
+
 	DPRINTK("Successful creation of handle=%04x (dom=%u)\n",
 		handle, blkif->domid);
 	return 0;
@@ -728,7 +731,7 @@ again:
 	if (err)
 		goto abort;
 
-	err = xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support);
+	err = xen_blkbk_barrier(xbt, be, be->blkif->vbd.barrier_support);
 
 	err = xen_blkbk_discard(xbt, be);


Otherwise it would crash.. Thought I am not sure why the elevator is not
set (the guest is on a LVM LV).

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH]: xen/blkback: Add support for old BARRIER requests - 'feature-barrier', was "Help with implementing some form of barriers in 3.0 kernels."
  2011-09-14 14:32       ` Konrad Rzeszutek Wilk
@ 2011-09-14 15:01         ` Konrad Rzeszutek Wilk
  2011-09-14 16:13           ` Jan Beulich
  2011-09-14 15:34         ` Help with implementing some form of barriers in 3.0 kernels Mike Snitzer
  1 sibling, 1 reply; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-09-14 15:01 UTC (permalink / raw)
  To: Jan Beulich, axboe; +Cc: Jeremy Fitzhardinge, hch, Jan Kara, linux-kernel

On Wed, Sep 14, 2011 at 10:32:47AM -0400, Konrad Rzeszutek Wilk wrote:

Let me just post the one that I've succesfully tested. Not sure
about the need for the check for q->elvator in the dispatch_rw_block_io
so if you guys want me to remove it I can certainly do it.

commit a5a12f729697161522ecae0564915cf4145cb65c
Author: Jan Kara <jack@suse.cz>
Date:   Tue Sep 13 11:44:07 2011 +0100

    xen/blkback: Add support for old BARRIER requests - 'feature-barrier'
    
    Recent kernels do not support BARRIER operation but only FLUSH operation. But
    older xenblk frontends still use the BARRIER operation to achieve data
    integrity requirements. So add support for BARRIER operation into xenblk
    backend so that all guests do not corrupt their filesystem on host crash.
    
    Signed-off-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Jan Beulich <JBeulich@suse.com>
    [v1: Added some extra functions, and other cases]
    [v2: Added check for q->elevator]
    [v3: Changed EXPORT_SYMBOL to EXPORT_SYMBOL_GPL]
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

diff --git a/block/elevator.c b/block/elevator.c
index a3b64bc..4e6c5b6 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -616,6 +616,7 @@ void elv_drain_elevator(struct request_queue *q)
 		       q->elevator->elevator_type->elevator_name, q->nr_sorted);
 	}
 }
+EXPORT_SYMBOL_GPL(elv_drain_elevator);
 
 /*
  * Call with queue lock held, interrupts disabled
diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 9713d5a..573744e 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -464,6 +464,11 @@ static void __end_block_io_op(struct pending_req *pending_req, int error)
 		pr_debug(DRV_PFX "flush diskcache op failed, not supported\n");
 		xen_blkbk_flush_diskcache(XBT_NIL, pending_req->blkif->be, 0);
 		pending_req->status = BLKIF_RSP_EOPNOTSUPP;
+	} else if ((pending_req->operation == BLKIF_OP_WRITE_BARRIER) &&
+		   (error == -EOPNOTSUPP)) {
+		pr_debug(DRV_PFX "write barrier op failed, not supported\n");
+		xen_blkbk_barrier(XBT_NIL, pending_req->blkif->be, 0);
+		pending_req->status = BLKIF_RSP_EOPNOTSUPP;
 	} else if (error) {
 		pr_debug(DRV_PFX "Buffer not up-to-date at end of operation,"
 			 " error=%d\n", error);
@@ -590,6 +595,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 	struct bio *biolist[BLKIF_MAX_SEGMENTS_PER_REQUEST];
 	int i, nbio = 0;
 	int operation;
+	bool drain = false;
 	struct blk_plug plug;
 
 	switch (req->operation) {
@@ -601,6 +607,8 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 		blkif->st_wr_req++;
 		operation = WRITE_ODIRECT;
 		break;
+	case BLKIF_OP_WRITE_BARRIER:
+		drain = true;
 	case BLKIF_OP_FLUSH_DISKCACHE:
 		blkif->st_f_req++;
 		operation = WRITE_FLUSH;
@@ -609,7 +617,6 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 		blkif->st_ds_req++;
 		operation = REQ_DISCARD;
 		break;
-	case BLKIF_OP_WRITE_BARRIER:
 	default:
 		operation = 0; /* make gcc happy */
 		goto fail_response;
@@ -668,6 +675,21 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 		}
 	}
 
+	if (drain) {
+		struct request_queue *q = bdev_get_queue(preq.bdev);
+		unsigned long flags;
+
+		if (!q->elevator) {
+			__end_block_io_op(pending_req, -EOPNOTSUPP);
+			return -EOPNOTSUPP;
+		}
+		/* Emulate the original behavior of write barriers */
+		spin_lock_irqsave(q->queue_lock, flags);
+		elv_drain_elevator(q);
+		__blk_run_queue(q);
+		spin_unlock_irqrestore(q->queue_lock, flags);
+	}
+
 	/*
 	 * If we have failed at this point, we need to undo the M2P override,
 	 * set gnttab_set_unmap_op on all of the grant references and perform
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index bfb532e..9f1990b 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -157,6 +157,7 @@ struct xen_vbd {
 	/* Cached size parameter. */
 	sector_t		size;
 	bool			flush_support;
+	bool			barrier_support;
 };
 
 struct backend_info;
@@ -228,6 +229,8 @@ int xen_blkif_schedule(void *arg);
 
 int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt,
 			      struct backend_info *be, int state);
+int xen_blkbk_barrier(struct xenbus_transaction xbt,
+		      struct backend_info *be, int state);
 
 struct xenbus_device *xen_blkbk_xenbus(struct backend_info *be);
 
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 2b3aef0..ec0975b 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -376,6 +376,9 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
 	if (q && q->flush_flags)
 		vbd->flush_support = true;
 
+	if (q && q->elevator)
+		vbd->barrier_support = true;
+
 	DPRINTK("Successful creation of handle=%04x (dom=%u)\n",
 		handle, blkif->domid);
 	return 0;
@@ -421,6 +424,20 @@ int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt,
 	return err;
 }
 
+int xen_blkbk_barrier(struct xenbus_transaction xbt,
+		      struct backend_info *be, int state)
+{
+	struct xenbus_device *dev = be->dev;
+	int err;
+
+	err = xenbus_printf(xbt, dev->nodename, "feature-barrier",
+			    "%d", state);
+	if (err)
+		xenbus_dev_fatal(dev, err, "writing feature-barrier");
+
+	return err;
+}
+
 int xen_blkbk_discard(struct xenbus_transaction xbt, struct backend_info *be)
 {
 	struct xenbus_device *dev = be->dev;
@@ -706,6 +723,8 @@ again:
 	if (err)
 		goto abort;
 
+	err = xen_blkbk_barrier(xbt, be, be->blkif->vbd.barrier_support);
+
 	err = xen_blkbk_discard(xbt, be);
 
 	err = xenbus_printf(xbt, dev->nodename, "sectors", "%llu",

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: Help with implementing some form of barriers in 3.0 kernels.
  2011-09-14 14:32       ` Konrad Rzeszutek Wilk
  2011-09-14 15:01         ` [PATCH]: xen/blkback: Add support for old BARRIER requests - 'feature-barrier', was "Help with implementing some form of barriers in 3.0 kernels." Konrad Rzeszutek Wilk
@ 2011-09-14 15:34         ` Mike Snitzer
  1 sibling, 0 replies; 25+ messages in thread
From: Mike Snitzer @ 2011-09-14 15:34 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Jan Beulich, Jeremy Fitzhardinge, hch, Jan Kara, linux-kernel

On Wed, Sep 14, 2011 at 10:32 AM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
>
> > > +   if (drain) {
> > > +           struct request_queue *q = bdev_get_queue(preq.bdev);
> > > +           unsigned long flags;
> > > +
>
> > > +           /* Emulate the original behavior of write barriers */
> > > +           spin_lock_irqsave(q->queue_lock, flags);
> > > +           elv_drain_elevator(q);
> > > +           __blk_run_queue(q);
> > > +           spin_unlock_irqrestore(q->queue_lock, flags);
> > > +   }
> I also had to add:
>
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index eaf49d1..20fddbc 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -679,6 +679,10 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
>                struct request_queue *q = bdev_get_queue(preq.bdev);
>                unsigned long flags;
>
> +               if (!q->elevator) {
> +                       __end_block_io_op(pending_req, -EOPNOTSUPP);
> +                       return -EOPNOTSUPP;
> +               }
>                /* Emulate the original behavior of write barriers */
>                spin_lock_irqsave(q->queue_lock, flags);
>                elv_drain_elevator(q);
> diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
> index 52d8893..722714a 100644
> --- a/drivers/block/xen-blkback/common.h
> +++ b/drivers/block/xen-blkback/common.h
> @@ -157,6 +157,7 @@ struct xen_vbd {
>        /* Cached size parameter. */
>        sector_t                size;
>        bool                    flush_support;
> +       bool                    barrier_support;
>  };
>
>  struct backend_info;
> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> index da1e27a..7189ecd 100644
> --- a/drivers/block/xen-blkback/xenbus.c
> +++ b/drivers/block/xen-blkback/xenbus.c
> @@ -384,6 +384,9 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
>        if (q && q->flush_flags)
>                vbd->flush_support = true;
>
> +       if (q && q->elevator && q->elevator->ops)
> +               vbd->barrier_support = true;
> +
>        DPRINTK("Successful creation of handle=%04x (dom=%u)\n",
>                handle, blkif->domid);
>        return 0;
> @@ -728,7 +731,7 @@ again:
>        if (err)
>                goto abort;
>
> -       err = xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support);
> +       err = xen_blkbk_barrier(xbt, be, be->blkif->vbd.barrier_support);
>
>        err = xen_blkbk_discard(xbt, be);
>
>
> Otherwise it would crash.. Thought I am not sure why the elevator is not
> set (the guest is on a LVM LV).

bio-based DM devices do not have an elevator.  But that doesn't mean
barriers (or FLUSH/FUA) aren't supported and/or needed by the
underlying devices.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH]: xen/blkback: Add support for old BARRIER requests - 'feature-barrier', was "Help with implementing some form of barriers in 3.0 kernels."
  2011-09-14 15:01         ` [PATCH]: xen/blkback: Add support for old BARRIER requests - 'feature-barrier', was "Help with implementing some form of barriers in 3.0 kernels." Konrad Rzeszutek Wilk
@ 2011-09-14 16:13           ` Jan Beulich
  2011-09-15 12:51             ` Konrad Rzeszutek Wilk
  2011-09-15 13:24             ` Jan Kara
  0 siblings, 2 replies; 25+ messages in thread
From: Jan Beulich @ 2011-09-14 16:13 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Jeremy Fitzhardinge, hch, axboe, Jan Kara, linux-kernel

>>> On 14.09.11 at 17:01, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Wed, Sep 14, 2011 at 10:32:47AM -0400, Konrad Rzeszutek Wilk wrote:
> 
> Let me just post the one that I've succesfully tested. Not sure
> about the need for the check for q->elvator in the dispatch_rw_block_io
> so if you guys want me to remove it I can certainly do it.
> 
> commit a5a12f729697161522ecae0564915cf4145cb65c
> Author: Jan Kara <jack@suse.cz>
> Date:   Tue Sep 13 11:44:07 2011 +0100
> 
>     xen/blkback: Add support for old BARRIER requests - 'feature-barrier'
>     
>     Recent kernels do not support BARRIER operation but only FLUSH 
> operation. But
>     older xenblk frontends still use the BARRIER operation to achieve data
>     integrity requirements. So add support for BARRIER operation into xenblk
>     backend so that all guests do not corrupt their filesystem on host 
> crash.
>     
>     Signed-off-by: Jan Kara <jack@suse.cz>
>     Signed-off-by: Jan Beulich <JBeulich@suse.com>
>     [v1: Added some extra functions, and other cases]
>     [v2: Added check for q->elevator]
>     [v3: Changed EXPORT_SYMBOL to EXPORT_SYMBOL_GPL]
>     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> 
> diff --git a/block/elevator.c b/block/elevator.c
> index a3b64bc..4e6c5b6 100644
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -616,6 +616,7 @@ void elv_drain_elevator(struct request_queue *q)
>  		       q->elevator->elevator_type->elevator_name, q->nr_sorted);
>  	}
>  }
> +EXPORT_SYMBOL_GPL(elv_drain_elevator);
>  
>  /*
>   * Call with queue lock held, interrupts disabled
> diff --git a/drivers/block/xen-blkback/blkback.c 
> b/drivers/block/xen-blkback/blkback.c
> index 9713d5a..573744e 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -464,6 +464,11 @@ static void __end_block_io_op(struct pending_req 
> *pending_req, int error)
>  		pr_debug(DRV_PFX "flush diskcache op failed, not supported\n");
>  		xen_blkbk_flush_diskcache(XBT_NIL, pending_req->blkif->be, 0);
>  		pending_req->status = BLKIF_RSP_EOPNOTSUPP;
> +	} else if ((pending_req->operation == BLKIF_OP_WRITE_BARRIER) &&
> +		   (error == -EOPNOTSUPP)) {
> +		pr_debug(DRV_PFX "write barrier op failed, not supported\n");
> +		xen_blkbk_barrier(XBT_NIL, pending_req->blkif->be, 0);
> +		pending_req->status = BLKIF_RSP_EOPNOTSUPP;
>  	} else if (error) {
>  		pr_debug(DRV_PFX "Buffer not up-to-date at end of operation,"
>  			 " error=%d\n", error);
> @@ -590,6 +595,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
>  	struct bio *biolist[BLKIF_MAX_SEGMENTS_PER_REQUEST];
>  	int i, nbio = 0;
>  	int operation;
> +	bool drain = false;
>  	struct blk_plug plug;
>  
>  	switch (req->operation) {
> @@ -601,6 +607,8 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
>  		blkif->st_wr_req++;
>  		operation = WRITE_ODIRECT;
>  		break;
> +	case BLKIF_OP_WRITE_BARRIER:
> +		drain = true;
>  	case BLKIF_OP_FLUSH_DISKCACHE:
>  		blkif->st_f_req++;
>  		operation = WRITE_FLUSH;
> @@ -609,7 +617,6 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
>  		blkif->st_ds_req++;
>  		operation = REQ_DISCARD;
>  		break;
> -	case BLKIF_OP_WRITE_BARRIER:
>  	default:
>  		operation = 0; /* make gcc happy */
>  		goto fail_response;
> @@ -668,6 +675,21 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
>  		}
>  	}
>  
> +	if (drain) {
> +		struct request_queue *q = bdev_get_queue(preq.bdev);
> +		unsigned long flags;
> +
> +		if (!q->elevator) {
> +			__end_block_io_op(pending_req, -EOPNOTSUPP);
> +			return -EOPNOTSUPP;
> +		}

You shouldn't return here - barrier requests may be "non-empty" (i.e.
carry data to be written), and hence you'd need to make sure the
writes get carried out nevertheless.

Jan - this suggests that the placement isn't correct either: The
draining of the queue - as I understand it - should be happening
*after* these writes completed, not before they get started.

Jan

> +		/* Emulate the original behavior of write barriers */
> +		spin_lock_irqsave(q->queue_lock, flags);
> +		elv_drain_elevator(q);
> +		__blk_run_queue(q);
> +		spin_unlock_irqrestore(q->queue_lock, flags);
> +	}
> +
>  	/*
>  	 * If we have failed at this point, we need to undo the M2P override,
>  	 * set gnttab_set_unmap_op on all of the grant references and perform
> diff --git a/drivers/block/xen-blkback/common.h 
> b/drivers/block/xen-blkback/common.h
> index bfb532e..9f1990b 100644
> --- a/drivers/block/xen-blkback/common.h
> +++ b/drivers/block/xen-blkback/common.h
> @@ -157,6 +157,7 @@ struct xen_vbd {
>  	/* Cached size parameter. */
>  	sector_t		size;
>  	bool			flush_support;
> +	bool			barrier_support;
>  };
>  
>  struct backend_info;
> @@ -228,6 +229,8 @@ int xen_blkif_schedule(void *arg);
>  
>  int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt,
>  			      struct backend_info *be, int state);
> +int xen_blkbk_barrier(struct xenbus_transaction xbt,
> +		      struct backend_info *be, int state);
>  
>  struct xenbus_device *xen_blkbk_xenbus(struct backend_info *be);
>  
> diff --git a/drivers/block/xen-blkback/xenbus.c 
> b/drivers/block/xen-blkback/xenbus.c
> index 2b3aef0..ec0975b 100644
> --- a/drivers/block/xen-blkback/xenbus.c
> +++ b/drivers/block/xen-blkback/xenbus.c
> @@ -376,6 +376,9 @@ static int xen_vbd_create(struct xen_blkif *blkif, 
> blkif_vdev_t handle,
>  	if (q && q->flush_flags)
>  		vbd->flush_support = true;
>  
> +	if (q && q->elevator)
> +		vbd->barrier_support = true;
> +
>  	DPRINTK("Successful creation of handle=%04x (dom=%u)\n",
>  		handle, blkif->domid);
>  	return 0;
> @@ -421,6 +424,20 @@ int xen_blkbk_flush_diskcache(struct xenbus_transaction 
> xbt,
>  	return err;
>  }
>  
> +int xen_blkbk_barrier(struct xenbus_transaction xbt,
> +		      struct backend_info *be, int state)
> +{
> +	struct xenbus_device *dev = be->dev;
> +	int err;
> +
> +	err = xenbus_printf(xbt, dev->nodename, "feature-barrier",
> +			    "%d", state);
> +	if (err)
> +		xenbus_dev_fatal(dev, err, "writing feature-barrier");
> +
> +	return err;
> +}
> +
>  int xen_blkbk_discard(struct xenbus_transaction xbt, struct backend_info 
> *be)
>  {
>  	struct xenbus_device *dev = be->dev;
> @@ -706,6 +723,8 @@ again:
>  	if (err)
>  		goto abort;
>  
> +	err = xen_blkbk_barrier(xbt, be, be->blkif->vbd.barrier_support);
> +
>  	err = xen_blkbk_discard(xbt, be);
>  
>  	err = xenbus_printf(xbt, dev->nodename, "sectors", "%llu",



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH]: xen/blkback: Add support for old BARRIER requests - 'feature-barrier', was "Help with implementing some form of barriers in 3.0 kernels."
  2011-09-14 16:13           ` Jan Beulich
@ 2011-09-15 12:51             ` Konrad Rzeszutek Wilk
  2011-09-15 13:00               ` Jan Beulich
  2011-09-15 13:24             ` Jan Kara
  1 sibling, 1 reply; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-09-15 12:51 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Jeremy Fitzhardinge, hch, axboe, Jan Kara, linux-kernel

> > +	if (drain) {
> > +		struct request_queue *q = bdev_get_queue(preq.bdev);
> > +		unsigned long flags;
> > +
> > +		if (!q->elevator) {
> > +			__end_block_io_op(pending_req, -EOPNOTSUPP);
> > +			return -EOPNOTSUPP;
> > +		}
> 
> You shouldn't return here - barrier requests may be "non-empty" (i.e.
> carry data to be written), and hence you'd need to make sure the
> writes get carried out nevertheless.

In theory (and in practice) the 'feature-barrier' is set to zero
if !q->elevator. So the frontend should not even try WRITE_BARRIER.

I think the better approach will be to outright fail the WRITE_BARRIER
request if 'feature-barrier=0' instead of just trying (which is what
this tries).

On a unrelated topic is what to do with devices which are bio based
but not request based. From the DM directory, only multipath is capable
of doing requests, while the rest are all bio. There is code to
quisce the bio's: dm_suspend, but it is not apparent to me how to
make it be exposed.


> 
> Jan - this suggests that the placement isn't correct either: The
> draining of the queue - as I understand it - should be happening
> *after* these writes completed, not before they get started.

So that looks like it should be just moved a bit, like this new patch:

commit 315c0cf1a5174b9aed494d7903133ce9ac76d6f1
Author: Jan Kara <jack@suse.cz>
Date:   Tue Sep 13 11:44:07 2011 +0100

    xen: Add support for old BARRIER requests to xenblk driver
    
    Recent kernels do not support BARRIER operation but only FLUSH operation. But
    older xenblk frontends still use the BARRIER operation to achieve data
    integrity requirements. So add support for BARRIER operation into xenblk
    backend so that all guests do not corrupt their filesystem on host crash.
    
    Signed-off-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Jan Beulich <JBeulich@suse.com>
    [v1: Added some extra functions, and other cases]
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

diff --git a/block/elevator.c b/block/elevator.c
index a3b64bc..b2143a8 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -616,6 +616,7 @@ void elv_drain_elevator(struct request_queue *q)
 		       q->elevator->elevator_type->elevator_name, q->nr_sorted);
 	}
 }
+EXPORT_SYMBOL(elv_drain_elevator);
 
 /*
  * Call with queue lock held, interrupts disabled
diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 9713d5a..1df773c 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -464,6 +464,11 @@ static void __end_block_io_op(struct pending_req *pending_req, int error)
 		pr_debug(DRV_PFX "flush diskcache op failed, not supported\n");
 		xen_blkbk_flush_diskcache(XBT_NIL, pending_req->blkif->be, 0);
 		pending_req->status = BLKIF_RSP_EOPNOTSUPP;
+	} else if ((pending_req->operation == BLKIF_OP_WRITE_BARRIER) &&
+		 (error == -EOPNOTSUPP)) {
+		pr_debug(DRV_PFX "write barrier op failed, not supported\n");
+		xen_blkbk_barrier(XBT_NIL, pending_req->blkif->be, 0);
+		pending_req->status = BLKIF_RSP_EOPNOTSUPP;
 	} else if (error) {
 		pr_debug(DRV_PFX "Buffer not up-to-date at end of operation,"
 			 " error=%d\n", error);
@@ -590,6 +595,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 	struct bio *biolist[BLKIF_MAX_SEGMENTS_PER_REQUEST];
 	int i, nbio = 0;
 	int operation;
+	bool drain = false;
 	struct blk_plug plug;
 
 	switch (req->operation) {
@@ -601,6 +607,8 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 		blkif->st_wr_req++;
 		operation = WRITE_ODIRECT;
 		break;
+	case BLKIF_OP_WRITE_BARRIER:
+		drain = true;
 	case BLKIF_OP_FLUSH_DISKCACHE:
 		blkif->st_f_req++;
 		operation = WRITE_FLUSH;
@@ -609,7 +617,6 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 		blkif->st_ds_req++;
 		operation = REQ_DISCARD;
 		break;
-	case BLKIF_OP_WRITE_BARRIER:
 	default:
 		operation = 0; /* make gcc happy */
 		goto fail_response;
@@ -668,6 +675,17 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 		}
 	}
 
+	if (drain) {
+		struct request_queue *q = bdev_get_queue(preq.bdev);
+		unsigned long flags;
+
+		/* Emulate the original behavior of write barriers */
+		spin_lock_irqsave(q->queue_lock, flags);
+		elv_drain_elevator(q);
+		__blk_run_queue(q);
+		spin_unlock_irqrestore(q->queue_lock, flags);
+	}
+
 	/*
 	 * If we have failed at this point, we need to undo the M2P override,
 	 * set gnttab_set_unmap_op on all of the grant references and perform
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index bfb532e..42b0e46 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -228,6 +228,8 @@ int xen_blkif_schedule(void *arg);
 
 int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt,
 			      struct backend_info *be, int state);
+int xen_blkbk_barrier(struct xenbus_transaction xbt,
+		      struct backend_info *be, int state);
 
 struct xenbus_device *xen_blkbk_xenbus(struct backend_info *be);
 
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 2b3aef0..b477aee 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -421,6 +421,20 @@ int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt,
 	return err;
 }
 
+int xen_blkbk_barrier(struct xenbus_transaction xbt,
+		      struct backend_info *be, int state)
+{
+	struct xenbus_device *dev = be->dev;
+	int err;
+
+	err = xenbus_printf(xbt, dev->nodename, "feature-barrier",
+			    "%d", state);
+	if (err)
+		xenbus_dev_fatal(dev, err, "writing feature-barrier");
+
+	return err;
+}
+
 int xen_blkbk_discard(struct xenbus_transaction xbt, struct backend_info *be)
 {
 	struct xenbus_device *dev = be->dev;
@@ -706,6 +720,8 @@ again:
 	if (err)
 		goto abort;
 
+	err = xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support);
+
 	err = xen_blkbk_discard(xbt, be);
 
 	err = xenbus_printf(xbt, dev->nodename, "sectors", "%llu",

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH]: xen/blkback: Add support for old BARRIER requests - 'feature-barrier', was "Help with implementing some form of barriers in 3.0 kernels."
  2011-09-15 12:51             ` Konrad Rzeszutek Wilk
@ 2011-09-15 13:00               ` Jan Beulich
  2011-09-15 14:21                 ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2011-09-15 13:00 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Jeremy Fitzhardinge, hch, axboe, Jan Kara, linux-kernel

>>> On 15.09.11 at 14:51, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>> > +	if (drain) {
>> > +		struct request_queue *q = bdev_get_queue(preq.bdev);
>> > +		unsigned long flags;
>> > +
>> > +		if (!q->elevator) {
>> > +			__end_block_io_op(pending_req, -EOPNOTSUPP);
>> > +			return -EOPNOTSUPP;
>> > +		}
>> 
>> You shouldn't return here - barrier requests may be "non-empty" (i.e.
>> carry data to be written), and hence you'd need to make sure the
>> writes get carried out nevertheless.
> 
> In theory (and in practice) the 'feature-barrier' is set to zero
> if !q->elevator. So the frontend should not even try WRITE_BARRIER.
> 
> I think the better approach will be to outright fail the WRITE_BARRIER
> request if 'feature-barrier=0' instead of just trying (which is what
> this tries).
> 
> On a unrelated topic is what to do with devices which are bio based
> but not request based. From the DM directory, only multipath is capable
> of doing requests, while the rest are all bio. There is code to
> quisce the bio's: dm_suspend, but it is not apparent to me how to
> make it be exposed.
> 
> 
>> 
>> Jan - this suggests that the placement isn't correct either: The
>> draining of the queue - as I understand it - should be happening
>> *after* these writes completed, not before they get started.
> 
> So that looks like it should be just moved a bit, like this new patch:

Looks like this is the patch after the very first modifications you made
to it - I don't see anything being moved, and I no longer see any of
the subsequent changes you did.

Jan

> commit 315c0cf1a5174b9aed494d7903133ce9ac76d6f1
> Author: Jan Kara <jack@suse.cz>
> Date:   Tue Sep 13 11:44:07 2011 +0100
> 
>     xen: Add support for old BARRIER requests to xenblk driver
>     
>     Recent kernels do not support BARRIER operation but only FLUSH 
> operation. But
>     older xenblk frontends still use the BARRIER operation to achieve data
>     integrity requirements. So add support for BARRIER operation into xenblk
>     backend so that all guests do not corrupt their filesystem on host 
> crash.
>     
>     Signed-off-by: Jan Kara <jack@suse.cz>
>     Signed-off-by: Jan Beulich <JBeulich@suse.com>
>     [v1: Added some extra functions, and other cases]
>     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> 
> diff --git a/block/elevator.c b/block/elevator.c
> index a3b64bc..b2143a8 100644
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -616,6 +616,7 @@ void elv_drain_elevator(struct request_queue *q)
>  		       q->elevator->elevator_type->elevator_name, q->nr_sorted);
>  	}
>  }
> +EXPORT_SYMBOL(elv_drain_elevator);
>  
>  /*
>   * Call with queue lock held, interrupts disabled
> diff --git a/drivers/block/xen-blkback/blkback.c 
> b/drivers/block/xen-blkback/blkback.c
> index 9713d5a..1df773c 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -464,6 +464,11 @@ static void __end_block_io_op(struct pending_req 
> *pending_req, int error)
>  		pr_debug(DRV_PFX "flush diskcache op failed, not supported\n");
>  		xen_blkbk_flush_diskcache(XBT_NIL, pending_req->blkif->be, 0);
>  		pending_req->status = BLKIF_RSP_EOPNOTSUPP;
> +	} else if ((pending_req->operation == BLKIF_OP_WRITE_BARRIER) &&
> +		 (error == -EOPNOTSUPP)) {
> +		pr_debug(DRV_PFX "write barrier op failed, not supported\n");
> +		xen_blkbk_barrier(XBT_NIL, pending_req->blkif->be, 0);
> +		pending_req->status = BLKIF_RSP_EOPNOTSUPP;
>  	} else if (error) {
>  		pr_debug(DRV_PFX "Buffer not up-to-date at end of operation,"
>  			 " error=%d\n", error);
> @@ -590,6 +595,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
>  	struct bio *biolist[BLKIF_MAX_SEGMENTS_PER_REQUEST];
>  	int i, nbio = 0;
>  	int operation;
> +	bool drain = false;
>  	struct blk_plug plug;
>  
>  	switch (req->operation) {
> @@ -601,6 +607,8 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
>  		blkif->st_wr_req++;
>  		operation = WRITE_ODIRECT;
>  		break;
> +	case BLKIF_OP_WRITE_BARRIER:
> +		drain = true;
>  	case BLKIF_OP_FLUSH_DISKCACHE:
>  		blkif->st_f_req++;
>  		operation = WRITE_FLUSH;
> @@ -609,7 +617,6 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
>  		blkif->st_ds_req++;
>  		operation = REQ_DISCARD;
>  		break;
> -	case BLKIF_OP_WRITE_BARRIER:
>  	default:
>  		operation = 0; /* make gcc happy */
>  		goto fail_response;
> @@ -668,6 +675,17 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
>  		}
>  	}
>  
> +	if (drain) {
> +		struct request_queue *q = bdev_get_queue(preq.bdev);
> +		unsigned long flags;
> +
> +		/* Emulate the original behavior of write barriers */
> +		spin_lock_irqsave(q->queue_lock, flags);
> +		elv_drain_elevator(q);
> +		__blk_run_queue(q);
> +		spin_unlock_irqrestore(q->queue_lock, flags);
> +	}
> +
>  	/*
>  	 * If we have failed at this point, we need to undo the M2P override,
>  	 * set gnttab_set_unmap_op on all of the grant references and perform
> diff --git a/drivers/block/xen-blkback/common.h 
> b/drivers/block/xen-blkback/common.h
> index bfb532e..42b0e46 100644
> --- a/drivers/block/xen-blkback/common.h
> +++ b/drivers/block/xen-blkback/common.h
> @@ -228,6 +228,8 @@ int xen_blkif_schedule(void *arg);
>  
>  int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt,
>  			      struct backend_info *be, int state);
> +int xen_blkbk_barrier(struct xenbus_transaction xbt,
> +		      struct backend_info *be, int state);
>  
>  struct xenbus_device *xen_blkbk_xenbus(struct backend_info *be);
>  
> diff --git a/drivers/block/xen-blkback/xenbus.c 
> b/drivers/block/xen-blkback/xenbus.c
> index 2b3aef0..b477aee 100644
> --- a/drivers/block/xen-blkback/xenbus.c
> +++ b/drivers/block/xen-blkback/xenbus.c
> @@ -421,6 +421,20 @@ int xen_blkbk_flush_diskcache(struct xenbus_transaction 
> xbt,
>  	return err;
>  }
>  
> +int xen_blkbk_barrier(struct xenbus_transaction xbt,
> +		      struct backend_info *be, int state)
> +{
> +	struct xenbus_device *dev = be->dev;
> +	int err;
> +
> +	err = xenbus_printf(xbt, dev->nodename, "feature-barrier",
> +			    "%d", state);
> +	if (err)
> +		xenbus_dev_fatal(dev, err, "writing feature-barrier");
> +
> +	return err;
> +}
> +
>  int xen_blkbk_discard(struct xenbus_transaction xbt, struct backend_info 
> *be)
>  {
>  	struct xenbus_device *dev = be->dev;
> @@ -706,6 +720,8 @@ again:
>  	if (err)
>  		goto abort;
>  
> +	err = xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support);
> +
>  	err = xen_blkbk_discard(xbt, be);
>  
>  	err = xenbus_printf(xbt, dev->nodename, "sectors", "%llu",



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH]: xen/blkback: Add support for old BARRIER requests - 'feature-barrier', was "Help with implementing some form of barriers in 3.0 kernels."
  2011-09-14 16:13           ` Jan Beulich
  2011-09-15 12:51             ` Konrad Rzeszutek Wilk
@ 2011-09-15 13:24             ` Jan Kara
  1 sibling, 0 replies; 25+ messages in thread
From: Jan Kara @ 2011-09-15 13:24 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Konrad Rzeszutek Wilk, Jeremy Fitzhardinge, hch, axboe, Jan Kara,
	linux-kernel

On Wed 14-09-11 17:13:07, Jan Beulich wrote:
> >>> On 14.09.11 at 17:01, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > On Wed, Sep 14, 2011 at 10:32:47AM -0400, Konrad Rzeszutek Wilk wrote:
> > 
> > Let me just post the one that I've succesfully tested. Not sure
> > about the need for the check for q->elvator in the dispatch_rw_block_io
> > so if you guys want me to remove it I can certainly do it.
> > 
> > commit a5a12f729697161522ecae0564915cf4145cb65c
> > Author: Jan Kara <jack@suse.cz>
> > Date:   Tue Sep 13 11:44:07 2011 +0100
> > 
> >     xen/blkback: Add support for old BARRIER requests - 'feature-barrier'
> >     
> >     Recent kernels do not support BARRIER operation but only FLUSH 
> > operation. But
> >     older xenblk frontends still use the BARRIER operation to achieve data
> >     integrity requirements. So add support for BARRIER operation into xenblk
> >     backend so that all guests do not corrupt their filesystem on host 
> > crash.
> >     
> >     Signed-off-by: Jan Kara <jack@suse.cz>
> >     Signed-off-by: Jan Beulich <JBeulich@suse.com>
> >     [v1: Added some extra functions, and other cases]
> >     [v2: Added check for q->elevator]
> >     [v3: Changed EXPORT_SYMBOL to EXPORT_SYMBOL_GPL]
> >     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > 
> > diff --git a/block/elevator.c b/block/elevator.c
> > index a3b64bc..4e6c5b6 100644
> > --- a/block/elevator.c
> > +++ b/block/elevator.c
> > @@ -616,6 +616,7 @@ void elv_drain_elevator(struct request_queue *q)
> >  		       q->elevator->elevator_type->elevator_name, q->nr_sorted);
> >  	}
> >  }
> > +EXPORT_SYMBOL_GPL(elv_drain_elevator);
> >  
> >  /*
> >   * Call with queue lock held, interrupts disabled
> > diff --git a/drivers/block/xen-blkback/blkback.c 
> > b/drivers/block/xen-blkback/blkback.c
> > index 9713d5a..573744e 100644
> > --- a/drivers/block/xen-blkback/blkback.c
> > +++ b/drivers/block/xen-blkback/blkback.c
> > @@ -464,6 +464,11 @@ static void __end_block_io_op(struct pending_req 
> > *pending_req, int error)
> >  		pr_debug(DRV_PFX "flush diskcache op failed, not supported\n");
> >  		xen_blkbk_flush_diskcache(XBT_NIL, pending_req->blkif->be, 0);
> >  		pending_req->status = BLKIF_RSP_EOPNOTSUPP;
> > +	} else if ((pending_req->operation == BLKIF_OP_WRITE_BARRIER) &&
> > +		   (error == -EOPNOTSUPP)) {
> > +		pr_debug(DRV_PFX "write barrier op failed, not supported\n");
> > +		xen_blkbk_barrier(XBT_NIL, pending_req->blkif->be, 0);
> > +		pending_req->status = BLKIF_RSP_EOPNOTSUPP;
> >  	} else if (error) {
> >  		pr_debug(DRV_PFX "Buffer not up-to-date at end of operation,"
> >  			 " error=%d\n", error);
> > @@ -590,6 +595,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
> >  	struct bio *biolist[BLKIF_MAX_SEGMENTS_PER_REQUEST];
> >  	int i, nbio = 0;
> >  	int operation;
> > +	bool drain = false;
> >  	struct blk_plug plug;
> >  
> >  	switch (req->operation) {
> > @@ -601,6 +607,8 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
> >  		blkif->st_wr_req++;
> >  		operation = WRITE_ODIRECT;
> >  		break;
> > +	case BLKIF_OP_WRITE_BARRIER:
> > +		drain = true;
> >  	case BLKIF_OP_FLUSH_DISKCACHE:
> >  		blkif->st_f_req++;
> >  		operation = WRITE_FLUSH;
> > @@ -609,7 +617,6 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
> >  		blkif->st_ds_req++;
> >  		operation = REQ_DISCARD;
> >  		break;
> > -	case BLKIF_OP_WRITE_BARRIER:
> >  	default:
> >  		operation = 0; /* make gcc happy */
> >  		goto fail_response;
> > @@ -668,6 +675,21 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
> >  		}
> >  	}
> >  
> > +	if (drain) {
> > +		struct request_queue *q = bdev_get_queue(preq.bdev);
> > +		unsigned long flags;
> > +
> > +		if (!q->elevator) {
> > +			__end_block_io_op(pending_req, -EOPNOTSUPP);
> > +			return -EOPNOTSUPP;
> > +		}
> 
> You shouldn't return here - barrier requests may be "non-empty" (i.e.
> carry data to be written), and hence you'd need to make sure the
> writes get carried out nevertheless.
> 
> Jan - this suggests that the placement isn't correct either: The
> draining of the queue - as I understand it - should be happening
> *after* these writes completed, not before they get started.
  The draining should really happen before. It was like that in older
kernels as well. But you are right that we should probably make the
operation be WRITE_FLUSH_FUA to properly simulate the old behavior.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH]: xen/blkback: Add support for old BARRIER requests - 'feature-barrier', was "Help with implementing some form of barriers in 3.0 kernels."
  2011-09-15 13:00               ` Jan Beulich
@ 2011-09-15 14:21                 ` Konrad Rzeszutek Wilk
  2011-09-15 15:13                   ` Jan Kara
  0 siblings, 1 reply; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-09-15 14:21 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Jeremy Fitzhardinge, hch, axboe, Jan Kara, linux-kernel

On Thu, Sep 15, 2011 at 02:00:43PM +0100, Jan Beulich wrote:
> >>> On 15.09.11 at 14:51, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> >> > +	if (drain) {
> >> > +		struct request_queue *q = bdev_get_queue(preq.bdev);
> >> > +		unsigned long flags;
> >> > +
> >> > +		if (!q->elevator) {
> >> > +			__end_block_io_op(pending_req, -EOPNOTSUPP);
> >> > +			return -EOPNOTSUPP;
> >> > +		}
> >> 
> >> You shouldn't return here - barrier requests may be "non-empty" (i.e.
> >> carry data to be written), and hence you'd need to make sure the
> >> writes get carried out nevertheless.
> > 
> > In theory (and in practice) the 'feature-barrier' is set to zero
> > if !q->elevator. So the frontend should not even try WRITE_BARRIER.
> > 
> > I think the better approach will be to outright fail the WRITE_BARRIER
> > request if 'feature-barrier=0' instead of just trying (which is what
> > this tries).
> > 
> > On a unrelated topic is what to do with devices which are bio based
> > but not request based. From the DM directory, only multipath is capable
> > of doing requests, while the rest are all bio. There is code to
> > quisce the bio's: dm_suspend, but it is not apparent to me how to
> > make it be exposed.
> > 
> > 
> >> 
> >> Jan - this suggests that the placement isn't correct either: The
> >> draining of the queue - as I understand it - should be happening
> >> *after* these writes completed, not before they get started.
> > 
> > So that looks like it should be just moved a bit, like this new patch:
> 
> Looks like this is the patch after the very first modifications you made
> to it - I don't see anything being moved, and I no longer see any of
> the subsequent changes you did.

duh!
commit 9e8970f64bceffd90113f6cbb426af93cb7043dd
Author: Jan Kara <jack@suse.cz>
Date:   Tue Sep 13 11:44:07 2011 +0100

    xen/blkback: Add support for old BARRIER requests - 'feature-barrier'
    
    Recent kernels do not support BARRIER operation but only FLUSH operation. But
    older xenblk frontends still use the BARRIER operation to achieve data
    integrity requirements. So add support for BARRIER operation into xenblk
    backend so that all guests do not corrupt their filesystem on host crash.
    
    Signed-off-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Jan Beulich <JBeulich@suse.com>
    [v1: Added some extra functions, and other cases]
    [v2: Added check for q->elevator]
    [v3: Changed EXPORT_SYMBOL to EXPORT_SYMBOL_GPL]
    [v4: Moved the drain _after_ the write request and fix failling]
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

diff --git a/block/elevator.c b/block/elevator.c
index a3b64bc..4e6c5b6 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -616,6 +616,7 @@ void elv_drain_elevator(struct request_queue *q)
 		       q->elevator->elevator_type->elevator_name, q->nr_sorted);
 	}
 }
+EXPORT_SYMBOL_GPL(elv_drain_elevator);
 
 /*
  * Call with queue lock held, interrupts disabled
diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 9713d5a..738966e 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -464,6 +464,11 @@ static void __end_block_io_op(struct pending_req *pending_req, int error)
 		pr_debug(DRV_PFX "flush diskcache op failed, not supported\n");
 		xen_blkbk_flush_diskcache(XBT_NIL, pending_req->blkif->be, 0);
 		pending_req->status = BLKIF_RSP_EOPNOTSUPP;
+	} else if ((pending_req->operation == BLKIF_OP_WRITE_BARRIER) &&
+		   (error == -EOPNOTSUPP)) {
+		pr_debug(DRV_PFX "write barrier op failed, not supported\n");
+		xen_blkbk_barrier(XBT_NIL, pending_req->blkif->be, 0);
+		pending_req->status = BLKIF_RSP_EOPNOTSUPP;
 	} else if (error) {
 		pr_debug(DRV_PFX "Buffer not up-to-date at end of operation,"
 			 " error=%d\n", error);
@@ -601,6 +606,9 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 		blkif->st_wr_req++;
 		operation = WRITE_ODIRECT;
 		break;
+	case BLKIF_OP_WRITE_BARRIER:
+		if (!blkif->vbd.barrier_support)
+			goto fail_response;
 	case BLKIF_OP_FLUSH_DISKCACHE:
 		blkif->st_f_req++;
 		operation = WRITE_FLUSH;
@@ -609,7 +617,6 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 		blkif->st_ds_req++;
 		operation = REQ_DISCARD;
 		break;
-	case BLKIF_OP_WRITE_BARRIER:
 	default:
 		operation = 0; /* make gcc happy */
 		goto fail_response;
@@ -667,7 +674,6 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 			goto fail_response;
 		}
 	}
-
 	/*
 	 * If we have failed at this point, we need to undo the M2P override,
 	 * set gnttab_set_unmap_op on all of the grant references and perform
@@ -746,6 +752,16 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 	else if (operation == WRITE || operation == WRITE_FLUSH)
 		blkif->st_wr_sect += preq.nr_sects;
 
+	if (req->operation == BLKIF_OP_WRITE_BARRIER) {
+		struct request_queue *q = bdev_get_queue(preq.bdev);
+		unsigned long flags;
+
+		/* Emulate the original behavior of write barriers */
+		spin_lock_irqsave(q->queue_lock, flags);
+		elv_drain_elevator(q);
+		__blk_run_queue(q);
+		spin_unlock_irqrestore(q->queue_lock, flags);
+	}
 	return 0;
 
  fail_flush:
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index bfb532e..9f1990b 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -157,6 +157,7 @@ struct xen_vbd {
 	/* Cached size parameter. */
 	sector_t		size;
 	bool			flush_support;
+	bool			barrier_support;
 };
 
 struct backend_info;
@@ -228,6 +229,8 @@ int xen_blkif_schedule(void *arg);
 
 int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt,
 			      struct backend_info *be, int state);
+int xen_blkbk_barrier(struct xenbus_transaction xbt,
+		      struct backend_info *be, int state);
 
 struct xenbus_device *xen_blkbk_xenbus(struct backend_info *be);
 
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 2b3aef0..ec0975b 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -376,6 +376,9 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
 	if (q && q->flush_flags)
 		vbd->flush_support = true;
 
+	if (q && q->elevator)
+		vbd->barrier_support = true;
+
 	DPRINTK("Successful creation of handle=%04x (dom=%u)\n",
 		handle, blkif->domid);
 	return 0;
@@ -421,6 +424,20 @@ int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt,
 	return err;
 }
 
+int xen_blkbk_barrier(struct xenbus_transaction xbt,
+		      struct backend_info *be, int state)
+{
+	struct xenbus_device *dev = be->dev;
+	int err;
+
+	err = xenbus_printf(xbt, dev->nodename, "feature-barrier",
+			    "%d", state);
+	if (err)
+		xenbus_dev_fatal(dev, err, "writing feature-barrier");
+
+	return err;
+}
+
 int xen_blkbk_discard(struct xenbus_transaction xbt, struct backend_info *be)
 {
 	struct xenbus_device *dev = be->dev;
@@ -706,6 +723,8 @@ again:
 	if (err)
 		goto abort;
 
+	err = xen_blkbk_barrier(xbt, be, be->blkif->vbd.barrier_support);
+
 	err = xen_blkbk_discard(xbt, be);
 
 	err = xenbus_printf(xbt, dev->nodename, "sectors", "%llu",

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH]: xen/blkback: Add support for old BARRIER requests - 'feature-barrier', was "Help with implementing some form of barriers in 3.0 kernels."
  2011-09-15 14:21                 ` Konrad Rzeszutek Wilk
@ 2011-09-15 15:13                   ` Jan Kara
  2011-09-15 15:15                     ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Kara @ 2011-09-15 15:13 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Jan Beulich, Jeremy Fitzhardinge, hch, axboe, Jan Kara, linux-kernel

On Thu 15-09-11 10:21:25, Konrad Rzeszutek Wilk wrote:
> On Thu, Sep 15, 2011 at 02:00:43PM +0100, Jan Beulich wrote:
> > >>> On 15.09.11 at 14:51, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > >> > +	if (drain) {
> > >> > +		struct request_queue *q = bdev_get_queue(preq.bdev);
> > >> > +		unsigned long flags;
> > >> > +
> > >> > +		if (!q->elevator) {
> > >> > +			__end_block_io_op(pending_req, -EOPNOTSUPP);
> > >> > +			return -EOPNOTSUPP;
> > >> > +		}
> > >> 
> > >> You shouldn't return here - barrier requests may be "non-empty" (i.e.
> > >> carry data to be written), and hence you'd need to make sure the
> > >> writes get carried out nevertheless.
> > > 
> > > In theory (and in practice) the 'feature-barrier' is set to zero
> > > if !q->elevator. So the frontend should not even try WRITE_BARRIER.
> > > 
> > > I think the better approach will be to outright fail the WRITE_BARRIER
> > > request if 'feature-barrier=0' instead of just trying (which is what
> > > this tries).
> > > 
> > > On a unrelated topic is what to do with devices which are bio based
> > > but not request based. From the DM directory, only multipath is capable
> > > of doing requests, while the rest are all bio. There is code to
> > > quisce the bio's: dm_suspend, but it is not apparent to me how to
> > > make it be exposed.
> > > 
> > > 
> > >> 
> > >> Jan - this suggests that the placement isn't correct either: The
> > >> draining of the queue - as I understand it - should be happening
> > >> *after* these writes completed, not before they get started.
> > > 
> > > So that looks like it should be just moved a bit, like this new patch:
> > 
> > Looks like this is the patch after the very first modifications you made
> > to it - I don't see anything being moved, and I no longer see any of
> > the subsequent changes you did.
> 
> duh!
> commit 9e8970f64bceffd90113f6cbb426af93cb7043dd
> Author: Jan Kara <jack@suse.cz>
> Date:   Tue Sep 13 11:44:07 2011 +0100
> 
>     xen/blkback: Add support for old BARRIER requests - 'feature-barrier'
>     
>     Recent kernels do not support BARRIER operation but only FLUSH operation. But
>     older xenblk frontends still use the BARRIER operation to achieve data
>     integrity requirements. So add support for BARRIER operation into xenblk
>     backend so that all guests do not corrupt their filesystem on host crash.
>     
>     Signed-off-by: Jan Kara <jack@suse.cz>
>     Signed-off-by: Jan Beulich <JBeulich@suse.com>
>     [v1: Added some extra functions, and other cases]
>     [v2: Added check for q->elevator]
>     [v3: Changed EXPORT_SYMBOL to EXPORT_SYMBOL_GPL]
>     [v4: Moved the drain _after_ the write request and fix failling]
  As I said in an email to Jan Beulich, this is wrong. If you drain after
the request is submitted, you have no control over the ordering of the data
in the BARRIER request and previous requests. And that is wrong. So drain
was originally at the right place. Only we should probably set the
operation to WRITE_FLUSH_FUA (not just WRITE_FLUSH) so we flush the disk
caches also after the request completes.

									Honza

>     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> 
> diff --git a/block/elevator.c b/block/elevator.c
> index a3b64bc..4e6c5b6 100644
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -616,6 +616,7 @@ void elv_drain_elevator(struct request_queue *q)
>  		       q->elevator->elevator_type->elevator_name, q->nr_sorted);
>  	}
>  }
> +EXPORT_SYMBOL_GPL(elv_drain_elevator);
>  
>  /*
>   * Call with queue lock held, interrupts disabled
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index 9713d5a..738966e 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -464,6 +464,11 @@ static void __end_block_io_op(struct pending_req *pending_req, int error)
>  		pr_debug(DRV_PFX "flush diskcache op failed, not supported\n");
>  		xen_blkbk_flush_diskcache(XBT_NIL, pending_req->blkif->be, 0);
>  		pending_req->status = BLKIF_RSP_EOPNOTSUPP;
> +	} else if ((pending_req->operation == BLKIF_OP_WRITE_BARRIER) &&
> +		   (error == -EOPNOTSUPP)) {
> +		pr_debug(DRV_PFX "write barrier op failed, not supported\n");
> +		xen_blkbk_barrier(XBT_NIL, pending_req->blkif->be, 0);
> +		pending_req->status = BLKIF_RSP_EOPNOTSUPP;
>  	} else if (error) {
>  		pr_debug(DRV_PFX "Buffer not up-to-date at end of operation,"
>  			 " error=%d\n", error);
> @@ -601,6 +606,9 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
>  		blkif->st_wr_req++;
>  		operation = WRITE_ODIRECT;
>  		break;
> +	case BLKIF_OP_WRITE_BARRIER:
> +		if (!blkif->vbd.barrier_support)
> +			goto fail_response;
>  	case BLKIF_OP_FLUSH_DISKCACHE:
>  		blkif->st_f_req++;
>  		operation = WRITE_FLUSH;
> @@ -609,7 +617,6 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
>  		blkif->st_ds_req++;
>  		operation = REQ_DISCARD;
>  		break;
> -	case BLKIF_OP_WRITE_BARRIER:
>  	default:
>  		operation = 0; /* make gcc happy */
>  		goto fail_response;
> @@ -667,7 +674,6 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
>  			goto fail_response;
>  		}
>  	}
> -
>  	/*
>  	 * If we have failed at this point, we need to undo the M2P override,
>  	 * set gnttab_set_unmap_op on all of the grant references and perform
> @@ -746,6 +752,16 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
>  	else if (operation == WRITE || operation == WRITE_FLUSH)
>  		blkif->st_wr_sect += preq.nr_sects;
>  
> +	if (req->operation == BLKIF_OP_WRITE_BARRIER) {
> +		struct request_queue *q = bdev_get_queue(preq.bdev);
> +		unsigned long flags;
> +
> +		/* Emulate the original behavior of write barriers */
> +		spin_lock_irqsave(q->queue_lock, flags);
> +		elv_drain_elevator(q);
> +		__blk_run_queue(q);
> +		spin_unlock_irqrestore(q->queue_lock, flags);
> +	}
>  	return 0;
>  
>   fail_flush:
> diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
> index bfb532e..9f1990b 100644
> --- a/drivers/block/xen-blkback/common.h
> +++ b/drivers/block/xen-blkback/common.h
> @@ -157,6 +157,7 @@ struct xen_vbd {
>  	/* Cached size parameter. */
>  	sector_t		size;
>  	bool			flush_support;
> +	bool			barrier_support;
>  };
>  
>  struct backend_info;
> @@ -228,6 +229,8 @@ int xen_blkif_schedule(void *arg);
>  
>  int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt,
>  			      struct backend_info *be, int state);
> +int xen_blkbk_barrier(struct xenbus_transaction xbt,
> +		      struct backend_info *be, int state);
>  
>  struct xenbus_device *xen_blkbk_xenbus(struct backend_info *be);
>  
> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> index 2b3aef0..ec0975b 100644
> --- a/drivers/block/xen-blkback/xenbus.c
> +++ b/drivers/block/xen-blkback/xenbus.c
> @@ -376,6 +376,9 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
>  	if (q && q->flush_flags)
>  		vbd->flush_support = true;
>  
> +	if (q && q->elevator)
> +		vbd->barrier_support = true;
> +
>  	DPRINTK("Successful creation of handle=%04x (dom=%u)\n",
>  		handle, blkif->domid);
>  	return 0;
> @@ -421,6 +424,20 @@ int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt,
>  	return err;
>  }
>  
> +int xen_blkbk_barrier(struct xenbus_transaction xbt,
> +		      struct backend_info *be, int state)
> +{
> +	struct xenbus_device *dev = be->dev;
> +	int err;
> +
> +	err = xenbus_printf(xbt, dev->nodename, "feature-barrier",
> +			    "%d", state);
> +	if (err)
> +		xenbus_dev_fatal(dev, err, "writing feature-barrier");
> +
> +	return err;
> +}
> +
>  int xen_blkbk_discard(struct xenbus_transaction xbt, struct backend_info *be)
>  {
>  	struct xenbus_device *dev = be->dev;
> @@ -706,6 +723,8 @@ again:
>  	if (err)
>  		goto abort;
>  
> +	err = xen_blkbk_barrier(xbt, be, be->blkif->vbd.barrier_support);
> +
>  	err = xen_blkbk_discard(xbt, be);
>  
>  	err = xenbus_printf(xbt, dev->nodename, "sectors", "%llu",
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH]: xen/blkback: Add support for old BARRIER requests - 'feature-barrier', was "Help with implementing some form of barriers in 3.0 kernels."
  2011-09-15 15:13                   ` Jan Kara
@ 2011-09-15 15:15                     ` Christoph Hellwig
  2011-09-16  9:24                       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2011-09-15 15:15 UTC (permalink / raw)
  To: Jan Kara
  Cc: Konrad Rzeszutek Wilk, Jan Beulich, Jeremy Fitzhardinge, hch,
	axboe, linux-kernel

Btw, I'm really against exporting any of the low-level request/elevator
code interfaces.  I have plans for huge changes in that area, and having
someone poke into it is just a bad idea.

Try to take a look at the barrier era dm implementation of them, which
did the draining on the bio layer and copy & paste that.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH]: xen/blkback: Add support for old BARRIER requests - 'feature-barrier', was "Help with implementing some form of barriers in 3.0 kernels."
  2011-09-15 15:15                     ` Christoph Hellwig
@ 2011-09-16  9:24                       ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-09-16  9:24 UTC (permalink / raw)
  To: Christoph Hellwig, Jan Kara
  Cc: Jan Kara, Jan Beulich, Jeremy Fitzhardinge, axboe, linux-kernel

On Thu, Sep 15, 2011 at 11:15:40AM -0400, Christoph Hellwig wrote:
> Btw, I'm really against exporting any of the low-level request/elevator
> code interfaces.  I have plans for huge changes in that area, and having
> someone poke into it is just a bad idea.

Thanks for the heads up.
> 
> Try to take a look at the barrier era dm implementation of them, which
> did the draining on the bio layer and copy & paste that.

Jan (K), do you want to poke at that and re-use the patch of yours which I
have been altering?

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2011-09-16  9:24 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-07 17:48 Help with implementing some form of barriers in 3.0 kernels Konrad Rzeszutek Wilk
2011-09-07 18:17 ` Vivek Goyal
2011-09-07 18:27   ` Konrad Rzeszutek Wilk
2011-09-07 18:36     ` Vivek Goyal
2011-09-07 20:16   ` Christoph Hellwig
2011-09-07 21:31     ` Vivek Goyal
2011-09-08  8:22       ` Christoph Hellwig
2011-09-08  8:02     ` Jan Beulich
2011-09-08  8:08       ` Christoph Hellwig
2011-09-13 10:44 ` Jan Beulich
2011-09-14  8:59   ` Konrad Rzeszutek Wilk
2011-09-14  9:12     ` Jan Beulich
2011-09-14  9:30       ` Konrad Rzeszutek Wilk
2011-09-14 10:15         ` Jan Beulich
2011-09-14 14:32       ` Konrad Rzeszutek Wilk
2011-09-14 15:01         ` [PATCH]: xen/blkback: Add support for old BARRIER requests - 'feature-barrier', was "Help with implementing some form of barriers in 3.0 kernels." Konrad Rzeszutek Wilk
2011-09-14 16:13           ` Jan Beulich
2011-09-15 12:51             ` Konrad Rzeszutek Wilk
2011-09-15 13:00               ` Jan Beulich
2011-09-15 14:21                 ` Konrad Rzeszutek Wilk
2011-09-15 15:13                   ` Jan Kara
2011-09-15 15:15                     ` Christoph Hellwig
2011-09-16  9:24                       ` Konrad Rzeszutek Wilk
2011-09-15 13:24             ` Jan Kara
2011-09-14 15:34         ` Help with implementing some form of barriers in 3.0 kernels Mike Snitzer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.