* Re: [Qemu-devel] 'qemu-nbd' explicit flush
@ 2013-05-23 21:58 Mark Trumpold
  2013-05-24  9:05 ` Stefan Hajnoczi
  2013-05-24 12:10 ` Paolo Bonzini
  0 siblings, 2 replies; 15+ messages in thread
From: Mark Trumpold @ 2013-05-23 21:58 UTC (permalink / raw)
  To: Paolo Bonzini, Stefan Hajnoczi; +Cc: Mark Trumpold, qemu-devel, markt

I have a working configuration using the signal approach suggested by Stefan.

'qemu-nbd.c' is patched as follows:

    do {
        main_loop_wait(false);
+       if (sighup_reported) {
+           sighup_reported = false;
+           bdrv_drain_all();
+           bdrv_flush_all();
        }
    } while (!sigterm_reported && (persistent || !nbd_started || nb_fds > 0));

The driving script was patched as follows:

     mount -o remount,ro /dev/nbd0
     blockdev --flushbufs /dev/nbd0
+    kill -HUP <qemu-nbd process id>

I needed to retain 'blockdev --flushbufs' for things to work.  Seems the 'bdrv_flush_all' is flushing what is being missed by the blockdev flush.  I did not go back and retest with 'fsync' or other approaches I had tried before.
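The set-a-flag-in-the-handler, act-in-the-main-loop shape of this patch is the standard async-signal-safe pattern: the handler does nothing but record the request, and the main loop performs the actual flush work.  A minimal runnable sketch of the same pattern (in Python rather than qemu's C, with hypothetical names) looks like:

```python
import os
import signal

# Handler only sets a flag -- mirrors "sighup_reported = true"
# in the qemu-nbd patch above.
flush_requested = False

def on_sighup(signum, frame):
    global flush_requested
    flush_requested = True

signal.signal(signal.SIGHUP, on_sighup)

def main_loop_once():
    """One iteration of a main loop that checks the flag.

    This is where qemu-nbd calls bdrv_drain_all() + bdrv_flush_all().
    """
    global flush_requested
    if flush_requested:
        flush_requested = False
        return "flushed"
    return "idle"

# Simulate the driving script's "kill -HUP <pid>" by signalling
# ourselves, then run one loop iteration.
os.kill(os.getpid(), signal.SIGHUP)
print(main_loop_once())  # -> flushed
```

The important property is that the handler itself touches nothing but a flag, so it is safe to deliver the signal at any point in the loop.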

Thanks again Paolo and Stefan for your help!!
Regards,
Mark T.

-----Original Message-----
From: Paolo Bonzini [mailto:pbonzini@redhat.com]
Sent: Wednesday, May 22, 2013 04:07 AM
To: 'Stefan Hajnoczi'
Cc: 'Mark Trumpold', qemu-devel@nongnu.org, markt@tachyon.net
Subject: Re: 'qemu-nbd' explicit flush

Il 22/05/2013 11:47, Stefan Hajnoczi ha scritto:
> On Tue, May 21, 2013 at 08:01:10PM +0000, Mark Trumpold wrote:
>>     Linux kernel 3.3.1 with Qemu patch to enable kernel flushing:
>>         http://thread.gmane.org/gmane.linux.drivers.nbd.general/1108
>
> Did you check that the kernel is sending NBD_FLUSH commands?  You can
> use tcpdump and then check the captured network traffic.
>
>> Usage example:
>>     'qemu-nbd --cache=writeback -c /dev/nbd0 /images/my-qcow.img'
>>     'mount /dev/nbd0 /my-mount-point'
>>
>> Everything does flush correctly when I first unmount and then disconnect the device; however, in my case I am not able to unmount things before snapshotting.
>>
>> I tried several approaches externally to flush the device.  For example:
>>     'mount -o remount,ro /dev/nbd0'
>>     'blockdev --flushbufs /dev/nbd0'
>
> Did you try plain old sync(1)?

This could also work:

  dd if=/dev/zero of=dummy oflag=sync bs=512 count=1
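The dd trick works because oflag=sync makes each write synchronous.  The same effect can be sketched in Python with os.O_SYNC (using the scratch filename "dummy" from the dd example; the write does not return until the data has reached stable storage):

```python
import os

# Equivalent of: dd if=/dev/zero of=dummy oflag=sync bs=512 count=1
# O_SYNC makes the single 512-byte write synchronous.
fd = os.open("dummy", os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o644)
try:
    written = os.write(fd, b"\x00" * 512)
finally:
    os.close(fd)
print(written)  # 512
```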

> 1. Add a signal handler (like SIGHUP or SIGUSR1) to qemu-nbd which
>    flushes all exports.

That would be a useful addition anyway.

Paolo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] 'qemu-nbd' explicit flush
  2013-05-23 21:58 [Qemu-devel] 'qemu-nbd' explicit flush Mark Trumpold
@ 2013-05-24  9:05 ` Stefan Hajnoczi
  2013-05-25 17:42   ` Mark Trumpold
  2013-05-24 12:10 ` Paolo Bonzini
  1 sibling, 1 reply; 15+ messages in thread
From: Stefan Hajnoczi @ 2013-05-24  9:05 UTC (permalink / raw)
  To: Mark Trumpold; +Cc: Paolo Bonzini, qemu-devel, markt

On Thu, May 23, 2013 at 09:58:31PM +0000, Mark Trumpold wrote:
> I have a working configuration using the signal approach suggested by Stefan.
> 
> 'qemu-nbd.c' is patched as follows:
> 
>     do {
>         main_loop_wait(false);
> +       if (sighup_reported) {
> +           sighup_reported = false;
> +           bdrv_drain_all();
> +           bdrv_flush_all();
>         }
>     } while (!sigterm_reported && (persistent || !nbd_started || nb_fds > 0));
> 
> The driving script was patched as follows:
> 
>      mount -o remount,ro /dev/nbd0
>      blockdev --flushbufs /dev/nbd0
> +    kill -HUP <qemu-nbd process id>
> 
> I needed to retain 'blockdev --flushbufs' for things to work.  Seems the 'bdrv_flush_all' is flushing what is being missed by the blockdev flush.  I did not go back and retest with 'fsync' or other approaches I had tried before.

Okay, that makes sense:

'blockdev --flushbufs' is writing dirty pages to the NBD device.

bdrv_drain_all() + bdrv_flush_all() ensures that image file writes reach
the physical disk.

One thing to be careful of is whether these operations are asynchronous.
The signal is asynchronous: you have no way of knowing when qemu-nbd is
finished flushing to the physical disk.

I didn't check blockdev(8) but it could be the same there.

So watch out, otherwise your script is timing-dependent and may not
actually have finished flushing when you take the snapshot.

Stefan


* Re: [Qemu-devel] 'qemu-nbd' explicit flush
  2013-05-23 21:58 [Qemu-devel] 'qemu-nbd' explicit flush Mark Trumpold
  2013-05-24  9:05 ` Stefan Hajnoczi
@ 2013-05-24 12:10 ` Paolo Bonzini
  1 sibling, 0 replies; 15+ messages in thread
From: Paolo Bonzini @ 2013-05-24 12:10 UTC (permalink / raw)
  To: Mark Trumpold; +Cc: Stefan Hajnoczi, qemu-devel, markt

Il 23/05/2013 23:58, Mark Trumpold ha scritto:
> I have a working configuration using the signal approach suggested by Stefan.
> 
> 'qemu-nbd.c' is patched as follows:
> 
>     do {
>         main_loop_wait(false);
> +       if (sighup_reported) {
> +           sighup_reported = false;
> +           bdrv_drain_all();
> +           bdrv_flush_all();
>         }
>     } while (!sigterm_reported && (persistent || !nbd_started || nb_fds > 0));
> 
> The driving script was patched as follows:

Yes, a patch along these lines would be acceptable.

>      mount -o remount,ro /dev/nbd0
>      blockdev --flushbufs /dev/nbd0
> +    kill -HUP <qemu-nbd process id>
> 
> I needed to retain 'blockdev --flushbufs' for things to work. Seems
> the 'bdrv_flush_all' is flushing what is being missed by the blockdev
> flush. I did not go back and retest with 'fsync' or other approaches I
> had tried before.

Right.  That said, I think a newer kernel would do what you want.
Perhaps you can look at the actual patches that went into 3.9 and
backport them.

Paolo


* Re: [Qemu-devel] 'qemu-nbd' explicit flush
  2013-05-24  9:05 ` Stefan Hajnoczi
@ 2013-05-25 17:42   ` Mark Trumpold
  2013-05-27 12:36     ` Stefan Hajnoczi
  0 siblings, 1 reply; 15+ messages in thread
From: Mark Trumpold @ 2013-05-25 17:42 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Paolo Bonzini, qemu-devel, markt

On 5/24/13 1:05 AM, "Stefan Hajnoczi" <stefanha@gmail.com> wrote:

>On Thu, May 23, 2013 at 09:58:31PM +0000, Mark Trumpold wrote:
>> I have a working configuration using the signal approach suggested by
>>Stefan.
>> 
>> 'qemu-nbd.c' is patched as follows:
>> 
>>     do {
>>         main_loop_wait(false);
>> +       if (sighup_reported) {
>> +           sighup_reported = false;
>> +           bdrv_drain_all();
>> +           bdrv_flush_all();
>>         }
>>     } while (!sigterm_reported && (persistent || !nbd_started || nb_fds
>>> 0));
>> 
>> The driving script was patched as follows:
>> 
>>      mount -o remount,ro /dev/nbd0
>>      blockdev --flushbufs /dev/nbd0
>> +    kill -HUP <qemu-nbd process id>
>> 
>> I needed to retain 'blockdev --flushbufs' for things to work.  Seems
>>the 'bdrv_flush_all' is flushing what is being missed by the blockdev
>>flush.  I did not go back and retest with 'fsync' or other approaches I
>>had tried before.
>
>Okay, that makes sense:
>
>'blockdev --flushbufs' is writing dirty pages to the NBD device.
>
>bdrv_drain_all() + bdrv_flush_all() ensures that image file writes reach
>the physical disk.
>
>One thing to be careful of is whether these operations are asynchronous.
>The signal is asynchronous, you have no way of knowing when qemu-nbd is
>finished flushing to the physical disk.

Right, of course.  I missed the obvious.

>
>I didn't check blockdev(8) but it could be the same there.
>
>So watch out, otherwise your script is timing-dependent and may not
>actually have finished flushing when you take the snapshot.
>
>Stefan
>

The race condition would not be acceptable.  You had mentioned another
approach using the socket interface:

>2. Instantiate a block/nbd.c client that connects to the running
>   qemu-nbd server (make sure format=raw).  Then call bdrv_flush() on
>   the NBD client.  You must use the qemu-nbd --shared=2 option.
>

In my case I only have a 'qemu-nbd' process per loop device.  Would the
'qemu-nbd' process act as the socket server, and I would then write a
simple socket client to instruct it to do the flush?  And, would the
client block until the flush is complete?

Thank you,
Mark T.


* Re: [Qemu-devel] 'qemu-nbd' explicit flush
  2013-05-25 17:42   ` Mark Trumpold
@ 2013-05-27 12:36     ` Stefan Hajnoczi
  0 siblings, 0 replies; 15+ messages in thread
From: Stefan Hajnoczi @ 2013-05-27 12:36 UTC (permalink / raw)
  To: Mark Trumpold; +Cc: Paolo Bonzini, qemu-devel, markt

On Sat, May 25, 2013 at 09:42:08AM -0800, Mark Trumpold wrote:
> On 5/24/13 1:05 AM, "Stefan Hajnoczi" <stefanha@gmail.com> wrote:
> >On Thu, May 23, 2013 at 09:58:31PM +0000, Mark Trumpold wrote:
> >One thing to be careful of is whether these operations are asynchronous.
> >The signal is asynchronous, you have no way of knowing when qemu-nbd is
> >finished flushing to the physical disk.
> 
> Right, of course.  I missed the obvious.

I missed something too.  Paolo may have already hinted at this when he
posted a dd oflag=sync command-line option:

blockdev --flushbufs is the wrong tool because ioctl(BLKFLSBUF) only
writes out dirty pages to the block device.  It does *not* guarantee to
send a flush request to the device.

Therefore, the underlying image file may not be put into an up-to-date
state by qemu-nbd.


I suggest trying the following instead of blockdev --flushbufs:

  python -c 'import os; os.fsync(open("/dev/loopX", "r+b"))'

This should do the same as blockdev --flushbufs *plus* it sends and
waits for the NBD FLUSH command.

You may have to play with this command-line a little but the main idea
is to open the block device and fsync it.
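A slightly expanded sketch of that one-liner, opening the path read-write and fsync'ing the descriptor explicitly (shown here against an ordinary file so it runs without a real /dev/nbdX; the device path is the only thing to swap in, and opening a block device needs privileges):

```python
import os

def flush_block_device(path):
    """Open a device (or file) read-write and fsync it.

    On a block device backed by nbd.ko with flush support, the
    fsync(2) writes out dirty pages *and* sends NBD_FLUSH, then
    waits for the reply -- unlike blockdev --flushbufs, which
    only writes out dirty pages.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

# Runnable demo against an ordinary file; replace the path with
# e.g. "/dev/nbd0" on a real host.
with open("demo.img", "wb") as f:
    f.write(b"data")
flush_block_device("demo.img")
print("flushed")
```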

Stefan


* Re: [Qemu-devel] 'qemu-nbd' explicit flush
  2013-05-29  7:42 ` Stefan Hajnoczi
  2013-05-29 15:29   ` Mark Trumpold
@ 2013-06-07 14:00   ` Mark Trumpold
  1 sibling, 0 replies; 15+ messages in thread
From: Mark Trumpold @ 2013-06-07 14:00 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Paolo Bonzini, qemu-devel, markt

On 5/28/13 11:42 PM, "Stefan Hajnoczi" <stefanha@gmail.com> wrote:

>On Tue, May 28, 2013 at 06:00:08PM +0000, Mark Trumpold wrote:
>> 
>> >-----Original Message-----
>> >From: Stefan Hajnoczi [mailto:stefanha@gmail.com]
>> >Sent: Monday, May 27, 2013 05:36 AM
>> >To: 'Mark Trumpold'
>> >Cc: 'Paolo Bonzini', qemu-devel@nongnu.org, markt@tachyon.net
>> >Subject: Re: 'qemu-nbd' explicit flush
>> >
>> >On Sat, May 25, 2013 at 09:42:08AM -0800, Mark Trumpold wrote:
>> >> On 5/24/13 1:05 AM, "Stefan Hajnoczi" <stefanha@gmail.com> wrote:
>> >> >On Thu, May 23, 2013 at 09:58:31PM +0000, Mark Trumpold wrote:
>> >> >One thing to be careful of is whether these operations are
>>asynchronous.
>> >> >The signal is asynchronous, you have no way of knowing when
>>qemu-nbd is
>> >> >finished flushing to the physical disk.
>> >>
>> >> Right, of course.  I missed the obvious.
>> >
>> >I missed something too.  Paolo may have already hinted at this when he
>> >posted a dd oflag=sync command-line option:
>> >
>> >blockdev --flushbufs is the wrong tool because ioctl(BLKFLSBUF) only
>> >writes out dirty pages to the block device.  It does *not* guarantee to
>> >send a flush request to the device.
>> >
>> >Therefore, the underlying image file may not be put into an up-to-date
>> >state by qemu-nbd.
>> >
>> >
>> >I suggest trying the following instead of blockdev --flushbufs:
>> >
>> >  python -c 'import os; os.fsync(open("/dev/loopX", "r+b"))'
>> >
>> >This should do the same as blockdev --flushbufs *plus* it sends and
>> >waits for the NBD FLUSH command.
>> >
>> >You may have to play with this command-line a little but the main idea
>> >is to open the block device and fsync it.
>> >
>> >Stefan
>> >
>> 
>> Hi Stefan,
>> 
>> One of my early experiments was adding a command line option to
>>'qemu-nbd' that did an open on 'device' (similar to the -c option), and
>>then calling 'fsync' on the 'device'.  By itself, I did not get a
>>complete flush to disk.  Was I missing something?
>> 
>> Empirically, the signal solution (blockdev --flushbufs plus
>>'bdrv_flush_all') was keeping my disk consistent.  My unit test
>>exercises the flush and snapshot pretty rigorously; that is, it never
>>passed before with 'qemu-nbd --cache=writeback ...'.  However, I did not
>>want to rely on 'sleep' for the race condition.
>> 
>> Is there any opportunity with the nbd client socket interface?  The
>>advantage for me there is not modifying 'qemu-nbd' source.
>
>I'm suggesting that you don't need to modify qemu-nbd.  If your host is
>running nbd.ko with flush support, then it should be enough to open the
>device and issue fsync(2).
>
>You can verify this using tcpdump(8) and checking that the NBD FLUSH
>command is really being sent by the host kernel.  If not, double check
>you're using the latest nbd.ko.
>
>Stefan


Stefan,

I tried the 'fsync' approach.  It apparently has no effect with my
3.3.1 Linux kernel and patch.  Changing kernels is not an option for me
at the moment, so I will revisit when we have an opportunity to upgrade
kernels, but for the moment I'll have to stick with 'cache=writethrough'.

Thank you again for your attention and help.

Best Regards,
Mark T.


* Re: [Qemu-devel] 'qemu-nbd' explicit flush
  2013-05-29  7:42 ` Stefan Hajnoczi
@ 2013-05-29 15:29   ` Mark Trumpold
  2013-06-07 14:00   ` Mark Trumpold
  1 sibling, 0 replies; 15+ messages in thread
From: Mark Trumpold @ 2013-05-29 15:29 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Paolo Bonzini, qemu-devel, markt

On 5/28/13 11:42 PM, "Stefan Hajnoczi" <stefanha@gmail.com> wrote:

>On Tue, May 28, 2013 at 06:00:08PM +0000, Mark Trumpold wrote:
>> 
>> >-----Original Message-----
>> >From: Stefan Hajnoczi [mailto:stefanha@gmail.com]
>> >Sent: Monday, May 27, 2013 05:36 AM
>> >To: 'Mark Trumpold'
>> >Cc: 'Paolo Bonzini', qemu-devel@nongnu.org, markt@tachyon.net
>> >Subject: Re: 'qemu-nbd' explicit flush
>> >
>> >On Sat, May 25, 2013 at 09:42:08AM -0800, Mark Trumpold wrote:
>> >> On 5/24/13 1:05 AM, "Stefan Hajnoczi" <stefanha@gmail.com> wrote:
>> >> >On Thu, May 23, 2013 at 09:58:31PM +0000, Mark Trumpold wrote:
>> >> >One thing to be careful of is whether these operations are
>>asynchronous.
>> >> >The signal is asynchronous, you have no way of knowing when
>>qemu-nbd is
>> >> >finished flushing to the physical disk.
>> >>
>> >> Right, of course.  I missed the obvious.
>> >
>> >I missed something too.  Paolo may have already hinted at this when he
>> >posted a dd oflag=sync command-line option:
>> >
>> >blockdev --flushbufs is the wrong tool because ioctl(BLKFLSBUF) only
>> >writes out dirty pages to the block device.  It does *not* guarantee to
>> >send a flush request to the device.
>> >
>> >Therefore, the underlying image file may not be put into an up-to-date
>> >state by qemu-nbd.
>> >
>> >
>> >I suggest trying the following instead of blockdev --flushbufs:
>> >
>> >  python -c 'import os; os.fsync(open("/dev/loopX", "r+b"))'
>> >
>> >This should do the same as blockdev --flushbufs *plus* it sends and
>> >waits for the NBD FLUSH command.
>> >
>> >You may have to play with this command-line a little but the main idea
>> >is to open the block device and fsync it.
>> >
>> >Stefan
>> >
>> 
>> Hi Stefan,
>> 
>> One of my early experiments was adding a command line option to
>>'qemu-nbd' that did an open on 'device' (similar to the -c option), and
>>then calling 'fsync' on the 'device'.  By itself, I did not get a
>>complete flush to disk.  Was I missing something?
>> 
>> Empirically, the signal solution (blockdev --flushbufs plus
>>'bdrv_flush_all') was keeping my disk consistent.  My unit test
>>exercises the flush and snapshot pretty rigorously; that is, it never
>>passed before with 'qemu-nbd --cache=writeback ...'.  However, I did not
>>want to rely on 'sleep' for the race condition.
>> 
>> Is there any opportunity with the nbd client socket interface?  The
>>advantage for me there is not modifying 'qemu-nbd' source.
>
>I'm suggesting that you don't need to modify qemu-nbd.  If your host is
>running nbd.ko with flush support, then it should be enough to open the
>device and issue fsync(2).
>
>You can verify this using tcpdump(8) and checking that the NBD FLUSH
>command is really being sent by the host kernel.  If not, double check
>you're using the latest nbd.ko.
>
>Stefan
>

Got it.  I will try this approach with python.

Thanks again,
Mark T.


* Re: [Qemu-devel] 'qemu-nbd' explicit flush
  2013-05-28 18:00 Mark Trumpold
@ 2013-05-29  7:42 ` Stefan Hajnoczi
  2013-05-29 15:29   ` Mark Trumpold
  2013-06-07 14:00   ` Mark Trumpold
  0 siblings, 2 replies; 15+ messages in thread
From: Stefan Hajnoczi @ 2013-05-29  7:42 UTC (permalink / raw)
  To: Mark Trumpold; +Cc: Paolo Bonzini, qemu-devel, markt

On Tue, May 28, 2013 at 06:00:08PM +0000, Mark Trumpold wrote:
> 
> >-----Original Message-----
> >From: Stefan Hajnoczi [mailto:stefanha@gmail.com]
> >Sent: Monday, May 27, 2013 05:36 AM
> >To: 'Mark Trumpold'
> >Cc: 'Paolo Bonzini', qemu-devel@nongnu.org, markt@tachyon.net
> >Subject: Re: 'qemu-nbd' explicit flush
> >
> >On Sat, May 25, 2013 at 09:42:08AM -0800, Mark Trumpold wrote:
> >> On 5/24/13 1:05 AM, "Stefan Hajnoczi" <stefanha@gmail.com> wrote:
> >> >On Thu, May 23, 2013 at 09:58:31PM +0000, Mark Trumpold wrote:
> >> >One thing to be careful of is whether these operations are asynchronous.
> >> >The signal is asynchronous, you have no way of knowing when qemu-nbd is
> >> >finished flushing to the physical disk.
> >>
> >> Right, of course.  I missed the obvious.
> >
> >I missed something too.  Paolo may have already hinted at this when he
> >posted a dd oflag=sync command-line option:
> >
> >blockdev --flushbufs is the wrong tool because ioctl(BLKFLSBUF) only
> >writes out dirty pages to the block device.  It does *not* guarantee to
> >send a flush request to the device.
> >
> >Therefore, the underlying image file may not be put into an up-to-date
> >state by qemu-nbd.
> >
> >
> >I suggest trying the following instead of blockdev --flushbufs:
> >
> >  python -c 'import os; os.fsync(open("/dev/loopX", "r+b"))'
> >
> >This should do the same as blockdev --flushbufs *plus* it sends and
> >waits for the NBD FLUSH command.
> >
> >You may have to play with this command-line a little but the main idea
> >is to open the block device and fsync it.
> >
> >Stefan
> >
> 
> Hi Stefan,
> 
> One of my early experiments was adding a command line option to 'qemu-nbd' that did an open on 'device' (similar to the -c option), and then calling 'fsync' on the 'device'.  By itself, I did not get a complete flush to disk.  Was I missing something?
> 
> Empirically, the signal solution (blockdev --flushbufs plus 'bdrv_flush_all') was keeping my disk consistent.  My unit test exercises the flush and snapshot pretty rigorously; that is, it never passed before with 'qemu-nbd --cache=writeback ...'.  However, I did not want to rely on 'sleep' for the race condition.
> 
> Is there any opportunity with the nbd client socket interface?  The advantage for me there is not modifying 'qemu-nbd' source.

I'm suggesting that you don't need to modify qemu-nbd.  If your host is
running nbd.ko with flush support, then it should be enough to open the
device and issue fsync(2).

You can verify this using tcpdump(8) and checking that the NBD FLUSH
command is really being sent by the host kernel.  If not, double check
you're using the latest nbd.ko.

Stefan


* Re: [Qemu-devel] 'qemu-nbd' explicit flush
@ 2013-05-28 18:00 Mark Trumpold
  2013-05-29  7:42 ` Stefan Hajnoczi
  0 siblings, 1 reply; 15+ messages in thread
From: Mark Trumpold @ 2013-05-28 18:00 UTC (permalink / raw)
  To: Stefan Hajnoczi, Mark Trumpold; +Cc: Paolo Bonzini, qemu-devel, markt


>-----Original Message-----
>From: Stefan Hajnoczi [mailto:stefanha@gmail.com]
>Sent: Monday, May 27, 2013 05:36 AM
>To: 'Mark Trumpold'
>Cc: 'Paolo Bonzini', qemu-devel@nongnu.org, markt@tachyon.net
>Subject: Re: 'qemu-nbd' explicit flush
>
>On Sat, May 25, 2013 at 09:42:08AM -0800, Mark Trumpold wrote:
>> On 5/24/13 1:05 AM, "Stefan Hajnoczi" <stefanha@gmail.com> wrote:
>> >On Thu, May 23, 2013 at 09:58:31PM +0000, Mark Trumpold wrote:
>> >One thing to be careful of is whether these operations are asynchronous.
>> >The signal is asynchronous, you have no way of knowing when qemu-nbd is
>> >finished flushing to the physical disk.
>>
>> Right, of course.  I missed the obvious.
>
>I missed something too.  Paolo may have already hinted at this when he
>posted a dd oflag=sync command-line option:
>
>blockdev --flushbufs is the wrong tool because ioctl(BLKFLSBUF) only
>writes out dirty pages to the block device.  It does *not* guarantee to
>send a flush request to the device.
>
>Therefore, the underlying image file may not be put into an up-to-date
>state by qemu-nbd.
>
>
>I suggest trying the following instead of blockdev --flushbufs:
>
>  python -c 'import os; os.fsync(open("/dev/loopX", "r+b"))'
>
>This should do the same as blockdev --flushbufs *plus* it sends and
>waits for the NBD FLUSH command.
>
>You may have to play with this command-line a little but the main idea
>is to open the block device and fsync it.
>
>Stefan
>

Hi Stefan,

One of my early experiments was adding a command line option to 'qemu-nbd' that did an open on 'device' (similar to the -c option), and then calling 'fsync' on the 'device'.  By itself, I did not get a complete flush to disk.  Was I missing something?

Empirically, the signal solution (blockdev --flushbufs plus 'bdrv_flush_all') was keeping my disk consistent.  My unit test exercises the flush and snapshot pretty rigorously; that is, it never passed before with 'qemu-nbd --cache=writeback ...'.  However, I did not want to rely on 'sleep' for the race condition.

Is there any opportunity with the nbd client socket interface?  The advantage for me there is not modifying 'qemu-nbd' source.

Paolo had also mentioned taking a look at the newer 3.9 kernel for ideas, and possibly backporting.  I have not spent any time on this yet.

Thanks for your and Paolo's attention on this.

Mark T.


* Re: [Qemu-devel] 'qemu-nbd' explicit flush
  2013-05-23 23:35 Mark Trumpold
@ 2013-05-24  9:06 ` Stefan Hajnoczi
  0 siblings, 0 replies; 15+ messages in thread
From: Stefan Hajnoczi @ 2013-05-24  9:06 UTC (permalink / raw)
  To: Mark Trumpold; +Cc: Paolo Bonzini, qemu-devel, markt

On Thu, May 23, 2013 at 11:35:24PM +0000, Mark Trumpold wrote:
> I had one question I forgot to ask..
> 
> Is it possible to switch from '--cache=writeback' functionality
> to '--cache=writethrough' (and vice versa) while qemu-nbd is
> connected to the '/dev/nbd<x>' device?

No.  The block layer APIs to do that are available in QEMU but qemu-nbd
doesn't have run-time cache mode changing.

Stefan


* Re: [Qemu-devel] 'qemu-nbd' explicit flush
@ 2013-05-23 23:35 Mark Trumpold
  2013-05-24  9:06 ` Stefan Hajnoczi
  0 siblings, 1 reply; 15+ messages in thread
From: Mark Trumpold @ 2013-05-23 23:35 UTC (permalink / raw)
  To: Mark Trumpold, Paolo Bonzini, Stefan Hajnoczi; +Cc: qemu-devel, markt

I had one question I forgot to ask..

Is it possible to switch from '--cache=writeback' functionality
to '--cache=writethrough' (and vice versa) while qemu-nbd is
connected to the '/dev/nbd<x>' device?

Thank you,
Mark T.


-----Original Message-----
From: Mark Trumpold [mailto:markt@netqa.com]
Sent: Thursday, May 23, 2013 02:58 PM
To: 'Paolo Bonzini', 'Stefan Hajnoczi'
Cc: 'Mark Trumpold', qemu-devel@nongnu.org, markt@tachyon.net
Subject: Re:  'qemu-nbd' explicit flush

I have a working configuration using the signal approach suggested by Stefan.

'qemu-nbd.c' is patched as follows:

    do {
        main_loop_wait(false);
+       if (sighup_reported) {
+           sighup_reported = false;
+           bdrv_drain_all();
+           bdrv_flush_all();
        }
    } while (!sigterm_reported && (persistent || !nbd_started || nb_fds > 0));

The driving script was patched as follows:

     mount -o remount,ro /dev/nbd0
     blockdev --flushbufs /dev/nbd0
+    kill -HUP <qemu-nbd process id>

I needed to retain 'blockdev --flushbufs' for things to work.  Seems the 'bdrv_flush_all' is flushing what is being missed by the blockdev flush.  I did not go back and retest with 'fsync' or other approaches I had tried before.

Thanks again Paolo and Stefan for your help!!
Regards,
Mark T.

-----Original Message-----
From: Paolo Bonzini [mailto:pbonzini@redhat.com]
Sent: Wednesday, May 22, 2013 04:07 AM
To: 'Stefan Hajnoczi'
Cc: 'Mark Trumpold', qemu-devel@nongnu.org, markt@tachyon.net
Subject: Re: 'qemu-nbd' explicit flush

Il 22/05/2013 11:47, Stefan Hajnoczi ha scritto:
> On Tue, May 21, 2013 at 08:01:10PM +0000, Mark Trumpold wrote:
>>     Linux kernel 3.3.1 with Qemu patch to enable kernel flushing:
>>         http://thread.gmane.org/gmane.linux.drivers.nbd.general/1108
>
> Did you check that the kernel is sending NBD_FLUSH commands?  You can
> use tcpdump and then check the captured network traffic.
>
>> Usage example:
>>     'qemu-nbd --cache=writeback -c /dev/nbd0 /images/my-qcow.img'
>>     'mount /dev/nbd0 /my-mount-point'
>>
>> Everything does flush correctly when I first unmount and then disconnect the device; however, in my case I am not able to unmount things before snapshotting.
>>
>> I tried several approaches externally to flush the device.  For example:
>>     'mount -o remount,ro /dev/nbd0'
>>     'blockdev --flushbufs /dev/nbd0'
>
> Did you try plain old sync(1)?

This could also work:

  dd if=/dev/zero of=dummy oflag=sync bs=512 count=1

> 1. Add a signal handler (like SIGHUP or SIGUSR1) to qemu-nbd which
>    flushes all exports.

That would be a useful addition anyway.

Paolo


* Re: [Qemu-devel] 'qemu-nbd' explicit flush
@ 2013-05-22 16:10 Mark Trumpold
  0 siblings, 0 replies; 15+ messages in thread
From: Mark Trumpold @ 2013-05-22 16:10 UTC (permalink / raw)
  To: Paolo Bonzini, Stefan Hajnoczi; +Cc: Mark Trumpold, qemu-devel, markt

Thank you guys for responding!!

> 
> > 1. Add a signal handler (like SIGHUP or SIGUSR1) to qemu-nbd which
> >    flushes all exports.
> 
> That would be a useful addition anyway.
> 
> Paolo

This is exactly what I was going to try today.  I'm just getting familiar with Qemu source.
I'll let you know how it goes..

Thanks again Paolo and Stefan.
Regards,
Mark Trumpold


-----Original Message-----
From: Paolo Bonzini [mailto:pbonzini@redhat.com]
Sent: Wednesday, May 22, 2013 04:07 AM
To: 'Stefan Hajnoczi'
Cc: 'Mark Trumpold', qemu-devel@nongnu.org, markt@tachyon.net
Subject: Re: 'qemu-nbd' explicit flush

Il 22/05/2013 11:47, Stefan Hajnoczi ha scritto:
> On Tue, May 21, 2013 at 08:01:10PM +0000, Mark Trumpold wrote:
>>     Linux kernel 3.3.1 with Qemu patch to enable kernel flushing:
>>         http://thread.gmane.org/gmane.linux.drivers.nbd.general/1108
> 
> Did you check that the kernel is sending NBD_FLUSH commands?  You can
> use tcpdump and then check the captured network traffic.
> 
>> Usage example:
>>     'qemu-nbd --cache=writeback -c /dev/nbd0 /images/my-qcow.img'
>>     'mount /dev/nbd0 /my-mount-point'
>>
>> Everything does flush correctly when I first unmount and then disconnect the device; however, in my case I am not able to unmount things before snapshotting.
>>
>> I tried several approaches externally to flush the device.  For example:
>>     'mount -o remount,ro /dev/nbd0'
>>     'blockdev --flushbufs /dev/nbd0'
> 
> Did you try plain old sync(1)?

This could also work:

  dd if=/dev/zero of=dummy oflag=sync bs=512 count=1

> 1. Add a signal handler (like SIGHUP or SIGUSR1) to qemu-nbd which
>    flushes all exports.

That would be a useful addition anyway.

Paolo


* Re: [Qemu-devel] 'qemu-nbd' explicit flush
  2013-05-22  9:47 ` Stefan Hajnoczi
@ 2013-05-22 11:07   ` Paolo Bonzini
  0 siblings, 0 replies; 15+ messages in thread
From: Paolo Bonzini @ 2013-05-22 11:07 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Mark Trumpold, qemu-devel, markt

Il 22/05/2013 11:47, Stefan Hajnoczi ha scritto:
> On Tue, May 21, 2013 at 08:01:10PM +0000, Mark Trumpold wrote:
>>     Linux kernel 3.3.1 with Qemu patch to enable kernel flushing:
>>         http://thread.gmane.org/gmane.linux.drivers.nbd.general/1108
> 
> Did you check that the kernel is sending NBD_FLUSH commands?  You can
> use tcpdump and then check the captured network traffic.
> 
>> Usage example:
>>     'qemu-nbd --cache=writeback -c /dev/nbd0 /images/my-qcow.img'
>>     'mount /dev/nbd0 /my-mount-point'
>>
>> Everything does flush correctly when I first unmount and then disconnect the device; however, in my case I am not able to unmount things before snapshotting.
>>
>> I tried several approaches externally to flush the device.  For example:
>>     'mount -o remount,ro /dev/nbd0'
>>     'blockdev --flushbufs /dev/nbd0'
> 
> Did you try plain old sync(1)?

This could also work:

  dd if=/dev/zero of=dummy oflag=sync bs=512 count=1

> 1. Add a signal handler (like SIGHUP or SIGUSR1) to qemu-nbd which
>    flushes all exports.

That would be a useful addition anyway.

Paolo


* Re: [Qemu-devel] 'qemu-nbd' explicit flush
  2013-05-21 20:01 Mark Trumpold
@ 2013-05-22  9:47 ` Stefan Hajnoczi
  2013-05-22 11:07   ` Paolo Bonzini
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Hajnoczi @ 2013-05-22  9:47 UTC (permalink / raw)
  To: Mark Trumpold; +Cc: qemu-devel, markt

On Tue, May 21, 2013 at 08:01:10PM +0000, Mark Trumpold wrote:
>     Linux kernel 3.3.1 with Qemu patch to enable kernel flushing:
>         http://thread.gmane.org/gmane.linux.drivers.nbd.general/1108

Did you check that the kernel is sending NBD_FLUSH commands?  You can
use tcpdump and then check the captured network traffic.

> Usage example:
>     'qemu-nbd --cache=writeback -c /dev/nbd0 /images/my-qcow.img'
>     'mount /dev/nbd0 /my-mount-point'
> 
> Everything does flush correctly when I first unmount and then disconnect the device; however, in my case I am not able to unmount things before snapshotting.
> 
> I tried several approaches externally to flush the device.  For example:
>     'mount -o remount,ro /dev/nbd0'
>     'blockdev --flushbufs /dev/nbd0'

Did you try plain old sync(1)?

> I have been looking at the Qemu source code and in user space 'nbd.c' in routine 'nbd_trip' I see the case 'NBD_CMD_FLUSH' which looks to be called from the NBD socket interface.  Here I see 'bdrv_co_flush(exp->bs)' which looks promising; however, I don't know how to setup the 'bs' pointer for the call.

bs is the block device which was exported using:

exp = nbd_export_new(bs, dev_offset, fd_size, nbdflags, nbd_export_closed);

in qemu-nbd.c:main().

> Ideally, I would like to add a command-line parameter to 'qemu-nbd.c' to explicitly do the flush, but so far no luck.

Doing that is a little tricky, I think there are two options:

1. Add a signal handler (like SIGHUP or SIGUSR1) to qemu-nbd which
   flushes all exports.

2. Instantiate a block/nbd.c client that connects to the running
   qemu-nbd server (make sure format=raw).  Then call bdrv_flush() on
   the NBD client.  You must use the qemu-nbd --shared=2 option.

Stefan


* [Qemu-devel] 'qemu-nbd' explicit flush
@ 2013-05-21 20:01 Mark Trumpold
  2013-05-22  9:47 ` Stefan Hajnoczi
  0 siblings, 1 reply; 15+ messages in thread
From: Mark Trumpold @ 2013-05-21 20:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: markt

Hello all,

I am using 'qemu-nbd' and 'qemu-img' from the command line to provide 'qcow2' loop filesystems.  For performance, I have '--cache=writeback' set for the qemu-nbd device.

I am having trouble flushing all caches to disk at will; specifically before snapshotting the underlying filesystem that holds the 'qcow2' images.

Environment:
    Qemu 1.2.0
    Debian 6.0.5
    Linux kernel 3.3.1 with Qemu patch to enable kernel flushing:
        http://thread.gmane.org/gmane.linux.drivers.nbd.general/1108

Usage example:
    'qemu-nbd --cache=writeback -c /dev/nbd0 /images/my-qcow.img'
    'mount /dev/nbd0 /my-mount-point'

Everything does flush correctly when I first unmount and then disconnect the device; however, in my case I am not able to unmount things before snapshotting.

I tried several approaches externally to flush the device.  For example:
    'mount -o remount,ro /dev/nbd0'
    'blockdev --flushbufs /dev/nbd0'

I have been looking at the Qemu source code and in user space 'nbd.c' in routine 'nbd_trip' I see the case 'NBD_CMD_FLUSH' which looks to be called from the NBD socket interface.  Here I see 'bdrv_co_flush(exp->bs)' which looks promising; however, I don't know how to setup the 'bs' pointer for the call.

Ideally, I would like to add a command-line parameter to 'qemu-nbd.c' to explicitly do the flush, but so far no luck.

I've been struggling with this for some time.  Any guidance would be greatly appreciated.

Thank you,
Mark Trumpold


end of thread, other threads:[~2013-06-07 13:00 UTC | newest]

Thread overview: 15+ messages
2013-05-23 21:58 [Qemu-devel] 'qemu-nbd' explicit flush Mark Trumpold
2013-05-24  9:05 ` Stefan Hajnoczi
2013-05-25 17:42   ` Mark Trumpold
2013-05-27 12:36     ` Stefan Hajnoczi
2013-05-24 12:10 ` Paolo Bonzini
  -- strict thread matches above, loose matches on Subject: below --
2013-05-28 18:00 Mark Trumpold
2013-05-29  7:42 ` Stefan Hajnoczi
2013-05-29 15:29   ` Mark Trumpold
2013-06-07 14:00   ` Mark Trumpold
2013-05-23 23:35 Mark Trumpold
2013-05-24  9:06 ` Stefan Hajnoczi
2013-05-22 16:10 Mark Trumpold
2013-05-21 20:01 Mark Trumpold
2013-05-22  9:47 ` Stefan Hajnoczi
2013-05-22 11:07   ` Paolo Bonzini
