* KVM Block Device Driver
@ 2013-08-13 20:13 Spensky, Chad - 0559 - MITLL
  2013-08-14  2:40 ` Fam Zheng
  0 siblings, 1 reply; 16+ messages in thread
From: Spensky, Chad - 0559 - MITLL @ 2013-08-13 20:13 UTC (permalink / raw)
  To: kvm

Hi All,

  I'm working on disk introspection on KVM, and we are trying to create
a shadow image of the disk.  We've hooked the functions in block.c, in
particular bdrv_aio_writev.  However, when we watch writes go through,
pause the VM, and then compare our shadow image with the actual VM
image, they aren't 100% synced up.  The first 1-2 sectors always appear
to be correct; after that, there are sometimes some
discrepancies.  I believe we have exhausted the most obvious bugs (malloc
bugs, incorrect size calculations, etc.).  Has anyone had any experience
with this or have any insights?

Our methodology is as follows:
 1. Boot the VM.
 2. Pause VM.
 3. Copy the disk to our shadow image.
 4. Perform very few reads/writes.
 5. Pause VM.
 6. Compare shadow copy with active vm disk.

 And this is where we are seeing discrepancies.  Any help is much
appreciated!  We are running on Ubuntu 12.04 with a modified Debian build.
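
As an illustration of the comparison in step 6 (not the actual tooling
used in this setup), here is a minimal sketch in C.  It assumes both
images are raw files and a 512-byte sector size; the helper name
first_mismatched_sector is hypothetical.  Reporting the first differing
sector makes it easier to see whether the divergence always starts
after the first one or two sectors of a request.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SECTOR_SIZE 512  /* assumed sector size */

/*
 * Compare two raw image files sector by sector.
 * Returns the index of the first differing sector, -1 if the files are
 * identical, or -2 if a file cannot be opened or read.
 * A difference in file length is reported as a mismatch.
 */
static long first_mismatched_sector(const char *path_a, const char *path_b)
{
    unsigned char buf_a[SECTOR_SIZE], buf_b[SECTOR_SIZE];
    long sector = 0;
    long result = -1;  /* assume identical until proven otherwise */
    FILE *a, *b;

    assert(path_a != NULL && path_b != NULL);
    a = fopen(path_a, "rb");
    b = fopen(path_b, "rb");
    if (a == NULL || b == NULL) {
        result = -2;
    } else {
        for (;;) {
            size_t na = fread(buf_a, 1, SECTOR_SIZE, a);
            size_t nb = fread(buf_b, 1, SECTOR_SIZE, b);

            if (ferror(a) || ferror(b)) {
                result = -2;
                break;
            }
            if (na != nb || memcmp(buf_a, buf_b, na) != 0) {
                result = sector;
                break;
            }
            if (na < SECTOR_SIZE) {
                break;  /* both files ended at the same point */
            }
            sector++;
        }
    }
    if (a != NULL) {
        fclose(a);
    }
    if (b != NULL) {
        fclose(b);
    }
    return result;
}
```

The comparison is only meaningful once the guest is paused and any
pending writes have reached the image file.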

 - Chad

-- 
Chad S. Spensky

MIT Lincoln Laboratory
Group 59 (Cyber Systems Assessment)
Ph: (781) 981-4173





* Re: KVM Block Device Driver
  2013-08-13 20:13 KVM Block Device Driver Spensky, Chad - 0559 - MITLL
@ 2013-08-14  2:40 ` Fam Zheng
  2013-08-14 10:05   ` Stefan Hajnoczi
  0 siblings, 1 reply; 16+ messages in thread
From: Fam Zheng @ 2013-08-14  2:40 UTC (permalink / raw)
  To: Spensky, Chad - 0559 - MITLL; +Cc: kvm

On Tue, 08/13 16:13, Spensky, Chad - 0559 - MITLL wrote:
> Hi All,
> 
>   I'm working on disk introspection on KVM, and we are trying to create
> a shadow image of the disk.  We've hooked the functions in block.c, in
> particular bdrv_aio_writev.  However, when we watch writes go through,
> pause the VM, and then compare our shadow image with the actual VM
> image, they aren't 100% synced up.  The first 1-2 sectors always appear
> to be correct; after that, there are sometimes some
> discrepancies.  I believe we have exhausted the most obvious bugs (malloc
> bugs, incorrect size calculations, etc.).  Has anyone had any experience
> with this or have any insights?
> 
> Our methodology is as follows:
>  1. Boot the VM.
>  2. Pause VM.
>  3. Copy the disk to our shadow image.

How do you copy the disk, from guest or host?

>  4. Perform very few reads/writes.

Did you flush to disk?

>  5. Pause VM.
>  6. Compare shadow copy with active vm disk.
> 
>  And this is where we are seeing discrepancies.  Any help is much
> appreciated!  We are running on Ubuntu 12.04 with a modified Debian build.
> 
>  - Chad
> 
> -- 
> Chad S. Spensky
> 

I think the drive-backup command does just what you want: it creates an
image and copies data from the guest disk to the target on write
(copy-on-write), without pausing the VM.


* Re: KVM Block Device Driver
  2013-08-14  2:40 ` Fam Zheng
@ 2013-08-14 10:05   ` Stefan Hajnoczi
  2013-08-14 11:29     ` Spensky, Chad - 0559 - MITLL
  0 siblings, 1 reply; 16+ messages in thread
From: Stefan Hajnoczi @ 2013-08-14 10:05 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Spensky, Chad - 0559 - MITLL, kvm

On Wed, Aug 14, 2013 at 10:40:06AM +0800, Fam Zheng wrote:
> On Tue, 08/13 16:13, Spensky, Chad - 0559 - MITLL wrote:
> > Hi All,
> > 
> >   I'm working on disk introspection on KVM, and we are trying to create
> > a shadow image of the disk.  We've hooked the functions in block.c, in
> > particular bdrv_aio_writev.  However, when we watch writes go through,
> > pause the VM, and then compare our shadow image with the actual VM
> > image, they aren't 100% synced up.  The first 1-2 sectors always appear
> > to be correct; after that, there are sometimes some
> > discrepancies.  I believe we have exhausted the most obvious bugs (malloc
> > bugs, incorrect size calculations, etc.).  Has anyone had any experience
> > with this or have any insights?
> > 
> > Our methodology is as follows:
> >  1. Boot the VM.
> >  2. Pause VM.
> >  3. Copy the disk to our shadow image.
> 
> How do you copy the disk, from guest or host?
> 
> >  4. Perform very few reads/writes.
> 
> Did you flush to disk?
> 
> >  5. Pause VM.
> >  6. Compare shadow copy with active vm disk.
> > 
> >  And this is where we are seeing discrepancies.  Any help is much
> > appreciated!  We are running on Ubuntu 12.04 with a modified Debian build.
> > 
> >  - Chad
> > 
> > -- 
> > Chad S. Spensky
> > 
> 
> I think the drive-backup command does just what you want: it creates an
> image and copies data from the guest disk to the target on write
> (copy-on-write), without pausing the VM.

Or perhaps drive-mirror.

Maybe Chad can explain what the use case is.  There is probably an
existing command that does this or that could be extended to do this
safely.

Stefan


* Re: KVM Block Device Driver
  2013-08-14 10:05   ` Stefan Hajnoczi
@ 2013-08-14 11:29     ` Spensky, Chad - 0559 - MITLL
  2013-08-14 12:16       ` Fam Zheng
  2013-08-14 14:07       ` Stefan Hajnoczi
  0 siblings, 2 replies; 16+ messages in thread
From: Spensky, Chad - 0559 - MITLL @ 2013-08-14 11:29 UTC (permalink / raw)
  To: Stefan Hajnoczi, Fam Zheng; +Cc: kvm


Stefan, Fam,

  We are trying to keep an active shadow copy while the system is running
without any need for pausing.  More precisely we want to log every
individual access to the drive into a database so that the entire stream
of accesses could be replayed at a later time.

 - Chad

-- 
Chad S. Spensky

MIT Lincoln Laboratory
Group 59 (Cyber Systems Assessment)
Ph: (781) 981-4173






* Re: KVM Block Device Driver
  2013-08-14 11:29     ` Spensky, Chad - 0559 - MITLL
@ 2013-08-14 12:16       ` Fam Zheng
  2013-08-14 12:19         ` Spensky, Chad - 0559 - MITLL
  2013-08-14 14:07       ` Stefan Hajnoczi
  1 sibling, 1 reply; 16+ messages in thread
From: Fam Zheng @ 2013-08-14 12:16 UTC (permalink / raw)
  To: Spensky, Chad - 0559 - MITLL; +Cc: Stefan Hajnoczi, kvm

On Wed, 08/14 07:29, Spensky, Chad - 0559 - MITLL wrote:
> Stefan, Fam,
> 
>   We are trying to keep an active shadow copy while the system is running
> without any need for pausing.  More precisely we want to log every
> individual access to the drive into a database so that the entire stream
> of accesses could be replayed at a later time.
> 
There's no I/O request log infrastructure in QEMU for now. What
drive-mirror can do is repeatedly send the sectors changed since the last
sync point, but not in guest request order or operation size; it
works just with a dirty bitmap.
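
To illustrate why a dirty bitmap cannot serve as a request log, here is
a minimal sketch in C.  It is not QEMU's implementation: the type
DirtyBitmap, the helper names, and the 64-sector disk are all
hypothetical.  The point is that two different write streams that touch
the same sectors leave identical bitmaps, so request order and request
size are lost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_SECTORS 64  /* tiny hypothetical disk */

/* One bit per sector: set means "written since the last sync". */
typedef struct {
    uint8_t bits[NUM_SECTORS / 8];
} DirtyBitmap;

/* Record a guest write of nb_sectors sectors starting at sector. */
static void dirty_mark(DirtyBitmap *bm, unsigned sector, unsigned nb_sectors)
{
    assert(sector <= NUM_SECTORS && nb_sectors <= NUM_SECTORS - sector);
    for (unsigned s = sector; s < sector + nb_sectors; s++) {
        bm->bits[s / 8] |= (uint8_t)(1u << (s % 8));
    }
}

/* Return true if the given sector has been written. */
static bool dirty_test(const DirtyBitmap *bm, unsigned sector)
{
    assert(sector < NUM_SECTORS);
    return ((bm->bits[sector / 8] >> (sector % 8)) & 1u) != 0;
}

/* Count the sectors that would have to be copied on the next sync. */
static unsigned dirty_count(const DirtyBitmap *bm)
{
    unsigned count = 0;
    for (unsigned s = 0; s < NUM_SECTORS; s++) {
        if (dirty_test(bm, s)) {
            count++;
        }
    }
    return count;
}
```

A replayable log therefore needs a hook that sees each request as it is
issued, which is what the block.c approach provides.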

In your methodology you didn't mention the hook you added in block.c,
but I think it is necessary to hack block.c to log every r/w access to
the drive. I assume you synchronously apply each write to the shadow image,
right?

* Re: KVM Block Device Driver
  2013-08-14 12:16       ` Fam Zheng
@ 2013-08-14 12:19         ` Spensky, Chad - 0559 - MITLL
  2013-08-14 12:35           ` Fam Zheng
  0 siblings, 1 reply; 16+ messages in thread
From: Spensky, Chad - 0559 - MITLL @ 2013-08-14 12:19 UTC (permalink / raw)
  To: famz; +Cc: Stefan Hajnoczi, kvm


Fam,

  That's correct, we modified block.c to hook the appropriate functions
and output the information through a unix socket.  One of the functions
that we hooked is bdrv_aio_writev, however it seems like the data that we
are seeing at that point in the callstack is not what actually makes it to
the disk image for the guest.  The first couple of sectors always seem to
be the same, however after sector 2 it's a toss up.  I'm guessing there
may be some sort of caching going on or something, and we appear to be
missing it.

 - Chad

-- 
Chad S. Spensky

MIT Lincoln Laboratory
Group 59 (Cyber Systems Assessment)
Ph: (781) 981-4173






* Re: KVM Block Device Driver
  2013-08-14 12:19         ` Spensky, Chad - 0559 - MITLL
@ 2013-08-14 12:35           ` Fam Zheng
  0 siblings, 0 replies; 16+ messages in thread
From: Fam Zheng @ 2013-08-14 12:35 UTC (permalink / raw)
  To: Spensky, Chad - 0559 - MITLL; +Cc: Stefan Hajnoczi, kvm

On Wed, 08/14 08:19, Spensky, Chad - 0559 - MITLL wrote:
> Fam,
> 
>   That's correct, we modified block.c to hook the appropriate functions
> and output the information through a unix socket.  One of the functions
> that we hooked is bdrv_aio_writev, however it seems like the data that we
> are seeing at that point in the callstack is not what actually makes it to
> the disk image for the guest.  The first couple of sectors always seem to
> be the same, however after sector 2 it's a toss up.  I'm guessing there
> may be some sort of caching going on or something, and we appear to be
> missing it.
> 

Your approach sounds right. For your problem, I think it should be
possible to debug using gdb: start qemu in gdb, boot with a live CD, so
there is no disturbing disk I/O from the OS; attach an image to the guest;
insert a breakpoint at bdrv_aio_writev, then in the guest you can start I/O:

    # echo "your data" | dd of=/dev/sda oflag=sync seek=XXX

You should now be at the breakpoint in gdb, and you can trace
your data to the unix socket and compare it with the guest I/O request.

Fam

* Re: KVM Block Device Driver
  2013-08-14 11:29     ` Spensky, Chad - 0559 - MITLL
  2013-08-14 12:16       ` Fam Zheng
@ 2013-08-14 14:07       ` Stefan Hajnoczi
  2013-08-14 14:40         ` Wolfgang Richter
  1 sibling, 1 reply; 16+ messages in thread
From: Stefan Hajnoczi @ 2013-08-14 14:07 UTC (permalink / raw)
  To: Spensky, Chad - 0559 - MITLL; +Cc: Fam Zheng, kvm, wolf

On Wed, Aug 14, 2013 at 07:29:53AM -0400, Spensky, Chad - 0559 - MITLL wrote:
>   We are trying to keep an active shadow copy while the system is running
> without any need for pausing.  More precisely we want to log every
> individual access to the drive into a database so that the entire stream
> of accesses could be replayed at a later time.

CCing Wolfgang Richter who was previously interested in block I/O
tracing:
https://lists.nongnu.org/archive/html/qemu-devel/2013-05/msg01725.html

Stefan


* Re: KVM Block Device Driver
  2013-08-14 14:07       ` Stefan Hajnoczi
@ 2013-08-14 14:40         ` Wolfgang Richter
  2013-08-14 14:43           ` Spensky, Chad - 0559 - MITLL
  0 siblings, 1 reply; 16+ messages in thread
From: Wolfgang Richter @ 2013-08-14 14:40 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Spensky, Chad - 0559 - MITLL, Fam Zheng, kvm

Still interested and back to working on this.  I taught a couple
classes this summer which killed my time in June - July.

So Chad, are you already logging all accesses? Or do you need
something quick to log them?

I have a patch to QEMU mainline (very small) to add block I/O tracing,
but it works via the log subsystem and might not fit your environment
if you wanted a production solution.

--
Wolf


* Re: KVM Block Device Driver
  2013-08-14 14:40         ` Wolfgang Richter
@ 2013-08-14 14:43           ` Spensky, Chad - 0559 - MITLL
  2013-08-14 15:02             ` Wolfgang Richter
  0 siblings, 1 reply; 16+ messages in thread
From: Spensky, Chad - 0559 - MITLL @ 2013-08-14 14:43 UTC (permalink / raw)
  To: Wolfgang Richter, Stefan Hajnoczi; +Cc: Fam Zheng, kvm


Wolf,

  We're able to get all of the meta data just fine.  However, the actual
content of the read/write seems to be wrong some of the time.
The first 2 sectors seem to always be correct, however on some writes, the
data that we traced does not match up with the data we are actually seeing
in the .img file for the guest on disk.

 - Chad

-- 
Chad S. Spensky

MIT Lincoln Laboratory
Group 59 (Cyber Systems Assessment)
Ph: (781) 981-4173






* Re: KVM Block Device Driver
  2013-08-14 14:43           ` Spensky, Chad - 0559 - MITLL
@ 2013-08-14 15:02             ` Wolfgang Richter
  2013-08-14 15:49               ` Wolfgang Richter
  0 siblings, 1 reply; 16+ messages in thread
From: Wolfgang Richter @ 2013-08-14 15:02 UTC (permalink / raw)
  To: Spensky, Chad - 0559 - MITLL; +Cc: Stefan Hajnoczi, Fam Zheng, kvm

On Wed, Aug 14, 2013 at 10:43 AM, Spensky, Chad - 0559 - MITLL
<chad.spensky@ll.mit.edu> wrote:
> Wolf,
>
>   We're able to get all of the meta data just fine.

I assume by meta-data you mean (essentially) "function" call
parameters within QEMU as to the (1) location of the write on disk,
and (2) the amount of data being written out (in bytes).

> However, the actual content of the read/write seems to be wrong
> some of the time.

That's very odd, I'm pretty sure I never had that bug (although I
might silently have it!).

I did occasionally observe a kernel accidentally writing out kernel
buffers that hadn't been cleared to disk (guest kernel) :-)

> The first 2 sectors seem to always be correct, however on some writes, the
> data that we traced does not match up with the data we are actually seeing
> in the .img file for the guest on disk.

Are you certain you're getting every write?  Sometimes things might be
written to rapidly in succession?

How are you obtaining the shadow copy of the write stream (maybe there
was earlier stuff in this thread I should read up on)?

-- 
Wolf


* Re: KVM Block Device Driver
  2013-08-14 15:02             ` Wolfgang Richter
@ 2013-08-14 15:49               ` Wolfgang Richter
  2013-08-14 18:44                 ` Spensky, Chad - 0559 - MITLL
  0 siblings, 1 reply; 16+ messages in thread
From: Wolfgang Richter @ 2013-08-14 15:49 UTC (permalink / raw)
  To: Spensky, Chad - 0559 - MITLL; +Cc: Stefan Hajnoczi, Fam Zheng, kvm

Read through the kvm thread (I'm not on that mailing list, just for
reference; thanks for the CC).

I saw you're hooking a different function than me (not sure it
matters).  I hook bdrv_co_writev and I operate on the passed in
iovector datastructure there writing out its contents and a short
header to stderr where my introspection tools interpret it.  I don't
maintain a full shadow copy of the disk, but it conceptually shouldn't
matter.

Are you working on the iovec data structure as well?  Also, is your
disk format raw?  I don't think it should matter, but I was just
wondering.

--
Wolf


* Re: KVM Block Device Driver
  2013-08-14 15:49               ` Wolfgang Richter
@ 2013-08-14 18:44                 ` Spensky, Chad - 0559 - MITLL
  2013-08-14 19:42                   ` Wolfgang Richter
  2013-08-18 14:04                   ` Paolo Bonzini
  0 siblings, 2 replies; 16+ messages in thread
From: Spensky, Chad - 0559 - MITLL @ 2013-08-14 18:44 UTC (permalink / raw)
  To: Wolfgang Richter; +Cc: Stefan Hajnoczi, Fam Zheng, kvm


Wolfgang,

  Thanks so much for the response.  It turns out that I wasn't handling the
QEMUIOVector properly.  When I first implemented it, I saw that the iovec
was a pointer and assumed that there would only ever be one.  Given the
lack of documentation and my lack of understanding, this went undetected
for a while.  Everything now seems to work just fine.  :-)  See below for
the portion of code that threw me off.  Thanks again!

 - Chad


Proposed change to qemu-common.h:

typedef struct QEMUIOVector {
    struct iovec *iov;
    int niov;
    int nalloc;
    size_t size;
} QEMUIOVector;

changed to:

// Array of I/O vectors

typedef struct QEMUIOVector {
    struct iovec iov[];
    int niov;
    int nalloc;
    size_t size;
} QEMUIOVector;
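
For anyone who runs into the same thing, here is a minimal sketch in C
of treating the iov pointer as an array of niov elements.  It uses only
the POSIX struct iovec rather than QEMU's own helpers, and the function
names iov_total_len and iov_flatten are hypothetical.  Reading only the
first element copies just the first buffer's worth of a request that
spans several buffers, which would match the symptom of the first
sectors being right and the rest being wrong.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>  /* struct iovec */

/* Sum the lengths of all niov elements, not only the first one. */
static size_t iov_total_len(const struct iovec *iov, int niov)
{
    size_t total = 0;

    assert(iov != NULL || niov == 0);
    for (int i = 0; i < niov; i++) {
        total += iov[i].iov_len;
    }
    return total;
}

/*
 * Copy every element, in order, into one contiguous buffer.
 * Returns the number of bytes copied, or 0 if dst is too small.
 */
static size_t iov_flatten(void *dst, size_t dst_len,
                          const struct iovec *iov, int niov)
{
    unsigned char *out = dst;
    size_t off = 0;

    if (iov_total_len(iov, niov) > dst_len) {
        return 0;
    }
    for (int i = 0; i < niov; i++) {
        memcpy(out + off, iov[i].iov_base, iov[i].iov_len);
        off += iov[i].iov_len;
    }
    return off;
}
```

In a write hook, the flattened buffer is what would be applied to the
shadow image at the request's offset.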





-- 
Chad S. Spensky

MIT Lincoln Laboratory
Group 59 (Cyber Systems Assessment)
Ph: (781) 981-4173






* Re: KVM Block Device Driver
  2013-08-14 18:44                 ` Spensky, Chad - 0559 - MITLL
@ 2013-08-14 19:42                   ` Wolfgang Richter
  2013-08-18 14:04                   ` Paolo Bonzini
  1 sibling, 0 replies; 16+ messages in thread
From: Wolfgang Richter @ 2013-08-14 19:42 UTC (permalink / raw)
  To: Spensky, Chad - 0559 - MITLL; +Cc: Stefan Hajnoczi, Fam Zheng, kvm

I'd expect it would be something with QEMUIOVector :-)  Glad you found it!

On Wed, Aug 14, 2013 at 2:44 PM, Spensky, Chad - 0559 - MITLL
<chad.spensky@ll.mit.edu> wrote:
> Wolfgang,
>
>   Thanks so much for the response.  It turns out that I wasn't handling the
> QEMUIOVector properly.  When I first implemented it, I saw that the iovec
> was a pointer and assumed that there would only ever be one.  Given the
> lack of documentation and my lack of understanding, this went undetected
> for a while.  Everything now seems to work just fine.  :-)  See below for
> the portion of code that threw me off.  Thanks again!

Just so you know (possibly to be safer?), the code I use was based on
these functions (they used to be declared in qemu-common.h and iov.h; I
think they were moved or refactored recently):

void qemu_iovec_to_fd(int fd, QEMUIOVector *qiov);
void iov_to_fd(int fd, const struct iovec *iov, unsigned int iov_cnt);

/* Write every element of a QEMUIOVector to fd, in order. */
void qemu_iovec_to_fd(int fd, QEMUIOVector *qiov)
{
    iov_to_fd(fd, qiov->iov, qiov->niov);
}

void iov_to_fd(int fd, const struct iovec *iov, unsigned int iov_cnt)
{
    unsigned int i;
    for (i = 0; i < iov_cnt; i++)
    {
        /* Keep the write outside assert() so it still runs with NDEBUG. */
        ssize_t ret = qemu_write_full(fd, iov[i].iov_base, iov[i].iov_len);
        assert(ret >= 0 && (size_t)ret == iov[i].iov_len);
    }
}


Thus, you have to loop through the iovector (as I think you found and
just fixed).
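
If you want the write as one contiguous buffer for a shadow image rather
than streaming it to a file descriptor, the same loop applies.  Below is a
minimal sketch that uses only the POSIX struct iovec, not QEMU's own
types.  The names iov_total_len and iov_flatten are made up for this
example and are not QEMU functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Total number of bytes described by all iov_cnt elements. */
static size_t iov_total_len(const struct iovec *iov, unsigned int iov_cnt)
{
    size_t total = 0;
    unsigned int i;

    for (i = 0; i < iov_cnt; i++) {
        total += iov[i].iov_len;
    }
    return total;
}

/*
 * Copy every element, in order, into one contiguous buffer.
 * Returns the number of bytes copied, which is less than the total
 * only when buf_len is too small to hold everything.
 */
static size_t iov_flatten(const struct iovec *iov, unsigned int iov_cnt,
                          void *buf, size_t buf_len)
{
    size_t done = 0;
    unsigned int i;

    assert(buf != NULL || buf_len == 0);

    for (i = 0; i < iov_cnt && done < buf_len; i++) {
        size_t n = iov[i].iov_len;

        if (n > buf_len - done) {
            n = buf_len - done;   /* truncate rather than overflow buf */
        }
        memcpy((char *)buf + done, iov[i].iov_base, n);
        done += n;
    }
    return done;
}
```

Copying only iov[0] is the "there would only ever be one" assumption:
it gets the first element right and silently drops the rest of the
request.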

-- 
Wolf

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KVM Block Device Driver
  2013-08-14 18:44                 ` Spensky, Chad - 0559 - MITLL
  2013-08-14 19:42                   ` Wolfgang Richter
@ 2013-08-18 14:04                   ` Paolo Bonzini
  2013-08-18 14:48                     ` Wolfgang Richter
  1 sibling, 1 reply; 16+ messages in thread
From: Paolo Bonzini @ 2013-08-18 14:04 UTC (permalink / raw)
  To: Spensky, Chad - 0559 - MITLL
  Cc: Wolfgang Richter, Stefan Hajnoczi, Fam Zheng, kvm

Il 14/08/2013 20:44, Spensky, Chad - 0559 - MITLL ha scritto:
> Wolfgang,
> 
>   Thanks so much for the response.  It turns out that I wasn't handling
> the QEMUIOVector properly.  When I first implemented it, I saw that the
> iovec was a pointer and assumed that there would only ever be one.  Given
> the lack of documentation and my lack of understanding, this went
> undetected for a while.  Everything now seems to work just fine.  :-)  See
> below for the portion of code that threw me off.  Thanks again!
> 
>  - Chad
> 
> 
> Proposed change to qemu-common.h (:
> 
> typedef struct QEMUIOVector {
>     struct iovec *iov;
>     int niov;
>     int nalloc;
>     size_t size;
> } QEMUIOVector;
> 
> changed to:
> 
> // Array of I/O vectors
> 
> typedef struct QEMUIOVector {
>     struct iovec iov[];
>     int niov;
>     int nalloc;
>     size_t size;
> } QEMUIOVector;

This wouldn't work.  As you wrote it, it wouldn't even compile.

Embedding an array is possible if you place it at the end of the struct,
but then it is not possible to resize the array (because when you
reallocate it, the containing QEMUIOVector could move in memory and any
pointers to the containing QEMUIOVector would become invalid).

In C, a dynamically-allocated array is represented by a pointer to the
first item.  The size of the array and the size of the
dynamically-allocated memory block are stored together with the pointer
(in this case, in the niov and nalloc members).

In fact, whenever you see something like "iovec iov[]" in Java, what's
being stored in memory is exactly a pointer to the first item, the size
of the array and the size of the dynamically-allocated memory block.  C
just makes that explicit.
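
To make the pointer-plus-sizes layout concrete, here is a minimal sketch
of a growable vector with the same four fields.  IoVector and the
io_vector_* helpers are hypothetical names for illustration only; they
are not QEMU's implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/uio.h>

/* Hypothetical stand-in for QEMUIOVector, with the same four fields. */
typedef struct IoVector {
    struct iovec *iov;   /* pointer to the first element of the array */
    int niov;            /* elements in use */
    int nalloc;          /* elements allocated */
    size_t size;         /* total bytes described by all elements */
} IoVector;

static void io_vector_init(IoVector *v)
{
    v->iov = NULL;
    v->niov = 0;
    v->nalloc = 0;
    v->size = 0;
}

/* Append one element; returns 0 on success, -1 if allocation fails. */
static int io_vector_add(IoVector *v, void *base, size_t len)
{
    if (v->niov == v->nalloc) {
        int new_alloc = v->nalloc ? 2 * v->nalloc : 1;
        struct iovec *p = realloc(v->iov, (size_t)new_alloc * sizeof(*p));

        if (p == NULL) {
            return -1;
        }
        v->iov = p;          /* only the array may have moved, not *v */
        v->nalloc = new_alloc;
    }
    v->iov[v->niov].iov_base = base;
    v->iov[v->niov].iov_len = len;
    v->niov++;
    v->size += len;
    return 0;
}

static void io_vector_destroy(IoVector *v)
{
    free(v->iov);
    io_vector_init(v);
}
```

Because realloc() is applied to v->iov and never to v itself, pointers
to the IoVector stay valid while the array grows.  An array embedded at
the end of the struct would not have that property.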

Paolo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KVM Block Device Driver
  2013-08-18 14:04                   ` Paolo Bonzini
@ 2013-08-18 14:48                     ` Wolfgang Richter
  0 siblings, 0 replies; 16+ messages in thread
From: Wolfgang Richter @ 2013-08-18 14:48 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Spensky, Chad - 0559 - MITLL, Wolfgang Richter, Stefan Hajnoczi,
	Fam Zheng, kvm

Also, there are utility functions within the QEMU code base for dealing
with QEMUIOVectors (as far as I remember), which might save you from
having to walk the vectors by hand.

--
Wolf


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2013-08-18 14:48 UTC | newest]

Thread overview: 16+ messages
2013-08-13 20:13 KVM Block Device Driver Spensky, Chad - 0559 - MITLL
2013-08-14  2:40 ` Fam Zheng
2013-08-14 10:05   ` Stefan Hajnoczi
2013-08-14 11:29     ` Spensky, Chad - 0559 - MITLL
2013-08-14 12:16       ` Fam Zheng
2013-08-14 12:19         ` Spensky, Chad - 0559 - MITLL
2013-08-14 12:35           ` Fam Zheng
2013-08-14 14:07       ` Stefan Hajnoczi
2013-08-14 14:40         ` Wolfgang Richter
2013-08-14 14:43           ` Spensky, Chad - 0559 - MITLL
2013-08-14 15:02             ` Wolfgang Richter
2013-08-14 15:49               ` Wolfgang Richter
2013-08-14 18:44                 ` Spensky, Chad - 0559 - MITLL
2013-08-14 19:42                   ` Wolfgang Richter
2013-08-18 14:04                   ` Paolo Bonzini
2013-08-18 14:48                     ` Wolfgang Richter
