* [Qemu-devel] fstrim & upstream kernel not working
From: Richard W.M. Jones @ 2014-03-13 21:49 UTC
To: qemu-devel, pbonzini
I got fstrim happily working in Fedora 20, but it's not working with
the upstream kernel. The message is:
fstrim -v /sysroot/
[ 45.541339] sda: WRITE SAME failed. Manually zeroing.
/sysroot/: 47.2 MiB (49466368 bytes) trimmed
While this isn't technically an error, it of course doesn't trim
anything. In fact the host disk grows after the fstrim.
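The "trimmed" figure that fstrim -v prints is the guest's own accounting: it is the length the filesystem reported back through the FITRIM ioctl. As this report shows, that figure can be nonzero even when nothing is released on the host. Below is a minimal sketch that extracts the number so a test can log it next to a host-side measurement, such as du on the image file. The function name fstrim_reported_bytes is hypothetical, and the parsing assumes the exact output format quoted above.

```shell
#!/bin/sh
# Extract the byte count from one line of `fstrim -v` output, e.g.
#   /sysroot/: 47.2 MiB (49466368 bytes) trimmed
# Prints nothing if the line does not have that shape.
# NOTE: hypothetical helper; the format is assumed from the message above.
fstrim_reported_bytes() {
    printf '%s\n' "$1" | sed -n 's/.*(\([0-9][0-9]*\) bytes) trimmed.*/\1/p'
}

# Example (inside the guest, as root):
#   fstrim_reported_bytes "$(fstrim -v /sysroot/)"
```

This number only says what the guest believes it trimmed. Whether the host image actually shrank has to be checked on the host.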
A couple of questions:
- Is there any reason why virtio-scsi doesn't emulate WRITE SAME? It
seems pretty simple, and upstream kernels issue WRITE SAME when they
want to zero large areas of disk.
- Can you see where ext4 issues the zeroout/write same call? AFAICT
it is still issuing discards, but these are getting turned into
zeroout/write same by some sort of block layer magic that I can't
quite follow.
kernel: 3.14.0-0.rc6.git2.1.fc21
qemu: 1.7.0-5.fc21.x86_64
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming blog: http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)
* Re: [Qemu-devel] fstrim & upstream kernel not working
From: Paolo Bonzini @ 2014-03-14 12:30 UTC
To: Richard W.M. Jones, qemu-devel
On 13/03/2014 22:49, Richard W.M. Jones wrote:
> I got fstrim happily working in Fedora 20, but it's not working with
> the upstream kernel. The message is:
>
> fstrim -v /sysroot/
> [ 45.541339] sda: WRITE SAME failed. Manually zeroing.
> /sysroot/: 47.2 MiB (49466368 bytes) trimmed
>
> While this isn't technically an error, it of course doesn't trim
> anything. In fact the host disk grows after the fstrim.
>
> A couple of questions:
>
> - Is there any reason why virtio-scsi doesn't emulate WRITE SAME?
Yes, the reason is that you're using QEMU 1.7. :)
> - Can you see where ext4 issues the zeroout/write same call? AFAICT
> it is still issuing discards, but these are getting turned into
> zeroout/write same by some sort of block layer magic that I can't
> quite follow.
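That's provisioning_mode (a sysfs attribute of the SCSI disk), which is
writesame_16 with QEMU 1.7 and unmap with QEMU 2.0: the kernel sends
discards as WRITE SAME(16) or UNMAP commands respectively.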
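To see which mode the guest kernel picked, you can read the provisioning_mode attribute that the sd driver exposes for each SCSI disk under /sys/class/scsi_disk/<host:channel:target:lun>/. The following is a minimal sketch that assumes a POSIX shell. The SYSFS override variable is a hypothetical testing hook and is not defined by the kernel.

```shell
#!/bin/sh
# Print "<H:C:T:L> <provisioning_mode>" for every SCSI disk the kernel knows.
# SYSFS is a hypothetical override (default /sys) so the function can be
# pointed at a fake tree; on a real guest leave it unset.
show_provisioning_mode() {
    sysfs=${SYSFS:-/sys}
    for f in "$sysfs"/class/scsi_disk/*/provisioning_mode; do
        [ -r "$f" ] || continue          # glob matched nothing, or unreadable
        hctl=$(basename "$(dirname "$f")")
        printf '%s %s\n' "$hctl" "$(cat "$f")"
    done
}

# Per the reply above, expect writesame_16 under QEMU 1.7 and unmap under 2.0.
```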
Paolo
* Re: [Qemu-devel] fstrim & upstream kernel not working
From: Richard W.M. Jones @ 2014-03-14 12:42 UTC
To: Paolo Bonzini; +Cc: qemu-devel
On Fri, Mar 14, 2014 at 01:30:40PM +0100, Paolo Bonzini wrote:
> On 13/03/2014 22:49, Richard W.M. Jones wrote:
> >- Is there any reason why virtio-scsi doesn't emulate WRITE SAME?
>
> Yes, the reason is that you're using QEMU 1.7. :)
>
> >- Can you see where ext4 issues the zeroout/write same call? AFAICT
> >it is still issuing discards, but these are getting turned into
> >zeroout/write same by some sort of block layer magic that I can't
> >quite follow.
>
> That's provisioning_mode, which is writesame_16 with QEMU 1.7 and
> unmap with QEMU 2.0.
Got it.
This morning I tried the kernel from git together with qemu from git.
This works, sort of.
First, I tightened up the automated trimming tests [1]. Previously
we only tested that >= 1 block was freed in the host file; now I'm
checking that >= 512 KB is freed. This change revealed that fstrim
was only trimming about 64 KB from the host file (although the -o discard
and blkdiscard tests [1] work as expected).
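The host-side check described here can be sketched as follows. This is not the libguestfs test code from [1]. It is a minimal illustration that assumes a raw (sparse) image file on the host, GNU coreutils stat, and the hypothetical helper names allocated_bytes and freed_at_least. It measures allocated blocks rather than the apparent file size, because QEMU typically frees space in a raw image by punching holes, and that leaves the file's length unchanged.

```shell
#!/bin/sh
# Bytes a (possibly sparse) file actually occupies on the host filesystem.
# GNU stat: %b = number of allocated blocks, %B = size in bytes of each block.
allocated_bytes() {
    stat -c '%b %B' -- "$1" | { read -r blocks bsize; echo $((blocks * bsize)); }
}

# Succeed if at least MIN bytes were freed between two measurements.
# Usage: freed_at_least BEFORE AFTER MIN
freed_at_least() {
    [ $(( $1 - $2 )) -ge "$3" ]
}

# Hypothetical use around a guest run (disk.img is a placeholder name):
#   before=$(allocated_bytes disk.img)
#   ... boot guest, delete a file, fstrim, shut down ...
#   after=$(allocated_bytes disk.img)
#   freed_at_least "$before" "$after" $((512 * 1024)) || echo "too little trimmed"
```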
I worked around this in any case by rearranging the test [2]:
Doing:
  rm /a_big_file
  fstrim /
  sync
  umount /
  [shut down qemu]
would only trim 64 KB on the host.
Doing:
  rm /a_big_file
  umount /                       # added
  mount -o nodiscard /dev/sda /  # added
  fstrim /
  sync
  umount /
  [shut down qemu]
would trim the expected amount (around 10 MB).
I've no idea why this is (it looks like an ext4/kernel bug to me), but in
any case the tests now use the second method [2].
Rich.
[1] https://github.com/libguestfs/libguestfs/tree/master/tests/discard
[2] https://github.com/libguestfs/libguestfs/commit/accf1b66aa835714690a2979e990c49243875dab
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-top is 'top' for virtual machines. Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top
* Re: [Qemu-devel] fstrim & upstream kernel not working
From: Paolo Bonzini @ 2014-03-14 12:47 UTC
To: Richard W.M. Jones; +Cc: qemu-devel
On 14/03/2014 13:42, Richard W.M. Jones wrote:
> I worked around this in any case by rearranging the test [2]:
>
> Doing:
>
> rm /a_big_file
> fstrim /
> sync
> umount /
> [shut down qemu]
>
> would only trim 64 KB on the host.
>
> Doing:
>
> rm /a_big_file
> umount / # added
> mount -o nodiscard /dev/sda / # added
> fstrim /
> sync
> umount /
> [shut down qemu]
>
> would trim the expected amount (around 10 MB).
>
> I've no idea why this is (looks like an ext4/kernel bug to me), but in
> any case the tests now use the second method[2].
Could be a race condition (something going on in the background between
rm and fstrim). Try syncing before fstrim, not after. In fact the sync
before umount should not be necessary.
Paolo
* Re: [Qemu-devel] fstrim & upstream kernel not working
From: Richard W.M. Jones @ 2014-03-14 13:24 UTC
To: Paolo Bonzini; +Cc: qemu-devel
On Fri, Mar 14, 2014 at 01:47:59PM +0100, Paolo Bonzini wrote:
> Could be a race condition (something going on in the background
> between rm and fstrim).
Not much happens in the libguestfs appliance. There are usually only
two processes (udev + guestfsd).
> Try syncing before fstrim, not after. In
> fact the sync before umount should not be necessary.
Yes, that works with both upstream kernel+qemu and with F21 kernel+qemu.
https://github.com/libguestfs/libguestfs/commit/d46ceea6014006ab19b6f795e2e28a7360d90b2c
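For reference, the ordering that worked can be written as a small shell function. This is a sketch and not the code from the commit above. trim_after_delete and the RUN dry-run hook are hypothetical names, and the comment about why sync helps only restates Paolo's guess that ext4 is still doing work after rm returns.

```shell
#!/bin/sh
# Delete a file, then trim the filesystem it lived on, syncing BEFORE
# fstrim (the order that worked in this thread).
# RUN is a hypothetical dry-run hook: RUN=echo prints the commands
# instead of executing them; leave it unset to run them for real.
trim_after_delete() {
    file=$1
    mnt=$2
    $RUN rm -f "$file"
    $RUN sync             # presumably lets ext4 finish the rm's pending work
    $RUN fstrim -v "$mnt"
    $RUN umount "$mnt"    # per the thread, no extra sync is needed here
}
```

With RUN unset, the commands run for real (as root, inside the guest). `RUN=echo trim_after_delete /a_big_file /` only prints them.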
Thanks,
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
libguestfs lets you edit virtual machines. Supports shell scripting,
bindings from many languages. http://libguestfs.org
* Re: [Qemu-devel] fstrim & upstream kernel not working
From: Paolo Bonzini @ 2014-03-14 13:28 UTC
To: Richard W.M. Jones; +Cc: qemu-devel
On 14/03/2014 14:24, Richard W.M. Jones wrote:
>> > Could be a race condition (something going on in the background
>> > between rm and fstrim).
> Not much happens in the libguestfs appliance. There are usually only
> two processes (udev + guestfsd).
There's also the kernel. The ext4 driver is probably doing something
after rm returns, and hasn't finished yet when you invoke FITRIM.
Paolo
* Re: [Qemu-devel] fstrim & upstream kernel not working
From: Richard W.M. Jones @ 2014-03-14 13:34 UTC
To: Paolo Bonzini; +Cc: qemu-devel
On Fri, Mar 14, 2014 at 02:28:24PM +0100, Paolo Bonzini wrote:
> On 14/03/2014 14:24, Richard W.M. Jones wrote:
> >>> Could be a race condition (something going on in the background
> >>> between rm and fstrim).
> >Not much happens in the libguestfs appliance. There are usually only
> >two processes (udev + guestfsd).
>
> There's also the kernel. The ext4 driver is probably doing
> something after rm returns, and hasn't finished yet when you invoke
> FITRIM.
Yup. libguestfs has exposed a number of places where commands work
when you type them slowly by hand but fail when run from a script.
Check out the number of places we call 'udevadm settle' or 'sync':
each one was discovered painfully over the past 5 years.
$ git grep -E 'udev_settle|sync_disks' daemon | wc -l
59
Rich.