* BTRFS: Unbelievably slow with kvm/qemu
@ 2010-07-12  5:24 Giangiacomo Mariotti
  2010-07-12  5:54 ` Justin P. Mattock
                   ` (3 more replies)
  0 siblings, 4 replies; 45+ messages in thread
From: Giangiacomo Mariotti @ 2010-07-12  5:24 UTC (permalink / raw)
  To: linux-kernel

Hi, is it a known problem how slow Btrfs is with kvm/qemu (meaning
that the image kvm/qemu uses as the hd is on a partition formatted
with Btrfs, not that the fs used by the hd inside the kvm environment
is Btrfs; in fact, inside kvm the / partition is formatted with ext3)?
I haven't written down the exact numbers, because I forgot, but while
I was trying to make it work, after I noticed how much longer than
usual it was taking to just install the system, I took a look at iotop,
and it was reporting a write speed for the kvm process of approximately
3M/s, while the Btrfs kernel thread had a write speed of approximately
7K/s! Just formatting the partitions during the Debian installation
took minutes. When the actual installation of the distro started, I had
to stop it, because it was taking hours! The iotop results made me
think that the problem could be Btrfs but, to be sure that it wasn't
instead a kvm/qemu problem, I cut/pasted the same virtual hd onto an
ext3 fs and started kvm with the same parameters as before. The
installation of Debian inside kvm this time went smoothly and fast,
as it normally does. I've been using Btrfs for some time now, and
while it has never been a speed champion (and I guess it's not supposed
to be one, and I don't really care much about that), I've never
had any noticeable performance problem before, and it has always been
quite stable. In this test case, though, it seems to be doing very
badly.

cheers

-- 
Giangiacomo


* Re: BTRFS: Unbelievably slow with kvm/qemu
  2010-07-12  5:24 BTRFS: Unbelievably slow with kvm/qemu Giangiacomo Mariotti
@ 2010-07-12  5:54 ` Justin P. Mattock
  2010-07-12  7:09   ` [Qemu-devel] " Michael Tokarev
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 45+ messages in thread
From: Justin P. Mattock @ 2010-07-12  5:54 UTC (permalink / raw)
  To: Giangiacomo Mariotti; +Cc: linux-kernel

On 07/11/2010 10:24 PM, Giangiacomo Mariotti wrote:
> Hi, is it a known problem how slow Btrfs is with kvm/qemu (meaning
> that the image kvm/qemu uses as the hd is on a partition formatted
> with Btrfs, not that the fs used by the hd inside the kvm environment
> is Btrfs; in fact, inside kvm the / partition is formatted with ext3)?
> I haven't written down the exact numbers, because I forgot, but while
> I was trying to make it work, after I noticed how much longer than
> usual it was taking to just install the system, I took a look at iotop,
> and it was reporting a write speed for the kvm process of approximately
> 3M/s, while the Btrfs kernel thread had a write speed of approximately
> 7K/s! Just formatting the partitions during the Debian installation
> took minutes. When the actual installation of the distro started, I had
> to stop it, because it was taking hours! The iotop results made me
> think that the problem could be Btrfs but, to be sure that it wasn't
> instead a kvm/qemu problem, I cut/pasted the same virtual hd onto an
> ext3 fs and started kvm with the same parameters as before. The
> installation of Debian inside kvm this time went smoothly and fast,
> as it normally does. I've been using Btrfs for some time now, and
> while it has never been a speed champion (and I guess it's not supposed
> to be one, and I don't really care much about that), I've never
> had any noticeable performance problem before, and it has always been
> quite stable. In this test case, though, it seems to be doing very
> badly.
>
> cheers
>

Not sure about the butter filesystem... but what is the last good kernel?
Are you able to bisect?
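
For concreteness, the bisect being suggested would look roughly like
this (the version tags here are assumptions, not from the thread):

  cd linux                     # a kernel git tree
  git bisect start
  git bisect bad v2.6.34       # kernel that shows the slowdown
  git bisect good v2.6.32      # an assumed-good older kernel
  # build/boot the commit git checks out, rerun the kvm install test,
  # then mark it "git bisect good" or "git bisect bad" and repeat
  # until git names the first bad commit; finish with:
  git bisect reset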

Justin P. Mattock


* Re: BTRFS: Unbelievably slow with kvm/qemu
  2010-07-12  5:24 BTRFS: Unbelievably slow with kvm/qemu Giangiacomo Mariotti
@ 2010-07-12  7:09   ` Michael Tokarev
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 45+ messages in thread
From: Michael Tokarev @ 2010-07-12  7:09 UTC (permalink / raw)
  To: Giangiacomo Mariotti; +Cc: linux-kernel, linux-fsdevel, qemu-devel

12.07.2010 09:24, Giangiacomo Mariotti wrote:
> Hi, is it a known problem how slow Btrfs is with kvm/qemu (meaning
> that the image kvm/qemu uses as the hd is on a partition formatted
> with Btrfs, not that the fs used by the hd inside the kvm environment
> is Btrfs; in fact, inside kvm the / partition is formatted with ext3)?
> I haven't written down the exact numbers, because I forgot, but while
> I was trying to make it work, after I noticed how much longer than
> usual it was taking to just install the system, I took a look at iotop,
> and it was reporting a write speed for the kvm process of approximately
> 3M/s, while the Btrfs kernel thread had a write speed of approximately
> 7K/s! Just formatting the partitions during the Debian installation
> took minutes. When the actual installation of the distro started, I had
> to stop it, because it was taking hours! The iotop results made me
> think that the problem could be Btrfs but, to be sure that it wasn't
> instead a kvm/qemu problem, I cut/pasted the same virtual hd onto an
> ext3 fs and started kvm with the same parameters as before. The
> installation of Debian inside kvm this time went smoothly and fast,
> as it normally does. I've been using Btrfs for some time now, and
> while it has never been a speed champion (and I guess it's not supposed
> to be one, and I don't really care much about that), I've never
> had any noticeable performance problem before, and it has always been
> quite stable. In this test case, though, it seems to be doing very
> badly.

This looks quite similar to a problem with ext4 and O_SYNC which I
reported earlier but no one cared to answer (or read?) - there:
http://permalink.gmane.org/gmane.linux.file-systems/42758
(sent to qemu-devel and linux-fsdevel lists - Cc'd too).  You can
try a few other options, esp. cache=none and re-writing some guest
files to verify.
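
To make that concrete, a possible invocation for the cache=none
experiment (binary name, memory size and image path are made up):

  # default is roughly: -drive file=...,if=virtio,cache=writethrough
  kvm -m 1024 -drive file=/mnt/btrfs/debian.img,if=virtio,cache=none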

/mjt

* Re: BTRFS: Unbelievably slow with kvm/qemu
  2010-07-12  7:09   ` Michael Tokarev
@ 2010-07-12  7:17     ` Justin P. Mattock
  0 siblings, 0 replies; 45+ messages in thread
From: Justin P. Mattock @ 2010-07-12  7:17 UTC (permalink / raw)
  To: Michael Tokarev
  Cc: Giangiacomo Mariotti, linux-kernel, linux-fsdevel, qemu-devel

On 07/12/2010 12:09 AM, Michael Tokarev wrote:
> 12.07.2010 09:24, Giangiacomo Mariotti wrote:
>> Hi, is it a known problem how slow Btrfs is with kvm/qemu (meaning
>> that the image kvm/qemu uses as the hd is on a partition formatted
>> with Btrfs, not that the fs used by the hd inside the kvm environment
>> is Btrfs; in fact, inside kvm the / partition is formatted with ext3)?
>> I haven't written down the exact numbers, because I forgot, but while
>> I was trying to make it work, after I noticed how much longer than
>> usual it was taking to just install the system, I took a look at iotop,
>> and it was reporting a write speed for the kvm process of approximately
>> 3M/s, while the Btrfs kernel thread had a write speed of approximately
>> 7K/s! Just formatting the partitions during the Debian installation
>> took minutes. When the actual installation of the distro started, I had
>> to stop it, because it was taking hours! The iotop results made me
>> think that the problem could be Btrfs but, to be sure that it wasn't
>> instead a kvm/qemu problem, I cut/pasted the same virtual hd onto an
>> ext3 fs and started kvm with the same parameters as before. The
>> installation of Debian inside kvm this time went smoothly and fast,
>> as it normally does. I've been using Btrfs for some time now, and
>> while it has never been a speed champion (and I guess it's not supposed
>> to be one, and I don't really care much about that), I've never
>> had any noticeable performance problem before, and it has always been
>> quite stable. In this test case, though, it seems to be doing very
>> badly.
>
> This looks quite similar to a problem with ext4 and O_SYNC which I
> reported earlier but no one cared to answer (or read?) - there:
> http://permalink.gmane.org/gmane.linux.file-systems/42758
> (sent to qemu-devel and linux-fsdevel lists - Cc'd too).  You can
> try a few other options, esp. cache=none and re-writing some guest
> files to verify.
>
> /mjt

Cool, a solution... glad to see it... no chance at a bisect with this?
(Getting this down to a commit or two makes things easier.)

Justin P. Mattock

* Re: BTRFS: Unbelievably slow with kvm/qemu
  2010-07-12  7:17     ` Justin P. Mattock
@ 2010-07-12 13:15       ` Giangiacomo Mariotti
  0 siblings, 0 replies; 45+ messages in thread
From: Giangiacomo Mariotti @ 2010-07-12 13:15 UTC (permalink / raw)
  To: Justin P. Mattock
  Cc: Michael Tokarev, linux-kernel, linux-fsdevel, qemu-devel

On Mon, Jul 12, 2010 at 9:17 AM, Justin P. Mattock
<justinmattock@gmail.com> wrote:
> On 07/12/2010 12:09 AM, Michael Tokarev wrote:
>>
>> This looks quite similar to a problem with ext4 and O_SYNC which I
>> reported earlier but no one cared to answer (or read?) - there:
>> http://permalink.gmane.org/gmane.linux.file-systems/42758
>> (sent to qemu-devel and linux-fsdevel lists - Cc'd too).  You can
>> try a few other options, esp. cache=none and re-writing some guest
>> files to verify.
>>
>> /mjt
>
> Cool, a solution... glad to see it... no chance at a bisect with this?
> (Getting this down to a commit or two makes things easier.)
>
> Justin P. Mattock
>
I didn't even say what kernel version I was using, sorry! Kernel
2.6.34.1 + "patches in the stable queue for the next stable release". I
tried this some time ago with 2.6.33.x (I don't remember which version
exactly) and it had the same problem, but at the time I stopped trying,
thinking that it was a kvm problem. So basically there's no known (to
me) good version, and no, I can't bisect this, because this is my
production system. Anyway, I suspect this is reproducible. Am I the
only one who has created a virtual hd file on Btrfs and then used it
with kvm/qemu? I mean, it's not a particularly exotic test case!


-- 
Giangiacomo

* Re: BTRFS: Unbelievably slow with kvm/qemu
  2010-07-12  7:09   ` Michael Tokarev
@ 2010-07-12 13:34     ` Giangiacomo Mariotti
  0 siblings, 0 replies; 45+ messages in thread
From: Giangiacomo Mariotti @ 2010-07-12 13:34 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: linux-kernel, linux-fsdevel, qemu-devel

On Mon, Jul 12, 2010 at 9:09 AM, Michael Tokarev <mjt@tls.msk.ru> wrote:
>
> This looks quite similar to a problem with ext4 and O_SYNC which I
> reported earlier but no one cared to answer (or read?) - there:
> http://permalink.gmane.org/gmane.linux.file-systems/42758
> (sent to qemu-devel and linux-fsdevel lists - Cc'd too).  You can
> try a few other options, esp. cache=none and re-writing some guest
> files to verify.
>
> /mjt
>
Either way, changing to cache=none I suspect wouldn't tell me much,
because if it's as slow as before, it's still unusable and if instead
it's even slower, well it'd be even more unusable, so I wouldn't be
able to tell the difference. What I can say for certain is that with
the exact same virtual hd file, same options, same system, but on an
ext3 fs there's no problem at all; on Btrfs it's not just slower, it
takes ages.

-- 
Giangiacomo

* Re: BTRFS: Unbelievably slow with kvm/qemu
  2010-07-12 13:34     ` Giangiacomo Mariotti
@ 2010-07-12 13:40       ` Michael Tokarev
  0 siblings, 0 replies; 45+ messages in thread
From: Michael Tokarev @ 2010-07-12 13:40 UTC (permalink / raw)
  To: Giangiacomo Mariotti; +Cc: linux-kernel, linux-fsdevel, qemu-devel

Giangiacomo Mariotti wrote:
> On Mon, Jul 12, 2010 at 9:09 AM, Michael Tokarev <mjt@tls.msk.ru> wrote:
>> This looks quite similar to a problem with ext4 and O_SYNC which I
>> reported earlier but no one cared to answer (or read?) - there:
>> http://permalink.gmane.org/gmane.linux.file-systems/42758
>> (sent to qemu-devel and linux-fsdevel lists - Cc'd too).  You can
>> try a few other options, esp. cache=none and re-writing some guest
>> files to verify.
>>
>> /mjt
>>
> Either way, changing to cache=none I suspect wouldn't tell me much,
> because if it's as slow as before, it's still unusable and if instead
> it's even slower, well it'd be even more unusable, so I wouldn't be
> able to tell the difference.

Actually it's not that simple.

>     What I can say for certain is that with
> the exact same virtual hd file, same options, same system, but on an
> ext3 fs there's no problem at all; on Btrfs it's not just slower, it
> takes ages.

It is exactly the same with ext4 vs ext3.  But only on metadata-intensive
operations (for a qcow2 image).  Once you allocate space, it becomes fast,
and _especially_ fast with cache=none.  Actually, it looks like O_SYNC
(the default cache mode) is _slower_ on ext4 than O_DIRECT (cache=none).

(And yes, I know O_DIRECT does NOT imply O_SYNC and vice versa.)
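
A rough way to see that allocated-vs-allocating difference from inside
the guest (file name and size here are arbitrary):

  # first pass allocates new space in the image - metadata-heavy on the host
  dd if=/dev/zero of=testfile bs=1M count=256 conv=fsync
  # second pass rewrites already-allocated space - should be much faster
  dd if=/dev/zero of=testfile bs=1M count=256 conv=fsync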

/mjt

* Re: BTRFS: Unbelievably slow with kvm/qemu
  2010-07-12 13:43       ` Josef Bacik
@ 2010-07-12 13:42         ` Michael Tokarev
  0 siblings, 0 replies; 45+ messages in thread
From: Michael Tokarev @ 2010-07-12 13:42 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Giangiacomo Mariotti, linux-kernel, linux-fsdevel, qemu-devel

Josef Bacik wrote:
[]
> O_DIRECT support was just introduced recently; please try the latest kernel
> with the normal settings (which IIRC use O_DIRECT), that should make things
> suck a lot less.  Thanks,

Um.  Do you mean it was just introduced in Btrfs, or in general? :)

Because, well, O_DIRECT has been here and supported since the 2.2 days... ;)

/mjt

* Re: BTRFS: Unbelievably slow with kvm/qemu
  2010-07-12 13:34     ` Giangiacomo Mariotti
@ 2010-07-12 13:43       ` Josef Bacik
  0 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2010-07-12 13:43 UTC (permalink / raw)
  To: Giangiacomo Mariotti
  Cc: Michael Tokarev, linux-kernel, linux-fsdevel, qemu-devel

On Mon, Jul 12, 2010 at 03:34:44PM +0200, Giangiacomo Mariotti wrote:
> On Mon, Jul 12, 2010 at 9:09 AM, Michael Tokarev <mjt@tls.msk.ru> wrote:
> >
> > This looks quite similar to a problem with ext4 and O_SYNC which I
> > reported earlier but no one cared to answer (or read?) - there:
> > http://permalink.gmane.org/gmane.linux.file-systems/42758
> > (sent to qemu-devel and linux-fsdevel lists - Cc'd too).  You can
> > try a few other options, esp. cache=none and re-writing some guest
> > files to verify.
> >
> > /mjt
> >
> Either way, changing to cache=none I suspect wouldn't tell me much,
> because if it's as slow as before, it's still unusable and if instead
> it's even slower, well it'd be even more unusable, so I wouldn't be
> able to tell the difference. What I can say for certain is that with
> the exact same virtual hd file, same options, same system, but on an
> ext3 fs there's no problem at all; on Btrfs it's not just slower, it
> takes ages.
>

O_DIRECT support was just introduced recently; please try the latest kernel
with the normal settings (which IIRC use O_DIRECT), that should make things
suck a lot less.  Thanks,
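
As a quick sanity check, direct I/O on the host btrfs mount can be
exercised with something like this (the path is hypothetical):

  dd if=/dev/zero of=/mnt/btrfs/direct-test bs=1M count=64 oflag=direct
  rm /mnt/btrfs/direct-test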

Josef 

* Re: BTRFS: Unbelievably slow with kvm/qemu
  2010-07-12 13:42         ` Michael Tokarev
@ 2010-07-12 13:49           ` Josef Bacik
  0 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2010-07-12 13:49 UTC (permalink / raw)
  To: Michael Tokarev
  Cc: Josef Bacik, Giangiacomo Mariotti, linux-kernel, linux-fsdevel,
	qemu-devel

On Mon, Jul 12, 2010 at 05:42:04PM +0400, Michael Tokarev wrote:
> Josef Bacik wrote:
> []
> > O_DIRECT support was just introduced recently; please try the latest kernel
> > with the normal settings (which IIRC use O_DIRECT), that should make things
> > suck a lot less.  Thanks,
> 
> Um.  Do you mean it was just introduced in Btrfs, or in general? :)
> 
> Because, well, O_DIRECT has been here and supported since the 2.2 days... ;)

Btrfs, obviously.

Josef

* Re: BTRFS: Unbelievably slow with kvm/qemu
  2010-07-12 13:43       ` Josef Bacik
@ 2010-07-12 20:23         ` Giangiacomo Mariotti
  0 siblings, 0 replies; 45+ messages in thread
From: Giangiacomo Mariotti @ 2010-07-12 20:23 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Michael Tokarev, linux-kernel, linux-fsdevel, qemu-devel

On Mon, Jul 12, 2010 at 3:43 PM, Josef Bacik <josef@redhat.com> wrote:
>
> O_DIRECT support was just introduced recently; please try the latest kernel
> with the normal settings (which IIRC use O_DIRECT), that should make things
> suck a lot less.  Thanks,
>
> Josef
>
By 'latest kernel' do you mean Linus' current git tree? Because if
you instead mean the current stable kernel, that's the one I used in
my test.

-- 
Giangiacomo

* Re: BTRFS: Unbelievably slow with kvm/qemu
  2010-07-12 20:23         ` Giangiacomo Mariotti
@ 2010-07-12 20:24           ` Josef Bacik
  0 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2010-07-12 20:24 UTC (permalink / raw)
  To: Giangiacomo Mariotti
  Cc: Josef Bacik, Michael Tokarev, linux-kernel, linux-fsdevel, qemu-devel

On Mon, Jul 12, 2010 at 10:23:14PM +0200, Giangiacomo Mariotti wrote:
> On Mon, Jul 12, 2010 at 3:43 PM, Josef Bacik <josef@redhat.com> wrote:
> >
> > O_DIRECT support was just introduced recently; please try the latest kernel
> > with the normal settings (which IIRC use O_DIRECT), that should make things
> > suck a lot less.  Thanks,
> >
> > Josef
> >
> By 'latest kernel' do you mean Linus' current git tree? Because if
> you instead mean the current stable kernel, that's the one I used in
> my test.
> 

Yes, Linus' git tree.  Thanks,

Josef

* Re: BTRFS: Unbelievably slow with kvm/qemu
  2010-07-12  5:24 BTRFS: Unbelievably slow with kvm/qemu Giangiacomo Mariotti
  2010-07-12  5:54 ` Justin P. Mattock
  2010-07-12  7:09   ` Michael Tokarev
@ 2010-07-13  4:29 ` Avi Kivity
  2010-07-14  2:39   ` Giangiacomo Mariotti
  2010-07-14 19:49 ` Christoph Hellwig
  3 siblings, 1 reply; 45+ messages in thread
From: Avi Kivity @ 2010-07-13  4:29 UTC (permalink / raw)
  To: Giangiacomo Mariotti; +Cc: linux-kernel

On 07/12/2010 08:24 AM, Giangiacomo Mariotti wrote:
> Hi, is it a known problem how slow Btrfs is with kvm/qemu (meaning
> that the image kvm/qemu uses as the hd is on a partition formatted
> with Btrfs, not that the fs used by the hd inside the kvm environment
> is Btrfs; in fact, inside kvm the / partition is formatted with ext3)?
> I haven't written down the exact numbers, because I forgot, but while
> I was trying to make it work, after I noticed how much longer than
> usual it was taking to just install the system, I took a look at iotop,
> and it was reporting a write speed for the kvm process of approximately
> 3M/s, while the Btrfs kernel thread had a write speed of approximately
> 7K/s! Just formatting the partitions during the Debian installation
> took minutes. When the actual installation of the distro started, I had
> to stop it, because it was taking hours! The iotop results made me
> think that the problem could be Btrfs but, to be sure that it wasn't
> instead a kvm/qemu problem, I cut/pasted the same virtual hd onto an
> ext3 fs and started kvm with the same parameters as before. The
> installation of Debian inside kvm this time went smoothly and fast,
> as it normally does. I've been using Btrfs for some time now, and
> while it has never been a speed champion (and I guess it's not supposed
> to be one, and I don't really care much about that), I've never
> had any noticeable performance problem before, and it has always been
> quite stable. In this test case, though, it seems to be doing very
> badly.
>

Btrfs is very slow on sync writes:

$ fio --name=x --directory=/images --rw=randwrite --runtime=300 \
      --size=1G --filesize=1G --bs=4k --ioengine=psync --sync=1 --unlink=1
x: (g=0): rw=randwrite, bs=4K-4K/4K-4K, ioengine=psync, iodepth=1
Starting 1 process
x: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 1 (f=1): [w] [1.3% done] [0K/0K /s] [0/0 iops] [eta 06h:18m:45s]
x: (groupid=0, jobs=1): err= 0: pid=2086
   write: io=13,752KB, bw=46,927B/s, iops=11, runt=300078msec
     clat (msec): min=33, max=1,711, avg=87.26, stdev=60.00
     bw (KB/s) : min=    5, max=  105, per=103.79%, avg=46.70, stdev=15.86
   cpu          : usr=0.03%, sys=19.55%, ctx=47197, majf=0, minf=94
   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
      issued r/w: total=0/3438, short=0/0

      lat (msec): 50=3.40%, 100=75.63%, 250=19.14%, 500=1.40%, 750=0.35%
      lat (msec): 1000=0.06%, 2000=0.03%

Run status group 0 (all jobs):
   WRITE: io=13,752KB, aggrb=45KB/s, minb=46KB/s, maxb=46KB/s, mint=300078msec, maxt=300078msec

45KB/s, while 4-5MB/s traffic was actually going to the disk.  For every 
4KB that the application writes, 400KB+ of metadata is written.

(It's actually worse, since it starts faster than the average and ends 
up slower than the average).

For kvm, you can try cache=writeback or cache=unsafe and get better 
performance (though still slower than ext*).
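
A sketch of those two modes (image path and other options made up):

  kvm -drive file=/images/guest.img,if=virtio,cache=writeback  # host page cache, honors guest flushes
  kvm -drive file=/images/guest.img,if=virtio,cache=unsafe     # also ignores guest flushes - fast but risky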

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



* Re: [Qemu-devel] Re: BTRFS: Unbelievably slow with kvm/qemu
  2010-07-12 13:43       ` Josef Bacik
@ 2010-07-13  8:53         ` Kevin Wolf
  0 siblings, 0 replies; 45+ messages in thread
From: Kevin Wolf @ 2010-07-13  8:53 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Giangiacomo Mariotti, linux-fsdevel, Michael Tokarev,
	linux-kernel, qemu-devel

On 12.07.2010 15:43, Josef Bacik wrote:
> On Mon, Jul 12, 2010 at 03:34:44PM +0200, Giangiacomo Mariotti wrote:
>> On Mon, Jul 12, 2010 at 9:09 AM, Michael Tokarev <mjt@tls.msk.ru> wrote:
>>>
>>> This looks quite similar to a problem with ext4 and O_SYNC which I
>>> reported earlier but no one cared to answer (or read?) - there:
>>> http://permalink.gmane.org/gmane.linux.file-systems/42758
>>> (sent to qemu-devel and linux-fsdevel lists - Cc'd too).  You can
>>> try a few other options, esp. cache=none and re-writing some guest
>>> files to verify.
>>>
>>> /mjt
>>>
>> Either way, changing to cache=none I suspect wouldn't tell me much,
>> because if it's as slow as before, it's still unusable and if instead
>> it's even slower, well it'd be even more unusable, so I wouldn't be
>> able to tell the difference. What I can say for certain is that with
>> the exact same virtual hd file, same options, same system, but on an
>> ext3 fs there's no problem at all; on Btrfs it's not just slower, it
>> takes ages.
>>
> 
> O_DIRECT support was just introduced recently; please try the latest kernel
> with the normal settings (which IIRC use O_DIRECT), that should make things
> suck a lot less. 

IIUC, he uses the default cache option of qemu, which is
cache=writethrough and maps to O_DSYNC without O_DIRECT. O_DIRECT would
only be used for cache=none.

Kevin

* Re: BTRFS: Unbelievably slow with kvm/qemu
  2010-07-13  4:29 ` Avi Kivity
@ 2010-07-14  2:39   ` Giangiacomo Mariotti
  0 siblings, 0 replies; 45+ messages in thread
From: Giangiacomo Mariotti @ 2010-07-14  2:39 UTC (permalink / raw)
  To: Avi Kivity; +Cc: linux-kernel

On Tue, Jul 13, 2010 at 6:29 AM, Avi Kivity <avi@redhat.com> wrote:
> Btrfs is very slow on sync writes:
>
> 45KB/s, while 4-5MB/s traffic was actually going to the disk.  For every 4KB
> that the application writes, 400KB+ of metadata is written.
>
> (It's actually worse, since it starts faster than the average and ends up
> slower than the average).
>
> For kvm, you can try cache=writeback or cache=unsafe and get better
> performance (though still slower than ext*).
>
Yeah, well, I've already moved the virtual hd file to an ext3
partition, so the problem for me was actually already "solved" before
I posted the first message. I posted it just to report the
particularly bad performance of Btrfs in this test case, so that, if
not already known, it could be investigated and hopefully fixed.

By the way, thanks to everyone who answered!

-- 
Giangiacomo


* Re: BTRFS: Unbelievably slow with kvm/qemu
  2010-07-12  5:24 BTRFS: Unbelievably slow with kvm/qemu Giangiacomo Mariotti
                   ` (2 preceding siblings ...)
  2010-07-13  4:29 ` Avi Kivity
@ 2010-07-14 19:49 ` Christoph Hellwig
  2010-07-17  5:29   ` Giangiacomo Mariotti
  2010-07-17 10:28   ` Ted Ts'o
  3 siblings, 2 replies; 45+ messages in thread
From: Christoph Hellwig @ 2010-07-14 19:49 UTC (permalink / raw)
  To: Giangiacomo Mariotti; +Cc: linux-kernel

There are a lot of variables when using qemu.

The most important ones are (see the sketch right after this list):

 - the cache mode on the device.  The default is cache=writethrough,
   which is not quite optimal.  You generally do want to use cache=none
   which uses O_DIRECT in qemu.
 - if the backing image is sparse or not.
 - if you use barriers - both in the host and the guest.
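
For instance (device names, sizes and paths are made up):

  kvm -drive file=img.raw,if=virtio,cache=none   # O_DIRECT on the host
  truncate -s 10G img.raw                        # sparse backing image
  fallocate -l 10G img.raw                       # preallocated backing image
  mount -o barrier=1 /dev/sdb1 /mnt/images       # barriers on an ext3 host fs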

Below I have a table comparing raw blockdevices, xfs, btrfs, ext4 and
ext3.  For ext3 we also compare the default, unsafe barrier=0 version
and the barrier=1 version you should use if you actually care about
your data.

The comparison is a simple untar of a Linux 2.6.34 tarball, including a
sync after it.  We run this with ext3 in the guest, either using the
default barrier=0, or for the later tests also using barrier=1.  It
is done on an OCZ Vertex SSD, which gets reformatted and fully TRIMmed
before each test.

As you can see you generally do want to use cache=none and every
filesystem is about the same speed for that - except that on XFS you
also really need preallocation.  What's interesting is how bad btrfs
is for the default compared to the others, and that for many filesystems
things actually get minimally faster when enabling barriers in the
guest.  Things will look very different for barrier-heavy guests; I'll
do another benchmark for those.

							bdev		xfs		btrfs		ext4		ext3		ext3 (barrier)

cache=writethrough	nobarrier	sparse		0m27.183s	0m42.552s	2m28.929s	0m33.749s	0m24.975s	0m37.105s
cache=writethrough	nobarrier	prealloc	-		0m32.840s	2m28.378s	0m34.233s	-		-

cache=none		nobarrier	sparse		0m21.988s	0m49.758s	0m24.819s	0m23.977s	0m22.569s	0m24.938s
cache=none		nobarrier	prealloc	-		0m24.464s	0m24.646s	0m24.346s	-		-

cache=none		barrier		sparse		0m21.526s	0m41.158s	0m24.403s	0m23.924s	0m23.040s	0m23.272s
cache=none		barrier		prealloc	-		0m23.944s	0m24.284s	0m23.981s	-		-


* Re: BTRFS: Unbelievably slow with kvm/qemu
  2010-07-14 19:49 ` Christoph Hellwig
@ 2010-07-17  5:29   ` Giangiacomo Mariotti
  2010-07-17 10:28   ` Ted Ts'o
  1 sibling, 0 replies; 45+ messages in thread
From: Giangiacomo Mariotti @ 2010-07-17  5:29 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-kernel


On Wed, Jul 14, 2010 at 9:49 PM, Christoph Hellwig <hch@infradead.org> wrote:
> There are a lot of variables when using qemu.
>
> The most important ones are:
>
>  - the cache mode on the device.  The default is cache=writethrough,
>   which is not quite optimal.  You generally do want to use cache=none
>   which uses O_DIRECT in qemu.
>  - if the backing image is sparse or not.
>  - if you use barriers - both in the host and the guest.
>
> Below I have a table comparing raw blockdevices, xfs, btrfs, ext4 and
> ext3.  For ext3 we also compare the default, unsafe barrier=0 version
> and the barrier=1 version you should use if you actually care about
> your data.
>
> The comparison is a simple untar of a Linux 2.6.34 tarball, including a
> sync after it.  We run this with ext3 in the guest, either using the
> default barrier=0, or for the later tests also using barrier=1.  It
> is done on an OCZ Vertex SSD, which gets reformatted and fully TRIMmed
> before each test.
>
> As you can see you generally do want to use cache=none and every
> filesystem is about the same speed for that - except that on XFS you
> also really need preallocation.  What's interesting is how bad btrfs
> is for the default compared to the others, and that for many filesystems
> things actually get minimally faster when enabling barriers in the
> guest.  Things will look very different for barrier-heavy guests; I'll
> do another benchmark for those.
>
>                                                        bdev            xfs             btrfs           ext4            ext3            ext3 (barrier)
>
> cache=writethrough      nobarrier       sparse          0m27.183s       0m42.552s       2m28.929s       0m33.749s       0m24.975s       0m37.105s
> cache=writethrough      nobarrier       prealloc        -               0m32.840s       2m28.378s       0m34.233s       -               -
>
> cache=none              nobarrier       sparse          0m21.988s       0m49.758s       0m24.819s       0m23.977s       0m22.569s       0m24.938s
> cache=none              nobarrier       prealloc        -               0m24.464s       0m24.646s       0m24.346s       -               -
>
> cache=none              barrier         sparse          0m21.526s       0m41.158s       0m24.403s       0m23.924s       0m23.040s       0m23.272s
> cache=none              barrier         prealloc        -               0m23.944s       0m24.284s       0m23.981s       -               -
>
Very interesting. I haven't had the time to try it again, but now I'm
going to try some cache options and see what gives me the best
results.

-- 
Giangiacomo


* Re: BTRFS: Unbelievably slow with kvm/qemu
  2010-07-14 19:49 ` Christoph Hellwig
  2010-07-17  5:29   ` Giangiacomo Mariotti
@ 2010-07-17 10:28   ` Ted Ts'o
  2010-07-18  7:15     ` Christoph Hellwig
  1 sibling, 1 reply; 45+ messages in thread
From: Ted Ts'o @ 2010-07-17 10:28 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Giangiacomo Mariotti, linux-kernel

On Wed, Jul 14, 2010 at 03:49:05PM -0400, Christoph Hellwig wrote:
> Below I have a table comparing raw blockdevices, xfs, btrfs, ext4 and
> ext3.  For ext3 we also compare the default, unsafe barrier=0 version
> and the barrier=1 version you should use if you actually care about
> your data.
> 
> The comparison is a simple untar of a Linux 2.6.34 tarball, including a
> sync after it.  We run this with ext3 in the guest, either using the
> default barrier=0, or for the later tests also using barrier=1.  It
> is done on an OCZ Vertex SSD, which gets reformatted and fully TRIMmed
> before each test.
> 
> As you can see you generally do want to use cache=none and every
> filesystem is about the same speed for that - except that on XFS you
> also really need preallocation.  What's interesting is how bad btrfs
> is for the default compared to the others, and that for many filesystems
> things actually get minimally faster when enabling barriers in the
> guest.

Christoph,

Thanks so much for running these benchmarks.  It's been on my todo
list ever since the original complaint came across on the linux-ext4
list, but I just haven't had time to do the investigation.  I wonder
exactly what qemu is doing that impacts btrfs so particularly
badly.  I assume that using the qcow2 format with cache=writethrough,
it's doing lots of effectively file appends which require allocation
(or conversion of uninitialized preallocated blocks to initialized
blocks in the fs metadata) with lots of fsync()'s afterwards.

But when I've benchmarked fs_mark writing 10k files, each followed
by an fsync, I didn't see results for btrfs that were way out
of line compared to xfs, ext3, ext4, et al.  So merely doing a block
allocation and a small write, followed by an fsync, was something that
all file systems did fairly well at.  So there must be something
interesting/pathological about what qemu is doing with
cache=writethrough.  It might be interesting to understand what is
going on there, either to fix qemu/kvm, or so file systems know that
there's a particular workload that requires some special attention...
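
For reference, the fs_mark run described above might look roughly like
this (the flags are an approximation, check fs_mark's usage):

  # 10,000 files of 4KB each under /mnt/test, fsync'ing as it goes
  fs_mark -d /mnt/test -n 10000 -s 4096 -S 1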

  	     	      	   	    	 - Ted

P.S.  I assume since you listed "sparse" that you were using a raw
disk image and not a qcow2 block device image?



* Re: BTRFS: Unbelievably slow with kvm/qemu
  2010-07-17 10:28   ` Ted Ts'o
@ 2010-07-18  7:15     ` Christoph Hellwig
  0 siblings, 0 replies; 45+ messages in thread
From: Christoph Hellwig @ 2010-07-18  7:15 UTC (permalink / raw)
  To: Ted Ts'o, Christoph Hellwig, Giangiacomo Mariotti, linux-kernel

On Sat, Jul 17, 2010 at 06:28:06AM -0400, Ted Ts'o wrote:
> Thanks so much for running these benchmarks.  It's been on my todo
> list ever since the original complaint came across on the linux-ext4
> list, but I just haven't had time to do the investigation.  I wonder
> exactly what qemu is doing that impacts btrfs so particularly
> badly.  I assume that using the qcow2 format with cache=writethrough,
> it's doing lots of effectively file appends which require allocation
> (or conversion of uninitialized preallocated blocks to initialized
> blocks in the fs metadata) with lots of fsync()'s afterwards.

This is using raw images.  So what we're doing there is hole filling.
No explicit fsyncs are done for cache=writethrough.  cache=writethrough
translates to using O_DSYNC, which makes every write synchronous, which
these days translates to an implicit ->fsync call on every write.
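
The effect is easy to see from userspace with dd; a rough sketch, with
made-up path and sizes:

    # O_DSYNC-style: every 64k write is synchronous
    dd if=/dev/zero of=/mnt/test/foo bs=64k count=1000 oflag=dsync
    # buffered writes with a single flush at the end, for comparison
    dd if=/dev/zero of=/mnt/test/foo bs=64k count=1000 conv=fsync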

> P.S.  I assume since you listed "sparse" that you were using a raw
> disk and not a qcow2 block device image?

All of these are using raw images.  "sparse" means just doing a truncate
to the image size; "preallocated" means using fallocate to pre-allocate
the space.
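
Spelled out, the setup looks roughly like this - a sketch with made-up
paths and sizes, not the exact test harness:

    # sparse: just extend the file to the image size
    truncate -s 10G /images/disk-sparse.img
    # preallocated: fallocate the full size up front
    fallocate -l 10G /images/disk-prealloc.img
    # boot the guest against the raw image, here with cache=none
    qemu-system-x86_64 -m 1024 -drive \
        file=/images/disk-prealloc.img,format=raw,if=virtio,cache=none
    # guest-side workload: untar a kernel tree, then sync
    time (tar xf linux-2.6.34.tar; sync)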


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: BTRFS: Unbelievably slow with kvm/qemu
  2010-09-02 16:36           ` K. Richard Pixley
@ 2010-09-02 16:49             ` K. Richard Pixley
  2010-09-02 16:49             ` K. Richard Pixley
  1 sibling, 0 replies; 45+ messages in thread
From: K. Richard Pixley @ 2010-09-02 16:49 UTC (permalink / raw)
  To: Ted Ts'o, Mike Fedyk, Josef Bacik, Tomasz Chmielewski,
	linux-kernel, linux-btrfs

  On 9/2/10 09:36 , K. Richard Pixley wrote:
>  On 9/1/10 17:18 , Ted Ts'o wrote:
>> On Tue, Aug 31, 2010 at 02:58:44PM -0700, K. Richard Pixley wrote:
>>>   On 20100831 14:46, Mike Fedyk wrote:
>>>> There is little reason not to use duplicate metadata.  Only small
>>>> files (less than 2kb) get stored in the tree, so there should be no
>>>> worries about images being duplicated without data duplication set at
>>>> mkfs time.
>>> My benchmarks show that for my kinds of data, btrfs is somewhat
>>> slower than ext4, (which is slightly slower than ext3 which is
>>> somewhat slower than ext2), when using the defaults, (i.e., duplicate
>>> metadata).
>>>
>>> It's a hair faster than ext2, (the fastest of the ext family), when
>>> using singleton metadata.  And ext2 isn't even crash resistant while
>>> btrfs has snapshots.
>> I'm really, really curious.  Can you describe your data and your
>> workload in detail?  You mentioned "continuous builders"; is this some
>> kind of tinderbox setup?
> I'm not familiar with tinderbox.  Continuous builders tend to be a lot 
> like shell scripts - it's usually easier to write a new one than to 
> even bother to read someone else's.  :).
>
> Basically, it's an automated system that started out life as a shell 
> script loop around a build a few years ago.  The current rendition 
> includes a number of extra features.  The basic idea here is to expose 
> top-of-tree build errors as fast as possible which means that these 
> machines can take some build shortcuts that would not be appropriate 
> for official builds intended as release candidates.  We have a 
> different set of builders which build release candidates.
>
> When it starts, it removes as many snapshots as it needs to in order 
> to make space for another build.  Initially it creates a snapshot from 
> /home, checks out source, and does a full build of top of tree.  Then 
> it starts over.  If it has a build and is not top of tree, it creates 
> a snapshot from the last successful build, updates, and does an 
> incremental build.  When it reaches top of tree, it starts taking 
> requests.
>
> We're using openembedded so the build is largely based on components 
> with a global "BOM", (bill of materials), acting as a code based 
> database of which versions of which components are in use for which 
> images.  This acts as a funneling point.  Requests are a specification 
> of a list of components to change, (different versions, etc).  A 
> snapshot is taken from the last successful build, the BOM is changed 
> locally and built incrementally.  If everything builds alright, then 
> the new BOM may be committed and/or the resulting binary packages may 
> be published for QA consumption.  But even in the case of failure, 
> this snapshot is terminal and never marked as "successful" so never 
> reused.
>
> The system acts both as a continuous builder to check top of tree as 
> well as an automated method for serializing changes, (which stands in 
> for real, human integration).
>
> We currently have about 20 of these servers, ranging from 2 - 24 
> cores, 4 - 24G memory, etc.  A single device build takes about 22G so 
> a 24G machine can do an entire build in memory.  The different 
> machines run similar builds against different branches or against 
> different targets and the staggering tends to create a lower response 
> time in the case of top-of-tree build errors that affect all devices, 
> (the most common type of error).  And most of the servers are cast 
> offs, older servers that would be discarded otherwise.  Server speed 
> tends to be an issue primarily for the full builds.  Once the full 
> build has been created, the incrementals tend to be limited to single 
> threading as the build spends most of its time doing dependency 
> rechecking.
>
> The snapshot-based approach is recent, as is our btrfs usage, (which 
> is currently problematic: polluted file systems, kernel crashes, 
> etc).  Previously I was using rsync to backup a copy of a full build 
> and rsync to replace it when a build failed.  The working directory 
> was the same working directory and I went to some pains to make it 
> reusable.  I've been looking for a snapshotting facility for a couple 
> of years now but only discovered btrfs recently.  (I tried lvm based 
> snapshots but they don't really have the characteristics that I want, 
> nor do nilfs2 snapshots.)
>
> Is that what you were looking for?
I should probably mention times and targets.

A typical 2-core, 4G developer workstation can build our entire system 
for 1 device in about 6 - 8hrs.  We typically build each device on a 
separate server and the highest end servers we're using today, (8 - 24 
core, 24G memory), can build a single device in a little under an hour.  
Those are full build times.  A complete cycle of an incremental based 
builder, (doing nothing but bookkeeping and checking dependencies), can 
take anywhere from about 2 - 4 minutes.  And a typical single component 
update usually takes 4 - 6 minutes.

From a developer's perspective, I'm churning out 8hr builds every 5 
minutes or so.  What snapshots provide primarily is the ability to 
discard a polluted/broken working directory while retaining the ability 
to reuse its immediate predecessor.  It's also true that snapshots 
leave old working directories lying around where they could be examined 
or debugged, but generally that facility is rarely used because it's too 
much trouble to provide developers access to those machines.

The targets here are OpenEmbedded-based embedded Linux systems.

--rich

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: BTRFS: Unbelievably slow with kvm/qemu
  2010-09-02  0:18         ` Ted Ts'o
  2010-09-02 16:36           ` K. Richard Pixley
@ 2010-09-02 16:36           ` K. Richard Pixley
  2010-09-02 16:49             ` K. Richard Pixley
  2010-09-02 16:49             ` K. Richard Pixley
  1 sibling, 2 replies; 45+ messages in thread
From: K. Richard Pixley @ 2010-09-02 16:36 UTC (permalink / raw)
  To: Ted Ts'o, Mike Fedyk, Josef Bacik, Tomasz Chmielewski,
	linux-kernel, linux-btrfs, hch, gg.mariotti, Justin P. Mattock,
	mjt

  On 9/1/10 17:18 , Ted Ts'o wrote:
> On Tue, Aug 31, 2010 at 02:58:44PM -0700, K. Richard Pixley wrote:
>>   On 20100831 14:46, Mike Fedyk wrote:
>>> There is little reason not to use duplicate metadata.  Only small
>>> files (less than 2kb) get stored in the tree, so there should be no
>>> worries about images being duplicated without data duplication set at
>>> mkfs time.
>> My benchmarks show that for my kinds of data, btrfs is somewhat
>> slower than ext4, (which is slightly slower than ext3 which is
>> somewhat slower than ext2), when using the defaults, (i.e., duplicate
>> metadata).
>>
>> It's a hair faster than ext2, (the fastest of the ext family), when
>> using singleton metadata.  And ext2 isn't even crash resistant while
>> btrfs has snapshots.
> I'm really, really curious.  Can you describe your data and your
> workload in detail?  You mentioned "continuous builders"; is this some
> kind of tinderbox setup?
I'm not familiar with tinderbox.  Continuous builders tend to be a lot 
like shell scripts - it's usually easier to write a new one than to even 
bother to read someone else's.  :).

Basically, it's an automated system that started out life as a shell 
script loop around a build a few years ago.  The current rendition 
includes a number of extra features.  The basic idea here is to expose 
top-of-tree build errors as fast as possible which means that these 
machines can take some build shortcuts that would not be appropriate for 
official builds intended as release candidates.  We have a different set 
of builders which build release candidates.

When it starts, it removes as many snapshots as it needs to in order to 
make space for another build.  Initially it creates a snapshot from 
/home, checks out source, and does a full build of top of tree.  Then it 
starts over.  If it has a build and is not top of tree, it creates a 
snapshot from the last successful build, updates, and does an 
incremental build.  When it reaches top of tree, it starts taking requests.
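
The snapshot juggling itself is just a couple of btrfs commands per
cycle; roughly the following, with made-up subvolume names:

    # reclaim space by dropping the oldest build snapshot
    btrfs subvolume delete /home/builds/build-1200
    # start the next incremental from the last successful build
    btrfs subvolume snapshot /home/builds/last-good /home/builds/build-1234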

We're using openembedded so the build is largely based on components 
with a global "BOM", (bill of materials), acting as a code based 
database of which versions of which components are in use for which 
images.  This acts as a funneling point.  Requests are a specification 
of a list of components to change, (different versions, etc).  A 
snapshot is taken from the last successful build, the BOM is changed 
locally and built incrementally.  If everything builds alright, then the 
new BOM may be committed and/or the resulting binary packages may be 
published for QA consumption.  But even in the case of failure, this 
snapshot is terminal and never marked as "successful" so never reused.

The system acts both as a continuous builder to check top of tree as 
well as an automated method for serializing changes, (which stands in 
for real, human integration).

We currently have about 20 of these servers, ranging from 2 - 24 cores, 
4 - 24G memory, etc.  A single device build takes about 22G so a 24G 
machine can do an entire build in memory.  The different machines run 
similar builds against different branches or against different targets 
and the staggering tends to create a lower response time in the case of 
top-of-tree build errors that affect all devices, (the most common type 
of error).  And most of the servers are cast offs, older servers that 
would be discarded otherwise.  Server speed tends to be an issue 
primarily for the full builds.  Once the full build has been created, 
the incrementals tend to be limited to single threading as the build 
spends most of its time doing dependency rechecking.

The snapshot-based approach is recent, as is our btrfs usage, (which is 
currently problematic: polluted file systems, kernel crashes, etc).  
Previously I was using rsync to backup a copy of a full build and rsync 
to replace it when a build failed.  The working directory was the same 
working directory and I went to some pains to make it reusable.  I've 
been looking for a snapshotting facility for a couple of years now but 
only discovered btrfs recently.  (I tried lvm based snapshots but they 
don't really have the characteristics that I want, nor do nilfs2 snapshots.)

Is that what you were looking for?

--rich

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: BTRFS: Unbelievably slow with kvm/qemu
       [not found]       ` <4C7D7B14.9020008@noir.com>
@ 2010-09-02  0:18         ` Ted Ts'o
  2010-09-02 16:36           ` K. Richard Pixley
  2010-09-02 16:36           ` K. Richard Pixley
  0 siblings, 2 replies; 45+ messages in thread
From: Ted Ts'o @ 2010-09-02  0:18 UTC (permalink / raw)
  To: K. Richard Pixley
  Cc: Mike Fedyk, Josef Bacik, Tomasz Chmielewski, linux-kernel,
	linux-btrfs, hch, gg.mariotti, Justin P. Mattock, mjt

On Tue, Aug 31, 2010 at 02:58:44PM -0700, K. Richard Pixley wrote:
>  On 20100831 14:46, Mike Fedyk wrote:
> >There is little reason not to use duplicate metadata.  Only small
> >files (less than 2kb) get stored in the tree, so there should be no
> >worries about images being duplicated without data duplication set at
> >mkfs time.
> My benchmarks show that for my kinds of data, btrfs is somewhat
> slower than ext4, (which is slightly slower than ext3 which is
> somewhat slower than ext2), when using the defaults, (i.e., duplicate
> metadata).
> 
> It's a hair faster than ext2, (the fastest of the ext family), when
> using singleton metadata.  And ext2 isn't even crash resistant while
> btrfs has snapshots.

I'm really, really curious.  Can you describe your data and your
workload in detail?  You mentioned "continuous builders"; is this some
kind of tinderbox setup?

						- Ted

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: BTRFS: Unbelievably slow with kvm/qemu
  2010-08-31 21:46       ` Mike Fedyk
  (?)
@ 2010-08-31 22:01       ` K. Richard Pixley
  -1 siblings, 0 replies; 45+ messages in thread
From: K. Richard Pixley @ 2010-08-31 22:01 UTC (permalink / raw)
  To: Mike Fedyk
  Cc: Josef Bacik, Tomasz Chmielewski, linux-kernel, linux-btrfs, hch,
	gg.mariotti, Justin P. Mattock, mjt, tytso

On 20100831 14:46, Mike Fedyk wrote:
 > There is little reason not to use duplicate metadata.  Only small
 > files (less than 2kb) get stored in the tree, so there should be no
 > worries about images being duplicated without data duplication set at
 > mkfs time.

My benchmarks show that for my kinds of data, btrfs is somewhat slower 
than ext4, (which is slightly slower than ext3 which is somewhat slower 
than ext2), when using the defaults, (i.e., duplicate metadata).

It's a hair faster than ext2, (the fastest of the ext family), when 
using singleton metadata.  And ext2 isn't even crash resistant while 
btrfs has snapshots.

I'm using hardware raid for striping speed.  (Tried btrfs striping; it 
was close, but not as fast on my hardware.)  I want speed, speed, speed. 
My data is only vaguely important, (continuous builders), but speed is 
everything.
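
For the record, the btrfs-native striping I tried was roughly this,
with made-up device names:

    # stripe data across two devices, singleton metadata
    mkfs.btrfs -m single -d raid0 /dev/sdb /dev/sdc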

While the reason to use singleton metadata may be "little", it dominates 
my application.  If I were forced to use duplicate metadata then I'd 
still be arguing with my coworkers about whether the speed costs were 
worth it to buy snapshot functionality.  But the fact that btrfs is 
faster AND provides snapshots, (and less metadata overhead and bigger 
file systems, etc.), makes for an easy sale.

Note that nilfs2 has similar performance, but somewhat different 
snapshot characteristics that aren't as useful in my current application.

--rich

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: BTRFS: Unbelievably slow with kvm/qemu
@ 2010-08-31 21:46       ` Mike Fedyk
  0 siblings, 0 replies; 45+ messages in thread
From: Mike Fedyk @ 2010-08-31 21:46 UTC (permalink / raw)
  To: K. Richard Pixley
  Cc: Josef Bacik, Tomasz Chmielewski, linux-kernel, linux-btrfs, hch,
	gg.mariotti, Justin P. Mattock, mjt, tytso

On Mon, Aug 30, 2010 at 8:59 AM, K. Richard Pixley <rich@noir.com> wrote:
>  On 8/29/10 17:14 , Josef Bacik wrote:
>>
>> On Sun, Aug 29, 2010 at 09:34:29PM +0200, Tomasz Chmielewski wrote:
>>>
>>> Christoph Hellwig wrote:
>>>>
>>>> There are a lot of variables when using qemu.
>>>>
>>>> The most important ones are:
>>>>
>>>>  - the cache mode on the device.  The default is cache=writethrough,
>>>>    which is not quite optimal.  You generally do want to use cache=none
>>>>    which uses O_DIRECT in qemu.
>>>>  - if the backing image is sparse or not.
>>>>  - if you use barrier - both in the host and the guest.
>>>
>>> I noticed that when btrfs is mounted with default options, when writing
>>> e.g. 10 GB on the KVM guest using a qcow2 image, 20 GB are written on the
>>> host (as measured with "iostat -m -p").
>>>
>>> With ext4 (or btrfs mounted with nodatacow), a 10 GB write on a guest
>>> produces a 10 GB write on the host
>>
>> Whoa 20gb?  That doesn't sound right, COW should just mean we get quite a
>> bit of
>> fragmentation, not write everything twice.  What exactly is qemu doing?
>>  Thanks,
>
> Make sure you build your file system with "mkfs.btrfs -m single -d single
> /dev/whatever".  You may well be writing duplicate copies of everything.
>
There is little reason not to use duplicate metadata.  Only small
files (less than 2kb) get stored in the tree, so there should be no
worries about images being duplicated without data duplication set at
mkfs time.
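
If in doubt, you can check which profiles a filesystem is actually
using; mount point assumed:

    # reports the block group profiles, e.g. "Metadata, DUP"
    btrfs filesystem df /mnt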

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: BTRFS: Unbelievably slow with kvm/qemu
  2010-08-30  0:14 ` Josef Bacik
@ 2010-08-30 15:59   ` K. Richard Pixley
  2010-08-31 21:46       ` Mike Fedyk
  0 siblings, 1 reply; 45+ messages in thread
From: K. Richard Pixley @ 2010-08-30 15:59 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Tomasz Chmielewski, linux-kernel, linux-btrfs, hch, gg.mariotti,
	Justin P. Mattock, mjt, tytso

  On 8/29/10 17:14 , Josef Bacik wrote:
> On Sun, Aug 29, 2010 at 09:34:29PM +0200, Tomasz Chmielewski wrote:
>> Christoph Hellwig wrote:
>>> There are a lot of variables when using qemu.
>>>
>>> The most important ones are:
>>>
>>>   - the cache mode on the device.  The default is cache=writethrough,
>>>     which is not quite optimal.  You generally do want to use cache=none
>>>     which uses O_DIRECT in qemu.
>>>   - if the backing image is sparse or not.
>>>   - if you use barrier - both in the host and the guest.
>> I noticed that when btrfs is mounted with default options, when writing
>> e.g. 10 GB on the KVM guest using a qcow2 image, 20 GB are written on the
>> host (as measured with "iostat -m -p").
>>
>> With ext4 (or btrfs mounted with nodatacow), a 10 GB write on a guest
>> produces a 10 GB write on the host
> Whoa 20gb?  That doesn't sound right, COW should just mean we get quite a bit of
> fragmentation, not write everything twice.  What exactly is qemu doing?  Thanks,
Make sure you build your file system with "mkfs.btrfs -m single -d 
single /dev/whatever".  You may well be writing duplicate copies of 
everything.

--rich

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: BTRFS: Unbelievably slow with kvm/qemu
  2010-08-29 19:34 Tomasz Chmielewski
@ 2010-08-30  0:14 ` Josef Bacik
  2010-08-30 15:59   ` K. Richard Pixley
  0 siblings, 1 reply; 45+ messages in thread
From: Josef Bacik @ 2010-08-30  0:14 UTC (permalink / raw)
  To: Tomasz Chmielewski
  Cc: linux-kernel, linux-btrfs, hch, gg.mariotti, Justin P. Mattock,
	mjt, josef, tytso

On Sun, Aug 29, 2010 at 09:34:29PM +0200, Tomasz Chmielewski wrote:
> Christoph Hellwig wrote:
>
>> There are a lot of variables when using qemu.
>>
>> The most important ones are:
>>
>>  - the cache mode on the device.  The default is cache=writethrough,
>>    which is not quite optimal.  You generally do want to use cache=none
>>    which uses O_DIRECT in qemu.
>>  - if the backing image is sparse or not.
>>  - if you use barrier - both in the host and the guest.
>
> I noticed that when btrfs is mounted with default options, when writing  
> e.g. 10 GB on the KVM guest using a qcow2 image, 20 GB are written on the  
> host (as measured with "iostat -m -p").
>
>
> With ext4 (or btrfs mounted with nodatacow), a 10 GB write on a guest  
> produces a 10 GB write on the host.
>

Whoa 20gb?  That doesn't sound right, COW should just mean we get quite a bit of
fragmentation, not write everything twice.  What exactly is qemu doing?  Thanks,
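
One way to find out would be to watch qemu's write/flush pattern
directly; a rough sketch - the binary name and the syscall list are
guesses at the interesting ones:

    # log writes and flushes from the running guest
    strace -f -T -e trace=write,pwrite64,fsync,fdatasync \
        -p $(pidof qemu-kvm)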

Josef

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: BTRFS: Unbelievably slow with kvm/qemu
@ 2010-08-29 19:34 Tomasz Chmielewski
  2010-08-30  0:14 ` Josef Bacik
  0 siblings, 1 reply; 45+ messages in thread
From: Tomasz Chmielewski @ 2010-08-29 19:34 UTC (permalink / raw)
  To: linux-kernel, linux-btrfs
  Cc: hch, gg.mariotti, Justin P. Mattock, mjt, josef, tytso

Christoph Hellwig wrote:

> There are a lot of variables when using qemu.
>
> The most important ones are:
>
>  - the cache mode on the device.  The default is cache=writethrough,
>    which is not quite optimal.  You generally do want to use cache=none
>    which uses O_DIRECT in qemu.
>  - if the backing image is sparse or not.
>  - if you use barrier - both in the host and the guest.

I noticed that when btrfs is mounted with default options, when writing 
e.g. 10 GB on the KVM guest using a qcow2 image, 20 GB are written on the 
host (as measured with "iostat -m -p").


With ext4 (or btrfs mounted with nodatacow), a 10 GB write on a guest 
produces a 10 GB write on the host.
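
For completeness, the nodatacow case was simply the following; device
and mount point are made up:

    # mount the image directory with data CoW disabled
    mount -o nodatacow /dev/sdb1 /var/lib/kvm
    # per-partition write totals in MB while the guest writes
    iostat -m -p sdb 5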


-- 
Tomasz Chmielewski
http://wpkg.org


^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2010-09-02 16:49 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-12  5:24 BTRFS: Unbelievably slow with kvm/qemu Giangiacomo Mariotti
2010-07-12  5:54 ` Justin P. Mattock
2010-07-12  7:09 ` Michael Tokarev
2010-07-12  7:09   ` [Qemu-devel] " Michael Tokarev
2010-07-12  7:17   ` Justin P. Mattock
2010-07-12  7:17     ` [Qemu-devel] " Justin P. Mattock
2010-07-12 13:15     ` Giangiacomo Mariotti
2010-07-12 13:15       ` [Qemu-devel] " Giangiacomo Mariotti
2010-07-12 13:15       ` Giangiacomo Mariotti
2010-07-12 13:34   ` Giangiacomo Mariotti
2010-07-12 13:34     ` [Qemu-devel] " Giangiacomo Mariotti
2010-07-12 13:34     ` Giangiacomo Mariotti
2010-07-12 13:40     ` Michael Tokarev
2010-07-12 13:40       ` [Qemu-devel] " Michael Tokarev
2010-07-12 13:43     ` Josef Bacik
2010-07-12 13:43       ` [Qemu-devel] " Josef Bacik
2010-07-12 13:43       ` Josef Bacik
2010-07-12 13:42       ` Michael Tokarev
2010-07-12 13:42         ` [Qemu-devel] " Michael Tokarev
2010-07-12 13:49         ` Josef Bacik
2010-07-12 13:49           ` [Qemu-devel] " Josef Bacik
2010-07-12 20:23       ` Giangiacomo Mariotti
2010-07-12 20:23         ` [Qemu-devel] " Giangiacomo Mariotti
2010-07-12 20:23         ` Giangiacomo Mariotti
2010-07-12 20:24         ` Josef Bacik
2010-07-12 20:24           ` [Qemu-devel] " Josef Bacik
2010-07-13  8:53       ` Kevin Wolf
2010-07-13  8:53         ` Kevin Wolf
2010-07-13  4:29 ` Avi Kivity
2010-07-14  2:39   ` Giangiacomo Mariotti
2010-07-14 19:49 ` Christoph Hellwig
2010-07-17  5:29   ` Giangiacomo Mariotti
2010-07-17 10:28   ` Ted Ts'o
2010-07-18  7:15     ` Christoph Hellwig
2010-08-29 19:34 Tomasz Chmielewski
2010-08-30  0:14 ` Josef Bacik
2010-08-30 15:59   ` K. Richard Pixley
2010-08-31 21:46     ` Mike Fedyk
2010-08-31 21:46       ` Mike Fedyk
2010-08-31 22:01       ` K. Richard Pixley
     [not found]       ` <4C7D7B14.9020008@noir.com>
2010-09-02  0:18         ` Ted Ts'o
2010-09-02 16:36           ` K. Richard Pixley
2010-09-02 16:36           ` K. Richard Pixley
2010-09-02 16:49             ` K. Richard Pixley
2010-09-02 16:49             ` K. Richard Pixley
