linux-bcache.vger.kernel.org archive mirror
* Low hit ratio and cache usage
@ 2021-06-04 11:07 Santiago Castillo Oli
  2021-06-04 12:05 ` Coly Li
       [not found] ` <f25c7f91-433e-d699-c1f6-7e828023167f@orange.fr>
  0 siblings, 2 replies; 6+ messages in thread
From: Santiago Castillo Oli @ 2021-06-04 11:07 UTC
  To: linux-bcache

Hi all!


I'm using bcache and I think I have a rather low hit ratio and cache 
occupation.


My setup is:

- Cache device: 82 GiB partition on an SSD drive. Bucket size=4M. The 
partition is aligned on a gigabyte boundary.

- Backing device: 3.6 TiB partition on an HDD drive. The partition holds 
732 GiB of data: 9 qcow2 files assigned to 3 VMs running on the host.

- Neither the SSD nor the HDD has any other partitions in use.

- After 24 hours of use, according to priority_stats, the cache is 75% 
Unused (63 GiB unused, 19 GiB used), but...

- ... according to "smartctl -a", in those 24 hours "Writes to Flash" has 
increased by 160 GiB and "GB written from host" has increased by 90 GiB.

- cache_hit_ratio is 10 %
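
A minimal sketch of turning priority_stats into an occupancy figure, assuming the file's usual layout (one "Name:<tabs>NN%" line each for Unused, Clean, Dirty and Metadata, followed by non-percentage lines). The SAMPLE text is invented to roughly match the 63 GiB unused / 19 GiB used figures above; on a real host the text would come from the cache device's priority_stats file (e.g. /sys/block/<cdev>/bcache/priority_stats). As I understand it the percentages are bucket counts and are rounded, so the GiB value is only approximate.

```python
import re

# Invented sample in the layout of bcache's priority_stats file; the
# numbers are chosen to roughly match "63 GiB unused of 82 GiB".
SAMPLE = """Unused:\t\t77%
Clean:\t\t22%
Dirty:\t\t0%
Metadata:\t1%
Average:\t1024
Sectors per Q:\t40000
Quantiles:\t[1 2 3]
"""


def parse_priority_stats(text):
    """Collect the 'Name: NN%' lines as {name: percent}; skip all others."""
    stats = {}
    for line in text.splitlines():
        m = re.match(r"(\w+):\s+(\d+)%$", line.strip())
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats


def used_gib(stats, cache_gib):
    """Approximate occupied space: everything that is not 'Unused'."""
    return cache_gib * (100 - stats["Unused"]) / 100


if __name__ == "__main__":
    stats = parse_priority_stats(SAMPLE)
    print(stats)
    print(f"~{used_gib(stats, 82):.1f} GiB of 82 GiB in use")
```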



- I'm using the maximum bucket size (4M) to try to minimize write 
amplification. With this bucket size, the ratio of "Writes to Flash" (160) 
to "GB written from host" (90) is 1.78. A few days ago, with the default 
bucket size, the write amplification ratio was 2.01.

- Isn't the cache_hit_ratio (10%) a bit low?

- Is it normal that, after 24 hours of running, the cache occupancy is 
that low (82 - 63 = 19 GiB, 25%) when the host has written 90 GiB to the 
cache device in the same period? I don't understand why 90 GiB of data 
had to be written to fill only 19 GiB of cache.
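
The write amplification ratio above is just the "Writes to Flash" delta divided by the "GB written from host" delta over the same interval. A minimal sketch, assuming both smartctl counters have already been converted to GiB (attribute names and units are vendor-specific, so that conversion is left out):

```python
def write_amplification(flash_writes_gib, host_writes_gib):
    """Ratio of NAND ('Writes to Flash') to host writes over one interval."""
    if host_writes_gib <= 0:
        raise ValueError("host writes must be positive")
    return flash_writes_gib / host_writes_gib


def counter_delta(before, after):
    """Difference between two snapshots of a monotonically growing counter."""
    if after < before:
        raise ValueError("counter went backwards")
    return after - before


if __name__ == "__main__":
    # 24-hour deltas quoted above: +160 GiB to flash, +90 GiB from host.
    print(f"{write_amplification(160, 90):.2f}")
```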


Any ideas?


Thank you and regards.


-- 
___________________________________________________________



* Re: Low hit ratio and cache usage
  2021-06-04 11:07 Low hit ratio and cache usage Santiago Castillo Oli
@ 2021-06-04 12:05 ` Coly Li
  2021-06-04 12:35   ` Santiago Castillo Oli
       [not found] ` <f25c7f91-433e-d699-c1f6-7e828023167f@orange.fr>
  1 sibling, 1 reply; 6+ messages in thread
From: Coly Li @ 2021-06-04 12:05 UTC
  To: Santiago Castillo Oli; +Cc: linux-bcache

What is the kernel version, and where did you get the kernel? And what
is the workload on your machine?

Most of the read requests miss, so the data is read from the backing
device and refilled into the cache device as used-and-clean data. Once there
is not enough space to hold more read-cached data, garbage collection may
retire the used-and-clean data very quickly to make room for newly
refilled read data. The 19 GB of data might be what survived the last GC.
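
A 10% hit ratio means roughly nine of every ten counted requests miss. A minimal sketch of recomputing it from the raw counters, assuming a stats directory of a bcache device (e.g. /sys/block/bcache0/bcache/stats_total) containing cache_hits and cache_misses files. As far as I know, the kernel computes cache_hit_ratio as hits * 100 / (hits + misses) in integer arithmetic, and I/O that bypassed the cache is counted separately (cache_bypass_hits / cache_bypass_misses), so it does not appear here.

```python
import sys
from pathlib import Path


def hit_ratio(hits, misses):
    """Integer percentage hits * 100 / (hits + misses); 0 when idle."""
    total = hits + misses
    return hits * 100 // total if total else 0


def read_counters(stats_dir):
    """Read cache_hits and cache_misses from a stats_* directory."""
    d = Path(stats_dir)
    hits = int((d / "cache_hits").read_text().strip())
    misses = int((d / "cache_misses").read_text().strip())
    return hits, misses


if __name__ == "__main__":
    if len(sys.argv) > 1:  # e.g. /sys/block/bcache0/bcache/stats_total
        print(hit_ratio(*read_counters(sys.argv[1])))
    else:
        print(hit_ratio(1000, 9000))  # made-up counters
```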

Coly Li


* Re: Low hit ratio and cache usage
       [not found] ` <f25c7f91-433e-d699-c1f6-7e828023167f@orange.fr>
@ 2021-06-04 12:12   ` Santiago Castillo Oli
  0 siblings, 0 replies; 6+ messages in thread
From: Santiago Castillo Oli @ 2021-06-04 12:12 UTC
  To: Pierre Juhen, linux-bcache

Hi Pierre!


I see your point about copy-on-write, but in this scenario most of the 
data is written only once and read many times. I had hoped bcache would 
perform better as a read cache. I'm afraid that bcache is only caching 
written (new and modified) blocks, not blocks that are already on the 
backing device but not yet in the cache device. The cache device was 
attached when most of the data was already sitting on the backing device.


What other setup would you suggest as an optimal configuration to speed 
up VM I/O on qcow2 files?


Thank you and regards


On 04/06/2021 at 14:00, Pierre Juhen wrote:
>
> Hi !
>
> COW from qcow2 means Copy On Write.
>
> It means that a new block is written for each modification on an 
> existing block.
>
> Therefore, a "living" block is read only once, and the statistics are 
> not favorable to keeping the blocks in the cache.
>
> Only the "static" files (OS and frequently used programs) benefit from 
> the cache.
>
> So I think that qcow2 and bcache might not be an optimal configuration.
>
> Anything to add to this quick analysis?
>
> Regards,
>
>

* Re: Low hit ratio and cache usage
  2021-06-04 12:05 ` Coly Li
@ 2021-06-04 12:35   ` Santiago Castillo Oli
  2021-06-04 12:59     ` Coly Li
  2021-06-04 15:56     ` Kai Krakow
  0 siblings, 2 replies; 6+ messages in thread
From: Santiago Castillo Oli @ 2021-06-04 12:35 UTC
  To: Coly Li; +Cc: linux-bcache

Hi Coly!


On 04/06/2021 at 14:05, Coly Li wrote:
> What is the kernel version and where do you have the kernel ?  And what
> is the workload on your machine ?

I'm using Debian 10 with the default Debian kernel (4.19.0-16-amd64) on the 
host and in the guests.

For virtualization I'm using KVM.


bcache runs on the host. The filesystem on top of the bcache device is 
ext4, and it holds only the 9 qcow2 files used by the three VM guests. 
Two VMs run small Nextcloud instances; the third runs Transmission 
(BitTorrent) to seed Debian and other distro ISO files (30 files, 
approx. 60 GiB).


> Most of the read requests are missing, so they will read from backing
> device and refilled into cache device as used-and-clean data. Once there
> is no enough space to hold more read-cached data, garbage colleague may
> retire the used-and-clean data very fast and make available room for new
> refilling read data. The 19GB data might be existing data from last time gc.

Is it possible to know GC last execution time?


Regards and thank you.



* Re: Low hit ratio and cache usage
  2021-06-04 12:35   ` Santiago Castillo Oli
@ 2021-06-04 12:59     ` Coly Li
  2021-06-04 15:56     ` Kai Krakow
  1 sibling, 0 replies; 6+ messages in thread
From: Coly Li @ 2021-06-04 12:59 UTC
  To: Santiago Castillo Oli; +Cc: linux-bcache

On 6/4/21 8:35 PM, Santiago Castillo Oli wrote:
> Hi Coli!
>
>
> On 04/06/2021 at 14:05, Coly Li wrote:
>> What is the kernel version and where do you have the kernel ?  And what
>> is the workload on your machine ?
>
> I'm using debian 10 with default debian kernel (4.19.0-16-amd64) in
> host and guests.
>
> For virtualization I'm using KVM.


The kernel version is too old. I strongly suggest using a 5.3+ kernel,
in which most of the obvious bugs were fixed.
Then let's see what happens.
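
A quick way to check a host against that 5.3 threshold. This sketch only compares the major.minor prefix of the release string (as printed by uname -r or returned by Python's platform.release()); distribution kernels may backport fixes, so treat it as a rough first check rather than a guarantee.

```python
import platform
import re


def kernel_at_least(release, minimum=(5, 3)):
    """True if the 'major.minor' prefix of release is >= minimum."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(m.group(1)), int(m.group(2))) >= tuple(minimum)


if __name__ == "__main__":
    print(kernel_at_least("4.19.0-16-amd64"))  # the Debian 10 kernel above
    if platform.system() == "Linux":
        print(kernel_at_least(platform.release()))
```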

> Is it possible to know GC last execution time?
>

See /sys/fs/bcache/<UUID>/internal/btree_gc_last_sec
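
A small sketch for turning that file into a wall-clock time. As I read the bcache sysfs code, btree_gc_last_sec reports how many seconds ago GC last ran (and -1 if it has not run yet), not a timestamp; treat that interpretation as an assumption and check it against the file on your own kernel. The helper names below are illustrative only.

```python
import time
from pathlib import Path
from typing import Optional


def parse_last_sec(raw: str) -> Optional[int]:
    """Parse btree_gc_last_sec; a negative value (not run yet) becomes None."""
    value = int(raw.strip())
    return None if value < 0 else value


def read_seconds_since_gc(cset_dir: str) -> Optional[int]:
    """Read <cset_dir>/internal/btree_gc_last_sec, e.g. /sys/fs/bcache/<UUID>."""
    path = Path(cset_dir) / "internal" / "btree_gc_last_sec"
    return parse_last_sec(path.read_text())


def last_gc_time(seconds_ago: Optional[int], now: Optional[float] = None) -> Optional[float]:
    """Turn 'seconds ago' into an absolute Unix timestamp."""
    if seconds_ago is None:
        return None
    return (time.time() if now is None else now) - seconds_ago


if __name__ == "__main__":
    ago = parse_last_sec("120\n")  # stand-in for read_seconds_since_gc(...)
    ts = last_gc_time(ago)
    print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(ts)))
```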



Coly Li


* Re: Low hit ratio and cache usage
  2021-06-04 12:35   ` Santiago Castillo Oli
  2021-06-04 12:59     ` Coly Li
@ 2021-06-04 15:56     ` Kai Krakow
  1 sibling, 0 replies; 6+ messages in thread
From: Kai Krakow @ 2021-06-04 15:56 UTC
  To: Santiago Castillo Oli; +Cc: Coly Li, linux-bcache

Hello!

On Fri, 4 June 2021 at 14:36, Santiago Castillo Oli
<scastillo@aragon.es> wrote:
> I'm using debian 10 with default debian kernel (4.19.0-16-amd64) in host
> and guests.
>
> For virtualization I'm using KVM.
>
>
> There is a host, where bcache is running. The filesystem over bcache
> device is ext4. In that filesystem there is only 9 qcow2 files user by
> three VM guests. Two VM are running small nextcloud instances, another
> one is running transmission (bittorrent) for feeding debian and other
> distro iso files (30 files - 60 GiB approx.)

Besides Coly's recommendation to use a newer kernel, I think there may be
some misunderstanding of how bcache works:

* bcache is mostly about reducing latency, so it skips sequential
access; you should measure block access latency instead of throughput
or fill rate

* thus, it probably fills your cache very slowly if much of your
access is sequential

* ext4 has a write journal which turns many random write patterns into
sequential write patterns; YMMV if you disable ordered data mode or
journalling

* qcow2 is copy-on-write: new blocks are appended, resulting in
possible write amplification in bcache; it also creates sequential
write patterns

* KVM/QEMU probably uses direct I/O (uncached I/O) by default, which may
bypass bcache or caching completely; you should try a different I/O
mode in KVM (e.g. cache=unsafe)
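
To illustrate the first two points: bcache tracks recent I/O per task and, once a run of contiguous requests grows past sequential_cutoff (tunable in /sys/block/bcache0/bcache/sequential_cutoff, 4 MB by default; 0 disables the check), further requests in that run bypass the cache. The sketch below is a deliberately simplified model of that rule, not the kernel's code: it looks at a single task, ignores the running average the kernel also keeps, and ignores congestion-based bypass.

```python
SECTOR_BYTES = 512
DEFAULT_CUTOFF = 4 * 1024 * 1024  # bytes


def classify(requests, cutoff_bytes=DEFAULT_CUTOFF):
    """Label each (start_sector, n_sectors) request 'cache' or 'bypass'.

    Simplified model: a request that begins exactly where the previous one
    ended extends the current sequential run; any other request starts a
    new run. Once a run reaches cutoff_bytes, its requests bypass. A cutoff
    of 0 disables the check, mirroring the sequential_cutoff knob.
    """
    decisions = []
    next_sector = None
    run_bytes = 0
    for start, n_sectors in requests:
        size = n_sectors * SECTOR_BYTES
        run_bytes = run_bytes + size if start == next_sector else size
        next_sector = start + n_sectors
        bypass = cutoff_bytes > 0 and run_bytes >= cutoff_bytes
        decisions.append("bypass" if bypass else "cache")
    return decisions


if __name__ == "__main__":
    one_mib = 2048  # sectors
    stream = [(i * one_mib, one_mib) for i in range(6)]  # 6 MiB sequential
    stream.append((10_000_000, 8))  # one small random 4 KiB request
    print(classify(stream))
```

In this model, long sequential runs stop populating the cache after the first few MiB, which is consistent with a cache that fills slowly under mostly sequential traffic.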


Regards,
Kai
