linux-bcache.vger.kernel.org archive mirror
* IO hang when cache do not have enough buckets on small SSD
@ 2021-05-17  3:54 Jim Guo
  2021-05-17 11:53 ` Coly Li
  0 siblings, 1 reply; 4+ messages in thread
From: Jim Guo @ 2021-05-17  3:54 UTC (permalink / raw)
  To: colyli, linux-bcache

Hello, Mr. Li.
Recently I have been experiencing frequent I/O hangs when testing with fio
using 4K random writes. Fio IOPS drops to 0 for about 20 seconds
every few minutes.
After some debugging, I discovered that incremental gc is what
causes this problem.
My cache disk is relatively small (375 GiB, with 4K block size and 512K
bucket size); the backing HDDs are 4 x 1 TiB. I cannot reproduce this in
another environment with a bigger cache disk.
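
For scale, the figures above can be turned into bucket counts. This is a back-of-the-envelope sketch, not output from bcache itself: the variable names are illustrative, and it assumes the whole device is divided into buckets, ignoring any space reserved for metadata.

```python
# Back-of-the-envelope cache geometry for the setup described above.
# Assumption: the entire 375 GiB device is split into 512 KiB buckets.
KIB = 1024
GIB = 1024 ** 3

cache_bytes = 375 * GIB
bucket_bytes = 512 * KIB
block_bytes = 4 * KIB
backing_bytes = 4 * 1024 * GIB  # 4 x 1 TiB backing HDDs

total_buckets = cache_bytes // bucket_bytes
blocks_per_bucket = bucket_bytes // block_bytes
cache_to_backing = cache_bytes / backing_bytes

print(total_buckets)      # 768000
print(blocks_per_bucket)  # 128
print(f"{cache_to_backing:.1%}")
```

Each bucket holds 128 4K blocks, so a sustained 4K random write load uses up free buckets quickly relative to the size of the backing devices.
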
When running the 4K random write fio benchmark, buckets are consumed very
quickly, and soon bcache has to invalidate some buckets (this happens quite
often). Since the cache disk is small, write I/O soon
reaches sectors_to_gc and triggers the gc thread. Write I/O also increases
search_inflight, which causes the gc thread to sleep for 100ms. This
makes the gc procedure run for a long time, and bucket invalidation
for the write I/O has to wait for the whole gc procedure to finish.
After removing the 100ms sleep from the incremental gc patch, the I/O
never hangs any more.
I think that for a small SSD, sleeping for 100ms is too long, or maybe
write I/O should not cause the gc thread to sleep for 100ms at all?
Thank you very much.
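
The stall described above can be illustrated with a toy model. This is a sketch, not the kernel code: it assumes gc walks btree nodes in fixed-size batches and sleeps once between batches whenever front-end I/O is in flight. The function name and every input value (node count, batch size, per-node cost) are hypothetical; only the 100ms sleep comes from the report.

```python
def gc_duration_ms(total_nodes, batch_nodes, per_node_us, sleep_ms, io_inflight):
    """Estimate the wall-clock time (ms) of one gc pass in a toy model.

    gc processes `batch_nodes` btree nodes, then sleeps `sleep_ms` if
    front-end I/O is in flight, and repeats until all nodes are visited.
    """
    batches = -(-total_nodes // batch_nodes)  # ceiling division
    work_ms = (total_nodes * per_node_us) / 1000
    # One sleep between consecutive batches, none after the last batch.
    sleeps = batches - 1 if io_inflight else 0
    return work_ms + sleeps * sleep_ms


# Hypothetical inputs: 20,000 nodes, 100 nodes per batch, 50 us per node.
busy = gc_duration_ms(20_000, 100, 50, 100, io_inflight=True)
idle = gc_duration_ms(20_000, 100, 50, 100, io_inflight=False)
print(busy, idle)  # 20900.0 1000.0
```

With these made-up inputs the pass takes about 20.9 s under load versus 1 s when idle, so the sleeps dominate the total. The inputs were chosen only for illustration and say nothing about the real system; the point is that any allocation waiting for gc to finish waits for the sleeps as well.
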

* Re: IO hang when cache do not have enough buckets on small SSD
  2021-05-17  3:54 IO hang when cache do not have enough buckets on small SSD Jim Guo
@ 2021-05-17 11:53 ` Coly Li
  2021-05-26  5:58   ` Jim Guo
  0 siblings, 1 reply; 4+ messages in thread
From: Coly Li @ 2021-05-17 11:53 UTC (permalink / raw)
  To: Jim Guo; +Cc: linux-bcache

On 5/17/21 11:54 AM, Jim Guo wrote:
> Hello, Mr. Li.
> Recently I have been experiencing frequent I/O hangs when testing with fio
> using 4K random writes. Fio IOPS drops to 0 for about 20 seconds
> every few minutes.
> After some debugging, I discovered that incremental gc is what
> causes this problem.
> My cache disk is relatively small (375 GiB, with 4K block size and 512K
> bucket size); the backing HDDs are 4 x 1 TiB. I cannot reproduce this in
> another environment with a bigger cache disk.
> When running the 4K random write fio benchmark, buckets are consumed very
> quickly, and soon bcache has to invalidate some buckets (this happens quite
> often). Since the cache disk is small, write I/O soon
> reaches sectors_to_gc and triggers the gc thread. Write I/O also increases
> search_inflight, which causes the gc thread to sleep for 100ms. This
> makes the gc procedure run for a long time, and bucket invalidation
> for the write I/O has to wait for the whole gc procedure to finish.
> After removing the 100ms sleep from the incremental gc patch, the I/O
> never hangs any more.

What is the kernel version on your system? And where is the kernel package
from?


> I think that for a small SSD, sleeping for 100ms is too long, or maybe
> write I/O should not cause the gc thread to sleep for 100ms at all?
> Thank you very much.
> 

Do you have any test results for this idea?


Thanks.

Coly Li

* Re: IO hang when cache do not have enough buckets on small SSD
  2021-05-17 11:53 ` Coly Li
@ 2021-05-26  5:58   ` Jim Guo
  2021-05-26  6:09     ` Coly Li
  0 siblings, 1 reply; 4+ messages in thread
From: Jim Guo @ 2021-05-26  5:58 UTC (permalink / raw)
  To: Coly Li; +Cc: linux-bcache

> What is the kernel version on your system? And where is the kernel package
> from?
I am using kernel version 4.19.142, which I compiled from source code
downloaded from kernel.org.

> Do you have any test results for this idea?
Sorry, I do not own the testing environment and did not keep
any test results. I will test this later in my own
testing environment.

* Re: IO hang when cache do not have enough buckets on small SSD
  2021-05-26  5:58   ` Jim Guo
@ 2021-05-26  6:09     ` Coly Li
  0 siblings, 0 replies; 4+ messages in thread
From: Coly Li @ 2021-05-26  6:09 UTC (permalink / raw)
  To: Jim Guo; +Cc: linux-bcache

On 5/26/21 1:58 PM, Jim Guo wrote:
>> What is the kernel version on your system? And where is the kernel package
>> from?
> I am using kernel version 4.19.142, which I compiled from source code
> downloaded from kernel.org.
> 
>> Do you have any test results for this idea?
> Sorry, I do not own the testing environment and did not keep
> any test results. I will test this later in my own
> testing environment.
> 

OK. I would suggest starting your work from the upstream bcache code.

Coly Li

