* Re: [PATCH 1/4] bcache: finish incremental GC
@ 2018-04-13  8:37 tang.junhui
  2018-04-13  8:48 ` Coly Li
From: tang.junhui @ 2018-04-13  8:37 UTC
  To: colyli; +Cc: kent.overstreet, mlyle, linux-bcache, linux-block, tang.junhui

Hi Coly,
>> 
>> Hello Coly,
>> 
>>> On 2018/4/12 2:38 PM, tang.junhui@zte.com.cn wrote:
>>>> From: Tang Junhui <tang.junhui@zte.com.cn>
>>>>
>>>> In the GC thread, we record the latest GC key in gc_done, which is
>>>> expected to be used for incremental GC, but the current code never
>>>> makes use of it. When GC runs, front-side I/O is blocked until GC
>>>> finishes, which can take a long time if there are many btree nodes.
>>>>
>>>> This patch implements incremental GC. The main idea is that, when
>>>> front-side I/Os are in flight, after GC has processed some nodes (100),
>>>> we stop GC, release the btree node lock, and let the front-side I/Os
>>>> proceed for a while (100 ms), then go back to GC again.
>>>>
>>>> With this patch, I/Os are no longer blocked for the whole duration of
>>>> GC, and there is no obvious I/O drop-to-zero problem any more.
>>>>
>>>> Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
>>>
>>> Hi Junhui,
>>>
>>> I reply with my comments inline,
>> Thanks for your comments.
>>>
>>>> ---
>>>>  drivers/md/bcache/bcache.h  |  5 +++++
>>>>  drivers/md/bcache/btree.c   | 15 ++++++++++++++-
>>>>  drivers/md/bcache/request.c |  3 +++
>>>>  3 files changed, 22 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
>>>> index 843877e..ab4c9ca 100644
>>>> --- a/drivers/md/bcache/bcache.h
>>>> +++ b/drivers/md/bcache/bcache.h
>>>> @@ -445,6 +445,7 @@ struct cache {
>>>>  
>>>>  struct gc_stat {
>>>>      size_t            nodes;
>>>> +    size_t            nodes_pre;
>>>>      size_t            key_bytes;
>>>>  
>>>>      size_t            nkeys;
>>>> @@ -568,6 +569,10 @@ struct cache_set {
>>>>       */
>>>>      atomic_t        rescale;
>>>>      /*
>>>> +     * used for GC, identify if any front side I/Os is inflight
>>>> +     */
>>>> +    atomic_t        io_inflight;
>>>
>>> Could you please rename the above variable to something like
>>> search_inflight? Just to be more explicit, considering we have many
>>> types of I/O requests.
>> OK, that looks better.
>>>
>>>> +    /*
>>>>       * When we invalidate buckets, we use both the priority and the amount
>>>>       * of good data to determine which buckets to reuse first - to weight
>>>>       * those together consistently we keep track of the smallest nonzero
>>>> diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
>>>> index 81e8dc3..b36d276 100644
>>>> --- a/drivers/md/bcache/btree.c
>>>> +++ b/drivers/md/bcache/btree.c
>>>> @@ -90,6 +90,9 @@
>>>>  
>>>>  #define MAX_NEED_GC        64
>>>>  #define MAX_SAVE_PRIO        72
>>>> +#define MIN_GC_NODES        100
>>>> +#define GC_SLEEP_TIME        100
>>>
>>> How about naming the above macro GC_SLEEP_MS? Then people can tell
>>> the sleep time is 100 ms without checking more context.
>> OK, that looks better.
>>>> +
>>>>  
>>>>  #define PTR_DIRTY_BIT        (((uint64_t) 1 << 36))
>>>>  
>>>> @@ -1581,6 +1584,13 @@ static int btree_gc_recurse(struct btree *b, struct btree_op *op,
>>>>          memmove(r + 1, r, sizeof(r[0]) * (GC_MERGE_NODES - 1));
>>>>          r->b = NULL;
>>>>  
>>>> +        if (atomic_read(&b->c->io_inflight) &&
>>>> +            gc->nodes >= gc->nodes_pre + MIN_GC_NODES) {
>>>
>>> On 32-bit machines, gc->nodes is a 32-bit count; if it overflows, the
>>> above check will be false for a very long time, and in some conditions
>>> it might stay false forever after the overflow.
>>>
>>> How about adding the following check before the if() statement?
>>>
>>>         /* in case gc->nodes overflowed */
>>>         if (gc->nodes < gc->nodes_pre)
>>>             gc->nodes_pre = gc->nodes;
>>>
>> I think 32 bits is big enough; it can hold values in the billions,
>> while the number of nodes is usually in the tens of thousands.
>> Also, gc->nodes and gc->nodes_pre are reset to zero each time GC begins.
>>
>> What do you think?
>
> Oh, I see. Then I won't worry about it. The patch looks OK to me.
Thanks Coly. Would you mind if I add your Reviewed-by to the v2 patch?

Thanks.
Tang Junhui
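
Taken together, Coly's suggested guard and the pause check from the
patch would sit in btree_gc_recurse() roughly as below. This is a sketch
against the v1 names (io_inflight, MIN_GC_NODES), not code posted in
this thread:

		/* in case gc->nodes wrapped around since the last pause */
		if (gc->nodes < gc->nodes_pre)
			gc->nodes_pre = gc->nodes;

		/* yield once front-side I/O is seen and enough nodes were GCed */
		if (atomic_read(&b->c->io_inflight) &&
		    gc->nodes >= gc->nodes_pre + MIN_GC_NODES) {
			gc->nodes_pre = gc->nodes;	/* remember the pause point */
			ret = -EAGAIN;			/* bch_btree_gc() will sleep and retry */
			break;
		}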


* Re: [PATCH 1/4] bcache: finish incremental GC
  2018-04-13  8:37 [PATCH 1/4] bcache: finish incremental GC tang.junhui
@ 2018-04-13  8:48 ` Coly Li
From: Coly Li @ 2018-04-13  8:48 UTC
  To: tang.junhui; +Cc: kent.overstreet, mlyle, linux-bcache, linux-block

On 2018/4/13 4:37 PM, tang.junhui@zte.com.cn wrote:
> Hi Coly,
>>>
>>> Hello Coly,
>>>
>>>> On 2018/4/12 2:38 PM, tang.junhui@zte.com.cn wrote:
>>>>> From: Tang Junhui <tang.junhui@zte.com.cn>
>>>>>
>>>>> In the GC thread, we record the latest GC key in gc_done, which is
>>>>> expected to be used for incremental GC, but the current code never
>>>>> makes use of it. When GC runs, front-side I/O is blocked until GC
>>>>> finishes, which can take a long time if there are many btree nodes.
>>>>>
>>>>> This patch implements incremental GC. The main idea is that, when
>>>>> front-side I/Os are in flight, after GC has processed some nodes (100),
>>>>> we stop GC, release the btree node lock, and let the front-side I/Os
>>>>> proceed for a while (100 ms), then go back to GC again.
>>>>>
>>>>> With this patch, I/Os are no longer blocked for the whole duration of
>>>>> GC, and there is no obvious I/O drop-to-zero problem any more.
>>>>>
>>>>> Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
>>>>
>>>> Hi Junhui,
>>>>
>>>> I reply with my comments inline,
>>> Thanks for your comments.
>>>>
>>>>> ---
>>>>>  drivers/md/bcache/bcache.h  |  5 +++++
>>>>>  drivers/md/bcache/btree.c   | 15 ++++++++++++++-
>>>>>  drivers/md/bcache/request.c |  3 +++
>>>>>  3 files changed, 22 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
>>>>> index 843877e..ab4c9ca 100644
>>>>> --- a/drivers/md/bcache/bcache.h
>>>>> +++ b/drivers/md/bcache/bcache.h
>>>>> @@ -445,6 +445,7 @@ struct cache {
>>>>>  
>>>>>  struct gc_stat {
>>>>>      size_t            nodes;
>>>>> +    size_t            nodes_pre;
>>>>>      size_t            key_bytes;
>>>>>  
>>>>>      size_t            nkeys;
>>>>> @@ -568,6 +569,10 @@ struct cache_set {
>>>>>       */
>>>>>      atomic_t        rescale;
>>>>>      /*
>>>>> +     * used for GC, identify if any front side I/Os is inflight
>>>>> +     */
>>>>> +    atomic_t        io_inflight;
>>>>
>>>> Could you please rename the above variable to something like
>>>> search_inflight? Just to be more explicit, considering we have many
>>>> types of I/O requests.
>>> OK, that looks better.
>>>>
>>>>> +    /*
>>>>>       * When we invalidate buckets, we use both the priority and the amount
>>>>>       * of good data to determine which buckets to reuse first - to weight
>>>>>       * those together consistently we keep track of the smallest nonzero
>>>>> diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
>>>>> index 81e8dc3..b36d276 100644
>>>>> --- a/drivers/md/bcache/btree.c
>>>>> +++ b/drivers/md/bcache/btree.c
>>>>> @@ -90,6 +90,9 @@
>>>>>  
>>>>>  #define MAX_NEED_GC        64
>>>>>  #define MAX_SAVE_PRIO        72
>>>>> +#define MIN_GC_NODES        100
>>>>> +#define GC_SLEEP_TIME        100
>>>>
>>>> How about naming the above macro GC_SLEEP_MS? Then people can tell
>>>> the sleep time is 100 ms without checking more context.
>>> OK, that looks better.
>>>>> +
>>>>>  
>>>>>  #define PTR_DIRTY_BIT        (((uint64_t) 1 << 36))
>>>>>  
>>>>> @@ -1581,6 +1584,13 @@ static int btree_gc_recurse(struct btree *b, struct btree_op *op,
>>>>>          memmove(r + 1, r, sizeof(r[0]) * (GC_MERGE_NODES - 1));
>>>>>          r->b = NULL;
>>>>>  
>>>>> +        if (atomic_read(&b->c->io_inflight) &&
>>>>> +            gc->nodes >= gc->nodes_pre + MIN_GC_NODES) {
>>>>
>>>> On 32-bit machines, gc->nodes is a 32-bit count; if it overflows, the
>>>> above check will be false for a very long time, and in some conditions
>>>> it might stay false forever after the overflow.
>>>>
>>>> How about adding the following check before the if() statement?
>>>>
>>>>         /* in case gc->nodes overflowed */
>>>>         if (gc->nodes < gc->nodes_pre)
>>>>             gc->nodes_pre = gc->nodes;
>>>>
>>> I think 32 bits is big enough; it can hold values in the billions,
>>> while the number of nodes is usually in the tens of thousands.
>>> Also, gc->nodes and gc->nodes_pre are reset to zero each time GC begins.
>>>
>>> What do you think?
>>
>> Oh, I see. Then I won't worry about it. The patch looks OK to me.
> Thanks Coly. Would you mind if I add your Reviewed-by to the v2 patch?

Hi Junhui,

How about sending out the v2 patch and letting me confirm it with a Reviewed-by?

Coly Li


* Re: [PATCH 1/4] bcache: finish incremental GC
@ 2018-04-13  8:55 tang.junhui
From: tang.junhui @ 2018-04-13  8:55 UTC
  To: colyli; +Cc: kent.overstreet, mlyle, linux-bcache, linux-block, tang.junhui

Hi Coly,

> Hi Junhui,

> How about sending out the v2 patch and letting me confirm it with a Reviewed-by?

Certainly it's OK.

Thanks.
Tang Junhui


* Re: [PATCH 1/4] bcache: finish incremental GC
  2018-04-13  3:12 tang.junhui
@ 2018-04-13  3:43 ` Coly Li
From: Coly Li @ 2018-04-13  3:43 UTC
  To: tang.junhui; +Cc: kent.overstreet, mlyle, linux-bcache, linux-block

On 2018/4/13 11:12 AM, tang.junhui@zte.com.cn wrote:
> Hello Coly,
> 
>> On 2018/4/12 2:38 PM, tang.junhui@zte.com.cn wrote:
>>> From: Tang Junhui <tang.junhui@zte.com.cn>
>>>
>>> In the GC thread, we record the latest GC key in gc_done, which is
>>> expected to be used for incremental GC, but the current code never
>>> makes use of it. When GC runs, front-side I/O is blocked until GC
>>> finishes, which can take a long time if there are many btree nodes.
>>>
>>> This patch implements incremental GC. The main idea is that, when
>>> front-side I/Os are in flight, after GC has processed some nodes (100),
>>> we stop GC, release the btree node lock, and let the front-side I/Os
>>> proceed for a while (100 ms), then go back to GC again.
>>>
>>> With this patch, I/Os are no longer blocked for the whole duration of
>>> GC, and there is no obvious I/O drop-to-zero problem any more.
>>>
>>> Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
>>
>> Hi Junhui,
>>
>> I reply with my comments inline,
> Thanks for your comments.
>>
>>> ---
>>>  drivers/md/bcache/bcache.h  |  5 +++++
>>>  drivers/md/bcache/btree.c   | 15 ++++++++++++++-
>>>  drivers/md/bcache/request.c |  3 +++
>>>  3 files changed, 22 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
>>> index 843877e..ab4c9ca 100644
>>> --- a/drivers/md/bcache/bcache.h
>>> +++ b/drivers/md/bcache/bcache.h
>>> @@ -445,6 +445,7 @@ struct cache {
>>>  
>>>  struct gc_stat {
>>>      size_t            nodes;
>>> +    size_t            nodes_pre;
>>>      size_t            key_bytes;
>>>  
>>>      size_t            nkeys;
>>> @@ -568,6 +569,10 @@ struct cache_set {
>>>       */
>>>      atomic_t        rescale;
>>>      /*
>>> +     * used for GC, identify if any front side I/Os is inflight
>>> +     */
>>> +    atomic_t        io_inflight;
>>
>> Could you please rename the above variable to something like
>> search_inflight? Just to be more explicit, considering we have many
>> types of I/O requests.
> OK, that looks better.
>>
>>> +    /*
>>>       * When we invalidate buckets, we use both the priority and the amount
>>>       * of good data to determine which buckets to reuse first - to weight
>>>       * those together consistently we keep track of the smallest nonzero
>>> diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
>>> index 81e8dc3..b36d276 100644
>>> --- a/drivers/md/bcache/btree.c
>>> +++ b/drivers/md/bcache/btree.c
>>> @@ -90,6 +90,9 @@
>>>  
>>>  #define MAX_NEED_GC        64
>>>  #define MAX_SAVE_PRIO        72
>>> +#define MIN_GC_NODES        100
>>> +#define GC_SLEEP_TIME        100
>>
>> How about naming the above macro GC_SLEEP_MS? Then people can tell
>> the sleep time is 100 ms without checking more context.
> OK, that looks better.
>>> +
>>>  
>>>  #define PTR_DIRTY_BIT        (((uint64_t) 1 << 36))
>>>  
>>> @@ -1581,6 +1584,13 @@ static int btree_gc_recurse(struct btree *b, struct btree_op *op,
>>>          memmove(r + 1, r, sizeof(r[0]) * (GC_MERGE_NODES - 1));
>>>          r->b = NULL;
>>>  
>>> +        if (atomic_read(&b->c->io_inflight) &&
>>> +            gc->nodes >= gc->nodes_pre + MIN_GC_NODES) {
>>
>> On 32-bit machines, gc->nodes is a 32-bit count; if it overflows, the
>> above check will be false for a very long time, and in some conditions
>> it might stay false forever after the overflow.
>>
>> How about adding the following check before the if() statement?
>>
>>         /* in case gc->nodes overflowed */
>>         if (gc->nodes < gc->nodes_pre)
>>             gc->nodes_pre = gc->nodes;
>>
> I think 32 bits is big enough; it can hold values in the billions,
> while the number of nodes is usually in the tens of thousands.
> Also, gc->nodes and gc->nodes_pre are reset to zero each time GC begins.
>
> What do you think?

Oh, I see. Then I won't worry about it. The patch looks OK to me.

Thanks :-)

Coly Li


[code snipped]


* Re: [PATCH 1/4] bcache: finish incremental GC
@ 2018-04-13  3:12 tang.junhui
  2018-04-13  3:43 ` Coly Li
From: tang.junhui @ 2018-04-13  3:12 UTC
  To: colyli; +Cc: kent.overstreet, mlyle, linux-bcache, linux-block, tang.junhui

Hello Coly,

> On 2018/4/12 2:38 PM, tang.junhui@zte.com.cn wrote:
> > From: Tang Junhui <tang.junhui@zte.com.cn>
> > 
> > In the GC thread, we record the latest GC key in gc_done, which is
> > expected to be used for incremental GC, but the current code never
> > makes use of it. When GC runs, front-side I/O is blocked until GC
> > finishes, which can take a long time if there are many btree nodes.
> >
> > This patch implements incremental GC. The main idea is that, when
> > front-side I/Os are in flight, after GC has processed some nodes (100),
> > we stop GC, release the btree node lock, and let the front-side I/Os
> > proceed for a while (100 ms), then go back to GC again.
> >
> > With this patch, I/Os are no longer blocked for the whole duration of
> > GC, and there is no obvious I/O drop-to-zero problem any more.
> > 
> > Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
> 
> Hi Junhui,
> 
> I reply with my comments inline,
Thanks for your comments.
> 
> > ---
> >  drivers/md/bcache/bcache.h  |  5 +++++
> >  drivers/md/bcache/btree.c   | 15 ++++++++++++++-
> >  drivers/md/bcache/request.c |  3 +++
> >  3 files changed, 22 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
> > index 843877e..ab4c9ca 100644
> > --- a/drivers/md/bcache/bcache.h
> > +++ b/drivers/md/bcache/bcache.h
> > @@ -445,6 +445,7 @@ struct cache {
> >  
> >  struct gc_stat {
> >      size_t            nodes;
> > +    size_t            nodes_pre;
> >      size_t            key_bytes;
> >  
> >      size_t            nkeys;
> > @@ -568,6 +569,10 @@ struct cache_set {
> >       */
> >      atomic_t        rescale;
> >      /*
> > +     * used for GC, identify if any front side I/Os is inflight
> > +     */
> > +    atomic_t        io_inflight;
> 
> Could you please rename the above variable to something like
> search_inflight? Just to be more explicit, considering we have many
> types of I/O requests.
OK, that looks better.
> 
> > +    /*
> >       * When we invalidate buckets, we use both the priority and the amount
> >       * of good data to determine which buckets to reuse first - to weight
> >       * those together consistently we keep track of the smallest nonzero
> > diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
> > index 81e8dc3..b36d276 100644
> > --- a/drivers/md/bcache/btree.c
> > +++ b/drivers/md/bcache/btree.c
> > @@ -90,6 +90,9 @@
> >  
> >  #define MAX_NEED_GC        64
> >  #define MAX_SAVE_PRIO        72
> > +#define MIN_GC_NODES        100
> > +#define GC_SLEEP_TIME        100
> 
> How about naming the above macro GC_SLEEP_MS? Then people can tell
> the sleep time is 100 ms without checking more context.
OK, that looks better.
> > +
> >  
> >  #define PTR_DIRTY_BIT        (((uint64_t) 1 << 36))
> >  
> > @@ -1581,6 +1584,13 @@ static int btree_gc_recurse(struct btree *b, struct btree_op *op,
> >          memmove(r + 1, r, sizeof(r[0]) * (GC_MERGE_NODES - 1));
> >          r->b = NULL;
> >  
> > +        if (atomic_read(&b->c->io_inflight) &&
> > +            gc->nodes >= gc->nodes_pre + MIN_GC_NODES) {
> 
> On 32-bit machines, gc->nodes is a 32-bit count; if it overflows, the
> above check will be false for a very long time, and in some conditions
> it might stay false forever after the overflow.
>
> How about adding the following check before the if() statement?
>
>         /* in case gc->nodes overflowed */
>         if (gc->nodes < gc->nodes_pre)
>             gc->nodes_pre = gc->nodes;
>
I think 32 bits is big enough; it can hold values in the billions,
while the number of nodes is usually in the tens of thousands.
Also, gc->nodes and gc->nodes_pre are reset to zero each time GC begins.

What do you think?

> > +            gc->nodes_pre =  gc->nodes;
> > +            ret = -EAGAIN;
> > +            break;
> > +        }
> > +
> >          if (need_resched()) {
> >              ret = -EAGAIN;
> >              break;
> > @@ -1748,7 +1758,10 @@ static void bch_btree_gc(struct cache_set *c)
> >          closure_sync(&writes);
> >          cond_resched();
> >  
> > -        if (ret && ret != -EAGAIN)
> > +        if (ret == -EAGAIN)
> > +            schedule_timeout_interruptible(msecs_to_jiffies
> > +                               (GC_SLEEP_TIME));
> > +        else if (ret)
> >              pr_warn("gc failed!");
> >      } while (ret);
> >  
> > diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
> > index 643c3021..26750a9 100644
> > --- a/drivers/md/bcache/request.c
> > +++ b/drivers/md/bcache/request.c
> > @@ -637,7 +637,9 @@ static void do_bio_hook(struct search *s, struct bio *orig_bio)
> >  static void search_free(struct closure *cl)
> >  {
> >      struct search *s = container_of(cl, struct search, cl);
> > +
> >      bio_complete(s);
> > +    atomic_dec(&s->d->c->io_inflight);
> >  
> >      if (s->iop.bio)
> >          bio_put(s->iop.bio);
> > @@ -655,6 +657,7 @@ static inline struct search *search_alloc(struct bio *bio,
> >  
> >      closure_init(&s->cl, NULL);
> >      do_bio_hook(s, bio);
> > +    atomic_inc(&d->c->io_inflight);
> >  
> 
> Similar variable naming.
> 
> >      s->orig_bio        = bio;
> >      s->cache_miss        = NULL;
> > 
> 

Thanks for your comments.
Tang Junhui


* Re: [PATCH 1/4] bcache: finish incremental GC
  2018-04-12  6:38 ` [PATCH 1/4] bcache: finish incremental GC tang.junhui
@ 2018-04-12 14:08   ` Coly Li
From: Coly Li @ 2018-04-12 14:08 UTC
  To: tang.junhui; +Cc: kent.overstreet, mlyle, linux-bcache, linux-block

On 2018/4/12 2:38 PM, tang.junhui@zte.com.cn wrote:
> From: Tang Junhui <tang.junhui@zte.com.cn>
> 
> In the GC thread, we record the latest GC key in gc_done, which is
> expected to be used for incremental GC, but the current code never
> makes use of it. When GC runs, front-side I/O is blocked until GC
> finishes, which can take a long time if there are many btree nodes.
>
> This patch implements incremental GC. The main idea is that, when
> front-side I/Os are in flight, after GC has processed some nodes (100),
> we stop GC, release the btree node lock, and let the front-side I/Os
> proceed for a while (100 ms), then go back to GC again.
>
> With this patch, I/Os are no longer blocked for the whole duration of
> GC, and there is no obvious I/O drop-to-zero problem any more.
> 
> Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>

Hi Junhui,

I reply with my comments inline,

> ---
>  drivers/md/bcache/bcache.h  |  5 +++++
>  drivers/md/bcache/btree.c   | 15 ++++++++++++++-
>  drivers/md/bcache/request.c |  3 +++
>  3 files changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
> index 843877e..ab4c9ca 100644
> --- a/drivers/md/bcache/bcache.h
> +++ b/drivers/md/bcache/bcache.h
> @@ -445,6 +445,7 @@ struct cache {
>  
>  struct gc_stat {
>  	size_t			nodes;
> +	size_t			nodes_pre;
>  	size_t			key_bytes;
>  
>  	size_t			nkeys;
> @@ -568,6 +569,10 @@ struct cache_set {
>  	 */
>  	atomic_t		rescale;
>  	/*
> +	 * used for GC, identify if any front side I/Os is inflight
> +	 */
> +	atomic_t		io_inflight;

Could you please rename the above variable to something like
search_inflight? Just to be more explicit, considering we have many
types of I/O requests.

> +	/*
>  	 * When we invalidate buckets, we use both the priority and the amount
>  	 * of good data to determine which buckets to reuse first - to weight
>  	 * those together consistently we keep track of the smallest nonzero
> diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
> index 81e8dc3..b36d276 100644
> --- a/drivers/md/bcache/btree.c
> +++ b/drivers/md/bcache/btree.c
> @@ -90,6 +90,9 @@
>  
>  #define MAX_NEED_GC		64
>  #define MAX_SAVE_PRIO		72
> +#define MIN_GC_NODES		100
> +#define GC_SLEEP_TIME		100

How about naming the above macro GC_SLEEP_MS? Then people can tell
the sleep time is 100 ms without checking more context.

> +
>  
>  #define PTR_DIRTY_BIT		(((uint64_t) 1 << 36))
>  
> @@ -1581,6 +1584,13 @@ static int btree_gc_recurse(struct btree *b, struct btree_op *op,
>  		memmove(r + 1, r, sizeof(r[0]) * (GC_MERGE_NODES - 1));
>  		r->b = NULL;
>  
> +		if (atomic_read(&b->c->io_inflight) &&
> +		    gc->nodes >= gc->nodes_pre + MIN_GC_NODES) {

On 32-bit machines, gc->nodes is a 32-bit count; if it overflows, the
above check will be false for a very long time, and in some conditions
it might stay false forever after the overflow.

How about adding the following check before the if() statement?

		/* in case gc->nodes overflowed */
		if (gc->nodes < gc->nodes_pre)
			gc->nodes_pre = gc->nodes;

> +			gc->nodes_pre =  gc->nodes;
> +			ret = -EAGAIN;
> +			break;
> +		}
> +
>  		if (need_resched()) {
>  			ret = -EAGAIN;
>  			break;
> @@ -1748,7 +1758,10 @@ static void bch_btree_gc(struct cache_set *c)
>  		closure_sync(&writes);
>  		cond_resched();
>  
> -		if (ret && ret != -EAGAIN)
> +		if (ret == -EAGAIN)
> +			schedule_timeout_interruptible(msecs_to_jiffies
> +						       (GC_SLEEP_TIME));
> +		else if (ret)
>  			pr_warn("gc failed!");
>  	} while (ret);
>  
> diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
> index 643c3021..26750a9 100644
> --- a/drivers/md/bcache/request.c
> +++ b/drivers/md/bcache/request.c
> @@ -637,7 +637,9 @@ static void do_bio_hook(struct search *s, struct bio *orig_bio)
>  static void search_free(struct closure *cl)
>  {
>  	struct search *s = container_of(cl, struct search, cl);
> +
>  	bio_complete(s);
> +	atomic_dec(&s->d->c->io_inflight);
>  
>  	if (s->iop.bio)
>  		bio_put(s->iop.bio);
> @@ -655,6 +657,7 @@ static inline struct search *search_alloc(struct bio *bio,
>  
>  	closure_init(&s->cl, NULL);
>  	do_bio_hook(s, bio);
> +	atomic_inc(&d->c->io_inflight);
>  

Similar variable naming.

>  	s->orig_bio		= bio;
>  	s->cache_miss		= NULL;
> 

Thanks for your effort.

Coly Li
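
If the rename suggested above is applied, the request.c hooks from this
patch would read roughly as below. This is a sketch that assumes v2
keeps the same inc/dec placement and only renames the counter to
search_inflight; it is not the actual v2 code:

	static void search_free(struct closure *cl)
	{
		struct search *s = container_of(cl, struct search, cl);

		bio_complete(s);
		/* this front-side search is done; one less reason for GC to yield */
		atomic_dec(&s->d->c->search_inflight);

		if (s->iop.bio)
			bio_put(s->iop.bio);
		/* ... rest of search_free() unchanged ... */
	}

	static inline struct search *search_alloc(struct bio *bio,
						  struct bcache_device *d)
	{
		/* ... allocation unchanged ... */
		closure_init(&s->cl, NULL);
		do_bio_hook(s, bio);
		/* count this search so GC can tell front-side I/O is in flight */
		atomic_inc(&d->c->search_inflight);

		s->orig_bio = bio;
		/* ... rest of initialization unchanged ... */
	}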


* [PATCH 1/4] bcache: finish incremental GC
  2018-04-12  6:38 [PATCH 0/4] bcache: incremental GC and dirty data init tang.junhui
@ 2018-04-12  6:38 ` tang.junhui
  2018-04-12 14:08   ` Coly Li
From: tang.junhui @ 2018-04-12  6:38 UTC
  To: kent.overstreet, colyli, mlyle; +Cc: linux-bcache, linux-block, tang.junhui

From: Tang Junhui <tang.junhui@zte.com.cn>

In the GC thread, we record the latest GC key in gc_done, which is
expected to be used for incremental GC, but the current code never
makes use of it. When GC runs, front-side I/O is blocked until GC
finishes, which can take a long time if there are many btree nodes.

This patch implements incremental GC. The main idea is that, when
front-side I/Os are in flight, after GC has processed some nodes (100),
we stop GC, release the btree node lock, and let the front-side I/Os
proceed for a while (100 ms), then go back to GC again.

With this patch, I/Os are no longer blocked for the whole duration of
GC, and there is no obvious I/O drop-to-zero problem any more.

Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
---
 drivers/md/bcache/bcache.h  |  5 +++++
 drivers/md/bcache/btree.c   | 15 ++++++++++++++-
 drivers/md/bcache/request.c |  3 +++
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 843877e..ab4c9ca 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -445,6 +445,7 @@ struct cache {
 
 struct gc_stat {
 	size_t			nodes;
+	size_t			nodes_pre;
 	size_t			key_bytes;
 
 	size_t			nkeys;
@@ -568,6 +569,10 @@ struct cache_set {
 	 */
 	atomic_t		rescale;
 	/*
+	 * used for GC, identify if any front side I/Os is inflight
+	 */
+	atomic_t		io_inflight;
+	/*
 	 * When we invalidate buckets, we use both the priority and the amount
 	 * of good data to determine which buckets to reuse first - to weight
 	 * those together consistently we keep track of the smallest nonzero
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 81e8dc3..b36d276 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -90,6 +90,9 @@
 
 #define MAX_NEED_GC		64
 #define MAX_SAVE_PRIO		72
+#define MIN_GC_NODES		100
+#define GC_SLEEP_TIME		100
+
 
 #define PTR_DIRTY_BIT		(((uint64_t) 1 << 36))
 
@@ -1581,6 +1584,13 @@ static int btree_gc_recurse(struct btree *b, struct btree_op *op,
 		memmove(r + 1, r, sizeof(r[0]) * (GC_MERGE_NODES - 1));
 		r->b = NULL;
 
+		if (atomic_read(&b->c->io_inflight) &&
+		    gc->nodes >= gc->nodes_pre + MIN_GC_NODES) {
+			gc->nodes_pre =  gc->nodes;
+			ret = -EAGAIN;
+			break;
+		}
+
 		if (need_resched()) {
 			ret = -EAGAIN;
 			break;
@@ -1748,7 +1758,10 @@ static void bch_btree_gc(struct cache_set *c)
 		closure_sync(&writes);
 		cond_resched();
 
-		if (ret && ret != -EAGAIN)
+		if (ret == -EAGAIN)
+			schedule_timeout_interruptible(msecs_to_jiffies
+						       (GC_SLEEP_TIME));
+		else if (ret)
 			pr_warn("gc failed!");
 	} while (ret);
 
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 643c3021..26750a9 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -637,7 +637,9 @@ static void do_bio_hook(struct search *s, struct bio *orig_bio)
 static void search_free(struct closure *cl)
 {
 	struct search *s = container_of(cl, struct search, cl);
+
 	bio_complete(s);
+	atomic_dec(&s->d->c->io_inflight);
 
 	if (s->iop.bio)
 		bio_put(s->iop.bio);
@@ -655,6 +657,7 @@ static inline struct search *search_alloc(struct bio *bio,
 
 	closure_init(&s->cl, NULL);
 	do_bio_hook(s, bio);
+	atomic_inc(&d->c->io_inflight);
 
 	s->orig_bio		= bio;
 	s->cache_miss		= NULL;
-- 
1.8.3.1
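
The incremental behaviour hinges on the retry loop in bch_btree_gc()
treating -EAGAIN as "sleep, then continue from gc_done" rather than as
a failure. Condensed, the patched loop works like this (a sketch of the
loop body after the patch; surrounding setup and teardown are elided):

	do {
		ret = btree_root(gc_root, c, &op, &writes, &stats);
		closure_sync(&writes);
		cond_resched();

		if (ret == -EAGAIN)
			/* let front-side I/O drain, then resume from gc_done */
			schedule_timeout_interruptible(msecs_to_jiffies
						       (GC_SLEEP_TIME));
		else if (ret)
			pr_warn("gc failed!");
	} while (ret);

As a rough illustration of the pacing (an estimate, not a measurement
from this thread): with MIN_GC_NODES = 100 and GC_SLEEP_TIME = 100 ms,
a cache set with 40,000 btree nodes pauses at most 400 times under
sustained front-side I/O, stretching a full GC pass by about 40 seconds,
while front-side requests never wait longer than roughly the time GC
needs to scan 100 nodes.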
