* Re: [PATCH 1/4] bcache: finish incremental GC
@ 2018-04-13  3:12 tang.junhui
  2018-04-13  3:43 ` Coly Li
  0 siblings, 1 reply; 7+ messages in thread
From: tang.junhui @ 2018-04-13  3:12 UTC (permalink / raw)
  To: colyli; +Cc: kent.overstreet, mlyle, linux-bcache, linux-block, tang.junhui

Hi Coly,

> On 2018/4/12 2:38 PM, tang.junhui@zte.com.cn wrote:
> > From: Tang Junhui <tang.junhui@zte.com.cn>
> > 
> > In the GC thread we record the latest GC key in gc_done, which is expected
> > to be used for incremental GC, but the current code does not make use of
> > it. While GC runs, front-side I/O is blocked until GC finishes, which can
> > take a long time if there are many btree nodes.
> > 
> > This patch implements incremental GC. The main idea is that, when there
> > are front-side I/Os, after GC has processed some nodes (100), we stop GC,
> > release the btree node lock, and process the front-side I/Os for a while
> > (100 ms), then go back to GC again.
> > 
> > With this patch, I/Os are no longer blocked for the whole duration of GC,
> > and the obvious drop-to-zero jumps in I/O throughput disappear.
> > 
> > Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
> 
> Hi Junhui,
> 
> My comments are inline,
Thanks in advance for your comments.
> 
> > ---
> >  drivers/md/bcache/bcache.h  |  5 +++++
> >  drivers/md/bcache/btree.c   | 15 ++++++++++++++-
> >  drivers/md/bcache/request.c |  3 +++
> >  3 files changed, 22 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
> > index 843877e..ab4c9ca 100644
> > --- a/drivers/md/bcache/bcache.h
> > +++ b/drivers/md/bcache/bcache.h
> > @@ -445,6 +445,7 @@ struct cache {
> >  
> >  struct gc_stat {
> >      size_t            nodes;
> > +    size_t            nodes_pre;
> >      size_t            key_bytes;
> >  
> >      size_t            nkeys;
> > @@ -568,6 +569,10 @@ struct cache_set {
> >       */
> >      atomic_t        rescale;
> >      /*
> > +     * used for GC, identify if any front side I/Os is inflight
> > +     */
> > +    atomic_t        io_inflight;
> 
> Could you please rename the above variable to something like
> search_inflight? Just to be more explicit, considering we have many
> types of I/O requests.
OK, that looks better.
> 
> > +    /*
> >       * When we invalidate buckets, we use both the priority and the amount
> >       * of good data to determine which buckets to reuse first - to weight
> >       * those together consistently we keep track of the smallest nonzero
> > diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
> > index 81e8dc3..b36d276 100644
> > --- a/drivers/md/bcache/btree.c
> > +++ b/drivers/md/bcache/btree.c
> > @@ -90,6 +90,9 @@
> >  
> >  #define MAX_NEED_GC        64
> >  #define MAX_SAVE_PRIO        72
> > +#define MIN_GC_NODES        100
> > +#define GC_SLEEP_TIME        100
> 
> How about naming the above macro GC_SLEEP_MS? Then people can tell
> the sleep time is 100 ms without checking more context.
OK, that looks better.
> > +
> >  
> >  #define PTR_DIRTY_BIT        (((uint64_t) 1 << 36))
> >  
> > @@ -1581,6 +1584,13 @@ static int btree_gc_recurse(struct btree *b, struct btree_op *op,
> >          memmove(r + 1, r, sizeof(r[0]) * (GC_MERGE_NODES - 1));
> >          r->b = NULL;
> >  
> > +        if (atomic_read(&b->c->io_inflight) &&
> > +            gc->nodes >= gc->nodes_pre + MIN_GC_NODES) {
> 
> On 32-bit machines gc->nodes is a 32-bit count; if it overflows, the
> above check will be false for a very long time, and in some conditions
> it might stay false forever after gc->nodes overflows.
> 
> How about adding the following check before the if() statement ?

>         /* in case gc->nodes has overflowed */
>         if (gc->nodes < gc->nodes_pre)
>             gc->nodes_pre = gc->nodes;
>
I think 32 bits is big enough: it can represent values in the billions,
while the number of nodes is usually only in the tens of thousands.
Also, gc->nodes and gc->nodes_pre are both reset to zero each time GC begins.

What do you think?

> > +            gc->nodes_pre =  gc->nodes;
> > +            ret = -EAGAIN;
> > +            break;
> > +        }
> > +
> >          if (need_resched()) {
> >              ret = -EAGAIN;
> >              break;
> > @@ -1748,7 +1758,10 @@ static void bch_btree_gc(struct cache_set *c)
> >          closure_sync(&writes);
> >          cond_resched();
> >  
> > -        if (ret && ret != -EAGAIN)
> > +        if (ret == -EAGAIN)
> > +            schedule_timeout_interruptible(msecs_to_jiffies
> > +                               (GC_SLEEP_TIME));
> > +        else if (ret)
> >              pr_warn("gc failed!");
> >      } while (ret);
> >  
> > diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
> > index 643c3021..26750a9 100644
> > --- a/drivers/md/bcache/request.c
> > +++ b/drivers/md/bcache/request.c
> > @@ -637,7 +637,9 @@ static void do_bio_hook(struct search *s, struct bio *orig_bio)
> >  static void search_free(struct closure *cl)
> >  {
> >      struct search *s = container_of(cl, struct search, cl);
> > +
> >      bio_complete(s);
> > +    atomic_dec(&s->d->c->io_inflight);
> >  
> >      if (s->iop.bio)
> >          bio_put(s->iop.bio);
> > @@ -655,6 +657,7 @@ static inline struct search *search_alloc(struct bio *bio,
> >  
> >      closure_init(&s->cl, NULL);
> >      do_bio_hook(s, bio);
> > +    atomic_inc(&d->c->io_inflight);
> >  
> 
> Similar variable naming.
> 
> >      s->orig_bio        = bio;
> >      s->cache_miss        = NULL;
> > 
> 

Thanks for your comments.
Tang Junhui

* Re: [PATCH 1/4] bcache: finish incremental GC
@ 2018-04-13  8:55 tang.junhui
  0 siblings, 0 replies; 7+ messages in thread
From: tang.junhui @ 2018-04-13  8:55 UTC (permalink / raw)
  To: colyli; +Cc: kent.overstreet, mlyle, linux-bcache, linux-block, tang.junhui

Hi Coly,

> Hi Junhui,

> How about sending out a v2 patch and letting me confirm it with a Reviewed-by?

Certainly, that's OK.

Thanks.
Tang Junhui

* Re: [PATCH 1/4] bcache: finish incremental GC
@ 2018-04-13  8:37 tang.junhui
  2018-04-13  8:48 ` Coly Li
  0 siblings, 1 reply; 7+ messages in thread
From: tang.junhui @ 2018-04-13  8:37 UTC (permalink / raw)
  To: colyli; +Cc: kent.overstreet, mlyle, linux-bcache, linux-block, tang.junhui

Hi Coly,
>> 
>> Hello Coly,
>> 
>>> On 2018/4/12 2:38 PM, tang.junhui@zte.com.cn wrote:
>>>> From: Tang Junhui <tang.junhui@zte.com.cn>
>>>>
>>>> In the GC thread we record the latest GC key in gc_done, which is expected
>>>> to be used for incremental GC, but the current code does not make use of
>>>> it. While GC runs, front-side I/O is blocked until GC finishes, which can
>>>> take a long time if there are many btree nodes.
>>>>
>>>> This patch implements incremental GC. The main idea is that, when there
>>>> are front-side I/Os, after GC has processed some nodes (100), we stop GC,
>>>> release the btree node lock, and process the front-side I/Os for a while
>>>> (100 ms), then go back to GC again.
>>>>
>>>> With this patch, I/Os are no longer blocked for the whole duration of GC,
>>>> and the obvious drop-to-zero jumps in I/O throughput disappear.
>>>>
>>>> Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
>>>
>>> Hi Junhui,
>>>
>>> My comments are inline,
>> Thanks in advance for your comments.
>>>
>>>> ---
>>>>  drivers/md/bcache/bcache.h  |  5 +++++
>>>>  drivers/md/bcache/btree.c   | 15 ++++++++++++++-
>>>>  drivers/md/bcache/request.c |  3 +++
>>>>  3 files changed, 22 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
>>>> index 843877e..ab4c9ca 100644
>>>> --- a/drivers/md/bcache/bcache.h
>>>> +++ b/drivers/md/bcache/bcache.h
>>>> @@ -445,6 +445,7 @@ struct cache {
>>>>  
>>>>  struct gc_stat {
>>>>      size_t            nodes;
>>>> +    size_t            nodes_pre;
>>>>      size_t            key_bytes;
>>>>  
>>>>      size_t            nkeys;
>>>> @@ -568,6 +569,10 @@ struct cache_set {
>>>>       */
>>>>      atomic_t        rescale;
>>>>      /*
>>>> +     * used for GC, identify if any front side I/Os is inflight
>>>> +     */
>>>> +    atomic_t        io_inflight;
>>>
>>> Could you please rename the above variable to something like
>>> search_inflight? Just to be more explicit, considering we have many
>>> types of I/O requests.
>> OK, that looks better.
>>>
>>>> +    /*
>>>>       * When we invalidate buckets, we use both the priority and the amount
>>>>       * of good data to determine which buckets to reuse first - to weight
>>>>       * those together consistently we keep track of the smallest nonzero
>>>> diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
>>>> index 81e8dc3..b36d276 100644
>>>> --- a/drivers/md/bcache/btree.c
>>>> +++ b/drivers/md/bcache/btree.c
>>>> @@ -90,6 +90,9 @@
>>>>  
>>>>  #define MAX_NEED_GC        64
>>>>  #define MAX_SAVE_PRIO        72
>>>> +#define MIN_GC_NODES        100
>>>> +#define GC_SLEEP_TIME        100
>>>
>>> How about naming the above macro GC_SLEEP_MS? Then people can tell
>>> the sleep time is 100 ms without checking more context.
>> OK, that looks better.
>>>> +
>>>>  
>>>>  #define PTR_DIRTY_BIT        (((uint64_t) 1 << 36))
>>>>  
>>>> @@ -1581,6 +1584,13 @@ static int btree_gc_recurse(struct btree *b, struct btree_op *op,
>>>>          memmove(r + 1, r, sizeof(r[0]) * (GC_MERGE_NODES - 1));
>>>>          r->b = NULL;
>>>>  
>>>> +        if (atomic_read(&b->c->io_inflight) &&
>>>> +            gc->nodes >= gc->nodes_pre + MIN_GC_NODES) {
>>>
>>> On 32-bit machines gc->nodes is a 32-bit count; if it overflows, the
>>> above check will be false for a very long time, and in some conditions
>>> it might stay false forever after gc->nodes overflows.
>>>
>>> How about adding the following check before the if() statement ?
>> 
>>>         /* in case gc->nodes has overflowed */
>>>         if (gc->nodes < gc->nodes_pre)
>>>             gc->nodes_pre = gc->nodes;
>>>
>> I think 32 bits is big enough: it can represent values in the billions,
>> while the number of nodes is usually only in the tens of thousands.
>> Also, gc->nodes and gc->nodes_pre are both reset to zero each time GC begins.
>> 
>> What do you think?
>
> Oh, I see. Then I won't worry about it. The patch is OK to me.
Thanks, Coly. Would you mind if I add your Reviewed-by to the v2 patch?

Thanks.
Tang Junhui

* [PATCH 0/4] bcache: incremental GC and dirty data init
@ 2018-04-12  6:38 tang.junhui
  2018-04-12  6:38 ` [PATCH 1/4] bcache: finish incremental GC tang.junhui
  0 siblings, 1 reply; 7+ messages in thread
From: tang.junhui @ 2018-04-12  6:38 UTC (permalink / raw)
  To: kent.overstreet, colyli, mlyle; +Cc: linux-bcache, linux-block, tang.junhui

Hi maintainers and folks,

Some patches in this set were sent before but have not been merged yet,
and I have added two new patches to solve issues I found while testing.
Since they are interdependent, I have made them into a patch set and resent them.

[PATCH 1/4] bcache: finish incremental GC
[PATCH 2/4] bcache: calculate the number of incremental GC nodes according to
			the total of btree nodes
[PATCH 3/4] bcache: notify allocator to prepare for GC
[PATCH 4/4] bcache: fix I/O significant decline while backend devices registering

These patches help prevent I/O fluctuations, or even drops to zero, during GC
or while cached devices are registering. I have tested them a number of times;
I hope somebody could review these patches. Any comment would be welcome.

