From: Bob Liu <bob.liu@oracle.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: Bob Liu <lliubbo@gmail.com>,
	keir@xen.org, ian.campbell@citrix.com,
	George.Dunlap@eu.citrix.com, andrew.cooper3@citrix.com,
	xen-devel@lists.xenproject.org
Subject: Re: [PATCH 4/4] xen: use idle vcpus to scrub pages
Date: Wed, 18 Jun 2014 09:18:06 +0800	[thread overview]
Message-ID: <53A0E8CE.7050304@oracle.com> (raw)
In-Reply-To: <53A0583B020000780001B0E0@mail.emea.novell.com>


On 06/17/2014 09:01 PM, Jan Beulich wrote:
>>>> On 17.06.14 at 13:49, <lliubbo@gmail.com> wrote:
>> In case of heavy lock contention, use a percpu list.
>>  - Delist a batch of pages to a percpu list from "scrub" free page list.
>>  - Scrub pages on this percpu list.
>>  - Add those clean pages to normal "head" free page list, merge with other
> 
> Did you mean "heap" here?
> 

Yes, sorry!

>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -587,12 +587,14 @@ int domain_kill(struct domain *d)
>>          d->tmem_client = NULL;
>>          /* fallthrough */
>>      case DOMDYING_dying:
>> +        enable_idle_scrub = 0;
>>          rc = domain_relinquish_resources(d);
>>          if ( rc != 0 )
>>          {
>>              BUG_ON(rc != -EAGAIN);
>>              break;
>>          }
>> +        enable_idle_scrub = 1;
>>          for_each_vcpu ( d, v )
>>              unmap_vcpu_info(v);
>>          d->is_dying = DOMDYING_dead;
> 
> ???

enable_idle_scrub is a rough way to disable idle vcpu scrubbing, so that
domain_relinquish_resources() can take the heap_lock sooner and "xl
destroy" returns quickly.
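
Purely as an illustration of a less rough variant (not in the patch; the
names below are made up): a count of domains currently relinquishing
resources instead of a plain 0/1 toggle, so two domains dying at the same
time don't re-enable scrubbing under each other:

/* Illustrative sketch only: count domains being torn down instead of
 * toggling a single flag. -EAGAIN/retry handling omitted for brevity. */
static atomic_t scrub_inhibit = ATOMIC_INIT(0);

/* In domain_kill(), around domain_relinquish_resources(): */
atomic_inc(&scrub_inhibit);
rc = domain_relinquish_resources(d);
atomic_dec(&scrub_inhibit);

/* In scrub_free_pages(): back off while any domain is being torn down. */
if ( atomic_read(&scrub_inhibit) )
    return;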

> 
>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -79,6 +79,9 @@ PAGE_LIST_HEAD(page_offlined_list);
>>  /* Broken page list, protected by heap_lock. */
>>  PAGE_LIST_HEAD(page_broken_list);
>>  
>> +volatile bool_t enable_idle_scrub;
>> +DEFINE_PER_CPU(struct page_list_head, scrub_list_cpu);
> 
> I'm pretty sure I pointed out to you before that variables used only in
> a single file should be static.
> 

Sorry, again..

>> @@ -1387,7 +1390,86 @@ void __init scrub_heap_pages(void)
>>      setup_low_mem_virq();
>>  }
>>  
>> +#define SCRUB_BATCH 1024
>> +void scrub_free_pages(void)
>> +{
>> +    struct page_info *pg;
>> +    unsigned int i, j, node_empty = 0, nr_delisted = 0;
>> +    int order;
>> +    unsigned int cpu = smp_processor_id();
>> +    unsigned int node = cpu_to_node(cpu);
>> +    struct page_list_head *temp_list = &this_cpu(scrub_list_cpu);
>> +
>> +    if ( !enable_idle_scrub )
>> +        return;
>> +
>> +    do
>> +    {
>> +        if ( page_list_empty(temp_list) )
>> +        {
>> +            /* Delist a batch of pages from global scrub list */
>> +            spin_lock(&heap_lock);
>> +            for ( j = 0; j < NR_ZONES; j++ )
>> +            {
>> +                for ( order = MAX_ORDER; order >= 0; order-- )
>> +                {
>> +                    if ( (pg = page_list_remove_head(&scrub(node, j, order))) )
>> +                    {
>> +                        for ( i = 0; i < (1 << order); i++)
>> +                            mark_page_offline(&pg[i], 0);
> 
> A page previously having caused #MC now suddenly became healthy
> again?
> 

Will be fixed.

>> +
>> +                        page_list_add_tail(pg, temp_list);
>> +                        nr_delisted += (1 << order);
>> +                        if ( nr_delisted > SCRUB_BATCH )
>> +                        {
>> +                            nr_delisted = 0;
>> +                            spin_unlock(&heap_lock);
>> +                            goto start_scrub;
>> +                        }
>> +                    }
>> +                }
>> +            }
>> +
>> +            node_empty = 1;
>> +            spin_unlock(&heap_lock);
>> +        }
>> +        else
>> +        {
>> +start_scrub:
> 
> Labels need to be indented by at least one space.
> 
>> +            /* Scrub percpu list */
>> +            while ( !page_list_empty(temp_list) )
>> +            {
>> +                pg = page_list_remove_head(temp_list);
>> +                ASSERT(pg);
>> +                order = PFN_ORDER(pg);
>> +                for ( i = 0; i < (1 << order); i++ )
>> +                {
> 
> Order here can again be almost arbitrarily high. You aren't allowed
> to scrub e.g. 4Tb in one go.
> 

I can add a softirq_pending(cpu) check after each scrub_one_page() call,
roughly like the sketch below.
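
(Sketch only, restructuring the loop from the hunk above; what to do with
the partially scrubbed remainder, e.g. putting it back on the percpu list,
is left out.)

order = PFN_ORDER(pg);
for ( i = 0; i < (1 << order); i++ )
{
    ASSERT( test_bit(_PGC_need_scrub, &(pg[i].count_info)) );
    scrub_one_page(&pg[i]);
    pg[i].count_info &= ~(PGC_need_scrub);

    /* Don't scrub e.g. 4Tb in one go: yield as soon as work is pending. */
    if ( softirq_pending(cpu) )
        break;
}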

>> +                    ASSERT( test_bit(_PGC_need_scrub, &(pg[i].count_info)) );
>> +                    scrub_one_page(&pg[i]);
>> +                    pg[i].count_info &= ~(PGC_need_scrub);
>> +                }
>>  
>> +                /* Add pages to free heap list */
>> +                spin_lock(&heap_lock);
> 
> Between the dropping of the lock above and the re-acquiring of it
> here the pages "stolen" can lead to allocation failures despite there
> being available memory. This needs to be avoided if at all possible.
> 

#define SCRUB_BATCH 1024 is used to limit the number of pages on the
percpu list.
But yes, in the worst case (order = 20) 4Tb would be invisible to the
allocator. I don't have a good solution right now.

>> +                for ( i = 0; i < (1 << order); i++ )
>> +                {
>> +                    ASSERT ( !test_bit(_PGC_need_scrub, &(pg[i].count_info)) );
>> +                    pg[i].count_info |= PGC_state_free;
>> +                }
>> +                ASSERT (node == phys_to_nid(page_to_maddr(pg)));
>> +                merge_free_trunks(pg, order, 0);
>> +                spin_unlock(&heap_lock);
>> +
>> +                if ( softirq_pending(cpu) )
>> +                    return;
>> +            }
>> +        }
>> +
>> +        /* Scrub list of this node is empty */
>> +        if ( node_empty )
>> +            return;
>> +    } while ( !softirq_pending(cpu) );
>> +}
> 
> And all the logic needs to be made NUMA-aware, i.e. a CPU should
> prefer to scrub local memory, and only resort to scrubbing distant
> memory if no CPU on the correct node is idle. Plus (as we have seen
> with Konrad's boot time scrubbing changes) you should avoid having
> two hyperthreads within the same core doing scrubbing at the same
> time.
> 

Okay, will be taken into account.
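
For the hyperthread part, a rough sketch of what I have in mind (assuming
the per-CPU cpu_sibling_mask can be consulted from here):

/* Sketch only: let just one thread per core do idle-time scrubbing,
 * similar to the boot time scrubbing changes. */
if ( cpumask_first(per_cpu(cpu_sibling_mask, cpu)) != cpu )
    return;    /* another sibling of this core does the scrubbing */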

> In the end I'm getting the impression that this all wasn't properly
> thought through yet. Convoys on the lock inside the idle processing
> should e.g. also be avoided (or reduced to a minimum) - there's no
> point preventing the entering of C states if all you're otherwise going
> to do is spin on a lock.
> 

I already use a percpu list to reduce spinning on the heap lock.
node_empty could be extended to avoid the case you mentioned, for
example along the lines of this rough sketch:
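
(Names below are made up for illustration; error paths omitted.)

/* A per-node "scrubbing in progress" bit so at most one idle CPU at a
 * time goes for the heap_lock on behalf of a given node. */
static DECLARE_BITMAP(node_scrubbing, MAX_NUMNODES);

if ( test_and_set_bit(node, node_scrubbing) )
    return;    /* another idle CPU is already scrubbing this node */
/* ... delist a batch, scrub it, return the pages to the heap ... */
clear_bit(node, node_scrubbing);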
Thank you very much for all your review comments.

-Bob

