From: "Yang, Bin" <bin.yang@intel.com>
To: "tglx@linutronix.de" <tglx@linutronix.de>
Cc: "mingo@kernel.org" <mingo@kernel.org>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"Gross, Mark" <mark.gross@intel.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"Hansen, Dave" <dave.hansen@intel.com>
Subject: Re: [PATCH] x86/mm: fix cpu stuck issue in __change_page_attr_set_clr
Date: Tue, 10 Jul 2018 02:18:32 +0000	[thread overview]
Message-ID: <6957364e872792c9cc310cf4928ae90771f2b69a.camel@intel.com> (raw)
In-Reply-To: <alpine.DEB.2.21.1807041221110.1816@nanos.tec.linutronix.de>

On Wed, 2018-07-04 at 16:01 +0200, Thomas Gleixner wrote:
> On Wed, 4 Jul 2018, Yang, Bin wrote:
> > e820 table:
> > =================
> > 
> > [    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
> > [    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
> > [    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
> > [    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
> > [    0.000000] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
> > [    0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
> > [    0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
> > [    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000013fffffff] usable
> > 
> > call chain:
> > ======================
> > 
> > ...
> > => free_init_pages(what="initrd" or "unused kernel",
> >    begin=ffff9b26b....000, end=ffff9b26c....000); begin and end
> >    addresses are random, the values above are just for reference
> > => set_memory_rw()
> > => change_page_attr_set()
> > => change_page_attr_set_clr()
> > => __change_page_attr_set_clr(); cpa->numpages is 512 on my board
> >    if what == "unused kernel"
> > => __change_page_attr()
> > => try_preserve_large_page(); address=ffff9b26bfacf000, pfn=80000,
> >    level=3; the check loop runs 262144 times and exits after 861
> >    usecs on my board
> 
> You are talking about the static_protections() check, right?
> 
> > the actual problem
> > ===================
> > sometimes, free_init_pages() returns only after hundreds of
> > seconds. The major impact is on kernel boot time.
> 
> That's the symptom you are observing. The problem is in the
> try_preserve_large_page() logic.
> 
> The address range from the example above is:
> 
>    0xffff9b26b0000000 - 0xffff9b26c0000000
> 
> i.e. 256 MB.
> 
> So the first call with address 0xffff9b26b0000000 will try to
> preserve the 1GB page, but it's obvious that if the pgprot changes
> this has to split the 1GB page.
> 
> The current code is stupid in that regard simply because it does the
> static_protections() check loop _before_ checking:
> 
>   1) Whether the new and the old pgprot are the same
> 
>   2) Whether the address range which needs to be changed is aligned
>      to and covers the full large mapping
> 
> So it does the static_protections() loop before #1 and #2 and checks
> the full 1GB page 4K-wise, which makes it loop 262144 times.
> 
> With your magic 'cache' logic this will still loop exactly 262144
> times at least on the first invocation because there is no valid
> information in that 'cache'. So I really have no idea how your patch
> makes any difference unless it is breaking stuff left and right in
> very subtle ways.
> 
> If there is a second invocation with the same pgprot on that very
> same range, then I can see it somehow having that effect by chance,
> but not by design.
> 
> But this is all voodoo programming and there is a very obvious and
> simple solution for this:
> 
>   The check for pgprot_val(new_prot) == pgprot_val(old_prot) can
>   definitely be done _before_ the check loop. The check loop is
>   pointless in that case, really. If there is a mismatch anywhere
>   then it existed already, and instead of yelling loudly the
>   checking would just discover it, enforce the split, and in the
>   worst case that would preserve the old wrong mapping for those
>   pages.
> 
>   What we want there is a debug mechanism which catches such cases,
>   but is not effective on production kernels and runs once or twice
>   during boot.
> 
>   The range check whether the address is aligned to the large page
>   and covers the full large page (1G or 2M) is also obvious to do
>   _before_ that check, because if the requested range does not fit
>   and has a different pgprot_val() then it will decide to split
>   after the check anyway.
> 
>   The check loop itself is stupid as well. Instead of looping in 4K
>   steps the thing can be rewritten to check for overlapping ranges
>   and then check explicitly for those. If there is no overlap in the
>   first place the whole loop thing can be avoided, but that's a pure
>   optimization and needs more thought than the straightforward and
>   obvious solution to the problem at hand.
> 
> Untested patch just moving the quick checks to the obvious right
> place below.
> 
> Thanks,
> 
> 	tglx
> 
> 8<-----------------
> --- a/arch/x86/mm/pageattr.c
> +++ b/arch/x86/mm/pageattr.c
> @@ -628,47 +628,61 @@ try_preserve_large_page(pte_t *kpte, uns
>  	new_prot = static_protections(req_prot, address, pfn);
>  
>  	/*
> -	 * We need to check the full range, whether
> -	 * static_protection() requires a different pgprot for one of
> -	 * the pages in the range we try to preserve:
> +	 * If there are no changes, return. cpa->numpages has been
> +	 * updated above.
> +	 *
> +	 * There is no need to do any of the checks below because the
> +	 * existing pgprot of the large mapping has to be correct. If
> +	 * it's not then this is a bug in some other code and needs to
> +	 * be fixed there and not silently papered over by the
> +	 * static_protections() magic and then 'fixed' up in the wrong
> +	 * way.
>  	 */
> -	addr = address & pmask;
> -	pfn = old_pfn;
> -	for (i = 0; i < (psize >> PAGE_SHIFT); i++, addr += PAGE_SIZE, pfn++) {
> -		pgprot_t chk_prot = static_protections(req_prot, addr, pfn);
> -
> -		if (pgprot_val(chk_prot) != pgprot_val(new_prot))
> -			goto out_unlock;
> +	if (pgprot_val(new_prot) == pgprot_val(old_prot)) {
> +		do_split = 0;
> +		goto out_unlock;
>  	}
>  
>  	/*
> -	 * If there are no changes, return. maxpages has been updated
> -	 * above:
> +	 * If the requested address range is not aligned to the start
> +	 * of the large page or does not cover the full range, split
> +	 * it up. No matter what the static_protections() check below
> +	 * does, it would anyway result in a split after doing all the
> +	 * check work for nothing.
>  	 */
> -	if (pgprot_val(new_prot) == pgprot_val(old_prot)) {
> -		do_split = 0;
> +	addr = address & pmask;
> +	numpages = psize >> PAGE_SHIFT;
> +	if (address != addr || cpa->numpages != numpages)
>  		goto out_unlock;
> -	}
>  
>  	/*
> -	 * We need to change the attributes. Check, whether we can
> -	 * change the large page in one go. We request a split, when
> -	 * the address is not aligned and the number of pages is
> -	 * smaller than the number of pages in the large page. Note
> -	 * that we limited the number of possible pages already to
> -	 * the number of pages in the large page.
> +	 * Check the full range, whether static_protections()
> +	 * requires a different pgprot for one of the pages in the
> +	 * existing large mapping.
> +	 *
> +	 * FIXME:
> +	 * 1) Make this yell loudly if something tries to set a full
> +	 *    large page to something which is not allowed.

I am trying to work out a patch based on your sample code and
comments.

One thing I do not understand: why does it need to yell loudly when a
full large page is set to something which is not allowed? The large
page can still be split into 4K pages in the end, and
static_protections() will also be called for each 4K-page change. Why
not just add a warning when a 4K-page change is not allowed?

> +	 * 2) Add a check for catching a case where the existing
> +	 *    mapping is wrong already.
> +	 * 3) Instead of looping over the whole thing, find the
> +	 *    overlapping ranges and check them explicitly instead of
> +	 *    looping over a full 1G mapping in 4K steps (256k
> +	 *    iterations) for figuring that out.
>  	 */
> -	if (address == (address & pmask) && cpa->numpages == (psize >> PAGE_SHIFT)) {
> -		/*
> -		 * The address is aligned and the number of pages
> -		 * covers the full page.
> -		 */
> -		new_pte = pfn_pte(old_pfn, new_prot);
> -		__set_pmd_pte(kpte, address, new_pte);
> -		cpa->flags |= CPA_FLUSHTLB;
> -		do_split = 0;
> +	pfn = old_pfn;
> +	for (i = 0; i < numpages; i++, addr += PAGE_SIZE, pfn++) {
> +		pgprot_t chk_prot = static_protections(req_prot, addr, pfn);
> +
> +		if (pgprot_val(chk_prot) != pgprot_val(new_prot))
> +			goto out_unlock;
>  	}
>  
> +	/* All checks passed. Just change the large mapping entry */
> +	new_pte = pfn_pte(old_pfn, new_prot);
> +	__set_pmd_pte(kpte, address, new_pte);
> +	cpa->flags |= CPA_FLUSHTLB;
> +	do_split = 0;
> +
>  out_unlock:
>  	spin_unlock(&pgd_lock);

Thread overview: 14+ messages
2018-06-28 10:05 [PATCH] x86/mm: fix cpu stuck issue in __change_page_attr_set_clr Bin Yang
2018-07-03 13:57 ` Thomas Gleixner
2018-07-04  6:07   ` Yang, Bin
2018-07-04  7:40     ` Thomas Gleixner
2018-07-04  9:16       ` Yang, Bin
2018-07-04  9:20         ` Thomas Gleixner
2018-07-04 10:15           ` Yang, Bin
2018-07-04 14:01             ` Thomas Gleixner
2018-07-05  1:08               ` Yang, Bin
2018-07-05  5:30                 ` Thomas Gleixner
2018-07-05  5:48                   ` Yang, Bin
2018-07-05  6:15                     ` Thomas Gleixner
2018-07-10  2:18               ` Yang, Bin [this message]
2018-07-16  7:48                 ` Thomas Gleixner
