linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Rong Chen <rong.a.chen@intel.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	mceier@gmail.com, Davidlohr Bueso <dave@stgolabs.net>,
	Davidlohr Bueso <dbueso@suse.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Borislav Petkov <bp@alien8.de>,
	LKML <linux-kernel@vger.kernel.org>,
	lkp@lists.01.org, "Kenneth R. Crudup" <kenny@panix.com>
Subject: Re: [PATCH] x86/pat: Fix off-by-one bugs in interval tree search
Date: Mon, 2 Dec 2019 16:31:52 +0800	[thread overview]
Message-ID: <45ca8086-b7ce-1561-c7d1-3580872d2ba8@intel.com> (raw)
In-Reply-To: <20191201144947.GA4167@gmail.com>

Hi Ingo,

The patch fixes the regression reported by 0day-CI.

    "[LKP] [x86/mm/pat] 8d04a5f97a: phoronix-test-suite.glmark2.0.score 
-23.7% regression": 
https://lkml.kernel.org/r/20191127005312.GD20422@shao2-debian

Best Regards,
Rong Chen

On 12/1/19 10:49 PM, Ingo Molnar wrote:
> * Ingo Molnar <mingo@kernel.org> wrote:
>
>> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>> But the final difference is a real difference where it used to be WC,
>>> and is now UC-:
>>>
>>>      -write-combining @ 0x2000000000-0x2100000000
>>>      -write-combining @ 0x2000000000-0x2100000000
>>>      +uncached-minus @ 0x2000000000-0x2100000000
>>>      +uncached-minus @ 0x2000000000-0x2100000000
>>>
>>> which certainly could easily explain the huge performance degradation.
>> It's not an unconditional regression, as both Boris and me tried to
>> reproduce it on different systems that do ioremap_wc() as well and didn't
>> measure a slowdown, but something about the memory layout probably
>> triggers the tree management bug.
> Ok, I think I found at least one bug in the new PAT code, the conversion
> of memtype_check_conflict() is buggy I think:
>
>     8d04a5f97a5f: ("x86/mm/pat: Convert the PAT tree to a generic interval tree")
>
>          dprintk("Overlap at 0x%Lx-0x%Lx\n", match->start, match->end);
>          found_type = match->type;
>   
> -       node = rb_next(&match->rb);
> -       while (node) {
> -               match = rb_entry(node, struct memtype, rb);
> -
> -               if (match->start >= end) /* Checked all possible matches */
> -                       goto success;
> -
> -               if (is_node_overlap(match, start, end) &&
> -                   match->type != found_type) {
> +       match = memtype_interval_iter_next(match, start, end);
> +       while (match) {
> +               if (match->type != found_type)
>                          goto failure;
> -               }
>   
> -               node = rb_next(&match->rb);
> +               match = memtype_interval_iter_next(match, start, end);
>          }
>
>
> Note how the '>= end' condition to end the interval check, got converted
> into:
>
> +       match = memtype_interval_iter_next(match, start, end);
>
> This is subtly off by one, because the interval trees interfaces require
> closed interval parameters:
>
>    include/linux/interval_tree_generic.h
>
>   /*                                                                            \
>    * Iterate over intervals intersecting [start;last]                           \
>    *                                                                            \
>    * Note that a node's interval intersects [start;last] iff:                   \
>    *   Cond1: ITSTART(node) <= last                                             \
>    * and                                                                        \
>    *   Cond2: start <= ITLAST(node)                                             \
>    */                                                                           \
>
>    ...
>
>                  if (ITSTART(node) <= last) {            /* Cond1 */           \
>                          if (start <= ITLAST(node))      /* Cond2 */           \
>                                  return node;    /* node is leftmost match */  \
>
> [start;last] is a closed interval (note that '<= last' check) - while the
> PAT 'end' parameter is 1 byte beyond the end of the range, because
> ioremap() and the other mapping APIs usually use the [start,end)
> half-open interval, derived from 'size'.
>
> This is what ioremap() does for example:
>
>          /*
>           * Mappings have to be page-aligned
>           */
>          offset = phys_addr & ~PAGE_MASK;
>          phys_addr &= PHYSICAL_PAGE_MASK;
>          size = PAGE_ALIGN(last_addr+1) - phys_addr;
>
>          retval = reserve_memtype(phys_addr, (u64)phys_addr + size,
>                                                  pcm, &new_pcm);
>
>
> phys_addr+size will be on a page boundary, after the last byte of the
> mapped interval.
>
> So the correct parameter to use in the interval tree searches is not
> 'end' but 'end-1'.
>
> This could have relevance if conflicting PAT ranges are exactly adjacent,
> for example a future WC region is followed immediately by an already
> mapped UC- region - in this case memtype_check_conflict() would
> incorrectly deny the WC memtype region and downgrade the memtype to UC-.
>
> BTW., rather annoyingly this downgrading is done silently in
> memtype_check_insert():
>
> int memtype_check_insert(struct memtype *new,
>                           enum page_cache_mode *ret_type)
> {
>          int err = 0;
>
>          err = memtype_check_conflict(new->start, new->end, new->type, ret_type);
>          if (err)
>                  return err;
>
>          if (ret_type)
>                  new->type = *ret_type;
>
>          memtype_interval_insert(new, &memtype_rbroot);
>          return 0;
> }
>
>
> So on such a conflict we'd just silently get UC- in *ret_type, and write
> it into the new region, never the wiser ...
>
> So assuming that the patch below fixes the primary bug the diagnostics
> side of ioremap() cache attribute downgrades would be another thing to
> fix.
>
> Anyway, I checked all the interval-tree iterations, and most of them are
> off by one - but I think the one related to memtype_check_conflict() is
> the one causing this particular performance regression.
>
> The only correct interval-tree searches were these two:
>
> arch/x86/mm/pat_interval.c:     match = memtype_interval_iter_first(&memtype_rbroot, 0, ULONG_MAX);
> arch/x86/mm/pat_interval.c:             match = memtype_interval_iter_next(match, 0, ULONG_MAX);
>
> The ULONG_MAX was hiding the off-by-one in plain sight. :-)
>
> So it would be nice if everyone who is seeing this bug could test the
> patch below against Linus's latest tree - does it fix the regression?
>
> If not then please send the before/after dump of
> /sys/kernel/debug/x86/pat_memtype_list - and even if it works please send
> the dumps so we can double check it all.
>
> Note that the bug was benign in the sense of implementing a too strict
> cache attribute conflict policy and downgrading cache attributes - so
> AFAICS the worst outcome of this bug would be a performance regression.
>
> Patch is only lightly tested, so take care. (Patch is emphatically not
> signed off yet, because I spent most of the day on this and I don't yet
> trust my fix - all of the affected sites need to be reviewed more
> carefully.)
>
> Thanks,
>
> 	Ingo
>
>
> ====================>
> From: Ingo Molnar <mingo@kernel.org>
> Date: Sun, 1 Dec 2019 15:25:50 +0100
> Subject: [PATCH] x86/pat: Fix off-by-one bugs in interval tree search
>
> NOT-Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---
>   arch/x86/mm/pat_interval.c | 12 ++++++------
>   1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/mm/pat_interval.c b/arch/x86/mm/pat_interval.c
> index 47a1bf30748f..6855362eaf21 100644
> --- a/arch/x86/mm/pat_interval.c
> +++ b/arch/x86/mm/pat_interval.c
> @@ -56,7 +56,7 @@ static struct memtype *memtype_match(u64 start, u64 end, int match_type)
>   {
>   	struct memtype *match;
>   
> -	match = memtype_interval_iter_first(&memtype_rbroot, start, end);
> +	match = memtype_interval_iter_first(&memtype_rbroot, start, end-1);
>   	while (match != NULL && match->start < end) {
>   		if ((match_type == MEMTYPE_EXACT_MATCH) &&
>   		    (match->start == start) && (match->end == end))
> @@ -66,7 +66,7 @@ static struct memtype *memtype_match(u64 start, u64 end, int match_type)
>   		    (match->start < start) && (match->end == end))
>   			return match;
>   
> -		match = memtype_interval_iter_next(match, start, end);
> +		match = memtype_interval_iter_next(match, start, end-1);
>   	}
>   
>   	return NULL; /* Returns NULL if there is no match */
> @@ -79,7 +79,7 @@ static int memtype_check_conflict(u64 start, u64 end,
>   	struct memtype *match;
>   	enum page_cache_mode found_type = reqtype;
>   
> -	match = memtype_interval_iter_first(&memtype_rbroot, start, end);
> +	match = memtype_interval_iter_first(&memtype_rbroot, start, end-1);
>   	if (match == NULL)
>   		goto success;
>   
> @@ -89,12 +89,12 @@ static int memtype_check_conflict(u64 start, u64 end,
>   	dprintk("Overlap at 0x%Lx-0x%Lx\n", match->start, match->end);
>   	found_type = match->type;
>   
> -	match = memtype_interval_iter_next(match, start, end);
> +	match = memtype_interval_iter_next(match, start, end-1);
>   	while (match) {
>   		if (match->type != found_type)
>   			goto failure;
>   
> -		match = memtype_interval_iter_next(match, start, end);
> +		match = memtype_interval_iter_next(match, start, end-1);
>   	}
>   success:
>   	if (newtype)
> @@ -160,7 +160,7 @@ struct memtype *memtype_erase(u64 start, u64 end)
>   struct memtype *memtype_lookup(u64 addr)
>   {
>   	return memtype_interval_iter_first(&memtype_rbroot, addr,
> -					   addr + PAGE_SIZE);
> +					   addr + PAGE_SIZE-1);
>   }
>   
>   #if defined(CONFIG_DEBUG_FS)


      parent reply	other threads:[~2019-12-02  8:32 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-27  0:53 [x86/mm/pat] 8d04a5f97a: phoronix-test-suite.glmark2.0.score -23.7% regression kernel test robot
2019-11-30 20:23 ` Mariusz Ceier
2019-11-30 21:27   ` Davidlohr Bueso
2019-11-30 22:08     ` Mariusz Ceier
2019-11-30 22:35       ` Linus Torvalds
2019-12-01 10:46         ` Ingo Molnar
2019-12-01 14:49           ` [PATCH] x86/pat: Fix off-by-one bugs in interval tree search Ingo Molnar
2019-12-01 16:09             ` Mariusz Ceier
2019-12-01 19:53               ` Ingo Molnar
2019-12-01 16:42             ` Kenneth R. Crudup
2019-12-01 17:01             ` Davidlohr Bueso
2019-12-01 17:08             ` Kenneth R. Crudup
2019-12-01 19:55               ` Ingo Molnar
2019-12-01 20:09                 ` Kenneth R. Crudup
2019-12-01 20:30                   ` Ingo Molnar
2019-12-01 20:04             ` [tip: x86/urgent] x86/mm/pat: " tip-bot2 for Ingo Molnar
2019-12-02  8:31             ` Rong Chen [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=45ca8086-b7ce-1561-c7d1-3580872d2ba8@intel.com \
    --to=rong.a.chen@intel.com \
    --cc=bp@alien8.de \
    --cc=dave@stgolabs.net \
    --cc=dbueso@suse.de \
    --cc=kenny@panix.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lkp@lists.01.org \
    --cc=mceier@gmail.com \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).