From: Fengguang Wu <fengguang.wu@intel.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>, Russ Anderson <rja@sgi.com>,
	Dimitri Sivanich <sivanich@sgi.com>,
	Matt Fleming <matt@codeblueprint.co.uk>,
	x86@kernel.org, LKML <linux-kernel@vger.kernel.org>,
	Mike Travis <travis@sgi.com>,
	Masahiro Yamada <yamada.masahiro@socionext.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	Alex Thorlton <athorlton@sgi.com>,
	lkp@01.org
Subject: Re: [LKP] [lkp] [x86/platform/UV] 71854cb812: will-it-scale.per_thread_ops -2.3% regression
Date: Mon, 31 Oct 2016 13:41:01 +0800
Message-ID: <20161031054101.b27mjyr46b3r2tbp@wfg-t540p.sh.intel.com>
In-Reply-To: <alpine.DEB.2.20.1610280030460.4913@nanos>

Hi Thomas,

It's been a big challenge for us: we occasionally run into a bisect
like this one, where the data show clear changes that cannot be easily
explained by looking at the code logic.

On Fri, Oct 28, 2016 at 12:37:45AM +0200, Thomas Gleixner wrote:
>On Thu, 27 Oct 2016, Ye Xiaolong wrote:
>> Yes, this is weird, the per_thread_ops change is small and should be run
>> to run variation, the actual significant change is will-it-scale.time.user_time
>> -27% decrease, but the patch seems not relevant, we can't interpret it. :(
>>
>> We've tried to queue the jobs (4 times) for
>> 71854cb812ec23bfe5f63d52217e6b9e6cb901f5 and v4.9-rc1 with new kconfig
>> (added CONFIG_DEBUG_INFO_REDUCED), and result shows user_time change is
>> quite stable.
>
>>         v4.9-rc1 71854cb812ec23bfe5f63d5221
>> ---------------- --------------------------
>>          %stddev     %change         %stddev
>>              \          |                \
>>    1670068 ±  0%      -3.8%    1606650 ±  1%  will-it-scale.per_thread_ops
>>       9749 ±  2%   +1328.0%     139222 ±105%  will-it-scale.time.involuntary_context_switches
>
>	     	     		  ^^^^^^ This is massive
>
>I have no explanation for this either, but you really should try to figure
>out what's going on here.

Xiaolong, how about doing a small debug patch (a WARN_ONCE() line may
be enough) to verify whether the code path is executed?

It'd also help to compare vmlinux according to Thomas' reasoning:

>The only difference between plain rc1 and rc + this patch is the resulting
>text size and therefor some other unrelated stuff moving to different
>places in memory which has some yet to figure out side effects.

Yeah that's possible.
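
One way to check that would be to dump the symbol tables of the two
vmlinux builds with nm(1) and list which symbols ended up at different
addresses. Below is a minimal sketch in Python, not something we have
run on these kernels. The helper names parse_nm() and moved_symbols()
are made up for illustration, and the input is assumed to be the
default three-column "address type name" output of nm.

```python
def parse_nm(text):
    """Map symbol name -> address from `nm vmlinux` style output.

    Expects lines such as 'ffffffff81000000 T _text'. Lines without an
    address (undefined symbols) or blank lines are skipped. If a name
    occurs more than once (e.g. static symbols), the last one wins.
    """
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        addr, _sym_type, name = parts
        symbols[name] = int(addr, 16)
    return symbols


def moved_symbols(old_nm, new_nm):
    """Return (name, shift_in_bytes) for symbols present in both builds
    whose address changed, ordered by their address in the old build."""
    old, new = parse_nm(old_nm), parse_nm(new_nm)
    shifts = {
        name: new[name] - old[name]
        for name in old.keys() & new.keys()
        if new[name] != old[name]
    }
    return sorted(shifts.items(), key=lambda kv: old[kv[0]])
```

Feeding it the text of "nm vmlinux" from the v4.9-rc1 build and from
the 71854cb812 build would show where the layout starts to differ and
by how many bytes, which may hint at alignment or cache line effects.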

From the bisect POV, the graphs below show that user_time and
system_time are clearly and consistently different before and after
commit 71854cb812. So this commit must have impacted something.

Legend:
        [*] bisect good samples (i.e. tests run on commits before 71854cb812)
        [O] bisect bad samples  (i.e. tests run on commits  since 71854cb812)

                          will-it-scale.time.user_time

  85 ++---------------------------------------------------------------------+
     | .*.*..      .*..*.    .*..*.       .*.. .*..*.       .*              |
  80 *+      *.*..*      *..*      *..*..*    *      *..*.*.                |
     |                                                                      |
     |                                                                      |
  75 ++                                                                     |
     |                                                                      |
  70 ++                                                                     |
     |                                                                      |
  65 ++                                                                     |
     |                                                                      |
     |    O    O                                                            |
  60 O+ O    O           O       O O  O  O    O      O  O O  O O  O O  O O  O
     |            O O  O    O O            O    O  O                        |
  55 ++---------------------------------------------------------------------+



                          will-it-scale.time.system_time

  1010 ++-------------------------------------------------------------------+
       |                                                                    |
  1005 ++          O  O O    O  O           O O  O O                        |
       O O    O            O      O  O O  O           O O  O O  O O  O O  O O
       |    O    O                                                          |
  1000 ++                                                                   |
       |                                                                    |
   995 ++                                                                   |
       |                                                                    |
   990 ++                                                                   |
       |                                                                    |
       |                                                                    |
   985 ++                                                                   |
       *.    .*..*.*..    .*.*..    .*.*..*. .*..    .*.*..*.               |
   980 ++*--*---------*-*-------*-*---------*----*-*---------*--------------+


The voluntary_context_switches increase looks obvious, too, though not
as consistent:

                    will-it-scale.time.voluntary_context_switches

  160000 ++-----------------------------------------------------------------+
         |                       O                                          |
  150000 ++                                                                 |
         |                                                                  |
  140000 ++          O         O           O                                |
         |                         O  O O                                   |
  130000 ++            O                                                    |
         | O                                      O O                       |
  120000 ++     O .*                   .*..    O        .*                  |
         O   .O.*. O+             .*..*      O     .*.*.  +O  O      O O    |
  110000 *+*.        *.       .*.*         *. .*..*        *..  O O       O O
         |             *..*.*.               *        O  O    *             |
  100000 ++                 O                                               |
         |                O                                                 |
   90000 ++-----------------------------------------------------------------+


So do the branch misses:

                                perf-stat.branch-misses

  6.5e+09 ++----------------------------------------------------------------+
          |                                                                 |
    6e+09 ++               O O                                              |
  5.5e+09 ++                                                                |
          |                                                                 |
    5e+09 ++                                                                |
          O O    O                                                          |
  4.5e+09 ++   O   O  O O                                                   |
          |                                                                 |
    4e+09 ++                             O O           O O  O O O  O O O  O O
  3.5e+09 ++                                                                |
          |                                  O      O                       |
    3e+09 *+*..      .*.*..*.*.*..*.O O   .*.*..O.O.       .*.*             |
          |    *.*.*.          O  O *.*..*          *..*.*.                 |
  2.5e+09 ++----------------------------------------------------------------+


                            perf-stat.branch-miss-rate%

   0.6 ++------------------O------------------------------------------------+
       |                O                                                   |
  0.55 ++                                                                   |
       |                                                                    |
   0.5 ++                                                                   |
       O O    O    O                                                        |
  0.45 ++   O    O    O                                                     |
       |                                                                    |
   0.4 ++                                 O             O       O         O |
       |                               O              O    O O    O  O O    O
  0.35 ++                                                                   |
       |                                           O                        |
   0.3 *+*..      .*..*.*..*.*..*.O  O   .*.O.O..O.       .*.*              |
       |    *.*..*           O  O *..*.*.          *..*.*.                  |
  0.25 ++-------------------------------------------------------------------+

Regards,
Fengguang

