From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751802AbbFXJSu (ORCPT <rfc822;w@1wt.eu>);
	Wed, 24 Jun 2015 05:18:50 -0400
Received: from mail.bmw-carit.de ([62.245.222.98]:43821 "EHLO
	mail.bmw-carit.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750966AbbFXJSm (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Wed, 24 Jun 2015 05:18:42 -0400
X-CTCH-RefID: str=0001.0A0C0203.558A75EB.0111,ss=1,re=0.000,recu=0.000,reip=0.000,cl=1,cld=1,fgs=0
Message-ID: <558A75EA.40905@bmw-carit.de>
Date: Wed, 24 Jun 2015 11:18:34 +0200
From: Daniel Wagner <daniel.wagner@bmw-carit.de>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0
MIME-Version: 1.0
To: Ingo Molnar <mingo@kernel.org>, Peter Zijlstra <peterz@infradead.org>
CC: <oleg@redhat.com>, <paulmck@linux.vnet.ibm.com>, <tj@kernel.org>,
        <mingo@redhat.com>, <linux-kernel@vger.kernel.org>, <der.herr@hofr.at>,
        <dave@stgolabs.net>, <riel@redhat.com>, <viro@ZenIV.linux.org.uk>,
        <torvalds@linux-foundation.org>, <jlayton@poochiereds.net>
Subject: Re: [RFC][PATCH 00/13] percpu rwsem -v2
References: <20150622121623.291363374@infradead.org> <55884FC2.6030607@bmw-carit.de> <20150622190553.GZ3644@twins.programming.kicks-ass.net> <5589285C.2010100@bmw-carit.de> <20150623143411.GA25159@twins.programming.kicks-ass.net> <558973A7.6010407@bmw-carit.de> <20150623175012.GD3644@twins.programming.kicks-ass.net> <20150623193624.GH18673@twins.programming.kicks-ass.net> <20150624084648.GB27873@gmail.com>
In-Reply-To: <20150624084648.GB27873@gmail.com>
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 06/24/2015 10:46 AM, Ingo Molnar wrote:
> So I'd suggest to first compare preemption behavior: does the workload 
> context-switch heavily, and is it the exact same context switching rate and are 
> the points of preemption the same as well between the two kernels?

If I read this correctly, the answer is yes.

First the 'stable' flock02 test:

perf stat --repeat 5  --pre 'rm -rf /tmp/a' ~/src/lockperf/flock02 -n 128 -l 64 /tmp/a
0.008793148
0.008784990
0.008587804
0.008693641
0.008776946

 Performance counter stats for '/home/wagi/src/lockperf/flock02 -n 128 -l 64 /tmp/a' (5 runs):

         76.509634      task-clock (msec)         #    3.312 CPUs utilized            ( +-  0.67% )
                 2      context-switches          #    0.029 K/sec                    ( +- 26.50% )
               128      cpu-migrations            #    0.002 M/sec                    ( +-  0.31% )
             5,295      page-faults               #    0.069 M/sec                    ( +-  0.49% )
        89,944,154      cycles                    #    1.176 GHz                      ( +-  0.66% )
        58,670,259      stalled-cycles-frontend   #   65.23% frontend cycles idle     ( +-  0.88% )
                 0      stalled-cycles-backend    #    0.00% backend  cycles idle   
        76,991,414      instructions              #    0.86  insns per cycle        
                                                  #    0.76  stalled cycles per insn  ( +-  0.19% )
        15,239,720      branches                  #  199.187 M/sec                    ( +-  0.20% )
           103,418      branch-misses             #    0.68% of all branches          ( +-  6.68% )

       0.023102895 seconds time elapsed                                          ( +-  1.09% )


And here is posix01, which shows high variance:

perf stat --repeat 5  --pre 'rm -rf /tmp/a' ~/src/lockperf/posix01 -n 128 -l 64 /tmp/a
0.006020402
32.510838421
55.516466069
46.794470223
5.097701438

 Performance counter stats for '/home/wagi/src/lockperf/posix01 -n 128 -l 64 /tmp/a' (5 runs):

       4177.932106      task-clock (msec)         #   14.162 CPUs utilized            ( +- 34.59% )
            70,646      context-switches          #    0.017 M/sec                    ( +- 31.56% )
            28,009      cpu-migrations            #    0.007 M/sec                    ( +- 33.55% )
             4,834      page-faults               #    0.001 M/sec                    ( +-  0.98% )
     7,291,160,968      cycles                    #    1.745 GHz                      ( +- 32.17% )
     5,216,204,262      stalled-cycles-frontend   #   71.54% frontend cycles idle     ( +- 32.13% )
                 0      stalled-cycles-backend    #    0.00% backend  cycles idle   
     1,901,289,780      instructions              #    0.26  insns per cycle        
                                                  #    2.74  stalled cycles per insn  ( +- 30.80% )
       440,415,914      branches                  #  105.415 M/sec                    ( +- 31.06% )
         1,347,021      branch-misses             #    0.31% of all branches          ( +- 29.17% )

       0.295016987 seconds time elapsed                                          ( +- 32.01% )
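
To put a number on "stable" versus "high variance", here is a rough sketch that computes the spread of the five per-run times each test printed above. The function name and the choice of the coefficient of variation (sample standard deviation divided by the mean) are my own for illustration. This is not necessarily the formula perf uses for its "+-" column.

```python
import statistics

# Per-run times printed by the two tests above (copied verbatim).
flock02_times = [0.008793148, 0.008784990, 0.008587804, 0.008693641, 0.008776946]
posix01_times = [0.006020402, 32.510838421, 55.516466069, 46.794470223, 5.097701438]


def coefficient_of_variation(samples):
    """Return sample standard deviation divided by the mean (illustrative helper)."""
    return statistics.stdev(samples) / statistics.mean(samples)


if __name__ == "__main__":
    for name, times in (("flock02", flock02_times), ("posix01", posix01_times)):
        cv = coefficient_of_variation(times)
        print(f"{name}: mean={statistics.mean(times):.6f} cv={cv:.2%}")
```

With these numbers the flock02 spread comes out at roughly one percent of the mean, while the posix01 spread is of the same order as the mean itself.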


BTW, thanks for the perf stat tip. Really handy!

cheers,
daniel