From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1751303AbdE2QlR (ORCPT <rfc822;w@1wt.eu>);
        Mon, 29 May 2017 12:41:17 -0400
Received: from mga01.intel.com ([192.55.52.88]:56987 "EHLO mga01.intel.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1750921AbdE2QlQ (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 29 May 2017 12:41:16 -0400
Subject: Re: [PATCH]: perf/core: addressing 4x slowdown during per-process
 profiling of STREAM benchmark on Intel Xeon Phi
To: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>,
        Ingo Molnar <mingo@redhat.com>,
        Arnaldo Carvalho de Melo <acme@kernel.org>,
        Andi Kleen <ak@linux.intel.com>, Kan Liang <kan.liang@intel.com>,
        Dmitri Prokhorov <Dmitry.Prohorov@intel.com>,
        Valery Cherepennikov <valery.cherepennikov@intel.com>,
        David Carrillo-Cisneros <davidcc@google.com>,
        Stephane Eranian <eranian@google.com>,
        Mark Rutland <mark.rutland@arm.com>, linux-kernel@vger.kernel.org
References: <1e962b59-3e39-e0d6-515d-c4fd3502edae@linux.intel.com>
 <87k24zzx7s.fsf@ashishki-desk.ger.corp.intel.com>
 <47dc6d8d-77db-70f5-9aa6-2aca38590e60@linux.intel.com>
 <20170529152254.wjx3b6apmatcso77@hirez.programming.kicks-ass.net>
 <20170529152931.yxapmev3ix3hn5bk@hirez.programming.kicks-ass.net>
From: Alexey Budankov <alexey.budankov@linux.intel.com>
Organization: Intel Corp.
Message-ID: <882d3e9d-426a-35f6-b7ef-cb4a29954af1@linux.intel.com>
Date: Mon, 29 May 2017 19:41:11 +0300
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101
 Thunderbird/52.1.1
MIME-Version: 1.0
In-Reply-To: <20170529152931.yxapmev3ix3hn5bk@hirez.programming.kicks-ass.net>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 29.05.2017 18:29, Peter Zijlstra wrote:
> On Mon, May 29, 2017 at 05:22:54PM +0200, Peter Zijlstra wrote:
>> On Mon, May 29, 2017 at 04:43:09PM +0300, Alexey Budankov wrote:
>>> On 29.05.2017 15:03, Alexander Shishkin wrote:
>>>> Alexey Budankov <alexey.budankov@linux.intel.com> writes:
>>
>>>>> +		} else if (event->cpu > node_event->cpu) {
>>>>> +			node = &((*node)->rb_right);
>>>>> +		} else {
>>>>> +			list_add_tail(&event->group_list_entry,
>>>>> +					&node_event->group_list);
>>>>
>>>> So why is this better than simply having per-cpu event lists plus one
>>>> for per-thread events?
>>>
>>> Good question. Choice of data structure and layout depends on the operations
>>> applied to the data so keeping groups as a tree simplifies and improves the
>>> implementation in terms of scalability and performance. Please ask more if
>>> any.
>>
>> Since these lists are per context, and each task can have a context,
>> you'd end up with per-task-per-cpu memory, which is something we'd like
>> to avoid (some archs have very limited per-cpu memory space etc..).

Ah, yes. Memory consumption does matter in kernel space, so
per-task-per-cpu lists are a poor fit.
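
To illustrate the memory point, here is a minimal userspace sketch of the
layout under discussion. It is not the patch itself: struct event,
struct cpu_node, struct context and the ctx_* helpers are hypothetical
stand-ins rather than the kernel's perf_event or rbtree API, and the tree is
a plain unbalanced binary search tree instead of a red-black tree. The idea
it shows is that a node is allocated only for a CPU that actually has events,
so a context costs memory in proportion to the CPUs it uses rather than to
the number of possible CPUs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an event; cpu == -1 means a per-thread event. */
struct event {
	int cpu;
	struct event *next;	/* next event queued on the same tree node */
};

/* One node per distinct cpu value that has at least one event. */
struct cpu_node {
	int cpu;
	struct event *head, *tail;
	struct cpu_node *left, *right;
};

/* Per-task context: a single tree instead of one list per possible CPU. */
struct context {
	struct cpu_node *root;
	size_t nr_nodes;	/* nodes allocated so far */
};

/*
 * Add an event to the context. A node is allocated only the first time a
 * given cpu value is seen; later events for that cpu are appended to the
 * node's list. Returns 0 on success, -1 if the allocation fails.
 */
static int ctx_add_event(struct context *ctx, struct event *ev)
{
	struct cpu_node **link;
	struct cpu_node *n;

	assert(ctx != NULL && ev != NULL);
	ev->next = NULL;

	link = &ctx->root;
	while (*link) {
		n = *link;
		if (ev->cpu < n->cpu) {
			link = &n->left;
		} else if (ev->cpu > n->cpu) {
			link = &n->right;
		} else {
			/* Same cpu: append to the existing node's list. */
			n->tail->next = ev;
			n->tail = ev;
			return 0;
		}
	}

	n = calloc(1, sizeof(*n));
	if (!n)
		return -1;
	n->cpu = ev->cpu;
	n->head = n->tail = ev;
	*link = n;
	ctx->nr_nodes++;
	return 0;
}

/* Return the first event queued for @cpu, or NULL if there is none. */
static struct event *ctx_find_cpu(const struct context *ctx, int cpu)
{
	const struct cpu_node *n = ctx->root;

	while (n) {
		if (cpu < n->cpu)
			n = n->left;
		else if (cpu > n->cpu)
			n = n->right;
		else
			return n->head;
	}
	return NULL;
}
```

With two events on cpu 3 and one per-thread event (cpu == -1), the context
holds two nodes regardless of how many CPUs the machine has, whereas a
per-cpu array of lists in every context would be sized by the CPU count.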

>>
>> Also, we'd like to have that tree for other reasons, like for instance
>> that heterogeneous PMU crud ARM has. Also, with a tree we can easier do
>> time based round-robin scheduling,
>>
> 
> Oh and in general multi-PMU stuff, aside from hetero PMU becomes much
> easier.
>