Subject: Re: [PATCH RFC 1/2] sched: Minimize the idle cpu selection race window.
To: Mike Galbraith, Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, joelaf@google.com, brendan.jackman@arm.com, jbacik@fb.com, mingo@redhat.com
From: Atish Patra
Organization: Oracle Corporation
Date: Wed, 1 Nov 2017 01:08:59 -0500
References: <1509427662-25114-1-git-send-email-atish.patra@oracle.com> <1509427662-25114-2-git-send-email-atish.patra@oracle.com> <20171031082009.rxxa57goto6q5xld@hirez.programming.kicks-ass.net> <1509439705.14765.16.camel@gmx.de>
In-Reply-To: <1509439705.14765.16.camel@gmx.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On 10/31/2017 03:48 AM, Mike Galbraith wrote:
> On Tue, 2017-10-31 at 09:20 +0100, Peter Zijlstra wrote:
>> On Tue, Oct 31, 2017 at 12:27:41AM -0500, Atish Patra wrote:
>>> Currently, multiple tasks can wakeup on same cpu from
>>> select_idle_sibling() path in case they wakeup simultaneously
>>> and last ran on the same llc. This happens because an idle cpu
>>> is not updated until idle task is scheduled out. Any task waking
>>> during that period may potentially select that cpu for a wakeup
>>> candidate.
>>>
>>> Introduce a per cpu variable that is set as soon as a cpu is
>>> selected for wakeup for any task. This prevents other tasks
>>> from selecting the same cpu again. Note: This does not close the race
>>> window but minimizes it to accessing the per-cpu variable. If two
>>> wakee tasks access the per cpu variable at the same time, they may
>>> select the same cpu again. But it minimizes the race window
>>> considerably.
>> The very most important question; does it actually help? What
>> benchmarks, give what numbers?

Here are the numbers from one of the OLTP configurations on an
8-socket x86 machine:

kernel      txn/minute (normalized)   user/sys
baseline    1.0                       80/5
pcpu        1.021                     84/5

The throughput gains are not very high and are close to the run-to-run
variation. The schedstat data (added for testing in patch 2/2)
indicates that there are many instances of the race condition that got
addressed, but maybe not enough to trigger a significant throughput
change.

None of the other benchmarks I tested (TPCC, hackbench, schbench,
swingbench) showed any regression.

I will let Joel post numbers from Android benchmarks.

> I played with something ~similar (cmpxchg() idle cpu reservation)

I had an atomic version earlier as well. Peter's suggestion of a
per-cpu variable seems to perform slightly better than the atomic one.
Thus, this patch has the per-cpu version.

> a
> while back in the context of schbench, and it did help that,

Do you have the schbench configuration somewhere that I can test?
I tried various configurations but did not see any improvement or
regression.

> but for
> generic fast mover benchmarks, the added overhead had the expected
> effect, it shaved throughput a wee bit (rob Peter, pay Paul, repeat).

Which benchmark? Is it hackbench or something else? I have not found
any regression yet in my testing. I would be happy to test any other
benchmark or a different hackbench configuration.

Regards,
Atish

> I still have the patch lying about in my rubbish heap, but didn't
> bother to save any of the test results.
>
> -Mike
>
>
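
For illustration only, here is a minimal userspace sketch in C11 of the
two reservation styles discussed in this thread: a flag that is checked
and then set, and a cmpxchg()-style atomic reservation. It is not the
patch. The names SKETCH_NR_CPUS, claimed_plain, claimed_atomic,
claim_cpu_plain(), claim_cpu_atomic() and release_cpu() are
hypothetical, and kernel code would use per-cpu variables and the
kernel's own cmpxchg() rather than <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS 4 /* hypothetical size, chosen only for this sketch */

/* Hypothetical "claimed for wakeup" flags, one per cpu. */
bool claimed_plain[SKETCH_NR_CPUS];
atomic_bool claimed_atomic[SKETCH_NR_CPUS];

/*
 * Plain flag: the check and the set are two separate steps. Two wakers
 * that both read "false" before either writes "true" would both claim
 * the cpu, so the race window is narrowed but not closed.
 */
bool claim_cpu_plain(int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	if (claimed_plain[cpu])
		return false;
	claimed_plain[cpu] = true;
	return true;
}

/*
 * Atomic reservation: compare-and-exchange flips false -> true in one
 * step, so only one concurrent waker can succeed for a given cpu.
 */
bool claim_cpu_atomic(int cpu)
{
	bool expected = false;

	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	return atomic_compare_exchange_strong(&claimed_atomic[cpu],
					      &expected, true);
}

/* Clear both flags once the cpu is no longer reserved for a wakee. */
void release_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	claimed_plain[cpu] = false;
	atomic_store(&claimed_atomic[cpu], false);
}
```

A waker would call one of the claim functions on its candidate cpu and
look for another cpu if the call returns false. The plain version
avoids an atomic operation on every wakeup but leaves the small window
described above; the atomic version closes that window at the cost of
the extra atomic operation.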