From: Abel Wu
Date: Thu, 30 Jun 2022 18:46:08 +0800
Subject: Re: [PATCH v4 6/7] sched/fair: skip busy cores in SIS search
To: Chen Yu
Cc: Peter Zijlstra, Mel Gorman, Vincent Guittot, Josh Don, Tim Chen,
 K Prateek Nayak, "Gautham R . Shenoy", linux-kernel@vger.kernel.org
In-Reply-To: <20220630041645.GA9253@chenyu5-mobl1>
References: <20220619120451.95251-1-wuyun.abel@bytedance.com>
 <20220619120451.95251-7-wuyun.abel@bytedance.com>
 <20220621181442.GA37168@chenyu5-mobl1>
 <543d55e1-fad8-3df3-8bae-d79c0c8d8340@bytedance.com>
 <20220624033032.GA14945@chenyu5-mobl1>
 <3e4d2594-f678-b77a-4883-0b893daf19f6@bytedance.com>
 <2d18453d-9c9b-b57b-1616-d4a9229abd5a@bytedance.com>
 <20220630041645.GA9253@chenyu5-mobl1>
X-Mailing-List: linux-kernel@vger.kernel.org

On 6/30/22 12:16 PM, Chen Yu wrote:
> On Tue, Jun 28, 2022 at 03:58:55PM +0800, Abel Wu wrote:
>>
>> On 6/27/22 6:13 PM, Abel Wu wrote:
>> There seems to be not much difference except for the hackbench pipe
>> test at certain groups (30~110).
> OK, a smaller LLC domain does not seem to make much difference, which
> might suggest that by leveraging the load balance code path, the
> read/write to the LLC shared mask might not be the bottleneck. I have a
> vague impression that during Aubrey's work on cpumask searching for idle
> CPUs[1], there was concern that updating the shared mask in a large LLC
> introduced cache contention and performance degradation. Maybe we can
> find that regressed test case to verify.
> [1] https://lore.kernel.org/all/1615872606-56087-1-git-send-email-aubrey.li@intel.com/

I just went through Aubrey's v1-v11 patches and didn't find any
particular tests other than hackbench/tbench/uperf. Please let me
know if I missed something, thanks!

>> I intend to provide better scalability by applying the filter,
>> which will be enabled when:
>>
>> - The LLC is large enough that simply traversing it becomes
>> insufficient, and/or
>>
>> - The LLC is so loaded that unoccupied cpus are the minority.
>>
>> But it would be very nice if a more fine-grained pattern works well
>> so we can drop the above constraints.
> We can first try to push a simple version, and later optimize it.
> One concern about v4 is that we changed the logic in v3, which recorded
> the overloaded CPUs, while v4 tracks unoccupied CPUs. An overloaded CPU is
> more "stable" because there is more than one running task on its runqueue,
> so it is more likely to remain "occupied" for a while. That is to say,
> nr_task = 1, 2, 3... will all be regarded as occupied, while only nr_task = 0
> is unoccupied. The former would bring fewer false negatives/positives.

Yes, I like the 'overloaded mask' too, but the downside is the extra
cpumask ops needed in the SIS path (the added cpumask_andnot). Besides,
in this patch the 'overloaded mask' is also unstable because the state
is maintained at core level rather than per-cpu; some more thoughts are
in the cover letter.

>
> So far I have tested hackbench/schbench/netperf on top of Peter's sched/core branch,
> with SIS_UTIL enabled. Overall it looks good, and netperf shows an especially
> significant improvement when the load approaches overloaded (which is aligned
> with your comment above). I'll re-run netperf for several cycles to check the
> standard deviation. And I'm also curious about v3's performance because it
> tracks overloaded CPUs, so I'll also test v3 with small modifications.

Thanks very much for reviewing and testing.

Abel