From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Eric W. Biederman" <ebiederm@xmission.com>
To: Mateusz Guzik
Cc: Peter Zijlstra, Kees Cook, Josh Triplett, Alexander Viro,
 linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org
Subject: Re: [PATCH] fs/exec.c: Add fast path for ENOENT on PATH search
 before allocating mm
Date: Thu, 09 Nov 2023 23:26:23 -0600
Message-ID: <87a5rmw54w.fsf@email.froward.int.ebiederm.org>
In-Reply-To: (Mateusz Guzik's message of "Thu, 9 Nov 2023 13:21:04 +0100")
References: <5c7333ea4bec2fad1b47a8fa2db7c31e4ffc4f14.1663334978.git.josh@joshtriplett.org>
 <202311071228.27D22C00@keescook>
 <20231107205151.qkwlw7aarjvkyrqs@f>
 <202311071445.53E5D72C@keescook>
 <202311081129.9E1EC8D34@keescook>
 <87msvnwzim.fsf@email.froward.int.ebiederm.org>

Mateusz Guzik writes:

> On 11/9/23, Eric W. Biederman wrote:
>> Mateusz Guzik writes:
>>> sched_exec causes migration for only a few % of execs in the bench,
>>> but when it does happen there is tons of overhead elsewhere.
>>>
>>> I expect real programs which get past execve will be prone to
>>> migrating anyway, regardless of what sched_exec is doing.
>>>
>>> That is to say, while sched_exec buggering off here would be nice, I
>>> think for real-world wins the thing to investigate is the overhead
>>> which comes from migration to begin with.
>>
>> I have a vague memory that the idea is that there is a point during
>> exec when it should be much less expensive than normal to allow
>> migration between cpus, because all of the old state has gone away.
>>
>> Assuming that is the rationale, if we are getting lock contention
>> then either there is a global lock in there, or there is the
>> potential to pick a less expensive location within exec.
>>
>
> Given the commit below I think the term "migration cost" is
> overloaded here.
>
> By migration cost in my previous mail I meant the immediate cost
> (stop_one_cpu and so on), but also the aftermath -- for example TLB
> flushes on another CPU when tearing down your now-defunct mm after
> you switched.
>
> For testing purposes I verified commenting out sched_exec and not
> using taskset still gives me about 9.5k ops/s.
>
> I 100% agree that, should the task be moved between NUMA domains, it
> makes sense to do it when it has the smallest footprint. I don't know
> what the original patch did; the current code just picks a CPU and
> migrates to it, regardless of NUMA considerations. I will note that
> the goal would still be achieved by comparing domains and doing
> nothing if they match.
>
> I think this would be nice to fix, but it is definitely not a big
> deal. I guess the question is to Peter Zijlstra whether this sounds
> reasonable.

Perhaps I misread the trace. My point was simply that sched_exec
seemed to be causing lock contention because what was on one cpu is
now on another cpu, and we are now getting cross-cpu lock ping-pongs.

If sched_exec is causing those cross-cpu lock ping-pongs, then we can
move sched_exec to a better place within exec. That has already
happened once, shortly after sched_exec was introduced.

Ultimately we want sched_exec to sit in the cheapest place within
exec that we can find.
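
For concreteness, a rough sketch of the "compare domains and do
nothing if they match" idea. This is untested and is not the in-tree
sched_exec(); pick_dest_cpu() is a made-up stand-in for the
scheduler's existing target selection, while task_cpu(),
cpu_to_node(), stop_one_cpu() and migration_cpu_stop are the existing
kernel interfaces:

/* Illustrative sketch only, not the in-tree implementation. */
void sched_exec(void)
{
	struct task_struct *p = current;
	struct migration_arg arg;
	int dest_cpu;

	/* pick_dest_cpu() stands in for the scheduler's existing
	 * target selection, which today ignores NUMA placement. */
	dest_cpu = pick_dest_cpu(p);
	if (dest_cpu == task_cpu(p))
		return;

	/*
	 * The proposed check: if the chosen CPU sits in the same NUMA
	 * node as the current one, migrating buys no locality, so skip
	 * the stop_one_cpu() round trip and the cross-cpu cache and
	 * lock traffic it causes.
	 */
	if (cpu_to_node(dest_cpu) == cpu_to_node(task_cpu(p)))
		return;

	arg = (struct migration_arg){ .task = p, .dest_cpu = dest_cpu };
	stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
}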
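
And for anyone who wants to measure the effect themselves, here is a
minimal sketch of the kind of exec loop being benchmarked. It is an
illustration, not the actual benchmark quoted above: fork and exec a
trivial binary in a loop and report execs per second, pinned with
taskset or not.

/* Minimal exec-loop benchmark sketch; illustrative only.
 * Build:        cc -O2 -o execbench execbench.c
 * Run pinned:   taskset -c 0 ./execbench 10000
 * Run unpinned: ./execbench 10000
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	int iters = argc > 1 ? atoi(argv[1]) : 10000;
	char *child_argv[] = { "/bin/true", NULL };
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < iters; i++) {
		pid_t pid = fork();
		if (pid == 0) {
			execv(child_argv[0], child_argv);
			_exit(127);	/* exec failed */
		}
		waitpid(pid, NULL, 0);
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	double secs = (t1.tv_sec - t0.tv_sec) +
		      (t1.tv_nsec - t0.tv_nsec) / 1e9;
	printf("%.1f execs/sec\n", iters / secs);
	return 0;
}

Eric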