Subject: Re: [PATCH v5] do_wait: make PIDTYPE_PID case O(1) instead of O(n)
From: Jim Newsome
To: Andrew Morton
Cc: Oleg Nesterov, "Eric W. Biederman", Christian Brauner, linux-kernel@vger.kernel.org
References: <20210312173855.24843-1-jnewsome@torproject.org> <20210312102207.a347e38db375226a78cc37bf@linux-foundation.org>
Organization: The Tor Project
Date: Fri, 12 Mar 2021 14:01:41 -0600
X-Mailing-List: linux-kernel@vger.kernel.org

(Re-sent without html part, which the list rejected)

On 3/12/21 12:47, Andrew Morton wrote:
> IOW, please spend a bit of time selling the patch! What is the case
> for including it in Linux? What benefit does it provide our users?

Ah yes - I'd included some context when I first reached out to Oleg, but that never made it to the list :).

I'm helping develop a new ptrace-based version of Shadow [1] - a tool for simulating a (potentially large) network. Shadow runs the network user-space applications in an emulated environment, and routes network traffic through a model of the network accounting for latency, bandwidth, etc. The Tor Project plans to make increasing use of Shadow both for focused evaluation of specific proposed software and parameter changes, attacks, and defenses, and as a regular automated performance evaluation prior to deployment of new versions. Today Shadow is already actively used in the research community for applications including tor and bitcoin.

We're interested in running simulations including at least tens of thousands of processes, with a stretch goal of being able to handle 1M processes. Since each process is being ptraced, calling an O(n) waitpid has a huge performance penalty at this scale, and results in simulation run time growing ~quadratically with the size of the simulation.

We do have a workaround where we use a "fork proxy" thread to actually fork all the processes, and we stop and detach inactive processes. (The number of "active" processes is roughly fixed to the number of worker threads, which is generally the number of CPUs available.) That is, this keeps the number of children and tracees small and fixed, allowing us to scale linearly. However, having to detach and reattach tracees adds a significant linear overhead factor.

This kernel patch would allow us to get rid of the complexity and extra overhead of this workaround, and benefit other applications that haven't implemented such a workaround.

We have some details and analysis of this issue in GitHub [2]. I haven't added results with the patch yet, but plan to do so.

This should also help other applications that have a large number of children or tracees. Ptrace-based emulation tools are probably the most likely to benefit. For example, I suspect it'll help User Mode Linux [3], which IIUC uses a single tracer thread to ptrace every process running on its kernel. Likewise it could help DetTrace [4], which uses ptrace for deterministic software builds.

[1]: https://shadow.github.io/
[2]: https://github.com/shadow/shadow/issues/1134
[3]: https://en.wikipedia.org/wiki/User-mode_Linux
[4]: https://github.com/dettrace/dettrace
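
To make the waitpid scaling concern above concrete, here is a minimal sketch of a user-space probe. It is not taken from Shadow or from the patch; the helper names `spawn_idle_children` and `time_targeted_wait` are hypothetical, and the children here are plain forked processes rather than ptrace tracees. It forks a number of idle children and measures the average cost of a non-blocking `waitpid()` on one specific pid. Whether that cost grows with the number of children depends on the running kernel, so no expected numbers are given.

```python
import os
import signal
import time


def spawn_idle_children(n):
    """Fork n children that just sleep until killed; return their pids."""
    pids = []
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            # Child: idle until the parent kills us; never return into parent code.
            try:
                time.sleep(3600)
            finally:
                os._exit(0)
        pids.append(pid)
    return pids


def time_targeted_wait(n_children, n_polls=1000):
    """Average seconds per waitpid(pid, WNOHANG) while n_children are alive."""
    pids = spawn_idle_children(n_children)
    target = pids[-1]
    try:
        start = time.perf_counter()
        for _ in range(n_polls):
            # The target is still running, so this returns (0, 0) without blocking.
            assert os.waitpid(target, os.WNOHANG) == (0, 0)
        elapsed = time.perf_counter() - start
    finally:
        # Clean up: kill and reap every child we created.
        for pid in pids:
            os.kill(pid, signal.SIGKILL)
        for pid in pids:
            os.waitpid(pid, 0)
    return elapsed / n_polls


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, "children:", time_targeted_wait(n), "s per waitpid")
```

Comparing the per-call time across child counts, on kernels with and without the patch, is one rough way to see the effect described above. The process counts in the loop are arbitrary and may need to be lowered to stay within local process limits.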