From: ebiederm@xmission.com (Eric W. Biederman)
To: Linus Torvalds
Cc: Linux Kernel Mailing List, Kees Cook, Pavel Machek, "Rafael J. Wysocki",
    linux-fsdevel, Oleg Nesterov, Linux PM
Subject: Re: [RFC][PATCH] exec: Conceal the other threads from wakeups during exec
Date: Fri, 31 Jul 2020 12:16:29 -0500
Message-ID: <87pn8by58y.fsf@x220.int.ebiederm.org>
In-Reply-To: (Linus Torvalds's message of "Thu, 30 Jul 2020 16:17:50 -0700")
References: <87h7tsllgw.fsf@x220.int.ebiederm.org>
    <87d04fhkyz.fsf@x220.int.ebiederm.org>
    <87h7trg4ie.fsf@x220.int.ebiederm.org>
    <878sf16t34.fsf@x220.int.ebiederm.org>
    <87pn8c1uj6.fsf_-_@x220.int.ebiederm.org>
X-Mailing-List: linux-fsdevel@vger.kernel.org

Linus Torvalds writes:

> On Thu, Jul 30, 2020 at 4:00 PM Eric W. Biederman wrote:
>>
>> The key is the function make_task_wakekill which could probably
>> benefit from a little more review and refinement but appears to
>> be basically correct.
>
> You really need to explain a lot more why you think this is all a good idea.
>
> For example, what if one of those other threads is waiting in line for
> a critical lock, and the wait-queue you basically disabled was the
> exclusive wait after lock handoff?
>
> That means that the lock will now effectively be held by that thread.
> No, it wasn't woken up, but it had the lock handed to it, and it's now
> entirely unresponsive until it is killed.
>
> How is that different from the deadlocks you're actually trying to fix?
>
> These are the kinds of problems that the freezer() code had too, with
> freezing things that held locks etc.
>
> This approach does seem better than the freezer thing, and if I read
> it right it will gather things in the signal handler code, but it's
> not obvious that gathering them in random places where they sleep for
> random reasons is safe or a good idea.
>
> I can imagine _so_ many dead systems if you just basically froze
> something that holds the mmap lock and is sleeping on a page fault,
> for example.
>
> Maybe I'm missing something, but I really think your "let's freeze
> things" is seriously misguided. You're concentrating on some small
> problem and trying to solve that, and not seeing the HUGE HONKING
> problems that your approach is fundamentally introducing.

Very good point. That would be a priority inversion on mmap_lock.
Without great care that could indeed result in lockups.

That definitely requires that the points where things are already
sleeping, and that can be converted, be opt-in. That potentially makes
things much more work.

Thanks, that helps kill my bright idea as I expressed it.

Part of what I was trying to solve (because I ran into the problem
while I was reading the code) was that the freezer, the cgroup v2
freezer, and other waits do not compose nicely. Even limited to opt-in
locations, I think the trick of being able to transform the wait-state
may solve that composition problem.

That said, I was really just posting this so that if the ideas were good
they could inspire future code, and if the ideas were bad they could be
sunk.

When it comes to sorting out future work, especially in exec, I will
know which ideas don't fly, so it will be easier to make the case for
ideas that will work.

Eric
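
For anyone following along, here is a rough user-space model of the difference between parking a thread wherever it happens to sleep and parking it only at opt-in points. It is a sketch under stated assumptions, not kernel code: `FreezeGate`, `safe_point`, and `lock_free_while_frozen` are hypothetical names invented for illustration, and a plain `threading.Lock` stands in for mmap_lock.

```python
import threading


class FreezeGate:
    """Hypothetical stand-in for an opt-in parking point (not a kernel API)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._frozen = False
        self._parked = 0

    def freeze(self):
        with self._cond:
            self._frozen = True

    def thaw(self):
        with self._cond:
            self._frozen = False
            self._cond.notify_all()

    def safe_point(self):
        """Park here while frozen; callers opt in by calling this explicitly."""
        with self._cond:
            if not self._frozen:
                return
            self._parked += 1
            self._cond.notify_all()  # let wait_parked() re-check the count
            while self._frozen:
                self._cond.wait()
            self._parked -= 1

    def wait_parked(self, count, timeout):
        """Wait until at least `count` threads are parked, or time out."""
        with self._cond:
            return self._cond.wait_for(lambda: self._parked >= count, timeout)


def lock_free_while_frozen(park_inside_lock, workers=3):
    """Freeze the workers, then report whether the shared lock is obtainable."""
    shared_lock = threading.Lock()  # assumption: plays the role of mmap_lock
    gate = FreezeGate()
    stop = threading.Event()

    def worker():
        while not stop.is_set():
            if park_inside_lock:
                with shared_lock:
                    gate.safe_point()  # unsafe: may park while holding the lock
            else:
                with shared_lock:
                    pass               # critical section
                gate.safe_point()      # opt-in: park only after releasing

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(workers)]
    for t in threads:
        t.start()

    gate.freeze()
    # In the unsafe case only one thread can park; the rest block on the lock.
    expected = 1 if park_inside_lock else workers
    parked = gate.wait_parked(expected, timeout=5.0)
    got_lock = shared_lock.acquire(timeout=0.2)
    if got_lock:
        shared_lock.release()

    stop.set()   # set before thawing so woken workers exit their loop
    gate.thaw()
    for t in threads:
        t.join(timeout=5.0)
    return parked and got_lock
```

Calling `lock_free_while_frozen(False)` returns True, because every worker parks after dropping the lock. Calling `lock_free_while_frozen(True)` returns False, because one worker parks while still holding the lock and nobody else can take it until the thaw. That second case is the lockup described above, in miniature.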