From: ebiederm@xmission.com (Eric W. Biederman)
To: Linus Torvalds
Cc: Waiman Long, Ingo Molnar, Will Deacon, Bernd Edlinger,
    Linux Kernel Mailing List, Alexey Gladkov
Subject: Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1
Date: Thu, 09 Apr 2020 09:58:09 -0500

Linus Torvalds writes:

> On Wed, Apr 8, 2020 at 8:17 AM Eric W. Biederman wrote:
>>
>> Yes. I missed the fact that we could take the lock killable.
>> We still unfortunately have the deadlock with ptrace.
>
> That, I feel, is similarly trivial.
>
> Again, anybody who takes the lock for writing should just do so
> killably. So you have three cases:
>
>  - ptrace wins the race and gets the lock.
>
>    Fine, the execve will wait until afterwards.
>
>  - ptrace loses the race and is not a thread with execve.
>
>    Fine, the execve() won, and the ptrace will wait until after execve.
>
>  - ptrace loses the race and is a thread with execve.
>
>    Fine, the execve() will kill the thing in dethread() and the ptrace
>    thread will release the lock and die.

That would be nice. That is unfortunately not how
ptrace_event(PTRACE_EVENT_EXIT, ...) works.
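Roughly, the pieces that end up waiting on each other look like this
(a much simplified sketch of the relevant in-tree code, with everything
inessential elided, so it will not compile as-is):

/* Thread A: the exec'ing thread. It already holds
 * current->signal->cred_guard_mutex, taken in prepare_bprm_creds()
 * and held for the whole of execve, by the time it gets here.
 */
static int de_thread(struct task_struct *tsk)
{
	struct signal_struct *sig = tsk->signal;
	/* ... */
	sig->notify_count = zap_other_threads(tsk); /* SIGKILL the other threads */
	while (sig->notify_count) {                 /* ... and wait for them     */
		__set_current_state(TASK_KILLABLE);
		schedule();
		/* ... */
	}
	/* ... */
}

/* Thread B: a traced sibling, dying because of that SIGKILL. */
void do_exit(long code)
{
	/* ... */
	ptrace_event(PTRACE_EVENT_EXIT, code); /* stop and wait for the tracer */
	/* ... */
}

/* Thread C: the tracer, blocked behind the mutex thread A holds,
 * for example in ptrace_attach():
 *
 *	if (mutex_lock_interruptible(&task->signal->cred_guard_mutex))
 *		goto out;
 *
 * A waits for B, B waits for C, C waits for A.
 */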
When a thread going about its ordinary business receives the SIGKILL
from de_thread, the thread changes course and finds its way to do_exit.
In do_exit the thread calls ptrace_event(PTRACE_EVENT_EXIT, ...) and
blocks waiting for the tracer to let it continue.

Further, from previous attempts to fix this we know that there are
pieces of userspace that expect that stop to happen. So if the
PTRACE_EVENT_EXIT stop does not happen, userspace which is already
attached breaks.

Further, this case with ptrace is something we know userspace does, and
it is just a matter of bad timing of attaching to the threads when one
thread is exec'ing. So we don't even need to wonder if userspace would
do such a silly thing.

There are a lot of similar cases that can happen if userspace inserts
itself into the path of page faults, directly or indirectly, as long as
some wait somewhere ultimately waits for a ptrace attach.

> So all three cases are fine, and none of them have any behavioral
> differences (as mentioned, the killing is "invisible" to users since
> it's fundamentally a race, and you can consider the kill to have
> happened before the ptrace started).

See above.

>> It might be simpler to make whichever lock we are dealing with per
>> task_struct instead of per signal_struct. Then we don't even have to
>> think about what de_thread does or if the lock is taken killable.
>
> Well, yes, but I think the dethread behavior of killing threads is
> required anyway, so..

It is, but it is actually part of the problem. I think making some of
this thread local might solve another easy case and let us focus more
on the really hard problem.

>> I keep wondering if we could do something similar to vfork. That is
>> allocate a new task_struct and fully set it up for the post exec
>> process, and then make it visible under tasklist_lock. Finally we
>> could free the old process.
>>
>> That would appear as if everything happened atomically from
>> the point of view of the rest of the kernel.
>
> I do think that would have been a lovely design originally, and would
> avoid a lot of things. So "execve()" would basically look like an exit
> and a thread creation with the same pid (without the SIGCHILD to the
> parent, obviously)
>
> That would also naturally handle the "flush pending signals" etc issues.
>
> The fact that we created a whole new mm-struct ended up fixing a lot
> of problems (even if it was painful to do). This might be similar.
>
> But it's not what we've ever done, and I do suspect you'd run into a
> lot of odd small special cases if we were to try to do it now.

I completely agree, which is why I haven't been rushing to do that.
But this remains the only idea that I have thought of that would solve
all of the issues.

> So I think it's simpler to just start making the "cred lock waiters
> have to be killable" rule. It's not like that's a very complex rule.

I just looked at the remaining users of cred_guard_mutex and they are
all killable or interruptible. Further, all of the places that have
been converted to use the exec_update_mutex are also all killable or
interruptible.

So where we came in is that we had the killable rule, and that is what
has allowed this to remain on the backburner for so long. At least you
could kill the affected process from userspace. Unfortunately the
deadlocks still happen.

Eric
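P.S. For concreteness, the killable rule in the existing users is just
the usual interruptible acquire. Roughly (simplified from the current
callers, with the surrounding error handling trimmed):

	/* exec side, fs/exec.c:prepare_bprm_creds(), roughly: */
	if (mutex_lock_interruptible(&current->signal->cred_guard_mutex))
		return -ERESTARTNOINTR;

	/* ptrace side, kernel/ptrace.c:ptrace_attach(), roughly: */
	retval = -ERESTARTNOINTR;
	if (mutex_lock_interruptible(&task->signal->cred_guard_mutex))
		goto out;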