From: "Eric W. Biederman" <ebiederm@xmission.com>
To: Zhang Qiao
Cc: lkml <linux-kernel@vger.kernel.org>, Peter Zijlstra
Subject: Re: Question about kill a process group
Date: Thu, 21 Apr 2022 11:12:48 -0500
Message-ID: <874k2mtny7.fsf@email.froward.int.ebiederm.org>
In-Reply-To: <58223bd3-b63b-0c2b-abcc-e1136090d060@huawei.com> (Zhang Qiao's
 message of "Thu, 14 Apr 2022 19:40:47 +0800")
References: <87ilrd2dfj.fsf@email.froward.int.ebiederm.org>
 <58223bd3-b63b-0c2b-abcc-e1136090d060@huawei.com>

Zhang Qiao writes:

> On 2022/4/13 23:47, Eric W. Biederman wrote:
>> To do something about this is going to take a deep and fundamental
>> redesign of how we maintain process lists to handle a parent with
>> millions of children well.
>>
>> Is there any real world reason to care about this case?  Without
>> real world motivation I am inclined to just note that this is
>
> I just found it while I ran the ltp test.

So I looked, and fork12 has been around since 2002 in largely its
current form.  So I am puzzled why you have run into problems and
other people have not.

Did you perhaps have lock debugging enabled?  Did you run on a very
large machine where a ridiculous number of processes could be created?
Did you happen to run fork12 on a machine where locks are much more
expensive than on most machines?

>> Is there a real world use case that connects to this?
>>
>> How many children are being created in this test?  Several million?
>
> There are about 300,000+ processes.

Not as many as I was guessing, but still enough to cause a huge wait
on locks.
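For anyone reading along without the LTP tree handy, my mental model
of the test is roughly the following (a from-memory sketch, not the
actual fork12.c; the LTP harness, the cleanup path, and error
reporting are omitted, and running it will deliberately exhaust your
process table just like the real test):

    /* Rough sketch of the fork12 pattern: fork until fork() fails,
     * then have the parent signal the whole process group once. */
    #include <signal.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
            int nkids = 0;

            for (;;) {
                    pid_t pid = fork();
                    if (pid == 0) {         /* child: wait to be killed */
                            pause();
                            _exit(0);
                    }
                    if (pid < 0)            /* EAGAIN: out of processes */
                            break;
                    nkids++;
            }

            signal(SIGQUIT, SIG_IGN);       /* the parent survives the kill */
            kill(0, SIGQUIT);               /* one signal to the whole group */
            while (wait(NULL) > 0)          /* reap all of the children */
                    ;
            printf("forked and reaped %d children\n", nkids);
            return 0;
    }

As I understand it, every one of those forks and exits takes the
global tasklist_lock for writing to maintain the process lists, and a
group-wide kill() walks every group member with that lock held, so
with 300,000+ children everything funnels through one lock.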
>> I would like to blame this on the old issue of tasklist_lock being
>> a global lock.  Given the number of child processes (as many as can
>> be created) I don't think we are hurt much by using a global lock.
>> The problem, as far as solving this goes, is that we have a lock at
>> all.
>>
>> Fundamentally there must be a lock taken to maintain the parent's
>> list of children.
>>
>> I only see SIGQUIT being sent once in the parent process so that
>> should not be an issue.
>
> In fork12, every child will call kill(0, SIGQUIT) at cleanup().
> There are a lot of kill(0, SIGQUIT) calls.

I had missed that.  I can see that putting a lot of stress on the
signal code.

At the same time, as I read fork12.c, that is very much a bug.  The
children in fork12.c should call _exit() instead of exit(), which
would suppress calling the atexit() handlers and let fork12.c test
what it is trying to test.

That doesn't mean there isn't a mystery here, but rather that if we
really want to test lots of processes sending the same signal at the
same time, it should be a test that means to do that.

>> There is a minor issue in fork12 in that it calls exit(0) instead
>> of _exit(0) in the children.  Not the problem you are dealing with,
>> but it does look like it can be a distraction.
>>
>> I suspect the issue really is the thundering herd of a million+
>> processes synchronizing on a single lock.
>>
>> I don't think this is a hard lockup, just a global slowdown.  I
>> expect everything will eventually exit.
>
> But according to the vmcore, this is a hardlockup issue, and I think
> there may be the following scenarios:

Let me rewind a second.  I just realized that I don't have a clue
what a hard lockup is (outside of the Linux hard lockup detector).
The two kinds of lockups I understand with a technical meaning are
deadlock (such as taking two locks in opposite orders, which can
never be escaped) and livelock (where things are so busy that no
progress is made for an extended period of time).

I meant to say this is not a deadlock situation.  This looks like a
livelock, but I think given enough time the code would make progress
and get out of it.

I do agree that holding a spin lock for over a second is ridiculous,
and a denial of service.  What I unfortunately do not see is a real
world scenario where this will happen.  Without a real world scenario
it is hard to find the motivation to spend the year or so it would
take to rework all of the data structures.

The closest I can imagine to a real world scenario is that this
situation can be used as a denial of service attack.

The hardest part of the problem is that signals sent to a process
group need to be sent to the group atomically.  That is, the signal
needs to be sent to every member of the group.

Anyway, I am very curious why you are the only one seeing a problem
with fork12.  That we can definitely investigate, as tracking down
what is different about your setup versus other people's who have run
ltp seems much easier than redesigning all of the signal processing
data structures from scratch.

Eric
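P.S.  To make the exit() vs _exit() point concrete, here is a minimal
standalone demonstration (my own example, not code from fork12): an
atexit() handler runs in a child that calls exit(), but not in one
that calls _exit().

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static void cleanup(void)
    {
            /* In fork12 this is where kill(0, SIGQUIT) is issued. */
            printf("atexit handler ran in pid %d\n", (int)getpid());
    }

    int main(void)
    {
            atexit(cleanup);

            if (fork() == 0)
                    exit(0);        /* runs cleanup() */
            wait(NULL);

            if (fork() == 0)
                    _exit(0);       /* skips cleanup(); prints nothing */
            wait(NULL);

            _exit(0);               /* keep the parent's handler quiet too */
    }

Since fork12's atexit-registered cleanup() is what issues
kill(0, SIGQUIT), every child that exit()s re-signals the entire
process group, which is how one intended group signal turns into
300,000 of them.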