From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 233FDC433F5 for ; Thu, 28 Apr 2022 02:05:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241288AbiD1CIf (ORCPT ); Wed, 27 Apr 2022 22:08:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40344 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239935AbiD1CI0 (ORCPT ); Wed, 27 Apr 2022 22:08:26 -0400 Received: from szxga02-in.huawei.com (szxga02-in.huawei.com [45.249.212.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F1603101C0 for ; Wed, 27 Apr 2022 19:05:11 -0700 (PDT) Received: from dggpeml500023.china.huawei.com (unknown [172.30.72.55]) by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4Kpf825WlfzgYbP; Thu, 28 Apr 2022 10:04:50 +0800 (CST) Received: from dggpeml500018.china.huawei.com (7.185.36.186) by dggpeml500023.china.huawei.com (7.185.36.114) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.24; Thu, 28 Apr 2022 10:05:10 +0800 Received: from [10.67.111.186] (10.67.111.186) by dggpeml500018.china.huawei.com (7.185.36.186) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.24; Thu, 28 Apr 2022 10:05:09 +0800 Message-ID: Date: Thu, 28 Apr 2022 10:05:09 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.1.1 Subject: Re: Question about kill a process group To: "Eric W. Biederman" CC: lkml , , , Peter Zijlstra , , , , References: <87ilrd2dfj.fsf@email.froward.int.ebiederm.org> <58223bd3-b63b-0c2b-abcc-e1136090d060@huawei.com> <874k2mtny7.fsf@email.froward.int.ebiederm.org> From: Zhang Qiao In-Reply-To: <874k2mtny7.fsf@email.froward.int.ebiederm.org> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8bit X-Originating-IP: [10.67.111.186] X-ClientProxiedBy: dggems706-chm.china.huawei.com (10.3.19.183) To dggpeml500018.china.huawei.com (7.185.36.186) X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org hi, 在 2022/4/22 0:12, Eric W. Biederman 写道: > Zhang Qiao writes: > >> 在 2022/4/13 23:47, Eric W. Biederman 写道: >>> To do something about this is going to take a deep and fundamental >>> redesign of how we maintain process lists to handle a parent >>> with millions of children well. >>> >>> Is there any real world reason to care about this case? Without >>> real world motivation I am inclined to just note that this is >> >> I just foune it while i ran ltp test. > > So I looked and fork12 has been around since 2002 in largely it's > current form. So I am puzzled why you have run into problems > and other people have not. > > Did you perhaps have lock debugging enabled? > > Did you run on a very large machine where a ridiculous number processes > could be created? > > Did you happen to run fork12 on a machine where locks are much more > expensive than on most machines? I don't think so, I reproduced this problem on two servers with different configurations. One of server info as follows: cpu: Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz, 64 cpus, RAM: 377G Do you need any other information? > > >>> Is there a real world use case that connects to this? >>> >>> How many children are being created in this test? Several million? >> >> There are about 300,000+ processes. > > Not as many as I was guessing, but still enough to cause a huge > wait on locks. > >>> I would like to blame this on the old issue that tasklist_lock being >>> a global lock. Given the number of child processes (as many as can be >>> created) I don't think we are hurt much by using a global lock. The >>> problem for solubility is that we have a lock. >>> >>> Fundamentally there must be a lock taken to maintain the parent's >>> list of children. >>> >>> I only see SIGQUIT being called once in the parent process so that >>> should not be an issue. >> >> >> In fork12, every child will call kill(0, SIGQUIT) at cleanup(). >> There are a lot of kill(0, SIGQUIT) calls. > > I had missed that. I can see that stressing out a lot. > > At the same time as I read fork12.c that is very much a bug. The > children in fork12.c should call _exit() instead of exit(). Which > would suppress calling the atexit() handlers and let fork12.c > test what it is trying to test. > > That doesn't mean there isn't a mystery here, but more that if > we really want to test lots of processes calling the same > signal at the same time it should be a test that means to do that. > > >>> There is a minor issue in fork12 that it calls exit(0) instead of >>> _exit(0) in the children. Not the problem you are dealing with >>> but it does look like it can be a distraction. >>> >>> I suspect the issue really is the thundering hurd of a million+ >>> processes synchronizing on a single lock. >>> >>> I don't think this is a hard lockup, just a global slow down. >>> I expect everything will eventually exit. >>> >> >> But according to the vmcore, this is a hardlockup issue, and i think >> there may be the following scenarios: > > Let me rewind a second. I just realized that I don't have a clue what > a hard lockup is (outside of the linux hard lockup detector). > > The two kinds of lockups that I understand with a technical meaning are > deadlock (such taking two locks in opposite orders which can never be > escaped), and livelock (where things are so busy no progress is made for > an extended period of time). > > I meant to say this is not a deadlock situation. This looks like a > livelock, but I think given enough time the code would make progress and > get out of it. > > I do agree over 1 second for holding a spin lock is ridiculous and a > denial of service attack. > > > > What I unfortunately do not see is a real world scenario where this will > happen. Without a real world scenario it is hard to find motivation to > spend the year or so it would take to rework all of the data structures. > The closest I can imagine to a real world scenario is that this > situation can be used as a denial of service attack. > > The hardest part of the problem is that signals sent to a group need to > be sent to the group atomically. That is the signals need to be sent to > every member of the group. > > Anyway I am very curious why you are the only one seeing a problem with > fork12. That we can definitely investigate as tracking down what is > different about your setup versus other people who have run ltp seems > much easier than redesigning all of the signal processing data > structures from scratch. the test steps are as follows: 1. git clone https://github.com/linux-test-project/ltp.git --depth=1 2. cd ltp/ 3. make autotools 4. ./configure 5. cd testcases/kernel/syscalls/ 6. make -j64 7. find ./ -type f -executable > newlist 8. while read line;do ./$line -I 30;done < newlist 9. After ten hours, i trigger Ctrl+C repeatedly. > > Eric > . >