From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH] workqueue: Ensure that cpumask set for pools created after boot
From: Michael Bringmann
Organization: IBM Linux Technology Center
To: Tejun Heo
Cc: Lai Jiangshan, linux-kernel@vger.kernel.org, Nathan Fontenot
Date: Mon, 12 Jun 2017 12:10:49 -0500
Message-Id: <69c4bbad-5d40-d054-0004-38ac81377b0b@linux.vnet.ibm.com>
In-Reply-To: <20170612161433.GB19206@htj.duckdns.org>
References: <20170523194952.GF13222@htj.duckdns.org>
 <20170523201029.GH13222@htj.duckdns.org>
 <20170525150353.GE23493@htj.duckdns.org>
 <20170525150752.GF23493@htj.duckdns.org>
 <20170606180913.GA32062@htj.duckdns.org>
 <736f7f6e-8d47-eaea-acc6-8ed75014a287@linux.vnet.ibm.com>
 <20170612161433.GB19206@htj.duckdns.org>

On 06/12/2017 11:14 AM, Tejun Heo wrote:
> Hello,
>
> On Mon, Jun 12, 2017 at 09:47:31AM -0500, Michael Bringmann wrote:
>>> I'm not sure because it doesn't make any logical sense and it's not
>>> right in terms of correctness.  The above would be able to enable CPUs
>>> which are explicitly excluded from a workqueue.  The only fallback
>>> which makes sense is falling back to the default pwq.
>>
>> What would that look like?  Are you sure that would always be valid?
>> In a system that is hot-adding and hot-removing CPUs?
>
> The reason why we're ending up with empty masks is because
> wq_calc_node_cpumask() is assuming that the possible node cpumask is
> always a superset of online (as it should).  We can trigger a fat
> warning there if that isn't so and just return false from that
> function.

What would that look like?  I should be able to test it on top of the
other changes / corrections.
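Just so I am sure I follow you, is the fallback you have in mind roughly
the change sketched below?  This is only my reading of the current
wq_calc_node_cpumask() in kernel/workqueue.c (the surrounding code and
names are copied from the tree I am looking at, so they may not match
yours exactly), not a tested patch:

static bool wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node,
				 int cpu_going_down, cpumask_t *cpumask)
{
	if (!wq_numa_enabled || attrs->no_numa)
		goto use_dfl;

	/* does @node have any online CPUs @attrs wants? */
	cpumask_and(cpumask, cpumask_of_node(node), attrs->cpumask);
	if (cpu_going_down >= 0)
		cpumask_clear_cpu(cpu_going_down, cpumask);

	if (cpumask_empty(cpumask))
		goto use_dfl;

	/* yeap, return possible CPUs in @node that @attrs wants */
	cpumask_and(cpumask, attrs->cpumask, wq_numa_possible_cpumask[node]);

	/*
	 * Sketch of the new fallback: if the online mask for @node is not
	 * a subset of the possible mask, the intersection above can come
	 * out empty.  Complain loudly and fall back to the default pwq
	 * instead of creating a pool with an empty cpumask.
	 */
	if (WARN_ONCE(cpumask_empty(cpumask),
		      "workqueue: node %d: online cpus not within the possible mask, using dfl pwq\n",
		      node))
		goto use_dfl;

	return !cpumask_equal(cpumask, attrs->cpumask);

use_dfl:
	cpumask_copy(cpumask, attrs->cpumask);
	return false;
}

If that is close to what you meant, I can fold it in here and re-run the
hot-add tests with it.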
>>> The only way offlining can lead to this failure is when wq numa
>>> possible cpu mask is a proper subset of the matching online mask.  Can
>>> you please print out the numa online cpu and wq_numa_possible_cpumask
>>> masks and verify that online stays within the possible for each node?
>>> If not, the ppc arch init code needs to be updated so that cpu <->
>>> node binding is establish for all possible cpus on boot.  Note that
>>> this isn't a requirement coming solely from wq.  All node affine (thus
>>> percpu) allocations depend on that.
>>
>> The ppc arch init code already records all nodes used by the CPUs visible in
>> the device-tree at boot time into the possible and online node bindings.  The
>> problem here occurs when we hot-add new CPUs to the powerpc system -- they may
>> require nodes that are mentioned by the VPHN hcall, but which were not used
>> at boot time.
>
> We need all the possible (so, for cpus which aren't online yet too)
> CPU -> node mappings to be established on boot.  This isn't just a
> requirement from workqueue.  We don't have any synchronization
> regarding cpu <-> numa mapping in memory allocation paths either.
>
>> I will run a test that dumps these masks later this week to try to provide
>> the information that you are interested in.

(The dump I have in mind is the short debug loop at the very end of this
mail.)

>> Right now we are having a discussion on another thread as to how to properly
>> set the possible node mask at boot given that there is no mechanism to hot-add
>> nodes to the system.  The latest idea appears to be adding another property
>> or two to define the maximum number of nodes that should be added to the
>> possible / online node masks to allow for dynamic growth after boot.
>
> I have no idea about the specifics of ppc but at least the code base
> we have currently expect all possible cpus and nodes and their
> mappings to be established on boot.

Hopefully, the new properties will fix the holes in the current
implementation with regard to hot-add.

> Thanks.
>

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line:  363-5196
External: (512) 286-5196
Cell:     (512) 466-0650
mwb@linux.vnet.ibm.com
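P.S.  For completeness, the mask dump mentioned above is just a
throwaway debug loop along the lines of the sketch below, dropped into
kernel/workqueue.c since wq_numa_possible_cpumask is static there.  The
helper name is mine, and it assumes wq_numa_init() has already run and
enabled NUMA affinity:

/* throwaway debug helper -- sketch only, not for merging */
static void wq_debug_dump_node_masks(void)
{
	int node;

	for_each_node(node) {
		/* %*pbl + cpumask_pr_args() prints each mask as a cpu list */
		pr_info("node %d: online %*pbl, wq possible %*pbl\n",
			node, cpumask_pr_args(cpumask_of_node(node)),
			cpumask_pr_args(wq_numa_possible_cpumask[node]));

		/* online should always stay within possible for the node */
		WARN_ON(!cpumask_subset(cpumask_of_node(node),
					wq_numa_possible_cpumask[node]));
	}
}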