Message-ID: <2f716f45-f8c6-a078-6cfc-b4fb5ef74cd5@linux.ibm.com>
Date: Mon, 25 Apr 2022 14:24:15 +0530
Subject: Re: [PATCH v2 0/5] mm: demotion: Introduce new node state N_DEMOTION_TARGETS
From: Aneesh Kumar K V
To: "ying.huang@intel.com", Jagdish Gediya, Wei Xu, Yang Shi, Dave Hansen,
 Dan Williams, Davidlohr Bueso
Cc: Linux MM, Linux Kernel Mailing List, Andrew Morton, Baolin Wang,
 Greg Thelen, Michal Hocko, Brice Goglin
References: <610ccaad03f168440ce765ae5570634f3b77555e.camel@intel.com>
 <8e31c744a7712bb05dbf7ceb2accf1a35e60306a.camel@intel.com>
 <78b5f4cfd86efda14c61d515e4db9424e811c5be.camel@intel.com>
 <200e95cf36c1642512d99431014db8943fed715d.camel@intel.com>
 <8735i1zurt.fsf@linux.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 4/25/22 1:39 PM, Aneesh Kumar K V wrote:
> On 4/25/22 11:40 AM, ying.huang@intel.com wrote:
>> On Mon, 2022-04-25 at 09:20 +0530, Aneesh Kumar K.V wrote:
>>> "ying.huang@intel.com" writes:
>>>
>>>> Hi, All,
>>>>
>>>> On Fri, 2022-04-22 at 16:30 +0530, Jagdish Gediya wrote:
>>>>
>>>> [snip]
>>>>
>>>>> I think it is necessary to either have per node demotion targets
>>>>> configuration or the user space interface supported by this patch
>>>>> series. As we don't have clear consensus on how the user interface
>>>>> should look like, we can defer the per node demotion target set
>>>>> interface to future until the real need arises.
>>>>>
>>>>> Current patch series sets N_DEMOTION_TARGET from dax device kmem
>>>>> driver, it may be possible that some memory node desired as demotion
>>>>> target is not detected in the system from dax-device kmem probe path.
>>>>>
>>>>> It is also possible that some of the dax-devices are not preferred as
>>>>> demotion target e.g. HBM, for such devices, node shouldn't be set to
>>>>> N_DEMOTION_TARGETS. In future, support should be added to distinguish
>>>>> such dax-devices and not mark them as N_DEMOTION_TARGETS from the
>>>>> kernel, but for now this user space interface will be useful to avoid
>>>>> such devices as demotion targets.
>>>>>
>>>>> We can add read only interface to view per node demotion targets
>>>>> from /sys/devices/system/node/nodeX/demotion_targets, remove
>>>>> duplicated /sys/kernel/mm/numa/demotion_target interface and instead
>>>>> make /sys/devices/system/node/demotion_targets writable.
>>>>>
>>>>> Huang, Wei, Yang,
>>>>> What do you suggest?
>>>>
>>>> We cannot remove a kernel ABI in practice.  So we need to make it right
>>>> at the first time.  Let's try to collect some information for the
>>>> kernel ABI definition.
>>>>
>>>> The below is just a starting point, please add your requirements.
>>>>
>>>> 1. Jagdish has some machines with DRAM only NUMA nodes, but they don't
>>>> want to use that as the demotion targets.  But I don't think this is an
>>>> issue in practice for now, because demote-in-reclaim is disabled by
>>>> default.
>>>
>>> It is not just that the demotion can be disabled. We should be able to
>>> use demotion on a system where we can find DRAM only NUMA nodes. That
>>> cannot be achieved by /sys/kernel/mm/numa/demotion_enabled. It needs
>>> something similar to N_DEMOTION_TARGETS
>>>
>>
>> Can you show NUMA information of your machines with DRAM-only nodes and
>> PMEM nodes?  We can try to find the proper demotion order for the
>> system.  If you can not show it, we can defer N_DEMOTION_TARGETS until
>> the machine is available.
>
>
> Sure will find one such config. As you might have noticed this is very
> easy to have in a virtualization setup because the hypervisor can assign
> memory to a guest VM from a numa node that doesn't have CPU assigned to
> the same guest. This depends on the other guest VM instance config
> running on the system. So on any virtualization config that has got
> persistent memory attached, this can become an easy config to end up with.
>
>

something like this

$ numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 14272 MB
node 0 free: 13392 MB
node 1 cpus:
node 1 size: 2028 MB
node 1 free: 1971 MB
node distances:
node   0   1
  0:  10  40
  1:  40  10
$ cat /sys/bus/nd/devices/dax0.0/target_node
2
$
# cd /sys/bus/dax/drivers/
:/sys/bus/dax/drivers# ls
device_dax  kmem
:/sys/bus/dax/drivers# cd device_dax/
:/sys/bus/dax/drivers/device_dax# echo dax0.0 > unbind
:/sys/bus/dax/drivers/device_dax# echo dax0.0 > ../kmem/new_id
:/sys/bus/dax/drivers/device_dax# numactl -H
available: 3 nodes (0-2)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 14272 MB
node 0 free: 13380 MB
node 1 cpus:
node 1 size: 2028 MB
node 1 free: 1961 MB
node 2 cpus:
node 2 size: 0 MB
node 2 free: 0 MB
node distances:
node   0   1   2
  0:  10  40  80
  1:  40  10  80
  2:  80  80  10
:/sys/bus/dax/drivers/device_dax#
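
To check other machines for the same kind of topology, the `numactl -H`
output above can be scanned for nodes that have no CPUs. The following is a
minimal sketch and is not part of the patch series: the function names
`parse_numactl` and `cpuless_nodes` are made up for illustration, and the
parser assumes the output layout shown in this mail.

```python
import re

# Second `numactl -H` listing from this mail (distance table omitted).
SAMPLE = """\
available: 3 nodes (0-2)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 14272 MB
node 0 free: 13380 MB
node 1 cpus:
node 1 size: 2028 MB
node 1 free: 1961 MB
node 2 cpus:
node 2 size: 0 MB
node 2 free: 0 MB
"""


def parse_numactl(text):
    """Map node id -> {"cpus": [...], "size_mb": int} (hypothetical helper)."""
    nodes = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"node (\d+) cpus:(.*)$", line)
        if m:
            cpus = [int(c) for c in m.group(2).split()]
            nodes.setdefault(int(m.group(1)), {})["cpus"] = cpus
            continue
        m = re.match(r"node (\d+) size: (\d+) MB$", line)
        if m:
            nodes.setdefault(int(m.group(1)), {})["size_mb"] = int(m.group(2))
    return nodes


def cpuless_nodes(nodes):
    """Return the sorted ids of nodes whose CPU list is empty."""
    return sorted(n for n, info in nodes.items() if not info.get("cpus"))


if __name__ == "__main__":
    nodes = parse_numactl(SAMPLE)
    for n in cpuless_nodes(nodes):
        print(f"node {n}: no CPUs, {nodes[n].get('size_mb', 0)} MB")
```

On the listing above this reports node 1 (the DRAM-only node) and node 2
(the dax/kmem node), which is the mix of CPU-less nodes being discussed in
this thread.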