From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.6 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 74B67C3A5A4 for ; Fri, 30 Aug 2019 08:39:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 3D0E4238C5 for ; Fri, 30 Aug 2019 08:39:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1567154370; bh=hEAMnGt+2O/MF2oakFpHP7y4b1PWoN2RHx1H+cNKtng=; h=Date:From:To:Cc:Subject:References:In-Reply-To:List-ID:From; b=Mgzfk3KFkeqd/Wxzyxgh0GnXvrYEEt+QWG2q7PeOc/YMB94DpH8JzUaM47GY8nIyx Bk/NbwirHhQWk+SrZdYhKwxYar/FF8nxCOOybPk4wK1AH4kQTWnCn2Yv+odz1FVAhf YlgAr5QSNfLN9zpzkkjjZ9TyG0O32HpZtKGigeM4= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727410AbfH3Ij3 (ORCPT ); Fri, 30 Aug 2019 04:39:29 -0400 Received: from mx2.suse.de ([195.135.220.15]:54416 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726495AbfH3Ij2 (ORCPT ); Fri, 30 Aug 2019 04:39:28 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id DEC91AF9F; Fri, 30 Aug 2019 08:39:26 +0000 (UTC) Date: Fri, 30 Aug 2019 10:39:25 +0200 From: Michal Hocko To: Yunsheng Lin Cc: will@kernel.org, akpm@linux-foundation.org, rppt@linux.ibm.com, anshuman.khandual@arm.com, adobriyan@gmail.com, cai@lca.pw, robin.murphy@arm.com, tglx@linutronix.de, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linuxarm@huawei.com Subject: Re: [PATCH] arm64: numa: check the node id before accessing node_to_cpumask_map Message-ID: <20190830083925.GV28313@dhcp22.suse.cz> References: <1567131991-189761-1-git-send-email-linyunsheng@huawei.com> <20190830055528.GO28313@dhcp22.suse.cz> <49b86da7-f114-27c2-463a-9bf5082ac197@huawei.com> <20190830064421.GS28313@dhcp22.suse.cz> <740cae36-f1a9-4d20-e4ea-3100076de533@huawei.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <740cae36-f1a9-4d20-e4ea-3100076de533@huawei.com> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri 30-08-19 16:08:14, Yunsheng Lin wrote: > On 2019/8/30 14:44, Michal Hocko wrote: > > On Fri 30-08-19 14:35:26, Yunsheng Lin wrote: > >> On 2019/8/30 13:55, Michal Hocko wrote: > >>> On Fri 30-08-19 10:26:31, Yunsheng Lin wrote: > >>>> Some buggy bios may not set the device' numa id, and dev_to_node > >>>> will return -1, which may cause global-out-of-bounds error > >>>> detected by KASAN. > >>> > >>> Why should we workaround a buggy bios like that? Is it so widespread and > >>> no BIOS update available? Also, why is this arm64 specific? > >> > >> For our case, there is BIOS update available. I just thought it might > >> be better to protect from this case when BIOS has not implemented the > >> device' numa id setting feature or the feature from BIOS has some bug. > >> > >> It is not arm64 specific, right now I only have arm64 board. If it is > >> ok to protect this from the buggy BIOS, maybe all other arch can be > >> changed too. > > > > If we are to really care then this should be consistent among > > architectures IMHO. But I am not really sure this is really worth it. > > The code is quite old and I do not really remember any reports. > > It is only detected by enabling KASAN, the system seems to run fine without > any visible error if KASAN is disabled. Maybe there is why no report has > been seen? > > Also according to Section 6.2.14 from ACPI spec 6.3 [1], the setting of proximity > domain is optional, as below: > > This optional object is used to describe proximity domain > associations within a machine. _PXM evaluates to an integer > that identifies a device as belonging to a Proximity Domain > defined in the System Resource Affinity Table (SRAT). > > > Do you think it is ok to resend the fix with above clarification and below > log: OK, if the specification really allows to provide NUMA_NO_NODE (-1) then the code really has to be prepared for that. And ideally all arches should deal with that. > [ 42.970381] ================================================================== > [ 42.977595] BUG: KASAN: global-out-of-bounds in __bitmap_weight+0x48/0xb0 > [ 42.984370] Read of size 8 at addr ffff20008cdf8790 by task kworker/0:1/13 > [ 42.991230] > [ 42.992712] CPU: 0 PID: 13 Comm: kworker/0:1 Tainted: G O 5.2.0-rc4-g8bde06a-dirty #3 > [ 43.001830] Hardware name: Huawei TaiShan 2280 V2/BC82AMDA, BIOS TA BIOS 2280-A CS V2.B050.01 08/08/2019 > [ 43.011298] Workqueue: events work_for_cpu_fn > [ 43.015643] Call trace: > [ 43.018078] dump_backtrace+0x0/0x1e8 > [ 43.021727] show_stack+0x14/0x20 > [ 43.025031] dump_stack+0xc4/0xfc > [ 43.028335] print_address_description+0x178/0x270 > [ 43.033113] __kasan_report+0x164/0x1b8 > [ 43.036936] kasan_report+0xc/0x18 > [ 43.040325] __asan_load8+0x84/0xa8 > [ 43.043801] __bitmap_weight+0x48/0xb0 > [ 43.047552] hclge_init_ae_dev+0x988/0x1e78 [hclge] > [ 43.052418] hnae3_register_ae_dev+0xcc/0x278 [hnae3] > [ 43.057467] hns3_probe+0xe0/0x120 [hns3] > [ 43.061464] local_pci_probe+0x74/0xf0 > [ 43.065200] work_for_cpu_fn+0x2c/0x48 > [ 43.068937] process_one_work+0x3c0/0x878 > [ 43.072934] worker_thread+0x400/0x670 > [ 43.076670] kthread+0x1b0/0x1b8 > [ 43.079885] ret_from_fork+0x10/0x18 > [ 43.083446] > [ 43.084925] The buggy address belongs to the variable: > [ 43.090052] numa_distance+0x30/0x40 > [ 43.093613] > [ 43.095091] Memory state around the buggy address: > [ 43.099870] ffff20008cdf8680: fa fa fa fa 04 fa fa fa fa fa fa fa 00 00 fa fa > [ 43.107078] ffff20008cdf8700: fa fa fa fa 04 fa fa fa fa fa fa fa 00 fa fa fa > [ 43.114286] >ffff20008cdf8780: fa fa fa fa 00 00 00 00 00 00 00 00 fa fa fa fa > [ 43.121494] ^ > [ 43.125230] ffff20008cdf8800: 01 fa fa fa fa fa fa fa 04 fa fa fa fa fa fa fa > [ 43.132439] ffff20008cdf8880: fa fa fa fa fa fa fa fa 00 00 fa fa fa fa fa fa > [ 43.139646] ================================================================== > > > > > >>>> This patch changes cpumask_of_node to return cpu_none_mask if the > >>>> node is not valid, and sync the cpumask_of_node between the > >>>> cpumask_of_node function in numa.h and numa.c. > >>> > >>> Why? > >> > >> When CONFIG_DEBUG_PER_CPU_MAPS is defined, the cpumask_of_node() in > >> numa.c is used, if not, the cpumask_of_node() in numa.h is used. > >> > >> I am not sure why there is difference between them, and it is there > >> when since the below commit: > >> 1a2db300348b ("arm64, numa: Add NUMA support for arm64 platforms.") > >> > >> I synced them to keep them consistent whether CONFIG_DEBUG_PER_CPU_MAPS > >> is defined. > > > > Such a change should be made in a separate patch with a full > > clarification/justification. From the above it is still not clear why > > this is needed though. > > Ok. > > How about: > > Currently there are different implementations of cpumask_of_node() depend > on the arch, for example: > > ia64: > #define cpumask_of_node(node) ((node) == -1 ? \ > cpu_all_mask : \ > &node_to_cpu_mask[node]) > > > alpha: > static const struct cpumask *cpumask_of_node(int node) > { > int cpu; > > if (node == NUMA_NO_NODE) > return cpu_all_mask; > > cpumask_clear(&node_to_cpumask_map[node]); > > for_each_online_cpu(cpu) { > if (cpu_to_node(cpu) == node) > cpumask_set_cpu(cpu, node_to_cpumask_map[node]); > } > > return &node_to_cpumask_map[node]; > } > > Even for the same arch, there are two implementations depend on the > CONFIG_DEBUG_PER_CPU_MAPS configuration. > > arm64/x86 without CONFIG_DEBUG_PER_CPU_MAPS: > static inline const struct cpumask *cpumask_of_node(int node) > { > return node_to_cpumask_map[node]; > } > > arm64/x86 with CONFIG_DEBUG_PER_CPU_MAPS: > const struct cpumask *cpumask_of_node(int node) > { > if (WARN_ON(node >= nr_node_ids)) > return cpu_none_mask; > > if (WARN_ON(node_to_cpumask_map[node] == NULL)) > return cpu_online_mask; > > return node_to_cpumask_map[node]; > } > > It seems the cpumask_of_node with CONFIG_DEBUG_PER_CPU_MAPS is used > to catch the erorr case and give a warning to user when node id is not > valid. Yeah the config help text config DEBUG_PER_CPU_MAPS bool "Debug access to per_cpu maps" depends on DEBUG_KERNEL depends on SMP help Say Y to verify that the per_cpu map being accessed has been set up. This adds a fair amount of code to kernel memory and decreases performance. Say N if unsure. suggests that this is intentionally hidden behind a config so a normal path shouldn't really duplicate it. If those checks make sense in general then the config option should be dropped I think. -- Michal Hocko SUSE Labs