From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752006AbbLCR6b (ORCPT ); Thu, 3 Dec 2015 12:58:31 -0500
Received: from relay1.sgi.com ([192.48.180.66]:54155 "EHLO relay.sgi.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751648AbbLCR6a (ORCPT ); Thu, 3 Dec 2015 12:58:30 -0500
Reply-To:
Subject: [PATCH] drivers: memory: check for missing sections when testing zones
References: <1449068821-9870-1-git-send-email-sjennings@variantweb.net>
 <1449068821-9870-3-git-send-email-sjennings@variantweb.net>
 <20151202144556.c21211967d835f5607a909bb@linux-foundation.org>
To: Andrew Morton , Seth Jennings
CC: , Greg Kroah-Hartman , Russ Anderson ,
From: Andrew Banman
Message-ID: <566082D6.3070905@sgi.com>
Date: Thu, 3 Dec 2015 11:58:46 -0600
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0
MIME-Version: 1.0
In-Reply-To: <20151202144556.c21211967d835f5607a909bb@linux-foundation.org>
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: 7bit
X-Originating-IP: [128.162.233.132]
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

test_pages_in_a_zone does not account for the possibility of missing
sections in the given pfn range. Since pfn_valid_within always returns
1 when CONFIG_HOLES_IN_ZONE is not set, invalid pfns from missing
sections will pass the test, resulting in a kernel oops.

This is remedied by simply checking for the presence of the pfn's
section. We don't have to remove the pfn_valid_within optimization.

The patch also prevents a crash from offlining memory devices with
missing sections. Despite this, it's probably best to keep
[PATCH 3/3] drivers: memory: prohibit offlining of memory blocks with
missing sections
because missing sections may indicate other problems, like overlapping
mem blocks and who knows what else (see the discussion at BZ 107781).
---
 mm/memory_hotplug.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 67d488a..74f5bcd 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1383,6 +1383,9 @@ int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
 	     pfn < end_pfn;
 	     pfn += MAX_ORDER_NR_PAGES) {
 		i = 0;
+		/* Make sure the memory section is present */
+		if (!present_section_nr(pfn_to_section_nr(pfn)))
+			continue;
 		/* This is just a CONFIG_HOLES_IN_ZONE check.*/
 		while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i))
 			i++;
-- 
1.7.12.4

On 12/02/2015 04:45 PM, Andrew Morton wrote:
> On Wed, 2 Dec 2015 09:07:01 -0600 Seth Jennings wrote:
>
>> bdee237c and 982792c7 introduced large block sizes for x86.
>> This made it possible to have multiple sections per memory
>> block where previously, there was only ever one section
>> per block.
>>
>> Since blocks consist of contiguous ranges of sections, there
>> can be holes in the blocks where sections are not present.
>> If one attempts to offline such a block, a crash occurs since
>> the code is not designed to deal with this.
>>
>> This patch is a quick fix to guard against the crash by
>> not allowing blocks with non-present sections to be offlined.
>>
>> ...
>>
>> --- a/drivers/base/memory.c
>> +++ b/drivers/base/memory.c
>> @@ -303,6 +303,10 @@ static int memory_subsys_offline(struct device *dev)
>>  	if (mem->state == MEM_OFFLINE)
>>  		return 0;
>>
>> +	/* Can't offline block with non-present sections */
>> +	if (mem->section_count != sections_per_block)
>> +		return -EINVAL;
>> +
>>  	return memory_block_change_state(mem, MEM_OFFLINE, MEM_ONLINE);
>>  }
>
> [3/3] fixes a kernel crash so I've tagged it for -stable and shall move
> it ahead of [1/2] and [2/2], which are merely cleanups.
>
> This assumes that [3/3] is independent of the other two patches.  I'll
> eat my hat if it isn't.
>
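
For anyone following along without a test box, here is a rough
user-space sketch of what the present-section check changes in the
scan loop. It is only a model and not kernel code: every toy_ name and
all of the sizes are made up for illustration, and toy_pfn_valid_within
simply returns true to mimic a build without CONFIG_HOLES_IN_ZONE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy geometry; a real kernel derives these from its config. */
#define TOY_PAGES_PER_SECTION 8UL
#define TOY_CHUNK_PAGES       4UL	/* stands in for MAX_ORDER_NR_PAGES */
#define TOY_NR_SECTIONS       4UL

/* Stand-in for present_section_nr(): one flag per section. */
static bool toy_section_present[TOY_NR_SECTIONS];

/* Stand-in for pfn_to_section_nr(). */
static unsigned long toy_pfn_to_section_nr(unsigned long pfn)
{
	return pfn / TOY_PAGES_PER_SECTION;
}

/* Models pfn_valid_within() without CONFIG_HOLES_IN_ZONE: always true. */
static bool toy_pfn_valid_within(unsigned long pfn)
{
	(void)pfn;
	return true;
}

/*
 * Count the chunks in [start_pfn, end_pfn) that the scan would go on to
 * inspect. With check_present false, chunks that sit in missing sections
 * are counted too; in the real code that is where an invalid page would
 * be touched.
 */
static size_t toy_chunks_inspected(unsigned long start_pfn,
				   unsigned long end_pfn,
				   bool check_present)
{
	size_t inspected = 0;
	unsigned long pfn, i;

	assert(start_pfn <= end_pfn);
	assert(end_pfn <= TOY_NR_SECTIONS * TOY_PAGES_PER_SECTION);

	for (pfn = start_pfn; pfn < end_pfn; pfn += TOY_CHUNK_PAGES) {
		/* The added check: skip chunks whose section is missing. */
		if (check_present &&
		    !toy_section_present[toy_pfn_to_section_nr(pfn)])
			continue;
		i = 0;
		while (i < TOY_CHUNK_PAGES && !toy_pfn_valid_within(pfn + i))
			i++;
		if (i == TOY_CHUNK_PAGES)
			continue;
		inspected++;
	}
	return inspected;
}
```

With sections 0, 2 and 3 marked present and section 1 left missing,
scanning pfns 0 through 31 inspects all 8 chunks without the check and
only 6 with it, since the two chunks in section 1 are skipped.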