Re: [PATCH] ARM: sparsemem: Enable CONFIG_HOLES_IN_ZONE config option for SparseMem and HAS_HOLES_MEMORYMODEL for linux-3.0.

From: Kautuk Consul <consul.kautuk@gmail.com>
To: Mel Gorman <mgorman@suse.de>
Cc: Russell King <rmk@arm.linux.org.uk>,
	linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org
Subject: Re: [PATCH] ARM: sparsemem: Enable CONFIG_HOLES_IN_ZONE config option for SparseMem and HAS_HOLES_MEMORYMODEL for linux-3.0.
Date: Fri, 5 Aug 2011 17:10:38 +0530	[thread overview]
Message-ID: <CAFPAmTS8-qQ4ZzBeJeKuG2jvyyfkwnqbtSjPX2TLddDPtSmF7g@mail.gmail.com> (raw)
In-Reply-To: <20110805084742.GU19099@suse.de>

Ok, I analyzed the code and it seems that this alignment problem has
been solved by the changes made
to the free_unused_memmap() code in arch/arm/mm/init.c.

I backported those changes to free_unused_memmap_node() in
linux-2.6.35.9 and I don't see any more
crashes. This solves my problem.

Thanks for all the help.

On Fri, Aug 5, 2011 at 2:17 PM, Mel Gorman <mgorman@suse.de> wrote:
> On Fri, Aug 05, 2011 at 11:27:21AM +0530, Kautuk Consul wrote:
>> Hi Mel,
>>
>> Please find my comments inline to the email below.
>>
>> 2 general questions:
>> i)    If an email chain such as this leads to another kernel patch for
>> the same problem, do I need to
>>       create a new email chain for that ?
>
> Yes. I believe it's easier on the maintainer if a new thread is started
> with the new patch leader summarising relevant information from the
> discussion. A happy maintainer makes the day easier.
>
>> ii)  Sorry about my formatting problems. However, text such as
>> backtraces and logs tend to wrap
>>       irrespective of whatever gmail settings/browser I try. Any
>> pointers here ?
>>
>
> I don't use gmail so I do not have any suggestions other than using a
> different mailer. There are no suggestions in
> Documentation/email-clients.txt on how to deal with gmail.
>
>> On Thu, Aug 4, 2011 at 3:39 PM, Mel Gorman <mgorman@suse.de> wrote:
>> > On Thu, Aug 04, 2011 at 03:06:39PM +0530, Kautuk Consul wrote:
>> >> Hi Mel,
>> >>
>> >> My ARM system has 2 memory banks which have the following 2 PFN ranges:
>> >> 60000-62000 and 70000-7ce00.
>> >>
>> >> My SECTION_SIZE_BITS is #defined to 23.
>> >>
>> >
>> > So bank 0 is 4 sections and bank 1 is 26 sections with the last section
>> > incomplete.
>> >
>> >> I am altering the ranges via the following kind of pseudo-code in the
>> >> arch/arm/mach-*/mach-*.c file:
>> >> meminfo->bank[0].size -= (1 << 20)
>> >> meminfo->bank[1].size -= (1 << 20)
>> >>
>> >
>> > Why are you taking 1M off each bank? I could understand aligning the
>> > banks to a section size at least.
>>
>> The reason I am doing this is that one of our embedded boards actually
>> has this problem, due
>> to which we see this kernel crash. I am merely reproducing this
>> problem by performing this step.
>>
>
> Ah, that makes sense.
>
>> >> <SNIP>
>> >> The reason why we cannot expect the 0x61fff end_page->flags to contain
>> >> a valid zone number is:
>> >> memmap_init_zone() initializes the zone number of all pages for a zone
>> >> via the set_page_links() inline function.
>> >> For the end_page (whose PFN is 0x61fff), set_page_links() cannot be
>> >> possibly called, as the zones are simply not aware of of PFNs above
>> >> 0x61f00 and below 0x70000.
>> >>
>> >
>> > Can you ensure that the ranges passed into free_area_init_node()
>> > are MAX_ORDER aligned as this would initialise the struct pages. You
>> > may have already seen that care is taken when freeing memmap that it
>> > is aligned to MAX_ORDER in free_unused_memmap() in ARM.
>> >
>>
>> Will this work ? My doubt arises from the fact that there is only one
>> zone on the entire
>> system which contains both memory banks.
>
> That is a common situation. It's why present_pages and spanned_pages
> in a zone can differ. As long as valid memmap is MAX_ORDER-aligned, it's
> fine.
>
>> The crash arises at the PFN 0x61fff, which will not be covered by such
>> a check, as this function
>
> No, it won't be covered by the range check. However, the memmap will be
> initialised so even though the page is outside a valid bank of memory,
> it'll still resolve to the correct zone. The struct page will be marked
> Reserved so it'll never be used.
>
>> will try to act on the entire zone, which is the PFN range:
>> 60000-7cd00, including the holes within as
>> all of this RAM falls into the same node and zone.
>> ( Please correct me if I am wrong about this. )
>>
>> I tried aligning the end parameter in the memory_present() function
>> which is called separately
>> for each memory bank.
>> I tried the following change in memory_present() as well as
>> mminit_validate_memodel_limits():
>> end &= ~(pageblock_nr_pages-1);
>> But, in this case, the board simply does not boot up. I think that
>> will then require some change in the
>> arch/arm code which I think would be an arch-specific solution to a
>> possibly generic problem.
>>
>
> I do not believe this is a generic problem. Machines have
> holes in the zone all the time and this bug does not trigger.
> mminit_validate_memodel_limits() is the wrong place to make a change.
> Look more towards where free_area_init_node() gets called to initialise
> memmap.
>
>> >> The (end >= zone->zone_start_pfn + zone->spanned_pages) in
>> >> move_freepages_block() does not stop this crash from happening as both
>> >> our memory banks are in the same zone and the empty space within them
>> >> is accomodated into this zone via the CONFIG_SPARSEMEM
>> >> config option.
>> >>
>> >> When we enable CONFIG_HOLES_IN_ZONE we survive this BUG_ON as well as
>> >> any other BUG_ONs in the loop in move_freepages() as then the
>> >> pfn_valid_within()/pfn_valid() function takes care of this
>> >> functionality, especially in the case where the newly introduced
>> >> CONFIG_HAVE_ARCH_PFN_VALID is
>> >> enabled.
>> >>
>> >
>> > This is an expensive option in terms of performance. If Russell
>> > wants to pick it up, I won't object but I would strongly suggest that
>> > you solve this problem by ensuring that memmap is initialised on a
>> > MAX_ORDER-aligned boundaries as it'll perform better.
>> >
>>
>> I couldn't really locate a method in the kernel wherein we can
>> validate a pageblock(1024 pages for my
>> platform) with respect to the memory banks on that system.
>>
>
> I suspect what you're looking for is somewhere in arch/arm/mm/init.c .
> I'm afraid I didn't dig through the ARM memory initialisation
> code to see where it could be done but if it was me, I'd be looking at
> how free_area_init_node() is called.
>
>> How about this :
>> We implement an arch_is_valid_pageblock() function, controlled by a
>> new config option
>> CONFIG_ARCH_HAVE IS_VALID_PAGEBLOCK.
>> This arch function will simply check whether this pageblock is valid
>> or not, in terms of arch-specific
>> memory banks or by using the memblock APIs depending on CONFIG_HAVE_MEMBLOCK.
>> We can modify the memmap_init_zone() function so that an outer loop
>> works in measures of
>> pageblocks thus enabling us to avoid invalid pageblocks.
>
> That seems like massive overkill for what should be an alignment problem
> when initialisating memmap. It's adding complexity that is similar to
> HOLES_IN_ZONE with very little gain.
>
> When it gets down to it, I'd even prefer deleting the BUG_ON as
> the PageBuddy check over such a solution. However, the BUG_ON is
> there because alignment to MAX_ORDER is expected so it is a valid
> sanity check. There would need to be good evidence that initialising
> memmap to MAX_ORDER alignment was somehow impossible.
>
> --
> Mel Gorman
> SUSE Labs
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>