All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christoph Lameter <cl@linux.com>
To: Jiang Liu <jiang.liu@huawei.com>
Cc: Jiang Liu <liuj97@gmail.com>, Pekka Enberg <penberg@kernel.org>,
	Matt Mackall <mpm@selenic.com>, Mel Gorman <mgorman@suse.de>,
	Yinghai Lu <yinghai@kernel.org>, Tony Luck <tony.luck@intel.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	David Rientjes <rientjes@google.com>,
	Minchan Kim <minchan@kernel.org>,
	Keping Chen <chenkeping@huawei.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 1/4] mm: introduce a safer interface to check whether a page is managed by SLxB
Date: Fri, 6 Jul 2012 08:50:09 -0500 (CDT)	[thread overview]
Message-ID: <alpine.DEB.2.00.1207060841001.26441@router.home> (raw)
In-Reply-To: <4FF693F6.8070505@huawei.com>

On Fri, 6 Jul 2012, Jiang Liu wrote:

> 	Originally the patch is aimed to fix an issue encountered when
> hot-removing a hot-added memory device. Currently memory hotplug is only
> supported with SPARSE memory model. After offlining all pages of a memory
> section, we need to free resources used by "struct mem_section" itself.
> That is to free section_mem_map and pageblock_flags. For memory section
> created at boot time, section_mem_map and pageblock_flags are allocated
> from bootmem. For memory section created at runtime, section_mem_map
> and pageblock_flags are allocated from slab. So when freeing these
> resources, we use PageSlab() to tell whether there are allocated from slab.
> So free_section_usemap() has following code snippet.
> {
>         usemap_page = virt_to_page(usemap);
>         /*
>          * Check to see if allocation came from hot-plug-add
>          */
>         if (PageSlab(usemap_page)) {

Change this to PageSlab(usemap_page) || PageCompound(usemap_page) and then
the code segment will work. Fallback to the page allocator always implied
the use of compound pages. It would be cleaner if memory hotplug had an
indicator which allocation mechanism was used and would use the
corresponding free action. Slab allocators could put multiple objects into
the slab page (if the structures are sufficiently small). So this is not
that good of a solution.


> 	And when fixing this issue, we found some other usages of PageSlab() may
> have the same issue. For example:
> 	1) /proc/kpageflags and /proc/kpagecount may return incorrect result for
> pages allocated by slab.

Ok then the compound page handling is broken in those.

> 	2) DRBD has following comments. At first glance, it seems that it's
> 	dangerous if PageSlab() to return false for pages allocated by slab.

Again the pages that do not have PageSlab set were not allocated using a
slab allocator. They were allocated by calls to the page allocator.

> 	(With more thinking, the comments is a little out of date because now
> 	put_page/get_page already correctly handle compound pages, so it should
> 	be OK to send pages allocated from slab.)

AFAICT they always handled compound pages correctly.

> 	3) show_mem() on ARM and unicore32 reports much less pages used by slab
> 	if SLUB/SLOB is used instead of SLAB because SLUB/SLOB doesn't mark big
> 	compound pages with PG_slab flag.

Right. That is because SLUB/SLOB lets the page allocator directly
allocator large structures where it would not make sense to use the slab
allocators. The main purpose of the slab allocators is to allocate
objects in fractions of pages. This does not seem to be a use case for
slab objects. Maybe it would be better to directly call the page allocator
for your large structures?

> 	For example, if the memory backing a "struct resource" structure is
> allocated from bootmem, __release_region() shouldn't free the memory into
> slab allocator, otherwise it will trigger panic as below. This issue is
> reproducible when hot-removing a memory device present at boot time on x86
> platforms. On x86 platforms, e820_reserve_resources() allocates bootmem for
> all physical memory resources present at boot time. Later when those memory
> devices are hot-removed, __release_region() will try to free  memory from
> bootmem into slab, so trigger the panic. And a proposed fix is:

Working out how a certain memory structure was allocated could be most
easily done by setting a flag somewhere instead of checking the page flags
of a page that may potentially include multiple slab objects.

WARNING: multiple messages have this Message-ID (diff)
From: Christoph Lameter <cl@linux.com>
To: Jiang Liu <jiang.liu@huawei.com>
Cc: Jiang Liu <liuj97@gmail.com>, Pekka Enberg <penberg@kernel.org>,
	Matt Mackall <mpm@selenic.com>, Mel Gorman <mgorman@suse.de>,
	Yinghai Lu <yinghai@kernel.org>, Tony Luck <tony.luck@intel.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	David Rientjes <rientjes@google.com>,
	Minchan Kim <minchan@kernel.org>,
	Keping Chen <chenkeping@huawei.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 1/4] mm: introduce a safer interface to check whether a page is managed by SLxB
Date: Fri, 6 Jul 2012 08:50:09 -0500 (CDT)	[thread overview]
Message-ID: <alpine.DEB.2.00.1207060841001.26441@router.home> (raw)
In-Reply-To: <4FF693F6.8070505@huawei.com>

On Fri, 6 Jul 2012, Jiang Liu wrote:

> 	Originally the patch is aimed to fix an issue encountered when
> hot-removing a hot-added memory device. Currently memory hotplug is only
> supported with SPARSE memory model. After offlining all pages of a memory
> section, we need to free resources used by "struct mem_section" itself.
> That is to free section_mem_map and pageblock_flags. For memory section
> created at boot time, section_mem_map and pageblock_flags are allocated
> from bootmem. For memory section created at runtime, section_mem_map
> and pageblock_flags are allocated from slab. So when freeing these
> resources, we use PageSlab() to tell whether there are allocated from slab.
> So free_section_usemap() has following code snippet.
> {
>         usemap_page = virt_to_page(usemap);
>         /*
>          * Check to see if allocation came from hot-plug-add
>          */
>         if (PageSlab(usemap_page)) {

Change this to PageSlab(usemap_page) || PageCompound(usemap_page) and then
the code segment will work. Fallback to the page allocator always implied
the use of compound pages. It would be cleaner if memory hotplug had an
indicator which allocation mechanism was used and would use the
corresponding free action. Slab allocators could put multiple objects into
the slab page (if the structures are sufficiently small). So this is not
that good of a solution.


> 	And when fixing this issue, we found some other usages of PageSlab() may
> have the same issue. For example:
> 	1) /proc/kpageflags and /proc/kpagecount may return incorrect result for
> pages allocated by slab.

Ok then the compound page handling is broken in those.

> 	2) DRBD has following comments. At first glance, it seems that it's
> 	dangerous if PageSlab() to return false for pages allocated by slab.

Again the pages that do not have PageSlab set were not allocated using a
slab allocator. They were allocated by calls to the page allocator.

> 	(With more thinking, the comments is a little out of date because now
> 	put_page/get_page already correctly handle compound pages, so it should
> 	be OK to send pages allocated from slab.)

AFAICT they always handled compound pages correctly.

> 	3) show_mem() on ARM and unicore32 reports much less pages used by slab
> 	if SLUB/SLOB is used instead of SLAB because SLUB/SLOB doesn't mark big
> 	compound pages with PG_slab flag.

Right. That is because SLUB/SLOB lets the page allocator directly
allocator large structures where it would not make sense to use the slab
allocators. The main purpose of the slab allocators is to allocate
objects in fractions of pages. This does not seem to be a use case for
slab objects. Maybe it would be better to directly call the page allocator
for your large structures?

> 	For example, if the memory backing a "struct resource" structure is
> allocated from bootmem, __release_region() shouldn't free the memory into
> slab allocator, otherwise it will trigger panic as below. This issue is
> reproducible when hot-removing a memory device present at boot time on x86
> platforms. On x86 platforms, e820_reserve_resources() allocates bootmem for
> all physical memory resources present at boot time. Later when those memory
> devices are hot-removed, __release_region() will try to free  memory from
> bootmem into slab, so trigger the panic. And a proposed fix is:

Working out how a certain memory structure was allocated could be most
easily done by setting a flag somewhere instead of checking the page flags
of a page that may potentially include multiple slab objects.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2012-07-06 13:50 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-07-03  3:57 [RFC PATCH 1/4] mm: introduce a safer interface to check whether a page is managed by SLxB Jiang Liu
2012-07-03  3:57 ` Jiang Liu
2012-07-03  3:57 ` [RFC PATCH 2/4] mm: make consistent use of PG_slab flag Jiang Liu
2012-07-03  3:57   ` Jiang Liu
2012-07-05 14:47   ` Christoph Lameter
2012-07-05 14:47     ` Christoph Lameter
2012-07-05 16:15     ` Jiang Liu
2012-07-05 16:15       ` Jiang Liu
2012-07-05 17:37       ` Christoph Lameter
2012-07-05 17:37         ` Christoph Lameter
2012-07-06  8:30         ` Jiang Liu
2012-07-06  8:30           ` Jiang Liu
2012-07-06 13:53           ` Christoph Lameter
2012-07-06 13:53             ` Christoph Lameter
2012-07-03  3:57 ` [RFC PATCH 3/4] SLAB: minor code cleanup Jiang Liu
2012-07-03  3:57   ` Jiang Liu
2012-07-03 10:02   ` Cong Wang
2012-07-03  3:57 ` [RFC PATCH 4/4] mm: change slob's struct page definition to accomodate struct page changes Jiang Liu
2012-07-03  3:57   ` Jiang Liu
2012-07-03 10:22   ` Cong Wang
2012-07-03  9:56 ` [RFC PATCH 1/4] mm: introduce a safer interface to check whether a page is managed by SLxB Cong Wang
2012-07-05 14:45 ` Christoph Lameter
2012-07-05 14:45   ` Christoph Lameter
2012-07-05 15:55   ` Jiang Liu
2012-07-05 15:55     ` Jiang Liu
2012-07-05 17:36     ` Christoph Lameter
2012-07-05 17:36       ` Christoph Lameter
2012-07-06  7:29       ` Jiang Liu
2012-07-06  7:29         ` Jiang Liu
2012-07-06 13:50         ` Christoph Lameter [this message]
2012-07-06 13:50           ` Christoph Lameter
2012-07-06 15:36           ` Jiang Liu
2012-07-06 15:36             ` Jiang Liu
2012-09-04  9:18 ` Wen Congyang
2012-09-04  9:18   ` Wen Congyang
2012-09-04 12:13   ` Jiang Liu
2012-09-04 12:13     ` Jiang Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.00.1207060841001.26441@router.home \
    --to=cl@linux.com \
    --cc=chenkeping@huawei.com \
    --cc=jiang.liu@huawei.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=liuj97@gmail.com \
    --cc=mgorman@suse.de \
    --cc=minchan@kernel.org \
    --cc=mpm@selenic.com \
    --cc=penberg@kernel.org \
    --cc=rientjes@google.com \
    --cc=tony.luck@intel.com \
    --cc=yinghai@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.