linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* 2.6.15-rc1-mm2 -- Bad page state at free_hot_cold_page (in process 'aplay', page c18eef30)
@ 2005-11-20  6:56 Miles Lane
  2005-11-20  8:05 ` Hugh Dickins
  0 siblings, 1 reply; 13+ messages in thread
From: Miles Lane @ 2005-11-20  6:56 UTC (permalink / raw)
  To: Andrew Morton, LKML

[17179671.700000] Bad page state at free_hot_cold_page (in process
'aplay', page c18eef30)
[17179671.700000] flags:0x80000414 mapping:00000000 mapcount:0 count:0
[17179671.700000] Backtrace:
[17179671.700000]  [<c0103de8>] dump_stack+0x1e/0x20
[17179671.700000]  [<c014c775>] bad_page+0x87/0x140
[17179671.700000]  [<c014d12d>] free_hot_cold_page+0x43/0x130
[17179671.700000]  [<c014d224>] free_hot_page+0xa/0xc
[17179671.700000]  [<c0150b53>] __page_cache_release+0x61/0xbf
[17179671.700000]  [<c015074d>] put_page+0x3d/0x82
[17179671.700000]  [<c015e430>] free_page_and_swap_cache+0x24/0x47
[17179671.700000]  [<c01559e4>] zap_pte_range+0x232/0x348
[17179671.700000]  [<c0155bdb>] unmap_page_range+0xe1/0x10b
[17179671.700000]  [<c0155cc4>] unmap_vmas+0xbf/0x237
[17179671.700000]  [<c015a494>] unmap_region+0xb3/0x158
[17179671.700000]  [<c015a7f2>] do_munmap+0xf8/0x138
[17179671.700000]  [<c015a889>] sys_munmap+0x57/0x73
[17179671.700000]  [<c0102e9f>] sysenter_past_esp+0x54/0x75
[17179671.700000] ---------------------------
[17179671.700000] | preempt count: 00000003 ]
[17179671.700000] | 3 level deep critical section nesting:
[17179671.700000] ----------------------------------------
[17179671.700000] .. [<c015a414>] .... unmap_region+0x33/0x158
[17179671.700000] .....[<c015a7f2>] ..   ( <= do_munmap+0xf8/0x138)
[17179671.700000] .. [<c01181dd>] .... kmap_atomic+0x15/0x9c
[17179671.700000] .....[<c01557f2>] ..   ( <= zap_pte_range+0x40/0x348)
[17179671.700000] .. [<c03ca9dc>] .... _spin_lock+0x13/0x6f
[17179671.700000] .....[<c0155809>] ..   ( <= zap_pte_range+0x57/0x348)
[17179671.700000]
[17179671.700000] Hexdump:
[17179671.700000] 000: e4 d8 cc c1 9e ff 64 00 f8 57 a7 c1 b8 ad a0 c1
[17179671.700000] 010: 14 04 00 80 00 00 00 00 ff ff ff ff 00 00 00 00
[17179671.700000] 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[17179671.700000] 030: 00 00 00 00 f0 00 00 00 00 01 10 00 00 02 20 00
[17179671.700000] 040: 14 04 00 80 ff ff ff ff ff ff ff ff 00 00 00 00
[17179671.700000] 050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[17179671.700000] 060: 00 00 00 00 f1 00 00 00 00 01 10 00 00 02 20 00
[17179671.700000] 070: 00 04 00 80 00 00 00 00 00 00 00 00 00 00 00 00
[17179671.700000] 080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[17179671.700000] 090: 00 00 00 00 38 00 00 00 00 01 10 00 00 02 20 00
[17179671.700000] 0a0: 00 04 00 80 00 00 00 00 00 00 00 00 00 00 00 00
[17179671.700000] 0b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[17179671.700000] Trying to fix it up, but a reboot is needed

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2.6.15-rc1-mm2 -- Bad page state at free_hot_cold_page (in process 'aplay', page c18eef30)
  2005-11-20  6:56 2.6.15-rc1-mm2 -- Bad page state at free_hot_cold_page (in process 'aplay', page c18eef30) Miles Lane
@ 2005-11-20  8:05 ` Hugh Dickins
  2005-11-20 18:14   ` Lee Revell
  0 siblings, 1 reply; 13+ messages in thread
From: Hugh Dickins @ 2005-11-20  8:05 UTC (permalink / raw)
  To: Miles Lane; +Cc: Andrew Morton, LKML

On Sat, 19 Nov 2005, Miles Lane wrote:
> [17179671.700000] Bad page state at free_hot_cold_page (in process
> 'aplay', page c18eef30)
> [17179671.700000] flags:0x80000414 mapping:00000000 mapcount:0 count:0

Please let me know if it's not fixed by:

--- 2.6.15-rc1-mm2/sound/core/memalloc.c	2005-11-12 09:01:28.000000000 +0000
+++ linux/sound/core/memalloc.c	2005-11-19 19:03:32.000000000 +0000
@@ -197,6 +197,7 @@ void *snd_malloc_pages(size_t size, gfp_
 
 	snd_assert(size > 0, return NULL);
 	snd_assert(gfp_flags != 0, return NULL);
+	gfp_flags |= __GFP_COMP;	/* compound page lets parts be mapped */
 	pg = get_order(size);
 	if ((res = (void *) __get_free_pages(gfp_flags, pg)) != NULL) {
 		mark_pages(virt_to_page(res), pg);
@@ -241,6 +242,7 @@ static void *snd_malloc_dev_pages(struct
 	snd_assert(dma != NULL, return NULL);
 	pg = get_order(size);
 	gfp_flags = GFP_KERNEL
+		| __GFP_COMP	/* compound page lets parts be mapped */
 		| __GFP_NORETRY /* don't trigger OOM-killer */
 		| __GFP_NOWARN; /* no stack trace print - this call is non-critical */
 	res = dma_alloc_coherent(dev, PAGE_SIZE << pg, dma, gfp_flags);

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2.6.15-rc1-mm2 -- Bad page state at free_hot_cold_page (in process 'aplay', page c18eef30)
  2005-11-20  8:05 ` Hugh Dickins
@ 2005-11-20 18:14   ` Lee Revell
  2005-11-20 19:35     ` Hugh Dickins
  0 siblings, 1 reply; 13+ messages in thread
From: Lee Revell @ 2005-11-20 18:14 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Miles Lane, Andrew Morton, LKML, alsa-devel

(Added alsa-devel to cc:)

Will this change have any ill effects on older kernels?  If not we
should fix it in the ALSA tree right?

Lee

On Sun, 2005-11-20 at 08:05 +0000, Hugh Dickins wrote:
> On Sat, 19 Nov 2005, Miles Lane wrote:
> > [17179671.700000] Bad page state at free_hot_cold_page (in process
> > 'aplay', page c18eef30)
> > [17179671.700000] flags:0x80000414 mapping:00000000 mapcount:0 count:0
> 
> Please let me know if it's not fixed by:
> 
> --- 2.6.15-rc1-mm2/sound/core/memalloc.c	2005-11-12 09:01:28.000000000 +0000
> +++ linux/sound/core/memalloc.c	2005-11-19 19:03:32.000000000 +0000
> @@ -197,6 +197,7 @@ void *snd_malloc_pages(size_t size, gfp_
>  
>  	snd_assert(size > 0, return NULL);
>  	snd_assert(gfp_flags != 0, return NULL);
> +	gfp_flags |= __GFP_COMP;	/* compound page lets parts be mapped */
>  	pg = get_order(size);
>  	if ((res = (void *) __get_free_pages(gfp_flags, pg)) != NULL) {
>  		mark_pages(virt_to_page(res), pg);
> @@ -241,6 +242,7 @@ static void *snd_malloc_dev_pages(struct
>  	snd_assert(dma != NULL, return NULL);
>  	pg = get_order(size);
>  	gfp_flags = GFP_KERNEL
> +		| __GFP_COMP	/* compound page lets parts be mapped */
>  		| __GFP_NORETRY /* don't trigger OOM-killer */
>  		| __GFP_NOWARN; /* no stack trace print - this call is non-critical */
>  	res = dma_alloc_coherent(dev, PAGE_SIZE << pg, dma, gfp_flags);
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2.6.15-rc1-mm2 -- Bad page state at free_hot_cold_page (in process 'aplay', page c18eef30)
  2005-11-20 18:14   ` Lee Revell
@ 2005-11-20 19:35     ` Hugh Dickins
  2005-11-21  6:59       ` Miles Lane
  2005-11-21 11:52       ` Takashi Iwai
  0 siblings, 2 replies; 13+ messages in thread
From: Hugh Dickins @ 2005-11-20 19:35 UTC (permalink / raw)
  To: Lee Revell; +Cc: Miles Lane, Andrew Morton, LKML, alsa-devel

On Sun, 20 Nov 2005, Lee Revell wrote:
> (Added alsa-devel to cc:)
> 
> Will this change have any ill effects on older kernels?

I think not (except 2.6.5 and earlier didn't define __GFP_COMP).

> If not we should fix it in the ALSA tree right?

Probably, but I'm no authority on the ALSA tree.

And suggest you wait a bit, since I haven't yet signed off on the
patch that piece will be a part of.

And what about the patch at the bottom (which I had CC'ed to Karsten
Wiese), is that part of the ALSA tree too?  That case isn't so easy:
the get_page makes a difference, and probably it was right not to do
it before, yet strange it was the only nopage in the tree which didn't
get_page.

Hugh

> Lee
> 
> On Sun, 2005-11-20 at 08:05 +0000, Hugh Dickins wrote:
> > On Sat, 19 Nov 2005, Miles Lane wrote:
> > > [17179671.700000] Bad page state at free_hot_cold_page (in process
> > > 'aplay', page c18eef30)
> > > [17179671.700000] flags:0x80000414 mapping:00000000 mapcount:0 count:0
> > 
> > Please let me know if it's not fixed by:
> > 
> > --- 2.6.15-rc1-mm2/sound/core/memalloc.c	2005-11-12 09:01:28.000000000 +0000
> > +++ linux/sound/core/memalloc.c	2005-11-19 19:03:32.000000000 +0000
> > @@ -197,6 +197,7 @@ void *snd_malloc_pages(size_t size, gfp_
> >  
> >  	snd_assert(size > 0, return NULL);
> >  	snd_assert(gfp_flags != 0, return NULL);
> > +	gfp_flags |= __GFP_COMP;	/* compound page lets parts be mapped */
> >  	pg = get_order(size);
> >  	if ((res = (void *) __get_free_pages(gfp_flags, pg)) != NULL) {
> >  		mark_pages(virt_to_page(res), pg);
> > @@ -241,6 +242,7 @@ static void *snd_malloc_dev_pages(struct
> >  	snd_assert(dma != NULL, return NULL);
> >  	pg = get_order(size);
> >  	gfp_flags = GFP_KERNEL
> > +		| __GFP_COMP	/* compound page lets parts be mapped */
> >  		| __GFP_NORETRY /* don't trigger OOM-killer */
> >  		| __GFP_NOWARN; /* no stack trace print - this call is non-critical */
> >  	res = dma_alloc_coherent(dev, PAGE_SIZE << pg, dma, gfp_flags);

Karsten Wiese <annabellesgarden@yahoo.de>
[PATCH 03/11] unpaged: sound nopage get_page

Something noticed when studying use of VM_RESERVED in different drivers:
snd_usX2Y_hwdep_pcm_vm_nopage omitted to get_page: fixed.

And how did this work before?  Aargh!  That nopage is returning a page
from within a buffer allocated by snd_malloc_pages, which allocates a
high-order page, then does SetPageReserved on each 0-order page within.

That would have worked in 2.6.14, because when the area was unmapped,
PageReserved inhibited put_page.  2.6.15-rc1 removed that inhibition
(while leaving ineffective PageReserveds around for now), but it hasn't
caused trouble because.. we've not been freeing from VM_RESERVED at all.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
---

 sound/usb/usx2y/usx2yhwdeppcm.c |    1 +
 1 files changed, 1 insertion(+)

--- unpaged02/sound/usb/usx2y/usx2yhwdeppcm.c	2005-11-12 09:01:30.000000000 +0000
+++ unpaged03/sound/usb/usx2y/usx2yhwdeppcm.c	2005-11-17 15:10:34.000000000 +0000
@@ -691,6 +691,7 @@ static struct page * snd_usX2Y_hwdep_pcm
 	snd_assert((offset % PAGE_SIZE) == 0, return NOPAGE_OOM);
 	vaddr = (char*)((usX2Ydev_t*)area->vm_private_data)->hwdep_pcm_shm + offset;
 	page = virt_to_page(vaddr);
+	get_page(page);
 
 	if (type)
 		*type = VM_FAULT_MINOR;

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2.6.15-rc1-mm2 -- Bad page state at free_hot_cold_page (in process 'aplay', page c18eef30)
  2005-11-20 19:35     ` Hugh Dickins
@ 2005-11-21  6:59       ` Miles Lane
  2005-11-21 11:52       ` Takashi Iwai
  1 sibling, 0 replies; 13+ messages in thread
From: Miles Lane @ 2005-11-21  6:59 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Lee Revell, Andrew Morton, LKML, alsa-devel

On 11/20/05, Hugh Dickins <hugh@veritas.com> wrote:
> On Sun, 20 Nov 2005, Lee Revell wrote:
> > (Added alsa-devel to cc:)
> >
> > Will this change have any ill effects on older kernels?
>
> I think not (except 2.6.5 and earlier didn't define __GFP_COMP).
>
> > If not we should fix it in the ALSA tree right?
>
> Probably, but I'm no authority on the ALSA tree.
>
> And suggest you wait a bit, since I haven't yet signed off on the
> patch that piece will be a part of.
>
> And what about the patch at the bottom (which I had CC'ed to Karsten
> Wiese), is that part of the ALSA tree too?  That case isn't so easy:
> the get_page makes a difference, and probably it was right not to do
> it before, yet strange it was the only nopage in the tree which didn't
> get_page.
>
> Hugh
>
> > Lee
> >
> > On Sun, 2005-11-20 at 08:05 +0000, Hugh Dickins wrote:
> > > On Sat, 19 Nov 2005, Miles Lane wrote:
> > > > [17179671.700000] Bad page state at free_hot_cold_page (in process
> > > > 'aplay', page c18eef30)
> > > > [17179671.700000] flags:0x80000414 mapping:00000000 mapcount:0 count:0
> > >
> > > Please let me know if it's not fixed by:
> > >
> > > --- 2.6.15-rc1-mm2/sound/core/memalloc.c    2005-11-12 09:01:28.000000000 +0000
> > > +++ linux/sound/core/memalloc.c     2005-11-19 19:03:32.000000000 +0000
> > > @@ -197,6 +197,7 @@ void *snd_malloc_pages(size_t size, gfp_
> > >
> > >     snd_assert(size > 0, return NULL);
> > >     snd_assert(gfp_flags != 0, return NULL);
> > > +   gfp_flags |= __GFP_COMP;        /* compound page lets parts be mapped */
> > >     pg = get_order(size);
> > >     if ((res = (void *) __get_free_pages(gfp_flags, pg)) != NULL) {
> > >             mark_pages(virt_to_page(res), pg);
> > > @@ -241,6 +242,7 @@ static void *snd_malloc_dev_pages(struct
> > >     snd_assert(dma != NULL, return NULL);
> > >     pg = get_order(size);
> > >     gfp_flags = GFP_KERNEL
> > > +           | __GFP_COMP    /* compound page lets parts be mapped */
> > >             | __GFP_NORETRY /* don't trigger OOM-killer */
> > >             | __GFP_NOWARN; /* no stack trace print - this call is non-critical */
> > >     res = dma_alloc_coherent(dev, PAGE_SIZE << pg, dma, gfp_flags);

Thanks, this did fix the problem for me.

         Miles

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2.6.15-rc1-mm2 -- Bad page state at free_hot_cold_page (in process 'aplay', page c18eef30)
  2005-11-20 19:35     ` Hugh Dickins
  2005-11-21  6:59       ` Miles Lane
@ 2005-11-21 11:52       ` Takashi Iwai
  2005-11-21 15:46         ` Hugh Dickins
  1 sibling, 1 reply; 13+ messages in thread
From: Takashi Iwai @ 2005-11-21 11:52 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Lee Revell, Miles Lane, Andrew Morton, LKML, alsa-devel

Hi Hugh,

At Sun, 20 Nov 2005 19:35:57 +0000 (GMT),
Hugh Dickins wrote:
> 
> On Sun, 20 Nov 2005, Lee Revell wrote:
> > (Added alsa-devel to cc:)
> > 
> > Will this change have any ill effects on older kernels?
> 
> I think not (except 2.6.5 and earlier didn't define __GFP_COMP).

Sorry, I still don't figure out how __GFP_COMP solved this problem.
Could you enlighten me a bit?

Isn't it needed for dma_alloc_coherent() (for i386, particularly),
too?  dma_alloc_coherent() also gets pages with __get_free_pages().

Also, I think we can remove all Set/ClearPageReserved() in memalloc.c
now.  It was there just for mmap...


> > If not we should fix it in the ALSA tree right?
> 
> Probably, but I'm no authority on the ALSA tree.

We'd need to add a dummy definition of __GFP_COMP for older kenels in
alsa-driver tree.  But that's all.

> And suggest you wait a bit, since I haven't yet signed off on the
> patch that piece will be a part of.
> 
> And what about the patch at the bottom (which I had CC'ed to Karsten
> Wiese), is that part of the ALSA tree too?  That case isn't so easy:
> the get_page makes a difference, and probably it was right not to do
> it before, yet strange it was the only nopage in the tree which didn't
> get_page.

In the ealier version, get_page() in sound/pcm_native.c was checked
with "if (! PageReserved(page))".  Since the pages were always
reserved, get_page was never called indeed.  (IIRC, the check was
introduced to avoid crash with anon page or so.)
usx2y driver was added later, so it didn't have unneeded get_page().


thanks,

Takashi

> 
> Hugh
> 
> > Lee
> > 
> > On Sun, 2005-11-20 at 08:05 +0000, Hugh Dickins wrote:
> > > On Sat, 19 Nov 2005, Miles Lane wrote:
> > > > [17179671.700000] Bad page state at free_hot_cold_page (in process
> > > > 'aplay', page c18eef30)
> > > > [17179671.700000] flags:0x80000414 mapping:00000000 mapcount:0 count:0
> > > 
> > > Please let me know if it's not fixed by:
> > > 
> > > --- 2.6.15-rc1-mm2/sound/core/memalloc.c	2005-11-12 09:01:28.000000000 +0000
> > > +++ linux/sound/core/memalloc.c	2005-11-19 19:03:32.000000000 +0000
> > > @@ -197,6 +197,7 @@ void *snd_malloc_pages(size_t size, gfp_
> > >  
> > >  	snd_assert(size > 0, return NULL);
> > >  	snd_assert(gfp_flags != 0, return NULL);
> > > +	gfp_flags |= __GFP_COMP;	/* compound page lets parts be mapped */
> > >  	pg = get_order(size);
> > >  	if ((res = (void *) __get_free_pages(gfp_flags, pg)) != NULL) {
> > >  		mark_pages(virt_to_page(res), pg);
> > > @@ -241,6 +242,7 @@ static void *snd_malloc_dev_pages(struct
> > >  	snd_assert(dma != NULL, return NULL);
> > >  	pg = get_order(size);
> > >  	gfp_flags = GFP_KERNEL
> > > +		| __GFP_COMP	/* compound page lets parts be mapped */
> > >  		| __GFP_NORETRY /* don't trigger OOM-killer */
> > >  		| __GFP_NOWARN; /* no stack trace print - this call is non-critical */
> > >  	res = dma_alloc_coherent(dev, PAGE_SIZE << pg, dma, gfp_flags);
> 
> Karsten Wiese <annabellesgarden@yahoo.de>
> [PATCH 03/11] unpaged: sound nopage get_page
> 
> Something noticed when studying use of VM_RESERVED in different drivers:
> snd_usX2Y_hwdep_pcm_vm_nopage omitted to get_page: fixed.
> 
> And how did this work before?  Aargh!  That nopage is returning a page
> from within a buffer allocated by snd_malloc_pages, which allocates a
> high-order page, then does SetPageReserved on each 0-order page within.
> 
> That would have worked in 2.6.14, because when the area was unmapped,
> PageReserved inhibited put_page.  2.6.15-rc1 removed that inhibition
> (while leaving ineffective PageReserveds around for now), but it hasn't
> caused trouble because.. we've not been freeing from VM_RESERVED at all.
> 
> Signed-off-by: Hugh Dickins <hugh@veritas.com>
> ---
> 
>  sound/usb/usx2y/usx2yhwdeppcm.c |    1 +
>  1 files changed, 1 insertion(+)
> 
> --- unpaged02/sound/usb/usx2y/usx2yhwdeppcm.c	2005-11-12 09:01:30.000000000 +0000
> +++ unpaged03/sound/usb/usx2y/usx2yhwdeppcm.c	2005-11-17 15:10:34.000000000 +0000
> @@ -691,6 +691,7 @@ static struct page * snd_usX2Y_hwdep_pcm
>  	snd_assert((offset % PAGE_SIZE) == 0, return NOPAGE_OOM);
>  	vaddr = (char*)((usX2Ydev_t*)area->vm_private_data)->hwdep_pcm_shm + offset;
>  	page = virt_to_page(vaddr);
> +	get_page(page);
>  
>  	if (type)
>  		*type = VM_FAULT_MINOR;
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2.6.15-rc1-mm2 -- Bad page state at free_hot_cold_page (in process 'aplay', page c18eef30)
  2005-11-21 11:52       ` Takashi Iwai
@ 2005-11-21 15:46         ` Hugh Dickins
  2005-11-21 15:52           ` Russell King
  2005-11-21 16:17           ` Takashi Iwai
  0 siblings, 2 replies; 13+ messages in thread
From: Hugh Dickins @ 2005-11-21 15:46 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: Lee Revell, Miles Lane, Andrew Morton, LKML, alsa-devel

On Mon, 21 Nov 2005, Takashi Iwai wrote:
> 
> Sorry, I still don't figure out how __GFP_COMP solved this problem.
> Could you enlighten me a bit?

The sequence of problems was this.

Nick's core PageReserved removal patch in 2.6.15-rc1 (and -rc2) 
changed VM_RESERVED vmas never to free their pages on unmapping (e.g.
on exit) - fine for remap_pfn_range areas, but a leak where others set
VM_RESERVED; and PageReserved not to inhibit decrementing page count.

In -rc1-mm2 I tried to fix that leak by restoring VM_RESERVED to its
previous behaviour, and using a different flag VM_UNPAGED, set in
remap_pfn_range, for the don't-free-when-unmapping behaviour.

But there's then a problem when the underlying page was allocated
as a high-order page, but the separate individual 0-order constituent
pages are mapped into userspace by nopage: the page count of the first
0-order is raised by allocation, but the following 0-order pages are
left with page count 0.  nopage's get_page raises that to 1,
zap_pte_range (or whatever it uses to actually do the freeing) lowers
that to 0, and hence frees the page, even though it's a constituent of
the not-yet-freed high-order page.  (This had not been a problem while
PageReserved was inhibiting decrementing the page count.)

So another of my patches in -rc1-mm2 made the PageCompound technique
available always, no longer under #ifdef CONFIG_HUGETLB_PAGE: so that
get_page and put_page on the later constituents of the high-order
page get redirected to the first one, and it should work okay again.

Except that I'd missed that you actually have to choose to have your
high-order pages supplied as compound pages, by passing __GFP_COMP.
Since I wasn't passing that, they still weren't allocated as compound
pages, so were still being freed too soon - and the PG_reserved flag
found while freeing gave rise to the "Bad page state" messages seen.

> Isn't it needed for dma_alloc_coherent() (for i386, particularly),
> too?  dma_alloc_coherent() also gets pages with __get_free_pages().

Didn't I deal with that by adding __GFP_COMP in snd_malloc_dev_pages?
And (in a separate patch run past davem and wli first, to be aggregated
with the sound/core/memalloc patch when I sign off and send to Andrew)
in the sparc and sparc64 sbus_alloc_consistent.

It's only an issue when the high-order page might be mapped into
userspace, then its constituents freed up by zap_pte_range; or
locked down with get_user_pages and later released: when constituents
of a high-order page pass through common code designed for 0-order pages.

> Also, I think we can remove all Set/ClearPageReserved() in memalloc.c
> now.  It was there just for mmap...

That is so, but we'd prefer you to hold off for now.  The way it's
proceeding is, for 2.6.15 we're not actually removing the Set/Clear
PageReserved from any of the drivers or from any of the architecture
initialization; but PageReserved is no longer serving any functional
purpose, except where PG_reserved acts as a double-check when the page
is freed, as to whether it's all working right.  Which was useful in
this case, to identify where I'd forgotten all about __GFP_COMP; and
I fear may prove useful in some other cases too.  Retaining this use
of PG_reserved for now, gives greater confidence in our safety when we
later advance to removing all the Set/ClearPageReserved hocus-pocus.

Hugh

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2.6.15-rc1-mm2 -- Bad page state at free_hot_cold_page (in process 'aplay', page c18eef30)
  2005-11-21 15:46         ` Hugh Dickins
@ 2005-11-21 15:52           ` Russell King
  2005-11-21 16:51             ` Hugh Dickins
  2005-11-21 16:17           ` Takashi Iwai
  1 sibling, 1 reply; 13+ messages in thread
From: Russell King @ 2005-11-21 15:52 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Takashi Iwai, Lee Revell, Miles Lane, Andrew Morton, LKML, alsa-devel

On Mon, Nov 21, 2005 at 03:46:50PM +0000, Hugh Dickins wrote:
> So another of my patches in -rc1-mm2 made the PageCompound technique
> available always, no longer under #ifdef CONFIG_HUGETLB_PAGE: so that
> get_page and put_page on the later constituents of the high-order
> page get redirected to the first one, and it should work okay again.
> 
> Except that I'd missed that you actually have to choose to have your
> high-order pages supplied as compound pages, by passing __GFP_COMP.
> Since I wasn't passing that, they still weren't allocated as compound
> pages, so were still being freed too soon - and the PG_reserved flag
> found while freeing gave rise to the "Bad page state" messages seen.

Does this mean that in arch/arm/mm/consistent.c, we should also set
__GFP_COMP ?  Should we be doing that today?

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2.6.15-rc1-mm2 -- Bad page state at free_hot_cold_page (in process 'aplay', page c18eef30)
  2005-11-21 15:46         ` Hugh Dickins
  2005-11-21 15:52           ` Russell King
@ 2005-11-21 16:17           ` Takashi Iwai
  2005-11-21 16:25             ` Russell King
  2005-11-21 17:09             ` Hugh Dickins
  1 sibling, 2 replies; 13+ messages in thread
From: Takashi Iwai @ 2005-11-21 16:17 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Lee Revell, Miles Lane, Andrew Morton, LKML, alsa-devel

At Mon, 21 Nov 2005 15:46:50 +0000 (GMT),
Hugh Dickins wrote:
> 
> On Mon, 21 Nov 2005, Takashi Iwai wrote:
> > 
> > Sorry, I still don't figure out how __GFP_COMP solved this problem.
> > Could you enlighten me a bit?
> 
> The sequence of problems was this.
> 
> Nick's core PageReserved removal patch in 2.6.15-rc1 (and -rc2) 
> changed VM_RESERVED vmas never to free their pages on unmapping (e.g.
> on exit) - fine for remap_pfn_range areas, but a leak where others set
> VM_RESERVED; and PageReserved not to inhibit decrementing page count.
> 
> In -rc1-mm2 I tried to fix that leak by restoring VM_RESERVED to its
> previous behaviour, and using a different flag VM_UNPAGED, set in
> remap_pfn_range, for the don't-free-when-unmapping behaviour.
> 
> But there's then a problem when the underlying page was allocated
> as a high-order page, but the separate individual 0-order constituent
> pages are mapped into userspace by nopage: the page count of the first
> 0-order is raised by allocation, but the following 0-order pages are
> left with page count 0.  nopage's get_page raises that to 1,
> zap_pte_range (or whatever it uses to actually do the freeing) lowers
> that to 0, and hence frees the page, even though it's a constituent of
> the not-yet-freed high-order page.  (This had not been a problem while
> PageReserved was inhibiting decrementing the page count.)
> 
> So another of my patches in -rc1-mm2 made the PageCompound technique
> available always, no longer under #ifdef CONFIG_HUGETLB_PAGE: so that
> get_page and put_page on the later constituents of the high-order
> page get redirected to the first one, and it should work okay again.
> 
> Except that I'd missed that you actually have to choose to have your
> high-order pages supplied as compound pages, by passing __GFP_COMP.
> Since I wasn't passing that, they still weren't allocated as compound
> pages, so were still being freed too soon - and the PG_reserved flag
> found while freeing gave rise to the "Bad page state" messages seen.

I see, thanks for explanation!

Now another question arises:  Which is the recommended method for
mmapping RAM pages, vma nopage callback or remap_pfn_range()?

IIRC, in the ealier versions, the former was recommended because
remap_page_range() with page-reserve was regarded as a hack.
But, looking through these changes, I feel that remap_pfn_range() is
better (easier and stabler) than vma nopage...


> > Isn't it needed for dma_alloc_coherent() (for i386, particularly),
> > too?  dma_alloc_coherent() also gets pages with __get_free_pages().
> 
> Didn't I deal with that by adding __GFP_COMP in snd_malloc_dev_pages?

Oh yes, I overlooked it.  It must be fine.


> And (in a separate patch run past davem and wli first, to be aggregated
> with the sound/core/memalloc patch when I sign off and send to Andrew)
> in the sparc and sparc64 sbus_alloc_consistent.
> 
> It's only an issue when the high-order page might be mapped into
> userspace, then its constituents freed up by zap_pte_range; or
> locked down with get_user_pages and later released: when constituents
> of a high-order page pass through common code designed for 0-order pages.
> 
> > Also, I think we can remove all Set/ClearPageReserved() in memalloc.c
> > now.  It was there just for mmap...
> 
> That is so, but we'd prefer you to hold off for now.  The way it's
> proceeding is, for 2.6.15 we're not actually removing the Set/Clear
> PageReserved from any of the drivers or from any of the architecture
> initialization; but PageReserved is no longer serving any functional
> purpose, except where PG_reserved acts as a double-check when the page
> is freed, as to whether it's all working right.  Which was useful in
> this case, to identify where I'd forgotten all about __GFP_COMP; and
> I fear may prove useful in some other cases too.  Retaining this use
> of PG_reserved for now, gives greater confidence in our safety when we
> later advance to removing all the Set/ClearPageReserved hocus-pocus.

OK.  Let's fix them right later.


Takashi

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2.6.15-rc1-mm2 -- Bad page state at free_hot_cold_page (in process 'aplay', page c18eef30)
  2005-11-21 16:17           ` Takashi Iwai
@ 2005-11-21 16:25             ` Russell King
  2005-11-21 16:41               ` Takashi Iwai
  2005-11-21 17:09             ` Hugh Dickins
  1 sibling, 1 reply; 13+ messages in thread
From: Russell King @ 2005-11-21 16:25 UTC (permalink / raw)
  To: Takashi Iwai
  Cc: Hugh Dickins, Lee Revell, Miles Lane, Andrew Morton, LKML, alsa-devel

On Mon, Nov 21, 2005 at 05:17:12PM +0100, Takashi Iwai wrote:
> Now another question arises:  Which is the recommended method for
> mmapping RAM pages, vma nopage callback or remap_pfn_range()?
> 
> IIRC, in the ealier versions, the former was recommended because
> remap_page_range() with page-reserve was regarded as a hack.
> But, looking through these changes, I feel that remap_pfn_range() is
> better (easier and stabler) than vma nopage...

And this brings up the question of this age old patch:

diff -up -x BitKeeper -x ChangeSet -x SCCS -x _xlk -x '*.orig' -x '*.rej' -r orig/arch/i386/kernel/pci-dma.c linux/arch/i386/kernel/pci-dma.c
--- orig/arch/i386/kernel/pci-dma.c	Mon Apr  4 22:52:57 2005
+++ linux/arch/i386/kernel/pci-dma.c	Mon Apr  4 22:44:45 2005
@@ -50,7 +50,7 @@ void *dma_alloc_coherent(struct device *
 	ret = (void *)__get_free_pages(gfp, order);
 
 	if (ret != NULL) {
-		memset(ret, 0, size);
+		memset(ret, 0, PAGE_ALIGN(size));
 		*dma_handle = virt_to_phys(ret);
 	}
 	return ret;
diff -up -x BitKeeper -x ChangeSet -x SCCS -x _xlk -x '*.orig' -x '*.rej' -r orig/include/asm-i386/dma-mapping.h linux/include/asm-i386/dma-mapping.h
--- orig/include/asm-i386/dma-mapping.h	Mon Apr  4 22:54:41 2005
+++ linux/include/asm-i386/dma-mapping.h	Mon Apr  4 22:48:11 2005
@@ -16,6 +16,23 @@ void *dma_alloc_coherent(struct device *
 void dma_free_coherent(struct device *dev, size_t size,
 			 void *vaddr, dma_addr_t dma_handle);
 
+static inline int
+dma_mmap_coherent(struct device *dev, struct vm_area_struct *vma,
+		  void *vaddr, dma_addr_t handle, size_t size)
+{
+	unsigned long offset = vma->vm_pgoff, usize;
+
+	size = PAGE_ALIGN(size) >> PAGE_SHIFT;
+	usize = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
+
+	if (offset >= size || usize > (size - offset))
+		return -ENXIO;
+
+	return remap_page_range(vma, vma->vm_start,
+				__pa(vaddr) + (offset << PAGE_SHIFT),
+				usize << PAGE_SHIFT, vma->vm_page_prot);
+}
+
 static inline dma_addr_t
 dma_map_single(struct device *dev, void *ptr, size_t size,
 	       enum dma_data_direction direction)

which provides the dma mmap API which presently is ARM-only on to x86.

Note: the first part of this patch should probably be applied anyway
if you're mmaping dma_alloc_coherent pages to userspace to ensure that
information is not leaked from kernel context.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2.6.15-rc1-mm2 -- Bad page state at free_hot_cold_page (in process 'aplay', page c18eef30)
  2005-11-21 16:25             ` Russell King
@ 2005-11-21 16:41               ` Takashi Iwai
  0 siblings, 0 replies; 13+ messages in thread
From: Takashi Iwai @ 2005-11-21 16:41 UTC (permalink / raw)
  To: Russell King
  Cc: Hugh Dickins, Lee Revell, Miles Lane, Andrew Morton, LKML, alsa-devel

At Mon, 21 Nov 2005 16:25:58 +0000,
Russell King wrote:
> 
> On Mon, Nov 21, 2005 at 05:17:12PM +0100, Takashi Iwai wrote:
> > Now another question arises:  Which is the recommended method for
> > mmapping RAM pages, vma nopage callback or remap_pfn_range()?
> > 
> > IIRC, in the ealier versions, the former was recommended because
> > remap_page_range() with page-reserve was regarded as a hack.
> > But, looking through these changes, I feel that remap_pfn_range() is
> > better (easier and stabler) than vma nopage...
> 
> And this brings up the question of this age old patch:
> 
> diff -up -x BitKeeper -x ChangeSet -x SCCS -x _xlk -x '*.orig' -x '*.rej' -r orig/arch/i386/kernel/pci-dma.c linux/arch/i386/kernel/pci-dma.c
> --- orig/arch/i386/kernel/pci-dma.c	Mon Apr  4 22:52:57 2005
> +++ linux/arch/i386/kernel/pci-dma.c	Mon Apr  4 22:44:45 2005
> @@ -50,7 +50,7 @@ void *dma_alloc_coherent(struct device *
>  	ret = (void *)__get_free_pages(gfp, order);
>  
>  	if (ret != NULL) {
> -		memset(ret, 0, size);
> +		memset(ret, 0, PAGE_ALIGN(size));
>  		*dma_handle = virt_to_phys(ret);
>  	}
>  	return ret;
> diff -up -x BitKeeper -x ChangeSet -x SCCS -x _xlk -x '*.orig' -x '*.rej' -r orig/include/asm-i386/dma-mapping.h linux/include/asm-i386/dma-mapping.h
> --- orig/include/asm-i386/dma-mapping.h	Mon Apr  4 22:54:41 2005
> +++ linux/include/asm-i386/dma-mapping.h	Mon Apr  4 22:48:11 2005
> @@ -16,6 +16,23 @@ void *dma_alloc_coherent(struct device *
>  void dma_free_coherent(struct device *dev, size_t size,
>  			 void *vaddr, dma_addr_t dma_handle);
>  
> +static inline int
> +dma_mmap_coherent(struct device *dev, struct vm_area_struct *vma,
> +		  void *vaddr, dma_addr_t handle, size_t size)
> +{
> +	unsigned long offset = vma->vm_pgoff, usize;
> +
> +	size = PAGE_ALIGN(size) >> PAGE_SHIFT;
> +	usize = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
> +
> +	if (offset >= size || usize > (size - offset))
> +		return -ENXIO;
> +
> +	return remap_page_range(vma, vma->vm_start,
> +				__pa(vaddr) + (offset << PAGE_SHIFT),
> +				usize << PAGE_SHIFT, vma->vm_page_prot);
> +}
> +
>  static inline dma_addr_t
>  dma_map_single(struct device *dev, void *ptr, size_t size,
>  	       enum dma_data_direction direction)
> 
> which provides the dma mmap API which presently is ARM-only on to x86.

Yes, that's nice.  It would be nice if dma_mmap_coherent() is
implemented on all architectures.

> Note: the first part of this patch should probably be applied anyway
> if you're mmaping dma_alloc_coherent pages to userspace to ensure that
> information is not leaked from kernel context.

This reminds me another thing:  x86 dma_alloc_coherent() doesn't trim
the pages between PAGE_ALIGN(size) and (1 << order).  Some archs seem
doing this right.


Takashi

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2.6.15-rc1-mm2 -- Bad page state at free_hot_cold_page (in process 'aplay', page c18eef30)
  2005-11-21 15:52           ` Russell King
@ 2005-11-21 16:51             ` Hugh Dickins
  0 siblings, 0 replies; 13+ messages in thread
From: Hugh Dickins @ 2005-11-21 16:51 UTC (permalink / raw)
  To: Russell King
  Cc: Takashi Iwai, Lee Revell, Miles Lane, Andrew Morton, LKML, alsa-devel

On Mon, 21 Nov 2005, Russell King wrote:
> 
> Does this mean that in arch/arm/mm/consistent.c, we should also set
> __GFP_COMP ?  Should we be doing that today?

No, I don't believe so.  I notice there's use of PageReserved and page
count manipulation in there, which we may well want to rationalize in
a later phase, perhaps replacing by __GFP_COMP then; but I don't see
any need to change what you're doing at this stage.

Looking again, to see why you even thought you might need to change:
the vital thing you're doing (aside from the PageReserved, which is
becoming irrelevant), which is missing from the regular high-order
allocation, is the set_page_count(page, 1).  That's why you don't
need to change: the problems I was writing of come from when the
0-order page count goes down to 0.

Makes me wonder whether there could be a useful halfway stage to
compound pages, whether it would help for the page allocator to
set those counts to 1 generally.

Hugh

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: 2.6.15-rc1-mm2 -- Bad page state at free_hot_cold_page (in process 'aplay', page c18eef30)
  2005-11-21 16:17           ` Takashi Iwai
  2005-11-21 16:25             ` Russell King
@ 2005-11-21 17:09             ` Hugh Dickins
  1 sibling, 0 replies; 13+ messages in thread
From: Hugh Dickins @ 2005-11-21 17:09 UTC (permalink / raw)
  To: Takashi Iwai
  Cc: Lee Revell, Miles Lane, Andrew Morton, Linus Torvalds,
	Christoph Hellwig, LKML, alsa-devel

On Mon, 21 Nov 2005, Takashi Iwai wrote:
> 
> Now another question arises:  Which is the recommended method for
> mmapping RAM pages, vma nopage callback or remap_pfn_range()?
> 
> IIRC, in the ealier versions, the former was recommended because
> remap_page_range() with page-reserve was regarded as a hack.

I do believe Linus (CC'ed so he can defend himself from my slanders)
saw it that way for a long while.  I believe he did, and still does,
think it more correct to work through the intended vm_operations_struct
methods.  But has relented a little down the years, and now accepts
that remap_pfn_range is here to stay (as much as anything stays).

> But, looking through these changes, I feel that remap_pfn_range() is
> better (easier and stabler) than vma nopage...

I'm not going to make a recommendation: whatever suits you best.

It does sometimes look like people converted from remap_pfn_range
to nopage, just because they felt under pressure to do so.

It is curious that your snd_pcm_mmap_status_nopage seems to have been
affected and nothing else (I'd expected trouble in DRM but no sign).

I've CC'ed hch also, he did have some pungent views on how you ought
better to go about this.

Hugh

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2005-11-21 17:09 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-11-20  6:56 2.6.15-rc1-mm2 -- Bad page state at free_hot_cold_page (in process 'aplay', page c18eef30) Miles Lane
2005-11-20  8:05 ` Hugh Dickins
2005-11-20 18:14   ` Lee Revell
2005-11-20 19:35     ` Hugh Dickins
2005-11-21  6:59       ` Miles Lane
2005-11-21 11:52       ` Takashi Iwai
2005-11-21 15:46         ` Hugh Dickins
2005-11-21 15:52           ` Russell King
2005-11-21 16:51             ` Hugh Dickins
2005-11-21 16:17           ` Takashi Iwai
2005-11-21 16:25             ` Russell King
2005-11-21 16:41               ` Takashi Iwai
2005-11-21 17:09             ` Hugh Dickins

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).