linux-mm.kvack.org archive mirror
* Re: [LKP] [mm] cc87317726f: WARNING: CPU: 0 PID: 1 at drivers/iommu/io-pgtable-arm.c:413 __arm_lpae_unmap+0x341/0x380()
       [not found] <1426227621.6711.238.camel@intel.com>
@ 2015-03-17 17:15 ` Linus Torvalds
  2015-03-17 17:28   ` Michal Hocko
  2015-03-17 19:24   ` Johannes Weiner
  0 siblings, 2 replies; 11+ messages in thread
From: Linus Torvalds @ 2015-03-17 17:15 UTC (permalink / raw)
  To: Huang Ying, Michal Hocko, Tetsuo Handa, David Rientjes,
	Andrew Morton, Dave Chinner
  Cc: Johannes Weiner, LKML, LKP ML, linux-mm

Explicitly adding the emails of other people involved with that commit
and the original oom thread to make sure people are aware, since this
didn't get any response.

Commit cc87317726f8 fixed some behavior, but also seems to have turned
an oom situation into a complete hang. So presumably we shouldn't loop
*forever*. Hmm?

Comments?

                           Linus

On Thu, Mar 12, 2015 at 11:20 PM, Huang Ying <ying.huang@intel.com> wrote:
> FYI, we noticed the below changes on
>
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> commit cc87317726f851531ae8422e0c2d3d6e2d7b1955 ("mm: page_alloc: revert inadvertent !__GFP_FS retry behavior change")
>
> Before the commit, the page allocation failure looked as follows (in prev_dmesg).
>
> [    3.069031] BTRFS: selftest: Running space stealing from bitmap to extent
> [    3.070243] BTRFS: selftest: Free space cache tests finished
> [    3.070919] BTRFS: selftest: Running extent buffer operation tests
> [    3.072111] BTRFS: selftest: Running btrfs_split_item tests
> [    3.072840] BTRFS: selftest: Running find delalloc tests
> [    3.295788] swapper/0: page allocation failure: order:0, mode:0x50
> [    3.296315] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W       4.0.0-rc1-00038-g39afb5e #4
> [    3.297033] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [    3.297490]  00000000 00000000 4002bdd4 4158716c 00000001 4002bdfc 410c64f1 41719e60
> [    3.298218]  4001b304 00000000 00000050 4002bdf8 4158da0d 00000000 00000000 4002be80
> [    3.298929]  410c8331 00000050 00000000 00000000 00000001 00000050 4001b000 00000040
> [    3.299644] Call Trace:
> [    3.299859]  [<4158716c>] dump_stack+0x48/0x60
> [    3.300235]  [<410c64f1>] warn_alloc_failed+0xa1/0xe0
> [    3.300640]  [<4158da0d>] ? _raw_spin_unlock+0x1d/0x30
> [    3.301070]  [<410c8331>] __alloc_pages_nodemask+0x4d1/0x810
> [    3.301517]  [<410c04e3>] pagecache_get_page+0xf3/0x1c0
> [    3.301957]  [<4124ccf7>] btrfs_test_extent_io+0x67/0x660
> [    3.302401]  [<4124c5cb>] ? btrfs_test_extent_buffer_operations+0x54b/0x6c0
> [    3.302966]  [<4184109b>] ? debugfs_init+0x4e/0x4e
> [    3.303360]  [<41841192>] init_btrfs_fs+0xf7/0x172
> [    3.303750]  [<41000472>] do_one_initcall+0xc2/0x1c0
> [    3.304155]  [<41829462>] ? repair_env_string+0x12/0x54
> [    3.304566]  [<41829400>] ? do_early_param+0x23/0x73
> [    3.304971]  [<4104ca99>] ? parse_args+0x249/0x4e0
> [    3.305364]  [<41829450>] ? do_early_param+0x73/0x73
> [    3.305767]  [<41829bce>] kernel_init_freeable+0xe3/0x160
> [    3.306204]  [<41829bce>] ? kernel_init_freeable+0xe3/0x160
> [    3.306632]  [<41582b78>] kernel_init+0x8/0xc0
> [    3.307022]  [<4158e281>] ret_from_kernel_thread+0x21/0x30
> [    3.307455]  [<41582b70>] ? rest_init+0xb0/0xb0
> [    3.307826] Mem-Info:
> [    3.308024] Normal per-cpu:
> [    3.308251] CPU    0: hi:   90, btch:  15 usd:  82
> [    3.308630] CPU    1: hi:   90, btch:  15 usd:   2
> [    3.309026] active_anon:0 inactive_anon:0 isolated_anon:0
> [    3.309026]  active_file:873 inactive_file:62554 isolated_file:0
> [    3.309026]  unevictable:9425 dirty:0 writeback:0 unstable:0
> [    3.309026]  free:539 slab_reclaimable:0 slab_unreclaimable:0
> [    3.309026]  mapped:0 shmem:0 pagetables:0 bounce:0
> [    3.309026]  free_cma:0
>
>
> After the commit, the system hangs at the same position (in .dmesg).
>
> [    3.303002] BTRFS: selftest: Running btrfs free space cache tests
> [    3.303636] BTRFS: selftest: Running extent only tests
> [    3.304190] BTRFS: selftest: Running bitmap only tests
> [    3.304726] BTRFS: selftest: Running bitmap and extent tests
> [    3.305346] BTRFS: selftest: Running space stealing from bitmap to extent
> [    3.306318] BTRFS: selftest: Free space cache tests finished
> [    3.306881] BTRFS: selftest: Running extent buffer operation tests
> [    3.307483] BTRFS: selftest: Running btrfs_split_item tests
> [    3.308134] BTRFS: selftest: Running find delalloc tests
>
> BUG: kernel boot hang
> Elapsed time: 305
>
>
> Thanks,
> Ying Huang
>
>
> _______________________________________________
> LKP mailing list
> LKP@linux.intel.com
>
>
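The failure line above reports "order:0, mode:0x50". What follows is a minimal decoding sketch. It assumes the ___GFP_* bit values that include/linux/gfp.h used around v3.19/v4.0 (later kernels renumbered these bits, so the table is only meaningful for dmesg output from that era). It also assumes 4 KiB pages for the Mem-Info counters. Under those assumptions, 0x50 decodes to __GFP_WAIT|__GFP_IO, which is GFP_NOFS: the caller may sleep and start I/O but may not recurse into the filesystem. That is the !__GFP_FS case that commit cc87317726f8 is about. The helper names decode_gfp() and pages_to_kib() are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Subset of the ___GFP_* bit values in include/linux/gfp.h around
 * v3.19/v4.0 (assumption: later kernels use different values). */
struct gfp_bit {
	unsigned int mask;
	const char *name;
};

static const struct gfp_bit gfp_bits_v4_0[] = {
	{ 0x01u,   "__GFP_DMA" },
	{ 0x02u,   "__GFP_HIGHMEM" },
	{ 0x04u,   "__GFP_DMA32" },
	{ 0x08u,   "__GFP_MOVABLE" },
	{ 0x10u,   "__GFP_WAIT" },
	{ 0x20u,   "__GFP_HIGH" },
	{ 0x40u,   "__GFP_IO" },
	{ 0x80u,   "__GFP_FS" },
	{ 0x800u,  "__GFP_NOFAIL" },
	{ 0x1000u, "__GFP_NORETRY" },
};

/* Write the names of the known bits set in mode into buf as "A|B|...". */
static void decode_gfp(unsigned int mode, char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(gfp_bits_v4_0) / sizeof(gfp_bits_v4_0[0]); i++) {
		if (!(mode & gfp_bits_v4_0[i].mask))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, gfp_bits_v4_0[i].name, len - strlen(buf) - 1);
	}
}

/* Convert a Mem-Info page count to KiB, assuming 4 KiB pages (x86). */
static unsigned long pages_to_kib(unsigned long pages)
{
	return pages * 4UL;
}
```

With these helpers, decode_gfp(0x50, ...) yields "__GFP_WAIT|__GFP_IO". The Mem-Info lines translate to free:539, about 2156 KiB free, versus inactive_file:62554, about 250216 KiB (roughly 244 MiB) of inactive page cache.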

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [LKP] [mm] cc87317726f: WARNING: CPU: 0 PID: 1 at drivers/iommu/io-pgtable-arm.c:413 __arm_lpae_unmap+0x341/0x380()
  2015-03-17 17:15 ` [LKP] [mm] cc87317726f: WARNING: CPU: 0 PID: 1 at drivers/iommu/io-pgtable-arm.c:413 __arm_lpae_unmap+0x341/0x380() Linus Torvalds
@ 2015-03-17 17:28   ` Michal Hocko
  2015-03-17 19:24   ` Johannes Weiner
  1 sibling, 0 replies; 11+ messages in thread
From: Michal Hocko @ 2015-03-17 17:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Huang Ying, Tetsuo Handa, David Rientjes, Andrew Morton,
	Dave Chinner, Johannes Weiner, LKML, LKP ML, linux-mm,
	Theodore Ts'o

[CCing Ted]

On Tue 17-03-15 10:15:29, Linus Torvalds wrote:
> Explicitly adding the emails of other people involved with that commit
> and the original oom thread to make sure people are aware, since this
> didn't get any response.
> 
> Commit cc87317726f8 fixed some behavior,

Yes, it was ext4 remounting RO because of the allocation failures AFAIR.
I am not sure those were addressed in the meantime. Ted?

> but also seems to have turned an oom situation into a complete
> hang. So presumably we shouldn't loop *forever*. Hmm?

I am definitely in favor of failing GFP_NOFS allocations. It is weird to
loop inside the allocator without any way out, because even the OOM
killer, as the last resort, is not used. The primary reason for the revert
was that the change came in very late in the release cycle. I guess we
should go with a revert of cc87317726f8 for 4.1.
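The situation described above can be illustrated with a toy model of the order-0 slow path: a !__GFP_FS allocation loops with no way out because the OOM killer is never used. This is a simplified sketch, not the kernel's __alloc_pages_slowpath(). It makes three assumptions. Direct reclaim never makes progress. The OOM killer is usable only for __GFP_FS requests. The two nofs_policy values are hypothetical names: NOFS_FAIL stands for the behavior with 9879de7373fc (fail), and NOFS_LOOP stands for the behavior after its revert by cc87317726f8 (keep looping).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the two !__GFP_FS behaviours discussed here. */
enum nofs_policy {
	NOFS_FAIL, /* as with 9879de7373fc: give up and return NULL */
	NOFS_LOOP, /* as after cc87317726f8: keep retrying */
};

enum outcome { GOT_PAGE, RETURNED_NULL, STILL_LOOPING };

/*
 * Toy model of an order-0 allocation when direct reclaim never frees
 * anything. gfp_fs: whether the request may recurse into the filesystem
 * (and therefore, in this model, may invoke the OOM killer).
 * oom_kill_frees: whether an OOM kill would actually release memory.
 * The loop is bounded by max_iters so that "loops forever" is observable.
 */
static enum outcome simulate_order0(bool gfp_fs, enum nofs_policy policy,
				    bool oom_kill_frees, int max_iters)
{
	int i;

	assert(max_iters > 0);
	for (i = 0; i < max_iters; i++) {
		/* Reclaim made no progress: consider the OOM killer. */
		if (gfp_fs) {
			if (oom_kill_frees)
				return GOT_PAGE;
			/* Victim has not released memory yet: retry. */
		} else if (policy == NOFS_FAIL) {
			/* No OOM killer for !__GFP_FS: fail the request. */
			return RETURNED_NULL;
		}
		/* NOFS_LOOP: no OOM killer and no failure, so just retry. */
	}
	return STILL_LOOPING;
}
```

In this model, simulate_order0(false, NOFS_LOOP, true, n) returns STILL_LOOPING for any n, which matches the boot hang in the report. The same call with NOFS_FAIL returns RETURNED_NULL. A __GFP_FS request would be satisfied once the OOM killer frees memory.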

-- 
Michal Hocko
SUSE Labs

* Re: [LKP] [mm] cc87317726f: WARNING: CPU: 0 PID: 1 at drivers/iommu/io-pgtable-arm.c:413 __arm_lpae_unmap+0x341/0x380()
  2015-03-17 17:15 ` [LKP] [mm] cc87317726f: WARNING: CPU: 0 PID: 1 at drivers/iommu/io-pgtable-arm.c:413 __arm_lpae_unmap+0x341/0x380() Linus Torvalds
  2015-03-17 17:28   ` Michal Hocko
@ 2015-03-17 19:24   ` Johannes Weiner
  2015-03-18  1:53     ` Huang Ying
  1 sibling, 1 reply; 11+ messages in thread
From: Johannes Weiner @ 2015-03-17 19:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Huang Ying, Michal Hocko, Tetsuo Handa, David Rientjes,
	Andrew Morton, Dave Chinner, LKML, LKP ML, linux-mm

On Tue, Mar 17, 2015 at 10:15:29AM -0700, Linus Torvalds wrote:
> Explicitly adding the emails of other people involved with that commit
> and the original oom thread to make sure people are aware, since this
> didn't get any response.
> 
> Commit cc87317726f8 fixed some behavior, but also seems to have turned
> an oom situation into a complete hang. So presumably we shouldn't loop
> *forever*. Hmm?

It seems we are between a rock and a hard place here, as we reverted
specifically to that endless looping at the request of filesystem people.
They said[1] they rely on these allocations never returning NULL, or
they might fail inside a transaction and corrupt on-disk data.

Huang, which kernels did you first run this test against on this exact
setup?  Is there a chance you could try to run a kernel without/before
9879de7373fc?  I want to make sure I'm not missing something, but all
versions preceding this commit should also have the same hang.  There
should only be a tiny window between 9879de7373fc and cc87317726f8 --
v3.19 -- where these allocations are allowed to fail.

[1] https://www.marc.info/?l=linux-mm&m=142450545009301&w=3

* Re: [LKP] [mm] cc87317726f: WARNING: CPU: 0 PID: 1 at drivers/iommu/io-pgtable-arm.c:413 __arm_lpae_unmap+0x341/0x380()
  2015-03-17 19:24   ` Johannes Weiner
@ 2015-03-18  1:53     ` Huang Ying
  2015-03-18 11:45       ` [LKP] [mm] cc87317726f: WARNING: CPU: 0 PID: 1 at drivers/iommu/io-pgtable-arm.c:413 __arm_lpae_unmap+0x341/0x380() Tetsuo Handa
  0 siblings, 1 reply; 11+ messages in thread
From: Huang Ying @ 2015-03-18  1:53 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Linus Torvalds, Michal Hocko, Tetsuo Handa, David Rientjes,
	Andrew Morton, Dave Chinner, LKML, LKP ML, linux-mm

On Tue, 2015-03-17 at 15:24 -0400, Johannes Weiner wrote:
> On Tue, Mar 17, 2015 at 10:15:29AM -0700, Linus Torvalds wrote:
> > Explicitly adding the emails of other people involved with that commit
> > and the original oom thread to make sure people are aware, since this
> > didn't get any response.
> > 
> > Commit cc87317726f8 fixed some behavior, but also seems to have turned
> > an oom situation into a complete hang. So presumably we shouldn't loop
> > *forever*. Hmm?
> 
> It seems we are between a rock and a hard place here, as we reverted
> specifically to that endless looping at the request of filesystem people.
> They said[1] they rely on these allocations never returning NULL, or
> they might fail inside a transaction and corrupt on-disk data.
> 
> Huang, which kernels did you first run this test against on this exact
> setup?  Is there a chance you could try to run a kernel without/before
> 9879de7373fc?  I want to make sure I'm not missing something, but all
> versions preceding this commit should also have the same hang.  There
> should only be a tiny window between 9879de7373fc and cc87317726f8 --
> v3.19 -- where these allocations are allowed to fail.

I checked the test result of v3.19-rc6.  It shows that boot hangs at
the same position.

BTW: the test was run on a 32-bit system.

Best Regards,
Huang, Ying


* Re: [LKP] [mm] cc87317726f: WARNING: CPU: 0 PID: 1 at drivers/iommu/io-pgtable-arm.c:413 __arm_lpae_unmap+0x341/0x380()
  2015-03-18  1:53     ` Huang Ying
@ 2015-03-18 11:45       ` Tetsuo Handa
  2015-03-19  1:57         ` Huang Ying
  0 siblings, 1 reply; 11+ messages in thread
From: Tetsuo Handa @ 2015-03-18 11:45 UTC (permalink / raw)
  To: ying.huang, hannes
  Cc: torvalds, mhocko, rientjes, akpm, david, linux-kernel, lkp, linux-mm

Huang Ying wrote:
> On Tue, 2015-03-17 at 15:24 -0400, Johannes Weiner wrote:
> > On Tue, Mar 17, 2015 at 10:15:29AM -0700, Linus Torvalds wrote:
> > > Explicitly adding the emails of other people involved with that commit
> > > and the original oom thread to make sure people are aware, since this
> > > didn't get any response.
> > > 
> > > Commit cc87317726f8 fixed some behavior, but also seems to have turned
> > > an oom situation into a complete hang. So presumably we shouldn't loop
> > > *forever*. Hmm?
> > 
> > It seems we are between a rock and a hard place here, as we reverted
> > specifically to that endless looping at the request of filesystem people.
> > They said[1] they rely on these allocations never returning NULL, or
> > they might fail inside a transaction and corrupt on-disk data.
> > 
> > Huang, which kernels did you first run this test against on this exact
> > setup?  Is there a chance you could try to run a kernel without/before
> > 9879de7373fc?  I want to make sure I'm not missing something, but all
> > versions preceding this commit should also have the same hang.  There
> > should only be a tiny window between 9879de7373fc and cc87317726f8 --
> > v3.19 -- where these allocations are allowed to fail.
> 
> I checked the test result of v3.19-rc6.  It shows that boot hangs at
> the same position.

OK. That's the expected result. We are discussing how to safely
allow small allocations to fail, including how to handle stalls caused by
allocations without __GFP_FS.

> 
> BTW: the test was run on a 32-bit system.

That sounds like the cause of your problem. The system might be out of
address space available for the kernel (only 1GB if x86_32). You should
try running tests on 64-bit systems.
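The "only 1GB" figure can be put in perspective with the arithmetic behind the usual x86_32 lowmem limit. This sketch assumes the default 3G/1G split (PAGE_OFFSET = 0xC0000000) and the default 128 MiB vmalloc reserve. Requests without __GFP_HIGHMEM can only be served from this directly mapped region. The function name x86_32_lowmem_bytes() is illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)

/*
 * Size of the kernel's directly mapped ("lowmem") region on x86_32:
 * the kernel owns the virtual addresses above page_offset, minus the
 * window reserved for vmalloc, fixmaps and similar mappings.
 */
static uint64_t x86_32_lowmem_bytes(uint64_t page_offset,
				    uint64_t vmalloc_reserve)
{
	const uint64_t total_va = 1ULL << 32;   /* 4 GiB of virtual addresses */
	uint64_t kernel_va;

	assert(page_offset < total_va);
	kernel_va = total_va - page_offset;     /* 1 GiB with the 3G/1G split */
	assert(vmalloc_reserve < kernel_va);
	return kernel_va - vmalloc_reserve;
}
```

With the assumed defaults, x86_32_lowmem_bytes(0xC0000000, 128 * MIB) gives the familiar 896 MiB of lowmem, however much RAM the machine has.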

> 
> Best Regards,
> Huang, Ying

* Re: [LKP] [mm] cc87317726f: WARNING: CPU: 0 PID: 1 at drivers/iommu/io-pgtable-arm.c:413 __arm_lpae_unmap+0x341/0x380()
  2015-03-18 11:45       ` [LKP] [mm] cc87317726f: WARNING: CPU: 0 PID: 1 at drivers/iommu/io-pgtable-arm.c:413 __arm_lpae_unmap+0x341/0x380() Tetsuo Handa
@ 2015-03-19  1:57         ` Huang Ying
  2015-03-20 13:34           ` [LKP] [mm] cc87317726f: WARNING: CPU: 0 PID: 1 at drivers/iommu/io-pgtable-arm.c:413 __arm_lpae_unmap+0x341/0x380() Tetsuo Handa
  0 siblings, 1 reply; 11+ messages in thread
From: Huang Ying @ 2015-03-19  1:57 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: hannes, torvalds, mhocko, rientjes, akpm, david, linux-kernel,
	lkp, linux-mm

On Wed, 2015-03-18 at 20:45 +0900, Tetsuo Handa wrote:
> Huang Ying wrote:
> > On Tue, 2015-03-17 at 15:24 -0400, Johannes Weiner wrote:
> > > On Tue, Mar 17, 2015 at 10:15:29AM -0700, Linus Torvalds wrote:
> > > > Explicitly adding the emails of other people involved with that commit
> > > > and the original oom thread to make sure people are aware, since this
> > > > didn't get any response.
> > > > 
> > > > Commit cc87317726f8 fixed some behavior, but also seems to have turned
> > > > an oom situation into a complete hang. So presumably we shouldn't loop
> > > > *forever*. Hmm?
> > > 
> > > It seems we are between a rock and a hard place here, as we reverted
> > > specifically to that endless looping at the request of filesystem people.
> > > They said[1] they rely on these allocations never returning NULL, or
> > > they might fail inside a transaction and corrupt on-disk data.
> > > 
> > > Huang, which kernels did you first run this test against on this exact
> > > setup?  Is there a chance you could try to run a kernel without/before
> > > 9879de7373fc?  I want to make sure I'm not missing something, but all
> > > versions preceding this commit should also have the same hang.  There
> > > should only be a tiny window between 9879de7373fc and cc87317726f8 --
> > > v3.19 -- where these allocations are allowed to fail.
> > 
> > I checked the test result of v3.19-rc6.  It shows that boot hangs at
> > the same position.
> 
> OK. That's the expected result. We are discussing how to safely
> allow small allocations to fail, including how to handle stalls caused by
> allocations without __GFP_FS.
> 
> > 
> > BTW: the test was run on a 32-bit system.
> 
> That sounds like the cause of your problem. The system might be out of
> address space available for the kernel (only 1GB if x86_32). You should
> try running tests on 64-bit systems.

We run tests on both 32-bit and 64-bit systems to catch problems on both
platforms.  I think we still need to support 32-bit systems?

Best Regards,
Huang, Ying


* Re: [LKP] [mm] cc87317726f: WARNING: CPU: 0 PID: 1 at drivers/iommu/io-pgtable-arm.c:413 __arm_lpae_unmap+0x341/0x380()
  2015-03-19  1:57         ` Huang Ying
@ 2015-03-20 13:34           ` Tetsuo Handa
  2015-03-20 13:38             ` Michal Hocko
  0 siblings, 1 reply; 11+ messages in thread
From: Tetsuo Handa @ 2015-03-20 13:34 UTC (permalink / raw)
  To: ying.huang
  Cc: hannes, torvalds, mhocko, rientjes, akpm, david, linux-kernel,
	lkp, linux-mm

Huang Ying wrote:
> > > BTW: the test was run on a 32-bit system.
> > 
> > That sounds like the cause of your problem. The system might be out of
> > address space available for the kernel (only 1GB if x86_32). You should
> > try running tests on 64-bit systems.
> 
> We run tests on both 32-bit and 64-bit systems to catch problems on both
> platforms.  I think we still need to support 32-bit systems?

Yes, testing on both platforms is good. But please read
http://lwn.net/Articles/627419/ , http://lwn.net/Articles/635354/ and
http://lwn.net/Articles/636017/ . Then please add __GFP_NORETRY to memory
allocations in btrfs code if it is appropriate.

* Re: [LKP] [mm] cc87317726f: WARNING: CPU: 0 PID: 1atdrivers/iommu/io-pgtable-arm.c:413 __arm_lpae_unmap+0x341/0x380()
  2015-03-20 13:34           ` [LKP] [mm] cc87317726f: WARNING: CPU: 0 PID: 1 at drivers/iommu/io-pgtable-arm.c:413 __arm_lpae_unmap+0x341/0x380() Tetsuo Handa
@ 2015-03-20 13:38             ` Michal Hocko
  2015-03-20 14:02               ` [LKP] [mm] cc87317726f: WARNING: CPU: 0 PID: 1 at drivers/iommu/io-pgtable-arm.c:413 __arm_lpae_unmap+0x341/0x380() Tetsuo Handa
  0 siblings, 1 reply; 11+ messages in thread
From: Michal Hocko @ 2015-03-20 13:38 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: ying.huang, hannes, torvalds, rientjes, akpm, david,
	linux-kernel, lkp, linux-mm

On Fri 20-03-15 22:34:21, Tetsuo Handa wrote:
> Huang Ying wrote:
> > > > BTW: the test was run on a 32-bit system.
> > > 
> > > That sounds like the cause of your problem. The system might be out of
> > > address space available for the kernel (only 1GB if x86_32). You should
> > > try running tests on 64-bit systems.
> > 
> > We run tests on both 32-bit and 64-bit systems to catch problems on both
> > platforms.  I think we still need to support 32-bit systems?
> 
> Yes, testing on both platforms is good. But please read
> http://lwn.net/Articles/627419/ , http://lwn.net/Articles/635354/ and
> http://lwn.net/Articles/636017/ . Then please add __GFP_NORETRY to memory
> allocations in btrfs code if it is appropriate.

I guess you meant __GFP_NOFAIL?

-- 
Michal Hocko
SUSE Labs

* Re: [LKP] [mm] cc87317726f: WARNING: CPU: 0 PID: 1 at drivers/iommu/io-pgtable-arm.c:413 __arm_lpae_unmap+0x341/0x380()
  2015-03-20 13:38             ` Michal Hocko
@ 2015-03-20 14:02               ` Tetsuo Handa
  2015-03-20 14:34                 ` Michal Hocko
  0 siblings, 1 reply; 11+ messages in thread
From: Tetsuo Handa @ 2015-03-20 14:02 UTC (permalink / raw)
  To: mhocko
  Cc: ying.huang, hannes, torvalds, rientjes, akpm, david,
	linux-kernel, lkp, linux-mm

Michal Hocko wrote:
> On Fri 20-03-15 22:34:21, Tetsuo Handa wrote:
> > Huang Ying wrote:
> > > > > BTW: the test was run on a 32-bit system.
> > > > 
> > > > That sounds like the cause of your problem. The system might be out of
> > > > address space available for the kernel (only 1GB if x86_32). You should
> > > > try running tests on 64-bit systems.
> > > 
> > > We run tests on both 32-bit and 64-bit systems to catch problems on both
> > > platforms.  I think we still need to support 32-bit systems?
> > 
> > Yes, testing on both platforms is good. But please read
> > http://lwn.net/Articles/627419/ , http://lwn.net/Articles/635354/ and
> > http://lwn.net/Articles/636017/ . Then please add __GFP_NORETRY to memory
> > allocations in btrfs code if it is appropriate.
> 
> I guess you meant __GFP_NOFAIL?
> 
No. btrfs's selftest (which is not using __GFP_NOFAIL) is already looping
forever. If we want to prevent btrfs's selftest from looping forever, btrfs
needs __GFP_NORETRY rather than __GFP_NOFAIL (until we establish a way to
safely allow small allocations to fail).

* Re: [LKP] [mm] cc87317726f: WARNING: CPU: 0 PID:1atdrivers/iommu/io-pgtable-arm.c:413 __arm_lpae_unmap+0x341/0x380()
  2015-03-20 14:02               ` [LKP] [mm] cc87317726f: WARNING: CPU: 0 PID: 1 at drivers/iommu/io-pgtable-arm.c:413 __arm_lpae_unmap+0x341/0x380() Tetsuo Handa
@ 2015-03-20 14:34                 ` Michal Hocko
  2015-03-20 15:41                   ` [LKP] [mm] cc87317726f: WARNING: CPU: 0 PID: 1 at drivers/iommu/io-pgtable-arm.c:413 __arm_lpae_unmap+0x341/0x380() Tetsuo Handa
  0 siblings, 1 reply; 11+ messages in thread
From: Michal Hocko @ 2015-03-20 14:34 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: ying.huang, hannes, torvalds, rientjes, akpm, david,
	linux-kernel, lkp, linux-mm

On Fri 20-03-15 23:02:09, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Fri 20-03-15 22:34:21, Tetsuo Handa wrote:
> > > Huang Ying wrote:
> > > > > > BTW: the test was run on a 32-bit system.
> > > > > 
> > > > > That sounds like the cause of your problem. The system might be out of
> > > > > address space available for the kernel (only 1GB if x86_32). You should
> > > > > try running tests on 64-bit systems.
> > > > 
> > > > We run tests on both 32-bit and 64-bit systems to catch problems on both
> > > > platforms.  I think we still need to support 32-bit systems?
> > > 
> > > Yes, testing on both platforms is good. But please read
> > > http://lwn.net/Articles/627419/ , http://lwn.net/Articles/635354/ and
> > > http://lwn.net/Articles/636017/ . Then please add __GFP_NORETRY to memory
> > > allocations in btrfs code if it is appropriate.
> > 
> > I guess you meant __GFP_NOFAIL?
> > 
> No. btrfs's selftest (which is not using __GFP_NOFAIL) is already looping
> forever. If we want to prevent btrfs's selftest from looping forever, btrfs
> needs __GFP_NORETRY rather than __GFP_NOFAIL (until we establish a way to
> safely allow small allocations to fail).

Sigh. If the code is using GFP_NOFS allocations (which seems to be the
case, because it worked with 9879de7373fc), then the proper fix for
this IMO is to simply not retry endlessly for these allocations.  We
have to sort out some other issues before we can make NOFS allocations
fail, but let's not pile more workarounds on top in the meantime. But if
btrfs people really want __GFP_NORETRY, then I do not really care much.
-- 
Michal Hocko
SUSE Labs

* Re: [LKP] [mm] cc87317726f: WARNING: CPU: 0 PID: 1 at drivers/iommu/io-pgtable-arm.c:413 __arm_lpae_unmap+0x341/0x380()
  2015-03-20 14:34                 ` Michal Hocko
@ 2015-03-20 15:41                   ` Tetsuo Handa
  0 siblings, 0 replies; 11+ messages in thread
From: Tetsuo Handa @ 2015-03-20 15:41 UTC (permalink / raw)
  To: mhocko
  Cc: ying.huang, hannes, torvalds, rientjes, akpm, david,
	linux-kernel, lkp, linux-mm

Michal Hocko wrote:
> On Fri 20-03-15 23:02:09, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Fri 20-03-15 22:34:21, Tetsuo Handa wrote:
> > > > Huang Ying wrote:
> > > > > > > BTW: the test was run on a 32-bit system.
> > > > > > 
> > > > > > That sounds like the cause of your problem. The system might be out of
> > > > > > address space available for the kernel (only 1GB if x86_32). You should
> > > > > > try running tests on 64-bit systems.
> > > > > 
> > > > > We run tests on both 32-bit and 64-bit systems to catch problems on both
> > > > > platforms.  I think we still need to support 32-bit systems?
> > > > 
> > > > Yes, testing on both platforms is good. But please read
> > > > http://lwn.net/Articles/627419/ , http://lwn.net/Articles/635354/ and
> > > > http://lwn.net/Articles/636017/ . Then please add __GFP_NORETRY to memory
> > > > allocations in btrfs code if it is appropriate.
> > > 
> > > I guess you meant __GFP_NOFAIL?
> > > 
> > No. btrfs's selftest (which is not using __GFP_NOFAIL) is already looping
> > forever. If we want to prevent btrfs's selftest from looping forever, btrfs
> > needs __GFP_NORETRY rather than __GFP_NOFAIL (until we establish a way to
> > safely allow small allocations to fail).
> 
> Sigh. If the code is using GFP_NOFS allocations (which seems to be the
> case, because it worked with 9879de7373fc), then the proper fix for
> this IMO is to simply not retry endlessly for these allocations.

We can avoid looping forever by passing __GFP_NORETRY (from the caller side)
or by using sysctl_nr_alloc_retry == 1 (from the callee side). But

> We
> have to sort out some other issues before we can make NOFS allocations
> fail, but let's not pile more workarounds on top in the meantime. But if
> btrfs people really want __GFP_NORETRY, then I do not really care much.

https://lkml.org/lkml/2015/3/19/221 suggests that changing each caller to
use either __GFP_NOFAIL or __GFP_NORETRY is a safer way to allow small
allocations to fail than using sysctl_nr_alloc_retry, since we don't want
to add __GFP_NOFAIL to allocations triggered by page faults.
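Another toy model can sketch the difference between the caller-side and callee-side options. M_GFP_NORETRY and M_GFP_NOFAIL are model-only stand-ins for __GFP_NORETRY and __GFP_NOFAIL. nr_alloc_retry is a hypothetical global bound loosely modelled on the sysctl_nr_alloc_retry knob mentioned above. In this sketch, 0 means "no bound", and the actual proposal's semantics may well differ. The allocation is assumed to never succeed.

```c
#include <assert.h>

/* Model-only flag values; not the kernel's bit assignments. */
#define M_GFP_NOFAIL  0x1u
#define M_GFP_NORETRY 0x2u

/*
 * Return the attempt number at which an order-0 allocation that never
 * succeeds gives up and returns NULL, or 0 if it is still retrying after
 * max_attempts. nr_alloc_retry == 0 means "no global bound" in this model.
 */
static unsigned int attempts_until_null(unsigned int flags,
					unsigned int nr_alloc_retry,
					unsigned int max_attempts)
{
	unsigned int attempt;

	assert(max_attempts > 0);
	for (attempt = 1; attempt <= max_attempts; attempt++) {
		if (flags & M_GFP_NOFAIL)
			continue;               /* caller cannot handle NULL */
		if (flags & M_GFP_NORETRY)
			return attempt;         /* caller-side opt-out */
		if (nr_alloc_retry != 0 && attempt >= nr_alloc_retry)
			return attempt;         /* callee-side global bound */
	}
	return 0;
}
```

In this model, the caller-side flag changes only the call sites that pass it: attempts_until_null(M_GFP_NORETRY, 0, n) gives up on the first attempt, while unflagged callers keep looping. A global bound of 1 makes every unflagged allocation fallible. Any path that must not see NULL then has to opt back in with M_GFP_NOFAIL, which is what Tetsuo does not want to do for page-fault allocations.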

Thread overview: 11+ messages
     [not found] <1426227621.6711.238.camel@intel.com>
2015-03-17 17:15 ` [LKP] [mm] cc87317726f: WARNING: CPU: 0 PID: 1 at drivers/iommu/io-pgtable-arm.c:413 __arm_lpae_unmap+0x341/0x380() Linus Torvalds
2015-03-17 17:28   ` Michal Hocko
2015-03-17 19:24   ` Johannes Weiner
2015-03-18  1:53     ` Huang Ying
2015-03-18 11:45       ` [LKP] [mm] cc87317726f: WARNING: CPU: 0 PID: 1 at drivers/iommu/io-pgtable-arm.c:413 __arm_lpae_unmap+0x341/0x380() Tetsuo Handa
2015-03-19  1:57         ` Huang Ying
2015-03-20 13:34           ` [LKP] [mm] cc87317726f: WARNING: CPU: 0 PID: 1 at drivers/iommu/io-pgtable-arm.c:413 __arm_lpae_unmap+0x341/0x380() Tetsuo Handa
2015-03-20 13:38             ` Michal Hocko
2015-03-20 14:02               ` [LKP] [mm] cc87317726f: WARNING: CPU: 0 PID: 1 at drivers/iommu/io-pgtable-arm.c:413 __arm_lpae_unmap+0x341/0x380() Tetsuo Handa
2015-03-20 14:34                 ` Michal Hocko
2015-03-20 15:41                   ` [LKP] [mm] cc87317726f: WARNING: CPU: 0 PID: 1 at drivers/iommu/io-pgtable-arm.c:413 __arm_lpae_unmap+0x341/0x380() Tetsuo Handa
