* [next-20150119]regression (mm)?
@ 2015-01-19 16:42 ` Nishanth Menon
  0 siblings, 0 replies; 48+ messages in thread
From: Nishanth Menon @ 2015-01-19 16:42 UTC (permalink / raw)
  To: linux-omap, linux-next, linux-arm-kernel; +Cc: Balbi, Felipe

Hi,

Most platforms seem broken in today's next tag.

https://github.com/nmenon/kernel-test-logs/tree/next-20150119
(defconfig: omap2plus_defconfig)

> [    7.166600] ------------[ cut here ]------------
> [    7.171676] WARNING: CPU: 0 PID: 54 at mm/mmap.c:2859 exit_mmap+0x1a8/0x21c()
> [    7.179194] Modules linked in:
> [    7.182479] CPU: 0 PID: 54 Comm: init Not tainted 3.19.0-rc5-next-20150119-00002-gfdefcded1272 #1
> [    7.191863] Hardware name: Generic AM33XX (Flattened Device Tree)
> [    7.198318] [<c00153f0>] (unwind_backtrace) from [<c0011a74>] (show_stack+0x10/0x14)
> [    7.206528] [<c0011a74>] (show_stack) from [<c0580150>] (dump_stack+0x78/0x94)
> [    7.214191] [<c0580150>] (dump_stack) from [<c003d4d0>] (warn_slowpath_common+0x7c/0xb4)
> [    7.222751] [<c003d4d0>] (warn_slowpath_common) from [<c003d524>] (warn_slowpath_null+0x1c/0x24)
> [    7.232038] [<c003d524>] (warn_slowpath_null) from [<c012de64>] (exit_mmap+0x1a8/0x21c)
> [    7.240536] [<c012de64>] (exit_mmap) from [<c003abb8>] (mmput+0x44/0xec)
> [    7.247612] [<c003abb8>] (mmput) from [<c0151368>] (flush_old_exec+0x300/0x5a4)
> [    7.255357] [<c0151368>] (flush_old_exec) from [<c0195c10>] (load_elf_binary+0x2ec/0x1144)
> [    7.264111] [<c0195c10>] (load_elf_binary) from [<c0150ea0>] (search_binary_handler+0x88/0x1ac)
> [    7.273311] [<c0150ea0>] (search_binary_handler) from [<c019554c>] (load_script+0x260/0x280)
> [    7.282232] [<c019554c>] (load_script) from [<c0150ea0>] (search_binary_handler+0x88/0x1ac)
> [    7.291066] [<c0150ea0>] (search_binary_handler) from [<c0151f0c>] (do_execveat_common+0x538/0x6c4)
> [    7.300628] [<c0151f0c>] (do_execveat_common) from [<c01520c4>] (do_execve+0x2c/0x34)
> [    7.308881] [<c01520c4>] (do_execve) from [<c000e5e0>] (ret_fast_syscall+0x0/0x4c)
> [    7.316881] ---[ end trace 3b8a46b1b280f423 ]---


-- 
Regards,
Nishanth Menon

* Re: [next-20150119]regression (mm)?
       [not found] ` <CANMBJr6DudDBSs+rM-e2QnC5ztxAYLuSvZ0khvx7OZdQpcu_3A@mail.gmail.com>
@ 2015-01-19 17:04     ` Nishanth Menon
  0 siblings, 0 replies; 48+ messages in thread
From: Nishanth Menon @ 2015-01-19 17:04 UTC (permalink / raw)
  To: Tyler Baker; +Cc: linux-omap, linux-next, linux-arm-kernel, Balbi, Felipe


On 01/19/2015 10:59 AM, Tyler Baker wrote:
> I can confirm, I am observing the same issue in my lab. 15 platforms
> failed to boot on next-20150119.
> 
> http://kernelci.org/boot/?next-20150119&fail

http://kernelci.org/boot/all/job/next/kernel/next-20150119/
I see many platforms succeed in lab-khilman, but fail in your farm as
well :(

For example:
http://storage.kernelci.org/next/next-20150119/arm-imx_v6_v7_defconfig/lab-khilman/boot-imx6q-wandboard.txt
has the same errors, but is marked success.
http://storage.kernelci.org/next/next-20150119/arm-imx_v6_v7_defconfig/lab-tbaker/boot-imx6q-wandboard.txt
is marked fail.

I suppose this is much worse than the "pass" status indicates.

-- 
Regards,
Nishanth Menon

* Re: [next-20150119]regression (mm)?
  2015-01-19 17:04     ` Nishanth Menon
@ 2015-01-19 17:19       ` Tyler Baker
  -1 siblings, 0 replies; 48+ messages in thread
From: Tyler Baker @ 2015-01-19 17:19 UTC (permalink / raw)
  To: Nishanth Menon; +Cc: linux-omap, linux-next, linux-arm-kernel, Balbi, Felipe

On 19 January 2015 at 09:04, Nishanth Menon <nm@ti.com> wrote:
>
> On 01/19/2015 10:59 AM, Tyler Baker wrote:
>> I can confirm, I am observing the same issue in my lab. 15 platforms
>> failed to boot on next-20150119.
>>
>> http://kernelci.org/boot/?next-20150119&fail
>
> http://kernelci.org/boot/all/job/next/kernel/next-20150119/
> I see many platforms succeed in lab-khilman, but fails in your farm as
> well :(
>
> For example:
> http://storage.kernelci.org/next/next-20150119/arm-imx_v6_v7_defconfig/lab-khilman/boot-imx6q-wandboard.txt
> has the same errors, but marked success.
> http://storage.kernelci.org/next/next-20150119/arm-imx_v6_v7_defconfig/lab-tbaker/boot-imx6q-wandboard.txt
> is marked fail.
>
> I suppose this is much worse than the "pass" status indicates.

I agree. I believe these boots were marked as 'passed' because the
platforms eventually reached userspace despite the kernel spewing
errors. I've re-run a few of my boots, and sometimes the platform
reaches userspace, other times it hangs.

-- 
Tyler Baker
Tech Lead, LAVA
Linaro.org | Open source software for ARM SoCs
Follow Linaro: http://www.facebook.com/pages/Linaro
http://twitter.com/#!/linaroorg - http://www.linaro.org/linaro-blog

* Re: [next-20150119]regression (mm)?
  2015-01-19 16:42 ` Nishanth Menon
@ 2015-01-19 17:43   ` Felipe Balbi
  -1 siblings, 0 replies; 48+ messages in thread
From: Felipe Balbi @ 2015-01-19 17:43 UTC (permalink / raw)
  To: Nishanth Menon
  Cc: linux-omap, linux-next, linux-arm-kernel, Balbi, Felipe,
	Kirill A. Shutemov

Hi,

On Mon, Jan 19, 2015 at 10:42:04AM -0600, Nishanth Menon wrote:
> Most platforms seem broken in today's next tag.
> 
> https://github.com/nmenon/kernel-test-logs/tree/next-20150119
> (defconfig: omap2plus_defconfig)
> 
> > [    7.166600] ------------[ cut here ]------------
> > [    7.171676] WARNING: CPU: 0 PID: 54 at mm/mmap.c:2859 exit_mmap+0x1a8/0x21c()
> > [    7.179194] Modules linked in:
> > [    7.182479] CPU: 0 PID: 54 Comm: init Not tainted 3.19.0-rc5-next-20150119-00002-gfdefcded1272 #1
> > [    7.191863] Hardware name: Generic AM33XX (Flattened Device Tree)
> > [    7.198318] [<c00153f0>] (unwind_backtrace) from [<c0011a74>] (show_stack+0x10/0x14)
> > [    7.206528] [<c0011a74>] (show_stack) from [<c0580150>] (dump_stack+0x78/0x94)
> > [    7.214191] [<c0580150>] (dump_stack) from [<c003d4d0>] (warn_slowpath_common+0x7c/0xb4)
> > [    7.222751] [<c003d4d0>] (warn_slowpath_common) from [<c003d524>] (warn_slowpath_null+0x1c/0x24)
> > [    7.232038] [<c003d524>] (warn_slowpath_null) from [<c012de64>] (exit_mmap+0x1a8/0x21c)
> > [    7.240536] [<c012de64>] (exit_mmap) from [<c003abb8>] (mmput+0x44/0xec)
> > [    7.247612] [<c003abb8>] (mmput) from [<c0151368>] (flush_old_exec+0x300/0x5a4)
> > [    7.255357] [<c0151368>] (flush_old_exec) from [<c0195c10>] (load_elf_binary+0x2ec/0x1144)
> > [    7.264111] [<c0195c10>] (load_elf_binary) from [<c0150ea0>] (search_binary_handler+0x88/0x1ac)
> > [    7.273311] [<c0150ea0>] (search_binary_handler) from [<c019554c>] (load_script+0x260/0x280)
> > [    7.282232] [<c019554c>] (load_script) from [<c0150ea0>] (search_binary_handler+0x88/0x1ac)
> > [    7.291066] [<c0150ea0>] (search_binary_handler) from [<c0151f0c>] (do_execveat_common+0x538/0x6c4)
> > [    7.300628] [<c0151f0c>] (do_execveat_common) from [<c01520c4>] (do_execve+0x2c/0x34)
> > [    7.308881] [<c01520c4>] (do_execve) from [<c000e5e0>] (ret_fast_syscall+0x0/0x4c)
> > [    7.316881] ---[ end trace 3b8a46b1b280f423 ]---

seems like it's caused by:

b316feb3c37ff19cddcaf1f6b5056c633193257d is the first bad commit

Adding Kiryl to the loop.

git bisect start
# good: [ec6f34e5b552fb0a52e6aae1a5afbbb1605cc6cc] Linux 3.19-rc5
git bisect good ec6f34e5b552fb0a52e6aae1a5afbbb1605cc6cc
# bad: [a0d4287f787889e59db0fd295853a0f1f55d0699] Add linux-next specific files for 20150119
git bisect bad a0d4287f787889e59db0fd295853a0f1f55d0699
# good: [1c2f70b77b8ca77f10c59d479d009e07359d00d2] Merge remote-tracking branch 'drm/drm-next'
git bisect good 1c2f70b77b8ca77f10c59d479d009e07359d00d2
# good: [73c1390843223d8bfc85795c560c36b3d0ffee40] Merge remote-tracking branch 'leds/for-next'
git bisect good 73c1390843223d8bfc85795c560c36b3d0ffee40
# good: [7bc6bef35d48e91ad796b6eead7304998842c782] Merge remote-tracking branch 'pinctrl/for-next'
git bisect good 7bc6bef35d48e91ad796b6eead7304998842c782
# bad: [45e1eaa38732ffa3de0d18fe95d2d2b960a7c777] lib: bitmap: change bitmap_shift_right to take unsigned parameters
git bisect bad 45e1eaa38732ffa3de0d18fe95d2d2b960a7c777
# good: [c82a73a0369a7dd6dcfaf9e6bd572a4e5deda223] mm, page_alloc: reduce number of alloc_pages* functions' parameters
git bisect good c82a73a0369a7dd6dcfaf9e6bd572a4e5deda223
# bad: [0b1c810fbc4bbff7e314dd6ff91c2b4af499199d] mm: don't split THP page when syscall is called
git bisect bad 0b1c810fbc4bbff7e314dd6ff91c2b4af499199d
# good: [54faa439355a9ae476a446429967e9e38f04363e] oom, PM: make OOM detection in the freezer path raceless
git bisect good 54faa439355a9ae476a446429967e9e38f04363e
# bad: [b6c9f11c6b6993303067f7c04a73258226a6e77e] mm/compaction: add tracepoint to observe behaviour of compaction defer
git bisect bad b6c9f11c6b6993303067f7c04a73258226a6e77e
# good: [9ce5d3fb13a80f28db450de4ecf2727893e99c93] mm: pagemap_read: limit scan to virtual region being asked
git bisect good 9ce5d3fb13a80f28db450de4ecf2727893e99c93
# bad: [1a7a376546ca56e7750987c15d0c7541c17a512c] mm/compaction: change tracepoint format from decimal to hexadecimal
git bisect bad 1a7a376546ca56e7750987c15d0c7541c17a512c
# bad: [4081187ff19cf2186010c003939c17d70d0bbb27] page_writeback: put account_page_redirty() after set_page_dirty()
git bisect bad 4081187ff19cf2186010c003939c17d70d0bbb27
# bad: [b316feb3c37ff19cddcaf1f6b5056c633193257d] mm: account pmd page tables to the process
git bisect bad b316feb3c37ff19cddcaf1f6b5056c633193257d
# first bad commit: [b316feb3c37ff19cddcaf1f6b5056c633193257d] mm: account pmd page tables to the process

I've added a dump_mm() call where the warning triggers, followed by a
while (true) loop (to avoid constantly reprinting the same thing).
Here's what I get:

[    7.235903] ------------[ cut here ]------------
[    7.240881] WARNING: CPU: 0 PID: 58 at mm/mmap.c:2859 exit_mmap+0x1b4/0x218()
[    7.248369] Modules linked in: ipv6 autofs4
[    7.252792] CPU: 0 PID: 58 Comm: systemd Not tainted 3.19.0-rc5-next-20150119-dirty #888
[    7.261274] Hardware name: Generic AM43 (Flattened Device Tree)
[    7.267512] [<c0015afc>] (unwind_backtrace) from [<c001221c>] (show_stack+0x10/0x14)
[    7.275651] [<c001221c>] (show_stack) from [<c058972c>] (dump_stack+0x84/0x9c)
[    7.283249] [<c058972c>] (dump_stack) from [<c003def0>] (warn_slowpath_common+0x78/0xb4)
[    7.291750] [<c003def0>] (warn_slowpath_common) from [<c003dfc8>] (warn_slowpath_null+0x1c/0x24)
[    7.300977] [<c003dfc8>] (warn_slowpath_null) from [<c0133410>] (exit_mmap+0x1b4/0x218)
[    7.309376] [<c0133410>] (exit_mmap) from [<c003b5f0>] (mmput+0x44/0xec)
[    7.316385] [<c003b5f0>] (mmput) from [<c0157e68>] (flush_old_exec+0x264/0x5d4)
[    7.324061] [<c0157e68>] (flush_old_exec) from [<c019f180>] (load_elf_binary+0x288/0x1234)
[    7.332727] [<c019f180>] (load_elf_binary) from [<c0158304>] (search_binary_handler+0x84/0x1e8)
[    7.341857] [<c0158304>] (search_binary_handler) from [<c0158c84>] (do_execveat_common+0x53c/0x6b8)
[    7.351346] [<c0158c84>] (do_execveat_common) from [<c0158e24>] (do_execve+0x24/0x2c)
[    7.359561] [<c0158e24>] (do_execve) from [<c000e6c0>] (ret_fast_syscall+0x0/0x4c)
[    7.367485] ---[ end trace 633a89eb76b1d46e ]---
[    7.372360] mm ed29fa00 mmap ed29b6b8 seqnum 0 task_size 3204448256
[    7.372360] get_unmapped_area c001cfc0
[    7.372360] mmap_base 3069620224 mmap_legacy_base 0 highest_vm_end 3202711552
[    7.372360] pgd ed184000 mm_users 0 mm_count 1 nr_ptes 1 nr_pmds 4294967292 map_count 59
[    7.372360] hiwater_rss 37 hiwater_vm 37e total_vm 37e locked_vm 0
[    7.372360] pinned_vm 0 shared_vm 324 exec_vm 254 stack_vm 22
[    7.372360] start_code 10000 end_code c9d48 start_data da1b8 end_data ea1a4
[    7.372360] start_brk eb000 brk 10c000 start_stack becd5f10
[    7.372360] arg_start becd5fd4 arg_end becd5fdf env_start becd5fdf env_end becd5ff1
[    7.372360] binfmt c08b3158 flags cd core_state   (null)
[    7.372360] ioctx_table   (null)
[    7.372360] owner   (null) exe_file ee49d040
[    7.372360] tlb_flush_pending 0
[    7.448908] flags: 0x0()

Looking at nr_pmds, that's basically (unsigned long) -4, which tells me
we are decrementing mm->nr_pmds without incrementing first. In fact,
when I add:

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ba5f3bcca55d..8425fb419eab 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1440,11 +1440,15 @@ static inline unsigned long mm_nr_pmds(struct mm_struct *mm)
 static inline void mm_inc_nr_pmds(struct mm_struct *mm)
 {
 	atomic_long_inc(&mm->nr_pmds);
+	printk(KERN_INFO "===> %s nr_pmds %ld\n", __func__,
+			atomic_long_read(&mm->nr_pmds));
 }
 
 static inline void mm_dec_nr_pmds(struct mm_struct *mm)
 {
 	atomic_long_dec(&mm->nr_pmds);
+	printk(KERN_INFO "===> %s nr_pmds %ld\n", __func__,
+			atomic_long_read(&mm->nr_pmds));
 }
 #endif

I start getting:

[...]

[    5.935390] ===> mm_dec_nr_pmds nr_pmds -1
[    6.236832] random: systemd urandom read with 34 bits of entropy available
[    6.276340] systemd[1]: systemd 215 running in system mode. (+PAM +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ -SECCOMP -APPARMOR)
[    6.291380] systemd[1]: Detected architecture 'arm'.

Welcome to Debian GNU/Linux 8 (jessie)!

[    6.434013] systemd[1]: Inserted module 'autofs4'
[    7.152037] NET: Registered protocol family 10
[    7.165770] systemd[1]: Inserted module 'ipv6'
[    7.170646] ===> mm_dec_nr_pmds nr_pmds -2
[    7.174932] ===> mm_dec_nr_pmds nr_pmds -3
[    7.179427] ===> mm_dec_nr_pmds nr_pmds -4
[    7.258496] ===> mm_dec_nr_pmds nr_pmds -1
[    7.262809] ===> mm_dec_nr_pmds nr_pmds -2
[    7.267206] ===> mm_dec_nr_pmds nr_pmds -3
[    7.271486] ===> mm_dec_nr_pmds nr_pmds -4
[    7.275884] ------------[ cut here ]------------
[    7.280773] WARNING: CPU: 0 PID: 58 at mm/mmap.c:2859 exit_mmap+0x1b4/0x218()

[...]

Which confirms my suspicion. So we never increment nr_pmds, but we
decrement it. The simplest "fix" is to make mm_nr_pmds() return a signed
long (see below) and cast round_up()'s return to (signed long), but
that's not what we really want in this case because it's clear our PMD
accounting is bogus.
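
The wrap-around and the effect of the signed comparison can be
reproduced in userspace. This is only a minimal sketch, not kernel code:
it assumes a 32-bit unsigned long (modelled here with uint32_t), a
counter that starts at zero, and a hypothetical limit of 1 standing in
for the round_up(...) >> PUD_SHIFT expression. All function names are
made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Models a 32-bit unsigned counter that starts at zero and is
 * decremented `decs` times with no matching increments.
 */
static uint32_t nr_pmds_after_decs(unsigned int decs)
{
	uint32_t nr_pmds = 0;

	while (decs--)
		nr_pmds--;	/* unsigned arithmetic wraps modulo 2^32 */
	return nr_pmds;
}

/* The check as an unsigned comparison: a wrapped counter looks huge. */
static bool warn_unsigned(uint32_t nr_pmds, uint32_t limit)
{
	return nr_pmds > limit;
}

/*
 * The same bits compared as signed values. The uint32_t -> int32_t
 * conversion is implementation-defined before C23; GCC and Clang wrap
 * it to the two's-complement value, which is assumed here.
 */
static bool warn_signed(uint32_t nr_pmds, uint32_t limit)
{
	return (int32_t)nr_pmds > (int32_t)limit;
}

/* Four unbalanced decrements, checked against the hypothetical limit 1. */
static void check_example(void)
{
	uint32_t nr_pmds = nr_pmds_after_decs(4);

	assert(nr_pmds == UINT32_C(4294967292));
	assert(warn_unsigned(nr_pmds, 1));
	assert(!warn_signed(nr_pmds, 1));
}
```

Calling check_example() from a main() passes silently: four unbalanced
decrements give 4294967292, the nr_pmds value in the dump. The unsigned
check fires and the signed one stays quiet, so the signed variant only
hides the imbalance rather than fixing it.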

Kiryl, any better idea on how to balance mm_inc_nr_pmds() and
mm_dec_nr_pmds() ?

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ba5f3bcca55d..c0ce2e8a9d45 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1432,7 +1432,7 @@ static inline void mm_dec_nr_pmds(struct mm_struct *mm) {}
 #else
 int __pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address);
 
-static inline unsigned long mm_nr_pmds(struct mm_struct *mm)
+static inline long mm_nr_pmds(struct mm_struct *mm)
 {
 	return atomic_long_read(&mm->nr_pmds);
 }
diff --git a/mm/mmap.c b/mm/mmap.c
index 25271805ab39..703474d3336b 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2856,7 +2856,7 @@ void exit_mmap(struct mm_struct *mm)
 			round_up(FIRST_USER_ADDRESS, PMD_SIZE) >> PMD_SHIFT);
 #ifdef PUD_SHIFT
 	WARN_ON(mm_nr_pmds(mm) >
-			round_up(FIRST_USER_ADDRESS, PUD_SIZE) >> PUD_SHIFT);
+			(signed long) round_up(FIRST_USER_ADDRESS, PUD_SIZE) >> PUD_SHIFT);
 #endif
 }
 

-- 
balbi

git bisect good 7bc6bef35d48e91ad796b6eead7304998842c782
# bad: [45e1eaa38732ffa3de0d18fe95d2d2b960a7c777] lib: bitmap: change bitmap_shift_right to take unsigned parameters
git bisect bad 45e1eaa38732ffa3de0d18fe95d2d2b960a7c777
# good: [c82a73a0369a7dd6dcfaf9e6bd572a4e5deda223] mm, page_alloc: reduce number of alloc_pages* functions' parameters
git bisect good c82a73a0369a7dd6dcfaf9e6bd572a4e5deda223
# bad: [0b1c810fbc4bbff7e314dd6ff91c2b4af499199d] mm: don't split THP page when syscall is called
git bisect bad 0b1c810fbc4bbff7e314dd6ff91c2b4af499199d
# good: [54faa439355a9ae476a446429967e9e38f04363e] oom, PM: make OOM detection in the freezer path raceless
git bisect good 54faa439355a9ae476a446429967e9e38f04363e
# bad: [b6c9f11c6b6993303067f7c04a73258226a6e77e] mm/compaction: add tracepoint to observe behaviour of compaction defer
git bisect bad b6c9f11c6b6993303067f7c04a73258226a6e77e
# good: [9ce5d3fb13a80f28db450de4ecf2727893e99c93] mm: pagemap_read: limit scan to virtual region being asked
git bisect good 9ce5d3fb13a80f28db450de4ecf2727893e99c93
# bad: [1a7a376546ca56e7750987c15d0c7541c17a512c] mm/compaction: change tracepoint format from decimal to hexadecimal
git bisect bad 1a7a376546ca56e7750987c15d0c7541c17a512c
# bad: [4081187ff19cf2186010c003939c17d70d0bbb27] page_writeback: put account_page_redirty() after set_page_dirty()
git bisect bad 4081187ff19cf2186010c003939c17d70d0bbb27
# bad: [b316feb3c37ff19cddcaf1f6b5056c633193257d] mm: account pmd page tables to the process
git bisect bad b316feb3c37ff19cddcaf1f6b5056c633193257d
# first bad commit: [b316feb3c37ff19cddcaf1f6b5056c633193257d] mm: account pmd page tables to the process

I've added a dump_mm() call when the bug happens followed by a
while (true) loop (to avoid constant reprinting of the same thing),
here's what I get:

[    7.235903] ------------[ cut here ]------------
[    7.240881] WARNING: CPU: 0 PID: 58 at mm/mmap.c:2859 exit_mmap+0x1b4/0x218()
[    7.248369] Modules linked in: ipv6 autofs4
[    7.252792] CPU: 0 PID: 58 Comm: systemd Not tainted 3.19.0-rc5-next-20150119-dirty #888
[    7.261274] Hardware name: Generic AM43 (Flattened Device Tree)
[    7.267512] [<c0015afc>] (unwind_backtrace) from [<c001221c>] (show_stack+0x10/0x14)
[    7.275651] [<c001221c>] (show_stack) from [<c058972c>] (dump_stack+0x84/0x9c)
[    7.283249] [<c058972c>] (dump_stack) from [<c003def0>] (warn_slowpath_common+0x78/0xb4)
[    7.291750] [<c003def0>] (warn_slowpath_common) from [<c003dfc8>] (warn_slowpath_null+0x1c/0x24)
[    7.300977] [<c003dfc8>] (warn_slowpath_null) from [<c0133410>] (exit_mmap+0x1b4/0x218)
[    7.309376] [<c0133410>] (exit_mmap) from [<c003b5f0>] (mmput+0x44/0xec)
[    7.316385] [<c003b5f0>] (mmput) from [<c0157e68>] (flush_old_exec+0x264/0x5d4)
[    7.324061] [<c0157e68>] (flush_old_exec) from [<c019f180>] (load_elf_binary+0x288/0x1234)
[    7.332727] [<c019f180>] (load_elf_binary) from [<c0158304>] (search_binary_handler+0x84/0x1e8)
[    7.341857] [<c0158304>] (search_binary_handler) from [<c0158c84>] (do_execveat_common+0x53c/0x6b8)
[    7.351346] [<c0158c84>] (do_execveat_common) from [<c0158e24>] (do_execve+0x24/0x2c)
[    7.359561] [<c0158e24>] (do_execve) from [<c000e6c0>] (ret_fast_syscall+0x0/0x4c)
[    7.367485] ---[ end trace 633a89eb76b1d46e ]---
[    7.372360] mm ed29fa00 mmap ed29b6b8 seqnum 0 task_size 3204448256
[    7.372360] get_unmapped_area c001cfc0
[    7.372360] mmap_base 3069620224 mmap_legacy_base 0 highest_vm_end 3202711552
[    7.372360] pgd ed184000 mm_users 0 mm_count 1 nr_ptes 1 nr_pmds 4294967292 map_count 59
[    7.372360] hiwater_rss 37 hiwater_vm 37e total_vm 37e locked_vm 0
[    7.372360] pinned_vm 0 shared_vm 324 exec_vm 254 stack_vm 22
[    7.372360] start_code 10000 end_code c9d48 start_data da1b8 end_data ea1a4
[    7.372360] start_brk eb000 brk 10c000 start_stack becd5f10
[    7.372360] arg_start becd5fd4 arg_end becd5fdf env_start becd5fdf env_end becd5ff1
[    7.372360] binfmt c08b3158 flags cd core_state   (null)
[    7.372360] ioctx_table   (null)
[    7.372360] owner   (null) exe_file ee49d040
[    7.372360] tlb_flush_pending 0
[    7.448908] flags: 0x0()

Looking at nr_pmds, that's basically (unsigned long) -4, which tells me
we are decrementing mm->nr_pmds without incrementing first. In fact, when
I add:

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ba5f3bcca55d..8425fb419eab 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1440,11 +1440,15 @@ static inline unsigned long mm_nr_pmds(struct mm_struct *mm)
 static inline void mm_inc_nr_pmds(struct mm_struct *mm)
 {
 	atomic_long_inc(&mm->nr_pmds);
+	printk(KERN_INFO "===> %s nr_pmds %ld\n", __func__,
+			atomic_long_read(&mm->nr_pmds));
 }
 
 static inline void mm_dec_nr_pmds(struct mm_struct *mm)
 {
 	atomic_long_dec(&mm->nr_pmds);
+	printk(KERN_INFO "===> %s nr_pmds %ld\n", __func__,
+			atomic_long_read(&mm->nr_pmds));
 }
 #endif

I start getting:

[...]

[    5.935390] ===> mm_dec_nr_pmds nr_pmds -1
[    6.236832] random: systemd urandom read with 34 bits of entropy available
[    6.276340] systemd[1]: systemd 215 running in system mode. (+PAM +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ -SECCOMP -APPARMOR)
[    6.291380] systemd[1]: Detected architecture 'arm'.

Welcome to Debian GNU/Linux 8 (jessie)!

[    6.434013] systemd[1]: Inserted module 'autofs4'
[    7.152037] NET: Registered protocol family 10
[    7.165770] systemd[1]: Inserted module 'ipv6'
[    7.170646] ===> mm_dec_nr_pmds nr_pmds -2
[    7.174932] ===> mm_dec_nr_pmds nr_pmds -3
[    7.179427] ===> mm_dec_nr_pmds nr_pmds -4
[    7.258496] ===> mm_dec_nr_pmds nr_pmds -1
[    7.262809] ===> mm_dec_nr_pmds nr_pmds -2
[    7.267206] ===> mm_dec_nr_pmds nr_pmds -3
[    7.271486] ===> mm_dec_nr_pmds nr_pmds -4
[    7.275884] ------------[ cut here ]------------
[    7.280773] WARNING: CPU: 0 PID: 58 at mm/mmap.c:2859 exit_mmap+0x1b4/0x218()

[...]

Which confirms my suspicion: we never increment nr_pmds, but we do
decrement it. The simplest "fix" is to make mm_nr_pmds() return a signed
long (see below) and cast round_up()'s result to (signed long), but
that's not what we really want in this case because it's clear our PMD
accounting is bogus.
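
To make the signed/unsigned point concrete, here is a minimal userspace
sketch. It is not kernel code: counter_t, warn_unsigned() and
warn_signed() are made-up names, int32_t stands in for a 32-bit long on
ARM, and the limit of 1 is only an assumed example value for the
right-hand side of the WARN_ON().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the value read from mm->nr_pmds on 32-bit ARM. */
typedef int32_t counter_t;

/* Models the current check: the counter is compared as an unsigned value. */
static bool warn_unsigned(counter_t nr_pmds, uint32_t limit)
{
	return (uint32_t)nr_pmds > limit;
}

/* Models the check after the signed "fix": both sides are signed. */
static bool warn_signed(counter_t nr_pmds, uint32_t limit)
{
	return nr_pmds > (int32_t)limit;
}

/* Self-check using the counter value seen in the dump_mm() output. */
static void check_examples(void)
{
	/* -4 viewed as a 32-bit unsigned value is what dump_mm() printed. */
	assert((uint32_t)(counter_t)-4 == 4294967292u);

	/* Unsigned compare: the underflowed counter trips the warning. */
	assert(warn_unsigned(-4, 1));

	/* Signed compare: the warning goes quiet, the counter is still wrong. */
	assert(!warn_signed(-4, 1));
}
```

In this model the signed comparison only hides the symptom: the counter
is still -4, it just no longer looks like a huge positive number.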

Kiryl, any better idea on how to balance mm_inc_nr_pmds() and
mm_dec_nr_pmds() ?

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ba5f3bcca55d..c0ce2e8a9d45 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1432,7 +1432,7 @@ static inline void mm_dec_nr_pmds(struct mm_struct *mm) {}
 #else
 int __pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address);
 
-static inline unsigned long mm_nr_pmds(struct mm_struct *mm)
+static inline long mm_nr_pmds(struct mm_struct *mm)
 {
 	return atomic_long_read(&mm->nr_pmds);
 }
diff --git a/mm/mmap.c b/mm/mmap.c
index 25271805ab39..703474d3336b 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2856,7 +2856,7 @@ void exit_mmap(struct mm_struct *mm)
 			round_up(FIRST_USER_ADDRESS, PMD_SIZE) >> PMD_SHIFT);
 #ifdef PUD_SHIFT
 	WARN_ON(mm_nr_pmds(mm) >
-			round_up(FIRST_USER_ADDRESS, PUD_SIZE) >> PUD_SHIFT);
+			(signed long) round_up(FIRST_USER_ADDRESS, PUD_SIZE) >> PUD_SHIFT);
 #endif
 }
 

-- 
balbi

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [next-20150119]regression (mm)?
  2015-01-19 17:43   ` Felipe Balbi
  (?)
@ 2015-01-20  0:16     ` Kirill A. Shutemov
  -1 siblings, 0 replies; 48+ messages in thread
From: Kirill A. Shutemov @ 2015-01-20  0:16 UTC (permalink / raw)
  Cc: Nishanth Menon, linux-omap, linux-next, linux-arm-kernel, Balbi,
	Felipe, Kirill A. Shutemov, linux-mm

Felipe Balbi wrote:
> Hi,
> 
> On Mon, Jan 19, 2015 at 10:42:04AM -0600, Nishanth Menon wrote:
> > Most platforms seem broken in today's next tag.
> > 
> > https://github.com/nmenon/kernel-test-logs/tree/next-20150119
> > (defconfig: omap2plus_defconfig)
> > 
> > > [    7.166600] ------------[ cut here ]------------
> > > [    7.171676] WARNING: CPU: 0 PID: 54 at mm/mmap.c:2859 exit_mmap+0x1a8/0x21c()
> > > [    7.179194] Modules linked in:
> > > [    7.182479] CPU: 0 PID: 54 Comm: init Not tainted 3.19.0-rc5-next-20150119-00002-gfdefcded1272 #1
> > > [    7.191863] Hardware name: Generic AM33XX (Flattened Device Tree)
> > > [    7.198318] [<c00153f0>] (unwind_backtrace) from [<c0011a74>] (show_stack+0x10/0x14)
> > > [    7.206528] [<c0011a74>] (show_stack) from [<c0580150>] (dump_stack+0x78/0x94)
> > > [    7.214191] [<c0580150>] (dump_stack) from [<c003d4d0>] (warn_slowpath_common+0x7c/0xb4)
> > > [    7.222751] [<c003d4d0>] (warn_slowpath_common) from [<c003d524>] (warn_slowpath_null+0x1c/0x24)
> > > [    7.232038] [<c003d524>] (warn_slowpath_null) from [<c012de64>] (exit_mmap+0x1a8/0x21c)
> > > [    7.240536] [<c012de64>] (exit_mmap) from [<c003abb8>] (mmput+0x44/0xec)
> > > [    7.247612] [<c003abb8>] (mmput) from [<c0151368>] (flush_old_exec+0x300/0x5a4)
> > > [    7.255357] [<c0151368>] (flush_old_exec) from [<c0195c10>] (load_elf_binary+0x2ec/0x1144)
> > > [    7.264111] [<c0195c10>] (load_elf_binary) from [<c0150ea0>] (search_binary_handler+0x88/0x1ac)
> > > [    7.273311] [<c0150ea0>] (search_binary_handler) from [<c019554c>] (load_script+0x260/0x280)
> > > [    7.282232] [<c019554c>] (load_script) from [<c0150ea0>] (search_binary_handler+0x88/0x1ac)
> > > [    7.291066] [<c0150ea0>] (search_binary_handler) from [<c0151f0c>] (do_execveat_common+0x538/0x6c4)
> > > [    7.300628] [<c0151f0c>] (do_execveat_common) from [<c01520c4>] (do_execve+0x2c/0x34)
> > > [    7.308881] [<c01520c4>] (do_execve) from [<c000e5e0>] (ret_fast_syscall+0x0/0x4c)
> > > [    7.316881] ---[ end trace 3b8a46b1b280f423 ]---
> 
> seems like it's caused by:
> 
> b316feb3c37ff19cddcaf1f6b5056c633193257d is the first bad commit
> 
> Adding Kiryl to the loop.
> 
> git bisect start
> # good: [ec6f34e5b552fb0a52e6aae1a5afbbb1605cc6cc] Linux 3.19-rc5
> git bisect good ec6f34e5b552fb0a52e6aae1a5afbbb1605cc6cc
> # bad: [a0d4287f787889e59db0fd295853a0f1f55d0699] Add linux-next specific files for 20150119
> git bisect bad a0d4287f787889e59db0fd295853a0f1f55d0699
> # good: [1c2f70b77b8ca77f10c59d479d009e07359d00d2] Merge remote-tracking branch 'drm/drm-next'
> git bisect good 1c2f70b77b8ca77f10c59d479d009e07359d00d2
> # good: [73c1390843223d8bfc85795c560c36b3d0ffee40] Merge remote-tracking branch 'leds/for-next'
> git bisect good 73c1390843223d8bfc85795c560c36b3d0ffee40
> # good: [7bc6bef35d48e91ad796b6eead7304998842c782] Merge remote-tracking branch 'pinctrl/for-next'
> git bisect good 7bc6bef35d48e91ad796b6eead7304998842c782
> # bad: [45e1eaa38732ffa3de0d18fe95d2d2b960a7c777] lib: bitmap: change bitmap_shift_right to take unsigned parameters
> git bisect bad 45e1eaa38732ffa3de0d18fe95d2d2b960a7c777
> # good: [c82a73a0369a7dd6dcfaf9e6bd572a4e5deda223] mm, page_alloc: reduce number of alloc_pages* functions' parameters
> git bisect good c82a73a0369a7dd6dcfaf9e6bd572a4e5deda223
> # bad: [0b1c810fbc4bbff7e314dd6ff91c2b4af499199d] mm: don't split THP page when syscall is called
> git bisect bad 0b1c810fbc4bbff7e314dd6ff91c2b4af499199d
> # good: [54faa439355a9ae476a446429967e9e38f04363e] oom, PM: make OOM detection in the freezer path raceless
> git bisect good 54faa439355a9ae476a446429967e9e38f04363e
> # bad: [b6c9f11c6b6993303067f7c04a73258226a6e77e] mm/compaction: add tracepoint to observe behaviour of compaction defer
> git bisect bad b6c9f11c6b6993303067f7c04a73258226a6e77e
> # good: [9ce5d3fb13a80f28db450de4ecf2727893e99c93] mm: pagemap_read: limit scan to virtual region being asked
> git bisect good 9ce5d3fb13a80f28db450de4ecf2727893e99c93
> # bad: [1a7a376546ca56e7750987c15d0c7541c17a512c] mm/compaction: change tracepoint format from decimal to hexadecimal
> git bisect bad 1a7a376546ca56e7750987c15d0c7541c17a512c
> # bad: [4081187ff19cf2186010c003939c17d70d0bbb27] page_writeback: put account_page_redirty() after set_page_dirty()
> git bisect bad 4081187ff19cf2186010c003939c17d70d0bbb27
> # bad: [b316feb3c37ff19cddcaf1f6b5056c633193257d] mm: account pmd page tables to the process
> git bisect bad b316feb3c37ff19cddcaf1f6b5056c633193257d
> # first bad commit: [b316feb3c37ff19cddcaf1f6b5056c633193257d] mm: account pmd page tables to the process
> 
> I've added a dump_mm() call when the bug happens followed by a
> while (true) loop (to avoid constant reprinting of the same thing),
> here's what I get:
> 
> [    7.235903] ------------[ cut here ]------------
> [    7.240881] WARNING: CPU: 0 PID: 58 at mm/mmap.c:2859 exit_mmap+0x1b4/0x218()
> [    7.248369] Modules linked in: ipv6 autofs4
> [    7.252792] CPU: 0 PID: 58 Comm: systemd Not tainted 3.19.0-rc5-next-20150119-dirty #888
> [    7.261274] Hardware name: Generic AM43 (Flattened Device Tree)
> [    7.267512] [<c0015afc>] (unwind_backtrace) from [<c001221c>] (show_stack+0x10/0x14)
> [    7.275651] [<c001221c>] (show_stack) from [<c058972c>] (dump_stack+0x84/0x9c)
> [    7.283249] [<c058972c>] (dump_stack) from [<c003def0>] (warn_slowpath_common+0x78/0xb4)
> [    7.291750] [<c003def0>] (warn_slowpath_common) from [<c003dfc8>] (warn_slowpath_null+0x1c/0x24)
> [    7.300977] [<c003dfc8>] (warn_slowpath_null) from [<c0133410>] (exit_mmap+0x1b4/0x218)
> [    7.309376] [<c0133410>] (exit_mmap) from [<c003b5f0>] (mmput+0x44/0xec)
> [    7.316385] [<c003b5f0>] (mmput) from [<c0157e68>] (flush_old_exec+0x264/0x5d4)
> [    7.324061] [<c0157e68>] (flush_old_exec) from [<c019f180>] (load_elf_binary+0x288/0x1234)
> [    7.332727] [<c019f180>] (load_elf_binary) from [<c0158304>] (search_binary_handler+0x84/0x1e8)
> [    7.341857] [<c0158304>] (search_binary_handler) from [<c0158c84>] (do_execveat_common+0x53c/0x6b8)
> [    7.351346] [<c0158c84>] (do_execveat_common) from [<c0158e24>] (do_execve+0x24/0x2c)
> [    7.359561] [<c0158e24>] (do_execve) from [<c000e6c0>] (ret_fast_syscall+0x0/0x4c)
> [    7.367485] ---[ end trace 633a89eb76b1d46e ]---
> [    7.372360] mm ed29fa00 mmap ed29b6b8 seqnum 0 task_size 3204448256
> [    7.372360] get_unmapped_area c001cfc0
> [    7.372360] mmap_base 3069620224 mmap_legacy_base 0 highest_vm_end 3202711552
> [    7.372360] pgd ed184000 mm_users 0 mm_count 1 nr_ptes 1 nr_pmds 4294967292 map_count 59
> [    7.372360] hiwater_rss 37 hiwater_vm 37e total_vm 37e locked_vm 0
> [    7.372360] pinned_vm 0 shared_vm 324 exec_vm 254 stack_vm 22
> [    7.372360] start_code 10000 end_code c9d48 start_data da1b8 end_data ea1a4
> [    7.372360] start_brk eb000 brk 10c000 start_stack becd5f10
> [    7.372360] arg_start becd5fd4 arg_end becd5fdf env_start becd5fdf env_end becd5ff1
> [    7.372360] binfmt c08b3158 flags cd core_state   (null)
> [    7.372360] ioctx_table   (null)
> [    7.372360] owner   (null) exe_file ee49d040
> [    7.372360] tlb_flush_pending 0
> [    7.448908] flags: 0x0()
> 
> Looking at nr_pmds, that's basically (unsigned long) -4, which tells me
> we are decrementing mm->nr_pmds without incrementing first. In fact, when
> I add:
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ba5f3bcca55d..8425fb419eab 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1440,11 +1440,15 @@ static inline unsigned long mm_nr_pmds(struct mm_struct *mm)
>  static inline void mm_inc_nr_pmds(struct mm_struct *mm)
>  {
>  	atomic_long_inc(&mm->nr_pmds);
> +	printk(KERN_INFO "===> %s nr_pmds %ld\n", __func__,
> +			atomic_long_read(&mm->nr_pmds));
>  }
>  
>  static inline void mm_dec_nr_pmds(struct mm_struct *mm)
>  {
>  	atomic_long_dec(&mm->nr_pmds);
> +	printk(KERN_INFO "===> %s nr_pmds %ld\n", __func__,
> +			atomic_long_read(&mm->nr_pmds));
>  }
>  #endif
> 
> I start getting:
> 
> [...]
> 
> [    5.935390] ===> mm_dec_nr_pmds nr_pmds -1
> [    6.236832] random: systemd urandom read with 34 bits of entropy available
> [    6.276340] systemd[1]: systemd 215 running in system mode. (+PAM +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ -SECCOMP -APPARMOR)
> [    6.291380] systemd[1]: Detected architecture 'arm'.
> 
> Welcome to Debian GNU/Linux 8 (jessie)!
> 
> [    6.434013] systemd[1]: Inserted module 'autofs4'
> [    7.152037] NET: Registered protocol family 10
> [    7.165770] systemd[1]: Inserted module 'ipv6'
> [    7.170646] ===> mm_dec_nr_pmds nr_pmds -2
> [    7.174932] ===> mm_dec_nr_pmds nr_pmds -3
> [    7.179427] ===> mm_dec_nr_pmds nr_pmds -4
> [    7.258496] ===> mm_dec_nr_pmds nr_pmds -1
> [    7.262809] ===> mm_dec_nr_pmds nr_pmds -2
> [    7.267206] ===> mm_dec_nr_pmds nr_pmds -3
> [    7.271486] ===> mm_dec_nr_pmds nr_pmds -4
> [    7.275884] ------------[ cut here ]------------
> [    7.280773] WARNING: CPU: 0 PID: 58 at mm/mmap.c:2859 exit_mmap+0x1b4/0x218()
> 
> [...]
> 
> Which confirms my suspicion: we never increment nr_pmds, but we do
> decrement it. The simplest "fix" is to make mm_nr_pmds() return a signed
> long (see below) and cast round_up()'s result to (signed long), but
> that's not what we really want in this case because it's clear our PMD
> accounting is bogus.
> 
> Kiryl, any better idea on how to balance mm_inc_nr_pmds() and
> mm_dec_nr_pmds() ?

I assume it's on a !LPAE kernel, right?

I took a quick look. ARM folds the PMD level when it uses 2-level page
tables, but it doesn't use the standard approach -- pgtable-nopmd.h.
As a result, ARM doesn't have __PAGETABLE_PMD_FOLDED defined.

I will look further tomorrow, but I wonder if we can just define
__PAGETABLE_PMD_FOLDED in arch/arm/include/asm/pgtable-2level.h ?

This way we would also get rid of dead code -- __pmd_alloc() is never
called in this configuration. And it would fix the accounting issue: the
mm_*_nr_pmds() helpers would become no-ops.
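
To illustrate what I mean, here is a rough userspace sketch of the
pattern, not the actual mm.h code. DEMO_PMD_FOLDED stands in for
__PAGETABLE_PMD_FOLDED, and struct demo_mm and the demo_*() helpers are
made-up names that only model the shape of the mm_*_nr_pmds() helpers.

```c
#include <assert.h>

/* Stand-in for __PAGETABLE_PMD_FOLDED; remove it to get real accounting. */
#define DEMO_PMD_FOLDED 1

/* Hypothetical, stripped-down model of the counter kept in mm_struct. */
struct demo_mm {
	long nr_pmds;
};

#ifdef DEMO_PMD_FOLDED
/* PMD level is folded: there is nothing to account, so helpers do nothing. */
static inline long demo_nr_pmds(struct demo_mm *mm) { (void)mm; return 0; }
static inline void demo_inc_nr_pmds(struct demo_mm *mm) { (void)mm; }
static inline void demo_dec_nr_pmds(struct demo_mm *mm) { (void)mm; }
#else
/* Real PMD level: every allocation and free adjusts the counter. */
static inline long demo_nr_pmds(struct demo_mm *mm) { return mm->nr_pmds; }
static inline void demo_inc_nr_pmds(struct demo_mm *mm) { mm->nr_pmds++; }
static inline void demo_dec_nr_pmds(struct demo_mm *mm) { mm->nr_pmds--; }
#endif

/* Models a teardown path that frees entries without a matching alloc. */
static long demo_teardown(struct demo_mm *mm, int entries)
{
	for (int i = 0; i < entries; i++)
		demo_dec_nr_pmds(mm);
	return demo_nr_pmds(mm);
}

/* Self-check: with the folded define, four frees leave the count at 0. */
static void check_folded(void)
{
	struct demo_mm mm = { .nr_pmds = 0 };

	assert(demo_teardown(&mm, 4) == 0);
	assert(mm.nr_pmds == 0);
}
```

With the define in place the unbalanced decrements can no longer push
the counter negative, which is the effect I would expect from defining
__PAGETABLE_PMD_FOLDED for the 2-level case.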

A better option would be converting the 2-level ARM configuration to
<asm-generic/pgtable-nopmd.h>, but I'm not sure if that's possible.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [next-20150119]regression (mm)?
@ 2015-01-20  0:16     ` Kirill A. Shutemov
  0 siblings, 0 replies; 48+ messages in thread
From: Kirill A. Shutemov @ 2015-01-20  0:16 UTC (permalink / raw)
  To: Felipe Balbi
  Cc: Nishanth Menon, linux-omap, linux-next, linux-arm-kernel,
	Kirill A. Shutemov, linux-mm

Felipe Balbi wrote:
> Hi,
> 
> On Mon, Jan 19, 2015 at 10:42:04AM -0600, Nishanth Menon wrote:
> > Most platforms seem broken in today's next tag.
> > 
> > https://github.com/nmenon/kernel-test-logs/tree/next-20150119
> > (defconfig: omap2plus_defconfig)
> > 
> > > [    7.166600] ------------[ cut here ]------------
> > > [    7.171676] WARNING: CPU: 0 PID: 54 at mm/mmap.c:2859 exit_mmap+0x1a8/0x21c()
> > > [    7.179194] Modules linked in:
> > > [    7.182479] CPU: 0 PID: 54 Comm: init Not tainted 3.19.0-rc5-next-20150119-00002-gfdefcded1272 #1
> > > [    7.191863] Hardware name: Generic AM33XX (Flattened Device Tree)
> > > [    7.198318] [<c00153f0>] (unwind_backtrace) from [<c0011a74>] (show_stack+0x10/0x14)
> > > [    7.206528] [<c0011a74>] (show_stack) from [<c0580150>] (dump_stack+0x78/0x94)
> > > [    7.214191] [<c0580150>] (dump_stack) from [<c003d4d0>] (warn_slowpath_common+0x7c/0xb4)
> > > [    7.222751] [<c003d4d0>] (warn_slowpath_common) from [<c003d524>] (warn_slowpath_null+0x1c/0x24)
> > > [    7.232038] [<c003d524>] (warn_slowpath_null) from [<c012de64>] (exit_mmap+0x1a8/0x21c)
> > > [    7.240536] [<c012de64>] (exit_mmap) from [<c003abb8>] (mmput+0x44/0xec)
> > > [    7.247612] [<c003abb8>] (mmput) from [<c0151368>] (flush_old_exec+0x300/0x5a4)
> > > [    7.255357] [<c0151368>] (flush_old_exec) from [<c0195c10>] (load_elf_binary+0x2ec/0x1144)
> > > [    7.264111] [<c0195c10>] (load_elf_binary) from [<c0150ea0>] (search_binary_handler+0x88/0x1ac)
> > > [    7.273311] [<c0150ea0>] (search_binary_handler) from [<c019554c>] (load_script+0x260/0x280)
> > > [    7.282232] [<c019554c>] (load_script) from [<c0150ea0>] (search_binary_handler+0x88/0x1ac)
> > > [    7.291066] [<c0150ea0>] (search_binary_handler) from [<c0151f0c>] (do_execveat_common+0x538/0x6c4)
> > > [    7.300628] [<c0151f0c>] (do_execveat_common) from [<c01520c4>] (do_execve+0x2c/0x34)
> > > [    7.308881] [<c01520c4>] (do_execve) from [<c000e5e0>] (ret_fast_syscall+0x0/0x4c)
> > > [    7.316881] ---[ end trace 3b8a46b1b280f423 ]---
> 
> seems like it's caused by:
> 
> b316feb3c37ff19cddcaf1f6b5056c633193257d is the first bad commit
> 
> Adding Kiryl to the loop.
> 
> git bisect start
> # good: [ec6f34e5b552fb0a52e6aae1a5afbbb1605cc6cc] Linux 3.19-rc5
> git bisect good ec6f34e5b552fb0a52e6aae1a5afbbb1605cc6cc
> # bad: [a0d4287f787889e59db0fd295853a0f1f55d0699] Add linux-next specific files for 20150119
> git bisect bad a0d4287f787889e59db0fd295853a0f1f55d0699
> # good: [1c2f70b77b8ca77f10c59d479d009e07359d00d2] Merge remote-tracking branch 'drm/drm-next'
> git bisect good 1c2f70b77b8ca77f10c59d479d009e07359d00d2
> # good: [73c1390843223d8bfc85795c560c36b3d0ffee40] Merge remote-tracking branch 'leds/for-next'
> git bisect good 73c1390843223d8bfc85795c560c36b3d0ffee40
> # good: [7bc6bef35d48e91ad796b6eead7304998842c782] Merge remote-tracking branch 'pinctrl/for-next'
> git bisect good 7bc6bef35d48e91ad796b6eead7304998842c782
> # bad: [45e1eaa38732ffa3de0d18fe95d2d2b960a7c777] lib: bitmap: change bitmap_shift_right to take unsigned parameters
> git bisect bad 45e1eaa38732ffa3de0d18fe95d2d2b960a7c777
> # good: [c82a73a0369a7dd6dcfaf9e6bd572a4e5deda223] mm, page_alloc: reduce number of alloc_pages* functions' parameters
> git bisect good c82a73a0369a7dd6dcfaf9e6bd572a4e5deda223
> # bad: [0b1c810fbc4bbff7e314dd6ff91c2b4af499199d] mm: don't split THP page when syscall is called
> git bisect bad 0b1c810fbc4bbff7e314dd6ff91c2b4af499199d
> # good: [54faa439355a9ae476a446429967e9e38f04363e] oom, PM: make OOM detection in the freezer path raceless
> git bisect good 54faa439355a9ae476a446429967e9e38f04363e
> # bad: [b6c9f11c6b6993303067f7c04a73258226a6e77e] mm/compaction: add tracepoint to observe behaviour of compaction defer
> git bisect bad b6c9f11c6b6993303067f7c04a73258226a6e77e
> # good: [9ce5d3fb13a80f28db450de4ecf2727893e99c93] mm: pagemap_read: limit scan to virtual region being asked
> git bisect good 9ce5d3fb13a80f28db450de4ecf2727893e99c93
> # bad: [1a7a376546ca56e7750987c15d0c7541c17a512c] mm/compaction: change tracepoint format from decimal to hexadecimal
> git bisect bad 1a7a376546ca56e7750987c15d0c7541c17a512c
> # bad: [4081187ff19cf2186010c003939c17d70d0bbb27] page_writeback: put account_page_redirty() after set_page_dirty()
> git bisect bad 4081187ff19cf2186010c003939c17d70d0bbb27
> # bad: [b316feb3c37ff19cddcaf1f6b5056c633193257d] mm: account pmd page tables to the process
> git bisect bad b316feb3c37ff19cddcaf1f6b5056c633193257d
> # first bad commit: [b316feb3c37ff19cddcaf1f6b5056c633193257d] mm: account pmd page tables to the process
> 
> I've added a dump_mm() call when the bug happens followed by a
> while (true) loop (to avoid constant reprinting of the same thing),
> here's what I get:
> 
> [    7.235903] ------------[ cut here ]------------
> [    7.240881] WARNING: CPU: 0 PID: 58 at mm/mmap.c:2859 exit_mmap+0x1b4/0x218()
> [    7.248369] Modules linked in: ipv6 autofs4
> [    7.252792] CPU: 0 PID: 58 Comm: systemd Not tainted 3.19.0-rc5-next-20150119-dirty #888
> [    7.261274] Hardware name: Generic AM43 (Flattened Device Tree)
> [    7.267512] [<c0015afc>] (unwind_backtrace) from [<c001221c>] (show_stack+0x10/0x14)
> [    7.275651] [<c001221c>] (show_stack) from [<c058972c>] (dump_stack+0x84/0x9c)
> [    7.283249] [<c058972c>] (dump_stack) from [<c003def0>] (warn_slowpath_common+0x78/0xb4)
> [    7.291750] [<c003def0>] (warn_slowpath_common) from [<c003dfc8>] (warn_slowpath_null+0x1c/0x24)
> [    7.300977] [<c003dfc8>] (warn_slowpath_null) from [<c0133410>] (exit_mmap+0x1b4/0x218)
> [    7.309376] [<c0133410>] (exit_mmap) from [<c003b5f0>] (mmput+0x44/0xec)
> [    7.316385] [<c003b5f0>] (mmput) from [<c0157e68>] (flush_old_exec+0x264/0x5d4)
> [    7.324061] [<c0157e68>] (flush_old_exec) from [<c019f180>] (load_elf_binary+0x288/0x1234)
> [    7.332727] [<c019f180>] (load_elf_binary) from [<c0158304>] (search_binary_handler+0x84/0x1e8)
> [    7.341857] [<c0158304>] (search_binary_handler) from [<c0158c84>] (do_execveat_common+0x53c/0x6b8)
> [    7.351346] [<c0158c84>] (do_execveat_common) from [<c0158e24>] (do_execve+0x24/0x2c)
> [    7.359561] [<c0158e24>] (do_execve) from [<c000e6c0>] (ret_fast_syscall+0x0/0x4c)
> [    7.367485] ---[ end trace 633a89eb76b1d46e ]---
> [    7.372360] mm ed29fa00 mmap ed29b6b8 seqnum 0 task_size 3204448256
> [    7.372360] get_unmapped_area c001cfc0
> [    7.372360] mmap_base 3069620224 mmap_legacy_base 0 highest_vm_end 3202711552
> [    7.372360] pgd ed184000 mm_users 0 mm_count 1 nr_ptes 1 nr_pmds 4294967292 map_count 59
> [    7.372360] hiwater_rss 37 hiwater_vm 37e total_vm 37e locked_vm 0
> [    7.372360] pinned_vm 0 shared_vm 324 exec_vm 254 stack_vm 22
> [    7.372360] start_code 10000 end_code c9d48 start_data da1b8 end_data ea1a4
> [    7.372360] start_brk eb000 brk 10c000 start_stack becd5f10
> [    7.372360] arg_start becd5fd4 arg_end becd5fdf env_start becd5fdf env_end becd5ff1
> [    7.372360] binfmt c08b3158 flags cd core_state   (null)
> [    7.372360] ioctx_table   (null)
> [    7.372360] owner   (null) exe_file ee49d040
> [    7.372360] tlb_flush_pending 0
> [    7.448908] flags: 0x0()
> 
> Looking at nr_pmds, that's basically (unsigned long) -4, which tells me
> we are decrementing mm->nr_pmds without incrementing first. In fact, when
> I add:
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ba5f3bcca55d..8425fb419eab 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1440,11 +1440,15 @@ static inline unsigned long mm_nr_pmds(struct mm_struct *mm)
>  static inline void mm_inc_nr_pmds(struct mm_struct *mm)
>  {
>  	atomic_long_inc(&mm->nr_pmds);
> +	printk(KERN_INFO "===> %s nr_pmds %ld\n", __func__,
> +			atomic_long_read(&mm->nr_pmds));
>  }
>  
>  static inline void mm_dec_nr_pmds(struct mm_struct *mm)
>  {
>  	atomic_long_dec(&mm->nr_pmds);
> +	printk(KERN_INFO "===> %s nr_pmds %ld\n", __func__,
> +			atomic_long_read(&mm->nr_pmds));
>  }
>  #endif
> 
> I start getting:
> 
> [...]
> 
> [    5.935390] ===> mm_dec_nr_pmds nr_pmds -1
> [    6.236832] random: systemd urandom read with 34 bits of entropy available
> [    6.276340] systemd[1]: systemd 215 running in system mode. (+PAM +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ -SECCOMP -APPARMOR)
> [    6.291380] systemd[1]: Detected architecture 'arm'.
> 
> Welcome to Debian GNU/Linux 8 (jessie)!
> 
> [    6.434013] systemd[1]: Inserted module 'autofs4'
> [    7.152037] NET: Registered protocol family 10
> [    7.165770] systemd[1]: Inserted module 'ipv6'
> [    7.170646] ===> mm_dec_nr_pmds nr_pmds -2
> [    7.174932] ===> mm_dec_nr_pmds nr_pmds -3
> [    7.179427] ===> mm_dec_nr_pmds nr_pmds -4
> [    7.258496] ===> mm_dec_nr_pmds nr_pmds -1
> [    7.262809] ===> mm_dec_nr_pmds nr_pmds -2
> [    7.267206] ===> mm_dec_nr_pmds nr_pmds -3
> [    7.271486] ===> mm_dec_nr_pmds nr_pmds -4
> [    7.275884] ------------[ cut here ]------------
> [    7.280773] WARNING: CPU: 0 PID: 58 at mm/mmap.c:2859 exit_mmap+0x1b4/0x218()
> 
> [...]
> 
> Which confirms my suspicion: we never increment nr_pmds, but we do
> decrement it. The simplest "fix" is to make mm_nr_pmds() return a signed
> long (see below) and cast round_up()'s result to (signed long), but
> that's not what we really want in this case because it's clear our PMD
> accounting is bogus.
> 
> Kiryl, any better idea on how to balance mm_inc_nr_pmds() and
> mm_dec_nr_pmds() ?

I assume it's on a !LPAE kernel, right?

I took a quick look. ARM folds the PMD level when it uses 2-level page
tables, but it doesn't use the standard approach -- pgtable-nopmd.h.
As a result, ARM doesn't have __PAGETABLE_PMD_FOLDED defined.

I will look further tomorrow, but I wonder if we can just define
__PAGETABLE_PMD_FOLDED in arch/arm/include/asm/pgtable-2level.h ?

This way we would also get rid of dead code -- __pmd_alloc() is never
called in this configuration. And it would fix the accounting issue: the
mm_*_nr_pmds() helpers would become no-ops.

A better option would be converting the 2-level ARM configuration to
<asm-generic/pgtable-nopmd.h>, but I'm not sure if that's possible.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [next-20150119]regression (mm)?
@ 2015-01-20  0:16     ` Kirill A. Shutemov
  0 siblings, 0 replies; 48+ messages in thread
From: Kirill A. Shutemov @ 2015-01-20  0:16 UTC (permalink / raw)
  To: linux-arm-kernel

Felipe Balbi wrote:
> Hi,
> 
> On Mon, Jan 19, 2015 at 10:42:04AM -0600, Nishanth Menon wrote:
> > Most platforms seem broken in today's next tag.
> > 
> > https://github.com/nmenon/kernel-test-logs/tree/next-20150119
> > (defconfig: omap2plus_defconfig)
> > 
> > > [    7.166600] ------------[ cut here ]------------
> > > [    7.171676] WARNING: CPU: 0 PID: 54 at mm/mmap.c:2859 exit_mmap+0x1a8/0x21c()
> > > [    7.179194] Modules linked in:
> > > [    7.182479] CPU: 0 PID: 54 Comm: init Not tainted 3.19.0-rc5-next-20150119-00002-gfdefcded1272 #1
> > > [    7.191863] Hardware name: Generic AM33XX (Flattened Device Tree)
> > > [    7.198318] [<c00153f0>] (unwind_backtrace) from [<c0011a74>] (show_stack+0x10/0x14)
> > > [    7.206528] [<c0011a74>] (show_stack) from [<c0580150>] (dump_stack+0x78/0x94)
> > > [    7.214191] [<c0580150>] (dump_stack) from [<c003d4d0>] (warn_slowpath_common+0x7c/0xb4)
> > > [    7.222751] [<c003d4d0>] (warn_slowpath_common) from [<c003d524>] (warn_slowpath_null+0x1c/0x24)
> > > [    7.232038] [<c003d524>] (warn_slowpath_null) from [<c012de64>] (exit_mmap+0x1a8/0x21c)
> > > [    7.240536] [<c012de64>] (exit_mmap) from [<c003abb8>] (mmput+0x44/0xec)
> > > [    7.247612] [<c003abb8>] (mmput) from [<c0151368>] (flush_old_exec+0x300/0x5a4)
> > > [    7.255357] [<c0151368>] (flush_old_exec) from [<c0195c10>] (load_elf_binary+0x2ec/0x1144)
> > > [    7.264111] [<c0195c10>] (load_elf_binary) from [<c0150ea0>] (search_binary_handler+0x88/0x1ac)
> > > [    7.273311] [<c0150ea0>] (search_binary_handler) from [<c019554c>] (load_script+0x260/0x280)
> > > [    7.282232] [<c019554c>] (load_script) from [<c0150ea0>] (search_binary_handler+0x88/0x1ac)
> > > [    7.291066] [<c0150ea0>] (search_binary_handler) from [<c0151f0c>] (do_execveat_common+0x538/0x6c4)
> > > [    7.300628] [<c0151f0c>] (do_execveat_common) from [<c01520c4>] (do_execve+0x2c/0x34)
> > > [    7.308881] [<c01520c4>] (do_execve) from [<c000e5e0>] (ret_fast_syscall+0x0/0x4c)
> > > [    7.316881] ---[ end trace 3b8a46b1b280f423 ]---
> 
> seems like it's caused by:
> 
> b316feb3c37ff19cddcaf1f6b5056c633193257d is the first bad commit
> 
> Adding Kiryl to the loop.
> 
> git bisect start
> # good: [ec6f34e5b552fb0a52e6aae1a5afbbb1605cc6cc] Linux 3.19-rc5
> git bisect good ec6f34e5b552fb0a52e6aae1a5afbbb1605cc6cc
> # bad: [a0d4287f787889e59db0fd295853a0f1f55d0699] Add linux-next specific files for 20150119
> git bisect bad a0d4287f787889e59db0fd295853a0f1f55d0699
> # good: [1c2f70b77b8ca77f10c59d479d009e07359d00d2] Merge remote-tracking branch 'drm/drm-next'
> git bisect good 1c2f70b77b8ca77f10c59d479d009e07359d00d2
> # good: [73c1390843223d8bfc85795c560c36b3d0ffee40] Merge remote-tracking branch 'leds/for-next'
> git bisect good 73c1390843223d8bfc85795c560c36b3d0ffee40
> # good: [7bc6bef35d48e91ad796b6eead7304998842c782] Merge remote-tracking branch 'pinctrl/for-next'
> git bisect good 7bc6bef35d48e91ad796b6eead7304998842c782
> # bad: [45e1eaa38732ffa3de0d18fe95d2d2b960a7c777] lib: bitmap: change bitmap_shift_right to take unsigned parameters
> git bisect bad 45e1eaa38732ffa3de0d18fe95d2d2b960a7c777
> # good: [c82a73a0369a7dd6dcfaf9e6bd572a4e5deda223] mm, page_alloc: reduce number of alloc_pages* functions' parameters
> git bisect good c82a73a0369a7dd6dcfaf9e6bd572a4e5deda223
> # bad: [0b1c810fbc4bbff7e314dd6ff91c2b4af499199d] mm: don't split THP page when syscall is called
> git bisect bad 0b1c810fbc4bbff7e314dd6ff91c2b4af499199d
> # good: [54faa439355a9ae476a446429967e9e38f04363e] oom, PM: make OOM detection in the freezer path raceless
> git bisect good 54faa439355a9ae476a446429967e9e38f04363e
> # bad: [b6c9f11c6b6993303067f7c04a73258226a6e77e] mm/compaction: add tracepoint to observe behaviour of compaction defer
> git bisect bad b6c9f11c6b6993303067f7c04a73258226a6e77e
> # good: [9ce5d3fb13a80f28db450de4ecf2727893e99c93] mm: pagemap_read: limit scan to virtual region being asked
> git bisect good 9ce5d3fb13a80f28db450de4ecf2727893e99c93
> # bad: [1a7a376546ca56e7750987c15d0c7541c17a512c] mm/compaction: change tracepoint format from decimal to hexadecimal
> git bisect bad 1a7a376546ca56e7750987c15d0c7541c17a512c
> # bad: [4081187ff19cf2186010c003939c17d70d0bbb27] page_writeback: put account_page_redirty() after set_page_dirty()
> git bisect bad 4081187ff19cf2186010c003939c17d70d0bbb27
> # bad: [b316feb3c37ff19cddcaf1f6b5056c633193257d] mm: account pmd page tables to the process
> git bisect bad b316feb3c37ff19cddcaf1f6b5056c633193257d
> # first bad commit: [b316feb3c37ff19cddcaf1f6b5056c633193257d] mm: account pmd page tables to the process
> 
> I've added a dump_mm() call when the bug happens followed by a
> while (true) loop (to avoid constant reprinting of the same thing),
> here's what I get:
> 
> [    7.235903] ------------[ cut here ]------------
> [    7.240881] WARNING: CPU: 0 PID: 58 at mm/mmap.c:2859 exit_mmap+0x1b4/0x218()
> [    7.248369] Modules linked in: ipv6 autofs4
> [    7.252792] CPU: 0 PID: 58 Comm: systemd Not tainted 3.19.0-rc5-next-20150119-dirty #888
> [    7.261274] Hardware name: Generic AM43 (Flattened Device Tree)
> [    7.267512] [<c0015afc>] (unwind_backtrace) from [<c001221c>] (show_stack+0x10/0x14)
> [    7.275651] [<c001221c>] (show_stack) from [<c058972c>] (dump_stack+0x84/0x9c)
> [    7.283249] [<c058972c>] (dump_stack) from [<c003def0>] (warn_slowpath_common+0x78/0xb4)
> [    7.291750] [<c003def0>] (warn_slowpath_common) from [<c003dfc8>] (warn_slowpath_null+0x1c/0x24)
> [    7.300977] [<c003dfc8>] (warn_slowpath_null) from [<c0133410>] (exit_mmap+0x1b4/0x218)
> [    7.309376] [<c0133410>] (exit_mmap) from [<c003b5f0>] (mmput+0x44/0xec)
> [    7.316385] [<c003b5f0>] (mmput) from [<c0157e68>] (flush_old_exec+0x264/0x5d4)
> [    7.324061] [<c0157e68>] (flush_old_exec) from [<c019f180>] (load_elf_binary+0x288/0x1234)
> [    7.332727] [<c019f180>] (load_elf_binary) from [<c0158304>] (search_binary_handler+0x84/0x1e8)
> [    7.341857] [<c0158304>] (search_binary_handler) from [<c0158c84>] (do_execveat_common+0x53c/0x6b8)
> [    7.351346] [<c0158c84>] (do_execveat_common) from [<c0158e24>] (do_execve+0x24/0x2c)
> [    7.359561] [<c0158e24>] (do_execve) from [<c000e6c0>] (ret_fast_syscall+0x0/0x4c)
> [    7.367485] ---[ end trace 633a89eb76b1d46e ]---
> [    7.372360] mm ed29fa00 mmap ed29b6b8 seqnum 0 task_size 3204448256
> [    7.372360] get_unmapped_area c001cfc0
> [    7.372360] mmap_base 3069620224 mmap_legacy_base 0 highest_vm_end 3202711552
> [    7.372360] pgd ed184000 mm_users 0 mm_count 1 nr_ptes 1 nr_pmds 4294967292 map_count 59
> [    7.372360] hiwater_rss 37 hiwater_vm 37e total_vm 37e locked_vm 0
> [    7.372360] pinned_vm 0 shared_vm 324 exec_vm 254 stack_vm 22
> [    7.372360] start_code 10000 end_code c9d48 start_data da1b8 end_data ea1a4
> [    7.372360] start_brk eb000 brk 10c000 start_stack becd5f10
> [    7.372360] arg_start becd5fd4 arg_end becd5fdf env_start becd5fdf env_end becd5ff1
> [    7.372360] binfmt c08b3158 flags cd core_state   (null)
> [    7.372360] ioctx_table   (null)
> [    7.372360] owner   (null) exe_file ee49d040
> [    7.372360] tlb_flush_pending 0
> [    7.448908] flags: 0x0()
> 
> Looking at nr_pmds, that's basically (unsigned long) -4, which tells me
> we are decrementing mm->nr_pmds without incrementing first. In fact, when I
> add:
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ba5f3bcca55d..8425fb419eab 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1440,11 +1440,15 @@ static inline unsigned long mm_nr_pmds(struct mm_struct *mm)
>  static inline void mm_inc_nr_pmds(struct mm_struct *mm)
>  {
>  	atomic_long_inc(&mm->nr_pmds);
> +	printk(KERN_INFO "===> %s nr_pmds %ld\n", __func__,
> +			atomic_long_read(&mm->nr_pmds));
>  }
>  
>  static inline void mm_dec_nr_pmds(struct mm_struct *mm)
>  {
>  	atomic_long_dec(&mm->nr_pmds);
> +	printk(KERN_INFO "===> %s nr_pmds %ld\n", __func__,
> +			atomic_long_read(&mm->nr_pmds));
>  }
>  #endif
> 
> I start getting:
> 
> [...]
> 
> [    5.935390] ===> mm_dec_nr_pmds nr_pmds -1
> [    6.236832] random: systemd urandom read with 34 bits of entropy available
> [    6.276340] systemd[1]: systemd 215 running in system mode. (+PAM +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ -SECCOMP -APPARMOR)
> [    6.291380] systemd[1]: Detected architecture 'arm'.
> 
> Welcome to Debian GNU/Linux 8 (jessie)!
> 
> [    6.434013] systemd[1]: Inserted module 'autofs4'
> [    7.152037] NET: Registered protocol family 10
> [    7.165770] systemd[1]: Inserted module 'ipv6'
> [    7.170646] ===> mm_dec_nr_pmds nr_pmds -2
> [    7.174932] ===> mm_dec_nr_pmds nr_pmds -3
> [    7.179427] ===> mm_dec_nr_pmds nr_pmds -4
> [    7.258496] ===> mm_dec_nr_pmds nr_pmds -1
> [    7.262809] ===> mm_dec_nr_pmds nr_pmds -2
> [    7.267206] ===> mm_dec_nr_pmds nr_pmds -3
> [    7.271486] ===> mm_dec_nr_pmds nr_pmds -4
> [    7.275884] ------------[ cut here ]------------
> [    7.280773] WARNING: CPU: 0 PID: 58 at mm/mmap.c:2859 exit_mmap+0x1b4/0x218()
> 
> [...]
> 
> Which confirms my suspicion. So we never increment nr_pmds, but we
> decrement it. The simplest "fix" is to make mm_nr_pmds() return a signed
> long (see below) and cast roundup()'s return to (signed long), but
> that's not what we really want in this case because it's clear our PMD
> accounting is bogus.
> 
> Kiryl, any better idea on how to balance mm_inc_nr_pmds() and
> mm_dec_nr_pmds() ?
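
The nr_pmds value in the dump above (4294967292) and the -4 in the printk
output are the same bits read two ways. A minimal user-space sketch of that
wraparound, assuming a 32-bit unsigned long as on these ARM boards; the
fake_mm type and helper names are hypothetical stand-ins, not the kernel's
atomic_long_t API:

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for mm->nr_pmds on a 32-bit kernel, where
 * unsigned long is 32 bits wide. The real field is an atomic_long_t.
 */
struct fake_mm {
	uint32_t nr_pmds;
};

/* Unbalanced decrement: nothing incremented the counter beforehand. */
static void fake_mm_dec_nr_pmds(struct fake_mm *mm)
{
	mm->nr_pmds--;	/* unsigned arithmetic wraps modulo 2^32 */
}

/*
 * Read the counter as a signed value, as the %ld printk did.
 * The conversion is implementation-defined in ISO C; common
 * compilers use two's complement wrapping.
 */
static int32_t fake_mm_nr_pmds_signed(const struct fake_mm *mm)
{
	return (int32_t)mm->nr_pmds;
}
```

Starting from zero and calling fake_mm_dec_nr_pmds() four times leaves
2^32 - 4 in the unsigned field, which any "counter > limit" style check
sees as a huge number of PMD pages rather than a small negative one.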

I assume it's on !LPAE kernel, right?

I had a quick look. ARM folds the PMD level in the case of 2-level page
tables, but it doesn't use the standard approach -- pgtable-nopmd.h.
As a result, ARM doesn't have __PAGETABLE_PMD_FOLDED defined.

I will look further tomorrow, but I wonder if we can just define
__PAGETABLE_PMD_FOLDED in arch/arm/include/asm/pgtable-2level.h?

This way we would also get rid of dead code -- __pmd_alloc() is never
called in this configuration. And it would fix the accounting issue: the
mm_*_nr_pmds() helpers would become no-ops.

A better option would be converting the 2-level ARM configuration to
<asm-generic/pgtable-nopmd.h>, but I'm not sure if it's possible.
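
To illustrate why defining the macro would silence the warning, here is a
rough user-space sketch of how accounting helpers can compile down to
nothing when the PMD level is folded. DEMO_PAGETABLE_PMD_FOLDED, struct
demo_mm and the demo_* helpers are hypothetical names standing in for
__PAGETABLE_PMD_FOLDED, struct mm_struct and the mm_*_nr_pmds() helpers;
the exact kernel code may differ:

```c
/*
 * Stand-in for the kernel's __PAGETABLE_PMD_FOLDED. Defined here to
 * model the 2-level (!LPAE) ARM case with the proposed change applied.
 */
#define DEMO_PAGETABLE_PMD_FOLDED

struct demo_mm {
	long nr_pmds;	/* the kernel uses atomic_long_t; plain long for brevity */
};

#ifdef DEMO_PAGETABLE_PMD_FOLDED
/* PMD level is folded: there are no PMD pages to count. */
static inline unsigned long demo_mm_nr_pmds(struct demo_mm *mm)
{
	(void)mm;
	return 0;
}
static inline void demo_mm_inc_nr_pmds(struct demo_mm *mm) { (void)mm; }
static inline void demo_mm_dec_nr_pmds(struct demo_mm *mm) { (void)mm; }
#else
/* Real PMD level: keep a count of allocated PMD pages. */
static inline unsigned long demo_mm_nr_pmds(struct demo_mm *mm)
{
	return (unsigned long)mm->nr_pmds;
}
static inline void demo_mm_inc_nr_pmds(struct demo_mm *mm) { mm->nr_pmds++; }
static inline void demo_mm_dec_nr_pmds(struct demo_mm *mm) { mm->nr_pmds--; }
#endif
```

With the macro defined, a decrement on the free path that has no matching
increment can no longer push the counter below zero, because neither side
touches the counter at all.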

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [next-20150119]regression (mm)?
  2015-01-20  0:16     ` Kirill A. Shutemov
@ 2015-01-20 11:45       ` Russell King - ARM Linux
  -1 siblings, 0 replies; 48+ messages in thread
From: Russell King - ARM Linux @ 2015-01-20 11:45 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Felipe Balbi, Nishanth Menon, linux-mm, linux-next, linux-omap,
	linux-arm-kernel

On Tue, Jan 20, 2015 at 02:16:43AM +0200, Kirill A. Shutemov wrote:
> Better option would be converting 2-lvl ARM configuration to
> <asm-generic/pgtable-nopmd.h>, but I'm not sure if it's possible.

Well, IMHO the folded approach in asm-generic was done the wrong way
which barred ARM from ever using it.

By that, I mean that the asm-generic stuff encapsulates a pgd into a pud,
and a pud into a pmd:

typedef struct { pgd_t pgd; } pud_t;
typedef struct { pud_t pud; } pmd_t;

This, I assert, is the wrong way around.  Think about it when you have a
real 4 level page table structure - a single pgd points to a set of puds.
So, one pgd encapsulates via a pointer a set of puds.  One pud does not
encapsulate a set of pgds.

What we have on ARM is slightly different: because of the sizes of page
tables, we have a pgd entry which is physically two page table pointers.
However, there are cases where we want to access these as two separate
pointers.

So, we define pgd_t to be an array of two u32's, and a pmd_t to be a
single entry.  This works fine, we set the masks, shifts and sizes
appropriately so that the pmd code is optimised away, but leaves us with
the ability to go down to the individual pgd_t entries when we need to
(eg, for section mappings, writing the pgd pointers for page tables,
etc.)
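
A small sketch of the layout described above, written for user space. The
demo_pgd_t, demo_pmd_t and demo_pmd_offset names are hypothetical and only
model the idea of one pgd entry holding two hardware pointers; the real
definitions live in the ARM 2-level page table headers:

```c
#include <stdint.h>

/* One "pmd" is a single 32-bit first-level hardware entry. */
typedef struct { uint32_t pmd; } demo_pmd_t;

/* One pgd entry is physically two such entries, side by side. */
typedef struct { uint32_t pgd[2]; } demo_pgd_t;

/*
 * Folded walk: no separate PMD table exists, so the "pmd" for a pgd
 * slot is the slot itself, viewed as its first 32-bit entry. The
 * kernel is built with -fno-strict-aliasing; this cast relies on the
 * two types sharing the same underlying uint32_t storage.
 */
static inline demo_pmd_t *demo_pmd_offset(demo_pgd_t *pgd)
{
	return (demo_pmd_t *)pgd;
}
```

Indexing the returned pointer with 0 or 1 reaches the two individual
pointers, which is the access needed for section mappings or for writing
the page table pointers separately.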

I think I also ran into problems with:

#define pmd_val(x)                              (pud_val((x).pud))
#define __pmd(x)                                ((pmd_t) { __pud(x) } )

too - but it's been a very long time since the nopmd.h stuff was
introduced, and I last looked at it.

In any case, what we have today is what has worked for well over a decade
(and pre-dates nopmd.h), and I'm really not interested today in trying to
rework tonnes of code to make use of nopmd.h - especially as it will most
likely require nopmd.h to be rewritten too, and we now have real 3 level
page table support (which I have no way to test.)

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [next-20150119]regression (mm)?
@ 2015-01-20 11:45       ` Russell King - ARM Linux
  0 siblings, 0 replies; 48+ messages in thread
From: Russell King - ARM Linux @ 2015-01-20 11:45 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jan 20, 2015 at 02:16:43AM +0200, Kirill A. Shutemov wrote:
> Better option would be converting 2-lvl ARM configuration to
> <asm-generic/pgtable-nopmd.h>, but I'm not sure if it's possible.

Well, IMHO the folded approach in asm-generic was done the wrong way
which barred ARM from ever using it.

By that, I mean that the asm-generic stuff encapsulates a pgd into a pud,
and a pud into a pmd:

typedef struct { pgd_t pgd; } pud_t;
typedef struct { pud_t pud; } pmd_t;

This, I assert, is the wrong way around.  Think about it when you have a
real 4 level page table structure - a single pgd points to a set of puds.
So, one pgd encapsulates via a pointer a set of puds.  One pud does not
encapsulate a set of pgds.

What we have on ARM is slightly different: because of the sizes of page
tables, we have a pgd entry which is physically two page table pointers.
However, there are cases where we want to access these as two separate
pointers.

So, we define pgd_t to be an array of two u32's, and a pmd_t to be a
single entry.  This works fine, we set the masks, shifts and sizes
appropriately so that the pmd code is optimised away, but leaves us with
the ability to go down to the individual pgd_t entries when we need to
(eg, for section mappings, writing the pgd pointers for page tables,
etc.)

I think I also ran into problems with:

#define pmd_val(x)                              (pud_val((x).pud))
#define __pmd(x)                                ((pmd_t) { __pud(x) } )

too - but it's been a very long time since the nopmd.h stuff was
introduced, and I last looked at it.

In any case, what we have today is what has worked for well over a decade
(and pre-dates nopmd.h), and I'm really not interested today in trying to
rework tonnes of code to make use of nopmd.h - especially as it will most
likely require nopmd.h to be rewritten too, and we now have real 3 level
page table support (which I have no way to test.)

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [next-20150119]regression (mm)?
  2015-01-20 11:45       ` Russell King - ARM Linux
  (?)
@ 2015-01-20 14:05         ` Kirill A. Shutemov
  -1 siblings, 0 replies; 48+ messages in thread
From: Kirill A. Shutemov @ 2015-01-20 14:05 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Kirill A. Shutemov, Felipe Balbi, Nishanth Menon, linux-mm,
	linux-next, linux-omap, linux-arm-kernel

Russell King - ARM Linux wrote:
> On Tue, Jan 20, 2015 at 02:16:43AM +0200, Kirill A. Shutemov wrote:
> > Better option would be converting 2-lvl ARM configuration to
> > <asm-generic/pgtable-nopmd.h>, but I'm not sure if it's possible.
> 
> Well, IMHO the folded approach in asm-generic was done the wrong way
> which barred ARM from ever using it.

Okay, I see.

Regarding the topic bug. Completely untested patch is below. Could anybody
check if it helps?

From 34b9182d08ef2b541829e305fcc91ef1d26b27ea Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Date: Tue, 20 Jan 2015 15:47:22 +0200
Subject: [PATCH] arm: define __PAGETABLE_PMD_FOLDED for !LPAE

ARM uses custom implementation of PMD folding in 2-level page table case.
Generic code expects to see __PAGETABLE_PMD_FOLDED to be defined if PMD is
folded, but ARM doesn't do this. Let's fix it.

Defining __PAGETABLE_PMD_FOLDED will drop out unused __pmd_alloc().
It also fixes problems with recently-introduced pmd accounting on ARM
without LPAE.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Nishanth Menon <nm@ti.com>
---
 arch/arm/include/asm/pgtable-2level.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
index bcc5e300413f..bfd662e49a25 100644
--- a/arch/arm/include/asm/pgtable-2level.h
+++ b/arch/arm/include/asm/pgtable-2level.h
@@ -10,6 +10,8 @@
 #ifndef _ASM_PGTABLE_2LEVEL_H
 #define _ASM_PGTABLE_2LEVEL_H
 
+#define __PAGETABLE_PMD_FOLDED
+
 /*
  * Hardware-wise, we have a two level page table structure, where the first
  * level has 4096 entries, and the second level has 256 entries.  Each entry
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [next-20150119]regression (mm)?
@ 2015-01-20 14:05         ` Kirill A. Shutemov
  0 siblings, 0 replies; 48+ messages in thread
From: Kirill A. Shutemov @ 2015-01-20 14:05 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Kirill A. Shutemov, Felipe Balbi, Nishanth Menon, linux-mm,
	linux-next, linux-omap, linux-arm-kernel

Russell King - ARM Linux wrote:
> On Tue, Jan 20, 2015 at 02:16:43AM +0200, Kirill A. Shutemov wrote:
> > Better option would be converting 2-lvl ARM configuration to
> > <asm-generic/pgtable-nopmd.h>, but I'm not sure if it's possible.
> 
> Well, IMHO the folded approach in asm-generic was done the wrong way
> which barred ARM from ever using it.

Okay, I see.

Regarding the topic bug. Completely untested patch is below. Could anybody
check if it helps?

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [next-20150119]regression (mm)?
@ 2015-01-20 14:05         ` Kirill A. Shutemov
  0 siblings, 0 replies; 48+ messages in thread
From: Kirill A. Shutemov @ 2015-01-20 14:05 UTC (permalink / raw)
  To: linux-arm-kernel

Russell King - ARM Linux wrote:
> On Tue, Jan 20, 2015 at 02:16:43AM +0200, Kirill A. Shutemov wrote:
> > Better option would be converting 2-lvl ARM configuration to
> > <asm-generic/pgtable-nopmd.h>, but I'm not sure if it's possible.
> 
> Well, IMHO the folded approach in asm-generic was done the wrong way
> which barred ARM from ever using it.

Okay, I see.

Regarding the topic bug. Completely untested patch is below. Could anybody
check if it helps?

From 34b9182d08ef2b541829e305fcc91ef1d26b27ea Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Date: Tue, 20 Jan 2015 15:47:22 +0200
Subject: [PATCH] arm: define __PAGETABLE_PMD_FOLDED for !LPAE

ARM uses custom implementation of PMD folding in 2-level page table case.
Generic code expects to see __PAGETABLE_PMD_FOLDED to be defined if PMD is
folded, but ARM doesn't do this. Let's fix it.

Defining __PAGETABLE_PMD_FOLDED will drop out unused __pmd_alloc().
It also fixes problems with recently-introduced pmd accounting on ARM
without LPAE.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Nishanth Menon <nm@ti.com>
---
 arch/arm/include/asm/pgtable-2level.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
index bcc5e300413f..bfd662e49a25 100644
--- a/arch/arm/include/asm/pgtable-2level.h
+++ b/arch/arm/include/asm/pgtable-2level.h
@@ -10,6 +10,8 @@
 #ifndef _ASM_PGTABLE_2LEVEL_H
 #define _ASM_PGTABLE_2LEVEL_H
 
+#define __PAGETABLE_PMD_FOLDED
+
 /*
  * Hardware-wise, we have a two level page table structure, where the first
  * level has 4096 entries, and the second level has 256 entries.  Each entry
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [next-20150119]regression (mm)?
  2015-01-20 14:05         ` Kirill A. Shutemov
  (?)
@ 2015-01-20 14:50           ` Fabio Estevam
  -1 siblings, 0 replies; 48+ messages in thread
From: Fabio Estevam @ 2015-01-20 14:50 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Russell King - ARM Linux, Nishanth Menon, Felipe Balbi, linux-mm,
	linux-next, linux-arm-kernel, linux-omap

On Tue, Jan 20, 2015 at 12:05 PM, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
> Russell King - ARM Linux wrote:
>> On Tue, Jan 20, 2015 at 02:16:43AM +0200, Kirill A. Shutemov wrote:
>> > Better option would be converting 2-lvl ARM configuration to
>> > <asm-generic/pgtable-nopmd.h>, but I'm not sure if it's possible.
>>
>> Well, IMHO the folded approach in asm-generic was done the wrong way
>> which barred ARM from ever using it.
>
> Okay, I see.
>
> Regarding the topic bug. Completely untested patch is below. Could anybody
> check if it helps?

Yes, it helps. Now I can boot mx6 running linux-next 20150120 with
your patch applied.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [next-20150119]regression (mm)?
@ 2015-01-20 14:50           ` Fabio Estevam
  0 siblings, 0 replies; 48+ messages in thread
From: Fabio Estevam @ 2015-01-20 14:50 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Russell King - ARM Linux, Nishanth Menon, Felipe Balbi, linux-mm,
	linux-next, linux-arm-kernel, linux-omap

On Tue, Jan 20, 2015 at 12:05 PM, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
> Russell King - ARM Linux wrote:
>> On Tue, Jan 20, 2015 at 02:16:43AM +0200, Kirill A. Shutemov wrote:
>> > Better option would be converting 2-lvl ARM configuration to
>> > <asm-generic/pgtable-nopmd.h>, but I'm not sure if it's possible.
>>
>> Well, IMHO the folded approach in asm-generic was done the wrong way
>> which barred ARM from ever using it.
>
> Okay, I see.
>
> Regarding the topic bug. Completely untested patch is below. Could anybody
> check if it helps?

Yes, it helps. Now I can boot mx6 running linux-next 20150120 with
your patch applied.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [next-20150119]regression (mm)?
@ 2015-01-20 14:50           ` Fabio Estevam
  0 siblings, 0 replies; 48+ messages in thread
From: Fabio Estevam @ 2015-01-20 14:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jan 20, 2015 at 12:05 PM, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
> Russell King - ARM Linux wrote:
>> On Tue, Jan 20, 2015 at 02:16:43AM +0200, Kirill A. Shutemov wrote:
>> > Better option would be converting 2-lvl ARM configuration to
>> > <asm-generic/pgtable-nopmd.h>, but I'm not sure if it's possible.
>>
>> Well, IMHO the folded approach in asm-generic was done the wrong way
>> which barred ARM from ever using it.
>
> Okay, I see.
>
> Regarding the topic bug. Completely untested patch is below. Could anybody
> check if it helps?

Yes, it helps. Now I can boot mx6 running linux-next 20150120 with
your patch applied.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [next-20150119]regression (mm)?
  2015-01-20 14:50           ` Fabio Estevam
@ 2015-01-20 15:10             ` Felipe Balbi
  -1 siblings, 0 replies; 48+ messages in thread
From: Felipe Balbi @ 2015-01-20 15:10 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Kirill A. Shutemov, Russell King - ARM Linux, Nishanth Menon,
	Felipe Balbi, linux-mm, linux-next, linux-arm-kernel, linux-omap

On Tue, Jan 20, 2015 at 12:50:59PM -0200, Fabio Estevam wrote:
> On Tue, Jan 20, 2015 at 12:05 PM, Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
> > Russell King - ARM Linux wrote:
> >> On Tue, Jan 20, 2015 at 02:16:43AM +0200, Kirill A. Shutemov wrote:
> >> > Better option would be converting 2-lvl ARM configuration to
> >> > <asm-generic/pgtable-nopmd.h>, but I'm not sure if it's possible.
> >>
> >> Well, IMHO the folded approach in asm-generic was done the wrong way
> >> which barred ARM from ever using it.
> >
> > Okay, I see.
> >
> > Regarding the topic bug. Completely untested patch is below. Could anybody
> > check if it helps?
> 
> Yes, it helps. Now I can boot mx6 running linux-next 20150120 with
> your patch applied.

worked fine here too with AM437x SK, AM437x IDK and BeagleBoneBlack.

thanks

-- 
balbi

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [next-20150119]regression (mm)?
@ 2015-01-20 15:10             ` Felipe Balbi
  0 siblings, 0 replies; 48+ messages in thread
From: Felipe Balbi @ 2015-01-20 15:10 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jan 20, 2015 at 12:50:59PM -0200, Fabio Estevam wrote:
> On Tue, Jan 20, 2015 at 12:05 PM, Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
> > Russell King - ARM Linux wrote:
> >> On Tue, Jan 20, 2015 at 02:16:43AM +0200, Kirill A. Shutemov wrote:
> >> > Better option would be converting 2-lvl ARM configuration to
> >> > <asm-generic/pgtable-nopmd.h>, but I'm not sure if it's possible.
> >>
> >> Well, IMHO the folded approach in asm-generic was done the wrong way
> >> which barred ARM from ever using it.
> >
> > Okay, I see.
> >
> > Regarding the topic bug. Completely untested patch is below. Could anybody
> > check if it helps?
> 
> Yes, it helps. Now I can boot mx6 running linux-next 20150120 with
> your patch applied.

worked fine here too with AM437x SK, AM437x IDK and BeagleBoneBlack.

thanks

-- 
balbi

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [next-20150119]regression (mm)?
  2015-01-20 14:05         ` Kirill A. Shutemov
@ 2015-01-20 23:26           ` Nishanth Menon
  -1 siblings, 0 replies; 48+ messages in thread
From: Nishanth Menon @ 2015-01-20 23:26 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Russell King - ARM Linux, Felipe Balbi, linux-mm, linux-next,
	linux-omap, linux-arm-kernel

On 16:05-20150120, Kirill A. Shutemov wrote:
> Russell King - ARM Linux wrote:
> > On Tue, Jan 20, 2015 at 02:16:43AM +0200, Kirill A. Shutemov wrote:
> > > Better option would be converting 2-lvl ARM configuration to
> > > <asm-generic/pgtable-nopmd.h>, but I'm not sure if it's possible.
> > 
> > Well, IMHO the folded approach in asm-generic was done the wrong way
> > which barred ARM from ever using it.
> 
> Okay, I see.
> 
> Regarding the topic bug. Completely untested patch is below. Could anybody
> check if it helps?
> 
> From 34b9182d08ef2b541829e305fcc91ef1d26b27ea Mon Sep 17 00:00:00 2001
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Date: Tue, 20 Jan 2015 15:47:22 +0200
> Subject: [PATCH] arm: define __PAGETABLE_PMD_FOLDED for !LPAE
> 
> ARM uses custom implementation of PMD folding in 2-level page table case.
> Generic code expects to see __PAGETABLE_PMD_FOLDED to be defined if PMD is
> folded, but ARM doesn't do this. Let's fix it.
> 
> Defining __PAGETABLE_PMD_FOLDED will drop out unused __pmd_alloc().
> It also fixes problems with recently-introduced pmd accounting on ARM
> without LPAE.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reported-by: Nishanth Menon <nm@ti.com>
> ---
>  arch/arm/include/asm/pgtable-2level.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
> index bcc5e300413f..bfd662e49a25 100644
> --- a/arch/arm/include/asm/pgtable-2level.h
> +++ b/arch/arm/include/asm/pgtable-2level.h
> @@ -10,6 +10,8 @@
>  #ifndef _ASM_PGTABLE_2LEVEL_H
>  #define _ASM_PGTABLE_2LEVEL_H
>  
> +#define __PAGETABLE_PMD_FOLDED
> +
>  /*
>   * Hardware-wise, we have a two level page table structure, where the first
>   * level has 4096 entries, and the second level has 256 entries.  Each entry
> -- 
> 2.1.4

Above helps the TI platforms
1:                     am335x-evm: BOOT: PASS: am335x-evm.txt
2:                      am335x-sk: BOOT: PASS: am335x-sk.txt
3:                     am3517-evm: BOOT: PASS: am3517-evm.txt
4:                      am37x-evm: BOOT: PASS: am37x-evm.txt
5:                      am437x-sk: BOOT: PASS: am437x-sk.txt
6:                    am43xx-epos: BOOT: PASS: am43xx-epos.txt
7:                   am43xx-gpevm: BOOT: PASS: am43xx-gpevm.txt
8:    BeagleBoard-X15(am57xx-evm): BOOT: PASS: am57xx-evm.txt
9:                 BeagleBoard-XM: BOOT: PASS: beagleboard.txt
10:            beagleboard-vanilla: BOOT: PASS: beagleboard-vanilla.txt
11:               beaglebone-black: BOOT: PASS: beaglebone-black.txt
12:                     beaglebone: BOOT: PASS: beaglebone.txt
13:                     craneboard: BOOT: PASS: craneboard.txt
14:                     dra72x-evm: BOOT: PASS: dra72x-evm.txt
15:                     dra7xx-evm: BOOT: PASS: dra7xx-evm.txt
16:         OMAP3430-Labrador(LDP): BOOT: PASS: ldp.txt
17:                           n900: BOOT: FAIL: n900.txt (legacy issue with my farm)
18:                      omap5-evm: BOOT: PASS: omap5-evm.txt
19:                  pandaboard-es: BOOT: PASS: pandaboard-es.txt
20:             pandaboard-vanilla: BOOT: PASS: pandaboard-vanilla.txt
21:                        sdp2430: BOOT: PASS: sdp2430.txt
22:                        sdp3430: BOOT: PASS: sdp3430.txt
23:                        sdp4430: BOOT: PASS: sdp4430.txt
TOTAL = 23 boards, Booted Boards = 22, No Boot boards = 1

please feel free to add my
Tested-by: Nishanth Menon <nm@ti.com>

-- 
Regards,
Nishanth Menon

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [next-20150119]regression (mm)?
@ 2015-01-20 23:26           ` Nishanth Menon
  0 siblings, 0 replies; 48+ messages in thread
From: Nishanth Menon @ 2015-01-20 23:26 UTC (permalink / raw)
  To: linux-arm-kernel

On 16:05-20150120, Kirill A. Shutemov wrote:
> Russell King - ARM Linux wrote:
> > On Tue, Jan 20, 2015 at 02:16:43AM +0200, Kirill A. Shutemov wrote:
> > > Better option would be converting 2-lvl ARM configuration to
> > > <asm-generic/pgtable-nopmd.h>, but I'm not sure if it's possible.
> > 
> > Well, IMHO the folded approach in asm-generic was done the wrong way
> > which barred ARM from ever using it.
> 
> Okay, I see.
> 
> Regarding the topic bug. Completely untested patch is below. Could anybody
> check if it helps?
> 
> From 34b9182d08ef2b541829e305fcc91ef1d26b27ea Mon Sep 17 00:00:00 2001
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Date: Tue, 20 Jan 2015 15:47:22 +0200
> Subject: [PATCH] arm: define __PAGETABLE_PMD_FOLDED for !LPAE
> 
> ARM uses custom implementation of PMD folding in 2-level page table case.
> Generic code expects to see __PAGETABLE_PMD_FOLDED to be defined if PMD is
> folded, but ARM doesn't do this. Let's fix it.
> 
> Defining __PAGETABLE_PMD_FOLDED will drop out unused __pmd_alloc().
> It also fixes problems with recently-introduced pmd accounting on ARM
> without LPAE.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reported-by: Nishanth Menon <nm@ti.com>
> ---
>  arch/arm/include/asm/pgtable-2level.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
> index bcc5e300413f..bfd662e49a25 100644
> --- a/arch/arm/include/asm/pgtable-2level.h
> +++ b/arch/arm/include/asm/pgtable-2level.h
> @@ -10,6 +10,8 @@
>  #ifndef _ASM_PGTABLE_2LEVEL_H
>  #define _ASM_PGTABLE_2LEVEL_H
>  
> +#define __PAGETABLE_PMD_FOLDED
> +
>  /*
>   * Hardware-wise, we have a two level page table structure, where the first
>   * level has 4096 entries, and the second level has 256 entries.  Each entry
> -- 
> 2.1.4

Above helps the TI platforms
1:                     am335x-evm: BOOT: PASS: am335x-evm.txt
2:                      am335x-sk: BOOT: PASS: am335x-sk.txt
3:                     am3517-evm: BOOT: PASS: am3517-evm.txt
4:                      am37x-evm: BOOT: PASS: am37x-evm.txt
5:                      am437x-sk: BOOT: PASS: am437x-sk.txt
6:                    am43xx-epos: BOOT: PASS: am43xx-epos.txt
7:                   am43xx-gpevm: BOOT: PASS: am43xx-gpevm.txt
8:    BeagleBoard-X15(am57xx-evm): BOOT: PASS: am57xx-evm.txt
9:                 BeagleBoard-XM: BOOT: PASS: beagleboard.txt
10:            beagleboard-vanilla: BOOT: PASS: beagleboard-vanilla.txt
11:               beaglebone-black: BOOT: PASS: beaglebone-black.txt
12:                     beaglebone: BOOT: PASS: beaglebone.txt
13:                     craneboard: BOOT: PASS: craneboard.txt
14:                     dra72x-evm: BOOT: PASS: dra72x-evm.txt
15:                     dra7xx-evm: BOOT: PASS: dra7xx-evm.txt
16:         OMAP3430-Labrador(LDP): BOOT: PASS: ldp.txt
17:                           n900: BOOT: FAIL: n900.txt (legacy issue with my farm)
18:                      omap5-evm: BOOT: PASS: omap5-evm.txt
19:                  pandaboard-es: BOOT: PASS: pandaboard-es.txt
20:             pandaboard-vanilla: BOOT: PASS: pandaboard-vanilla.txt
21:                        sdp2430: BOOT: PASS: sdp2430.txt
22:                        sdp3430: BOOT: PASS: sdp3430.txt
23:                        sdp4430: BOOT: PASS: sdp4430.txt
TOTAL = 23 boards, Booted Boards = 22, No Boot boards = 1

Please feel free to add my
Tested-by: Nishanth Menon <nm@ti.com>

-- 
Regards,
Nishanth Menon
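
A note on the patch description quoted above: it says generic code expects
__PAGETABLE_PMD_FOLDED whenever the PMD level is folded, and that the missing
define broke the new PMD accounting on non-LPAE ARM. The user-space toy below
is only a sketch of that kind of imbalance, not kernel code. All toy_* names
are hypothetical, the runtime flag stands in for the compile-time macro, and
the direction of the imbalance (teardown decrements, allocation never runs)
is an assumption that the thread itself does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for the per-address-space count of PMD tables. */
struct toy_mm {
	long nr_pmds;
};

/*
 * Accounting helpers. In this toy, folded_flag plays the role of
 * __PAGETABLE_PMD_FOLDED being defined: when set, both helpers do nothing.
 */
static void toy_inc_nr_pmds(struct toy_mm *mm, bool folded_flag)
{
	assert(mm != NULL);
	if (!folded_flag)
		mm->nr_pmds++;
}

static void toy_dec_nr_pmds(struct toy_mm *mm, bool folded_flag)
{
	assert(mm != NULL);
	if (!folded_flag)
		mm->nr_pmds--;
}

/*
 * Simulate one address-space lifetime and return the final counter.
 * arch_folds_pmd: the architecture has no real PMD level.
 * folded_flag:    the generic code has been told about it.
 * ASSUMPTION: on a folded architecture the allocation path is never
 * reached, while the generic teardown path still runs its decrement.
 */
static long toy_lifetime(bool arch_folds_pmd, bool folded_flag)
{
	struct toy_mm mm = { 0 };

	if (!arch_folds_pmd)
		toy_inc_nr_pmds(&mm, folded_flag);	/* real PMD table allocated */
	toy_dec_nr_pmds(&mm, folded_flag);		/* teardown walks every level */

	if (mm.nr_pmds != 0)
		printf("toy warning: nr_pmds = %ld at exit\n", mm.nr_pmds);
	return mm.nr_pmds;
}
```

Calling toy_lifetime(true, false) models a folded architecture that does not
announce the fold and leaves the counter unbalanced; toy_lifetime(true, true)
models the same architecture after a define like the one in the patch, where
the helpers compile away and the counter stays at zero.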

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [next-20150119]regression (mm)?
  2015-01-20 14:05         ` Kirill A. Shutemov
@ 2015-01-21  9:23           ` Peter Ujfalusi
  -1 siblings, 0 replies; 48+ messages in thread
From: Peter Ujfalusi @ 2015-01-21  9:23 UTC (permalink / raw)
  To: Kirill A. Shutemov, Russell King - ARM Linux
  Cc: Felipe Balbi, Nishanth Menon, linux-mm, linux-next, linux-omap,
	linux-arm-kernel

On 01/20/2015 04:05 PM, Kirill A. Shutemov wrote:
> Russell King - ARM Linux wrote:
>> On Tue, Jan 20, 2015 at 02:16:43AM +0200, Kirill A. Shutemov wrote:
>>> Better option would be converting 2-lvl ARM configuration to
>>> <asm-generic/pgtable-nopmd.h>, but I'm not sure if it's possible.
>>
>> Well, IMHO the folded approach in asm-generic was done the wrong way
>> which barred ARM from ever using it.
> 
> Okay, I see.
> 
> Regarding the topic bug. Completely untested patch is below. Could anybody
> check if it helps?
> 
> From 34b9182d08ef2b541829e305fcc91ef1d26b27ea Mon Sep 17 00:00:00 2001
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Date: Tue, 20 Jan 2015 15:47:22 +0200
> Subject: [PATCH] arm: define __PAGETABLE_PMD_FOLDED for !LPAE
> 
> ARM uses custom implementation of PMD folding in 2-level page table case.
> Generic code expects to see __PAGETABLE_PMD_FOLDED to be defined if PMD is
> folded, but ARM doesn't do this. Let's fix it.
> 
> Defining __PAGETABLE_PMD_FOLDED will drop out unused __pmd_alloc().
> It also fixes problems with recently-introduced pmd accounting on ARM
> without LPAE.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reported-by: Nishanth Menon <nm@ti.com>
> ---
>  arch/arm/include/asm/pgtable-2level.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
> index bcc5e300413f..bfd662e49a25 100644
> --- a/arch/arm/include/asm/pgtable-2level.h
> +++ b/arch/arm/include/asm/pgtable-2level.h
> @@ -10,6 +10,8 @@
>  #ifndef _ASM_PGTABLE_2LEVEL_H
>  #define _ASM_PGTABLE_2LEVEL_H
>  
> +#define __PAGETABLE_PMD_FOLDED
> +
>  /*
>   * Hardware-wise, we have a two level page table structure, where the first
>   * level has 4096 entries, and the second level has 256 entries.  Each entry
> 

Among the other boards I have, my DaVinci board (OMAP-L138-EVM) boots fine
with this patch.

Tested-by: Peter Ujfalusi <peter.ujfalusi@ti.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [next-20150119]regression (mm)?
  2015-01-20 14:05         ` Kirill A. Shutemov
@ 2015-01-21 10:29           ` Krzysztof Kozlowski
  -1 siblings, 0 replies; 48+ messages in thread
From: Krzysztof Kozlowski @ 2015-01-21 10:29 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Russell King - ARM Linux, Felipe Balbi, Nishanth Menon, linux-mm,
	linux-next, linux-omap, linux-arm-kernel

2015-01-20 15:05 GMT+01:00 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>:
> Russell King - ARM Linux wrote:
>> On Tue, Jan 20, 2015 at 02:16:43AM +0200, Kirill A. Shutemov wrote:
>> > Better option would be converting 2-lvl ARM configuration to
>> > <asm-generic/pgtable-nopmd.h>, but I'm not sure if it's possible.
>>
>> Well, IMHO the folded approach in asm-generic was done the wrong way
>> which barred ARM from ever using it.
>
> Okay, I see.
>
> Regarding the topic bug. Completely untested patch is below. Could anybody
> check if it helps?
>
> From 34b9182d08ef2b541829e305fcc91ef1d26b27ea Mon Sep 17 00:00:00 2001
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Date: Tue, 20 Jan 2015 15:47:22 +0200
> Subject: [PATCH] arm: define __PAGETABLE_PMD_FOLDED for !LPAE
>
> ARM uses custom implementation of PMD folding in 2-level page table case.
> Generic code expects to see __PAGETABLE_PMD_FOLDED to be defined if PMD is
> folded, but ARM doesn't do this. Let's fix it.
>
> Defining __PAGETABLE_PMD_FOLDED will drop out unused __pmd_alloc().
> It also fixes problems with recently-introduced pmd accounting on ARM
> without LPAE.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reported-by: Nishanth Menon <nm@ti.com>
> ---
>  arch/arm/include/asm/pgtable-2level.h | 2 ++
>  1 file changed, 2 insertions(+)

Helps with this issue on Exynos 4412 (Trats2) and Exynos 5420 (Arndale Octa):
Tested-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>

Off-topic: "Using smp_processor_id() in preemptible" still screams [1]

[1] https://lkml.org/lkml/2015/1/20/162

Best regards,
Krzysztof

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [next-20150119]regression (mm)?
  2015-01-20 14:05         ` Kirill A. Shutemov
@ 2015-01-23 17:27           ` Nishanth Menon
  -1 siblings, 0 replies; 48+ messages in thread
From: Nishanth Menon @ 2015-01-23 17:27 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Russell King - ARM Linux, Felipe Balbi, linux-mm, linux-next,
	linux-omap, linux-arm-kernel

On 16:05-20150120, Kirill A. Shutemov wrote:
[..]
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reported-by: Nishanth Menon <nm@ti.com>
Just to close out this thread:
https://github.com/nmenon/kernel-test-logs/tree/next-20150123 looks good
and is back to its old status. Thank you folks for all the help.
-- 
Regards,
Nishanth Menon

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [next-20150119]regression (mm)?
  2015-01-23 17:27           ` Nishanth Menon
@ 2015-01-23 17:39             ` Tyler Baker
  -1 siblings, 0 replies; 48+ messages in thread
From: Tyler Baker @ 2015-01-23 17:39 UTC (permalink / raw)
  To: Nishanth Menon
  Cc: Kirill A. Shutemov, Russell King - ARM Linux, Felipe Balbi,
	linux-mm, linux-next, linux-omap, linux-arm-kernel

Hi,

On 23 January 2015 at 09:27, Nishanth Menon <nm@ti.com> wrote:
> On 16:05-20150120, Kirill A. Shutemov wrote:
> [..]
>> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> Reported-by: Nishanth Menon <nm@ti.com>
> Just to close on this thread:
> https://github.com/nmenon/kernel-test-logs/tree/next-20150123 looks good
> and back to old status. Thank you folks for all the help.

I just reviewed the boot logs for next-20150123 and there still seems
to be a related issue. I've been boot testing
multi_v7_defconfig+CONFIG_ARM_LPAE=y kernel configurations which still
seem broken.

For example here are two boots with exynos5250-arndale, one with
multi_v7_defconfig+CONFIG_ARM_LPAE=y [1] and the other with
multi_v7_defconfig [2]. You can see the kernel configurations with
CONFIG_ARM_LPAE=y show the splat:

[   14.605950] ------------[ cut here ]------------
[   14.609163] WARNING: CPU: 1 PID: 63 at ../mm/mmap.c:2858 exit_mmap+0x1b8/0x224()
[   14.616548] Modules linked in:
[   14.619553] CPU: 1 PID: 63 Comm: init Not tainted 3.19.0-rc5-next-20150123 #1
[   14.626713] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[   14.632830] [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
[   14.640473] [] (show_stack) from [] (dump_stack+0x78/0x94)
[   14.647678] [] (dump_stack) from [] (warn_slowpath_common+0x74/0xb0)
[   14.655744] [] (warn_slowpath_common) from [] (warn_slowpath_null+0x1c/0x24)
[   14.664510] [] (warn_slowpath_null) from [] (exit_mmap+0x1b8/0x224)
[   14.672497] [] (exit_mmap) from [] (mmput+0x40/0xf8)
[   14.679180] [] (mmput) from [] (flush_old_exec+0x328/0x604)
[   14.686471] [] (flush_old_exec) from [] (load_elf_binary+0x26c/0x11f4)
[   14.694715] [] (load_elf_binary) from [] (search_binary_handler+0x98/0x244)
[   14.703395] [] (search_binary_handler) from [] (do_execveat_common+0x4dc/0x5bc)
[   14.712421] [] (do_execveat_common) from [] (do_execve+0x28/0x30)
[   14.720235] [] (do_execve) from [] (ret_fast_syscall+0x0/0x34)
[   14.727782] ---[ end trace 5e3ca48b454c7e0a ]---
[   14.733758] ------------[ cut here ]------------

Has anyone else tested with CONFIG_ARM_LPAE=y who can confirm my findings?


> --
> Regards,
> Nishanth Menon
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

[1] http://storage.kernelci.org/next/next-20150123/arm-multi_v7_defconfig+CONFIG_ARM_LPAE=y/lab-tbaker/boot-exynos5250-arndale.html

[2] http://storage.kernelci.org/next/next-20150123/arm-multi_v7_defconfig/lab-tbaker/boot-exynos5250-arndale.html

Cheers,

Tyler

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [next-20150119]regression (mm)?
  2015-01-23 17:39             ` Tyler Baker
@ 2015-01-23 18:37               ` Nishanth Menon
  -1 siblings, 0 replies; 48+ messages in thread
From: Nishanth Menon @ 2015-01-23 18:37 UTC (permalink / raw)
  To: Tyler Baker
  Cc: Kirill A. Shutemov, Russell King - ARM Linux, Felipe Balbi,
	linux-mm, linux-next, linux-omap, linux-arm-kernel

On 09:39-20150123, Tyler Baker wrote:
> Hi,
> 
> On 23 January 2015 at 09:27, Nishanth Menon <nm@ti.com> wrote:
> > On 16:05-20150120, Kirill A. Shutemov wrote:
> > [..]
> >> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> >> Reported-by: Nishanth Menon <nm@ti.com>
> > Just to close on this thread:
> > https://github.com/nmenon/kernel-test-logs/tree/next-20150123 looks good
> > and back to old status. Thank you folks for all the help.
> 
> I just reviewed the boot logs for next-20150123 and there still seems
> to be a related issue. I've been boot testing
> multi_v7_defconfig+CONFIG_ARM_LPAE=y kernel configurations which still
> seem broken.
> 
> For example here are two boots with exynos5250-arndale, one with
> multi_v7_defconfig+CONFIG_ARM_LPAE=y [1] and the other with
> multi_v7_defconfig[2]. You can see the kernel configurations with
> CONFIG_ARM_LPAE=y show the splat:
> 
> [   14.605950] ------------[ cut here ]------------
> [   14.609163] WARNING: CPU: 1 PID: 63 at ../mm/mmap.c:2858
> exit_mmap+0x1b8/0x224()
> [   14.616548] Modules linked in:
> [   14.619553] CPU: 1 PID: 63 Comm: init Not tainted 3.19.0-rc5-next-20150123 #1
> [   14.626713] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
> [   14.632830] [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
> [   14.640473] [] (show_stack) from [] (dump_stack+0x78/0x94)
> [   14.647678] [] (dump_stack) from [] (warn_slowpath_common+0x74/0xb0)
> [   14.655744] [] (warn_slowpath_common) from [] (warn_slowpath_null+0x1c/0x24)
> [   14.664510] [] (warn_slowpath_null) from [] (exit_mmap+0x1b8/0x224)
> [   14.672497] [] (exit_mmap) from [] (mmput+0x40/0xf8)
> [   14.679180] [] (mmput) from [] (flush_old_exec+0x328/0x604)
> [   14.686471] [] (flush_old_exec) from [] (load_elf_binary+0x26c/0x11f4)
> [   14.694715] [] (load_elf_binary) from [] (search_binary_handler+0x98/0x244)
> [   14.703395] [] (search_binary_handler) from []
> (do_execveat_common+0x4dc/0x5bc)
> [   14.712421] [] (do_execveat_common) from [] (do_execve+0x28/0x30)
> [   14.720235] [] (do_execve) from [] (ret_fast_syscall+0x0/0x34)
> [   14.727782] ---[ end trace 5e3ca48b454c7e0a ]---
> [   14.733758] ------------[ cut here ]------------
> 
> Has anyone else tested with CONFIG_ARM_LPAE=y that can confirm my findings?
Uggh... I missed it since I was looking at the non-LPAE omap2plus_defconfig.

Dual A15 OMAP5432 with multi_v7_defconfig + CONFIG_ARM_LPAE=y
https://github.com/nmenon/kernel-test-logs/blob/next-20150123/multi_lpae_defconfig/omap5-evm.txt

Dual A15 DRA7/AM572x with same configuration as above.
https://raw.githubusercontent.com/nmenon/kernel-test-logs/next-20150123/multi_lpae_defconfig/dra7xx-evm.txt
https://github.com/nmenon/kernel-test-logs/blob/next-20150123/multi_lpae_defconfig/am57xx-evm.txt

Single A15 DRA72 with same configuration as above:
https://raw.githubusercontent.com/nmenon/kernel-test-logs/next-20150123/multi_lpae_defconfig/dra72x-evm.txt

You are right, the issue reappears with LPAE on :(
Apologies for missing that.

> 
> 
> > --
> > Regards,
> > Nishanth Menon
> >
> > _______________________________________________
> > linux-arm-kernel mailing list
> > linux-arm-kernel@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 
> [1] http://storage.kernelci.org/next/next-20150123/arm-multi_v7_defconfig+CONFIG_ARM_LPAE=y/lab-tbaker/boot-exynos5250-arndale.html
> 
> [2] http://storage.kernelci.org/next/next-20150123/arm-multi_v7_defconfig/lab-tbaker/boot-exynos5250-arndale.html
> 
> Cheers,
> 
> Tyler

-- 
Regards,
Nishanth Menon

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [next-20150119]regression (mm)?
  2015-01-23 18:37               ` Nishanth Menon
@ 2015-01-23 20:22                 ` Kirill A. Shutemov
  -1 siblings, 0 replies; 48+ messages in thread
From: Kirill A. Shutemov @ 2015-01-23 20:22 UTC (permalink / raw)
  To: Nishanth Menon
  Cc: Tyler Baker, Kirill A. Shutemov, Russell King - ARM Linux,
	Felipe Balbi, linux-mm, linux-next, linux-omap, linux-arm-kernel

On Fri, Jan 23, 2015 at 12:37:06PM -0600, Nishanth Menon wrote:
> On 09:39-20150123, Tyler Baker wrote:
> > Hi,
> > 
> > On 23 January 2015 at 09:27, Nishanth Menon <nm@ti.com> wrote:
> > > On 16:05-20150120, Kirill A. Shutemov wrote:
> > > [..]
> > >> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > >> Reported-by: Nishanth Menon <nm@ti.com>
> > > Just to close on this thread:
> > > https://github.com/nmenon/kernel-test-logs/tree/next-20150123 looks good
> > > and back to old status. Thank you folks for all the help.
> > 
> > I just reviewed the boot logs for next-20150123 and there still seems
> > to be a related issue. I've been boot testing
> > multi_v7_defconfig+CONFIG_ARM_LPAE=y kernel configurations which still
> > seem broken.
> > 
> > For example here are two boots with exynos5250-arndale, one with
> > multi_v7_defconfig+CONFIG_ARM_LPAE=y [1] and the other with
> > multi_v7_defconfig[2]. You can see the kernel configurations with
> > CONFIG_ARM_LPAE=y show the splat:
> > 
> > [   14.605950] ------------[ cut here ]------------
> > [   14.609163] WARNING: CPU: 1 PID: 63 at ../mm/mmap.c:2858
> > exit_mmap+0x1b8/0x224()
> > [   14.616548] Modules linked in:
> > [   14.619553] CPU: 1 PID: 63 Comm: init Not tainted 3.19.0-rc5-next-20150123 #1
> > [   14.626713] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
> > [   14.632830] [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
> > [   14.640473] [] (show_stack) from [] (dump_stack+0x78/0x94)
> > [   14.647678] [] (dump_stack) from [] (warn_slowpath_common+0x74/0xb0)
> > [   14.655744] [] (warn_slowpath_common) from [] (warn_slowpath_null+0x1c/0x24)
> > [   14.664510] [] (warn_slowpath_null) from [] (exit_mmap+0x1b8/0x224)
> > [   14.672497] [] (exit_mmap) from [] (mmput+0x40/0xf8)
> > [   14.679180] [] (mmput) from [] (flush_old_exec+0x328/0x604)
> > [   14.686471] [] (flush_old_exec) from [] (load_elf_binary+0x26c/0x11f4)
> > [   14.694715] [] (load_elf_binary) from [] (search_binary_handler+0x98/0x244)
> > [   14.703395] [] (search_binary_handler) from []
> > (do_execveat_common+0x4dc/0x5bc)
> > [   14.712421] [] (do_execveat_common) from [] (do_execve+0x28/0x30)
> > [   14.720235] [] (do_execve) from [] (ret_fast_syscall+0x0/0x34)
> > [   14.727782] ---[ end trace 5e3ca48b454c7e0a ]---
> > [   14.733758] ------------[ cut here ]------------
> > 
> > Has anyone else tested with CONFIG_ARM_LPAE=y that can confirm my findings?
> Uggh... I missed since i was looking at non LPAE omap2plus_defconfig.
> 
> Dual A15 OMAP5432 with multi_v7_defconfig + CONFIG_ARM_LPAE=y
> https://github.com/nmenon/kernel-test-logs/blob/next-20150123/multi_lpae_defconfig/omap5-evm.txt
> 
> Dual A15 DRA7/AM572x with same configuration as above.
> https://raw.githubusercontent.com/nmenon/kernel-test-logs/next-20150123/multi_lpae_defconfig/dra7xx-evm.txt
> https://github.com/nmenon/kernel-test-logs/blob/next-20150123/multi_lpae_defconfig/am57xx-evm.txt
> 
> Single A15 DRA72 with same configuration as above:
> https://raw.githubusercontent.com/nmenon/kernel-test-logs/next-20150123/multi_lpae_defconfig/dra72x-evm.txt
> 
> You are right. the issue re-appears with LPAE on :(
> Apologies on missing that.

Guys, could you instrument mm_{inc,dec}_nr_pmds() with dump_stack() +
printk() of the counter, add a printk() in exit_mmap(), and then run a
simple program which triggers the issue?

-- 
 Kirill A. Shutemov
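
The debugging request above (log every counter change together with its call
site, then print the counter when the address space is torn down) can be
illustrated with a small user-space model. This is only a sketch under stated
assumptions: it is not the kernel change that was asked for, every toy_* name
is hypothetical, and the caller name recorded via __func__ is a stand-in for
the full backtrace that dump_stack() would give in the kernel.

```c
#include <assert.h>
#include <stdio.h>

#define TOY_MAX_EVENTS 16

/* One logged counter change: who did it, by how much, and the new value. */
struct toy_event {
	const char *caller;
	int delta;
	long after;
};

/* Toy address space: the counter plus a bounded log of its changes. */
struct toy_traced_mm {
	long nr_pmds;
	struct toy_event log[TOY_MAX_EVENTS];
	int nr_events;
};

/* Apply one +1/-1 change, record it, and print it like a debug printk. */
static void toy_account(struct toy_traced_mm *mm, int delta, const char *caller)
{
	assert(delta == 1 || delta == -1);
	mm->nr_pmds += delta;
	if (mm->nr_events < TOY_MAX_EVENTS) {
		struct toy_event *ev = &mm->log[mm->nr_events++];

		ev->caller = caller;
		ev->delta = delta;
		ev->after = mm->nr_pmds;
	}
	printf("nr_pmds %+d -> %ld (from %s)\n", delta, mm->nr_pmds, caller);
}

/* Instrumented wrappers: __func__ names the function doing the accounting. */
#define toy_trace_inc(mm) toy_account((mm), +1, __func__)
#define toy_trace_dec(mm) toy_account((mm), -1, __func__)

/* Teardown check: report the final counter; nonzero means an imbalance. */
static long toy_trace_exit(const struct toy_traced_mm *mm)
{
	printf("exit: nr_pmds = %ld after %d events\n",
	       mm->nr_pmds, mm->nr_events);
	return mm->nr_pmds;
}
```

With a log like this, an unbalanced run shows which call site changed the
counter without a matching change on the other side, which is the
information the request above is after.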

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [next-20150119]regression (mm)?
  2015-01-23 20:22                 ` Kirill A. Shutemov
@ 2015-01-23 22:05                   ` Nishanth Menon
  -1 siblings, 0 replies; 48+ messages in thread
From: Nishanth Menon @ 2015-01-23 22:05 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Tyler Baker, Kirill A. Shutemov, Russell King - ARM Linux,
	Felipe Balbi, linux-mm, linux-next, linux-omap, linux-arm-kernel

On 22:22-20150123, Kirill A. Shutemov wrote:
> On Fri, Jan 23, 2015 at 12:37:06PM -0600, Nishanth Menon wrote:
> > On 09:39-20150123, Tyler Baker wrote:
> > > Hi,
> > > 
> > > On 23 January 2015 at 09:27, Nishanth Menon <nm@ti.com> wrote:
> > > > On 16:05-20150120, Kirill A. Shutemov wrote:
> > > > [..]
> > > >> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > >> Reported-by: Nishanth Menon <nm@ti.com>
> > > > Just to close on this thread:
> > > > https://github.com/nmenon/kernel-test-logs/tree/next-20150123 looks good
> > > > and back to old status. Thank you folks for all the help.
> > > 
> > > I just reviewed the boot logs for next-20150123 and there still seems
> > > to be a related issue. I've been boot testing
> > > multi_v7_defconfig+CONFIG_ARM_LPAE=y kernel configurations which still
> > > seem broken.
> > > 
> > > For example here are two boots with exynos5250-arndale, one with
> > > multi_v7_defconfig+CONFIG_ARM_LPAE=y [1] and the other with
> > > multi_v7_defconfig[2]. You can see the kernel configurations with
> > > CONFIG_ARM_LPAE=y show the splat:
> > > 
> > > [   14.605950] ------------[ cut here ]------------
> > > [   14.609163] WARNING: CPU: 1 PID: 63 at ../mm/mmap.c:2858
> > > exit_mmap+0x1b8/0x224()
> > > [   14.616548] Modules linked in:
> > > [   14.619553] CPU: 1 PID: 63 Comm: init Not tainted 3.19.0-rc5-next-20150123 #1
> > > [   14.626713] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
> > > [   14.632830] [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
> > > [   14.640473] [] (show_stack) from [] (dump_stack+0x78/0x94)
> > > [   14.647678] [] (dump_stack) from [] (warn_slowpath_common+0x74/0xb0)
> > > [   14.655744] [] (warn_slowpath_common) from [] (warn_slowpath_null+0x1c/0x24)
> > > [   14.664510] [] (warn_slowpath_null) from [] (exit_mmap+0x1b8/0x224)
> > > [   14.672497] [] (exit_mmap) from [] (mmput+0x40/0xf8)
> > > [   14.679180] [] (mmput) from [] (flush_old_exec+0x328/0x604)
> > > [   14.686471] [] (flush_old_exec) from [] (load_elf_binary+0x26c/0x11f4)
> > > [   14.694715] [] (load_elf_binary) from [] (search_binary_handler+0x98/0x244)
> > > [   14.703395] [] (search_binary_handler) from []
> > > (do_execveat_common+0x4dc/0x5bc)
> > > [   14.712421] [] (do_execveat_common) from [] (do_execve+0x28/0x30)
> > > [   14.720235] [] (do_execve) from [] (ret_fast_syscall+0x0/0x34)
> > > [   14.727782] ---[ end trace 5e3ca48b454c7e0a ]---
> > > [   14.733758] ------------[ cut here ]------------
> > > 
> > > Has anyone else tested with CONFIG_ARM_LPAE=y that can confirm my findings?
> > Uggh... I missed since i was looking at non LPAE omap2plus_defconfig.
> > 
> > Dual A15 OMAP5432 with multi_v7_defconfig + CONFIG_ARM_LPAE=y
> > https://github.com/nmenon/kernel-test-logs/blob/next-20150123/multi_lpae_defconfig/omap5-evm.txt
> > 
> > Dual A15 DRA7/AM572x with same configuration as above.
> > https://raw.githubusercontent.com/nmenon/kernel-test-logs/next-20150123/multi_lpae_defconfig/dra7xx-evm.txt
> > https://github.com/nmenon/kernel-test-logs/blob/next-20150123/multi_lpae_defconfig/am57xx-evm.txt
> > 
> > Single A15 DRA72 with same configuration as above:
> > https://raw.githubusercontent.com/nmenon/kernel-test-logs/next-20150123/multi_lpae_defconfig/dra72x-evm.txt
> > 
> > You are right. the issue re-appears with LPAE on :(
> > Apologies on missing that.
> 
> Guys, could you instrument mm_{inc,dec}_nr_pmds() with dump_stack() +
> printk() of the counter and add printk() on mmap_exit() then run a simple
> program which triggers the issue?

The simplest program I think we are all running is "boot to shell" - I
mean, I haven't spent more time digging at it as I am not in familiar
territory here. :( Is there any instrumentation patch you want us to try?

-- 
Regards,
Nishanth Menon

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [next-20150119]regression (mm)?
  2015-01-23 20:22                 ` Kirill A. Shutemov
@ 2015-01-23 22:42                   ` Tyler Baker
  -1 siblings, 0 replies; 48+ messages in thread
From: Tyler Baker @ 2015-01-23 22:42 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Nishanth Menon, Kirill A. Shutemov, Russell King - ARM Linux,
	Felipe Balbi, linux-mm, linux-next, linux-omap, linux-arm-kernel

Hi Kirill,

On 23 January 2015 at 12:22, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> On Fri, Jan 23, 2015 at 12:37:06PM -0600, Nishanth Menon wrote:
>> On 09:39-20150123, Tyler Baker wrote:
>> > Hi,
>> >
>> > On 23 January 2015 at 09:27, Nishanth Menon <nm@ti.com> wrote:
>> > > On 16:05-20150120, Kirill A. Shutemov wrote:
>> > > [..]
>> > >> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> > >> Reported-by: Nishanth Menon <nm@ti.com>
>> > > Just to close on this thread:
>> > > https://github.com/nmenon/kernel-test-logs/tree/next-20150123 looks good
>> > > and back to old status. Thank you folks for all the help.
>> >
>> > I just reviewed the boot logs for next-20150123 and there still seems
>> > to be a related issue. I've been boot testing
>> > multi_v7_defconfig+CONFIG_ARM_LPAE=y kernel configurations which still
>> > seem broken.
>> >
>> > For example here are two boots with exynos5250-arndale, one with
>> > multi_v7_defconfig+CONFIG_ARM_LPAE=y [1] and the other with
>> > multi_v7_defconfig[2]. You can see the kernel configurations with
>> > CONFIG_ARM_LPAE=y show the splat:
>> >
>> > [   14.605950] ------------[ cut here ]------------
>> > [   14.609163] WARNING: CPU: 1 PID: 63 at ../mm/mmap.c:2858
>> > exit_mmap+0x1b8/0x224()
>> > [   14.616548] Modules linked in:
>> > [   14.619553] CPU: 1 PID: 63 Comm: init Not tainted 3.19.0-rc5-next-20150123 #1
>> > [   14.626713] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
>> > [   14.632830] [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
>> > [   14.640473] [] (show_stack) from [] (dump_stack+0x78/0x94)
>> > [   14.647678] [] (dump_stack) from [] (warn_slowpath_common+0x74/0xb0)
>> > [   14.655744] [] (warn_slowpath_common) from [] (warn_slowpath_null+0x1c/0x24)
>> > [   14.664510] [] (warn_slowpath_null) from [] (exit_mmap+0x1b8/0x224)
>> > [   14.672497] [] (exit_mmap) from [] (mmput+0x40/0xf8)
>> > [   14.679180] [] (mmput) from [] (flush_old_exec+0x328/0x604)
>> > [   14.686471] [] (flush_old_exec) from [] (load_elf_binary+0x26c/0x11f4)
>> > [   14.694715] [] (load_elf_binary) from [] (search_binary_handler+0x98/0x244)
>> > [   14.703395] [] (search_binary_handler) from []
>> > (do_execveat_common+0x4dc/0x5bc)
>> > [   14.712421] [] (do_execveat_common) from [] (do_execve+0x28/0x30)
>> > [   14.720235] [] (do_execve) from [] (ret_fast_syscall+0x0/0x34)
>> > [   14.727782] ---[ end trace 5e3ca48b454c7e0a ]---
>> > [   14.733758] ------------[ cut here ]------------
>> >
>> > Has anyone else tested with CONFIG_ARM_LPAE=y that can confirm my findings?
>> Uggh... I missed since i was looking at non LPAE omap2plus_defconfig.
>>
>> Dual A15 OMAP5432 with multi_v7_defconfig + CONFIG_ARM_LPAE=y
>> https://github.com/nmenon/kernel-test-logs/blob/next-20150123/multi_lpae_defconfig/omap5-evm.txt
>>
>> Dual A15 DRA7/AM572x with same configuration as above.
>> https://raw.githubusercontent.com/nmenon/kernel-test-logs/next-20150123/multi_lpae_defconfig/dra7xx-evm.txt
>> https://github.com/nmenon/kernel-test-logs/blob/next-20150123/multi_lpae_defconfig/am57xx-evm.txt
>>
>> Single A15 DRA72 with same configuration as above:
>> https://raw.githubusercontent.com/nmenon/kernel-test-logs/next-20150123/multi_lpae_defconfig/dra72x-evm.txt
>>
>> You are right. the issue re-appears with LPAE on :(
>> Apologies on missing that.
>
> Guys, could you instrument mm_{inc,dec}_nr_pmds() with dump_stack() +
> printk() of the counter and add printk() on mmap_exit() then run a simple
> program which triggers the issue?

For reference, here is the patch I've applied for testing, mostly
stolen from Felipe's debug patch above in this thread.

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1fbd0e8..e5b0444 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1455,11 +1455,17 @@ static inline unsigned long mm_nr_pmds(struct mm_struct *mm)
 static inline void mm_inc_nr_pmds(struct mm_struct *mm)
 {
        atomic_long_inc(&mm->nr_pmds);
+        dump_stack();
+        printk(KERN_INFO "===> %s nr_pmds %ld\n", __func__,
+                atomic_long_read(&mm->nr_pmds));
 }

 static inline void mm_dec_nr_pmds(struct mm_struct *mm)
 {
        atomic_long_dec(&mm->nr_pmds);
+        dump_stack();
+        printk(KERN_INFO "===> %s nr_pmds %ld\n", __func__,
+                atomic_long_read(&mm->nr_pmds));
 }
 #endif

diff --git a/mm/mmap.c b/mm/mmap.c
index 6a7d36d..a16471f 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2809,6 +2809,7 @@ EXPORT_SYMBOL(vm_brk);
 /* Release all mmaps. */
 void exit_mmap(struct mm_struct *mm)
 {
+       printk(KERN_INFO "===> %s exit_mmap enter\n", __func__);
        struct mmu_gather tlb;
        struct vm_area_struct *vma;
        unsigned long nr_accounted = 0;

I applied this patch to the tip of linux-next, configured for
multi_v7_defconfig and set CONFIG_ARM_LPAE=y. The log for this arndale
boot can be found here [1]. For good measure, I then rebuilt the
kernel with CONFIG_ARM_LPAE=n and booted the same platform again. This
log can be found here [2].

Happy hunting!

>
> --
>  Kirill A. Shutemov

[1] http://storage.kernelci.org/debug/mm/arndale-lpae-debug-next-20150123.html
[2] http://storage.kernelci.org/debug/mm/arndale-no-lpae-debug-next-20150123.html

Cheers,

Tyler

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [next-20150119]regression (mm)?
  2015-01-23 22:42                   ` Tyler Baker
@ 2015-01-24  1:13                     ` Kirill A. Shutemov
  -1 siblings, 0 replies; 48+ messages in thread
From: Kirill A. Shutemov @ 2015-01-24  1:13 UTC (permalink / raw)
  To: Tyler Baker
  Cc: Nishanth Menon, Kirill A. Shutemov, Russell King - ARM Linux,
	Felipe Balbi, linux-mm, linux-next, linux-omap, linux-arm-kernel

On Fri, Jan 23, 2015 at 02:42:17PM -0800, Tyler Baker wrote:
> Hi Kirill,
> 
> On 23 January 2015 at 12:22, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> > On Fri, Jan 23, 2015 at 12:37:06PM -0600, Nishanth Menon wrote:
> >> On 09:39-20150123, Tyler Baker wrote:
> >> > Hi,
> >> >
> >> > On 23 January 2015 at 09:27, Nishanth Menon <nm@ti.com> wrote:
> >> > > On 16:05-20150120, Kirill A. Shutemov wrote:
> >> > > [..]
> >> > >> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> >> > >> Reported-by: Nishanth Menon <nm@ti.com>
> >> > > Just to close on this thread:
> >> > > https://github.com/nmenon/kernel-test-logs/tree/next-20150123 looks good
> >> > > and back to old status. Thank you folks for all the help.
> >> >
> >> > I just reviewed the boot logs for next-20150123 and there still seems
> >> > to be a related issue. I've been boot testing
> >> > multi_v7_defconfig+CONFIG_ARM_LPAE=y kernel configurations which still
> >> > seem broken.
> >> >
> >> > For example here are two boots with exynos5250-arndale, one with
> >> > multi_v7_defconfig+CONFIG_ARM_LPAE=y [1] and the other with
> >> > multi_v7_defconfig[2]. You can see the kernel configurations with
> >> > CONFIG_ARM_LPAE=y show the splat:
> >> >
> >> > [   14.605950] ------------[ cut here ]------------
> >> > [   14.609163] WARNING: CPU: 1 PID: 63 at ../mm/mmap.c:2858
> >> > exit_mmap+0x1b8/0x224()
> >> > [   14.616548] Modules linked in:
> >> > [   14.619553] CPU: 1 PID: 63 Comm: init Not tainted 3.19.0-rc5-next-20150123 #1
> >> > [   14.626713] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
> >> > [   14.632830] [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
> >> > [   14.640473] [] (show_stack) from [] (dump_stack+0x78/0x94)
> >> > [   14.647678] [] (dump_stack) from [] (warn_slowpath_common+0x74/0xb0)
> >> > [   14.655744] [] (warn_slowpath_common) from [] (warn_slowpath_null+0x1c/0x24)
> >> > [   14.664510] [] (warn_slowpath_null) from [] (exit_mmap+0x1b8/0x224)
> >> > [   14.672497] [] (exit_mmap) from [] (mmput+0x40/0xf8)
> >> > [   14.679180] [] (mmput) from [] (flush_old_exec+0x328/0x604)
> >> > [   14.686471] [] (flush_old_exec) from [] (load_elf_binary+0x26c/0x11f4)
> >> > [   14.694715] [] (load_elf_binary) from [] (search_binary_handler+0x98/0x244)
> >> > [   14.703395] [] (search_binary_handler) from []
> >> > (do_execveat_common+0x4dc/0x5bc)
> >> > [   14.712421] [] (do_execveat_common) from [] (do_execve+0x28/0x30)
> >> > [   14.720235] [] (do_execve) from [] (ret_fast_syscall+0x0/0x34)
> >> > [   14.727782] ---[ end trace 5e3ca48b454c7e0a ]---
> >> > [   14.733758] ------------[ cut here ]------------
> >> >
> >> > Has anyone else tested with CONFIG_ARM_LPAE=y that can confirm my findings?
> >> Uggh... I missed since i was looking at non LPAE omap2plus_defconfig.
> >>
> >> Dual A15 OMAP5432 with multi_v7_defconfig + CONFIG_ARM_LPAE=y
> >> https://github.com/nmenon/kernel-test-logs/blob/next-20150123/multi_lpae_defconfig/omap5-evm.txt
> >>
> >> Dual A15 DRA7/AM572x with same configuration as above.
> >> https://raw.githubusercontent.com/nmenon/kernel-test-logs/next-20150123/multi_lpae_defconfig/dra7xx-evm.txt
> >> https://github.com/nmenon/kernel-test-logs/blob/next-20150123/multi_lpae_defconfig/am57xx-evm.txt
> >>
> >> Single A15 DRA72 with same configuration as above:
> >> https://raw.githubusercontent.com/nmenon/kernel-test-logs/next-20150123/multi_lpae_defconfig/dra72x-evm.txt
> >>
> >> You are right. the issue re-appears with LPAE on :(
> >> Apologies on missing that.
> >
> > Guys, could you instrument mm_{inc,dec}_nr_pmds() with dump_stack() +
> > printk() of the counter and add printk() on mmap_exit() then run a simple
> > program which triggers the issue?
> 
> For reference, here is the patch I've applied for testing, mostly
> stolen from Felipe's debug patch above in this thread.
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 1fbd0e8..e5b0444 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1455,11 +1455,17 @@ static inline unsigned long mm_nr_pmds(struct
> mm_struct *mm)
>  static inline void mm_inc_nr_pmds(struct mm_struct *mm)
>  {
>         atomic_long_inc(&mm->nr_pmds);
> +        dump_stack();
> +        printk(KERN_INFO "===> %s nr_pmds %ld\n", __func__,
> +                atomic_long_read(&mm->nr_pmds));
>  }
> 
>  static inline void mm_dec_nr_pmds(struct mm_struct *mm)
>  {
>         atomic_long_dec(&mm->nr_pmds);
> +        dump_stack();
> +        printk(KERN_INFO "===> %s nr_pmds %ld\n", __func__,
> +                atomic_long_read(&mm->nr_pmds));
>  }
>  #endif
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 6a7d36d..a16471f 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2809,6 +2809,7 @@ EXPORT_SYMBOL(vm_brk);
>  /* Release all mmaps. */
>  void exit_mmap(struct mm_struct *mm)
>  {
> +       printk(KERN_INFO "===> %s exit_mmap enter\n", __func__);
>         struct mmu_gather tlb;
>         struct vm_area_struct *vma;
>         unsigned long nr_accounted = 0;
> 
> I applied this patch to the tip of linux-next, configured for
> multi_v7_defconfig and set CONFIG_ARM_LPAE=y. The log for this arndale
> boot can be found here [1]. For good measure, I then rebuilt the
> kernel with CONFIG_ARM_LPAE=n and booted the same platform again. This
> log can be found here [2].
> 
> Happy hunting!

Okay, a proof-of-concept patch is below. It's going to break every other
architecture with FIRST_USER_ADDRESS != 0, but I think it's a cleaner way
to go.

The problem is that we check nr_ptes/nr_pmds in exit_mmap(), which happens
*before* pgd_free(). If an arch allocates ptes/pmds in pgd_alloc() and
frees them in pgd_free(), the counters are still offset by those tables at
the time of the checks.
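
To make the ordering concrete, here is a toy model, a sketch only and not
kernel code. It assumes one pmd table that pgd_alloc() allocates and that
ends up counted, and a pgd_free() that decrements the counter. All toy_*
names are made up for illustration.

```c
#include <assert.h>

/* Toy model, not kernel code: one counter standing in for mm->nr_pmds. */
struct toy_mm {
	long nr_pmds;
};

/* Hypothetical pgd_alloc(): the arch pre-allocates one pmd and it is counted. */
void toy_pgd_alloc(struct toy_mm *mm)
{
	mm->nr_pmds = 1;
}

/* User mappings add n accounted pmd tables. */
void toy_map_user(struct toy_mm *mm, long n)
{
	mm->nr_pmds += n;
}

/* exit_mmap() frees only the user tables; returns what a check here sees. */
long toy_exit_mmap(struct toy_mm *mm, long n_user)
{
	mm->nr_pmds -= n_user;
	return mm->nr_pmds;
}

/* pgd_free() with accounting added; returns what a later check would see. */
long toy_pgd_free(struct toy_mm *mm)
{
	mm->nr_pmds -= 1;
	return mm->nr_pmds;
}

/* Walk one mm lifetime and compare the two possible check locations. */
void toy_lifetime_demo(void)
{
	struct toy_mm mm;

	toy_pgd_alloc(&mm);
	toy_map_user(&mm, 3);
	assert(toy_exit_mmap(&mm, 3) == 1);	/* check in exit_mmap(): offset by one */
	assert(toy_pgd_free(&mm) == 0);		/* check after pgd_free(): balanced */
}
```

In this model a check placed in exit_mmap() always sees the pre-allocated
table, while a check placed after pgd_free() sees zero.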

This scenario happens on all archs with FIRST_USER_ADDRESS != 0, and we
tried to work around it by offsetting the expected counter values according
to FIRST_USER_ADDRESS for both nr_ptes and nr_pmds in exit_mmap().
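
As a worked example of that offset, the sketch below evaluates the
expression used by the WARN_ON()s in exit_mmap(), i.e.
round_up(FIRST_USER_ADDRESS, SIZE) >> SHIFT. The constants are assumptions
picked for illustration (4K pages, 2M pmd span, 1G pud span,
FIRST_USER_ADDRESS of two pages); the real values come from the arch
headers, and the toy_* names are hypothetical.

```c
#include <assert.h>

/* Assumed values for illustration only, not taken from any arch header. */
#define TOY_PAGE_SIZE		4096UL
#define TOY_PMD_SHIFT		21	/* one pte table spans 2M */
#define TOY_PUD_SHIFT		30	/* one pmd table spans 1G */
#define TOY_FIRST_USER_ADDRESS	(2 * TOY_PAGE_SIZE)

/* Round x up to a multiple of size; size must be a power of two. */
unsigned long toy_round_up(unsigned long x, unsigned long size)
{
	return (x + size - 1) & ~(size - 1);
}

/* Tables of span (1UL << shift) needed to cover [0, first_user_address). */
unsigned long toy_allowed_tables(unsigned long first_user_address, int shift)
{
	return toy_round_up(first_user_address, 1UL << shift) >> shift;
}

/* Evaluate the two thresholds for the assumed constants. */
void toy_threshold_demo(void)
{
	/* FIRST_USER_ADDRESS == 0: no leftover tables are tolerated. */
	assert(toy_allowed_tables(0, TOY_PMD_SHIFT) == 0);
	/* Two pages below the first user address: one pte table, one pmd table. */
	assert(toy_allowed_tables(TOY_FIRST_USER_ADDRESS, TOY_PMD_SHIFT) == 1);
	assert(toy_allowed_tables(TOY_FIRST_USER_ADDRESS, TOY_PUD_SHIFT) == 1);
}
```

Under these assumptions the check tolerates one leftover pte table and one
leftover pmd table, and none at all when FIRST_USER_ADDRESS is 0.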

ARM with LPAE also has a non-zero USER_PGTABLES_CEILING, but there the
upper addresses are occupied by huge pmd entries, so the trick of
offsetting the expected counter value would get really ugly: we would have
to apply it to nr_pmds, but not to nr_ptes.

The proposal is to move the check to check_mm(), which happens *after*
pgd_free(). We would need to adjust pgd_free() on all architectures with
non-zero FIRST_USER_ADDRESS to do the accounting properly there. But I
think the end result would be cleaner.

Andrew, any comments?

diff --git a/arch/arm/mm/pgd.c b/arch/arm/mm/pgd.c
index 249379535be2..c3ec18d9bbb9 100644
--- a/arch/arm/mm/pgd.c
+++ b/arch/arm/mm/pgd.c
@@ -130,9 +130,11 @@ void pgd_free(struct mm_struct *mm, pgd_t *pgd_base)
 	pte = pmd_pgtable(*pmd);
 	pmd_clear(pmd);
 	pte_free(mm, pte);
+	atomic_long_dec(&mm->nr_ptes);
 no_pmd:
 	pud_clear(pud);
 	pmd_free(mm, pmd);
+	mm_dec_nr_pmds(mm);
 no_pud:
 	pgd_clear(pgd);
 	pud_free(mm, pud);
@@ -152,6 +154,7 @@ no_pgd:
 		pmd = pmd_offset(pud, 0);
 		pud_clear(pud);
 		pmd_free(mm, pmd);
+		mm_dec_nr_pmds(mm);
 		pgd_clear(pgd);
 		pud_free(mm, pud);
 	}
diff --git a/kernel/fork.c b/kernel/fork.c
index c99098c52641..0a6f0a380335 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -599,6 +599,13 @@ static void check_mm(struct mm_struct *mm)
 {
 	int i;
 
+	if (atomic_long_read(&mm->nr_ptes))
+		pr_alert("BUG: non-zero nr_ptes on freeing mm: %ld\n",
+				atomic_long_read(&mm->nr_ptes));
+	if (mm_nr_pmds(mm))
+		pr_alert("BUG: non-zero nr_pmds on freeing mm: %ld\n",
+				mm_nr_pmds(mm));
+
 	for (i = 0; i < NR_MM_COUNTERS; i++) {
 		long x = atomic_long_read(&mm->rss_stat.count[i]);
 
diff --git a/mm/mmap.c b/mm/mmap.c
index 6a7d36d133fb..c5f44682c0d1 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2851,11 +2851,6 @@ void exit_mmap(struct mm_struct *mm)
 		vma = remove_vma(vma);
 	}
 	vm_unacct_memory(nr_accounted);
-
-	WARN_ON(atomic_long_read(&mm->nr_ptes) >
-			round_up(FIRST_USER_ADDRESS, PMD_SIZE) >> PMD_SHIFT);
-	WARN_ON(mm_nr_pmds(mm) >
-			round_up(FIRST_USER_ADDRESS, PUD_SIZE) >> PUD_SHIFT);
 }
 
 /* Insert vm structure into process list sorted by address
-- 
 Kirill A. Shutemov

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [next-20150119]regression (mm)?
@ 2015-01-24  1:13                     ` Kirill A. Shutemov
  0 siblings, 0 replies; 48+ messages in thread
From: Kirill A. Shutemov @ 2015-01-24  1:13 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jan 23, 2015 at 02:42:17PM -0800, Tyler Baker wrote:
> Hi Kirill,
> 
> On 23 January 2015 at 12:22, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> > On Fri, Jan 23, 2015 at 12:37:06PM -0600, Nishanth Menon wrote:
> >> On 09:39-20150123, Tyler Baker wrote:
> >> > Hi,
> >> >
> >> > On 23 January 2015 at 09:27, Nishanth Menon <nm@ti.com> wrote:
> >> > > On 16:05-20150120, Kirill A. Shutemov wrote:
> >> > > [..]
> >> > >> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> >> > >> Reported-by: Nishanth Menon <nm@ti.com>
> >> > > Just to close on this thread:
> >> > > https://github.com/nmenon/kernel-test-logs/tree/next-20150123 looks good
> >> > > and back to old status. Thank you folks for all the help.
> >> >
> >> > I just reviewed the boot logs for next-20150123 and there still seems
> >> > to be a related issue. I've been boot testing
> >> > multi_v7_defconfig+CONFIG_ARM_LPAE=y kernel configurations which still
> >> > seem broken.
> >> >
> >> > For example here are two boots with exynos5250-arndale, one with
> >> > multi_v7_defconfig+CONFIG_ARM_LPAE=y [1] and the other with
> >> > multi_v7_defconfig[2]. You can see the kernel configurations with
> >> > CONFIG_ARM_LPAE=y show the splat:
> >> >
> >> > [   14.605950] ------------[ cut here ]------------
> >> > [   14.609163] WARNING: CPU: 1 PID: 63 at ../mm/mmap.c:2858 exit_mmap+0x1b8/0x224()
> >> > [   14.616548] Modules linked in:
> >> > [   14.619553] CPU: 1 PID: 63 Comm: init Not tainted 3.19.0-rc5-next-20150123 #1
> >> > [   14.626713] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
> >> > [   14.632830] [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
> >> > [   14.640473] [] (show_stack) from [] (dump_stack+0x78/0x94)
> >> > [   14.647678] [] (dump_stack) from [] (warn_slowpath_common+0x74/0xb0)
> >> > [   14.655744] [] (warn_slowpath_common) from [] (warn_slowpath_null+0x1c/0x24)
> >> > [   14.664510] [] (warn_slowpath_null) from [] (exit_mmap+0x1b8/0x224)
> >> > [   14.672497] [] (exit_mmap) from [] (mmput+0x40/0xf8)
> >> > [   14.679180] [] (mmput) from [] (flush_old_exec+0x328/0x604)
> >> > [   14.686471] [] (flush_old_exec) from [] (load_elf_binary+0x26c/0x11f4)
> >> > [   14.694715] [] (load_elf_binary) from [] (search_binary_handler+0x98/0x244)
> >> > [   14.703395] [] (search_binary_handler) from [] (do_execveat_common+0x4dc/0x5bc)
> >> > [   14.712421] [] (do_execveat_common) from [] (do_execve+0x28/0x30)
> >> > [   14.720235] [] (do_execve) from [] (ret_fast_syscall+0x0/0x34)
> >> > [   14.727782] ---[ end trace 5e3ca48b454c7e0a ]---
> >> > [   14.733758] ------------[ cut here ]------------
> >> >
> >> > Has anyone else tested with CONFIG_ARM_LPAE=y that can confirm my findings?
> >> Uggh... I missed since i was looking at non LPAE omap2plus_defconfig.
> >>
> >> Dual A15 OMAP5432 with multi_v7_defconfig + CONFIG_ARM_LPAE=y
> >> https://github.com/nmenon/kernel-test-logs/blob/next-20150123/multi_lpae_defconfig/omap5-evm.txt
> >>
> >> Dual A15 DRA7/AM572x with same configuration as above.
> >> https://raw.githubusercontent.com/nmenon/kernel-test-logs/next-20150123/multi_lpae_defconfig/dra7xx-evm.txt
> >> https://github.com/nmenon/kernel-test-logs/blob/next-20150123/multi_lpae_defconfig/am57xx-evm.txt
> >>
> >> Single A15 DRA72 with same configuration as above:
> >> https://raw.githubusercontent.com/nmenon/kernel-test-logs/next-20150123/multi_lpae_defconfig/dra72x-evm.txt
> >>
> >> You are right. the issue re-appears with LPAE on :(
> >> Apologies on missing that.
> >
> > Guys, could you instrument mm_{inc,dec}_nr_pmds() with dump_stack() +
> > printk() of the counter and add printk() on exit_mmap() then run a simple
> > program which triggers the issue?
> 
> For reference, here is the patch I've applied for testing, mostly
> stolen from Felipe's debug patch above in this thread.
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 1fbd0e8..e5b0444 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1455,11 +1455,17 @@ static inline unsigned long mm_nr_pmds(struct mm_struct *mm)
>  static inline void mm_inc_nr_pmds(struct mm_struct *mm)
>  {
>         atomic_long_inc(&mm->nr_pmds);
> +        dump_stack();
> +        printk(KERN_INFO "===> %s nr_pmds %ld\n", __func__,
> +                atomic_long_read(&mm->nr_pmds));
>  }
> 
>  static inline void mm_dec_nr_pmds(struct mm_struct *mm)
>  {
>         atomic_long_dec(&mm->nr_pmds);
> +        dump_stack();
> +        printk(KERN_INFO "===> %s nr_pmds %ld\n", __func__,
> +                atomic_long_read(&mm->nr_pmds));
>  }
>  #endif
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 6a7d36d..a16471f 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2809,6 +2809,7 @@ EXPORT_SYMBOL(vm_brk);
>  /* Release all mmaps. */
>  void exit_mmap(struct mm_struct *mm)
>  {
> +       printk(KERN_INFO "===> %s exit_mmap enter\n", __func__);
>         struct mmu_gather tlb;
>         struct vm_area_struct *vma;
>         unsigned long nr_accounted = 0;
> 
> I applied this patch to the tip of linux-next, configured for
> multi_v7_defconfig and set CONFIG_ARM_LPAE=y. The log for this arndale
> boot can be found here [1]. For good measure, I then rebuilt the
> kernel with CONFIG_ARM_LPAE=n and booted the same platform again. This
> log can be found here [2].
> 
> Happy hunting!

Okay, a proof-of-concept patch is below. It's going to break every other
architecture with FIRST_USER_ADDRESS != 0, but I think it's a cleaner way to
go.

The problem is that we check nr_ptes/nr_pmds in exit_mmap(), which happens
*before* pgd_free(). If an arch allocates pte/pmd tables in pgd_alloc()
and frees them in pgd_free(), the counters are still offset by those
tables at the time of the checks.

This scenario happens on all archs with FIRST_USER_ADDRESS != 0, and we
tried to work around it by offsetting the expected counter value according
to FIRST_USER_ADDRESS for both nr_ptes and nr_pmds in exit_mmap().
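
As a rough illustration of the offsetting trick, the sketch below redoes
the threshold arithmetic of the old exit_mmap() check in userspace C. The
TOY_* values (a FIRST_USER_ADDRESS of 0x2000, a 2 MiB pmd span and a 1 GiB
pud span) are assumptions picked for the example, not values taken from
any particular kernel configuration, and the toy_* helpers are
hypothetical names that do not exist in the kernel.

```c
#include <assert.h>

/* Assumed, illustrative values; real ones depend on arch and config. */
#define TOY_FIRST_USER_ADDRESS 0x2000UL       /* assumption: two 4 KiB pages */
#define TOY_PMD_SHIFT          21             /* assumption: 2 MiB per pmd entry */
#define TOY_PUD_SHIFT          30             /* assumption: 1 GiB per pud entry */
#define TOY_PMD_SIZE           (1UL << TOY_PMD_SHIFT)
#define TOY_PUD_SIZE           (1UL << TOY_PUD_SHIFT)

/* Round x up to the next multiple of a power-of-two size. */
static unsigned long toy_round_up(unsigned long x, unsigned long size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return (x + size - 1) & ~(size - 1);
}

/* Number of pte tables the old check tolerated at exit_mmap() time. */
static unsigned long toy_allowed_ptes(unsigned long first_user_address)
{
	return toy_round_up(first_user_address, TOY_PMD_SIZE) >> TOY_PMD_SHIFT;
}

/* Number of pmd tables the old check tolerated at exit_mmap() time. */
static unsigned long toy_allowed_pmds(unsigned long first_user_address)
{
	return toy_round_up(first_user_address, TOY_PUD_SIZE) >> TOY_PUD_SHIFT;
}
```

With these assumed values both helpers return 1 for
TOY_FIRST_USER_ADDRESS and 0 for an address of 0, i.e. the old check
tolerated one leftover pte table and one leftover pmd table whenever
FIRST_USER_ADDRESS was non-zero.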

But ARM with LPAE also has a non-zero USER_PGTABLES_CEILING, and the upper
addresses are occupied by huge pmd entries, so the trick of offsetting the
expected counter value gets really ugly: we would have to apply it to
nr_pmds, but not to nr_ptes.

The proposal is to move the check to check_mm(), which happens *after*
pgd_free(). We would need to adjust pgd_free() on all architectures with
non-zero FIRST_USER_ADDRESS to do the accounting properly there. But I think
the end result would be cleaner.
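
To make the ordering argument concrete, here is a minimal userspace
model, not kernel code. struct toy_mm and the toy_* functions are
hypothetical stand-ins, and the model assumes an arch whose pgd_alloc()
accounts exactly one pmd table and one pte table. It records what a
strict "counters must be zero" check would see at the exit_mmap()
position and at the check_mm() position.

```c
#include <assert.h>

/* Toy stand-in for the two mm_struct counters discussed above. */
struct toy_mm {
	long nr_ptes;
	long nr_pmds;
};

/* Assumption: pgd_alloc() on this arch pre-allocates one pmd and one pte table. */
static void toy_pgd_alloc(struct toy_mm *mm)
{
	mm->nr_pmds++;
	mm->nr_ptes++;
}

/* Models pgd_free() with the accounting proposed in the patch. */
static void toy_pgd_free(struct toy_mm *mm)
{
	assert(mm->nr_ptes > 0 && mm->nr_pmds > 0);
	mm->nr_ptes--;
	mm->nr_pmds--;
}

/* Returns 1 if a strict zero check would complain, 0 otherwise. */
static int toy_counters_nonzero(const struct toy_mm *mm)
{
	return mm->nr_ptes != 0 || mm->nr_pmds != 0;
}

/* Runs one mm lifetime and records what a check sees at each position. */
static void toy_lifecycle(int *seen_in_exit_mmap, int *seen_in_check_mm)
{
	struct toy_mm mm = { 0, 0 };

	toy_pgd_alloc(&mm);
	/* User mappings are assumed balanced by the time the mm is torn down. */
	*seen_in_exit_mmap = toy_counters_nonzero(&mm);	/* before pgd_free() */
	toy_pgd_free(&mm);
	*seen_in_check_mm = toy_counters_nonzero(&mm);	/* after pgd_free() */
}
```

In this model toy_lifecycle() reports a non-zero state at the
exit_mmap() position and a zero state at the check_mm() position, which
is why a check placed after pgd_free() needs no per-arch offset.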

Andrew, any comments?

diff --git a/arch/arm/mm/pgd.c b/arch/arm/mm/pgd.c
index 249379535be2..c3ec18d9bbb9 100644
--- a/arch/arm/mm/pgd.c
+++ b/arch/arm/mm/pgd.c
@@ -130,9 +130,11 @@ void pgd_free(struct mm_struct *mm, pgd_t *pgd_base)
 	pte = pmd_pgtable(*pmd);
 	pmd_clear(pmd);
 	pte_free(mm, pte);
+	atomic_long_dec(&mm->nr_ptes);
 no_pmd:
 	pud_clear(pud);
 	pmd_free(mm, pmd);
+	mm_dec_nr_pmds(mm);
 no_pud:
 	pgd_clear(pgd);
 	pud_free(mm, pud);
@@ -152,6 +154,7 @@ no_pgd:
 		pmd = pmd_offset(pud, 0);
 		pud_clear(pud);
 		pmd_free(mm, pmd);
+		mm_dec_nr_pmds(mm);
 		pgd_clear(pgd);
 		pud_free(mm, pud);
 	}
diff --git a/kernel/fork.c b/kernel/fork.c
index c99098c52641..0a6f0a380335 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -599,6 +599,13 @@ static void check_mm(struct mm_struct *mm)
 {
 	int i;
 
+	if (atomic_long_read(&mm->nr_ptes))
+		pr_alert("BUG: non-zero nr_ptes on freeing mm: %ld",
+				atomic_long_read(&mm->nr_ptes));
+	if (mm_nr_pmds(mm))
+		pr_alert("BUG: non-zero nr_pmds on freeing mm: %ld",
+				mm_nr_pmds(mm));
+
 	for (i = 0; i < NR_MM_COUNTERS; i++) {
 		long x = atomic_long_read(&mm->rss_stat.count[i]);
 
diff --git a/mm/mmap.c b/mm/mmap.c
index 6a7d36d133fb..c5f44682c0d1 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2851,11 +2851,6 @@ void exit_mmap(struct mm_struct *mm)
 		vma = remove_vma(vma);
 	}
 	vm_unacct_memory(nr_accounted);
-
-	WARN_ON(atomic_long_read(&mm->nr_ptes) >
-			round_up(FIRST_USER_ADDRESS, PMD_SIZE) >> PMD_SHIFT);
-	WARN_ON(mm_nr_pmds(mm) >
-			round_up(FIRST_USER_ADDRESS, PUD_SIZE) >> PUD_SHIFT);
 }
 
 /* Insert vm structure into process list sorted by address
-- 
 Kirill A. Shutemov

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [next-20150119]regression (mm)?
  2015-01-24  1:13                     ` Kirill A. Shutemov
@ 2015-01-24  4:37                       ` Nishanth Menon
  -1 siblings, 0 replies; 48+ messages in thread
From: Nishanth Menon @ 2015-01-24  4:37 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Tyler Baker, Kirill A. Shutemov, Russell King - ARM Linux,
	Felipe Balbi, linux-mm, linux-next, linux-omap, linux-arm-kernel

On 03:13-20150124, Kirill A. Shutemov wrote:
> > >> On 09:39-20150123, Tyler Baker wrote:
[...]
> > >> > I just reviewed the boot logs for next-20150123 and there still seems
> > >> > to be a related issue. I've been boot testing
> > >> > multi_v7_defconfig+CONFIG_ARM_LPAE=y kernel configurations which still
> > >> > seem broken.
[...]
> Okay, proof of concept patch is below. It's going to break every other
> architecture with FIRST_USER_ADDRESS != 0, but I think it's cleaner way to
> go.

Testing on my end:

just ran through this set (+ logs similar to Tyler's from my side):

next-20150123 (multi_v7_defconfig == !LPAE)
 1:    BeagleBoard-X15(am57xx-evm): BOOT: PASS: http://paste.ubuntu.org.cn/2219449
 2:                     dra72x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2219450
 3:                     dra7xx-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2219451
 4:                      omap5-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2219452
TOTAL = 4 boards, Booted Boards = 4, No Boot boards = 0

next-20150123-LPAE-Logging enabled[1] (multi_v7_defconfig +LPAE)
 1:    BeagleBoard-X15(am57xx-evm): BOOT: FAIL: http://paste.ubuntu.org.cn/2220938
 2:                     dra72x-evm: BOOT: FAIL: http://paste.ubuntu.org.cn/2220943
 3:                     dra7xx-evm: BOOT: FAIL: http://paste.ubuntu.org.cn/2220947
 4:                      omap5-evm: BOOT: FAIL: http://paste.ubuntu.org.cn/2220955
TOTAL = 4 boards, Booted Boards = 0, No Boot boards = 4

next-20150123-LPAE-new-patch [2] (multi_v7_defconfig + LPAE)
 1:    BeagleBoard-X15(am57xx-evm): BOOT: PASS: http://paste.ubuntu.org.cn/2221047
 2:                     dra72x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221065
 3:                     dra7xx-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221069
 4:                      omap5-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221070
TOTAL = 4 boards, Booted Boards = 4, No Boot boards = 0

next-20150123-new-patch[2] (multi_v7_defconfig == !LPAE)
 1:                     am335x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221277
 2:                      am335x-sk: BOOT: PASS: http://paste.ubuntu.org.cn/2221278
 3:                      am437x-sk: BOOT: FAIL: http://paste.ubuntu.org.cn/2221279 (unrelated)
 4:                    am43xx-epos: BOOT: PASS: http://paste.ubuntu.org.cn/2221280
 5:                   am43xx-gpevm: BOOT: PASS: http://paste.ubuntu.org.cn/2221281
 6:    BeagleBoard-X15(am57xx-evm): BOOT: PASS: http://paste.ubuntu.org.cn/2221282
 7:                 BeagleBoard-XM: BOOT: FAIL: http://paste.ubuntu.org.cn/2221283 (unrelated)
 8:            beagleboard-vanilla: BOOT: PASS: http://paste.ubuntu.org.cn/2221284
 9:               beaglebone-black: BOOT: PASS: http://paste.ubuntu.org.cn/2221285
10:                     beaglebone: BOOT: FAIL: http://paste.ubuntu.org.cn/2221286 (unrelated)
11:                     dra72x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221287
12:                     dra7xx-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221288
13:                      omap5-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221289
14:                  pandaboard-es: BOOT: PASS: http://paste.ubuntu.org.cn/2221290
15:             pandaboard-vanilla: BOOT: PASS: http://paste.ubuntu.org.cn/2221291
16:                        sdp4430: BOOT: PASS: http://paste.ubuntu.org.cn/2221292
TOTAL = 16 boards, Booted Boards = 13, No Boot boards = 3

next-20150123-new-patch[2] (omap2plus_defconfig)
 1:                     am335x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221653
 2:                      am335x-sk: BOOT: PASS: http://paste.ubuntu.org.cn/2221654
 3:                      am437x-sk: BOOT: PASS: http://paste.ubuntu.org.cn/2221656
 4:                    am43xx-epos: BOOT: PASS: http://paste.ubuntu.org.cn/2221659
 5:                   am43xx-gpevm: BOOT: PASS: http://paste.ubuntu.org.cn/2221660
 6:    BeagleBoard-X15(am57xx-evm): BOOT: PASS: http://paste.ubuntu.org.cn/2221661
 7:                 BeagleBoard-XM: BOOT: PASS: http://paste.ubuntu.org.cn/2221670
 8:            beagleboard-vanilla: BOOT: PASS: http://paste.ubuntu.org.cn/2221676
 9:               beaglebone-black: BOOT: PASS: http://paste.ubuntu.org.cn/2221683
10:                     beaglebone: BOOT: PASS: http://paste.ubuntu.org.cn/2221690
11:                     dra72x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221692
12:                     dra7xx-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221695
13:                      omap5-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221700
14:                  pandaboard-es: BOOT: PASS: http://paste.ubuntu.org.cn/2221704
15:             pandaboard-vanilla: BOOT: PASS: http://paste.ubuntu.org.cn/2221707
16:                        sdp4430: BOOT: PASS: http://paste.ubuntu.org.cn/2221713
TOTAL = 16 boards, Booted Boards = 16, No Boot boards = 0

[1] http://paste.ubuntu.org.cn/2220994 (based on diff from Tyler B)
[2] https://patchwork.kernel.org/patch/5698491/
-- 
Regards,
Nishanth Menon

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [next-20150119]regression (mm)?
  2015-01-24  4:37                       ` Nishanth Menon
  (?)
@ 2015-01-26 12:00                         ` Kirill A. Shutemov
  -1 siblings, 0 replies; 48+ messages in thread
From: Kirill A. Shutemov @ 2015-01-26 12:00 UTC (permalink / raw)
  To: Nishanth Menon, Andrew Morton
  Cc: Tyler Baker, Kirill A. Shutemov, Russell King - ARM Linux,
	Felipe Balbi, linux-mm, linux-next, linux-omap, linux-arm-kernel,
	James Hogan, Guan Xuetao

On Fri, Jan 23, 2015 at 10:37:46PM -0600, Nishanth Menon wrote:
> On 03:13-20150124, Kirill A. Shutemov wrote:
> > > >> On 09:39-20150123, Tyler Baker wrote:
> [...]
> > > >> > I just reviewed the boot logs for next-20150123 and there still seems
> > > >> > to be a related issue. I've been boot testing
> > > >> > multi_v7_defconfig+CONFIG_ARM_LPAE=y kernel configurations which still
> > > >> > seem broken.
> [...]
> > Okay, proof of concept patch is below. It's going to break every other
> > architecture with FIRST_USER_ADDRESS != 0, but I think it's cleaner way to
> > go.
> 
> Testing on my end:
> 
> just ran through this set (+ logs similar to Tyler's from my side):
> 
> next-20150123 (multi_v7_defconfig == !LPAE)
>  1:    BeagleBoard-X15(am57xx-evm): BOOT: PASS: http://paste.ubuntu.org.cn/2219449
>  2:                     dra72x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2219450
>  3:                     dra7xx-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2219451
>  4:                      omap5-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2219452
> TOTAL = 4 boards, Booted Boards = 4, No Boot boards = 0
> 
> next-20150123-LPAE-Logging enabled[1] (multi_v7_defconfig +LPAE)
>  1:    BeagleBoard-X15(am57xx-evm): BOOT: FAIL: http://paste.ubuntu.org.cn/2220938
>  2:                     dra72x-evm: BOOT: FAIL: http://paste.ubuntu.org.cn/2220943
>  3:                     dra7xx-evm: BOOT: FAIL: http://paste.ubuntu.org.cn/2220947
>  4:                      omap5-evm: BOOT: FAIL: http://paste.ubuntu.org.cn/2220955
> TOTAL = 4 boards, Booted Boards = 0, No Boot boards = 4
> 
> next-20150123-LPAE-new-patch [2] (multi_v7_defconfig + LPAE)
>  1:    BeagleBoard-X15(am57xx-evm): BOOT: PASS: http://paste.ubuntu.org.cn/2221047
>  2:                     dra72x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221065
>  3:                     dra7xx-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221069
>  4:                      omap5-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221070
> TOTAL = 4 boards, Booted Boards = 4, No Boot boards = 0
> 
> next-20150123-new-patch[2] (multi_v7_defconfig == !LPAE)
>  1:                     am335x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221277
>  2:                      am335x-sk: BOOT: PASS: http://paste.ubuntu.org.cn/2221278
>  3:                      am437x-sk: BOOT: FAIL: http://paste.ubuntu.org.cn/2221279 (unrelated)
>  4:                    am43xx-epos: BOOT: PASS: http://paste.ubuntu.org.cn/2221280
>  5:                   am43xx-gpevm: BOOT: PASS: http://paste.ubuntu.org.cn/2221281
>  6:    BeagleBoard-X15(am57xx-evm): BOOT: PASS: http://paste.ubuntu.org.cn/2221282
>  7:                 BeagleBoard-XM: BOOT: FAIL: http://paste.ubuntu.org.cn/2221283 (unrelated)
>  8:            beagleboard-vanilla: BOOT: PASS: http://paste.ubuntu.org.cn/2221284
>  9:               beaglebone-black: BOOT: PASS: http://paste.ubuntu.org.cn/2221285
> 10:                     beaglebone: BOOT: FAIL: http://paste.ubuntu.org.cn/2221286 (unrelated)
> 11:                     dra72x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221287
> 12:                     dra7xx-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221288
> 13:                      omap5-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221289
> 14:                  pandaboard-es: BOOT: PASS: http://paste.ubuntu.org.cn/2221290
> 15:             pandaboard-vanilla: BOOT: PASS: http://paste.ubuntu.org.cn/2221291
> 16:                        sdp4430: BOOT: PASS: http://paste.ubuntu.org.cn/2221292
> TOTAL = 16 boards, Booted Boards = 13, No Boot boards = 3
> 
> next-20150123-new-patch[2] (omap2plus_defconfig)
>  1:                     am335x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221653
>  2:                      am335x-sk: BOOT: PASS: http://paste.ubuntu.org.cn/2221654
>  3:                      am437x-sk: BOOT: PASS: http://paste.ubuntu.org.cn/2221656
>  4:                    am43xx-epos: BOOT: PASS: http://paste.ubuntu.org.cn/2221659
>  5:                   am43xx-gpevm: BOOT: PASS: http://paste.ubuntu.org.cn/2221660
>  6:    BeagleBoard-X15(am57xx-evm): BOOT: PASS: http://paste.ubuntu.org.cn/2221661
>  7:                 BeagleBoard-XM: BOOT: PASS: http://paste.ubuntu.org.cn/2221670
>  8:            beagleboard-vanilla: BOOT: PASS: http://paste.ubuntu.org.cn/2221676
>  9:               beaglebone-black: BOOT: PASS: http://paste.ubuntu.org.cn/2221683
> 10:                     beaglebone: BOOT: PASS: http://paste.ubuntu.org.cn/2221690
> 11:                     dra72x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221692
> 12:                     dra7xx-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221695
> 13:                      omap5-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221700
> 14:                  pandaboard-es: BOOT: PASS: http://paste.ubuntu.org.cn/2221704
> 15:             pandaboard-vanilla: BOOT: PASS: http://paste.ubuntu.org.cn/2221707
> 16:                        sdp4430: BOOT: PASS: http://paste.ubuntu.org.cn/2221713
> TOTAL = 16 boards, Booted Boards = 16, No Boot boards = 0

Okay, thanks. Here's the proper patch.

From 8f9845ab8d972164b700ff3e3ce53484cceb942b Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Date: Mon, 26 Jan 2015 12:07:54 +0200
Subject: [PATCH 1/2] mm: fix false-positive warning on exit due mm_nr_pmds(mm)

The problem is that we check nr_ptes/nr_pmds in exit_mmap() which happens
*before* pgd_free(). And if an arch does pte/pmd allocation in pgd_alloc()
and frees them in pgd_free() we see offset in counters by the time of the
checks.

We tried to work around this by offsetting the expected counter value
according to FIRST_USER_ADDRESS for both nr_ptes and nr_pmds in
exit_mmap(). But it doesn't work in some cases:

1. ARM with LPAE enabled also has a non-zero USER_PGTABLES_CEILING, but
   the upper addresses are occupied by huge pmd entries, so the trick of
   offsetting the expected counter value gets really ugly: we would have
   to apply it to nr_pmds, but not to nr_ptes.

2. Metag has a non-zero FIRST_USER_ADDRESS, but doesn't allocate pte/pmd
   page tables in pgd_alloc(); it just sets up a pgd entry which is
   allocated at boot and shared across all processes.

The proposal is to move the check to check_mm() which happens *after*
pgd_free() and do proper accounting during pgd_alloc() and pgd_free()
which would bring counters to zero if nothing leaked.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Tyler Baker <tyler.baker@linaro.org>
Tested-by: Nishanth Menon <nm@ti.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
---
 arch/arm/mm/pgd.c       | 4 ++++
 arch/unicore32/mm/pgd.c | 3 +++
 kernel/fork.c           | 8 ++++++++
 mm/mmap.c               | 5 -----
 4 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/arch/arm/mm/pgd.c b/arch/arm/mm/pgd.c
index 249379535be2..a3681f11dd9f 100644
--- a/arch/arm/mm/pgd.c
+++ b/arch/arm/mm/pgd.c
@@ -97,6 +97,7 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 
 no_pte:
 	pmd_free(mm, new_pmd);
+	mm_dec_nr_pmds(mm);
 no_pmd:
 	pud_free(mm, new_pud);
 no_pud:
@@ -130,9 +131,11 @@ void pgd_free(struct mm_struct *mm, pgd_t *pgd_base)
 	pte = pmd_pgtable(*pmd);
 	pmd_clear(pmd);
 	pte_free(mm, pte);
+	atomic_long_dec(&mm->nr_ptes);
 no_pmd:
 	pud_clear(pud);
 	pmd_free(mm, pmd);
+	mm_dec_nr_pmds(mm);
 no_pud:
 	pgd_clear(pgd);
 	pud_free(mm, pud);
@@ -152,6 +155,7 @@ no_pgd:
 		pmd = pmd_offset(pud, 0);
 		pud_clear(pud);
 		pmd_free(mm, pmd);
+		mm_dec_nr_pmds(mm);
 		pgd_clear(pgd);
 		pud_free(mm, pud);
 	}
diff --git a/arch/unicore32/mm/pgd.c b/arch/unicore32/mm/pgd.c
index 08b8d4295e70..1bc00d0305d4 100644
--- a/arch/unicore32/mm/pgd.c
+++ b/arch/unicore32/mm/pgd.c
@@ -69,6 +69,7 @@ pgd_t *get_pgd_slow(struct mm_struct *mm)
 
 no_pte:
 	pmd_free(mm, new_pmd);
+	mm_dec_nr_pmds(mm);
 no_pmd:
 	free_pages((unsigned long)new_pgd, 0);
 no_pgd:
@@ -96,7 +97,9 @@ void free_pgd_slow(struct mm_struct *mm, pgd_t *pgd)
 	pte = pmd_pgtable(*pmd);
 	pmd_clear(pmd);
 	pte_free(mm, pte);
+	atomic_long_dec(&mm->nr_ptes);
 	pmd_free(mm, pmd);
+	mm_dec_nr_pmds(mm);
 free:
 	free_pages((unsigned long) pgd, 0);
 }
diff --git a/kernel/fork.c b/kernel/fork.c
index c99098c52641..76d6f292274c 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -606,6 +606,14 @@ static void check_mm(struct mm_struct *mm)
 			printk(KERN_ALERT "BUG: Bad rss-counter state "
 					  "mm:%p idx:%d val:%ld\n", mm, i, x);
 	}
+
+	if (atomic_long_read(&mm->nr_ptes))
+		pr_alert("BUG: non-zero nr_ptes on freeing mm: %ld\n",
+				atomic_long_read(&mm->nr_ptes));
+	if (mm_nr_pmds(mm))
+		pr_alert("BUG: non-zero nr_pmds on freeing mm: %ld\n",
+				mm_nr_pmds(mm));
+
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
 	VM_BUG_ON_MM(mm->pmd_huge_pte, mm);
 #endif
diff --git a/mm/mmap.c b/mm/mmap.c
index 6a7d36d133fb..c5f44682c0d1 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2851,11 +2851,6 @@ void exit_mmap(struct mm_struct *mm)
 		vma = remove_vma(vma);
 	}
 	vm_unacct_memory(nr_accounted);
-
-	WARN_ON(atomic_long_read(&mm->nr_ptes) >
-			round_up(FIRST_USER_ADDRESS, PMD_SIZE) >> PMD_SHIFT);
-	WARN_ON(mm_nr_pmds(mm) >
-			round_up(FIRST_USER_ADDRESS, PUD_SIZE) >> PUD_SHIFT);
 }
 
 /* Insert vm structure into process list sorted by address
-- 
 Kirill A. Shutemov

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [next-20150119]regression (mm)?
@ 2015-01-26 12:00                         ` Kirill A. Shutemov
  0 siblings, 0 replies; 48+ messages in thread
From: Kirill A. Shutemov @ 2015-01-26 12:00 UTC (permalink / raw)
  To: Nishanth Menon, Andrew Morton
  Cc: Tyler Baker, Kirill A. Shutemov, Russell King - ARM Linux,
	Felipe Balbi, linux-mm, linux-next, linux-omap, linux-arm-kernel,
	James Hogan, Guan Xuetao

On Fri, Jan 23, 2015 at 10:37:46PM -0600, Nishanth Menon wrote:
> On 03:13-20150124, Kirill A. Shutemov wrote:
> > > >> On 09:39-20150123, Tyler Baker wrote:
> [...]
> > > >> > I just reviewed the boot logs for next-20150123 and there still seems
> > > >> > to be a related issue. I've been boot testing
> > > >> > multi_v7_defconfig+CONFIG_ARM_LPAE=y kernel configurations which still
> > > >> > seem broken.
> [...]
> > Okay, proof of concept patch is below. It's going to break every other
> > architecture with FIRST_USER_ADDRESS != 0, but I think it's cleaner way to
> > go.
> 
> Testing on my end:
> 
> just ran through this set (+ logs similar to Tyler's from my side):
> 
> next-20150123 (multi_v7_defconfig == !LPAE)
>  1:    BeagleBoard-X15(am57xx-evm): BOOT: PASS: http://paste.ubuntu.org.cn/2219449
>  2:                     dra72x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2219450
>  3:                     dra7xx-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2219451
>  4:                      omap5-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2219452
> TOTAL = 4 boards, Booted Boards = 4, No Boot boards = 0
> 
> next-20150123-LPAE-Logging enabled[1] (multi_v7_defconfig +LPAE)
>  1:    BeagleBoard-X15(am57xx-evm): BOOT: FAIL: http://paste.ubuntu.org.cn/2220938
>  2:                     dra72x-evm: BOOT: FAIL: http://paste.ubuntu.org.cn/2220943
>  3:                     dra7xx-evm: BOOT: FAIL: http://paste.ubuntu.org.cn/2220947
>  4:                      omap5-evm: BOOT: FAIL: http://paste.ubuntu.org.cn/2220955
> TOTAL = 4 boards, Booted Boards = 0, No Boot boards = 4
> 
> next-20150123-LPAE-new-patch [2] (multi_v7_defconfig + LPAE)
>  1:    BeagleBoard-X15(am57xx-evm): BOOT: PASS: http://paste.ubuntu.org.cn/2221047
>  2:                     dra72x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221065
>  3:                     dra7xx-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221069
>  4:                      omap5-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221070
> TOTAL = 4 boards, Booted Boards = 4, No Boot boards = 0
> 
> next-20150123-new-patch[2] (multi_v7_defconfig == !LPAE)
>  1:                     am335x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221277
>  2:                      am335x-sk: BOOT: PASS: http://paste.ubuntu.org.cn/2221278
>  3:                      am437x-sk: BOOT: FAIL: http://paste.ubuntu.org.cn/2221279 (unrelated)
>  4:                    am43xx-epos: BOOT: PASS: http://paste.ubuntu.org.cn/2221280
>  5:                   am43xx-gpevm: BOOT: PASS: http://paste.ubuntu.org.cn/2221281
>  6:    BeagleBoard-X15(am57xx-evm): BOOT: PASS: http://paste.ubuntu.org.cn/2221282
>  7:                 BeagleBoard-XM: BOOT: FAIL: http://paste.ubuntu.org.cn/2221283 (unrelated)
>  8:            beagleboard-vanilla: BOOT: PASS: http://paste.ubuntu.org.cn/2221284
>  9:               beaglebone-black: BOOT: PASS: http://paste.ubuntu.org.cn/2221285
> 10:                     beaglebone: BOOT: FAIL: http://paste.ubuntu.org.cn/2221286 (unrelated)
> 11:                     dra72x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221287
> 12:                     dra7xx-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221288
> 13:                      omap5-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221289
> 14:                  pandaboard-es: BOOT: PASS: http://paste.ubuntu.org.cn/2221290
> 15:             pandaboard-vanilla: BOOT: PASS: http://paste.ubuntu.org.cn/2221291
> 16:                        sdp4430: BOOT: PASS: http://paste.ubuntu.org.cn/2221292
> TOTAL = 16 boards, Booted Boards = 13, No Boot boards = 3
> 
> next-20150123-new-patch[2] (omap2plus_defconfig)
>  1:                     am335x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221653
>  2:                      am335x-sk: BOOT: PASS: http://paste.ubuntu.org.cn/2221654
>  3:                      am437x-sk: BOOT: PASS: http://paste.ubuntu.org.cn/2221656
>  4:                    am43xx-epos: BOOT: PASS: http://paste.ubuntu.org.cn/2221659
>  5:                   am43xx-gpevm: BOOT: PASS: http://paste.ubuntu.org.cn/2221660
>  6:    BeagleBoard-X15(am57xx-evm): BOOT: PASS: http://paste.ubuntu.org.cn/2221661
>  7:                 BeagleBoard-XM: BOOT: PASS: http://paste.ubuntu.org.cn/2221670
>  8:            beagleboard-vanilla: BOOT: PASS: http://paste.ubuntu.org.cn/2221676
>  9:               beaglebone-black: BOOT: PASS: http://paste.ubuntu.org.cn/2221683
> 10:                     beaglebone: BOOT: PASS: http://paste.ubuntu.org.cn/2221690
> 11:                     dra72x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221692
> 12:                     dra7xx-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221695
> 13:                      omap5-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221700
> 14:                  pandaboard-es: BOOT: PASS: http://paste.ubuntu.org.cn/2221704
> 15:             pandaboard-vanilla: BOOT: PASS: http://paste.ubuntu.org.cn/2221707
> 16:                        sdp4430: BOOT: PASS: http://paste.ubuntu.org.cn/2221713
> TOTAL = 16 boards, Booted Boards = 16, No Boot boards = 0

Okay, thanks. Here's a proper patch.

* [next-20150119]regression (mm)?
@ 2015-01-26 12:00                         ` Kirill A. Shutemov
  0 siblings, 0 replies; 48+ messages in thread
From: Kirill A. Shutemov @ 2015-01-26 12:00 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jan 23, 2015 at 10:37:46PM -0600, Nishanth Menon wrote:
> On 03:13-20150124, Kirill A. Shutemov wrote:
> > > >> On 09:39-20150123, Tyler Baker wrote:
> [...]
> > > >> > I just reviewed the boot logs for next-20150123 and there still seems
> > > >> > to be a related issue. I've been boot testing
> > > >> > multi_v7_defconfig+CONFIG_ARM_LPAE=y kernel configurations which still
> > > >> > seem broken.
> [...]
> > Okay, proof of concept patch is below. It's going to break every other
> > architecture with FIRST_USER_ADDRESS != 0, but I think it's cleaner way to
> > go.
> 
> Testing on my end:
> 
> just ran through this set (+ logs similar to Tyler's from my side):
> 
> next-20150123 (multi_v7_defconfig == !LPAE)
>  1:    BeagleBoard-X15(am57xx-evm): BOOT: PASS: http://paste.ubuntu.org.cn/2219449
>  2:                     dra72x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2219450
>  3:                     dra7xx-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2219451
>  4:                      omap5-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2219452
> TOTAL = 4 boards, Booted Boards = 4, No Boot boards = 0
> 
> next-20150123-LPAE-Logging enabled[1] (multi_v7_defconfig +LPAE)
>  1:    BeagleBoard-X15(am57xx-evm): BOOT: FAIL: http://paste.ubuntu.org.cn/2220938
>  2:                     dra72x-evm: BOOT: FAIL: http://paste.ubuntu.org.cn/2220943
>  3:                     dra7xx-evm: BOOT: FAIL: http://paste.ubuntu.org.cn/2220947
>  4:                      omap5-evm: BOOT: FAIL: http://paste.ubuntu.org.cn/2220955
> TOTAL = 4 boards, Booted Boards = 0, No Boot boards = 4
> 
> next-20150123-LPAE-new-patch [2] (multi_v7_defconfig + LPAE)
>  1:    BeagleBoard-X15(am57xx-evm): BOOT: PASS: http://paste.ubuntu.org.cn/2221047
>  2:                     dra72x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221065
>  3:                     dra7xx-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221069
>  4:                      omap5-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221070
> TOTAL = 4 boards, Booted Boards = 4, No Boot boards = 0
> 
> next-20150123-new-patch[2] (multi_v7_defconfig == !LPAE)
>  1:                     am335x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221277
>  2:                      am335x-sk: BOOT: PASS: http://paste.ubuntu.org.cn/2221278
>  3:                      am437x-sk: BOOT: FAIL: http://paste.ubuntu.org.cn/2221279 (unrelated)
>  4:                    am43xx-epos: BOOT: PASS: http://paste.ubuntu.org.cn/2221280
>  5:                   am43xx-gpevm: BOOT: PASS: http://paste.ubuntu.org.cn/2221281
>  6:    BeagleBoard-X15(am57xx-evm): BOOT: PASS: http://paste.ubuntu.org.cn/2221282
>  7:                 BeagleBoard-XM: BOOT: FAIL: http://paste.ubuntu.org.cn/2221283 (unrelated)
>  8:            beagleboard-vanilla: BOOT: PASS: http://paste.ubuntu.org.cn/2221284
>  9:               beaglebone-black: BOOT: PASS: http://paste.ubuntu.org.cn/2221285
> 10:                     beaglebone: BOOT: FAIL: http://paste.ubuntu.org.cn/2221286 (unrelated)
> 11:                     dra72x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221287
> 12:                     dra7xx-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221288
> 13:                      omap5-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221289
> 14:                  pandaboard-es: BOOT: PASS: http://paste.ubuntu.org.cn/2221290
> 15:             pandaboard-vanilla: BOOT: PASS: http://paste.ubuntu.org.cn/2221291
> 16:                        sdp4430: BOOT: PASS: http://paste.ubuntu.org.cn/2221292
> TOTAL = 16 boards, Booted Boards = 13, No Boot boards = 3
> 
> next-20150123-new-patch[2] (omap2plus_defconfig)
>  1:                     am335x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221653
>  2:                      am335x-sk: BOOT: PASS: http://paste.ubuntu.org.cn/2221654
>  3:                      am437x-sk: BOOT: PASS: http://paste.ubuntu.org.cn/2221656
>  4:                    am43xx-epos: BOOT: PASS: http://paste.ubuntu.org.cn/2221659
>  5:                   am43xx-gpevm: BOOT: PASS: http://paste.ubuntu.org.cn/2221660
>  6:    BeagleBoard-X15(am57xx-evm): BOOT: PASS: http://paste.ubuntu.org.cn/2221661
>  7:                 BeagleBoard-XM: BOOT: PASS: http://paste.ubuntu.org.cn/2221670
>  8:            beagleboard-vanilla: BOOT: PASS: http://paste.ubuntu.org.cn/2221676
>  9:               beaglebone-black: BOOT: PASS: http://paste.ubuntu.org.cn/2221683
> 10:                     beaglebone: BOOT: PASS: http://paste.ubuntu.org.cn/2221690
> 11:                     dra72x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221692
> 12:                     dra7xx-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221695
> 13:                      omap5-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221700
> 14:                  pandaboard-es: BOOT: PASS: http://paste.ubuntu.org.cn/2221704
> 15:             pandaboard-vanilla: BOOT: PASS: http://paste.ubuntu.org.cn/2221707
> 16:                        sdp4430: BOOT: PASS: http://paste.ubuntu.org.cn/2221713
> TOTAL = 16 boards, Booted Boards = 16, No Boot boards = 0

Okay, thanks. Here's a proper patch.

From 8f9845ab8d972164b700ff3e3ce53484cceb942b Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Date: Mon, 26 Jan 2015 12:07:54 +0200
Subject: [PATCH 1/2] mm: fix false-positive warning on exit due to mm_nr_pmds(mm)

The problem is that we check nr_ptes/nr_pmds in exit_mmap(), which happens
*before* pgd_free(). If an arch allocates pte/pmd tables in pgd_alloc()
and frees them in pgd_free(), the counters are still offset at the time
of the checks.

We tried to work around this by offsetting the expected counter values
according to FIRST_USER_ADDRESS for both nr_ptes and nr_pmds in
exit_mmap(). But that doesn't work in some cases:

1. ARM with LPAE enabled also has a non-zero USER_PGTABLES_CEILING, but
   the upper addresses are occupied by huge pmd entries, so the trick of
   offsetting the expected counter value gets really ugly: we would have
   to apply it to nr_pmds, but not to nr_ptes.

2. Metag has a non-zero FIRST_USER_ADDRESS, but doesn't allocate
   pte/pmd page tables in pgd_alloc(); it just sets up a pgd entry
   which is allocated at boot and shared across all processes.

The proposal is to move the check to check_mm(), which happens *after*
pgd_free(), and to do proper accounting during pgd_alloc() and pgd_free(),
which brings the counters to zero if nothing leaked.
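
To illustrate the ordering problem, here is a minimal user-space sketch.
It is not kernel code: struct toy_mm and every toy_* function are
hypothetical stand-ins, and the check is simplified to "both counters
are zero" rather than the FIRST_USER_ADDRESS-based threshold that
exit_mmap() used. It only shows why a check placed before pgd_free()
can fire even though nothing leaked.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct mm_struct: only the two counters matter here. */
struct toy_mm {
	long nr_ptes;
	long nr_pmds;
	bool arch_prealloc;	/* arch allocates pte/pmd tables in pgd_alloc() */
};

/* Models an arch whose pgd_alloc() sets up one pmd and one pte table. */
static void toy_pgd_alloc(struct toy_mm *mm)
{
	if (mm->arch_prealloc) {
		mm->nr_pmds++;
		mm->nr_ptes++;
	}
}

/* Models pgd_free() with the accounting this patch adds. */
static void toy_pgd_free(struct toy_mm *mm)
{
	if (mm->arch_prealloc) {
		mm->nr_ptes--;
		mm->nr_pmds--;
	}
}

/* Simplified leak check: both counters must be back at zero. */
static bool toy_counters_clean(const struct toy_mm *mm)
{
	return mm->nr_ptes == 0 && mm->nr_pmds == 0;
}

/* Old ordering: the check runs before pgd_free(), as in exit_mmap(). */
static bool toy_check_before_pgd_free(struct toy_mm *mm)
{
	bool clean;

	toy_pgd_alloc(mm);
	clean = toy_counters_clean(mm);	/* still sees the preallocated tables */
	toy_pgd_free(mm);
	return clean;
}

/* New ordering: the check runs after pgd_free(), as in check_mm(). */
static bool toy_check_after_pgd_free(struct toy_mm *mm)
{
	toy_pgd_alloc(mm);
	toy_pgd_free(mm);
	return toy_counters_clean(mm);
}
```

With arch_prealloc set, toy_check_before_pgd_free() reports a false
positive while toy_check_after_pgd_free() reports clean counters; with
arch_prealloc clear, both orderings agree.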

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Tyler Baker <tyler.baker@linaro.org>
Tested-by: Nishanth Menon <nm@ti.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
---
 arch/arm/mm/pgd.c       | 4 ++++
 arch/unicore32/mm/pgd.c | 3 +++
 kernel/fork.c           | 8 ++++++++
 mm/mmap.c               | 5 -----
 4 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/arch/arm/mm/pgd.c b/arch/arm/mm/pgd.c
index 249379535be2..a3681f11dd9f 100644
--- a/arch/arm/mm/pgd.c
+++ b/arch/arm/mm/pgd.c
@@ -97,6 +97,7 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 
 no_pte:
 	pmd_free(mm, new_pmd);
+	mm_dec_nr_pmds(mm);
 no_pmd:
 	pud_free(mm, new_pud);
 no_pud:
@@ -130,9 +131,11 @@ void pgd_free(struct mm_struct *mm, pgd_t *pgd_base)
 	pte = pmd_pgtable(*pmd);
 	pmd_clear(pmd);
 	pte_free(mm, pte);
+	atomic_long_dec(&mm->nr_ptes);
 no_pmd:
 	pud_clear(pud);
 	pmd_free(mm, pmd);
+	mm_dec_nr_pmds(mm);
 no_pud:
 	pgd_clear(pgd);
 	pud_free(mm, pud);
@@ -152,6 +155,7 @@ no_pgd:
 		pmd = pmd_offset(pud, 0);
 		pud_clear(pud);
 		pmd_free(mm, pmd);
+		mm_dec_nr_pmds(mm);
 		pgd_clear(pgd);
 		pud_free(mm, pud);
 	}
diff --git a/arch/unicore32/mm/pgd.c b/arch/unicore32/mm/pgd.c
index 08b8d4295e70..1bc00d0305d4 100644
--- a/arch/unicore32/mm/pgd.c
+++ b/arch/unicore32/mm/pgd.c
@@ -69,6 +69,7 @@ pgd_t *get_pgd_slow(struct mm_struct *mm)
 
 no_pte:
 	pmd_free(mm, new_pmd);
+	mm_dec_nr_pmds(mm);
 no_pmd:
 	free_pages((unsigned long)new_pgd, 0);
 no_pgd:
@@ -96,7 +97,9 @@ void free_pgd_slow(struct mm_struct *mm, pgd_t *pgd)
 	pte = pmd_pgtable(*pmd);
 	pmd_clear(pmd);
 	pte_free(mm, pte);
+	atomic_long_dec(&mm->nr_ptes);
 	pmd_free(mm, pmd);
+	mm_dec_nr_pmds(mm);
 free:
 	free_pages((unsigned long) pgd, 0);
 }
diff --git a/kernel/fork.c b/kernel/fork.c
index c99098c52641..76d6f292274c 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -606,6 +606,14 @@ static void check_mm(struct mm_struct *mm)
 			printk(KERN_ALERT "BUG: Bad rss-counter state "
 					  "mm:%p idx:%d val:%ld\n", mm, i, x);
 	}
+
+	if (atomic_long_read(&mm->nr_ptes))
+		pr_alert("BUG: non-zero nr_ptes on freeing mm: %ld\n",
+				atomic_long_read(&mm->nr_ptes));
+	if (mm_nr_pmds(mm))
+		pr_alert("BUG: non-zero nr_pmds on freeing mm: %ld\n",
+				mm_nr_pmds(mm));
+
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
 	VM_BUG_ON_MM(mm->pmd_huge_pte, mm);
 #endif
diff --git a/mm/mmap.c b/mm/mmap.c
index 6a7d36d133fb..c5f44682c0d1 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2851,11 +2851,6 @@ void exit_mmap(struct mm_struct *mm)
 		vma = remove_vma(vma);
 	}
 	vm_unacct_memory(nr_accounted);
-
-	WARN_ON(atomic_long_read(&mm->nr_ptes) >
-			round_up(FIRST_USER_ADDRESS, PMD_SIZE) >> PMD_SHIFT);
-	WARN_ON(mm_nr_pmds(mm) >
-			round_up(FIRST_USER_ADDRESS, PUD_SIZE) >> PUD_SHIFT);
 }
 
 /* Insert vm structure into process list sorted by address
-- 
 Kirill A. Shutemov

* Re: [next-20150119]regression (mm)?
  2015-01-26 12:00                         ` Kirill A. Shutemov
@ 2015-01-26 22:38                           ` Tyler Baker
  -1 siblings, 0 replies; 48+ messages in thread
From: Tyler Baker @ 2015-01-26 22:38 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Nishanth Menon, Andrew Morton, Kirill A. Shutemov,
	Russell King - ARM Linux, Felipe Balbi, linux-mm, linux-next,
	linux-omap, linux-arm-kernel, James Hogan, Guan Xuetao

On 26 January 2015 at 04:00, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> On Fri, Jan 23, 2015 at 10:37:46PM -0600, Nishanth Menon wrote:
>> On 03:13-20150124, Kirill A. Shutemov wrote:
>> > > >> On 09:39-20150123, Tyler Baker wrote:
>> [...]
>> > > >> > I just reviewed the boot logs for next-20150123 and there still seems
>> > > >> > to be a related issue. I've been boot testing
>> > > >> > multi_v7_defconfig+CONFIG_ARM_LPAE=y kernel configurations which still
>> > > >> > seem broken.
>> [...]
>> > Okay, proof of concept patch is below. It's going to break every other
>> > architecture with FIRST_USER_ADDRESS != 0, but I think it's cleaner way to
>> > go.
>>
>> Testing on my end:
>>
>> just ran through this set (+ logs similar to Tyler's from my side):
>>
>> next-20150123 (multi_v7_defconfig == !LPAE)
>>  1:    BeagleBoard-X15(am57xx-evm): BOOT: PASS: http://paste.ubuntu.org.cn/2219449
>>  2:                     dra72x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2219450
>>  3:                     dra7xx-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2219451
>>  4:                      omap5-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2219452
>> TOTAL = 4 boards, Booted Boards = 4, No Boot boards = 0
>>
>> next-20150123-LPAE-Logging enabled[1] (multi_v7_defconfig +LPAE)
>>  1:    BeagleBoard-X15(am57xx-evm): BOOT: FAIL: http://paste.ubuntu.org.cn/2220938
>>  2:                     dra72x-evm: BOOT: FAIL: http://paste.ubuntu.org.cn/2220943
>>  3:                     dra7xx-evm: BOOT: FAIL: http://paste.ubuntu.org.cn/2220947
>>  4:                      omap5-evm: BOOT: FAIL: http://paste.ubuntu.org.cn/2220955
>> TOTAL = 4 boards, Booted Boards = 0, No Boot boards = 4
>>
>> next-20150123-LPAE-new-patch [2] (multi_v7_defconfig + LPAE)
>>  1:    BeagleBoard-X15(am57xx-evm): BOOT: PASS: http://paste.ubuntu.org.cn/2221047
>>  2:                     dra72x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221065
>>  3:                     dra7xx-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221069
>>  4:                      omap5-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221070
>> TOTAL = 4 boards, Booted Boards = 4, No Boot boards = 0
>>
>> next-20150123-new-patch[2] (multi_v7_defconfig == !LPAE)
>>  1:                     am335x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221277
>>  2:                      am335x-sk: BOOT: PASS: http://paste.ubuntu.org.cn/2221278
>>  3:                      am437x-sk: BOOT: FAIL: http://paste.ubuntu.org.cn/2221279 (unrelated)
>>  4:                    am43xx-epos: BOOT: PASS: http://paste.ubuntu.org.cn/2221280
>>  5:                   am43xx-gpevm: BOOT: PASS: http://paste.ubuntu.org.cn/2221281
>>  6:    BeagleBoard-X15(am57xx-evm): BOOT: PASS: http://paste.ubuntu.org.cn/2221282
>>  7:                 BeagleBoard-XM: BOOT: FAIL: http://paste.ubuntu.org.cn/2221283 (unrelated)
>>  8:            beagleboard-vanilla: BOOT: PASS: http://paste.ubuntu.org.cn/2221284
>>  9:               beaglebone-black: BOOT: PASS: http://paste.ubuntu.org.cn/2221285
>> 10:                     beaglebone: BOOT: FAIL: http://paste.ubuntu.org.cn/2221286 (unrelated)
>> 11:                     dra72x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221287
>> 12:                     dra7xx-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221288
>> 13:                      omap5-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221289
>> 14:                  pandaboard-es: BOOT: PASS: http://paste.ubuntu.org.cn/2221290
>> 15:             pandaboard-vanilla: BOOT: PASS: http://paste.ubuntu.org.cn/2221291
>> 16:                        sdp4430: BOOT: PASS: http://paste.ubuntu.org.cn/2221292
>> TOTAL = 16 boards, Booted Boards = 13, No Boot boards = 3
>>
>> next-20150123-new-patch[2] (omap2plus_defconfig)
>>  1:                     am335x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221653
>>  2:                      am335x-sk: BOOT: PASS: http://paste.ubuntu.org.cn/2221654
>>  3:                      am437x-sk: BOOT: PASS: http://paste.ubuntu.org.cn/2221656
>>  4:                    am43xx-epos: BOOT: PASS: http://paste.ubuntu.org.cn/2221659
>>  5:                   am43xx-gpevm: BOOT: PASS: http://paste.ubuntu.org.cn/2221660
>>  6:    BeagleBoard-X15(am57xx-evm): BOOT: PASS: http://paste.ubuntu.org.cn/2221661
>>  7:                 BeagleBoard-XM: BOOT: PASS: http://paste.ubuntu.org.cn/2221670
>>  8:            beagleboard-vanilla: BOOT: PASS: http://paste.ubuntu.org.cn/2221676
>>  9:               beaglebone-black: BOOT: PASS: http://paste.ubuntu.org.cn/2221683
>> 10:                     beaglebone: BOOT: PASS: http://paste.ubuntu.org.cn/2221690
>> 11:                     dra72x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221692
>> 12:                     dra7xx-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221695
>> 13:                      omap5-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221700
>> 14:                  pandaboard-es: BOOT: PASS: http://paste.ubuntu.org.cn/2221704
>> 15:             pandaboard-vanilla: BOOT: PASS: http://paste.ubuntu.org.cn/2221707
>> 16:                        sdp4430: BOOT: PASS: http://paste.ubuntu.org.cn/2221713
>> TOTAL = 16 boards, Booted Boards = 16, No Boot boards = 0
>
> Okay thanks. Here's proper patch.
>
> From 8f9845ab8d972164b700ff3e3ce53484cceb942b Mon Sep 17 00:00:00 2001
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Date: Mon, 26 Jan 2015 12:07:54 +0200
> Subject: [PATCH 1/2] mm: fix false-positive warning on exit due to mm_nr_pmds(mm)
>
> The problem is that we check nr_ptes/nr_pmds in exit_mmap(), which happens
> *before* pgd_free(). If an arch allocates pte/pmd tables in pgd_alloc()
> and frees them in pgd_free(), the counters are still offset at the time
> of the checks.
>
> We tried to work around this by offsetting the expected counter values
> according to FIRST_USER_ADDRESS for both nr_ptes and nr_pmds in
> exit_mmap(). But that doesn't work in some cases:
>
> 1. ARM with LPAE enabled also has a non-zero USER_PGTABLES_CEILING, but
>    the upper addresses are occupied by huge pmd entries, so the trick of
>    offsetting the expected counter value gets really ugly: we would have
>    to apply it to nr_pmds, but not to nr_ptes.
>
> 2. Metag has a non-zero FIRST_USER_ADDRESS, but doesn't allocate
>    pte/pmd page tables in pgd_alloc(); it just sets up a pgd entry
>    which is allocated at boot and shared across all processes.
>
> The proposal is to move the check to check_mm(), which happens *after*
> pgd_free(), and to do proper accounting during pgd_alloc() and pgd_free(),
> which brings the counters to zero if nothing leaked.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

I've tested this patch on top of linux-next [1] on a wide array of
arm, arm64 and x86 hardware. I can confirm the issue with
CONFIG_ARM_LPAE=y has been resolved with no additional regressions
detected. The results can be found here [2].

Feel free to add:

Tested-by: Tyler Baker <tyler.baker@linaro.org>

> Reported-by: Tyler Baker <tyler.baker@linaro.org>
> Tested-by: Nishanth Menon <nm@ti.com>
> Cc: Russell King <linux@arm.linux.org.uk>
> Cc: James Hogan <james.hogan@imgtec.com>
> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
> ---
>  arch/arm/mm/pgd.c       | 4 ++++
>  arch/unicore32/mm/pgd.c | 3 +++
>  kernel/fork.c           | 8 ++++++++
>  mm/mmap.c               | 5 -----
>  4 files changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm/mm/pgd.c b/arch/arm/mm/pgd.c
> index 249379535be2..a3681f11dd9f 100644
> --- a/arch/arm/mm/pgd.c
> +++ b/arch/arm/mm/pgd.c
> @@ -97,6 +97,7 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
>
>  no_pte:
>         pmd_free(mm, new_pmd);
> +       mm_dec_nr_pmds(mm);
>  no_pmd:
>         pud_free(mm, new_pud);
>  no_pud:
> @@ -130,9 +131,11 @@ void pgd_free(struct mm_struct *mm, pgd_t *pgd_base)
>         pte = pmd_pgtable(*pmd);
>         pmd_clear(pmd);
>         pte_free(mm, pte);
> +       atomic_long_dec(&mm->nr_ptes);
>  no_pmd:
>         pud_clear(pud);
>         pmd_free(mm, pmd);
> +       mm_dec_nr_pmds(mm);
>  no_pud:
>         pgd_clear(pgd);
>         pud_free(mm, pud);
> @@ -152,6 +155,7 @@ no_pgd:
>                 pmd = pmd_offset(pud, 0);
>                 pud_clear(pud);
>                 pmd_free(mm, pmd);
> +               mm_dec_nr_pmds(mm);
>                 pgd_clear(pgd);
>                 pud_free(mm, pud);
>         }
> diff --git a/arch/unicore32/mm/pgd.c b/arch/unicore32/mm/pgd.c
> index 08b8d4295e70..1bc00d0305d4 100644
> --- a/arch/unicore32/mm/pgd.c
> +++ b/arch/unicore32/mm/pgd.c
> @@ -69,6 +69,7 @@ pgd_t *get_pgd_slow(struct mm_struct *mm)
>
>  no_pte:
>         pmd_free(mm, new_pmd);
> +       mm_dec_nr_pmds(mm);
>  no_pmd:
>         free_pages((unsigned long)new_pgd, 0);
>  no_pgd:
> @@ -96,7 +97,9 @@ void free_pgd_slow(struct mm_struct *mm, pgd_t *pgd)
>         pte = pmd_pgtable(*pmd);
>         pmd_clear(pmd);
>         pte_free(mm, pte);
> +       atomic_long_dec(&mm->nr_ptes);
>         pmd_free(mm, pmd);
> +       mm_dec_nr_pmds(mm);
>  free:
>         free_pages((unsigned long) pgd, 0);
>  }
> diff --git a/kernel/fork.c b/kernel/fork.c
> index c99098c52641..76d6f292274c 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -606,6 +606,14 @@ static void check_mm(struct mm_struct *mm)
>                         printk(KERN_ALERT "BUG: Bad rss-counter state "
>                                           "mm:%p idx:%d val:%ld\n", mm, i, x);
>         }
> +
> +       if (atomic_long_read(&mm->nr_ptes))
> +               pr_alert("BUG: non-zero nr_ptes on freeing mm: %ld\n",
> +                               atomic_long_read(&mm->nr_ptes));
> +       if (mm_nr_pmds(mm))
> +               pr_alert("BUG: non-zero nr_pmds on freeing mm: %ld\n",
> +                               mm_nr_pmds(mm));
> +
>  #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
>         VM_BUG_ON_MM(mm->pmd_huge_pte, mm);
>  #endif
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 6a7d36d133fb..c5f44682c0d1 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2851,11 +2851,6 @@ void exit_mmap(struct mm_struct *mm)
>                 vma = remove_vma(vma);
>         }
>         vm_unacct_memory(nr_accounted);
> -
> -       WARN_ON(atomic_long_read(&mm->nr_ptes) >
> -                       round_up(FIRST_USER_ADDRESS, PMD_SIZE) >> PMD_SHIFT);
> -       WARN_ON(mm_nr_pmds(mm) >
> -                       round_up(FIRST_USER_ADDRESS, PUD_SIZE) >> PUD_SHIFT);
>  }
>
>  /* Insert vm structure into process list sorted by address
> --
>  Kirill A. Shutemov

[1] https://git.linaro.org/people/tyler.baker/linux-next.git/shortlog/refs/heads/next-testing
[2] http://kernelci.org/boot/all/job/tbaker/kernel/v3.19-rc5-5174-g384ba8a33c70/

Thanks,

Tyler

* [next-20150119]regression (mm)?
@ 2015-01-26 22:38                           ` Tyler Baker
  0 siblings, 0 replies; 48+ messages in thread
From: Tyler Baker @ 2015-01-26 22:38 UTC (permalink / raw)
  To: linux-arm-kernel

On 26 January 2015 at 04:00, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> On Fri, Jan 23, 2015 at 10:37:46PM -0600, Nishanth Menon wrote:
>> On 03:13-20150124, Kirill A. Shutemov wrote:
>> > > >> On 09:39-20150123, Tyler Baker wrote:
>> [...]
>> > > >> > I just reviewed the boot logs for next-20150123 and there still seems
>> > > >> > to be a related issue. I've been boot testing
>> > > >> > multi_v7_defconfig+CONFIG_ARM_LPAE=y kernel configurations which still
>> > > >> > seem broken.
>> [...]
>> > Okay, proof of concept patch is below. It's going to break every other
>> > architecture with FIRST_USER_ADDRESS != 0, but I think it's cleaner way to
>> > go.
>>
>> Testing on my end:
>>
>> just ran through this set (+ logs similar to Tyler's from my side):
>>
>> next-20150123 (multi_v7_defconfig == !LPAE)
>>  1:    BeagleBoard-X15(am57xx-evm): BOOT: PASS: http://paste.ubuntu.org.cn/2219449
>>  2:                     dra72x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2219450
>>  3:                     dra7xx-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2219451
>>  4:                      omap5-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2219452
>> TOTAL = 4 boards, Booted Boards = 4, No Boot boards = 0
>>
>> next-20150123-LPAE-Logging enabled[1] (multi_v7_defconfig +LPAE)
>>  1:    BeagleBoard-X15(am57xx-evm): BOOT: FAIL: http://paste.ubuntu.org.cn/2220938
>>  2:                     dra72x-evm: BOOT: FAIL: http://paste.ubuntu.org.cn/2220943
>>  3:                     dra7xx-evm: BOOT: FAIL: http://paste.ubuntu.org.cn/2220947
>>  4:                      omap5-evm: BOOT: FAIL: http://paste.ubuntu.org.cn/2220955
>> TOTAL = 4 boards, Booted Boards = 0, No Boot boards = 4
>>
>> next-20150123-LPAE-new-patch [2] (multi_v7_defconfig + LPAE)
>>  1:    BeagleBoard-X15(am57xx-evm): BOOT: PASS: http://paste.ubuntu.org.cn/2221047
>>  2:                     dra72x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221065
>>  3:                     dra7xx-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221069
>>  4:                      omap5-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221070
>> TOTAL = 4 boards, Booted Boards = 4, No Boot boards = 0
>>
>> next-20150123-new-patch[2] (multi_v7_defconfig == !LPAE)
>>  1:                     am335x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221277
>>  2:                      am335x-sk: BOOT: PASS: http://paste.ubuntu.org.cn/2221278
>>  3:                      am437x-sk: BOOT: FAIL: http://paste.ubuntu.org.cn/2221279 (unrelated)
>>  4:                    am43xx-epos: BOOT: PASS: http://paste.ubuntu.org.cn/2221280
>>  5:                   am43xx-gpevm: BOOT: PASS: http://paste.ubuntu.org.cn/2221281
>>  6:    BeagleBoard-X15(am57xx-evm): BOOT: PASS: http://paste.ubuntu.org.cn/2221282
>>  7:                 BeagleBoard-XM: BOOT: FAIL: http://paste.ubuntu.org.cn/2221283 (unrelated)
>>  8:            beagleboard-vanilla: BOOT: PASS: http://paste.ubuntu.org.cn/2221284
>>  9:               beaglebone-black: BOOT: PASS: http://paste.ubuntu.org.cn/2221285
>> 10:                     beaglebone: BOOT: FAIL: http://paste.ubuntu.org.cn/2221286 (unrelated)
>> 11:                     dra72x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221287
>> 12:                     dra7xx-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221288
>> 13:                      omap5-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221289
>> 14:                  pandaboard-es: BOOT: PASS: http://paste.ubuntu.org.cn/2221290
>> 15:             pandaboard-vanilla: BOOT: PASS: http://paste.ubuntu.org.cn/2221291
>> 16:                        sdp4430: BOOT: PASS: http://paste.ubuntu.org.cn/2221292
>> TOTAL = 16 boards, Booted Boards = 13, No Boot boards = 3
>>
>> next-20150123-new-patch[2] (omap2plus_defconfig)
>>  1:                     am335x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221653
>>  2:                      am335x-sk: BOOT: PASS: http://paste.ubuntu.org.cn/2221654
>>  3:                      am437x-sk: BOOT: PASS: http://paste.ubuntu.org.cn/2221656
>>  4:                    am43xx-epos: BOOT: PASS: http://paste.ubuntu.org.cn/2221659
>>  5:                   am43xx-gpevm: BOOT: PASS: http://paste.ubuntu.org.cn/2221660
>>  6:    BeagleBoard-X15(am57xx-evm): BOOT: PASS: http://paste.ubuntu.org.cn/2221661
>>  7:                 BeagleBoard-XM: BOOT: PASS: http://paste.ubuntu.org.cn/2221670
>>  8:            beagleboard-vanilla: BOOT: PASS: http://paste.ubuntu.org.cn/2221676
>>  9:               beaglebone-black: BOOT: PASS: http://paste.ubuntu.org.cn/2221683
>> 10:                     beaglebone: BOOT: PASS: http://paste.ubuntu.org.cn/2221690
>> 11:                     dra72x-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221692
>> 12:                     dra7xx-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221695
>> 13:                      omap5-evm: BOOT: PASS: http://paste.ubuntu.org.cn/2221700
>> 14:                  pandaboard-es: BOOT: PASS: http://paste.ubuntu.org.cn/2221704
>> 15:             pandaboard-vanilla: BOOT: PASS: http://paste.ubuntu.org.cn/2221707
>> 16:                        sdp4430: BOOT: PASS: http://paste.ubuntu.org.cn/2221713
>> TOTAL = 16 boards, Booted Boards = 16, No Boot boards = 0
>
> Okay thanks. Here's proper patch.
>
> From 8f9845ab8d972164b700ff3e3ce53484cceb942b Mon Sep 17 00:00:00 2001
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Date: Mon, 26 Jan 2015 12:07:54 +0200
> Subject: [PATCH 1/2] mm: fix false-positive warning on exit due to mm_nr_pmds(mm)
>
> The problem is that we check nr_ptes/nr_pmds in exit_mmap(), which happens
> *before* pgd_free(). If an arch allocates pte/pmd tables in pgd_alloc()
> and frees them in pgd_free(), the counters are still offset at the time
> of the checks.
>
> We tried to work around this by offsetting the expected counter values
> according to FIRST_USER_ADDRESS for both nr_ptes and nr_pmds in
> exit_mmap(). But that doesn't work in some cases:
>
> 1. ARM with LPAE enabled also has a non-zero USER_PGTABLES_CEILING, but
>    the upper addresses are occupied by huge pmd entries, so the trick of
>    offsetting the expected counter value gets really ugly: we would have
>    to apply it to nr_pmds, but not to nr_ptes.
>
> 2. Metag has a non-zero FIRST_USER_ADDRESS, but doesn't allocate
>    pte/pmd page tables in pgd_alloc(); it just sets up a pgd entry
>    which is allocated at boot and shared across all processes.
>
> The proposal is to move the check to check_mm(), which happens *after*
> pgd_free(), and to do proper accounting during pgd_alloc() and pgd_free(),
> which brings the counters to zero if nothing leaked.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

I've tested this patch on top of linux-next [1] on a wide array of
arm, arm64 and x86 hardware. I can confirm the issue with
CONFIG_ARM_LPAE=y has been resolved with no additional regressions
detected. The results can be found here [2].

Feel free to add:

Tested-by: Tyler Baker <tyler.baker@linaro.org>

> Reported-by: Tyler Baker <tyler.baker@linaro.org>
> Tested-by: Nishanth Menon <nm@ti.com>
> Cc: Russell King <linux@arm.linux.org.uk>
> Cc: James Hogan <james.hogan@imgtec.com>
> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
> ---
>  arch/arm/mm/pgd.c       | 4 ++++
>  arch/unicore32/mm/pgd.c | 3 +++
>  kernel/fork.c           | 8 ++++++++
>  mm/mmap.c               | 5 -----
>  4 files changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm/mm/pgd.c b/arch/arm/mm/pgd.c
> index 249379535be2..a3681f11dd9f 100644
> --- a/arch/arm/mm/pgd.c
> +++ b/arch/arm/mm/pgd.c
> @@ -97,6 +97,7 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
>
>  no_pte:
>         pmd_free(mm, new_pmd);
> +       mm_dec_nr_pmds(mm);
>  no_pmd:
>         pud_free(mm, new_pud);
>  no_pud:
> @@ -130,9 +131,11 @@ void pgd_free(struct mm_struct *mm, pgd_t *pgd_base)
>         pte = pmd_pgtable(*pmd);
>         pmd_clear(pmd);
>         pte_free(mm, pte);
> +       atomic_long_dec(&mm->nr_ptes);
>  no_pmd:
>         pud_clear(pud);
>         pmd_free(mm, pmd);
> +       mm_dec_nr_pmds(mm);
>  no_pud:
>         pgd_clear(pgd);
>         pud_free(mm, pud);
> @@ -152,6 +155,7 @@ no_pgd:
>                 pmd = pmd_offset(pud, 0);
>                 pud_clear(pud);
>                 pmd_free(mm, pmd);
> +               mm_dec_nr_pmds(mm);
>                 pgd_clear(pgd);
>                 pud_free(mm, pud);
>         }
> diff --git a/arch/unicore32/mm/pgd.c b/arch/unicore32/mm/pgd.c
> index 08b8d4295e70..1bc00d0305d4 100644
> --- a/arch/unicore32/mm/pgd.c
> +++ b/arch/unicore32/mm/pgd.c
> @@ -69,6 +69,7 @@ pgd_t *get_pgd_slow(struct mm_struct *mm)
>
>  no_pte:
>         pmd_free(mm, new_pmd);
> +       mm_dec_nr_pmds(mm);
>  no_pmd:
>         free_pages((unsigned long)new_pgd, 0);
>  no_pgd:
> @@ -96,7 +97,9 @@ void free_pgd_slow(struct mm_struct *mm, pgd_t *pgd)
>         pte = pmd_pgtable(*pmd);
>         pmd_clear(pmd);
>         pte_free(mm, pte);
> +       atomic_long_dec(&mm->nr_ptes);
>         pmd_free(mm, pmd);
> +       mm_dec_nr_pmds(mm);
>  free:
>         free_pages((unsigned long) pgd, 0);
>  }
> diff --git a/kernel/fork.c b/kernel/fork.c
> index c99098c52641..76d6f292274c 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -606,6 +606,14 @@ static void check_mm(struct mm_struct *mm)
>                         printk(KERN_ALERT "BUG: Bad rss-counter state "
>                                           "mm:%p idx:%d val:%ld\n", mm, i, x);
>         }
> +
> +       if (atomic_long_read(&mm->nr_ptes))
> +               pr_alert("BUG: non-zero nr_ptes on freeing mm: %ld\n",
> +                               atomic_long_read(&mm->nr_ptes));
> +       if (mm_nr_pmds(mm))
> +               pr_alert("BUG: non-zero nr_pmds on freeing mm: %ld\n",
> +                               mm_nr_pmds(mm));
> +
>  #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
>         VM_BUG_ON_MM(mm->pmd_huge_pte, mm);
>  #endif
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 6a7d36d133fb..c5f44682c0d1 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2851,11 +2851,6 @@ void exit_mmap(struct mm_struct *mm)
>                 vma = remove_vma(vma);
>         }
>         vm_unacct_memory(nr_accounted);
> -
> -       WARN_ON(atomic_long_read(&mm->nr_ptes) >
> -                       round_up(FIRST_USER_ADDRESS, PMD_SIZE) >> PMD_SHIFT);
> -       WARN_ON(mm_nr_pmds(mm) >
> -                       round_up(FIRST_USER_ADDRESS, PUD_SIZE) >> PUD_SHIFT);
>  }
>
>  /* Insert vm structure into process list sorted by address
> --
>  Kirill A. Shutemov

[1] https://git.linaro.org/people/tyler.baker/linux-next.git/shortlog/refs/heads/next-testing
[2] http://kernelci.org/boot/all/job/tbaker/kernel/v3.19-rc5-5174-g384ba8a33c70/

Thanks,

Tyler


end of thread, other threads:[~2015-01-26 22:38 UTC | newest]

Thread overview: 48+ messages
2015-01-19 16:42 [next-20150119]regression (mm)? Nishanth Menon
2015-01-19 16:42 ` Nishanth Menon
     [not found] ` <CANMBJr6DudDBSs+rM-e2QnC5ztxAYLuSvZ0khvx7OZdQpcu_3A@mail.gmail.com>
2015-01-19 17:04   ` Nishanth Menon
2015-01-19 17:04     ` Nishanth Menon
2015-01-19 17:19     ` Tyler Baker
2015-01-19 17:19       ` Tyler Baker
2015-01-19 17:43 ` Felipe Balbi
2015-01-19 17:43   ` Felipe Balbi
2015-01-20  0:16   ` Kirill A. Shutemov
2015-01-20  0:16     ` Kirill A. Shutemov
2015-01-20  0:16     ` Kirill A. Shutemov
2015-01-20 11:45     ` Russell King - ARM Linux
2015-01-20 11:45       ` Russell King - ARM Linux
2015-01-20 14:05       ` Kirill A. Shutemov
2015-01-20 14:05         ` Kirill A. Shutemov
2015-01-20 14:05         ` Kirill A. Shutemov
2015-01-20 14:50         ` Fabio Estevam
2015-01-20 14:50           ` Fabio Estevam
2015-01-20 14:50           ` Fabio Estevam
2015-01-20 15:10           ` Felipe Balbi
2015-01-20 15:10             ` Felipe Balbi
2015-01-20 23:26         ` Nishanth Menon
2015-01-20 23:26           ` Nishanth Menon
2015-01-21  9:23         ` Peter Ujfalusi
2015-01-21  9:23           ` Peter Ujfalusi
2015-01-21 10:29         ` Krzysztof Kozlowski
2015-01-21 10:29           ` Krzysztof Kozlowski
2015-01-23 17:27         ` Nishanth Menon
2015-01-23 17:27           ` Nishanth Menon
2015-01-23 17:39           ` Tyler Baker
2015-01-23 17:39             ` Tyler Baker
2015-01-23 18:37             ` Nishanth Menon
2015-01-23 18:37               ` Nishanth Menon
2015-01-23 20:22               ` Kirill A. Shutemov
2015-01-23 20:22                 ` Kirill A. Shutemov
2015-01-23 22:05                 ` Nishanth Menon
2015-01-23 22:05                   ` Nishanth Menon
2015-01-23 22:42                 ` Tyler Baker
2015-01-23 22:42                   ` Tyler Baker
2015-01-24  1:13                   ` Kirill A. Shutemov
2015-01-24  1:13                     ` Kirill A. Shutemov
2015-01-24  4:37                     ` Nishanth Menon
2015-01-24  4:37                       ` Nishanth Menon
2015-01-26 12:00                       ` Kirill A. Shutemov
2015-01-26 12:00                         ` Kirill A. Shutemov
2015-01-26 12:00                         ` Kirill A. Shutemov
2015-01-26 22:38                         ` Tyler Baker
2015-01-26 22:38                           ` Tyler Baker
