* mm: BUG_ON with NUMA_BALANCING (kernel BUG at include/linux/swapops.h:131!)
From: Haren Myneni @ 2015-05-13  8:17 UTC (permalink / raw)
  To: linux-mm, linux-kernel, linuxppc-dev; +Cc: Haren Myneni, aneesh.kumar, srikar

Hi,

I am hitting a BUG_ON in migration_entry_to_page() with a 4.1.0-rc2
kernel on a powerpc system that has 512 CPUs (64 cores, 16 nodes) and
1.6 TB of memory. We can easily recreate this issue with a kernel compile
(make -j500), but I could not reproduce it with numa_balancing=disable.

------------[ cut here ]------------
kernel BUG at include/linux/swapops.h:134!
cpu 0x154: Vector: 700 (Program Check) at [c00009cf365c7610]
    pc: c00000000021e48c: remove_migration_pte+0x29c/0x450
    lr: c00000000021e47c: remove_migration_pte+0x28c/0x450
    sp: c00009cf365c7890
   msr: 8000000002029033
  current = 0xc00009cf36525fc0
  paca    = 0xc00000000e80fa00   softe: 0        irq_happened: 0x01
    pid   = 244969, comm = cc1
kernel BUG at include/linux/swapops.h:134!
enter ? for help
[c00009cf365c7960] c0000000001f3228 rmap_walk+0x348/0x460
[c00009cf365c7a10] c0000000008d8804 remove_migration_ptes+0x6c/0x84
[c00009cf365c7ab0] c000000000220d2c migrate_pages+0xaac/0xd20
[c00009cf365c7c00] c0000000002218cc migrate_misplaced_page+0x12c/0x210
[c00009cf365c7ca0] c0000000001e613c handle_mm_fault+0xa4c/0x17d0
[c00009cf365c7d70] c0000000008d1098 do_page_fault+0x3a8/0x800
[c00009cf365c7e30] c000000000008664 handle_page_fault+0x10/0x30

I think we are hitting a race in which the page that the migration entry
points to is not locked.

dump_page() for the old page:

page:f00000035f36a5a0 count:1 mapcount:0 mapping:c00009cf3d351311
index:0x3ffffffe
flags: 0x93ffff800080009(locked|uptodate|swapbacked)

dump_page() for the migration entry page:

page:f00000009f36a5a0 count:0 mapcount:0 mapping:          (null) index:0x0
flags: 0x13ffff800000000()

Any suggestions on how to debug this issue?

Thanks
Haren

* Re: mm: BUG_ON with NUMA_BALANCING (kernel BUG at include/linux/swapops.h:131!)
From: Mel Gorman @ 2015-05-14  9:33 UTC (permalink / raw)
  To: Haren Myneni
  Cc: linux-mm, linux-kernel, linuxppc-dev, Haren Myneni, aneesh.kumar, srikar

On Wed, May 13, 2015 at 01:17:54AM -0700, Haren Myneni wrote:
> Hi,
> 
> I am hitting a BUG_ON in migration_entry_to_page() with a 4.1.0-rc2
> kernel on a powerpc system that has 512 CPUs (64 cores, 16 nodes) and
> 1.6 TB of memory. We can easily recreate this issue with a kernel compile
> (make -j500), but I could not reproduce it with numa_balancing=disable.
> 

Is this patched in any way? I ask because line 134 on 4.1.0-rc2 does not
match up with a BUG_ON. It's close to a PageLocked check but I want to
be sure there are no other modifications.

Otherwise, when was the last time this worked? Was 4.0 ok? As it can be
easily reproduced, can the problem be bisected please?

-- 
Mel Gorman
SUSE Labs

* Re: mm: BUG_ON with NUMA_BALANCING (kernel BUG at include/linux/swapops.h:131!)
From: Haren Myneni @ 2015-05-14 15:48 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-mm, linux-kernel, linuxppc-dev, Haren Myneni, aneesh.kumar, srikar

On 5/14/15, Mel Gorman <mgorman@suse.de> wrote:
> On Wed, May 13, 2015 at 01:17:54AM -0700, Haren Myneni wrote:
>> Hi,
>>
>> I am hitting a BUG_ON in migration_entry_to_page() with a 4.1.0-rc2
>> kernel on a powerpc system that has 512 CPUs (64 cores, 16 nodes) and
>> 1.6 TB of memory. We can easily recreate this issue with a kernel compile
>> (make -j500), but I could not reproduce it with numa_balancing=disable.
>>
>
> Is this patched in any way? I ask because line 134 on 4.1.0-rc2 does not
> match up with a BUG_ON. It's close to a PageLocked check but I want to
> be sure there are no other modifications.

Mel, thanks for your help. I added some printks and dump_page() calls to
get the page struct and swp_entry information.

>
> Otherwise, when was the last time this worked? Was 4.0 ok? As it can be
> easily reproduced, can the problem be bisected please?

I have not tried versions other than the RHEL kernel (3.10.*). I
will try earlier versions.

In the failure case, I also noticed that the pte and address values
match in try_to_unmap_one() and remove_migration_pte(), but the entry
(swp_entry_t) value is different. So it looks like the page struct
address returned by migration_entry_to_page() is not valid.

try_to_unmap_one()
{
...
                } else if (IS_ENABLED(CONFIG_MIGRATION)) {
                        /*
                         * Store the pfn of the page in a special migration
                         * pte. do_swap_page() will wait until the migration
                         * pte is removed and then restart fault handling.
                         */
                        BUG_ON(!(flags & TTU_MIGRATION));
                        entry = make_migration_entry(page, pte_write(pteval));
                }
                swp_pte = swp_entry_to_pte(entry);
                if (pte_soft_dirty(pteval))
                        swp_pte = pte_swp_mksoft_dirty(swp_pte);
                set_pte_at(mm, address, pte, swp_pte);

                /* pte=0xb16b8d0f80000000 address=0x100008150000
                 * page=0xf000000513f3e1e0 entry=0x3e0000000ec5ae34 */
...
}

remove_migration_pte()
{
...
        /* address=0x100008150000 pte=0xb16b8d0f80000000
         * old=0xf000000513f3e1e0 */
        if (!is_migration_entry(entry) ||
            migration_entry_to_page(entry) != old)
                goto unlock;
...
}

migration_entry_to_page()
{
        /* pte=0xb16b8d0f80000000 entry=0x3e00000002c5ae34
         * page=0xf0000000f3f3e1e0 */
}


Thanks
Haren

>
> --
> Mel Gorman
> SUSE Labs
>

* Re: mm: BUG_ON with NUMA_BALANCING (kernel BUG at include/linux/swapops.h:131!)
From: Haren Myneni @ 2015-05-18  7:32 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-mm, linux-kernel, linuxppc-dev, Haren Myneni, aneesh.kumar, srikar

Mel,
    I am hitting this issue with the 4.0 kernel, and also with the 3.19
and 3.17 kernels. I will also try earlier versions. Please let me
know if you have any suggestions for debugging.

Thanks
Haren

On 5/14/15, Haren Myneni <hmyneni@gmail.com> wrote:
> On 5/14/15, Mel Gorman <mgorman@suse.de> wrote:
>> On Wed, May 13, 2015 at 01:17:54AM -0700, Haren Myneni wrote:
>>> Hi,
>>>
>>> I am hitting a BUG_ON in migration_entry_to_page() with a 4.1.0-rc2
>>> kernel on a powerpc system that has 512 CPUs (64 cores, 16 nodes) and
>>> 1.6 TB of memory. We can easily recreate this issue with a kernel compile
>>> (make -j500), but I could not reproduce it with numa_balancing=disable.
>>>
>>
>> Is this patched in any way? I ask because line 134 on 4.1.0-rc2 does not
>> match up with a BUG_ON. It's close to a PageLocked check but I want to
>> be sure there are no other modifications.
>
> Mel, thanks for your help. I added some printks and dump_page() calls to
> get the page struct and swp_entry information.
>
>>
>> Otherwise, when was the last time this worked? Was 4.0 ok? As it can be
>> easily reproduced, can the problem be bisected please?
>
> I have not tried versions other than the RHEL kernel (3.10.*). I
> will try earlier versions.
>
> In the failure case, I also noticed that the pte and address values
> match in try_to_unmap_one() and remove_migration_pte(), but the entry
> (swp_entry_t) value is different. So it looks like the page struct
> address returned by migration_entry_to_page() is not valid.
>
> try_to_unmap_one()
> {
>
> ...
>         } else if (IS_ENABLED(CONFIG_MIGRATION)) {
>                         /*
>                          * Store the pfn of the page in a special migration
>                          * pte. do_swap_page() will wait until the migration
>                          * pte is removed and then restart fault handling.
>                          */
>                         BUG_ON(!(flags & TTU_MIGRATION));
>                         entry = make_migration_entry(page, pte_write(pteval));
>                 }
>                 swp_pte = swp_entry_to_pte(entry);
>                 if (pte_soft_dirty(pteval))
>                         swp_pte = pte_swp_mksoft_dirty(swp_pte);
>                 set_pte_at(mm, address, pte, swp_pte);
>
>                 /*pte=0xb16b8d0f80000000 address=0x100008150000
>                 page=0xf000000513f3e1e0  entry=0x3e0000000ec5ae34 */
> ...
> }
>
>  remove_migration_pte()
> {
> ...
>         /* address=0x100008150000 pte=0xb16b8d0f80000000
>         *old=0xf000000513f3e1e0 */
>         if (!is_migration_entry(entry) ||
>         migration_entry_to_page(entry) != old)
>         goto unlock;
> ...
> }
>
>  migration_entry_to_page()  {
>         pte=0xb16b8d0f80000000  entry=0x3e00000002c5ae34
>         page=0xf0000000f3f3e1e0
> }
>
>
> Thanks
> Haren
>
>>
>> --
>> Mel Gorman
>> SUSE Labs
>>
>

* Re: mm: BUG_ON with NUMA_BALANCING (kernel BUG at include/linux/swapops.h:131!)
From: Mel Gorman @ 2015-05-18  8:11 UTC (permalink / raw)
  To: Haren Myneni
  Cc: linux-mm, linux-kernel, linuxppc-dev, Haren Myneni, aneesh.kumar, srikar

On Mon, May 18, 2015 at 12:32:29AM -0700, Haren Myneni wrote:
> Mel,
>     I am hitting this issue with the 4.0 kernel, and also with the 3.19
> and 3.17 kernels. I will also try earlier versions. Please let me
> know if you have any suggestions for debugging.
> 

Please keep going further back in time to see if there was a point where
this was ever working. It could be a ppc64-specific bug but right now,
I'm still drawing a blank.

-- 
Mel Gorman
SUSE Labs

* Re: mm: BUG_ON with NUMA_BALANCING (kernel BUG at include/linux/swapops.h:131!)
From: Haren Myneni @ 2015-05-18  8:18 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-mm, linux-kernel, linuxppc-dev, Haren Myneni, aneesh.kumar, srikar

On 5/18/15, Mel Gorman <mgorman@suse.de> wrote:
> On Mon, May 18, 2015 at 12:32:29AM -0700, Haren Myneni wrote:
>> Mel,
>>     I am hitting this issue with the 4.0 kernel, and also with the 3.19
>> and 3.17 kernels. I will also try earlier versions. Please let me
>> know if you have any suggestions for debugging.
>>
>
> Please keep going further back in time to see if there was a point where
> this was ever working. It could be a ppc64-specific bug but right now,
> I'm still drawing a blank.

Sure, will do. I am running a PPC64 LE kernel, but I have not seen any
LE-specific issue so far.

Thanks
Haren

>
> --
> Mel Gorman
> SUSE Labs
>

Thread overview:
2015-05-13  8:17 mm: BUG_ON with NUMA_BALANCING (kernel BUG at include/linux/swapops.h:131!) Haren Myneni
2015-05-14  9:33 ` Mel Gorman
2015-05-14 15:48   ` Haren Myneni
2015-05-18  7:32     ` Haren Myneni
2015-05-18  8:11       ` Mel Gorman
2015-05-18  8:18         ` Haren Myneni