From: Linus Torvalds
Date: Mon, 15 Jan 2018 15:05:20 -0800
Subject: Re: [mm 4.15-rc7] Random oopses under memory pressure.
To: Tetsuo Handa
Cc: Linux Kernel Mailing List, linux-mm, "the arch/x86 maintainers", linux-fsdevel, Michal Hocko

On Sun, Jan 14, 2018 at 3:54 AM, Tetsuo Handa wrote:
> This memory corruption bug occurs even on CONFIG_SMP=n CONFIG_PREEMPT_NONE=y
> kernel. This bug highly depends on timing and thus too difficult to bisect.
> This bug seems to exist at least since Linux 4.8 (judging from the traces, though
> the cause might be different). None of debugging configuration gives me a clue.
> So far only CONFIG_HIGHMEM=y CONFIG_DEBUG_PAGEALLOC=y kernel (with RAM enough to
> use HighMem: zone) seems to hit this bug, but it might be just by chance caused
> by timings. Thus, there is no evidence that 64bit kernels are not affected by
> this bug. But I can't narrow down any more. Thus, I call for developers who can
> narrow down / identify where the memory corruption bug is.

Hmm. I guess I'm still hung up on the "it does not look like a valid 'struct page *'" thing.

Can you reproduce this with CONFIG_FLATMEM=y instead of CONFIG_SPARSEMEM? Because if you can, I think we can easily add a few more pfn and 'struct page' validation debug statements. With SPARSEMEM, it gets pretty complicated because the whole struct page setup is much more complex.

Linus
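To illustrate why FLATMEM makes this kind of validation easy, here is a small userspace model (not kernel code). When every page descriptor lives in one flat array, a pfn is valid only if it is below the array length, and a pointer is a plausible 'struct page *' only if it falls inside that array and lands exactly on an element boundary. The names mem_map and max_mapnr mirror the kernel's FLATMEM symbols, but the struct layout, NR_PAGES, and the *_flat helpers are hypothetical simplifications; the real FLATMEM mapping also subtracts ARCH_PFN_OFFSET, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct page; only its size matters here. */
struct page {
    unsigned long flags;
    void *private_data;
};

#define NR_PAGES 1024UL /* hypothetical number of page frames in this model */

/* FLATMEM-style layout: exactly one descriptor per page frame, in one array. */
static struct page mem_map[NR_PAGES];
static const unsigned long max_mapnr = NR_PAGES;

/* A pfn is valid only if it indexes inside the flat array. */
static bool pfn_valid_flat(unsigned long pfn)
{
    return pfn < max_mapnr;
}

/* pfn -> descriptor is plain array indexing; the assert is the "debug statement". */
static struct page *pfn_to_page_flat(unsigned long pfn)
{
    assert(pfn_valid_flat(pfn));
    return &mem_map[pfn];
}

/* descriptor -> pfn is plain pointer subtraction. */
static unsigned long page_to_pfn_flat(const struct page *page)
{
    return (unsigned long)(page - mem_map);
}

/*
 * A pointer looks like a valid struct page * only if it lies inside
 * mem_map and is an exact multiple of sizeof(struct page) from its start.
 */
static bool page_ptr_valid_flat(const void *ptr)
{
    uintptr_t p = (uintptr_t)ptr;
    uintptr_t base = (uintptr_t)mem_map;
    uintptr_t end = (uintptr_t)(mem_map + max_mapnr);

    if (p < base || p >= end)
        return false;
    return (p - base) % sizeof(struct page) == 0;
}
```

With this layout, a corrupted pointer that is off by a few bytes, or that points past the last descriptor, is rejected by a single range-and-alignment check. Roughly speaking, a SPARSEMEM version would first have to locate the memory section that owns the pfn before any such check is possible, which is the extra complexity the reply refers to.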