From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754067AbeAQWEy (ORCPT ); Wed, 17 Jan 2018 17:04:54 -0500
Received: from mga06.intel.com ([134.134.136.31]:59229 "EHLO mga06.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753730AbeAQWEx (ORCPT ); Wed, 17 Jan 2018 17:04:53 -0500
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.46,374,1511856000"; d="scan'208";a="166957981"
Subject: Re: [mm 4.15-rc8] Random oopses under memory pressure.
To: Linus Torvalds, Tetsuo Handa
References: <201801160115.w0G1FOIG057203@www262.sakura.ne.jp>
	<201801170233.JDG21842.OFOJMQSHtOFFLV@I-love.SAKURA.ne.jp>
	<201801172008.CHH39543.FFtMHOOVSQJLFO@I-love.SAKURA.ne.jp>
Cc: "Kirill A. Shutemov", Andrew Morton, Johannes Weiner, Joonsoo Kim,
	Mel Gorman, Tony Luck, Vlastimil Babka, Michal Hocko, Ingo Molnar,
	Linux Kernel Mailing List, linux-mm, the arch/x86 maintainers
From: Dave Hansen
Message-ID: <4fe52147-b6a1-83a7-ee4b-104846ddb919@linux.intel.com>
Date: Wed, 17 Jan 2018 14:04:51 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.0
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 01/17/2018 01:51 PM, Linus Torvalds wrote:
> In fact, it seems to be such a fundamental bug that I suspect I'm
> entirely wrong, and full of shit. So it's an interesting and not
> _obviously_ incorrect theory, but I suspect I must be missing
> something.

I'll just note that a few of the pfns I decoded were smack in the
middle of the zone, not near either the high or low end of ZONE_NORMAL
where we would expect this cross-zone stuff to happen.

But I guess we could get similar wonkiness where 'struct page' is
screwed up in so many different ways if during buddy joining you do:

	list_del(&buddy->lru);

and 'buddy' is off in another zone for which you do not hold the
spinlock.

If we are somehow missing some locking, or double-allocating a page,
something like this would help:

 static inline void rmv_page_order(struct page *page)
 {
+	WARN_ON_ONCE(!PageBuddy(page));
 	__ClearPageBuddy(page);
 	set_page_private(page, 0);
 }
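
For anyone who wants to see what that check would catch without booting
a kernel, here is a rough userspace sketch. Everything in it is a
stand-in and not kernel code: the two-field 'struct page', the
WARN_ON_ONCE_MODEL() macro, rmv_page_order_model() and
demo_double_removal() are all hypothetical names made up for
illustration. It only models the idea that removing a page from the
free lists when it is not marked as a buddy page (for example, because
it was already removed once) trips the warning, and that the warning is
reported a single time per call site.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the kernel's struct page. */
struct page {
	bool buddy;            /* models the PageBuddy flag */
	unsigned long private; /* models page_private(), i.e. the buddy order */
};

/* Number of warnings actually reported by the model macro below. */
static int warn_count;

/*
 * Rough model of WARN_ON_ONCE(): report only the first hit at a given
 * call site. Unlike the kernel macro, this one is a plain statement
 * and does not evaluate to the condition's value.
 */
#define WARN_ON_ONCE_MODEL(cond) do {					\
	static bool warned;						\
	if ((cond) && !warned) {					\
		warned = true;						\
		warn_count++;						\
		fprintf(stderr, "WARNING: %s at %s:%d\n",		\
			#cond, __FILE__, __LINE__);			\
	}								\
} while (0)

/* Model of rmv_page_order() with the proposed check added. */
static void rmv_page_order_model(struct page *page)
{
	WARN_ON_ONCE_MODEL(!page->buddy);
	page->buddy = false;
	page->private = 0;
}

/*
 * Take the same page off the "free list" three times and return how
 * many warnings were reported. The once-only state is static, so only
 * the first call of this function in a process can report anything.
 */
static int demo_double_removal(void)
{
	struct page p = { .buddy = true, .private = 3 };

	warn_count = 0;
	rmv_page_order_model(&p); /* legitimate: page was a free buddy */
	rmv_page_order_model(&p); /* bug: flag already clear, warns */
	rmv_page_order_model(&p); /* same bug again: stays silent */
	return warn_count;
}
```

The first removal is quiet, the second one warns, and the third is
silent because of the once-only behavior. That is the tradeoff of using
the _ONCE variant: the log does not get flooded under memory pressure,
but only the first offender is visible.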