From: Linus Torvalds
Date: Thu, 18 Jan 2018 09:26:25 -0800
Subject: Re: [mm 4.15-rc8] Random oopses under memory pressure.
To: "Kirill A. Shutemov"
Cc: Andrea Arcangeli, Dave Hansen, Tetsuo Handa, "Kirill A. Shutemov",
    Andrew Morton, Johannes Weiner, Joonsoo Kim, Mel Gorman, Tony Luck,
    Vlastimil Babka, Michal Hocko, "hillf.zj", Hugh Dickins, Oleg Nesterov,
    Peter Zijlstra, Rik van Riel, Srikar Dronamraju, Vladimir Davydov,
    Ingo Molnar, Linux Kernel Mailing List, linux-mm,
    the arch/x86 maintainers
In-Reply-To: <20180118165629.kpdkezarsf4qymnw@node.shutemov.name>
References: <201801160115.w0G1FOIG057203@www262.sakura.ne.jp>
    <201801170233.JDG21842.OFOJMQSHtOFFLV@I-love.SAKURA.ne.jp>
    <201801172008.CHH39543.FFtMHOOVSQJLFO@I-love.SAKURA.ne.jp>
    <201801181712.BFD13039.LtHOSVMFJQFOFO@I-love.SAKURA.ne.jp>
    <20180118122550.2lhsjx7hg5drcjo4@node.shutemov.name>
    <20180118145830.GA6406@redhat.com>
    <20180118165629.kpdkezarsf4qymnw@node.shutemov.name>

On Thu, Jan 18, 2018 at 8:56 AM, Kirill A. Shutemov wrote:
>
> I can't say I fully grasp how 'diff' got this value and how it leads to both
> checks being false.

I think the problem is taking the page difference when the two pages
are in different sections.

When you do

     pte_page(*pvmw->pte) - pvmw->page

then the compiler takes the pointer difference, and then divides by
the size of "struct page" to get an index.

But - and this is important - it does so knowing that the division it
does will have no remainder: the two 'struct page *' pointers are
really in the same array, and they really are 'n*sizeof(struct page)'
apart for some 'n'.

That means that the compiler can optimize the division. In fact, for
this case, gcc will generate

        subl    %ebx, %eax
        sarl    $3, %eax
        imull   $-858993459, %eax, %eax

because 'struct page' is 40 bytes in size, and that magic sequence
happens to divide by 40 (first divide by 8, then that magical "imull"
will divide by 5 *IFF* the thing is evenly divisible by 5 (and not too
big - but the shift guarantees that)).

Basically, it's a magic trick, because real divides are very
expensive, but you can fake them more quickly if you can limit the
input domain.

But what does it mean if the two "struct page *" are not in the same
array, and the two arrays were not allocated an exact multiple of 40
bytes apart, but some random number of pages apart?

You get *COMPLETE*GARBAGE* when you do the above optimized divide.
Suddenly the divide had a remainder (because the bases of the two
arrays weren't a multiple of 40 bytes apart), and the "trick" doesn't
work.

So that's why you can't do pointer diffs between two arrays. Not
because you can't subtract the two pointers, but because the
*division* part of the C pointer diff rules leads to issues.

                 Linus
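
The divide-by-40 trick described above can be reproduced in plain C. This is only a sketch of the arithmetic, not kernel code: the names fast_div40, check_inverse and INV5 are made up for illustration, and it assumes a gcc-like compiler where right-shifting a negative int is an arithmetic shift and converting an out-of-range unsigned value to int32_t wraps around.

```c
#include <assert.h>
#include <stdint.h>

/* 0xCCCCCCCD is -858993459 viewed as an unsigned 32-bit value.
 * It is the multiplicative inverse of 5 modulo 2^32. */
#define INV5 0xCCCCCCCDu

/* Hypothetical helper mirroring the quoted sarl/imull pair:
 * shift right by 3 (divide by 8), then multiply by the inverse of 5.
 * The result is only the true quotient when byte_diff is an exact
 * multiple of 40. */
int32_t fast_div40(int32_t byte_diff)
{
	int32_t eighths = byte_diff >> 3;            /* sarl $3 (assumed arithmetic) */
	return (int32_t)((uint32_t)eighths * INV5);  /* imull $-858993459 */
}

/* Confirms that INV5 really is the inverse of 5 in 32-bit arithmetic:
 * 5 * INV5 wraps around to exactly 1. */
void check_inverse(void)
{
	assert((uint32_t)(5u * INV5) == 1u);
}
```

With a byte difference of 400 (ten 40-byte entries) fast_div40 returns 10, as a real divide would. With a byte difference of 4096, which is what you might see between two arrays that sit a page apart, the true quotient is 102 but fast_div40 returns a large negative number that has nothing to do with it.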