Date: Thu, 22 Feb 2018 04:22:54 -0800
From: Matthew Wilcox
To: Michal Hocko
Cc: Dave Hansen, Konstantin Khlebnikov, linux-kernel@vger.kernel.org,
	Christoph Hellwig, linux-mm@kvack.org, Andy Lutomirski,
	Andrew Morton, "Kirill A. Shutemov"
Subject: Re: Use higher-order pages in vmalloc
Message-ID: <20180222122254.GA22703@bombadil.infradead.org>
References: <151670492223.658225.4605377710524021456.stgit@buzz>
	<151670493255.658225.2881484505285363395.stgit@buzz>
	<20180221154214.GA4167@bombadil.infradead.org>
	<20180221170129.GB27687@bombadil.infradead.org>
	<20180222065943.GA30681@dhcp22.suse.cz>
In-Reply-To: <20180222065943.GA30681@dhcp22.suse.cz>

On Thu, Feb 22, 2018 at 07:59:43AM +0100, Michal Hocko wrote:
> On Wed 21-02-18 09:01:29, Matthew Wilcox wrote:
> > Right. It helps with fragmentation if we can keep higher-order
> > allocations together.
>
> Hmm, wouldn't it help if we made vmalloc pages migrateable instead? That
> would help the compaction and get us to a lower fragmentation longterm
> without playing tricks in the allocation path.

I was wondering about that possibility. If we want to migrate a page then
we have to shoot down the PTE across all CPUs, copy the data to the new
page, and insert the new PTE.
Copying 4kB doesn't take long; if you have 12GB/s (current example on
Wikipedia: dual-channel memory and one DDR2-800 module per channel gives
a theoretical bandwidth of 12.8GB/s) then we should be able to copy a page
in 666ns. So there's no problem holding a spinlock for it.

But we can't handle a fault in vmalloc space in generic code today. It's
handled in arch-specific code; see vmalloc_fault() in arch/x86/mm/fault.c.
If we're going to do this, it'll have to be something arches opt into
because I'm not taking on the job of fixing every architecture!

> Maybe we should consider kvmalloc for the kernel stack?

We'd lose the guard page, so it'd have to be something we let the
sysadmin decide to do.