From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752997AbbFKPuI (ORCPT ); Thu, 11 Jun 2015 11:50:08 -0400
Received: from mail-wi0-f173.google.com ([209.85.212.173]:38135 "EHLO
        mail-wi0-f173.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751067AbbFKPuE (ORCPT ); Thu, 11 Jun 2015 11:50:04 -0400
Date: Thu, 11 Jun 2015 17:49:58 +0200
From: Ingo Molnar
To: Andy Lutomirski
Cc: "linux-kernel@vger.kernel.org" , linux-mml@vger.kernel.org,
        Andrew Morton , Denys Vlasenko , Brian Gerst , Peter Zijlstra ,
        Borislav Petkov , "H. Peter Anvin" , Linus Torvalds ,
        Oleg Nesterov , Thomas Gleixner , Waiman Long
Subject: Re: [PATCH 08/12] x86/mm: Remove pgd_list use from vmalloc_sync_all()
Message-ID: <20150611154958.GA16799@gmail.com>
References: <1434031637-9091-1-git-send-email-mingo@kernel.org>
        <1434031637-9091-9-git-send-email-mingo@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org


* Andy Lutomirski wrote:

> On Thu, Jun 11, 2015 at 7:07 AM, Ingo Molnar wrote:
> > The vmalloc() code uses vmalloc_sync_all() to synchronize changes to
> > the global reference kernel PGD to task PGDs.
>
> Does it? AFAICS the only caller is register_die_notifier, and it's
> not really clear to me why that exists.

Doh, indeed, got confused in that changelog - we are filling it in
opportunistically via vmalloc_fault().

> At some point I'd love to remove lazy kernel PGD sync from the kernel entirely
> (or at least from x86) and just do it when we switch mms. Now that you're
> removing all code that deletes kernel PGD entries, I think all we'd need to do
> is to add a per-PGD or per-mm count of the number of kernel entries populated
> and to fix it up when we switch to an mm with fewer entries populated than
> init_mm.

That would add a (cheap but nonzero) runtime check to every context switch. It's a
relatively slow path, but in comparison vmalloc() is an even slower slowpath, so
why not do it there and just do synchronous updates and remove the vmalloc faults
altogether?

Also, on 64-bit it should not matter much: there the only change is the once in a
blue moon case where we allocate a new pgd for a 512 GB block of address space
that a single pgd entry covers.

I'd hate to add a check to every context switch, no matter how cheap, just for a
case that essentially never triggers...

So how about this solution instead:

 - we add a generation counter to sync_global_pgds() so that it can detect when
   the number of pgds populated in init_mm changes.

 - we change vmalloc() to call sync_global_pgds(): this will be very cheap in the
   overwhelming majority of cases.

 - we eliminate vmalloc_fault(), on 64-bit at least. Yay! :-)

Thanks,

        Ingo
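
To make the generation counter idea above concrete, here is a rough user-space
sketch, not kernel code. Everything prefixed toy_ is a hypothetical stand-in
(toy_sync_global_pgds() only mimics the role proposed for sync_global_pgds(),
toy_vmalloc_map() the role of vmalloc()), the table sizes are arbitrary, and
locking is left out entirely. It only assumes that kernel PGD entries are never
removed, as discussed in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_NR_ENTRIES 8   /* toy number of kernel PGD slots (assumption) */
#define TOY_NR_TASKS   3   /* toy number of task page tables (assumption) */

/* Stand-in for the init_mm reference PGD; 0 means "slot not populated". */
static unsigned long toy_ref_pgd[TOY_NR_ENTRIES];
/* Bumped each time a previously empty reference slot gets populated. */
static unsigned long toy_ref_generation;
/* Generation that all task tables were last brought up to. */
static unsigned long toy_synced_generation;
/* Stand-ins for the per-task PGDs. */
static unsigned long toy_task_pgd[TOY_NR_TASKS][TOY_NR_ENTRIES];
/* Counts how often the slow path (a full walk) actually ran. */
static unsigned long toy_full_syncs;

/* Copy the reference entries into every task table, but only if needed. */
static void toy_sync_global_pgds(void)
{
	size_t t;

	/* Fast path: nothing new was populated since the last sync. */
	if (toy_synced_generation == toy_ref_generation)
		return;

	/* Slow path: entries are only ever added, so a plain copy is enough. */
	for (t = 0; t < TOY_NR_TASKS; t++)
		memcpy(toy_task_pgd[t], toy_ref_pgd, sizeof(toy_ref_pgd));

	toy_synced_generation = toy_ref_generation;
	toy_full_syncs++;
}

/* Stand-in for a vmalloc() that may need a new top-level entry. */
static void toy_vmalloc_map(size_t idx, unsigned long val)
{
	assert(idx < TOY_NR_ENTRIES);
	assert(val != 0);

	/* Only populating an empty slot changes the generation. */
	if (toy_ref_pgd[idx] == 0) {
		toy_ref_pgd[idx] = val;
		toy_ref_generation++;
	}

	/* Synchronous update: cheap unless the generation just changed. */
	toy_sync_global_pgds();
}
```

In this sketch a toy_vmalloc_map() call that lands in an already populated slot
costs one comparison, and only a call that populates a new slot walks all task
tables. Whether the real check can be made that cheap depends on details (locking,
pgd enumeration) that this sketch does not model.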