Date: Thu, 24 Aug 2017 01:36:38 +0300
From: "Kirill A. Shutemov"
To: Linus Torvalds
Cc: "Kirill A. Shutemov", Vitaly Kuznetsov, the arch/x86 maintainers,
	Linux Kernel Mailing List, xen-devel, Thomas Gleixner, Ingo Molnar,
	"H. Peter Anvin", Peter Zijlstra, Jork Loeser, KY Srinivasan,
	Stephen Hemminger, Steven Rostedt, Juergen Gross, Boris Ostrovsky,
	Andrew Cooper, Andy Lutomirski
Subject: Re: [PATCH] x86: enable RCU based table free when PARAVIRT
Message-ID: <20170823223637.bjke4w3wpolrn7md@black.fi.intel.com>
References: <20170823134521.5068-1-vkuznets@redhat.com>
	<20170823195955.wnyg2dcv4c23kdoj@node.shutemov.name>

On Wed, Aug 23, 2017 at 08:27:18PM +0000, Linus Torvalds wrote:
> On Wed, Aug 23, 2017 at 12:59 PM, Kirill A. Shutemov wrote:
> >
> > In this case we need performance numbers for !PARAVIRT kernel.
>
> Yes.
>
> > Numbers for tight loop of "mmap(MAP_POPULATE); munmap()" might be
> > interesting too for worst case scenario.
>
> Actually, I don't think you want to populate all the pages. You just
> want to populate *one* page, in order to build up the page directory
> structure, not allocate all the final points.
>
> And we only free the actual page tables when there is nothing around,
> so it should be at least a 2MB-aligned region etc.
>
> So you should do a *big* allocation, and then touch a single page in
> the middle, and then munmap it - that should give you maximal page
> table activity. Otherwise the page tables will generally just stay
> around.
>
> Realistically, it's mainly exit() that frees page tables. Yes, you may
> have a few page tables free'd by a normal munmap(), but it's usually
> very limited. Which is why I suggested that script-heavy thing with
> lots of small executables. That tends to be the main realistic load
> that really causes a ton of page directory activity.

Below is a test case that allocates a lot of page tables and measures
fork/exit time. (I'm not entirely sure it's the best way to stress the
codepath.)

Unpatched:	average 4.8322s, stddev 0.114s
Patched:	average 4.8362s, stddev 0.111s

Both without PARAVIRT. The patch is modified to enable
HAVE_RCU_TABLE_FREE for !PARAVIRT too.

The test case requires "echo 1 > /proc/sys/vm/overcommit_memory".

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>

#define PUD_SIZE (1UL << 30)
#define PMD_SIZE (1UL << 21)

#define NR_PUD 4096

#define NSEC_PER_SEC 1000000000L

int main(void)
{
	char *addr = NULL;
	unsigned long i, j;
	struct timespec start, finish;
	long long nsec;

	/* arg2 = 1 disables THP; the remaining arguments must be zero */
	prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0);

	/* Map NR_PUD regions of 1GB each, back to back */
	for (i = 0; i < NR_PUD ; i++) {
		addr = mmap(addr + PUD_SIZE, PUD_SIZE, PROT_WRITE|PROT_READ,
				MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
		if (addr == MAP_FAILED) {
			perror("mmap");
			break;
		}

		/* Touch one byte per 2MB so that every PTE table is allocated */
		for (j = 0; j < PUD_SIZE; j += PMD_SIZE)
			assert(addr[j] == 0);
	}

	/* Time fork() + exit() + wait() ten times, printing nanoseconds */
	for (i = 0; i < 10; i++) {
		pid_t pid;

		clock_gettime(CLOCK_MONOTONIC, &start);

		pid = fork();
		if (pid == -1)
			perror("fork");
		if (!pid)
			exit(0);

		wait(NULL);

		clock_gettime(CLOCK_MONOTONIC, &finish);

		nsec = (finish.tv_sec - start.tv_sec) * NSEC_PER_SEC +
			(finish.tv_nsec - start.tv_nsec);
		printf("%lld\n", nsec);
	}

	return 0;
}

-- 
 Kirill A. Shutemov
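
For comparison, the munmap() variant Linus describes in the quoted text (one big mapping, touch a single page in the middle, unmap it) could look roughly like the sketch below. It was not part of the measurements above. The helper name map_touch_unmap(), its parameters, the use of MAP_NORESERVE, and the minimum-size check are assumptions made for illustration only.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

#define PMD_SIZE (1UL << 21)

/*
 * Hypothetical helper (not from the thread): map len bytes, touch a
 * single page in the middle so the page table levels above it get
 * allocated, then unmap the region so they can be freed again.
 * Repeats `rounds` times. Returns 0 on success, -1 if mmap() or
 * munmap() fails.
 */
int map_touch_unmap(size_t len, unsigned int rounds)
{
	unsigned int i;

	/* Assumed minimum: the mapping should cover a whole 2MB-aligned range */
	assert(len >= 2 * PMD_SIZE);

	for (i = 0; i < rounds; i++) {
		/* MAP_NORESERVE (an assumption): do not reserve swap for the
		 * large, almost entirely untouched mapping */
		char *region = mmap(NULL, len, PROT_READ | PROT_WRITE,
				    MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE,
				    -1, 0);
		if (region == MAP_FAILED) {
			perror("mmap");
			return -1;
		}

		/* Fault in exactly one page; volatile keeps the store */
		*(volatile char *)(region + len / 2) = 1;

		if (munmap(region, len) == -1) {
			perror("munmap");
			return -1;
		}
	}

	return 0;
}
```

Wrapping a call such as map_touch_unmap(1UL << 30, 100000) in the same clock_gettime() pair used in the fork/exit loop would give a number for the munmap() side. The size and round count here are arbitrary.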