From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753778AbbHMUEV (ORCPT ); Thu, 13 Aug 2015 16:04:21 -0400 Received: from mail-io0-f176.google.com ([209.85.223.176]:34358 "EHLO mail-io0-f176.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752320AbbHMUET (ORCPT ); Thu, 13 Aug 2015 16:04:19 -0400 MIME-Version: 1.0 In-Reply-To: References: <20150812210027.88dfcf90.akpm@linux-foundation.org> Date: Thu, 13 Aug 2015 13:04:19 -0700 X-Google-Sender-Auth: SLxMtzlGE5BZ01aUwaUn7UGQtIc Message-ID: Subject: Re: get_vmalloc_info() and /proc/meminfo insanely expensive From: Linus Torvalds To: David Rientjes Cc: Andrew Morton , Joonsoo Kim , Al Viro , Linux Kernel Mailing List Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Aug 13, 2015 at 12:50 PM, David Rientjes wrote: > > Rather than a time based approach, why not invalidate when it is known to > change, either on the next call to get_vmalloc_info() or as part of the > vmalloc operation itself? I started doing that, looking at all the users of vmap_area_lock, and decided that it's just too messy for me. I wanted something minimal and obvious. The vmap_area handling is not obvious, especially since the whole vmalloc statistics don't just depend on the vmap area list, but also take the individual flags into account (ie it counts VM_LAZY_FREE[ING] entries as having already been removed etc). So I started looking at actually turning the vmap_area_lock into a seqlock or even a rwlock (because some of the users are pure readers) just to clarify things, and that wasn't hard per se, but I threw up my hands in disgust. The code that modifies the thing is full of "take lock, look up,. drop lock, do something, take lock again") etc. Yes, we could just sprinkle "invalidate_cache = 1" statements around in all the places that can cause this, but that would be pretty ad-hoc and ugly too. And since the reader side is actually entirely lockless (currently using rcu, with my patch it has the additional lockless and racy cache reading), to do it *right* you actually need to use at a minimum memory ordering rules. So it would be fairly subtle too. In contrast, my "just base it on time" sure as hell isn't subtle. It's not great, but at least it's obvious and the crazy logic is localized.. So I'd be all for somebody actually taking the time to do it right. I'm pretty sure nobody really cares enough, though. Willing to prove me wrong? Linus