From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753778AbbHMUEV (ORCPT <rfc822;w@1wt.eu>);
	Thu, 13 Aug 2015 16:04:21 -0400
Received: from mail-io0-f176.google.com ([209.85.223.176]:34358 "EHLO
	mail-io0-f176.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752320AbbHMUET (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Thu, 13 Aug 2015 16:04:19 -0400
MIME-Version: 1.0
In-Reply-To: <alpine.DEB.2.10.1508131249050.14178@chino.kir.corp.google.com>
References: <CA+55aFzUDiGRY_CZz0hON9AxkD1Fw_B-YSLiHH8tt_3rkKUrzw@mail.gmail.com>
	<20150812210027.88dfcf90.akpm@linux-foundation.org>
	<CA+55aFzjhAoa6aiEfuwM_XyYn0OWRjUuHyEasfj=q4tzOHFGYA@mail.gmail.com>
	<CA+55aFxzOAx7365Mx2o55TZOS+bZGh_Pfr=vVF3QGg0btsDumg@mail.gmail.com>
	<CA+55aFyf7heNE2VW8gfejMy0QKUtTZiRv5T+zo9EJEiJ41WN2w@mail.gmail.com>
	<alpine.DEB.2.10.1508131249050.14178@chino.kir.corp.google.com>
Date: Thu, 13 Aug 2015 13:04:19 -0700
X-Google-Sender-Auth: SLxMtzlGE5BZ01aUwaUn7UGQtIc
Message-ID: <CA+55aFyTLkNP+5Bwnc=Z8BnygBAALdzTAYxfXXoenkp33iipyA@mail.gmail.com>
Subject: Re: get_vmalloc_info() and /proc/meminfo insanely expensive
From: Linus Torvalds <torvalds@linux-foundation.org>
To: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
        Joonsoo Kim <iamjoonsoo.kim@lge.com>,
        Al Viro <viro@zeniv.linux.org.uk>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Aug 13, 2015 at 12:50 PM, David Rientjes <rientjes@google.com> wrote:
>
> Rather than a time based approach, why not invalidate when it is known to
> change, either on the next call to get_vmalloc_info() or as part of the
> vmalloc operation itself?

I started doing that, looking at all the users of vmap_area_lock, and
decided that it's just too messy for me. I wanted something minimal
and obvious. The vmap_area handling is not obvious, especially since
the whole vmalloc statistics don't just depend on the vmap area list,
but also take the individual flags into account (ie it counts
VM_LAZY_FREE[ING] entries as having already been removed etc).

So I started looking at actually turning the vmap_area_lock into a
seqlock or even a rwlock (because some of the users are pure readers)
just to clarify things, and that wasn't hard per se, but I threw up my
hands in disgust. The code that modifies the thing is full of "take
lock, look up,. drop lock, do something, take lock again") etc.

Yes, we could just sprinkle "invalidate_cache = 1" statements around
in all the places that can cause this, but that would be pretty ad-hoc
and ugly too.  And since the reader side is actually entirely lockless
(currently using rcu, with my patch it has the additional lockless and
racy cache reading), to do it *right* you actually need to use at a
minimum memory ordering rules. So it would be fairly subtle too.

In contrast, my "just base it on time" sure as hell isn't subtle. It's
not great, but at least it's obvious and the crazy logic is
localized..

So I'd be all for somebody actually taking the time to do it right.
I'm pretty sure nobody really cares enough, though.

Willing to prove me wrong?

                   Linus