Re: [PATCH v1] proc: Implement /proc/self/meminfo

From: "Enrico Weigelt, metux IT consult" <lkml@metux.net>
To: Johannes Weiner <hannes@cmpxchg.org>,
	"Eric W. Biederman" <ebiederm@xmission.com>
Cc: Chris Down <chris@chrisdown.name>,
	legion@kernel.org, LKML <linux-kernel@vger.kernel.org>,
	Linux Containers <containers@lists.linux.dev>,
	Linux Containers <containers@lists.linux-foundation.org>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Michal Hocko <mhocko@kernel.org>
Subject: Re: [PATCH v1] proc: Implement /proc/self/meminfo
Date: Fri, 11 Jun 2021 12:37:36 +0200	[thread overview]
Message-ID: <f62b652c-3f6f-31ba-be0f-5f97b304599f@metux.net> (raw)
In-Reply-To: <YMElKcrVIhJg4GTT@cmpxchg.org>

On 09.06.21 22:31, Johannes Weiner wrote:

>> Alex has found applications that are trying to do something with
>> meminfo, and the fields that those applications care about.  I don't see
>> anyone making the case that specifically what the applications are
>> trying to do is buggy.

Actually, I do. The assumptions made by those applications are most
certainly wrong - have been wrong ever since.

The problem is that just looking on these numbers is only helpful if
that application is pretty much alone on the machine. You'll find those
when the vendor explicitly tells you to run it on a standalone machine,
disable swap, etc. Database systems are probably the most prominent
sector here.

Onder certain circumstances this actually migh, heuristically, work,
but those kind of auto tuning (eg. automatically eat X% of ram for
buffers, etc) might work. Under certain circumstance.

Those assumptions already had been defeated by VMs (eg. overcommit).

I don't feel it's a good idea to add extra kernel code, just in order
to make some ill-designed applications work a little bit less bad
(which still needs to be changed anyways)

Don't get me wrong, I'm really in favour of having a clean interface
for telling applications how much resources they can take (actually
considered quite the same), but this needs to be very well thought (and
we should also get orchestration folks into the loop for that). This
basically falls into two categories:

a) hard limits - how much can an application possibly consume
    --> what makes about an application ? process ? process group ?
        cgroup ? namespace ?
b) reasonable defaults - how much can an application take at will, w/o
    affecting others ?
    --> what exactly is "reasonable" ? kernel cache vs. userland buffers?
    --> how to deal w/ overcommit scenarios ?
    --> who shall be in charge of controlling these values ?

It's a very complex problem, not easy to solve. Much of that seems to be
a matter of policies, and depending on actual workloads.

Maybe, for now, it's better pursue that on orchestration level.

> Not all the information at the system level translates well to the
> container level. Things like available memory require a hierarchical
> assessment rather than just a look at the local level, since there
> could be limits higher up the tree.

By the way: what exactly *is* a container anyways ?

The mainline kernel (in contrast to openvz kernel) doesn't actually know
about containers at all - instead is provides several concepts like
namespaces, cgroups, etc, that together are used for providing some
container environment - but that composition is done in userland, and
there're several approaches w/ different semantics.

Before we can do anything container specific in the kernel, we first
have to come to an general aggreement what actually is a container from
kernel perspective. No idea whether we can achieve that at all (in near
future) w/o actually introducing the concept of container within the
kernel.

> We should also not speculate what users intended to do with the
> meminfo data right now. There is a surprising amount of misconception
> around what these values actually mean. I'd rather have users show up
> on the mailing list directly and outline the broader usecase.

ACK.

The only practical use cases I'm aware of is:

a) safety: know how much memory I can eat, until I get -ENOMEM, so
    applications can proactively take counter measures, eg. pre
    allocation (common practise in safety related applications)

b) autotuning: how much shall the application take for caches or
    buffers. this is problematic, since it can only work on heuristics,
    which in turn can only be experimentally found within certain range
    of assumptions (eg. certain databases like to do that). By that way
    you can only find more or less reasonable parameters for the majority
    of cases (assuming you have an idea what that majority actually is),
    but still far from optimal. for *good* parameters you need to measure
    your actual workloads and applying good knowledge of what this
    application is actually doing. (one of the DBA's primary jobs)

--mtx
-- 
---
Hinweis: unverschlüsselte E-Mails können leicht abgehört und manipuliert
werden ! Für eine vertrauliche Kommunikation senden Sie bitte ihren
GPG/PGP-Schlüssel zu.
---
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
info@metux.net -- +49-151-27565287