From: Daniel Colascione
Date: Mon, 21 May 2018 15:07:36 -0700
Subject: Why do we let munmap fail?
To: linux-mm@kvack.org
Cc: Tim Murray, Minchan Kim

Right now, we have a system knob, max_map_count, that caps the number of
VMAs we can have in a single address space. Put aside for the moment the
question of whether this knob should exist: even if it does, enforcing it
for munmap, mprotect, etc. produces weird and counter-intuitive situations
in which it's possible to fail to return resources (address space and
commit charge) to the system. At a deep philosophical level, that's the
kind of operation that should never fail. A library that does all the
right things can still experience a failure to deallocate resources it
allocated itself if it gets unlucky with VMA merging (a short
demonstration appears at the end of this mail). Why should we allow that
to happen?

Now let's return to max_map_count itself: what is it supposed to achieve?
If we want to limit application kernel memory resource consumption, let's
limit application kernel memory resource consumption, accounting for it on
a byte basis the same way we account for other kernel objects allocated on
behalf of userspace. Why should we have a separate cap just for the VMA
count?

I propose the following changes:

1) Let -1 mean "no VMA count limit".
2) Default max_map_count to -1.
3) Do not enforce max_map_count on munmap and mprotect.

Alternatively, can we account VMAs toward max_map_count on a page-count
basis instead of a VMA basis? This way, no matter how you split and merge
your VMAs, you'll never see a weird failure to release resources. We'd
have to bump the default value of max_map_count to compensate for its new
interpretation.
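
To make the failure mode concrete, here is a minimal, untested userspace
sketch. It assumes a process that is already at the vm.max_map_count
limit; in that state, unmapping the middle page of a three-page anonymous
mapping requires splitting one VMA into two, and the kernel refuses the
split with ENOMEM, so the process cannot release memory it mapped itself:

#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	long page = sysconf(_SC_PAGESIZE);

	/* One anonymous mapping covering three pages: a single VMA. */
	char *p = mmap(NULL, 3 * page, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/*
	 * Unmapping the middle page would split the VMA into two. If the
	 * process is already at max_map_count, the kernel rejects the
	 * split and munmap() fails with ENOMEM, even though we are only
	 * trying to give memory back.
	 */
	if (munmap(p + page, page) != 0)
		fprintf(stderr, "munmap: %s (could not release our own mapping)\n",
			strerror(errno));

	return 0;
}

The current limit is visible via /proc/sys/vm/max_map_count (or
sysctl vm.max_map_count). Under proposal (3) above, the munmap in the
sketch would always succeed, regardless of the knob's value.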