Linux-mm Archive on lore.kernel.org
From: Vladimir Davydov <vdavydov@virtuozzo.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>,
	David Miller <davem@davemloft.net>,
	akpm@linux-foundation.org, tj@kernel.org, netdev@vger.kernel.org,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 5/8] mm: memcontrol: account socket memory on unified hierarchy
Date: Fri, 6 Nov 2015 12:05:55 +0300
Message-ID: <20151106090555.GK29259@esperanza> (raw)
In-Reply-To: <20151105205522.GA1067@cmpxchg.org>

On Thu, Nov 05, 2015 at 03:55:22PM -0500, Johannes Weiner wrote:
> On Thu, Nov 05, 2015 at 03:40:02PM +0100, Michal Hocko wrote:
...
> > 3) keep only some (safe) cache types enabled by default with the current
> >    failing semantic and require an explicit enabling for the complete
> >    kmem accounting. [di]cache code paths should be quite robust to
> >    handle allocation failures.
> 
> Vladimir, what would be your opinion on this?

I'm all for this option. Actually, I've been thinking about this since I
introduced the __GFP_NOACCOUNT flag. Not because of the failing
semantics, since we can always let kmem allocations breach the limit.
This shouldn't be critical, because I don't think it's possible to issue
a series of kmem allocations w/o a single user page allocation, which
would reclaim/kill the excess.

The point is there are allocations that are shared system-wide and
therefore shouldn't go to any memcg. The most obvious examples are mempool
users and radix_tree/idr preloads. Accounting them to memcg is likely to
result in noticeable memory overhead as memory cgroups are
created/destroyed, because such allocations pin dead memory cgroups with
all their kmem caches, which aren't tiny.

Another funny example is objects destroyed lazily for performance
reasons, e.g. vmap_area. Such objects are usually very small, so
delaying destruction of a bunch of them will normally go unnoticed.
However, if kmemcg is used, the effective memory consumption caused by
such objects can be multiplied many times over due to dangling kmem
caches.

We can, of course, mark all such allocations as __GFP_NOACCOUNT, but the
problem is they are tricky to identify, because they are scattered all
over the kernel source tree. E.g. Dave Chinner mentioned that XFS
internals do a lot of allocations that are shared among all XFS
filesystems and therefore should not be accounted (BTW that's why
list_lru's used by XFS are not marked as memcg-aware). There must be
more out there. Besides, kernel developers usually don't even know about
kmemcg (they just write the code for their subsys, so why should they?),
so they won't bother to think about using __GFP_NOACCOUNT, and hence new
falsely-accounted allocations are likely to appear.

That said, by switching from black-list (__GFP_NOACCOUNT) to white-list
(__GFP_ACCOUNT) kmem accounting policy we would make the system more
predictable and robust IMO. OTOH what would we lose? Security? Well,
containers aren't secure IMHO. In fact, I doubt they will ever be (as
secure as VMs). Anyway, if a runaway allocation is reported, it should
be trivial to fix by adding __GFP_ACCOUNT where appropriate.
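
To make the difference between the two policies concrete, here is a
minimal userspace sketch. It is only a model, not kernel code: the
MODEL_GFP_* bit values and the *_should_charge() helpers are made up for
illustration and do not match the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this userspace model; not the kernel's values. */
#define MODEL_GFP_NOACCOUNT (1u << 0) /* black-list: caller opts out */
#define MODEL_GFP_ACCOUNT   (1u << 1) /* white-list: caller opts in */

/* Black-list policy: charge everything unless the caller opted out. */
static inline bool blacklist_should_charge(unsigned int flags)
{
	return !(flags & MODEL_GFP_NOACCOUNT);
}

/* White-list policy: charge nothing unless the caller opted in. */
static inline bool whitelist_should_charge(unsigned int flags)
{
	return (flags & MODEL_GFP_ACCOUNT) != 0;
}
```

A call site written without kmemcg in mind passes neither flag. In this
model the black-list policy charges it to the memcg, while the
white-list policy leaves it unaccounted, so a forgotten annotation means
a missed charge rather than a falsely-accounted shared allocation.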

If there are no objections, I'll prepare a patch switching to the
white-list approach. Let's start with obvious things like fs_struct,
mm_struct, task_struct, signal_struct, dentry, inode, which can be
easily allocated from user space. This should cover 90% of all
allocations that should be accounted AFAICS. The rest will be added
later if necessary.

Thanks,
Vladimir
