From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752379Ab0KWHSh (ORCPT ); Tue, 23 Nov 2010 02:18:37 -0500
Received: from fgwmail7.fujitsu.co.jp ([192.51.44.37]:36541 "EHLO
	fgwmail7.fujitsu.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752085Ab0KWHQ7 (ORCPT );
	Tue, 23 Nov 2010 02:16:59 -0500
X-SecurityPolicyCheck-FJ: OK by FujitsuOutboundMailChecker v1.3.1
From: KOSAKI Motohiro
To: David Rientjes
Subject: Re: [PATCH] Revert oom rewrite series
Cc: kosaki.motohiro@jp.fujitsu.com, Andrew Morton, Linus Torvalds, LKML,
	Ying Han, Bodo Eggert <7eggert@web.de>, Mandeep Singh Baines,
	"Figo.zhang"
In-Reply-To:
References: <20101115113238.BF06.A69D9226@jp.fujitsu.com>
Message-Id: <20101123151731.7B7B.A69D9226@jp.fujitsu.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="ISO-2022-JP"
Content-Transfer-Encoding: 7bit
X-Mailer: Becky! ver. 2.50.07 [ja]
Date: Tue, 23 Nov 2010 16:16:56 +0900 (JST)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Sorry for the delay.

> On Mon, 15 Nov 2010, KOSAKI Motohiro wrote:
>
> > Of course, I denied. He seems to think the number of emails is more
> > meaningful than what is said, but that's incorrect and makes no sense.
> > Why not? Also, he has to argue logically. "Hey, I think it's not a bug"
> > makes no sense. Such a claim doesn't solve anything. Userland is still
> > unhappy. Why not? I want quick action.
>
> If there are pending complaints or bugs that I haven't addressed, please
> bring them to my attention.  To date, I know of no issues that have been
> raised that I have not addressed; you're always free to disagree with my
> position, but in the end you may find that when the kernel moves in a
> different direction that you should begin to accept it.

I can't understand. Why do I need to ignore userland folks? WHY? I have no
reason to ignore userland complaints.
I would rather spare userland folks pain than kernel developers.

>
> > That said, if anyone wants to change the userland ABI, be careful. They
> > have to investigate userland use cases carefully and, again, carefully
> > avoid breaking them. If someone thinks "hey, it's no big matter, a
> > userland rewrite can solve the issue", I strongly disagree. They don't
> > understand why rewriting all userland applications is harmful.
>
> You may remember that the initial version of my rewrite replaced oom_adj
> entirely with the new oom_score_adj semantics.  Others suggested that it
> be separated into a new tunable and the old tunable deprecated for a
> lengthy period of time.  I accepted that criticism and understood the
> drawbacks of replacing the tunable immediately and followed those
> suggestions.  I disagree with you that the deprecation of oom_adj for a
> period of two years is as dramatic as you imply and I disagree that users
> are experiencing problems with the linear scale that it now operates on
> versus the old exponential scale.

Yes and no. People wanted a separate tunable AND for the old one not to
break.

>
> > 1) About two months ago, Dave Hansen observed a strange OOM issue because
> >    he has a big machine and none of its processes are very big. Thus,
> >    eventually every process got oom-score=0 and the oom-killer didn't work.
> >
> >    https://kerneltrap.org/mailarchive/linux-driver-devel/2010/9/9/6886383
> >
> >    DavidR changed oom-score to +1 in such a situation.
> >
> >    http://kerneltrap.org/mailarchive/linux-kernel/2010/9/9/4617455
> >
> >    But it is completely bogus. If all processes have score=1, the
> >    oom-killer falls back to a purely random killer. I expected and
> >    explained that his patch has this problem half a year ago, but he
> >    hasn't fixed it yet.
> >
>
> The resolution with which the oom killer considers memory is at 0.1% of
> system RAM at its highest (smaller when you have a memory controller,
> cpuset, or mempolicy constrained oom).
> It considers a task within 0.1% of
> memory of another task to have equal "badness" to kill, we don't break
> ties in between that resolution -- it all depends on which one shows up in
> the tasklist first.  If you disagree with that resolution, which I support
> as being high enough, then you may certainly propose a patch to make it
> even finer at 0.01%, 0.001%, etc.  It would only change oom_badness() to
> range between [0,10000], [0,100000], etc.

No. Think of Moore's Law. A ratio-based value will not be able to work in
the future anyway. 10 years ago, I used a desktop machine with 20M bytes of
memory and now I'm using 2GB. Memory size keeps growing and growing, and
bash's size isn't growing as fast.

>
> > 2) Also half a year ago, I explained that oom_adj is used by multiple
> >    applications, and that we can't break them. But DavidR didn't fix it.
> >
>
> And we didn't.  oom_adj is still there and maps linearly to oom_score_adj;
> you just can't show a single application where that mapping breaks because
> it was based on an actual calculation.
>
> If you would like to cite these "multiple" applications that need to be
> converted to use oom_score_adj (I know of udev), please let me know and
> if they're open-source applications then I will commit to submitting
> patches for them myself.  I believe the two year window is sufficient for
> everyone else, though.

If you want that, you have to change userland first, and by yourself. Don't
claim anyone else should work for you.

> > 3) Also about four months ago, kamezawa-san and I pointed out that his
> >    patch doesn't work on memcg. That also hasn't been fixed.
>
> I don't know what you're referring to here, sorry.

You should have read my patch. Even though you don't use memcg, we do.

> As kamezawa-san pointed out, this breaks cgroup and lxr environments.
> He said,
> > Assume 2 processes A, B which have oom_score_adj of 300 and 0
> > And A uses 200M, B uses 1G of memory under 4G system
> >
> > Under the system.
> > A's score = (200M * 1000)/4G + 300 = 350
> > B's score = (1G * 1000)/4G = 250.
> >
> > In the cpuset, it has 2G of memory.
> > A's score = (200M * 1000)/2G + 300 = 400
> > B's score = (1G * 1000)/2G = 500
> >
> > This priority inversion doesn't happen in the current system.
>
> > On the other hand, you can't explain what value the OOM-rewrite patch
> > has, because there is none. It is only "powerful"(TM) for Google; it has
> > zero value for everyone else. This is just a technical issue. Bah.
> >
>
> Please see my reply to Figo.zhang where I enumerate the four reasons why
> the new userspace tunable is more powerful than oom_adj.

I'm NOT interested in *powerful* crap. Please DON'T talk about which is
powerful. I can only say it's useful only for you.

> At this point, I can only speculate that your distaste for the new oom
> killer is one of disposition; it seems like every time you reply to an
> email (or, more regularly, just repost your revert) that you come into it
> with the attitude that my response cannot possibly be correct and that the
> way you see things is exactly as they should be.  If you were to consider
> other people's opinions, however, you may find some common ground that can
> be met.  I certainly did that when I introduced oom_score_adj instead of
> replacing oom_adj immediately.  I also did it when I removed the forkbomb
> detector from the rewrite.  I also did it when considering swap in the
> heuristic when it initially was only rss.  Andrew is in the position where
> he has to make a judgment call on what should be included and what
> shouldn't and it should be pretty darn clear after you post your revert
> the first time, then the second time, then the third time, then the fourth
> time, and now the fifth time.
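
For reference, the arithmetic in kamezawa-san's example quoted above can be
checked with a small sketch. It is not the kernel's oom_badness(); the
function name badness(), the integer division, and the treatment of 1G as
1000M (to match the email's round numbers) are assumptions made only for
illustration.

```python
def badness(mem_mb, total_mb, oom_score_adj=0):
    """Hypothetical simplified score: the task's share of the memory
    available to it, scaled to [0, 1000], plus its oom_score_adj."""
    return mem_mb * 1000 // total_mb + oom_score_adj


# name -> (memory used in M, oom_score_adj), as in the quoted example
tasks = {"A": (200, 300), "B": (1000, 0)}


def scores(total_mb):
    """Score every task against a given amount of available memory."""
    return {name: badness(mem, total_mb, adj)
            for name, (mem, adj) in tasks.items()}


system = scores(4000)  # whole 4G system
cpuset = scores(2000)  # same tasks constrained to a 2G cpuset

print("system:", system, "-> highest score:", max(system, key=system.get))
print("cpuset:", cpuset, "-> highest score:", max(cpuset, key=cpuset.get))

# Resolution point from item 1): with a 0.1% granularity, a task using
# 3M on a 4000M machine rounds down to a score of 0 in this sketch.
print("small task:", badness(3, 4000))
```

Under these assumptions A has the higher score on the whole system (350 vs.
250) while B has the higher score inside the cpuset (500 vs. 400), which is
the inversion the example describes.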