From: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
To: Jiri Olsa <jolsa@redhat.com>
Cc: linux-kernel@vger.kernel.org, Aristeu Rozanski <aris@redhat.com>,
        Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@redhat.com>,
        Arnaldo Carvalho de Melo <acme@kernel.org>,
        Alexander Shishkin <alexander.shishkin@linux.intel.com>,
        Steven Rostedt <rostedt@goodmis.org>
Subject: Re: [RFC] rlimit exceed notification events
References: <xunyh9ag254f.fsf@redhat.com> <20160824112428.GA15743@krava>
Date: Thu, 25 Aug 2016 13:07:36 +0300
Message-ID: <xuny1t1dw49j.fsf@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi, Jiri!

>>>>> On Wed, 24 Aug 2016 13:24:28 +0200, Jiri Olsa  wrote:

 > On Fri, Aug 19, 2016 at 05:41:20PM +0300, Yauheni Kaliuta wrote:
 >> 
 >> At the moment there is no clear indication when a process exceeds a resource
 >> limit. In some cases the problematic syscall returns an error; in others
 >> the process is simply killed.

[...]

 >> 2) Using tracepoints. I've used a simple program that calls dup() until it
 >> gets the error 3 times:

 > just to start up the discussion.. ;-)

 > I'd think this one (2) is the proper way,

Of the options I checked, I like this one most as well. I should probably
prepare an RFC PATCH with it.

 > but generally you need to
 > come up with a good justification/use case to add a new tracepoint

 > also rlimit seems to be difficult to add tracepoints to,
 > because the checks are spread all over the code.. 

 > can't think of a good solution ATM

Yes, every place has to be instrumented. I only introduced some indirection
to keep some flexibility in the final output.

Still, it would be good to know whether there are objections to such
instrumentation in any of the resource-check places, for example in file
operations.

 >> $ sudo ./perf record -e rlimit:rlimit_exceeded ./a.out
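For reference, a minimal userspace sketch of a test like the a.out above. The original program is not quoted here, so this is an approximation, not the author's code; the helper name dup_until_emfile(), the MAX_TRACKED bound and the temporarily lowered soft limit are assumptions (lowering RLIMIT_NOFILE just makes dup() hit EMFILE quickly):

```c
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>
#include <unistd.h>

#define MAX_TRACKED 1024	/* hypothetical upper bound on soft_limit */

/*
 * Hypothetical test helper: temporarily lower the RLIMIT_NOFILE soft limit
 * to soft_limit, then dup() a descriptor until dup() has failed with EMFILE
 * max_failures times. Returns the number of successful dup() calls, or -1
 * on any other error. All descriptors and the old limit are restored.
 */
static int dup_until_emfile(rlim_t soft_limit, int max_failures)
{
	struct rlimit old, rl;
	int fds[MAX_TRACKED];
	int p[2];
	int opened = 0, failures = 0, result = 0, i;

	assert(max_failures > 0 && soft_limit <= MAX_TRACKED);
	if (getrlimit(RLIMIT_NOFILE, &old) != 0)
		return -1;
	rl = old;
	if (soft_limit < rl.rlim_cur)	/* lowering the soft limit is always allowed */
		rl.rlim_cur = soft_limit;
	if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
		return -1;

	/* Any open descriptor can be the dup() source; a pipe needs no filesystem. */
	if (pipe(p) != 0) {
		setrlimit(RLIMIT_NOFILE, &old);
		return -1;
	}

	while (failures < max_failures) {
		int fd = dup(p[0]);

		if (fd >= 0) {
			/* fds are distinct and below rl.rlim_cur <= MAX_TRACKED */
			fds[opened++] = fd;
			continue;
		}
		if (errno != EMFILE) {
			result = -1;	/* unexpected error, not the limit */
			break;
		}
		failures++;	/* the failure the proposed tracepoint should report */
	}
	if (result == 0)
		result = opened;

	for (i = 0; i < opened; i++)
		close(fds[i]);
	close(p[0]);
	close(p[1]);
	setrlimit(RLIMIT_NOFILE, &old);
	return result;
}
```

Calling, e.g., dup_until_emfile(64, 3) from a trivial main() gives three dup() failures with EMFILE. With the RFC patch applied, those are the events perf record -e rlimit:rlimit_exceeded would be expected to capture; the tracepoint exists only in the proposed patch, not in mainline.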

[...]

 >> index 6b1acdfe59da..a358de041ac4 100644
 >> --- a/fs/file.c
 >> +++ b/fs/file.c
 >> @@ -947,6 +947,9 @@ SYSCALL_DEFINE1(dup, unsigned int, fildes)
 >>  		else
 >>  			fput(file);
 >>  	}
 >> +	if (ret == -EMFILE)
 >> +		rlimit_exceeded(RLIMIT_NOFILE,
 >> +				rlimit(RLIMIT_NOFILE), (u64)-1);
 >>  	return ret;

 > how about other places? alloc_fd/get_unused_fd_flags/replace_fd..

This is a very good question. Initially I just wanted something for a demo,
but I ran into a dilemma even here. Ideally it should be a place that is

a) aware of the RLIMIT, and
b) responsible for making the decision:

1) It would be good to place it in __alloc_fd(), since that is the final point
and it performs the check against the limit, but it is not aware of the
RLIMIT: the limit is passed to it from the upper levels.

2) get_unused_fd_flags() is aware of the RLIMIT and is the entry point for many
other fd allocations, but it doesn't make any decision.

3) The dup() syscall is not aware of the RLIMIT, but makes the final decision.

That is why I put it there for the prototype code, but it doesn't look like
a good place for the final solution.

In many other cases, a) and b) are in the same place, so this problem does
not arise.
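To make the fd layering concrete, here is a hypothetical userspace model (not kernel code; every model_* name is invented for illustration). model_alloc_fd() performs the check like __alloc_fd() without knowing where the limit came from, model_get_unused_fd() knows the limit like get_unused_fd_flags() but only forwards it, and model_dup() sees -EMFILE and reports, as the prototype does in dup(). model_open() stands in for any other caller of get_unused_fd_flags() and shows what a dup()-only hook misses:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical resource id; the kernel uses RLIMIT_NOFILE. */
enum model_resource { MODEL_RLIMIT_NOFILE };

/* Hypothetical, heavily simplified per-task state. */
struct model_task {
	unsigned long nofile_limit;	/* stands in for rlimit(RLIMIT_NOFILE) */
	unsigned long next_fd;		/* lowest free descriptor (no holes modelled) */
	int reports;			/* number of times the hook fired */
};

/* Single reporting point, like the prototype's rlimit_exceeded() indirection. */
static void model_rlimit_exceeded(struct model_task *t,
				  enum model_resource res, unsigned long limit)
{
	t->reports++;
	printf("rlimit_exceeded: resource=%d limit=%lu\n", (int)res, limit);
}

/* Like __alloc_fd(): performs the check against "end", but does not
 * know that "end" is the RLIMIT_NOFILE value. */
static int model_alloc_fd(struct model_task *t, unsigned long end)
{
	if (t->next_fd >= end)
		return -EMFILE;
	return (int)t->next_fd++;
}

/* Like get_unused_fd_flags(): knows the RLIMIT, but only passes it down. */
static int model_get_unused_fd(struct model_task *t)
{
	return model_alloc_fd(t, t->nofile_limit);
}

/* Like dup() with the prototype hook: reacts to -EMFILE and reports,
 * re-reading the limit because it is not passed back up. */
static int model_dup(struct model_task *t)
{
	int ret = model_get_unused_fd(t);

	assert(ret >= 0 || ret == -EMFILE);	/* the only outcomes in this model */
	if (ret == -EMFILE)
		model_rlimit_exceeded(t, MODEL_RLIMIT_NOFILE, t->nofile_limit);
	return ret;
}

/* Another caller of the same path (think open()): no hook, so the
 * same failure goes unreported. */
static int model_open(struct model_task *t)
{
	return model_get_unused_fd(t);
}
```

With nofile_limit = 2, two model_dup() calls succeed, the third returns -EMFILE and fires the hook once, while a following model_open() fails the same way without any report.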


-- 
WBR,
Yauheni Kaliuta