From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753340Ab3F0Nes (ORCPT ); Thu, 27 Jun 2013 09:34:48 -0400 Received: from mx1.redhat.com ([209.132.183.28]:45177 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752165Ab3F0Neq (ORCPT ); Thu, 27 Jun 2013 09:34:46 -0400 Date: Thu, 27 Jun 2013 09:34:07 -0400 From: Luiz Capitulino To: Michal Hocko Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, minchan@kernel.org, anton@enomsg.org, akpm@linux-foundation.org, kmpark@infradead.org, hyunhee.kim@samsung.com Subject: Re: [PATCH v2] vmpressure: implement strict mode Message-ID: <20130627093407.0466ced2@redhat.com> In-Reply-To: <20130627092616.GB17647@dhcp22.suse.cz> References: <20130626231712.4a7392a7@redhat.com> <20130627092616.GB17647@dhcp22.suse.cz> Organization: Red Hat Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 27 Jun 2013 11:26:16 +0200 Michal Hocko wrote: > On Wed 26-06-13 23:17:12, Luiz Capitulino wrote: > > Currently, an eventfd is notified for the level it's registered for > > _plus_ higher levels. > > > > This is a problem if an application wants to implement different > > actions for different levels. For example, an application might want > > to release 10% of its cache on level low, 50% on medium and 100% on > > critical. To do this, an application has to register a different > > eventfd for each pressure level. However, fd low is always going to > > be notified and and all fds are going to be notified on level critical. > > > > Strict mode solves this problem by strictly notifiying an eventfd > > for the pressure level it registered for. This new mode is optional, > > by default we still notify eventfds on higher levels too. > > > > Signed-off-by: Luiz Capitulino > > Two nits bellow but it looks good in general to me. You can add my > Reviewed-by: Michal Hocko Thanks. > I still think that edge triggering makes some sense but that one might > be rebased on top of this patch. We should still figure out whether the > edge triggering is the right approach for the use case Hyunhee Kim wants > it for so the strict mode should go first IMO. I also think that edge triggering makes sense. > > --- > > > > o v2 > > > > - Improve documentation > > - Use a bit to store mode instead of a bool > > - Minor changelog changes > > > > Documentation/cgroups/memory.txt | 26 ++++++++++++++++++++++---- > > mm/vmpressure.c | 26 ++++++++++++++++++++++++-- > > 2 files changed, 46 insertions(+), 6 deletions(-) > > > > diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt > > index ddf4f93..412872b 100644 > > --- a/Documentation/cgroups/memory.txt > > +++ b/Documentation/cgroups/memory.txt > > @@ -791,6 +791,22 @@ way to trigger. Applications should do whatever they can to help the > > system. It might be too late to consult with vmstat or any other > > statistics, so it's advisable to take an immediate action. > > > > +Applications can also choose between two notification modes when > > +registering an eventfd for memory pressure events: > > + > > +When in "non-strict" mode, an eventfd is notified for the specific level > > +it's registered for and higher levels. For example, an eventfd registered > > +for low level is also going to be notified on medium and critical levels. > > +This mode makes sense for applications interested on monitoring reclaim > > +activity or implementing simple load-balacing logic. The non-strict mode > > +is the default notification mode. > > + > > +When in "strict" mode, an eventfd is strictly notified for the pressure > > +level it's registered for. For example, an eventfd registered for the low > > +level event is not going to be notified when memory pressure gets into > > +medium or critical levels. This allows for more complex logic based on > > +the actual pressure level the system is experiencing. > > It would be also fair to mention that there is no guarantee that lower > levels are signaled before higher so nobody should rely on seeing LOW > before MEDIUM or CRITICAL. I think this is implied. Actually, as an user of this interface I didn't expect this to happen until I read the code. > > > + > > The events are propagated upward until the event is handled, i.e. the > > events are not pass-through. Here is what this means: for example you have > > three cgroups: A->B->C. Now you set up an event listener on cgroups A, B > > @@ -807,12 +823,14 @@ register a notification, an application must: > > > > - create an eventfd using eventfd(2); > > - open memory.pressure_level; > > -- write string like " " > > +- write string like " [strict]" > > to cgroup.event_control. > > > > -Application will be notified through eventfd when memory pressure is at > > -the specific level (or higher). Read/write operations to > > -memory.pressure_level are no implemented. > > +Applications will be notified through eventfd when memory pressure is at > > +the specific level or higher. If strict is passed, then applications > > +will only be notified when memory pressure reaches the specified level. > > + > > +Read/write operations to memory.pressure_level are no implemented. > > > > Test: > > > > diff --git a/mm/vmpressure.c b/mm/vmpressure.c > > index 736a601..ba5c17e 100644 > > --- a/mm/vmpressure.c > > +++ b/mm/vmpressure.c > > @@ -138,8 +138,16 @@ struct vmpressure_event { > > struct eventfd_ctx *efd; > > enum vmpressure_levels level; > > struct list_head node; > > + unsigned int mode; > > You would fill up a hole between level and node if you move it up on > 64b. Doesn't matter much but why not do it... You want me to respin?