Subject: Re: [PATCH 00/14][V5] Introduce io.latency io controller for cgroups
From: Jens Axboe
To: Andrew Morton, Josef Bacik
Cc: kernel-team@fb.com, linux-block@vger.kernel.org, hannes@cmpxchg.org, tj@kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Date: Mon, 2 Jul 2018 15:41:48 -0600

On 7/2/18 3:26 PM, Andrew Morton wrote:
> On Fri, 29 Jun 2018 15:25:28 -0400 Josef Bacik wrote:
>
>> This series adds a latency-based io controller for cgroups. It is based on
>> the same concept as the writeback throttling code: it watches the total
>> latency of IOs in a given window and adjusts the queue depth of the group
>> accordingly. This is meant to be a workload-protection controller, so
>> whoever has the lowest latency target gets preferential treatment, with no
>> thought to fairness or proportionality. It is meant to be work conserving:
>> as long as nobody is missing their latency targets, the disk is fair game.
>>
>> We have been testing this in production for several months now to get the
>> behavior right, and we are finally at the point where it works well in all
>> of our test cases. With this patchset we protect our main workload (the
>> web server) and isolate the system services (chef/yum/etc). This works
>> well in the normal case, smoothing out the odd requests-per-second (RPS)
>> dips we would see when one of the system services ran and competed for IO
>> resources. It also works incredibly well in the runaway-task case.
>>
>> The runaway-task use case is one where some task slowly eats up all of the
>> memory on the system (think a memory leak). Previously this sort of
>> workload would push the box into a swapping/OOM death spiral that could
>> only be recovered from by rebooting the box. With this patchset and proper
>> configuration of the memory.low and io.latency controllers, we are able to
>> survive this test with at most a 20% dip in RPS.
>
> Is this purely useful for spinning disks, or is there some applicability
> to SSDs and perhaps other storage devices? Some discussion on this topic
> would be useful.
>
> Patches 5, 7 & 14 look fine to me - go wild. #14 could do with a couple
> of why-we're-doing-this comments, but I say that about everything ;)

I want to queue this up for 4.19 shortly - is the above an acked-by?
Andrewed-by? Which do you prefer? :-)

-- 
Jens Axboe
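
For anyone who wants to experiment with the behavior the cover letter
describes once this lands, below is a minimal sketch of the configuration in
Python. It assumes a cgroup2 hierarchy mounted at /sys/fs/cgroup and a disk
at major:minor 8:0; the group name "webserver", the 10ms latency target, and
the 8G memory.low value are illustrative choices, not values taken from the
patchset.

# Minimal sketch: protect one cgroup with io.latency and memory.low,
# assuming a cgroup2 mount at /sys/fs/cgroup and a disk at 8:0.
import os

CGROUP_ROOT = "/sys/fs/cgroup"

def write_file(path, value):
    # Write a single value to a cgroup interface file (needs root).
    with open(path, "w") as f:
        f.write(value)

def setup_protected_group(name, device="8:0", target_usec=10000,
                          low_bytes=8 << 30):
    group = os.path.join(CGROUP_ROOT, name)
    os.makedirs(group, exist_ok=True)

    # Enable the io and memory controllers for children of the root.
    write_file(os.path.join(CGROUP_ROOT, "cgroup.subtree_control"),
               "+io +memory")

    # io.latency: per-device latency target in microseconds. The group
    # with the lowest target is protected first; peers are throttled
    # (their queue depth reduced) only when this target is being missed.
    write_file(os.path.join(group, "io.latency"),
               "%s target=%d" % (device, target_usec))

    # memory.low: best-effort memory protection, so a runaway task gets
    # reclaimed (and starts thrashing on IO) before the protected
    # workload does.
    write_file(os.path.join(group, "memory.low"), str(low_bytes))

if __name__ == "__main__":
    # e.g. a web server with a 10ms latency target and 8G protected memory
    setup_protected_group("webserver")

Since the controller is work conserving, a group configured this way only
constrains its siblings while its latency target is actually being missed;
with an idle box the sketch above changes nothing about how IO is issued.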