Date: Mon, 2 Jul 2018 14:47:57 -0700
From: Andrew Morton
To: Jens Axboe
Cc: Josef Bacik, kernel-team@fb.com, linux-block@vger.kernel.org,
 hannes@cmpxchg.org, tj@kernel.org, linux-kernel@vger.kernel.org,
 linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 00/14][V5] Introduce io.latency io controller for cgroups
Message-Id: <20180702144757.0138984124f97bf0b8f8de31@linux-foundation.org>
In-Reply-To: <08f3bef3-7189-a368-74d9-b4c5e0edc824@kernel.dk>
References: <20180629192542.26649-1-josef@toxicpanda.com>
 <20180702142639.752759da566fd9074cf8edfe@linux-foundation.org>
 <08f3bef3-7189-a368-74d9-b4c5e0edc824@kernel.dk>

On Mon, 2 Jul 2018 15:41:48 -0600 Jens Axboe wrote:

> On 7/2/18 3:26 PM, Andrew Morton wrote:
> > On Fri, 29 Jun 2018 15:25:28 -0400 Josef Bacik wrote:
> >
> >> This series adds a latency-based io controller for cgroups. It is based on
> >> the same concept as the writeback throttling code: it watches the overall
> >> latency of IOs in a given window and adjusts the queue depth of the group
> >> accordingly. This is meant to be a workload protection controller, so
> >> whoever has the lowest latency target gets preferential treatment, with no
> >> thought to fairness or proportionality. It is meant to be work conserving,
> >> so as long as nobody is missing their latency targets the disk is fair game.
> >>
> >> We have been testing this in production for several months now to get the
> >> behavior right, and we are finally at the point where it works well in all
> >> of our test cases. With this patch we protect our main workload (the web
> >> server) and isolate the system services (chef/yum/etc). This works well in
> >> the normal case, smoothing out the odd requests-per-second (RPS) dips we
> >> would see when one of the system services ran and competed for IO
> >> resources. It also works incredibly well in the runaway task case.
> >>
> >> The runaway task usecase is one where some task slowly eats up all of the
> >> memory on the system (think a memory leak). Previously this sort of
> >> workload would push the box into a swapping/oom death spiral that could
> >> only be recovered by rebooting the box. With this patchset and proper
> >> configuration of the memory.low and io.latency controllers we're able to
> >> survive this test with at most a 20% dip in RPS.
> >
> > Is this purely useful for spinning disks, or is there some
> > applicability to SSDs and perhaps other storage devices? Some
> > discussion on this topic would be useful.
> >
> > Patches 5, 7 & 14 look fine to me - go wild. #14 could do with a
> > couple of why-we're-doing-this comments, but I say that about
> > everything ;)
>
> I want to queue this up for 4.19 shortly - is the above an acked-by?
> Andrewed-by? Which do you prefer? :-)

Quacked-at-by: Andrew

Hannes's acks are good. Feel free to add mine as well ;)
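
For context, a minimal sketch of the kind of cgroup v2 setup the cover letter
alludes to ("proper configuration of the memory.low and io.latency
controllers"). The cgroup path, device numbers, and values below are
illustrative assumptions, not taken from the series:

    # Illustrative only: protect one workload's cgroup under cgroup v2.
    from pathlib import Path

    cg = Path("/sys/fs/cgroup/workload.slice")  # hypothetical cgroup path

    # io.latency: set a latency target for this group on device 8:0.
    # The target is assumed to be given in microseconds (10ms here).
    (cg / "io.latency").write_text("8:0 target=10000\n")

    # memory.low: reserve memory so a runaway sibling's reclaim pressure
    # does not push the workload into the swap/oom spiral described above.
    (cg / "memory.low").write_text(str(4 * 1024**3) + "\n")  # 4 GiB, illustrative

With io.latency set only on the protected group, sibling groups without a
target are the ones throttled when the protected group starts missing its
latency target, which is the workload-protection behavior described above.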