From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1754346AbcFIBAH (ORCPT );
        Wed, 8 Jun 2016 21:00:07 -0400
Received: from LGEAMRELO11.lge.com ([156.147.23.51]:55669 "EHLO
        lgeamrelo11.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751359AbcFIBAE (ORCPT );
        Wed, 8 Jun 2016 21:00:04 -0400
X-Original-SENDERIP: 156.147.1.126
X-Original-MAILFROM: minchan@kernel.org
X-Original-SENDERIP: 165.244.98.204
X-Original-MAILFROM: minchan@kernel.org
X-Original-SENDERIP: 10.177.223.161
X-Original-MAILFROM: minchan@kernel.org
Date: Thu, 9 Jun 2016 10:01:07 +0900
From: Minchan Kim
To: Johannes Weiner
CC: , , Andrew Morton , Rik van Riel , Mel Gorman , Andrea Arcangeli ,
        Andi Kleen , Michal Hocko , Tim Chen ,
Subject: Re: [PATCH 01/10] mm: allow swappiness that prefers anon over file
Message-ID: <20160609010107.GF28620@bbox>
References: <20160606194836.3624-1-hannes@cmpxchg.org>
        <20160606194836.3624-2-hannes@cmpxchg.org>
        <20160607002550.GA26230@bbox>
        <20160607141818.GE9978@cmpxchg.org>
        <20160608000632.GA27258@bbox>
        <20160608155812.GC6727@cmpxchg.org>
MIME-Version: 1.0
In-Reply-To: <20160608155812.GC6727@cmpxchg.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-MIMETrack: Itemize by SMTP Server on LGEKRMHUB04/LGE/LG Group(Release 8.5.3FP6|November 21, 2013)
        at 2016/06/09 09:59:58,
        Serialize by Router on LGEKRMHUB04/LGE/LG Group(Release 8.5.3FP6|November 21, 2013)
        at 2016/06/09 09:59:58,
        Serialize complete at 2016/06/09 09:59:58
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jun 08, 2016 at 11:58:12AM -0400, Johannes Weiner wrote:
> On Wed, Jun 08, 2016 at 09:06:32AM +0900, Minchan Kim wrote:
> > On Tue, Jun 07, 2016 at 10:18:18AM -0400, Johannes Weiner wrote:
> > > On Tue, Jun 07, 2016 at 09:25:50AM +0900, Minchan Kim wrote:
> > > > On Mon, Jun 06, 2016 at 03:48:27PM -0400, Johannes Weiner wrote:
> > > > > --- a/Documentation/sysctl/vm.txt
> > > > > +++ b/Documentation/sysctl/vm.txt
> > > > > @@ -771,14 +771,20 @@ with no ill effects: errors and warnings on these stats are suppressed.)
> > > > >
> > > > >  swappiness
> > > > >
> > > > > -This control is used to define how aggressive the kernel will swap
> > > > > -memory pages. Higher values will increase agressiveness, lower values
> > > > > -decrease the amount of swap. A value of 0 instructs the kernel not to
> > > > > -initiate swap until the amount of free and file-backed pages is less
> > > > > -than the high water mark in a zone.
> > > > > +This control is used to define the relative IO cost of cache misses
> > > > > +between the swap device and the filesystem as a value between 0 and
> > > > > +200. At 100, the VM assumes equal IO cost and will thus apply memory
> > > > > +pressure to the page cache and swap-backed pages equally. At 0, the
> > > > > +kernel will not initiate swap until the amount of free and file-backed
> > > > > +pages is less than the high watermark in a zone.
> > > >
> > > > Generally, I agree extending swappiness value good but not sure 200 is
> > > > enough to represent speed gap between file and swap storage in every
> > > > cases. - Just nitpick.
> > >
> > > How so? You can't give swap more weight than 100%. 200 is the maximum
> > > possible value.
> >
> > In old, swappiness is how aggressively reclaim anonymous pages in favour
> > of page cache. But when I read your description and changes about
> > swappiness in vm.txt, esp, *relative IO cost*, I feel you change swappiness
> > define to represent relative IO cost between swap storage and file storage.
> > Then, with that, we could balance anonymous and file LRU with the weight.
> >
> > For example, let's assume that in-memory swap storage is 10 times faster
> > than slow thumb drive. In that case, IO cost of 5 anonymous pages
> > swapping-in/out is equal to 1 file-backed page-discard/read.
> >
> > I thought it does make sense because measuring the speed gap between
> > those storages is easier than selecting vague swappiness tendency.
> >
> > In terms of such approach, I thought 200 is not enough to show the gap
> > because the gap is started from 100.
> > Isn't it your intention? If so, to me, the description was rather
> > misleading. :(
>
> The way swappiness works never actually changed.
>
> The only thing that changed is that we used to look at referenced
> pages (recent_rotated) and *assumed* they would likely cause IO when
> reclaimed, whereas with my patches we actually know whether they are.
> But swappiness has always been about relative IO cost of the LRUs.
>
> Swappiness defines relative IO cost between file and swap on a scale
> from 0 to 200, where 100 is the point of equality. The scale factors
> are calculated in get_scan_count() like this:
>
>   anon_prio = swappiness
>   file_prio = 200 - swappiness
>
> and those are applied to the recorded cost/value ratios like this:
>
>   ap = anon_prio * scanned / rotated
>   fp = file_prio * scanned / rotated
>
> That means if your swap device is 10 times faster than your filesystem
> device, and you thus want anon to receive 10x the refaults when the
> anon and file pages are used equally, you do this:
>
>   x + 10x = 200
>         x = 18 (ish)
>
> So your file priority is ~18 and your swap priority is the remainder
> of the range, 200 - 18. You set swappiness to 182.
>
> Now fill in the numbers while assuming all pages on both lists have
> been referenced before and will likely refault (or in the new model,
> all pages are refaulting):
>
>   fraction[anon] = ap      = 182 * 1 / 1 = 182
>   fraction[file] = fp      =  18 * 1 / 1 =  18
>      denominator = ap + fp =    182 + 18 = 200
>
> and then calculate the scan target like this:
>
>   scan[type] = (lru_size() >> priority) * fraction[type] / denominator
>
> This will scan and reclaim 9% of the file pages and 90% of the anon
> pages.
> On refault, 9% of the IO will be from the filesystem and 90%
> from the swap device.

Thanks for the detailed example.

Then, let's change the example a little bit.

A system has big HDD storage and SSD swap.

HDD: 200 IOPS
SSD: 100000 IOPS
From https://en.wikipedia.org/wiki/IOPS

So, the speed gap is 500x.
x + 500x = 200
If we use a PCIe SSD, the gap will be larger.
That's why I said 200 is not enough to represent the speed gap.
Is such a system configuration already nonsense, so it is okay to ignore
such use cases?
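
The arithmetic in this thread can be replayed with a small Python sketch. It is only an illustration of the formulas quoted above, not the kernel's get_scan_count(): the function names are hypothetical, rounding the result to the nearest integer is an assumption, and the same scanned/rotated ratio is used for both lists, as in the quoted example.

```python
def swappiness_for_ratio(ratio):
    """Solve x + ratio * x = 200 for the file priority x.

    `ratio` is how many times faster the swap device is than the
    filesystem device. Returns (exact, rounded) swappiness; rounding
    to the nearest integer is an assumption made for this sketch.
    """
    file_prio = 200.0 / (1.0 + ratio)
    exact = 200.0 - file_prio
    return exact, round(exact)


def scan_fractions(swappiness, scanned=1, rotated=1):
    """Return the (anon, file) share of the scan target.

    Mirrors the quoted formulas: ap and fp are the priorities scaled
    by scanned / rotated, and each share is ap or fp over ap + fp.
    """
    anon_prio = swappiness
    file_prio = 200 - swappiness
    ap = anon_prio * scanned / rotated
    fp = file_prio * scanned / rotated
    denominator = ap + fp
    return ap / denominator, fp / denominator


if __name__ == "__main__":
    for ratio in (10, 500):
        exact, rounded = swappiness_for_ratio(ratio)
        anon, file = scan_fractions(rounded)
        print(f"ratio {ratio}x: exact {exact:.2f}, rounded {rounded}, "
              f"anon {anon:.1%}, file {file:.1%}")
```

For the 10x case this gives a swappiness of 182. For the 500x case the exact value is about 199.6, which under the rounding assumption becomes 200 and leaves the file priority at 0 in this sketch.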