From: Andrey Ryabinin <aryabinin@virtuozzo.com>
To: Michal Hocko
Cc: Andrew Morton, Johannes Weiner, Vladimir Davydov, cgroups@vger.kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, Shakeel Butt
Subject: Re: [PATCH v4] mm/memcg: try harder to decrease [memory,memsw].limit_in_bytes
Date: Thu, 11 Jan 2018 18:23:57 +0300
References: <20180109152622.31ca558acb0cc25a1b14f38c@linux-foundation.org>
 <20180110124317.28887-1-aryabinin@virtuozzo.com>
 <20180111104239.GZ1732@dhcp22.suse.cz>
 <4a8f667d-c2ae-e3df-00fd-edc01afe19e1@virtuozzo.com>
 <20180111124629.GA1732@dhcp22.suse.cz>
In-Reply-To: <20180111124629.GA1732@dhcp22.suse.cz>
On 01/11/2018 03:46 PM, Michal Hocko wrote:
> On Thu 11-01-18 15:21:33, Andrey Ryabinin wrote:
>>
>>
>> On 01/11/2018 01:42 PM, Michal Hocko wrote:
>>> On Wed 10-01-18 15:43:17, Andrey Ryabinin wrote:
>>> [...]
>>>> @@ -2506,15 +2480,13 @@ static int mem_cgroup_resize_limit(struct mem_cgroup *memcg,
>>>>  		if (!ret)
>>>>  			break;
>>>>  
>>>> -		try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL, !memsw);
>>>> -
>>>> -		curusage = page_counter_read(counter);
>>>> -		/* Usage is reduced ? */
>>>> -		if (curusage >= oldusage)
>>>> -			retry_count--;
>>>> -		else
>>>> -			oldusage = curusage;
>>>> -	} while (retry_count);
>>>> +		usage = page_counter_read(counter);
>>>> +		if (!try_to_free_mem_cgroup_pages(memcg, usage - limit,
>>>> +						  GFP_KERNEL, !memsw)) {
>>>
>>> If the usage drops below limit in the meantime then you get underflow
>>> and reclaim the whole memcg. I do not think this is a good idea. This
>>> can also lead to over reclaim. Why don't you simply stick with the
>>> original SWAP_CLUSTER_MAX (aka 1 for try_to_free_mem_cgroup_pages)?
>>>
>>
>> Because, if the new limit is gigabytes below the current usage, retrying to set
>> the new limit after reclaiming only 32 pages seems unreasonable.
>
> Who would do insanity like that?

What's insane about that?

>> @@ -2487,8 +2487,8 @@ static int mem_cgroup_resize_limit(struct mem_cgroup *memcg,
>>  		if (!ret)
>>  			break;
>>  
>> -		usage = page_counter_read(counter);
>> -		if (!try_to_free_mem_cgroup_pages(memcg, usage - limit,
>> +		nr_pages = max_t(long, 1, page_counter_read(counter) - limit);
>> +		if (!try_to_free_mem_cgroup_pages(memcg, nr_pages,
>>  			GFP_KERNEL, !memsw)) {
>>  			ret = -EBUSY;
>>  			break;
>
> How does this address the over reclaim concern?

It protects against over reclaim due to underflow.

Assuming your over-reclaim concern is a situation like this:

	Task A					Task B
 mem_cgroup_resize_limit(new_limit):
     ...
     do {
         ...
         try_to_free_mem_cgroup_pages():
             //start reclaim
					free memory => drop usage below new_limit
             //end reclaim
         ...
     } while(true)

then I don't understand why this is a problem at all, or how
try_to_free_mem_cgroup_pages(1) is supposed to solve it.

First of all, this is a highly unlikely situation.
Decreasing the limit is not something that happens very often. I imagine
that freeing large amounts of memory is also not a very frequent operation;
workloads mostly consume/use resources. So this is something that should
almost never happen, and when it does, who would notice, and how? I mean,
that 'problem' has no user-visible effect.

Secondly, how can try_to_free_mem_cgroup_pages(1) help here? Task B could
simply free() right after the limit is successfully set. So it suddenly
doesn't matter whether the memory was reclaimed in baby steps or in one go;
either way we 'over reclaimed' memory that B freed.

Basically, your suggestion sounds like "let's slowly reclaim in baby steps,
and check the limit after each step in the hope that tasks in the cgroup
did some of our job and freed some memory".

So the only way to completely avoid such over reclaim would be to not
reclaim at all, and simply wait until the memory usage goes down by itself.
But we are not that crazy to do this, right?