From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751921AbdKVFhD convert rfc822-to-8bit (ORCPT ); Wed, 22 Nov 2017 00:37:03 -0500 Received: from mout.gmx.net ([212.227.17.20]:59156 "EHLO mout.gmx.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751613AbdKVFg7 (ORCPT ); Wed, 22 Nov 2017 00:36:59 -0500 Message-ID: <1511329005.8762.0.camel@gmx.de> Subject: Re: [BUG] 4.4.x-rt - memcg: refill_stock() use get_cpu_light() has data corruption issue From: Mike Galbraith To: Steven Rostedt , Haiyang HY1 Tan Cc: "'bigeasy@linutronix.de'" , "'linux-rt-users@vger.kernel.org'" , "'linux-kernel@vger.kernel.org'" , Tong Tong3 Li , Feng Feng24 Liu , Jianqiang1 Lu , Hongyong HY2 Zang Date: Wed, 22 Nov 2017 06:36:45 +0100 In-Reply-To: <20171121225002.0704d679@vmware.local.home> References: <05AA4EC5C6EC1D48BE2CDCFF3AE0B8A637F78A15@CNMAILEX04.lenovo.com> <20171121225002.0704d679@vmware.local.home> Content-Type: text/plain; charset="ISO-8859-15" X-Mailer: Evolution 3.20.5 Mime-Version: 1.0 Content-Transfer-Encoding: 8BIT X-Provags-ID: V03:K0:mIM+DbjJa/56nX2TTXlkqBuArJUguP1x/IQZ5C55rVcNsnSOmAE weHpIC1+qXwe2xlcBf09rY5YyMIKDlAAw3II/GH4N02eYqljsgPuh4KnWXl1WdwVP8DGb36 dhbFxrO3aCjCfPT/OQcZWQGHFrNGVfzNwUowOWa/dl9hYL8e2RTc/LRNnsn0BjhnagPSoZF mcSeqj3VhUlcQWFZhuEyw== X-UI-Out-Filterresults: notjunk:1;V01:K0:G/H/csQAOdQ=:dwFOONhP/Cr7YpH1d497J4 FRr7R449rNK4tdH7l0AVcmtRcD42pgV9KZP9EsCA6bSv9LcoU0LzjfXsdPblBGE1Bg8k1xAvu 6hqpQw0+vWSdFgBA00mFdia4ATjfFqe4jlf34+R5K7cm8+hplmDkMjwSzFnvgjrr4HnColvTs 81v0LFmQk7kpYx9ow+A4h/aT70IqBnTVN5J8g1HTMZF8m+wLou0eHtzVV4vC3KNw/B5nIRhs0 LeE9RN87JFWZrfzHnYVujixKGmq8b8MzdkdhNwKB9DSEiHqdiF5SKcorH8BRzPCRKFKMjNXqD sda5/mK7xyqd3mQYyr7RAJjRt5UJ/eQ9gy0qGAuR4P27wPsK1ZBC3DAmnKQtjyAgQ0tT6Ye0X tLsLUUcl9EagvWCTYx+/JK83Gpiaum8rSJGSCJa3/FieftGGPvNA3X7dOa8vdwGl/pi9C+L3I u1rWPL37wc1xJxSGwg4ntUsBqjE5Mt3BcbbOIvhegTS7PL5AWztfDIJ6tFPoe0tz5Ob+P6/Gu u/Xl64bGKqqTISHAZfUClL862WI6zb9aTJ9gzYn5ruaKkmHX7EcmUBM0w60uHtx67kzc8drT4 DRjSdEEfa8MDzJYKpx2iAWK2fhvztM6RdYg9knC3BA/Vyte632gWX6jZ5RjM/1kTilSMEq1Mw Nl+BJ39/5/QlFnlHgi2E01ZshQ2iLk2EfO6kUO0c0tBjN9t0KpbcbFeghPyBSu1sEpUChUM2s 1tnGxfIIaMr76xFTDqslrbz0/uV8x8SrYiDAkDLKaCi7QNRgV+Nw4NFrQbXEjVME+ZQB5eQeJ Q/+oc601VqO2rR+UuyIE39NKWr2IQv7RgD/5rzBP1IsoXmD+zk= Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, 2017-11-21 at 22:50 -0500, Steven Rostedt wrote: > > Does it work if you revert the patch? That would restore the gripe.  How about this.. mm, memcg: serialize consume_stock(), drain_local_stock() and refill_stock() Haiyang HY1 Tan reports encountering races between drain_stock() and refill_stock(), resulting in drain_stock() draining stock freshly assigned by refill_stock(). This doesn't appear to have been safe before RT touched any of it due do drain_local_stock() being preemptible until db2ba40c277d came along and disabled irqs across the lot. Rather than do that with the upstream RT replacement with local_lock_irqsave/restore() since older trees don't yet need to be irq safe, use the local lock name and placement for consistency, but serialize with get/put_locked_var(). The below may not deserve full credit for the breakage, but it surely didn't help, so tough, it gets to wear the BPB. Reported-by: Haiyang HY1 Tan Signed-off-by: Mike Galbraith Fixes: ("mm, memcg: make refill_stock() use get_cpu_light()") --- mm/memcontrol.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1861,6 +1861,7 @@ struct memcg_stock_pcp { #define FLUSHING_CACHED_CHARGE 0 }; static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock); +static DEFINE_LOCAL_IRQ_LOCK(memcg_stock_ll); static DEFINE_MUTEX(percpu_charge_mutex); /** @@ -1882,12 +1883,12 @@ static bool consume_stock(struct mem_cgr if (nr_pages > CHARGE_BATCH) return ret; - stock = &get_cpu_var(memcg_stock); + stock = &get_locked_var(memcg_stock_ll, memcg_stock); if (memcg == stock->cached && stock->nr_pages >= nr_pages) { stock->nr_pages -= nr_pages; ret = true; } - put_cpu_var(memcg_stock); + put_locked_var(memcg_stock_ll, memcg_stock); return ret; } @@ -1914,9 +1915,12 @@ static void drain_stock(struct memcg_sto */ static void drain_local_stock(struct work_struct *dummy) { - struct memcg_stock_pcp *stock = this_cpu_ptr(&memcg_stock); + struct memcg_stock_pcp *stock; + + stock = &get_locked_var(memcg_stock_ll, memcg_stock); drain_stock(stock); clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags); + put_locked_var(memcg_stock_ll, memcg_stock); } /* @@ -1926,16 +1930,15 @@ static void drain_local_stock(struct wor static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages) { struct memcg_stock_pcp *stock; - int cpu = get_cpu_light(); - stock = &per_cpu(memcg_stock, cpu); + stock = &get_locked_var(memcg_stock_ll, memcg_stock); if (stock->cached != memcg) { /* reset if necessary */ drain_stock(stock); stock->cached = memcg; } stock->nr_pages += nr_pages; - put_cpu_light(); + put_locked_var(memcg_stock_ll, memcg_stock); } /*