From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751730Ab2LDAfE (ORCPT ); Mon, 3 Dec 2012 19:35:04 -0500
Received: from mail-qa0-f46.google.com ([209.85.216.46]:49247 "EHLO mail-qa0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751306Ab2LDAfD (ORCPT ); Mon, 3 Dec 2012 19:35:03 -0500
MIME-Version: 1.0
In-Reply-To: <20121203150110.39c204ff.akpm@linux-foundation.org>
References: <1354344987-28203-1-git-send-email-walken@google.com> <20121203150110.39c204ff.akpm@linux-foundation.org>
Date: Mon, 3 Dec 2012 16:35:01 -0800
Message-ID:
Subject: Re: [PATCH] mm: protect against concurrent vma expansion
From: Michel Lespinasse
To: Andrew Morton
Cc: linux-mm@kvack.org, Rik van Riel, Hugh Dickins, linux-kernel@vger.kernel.org
Content-Type: text/plain; charset=ISO-8859-1
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Dec 3, 2012 at 3:01 PM, Andrew Morton wrote:
> On Fri, 30 Nov 2012 22:56:27 -0800
> Michel Lespinasse wrote:
>
>> expand_stack() runs with a shared mmap_sem lock. Because of this, there
>> could be multiple concurrent stack expansions in the same mm, which may
>> cause problems in the vma gap update code.
>>
>> I propose to solve this by taking the mm->page_table_lock around such vma
>> expansions, in order to avoid the concurrency issue. We only have to worry
>> about concurrent expand_stack() calls here, since we hold a shared mmap_sem
>> lock and all vma modifications other than expand_stack() are done under
>> an exclusive mmap_sem lock.
>>
>> I previously tried to achieve the same effect by making sure all
>> growable vmas in a given mm would share the same anon_vma, which we
>> already lock here. However this turned out to be difficult - all of the
>> schemes I tried for refcounting the growable anon_vma and clearing
>> turned out ugly. So, I'm now proposing only the minimal fix.
> I think I don't understand the problem fully. Let me demonstrate:
>
> a) vma_lock_anon_vma() doesn't take a lock which is specific to
> "this" anon_vma. It takes anon_vma->root->mutex. That mutex is
> shared with vma->vm_next, yes? If so, we have no problem here?
> (which makes me suspect that the race lies other than where I think
> it lies).

So, the first thing I need to mention is that this fix is NOT for any
problem that has been reported (and in particular, not for Sasha's
trinity fuzzing issue). It's just me looking at the code and noticing
that I haven't gotten the locking right for the case of concurrent
stack expansion.

Regarding vma and vma->vm_next sharing the same root anon_vma mutex -
this will often be the case, but not always. find_mergeable_anon_vma()
will try to make it so, but it could fail if there was another vma
in-between at the time the stack's anon_vmas got assigned (either a
non-stack vma that later gets unmapped, or another stack vma that
didn't get its own anon_vma assigned yet).

> b) I can see why a broader lock is needed in expand_upwards(): it
> plays with a different vma: vma->vm_next. But expand_downwards()
> doesn't do that - it only alters "this" vma. So I'd have thought
> that vma_lock_anon_vma("this" vma) would be sufficient.

The issue there is that vma_gap_update() accesses vma->vm_prev, so the
situation is actually symmetrical with expand_upwards().

> What are the performance costs of this change?

It's expected to be small. glibc doesn't use expandable stacks for the
threads it creates, so having multiple growable stacks is actually
uncommon (another reason why the problem hasn't been observed in
practice). Because of this, I don't expect the page table lock to get
bounced between threads, so the cost of taking it should be small
(compared to the cost of delivering the #PF, let alone handling it in
software).
But yes, the initial idea of forcing all growable vmas in an mm to
share the same root anon_vma sounded much more appealing.
Unfortunately I haven't been able to make that work in a simple enough
way to be comfortable submitting it this late in the release cycle :/

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.