From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S933016Ab2JZNuc (ORCPT ); Fri, 26 Oct 2012 09:50:32 -0400
Received: from mail-ea0-f174.google.com ([209.85.215.174]:53391
	"EHLO mail-ea0-f174.google.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S932356Ab2JZNub (ORCPT );
	Fri, 26 Oct 2012 09:50:31 -0400
Date: Fri, 26 Oct 2012 15:50:24 +0200
From: Ingo Molnar
To: Linus Torvalds
Cc: Peter Zijlstra , Rik van Riel , Andrea Arcangeli , Mel Gorman ,
	Johannes Weiner , Thomas Gleixner , Andrew Morton ,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 26/31] sched, numa, mm: Add fault driven placement and migration policy
Message-ID: <20121026135024.GA11640@gmail.com>
References: <20121025121617.617683848@chello.nl> <20121025124834.467791319@chello.nl> <20121026071532.GC8141@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20121026071532.GC8141@gmail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

* Ingo Molnar wrote:

> [
>   task_numa_work() performance side note:
>
>   We are also *very* close to be able to use down_read() instead
>   of down_write() in the sampling-unmap code in
>   task_numa_work(), as it should be safe in theory to call
>   change_protection(PROT_NONE) in parallel - but there's one
>   regression that disagrees with this theory so we use
>   down_write() at the moment.
>
>   Maybe you could help us there: can you see a reason why the
>   change_prot_none()->change_protection() call in
>   task_numa_work() can not occur in parallel to a page fault in
>   another thread on another CPU? It should be safe - yet if we
>   change it I can see occasional corruption of user-space state:
>   segfaults and register corruption.
> ]

Oh, just found the reason:

the ptep_modify_prot_start()/modify()/commit() sequence is
SMP-unsafe - it has to be done under the mmap_sem write-locked.

It is safe against *hardware* updates to the PTE, but not safe
against itself.

This is apparently a hidden cost of paravirt, it is forcing that
weird sequence and thus the down_write() ...

Thanks,

	Ingo