From mboxrd@z Thu Jan 1 00:00:00 1970
From: John David Anglin
Subject: Re: threads and fork on machine with VIPT-WB cache
Date: Sun, 11 Apr 2010 18:25:54 -0400
Message-ID: <20100411222554.GA10147@hiauly1.hia.nrc.ca>
References: <20100408215453.GA18445@hiauly1.hia.nrc.ca>
 <20100408224446.96F294FA3@hiauly1.hia.nrc.ca>
 <20100409151330.GA23889@hiauly1.hia.nrc.ca>
 <4BC0E3AD.4050802@gmx.de>
 <20100410225355.GA2812@hiauly1.hia.nrc.ca>
 <4BC219F7.5020204@gmx.de>
Reply-To: John David Anglin
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: John David Anglin, Carlos O'Donell, gniibe@fsij.org,
 linux-parisc@vger.kernel.org
To: Helge Deller
Return-path:
In-Reply-To: <4BC219F7.5020204@gmx.de>
List-ID:
List-Id: linux-parisc.vger.kernel.org

On Sun, 11 Apr 2010, Helge Deller wrote:

> Nevertheless, I still see the crashes with all kernel patches applied.
>
> What I usually do is to start up more than 8 screen sessions. In each of the
> sessions I start the bash loop:
> -> i=0; while true; do i=$(($i+1)); echo Run $i; ./minifail; done;
> and detach from the screen sessions.
> After some time, the load goes up to 8-16 and a few crashes fill the syslog.
> I'm sure the crashes are related to how much load the machine is, and how
> often process switches will happen.
> How many minifail testcases do you run in parallel?

Sigh, never more than one...

That said, I did realize last night that the cache flush in
ptep_set_wrprotect based on pte_dirty was flawed. In a SMP kernel
with a user on a different cpu pounding on the page to be write
protected, there was a race between the pte_dirty check and the write
protect.

Further, I don't believe the dirty bit is reliable. Our cmpxchg is
not atomic with respect to changes in the dirty bit. Thus, there is
a small window where a change in the dirty bit could get lost.

So for now, I think it safest to move the flush after the setting of
the write protect bit, and do it unconditionally. This should be ok
since page faults are disabled.
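
The ordering problem described above can be sketched in user space.
This is an illustration only, not kernel code: the PTE is modelled as
a C11 atomic word, PTE_WRITE and PTE_DIRTY are hypothetical bit values
(not the parisc ones), fake_flush() stands in for flush_cache_page(),
and remote_store() stands in for a thread on another cpu. The race is
replayed deterministically by placing the remote store inside the
window instead of relying on real concurrency.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit layout, for illustration only (not the parisc values). */
#define PTE_WRITE 0x1UL
#define PTE_DIRTY 0x2UL

/* Stand-in for flush_cache_page(): only counts how often it is called. */
static int flush_count;
static void fake_flush(void) { flush_count++; }

/* Models a store from another cpu: it succeeds only while the page is
 * still writable, and marks the page dirty when it does. */
static bool remote_store(atomic_ulong *pte)
{
	if (!(atomic_load(pte) & PTE_WRITE))
		return false;	/* a real store would fault here */
	atomic_fetch_or(pte, PTE_DIRTY);
	return true;
}

/* Old ordering: flush only if dirty, then write protect.  A store that
 * lands between the check and the protect is never flushed. */
static void wrprotect_check_first(atomic_ulong *pte, bool racing_store)
{
	if (atomic_load(pte) & PTE_DIRTY)
		fake_flush();
	if (racing_store)
		remote_store(pte);	/* lands in the window */
	atomic_fetch_and(pte, ~PTE_WRITE);
}

/* Proposed ordering: write protect first, then flush unconditionally.
 * Any store that succeeded did so before the flush. */
static void wrprotect_then_flush(atomic_ulong *pte, bool racing_store)
{
	if (racing_store)
		remote_store(pte);	/* last store that can succeed */
	atomic_fetch_and(pte, ~PTE_WRITE);
	assert(!(atomic_load(pte) & PTE_WRITE));
	fake_flush();
}
```

With a clean, writable entry and a racing store, the first variant
leaves the entry dirty without ever calling the flush, while the
second always flushes after the last store that could have succeeded.
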
I recognize that this will hurt performance.

I'm going to test the following on my rp3440. The flushing has
greatly improved SMP userspace stability. However, I have still seen
a few issues in the GCC testsuite. Maybe it will help your B2000.
However, let's just go one step at a time.

Dave
-- 
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)

diff --git a/arch/parisc/include/asm/pgtable.h b/arch/parisc/include/asm/pgtable.h
index a27d2e2..e85f43c 100644
--- a/arch/parisc/include/asm/pgtable.h
+++ b/arch/parisc/include/asm/pgtable.h
@@ -14,6 +14,7 @@
 #include
 #include
 #include
+extern void flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr, unsigned long pfn);
 
 /*
  * kern_addr_valid(ADDR) tests if ADDR is pointing to valid kernel
@@ -456,7 +457,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
 	return old_pte;
 }
 
-static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
+static inline void ptep_set_wrprotect(struct vm_area_struct *vma, struct mm_struct *mm, unsigned long addr, pte_t *ptep)
 {
 #ifdef CONFIG_SMP
 	unsigned long new, old;
@@ -469,6 +470,8 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr,
 	pte_t old_pte = *ptep;
 	set_pte_at(mm, addr, ptep, pte_wrprotect(old_pte));
 #endif
+
+	flush_cache_page(vma, addr, pte_pfn(*ptep));
 }
 
 #define pte_same(A,B) (pte_val(A) == pte_val(B))