From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S261288AbULWU1o (ORCPT ); Thu, 23 Dec 2004 15:27:44 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S261228AbULWU1n (ORCPT ); Thu, 23 Dec 2004 15:27:43 -0500 Received: from news.suse.de ([195.135.220.2]:56229 "EHLO Cantor.suse.de") by vger.kernel.org with ESMTP id S261288AbULWU1j (ORCPT ); Thu, 23 Dec 2004 15:27:39 -0500 To: Christoph Lameter Cc: linux-kernel@vger.kernel.org Subject: Re: Prezeroing V2 [0/3]: Why and When it works References: <41C20E3E.3070209@yahoo.com.au.suse.lists.linux.kernel> From: Andi Kleen Date: 23 Dec 2004 21:27:38 +0100 In-Reply-To: Message-ID: User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Christoph Lameter writes: > and why other approaches have not worked. > o Instead of zero_page(p,order) extend clear_page to take second argument > o Update all architectures to accept second argument for clear_pages Sorry if there was a miscommunication, but ... > 1. Aggregating zeroing operations to only apply to pages of higher order, > which results in many pages that will later become order 0 to be > zeroed in one go. For that purpose the existing clear_page function is > extended and made to take an additional argument specifying the order of > the page to be cleared. But if you do that you should really use a separate function that can use cache bypassing stores. Normal clear_page cannot use that because it would be a loss when the data is soon used. So the two changes don't really make sense. Also I must say I'm still suspicious regarding your heuristic to trigger gang faulting - with bad luck it could lead to a lot more memory usage to specific applications that do very sparse usage of memory. There should be at least an madvise flag to turn it off and a sysctl and it would be better to trigger only on a longer sequence of consecutive faulted pages. > 2. Hardware support for offloading zeroing from the cpu. This avoids > the invalidation of the cpu caches by extensive zeroing operations. > > The result is a significant increase of the page fault performance even for > single threaded applications: [...] How about some numbers on i386? -Andi