From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S262923AbVDHTCC (ORCPT ); Fri, 8 Apr 2005 15:02:02 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
        id S262926AbVDHTCC (ORCPT ); Fri, 8 Apr 2005 15:02:02 -0400
Received: from fire.osdl.org ([65.172.181.4]:60138 "EHLO smtp.osdl.org")
        by vger.kernel.org with ESMTP id S262923AbVDHTB7 (ORCPT );
        Fri, 8 Apr 2005 15:01:59 -0400
Date: Fri, 8 Apr 2005 12:03:49 -0700 (PDT)
From: Linus Torvalds
To: Chris Wedgwood
cc: Matthias-Christian Ott, Andrea Arcangeli, Kernel Mailing List
Subject: Re: Kernel SCM saga..
In-Reply-To: <20050408180540.GA4522@taniwha.stupidest.org>
Message-ID:
References: <20050408041341.GA8720@taniwha.stupidest.org>
        <20050408071428.GB3957@opteron.random>
        <4256AE0D.201@tiscali.de>
        <20050408171518.GA4201@taniwha.stupidest.org>
        <20050408180540.GA4522@taniwha.stupidest.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 8 Apr 2005, Chris Wedgwood wrote:
>
> Actually, I could probably make this *much* still faster with a
> caveat.  Given that my editor when I write a file will write a
> temporary file and rename it, for files in directories where nlink==2
> I can check that first and skip the stat of the individual files.

Yes, doing the stat just on the directory (on leaf directories only, of
course, but nlink==2 does say that on most filesystems) is indeed a huge
potential speedup.

It doesn't matter so much for the cached case, but it _does_ matter for
the uncached one. Makes a huge difference, in fact (I was playing with
exactly that back when I started doing "bkr" in BK/tools - three years
ago).

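A minimal sketch of the leaf-directory shortcut discussed above, in Python
for brevity. It is an illustration only, not code from git or "bkr". The
names can_skip_file_stats and files_to_stat are hypothetical, and it assumes
a filesystem where a directory without subdirectories reports nlink==2 and
where files are replaced by write-temp-then-rename (which updates the
directory mtime).

```python
import os


def can_skip_file_stats(dir_nlink, dir_mtime_ns, cached_mtime_ns):
    """Decide whether the per-file stat calls can be skipped.

    Assumption: nlink == 2 means "leaf directory" (true on many, not all,
    Unix filesystems), and an unchanged directory mtime means no entry was
    created, removed or renamed since the cached value was recorded.
    """
    return dir_nlink == 2 and dir_mtime_ns == cached_mtime_ns


def files_to_stat(dirpath, cached_mtime_ns):
    """Return (names_to_stat, current_dir_mtime_ns) for one directory.

    Caveat: a file overwritten in place does not touch the directory
    mtime, so this shortcut would miss that kind of change.
    """
    st = os.stat(dirpath)
    if can_skip_file_stats(st.st_nlink, st.st_mtime_ns, cached_mtime_ns):
        return [], st.st_mtime_ns  # one stat instead of one per file
    with os.scandir(dirpath) as entries:
        names = sorted(e.name for e in entries
                       if e.is_file(follow_symlinks=False))
    return names, st.st_mtime_ns
```

Passing None as the cached mtime forces a full listing, which is what a
first run with no cache would do.
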
It turns out that I expect to cache my source tree (at least the main
outline), and that guides my optimizations, but yes, your dir stat does
help in the case of "occasionally working with lots of large projects"
rather than "mostly working on the same ones with enough RAM to cache it
all".

And "git" is actually fairly anal in this respect: it not only stats all
files, but the index file contains a lot more of the stat info than you'd
expect. So for example, it checks both ctime and mtime to the nanosecond
(did I mention that I didn't worry too much about portability?) exactly so
that it can catch any changes except for actively malicious things.

And if you do actively malicious things in your own directory, you get
what you deserve. It's actually _hard_ to try to fool git into believing a
file hasn't changed: you need to not only replace it with the exact same
file length and ctime/mtime, you need to reuse the same inode/dev numbers
(again - I didn't worry about portability, and filesystems where those
aren't stable are a "don't do that then") and keep the mode the same. Oh,
and uid/gid, but that was mostly me being silly.

                Linus
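
For illustration, the stat comparison described in the message can be
sketched roughly as below. This is not git's actual index format or code:
the StatInfo record, stat_info and changed_fields are hypothetical names,
and the field list simply follows the ones named in the text (ctime and
mtime in nanoseconds, dev/inode, mode, uid/gid, size).

```python
import os
from typing import NamedTuple


class StatInfo(NamedTuple):
    """Hypothetical cached record of the stat fields named in the text."""
    ctime_ns: int
    mtime_ns: int
    dev: int
    ino: int
    mode: int
    uid: int
    gid: int
    size: int


def stat_info(path):
    """Collect the fields from lstat(), without following symlinks."""
    st = os.lstat(path)
    return StatInfo(st.st_ctime_ns, st.st_mtime_ns, st.st_dev, st.st_ino,
                    st.st_mode, st.st_uid, st.st_gid, st.st_size)


def changed_fields(cached, current):
    """Return the names of the fields that differ between two records.

    An empty list means the file is treated as unchanged; any difference
    means the content would have to be re-examined.
    """
    return [name
            for name, old, new in zip(StatInfo._fields, cached, current)
            if old != new]
```

A caller would store stat_info(path) when the file is recorded and later
compare it with a fresh stat_info(path). Every field has to match for the
file to count as unchanged, which is why fooling such a check takes
deliberate effort.
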