From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S263839AbTH1Ilo (ORCPT ); Thu, 28 Aug 2003 04:41:44 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S263847AbTH1Iln (ORCPT ); Thu, 28 Aug 2003 04:41:43 -0400 Received: from dp.samba.org ([66.70.73.150]:42700 "EHLO lists.samba.org") by vger.kernel.org with ESMTP id S263839AbTH1IEE (ORCPT ); Thu, 28 Aug 2003 04:04:04 -0400 From: Rusty Russell To: Andrew Morton Cc: mingo@redhat.com, linux-kernel@vger.kernel.org Subject: Re: [PATCH 2/2] Futex non-page-pinning fix In-reply-to: Your message of "Tue, 26 Aug 2003 23:20:39 MST." <20030826232039.626f5a4c.akpm@osdl.org> Date: Thu, 28 Aug 2003 10:47:36 +1000 Message-Id: <20030828080403.DC3272C640@lists.samba.org> Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org In message <20030826232039.626f5a4c.akpm@osdl.org> you write: > > I assumed that for non-anonymous pages the mapping + index was always > > a unique identifier, even as they were swapped out. We need a > > persistent unique identifier for a page, OR a callback to > > unhash/rehash it when the identifier changes. Hence mapping + index > > where mapping != NULL, and the struct page and callbacks for swap > > pages. Using the callbacks for wherever else page->mapping changes is > > simple (but may be slow). > > swap_writepage() and end_swap_bio_read() are not really companion > functions. The page is in use and may be mapped into user pagetables > during swap_writepage(). It won't actually be freed up for a very long > time, if at all. > > I guess this means that there could be a large number of futexes which are > considered "swapped out" which are in fact not swapped out at all. > > I'm starting to dimly understand what this code does. You get 2/10 for > patch explanation ;) Heh, you get 4/10 for reading comprehension. See: "Name: Futexes without pinning pages" at the top of the patch. lkml trains people to use Alan Cox-style maximal density descriptions, to avoid being accused of insulting colleagues' intelligence (or being accused of not being 31337 enough). You seem to be cut of a different cloth. I will try to be more verbose here, and I think we'll all be better for it. > I think a better place to rehash the futex would be at the point where the > page is added to and removed from swapcache. This is simplest: the current code actually moves the futex queue out of the hash. If we make the rule: "call futex_rehash" every time page->mapping (or page->index) changes, we avoid races and make the code simpler. But this means it could be called quite often. One answer is to restrict the futex hashing so we don't have to search the entire table. Another is to have a separate futexed pages hash. The third is to implement the "futex_was_here" bit in the page->flags, which I think will work well in practice. I'll implement it as a separate patch however. > Similarly, all places which change the page's hash keys (mapping and index) > need to be locked against the futex lookup code. Yes. I'll look at everywhere that mapping is changed: thanks for the hints [snipped]. > Please make sure it builds with CONFIG_SWAP=n > > Please make sure it builds with CONFIG_FUTEX=n (sorry) Will do. > Please augment the lock ranking comment at the top of filemap.c Yes. > If a futex resides in a pagecache page which is then truncated, a > futex_wake() should really send the caller a SIGBUS; it looks like the code > will return -EFAULT, which is good enough. Any waiters on that futex will > not be wakeable, but they will be killable. It would be nice, but it's not worth more than a couple of lines of kernel code. There's a similar case where one thread is waiting in an mmapped file and it is unmapped by the other thread. It's a programmer bug since obviously noone can now wake the futex you're waiting on... Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell.