From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1757267Ab3BRTuk (ORCPT );
	Mon, 18 Feb 2013 14:50:40 -0500
Received: from e33.co.us.ibm.com ([32.97.110.151]:35453 "EHLO e33.co.us.ibm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1757212Ab3BRTui (ORCPT );
	Mon, 18 Feb 2013 14:50:38 -0500
Message-ID: <512285C4.4050809@linux.vnet.ibm.com>
Date: Mon, 18 Feb 2013 11:49:24 -0800
From: Cody P Schafer 
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130106 Thunderbird/17.0.2
MIME-Version: 1.0
To: Seth Jennings 
CC: Ric Mason , Andrew Morton , Greg Kroah-Hartman , Nitin Gupta ,
	Minchan Kim , Konrad Rzeszutek Wilk , Dan Magenheimer ,
	Robert Jennings , Jenifer Hopper , Mel Gorman , Johannes Weiner ,
	Rik van Riel , Larry Woodman , Benjamin Herrenschmidt ,
	Dave Hansen , Joe Perches , linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, devel@driverdev.osuosl.org
Subject: Re: [PATCHv5 4/8] zswap: add to mm/
References: <1360780731-11708-1-git-send-email-sjenning@linux.vnet.ibm.com>
	<1360780731-11708-5-git-send-email-sjenning@linux.vnet.ibm.com>
	<511F0536.5030802@gmail.com> <51227FDA.7040000@linux.vnet.ibm.com>
In-Reply-To: <51227FDA.7040000@linux.vnet.ibm.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 13021819-2398-0000-0000-0000112F4B03
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On 02/18/2013 11:24 AM, Seth Jennings wrote:
> On 02/15/2013 10:04 PM, Ric Mason wrote:
>> On 02/14/2013 02:38 AM, Seth Jennings wrote:
>
>>> +/* invalidates all pages for the given swap type */
>>> +static void zswap_frontswap_invalidate_area(unsigned type)
>>> +{
>>> +	struct zswap_tree *tree = zswap_trees[type];
>>> +	struct rb_node *node, *next;
>>> +	struct zswap_entry *entry;
>>> +
>>> +	if (!tree)
>>> +		return;
>>> +
>>> +	/* walk the tree and free
everything */
>>> +	spin_lock(&tree->lock);
>>> +	node = rb_first(&tree->rbroot);
>>> +	while (node) {
>>> +		entry = rb_entry(node, struct zswap_entry, rbnode);
>>> +		zs_free(tree->pool, entry->handle);
>>> +		next = rb_next(node);
>>> +		zswap_entry_cache_free(entry);
>>> +		node = next;
>>> +	}
>>> +	tree->rbroot = RB_ROOT;
>>
>> Why don't we need rb_erase() for every node?
>
> We are freeing the entire tree here. try_to_unuse() in the swapoff
> syscall should have already emptied the tree, but this is here for
> completeness.
>
> rb_erase() will do things like rebalancing the tree; something that
> just wastes time since we are in the process of freeing the whole
> tree. We are holding the tree lock here so we are sure that no one
> else is accessing the tree while it is in this transient broken state.

If we have a sub-tree like:

     ...
    /
   A
  / \
 B   C

B == rb_first(tree)
A == rb_next(B)
C == rb_next(A)

The current code frees A (via zswap_entry_cache_free()) prior to
examining C, so the rb_next(C) call on the final iteration walks back
up through A and is a use-after-free of A.

You can solve this by doing a post-order traversal of the tree, either

a) in the destructive manner used in a number of filesystems; see
   fs/ubifs/orphan.c ubifs_add_orphan(), for example,

b) or by doing something similar to this commit:
   https://github.com/jmesmon/linux/commit/d9e43aaf9e8a447d6802531d95a1767532339fad
   which I've been using for some yet-to-be-merged code.