From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759508AbZDONHv (ORCPT ); Wed, 15 Apr 2009 09:07:51 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1753025AbZDONHm (ORCPT ); Wed, 15 Apr 2009 09:07:42 -0400 Received: from mail-bw0-f169.google.com ([209.85.218.169]:48715 "EHLO mail-bw0-f169.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751919AbZDONHk (ORCPT ); Wed, 15 Apr 2009 09:07:40 -0400 DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=date:from:to:cc:subject:message-id:mail-followup-to:references :mime-version:content-type:content-disposition:in-reply-to :user-agent; b=FGJFQv30PK0ODLnUGmwy/+oXaeAB/W3qQmAsrtQnAz35T6/aFwZEgGmZ/UAYAPa8PC /lZkrh1KZ1RNByN6Jq48mfIuVOdW4+TdweF/xFFMLhv3RoZMOdZOSqviIY3y16TZvuuY OjCtya3d5ue1+Y7xqU3dtaaBbaEV4ye998Ez8= Date: Wed, 15 Apr 2009 15:07:33 +0200 From: Andrea Righi To: KAMEZAWA Hiroyuki Cc: Paul Menage , Balbir Singh , Gui Jianfeng , agk@sourceware.org, akpm@linux-foundation.org, axboe@kernel.dk, baramsori72@gmail.com, Carl Henrik Lunde , dave@linux.vnet.ibm.com, Divyesh Shah , eric.rannaud@gmail.com, fernando@oss.ntt.co.jp, Hirokazu Takahashi , Li Zefan , matt@bluehost.com, dradford@bluehost.com, ngupta@google.com, randy.dunlap@oracle.com, roberto@unbit.it, Ryo Tsuruta , Satoshi UCHIDA , subrata@linux.vnet.ibm.com, yoshikawa.takuya@oss.ntt.co.jp, containers@lists.linux-foundation.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 3/9] bio-cgroup controller Message-ID: <20090415130729.GA16735@linux> Mail-Followup-To: KAMEZAWA Hiroyuki , Paul Menage , Balbir Singh , Gui Jianfeng , agk@sourceware.org, akpm@linux-foundation.org, axboe@kernel.dk, baramsori72@gmail.com, Carl Henrik Lunde , dave@linux.vnet.ibm.com, Divyesh Shah , eric.rannaud@gmail.com, fernando@oss.ntt.co.jp, Hirokazu Takahashi , Li Zefan , matt@bluehost.com, dradford@bluehost.com, ngupta@google.com, randy.dunlap@oracle.com, roberto@unbit.it, Ryo Tsuruta , Satoshi UCHIDA , subrata@linux.vnet.ibm.com, yoshikawa.takuya@oss.ntt.co.jp, containers@lists.linux-foundation.org, linux-kernel@vger.kernel.org References: <1239740480-28125-1-git-send-email-righi.andrea@gmail.com> <1239740480-28125-4-git-send-email-righi.andrea@gmail.com> <20090415111528.b796519a.kamezawa.hiroyu@jp.fujitsu.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20090415111528.b796519a.kamezawa.hiroyu@jp.fujitsu.com> User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Apr 15, 2009 at 11:15:28AM +0900, KAMEZAWA Hiroyuki wrote: > > +/* > > + * This function is used to make a given page have the bio-cgroup id of > > + * the owner of this page. > > + */ > > +void bio_cgroup_set_owner(struct page *page, struct mm_struct *mm) > > +{ > > + struct bio_cgroup *biog; > > + struct page_cgroup *pc; > > + > > + if (bio_cgroup_disabled()) > > + return; > > + pc = lookup_page_cgroup(page); > > + if (unlikely(!pc)) > > + return; > > + > > + pc->bio_cgroup_id = 0; /* 0: default bio_cgroup id */ > > + if (!mm) > > + return; > > + /* > > + * Locking "pc" isn't necessary here since the current process is > > + * the only one that can access the members related to bio_cgroup. > > + */ > > + rcu_read_lock(); > > + biog = bio_cgroup_from_task(rcu_dereference(mm->owner)); > > + if (unlikely(!biog)) > > + goto out; > > + /* > > + * css_get(&bio->css) isn't called to increment the reference > > + * count of this bio_cgroup "biog" so pc->bio_cgroup_id might turn > > + * invalid even if this page is still active. > > + * This approach is chosen to minimize the overhead. > > + */ > > + pc->bio_cgroup_id = biog->id; > > +out: > > + rcu_read_unlock(); > > +} > > + > > +/* > > + * Change the owner of a given page if necessary. > > + */ > > +void bio_cgroup_reset_owner(struct page *page, struct mm_struct *mm) > > +{ > > + /* > > + * A little trick: > > + * Just call bio_cgroup_set_owner() for pages which are already > > + * active since the bio_cgroup_id member of page_cgroup can be > > + * updated without any locks. This is because an integer type of > > + * variable can be set a new value at once on modern cpus. > > + */ > > + bio_cgroup_set_owner(page, mm); > > +} > Hmm ? I think all operations are under lock_page() and there are no races. > Isn't it ? ehm.. no. Adding this in bio_cgroup_set_owner(): WARN_ON_ONCE(!test_bit(PG_locked, &page->flags)); produces the following: [ 1.641186] WARNING: at mm/biotrack.c:77 bio_cgroup_set_owner+0xe2/0x100() [ 1.644534] Hardware name: [ 1.646955] Modules linked in: [ 1.650526] Pid: 1, comm: swapper Not tainted 2.6.30-rc2 #77 [ 1.653499] Call Trace: [ 1.656004] [] warn_slowpath+0xd0/0x120 [ 1.659062] [] ? save_stack_trace+0x2a/0x50 [ 1.662357] [] ? save_trace+0x3f/0xb0 [ 1.670214] [] ? handle_mm_fault+0x40d/0x8b0 [ 1.673321] [] ? __lock_acquire+0x63b/0x1de0 [ 1.676446] [] ? get_lock_stats+0x2a/0x60 [ 1.679657] [] ? put_lock_stats+0xe/0x30 [ 1.682673] [] ? handle_mm_fault+0x40d/0x8b0 [ 1.685706] [] bio_cgroup_set_owner+0xe2/0x100 [ 1.688852] [] ? handle_mm_fault+0x40d/0x8b0 [ 1.692280] [] handle_mm_fault+0x432/0x8b0 [ 1.695261] [] __get_user_pages+0x12f/0x430 [ 1.703507] [] get_user_pages+0x32/0x40 [ 1.706947] [] get_arg_page+0x4b/0xb0 [ 1.710287] [] copy_strings+0xfd/0x200 [ 1.714028] [] copy_strings_kernel+0x29/0x40 [ 1.717058] [] do_execve+0x2c1/0x400 [ 1.720291] [] sys_execve+0x49/0x80 [ 1.723209] [] kernel_execve+0x68/0xd0 [ 1.726309] [] ? init_post+0x18b/0x1b0 [ 1.729585] [] kernel_init+0x198/0x1b0 [ 1.735754] [] child_rip+0xa/0x20 [ 1.738690] [] ? restore_args+0x0/0x30 [ 1.741663] [] ? kernel_init+0x0/0x1b0 [ 1.744683] [] ? child_rip+0x0/0x20 [ 1.747820] ---[ end trace b9f530261e455c85 ]--- In do_anonymous_page(), bio_cgroup_set_owner() seems to be called without lock_page() held. -Andrea