From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752847AbbIKQ7s (ORCPT ); Fri, 11 Sep 2015 12:59:48 -0400 Received: from mail-ob0-f175.google.com ([209.85.214.175]:35686 "EHLO mail-ob0-f175.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751514AbbIKQ7o (ORCPT ); Fri, 11 Sep 2015 12:59:44 -0400 Date: Sat, 12 Sep 2015 00:59:22 +0800 From: Boqun Feng To: Oleg Nesterov Cc: "Paul E. McKenney" , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, Jonathan Corbet , Michal Hocko , Peter Zijlstra , David Howells , Linus Torvalds Subject: Re: [PATCH] Documentation: Remove misleading examples of the barriers in wake_*() Message-ID: <20150911165922.GC1521@fixme-laptop.cn.ibm.com> References: <1441674841-11498-1-git-send-email-boqun.feng@gmail.com> <20150909192822.GM4029@linux.vnet.ibm.com> <20150910021612.GC18494@fixme-laptop.cn.ibm.com> <20150910175557.GA20640@redhat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="+pHx0qQiF2pBVqBT" Content-Disposition: inline In-Reply-To: <20150910175557.GA20640@redhat.com> User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --+pHx0qQiF2pBVqBT Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Hi Oleg, On Thu, Sep 10, 2015 at 07:55:57PM +0200, Oleg Nesterov wrote: > On 09/10, Boqun Feng wrote: > > > > On Wed, Sep 09, 2015 at 12:28:22PM -0700, Paul E. McKenney wrote: > > > My feeling is > > > that we should avoid saying too much about the internals of wait_even= t() > > > and wake_up(). >=20 > I feel the same. I simply can't understand what we are trying to > document ;) >=20 What I think we should document here is what memory ordering guarantee we can rely on with these sleep/wakeup primitives, or what kind of barriers these primitives imply. Because the structure of the memory-barriers.txt here is: (*) Implicit kernel memory barriers. - Locking functions. - Interrupt disabling functions. ->- Sleep and wake-up functions.<- - Miscellaneous functions. > For example, >=20 > > A STORE-LOAD barrier is implied after setting task state by wait-relate= d functions: > > > > prepare_to_wait(); > > prepare_to_wait_exclusive(); > > prepare_to_wait_event(); >=20 > I won't argue, but to me this looks misleading too. >=20 > Yes, prepare_to_wait()->set_current_state() implies mb() and thus > a STORE-LOAD barrier. >=20 > But this has nothing to do with the explanation above. We do not > need this barrier to avoid the race with wake_up(). Again, again, > we can safely rely on wq->lock and acquire/release semantics. >=20 Yes, you are right. prepare_to_wait*() should be put here. What should be put here is set_current_state(), whose STORE-LOAD barrier pairs with the STORE-LOAD barrier of wake_up_process(). > This barrier is only needed if you do, say, >=20 > CONDITION =3D 1; >=20 > if (waitqueue_active(wq)) > wake_up(wq); >=20 > And note that the code above is wrong without another mb() after > CONDITION =3D 1. >=20 Understood, I admit I didn't realize this before. To summarize, we have three kinds of data related to sleep/wakeup: * CONDITIONs: global data used to indicate events * task states * wait queues(may not be used, if users use set_current_state() + schedule() to sleep and wake_up_process() to wake up) IIUC, the race on wait queues are almost avoided because of wq->locks, and if a wait queue is used, race on task states are avoided because states are readed and written with a wq->lock held in sleep/wakeup functions. So only in two cases we need STORE-LOAD barriers to avoid the race: 1. no wait queue used(e.g. rcu_boost_kthread), we need STORE-LOAD to order accesses to task states and CONDITIONs, so we have barriers in wake_up_process() and set_current_state(). 2. wait queue accessed without a wq->lock held(e.g. your example), we need STORE-LOAD to order accesses to wait queues and CONDITIONs Since case #1 still exists in kernel, we'd better keep this section in memory-barriers.txt, however, I'm not sure whether we should mention case #2 in this section. Here is a modified version, without mentioning case #2: SLEEP AND WAKE-UP FUNCTIONS --------------------------- Sleeping and waking on an event flagged in global data can be viewed as an interaction between two pieces of data: the task state of the task waiting = for the event and the global data used to indicate the event. To make sure that these appear to happen in the right order, the primitives to begin the proc= ess of going to sleep, and the primitives to initiate a wake up imply certain barriers. If a wait queue is used, all accesses to task states are protected by the l= ock of the wait queue, so the race on task states are avoided. However, if no w= ait queue used, we need some memory ordering guantanee to avoid the race between sleepers/wakees and wakers. The memory ordering requirement here can be expressed by two STORE-LOAD barriers(barriers which can guarantee a STORE perceding it can never be reordered after a LOAD following it). One STORE-LOAD barrier is needed on t= he sleeper/wakee side, before reading a variable used to indicate the event and after setting the state of the current task. Another STORE-LOAD barrier is needed on the waker side, before reading the state of the wakee task and af= ter setting a variable used to indicate the event. These two barriers can pair = with each other to avoid race conditions between sleepers/wakees and wakers: sleepr/wakee on CPU 1 waker on CPU 2 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D = =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D { wakee->state =3D TASK_RUNNING, event_indicated =3D 0 } STORE current->state=3DTASK_INTERRUPTIBLE c =3D LOAD event_indicated=09 STORE event_indicated=3D1 s =3D LOAD wakee->state=09 assert(!(c=3D=3D0 && s =3D=3D TASK_RUNNING)); A STORE-LOAD barrier is implied after setting task state in set_current_sta= te() and before reading task state in wake_up_process() Make sure call set_current_state() before read the global data used to indi= cate event and sleep, and call wake_up_process() after set the global data used = to indicate a event. Regards, Boqun --+pHx0qQiF2pBVqBT Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABCAAGBQJV8whjAAoJEEl56MO1B/q4Qt4H/3ktPGkTE9rBpXc/q6+dqdnc PDl0vwNlC2V5RZr45B9Oir9zb26yattTGNm5qI8bh/JZ6gs6yoyz/fJrHS4IUmS9 iDkvtoQGu6x5axmiK1wKu2cpZHg/fPTtuQ2xWrh5EPTn1LK7d9lJJL+i2dnNZL9e 10XidtXVLKYQHSBA/PFi9dJhwvAgyIm00uVmHjPUvQ67/kiz+yteceuG8LJMeybC pAxrl1kxgTy1vzC+ccBoPgzKoeyPoOm0oXRqjp8493bihtwtXSLbfk+kUEvA9VNT dNzhhQUX9oQSW0aMKO6zF9XnfzhM6JSm5Xlbw3QBST09J3/Og7bl8ZaPh5+Mi7s= =g9jN -----END PGP SIGNATURE----- --+pHx0qQiF2pBVqBT--