From: Gregory Haskins <ghaskins@novell.com>
To: Davide Libenzi <davidel@xmailserver.org>
Cc: mst@redhat.com, kvm@vger.kernel.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	avi@redhat.com, paulmck@linux.vnet.ibm.com,
	Ingo Molnar <mingo@elte.hu>
Subject: Re: [PATCH 3/3] eventfd: add internal reference counting to fix notifier race conditions
Date: Fri, 19 Jun 2009 17:49:51 -0400
Message-ID: <4A3C07FF.3000406@novell.com>
In-Reply-To: <alpine.DEB.1.10.0906191421581.14884@makko.or.mcafeemobile.com>

Davide Libenzi wrote:
> On Fri, 19 Jun 2009, Gregory Haskins wrote:
>
>   
>> Davide Libenzi wrote:
>>     
>>> On Fri, 19 Jun 2009, Gregory Haskins wrote:
>>>
>>>   
>>>       
>>>> eventfd currently emits a POLLHUP wakeup on f_ops->release() to generate a
>>>> notifier->release() callback.  This lets notification clients know if
>>>> the eventfd is about to go away and is very useful particularly for
>>>> in-kernel clients.  However, as it stands today it is not possible to
>>>> use the notification API in a race-free way.  This patch adds some
>>>> additional logic to the notification subsystem to rectify this problem.
>>>>
>>>> Background:
>>>> -----------------------
>>>> Eventfd currently only has one reference count mechanism: fget/fput.  This
>>>> in and of itself is normally fine.  However, if a client expects to be
>>>> notified if the eventfd is closed, it cannot hold a fget() reference
>>>> itself or the underlying f_ops->release() callback will never be invoked
>>>> by VFS.  Therefore we have this somewhat unusual situation where we may
>>>> hold a pointer to an eventfd object (by virtue of having a waiter registered
>>>> in its wait-queue), but no reference.  This makes it nearly impossible to
>>>> design a mutual decoupling algorithm: you cannot unhook one side from the
>>>> other (or vice versa) without racing.
>>>>     
>>>>         
>>> And why is that?
>>>
>>> struct xxx {
>>> 	struct mutex mtx;
>>> 	struct file *file;
>>> 	...
>>> };
>>>
>>> struct file *xxx_get_file(struct xxx *x) {
>>> 	struct file *file;
>>>
>>> 	mutex_lock(&x->mtx);
>>> 	file = x->file;
>>> 	if (!file)
>>> 		mutex_unlock(&x->mtx);
>>> 	return file;
>>> }
>>>
>>> void xxx_release_file(struct xxx *x) {
>>> 	mutex_unlock(&x->mtx);
>>> }
>>>
>>> void handle_POLLHUP(struct xxx *x) {
>>> 	struct file *file;
>>>
>>> 	file = xxx_get_file(x);
>>> 	if (file) {
>>> 		unhook_waitqueue(file, ...);
>>> 		x->file = NULL;
>>> 		xxx_release_file(x);
>>> 	}
>>> }
>>>
>>>
>>> Every time you need to "use" file, you call xxx_get_file(), and if you get
>>> NULL, it means it's gone and you handle it according to your IRQ fd
>>> policies. As soon as you are done with the file, you call xxx_release_file().
>>> Replace "mtx" with the lock that fits your needs.
>>>   
>>>       
>> Consider what would happen if the f_ops->release() was preempted inside
>> the wake_up_locked_polled() after it dereferenced the xxx from the list,
>> but before it calls the callback(POLLHUP).  The xxx object, and/or the
>> .text for the xxx object may be long gone by the time it comes back
>> around.  Afaict, there is no way to guard against that scenario unless
>> you do something like 2/3+3/3.  Or am I missing something?
>>     
>
> Right. Don't you see an easier answer to that, instead of adding 300 lines 
> of code to eventfd?
>   

I tried, but this problem made my head hurt and this was what I came up
with that I felt closes the holes all the way.  Also keep in mind that
while I added X lines to eventfd, I took Y lines *out* of irqfd in the
process, too.  I just excluded the KVM portions in this thread per your
request, so it's not apparent.  But technically, any other clients that
may come along can reuse that notification code instead of coding it
again.  One way or the other, *someone* has to do that ptable_proc stuff
;)  FYI: It's more like 133 lines, fwiw.

 fs/eventfd.c            |  104 ++++++++++++++++++++++++++++++++++++++++++++----
 include/linux/eventfd.h |   36 ++++++++++++++++
 2 files changed, 133 insertions(+), 7 deletions(-)

In case you care, here's what the complete solution currently looks like
when I include KVM:

 fs/eventfd.c            |  104 +++++++++++++++++++++++++--
 include/linux/eventfd.h |   36 +++++++++
 virt/kvm/eventfd.c      |  181 +++++++++++++++++++++++++-----------------------
 3 files changed, 228 insertions(+), 93 deletions(-)

> For example, turning wake_up_locked() into a normal wake_up().
>   

I am fairly confident it is not that simple after having thought about
this issue over the last few days.  But I've been wrong in the past. 
Propose a patch and I will review it for races/correctness, if you
like.  Perhaps a combination of that plus your asymmetrical locking
scheme would work.  One of the challenges you will hit is avoiding ABBA
between your "get" lock and the wqh, but good luck!
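
To make the ABBA concern concrete, here is a minimal userspace sketch using
pthreads.  It is only an analogy, not kernel code: "wqh_lock" stands in for
the eventfd wait-queue-head lock, "get_lock" for the client's "get" lock,
and struct client and all function names are hypothetical.  The user path
takes get_lock and then wqh_lock; the POLLHUP callback runs with wqh_lock
already held, so blocking on get_lock there would invert the order.  The
sketch sidesteps that with a trylock and a deferred flag.  Note that it
only illustrates the lock ordering; it does nothing about the lifetime
race discussed above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the eventfd wait-queue-head lock (assumption). */
static pthread_mutex_t wqh_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical client object, loosely modelled on "struct xxx". */
struct client {
	pthread_mutex_t get_lock;  /* the "get" lock, protects ->file */
	pthread_mutex_t *wqh;      /* points at the simulated wqh lock */
	void *file;                /* NULL once the eventfd is gone */
	bool hup_deferred;         /* protected by *wqh */
};

static void client_init(struct client *c, void *file)
{
	int rc = pthread_mutex_init(&c->get_lock, NULL);

	assert(rc == 0);
	(void)rc;
	c->wqh = &wqh_lock;
	c->file = file;
	c->hup_deferred = false;
}

/*
 * User path: lock order is get_lock -> wqh.
 * Returns true if the file was still there.
 */
static bool client_use_file(struct client *c)
{
	bool alive;

	pthread_mutex_lock(&c->get_lock);
	alive = (c->file != NULL);
	if (alive) {
		pthread_mutex_lock(c->wqh);
		/* ... operate on the wait queue here ... */
		pthread_mutex_unlock(c->wqh);
	}
	pthread_mutex_unlock(&c->get_lock);
	return alive;
}

/*
 * POLLHUP callback: the caller already holds wqh.  A blocking lock on
 * get_lock here would be wqh -> get_lock, i.e. the ABBA inversion.
 * Use trylock and record a deferral instead.
 * Returns true if the file pointer was cleared now.
 */
static bool client_hup_locked(struct client *c)
{
	if (pthread_mutex_trylock(&c->get_lock) != 0) {
		c->hup_deferred = true;  /* retry later, outside wqh */
		return false;
	}
	c->file = NULL;
	c->hup_deferred = false;
	pthread_mutex_unlock(&c->get_lock);
	return true;
}
```

The trylock keeps the callback from ever sleeping on get_lock while wqh is
held, at the cost of needing some later context to finish the deferred
unhook.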

-Greg


