All of lore.kernel.org
 help / color / mirror / Atom feed
From: Davide Libenzi <davidel@xmailserver.org>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Avi Kivity <avi@redhat.com>, kvm@vger.kernel.org
Subject: Re: [PATCH 0/2] eventfd: new EFD_STATE flag
Date: Sun, 10 Jan 2010 11:04:23 -0800 (PST)	[thread overview]
Message-ID: <alpine.DEB.2.00.1001101102250.20563@makko.or.mcafeemobile.com> (raw)
In-Reply-To: <20100110173557.GA29350@redhat.com>

On Sun, 10 Jan 2010, Michael S. Tsirkin wrote:

> On Sun, Jan 10, 2010 at 09:27:34AM -0800, Davide Libenzi wrote:
> > On Sun, 10 Jan 2010, Michael S. Tsirkin wrote:
> > 
> > > Not sure what you mean by irqfd locking.  IMO we must take waitqueue
> > > lock, otherwise we can not prevent wait queue callback from being
> > > invoked, right? Note that if we take our own lock under wait queue
> > > callback, we won't be able to use this lock around remove_wait_queue.
> > > Also note how fs/evetfd.c uses __remove_wait_queue after taking wait
> > > queue lock - we'd like to do the same with our wait queue.
> > > 
> > > Could you clarify pls?
> > 
> > Wait a second. Looking at the current code in Linus tree, when you 
> > deassign an irqfd, you are actually destroying it, so the idea of saving 
> > the counter on deassign is not going to work.
> > 
> > 
> > 
> > - Davide
> 
> Yes, the only context passed between deassign and assign is the eventfd
> itself. So I think the following code for deassign would work provided
> eventfd_ctx_read_locked works with wait queue lock taken:
> 
>         spin_lock_irqsave(&irqfd->wqh.lock, flags);
>         eventfd_ctx_read_locked(ctx->eventfd, 1, &ucnt);
>         __remove_wait_queue(irqfd->wqh, &irqfd->wait);
>         spin_lock_irqrestore(&irqfd->wqh.lock, flags);

As I said, you cannot do that from KVM, since write-witers will nedd to be 
woke up.
Use the eventfd_ctx_remove_wait_queue() below (NOTE: compiled, not tested).



- Davide


---
 fs/eventfd.c            |   88 +++++++++++++++++++++++++++++++++++++++---------
 include/linux/eventfd.h |   15 ++++++++
 2 files changed, 88 insertions(+), 15 deletions(-)

Index: linux-2.6.mod/fs/eventfd.c
===================================================================
--- linux-2.6.mod.orig/fs/eventfd.c	2010-01-06 18:33:27.000000000 -0800
+++ linux-2.6.mod/fs/eventfd.c	2010-01-10 11:02:00.000000000 -0800
@@ -135,26 +135,71 @@ static unsigned int eventfd_poll(struct 
 	return events;
 }
 
-static ssize_t eventfd_read(struct file *file, char __user *buf, size_t count,
-			    loff_t *ppos)
+static void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt)
+{
+	*cnt = (ctx->flags & EFD_SEMAPHORE) ? 1: ctx->count;
+	ctx->count -= *cnt;
+}
+
+/**
+ * eventfd_ctx_remove_wait_queue - Read the current counter and removes wait queue.
+ * @ctx: [in] Pointer to eventfd context.
+ * @wait: [in] Wait queue to be removed.
+ * @cnt: [out] Pointer to the 64bit conter value.
+ *
+ * Returns zero if successful, or the following error codes:
+ *
+ * -EAGAIN      : The operation would have blocked.
+ *
+ * This is used to atomically remove a wait queue entry from the eventfd wait
+ * queue head, and read/reset the counter value.
+ */
+int eventfd_ctx_remove_wait_queue(struct eventfd_ctx *ctx, wait_queue_t *wait,
+				  __u64 *cnt)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&ctx->wqh.lock, flags);
+	eventfd_ctx_do_read(ctx, cnt);
+	__remove_wait_queue(&ctx->wqh, wait);
+	if (*cnt != 0 && waitqueue_active(&ctx->wqh))
+		wake_up_locked_poll(&ctx->wqh, POLLOUT);
+	spin_unlock_irqrestore(&ctx->wqh.lock, flags);
+
+	return *cnt != 0 ? 0: -EAGAIN;
+}
+EXPORT_SYMBOL_GPL(eventfd_ctx_remove_wait_queue);
+
+/**
+ * eventfd_ctx_read - Reads the eventfd counter or wait if it is zero.
+ * @ctx: [in] Pointer to eventfd context.
+ * @no_wait: [in] Different from zero if the operation should not block.
+ * @cnt: [out] Pointer to the 64bit conter value.
+ *
+ * Returns zero if successful, or the following error codes:
+ *
+ * -EAGAIN      : The operation would have blocked but @no_wait was nonzero.
+ * -ERESTARTSYS : A signal interrupted the wait operation.
+ *
+ * If @no_wait is zero, the function might sleep until the eventfd internal
+ * counter becomes greater than zero.
+ */
+ssize_t eventfd_ctx_read(struct eventfd_ctx *ctx, int no_wait, __u64 *cnt)
 {
-	struct eventfd_ctx *ctx = file->private_data;
 	ssize_t res;
-	__u64 ucnt = 0;
 	DECLARE_WAITQUEUE(wait, current);
 
-	if (count < sizeof(ucnt))
-		return -EINVAL;
 	spin_lock_irq(&ctx->wqh.lock);
+	*cnt = 0;
 	res = -EAGAIN;
 	if (ctx->count > 0)
-		res = sizeof(ucnt);
-	else if (!(file->f_flags & O_NONBLOCK)) {
+		res = 0;
+	else if (!no_wait) {
 		__add_wait_queue(&ctx->wqh, &wait);
-		for (res = 0;;) {
+		for (;;) {
 			set_current_state(TASK_INTERRUPTIBLE);
 			if (ctx->count > 0) {
-				res = sizeof(ucnt);
+				res = 0;
 				break;
 			}
 			if (signal_pending(current)) {
@@ -168,18 +213,31 @@ static ssize_t eventfd_read(struct file 
 		__remove_wait_queue(&ctx->wqh, &wait);
 		__set_current_state(TASK_RUNNING);
 	}
-	if (likely(res > 0)) {
-		ucnt = (ctx->flags & EFD_SEMAPHORE) ? 1 : ctx->count;
-		ctx->count -= ucnt;
+	if (likely(res == 0)) {
+		eventfd_ctx_do_read(ctx, cnt);
 		if (waitqueue_active(&ctx->wqh))
 			wake_up_locked_poll(&ctx->wqh, POLLOUT);
 	}
 	spin_unlock_irq(&ctx->wqh.lock);
-	if (res > 0 && put_user(ucnt, (__u64 __user *) buf))
-		return -EFAULT;
 
 	return res;
 }
+EXPORT_SYMBOL_GPL(eventfd_ctx_read);
+
+static ssize_t eventfd_read(struct file *file, char __user *buf, size_t count,
+			    loff_t *ppos)
+{
+	struct eventfd_ctx *ctx = file->private_data;
+	ssize_t res;
+	__u64 cnt;
+
+	if (count < sizeof(cnt))
+		return -EINVAL;
+	if ((res = eventfd_ctx_read(ctx, file->f_flags & O_NONBLOCK, &cnt)) < 0)
+		return res;
+
+	return put_user(cnt, (__u64 __user *) buf) ? -EFAULT: sizeof(cnt);
+}
 
 static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t count,
 			     loff_t *ppos)
Index: linux-2.6.mod/include/linux/eventfd.h
===================================================================
--- linux-2.6.mod.orig/include/linux/eventfd.h	2010-01-06 18:33:27.000000000 -0800
+++ linux-2.6.mod/include/linux/eventfd.h	2010-01-10 10:57:06.000000000 -0800
@@ -34,6 +34,9 @@ struct file *eventfd_fget(int fd);
 struct eventfd_ctx *eventfd_ctx_fdget(int fd);
 struct eventfd_ctx *eventfd_ctx_fileget(struct file *file);
 int eventfd_signal(struct eventfd_ctx *ctx, int n);
+ssize_t eventfd_ctx_read(struct eventfd_ctx *ctx, int no_wait, __u64 *cnt);
+int eventfd_ctx_remove_wait_queue(struct eventfd_ctx *ctx, wait_queue_t *wait,
+				  __u64 *cnt);
 
 #else /* CONFIG_EVENTFD */
 
@@ -61,6 +64,18 @@ static inline void eventfd_ctx_put(struc
 
 }
 
+static inline ssize_t eventfd_ctx_read(struct eventfd_ctx *ctx, int no_wait,
+				       __u64 *cnt)
+{
+	return -ENOSYS;
+}
+
+static inline int eventfd_ctx_remove_wait_queue(struct eventfd_ctx *ctx,
+						wait_queue_t *wait, __u64 *cnt)
+{
+	return -ENOSYS;
+}
+
 #endif
 
 #endif /* _LINUX_EVENTFD_H */

  reply	other threads:[~2010-01-10 19:04 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-08-20 15:56 [PATCH 0/2] eventfd: new EFD_STATE flag Michael S. Tsirkin
2009-08-20 16:20 ` Davide Libenzi
2009-08-20 17:38   ` Avi Kivity
2009-08-20 17:44     ` Davide Libenzi
2009-08-20 17:56       ` Paolo Bonzini
2009-08-21 17:21         ` Davide Libenzi
2009-08-20 17:55     ` Michael S. Tsirkin
2009-08-20 18:06       ` Avi Kivity
2009-08-20 18:28         ` Michael S. Tsirkin
2009-08-23 13:01           ` Avi Kivity
2009-08-23 13:36             ` Michael S. Tsirkin
2009-08-23 13:40               ` Avi Kivity
2009-08-23 14:30                 ` Michael S. Tsirkin
2009-08-23 16:51                   ` Paolo Bonzini
2009-08-24 18:25                   ` Davide Libenzi
2009-08-24 18:31                     ` Avi Kivity
2009-08-24 22:08                       ` Davide Libenzi
2009-08-24 22:10                         ` Paolo Bonzini
2009-08-24 22:32                           ` Davide Libenzi
2009-08-25  6:59                             ` Paolo Bonzini
2009-08-25  4:26                         ` Avi Kivity
2009-08-24 21:49                     ` Michael S. Tsirkin
2009-08-24 22:15                       ` Davide Libenzi
2009-08-25  7:22                         ` Michael S. Tsirkin
2009-08-25 21:57                           ` Davide Libenzi
2009-08-26 10:29                             ` Michael S. Tsirkin
2009-08-26 10:41                               ` Avi Kivity
2009-08-26 17:45                               ` Davide Libenzi
2009-08-26 18:58                                 ` Avi Kivity
2009-08-26 19:13                                   ` Davide Libenzi
2009-08-26 19:42                                     ` Avi Kivity
2009-08-26 19:44                                       ` Davide Libenzi
2009-08-26 23:30                                         ` Davide Libenzi
2009-08-27  4:13                                           ` Avi Kivity
2009-08-27  8:06                                             ` Michael S. Tsirkin
2009-08-27 14:20                                               ` Davide Libenzi
2009-08-26 19:50                                       ` Gleb Natapov
2009-08-26 20:04                                         ` Davide Libenzi
2009-08-27  5:25                                           ` Gleb Natapov
2009-08-27  9:05                                     ` Paolo Bonzini
2009-08-27  9:09                                       ` Michael S. Tsirkin
2009-08-27 14:21                                       ` Davide Libenzi
2009-08-27 14:30                                         ` Michael S. Tsirkin
2009-08-27 14:38                                           ` Davide Libenzi
2009-08-27 14:49                                             ` Michael S. Tsirkin
2009-08-27 15:29                                               ` Davide Libenzi
2009-08-27 17:09                                                 ` Davide Libenzi
     [not found]                                                   ` <alpine.DEB.2.00.0908311644410.17349@makko.or.mcafeemobile.com>
     [not found]                                                     ` <4A9CB318.7030401@redhat.com>
     [not found]                                                       ` <alpine.DEB.2.00.0909010723380.28172@makko.or.mcafeemobile.com>
2010-01-06 19:33                                                         ` Michael S. Tsirkin
2010-01-06 20:43                                                           ` Davide Libenzi
2010-01-06 20:55                                                             ` Michael S. Tsirkin
2010-01-06 21:17                                                               ` Davide Libenzi
2010-01-06 22:29                                                                 ` Michael S. Tsirkin
2010-01-06 22:46                                                                   ` Davide Libenzi
2010-01-06 23:45                                                                     ` Michael S. Tsirkin
2010-01-06 23:59                                                                       ` Davide Libenzi
2010-01-07  0:02                                                                         ` Michael S. Tsirkin
2010-01-07  6:45                                                                         ` Michael S. Tsirkin
2010-01-07  7:25                                                                           ` Davide Libenzi
2010-01-07 10:36                                                                             ` Michael S. Tsirkin
2010-01-07 23:37                                                                               ` Davide Libenzi
2010-01-08  0:13                                                                                 ` Davide Libenzi
2010-01-08  0:26                                                                               ` Davide Libenzi
2010-01-10 10:30                                                                                 ` Michael S. Tsirkin
2010-01-10 15:26                                                                                   ` Davide Libenzi
2010-01-10 16:22                                                                                     ` Michael S. Tsirkin
2010-01-10 17:27                                                                                       ` Davide Libenzi
2010-01-10 17:35                                                                                         ` Michael S. Tsirkin
2010-01-10 19:04                                                                                           ` Davide Libenzi [this message]
2010-01-11  7:34                                                                                             ` Michael S. Tsirkin
2010-01-11 19:14                                                                                               ` Davide Libenzi
2010-01-11 19:19                                                                                                 ` Michael S. Tsirkin
2010-01-11 22:53                                                                                                   ` Davide Libenzi
2010-01-13 17:07                                                                                                     ` Michael S. Tsirkin
2010-01-11  9:01                                                                               ` Gleb Natapov
2010-01-11  9:02                                                                                 ` Michael S. Tsirkin
2010-01-11  9:08                                                                                   ` Gleb Natapov
2010-01-11  9:19                                                                                     ` Michael S. Tsirkin
2010-01-11  9:36                                                                                       ` Gleb Natapov
2010-01-11  9:41                                                                                         ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.00.1001101102250.20563@makko.or.mcafeemobile.com \
    --to=davidel@xmailserver.org \
    --cc=avi@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=mst@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.