From: Stefano Stabellini <sstabellini@kernel.org>
To: Juergen Gross <jgross@suse.com>
Cc: xen-devel@lists.xenproject.org,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	Stefano Stabellini <sstabellini@kernel.org>
Subject: Re: [PATCH] xen/pvcalls: fix potential endless loop in pvcalls-front.c
Date: Wed, 15 Nov 2017 11:09:14 -0800 (PST)
Message-ID: <alpine.DEB.2.10.1711151100490.22767@sstabellini-ThinkPad-X260>
In-Reply-To: <12bea1d0-6504-d7c6-c47c-7cd197264536@suse.com>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 4871 bytes --]

On Wed, 15 Nov 2017, Juergen Gross wrote:
> >>>         while(mutex_is_locked(&map->active.in_mutex.owner) ||
> >>>               mutex_is_locked(&map->active.out_mutex.owner))
> >>>                 cpu_relax();
> >>>
> >>> ?
> >> I'm not convinced there isn't a race.
> >>
> >> In pvcalls_front_recvmsg() sock->sk->sk_send_head is being read and only
> >> then in_mutex is taken. What happens if pvcalls_front_release() resets
> >> sk_send_head and manages to test the mutex before the mutex is locked?
> >>
> >> Even in case this is impossible: the whole construct seems to be rather
> >> fragile.

I agree it looks fragile, and I agree that it might be best to avoid
using in_mutex and out_mutex as refcounts. More comments on this below.

 
> > I think we can wait until pvcalls_refcount is 1 (i.e. it's only us) and
> > not rely on mutex state.
> 
> Yes, this would work.

Yes, I agree it would work, and for the sake of getting something into
shape for the merge window I am attaching a patch for it. Please go
ahead with it. Let me know if you need anything else immediately, and
I'll work on it ASAP.
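
For reference, the attached patch simply replaces the trylock loop in
pvcalls_front_release() with a wait on the global refcount, i.e. roughly:

	while (atomic_read(&pvcalls_refcount) > 1)
		cpu_relax();

	pvcalls_front_free_map(bedata, map);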



However, I should note that this is a pretty big hammer we are using:
the refcount is global, while we only need to wait until we are the only
user _of this specific socket_.

We really need a per-socket refcount. If we don't want to use the mutex
internal counters, then we need another one.

See the appended patch, which introduces a per-socket refcount. However,
for the merge window, using pvcalls_refcount is also fine.

The race Juergen is concerned about is only theoretically possible:

recvmsg:                                 release:
  
  test sk_send_head                      clear sk_send_head
  <only few instructions>                <prepare a message>
  grab in_mutex                          <send a message to the server>
                                         <wait for an answer>
                                         test in_mutex

Without kernel preemption it is not possible for release to clear
sk_send_head and test in_mutex after recvmsg tests sk_send_head and
before recvmsg grabs in_mutex.

But maybe we need to disable kernel preemption in recvmsg and sendmsg to
stay on the safe side?

The patch below introduces a refcount per active socket, so that we
don't have to rely on in_mutex and out_mutex for refcounting. It also
disables preemption in sendmsg and recvmsg in the window described
above.
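
To make the patch easier to follow, the entry path of recvmsg (sendmsg
is analogous) would look roughly like this with it applied; this is a
simplified sketch of the hunks below, with the actual ring accesses
elided:

	preempt_disable();
	map = (struct sock_mapping *) sock->sk->sk_send_head;
	if (!map) {
		preempt_enable();
		pvcalls_exit();
		return -ENOTSOCK;
	}
	/* take a reference before release() gets a chance to run and
	 * test the counter */
	atomic_inc(&map->active.sock_refcount);
	preempt_enable();

	mutex_lock(&map->active.in_mutex);
	/* ... copy data from the ring as before ... */
	mutex_unlock(&map->active.in_mutex);
	atomic_dec(&map->active.sock_refcount);

and pvcalls_front_release() spins on the per-socket counter instead of
the two mutexes:

	while (atomic_read(&map->active.sock_refcount) > 0)
		cpu_relax();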

I don't think this patch should go in immediately. We can take our time
to figure out the best fix.


diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index 0c1ec68..8c1030b 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -68,6 +68,7 @@ struct sock_mapping {
 			struct pvcalls_data data;
 			struct mutex in_mutex;
 			struct mutex out_mutex;
+			atomic_t sock_refcount;
 
 			wait_queue_head_t inflight_conn_req;
 		} active;
@@ -497,15 +498,20 @@ int pvcalls_front_sendmsg(struct socket *sock, struct msghdr *msg,
 	}
 	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
 
+	preempt_disable();
 	map = (struct sock_mapping *) sock->sk->sk_send_head;
 	if (!map) {
+		preempt_enable();
 		pvcalls_exit();
 		return -ENOTSOCK;
 	}
 
+	atomic_inc(&map->active.sock_refcount);
+	preempt_enable();
 	mutex_lock(&map->active.out_mutex);
 	if ((flags & MSG_DONTWAIT) && !pvcalls_front_write_todo(map)) {
 		mutex_unlock(&map->active.out_mutex);
+		atomic_dec(&map->active.sock_refcount);
 		pvcalls_exit();
 		return -EAGAIN;
 	}
@@ -528,6 +534,7 @@ int pvcalls_front_sendmsg(struct socket *sock, struct msghdr *msg,
 		tot_sent = sent;
 
 	mutex_unlock(&map->active.out_mutex);
+	atomic_dec(&map->active.sock_refcount);
 	pvcalls_exit();
 	return tot_sent;
 }
@@ -600,13 +607,17 @@ int pvcalls_front_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 	}
 	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
 
+	preempt_disable();
 	map = (struct sock_mapping *) sock->sk->sk_send_head;
 	if (!map) {
+		preempt_enable();
 		pvcalls_exit();
 		return -ENOTSOCK;
 	}
 
+	atomic_inc(&map->active.sock_refcount);
+	preempt_enable();
 	mutex_lock(&map->active.in_mutex);
 	if (len > XEN_FLEX_RING_SIZE(PVCALLS_RING_ORDER))
 		len = XEN_FLEX_RING_SIZE(PVCALLS_RING_ORDER);
 
@@ -625,6 +636,7 @@ int pvcalls_front_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 		ret = 0;
 
 	mutex_unlock(&map->active.in_mutex);
+	atomic_dec(&map->active.sock_refcount);
 	pvcalls_exit();
 	return ret;
 }
@@ -1048,8 +1060,7 @@ int pvcalls_front_release(struct socket *sock)
 		 * is set to NULL -- we only need to wait for the existing
 		 * waiters to return.
 		 */
-		while (!mutex_trylock(&map->active.in_mutex) ||
-			   !mutex_trylock(&map->active.out_mutex))
+		while (atomic_read(&map->active.sock_refcount) > 0)
 			cpu_relax();
 
 		pvcalls_front_free_map(bedata, map);

[-- Attachment #2: Type: TEXT/X-DIFF; NAME=front.patch, Size: 1174 bytes --]

xen/pvcalls: fix potential endless loop in pvcalls-front.c

mutex_trylock() returns 1 if it takes the lock and 0 if not. Assume we
take in_mutex on the first try, but we cannot take out_mutex. On the
next iteration of the loop, mutex_trylock() on in_mutex fails because we
already hold it, so the condition stays true forever. It's an endless
loop.

Solve the problem by waiting until the global refcount is 1 instead (the
refcount is 1 when the only active pvcalls frontend function is
pvcalls_front_release).

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com


diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index 0c1ec68..72a74c3 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -1048,8 +1048,7 @@ int pvcalls_front_release(struct socket *sock)
 		 * is set to NULL -- we only need to wait for the existing
 		 * waiters to return.
 		 */
-		while (!mutex_trylock(&map->active.in_mutex) ||
-			   !mutex_trylock(&map->active.out_mutex))
+		while (atomic_read(&pvcalls_refcount) > 1)
 			cpu_relax();
 
 		pvcalls_front_free_map(bedata, map);
