* [RFC] replace ->get_poll_head with a waitqueue pointer in struct file
@ 2018-06-28 14:20 ` Christoph Hellwig
  0 siblings, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2018-06-28 14:20 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Linus Torvalds, linux-fsdevel, netdev, lkp

Introducing the new poll methods turned up a regression in the
will-it-scale ltp tests.  One reason for that is that indirect function
calls have become very expensive with the Spectre mitigations.  I'm waiting
for better numbers, but this series has shown a 5% improvement in the
ops per second so far, while for the get_poll_head addition we had
regressions of 3.7% or 8.8% depending on the measurement.

This series removes the get_poll_head method again and instead stores an
optional wait_queue_head pointer in struct file, on which the poll_mask
method can be used if it is set.  The only complication is the networking
poll code, which not only does some interesting gymnastics to get at the
wait queue pointer, but also has a mode to do hardware busy polling before
waiting for an event from poll or epoll.  Because of that, this series has
a few net prep patches that need careful review.
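
To illustrate, with the full series applied a driver that wants the new
interface just points file->f_poll_head at its waitqueue (typically at
open time) and implements ->poll_mask.  A minimal sketch with purely
hypothetical names (foo_wait, foo_data_ready), modelled on the random.c
conversion in the last patch:

/* hypothetical example driver, not part of this series */
static DECLARE_WAIT_QUEUE_HEAD(foo_wait);

static int foo_open(struct inode *inode, struct file *file)
{
	/* the waitqueue must live at least as long as the struct file */
	file->f_poll_head = &foo_wait;
	return 0;
}

static __poll_t foo_poll_mask(struct file *file, __poll_t events)
{
	/* foo_data_ready() stands in for the driver's own readiness check */
	return foo_data_ready() ? EPOLLIN | EPOLLRDNORM : 0;
}

static const struct file_operations foo_fops = {
	.owner		= THIS_MODULE,
	.open		= foo_open,
	.poll_mask	= foo_poll_mask,
};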

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/6] net: remove sock_poll_busy_flag
  2018-06-28 14:20 ` Christoph Hellwig
@ 2018-06-28 14:20   ` Christoph Hellwig
  -1 siblings, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2018-06-28 14:20 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Linus Torvalds, linux-fsdevel, netdev, lkp

sock_poll_busy_flag can simply be inlined into its only caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/net/busy_poll.h | 6 ------
 net/socket.c            | 5 ++++-
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index c5187438af38..4a459a0d70d1 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -130,12 +130,6 @@ static inline void sock_poll_busy_loop(struct socket *sock, __poll_t events)
 	}
 }
 
-/* if this socket can poll_ll, tell the system call */
-static inline __poll_t sock_poll_busy_flag(struct socket *sock)
-{
-	return sk_can_busy_loop(sock->sk) ? POLL_BUSY_LOOP : 0;
-}
-
 /* used in the NIC receive handler to mark the skb */
 static inline void skb_mark_napi_id(struct sk_buff *skb,
 				    struct napi_struct *napi)
diff --git a/net/socket.c b/net/socket.c
index 8a109012608a..0f397fa33614 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1171,7 +1171,10 @@ static __poll_t sock_poll(struct file *file, poll_table *wait)
 		mask = sock->ops->poll_mask(sock, events);
 	}
 
-	return mask | sock_poll_busy_flag(sock);
+	/* this socket can poll_ll so tell the system call */
+	if (sk_can_busy_loop(sock->sk))
+		mask |= POLL_BUSY_LOOP;
+	return mask;
 }
 
 static int sock_mmap(struct file *file, struct vm_area_struct *vma)
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 2/6] net: remove bogus RCU annotations on socket.wq
  2018-06-28 14:20 ` Christoph Hellwig
@ 2018-06-28 14:20   ` Christoph Hellwig
  -1 siblings, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2018-06-28 14:20 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Linus Torvalds, linux-fsdevel, netdev, lkp

We never use RCU protection for it, just a lot of cargo-cult
rcu_dereference_protected calls.

Note that we do keep the kfree_rcu call for it, as the references through
struct sock are RCU protected and thus might require a grace period before
freeing.
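
For reference, the reader side that still depends on the grace period goes
through struct sock, roughly like this (simplified from sock_def_readable()
in net/core/sock.c; the real function wakes with the full readable mask and
also handles fasync notification):

static void sock_def_readable(struct sock *sk)
{
	struct socket_wq *wq;

	rcu_read_lock();
	wq = rcu_dereference(sk->sk_wq);	/* sk_wq is RCU protected */
	if (skwq_has_sleeper(wq))
		wake_up_interruptible_sync_poll(&wq->wait, EPOLLIN);
	rcu_read_unlock();
}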

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/net.h |  2 +-
 include/net/sock.h  |  2 +-
 net/socket.c        | 10 ++++------
 3 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/include/linux/net.h b/include/linux/net.h
index 08b6eb964dd6..2be0d5368315 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -114,7 +114,7 @@ struct socket {
 
 	unsigned long		flags;
 
-	struct socket_wq __rcu	*wq;
+	struct socket_wq	*wq;
 
 	struct file		*file;
 	struct sock		*sk;
diff --git a/include/net/sock.h b/include/net/sock.h
index b3b75419eafe..ad85d37c83c8 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1725,7 +1725,7 @@ static inline void sock_graft(struct sock *sk, struct socket *parent)
 {
 	WARN_ON(parent->sk);
 	write_lock_bh(&sk->sk_callback_lock);
-	sk->sk_wq = parent->wq;
+	rcu_assign_pointer(sk->sk_wq, parent->wq);
 	parent->sk = sk;
 	sk_set_socket(sk, parent);
 	sk->sk_uid = SOCK_INODE(parent)->i_uid;
diff --git a/net/socket.c b/net/socket.c
index 0f397fa33614..fe6620607b07 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -255,7 +255,7 @@ static struct inode *sock_alloc_inode(struct super_block *sb)
 	init_waitqueue_head(&wq->wait);
 	wq->fasync_list = NULL;
 	wq->flags = 0;
-	RCU_INIT_POINTER(ei->socket.wq, wq);
+	ei->socket.wq = wq;
 
 	ei->socket.state = SS_UNCONNECTED;
 	ei->socket.flags = 0;
@@ -269,11 +269,9 @@ static struct inode *sock_alloc_inode(struct super_block *sb)
 static void sock_destroy_inode(struct inode *inode)
 {
 	struct socket_alloc *ei;
-	struct socket_wq *wq;
 
 	ei = container_of(inode, struct socket_alloc, vfs_inode);
-	wq = rcu_dereference_protected(ei->socket.wq, 1);
-	kfree_rcu(wq, rcu);
+	kfree_rcu(ei->socket.wq, rcu);
 	kmem_cache_free(sock_inode_cachep, ei);
 }
 
@@ -607,7 +605,7 @@ static void __sock_release(struct socket *sock, struct inode *inode)
 		module_put(owner);
 	}
 
-	if (rcu_dereference_protected(sock->wq, 1)->fasync_list)
+	if (sock->wq->fasync_list)
 		pr_err("%s: fasync list not empty!\n", __func__);
 
 	if (!sock->file) {
@@ -1211,7 +1209,7 @@ static int sock_fasync(int fd, struct file *filp, int on)
 		return -EINVAL;
 
 	lock_sock(sk);
-	wq = rcu_dereference_protected(sock->wq, lockdep_sock_is_held(sk));
+	wq = sock->wq;
 	fasync_helper(fd, filp, on, &wq->fasync_list);
 
 	if (!wq->fasync_list)
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 3/6] net: don't detour through struct to find the poll head
  2018-06-28 14:20 ` Christoph Hellwig
@ 2018-06-28 14:20   ` Christoph Hellwig
  -1 siblings, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2018-06-28 14:20 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Linus Torvalds, linux-fsdevel, netdev, lkp

As far as I can tell sock->sk->sk_wq->wait is always the same waitqueue as
sock->wq->wait.  That means we can use the shorter dereference chain and
not worry about any RCU protection.
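
For reference, sk_sleep() currently expands to roughly the following
(include/net/sock.h), and sock_init_data() points sk->sk_wq at sock->wq,
so both chains end up at the same wait_queue_head:

static inline wait_queue_head_t *sk_sleep(struct sock *sk)
{
	BUILD_BUG_ON(offsetof(struct socket_wq, wait) != 0);
	return &rcu_dereference_raw(sk->sk_wq)->wait;
}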

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 net/socket.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/socket.c b/net/socket.c
index fe6620607b07..7cf037d21805 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1136,7 +1136,7 @@ static struct wait_queue_head *sock_get_poll_head(struct file *file,
 	if (!sock->ops->poll_mask)
 		return NULL;
 	sock_poll_busy_loop(sock, events);
-	return sk_sleep(sock->sk);
+	return &sock->wq->wait;
 }
 
 static __poll_t sock_poll_mask(struct file *file, __poll_t events)
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 4/6] net: remove busy polling from sock_get_poll_head
  2018-06-28 14:20 ` Christoph Hellwig
@ 2018-06-28 14:20   ` Christoph Hellwig
  -1 siblings, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2018-06-28 14:20 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Linus Torvalds, linux-fsdevel, netdev, lkp

Busy polling always comes from a synchronous poll context, so for now we
can assume that it calls ->poll if present.  Move the busy polling in
sock_poll to the common block and remove it from sock_get_poll_head to
prepare for the removal of the get_poll_head method.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 net/socket.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/socket.c b/net/socket.c
index 7cf037d21805..ca300c97b7ba 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1135,7 +1135,6 @@ static struct wait_queue_head *sock_get_poll_head(struct file *file,
 
 	if (!sock->ops->poll_mask)
 		return NULL;
-	sock_poll_busy_loop(sock, events);
 	return &sock->wq->wait;
 }
 
@@ -1161,8 +1160,9 @@ static __poll_t sock_poll(struct file *file, poll_table *wait)
 	struct socket *sock = file->private_data;
 	__poll_t events = poll_requested_events(wait), mask = 0;
 
+	sock_poll_busy_loop(sock, events);
+
 	if (sock->ops->poll) {
-		sock_poll_busy_loop(sock, events);
 		mask = sock->ops->poll(file, sock, wait);
 	} else if (sock->ops->poll_mask) {
 		sock_poll_wait(file, sock_get_poll_head(file, events), wait);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 5/6] net: remove sock_poll_busy_loop
  2018-06-28 14:20 ` Christoph Hellwig
@ 2018-06-28 14:20   ` Christoph Hellwig
  -1 siblings, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2018-06-28 14:20 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Linus Torvalds, linux-fsdevel, netdev, lkp

There is no point in hiding this logic in a helper.  Also remove the
useless events != 0 check, and reorder the remaining ones so that the
cheaper test comes first.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/net/busy_poll.h | 9 ---------
 net/socket.c            | 4 +++-
 2 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index 4a459a0d70d1..71c72a939bf8 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -121,15 +121,6 @@ static inline void sk_busy_loop(struct sock *sk, int nonblock)
 #endif
 }
 
-static inline void sock_poll_busy_loop(struct socket *sock, __poll_t events)
-{
-	if (sk_can_busy_loop(sock->sk) &&
-	    events && (events & POLL_BUSY_LOOP)) {
-		/* once, only if requested by syscall */
-		sk_busy_loop(sock->sk, 1);
-	}
-}
-
 /* used in the NIC receive handler to mark the skb */
 static inline void skb_mark_napi_id(struct sk_buff *skb,
 				    struct napi_struct *napi)
diff --git a/net/socket.c b/net/socket.c
index ca300c97b7ba..4354afe8ad96 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1160,7 +1160,9 @@ static __poll_t sock_poll(struct file *file, poll_table *wait)
 	struct socket *sock = file->private_data;
 	__poll_t events = poll_requested_events(wait), mask = 0;
 
-	sock_poll_busy_loop(sock, events);
+	/* poll once if requested by the syscall */
+	if ((events & POLL_BUSY_LOOP) && sk_can_busy_loop(sock->sk))
+		sk_busy_loop(sock->sk, 1);
 
 	if (sock->ops->poll) {
 		mask = sock->ops->poll(file, sock, wait);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
  2018-06-28 14:20 ` Christoph Hellwig
@ 2018-06-28 14:20   ` Christoph Hellwig
  -1 siblings, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2018-06-28 14:20 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Linus Torvalds, linux-fsdevel, netdev, lkp

The doubling of the indirect calls caused too big a regression for some
benchmarks in our post-Spectre world.

With some of the nastiness in the networking code moved out of the way
we can instead stick a pointer to a waitqueue into struct file and
avoid one of the indirect calls.  This actually happens to simplify
the code quite a bit as well.

Note that this removes the possibility of actually returning an
error before waiting in poll.  We could still do this with an ERR_PTR
in f_poll_head with a little bit of WRITE_ONCE/READ_ONCE magic, but
I'd like to defer that until actually required.
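
If that ever becomes necessary, a purely hypothetical variant of vfs_poll()
(not part of this series) could look like this:

__poll_t vfs_poll(struct file *file, struct poll_table_struct *pt)
{
	/* hypothetical: f_poll_head may carry an ERR_PTR instead of a waitqueue */
	struct wait_queue_head *head = READ_ONCE(file->f_poll_head);

	if (file->f_op->poll)
		return file->f_op->poll(file, pt);
	if (!head)
		return DEFAULT_POLLMASK;
	if (IS_ERR(head))
		return EPOLLERR;
	if (pt && pt->_qproc)
		pt->_qproc(file, head, pt);
	return file->f_op->poll_mask(file, poll_requested_events(pt));
}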

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 Documentation/filesystems/Locking |  4 +---
 Documentation/filesystems/vfs.txt | 15 +++++---------
 drivers/char/random.c             |  8 ++++----
 fs/aio.c                          | 22 ++++++---------------
 fs/eventfd.c                      | 33 +++++++++++++++++++------------
 fs/eventpoll.c                    |  9 +--------
 fs/pipe.c                         | 12 +++--------
 fs/select.c                       | 18 ++++-------------
 fs/timerfd.c                      | 31 +++++++++++++++++------------
 include/linux/fs.h                |  2 +-
 include/linux/poll.h              |  7 +------
 net/socket.c                      | 17 +++-------------
 12 files changed, 67 insertions(+), 111 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 2c391338c675..4d183e8258b6 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -441,7 +441,6 @@ prototypes:
 	int (*iterate) (struct file *, struct dir_context *);
 	int (*iterate_shared) (struct file *, struct dir_context *);
 	__poll_t (*poll) (struct file *, struct poll_table_struct *);
-	struct wait_queue_head * (*get_poll_head)(struct file *, __poll_t);
 	__poll_t (*poll_mask) (struct file *, __poll_t);
 	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
 	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
@@ -505,8 +504,7 @@ in sys_read() and friends.
 the lease within the individual filesystem to record the result of the
 operation
 
-->poll_mask can be called with or without the waitqueue lock for the waitqueue
-returned from ->get_poll_head.
+->poll_mask can be called with or without the waitqueue lock
 
 --------------------------- dquot_operations -------------------------------
 prototypes:
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 829a7b7857a4..2d2f07acafa8 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -857,7 +857,6 @@ struct file_operations {
 	ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
 	int (*iterate) (struct file *, struct dir_context *);
 	__poll_t (*poll) (struct file *, struct poll_table_struct *);
-	struct wait_queue_head * (*get_poll_head)(struct file *, __poll_t);
 	__poll_t (*poll_mask) (struct file *, __poll_t);
 	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
 	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
@@ -903,16 +902,12 @@ otherwise noted.
 	activity on this file and (optionally) go to sleep until there
 	is activity. Called by the select(2) and poll(2) system calls
 
-  get_poll_head: Returns the struct wait_queue_head that callers can
-  wait on.  Callers need to check the returned events using ->poll_mask
-  once woken.  Can return NULL to indicate polling is not supported,
-  or any error code using the ERR_PTR convention to indicate that a
-  grave error occured and ->poll_mask shall not be called.
-
   poll_mask: return the mask of EPOLL* values describing the file descriptor
-  state.  Called either before going to sleep on the waitqueue returned by
-  get_poll_head, or after it has been woken.  If ->get_poll_head and
-  ->poll_mask are implemented ->poll does not need to be implement.
+  state.  Called either before going to sleep on ->f_poll_head, or after it
+  has been woken.  On files that support ->poll_mask the ->f_poll_head field
+  must point to a wait_queue_head that poll can wait on.  The waitqueue must
+  have the same (or a longer) life time as the struct file itself.
+  If  ->poll_mask is implemented ->poll does not need to be implement.
 
   unlocked_ioctl: called by the ioctl(2) system call.
 
diff --git a/drivers/char/random.c b/drivers/char/random.c
index a8fb0020ba5c..e6260a9fa3b9 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1875,10 +1875,10 @@ urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
 	return ret;
 }
 
-static struct wait_queue_head *
-random_get_poll_head(struct file *file, __poll_t events)
+static int random_open(struct inode *inode, struct file *file)
 {
-	return &random_wait;
+	file->f_poll_head = &random_wait;
+	return 0;
 }
 
 static __poll_t
@@ -1990,9 +1990,9 @@ static int random_fasync(int fd, struct file *filp, int on)
 }
 
 const struct file_operations random_fops = {
+	.open = random_open,
 	.read  = random_read,
 	.write = random_write,
-	.get_poll_head  = random_get_poll_head,
 	.poll_mask  = random_poll_mask,
 	.unlocked_ioctl = random_ioctl,
 	.fasync = random_fasync,
diff --git a/fs/aio.c b/fs/aio.c
index e1d20124ec0e..84f498df8afc 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -168,7 +168,6 @@ struct fsync_iocb {
 struct poll_iocb {
 	struct file		*file;
 	__poll_t		events;
-	struct wait_queue_head	*head;
 
 	union {
 		struct wait_queue_entry	wait;
@@ -1632,7 +1631,7 @@ static int aio_poll_cancel(struct kiocb *iocb)
 {
 	struct aio_kiocb *aiocb = container_of(iocb, struct aio_kiocb, rw);
 	struct poll_iocb *req = &aiocb->poll;
-	struct wait_queue_head *head = req->head;
+	struct wait_queue_head *head = req->file->f_poll_head;
 	bool found = false;
 
 	spin_lock(&head->lock);
@@ -1655,7 +1654,7 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
 	struct file *file = req->file;
 	__poll_t mask = key_to_poll(key);
 
-	assert_spin_locked(&req->head->lock);
+	assert_spin_locked(&file->f_poll_head->lock);
 
 	/* for instances that support it check for an event match first: */
 	if (mask && !(mask & req->events))
@@ -1703,30 +1702,21 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, struct iocb *iocb)
 	req->file = fget(iocb->aio_fildes);
 	if (unlikely(!req->file))
 		return -EBADF;
-	if (!file_has_poll_mask(req->file))
+	if (!req->file->f_poll_head)
 		goto out_fail;
 
-	req->head = req->file->f_op->get_poll_head(req->file, req->events);
-	if (!req->head)
-		goto out_fail;
-	if (IS_ERR(req->head)) {
-		mask = EPOLLERR;
-		goto done;
-	}
-
 	init_waitqueue_func_entry(&req->wait, aio_poll_wake);
 	aiocb->ki_cancel = aio_poll_cancel;
 
 	spin_lock_irq(&ctx->ctx_lock);
-	spin_lock(&req->head->lock);
+	spin_lock(&req->file->f_poll_head->lock);
 	mask = req->file->f_op->poll_mask(req->file, req->events) & req->events;
 	if (!mask) {
-		__add_wait_queue(req->head, &req->wait);
+		__add_wait_queue(req->file->f_poll_head, &req->wait);
 		list_add_tail(&aiocb->ki_list, &ctx->active_reqs);
 	}
-	spin_unlock(&req->head->lock);
+	spin_unlock(&req->file->f_poll_head->lock);
 	spin_unlock_irq(&ctx->ctx_lock);
-done:
 	if (mask)
 		__aio_poll_complete(aiocb, mask);
 	return 0;
diff --git a/fs/eventfd.c b/fs/eventfd.c
index ceb1031f1cac..8904b127577b 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -101,14 +101,6 @@ static int eventfd_release(struct inode *inode, struct file *file)
 	return 0;
 }
 
-static struct wait_queue_head *
-eventfd_get_poll_head(struct file *file, __poll_t events)
-{
-	struct eventfd_ctx *ctx = file->private_data;
-
-	return &ctx->wqh;
-}
-
 static __poll_t eventfd_poll_mask(struct file *file, __poll_t eventmask)
 {
 	struct eventfd_ctx *ctx = file->private_data;
@@ -311,7 +303,6 @@ static const struct file_operations eventfd_fops = {
 	.show_fdinfo	= eventfd_show_fdinfo,
 #endif
 	.release	= eventfd_release,
-	.get_poll_head	= eventfd_get_poll_head,
 	.poll_mask	= eventfd_poll_mask,
 	.read		= eventfd_read,
 	.write		= eventfd_write,
@@ -390,7 +381,8 @@ EXPORT_SYMBOL_GPL(eventfd_ctx_fileget);
 static int do_eventfd(unsigned int count, int flags)
 {
 	struct eventfd_ctx *ctx;
-	int fd;
+	struct file *file;
+	int fd, error;
 
 	/* Check the EFD_* constants for consistency.  */
 	BUILD_BUG_ON(EFD_CLOEXEC != O_CLOEXEC);
@@ -408,12 +400,27 @@ static int do_eventfd(unsigned int count, int flags)
 	ctx->count = count;
 	ctx->flags = flags;
 
-	fd = anon_inode_getfd("[eventfd]", &eventfd_fops, ctx,
+	error = get_unused_fd_flags(O_RDWR | (flags & EFD_SHARED_FCNTL_FLAGS));
+	if (error < 0)
+		goto err_free_ctx;
+	fd = error;
+
+	file = anon_inode_getfile("[eventfd]", &eventfd_fops, ctx,
 			      O_RDWR | (flags & EFD_SHARED_FCNTL_FLAGS));
-	if (fd < 0)
-		eventfd_free_ctx(ctx);
+	if (IS_ERR(file)) {
+		error = PTR_ERR(file);
+		goto err_put_unused_fd;
+	}
+	file->f_poll_head = &ctx->wqh;
+	fd_install(fd, file);
 
 	return fd;
+
+err_put_unused_fd:
+	put_unused_fd(fd);
+err_free_ctx:
+	eventfd_free_ctx(ctx);
+	return error;
 }
 
 SYSCALL_DEFINE2(eventfd2, unsigned int, count, int, flags)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index ea4436f409fb..a0d199be91db 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -922,13 +922,6 @@ static __poll_t ep_read_events_proc(struct eventpoll *ep, struct list_head *head
 	return 0;
 }
 
-static struct wait_queue_head *ep_eventpoll_get_poll_head(struct file *file,
-		__poll_t eventmask)
-{
-	struct eventpoll *ep = file->private_data;
-	return &ep->poll_wait;
-}
-
 static __poll_t ep_eventpoll_poll_mask(struct file *file, __poll_t eventmask)
 {
 	struct eventpoll *ep = file->private_data;
@@ -972,7 +965,6 @@ static const struct file_operations eventpoll_fops = {
 	.show_fdinfo	= ep_show_fdinfo,
 #endif
 	.release	= ep_eventpoll_release,
-	.get_poll_head	= ep_eventpoll_get_poll_head,
 	.poll_mask	= ep_eventpoll_poll_mask,
 	.llseek		= noop_llseek,
 };
@@ -1973,6 +1965,7 @@ static int do_epoll_create(int flags)
 		goto out_free_fd;
 	}
 	ep->file = file;
+	file->f_poll_head = &ep->poll_wait;
 	fd_install(fd, file);
 	return fd;
 
diff --git a/fs/pipe.c b/fs/pipe.c
index bb0840e234f3..6817a60bebc9 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -509,14 +509,6 @@ static long pipe_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 	}
 }
 
-static struct wait_queue_head *
-pipe_get_poll_head(struct file *filp, __poll_t events)
-{
-	struct pipe_inode_info *pipe = filp->private_data;
-
-	return &pipe->wait;
-}
-
 /* No kernel lock held - fine */
 static __poll_t pipe_poll_mask(struct file *filp, __poll_t events)
 {
@@ -768,6 +760,7 @@ int create_pipe_files(struct file **res, int flags)
 
 	f->f_flags = O_WRONLY | (flags & (O_NONBLOCK | O_DIRECT));
 	f->private_data = inode->i_pipe;
+	f->f_poll_head = &inode->i_pipe->wait;
 
 	res[0] = alloc_file(&path, FMODE_READ, &pipefifo_fops);
 	if (IS_ERR(res[0])) {
@@ -778,6 +771,7 @@ int create_pipe_files(struct file **res, int flags)
 	path_get(&path);
 	res[0]->private_data = inode->i_pipe;
 	res[0]->f_flags = O_RDONLY | (flags & O_NONBLOCK);
+	res[0]->f_poll_head = &inode->i_pipe->wait;
 	res[1] = f;
 	return 0;
 
@@ -930,6 +924,7 @@ static int fifo_open(struct inode *inode, struct file *filp)
 
 	/* We can only do regular read/write on fifos */
 	filp->f_mode &= (FMODE_READ | FMODE_WRITE);
+	filp->f_poll_head = &pipe->wait;
 
 	switch (filp->f_mode) {
 	case FMODE_READ:
@@ -1023,7 +1018,6 @@ const struct file_operations pipefifo_fops = {
 	.llseek		= no_llseek,
 	.read_iter	= pipe_read,
 	.write_iter	= pipe_write,
-	.get_poll_head	= pipe_get_poll_head,
 	.poll_mask	= pipe_poll_mask,
 	.unlocked_ioctl	= pipe_ioctl,
 	.release	= pipe_release,
diff --git a/fs/select.c b/fs/select.c
index 317891ff8165..26e20063454a 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -38,20 +38,10 @@ __poll_t vfs_poll(struct file *file, struct poll_table_struct *pt)
 {
 	if (file->f_op->poll) {
 		return file->f_op->poll(file, pt);
-	} else if (file_has_poll_mask(file)) {
-		unsigned int events = poll_requested_events(pt);
-		struct wait_queue_head *head;
-
-		if (pt && pt->_qproc) {
-			head = file->f_op->get_poll_head(file, events);
-			if (!head)
-				return DEFAULT_POLLMASK;
-			if (IS_ERR(head))
-				return EPOLLERR;
-			pt->_qproc(file, head, pt);
-		}
-
-		return file->f_op->poll_mask(file, events);
+	} else if (file->f_poll_head) {
+		if (pt && pt->_qproc)
+			pt->_qproc(file, file->f_poll_head, pt);
+		return file->f_op->poll_mask(file, poll_requested_events(pt));
 	} else {
 		return DEFAULT_POLLMASK;
 	}
diff --git a/fs/timerfd.c b/fs/timerfd.c
index d84a2bee4f82..08e820166c4d 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -227,14 +227,6 @@ static int timerfd_release(struct inode *inode, struct file *file)
 	return 0;
 }
 	
-static struct wait_queue_head *timerfd_get_poll_head(struct file *file,
-		__poll_t eventmask)
-{
-	struct timerfd_ctx *ctx = file->private_data;
-
-	return &ctx->wqh;
-}
-
 static __poll_t timerfd_poll_mask(struct file *file, __poll_t eventmask)
 {
 	struct timerfd_ctx *ctx = file->private_data;
@@ -363,7 +355,6 @@ static long timerfd_ioctl(struct file *file, unsigned int cmd, unsigned long arg
 
 static const struct file_operations timerfd_fops = {
 	.release	= timerfd_release,
-	.get_poll_head	= timerfd_get_poll_head,
 	.poll_mask	= timerfd_poll_mask,
 	.read		= timerfd_read,
 	.llseek		= noop_llseek,
@@ -386,8 +377,9 @@ static int timerfd_fget(int fd, struct fd *p)
 
 SYSCALL_DEFINE2(timerfd_create, int, clockid, int, flags)
 {
-	int ufd;
+	int ufd, error;
 	struct timerfd_ctx *ctx;
+	struct file *file;
 
 	/* Check the TFD_* constants for consistency.  */
 	BUILD_BUG_ON(TFD_CLOEXEC != O_CLOEXEC);
@@ -424,12 +416,25 @@ SYSCALL_DEFINE2(timerfd_create, int, clockid, int, flags)
 
 	ctx->moffs = ktime_mono_to_real(0);
 
-	ufd = anon_inode_getfd("[timerfd]", &timerfd_fops, ctx,
+	error = get_unused_fd_flags(O_RDWR | (flags & TFD_SHARED_FCNTL_FLAGS));
+	if (error < 0)
+		goto out_free_ctx;
+	ufd = error;
+
+	file = anon_inode_getfile("[timerfd]", &timerfd_fops, ctx,
 			       O_RDWR | (flags & TFD_SHARED_FCNTL_FLAGS));
-	if (ufd < 0)
-		kfree(ctx);
+	if (IS_ERR(file)) {
+		error = PTR_ERR(file);
+		goto err_put_unused_fd;
+	}
+	file->f_poll_head = &ctx->wqh;
+	fd_install(ufd, file);
 
 	return ufd;
+err_put_unused_fd:
+	put_unused_fd(ufd);
+out_free_ctx:
+	return error;
 }
 
 static int do_timerfd_settime(int ufd, int flags, 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 5c91108846db..cc0fb83d3743 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -878,6 +878,7 @@ struct file {
 	loff_t			f_pos;
 	struct fown_struct	f_owner;
 	const struct cred	*f_cred;
+	struct wait_queue_head	*f_poll_head;
 	struct file_ra_state	f_ra;
 
 	u64			f_version;
@@ -1720,7 +1721,6 @@ struct file_operations {
 	int (*iterate) (struct file *, struct dir_context *);
 	int (*iterate_shared) (struct file *, struct dir_context *);
 	__poll_t (*poll) (struct file *, struct poll_table_struct *);
-	struct wait_queue_head * (*get_poll_head)(struct file *, __poll_t);
 	__poll_t (*poll_mask) (struct file *, __poll_t);
 	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
 	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
diff --git a/include/linux/poll.h b/include/linux/poll.h
index fdf86b4cbc71..e3dd9dfee20a 100644
--- a/include/linux/poll.h
+++ b/include/linux/poll.h
@@ -74,14 +74,9 @@ static inline void init_poll_funcptr(poll_table *pt, poll_queue_proc qproc)
 	pt->_key   = ~(__poll_t)0; /* all events enabled */
 }
 
-static inline bool file_has_poll_mask(struct file *file)
-{
-	return file->f_op->get_poll_head && file->f_op->poll_mask;
-}
-
 static inline bool file_can_poll(struct file *file)
 {
-	return file->f_op->poll || file_has_poll_mask(file);
+	return file->f_op->poll || file->f_poll_head;
 }
 
 __poll_t vfs_poll(struct file *file, struct poll_table_struct *pt);
diff --git a/net/socket.c b/net/socket.c
index 4354afe8ad96..155d4ba38645 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -117,8 +117,6 @@ static ssize_t sock_write_iter(struct kiocb *iocb, struct iov_iter *from);
 static int sock_mmap(struct file *file, struct vm_area_struct *vma);
 
 static int sock_close(struct inode *inode, struct file *file);
-static struct wait_queue_head *sock_get_poll_head(struct file *file,
-		__poll_t events);
 static __poll_t sock_poll_mask(struct file *file, __poll_t);
 static __poll_t sock_poll(struct file *file, struct poll_table_struct *wait);
 static long sock_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
@@ -143,7 +141,6 @@ static const struct file_operations socket_file_ops = {
 	.llseek =	no_llseek,
 	.read_iter =	sock_read_iter,
 	.write_iter =	sock_write_iter,
-	.get_poll_head = sock_get_poll_head,
 	.poll_mask =	sock_poll_mask,
 	.poll =		sock_poll,
 	.unlocked_ioctl = sock_ioctl,
@@ -422,6 +419,8 @@ struct file *sock_alloc_file(struct socket *sock, int flags, const char *dname)
 
 	sock->file = file;
 	file->f_flags = O_RDWR | (flags & O_NONBLOCK);
+	if (sock->ops->poll_mask)
+		file->f_poll_head = &sock->wq->wait;
 	file->private_data = sock;
 	return file;
 }
@@ -1128,16 +1127,6 @@ int sock_create_lite(int family, int type, int protocol, struct socket **res)
 }
 EXPORT_SYMBOL(sock_create_lite);
 
-static struct wait_queue_head *sock_get_poll_head(struct file *file,
-		__poll_t events)
-{
-	struct socket *sock = file->private_data;
-
-	if (!sock->ops->poll_mask)
-		return NULL;
-	return &sock->wq->wait;
-}
-
 static __poll_t sock_poll_mask(struct file *file, __poll_t events)
 {
 	struct socket *sock = file->private_data;
@@ -1167,7 +1156,7 @@ static __poll_t sock_poll(struct file *file, poll_table *wait)
 	if (sock->ops->poll) {
 		mask = sock->ops->poll(file, sock, wait);
 	} else if (sock->ops->poll_mask) {
-		sock_poll_wait(file, sock_get_poll_head(file, events), wait);
+		sock_poll_wait(file, &sock->wq->wait, wait);
 		mask = sock->ops->poll_mask(sock, events);
 	}
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
@ 2018-06-28 14:20   ` Christoph Hellwig
  0 siblings, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2018-06-28 14:20 UTC (permalink / raw)
  To: lkp

[-- Attachment #1: Type: text/plain, Size: 18598 bytes --]

The doubling of the indirect calls caused a too big regression for some
benchmarks in our post-spectre world.

With some of the nastiness in the networking code moved out of the way
we can instead stick a pointer to a waitqueue into struct file and
avoid one of the indirect calls.  This actually happens to simplify
the code quite a bit as well.

Note that for this removes the possibility of actually returning an
error before waiting in poll.  We could still do this with an ERR_PTR
in f_poll_head with a little bit of WRITE_ONCE/READ_ONCE magic, but
I'd like to defer that until actually required.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 Documentation/filesystems/Locking |  4 +---
 Documentation/filesystems/vfs.txt | 15 +++++---------
 drivers/char/random.c             |  8 ++++----
 fs/aio.c                          | 22 ++++++---------------
 fs/eventfd.c                      | 33 +++++++++++++++++++------------
 fs/eventpoll.c                    |  9 +--------
 fs/pipe.c                         | 12 +++--------
 fs/select.c                       | 18 ++++-------------
 fs/timerfd.c                      | 31 +++++++++++++++++------------
 include/linux/fs.h                |  2 +-
 include/linux/poll.h              |  7 +------
 net/socket.c                      | 17 +++-------------
 12 files changed, 67 insertions(+), 111 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 2c391338c675..4d183e8258b6 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -441,7 +441,6 @@ prototypes:
 	int (*iterate) (struct file *, struct dir_context *);
 	int (*iterate_shared) (struct file *, struct dir_context *);
 	__poll_t (*poll) (struct file *, struct poll_table_struct *);
-	struct wait_queue_head * (*get_poll_head)(struct file *, __poll_t);
 	__poll_t (*poll_mask) (struct file *, __poll_t);
 	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
 	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
@@ -505,8 +504,7 @@ in sys_read() and friends.
 the lease within the individual filesystem to record the result of the
 operation
 
-->poll_mask can be called with or without the waitqueue lock for the waitqueue
-returned from ->get_poll_head.
+->poll_mask can be called with or without the waitqueue lock
 
 --------------------------- dquot_operations -------------------------------
 prototypes:
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 829a7b7857a4..2d2f07acafa8 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -857,7 +857,6 @@ struct file_operations {
 	ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
 	int (*iterate) (struct file *, struct dir_context *);
 	__poll_t (*poll) (struct file *, struct poll_table_struct *);
-	struct wait_queue_head * (*get_poll_head)(struct file *, __poll_t);
 	__poll_t (*poll_mask) (struct file *, __poll_t);
 	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
 	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
@@ -903,16 +902,12 @@ otherwise noted.
 	activity on this file and (optionally) go to sleep until there
 	is activity. Called by the select(2) and poll(2) system calls
 
-  get_poll_head: Returns the struct wait_queue_head that callers can
-  wait on.  Callers need to check the returned events using ->poll_mask
-  once woken.  Can return NULL to indicate polling is not supported,
-  or any error code using the ERR_PTR convention to indicate that a
-  grave error occured and ->poll_mask shall not be called.
-
   poll_mask: return the mask of EPOLL* values describing the file descriptor
-  state.  Called either before going to sleep on the waitqueue returned by
-  get_poll_head, or after it has been woken.  If ->get_poll_head and
-  ->poll_mask are implemented ->poll does not need to be implement.
+  state.  Called either before going to sleep on ->f_poll_head, or after it
+  has been woken.  On files that support ->poll_mask the ->f_poll_head field
+  must point to a wait_queue_head that poll can wait on.  The waitqueue must
+  have the same (or a longer) life time as the struct file itself.
+  If  ->poll_mask is implemented ->poll does not need to be implement.
 
   unlocked_ioctl: called by the ioctl(2) system call.
 
diff --git a/drivers/char/random.c b/drivers/char/random.c
index a8fb0020ba5c..e6260a9fa3b9 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1875,10 +1875,10 @@ urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
 	return ret;
 }
 
-static struct wait_queue_head *
-random_get_poll_head(struct file *file, __poll_t events)
+static int random_open(struct inode *inode, struct file *file)
 {
-	return &random_wait;
+	file->f_poll_head = &random_wait;
+	return 0;
 }
 
 static __poll_t
@@ -1990,9 +1990,9 @@ static int random_fasync(int fd, struct file *filp, int on)
 }
 
 const struct file_operations random_fops = {
+	.open = random_open,
 	.read  = random_read,
 	.write = random_write,
-	.get_poll_head  = random_get_poll_head,
 	.poll_mask  = random_poll_mask,
 	.unlocked_ioctl = random_ioctl,
 	.fasync = random_fasync,
diff --git a/fs/aio.c b/fs/aio.c
index e1d20124ec0e..84f498df8afc 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -168,7 +168,6 @@ struct fsync_iocb {
 struct poll_iocb {
 	struct file		*file;
 	__poll_t		events;
-	struct wait_queue_head	*head;
 
 	union {
 		struct wait_queue_entry	wait;
@@ -1632,7 +1631,7 @@ static int aio_poll_cancel(struct kiocb *iocb)
 {
 	struct aio_kiocb *aiocb = container_of(iocb, struct aio_kiocb, rw);
 	struct poll_iocb *req = &aiocb->poll;
-	struct wait_queue_head *head = req->head;
+	struct wait_queue_head *head = req->file->f_poll_head;
 	bool found = false;
 
 	spin_lock(&head->lock);
@@ -1655,7 +1654,7 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
 	struct file *file = req->file;
 	__poll_t mask = key_to_poll(key);
 
-	assert_spin_locked(&req->head->lock);
+	assert_spin_locked(&file->f_poll_head->lock);
 
 	/* for instances that support it check for an event match first: */
 	if (mask && !(mask & req->events))
@@ -1703,30 +1702,21 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, struct iocb *iocb)
 	req->file = fget(iocb->aio_fildes);
 	if (unlikely(!req->file))
 		return -EBADF;
-	if (!file_has_poll_mask(req->file))
+	if (!req->file->f_poll_head)
 		goto out_fail;
 
-	req->head = req->file->f_op->get_poll_head(req->file, req->events);
-	if (!req->head)
-		goto out_fail;
-	if (IS_ERR(req->head)) {
-		mask = EPOLLERR;
-		goto done;
-	}
-
 	init_waitqueue_func_entry(&req->wait, aio_poll_wake);
 	aiocb->ki_cancel = aio_poll_cancel;
 
 	spin_lock_irq(&ctx->ctx_lock);
-	spin_lock(&req->head->lock);
+	spin_lock(&req->file->f_poll_head->lock);
 	mask = req->file->f_op->poll_mask(req->file, req->events) & req->events;
 	if (!mask) {
-		__add_wait_queue(req->head, &req->wait);
+		__add_wait_queue(req->file->f_poll_head, &req->wait);
 		list_add_tail(&aiocb->ki_list, &ctx->active_reqs);
 	}
-	spin_unlock(&req->head->lock);
+	spin_unlock(&req->file->f_poll_head->lock);
 	spin_unlock_irq(&ctx->ctx_lock);
-done:
 	if (mask)
 		__aio_poll_complete(aiocb, mask);
 	return 0;
diff --git a/fs/eventfd.c b/fs/eventfd.c
index ceb1031f1cac..8904b127577b 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -101,14 +101,6 @@ static int eventfd_release(struct inode *inode, struct file *file)
 	return 0;
 }
 
-static struct wait_queue_head *
-eventfd_get_poll_head(struct file *file, __poll_t events)
-{
-	struct eventfd_ctx *ctx = file->private_data;
-
-	return &ctx->wqh;
-}
-
 static __poll_t eventfd_poll_mask(struct file *file, __poll_t eventmask)
 {
 	struct eventfd_ctx *ctx = file->private_data;
@@ -311,7 +303,6 @@ static const struct file_operations eventfd_fops = {
 	.show_fdinfo	= eventfd_show_fdinfo,
 #endif
 	.release	= eventfd_release,
-	.get_poll_head	= eventfd_get_poll_head,
 	.poll_mask	= eventfd_poll_mask,
 	.read		= eventfd_read,
 	.write		= eventfd_write,
@@ -390,7 +381,8 @@ EXPORT_SYMBOL_GPL(eventfd_ctx_fileget);
 static int do_eventfd(unsigned int count, int flags)
 {
 	struct eventfd_ctx *ctx;
-	int fd;
+	struct file *file;
+	int fd, error;
 
 	/* Check the EFD_* constants for consistency.  */
 	BUILD_BUG_ON(EFD_CLOEXEC != O_CLOEXEC);
@@ -408,12 +400,27 @@ static int do_eventfd(unsigned int count, int flags)
 	ctx->count = count;
 	ctx->flags = flags;
 
-	fd = anon_inode_getfd("[eventfd]", &eventfd_fops, ctx,
+	error = get_unused_fd_flags(O_RDWR | (flags & EFD_SHARED_FCNTL_FLAGS));
+	if (error < 0)
+		goto err_free_ctx;
+	fd = error;
+
+	file = anon_inode_getfile("[eventfd]", &eventfd_fops, ctx,
 			      O_RDWR | (flags & EFD_SHARED_FCNTL_FLAGS));
-	if (fd < 0)
-		eventfd_free_ctx(ctx);
+	if (IS_ERR(file)) {
+		error = PTR_ERR(file);
+		goto err_put_unused_fd;
+	}
+	file->f_poll_head = &ctx->wqh;
+	fd_install(fd, file);
 
 	return fd;
+
+err_put_unused_fd:
+	put_unused_fd(fd);
+err_free_ctx:
+	eventfd_free_ctx(ctx);
+	return error;
 }
 
 SYSCALL_DEFINE2(eventfd2, unsigned int, count, int, flags)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index ea4436f409fb..a0d199be91db 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -922,13 +922,6 @@ static __poll_t ep_read_events_proc(struct eventpoll *ep, struct list_head *head
 	return 0;
 }
 
-static struct wait_queue_head *ep_eventpoll_get_poll_head(struct file *file,
-		__poll_t eventmask)
-{
-	struct eventpoll *ep = file->private_data;
-	return &ep->poll_wait;
-}
-
 static __poll_t ep_eventpoll_poll_mask(struct file *file, __poll_t eventmask)
 {
 	struct eventpoll *ep = file->private_data;
@@ -972,7 +965,6 @@ static const struct file_operations eventpoll_fops = {
 	.show_fdinfo	= ep_show_fdinfo,
 #endif
 	.release	= ep_eventpoll_release,
-	.get_poll_head	= ep_eventpoll_get_poll_head,
 	.poll_mask	= ep_eventpoll_poll_mask,
 	.llseek		= noop_llseek,
 };
@@ -1973,6 +1965,7 @@ static int do_epoll_create(int flags)
 		goto out_free_fd;
 	}
 	ep->file = file;
+	file->f_poll_head = &ep->poll_wait;
 	fd_install(fd, file);
 	return fd;
 
diff --git a/fs/pipe.c b/fs/pipe.c
index bb0840e234f3..6817a60bebc9 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -509,14 +509,6 @@ static long pipe_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 	}
 }
 
-static struct wait_queue_head *
-pipe_get_poll_head(struct file *filp, __poll_t events)
-{
-	struct pipe_inode_info *pipe = filp->private_data;
-
-	return &pipe->wait;
-}
-
 /* No kernel lock held - fine */
 static __poll_t pipe_poll_mask(struct file *filp, __poll_t events)
 {
@@ -768,6 +760,7 @@ int create_pipe_files(struct file **res, int flags)
 
 	f->f_flags = O_WRONLY | (flags & (O_NONBLOCK | O_DIRECT));
 	f->private_data = inode->i_pipe;
+	f->f_poll_head = &inode->i_pipe->wait;
 
 	res[0] = alloc_file(&path, FMODE_READ, &pipefifo_fops);
 	if (IS_ERR(res[0])) {
@@ -778,6 +771,7 @@ int create_pipe_files(struct file **res, int flags)
 	path_get(&path);
 	res[0]->private_data = inode->i_pipe;
 	res[0]->f_flags = O_RDONLY | (flags & O_NONBLOCK);
+	res[0]->f_poll_head = &inode->i_pipe->wait;
 	res[1] = f;
 	return 0;
 
@@ -930,6 +924,7 @@ static int fifo_open(struct inode *inode, struct file *filp)
 
 	/* We can only do regular read/write on fifos */
 	filp->f_mode &= (FMODE_READ | FMODE_WRITE);
+	filp->f_poll_head = &pipe->wait;
 
 	switch (filp->f_mode) {
 	case FMODE_READ:
@@ -1023,7 +1018,6 @@ const struct file_operations pipefifo_fops = {
 	.llseek		= no_llseek,
 	.read_iter	= pipe_read,
 	.write_iter	= pipe_write,
-	.get_poll_head	= pipe_get_poll_head,
 	.poll_mask	= pipe_poll_mask,
 	.unlocked_ioctl	= pipe_ioctl,
 	.release	= pipe_release,
diff --git a/fs/select.c b/fs/select.c
index 317891ff8165..26e20063454a 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -38,20 +38,10 @@ __poll_t vfs_poll(struct file *file, struct poll_table_struct *pt)
 {
 	if (file->f_op->poll) {
 		return file->f_op->poll(file, pt);
-	} else if (file_has_poll_mask(file)) {
-		unsigned int events = poll_requested_events(pt);
-		struct wait_queue_head *head;
-
-		if (pt && pt->_qproc) {
-			head = file->f_op->get_poll_head(file, events);
-			if (!head)
-				return DEFAULT_POLLMASK;
-			if (IS_ERR(head))
-				return EPOLLERR;
-			pt->_qproc(file, head, pt);
-		}
-
-		return file->f_op->poll_mask(file, events);
+	} else if (file->f_poll_head) {
+		if (pt && pt->_qproc)
+			pt->_qproc(file, file->f_poll_head, pt);
+		return file->f_op->poll_mask(file, poll_requested_events(pt));
 	} else {
 		return DEFAULT_POLLMASK;
 	}
diff --git a/fs/timerfd.c b/fs/timerfd.c
index d84a2bee4f82..08e820166c4d 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -227,14 +227,6 @@ static int timerfd_release(struct inode *inode, struct file *file)
 	return 0;
 }
 	
-static struct wait_queue_head *timerfd_get_poll_head(struct file *file,
-		__poll_t eventmask)
-{
-	struct timerfd_ctx *ctx = file->private_data;
-
-	return &ctx->wqh;
-}
-
 static __poll_t timerfd_poll_mask(struct file *file, __poll_t eventmask)
 {
 	struct timerfd_ctx *ctx = file->private_data;
@@ -363,7 +355,6 @@ static long timerfd_ioctl(struct file *file, unsigned int cmd, unsigned long arg
 
 static const struct file_operations timerfd_fops = {
 	.release	= timerfd_release,
-	.get_poll_head	= timerfd_get_poll_head,
 	.poll_mask	= timerfd_poll_mask,
 	.read		= timerfd_read,
 	.llseek		= noop_llseek,
@@ -386,8 +377,9 @@ static int timerfd_fget(int fd, struct fd *p)
 
 SYSCALL_DEFINE2(timerfd_create, int, clockid, int, flags)
 {
-	int ufd;
+	int ufd, error;
 	struct timerfd_ctx *ctx;
+	struct file *file;
 
 	/* Check the TFD_* constants for consistency.  */
 	BUILD_BUG_ON(TFD_CLOEXEC != O_CLOEXEC);
@@ -424,12 +416,25 @@ SYSCALL_DEFINE2(timerfd_create, int, clockid, int, flags)
 
 	ctx->moffs = ktime_mono_to_real(0);
 
-	ufd = anon_inode_getfd("[timerfd]", &timerfd_fops, ctx,
+	error = get_unused_fd_flags(O_RDWR | (flags & TFD_SHARED_FCNTL_FLAGS));
+	if (error < 0)
+		goto out_free_ctx;
+	ufd = error;
+
+	file = anon_inode_getfile("[timerfd]", &timerfd_fops, ctx,
 			       O_RDWR | (flags & TFD_SHARED_FCNTL_FLAGS));
-	if (ufd < 0)
-		kfree(ctx);
+	if (IS_ERR(file)) {
+		error = PTR_ERR(file);
+		goto err_put_unused_fd;
+	}
+	file->f_poll_head = &ctx->wqh;
+	fd_install(ufd, file);
 
 	return ufd;
+err_put_unused_fd:
+	put_unused_fd(ufd);
+out_free_ctx:
+	return error;
 }
 
 static int do_timerfd_settime(int ufd, int flags, 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 5c91108846db..cc0fb83d3743 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -878,6 +878,7 @@ struct file {
 	loff_t			f_pos;
 	struct fown_struct	f_owner;
 	const struct cred	*f_cred;
+	struct wait_queue_head	*f_poll_head;
 	struct file_ra_state	f_ra;
 
 	u64			f_version;
@@ -1720,7 +1721,6 @@ struct file_operations {
 	int (*iterate) (struct file *, struct dir_context *);
 	int (*iterate_shared) (struct file *, struct dir_context *);
 	__poll_t (*poll) (struct file *, struct poll_table_struct *);
-	struct wait_queue_head * (*get_poll_head)(struct file *, __poll_t);
 	__poll_t (*poll_mask) (struct file *, __poll_t);
 	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
 	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
diff --git a/include/linux/poll.h b/include/linux/poll.h
index fdf86b4cbc71..e3dd9dfee20a 100644
--- a/include/linux/poll.h
+++ b/include/linux/poll.h
@@ -74,14 +74,9 @@ static inline void init_poll_funcptr(poll_table *pt, poll_queue_proc qproc)
 	pt->_key   = ~(__poll_t)0; /* all events enabled */
 }
 
-static inline bool file_has_poll_mask(struct file *file)
-{
-	return file->f_op->get_poll_head && file->f_op->poll_mask;
-}
-
 static inline bool file_can_poll(struct file *file)
 {
-	return file->f_op->poll || file_has_poll_mask(file);
+	return file->f_op->poll || file->f_poll_head;
 }
 
 __poll_t vfs_poll(struct file *file, struct poll_table_struct *pt);
diff --git a/net/socket.c b/net/socket.c
index 4354afe8ad96..155d4ba38645 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -117,8 +117,6 @@ static ssize_t sock_write_iter(struct kiocb *iocb, struct iov_iter *from);
 static int sock_mmap(struct file *file, struct vm_area_struct *vma);
 
 static int sock_close(struct inode *inode, struct file *file);
-static struct wait_queue_head *sock_get_poll_head(struct file *file,
-		__poll_t events);
 static __poll_t sock_poll_mask(struct file *file, __poll_t);
 static __poll_t sock_poll(struct file *file, struct poll_table_struct *wait);
 static long sock_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
@@ -143,7 +141,6 @@ static const struct file_operations socket_file_ops = {
 	.llseek =	no_llseek,
 	.read_iter =	sock_read_iter,
 	.write_iter =	sock_write_iter,
-	.get_poll_head = sock_get_poll_head,
 	.poll_mask =	sock_poll_mask,
 	.poll =		sock_poll,
 	.unlocked_ioctl = sock_ioctl,
@@ -422,6 +419,8 @@ struct file *sock_alloc_file(struct socket *sock, int flags, const char *dname)
 
 	sock->file = file;
 	file->f_flags = O_RDWR | (flags & O_NONBLOCK);
+	if (sock->ops->poll_mask)
+		file->f_poll_head = &sock->wq->wait;
 	file->private_data = sock;
 	return file;
 }
@@ -1128,16 +1127,6 @@ int sock_create_lite(int family, int type, int protocol, struct socket **res)
 }
 EXPORT_SYMBOL(sock_create_lite);
 
-static struct wait_queue_head *sock_get_poll_head(struct file *file,
-		__poll_t events)
-{
-	struct socket *sock = file->private_data;
-
-	if (!sock->ops->poll_mask)
-		return NULL;
-	return &sock->wq->wait;
-}
-
 static __poll_t sock_poll_mask(struct file *file, __poll_t events)
 {
 	struct socket *sock = file->private_data;
@@ -1167,7 +1156,7 @@ static __poll_t sock_poll(struct file *file, poll_table *wait)
 	if (sock->ops->poll) {
 		mask = sock->ops->poll(file, sock, wait);
 	} else if (sock->ops->poll_mask) {
-		sock_poll_wait(file, sock_get_poll_head(file, events), wait);
+		sock_poll_wait(file, &sock->wq->wait, wait);
 		mask = sock->ops->poll_mask(sock, events);
 	}
 
-- 
2.17.1
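
(The vfs_poll() side that consumes ->f_poll_head is not part of the hunks
quoted above.  Going by the cover letter it would presumably take roughly
the shape sketched below; this is illustrative only, not the actual patch,
and error handling is omitted:)

__poll_t vfs_poll(struct file *file, struct poll_table_struct *pt)
{
	if (file->f_poll_head) {
		/* register on the waitqueue the driver published */
		poll_wait(file, file->f_poll_head, pt);
		/* and let ->poll_mask() report the current state */
		return file->f_op->poll_mask(file, poll_requested_events(pt));
	}
	return file->f_op->poll(file, pt);
}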


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
  2018-06-28 14:20   ` Christoph Hellwig
@ 2018-06-28 16:40     ` Linus Torvalds
  -1 siblings, 0 replies; 50+ messages in thread
From: Linus Torvalds @ 2018-06-28 16:40 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Al Viro, linux-fsdevel, Network Development, LKP

On Thu, Jun 28, 2018 at 7:21 AM Christoph Hellwig <hch@lst.de> wrote:
>
> Note that this removes the possibility of actually returning an
> error before waiting in poll.  We could still do this with an ERR_PTR
> in f_poll_head with a little bit of WRITE_ONCE/READ_ONCE magic, but
> I'd like to defer that until actually required.

I'm still going to just revert the whole poll mess for now.

It's still completely broken. This helps things, but it doesn't fix
the fundamental issue: the new interface is strictly worse than the
old interface ever was.

So Christoph, if you don't like the traditional ->poll() method, and
you want something else for aio polling, I seriously will suggest that
you introduce a new f_op for *that*. Don't mess with the existing
->poll() function, don't make select() and poll() use a slower and
less capable function just because aio wants something else.

In other words, you need to see AIO as the less important case, not as
the driver for this.

I also want to understand what the AIO race was, and what the problem
with the poll() thing was. You claimed it was racy. I don't see it,
and it was never ever explained in the whole series. I should never
have pulled it in the first place if only for that reason, but I tend
to trust Al when it comes to the VFS layer, so I did. My bad.

So before we try this again, I most definitely want _explanations_.
And I want the whole approach to be very clear that AIO is the ugly
step-sister, not the driving force.

                     Linus

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
  2018-06-28 16:40     ` Linus Torvalds
@ 2018-06-28 18:17       ` Al Viro
  -1 siblings, 0 replies; 50+ messages in thread
From: Al Viro @ 2018-06-28 18:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Christoph Hellwig, linux-fsdevel, Network Development, LKP

On Thu, Jun 28, 2018 at 09:40:21AM -0700, Linus Torvalds wrote:
> On Thu, Jun 28, 2018 at 7:21 AM Christoph Hellwig <hch@lst.de> wrote:
> >
> > Note that this removes the possibility of actually returning an
> > error before waiting in poll.  We could still do this with an ERR_PTR
> > in f_poll_head with a little bit of WRITE_ONCE/READ_ONCE magic, but
> > I'd like to defer that until actually required.
> 
> I'm still going to just revert the whole poll mess for now.
>
> It's still completely broken. This helps things, but it doesn't fix
> the fundamental issue: the new interface is strictly worse than the
> old interface ever was.
> 
> So Christoph, if you don't like the traditional ->poll() method, and
> you want something else for aio polling, I seriously will suggest that
> you introduce a new f_op for *that*. Don't mess with the existing
> ->poll() function, don't make select() and poll() use a slower and
> less capable function just because aio wants something else.
> 
> In other words, you need to see AIO as the less important case, not as
> the driver for this.
> 
> I also want to understand what the AIO race was, and what the problem
> with the poll() thing was. You claimed it was racy. I don't see it,
> and it was never ever explained in the whole series. I should never
> have pulled it in the first place if only for that reason, but I tend
> to trust Al when it comes to the VFS layer, so I did. My bad.

... and I should have pushed back harder, rather than getting sidetracked
into fixing the fs/aio.c-side races in this series ;-/

As for what can be salvaged out of the whole mess,
	* probably the most serious lesson is that INDIRECT CALLS ARE
COSTLY NOW and shouldn't be used lightly.  That had been slow to sink
in and we'd better all realize how much the things have changed.
That, BTW, has implications going a lot further than poll-related stuff -
e.g. the whole "we'll deal with revoke-like issues in procfs/sysfs/debugfs
by wrapping method calls" needs to be reexamined.  And in poll-related
area, note that we have a lot of indirection levels for socket poll.
	* having an optional pointer to wait_queue_head in struct file
is probably a good idea; a lot of ->poll() instances do have the same
form.  Even if sockets do not (and I'm not all that happy about the
solution in the latest series), the instances that do are common and
important enough.
	* a *LOT* of ->poll() instances only block in __pollwait()
called (indirectly) on the first pass.  If we annotate those in some
way (flag set in ->open(), presence of a new method, whatever) and
limit aio-poll to just those, we could deal with the aio side without
disrupting select/poll at all; just use (in place of __pollwait)
a different callback that wouldn't try to allocate poll_table_entry
and worked with the stuff embedded into aio-poll iocb.

How much do you intend to revert?  Untangling just the ->poll()-related
parts from the rest of changes in fs/aio.c will be unpleasant; we might
end up reverting the whole tail of the series and redoing the things
that are not poll-related on top of the revert... ;-/

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
  2018-06-28 18:17       ` Al Viro
@ 2018-06-28 19:31         ` Linus Torvalds
  -1 siblings, 0 replies; 50+ messages in thread
From: Linus Torvalds @ 2018-06-28 19:31 UTC (permalink / raw)
  To: Al Viro; +Cc: Christoph Hellwig, linux-fsdevel, Network Development, LKP

On Thu, Jun 28, 2018 at 11:17 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> As for what can be salvaged out of the whole mess,
>         * probably the most serious lesson is that INDIRECT CALLS ARE
> COSTLY NOW and shouldn't be used lightly.

Note that this has always been true, it's just _way_ more obvious now.

Indirect calls are costly not just because of the nasty 20+ cycle cost
of the stupid spectre overhead (which hopefully will be a thing of the
past in a few years as people upgrade), but because they are a pretty
basic barrier to optimization, both for compilers but also just for
_people_.

Look at a lot of vfs optimization we've done, like all the name
hashing optimization etc. We basically fall flat on our face if a
filesystem implements its own name hash function, not _just_ because
of the cost of the indirect function call, but because it suddenly
means that the filesystem is doing its own thing and all the clever
work we did to integrate name hashing with copying the name no longer
works.

So I really want to avoid indirect calls. And when they *do* happen, I
want to avoid the model where people think of them as low-level object
accessor functions - the C++ disease. I want indirect function calls
to make sense at a higher level, and do some real operation.

End result: I really despised the new poll() model. Yes, the
performance report was what made me *notice*, but then when I looked
at the code I went "No". Using an indirect call as some low-level
accessor function really is fundamentally wrong. Don't do it. It's
badly designed.

Our VFS operations are _high-level_ operations, where we do one single
call for a whole "read()" operation. "->poll()" used to be the same.
The new "->get_poll_head()" and "->poll_mask()" was just bad, bad,
bad.

>         * having an optional pointer to wait_queue_head in struct file
> is probably a good idea; a lot of ->poll() instances do have the same
> form.  Even if sockets do not (and I'm not all that happy about the
> solution in the latest series), the instances that do are common and
> important enough.

Right. I don't hate the poll wait-queue pointer. That said, I do hope
that we can simply write things so as to not even need it.

>         * a *LOT* of ->poll() instances only block in __pollwait()
> called (indirectly) on the first pass.

They are *all* supposed to do it.

The whole idea with "->poll()" is that the model of operation is:

 -  add_wait_queue() and return state on the first pass

 - on subsequent passes (or if somebody else already returned a state
that means we already know we're not going to wait), the poll table is
NULL, so you *CANNOT* add_wait_queue again, so you just return state.

Additionally, once _anybody_ has returned a "I already have the
event", we also clear the poll table queue, so subsequent accesses
will purely be for returning the poll state.

So I don't understand why you're asking for annotation. The whole "you
add yourself to the poll table" is *fundamentally* only done on the
first pass. You should never do it for later passes.
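
(A minimal sketch of an instance following that model; foo_dev and its
fields are made up for illustration, this is not code from the thread:)

static __poll_t foo_poll(struct file *file, poll_table *wait)
{
	struct foo_dev *foo = file->private_data;
	__poll_t mask = 0;

	/* first pass: queues us on foo->waitq; later passes: no-op */
	poll_wait(file, &foo->waitq, wait);

	/* state is read *after* queueing, so a wakeup that races with
	 * this check is not lost - the caller will see it on the recheck */
	if (foo->data_ready)
		mask |= EPOLLIN | EPOLLRDNORM;
	if (foo->space_free)
		mask |= EPOLLOUT | EPOLLWRNORM;
	return mask;
}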

> How much do you intend to revert?  Untangling just the ->poll()-related
> parts from the rest of changes in fs/aio.c will be unpleasant; we might
> end up reverting the whole tail of the series and redoing the things
> that are not poll-related on top of the revert... ;-/

I pushed out my revert. It was fairly straightforward, it just
reverted all the poll_mask/get_poll_head changes, and the aio code
that depended on them.

Btw, I really don't understand the "aio has a race". The old poll()
interface was fundamentally race-free. There simply *is* no way to
race on it, exactly because of the whole "add yourself to the wait
queue first, then ask for state afterwards" model.  The model may be
_odd_, but it has literally worked well for a quarter century exactly
because it's really simple and fundamentally cannot have races.

So I think it's the aio code that needs fixing, not the polling code.

I do want that explanation for why AIO is _so_ special that it can
introduce a race in poll().

Because I suspect it's not so special, and it's just buggy. Maybe
Christoph didn't understand the two-phase model (how you call ->poll()
_twice_ - first to add yourself to the queue, later to check status).
Or maybe AIO interfaces are just shit (again!) and don't work right.

                   Linus

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
  2018-06-28 19:31         ` Linus Torvalds
@ 2018-06-28 20:28           ` Al Viro
  -1 siblings, 0 replies; 50+ messages in thread
From: Al Viro @ 2018-06-28 20:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Christoph Hellwig, linux-fsdevel, Network Development, LKP

On Thu, Jun 28, 2018 at 12:31:14PM -0700, Linus Torvalds wrote:

> >         * a *LOT* of ->poll() instances only block in __pollwait()
> > called (indirectly) on the first pass.
> 
> They are *all* supposed to do it.

Sure, but...

static __poll_t binder_poll(struct file *filp,
                                struct poll_table_struct *wait)
{
        struct binder_proc *proc = filp->private_data;
        struct binder_thread *thread = NULL;
        bool wait_for_proc_work;

        thread = binder_get_thread(proc);
        if (!thread)
                return POLLERR;
...
static struct binder_thread *binder_get_thread(struct binder_proc *proc)
{
        struct binder_thread *thread;
        struct binder_thread *new_thread;

        binder_inner_proc_lock(proc);
        thread = binder_get_thread_ilocked(proc, NULL);
        binder_inner_proc_unlock(proc);
        if (!thread) {
                new_thread = kzalloc(sizeof(*thread), GFP_KERNEL);

And that's hardly unique - we have instances playing with timers,
allocations, whatnot.  Even straight mutex_lock(), as in
static __poll_t i915_perf_poll(struct file *file, poll_table *wait)
{
        struct i915_perf_stream *stream = file->private_data;
        struct drm_i915_private *dev_priv = stream->dev_priv;
        __poll_t ret;

        mutex_lock(&dev_priv->perf.lock);
        ret = i915_perf_poll_locked(dev_priv, stream, file, wait);
        mutex_unlock(&dev_priv->perf.lock);

        return ret;
}

or
static __poll_t cec_poll(struct file *filp,
                             struct poll_table_struct *poll)
{
        struct cec_fh *fh = filp->private_data;
        struct cec_adapter *adap = fh->adap;
        __poll_t res = 0;

        if (!cec_is_registered(adap))
                return EPOLLERR | EPOLLHUP;
        mutex_lock(&adap->lock);
        if (adap->is_configured &&
            adap->transmit_queue_sz < CEC_MAX_MSG_TX_QUEUE_SZ)
                res |= EPOLLOUT | EPOLLWRNORM;
        if (fh->queued_msgs)
                res |= EPOLLIN | EPOLLRDNORM;
        if (fh->total_queued_events)
                res |= EPOLLPRI;
        poll_wait(filp, &fh->wait, poll);
        mutex_unlock(&adap->lock);
        return res;
}

etc.

> 
> The whole idea with "->poll()" is that the model of operation is:
> 
>  -  add_wait_queue() and return state on the first pass
> 
>  - on subsequent passes (or if somebody else already returned a state
> that means we already know we're not going to wait), the poll table is
> NULL, so you *CANNOT* add_wait_queue again, so you just return state.
> 
> Additionally, once _anybody_ has returned a "I already have the
> event", we also clear the poll table queue, so subsequent accesses
> will purely be for returning the poll state.
> 
> So I don't understand why you're asking for annotation. The whole "you
> add yourself to the poll table" is *fundamentally* only done on the
> first pass. You should never do it for later passes.

Sure.  Unfortunately, adding yourself to the poll table is not the only
way to block.  And plenty of instances in e.g. drivers/media (where
the bulk of ->poll() instances lives) are very free with grabbing mutexes
as they go.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
  2018-06-28 20:28           ` Al Viro
@ 2018-06-28 20:37             ` Al Viro
  -1 siblings, 0 replies; 50+ messages in thread
From: Al Viro @ 2018-06-28 20:37 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Christoph Hellwig, linux-fsdevel, Network Development, LKP

On Thu, Jun 28, 2018 at 09:28:37PM +0100, Al Viro wrote:

> Sure.  Unfortunately, adding yourself to the poll table is not the only
> way to block.  And plenty of instances in e.g. drivers/media (where
> the bulk of ->poll() instances lives) are very free with grabbing mutexes
> as they go.

Speaking of obvious bogosities (a lot more so than a blocking allocation
several calls down into a helper):

static __poll_t ca8210_test_int_poll(
        struct file *filp,
        struct poll_table_struct *ptable
)
{
        __poll_t return_flags = 0;
        struct ca8210_priv *priv = filp->private_data;

        poll_wait(filp, &priv->test.readq, ptable);
        if (!kfifo_is_empty(&priv->test.up_fifo))
                return_flags |= (EPOLLIN | EPOLLRDNORM);
        if (wait_event_interruptible(
                priv->test.readq,
                !kfifo_is_empty(&priv->test.up_fifo))) {
                return EPOLLERR;
        }
        return return_flags;
}

In a sense, poll_wait() had been an unfortunate choice of name - it's
really "arrange to be woken up by", not "wait for something now" and
while the outright confusion like above is rare, general "blocking
is OK in that area" is not rare at all...

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
  2018-06-28 20:28           ` Al Viro
@ 2018-06-28 21:11             ` Linus Torvalds
  -1 siblings, 0 replies; 50+ messages in thread
From: Linus Torvalds @ 2018-06-28 21:11 UTC (permalink / raw)
  To: Al Viro; +Cc: Christoph Hellwig, linux-fsdevel, Network Development, LKP

On Thu, Jun 28, 2018 at 1:28 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
>
> Sure, but...
>
> static __poll_t binder_poll(struct file *filp,
>                                 struct poll_table_struct *wait)
> {
>         struct binder_proc *proc = filp->private_data;
>         struct binder_thread *thread = NULL;
>         bool wait_for_proc_work;
>
>         thread = binder_get_thread(proc);
>         if (!thread)
>                 return POLLERR;

That's actually fine.

In particular, it's ok to *not* add yourself to the wait-queues if you
already return the value that you will always return.

For example, the poll function for a regular file could just always return

        EPOLLIN | EPOLLOUT | EPOLLRDNORM | EPOLLWRNORM

because it never changes. Now, it doesn't actually *do* that, because
we have the rule that "no ->poll function" means exactly that, but
it's not wrong.

Similarly, if a poll handler has an error that will not go away, the
above is actually the *right* thing to do. There simply isn't anything
to wait for.

Arguably, it probably should return not just POLLERR, but also
POLLIN/POLLOUT, since an error *also* means that it's going to return
immediately from read/write, but that's a separate thing entirely.

> And that's hardly unique - we have instances playing with timers,
> allocations, whatnot.  Even straight mutex_lock(), as in

So?

Again, locking is permitted. It's not great, but it's not against the rules.

So none of the things you point to are excuses for changing interfaces
or adding any flags.

And no, we don't care at all about "blocking" in general. Somebody who
cares about _performance_ may care, but it's not fundamental. Even
select() and poll() itself will block for allocating select tables
etc, but also for reading (and updating) user space.

Anybody who thinks "select cannot block" or "->poll() mustn't block" is
just confused. It has *never* been about that. It waits asynchronously
for IO, but it may well wait synchronously for locks or memory or just
"lazy implementation".

And none of this has anything to do with the ->poll() interface
itself. If you want to improve performance on some _particular_ file
and actually worry about locking etc, you fix that particular
implementation. You don't go and change the interface for everybody.

The fact is, those interface changes were just broken shit. They were
confused. I don't actually believe that AIO even needed them.

Christoph, do you have a test program for IOCB_CMD_POLL and what it's
actually supposed to do?

Because I think that what it can do is simply to do the ->poll() calls
outside the iocb locks, and then just attach the poll table to the
kioctx afterwards.

This whole "poll must not block" is a complete red herring. It doesn't
come from any other requirements than BAD AIO GARBAGE CODE.

Seriously. Stop thinking this has to happen inside some spinlocked
region. That's AIO braindamage, it's irrelevant.

             Linus

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
  2018-06-28 20:37             ` Al Viro
@ 2018-06-28 21:16               ` Linus Torvalds
  -1 siblings, 0 replies; 50+ messages in thread
From: Linus Torvalds @ 2018-06-28 21:16 UTC (permalink / raw)
  To: Al Viro; +Cc: Christoph Hellwig, linux-fsdevel, Network Development, LKP

On Thu, Jun 28, 2018 at 1:37 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
>
> Speaking of obvious bogosities (a lot more so than a blocking allocation
> several calls down into a helper):
>
> static __poll_t ca8210_test_int_poll(
>         struct file *filp,
>         struct poll_table_struct *ptable
> )

Ok, that's just garbage.

Again, this is not an excuse for changing "->poll()". This is just a
bogus driver that does stupid things.

And again, don't use this as an example of "poll must not block" The
fact is, poll() *can* block for locking and other such things, and
it's perfectly normal and ok. It just shouldn't block for IO.

I will not take any misguided patches to try to "fix" poll functions
that might block.

But I will take patches to fix bugs in drivers. In this case, I think
the fix is to just remove that crazy "wait_event_interruptible()"
entirely.
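
(For concreteness, the suggested fix is just the quoted function with the
blocking wait dropped - an untested sketch:)

static __poll_t ca8210_test_int_poll(
	struct file *filp,
	struct poll_table_struct *ptable
)
{
	__poll_t return_flags = 0;
	struct ca8210_priv *priv = filp->private_data;

	/* only register on the waitqueue; never sleep here */
	poll_wait(filp, &priv->test.readq, ptable);
	if (!kfifo_is_empty(&priv->test.up_fifo))
		return_flags |= (EPOLLIN | EPOLLRDNORM);
	return return_flags;
}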

Not that anybody cares, obviously. Apparently nobody has ever noticed
how broken that poll function is.

               Linus

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
  2018-06-28 21:11             ` Linus Torvalds
@ 2018-06-28 21:30               ` Al Viro
  -1 siblings, 0 replies; 50+ messages in thread
From: Al Viro @ 2018-06-28 21:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Christoph Hellwig, linux-fsdevel, Network Development, LKP

On Thu, Jun 28, 2018 at 02:11:17PM -0700, Linus Torvalds wrote:
> On Thu, Jun 28, 2018 at 1:28 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> >
> > Sure, but...
> >
> > static __poll_t binder_poll(struct file *filp,
> >                                 struct poll_table_struct *wait)
> > {
> >         struct binder_proc *proc = filp->private_data;
> >         struct binder_thread *thread = NULL;
> >         bool wait_for_proc_work;
> >
> >         thread = binder_get_thread(proc);
> >         if (!thread)
> >                 return POLLERR;
> 
> That's actually fine.
> 
> In particular, it's ok to *not* add yourself to the wait-queues if you
> already return the value that you will always return.

Sure (and that's one of the problems I mentioned with ->get_poll_head() model).
But in this case I was referring to the GFP_KERNEL allocation down there.

> > And that's hardly unique - we have instances playing with timers,
> > allocations, whatnot.  Even straight mutex_lock(), as in
> 
> So?
> 
> Again, locking is permitted. It's not great, but it's not against the rules.

Me: a *LOT* of ->poll() instances only block in __pollwait() called (indirectly)
on the first pass.
 
You: They are *all* supposed to do it.

Me: <examples of instances that block elsewhere>

I'm not saying that blocking on other things is a bug; some of such *are* bogus,
but a lot aren't really broken.  What I said is that in a lot of cases we really
have hard "no blocking other than in callback" (and on subsequent passes there's
no callback at all).  Which is just about perfect for AIO purposes, so *IF* we
go for "new method just for AIO, those who don't have it can take a hike", we might
as well indicate that "can take a hike" in some way (be it opt-in or opt-out) and
use straight unchanged ->poll(), with alternative callback.

Looks like we were talking past each other for the last couple of rounds...

> So none of the things you point to are excuses for changing interfaces
> or adding any flags.

> Anybody who thinks "select cannot block" or "->poll() mustn't block" is
> just confused. It has *never* been about that. It waits asynchronously
> for IO, but it may well wait synchronously for locks or memory or just
> "lazy implementation".

Obviously.  I *do* understand how poll() works, really.

> The fact is, those interface changes were just broken shit. They were
> confused. I don't actually believe that AIO even needed them.
> 
> Christoph, do you have a test program for IOCB_CMD_POLL and what it's
> actually supposed to do?
> 
> Because I think that what it can do is simply to do the ->poll() calls
> outside the iocb locks, and then just attach the poll table to the
> kioctx afterwards.

I'd do a bit more - embed the first poll_table_entry into poll iocb itself,
so that the instances that use only one queue wouldn't need any allocations
at all.
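
(An illustrative shape for that, with made-up field names:)

struct poll_iocb {
	struct file		*file;
	__poll_t		events;
	/* first (and usually only) entry lives in the iocb itself, so the
	 * common single-waitqueue case needs no separate allocation */
	struct poll_table_entry	inline_entry;
	/* overflow for the rare ->poll() that queues on several waitqueues */
	struct poll_table_entry	*extra;
	unsigned int		nr_extra;
};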

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
  2018-06-28 21:30               ` Al Viro
@ 2018-06-28 21:39                 ` Linus Torvalds
  -1 siblings, 0 replies; 50+ messages in thread
From: Linus Torvalds @ 2018-06-28 21:39 UTC (permalink / raw)
  To: Al Viro; +Cc: Christoph Hellwig, linux-fsdevel, Network Development, LKP

On Thu, Jun 28, 2018 at 2:30 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> > Again, locking is permitted. It's not great, but it's not against the rules.
>
> Me: a *LOT* of ->poll() instances only block in __pollwait() called (indirectly)
> on the first pass.
>
> You: They are *all* supposed to do it.
>
> Me: <examples of instances that block elsewhere>

Oh, I thought you were talking about the whole "first pass" adding to
wait queues, as opposed to doing it on the second pass.

The *blocking* is entirely immaterial. I didn't even react to it,
because it's simply not an issue.

I don't understand why you're even hung up about it.

The only reason "blocking" seems to be an issue is because AIO has
shit-for-brains and wanted to do poll() under the spinlock.

But that's literally just AIO being confused garbage. It has zero
relevance for anything else.

                Linus

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
  2018-06-28 21:30               ` Al Viro
@ 2018-06-28 22:20                 ` Al Viro
  -1 siblings, 0 replies; 50+ messages in thread
From: Al Viro @ 2018-06-28 22:20 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Christoph Hellwig, linux-fsdevel, Network Development, LKP

On Thu, Jun 28, 2018 at 10:30:27PM +0100, Al Viro wrote:

> I'm not saying that blocking on other things is a bug; some of such *are* bogus,
> but a lot aren't really broken.  What I said is that in a lot of cases we really
> have hard "no blocking other than in callback" (and on subsequent passes there's
> no callback at all).  Which is just about perfect for AIO purposes, so *IF* we
> go for "new method just for AIO, those who don't have it can take a hike", we might
> as well indicate that "can take a hike" in some way (be it opt-in or opt-out) and
> use straight unchanged ->poll(), with alternative callback.

PS: one way of doing that would be to steal a flag from pt->_key and have ->poll()
instances do an equivalent of
        if (flags & LOOKUP_RCU)
                return -ECHILD;
we have in a lot of ->d_revalidate() instances for "need to block" case.  Only
here they would've returned EPOLLNVAL.

Most of the ->poll() instances wouldn't care at all - they do not block unless
the callback does (and in this case it wouldn't have).  Normal poll(2)/select(2)
are completely unaffected.  And AIO would just have that bit set in its
poll_table_struct.

The rules for drivers change only in one respect - if your ->poll() is going to
need to block, check poll_requested_events(pt) & EPOLL_ATOMIC and return EPOLLNVAL
in such case.
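
(A sketch of what a blocking instance would then look like; EPOLL_ATOMIC is
the flag proposed above, not an existing constant, and foo_dev is made up:)

static __poll_t foo_poll(struct file *file, poll_table *pt)
{
	struct foo_dev *foo = file->private_data;
	__poll_t mask = 0;

	/* we are about to take a mutex, which an atomic caller can't allow */
	if (poll_requested_events(pt) & EPOLL_ATOMIC)
		return EPOLLNVAL;

	mutex_lock(&foo->lock);
	poll_wait(file, &foo->waitq, pt);
	if (foo->ready)
		mask |= EPOLLIN | EPOLLRDNORM;
	mutex_unlock(&foo->lock);
	return mask;
}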

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
  2018-06-28 22:20                 ` Al Viro
@ 2018-06-28 22:35                   ` Linus Torvalds
  -1 siblings, 0 replies; 50+ messages in thread
From: Linus Torvalds @ 2018-06-28 22:35 UTC (permalink / raw)
  To: Al Viro; +Cc: Christoph Hellwig, linux-fsdevel, Network Development, LKP

On Thu, Jun 28, 2018 at 3:20 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> The rules for drivers change only in one respect - if your ->poll() is going to
> need to block, check poll_requested_events(pt) & EPOLL_ATOMIC and return EPOLLNVAL
> in such case.

I still don't even understand why you care.

Yes, the AIO poll implementation did it under the spinlock.

But there's no good *reason* for that.  The "aio_poll()" function
itself is called in perfectly fine blocking context.

The only reason it does it under the spinlock is that apparently
Christoph didn't understand how poll() worked.

As far as I can tell, Christoph could have just done the first pass
'->poll()' *without* taking a spinlock, and that adds the table entry
to the table. Then, *under the spinlock*, you associate the table with
the kioctx. And then *after* the spinlock, you can call "->poll()"
again (now with a NULL table pointer), to verify that the state is
still not triggered. That's the whole point of the two-phase poll
thing - the first phase adds the entry to the wait queues, and the
second phase checks for the race of "did the event happen in the
meantime".

There is absolutely no excuse for calling '->poll()' itself under the
spinlock. I don't see any reason for it. The whole "AIO needs this to
avoid races" was always complete and utter bullshit, as far as I can
tell.

So stop it with this crazy and pointless "poll() might block".

IT DAMN WELL SHOULD BE ABLE TO BLOCK, AND NOBODY SANE WILL EVER CARE!

If somebody cares, they are doing things wrong. So fix the AIO code,
don't look at the poll() code, for chrissake!

               Linus

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
@ 2018-06-28 22:35                   ` Linus Torvalds
  0 siblings, 0 replies; 50+ messages in thread
From: Linus Torvalds @ 2018-06-28 22:35 UTC (permalink / raw)
  To: lkp

[-- Attachment #1: Type: text/plain, Size: 1682 bytes --]

On Thu, Jun 28, 2018 at 3:20 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> The rules for drivers change only in one respect - if your ->poll() is going to
> need to block, check poll_requested_events(pt) & EPOLL_ATOMIC and return EPOLLNVAL
> in such case.

OI still don't even understand why you care.

Yes, the AIO poll implementation did it under the spinlock.

But there's no good *reason* for that.  The "aio_poll()" function
itself is called in perfectly fine blocking context.

The only reason it does it under the spinlock is that apparently
Christoph didn't understand how poll() worked.

As far as I can tell, Christoph could have just done the first pass
'->poll()' *without* taking a spinlock, and that adds the table entry
to the table. Then, *under the spinlock*, you associate the table with
the kioctx. And then *after* the spinlock, you can call "->poll()"
again (now with a NULL table pointer), to verify that the state is
still not triggered. That's the whole point of the two-phase poll
thing - the first phase adds the entry to the wait queues, and the
second phase checks for the race of "did the event happen in the
meantime".

There is absolutely no excuse for calling '->poll()' itself under the
spinlock. I don't see any reason for it. The whole "AIO needs this to
avoid races" was always complete and utter bullshit, as far as I can
tell.

So stop it with this crazy and pointless "poll() might block".

IT DAMN WELL SHOULD BE ABLE TO BLOCK, AND NOBODY SANE WILL EVER CARE!

If somebody cares, they are doing things wrong. So fix the AIO code,
don't look at the poll() code, for chrissake!

               Linus

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
  2018-06-28 22:35                   ` Linus Torvalds
@ 2018-06-28 22:49                     ` Al Viro
  -1 siblings, 0 replies; 50+ messages in thread
From: Al Viro @ 2018-06-28 22:49 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Christoph Hellwig, linux-fsdevel, Network Development, LKP

On Thu, Jun 28, 2018 at 03:35:03PM -0700, Linus Torvalds wrote:

> Yes, the AIO poll implementation did it under the spinlock.
> 
> But there's no good *reason* for that.  The "aio_poll()" function
> itself is called in perfectly fine blocking context.

aio_poll() is not a problem.  It's the wakeup callback that is one.

> As far as I can tell, Christoph could have just done the first pass
> '->poll()' *without* taking a spinlock, and that adds the table entry
> to the table. Then, *under the spinlock*, you associate the table with
> the kioctx. And then *after* the spinlock, you can call "->poll()"
> again (now with a NULL table pointer), to verify that the state is
> still not triggered. That's the whole point of the two-phase poll
> thing - the first phase adds the entry to the wait queues, and the
> second phase checks for the race of "did the event happen in the
> meantime".

You are misreading that mess.  What he's trying to do (other than surviving
the awful clusterfuck around cancels) is to handle the decision of what to
report to userland right in the wakeup callback.  *That* is what really
drives the "make the second-pass ->poll() or something similar to it
non-blocking" (in addition to the fact that it is such in a considerable
majority of instances).

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
@ 2018-06-28 22:49                     ` Al Viro
  0 siblings, 0 replies; 50+ messages in thread
From: Al Viro @ 2018-06-28 22:49 UTC (permalink / raw)
  To: lkp

[-- Attachment #1: Type: text/plain, Size: 1302 bytes --]

On Thu, Jun 28, 2018 at 03:35:03PM -0700, Linus Torvalds wrote:

> Yes, the AIO poll implementation did it under the spinlock.
> 
> But there's no good *reason* for that.  The "aio_poll()" function
> itself is called in perfectly fine blocking context.

aio_poll() is not a problem.  It's the wakeup callback that is one.

> As far as I can tell, Christoph could have just done the first pass
> '->poll()' *without* taking a spinlock, and that adds the table entry
> to the table. Then, *under the spinlock*, you associate the table with
> the kioctx. And then *after* the spinlock, you can call "->poll()"
> again (now with a NULL table pointer), to verify that the state is
> still not triggered. That's the whole point of the two-phase poll
> thing - the first phase adds the entry to the wait queues, and the
> second phase checks for the race of "did the event happen in the
> meantime".

You are misreading that mess.  What he's trying to do (other than surviving
the awful clusterfuck around cancels) is to handle the decision of what to
report to userland right in the wakeup callback.  *That* is what really
drives the "make the second-pass ->poll() or something similar to it
non-blocking" (in addition to the fact that it is such in a considerable
majority of instances).

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
  2018-06-28 22:49                     ` Al Viro
@ 2018-06-28 22:55                       ` Linus Torvalds
  -1 siblings, 0 replies; 50+ messages in thread
From: Linus Torvalds @ 2018-06-28 22:55 UTC (permalink / raw)
  To: Al Viro; +Cc: Christoph Hellwig, linux-fsdevel, Network Development, LKP

On Thu, Jun 28, 2018 at 3:49 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> aio_poll() is not a problem.  It's the wakeup callback that is one.

No, it's not a problem either.

The only thing the wakeup callback wants to do is to remove the wait queue entry.

And *that* doesn't need to sleep, and it has absolutely nothing to do
with whether "->poll()" itself sleeps or not.

All it needs to do is call "poll_freewait()". Done. In practice, it's
probably only going to be a single free_poll_entry(), but who cares?
The AIO code should handle the "maybe ->poll() added multiple
wait-queues" just fine.

> You are misreading that mess.  What he's trying to do (other than surviving
> the awful clusterfuck around cancels) is to handle the decision of what to
> report to userland right in the wakeup callback.  *That* is what really
> drives the "make the second-pass ->poll() or something similar to it
> non-blocking" (in addition to the fact that it is such in a considerable
> majority of instances).

That's just crazy BS.

Just call poll() again when you copy the data to userland (which by
definition can block, again).

Stop the idiotic "let's break poll for stupid AIO reasons, because the
AIO people are morons".

Just make the AIO code do the sane thing.

                  Linus

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
@ 2018-06-28 22:55                       ` Linus Torvalds
  0 siblings, 0 replies; 50+ messages in thread
From: Linus Torvalds @ 2018-06-28 22:55 UTC (permalink / raw)
  To: lkp

[-- Attachment #1: Type: text/plain, Size: 1312 bytes --]

On Thu, Jun 28, 2018 at 3:49 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> aio_poll() is not a problem.  It's the wakeup callback that is one.

No, it's not a problem either.

The only thing the wakeup callback wants to do is to remove the wait queue entry.

And *that* doesn't need to sleep, and it has absolutely nothing to do
with whether "->poll()" itself sleeps or not.

All it needs to do is call "poll_freewait()". Done. In practice, it's
probably only going to be a single free_poll_entry(), but who cares?
The AIO code should handle the "maybe ->poll() added multiple
wait-queues" just fine.

> You are misreading that mess.  What he's trying to do (other than surviving
> the awful clusterfuck around cancels) is to handle the decision of what to
> report to userland right in the wakeup callback.  *That* is what really
> drives the "make the second-pass ->poll() or something similar to it
> non-blocking" (in addition to the fact that it is such in a considerable
> majority of instances).

That's just crazy BS.

Just call poll() again when you copy the data to userland (which by
definition can block, again).

Stop the idiotic "let's break poll for stupid AIO reasons, because the
AIO people are morons".

Just make the AIO code do the sane thing.

                  Linus

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
  2018-06-28 22:55                       ` Linus Torvalds
@ 2018-06-28 23:37                         ` Al Viro
  -1 siblings, 0 replies; 50+ messages in thread
From: Al Viro @ 2018-06-28 23:37 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Christoph Hellwig, linux-fsdevel, Network Development, LKP

On Thu, Jun 28, 2018 at 03:55:35PM -0700, Linus Torvalds wrote:
> > You are misreading that mess.  What he's trying to do (other than surviving
> > the awful clusterfuck around cancels) is to handle the decision of what to
> > report to userland right in the wakeup callback.  *That* is what really
> > drives the "make the second-pass ->poll() or something similar to it
> > non-blocking" (in addition to the fact that it is such in a considerable
> > majority of instances).
> 
> That's just crazy BS.
> 
> Just call poll() again when you copy the data to userland (which by
> definition can block, again).
> 
> Stop the idiotic "let's break poll for stupid AIO reasons, because the
> AIO people are morons".

You underestimate the nastiness of that thing (and for the record, I'm
a lot *less* fond of AIO than you are, what with having had to read that
nest of horrors lately).  It does not "copy the data to userland"; what it
does instead is copying into an array of pages it keeps, right from
IO completion callback.  In the read/write case.  This
        ev_page = kmap_atomic(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]);
        event = ev_page + pos % AIO_EVENTS_PER_PAGE;

        event->obj = (u64)(unsigned long)iocb->ki_user_iocb;
        event->data = iocb->ki_user_data;
        event->res = res;
        event->res2 = res2;

        kunmap_atomic(ev_page);
        flush_dcache_page(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]);
is what does the copying.  And that might be done from IRQ context.
Yes, really.

They do have a slightly saner syscall that does copying from the damn
ring buffer, but its use is optional - userland can (and does) read the
mmapped buffer directly.

Single-consumer ABIs suck and AIO is one such...

It could do schedule_work() and do the blocking stuff from that - it does so,
in case it can't grab ->ctx_lock.  An earlier iteration used to try doing
everything straight from the wakeup callback, and *that* was racy as hell;
I'd rather have Christoph explain which races he'd been referring to,
but there had been a whole lot of that.  The solution I suggested in the
last round of that was to offload __aio_poll_complete() via schedule_work()
both for the cancel and poll wakeup cases.  Doing the common case right
from the poll wakeup callback was argued to avoid noticeable overhead in
the common situation - that's what "aio: try to complete poll iocbs without
context switch" is about.  I'm more than slightly unhappy about the
lack of performance regression testing in the non-AIO case...
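
A sketch of that schedule_work() offload, for reference.  The struct poll_iocb
fields and the __aio_poll_complete() signature are assumed from the discussion,
not taken from the series:

static int aio_poll_wake(struct wait_queue_entry *wait, unsigned int mode,
                         int sync, void *key)
{
        struct poll_iocb *req = container_of(wait, struct poll_iocb, wait);
        __poll_t mask = key_to_poll(key);

        if (mask && !(mask & req->events))
                return 0;               /* not an event we asked for */

        /* Possibly IRQ context, and always under the waitqueue lock:
         * don't touch ->ctx_lock or the ring here, defer instead. */
        schedule_work(&req->work);
        return 1;
}

static void aio_poll_complete_work(struct work_struct *work)
{
        struct poll_iocb *req = container_of(work, struct poll_iocb, work);

        /* Process context: safe to take ->ctx_lock and deliver the event. */
        __aio_poll_complete(req, req->events);
}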

At this point I would really like to see replies from Christoph - he's
on CET usually, no idea what his effective timezone is...

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
@ 2018-06-28 23:37                         ` Al Viro
  0 siblings, 0 replies; 50+ messages in thread
From: Al Viro @ 2018-06-28 23:37 UTC (permalink / raw)
  To: lkp

[-- Attachment #1: Type: text/plain, Size: 2687 bytes --]

On Thu, Jun 28, 2018 at 03:55:35PM -0700, Linus Torvalds wrote:
> > You are misreading that mess.  What he's trying to do (other than surviving
> > the awful clusterfuck around cancels) is to handle the decision of what to
> > report to userland right in the wakeup callback.  *That* is what really
> > drives the "make the second-pass ->poll() or something similar to it
> > non-blocking" (in addition to the fact that it is such in a considerable
> > majority of instances).
> 
> That's just crazy BS.
> 
> Just call poll() again when you copy the data to userland (which by
> definition can block, again).
> 
> Stop the idiotic "let's break poll for stupid AIO reasons, because the
> AIO people are morons".

You underestimate the nastiness of that thing (and for the record, I'm
a lot *less* fond of AIO than you are, what with having had to read that
nest of horrors lately).  It does not "copy the data to userland"; what it
does instead is copying into an array of pages it keeps, right from
IO completion callback.  In the read/write case.  This
        ev_page = kmap_atomic(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]);
        event = ev_page + pos % AIO_EVENTS_PER_PAGE;

        event->obj = (u64)(unsigned long)iocb->ki_user_iocb;
        event->data = iocb->ki_user_data;
        event->res = res;
        event->res2 = res2;

        kunmap_atomic(ev_page);
        flush_dcache_page(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]);
is what does the copying.  And that might be done from IRQ context.
Yes, really.

They do have a slightly saner syscall that does copying from the damn
ring buffer, but its use is optional - userland can (and does) read the
mmapped buffer directly.

Single-consumer ABIs suck and AIO is one such...

It could do schedule_work() and do the blocking stuff from that - it does so,
in case it can't grab ->ctx_lock.  An earlier iteration used to try doing
everything straight from the wakeup callback, and *that* was racy as hell;
I'd rather have Christoph explain which races he'd been referring to,
but there had been a whole lot of that.  The solution I suggested in the
last round of that was to offload __aio_poll_complete() via schedule_work()
both for the cancel and poll wakeup cases.  Doing the common case right
from the poll wakeup callback was argued to avoid noticeable overhead in
the common situation - that's what "aio: try to complete poll iocbs without
context switch" is about.  I'm more than slightly unhappy about the
lack of performance regression testing in the non-AIO case...

At this point I would really like to see replies from Christoph - he's
on CET usually, no idea what his effective timezone is...

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
  2018-06-28 23:37                         ` Al Viro
@ 2018-06-29  0:13                           ` Linus Torvalds
  -1 siblings, 0 replies; 50+ messages in thread
From: Linus Torvalds @ 2018-06-29  0:13 UTC (permalink / raw)
  To: Al Viro; +Cc: Christoph Hellwig, linux-fsdevel, Network Development, LKP

On Thu, Jun 28, 2018 at 4:37 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> You underestimate the nastiness of that thing (and for the record, I'm
> a lot *less* fond of AIO than you are, what with having had to read that
> nest of horrors lately).  It does not "copy the data to userland"; what it
> does instead is copying into an array of pages it keeps, right from
> IO completion callback.  I

Ugh.

Oh well. I'd be perfectly happy to have somebody re-write and
re-architect the aio code entirely.  Much rather that than the
->poll() code. Because I know which one I think is well-designed with a
nice usable interface, and which one is a pile of shit.

In the meantime, if AIO wants to do poll() in some irq callback, may I
suggest just using workqueues.

            Linus

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
@ 2018-06-29  0:13                           ` Linus Torvalds
  0 siblings, 0 replies; 50+ messages in thread
From: Linus Torvalds @ 2018-06-29  0:13 UTC (permalink / raw)
  To: lkp

[-- Attachment #1: Type: text/plain, Size: 799 bytes --]

On Thu, Jun 28, 2018 at 4:37 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> You underestimate the nastiness of that thing (and for the record, I'm
> a lot *less* fond of AIO than you are, what with having had to read that
> nest of horrors lately).  It does not "copy the data to userland"; what it
> does instead is copying into an array of pages it keeps, right from
> IO completion callback.  I

Ugh.

Oh well. I'd be perfectly happy to have somebody re-write and
re-architect the aio code entirely.  Much rather that than the
->poll() code. Because I know which one I think is well-designed with a
nice usable interface, and which one is a pile of shit.

In the meantime, if AIO wants to do poll() in some irq callback, may I
suggest just using workqueues.

            Linus

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
  2018-06-28 21:11             ` Linus Torvalds
@ 2018-06-29 13:28               ` Christoph Hellwig
  -1 siblings, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2018-06-29 13:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, Christoph Hellwig, linux-fsdevel, Network Development, LKP

On Thu, Jun 28, 2018 at 02:11:17PM -0700, Linus Torvalds wrote:
> Christoph, do you have a test program for IOCB_CMD_POLL and what it's
> actually supposed to do?

https://pagure.io/libaio/c/9c6935e81854d1585bbfa48c35b185849d746864?branch=aio-poll

is the actual test in libaio.  In addition to that, the seastar library
is an actual real-life user.  But given that it is C++ with all the modern
bells and whistles, you'll probably have as hard a time as me actually
understanding it.

> Because I think that what it can do is simply to do the ->poll() calls
> outside the iocb locks, and then just attach the poll table to the
> kioctx afterwards.

We could do that on the submit side fairly easily.  The problem
is really the completion side, where I'd much rather avoid introducing a
spurious context switch.  Right now even with a NULL qproc we can't
guarantee any of that.  So we'll need to schedule out to a workqueue,
and then from there make the potentially multiple NULL-qproc calls,
which might actually block elsewhere even if __pollwait is never called.

> This whole "poll must not block" is a complete red herring. It doesn't
> come from any other requirements than BAD AIO GARBAGE CODE.

It comes from wanting to avoid a totally pointless context switch.
The aio code itself works just fine when called from a workqueue; we have
exactly that case when file systems do non-trivial operations in their
end_io handlers.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
@ 2018-06-29 13:28               ` Christoph Hellwig
  0 siblings, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2018-06-29 13:28 UTC (permalink / raw)
  To: lkp

[-- Attachment #1: Type: text/plain, Size: 1439 bytes --]

On Thu, Jun 28, 2018 at 02:11:17PM -0700, Linus Torvalds wrote:
> Christoph, do you have a test program for IOCB_CMD_POLL and what it's
> actually supposed to do?

https://pagure.io/libaio/c/9c6935e81854d1585bbfa48c35b185849d746864?branch=aio-poll

is the actual test in libaio.  In addition to that, the seastar library
is an actual real-life user.  But given that it is C++ with all the modern
bells and whistles, you'll probably have as hard a time as me actually
understanding it.

> Because I think that what it can do is simply to do the ->poll() calls
> outside the iocb locks, and then just attach the poll table to the
> kioctx afterwards.

We could do that on the submit side fairly easily.  The problem
is really the completion side, where I'd much rather avoid introducing a
spurious context switch.  Right now even with a NULL qproc we can't
guarantee any of that.  So we'll need to schedule out to a workqueue,
and then from there make the potentially multiple NULL-qproc calls,
which might actually block elsewhere even if __pollwait is never called.

> This whole "poll must not block" is a complete red herring. It doesn't
> come from any other requirements than BAD AIO GARBAGE CODE.

It comes from wanting to avoid a totally pointless context switch.
The aio code itself works just fine when called from a workqueue; we have
exactly that case when file systems do non-trivial operations in their
end_io handlers.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
  2018-06-28 21:30               ` Al Viro
@ 2018-06-29 13:29                 ` Christoph Hellwig
  -1 siblings, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2018-06-29 13:29 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Christoph Hellwig, linux-fsdevel,
	Network Development, LKP

On Thu, Jun 28, 2018 at 10:30:27PM +0100, Al Viro wrote:
> > Because I think that what it can do is simply to do the ->poll() calls
> > outside the iocb locks, and then just attach the poll table to the
> > kioctx afterwards.
> 
> I'd do a bit more - embed the first poll_table_entry into poll iocb itself,
> so that the instances that use only one queue wouldn't need any allocations
> at all.

No need for poll_table_entry, we just need a wait_queue_head.
poll_table_entry is a select.c internal (except for two nasty drivers) -
neither epoll nor most in-kernel callers use it.
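
A sketch of what "just a wait_queue_head" could look like on the iocb side -
one embedded wait queue entry, no poll_table_entry and no allocation.  The
struct and field names are illustrative, not the series' code:

struct poll_iocb {
        struct file             *file;
        __poll_t                events;         /* mask the submitter asked for */
        struct wait_queue_head  *head;          /* the one queue we are on */
        struct wait_queue_entry wait;           /* embedded: no allocation needed */
        struct work_struct      work;           /* completion offload */
};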

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
@ 2018-06-29 13:29                 ` Christoph Hellwig
  0 siblings, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2018-06-29 13:29 UTC (permalink / raw)
  To: lkp

[-- Attachment #1: Type: text/plain, Size: 592 bytes --]

On Thu, Jun 28, 2018 at 10:30:27PM +0100, Al Viro wrote:
> > Because I think that what it can do is simply to do the ->poll() calls
> > outside the iocb locks, and then just attach the poll table to the
> > kioctx afterwards.
> 
> I'd do a bit more - embed the first poll_table_entry into poll iocb itself,
> so that the instances that use only one queue wouldn't need any allocations
> at all.

No need for poll_table_entry, we just need a wait_queue_head.
poll_table_entry is a select.c internal (except for two nasty drivers) -
neither epoll nor most in-kernel callers use it.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
  2018-06-29 13:29                 ` Christoph Hellwig
@ 2018-06-29 13:40                   ` Linus Torvalds
  -1 siblings, 0 replies; 50+ messages in thread
From: Linus Torvalds @ 2018-06-29 13:40 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Al Viro, linux-fsdevel, Network Development, LKP

On Fri, Jun 29, 2018 at 6:29 AM Christoph Hellwig <hch@lst.de> wrote:
> No need for poll_table_entry, we just need a wait_queue_head.
> poll_table_entry is a select.c internal (except for two nasty drivers) -
> neither epoll nor most in-kernel callers use it.

Well, you need the poll_table for the "poll_wait()", and you would
need to be able to have multiple wait-queues even though you only have
one file you're waiting for.

But yes, it doesn't really need to be the full complex poll_table_entry model.
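
A sketch of that plumbing: a small wrapper around the poll_table so that
poll_wait() still works, with the queue proc recording the first wait queue
and flagging an error if ->poll() tries to register a second one.  The
struct poll_iocb is as sketched earlier in the thread; all names here are
illustrative:

struct aio_poll_table {
        struct poll_table_struct pt;
        struct poll_iocb        *req;
        bool                    queued;
        int                     error;
};

static void aio_poll_queue_proc(struct file *file, struct wait_queue_head *head,
                                struct poll_table_struct *pt)
{
        struct aio_poll_table *apt = container_of(pt, struct aio_poll_table, pt);

        /* Only one embedded wait queue entry: a second poll_wait() call
         * means this file needs the general poll_table_entry machinery. */
        if (apt->queued) {
                apt->error = -EINVAL;
                return;
        }
        apt->queued = true;
        apt->req->head = head;
        add_wait_queue(head, &apt->req->wait);
}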

              Linus

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer
@ 2018-06-29 13:40                   ` Linus Torvalds
  0 siblings, 0 replies; 50+ messages in thread
From: Linus Torvalds @ 2018-06-29 13:40 UTC (permalink / raw)
  To: lkp

[-- Attachment #1: Type: text/plain, Size: 541 bytes --]

On Fri, Jun 29, 2018 at 6:29 AM Christoph Hellwig <hch@lst.de> wrote:
> No need for poll_table_entry, we just need a wait_queue_head.
> poll_table_entry is a select.c internal (except for two nasty drivers) -
> neither epoll nor most in-kernel callers use it.

Well, you need the poll_table for the "poll_wait()", and you would
need to be able to have multiple wait-queues even though you only have
one file you're waiting for.

But yes, it doesn't really need to be the full complex poll_table_entry model.

              Linus

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2018-06-29 13:41 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-28 14:20 [RFC] replace ->get_poll_head with a waitqueue pointer in struct file Christoph Hellwig
2018-06-28 14:20 ` Christoph Hellwig
2018-06-28 14:20 ` [PATCH 1/6] net: remove sock_poll_busy_flag Christoph Hellwig
2018-06-28 14:20   ` Christoph Hellwig
2018-06-28 14:20 ` [PATCH 2/6] net: remove bogus RCU annotations on socket.wq Christoph Hellwig
2018-06-28 14:20   ` Christoph Hellwig
2018-06-28 14:20 ` [PATCH 3/6] net: don't detour through struct to find the poll head Christoph Hellwig
2018-06-28 14:20   ` Christoph Hellwig
2018-06-28 14:20 ` [PATCH 4/6] net: remove busy polling from sock_get_poll_head Christoph Hellwig
2018-06-28 14:20   ` Christoph Hellwig
2018-06-28 14:20 ` [PATCH 5/6] net: remove sock_poll_busy_loop Christoph Hellwig
2018-06-28 14:20   ` Christoph Hellwig
2018-06-28 14:20 ` [PATCH 6/6] fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer Christoph Hellwig
2018-06-28 14:20   ` Christoph Hellwig
2018-06-28 16:40   ` Linus Torvalds
2018-06-28 16:40     ` Linus Torvalds
2018-06-28 18:17     ` Al Viro
2018-06-28 18:17       ` Al Viro
2018-06-28 19:31       ` Linus Torvalds
2018-06-28 19:31         ` Linus Torvalds
2018-06-28 20:28         ` Al Viro
2018-06-28 20:28           ` Al Viro
2018-06-28 20:37           ` Al Viro
2018-06-28 20:37             ` Al Viro
2018-06-28 21:16             ` Linus Torvalds
2018-06-28 21:16               ` Linus Torvalds
2018-06-28 21:11           ` Linus Torvalds
2018-06-28 21:11             ` Linus Torvalds
2018-06-28 21:30             ` Al Viro
2018-06-28 21:30               ` Al Viro
2018-06-28 21:39               ` Linus Torvalds
2018-06-28 21:39                 ` Linus Torvalds
2018-06-28 22:20               ` Al Viro
2018-06-28 22:20                 ` Al Viro
2018-06-28 22:35                 ` Linus Torvalds
2018-06-28 22:35                   ` Linus Torvalds
2018-06-28 22:49                   ` Al Viro
2018-06-28 22:49                     ` Al Viro
2018-06-28 22:55                     ` Linus Torvalds
2018-06-28 22:55                       ` Linus Torvalds
2018-06-28 23:37                       ` Al Viro
2018-06-28 23:37                         ` Al Viro
2018-06-29  0:13                         ` Linus Torvalds
2018-06-29  0:13                           ` Linus Torvalds
2018-06-29 13:29               ` Christoph Hellwig
2018-06-29 13:29                 ` Christoph Hellwig
2018-06-29 13:40                 ` Linus Torvalds
2018-06-29 13:40                   ` Linus Torvalds
2018-06-29 13:28             ` Christoph Hellwig
2018-06-29 13:28               ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.