* [PATCH] net: bypass ->sendpage for slab pages
@ 2020-08-19  5:19 Christoph Hellwig
  2020-08-19 19:07 ` David Miller
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2020-08-19  5:19 UTC (permalink / raw)
  To: davem, kuba; +Cc: colyli, netdev, linux-kernel

Sending Slab or tail pages into ->sendpage causes really strange
delayed oops.  Prevent it right in the networking code instead of
requiring drivers to guess the exact conditions where sendpage works.

Based on a patch from Coly Li <colyli@suse.de>.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 net/socket.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/socket.c b/net/socket.c
index dbbe8ea7d395da..b4e65688915fe3 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -3638,7 +3638,11 @@ EXPORT_SYMBOL(kernel_getpeername);
 int kernel_sendpage(struct socket *sock, struct page *page, int offset,
 		    size_t size, int flags)
 {
-	if (sock->ops->sendpage)
+	/* sendpage manipulates the refcount of the sent-in page, which
+	 * does not work for Slab pages, or for tails of non-__GFP_COMP
+	 * high order pages.
+	 */
+	if (sock->ops->sendpage && !PageSlab(page) && page_count(page) > 0)
 		return sock->ops->sendpage(sock, page, offset, size, flags);
 
 	return sock_no_sendpage(sock, page, offset, size, flags);
-- 
2.28.0
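
For readers following along outside a kernel tree, the dispatch this
patch introduces can be modeled in plain userspace C.  This is a sketch
only: struct mock_page, struct mock_sock, enum send_path and both
function names are hypothetical stand-ins, not kernel APIs.  The slab
flag models PageSlab(), refcount models page_count(), and the tail of a
non-__GFP_COMP high order allocation is modeled as a page whose
refcount is 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct page. */
struct mock_page {
	bool slab;	/* models PageSlab(page) */
	int refcount;	/* models page_count(page) */
};

/* Hypothetical stand-in for a socket: only records if ->sendpage exists. */
struct mock_sock {
	bool has_sendpage;
};

/* Which path a send would take; stands in for the real transmit code. */
enum send_path { PATH_ZEROCOPY, PATH_COPY };

/*
 * The condition added by the patch: the page must not be a slab page and
 * must hold a reference of its own (tails of non-compound high order
 * pages are modeled here with a refcount of 0).
 */
bool page_ok_for_sendpage(const struct mock_page *page)
{
	assert(page != NULL);
	return !page->slab && page->refcount > 0;
}

/* Models the patched kernel_sendpage(): zero-copy if safe, else copy. */
enum send_path mock_kernel_sendpage(const struct mock_sock *sock,
				    const struct mock_page *page)
{
	assert(sock != NULL);
	if (sock->has_sendpage && page_ok_for_sendpage(page))
		return PATH_ZEROCOPY;

	return PATH_COPY;	/* models sock_no_sendpage() */
}
```

In this model a slab page or a zero-refcount tail page takes the copying
path, and anything else goes to ->sendpage when the socket provides one.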



* Re: [PATCH] net: bypass ->sendpage for slab pages
  2020-08-19  5:19 [PATCH] net: bypass ->sendpage for slab pages Christoph Hellwig
@ 2020-08-19 19:07 ` David Miller
  2020-08-20  4:37   ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: David Miller @ 2020-08-19 19:07 UTC (permalink / raw)
  To: hch; +Cc: kuba, colyli, netdev, linux-kernel

From: Christoph Hellwig <hch@lst.de>
Date: Wed, 19 Aug 2020 07:19:45 +0200

> Sending Slab or tail pages into ->sendpage causes really strange
> delayed oops.  Prevent it right in the networking code instead of
> requiring drivers to guess the exact conditions where sendpage works.
> 
> Based on a patch from Coly Li <colyli@suse.de>.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Yes this fixes the problem, but it doesn't in any way deal with the
callers who are doing this stuff.

They are all likely using sendpage because they expect that it will
avoid the copy, for performance reasons or whatever.

Now it won't.

At least with Coly's patch set, the set of violators was documented
and they could switch to allocating non-slab pages or calling
sendmsg() or write() instead.

I hear talk about ABIs just doing the right thing, but when their
value is increased performance vs. other interfaces it means that
taking a slow path silently is bad in the long term.  And that's
what this proposed patch here does.
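
One way to picture the concern above is a fallback that is at least not
silent.  The sketch below is a userspace model under assumptions:
struct mock_page, sendpage_or_report() and copy_fallback_count() are
hypothetical names, and a real kernel version would more likely use
something like WARN_ONCE() than fprintf().  It reports the first
unsuitable page and counts every one, so the offending callers can
still be found and fixed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the kernel's struct page. */
struct mock_page {
	bool slab;	/* models PageSlab(page) */
	int refcount;	/* models page_count(page) */
};

/* Number of sends that had to take the copying path. */
static unsigned long copy_fallbacks;

/*
 * Returns true if the zero-copy path may be used.  Otherwise counts the
 * fallback and names the caller the first time it happens.
 */
bool sendpage_or_report(const struct mock_page *page, const char *caller)
{
	assert(page != NULL && caller != NULL);

	if (!page->slab && page->refcount > 0)
		return true;

	if (copy_fallbacks++ == 0)
		fprintf(stderr, "%s: page unsuitable for sendpage, copying\n",
			caller);
	return false;
}

/* Lets a test or a debug interface read the counter. */
unsigned long copy_fallback_count(void)
{
	return copy_fallbacks;
}
```

The counter makes the slow path observable without changing what is
sent; whether that is enough of an incentive to fix callers is exactly
the open question in this thread.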



* Re: [PATCH] net: bypass ->sendpage for slab pages
  2020-08-19 19:07 ` David Miller
@ 2020-08-20  4:37   ` Christoph Hellwig
  2020-08-21 21:14     ` David Miller
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2020-08-20  4:37 UTC (permalink / raw)
  To: David Miller; +Cc: hch, kuba, colyli, netdev, linux-kernel

On Wed, Aug 19, 2020 at 12:07:09PM -0700, David Miller wrote:
> Yes this fixes the problem, but it doesn't in any way deal with the
> callers who are doing this stuff.
> 
> They are all likely using sendpage because they expect that it will
> avoid the copy, for performance reasons or whatever.
> 
> Now it won't.
> 
> At least with Coly's patch set, the set of violators was documented
> and they could switch to allocating non-slab pages or calling
> sendmsg() or write() instead.
> 
> I hear talk about ABIs just doing the right thing, but when their
> value is increased performance vs. other interfaces it means that
> taking a slow path silently is bad in the long term.  And that's
> what this proposed patch here does.

If you look at who uses sendpage outside the networking layer itself
you see that it is basically block drivers and file systems.  These
have no way to control what memory they get passed and have to deal
with everything someone throws at them.

So for these callers the requirements are in order of importance:

 (1) just send the damn page without generating weird OOPSes
 (2) do so as fast as possible
 (3) do so without requiring pointless boilerplate code

And I think the current interface fails these requirements really badly.
Having a helper that just does the right thing would really help all of
these users, including those currently using raw ->sendpage over
kernel_sendpage.  If you don't like kernel_sendpage to just do the
right thing we could just add another helper, e.g.
kernel_sendpage_or_fallback, but that would seem a little pointless
to me.


* Re: [PATCH] net: bypass ->sendpage for slab pages
  2020-08-20  4:37   ` Christoph Hellwig
@ 2020-08-21 21:14     ` David Miller
  2020-09-18  8:37       ` Coly Li
  0 siblings, 1 reply; 6+ messages in thread
From: David Miller @ 2020-08-21 21:14 UTC (permalink / raw)
  To: hch; +Cc: kuba, colyli, netdev, linux-kernel

From: Christoph Hellwig <hch@lst.de>
Date: Thu, 20 Aug 2020 06:37:44 +0200

> If you look at who uses sendpage outside the networking layer itself
> you see that it is basically block drivers and file systems.  These
> have no way to control what memory they get passed and have to deal
> with everything someone throws at them.

I see nvme doing virt_to_page() on several things when it calls into
kernel_sendpage().

This is the kind of stuff I want cleaned up, and which your patch
will not trap nor address.

In nvme it sometimes seems to check for sendpage validity:

		/* can't zcopy slab pages */
		if (unlikely(PageSlab(page))) {
			ret = sock_no_sendpage(queue->sock, page, offset, len,
					flags);
		} else {
			ret = kernel_sendpage(queue->sock, page, offset, len,
					flags);
		}

Yet elsewhere does not and just blindly calls:

	ret = kernel_sendpage(queue->sock, virt_to_page(pdu),
			offset_in_page(pdu) + req->offset, len,  flags);

This pdu seems to come from a page frag allocation.

That's the target side.  On the host side:

		ret = kernel_sendpage(cmd->queue->sock, page, cmd->offset,
					left, flags);

No page slab check or anything like that.

I'm hesitant to put in the kernel_sendpage() patch, because it provides a
disincentive to fix up code like this.



* Re: [PATCH] net: bypass ->sendpage for slab pages
  2020-08-21 21:14     ` David Miller
@ 2020-09-18  8:37       ` Coly Li
  2020-09-21 14:25         ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Coly Li @ 2020-09-18  8:37 UTC (permalink / raw)
  To: David Miller, hch; +Cc: kuba, netdev, linux-kernel

On 2020/8/22 05:14, David Miller wrote:
> From: Christoph Hellwig <hch@lst.de>
> Date: Thu, 20 Aug 2020 06:37:44 +0200
> 
>> If you look at who uses sendpage outside the networking layer itself
>> you see that it is basically block drivers and file systems.  These
>> have no way to control what memory they get passed and have to deal
>> with everything someone throws at them.
> 
> I see nvme doing virt_to_page() on several things when it calls into
> kernel_sendpage().
> 
> This is the kind of stuff I want cleaned up, and which your patch
> will not trap nor address.
> 
> In nvme it sometimes seems to check for sendpage validity:
> 
> 		/* can't zcopy slab pages */
> 		if (unlikely(PageSlab(page))) {
> 			ret = sock_no_sendpage(queue->sock, page, offset, len,
> 					flags);
> 		} else {
> 			ret = kernel_sendpage(queue->sock, page, offset, len,
> 					flags);
> 		}
> 
> Yet elsewhere does not and just blindly calls:
> 
> 	ret = kernel_sendpage(queue->sock, virt_to_page(pdu),
> 			offset_in_page(pdu) + req->offset, len,  flags);
> 
> This pdu seems to come from a page frag allocation.
> 
> That's the target side.  On the host side:
> 
> 		ret = kernel_sendpage(cmd->queue->sock, page, cmd->offset,
> 					left, flags);
> 
> No page slab check or anything like that.
> 
> I'm hesitant to put in the kernel_sendpage() patch, because it provides a
> disincentive to fix up code like this.
> 

Hi David and Christoph,

It has been quiet for a while.  What should we do next about the
kernel_sendpage() issue?

Will Christoph's patch or my series be considered the proper fix, or
should I wait for a better idea to show up?  Either is fine with me,
as long as the problem is fixed.

Thanks in advance.

Coly Li


* Re: [PATCH] net: bypass ->sendpage for slab pages
  2020-09-18  8:37       ` Coly Li
@ 2020-09-21 14:25         ` Christoph Hellwig
  0 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2020-09-21 14:25 UTC (permalink / raw)
  To: Coly Li; +Cc: David Miller, hch, kuba, netdev, linux-kernel

On Fri, Sep 18, 2020 at 04:37:24PM +0800, Coly Li wrote:
> Hi David and Christoph,
> 
> It has been quiet for a while.  What should we do next about the
> kernel_sendpage() issue?
> 
> Will Christoph's patch or my series be considered the proper fix, or
> should I wait for a better idea to show up?  Either is fine with me,
> as long as the problem is fixed.

I think for all the network storage stuff we really need a "send me
out a page" helper, and the nvmet bits that Dave pointed to look to
me like they actually are currently broken.

Given that Dave doesn't want to change the kernel_sendpage semantics
I'll resend with a new helper instead.  Any preferences for a name?
safe_sendpage?  kernel_sendpage_safe?  kernel_send_one_page?

