netdev.vger.kernel.org archive mirror
* [RESEND PATCH net] mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs
@ 2019-02-13 21:45 Jann Horn
  2019-02-14 17:13 ` David Miller
  0 siblings, 1 reply; 8+ messages in thread
From: Jann Horn @ 2019-02-13 21:45 UTC (permalink / raw)
  To: David S. Miller, netdev, jannh
  Cc: linux-mm, linux-kernel, Michal Hocko, Vlastimil Babka,
	Pavel Tatashin, Oscar Salvador, Mel Gorman, Aaron Lu,
	Alexander Duyck

The basic idea behind ->pagecnt_bias is: If we pre-allocate the maximum
number of references that we might need to create in the fastpath later,
the bump-allocation fastpath only has to modify the non-atomic bias value
that tracks the number of extra references we hold instead of the atomic
refcount. The maximum number of allocations we can serve (under the
assumption that no allocation is made with size 0) is nc->size, so that's
the bias used.

However, even when all memory in the allocation has been given away, a
reference to the page is still held; and in the `offset < 0` slowpath, the
page may be reused if everyone else has dropped their references.
This means that the necessary number of references is actually
`nc->size+1`.
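
To illustrate the arithmetic, here is a small userspace model of the
fast-path accounting under the old scheme (this is not the kernel code,
just the refcount and bias bookkeeping):

  /* Userspace model of the old accounting; not the kernel code itself. */
  #include <assert.h>
  #include <stdio.h>

  #define SIZE 4096                      /* stands in for nc->size */

  int main(void)
  {
          int refcount = 1 + (SIZE - 1); /* alloc_pages() + page_ref_add(size - 1) */
          int bias = SIZE;               /* nc->pagecnt_bias */
          int offset = SIZE;             /* nc->offset */
          int handed_out = 0;            /* fragments given to callers */

          /* fastpath: serve 1-byte fragments until the page is used up */
          while (offset - 1 >= 0) {
                  bias--;                /* hand one pre-allocated ref to the caller */
                  offset--;
                  handed_out++;
          }

          /* every caller eventually does page_frag_free() -> put_page() */
          refcount -= handed_out;

          /* nc->va still points at the page, but the cache now owns zero
           * references (bias == 0, refcount == 0): the page can be freed
           * underneath it. With a bias of nc->size + 1, one reference
           * would remain and keep the page alive. */
          printf("refcount=%d bias=%d\n", refcount, bias);
          assert(refcount == 0 && bias == 0);
          return 0;
  }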

Luckily, from a quick grep, it looks like the only path that can call
page_frag_alloc(fragsz=1) is TAP with the IFF_NAPI_FRAGS flag, which
requires CAP_NET_ADMIN in the init namespace and is only intended to be
used for kernel testing and fuzzing.

To test for this issue, put a `WARN_ON(page_ref_count(page) == 0)` in the
`offset < 0` path, below the virt_to_page() call, and then repeatedly call
writev() on a TAP device with IFF_TAP|IFF_NO_PI|IFF_NAPI_FRAGS|IFF_NAPI,
with a vector consisting of 15 elements containing 1 byte each.
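
For reference, a reproducer along these lines might look roughly like
the following (error handling omitted; the interface name "tap0", and
whether the interface needs to be brought up first, are assumptions):

  /* Sketch of the writev() sequence described above. Needs CAP_NET_ADMIN
   * in the init namespace; "tap0" and bringing the interface up first
   * (e.g. `ip link set tap0 up`) are assumptions, not verified details. */
  #include <fcntl.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <sys/uio.h>
  #include <unistd.h>
  #include <linux/if.h>
  #include <linux/if_tun.h>

  int main(void)
  {
          struct ifreq ifr;
          struct iovec iov[15];
          char byte = 0;
          int fd, i;

          fd = open("/dev/net/tun", O_RDWR);

          memset(&ifr, 0, sizeof(ifr));
          ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_NAPI | IFF_NAPI_FRAGS;
          strcpy(ifr.ifr_name, "tap0");
          ioctl(fd, TUNSETIFF, &ifr);

          /* 15 one-byte segments, as described above; the 15-byte total
           * stays above the Ethernet-header minimum enforced for IFF_TAP. */
          for (i = 0; i < 15; i++) {
                  iov[i].iov_base = &byte;
                  iov[i].iov_len = 1;
          }

          for (;;)
                  writev(fd, iov, 15);
  }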

Signed-off-by: Jann Horn <jannh@google.com>
---
Resending to davem at the request of akpm.

 mm/page_alloc.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 35fdde041f5c..46285d28e43b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4675,11 +4675,11 @@ void *page_frag_alloc(struct page_frag_cache *nc,
 		/* Even if we own the page, we do not use atomic_set().
 		 * This would break get_page_unless_zero() users.
 		 */
-		page_ref_add(page, size - 1);
+		page_ref_add(page, size);
 
 		/* reset page count bias and offset to start of new frag */
 		nc->pfmemalloc = page_is_pfmemalloc(page);
-		nc->pagecnt_bias = size;
+		nc->pagecnt_bias = size + 1;
 		nc->offset = size;
 	}
 
@@ -4695,10 +4695,10 @@ void *page_frag_alloc(struct page_frag_cache *nc,
 		size = nc->size;
 #endif
 		/* OK, page count is 0, we can safely set it */
-		set_page_count(page, size);
+		set_page_count(page, size + 1);
 
 		/* reset page count bias and offset to start of new frag */
-		nc->pagecnt_bias = size;
+		nc->pagecnt_bias = size + 1;
 		offset = size - fragsz;
 	}
 
-- 
2.20.1.791.gb4d0f1c61a-goog



* Re: [RESEND PATCH net] mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs
  2019-02-13 21:45 [RESEND PATCH net] mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs Jann Horn
@ 2019-02-14 17:13 ` David Miller
  2019-02-14 21:26   ` Jann Horn
  0 siblings, 1 reply; 8+ messages in thread
From: David Miller @ 2019-02-14 17:13 UTC (permalink / raw)
  To: jannh
  Cc: netdev, linux-mm, linux-kernel, mhocko, vbabka, pavel.tatashin,
	osalvador, mgorman, aaron.lu, alexander.h.duyck

From: Jann Horn <jannh@google.com>
Date: Wed, 13 Feb 2019 22:45:59 +0100

> The basic idea behind ->pagecnt_bias is: If we pre-allocate the maximum
> number of references that we might need to create in the fastpath later,
> the bump-allocation fastpath only has to modify the non-atomic bias value
> that tracks the number of extra references we hold instead of the atomic
> refcount. The maximum number of allocations we can serve (under the
> assumption that no allocation is made with size 0) is nc->size, so that's
> the bias used.
> 
> However, even when all memory in the allocation has been given away, a
> reference to the page is still held; and in the `offset < 0` slowpath, the
> page may be reused if everyone else has dropped their references.
> This means that the necessary number of references is actually
> `nc->size+1`.
> 
> Luckily, from a quick grep, it looks like the only path that can call
> page_frag_alloc(fragsz=1) is TAP with the IFF_NAPI_FRAGS flag, which
> requires CAP_NET_ADMIN in the init namespace and is only intended to be
> used for kernel testing and fuzzing.
> 
> To test for this issue, put a `WARN_ON(page_ref_count(page) == 0)` in the
> `offset < 0` path, below the virt_to_page() call, and then repeatedly call
> writev() on a TAP device with IFF_TAP|IFF_NO_PI|IFF_NAPI_FRAGS|IFF_NAPI,
> with a vector consisting of 15 elements containing 1 byte each.
> 
> Signed-off-by: Jann Horn <jannh@google.com>

Applied and queued up for -stable.


* Re: [RESEND PATCH net] mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs
  2019-02-14 17:13 ` David Miller
@ 2019-02-14 21:26   ` Jann Horn
  2019-02-14 22:21     ` David Miller
  0 siblings, 1 reply; 8+ messages in thread
From: Jann Horn @ 2019-02-14 21:26 UTC (permalink / raw)
  To: David Miller, Alexander Duyck; +Cc: Network Development, kernel list

On Thu, Feb 14, 2019 at 6:13 PM David Miller <davem@davemloft.net> wrote:
>
> From: Jann Horn <jannh@google.com>
> Date: Wed, 13 Feb 2019 22:45:59 +0100
>
> > The basic idea behind ->pagecnt_bias is: If we pre-allocate the maximum
> > number of references that we might need to create in the fastpath later,
> > the bump-allocation fastpath only has to modify the non-atomic bias value
> > that tracks the number of extra references we hold instead of the atomic
> > refcount. The maximum number of allocations we can serve (under the
> > assumption that no allocation is made with size 0) is nc->size, so that's
> > the bias used.
> >
> > However, even when all memory in the allocation has been given away, a
> > reference to the page is still held; and in the `offset < 0` slowpath, the
> > page may be reused if everyone else has dropped their references.
> > This means that the necessary number of references is actually
> > `nc->size+1`.
> >
> > Luckily, from a quick grep, it looks like the only path that can call
> > page_frag_alloc(fragsz=1) is TAP with the IFF_NAPI_FRAGS flag, which
> > requires CAP_NET_ADMIN in the init namespace and is only intended to be
> > used for kernel testing and fuzzing.
> >
> > To test for this issue, put a `WARN_ON(page_ref_count(page) == 0)` in the
> > `offset < 0` path, below the virt_to_page() call, and then repeatedly call
> > writev() on a TAP device with IFF_TAP|IFF_NO_PI|IFF_NAPI_FRAGS|IFF_NAPI,
> > with a vector consisting of 15 elements containing 1 byte each.
> >
> > Signed-off-by: Jann Horn <jannh@google.com>
>
> Applied and queued up for -stable.

I had sent a v2 at Alexander Duyck's request an hour before you
applied the patch (with a minor difference that, in Alexander's
opinion, might be slightly more efficient). I guess the net tree
doesn't work like the mm tree, where patches can get removed and
replaced with newer versions? So if Alexander wants that change
(s/size/PAGE_FRAG_CACHE_MAX_SIZE/ in the refcount), someone has to
send that as a separate patch?
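
(For reference, I assume Alexander's variant would amount to roughly the
following on top of the current patch -- untested sketch, not the actual
follow-up:

		page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE);
		...
		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;

and likewise PAGE_FRAG_CACHE_MAX_SIZE + 1 for the set_page_count() and
pagecnt_bias assignments in the `offset < 0` path, so that the refcount
and bias become compile-time constants instead of depending on the
runtime nc->size.)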


* Re: [RESEND PATCH net] mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs
  2019-02-14 21:26   ` Jann Horn
@ 2019-02-14 22:21     ` David Miller
  2019-02-15 14:10       ` Jann Horn
  0 siblings, 1 reply; 8+ messages in thread
From: David Miller @ 2019-02-14 22:21 UTC (permalink / raw)
  To: jannh; +Cc: alexander.duyck, netdev, linux-kernel

From: Jann Horn <jannh@google.com>
Date: Thu, 14 Feb 2019 22:26:22 +0100

> On Thu, Feb 14, 2019 at 6:13 PM David Miller <davem@davemloft.net> wrote:
>>
>> From: Jann Horn <jannh@google.com>
>> Date: Wed, 13 Feb 2019 22:45:59 +0100
>>
>> > The basic idea behind ->pagecnt_bias is: If we pre-allocate the maximum
>> > number of references that we might need to create in the fastpath later,
>> > the bump-allocation fastpath only has to modify the non-atomic bias value
>> > that tracks the number of extra references we hold instead of the atomic
>> > refcount. The maximum number of allocations we can serve (under the
>> > assumption that no allocation is made with size 0) is nc->size, so that's
>> > the bias used.
>> >
>> > However, even when all memory in the allocation has been given away, a
>> > reference to the page is still held; and in the `offset < 0` slowpath, the
>> > page may be reused if everyone else has dropped their references.
>> > This means that the necessary number of references is actually
>> > `nc->size+1`.
>> >
>> > Luckily, from a quick grep, it looks like the only path that can call
>> > page_frag_alloc(fragsz=1) is TAP with the IFF_NAPI_FRAGS flag, which
>> > requires CAP_NET_ADMIN in the init namespace and is only intended to be
>> > used for kernel testing and fuzzing.
>> >
>> > To test for this issue, put a `WARN_ON(page_ref_count(page) == 0)` in the
>> > `offset < 0` path, below the virt_to_page() call, and then repeatedly call
>> > writev() on a TAP device with IFF_TAP|IFF_NO_PI|IFF_NAPI_FRAGS|IFF_NAPI,
>> > with a vector consisting of 15 elements containing 1 byte each.
>> >
>> > Signed-off-by: Jann Horn <jannh@google.com>
>>
>> Applied and queued up for -stable.
> 
> I had sent a v2 at Alexander Duyck's request an hour before you
> applied the patch (with a minor difference that, in Alexander's
> opinion, might be slightly more efficient). I guess the net tree
> doesn't work like the mm tree, where patches can get removed and
> replaced with newer versions? So if Alexander wants that change
> (s/size/PAGE_FRAG_CACHE_MAX_SIZE/ in the refcount), someone has to
> send that as a separate patch?

Yes, please send a follow-up.  Sorry about that.


* Re: [RESEND PATCH net] mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs
  2019-02-14 22:21     ` David Miller
@ 2019-02-15 14:10       ` Jann Horn
  2019-02-15 18:35         ` Alexander Duyck
  0 siblings, 1 reply; 8+ messages in thread
From: Jann Horn @ 2019-02-15 14:10 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Network Development, kernel list

On Thu, Feb 14, 2019 at 11:21 PM David Miller <davem@davemloft.net> wrote:
>
> From: Jann Horn <jannh@google.com>
> Date: Thu, 14 Feb 2019 22:26:22 +0100
>
> > On Thu, Feb 14, 2019 at 6:13 PM David Miller <davem@davemloft.net> wrote:
> >>
> >> From: Jann Horn <jannh@google.com>
> >> Date: Wed, 13 Feb 2019 22:45:59 +0100
> >>
> >> > The basic idea behind ->pagecnt_bias is: If we pre-allocate the maximum
> >> > number of references that we might need to create in the fastpath later,
> >> > the bump-allocation fastpath only has to modify the non-atomic bias value
> >> > that tracks the number of extra references we hold instead of the atomic
> >> > refcount. The maximum number of allocations we can serve (under the
> >> > assumption that no allocation is made with size 0) is nc->size, so that's
> >> > the bias used.
> >> >
> >> > However, even when all memory in the allocation has been given away, a
> >> > reference to the page is still held; and in the `offset < 0` slowpath, the
> >> > page may be reused if everyone else has dropped their references.
> >> > This means that the necessary number of references is actually
> >> > `nc->size+1`.
> >> >
> >> > Luckily, from a quick grep, it looks like the only path that can call
> >> > page_frag_alloc(fragsz=1) is TAP with the IFF_NAPI_FRAGS flag, which
> >> > requires CAP_NET_ADMIN in the init namespace and is only intended to be
> >> > used for kernel testing and fuzzing.
> >> >
> >> > To test for this issue, put a `WARN_ON(page_ref_count(page) == 0)` in the
> >> > `offset < 0` path, below the virt_to_page() call, and then repeatedly call
> >> > writev() on a TAP device with IFF_TAP|IFF_NO_PI|IFF_NAPI_FRAGS|IFF_NAPI,
> >> > with a vector consisting of 15 elements containing 1 byte each.
> >> >
> >> > Signed-off-by: Jann Horn <jannh@google.com>
> >>
> >> Applied and queued up for -stable.
> >
> > I had sent a v2 at Alexander Duyck's request an hour before you
> > applied the patch (with a minor difference that, in Alexander's
> > opinion, might be slightly more efficient). I guess the net tree
> > doesn't work like the mm tree, where patches can get removed and
> > replaced with newer versions? So if Alexander wants that change
> > (s/size/PAGE_FRAG_CACHE_MAX_SIZE/ in the refcount), someone has to
> > send that as a separate patch?
>
> Yes, please send a follow-up.  Sorry about that.

@Alexander Do you want to do that? It was your idea and I don't think
I can reasonably judge the usefulness of the change.


* Re: [RESEND PATCH net] mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs
  2019-02-15 14:10       ` Jann Horn
@ 2019-02-15 18:35         ` Alexander Duyck
  2019-02-15 18:53           ` Jann Horn
  2019-02-15 19:58           ` David Miller
  0 siblings, 2 replies; 8+ messages in thread
From: Alexander Duyck @ 2019-02-15 18:35 UTC (permalink / raw)
  To: Jann Horn; +Cc: Network Development, kernel list

On Fri, Feb 15, 2019 at 6:10 AM Jann Horn <jannh@google.com> wrote:
>
> On Thu, Feb 14, 2019 at 11:21 PM David Miller <davem@davemloft.net> wrote:
> >
> > From: Jann Horn <jannh@google.com>
> > Date: Thu, 14 Feb 2019 22:26:22 +0100
> >
> > > On Thu, Feb 14, 2019 at 6:13 PM David Miller <davem@davemloft.net> wrote:
> > >>
> > >> From: Jann Horn <jannh@google.com>
> > >> Date: Wed, 13 Feb 2019 22:45:59 +0100
> > >>
> > >> > The basic idea behind ->pagecnt_bias is: If we pre-allocate the maximum
> > >> > number of references that we might need to create in the fastpath later,
> > >> > the bump-allocation fastpath only has to modify the non-atomic bias value
> > >> > that tracks the number of extra references we hold instead of the atomic
> > >> > refcount. The maximum number of allocations we can serve (under the
> > >> > assumption that no allocation is made with size 0) is nc->size, so that's
> > >> > the bias used.
> > >> >
> > >> > However, even when all memory in the allocation has been given away, a
> > >> > reference to the page is still held; and in the `offset < 0` slowpath, the
> > >> > page may be reused if everyone else has dropped their references.
> > >> > This means that the necessary number of references is actually
> > >> > `nc->size+1`.
> > >> >
> > >> > Luckily, from a quick grep, it looks like the only path that can call
> > >> > page_frag_alloc(fragsz=1) is TAP with the IFF_NAPI_FRAGS flag, which
> > >> > requires CAP_NET_ADMIN in the init namespace and is only intended to be
> > >> > used for kernel testing and fuzzing.
> > >> >
> > >> > To test for this issue, put a `WARN_ON(page_ref_count(page) == 0)` in the
> > >> > `offset < 0` path, below the virt_to_page() call, and then repeatedly call
> > >> > writev() on a TAP device with IFF_TAP|IFF_NO_PI|IFF_NAPI_FRAGS|IFF_NAPI,
> > >> > with a vector consisting of 15 elements containing 1 byte each.
> > >> >
> > >> > Signed-off-by: Jann Horn <jannh@google.com>
> > >>
> > >> Applied and queued up for -stable.
> > >
> > > I had sent a v2 at Alexander Duyck's request an hour before you
> > > applied the patch (with a minor difference that, in Alexander's
> > > opinion, might be slightly more efficient). I guess the net tree
> > > doesn't work like the mm tree, where patches can get removed and
> > > replaced with newer versions? So if Alexander wants that change
> > > (s/size/PAGE_FRAG_CACHE_MAX_SIZE/ in the refcount), someone has to
> > > send that as a separate patch?
> >
> > Yes, please send a follow-up.  Sorry about that.
>
> @Alexander Do you want to do that? It was your idea and I don't think
> I can reasonably judge the usefulness of the change.

I'll take care of it. I'm kind of annoyed that you resubmitted this to
netdev before anyone had a chance to even provide review comments
though.

As-is this doesn't really address the problem anyway, since the
bigger issue is the data alignment one that I pointed out. I'll have
patches for both ready shortly.

- Alex


* Re: [RESEND PATCH net] mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs
  2019-02-15 18:35         ` Alexander Duyck
@ 2019-02-15 18:53           ` Jann Horn
  2019-02-15 19:58           ` David Miller
  1 sibling, 0 replies; 8+ messages in thread
From: Jann Horn @ 2019-02-15 18:53 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Network Development, kernel list

On Fri, Feb 15, 2019 at 7:35 PM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Fri, Feb 15, 2019 at 6:10 AM Jann Horn <jannh@google.com> wrote:
> > On Thu, Feb 14, 2019 at 11:21 PM David Miller <davem@davemloft.net> wrote:
> > > From: Jann Horn <jannh@google.com>
> > > Date: Thu, 14 Feb 2019 22:26:22 +0100
> > >
> > > > On Thu, Feb 14, 2019 at 6:13 PM David Miller <davem@davemloft.net> wrote:
> > > >> From: Jann Horn <jannh@google.com>
> > > >> Date: Wed, 13 Feb 2019 22:45:59 +0100
> > > >>
> > > >> > The basic idea behind ->pagecnt_bias is: If we pre-allocate the maximum
> > > >> > number of references that we might need to create in the fastpath later,
> > > >> > the bump-allocation fastpath only has to modify the non-atomic bias value
> > > >> > that tracks the number of extra references we hold instead of the atomic
> > > >> > refcount. The maximum number of allocations we can serve (under the
> > > >> > assumption that no allocation is made with size 0) is nc->size, so that's
> > > >> > the bias used.
> > > >> >
> > > >> > However, even when all memory in the allocation has been given away, a
> > > >> > reference to the page is still held; and in the `offset < 0` slowpath, the
> > > >> > page may be reused if everyone else has dropped their references.
> > > >> > This means that the necessary number of references is actually
> > > >> > `nc->size+1`.
> > > >> >
> > > >> > Luckily, from a quick grep, it looks like the only path that can call
> > > >> > page_frag_alloc(fragsz=1) is TAP with the IFF_NAPI_FRAGS flag, which
> > > >> > requires CAP_NET_ADMIN in the init namespace and is only intended to be
> > > >> > used for kernel testing and fuzzing.
> > > >> >
> > > >> > To test for this issue, put a `WARN_ON(page_ref_count(page) == 0)` in the
> > > >> > `offset < 0` path, below the virt_to_page() call, and then repeatedly call
> > > >> > writev() on a TAP device with IFF_TAP|IFF_NO_PI|IFF_NAPI_FRAGS|IFF_NAPI,
> > > >> > with a vector consisting of 15 elements containing 1 byte each.
> > > >> >
> > > >> > Signed-off-by: Jann Horn <jannh@google.com>
> > > >>
> > > >> Applied and queued up for -stable.
> > > >
> > > > I had sent a v2 at Alexander Duyck's request an hour before you
> > > > applied the patch (with a minor difference that, in Alexander's
> > > > opinion, might be slightly more efficient). I guess the net tree
> > > > doesn't work like the mm tree, where patches can get removed and
> > > > replaced with newer versions? So if Alexander wants that change
> > > > (s/size/PAGE_FRAG_CACHE_MAX_SIZE/ in the refcount), someone has to
> > > > send that as a separate patch?
> > >
> > > Yes, please send a follow-up.  Sorry about that.
> >
> > @Alexander Do you want to do that? It was your idea and I don't think
> > I can reasonably judge the usefulness of the change.
>
> I'll take care of it. I'm kind of annoyed that you resubmitted this to
> netdev before anyone had a chance to even provide review comments
> though.

Ah, I wasn't aware that there was a convention against immediately
resubmitting a patch when the first recipient list pointed to the
wrong maintainer. The only recipient I removed for the resend was
akpm, so I assumed that anyone wanting to comment on the patch could
just as well do so on the resent copy. Also, akpm had already put it
in his tree, and I wasn't sure whether that meant I had to send it to
davem quickly enough for it to land in davem's tree before akpm sent
his next pull request to Linus...

So next time this happens, should I add a note below the resend
pointing out that the resend is of a recent patch and it hasn't had
time to be reviewed yet, or something like that?


* Re: [RESEND PATCH net] mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs
  2019-02-15 18:35         ` Alexander Duyck
  2019-02-15 18:53           ` Jann Horn
@ 2019-02-15 19:58           ` David Miller
  1 sibling, 0 replies; 8+ messages in thread
From: David Miller @ 2019-02-15 19:58 UTC (permalink / raw)
  To: alexander.duyck; +Cc: jannh, netdev, linux-kernel

From: Alexander Duyck <alexander.duyck@gmail.com>
Date: Fri, 15 Feb 2019 10:35:18 -0800

> I'll take care of it. I'm kind of annoyed that you resubmitted this to
> netdev before anyone had a chance to even provide review comments
> though.

He was only following instructions from Andrew Morton.


Thread overview: 8 messages
2019-02-13 21:45 [RESEND PATCH net] mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs Jann Horn
2019-02-14 17:13 ` David Miller
2019-02-14 21:26   ` Jann Horn
2019-02-14 22:21     ` David Miller
2019-02-15 14:10       ` Jann Horn
2019-02-15 18:35         ` Alexander Duyck
2019-02-15 18:53           ` Jann Horn
2019-02-15 19:58           ` David Miller
