* [PATCH] off-by-one in get_mempolicy(2)
From: Al Viro @ 2019-10-09 14:05 UTC
To: Linus Torvalds
Cc: Ralph Campbell, Alexander Duyck, Waiman Long, Andi Kleen,
linux-mm, linux-api
get_mempolicy(2) and related syscalls have always been passed
1 + number of bits in nodemask as the maxnode argument - see e.g.
copy_nodes_to_user() and get_nodes(). Or libnuma, for the userland
side -
	static void getpol(int *oldpolicy, struct bitmask *bmp)
	{
		if (get_mempolicy(oldpolicy, bmp->maskp, bmp->size + 1, 0, 0) < 0)
			numa_error("get_mempolicy");
	}
and similar for other syscalls. However, the check for insufficient
destination size in get_mempolicy(2) used to be
	if (nmask != NULL && maxnode < MAX_NUMNODES)
		return -EINVAL;
IOW, maxnode == MAX_NUMNODES (representing "MAX_NUMNODES - 1 bits")
had been accepted. The reason why that hadn't broken the libnuma
logic used to determine the required bitmap size is that
MAX_NUMNODES is always a power of 2 and the loop in libnuma
is
	nodemask_sz = 16;
	do {
		nodemask_sz <<= 1;
		mask = realloc(mask, nodemask_sz / 8);
		if (!mask)
			return;
	} while (get_mempolicy(&pol, mask, nodemask_sz + 1, 0, 0) < 0 && errno == EINVAL &&
		 nodemask_sz < 4096*8);
I.e. it's been passing 33, 65, 129, etc. until it got it large enough.
That sidesteps the boundary case - we never try to pass exactly
MAX_NUMNODES there.
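To make that sequence concrete, here is a small userland sketch that
only reproduces the arithmetic of the loop above. The helper name
probe_maxnode_values() is made up for illustration and is not part of
libnuma; it assumes every call keeps failing with EINVAL, so it lists
every value the loop could ever pass as maxnode.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not libnuma code: record the maxnode values the
 * probing loop would pass if every call kept failing with EINVAL.
 * Returns the number of values stored in out[] (at most cap).
 */
static size_t probe_maxnode_values(unsigned long *out, size_t cap)
{
	unsigned long nodemask_sz = 16;
	size_t n = 0;

	do {
		nodemask_sz <<= 1;			/* 32, 64, 128, ... bits */
		if (n < cap)
			out[n++] = nodemask_sz + 1;	/* value passed as maxnode */
	} while (nodemask_sz < 4096 * 8);

	return n;
}
```

Every value is a power of 2 plus one, so none of them can be equal to a
power-of-2 MAX_NUMNODES.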
However, that has changed recently, when get_mempolicy() switched
to
	if (nmask != NULL && maxnode < nr_node_ids)
		return -EINVAL;
_That_ can trigger. Consider a box with nr_node_ids == 65.
The first call in libnuma:set_nodemask_size() loop will
pass 33 and fail, then we'll raise nodemask_sz to 64,
allocate a 64-bit mask and call get_mempolicy(&pol, mask, 65, 0, 0),
which will succeed. OK, so we decide to use 64-bit bitmaps, and
subsequent getpol() will be passing 65 to get_mempolicy(2). Which
is not a good idea, since kernel-side we'll get
	copy_nodes_to_user(nmask, 65, &nodes)
And that will copy only 8 bytes out of the kernel-side bitmap with
65 bits in it...
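The size mismatch can be checked with a few lines of arithmetic. This
is a userland model, not the kernel code: it assumes the copy length is
computed as ALIGN(maxnode - 1, 64) / 8, which is how I recall
copy_nodes_to_user() doing it, and it assumes 64-bit longs. The names
align_up(), bytes_copied() and bytes_needed() are made up for the
sketch.

```c
/* Round x up to a multiple of a; a must be a power of 2. */
static unsigned long align_up(unsigned long x, unsigned long a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Assumed model of the copy length for a given maxnode:
 * maxnode - 1 bits, rounded up to whole 64-bit words, in bytes.
 */
static unsigned long bytes_copied(unsigned long maxnode)
{
	return align_up(maxnode - 1, 64) / 8;
}

/* Bytes needed to hold nr_node_ids bits in whole 64-bit words. */
static unsigned long bytes_needed(unsigned long nr_node_ids)
{
	return align_up(nr_node_ids, 64) / 8;
}
```

Under those assumptions bytes_copied(65) is 8 while bytes_needed(65) is
16, so the word holding node 64 never makes it to userland; a caller
has to pass at least 66 before the second word is copied.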
IOW, that check should always have been <=, not <; it didn't matter
until commit 050c17f239fd ("numa: change get_mempolicy() to use
nr_node_ids instead of MAX_NUMNODES") this year. The fix is trivial
- we need to make that check consistent with the code that does
the actual copyin/copyout.
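For illustration, the effect of the two comparisons on the libnuma
probing loop can be modelled in userland. Nothing here calls the real
syscall: check_old() and check_new() are stand-ins for the kernel-side
test before and after the patch, and probe() is a hypothetical
reduction of the set_nodemask_size() loop quoted above.

```c
#include <errno.h>

/* Stand-in for the check before the patch: rejects maxnode < nr. */
static int check_old(unsigned long maxnode, unsigned long nr_node_ids)
{
	return maxnode < nr_node_ids ? -EINVAL : 0;
}

/* Stand-in for the check after the patch: rejects maxnode <= nr. */
static int check_new(unsigned long maxnode, unsigned long nr_node_ids)
{
	return maxnode <= nr_node_ids ? -EINVAL : 0;
}

/*
 * Hypothetical reduction of the libnuma loop: keep doubling the mask
 * size until the check accepts nodemask_sz + 1, and return the mask
 * size in bits that the loop settles on.
 */
static unsigned long probe(int (*check)(unsigned long, unsigned long),
			   unsigned long nr_node_ids)
{
	unsigned long nodemask_sz = 16;

	do {
		nodemask_sz <<= 1;
	} while (check(nodemask_sz + 1, nr_node_ids) < 0 &&
		 nodemask_sz < 4096 * 8);

	return nodemask_sz;
}
```

With nr_node_ids == 65 the model settles on a 64-bit mask under
check_old() and on a 128-bit mask under check_new(); with
nr_node_ids == 64 both give 64 bits.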
Fixes: 050c17f239fd ("numa: change get_mempolicy() to use nr_node_ids instead of MAX_NUMNODES")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 4ae967bcf954..e184df7633b0 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1561,7 +1561,7 @@ static int kernel_get_mempolicy(int __user *policy,
 
 	addr = untagged_addr(addr);
 
-	if (nmask != NULL && maxnode < nr_node_ids)
+	if (nmask != NULL && maxnode <= nr_node_ids)
 		return -EINVAL;
 
 	err = do_get_mempolicy(&pval, &nodes, addr, flags);
* Re: [PATCH] off-by-one in get_mempolicy(2)
From: Vlastimil Babka @ 2019-10-09 14:48 UTC
To: Al Viro, Linus Torvalds
Cc: Ralph Campbell, Alexander Duyck, Waiman Long, Andi Kleen,
linux-mm, linux-api, Andrew Morton
On 10/9/19 4:05 PM, Al Viro wrote:
> get_mempolicy(2) and related syscalls have always passed
> 1 + number of bits in nodemask as maxnodes argument - see e.g.
> copy_nodes_to_user() and get_nodes(). Or libnuma, for the userland
> side -
> static void getpol(int *oldpolicy, struct bitmask *bmp)
> {
> if (get_mempolicy(oldpolicy, bmp->maskp, bmp->size + 1, 0, 0) < 0)
> numa_error("get_mempolicy");
> }
> and similar for other syscalls. However, the check for insufficient
> destination size in get_mempolicy(2) used to be
> if (nmask != NULL && maxnode < MAX_NUMNODES)
> return -EINVAL;
> IOW, maxnode == MAX_NUMNODES (representing "MAX_NUMNODES - 1 bits")
> had been accepted. The reason why that hadn't messed libnuma
> logics used to determine the required bitmap size is that
> MAX_NUMNODES is always a power of 2 and the loop in libnuma
> is
> nodemask_sz = 16;
> do {
> nodemask_sz <<= 1;
> mask = realloc(mask, nodemask_sz / 8);
> if (!mask)
> return;
> } while (get_mempolicy(&pol, mask, nodemask_sz + 1, 0, 0) < 0 && errno == EINVAL &&
> nodemask_sz < 4096*8);
> I.e. it's been passing 33, 65, 129, etc. until it got it large enough.
Sigh, it was silly of me to hope nobody was doing that [1]. I thought
libnuma was parsing /proc/self/status though; IIRC I checked [2].
> That sidesteps the boundary case - we never try to pass exactly
> MAX_NUMNODES there.
>
> However, that has changed recently, when get_mempolicy() switched
> to
> if (nmask != NULL && maxnode < nr_node_ids)
> return -EINVAL;
> _That_ can trigger. Consider a box with nr_node_ids == 65.
> The first call in libnuma:set_nodemask_size() loop will
> pass 33 and fail, then we'll raise nodemask_sz to 64,
> allocate a 64bit mask and call get_mempolicy(&pol, mask, 65, 0, 0),
> which will succeed. OK, so we decide to use 64bit bitmaps, and
> subsequent getpol() will be passing 65 to get_mempolicy(2). Which
> is not a good idea, since kernel-side we'll get
> copy_nodes_to_user(nmask, 65, &nodes)
> And that will copy only 8 bytes out of kernel-side bitmap with
> 65 bits in it...
>
> IOW, that check always should had been <=, not <; it didn't matter
> until commit 050c17f239fd ("numa: change get_mempolicy() to use
> nr_node_ids instead of MAX_NUMNODES") this year. The fix is trivial
> - we need to make that check consistent with the code that does
> actual copyin/copyout.
>
> Fixes: 050c17f239fd ("numa: change get_mempolicy() to use nr_node_ids instead of MAX_NUMNODES")
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We should have reverted 050c17f239fd, as it was fixing a patch in mmotm
that was ultimately discarded. It's not ideal, e.g. for CRIU, to determine
maxnode on an old system and keep the value even on a new system with
possibly more nodes. But the commit was pushed into stable too quickly,
complicating the situation.
If we're not reverting then
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Thanks.
[1]
https://lore.kernel.org/linux-mm/32575d26-b141-6985-833a-12d48c0dce6a@suse.cz/
[2]
https://lore.kernel.org/linux-mm/4dab8a83-803a-56e0-6bbf-bdf581f2d1b4@suse.cz/
> ---
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 4ae967bcf954..e184df7633b0 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1561,7 +1561,7 @@ static int kernel_get_mempolicy(int __user *policy,
>
> addr = untagged_addr(addr);
>
> - if (nmask != NULL && maxnode < nr_node_ids)
> + if (nmask != NULL && maxnode <= nr_node_ids)
> return -EINVAL;
>
> err = do_get_mempolicy(&pval, &nodes, addr, flags);
>