* [PATCH] bitmap: fix a unproper remap when mpol_rebind_nodemask() @ 2022-06-13 11:29 Yi Wang 2022-06-13 19:20 ` Yury Norov 0 siblings, 1 reply; 6+ messages in thread From: Yi Wang @ 2022-06-13 11:29 UTC (permalink / raw) To: yury.norov, andriy.shevchenko, linux Cc: linux-kernel, xue.zhihong, wang.yi59, wang.liang82 Consider one situation: The app have two vmas which mbind() to node 1 and node3 respectively, and its cpuset.mems is 0-3, now set its cpuset.mems to 1,3, according to current bitmap_remap(), we got: 1 => 3 3 => 3 This maybe confused because node 1,3 have already in the new settiing region but both nodes are binded to the same node 3 now. Actually we found the situation on a very old libvirt and qemu, but this can be easily reproduced in the current kernel, so we try to fix it. A possible fix way is to ignore the bits in @src have already existed in @new. Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> --- lib/bitmap.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/lib/bitmap.c b/lib/bitmap.c index b18e31ea6e66..b77bf1b3852e 100644 --- a/lib/bitmap.c +++ b/lib/bitmap.c @@ -1006,8 +1006,8 @@ unsigned int bitmap_ord_to_pos(const unsigned long *buf, unsigned int ord, unsig * @dst point to the same location, then this routine copies @src * to @dst. * - * The positions of unset bits in @old are mapped to themselves - * (the identify map). + * The positions of unset bits in @old or bits in @src have already + * existed in @new are mapped to themselves (the identify map). * * Apply the above specified mapping to @src, placing the result in * @dst, clearing any bits previously set in @dst. @@ -1033,7 +1033,7 @@ void bitmap_remap(unsigned long *dst, const unsigned long *src, for_each_set_bit(oldbit, src, nbits) { int n = bitmap_pos_to_ord(old, oldbit, nbits); - if (n < 0 || w == 0) + if (n < 0 || w == 0 || test_bit(oldbit, new)) set_bit(oldbit, dst); /* identity map */ else set_bit(bitmap_ord_to_pos(new, n % w, nbits), dst); -- 2.33.0.rc0.dirty ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] bitmap: fix a unproper remap when mpol_rebind_nodemask() 2022-06-13 11:29 [PATCH] bitmap: fix a unproper remap when mpol_rebind_nodemask() Yi Wang @ 2022-06-13 19:20 ` Yury Norov 2022-06-14 3:45 ` wang.yi59 0 siblings, 1 reply; 6+ messages in thread From: Yury Norov @ 2022-06-13 19:20 UTC (permalink / raw) To: Yi Wang; +Cc: andriy.shevchenko, linux, linux-kernel, xue.zhihong, wang.liang82 On Mon, Jun 13, 2022 at 4:31 AM Yi Wang <wang.yi59@zte.com.cn> wrote: > > Consider one situation: > > The app have two vmas which mbind() to node 1 and node3 respectively, > and its cpuset.mems is 0-3, now set its cpuset.mems to 1,3, according > to current bitmap_remap(), we got: > > 1 => 3 > 3 => 3 > > This maybe confused because node 1,3 have already in the new settiing > region but both nodes are binded to the same node 3 now. > > Actually we found the situation on a very old libvirt and qemu, but > this can be easily reproduced in the current kernel, so we try to fix > it. > > A possible fix way is to ignore the bits in @src have already existed > in @new. > > Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> > --- > lib/bitmap.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/lib/bitmap.c b/lib/bitmap.c > index b18e31ea6e66..b77bf1b3852e 100644 > --- a/lib/bitmap.c > +++ b/lib/bitmap.c > @@ -1006,8 +1006,8 @@ unsigned int bitmap_ord_to_pos(const unsigned long *buf, unsigned int ord, unsig > * @dst point to the same location, then this routine copies @src > * to @dst. > * > - * The positions of unset bits in @old are mapped to themselves > - * (the identify map). > + * The positions of unset bits in @old or bits in @src have already > + * existed in @new are mapped to themselves (the identify map). > * > * Apply the above specified mapping to @src, placing the result in > * @dst, clearing any bits previously set in @dst. > @@ -1033,7 +1033,7 @@ void bitmap_remap(unsigned long *dst, const unsigned long *src, > for_each_set_bit(oldbit, src, nbits) { > int n = bitmap_pos_to_ord(old, oldbit, nbits); > > - if (n < 0 || w == 0) > + if (n < 0 || w == 0 || test_bit(oldbit, new)) > set_bit(oldbit, dst); /* identity map */ > else > set_bit(bitmap_ord_to_pos(new, n % w, nbits), dst); > -- > 2.33.0.rc0.dirty Regarding the original problem - can you please confirm that it's reproduced on current kernels, show the execution path etc. From what I see on modern kernel, the only user of nodes_remap() is mpol_rebind_nodemask(). Is that the correct path? Anyways, as per name, bitmap_remap() is intended to change bit positions, and it doesn't look wrong if it does so. This is not how the function is supposed to work. For example, old: 00111000 new: 00011100 means: old: 00111 000 || \\\||| new: 000 11100 And after this patch it would be: old: 001 11000 || \||||| new: 000 11100 Which is not the same, right? If mpol_rebind() wants to keep previous relations, then according to the comment: * The positions of unset bits in @old are mapped to themselves * (the identify map). , you can just clear @old bits that already have good relations you'd like to preserve. Thanks, Yury ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re:[PATCH] bitmap: fix a unproper remap when mpol_rebind_nodemask() 2022-06-13 19:20 ` Yury Norov @ 2022-06-14 3:45 ` wang.yi59 2022-06-14 16:40 ` [PATCH] " Yury Norov 0 siblings, 1 reply; 6+ messages in thread From: wang.yi59 @ 2022-06-14 3:45 UTC (permalink / raw) To: yury.norov Cc: andriy.shevchenko, linux, linux-kernel, xue.zhihong, wang.liang82, Liu.Jianjun3 Hi Yury, Thanks for your quick and clear response! > On Mon, Jun 13, 2022 at 4:31 AM Yi Wang <wang.yi59@zte.com.cn> wrote: > > > > Consider one situation: > > > > The app have two vmas which mbind() to node 1 and node3 respectively, > > and its cpuset.mems is 0-3, now set its cpuset.mems to 1,3, according > > to current bitmap_remap(), we got: > > Regarding the original problem - can you please confirm that > it's reproduced on current kernels, show the execution path etc. > From what I see on modern kernel, the only user of nodes_remap() > is mpol_rebind_nodemask(). Is that the correct path? Yes, it's mpol_rebind_nodemask() calls nodes_remap() from mpol_rebind_policy(). The stacks are as follow: [ 290.836747] bitmap_remap+0x84/0xe0 [ 290.836753] mpol_rebind_nodemask+0x64/0x2a0 [ 290.836764] mpol_rebind_mm+0x3a/0x90 [ 290.836768] update_tasks_nodemask+0x8a/0x1e0 [ 290.836774] cpuset_write_resmask+0x563/0xa00 [ 290.836780] cgroup_file_write+0x81/0x150 [ 290.836784] kernfs_fop_write_iter+0x12d/0x1c0 [ 290.836791] new_sync_write+0x109/0x190 [ 290.836800] vfs_write+0x218/0x2a0 [ 290.836809] ksys_write+0x59/0xd0 [ 290.836812] do_syscall_64+0x37/0x80 [ 290.836818] entry_SYSCALL_64_after_hwframe+0x46/0xb0 To reproduce this situation, I write a program which seems like this: unsigned int flags = MAP_PRIVATE | MAP_ANONYMOUS; unsigned long size = 262144 << 12; unsigned long node1 = 2; // node 1 unsigned long node2 = 8; // node 3 p1 = vma1 = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0); p2 = vma2 = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0); assert(!mbind(vma1, size, MPOL_BIND, &node1, MAX_NODES, MPOL_MF_STRICT | MPOL_MF_MOVE)); assert(!mbind(vma2, size, MPOL_BIND, &node2, MAX_NODES, MPOL_MF_STRICT | MPOL_MF_MOVE)); Start the program whos name is mbind_tester, and do follow steps: mkdir && cd /sys/fs/cgroup/cpuset/mbind echo 0-31 > cpuset.cpus echo 0-3 > cpuset.mems cat /proc/`pidof mbind_tester`/numa_maps |grep bind -w 7ff73e200000 bind:3 anon=262144 dirty=262144 active=0 N3=262144 kernelpagesize_kB=4 7ff77e200000 bind:1 anon=262144 dirty=262144 active=0 N1=262144 kernelpagesize_kB=4 echo 1,3 > cpuset.mems cat /proc/`pidof mbind_tester`/numa_maps |grep bind -w 7ff73e200000 bind:3 anon=262144 dirty=262144 active=0 N3=262144 kernelpagesize_kB=4 7ff77e200000 bind:3 anon=262144 dirty=262144 active=0 N1=262144 kernelpagesize_kB=4 As you see, after set cpuset.mems to 1,3, the nodes which one of vma binded to changed from 1 to 3. This maybe confused, the original nodes binded is 1, after modify cpuset.mems to 1,3 which include the node 3, it changed to 3... > > Anyways, as per name, bitmap_remap() is intended to change bit > positions, and it doesn't look wrong if it does so. > > This is not how the function is supposed to work. For example, > old: 00111000 > new: 00011100 > > means: > old: 00111 000 > || \\\||| > new: 000 11100 > > And after this patch it would be: > old: 001 11000 > || \||||| > new: 000 11100 > > Which is not the same, right? Right. Actually this is what makes me embarrassed. If we want to fix this situtation, we can: - change the bitmap_remap() as this patch did, but this changed the behavior of this routine which looks does the right thing. One good news is this function is only called by mpol_rebind_nodemask(). - don't change the bitmap_remap(), to be honest, I didn't figure out a way. Any suggestions? > > If mpol_rebind() wants to keep previous relations, then according to > the comment: > * The positions of unset bits in @old are mapped to themselves > * (the identify map). > > , you can just clear @old bits that already have good relations > you'd like to preserve. Actually this does not work for me :) In the example above, if set cpuset.mems to 0,2 firstly, the nodes binds will change from 1 to 2. And then set cpuset.mems to 1,3, it will change from 2 to 3 again. --- Best wishes Yi Wang ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] bitmap: fix a unproper remap when mpol_rebind_nodemask() 2022-06-14 3:45 ` wang.yi59 @ 2022-06-14 16:40 ` Yury Norov 2022-06-14 16:50 ` Andy Shevchenko 2022-06-15 8:54 ` wang.yi59 0 siblings, 2 replies; 6+ messages in thread From: Yury Norov @ 2022-06-14 16:40 UTC (permalink / raw) To: wang.yi59, Andrew Morton, linux-mm Cc: andriy.shevchenko, linux, linux-kernel, xue.zhihong, wang.liang82, Liu.Jianjun3, Yury Norov + Andrew Morton <akpm@linux-foundation.org> + linux-mm@kvack.org On Mon, Jun 13, 2022 at 8:45 PM <wang.yi59@zte.com.cn> wrote: > > Hi Yury, > > Thanks for your quick and clear response! > > > On Mon, Jun 13, 2022 at 4:31 AM Yi Wang <wang.yi59@zte.com.cn> wrote: > > > > > > Consider one situation: > > > > > > The app have two vmas which mbind() to node 1 and node3 respectively, > > > and its cpuset.mems is 0-3, now set its cpuset.mems to 1,3, according > > > to current bitmap_remap(), we got: > > > > Regarding the original problem - can you please confirm that > > it's reproduced on current kernels, show the execution path etc. > > From what I see on modern kernel, the only user of nodes_remap() > > is mpol_rebind_nodemask(). Is that the correct path? > > Yes, it's mpol_rebind_nodemask() calls nodes_remap() from > mpol_rebind_policy(). The stacks are as follow: > [ 290.836747] bitmap_remap+0x84/0xe0 > [ 290.836753] mpol_rebind_nodemask+0x64/0x2a0 > [ 290.836764] mpol_rebind_mm+0x3a/0x90 > [ 290.836768] update_tasks_nodemask+0x8a/0x1e0 > [ 290.836774] cpuset_write_resmask+0x563/0xa00 > [ 290.836780] cgroup_file_write+0x81/0x150 > [ 290.836784] kernfs_fop_write_iter+0x12d/0x1c0 > [ 290.836791] new_sync_write+0x109/0x190 > [ 290.836800] vfs_write+0x218/0x2a0 > [ 290.836809] ksys_write+0x59/0xd0 > [ 290.836812] do_syscall_64+0x37/0x80 > [ 290.836818] entry_SYSCALL_64_after_hwframe+0x46/0xb0 > > To reproduce this situation, I write a program which seems like this: > unsigned int flags = MAP_PRIVATE | MAP_ANONYMOUS; > unsigned long size = 262144 << 12; > unsigned long node1 = 2; // node 1 > unsigned long node2 = 8; // node 3 > > p1 = vma1 = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0); > p2 = vma2 = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0); > > assert(!mbind(vma1, size, MPOL_BIND, &node1, MAX_NODES, MPOL_MF_STRICT | MPOL_MF_MOVE)); > assert(!mbind(vma2, size, MPOL_BIND, &node2, MAX_NODES, MPOL_MF_STRICT | MPOL_MF_MOVE)); > > Start the program whos name is mbind_tester, and do follow steps: > > mkdir && cd /sys/fs/cgroup/cpuset/mbind > echo 0-31 > cpuset.cpus > echo 0-3 > cpuset.mems > > cat /proc/`pidof mbind_tester`/numa_maps |grep bind -w > 7ff73e200000 bind:3 anon=262144 dirty=262144 active=0 N3=262144 kernelpagesize_kB=4 > 7ff77e200000 bind:1 anon=262144 dirty=262144 active=0 N1=262144 kernelpagesize_kB=4 > > echo 1,3 > cpuset.mems > cat /proc/`pidof mbind_tester`/numa_maps |grep bind -w > 7ff73e200000 bind:3 anon=262144 dirty=262144 active=0 N3=262144 kernelpagesize_kB=4 > 7ff77e200000 bind:3 anon=262144 dirty=262144 active=0 N1=262144 kernelpagesize_kB=4 > > As you see, after set cpuset.mems to 1,3, the nodes which one of vma > binded to changed from 1 to 3. > > This maybe confused, the original nodes binded is 1, after modify > cpuset.mems to 1,3 which include the node 3, it changed to 3... Ok, thanks for the reproducer. I'll take a look at it closer to the weekend. > > Anyways, as per name, bitmap_remap() is intended to change bit > > positions, and it doesn't look wrong if it does so. > > > > This is not how the function is supposed to work. For example, > > old: 00111000 > > new: 00011100 > > > > means: > > old: 00111 000 > > || \\\||| > > new: 000 11100 > > > > And after this patch it would be: > > old: 001 11000 > > || \||||| > > new: 000 11100 > > > > Which is not the same, right? > > Right. So, we both agree that bitmap_remap() works as advertised. This is good. Let's try figuring out a solution without touching it. > Actually this is what makes me embarrassed. If we want to fix this > situtation, we can: > > - change the bitmap_remap() as this patch did, but this changed the > behavior of this routine which looks does the right thing. One good > news is this function is only called by mpol_rebind_nodemask(). There are users of bitmap_remap() in drivers/gpio/gpio-xilinx.c > - don't change the bitmap_remap(), to be honest, I didn't figure out > a way. Any suggestions? I haven't had a chance to play with it (because of my dayjob), but I have a strong feeling that the proper solution should come from existing functionality. Did you experiment with MPOL_F_{STATIC,RELATIVE}_NODES? Those flags enable nodes_and() and mpol_relative_nodemask() paths correspondingly. > > If mpol_rebind() wants to keep previous relations, then according to > > the comment: > > * The positions of unset bits in @old are mapped to themselves > > * (the identify map). > > > > , you can just clear @old bits that already have good relations > > you'd like to preserve. > > Actually this does not work for me :) What I suggested is: 328 node_clear(1, pol->w.cpuset_mems_allowed); 329 node_clear(3, pol->w.cpuset_mems_allowed); 330 nodes_remap(tmp, pol->nodes, pol->w.cpuset_mems_allowed, 331 *nodes); 332 pol->w.cpuset_mems_allowed = *nodes; > In the example above, if set cpuset.mems to 0,2 firstly, the nodes > binds will change from 1 to 2. And then set cpuset.mems to 1,3, it will > change from 2 to 3 again. I bet you can find a sequence that will finally give you the desired binding. And probably somebody does this black magic in production. For me it looks just scary. Can you try those static/relative flags, and if it doesn't work, we'll have to invent another policy for nodes binding. Adding Andrew and linux-mm, as it's definitely beyond bitmaps scope. Thanks, Yury > --- > Best wishes > Yi Wang ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] bitmap: fix a unproper remap when mpol_rebind_nodemask() 2022-06-14 16:40 ` [PATCH] " Yury Norov @ 2022-06-14 16:50 ` Andy Shevchenko 2022-06-15 8:54 ` wang.yi59 1 sibling, 0 replies; 6+ messages in thread From: Andy Shevchenko @ 2022-06-14 16:50 UTC (permalink / raw) To: Yury Norov Cc: wang.yi59, Andrew Morton, linux-mm, linux, linux-kernel, xue.zhihong, wang.liang82, Liu.Jianjun3 On Tue, Jun 14, 2022 at 09:40:08AM -0700, Yury Norov wrote: > On Mon, Jun 13, 2022 at 8:45 PM <wang.yi59@zte.com.cn> wrote: > > > On Mon, Jun 13, 2022 at 4:31 AM Yi Wang <wang.yi59@zte.com.cn> wrote: ... > > > Anyways, as per name, bitmap_remap() is intended to change bit > > > positions, and it doesn't look wrong if it does so. Haven't read the discussion in full, but saw the function here and my 2 cents are the following: 1) if we don't have tests for bitmap_remap() / bitmap_bitremap(), we should; 2) these functions are used at least in one GPIO driver, so it would be nice to not touch them if they are not broken (see 1 above). -- With Best Regards, Andy Shevchenko ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re:[PATCH] bitmap: fix a unproper remap when mpol_rebind_nodemask() 2022-06-14 16:40 ` [PATCH] " Yury Norov 2022-06-14 16:50 ` Andy Shevchenko @ 2022-06-15 8:54 ` wang.yi59 1 sibling, 0 replies; 6+ messages in thread From: wang.yi59 @ 2022-06-15 8:54 UTC (permalink / raw) To: yury.norov Cc: akpm, linux-mm, andriy.shevchenko, linux, linux-kernel, xue.zhihong, wang.liang82, Liu.Jianjun3, yury.norov > On Mon, Jun 13, 2022 at 8:45 PM <wang.yi59@zte.com.cn> wrote: > > > > > > - change the bitmap_remap() as this patch did, but this changed the > > behavior of this routine which looks does the right thing. One good > > news is this function is only called by mpol_rebind_nodemask(). > > There are users of bitmap_remap() in drivers/gpio/gpio-xilinx.c > > > - don't change the bitmap_remap(), to be honest, I didn't figure out > > a way. Any suggestions? > > I haven't had a chance to play with it (because of my dayjob), but I > have a strong feeling that the proper solution should come from > existing functionality. > > Did you experiment with MPOL_F_{STATIC,RELATIVE}_NODES? > Those flags enable nodes_and() and mpol_relative_nodemask() > paths correspondingly. > You are right, adding MPOL_F_STATIC_NODES can fix this situation. So it seems that we don't need to do anything in kernel now. Many thanks for your time :) > What I suggested is: > > 328 node_clear(1, pol->w.cpuset_mems_allowed); > 329 node_clear(3, pol->w.cpuset_mems_allowed); > 330 nodes_remap(tmp, pol->nodes, pol->w.cpuset_mems_allowed, > 331 *nodes); > 332 pol->w.cpuset_mems_allowed = *nodes; > > > In the example above, if set cpuset.mems to 0,2 firstly, the nodes > > binds will change from 1 to 2. And then set cpuset.mems to 1,3, it will > > change from 2 to 3 again. --- Best wishes Yi Wang ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2022-06-15 8:55 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-06-13 11:29 [PATCH] bitmap: fix a unproper remap when mpol_rebind_nodemask() Yi Wang 2022-06-13 19:20 ` Yury Norov 2022-06-14 3:45 ` wang.yi59 2022-06-14 16:40 ` [PATCH] " Yury Norov 2022-06-14 16:50 ` Andy Shevchenko 2022-06-15 8:54 ` wang.yi59
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).