linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Re: install_special_mapping && vm_pgoff (Was: vvar, gup && coredump)
       [not found]                 ` <20150316190154.GA18472@redhat.com>
@ 2015-03-16 19:20                   ` Andy Lutomirski
  2015-03-16 19:44                     ` Oleg Nesterov
  0 siblings, 1 reply; 5+ messages in thread
From: Andy Lutomirski @ 2015-03-16 19:20 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Hugh Dickins, Linus Torvalds, Jan Kratochvil,
	Sergio Durigan Junior, GDB Patches, Pedro Alves, linux-kernel,
	linux-mm

[cc: linux-mm]

On Mon, Mar 16, 2015 at 12:01 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 03/12, Oleg Nesterov wrote:
>>
>> OTOH. We can probably add ->access() into special_mapping_vmops, this
>> way __access_remote_vm() could work even if gup() fails ?
>
> So I tried to think how special_mapping_vmops->access() can work, it
> needs to rely on ->vm_pgoff.
>
> But afaics this logic is just broken. Lets even forget about vvar vma
> which uses remap_file_pages(). Lets look at "[vdso]" which uses the
> "normal" pages.
>
> The comment in special_mapping_fault() says
>
>          * special mappings have no vm_file, and in that case, the mm
>          * uses vm_pgoff internally.
>
> Yes. But afaics mm/ doesn't do this correctly. So
>
>          * do not copy this code into drivers!
>
> looks like a good recommendation ;)
>
> I think that this logic is wrong even if ARRAY_SIZE(pages) == 1, but I am
> not sure. But since vdso use 2 pages, it is trivial to show that this logic
> is wrong. To verify, I changed show_map_vma() to expose pgoff even if !file,
> but this test-case can show the problem too:
>
>         #include <stdio.h>
>         #include <unistd.h>
>         #include <stdlib.h>
>         #include <string.h>
>         #include <sys/mman.h>
>         #include <assert.h>
>
>         void *find_vdso_vaddr(void)
>         {
>                 FILE *perl;
>                 char buf[32] = {};
>
>                 perl = popen("perl -e 'open STDIN,qq|/proc/@{[getppid]}/maps|;"
>                                 "/^(.*?)-.*vdso/ && print hex $1 while <>'", "r");
>                 fread(buf, sizeof(buf), 1, perl);
>                 fclose(perl);
>
>                 return (void *)atol(buf);
>         }
>
>         #define PAGE_SIZE       4096
>
>         int main(void)
>         {
>                 void *vdso = find_vdso_vaddr();
>                 assert(vdso);
>
>                 // of course they should differ, and they do so far
>                 printf("vdso pages differ: %d\n",
>                         !!memcmp(vdso, vdso + PAGE_SIZE, PAGE_SIZE));
>
>                 // split into 2 vma's
>                 assert(mprotect(vdso, PAGE_SIZE, PROT_READ) == 0);
>
>                 // force another fault on the next check
>                 assert(madvise(vdso, 2 * PAGE_SIZE, MADV_DONTNEED) == 0);

I really hope this doesn't do anything (or fails) on the vvar page,
which is a pfnmap.

>
>                 // now they no longer differ, the 2nd vm_pgoff is wrong
>                 printf("vdso pages differ: %d\n",
>                         !!memcmp(vdso, vdso + PAGE_SIZE, PAGE_SIZE));
>
>                 return 0;
>         }
>
> output:
>
>         vdso pages differ: 1
>         vdso pages differ: 0
>
> And not only "split_vma" is wrong, I think that "move_vma" is not right too.
> Note this check in copy_vma(),
>
>         /*
>          * If anonymous vma has not yet been faulted, update new pgoff
>          * to match new location, to increase its chance of merging.
>          */
>         if (unlikely(!vma->vm_file && !vma->anon_vma)) {
>                 pgoff = addr >> PAGE_SHIFT;
>                 faulted_in_anon_vma = false;
>         }
>
> I can easily misread this code. But it doesn't look right too. If vdso was cow'ed
> (breakpoint installed by gdb) and sys_nremap()'ed, then the new pgoff will be wrong
> too after, say, MADV_DONTNEED.
>
> Or I am totally confused?

Ick, you're probably right.  For what it's worth, the vdso *seems* to
be okay (on 64-bit only, and only if you don't poke at it too hard) if
you mremap it in one piece.  CRIU does that.

What does the mm code do with vm_pgoff for vmas with no vm_file?  I'm
mystified.  There's this comment:

 * The way we recognize COWed pages within VM_PFNMAP mappings is through the
 * rules set up by "remap_pfn_range()": the vma will have the VM_PFNMAP bit
 * set, and the vm_pgoff will point to the first PFN mapped: thus every special
 * mapping will always honor the rule
 *
 *    pfn_of_page == vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT)

Is that referring to special mappings in the install_special_mapping
sense or to something else.  FWIW, the vdso ins't a VM_PFNMAP at all.

--Andy

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: install_special_mapping && vm_pgoff (Was: vvar, gup && coredump)
  2015-03-16 19:20                   ` install_special_mapping && vm_pgoff (Was: vvar, gup && coredump) Andy Lutomirski
@ 2015-03-16 19:44                     ` Oleg Nesterov
  2015-03-17 13:43                       ` Oleg Nesterov
  0 siblings, 1 reply; 5+ messages in thread
From: Oleg Nesterov @ 2015-03-16 19:44 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Hugh Dickins, Linus Torvalds, Jan Kratochvil,
	Sergio Durigan Junior, GDB Patches, Pedro Alves, linux-kernel,
	linux-mm

On 03/16, Andy Lutomirski wrote:
>
> Ick, you're probably right.  For what it's worth, the vdso *seems* to
> be okay (on 64-bit only, and only if you don't poke at it too hard) if
> you mremap it in one piece.  CRIU does that.

I need to run away till tomorrow, but looking at this code even if "one piece"
case doesn't look right if it was cow'ed. I'll verify tomorrow.

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: install_special_mapping && vm_pgoff (Was: vvar, gup && coredump)
  2015-03-16 19:44                     ` Oleg Nesterov
@ 2015-03-17 13:43                       ` Oleg Nesterov
  2015-03-18  1:44                         ` Andy Lutomirski
  0 siblings, 1 reply; 5+ messages in thread
From: Oleg Nesterov @ 2015-03-17 13:43 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Hugh Dickins, Linus Torvalds, Jan Kratochvil,
	Sergio Durigan Junior, GDB Patches, Pedro Alves, linux-kernel,
	linux-mm

On 03/16, Oleg Nesterov wrote:
>
> On 03/16, Andy Lutomirski wrote:
> >
> > Ick, you're probably right.  For what it's worth, the vdso *seems* to
> > be okay (on 64-bit only, and only if you don't poke at it too hard) if
> > you mremap it in one piece.  CRIU does that.
>
> I need to run away till tomorrow, but looking at this code even if "one piece"
> case doesn't look right if it was cow'ed. I'll verify tomorrow.

And I am still not sure this all is 100% correct, but I got lost in this code.
Probably this is fine...

But at least the bug exposed by the test-case looks clear:

	do_linear_fault:

		vmf->pgoff = (((address & PAGE_MASK) - vma->vm_start) >> PAGE_SHIFT)
				+ vma->vm_pgoff;
		...

		special_mapping_fault:

			pgoff = vmf->pgoff - vma->vm_pgoff;


So special_mapping_fault() can only work if this mapping starts from the
first page in ->pages[].

So perhaps we need _something like_ the (wrong/incomplete) patch below...

Or, really, perhaps we can create vdso_mapping ? So that map_vdso() could
simply mmap the anon_inode file...

Oleg.

--- x/mm/mmap.c
+++ x/mm/mmap.c
@@ -2832,6 +2832,8 @@ int insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vma)
 	return 0;
 }
 
+bool is_special_vma(struct vm_area_struct *vma);
+
 /*
  * Copy the vma structure to a new location in the same mm,
  * prior to moving page table entries, to effect an mremap move.
@@ -2851,7 +2853,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
 	 * If anonymous vma has not yet been faulted, update new pgoff
 	 * to match new location, to increase its chance of merging.
 	 */
-	if (unlikely(!vma->vm_file && !vma->anon_vma)) {
+	if (unlikely(!vma->vm_file && !is_special_vma(vma) && !vma->anon_vma)) {
 		pgoff = addr >> PAGE_SHIFT;
 		faulted_in_anon_vma = false;
 	}
@@ -2953,6 +2955,11 @@ static const struct vm_operations_struct legacy_special_mapping_vmops = {
 	.fault = special_mapping_fault,
 };
 
+bool is_special_vma(struct vm_area_struct *vma)
+{
+	return vma->vm_ops == &special_mapping_vmops;
+}
+
 static int special_mapping_fault(struct vm_area_struct *vma,
 				struct vm_fault *vmf)
 {
@@ -2965,7 +2972,7 @@ static int special_mapping_fault(struct vm_area_struct *vma,
 	 * We are allowed to do this because we are the mm; do not copy
 	 * this code into drivers!
 	 */
-	pgoff = vmf->pgoff - vma->vm_pgoff;
+	pgoff = vmf->pgoff;
 
 	if (vma->vm_ops == &legacy_special_mapping_vmops)
 		pages = vma->vm_private_data;
@@ -3014,6 +3021,7 @@ static struct vm_area_struct *__install_special_mapping(
 	if (ret)
 		goto out;
 
+	vma->vm_pgoff = 0;
 	mm->total_vm += len >> PAGE_SHIFT;
 
 	perf_event_mmap(vma);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: install_special_mapping && vm_pgoff (Was: vvar, gup && coredump)
  2015-03-17 13:43                       ` Oleg Nesterov
@ 2015-03-18  1:44                         ` Andy Lutomirski
  2015-03-18 18:06                           ` Oleg Nesterov
  0 siblings, 1 reply; 5+ messages in thread
From: Andy Lutomirski @ 2015-03-18  1:44 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Hugh Dickins, Linus Torvalds, Jan Kratochvil,
	Sergio Durigan Junior, GDB Patches, Pedro Alves, linux-kernel,
	linux-mm

On Tue, Mar 17, 2015 at 6:43 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 03/16, Oleg Nesterov wrote:
>>
>> On 03/16, Andy Lutomirski wrote:
>> >
>> > Ick, you're probably right.  For what it's worth, the vdso *seems* to
>> > be okay (on 64-bit only, and only if you don't poke at it too hard) if
>> > you mremap it in one piece.  CRIU does that.
>>
>> I need to run away till tomorrow, but looking at this code even if "one piece"
>> case doesn't look right if it was cow'ed. I'll verify tomorrow.
>
> And I am still not sure this all is 100% correct, but I got lost in this code.
> Probably this is fine...
>
> But at least the bug exposed by the test-case looks clear:
>
>         do_linear_fault:
>
>                 vmf->pgoff = (((address & PAGE_MASK) - vma->vm_start) >> PAGE_SHIFT)
>                                 + vma->vm_pgoff;
>                 ...
>
>                 special_mapping_fault:
>
>                         pgoff = vmf->pgoff - vma->vm_pgoff;
>
>
> So special_mapping_fault() can only work if this mapping starts from the
> first page in ->pages[].
>
> So perhaps we need _something like_ the (wrong/incomplete) patch below...
>
> Or, really, perhaps we can create vdso_mapping ? So that map_vdso() could
> simply mmap the anon_inode file...

That's slightly tricky, I think, because it could start showing up in
/proc/PID/map_files or whatever it's called, and I don't think we want
that.  I also don't want to commit to all special mappings everywhere
being semantically identical (there are already two kinds on both x86
and arm64, and I'd eventually like to have them vary per-process as
well).  None of that precludes using non-null vm_file, but it's a
complication.

Your patch does look like a considerable improvement, though.  Let me
see if I can find some time to fold it in with the rest of my special
mapping rework over the next few days.

--Andy

>
> Oleg.
>
> --- x/mm/mmap.c
> +++ x/mm/mmap.c
> @@ -2832,6 +2832,8 @@ int insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vma)
>         return 0;
>  }
>
> +bool is_special_vma(struct vm_area_struct *vma);
> +
>  /*
>   * Copy the vma structure to a new location in the same mm,
>   * prior to moving page table entries, to effect an mremap move.
> @@ -2851,7 +2853,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
>          * If anonymous vma has not yet been faulted, update new pgoff
>          * to match new location, to increase its chance of merging.
>          */
> -       if (unlikely(!vma->vm_file && !vma->anon_vma)) {
> +       if (unlikely(!vma->vm_file && !is_special_vma(vma) && !vma->anon_vma)) {
>                 pgoff = addr >> PAGE_SHIFT;
>                 faulted_in_anon_vma = false;
>         }
> @@ -2953,6 +2955,11 @@ static const struct vm_operations_struct legacy_special_mapping_vmops = {
>         .fault = special_mapping_fault,
>  };
>
> +bool is_special_vma(struct vm_area_struct *vma)
> +{
> +       return vma->vm_ops == &special_mapping_vmops;
> +}
> +
>  static int special_mapping_fault(struct vm_area_struct *vma,
>                                 struct vm_fault *vmf)
>  {
> @@ -2965,7 +2972,7 @@ static int special_mapping_fault(struct vm_area_struct *vma,
>          * We are allowed to do this because we are the mm; do not copy
>          * this code into drivers!
>          */
> -       pgoff = vmf->pgoff - vma->vm_pgoff;
> +       pgoff = vmf->pgoff;
>
>         if (vma->vm_ops == &legacy_special_mapping_vmops)
>                 pages = vma->vm_private_data;
> @@ -3014,6 +3021,7 @@ static struct vm_area_struct *__install_special_mapping(
>         if (ret)
>                 goto out;
>
> +       vma->vm_pgoff = 0;
>         mm->total_vm += len >> PAGE_SHIFT;
>
>         perf_event_mmap(vma);
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: install_special_mapping && vm_pgoff (Was: vvar, gup && coredump)
  2015-03-18  1:44                         ` Andy Lutomirski
@ 2015-03-18 18:06                           ` Oleg Nesterov
  0 siblings, 0 replies; 5+ messages in thread
From: Oleg Nesterov @ 2015-03-18 18:06 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Hugh Dickins, Linus Torvalds, Jan Kratochvil,
	Sergio Durigan Junior, GDB Patches, Pedro Alves, linux-kernel,
	linux-mm

On 03/17, Andy Lutomirski wrote:
>
> On Tue, Mar 17, 2015 at 6:43 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > But at least the bug exposed by the test-case looks clear:
> >
> >         do_linear_fault:
> >
> >                 vmf->pgoff = (((address & PAGE_MASK) - vma->vm_start) >> PAGE_SHIFT)
> >                                 + vma->vm_pgoff;
> >                 ...
> >
> >                 special_mapping_fault:
> >
> >                         pgoff = vmf->pgoff - vma->vm_pgoff;
> >
> >
> > So special_mapping_fault() can only work if this mapping starts from the
> > first page in ->pages[].
> >
> > So perhaps we need _something like_ the (wrong/incomplete) patch below...
> >
> > Or, really, perhaps we can create vdso_mapping ? So that map_vdso() could
> > simply mmap the anon_inode file...
>
> That's slightly tricky, I think, because it could start showing up in
> /proc/PID/map_files or whatever it's called, and I don't think we want
> that.

Hmm. To me this looke liks improvement. And again, with this change
uprobe-in-vdso can work.

OK, this is off-topic right now, lets forget this for the moment.

> Your patch does look like a considerable improvement, though.  Let me
> see if I can find some time to fold it in with the rest of my special
> mapping rework over the next few days.

I'll try to recheck... Perhaps I'll send this (changed) patch for review.
This is a bugfix, even if the bug is minor.

And note that with this change vvar->access() becomes trivial. I think it
makes sense to fix "gup() fails in vvar" too. Gdb developers have enough
other problems with the poor kernel interfaces ;)

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2015-03-18 18:51 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <878ufc9kau.fsf@redhat.com>
     [not found] ` <20150305154827.GA9441@host1.jankratochvil.net>
     [not found]   ` <87zj7r5fpz.fsf@redhat.com>
     [not found]     ` <20150305205744.GA13165@host1.jankratochvil.net>
     [not found]       ` <20150311200052.GA22654@redhat.com>
     [not found]         ` <20150312143438.GA4338@redhat.com>
     [not found]           ` <CALCETrW5rmAHutzm_OwK2LTd_J0XByV3pvWGyW=AmC=v7rLfhQ@mail.gmail.com>
     [not found]             ` <20150312165423.GA10073@redhat.com>
     [not found]               ` <20150312174653.GA13086@redhat.com>
     [not found]                 ` <20150316190154.GA18472@redhat.com>
2015-03-16 19:20                   ` install_special_mapping && vm_pgoff (Was: vvar, gup && coredump) Andy Lutomirski
2015-03-16 19:44                     ` Oleg Nesterov
2015-03-17 13:43                       ` Oleg Nesterov
2015-03-18  1:44                         ` Andy Lutomirski
2015-03-18 18:06                           ` Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).