From: Martin Schwidefsky <schwidefsky@de.ibm.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christoph Hellwig <hch@infradead.org>,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	linuxppc-dev@lists.ozlabs.org,
	linux-s390 <linux-s390@vger.kernel.org>
Subject: Re: Linux 5.1-rc5
Date: Wed, 17 Apr 2019 09:46:37 +0200
Message-ID: <20190417094637.51ad4c67@mschwideX1>
In-Reply-To: <CAHk-=wj2SW0Zno0Yn=S9wrsmHOKV0FiFPiPS4TM=Gn8yjfYXAg@mail.gmail.com>

On Tue, 16 Apr 2019 09:49:46 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Apr 16, 2019 at 9:16 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > We actually already *have* this function.
> >
> > It's called "gup_fast_permitted()" and it's used by x86-64 to verify
> > the proper address range. Exactly like s390 needs..
> >
> > Could you please use that instead?  
> 
> IOW, something like the attached.
> 
> Obviously untested. And maybe 'current' isn't declared in
> <asm/pgtable.h>, in which case you'd need to modify it to instead make
> the inline function be "s390_gup_fast_permitted()" that takes a
> pointer to the mm, and do something like
> 
>   #define gup_fast_permitted(start, pages) \
>          s390_gup_fast_permitted(current->mm, start, pages)
> 
> instead.
> 
> But I think you get the idea..

Nice, I did not realize that gup_fast_permitted is a platform
override-able function. So that part is doable in arch/s390. But I
spoke too soon: I got my first crash and realized that the common gup code
is not usable as it is. The reason is, for example, this sequence:

	pgdp = pgd_offset(current->mm, addr);
	pgd_t pgd = READ_ONCE(*pgdp);
	/* some checking on pgd */
	gup_p4d_range(pgd, addr, next, write, pages, nr);

	p4dp = p4d_offset(&pgd, addr);
	p4d_t p4d = READ_ONCE(*p4dp);
	/* some checking on p4d */
	gup_pud_range(p4d, addr, next, write, pages, nr);

	pudp = pud_offset(&p4d, addr);
	pud_t pud = READ_ONCE(*pudp);
	/* some checking on pud */
	gup_pmd_range(pud, addr, next, write, pages, nr);

Each step along the way will read the page table entry and pass the
table entry to the next function. This clashes with the page table
folding on s390. The s390 gup code looks more like this:

	pgdp = pgd_offset(current->mm, addr);
	pgd_t pgd = READ_ONCE(*pgdp);
	/* some checking on pgd */
	gup_p4d_range(pgdp, pgd, addr, next, write, pages, &nr);

	p4dp = p4d_offset(pgdp, addr);
	p4d_t p4d = READ_ONCE(*p4dp);
	/* some checking on p4d */
	gup_pud_range(p4dp, p4d, addr, next, write, pages, nr);

	pudp = pud_offset(p4dp, addr);
	pud_t pud = READ_ONCE(*pudp);
	/* some checking on pud */
	gup_pmd_range(pudp, pud, addr, next, write, pages, nr);

There are magic dereferences in the s390 versions of the p4d_offset,
pud_offset and pmd_offset functions. To make this work, the pointer
passed to these functions must not point to the local copy of the already
dereferenced table entry. I'll cook up a patch for the common code.
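
To illustrate why the local copy is a problem, here is a minimal
userspace sketch. It is not the s390 code: entry_t, the TYPE_* bits,
offset_model() and lookup_hits_table() are all made-up names, and the
model assumes that a folded level simply indexes relative to the
pointer it was given instead of following the entry. Addresses are
compared as integers so that no out-of-bounds pointer is ever formed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NENTRIES   8
#define TYPE_MASK  0x3UL	/* hypothetical: low bits encode the table type */
#define TYPE_UPPER 0x2UL	/* entry of a real (unfolded) upper level */
#define TYPE_LOWER 0x1UL	/* entry already belongs to the lower level */

typedef struct { unsigned long val; } entry_t;

/*
 * Model of a folding-aware xxx_offset() helper (assumed behaviour).
 * For a real upper-level entry it follows the entry to the next table.
 * For a folded level it stays in the table that 'ep' points into and
 * indexes relative to 'ep' itself. The slot address is returned as an
 * integer and never dereferenced.
 */
static uintptr_t offset_model(const entry_t *ep, size_t idx)
{
	uintptr_t base;

	assert(ep != NULL);
	if ((ep->val & TYPE_MASK) == TYPE_UPPER)
		base = (uintptr_t)(ep->val & ~TYPE_MASK);	/* follow the entry */
	else
		base = (uintptr_t)ep;				/* folded level */
	return base + idx * sizeof(entry_t);
}

/*
 * Returns 1 if the helper lands on slot 'idx' of the real table,
 * 0 otherwise. 'idx' must be smaller than NENTRIES.
 */
static int lookup_hits_table(int use_local_copy, size_t idx)
{
	static entry_t table[NENTRIES];
	entry_t copy;
	size_t i;

	for (i = 0; i < NENTRIES; i++)
		table[i].val = TYPE_LOWER;	/* every level above is folded */
	copy = table[0];	/* what READ_ONCE(*p) hands to the next level */

	return offset_model(use_local_copy ? &copy : &table[0], idx)
		== (uintptr_t)&table[idx];
}
```

In this model lookup_hits_table(0, 3) finds the intended slot, while
lookup_hits_table(1, 3) computes an address relative to the stack copy
and misses the table. That is the kind of stray access the pointer
variant of the walk avoids.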

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

