All of lore.kernel.org
 help / color / mirror / Atom feed
* [bug report] mm: replace vma->vm_flags direct modifications with modifier calls
@ 2023-07-11  7:21 Dan Carpenter
  2023-07-11 21:55 ` Suren Baghdasaryan
  0 siblings, 1 reply; 18+ messages in thread
From: Dan Carpenter @ 2023-07-11  7:21 UTC (permalink / raw)
  To: surenb; +Cc: linux-mm

Hello Suren Baghdasaryan,

The patch 1c71222e5f23: "mm: replace vma->vm_flags direct
modifications with modifier calls" from Jan 26, 2023, leads to the
following Smatch static checker warning:

	./include/linux/mm.h:729 vma_start_write()
	warn: sleeping in atomic context

include/linux/mm.h
    722 static inline void vma_start_write(struct vm_area_struct *vma)
    723 {
    724         int mm_lock_seq;
    725 
    726         if (__is_vma_write_locked(vma, &mm_lock_seq))
    727                 return;
    728 
--> 729         down_write(&vma->vm_lock->lock);
    730         vma->vm_lock_seq = mm_lock_seq;
    731         up_write(&vma->vm_lock->lock);
    732 }

The call tree is:

gru_fault() <- disables preempt
-> remap_pfn_range()
   -> track_pfn_remap()
   -> remap_pfn_range_notrack()
      -> vm_flags_set()
         -> vma_start_write()

Before track_pfn_remap() and remap_pfn_range_notrack() would just do |=
to set the flags but now they use vm_flags_set() so there is a potential
they could sleep.

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls
  2023-07-11  7:21 [bug report] mm: replace vma->vm_flags direct modifications with modifier calls Dan Carpenter
@ 2023-07-11 21:55 ` Suren Baghdasaryan
  2023-07-11 22:21   ` Matthew Wilcox
  0 siblings, 1 reply; 18+ messages in thread
From: Suren Baghdasaryan @ 2023-07-11 21:55 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: linux-mm, Andrew Morton, willy, Liam R. Howlett, Laurent Dufour,
	Michel Lespinasse, Jerome Glisse, Michal Hocko, Vlastimil Babka,
	Johannes Weiner, Peter Xu, David Hildenbrand

On Tue, Jul 11, 2023 at 12:21 AM Dan Carpenter <dan.carpenter@linaro.org> wrote:
>
> Hello Suren Baghdasaryan,
>
> The patch 1c71222e5f23: "mm: replace vma->vm_flags direct
> modifications with modifier calls" from Jan 26, 2023, leads to the
> following Smatch static checker warning:
>
>         ./include/linux/mm.h:729 vma_start_write()
>         warn: sleeping in atomic context
>
> include/linux/mm.h
>     722 static inline void vma_start_write(struct vm_area_struct *vma)
>     723 {
>     724         int mm_lock_seq;
>     725
>     726         if (__is_vma_write_locked(vma, &mm_lock_seq))
>     727                 return;
>     728
> --> 729         down_write(&vma->vm_lock->lock);
>     730         vma->vm_lock_seq = mm_lock_seq;
>     731         up_write(&vma->vm_lock->lock);
>     732 }
>
> The call tree is:
>
> gru_fault() <- disables preempt
> -> remap_pfn_range()
>    -> track_pfn_remap()
>    -> remap_pfn_range_notrack()
>       -> vm_flags_set()
>          -> vma_start_write()
>
> Before track_pfn_remap() and remap_pfn_range_notrack() would just do |=
> to set the flags but now they use vm_flags_set() so there is a potential
> they could sleep.

Hi Dan,
Thanks for reporting! Looks like the page fault handler is modifying
the VMA flags, which has to be done under write-locked mmap_lock and I
don't see that being done here... I wonder if that should be allowed.
I'm CC'ing some MM folks to check if this is a valid VMA modification
and should be allowed. Matthew, this might be especially interesting
for you since gru_fault() handles file-backed page faults AFAIKT.

Back to the issue at hand. If such modification should be indeed
allowed then the simplest fix I think would be to add new
remap_pfn_range_locked() function to be called from gru_fault() which
would use __vm_flags_mod() instead of vm_flags_set(). __vm_flags_mod()
does not lock the VMA, so would not have this issue. If the conclusion
is that this is a valid scenario then I can post a fix I described.
Thanks,
Suren.

>
> regards,
> dan carpenter


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls
  2023-07-11 21:55 ` Suren Baghdasaryan
@ 2023-07-11 22:21   ` Matthew Wilcox
  2023-07-11 23:45     ` Suren Baghdasaryan
  0 siblings, 1 reply; 18+ messages in thread
From: Matthew Wilcox @ 2023-07-11 22:21 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Dan Carpenter, linux-mm, Andrew Morton, Liam R. Howlett,
	Laurent Dufour, Michel Lespinasse, Jerome Glisse, Michal Hocko,
	Vlastimil Babka, Johannes Weiner, Peter Xu, David Hildenbrand,
	Dimitri Sivanich, Mike Travis, Steve Wahl

On Tue, Jul 11, 2023 at 02:55:20PM -0700, Suren Baghdasaryan wrote:
> On Tue, Jul 11, 2023 at 12:21 AM Dan Carpenter <dan.carpenter@linaro.org> wrote:
> >
> > Hello Suren Baghdasaryan,
> >
> > The patch 1c71222e5f23: "mm: replace vma->vm_flags direct
> > modifications with modifier calls" from Jan 26, 2023, leads to the
> > following Smatch static checker warning:
> >
> >         ./include/linux/mm.h:729 vma_start_write()
> >         warn: sleeping in atomic context
> >
> > include/linux/mm.h
> >     722 static inline void vma_start_write(struct vm_area_struct *vma)
> >     723 {
> >     724         int mm_lock_seq;
> >     725
> >     726         if (__is_vma_write_locked(vma, &mm_lock_seq))
> >     727                 return;
> >     728
> > --> 729         down_write(&vma->vm_lock->lock);
> >     730         vma->vm_lock_seq = mm_lock_seq;
> >     731         up_write(&vma->vm_lock->lock);
> >     732 }
> >
> > The call tree is:
> >
> > gru_fault() <- disables preempt
> > -> remap_pfn_range()
> >    -> track_pfn_remap()
> >    -> remap_pfn_range_notrack()
> >       -> vm_flags_set()
> >          -> vma_start_write()
> >
> > Before track_pfn_remap() and remap_pfn_range_notrack() would just do |=
> > to set the flags but now they use vm_flags_set() so there is a potential
> > they could sleep.
> 
> Hi Dan,
> Thanks for reporting! Looks like the page fault handler is modifying
> the VMA flags, which has to be done under write-locked mmap_lock and I
> don't see that being done here... I wonder if that should be allowed.
> I'm CC'ing some MM folks to check if this is a valid VMA modification
> and should be allowed. Matthew, this might be especially interesting
> for you since gru_fault() handles file-backed page faults AFAIKT.

I don't run the ->fault handler under RCU, only the ->map_pages()
method.  I don't intend to change that.

> Back to the issue at hand. If such modification should be indeed
> allowed then the simplest fix I think would be to add new
> remap_pfn_range_locked() function to be called from gru_fault() which
> would use __vm_flags_mod() instead of vm_flags_set(). __vm_flags_mod()
> does not lock the VMA, so would not have this issue. If the conclusion
> is that this is a valid scenario then I can post a fix I described.

I'm not certain, but calling remap_pfn_range() in the fault handler
is definitely weird.  It's normally called _instead_ of having a fault
handler.  The fault handler usually calls set_pte_at() directly.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls
  2023-07-11 22:21   ` Matthew Wilcox
@ 2023-07-11 23:45     ` Suren Baghdasaryan
  2023-07-12  7:35       ` David Hildenbrand
  0 siblings, 1 reply; 18+ messages in thread
From: Suren Baghdasaryan @ 2023-07-11 23:45 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Dan Carpenter, linux-mm, Andrew Morton, Liam R. Howlett,
	Laurent Dufour, Michel Lespinasse, Jerome Glisse, Michal Hocko,
	Vlastimil Babka, Johannes Weiner, Peter Xu, David Hildenbrand,
	Dimitri Sivanich, Mike Travis, Steve Wahl

On Tue, Jul 11, 2023 at 3:21 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Jul 11, 2023 at 02:55:20PM -0700, Suren Baghdasaryan wrote:
> > On Tue, Jul 11, 2023 at 12:21 AM Dan Carpenter <dan.carpenter@linaro.org> wrote:
> > >
> > > Hello Suren Baghdasaryan,
> > >
> > > The patch 1c71222e5f23: "mm: replace vma->vm_flags direct
> > > modifications with modifier calls" from Jan 26, 2023, leads to the
> > > following Smatch static checker warning:
> > >
> > >         ./include/linux/mm.h:729 vma_start_write()
> > >         warn: sleeping in atomic context
> > >
> > > include/linux/mm.h
> > >     722 static inline void vma_start_write(struct vm_area_struct *vma)
> > >     723 {
> > >     724         int mm_lock_seq;
> > >     725
> > >     726         if (__is_vma_write_locked(vma, &mm_lock_seq))
> > >     727                 return;
> > >     728
> > > --> 729         down_write(&vma->vm_lock->lock);
> > >     730         vma->vm_lock_seq = mm_lock_seq;
> > >     731         up_write(&vma->vm_lock->lock);
> > >     732 }
> > >
> > > The call tree is:
> > >
> > > gru_fault() <- disables preempt
> > > -> remap_pfn_range()
> > >    -> track_pfn_remap()
> > >    -> remap_pfn_range_notrack()
> > >       -> vm_flags_set()
> > >          -> vma_start_write()
> > >
> > > Before track_pfn_remap() and remap_pfn_range_notrack() would just do |=
> > > to set the flags but now they use vm_flags_set() so there is a potential
> > > they could sleep.
> >
> > Hi Dan,
> > Thanks for reporting! Looks like the page fault handler is modifying
> > the VMA flags, which has to be done under write-locked mmap_lock and I
> > don't see that being done here... I wonder if that should be allowed.
> > I'm CC'ing some MM folks to check if this is a valid VMA modification
> > and should be allowed. Matthew, this might be especially interesting
> > for you since gru_fault() handles file-backed page faults AFAIKT.
>
> I don't run the ->fault handler under RCU, only the ->map_pages()
> method.  I don't intend to change that.
>
> > Back to the issue at hand. If such modification should be indeed
> > allowed then the simplest fix I think would be to add new
> > remap_pfn_range_locked() function to be called from gru_fault() which
> > would use __vm_flags_mod() instead of vm_flags_set(). __vm_flags_mod()
> > does not lock the VMA, so would not have this issue. If the conclusion
> > is that this is a valid scenario then I can post a fix I described.
>
> I'm not certain, but calling remap_pfn_range() in the fault handler
> is definitely weird.  It's normally called _instead_ of having a fault
> handler.  The fault handler usually calls set_pte_at() directly.

Hmm. Is it weird enough to be considered invalid or weird but still ok?
Also, is it ok to modify VMA flags here without write-locking the
mmap_lock (and without write-locking the VMA)? The fault handler is
done under read-locked mmap_lock but I thought VMA modifications
require stronger locking...


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls
  2023-07-11 23:45     ` Suren Baghdasaryan
@ 2023-07-12  7:35       ` David Hildenbrand
  2023-07-12 15:01         ` Suren Baghdasaryan
  0 siblings, 1 reply; 18+ messages in thread
From: David Hildenbrand @ 2023-07-12  7:35 UTC (permalink / raw)
  To: Suren Baghdasaryan, Matthew Wilcox
  Cc: Dan Carpenter, linux-mm, Andrew Morton, Liam R. Howlett,
	Laurent Dufour, Michel Lespinasse, Jerome Glisse, Michal Hocko,
	Vlastimil Babka, Johannes Weiner, Peter Xu, Dimitri Sivanich,
	Mike Travis, Steve Wahl

On 12.07.23 01:45, Suren Baghdasaryan wrote:
> On Tue, Jul 11, 2023 at 3:21 PM Matthew Wilcox <willy@infradead.org> wrote:
>>
>> On Tue, Jul 11, 2023 at 02:55:20PM -0700, Suren Baghdasaryan wrote:
>>> On Tue, Jul 11, 2023 at 12:21 AM Dan Carpenter <dan.carpenter@linaro.org> wrote:
>>>>
>>>> Hello Suren Baghdasaryan,
>>>>
>>>> The patch 1c71222e5f23: "mm: replace vma->vm_flags direct
>>>> modifications with modifier calls" from Jan 26, 2023, leads to the
>>>> following Smatch static checker warning:
>>>>
>>>>          ./include/linux/mm.h:729 vma_start_write()
>>>>          warn: sleeping in atomic context
>>>>
>>>> include/linux/mm.h
>>>>      722 static inline void vma_start_write(struct vm_area_struct *vma)
>>>>      723 {
>>>>      724         int mm_lock_seq;
>>>>      725
>>>>      726         if (__is_vma_write_locked(vma, &mm_lock_seq))
>>>>      727                 return;
>>>>      728
>>>> --> 729         down_write(&vma->vm_lock->lock);
>>>>      730         vma->vm_lock_seq = mm_lock_seq;
>>>>      731         up_write(&vma->vm_lock->lock);
>>>>      732 }
>>>>
>>>> The call tree is:
>>>>
>>>> gru_fault() <- disables preempt
>>>> -> remap_pfn_range()
>>>>     -> track_pfn_remap()
>>>>     -> remap_pfn_range_notrack()
>>>>        -> vm_flags_set()
>>>>           -> vma_start_write()
>>>>
>>>> Before track_pfn_remap() and remap_pfn_range_notrack() would just do |=
>>>> to set the flags but now they use vm_flags_set() so there is a potential
>>>> they could sleep.
>>>
>>> Hi Dan,
>>> Thanks for reporting! Looks like the page fault handler is modifying
>>> the VMA flags, which has to be done under write-locked mmap_lock and I
>>> don't see that being done here... I wonder if that should be allowed.
>>> I'm CC'ing some MM folks to check if this is a valid VMA modification
>>> and should be allowed. Matthew, this might be especially interesting
>>> for you since gru_fault() handles file-backed page faults AFAIKT.
>>
>> I don't run the ->fault handler under RCU, only the ->map_pages()
>> method.  I don't intend to change that.
>>
>>> Back to the issue at hand. If such modification should be indeed
>>> allowed then the simplest fix I think would be to add new
>>> remap_pfn_range_locked() function to be called from gru_fault() which
>>> would use __vm_flags_mod() instead of vm_flags_set(). __vm_flags_mod()
>>> does not lock the VMA, so would not have this issue. If the conclusion
>>> is that this is a valid scenario then I can post a fix I described.
>>
>> I'm not certain, but calling remap_pfn_range() in the fault handler
>> is definitely weird.  It's normally called _instead_ of having a fault
>> handler.  The fault handler usually calls set_pte_at() directly.
> 
> Hmm. Is it weird enough to be considered invalid or weird but still ok?
> Also, is it ok to modify VMA flags here without write-locking the
> mmap_lock (and without write-locking the VMA)? The fault handler is
> done under read-locked mmap_lock but I thought VMA modifications
> require stronger locking...
> 

The "easy" fix would be to have something like "remap_pfn_range_prepare" 
that the remap_pfn_range() caller calls during mmap().

We can let that one set these flags, and then we can later let 
remap_pfn_range() fail if the relevant flags are not set.

It's certainly one of these "always done like that and suboptimal but 
somehow it worked" thingies.

I suspect, because only the first pagefault->remap_pfn_range() will 
actually modify the VMA flags, that this is ok. Otherwise, there usually 
wouldn't be any pagetables and nothing mapped, so who really cares about 
these VMA flags (e.g., GUP cannot pin anything if nothing is mapped).

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls
  2023-07-12  7:35       ` David Hildenbrand
@ 2023-07-12 15:01         ` Suren Baghdasaryan
  2023-07-12 15:52           ` Matthew Wilcox
  0 siblings, 1 reply; 18+ messages in thread
From: Suren Baghdasaryan @ 2023-07-12 15:01 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Matthew Wilcox, Dan Carpenter, linux-mm, Andrew Morton,
	Liam R. Howlett, Laurent Dufour, Michel Lespinasse,
	Jerome Glisse, Michal Hocko, Vlastimil Babka, Johannes Weiner,
	Peter Xu, Dimitri Sivanich, Mike Travis, Steve Wahl

On Wed, Jul 12, 2023 at 12:35 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 12.07.23 01:45, Suren Baghdasaryan wrote:
> > On Tue, Jul 11, 2023 at 3:21 PM Matthew Wilcox <willy@infradead.org> wrote:
> >>
> >> On Tue, Jul 11, 2023 at 02:55:20PM -0700, Suren Baghdasaryan wrote:
> >>> On Tue, Jul 11, 2023 at 12:21 AM Dan Carpenter <dan.carpenter@linaro.org> wrote:
> >>>>
> >>>> Hello Suren Baghdasaryan,
> >>>>
> >>>> The patch 1c71222e5f23: "mm: replace vma->vm_flags direct
> >>>> modifications with modifier calls" from Jan 26, 2023, leads to the
> >>>> following Smatch static checker warning:
> >>>>
> >>>>          ./include/linux/mm.h:729 vma_start_write()
> >>>>          warn: sleeping in atomic context
> >>>>
> >>>> include/linux/mm.h
> >>>>      722 static inline void vma_start_write(struct vm_area_struct *vma)
> >>>>      723 {
> >>>>      724         int mm_lock_seq;
> >>>>      725
> >>>>      726         if (__is_vma_write_locked(vma, &mm_lock_seq))
> >>>>      727                 return;
> >>>>      728
> >>>> --> 729         down_write(&vma->vm_lock->lock);
> >>>>      730         vma->vm_lock_seq = mm_lock_seq;
> >>>>      731         up_write(&vma->vm_lock->lock);
> >>>>      732 }
> >>>>
> >>>> The call tree is:
> >>>>
> >>>> gru_fault() <- disables preempt
> >>>> -> remap_pfn_range()
> >>>>     -> track_pfn_remap()
> >>>>     -> remap_pfn_range_notrack()
> >>>>        -> vm_flags_set()
> >>>>           -> vma_start_write()
> >>>>
> >>>> Before track_pfn_remap() and remap_pfn_range_notrack() would just do |=
> >>>> to set the flags but now they use vm_flags_set() so there is a potential
> >>>> they could sleep.
> >>>
> >>> Hi Dan,
> >>> Thanks for reporting! Looks like the page fault handler is modifying
> >>> the VMA flags, which has to be done under write-locked mmap_lock and I
> >>> don't see that being done here... I wonder if that should be allowed.
> >>> I'm CC'ing some MM folks to check if this is a valid VMA modification
> >>> and should be allowed. Matthew, this might be especially interesting
> >>> for you since gru_fault() handles file-backed page faults AFAIKT.
> >>
> >> I don't run the ->fault handler under RCU, only the ->map_pages()
> >> method.  I don't intend to change that.
> >>
> >>> Back to the issue at hand. If such modification should be indeed
> >>> allowed then the simplest fix I think would be to add new
> >>> remap_pfn_range_locked() function to be called from gru_fault() which
> >>> would use __vm_flags_mod() instead of vm_flags_set(). __vm_flags_mod()
> >>> does not lock the VMA, so would not have this issue. If the conclusion
> >>> is that this is a valid scenario then I can post a fix I described.
> >>
> >> I'm not certain, but calling remap_pfn_range() in the fault handler
> >> is definitely weird.  It's normally called _instead_ of having a fault
> >> handler.  The fault handler usually calls set_pte_at() directly.
> >
> > Hmm. Is it weird enough to be considered invalid or weird but still ok?
> > Also, is it ok to modify VMA flags here without write-locking the
> > mmap_lock (and without write-locking the VMA)? The fault handler is
> > done under read-locked mmap_lock but I thought VMA modifications
> > require stronger locking...
> >
>
> The "easy" fix would be to have something like "remap_pfn_range_prepare"
> that the remap_pfn_range() caller calls during mmap().

Are you suggesting to break remap_pfn_range() into two stages
(remap_pfn_range_prepare() then remap_pfn_range())?
If so, there are many places remap_pfn_range() is called and IIUC all
of them would need to use that 2-stage approach (lots of code churn).
In addition, this is an exported function, so many more drivers might
expect the current behavior.
My suggestion was to have another function called smth like
remap_pfn_range_nolock() and internally use the same code with an
additional flag that would tell us whether we should lock the vma or
not. Such change should be quite simple and small.

>
> We can let that one set these flags, and then we can later let
> remap_pfn_range() fail if the relevant flags are not set.
>
> It's certainly one of these "always done like that and suboptimal but
> somehow it worked" thingies.
>
> I suspect, because only the first pagefault->remap_pfn_range() will
> actually modify the VMA flags, that this is ok. Otherwise, there usually
> wouldn't be any pagetables and nothing mapped, so who really cares about
> these VMA flags (e.g., GUP cannot pin anything if nothing is mapped).

Is my understanding correct that the reasoning remap_pfn_range() is
allowed here without write-locking the VMA (or write-locking
mmap_lock) is because it's done only if there are no pre-existing
patables?
Thanks,
Suren.

>
> --
> Cheers,
>
> David / dhildenb
>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls
  2023-07-12 15:01         ` Suren Baghdasaryan
@ 2023-07-12 15:52           ` Matthew Wilcox
  2023-07-12 15:55             ` David Hildenbrand
  0 siblings, 1 reply; 18+ messages in thread
From: Matthew Wilcox @ 2023-07-12 15:52 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: David Hildenbrand, Dan Carpenter, linux-mm, Andrew Morton,
	Liam R. Howlett, Laurent Dufour, Michel Lespinasse,
	Jerome Glisse, Michal Hocko, Vlastimil Babka, Johannes Weiner,
	Peter Xu, Dimitri Sivanich, Mike Travis, Steve Wahl

On Wed, Jul 12, 2023 at 08:01:18AM -0700, Suren Baghdasaryan wrote:
> Are you suggesting to break remap_pfn_range() into two stages
> (remap_pfn_range_prepare() then remap_pfn_range())?
> If so, there are many places remap_pfn_range() is called and IIUC all
> of them would need to use that 2-stage approach (lots of code churn).
> In addition, this is an exported function, so many more drivers might
> expect the current behavior.

You do not understand correctly.

When somebody calls mmap, there are two reasonable implementations.
Here's one:

        .mmap = snd_dma_iram_mmap,

static int snd_dma_iram_mmap(struct snd_dma_buffer *dmab,
                             struct vm_area_struct *area)
{
        area->vm_page_prot = pgprot_writecombine(area->vm_page_prot);
        return remap_pfn_range(area, area->vm_start,
                               dmab->addr >> PAGE_SHIFT,
                               area->vm_end - area->vm_start,
                               area->vm_page_prot);
}

This is _fine_.  It is not called from the fault path, it is called in
process context.  Few locks are held (which ones aren't even
documented!)

The other way is to set vma->vm_ops.  The fault handler in vm_ops
should not be calling remap_pfn_range().  It should be calling
set_ptes().  I almost have this driver fixed up, but I have another
meeting to go to now.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls
  2023-07-12 15:52           ` Matthew Wilcox
@ 2023-07-12 15:55             ` David Hildenbrand
  2023-07-12 16:03               ` Suren Baghdasaryan
  2023-07-12 18:49               ` Matthew Wilcox
  0 siblings, 2 replies; 18+ messages in thread
From: David Hildenbrand @ 2023-07-12 15:55 UTC (permalink / raw)
  To: Matthew Wilcox, Suren Baghdasaryan
  Cc: Dan Carpenter, linux-mm, Andrew Morton, Liam R. Howlett,
	Laurent Dufour, Michel Lespinasse, Jerome Glisse, Michal Hocko,
	Vlastimil Babka, Johannes Weiner, Peter Xu, Dimitri Sivanich,
	Mike Travis, Steve Wahl

On 12.07.23 17:52, Matthew Wilcox wrote:
> On Wed, Jul 12, 2023 at 08:01:18AM -0700, Suren Baghdasaryan wrote:
>> Are you suggesting to break remap_pfn_range() into two stages
>> (remap_pfn_range_prepare() then remap_pfn_range())?
>> If so, there are many places remap_pfn_range() is called and IIUC all
>> of them would need to use that 2-stage approach (lots of code churn).
>> In addition, this is an exported function, so many more drivers might
>> expect the current behavior.
> 
> You do not understand correctly.
> 
> When somebody calls mmap, there are two reasonable implementations.
> Here's one:
> 
>          .mmap = snd_dma_iram_mmap,
> 
> static int snd_dma_iram_mmap(struct snd_dma_buffer *dmab,
>                               struct vm_area_struct *area)
> {
>          area->vm_page_prot = pgprot_writecombine(area->vm_page_prot);
>          return remap_pfn_range(area, area->vm_start,
>                                 dmab->addr >> PAGE_SHIFT,
>                                 area->vm_end - area->vm_start,
>                                 area->vm_page_prot);
> }
> 
> This is _fine_.  It is not called from the fault path, it is called in
> process context.  Few locks are held (which ones aren't even
> documented!)
> 
> The other way is to set vma->vm_ops.  The fault handler in vm_ops
> should not be calling remap_pfn_range().  It should be calling
> set_ptes().  I almost have this driver fixed up, but I have another
> meeting to go to now.

Just a note that we still have to make sure that the VMA flags will be 
set properly -- I guess at mmap time is the right time as I suggested above.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls
  2023-07-12 15:55             ` David Hildenbrand
@ 2023-07-12 16:03               ` Suren Baghdasaryan
  2023-07-12 18:49               ` Matthew Wilcox
  1 sibling, 0 replies; 18+ messages in thread
From: Suren Baghdasaryan @ 2023-07-12 16:03 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Matthew Wilcox, Dan Carpenter, linux-mm, Andrew Morton,
	Liam R. Howlett, Laurent Dufour, Michel Lespinasse,
	Jerome Glisse, Michal Hocko, Vlastimil Babka, Johannes Weiner,
	Peter Xu, Dimitri Sivanich, Mike Travis, Steve Wahl

On Wed, Jul 12, 2023 at 8:55 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 12.07.23 17:52, Matthew Wilcox wrote:
> > On Wed, Jul 12, 2023 at 08:01:18AM -0700, Suren Baghdasaryan wrote:
> >> Are you suggesting to break remap_pfn_range() into two stages
> >> (remap_pfn_range_prepare() then remap_pfn_range())?
> >> If so, there are many places remap_pfn_range() is called and IIUC all
> >> of them would need to use that 2-stage approach (lots of code churn).
> >> In addition, this is an exported function, so many more drivers might
> >> expect the current behavior.
> >
> > You do not understand correctly.
> >
> > When somebody calls mmap, there are two reasonable implementations.
> > Here's one:
> >
> >          .mmap = snd_dma_iram_mmap,
> >
> > static int snd_dma_iram_mmap(struct snd_dma_buffer *dmab,
> >                               struct vm_area_struct *area)
> > {
> >          area->vm_page_prot = pgprot_writecombine(area->vm_page_prot);
> >          return remap_pfn_range(area, area->vm_start,
> >                                 dmab->addr >> PAGE_SHIFT,
> >                                 area->vm_end - area->vm_start,
> >                                 area->vm_page_prot);
> > }
> >
> > This is _fine_.  It is not called from the fault path, it is called in
> > process context.  Few locks are held (which ones aren't even
> > documented!)
> >
> > The other way is to set vma->vm_ops.  The fault handler in vm_ops
> > should not be calling remap_pfn_range().  It should be calling
> > set_ptes().  I almost have this driver fixed up, but I have another
> > meeting to go to now.
>
> Just a note that we still have to make sure that the VMA flags will be
> set properly -- I guess at mmap time is the right time as I suggested above.

Ah, ok, now I understand what you meant. Ok, let's wait for Matthew to
send the fix for the driver. Sounds like he has it almost ready.

>
> --
> Cheers,
>
> David / dhildenb
>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls
  2023-07-12 15:55             ` David Hildenbrand
  2023-07-12 16:03               ` Suren Baghdasaryan
@ 2023-07-12 18:49               ` Matthew Wilcox
  2023-07-12 18:52                 ` David Hildenbrand
  2023-07-12 19:34                 ` Matthew Wilcox
  1 sibling, 2 replies; 18+ messages in thread
From: Matthew Wilcox @ 2023-07-12 18:49 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Suren Baghdasaryan, Dan Carpenter, linux-mm, Andrew Morton,
	Liam R. Howlett, Laurent Dufour, Michel Lespinasse,
	Jerome Glisse, Michal Hocko, Vlastimil Babka, Johannes Weiner,
	Peter Xu, Dimitri Sivanich, Mike Travis, Steve Wahl

On Wed, Jul 12, 2023 at 05:55:47PM +0200, David Hildenbrand wrote:
> On 12.07.23 17:52, Matthew Wilcox wrote:
> > On Wed, Jul 12, 2023 at 08:01:18AM -0700, Suren Baghdasaryan wrote:
> > > Are you suggesting to break remap_pfn_range() into two stages
> > > (remap_pfn_range_prepare() then remap_pfn_range())?
> > > If so, there are many places remap_pfn_range() is called and IIUC all
> > > of them would need to use that 2-stage approach (lots of code churn).
> > > In addition, this is an exported function, so many more drivers might
> > > expect the current behavior.
> > 
> > You do not understand correctly.
> > 
> > When somebody calls mmap, there are two reasonable implementations.
> > Here's one:
> > 
> >          .mmap = snd_dma_iram_mmap,
> > 
> > static int snd_dma_iram_mmap(struct snd_dma_buffer *dmab,
> >                               struct vm_area_struct *area)
> > {
> >          area->vm_page_prot = pgprot_writecombine(area->vm_page_prot);
> >          return remap_pfn_range(area, area->vm_start,
> >                                 dmab->addr >> PAGE_SHIFT,
> >                                 area->vm_end - area->vm_start,
> >                                 area->vm_page_prot);
> > }
> > 
> > This is _fine_.  It is not called from the fault path, it is called in
> > process context.  Few locks are held (which ones aren't even
> > documented!)
> > 
> > The other way is to set vma->vm_ops.  The fault handler in vm_ops
> > should not be calling remap_pfn_range().  It should be calling
> > set_ptes().  I almost have this driver fixed up, but I have another
> > meeting to go to now.
> 
> Just a note that we still have to make sure that the VMA flags will be set
> properly -- I guess at mmap time is the right time as I suggested above.

It actually does that already:

static int gru_file_mmap(struct file *file, struct vm_area_struct *vma)
{
        if ((vma->vm_flags & (VM_SHARED | VM_WRITE)) != (VM_SHARED | VM_WRITE))
                return -EPERM;

        if (vma->vm_start & (GRU_GSEG_PAGESIZE - 1) ||
                                vma->vm_end & (GRU_GSEG_PAGESIZE - 1))
                return -EINVAL;

        vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_LOCKED |
                         VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP);


This compiles, but obviously I don't have a spare HP supercomputer lying
around for me to test whether it works.  Also set_ptes() was only just
introduced to the mm tree, so doing something that needs backporting
would take more effort (maybe having a private set_ptes() in the driver
would be a good backport option that calls set_pte_at() in a loop).


diff --git a/drivers/misc/sgi-gru/grumain.c b/drivers/misc/sgi-gru/grumain.c
index 4eb4b9455139..c21bcb528f12 100644
--- a/drivers/misc/sgi-gru/grumain.c
+++ b/drivers/misc/sgi-gru/grumain.c
@@ -951,6 +951,8 @@ vm_fault_t gru_fault(struct vm_fault *vmf)
 	}
 
 	if (!gts->ts_gru) {
+		pte_t *ptep, pte;
+
 		STAT(load_user_context);
 		if (!gru_assign_gru_context(gts)) {
 			preempt_enable();
@@ -964,9 +966,12 @@ vm_fault_t gru_fault(struct vm_fault *vmf)
 		}
 		gru_load_context(gts);
 		paddr = gseg_physical_address(gts->ts_gru, gts->ts_ctxnum);
-		remap_pfn_range(vma, vaddr & ~(GRU_GSEG_PAGESIZE - 1),
-				paddr >> PAGE_SHIFT, GRU_GSEG_PAGESIZE,
-				vma->vm_page_prot);
+
+		pte = pfn_pte(paddr / PAGE_SIZE, vma->vm_page_prot);
+		ptep = vmf->pte - (vaddr % GRU_GSEG_PAGESIZE) / PAGE_SIZE;
+		set_ptes(vma->vm_mm, vaddr & ~(GRU_GSEG_PAGESIZE - 1),
+				ptep, pte_mkspecial(pte),
+				GRU_GSEG_PAGESIZE / PAGE_SIZE);
 	}
 
 	preempt_enable();


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls
  2023-07-12 18:49               ` Matthew Wilcox
@ 2023-07-12 18:52                 ` David Hildenbrand
  2023-07-12 19:48                   ` Suren Baghdasaryan
  2023-07-12 19:34                 ` Matthew Wilcox
  1 sibling, 1 reply; 18+ messages in thread
From: David Hildenbrand @ 2023-07-12 18:52 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Suren Baghdasaryan, Dan Carpenter, linux-mm, Andrew Morton,
	Liam R. Howlett, Laurent Dufour, Michel Lespinasse,
	Jerome Glisse, Michal Hocko, Vlastimil Babka, Johannes Weiner,
	Peter Xu, Dimitri Sivanich, Mike Travis, Steve Wahl

On 12.07.23 20:49, Matthew Wilcox wrote:
> On Wed, Jul 12, 2023 at 05:55:47PM +0200, David Hildenbrand wrote:
>> On 12.07.23 17:52, Matthew Wilcox wrote:
>>> On Wed, Jul 12, 2023 at 08:01:18AM -0700, Suren Baghdasaryan wrote:
>>>> Are you suggesting to break remap_pfn_range() into two stages
>>>> (remap_pfn_range_prepare() then remap_pfn_range())?
>>>> If so, there are many places remap_pfn_range() is called and IIUC all
>>>> of them would need to use that 2-stage approach (lots of code churn).
>>>> In addition, this is an exported function, so many more drivers might
>>>> expect the current behavior.
>>>
>>> You do not understand correctly.
>>>
>>> When somebody calls mmap, there are two reasonable implementations.
>>> Here's one:
>>>
>>>           .mmap = snd_dma_iram_mmap,
>>>
>>> static int snd_dma_iram_mmap(struct snd_dma_buffer *dmab,
>>>                                struct vm_area_struct *area)
>>> {
>>>           area->vm_page_prot = pgprot_writecombine(area->vm_page_prot);
>>>           return remap_pfn_range(area, area->vm_start,
>>>                                  dmab->addr >> PAGE_SHIFT,
>>>                                  area->vm_end - area->vm_start,
>>>                                  area->vm_page_prot);
>>> }
>>>
>>> This is _fine_.  It is not called from the fault path, it is called in
>>> process context.  Few locks are held (which ones aren't even
>>> documented!)
>>>
>>> The other way is to set vma->vm_ops.  The fault handler in vm_ops
>>> should not be calling remap_pfn_range().  It should be calling
>>> set_ptes().  I almost have this driver fixed up, but I have another
>>> meeting to go to now.
>>
>> Just a note that we still have to make sure that the VMA flags will be set
>> properly -- I guess at mmap time is the right time as I suggested above.
> 
> It actually does that already:
> 
> static int gru_file_mmap(struct file *file, struct vm_area_struct *vma)
> {
>          if ((vma->vm_flags & (VM_SHARED | VM_WRITE)) != (VM_SHARED | VM_WRITE))
>                  return -EPERM;
> 
>          if (vma->vm_start & (GRU_GSEG_PAGESIZE - 1) ||
>                                  vma->vm_end & (GRU_GSEG_PAGESIZE - 1))
>                  return -EINVAL;
> 
>          vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_LOCKED |
>                           VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP);
> 
> 
> This compiles, but obviously I don't have a spare HP supercomputer lying
> around for me to test whether it works.  Also set_ptes() was only just
> introduced to the mm tree, so doing something that needs backporting
> would take more effort (maybe having a private set_ptes() in the driver
> would be a good backport option that calls set_pte_at() in a loop).
> 
> 
> diff --git a/drivers/misc/sgi-gru/grumain.c b/drivers/misc/sgi-gru/grumain.c
> index 4eb4b9455139..c21bcb528f12 100644
> --- a/drivers/misc/sgi-gru/grumain.c
> +++ b/drivers/misc/sgi-gru/grumain.c
> @@ -951,6 +951,8 @@ vm_fault_t gru_fault(struct vm_fault *vmf)
>   	}
>   
>   	if (!gts->ts_gru) {
> +		pte_t *ptep, pte;
> +
>   		STAT(load_user_context);
>   		if (!gru_assign_gru_context(gts)) {
>   			preempt_enable();
> @@ -964,9 +966,12 @@ vm_fault_t gru_fault(struct vm_fault *vmf)
>   		}
>   		gru_load_context(gts);
>   		paddr = gseg_physical_address(gts->ts_gru, gts->ts_ctxnum);
> -		remap_pfn_range(vma, vaddr & ~(GRU_GSEG_PAGESIZE - 1),
> -				paddr >> PAGE_SHIFT, GRU_GSEG_PAGESIZE,
> -				vma->vm_page_prot);
> +
> +		pte = pfn_pte(paddr / PAGE_SIZE, vma->vm_page_prot);
> +		ptep = vmf->pte - (vaddr % GRU_GSEG_PAGESIZE) / PAGE_SIZE;
> +		set_ptes(vma->vm_mm, vaddr & ~(GRU_GSEG_PAGESIZE - 1),
> +				ptep, pte_mkspecial(pte),
> +				GRU_GSEG_PAGESIZE / PAGE_SIZE);
>   	}
>   
>   	preempt_enable();
> 

Would we be able to fix it in stable simply by not triggering the 
vm_flags_set() in case these flags are already set?

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls
  2023-07-12 18:49               ` Matthew Wilcox
  2023-07-12 18:52                 ` David Hildenbrand
@ 2023-07-12 19:34                 ` Matthew Wilcox
  2023-07-17 19:54                   ` Dimitri Sivanich
  1 sibling, 1 reply; 18+ messages in thread
From: Matthew Wilcox @ 2023-07-12 19:34 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Suren Baghdasaryan, Dan Carpenter, linux-mm, Andrew Morton,
	Liam R. Howlett, Laurent Dufour, Michel Lespinasse,
	Jerome Glisse, Michal Hocko, Vlastimil Babka, Johannes Weiner,
	Peter Xu, Dimitri Sivanich, Mike Travis, Steve Wahl

On Wed, Jul 12, 2023 at 07:49:53PM +0100, Matthew Wilcox wrote:
> @@ -964,9 +966,12 @@ vm_fault_t gru_fault(struct vm_fault *vmf)
>  		}
>  		gru_load_context(gts);
>  		paddr = gseg_physical_address(gts->ts_gru, gts->ts_ctxnum);
> -		remap_pfn_range(vma, vaddr & ~(GRU_GSEG_PAGESIZE - 1),
> -				paddr >> PAGE_SHIFT, GRU_GSEG_PAGESIZE,
> -				vma->vm_page_prot);
> +
> +		pte = pfn_pte(paddr / PAGE_SIZE, vma->vm_page_prot);
> +		ptep = vmf->pte - (vaddr % GRU_GSEG_PAGESIZE) / PAGE_SIZE;
> +		set_ptes(vma->vm_mm, vaddr & ~(GRU_GSEG_PAGESIZE - 1),
> +				ptep, pte_mkspecial(pte),
> +				GRU_GSEG_PAGESIZE / PAGE_SIZE);
>  	}

Argh, no, this is wrong.  The page table isn't mapped at this point.
What we want is a cross between vmf_insert_pfn() and vm_insert_pages().
Let me add that ...



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls
  2023-07-12 18:52                 ` David Hildenbrand
@ 2023-07-12 19:48                   ` Suren Baghdasaryan
  2023-07-17  6:13                     ` Yan Zhao
  0 siblings, 1 reply; 18+ messages in thread
From: Suren Baghdasaryan @ 2023-07-12 19:48 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Matthew Wilcox, Dan Carpenter, linux-mm, Andrew Morton,
	Liam R. Howlett, Laurent Dufour, Michel Lespinasse,
	Jerome Glisse, Michal Hocko, Vlastimil Babka, Johannes Weiner,
	Peter Xu, Dimitri Sivanich, Mike Travis, Steve Wahl

On Wed, Jul 12, 2023 at 6:52 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 12.07.23 20:49, Matthew Wilcox wrote:
> > On Wed, Jul 12, 2023 at 05:55:47PM +0200, David Hildenbrand wrote:
> >> On 12.07.23 17:52, Matthew Wilcox wrote:
> >>> On Wed, Jul 12, 2023 at 08:01:18AM -0700, Suren Baghdasaryan wrote:
> >>>> Are you suggesting to break remap_pfn_range() into two stages
> >>>> (remap_pfn_range_prepare() then remap_pfn_range())?
> >>>> If so, there are many places remap_pfn_range() is called and IIUC all
> >>>> of them would need to use that 2-stage approach (lots of code churn).
> >>>> In addition, this is an exported function, so many more drivers might
> >>>> expect the current behavior.
> >>>
> >>> You do not understand correctly.
> >>>
> >>> When somebody calls mmap, there are two reasonable implementations.
> >>> Here's one:
> >>>
> >>>           .mmap = snd_dma_iram_mmap,
> >>>
> >>> static int snd_dma_iram_mmap(struct snd_dma_buffer *dmab,
> >>>                                struct vm_area_struct *area)
> >>> {
> >>>           area->vm_page_prot = pgprot_writecombine(area->vm_page_prot);
> >>>           return remap_pfn_range(area, area->vm_start,
> >>>                                  dmab->addr >> PAGE_SHIFT,
> >>>                                  area->vm_end - area->vm_start,
> >>>                                  area->vm_page_prot);
> >>> }
> >>>
> >>> This is _fine_.  It is not called from the fault path, it is called in
> >>> process context.  Few locks are held (which ones aren't even
> >>> documented!)
> >>>
> >>> The other way is to set vma->vm_ops.  The fault handler in vm_ops
> >>> should not be calling remap_pfn_range().  It should be calling
> >>> set_ptes().  I almost have this driver fixed up, but I have another
> >>> meeting to go to now.
> >>
> >> Just a note that we still have to make sure that the VMA flags will be set
> >> properly -- I guess at mmap time is the right time as I suggested above.
> >
> > It actually does that already:
> >
> > static int gru_file_mmap(struct file *file, struct vm_area_struct *vma)
> > {
> >          if ((vma->vm_flags & (VM_SHARED | VM_WRITE)) != (VM_SHARED | VM_WRITE))
> >                  return -EPERM;
> >
> >          if (vma->vm_start & (GRU_GSEG_PAGESIZE - 1) ||
> >                                  vma->vm_end & (GRU_GSEG_PAGESIZE - 1))
> >                  return -EINVAL;
> >
> >          vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_LOCKED |
> >                           VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP);
> >
> >
> > This compiles, but obviously I don't have a spare HP supercomputer lying
> > around for me to test whether it works.  Also set_ptes() was only just
> > introduced to the mm tree, so doing something that needs backporting
> > would take more effort (maybe having a private set_ptes() in the driver
> > would be a good backport option that calls set_pte_at() in a loop).

vm_flags_set() was introduced in 6.3, so not too many versions to
backport to if needed. Private set_ptes() seems reasonable to me.

> >
> >
> > diff --git a/drivers/misc/sgi-gru/grumain.c b/drivers/misc/sgi-gru/grumain.c
> > index 4eb4b9455139..c21bcb528f12 100644
> > --- a/drivers/misc/sgi-gru/grumain.c
> > +++ b/drivers/misc/sgi-gru/grumain.c
> > @@ -951,6 +951,8 @@ vm_fault_t gru_fault(struct vm_fault *vmf)
> >       }
> >
> >       if (!gts->ts_gru) {
> > +             pte_t *ptep, pte;
> > +
> >               STAT(load_user_context);
> >               if (!gru_assign_gru_context(gts)) {
> >                       preempt_enable();
> > @@ -964,9 +966,12 @@ vm_fault_t gru_fault(struct vm_fault *vmf)
> >               }
> >               gru_load_context(gts);
> >               paddr = gseg_physical_address(gts->ts_gru, gts->ts_ctxnum);
> > -             remap_pfn_range(vma, vaddr & ~(GRU_GSEG_PAGESIZE - 1),
> > -                             paddr >> PAGE_SHIFT, GRU_GSEG_PAGESIZE,
> > -                             vma->vm_page_prot);
> > +
> > +             pte = pfn_pte(paddr / PAGE_SIZE, vma->vm_page_prot);
> > +             ptep = vmf->pte - (vaddr % GRU_GSEG_PAGESIZE) / PAGE_SIZE;
> > +             set_ptes(vma->vm_mm, vaddr & ~(GRU_GSEG_PAGESIZE - 1),
> > +                             ptep, pte_mkspecial(pte),
> > +                             GRU_GSEG_PAGESIZE / PAGE_SIZE);
> >       }
> >
> >       preempt_enable();
> >
>
> Would we be able to fix it in stable simply by not triggering the
> vm_flags_set() in case these flags are already set?

I think we can do that. gru_file_mmap() sets all the flags that are
set by remap_pfn_range_notrack() (VM_IO | VM_PFNMAP | VM_DONTEXPAND |
VM_DONTDUMP), so we can check if all bits are already present and skip
the vm_flags_set() call.

>
> --
> Cheers,
>
> David / dhildenb
>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls
  2023-07-12 19:48                   ` Suren Baghdasaryan
@ 2023-07-17  6:13                     ` Yan Zhao
  2023-07-17 16:18                       ` Suren Baghdasaryan
  0 siblings, 1 reply; 18+ messages in thread
From: Yan Zhao @ 2023-07-17  6:13 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: David Hildenbrand, Matthew Wilcox, Dan Carpenter, linux-mm,
	Andrew Morton, Liam R. Howlett, Laurent Dufour,
	Michel Lespinasse, Jerome Glisse, Michal Hocko, Vlastimil Babka,
	Johannes Weiner, Peter Xu, Dimitri Sivanich, Mike Travis,
	Steve Wahl

On Wed, Jul 12, 2023 at 07:48:06PM +0000, Suren Baghdasaryan wrote:
> > Would we be able to fix it in stable simply by not triggering the
> > vm_flags_set() in case these flags are already set?
> 
> I think we can do that. gru_file_mmap() sets all the flags that are
> set by remap_pfn_range_notrack() (VM_IO | VM_PFNMAP | VM_DONTEXPAND |
> VM_DONTDUMP), so we can check if all bits are already present and skip
> the vm_flags_set() call.
> 
But on x86, remap_pfn_range() also sets flag VM_PAT. (in track_pfn_remap()).

Is there any interface to allow device driver to pre-set this flag in .mmap()
before .fault()? e.g. export track_pfn_remap() ?



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls
  2023-07-17  6:13                     ` Yan Zhao
@ 2023-07-17 16:18                       ` Suren Baghdasaryan
  2023-07-18  0:27                         ` Yan Zhao
  0 siblings, 1 reply; 18+ messages in thread
From: Suren Baghdasaryan @ 2023-07-17 16:18 UTC (permalink / raw)
  To: Yan Zhao
  Cc: David Hildenbrand, Matthew Wilcox, Dan Carpenter, linux-mm,
	Andrew Morton, Liam R. Howlett, Laurent Dufour,
	Michel Lespinasse, Jerome Glisse, Michal Hocko, Vlastimil Babka,
	Johannes Weiner, Peter Xu, Dimitri Sivanich, Mike Travis,
	Steve Wahl

On Sun, Jul 16, 2023 at 11:40 PM Yan Zhao <yan.y.zhao@intel.com> wrote:
>
> On Wed, Jul 12, 2023 at 07:48:06PM +0000, Suren Baghdasaryan wrote:
> > > Would we be able to fix it in stable simply by not triggering the
> > > vm_flags_set() in case these flags are already set?
> >
> > I think we can do that. gru_file_mmap() sets all the flags that are
> > set by remap_pfn_range_notrack() (VM_IO | VM_PFNMAP | VM_DONTEXPAND |
> > VM_DONTDUMP), so we can check if all bits are already present and skip
> > the vm_flags_set() call.
> >
> But on x86, remap_pfn_range() also sets flag VM_PAT. (in track_pfn_remap()).
>
> Is there any interface to allow device driver to pre-set this flag in .mmap()
> before .fault()? e.g. export track_pfn_remap() ?

Driver should be able to call vm_flags_set() in its .mmap().

>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls
  2023-07-12 19:34                 ` Matthew Wilcox
@ 2023-07-17 19:54                   ` Dimitri Sivanich
  0 siblings, 0 replies; 18+ messages in thread
From: Dimitri Sivanich @ 2023-07-17 19:54 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: David Hildenbrand, Suren Baghdasaryan, Dan Carpenter, linux-mm,
	Andrew Morton, Liam R. Howlett, Laurent Dufour,
	Michel Lespinasse, Jerome Glisse, Michal Hocko, Vlastimil Babka,
	Johannes Weiner, Peter Xu, Dimitri Sivanich, Steve Wahl

On Wed, Jul 12, 2023 at 08:34:10PM +0100, Matthew Wilcox wrote:
> On Wed, Jul 12, 2023 at 07:49:53PM +0100, Matthew Wilcox wrote:
> > @@ -964,9 +966,12 @@ vm_fault_t gru_fault(struct vm_fault *vmf)
> >  		}
> >  		gru_load_context(gts);
> >  		paddr = gseg_physical_address(gts->ts_gru, gts->ts_ctxnum);
> > -		remap_pfn_range(vma, vaddr & ~(GRU_GSEG_PAGESIZE - 1),
> > -				paddr >> PAGE_SHIFT, GRU_GSEG_PAGESIZE,
> > -				vma->vm_page_prot);
> > +
> > +		pte = pfn_pte(paddr / PAGE_SIZE, vma->vm_page_prot);
> > +		ptep = vmf->pte - (vaddr % GRU_GSEG_PAGESIZE) / PAGE_SIZE;
> > +		set_ptes(vma->vm_mm, vaddr & ~(GRU_GSEG_PAGESIZE - 1),
> > +				ptep, pte_mkspecial(pte),
> > +				GRU_GSEG_PAGESIZE / PAGE_SIZE);
> >  	}
> 
> Argh, no, this is wrong.  The page table isn't mapped at this point.
> What we want is a cross between vmf_insert_pfn() and vm_insert_pages().
> Let me add that ...

I should be able to test this once you have something ready.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls
  2023-07-17 16:18                       ` Suren Baghdasaryan
@ 2023-07-18  0:27                         ` Yan Zhao
  2023-07-18 16:27                           ` Suren Baghdasaryan
  0 siblings, 1 reply; 18+ messages in thread
From: Yan Zhao @ 2023-07-18  0:27 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: David Hildenbrand, Matthew Wilcox, Dan Carpenter, linux-mm,
	Andrew Morton, Liam R. Howlett, Laurent Dufour,
	Michel Lespinasse, Jerome Glisse, Michal Hocko, Vlastimil Babka,
	Johannes Weiner, Peter Xu, Dimitri Sivanich, Mike Travis,
	Steve Wahl

On Mon, Jul 17, 2023 at 09:18:34AM -0700, Suren Baghdasaryan wrote:
> On Sun, Jul 16, 2023 at 11:40 PM Yan Zhao <yan.y.zhao@intel.com> wrote:
> >
> > On Wed, Jul 12, 2023 at 07:48:06PM +0000, Suren Baghdasaryan wrote:
> > > > Would we be able to fix it in stable simply by not triggering the
> > > > vm_flags_set() in case these flags are already set?
> > >
> > > I think we can do that. gru_file_mmap() sets all the flags that are
> > > set by remap_pfn_range_notrack() (VM_IO | VM_PFNMAP | VM_DONTEXPAND |
> > > VM_DONTDUMP), so we can check if all bits are already present and skip
> > > the vm_flags_set() call.
> > >
> > But on x86, remap_pfn_range() also sets flag VM_PAT. (in track_pfn_remap()).
> >
> > Is there any interface to allow device driver to pre-set this flag in .mmap()
> > before .fault()? e.g. export track_pfn_remap() ?
> 
> Driver should be able to call vm_flags_set() in its .mmap().

Do you mean do something like this in .mmap()?

	flags = VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
	#ifdef CONFIG_X86
	flags |= VM_PAT;
	#endif
	vm_flags_set(vma, flags);

But VM_PAT cannot be set until after a successful reserve_pfn_range(),
which function is again not exported.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [bug report] mm: replace vma->vm_flags direct modifications with modifier calls
  2023-07-18  0:27                         ` Yan Zhao
@ 2023-07-18 16:27                           ` Suren Baghdasaryan
  0 siblings, 0 replies; 18+ messages in thread
From: Suren Baghdasaryan @ 2023-07-18 16:27 UTC (permalink / raw)
  To: Yan Zhao
  Cc: David Hildenbrand, Matthew Wilcox, Dan Carpenter, linux-mm,
	Andrew Morton, Liam R. Howlett, Laurent Dufour,
	Michel Lespinasse, Jerome Glisse, Michal Hocko, Vlastimil Babka,
	Johannes Weiner, Peter Xu, Dimitri Sivanich, Mike Travis,
	Steve Wahl

On Mon, Jul 17, 2023 at 5:54 PM Yan Zhao <yan.y.zhao@intel.com> wrote:
>
> On Mon, Jul 17, 2023 at 09:18:34AM -0700, Suren Baghdasaryan wrote:
> > On Sun, Jul 16, 2023 at 11:40 PM Yan Zhao <yan.y.zhao@intel.com> wrote:
> > >
> > > On Wed, Jul 12, 2023 at 07:48:06PM +0000, Suren Baghdasaryan wrote:
> > > > > Would we be able to fix it in stable simply by not triggering the
> > > > > vm_flags_set() in case these flags are already set?
> > > >
> > > > I think we can do that. gru_file_mmap() sets all the flags that are
> > > > set by remap_pfn_range_notrack() (VM_IO | VM_PFNMAP | VM_DONTEXPAND |
> > > > VM_DONTDUMP), so we can check if all bits are already present and skip
> > > > the vm_flags_set() call.
> > > >
> > > But on x86, remap_pfn_range() also sets flag VM_PAT. (in track_pfn_remap()).
> > >
> > > Is there any interface to allow device driver to pre-set this flag in .mmap()
> > > before .fault()? e.g. export track_pfn_remap() ?
> >
> > Driver should be able to call vm_flags_set() in its .mmap().
>
> Do you mean do something like this in .mmap()?
>
>         flags = VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
>         #ifdef CONFIG_X86
>         flags |= VM_PAT;
>         #endif
>         vm_flags_set(vma, flags);
>
> But VM_PAT cannot be set until after a successful reserve_pfn_range(),
> which function is again not exported.

I see. Let's wait for Matthew's fix and see what it looks like first.
Maybe we can backport it directly instead of using alternatives.


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2023-07-18 16:27 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-11  7:21 [bug report] mm: replace vma->vm_flags direct modifications with modifier calls Dan Carpenter
2023-07-11 21:55 ` Suren Baghdasaryan
2023-07-11 22:21   ` Matthew Wilcox
2023-07-11 23:45     ` Suren Baghdasaryan
2023-07-12  7:35       ` David Hildenbrand
2023-07-12 15:01         ` Suren Baghdasaryan
2023-07-12 15:52           ` Matthew Wilcox
2023-07-12 15:55             ` David Hildenbrand
2023-07-12 16:03               ` Suren Baghdasaryan
2023-07-12 18:49               ` Matthew Wilcox
2023-07-12 18:52                 ` David Hildenbrand
2023-07-12 19:48                   ` Suren Baghdasaryan
2023-07-17  6:13                     ` Yan Zhao
2023-07-17 16:18                       ` Suren Baghdasaryan
2023-07-18  0:27                         ` Yan Zhao
2023-07-18 16:27                           ` Suren Baghdasaryan
2023-07-12 19:34                 ` Matthew Wilcox
2023-07-17 19:54                   ` Dimitri Sivanich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.