linux-mm.kvack.org archive mirror
From: Minchan Kim <minchan@kernel.org>
To: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	surenb@google.com, vbabka@suse.cz, rientjes@google.com,
	sfr@canb.auug.org.au, edgararriaga@google.com,
	nadav.amit@gmail.com, mhocko@suse.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	"# 5.10+" <stable@vger.kernel.org>
Subject: Re: [PATCH V2,2/2] mm: madvise: skip unmapped vma holes passed to process_madvise
Date: Thu, 17 Mar 2022 09:24:01 -0700
Message-ID: <YjNgoeg1yOocsjWC@google.com>
In-Reply-To: <5428f192-1537-fa03-8e9c-4a8322772546@quicinc.com>

On Wed, Mar 16, 2022 at 07:49:38PM +0530, Charan Teja Kalla wrote:
> Thanks Andrew and Minchan.
> 
> On 3/16/2022 7:13 AM, Minchan Kim wrote:
> > On Tue, Mar 15, 2022 at 04:48:07PM -0700, Andrew Morton wrote:
> >> On Tue, 15 Mar 2022 15:58:28 -0700 Minchan Kim <minchan@kernel.org> wrote:
> >>
> >>> On Fri, Mar 11, 2022 at 08:59:06PM +0530, Charan Teja Kalla wrote:
> >>>> The process_madvise() system call is expected to skip holes in the VMAs
> >>>> passed through the 'struct iovec' vector list. But do_madvise(), which
> >>>> process_madvise() calls for each vma, returns ENOMEM in case of unmapped
> >>>> holes, even though the VMA is processed.
> >>>> Thus process_madvise() should treat ENOMEM as expected, consider the
> >>>> VMA passed to it as processed, and continue processing the other VMAs in
> >>>> the vector list. If -ENOMEM is returned even though the VMA is processed,
> >>>> the user will be unable to figure out where to start the next madvise.
> >>>> Fixes: ecb8ac8b1f14("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")
> >>>> Cc: <stable@vger.kernel.org> # 5.10+
> >>>
> >>> Hmm, not sure whether it's stable material since it changes the semantics
> >>> of the API. It would be better to change the semantics from 5.19, with a
> >>> man page update to specify the change.
> >>
> >> It's a very desirable change and it makes the code match the manpage
> >> and it's cc:stable.  I think we should just absorb any transitory
> >> damage which this causes people.  I doubt if there will be much - if
> >> anyone was affected by this they would have already told us that it's
> >> broken?
> > 
> > 
> > process_madvise fails to return the exact processed bytes in several cases
> > where it encounters an error, such as -EINVAL, -EINTR, or -ENOMEM, in the
> > middle of processing vmas. And now we are trying to make an exception
> > only for the hole case?
> I think EINTR will never be returned in the middle of processing VMAs for
> the behaviours supported by process_madvise().
> 
> It can return EINTR when:
> -------------------------
> 1) PTRACE_MODE_READ is being checked in mm_access() where it is waiting
> on task->signal->exec_update_lock. EINTR returned from here guarantees
> that process_madvise() didn't even start processing.
> https://elixir.bootlin.com/linux/v5.16.14/source/mm/madvise.c#L1264 -->
> https://elixir.bootlin.com/linux/v5.16.14/source/kernel/fork.c#L1318
> 
> 2) process_madvise() has started processing VMAs but the required
> behaviour on a VMA needs mmap_write_lock_killable(), from where EINTR is
> returned. The current behaviours supported by process_madvise(),
> MADV_COLD, PAGEOUT, WILLNEED, just need the read lock here.
> https://elixir.bootlin.com/linux/v5.16.14/source/mm/madvise.c#L1164
> **Thus I think there is no way EINTR can be returned by process_madvise()
> in the middle of processing.** No?
> 
> for EINVAL:
> -----------
> The only case I can think of where EINVAL can be returned in the
> middle of processing is when the given range contains VMAs
> with a hole in between and one of the VMAs contains pages that fail
> the can_madv_lru_vma() condition.
> So, it's a limitation that this returns -EINVAL though some bytes are
> processed.
> 	OR
> Since some of the bytes processed were still invalid, is it valid to
> return -EINVAL here, leaving the user to check the address range sent?
> 
> for ENOMEM:
> ----------
> ENOMEM is still returned even though the complete range is processed. IMO,
> this shouldn't be treated as an error, which is what the patch targets. Then
> there is the limitation you mentioned below, where it returns a
> positive processed-bytes count even though it didn't process anything, if it
> couldn't find any vma in the first iteration of madvise_walk_vmas.
> 
> I think the above limitations with EINVAL and ENOMEM arise because
> we are relying on the do_madvise() functionality, which the madvise() call
> uses to process a single VMA. When a 'struct iovec' vector processing
> interface is given in a system call, the caller expects that this
> system call returns the correct bytes processed, to help the user
> take the correct decisions. Please correct me if I am wrong here.
> 
> So, should we add a new function, say do_process_madvise(), which takes
> care of the above limitations? Or are there any alternative suggestions?

What I am thinking now is that process_madvise needs its own iterator (i.e.,
do_process_madvise), and it should report the exact bytes it addressed, over
the exact ranges, like process_vm_readv/writev. Providing valid ranges is
the user's responsibility.
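
One possible reading of that contract, as a rough user-space sketch rather
than kernel code: walk the vector in order, stop at the first range that
fails, and return the bytes of the ranges handled before it, so the error
itself is only returned when nothing was handled. Everything named below
(process_ranges, advise_fn, toy_advise, toy_map) is made up for
illustration; the real do_process_madvise does not exist yet and may well
look different.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical per-range step: 0 on success, -errno on failure. */
typedef int (*advise_fn)(void *addr, size_t len);

/*
 * Hypothetical iterator. Ranges are handled in order; the walk stops at
 * the first failing range. The return value is the byte count of the
 * ranges completed before that point, or the error if none completed.
 */
static ssize_t process_ranges(const struct iovec *iov, size_t cnt,
			      advise_fn advise)
{
	size_t done = 0;

	assert(iov != NULL || cnt == 0);

	for (size_t i = 0; i < cnt; i++) {
		int ret = advise(iov[i].iov_base, iov[i].iov_len);

		if (ret < 0)
			return done ? (ssize_t)done : ret;
		done += iov[i].iov_len;
	}
	return (ssize_t)done;
}

/* Toy address space: two mapped regions with a hole in between. */
struct toy_region {
	uintptr_t start;
	uintptr_t end;	/* exclusive */
};

static const struct toy_region toy_map[] = {
	{ 0x1000, 0x3000 },
	{ 0x5000, 0x6000 },
};

/* Toy advice step: -ENOMEM unless the range lies inside one region. */
static int toy_advise(void *addr, size_t len)
{
	uintptr_t start = (uintptr_t)addr;
	uintptr_t end = start + len;

	for (size_t i = 0; i < sizeof(toy_map) / sizeof(toy_map[0]); i++) {
		if (start >= toy_map[i].start && end <= toy_map[i].end)
			return 0;
	}
	return -ENOMEM;
}
```

With a vector whose second entry falls in the hole, the caller gets back the
length of the first entry only and can tell from that where to resume. Note
that this stops at the first bad range; it does not skip holes the way the
patch under discussion does.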

> 
> > IMO, it's worth noting in the man page.
> > 
> 
> Or the current patch for just ENOMEM is sufficient here and we just have
> to update the man page?
> 
> > In addition, this change returns a positive processed-bytes count even
> > though it didn't process anything, if it couldn't find any vma in the
> > first iteration of madvise_walk_vmas.
> 
> Thanks,
> Charan
> 

