linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Minchan Kim <minchan@kernel.org>
To: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: "Nadav Amit" <nadav.amit@gmail.com>,
	"Suren Baghdasaryan" <surenb@google.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"David Rientjes" <rientjes@google.com>,
	"Stephen Rothwell" <sfr@canb.auug.org.au>,
	"Edgar Arriaga García" <edgararriaga@google.com>,
	"Michal Hocko" <mhocko@suse.com>, linux-mm <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"# 5 . 10+" <stable@vger.kernel.org>
Subject: Re: [PATCH V2,2/2] mm: madvise: skip unmapped vma holes passed to process_madvise
Date: Fri, 18 Mar 2022 08:37:50 -0700	[thread overview]
Message-ID: <YjSnTu9QZiiZQE7A@google.com> (raw)
In-Reply-To: <74852e90-003b-84b8-9836-72258e3c5057@quicinc.com>

On Fri, Mar 18, 2022 at 07:35:41PM +0530, Charan Teja Kalla wrote:
> Thank you for valuable inputs.
> 
> On 3/18/2022 2:08 AM, Nadav Amit wrote:
> >>>>>> IMO, it's worth to note in man page.
> >>>>>>
> >>>>> Or the current patch for just ENOMEM is sufficient here and we just have
> >>>>> to update the man page?
> >>>> I think the "On success, process_madvise() returns the number of bytes
> >>>> advised" behaviour sounds useful.  But madvise() doesn't do that.
> >>>>
> >>>> RETURN VALUE
> >>>>       On  success, madvise() returns zero.  On error, it returns -1 and errno
> >>>>       is set to indicate the error.
> >>>>
> >>>> So why is it desirable in the case of process_madvise()?
> >>> Since process_madvise deal with multiple ranges and could fail at one of
> >>> them in the middle or pocessing, people could decide where the call
> >>> failed and then make a strategy whether they will abort at the point or
> >>> continue to hint next addresses. Here, problem of the strategy is API
> >>> doesn't return any error vaule if it has processed any bytes so they
> >>> would have limitation to decide a policy. That's the limitation for
> >>> every vector IO syscalls, unfortunately.
> >>>
> >>>>
> >>>>
> >>>> And why was process_madvise() designed this way?   Or was it
> >>>> always simply an error in the manpage?
> >> Taking a closer look, indeed manpage seems to be wrong.
> >> https://elixir.bootlin.com/linux/v5.17-rc8/source/mm/madvise.c#L1154
> >> indicates that in the presence of unmapped holes madvise will skip
> >> them but will return ENOMEM and that's what process_madvise is
> >> ultimately returning in this case. So, the manpage claim of "This
> >> return value may be less than the total number of requested bytes, if
> >> an error occurred after some iovec elements were already processed."
> >> does not reflect the reality in our case because the return value will
> >> be -ENOMEM. After the desired behavior is finalized I'll modify the
> >> manpage accordingly.
> > Since process_madvise() might be used in sort of non-cooperative mode,
> > I think that the caller cannot guarantee that it knows exactly the
> > memory layout of the process whose memory it madvise’s. I know that
> > MADV_DONTNEED for instance is not supported (at least today) by
> > process_madvise(), but if it were, the caller may want which exact
> > memory was madvise'd even if the target process ran some other
> > memory layout changing syscalls (e.g., munmap()).
> > 
> > IOW, skipping holes and just returning the total number of madvise’d
> > bytes might not be enough.
> 
> Then does the advised bytes range by default including holes is a
> correct design?
> Say the [start, len) range passed in the iovec by the user contains the
> layout like, vma1 -- hole-- vma2 -- hole -- vma3.
> 
> Under ideal case, where all vma's are eligible for advise, the total
> bytes processed returning should be vma3->end - vma1->start. This is
> success case.
> 
>  Now, say that vma1 is succeeded but vma2(say VM_LOCKED) is failed at
> advise. In such case processed bytes will be
> vma2->start-vma1->start(still consider hole as bytes processed), so that
> user may restart/skip at vma2, then continue. This return type will be
> partially processed bytes.
> 
> If the system doesn't found any VMA in the passed range by user, it
> returns ENOMEM as not a single advisable vma is found in the range.

As I mentioned in other reply, let's do not make any exception(i.e.,
skipping hole) for vectored memory syscall but exact processed bytes
on the exact ranges.


  reply	other threads:[~2022-03-18 15:37 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-11 15:29 [PATCH V2,0/2]mm: madvise: return correct bytes processed with process_madvise Charan Teja Kalla
2022-03-11 15:29 ` [PATCH V2,1/2] mm: madvise: return correct bytes advised " Charan Teja Kalla
2022-03-15 22:20   ` Minchan Kim
2022-03-21 15:18   ` Michal Hocko
2022-03-11 15:29 ` [PATCH V2,2/2] mm: madvise: skip unmapped vma holes passed to process_madvise Charan Teja Kalla
2022-03-15 22:58   ` Minchan Kim
2022-03-15 23:48     ` Andrew Morton
2022-03-16  1:43       ` Minchan Kim
2022-03-16 14:19         ` Charan Teja Kalla
2022-03-16 21:29           ` Andrew Morton
2022-03-17 16:28             ` Minchan Kim
2022-03-17 16:53               ` Suren Baghdasaryan
2022-03-17 20:38                 ` Nadav Amit
2022-03-18 14:05                   ` Charan Teja Kalla
2022-03-18 15:37                     ` Minchan Kim [this message]
2022-03-17 16:24           ` Minchan Kim
2022-03-21 15:02           ` Michal Hocko
2022-03-22  5:19             ` Charan Teja Kalla
2022-03-21 15:34   ` Michal Hocko
2022-03-22  7:10     ` Charan Teja Kalla
2022-03-22  8:40       ` Michal Hocko
2022-03-11 21:42 ` [PATCH V2,0/2]mm: madvise: return correct bytes processed with process_madvise Andrew Morton
2022-03-15 14:26   ` Charan Teja Kalla

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YjSnTu9QZiiZQE7A@google.com \
    --to=minchan@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=edgararriaga@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=nadav.amit@gmail.com \
    --cc=quic_charante@quicinc.com \
    --cc=rientjes@google.com \
    --cc=sfr@canb.auug.org.au \
    --cc=stable@vger.kernel.org \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).