From: Minchan Kim <minchan@kernel.org>
To: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: "Nadav Amit" <nadav.amit@gmail.com>,
"Suren Baghdasaryan" <surenb@google.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Vlastimil Babka" <vbabka@suse.cz>,
"David Rientjes" <rientjes@google.com>,
"Stephen Rothwell" <sfr@canb.auug.org.au>,
"Edgar Arriaga García" <edgararriaga@google.com>,
"Michal Hocko" <mhocko@suse.com>, linux-mm <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>,
"# 5 . 10+" <stable@vger.kernel.org>
Subject: Re: [PATCH V2,2/2] mm: madvise: skip unmapped vma holes passed to process_madvise
Date: Fri, 18 Mar 2022 08:37:50 -0700
Message-ID: <YjSnTu9QZiiZQE7A@google.com>
In-Reply-To: <74852e90-003b-84b8-9836-72258e3c5057@quicinc.com>
On Fri, Mar 18, 2022 at 07:35:41PM +0530, Charan Teja Kalla wrote:
> Thank you for the valuable input.
>
> On 3/18/2022 2:08 AM, Nadav Amit wrote:
> >>>>>> IMO, it's worth noting in the man page.
> >>>>>>
> >>>>> Or is the current patch, which only handles ENOMEM, sufficient here,
> >>>>> and do we just have to update the man page?
> >>>> I think the "On success, process_madvise() returns the number of bytes
> >>>> advised" behaviour sounds useful. But madvise() doesn't do that.
> >>>>
> >>>> RETURN VALUE
> >>>> On success, madvise() returns zero. On error, it returns -1 and errno
> >>>> is set to indicate the error.
> >>>>
> >>>> So why is it desirable in the case of process_madvise()?
> >>> Since process_madvise deals with multiple ranges and could fail at one
> >>> of them in the middle of processing, callers could work out where the
> >>> call failed and then decide whether to abort at that point or continue
> >>> hinting the next addresses. The problem with this strategy is that the
> >>> API doesn't return any error value if it has processed any bytes, so
> >>> callers are limited in how they can decide on a policy. Unfortunately,
> >>> that's a limitation of every vectored IO syscall.
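The caller-side strategy described above can be sketched as a small user-space model. This is a minimal Python sketch, not kernel or libc code: it assumes the iovecs are processed strictly in order and that a partial return value counts bytes from the start of the vector, as the man page describes. The helper names `locate_stop_point` and `retry_vector`, and modelling each iovec as an `(iov_base, iov_len)` tuple, are hypothetical. It only recovers *where* processing stopped; as noted above, the return value says nothing about *why*.

```python
from typing import List, Optional, Tuple

# Hypothetical model: each iovec is an (iov_base, iov_len) pair.
IoVec = Tuple[int, int]


def locate_stop_point(iovecs: List[IoVec], processed: int) -> Optional[Tuple[int, int]]:
    """Return (index, address) of the first unprocessed byte, or None if all were processed.

    Assumes the iovecs are processed strictly in order and that `processed`
    counts bytes from the start of the vector.
    """
    remaining = processed
    for idx, (base, length) in enumerate(iovecs):
        if remaining < length:
            # Processing stopped inside (or at the start of) this element.
            return idx, base + remaining
        remaining -= length
    return None


def retry_vector(iovecs: List[IoVec], processed: int, skip_failed: bool) -> List[IoVec]:
    """Build the vector for a follow-up call: resume at the stop point or skip that element."""
    stop = locate_stop_point(iovecs, processed)
    if stop is None:
        return []  # everything was advised; nothing left to retry
    idx, addr = stop
    if skip_failed:
        return iovecs[idx + 1:]  # abandon the element where processing stopped
    base, length = iovecs[idx]
    return [(addr, base + length - addr)] + iovecs[idx + 1:]


vec = [(0x1000, 0x2000), (0x8000, 0x1000), (0x10000, 0x3000)]
idx, addr = locate_stop_point(vec, 0x2800)
print(idx, hex(addr))  # prints: 1 0x8800
```

Whether to resume at the stop point or skip the whole element is exactly the policy choice the caller has to make blind, since no error code accompanies a partial count.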
> >>>
> >>>>
> >>>>
> >>>> And why was process_madvise() designed this way? Or was it
> >>>> always simply an error in the manpage?
> >> Taking a closer look, the manpage indeed seems to be wrong.
> >> https://elixir.bootlin.com/linux/v5.17-rc8/source/mm/madvise.c#L1154
> >> indicates that in the presence of unmapped holes madvise will skip
> >> them but will return ENOMEM and that's what process_madvise is
> >> ultimately returning in this case. So, the manpage claim of "This
> >> return value may be less than the total number of requested bytes, if
> >> an error occurred after some iovec elements were already processed."
> >> does not reflect the reality in our case because the return value will
> >> be -ENOMEM. After the desired behavior is finalized I'll modify the
> >> manpage accordingly.
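To make the behaviour described above concrete, here is a simplified Python model of the hole handling in the linked mm/madvise.c code and of how process_madvise() ends up returning the error. It is a hypothetical sketch, not the kernel implementation: VMAs are modelled as sorted, non-overlapping half-open `(vm_start, vm_end)` ranges, every VMA is assumed to accept the advice, and the function names are made up for illustration. The per-iovec loop in `process_madvise_model` reflects our reading of the pre-fix code, where any error replaces the byte count.

```python
import errno
from typing import List, Tuple

Range = Tuple[int, int]  # half-open [start, end)
IoVec = Tuple[int, int]  # (iov_base, iov_len)


def madvise_range_model(vmas: List[Range], start: int, length: int) -> Tuple[int, List[Range]]:
    """Simplified model of how madvise treats unmapped holes in one range.

    Returns (ret, advised): `advised` lists the sub-ranges that received the
    advice; `ret` is -ENOMEM if any part of the range was unmapped, else 0.
    """
    end = start + length
    pos = start
    advised: List[Range] = []
    unmapped_error = 0
    for vm_start, vm_end in sorted(vmas):
        if vm_end <= pos:
            continue  # VMA lies entirely before the current position
        if vm_start >= end:
            break  # no more VMAs inside the range
        if pos < vm_start:
            unmapped_error = -errno.ENOMEM  # remember the hole, but skip it
            pos = vm_start
        seg_end = min(vm_end, end)
        advised.append((pos, seg_end))  # the advice is still applied here
        pos = seg_end
        if pos >= end:
            break
    if pos < end:
        unmapped_error = -errno.ENOMEM  # trailing hole, or no VMA at all
    return unmapped_error, advised


def process_madvise_model(vmas: List[Range], iovecs: List[IoVec]) -> int:
    """Assumed pre-fix loop: any per-iovec error replaces the byte count."""
    total = 0
    for base, length in iovecs:
        ret, _ = madvise_range_model(vmas, base, length)
        if ret < 0:
            return ret  # earlier progress is lost; the caller only sees -ENOMEM
        total += length
    return total
```

In this model, a range covering two VMAs with a hole between them still has both VMAs advised, yet the call reports -ENOMEM, which is the mismatch with the man page wording quoted above.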
> > Since process_madvise() might be used in sort of non-cooperative mode,
> > I think that the caller cannot guarantee that it knows exactly the
> > memory layout of the process whose memory it madvise’s. I know that
> > MADV_DONTNEED for instance is not supported (at least today) by
> > process_madvise(), but if it were, the caller may want to know exactly
> > which memory was madvise'd, even if the target process ran some other
> > memory-layout-changing syscalls (e.g., munmap()) in the meantime.
> >
> > IOW, skipping holes and just returning the total number of madvise’d
> > bytes might not be enough.
>
> Then is it a correct design for the advised byte range to include holes
> by default?
> Say the [start, len) range passed in the iovec by the user has a layout
> like vma1 -- hole -- vma2 -- hole -- vma3.
>
> In the ideal case, where all VMAs are eligible for the advice, the total
> bytes processed returned should be vma3->end - vma1->start. This is the
> success case.
>
> Now, say that vma1 succeeds but vma2 (say, VM_LOCKED) fails the advice.
> In that case the processed bytes will be vma2->start - vma1->start (still
> counting the hole as processed bytes), so that the user may restart/skip
> at vma2 and then continue. This return value is the partially processed
> byte count.
>
> If the system doesn't find any VMA in the range passed by the user, it
> returns ENOMEM, since not a single advisable VMA is found in the range.
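A minimal Python sketch of the semantics proposed above, for a single [start, start + len) range. It is a hypothetical model, not a patch: VMAs are `(vm_start, vm_end, eligible)` tuples, where `eligible=False` stands in for a VMA (such as a VM_LOCKED one) on which the advice fails, and `proposed_advised_bytes` is a made-up name. Holes count as processed bytes, processing stops at the first ineligible VMA, and -ENOMEM is returned only when no VMA intersects the range.

```python
import errno
from typing import List, Tuple

# Hypothetical VMA model: (vm_start, vm_end, eligible); eligible=False stands
# in for a VMA (e.g. VM_LOCKED) on which the advice fails.
Vma = Tuple[int, int, bool]


def proposed_advised_bytes(vmas: List[Vma], start: int, length: int) -> int:
    """Bytes reported as advised under the proposal, holes counted as processed."""
    end = start + length
    in_range = [v for v in sorted(vmas) if v[1] > start and v[0] < end]
    if not in_range:
        return -errno.ENOMEM  # not a single advisable VMA in [start, end)
    for vm_start, _vm_end, eligible in in_range:
        if not eligible:
            # Everything before the failing VMA, holes included, counts as
            # processed. The proposal does not say what to do if the very
            # first VMA fails; this sketch simply reports 0 bytes.
            return max(vm_start, start) - start
    return end - start  # success: the whole range, holes included
```

For example, with vma1 at 0x1000-0x2000, a locked vma2 at 0x3000-0x4000 and vma3 at 0x5000-0x6000, a call over 0x1000..0x6000 reports 0x2000 bytes, i.e. vma2->start - vma1->start.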
As I mentioned in another reply, let's not make any exception (i.e.,
skipping holes) for the vectored memory syscall; instead, return the exact
number of processed bytes on the exact ranges.
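One possible reading of this suggestion, sketched in Python as a hypothetical model (not kernel code): holes are not skipped, so processing stops at the first unmapped byte or the first VMA that refuses the advice, and the call returns the exact number of bytes advised up to that point across all iovecs. Following the usual vectored-syscall convention, an error is reported only when nothing at all was processed. The function name, the `(vm_start, vm_end, eligible)` VMA tuples, and the use of -EINVAL for an ineligible VMA are assumptions for illustration.

```python
import errno
from typing import List, Tuple

Vma = Tuple[int, int, bool]  # hypothetical: (vm_start, vm_end, eligible)
IoVec = Tuple[int, int]      # (iov_base, iov_len)


def exact_processed_bytes(vmas: List[Vma], iovecs: List[IoVec]) -> int:
    """Exact bytes advised before the first hole or failing VMA, across all iovecs."""
    ordered = sorted(vmas)
    processed = 0
    for base, length in iovecs:
        pos, end = base, base + length
        err = 0
        for vm_start, vm_end, eligible in ordered:
            if pos >= end:
                break
            if vm_end <= pos:
                continue  # VMA lies entirely below the current position
            if vm_start > pos:
                err = -errno.ENOMEM  # unmapped hole: stop, do not skip it
                break
            if not eligible:
                err = -errno.EINVAL  # assumed error for a VMA refusing the advice
                break
            pos = min(vm_end, end)
        if pos < end and err == 0:
            err = -errno.ENOMEM  # range runs past the last VMA
        processed += pos - base
        if err:
            # Report progress if there was any, otherwise the error itself.
            return processed if processed else err
    return processed
```

Unlike the hole-skipping approach, a range spanning vma1 -- hole -- vma2 here reports only vma1's bytes, so the returned count points the caller at exactly the address where the hole begins.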