From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jiri Kosina
Subject: Re: [PATCH 0/3] mincore() and IOCB_NOWAIT adjustments
Date: Wed, 6 Mar 2019 23:48:03 +0100 (CET)
Message-ID:
References: <20190130124420.1834-1-vbabka@suse.cz> <20190306143547.c686225447822beaf3b6e139@linux-foundation.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Return-path:
In-Reply-To: <20190306143547.c686225447822beaf3b6e139@linux-foundation.org>
Sender: linux-kernel-owner@vger.kernel.org
To: Andrew Morton
Cc: Vlastimil Babka, Linus Torvalds, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-api@vger.kernel.org, Peter Zijlstra, Greg KH,
	Jann Horn, Andy Lutomirski, Cyril Hrubis, Daniel Gruss, Dave Chinner,
	Dominique Martinet, Kevin Easton, "Kirill A. Shutemov", Matthew Wilcox,
	Tejun Heo
List-Id: linux-api@vger.kernel.org

On Wed, 6 Mar 2019, Andrew Morton wrote:

> > could you please take at least the correct and straightforward fix for
> > mincore() before we figure out how to deal with the slightly less
> > practical RWF_NOWAIT? Thanks.
>
> I assume we're talking about [1/3] and [2/3] from this thread?
>
> Can we have a resend please? Gather the various acks and revisions,
> make changelog changes to address the review questions and comments?

1/3 is clearly the one to be merged. The version with all the acks
gathered is in this thread, at

	https://lore.kernel.org/lkml/de52b3bd-4e39-c133-542a-0a9c5e357404@suse.cz/

Attaching the patch also at the end of this mail so that it could be
easily picked up. I am unfortunately not sure what changelog changes you
are talking about, there were none requested during the review as far as
I know.

2/3 is clearly postponed for now, it needs more thinking.

3/3 is actually waiting for your decision, see

	https://lore.kernel.org/lkml/20190212063643.GL15609@dhcp22.suse.cz/

The 1/3 patch to be merged in any case:

=== cut here ===
From: Jiri Kosina
Date: Wed, 16 Jan 2019 20:53:17 +0100
Subject: [PATCH v2] mm/mincore: make mincore() more conservative

The semantics of what mincore() considers to be resident is not completely
clear, but Linux has always (since 2.3.52, which is when mincore() was
initially done) treated it as "page is available in page cache".

That's potentially a problem, as that [in]directly exposes meta-information
about pagecache / memory mapping state even about memory not strictly
belonging to the process executing the syscall, opening possibilities for
sidechannel attacks.

Change the semantics of mincore() so that it only reveals pagecache
information for non-anonymous mappings that belong to files that the
calling process could (if it tried to) successfully open for writing.

[mhocko@suse.com: restructure can_do_mincore() conditions]
Originally-by: Linus Torvalds
Originally-by: Dominique Martinet
Cc: Dominique Martinet
Cc: Andy Lutomirski
Cc: Dave Chinner
Cc: Kevin Easton
Cc: Matthew Wilcox
Cc: Cyril Hrubis
Cc: Tejun Heo
Cc: Kirill A. Shutemov
Cc: Daniel Gruss
Signed-off-by: Jiri Kosina
Signed-off-by: Vlastimil Babka
Acked-by: Josh Snyder
Acked-by: Michal Hocko
---
 mm/mincore.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/mm/mincore.c b/mm/mincore.c
index 218099b5ed31..b8842b849604 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -169,6 +169,16 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	return 0;
 }
 
+static inline bool can_do_mincore(struct vm_area_struct *vma)
+{
+	if (vma_is_anonymous(vma))
+		return true;
+	if (!vma->vm_file)
+		return false;
+	return inode_owner_or_capable(file_inode(vma->vm_file)) ||
+		inode_permission(file_inode(vma->vm_file), MAY_WRITE) == 0;
+}
+
 /*
  * Do a chunk of "sys_mincore()". We've already checked
  * all the arguments, we hold the mmap semaphore: we should
@@ -189,8 +199,13 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v
 	vma = find_vma(current->mm, addr);
 	if (!vma || addr < vma->vm_start)
 		return -ENOMEM;
-	mincore_walk.mm = vma->vm_mm;
 	end = min(vma->vm_end, addr + (pages << PAGE_SHIFT));
+	if (!can_do_mincore(vma)) {
+		unsigned long pages = (end - addr) >> PAGE_SHIFT;
+		memset(vec, 1, pages);
+		return pages;
+	}
+	mincore_walk.mm = vma->vm_mm;
 	err = walk_page_range(addr, end, &mincore_walk);
 	if (err < 0)
 		return err;
-- 
Jiri Kosina
SUSE Labs
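
For anyone wanting to observe the semantic change from userspace, here is a minimal sketch of a probe built on the mincore(2) syscall. It is not part of the patch. The helper names count_resident_pages() and count_resident_file_pages() are made up for illustration, and Linux with glibc is assumed. The idea is simply to count the pages the kernel reports as resident for a mapping; with the patch above applied, a file mapping that fails the can_do_mincore() check should be reported as fully resident, whatever the real page cache state is.

```c
/* Userspace sketch, assuming Linux/glibc. Helper names are hypothetical. */
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Count the pages in [addr, addr + len) that mincore() reports as resident.
 * addr must be page-aligned. On success the number of pages covered is
 * stored in *total and the resident count is returned; -1 means error.
 */
static long count_resident_pages(void *addr, size_t len, size_t *total)
{
	long page = sysconf(_SC_PAGESIZE);
	if (page <= 0 || len == 0)
		return -1;

	size_t npages = (len + (size_t)page - 1) / (size_t)page;
	unsigned char *vec = malloc(npages);
	if (!vec)
		return -1;

	long resident = -1;
	if (mincore(addr, len, vec) == 0) {
		resident = 0;
		for (size_t i = 0; i < npages; i++)
			resident += vec[i] & 1;	/* only bit 0 is defined */
		assert((size_t)resident <= npages);
		*total = npages;
	}
	free(vec);
	return resident;
}

/*
 * Map a file read-only and count its resident pages.
 * Returns -1 if the file cannot be opened, is empty, or cannot be mapped;
 * *total is written only when pages were actually counted.
 */
static long count_resident_file_pages(const char *path, size_t *total)
{
	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	long resident = -1;
	struct stat st;
	if (fstat(fd, &st) == 0 && st.st_size > 0) {
		size_t len = (size_t)st.st_size;
		void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
		if (map != MAP_FAILED) {
			resident = count_resident_pages(map, len, total);
			munmap(map, len);
		}
	}
	close(fd);
	return resident;
}
```

Calling count_resident_file_pages() as an unprivileged user on a file that the caller neither owns nor can write (a system library, say) would, on a kernel carrying this patch, be expected to return a count equal to *total. Anonymous mappings and files the caller could open for writing keep reporting the real residency.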