From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Aneesh Kumar K.V"
In-Reply-To: <20151201125659-mutt-send-email-mst@redhat.com>
References: <20151130105044.12269.21261.stgit@bahia.huguette.org>
 <20151130150353-mutt-send-email-mst@redhat.com>
 <20151130144631.4736280b@bahia.local>
 <20151130185328-mutt-send-email-mst@redhat.com>
 <878u5eqw2w.fsf@linux.vnet.ibm.com>
 <20151201125659-mutt-send-email-mst@redhat.com>
Date: Tue, 01 Dec 2015 17:45:27 +0530
Message-ID: <87vb8i2wm8.fsf@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain
Subject: Re: [Qemu-devel] [PATCH] mmap-alloc: use same backend for all mappings
To: "Michael S. Tsirkin"
Cc: Paolo Bonzini, qemu-devel@nongnu.org, Greg Kurz

"Michael S. Tsirkin" writes:

> On Tue, Dec 01, 2015 at 04:23:11PM +0530, Aneesh Kumar K.V wrote:
>> "Michael S. Tsirkin" writes:
>>
>> > On Mon, Nov 30, 2015 at 02:46:31PM +0100, Greg Kurz wrote:
>> >> On Mon, 30 Nov 2015 15:06:33 +0200
>> >> "Michael S. Tsirkin" wrote:
>> >>
>>
>> ....
>>
>> >> On ppc64, the address space is divided in 256MB-sized segments where all pages
>> >> have the same size. This is a hw limitation IIUC. I don't know if it can be
>> >> fixed and I'll let Ben comment on it.
>> >
>> > But it's anonymous memory with PROT_NONE. There should be no pages there:
>> > just a chunk of virtual memory reserved.
>> >
>>
>> ppc64 use page size (called as base page size) to find the hash slot in
>> which we find the virtual address to real address translation. All the
>> pages in a segment should have same base page size. Hugetlb pages have a
>> base page size of 16M whereas a regular linux page have 64K. mmap will
>> fail to map a hugetlb mapping in a segment that already have regular
>> pages mapped.
>>
>> -aneesh
>
>
> I see this in kernel:
>
>         } else if (flags & MAP_HUGETLB) {
>                 struct user_struct *user = NULL;
>                 struct hstate *hs;
>
>                 hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) & SHM_HUGE_MASK);
>                 if (!hs)
>                         return -EINVAL;
>
>                 len = ALIGN(len, huge_page_size(hs));
>                 /*
>                  * VM_NORESERVE is used because the reservations will be
>                  * taken when vm_ops->mmap() is called
>                  * A dummy user value is used because we are not locking
>                  * memory so no accounting is necessary
>                  */
>                 file = hugetlb_file_setup(HUGETLB_ANON_FILE, len,
>                                 VM_NORESERVE,
>                                 &user, HUGETLB_ANONHUGE_INODE,
>                                 (flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
>                 if (IS_ERR(file))
>                         return PTR_ERR(file);
>         }
>
> So maybe it's a question of passing in MAP_HUGETLB and the
> correct size mask.
>

Can you explain this more? If the question is do we need to pass fd and
remove MAP_ANONYMOUS to map hugetlb, we don't.
A good example is tools/testing/selftests/vm/map_hugetlb.c.

If the question is whether we will lose hugepages on mmap even if the
mapping is PROT_NONE, then the answer is that we do, in the form of a
hugetlb reservation.

-aneesh
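
For reference, here is a minimal sketch of the kind of call discussed above:
an anonymous hugetlb mapping made with fd = -1, PROT_NONE, and the huge page
size encoded in the flag bits above MAP_HUGE_SHIFT. The helper names
hugetlb_flags() and reserve_hugetlb() are hypothetical and come from neither
QEMU nor the kernel. The MAP_HUGE_SHIFT fallback is an assumption for libc
headers that do not export it. Whether the call succeeds depends on the
hugepage pool configured on the host.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Assumption: fallback for libc headers that do not export MAP_HUGE_SHIFT. */
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26
#endif

/*
 * Hypothetical helper: build mmap flags for an anonymous hugetlb mapping.
 * page_shift is log2 of the huge page size (24 for 16M); 0 asks the
 * kernel for its default huge page size.
 */
int hugetlb_flags(unsigned int page_shift)
{
    /* The size field is six bits wide. */
    assert(page_shift <= 0x3f);
    return MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB |
           (int)(page_shift << MAP_HUGE_SHIFT);
}

/*
 * Hypothetical helper: reserve len bytes of hugetlb-backed address space
 * with no access permissions. No fd is needed, so -1 is passed.
 * Returns MAP_FAILED with errno set when the kernel refuses the request,
 * for example when no huge pages are available.
 */
void *reserve_hugetlb(size_t len, unsigned int page_shift)
{
    return mmap(NULL, len, PROT_NONE, hugetlb_flags(page_shift), -1, 0);
}
```

A caller would use something like reserve_hugetlb(16UL << 20, 24) for one
16M page and compare the result against MAP_FAILED. As noted above, even a
PROT_NONE mapping of this kind takes a hugetlb reservation.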